A SAX-like Parser for JSON

By Mark Joseph - May 22, 2008 @ 4:09 pm

Introduction

P6R’s JSON parser provides a C++ implementation of a SAX2-like interface. (See our JSN Streaming JSON Parser Library, which is also included in our XJR SDK.) Our parser implementation is designed for high performance and directly supports a streaming IO model. The parser can be invoked with the entire JSON document in one buffer, or the document can be fed into the parser a chunk at a time over multiple calls. This interface allows chaining of components (e.g., filters, sources and sinks). To assist a developer in debugging, a detailed parse trace can be turned on programmatically.

Ultimately, our goal was twofold: first, to provide a generic JSON parser that applications could use directly; and second, to support JSON data in our XPath 2.0 and XSLT 2.0 products. How the second goal was achieved is described in another P6R article: XSLT and XPath for JSON. We should note that our SAX-like JSON parser is fully implemented and is part of our P6Platform product.

P6R’s SAX2 parser and JSON parser reduce the amount of string copying to help improve performance. In the Java definition of SAX2, ‘String’ objects are frequently passed to the application. However, all this object creation and string copying comes at a cost. We have taken a different approach: we return a pointer and a length for each parsed string. These pointers point into the application-provided buffer that contains the document to be parsed. Pointers into application-provided buffers are defined by our P6JSONSTRING type:

typedef struct  
{
    const P6CHAR* pStart; 
    P6UINT32      length;
} P6JSONSTRING;

P6JSONSTRING values are only valid during a callback into an application-written content handler. An application that wants to keep the string must copy it during the callback. This way an application has total control over how it manages its own memory and the related performance concerns.
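
The copy-during-callback rule can be illustrated with a minimal sketch. The typedefs for P6CHAR and P6UINT32 below are stand-ins assumed for illustration (the real definitions come from the P6R headers), and copyJsonString() is a hypothetical helper, not part of the P6R API.

```cpp
#include <cstdint>
#include <string>

// Stand-in typedefs, assumed for this sketch only; the real ones come from the P6R headers.
typedef char          P6CHAR;
typedef std::uint32_t P6UINT32;

typedef struct
{
    const P6CHAR* pStart;
    P6UINT32      length;
} P6JSONSTRING;

// Hypothetical helper: take an owned copy of the bytes a P6JSONSTRING refers to.
// Call it inside the callback, while pStart still points into the live buffer.
std::string copyJsonString( const P6JSONSTRING& js )
{
    // The span is delimited by its length, not by a terminating NUL.
    return std::string( js.pStart, js.length );
}
```

Once the callback returns, only the std::string copy should be used; the P6JSONSTRING itself may no longer refer to valid data.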

References:
(1) JSON Home Page; (2) RFC 4627, The application/json Media Type for JavaScript Object Notation (JSON); (3) D. Brownell, SAX2, O’Reilly, 2002, ISBN 0-596-00237-8.

P6IJSONReader Interface Reference

Detailed Description

This is the main JSON interface. Notice that there is no separate parse function, because parsing is done through the p6IDataStream interface.

// To parse either a single buffer or a stream of JSON buffers, perform the following steps.
// First, get a JSON reader object:
p6IJSONReader *pReader;
err = p6CreateInstance( CID_p6JSONReader, IID_p6IJSONReader, &pReader );

// Second, using the JSON reader, get the p6IDataStream interface on that object: 
p6IDataStream *pStream;
err = pReader->queryInterface( IID_p6IDataStream, &pStream );   // IID_ name assumed by analogy with IID_p6IJSONReader

// Third, initialize the data stream interface:
err = pStream->beginStream();

// Pass the buffer(s) to be parsed one at a time: 
err = pStream->processStream( buffer, bufSize );   // (1st buffer of stream)
//  . . . . . . 
err = pStream->processStream( buffer, bufSize );   // (nth buffer of stream)

The input ‘buffer’ passed to the processStream() method is the memory that the P6JSONSTRING pointers will often point into. The processStream() function can return an “eEndOfFile” error code to indicate that it has consumed the buffer provided but that the document is still incomplete (i.e., the top-most JSON object has not yet been closed).

Lastly, close the stream down: err = pStream->endStream();
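
The begin / process / end sequence above, including the “eEndOfFile” case, can be sketched as a chunk-feeding loop. Everything in this sketch is a stand-in assumed for illustration: StreamLike only borrows the three method names from the article, the numeric values of eOk and eEndOfFile are invented, BraceCountingStub is a toy replacement for the real parser, and feedInChunks() is a hypothetical helper. The real p6IDataStream signatures may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-ins assumed for this sketch; the real types and error values come from the P6R headers.
typedef std::uint32_t P6ERR;
const P6ERR eOk        = 0;   // hypothetical "success" value
const P6ERR eEndOfFile = 1;   // hypothetical value for "buffer consumed, document incomplete"

// Minimal stand-in that borrows the three method names used in the article.
struct StreamLike
{
    virtual ~StreamLike() {}
    virtual P6ERR beginStream() = 0;
    virtual P6ERR processStream( const char* buffer, std::uint32_t bufSize ) = 0;
    virtual P6ERR endStream() = 0;
};

// Toy stand-in for the parser: it only tracks whether the top-level object is closed
// (it does not even skip braces inside strings).
struct BraceCountingStub : StreamLike
{
    int depth;
    BraceCountingStub() : depth( 0 ) {}
    P6ERR beginStream() { depth = 0; return eOk; }
    P6ERR processStream( const char* buffer, std::uint32_t bufSize )
    {
        for( std::uint32_t i = 0; i < bufSize; ++i )
        {
            if( buffer[i] == '{' )      ++depth;
            else if( buffer[i] == '}' ) --depth;
        }
        return depth > 0 ? eEndOfFile : eOk;
    }
    P6ERR endStream() { return eOk; }
};

// Hypothetical helper: feed a document to the stream in chunks of at most chunkSize bytes.
// Returns eEndOfFile if the input ran out before the document was complete.
P6ERR feedInChunks( StreamLike& stream, const std::string& doc, std::size_t chunkSize )
{
    if( chunkSize == 0 ) chunkSize = doc.size() + 1;     // treat 0 as "one chunk"

    P6ERR last = stream.beginStream();
    if( last != eOk ) return last;

    for( std::size_t off = 0; off < doc.size(); off += chunkSize )
    {
        const std::size_t n = std::min( chunkSize, doc.size() - off );
        last = stream.processStream( doc.data() + off, static_cast<std::uint32_t>( n ) );
        if( last != eOk && last != eEndOfFile ) break;   // a real failure: stop feeding
    }

    const P6ERR endErr = stream.endStream();
    return last != eOk ? last : endErr;
}
```

The point of the loop is that eEndOfFile is not treated as a failure while more input is available; it only becomes the final result if the last chunk still leaves the document open.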

Public Member Functions

  1. P6R::P6ERR initialize( P6JSONFLAGS flags ): Sets up the interface to run properly.

  2. P6R::P6ERR getContentHandler( p6IJSONContentHandler **pObject ): Returns the content handler object defined and set by the application.

  3. P6R::P6ERR getErrorHandler( p6IJSONErrorHandler **pObject ): Returns the error handler object defined and set by the application.

  4. P6R::P6ERR setContentHandler( p6IJSONContentHandler *pObject ): The calling application uses this method to register a content handler. That handler is called directly by the JSON parser as tokens from the JSON document are recognized.

  5. P6R::P6ERR setErrorHandler( p6IJSONErrorHandler *pObject ): The calling application uses this method to register an error event handler.

  6. P6R::P6ERR releaseAllHandlers(): Tells the JSON reader to release all handlers that have been set. This was added as a convenience for applications.

P6IJSONContentHandler Interface Reference

Detailed Description

This JSON interface is implemented by an application using P6R’s JSON parser. This is a callback object that is registered with the p6IJSONReader::setContentHandler() method. This is the main parsing interface to the application. It provides a stream of events for each item in the input JSON document. Each event results in a method call to one of the methods below.

Public Member Functions

  1. P6R::P6ERR endDocument(): Notifies the application that the parser has reached the end of the JSON input document.

  2. P6R::P6ERR endObject( P6UINT32 nestingLevel ): Notifies the application that the most recent object (as announced by a call to the startObject() method below) has ended. Since JSON objects can nest, the ‘nestingLevel’ parameter is provided so an application can match each endObject() call to its startObject() call.

  3. P6R::P6ERR endArray( P6UINT32 nestingLevel ): Notifies the application that the most recent array (as announced by a call to the startArray() method below) has ended. Since JSON arrays can nest, the ‘nestingLevel’ parameter is provided so an application can match each endArray() call to its startArray() call.

  4. P6R::P6ERR setDocumentLocator( p6IJSONLocator *pObject ): Called just before the startDocument() method to give the application a p6IJSONLocator object created by the JSON parser.

  5. P6R::P6ERR startDocument(): Notifies the application that the parser has reached the start of the JSON input document.

  6. P6R::P6ERR startObject( P6UINT32 nestingLevel ): Notifies the application that a new JSON object has been detected.

  7. P6R::P6ERR startPair( P6JSONSTRING *pName ): A JSON object consists of an unordered set of name/value pairs. This method is called with the name of each pair; the pair’s value follows in later callbacks.

  8. P6R::P6ERR startArray( P6UINT32 nestingLevel ): Notifies the application that a new JSON array has been detected.

  9. P6R::P6ERR value( P6JSONVALUE *pValue ): Both JSON objects and arrays contain values, which can be strings, boolean values, numeric values, nested objects, nested arrays, or the value “null”. This method delivers the string, boolean, numeric, and null values; nested objects and arrays are reported through the start and end methods above.
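
One way an application might use the ‘nestingLevel’ parameter is to keep a small stack of open containers and check each end event against it. The sketch below is an assumption-laden illustration: NestingTracker is a hypothetical helper that a content handler could own, the 'o' / 'a' kind tags are invented here, and P6UINT32 is a stand-in typedef.

```cpp
#include <cstdint>
#include <vector>

typedef std::uint32_t P6UINT32;   // stand-in, assumed for this sketch only

// Hypothetical helper: pairs each start event with its end event.
// kind is 'o' for objects and 'a' for arrays (tags invented for this sketch).
class NestingTracker
{
    struct Entry
    {
        char     kind;
        P6UINT32 level;
        Entry( char k, P6UINT32 l ) : kind( k ), level( l ) {}
    };
    std::vector<Entry> open_;

public:
    // Call from startObject() / startArray().
    void onStart( char kind, P6UINT32 level ) { open_.push_back( Entry( kind, level ) ); }

    // Call from endObject() / endArray().
    // Returns true when the end event matches the most recent unclosed start event.
    bool onEnd( char kind, P6UINT32 level )
    {
        if( open_.empty() ) return false;
        const Entry top = open_.back();
        if( top.kind != kind || top.level != level ) return false;
        open_.pop_back();
        return true;
    }

    // True when every start event has been matched, e.g. at endDocument().
    bool balanced() const { return open_.empty(); }
};
```

A handler could call balanced() from endDocument() as a cheap sanity check on the event stream it received.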



The “P6R::P6ERR value(P6JSONVALUE *pValue)” method is modeled on the SAX2 characters(…) method. We extended this concept for JSON so that the parser can return typed values parsed out of the JSON text.

// P6JSON_TYPE can be one of the following values:
// P6JSON_TYPE_NULL      -- a JSON 'null' value
// P6JSON_TYPE_STR       -- a complete string has been returned in one chunk
// P6JSON_TYPE_STRPART   -- part of a string has been returned supporting a streaming interface
// P6JSON_TYPE_STREND    -- end of a string has been returned
// P6JSON_TYPE_BOOL      -- { P6TRUE, P6FALSE }
// P6JSON_TYPE_INTEGER   -- an integer value
// P6JSON_TYPE_REAL      -- a real (floating point) value

struct jsonValueParts
{
    P6JSON_TYPE  type;         // which type is used?
    P6JSONSTRING jstring;      // P6JSON_TYPE_STR, _STRPART, _STREND: pointer into buffer where string was found
    P6INT32      integer;      // P6JSON_TYPE_INTEGER
    P6FLOAT      real;         // P6JSON_TYPE_REAL
    P6BOOL       boolean;      // P6JSON_TYPE_BOOL
};
typedef struct jsonValueParts P6JSONVALUE;
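
A value() implementation will typically switch on the ‘type’ field and reassemble strings that arrive in pieces. The following is a minimal sketch under stated assumptions: the scalar typedefs and the enum values for P6JSON_TYPE are stand-ins (the real ones come from the P6R headers), ValueCollector and onValue() are hypothetical names, and it is assumed that the P6JSON_TYPE_STREND event carries the final piece of the string, as in the example trace later in this article.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins assumed for this sketch; the real definitions come from the P6R headers.
typedef char          P6CHAR;
typedef std::uint32_t P6UINT32;
typedef std::int32_t  P6INT32;
typedef float         P6FLOAT;
typedef bool          P6BOOL;
enum P6JSON_TYPE { P6JSON_TYPE_NULL, P6JSON_TYPE_STR, P6JSON_TYPE_STRPART, P6JSON_TYPE_STREND,
                   P6JSON_TYPE_BOOL, P6JSON_TYPE_INTEGER, P6JSON_TYPE_REAL };

typedef struct { const P6CHAR* pStart; P6UINT32 length; } P6JSONSTRING;

struct jsonValueParts
{
    P6JSON_TYPE  type;
    P6JSONSTRING jstring;
    P6INT32      integer;
    P6FLOAT      real;
    P6BOOL       boolean;
};
typedef struct jsonValueParts P6JSONVALUE;

// Hypothetical collector: stores each complete value as text.
class ValueCollector
{
    std::string              pending_;   // string pieces received so far
    std::vector<std::string> values_;    // completed values, in arrival order

public:
    // Intended to be called from the application's value() callback.
    void onValue( const P6JSONVALUE* v )
    {
        switch( v->type )
        {
        case P6JSON_TYPE_STR:       // whole string in one chunk: copy it now
            values_.push_back( std::string( v->jstring.pStart, v->jstring.length ) );
            break;
        case P6JSON_TYPE_STRPART:   // partial string: copy the piece and wait for more
            pending_.append( v->jstring.pStart, v->jstring.length );
            break;
        case P6JSON_TYPE_STREND:    // final piece: the string is now complete
            pending_.append( v->jstring.pStart, v->jstring.length );
            values_.push_back( pending_ );
            pending_.clear();
            break;
        case P6JSON_TYPE_NULL:    values_.push_back( "null" );                          break;
        case P6JSON_TYPE_BOOL:    values_.push_back( v->boolean ? "true" : "false" );   break;
        case P6JSON_TYPE_INTEGER: values_.push_back( std::to_string( v->integer ) );    break;
        case P6JSON_TYPE_REAL:    values_.push_back( std::to_string( v->real ) );       break;
        }
    }

    const std::vector<std::string>& values() const { return values_; }
};
```

Every string piece is copied inside the callback, which keeps the collector independent of the lifetime of the input buffers.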

P6IJSONLocator Interface Reference

Detailed Description

An instance of this interface is passed to the application in two ways: (a) via the setDocumentLocator() method of the p6IJSONContentHandler interface, which is implemented by the application and registered with the JSON reader component (p6IJSONReader); and (b) via the warning(), error(), and fatalError() methods of the p6IJSONErrorHandler interface, which is likewise implemented by the application and registered with the JSON reader component.

Public Member Functions

  1. P6R::P6ERR getColumnNumber( P6INT32 *pNumber ): Returns the column number, in the JSON document being parsed, at which the parser is currently positioned. An application can use it to find out where in the document the parser is currently processing.

  2. P6R::P6ERR getLineNumber( P6INT32 *pNumber ): Returns the line number, in the JSON document being parsed, at which the parser is currently positioned. This function is typically used to pinpoint parsing errors.

  3. P6R::P6ERR convertToWideString( P6JSONSTRING *pJstring, P6WCHAR *pOut, P6UINT32 *pLength ): (A helper function.) JSON value strings can contain characters encoded with hex digits in the form “\uhhhh” and escaped control characters (e.g., tab as “\t”). This function translates all such encodings into their wide character representation.
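
To make the kind of translation performed by convertToWideString() concrete, here is a minimal sketch of decoding the JSON escape sequences defined in RFC 4627. It is not the P6R implementation: decodeJsonEscapes() is a hypothetical standalone function, it assumes ASCII input outside of escapes, and it emits each “\uhhhh” unit as a single wide character without combining surrogate pairs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the P6R implementation: decode JSON escape
// sequences in an ASCII input span into a wide string.
// Returns false if an escape sequence is malformed or truncated.
bool decodeJsonEscapes( const char* p, std::size_t len, std::wstring& out )
{
    out.clear();
    for( std::size_t i = 0; i < len; ++i )
    {
        if( p[i] != '\\' )
        {
            out.push_back( static_cast<wchar_t>( static_cast<unsigned char>( p[i] ) ) );
            continue;
        }
        if( ++i >= len ) return false;            // lone backslash at end of input
        switch( p[i] )
        {
        case '"':  out.push_back( L'"' );  break;
        case '\\': out.push_back( L'\\' ); break;
        case '/':  out.push_back( L'/' );  break;
        case 'b':  out.push_back( L'\b' ); break;
        case 'f':  out.push_back( L'\f' ); break;
        case 'n':  out.push_back( L'\n' ); break;
        case 'r':  out.push_back( L'\r' ); break;
        case 't':  out.push_back( L'\t' ); break;
        case 'u':
        {
            if( i + 4 >= len ) return false;      // need four hex digits after 'u'
            unsigned int code = 0;
            for( int k = 1; k <= 4; ++k )
            {
                const char h = p[i + k];
                code <<= 4;
                if( h >= '0' && h <= '9' )      code |= static_cast<unsigned int>( h - '0' );
                else if( h >= 'a' && h <= 'f' ) code |= static_cast<unsigned int>( h - 'a' + 10 );
                else if( h >= 'A' && h <= 'F' ) code |= static_cast<unsigned int>( h - 'A' + 10 );
                else return false;                // not a hex digit
            }
            out.push_back( static_cast<wchar_t>( code ) );
            i += 4;                               // skip the four digits just consumed
            break;
        }
        default:
            return false;                         // unknown escape character
        }
    }
    return true;
}
```

Because the decoded text can differ in length from the raw span, the result has to live in application-owned storage; that is presumably why the real helper takes an output buffer and a length parameter.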

P6IJSONErrorHandler Interface Reference

Detailed Description

This JSON interface is implemented by an application using P6R’s JSON parser. This is a callback object that is registered with the p6IJSONReader::setErrorHandler() method. Once it is registered, the JSON parser will call one of the three methods of this object to notify the application of a parsing error. The application can take the error information and write it to a file, a socket, or any other destination. P6R provides a default implementation of this interface if none is set by the application.

Public Member Functions

  1. P6R::P6ERR warning( P6ERR errorCode, p6IJSONLocator *pObject ) OR
    P6R::P6ERR warningEx ( P6ERR errorCode, p6IJSONLocator *pObject, const P6CHAR *pDisplay ): The JSON parser notifies the application of an unusual condition detected during parsing.

  2. P6R::P6ERR error ( P6ERR errorCode, p6IJSONLocator *pObject ) OR
    P6R::P6ERR errorEx( P6ERR errorCode, p6IJSONLocator *pObject, const P6CHAR *pDisplay ): An application can typically proceed when receiving a warning. However, on receipt of an error or fatal parsing error the application should stop parsing the document.

  3. P6R::P6ERR fatalError( P6ERR errorCode, p6IJSONLocator *pObject ) OR
    P6R::P6ERR fatalErrorEx( P6ERR errorCode, p6IJSONLocator *pObject, const P6CHAR *pDisplay ): The JSON parser notifies the application that a non-recoverable parsing error has been detected.
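
An application-written error handler might combine the error code with the locator’s line and column to build a readable message, and remember whether parsing should stop. The sketch below relies on stand-ins assumed for illustration: LocatorLike only borrows two method names and their pointer-out-parameter shape from p6IJSONLocator, FixedLocator is a test stub, the value of eOk is invented, and CollectingErrorHandler is a hypothetical class that does not derive from the real p6IJSONErrorHandler.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Stand-ins assumed for this sketch; the real types come from the P6R headers.
typedef std::uint32_t P6ERR;
typedef std::int32_t  P6INT32;
const P6ERR eOk = 0;               // hypothetical "success" value

// Borrows two method names from the locator interface described above.
struct LocatorLike
{
    virtual ~LocatorLike() {}
    virtual P6ERR getLineNumber( P6INT32* pNumber ) = 0;
    virtual P6ERR getColumnNumber( P6INT32* pNumber ) = 0;
};

// Test stub that always reports the same position.
struct FixedLocator : LocatorLike
{
    P6INT32 line, column;
    FixedLocator( P6INT32 l, P6INT32 c ) : line( l ), column( c ) {}
    P6ERR getLineNumber( P6INT32* pNumber )   { *pNumber = line;   return eOk; }
    P6ERR getColumnNumber( P6INT32* pNumber ) { *pNumber = column; return eOk; }
};

// Hypothetical handler: records one message per notification.
class CollectingErrorHandler
{
    std::vector<std::string> messages_;
    bool                     stop_;

    void record( const char* severity, P6ERR code, LocatorLike* loc )
    {
        std::ostringstream os;
        os << severity << " " << code;
        P6INT32 line = 0, col = 0;
        if( loc && loc->getLineNumber( &line ) == eOk && loc->getColumnNumber( &col ) == eOk )
            os << " at line " << line << ", column " << col;
        messages_.push_back( os.str() );
    }

public:
    CollectingErrorHandler() : stop_( false ) {}

    // Warnings are recorded, but parsing may continue.
    P6ERR warning( P6ERR code, LocatorLike* loc )    { record( "warning", code, loc ); return eOk; }
    // Errors and fatal errors also mark the document as one to stop parsing.
    P6ERR error( P6ERR code, LocatorLike* loc )      { record( "error", code, loc ); stop_ = true; return eOk; }
    P6ERR fatalError( P6ERR code, LocatorLike* loc ) { record( "fatal", code, loc ); stop_ = true; return eOk; }

    bool shouldStop() const { return stop_; }
    const std::vector<std::string>& messages() const { return messages_; }
};
```

The application’s feeding loop can poll shouldStop() after each buffer and stop passing data to the parser once an error or fatal error has been reported.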

A Simple Example

Given the very simple JSON document:

{ "menuitem": "File",
   "offsets": [ 44, 99 ]
}

The following sequence of callbacks would happen in the application-written instance of the p6IJSONContentHandler interface:

 startDocument()  
 startObject( 1 ) -  {
 startPair()     -  menuitem   
 value()         -  File  (P6JSON_TYPE_STR)
 startPair()     -  offsets
 startArray( 1 ) -  [
 value()         -  44  (P6JSON_TYPE_INTEGER)
 value()         -  99  (P6JSON_TYPE_INTEGER)
 endArray( 1 )   -  ]
 endObject( 1 )  -  }
 endDocument() 


Now suppose we extend this example slightly, so that the first string value is long enough to be split across two input buffers and the parser returns it in two pieces:

{ "menuitem": "File with the name henry-the-great-dane.txt",
   "offsets": [ 44, 99 ]
}

The following sequence of callbacks would happen in the application-written instance of the p6IJSONContentHandler interface:

 startDocument()  
 startObject( 1 ) -  {
 startPair()      -  menuitem  
 value()          -  'File with the name '  (P6JSON_TYPE_STRPART)
 value()          -  'henry-the-great-dane.txt'  (P6JSON_TYPE_STREND)
 startPair()      -  offsets
 startArray( 1 )  -  [
 value()          -  44  (P6JSON_TYPE_INTEGER)
 value()          -  99  (P6JSON_TYPE_INTEGER)
 endArray( 1 )    -  ]
 endObject( 1 )   -  }
 endDocument()   

"A SAX-like Parser for JSON" was published on May 22nd, 2008 and is listed in Unique Product Features.

Comments on "A SAX-like Parser for JSON"

  1. On May 23rd, 2008 at 10:02 pm Claudio said,

    Hi,
    in the example there is an error. No event for attribute “offsets” is thrown.
    I think that the correct events are:
    startDocument()
    startObject( 1 ) – {
    startPair() – menuitem
    value() – File (P6JSON_TYPE_STR)
    startPair() – offsets
    startArray( 1 ) – [
    value() - 44 (P6JSON_TYPE_INTEGER)
    value() - 99 (P6JSON_TYPE_INTEGER)
    endArray( 1 ) - ]
    endObject( 1 ) – }
    endDocument()

  2. On May 24th, 2008 at 5:07 pm Mark Joseph said,

    You are completely correct; I fixed the error.
