XSLT and XPath for JSON

By Mark Joseph - May 6, 2008 @ 3:12 pm

Instead of developing separate software to provide XSLT and XPath functionality for documents encoded in JSON, we decided to extend our existing suite of XML tools to accommodate more than XML documents. Once we had this ability, we took one more step and applied it to another XPath-enabled product in our product list, namely our XPath-enabled Rule engine.

P6R’s suite of XML tools implements the XPath 2.0 and XSLT 2.0 standards. Excellent references for this functionality are: (1) M. Kay, XPath 2.0, Programmer’s Reference, 2004, Wiley Publishing, ISBN 0-7645-6910-4, and (2) M. Kay, XSLT 2.0, 3rd edition, Programmer’s Reference, Wiley Publishing, ISBN 0-7645-6909-0. The 2.0 versions of these standards represent a significant increase in functionality over their original versions (e.g., both add support for regular expressions).

All of P6R’s products are written in C++. In addition to the XSLT and XPath products, P6R’s XML suite consists of a SAX2-like XML parser and a DOM XML tree parser, which holds an XML document in a tree of nodes. For a reference on SAX2 see: D. Brownell, SAX2, O’Reilly, 2002, ISBN 0-596-00237-8. All of these components work together in the following way:

Figure 1. P6R's XML tool suite

In figure 1, the XML input document is parsed by the SAX2 parser, which sends events to the XML DOM Tree parser. The XML DOM tree has a SAX2 “content handler” that receives the SAX events (e.g., start of element, characters, …). The content handler makes calls to the XML DOM Tree API to create the nodes in the XML tree. Once the XML document is fully parsed, it is available via the XPath 2.0 API and the DOM tree API. It is important to realize that once the data is placed into the DOM tree, its original format is essentially invisible to the layers above.
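
To make the event flow concrete, here is a minimal sketch in Python rather than P6R’s C++. It uses the standard library’s xml.sax and xml.etree.ElementTree as stand-ins for the SAX2 parser and the DOM tree. The names TreeContentHandler and parse_xml are our own illustrative choices and not part of any P6R API.

```python
import xml.sax
import xml.etree.ElementTree as ET


class TreeContentHandler(xml.sax.ContentHandler):
    """Receives SAX events and builds a tree of nodes from them."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        attributes = dict(attrs.items())
        if self._stack:
            node = ET.SubElement(self._stack[-1], name, attributes)
        else:
            node = ET.Element(name, attributes)
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        if not self._stack:
            return
        node = self._stack[-1]
        if len(node):
            # Text that follows a child element is stored as that child's tail.
            last = node[-1]
            last.tail = (last.tail or "") + content
        else:
            node.text = (node.text or "") + content

    def endElement(self, name):
        self._stack.pop()


def parse_xml(text):
    """Parse an XML string and return the root node of the resulting tree."""
    handler = TreeContentHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.root
```

Once parse_xml returns, code working with the tree no longer needs to know that the nodes were produced by SAX events.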

Next we decided that we wanted a JSON parser and thought that a SAX-like API (i.e., a stream of events such as start of object, and start of array) would be a unique and powerful tool. Such a JSON parser should fit into our XML architecture outlined above. So we built a SAX-like JSON parser and the tool suite became as follows:

Figure 2. P6R's XML & JSON tool suite

In figure 2, the new XML DOM Tree layer is renamed to just be DOM tree. The DOM tree layer is extended to also contain a JSON SAX parser “content handler” that receives the JSON events like “start of array”, and a series of “values”. See the JSON reference: http://www.json.org. This new JSON content handler takes the stream of JSON events and calls the DOM tree API to construct a tree data structure in the same basic way as the XML SAX2 content handler did. The end result is that above the DOM tree layer, the origin of the data (that is either XML or JSON) is mostly invisible (we describe a few exceptions to this below).
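
A rough Python sketch of such a JSON content handler follows. It is not P6R’s implementation: the event names, json_events, JsonTreeHandler and build_tree are hypothetical, and for brevity the events are generated by walking an already decoded value, whereas a real SAX-like parser would emit them while scanning the text. Naming array members “item” and rendering non-string values with json.dumps are assumptions of this sketch; the root name “JSON-document” is the one described later in this article.

```python
import json
import xml.etree.ElementTree as ET


def json_events(value, name=None):
    """Yield SAX-like events (hypothetical names) for a decoded JSON value."""
    if isinstance(value, dict):
        yield ("start_object", name)
        for key, child in value.items():
            yield from json_events(child, key)
        yield ("end_object", name)
    elif isinstance(value, list):
        yield ("start_array", name)
        for child in value:
            yield from json_events(child)  # array members carry no name
        yield ("end_array", name)
    else:
        yield ("value", name, value)


class JsonTreeHandler:
    """Content handler that turns the event stream into a tree of nodes."""

    def __init__(self):
        self.root = None
        self._stack = []

    def _open(self, name):
        if self._stack:
            # Assumption: unnamed array members become "item" nodes.
            node = ET.SubElement(self._stack[-1], name or "item")
        else:
            node = ET.Element(name or "JSON-document")
            self.root = node
        self._stack.append(node)
        return node

    def handle(self, event):
        kind, name = event[0], event[1]
        if kind in ("start_object", "start_array"):
            self._open(name)
        elif kind in ("end_object", "end_array"):
            self._stack.pop()
        else:  # "value"
            value = event[2]
            node = self._open(name)
            node.text = value if isinstance(value, str) else json.dumps(value)
            self._stack.pop()


def build_tree(text):
    """Decode JSON text, replay it as events, and return the tree root."""
    handler = JsonTreeHandler()
    for event in json_events(json.loads(text)):
        handler.handle(event)
    return handler.root
```

The tree that comes out has the same node type as one built from XML, which is what lets the layers above ignore where the data came from.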

An XML document can have namespaces and attributes for its contained elements. See the XML standard reference: http://www.w3.org/TR/REC-xml/. However, it’s unclear how these concepts map to a JSON document. So in translating a JSON document into a DOM tree we took the approach that there are no attributes in a JSON document. Without some related JSON-like “schema” it’s not possible to infer additional structure. Our goal was to be able to parse and represent as many JSON documents as possible.

An alternative approach could have been to translate a JSON document directly into an XML document and then just run the XML SAX parser against it. This was not chosen for two reasons. First, we wanted to build a native JSON parser anyway and the SAX approach was attractive to us. Second, converting the JSON to XML seems slow and unnecessary since the conversion requires a JSON parser anyway.

At the XPath level, access to an element in a DOM tree, which came from a JSON source, requires a special root path. For example, given the following XML:

<menu>
   <lunch>
       <soup> ... </soup>
       <veggies> ... </veggies>
   </lunch>
</menu>

a typical XPath step expression could be “/menu/lunch/soup”, for the contents of the soup element. However, not all JSON documents have a top level name such as “menu”. For example, the following JSON document is perfectly legal:

{ "soup": "...",
   "veggies": "...",
   "dessert": "..."
}

In the above example, the topmost “wrapper” does not have a name associated with it. In order to handle as many JSON documents as possible, our DOM tree implementation for JSON gives a name to the topmost “{” “}” pair of any JSON document. That name is simply “JSON-document”. And so one XPath step expression for the example above becomes “/JSON-document/soup”. Given another example:

{ "menu": {
    "lunch": {
        "soup": "...",
        "veggies": "..."
    }
  }
}

which is the previous XML example encoded in JSON, the equivalent XPath step expression becomes “/JSON-document/menu/lunch/soup”. Other than this special top-level path name and the lack of XML element attributes, all the functionality provided by XPath 2.0 and XSLT 2.0 is available to any JSON-encoded document.
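
The effect of the “JSON-document” root on path expressions can be tried out with a small Python sketch. This is only an approximation: xml.etree.ElementTree supports a limited path subset, not XPath 2.0, and the helpers to_tree and select, the “item” name for array members, and the sample values are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

ROOT_NAME = "JSON-document"  # name the article gives the outermost { } pair


def to_tree(value, name=ROOT_NAME):
    """Convert a decoded JSON value into a tree of element nodes."""
    node = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            node.append(to_tree(child, key))
    elif isinstance(value, list):
        for child in value:
            node.append(to_tree(child, "item"))  # assumption: array members
    else:
        node.text = value if isinstance(value, str) else json.dumps(value)
    return node


def select(root, path):
    """Evaluate a simple absolute path of element names against the tree.

    Returns the first matching node, or None when nothing matches.
    """
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return None  # the first step must name the root node
    if len(steps) == 1:
        return root
    return root.find("/".join(steps[1:]))
```

With the second JSON example loaded through to_tree, select(doc, "/JSON-document/menu/lunch/soup") finds the soup node, while the XML-style path "/menu/lunch/soup" matches nothing because the first step does not name the root.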

XSLT defines several of its own support functions, such as key(), format-date(), document(), etc. The purpose of the “document()” function is to allow an XSLT script to include an external XML document, parse it, and return the parsed XML tree to the rest of the stylesheet. We took the implementation of this function one step further by allowing JSON-encoded documents to be included via the document() function. Our implementation detects whether the included document is JSON or XML and parses it accordingly. The format is detected by checking the document’s MIME type or, for local files, by looking for the opening “{” of a JSON document. The cool thing about this feature is that both XML and JSON documents can now be freely mixed in the processing of a single XSLT stylesheet. Any XSLT operation that can be done on an XML document (e.g., xsl:copy) can also be done on an included JSON document.
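
The detection step might look roughly like the following Python sketch. The function names, the particular MIME types checked, and the fallback to XML are our assumptions for illustration; the text above only says that the MIME type is checked and that local files are inspected for an opening “{”.

```python
import json
import xml.etree.ElementTree as ET


def detect_format(data, mime_type=None):
    """Guess whether a document is "json" or "xml".

    A recognized MIME type wins; otherwise the first non-blank
    character of the text decides.
    """
    if mime_type:
        base = mime_type.split(";", 1)[0].strip().lower()
        if base == "application/json" or base.endswith("+json"):
            return "json"
        if base in ("application/xml", "text/xml") or base.endswith("+xml"):
            return "xml"
    # Assumption: anything that does not open with "{" is treated as XML.
    return "json" if data.lstrip().startswith("{") else "xml"


def load_document(data, mime_type=None):
    """Parse the text with the parser matching the detected format."""
    kind = detect_format(data, mime_type)
    parsed = json.loads(data) if kind == "json" else ET.fromstring(data)
    return kind, parsed
```

In the real product both branches end in the same DOM tree; here the two parsers return different Python types, so the sketch only illustrates the dispatch.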

At this point, we were very happy with the functionality we had achieved. However, it became apparent that JSON support would be immediately available to any other software that used our XPath and DOM tree components. P6R’s Rule engine software uses our XPath and DOM tree components because the rules are defined in XML with XPath expressions. For example, a typical rule could look like:

<rule name='yellocar' setname='over1'>
     <if test='$Var1'/>
     <then>
        <set-variable name='$g3' select='/menu/dessert/item'/>
        <call func="setclickurl( 'www.p6r.com', $g3 )" />
        <set-fact name='sunshine' select='15*4' location='/P6R:infer/mealitem' />
     </then>
</rule>

While this rule is defined in XML, its input data (or “facts” in rule engine jargon) can be encoded in XML or JSON, and the JSON support comes for free. Without going into details on how the rule engine works, the picture of the software layers now becomes:

Figure 3. Other XPath enabled software

The main point we are making here is that any software that uses our XPath and DOM tree components is XML and JSON enabled with next to no effort.

Author: Mark K. Joseph, Ph.D.

"XSLT and XPath for JSON" was published on May 6th, 2008 and is listed in Unique Product Features.


Comments on "XSLT and XPath for JSON": 3 Comments

  1. On May 22nd, 2008 at 9:05 am An XPath Enabled Rule Engine | Project 6 Research - Articles said,

    [...] Our XPath 2.0 supports stated facts defined in either XML or JSON , thus our Rule Engine is also JSON [...]

  2. On May 22nd, 2008 at 4:09 pm A SAX-like Parser for JSON | Project 6 Research - Articles said,

    [...] 2.0 and XSLT 2.0 products. How the second goal was achieved is described in another P6R article: XSLT and XPath for JSON. We should note that our SAX-like JSON parser is fully implemented and is part of our P6Platform [...]

  3. On May 24th, 2008 at 7:30 am Phil Gibbs said,

    Very useful - will evaluate.

