
XML to JSON and Back

By Mark Joseph - April 5, 2010 @ 4:41 pm

1. The Problem

There is no standard way of translating an XML document into its JSON equivalent.  An acceptable translation scheme should be lossless (i.e., it should not lose any XML-specific information).  It should also ensure that when the JSON document is converted back into XML, the important aspects of the original document (e.g., element attributes and namespaces) are preserved.

2. Detailed Translation Issues

The following three issues make it difficult to provide a lossless XML to JSON translation.

2.1 Issue 1: JSON does not have an equivalent to XML’s element attributes.

For example in the following XML:

<description lang="en-us" maxlength="500">Vitamin D Supplementation</description>

The “description” element has two attributes, “lang” and “maxlength”, that modify the contents of that element.  How should these attributes be included in the JSON representation of the “description” element?  For example, the scheme defined in http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html results in the following JSON:

{ "description":
      { "@lang":"en-us",
        "@maxlength":"500",
        "#text":"Vitamin D Supplementation"
      }
}

While this is one of the better approaches, notice that the original attribute names are modified (an “@” is prepended to each) and that the value of the description element is given the name “#text” to generate proper JSON syntax.
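To make the convention concrete, here is a minimal Python sketch that produces the JSON above from the XML. It uses the standard library's xml.etree.ElementTree; the function name to_at_convention is hypothetical, and the sketch only covers a leaf element (attributes plus text, no child elements), so it is an illustration of the naming convention rather than the full scheme described at the link.

```python
import json
import xml.etree.ElementTree as ET

def to_at_convention(elem):
    """Convert a leaf element using the '@attribute' / '#text' naming convention.

    Hypothetical helper: child elements are not handled in this sketch.
    """
    # Each attribute name gets an '@' prefix to mark it as an attribute.
    obj = {"@" + name: value for name, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if not obj:
        # No attributes: the element maps directly to its text.
        return {elem.tag: text}
    if text:
        # The text needs a made-up name to sit beside the attributes.
        obj["#text"] = text
    return {elem.tag: obj}

xml = '<description lang="en-us" maxlength="500">Vitamin D Supplementation</description>'
result = to_at_convention(ET.fromstring(xml))
print(json.dumps(result, indent=2))
```

Both renamings noted above are visible in the code: the "@" prefix on attribute names and the invented "#text" key.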

2.2 Issue 2: XML’s Mixed Content

The content of XML elements can contain “mixed content” (i.e., elements and text mixed together).  The following simple XML example demonstrates this:

<maincourse> lean steak <prepared>well done</prepared> with a side of green beans</maincourse>

JSON has no direct equivalent of mixed content.  So how can we translate the above into JSON in a way that lets us convert back into XML without losing the mixed content form?
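The ordering problem is easier to see when the content of the element is listed in document order. The following Python sketch does that with the standard library's xml.etree.ElementTree, which stores the text before the first child in .text and the text after each child in that child's .tail. The helper name content_sequence is hypothetical, and surrounding whitespace is stripped for readability.

```python
import xml.etree.ElementTree as ET

def content_sequence(elem):
    """Return the element's content as an ordered list of (kind, value) pairs."""
    items = []
    # Text that appears before the first child element.
    if elem.text and elem.text.strip():
        items.append(("text", elem.text.strip()))
    for child in elem:
        items.append(("element", child.tag))
        # Text that follows this child, up to the next child or the end tag.
        if child.tail and child.tail.strip():
            items.append(("text", child.tail.strip()))
    return items

xml = ("<maincourse> lean steak <prepared>well done</prepared>"
       " with a side of green beans</maincourse>")
sequence = content_sequence(ET.fromstring(xml))
for kind, value in sequence:
    print(kind, value)
```

A translation that simply collects all the text into one string, or all the children into one object, discards the interleaving that this list records.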

2.3 Issue 3: XML Namespaces

JSON does not have the equivalent of XML namespaces.   The following XML example demonstrates this:

<book xmlns:abc="http://www.p6r.com/specialindex">
    <title>XML test example</title>
    <abc:title>12348383747</abc:title>
    <chapter id="1">
    . . . .
    </chapter>
</book>

The point here is that “title” and “abc:title” are different.   In XML, the “abc” prefix is just a placeholder for the URL “http://www.p6r.com/specialindex”.   The real name of “abc:title” (in XML, its QName or “Qualified Name”; see http://www.w3.org/2001/tag/doc/qnameids) is the tuple {http://www.p6r.com/specialindex, title}.    How can this be converted into JSON in a way that can be converted back into XML?
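As an illustration of the tuple form, the Python sketch below parses a shortened version of the example and prints the names a namespace-aware parser reports. It assumes the prefix is bound with an xmlns:abc declaration, and it uses the standard library's xml.etree.ElementTree, which writes a qualified name as {namespace}localpart. The helper split_qname is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the "abc" prefix is bound with an xmlns:abc declaration.
xml = """<book xmlns:abc="http://www.p6r.com/specialindex">
    <title>XML test example</title>
    <abc:title>12348383747</abc:title>
</book>"""

def split_qname(tag):
    """Split an ElementTree tag into (namespace, local part)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    # No namespace: the tag is just the local part.
    return None, tag

root = ET.fromstring(xml)
names = [split_qname(child.tag) for child in root]
for namespace, local in names:
    print(namespace, local)
```

The prefix itself does not appear in the parsed names; only the namespace and the local part survive, which is why the two “title” elements remain distinct.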

3. Some Existing Approaches

The following is a list of published XML to JSON translation schemes. The first three do not meet the desired properties discussed above.

3.1 http://www.ibm.com/developerworks/xml/library/x-xml2jsonphp

This scheme does not appear to describe how the following XML is encoded: <a p="5">value</a>.   Attributes are handled only on elements that contain other elements.  There is also no discussion of Issue 2 or Issue 3.

3.2 http://www.phdcc.com/xml2json.htm
and http://onwebdevelopment.blogspot.com/2008/05/converting-xml-to-json.html

Both these schemes are similar.  XML attributes are carried into the JSON representation; however, the fact that they were attributes is lost.  Also, the approach used for mixed content makes it impossible to convert back into XML, since the order is lost.

3.3 http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html

This approach is perhaps the best we have seen. However, it seems to be lacking support for Issue 3 above.

3.4 Our initial approach

Here are the principles of our approach, each addressing one of the issues defined in Section 2 above.

Group all attributes together

<description lang="en-us" maxlength="500">Vitamin D Supplementation</description>

becomes:

{ "description" :
      [ {"xmlattr": { "lang":"en-us", "maxlength":"500" }},
        "Vitamin D Supplementation"
      ]
}

In the above JSON, we have created an array for the “description” element.   We did this so that we would not have to create a fake name (e.g., “#text”) for the element’s contents (i.e., “Vitamin D…”).   Also notice that we have grouped all the attributes into an “xmlattr” JSON object that always appears first in the element’s array.

Our scheme also nests nicely. Let's look at a more complex example with nesting.

<description lang="en-us">
    <original maxlength="500">Vitamin D Supplementation</original>
    <short>Vitamin D</short>
    <translated lang="fr">Suppléments de vitamine D</translated>
</description>

becomes:

{ "description":
      [ {"xmlattr": { "lang":"en-us" }},
        {"original":
              [ {"xmlattr": { "maxlength":"500" }}, "Vitamin D Supplementation" ]
        },
        {"short":"Vitamin D"},
        {"translated":
              [ {"xmlattr": { "lang":"fr" }}, "Suppléments de vitamine D" ]
        }
      ]
}
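The attribute grouping can be sketched as a short recursive function. The Python below is an illustration built from the two examples above, not the actual implementation: the function name to_xmlattr_form is hypothetical, it uses the standard library's xml.etree.ElementTree, and it assumes that an element with neither attributes nor child elements maps directly to its text (as “short” does above). Mixed content and namespaces are not handled here.

```python
import json
import xml.etree.ElementTree as ET

def to_xmlattr_form(elem):
    """Convert an element to {tag: [ {"xmlattr": {...}}, content... ]}."""
    text = (elem.text or "").strip()
    children = list(elem)
    if not elem.attrib and not children:
        # Assumption: a plain text element maps directly to its string.
        return {elem.tag: text}
    items = []
    if elem.attrib:
        # Grouped attributes always come first in the element's array.
        items.append({"xmlattr": dict(elem.attrib)})
    if text:
        items.append(text)
    for child in children:
        items.append(to_xmlattr_form(child))
    return {elem.tag: items}

xml = """<description lang="en-us">
    <original maxlength="500">Vitamin D Supplementation</original>
    <short>Vitamin D</short>
    <translated lang="fr">Suppléments de vitamine D</translated>
</description>"""

nested = to_xmlattr_form(ET.fromstring(xml))
print(json.dumps(nested, indent=2, ensure_ascii=False))
```

Because the attribute names are stored unchanged inside “xmlattr”, converting back only requires checking whether the first array item is an object with that single key.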

Preserve the order of mixed content

<maincourse> lean steak <prepared>well done</prepared> with a side of green beans</maincourse>

becomes:

{ "maincourse":
      { "P6R:10":"lean steak",
        "prepared":"well done",
        "P6R:20":"with a side of green beans"
      }
}

So what is that “P6R:10” thing?   Well, it's a unique name in which the number, for us, is the node ID in the DOM tree, but it can be any unique value.   The “P6R” prefix tells us that it's a constructed name, so we can drop it when we convert this back into XML.   Also notice that the order of the mixed content of the maincourse element is preserved.
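A minimal Python sketch of this naming idea follows. It is an assumption-laden illustration: the function name mixed_to_object is hypothetical, a simple counter (10, 20, ...) stands in for the DOM node ID, and the standard library's xml.etree.ElementTree is used for parsing. Child elements are assumed to hold plain text and to have distinct names.

```python
import itertools
import json
import xml.etree.ElementTree as ET

def mixed_to_object(elem, start=10, step=10):
    """Give each text run a constructed 'P6R:<n>' key, in document order."""
    ids = itertools.count(start, step)  # stand-in for DOM node IDs
    obj = {}
    if elem.text and elem.text.strip():
        obj[f"P6R:{next(ids)}"] = elem.text.strip()
    for child in elem:
        # Assumption: children hold plain text and have unique tag names.
        obj[child.tag] = (child.text or "").strip()
        if child.tail and child.tail.strip():
            obj[f"P6R:{next(ids)}"] = child.tail.strip()
    return {elem.tag: obj}

xml = ("<maincourse> lean steak <prepared>well done</prepared>"
       " with a side of green beans</maincourse>")
course = mixed_to_object(ET.fromstring(xml))
print(json.dumps(course, indent=2))
```

Note that the sketch relies on Python dictionaries keeping insertion order. A JSON consumer that does not keep object members in order would have to recover the sequence from the numbers in the constructed names.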

Encode the QName not the XML prefix

<book xmlns:abc="http://www.p6r.com/specialindex">
     <title>XML test example</title>
     <abc:title>12348383747</abc:title>
     <chapter id="1">
     . . . .
     </chapter>
</book>

becomes:

{ "book":
     { "title":"XML test example",
       "QName-http://www.p6r.com/specialindex/title":"12348383747",
       "chapter":
            [ {"xmlattr": { "id":"1" }},  . . . ]
     }
}

Notice that the XML prefix “abc” is nowhere to be found in the JSON.   In fact, when converting this back into XML any unique prefix can be used for the URL “http://www.p6r.com/specialindex”.  The XML QName is encoded just like a URL with the “title” string (in XML jargon the local part) as the last element in the URL path.   “QName-” is added to the front of the URL to distinguish it from a generic URL used as a JSON string (i.e., it is a type indicator).
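The encoding and its inverse can be sketched in a few lines of Python. The function names encode_qname and decode_qname are hypothetical. The decoder assumes that the local part is everything after the last “/”, which is safe because an XML local name cannot contain a slash.

```python
QNAME_MARKER = "QName-"

def encode_qname(namespace, local_part):
    """Build the JSON member name for a namespace-qualified element."""
    # The local part becomes the last path element of the URL.
    return f"{QNAME_MARKER}{namespace}/{local_part}"

def decode_qname(key):
    """Return (namespace, local part); namespace is None for plain names."""
    if not key.startswith(QNAME_MARKER):
        return None, key
    # Split at the last slash: the tail is the local part.
    namespace, _, local_part = key[len(QNAME_MARKER):].rpartition("/")
    return namespace, local_part

key = encode_qname("http://www.p6r.com/specialindex", "title")
print(key)
print(decode_qname(key))
print(decode_qname("title"))
```

When converting back into XML, the decoded namespace can be bound to any unused prefix before the element is written out.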

4. The scheme we have chosen for our products

After careful consideration, we have chosen to use the approach defined as the JSON Markup Language (JsonML). This approach handles all the issues defined in Section 2 above, and appears to have a chance to become an industry standard method for translating XML to JSON. JsonML support will be added to the next release of our XJR SDK.
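For readers unfamiliar with JsonML, it represents each element as an array: the tag name first, then an optional object holding the attributes, then the child nodes in document order, with text as plain strings. The Python sketch below illustrates that shape only. The function name to_jsonml is hypothetical, it is not the XJR SDK implementation, it strips surrounding whitespace from text for readability, and it does not deal with namespace prefixes.

```python
import json
import xml.etree.ElementTree as ET

def to_jsonml(elem):
    """Convert an element to a JsonML-style array (simplified sketch)."""
    node = [elem.tag]
    if elem.attrib:
        # Attributes go in one object directly after the tag name.
        node.append(dict(elem.attrib))
    if elem.text and elem.text.strip():
        node.append(elem.text.strip())
    for child in elem:
        node.append(to_jsonml(child))
        # Text after a child stays in position, so mixed content keeps its order.
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node

attr_xml = '<description lang="en-us" maxlength="500">Vitamin D Supplementation</description>'
mixed_xml = ("<maincourse> lean steak <prepared>well done</prepared>"
             " with a side of green beans</maincourse>")

attr_doc = to_jsonml(ET.fromstring(attr_xml))
mixed_doc = to_jsonml(ET.fromstring(mixed_xml))
print(json.dumps(attr_doc))
print(json.dumps(mixed_doc))
```

Because arrays are ordered and attributes have a fixed position, neither constructed names nor a “#text” key are needed for Issues 1 and 2.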

"XML to JSON and Back" was published on April 5th, 2010 and is listed in Unique Product Features.
