P6R's XPath functionality is integrated with its DOM XML parser (i.e., P6R::p6IDOMXML). XPath expressions are compiled into a P6R::p6IXpathExpression component. Once compiled, an expression can be evaluated repeatedly. The expression is evaluated in the context of an XML (or JSON or JsonML) tree (i.e., a p6IDOMXML component), and the same compiled expression can be evaluated against one or more DOM trees.
The DOM XML parser supports data encoded in JSON and JsonML. Using JSON requires two things: (a) the DOM parser's initialize() method must be passed the P6DOMXML_USEJSON flag, and (b) all XPath path expressions must start with "/JSON-document" (e.g., "/JSON-document/book/chapter[1]/title" to find the title of chapter 1). The "/JSON-document" prefix is necessary because, unlike XML, a JSON document need not have a single topmost element. Using JsonML only requires that the DOM parser's initialize() method be passed the P6DOMXML_USEJSONML flag. JsonML documents have top-level elements, so the "/JSON-document" prefix is not used in XPath expressions.
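Why a synthetic root is needed can be seen by building a small element tree from JSON. The Python sketch below uses the standard library's ElementTree, not P6R, and its JSON-to-element mapping is an assumption made for illustration: each object member becomes a child element, and each array member becomes a repeated sibling with the array's name, which is what a path such as "/JSON-document/book/chapter[1]/title" suggests. The function names json_to_elements and load_json_document are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def json_to_elements(name, value):
    """Map one JSON name/value pair onto a list of elements (hypothetical mapping)."""
    if isinstance(value, list):
        # Assumption: each array member becomes a repeated sibling element with the same name.
        return [el for item in value for el in json_to_elements(name, item)]
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.extend(json_to_elements(key, child))
    elif value is not None:
        # Strings are kept as-is; numbers and booleans keep their JSON spelling.
        elem.text = value if isinstance(value, str) else json.dumps(value)
    return [elem]


def load_json_document(text):
    """Wrap a top-level JSON object under a synthetic "JSON-document" root element."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("this sketch only handles a top-level JSON object")
    return json_to_elements("JSON-document", data)[0]


doc = '{"book": {"title": "XPath", "chapter": [{"title": "Intro"}, {"title": "Paths"}]}}'
root = load_json_document(doc)

# root is the JSON-document element, so "/JSON-document/book/chapter[1]/title"
# corresponds to the relative ElementTree path below.
print(root.find("book/chapter[1]/title").text)  # Intro
```

A top-level object with several members (say "book" and "magazine") would otherwise yield several sibling roots; wrapping everything under one element is what gives every path a common starting point.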
The P6R::p6IXpathExpression component has been integrated with P6R's XSLT processor (i.e., P6R::p6IXSLT) and the XML-based Rule Engine components (i.e., P6R::p6IRuleEngine). Thus any XML-based application (written by P6R or by our customers) can embed XPath 2.0, as we have done. In doing so, these applications can access data in both the XML and JSON encodings. For example, our XSLT processor handles XSL templates, an XML-based dialect, that read and write XML, JSON, or JsonML data.
XPath 2.0 contains many more features than XPath 1.0, for example regular expressions (see reference 1 below).
In addition, XPath 2.0 can be used entirely on its own; it does not have to be embedded in an XML dialect to be useful. Our component architecture allows applications to use the XPath 2.0 and related components directly. Thus an application wishing to use this powerful expression language directly would simply do the following steps:
1) M. Kay, XPath 2.0 Programmer's Reference, Wiley Publishing Inc., 2004, ISBN 0-7645-6910-4.
2) M. Kay, XSLT 2.0 Programmer's Reference, 3rd Edition, Wiley Publishing Inc., 2004, ISBN 0-7645-6909-0.
3) N. Bradley, The XSL Companion, Addison-Wesley, 2000, ISBN 0-201-67487-4.
4) J. Friedl, Mastering Regular Expressions, 2nd Edition, O'Reilly, 2002, ISBN 0-596-00289-0.
XSLT and XPath for JSON - https://www.p6r.com/articles/2008/05/06/xslt-and-xpath-for-json/
Any JSON or JsonML can be loaded into P6R's DOM tree. However, there are limitations when applying XPath to such a tree and when outputting it as XML. JSON and JsonML allow a larger character set in object names than XML and XPath do, so a JSON object name may contain characters that are special in XPath. For example, XPath uses '/' (a step in a path) and '@' (access to a node's attribute) as special symbols, and neither can appear in an XML node's name.
When a JSON-originated DOM tree is output as XML, any JSON object name (which becomes a DOM node name) that starts with a digit (e.g., { "9top" : "some object value" }) has that leading digit replaced with the default character "_". The JSON example above would end up as the following XML: "<?xml ...?><_top>some object value</_top>". In addition, any other non-XML characters that appear in the JSON object name are also replaced with the default "_" character. This is done so that the output is valid XML. Note that this translation only happens when outputting the DOM tree as XML; outputting the DOM tree as JSON or JsonML preserves the original object names.
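The renaming rule can be sketched in a few lines of Python. This is a hypothetical to_xml_name() helper, not P6R's code, and it uses a deliberately simplified notion of a valid XML name (a letter or '_' first, then letters, digits, '_', '-' or '.'); the real XML Name production is broader. It also assumes that only the first character is subject to the "no leading digit" rule, so "99top" would become "_9top".

```python
import xml.etree.ElementTree as ET


def to_xml_name(json_name, replacement="_"):
    """Hypothetical sketch: make a JSON object name usable as an XML element name."""
    if not json_name:
        return replacement
    chars = []
    for i, ch in enumerate(json_name):
        if i == 0:
            # Simplified start rule: a letter or '_' (a leading digit is not allowed).
            ok = ch.isalpha() or ch == "_"
        else:
            # Simplified rule for later characters: letters, digits, '_', '-', '.'.
            ok = ch.isalnum() or ch in "_-."
        chars.append(ch if ok else replacement)
    return "".join(chars)


elem = ET.Element(to_xml_name("9top"))
elem.text = "some object value"
print(ET.tostring(elem, encoding="unicode"))  # <_top>some object value</_top>
print(to_xml_name("price/@usd"))              # price__usd
```

Because the mapping is lossy ("9top" and "8top" both become "_top"), it only makes sense on output, which matches the note that JSON and JsonML output keeps the original names.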
Several XPath functions take an optional collation string parameter (e.g., compare, starts-with). The XPath standard defines these collation strings to be URIs. In P6R's XPath implementation, however, collation strings are not URIs; they are whatever the underlying operating system expects for its i18n support.
1) tokenize( input string, regex, flags )
The standard definition of this function states that it has the following limitation: it is not possible to do anything with the separator substrings; only the substrings between the separators are returned. This is not true of our implementation, which is based on the Perl split() function (see P6R::p6ISplit).
With the Perl split() function, the separator substrings can be obtained by using capturing parentheses (see reference 4 above, p. 326, "Split's Match Operand with Capturing Parentheses"). For example, tokenize( "1:2-3;4", "([:;-])" ) returns the sequence '1' ':' '2' '-' '3' ';' '4'. Without the capturing parentheses the regex would be "[:;-]", and the sequence returned would instead be '1' '2' '3' '4'. (The hyphen is placed last in the character class so that it is matched literally; written as "[:-;]" it would denote the range from ':' to ';', and '-' would not be treated as a separator.)
So P6R's implementation of tokenize() is more powerful than what is defined by the XPath 2.0 standard, yet it follows standard Perl regex rules.
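Python's re.split() follows the same convention as Perl's split() for capturing parentheses, so it offers a quick way to check the sequences described above outside P6R. This is an analogy, not P6R's tokenize(); one known difference is that Perl's split() drops trailing empty fields by default, while re.split() keeps them.

```python
import re

text = "1:2-3;4"

# With capturing parentheses, the separators are returned between the fields.
with_seps = re.split(r"([:;-])", text)

# Without them, only the fields between the separators are returned.
fields_only = re.split(r"[:;-]", text)

# Pitfall: inside a class, ":-;" is a range from ':' to ';', so '-' is not a separator.
range_pitfall = re.split(r"[:-;]", text)

print(with_seps)      # ['1', ':', '2', '-', '3', ';', '4']
print(fields_only)    # ['1', '2', '3', '4']
print(range_pitfall)  # ['1', '2-3', '4']
```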
The following functions have not yet been implemented in P6R's XPath 2.0 implementation: base-uri, collection, deep-equal, doc, document-uri, format-number (see our extension below), id, idref, iri-to-uri, nilled, normalize-unicode, and resolve-uri.
The following P6R extension functions require the use of the P6R namespace: http://www.p6r.com/XPath/extensions
1) P6R:base64encode
This function encodes a given byte array into a base64-encoded string. Note that all strings in our XPath implementation are stored internally in a wide-character Unicode representation; the resulting string can be written out as UTF-8 along with all other template output. The input to this function can come from an externally defined variable containing arbitrary binary data, supplied via the P6R::p6IXpathVariables::lookupVariable() interface.
2) P6R:base64decode
This function removes the base64 encoding from the input string. Warning: care must be taken when using these base64 functions to encode Unicode strings. On Linux each wide character is represented in 4 bytes, while Windows and Solaris use 2 bytes, so a string encoded on Linux will not decode properly on Windows or Solaris (and likewise a string encoded on Windows or Solaris will not decode properly on Linux). When passing base64 results between Linux and other operating systems, the calling application must normalize strings to a common representation before applying this standard base64 algorithm.
3) P6R:match-attribute
This is an extension of the standard lang() function. Where lang() tests only the xml:lang attribute, this function allows the caller to match any attribute of the context node.
4) P6R:matches-with-capture
This is an extension of the XPath matches() function. It takes exactly the same parameters as matches() but returns a node set instead of a boolean. The first value in the returned node set is the matching string; any remaining values are substrings of that match captured by the regular expression's capturing parentheses (the substrings that back references refer to). If no match occurs, an empty node set is returned to indicate false. XPath itself currently has no way to return the captured strings.
5) P6R:format-number
This is meant as an alternative to the XSLT format-number() function, which is complex and does not allow the explicit selection of a language and locale. The third parameter of this function is a standard locale string, for example 'en' (English), 'en_us' (English in the United States), and 'fr_ca' (Canadian French).
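The result shape described for P6R:matches-with-capture can be sketched with Python's re module. This is an illustration under assumptions, not P6R's implementation: Python's regex syntax differs from XPath's in places, the flag mapping is approximate (XPath's 'x' flag only strips whitespace, while re.VERBOSE also allows comments), and reporting groups that did not take part in the match as empty strings is a guess. The name matches_with_capture is hypothetical.

```python
import re

# Approximate mapping of XPath regex flag letters onto Python re flags.
_FLAGS = {"s": re.DOTALL, "m": re.MULTILINE, "i": re.IGNORECASE, "x": re.VERBOSE}


def matches_with_capture(text, pattern, flags=""):
    """Sketch: return [whole match, group 1, group 2, ...], or [] when nothing matches."""
    re_flags = 0
    for letter in flags:
        re_flags |= _FLAGS[letter]
    # Like XPath matches(), search anywhere in the string unless the pattern is anchored.
    m = re.search(pattern, text, re_flags)
    if m is None:
        return []  # the empty result plays the role of false
    # Groups that did not participate in the match are reported as "" (assumption).
    return [m.group(0), *m.groups(default="")]


print(matches_with_capture("Due 2008-05-06.", r"(\d{4})-(\d{2})-(\d{2})"))
# ['2008-05-06', '2008', '05', '06']
print(matches_with_capture("no digits here", r"\d+"))
# []
```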
An example XSLT stylesheet using the base64 encode function.
Apply the following XML input data to the stylesheet defined above.
The output of the XSLT stylesheet applied to the XML input using the P6R:base64encode() function.
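The cross-platform warning in the P6R:base64decode description can be reproduced outside P6R. The Python sketch below is not P6R code: it uses the UTF-16LE and UTF-32LE codecs as stand-ins for 2-byte and 4-byte wide characters (little-endian byte order is an assumption), and it shows UTF-8 as one possible common representation. The helper names encode_portable and decode_portable are hypothetical.

```python
import base64

text = "hi"

# The same two characters laid out as 2-byte and as 4-byte wide characters.
as_utf16 = base64.b64encode(text.encode("utf-16-le")).decode("ascii")
as_utf32 = base64.b64encode(text.encode("utf-32-le")).decode("ascii")
print(as_utf16)  # aABpAA==
print(as_utf32)  # aAAAAGkAAAA=

# Decoding 4-byte data as 2-byte characters yields embedded NULs, not "hi".
garbled = base64.b64decode(as_utf32).decode("utf-16-le")


def encode_portable(s):
    """Normalize to UTF-8 bytes before base64 so every platform sees the same bytes."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


def decode_portable(b64):
    """Inverse of encode_portable()."""
    return base64.b64decode(b64).decode("utf-8")


print(encode_portable("hi"))  # aGk=
```

The base64 step itself is platform-neutral; the mismatch comes entirely from the byte layout of the string before encoding, which is why normalizing first is enough.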
Calling one of the extension functions outside of XSLT requires the following qualified names (i.e., QNames):
http-&&www.p6r.com&XPath&extensions&p-base64encode( string )
http-&&www.p6r.com&XPath&extensions&p-base64decode( string )
http-&&www.p6r.com&XPath&extensions&p-match-attribute( string, string )
http-&&www.p6r.com&XPath&extensions&p-matches-with-capture( string, string, string )
http-&&www.p6r.com&XPath&extensions&p-format-number( numeric, string, integer, string )
The QName encoding is simple: (1) all '/' characters are replaced with '&', (2) all ':' characters are replaced with '-', and (3) the name of the extension function is placed at the very end of the string with a "&p-" connector.
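The three encoding rules are mechanical, so they can be expressed as a short helper. The Python function extension_qname below is hypothetical and only reproduces the string transformation described above; it is not part of P6R's API.

```python
def extension_qname(namespace_uri, function_name):
    """Build the encoded QName used to call a P6R extension function outside XSLT."""
    encoded = namespace_uri.replace("/", "&").replace(":", "-")  # rules (1) and (2)
    return f"{encoded}&p-{function_name}"                        # rule (3)


P6R_NS = "http://www.p6r.com/XPath/extensions"
print(extension_qname(P6R_NS, "base64encode"))
# http-&&www.p6r.com&XPath&extensions&p-base64encode
```

The order of rules (1) and (2) does not matter, because neither replacement character ('&' or '-') is itself replaced.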