P6R's XPath functionality is integrated with its DOM XML parser (i.e., P6R::p6IDOMXML). XPath expressions are compiled into a P6R::p6IXpathExpression component. After compilation the expression can be evaluated repeatedly. The expression is evaluated in the context of an XML (or JSON) tree (i.e., a p6IDOMXML component). The same compiled expression can be evaluated against one or more DOM trees.
The DOM XML parser can support data encoded in JSON. Only two things are required: (a) the initialize() DOM parser method needs the P6DOMXML_USEJSON flag, and (b) all XPath step expressions must start with "/JSON-document" (e.g., "/JSON-document/book/chapter[1]/title" to find the title of chapter 1). The "/JSON-document" prefix is necessary because some JSON documents, unlike XML documents, have no topmost element.
The P6R::p6IXpathExpression component has been integrated with P6R's XSLT processor (i.e., P6R::p6IXSLT) and the XML based Rule Engine components (i.e., P6R::p6IRuleEngine). Thus any XML based application (written by P6R or our customers) can embed XPath 2.0 as we have done. In doing so these applications can access data in both the XML and JSON encodings. For example, our XSLT processor handles XSL templates, which are an XML-based dialect that reads/writes XML or JSON input data.
XPath 2.0 contains many more features than XPath 1.0, for example regular expressions; see reference 1 below.
In addition, XPath 2.0 can be used entirely by itself; that is, it does not have to be embedded into XML to be used. Our component architecture allows direct application use of the XPath 2.0 and related components. Thus an application wishing to use this powerful expression language directly would simply do the following steps:
    // This is a code sketch
    P6XPATH_RESULT      result;
    p6IXpathExpression *pExpress;
    p6IDOMXML          *pDOM;
    p6IDataStream      *pStream;

    // get a DOM tree component
    p6CreateInstance( NULL, CID_p6DOMXML, VALIDATEIF( p6IDOMXML, &pDOM ));

    // get an XPath component
    p6CreateInstance( NULL, CID_p6XpathExpression, VALIDATEIF( p6IXpathExpression, &pExpress ));

    // fill the pDOM with XML to be accessed via XPath expressions
    // the XML is streamed in from the p6IDataStream object
    pDOM->parse( &pStream );
    ....

    // compile the XPath 2.0 expression
    pExpress->compileExpression( "7 ge 5", ... );

    // evaluate the expression against the pDOM tree with the 'result' type returned.
    pExpress->eval( pDOM, NULL, NULL, &result );

    // the same XPath component, e.g., pExpress, can be reused to compile many expressions,
    // each of which can be evaluated against the same DOM tree
1) M.Kay, XPath 2.0, Programmer's Reference, Wiley Publishing Inc, 2004, ISBN 0-7645-6910-4.
2) M.Kay, XSLT 2.0, Programmer's Reference, 3rd Edition, Wiley Publishing Inc, 2004, ISBN 0-7645-6909-0.
3) N.Bradley, The XSL Companion, Addison-Wesley, 2000, ISBN 0-201-67487-4.
4) J.Friedl, Mastering Regular Expressions, 2nd edition, O'Reilly, 2002, ISBN 0-596-00289-0.
Several XPath functions take an optional collation string parameter (e.g., compare, starts-with). The XPath standard defines these collation strings to be URIs. However, in P6R's XPath implementation these collation strings are not URIs. Instead, they are whatever the underlying operating system expects for its I18n support.
1) tokenize( input string, regex, flags )
The standard definition of this function states that it has the following limitation: it is not possible to do anything with the separator substrings. That is, only the substrings between the separators are returned. However, this is not true for our implementation. We have based our implementation of tokenize() on the Perl split() function (see P6R::p6ISplit).
With the Perl split() function, the separator substrings can be obtained by use of capturing parentheses. See reference #4 above, p. 326, "Split's Match Operand with Capturing Parentheses". As an example, tokenize( "1:2-3;4", "([:;-])" ) would return the sequence: '1' ':' '2' '-' '3' ';' '4'. Without the capturing parentheses the regex would be "[:;-]", and the sequence returned would instead be: '1' '2' '3' '4'. Note that the hyphen is placed last in the character class so that it is taken literally; written as "[:-;]" it would denote the range from ':' to ';', which does not include '-'.
So P6R's implementation of tokenize() is more powerful than what is defined by the XPath 2.0 standard, yet it follows standard Perl regex rules.
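The effect of capturing parentheses on a split can be illustrated without the P6R components. The following is a minimal sketch in standard C++ using std::regex (ECMAScript syntax), not P6R::p6ISplit or the P6R tokenize() implementation. The function name splitKeepingCaptures is hypothetical. Unlike Perl's split(), the sketch keeps trailing empty fields, and it assumes the separator never matches an empty string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a P6R API): split `input` on every match of
// `separator`. As with Perl's split(), the text captured by each group in
// the separator is emitted between the surrounding pieces.
std::vector<std::string> splitKeepingCaptures(const std::string& input,
                                              const std::regex& separator)
{
    std::vector<std::string> out;
    std::size_t last = 0;  // offset just past the previous separator

    for (std::sregex_iterator it(input.begin(), input.end(), separator), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position(0));

        // The piece between the previous separator and this one.
        out.push_back(input.substr(last, pos - last));

        // Any captured parts of the separator itself (groups 1..n).
        for (std::size_t g = 1; g < m.size(); ++g) {
            if (m[g].matched) {
                out.push_back(m[g].str());
            }
        }
        last = pos + static_cast<std::size_t>(m.length(0));
    }

    // The piece after the final separator.
    out.push_back(input.substr(last));
    return out;
}
```

Called with the input "1:2-3;4" and the pattern "([:;-])", the sketch yields '1' ':' '2' '-' '3' ';' '4'; with "[:;-]" it yields '1' '2' '3' '4'.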
The following functions have not yet been implemented in P6R's XPath 2.0 implementation: base-uri, collection, deep-equal, doc, document-uri, format-number (see our extension below), id, idref, iri-to-uri, nilled, normalize-unicode, and resolve-uri.
These added functions require the use of the P6R namespace: http://www.p6r.com/XPath/extensions
1) P6R:base64encode
This function encodes a given byte array into a base64 encoded string. Because all strings in our XPath implementation are stored in a wide character, Unicode representation, the result is a wide string; it can be written out in UTF-8 format along with all other template output. Note that the input to this function can come from an externally defined variable that can contain any binary data via the use of the P6R::p6IXpathVariables::lookupVariable() interface.
    Argument   Data Type    Meaning
    input      byte array   Function takes a standard XPath expression as input.
    result     byte array   A base64 encoded character string in wide string format.
                            XPath returned type of P6R::P6XPATH_TYPE_STR.
2) P6R:base64decode
This function removes the base64 encoding of the input string. Warning: care must be taken when using these base64 functions to encode Unicode strings. On Linux each wide character is represented in 4 bytes, while Windows and Solaris use 2 bytes. Thus encoding on Linux and trying to decode on Windows or Solaris will not work properly. (Likewise, encoding on Windows or Solaris and trying to decode on Linux will not work.) The calling application needs to normalize strings to a common byte representation before using this standard base64 algorithm when passing the base64 result between Linux and other operating systems.
    Argument   Data Type    Meaning
    input      xs:string    A base64 encoded string
    result     byte array   XPath returned type of P6R::P6XPATH_TYPE_STR.
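One way an application could normalize wide strings before base64 encoding is to convert them to UTF-8 bytes first, so that the encoded bytes do not depend on the platform's wide character size. The following is a minimal sketch in standard C++ under that assumption. It does not use or describe the P6R implementation, and the function names toUtf8 and base64Encode are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: convert a wide string to UTF-8 bytes. Works whether
// wchar_t holds UTF-16 code units (2 bytes) or UTF-32 code points (4 bytes).
std::string toUtf8(const std::wstring& w)
{
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(w[i]);
        // Combine a UTF-16 surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: standard base64 encoding (with '=' padding) of a
// byte string.
std::string base64Encode(const std::string& bytes)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    const std::size_t n = bytes.size();
    for (std::size_t i = 0; i < n; i += 3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        std::uint32_t group = static_cast<unsigned char>(bytes[i]) << 16;
        if (i + 1 < n) group |= static_cast<unsigned char>(bytes[i + 1]) << 8;
        if (i + 2 < n) group |= static_cast<unsigned char>(bytes[i + 2]);

        out += tbl[(group >> 18) & 63];
        out += tbl[(group >> 12) & 63];
        out += (i + 1 < n) ? tbl[(group >> 6) & 63] : '=';
        out += (i + 2 < n) ? tbl[group & 63] : '=';
    }
    return out;
}
```

With this approach, base64Encode( toUtf8( L"Hi there1" ) ) produces "SGkgdGhlcmUx" regardless of whether wide characters are 2 or 4 bytes, because the encoder only ever sees the UTF-8 bytes.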
3) P6R:match-attribute
This is an extension of the standard lang() function. This method allows the caller to match any attribute of the context node.
    Argument          Data Type    Meaning
    attribute name    xs:string    To match the lang() function this would be "lang"
    attribute value   xs:string    To match the lang() function this could be "fr-CA"
    result            xs:boolean   True if the named attribute of the context node matches the given value
4) P6R:matches-with-capture
This is an extension of the XPath matches() function. It takes exactly the same parameters as the matches() function but returns a node set as a result instead. The returned node set is composed of the following values: the first value is the matching string; all other values (if any) are substrings of the first value that are captured by the regular expression via back references. If no match occurs, an empty node set is returned to indicate false. XPath itself does not currently have a way to return the captured strings.
    Argument   Data Type              Meaning
    input      xs:string              See the XPath 2.0 standard for the standard meaning
    regex      xs:string              of these arguments.
    flags      xs:string (optional)
    result     item()*                A node set with zero or more values as defined above
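The shape of the returned sequence (whole match first, then each captured substring, or nothing at all when there is no match) can be illustrated in standard C++. This is a minimal sketch using std::regex with ECMAScript syntax, not the P6R implementation. The function name matchesWithCapture is hypothetical, and the optional flags argument is omitted.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a P6R API): return the first match of `pattern`
// in `input`, followed by the text of each capturing group. An empty vector
// means "no match". A group that did not participate in the match is
// reported as an empty string.
std::vector<std::string> matchesWithCapture(const std::string& input,
                                            const std::string& pattern)
{
    std::vector<std::string> out;
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(input, m, re)) {
        for (std::size_t g = 0; g < m.size(); ++g) {
            out.push_back(m[g].str());  // g == 0 is the whole match
        }
    }
    return out;
}
```

For the input "2004-07-15" and the pattern "(\d{4})-(\d{2})", the sketch returns '2004-07' '2004' '07'; for an input with no match it returns an empty sequence.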
5) P6R:format-number
This is meant as an alternative to the XSLT format-number() function. The XSLT function is complex and does not allow the explicit selection of a language and locale. The fourth (optional) parameter of this function is a standard locale string, for example: 'en' (for English), 'en_us' (for English in the United States), and 'fr_ca' (for French Canadian).
    Argument      Data Type                   Meaning
    numeric       xs:double or xs:integer     any one of the numeric values is valid
                  or xs:decimal or xs:float
    format        xs:string                   a standard format string as used in P6R::p6i18n::formatString function (i.e., %1$)
    field width   xs:integer                  if zero then no default width is used, otherwise use size as maximum length of number
    locale        xs:string (optional)        indicates language and locale (e.g., en_us)
    result        xs:string                   The format parameter expanded with the 'numeric' parameter
An example XSLT stylesheet using the base64 encode function.
    <?xml version='1.0' encoding='ISO-8859-1'?>
    <xsl:stylesheet version='2.0'
                    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                    xmlns:P6R='http://www.p6r.com/XPath/extensions'>
      <xsl:output method='html'/>
      <xsl:variable name='gv1' select='output/bye' />
      <xsl:variable name='gv2' select='output/hello'/>
      <xsl:template match='/'>
        <HTML>
          <BODY>
            <P/>
            Base64 of " <xsl:value-of select='$gv2' /> " is:<BR/>
            <xsl:value-of select="P6R:base64encode( $gv2 )" />
          </BODY>
        </HTML>
      </xsl:template>
    </xsl:stylesheet>
Apply the following XML input data to the stylesheet defined above.
    <?xml version='1.0' encoding='UTF-8' ?>
    <output>
      <hello>Hi there1</hello>
      <hello>Hi There2</hello>
      <hello>HI THERE3</hello>
      <hello>HI4</hello>
      <bye>simple period test</bye>
    </output>
The output of the XSLT stylesheet applied to the XML input using the P6R:base64encode() function.
<HTML><BODY><P>
Base64 of " Hi there1" is:
SGkgdGhlcmUx
</BODY></HTML>
Calling one of the extension functions outside of XSLT requires the use of the following qualified names (i.e., QNames):
http-&&www.p6r.com&XPath&extensions&p-base64encode( string )
http-&&www.p6r.com&XPath&extensions&p-base64decode( string )
http-&&www.p6r.com&XPath&extensions&p-match-attribute( string, string )
http-&&www.p6r.com&XPath&extensions&p-matches-with-capture( string, string, string )
http-&&www.p6r.com&XPath&extensions&p-format-number( numeric, string, integer, string )
The QName encoding is simple: (1) all '/' characters are replaced with '&', (2) all ':' characters are replaced with '-', and (3) the name of the extension function is placed at the very end of the string with a "&p-" connector.
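The three encoding rules above can be sketched as a small string transformation. This is an illustrative sketch in standard C++ only; the function name encodeExtensionQName is hypothetical and is not part of the P6R interfaces.

```cpp
#include <string>

// Hypothetical helper (not a P6R API): build the encoded QName for an
// extension function from its namespace URI and local function name.
std::string encodeExtensionQName(const std::string& namespaceUri,
                                 const std::string& functionName)
{
    std::string out;
    for (char c : namespaceUri) {
        if (c == '/') {
            out += '&';   // rule 1: '/' becomes '&'
        } else if (c == ':') {
            out += '-';   // rule 2: ':' becomes '-'
        } else {
            out += c;
        }
    }
    out += "&p-";         // rule 3: connector, then the function name
    out += functionName;
    return out;
}
```

For example, encodeExtensionQName( "http://www.p6r.com/XPath/extensions", "base64encode" ) produces "http-&&www.p6r.com&XPath&extensions&p-base64encode", matching the first name in the list above.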