20050419

A Window Into My Work Day

WARNING: Highly technical stuff ahead. Do not proceed without a pocket protector.

Here's a good puzzle for you. In an XPath query, how do you put a literal quotation mark inside a quoted string when the string is wrapped in the same kind of quotation mark?

If you say:
comments[text()="There's a 48" door."]

The quotation mark after the 48 ends the string, so the rest of the text ( door.") is unexpected and causes a syntax error. Simple enough so far, right?

A literal quotation mark in an XPath string is NOT automatically escaped to its character entity. Instead, it is treated as part of the XPath syntax, and it terminates the XPath string.
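To see exactly where the parser stops, here is a minimal sketch of how an XPath 1.0 lexer reads a string literal. The grammar rule quoted in the comment is the XPath 1.0 Literal production; the function name readXPathLiteral is hypothetical and not part of any DOM API.

```javascript
// XPath 1.0: Literal ::= '"' [^"]* '"' | "'" [^']* "'"
// There is no escape sequence, so a literal ends at the first
// character that matches its opening delimiter.
function readXPathLiteral(expr, start) {
  const quote = expr[start];
  if (quote !== '"' && quote !== "'") {
    throw new Error("No string literal at position " + start);
  }
  const end = expr.indexOf(quote, start + 1);
  if (end === -1) {
    throw new Error("Unterminated string literal");
  }
  // value excludes the delimiters; next is the index just past the closing quote
  return { value: expr.slice(start + 1, end), next: end + 1 };
}

const query = 'comments[text()="There\'s a 48" door."]';
const lit = readXPathLiteral(query, query.indexOf('"'));
console.log(lit.value);             // There's a 48
console.log(query.slice(lit.next)); // leftover text the parser chokes on: ` door."]`
```

The literal the parser actually sees is only "There's a 48"; everything after it has to be parsed as more XPath, which is where the syntax error comes from.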

Your first impulse might be to switch quotation mark types. I disdain the switching of quotation mark types. In my opinion, the double quotation mark should always be used to indicate a string quotation, and if it must be escaped, it should be escaped. Some people prefer to use the alternative single quotation mark, but as I'll demonstrate, that doesn't really get you anywhere. It generally just makes things more complicated. Consider our example.

To solve my problem, you might try changing the query to:
comments[text()='There's a 48" door.']

This is a clever thought, because now the double quotation mark in the quoted text won't terminate the XPath string, as it did before. Astute observers, however, will instantly spot another problem. The single quotation mark, used in our example as an apostrophe, WILL terminate the string, which is now wrapped in single quotation marks, and we're back at square one.

Switching between double and single quotation marks around a string doesn't solve the real problem: how to indicate that a double or single quotation mark is a literal part of the string instead of a string terminator.
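Switching delimiters works only while the text avoids one of the two quote characters. A hypothetical helper makes that rule concrete; it is a sketch, not a library function.

```javascript
// Choose an XPath 1.0 delimiter that does not occur in the text.
// Returns null when the text contains both kinds of quotation mark,
// because then no single string literal can represent it.
function pickDelimiter(text) {
  if (!text.includes('"')) return '"'; // prefer double quotes, as the author does
  if (!text.includes("'")) return "'";
  return null;
}

console.log(pickDelimiter('a 48" door'));           // '
console.log(pickDelimiter("There's a door"));       // "
console.log(pickDelimiter('There\'s a 48" door.')); // null
```

Our example sentence lands in the last case, which is why neither delimiter helps.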

No problem, I told myself. I'll just use the special XML character entities to escape the literal quotation marks. XML defines two character entities to indicate a literal quotation mark: &quot; for a double quote, and &apos; for a single quote.

So I fixed my query like this:
comments[text()="There&apos;s a 48&quot; door."]

I expected the XPath string to contain my literal quotation marks now. Instead, I got a real shock. I don't know whether this is a bug in my XPath implementation or what, but this is where the real fun begins.

Naturally, XPath assumed I must want a literal ampersand character in the string, not the start of an entity. What? How am I supposed to indicate an entity within an XPath string? XPath promptly converted the ampersands to the XML character entity for literal ampersands, &amp;.

The query, of course, only matches strings that actually look like this:
There&apos;s a 48&quot; door.

I wonder why it took me so long to realize this now obvious rule:

Ampersands, along with less-than and greater-than symbols, ARE automatically converted to their character entities when used within XPath strings, because XPath queries themselves are XML strings.
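Strictly speaking, an XPath expression is not itself XML. Entity handling only comes into play when the expression is embedded in an XML document, for example in an XSLT select attribute: there the XML parser decodes the references before the XPath engine ever sees the text (the post reaches a similar conclusion about selectNodes further on). A minimal sketch of that XML layer, using a hypothetical escapeXmlAttr function, shows what you would write inside such an attribute.

```javascript
// Escape an XPath expression so it can sit inside a double-quoted
// XML attribute, such as an XSLT select="..." attribute.
// The XML parser undoes this escaping before the XPath engine sees
// the expression, so these entities never reach the XPath literal.
function escapeXmlAttr(expr) {
  return expr
    .replace(/&/g, "&amp;") // must run first so later entities aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const xpath = 'comments[text()="Tom & Jerry"]';
console.log('<xsl:for-each select="' + escapeXmlAttr(xpath) + '">');
// <xsl:for-each select="comments[text()=&quot;Tom &amp; Jerry&quot;]">
```

After parsing, the XPath engine receives comments[text()="Tom & Jerry"] again, so this layer protects the attribute, not the XPath string literal inside it.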

Wow. Doesn't this present a blissfully fun, uniquely challenging puzzle? This is the kind of thing we programmers really like.

Two ideas for solving this problem immediately came to mind. First, I figured, why not just double-escape the quotation marks by escaping the characters in the escape sequence?

Now my query starts out like this:
comments[text()="There&amp;apos;s a 48&amp;quot; door."]

We're getting in deeper!

When the query is converted into an XML string, I assume it will come out like this:
comments[text()="There&amp;amp;apos;s a 48&amp;amp;quot; door."]

When the XPath string is actually processed as part of the XPath, I assume the new ampersands will be converted back to real literal ampersands in the query itself:
comments[text()="There&amp;apos;s a 48&amp;quot; door."]

And then the ampersands in the string will indicate literal ampersands to the query processor, like this:
comments[text()="There&apos;s a 48&quot; door."]

And the query processor will process the string as an XML string when it's querying the XML, so the entities will be un-escaped again, and match text like this:
There's a 48" door.

Which is what I wanted all along. But I was wrong again. The selectNodes and selectSingleNode DOM methods don't use XML strings, so it still doesn't work.

Another idea I've had so far: maybe one of the processing layers skips numeric character references (such as &#34; for a double quote and &#39; for a single quote) instead of treating them the same as the named character entities. Then I could indicate literal quotation marks with numeric references. How would that work?
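Both ideas depend on how many decoding passes happen between the text you type and the XPath engine. A minimal sketch of a single XML decoding pass (the function decodeXmlReferences is hypothetical and covers only the five predefined entities plus numeric character references) shows two things: a numeric reference such as &#34; decodes exactly like &quot;, and a double-escaped string needs two passes to get back to plain quotation marks. As the post found, selectNodes and selectSingleNode apply no such pass to the expression, so neither form gets decoded there.

```javascript
// One XML decoding pass: the five predefined entities plus decimal
// (&#34;) and hexadecimal (&#x22;) character references. An XML parser
// treats the numeric forms exactly like the named ones.
const NAMED = { quot: '"', apos: "'", amp: "&", lt: "<", gt: ">" };

function decodeXmlReferences(text) {
  // String.replace scans once, so text produced by a replacement
  // (e.g. the "&" from "&amp;") is not decoded again in the same pass.
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|quot|apos|amp|lt|gt);/g, (match, ref) => {
    if (ref.startsWith("#x")) return String.fromCodePoint(parseInt(ref.slice(2), 16));
    if (ref.startsWith("#")) return String.fromCodePoint(parseInt(ref.slice(1), 10));
    return NAMED[ref];
  });
}

console.log(decodeXmlReferences("48&#34; door") === decodeXmlReferences("48&quot; door")); // true

const doubled = "There&amp;apos;s a 48&amp;quot; door.";
console.log(decodeXmlReferences(doubled));                      // There&apos;s a 48&quot; door.
console.log(decodeXmlReferences(decodeXmlReferences(doubled))); // There's a 48" door.
```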

What if I concatenated the entity together from pieces, like this:
concat("&nb","sp;")

Would that help? What would that do?
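For the case where the expression goes straight to selectNodes, concat() simply joins its arguments' characters as written, and nothing decodes the result, so concat("&nb","sp;") yields the six characters &nbsp;, not a no-break space. The toy evaluator here is hypothetical and handles only concat() calls whose arguments are all string literals; it is meant to illustrate that behavior, and also that concat() can assemble a string holding both kinds of quotation mark, because each piece can use whichever delimiter it needs.

```javascript
// Toy evaluator for one narrow case: concat() whose arguments are all
// XPath 1.0 string literals. Not a real XPath engine.
function evalLiteralConcat(expr) {
  const m = /^concat\(\s*(.*)\s*\)$/.exec(expr.trim());
  if (!m) throw new Error("Expected concat(...)");
  const body = m[1];
  const args = [];
  let i = 0;
  while (i < body.length) {
    const quote = body[i];
    if (quote !== '"' && quote !== "'") throw new Error("Expected a string literal at " + i);
    const end = body.indexOf(quote, i + 1); // literals have no escapes
    if (end === -1) throw new Error("Unterminated string literal");
    args.push(body.slice(i + 1, end));
    i = end + 1;
    // skip whitespace and the comma between arguments
    while (i < body.length && /[\s,]/.test(body[i])) i++;
  }
  return args.join(""); // characters are joined as-is; no entity decoding
}

const result = evalLiteralConcat('concat("&nb","sp;")');
console.log(result);        // &nbsp;
console.log(result.length); // 6
console.log(evalLiteralConcat(`concat("There's a 48", '"', " door.")`)); // There's a 48" door.
```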

Otherwise, the only remaining option I can think of is to put the XPath string in its own node or param/variable, and refer to that variable/param/node from the XPath. But how do I use a variable with the selectSingleNode method?

I found out that somebody else had the same problem.

Just a little thinking out loud, and a little window into the world I work in all day.

Oh, and did I mention what Blogger does when you try to put all these escape characters, and escape codes for escape characters, into your blog? Let's just say it adds even more fun. Maybe I'll post another message later about the Hebrew language I'm learning; that shouldn't be much more complicated to understand.

3 comments:

  1. "String literals may be enclosed in either single or double quotes as convenient. The quotes are not themselves part of the string. The only restriction XPath places on a string literal is that it must not contain the kind of quote that delimits it. That is, if the string contains single quotes, it has to be enclosed in double quotes and vice versa."
    -XML in a Nutshell, Fair Usage

    So that's all there is to it: it can't be done within a single string literal.

    XPath provides no conventions to indicate literal quotation marks in a string literal.

    In an XPath query, if you create a string literal dynamically, don't assume that the string won't literally contain the type of quotation marks used to enclose the string itself.

    For instance, never write JavaScript like this:
    var myName = "John";
    var myNode = dom.selectSingleNode("*[local-name() = \"" + myName + "\"]");


    Instead, a new XML context must be created, containing a variable, parameter, or node with the quotation marks in the text. Instead of matching a string literal, the XPath query must match the text by referencing the node, variable, or parameter.

    In short, dynamic text used by an XPath query should not be spliced into a string literal in that query.

    I think this is a serious defect because:
    1. This behavior is unpredictable.
    2. This is a significant inconvenience.

  2. Your link to XML in a Nutshell is broken. :-)

  3. Sorry the XML in a Nutshell link went dead. Here's the Google Cache of the page, which might be good for a little while longer.

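The first comment's advice is to move dynamic text out of string literals and into a variable, parameter, or node. For selectSingleNode, where the post asks how a variable could be passed at all, a commonly used XPath 1.0 alternative is to rebuild the text with concat(), quoting each piece with a delimiter it does not contain. A minimal sketch, assuming the text may contain both quotation marks; xpathString is a hypothetical helper, and dom is assumed to be an existing DOM document.

```javascript
// Build an XPath 1.0 expression that evaluates to exactly `text`.
function xpathString(text) {
  // Easy cases: one literal works if a delimiter is absent from the text.
  if (!text.includes('"')) return '"' + text + '"';
  if (!text.includes("'")) return "'" + text + "'";

  // Hard case: split on double quotes. Each piece has no double quote,
  // so it can be wrapped in double quotes; each split point becomes '"'.
  const parts = [];
  text.split('"').forEach((piece, index) => {
    if (index > 0) parts.push(`'"'`);
    if (piece !== "") parts.push('"' + piece + '"');
  });
  // Both quote kinds are present here, so at least two arguments result,
  // which XPath 1.0 concat() requires.
  return "concat(" + parts.join(", ") + ")";
}

const itemText = "There's a 48\" door.";
const safeQuery = "comments[text() = " + xpathString(itemText) + "]";
console.log(safeQuery);
// comments[text() = concat("There's a 48", '"', " door.")]

// With a DOM document named dom (assumed, not created here):
// var myNode = dom.selectSingleNode(safeQuery);
```

The resulting expression is no longer a single string literal, which is how it sidesteps the restriction quoted from XML in a Nutshell.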


Comment Approval Policy
It costs money to create and maintain findmercy.com. The purpose of this web site is for serving the Lord. Approved comments will be hosted on this web site, and available for all the world to see.

For that reason, I would appreciate it if you would please try to keep your comments generally constructive, and in line with the purpose of this site.

If you would like to criticize something about findmercy.com, please email me personally instead.
