Assertion in XmlElement::XmlAttributeNode::XmlAttributeNode() is not standards compliant


#1

I have an application in the works that's driven by user data, and this data is written to an XML file generated from ValueTrees. While testing, I'm hitting an assertion in the attribute node constructor. The assertion incorrectly assumes that only ASCII letters, digits, and a few symbols are valid in an attribute name.

In reality, XML attribute names are more complicated and open than that: there's a much broader range of legal codepoints, and the standard has a specific set of rules and restrictions for setting up the name. See here for more details: http://www.w3.org/TR/2008/REC-xml-20081126/#sec-common-syn

Seeing that the information I'm dealing with can be multilingual, I'm open to suggestions that don't involve disabling assertions or hacking the code (which is unreasonable in a multi-person team, even with a fork).


Here's an attempt at summarising the rules:

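A name should not start with:

  • Any case combination of the string "xml" (the standard reserves these names)

An attribute name can start with any of the characters in the first list below. JUCE accepts a leading dash or [0-9], which is incorrect: those are only legal after the first character (see the second list).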

  • ":"
  • "_"
  • [A-Z]
  • [a-z]
  • [#xC0-#xD6]
  • [#xD8-#xF6]
  • [#xF8-#x2FF]
  • [#x370-#x37D]
  • [#x37F-#x1FFF]
  • [#x200C-#x200D]
  • [#x2070-#x218F]
  • [#x2C00-#x2FEF]
  • [#x3001-#xD7FF]
  • [#xF900-#xFDCF]
  • [#xFDF0-#xFFFD]
  • [#x10000-#xEFFFF]

Any subsequent codepoints in the name can be whatever is listed above, and additionally:

  • "-"
  • "."
  • [0-9]
  • [#xB7]
  • [#x0300-#x036F]
  • [#x203F-#x2040]
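
The two lists above could be turned into a check along these lines. This is only a rough sketch, assuming the name is already available as UTF-32 so that each element is one codepoint. The function names are hypothetical and none of this is JUCE's actual code.

```cpp
#include <string>

// Hypothetical helpers (not part of JUCE). They mirror the two lists above.

// True if c may appear as the first codepoint of a name.
static bool isXmlNameStartChar (char32_t c)
{
    return c == U':' || c == U'_'
        || (c >= U'A'    && c <= U'Z')
        || (c >= U'a'    && c <= U'z')
        || (c >= 0xC0    && c <= 0xD6)
        || (c >= 0xD8    && c <= 0xF6)
        || (c >= 0xF8    && c <= 0x2FF)
        || (c >= 0x370   && c <= 0x37D)
        || (c >= 0x37F   && c <= 0x1FFF)
        || (c >= 0x200C  && c <= 0x200D)
        || (c >= 0x2070  && c <= 0x218F)
        || (c >= 0x2C00  && c <= 0x2FEF)
        || (c >= 0x3001  && c <= 0xD7FF)
        || (c >= 0xF900  && c <= 0xFDCF)
        || (c >= 0xFDF0  && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0xEFFFF);
}

// True if c may appear anywhere after the first codepoint.
static bool isXmlNameChar (char32_t c)
{
    return isXmlNameStartChar (c)
        || c == U'-' || c == U'.'
        || (c >= U'0'   && c <= U'9')
        || c == 0xB7
        || (c >= 0x0300 && c <= 0x036F)
        || (c >= 0x203F && c <= 0x2040);
}

// Assumes one array element per codepoint (UTF-32).
static bool isValidXmlName (const std::u32string& name)
{
    if (name.empty() || ! isXmlNameStartChar (name.front()))
        return false;

    // The first codepoint is checked again here, which is harmless because
    // every start character is also a valid name character.
    for (char32_t c : name)
        if (! isXmlNameChar (c))
            return false;

    return true;
}
```

With this, something like isValidXmlName (U"-abc") would be rejected because of the leading dash, while a name made of accented or CJK letters would pass.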

#2

Maybe the check at https://github.com/julianstorer/JUCE/blob/master/modules/juce_core/xml/juce_XmlElement.cpp#L51 could be extended and reused for attribute names.

The function could take a flag so that any rules specific to one kind of name aren't applied to the other.


Something like:

static void sanityCheckXmlName (const String& name, bool checkForTagRules = true)
{
    // Placeholder: silences the unused-parameter warning until checks are added
    (void) name;

    // Check all-encompassing rules here

    if (checkForTagRules)
    {
        // Check tag name specific rules here
    }
    else
    {
        // Check attribute name specific rules here
    }
}

#3

Thank you! We'll take a look at this and get a fix in soon.


#4

https://github.com/julianstorer/JUCE/commit/705e7f6110ec8901f5844542b701237a5062e667

Note that a valid name cannot start with any case combination of "xml": http://www.w3.org/TR/2008/REC-xml-20081126/#sec-common-syn

Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.
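
For what it's worth, the reserved-prefix rule quoted above is cheap to test for. Here's a minimal sketch, assuming a UTF-32 string. The helper name is hypothetical and isn't something JUCE provides. Since the standard itself defines names such as xml:lang and xml:space, a check like this would probably be better as a warning than a hard failure.

```cpp
#include <string>

// Hypothetical helper (not part of JUCE): true if the name begins with any
// case combination of "xml", i.e. matches (('X'|'x') ('M'|'m') ('L'|'l')).
static bool hasReservedXmlPrefix (const std::u32string& name)
{
    return name.size() >= 3
        && (name[0] == U'x' || name[0] == U'X')
        && (name[1] == U'm' || name[1] == U'M')
        && (name[2] == U'l' || name[2] == U'L');
}
```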

#5

You're quick!

This fix was mainly put in so that incorrect assertions (such as the one you were getting) did not trigger. Frankly, even if a name starts with XML, it'll probably be fine. If this is a problem for you too, let me know and I'll sort it. :)


#6

It's not a problem for me, and I doubt it will be for anybody else anytime soon! Just wanted to point out that the standard has a rule against doing that, is all.


#7

Got it, thanks very much for letting us know regardless :)