Xml - unmatched tags!


#1

Going mental!

I’ve got loads of XML documents that read perfectly in many XML-reading applications. I also have some that I’ve made myself using JUCE XmlElement structures… and lots of them fail to load with XmlDocument::getDocumentElement(). getLastParseError() always says “unmatched tags”.

I could put some example files up if you’d like, but do you have any idea why this might be? Looking through the files I can’t see any unmatched tags, but perhaps you know what might cause such a result. Some of these files are quite large, but not all of them!


#2

While trying to figure it out, I thought I’d try the ANSI form of strings instead of Unicode, but unfortunately unsetting the flag in juce_Config.h caused the following errors:

c:\coding\juce\src\juce_core\text\juce_string.cpp(503) : error C3861: ‘snprintf’: identifier not found
c:\coding\juce\src\juce_core\text\juce_xmlelement.cpp(684) : error C3861: ‘snprintf’: identifier not found

And indeed, a search through all the JUCE source files shows ‘snprintf’ appearing only once…

C:\Coding\juce\src\juce_core\text\juce_TextFunctions.h(109):

And nowhere else!

Now, I’m pretty sure this won’t solve my problem, but I thought I’d point it out anyway.
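One likely cause, offered as an assumption rather than a confirmed diagnosis of this JUCE version: Visual C++ releases before Visual Studio 2015 (`_MSC_VER < 1900`) don’t provide `snprintf` at all, only the Microsoft-specific `_snprintf`, and a missing declaration produces exactly this C3861 error. A common workaround is a compatibility macro along these lines. `formatInt` is a hypothetical helper for illustration, and because the old `_snprintf` does not null-terminate when it truncates, the buffer is terminated explicitly.

```cpp
#include <cassert>
#include <cstring>
#include <stdio.h>

// Assumed workaround for older Visual C++ (before VS 2015, _MSC_VER < 1900),
// whose C runtime only offers the Microsoft-specific _snprintf.
#if defined (_MSC_VER) && _MSC_VER < 1900
  #define snprintf _snprintf
#endif

// Hypothetical helper: formats `value` into `buf`, truncating if needed.
// Terminates explicitly because _snprintf leaves the buffer unterminated
// when the output doesn't fit.
void formatInt (char* buf, size_t size, int value)
{
    if (size == 0)
        return;

    snprintf (buf, size, "%d", value);
    buf[size - 1] = '\0';
}
```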


#3

Could it just be that the files are too big? Is there a maximum limit to the size of a String?

I see that XmlDocument::getDocumentElement() reads the whole file into a string first. Could it be that it’s just running out of data because the whole file didn’t fit in the string? I want to go in and put some DBG code there to figure out what’s going on, but that’s impossible!! :cry:


#4

… progress at least…

I just went into juce_XmlDocument.cpp and added some extra detail to the parse error that ended the file reading. I changed line 493 to:

            // show the 50 characters before the failure point (assumes at least 50 have been read)
            setLastError (T("unmatched tags : ") + String (input - 50, 50), false);

I had a DBG line in my code to output the parse error, and navigated to the exact point of failure in the loaded document. When I copied every character up to the failure point into a separate file and checked its size, it was ALWAYS 8,192 bytes. This appears to be the maximum file size that an XmlDocument can read :frowning: and this is such a serious problem for me that my project is dead until I sort it out!

This wasn’t a problem in the past: before JUCE 1.15 I was able to load these files without any trouble.
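The diagnostic hack above prints the 50 characters before the failure point, but `input - 50` reads before the start of the buffer if the parser fails within the first 50 characters. A bounds-safe version of the same idea, sketched with `std::string` rather than JUCE’s `String` (the helper name `contextBeforeFailure` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a JUCE function): returns up to `maxChars`
// characters preceding `failPos` in `text`, clamped to the buffer bounds.
std::string contextBeforeFailure (const std::string& text,
                                  std::size_t failPos,
                                  std::size_t maxChars = 50)
{
    if (failPos > text.size())
        failPos = text.size();   // never read past the end

    // Never step back before the start of the buffer.
    const std::size_t start = failPos > maxChars ? failPos - maxChars : 0;
    return text.substr (start, failPos - start);
}
```

A failure offset that is identical for every file, as here, points to a fixed read limit rather than to malformed XML.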


#5

Okay. Sorry about all this craziness! This is turning into “haydxn’s madness blog”…

It turns out I’ve only come across this weird behaviour since I started using the XmlDocument::getDocumentElement (true) method to test the outer element first. Looking through the source, I can see where this 8192 ‘magic number’ comes from :hihi: … it seems the document ‘remembers’ to read only up to 8192 characters of the file.
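To illustrate the kind of state leak being described, here is a deliberately buggy, hypothetical reader (`BuggyReader` is not JUCE’s code, and its member names are invented): the outer-element-only mode stores its 8192-character limit in a member variable, and a later full read never clears it, so a second call on the same object silently stops at 8192 characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical reader illustrating the bug pattern; NOT JUCE's real code.
class BuggyReader
{
public:
    explicit BuggyReader (std::string src) : source (std::move (src)) {}

    // Returns how many characters this call would parse.
    std::size_t read (bool onlyReadOuterElement)
    {
        if (onlyReadOuterElement)
            maxChars = 8192;   // limit is stored in a member...

        // ...but never reset when onlyReadOuterElement is false.
        return std::min (source.size(), maxChars);
    }

private:
    std::string source;
    std::size_t maxChars = std::string::npos;   // "no limit" by default
};
```

The third call in a sequence of full, outer-only, full reads returns 8192 instead of the whole length, which mirrors the symptom in the thread; a freshly constructed object starts with no limit, which is why the workaround below succeeds.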

This works fine…

XmlElement* configData = configDoc.getDocumentElement();
if ( configData )
{
   return configData;
}

THIS, on the other hand…

XmlElement* configData = configDoc.getDocumentElement(true);
if ( configData )
{
   if ( configData->hasTagName( configRootTagName ) )
   {
      // root tag is correct, so get whole tag...
      delete configData; // get rid of outer element...
      configData = configDoc.getDocumentElement();
      DBG( configDoc.getLastParseError() );
      return configData;
   }
}

… always manages to run out of data after processing 8192 bytes in the XmlDocument. Am I doing something very wrong here? Am I supposed to create a new XmlDocument object to do the full scan?

For example…

XmlElement* configData = configDoc.getDocumentElement(true);
if ( configData )
{
   if ( configData->hasTagName( configRootTagName ) )
   {
      // root tag is correct, so get whole tag...
      delete configData; // get rid of outer element...
      XmlDocument auxDoc( sameFileAsBefore );
      configData = auxDoc.getDocumentElement();
      DBG( auxDoc.getLastParseError() );
      return configData;
   }
}

… this works!

I’d have thought it would be okay to just do the scan again with the default ‘false’ setting, yet somehow it still stops after a maximum of 8192 characters (which I see is the maximum number of characters read in ‘true’ mode)…

Shoot me in the face for being such an idiot up till now…

But is this indeed a bug, or does my code show that I’m even more stupid than I’ve just realised?


#6

Ooh, good bug!

I always tend to only use a new XmlDocument object once, so never came across this, but will get it sorted pronto. Thanks!
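One way such a fix could be structured, again as a hypothetical sketch rather than the change that actually went into JUCE: derive the limit from the argument on every call instead of remembering it between calls, so reusing one document object gives the same result as creating a new one. `FixedReader` is an invented name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical fixed reader: the limit is computed from the argument on
// every call, so no state leaks from one read to the next.
class FixedReader
{
public:
    explicit FixedReader (std::string src) : source (std::move (src)) {}

    // Returns how many characters this call would parse.
    std::size_t read (bool onlyReadOuterElement) const
    {
        const std::size_t maxChars = onlyReadOuterElement ? std::size_t (8192)
                                                          : std::string::npos;
        return std::min (source.size(), maxChars);
    }

private:
    std::string source;
};
```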


#7

:hihi: I thought I’d NEVER suss that one out!

But isn’t that ALWAYS the way with bugs? Every new bug is somehow far more severe and deadly than the last, and ACTUALLY impossible to find this time! :lol: Such joy programming brings!

Gah, I’m such a nerd. I love it.


#8

Incidentally, is the non-Unicode thing I mentioned in my second post a bug? I can’t find any other mention of the macro defined at that point when using ANSI strings, so it looks like there’s a function missing.


#9

Sorry, yes, that was a bug that I’d already fixed. I’ll do another version soon, as there are a few other little fixes I want to release too.


#10

Crumbs, you ARE falling behind, aren’t you!! :hihi: :wink:

Keep up the good work, Mr Jules!


#11