Parsing(?) HTML

xrousaios · April 7, 2016, 2:42pm

Hi guys,

I’ll need a couple of days to get used to the new look of the forum, but it is far more functional now.

My current problem is HTML parsing.
I am trying to process the code of some JUCE pages (www.juce.com/doc/annotated, for example), but I can’t find an HTML parser.

Is there a way to treat HTML pages as XML?

When I try to parse the www.juce.com/doc/annotated page, xmlDocument->getDocumentElement() fails with an “unmatched tags” error.

Thanks in advance

George

daniel · April 7, 2016, 10:17pm

HTML looks a lot like XML, but it is not a strict subset of it; it allows some simplifications that XML does not:
the closing tag for <p> may be omitted. Not so in XML.
The closing tag of <li> may also be omitted, which is not valid in XML.
Then there is XHTML, where all tags need to be closed…
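
To illustrate why a strict XML parser rejects such a page, here is a minimal sketch of a tag-balance check in plain standard C++. It is not JUCE code and not how XmlDocument works internally; the function name firstTagProblem is hypothetical, and the check is deliberately naive (it does not handle comments, CDATA, or attribute values containing ">").

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns a description of the first tag-nesting problem,
// or an empty string if every opening tag is closed in the right order.
// Simplified on purpose: comments, CDATA and '>' inside attributes are not handled.
std::string firstTagProblem (const std::string& markup)
{
    std::vector<std::string> open;   // stack of currently open element names
    std::size_t pos = 0;

    while ((pos = markup.find ('<', pos)) != std::string::npos)
    {
        const std::size_t end = markup.find ('>', pos);
        if (end == std::string::npos)
            return "unterminated tag";

        const std::string tag = markup.substr (pos + 1, end - pos - 1);
        pos = end + 1;

        if (tag.empty() || tag[0] == '!' || tag[0] == '?')
            continue;                // declarations and processing instructions

        if (tag.back() == '/')
            continue;                // self-closing element such as <br/>

        const bool closing = (tag[0] == '/');
        std::size_t i = closing ? 1 : 0;
        std::string name;
        while (i < tag.size() && ! std::isspace (static_cast<unsigned char> (tag[i])))
            name += tag[i++];

        if (! closing)
        {
            open.push_back (name);
            continue;
        }

        // XML rule: a closing tag must match the most recently opened element.
        if (open.empty() || open.back() != name)
            return "unmatched tag: </" + name + ">";

        open.pop_back();
    }

    if (! open.empty())
        return "unclosed tag: <" + open.back() + ">";

    return std::string();
}
```

For example, `<ul><li>one<li>two</ul>` is acceptable HTML, but this check reports `unmatched tag: </ul>` because the two `<li>` elements are still open when `</ul>` arrives.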

xrousaios · April 9, 2016, 8:23am

So there is no JUCE way to parse a JUCE documentation page.
That means I have to write some very dirty code to extract the info I want.

daniel · April 9, 2016, 9:13am

…well, I’m not speaking for JUCE, and I only have limited knowledge of all the JUCE features…
Did you have a look at WebBrowserComponent? https://www.juce.com/doc/classWebBrowserComponent
It obviously can parse HTML. And even if it doesn’t help, have a look at the source to see how they parse it…
Good luck…

xrousaios · April 10, 2016, 7:38pm

All I want to do is build a database (using SQLite) of JUCE classes and member functions.
Obviously, I was hoping that the XmlDocument/XmlElement parser would make my job much easier.
That’s all.
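
One possible "dirty" route, sketched here with std::regex from the standard library rather than any JUCE facility: pull the class links out of the page text directly. The markup pattern `<a class="el" href="classFoo.html">Foo</a>` is an assumption about how the generated documentation pages are written, and the function name extractClassLinks is hypothetical. Writing the results to SQLite is left out.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (link target, class name) pairs found in the page text.
// Assumes class entries look like <a class="el" href="classFoo.html">Foo</a>;
// adjust the pattern if the real page markup differs.
std::vector<std::pair<std::string, std::string>> extractClassLinks (const std::string& html)
{
    static const std::regex link (
        R"re(<a\s+class="el"\s+href="(class[^"]+\.html)"[^>]*>([^<]+)</a>)re");

    std::vector<std::pair<std::string, std::string>> result;

    // Walk over every non-overlapping match in the page text.
    for (std::sregex_iterator it (html.begin(), html.end(), link), end; it != end; ++it)
        result.emplace_back ((*it)[1].str(), (*it)[2].str());

    return result;
}
```

Each pair could then become one row of a classes table, with the link target used to fetch the per-class page for its member functions. A regular expression is fragile against markup changes, so this is only a stopgap compared with a real HTML parser.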

