Parse HTML

I am looking into parsing HTML and displaying it without the WebBrowserComponent, a) because of the overhead and b) because I don’t like the OS windows lagging behind (the same happens with VideoComponent and OpenGLContext-enabled Components).

So the first step would be to make the XML parser (parseXML) more forgiving. Would you accept a PR that adds a flag, “allowHTML” or similar, which would do the following:

  • if a tag such as <p> or <li> is opened while its parent is the same tag, close the parent first
  • and further differences I might find on the way…
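A rough sketch of how the first rule could work as a pre-pass over the text, before a strict XML parser sees it. This is plain C++ rather than a patch to parseXML, and `closeImpliedTags` is a made-up name. It assumes tag names are already lower case, it also closes an open <p> or <li> when an enclosing element ends or the input runs out (an extra assumption beyond the rule above), and it does not handle void elements such as <br> or a '>' inside comments and attribute values.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of JUCE: inserts the end tags that HTML
// allows authors to omit for <p> and <li>.
std::string closeImpliedTags (const std::string& html)
{
    static const std::set<std::string> autoClose { "p", "li" };
    std::vector<std::string> open;   // stack of currently open element names
    std::string out;
    std::size_t pos = 0;

    while (pos < html.size())
    {
        const auto lt = html.find ('<', pos);
        const auto gt = (lt == std::string::npos) ? lt : html.find ('>', lt);

        if (gt == std::string::npos)   // no further complete tag: copy the rest
        {
            out.append (html, pos, std::string::npos);
            break;
        }

        out.append (html, pos, lt - pos);   // text in front of the tag
        const std::string tag = html.substr (lt, gt - lt + 1);
        pos = gt + 1;

        const bool isClosing     = tag.size() > 2 && tag[1] == '/';
        const bool isSelfClosing = tag.size() > 2 && tag[tag.size() - 2] == '/';

        // The element name is the alphanumeric run after '<' or '</'
        const std::size_t start = isClosing ? 2 : 1;
        std::size_t end = start;
        while (end < tag.size() && std::isalnum ((unsigned char) tag[end]))
            ++end;
        const std::string name = tag.substr (start, end - start);

        if (name.empty() || isSelfClosing)   // comment, doctype, <br/> ...
        {
            out += tag;
            continue;
        }

        if (isClosing)
        {
            // Close any <p>/<li> still open inside the element that ends here
            while (! open.empty() && open.back() != name && autoClose.count (open.back()))
            {
                out += "</" + open.back() + ">";
                open.pop_back();
            }

            if (! open.empty() && open.back() == name)
                open.pop_back();
        }
        else
        {
            // Same tag as its parent: close the parent first
            if (! open.empty() && open.back() == name && autoClose.count (name))
            {
                out += "</" + name + ">";
                open.pop_back();
            }

            open.push_back (name);
        }

        out += tag;
    }

    // Close whatever <p>/<li> is still open at the end of the input
    while (! open.empty() && autoClose.count (open.back()))
    {
        out += "</" + open.back() + ">";
        open.pop_back();
    }

    return out;
}
```

With this, `<ul><li>one<li>two</ul>` becomes `<ul><li>one</li><li>two</li></ul>`, which a strict parser should accept.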

Funny how things come full circle. I answered this very question myself before:

Thanks for considering

Ehh, XHTML is closer to XML than HTML could ever be. IMHO, you would have to roll something from scratch.

For example, HTML tag and attribute names are case-insensitive, whereas XML is case-sensitive.
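To make the case issue concrete: a pre-pass could fold tag names to lower case before a strict parser sees them. Below is a minimal sketch in plain C++. `lowercaseTagNames` is a made-up helper, not anything in JUCE, and it deliberately leaves attribute names and text content alone.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: lower-cases the element name that follows '<' or '</'.
std::string lowercaseTagNames (std::string html)
{
    for (std::size_t i = 0; i < html.size(); ++i)
    {
        if (html[i] != '<')
            continue;

        std::size_t j = i + 1;
        if (j < html.size() && html[j] == '/')   // skip the slash of an end tag
            ++j;

        // Fold the alphanumeric run that forms the element name
        for (; j < html.size() && std::isalnum ((unsigned char) html[j]); ++j)
            html[j] = (char) std::tolower ((unsigned char) html[j]);

        i = j - 1;   // resume scanning right after the name
    }

    return html;
}
```

For example, `<P Class="X">Hi</P>` comes out as `<p Class="X">Hi</p>`, so the start and end tags match under XML rules.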

There’s also the fact that some closing tags, such as </p> and </li>, are optional in HTML.

Have you considered embedding Chromium instead? I’ve seen it used in various environments like set-top boxes, mobile devices, and smart TVs. If it can work on spectacularly shitty platforms like those, it should be fine for a desktop environment.

I should mention WebKit is available as well. Sky, for example, use that on their set-top boxes just fine.

Thanks, yes fair point.
I think I wouldn’t gain anything with Chromium compared to WebBrowserComponent: as long as it is an OS handle, it will have those artefacts.

My client wants just a bit more than AttributedString, like bullet points and links, so it doesn’t have to be a full-featured browser anyway. So I thought I could at least avoid reinventing the parsing, but maybe rolling my own is the fastest way to success…