
XHTMLExtensionProvider un-escapes restricted XHTML chars when parsing

Question asked by Enrico Ferrari on Jul 1, 2015
Latest reply on Jul 2, 2015 by Enrico Ferrari

Hello,

 

After upgrading to Smack 4 (specifically 4.1.2), we noticed that some incoming XHTML messages fail XHTML 1.0 validation. It seems the parsing changed in version 4: in XHTMLExtensionProvider, the body is populated from the output of the XmlPullParser, and the pull parser's MXParser.parseEntityRef() method unescapes sequences like:

&amp; &lt; &gt; &apos; &quot; etc.

 

So if Smack receives a message like this:

<message to='john@fooo.com/foo' from='jane@fooo.com/foo' id='1234' type='chat'>

<body>Sending restricted XHTML char &amp;</body>

<thread>Isome_thread_id</thread>

<html xmlns='http://jabber.org/protocol/xhtml-im'><body xmlns="http://www.w3.org/1999/xhtml"><p>Sending restricted XHTML char &amp;</p></body></html>

</message>

 

The body that Smack provides in the parsed XHTMLExtension will be:

<p>Sending restricted XHTML char &</p>

 

This is not compliant XHTML, because the bare & is no longer escaped as &amp;.

 

Is there a way to force the pull parser to preserve the XML escaped character sequences within the XHTML body?
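
For illustration, here is a minimal sketch of one possible workaround: rebuild the markup from parser events and re-escape text and attribute values on the way out. It is not Smack code. It assumes the JDK's StAX reader (javax.xml.stream) as a stand-in for XmlPullParser/MXParser, which likewise reports text with the predefined entities already resolved. The class and method names (XhtmlReescapeSketch, escape, rebuild) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Smack: shows re-escaping while re-serializing.
final class XhtmlReescapeSketch {

    /** Escapes the five characters that have predefined XML entities. */
    static String escape(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    /**
     * Parses a well-formed fragment and writes it back out, escaping text and
     * attribute values. Empty elements come back as open/close pairs.
     */
    static String rebuild(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.append('<').append(qName(r.getPrefix(), r.getLocalName()));
                        // Namespace declarations made on this element.
                        for (int i = 0; i < r.getNamespaceCount(); i++) {
                            String p = r.getNamespacePrefix(i);
                            out.append(p == null || p.isEmpty() ? " xmlns" : " xmlns:" + p)
                               .append("=\"").append(escape(r.getNamespaceURI(i))).append('"');
                        }
                        for (int i = 0; i < r.getAttributeCount(); i++) {
                            out.append(' ')
                               .append(qName(r.getAttributePrefix(i), r.getAttributeLocalName(i)))
                               .append("=\"").append(escape(r.getAttributeValue(i))).append('"');
                        }
                        out.append('>');
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        // The reader hands back "&" for "&amp;"; escape it again.
                        out.append(escape(r.getText()));
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.append("</").append(qName(r.getPrefix(), r.getLocalName())).append('>');
                        break;
                    default:
                        // Comments, processing instructions etc. are dropped in this sketch.
                        break;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
        return out.toString();
    }
}
```

With this sketch, passing the escaped paragraph from the message above through rebuild() keeps the &amp; sequence in the output. Note that it only works on input that is still well-formed, so it would have to run where the parser events are consumed, not on the already un-escaped body string.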
