CDATA scanner not scanning content properly #125
Hi @akshay-kr, correct, the CDATA handling was changed to be in sync with the spec. You can read a bit about the reasoning for the changes in #102. If you think there is an error in the parser, please provide a small HTML sample. You can open these samples in your browser and then have a look at the DOM tree with the developer tools. If neko generates a different one, I will try to fix it. And I'm curious to learn more about your use case.
Hi @rbri, so basically if I have this as an input … The part after the first … We allow users on our app to enter code snippets, and we store these snippets after sanitising them with the Antisamy plugin. Now, for example, if a user enters … It used to work properly before this commit. Please let me know whether this information works for you; otherwise I will try to provide additional information.
I still think this is correct. Please play with the test-cdata-close-early-empty-tag.html file in your browser; the browser parses the file exactly the same way.
Please have a look at the paper attached to #102 for more details.
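For readers who have not followed #102: as far as I understand the HTML Living Standard, a `<![CDATA[` in ordinary HTML content (outside SVG/MathML) is a `cdata-in-html-content` parse error, and the rest is tokenized as a bogus comment that ends at the first `>`, not at `]]>`. The minimal Java sketch here is not neko's code; the class and method names are made up for illustration. It only shows what that rule does to the snippet from this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only (not neko's implementation): how the HTML Living Standard
// tokenizes "<![CDATA[" in ordinary HTML content.
public class HtmlCdataSketch {

    static final String CDATA_OPEN = "<![CDATA[";

    /**
     * Tokenizes input that starts with "<![CDATA[" the way the spec does
     * outside foreign content: a bogus comment whose data starts with
     * "[CDATA[" and ends at the first '>'. Anything after that '>' is
     * returned as an untokenized remainder.
     */
    static List<String> tokenize(String input) {
        if (!input.startsWith(CDATA_OPEN)) {
            throw new IllegalArgumentException("input must start with " + CDATA_OPEN);
        }
        List<String> tokens = new ArrayList<>();
        int gt = input.indexOf('>', CDATA_OPEN.length());
        int end = (gt < 0) ? input.length() : gt; // EOF also closes the comment
        // Skip "<!"; the comment data is "[CDATA[" plus everything up to '>'.
        tokens.add("COMMENT(" + input.substring(2, end) + ")");
        if (gt >= 0 && gt + 1 < input.length()) {
            tokens.add("REST(" + input.substring(gt + 1) + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [COMMENT([CDATA[<div), REST(</div>]]>)]
        System.out.println(tokenize("<![CDATA[<div></div>]]>"));
    }
}
```

The comment data ends up as `[CDATA[<div`, which lines up with the truncated `<div` reported in this issue; a browser then treats the remaining `</div>]]>` as a stray end tag followed by the text `]]>`.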
@akshay-kr do you still think I have to fix something here?
@rbri I now understand what you are trying to explain. For example, we have XML like this:
<xt:c-code xt:name="code" xt:version="1" xt:id="15ae0cc7-ded7-4a74-97b8-d66238d3c177"><xt:parameter xt:name="language">html</xt:parameter><xt:text-body><![CDATA[<div></div>]]></xt:text-body></xt:c-code>
We parse this XML, specifically the content inside the CDATA section, and then store it. Later, when viewing, we extract the content inside the CDATA section and render it on the web page.
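When the snippet lives inside a well-formed XML wrapper like this one, an XML parser reads the CDATA section all the way to `]]>`. A minimal sketch using the JDK's built-in DOM parser; the class and method names are hypothetical, and parsing is namespace-unaware only because the `xt:` prefix is not declared in the sample.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Sketch: extract the code snippet from the CDATA section of the XML wrapper.
public class CdataFromXml {

    static String extractCodeBody(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // "xt:" is not bound to a namespace in the sample, so stay
            // namespace-unaware and match on the qualified element name.
            factory.setNamespaceAware(false);
            // Reject DOCTYPEs, since the XML comes from user input (XXE hardening).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node body = doc.getElementsByTagName("xt:text-body").item(0);
            // The XML parser ends the CDATA section only at "]]>", so the
            // markup inside it comes back verbatim as character data.
            return (body == null) ? null : body.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xt:c-code xt:name=\"code\" xt:version=\"1\" "
                + "xt:id=\"15ae0cc7-ded7-4a74-97b8-d66238d3c177\">"
                + "<xt:parameter xt:name=\"language\">html</xt:parameter>"
                + "<xt:text-body><![CDATA[<div></div>]]></xt:text-body></xt:c-code>";
        System.out.println(extractCodeBody(xml)); // prints <div></div>
    }
}
```

This is the XHTML/XML reading of CDATA; the trouble in this thread only starts when the same markup is handed to an HTML-mode scanner.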
Maybe we have to discuss this with the Antisamy people.
Sure. I raised an issue on the Antisamy repo for this: nahsra/antisamy#531
@akshay-kr I had another look at this (from the HtmlUnit point of view). HtmlUnit uses neko to parse HTML and XHTML documents, and in the case of XHTML the current behaviour of the parser is wrong. Current plan: add another flag to disable the behaviour, but I have to think a bit about that.
Yup, a flag to disable the behaviour will work. Alternatively, if there were a way for the user to specify what kind of document is being parsed, we could have separate logic for HTML and XHTML.
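Both options (a flag, or telling the parser which kind of document it is reading) come down to choosing where a `<![CDATA[` construct ends. A hypothetical Java sketch of that switch: `DocumentKind` and `scanCdata` are illustrative names only and are not the feature flag that was actually added to neko.

```java
// Hypothetical sketch: the same "<![CDATA[" construct scanned two ways,
// selected by the kind of document being parsed. These names are
// illustrative and do not correspond to neko's real configuration.
public class CdataModeSketch {

    enum DocumentKind { HTML, XHTML }

    static final String CDATA_OPEN = "<![CDATA[";

    /** Returns the content the scanner sees after "<![CDATA[". */
    static String scanCdata(String input, DocumentKind kind) {
        if (!input.startsWith(CDATA_OPEN)) {
            throw new IllegalArgumentException("input must start with " + CDATA_OPEN);
        }
        int start = CDATA_OPEN.length();
        // HTML content: bogus comment, closed by the first '>'.
        // XHTML/XML: a real CDATA section, closed only by "]]>".
        int end = (kind == DocumentKind.HTML)
                ? input.indexOf('>', start)
                : input.indexOf("]]>", start);
        // An unterminated construct runs to the end of the input.
        return input.substring(start, end < 0 ? input.length() : end);
    }

    public static void main(String[] args) {
        String snippet = "<![CDATA[<div></div>]]>";
        System.out.println(scanCdata(snippet, DocumentKind.HTML));  // <div
        System.out.println(scanCdata(snippet, DocumentKind.XHTML)); // <div></div>
    }
}
```

Keying the behaviour on the document kind rather than on a standalone on/off flag keeps the spec-conformant HTML path as the default while letting XHTML callers opt into full CDATA sections; which of the two neko exposes is up to the maintainers.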
I have added the feature, but I fear there is more to do if you would like to use it from Antisamy.
@rbri Thanks a lot for adding this feature flag. Is it possible to backport this to …
@akshay-kr For my thoughts on such fix releases, please have a look at my comment in issue #123. And thanks for that PR.
@akshay-kr any news here?
It seems the CDATA scanning behaves differently since the following commit:
49a31c0
For example, if it has to scan
<div></div>
it only scans up to
<div
and then appends the CDATA closing:
<div]]
The rest of the content is not scanned. I guess this is due to the addition of the following condition:
49a31c0#diff-c0ef5d78bafd2e2d032b106fd0ca50086966f97030bd6b6f4403d8f5ebe39f87R2490
I am using version 3.11.1.
Any help here would be really appreciated.