Antisamy 1.7.5 version - <body> tag issue #453
Comments
Will have a look... |
Had a quick look and have added a test case to neko. From this first look it seems like neko works ok. Maybe the tag is closed by some cleanup? |
can antisamy version 1.7.5 adds |
@jeetu22 i do not know much about the inner workings of antisamy, but i'm responsible for the neko-htmlunit parser (https://github.com/HtmlUnit/htmlunit-neko) that antisamy uses to parse the html file and convert it into a dom tree. During this process some cleanup is done to form a valid dom (or emit valid sax events). And yes, missing body (start) elements are added to form valid dom trees. Proving this for your case was exactly the reason to write the additional test case for the parser. |
Thank you very much! As AntiSamy uses neko internally, can anyone from AntiSamy guide us in this scenario? I suspect HTMLScanner.java is modifying the DOM. |
Maybe the org.owasp.validator.html.scan.MagicSAXFilter is the one - but only a guess. |
can you please remove body tag from this Junit test case and assert that output HTML should contains |
@spassarop - Can you look into this with @rbri? |
i guess i found the reason - will analyze this a bit more |
Ok, antisamy is using the fragment parser instead of the document parser; with the fragment parser i can reproduce the problem. |
Thanks @rbri for being so proactive with this. @jeetu22, even though @rbri seems to have reproduced the problem for debugging, it would be useful if you described how you are calling AntiSamy and what policy you are using. These factors make AntiSamy decide whether it should use the DOM or SAX parser, or which tags to preserve. |
we are using SAX parser. |
@jeetu22, setting this feature for the parser changes the behavior in some ways. One of the effects is the one you are facing: the tag balancer no longer adds missing body tags. But there are some others also. As promised i will have a look at all that; at the moment i'm thinking about why antisamy should use the fragment way of parsing at all. Because i'm working on all this in my spare time and i have some other private things on my todo list, please be a bit patient and do not expect a fix in the next few hours ;-) |
Thank you for the update! I appreciate you looking into the issue. Given your busy schedule, I completely understand that a fix might take some time. Please take the time you need, and I look forward to your findings. Thanks again for your efforts! |
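To make the effect described above concrete, here is a toy model in plain Java. It is not neko's tag balancer and says nothing about its internals: the class `ToyBalancer` and its methods are hypothetical, and only the feature id is taken from this thread. It models the single behaviour under discussion, namely that a missing body wrapper is added in document mode and left out once the document-fragment feature is switched on.

```java
import java.util.Locale;

/** Toy model only: NOT neko's implementation, just the behaviour discussed in this thread. */
final class ToyBalancer {
    // Feature id quoted in this thread; everything else in this class is hypothetical.
    static final String DOCUMENT_FRAGMENT =
        "http://cyberneko.org/html/features/balance-tags/document-fragment";

    private boolean fragmentMode;

    /** Mimics the shape of a setFeature call; only the one feature id is known here. */
    void setFeature(String featureId, boolean state) {
        if (!DOCUMENT_FRAGMENT.equals(featureId)) {
            throw new IllegalArgumentException("unknown feature: " + featureId);
        }
        fragmentMode = state;
    }

    /** Document mode wraps input that lacks a body tag; fragment mode returns it untouched. */
    String balance(String html) {
        if (fragmentMode) {
            return html;
        }
        // Crude check, good enough for a toy: is there any body start tag at all?
        if (html.toLowerCase(Locale.ROOT).contains("<body")) {
            return html;
        }
        return "<html><head></head><body>" + html + "</body></html>";
    }
}
```

With the feature off, `balance("<p>x</p>")` comes back wrapped in a body element; after `setFeature(ToyBalancer.DOCUMENT_FRAGMENT, true)` the same call returns the snippet unchanged.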
I don’t know too much about the SAX parser, so I have no idea about the
difference nor why AntiSamy uses the fragment parser. It could be changed
to see how the tests react.
On Mon, 27 May 2024 at 04:30 Jitendra wrote:
parser.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);
|
After some thinking
|
@rbri wrote:
That is exactly why! In fact, I think that is the most common use case for HTML sanitizers in general. There's generally some user input that you might capture that only allows some specific mark-up (and which mark-up may vary from one use to another) and you want to sanitize that to make it safe to use in the broader context of an application-generated page. I think it's rare that AntiSamy or the OWASP HTML Sanitizer project would get a complete HTML page to sanitize. That's certainly a valid use case too, just not one that is as common. If AntiSamy ditched the fragment parser, then I think that ESAPI would have to ditch AntiSamy because dealing with HTML fragments is what Validator.getValidSafeHTML (https://javadoc.io/static/org.owasp.esapi/esapi/2.5.3.1/org/owasp/esapi/Validator.html#getValidSafeHTML-java.lang.String-java.lang.String-int-boolean-org.owasp.esapi.ValidationErrorList-) is generally expecting. |
Oh right, of course. I didn’t know what fragment parser meant initially.
On Mon, 27 May 2024 at 11:59 Kevin W. Wall wrote:
@rbri wrote:
- i have an idea why the fragment parser is used - at least from the tests it looks like antisamy can also clean html snippets, not only complete html pages
|
@davewichers @spassarop fix is ready in PR #454 |
@spassarop - can you create test case for this situation that fails, and then verify that it now passes with his snapshot version? |
My PR includes such a test case... |
Hahaha yeah, our man here is one step ahead ;) |
neko 4.2.0 released |
@rbri , can you confirm if the Neko 4.2.0 release resolves the above issue? |
@jeetu22 strange - have done this right now
|
for me this looks like you still have an old version of neko somewhere in your class path... can you please provide the whole stack trace... |
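One way to check for a stale jar on the class path is to ask the JVM where a given class was actually loaded from. The sketch below uses only standard JDK calls; the class name `ClasspathProbe` is made up for this example, and the default `org.htmlunit.cyberneko.HTMLScanner` is an assumed fully qualified name that may differ between neko versions, so pass the class name from your own stack trace as the first argument.

```java
import java.security.CodeSource;

/** Prints the jar or directory a class was loaded from (plain JDK, no extra libraries). */
final class ClasspathProbe {

    /** Returns the load location of an already loaded class, or a note if it is unknown. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source, e.g. a JDK class)";
        }
        return source.getLocation().toString();
    }

    /** Looks the class up by name without running its static initializers. */
    static String locationOf(String className) {
        try {
            Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return locationOf(type);
        } catch (ClassNotFoundException e) {
            return "(not on the class path)";
        }
    }

    public static void main(String[] args) {
        // Assumed class name; replace it with the parser class named in your stack trace.
        String name = args.length > 0 ? args[0] : "org.htmlunit.cyberneko.HTMLScanner";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

Run it with the same class path as the failing application. If the printed location points at an older neko jar than the one you expect, that jar is the one shadowing the new release.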
* add test for issue #453 use 4.2.0-SNAPSHOT
* code style
* add neko-htmlunit snapshot repo
* use neko 4.2.0 release
* neko-htmlunit version 4.2.1
* remove property
Checking, will update you @rbri. |
Hi @rbri, We've thoroughly tested Antisamy 1.7.6-SNAPSHOT and found that both the workflow and UI are working fine. It would be great if you could provide a tentative release date for the non-snapshot version. Thank you! |
In the meantime neko 4.3.0 is out (https://github.com/HtmlUnit/htmlunit-neko/releases) - you should be able to safely switch to this one. |
oh you are already at 4.3.0... |
This was fixed in release 1.7.6 which went out yesterday. |
AntiSamy library versions above 1.7.2 require a <body> tag in the HTML page; otherwise, the HTML breaks. Here's an example of the input HTML:
The output produced is:
As you can see, the <select> tag closes on the same line, causing the dropdown to malfunction and breaking the HTML page. This issue does not occur in AntiSamy version 1.7.2 and earlier but appears in versions after 1.7.2. We are upgrading AntiSamy in our project to version 1.7.5, but this issue is causing the complete HTML page to become distorted.