Allow boolean and empty attributes for certain node types #278

Open


@dedene dedene commented Nov 28, 2023

We use Loofah in a number of projects where the scrubbing of empty attributes and boolean attributes became an issue. This PR adds support for boolean attributes and empty string values on certain node types. It fixes #242.

For example, <option value="">Empty Value</option> is perfectly safe HTML, but the empty value was stripped when using the scrubber.

It also adds support for boolean attributes (e.g. download on an <a> element, or autoplay on a <video> tag). I could not get Nokogiri to output them as boolean attributes, but the HTML5 specification (section 3.2.2) specifies that an empty string is also fine.

# Before this PR:
>> Loofah.html5_fragment('<option value="" selected></option>').scrub!(:strip).to_s
=> "<option></option>"

# After this PR:
>> Loofah.html5_fragment('<option value="" selected></option>').scrub!(:strip).to_s
=> "<option value=\"\" selected=\"\"></option>"
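
To make the intended rule concrete, here is a minimal plain-Ruby sketch of a per-element allowlist for empty-valued attributes. It is not the code in this PR and does not use Loofah's or Nokogiri's API: the constants BOOLEAN_ATTRIBUTES and EMPTY_VALUE_ATTRIBUTES and the methods empty_attribute_allowed? and filter_empty_attributes are hypothetical names, and the allowlist entries are just the examples mentioned above.

```ruby
require "set"

# Hypothetical allowlists for illustration only; not Loofah's actual data.
BOOLEAN_ATTRIBUTES = {
  "a"      => Set["download"],
  "video"  => Set["autoplay"],
  "option" => Set["selected"],
}.freeze

EMPTY_VALUE_ATTRIBUTES = {
  "option" => Set["value"],
}.freeze

# True when an empty-valued attribute may stay on the given element.
def empty_attribute_allowed?(node_name, attr_name)
  [BOOLEAN_ATTRIBUTES, EMPTY_VALUE_ATTRIBUTES].any? do |allowlist|
    allowlist.fetch(node_name, Set.new).include?(attr_name)
  end
end

# Drops empty-valued attributes unless they are allowlisted for the element.
# Non-empty values pass through untouched here; a real scrubber would still
# apply its usual safety checks to them.
def filter_empty_attributes(node_name, attributes)
  attributes.select do |name, value|
    !value.to_s.empty? || empty_attribute_allowed?(node_name, name)
  end
end

# Both empty attributes survive on <option> ...
p filter_empty_attributes("option", { "value" => "", "selected" => "" })
# ... but an empty value on an element that is not allowlisted is dropped.
p filter_empty_attributes("div", { "value" => "", "class" => "x" })
```

Keying the allowlist by element name keeps the change narrow: an empty attribute is only preserved where it is known to be meaningful, and everything else is scrubbed as before.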

The behaviour from #51 is still the same, so the risk of unwanted regressions is minimal imho.

The tests on GitHub Actions seem to fail for truffleruby, but that seems to be related to ruby/stringio#71, which just got merged, and not to the actual code changes in this PR.

Feel free to make or suggest changes if needed. Thanks a lot for having a look at this!

@dedene dedene force-pushed the fix/allow-empty-attrs branch from 0260dd8 to 88941ce Compare November 28, 2023 13:34
@dedene
Author

dedene commented Nov 28, 2023

(About the force-push: I've squashed my changes so far into a single commit.)

@flavorjones
Owner

Thanks for submitting this! It may be a day or two before I'm able to review.

@flavorjones
Owner

@dedene Thank you for your patience!

So I'm proceeding carefully here for the moment, since Rails::HTML::Sanitizer is sensitive to empty/boolean attributes. See rails/rails-html-sanitizer#136 for the original description of the general problem back in June 2022.

I had been waiting for HTML5 parsing to land in the sanitizer stack before tackling some of these behavioral edge cases. This PR might be the right answer, but I want to try to see if we can get the underlying parser (libgumbo) to do the right thing here first.

All of which is to say: I'm going to play with this for a bit.


Successfully merging this pull request may close these issues.

HTML5 empty attributes are being scrubbed