Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Regex Character Class Escape Tests #4195

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Aurele-Barriere
Copy link
Contributor

Character class escape tests for \d, \D, \s, \S, \w, \W are not testing what they should.

Consider the escape \d.
We should test two things:

  • That it matches ALL digits (positive cases)
  • That it matches NONE of the other code points (negative cases).

This is not what the auto-generated tests are currently doing.
character-class-digit-class-escape.js only checks that \d finds a match in "0123456789".
And character-class-digit-class-escape-plus-quantifier.js only checks that \d+ finds a match in "0123456789".

First, these two tests are equivalent.
But also, they do not check that \d matches all digits, only one.
And they never check that non-digits are not matched.
It seems that an implementation where \d matches only "3" and "z" would pass all the current tests.

I suggest the following:

  • check positive cases by testing ^\d+$ (with anchors) on the string "0123456789".
  • check negative cases by testing the absence of any match of \d on the string that contains all code points except digits.
  • do the same for each character class escape.

Thanks to Noé De Santo @Ef55 for realizing that these tests were probably not what they were intended to be, while working on the Warblre project.

@Aurele-Barriere Aurele-Barriere requested a review from a team as a code owner August 14, 2024 12:52
For each character class escape (\d, \D, \s, \S, \w, \W), check
positive cases (the escape matches all characters it's supposed to
match) and negative cases (the escape doesn't match any of the
characters it should not match).  Each of these checks is also done in
Unicode mode.

This is part of my work at the SYSTEMF lab at EPFL.
@ptomato
Copy link
Contributor

ptomato commented Aug 14, 2024

cc @mathiasbynens, you might be interested in this.

@mathiasbynens
Copy link
Member

mathiasbynens commented Aug 15, 2024

CC @leobalter as the original author of these tests.

Thanks for kicking this off! Since these are generated tests, could you please update the source script that generates these files rather than editing them directly? It would be easier to review that patch.

@Aurele-Barriere
Copy link
Contributor Author

I'll give it a try in the next few days!

@leobalter
Copy link
Member

I no longer have the code on any of my machines, the diff information points out to https://github.com/bocoup/test262-regexp-generator. I hope that helps.

@Aurele-Barriere
Copy link
Contributor Author

Hi,
It seems like the tests currently in Test262 are not the same as the tests of the generator.
Currently, the tests in the generator check that, for instance, str.replace(/\d/g,'') === str.replace(/|0-9]/g,'') where str is the string of all code points.
I think these generator tests are fine: if \d matched too many things, then the string on the left side of the equality would be shorter.
And if \d didn't match enough characters, it would be longer.

So I believe these generator tests correctly check positive and negative cases!
I still think that the tests for the plus quantifier are redundant, but one could argue that it's just an extra sanity check.

It seems to me like the reason for the differences between the two sets of tests is due to this commit, optimizing the tests by removing this str.replace comparison.

I see two ways forward:

  • We could simply revert the optimization commit. The tests would be then agree with the generator. But probably this is too slow.
  • I could modify the generator to produce the tests that I submitted in this PR. These tests do not do any long string comparison nor use str.replace, so maybe this is fast enough?

What do you think?

@mathiasbynens
Copy link
Member

I could modify the generator to produce the tests that I submitted in this PR. These tests do not do any long string comparison nor use str.replace, so maybe this is fast enough?

IMHO this is the way to go.

@Aurele-Barriere
Copy link
Contributor Author

Hello,
I've created a pull request here: bocoup/test262-regexp-generator#2

To generate the correct strings, I've used the regenerate.js file from this repo: https://github.com/mathiasbynens/unicode-property-escapes-tests/

I've also added a configuration with the v flag.

Best,
Aurèle

@ptomato
Copy link
Contributor

ptomato commented Sep 25, 2024

We are considering copying bocoup/test262-regexp-generator into this repo. I'm not sure Bocoup is still maintaining it.

@Aurele-Barriere
Copy link
Contributor Author

Makes sense! Let me know and I can update this PR when the generator is copied here.

@ptomato
Copy link
Contributor

ptomato commented Nov 2, 2024

I've copied and updated the generator script in #4303. I had to make some modifications to it, to implement some changes that had been made directly to the test files and weren't reflected in the script. So any extra review is appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants