New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Add some new rules to en-GB grammar.xml. #10596

Open

Tang567 wants to merge 1 commit into languagetool-org:master from Tang567:cpt304Ass2

Tang567 commented May 15, 2024

No description provided.


          Add some new rules to en-GB grammar.xml.

97a8cb4

evan-defran-lt suggested changes

View reviewed changes

Contributor

evan-defran-lt left a comment

Hi @Tang567, thanks for contributing to LT!

I've reviewed your PR, and I've left some comments for you. Some things will need to be amended before this PR could be approved.

Some other things to note:

none of these rules are specific to en-GB, so these rules would have more impact in the general EN file: languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml.
please make sure all new rules have the attribute default="temp_off". This ensures that rules do not go into production before we've had the chance to test them fully.
before committing, it's good practice to run a Maven test in your IDE terminal to ensure that there are no errors that would break the build. Here's the command for English: mvn --projects languagetool-language-modules/en --also-make clean test.
I'm not sure if you already know this, but you can also test your rules in LT's "expert" RuleEditor here: https://community.languagetool.org/ruleEditor/expert?devMode=true. It will run your rule over a large corpus of sentences, allowing you to see how your rule performs, which means you can easily identify TPs and FPs. It will also flag some errors that would break the build. It's not perfect – you cannot test rule groups with it, for instance – but it is a helpful resource to start!

Thank you again for contributing to LT. If you have any questions, please let me know.
Evan

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

-                  </category>
+                      </category>
+                      <rule id="YOUR_YOURE" name="Your/You're Usage">

Contributor

evan-defran-lt May 16, 2024

LT already has rules for {your=>you're}. Additionally, as currently written, this rule will change every instance of your to you're, which will result in a large number of False Positives.
If you want to refine this rule, you should find instances where LT does not successfully correct {your=>you're} (i.e., False Negatives) and refine our already-existing rules for those cases.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

+                          <example correction="you're">I think <marker>your</marker> going to love it.</example>
+                      </rule>
+                      <rule id="WORD_COLLOCATION_1" name="Inappropriate Word Collocation">

Contributor

evan-defran-lt May 16, 2024 •

edited

Loading

We would probably want a more descriptive id for this rule than just WORD_COLLOCATION_1. Additionally, this wouldn't really be a collocation issue (the classic example of a collocation error would be powerful coffee instead of strong coffee); it's more along the lines of an adverb/adjective morphology error.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

+                              <token>believe</token>
+                          </pattern>
+                          <message>Consider using <suggestion>strongly believe</suggestion> instead of 'strong believe'. "Strongly" is the adverb form of "strong" and is used to modify verbs like "believe."</message>
+                          <example correction="strongly believe">I strong believe in your abilities.</example>

Contributor

evan-defran-lt May 16, 2024

You need to include <marker>...</marker> here.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

+                      <rule id="WORD_COLLOCATION_1" name="Inappropriate Word Collocation">
+                          <pattern>
+                              <token>strong</token>

Contributor

evan-defran-lt May 16, 2024

Is there a way to make this rule more robust so that it covers all (or at least more) instances where an adjective is incorrectly being used to modify a verb? Right now, this rule would apply only to the bi-gram strong believe.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

+                          <example correction="strongly believe">I strong believe in your abilities.</example>
+                      </rule>
+                      <rule id="WORD_COLLOCATION_3" name="Inappropriate Word Collocation">

Contributor

evan-defran-lt May 16, 2024

Same comment as above. We'd want a more descriptive id, and this isn't really a collocation error.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

+                              <token>fast</token>
+                              <token>meal</token>
+                          </pattern>
+                          <message>Consider using <suggesiton>fast food</suggestion> instead of 'fast meal'. "Fast food" is a specific type of food that is prepared and served quickly, whereas "fast meal" is not commonly used.</message>

Contributor

evan-defran-lt May 16, 2024

There's a typo here: should be <suggestion>, not <suggesiton>.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

+                              <token postag="VBP">run</token>
+                          </pattern>
+                          <message>The verb "run" does not agree with the singular subject "dog". Consider using <suggestion>runs</suggestion> to match the singular subject.</message>
+                          <example correction="runs">The dog <marker>run</marker> in the park.</example>

Contributor

evan-defran-lt May 16, 2024

The <marker>...</marker> tags in all the examples in this rule are not placed correctly.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

+                          <example correction="fast food">Let's grab some fast meal.</example>
+                      </rule>
+                      <rule id="Singular_Subject_Verb_Agreement" name="Singular Subject-Verb Agreement">

Contributor

evan-defran-lt May 16, 2024

LT already has rules for subject verb agreement. Additionally, as currently written, this rule will result in some false positives, like in the sentence, He let his dog run free in the field.
If you want to refine this rule, you should find instances where LT does not successfully correct the issue (i.e., False Negatives) and refine our already-existing rules for those cases.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml

+                      <rule id="Singular_Subject_Verb_Agreement" name="Singular Subject-Verb Agreement">
+                          <pattern>
+                              <token postag="DT" min="0" max="1"/>
+                              <token postag="NN">dog</token>

Contributor

evan-defran-lt May 16, 2024

This rule will apply only when the subject is dog and the verb is run. Could you think of ways to make it more robust? (Just flagging this for you. As noted above, LT already has rules for subject-verb agreement issues, so it would be preferable to refine those rules than create new ones from scratch.)

pom.xml

@@ @@ -31,6 +31,7 @@ @@
                       </developer>
                   </developers>
                   <modules>
+                      <module>languagetool-client-example</module>

Contributor

evan-defran-lt May 16, 2024

Could you let me know why you added this here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet