Enable type-checking and correct some resulting problems #399

dhdaines · 2024-09-11T20:21:56Z

A few functions that had type hints were not actually being type-checked, because there weren't any in the function signature.
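
As a rough illustration of the behaviour described above (not code from this PR): by default mypy skips the body of a function whose signature carries no annotations, so a type hint inside that body is never checked. Both function names below are hypothetical.

```python
def parse_port_unchecked(raw):
    # No annotations in the signature: by default mypy does not check this
    # body, so the mismatched annotation below goes unreported.
    port: int = raw.strip()  # actually a str at runtime
    return port


def parse_port_checked(raw: str) -> int:
    # Annotated signature: mypy checks this body.
    return int(raw.strip())
```

mypy's `--check-untyped-defs` option is one way to have bodies like the first one checked as well.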

This revealed a few problems:

click.BadParameter's type signature doesn't match its documentation (the documentation is correct)
mypy doesn't narrow strings to literal types properly (see python/mypy#12535, "Narrowing types to Literal using in syntax")
we declare Mapping.rules as List[Rule] but then we put ... other stuff in it, which causes some problems. Stop doing this and simplify the code so that we don't need a noqa.
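
A minimal sketch of the Literal narrowing problem from the second point, assuming Python 3.8+ (where `typing.Literal` and `typing.get_args` are available). The alias `NormForm` and the helper `as_norm_form` are hypothetical names, not taken from g2p.

```python
from typing import Literal, cast, get_args

NormForm = Literal["none", "NFC", "NFD", "NFKC", "NFKD"]


def as_norm_form(value: str) -> NormForm:
    # The membership test validates the value at runtime, but per the linked
    # mypy issue the checker still treats `value` as a plain str afterwards,
    # hence the explicit cast.
    if value not in get_args(NormForm):
        raise ValueError(f"unsupported normalization form: {value!r}")
    return cast(NormForm, value)
```

The cast is only a workaround for the checker; the runtime check is what actually guarantees the value.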

g2p/mappings/__init__.py

dhdaines · 2024-09-11T20:23:29Z

g2p/mappings/__init__.py

-            if (
-                self.abbreviations
-                and self.rules
-                and "match_pattern" not in self.rules[0]


I really don't know what this was intended to do. Why only check self.rules[0]? Why not the other rules?

Therefore I removed this check, since it seems meaningless. If the intent was not to overwrite a pre-specified match_pattern, then this is definitely the wrong place to do that.

I like most of this PR, but I'll have to come back and analyse this change more carefully, because I'm not clear on the original purpose of this check and therefore I'm not confident it's unnecessary. When we wrote it, there must have been a reason, I'd like to dig and find it.

yes, I admit I didn't look at git blame here, it might reveal the original purpose. not overwriting an explicit match_pattern is something we might want to do, for instance...

This is the commit that added the check for "match_pattern" in the first rule: 8ec4f27. @roedoejet, do you remember what the intent was?

Looks to me like it's a check to make sure abbreviations have not already been expanded before, so we don't do it twice. I would put a conditional breakpoint() or logger.error() where this condition is met and look at the state of the class at that point.

Yeah, I think it is actually impossible for this to happen now, unless process_model_specs, which is called during construction of a Mapping, gets called explicitly later on. Probably what we want to actually do is ensure that nobody does that.

That's a good analysis. In fact, initialization of the Mapping class has been significantly cleaned up over the years, with the move to Pydantic in particular. Let's keep the change you made here, and replace the check, as you suggest, with a guard preventing us from calling process_model_specs, and maybe even model_post_init, more than once. I'd be happy with that guard being a simple assert, because calling either one twice would be a programmer error, not a user error.

Ok - the way that makes the most sense to do this is to just declare that seeing match_pattern or intermediate_form in a rule (any rule) is a programmer error, since they are excluded from serialization, and if you specify them, they will get overwritten anyway, so you shouldn't do that. Added a commit for that!
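
The guard described above could look roughly like the following. This is a sketch, not the commit itself: it assumes rules arrive as plain dicts, and the names `GENERATED_FIELDS` and `assert_no_generated_fields` are hypothetical.

```python
from typing import Any, Dict, Iterable

# Fields that are computed internally and excluded from serialization.
GENERATED_FIELDS = ("match_pattern", "intermediate_form")


def assert_no_generated_fields(rules: Iterable[Dict[str, Any]]) -> None:
    """Hypothetical guard: fail fast if any rule pre-specifies a generated field."""
    for index, rule in enumerate(rules):
        for field in GENERATED_FIELDS:
            assert field not in rule, (
                f"rule {index} sets {field!r}, which is generated and would be overwritten"
            )
```

An assert fits here because, as discussed in this thread, hitting the condition means a programmer error rather than bad user input.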

dhdaines · 2024-09-11T20:25:30Z

g2p/mappings/utils.py

@@ -131,21 +142,22 @@ def normalize(inp: str, norm_form: str):

    Also, find any Unicode Escapes & decode 'em!
    """
-    if norm_form not in ["none", "NFC", "NFD", "NFKC", "NFKD"]:


It seems like None was a valid value below, but we don't include it here, so that check was meaningless. Made it meaningful...
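
One way such a check can be made meaningful, sketched with the standard library only. This is not the actual g2p implementation: `normalize_sketch` is a hypothetical name, treating both None and "none" as "do nothing" is an assumption, and the Unicode escape decoding mentioned in the docstring above is left out.

```python
import unicodedata
from typing import Optional

VALID_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_sketch(inp: str, norm_form: Optional[str]) -> str:
    # Assumption: None and "none" both mean "leave the input unchanged".
    if norm_form is None or norm_form == "none":
        return inp
    # Anything else must be a form that unicodedata.normalize accepts.
    if norm_form not in VALID_FORMS:
        raise ValueError(f"unknown normalization form: {norm_form!r}")
    return unicodedata.normalize(norm_form, inp)
```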

dhdaines · 2024-09-11T20:26:05Z

g2p/tests/time_panphon.py

@@ -34,16 +34,16 @@ def getPanphonDistanceSingleton1():

 def getPanphonDistanceSingleton2():
    if not hasattr(getPanphonDistanceSingleton2, "value"):
-        setattr(getPanphonDistanceSingleton2, "value", panphon.distance.Distance())


This setattr is gratuitous (and flake8 or mypy or somebody warns about it)
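
For comparison, a different technique from the function-attribute pattern in the diff (and not what this PR changed): a zero-argument function wrapped in `functools.lru_cache` also yields a lazily created singleton without any setattr. `ExpensiveResource` is a hypothetical stand-in for a costly object such as `panphon.distance.Distance()`.

```python
import functools


class ExpensiveResource:
    """Hypothetical stand-in for an object that is costly to construct."""

    instances = 0

    def __init__(self) -> None:
        ExpensiveResource.instances += 1


@functools.lru_cache(maxsize=None)
def get_resource() -> ExpensiveResource:
    # The cache stores the first result, so the constructor runs only once.
    return ExpensiveResource()
```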

github-actions · 2024-09-11T20:31:28Z

CLI load time: 0:00.05
Pull Request HEAD: bbcd1e87c2735c263c0dcb11d5a02563c55c2c6a
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

codecov · 2024-09-11T20:31:42Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.82%. Comparing base (5631210) to head (bbcd1e8).
Report is 6 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #399      +/-   ##
==========================================
+ Coverage   93.54%   93.82%   +0.28%     
==========================================
  Files          18       18              
  Lines        2571     2575       +4     
  Branches      579      577       -2     
==========================================
+ Hits         2405     2416      +11     
+ Misses         95       91       -4     
+ Partials       71       68       -3


joanise

This is great stuff. Leaving some comments now, I'll have to come back for further analysis, though.

g2p/cli.py

joanise · 2024-09-11T21:41:56Z

g2p/mappings/__init__.py

-            if (
-                self.abbreviations
-                and self.rules
-                and "match_pattern" not in self.rules[0]


I like most of this PR, but I'll have to come back and analyse this change more carefully, because I'm not clear on the original purpose of this check and therefore I'm not confident it's unnecessary. When we wrote it, there must have been a reason, I'd like to dig and find it.

joanise · 2024-09-12T20:17:21Z

g2p/mappings/__init__.py

-                key=lambda x: (
-                    len(normalize(strip_index_notation(x.rule_input), "NFD"))
-                    if isinstance(x, Rule)
-                    else len(normalize(x["in"], "NFD"))
-                ),


Interestingly, removing this conditional also fixed a bug that had umista_equiv.csv sorted before removing its index notation instead of after.
Presumably the real difference here is that you're creating Rule objects right away all the time, rather than initializing the Mapping class with either/or.

oh - hopefully I haven't broken some subtle behaviour that we depended on, it seemed to make sense to me to create the Rule objects right away, so that they always have the same validation
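
To illustrate why the order matters, here is a sketch of sorting rule inputs by length with and without index notation stripped first. It makes several assumptions: index notation is taken to be digits in curly braces (e.g. `a{1}b{2}`), the sort is longest-first, and `strip_index_notation_sketch` is a hypothetical stand-in for g2p's `strip_index_notation`.

```python
import re
import unicodedata
from typing import List


def strip_index_notation_sketch(text: str) -> str:
    # Assumed index notation: digits in curly braces, e.g. "a{1}b{2}".
    return re.sub(r"\{\d+\}", "", text)


def sort_key(rule_input: str) -> int:
    # Length of the NFD-normalized input once index notation is removed.
    return len(unicodedata.normalize("NFD", strip_index_notation_sketch(rule_input)))


def sort_longest_first(rule_inputs: List[str]) -> List[str]:
    return sorted(rule_inputs, key=sort_key, reverse=True)
```

With inputs `"a{1}b{2}"` and `"abc"`, sorting on raw length puts `"a{1}b{2}"` first, while sorting on the stripped length puts `"abc"` first.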

joanise · 2024-09-12T20:49:28Z

I rebased this PR to the tip of main, and squashed some commits.

I also removed the descriptions from the generated fields, because those descriptions trigger a schema update, which would require a minor version bump and a new schema published to the schema store, and I really don't want to do that just for two hidden fields we're not supposed to populate.

joanise

Great work, thanks for taking care of these refinements.

dhdaines requested review from joanise and roedoejet September 11, 2024 20:21

joanise force-pushed the dev.dhd/activate_typing branch from da6a8fc to 302a0b8 Compare September 12, 2024 20:28

dhdaines added 4 commits September 12, 2024 16:38

fix: enable type-checking and fix things

16668b2

includes these squashed commits: - fix: 3.7 compatibility - fix: type-ignore bad type annotation in click - fix: properly test normalize function

fix: make sure self.rules is always the type we say it is

6ab8545

chore: g2p update

0523ba9

fix: seeing match_pattern or intermediate_form is an error

3eee1a6

joanise force-pushed the dev.dhd/activate_typing branch from 302a0b8 to 3eee1a6 Compare September 12, 2024 20:39

fix: avoid unnecessarily requiring a schema update

bbcd1e8

joanise approved these changes Sep 12, 2024

joanise merged commit b315a6c into main Sep 12, 2024
8 checks passed

joanise deleted the dev.dhd/activate_typing branch September 12, 2024 20:57
