Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable type-checking and correct some resulting problems #399

Merged
merged 5 commits into from
Sep 12, 2024

Conversation

dhdaines
Copy link
Collaborator

@dhdaines dhdaines commented Sep 11, 2024

A few functions that had type hints were not actually being type-checked, because there weren't any in the function signature.

This revealed a few problems:

  • click.BadParameter's type signature doesn't match its documentation (the documentation is correct)
  • mypy doesn't narrow strings to literal types properly (see Narrowing types to Literal using in syntax python/mypy#12535)
  • we declare Mapping.rules as List[Rule] but then we put ... other stuff in it, which causes some problems. Stop doing this and simplify the code so that we don't need a noqa.

if (
self.abbreviations
and self.rules
and "match_pattern" not in self.rules[0]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know what this was intended to do. Why only check self.rules[0]? Why not the other rules?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore I removed this check, it seems meaningless - if the intent was not to overwrite a pre-specified match_pattern then this is definitely the wrong place to do that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like most of this PR, but I'll have to come back and analyse this change more carefully, because I'm not clear on the original purpose of this check and therefore I'm not confident it's unnecessary. When we wrote it, there must have been a reason, I'd like to dig and find it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I admit I didn't look at git blame here, it might reveal the original purpose. not overwriting an explicit match_pattern is something we might want to do, for instance...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the commit that added the check for "match_pattern" in the first rule: 8ec4f27 @roedoejet do you remember what the intent was?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me like it's a check to make sure abbreviations have not already been expanded before, so we don't do it twice. I would put a conditional breakpoint() or logger.error() if this condition was met and look at the state of the class if it's met.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it is actually impossible for this to happen now, unless process_model_specs, which is called during construction of a Mapping, gets called explicitly later on. Probably what we want to actually do is ensure that nobody does that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good analysis. In fact, initialization of the Mappings class has been significantly cleaned up over the years, with the move to Pydantic in particular, let's keep the change you made here, and replace it as you suggest by a guard preventing us from calling process_model_specs, and maybe even model_post_init, more than once. I'd be happy with that guard being a simple assert because it's a programmer error, not a user error, that would get either called twice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - the way that makes the most sense to do this is to just declare that seeing match_pattern or intermediate_form in a rule (any rule) is a programmer error, since they are excluded from serialization, and if you specify them, they will get overwritten anyway, so you shouldn't do that. Added a commit for that!

@@ -131,21 +142,22 @@ def normalize(inp: str, norm_form: str):

Also, find any Unicode Escapes & decode 'em!
"""
if norm_form not in ["none", "NFC", "NFD", "NFKC", "NFKD"]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like None was a valid value below, but we don't include it here, so that check was meaningless. Made it meaningful...

@@ -34,16 +34,16 @@ def getPanphonDistanceSingleton1():

def getPanphonDistanceSingleton2():
if not hasattr(getPanphonDistanceSingleton2, "value"):
setattr(getPanphonDistanceSingleton2, "value", panphon.distance.Distance())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setattr is gratuitous (and flake8 or mypy or somebody warns about it)

Copy link
Contributor

github-actions bot commented Sep 11, 2024

CLI load time: 0:00.05
Pull Request HEAD: bbcd1e87c2735c263c0dcb11d5a02563c55c2c6a
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

Copy link

codecov bot commented Sep 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.82%. Comparing base (5631210) to head (bbcd1e8).
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #399      +/-   ##
==========================================
+ Coverage   93.54%   93.82%   +0.28%     
==========================================
  Files          18       18              
  Lines        2571     2575       +4     
  Branches      579      577       -2     
==========================================
+ Hits         2405     2416      +11     
+ Misses         95       91       -4     
+ Partials       71       68       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@joanise joanise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great stuff. Leaving some comments now, I'll have to come back for further analysis, though.

g2p/cli.py Outdated Show resolved Hide resolved
if (
self.abbreviations
and self.rules
and "match_pattern" not in self.rules[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like most of this PR, but I'll have to come back and analyse this change more carefully, because I'm not clear on the original purpose of this check and therefore I'm not confident it's unnecessary. When we wrote it, there must have been a reason, I'd like to dig and find it.

Comment on lines -179 to -183
key=lambda x: (
len(normalize(strip_index_notation(x.rule_input), "NFD"))
if isinstance(x, Rule)
else len(normalize(x["in"], "NFD"))
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, removing this conditional also fixed a bug that has umista_equiv.csv sorted before removing its index notation instead of after.
Presumably the real different here is that you're creating Rule objects right away all the time, rather initializing the Mapping class with either/or.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh - hopefully I haven't broken some subtle behaviour that we depended on, it seemed to make sense to me to create the Rule objects right away, so that they always have the same validation

includes these squashed commits:
 - fix: 3.7 compatibility
 - fix: type-ignore bad type annotation in click
 - fix: properly test normalize function
@joanise
Copy link
Collaborator

joanise commented Sep 12, 2024

I rebased this PR to the tip of main, and squashed some commits.

I also removed the description from the generated fields, because those description trigger a schema update, which would require a minor version bump and a new schema published to the schema store, and I really don't want to do that just for two hidden fields we're not supposed to populate.

Copy link
Collaborator

@joanise joanise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks for taking care of these refinements.

@joanise joanise merged commit b315a6c into main Sep 12, 2024
8 checks passed
@joanise joanise deleted the dev.dhd/activate_typing branch September 12, 2024 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants