-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable type-checking and correct some resulting problems #399
Conversation
if ( | ||
self.abbreviations | ||
and self.rules | ||
and "match_pattern" not in self.rules[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't know what this was intended to do. Why only check self.rules[0]
? Why not the other rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore I removed this check, it seems meaningless - if the intent was not to overwrite a pre-specified match_pattern
then this is definitely the wrong place to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like most of this PR, but I'll have to come back and analyse this change more carefully, because I'm not clear on the original purpose of this check and therefore I'm not confident it's unnecessary. When we wrote it, there must have been a reason, I'd like to dig and find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I admit I didn't look at git blame
here, it might reveal the original purpose. not overwriting an explicit match_pattern
is something we might want to do, for instance...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the commit that added the check for "match_pattern" in the first rule: 8ec4f27 @roedoejet do you remember what the intent was?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks to me like it's a check to make sure abbreviations have not already been expanded before, so we don't do it twice. I would put a conditional breakpoint() or logger.error() if this condition was met and look at the state of the class if it's met.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it is actually impossible for this to happen now, unless process_model_specs
, which is called during construction of a Mapping
, gets called explicitly later on. Probably what we want to actually do is ensure that nobody does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good analysis. In fact, initialization of the Mappings
class has been significantly cleaned up over the years, with the move to Pydantic in particular, let's keep the change you made here, and replace it as you suggest by a guard preventing us from calling process_model_specs
, and maybe even model_post_init
, more than once. I'd be happy with that guard being a simple assert
because it's a programmer error, not a user error, that would get either called twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - the way that makes the most sense to do this is to just declare that seeing match_pattern
or intermediate_form
in a rule (any rule) is a programmer error, since they are excluded from serialization, and if you specify them, they will get overwritten anyway, so you shouldn't do that. Added a commit for that!
@@ -131,21 +142,22 @@ def normalize(inp: str, norm_form: str): | |||
|
|||
Also, find any Unicode Escapes & decode 'em! | |||
""" | |||
if norm_form not in ["none", "NFC", "NFD", "NFKC", "NFKD"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like None
was a valid value below, but we don't include it here, so that check was meaningless. Made it meaningful...
@@ -34,16 +34,16 @@ def getPanphonDistanceSingleton1(): | |||
|
|||
def getPanphonDistanceSingleton2(): | |||
if not hasattr(getPanphonDistanceSingleton2, "value"): | |||
setattr(getPanphonDistanceSingleton2, "value", panphon.distance.Distance()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This setattr
is gratuitous (and flake8 or mypy or somebody warns about it)
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #399 +/- ##
==========================================
+ Coverage 93.54% 93.82% +0.28%
==========================================
Files 18 18
Lines 2571 2575 +4
Branches 579 577 -2
==========================================
+ Hits 2405 2416 +11
+ Misses 95 91 -4
+ Partials 71 68 -3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great stuff. Leaving some comments now, I'll have to come back for further analysis, though.
if ( | ||
self.abbreviations | ||
and self.rules | ||
and "match_pattern" not in self.rules[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like most of this PR, but I'll have to come back and analyse this change more carefully, because I'm not clear on the original purpose of this check and therefore I'm not confident it's unnecessary. When we wrote it, there must have been a reason, I'd like to dig and find it.
key=lambda x: ( | ||
len(normalize(strip_index_notation(x.rule_input), "NFD")) | ||
if isinstance(x, Rule) | ||
else len(normalize(x["in"], "NFD")) | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, removing this conditional also fixed a bug that has umista_equiv.csv
sorted before removing its index notation instead of after.
Presumably the real different here is that you're creating Rule
objects right away all the time, rather initializing the Mapping
class with either/or.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh - hopefully I haven't broken some subtle behaviour that we depended on, it seemed to make sense to me to create the Rule
objects right away, so that they always have the same validation
da6a8fc
to
302a0b8
Compare
includes these squashed commits: - fix: 3.7 compatibility - fix: type-ignore bad type annotation in click - fix: properly test normalize function
302a0b8
to
3eee1a6
Compare
I rebased this PR to the tip of main, and squashed some commits. I also removed the description from the generated fields, because those description trigger a schema update, which would require a minor version bump and a new schema published to the schema store, and I really don't want to do that just for two hidden fields we're not supposed to populate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks for taking care of these refinements.
A few functions that had type hints were not actually being type-checked, because there weren't any in the function signature.
This revealed a few problems:
click.BadParameter
's type signature doesn't match its documentation (the documentation is correct)Literal
usingin
syntax python/mypy#12535)Mapping.rules
asList[Rule]
but then we put ... other stuff in it, which causes some problems. Stop doing this and simplify the code so that we don't need a noqa.