Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Fallback to attributes on types.ModuleType if a symbol can't be found in locals or globals #13904

Merged
merged 7 commits into from
Oct 29, 2024

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Oct 24, 2024

Summary

All modules in Python are instances of types.ModuleType. Various attributes on types.ModuleType in typeshed represent "implicit globals" that are part of the namespace of every module in Python, even if not explicitly defined. As such, before declaring that a symbol is unbound (and in fact, before looking the symbol up in the builtins scope) we should look the symbol up as an attribute on types.ModuleType. The only attributes on types.ModuleType that are not present as implicit globals are __dict__, __getattr__ and __init__.

Test Plan

Markdown test added as crates/red_knot_python_semantic/resources/mdtest/scopes/moduletype_attrs.md. We also get to remove a false-positive error from the tomllib benchmark 🎉

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Oct 24, 2024
Copy link
Contributor

github-actions bot commented Oct 24, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@AlexWaygood AlexWaygood marked this pull request as draft October 24, 2024 11:17
@AlexWaygood AlexWaygood force-pushed the moduletype-attrs branch 2 times, most recently from dcccb7a to 22fc347 Compare October 24, 2024 11:22
Copy link

codspeed-hq bot commented Oct 24, 2024

CodSpeed Performance Report

Merging #13904 will not alter performance

Comparing moduletype-attrs (eee3e80) with main (7dd0c7f)

Summary

✅ 32 untouched benchmarks

@AlexWaygood AlexWaygood changed the base branch from main to alex/pep604-todo October 24, 2024 12:41
@AlexWaygood AlexWaygood marked this pull request as ready for review October 24, 2024 12:41
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I do think we need to resolve the regression here, if at all possible; this is too much of an edge-case to pay that much perf for.

crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
Base automatically changed from alex/pep604-todo to main October 25, 2024 18:21
@AlexWaygood AlexWaygood force-pushed the moduletype-attrs branch 2 times, most recently from 3739088 to ae8ab30 Compare October 26, 2024 16:19
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I assume you are still planning to try replacing the hard coded allowlist with a salsa query based on actual attributes found in typeshed, just noticed one minor terminology nit.

@AlexWaygood AlexWaygood force-pushed the moduletype-attrs branch 4 times, most recently from 2f14b09 to abb4375 Compare October 27, 2024 19:04
@AlexWaygood
Copy link
Member Author

AlexWaygood commented Oct 27, 2024

Okay, I've switched from the hardcoded allowlist to a Salsa-cached generated allowlist! Locally I'm consistently seeing a regression of 1-2% on the incremental benchmark, but Codspeed reports the benchmarks are neutral on this PR, so maybe what I'm seeing locally is just noise.

In order to obtain a list of symbols that typeshed declares in the body of types.ModuleType, I had to add a new SymbolsFlag member, IS_DECLARED. We had ways of querying whether symbols in a SymbolTable had been defined in that scope, but no way of querying whether symbols in a SymbolTable had been declared in that scope -- and for a stub file, the latter is what's important.

I've also fixed several edge cases regarding accessing members as attributes on imported modules. In some cases, whether a module "defines" a member can vary depending on whether you access the member as a global variable from inside the module or as an attribute on the module object after it's been imported elsewhere.

TL;DR: ready for re-review!

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems unfortunate to me that we now have to track DECLARED for all symbols just for this one use case.

}

bitflags! {
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
struct SymbolFlags: u8 {
const IS_USED = 1 << 0;
const IS_BOUND = 1 << 1;
const IS_BOUND = 1 << 1;
const IS_DECLARED = 1 << 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some comment explaining IS_DECLARED's meaning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types.rs Show resolved Hide resolved
crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
@AlexWaygood
Copy link
Member Author

It seems unfortunate to me that we now have to track DECLARED for all symbols just for this one use case.

I would wager that this information will come in useful in other contexts in the future (though I can't right now say what -- it's just an intuition I have).

However, one alternative way of tackling this would be to write a Python script that inspects the AST of the typeshed stub for types.ModuleType and generates a Rust file that contains a function that returns true if a given symbol is declared in the body of types.ModuleType. (This would be similar to the approach we take in https://github.com/astral-sh/ruff/blob/main/scripts/generate_known_standard_library.py.) We could run the script as part of every typeshed sync, to make sure that the generated function is always up to date with the typeshed stubs.

This would be a more complicated than the current solution. But it would have the following advantages:

  • We could probably use the typeshed_client library in the Python script, which means that the script itself probably wouldn't be that complicated
  • The generated Rust function would likely be more performant than the current solution that uses a Salsa query, because we wouldn't have to look anything up in the cache: the specific symbols that are declared on types.ModuleType would just be hardcoded into the generated Rust function.
  • We wouldn't have to track DECLARED for all symbols just for this one use case

I'm not sure whether this would be better or worse on net than the solution I currently have in this PR -- but it's an alternative to consider, anyway!

@MichaReiser
Copy link
Member

I would wager that this information will come in useful in other contexts in the future (though I can't right now say what -- it's just an intuition I have).

Yeah. I'm not a fan of it but I don't consider it blocking. I guess an alternative to this would be to look at the class type and iterate over its members. I don't think we support it today, but I suspect that's a capability we want long-term.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! A few comments that I think are worth at least testing and/or mentioning in TODOs, even if we don't make code changes for them right now.

an alternative to this would be to look at the class type and iterate over its members. I don't think we support it today, but I suspect that's a capability we want long-term.

I agree that we'll want this, and I think the best / most efficient way to implement this would be to iterate over symbols in the symbol table that are defined/declared in the class, using IS_DEFINED and IS_DECLARED flags. So I would flip it around and say this is not an alternative approach, rather it's another likely future use case for this flag.

Comment on lines +22 to +23
# TODO: this should probably be added to typeshed; not sure why it isn't?
# error: [unresolved-reference]
# revealed: Unbound
reveal_type(__doc__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably because it's on builtins.object?

The awkward thing is that I think __doc__ might be the only attribute on object that is accessible in every module as a global name.

I think all of them except for __module__ are accessible on module objects as an attribute.

You'll know better than I whether it's likely we can get __doc__ added specifically to ModuleType (despite it already being on object) to help support fixing this TODO with fewer special cases, or whether we should instead just bite the bullet and handle it as a special case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I think it would be fine to add __doc__ to ModuleType in typeshed if we add a comment saying why we have the "redundant" override from object... I'll try to put up a typeshed PR today...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reveal_type(__init__)
```

## Accessed as attributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__class__ is another one defined on object that is accessible on ModuleType instances, but not as a module global name.

__module__ is an odd special case in that typeshed says it exists on object but it does not actually exist on module objects. Probably not worth modeling this LSP violation, it's fine if we just let it exist on all objects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__module__ is an odd special case in that typeshed says it exists on object but it does not actually exist on module objects. Probably not worth modeling this LSP violation, it's fine if we just let it exist on all objects?

Sigh. __module__ really shouldn't be listed as an instance attribute on object; it's just obviously incorrect. But I'm not going to restart that conversation 🙃

I think this is not worth special-casing in red-knot; we should just accept it as an inaccuracy in type inference that we have to live with unless and until it's fixed in typeshed.

@AlexWaygood AlexWaygood merged commit d2c9f5e into main Oct 29, 2024
20 checks passed
@AlexWaygood AlexWaygood deleted the moduletype-attrs branch October 29, 2024 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants