[red-knot] Literal special form #13874

Open
wants to merge 19 commits into
base: main

Conversation

Glyphack
Contributor

@Glyphack Glyphack commented Oct 22, 2024

Handling Literal type in annotations.

Resolves: #13672

Implementation

Since Literal is not a fully defined type in typeshed, I used a trick to detect when a special form is Literal.
When inferring assignment types, I check whether the assignment's type resolved to typing.SpecialForm and the target is named Literal. If so, I create a new instance type and set its known-instance field to KnownInstance:Literal.

Why not define a new type?

From this issue I learned that we want to resolve members to the SpecialMethod class. If we create a new instance here, we can rely on the member resolution that already exists.

Tests

https://typing.readthedocs.io/en/latest/spec/literal.html#equivalence-of-two-literals
Since the value inside Literal is itself inferred as a literal type (LiteralString, LiteralInt, ...), two Literals are equal only when both the type and the value are equal.
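A minimal sketch of this equivalence rule, using a toy `LiteralValue` enum that is purely illustrative and not red-knot's actual `Type`. It only shows why `Literal[0]` and `Literal[False]` should not compare equal even though `0 == False` at runtime in Python:

```rust
/// Toy model of the value carried by a literal type (hypothetical names).
#[derive(Debug, Clone, PartialEq, Eq)]
enum LiteralValue {
    Int(i64),
    Bool(bool),
    Str(String),
}

/// Two literal types are equivalent only if both the kind of the value
/// (the enum variant) and the value itself match.
fn literals_equivalent(a: &LiteralValue, b: &LiteralValue) -> bool {
    // The derived `PartialEq` compares the variant first, then the payload.
    a == b
}
```

With this model, `LiteralValue::Int(0)` and `LiteralValue::Bool(false)` are never equivalent, because the variants differ before the payloads are even compared.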

https://typing.readthedocs.io/en/latest/spec/literal.html#legal-and-illegal-parameterizations

The illegal parameterizations are mostly implemented. I currently check both the slice expression and the slice type to make sure the parameter is valid.

Not covered:

  1. I did not find an easy way to error on things like Literal["foo".replace("o", "b")], because I cannot fully disable attribute expressions inside Literal: enum members are allowed.
  2. Parenthesized tuples are not allowed. Although pyright allows them, the spec states that tuples containing valid literal types are illegal. Unparenthesized tuples are valid, for example Literal["w", "r"].

Union creation with Literals is not working yet, because I saw comments saying that Union is not implemented yet.
https://typing.readthedocs.io/en/latest/spec/literal.html#shortening-unions-of-literals

Summary

Test Plan

@Glyphack Glyphack force-pushed the literal-type branch 4 times, most recently from 3fe84bd to fbcc66c Compare October 22, 2024 22:13
@Glyphack Glyphack marked this pull request as ready for review October 22, 2024 22:26
Contributor

github-actions bot commented Oct 22, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Contributor

@carljm carljm left a comment

Nice work! This task is actually quite a bit harder than I made it sound, I was forgetting some of the complexity in recognizing special forms :) This is a really good initial effort. Let me know if any of the comments below don't make sense or need further clarification.

@@ -1130,6 +1132,7 @@ impl<'db> KnownClass {
Self::ModuleType => "ModuleType",
Self::FunctionType => "FunctionType",
Self::NoneType => "NoneType",
Self::SpecialForm => "SpecialForm",
Contributor

Probably this should match the actual name of the symbol in the typing module?

Suggested change
Self::SpecialForm => "SpecialForm",
Self::SpecialForm => "_SpecialForm",

Contributor

(Un-resolved this comment, because it doesn't look addressed.) Is there a reason this needs to stay "SpecialForm" and not match the actual name in the module?

Contributor Author

No, sorry, I accidentally force-pushed.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
let annotation_ty = self.infer_annotation_expression(annotation);
let mut annotation_ty = self.infer_annotation_expression(annotation);

// If the variable is annotation with SpecialForm then create a new class with name of the
Contributor

Suggested change
// If the variable is annotation with SpecialForm then create a new class with name of the
// If the variable is annotated with SpecialForm then create a new class with name of the

Contributor

This comment also doesn't look resolved?

crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
self.infer_subscript_expression(subscript);
Type::Todo
}
ast::Expr::Subscript(subscript) => self.infer_subscript_expression(subscript),
Contributor

@carljm carljm Oct 23, 2024

Here is where we need the code to recognize special forms, but we should not be falling back to infer_subscript_expression here (that's for value expressions), instead we should have a dedicated infer_subscript_type_expression method, which should use infer_type_expression on the value and the index, and for now handle only the case where the value is typing.Literal special form, otherwise just return Todo.

(The fact that infer_subscript_expression was previously called here was just an easy placeholder way to ensure we cover all the sub-expressions, until we added proper support for inferring types correctly in type expressions.)
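The split described above can be sketched roughly as follows. The function names `infer_type_expression` and `infer_subscript_type_expression` come from the comment; the `Expr` and `Type` enums are toy stand-ins, and recognizing `Literal` by its spelling is an assumption made only to keep the sketch self-contained (the real checker would resolve the name through the symbol table):

```rust
/// Toy AST with just enough forms to model `Literal[1]` (hypothetical).
#[derive(Debug)]
enum Expr {
    Name(String),
    Int(i64),
    Subscript { value: Box<Expr>, slice: Box<Expr> },
}

/// Toy result type (hypothetical stand-in for the checker's `Type`).
#[derive(Debug, PartialEq)]
enum Type {
    LiteralSpecialForm,
    IntLiteral(i64),
    Todo,
}

/// Entry point for expressions that appear in a type-expression context.
fn infer_type_expression(expr: &Expr) -> Type {
    match expr {
        // Assumption: the special form is recognized by name here.
        Expr::Name(name) if name == "Literal" => Type::LiteralSpecialForm,
        Expr::Subscript { value, slice } => infer_subscript_type_expression(value, slice),
        _ => Type::Todo,
    }
}

/// Dedicated handler for subscripts in type expressions. It does not fall
/// back to value-expression inference for unknown forms.
fn infer_subscript_type_expression(value: &Expr, slice: &Expr) -> Type {
    match infer_type_expression(value) {
        Type::LiteralSpecialForm => match slice {
            Expr::Int(n) => Type::IntLiteral(*n),
            _ => Type::Todo,
        },
        // Every other subscripted form is deferred for now.
        _ => Type::Todo,
    }
}
```

In this sketch `Literal[1]` yields `Type::IntLiteral(1)`, while something like `list[1]` yields `Type::Todo` instead of being inferred as a value expression.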

Contributor Author

That makes sense. Just one question: should we use infer_type_expression for all indexes?
For example, Literal is defined here with an expression in the index, while other special forms have annotation_expression in their grammar. So I use infer_type_expression for the other cases, but infer_expression when the value is Literal.

My reasoning was that a value such as True should not have any meaning when used alone in a type annotation.

@carljm carljm added the red-knot Multi-file analysis & type inference label Oct 23, 2024
@Glyphack Glyphack force-pushed the literal-type branch 2 times, most recently from bb69a4c to a5a4f7f Compare October 27, 2024 17:07
Comment on lines 76 to 78
/// Lookup the type of `symbol` in the `_typeshed` module namespace.
///
/// Returns `Unbound` if the `_typeshed` module isn't available for some reason.
Member

Suggested change
/// Lookup the type of `symbol` in the `_typeshed` module namespace.
///
/// Returns `Unbound` if the `_typeshed` module isn't available for some reason.
/// Lookup the type of `symbol` in the `typing` module namespace.
///
/// Returns `Unbound` if the `typing` module isn't available for some reason.

Contributor

This comment was marked resolved, but it looks like it is still relevant and not addressed yet? I un-resolved it.

Member

@AlexWaygood AlexWaygood Oct 29, 2024

IIRC @Glyphack did apply this suggestion using the GitHub web UI... Possibly it was accidentally lost in a force-push following that @Glyphack? :-)

Contributor Author

Yes, I'm sorry about that; it caused a lot of resolved comments to become unresolved again. I will keep the comments open so that I can check them again before requesting review.

@Glyphack
Contributor Author

Glyphack commented Oct 27, 2024

I applied the comments. I will spend another day adding diagnostic messages for https://typing.readthedocs.io/en/latest/spec/literal.html#legal-and-illegal-parameterizations

@Glyphack
Contributor Author

Glyphack commented Oct 28, 2024

Okay, I added more parts of the legal and illegal parameterizations from the spec. Right now we have:

  1. All Literal types. I managed to do this through a lot of recursion.
  2. Nested Literals

I think the remaining part is the assignability check. Right now the Literals are unwrapped to their inner type. I don't think this is the right way, is it? It works in the tests, but I'm not sure whether Literal types should carry some special flags with them.

I did not find an easy way to error on things like Literal["foo".replace("o", "b")], because I cannot fully disable attribute expressions inside Literal: enum members are allowed.
I can do the same thing I did for nested literals:

  1. Check whether it's an attribute expression.
  2. Check the type of the value: if it's a class with Enum in its bases, accept it; otherwise emit a diagnostic.

Does this sound good?

Parenthesized tuples are not allowed. Although pyright allows them, the spec states that tuples containing valid literal types are illegal. Unparenthesized tuples are valid, for example Literal["w", "r"].

Also, I'm not correctly joining unions when it's possible. I left it as a TODO in the tests:

    # TODO: revealed: Literal[1, 2, 3, "foo", 5] | None
    reveal_type(union_var)  # revealed: Literal[1, 2, 3] | Literal["foo"] | Literal[5] | None

Please let me know what you think.
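For reference, the shortened form in the TODO above could be produced by grouping all literal members of a union into a single `Literal[...]` when displaying it. This is only a sketch under that assumption: `Member` and `display_union` are hypothetical names, and placing the literal group first is a simplification rather than anything the spec or red-knot prescribes:

```rust
/// Toy union member (hypothetical; not red-knot's `Type`).
#[derive(Debug, Clone, PartialEq)]
enum Member {
    IntLiteral(i64),
    StrLiteral(String),
    NoneType,
}

/// Render a union, merging every literal member into one `Literal[...]` group
/// and keeping the remaining members after it.
fn display_union(members: &[Member]) -> String {
    let mut literals: Vec<String> = Vec::new();
    let mut others: Vec<String> = Vec::new();
    for member in members {
        match member {
            Member::IntLiteral(n) => literals.push(n.to_string()),
            Member::StrLiteral(s) => literals.push(format!("\"{}\"", s)),
            Member::NoneType => others.push("None".to_string()),
        }
    }
    let mut parts: Vec<String> = Vec::new();
    if !literals.is_empty() {
        parts.push(format!("Literal[{}]", literals.join(", ")));
    }
    parts.extend(others);
    parts.join(" | ")
}
```

Called on the members `1, 2, 3, "foo", 5, None`, this renders `Literal[1, 2, 3, "foo", 5] | None`, matching the expected output in the TODO.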

@Glyphack Glyphack changed the title Add Literal special form to types Literal special form Oct 28, 2024
@Glyphack Glyphack changed the title Literal special form [red-knot] Literal special form Oct 28, 2024
@@ -568,58 +575,76 @@ impl<'db> Type<'db> {

(Type::None, Type::Instance(class_type)) | (Type::Instance(class_type), Type::None) => {
Contributor Author

I'm thinking of going over all the occurrences of Instance(class_type) and renaming them to Instance(instance) so it's not misleading.

Contributor

Yes, agree that we should do this before landing this PR. I did a couple more in the commit I just pushed, but not all of them.

}
}

fn infer_literal_parameter_type(&mut self, parameters: &ast::Expr) -> Type<'db> {
Contributor Author

I created this function so that I can call it on each parameter inside the [], so that Literal[Literal[expr]] is converted to Literal[expr].

value_ty => {
let value_node = value.as_ref();
let slice_ty = self.infer_expression(slice);
// TODO: currently the logic to get the type of type of a subscript in type
Contributor Author

I needed to keep this here because we have a test case that checks that we emit the unsubscriptable error in type annotations, so I kept it with a TODO to avoid breaking that test.

Contributor

I think if you look carefully at those test cases, they all have their own TODO comments saying that they shouldn't emit that unsubscriptable error :)

This PR will eventually need to merge with #13943 so you can look at what @AlexWaygood did there and follow the same approach.

@carljm
Contributor

carljm commented Oct 29, 2024

Explaining the commit I just pushed:

My intent in suggesting InstanceType was that it would not be a Salsa interned struct, but just a regular struct that would be stored inline in Type, so we wouldn't add an extra layer of database queries. That's why I was discussing the size of Type. I wanted to verify that this really worked, and didn't increase the size of Type, before making the suggestion -- and once I had verified that, I figured I might as well push the changes and not ask you to make them again :)

Contributor

@carljm carljm left a comment

Tests are looking really good here! I pushed some changes to the InstanceType implementation (so it's not Salsa-interned), and left some comments on the inference implementation.

a3: Literal[-4]
a4: Literal["hello world"]
a5: Literal[b"hello world"]
a6: Literal["hello world"]
Contributor

This looks identical to a4?

Comment on lines 1219 to 1223
Self::SpecialForm => {
let t = typing_symbol_ty(db, self.as_str());
debug_assert!(t.is_unbound(), "special form not found");
t
}
Contributor

We don't do this assert for the other special forms, I don't think we need to here either.

Suggested change
Self::SpecialForm => {
let t = typing_symbol_ty(db, self.as_str());
debug_assert!(t.is_unbound(), "special form not found");
t
}
Self::SpecialForm => typing_symbol_ty(db, self.as_str())

Comment on lines +3491 to +3492
// slice_ty is treated as expression because Literal accepts expression
// inside the []
Contributor

This comment seems out of place and probably unnecessary? There's nothing near this comment named slice_ty, and I'm not sure what "treated as expression" means -- the slice of a subscript expression is an expression in the AST, that's just a fact.

Comment on lines +3495 to +3548
match parameters {
ruff_python_ast::Expr::StringLiteral(_)
| ruff_python_ast::Expr::BytesLiteral(_)
| ruff_python_ast::Expr::BooleanLiteral(_)
// For enum values
| ruff_python_ast::Expr::Attribute(_)
// For Another Literal inside this Literal
| ruff_python_ast::Expr::Subscript(_)
| ruff_python_ast::Expr::NoneLiteral(_) => {}
// for negative numbers
ruff_python_ast::Expr::UnaryOp(ref u) if (u.op == UnaryOp::USub || u.op == UnaryOp::UAdd) && u.operand.is_number_literal_expr() => {}
ruff_python_ast::Expr::NumberLiteral(ref number) if number.value.is_int() => {}
ruff_python_ast::Expr::Tuple(ref t) if !t.parenthesized => {}
_ => {
self.add_diagnostic(
parameters.into(),
"invalid-literal-parameter",
format_args!(
"Type arguments for `Literal` must be None, a literal value (int, bool, str, or bytes), or an enum value",
),
);
return Type::Unknown;
}
};

let slice_ty = self.infer_literal_parameter_type(parameters);

match slice_ty {
Type::Never
| Type::Unknown
| Type::Unbound
| Type::Todo
| Type::FunctionLiteral(_)
| Type::ModuleLiteral(_)
| Type::ClassLiteral(_)
| Type::Union(_)
| Type::Intersection(_)
| Type::Any => {
self.add_diagnostic(
parameters.into(),
"invalid-literal-parameter",
format_args!(
"Type arguments for `Literal` must be None, a literal value (int, bool, str, or bytes), or an enum value",
),
);
Type::Unknown
}
Type::Tuple(tuple) => {
let elts = tuple.elements(self.db);
Type::Union(UnionType::new(self.db, elts))
}
ty => ty,
}
}
Contributor

I think a better approach here would be for infer_literal_parameter_type to return Option<Type<'db>>, and return None if the literal parameter is not valid, otherwise the right Type. Then you only need to emit the error in one place (if infer_literal_parameter_type returns None), and you don't need these two extra match statements here.
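A minimal sketch of the `Option`-returning shape suggested here, assuming a toy AST and type representation. Only `infer_literal_parameter_type` and the `invalid-literal-parameter` diagnostic name come from the PR; `Expr`, `Type`, and `infer_literal` are hypothetical and far simpler than the real ones:

```rust
/// Toy AST for the slice of `Literal[...]` (hypothetical, far smaller than ruff's AST).
#[derive(Debug)]
enum Expr {
    Int(i64),
    Str(String),
    Bool(bool),
    NoneLiteral,
    /// Unparenthesized tuple, as in `Literal["w", "r"]`.
    Tuple(Vec<Expr>),
    /// Stand-in for any form that is not a legal parameter, e.g. a method call.
    Call,
}

/// Toy result type (hypothetical stand-in for red-knot's `Type`).
#[derive(Debug, Clone, PartialEq)]
enum Type {
    IntLiteral(i64),
    StringLiteral(String),
    BooleanLiteral(bool),
    NoneType,
    /// Kept as a plain `Vec` for brevity; no deduplication is modelled here.
    Union(Vec<Type>),
    Unknown,
}

/// Matches explicitly on each valid form and returns `None` as soon as any
/// parameter is not a legal literal form.
fn infer_literal_parameter_type(expr: &Expr) -> Option<Type> {
    match expr {
        Expr::Int(n) => Some(Type::IntLiteral(*n)),
        Expr::Str(s) => Some(Type::StringLiteral(s.clone())),
        Expr::Bool(b) => Some(Type::BooleanLiteral(*b)),
        Expr::NoneLiteral => Some(Type::NoneType),
        // Collecting into `Option<Vec<_>>` short-circuits on the first `None`.
        Expr::Tuple(elts) => elts
            .iter()
            .map(infer_literal_parameter_type)
            .collect::<Option<Vec<_>>>()
            .map(Type::Union),
        Expr::Call => None,
    }
}

/// The single place where the diagnostic is emitted.
fn infer_literal(expr: &Expr, diagnostics: &mut Vec<&'static str>) -> Type {
    infer_literal_parameter_type(expr).unwrap_or_else(|| {
        diagnostics.push("invalid-literal-parameter");
        Type::Unknown
    })
}
```

The caller no longer needs a pre-check on AST forms or a post-check on the inferred type: an invalid element anywhere in the tuple surfaces as `None`, and the error is reported once.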

// the values
match parameters {
ruff_python_ast::Expr::Subscript(inner_literal_subscript) => {
let inner_subscript_value = self.infer_expression(&inner_literal_subscript.value);
Contributor

Doesn't seem right to use infer_expression here, it should be infer_type_expression -- and then you shouldn't have to repeat the recognition of Literal below, or the invalid-literal-parameter error, infer_type_expression should do all that for you? You just have to verify the type you get back is a literal type.

}
Type::Tuple(TupleType::new(self.db, elts.into_boxed_slice()))
}
_ => self.infer_expression(parameters),
Contributor

Rather than matching out in infer_parameterized_known_instance_type_expression on AST forms known not to be valid, I think we should explicitly match here on each AST form known to be valid, and directly return the right type, without relying on self.infer_expression.

Member

@AlexWaygood AlexWaygood left a comment

A couple more comments on top of Carl's

Comment on lines +3543 to +3544
let elts = tuple.elements(self.db);
Type::Union(UnionType::new(self.db, elts))
Member

there's a bit of a footgun here in our current design, in that you should never really use UnionType::new() directly, because it doesn't deduplicate the elements in the union. Instead you should always use UnionType::from_elements(), which takes care of all the deduplication for you
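To illustrate the footgun with a toy model: `union_new` and `union_from_elements` below are hypothetical free functions standing in for the two constructors named in the comment, not their real signatures. The only behaviour modelled is the deduplication the comment mentions; anything else the real builder does is out of scope here:

```rust
/// Toy element type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Type {
    IntLiteral(i64),
    Union(Vec<Type>),
}

/// Analogue of a raw constructor: stores the elements exactly as given,
/// duplicates included.
fn union_new(elements: Vec<Type>) -> Type {
    Type::Union(elements)
}

/// Analogue of a deduplicating builder: keeps the first occurrence of each
/// element and preserves order.
fn union_from_elements(elements: impl IntoIterator<Item = Type>) -> Type {
    let mut unique: Vec<Type> = Vec::new();
    for element in elements {
        if !unique.contains(&element) {
            unique.push(element);
        }
    }
    Type::Union(unique)
}
```

For `Literal[1, 1, 2]`, the raw constructor would keep three elements, while the deduplicating one would keep two, which is why two unions that should be the same type can end up comparing unequal when built with the raw constructor.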

Comment on lines +3568 to +3574
self.add_diagnostic(
parameters.into(),
"invalid-literal-parameter",
format_args!(
"Type arguments for `Literal` must be None, a literal value (int, bool, str, or bytes), or an enum value",
),
);
Member

the formatting here is somewhat skew-whiff (I think cargo fmt is failing to spot it because of the macro, unfortunately)

Suggested change
self.add_diagnostic(
parameters.into(),
"invalid-literal-parameter",
format_args!(
"Type arguments for `Literal` must be None, a literal value (int, bool, str, or bytes), or an enum value",
),
);
self.add_diagnostic(
parameters.into(),
"invalid-literal-parameter",
format_args!(
"Type arguments for `Literal` must be None, a literal value (int, bool, str, or bytes), or an enum value",
),
);

Comment on lines +3578 to +3584
ruff_python_ast::Expr::Tuple(t) => {
let mut elts = vec![];
for elm in &t.elts {
elts.push(self.infer_literal_parameter_type(elm));
}
Type::Tuple(TupleType::new(self.db, elts.into_boxed_slice()))
}
Member

Suggested change
ruff_python_ast::Expr::Tuple(t) => {
let mut elts = vec![];
for elm in &t.elts {
elts.push(self.infer_literal_parameter_type(elm));
}
Type::Tuple(TupleType::new(self.db, elts.into_boxed_slice()))
}
ruff_python_ast::Expr::Tuple(t) => {
let elements: Box<_> = t.iter().map(|elt| self.infer_literal_parameter_type(elt)).collect();
Type::Tuple(TupleType::new(self.db, elements))
}
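The suggested rewrite works because `Box<[T]>` implements `FromIterator`, so an iterator can be collected straight into a boxed slice without an intermediate `Vec` and `into_boxed_slice()` call. A small standalone sketch of the equivalence, using integers instead of types (both function names are made up for illustration):

```rust
/// Loop-and-push version, mirroring the original code's shape.
fn double_all_loop(values: &[i32]) -> Box<[i32]> {
    let mut out = vec![];
    for v in values {
        out.push(v * 2);
    }
    out.into_boxed_slice()
}

/// Iterator version, mirroring the suggested change: `collect()` targets
/// `Box<[i32]>` directly because of the function's return type.
fn double_all_iter(values: &[i32]) -> Box<[i32]> {
    values.iter().map(|v| v * 2).collect()
}
```

Both functions return the same boxed slice for the same input; the iterator form just states the transformation in one expression.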

Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[red-knot] understand the Literal[] special form in annotations
3 participants