Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Cpp namespace when import Cpp is used #4873

Open
wants to merge 12 commits into
base: trunk
Choose a base branch
from

Conversation

bricknerb
Copy link
Contributor

Also defined a dedicated ImportCppDecl InstKind.
Part of #4666

Also defined a dedicated `ImportCppDecl` `InstKind`.
@bricknerb bricknerb requested a review from jonmeow January 31, 2025 12:05
@github-actions github-actions bot requested a review from danakj January 31, 2025 12:06
@@ -340,17 +345,22 @@ auto CheckUnit::ImportCppPackages() -> void {
return;
}

IdentifierId package_id = imports.front().names.package_id;
for (size_t i = 1; i < imports.size(); ++i) {
CARBON_CHECK(imports[i].names.package_id == package_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nicer to use all here instead of a for loop? If you like:

IdentifierId package_id = imports.front().names.package_id;
CARBON_CHECK(
    llvm::all(imports, [&](const CppImport& import) { return import.names.package_id == package_id; }));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

first_import_decl_id, SemIR::AccessKind::Public);
CARBON_CHECK(inserted);

CARBON_CHECK(first_import_decl_id.has_value());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetLocId() checks the same thing (via >= 0) so I think this isn't needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

auto import_loc_id = context.insts().GetLocId(first_import_decl_id);

auto namespace_inst = SemIR::Namespace{
namespace_type_id, SemIR::NameScopeId::None, first_import_decl_id};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we'd like the fields to be named in here:

- Use designated initializers (`{.a = 1}`) when possible for structs,

(I see import.cpp didn't name them here, but does in other cases, probably an oversight/inconsistency there.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

namespace_id, name_id, SemIR::NameScopeId::Package);
context.ReplaceInstBeforeConstantUse(namespace_id, namespace_inst);

// Note we have to get the parent scope freshly, creating the imported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just avoid the local variable entirely? It's only used once each time. And leave this comment around explaining why we don't hold a local variable instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// CHECK:STDOUT: package: <namespace> = namespace [template] {
// CHECK:STDOUT: .Cpp = imports.%Cpp
// CHECK:STDOUT: }
// CHECK:STDOUT: import_cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks different from the regular import semir. For example, stripped down from toolchain/check/testdata/alias/fail_builtins.carbon:

// CHECK:STDOUT: imports {
// CHECK:STDOUT:   %Core: <namespace> = namespace file.%Core.import, [template] {
// CHECK:STDOUT:     .Int = %Core.Int
// CHECK:STDOUT:   }
// CHECK:STDOUT: }
// CHECK:STDOUT:
// CHECK:STDOUT: file {
// CHECK:STDOUT:   package: <namespace> = namespace [template] {
// CHECK:STDOUT:     .Core = imports.%Core
// CHECK:STDOUT:   }
// CHECK:STDOUT:   %Core.import = import Core
// CHECK:STDOUT: }

a) The file name is %Core.import but here we have import_cpp. When there's more than one import_cpp they all have the same name, and there's no % attached to it.
b) %Core.import is = import Core but there's nothing attached to import_cpp
c) In the imports block, the %Core is set to namespace file.%Core.import which matches the name we see in the file block. But for cpp we have %Cpp set to namespace file.%cpp_import which is not the same as import_cpp in file.

How do we make this more consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question.
I think the main difference is that when we import Cpp files, we generate a header that includes all the imported Cpp headers and process the generated header, so all Cpp imports are actually merged.
a) import is replaced by import_cpp as expected. There is no specific package name because all import_cpp are for Cpp.
b) That's because of the generated file that merges all import Cpps.
c) I've replaced cpp_import with import_cpp, so it matches better.

That said, I'm looking for suggestions what should the output look like given that we merge all import Cpp together. Perhaps some improvements can be done to this PR and more complicated ones can be done in follow ups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImportDecl and ImportCppDecl have separate printing behavior. When you see %Core.import = import Core, it's declaring an ImportDecl with Core as an argument, and assigning a name as %Core.import. Right now CppImport doesn't have any arguments, so nothing on the RHS should be expected.

@bricknerb In formatter.cpp, if you copy auto FormatInstLHS(InstId inst_id, ImportDecl /*inst*/) -> void to add a similar overload for ImportCppDecl, what's the result? I think you might want to aim for something like %Cpp = import_cpp, which is how we indicate the instruction has a name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jon!
Added a simple overload and we now have
%import_cpp = import_cpp

@bricknerb bricknerb requested a review from danakj February 3, 2025 09:09
@@ -118,10 +121,11 @@ static auto AddNamespace(Context& context, IdentifierId cpp_package_id,

// Note we have to get the parent scope freshly, creating the imported
// namespace may invalidate the pointer above.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// namespace may invalidate the pointer above.
// namespace may invalidate any pointers into name_scopes().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -26,7 +26,7 @@ import Cpp library "file2.h";
// CHECK:STDOUT: --- multiple_imports.carbon
// CHECK:STDOUT:
// CHECK:STDOUT: imports {
// CHECK:STDOUT: %Cpp: <namespace> = namespace file.%cpp_import.loc4, [template] {}
// CHECK:STDOUT: %Cpp: <namespace> = namespace file.%import_cpp.loc4, [template] {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location is interesting, considering they are all being merged as you said. I'm not sure what that will do for diagnostics, if it will be hard to know which import is bringing in a clang error? Or maybe that is clear in diagnostics but we just don't have them all listed in semir? Then that might make tooling difficult though.

I think this is okay for now, but maybe we can think about improvements if this is a pain. Perhaps we could have multiple %Cpp imports and they'd get .xyz suffixes in semir.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how all imported namespaces behave -- C++ isn't any different here. i.e., you should expect the same result from:

import Foo library "bar";
import Foo library "baz";

The name Foo can have only one location associated with it.

When it comes to specific names, we can have more specific locations. We probably won't have that short-term for C++ due to complexities involves, and how #include semantics need to take priority. But a single name -- whether Cpp, Foo, or something else -- has a single location in SemIR.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the instruction model choice here, maybe you can fill out some detail for me?

At present, it looks like each import Cpp line gets its own instruction. Then, these are combined into a single namespace. The import appears to only be associated with the first import Cpp, is that correct?

Had you considered instead having the Cpp namespace backed by a single instruction, and having that linked to a structure that stores the information about the Cpp data as part of SemIR::File?

Some of the significance for me is in how name references work. I'd expect SemIR for something like Cpp.printf to read a bit like:

%import_cpp = import_cpp
%printf.ref: type = name_ref printf, %import_cpp.printf [template = constants.%printf]

Note here that only one ImportCppDecl can actually end up in name lookup, and this is akin to how we use ImportDecl when there are multiple imports from the same package: the instruction's argument is a package, and internally it decides how to split name references across libraries.

toolchain/sem_ir/typed_insts.h Outdated Show resolved Hide resolved
@@ -26,7 +26,7 @@ import Cpp library "file2.h";
// CHECK:STDOUT: --- multiple_imports.carbon
// CHECK:STDOUT:
// CHECK:STDOUT: imports {
// CHECK:STDOUT: %Cpp: <namespace> = namespace file.%cpp_import.loc4, [template] {}
// CHECK:STDOUT: %Cpp: <namespace> = namespace file.%import_cpp.loc4, [template] {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how all imported namespaces behave -- C++ isn't any different here. i.e., you should expect the same result from:

import Foo library "bar";
import Foo library "baz";

The name Foo can have only one location associated with it.

When it comes to specific names, we can have more specific locations. We probably won't have that short-term for C++ due to complexities involves, and how #include semantics need to take priority. But a single name -- whether Cpp, Foo, or something else -- has a single location in SemIR.

// CHECK:STDOUT: package: <namespace> = namespace [template] {
// CHECK:STDOUT: .Cpp = imports.%Cpp
// CHECK:STDOUT: }
// CHECK:STDOUT: import_cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImportDecl and ImportCppDecl have separate printing behavior. When you see %Core.import = import Core, it's declaring an ImportDecl with Core as an argument, and assigning a name as %Core.import. Right now CppImport doesn't have any arguments, so nothing on the RHS should be expected.

@bricknerb In formatter.cpp, if you copy auto FormatInstLHS(InstId inst_id, ImportDecl /*inst*/) -> void to add a similar overload for ImportCppDecl, what's the result? I think you might want to aim for something like %Cpp = import_cpp, which is how we indicate the instruction has a name.

@jonmeow
Copy link
Contributor

jonmeow commented Feb 3, 2025

Note, to try to build some contrast for how imports from a package work...

// --- foo.carbon

package Foo library "bar";

// --- baz.carbon

package Foo library "baz";

class C {}

// --- wiz.carbon

import Foo library "bar";
import Foo library "baz";

var x: Foo.C;

We get the detail of the imported libraries inside the import data:

// CHECK:STDOUT: imports {
// CHECK:STDOUT:   %Foo: <namespace> = namespace file.%Foo.import, [template] {
// CHECK:STDOUT:     .C = %Foo.C
// CHECK:STDOUT:     import Foo//bar
// CHECK:STDOUT:     import Foo//baz
// CHECK:STDOUT:   }

At the file level, they're combined:

// CHECK:STDOUT: file {
// CHECK:STDOUT:   package: <namespace> = namespace [template] {
// CHECK:STDOUT:     .Foo = imports.%Foo
// CHECK:STDOUT:     .x = %x
// CHECK:STDOUT:   }
// CHECK:STDOUT:   %Foo.import = import Foo

I think here the imports is gotten by adding to import_ref_ids() -- see import.cpp's AddNamespace, you might want to see if you can take advantage of that rather than writing a copy.

@jonmeow
Copy link
Contributor

jonmeow commented Feb 3, 2025

Had you considered instead having the Cpp namespace backed by a single instruction, and having that linked to a structure that stores the information about the Cpp data as part of SemIR::File?

BTW, to correct this and connect to the cross-package example... I think a good goal would be something like:

// CHECK:STDOUT: imports {
// CHECK:STDOUT:   %Cpp: <namespace> = namespace file.%Cpp.import_cpp, [template] {
// CHECK:STDOUT:     import_cpp "bar.h"
// CHECK:STDOUT:     import_cpp "baz.h"
// CHECK:STDOUT:   }
// CHECK:STDOUT: file {
// CHECK:STDOUT:   package: <namespace> = namespace [template] {
// CHECK:STDOUT:     .Cpp = imports.%Cpp
// CHECK:STDOUT:     .x = %x
// CHECK:STDOUT:   }
// CHECK:STDOUT:   %Cpp.import_cpp = import_cpp

Comment on lines 83 to 84
static auto AddNamespace(Context& context, IdentifierId cpp_package_id,
SemIR::InstId first_import_decl_id) -> void {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had you tried something like using import.cpp's AddNamespace, i.e.:

AddNamespace(context, namespace_type_id, name_id, SemIR::NameScopeId::Package, /*diagnose_duplicate_namespace=*/true, [&] { return first_import_decl_id; });

What was the difference between that and what you're trying to do here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it further, but I still believe you're right that there are a lot of similarities and that's why I have a TODO extract and deduplicate the common logic.
Once we agree on the functionality, I can do this TODO as a follow up PR or as part of this PR.

@@ -705,6 +705,10 @@ auto InstNamer::CollectNamesInBlock(ScopeId top_scope_id,
add_inst_name(out.TakeStr());
continue;
}
case ImportCppDecl::Kind: {
add_inst_name("import_cpp");
Copy link
Contributor

@jonmeow jonmeow Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
add_inst_name("import_cpp");
add_inst_name("Cpp.import_cpp");

We're typically trying to reflect the identifier as part of instruction names so that the parallels with code are easier to see. How about this, which is close to the ImportDecl format?

(noting this from the %import_cpp = import_cpp reply, this should make it %Cpp.import_cpp = import_cpp)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

… one for the `Cpp` namespace.

Create a new `ImportCpp` struct that contains information per `import Cpp`, and store all instances in `File`.
Format `ImportCppDecl` by printing all of the file's `ImportCpp`.
Name `ImportCppDecl` as `"Cpp.import_cpp"` instead of `"import_cpp"`.
@bricknerb
Copy link
Contributor Author

Had you considered instead having the Cpp namespace backed by a single instruction, and having that linked to a structure that stores the information about the Cpp data as part of SemIR::File?

BTW, to correct this and connect to the cross-package example... I think a good goal would be something like:

// CHECK:STDOUT: imports {
// CHECK:STDOUT:   %Cpp: <namespace> = namespace file.%Cpp.import_cpp, [template] {
// CHECK:STDOUT:     import_cpp "bar.h"
// CHECK:STDOUT:     import_cpp "baz.h"
// CHECK:STDOUT:   }
// CHECK:STDOUT: file {
// CHECK:STDOUT:   package: <namespace> = namespace [template] {
// CHECK:STDOUT:     .Cpp = imports.%Cpp
// CHECK:STDOUT:     .x = %x
// CHECK:STDOUT:   }
// CHECK:STDOUT:   %Cpp.import_cpp = import_cpp

Thanks Jon!

I've changed the logic to have a single ImportCppDecl instruction and have that is pointed from the imported namespace.
This instruction is in file and I added metadata in File for the list of cpp files imported and format them as part of that (so under file {} and not under imports {}.
WDYT?

@bricknerb bricknerb requested a review from jonmeow February 5, 2025 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants