Skip to content

Commit

Permalink
Remove Parse::Node, add ElementIndex in docs for typed insts (#4604)
Browse files Browse the repository at this point in the history
The documentation still referred to typed instructions having a
Parse::Node field, however that was removed and moved to the InstStore
in f197219.

Then GetParseNode() was renamed to GetNodeId() in 86a7c9f and
then GetLocationId() in b079acd and finally GetLocId() in
b5d28f2.

The comment in typed_inst.h mentions only three fields now, but some
types still have four, thanks to the unmentioned `ElementIndex index`
field. Normally this field comes last, after the `[...]Id` fields except
for in one case, AssociatedEntity. Rather than write ambiguously ordered
documentation, update the comment to and docs to say that the
ElementIndex comes last, and move it to the last position in
AssociatedEntity. Tests are rebased accordingly.

---------

Co-authored-by: Richard Smith <[email protected]>
  • Loading branch information
danakj and zygoloid authored Dec 3, 2024
1 parent b705be9 commit d434828
Show file tree
Hide file tree
Showing 3 changed files with 38 additions and 26 deletions.
44 changes: 24 additions & 20 deletions toolchain/docs/check.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,37 +84,41 @@ instruction, as described in
[sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) (also see
[adding features for Check](adding_features.md#check)):

- A `Parse::Node parse_node;` member that tracks its location is present on
almost all instructions, except instructions like `SemIR::Builtin` that
don't have an associated location.
- An `InstKind kind;` member if the instruction has a `Kinds` constant making
it a shorthand for multiple individual instructions.

- A `SemIR::TypeId type_id;` member that describes the type of the instruction
is present on all instructions that produce a value. This includes namespace
instructions, which are modeled as producing a value of "namespace" type,
even though they can't be used as a first-class value in Carbon expressions.

- Up to two additional, kind-specific members. For example `SemIR::Assign` has
members `InstId lhs_id` and `InstId rhs_id`.
- Up to two additional [kind-specific members](#instruction-operands). For
example `SemIR::Assign` has members `InstId lhs_id` and `InstId rhs_id`. And
`SemIR::ClassElementAccess` has an `ElementIndex` into the fields of the
class to define which field it is accessing.

Instructions are stored as type-erased `SemIR::Inst` objects, which store the
instruction kind and the (up to) four fields described above. This balances the
size of `SemIR::Inst` against the overhead of indirection.
size of `SemIR::Inst` against the overhead of indirection. All instructions have
with them a `LocId` that tracks their source location, and is stored separately
on the `InstStore` and retrieved by `GetLocId()`.

A `SemIR::InstBlock` can represent a code block. However, it can also be created
when a series of instructions needs to be closely associated, such as a
parameter list.
A `SemIR::InstBlockId`, used to index into `SemIR::InstBlockStore`, can
represent a code block. However, it can also be created when a series of
instructions needs to be closely associated, such as a parameter list.

A `SemIR::Builtin` represents a language built-in, such as the unconstrained
facet type `type`. We will also have built-in functions which would need to form
the implementation of some library types, such as `i32`. Built-ins are in a
stable index across `SemIR` instances.
A number of instruction types in
[sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) are builtin
instructions, such as `SemIR::TypeType` which represents the unconstrained facet
type `type`. Builtins have stable ids in the `SemIR::InstStore` across `SemIR`
instances.

### Instruction operands

The kind-specific members on a typed instruction struct can be of any type
listed in the `SemIR::IdKind` enumeration defined in
[sem_ir/id_kind.h](/toolchain/sem_ir/id_kind.h). The most commonly used kinds
refer to other instructions:
[sem_ir/id_kind.h](/toolchain/sem_ir/id_kind.h), typically derived from
`IdBase`. The most commonly used kinds refer to other instructions:

- `SemIR::InstId`: Refers to a specific instance of an instruction. For
example, this should be used if the operand may have side-effects or a
Expand All @@ -132,11 +136,11 @@ documentation for the ID type for more details.

### Parameters and arguments

Parameters and arguments will be stored as two `SemIR::InstBlock`s each. The
first will contain the full IR, while the second will contain references to the
last instruction for each parameter or argument. The references block will have
a size equal to the number of parameters or arguments, allowing for quick size
comparisons and indexed access.
Parameters and arguments will be stored as two blocks (referred to by
`SemIR::InstBlockId`) each. The first will contain the full IR, while the second
will contain references to the last instruction for each parameter or argument.
The references block will have a size equal to the number of parameters or
arguments, allowing for quick size comparisons and indexed access.

## SemIR textual format

Expand Down
5 changes: 5 additions & 0 deletions toolchain/sem_ir/id_kind.h
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,11 @@ class TypeEnum {
};

// An enum of all the ID types used as instruction operands.
//
// As instruction operands, the types listed here can appear as fields of typed
// instructions (`toolchain/sem_ir/typed_insts.h`) and must implement the
// `FromRaw` and `ToRaw` protocol in `SemIR::Inst`. In most cases this is done
// by inheriting from `IdBase` or `IndexBase`.
using IdKind = TypeEnum<
// From base/value_store.h.
IntId, RealId, FloatId, StringLiteralValueId,
Expand Down
15 changes: 9 additions & 6 deletions toolchain/sem_ir/typed_insts.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,10 @@
// - Optionally, a `TypeId type_id;` member, for instructions that produce a
// value. This includes instructions that produce an abstract value, such as a
// `Namespace`, for which a placeholder type should be used.
// - Up to two `[...]Id` members describing the contents of the struct.
// - Up to two members describing the contents of the struct. These are types
// listed in the `SemIR::IdKind` type-enum, typically derived from `IdBase`.
//
// TODO: There are some `int32_t` fields, which breaks the above rule, how/why?
//
// The field names here matter -- the fields must have the names specified
// above, when present. When converting to a `SemIR::Inst`, the `kind` and
Expand All @@ -32,16 +35,16 @@
// enumeration. This `Kind` declaration also defines the instruction kind by
// calling `InstKind::Define` and specifying additional information about the
// instruction kind. This information is available through the member functions
// of the `InstKind` value declared in `inst_kind.h`, and includes the name
// used in textual IR and whether the instruction is a terminator instruction.
// of the `InstKind` value declared in `inst_kind.h`, and includes the name used
// in textual IR and whether the instruction is a terminator instruction.
//
// Struct types can also be provided for categories of instructions with a
// common representation, to allow the common representation to be accessed
// conveniently. In this case, instead of providing a constant `Kind` member,
// the struct should have a constant `InstKind Kinds[];` member that lists the
// kinds of instructions in the category, and an `InstKind kind;` member that
// is used to identify the specific kind of the instruction. Separate struct
// types still need to be defined for each instruction kind in the category.
// kinds of instructions in the category, and an `InstKind kind;` member that is
// used to identify the specific kind of the instruction. Separate struct types
// still need to be defined for each instruction kind in the category.

namespace Carbon::SemIR {

Expand Down

0 comments on commit d434828

Please sign in to comment.