diff --git a/proposals/p4682.md b/proposals/p4682.md new file mode 100644 index 000000000000..c7a0f4952e11 --- /dev/null +++ b/proposals/p4682.md @@ -0,0 +1,520 @@ +# Array forwards to the prelude + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/4682) + + + +## Table of contents + +- [Abstract](#abstract) +- [Problem](#problem) +- [Background](#background) + - [Rust](#rust) + - [Swift](#swift) + - [Safe C++](#safe-c) + - [Goals](#goals) + - [Privileging the most common type names](#privileging-the-most-common-type-names) + - [Absence of syntax should make clear defaults](#absence-of-syntax-should-make-clear-defaults) + - [Avoiding confusion with other languages](#avoiding-confusion-with-other-languages) + - [Avoiding confusion with other domains](#avoiding-confusion-with-other-domains) +- [Proposal](#proposal) +- [Rationale](#rationale) +- [Future work](#future-work) + - [Automatically importing names from the `prelude` into file scope](#automatically-importing-names-from-the-prelude-into-file-scope) + - [Namespacing the `Core` package.](#namespacing-the-core-package) +- [Alternatives considered](#alternatives-considered) + - [`[T; N]`](#t-n) + - [`array [T; N]`](#array-t-n) + - [`Core.Array(T, N)`](#corearrayt-n) + - [`array(T, N)`](#arrayt-n) + + + +## Abstract + +We propose to add `Core.Array(T, N)` as a library type in the `prelude` library +of the `Core` package. Since arrays are a very frequent type, we propose to +privilege use of this type by providing a builtin `Array(T, N)` type that +resolves to the `Core.Array(T, N)` type. Users can model this as an implicit +import of the `Core.Array(T, N)` type into the file scope, much like the +implicit import of the `prelude` library of the `Core` package. 
+ +## Problem + +Carbon's current syntax for a fixed-size, direct storage array (hereafter called +"array") is the provisional `[T; N]`, and there is no syntax yet for a +mutably-sized indirect storage buffer (hereafter called "heap-buffer"). + +Arrays and heap-buffers are some of the most commonly used types, after +primitive types. The syntax, whatever it is, will be incredibly frequent in +Carbon source code. + +We explore and propose a new syntax for arrays that addresses design issues with +the provisional syntax and allows each of the following to be written clearly: +slice, compile-time sized slice, array, and pointer to array. It also +leaves clear room for a sibling indirect-storage type. + +## Background + +We have developed a matrix for enumerating and describing the vocabulary of +owning array and buffer types. Direct refers to an in-place storage buffer, as +with arrays. Indirect refers to heap allocation, where the type itself stores +a pointer to the buffer, as with heap-buffers. + +To provide familiarity, here is the table for the C++ language as a baseline: + +| Owning type | Runtime Sized | Compile-time Sized | +| ------------------------ | ---------------------- | --------------------------- | +| Direct, Immutable Size | - | `T[N]` / `std::array<T, N>` | +| Indirect, Immutable Size | `std::unique_ptr<T[]>` | `std::unique_ptr<T[N]>` | +| Indirect, Mutable Size | `std::vector<T>` | - | + +### Rust + +The Rust vocabulary is as follows: + +| Owning type | Runtime Sized | Compile-time Sized | +| ------------------------ | ------------- | ------------------ | +| Direct, Immutable Size | - | `[T; N]` | +| Indirect, Immutable Size | `Box<[T]>` | `Box<[T; N]>` | +| Indirect, Mutable Size | `Vec<T>` | - | + +There are a few things of note when comparing to C++: + +- The Rust `Box` and `Vec` types are part of `std` but are imported into the + current scope automatically, so they do not need any prefix. +- The `[T]` type represents a fixed-runtime-size buffer. 
The type itself is + not instantiable since its size is not known at compile time. `Box` is + specialized for this type so that the box itself stores the runtime size. +- The array type syntax matches the Carbon provisional syntax. +- The heap-buffer type name matches the C++ `vector` type, but it is + privileged with a shorter name. The `Vec<T>` type name is at most the same + length as an array type name (for the same `T`). + +### Swift + +The Swift vocabulary is significantly smaller, to support automatic refcounting: + +| Owning type | Runtime Sized | Compile-time Sized | +| ------------------------ | ------------------ | ------------------ | +| Direct, Immutable Size | - | - | +| Indirect, Immutable Size | - | - | +| Indirect, Mutable Size | `Array<T>` / `[T]` | - | + +Because there is no direct storage option, only one name is needed, and "Array" +is used to refer to a heap-buffer. + +There is +[a recent proposal](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0453-vector.md) +to add a direct-storage immutably sized type. Because "Array" is already taken, the +original proposal called this new type "Vector" in reference to mathematical +vectors. The choice of name was +[heavily discussed](https://forums.swift.org/t/second-review-se-0453-vector-a-fixed-size-array/76412), +however, due to the confusion with C++'s `std::vector` and Rust's +`std::vec::Vec`, and the type has been +[provisionally renamed to `Slab`](https://github.com/swiftlang/swift/pull/76438). 
+ +### Safe C++ + +The [Safe C++ proposal](https://safecpp.org/draft.html#tuples-arrays-and-slices) +introduces array syntax very similar to Rust: + +| Owning type | Runtime Sized | Compile-time Sized | +| ------------------------ | --------------------- | ------------------- | +| Direct, Immutable Size | - | `[T; N]` | +| Indirect, Immutable Size | `std2::box<[T; dyn]>` | `std2::box<[T; N]>` | +| Indirect, Mutable Size | `std2::vector<T>` | - | + +There are a few things of note: + +- While Rust omits a size to indicate the size is known only at runtime, Safe + C++ uses a `dyn` keyword to indicate the same. +- The heap-buffer type name is unchanged from C++, sticking with `vector`. + +### Goals + +It will help to establish some goals to weigh alternatives against. +These goals are based on the +[open discussion from 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?usp=sharing&resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA#heading=h.h0tg34pzq5yz), +where we discussed the +[Pointers, Arrays, Slices](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?usp=sharing) +document. + +The goals here are largely informed by, and try to achieve, the top-level goal +of +["Code that is easy to read, understand, and write"](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write). +We define some more specific targets here as they relate to the +array syntax. + +#### Privileging the most common type names + +- "Explicitness must be balanced against conciseness, as verbosity and + ceremony add cognitive overhead for the reader, while explicitness reduces + the amount of outside context the reader must have or assume." + +The more common it will be for a type to be used, the shorter we would like the +name to be. This follows from the presumption that we weigh conciseness as +increasingly valuable for types that will appear more frequently in Carbon code. 
+ +We expect the ordering of frequency in Carbon code to be: + +- primitives ≈ tuples >> heap-buffers > arrays >> everything else[^1]. + +Where primitives are: machine-sized integers (8 bit, 16 bit, etc.), +machine-sized floating points, and pointers including slices[^2]. Function +parameters/arguments are an example of tuples. + +From this, we derive that we want: + +- Primitives and tuples to have the most concise names. + - We can lean on special syntax/keywords as needed to make them concise + but descriptive. +- Heap-buffers to have a concise name, even more so than arrays. + - We could use special syntax if needed to achieve conciseness. +- Arrays to have a concise name, but they do not need to be comparably concise + to primitives and tuples. + - We should try to avoid special syntax. +- Everything else should be written as idiomatic types with descriptive names. + +[^1]: + "[chandlerc] Prioritize: slices first, then resizable storage, then + compile-time sized storage, then everything else is vastly less common. + Between those three, the difference in frequency between the first two is + the biggest." from + [open discussion on 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA&tab=t.0) + +[^2]: + Slices are included with primitives for simplicity, since they will take the + place of many pointers in C++, giving them similar frequency to pointers, + and can be logically thought of as a bounded pointer. + +#### Absence of syntax should make clear defaults + +One way to write arrays and compile-time-sized slices is like we see in Rust: +`[T; N]` and `&[T; N]`. This suggests a relationship where array is like slice, +and the default form. But they are very different types, rather than a +modification of a single type, and this can be confusing[^3] for developers +learning the language. 
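To make the distinction above concrete, here is a minimal Rust sketch showing that `[T; N]`, `&[T; N]`, and `&[T]` are three different kinds of type despite their similar spelling. The helper function names are hypothetical and exist only for illustration; note that Rust itself calls `&[T; N]` a reference to an array, which this document treats as a compile-time-sized slice.

```rust
use std::mem::size_of;

/// Bytes occupied by an array value: the four elements are stored in place.
fn array_bytes() -> usize {
    size_of::<[u32; 4]>()
}

/// Bytes occupied by a reference to that array: a single pointer,
/// because the element count is part of the type.
fn array_ref_bytes() -> usize {
    size_of::<&[u32; 4]>()
}

/// Bytes occupied by a slice: a pointer plus a runtime element count.
fn slice_bytes() -> usize {
    size_of::<&[u32]>()
}

fn main() {
    let storage: [u32; 4] = [1, 2, 3, 4]; // direct storage, owns its elements
    let fixed: &[u32; 4] = &storage; // borrows; size known at compile time
    let slice: &[u32] = &storage[1..3]; // borrows; size known only at runtime

    println!(
        "array: {} bytes, array reference: {} bytes, slice: {} bytes",
        array_bytes(),
        array_ref_bytes(),
        slice_bytes()
    );
    println!("fixed has {} elements, slice has {}", fixed.len(), slice.len());
}
```

Dropping the `&` from `&[u32; 4]` turns a one-pointer reference into sixteen bytes of owned storage, which is a much larger change than a default/modifier relationship would suggest.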
+ +[^3]: https://fire.asta.lgbt/notes/a1iay7r3e7or0a59 (content-warning: swearing) + +We want to avoid the situation where +[absence of syntax](https://www.youtube.com/watch?v=-Hb-9TUyjoo), such as a +missing pointer indicator, changes the entire meaning of the remaining syntax or +is otherwise confusing. + +#### Avoiding confusion with other languages + +The most general meaning of "array" is a range of consecutive values in memory. + +However, in many languages it is used, either formally or informally, to refer +to a direct-storage, immutably-sized memory range: + +- C, + [colloquial](https://en.wikibooks.org/wiki/C_Programming/Arrays_and_strings) +- C++, colloquial (from C) and + [`std::array`](https://en.cppreference.com/w/cpp/container/array) +- Go, [colloquial](https://go.dev/tour/moretypes/6) +- Rust, [colloquial](https://doc.rust-lang.org/std/primitive.array.html)[^4] + +In particular, this is the usage in the languages with which Carbon will most +frequently interoperate, and/or from which code will be migrated to Carbon, and +thus comments and variable names would use these terms in this way. + +[^4]: + Maybe this is more formal than colloquial, but the name is not part of the + typename/syntax. + +Languages which require shared ownership _don't have direct-storage arrays_, so +the same term gets used for indirect storage: + +- Swift, [`Array`](https://developer.apple.com/documentation/swift/array) +- JavaScript, + [`Array`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array) +- Java and Kotlin, + [`ArrayList`](https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html) + +And some languages use array to refer to both direct and indirect storage types. + +- Dlang has direct-storage arrays + [colloquially](https://dlang.org/spec/arrays.html) and the indirect-storage + [`Array`](https://dlang.org/phobos/std_container_array.html) type. 
+- Pascal uses the presence or absence of a size to determine if + [`Array`](https://www.freepascal.org/docs-html/ref/refsu14.html) uses direct + (immutably-sized) or indirect (mutably-sized) storage. + +In sum, languages which have direct-storage immutably-sized arrays use the term +"array" to refer to those, and most then use a separate name for the +indirect-storage type. + +#### Avoiding confusion with other domains + +The term "vector" in mathematics refers to a fixed-size set of numbers. This +leads to confusion with the C++ type `std::vector` since it holds a +mutably-sized set of values. Developers coming from other domains must learn a +new and contradictory term of art. The Rust language chose naming that derives +from C++, with `std::vec::Vec`. + +These type names conflict with names in mathematical and graphics libraries, +which want to use vector in its mathematical sense. In Rust this leads to `Vec` +for a mutably-sized array, and +[`Vec3`](https://docs.rs/bevy/latest/bevy/prelude/struct.Vec3.html), +[`Vec4`](https://docs.rs/bevy/latest/bevy/prelude/struct.Vec4.html), and so on +for fixed-size mathematical vectors. While not fatal, this does create ambiguity +that must be overcome by developers. + +## Proposal + +The +[All APIs are library APIs principle](/docs/project/principles/library_apis_only.md) +states: + +> In Carbon, every public function is declared in some Carbon API file. + +As such, we propose a `Core` library type for a direct-storage immutably-sized +array, and then a shorthand for referring to that library type. + +In line with other languages surveyed above, given the presence of a +direct-storage immutably-sized array in Carbon, we will reserve the unqualified +name "array" for this type. In full, its name is `Core.Array(T, N)`, where `T` +is the type of elements in the array, and `N` is the number of elements. 
Notably +this leaves room for supporting multi-dimensional arrays by adding further +optional size parameters, either in the `Array` type or in a similar sibling +type. + +Here is a provisional vocabulary table to compare with other languages: + +| Owning type | Runtime Sized | Compile-time Sized | +| ------------------------ | ------------- | ----------------------- | +| Direct, Immutable Size | - | `Core.Array(T, N)` | +| Indirect, Immutable Size | ? | `Core.Box(Array(T, N))` | +| Indirect, Mutable Size | `Core.Buf(T)` | - | + +Carbon does not have proposed names for heap allocated storage, so we use some +placeholders here, in order to show where `Array` fits into the picture: + +- `Box(T)` for a heap-allocated `T` value. +- `Buf(T)` for a heap-buffer of `T` values. + +An indirect, immutably-sized buffer does not have a clearly expressible syntax +at the moment. `Box([T])` is the closest fit with the current provisional syntax +for slices. But `[T]` is a sized pointer, which would make this type a +heap-allocated sized pointer, rather than a heap-allocated fixed-size array. +This is in contrast with Rust where `&[T]` is a slice, and thus `[T]` is a +fixed-size buffer, so it then follows that `Box<[T]>` is a heap-allocated +fixed-size buffer. + +Because arrays will be very common in Carbon code, we want to privilege their +usage. There are at least two ways in which we can do so. The first is to +include them in the `prelude` library of the `Core` package. This ensures they +are available in every Carbon file as `Core.Array`. The second is by making the +type available through a shorthand without going through the `Core` package +name. Here we propose to do the former, and leave the latter to +[future work](#future-work). 
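Since the Carbon names in the vocabulary table are still placeholders, the rows can be illustrated with Rust's existing types from the Background section. This is a minimal sketch for comparison only, not a statement about how `Core.Array` will be implemented; the function names and the element count of 64 are hypothetical.

```rust
use std::mem::size_of;

/// Direct, immutable size: all 64 elements live inside the value itself.
fn direct() -> [u8; 64] {
    [0u8; 64]
}

/// Indirect, immutable size, compile-time sized: one pointer to a heap array.
fn boxed_array() -> Box<[u8; 64]> {
    Box::new([0u8; 64])
}

/// Indirect, immutable size, runtime sized: the length is fixed at creation.
fn boxed_slice(len: usize) -> Box<[u8]> {
    vec![0u8; len].into_boxed_slice()
}

/// Indirect, mutable size: a heap-buffer that can grow after creation.
fn heap_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

fn main() {
    let array = direct();
    let boxed = boxed_array();
    let fixed = boxed_slice(5);
    let mut buffer = heap_buffer(5);
    buffer.push(1); // only the heap-buffer can change its element count

    println!(
        "array: {} elements, {} bytes in place",
        array.len(),
        size_of::<[u8; 64]>()
    );
    println!(
        "boxed array: {} elements, {} byte handle",
        boxed.len(),
        size_of::<Box<[u8; 64]>>()
    );
    println!("boxed slice: {} elements", fixed.len());
    println!("heap-buffer: {} elements", buffer.len());
}
```

The direct-storage row is the only one whose in-place size scales with the element count, which is the property the unqualified name "array" is reserved for in this proposal.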
+ +## Rationale + +As this proposal is addressing the question of introducing a new `prelude` +library type in `Core`, it is mostly focused on the goal +[Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write). + +This proposal aims to make code easy to understand by using a name that is +consistent across systems programming languages, and avoiding names that have +conflicting meanings. It also uses a standard type syntax, with a type in the +`Core` package, making the type and its documentation maximally discoverable +without requiring special-casing. + +We introduced some more specific sub-goals above: + +1. Privileging the most common type names + +This proposal privileges `Core.Array` as it will appear frequently in code, by +placing it in the `prelude` library. This avoids the need for developers to +`import` another `Core` library in order to access the type. + +While we might argue that `Core.Array` should be further privileged with a +shorthand that avoids any scope prefixes, due to the expected high frequency of +arrays in Carbon code, we leave that to [future work](#future-work) for a more +general feature. + +In this proposal, we avoid introducing additional syntax (such as with `[T; N]` +or `(1, 2)`) or breaking naming rules (such as with a lowercase type name) +because the frequency of use of arrays will be much lower than that of +primitives and tuples. + +2. Absence of syntax should make clear defaults + +We introduce a type name rather than making arrays look more like slices but +without being a pointer, in order to avoid the confusion raised when removing +syntax changes the meaning significantly, and especially in ways that differ +from defaults/options for a single language concept. + +3. Avoiding confusion with other languages + +We propose using the `Array` type name in line with how other languages use the +same term. 
When a direct-storage array type is part of the language, it's +consistently referred to as an "array" without qualifications. + +Most importantly, the name is consistent with the meaning in C++ and its +standard library (`std::array`) as well as with Rust, the languages with which +we expect Carbon code to interact the most. + +4. Avoiding confusion with other domains + +The name `Vector` is a possible choice for a fixed-length set of values, due to +its mathematical meaning, as was originally proposed for the direct-storage +immutably-sized array type in Swift. However, any use of the name `Vector` in a +core systems programming language construct is fraught. The name is liable to be +confused either with a mathematical vector or with a C++ `std::vector`. We +avoid the confusion by avoiding this name. + +## Future work + +### Automatically importing names from the `prelude` into file scope + +We stated above that we want "arrays to have a concise name, but they do not +need to be comparably concise to primitives and tuples". As such, accessing the +`Array` type without naming `Core` may be a productive option for Carbon +developers, both for writing and reading Carbon code. + +The act of importing the `Array` type from `Core` into every Carbon file scope +has some challenges as a one-off exception, and thus could use a more general +and robust follow-on proposal. The challenges identified here are: + +- `Array` is only one name from the `prelude` that may be productive to + automatically import into the file scope. There are also traits in the + `prelude` that are commonly used, such as `Core.As`. We would like to see a + design that allows a clear way to specify what names are imported into the + file scope and allows future names to be added to that set in a + clearly-specified way. + + For comparison, this is done in Rust by importing all names in the + [`std::prelude` module](https://doc.rust-lang.org/stable/std/prelude/index.html) + into the file scope. 
The names in the `prelude` are + [aliases to](https://doc.rust-lang.org/stable/std/prelude/v1/index.html) + other parts of the standard library. Each edition of Rust has its + [own `prelude` sub-module](https://doc.rust-lang.org/stable/core/prelude/rust_2021/index.html) + that re-exports the previous prelude and adds any new symbols. In a similar + way, a namespace in Carbon's `prelude` library, or a separate library, could + be used to specify aliases that are imported into the file scope. + +- Developers may want to opt out of importing names from the `prelude` into + the file scope if they'd like to use those names for themselves. We should + consider if and how to provide a mechanism to opt out of the import of any + `prelude` names (including `Core`) into the file scope. + +- We should specify more general criteria for considering what names to + import from the `prelude` into the file scope automatically. This proposal + states that the frequency-of-use of a name makes a good metric, but doesn't + give a clearer rule that we could measure names against. Rust uses a + metric that is also based on frequency-of-use for its automatically-imported + `prelude`: + + > The prelude is the list of things that Rust automatically imports into + > every Rust program. It’s kept as small as possible, and is focused on + > things, particularly traits, which are used in almost every single Rust + > program. + > + > _-- From + > [std::prelude](https://doc.rust-lang.org/stable/std/prelude/index.html)_ + +- Given such a feature, we could consider moving `i8`, `bool`, etc. to be + aliases in the `prelude` that are in the set of names automatically imported + into the file scope. + +This feature would aim to make code easy to understand by having the names in +the file scope match exactly with the names in the `Core` library, minus the +scope prefixes. 
This choice to make the shorthands be an implicit `import` of +the type into the file scope matches the existing implicit `import` of the +`prelude` library of the `Core` package, something developers will already need +to model in their minds when reading Carbon code. + +Ultimately, it would be an implementation detail whether this is done through an +import of a part of `Core`, or through the compiler's builtin mechanisms. The +mechanism for opting out of the import could be expected to influence the way +this feature would be implemented. + +By having such shorthands use the same names as the library, we make the +shorthands the least magical they could possibly be, while still maintaining +their primary benefit of conciseness. + +### Namespacing the `Core` package. + +At this time, the `Core` package remains small, but there will come a time when +the names within need to be split into smaller namespaces. Then the name +`Core.Array`, among others, will become longer and the effect of privileging the +name through importing it into the file scope will become more pronounced. At +this time, we don't propose to put `Array` into a namespace in `Core` as there's +no such existing structure to point to yet. + +## Alternatives considered + +### `[T; N]` + +This is the current syntax used by the toolchain; however, the following +problems were raised with it: + +- It's very similar to the syntax for slices, which is `[T]`, but very + different in nature, being storage instead of a reference to storage. +- Given `[T]` is a slice, `[T; N]` would better suit a compile-time-sized + slice. + +The syntax for a slice may also be changed; we discussed +[adding a pointer annotation](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?tab=t.0#heading=h.fahgww8db6f0) +to it, such as `[T]*` and `[T; N]*`. Some downsides remained: + +- The `[T; N]*` syntax would be a fixed-size slice, rather than a pointer to + an array. 
This leaves no room for writing a pointer to an array, which can + indicate a different intent: that it always includes the full memory range + of the array. Without this distinction, we can't model both + `std::span<T, N>` and `std::array<T, N>*` in code migrated from C++ to + Carbon and would need to collapse these to a single type. +- Removing the pointer annotation would change the meaning of the type + expression more than we'd like, since it would change from a slice into an + array, rather than pointer-to-an-array into an array. + +### `array [T; N]` + +This introduces a keyword as a modifier of a fixed-size slice, rather than a +builtin forwarding type. While arrays will be very common, it's not clear that +they rise to the level of requiring breaking the language's naming rules (using a +lowercase name) in order to provide a shorthand. And the shorthand is longer in +the end than the `Array(T, N)` being proposed here. So this uses a larger +weirdness budget for privileging the type while achieving less conciseness. + +This has a similar issue to `[T; N]`, but in reverse. Removing the +`array` modifier keyword changes the meaning of the type expression in ways that +are larger than a default/modifier relationship. Fixed-size slices are not the +more-default array. + +The use of a lowercase keyword also costs us by preventing users from using the +word `array` as a variable name, which is quite common. + +### `Core.Array(T, N)` + +Providing just the library type is possible, but arrays will be one of the most +common types in Carbon code, as described earlier. Privileging them with a +shorthand that avoids `Core.` will help make Carbon code significantly more +concise, due to the frequency, without hurting understandability. This makes it +worth the tradeoff of putting a name into the file scope (by way of a builtin +type). + +### `array(T, N)` + +This is very similar to the current proposal, just using a lowercase name for +the type name. 
This would break the language rules without making the result any +more concise for developers. It could highlight that it is a builtin type, but +we argued earlier that this is an implementation detail. Since developers have +to model the implicit import of `Core`'s `prelude` library already, modelling +the implicit import of `Core.Array` as `Array` will be at least as +straightforward. Using a name consistent with the language naming rules (with a +leading capital letter) is preferable in the absence of any strong benefit to +breaking the rules, which we don't see here. And since the frequency of use will +be lower than that of primitives and tuples, the amount of rule-breaking budget +for privileging the type is lower.