{% if tweet %}
Retweet this article!
diff --git a/src/blog.atom.njk b/src/blog.atom.njk
index 0e22d21d6..2206884f5 100644
--- a/src/blog.atom.njk
+++ b/src/blog.atom.njk
@@ -20,6 +20,11 @@ excludeFromSitemap: true
{{ (post.data.updated or post.date) | rssDate }}{{ absolutePostUrl }}
+ {%- for tag in post.data.tags %}
+ {%- if not 'io' in tag and not 'Node.js' in tag %}
+
+ {%- endif %}
+ {%- endfor %}
{{ post.data.author | markdown | striptags }}
diff --git a/src/blog/control-flow-integrity.md b/src/blog/control-flow-integrity.md
new file mode 100644
index 000000000..763d031fd
--- /dev/null
+++ b/src/blog/control-flow-integrity.md
@@ -0,0 +1,103 @@
+---
+title: 'Control-flow Integrity in V8'
+description: 'This blog post discusses the plans to implement control-flow integrity in V8.'
+author: 'Stephen Röttger'
+date: 2023-10-09
+tags:
+ - security
+---
+Control-flow integrity (CFI) is a security feature that aims to prevent exploits from hijacking control flow. The idea is that even if an attacker manages to corrupt the memory of a process, additional integrity checks can prevent them from executing arbitrary code. In this blog post, we discuss our work to enable CFI in V8.
+
+# Background
+
+The popularity of Chrome makes it a valuable target for 0-day attacks, and most in-the-wild exploits we’ve seen target V8 to gain initial code execution. V8 exploits typically follow a similar pattern: an initial bug leads to memory corruption, but the initial corruption is often limited, so the attacker has to find a way to read and write arbitrarily in the whole address space. This allows them to hijack the control flow and run shellcode that executes the next step of the exploit chain, which will try to break out of the Chrome sandbox.
+
+
+To prevent the attacker from turning memory corruption into shellcode execution, we’re implementing control-flow integrity in V8. This is especially challenging in the presence of a JIT compiler. If you turn data into machine code at runtime, you now need to ensure that corrupted data can’t turn into malicious code. Fortunately, modern hardware features provide us with the building blocks to design a JIT compiler that is robust even while processing corrupted memory.
+
+
+In the following sections, we look at the problem in three separate parts:
+
+- **Forward-Edge CFI** verifies the integrity of indirect control-flow transfers such as function pointer or vtable calls.
+- **Backward-Edge CFI** needs to ensure that return addresses read from the stack are valid.
+- **JIT Memory Integrity** validates all data that is written to executable memory at runtime.
+
+# Forward-Edge CFI
+
+There are two hardware features that we want to use to protect indirect calls and jumps: landing pads and pointer authentication.
+
+
+## Landing Pads
+
+Landing pads are special instructions that can be used to mark valid branch targets. If enabled, indirect branches can only jump to a landing pad instruction; jumping anywhere else raises an exception.
+On ARM64 for example, landing pads are available with the Branch Target Identification (BTI) feature introduced in Armv8.5-A. BTI support is [already enabled](https://bugs.chromium.org/p/chromium/issues/detail?id=1145581) in V8.
+On x64, landing pads were introduced with the Indirect Branch Tracking (IBT) part of the Control Flow Enforcement Technology (CET) feature.
+
+
+However, adding landing pads on all potential targets for indirect branches only provides coarse-grained control-flow integrity and still gives attackers lots of freedom. We can tighten the restrictions further by adding function signature checks (the argument and return types at the call site must match the called function) and by dynamically removing unneeded landing pad instructions at runtime.
+These features are part of the recent [FineIBT proposal](https://arxiv.org/abs/2303.16353), and we hope to see it adopted by operating systems.
+
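The difference between coarse- and fine-grained checks can be sketched as a toy JavaScript model. This is not how V8 or the hardware implements it: the `code` map, the `landingPad` flag (standing in for a BTI or ENDBR64 instruction) and the string `signature` (standing in for the kind of type hash FineIBT checks) are all hypothetical.

```javascript
// Toy model of forward-edge CFI with landing pads (illustration only).
// Keys are fake code addresses; `landingPad: true` stands in for a BTI/ENDBR64
// instruction, and `signature` stands in for a FineIBT-style type hash.
const code = new Map([
  [0x1000, { name: 'square', landingPad: true, signature: '(i32)->i32' }],
  [0x2000, { name: 'freeBuffer', landingPad: true, signature: '(ptr)->void' }],
  [0x1008, { name: 'square+8', landingPad: false, signature: null }],
]);

// Simulates an indirect call to `target` from a call site that expects
// `expectedSignature`. Returns the callee name or throws a "CFI violation".
function indirectCall(target, expectedSignature, { fineGrained }) {
  const insn = code.get(target);
  // Coarse-grained check: the target must start with a landing pad.
  if (!insn || !insn.landingPad) {
    throw new Error(`CFI violation: no landing pad at 0x${target.toString(16)}`);
  }
  // Fine-grained check: call site and callee signatures must match.
  if (fineGrained && insn.signature !== expectedSignature) {
    throw new Error(`CFI violation: signature mismatch calling ${insn.name}`);
  }
  return insn.name;
}

// A corrupted function pointer that now points at freeBuffer:
console.log(indirectCall(0x2000, '(i32)->i32', { fineGrained: false })); // freeBuffer
try {
  indirectCall(0x2000, '(i32)->i32', { fineGrained: true });
} catch (e) {
  console.log(e.message); // CFI violation: signature mismatch calling freeBuffer
}
```

With landing pads alone, the corrupted pointer still reaches a valid but unintended function; the signature check is what rejects it.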
+## Pointer Authentication
+
+Armv8.3-A introduced pointer authentication (PAC), which can be used to embed a signature in the unused upper bits of a pointer. Since the signature is verified before the pointer is used, attackers won’t be able to supply arbitrary forged pointers to indirect branches.
+
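As a rough illustration of the idea, the sketch below signs and authenticates a pointer in JavaScript using `BigInt`. It assumes a 48-bit address space with a 16-bit signature in the upper bits, and its `toyMac` function is a made-up, non-cryptographic mixer; real hardware uses a keyed cipher, a secret key held in system registers, and an implementation-defined number of PAC bits.

```javascript
// Toy model of pointer authentication (illustration only, not secure).
const VA_BITS = 48n;                    // assumed virtual address width
const ADDR_MASK = (1n << VA_BITS) - 1n; // low bits hold the real address
const MASK64 = (1n << 64n) - 1n;

// Made-up keyed mixing function standing in for the hardware's cipher.
function toyMac(key, address, modifier) {
  let h = (key ^ (address * 0x9e3779b97f4a7c15n) ^ (modifier << 7n)) & MASK64;
  h = ((h ^ (h >> 29n)) * 0xbf58476d1ce4e5b9n) & MASK64;
  return (h ^ (h >> 32n)) & 0xffffn;    // keep a 16-bit signature
}

// Like a signing instruction: embed the signature in the unused upper bits.
function sign(key, pointer, modifier) {
  const address = pointer & ADDR_MASK;
  return (toyMac(key, address, modifier) << VA_BITS) | address;
}

// Like an authenticating instruction: recompute and compare before use.
// Here a mismatch throws; on hardware, the failure leads to a fault.
function authenticate(key, signedPointer, modifier) {
  const address = signedPointer & ADDR_MASK;
  if ((signedPointer >> VA_BITS) !== toyMac(key, address, modifier)) {
    throw new Error('pointer authentication failure');
  }
  return address;
}

const key = 0x5eed1234abcdn; // on hardware, the key is not readable by the attacker
const target = 0x7f00deadbeefn;
const signed = sign(key, target, 0n);
console.log(authenticate(key, signed, 0n) === target); // true
```

The `modifier` argument mirrors the context value that PAC mixes into the signature; for return addresses (see PAC-RET below), the stack pointer is a typical choice.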
+# Backward-Edge CFI
+
+To protect return addresses, we also want to make use of two separate hardware features: shadow stacks and PAC.
+
+## Shadow Stacks
+
+With Intel CET’s shadow stacks and the guarded control stack (GCS) in [Armv9.4-A](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022), we can have a separate stack just for return addresses that has hardware protections against malicious writes. These features provide some pretty strong protections against return address overwrites, but we will need to deal with cases where we legitimately modify the return stack such as during optimization / deoptimization and exception handling.
+
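The core check a shadow stack performs on every return can be modeled in a few lines of JavaScript. The `ShadowStackMachine` class is hypothetical: its `stack` array stands in for attacker-writable memory, and its `shadowStack` array stands in for the hardware-protected stack that only call and return instructions can modify.

```javascript
// Toy model of a shadow stack (illustration only).
class ShadowStackMachine {
  constructor() {
    this.stack = [];       // normal stack: writable by a memory-corruption bug
    this.shadowStack = []; // shadow stack: only call()/ret() touch it
  }

  // A call pushes the return address onto both stacks.
  call(returnAddress) {
    this.stack.push(returnAddress);
    this.shadowStack.push(returnAddress);
  }

  // A return pops both and faults if they disagree.
  ret() {
    const returnAddress = this.stack.pop();
    const expected = this.shadowStack.pop();
    if (returnAddress !== expected) {
      throw new Error('control-protection fault: return address mismatch');
    }
    return returnAddress;
  }
}

const machine = new ShadowStackMachine();
machine.call(0x401000);
machine.call(0x402000);
console.log(machine.ret().toString(16)); // 402000
machine.stack[machine.stack.length - 1] = 0xbad; // simulated overwrite
try {
  machine.ret();
} catch (e) {
  console.log(e.message); // control-protection fault: return address mismatch
}
```

In this model, legitimate rewrites of the return stack, such as deoptimization or exception unwinding, would also have to update `shadowStack` consistently.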
+## Pointer Authentication (PAC-RET)
+
+Similar to indirect branches, pointer authentication can be used to sign return addresses before they get pushed to the stack. This is [already enabled](https://bugs.chromium.org/p/chromium/issues/detail?id=919548) in V8 on ARM64 CPUs.
+
+
+A side effect of using hardware support for forward-edge and backward-edge CFI is that it allows us to keep the performance impact to a minimum.
+
+# JIT Memory Integrity
+
+A challenge unique to CFI in JIT compilers is that we need to write machine code to executable memory at runtime. We need to protect this memory so that the JIT compiler is allowed to write to it but the attacker’s memory write primitive isn’t. A naive approach would be to temporarily change the page permissions to add and remove write access. But this is inherently racy, since we need to assume that the attacker can trigger an arbitrary write concurrently from a second thread.
+
+
+## Per-thread Memory Permissions
+
+On modern CPUs, we can have different views of the memory permissions that only apply to the current thread and can be changed quickly in userland.
+On x64 CPUs, this can be achieved with memory protection keys (pkeys), and ARM announced the [permission overlay extensions](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022) in Armv8.9-A.
+This allows us to toggle write access to executable memory at a fine granularity, for example by tagging it with a separate pkey.
+
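The key property is that the permission is per thread: opening a write window on the compiler thread does not make the page writable for an attacker-controlled thread racing with it. The JavaScript sketch below models this with a hypothetical `ThreadState` object whose `jitWritable` flag stands in for a pkey's write-disable bit in the thread's PKRU register; the helper names `writeToJitPage` and `withJitWriteAccess` are also made up, and the critical section is kept short and closed in a `finally` block.

```javascript
// Toy model of per-thread write access to JIT pages (illustration only).
// Each thread has its own permission view, like the PKRU register on x64.
class ThreadState {
  constructor(name) {
    this.name = name;
    this.jitWritable = false; // write access disabled by default
  }
}

const jitPage = new Uint8Array(16); // stands in for executable memory

function writeToJitPage(thread, offset, bytes) {
  if (!thread.jitWritable) {
    throw new Error(`${thread.name}: write to JIT page denied`);
  }
  jitPage.set(bytes, offset);
}

// Short critical section: enable writes for this thread only, then restore.
function withJitWriteAccess(thread, fn) {
  thread.jitWritable = true;
  try {
    return fn();
  } finally {
    thread.jitWritable = false;
  }
}

const compiler = new ThreadState('compiler');
const attacker = new ThreadState('attacker');

withJitWriteAccess(compiler, () => {
  writeToJitPage(compiler, 0, [0x90, 0xc3]); // allowed for this thread
  try {
    writeToJitPage(attacker, 4, [0xcc]);     // still denied for the other thread
  } catch (e) {
    console.log(e.message); // attacker: write to JIT page denied
  }
});
console.log(jitPage[0], jitPage[1], jitPage[4]); // 144 195 0
```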
+
+The JIT pages are no longer writable by the attacker, but the JIT compiler still needs to write generated code into them. In V8, the generated code lives in [AssemblerBuffers](https://source.chromium.org/chromium/chromium/src/+/main:v8/src/codegen/assembler.h;l=255;drc=064b9a7903b793734b6c03a86ee53a2dc85f0f80) on the heap, which the attacker can corrupt instead. We could protect the AssemblerBuffers in the same fashion, but this just shifts the problem. For example, we’d then also need to protect the memory where the pointer to the AssemblerBuffer lives.
+In fact, any code that enables write access to such protected memory constitutes CFI attack surface and needs to be written very defensively. For example, any write through a pointer that comes from unprotected memory is game over, since the attacker can use it to corrupt executable memory. Thus, our design goal is to have as few of these critical sections as possible and to keep the code inside them short and self-contained.
+
+## Control-Flow Validation
+
+If we don’t want to protect all compiler data, we can instead treat it as untrusted from the point of view of CFI. Before writing anything to executable memory, we need to validate that it doesn’t lead to arbitrary control flow. For example, we need to check that the written code doesn’t contain any syscall instructions and doesn’t jump into arbitrary code. Of course, we also need to check that it doesn’t change the pkey permissions of the current thread. Note that we don’t try to prevent the code from corrupting arbitrary memory, since if the code is corrupted, we can assume that the attacker already has this capability.
+To perform such validation safely, we will also need to keep the required metadata in protected memory and protect local variables on the stack.
+We ran some preliminary tests to assess the impact of such validation on performance. Fortunately, the validation does not occur in performance-critical code paths, and we did not observe any regressions in the JetStream or Speedometer benchmarks.
+
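A validation pass of this kind could look roughly like the following JavaScript sketch. The instruction format (`{ op, target }`), the `FORBIDDEN_OPS` set, the `validateCode` helper and the `Builtin_Add` call target are hypothetical simplifications; `syscall` and `wrpkru` (the x64 instruction that writes the PKRU register) are examples of instructions that must not appear in JIT-generated code. A real validator would decode actual machine code and run with its metadata in protected memory.

```javascript
// Toy validator for code about to be copied into executable memory.
const FORBIDDEN_OPS = new Set(['syscall', 'wrpkru']);

// `instructions` is untrusted; returns { ok: true } or { ok: false, reason }.
function validateCode(instructions, allowedCallTargets) {
  for (let i = 0; i < instructions.length; i++) {
    const { op, target } = instructions[i];
    if (FORBIDDEN_OPS.has(op)) {
      return { ok: false, reason: `forbidden instruction ${op} at ${i}` };
    }
    // Local jumps must stay inside the buffer being validated.
    const inBounds = Number.isInteger(target) && target >= 0 && target < instructions.length;
    if (op === 'jmp' && !inBounds) {
      return { ok: false, reason: `jump out of bounds at ${i}` };
    }
    // Calls may only go to known entry points (e.g. builtins).
    if (op === 'call' && !allowedCallTargets.has(target)) {
      return { ok: false, reason: `call to unknown target at ${i}` };
    }
  }
  return { ok: true };
}

const builtins = new Set(['Builtin_Add']);
console.log(validateCode([{ op: 'add' }, { op: 'call', target: 'Builtin_Add' }, { op: 'jmp', target: 0 }], builtins));
// { ok: true }
console.log(validateCode([{ op: 'add' }, { op: 'wrpkru' }], builtins).reason);
// forbidden instruction wrpkru at 1
```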
+# Evaluation
+
+Offensive security research is an essential part of any mitigation design, and we’re continuously trying to find new ways to bypass our protections. Here are some examples of attacks that we think will be possible, along with ideas to address them.
+
+## Corrupted Syscall Arguments
+
+As mentioned before, we assume that an attacker can trigger a memory write primitive concurrently with other running threads. If another thread performs a syscall, some of its arguments could then be attacker-controlled if they’re read from memory. Chrome runs with a restrictive syscall filter, but there are still a few syscalls that could be used to bypass the CFI protections.
+
+
+`sigaction`, for example, is a syscall to register signal handlers. During our research, we found that a `sigaction` call in Chrome is reachable in a CFI-compliant way. Since the arguments are passed in memory, an attacker could trigger this code path and point the signal handler function to arbitrary code. Luckily, this is easy to address: either block the path to the `sigaction` call or block the syscall with a syscall filter after initialization.
+
+
+Other interesting examples are the memory management syscalls. For example, if a thread calls `munmap` on a corrupted pointer, the attacker could unmap read-only pages, and a subsequent `mmap` call could reuse this address, effectively adding write permissions to the page.
+Some OSes already provide protections against this attack with memory sealing: Apple platforms provide the [VM\_FLAGS\_PERMANENT](https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274) flag and OpenBSD has an [mimmutable](https://man.openbsd.org/mimmutable.2) syscall.
+
+## Signal Frame Corruption
+
+When the kernel executes a signal handler, it saves the current CPU state on the userland stack. A second thread could corrupt the saved state, which the kernel will then restore.
+Protecting against this in user space seems difficult if the signal frame data is untrusted. At that point, one would have to either always exit or overwrite the signal frame with a known-safe state to return to.
+A more promising approach would be to protect the signal stack using per-thread memory permissions. For example, a pkey-tagged sigaltstack would protect against malicious overwrites, but it would require the kernel to temporarily allow write permissions when saving the CPU state onto it.
+
+# v8CTF
+
+These were just a few examples of potential attacks that we’re working on addressing, and we also want to learn more from the security community. If this interests you, try your hand at the recently launched [v8CTF](https://security.googleblog.com/2023/10/expanding-our-exploit-reward-program-to.html)! Exploit V8 and earn a bounty; exploits targeting n-day vulnerabilities are explicitly in scope!
diff --git a/src/blog/fast-async.md b/src/blog/fast-async.md
index b623a3885..34609f87c 100644
--- a/src/blog/fast-async.md
+++ b/src/blog/fast-async.md
@@ -156,6 +156,10 @@ We’ve also been working on a new garbage collector, called Orinoco, which move
And last but not least, there was a handy bug in Node.js 8 that caused `await` to skip microticks in some cases, resulting in better performance. The bug started out as an unintended spec violation, but it later gave us the idea for an optimization. Let’s start by explaining the buggy behavior:
+:::note
+**Note:** At the time of writing, the behavior described below as "buggy" was a violation of the JavaScript spec. Since then, our spec proposal has been accepted, and this behavior is now correct.
+:::
+
```js
const p = Promise.resolve();
diff --git a/src/blog/holiday-season-2023.md b/src/blog/holiday-season-2023.md
new file mode 100644
index 000000000..0debbf79b
--- /dev/null
+++ b/src/blog/holiday-season-2023.md
@@ -0,0 +1,66 @@
+---
+title: 'V8 is Faster and Safer than Ever!'
+author: '[Victor Gomes](https://twitter.com/VictorBFG), the Glühwein expert'
+avatars:
+ - victor-gomes
+date: 2023-12-14
+tags:
+ - JavaScript
+ - WebAssembly
+ - security
+ - benchmarks
+description: "V8's impressive accomplishments in 2023"
+tweet: ''
+---
+
+Welcome to the thrilling world of V8, where speed is not just a feature but a way of life. As we bid farewell to 2023, it's time to celebrate the impressive accomplishments V8 has achieved this year.
+
+Through innovative performance optimizations, V8 continues to push the boundaries of what's possible in the ever-evolving landscape of the Web. We introduced a new mid-tier compiler and implemented several improvements to the top-tier compiler infrastructure, the runtime and the garbage collector, which have resulted in significant speed gains across the board.
+
+In addition to performance improvements, we landed exciting new features for both JavaScript and WebAssembly. We also shipped a new approach to bringing garbage-collected programming languages efficiently to the Web with [WebAssembly Garbage Collection (WasmGC)](https://v8.dev/blog/wasm-gc-porting).
+
+But our commitment to excellence doesn't stop there – we've also prioritized safety. We improved our sandboxing infrastructure and introduced [Control-flow Integrity (CFI)](https://en.wikipedia.org/wiki/Control-flow_integrity) to V8, providing a safer environment for users.
+
+Below, we've outlined some key highlights from the year.
+
+# Maglev: new mid tier optimizing compiler
+
+We've introduced a new optimizing compiler named [Maglev](https://v8.dev/blog/maglev), strategically positioned between our existing [Sparkplug](https://v8.dev/blog/sparkplug) and [TurboFan](https://v8.dev/docs/turbofan) compilers. It serves as a high-speed optimizing compiler that generates optimized code at an impressive pace: approximately 20 times slower than our baseline non-optimizing compiler Sparkplug, but 10 to 100 times faster than the top-tier TurboFan. We've observed significant performance improvements with Maglev, with [JetStream](https://browserbench.org/JetStream2.1/) improving by 8.2% and [Speedometer](https://browserbench.org/Speedometer2.1/) by 6%. Maglev's faster compilation speed and reduced reliance on TurboFan resulted in 10% energy savings in V8's overall consumption during Speedometer runs. [While not fully complete](https://en.m.wikipedia.org/wiki/Full-employment_theorem), Maglev's current state justifies its launch in Chrome 117. You can find more details in our [blog post](https://v8.dev/blog/maglev).
+
+# Turboshaft: new architecture for the top-tier optimizing compiler
+
+Maglev wasn't our only investment in improved compiler technology. We've also introduced Turboshaft, a new internal architecture for our top-tier optimizing compiler TurboFan, making it both easier to extend with new optimizations and faster at compiling. Since Chrome 120, the CPU-agnostic backend phases all use Turboshaft rather than TurboFan, and compile about twice as fast as before. This saves energy and paves the way for more exciting performance gains next year and beyond. Keep an eye out for updates!
+
+# Faster HTML parser
+
+We observed that a significant portion of our benchmark time was consumed by HTML parsing. While this is not a direct enhancement to V8, we took the initiative and applied our expertise in performance optimization to add a faster HTML parser to Blink. These changes resulted in a notable 3.4% increase in Speedometer scores. The impact on Chrome was so positive that the WebKit project promptly integrated these changes into [their repository](https://github.com/WebKit/WebKit/pull/9926). We take pride in contributing to the collective goal of achieving a faster Web!
+
+# Faster DOM allocations
+
+We have also been actively investing in the DOM side. Significant optimizations have been applied to the memory allocation strategies in [Oilpan](https://chromium.googlesource.com/v8/v8/+/main/include/cppgc/README.md), the allocator for DOM objects. It has gained a page pool, which notably reduced the cost of round-trips to the kernel. Oilpan now supports both compressed and uncompressed pointers, and we avoid compressing high-traffic fields in Blink. Given how frequently decompression is performed, this had a widespread impact on performance. In addition, knowing how fast the allocator is, we oilpanized frequently-allocated classes, which made allocation workloads 3x faster and showed significant improvements on DOM-heavy benchmarks such as Speedometer.
+
+# New JavaScript features
+
+JavaScript continues to evolve with newly standardized features, and this year was no exception. We shipped [resizable ArrayBuffers](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer#resizing_arraybuffers) and [ArrayBuffer transfer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer/transfer), String [`isWellFormed`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/isWellFormed) and [`toWellFormed`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toWellFormed), [RegExp `v` flag](https://v8.dev/features/regexp-v-flag) (a.k.a. Unicode set notation), [`JSON.parse` with source](https://github.com/tc39/proposal-json-parse-with-source), [Array grouping](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/groupBy), [`Promise.withResolvers`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers), and [`Array.fromAsync`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/fromAsync). Unfortunately, we had to unship [iterator helpers](https://github.com/tc39/proposal-iterator-helpers) after discovering a web incompatibility, but we've worked with TC39 to fix the issue and will reship soon. Finally, we also made ES6+ JS code faster by [eliding some redundant temporal dead zone checks](https://docs.google.com/document/d/1klT7-tQpxtYbwhssRDKfUMEgm-NS3iUeMuApuRgZnAw/edit?usp=sharing) for `let` and `const` bindings.
+
+# WebAssembly updates
+
+Many new features and performance improvements landed for Wasm this year. We enabled support for [multi-memory](https://github.com/WebAssembly/multi-memory), [tail-calls](https://github.com/WebAssembly/tail-call) (see our [blog post](https://v8.dev/blog/wasm-tail-call) for more details), and [relaxed SIMD](https://github.com/WebAssembly/relaxed-simd) to unleash next-level performance. We finished implementing [memory64](https://github.com/WebAssembly/memory64) for your memory-hungry applications and are just waiting for the proposal to [reach phase 4](https://github.com/WebAssembly/memory64/issues/43) so we can ship it! We made sure to incorporate the latest updates to the [exception-handling proposal](https://github.com/WebAssembly/exception-handling) while still supporting the previous format. And we kept investing in [JSPI](https://v8.dev/blog/jspi) for [enabling another big class of applications on the web](https://docs.google.com/document/d/16Us-pyte2-9DECJDfGm5tnUpfngJJOc8jbj54HMqE9Y/edit#bookmark=id.razn6wo5j2m). Stay tuned for next year!
+
+# WebAssembly Garbage Collection
+
+Speaking of bringing new classes of applications to the web, we also finally shipped WebAssembly Garbage Collection (WasmGC) after several years of work on the [proposal](https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md)'s standardization and [implementation](https://bugs.chromium.org/p/v8/issues/detail?id=7748). Wasm now has a built-in way to allocate objects and arrays that are managed by V8's existing garbage collector. That enables compiling applications written in Java, Kotlin, Dart, and similar garbage-collected languages to Wasm – where they typically run about twice as fast as when they're compiled to JavaScript. See [our blog post](https://v8.dev/blog/wasm-gc-porting) for a lot more details.
+
+# Security
+
+On the security side, our three main topics for the year were sandboxing, fuzzing, and CFI. On the [sandboxing](https://docs.google.com/document/d/1FM4fQmIhEqPG8uGp5o9A-mnPB5BOeScZYpkHjo0KKA8/edit?usp=sharing) side, we focused on building the missing infrastructure, such as the code pointer table and the trusted pointer table. On the fuzzing side, we invested in everything from fuzzing infrastructure to special-purpose fuzzers and better language coverage. Some of our work was covered in [this presentation](https://www.youtube.com/watch?v=Yd9m7e9-pG0). Finally, on the CFI side, we laid the foundation for our [CFI architecture](https://v8.dev/blog/control-flow-integrity) so that it can be realized on as many platforms as possible. Besides these, some smaller but noteworthy efforts include work on [mitigating a popular exploit technique](https://crbug.com/1445008) around `the_hole` and the launch of a new exploit bounty program in the form of the [V8CTF](https://github.com/google/security-research/blob/master/v8ctf/rules.md).
+
+# Conclusion
+
+Throughout the year, we dedicated efforts to numerous incremental performance enhancements. The combined impact of these small projects, along with the ones detailed in this blog post, is substantial! Below are benchmark scores illustrating V8’s performance improvements achieved in 2023, with an overall growth of 14% for JetStream and an impressive 34% for Speedometer.
+
+![Web performance benchmarks measured on a 13” M1 MacBook Pro.](/_img/holiday-season-2023/scores.svg)
+
+These results show that V8 is faster and safer than ever. Buckle up, fellow developer, because with V8, the journey into the fast and furious Web has only just begun! We're committed to keeping V8 the best JavaScript and WebAssembly engine on the planet!
+
+From all of us at V8, we wish you a joyous holiday season filled with fast, safe and fabulous experiences as you navigate the Web!
diff --git a/src/blog/jspi-ot.md b/src/blog/jspi-ot.md
new file mode 100644
index 000000000..f3e6417b7
--- /dev/null
+++ b/src/blog/jspi-ot.md
@@ -0,0 +1,40 @@
+---
+title: 'WebAssembly JSPI is going to origin trial'
+description: 'We explain the start of the origin trial for JSPI'
+author: 'Francis McCabe, Thibaud Michaud, Ilya Rezvov, Brendan Dahl'
+date: 2024-03-06
+tags:
+ - WebAssembly
+---
+WebAssembly’s JavaScript Promise Integration (JSPI) API is entering an origin trial with Chrome release M123. This means that you can test whether you and your users can benefit from this new API.
+
+JSPI is an API that allows so-called sequential code – that has been compiled to WebAssembly – to access Web APIs that are _asynchronous_. Many Web APIs are crafted in terms of JavaScript `Promise`s: instead of immediately performing the requested operation they return a `Promise` to do so. When the action is finally performed, the browser’s task runner invokes any callbacks with the Promise. JSPI hooks into this architecture to allow a WebAssembly application to be suspended when the `Promise` is returned and resumed when the `Promise` is resolved.
+
+You can find out more about JSPI and how to use it [here](https://v8.dev/blog/jspi) and the specification itself is [here](https://github.com/WebAssembly/js-promise-integration).
+
+## Requirements
+
+Apart from registering for an origin trial, you will also need to generate the appropriate WebAssembly and JavaScript. If you are using Emscripten, this is straightforward; just make sure that you are using at least version 3.1.47.
+
+## Registering for the origin trial
+
+JSPI is still pre-release; it is going through a standardization process and will not be fully released until we get to phase 4 of that process. To use it today, you can set a flag in the Chrome browser; or, you can apply for an origin trial token that will allow your users to access it without having to set the flag themselves.
+
+To register, go [here](https://developer.chrome.com/origintrials/#/register_trial/1603844417297317889) and follow the registration process. To find out more about origin trials in general, [this](https://developer.chrome.com/docs/web-platform/origin-trials) is a good starting place.
+
+## Some potential caveats
+
+There have been some [discussions](https://github.com/WebAssembly/js-promise-integration/issues) in the WebAssembly community about some aspects of the JSPI API. As a result, some changes are planned, and they will take time to fully work their way through the system. We anticipate that these changes will be *soft launched*: we will share the changes as they become available; however, the existing API will be maintained until at least the end of the origin trial.
+
+In addition, there are some known issues that are unlikely to be fully addressed during the origin trial period:
+
+- For applications that intensively create spawned-off computations, the performance of a wrapped sequence (i.e., using JSPI to access an asynchronous API) may suffer. This is because the resources used when creating the wrapped call are not cached between calls; we rely on garbage collection to clean up the stacks that are created.
+- We currently allocate a fixed-size stack for each wrapped call. This stack is necessarily large in order to accommodate complex applications. However, it also means that an application that has a large number of simple wrapped calls _in flight_ may experience memory pressure.
+
+Neither of these issues is likely to impede experimentation with JSPI; we expect them to be addressed before JSPI is officially released.
+
+## Feedback
+
+Since JSPI is a standards-track effort, we prefer that any issues and feedback be shared [here](https://github.com/WebAssembly/js-promise-integration/issues). However, bug reports can also be raised at the standard Chrome bug reporting [site](https://issues.chromium.org/new). If you suspect a problem with code generation, please report it on the [Emscripten issue tracker](https://github.com/emscripten-core/emscripten/issues).
+
+Finally, we would like to hear about any benefits that you uncovered. Use the [issue tracker](https://github.com/WebAssembly/js-promise-integration/issues) to share your experience.
diff --git a/src/blog/maglev.md b/src/blog/maglev.md
new file mode 100644
index 000000000..9b521d879
--- /dev/null
+++ b/src/blog/maglev.md
@@ -0,0 +1,151 @@
+---
+title: 'Maglev - V8’s Fastest Optimizing JIT'
+author: '[Toon Verwaest](https://twitter.com/tverwaes), [Leszek Swirski](https://twitter.com/leszekswirski), [Victor Gomes](https://twitter.com/VictorBFG), Olivier Flückiger, Darius Mercadier, and Camillo Bruni — not enough cooks to spoil the broth'
+avatars:
+ - toon-verwaest
+ - leszek-swirski
+ - victor-gomes
+ - olivier-flueckiger
+ - darius-mercadier
+ - camillo-bruni
+date: 2023-12-05
+tags:
+ - JavaScript
+description: "V8's newest compiler, Maglev, improves performance while reducing power consumption"
+tweet: ''
+---
+
+In Chrome M117 we introduced a new optimizing compiler: Maglev. Maglev sits between our existing Sparkplug and TurboFan compilers, and fills the role of a fast optimizing compiler that generates good enough code, fast enough.
+
+
+# Background
+
+Until 2021, V8 had two main execution tiers: Ignition, the interpreter; and [TurboFan](/docs/turbofan), V8’s optimizing compiler focused on peak performance. All JavaScript code is first compiled to Ignition bytecode and executed by interpreting it. During execution, V8 tracks how the program behaves, including object shapes and types. Both the runtime execution metadata and the bytecode are fed into the optimizing compiler to generate high-performance, often speculative, machine code that runs significantly faster than the interpreter can.
+
+These improvements are clearly visible on benchmarks like [JetStream](https://browserbench.org/JetStream2.1/), a collection of traditional pure JavaScript benchmarks measuring startup, latency, and peak performance. TurboFan helps V8 run the suite 4.35x as fast! JetStream has a reduced emphasis on steady state performance compared to past benchmarks (like the [retired Octane benchmark](/blog/retiring-octane)), but due to the simplicity of many line items, the optimized code is still where most time is spent.
+
+[Speedometer](https://browserbench.org/Speedometer2.1/) is a different kind of benchmark suite than JetStream. It’s designed to measure a web app’s responsiveness by timing simulated user interactions. Instead of smaller static standalone JavaScript apps, the suite consists of full web pages, most of which are built using popular frameworks. As in most web page loads, Speedometer line items spend much less time running tight JavaScript loops and much more time executing a lot of code that interacts with the rest of the browser.
+
+TurboFan still has a lot of impact on Speedometer: it runs over 1.5x as fast! But the impact is clearly much more muted than on JetStream. Part of this difference results from the fact that full pages [just spend less time in pure JavaScript](/blog/real-world-performance#making-a-real-difference). But in part it’s due to the benchmark spending a lot of time in functions that don’t get hot enough to be optimized by TurboFan.
+
+![Web performance benchmarks comparing unoptimized and optimized execution](/_img/maglev/I-IT.svg)
+
+::: note
All the benchmark scores in this post were measured with Chrome 117.0.5897.3 on a 13” M2 MacBook Air.
+:::
+
+Since the difference in execution speed and compile time between Ignition and TurboFan is so large, in 2021 we introduced a new baseline JIT called [Sparkplug](/blog/sparkplug). It’s designed to compile bytecode to equivalent machine code almost instantaneously.
+
+On JetStream, Sparkplug improves performance quite a bit compared to Ignition (+45%). Even when TurboFan is also in the picture we still see a solid improvement in performance (+8%). On Speedometer we see a 41% improvement over Ignition, bringing it close to TurboFan performance, and a 22% improvement over Ignition + TurboFan! Since Sparkplug is so fast, we can easily deploy it very broadly and get a consistent speedup. If code doesn’t rely solely on easily optimized, long-running, tight JavaScript loops, it’s a great addition.
+
+![Web performance benchmarks with added Sparkplug](/_img/maglev/I-IS-IT-IST.svg)
+
+The simplicity of Sparkplug imposes a relatively low upper limit on the speedup it can provide though. This is clearly demonstrated by the large gap between Ignition + Sparkplug and Ignition + TurboFan.
+
+This is where Maglev comes in: our new optimizing JIT generates code that’s much faster than Sparkplug code, and generates it much faster than TurboFan can.
+
+
+# Maglev: A Simple SSA-Based JIT compiler
+
+When we started this project, we saw two paths forward to cover the gap between Sparkplug and TurboFan: either try to generate better code using the single-pass approach taken by Sparkplug, or build a JIT with an intermediate representation (IR). Since we felt that not having an IR at all during compilation would likely severely restrict the compiler, we decided to go with a somewhat traditional static single-assignment (SSA) based approach, using a CFG (control flow graph) rather than TurboFan's more flexible but cache-unfriendly sea-of-nodes representation.
+
+The compiler itself is designed to be fast and easy to work on. It has a minimal set of passes and a simple, single IR that encodes specialized JavaScript semantics.
+
+
+## Prepass
+
+First, Maglev does a prepass over the bytecode to find branch targets, including loops, and assignments to variables in loops. This pass also collects liveness information, encoding which values in which variables are still needed across which expressions. This information can reduce the amount of state that needs to be tracked by the compiler later.
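+
+As a rough illustration of what the prepass computes, the sketch below runs over a hypothetical toy bytecode (the `Insn` struct and `FindLoopAssignments` are assumptions, not V8's bytecode format or API) in which a backward jump closes a loop, and records which registers are assigned anywhere in each loop body. Liveness analysis is left out to keep it short.
+
+```cpp
+#include <cassert>
+#include <map>
+#include <set>
+#include <vector>
+
+// Hypothetical toy bytecode: each instruction may write one interpreter
+// register and may jump to another instruction index.
+struct Insn {
+  int writes = -1;       // register written, or -1 if none
+  int jump_target = -1;  // jump target index, or -1 if none
+};
+
+// For each loop header (the target of a backward jump), collect every
+// register assigned in the loop body [header, back edge].
+std::map<int, std::set<int>> FindLoopAssignments(const std::vector<Insn>& code) {
+  std::map<int, std::set<int>> assigned_in_loop;
+  for (int i = 0; i < static_cast<int>(code.size()); ++i) {
+    int target = code[i].jump_target;
+    if (target < 0 || target > i) continue;  // only backward jumps close loops
+    std::set<int>& regs = assigned_in_loop[target];
+    for (int j = target; j <= i; ++j) {
+      if (code[j].writes >= 0) regs.insert(code[j].writes);
+    }
+  }
+  return assigned_in_loop;
+}
+```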
+
+
+## SSA
+
+![A printout of the Maglev SSA graph on the command line](/_img/maglev/graph.svg)
+
+Maglev does an abstract interpretation of the frame state, creating SSA nodes representing the results of expression evaluation. Variable assignments are emulated by storing those SSA nodes in the respective abstract interpreter register. In the case of branches and switches, all paths are evaluated.
+
+When multiple paths merge, values in abstract interpreter registers are merged by inserting so-called Phi nodes: value nodes that know which value to pick depending on which path was taken at runtime.
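+
+The merge step can be sketched with a deliberately simplified node type; `Node`, `FrameState`, and `MergeStates` are hypothetical names, not Maglev's classes. A register that holds the same SSA node on both incoming paths is passed through unchanged, and only registers whose values differ get a Phi.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <memory>
+#include <string>
+#include <vector>
+
+// Hypothetical, heavily simplified SSA node. Phi nodes list one input per
+// incoming control-flow path; other nodes have no inputs here.
+struct Node {
+  std::string label;
+  std::vector<Node*> inputs;
+  bool is_phi() const { return label == "phi"; }
+};
+
+// Abstract interpreter state: interpreter register index -> current SSA node.
+using FrameState = std::vector<Node*>;
+
+// Merge the states of two incoming paths. A register that holds the same
+// node on both paths needs no Phi; any other register gets a new Phi.
+FrameState MergeStates(const FrameState& a, const FrameState& b,
+                       std::vector<std::unique_ptr<Node>>& graph) {
+  assert(a.size() == b.size());
+  FrameState merged(a.size());
+  for (std::size_t reg = 0; reg < a.size(); ++reg) {
+    if (a[reg] == b[reg]) {
+      merged[reg] = a[reg];
+    } else {
+      graph.push_back(std::make_unique<Node>(Node{"phi", {a[reg], b[reg]}}));
+      merged[reg] = graph.back().get();
+    }
+  }
+  return merged;
+}
+```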
+
+Loops can merge variable values “back in time”, with the data flowing backwards from the loop end to the loop header, in the case when variables are assigned in the loop body. That’s where the data from the prepass comes in handy: since we already know which variables are assigned inside loops, we can pre-create loop phis before we even start processing the loop body. At the end of the loop we can populate the phi input with the correct SSA node. This allows the SSA graph generation to be a single forward pass, without needing to "fix up" loop variables, while also minimizing the number of Phi nodes that need to be allocated.
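+
+In the same kind of simplified model (again with hypothetical names, not Maglev's API), loop Phis can be created up front for exactly the registers the prepass reported as assigned in the loop, and completed with their back-edge input once the loop end is reached:
+
+```cpp
+#include <cassert>
+#include <memory>
+#include <set>
+#include <string>
+#include <vector>
+
+// Hypothetical node shape: a loop Phi's inputs are
+// [value on loop entry, value on the back edge].
+struct Node {
+  std::string label;
+  std::vector<Node*> inputs;
+};
+
+using FrameState = std::vector<Node*>;  // interpreter register -> SSA node
+
+// At the loop header, pre-create Phis only for the registers the prepass
+// found to be assigned inside the loop. Other registers pass through.
+FrameState EnterLoop(const FrameState& entry, const std::set<int>& assigned,
+                     std::vector<std::unique_ptr<Node>>& graph) {
+  FrameState header = entry;
+  for (int reg : assigned) {
+    graph.push_back(std::make_unique<Node>(Node{"loop_phi", {entry[reg]}}));
+    header[reg] = graph.back().get();
+  }
+  return header;
+}
+
+// At the loop end, fill in each loop Phi's back-edge input. No fix-up pass
+// over the body is needed, because every use already points at the Phi.
+void CloseLoop(const FrameState& header, const FrameState& back_edge,
+               const std::set<int>& assigned) {
+  for (int reg : assigned) header[reg]->inputs.push_back(back_edge[reg]);
+}
+```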
+
+
+## Known Node Information
+
+To be as fast as possible, Maglev does as much as possible at once. Instead of building a generic JavaScript graph and then lowering that during later optimization phases, which is a theoretically clean but computationally expensive approach, Maglev does as much as possible immediately during graph building.
+
+During graph building Maglev will look at runtime feedback metadata collected during unoptimized execution, and generate specialized SSA nodes for the types observed. If Maglev sees `o.x` and knows from the runtime feedback that `o` always has one specific shape, it will generate an SSA node to check at runtime that `o` still has the expected shape, followed by a cheap `LoadField` node which does a simple access by offset.
+
+Additionally, Maglev will make a side note that it now knows the shape of `o`, making it unnecessary to check the shape again later. If Maglev later encounters an operation on `o` that doesn't have feedback for some reason, this kind of information learned during compilation can be used as a second source of feedback.
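+
+A minimal sketch of this idea, assuming a hypothetical `GraphBuilder` that records emitted operations as strings: the first `o.x` emits a shape check and records the shape as known, and a later access to the same object with the same feedback emits only the field load.
+
+```cpp
+#include <cassert>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+// Hypothetical graph builder that records emitted operations as strings and
+// remembers which shape ("map") each object node is known to have.
+struct GraphBuilder {
+  std::unordered_map<int, std::string> known_map;  // node id -> known shape
+  std::vector<std::string> emitted;
+
+  void BuildLoadField(int object, const std::string& feedback_map, int offset) {
+    auto it = known_map.find(object);
+    if (it == known_map.end() || it->second != feedback_map) {
+      emitted.push_back("CheckMap(" + std::to_string(object) + ", " +
+                        feedback_map + ")");
+      known_map[object] = feedback_map;  // after the check, the shape is known
+    }
+    emitted.push_back("LoadField(" + std::to_string(object) + ", +" +
+                      std::to_string(offset) + ")");
+  }
+};
+```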
+
+Runtime information can come in various forms. Some information needs to be checked at runtime, like the shape check previously described. Other information can be used without runtime checks by registering dependencies to the runtime. Globals that are de-facto constant (not changed between initialization and when their value is seen by Maglev) fall into this category: Maglev does not need to generate code to dynamically load and check their identity. Maglev can load the value at compile time and embed it directly into the machine code; if the runtime ever mutates that global, it'll also take care to invalidate and deoptimize that machine code.
+
+Some forms of information are “unstable”. Such information can only be used to the extent that the compiler knows for sure that it can’t change. For example, if we just allocated an object, we know it’s a new object and we can skip expensive write barriers entirely. Once there has been another potential allocation, the garbage collector could have moved the object, and we now need to emit such checks. Others are "stable": if we have never seen any object transition away from having a certain shape, then we can register a dependency on this event (any object transitioning away from that particular shape) and don’t need to recheck the shape of the object, even after a call to an unknown function with unknown side effects.
+
+
+## Deoptimization
+
+Given that Maglev can use speculative information that it checks at runtime, Maglev code needs to be able to deoptimize. To make this work, Maglev attaches abstract interpreter frame state to nodes that can deoptimize. This state maps interpreter registers to SSA values. This state turns into metadata during code generation, providing a mapping from optimized state to unoptimized state. The deoptimizer interprets this data, reading values from the interpreter frame and machine registers and putting them into the required places for interpretation. This builds on the same deoptimization mechanism as used by TurboFan, allowing us to share most of the logic and take advantage of the testing of the existing system.
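+
+The frame-state mapping can be pictured as a small translation table. In this hypothetical sketch, each interpreter register is described as living in a machine register, in a stack slot of the optimized frame, or as a constant, and the deoptimizer copies the values into place; V8's real metadata format is considerably richer.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical kinds of location recorded in the deoptimization metadata.
+enum class Kind { kRegister, kStackSlot, kConstant };
+
+struct Location {
+  Kind kind;
+  int64_t payload;  // machine register number, stack slot index, or constant
+};
+
+// Rebuild the interpreter registers at a deopt point from the translation
+// "interpreter register i -> translation[i]".
+std::vector<int64_t> MaterializeInterpreterFrame(
+    const std::vector<Location>& translation,
+    const std::vector<int64_t>& machine_registers,
+    const std::vector<int64_t>& optimized_stack) {
+  std::vector<int64_t> interpreter_registers;
+  for (const Location& loc : translation) {
+    std::size_t index = static_cast<std::size_t>(loc.payload);
+    switch (loc.kind) {
+      case Kind::kRegister:
+        interpreter_registers.push_back(machine_registers.at(index));
+        break;
+      case Kind::kStackSlot:
+        interpreter_registers.push_back(optimized_stack.at(index));
+        break;
+      case Kind::kConstant:
+        interpreter_registers.push_back(loc.payload);
+        break;
+    }
+  }
+  return interpreter_registers;
+}
+```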
+
+
+## Representation Selection
+
+JavaScript numbers represent, according to [the spec](https://tc39.es/ecma262/#sec-ecmascript-language-types-number-type), a 64-bit floating point value. This doesn't mean that the engine always has to store them as 64-bit floats though, especially since in practice many numbers are small integers (e.g. array indices). V8 tries to encode numbers as 31-bit tagged integers (internally called “Small Integers” or "Smi"), both to save memory (32 bits due to [pointer compression](/blog/pointer-compression)), and for performance (integer operations are faster than float operations).
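+
+To make the 31-bit Smi encoding concrete, here is a minimal sketch of tagging in a 32-bit compressed slot, assuming V8's convention of a low tag bit of 0 for Smis and 1 for heap object pointers. The helper names are illustrative, not V8's API.
+
+```cpp
+#include <cassert>
+#include <cstdint>
+
+// Range of a 31-bit signed integer.
+constexpr int32_t kSmiMin = -(1 << 30);
+constexpr int32_t kSmiMax = (1 << 30) - 1;
+
+bool FitsInSmi(int64_t value) { return value >= kSmiMin && value <= kSmiMax; }
+
+// Shift the value up by one; the freed low bit (0) marks it as a Smi.
+uint32_t TagSmi(int32_t value) {
+  assert(FitsInSmi(value));
+  return static_cast<uint32_t>(value) << 1;
+}
+
+// Heap object pointers have the low bit set, Smis do not.
+bool IsSmi(uint32_t tagged) { return (tagged & 1) == 0; }
+
+// Arithmetic right shift restores the value, including its sign.
+int32_t UntagSmi(uint32_t tagged) { return static_cast<int32_t>(tagged) >> 1; }
+```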
+
+To make numerics-heavy JavaScript code fast, it’s important that optimal representations are chosen for value nodes. Unlike the interpreter and Sparkplug, the optimizing compiler can unbox values once it knows their type, operating on raw numbers rather than JavaScript values representing numbers, and rebox values only if strictly necessary. Floats can directly be passed in floating point registers instead of allocating a heap object that contains the float.
+
+Maglev learns about the representation of SSA nodes mainly by looking at runtime feedback of e.g., binary operations, and propagating that information forwards through the Known Node Info mechanism. When SSA values with specific representations flow into Phis, a correct representation that supports all the inputs needs to be chosen. Loop phis are again tricky, since inputs from within the loop are seen after a representation should be chosen for the phi — the same "back in time" problem as for graph building. This is why Maglev has a separate phase after graph building to do representation selection on loop phis.
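+
+One simple way to model the choice for a Phi is an ordered set of representations in which the Phi takes the most general representation among its inputs. This is an assumption for illustration (Maglev's actual rules and set of representations are more detailed), but it shows why a loop Phi cannot be decided until its back-edge input is known.
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <vector>
+
+// Hypothetical representations, ordered from most to least specific: every
+// int32 fits exactly in a float64, and every value can be stored as tagged.
+enum class Repr { kInt32 = 0, kFloat64 = 1, kTagged = 2 };
+
+// A Phi must hold any of its inputs, so pick the most general input
+// representation. For a loop Phi, `inputs` must include the back-edge value,
+// which is only known after the loop body has been processed.
+Repr SelectPhiRepresentation(const std::vector<Repr>& inputs) {
+  assert(!inputs.empty());
+  return *std::max_element(inputs.begin(), inputs.end());
+}
+```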
+
+
+## Register Allocation
+
+After graph building and representation selection, Maglev mostly knows what kind of code it wants to generate, and is "done" from a classical optimization point of view. To be able to generate code though, we need to choose where SSA values actually live when executing machine code; when they're in machine registers, and when they're saved on the stack. This is done through register allocation.
+
+Each Maglev node has input and output requirements, including requirements on temporaries needed. The register allocator does a single forward walk over the graph, maintaining an abstract machine register state not too dissimilar from the abstract interpretation state maintained during graph building, and will satisfy those requirements, replacing the requirements on the node with actual locations. Those locations can then be used by code generation.
+
+First, a prepass runs over the graph to find linear live ranges of nodes, so that we can free up registers once an SSA node isn’t needed anymore. This prepass also keeps track of the chain of uses. Knowing how far in the future a value is needed can be useful to decide which values to prioritize, and which to drop, when we run out of registers.
+
+After the prepass, the register allocation runs. Register assignment follows some simple, local rules: If a value is already in a register, that register is used if possible. Nodes keep track of what registers they are stored into during the graph walk. If the node doesn’t yet have a register, but a register is free, it’s picked. The node gets updated to indicate it’s in the register, and the abstract register state is updated to know it contains the node. If there’s no free register, but a register is required, another value is pushed out of the register. Ideally, we have a node that’s already in a different register, and can drop this "for free"; otherwise we pick a value that won’t be needed for a long time, and spill it onto the stack.
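+
+The local rules above can be sketched as follows, with hypothetical names and values identified by integers. Eviction picks the value whose next use is furthest away; the case of dropping a value "for free" because it also lives in another register is omitted for brevity.
+
+```cpp
+#include <cassert>
+#include <map>
+#include <vector>
+
+// Hypothetical allocator state. registers[r] is the id of the value held in
+// machine register r, or -1 if the register is free.
+struct RegisterState {
+  std::vector<int> registers;
+  std::map<int, int> next_use;  // value id -> position of its next use
+  std::vector<int> spilled;     // values evicted to the stack
+
+  int Allocate(int value) {
+    const int count = static_cast<int>(registers.size());
+    // 1. The value is already in a register: reuse it.
+    for (int r = 0; r < count; ++r)
+      if (registers[r] == value) return r;
+    // 2. Some register is free: take it.
+    for (int r = 0; r < count; ++r)
+      if (registers[r] == -1) {
+        registers[r] = value;
+        return r;
+      }
+    // 3. Otherwise evict the value needed furthest in the future and spill it.
+    int victim = 0;
+    for (int r = 1; r < count; ++r)
+      if (next_use[registers[r]] > next_use[registers[victim]]) victim = r;
+    spilled.push_back(registers[victim]);
+    registers[victim] = value;
+    return victim;
+  }
+};
+```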
+
+On branch merges, the abstract register states from the incoming branches are merged. We try to keep as many values in registers as possible. This can mean we need to introduce register-to-register moves, or may need to unspill values from the stack, using moves called “gap moves”. If a branch merge has a phi node, register allocation will assign output registers to the phis. Maglev prefers to output phis to the same registers as their inputs, to minimize moves.
+
+If more SSA values are live than we have registers, we’ll need to spill some values on the stack, and unspill them later. In the spirit of Maglev, we keep it simple: if a value needs to be spilled, it is retroactively told to immediately spill on definition (right after the value is created), and code generation will handle emitting the spill code. The definition is guaranteed to ‘dominate’ all uses of the value (to reach the use we must have passed through the definition and therefore the spill code). This also means that a spilled value will have exactly one spill slot for the entire duration of the code; values with overlapping lifetimes will thus have non-overlapping assigned spill slots.
+
+Due to representation selection, some values in the Maglev frame will be tagged pointers, pointers that V8’s GC understands and needs to consider; and some will be untagged, values that the GC should not look at. TurboFan handles this by precisely keeping track of which stack slots contain tagged values, and which contain untagged values, which changes during execution as slots are reused for different values. For Maglev we decided to keep things simpler, to reduce the memory required for tracking this: we split the stack frame into a tagged and an untagged region, and only store this split point.
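+
+With the split frame, the GC-relevant metadata shrinks to a single number. This hypothetical `SplitFrame` sketch keeps the tagged slots first and lets a visitor walk only those:
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical frame layout: all tagged slots first, then untagged ones.
+// The only GC metadata needed is where the split lies.
+struct SplitFrame {
+  std::vector<uint64_t> slots;
+  std::size_t tagged_slot_count;
+
+  // Visit only the slots the GC must treat as potential pointers.
+  template <typename Visitor>
+  void VisitTaggedSlots(Visitor&& visit) const {
+    assert(tagged_slot_count <= slots.size());
+    for (std::size_t i = 0; i < tagged_slot_count; ++i) visit(slots[i]);
+  }
+};
+```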
+
+
+## Code Generation
+
+Once we know what expressions we want to generate code for, and where we want to put their outputs and inputs, Maglev is ready to generate code.
+
+Maglev nodes directly know how to generate assembly code using a “macro assembler”. For example, a `CheckMap` node knows how to emit assembler instructions that compare the shape (internally called the “map”) of an input object with a known value, and to deoptimize the code if the object had a wrong shape.
+
+One slightly tricky bit of code handles gap moves: The requested moves created by the register allocator know that a value lives somewhere and needs to go elsewhere. If there’s a sequence of such moves though, a preceding move could clobber the input needed by a subsequent move. The Parallel Move Resolver computes how to safely perform the moves so that all values end up in the right place.
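+
+A parallel move resolver can be sketched with the classic approach: repeatedly emit a move whose destination is no longer needed as a source, and when only cycles remain, save one source in a scratch location and redirect its readers. This is a generic sketch over numbered locations, not V8's implementation; `Move` and `ResolveParallelMoves` are hypothetical names.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <vector>
+
+struct Move {
+  int src;
+  int dst;
+};
+
+// All `pending` moves are meant to happen at once. Returns an order in which
+// they can be executed one by one, using `scratch` to break cycles.
+std::vector<Move> ResolveParallelMoves(std::vector<Move> pending, int scratch) {
+  std::vector<Move> ordered;
+  while (!pending.empty()) {
+    bool progress = false;
+    for (std::size_t i = 0; i < pending.size() && !progress; ++i) {
+      bool dst_still_read = false;
+      for (std::size_t j = 0; j < pending.size(); ++j)
+        if (j != i && pending[j].src == pending[i].dst) dst_still_read = true;
+      if (!dst_still_read) {  // overwriting dst cannot clobber a needed input
+        ordered.push_back(pending[i]);
+        pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(i));
+        progress = true;
+      }
+    }
+    if (!progress) {
+      // Only cycles remain: park one source in the scratch location and let
+      // every move that reads it read the scratch location instead.
+      int saved = pending[0].src;
+      ordered.push_back({saved, scratch});
+      for (Move& m : pending)
+        if (m.src == saved) m.src = scratch;
+    }
+  }
+  return ordered;
+}
+```
+
+For example, swapping locations 0 and 1 with scratch location 5 yields the (src, dst) sequence `{0, 5}, {1, 0}, {5, 1}`.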
+
+
+# Results
+
+So the compiler we just presented is both clearly much more complex than Sparkplug, and much simpler than TurboFan. How does it fare?
+
+In terms of compilation speed we’ve managed to build a JIT that’s roughly 10x slower than Sparkplug, and 10x faster than TurboFan.
+
+![Compile time comparison of the compilation tiers, for all functions compiled in JetStream](/_img/maglev/compile-time.svg)
+
+This allows us to deploy Maglev much earlier than we’d want to deploy TurboFan. If the feedback it relied upon ended up not being very stable yet, there’s no huge cost to deoptimizing and recompiling later. It also allows us to use TurboFan a little later: we’re running much faster than we’d run with Sparkplug.
+
+Slotting in Maglev between Sparkplug and TurboFan results in noticeable benchmark improvements:
+
+![Web performance benchmarks with Maglev](/_img/maglev/I-IS-IT-IST-ISTM.svg)
+
+We have also validated Maglev on real-world data, and see good improvements on [Core Web Vitals](https://web.dev/vitals/).
+
+Since Maglev compiles much faster, and since we can now afford to wait longer before we compile functions with TurboFan, this results in a secondary benefit that’s not as visible on the surface. The benchmarks focus on main-thread latency, but Maglev also significantly reduces V8’s overall resource consumption by using less off-thread CPU time. The energy consumption of a process can be measured easily on an M1- or M2-based MacBook using `taskinfo`.
+
+:::table-wrapper
+| Benchmark | Energy Consumption |
+| :---------: | :----------------: |
+| JetStream | -3.5% |
+| Speedometer | -10% |
+:::
+
+Maglev isn’t complete by any means. We've still got plenty more work to do, more ideas to try out, and more low-hanging fruit to pick. As Maglev gets more complete, we expect to see higher scores and further reductions in energy consumption.
+
+Maglev is now available for desktop Chrome, and will be rolled out to mobile devices soon.
diff --git a/src/blog/speeding-up-v8-heap-snapshots.md b/src/blog/speeding-up-v8-heap-snapshots.md
new file mode 100644
index 000000000..0d12b66ea
--- /dev/null
+++ b/src/blog/speeding-up-v8-heap-snapshots.md
@@ -0,0 +1,172 @@
+---
+title: 'Speeding up V8 heap snapshots'
+description: 'This post about V8 heap snapshots presents some performance problems found by Bloomberg engineers, and how we fixed them to make JavaScript memory analysis faster than ever.'
+author: 'Jose Dapena Paz'
+date: 2023-07-27
+tags:
+ - memory
+ - tools
+---
+*This blog post has been authored by José Dapena Paz (Igalia), with contributions from Jason Williams (Bloomberg), Ashley Claymore (Bloomberg), Rob Palmer (Bloomberg), Joyee Cheung (Igalia), and Shu-yu Guo (Google).*
+
+In this post about V8 heap snapshots, I will talk about some performance problems found by Bloomberg engineers, and how we fixed them to make JavaScript memory analysis faster than ever.
+
+## The problem
+
+Bloomberg engineers were working on diagnosing a memory leak in a JavaScript application. It was failing with *Out-Of-Memory* errors. For the tested application, the V8 heap limit was configured to be around 1400 MB. Normally V8’s garbage collector should be able to keep the heap usage under that limit, so the failures indicated that there was likely a leak.
+
+A common technique to debug a routine memory leak scenario like this is to capture a heap snapshot first, then load it in the DevTools “Memory” tab and find out what is consuming the most memory by inspecting the various summaries and object attributes. In the DevTools UI, the heap snapshot can be taken in the “Memory” tab. For Node.js applications, the heap snapshot [can be triggered programmatically](https://nodejs.org/en/docs/guides/diagnostics/memory/using-heap-snapshot) using this API:
+
+```js
+require('v8').writeHeapSnapshot();
+```
+
+They wanted to capture several snapshots at different points in the application’s life, so that the DevTools Memory viewer could be used to show the difference between the heaps at different times. The problem was that capturing a single full-size (500 MB) snapshot was taking **over 30 minutes**!
+
+It was this slowness in the memory analysis workflow that we needed to solve.
+
+## Narrowing the problem
+
+Then, Bloomberg engineers started investigating the issue using some V8 parameters. As described in [this post](https://blogs.igalia.com/dape/2023/05/18/javascript-memory-profiling-with-heap-snapshot/), Node.js and V8 have some nice command line parameters that can help with that. These options were used to create the heap snapshots, simplify the reproduction, and improve observability:
+
+- `--max-old-space-size=100`: This limits the heap to 100 megabytes and helps to reproduce the issue much faster.
+- `--heapsnapshot-near-heap-limit=10`: This is a Node.js specific command line parameter that tells Node.js to generate a snapshot each time it comes close to running out of memory. It is configured to generate up to 10 snapshots in total. This prevents thrashing where the memory-starved program spends a long time producing more snapshots than needed.
+- `--enable-etw-stack-walking`: This allows tools such as ETW, WPA & xperf to see the JS stack which has been called in V8. (available in Node.js v20+).
+- `--interpreted-frames-native-stack`: This flag is used in combination with tools like ETW, WPA & xperf to see the native stack when profiling. (available in Node.js v20+).
+
+When the size of the V8 heap is approaching the limit, V8 forces a garbage collection to reduce the memory usage. It also notifies the embedder about this. The `--heapsnapshot-near-heap-limit` flag in Node.js generates a new heap snapshot upon notification. In the test case, the memory usage decreases, but, after several iterations, garbage collection ultimately cannot free up enough space and so the application is terminated with an *Out-Of-Memory* error.
+
+They took recordings using Windows Performance Analyzer (see below) in order to narrow down the issue. This revealed that most CPU time was being spent within the V8 Heap Explorer. Specifically, it took around 30 minutes just to walk through the heap to visit each node and collect the name. This didn’t seem to make much sense — why would recording the name of each property take so long?
+
+This is when I was asked to take a look.
+
+## Quantifying the problem
+
+The first step was adding support in V8 to better understand where time is spent during the capturing of heap snapshots. The capture process itself is split into two phases: generation, then serialization. We landed [this patch](https://chromium-review.googlesource.com/c/v8/v8/+/4428810) upstream to introduce a new command line flag `--profile_heap_snapshot` to V8, which enables logging of both the generation and serialization times.
+
+Using this flag, we learned some interesting things!
+
+First, we could observe the exact amount of time V8 was spending on generating each snapshot. In our reduced test case, the first took 5 minutes, the second took 8 minutes, and each subsequent snapshot kept on taking longer and longer. Nearly all of this time was spent in the generation phase.
+
+This also allowed us to quantify the time spent on snapshot generation with trivial overhead, which helped us isolate and identify similar slowdowns in other widely-used JavaScript applications: in particular, ESLint on TypeScript. So we knew the problem was not app-specific.
+
+Furthermore, we found the problem happened on both Windows and Linux. The problem was also not platform-specific.
+
+## First optimization: improved `StringsStorage` hashing
+
+To identify what was causing the excessive delay, I profiled the failing script using [Windows Performance Toolkit](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/).
+
+When I opened the recording with [Windows Performance Analyzer](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer), this was what I found:
+
+![](/_img/speeding-up-v8-heap-snapshots/wpa-1.png){ .no-darkening }
+
+
+One third of the samples were spent in `v8::internal::StringsStorage::GetEntry`:
+
+```cpp
+181 base::HashMap::Entry* StringsStorage::GetEntry(const char* str, int len) {
+182 uint32_t hash = ComputeStringHash(str, len);
+183 return names_.LookupOrInsert(const_cast<char*>(str), hash);
+184 }
+```
+
+Because this was run with a release build, the information about the inlined function calls was folded into `StringsStorage::GetEntry()`. To figure out exactly how much time the inlined function calls were taking, I added the “Source Line Number” column to the breakdown and found that most of the time was spent on line 182, which was a call to `ComputeStringHash()`:
+
+![](/_img/speeding-up-v8-heap-snapshots/wpa-2.png){ .no-darkening }
+
+So over 30% of the snapshot generation time was spent on `ComputeStringHash()`, but why?
+
+Let’s first talk about `StringsStorage`. Its purpose is to store a unique copy of all the strings that will be used in the heap snapshot. For fast access and to avoid duplicates, this class uses a hashmap backed by an array, where collisions are handled by storing elements in the next free location in the array.
+
+I started to suspect that the problem could be caused by collisions, which could lead to long searches in the array. So I added exhaustive logging to see the generated hash keys and, on each insertion, how far the entry ended up from the expected position calculated from the hash key because of collisions.
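+
+A simplified version of such a table, returning the probe distance that was being logged, might look like this. It assumes a power-of-two capacity and linear probing, and it never grows; `ProbingSet` is a hypothetical name and not V8's `base::HashMap`.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <string>
+#include <vector>
+
+// Open addressing: the home slot is hash & (capacity - 1), and on a
+// collision the entry moves on to the next slot.
+class ProbingSet {
+ public:
+  explicit ProbingSet(std::size_t capacity) : slots_(capacity) {
+    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
+  }
+
+  // Inserts `key` if absent and returns how far it sits from its home slot.
+  std::size_t Insert(const std::string& key, uint32_t hash) {
+    const std::size_t mask = slots_.size() - 1;
+    const std::size_t home = hash & mask;
+    for (std::size_t distance = 0; distance < slots_.size(); ++distance) {
+      Slot& slot = slots_[(home + distance) & mask];
+      if (!slot.used) {
+        slot = {true, key};
+        return distance;
+      }
+      if (slot.key == key) return distance;  // already stored
+    }
+    assert(false && "table full");
+    return slots_.size();
+  }
+
+ private:
+  struct Slot {
+    bool used = false;
+    std::string key;
+  };
+  std::vector<Slot> slots_;
+};
+```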
+
+In the logs, things were… not right: the offset of many items was over 20, and in the worst case, in the order of thousands!
+
+Part of the problem was caused by numeric strings — especially strings for a wide range of consecutive numbers. The hash key algorithm had two implementations, one for numeric strings and another for other strings. While the string hash function was quite classical, the implementation for the numeric strings would basically return the value of the number prefixed by the number of digits:
+
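+```cpp
+#include <cstdint>
+#include <string>
+
+// Numeric strings were hashed as "digit count in the high bits, numeric value
+// in the low 24 bits", so consecutive numbers got consecutive hash values.
+int32_t OriginalHash(const std::string& numeric_string) {
+  constexpr int kValueBits = 24;
+  constexpr int32_t kMask = (1 << kValueBits) - 1;  // 0xffffff
+  int32_t value = std::stoi(numeric_string);
+  return (static_cast<int32_t>(numeric_string.length()) << kValueBits) |
+         (value & kMask);
+}
+```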
+
+| `x` | `OriginalHash(x)` |
+| --: | ----------------: |
+| 0 | `0x1000000` |
+| 1 | `0x1000001` |
+| 2 | `0x1000002` |
+| 3 | `0x1000003` |
+| 10 | `0x200000a` |
+| 11 | `0x200000b` |
+| 100 | `0x3000064` |
+
+This function was problematic. Some examples of problems with this hash function:
+
+- Once we inserted a string whose hash key value was a small number, we would run into collisions when we tried to store another number in that location, and there would be similar collisions if we tried to store subsequent numbers consecutively.
+- Or even worse: if there were already a lot of consecutive numbers stored in the map, and we wanted to insert a string whose hash key value was in that range, we had to move the entry along all the occupied locations to find a free location.
+
+What did I do to fix it? As the problem comes mostly from numbers represented as strings that would fall in consecutive positions, I modified the hash function so we would rotate the resulting hash value 2 bits to the left.
+
+```cpp
+int32_t NewHash(const std::string& numeric_string) {
+ return OriginalHash(numeric_string) << 2;
+}
+```
+
+| `x` | `OriginalHash(x)` | `NewHash(x)` |
+| --: | ----------------: | -----------: |
+| 0 | `0x1000000` | `0x4000000` |
+| 1 | `0x1000001` | `0x4000004` |
+| 2 | `0x1000002` | `0x4000008` |
+| 3 | `0x1000003` | `0x400000c` |
+| 10 | `0x200000a` | `0x8000028` |
+| 11 | `0x200000b` | `0x800002c` |
+| 100 | `0x3000064` | `0xc000190` |
+
+So for each pair of consecutive numbers, we would introduce 3 free positions in between. This modification was chosen because empirical testing across several work-sets showed that it worked best for minimizing collisions.
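+
+The effect of the shift can be checked with a small experiment on a toy linear-probing table (an assumption: V8's real table also grows as it fills up). We insert the numeric strings "100" to "299" and then a hypothetical non-numeric key whose hash happens to be 100. The two hash functions are repeated so the block stands alone, and `LastProbeDistance` is a hypothetical helper.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <string>
+#include <vector>
+
+int32_t OriginalHash(const std::string& numeric_string) {
+  constexpr int kValueBits = 24;
+  constexpr int32_t kMask = (1 << kValueBits) - 1;  // 0xffffff
+  return (static_cast<int32_t>(numeric_string.length()) << kValueBits) |
+         (std::stoi(numeric_string) & kMask);
+}
+
+int32_t NewHash(const std::string& numeric_string) {
+  return OriginalHash(numeric_string) << 2;
+}
+
+// Insert `hashes` in order into an empty linear-probing table of size
+// `capacity` (a power of two) and return the probe distance of the last one.
+std::size_t LastProbeDistance(const std::vector<int32_t>& hashes,
+                              std::size_t capacity) {
+  assert(hashes.size() < capacity && (capacity & (capacity - 1)) == 0);
+  const std::size_t mask = capacity - 1;
+  std::vector<bool> used(capacity, false);
+  std::size_t distance = 0;
+  for (int32_t hash : hashes) {
+    const std::size_t home = static_cast<uint32_t>(hash) & mask;
+    for (distance = 0; used[(home + distance) & mask]; ++distance) {
+    }
+    used[(home + distance) & mask] = true;
+  }
+  return distance;
+}
+```
+
+In this toy setup, with `OriginalHash` the last key has to walk past all 200 consecutive occupied slots, while with `NewHash` it finds a free slot one step away.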
+
+[This hashing fix](https://chromium-review.googlesource.com/c/v8/v8/+/4428811) has landed in V8.
+
+## Second optimization: caching source positions
+
+After fixing the hashing, we re-profiled and found a further optimization opportunity that would reduce a significant part of the overhead.
+
+When generating a heap snapshot, for each function in the heap, V8 tries to record its start position in a pair of line and column numbers. This information can be used by the DevTools to display a link to the source code of the function. During usual compilation, however, V8 only stores the start position of each function in the form of a linear offset from the beginning of the script. To calculate the line and column numbers based on the linear offset, V8 needs to traverse the whole script and record where the line breaks are. This calculation turns out to be very expensive.
+
+Normally, after V8 finishes calculating the offsets of line breaks in a script, it caches them in a newly allocated array attached to the script. Unfortunately, the snapshot implementation cannot modify the heap when traversing it, so the newly calculated line information cannot be cached.
+
+The solution? Before generating the heap snapshot, we now iterate over all the scripts in the V8 context to compute and cache the offsets of the line breaks. As this is not done when we traverse the heap for heap snapshot generation, it is still possible to modify the heap and store the source line positions as a cache.
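+
+The caching idea can be sketched as follows: scan the script once to record the offset of every line break, then turn each function's start offset into a (line, column) pair with a binary search instead of a rescan. The function names are hypothetical, and positions here are zero-based.
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <string>
+#include <utility>
+#include <vector>
+
+// The expensive part: one full scan of the script, done once and cached.
+std::vector<int> ComputeLineEnds(const std::string& source) {
+  std::vector<int> line_ends;
+  for (int i = 0; i < static_cast<int>(source.size()); ++i)
+    if (source[i] == '\n') line_ends.push_back(i);
+  line_ends.push_back(static_cast<int>(source.size()));  // end of last line
+  return line_ends;
+}
+
+// The cheap part: each lookup is a binary search over the cached line ends.
+std::pair<int, int> OffsetToLineColumn(const std::vector<int>& line_ends,
+                                       int offset) {
+  auto it = std::lower_bound(line_ends.begin(), line_ends.end(), offset);
+  int line = static_cast<int>(it - line_ends.begin());
+  int line_start = line == 0 ? 0 : line_ends[line - 1] + 1;
+  return {line, offset - line_start};
+}
+```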
+
+[The fix for the caching of line break offsets](https://chromium-review.googlesource.com/c/v8/v8/+/4538766) has also landed in V8.
+
+## Did we make it fast?
+
+After enabling both fixes, we re-profiled. Both of our fixes only affect snapshot generation time, so, as expected, snapshot serialization times were unaffected.
+
+When operating on a JS program containing…
+
+- Development JS, generation time is **50% faster** 👍
+- Production JS, generation time is **90% faster** 😮
+
+Why was there a massive difference between production and development code? The production code is optimized using bundling and minification, so there are fewer JS files, and these files tend to be large. It takes longer to calculate source line positions for these large files, so they benefit the most when we can cache the source position and avoid repeating calculations.
+
+The optimizations were validated on both Windows and Linux target environments.
+
+For the particularly challenging problem originally faced by the Bloomberg engineers, the total end-to-end time to capture a 100MB snapshot was reduced from a painful 10 minutes down to a very pleasant 6 seconds. That is **a 100× win!** 🔥
+
+The optimizations are generic wins that we expect to be widely applicable to anyone performing memory debugging on V8, Node.js, and Chromium. These wins were shipped in V8 v11.5.130, which means they are found in Chromium 115.0.5576.0. We look forward to Node.js gaining these optimizations in the next semver-major release.
+
+## What’s next?
+
+First, it would be useful for Node.js to accept the new `--profile-heap-snapshot` flag in `NODE_OPTIONS`. In some use cases, users cannot control the command line options passed to Node.js directly and have to configure them through the environment variable `NODE_OPTIONS`. Today, Node.js filters V8 command line options set in the environment variable, and only allows a known subset, which could make it harder to test new V8 flags in Node.js, as happened in our case.
+
+Information accuracy in snapshots can be improved further. Today, the source line information for each script is stored in the V8 heap itself. That’s a problem because we want to measure the heap precisely without the performance measurement overhead affecting the subject we are observing. Ideally, we would store the cache of line information outside the V8 heap in order to make heap snapshot information more accurate.
+
+Finally, now that we have improved the generation phase, the biggest remaining cost is the serialization phase. Further analysis may reveal new optimization opportunities in serialization.
+
+## Credits
+
+This was possible thanks to the work of [Igalia](https://www.igalia.com/) and [Bloomberg](https://techatbloomberg.com/) engineers.
diff --git a/src/blog/static-roots.md b/src/blog/static-roots.md
new file mode 100644
index 000000000..96dd7863a
--- /dev/null
+++ b/src/blog/static-roots.md
@@ -0,0 +1,41 @@
+---
+title: 'Static Roots: Objects with Compile-Time Constant Addresses'
+author: 'Olivier Flückiger'
+avatars:
+ - olivier-flueckiger
+date: 2024-02-05
+tags:
+ - JavaScript
+description: "Static Roots makes the addresses of certain JS objects a compile-time constant."
+tweet: ''
+---
+
+Did you ever wonder where `undefined`, `true`, and other core JavaScript objects come from? These objects are the atoms of any user-defined object and need to be there first. V8 calls them immovable immutable roots and they live in their own heap – the read-only heap. Since they are used constantly, quick access is crucial. And what could be quicker than correctly guessing their memory address at compile time?
+
+As an example, consider the extremely common `IsUndefined` [API function](https://source.chromium.org/chromium/chromium/src/+/main:v8/include/v8-value.h?q=symbol:%5Cbv8::Value::IsUndefined%5Cb%20case:yes). Instead of having to look up the address of the `undefined` object for reference, what if we could simply check if an object's pointer ends in, say, `0x61` to know if it is undefined? This is exactly what V8’s *static roots* feature achieves. This post explores the hurdles we had to clear to get there. The feature landed in Chrome 111 and brought performance benefits across the whole VM, particularly speeding up C++ code and builtin functions.
+
+## Bootstrapping the Read-Only Heap
+
+Creating the read-only objects takes some time, so V8 creates them at compile time. To compile V8, first a minimal proto-V8 binary called `mksnapshot` is compiled. This one creates all the shared read-only objects as well as the native code of builtin functions and writes them into a snapshot. Then, the actual V8 binary is compiled and bundled with the snapshot. To start V8 the snapshot is loaded into memory and we can immediately start using its content. The following diagram shows the simplified build process for the standalone `d8` binary.
+
+![](/_img/static-roots/static-roots1.svg)
+
+Once `d8` is up and running all the read-only objects have their fixed place in memory and never move. When we JIT code, we can e.g., directly refer to `undefined` by its address. However, when building the snapshot and when compiling the C++ for libv8 the address is not known yet. It depends on two things unknown at build time. First, the binary layout of the read-only heap and second, where in the memory space that read-only heap is located.
+
+## How to Predict Addresses?
+
+V8 uses [pointer compression](https://v8.dev/blog/pointer-compression). Instead of full 64-bit addresses we refer to objects by a 32-bit offset into a 4GB region of memory. For many operations such as property loads or comparisons, the 32-bit offset into that cage is all that is needed to uniquely identify an object. Therefore our second problem, not knowing where in the memory space the read-only heap is placed, is not actually a problem. We simply place the read-only heap at the start of every pointer compression cage, thus giving it a known location. For instance, of all objects in V8’s heap, `undefined` always has the smallest compressed address, starting at 0x61 bytes. That’s how we know that if the lower 32 bits of any JS object’s full address are 0x61, then it must be `undefined`.
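+
+A simplified model of this, with illustrative names and an arbitrary cage base chosen for the example: decompression adds the cage base to the 32-bit offset, and because the read-only heap sits at the start of the cage, checking for `undefined` only needs the low 32 bits of the address.
+
+```cpp
+#include <cassert>
+#include <cstdint>
+
+// The compressed address of `undefined` as quoted in this post; the name is
+// illustrative, not V8's.
+constexpr uint32_t kUndefinedCompressed = 0x61;
+
+// Compression keeps the low 32 bits, i.e. the offset into the cage.
+uint32_t Compress(uint64_t full_address) {
+  return static_cast<uint32_t>(full_address);
+}
+
+// Decompression adds the (4 GB-aligned) cage base back.
+uint64_t Decompress(uint64_t cage_base, uint32_t compressed) {
+  return cage_base + compressed;
+}
+
+// No root-table load is needed: compare the low 32 bits with a constant.
+bool IsUndefined(uint64_t full_address) {
+  return Compress(full_address) == kUndefinedCompressed;
+}
+```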
+
+This is already useful, but we want to be able to use this address in the snapshot and in libv8 – a seemingly circular problem. However, if we ensure that `mksnapshot` deterministically creates a bit-identical read-only heap, then we can re-use these addresses across builds. To use them in libv8 itself, we basically build V8 twice:
+
+![](/_img/static-roots/static-roots2.svg)
+
+The first time round calling `mksnapshot` the only artifact produced is a file that contains the [addresses](https://source.chromium.org/chromium/chromium/src/+/main:v8/src/roots/static-roots.h) relative to the cage base of every object in the read-only heap. In the second stage of the build we compile libv8 again and a flag ensures that whenever we refer to `undefined` we literally use `cage_base + StaticRoot::kUndefined` instead; the static offset of `undefined` of course being defined in the static-roots.h file. In many cases this will allow the C++ compiler creating libv8 and the builtins compiler in `mksnapshot` to create much more efficient code as the alternative is to always load the address from a global array of root objects. We end up with a `d8` binary where the compressed address of `undefined` is hardcoded to be `0x61`.
+
+Well, morally this is how everything works, but practically we only build V8 once – ain’t nobody got time for this. The generated static-roots.h file is cached in the source repository and only needs to be recreated if we change the layout of the read-only heap.
+
+## Further Applications
+
+Speaking of practicalities, static roots enable even more optimizations. For instance, we have since grouped common objects together, allowing us to implement some operations as range checks over their addresses. All string maps (i.e., the [hidden-class](https://v8.dev/docs/hidden-classes) meta objects describing the layout of different string types) are next to each other, hence an object is a string if its map has a compressed address between `0xdd` and `0x49d`. Similarly, truthy objects must have an address that is at least `0xc1`.
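+
+A minimal sketch of such range checks on compressed addresses, using the bounds quoted above. The article only says "between", so treating both string-map bounds as inclusive is an assumption, and the class and method names are hypothetical.
+
+```java
+/** Hypothetical range checks over compressed read-only-heap addresses. */
+final class RootRanges {
+    // Assumption: both bounds are inclusive.
+    static final int FIRST_STRING_MAP = 0xdd;
+    static final int LAST_STRING_MAP = 0x49d;
+    // Truthy objects must have a compressed address of at least this value.
+    static final int FIRST_POSSIBLY_TRUTHY = 0xc1;
+
+    // Two unsigned comparisons instead of loading and inspecting the map's fields.
+    static boolean isStringMap(int compressedMap) {
+        return Integer.compareUnsigned(compressedMap, FIRST_STRING_MAP) >= 0
+            && Integer.compareUnsigned(compressedMap, LAST_STRING_MAP) <= 0;
+    }
+
+    // Contrapositive of the truthiness rule: anything below the bound is falsy.
+    // Objects at or above it still need further checks to decide truthiness.
+    static boolean isDefinitelyFalsy(int compressedObject) {
+        return Integer.compareUnsigned(compressedObject, FIRST_POSSIBLY_TRUTHY) < 0;
+    }
+}
+```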
+
+Not everything is about the performance of JIT-compiled code in V8. As this project has shown, a relatively small change to the C++ code can have a significant impact too. For instance, Speedometer 2, a benchmark which exercises the V8 API and the interaction between V8 and its embedder, gained about 1% in score on an M1 CPU thanks to static roots.
diff --git a/src/blog/wasm-gc-porting.md b/src/blog/wasm-gc-porting.md
new file mode 100644
index 000000000..b06c8cf7e
--- /dev/null
+++ b/src/blog/wasm-gc-porting.md
@@ -0,0 +1,219 @@
+---
+title: 'A new way to bring garbage collected programming languages efficiently to WebAssembly'
+author: 'Alon Zakai'
+avatars:
+ - 'alon-zakai'
+date: 2023-11-01
+tags:
+ - WebAssembly
+tweet: '1720161507324076395'
+---
+
+A recent article on [WebAssembly Garbage Collection (WasmGC)](https://developer.chrome.com/blog/wasmgc) explains at a high level how the [Garbage Collection (GC) proposal](https://github.com/WebAssembly/gc) aims to better support GC languages in Wasm, which is very important given their popularity. In this article, we will get into the technical details of how GC languages such as Java, Kotlin, Dart, Python, and C# can be ported to Wasm. There are in fact two main approaches:
+
+- The “**traditional**” porting approach, in which an existing implementation of the language is compiled to WasmMVP, that is, the WebAssembly Minimum Viable Product that launched in 2017.
+- The **WasmGC** porting approach, in which the language is compiled down to GC constructs in Wasm itself that are defined in the recent GC proposal.
+
+We’ll explain what those two approaches are and the technical tradeoffs between them, especially regarding size and speed. While doing so, we’ll see that WasmGC has several major advantages, but it also requires new work both in toolchains and in Virtual Machines (VMs). The later sections of this article will explain what the V8 team has been doing in those areas, including benchmark numbers. If you’re interested in Wasm, GC, or both, we hope you’ll find this interesting, and make sure to check out the demo and getting started links near the end!
+
+## The “Traditional” Porting Approach
+
+How are languages typically ported to new architectures? Say that Python wants to run on the [ARM architecture](https://en.wikipedia.org/wiki/ARM_architecture_family), or Dart wants to run on the [MIPS architecture](https://en.wikipedia.org/wiki/MIPS_architecture). The general idea is then to recompile the VM for that architecture. Aside from that, if the VM has architecture-specific code, like just-in-time (JIT) or ahead-of-time (AOT) compilation, then you also implement a JIT/AOT backend for the new architecture. This approach makes a lot of sense, because often the main part of the codebase can just be recompiled for each new architecture you port to:
+
+
+![Structure of a ported VM](/_img/wasm-gc-porting/ported-vm.svg "On the left, main runtime code including a parser, garbage collector, optimizer, library support, and more; on the right, separate backend code for x64, ARM, etc.")
+
+In this figure, the parser, library support, garbage collector, optimizer, etc., are all shared between all architectures in the main runtime. Porting to a new architecture only requires a new backend for it, which is a comparatively small amount of code.
+
+Wasm is a low-level compiler target and so it is not surprising that the traditional porting approach can be used. Since Wasm first started, we have seen this work well in practice in many cases, such as [Pyodide for Python](https://pyodide.org/en/stable/) and [Blazor for C#](https://dotnet.microsoft.com/en-us/apps/aspnet/web-apps/blazor) (note that Blazor supports both [AOT](https://learn.microsoft.com/en-us/aspnet/core/blazor/host-and-deploy/webassembly?view=aspnetcore-7.0#ahead-of-time-aot-compilation) and [JIT](https://github.com/dotnet/runtime/blob/main/docs/design/mono/jiterpreter.md) compilation, so it is a nice example of all the above). In all these cases, a runtime for the language is compiled into WasmMVP just like any other program that is compiled to Wasm, and so the result uses WasmMVP’s linear memory, table, functions, and so forth.
+
+As mentioned before, this is how languages are typically ported to new architectures, so it makes a lot of sense for the usual reason that you can reuse almost all the existing VM code, including language implementation and optimizations. It turns out, however, that there are several Wasm-specific downsides to this approach, and that is where WasmGC can help.
+
+## The WasmGC Porting Approach
+
+Briefly, the GC proposal for WebAssembly (“WasmGC”) allows you to define struct and array types and to perform operations on them such as creating instances, reading from and writing to fields, and casting between types (for more details, see the [proposal overview](https://github.com/WebAssembly/gc/blob/main/proposals/gc/Overview.md)). Those objects are managed by the Wasm VM’s own GC implementation, which is the main difference between this approach and the traditional porting approach.
+
+It may help to think of it like this: _If the traditional porting approach is how one ports a language to an **architecture**, then the WasmGC approach is very similar to how one ports a language to a **VM**_. For example, if you want to port Java to JavaScript, then you can use a compiler like [J2CL](https://j2cl.io) which represents Java objects as JavaScript objects, and those JavaScript objects are then managed by the JavaScript VM just like all others. Porting languages to existing VMs is a very useful technique, as can be seen by all the languages that compile to [JavaScript](https://gist.github.com/matthiasak/c3c9c40d0f98ca91def1), [the JVM](https://en.wikipedia.org/wiki/List_of_JVM_languages), and [the CLR](https://en.wikipedia.org/wiki/List_of_CLI_languages).
+
+This architecture/VM metaphor is not an exact one, in particular because WasmGC intends to be lower-level than the other VMs we mentioned in the last paragraph. Still, WasmGC defines VM-managed structs and arrays and a type system for describing their shapes and relationships, and porting to WasmGC is the process of representing your language’s constructs with those primitives; this is certainly higher-level than a traditional port to WasmMVP (which lowers everything into untyped bytes in linear memory). Thus, WasmGC is quite similar to ports of languages to VMs, and it shares the advantages of such ports, in particular good integration with the target VM and reuse of its optimizations.
+
+## Comparing the Two Approaches
+
+Now that we have an idea of what the two porting approaches for GC languages are, let’s see how they compare.
+
+### Shipping memory management code
+
+In practice, a lot of Wasm code is run inside a VM that already has a garbage collector, which is the case on the Web, and also in runtimes like [Node.js](https://nodejs.org/), [workerd](https://github.com/cloudflare/workerd), [Deno](https://deno.com/), and [Bun](https://bun.sh/). In such places, shipping a GC implementation adds unnecessary size to the Wasm binary. In fact, this is not just a problem with GC languages in WasmMVP, but also with languages using linear memory like C, C++, and Rust, since code in those languages that does any sort of interesting allocation will end up bundling `malloc/free` to manage linear memory, which requires several kilobytes of code. For example, `dlmalloc` requires 6K, and even a malloc that trades off speed for size, like [`emmalloc`](https://groups.google.com/g/emscripten-discuss/c/SCZMkfk8hyk/m/yDdZ8Db3AwAJ), takes over 1K. WasmGC, on the other hand, has the VM automatically manage memory for us so we need no memory management code at all—neither a GC nor `malloc/free`—in the Wasm. In [the previously-mentioned article on WasmGC](https://developer.chrome.com/blog/wasmgc), the size of the `fannkuch` benchmark was measured and WasmGC was much smaller than C or Rust—**2.3** K vs **6.1-9.6** K—for this exact reason.
+
+### Cycle collection
+
+In browsers, Wasm often interacts with JavaScript (and through JavaScript, Web APIs), but in WasmMVP (and even with the [reference types](https://github.com/WebAssembly/reference-types/blob/master/proposals/reference-types/Overview.md) proposal) there is no way to have bidirectional links between Wasm and JS that allow cycles to be collected in a fine-grained manner. Links to JS objects can only be placed in the Wasm table, and links back to the Wasm can only refer to the entire Wasm instance as a single big object, like this:
+
+
+![Cycles between JS and an entire Wasm module](/_img/wasm-gc-porting/cycle2.svg "Individual JS objects refer to a single big Wasm instance, and not to individual objects inside it.")
+
+That is not enough to efficiently collect specific cycles of objects where some happen to be in the compiled VM and some in JavaScript. With WasmGC, on the other hand, we define Wasm objects that the VM is aware of, and so we can have proper references from Wasm to JavaScript and back:
+
+![Cycles between JS and WasmGC objects](/_img/wasm-gc-porting/cycle3.svg "JS and Wasm objects with links between them.")
+
+### GC references on the stack
+
+GC languages must be aware of references on the stack, that is, from local variables in a call scope, as such references may be the only thing keeping an object alive. In a traditional port of a GC language, that is a problem because Wasm’s sandboxing prevents programs from inspecting their own stack. There are solutions for traditional ports, like a shadow stack ([which can be done automatically](https://github.com/WebAssembly/binaryen/blob/main/src/passes/SpillPointers.cpp)), or only collecting garbage when nothing is on the stack (which is the case in between turns of the JavaScript event loop). A possible future addition which would help traditional ports might be [stack scanning support](https://github.com/WebAssembly/design/issues/1459) in Wasm. For now, only WasmGC can handle stack references without overhead, and it does so completely automatically since the Wasm VM is in charge of GC.
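+
+To make the shadow-stack idea concrete, here is a minimal sketch, assuming a toy heap where objects are integer IDs with outgoing references. Compiled code pushes each live reference onto an explicit stack that the collector can read, because it cannot scan the real Wasm stack. All names are hypothetical; Binaryen's SpillPointers pass rewrites Wasm code rather than working with objects like these.
+
+```java
+import java.util.*;
+
+/** Toy mark-sweep heap whose roots live on an explicit shadow stack (hypothetical). */
+final class ShadowStackHeap {
+    private final Map<Integer, List<Integer>> edges = new HashMap<>();
+    private final Deque<Integer> shadowStack = new ArrayDeque<>();
+    private int nextId = 1;
+
+    int allocate(Integer... children) {
+        int id = nextId++;
+        edges.put(id, new ArrayList<>(Arrays.asList(children)));
+        return id;
+    }
+
+    // Compiled code spills a local reference here while it is live...
+    void pushRoot(int id) { shadowStack.push(id); }
+
+    // ...and pops it when the local goes out of scope.
+    void popRoot() { shadowStack.pop(); }
+
+    /** Mark phase: everything reachable from the shadow stack survives. */
+    Set<Integer> markLive() {
+        Set<Integer> live = new HashSet<>();
+        Deque<Integer> work = new ArrayDeque<>(shadowStack);
+        while (!work.isEmpty()) {
+            int id = work.pop();
+            if (live.add(id)) work.addAll(edges.get(id));
+        }
+        return live;
+    }
+
+    /** Sweep phase: drops unmarked objects and returns how many were freed. */
+    int collect() {
+        Set<Integer> live = markLive();
+        int before = edges.size();
+        edges.keySet().retainAll(live);
+        return before - edges.size();
+    }
+}
+```
+
+The cost the article alludes to is visible here: every push and pop is extra work on the hot path, which WasmGC avoids because the VM can find stack references itself.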
+
+### GC Efficiency
+
+A related issue is the efficiency of performing a GC. Both porting approaches have potential advantages here. A traditional port can reuse optimizations in an existing VM that may be tailored to a particular language, such as a heavy focus on optimizing interior pointers or short-lived objects. A WasmGC port that runs on the Web, on the other hand, has the advantage of reusing all the work that has gone into making JavaScript GC fast, including techniques like [generational GC](https://en.wikipedia.org/wiki/Tracing_garbage_collection#Generational_GC_(ephemeral_GC)), [incremental collection](https://en.wikipedia.org/wiki/Tracing_garbage_collection#Stop-the-world_vs._incremental_vs._concurrent), etc. WasmGC also leaves GC to the VM, which makes things like efficient write barriers simpler.
+
+Another advantage of WasmGC is that the GC can be aware of things like memory pressure and can adjust its heap size and collection frequency accordingly, again, as JavaScript VMs already do on the Web.
+
+### Memory fragmentation
+
+Over time, and especially in long-running programs, `malloc/free` operations on WasmMVP linear memory can cause *fragmentation*. Imagine that we have a total of 2 MB of memory, and right in the middle of it we have an existing small allocation of only a few bytes. In languages like C, C++, and Rust it is impossible to move an arbitrary allocation at runtime, and so we have almost 1 MB to the left of that allocation and almost 1 MB to the right. But those are two separate fragments, and so if we try to allocate 1.5 MB we will fail, even though we do have that amount of total unallocated memory:
+
+
+![](/_img/wasm-gc-porting/fragment1.svg "A linear memory with a rude small allocation right in the middle, splitting the free space into 2 halves.")
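+
+The arithmetic of this example can be checked with a small sketch, assuming a 2 MB linear memory, a 16-byte allocation starting at the 1 MB mark, and an allocator that can never move anything. The 16-byte size and the names are assumptions for illustration.
+
+```java
+import java.util.*;
+
+/** Free-space accounting for a non-moving allocator (hypothetical sketch). */
+final class Fragmentation {
+    record Block(long start, long size) {}
+
+    /** Returns the free gaps left in [0, memorySize) by the given allocations. */
+    static List<Block> freeGaps(long memorySize, List<Block> allocations) {
+        List<Block> sorted = new ArrayList<>(allocations);
+        sorted.sort(Comparator.comparingLong(Block::start));
+        List<Block> gaps = new ArrayList<>();
+        long cursor = 0;
+        for (Block b : sorted) {
+            if (b.start() > cursor) gaps.add(new Block(cursor, b.start() - cursor));
+            cursor = Math.max(cursor, b.start() + b.size());
+        }
+        if (cursor < memorySize) gaps.add(new Block(cursor, memorySize - cursor));
+        return gaps;
+    }
+
+    /** A request succeeds only if a single contiguous gap is large enough. */
+    static boolean canAllocate(long memorySize, List<Block> allocations, long request) {
+        return freeGaps(memorySize, allocations).stream().anyMatch(g -> g.size() >= request);
+    }
+
+    static long totalFree(long memorySize, List<Block> allocations) {
+        return freeGaps(memorySize, allocations).stream().mapToLong(Block::size).sum();
+    }
+}
+```
+
+With those assumptions, almost 2 MB is free in total, yet `canAllocate` rejects a 1.5 MB request because neither gap is larger than 1 MB.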
+
+Such fragmentation can force a Wasm module to grow its memory more often, which [adds overhead and can cause out-of-memory errors](https://github.com/WebAssembly/design/issues/1397); [improvements](https://github.com/WebAssembly/design/issues/1439) are being designed, but it is a challenging problem. This is an issue in all WasmMVP programs, including traditional ports of GC languages (note that the GC objects themselves might be movable, but not parts of the runtime itself). WasmGC, on the other hand, avoids this issue because memory is completely managed by the VM, which can move objects around to compact the GC heap and avoid fragmentation.
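+
+A moving collector sidesteps the problem by sliding live objects together. Below is a minimal sketch of the forwarding step of a sliding (mark-compact style) collector, assuming the live objects are already sorted by address. It illustrates the general technique only and is not a description of V8's compactor; the names are hypothetical.
+
+```java
+import java.util.*;
+
+/** Forwarding-address computation for a sliding compactor (hypothetical sketch). */
+final class SlidingCompaction {
+    record LiveObject(long address, long size) {}
+
+    /** Maps each live object's old address to its new, compacted address. */
+    static Map<Long, Long> forwardingAddresses(List<LiveObject> liveByAddress) {
+        Map<Long, Long> forward = new LinkedHashMap<>();
+        long next = 0; // live objects are packed from the bottom of the heap
+        for (LiveObject obj : liveByAddress) {
+            forward.put(obj.address(), next);
+            next += obj.size();
+        }
+        return forward;
+    }
+
+    /** After compaction, all free space forms one contiguous block at the top. */
+    static long contiguousFreeAfterCompaction(long heapSize, List<LiveObject> live) {
+        return heapSize - live.stream().mapToLong(LiveObject::size).sum();
+    }
+}
+```
+
+A full compactor would then rewrite every reference through the forwarding map, which is only possible because the VM knows where all references are.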
+
+### Developer tools integration
+
+In a traditional port to WasmMVP, objects are placed in linear memory, where developer tools struggle to provide useful information because they only see bytes without high-level type information. In WasmGC, on the other hand, the VM manages GC objects, so better integration is possible. For example, in Chrome you can use the heap profiler to measure memory usage of a WasmGC program:
+
+
+![WasmGC code running in the Chrome heap profiler](/_img/wasm-gc-porting/devtools.png)
+
+The figure above shows the Memory tab in Chrome DevTools, where we have a heap snapshot of a page that ran WasmGC code that created 1,001 small objects in a [linked list](https://gist.github.com/kripken/5cd3e18b6de41c559d590e44252eafff). You can see the name of the object’s type, `$Node`, and the field `$next` which refers to the next object in the list. All the usual heap snapshot information is present, like the number of objects, the shallow size, the retained size, and so forth, letting us easily see how much memory is actually used by WasmGC objects. Other Chrome DevTools features like the debugger also work on WasmGC objects.
+
+### Language Semantics
+
+When you recompile a VM in a traditional port you get the exact language you expect, since you’re running familiar code that implements that language. That’s a major advantage! In comparison, with a WasmGC port you may end up considering compromises in semantics in return for efficiency. That is because with WasmGC we define new GC types—structs and arrays—and compile to them. As a result, we can’t simply compile a VM written in C, C++, Rust, or similar languages to that form, since those only compile to linear memory, and so WasmGC can’t help with the great majority of existing VM codebases. Instead, in a WasmGC port you typically write new code that transforms your language’s constructs into WasmGC primitives. And there are multiple ways to do that transformation, with different tradeoffs.
+
+Whether compromises are needed or not depends on how a particular language’s constructs can be implemented in WasmGC. For example, WasmGC struct fields have fixed indexes and types, so a language that wishes to access fields in a more dynamic manner [may have challenges](https://github.com/WebAssembly/gc/issues/397); there are various ways to work around that, and in that space of solutions some options may be simpler or faster but not support the full original semantics of the language. (WasmGC has other current limitations as well, for example, it lacks [interior pointers](https://go.dev/blog/ismmkeynote); over time such things are expected to [improve](https://github.com/WebAssembly/gc/blob/main/proposals/gc/Post-MVP.md).)
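+
+One possible workaround, sketched here under assumptions: fields known at compile time get fixed struct slots (fast, index-based access), while everything else goes into a side table looked up by name (slower, but it keeps dynamic semantics). The Java class stands in for a WasmGC struct type; the field names and the side-table design are hypothetical and not something the proposal prescribes.
+
+```java
+import java.util.*;
+
+/** Stand-in for a WasmGC struct: fixed, typed slots plus a dynamic overflow table. */
+final class LoweredObject {
+    // Fixed slots, analogous to struct fields with static indexes and types.
+    double x;
+    double y;
+    // Workaround for dynamic access: properties unknown at compile time.
+    final Map<String, Object> extra = new HashMap<>();
+
+    /** Generic by-name read the compiler would emit when it cannot resolve a field statically. */
+    Object get(String name) {
+        return switch (name) {
+            case "x" -> x;
+            case "y" -> y;
+            default -> extra.get(name);
+        };
+    }
+
+    /** Generic by-name write; fixed slots keep their static type (double here). */
+    void set(String name, Object value) {
+        switch (name) {
+            case "x" -> x = (Double) value;
+            case "y" -> y = (Double) value;
+            default -> extra.put(name, value);
+        }
+    }
+}
+```
+
+The tradeoff mirrors the one described above: accesses the compiler can resolve hit fixed slots directly, while fully dynamic code pays for a lookup, and a port could also choose to reject such code to stay fast.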
+
+As we’ve mentioned, compiling to WasmGC is like compiling to an existing VM, and there are many examples of compromises that make sense in such ports. For example, [dart2js (Dart compiled to JavaScript) numbers behave differently than in the Dart VM](https://dart.dev/guides/language/numbers), and [IronPython (Python compiled to .NET) strings behave like C# strings](https://nedbatchelder.com/blog/201703/ironpython_is_weird.html). As a result, not all programs of a language may run in such ports, but there are good reasons for these choices: Implementing dart2js numbers as JavaScript numbers lets VMs optimize them well, and using .NET strings in IronPython means you can pass those strings to other .NET code with no overhead.
+
+While compromises may be needed in WasmGC ports, WasmGC also has some advantages as a compiler target compared to JavaScript in particular. For example, while dart2js has the numeric limitations we just mentioned, [dart2wasm](https://flutter.dev/wasm) (Dart compiled to WasmGC) behaves exactly as it should, without compromise (that is possible since Wasm has efficient representations for the numeric types Dart requires).
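+
+The numeric difference is easy to see with a tiny sketch, assuming the usual representations: Dart's `int` is a 64-bit integer in the Dart VM (and Wasm has `i64`), while a JavaScript number is an IEEE 754 double, which represents integers exactly only up to 2^53. Java's `long` and `double` stand in for the two representations here.
+
+```java
+/** Contrasts 64-bit integer arithmetic with integers stored as doubles. */
+final class NumberSemantics {
+    // Like Wasm's i64 (and Dart's native int): exact for every 64-bit result.
+    static long addAsInt64(long a, long b) {
+        return a + b;
+    }
+
+    // Like a JavaScript number: the sum is rounded to the nearest double.
+    static long addAsDouble(long a, long b) {
+        return (long) ((double) a + (double) b);
+    }
+}
+```
+
+With `a = 2^53` and `b = 1`, `addAsInt64` returns 2^53 + 1, while `addAsDouble` rounds the result back to 2^53.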
+
+Why isn’t this an issue for traditional ports? Simply because they recompile an existing VM into linear memory, where objects are stored in untyped bytes, which is lower-level than WasmGC. When all you have are untyped bytes then you have a lot more flexibility to do all manner of low-level (and potentially unsafe) tricks, and by recompiling an existing VM you get all the tricks that VM has up its sleeve.
+
+### Toolchain Effort
+
+As we mentioned in the previous subsection, a WasmGC port cannot simply recompile an existing VM. You might be able to reuse certain code (such as parser logic and AOT optimizations, because those don’t integrate with the GC at runtime), but in general WasmGC ports require a substantial amount of new code.
+
+In comparison, traditional ports to WasmMVP can be simpler and quicker: for example, you can compile the Lua VM (written in C) to Wasm in just a few minutes. A WasmGC port of Lua, on the other hand, would require more effort as you’d need to write code to lower Lua’s constructs into WasmGC structs and arrays, and you’d need to decide how to actually do that within the specific constraints of the WasmGC type system.
+
+Greater toolchain effort is therefore a significant disadvantage of WasmGC porting. However, given all the advantages we’ve mentioned earlier, we think WasmGC is still very appealing! The ideal situation would be one in which WasmGC’s type system could support all languages efficiently, and all languages put in the work to implement a WasmGC port. The first part of that will be helped by [future additions to the WasmGC type system](https://github.com/WebAssembly/gc/blob/main/proposals/gc/Post-MVP.md), and for the second, we can reduce the work involved in WasmGC ports by sharing the effort on the toolchain side as much as possible. Luckily, it turns out that WasmGC makes it very practical to share toolchain work, which we’ll see in the next section.
+
+## Optimizing WasmGC
+
+We’ve already mentioned that WasmGC ports have potential speed advantages, such as using less memory and reusing optimizations in the host GC. In this section we’ll show other interesting optimization advantages of WasmGC over WasmMVP, which can have a large impact on how WasmGC ports are designed and how fast the final results are.
+
+The key issue here is that *WasmGC is higher-level than WasmMVP*. To get an intuition for that, remember that we’ve already said that a traditional port to WasmMVP is like porting to a new architecture while a WasmGC port is like porting to a new VM, and VMs are of course higher-level abstractions over architectures—and higher-level representations are often more optimizable. We can perhaps see this more clearly with a concrete example in pseudocode:
+
+```csharp
+func foo() {
+ let x = allocate(); // Allocate a GC object.
+ x.val = 10; // Set a field to 10.
+ let y = allocate(); // Allocate another object.
+ y.val = x.val; // This must be 10.
+ return y.val; // This must also be 10.
+}
+```
+
+As the comments indicate, `x.val` will contain `10`, as will `y.val`, so the final return is of `10` as well, and then the optimizer can even remove the allocations, leading to this:
+
+```csharp
+func foo() {
+ return 10;
+}
+```
+
+Great! Sadly, however, that is not possible in WasmMVP, because each allocation turns into a call to `malloc`, a large and complex function in the Wasm which has side effects on linear memory. As a result of those side effects, the optimizer must assume that the second allocation (for `y`) might alter `x.val`, which also resides in linear memory. Memory management is complex, and when we implement it inside the Wasm at a low level then our optimization options are limited.
+
+In contrast, in WasmGC we operate at a higher level: each allocation executes the `struct.new` instruction, a VM operation that we can actually reason about, and an optimizer can track references as well to conclude that `x.val` is written exactly once with the value `10`. As a result we can optimize that function down to a simple return of `10` as expected!
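+
+The reasoning above can be illustrated with a toy store-to-load forwarding pass over a made-up straight-line IR (it is not Binaryen's or V8's representation). An allocation is either an opaque call with arbitrary side effects on memory, standing in for `malloc` in WasmMVP, or a known fresh allocation, standing in for `struct.new`. Only in the second case can the pass keep what it knows about earlier objects.
+
+```java
+import java.util.*;
+
+/** Toy store-to-load forwarding over a hypothetical straight-line IR. */
+final class ForwardingPass {
+    interface Op {}
+    /** obj = allocate(); opaque == true models a malloc call with unknown side effects. */
+    record Alloc(String obj, boolean opaque) implements Op {}
+    /** obj.val = value */
+    record Store(String obj, int value) implements Op {}
+    /** dst.val = src.val */
+    record Copy(String dst, String src) implements Op {}
+    /** return obj.val */
+    record Return(String obj) implements Op {}
+
+    /** Returns the constant the program returns, if the pass can prove it. */
+    static OptionalInt foldReturn(List<Op> program) {
+        Map<String, Integer> known = new HashMap<>(); // obj -> known value of obj.val
+        for (Op op : program) {
+            if (op instanceof Alloc a) {
+                if (a.opaque()) known.clear(); // the call may have written any memory
+                known.remove(a.obj());         // the name now refers to a new object
+            } else if (op instanceof Store s) {
+                known.put(s.obj(), s.value());
+            } else if (op instanceof Copy c) {
+                Integer v = known.get(c.src());
+                if (v != null) known.put(c.dst(), v); else known.remove(c.dst());
+            } else if (op instanceof Return r) {
+                Integer v = known.get(r.obj());
+                return v == null ? OptionalInt.empty() : OptionalInt.of(v);
+            }
+        }
+        return OptionalInt.empty();
+    }
+}
+```
+
+Encoding `foo` with fresh allocations folds the return to 10; with opaque allocations, the second call wipes what the pass knew about `x.val`, so nothing can be folded.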
+
+Aside from allocations, other things WasmGC adds are explicit function pointers (`ref.func`) and calls using them (`call_ref`), types on struct and array fields (unlike untyped linear memory), and more. As a result, WasmGC is a higher-level Intermediate Representation (IR) than WasmMVP, and much more optimizable.
+
+If WasmMVP has limited optimizability, why is it as fast as it is? Wasm, after all, can run pretty close to full native speed. That is because WasmMVP is generally the output of a powerful optimizing compiler like LLVM. LLVM IR, like WasmGC and unlike WasmMVP, has a special representation for allocations and so forth, so LLVM can optimize the things we’ve been discussing. The design of WasmMVP is that most optimizations happen at the toolchain level *before* Wasm, and Wasm VMs only do the “last mile” of optimization (things like register allocation).
+
+Can WasmGC adopt a toolchain model similar to WasmMVP's, and in particular use LLVM? Unfortunately, no, since LLVM does not support WasmGC (some amount of support [has been explored](https://github.com/Igalia/ref-cpp), but it is hard to see how full support could even work). Also, many GC languages do not use LLVM; there is a wide variety of compiler toolchains in that space. And so we need something else for WasmGC.
+
+Luckily, as we’ve mentioned, WasmGC is very optimizable, and that opens up new options. Here is one way to look at that:
+
+![WasmMVP and WasmGC toolchain workflows](/_img/wasm-gc-porting/workflows1.svg)
+
+Both the WasmMVP and WasmGC workflows begin with the same two boxes on the left: we start with source code that is processed and optimized in a language-specific manner (which each language knows best about itself). Then a difference appears: for WasmMVP we must perform general-purpose optimizations first and then lower to Wasm, while for WasmGC we have the option to first lower to Wasm and optimize later. This is important because there is a large advantage to optimizing after lowering: then we can share toolchain code for general-purpose optimizations between all languages that compile to WasmGC. The next figure shows what that looks like:
+
+
+![Multiple WasmGC toolchains are optimized by the Binaryen optimizer](/_img/wasm-gc-porting/workflows2.svg "Several languages on the left compile to WasmGC in the middle, and all that flows into the Binaryen optimizer (wasm-opt).")
+
+Since we can do general optimizations *after* compiling to WasmGC, a Wasm-to-Wasm optimizer can help all WasmGC compiler toolchains. For this reason the V8 team has invested in WasmGC support in [Binaryen](https://github.com/WebAssembly/binaryen/), which all toolchains can use as the `wasm-opt` command-line tool. We’ll focus on that in the next subsection.
+
+### Toolchain optimizations
+
+[Binaryen](https://github.com/WebAssembly/binaryen/), the WebAssembly toolchain optimizer project, already had a [wide range of optimizations](https://www.youtube.com/watch?v=_lLqZR4ufSI) for WasmMVP content such as inlining, constant propagation, dead code elimination, etc., almost all of which also apply to WasmGC. However, as we mentioned before, WasmGC allows us to do a lot more optimizations than WasmMVP, and we have written a lot of new optimizations accordingly:
+
+- [Escape analysis](https://github.com/WebAssembly/binaryen/blob/main/src/passes/Heap2Local.cpp) to move heap allocations to locals.
+- [Devirtualization](https://github.com/WebAssembly/binaryen/blob/main/src/passes/ConstantFieldPropagation.cpp) to turn indirect calls into direct ones (that can then be inlined, potentially).
+- [More powerful global dead code elimination](https://github.com/WebAssembly/binaryen/pull/4621).
+- [Whole-program type-aware content flow analysis (GUFA)](https://github.com/WebAssembly/binaryen/pull/4598).
+- [Cast optimizations](https://github.com/WebAssembly/binaryen/blob/main/src/passes/OptimizeCasts.cpp) such as removing redundant casts and moving them to earlier locations.
+- [Type pruning](https://github.com/WebAssembly/binaryen/blob/main/src/passes/GlobalTypeOptimization.cpp).
+- [Type merging](https://github.com/WebAssembly/binaryen/blob/main/src/passes/TypeMerging.cpp).
+- Type refining (for [locals](https://github.com/WebAssembly/binaryen/blob/main/src/passes/LocalSubtyping.cpp), [globals](https://github.com/WebAssembly/binaryen/blob/main/src/passes/GlobalRefining.cpp), [fields](https://github.com/WebAssembly/binaryen/blob/main/src/passes/TypeRefining.cpp), and [signatures](https://github.com/WebAssembly/binaryen/blob/main/src/passes/SignatureRefining.cpp)).
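+
+To give a flavor of how a whole-program pass can turn indirect calls into direct ones, here is a minimal sketch in the spirit of constant field propagation. It assumes we have collected every write to every struct field in the module: if a field (say, a vtable slot holding a function reference) only ever receives one value, a `call_ref` through it can become a direct call. The data model is hypothetical, far simpler than Binaryen's, and ignores subtyping, which a real pass must also account for.
+
+```java
+import java.util.*;
+
+/** Whole-program constant-field analysis enabling devirtualization (hypothetical). */
+final class ConstantFields {
+    /** One write of `value` to `field` of struct type `type`, e.g. a struct.new operand. */
+    record FieldWrite(String type, String field, String value) {}
+
+    /** For each "type.field", the single value ever written to it, if there is exactly one. */
+    static Map<String, String> constantFields(List<FieldWrite> allWrites) {
+        Map<String, Set<String>> seen = new HashMap<>();
+        for (FieldWrite w : allWrites) {
+            seen.computeIfAbsent(w.type() + "." + w.field(), k -> new HashSet<>()).add(w.value());
+        }
+        Map<String, String> constants = new HashMap<>();
+        seen.forEach((key, values) -> {
+            if (values.size() == 1) constants.put(key, values.iterator().next());
+        });
+        return constants;
+    }
+
+    /** The direct call target for an indirect call through type.field, if one is known. */
+    static Optional<String> directTarget(Map<String, String> constants, String type, String field) {
+        return Optional.ofNullable(constants.get(type + "." + field));
+    }
+}
+```
+
+Once a call is direct, the regular inliner can take over, which is why devirtualization tends to unlock further optimizations.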
+
+That’s just a quick list of some of the work we’ve been doing. For more on Binaryen’s new GC optimizations and how to use them, see the [Binaryen docs](https://github.com/WebAssembly/binaryen/wiki/GC-Optimization-Guidebook).
+
+To measure the effectiveness of all those optimizations in Binaryen, let’s look at Java performance with and without `wasm-opt`, on output from the [J2Wasm](https://github.com/google/j2cl/tree/master/samples/wasm) compiler which compiles Java to WasmGC:
+
+![Java performance with and without wasm-opt](/_img/wasm-gc-porting/benchmark1.svg "Box2D, DeltaBlue, RayTrace, and Richards benchmarks, all showing an improvement with wasm-opt.")
+
+Here, “without wasm-opt” means we do not run Binaryen’s optimizations, but we do still optimize in the VM and in the J2Wasm compiler. As shown in the figure, `wasm-opt` provides a significant speedup on each of these benchmarks, on average making them **1.9×** faster.
+
+In summary, `wasm-opt` can be used by any toolchain that compiles to WasmGC and it avoids the need to reimplement general-purpose optimizations in each. And, as we continue to improve Binaryen’s optimizations, that will benefit all toolchains that use `wasm-opt`, just like improvements to LLVM help all languages that compile to WasmMVP using LLVM.
+
+Toolchain optimizations are just one part of the picture. As we will see next, optimizations in Wasm VMs are also absolutely critical.
+
+### V8 optimizations
+
+As we’ve mentioned, WasmGC is more optimizable than WasmMVP, and not only toolchains can benefit from that but also VMs. And that turns out to be important because GC languages are different from the languages that compile to WasmMVP. Consider inlining, for example, which is one of the most important optimizations: Languages like C, C++, and Rust inline at compile time, while GC languages like Java and Dart typically run in a VM that inlines and optimizes at runtime. That performance model has affected both language design and how people write code in GC languages.
+
+For example, in a language like Java, all calls begin as indirect (a child class can override a parent function, even when calling a child using a reference of the parent type). We benefit whenever the toolchain can turn an indirect call into a direct one, but in practice code patterns in real-world Java programs often have paths that actually do have lots of indirect calls, or at least ones that cannot be inferred statically to be direct. To handle those cases well, we’ve implemented **speculative inlining** in V8, that is, indirect calls are noted as they occur at runtime, and if we see that a call site has fairly simple behavior (few call targets), then we inline there with appropriate guard checks, which is closer to how Java is normally optimized than if we left such things entirely to the toolchain.
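+
+A minimal sketch of the idea for the simplest case, a call site that has only ever seen one target. Feedback is recorded as calls happen; the "optimized" version then guards on that target and runs its body inline. All names are hypothetical, and in this sketch a failed guard simply falls back to the ordinary indirect call.
+
+```java
+import java.util.*;
+import java.util.function.IntUnaryOperator;
+
+/** Call-site feedback plus a guarded, inlined fast path (hypothetical sketch). */
+final class SpeculativeInlining {
+    /** Counts the targets an indirect call site has seen at runtime. */
+    static final class CallSiteFeedback {
+        private final Map<IntUnaryOperator, Integer> counts = new HashMap<>();
+
+        int call(IntUnaryOperator target, int arg) {
+            counts.merge(target, 1, Integer::sum); // record the observed target
+            return target.applyAsInt(arg);          // then perform the indirect call
+        }
+
+        /** The only observed target, if the site has been monomorphic so far. */
+        Optional<IntUnaryOperator> monomorphicTarget() {
+            return counts.size() == 1 ? Optional.of(counts.keySet().iterator().next()) : Optional.empty();
+        }
+    }
+
+    static final IntUnaryOperator DOUBLE = x -> x * 2;
+
+    /** What optimized code might look like once feedback says DOUBLE is the only target. */
+    static int optimizedCall(IntUnaryOperator target, int arg) {
+        if (target == DOUBLE) {
+            return arg * 2;            // guard passed: DOUBLE's body, inlined
+        }
+        return target.applyAsInt(arg); // guard failed: ordinary indirect call
+    }
+}
+```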
+
+Real-world data validates that approach. We measured performance on the Google Sheets Calc Engine, a Java codebase used to compute spreadsheet formulas that until now has been compiled to JavaScript using [J2CL](https://j2cl.io). The V8 team has been collaborating with Sheets and J2CL to port that code to WasmGC, both because of the expected performance benefits for Sheets, and to provide useful real-world feedback for the WasmGC spec process. Looking at performance there, it turns out that speculative inlining is the most significant individual optimization we’ve implemented for WasmGC in V8, as the following chart shows:
+
+
+![Java performance with different V8 optimizations](/_img/wasm-gc-porting/benchmark2.svg "WasmGC latency without opts, with other opts, with speculative inlining, and with speculative inlining + other opts. The largest improvement by far is to add speculative inlining.")
+
+“Other opts” here means optimizations aside from speculative inlining that we could disable for measurement purposes, which includes: load elimination, type-based optimizations, branch elimination, constant folding, escape analysis, and common subexpression elimination. “No opts” means we’ve switched off all of those as well as speculative inlining (but other optimizations exist in V8 which we can’t easily switch off; for that reason the numbers here are only an approximation). The very large improvement due to speculative inlining (about a **30%** speedup(!)) compared to all the other opts together shows how important inlining is, at least for compiled Java.
+
+Aside from speculative inlining, WasmGC builds upon the existing Wasm support in V8, which means it benefits from the same optimizer pipeline, register allocation, tiering, and so forth. In addition to all that, specific aspects of WasmGC can benefit from additional optimizations, the most obvious of which is to optimize the new instructions that WasmGC provides, such as having an efficient implementation of type casts. Another important piece of work we’ve done is to use WasmGC’s type information in the optimizer. For example, `ref.test` checks if a reference is of a particular type at runtime, and after such a check succeeds we know that `ref.cast`, a cast to the same type, must also succeed. That helps optimize patterns like this in Java:
+
+```java
+if (ref instanceof Type) {
+ foo((Type) ref); // This downcast can be eliminated.
+}
+```
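+
+A minimal sketch of that reasoning, assuming a toy single-inheritance type hierarchy given as a child-to-parent map: once a `ref.test` against some type has succeeded on the current path, a later `ref.cast` to that type or any of its supertypes cannot fail and can be dropped. The type names and data structures are hypothetical.
+
+```java
+import java.util.*;
+
+/** Redundant-cast detection from facts established by earlier type tests (hypothetical). */
+final class CastElimination {
+    private final Map<String, String> parentOf; // toy single-inheritance hierarchy
+
+    CastElimination(Map<String, String> parentOf) {
+        this.parentOf = parentOf;
+    }
+
+    /** True if `sub` equals `sup` or inherits from it, walking up the parent chain. */
+    boolean isSubtype(String sub, String sup) {
+        for (String t = sub; t != null; t = parentOf.get(t)) {
+            if (t.equals(sup)) return true;
+        }
+        return false;
+    }
+
+    /** A cast is redundant if some type test that already succeeded on this path implies it. */
+    boolean castIsRedundant(Set<String> testedTypes, String castType) {
+        for (String tested : testedTypes) {
+            if (isSubtype(tested, castType)) return true;
+        }
+        return false;
+    }
+}
+```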
+
+These optimizations are especially useful after speculative inlining, because then we see more than the toolchain did when it produced the Wasm.
+
+Overall, in WasmMVP there was a fairly clear separation between toolchain and VM optimizations: We did as much as possible in the toolchain and left only necessary ones for the VM, which made sense as it kept VMs simpler. With WasmGC that balance might shift somewhat, because as we’ve seen there is a need to do more optimizations at runtime for GC languages, and also WasmGC itself is more optimizable, allowing us to have more of an overlap between toolchain and VM optimizations. It will be interesting to see how the ecosystem develops here.
+
+## Demo and status
+
+You can use WasmGC today! After reaching [phase 4](https://github.com/WebAssembly/meetings/blob/main/process/phases.md#4-standardize-the-feature-working-group) at the W3C, WasmGC is now a full and finalized standard, and Chrome 119 shipped with support for it. With that browser (or any other browser that has WasmGC support; for example, Firefox 120 is expected to launch with WasmGC support later this month) you can run this [Flutter demo](https://flutterweb-wasm.web.app/) in which Dart compiled to WasmGC drives the application’s logic, including its widgets, layout, and animation.
+
+![The Flutter demo running in Chrome 119.](/_img/wasm-gc-porting/flutter-wasm-demo.png "Material 3 rendered by Flutter WasmGC."){ .no-darkening }
+
+## Getting started
+
+If you’re interested in using WasmGC, the following links might be useful:
+
+- Various toolchains have support for WasmGC today, including [Dart](https://flutter.dev/wasm), [Java (J2Wasm)](https://github.com/google/j2cl/blob/master/docs/getting-started-j2wasm.md), [Kotlin](https://kotl.in/wasmgc), [OCaml (wasm_of_ocaml)](https://github.com/ocaml-wasm/wasm_of_ocaml), and [Scheme (Hoot)](https://gitlab.com/spritely/guile-hoot).
+- The [source code](https://gist.github.com/kripken/5cd3e18b6de41c559d590e44252eafff) of the small program whose output we showed in the developer tools section is an example of writing a “hello world” WasmGC program by hand. (In particular you can see the `$Node` type defined and then created using `struct.new`.)
+- The Binaryen wiki has [documentation](https://github.com/WebAssembly/binaryen/wiki/GC-Implementation---Lowering-Tips) about how compilers can emit WasmGC code that optimizes well. The earlier links to the various WasmGC-targeting toolchains are also useful to learn from; for example, you can look at the Binaryen passes and flags that [Java](https://github.com/google/j2cl/blob/8609e47907cfabb7c038101685153d3ebf31b05b/build_defs/internal_do_not_use/j2wasm_application.bzl#L382-L415), [Dart](https://github.com/dart-lang/sdk/blob/f36c1094710bd51f643fb4bc84d5de4bfc5d11f3/sdk/bin/dart2wasm#L135), and [Kotlin](https://github.com/JetBrains/kotlin/blob/f6b2c642c2fff2db7f9e13cd754835b4c23e90cf/libraries/tools/kotlin-gradle-plugin/src/common/kotlin/org/jetbrains/kotlin/gradle/targets/js/binaryen/BinaryenExec.kt#L36-L67) use.
+
+## Summary
+
+WasmGC is a new and promising way to implement GC languages in WebAssembly. Traditional ports in which a VM is recompiled to Wasm will still make the most sense in some cases, but we hope that WasmGC ports will become a popular technique because of their benefits: WasmGC ports have the ability to be smaller than traditional ports—even smaller than WasmMVP programs written in C, C++, or Rust—and they integrate better with the Web on matters like cycle collection, memory use, developer tooling, and more. WasmGC is also a more optimizable representation, which can provide significant speed benefits as well as opportunities to share more toolchain effort between languages.
+
diff --git a/src/docs/cross-compile-ios.md b/src/docs/cross-compile-ios.md
index 6f0af85e8..5b8ecc68b 100644
--- a/src/docs/cross-compile-ios.md
+++ b/src/docs/cross-compile-ios.md
@@ -31,14 +31,12 @@ This section shows how to build a monolithic V8 version for use on either a phys
Set up GN build files by running `gn args out/release-ios` and inserting the following keys:
```python
-enable_ios_bitcode = true
ios_deployment_target = 10
is_component_build = false
is_debug = false
target_cpu = "arm64" # "x64" for a simulator build.
target_os = "ios"
use_custom_libcxx = false # Use Xcode's libcxx.
-use_xcode_clang = true
v8_enable_i18n_support = false # Produces a smaller binary.
v8_monolithic = true # Enable the v8_monolith target.
v8_use_external_startup_data = false # The snaphot is included in the binary.
diff --git a/src/docs/embed.md b/src/docs/embed.md
index 23dcdf2df..55945aef6 100644
--- a/src/docs/embed.md
+++ b/src/docs/embed.md
@@ -10,7 +10,7 @@ This document is intended for C++ programmers who want to embed the V8 JavaScrip
## Hello world
-Let’s look at a [Hello World example](https://chromium.googlesource.com/v8/v8/+/branch-heads/6.8/samples/hello-world.cc) that takes a JavaScript statement as a string argument, executes it as JavaScript code, and prints the result to standard out.
+Let’s look at a [Hello World example](https://chromium.googlesource.com/v8/v8/+/branch-heads/11.9/samples/hello-world.cc) that takes a JavaScript statement as a string argument, executes it as JavaScript code, and prints the result to standard out.
First, some key concepts:
@@ -26,7 +26,7 @@ These concepts are discussed in greater detail in [the advanced guide](/docs/emb
Follow the steps below to run the example yourself:
1. Download the V8 source code by following [the Git instructions](/docs/source-code#using-git).
-1. The instructions for this hello world example have last been tested with V8 v10.5.1. You can check out this branch with `git checkout refs/tags/10.5.1 -b sample -t`
+1. The instructions for this hello world example were last tested with V8 v11.9. You can check out this branch with `git checkout branch-heads/11.9 -b sample -t`
1. Create a build configuration using the helper script:
```bash
@@ -48,7 +48,7 @@ Follow the steps below to run the example yourself:
1. Compile `hello-world.cc`, linking to the static library created in the build process. For example, on 64bit Linux using the GNU compiler:
```bash
- g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith -lv8_libbase -lv8_libplatform -ldl -Lout.gn/x64.release.sample/obj/ -pthread -std=c++17 -DV8_COMPRESS_POINTERS
+ g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith -lv8_libbase -lv8_libplatform -ldl -Lout.gn/x64.release.sample/obj/ -pthread -std=c++17 -DV8_COMPRESS_POINTERS -DV8_ENABLE_SANDBOX
```
1. For more complex code, V8 fails without an ICU data file. Copy this file to where your binary is stored:
@@ -65,17 +65,17 @@ Follow the steps below to run the example yourself:
1. It prints `Hello, World!`. Yay!
-If you are looking for an example which is in sync with master, check out the file [`hello-world.cc`](https://chromium.googlesource.com/v8/v8/+/master/samples/hello-world.cc). This is a very simple example and you’ll likely want to do more than just execute scripts as strings. [The advanced guide below](#advanced-guide) contains more information for V8 embedders.
+If you are looking for an example which is in sync with the main branch, check out the file [`hello-world.cc`](https://chromium.googlesource.com/v8/v8/+/main/samples/hello-world.cc). This is a very simple example and you’ll likely want to do more than just execute scripts as strings. [The advanced guide below](#advanced-guide) contains more information for V8 embedders.
## More example code
The following samples are provided as part of the source code download.
-### [`process.cc`](https://github.com/v8/v8/blob/master/samples/process.cc)
+### [`process.cc`](https://github.com/v8/v8/blob/main/samples/process.cc)
This sample provides the code necessary to extend a hypothetical HTTP request processing application — which could be part of a web server, for example — so that it is scriptable. It takes a JavaScript script as an argument, which must provide a function called `Process`. The JavaScript `Process` function can be used to, for example, collect information such as how many hits each page served by the fictional web server gets.
-### [`shell.cc`](https://github.com/v8/v8/blob/master/samples/shell.cc)
+### [`shell.cc`](https://github.com/v8/v8/blob/main/samples/shell.cc)
This sample takes filenames as arguments then reads and executes their contents. Includes a command prompt at which you can enter JavaScript code snippets which are then executed. In this sample additional functions like `print` are also added to JavaScript through the use of object and function templates.
diff --git a/src/docs/feature-launch-process.md b/src/docs/feature-launch-process.md
index ea92059c8..6967fb3f6 100644
--- a/src/docs/feature-launch-process.md
+++ b/src/docs/feature-launch-process.md
@@ -2,24 +2,78 @@
title: 'Implementing and shipping JavaScript/WebAssembly language features'
description: 'This document explains the process for implementing and shipping JavaScript or WebAssembly language features in V8.'
---
-In general, V8 uses the [Blink Intent process](https://www.chromium.org/blink/launching-features) for JavaScript and WebAssembly language features. The differences are laid out in the errata below. Please follow the Blink Intent process, unless the errata tells you otherwise.
+In general, V8 follows the [Blink Intent process for already-defined consensus-based standards](https://www.chromium.org/blink/launching-features/#process-existing-standard) for JavaScript and WebAssembly language features. V8-specific errata are laid out below. Please follow the Blink Intent process, unless the errata tells you otherwise.
-If you have any questions on this topic, please send hablich@chromium.org and v8-dev@googlegroups.com an email.
+If you have any questions on this topic for JavaScript features, please email syg@chromium.org and v8-dev@googlegroups.com.
+
+For WebAssembly features, please email gdeepti@chromium.org and v8-dev@googlegroups.com.
## Errata
+### JavaScript features usually wait until Stage 3+ { #stage3plus }
+
+As a rule of thumb, V8 waits to implement JavaScript feature proposals until they advance to [Stage 3 or later in TC39](https://tc39.es/process-document/). TC39 has its own consensus process, and Stage 3 or later signals explicit consensus among TC39 delegates, including all browser vendors, that a feature proposal is ready to implement. This external consensus process means Stage 3+ features do not need to send Intent emails other than Intent to Ship.
+
### TAG review { #tag }
For smaller JavaScript or WebAssembly features, a TAG review is not required, as TC39 and the Wasm CG already provide significant technical oversight. If the feature is large or cross-cutting (e.g., requires changes to other Web Platform APIs or modifications to Chromium), TAG review is recommended.
-### Instead of WPT, Test262 and WebAssembly spec tests are sufficient { #tests }
+### Both V8 and blink flags are required { #flags }
+
+When implementing a feature, both a V8 flag and a Blink `base::Feature` are required.
+
+Blink features are required so that Chrome can turn off features without distributing new binaries in emergency situations. This is usually implemented in [`gin/gin_features.h`](https://source.chromium.org/chromium/chromium/src/+/main:gin/gin_features.h), [`gin/gin_features.cc`](https://source.chromium.org/chromium/chromium/src/+/main:gin/gin_features.cc), and [`gin/v8_initializer.cc`](https://source.chromium.org/chromium/chromium/src/+/main:gin/v8_initializer.cc).
+
+### Fuzzing is required to ship { #fuzzing }
+
+JavaScript and WebAssembly features must be fuzzed for a minimum period of 4 weeks, or one (1) release milestone, with all fuzz bugs fixed, before they can be shipped.
+
+For code-complete JavaScript features, start fuzzing by moving the feature flag to the `JAVASCRIPT_STAGED_FEATURES_BASE` macro in [`src/flags/flag-definitions.h`](https://source.chromium.org/chromium/chromium/src/+/main:v8/src/flags/flag-definitions.h).
+
+For WebAssembly, see the [WebAssembly shipping checklist](/docs/wasm-shipping-checklist).
+
+### [Chromestatus](https://chromestatus.com/) and review gates { #chromestatus }
+
+The Blink Intent process includes a series of review gates that must be approved on the feature's entry in [Chromestatus](https://chromestatus.com/) before an Intent to Ship is sent out seeking API OWNER approvals.
+
+These gates are tailored towards web APIs, and some gates may not be applicable to JavaScript and WebAssembly features. The following is broad guidance. The specifics differ from feature to feature; do not apply guidance blindly!
+
+#### Privacy
+
+Most JavaScript and WebAssembly features do not affect privacy. Rarely, features may add new fingerprinting vectors that reveal information about a user's operating system or hardware.
-Adding Web Platform Tests (WPT) is not required, as JavaScript and WebAssembly language features have their own test repositories. Feel free to add some though, if you think it is beneficial.
+#### Security
-For JavaScript features, explicit correctness tests in [Test262](https://github.com/tc39/test262) are preferred and required.
+While JavaScript and WebAssembly are common attack vectors in security exploits, most new features do not add additional attack surface. [Fuzzing](#fuzzing) is required, and mitigates some of the risk.
+
+Features that affect known popular attack vectors, such as `ArrayBuffer`s in JavaScript, and features that might enable side-channel attacks, need extra scrutiny and must be reviewed.
+
+#### Enterprise
+
+Throughout their standardization process in TC39 and the Wasm CG, JavaScript and WebAssembly features already undergo heavy backwards compatibility scrutiny. It is exceedingly rare for features to be willfully backwards incompatible.
+
+For JavaScript, recently shipped features can also be disabled via `chrome://flags/#disable-javascript-harmony-shipping`.
+
+#### Debuggability
+
+JavaScript and WebAssembly features' debuggability differs significantly from feature to feature. JavaScript features that only add new built-in methods do not need additional debugger support, while WebAssembly features that add new capabilities may need significant additional debugger support.
+
+For more details, see the [JavaScript feature debugging checklist](https://docs.google.com/document/d/1_DBgJ9eowJJwZYtY6HdiyrizzWzwXVkG5Kt8s3TccYE/edit#heading=h.u5lyedo73aa9) and the [WebAssembly feature debugging checklist](https://goo.gle/devtools-wasm-checklist).
+
+When in doubt, this gate is applicable.
+
+#### Testing { #tests }
+
+Instead of WPT, Test262 tests are sufficient for JavaScript features, and WebAssembly spec tests are sufficient for WebAssembly features.
+
+Adding Web Platform Tests (WPT) is not required, as JavaScript and WebAssembly language features have their own interoperable test repositories that are run by multiple implementations. Feel free to add some though, if you think it is beneficial.
+
+For JavaScript features, explicit correctness tests in [Test262](https://github.com/tc39/test262) are required. Note that tests in the [staging directory](https://github.com/tc39/test262/blob/main/CONTRIBUTING.md#staging) suffice.
For WebAssembly features, explicit correctness tests in the [WebAssembly Spec Test repo](https://github.com/WebAssembly/spec/tree/master/test) are required.
+For performance tests, JavaScript already underlies most existing performance benchmarks, like Speedometer.
+
### Who to CC { #cc }
**Every** “intent to `$something`” email (e.g. “intent to implement”) should CC in addition to . This way, other embedders of V8 are kept in the loop too.
diff --git a/src/docs/index.md b/src/docs/index.md
index 3a0c268af..c92adcddb 100644
--- a/src/docs/index.md
+++ b/src/docs/index.md
@@ -8,7 +8,7 @@ This documentation is aimed at C++ developers who want to use V8 in their applic
## About V8
-V8 implements ECMAScript and WebAssembly, and runs on Windows 7 or later, macOS 10.12+, and Linux systems that use x64, IA-32, or ARM processors. Additional systems (IBM i, AIX) and processors (MIPS, ppcle64, s390x) are externally maintained, see [ports](/docs/ports). V8 can run standalone, or can be embedded into any C++ application.
+V8 implements ECMAScript and WebAssembly, and runs on Windows, macOS, and Linux systems that use x64, IA-32, or ARM processors. Additional systems (IBM i, AIX) and processors (MIPS, ppcle64, s390x) are externally maintained, see [ports](/docs/ports). V8 can be embedded into any C++ application.
V8 compiles and executes JavaScript source code, handles memory allocation for objects, and garbage collects objects it no longer needs. V8’s stop-the-world, generational, accurate garbage collector is one of the keys to V8’s performance.
diff --git a/src/docs/release-process.md b/src/docs/release-process.md
index 92606d177..8640ccc8e 100644
--- a/src/docs/release-process.md
+++ b/src/docs/release-process.md
@@ -4,7 +4,7 @@ description: 'This document explains the V8 release process.'
---
The V8 release process is tightly connected to [Chrome’s](https://www.chromium.org/getting-involved/dev-channel). The V8 team is using all four Chrome release channels to push new versions to the users.
-If you want to look up what V8 version is in a Chrome release you can check [OmahaProxy](https://omahaproxy.appspot.com/). For each Chrome release a separate branch is created in the V8 repository to make the trace-back easier e.g. for [Chrome 94.0.4606.61](https://chromium.googlesource.com/v8/v8.git/+/chromium/4606).
+If you want to look up what V8 version is in a Chrome release you can check [Chromiumdash](https://chromiumdash.appspot.com/releases). For each Chrome release a separate branch is created in the V8 repository to make the trace-back easier e.g. for [Chrome M121](https://chromium.googlesource.com/v8/v8/+log/refs/branch-heads/12.1).
## Canary releases
@@ -12,30 +12,21 @@ Every day a new Canary build is pushed to the users via [Chrome’s Canary chann
Branches for a Canary normally look like this:
-```
-remotes/origin/9.4.146
-```
-
## Dev releases
Every week a new Dev build is pushed to the users via [Chrome’s Dev channel](https://www.google.com/chrome/browser/desktop/index.html?extra=devchannel&platform=win64). Normally the deliverable includes the latest stable enough V8 version on the Canary channel.
-Branches for a Dev normally look like this:
-
-```
-remotes/origin/9.4.146
-```
## Beta releases
-Roughly every 4 weeks a new major branch is created e.g. [for Chrome 94](https://chromium.googlesource.com/v8/v8.git/+log/branch-heads/9.4). This is happening in sync with the creation of [Chrome’s Beta channel](https://www.google.com/chrome/browser/beta.html?platform=win64). The Chrome Beta is pinned to the head of V8’s branch. After approx. 4 weeks the branch is promoted to Stable.
+Roughly every 2 weeks a new major branch is created e.g. [for Chrome 94](https://chromium.googlesource.com/v8/v8.git/+log/branch-heads/9.4). This is happening in sync with the creation of [Chrome’s Beta channel](https://www.google.com/chrome/browser/beta.html?platform=win64). The Chrome Beta is pinned to the head of V8’s branch. After approx. 2 weeks the branch is promoted to Stable.
Changes are only cherry-picked onto the branch in order to stabilize the version.
Branches for a Beta normally look like this
```
-remotes/branch-heads/9.4
+refs/branch-heads/12.1
```
They are based on a Canary branch.
@@ -47,19 +38,31 @@ Roughly every 4 weeks a new major Stable release is done. No special branch is c
Branches for a Stable release normally look like this:
```
-remotes/branch-heads/9.4
+refs/branch-heads/12.1
```
They are promoted (reused) Beta branches.
+## API
+
+Chromiumdash also provides an API to collect the same information:
+
+```
+https://chromiumdash.appspot.com/fetch_milestones (to get the V8 branch name e.g. refs/branch-heads/12.1)
+https://chromiumdash.appspot.com/fetch_releases (to get the V8 branch git hash)
+```
+
+The following parameters are helpful:
+- `mstone=121`
+- `channel=Stable,Canary,Beta,Dev`
+- `platform=Mac,Windows,Lacros,Linux,Android,Webview`, etc.
+
## Which version should I embed in my application?
The tip of the same branch that Chrome’s Stable channel uses.
We often backmerge important bug fixes to a stable branch, so if you care about stability and security and correctness, you should include those updates too — that’s why we recommend “the tip of the branch”, as opposed to an exact version.
-As soon as a new branch is promoted to Stable, we stop maintaining the previous stable branch. This happens every six weeks, so you should be prepared to update at least this often.
-
-Example: If the current stable Chrome release is [94.0.4606.61](https://omahaproxy.appspot.com), with V8 v9.4.146.17. So you should embed [branch-heads/9.4](https://chromium.googlesource.com/v8/v8.git/+/branch-heads/9.4). And you should update to branch-heads/9.5 when Chrome 95 is released on the Stable channel.
+As soon as a new branch is promoted to Stable, we stop maintaining the previous stable branch. This happens every four weeks, so you should be prepared to update at least this often.
**Related:** [Which V8 version should I use?](/docs/version-numbers#which-v8-version-should-i-use%3F)
diff --git a/src/docs/stack-trace-api.md b/src/docs/stack-trace-api.md
index 4c29c0fd9..cff70dd09 100644
--- a/src/docs/stack-trace-api.md
+++ b/src/docs/stack-trace-api.md
@@ -102,7 +102,6 @@ The structured stack trace is an array of `CallSite` objects, each of which repr
- `isConstructor`: is this a constructor call?
- `isAsync`: is this an async call (i.e. `await`, `Promise.all()`, or `Promise.any()`)?
- `isPromiseAll`: is this an async call to `Promise.all()`?
-- `isPromiseAny`: is this an async call to `Promise.any()`?
- `getPromiseIndex`: returns the index of the promise element that was followed in `Promise.all()` or `Promise.any()` for async stack traces, or `null` if the `CallSite` is not an async `Promise.all()` or `Promise.any()` call.
The default stack trace is created using the CallSite API so any information that is available there is also available through this API.
diff --git a/src/docs/version-numbers.md b/src/docs/version-numbers.md
index 54aa8af4a..aa2d45b45 100644
--- a/src/docs/version-numbers.md
+++ b/src/docs/version-numbers.md
@@ -18,11 +18,10 @@ Embedders of V8 should generally use *the head of the branch corresponding to th
To find out what version this is,
-1. Go to
+1. Go to
2. Find the latest stable Chrome version in the table
-3. Check the `v8_version` column (to the right) on the same row
+3. Click the (i) icon and check the `V8` column
-Example: at the time of this writing, the site indicates that for `mac`/`stable`, the Chrome release version is 59.0.3071.86, which corresponds to V8 version 5.9.211.31.
### Finding the head of the corresponding branch
@@ -32,7 +31,7 @@ V8’s version-related branches do not appear in the online repository at
```
-Example: for the V8 minor version 5.9 found above, we go to , finding a commit titled “Version 5.9.211.33”. Thus, the version of V8 that embedders should use at the time of this writing is **5.9.211.33**.
+Example: for the V8 minor version 12.1 found above, we go to , finding a commit titled “Version 12.1.285.2”.
**Caution:** You should *not* simply find the numerically-greatest tag corresponding to the above minor V8 version, as sometimes those are not supported, e.g. they are tagged before deciding where to cut minor releases. Such versions do not receive backports or similar.
@@ -53,10 +52,3 @@ If you did not use `depot_tools`, edit `.git/config` and add the line below to t
```
fetch = +refs/branch-heads/*:refs/remotes/branch-heads/*
```
-
-Example: for the V8 minor version 5.9 found above, we can do:
-
-```bash
-$ git checkout branch-heads/5.9
-HEAD is now at 8c3db649d8... Version 5.9.211.33
-```
diff --git a/src/docs/wasm-compilation-pipeline.md b/src/docs/wasm-compilation-pipeline.md
index 696753644..6bd959384 100644
--- a/src/docs/wasm-compilation-pipeline.md
+++ b/src/docs/wasm-compilation-pipeline.md
@@ -9,7 +9,7 @@ WebAssembly is a binary format that allows you to run code from programming lang
Initially, V8 does not compile any functions in a WebAssembly module. Instead, functions get compiled lazily with the baseline compiler [Liftoff](/blog/liftoff) when the function gets called for the first time. Liftoff is a [one-pass compiler](https://en.wikipedia.org/wiki/One-pass_compiler), which means it iterates over the WebAssembly code once and emits machine code immediately for each WebAssembly instruction. One-pass compilers excel at fast code generation, but can only apply a small set of optimizations. Indeed, Liftoff can compile WebAssembly code very fast, tens of megabytes per second.
-Once Liftoff compilation is finished, the the resulting machine code gets registered with the WebAssembly module, so that for future calls to the function the compiled code can be used immediately.
+Once Liftoff compilation is finished, the resulting machine code gets registered with the WebAssembly module, so that for future calls to the function the compiled code can be used immediately.
## TurboFan
diff --git a/src/features.atom.njk b/src/features.atom.njk
index 38e2802b3..bb20e9027 100644
--- a/src/features.atom.njk
+++ b/src/features.atom.njk
@@ -20,6 +20,11 @@ excludeFromSitemap: true
{{ (post.data.updated or post.date) | rssDate }}{{ absolutePostUrl }}
+ {%- for tag in post.data.tags %}
+ {%- if not 'io' in tag and not 'Node.js' in tag %}
+
+ {%- endif %}
+ {%- endfor %}
{{ post.data.author | markdown | striptags }}
diff --git a/src/features/import-attributes.md b/src/features/import-attributes.md
new file mode 100644
index 000000000..54eeb0a20
--- /dev/null
+++ b/src/features/import-attributes.md
@@ -0,0 +1,63 @@
+---
+title: 'Import attributes'
+author: 'Shu-yu Guo ([@_shu](https://twitter.com/_shu))'
+avatars:
+ - 'shu-yu-guo'
+date: 2024-01-31
+tags:
+ - ECMAScript
+description: 'Import attributes: the evolution of import assertions'
+tweet: ''
+---
+
+## Previously
+
+V8 shipped the [import assertions](https://chromestatus.com/feature/5765269513306112) feature in v9.1. This feature allowed module import statements to include additional information by using the `assert` keyword. This additional information is currently used to import JSON and CSS modules inside JavaScript modules.
+
+## Import attributes
+
+Since then, import assertions has evolved into [import attributes](https://github.com/tc39/proposal-import-attributes). The point of the feature remains the same: to allow module import statements to include additional information.
+
+The most important difference is that import assertions had assert-only semantics, while import attributes has more relaxed semantics. Assert-only semantics means that the additional information has no effect on _how_ a module is loaded, only on _whether_ it is loaded. For example, a JSON module is always loaded as a JSON module by virtue of its MIME type, and the `assert { type: 'json' }` clause can only cause loading to fail if the requested module's MIME type is not `application/json`.
+
+However, assert-only semantics had a fatal flaw. On the web, the shape of HTTP requests differs depending on the type of resource that is requested. For example, the [`Accept` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept) affects the MIME type of the response, and the [`Sec-Fetch-Dest` metadata header](https://web.dev/articles/fetch-metadata) affects whether the web server accepts or rejects the request. Because an import assertion could not affect _how_ to load a module, it was not able to change the shape of the HTTP request. The type of the resource being requested also determines which [Content Security Policies](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) apply, so import assertions could not work correctly with the security model of the web.
+
+Import attributes relaxes the assert-only semantics to allow the attributes to affect how a module is loaded. In other words, import attributes can generate HTTP requests that contain the appropriate `Accept` and `Sec-Fetch-Dest` headers. To match the syntax to the new semantics, the old `assert` keyword is updated to `with`:
+
+```javascript
+// main.mjs
+//
+// New 'with' syntax.
+import json from './foo.json' with { type: 'json' };
+console.log(json.answer); // 42
+```
+
+## Dynamic `import()`
+
+[Dynamic `import()`](https://v8.dev/features/dynamic-import#dynamic) is similarly updated to accept a `with` option.
+
+```javascript
+// main.mjs
+//
+// New 'with' option.
+const jsonModule = await import('./foo.json', {
+ with: { type: 'json' }
+});
+console.log(jsonModule.default.answer); // 42
+```
+
+## Availability of `with`
+
+Import attributes is enabled by default in V8 v12.3.
+
+## Deprecation and eventual removal of `assert`
+
+The `assert` keyword is deprecated as of V8 v12.3 and is planned to be removed by v12.6. Please use `with` instead of `assert`! Use of the `assert` clause will print a warning to the console urging use of `with` instead.
+
+## Import attribute support
+
+
diff --git a/src/features/regexp-v-flag.md b/src/features/regexp-v-flag.md
index 0dbdc209c..8df15bbec 100644
--- a/src/features/regexp-v-flag.md
+++ b/src/features/regexp-v-flag.md
@@ -249,10 +249,10 @@ As part of our work on these JavaScript features, we went beyond “just” prop
## RegExp `v` flag support { #support }
-V8 v11.0 (Chrome 110) offers experimental support for this new functionality via the `--harmony-regexp-unicode-sets` flag. Babel also supports transpiling the `v` flag — [try out the examples from this article in the Babel REPL](https://babeljs.io/repl#?code_lz=MYewdgzgLgBATgUxgXhgegNoYIYFoBmAugGTEbC4AWhhaAbgNwBQTaaMAKpQJYQy8xKAVwDmSQCgEMKHACeMIWFABbJQjBRuYEfygBCVmlCRYCJSABW3FOgA6ABwDeAJQDiASQD6AUTOWAvvTMQA&presets=stage-3)! The support table below links to tracking issues you can subscribe to for updates.
+V8 v11.0 (Chrome 110) offers experimental support for this new functionality via the `--harmony-regexp-unicode-sets` flag. V8 v11.2 (Chrome 112) has the new features enabled by default. Babel also supports transpiling the `v` flag — [try out the examples from this article in the Babel REPL](https://babeljs.io/repl#?code_lz=MYewdgzgLgBATgUxgXhgegNoYIYFoBmAugGTEbC4AWhhaAbgNwBQTaaMAKpQJYQy8xKAVwDmSQCgEMKHACeMIWFABbJQjBRuYEfygBCVmlCRYCJSABW3FOgA6ABwDeAJQDiASQD6AUTOWAvvTMQA&presets=stage-3)! The support table below links to tracking issues you can subscribe to for updates.
diff --git a/src/index.njk b/src/index.njk
index 86dfb4d21..fc5c2d834 100644
--- a/src/index.njk
+++ b/src/index.njk
@@ -5,7 +5,7 @@ description: 'V8 is Google’s open source high-performance JavaScript and WebAs
---
What is V8?
-
V8 is Google’s open source high-performance JavaScript and WebAssembly engine, written in C++. It is used in Chrome and in Node.js, among others. It implements ECMAScript and WebAssembly, and runs on Windows 7 or later, macOS 10.12+, and Linux systems that use x64, IA-32, ARM, or MIPS processors. V8 can run standalone, or can be embedded into any C++ application.
+
V8 is Google’s open source high-performance JavaScript and WebAssembly engine, written in C++. It is used in Chrome and in Node.js, among others. It implements ECMAScript and WebAssembly, and runs on Windows, macOS, and Linux systems that use x64, IA-32, or ARM processors. V8 can be embedded into any C++ application.