Webpack ChunkGraph Algorithm
Popular bundlers today include Webpack, Rollup, and tools written in native languages such as esbuild and farm. Each has its own strategy for building the module graph and the chunk graph. In this blog, I will mainly introduce Webpack, which I am more familiar with. What is its chunk strategy? By the end, you should understand how chunks are generated from your code and how to reduce their size.
For simplicity, we will make some simplifications that are not strictly accurate. For example, a module refers to one of your files, and a chunk refers to a large file made up of multiple modules.
Also, in Webpack, chunks themselves do not have a parent-child relationship; chunk groups do. Since the concept of chunk groups involves splitChunks, we will not discuss it for now. Whenever this blog mentions the parent-child relationship of chunks, it means the parent-child relationship of chunk groups, to keep things easy to follow.
How Webpack Works
Webpack ships some runtime code that behaves like CommonJS. All of your module source code is stored in a map. Here is the simplified code:
In the entry chunk, only modules imported statically from the entry module will be included in this map.
Module execution is similar to the CommonJS require, but the function is called webpack_require. Its simplified implementation is as follows:
When the code contains a dynamic import statement, the runtime first loads the corresponding chunk and then uses webpack_require to execute the module.
For example, suppose the bar.js module lives in the bar-chunk chunk and we dynamically import bar.js. Webpack converts the import('./bar.js') call in the source into a call to ensure_chunk followed by webpack_require.
The ensure_chunk call here loads bar-chunk.js. In the browser, this generally means inserting a script element into the document head with its src set to the URL of bar-chunk. Once the chunk has loaded, all of its modules are added to webpack_modules, and webpack_require is then called to execute the requested module.
It follows that module execution order is unrelated to chunk loading order. The only requirement is that the chunk containing a module has been loaded before that module is executed.
If a module appears in multiple chunks, that is harmless, because each module is executed only once, as the cache in the simplified code above shows.
How About Rollup
Let's take a look at Rollup. Rollup has almost no runtime. Its output is the module code, lightly transformed and then concatenated into one large file.
When it encounters a dynamic import statement, it emits the native ESM import() syntax to load the chunk (depending on your output format).
The cost
At first glance, this output may seem intuitive, but it has issues. Because module code runs as soon as a chunk is imported, two problems arise:
1. If one module appears in multiple chunks, it may be executed multiple times. If the module has global side effects, this will cause errors in most cases.
2. How is the order of modules ensured? If the execution order of modules in one reference chain is A -> B -> C, but in another reference chain it is D -> C -> B, how can we ensure that B and C execute in the correct order in each chain?
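A toy illustration of the first problem, assuming a hypothetical bundler that naively inlines the same utils module into two chunks. The function names are made up, and each function stands for importing one chunk:

```javascript
let sideEffects = 0; // stands for some global side effect in utils

// Both chunks carry their own inlined copy of the utils module body.
function importHomeChunk() {
  sideEffects += 1; // inlined utils body
  return 'home';
}

function importAboutChunk() {
  sideEffects += 1; // inlined utils body, again
  return 'about';
}

importHomeChunk();
importAboutChunk();
```

After both chunks are imported the side effect has happened twice, which is why a runtime-free bundler has to avoid duplicated modules.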
For point 1, Rollup can in most cases extract duplicated modules into separate chunks. This solves the problem but introduces a new one: there may be a large number of small chunks, because even a very small module has to be moved into its own chunk if it appears in multiple chunks. This situation is very common in real projects.
For example, when writing React, people usually use import('./Home.tsx') to lazy-load routes. They also commonly write shared utilities, say a utils module, that several pages use. So utils is a repeated module and will be extracted into a separate chunk. If utils is very small, say 10 KB, you still need a dedicated network request for those 10 KB. And this is only utils; there are many other cases where small modules are reused across chunks.
You may wonder if putting multiple repeated modules into one chunk can solve the problem. It cannot, as you will soon encounter issues with module execution order, which will be discussed next.
For point 2, Rollup currently cannot guarantee a consistent module execution order. Please refer to the playground. There are two entry points: the first imports a and then b, and the second imports b and then a. Even if I only execute the home entry, it still outputs a and then b.
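The effect can be imitated with a small simulation. It is hand-written under the assumption that home is the entry importing b before a and that both modules ended up concatenated into one shared chunk; it is not Rollup output:

```javascript
const log = [];

// Shared chunk: a and b were concatenated in one fixed order.
function sharedChunk() {
  log.push('a'); // body of a
  log.push('b'); // body of b
}

// The home entry's source says: import './b'; import './a';
// Importing the shared chunk runs everything in the chunk's order instead.
function homeEntry() {
  sharedChunk();
}

homeEntry();
```

The log ends up as a, b even though home asked for b first, because execution order is tied to chunk layout.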
One workaround is to split further, for example by placing a and b in separate chunks. However, this makes the problem of too many small chunks even worse, and detecting such ordering conflicts may also hurt performance significantly.
How Webpack Generates Chunks
In Webpack, a module may appear in several chunks, and the order of modules within a chunk does not matter. This makes building the Chunk Graph very easy.
In general, leaving aside workers, Module Federation, new URL, and so on, chunks are created by dynamic import statements in ESM format, that is, import("./path").
Building Chunk Graph
Building the chunk graph is not difficult:
1. Walk the current module's imports.
2. When visiting a static import, put the referenced module into the current chunk, then walk the referenced module recursively.
3. When visiting a dynamic import, create a new chunk, put the referenced module inside it, then walk the referenced module with the new chunk as the current chunk.
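The steps above can be sketched as a short recursive walk. This is an illustration, not Webpack's implementation: the moduleGraph shape, the chunk naming scheme, and the function names are all assumptions, and the sketch does not handle cycles through dynamic imports. The sample graph has index statically importing a and b, which each dynamically import shared:

```javascript
// Hypothetical module graph: static and dynamic imports per module.
const moduleGraph = {
  index: { static: ['a', 'b'], dynamic: [] },
  a: { static: [], dynamic: ['shared'] },
  b: { static: [], dynamic: ['shared'] },
  shared: { static: [], dynamic: [] },
};

function buildChunkGraph(graph, entry) {
  const chunks = [];
  const counters = {}; // how many chunks were created per dynamic target

  function createChunk(name, parent) {
    const chunk = { name, parent: parent ? parent.name : null, modules: new Set() };
    chunks.push(chunk);
    return chunk;
  }

  function walk(moduleId, chunk) {
    if (chunk.modules.has(moduleId)) return; // already placed in this chunk
    chunk.modules.add(moduleId);
    // Static import: the referenced module joins the current chunk.
    for (const dep of graph[moduleId].static) walk(dep, chunk);
    // Dynamic import: a new chunk is created for every import site.
    for (const dep of graph[moduleId].dynamic) {
      counters[dep] = (counters[dep] || 0) + 1;
      walk(dep, createChunk(`${dep}-chunk-${counters[dep]}`, chunk));
    }
  }

  walk(entry, createChunk('entry', null));
  return chunks;
}

const chunks = buildChunkGraph(moduleGraph, 'index');
```

For this graph the result is an entry chunk holding index, a and b, plus two separate chunks that each hold shared.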
For example, suppose we have the following module graph, in which index statically imports a and b, and both a and b dynamically import shared.
A solid line represents an import generated by a static ESM import statement, while a dotted line represents an import generated by a dynamic import().
First, there is an entry chunk, starting from the entry module. Importing a places it into the entry chunk. Here, each outer block represents a chunk and each inner block represents a module.
While visiting a, we find import("./shared") and create a new chunk, which we call shared-chunk-1. We will explain the 1 suffix shortly.
Back in index, there is also an import of b, so b is placed into the entry chunk.
Starting from b, we find import('./shared'). Because this import comes from a different module, Webpack does not reuse shared-chunk-1 but creates shared-chunk-2 instead.
You can, however, use Webpack's magic comments to force chunk reuse: write both import('./shared') calls as import(/* webpackChunkName: "shared" */ './shared'), and only one shared chunk will be created.
At this point, chunk creation is complete, and you will find two identical chunks, shared-chunk-1 and shared-chunk-2. Early versions of webpack introduced mergeDuplicateChunks to deduplicate chunks. After deduplication, only one shared chunk remains.
Why create duplicate chunks and then deduplicate them? We can consider a scenario with multiple entry points and modify the module topology graph.
index and home are two entry points, and both dynamically import the shared module. In addition, both the shared module and the index entry point statically import the m module.
Starting from index, the entry chunk contains two modules, index and m.
Then index dynamically imports shared, which creates shared-chunk-1. The shared module imports m, so m is included in shared-chunk-1.
The second entry, home, goes into another entry chunk. The home module dynamically imports shared, which creates shared-chunk-2 containing shared and m, so the final chunk graph is:
This is where the removeAvailableModules optimization comes into play. Both shared chunk 1 and its parent chunk contain module m. shared chunk 1 can only be loaded after its parent has loaded, so by then module m has already been provided by the parent chunk. It is therefore safe to remove m from shared chunk 1. After removal, it looks like this:
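The idea can be sketched like this. It is a simplification that works on chunks with a parents list rather than on chunk groups, assumes the parent relation has no cycles, and uses made-up chunk names and helper functions:

```javascript
// Hypothetical chunk graph for the two-entry example.
const chunks = {
  'index-entry': { parents: [], modules: new Set(['index', 'm']) },
  'shared-chunk-1': { parents: ['index-entry'], modules: new Set(['shared', 'm']) },
  'home-entry': { parents: [], modules: new Set(['home']) },
  'shared-chunk-2': { parents: ['home-entry'], modules: new Set(['shared', 'm']) },
};

// Modules guaranteed to be loaded before the given chunk: whatever every
// parent provides, either directly or through its own ancestors.
function availableBefore(allChunks, name) {
  const { parents } = allChunks[name];
  if (parents.length === 0) return new Set();
  const perParent = parents.map(
    (p) => new Set([...allChunks[p].modules, ...availableBefore(allChunks, p)])
  );
  // Keep only modules that are available through all parents.
  return perParent.reduce((acc, set) => new Set([...acc].filter((m) => set.has(m))));
}

function removeAvailableModules(allChunks) {
  for (const name of Object.keys(allChunks)) {
    for (const m of availableBefore(allChunks, name)) {
      allChunks[name].modules.delete(m);
    }
  }
}

removeAvailableModules(chunks);
```

Only shared-chunk-1 loses m, because its parent already contains it. shared-chunk-2 keeps m, since the home entry chunk does not provide it.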
You will notice that the chunks of the two entries no longer overlap. Whichever entry point is used, only the chunks it needs are loaded. Suppose that in both entry points loading shared is triggered by the user clicking a button. Then the home page's initial screen does not need to load module m, and on the index page, loading shared does not need to load module m again either.
Rollup and esbuild
Rollup and esbuild try to ensure zero duplicate modules. For import("./shared"), only one chunk is created. When bundling with Rollup, the final chunks are as follows (assuming our m module is a small module containing only one line: console.log(42)).
At first glance, it looks good. There are no repeated modules and the chunk relationship is very clear.
But why does a very small module need to be placed in a separate chunk?
Because having repeated modules is highly discouraged by tools like Rollup.
In fact, modern Webpack also generates this kind of structure by default. However, if the size of the module m is small enough, webpack will not separate it into its own chunk.
splitChunks
splitChunks gives precise control over how modules are allocated to chunks. To achieve a strategy similar to Rollup's, we just need to enable the default splitChunks rule and modify it slightly.
module.exports = {
  optimization: {
    splitChunks: {
      chunks: 'all',
      minSize: 0, // you should not enable this in a real project
    },
  },
};
Here, chunks: 'all' means that every chunk is eligible for splitting. Splitting is permitted, not mandatory.
minSize: 0 should not be enabled in a real project. It is only set here because webpack does not split out very small modules by default, since splitting affects loading performance. There is a default size threshold for splitting, and changing it to 0 is purely for demonstration.
After this operation, any module that appears in two or more chunks will be extracted into a separate chunk. Let's take a look at the chunk graph before the split.
Among them, the repeated modules are m and shared. They will be extracted as separate chunks. The relationship after extraction is as follows:
shared chunk 1 and shared chunk 2 have become empty chunks. In the subsequent build phase, webpack will remove the empty chunks and eventually form:
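The extraction described above can be imitated with a toy function. This is not the SplitChunksPlugin algorithm: it ignores chunk groups, size thresholds and cache groups, and the split- naming and the input shape are assumptions:

```javascript
// Hypothetical chunk graph before the split (after removeAvailableModules).
const before = {
  'index-entry': ['index', 'm'],
  'shared-chunk-1': ['shared'],
  'home-entry': ['home'],
  'shared-chunk-2': ['shared', 'm'],
};

function splitDuplicates(chunks) {
  // Record which chunks contain each module.
  const owners = new Map();
  for (const [name, modules] of Object.entries(chunks)) {
    for (const m of modules) owners.set(m, [...(owners.get(m) || []), name]);
  }

  const result = {};
  for (const [name, modules] of Object.entries(chunks)) result[name] = new Set(modules);

  // Move every module that lives in two or more chunks into its own chunk.
  for (const [m, names] of owners) {
    if (names.length < 2) continue;
    for (const n of names) result[n].delete(m);
    result[`split-${m}`] = new Set([m]);
  }

  // Drop chunks that became empty.
  for (const name of Object.keys(result)) {
    if (result[name].size === 0) delete result[name];
  }
  return result;
}

const after = splitDuplicates(before);
```

The two shared chunks end up empty and disappear, leaving the two entry chunks plus one chunk each for m and shared.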
However, webpack does not suffer from the module execution order issue that Rollup has. A webpack chunk is merely a module map, and the actual module execution order depends on the import order in the source code. The correct execution order is therefore preserved under any arbitrary splitting.
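A small sketch of why the order survives splitting, using the simplified webpack_require from earlier. The module ids are hypothetical, and the map deliberately lists a before b:

```javascript
const log = [];

// The map lists a before b, as if a chunk had registered them in that order.
const webpack_modules = {
  './a.js': () => log.push('a'),
  './b.js': () => log.push('b'),
  './home.js': (module, exports, webpack_require) => {
    // The source of home imports b first, then a.
    webpack_require('./b.js');
    webpack_require('./a.js');
  },
};

const webpack_module_cache = {};

function webpack_require(id) {
  const cached = webpack_module_cache[id];
  if (cached !== undefined) return cached.exports;
  const module = (webpack_module_cache[id] = { exports: {} });
  webpack_modules[id](module, module.exports, webpack_require);
  return module.exports;
}

webpack_require('./home.js');
```

The log records b before a, following the imports in home rather than the layout of the map.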
In addition, because of splitChunks, once chunks: 'all' is configured it is possible to disable the mergeDuplicateChunks optimization. splitChunks fully covers what mergeDuplicateChunks does, and mergeDuplicateChunks is costly: its algorithmic complexity is not low.
The removeAvailableModules optimization can also be disabled to improve compilation performance. On the one hand, splitChunks also removes duplicated modules. On the other hand, this option actually has no effect any more. In the early days, it controlled whether webpack added its built-in RemoveParentModulesPlugin. Later, webpack implemented similar functionality with better performance inside code splitting itself, and that behavior of code splitting cannot be disabled.
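A configuration sketch for the two paragraphs above, assuming a webpack 5 project. Whether it pays off depends on the project, so measure the build before and after:

```javascript
// webpack.config.js (sketch)
module.exports = {
  optimization: {
    splitChunks: { chunks: 'all' },
    // splitChunks already covers deduplication, so skip this pass.
    mergeDuplicateChunks: false,
    // Leave the separate removal pass off to save compilation time.
    removeAvailableModules: false,
  },
};
```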
concatenateModules
Some people also call it scope hoisting; the name literally means joining modules together. There are three main reasons why many people feel that webpack is not good enough:
1. The output is too verbose. When you first see webpack's output, you find it full of comments and indentation, plus a bunch of webpack-specific runtime functions. Each module is wrapped in a function, which makes it feel like running CJS and gives the impression that performance cannot be good.
2. The build performance is not great. Many large projects take more than 10 minutes to bundle.
3. There are too many detailed configuration options (for a low-level build tool, it is hard to call this a disadvantage).
The first point among these can be improved by concatenateModules.
In bundlers with a lightweight runtime, such as Rollup and esbuild, chunks are just modules concatenated together, so the output looks clean. In fact, webpack does the same thing in production mode for pure ESM modules, and the output is essentially the same as that of Rollup and others. For CJS modules, however, Rollup needs a commonjs plugin to provide a small amount of CJS runtime (wrapping CJS modules in functions so that calling require is equivalent to invoking that function and getting its return value, along with some CJS-ESM interop runtime). esbuild, on the other hand, comes with its own CJS runtime.
Take a look at the output in development mode after enabling the concatenateModules configuration.
After optimizing with splitChunks and concatenateModules, the output is basically as clean as Rollup.
Rspack's concatenateModules optimization will also be released in a future version. Stay tuned.
End
Drawing tool https://www.doodleboard.pro/app/RlAuShKAn0R2