Christoph Nakazawa

Building a JavaScript Bundler

23 minutes reading time, 4500 words.
Target Audience: Front End Engineers.
Watch this episode on YouTube.

Jest’s packages make up an entire ecosystem of packages useful for building any kind of JavaScript tooling. “The whole is greater than the sum of its parts” doesn’t apply to Jest! In this article we are going to leverage some of Jest’s packages to learn how a JavaScript bundler works. In the end, you’ll have a toy bundler, and you’ll understand the fundamental concepts behind bundling JavaScript code.

This post is part of a series about JavaScript infrastructure. Here is where we are at:

  1. Dependency Managers Don’t Manage Your Dependencies
  2. Rethinking JavaScript Infrastructure
  3. Building a JavaScript Testing Framework
  4. Building a JavaScript Bundler (you are here)
  5. Defaults Matter: The Jest Story
  6. Title to be announced



Building a Bundler

I frequently joked that Jest should come with a bundler out of the box, and that it would only take about an hour to build one on top of Jest with a basic set of features. Let’s break down the bundling steps from source code to a JavaScript bundle that can run in a browser:

  1. Efficiently search for all files on the file system
  2. Resolve the dependency graph
  3. Serialize the bundle
  4. Execute our bundle using a runtime
  5. Compile each file in parallel

If we think of JavaScript testing as a map-reduce operation that maps over all test files and “reduces” them to test results, then JavaScript bundling maps over all source files and “reduces” them into a bundle. Let’s see if we can put together a working jest-bundler in one hour! If you haven’t read the previous entry in this series, Building a JavaScript Testing Framework, I suggest starting there as we’ll re-use many of the concepts and modules.

Let’s get started by initializing our project and adding a few test files:

bash
# In your terminal:
mkdir jest-bundler
cd jest-bundler
yarn init --yes
mkdir product
echo "console.log(require('./apple'));" > product/entry-point.js
echo "module.exports = 'apple ' + require('./banana') + ' ' + require('./kiwi');" > product/apple.js
echo "module.exports = 'banana ' + require('./kiwi');" > product/banana.js
echo "module.exports = 'kiwi ' + require('./melon') + ' ' + require('./tomato');" > product/kiwi.js
echo "module.exports = 'melon';" > product/melon.js
echo "module.exports = 'tomato';" > product/tomato.js
touch index.mjs
yarn add chalk yargs jest-haste-map

Fruits and vegetables are great, you should eat more of them! We’ll extend our test code as we go, but for now when we run our entry-point, it prints out this sequence of words:

bash
# In your terminal:
node product/entry-point.js
# apple banana kiwi melon tomato kiwi melon tomato

This works in Node.js, but we’ll have to bundle everything into a single file if we want to run it in a browser.

Efficiently search for all files on the file system

If you’ve been following the previous article, this section will look almost identical to how we got started last time. Most JavaScript tooling operates on all the code in your project, and jest-haste-map is an efficient way to keep track of all files, analyze relationships between them and keep monitoring the file system for changes:

javascript
// index.mjs
import JestHasteMap from 'jest-haste-map';
import { cpus } from 'os';
import { dirname, join } from 'path';
import { fileURLToPath } from 'url';
// Get the root path to our project (Like `__dirname`).
const root = join(dirname(fileURLToPath(import.meta.url)), 'product');
const hasteMapOptions = {
  extensions: ['js'],
  maxWorkers: cpus().length,
  name: 'jest-bundler',
  platforms: [],
  rootDir: root,
  roots: [root],
};
// Need to use `.default` as of Jest 27.
/** @type {JestHasteMap} */
const hasteMap = new JestHasteMap.default(hasteMapOptions);
// This line is only necessary in `jest-haste-map` version 28 or later.
await hasteMap.setupCachePath(hasteMapOptions);
const { hasteFS, moduleMap } = await hasteMap.build();
console.log(hasteFS.getAllFiles());
// ['/path/to/product/apple.js', '/path/to/product/banana.js', …]

Sweet, we got a quick start. Now, bundlers usually require a lot of configuration or command line options. Let’s make use of yargs to add an --entry-point option so we can tell our bundler where to start bundling from. Since our bundler consists of many different steps, let’s also add some output to tell the user what is happening:

javascript
// index.mjs
import { resolve } from 'path';
import chalk from 'chalk';
import yargs from 'yargs';
const options = yargs(process.argv).argv;
const entryPoint = resolve(process.cwd(), options.entryPoint);
if (!hasteFS.exists(entryPoint)) {
  throw new Error(
    '`--entry-point` does not exist. Please provide a path to a valid file.',
  );
}
console.log(chalk.bold(`❯ Building ${chalk.blue(options.entryPoint)}`));

If we run this using node index.mjs --entry-point product/entry-point.js from the root of our project, it’ll tell us that it is building that file. That was good for a warm-up as we check off the first task ✅ Let’s get started for real.

Resolve the dependency graph

To determine which files should be present in our output bundle, we need to resolve all dependencies recursively from the entry point down to every leaf node. The previous post was hinting at jest-haste-map having additional functionality that is going to come in handy now: By the time it gives us a list of files, it actually has much more information available than it seems. We can ask it to give us the dependencies of individual files:

javascript
// Append to index.mjs:
console.log(hasteFS.getDependencies(entryPoint));
// ['./apple']

That’s great but the name is unresolved, meaning that we have to implement the entire node resolution algorithm to figure out which file it maps to. For example, a module can usually be required without providing a file extension, or a package can redirect its main module through an entry in its package.json. Let’s use jest-resolve and jest-resolve-dependencies which were made to do just that: yarn add jest-resolve jest-resolve-dependencies. We can set them up by passing along some of our jest-haste-map data structures and some configuration options:

javascript
// Append to index.mjs:
import Resolver from 'jest-resolve';
import { DependencyResolver } from 'jest-resolve-dependencies';
/** @type {Resolver} */
const resolver = new Resolver.default(moduleMap, {
  extensions: ['.js'],
  hasCoreModules: false,
  rootDir: root,
});
const dependencyResolver = new DependencyResolver(resolver, hasteFS);
console.log(dependencyResolver.resolve(entryPoint));
// ['/path/to/apple.js']
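
To make the resolution step less of a black box, here is a minimal sketch of what a resolver has to do for relative requires: try the path as-is, then with each extension, then as a directory with a package.json main entry, and finally as a directory with an index file. This is a simplification for illustration and not how jest-resolve is implemented. The names resolveRelative, toPath, files and packageMains are hypothetical, the file system is simulated with in-memory data, and paths are assumed to be POSIX-style.

```javascript
// A simulated file system: the set of files that exist (hypothetical data).
const files = new Set([
  '/product/entry-point.js',
  '/product/apple.js',
  '/product/fruit/package.json',
  '/product/fruit/lib/main.js',
]);
// Simulated `main` fields of package.json files (hypothetical data).
const packageMains = new Map([
  ['/product/fruit/package.json', './lib/main.js'],
]);

// Turn a relative name into an absolute path, based on the requiring file.
const toPath = (from, name) => new URL(name, `file://${from}`).pathname;

const resolveRelative = (from, name, extensions = ['.js']) => {
  const base = toPath(from, name);
  // 1. Try an exact match, then try appending each extension.
  for (const candidate of [base, ...extensions.map((ext) => base + ext)]) {
    if (files.has(candidate)) {
      return candidate;
    }
  }
  // 2. Treat the path as a directory with a package.json `main` field.
  const packageJson = `${base}/package.json`;
  if (packageMains.has(packageJson)) {
    return resolveRelative(
      packageJson,
      packageMains.get(packageJson),
      extensions,
    );
  }
  // 3. Fall back to an `index` file inside the directory.
  for (const ext of extensions) {
    if (files.has(`${base}/index${ext}`)) {
      return `${base}/index${ext}`;
    }
  }
  throw new Error(`Cannot resolve '${name}' from '${from}'.`);
};

console.log(resolveRelative('/product/entry-point.js', './apple'));
// /product/apple.js
console.log(resolveRelative('/product/entry-point.js', './fruit'));
// /product/fruit/lib/main.js
```

A real resolver also handles node_modules lookups, core modules and many edge cases, which is why we lean on jest-resolve instead of maintaining this ourselves.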

Nice! With this solution we can now retrieve the full file paths of each module that our entry point depends on. We’ll need to process each dependency once to create the full dependency graph. I am going to use a queue for the modules that need to be processed, and a Set to keep track of modules that have already been processed. This is necessary because we don’t want to process modules more than once, which would happen whenever a module is required from several places (kiwi is required by both apple and banana), and the loop would never terminate if our dependency graph has cycles, like A → B → C → A. We are not using recursion because a deep dependency graph might overflow the call stack.

javascript
// index.mjs
/** @type {Set<string>} */
const allFiles = new Set();
const queue = [entryPoint];
while (queue.length) {
  const module = queue.shift();
  // Ensure we process each module at most once
  // to guard for cycles.
  if (allFiles.has(module)) {
    continue;
  }
  allFiles.add(module);
  queue.push(...dependencyResolver.resolve(module));
}
console.log(chalk.bold(`❯ Found ${chalk.blue(allFiles.size)} files`));
console.log(Array.from(allFiles));
// ['/path/to/entry-point.js', '/path/to/apple.js', …]
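
To check that the queue and Set combination really terminates on a cycle, here is a small sketch that runs the same loop over an in-memory graph instead of the file system. The graph and the collectModules helper are made up for illustration; the loop itself mirrors the one above.

```javascript
// A hypothetical dependency graph with a cycle: a → b → c → a.
const graph = new Map([
  ['a', ['b', 'd']],
  ['b', ['c']],
  ['c', ['a']],
  ['d', []],
]);

const collectModules = (entry) => {
  const seen = new Set();
  const queue = [entry];
  while (queue.length) {
    const current = queue.shift();
    // Skip modules we have already processed, which breaks the cycle.
    if (seen.has(current)) {
      continue;
    }
    seen.add(current);
    queue.push(...graph.get(current));
  }
  return Array.from(seen);
};

console.log(collectModules('a'));
// [ 'a', 'b', 'd', 'c' ]
```

Because a Set preserves insertion order, the result lists modules in the order they were first discovered, starting with the entry point.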

Success! We now have a list of all the modules in our dependency graph. You can play around with this by adding/removing test files or require calls and you’ll see that the output changes accordingly. Our second step, resolving the dependency graph, is complete ✅

Serialize the bundle

We now have all the necessary information to “serialize” our bundle. Serialization is the process of taking the dependency information and all the code and turning it into a bundle that can be run as a single file in a browser. Here is an initial approach:

javascript
import fs from 'fs';
console.log(chalk.bold(`❯ Serializing bundle`));
/** @type {Array<string>} */
const allCode = [];
await Promise.all(
  Array.from(allFiles).map(async (file) => {
    const code = await fs.promises.readFile(file, 'utf8');
    allCode.push(code);
  }),
);
console.log(allCode.join('\n'));
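
One detail worth noting about the snippet above: pushing into an array from inside async callbacks records the files in the order their reads finish, not in the order of allFiles. The sketch below illustrates the difference with simulated reads; fakeFiles, readFake, readAll and orderDemo are hypothetical names, and the delays stand in for real disk latency. It is wrapped in a function rather than using top-level await so it runs as a plain script.

```javascript
// Simulated file reads that finish at different times (hypothetical data).
const fakeFiles = new Map([
  ['entry-point.js', { code: '// entry', delay: 30 }],
  ['apple.js', { code: '// apple', delay: 10 }],
  ['banana.js', { code: '// banana', delay: 20 }],
]);
const readFake = (name) =>
  new Promise((resolve) => {
    const { code, delay } = fakeFiles.get(name);
    setTimeout(() => resolve(code), delay);
  });

const readAll = async () => {
  const names = Array.from(fakeFiles.keys());
  // Pushing inside the callback records completion order.
  const pushed = [];
  await Promise.all(
    names.map(async (name) => {
      pushed.push(await readFake(name));
    }),
  );
  // Returning the value from the callback keeps the input order.
  const ordered = await Promise.all(names.map((name) => readFake(name)));
  return { pushed, ordered };
};

const orderDemo = readAll();
orderDemo.then(({ pushed, ordered }) => {
  console.log(pushed);
  // [ '// apple', '// banana', '// entry' ]
  console.log(ordered);
  // [ '// entry', '// apple', '// banana' ]
});
```

For plain concatenation the order rarely matters, but a stable order makes bundle output reproducible between runs.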

The above example concatenates all of the source files and prints them. Unfortunately, if we try running the output, it won’t work: it calls require, which doesn’t exist in a browser, and there is no way to reference modules. We need to think about a different strategy that will actually work. Here is another idea: What if we inline every module? Let’s change our dependency collection to map the dependency names used in the code to full paths, and attempt to inline modules by swapping out each require('…') call with the implementation of the module. We won’t need jest-resolve-dependencies any longer as we have to do something slightly more complex, so here is a full bundler with inlining:

javascript
// index.mjs
import { cpus } from 'os';
import { dirname, resolve, join } from 'path';
import { fileURLToPath } from 'url';
import chalk from 'chalk';
import JestHasteMap from 'jest-haste-map';
import Resolver from 'jest-resolve';
import yargs from 'yargs';
import fs from 'fs';
const root = join(dirname(fileURLToPath(import.meta.url)), 'product');
const hasteMapOptions = {
  extensions: ['js'],
  maxWorkers: cpus().length,
  name: 'jest-bundler',
  platforms: [],
  rootDir: root,
  roots: [root],
};
/** @type {JestHasteMap} */
const hasteMap = new JestHasteMap.default(hasteMapOptions);
// This line is only necessary in `jest-haste-map` version 28 or later.
await hasteMap.setupCachePath(hasteMapOptions);
const { hasteFS, moduleMap } = await hasteMap.build();
const options = yargs(process.argv).argv;
const entryPoint = resolve(process.cwd(), options.entryPoint);
if (!hasteFS.exists(entryPoint)) {
  throw new Error(
    '`--entry-point` does not exist. Please provide a path to a valid file.',
  );
}
console.log(chalk.bold(`❯ Building ${chalk.blue(options.entryPoint)}`));
/** @type {Resolver} */
const resolver = new Resolver.default(moduleMap, {
  extensions: ['.js'],
  hasCoreModules: false,
  rootDir: root,
});
/** @type {Set<string>} */
const seen = new Set();
/** @type {Map<string, {code: string, dependencyMap: Map<string, string>}>} */
const modules = new Map();
const queue = [entryPoint];
while (queue.length) {
  const module = queue.shift();
  if (seen.has(module)) {
    continue;
  }
  seen.add(module);
  // Resolve each dependency and store it based on their "name",
  // that is the actual occurrence in code via `require('<name>');`.
  const dependencyMap = new Map(
    hasteFS
      .getDependencies(module)
      .map((dependencyName) => [
        dependencyName,
        resolver.resolveModule(module, dependencyName),
      ]),
  );
  const code = fs.readFileSync(module, 'utf8');
  // Extract the "module body", in our case everything after `module.exports =`;
  const moduleBody = code.match(/module\.exports\s+=\s+(.*?);/)?.[1] || '';
  const metadata = {
    code: moduleBody || code,
    dependencyMap,
  };
  modules.set(module, metadata);
  queue.push(...dependencyMap.values());
}
console.log(chalk.bold(`❯ Found ${chalk.blue(seen.size)} files`));
console.log(chalk.bold(`❯ Serializing bundle`));
// Go through each module (backwards, to process the entry-point last).
for (const [module, metadata] of Array.from(modules).reverse()) {
  let { code } = metadata;
  for (const [dependencyName, dependencyPath] of metadata.dependencyMap) {
    // Inline the module body of the dependency into the module that requires it.
    code = code.replace(
      new RegExp(
        // Escape `.` and `/`.
        `require\\(('|")${dependencyName.replace(/[\/.]/g, '\\$&')}\\1\\)`,
      ),
      modules.get(dependencyPath).code,
    );
  }
  metadata.code = code;
}
console.log(modules.get(entryPoint).code);
// console.log('apple ' + 'banana ' + 'kiwi ' + 'melon' + ' ' + 'tomato' + ' ' + 'kiwi ' + 'melon' + ' ' + 'tomato');

Congratulations, we just built rollup.js, a compiler that inlines modules! Let’s apply one more trick:

javascript
console.log(modules.get(entryPoint).code.replace(/' \+ '/g, ''));
// console.log('apple banana kiwi melon tomato kiwi melon tomato');

Now we have an optimizing compiler, more advanced than most actual JavaScript compilers. Of course, this approach will break down quickly. First, we are using regular expressions. Second, we cannot do anything complex in our modules as we only extract what comes after module.exports = and disregard any other code in the module’s scope. While rollup.js has shown that compiling modules into a single scope is indeed possible (and awesome!), this guide is focused on a simpler but robust solution: We’ll give each module a scope and state, and use a runtime to orchestrate the execution of modules.
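
To see the second limitation concretely, here is a small sketch that applies the same extraction regex to a module with one extra line of code. The source string is a made-up example, and new Function is used only to evaluate the extracted snippet in isolation.

```javascript
// A hypothetical module that does more than a single `module.exports` expression.
const source = [
  "const prefix = 'fresh ';",
  "module.exports = prefix + 'melon';",
].join('\n');

// The same extraction as in the inlining bundler.
const moduleBody = source.match(/module\.exports\s+=\s+(.*?);/)?.[1] || '';
console.log(moduleBody);
// prefix + 'melon'

// Inlining that body somewhere else loses the `prefix` declaration.
let inlineError = null;
try {
  new Function(`return ${moduleBody};`)();
} catch (error) {
  inlineError = error;
}
console.log(inlineError instanceof ReferenceError);
// true
```

The extracted expression still refers to prefix, but the declaration was thrown away, so the inlined code fails as soon as it runs.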

Execute our bundle using a runtime

Let’s take a step back and think about what the output of our bundler could look like if we want to create a portable artifact that can run in any JavaScript environment. We just learned about one serialization format: collapsing all modules into a single statement. There are many others we could choose from. This is a good moment to stop reading and see if you can come up with a solution of your own!

You might come up with a serialization format that looks like this:

javascript
// Serialization format 2nd attempt.
let module;
// tomato.js
module = {};
module.exports = 'tomato';
const tomatoModule = module.exports;
// melon.js
module = {};
module.exports = 'melon';
const melonModule = module.exports;
// kiwi.js
module = {};
module.exports = 'kiwi ' + melonModule + ' ' + tomatoModule;
const kiwiModule = module.exports;

This serialized format still concatenates all the modules, but injects code before and after each module. Before running the module it resets the module variable, and after executing the module it stores the result in a module specific variable. Further, we swap out require calls with the reference to each module’s exports. This is a much better solution compared to what we had before as we can actually execute more than a single exports statement in each module. However, this solution also has downsides. We’ll quickly run into limitations, like when two modules use the same variable names or when the module variable is referenced lazily.
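
The variable name collision can be reproduced in a few lines. The sketch below is only an illustration under made-up inputs: melonSource, tomatoSource and run are hypothetical, and new Function stands in for a browser parsing the bundle.

```javascript
// Two hypothetical modules that both declare a top-level `name` variable.
const melonSource = "const name = 'melon'; module.exports = name;";
const tomatoSource = "const name = 'tomato'; module.exports = name;";

// Concatenating them into one scope, as in the format above, fails to parse.
let concatError = null;
try {
  new Function(
    `let module; module = {}; ${melonSource} module = {}; ${tomatoSource}`,
  );
} catch (error) {
  concatError = error;
}
console.log(concatError instanceof SyntaxError);
// true

// Wrapping each module in its own function gives each one its own scope.
const run = (code) => {
  const module = { exports: {} };
  new Function('module', code)(module);
  return module.exports;
};
console.log(run(melonSource), run(tomatoSource));
// melon tomato
```

Function scope is the cheapest isolation mechanism JavaScript offers, which is why wrapping modules in functions is a natural next step.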

For our bundler, we are going to go with a serialization format that preserves modules and brings a runtime that has the functionality to execute and import modules. This means we also need to register modules somehow. We used an interesting pattern in the previous post when building a test runner where we used eval in a vm context and wrapped our code in a function: (function(module) {${code}}). Could we use this for our bundler?

javascript
// Serialization format 3rd attempt.
// tomato.js
(function (module) {
  module.exports = 'tomato';
});
// melon.js
(function (module) {
  module.exports = 'melon';
});
// kiwi.js
(function (module) {
  module.exports = 'kiwi ' + require('./melon') + ' ' + require('./tomato');
});

Great, now we have all of our modules isolated as we turned them into moduleFactories! However, if we try running this code, nothing will happen. We have no way of referencing modules and executing them; we are just defining a few functions and immediately forgetting about them. Let’s add some functionality to define modules:

javascript
// Serialization format 4th attempt.
/** @type {Map<string, Function>} */
const modules = new Map();
const define = (name, moduleFactory) => {
  modules.set(name, moduleFactory);
};
// tomato.js
define('tomato', function (module) {
  module.exports = 'tomato';
});
// melon.js
define('melon', function (module) {
  module.exports = 'melon';
});
// kiwi.js
define('kiwi', function (module) {
  module.exports = 'kiwi ' + require('./melon') + ' ' + require('./tomato');
});

We can now run our program and it will define modules, but it still doesn’t execute any of them. Modules are usually executed when they are required, so let’s add an implementation for running and requiring modules:

javascript
// Serialization format 5th attempt.
/** @type {Map<string, Function>} */
const modules = new Map();
/** @type {(name: string, moduleFactory: Function) => void} */
const define = (name, moduleFactory) => {
  modules.set(name, moduleFactory);
};
/** @type {Map<string, {exports: any}>} */
const moduleCache = new Map();
const requireModule = (name) => {
  // If this module has already been executed,
  // return a reference to it.
  if (moduleCache.has(name)) {
    return moduleCache.get(name).exports;
  }
  // Throw if the module doesn't exist.
  if (!modules.has(name)) {
    throw new Error(`Module '${name}' does not exist.`);
  }
  const moduleFactory = modules.get(name);
  // Create a module object.
  const module = {
    exports: {},
  };
  // Set the moduleCache immediately so that we do not
  // run into infinite loops with circular dependencies.
  moduleCache.set(name, module);
  // Execute the module factory. It will likely mutate the `module` object.
  moduleFactory(module, module.exports, requireModule);
  // Return the exported data.
  return module.exports;
};
// tomato.js
define('tomato', function (module, exports, require) {
  module.exports = 'tomato';
});
// melon.js
define('melon', function (module, exports, require) {
  module.exports = 'melon';
});
// kiwi.js
define('kiwi', function (module, exports, require) {
  module.exports = 'kiwi ' + require('./melon') + ' ' + require('./tomato');
});

With this code, we can add requireModule('kiwi'); to the end of our bundle to actually execute it. The only problem is that it will throw with the error Module './melon' does not exist. This is because when we require modules, we usually reference files on a file system, but here we are compiling modules into the same file and giving them an arbitrary id. We could change the require('./melon') call to require('melon'), but in a real-world scenario we’ll quickly run into module name collisions. We can avoid this problem by assigning a unique id to each module, making our final bundle output look like this:

javascript
// Serialization format final attempt.
/** @type {Map<number, Function>} */
const modules = new Map();
/** @type {(name: number, moduleFactory: Function) => void} */
const define = (name, moduleFactory) => {
  modules.set(name, moduleFactory);
};
/** @type {Map<number, {exports: any}>} */
const moduleCache = new Map();
const requireModule = (name) => {
  if (moduleCache.has(name)) {
    return moduleCache.get(name).exports;
  }
  if (!modules.has(name)) {
    throw new Error(`Module '${name}' does not exist.`);
  }
  const moduleFactory = modules.get(name);
  const module = {
    exports: {},
  };
  moduleCache.set(name, module);
  moduleFactory(module, module.exports, requireModule);
  return module.exports;
};
// tomato.js
define(2, function (module, exports, require) {
  module.exports = 'tomato';
});
// melon.js
define(1, function (module, exports, require) {
  module.exports = 'melon';
});
// kiwi.js
define(0, function (module, exports, require) {
  module.exports = 'kiwi ' + require(1) + ' ' + require(2);
});
requireModule(0);

Fantastic! Now let’s figure out how we can actually output this kind of code from our bundler. Let’s start by taking our require-runtime and putting it into a separate template file:

javascript
// require.js
/** @type {Map<number, Function>} */
const modules = new Map();
/** @type {(name: number, moduleFactory: Function) => void} */
const define = (name, moduleFactory) => {
  modules.set(name, moduleFactory);
};
/** @type {Map<number, {exports: any}>} */
const moduleCache = new Map();
const requireModule = (name) => {
  if (moduleCache.has(name)) {
    return moduleCache.get(name).exports;
  }
  if (!modules.has(name)) {
    throw new Error(`Module '${name}' does not exist.`);
  }
  const moduleFactory = modules.get(name);
  const module = {
    exports: {},
  };
  moduleCache.set(name, module);
  moduleFactory(module, module.exports, requireModule);
  return module.exports;
};

It’s been a while since we touched our bundling code. As our previous version was optimizing and inlining a lot of code, we’ll need to throw away some of what we’ve written. Let’s start with a small update to our dependency collector, removing the code extraction and adding an id generator:

javascript
/** @type {Set<string>} */
const seen = new Set();
/** @type {Map<string, {id: number, code: string, dependencyMap: Map<string, string>}>} */
const modules = new Map();
const queue = [entryPoint];
let id = 0;
while (queue.length) {
  const module = queue.shift();
  if (seen.has(module)) {
    continue;
  }
  seen.add(module);
  const dependencyMap = new Map(
    hasteFS
      .getDependencies(module)
      .map((dependencyName) => [
        dependencyName,
        resolver.resolveModule(module, dependencyName),
      ]),
  );
  const code = fs.readFileSync(module, 'utf8');
  const metadata = {
    // Assign a unique id to each module.
    id: id++,
    code,
    dependencyMap,
  };
  modules.set(module, metadata);
  queue.push(...dependencyMap.values());
}
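The traversal is easier to follow without the file system. The sketch below runs the same queue-based walk over a hypothetical in-memory graph and records the id each module receives. The `graph` object, the file names and the `assignIds` helper are made up for illustration; the real code gets this data from `hasteFS` and `resolver`:

```javascript
// Hypothetical dependency graph: file name -> files it requires.
const graph = {
  'entry-point.js': ['apple.js'],
  'apple.js': ['banana.js', 'kiwi.js'],
  'banana.js': ['kiwi.js'],
  'kiwi.js': [],
};

// Walk the graph breadth-first and hand out ascending ids.
const assignIds = (entryPoint) => {
  const seen = new Set();
  const ids = new Map();
  const queue = [entryPoint];
  let id = 0;
  while (queue.length) {
    const module = queue.shift();
    if (seen.has(module)) {
      continue;
    }
    seen.add(module);
    ids.set(module, id++);
    queue.push(...graph[module]);
  }
  return ids;
};

const ids = assignIds('entry-point.js');
console.log(Object.fromEntries(ids));
// { 'entry-point.js': 0, 'apple.js': 1, 'banana.js': 2, 'kiwi.js': 3 }
```

`kiwi.js` is queued twice, once by `apple.js` and once by `banana.js`, but the `seen` set makes sure it only receives a single id.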

With the above code we now have a unique ascending id for each module. Our entry point will conveniently always be id 0 because it is the first module we look at. As a next step we need to adjust our serializer with three updates:

  • Wrap each module in a function and call define.
  • Output our require-runtime.
  • Add requireModule(0); to the end of our bundle to run the entry-point.

Here is what that looks like:

javascript
console.log(chalk.bold(`❯ Serializing bundle`));
// Wrap modules with `define(<id>, function(module, exports, require) { <code> });`
/** @type {(id: number, code: string) => string} */
const wrapModule = (id, code) =>
  `define(${id}, function(module, exports, require) {\n${code}});`;
// The code for each module gets added to this array.
/** @type {Array<string>} */
const output = [];
for (const [module, metadata] of Array.from(modules).reverse()) {
  let { id, code } = metadata;
  for (const [dependencyName, dependencyPath] of metadata.dependencyMap) {
    const dependency = modules.get(dependencyPath);
    // Swap out the reference to the required module with the generated
    // module id. We use regex for simplicity. A real bundler would likely
    // do an AST transform using Babel or similar.
    code = code.replace(
      new RegExp(
        `require\\(('|")${dependencyName.replace(/[\/.]/g, '\\$&')}\\1\\)`,
      ),
      `require(${dependency.id})`,
    );
  }
  // Wrap the code and add it to our output array.
  output.push(wrapModule(id, code));
}
// Add the `require`-runtime at the beginning of our bundle.
output.unshift(fs.readFileSync('./require.js', 'utf8'));
// And require the entry point at the end of the bundle.
output.push('requireModule(0);');
// Write it to stdout.
console.log(output.join('\n'));
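The regular expression is the densest part of the serializer, so here it is in isolation. The sketch below wraps the same pattern in a hypothetical `rewriteRequire` helper (the helper name and the sample strings are made up for illustration) and applies it to a single line of code:

```javascript
// Same escaping and pattern as in the serializer above.
const rewriteRequire = (code, dependencyName, id) =>
  code.replace(
    new RegExp(
      `require\\(('|")${dependencyName.replace(/[\/.]/g, '\\$&')}\\1\\)`,
    ),
    `require(${id})`,
  );

const source = `module.exports = 'banana ' + require('./kiwi');`;
const rewritten = rewriteRequire(source, './kiwi', 3);
console.log(rewritten); // module.exports = 'banana ' + require(3);

// The pattern has no `g` flag, so only the first match is replaced.
const twice = rewriteRequire(
  `require("./kiwi"); require("./kiwi");`,
  './kiwi',
  3,
);
console.log(twice); // require(3); require("./kiwi");
```

The `('|")` group together with the `\1` backreference makes sure the closing quote matches the opening one. As the second call shows, a module that requires the same dependency twice would keep one unresolved path; passing `'g'` as the second argument to `RegExp` is one way to cover that case.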

And it works! Re-running our bundler via node index.mjs --entry-point product/entry-point.js will print a bundle exactly the way we designed it earlier. For convenience, let’s add an --output flag to write our bundle to a file:

javascript
if (options.output) {
  fs.writeFileSync(options.output, output.join('\n'), 'utf8');
}
bash
# In your terminal:
node index.mjs --entry-point product/entry-point.js --output test.js
node test.js
# apple banana kiwi melon tomato kiwi melon tomato

This will bundle our code, and then execute it in Node.js. You can also go ahead and load test.js within an HTML file in your browser and it will run your code. jest-bundler lives!

Compile each file in parallel

We solved the fundamental problems around dependency resolution, serializing a bundle and creating a runtime to execute our code. However, one big challenge remains: compiling our source files with a tool like Babel. Adding Babel allows us to make use of modern syntax. For example, we could make use of ECMAScript module syntax like import and export while still running our bundled code using our require-runtime. Let’s try this by adding Babel as a compiler: yarn add @babel/core @babel/plugin-transform-modules-commonjs and updating some of our example code:

javascript
// product/entry-point.js
import Apple from './apple';
console.log(Apple);
javascript
// product/apple.js
import Banana from './banana';
import Kiwi from './kiwi';
export default 'apple ' + Banana + ' ' + Kiwi;
javascript
// product/banana.js
export default 'banana ' + require('./kiwi');

Alright, that gives us enough test code to play with Babel compilation, which looks something like this for a single file:

javascript
import { transformSync } from '@babel/core';
const result = transformSync(code, {
  plugins: ['@babel/plugin-transform-modules-commonjs'],
}).code;

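Currently our code processes each module serially. Let’s rewrite our earlier for-of loop to use Promise.all. As long as we call the synchronous transformSync the work still happens one module at a time, but this structure lets each transformation run in parallel once the transform step itself becomes asynchronous: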

javascript
const results = await Promise.all(
  Array.from(modules)
    .reverse()
    .map(async ([module, metadata]) => {
      let { id, code } = metadata;
      code = transformSync(code, {
        plugins: ['@babel/plugin-transform-modules-commonjs'],
      }).code;
      for (const [dependencyName, dependencyPath] of metadata.dependencyMap) {
        const dependency = modules.get(dependencyPath);
        code = code.replace(
          new RegExp(
            `require\\(('|")${dependencyName.replace(/[\/.]/g, '\\$&')}\\1\\)`,
          ),
          `require(${dependency.id})`,
        );
      }
      return wrapModule(id, code);
    }),
);
// Append the results to our output array:
output.push(...results);

Actually, we can clean up our code that produces the output now. Let’s rewrite the serialization part of our bundler like this:

javascript
const output = [
  fs.readFileSync('./require.js', 'utf8'),
  ...results,
  'requireModule(0);',
].join('\n');
console.log(output);
if (options.output) {
  fs.writeFileSync(options.output, output, 'utf8');
}

Similar to parallelizing test runs when we were building a test runner, code transformation is highly parallelizable. Instead of transforming all code in the same process, we can drop in jest-worker for improved performance. Let’s run yarn add jest-worker and create a new worker.js file:

javascript
const { transformSync } = require('@babel/core');
exports.transformFile = function (code) {
  const transformResult = { code: '' };
  try {
    transformResult.code = transformSync(code, {
      plugins: ['@babel/plugin-transform-modules-commonjs'],
    }).code;
  } catch (error) {
    transformResult.errorMessage = error.message;
  }
  return transformResult;
};
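Note that `transformFile` reports failures through `errorMessage` instead of throwing, so a caller that only reads `code` would silently receive an empty string. One way to surface such failures is sketched below. `unwrapTransformResult` is a hypothetical helper, not part of jest-worker or of the bundler in this article, and the sample results are hand-written to mirror the shape the worker returns:

```javascript
/**
 * Hypothetical helper: returns the transformed code, or throws with the
 * file path attached if the worker reported an error.
 * @type {(path: string, result: {code: string, errorMessage?: string}) => string}
 */
const unwrapTransformResult = (path, result) => {
  if (result.errorMessage) {
    throw new Error(`Failed to transform '${path}': ${result.errorMessage}`);
  }
  return result.code;
};

// Hand-written results shaped like the worker's return value.
const ok = unwrapTransformResult('product/apple.js', { code: '"use strict";' });
console.log(ok); // "use strict";

let message = '';
try {
  unwrapTransformResult('product/broken.js', {
    code: '',
    errorMessage: 'Unexpected token',
  });
} catch (error) {
  message = error.message;
}
console.log(message);
// Failed to transform 'product/broken.js': Unexpected token
```

Returning a plain object from the worker keeps the result easy to pass between processes or threads; a helper like this simply turns it back into an exception on the main thread.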

And then at the top of our index.mjs file, we’ll create a worker instance:

javascript
import { Worker } from 'jest-worker';
const worker = new Worker(
  join(dirname(fileURLToPath(import.meta.url)), 'worker.js'),
  {
    enableWorkerThreads: true,
  },
);

All that’s left to do now is to modify our transform call to this:

javascript
const results = await Promise.all(
  Array.from(modules)
    .reverse()
    .map(async ([module, metadata]) => {
      let { id, code } = metadata;
      ({ code } = await worker.transformFile(code));
      for (const [dependencyName, dependencyPath] of metadata.dependencyMap) {
        const dependency = modules.get(dependencyPath);
        code = code.replace(
          new RegExp(
            `require\\(('|")${dependencyName.replace(/[\/.]/g, '\\$&')}\\1\\)`,
          ),
          `require(${dependency.id})`,
        );
      }
      return wrapModule(id, code);
    }),
);

We now don’t just have a bundler, we have a fast bundler. That was exciting!

Modern Bundling

You can find the full implementation of jest-bundler on GitHub. Through this guide we built what I’d call a “traditional bundler”. Nowadays many bundlers support ECMAScript Modules or advanced compilation options out of the box. Real bundlers may do incremental compilation, eliminate dead code, run whole program analysis to remove unnecessary functions or collapse multiple modules into a single scope. However, almost all production bundlers today ship with a runtime and module factories, which means they go through a similar flow of dependency resolution and module serialization. The concepts are transferable and should set you up for building your own bundler.

If you have made it this far, here are some exciting follow-up projects you can try to dive deeper:

  • Add a --minify flag that runs a minifier like terser on each individual file in the bundle.
  • Add a cache that will store transformed files and only re-compile files that have changed.
  • Medium: Learn about source maps and generate the corresponding .map file for your bundle.
  • Medium: Add a --dev option that starts an HTTP server that serves the bundled code through an HTTP endpoint.
  • Medium: After implementing the HTTP server, make use of jest-haste-map’s watch function to listen for changes and re-bundle automatically.
  • Advanced: Learn about Import Maps and change the bundler from being require based to work with native ESM!
  • Advanced: Hot reloading: Adjust the runtime so it can update modules by first de-registering and then re-running the module and all of its dependencies.
  • Advanced: Rewrite the above bundler in another programming language like Rust.

By now we have built a testing framework and a bundler. We could extend this series indefinitely and build a linter, a refactoring tool, a formatter or really any tool in the JavaScript space. All of these tools work on the same source, and share similar concepts – there is no reason they can’t also share the same infrastructure.

Tweet about this article, or share it with your friends. Discuss with the community, or email me. Thank you for reading, and have a great day!