Rethinking JavaScript Infrastructure
The previous post claimed that dependency managers don’t manage dependencies. This post builds on the previous one, and I promise it’s going to be controversial. I provide recommendations that I implemented and have seen work in a large monorepo with hundreds of thousands of files at Facebook. The ideas are high impact but also high effort. Let me take you on a journey of what reimagining JavaScript infrastructure could look like.
To recap, here is where we are in our series about JavaScript infrastructure:
- Dependency Managers Don’t Manage Your Dependencies
- Rethinking JavaScript Infrastructure (you are here)
- Building a JavaScript Testing Framework
- Building a JavaScript Bundler
Performance Performance Performance
In the previous post we established that:
Adding many large dependencies tends to slow down install times significantly, and make all operations slower for everyone globally, even if individuals only use a subset of tools in a project.
We can look at this problem another way: How long does it take to get started on a project after checking out the repository or when dependencies require an update after rebasing? I’ve seen this process take minutes when it can be seconds.
Continuous Integration (CI) pipelines are a concrete example of this: You may have multiple workflows that each verify a different aspect of your project, yet all of them tend to materialize the entire dependency tree. Imagine only installing the dependencies you need for each task! When we designed Yarn’s workspaces feature, we were guided by solving issues related to organizing monorepos and keeping compatibility with Lerna. While we have a neat separation of concerns across packages, we don’t leverage that separation to improve the performance of dependency installation.
In hindsight, we made a critical mistake by underinvesting in the workflow performance of dependency management as the JavaScript ecosystem continued to grow. Yarn eventually added focused workspaces, but few people are aware of the feature, and it is rarely used. Arguably, it should have been the default: Yarn’s `node_modules` installation process should be per-workspace and incrementally install more dependencies based on the operations executed in a repository.
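A minimal sketch of what focused installs look like with Yarn 1.x (the workspace path is a placeholder): running the install from inside a single workspace only fully installs that workspace’s dependencies, while sibling workspaces are shallowly installed from the registry.

```sh
# Focused install, assuming Yarn 1.x workspaces (workspace path is illustrative)
cd packages/my-app
yarn install --focus
```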
So let’s take all of this in another direction: What would it look like if we eliminated the ongoing need for installing dependencies from our development iteration cycle completely? My solution — check all of our dependencies into source control and make all development tools available as pre-compiled binaries. This would speed up all repository operations (getting everything up-to-date after a rebase) as well as reduce our in-band reliance on dependency management tools.
Here are the three topics we are going to need to wrap our heads around for rethinking JavaScript infrastructure:
- DevDependencies were a mistake
- Checking third-party product code into version control
- Build Zero-Overhead Tooling
DevDependencies were a mistake
The idea of having product and tooling code integrated made a lot of sense early in the Node.js/front-end ecosystem and still makes sense for libraries. In the past, entries in the `dependencies` field usually meant that the code is part of the actual production build artifact, while entries in `devDependencies` are only used during development.
However, for applications, as the ecosystem matured, it has become evident that this system no longer serves its purpose.1 Nowadays, product dependencies are usually just inputs into a complex compilation pipeline, type system, or test framework. Quite often, there is no meaningful distinction between what is part of `dependencies` or `devDependencies` and what ships to production. A simple example is a frontend UI library installed in `node_modules`. If you are deploying compiled JavaScript bundles to production, the code of your UI library is only an input to your compilation pipeline. The UI library is compiled into a bundle that will be deployed to production. You do not need the source code of the library in your production environment. It doesn’t matter whether you list the UI library under `dependencies` or `devDependencies`.
For applications, I propose a slightly different way of thinking about dependencies that leads to a clearer distinction between product and development dependencies: everything that behaves or is used as if it were first-party code should be a product dependency. Treat it like any other code that your team is writing, and don’t think of `node_modules` for product code as a magic directory. The code and type definitions for your UI library go into production dependencies, while your compiler toolchain remains a development dependency. This perspective may be completely obvious to you, but I encourage you to open the `package.json` file of a large application; you’ll likely find packages violating this principle.
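For illustration only (the package names and versions below are arbitrary, not taken from any specific project): the UI library is an input to the production bundle and therefore a product dependency, while the toolchain that compiles and tests it stays in `devDependencies`.

```json
{
  "dependencies": {
    "react": "^18.2.0",
    "react-dom": "^18.2.0"
  },
  "devDependencies": {
    "@babel/core": "^7.23.0",
    "jest": "^29.7.0",
    "webpack": "^5.90.0"
  }
}
```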
Now that we have a clear rule for what goes into `dependencies` and `devDependencies`, we can split them into two separate folders, each with its own `package.json`: one for product and one for tooling. At the end of this process, we’ll end up with one smaller `node_modules` folder with all product-related code and one large `node_modules` folder containing all of the tools that operate on the product and on third-party product code, which leads us directly to the next step:
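One possible shape for this split, assuming Yarn workspaces (the workspace names are placeholders): a root `package.json` declares the two workspaces, `product/package.json` lists the product dependencies whose `node_modules` folder we are about to check in, and `tooling/package.json` holds compilers, test runners, and linters.

```json
{
  "private": true,
  "workspaces": ["product", "tooling"]
}
```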
Checking third-party product code into version control
Now that we have taken the initial steps to separate product and tooling dependencies, we can go a step further and check all of our product dependencies into version control.
“Hold up”, you say? It may sound counter-intuitive, as the JavaScript community spent years building infrastructure to keep `node_modules` out of source trees. Historically, projects with checked-in `node_modules` are painful to manage. However, checking only product dependencies into source control limits most of the downsides. Let’s analyze some of the trade-offs:
Downside: Checked-in `node_modules` is too big for version control
Mitigation: Usually, product dependencies account for only about 10-30% of the total size of a `node_modules` folder. Product dependencies are compiled into production bundles, so it is unusual for them to be more than an order of magnitude larger than first-party code. This means that the product dependencies’ `node_modules` folder should not be much larger than the first-party code already in the repository, resulting in at most roughly twice as much code as before. To keep these dependencies in check, you can use Yarn’s flat option to ensure you are only using a single version of each package, and Yarn’s autoclean feature with a strictly managed custom exclusion list helps remove unused and unnecessary files. Additional repository size can be the biggest downside of this strategy, so I recommend analyzing the current size and predicting future growth before committing to checking third-party dependencies into version control.
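As a sketch of how those two Yarn 1.x features are typically used (commands only; the exclusion list itself still needs to be curated by hand):

```sh
# Keep checked-in product dependencies small, assuming Yarn 1.x
yarn install --flat     # resolve every package to a single version
yarn autoclean --init   # create a .yarnclean file with a default exclusion list
yarn autoclean --force  # strip the excluded files (docs, tests, etc.) from node_modules
```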
Downside: The checked-in `node_modules` folder will grow to be unmanageable
Mitigation: The difference between third-party dependencies for product code and for tooling code is that the product dependencies are deployed with applications as if they were first-party code. Given that this code impacts the size of an application, it is unlikely to grow by orders of magnitude or to significantly outpace first-party code creation. Further, there is an incentive for teams to reduce the size of their production bundles instead of increasing them substantially.
Downside: Updating checked-in `node_modules` is painful and slows people down
Mitigation: Managing a checked-in `node_modules` folder is painful, especially when upgrading large dependencies or large trees of dependencies, for example, Babel and Jest, which both contain dozens of packages. However, since we are exclusively checking in product dependencies, we are unlikely to encounter such tightly connected packages. Most of the time, people will only add or update a small number of third-party product dependencies.
Downside: People may commit manual changes to `node_modules` instead of sending fixes upstream
Mitigation: This problem can be avoided by building a CI step that verifies the integrity of the `node_modules` folder and prevents people from patching files directly.
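A hypothetical version of such a CI step (the exact commands depend on your setup): force a clean reinstall from the lockfile and fail if the checked-in `node_modules` differs from it.

```sh
#!/usr/bin/env bash
# Fail the build if the checked-in node_modules does not match a clean install.
set -euo pipefail
yarn install --frozen-lockfile --force
if ! git diff --quiet -- node_modules; then
  echo "Checked-in node_modules does not match a clean install." >&2
  echo "Send fixes upstream instead of patching node_modules directly." >&2
  exit 1
fi
```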
There are also upsides to checking in `node_modules`:
Upside: Visibility into third-party code deployed to production
There is usually little scrutiny of newly added dependencies during code reviews. Somebody may add a single line to a `package.json` file that pulls in hundreds of other dependencies. This problem is exacerbated, and easy to miss, because GitHub hides changes to `yarn.lock` files by default.
When third-party dependencies are materialized in the repository via a `yarn install`, changes and additions are visible to reviewers during the code review stage. It makes people aware of large trees of transitive dependencies. If somebody adds a thousand files just to use a single utility function, it makes sense to apply more scrutiny during code review and maybe even recommend alternative solutions.
Upside: Reduce reliance on individual package managers
There are many JavaScript dependency managers, and it is unclear how they (and their use) will evolve in the future. Checking `node_modules` into the repository reduces the reliance on any single package manager and increases option value. In this case, it enables switching to another package manager with less work, as we eliminate Yarn from the critical path and only use the output of the install operation (the `node_modules` folder). From Principles of Developer Experience:
[Maximizing Option Value] is about retaining or gaining option value, which means any change to a system should unlock more options for improvements and significant future changes. […] There is usually little option value embedded in the design of existing systems. If we keep option value in mind when redesigning infrastructure, we can naturally adapt to new requirements in the future.
To summarize, we can mitigate many downsides and gain significant upsides by taking more ownership of the dependency management process with this strategy. We are not done yet, though! Now that product and tooling dependencies are neatly separated and product dependencies are part of version control, one more step will make everything fall into place:
Build Zero-Overhead Tooling
To get to a state where we can start working immediately after checking out a repository or after rebasing, we need fast access to all of our tools: the bundling infrastructure, web server, test frameworks, linter, type checker, and everything else. Installing them as part of the `node_modules` install process is slow; the vast majority of the time is spent resolving dependencies and copying tens of thousands of files from tarballs. Even after everything is installed, the tools are slow to start because many of them load thousands of source files into memory when they run. The solution is to compile them into binaries and vendor them into your projects so they don’t require installing and running third-party dependencies from source.
Various tools are already beginning to move the ecosystem in this direction, like Deno’s compile command, which creates executables. Next.js also pre-compiles many of its dependencies into bundles, which has already had a meaningful impact on its install and startup times. You can pre-compile tools in one of the following ways:
- Compile your JavaScript tool into a single JavaScript file.
- Use Vercel’s pkg to create optimized binaries for your tool.
- Use `deno compile` to produce binaries for tools written with Deno (see the example after this list).
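For instance, a Deno-based CLI could be compiled like this (the script path, output path, and permission flags are placeholders for whatever your tool needs):

```sh
# Compile a Deno script into a standalone, self-contained binary
deno compile --allow-read --allow-write --output bin/lint tools/lint.ts
./bin/lint src/
```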
The next step will be to deploy the tool. Here are some example strategies:
- Maintain a private homebrew tap.
- Use `node_modules` and JavaScript package managers! We previously established that JavaScript dependency managers are great at downloading artifacts and putting them in place, and that works just as well for binary data. In this case, the idea is to put binary artifacts into packages without any dependencies (see the sketch after this list).
- Build a custom system that builds packages on GitHub and downloads and executes them transparently.
- Not recommended: Check binary artifacts into your repository.2
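As a sketch of the package-manager option, a per-platform package could contain nothing but the binary plus metadata; the package name, version, and binary name below are illustrative.

```json
{
  "name": "@my-org/bundler-darwin-arm64",
  "version": "1.4.2",
  "os": ["darwin"],
  "cpu": ["arm64"],
  "bin": { "bundler": "./bundler" },
  "dependencies": {}
}
```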
While we’ll still have an install process, it is usually an order of magnitude faster than installing all the source files. The process can be hidden from the user by integrating it into the tools themselves so that they manage their own updates. It’s essential to version the tools with the state of the repository: a version or hash must be committed to the repository every time a tool changes. This way, you can roll back a tool by updating the version or hash in the repository if there are issues. Navigating to older commits will use the older versions of the tools instead of possibly incompatible newer ones.
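A minimal sketch of such a launcher, checked into the repository (the tool name, cache path, and download URL are placeholders): the pinned version file travels with every commit, so older commits transparently run older binaries.

```sh
#!/usr/bin/env bash
# Download the pinned binary once, cache it, and execute it with all arguments.
set -euo pipefail
VERSION="$(cat "$(dirname "$0")/bundler.version")"   # version or hash committed to the repo
BIN="$HOME/.cache/my-tools/bundler-$VERSION"
if [ ! -x "$BIN" ]; then
  mkdir -p "$(dirname "$BIN")"
  curl -fsSL "https://tools.example.com/bundler/$VERSION/$(uname -s)-$(uname -m)" -o "$BIN"
  chmod +x "$BIN"
fi
exec "$BIN" "$@"
```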
Not all scripts and tools need to be pre-compiled, only the ones used by a large population of developers or tools with specific performance constraints. Ideally, you can separate tools that aren’t used often into a different workspace, so their dependencies will only be installed when using the tool. An excellent example of that is end-to-end testing frameworks: they usually come with many dependencies, but only a few developers run end-to-end tests locally. Consider isolating these tools into a separate part of the repository and writing a script that automatically installs and updates their dependencies when developers invoke the tool.
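Such a script could be as simple as the following sketch (the workspace location and the test command are assumptions about your setup):

```sh
#!/usr/bin/env bash
# Install the end-to-end workspace's dependencies on demand, then run the tests.
set -euo pipefail
cd "$(git rev-parse --show-toplevel)/tools/e2e"
yarn install --frozen-lockfile
exec yarn test:e2e "$@"
```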
I’m not aware of any ready-to-use solution that unifies both the compilation and deployment process into a smooth experience. If you are building one, please let me know!
Adding it all up
With all of the steps above applied, a repository can be checked out or rebased, and engineers can immediately start developing inside of it. Engineers will gain more control over the code they deploy to production, spend much less time waiting for dependencies to install, and benefit from a better separation of concerns. Further, because the tooling for bundling, type-checking, testing, and linting is separated into different packages, each can be improved and managed separately.
I have built and deployed a system at Facebook using the ideas presented in this article series. Immediately after checking out a gigantic monorepo, the time it took to install dependencies, start Metro and build JavaScript bundles was reduced from seven minutes to seven seconds. Half of the improvements came from ideas presented in this series so far, and the other half was achieved through bundling optimizations we’ll talk about in the future. Stay tuned!
Reimagining JavaScript infrastructure is a luxury, and not every team and project will have the resources to make step-function changes like the ones I’m proposing. I’m not here to convince you; I’m here to tell you that the JavaScript ecosystem is capable of doing much better than it has in the past. Prove me wrong, improve on my ideas, but always bet on JavaScript.
Next: Building a JavaScript Testing Framework.
Footnotes
1. Here, libraries are defined as packages that end up being published and consumed by other libraries or applications. Applications are defined as repositories that use something like Yarn workspaces to manage dependencies. ↩
2. I do not recommend this approach because it will negatively affect repository performance. It is only acceptable if the binary in question rarely changes, for example once a year. ↩