Monorepos: Please don’t!

At scale, there is simply no way to rebuild the entirety of a codebase and run all automated tests when each change is submitted (or, more importantly and more often, in CI when a change is proposed).

To deal with this problem, all of the large monorepos have developed sophisticated build systems (see Bazel/Blaze from Google and Buck from Facebook as examples) that are designed in such a way as to internally track dependencies and build a directed acyclic graph (DAG) of the source code.

This DAG allows for efficient build and test caching such that only code that changes, or code that depends on it, needs to be built and tested.
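The core mechanism can be sketched as a reverse-reachability query over the DAG. The target names and graph below are hypothetical; real systems like Bazel get the same effect by hashing the inputs of every target.

```python
from collections import deque

# Hypothetical dependency graph: each target lists the targets it depends on.
DEPS = {
    "//app:server": ["//lib/net", "//lib/log"],
    "//app:client": ["//lib/net"],
    "//lib/net":    ["//lib/base"],
    "//lib/log":    ["//lib/base"],
    "//lib/base":   [],
}

def affected_targets(changed, deps=DEPS):
    """Return every target that must be rebuilt and retested: the changed
    targets plus everything that transitively depends on them."""
    # Invert the edges so we can walk from a changed target to its dependents.
    rdeps = {t: set() for t in deps}
    for target, ds in deps.items():
        for d in ds:
            rdeps[d].add(target)
    seen, queue = set(changed), deque(changed)
    while queue:
        for dependent in rdeps[queue.popleft()]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

In this toy graph, a change to `//lib/base` invalidates everything, while a change to `//lib/log` only invalidates `//app:server`; everything else is served from cache.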

Furthermore, because code that is built must actually be deployed, and not all software is deployed at the same time, it is essential that build artifacts are carefully tracked so that previously deployed software can be redeployed to new hosts as needed.

This reality means that even in a monorepo world, multiple versions of code exist at the same time in the wild, and must be carefully tracked and reconciled.

Monorepo proponents will argue that even with the large amount of build/dependency tracking required, there is still substantial benefit because a single commit/SHA describes the entire state of the world.

I would argue this benefit is dubious; given the DAG that already exists, it’s a trivial leap to include individual repository SHAs as part of the DAG, and in fact, Bazel can seamlessly work across repositories or within a single repository, abstracting the underlying layout from the developer.
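One way to see why the single-SHA benefit is thin: a polyrepo build can pin each external repository to an exact commit inside the same graph, so a single top-level lockfile plays the role of the monorepo SHA. The repo names and commits below are invented for illustration, loosely modeled on Bazel's `git_repository(commit=...)` pins.

```python
import hashlib
import json

# Hypothetical lockfile: each external repo pinned to an exact commit.
PINS = {
    "lib_net": {"remote": "git@example.com:org/lib_net.git",
                "commit": "9fceb02d0ae598e95dc970b74767f19372d61af8"},
    "lib_log": {"remote": "git@example.com:org/lib_log.git",
                "commit": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3"},
}

def world_state(pins=PINS):
    """Hash the canonicalized set of (repo, commit) pins into one digest:
    a single identifier describing the entire state of the world, exactly
    the property a monorepo commit SHA provides."""
    canonical = json.dumps(pins, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```

Two checkouts with identical pins produce an identical digest, so "one string describes the whole world" holds in a polyrepo too.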

Furthermore, automated refactor tooling can trivially be built that automatically bumps dependent library versions across many repositories, thus blurring the difference between a monorepo and polyrepo in this area (more on this below).

The end result is that the realities of build/deploy management at scale are largely identical whether using a monorepo or polyrepo.

The tools don’t care, and neither should the developers writing code.

Theoretical benefit 3: Code refactors are easy / atomic commits

The final benefit that monorepo proponents typically tout is that when all code is in a single repository, refactors become much easier, due to ease of searching and the idea that a single atomic commit can span the entire codebase.

This is a fallacy for multiple reasons. As described above, at scale, a developer will not be able to easily edit or search the entirety of the codebase on their local machine.

Thus, the idea that one can clone all of the code and simply do a grep/replace is not trivial in practice.

If we assume that via a sophisticated VFS a developer can clone and edit the entire codebase, the next question is how often that actually happens. I’m not talking about fixing a bug in an implementation of a shared library, as this type of fix is carried out identically whether using a monorepo or polyrepo (assuming similar build/deploy tooling as described in the previous section).

I’m talking about a library API change that has follow-on build breakage effects for other code.

In very large codebases, it is likely impossible to make a change to a fundamental API and get it code reviewed by every affected team before merge conflicts force the process to start over again.

Developers are faced with two realistic choices.

First, they can give up, and work around the API issue (this happens more often than we would like to admit).

Second, they can deprecate the existing API, implement a new API, and then go through the laborious process of individual deprecation changes throughout the codebase.

Either way, this is exactly the same process undertaken in a polyrepo.

In a service oriented world, applications are now composed of many loosely coupled services that interact with each other using some type of well specified API.

Larger organizations inevitably migrate to an IDL such as Thrift or Protobuf, which allows for type-safe APIs and backwards-compatible changes.

As described in the previous section on build/deploy management, code is not deployed at the same time.

It might be deployed over a period of hours, days, or months.

Thus, modern developers must think about backwards compatibility in the wild.

This is a simple reality of modern application development that many developers would like to ignore but cannot.

Thus, when it comes to services, versus library APIs, developers must use one of the two options described above (give up on changing an API or go through a deprecation cycle), and this is no different whether using a monorepo or polyrepo.
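This backwards-compatibility discipline is independent of repository layout. As a hedged illustration (the message shape and field names are invented), the tolerant-reader pattern lets an old consumer survive a new producer field, which is the property IDLs like Protobuf enforce by construction:

```python
import json

def parse_user(payload: str) -> dict:
    """Old consumer: read only the fields it knows and ignore the rest.
    Unknown fields from a newer producer are silently tolerated, which is
    what makes adding a field a backwards-compatible change."""
    raw = json.loads(payload)
    return {"id": raw["id"], "name": raw.get("name", "")}

# A newer service may be deployed before every consumer catches up.
old_message = json.dumps({"id": 1, "name": "alice"})
new_message = json.dumps({"id": 1, "name": "alice", "avatar_url": "https://..."})
```

The old consumer parses both messages identically, so the producer can be deployed over hours, days, or months without breaking anyone.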

In terms of actually making refactor changes across large codebases, many organizations end up developing automated refactor tooling such as fastmod, recently released by Facebook.

As elsewhere, a tool such as this can trivially operate within a single repository or across multiple repositories.

Lyft has a tool internally called “refactorator” which does just this.

It works like fastmod but automates making changes across our polyrepo, including opening PRs, tracking review status, etc.
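Tooling of this kind reduces to a loop over checkouts. Below is a minimal, hypothetical sketch of the fastmod/refactorator idea (a regex rewrite applied per repository; the real tools layer PR creation and review tracking on top). Note how the tool cannot tell whether the directories are one monorepo or many cloned polyrepos:

```python
import re
from pathlib import Path

def refactor_repos(repo_dirs, pattern, replacement, glob="*.py"):
    """Apply one regex rewrite across many checkouts and return the list
    of files that were modified. The directory list may be subtrees of a
    single monorepo or separate polyrepo clones; the logic is identical."""
    rx = re.compile(pattern)
    changed = []
    for repo in repo_dirs:
        for path in Path(repo).rglob(glob):
            text = path.read_text()
            new_text = rx.sub(replacement, text)
            if new_text != text:
                path.write_text(new_text)
                changed.append(path)
    return changed
```

A real tool would then commit each repo's changes and open a PR per repository, but the search-and-replace core is the same either way.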

Unique monorepo downsides

In the previous section I laid out all of the theoretical benefits that a monorepo provides, and explained why realizing them requires extraordinarily complex tooling that is no different from what is required for a polyrepo.

In this section, I’m going to cover two unique downsides to monorepos.

Downside 1: Tight coupling and OSS

Organizationally, a monorepo encourages tight coupling and development of brittle software.

It gives developers the feeling they can easily fix abstraction mistakes, when they actually cannot in the real world due to the realities of staggered build/deploy and the human/organizational/cultural factors inherent in asking developers to make changes across the entire codebase.

Polyrepo code layout offers clear team/project/abstraction/ownership boundaries and encourages developers to think carefully about contracts.

This is a subtle yet hugely important benefit: it imbues an organization’s developers with a more scalable and long-term way of thinking.

Furthermore, the use of a polyrepo does not mean that developers cannot reach across repository boundaries.

Whether this happens or not is a function of the engineering culture in place versus whether the organization uses a monorepo or polyrepo.

Tight coupling also has substantial implications with regard to open source.

If an organization wishes to create or easily consume OSS, using a polyrepo is required.

The contortions that large monorepo organizations undertake (reverse import/export, private/public issue tracking, shim layers to abstract standard library differences, etc.) are not conducive to productive OSS collaboration and community building, and also create substantial overhead for engineers within the organization.

Downside 2: VCS scalability

Scaling a single VCS to hundreds of developers, hundreds of millions of lines of code, and a rapid rate of submissions is a monumental task.

Twitter’s monorepo roll-out about 5 years ago (based on git) was one of the biggest software engineering boondoggles I have ever witnessed in my career.

Running simple commands such as git status would take minutes.

If an individual clone got too far behind, it took hours to catch up (for a time there was even a practice of shipping hard drives to remote employees with a recent clone to start out with).

I bring this up not specifically to make fun of Twitter engineering, but to illustrate how hard this problem is.

I’m told that 5 years later, the performance of Twitter’s monorepo is still not what the developer tooling team there would like, and not for lack of trying.

Of course, the past 5 years have also seen development in this area.

Microsoft’s git VFS, which is used internally to develop Windows, tackles creating a real VFS for git, which I described above as a requirement for monorepo scalability (and with Microsoft’s acquisition of GitHub, it seems likely this level of git scalability will find its way into GitHub’s enterprise offerings).

And, of course, Google and Facebook continue to invest tremendous resources into their internal systems to keep them running, although none of this work is publicly available.

However, why bother solving the VCS scalability problem at all when, as described in the previous section, tooling will also need to be built that is identical to what is required for a polyrepo? There is no good reason.

Conclusion

As is often the case in software engineering, we tend to look at tech’s most successful companies for guidance on best practices, without understanding the monumental engineering that has gone into making those companies successful at scale.

Monorepos, in my opinion, are an egregious example of this.

Google, Facebook, and Twitter have invested extensively in their code storage systems, only to wind up with a solution that is no different from what is required when using a polyrepo, yet leads to tight coupling and requires a substantial investment in VCS scalability.

The frank reality is that, at scale, how well an organization does with code sharing, collaboration, tight coupling, and so on is a direct result of engineering culture and leadership, and has nothing to do with whether a monorepo or a polyrepo is used.

The two solutions end up looking identical to the developer.

In the face of this, why use a monorepo in the first place? Please don’t!

