All the version control you need
“The limits of my language mean the limits of my world.” - Ludwig Wittgenstein
Git is great for building open-source software – its model of version control gels wonderfully with the needs of open-source maintainers and contributors. Unfortunately, the same cannot be said for teams building closed-source software. These teams have different needs for their version control and different capabilities for working, both technically and interpersonally.
In fact, Git is typically a productivity inhibitor for closed-source teams. Workflows that should be simple require extra steps, and some productive ways of working are impossible to implement. To make matters worse, most developers are so used to the Git ‘way of working’ that it is difficult for them to see or even conceptualize the opportunities they’re missing.
Just as a version control system like Git can guide users in how to build software, different software methodologies can dictate what version control operations are necessary (and which are superfluous).
Let’s imagine a typical software team building a closed-source web service. They use small PRs, fast code reviews, and automated testing to deliver incremental changes quickly. They want a linear history of small changes to enable continuous integration and continuous delivery, along with easy reverts of bad changes.
What operations are necessary to create a code change on this team? Developers must git clone (or git pull) the repo, git branch, make file edits, git add, git commit, git push, and then create a pull request (if using GitHub). Often developers will add and commit many times in the process of making one pull request to capture intermediate states of their work.
Branching is largely a formality in this flow, but it creates significant friction. If the goal is to write one small, isolated change and merge it into trunk, the branch exists primarily to fit into Git’s (and GitHub’s) workflow. The branch itself provides no useful information or context beyond what the PR description already contains.
Similarly, any commit messages the developer has written are wasted effort. They provide no value beyond saving intermediate states during development[0], something that could (and should) be handled automatically by the version control system. Instead, Git requires three manual steps (add, write commit message, commit) to capture an intermediate PR state.
The commits themselves will be squashed away when the PR is merged; they are an ephemeral byproduct of the PR’s early stages rather than a durable artifact. Commit messages often reflect this: the messages are contentless (e.g. fix typo, make it work, make it work for real) and intended for the author’s consumption during implementation rather than the reviewer’s consumption during review.
This is how Git implicitly encourages suboptimal workflows: Git’s core constructs (commits and branches) are both ephemeral overhead for a team doing closed-source, trunk-based development. The actual atomic unit of work, the PR, is not even represented in Git’s model.
What would a better workflow look like?
No branches or commits, only PRs
The minimum viable version control workflow looks like this:
- Obtain some environment where you can build on the latest revision of the code repository. This could be locally on your machine or remotely on a VM. It could download the entire repository (as Git does) or use a virtual file system.
- Modify one or more files
- Create a change set for review
By eliminating the superfluous and redundant steps from the Git workflow, we go from seven steps to three. The ratio becomes even more lopsided if modifications are needed in review, since updating a change in review can be a single operation rather than three (add, commit, push).
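To make the three-step flow concrete, here is a minimal Python sketch under stated assumptions. A repository revision is modeled as an in-memory mapping from path to contents, and the `Workspace` class and its `edit` and `changeset` methods are hypothetical names, not any existing tool's API. The point it illustrates is that the change set can be derived directly from the workspace, with no staging or commit step in between.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical model: a trunk revision is just a mapping of path -> file contents.
Files = Dict[str, str]


@dataclass
class Workspace:
    base: Files                       # trunk revision the workspace started from
    files: Files = field(init=False)  # current working state

    def __post_init__(self) -> None:
        # Step 1: obtain an environment at the latest revision (here, a private copy).
        self.files = dict(self.base)

    def edit(self, path: str, contents: str) -> None:
        # Step 2: modify files directly; there is no add or commit.
        self.files[path] = contents

    def changeset(self) -> Files:
        # Step 3: the reviewable change set is the diff against the base revision.
        return {p: c for p, c in self.files.items() if self.base.get(p) != c}


trunk = {"app.py": "print('v1')\n", "README": "docs\n"}
ws = Workspace(base=trunk)
ws.edit("app.py", "print('v2')\n")
print(ws.changeset())
```

File deletions are left out to keep the sketch short. Editing a file back to its original contents drops it from the change set automatically.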
No branches
Because every reviewed change will be appended linearly onto the repository’s trunk, the concept of a branch is useless in typical development and can be abstracted away. Although development is always done ‘off of’ some revision in the trunk’s history, the developer experience can feel akin to working on the trunk directly.
The VCS must provide a command to ‘rebase’ any changes in progress onto the latest revision of the trunk (or a past revision, if desired) and require this operation before merging if the edited files have new upstream changes. However, this at most requires resolving merge conflicts in the edited files; there is never a need for reordering commits or inspecting the ‘tree’ of history. As a result, this operation is typically quick and easy[1].
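The claim that a sync only ever involves the edited files can be sketched as a set intersection. This is a hypothetical Python model that assumes each trunk revision records the set of paths it modified. `Revision` and `files_needing_sync` are illustrative names. Only the returned paths could need conflict resolution, and an empty result means no sync is required before merging.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Revision:
    number: int              # position in the linear trunk history
    touched: FrozenSet[str]  # paths modified by this revision (assumed to be recorded)


def files_needing_sync(trunk: List[Revision], base: int, edited: Set[str]) -> Set[str]:
    """Return the edited paths that changed upstream after revision `base`."""
    upstream: Set[str] = set()
    for rev in trunk:
        if rev.number > base:
            upstream |= rev.touched
    # Only files modified on both sides can conflict; no history reordering is involved.
    return edited & upstream


trunk = [
    Revision(1, frozenset({"a.py"})),
    Revision(2, frozenset({"b.py"})),
    Revision(3, frozenset({"a.py", "c.py"})),
]
print(files_needing_sync(trunk, base=1, edited={"a.py", "d.py"}))
```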
No commits
When PRs are small and frequent, the utility of intermediate checkpoints (i.e. commits) diminishes significantly. Since commits will be squashed (and their messages lost) during a merge, requiring developers to write a message in order to save their state is pure overhead. Because of Git’s staging area, saving state in Git is a three-step process: add, write message, commit.
Instead of manual commits, it suffices for the VCS to take automatic snapshots of files whenever they are saved. Developers can then ‘rewind’ files to a previous state whenever needed without any manual action (and work is never lost by forgetting to commit).
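Below is a minimal Python sketch of save-triggered snapshots. It assumes the editor or file system calls a hook on every save. `SnapshotStore`, `on_save`, `list_snapshots`, and `revert` are hypothetical names. Rewinding is modeled non-destructively: restoring an old state records it as a new snapshot instead of truncating the history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SnapshotStore:
    # path -> every saved state of that file, oldest first
    history: Dict[str, List[str]] = field(default_factory=dict)

    def on_save(self, path: str, contents: str) -> None:
        # Assumed to be called automatically on every save; no add, message, or commit.
        self.history.setdefault(path, []).append(contents)

    def list_snapshots(self, path: str) -> List[Tuple[int, str]]:
        return list(enumerate(self.history.get(path, [])))

    def revert(self, path: str, index: int) -> str:
        # Restore an earlier state by recording it as a *new* snapshot,
        # so the intermediate states remain available afterwards.
        contents = self.history[path][index]
        self.on_save(path, contents)
        return contents


store = SnapshotStore()
store.on_save("app.py", "v1")
store.on_save("app.py", "v2 (broken)")
store.revert("app.py", 0)
print(store.list_snapshots("app.py"))
```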
Only PRs
PRs are the atomic unit of work because they form the steps in the linear revision history of the repository. PRs must pass tests before they can be merged, ensuring that the latest revision (which all developers build off of) remains green. PRs must (usually) be small, allowing them to be written, reviewed, tested, merged, and (when necessary) reverted quickly.
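The properties in this paragraph (a linear history, a test gate before merging, and cheap reverts) can be modeled in a few lines. This Python sketch makes several assumptions. Each revision is a full snapshot of the repository stored as a path-to-contents mapping. Tests are supplied as a callback. A reverted revision's files are assumed not to have been modified again later. `Trunk`, `submit`, and `revert` are hypothetical names. A revert is simply another change appended to the line, not a rewrite of history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Files = Dict[str, str]


@dataclass
class Trunk:
    # revisions[0] is the empty repository; each later entry is a full snapshot.
    revisions: List[Files] = field(default_factory=lambda: [{}])

    @property
    def head(self) -> Files:
        return self.revisions[-1]

    def submit(self, change: Files, tests_pass: Callable[[Files], bool]) -> int:
        candidate = {**self.head, **change}
        if not tests_pass(candidate):
            # The change is rejected, so the revision everyone builds on stays green.
            raise ValueError("tests failed; change set not merged")
        self.revisions.append(candidate)
        return len(self.revisions) - 1

    def revert(self, number: int) -> int:
        if not 1 <= number < len(self.revisions):
            raise ValueError(f"no such revision: {number}")
        before, after = self.revisions[number - 1], self.revisions[number]
        restored = dict(self.head)
        # Assumes later revisions did not touch the same files.
        for path in set(before) | set(after):
            if before.get(path) != after.get(path):
                if path in before:
                    restored[path] = before[path]  # put back the old contents
                else:
                    restored.pop(path, None)       # the file was added by `number`
        # Reverting appends a new revision; history stays linear.
        self.revisions.append(restored)
        return len(self.revisions) - 1


def always_green(files: Files) -> bool:
    return True  # hypothetical stand-in for a real test suite


trunk = Trunk()
trunk.submit({"a.py": "1"}, always_green)
bad = trunk.submit({"b.py": "x"}, always_green)
trunk.revert(bad)
print(trunk.head)
```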
PR review is the boundary that separates an individual’s work from the team’s responsibility. PR review is the process by which ‘my’ code or ‘your’ code becomes our code. PRs are all there is.
Minimum viable version control
The following commands encompass the functionality I’ve discussed (try to ignore any Git-specific meaning of reused terms and imagine them conceptually):
- workspace: Acquire a named environment where code changes can be made and tracked.
- changeset: Create a reviewable set of changes (or, by option, add changes to an existing change set).
- patch: Copy a WIP or submitted change set from another workspace into your workspace.
- push: Update the remote change set with whatever modifications are present.
- sync: Update the environment to use the latest version of all files in the repository, triggering conflict resolution where necessary.
- snapshots list: List the automatically captured previous states for a change set or individual file[2].
- snapshots revert: Revert a file or files to a previously captured state. This is non-destructive; on save, a new snapshot is created rather than ‘rewinding’ the history.
Those are the commands needed to create and iterate on change sets. Although they are fundamentally affordances of the VCS, most can be used more efficiently through an interface or IDE integration. The same is true for functionality related to the revision history: viewing the linear revision history of a file or directory (history) and viewing its blame annotations (blame) are tasks best accomplished in a dedicated UI.
A VCS built around change sets (unlike Git) would naturally afford managing a change set through the review cycle: sending it for review (send), reviewing it (review), and merging it (submit). Pragmatically, these operations are also well suited to a standalone review UI.
Other considerations
Stacking
Even for a very focused team, there is a limit to the reasonable expectation of review speed. Because of this, developers will often find it useful to ‘stack’ changes by starting a follow-up PR on top of their previous work while it is in review. There are several ways to ease this process, mostly around the resolution of later revisions to the upstream PR. Ultimately, this functionality is a layer on top of the primitives described above and can be integrated into the VCS, a meta-tool, the review tool, or all three.
However, it is important to note that stacking PRs is a workflow regression and must not be abused. Used in excess, stacked PRs cause many of the same issues as large PRs. In my experience, three stacked PRs is roughly the breakeven point. Frequent stacking beyond this is an indicator of structural issues around review speed and distribution.
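A stacked PR is one whose base is another PR still in review. A stack is therefore a chain of parent links, and its depth is easy to measure. The Python sketch below assumes a hypothetical `parents` mapping from each PR to the PR it was started on. The default limit of three mirrors the rule of thumb above and counts the bottom PR as part of the stack. That counting is an interpretation, not a fixed rule.

```python
from typing import Dict, Optional, Set


def stack_depth(parents: Dict[str, Optional[str]], pr: str) -> int:
    """Count the PRs in the stack ending at `pr`, including `pr` itself.

    `parents` maps each PR to the in-review PR it was started on top of,
    or None if it is based directly on the trunk.
    """
    depth = 0
    seen: Set[str] = set()
    current: Optional[str] = pr
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in stack at {current}")
        seen.add(current)
        depth += 1
        current = parents[current]
    return depth


def stack_too_deep(parents: Dict[str, Optional[str]], pr: str, limit: int = 3) -> bool:
    # Beyond the limit, stacking is assumed to stop paying for itself.
    return stack_depth(parents, pr) > limit


parents = {"A": None, "B": "A", "C": "B", "D": "C"}
print(stack_depth(parents, "D"), stack_too_deep(parents, "D"))
```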
Release branches
Some teams have an unavoidable need for long-lived release branches due to technological or business constraints. For them, continuous delivery and a truly branchless model are infeasible, even if they otherwise practice trunk-based development.
[0] Squash on merge is the best option for teams using Git with the model discussed here. Because the reviewed change set is the atomic unit of work, commit messages are irrelevant and can be discarded on merge. Other merge options that preserve commits work poorly, since commits are often incomplete work states with broken or failing tests and unhelpful messages that clog the repository history.
[1] Qualitative improvements in the version control workflow are harder to convey than quantitative ones, but they are no less real. ‘Merge hell’ and ‘rebase hell’ are both primarily consequences of the limitations inherent in Git’s model. Different tools and workflows, while not feasible for all teams, can largely eliminate them.
[2] It’s reasonable to also have a ‘name latest snapshot’ convenience feature, but it’s peripheral enough that I’m excluding it from the main list.