Imagining Jujutsu UX in Git
Evolving Git
Git has enabled an explosion in collaborative software development while remaining a thorn in many engineers' sides. I see developers troubleshooting merge conflicts or struggling to juggle branches roughly once per month. Last month, it was me! Git developers are working to evolve the project, including defaulting new repositories to SHA-256, more efficient reference storage via reftables, promisor remotes for excluding large file history by default, and more opinionated user-facing commands. More secure default hash functions and efficient reference storage are important enhancements, but their absence has not been mentioned in any Git complaint I've heard. When I hear people complain about Git, they are complaining about the user experience. New user commands, as planned by Git's development team, could address the user experience dissatisfaction that has persisted for 20 years.
For anyone thinking through improved user-facing commands for the git CLI, Jujutsu is an obvious source of inspiration.
The jj CLI has led the field in providing opinionated commands that its users adore.
Stable change identifiers provide clear relationships across commits.
The jj rolling-commit workflow, which treats the working copy as the single source of truth without any manual staging, removes one of the trees from your mental three-way merge: the base commit tree, the working copy, and the staged index.
For true merge conflicts, automatically-recorded conflict resolutions ensure you don't need to resolve the same conflict twice.
Using jj also lowers your risk of losing work when juggling branches (or bookmarks, in Jujutsu's parlance).
In addition to every change being captured automatically in a commit, without manual staging, all operations are reversible via the operation log.
Thanks to jj undo and jj redo, every user interaction is reversible.
Jujutsu also provides a new sidecar database, the .jj directory, for its own data model.
This data model was not designed to enable more flexible user-facing commands; it's an end unto itself.
Jujutsu's data model was developed by a large organization (Google) to support source control management at the massive scale large organizations need.
A CLI with an improved user experience — jj — was then wrapped around this new, scalable data model.
While I suspect that Jujutsu's data model is more flexible than the vast majority of developers need it to be, the jj user experience is preferred to the git CLI's for many developers.
Thankfully, the jj user experience is largely expressible with Git's data model, and it's possible that this expression can scale to real usage.
We can build a new git CLI experience — a git jj perhaps — that provides all of the jj CLI's beloved user commands without changing Git's data model or adding sidecars.
Thinking through how jj data can be stored directly in the .git directory is, in my opinion, a useful exercise.
Please read this post as a sketch of ideas that may be worth further investigation in the project of improving Git's user experience. Git has been ubiquitous in the software industry for over 10 years; I do not believe any of the ideas in this post are novel or complete. I do believe, though, that revisiting previously proposed ideas periodically has value; this is particularly true when the ideas are relevant to an active search for improved usability tools in Git.
Data in Git
Git, like Jujutsu, separates its data model from its user experience.
The Git user experience is held entirely in the git CLI that we all know, and some love. (For the record, I include myself in the set of developers who love the git CLI!) The underlying data is stored in Merkle trees with compressed data stored as leaves, or blobs.
Each tree is an ordered list of its contents: other trees, and blobs.
Git captures trees in commits, which are nodes in the directed acyclic graph (DAG) that may contain the hashes of parent commits, and always contain the hash of a single tree.
Every branch, or more generically every ref, is an (often human-readable) label that links to a commit.
When any Git object is not linked-to by another object or ref, that object is liable to be deleted in Git's garbage collection pass.
Git requires you to stage changes you make to your working copy.
Those changes are stored in a special binary file: .git/index.
When committing, the changes to commit are pulled directly from the index, not from the working copy.
All objects are stored in the objects directory under .git, and all refs (with the exception of most SCREAMING_SNAKE_CASE files, which are reserved for git) are stored in the refs directory or, once packed, in the packed-refs file.
Other files in .git are for features specific to git that are separate from its data model, such as repository configuration options and the rerere cache.
For a more complete overview, see the Git project documentation.
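The object model described above can be explored with Git's plumbing commands. The following is a minimal sketch, assuming git is installed; the throwaway repository, the file name hello.txt, the branch name demo, and the identity values are placeholders chosen for illustration.

```shell
set -eu
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Creating commits requires an identity; these values are placeholders.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# 1. A blob: file content written to the object store.
blob=$(printf 'hello\n' | git hash-object -w --stdin)

# 2. A tree: one entry (mode, type, object ID, name) pointing at the blob.
tree=$(printf '100644 blob %s\thello.txt\n' "$blob" | git mktree)

# 3. A commit: wraps exactly one tree; this root commit has no parents.
commit=$(git commit-tree "$tree" -m 'first commit')

# 4. A ref: a label that makes the commit, and everything it links to, reachable.
git update-ref refs/heads/demo "$commit"

git cat-file -t "$blob"      # prints: blob
git cat-file -t "$tree"      # prints: tree
git cat-file -t "$commit"    # prints: commit
git cat-file -p "$commit"    # shows the tree line, author, committer, and message
```

Nothing here touches .git/index: the index is a convenience for building trees, not part of the stored history.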
An example of Git's directory structure is shown immediately below.
$ tree .git -L1
.git
├── config
├── description
├── FETCH_HEAD
├── fsmonitor--daemon
├── fsmonitor--daemon.ipc
├── HEAD
├── hooks
├── index
├── info
├── logs
├── MERGE_RR
├── objects
├── ORIG_HEAD
├── packed-refs
├── refs
└── rr-cache
8 directories, 9 files
Data in Jujutsu
The Jujutsu data model differs from Git in ways that are more relevant to massive organizations than they are to individual developers.
Objects, operations, indices, and working copies are all tracked independently in separate databases.
You can run jj git init in any Git repository and inspect the contents of the .jj/ sidecar directory yourself; it will look like the directory structure shown immediately below.
Each type file records the data backend used for that particular data type.
This backend-agnostic design is described more in Jujutsu's architecture documentation.
In practice, just one data backend is supported today (the Git object database) but in theory, users can benefit from data backends that are optimized to each data type's expected usage pattern.
For example: the working copy via squashfs, operations as ephemeral local storage, and objects and indices stored in cloud infrastructure.
This is a pattern that Google in particular, the organization which currently staffs full-time engineers for Jujutsu development, may benefit from.
$ tree .jj -L3
.jj
├── repo
│   ├── config-id
│   ├── index
│   │   ├── changed_paths
│   │   ├── op_links
│   │   ├── segments
│   │   └── type
│   ├── op_heads
│   │   ├── heads
│   │   └── type
│   ├── op_store
│   │   ├── operations
│   │   ├── type
│   │   └── views
│   ├── store
│   │   ├── extra
│   │   ├── git_target
│   │   └── type
│   ├── submodule_store
│   │   └── type
│   └── workspace_store
│       └── index
└── working_copy
    ├── checkout
    ├── tree_state
    └── type
16 directories, 11 files
Jujutsu UX in Git
Jujutsu's data model is more explicit than Git's data model, but I believe Git's data model may be expressive enough to store all of the information that's necessary for a fully featured git jj porcelain that scales.
Every beloved jj feature that I'm aware of is listed below.
- Operation history, with universal undo for the full repository state.
- Rolling commits with persistent change identifiers; staging and committing (with the change identifier preserved) happens automatically on most jj invocations.
- Conflicts stored structurally in the object store; an algebra over these conflicts allows jj to gracefully track complicated conflicts, and defer resolution for conflicted commits which are not checked out to the working copy.
For each jj feature, let's explore how the same functionality could be provided by storing information directly in Git's object store.
Each solution is not necessarily the best implementation for real usage.
In fact, the writing below gets a bit hand-wavy at times.
Still, I believe thinking through these ideas is valuable as the Git project itself evolves to meet the higher usability standard set by projects like Jujutsu.
And there's no need to reinvent the wheel!
We can use well-explored strategies such as metadata refs and indirection in the object graph to express each of the above features directly in Git's data model.
Operation History
Let's imagine a new command: git jj op. (Perhaps for this one, we can just leave off the jj: git op.) How could we capture git commands and their effects?
Git supports reference transaction hooks, and hooks which run after committing, rewriting prior commits, merging, checking out a different working tree, and applying patches.
Each of these hooks can be used to grab information about the recently changed repository state, and store that information somewhere.
But where?
Other Git extensions have stored this information out-of-band.
The git-branchless project stores repository history in a SQLite database.
GitUp similarly uses an on-disk, out-of-band operation representation to provide undo support.
Multiple projects use Git's history directly to represent operations, albeit not Git operations specifically: git-bug, git-appraise, radicle, and gerrit use combinations of CRDTs and JSON schemas to encode information in Git objects.
Storing repository operations directly in Git as a DAG of commits would provide repository state tracking, operation reverts, and diff viewing for free, because Git already implements all three for commits.
In appropriate hooks, snapshots of the repository's refs and config could be taken and written to the object store as tree objects.
Each tree object would represent the repository's state at some instant.
History could be recorded as a DAG of commits, where each commit represents an operation, and each commit's tree represents the output repository state.
A special refs/op ref, exempted from tracking in the hooks, could be used to conveniently view the latest operation, and keep all operation objects reachable to protect them from garbage collection.
If link filemodes are used for the refs stored in each refs/op commit's tree, then every object ever created in the repository would be kept alive.
Git's garbage collection keeps every object that is reachable from a ref; if every ref move is captured this way, no objects would ever be deleted.
To avoid this, the hooks could instead store each object ID as a plain value in the refs/op DAG, which records the ID without making the object reachable.
This design could rely on the reflog to keep recently created objects alive long enough for practical operation log usage.
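The design above can be sketched with plumbing commands. This is an illustration under stated assumptions, not a proposed implementation: record_op is a hypothetical helper that a hook would call, the snapshot covers refs only (not config), and the blob name refs inside each operation's tree is my own choice. Object IDs are written as text inside a blob, so the snapshot does not keep old objects reachable.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: record the current refs as one operation under refs/op.
record_op() {
    # Serialize every ref except refs/op itself as "<object ID> <ref name>".
    snapshot=$(git for-each-ref --format='%(objectname) %(refname)' |
        sed '/ refs\/op$/d' | git hash-object -w --stdin)
    tree=$(printf '100644 blob %s\trefs\n' "$snapshot" | git mktree)
    # Chain onto the previous operation, if there is one.
    if parent=$(git rev-parse -q --verify refs/op); then
        op=$(git commit-tree "$tree" -p "$parent" -m "$1")
    else
        op=$(git commit-tree "$tree" -m "$1")
    fi
    git update-ref refs/op "$op"
}

# Simulate two ref-changing operations on a branch.
empty=$(git mktree </dev/null)
c1=$(git commit-tree "$empty" -m 'one')
git update-ref refs/heads/main "$c1"
record_op 'create main'
c2=$(git commit-tree "$empty" -p "$c1" -m 'two')
git update-ref refs/heads/main "$c2"
record_op 'advance main'

# The operation log is ordinary history, newest first.
git log --format=%s refs/op

# "Undo": read main's previous value out of the prior operation's snapshot.
old=$(git cat-file -p 'refs/op~1:refs' | sed -n 's| refs/heads/main$||p')
git update-ref refs/heads/main "$old"
```

A real porcelain would call something like record_op from the reference-transaction hook rather than by hand, and would record the undo itself as a new operation so that it can be redone.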
There are, of course, plenty of other issues with this design, including accidental git push refs/op invocations that would distribute private edits to a repository.
Git's default refspec would prevent the vast majority of pushes, but explicit git push refs/op invocations would still upload the data to the remote.
While that particular pain point is not different in kind from the refs/jj/keep refs, this is not a rigorously proposed design.
It is, I believe, a proof of existence: we can store repository operations directly in Git, without any sidecar databases.
Rolling Commits
The git CLI uses the add and reset commands to stage and unstage content to the index.
The jj CLI has no such index; instead, the working copy is amended to the active commit automatically during most jj command invocations.
To support this workflow in Git, a new porcelain command is needed.
A server process, optionally detachable, could be used to automatically add working copy changes to the object store, and amend the HEAD reference to the latest commit.
I believe, in this way, the index can be completely bypassed.
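As a rough sketch of that idea, the snippet below folds the working copy into the commit at HEAD using a scratch index file, so the repository's own .git/index is never written. The snapshot function, the rolling-index file name, and the placeholder commit message are all hypothetical; a long-running process would call something like this whenever files change.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: amend HEAD with the current working copy,
# or create the first commit if the branch is still unborn.
snapshot() {
    # Build the tree through a scratch index, leaving .git/index alone.
    scratch=$(git rev-parse --absolute-git-dir)/rolling-index
    rm -f "$scratch"
    GIT_INDEX_FILE=$scratch git add -A
    tree=$(GIT_INDEX_FILE=$scratch git write-tree)
    rm -f "$scratch"

    if head=$(git rev-parse -q --verify HEAD); then
        # Amend: keep HEAD's parents and message, swap in the new tree.
        parent_args=
        for p in $(git rev-list --parents -n 1 "$head" | cut -s -d' ' -f2-); do
            parent_args="$parent_args -p $p"
        done
        # The message is everything after the first blank line of the raw commit.
        # $parent_args is left unquoted on purpose so it splits into words.
        new=$(git cat-file commit "$head" | sed '1,/^$/d' |
            git commit-tree "$tree" $parent_args)
    else
        new=$(git commit-tree "$tree" -m '(no description set)')
    fi
    git update-ref HEAD "$new"
}

echo 'draft' > notes.txt
snapshot                        # first rolling commit
first=$(git rev-parse HEAD)
echo 'more' >> notes.txt
snapshot                        # same slot in history, amended with new content

git show HEAD:notes.txt
git rev-list --count HEAD       # prints: 1
```

One consequence of this sketch is that the real index drifts out of date relative to HEAD, so plain git status would report confusing results; a real porcelain would have to refresh the index or ignore it consistently.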
To replicate Jujutsu change IDs, a Gerrit strategy could be used.
Unique identifiers for each rolling commit could be automatically amended to the commit's message.
Alternatively, a metadata ref could be used, à la git notes.
This metadata ref could be refs/notes/commits, but only one note per object is allowed; by adding change ID content to refs/notes/commits, we would be clobbering other data written there.
Instead, if the metadata strategy is used, change ID notes could be kept in a new ref: refs/notes/changeid.
Notes are excluded from Git's default refspec, so most users would never push refs/notes/changeid by accident; Git hooks could block the push for users who name the ref explicitly.
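The notes-based option can be tried with stock git notes. In this sketch the change ID format is an arbitrary placeholder (a truncated hash), not Jujutsu's or Gerrit's scheme, and the "amend" is simulated by creating a second commit object by hand.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

empty=$(git mktree </dev/null)
old=$(git commit-tree "$empty" -m 'draft')

# Attach a change ID to the commit in a dedicated notes ref
# (--ref=changeid expands to refs/notes/changeid).
change_id=$(printf '%s' "$old" | git hash-object --stdin | cut -c1-12)
git notes --ref=changeid add -m "$change_id" "$old"

# Simulate an amend: a new object ID for the same logical change.
new=$(git commit-tree "$empty" -m 'draft, reworded')

# Carry the note over so the change ID survives the rewrite.
git notes --ref=changeid copy "$old" "$new"

git notes --ref=changeid show "$new"    # prints the same change ID
```

Git can also copy notes automatically during git commit --amend and git rebase when the notes.rewriteRef configuration names the ref, which may remove the need for the explicit copy step.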
Care must be taken to ensure this data is not corrupted by other Git operations.
Still, if such a porcelain existed, I believe there's a real chance it could reach parity with the jj CLI in commit workflows.
Gerrit has maintained change IDs via hooks and commit trailers, and Git notes are largely underutilized.
Perhaps we can extract more utility from these patterns to improve Git's user experience.
Deferred Conflict Resolution
There's nothing stopping you from adding and committing unresolved conflicts in Git! Unfortunately, if you refuse to resolve conflicts, your software will likely refuse to build. Conflicts in Git are presented as special markers in file content. Jujutsu improves on Git's conflict markers by only creating them when materializing objects to your working copy. In the object store, Jujutsu adds extra tree content to conflicted commits, and uses an algebra over the structured tree content to handle complicated conflicts (e.g. nested conflicts) gracefully.
Jujutsu specifies the schema for the conflict trees, and the jj CLI provides the implementation to process and materialize conflicts into the working copy correctly.
To provide the same functionality in Git, a new porcelain command (perhaps the same porcelain that provides rolling commits) would have to implement similar logic to process conflicted commits in the DAG, then materialize them to the working copy with markers.
We could add an extra tree to conflicted commits, just as Jujutsu does, but then we'd be using unsupported commit formats.
The jj CLI protects against pushing these invalid objects to forges in its jj git push command.
We could do the same, requiring users to stop using git push, and instead use git jj push.
Thankfully, I don't think it's necessary to constrain users with a new push command.
We could instead store a conflicted commit's conflict trees (base, ours, and theirs) in a metadata ref à la git notes.
Unlike git notes, this ref (perhaps refs/conflicts/commits) could map commit object IDs to tree IDs.
I've developed one such tool that provides this functionality: git-metadata.
Of course, this ref should also be kept local.
Thankfully, Git's default configuration only fetches refs/heads/* (along with tags that point into the fetched history) and only pushes the current branch.
Git hooks could be used to avoid pushing refs/conflicts/* if users explicitly push it.
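A pre-push hook is one way to enforce this. The sketch below shows the hook's core logic as shell functions; the list of protected refs is an assumption assembled from the ref names used in this post, and the function names are hypothetical. Git passes the refs being pushed on the hook's standard input, one per line.

```shell
# Hypothetical list of local-only refs, taken from the names used in this post.
is_local_only() {
    case "$1" in
        refs/op|refs/op/*|refs/conflicts/*|refs/notes/changeid) return 0 ;;
        *) return 1 ;;
    esac
}

# Core of a pre-push hook. Git writes one line per ref to the hook's stdin:
#   <local ref> <local object ID> <remote ref> <remote object ID>
check_push() {
    while read -r local_ref local_oid remote_ref remote_oid; do
        if is_local_only "$local_ref" || is_local_only "$remote_ref"; then
            echo "refusing to push local-only ref: $local_ref" >&2
            return 1
        fi
    done
    return 0
}

# Simulate the input Git would send for two different pushes.
zero=0000000000000000000000000000000000000000
printf 'refs/heads/main %s refs/heads/main %s\n' "$zero" "$zero" |
    check_push && echo 'main: allowed'
printf 'refs/conflicts/commits %s refs/conflicts/commits %s\n' "$zero" "$zero" |
    check_push || echo 'conflicts: blocked'
```

Installed as .git/hooks/pre-push with a final line of check_push || exit 1, a non-zero exit would abort the push before anything is sent.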
Imagine a conflicted commit abcdef, which sits in the middle of a long branch of commits that was recently rebased over main.
A descendant commit, say the tip of the branch, deletes all the conflicted code.
The tip of the branch is fine, and you can keep working!
Still, when it comes time to try to merge the branch into main, perhaps your repository requires all commits reachable from the tip of main to pass all tests, and you're blocked until you resolve all conflicts.
If the conflicted trees are stored in a refs/conflicts/commits ref, the conflicted commit IDs could be stored as top-level entries, each pointing to a tree with three entries (base, ours, and theirs) that point to their respective trees.
The conflicted blobs can be found with a cheap tree traversal (just as they are in Jujutsu), and resolved without checking out the conflicted commit directly via git jj edit.
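To make the layout concrete, here is a sketch that builds such a ref by hand with git mktree. The three "sides" are tiny stand-in trees holding a single file.txt, and the conflicted commit is fabricated for the demonstration; only the shape of the metadata ref is the point.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins for the three sides of a conflict: each is a tree with one file.
mk_side() {
    blob=$(printf '%s\n' "$1" | git hash-object -w --stdin)
    printf '100644 blob %s\tfile.txt\n' "$blob" | git mktree
}
base=$(mk_side base)
ours=$(mk_side ours)
theirs=$(mk_side theirs)
conflicted=$(git commit-tree "$ours" -m 'conflicted commit')

# One subtree per conflicted commit, holding the three sides by name.
sides=$(printf '040000 tree %s\tbase\n040000 tree %s\tours\n040000 tree %s\ttheirs\n' \
    "$base" "$ours" "$theirs" | git mktree)

# Top-level tree: the entry name is the conflicted commit's object ID.
root=$(printf '040000 tree %s\t%s\n' "$sides" "$conflicted" | git mktree)
meta=$(git commit-tree "$root" -m "record conflict for $conflicted")
git update-ref refs/conflicts/commits "$meta"

# Look up the sides without checking anything out.
git ls-tree "refs/conflicts/commits:$conflicted"
git cat-file -p "refs/conflicts/commits:$conflicted/theirs/file.txt"   # prints: theirs
```

Because the sides are referenced as tree entries, they stay reachable from refs/conflicts/commits and are protected from garbage collection for as long as the entry exists.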
An example of such a ref is shown below.
refs/conflicts/commits: commit a0b1c2d3
└── tree 01234567
    └── abcdef
        ├── base
        ├── ours
        └── theirs
So far, we've discussed where conflict information is stored.
How do we actually store that information?
When Git's default merge driver detects a conflict, the merge halts.
This is seen frequently by developers when their cherry-pick, rebase, and merge commands pause and ask for manual resolutions before continuing with the --continue flag.
To support merging and rebasing without pausing to resolve conflicts, a custom merge driver could be used to save the conflict source information in the metadata ref, and then continue on without error.
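Git's gitattributes mechanism already allows custom merge drivers, so the idea can be prototyped. The sketch below makes several assumptions: the driver name defer, the script location, and the .git/deferred-conflicts file are all hypothetical, and the conflict sources are parked in a flat file rather than a metadata ref because the merge commit's ID is not yet known while the driver runs. The committed file still contains conflict markers, which is cruder than Jujutsu's structured storage.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical driver: $1 = base (%O), $2 = ours (%A, also receives the result),
# $3 = theirs (%B), $4 = path in the repository (%P).
driver=$repo/.git/defer-driver.sh
cat > "$driver" <<'EOF'
#!/bin/sh
ours_oid=$(git hash-object -w --no-filters "$2")
if git merge-file "$2" "$1" "$3"; then
    exit 0    # merged cleanly, nothing to record
fi
# Conflict: save the three sides as blobs, note their IDs, and report success
# so the merge keeps going instead of stopping.
base_oid=$(git hash-object -w --no-filters "$1")
theirs_oid=$(git hash-object -w --no-filters "$3")
printf '%s %s %s %s\n' "$4" "$base_oid" "$ours_oid" "$theirs_oid" \
    >> "$(git rev-parse --git-dir)/deferred-conflicts"
exit 0
EOF

git config merge.defer.name 'record conflicts and keep going'
git config merge.defer.driver "sh '$driver' %O %A %B %P"
echo '* merge=defer' > .gitattributes

# Create two branches that change the same line.
printf 'one\n' > file.txt
git add . && git commit -q -m base
trunk=$(git symbolic-ref --short HEAD)
git checkout -q -b topic
printf 'topic\n' > file.txt && git commit -q -am topic
git checkout -q "$trunk"
printf 'main\n' > file.txt && git commit -q -am main

# This merge would normally stop and ask for a manual resolution.
git merge -q --no-edit topic
cat .git/deferred-conflicts
```

A fuller version would move the recorded entries into a ref such as refs/conflicts/commits after the merge commit is created, for example from a post-merge hook.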
This feature, deferred resolution for conflicted commits, is the feature I'm least confident about implementing in Git's object store directly, and perhaps this proposal would be a regression in user experience for most Git users. Regardless, I'm not convinced that better merge semantics can't be added to Git. The payoff for considering this design, and others like it, is more user-friendly conflict resolution procedures while preserving a single data model.
Moving Forward
Many Git users are searching for an improved user experience, and are willing to pay the cost of a second data model.
Jujutsu provides exactly this experience out of the box.
Due to the ubiquity of Git, I believe it's worth considering how user experience enhancements found in sidecar databases, like .jj/, can instead be stored directly in Git's object store.
If there's even a chance of providing all jj features directly in git, the considerations will have been worth it.
I believe there's a real chance that the most beloved features in the jj CLI — operation traversal, rolling commits sans staging, and more robust conflict storage — can be provided by the git CLI at scale.
Of course, this post presents no proof.
More work needs to be done to prove or disprove the claim.
By continuing to study past Git extensions, as cited throughout the prior sections, we may be able to find a solution that improves Git for everyone.