jd:/dev/blog_
· 6 min read

How Entire Works Under the Hood

I dug into Entire's open source Checkpoints CLI. It's a clever abuse of git internals — shadow branches, orphan metadata, and a session state machine. Here's how it works.

In part 1, I covered why Entire raised $60M and what problem they’re solving. Now let’s look at the actual code.

I pointed Claude Code at Entire’s open source CLI and asked it to explain how things work. The architecture is more interesting than I expected — they’ve essentially built a session-aware metadata layer on top of git using nothing but git’s own primitives.

The Entire CLI repository on GitHub

The Big Picture

Entire hooks into two things: your AI agent (Claude Code, Gemini CLI) and git itself. The agent hooks capture what’s happening during a session. The git hooks capture what the developer commits.

Agent hooks (Claude Code)         Git hooks
  SessionStart                     prepare-commit-msg
  UserPromptSubmit                 post-commit
  Stop                             pre-push
  PreToolUse / PostToolUse
         │                              │
         └──────────┬───────────────────┘

            ┌───────▼────────┐
            │   Strategy     │
            │                │
            │ SaveChanges()  │
            │ Rewind()       │
            │ Condense()     │
            └───────┬────────┘

         ┌──────────┴──────────┐
         │                     │
    Shadow branches      Metadata branch
    (local, temp)        (shared, permanent)
    entire/<hash>        entire/checkpoints/v1

How Agent Hooks Get Installed

Running entire enable writes hook entries into .claude/settings.json. Seven hooks, covering the full session lifecycle:

  • SessionStart/SessionEnd — track session boundaries
  • UserPromptSubmit — fires before the agent starts working (captures human edits)
  • Stop — fires after the agent finishes a turn (triggers checkpoint save)
  • PreToolUse/PostToolUse[Task] — track subagent spawning
  • PostToolUse[TodoWrite] — capture task state

Each hook is just a shell command: entire hooks claude-code stop. The CLI parses the agent’s transcript to extract everything it needs.

The Transcript Is the Source of Truth

This is the key insight. When the Stop hook fires, Claude Code passes two things via stdin: a session_id and a transcript_path. That transcript — the JSONL file where Claude logs every message, tool call, and response — is the single source of truth.

The CLI mines it for:

  • Modified files — scans for tool_use blocks where the tool is Write, Edit, etc., and extracts the file_path
  • User prompts — finds type: "user" entries
  • Token usage — sums input_tokens, output_tokens from response metadata
  • Summary — grabs the last assistant message

No magic, no APIs. It just reads the same JSONL file that Claude Code writes to disk.

Shadow Branches: Snapshots Without Commits

Here’s where it gets clever. When the agent finishes a turn, Entire needs to save a snapshot of the working tree. But it can’t commit to your branch — that would mess up your history.

So it creates shadow branches: refs like entire/2b4c177-a5e3f2 that live in your local repo but never touch your working branch.

The name encodes two things:

  • 2b4c177 — first 7 chars of HEAD when the session started
  • a5e3f2 — hash of the worktree ID (to support git worktree)

The snapshot is built entirely in memory using go-git’s plumbing APIs:

  1. Take HEAD’s tree (the full repo structure)
  2. Apply the agent’s changes (add/remove/modify blobs)
  3. Attach the metadata directory (.entire/metadata/<session-id>/)
  4. Create a commit on the shadow branch

No checkout, no stash, no visible side effects. The user and agent don’t even know it happened.

Deduplication is automatic: if the tree hash matches the previous checkpoint, it skips the commit. Git’s content-addressable storage means identical files share blobs across checkpoints.

The Condensation Model

The entire/checkpoints/v1 orphan branch stores all metadata

Shadow branches are local scratch space. The real metadata lives on entire/checkpoints/v1 — an orphan branch (no common ancestor with your code) that’s pushed alongside your regular branches.

The flow:

  1. Agent works → checkpoints saved on shadow branch (local)
  2. You commit → post-commit hook fires
  3. prepare-commit-msg adds a trailer: Entire-Checkpoint: a3b2c4d5e6f7
  4. Shadow branch data gets condensed — copied into the metadata branch
  5. Shadow branch gets cleaned up

The checkpoint ID (a3b2c4d5e6f7) is 6 random bytes, not a git SHA. It’s sharded into a directory path on the metadata branch:

entire/checkpoints/v1  (orphan branch)
└── a3/b2c4d5e6f7/
    ├── metadata.json          # summary, attribution, token usage
    ├── 0/
    │   ├── full.jsonl         # complete session transcript
    │   ├── prompt.txt         # user prompts
    │   └── context.md         # generated context
    └── 1/                     # additional sessions if any

That one-line trailer in your commit — Entire-Checkpoint: a3b2c4d5e6f7 — is the bidirectional link. From the commit you find metadata via the sharded path. From the metadata you find the commit by searching for the trailer.

Attribution: Who Wrote What?

This is the piece that matters for engineering leads. Entire tracks line-level code attribution: what percentage was agent-written vs. human-written.

The trick is the UserPromptSubmit hook. Every time you type a new prompt — before the agent starts working — the CLI snapshots the worktree diff against the last checkpoint. This captures exactly what you changed between agent turns.

By commit time, it has:

  • Agent lines: changes from the last checkpoint’s tree
  • Human added: lines you added between prompts
  • Human modified: lines you edited in agent-written code
  • Agent percentage: the ratio

The result is stored in initial_attribution in the metadata:

{
  "agent_lines": 150,
  "human_added": 25,
  "human_modified": 10,
  "agent_percentage": 85.7
}

It even uses a LIFO heuristic for self-modifications — if you add lines then remove lines from the same file, it assumes you’re removing your own first, not penalizing the agent’s contribution.

Multi-Developer: Conflict-Free by Design

The metadata branch gets pushed during git push (via the pre-push hook). Multiple developers push to the same entire/checkpoints/v1 branch.

This works because checkpoint IDs are random — two developers will essentially never produce the same 12-hex-char ID. Merging is just a tree union: flatten both trees, combine entries, done. No merge conflicts possible.

If a normal push fails (non-fast-forward), the CLI fetches the remote, merges trees, creates a merge commit, and retries.

What’s Missing

The architecture is solid engineering, but a few things stood out:

Transcript privacy. Session transcripts (full agent conversations) get pushed to a branch anyone with repo access can read. For private repos, maybe fine. For orgs with varying access levels — that’s a problem.

Squash merges break links. If a PR with 5 commits (each with Entire-Checkpoint trailers) gets squash-merged, those trailers disappear. The metadata exists but the bidirectional link from the merged commit is broken.

The metadata branch grows forever. Every session from every developer, including abandoned PRs and throwaway experiments, accumulates on entire/checkpoints/v1. There’s an entire clean command for local shadow branches, but no retention policy for the permanent metadata. For a large team over months, that’ll bloat.

No PR linkage. The branch name is stored, but there’s no PR number or URL. You can’t easily ask “show me all sessions related to PR #42.”

The Smart Parts

What I genuinely admire:

Git as a free database. Shadow branches store full repo snapshots, but git’s content-addressable storage means only changed blobs cost anything. You get atomic snapshots, deduplication, and transport for free.

In-memory tree building. Checkpoints are created through go-git plumbing APIs — no worktree checkout, no stash, nothing visible. Zero disruption to the developer’s flow.

Attribution at prompt boundaries. Capturing human edits before the agent contaminates the worktree is the cleanest measurement point possible.

Shadow branch migration. If you rebase or pull (HEAD changes), the shadow branch name automatically updates. Your session continues seamlessly. This handles a common workflow that would otherwise silently break.

So What?

Entire doesn’t solve a burning problem today. Most of us are fine with agent-written code landing in our repos without detailed provenance. But the trajectory is clear: as agents write more code, the audit trail becomes essential.

The approach of storing session context alongside code in git — rather than in a separate system — is the right architectural bet. Git is already where your code lives, where your CI runs, where your reviews happen. Adding a metadata layer inside git itself (instead of a SaaS dashboard somewhere) means the context travels with the code.

Whether Entire is the company that turns this into a platform worth $300M is above my pay grade. But the engineering is genuine, the problem is real, and the timing feels right.

I’ll be watching.

share:

Related posts