Core Concepts
Understanding a few key ideas will help you get the most out of Semantica.
Checkpoints
A checkpoint is a snapshot of your repository at a specific point in time. It records:
- A full file manifest (every tracked file and its SHA-256 hash)
- The commit hash it was created at (if commit-linked)
- The kind of checkpoint: auto (created by a commit hook), manual (created with semantica checkpoint), or safety (created automatically before a rewind)
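To make the manifest idea concrete, here is a minimal sketch of how a file manifest of SHA-256 hashes could be built and compared. It is not Semantica's implementation; the function names (sha256_of_file, build_manifest, changed_paths) and the dictionary layout are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are never loaded whole into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, tracked: list[str]) -> dict[str, str]:
    """Map each tracked path (relative to root) to its SHA-256 hex digest.

    Hypothetical layout: the real manifest format is not documented here.
    """
    return {rel: sha256_of_file(root / rel) for rel in sorted(tracked)}


def changed_paths(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Paths that were added, removed, or modified between two manifests."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Because a manifest is just path-to-hash pairs, comparing two checkpoints reduces to comparing two dictionaries, which is one plausible reason a hash-based manifest suits features like rewind.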
Checkpoints are the foundation of everything in Semantica. Attribution, rewind, and explain all operate on checkpoints.
Every commit automatically creates a checkpoint. You can also create them manually at any time:
semantica checkpoint -m "before big refactor"
Checkpoints are stored locally in .semantica/ and never written to Git history.
Checkpoint IDs
Checkpoint IDs look like chk_abc123def456. They are prefix-matchable, so you only need to type enough characters to be unique in your repo:
semantica show chk_abc # works as long as it's unambiguous
semantica rewind abc # same prefix matching applies
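The prefix matching described above can be sketched as a small lookup. This is an illustration only: resolve_checkpoint is a hypothetical helper, and treating the chk_ tag as optional is an assumption inferred from the two examples, not documented behavior.

```python
def resolve_checkpoint(prefix: str, known_ids: list[str]) -> str:
    """Return the single checkpoint ID that starts with prefix.

    Raises ValueError when the prefix matches nothing or more than one ID.
    """
    # Assumption: the "chk_" tag may be omitted, since both "chk_abc" and
    # "abc" appear as valid inputs in the examples above.
    needle = prefix if prefix.startswith("chk_") else "chk_" + prefix
    matches = [cid for cid in known_ids if cid.startswith(needle)]
    if not matches:
        raise ValueError(f"no checkpoint matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous prefix {prefix!r}: {len(matches)} matches")
    return matches[0]
```

Rejecting an ambiguous prefix instead of picking the first match is the safer choice for a destructive command such as rewind.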
Sessions
A session is a single AI agent conversation: one Claude Code chat, one Cursor composer thread, or one Kiro IDE session. Semantica reads session data passively from each provider's local logs after a commit and links the sessions to the checkpoint created by that commit.
Sessions contain events: user messages, assistant responses, and tool calls (file reads, writes, shell commands). Semantica uses these events to compute attribution.
View sessions for your repo:
semantica sessions
View the full transcript of a session:
semantica sessions <session_id> --transcript
Attribution
Attribution is Semantica's answer to "how much of this commit was AI-generated?"
After a commit, Semantica diffs the changed files against the AI session output captured for that checkpoint. It uses a three-tier matching system:
| Tier | Description |
|---|---|
| Exact | Line matches AI output verbatim |
| Modified | Line is a lightly edited version of AI output |
| Formatted | Line is AI output with whitespace or formatting changes |
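One way to picture the three tiers is as a cascade of progressively looser comparisons. The sketch below is an assumption about how such a classifier might look, not Semantica's actual matcher: the similarity measure (difflib's ratio), the 0.8 threshold, and the order of checks are all illustrative choices.

```python
import difflib
from typing import Optional


def _squash(line: str) -> str:
    """Drop all whitespace so formatting-only differences disappear."""
    return "".join(line.split())


def classify_line(line: str, ai_lines: list[str], threshold: float = 0.8) -> Optional[str]:
    """Return "exact", "formatted", "modified", or None for a committed line."""
    # Tier 1: the line appears verbatim in the AI output.
    if line in ai_lines:
        return "exact"
    # Tier 2: identical once whitespace is ignored.
    squashed = _squash(line)
    if any(squashed == _squash(candidate) for candidate in ai_lines):
        return "formatted"
    # Tier 3: close enough to some AI line to count as a light edit.
    # The threshold is a hypothetical value, not a documented one.
    for candidate in ai_lines:
        if difflib.SequenceMatcher(None, line, candidate).ratio() >= threshold:
            return "modified"
    return None
```

A line that falls through all three checks would count as human-written in this sketch.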
The result is a percentage and a per-file breakdown:
HEAD (abc1234) add authentication module
AI attribution: 78% (31/40 lines)
auth/login.go 91% (20/22 lines) claude_code
auth/middleware.go 61% (11/18 lines) claude_code
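The numbers in the sample output are consistent with a line-weighted total: 20 + 11 = 31 AI lines out of 22 + 18 = 40 changed lines. A small sketch of that arithmetic follows; the rounding rule (nearest integer, halves up) is an assumption that happens to reproduce the figures shown.

```python
def percent(ai_lines: int, total_lines: int) -> int:
    """Integer percentage, rounding halves up; 0 when there are no lines."""
    if total_lines == 0:
        return 0
    return (100 * ai_lines + total_lines // 2) // total_lines


def summarize(per_file: dict[str, tuple[int, int]]) -> tuple[int, int, int]:
    """Sum (ai, total) line counts across files and return (ai, total, percent)."""
    ai = sum(a for a, _ in per_file.values())
    total = sum(t for _, t in per_file.values())
    return ai, total, percent(ai, total)


# Line counts taken from the sample output above.
sample = {"auth/login.go": (20, 22), "auth/middleware.go": (11, 18)}
```

Note that the commit-level figure is computed from summed line counts, so it differs from the plain average of the per-file percentages (91 and 61 average to 76, while the line-weighted result is 78).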
Attribution data is also pushed to GitHub PR comments and check runs when you connect a repo to the dashboard.
Playbooks
A playbook is an LLM-generated structured summary of a commit, created when you run semantica explain HEAD --generate or when auto-playbook is enabled. Each playbook captures:
- Title and intent
- What was changed and why
- Outcome and learnings
- Friction points encountered
- Keywords for search
Playbooks are stored locally and indexed for full-text search:
semantica search "retry logic"
semantica search "auth token refresh"
AI agents can also search playbooks via MCP, so knowledge from past sessions compounds across future work.
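As a rough mental model of playbook search, consider a tiny inverted index over playbook text. Everything here is hypothetical: the Playbook fields are a simplified reading of the list above, and the all-terms-must-match rule is an assumption, since the real index and query semantics are not described.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Simplified, hypothetical shape of a playbook record."""
    commit: str
    title: str
    intent: str
    learnings: str
    keywords: list[str] = field(default_factory=list)

    def text(self) -> str:
        return " ".join([self.title, self.intent, self.learnings, *self.keywords])


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(playbooks: list[Playbook]) -> dict[str, set[str]]:
    """Map each token to the set of commits whose playbook contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for playbook in playbooks:
        for token in tokenize(playbook.text()):
            index[token].add(playbook.commit)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Commits whose playbook contains every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))
```

With two playbooks indexed, a query such as "retry logic" returns only the commits whose playbook mentions both words.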
The commit pipeline
Here's the full sequence of what happens on every commit once Semantica is enabled:
1. pre-commit hook
   └── Creates a pending checkpoint stub (saves file manifest offset)
2. git commit (your commit runs normally, nothing is blocked)
   └── commit-msg hook appends Semantica-Checkpoint trailer
3. post-commit hook
   ├── Links checkpoint to the commit hash
   └── Spawns background worker (detached, no terminal output)
4. Background worker (async)
   ├── Ingests agent session data from detected providers
   ├── Builds full file manifest snapshot
   ├── Computes AI attribution (diff vs agent output)
   ├── Links sessions to checkpoint
   └── Optionally generates playbook (if auto-playbook enabled)
The worker is fully decoupled from your terminal session. If you close the terminal after committing, the worker continues until it finishes.
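For readers curious how a hook can return immediately while work continues, the sketch below shows one common way to start a detached process from Python. It illustrates the general technique only; spawn_detached is a hypothetical helper, and nothing here is taken from Semantica's own code.

```python
import subprocess
import sys


def spawn_detached(argv: list[str]) -> subprocess.Popen:
    """Start argv in the background without attaching it to the caller's terminal."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,   # no terminal input
        stdout=subprocess.DEVNULL,  # no terminal output
        stderr=subprocess.DEVNULL,
        # POSIX only: run the child in a new session so that closing the
        # terminal does not deliver a hangup signal to it.
        start_new_session=True,
    )


if __name__ == "__main__":
    # Hypothetical worker: any long-running command could go here.
    spawn_detached([sys.executable, "-c", "import time; time.sleep(1)"])
```

The caller does not wait on the returned process, which is what lets a post-commit hook finish right away.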
Local-first design
Semantica is designed to work entirely offline without any account or backend. All data stays in .semantica/ inside your repository:
- No data leaves your machine unless you run semantica auth login and semantica connect
- No Git history modification: Semantica only appends trailers to commit messages and stores data in .semantica/
- No side branches: the blob store and database are separate from the Git object store
If you do connect a repo, Semantica pushes attribution summaries (not file contents or transcripts) to the hosted dashboard. Before any data leaves the machine, Semantica applies secret redaction using Gitleaks patterns. If redaction fails for any reason, Semantica blocks the send rather than transmitting unredacted data.
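The fail-closed behavior can be illustrated with a short sketch. The two regular expressions below are simple stand-ins, not the Gitleaks rule set, and send_redacted, redact, and the transport callback are hypothetical names introduced for this example.

```python
import re
from typing import Callable

# Illustrative patterns only; these are NOT the actual Gitleaks rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # shape of an AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def redact(text: str) -> str:
    """Replace every match of every pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def send_redacted(
    payload: str,
    transport: Callable[[str], None],
    redactor: Callable[[str], str] = redact,
) -> bool:
    """Send the payload only if redaction completed; return whether it was sent."""
    try:
        clean = redactor(payload)
    except Exception:
        # Fail closed: never fall back to sending the raw payload.
        return False
    transport(clean)
    return True
```

The key design point is that the raw payload is never passed to the transport: either the redacted text goes out, or nothing does.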