# Semantica, agent and crawler reference

Last updated: 2026-04-02. Replace the date when you materially change product facts. Regenerate this file with `node scripts/generate-llms-txt.mjs` when docs change.

Semantica: AI attribution and review tooling from local agents (CLI hooks) to GitHub (PR comments, policy checks). Playbooks, MCP, egress redaction.

## Key URLs

- /: Product landing, install command, audience-specific hero copy
- /docs: Documentation index (introduction from `docs/introduction.md`)
- /docs/: One page per chapter; every slug is reproduced in **Full documentation** below
- /blog: Weekly posts (developer stream + non-technical / "Vibe Coders" stream)
- /use-cases: Marketing use cases (teams, review, knowledge) and paths into product
- /agents: Agent-facing reference (hooks, MCP, providers, llms.txt link); prefer for factual agent/MCP claims

## What Semantica is

Semantica tracks AI involvement in code changes: local capture via agent hooks, deterministic attribution, optional push to a backend, GitHub App integration for PR comments and policy-driven check runs. It includes playbook generation, full-text search, and an MCP server for agent reuse. Outbound payloads can pass through egress redaction (e.g. Gitleaks-style patterns) with fail-closed behavior when configured.

## Primary surfaces

1. **CLI**: For install and usage, see **Installation** and **Quickstart** in Full documentation below (and `/docs/installation`, `/docs/quickstart`). Commands include enable/disable, blame, explain, suggest commit, suggest PR, sessions, checkpoint workflows, MCP, search, tidy, and configuration.
2. **GitHub App**: OAuth, webhooks, PR attribution comments, policy engine (e.g. informational vs blocking), retry queue.
3. **MCP**: JSON-RPC tools for search, explain, playbook reuse (surface area described in product docs).

## Supported agents (non-exhaustive marketing list)

Claude Code, Cursor, Gemini CLI, Copilot CLI, Kiro IDE, Kiro CLI, normalized into one pipeline.
## Audience

- **Developers**: Deep CLI and GitHub integration, attribution diagnostics, policy gates.
- **Non-technical builders ("Vibe Coders")**: PMs, designers, and others shipping with AI: plain-language positioning on the marketing site; blog content tailored to this audience.

## Competitive context

Semantica competes in the AI + git/agent attribution space. Public comparison content: blog posts per major competitor. Internal detailed matrices may exist in repository docs (e.g. `docs/entire-vs-semantica-v8.md`, which is not inlined here and lives beside other internal memos under `docs/`).

## Discovery

This file (`/llms.txt`) is a machine-readable reference: site map, product summary, policies, and **complete** published documentation Markdown (same sources as Next.js `app/docs/[slug]/page.tsx` and the `/docs` introduction). Image paths in Markdown use `/docs/images/` for the web.

## Canonical brand

Use the name **Semantica** in metadata, titles, and structured data.

## Policies

- robots.txt allows all crawlers (including LLM and research bots).
- Prefer citing /agents and /docs for factual claims; marketing copy on / may evolve.

---

# Full documentation (site)

The following sections mirror the Markdown files in the repository under `docs/`, in the same order as `DOC_SLUGS` in `lib/docs-shared.ts` (with **Introduction** first for `/docs`). Relative `.md` links in the source refer to sibling files; on the website they resolve under `/docs/`.

---

## Introduction

**Canonical path:** /docs

**Source file:** `docs/introduction.md`

# Introduction

**Semantica** is observability and control for AI-assisted development. It tracks what AI coding agents do in your repositories and ties that activity to your Git commits, giving you a clear, auditable record of who or what wrote your code and how.

Git tells you *what* changed. Semantica tells you *how it happened*.


---

## What Semantica does

When you run `semantica enable` in a repository, Semantica installs lightweight Git hooks that fire automatically on every commit. From that point on, each commit:

- Creates a **checkpoint**: a full file manifest snapshot of the repository at that moment
- **Ingests agent session data** from detected AI providers (Claude Code, Cursor, Kiro IDE, Kiro CLI, Gemini CLI, GitHub Copilot)
- Computes **AI attribution**: what percentage of changed lines match AI-generated output, broken down by file
- Appends a `Semantica-Checkpoint` trailer to the commit message linking the commit to its checkpoint

All of this happens in a background worker, fully detached from your terminal. Commits are never slowed down.

---

## Why it matters

As AI coding agents become standard tools, the question "what percentage of this PR was AI-generated?" becomes important for code review, compliance, and team visibility. Semantica answers it automatically, without requiring developers to change how they work.

For teams using multiple agents, Semantica normalizes attribution data from all six supported providers into one consistent format. Whether your team uses Claude Code, Cursor, or Kiro IDE, you get the same structured output.

For organizations that need enforcement, Semantica can post attribution results as GitHub check runs, enabling branch protection rules that gate merges on AI attribution thresholds.

---

## Where data lives

By default, all Semantica data is stored locally in a `.semantica/` directory inside your repository, alongside `.git/`. This directory is automatically added to `.gitignore`. Nothing is sent anywhere unless you explicitly authenticate with `semantica auth login` and connect the repo with `semantica connect`.
```
.semantica/
  settings.json    # Configuration
  lineage.db       # SQLite database (checkpoints, sessions, attribution, playbooks)
  objects/         # Content-addressed blob store (SHA-256, zstd compressed)
  worker.log       # Background worker logs
```

Semantica never writes to Git history, never creates side branches, and never modifies agent session logs.

---

## Supported platforms

- **macOS** and **Linux** (x86_64 and arm64)
- **Git** is required. Semantica hooks into the commit lifecycle
- At least one supported AI provider for capture (see [Integrations](/docs/providers))

---

## Next steps

- [Quickstart](/docs/quickstart): install and enable in under two minutes
- [Core concepts](/docs/core-concepts): understand checkpoints, sessions, and attribution
- [Installation](/docs/installation): all install methods including Homebrew and shell script

---

## Architecture

**Canonical path:** /docs/architecture

**Source file:** `docs/architecture.md`

# Architecture

This document describes how Semantica works internally.

## Overview

Semantica is a single Go binary that operates as a CLI tool, a set of Git hook handlers, a background worker, and an MCP server. It adds an attribution and observability layer on top of Git without modifying Git's workflow.

```
   AI agent activity                                   Git commit (repo A)
(Claude, Cursor, etc.)                                          │
           │                                       ┌────────────┼────────────┐
           │                                       │            │            │
    provider hooks                            pre-commit   commit-msg   post-commit
 (prompt-submit, stop)                             │            │            │
           │                                create pending   append     link commit
           │                                  checkpoint    trailers   spawn worker
    ┌──────▼──────┐                                │            │            │
    │   capture   │                                └────────────┼────────────┘
    │   command   │                                             │
    └──────┬──────┘                                      ┌──────▼───────┐
           │                                             │    Worker    │
           │                                             │  (detached)  │
    ┌──────▼──────┐                                      └──────┬───────┘
    │   broker    │                                             │
    │  (routing)  │                                ┌────────────┼────────────┐
    └───┬─────┬───┘                                │            │            │
        │     │                                reconcile   build file     compute
        │     │                                sessions     manifest    attribution
        │     │                                    │            │            │
        │     │                ┌──────────────┐    └────────────┼────────────┘
        │     │                │ .semantica/  │                 │
        │     └──────────────► │ repo B       │◄────────────────┘
        │                      │ lineage.db   │
        │                      └──────────────┘
        │
        │                      ┌──────────────┐
        │                      │ .semantica/  │
        └────────────────────► │ repo A       │
                               │ lineage.db   │
                               └──────────────┘
```

There are two ingestion paths:

1. **Real-time capture** (primary) - Provider hooks fire `semantica capture` during agent activity, routing events through the broker into one or more enabled repos.
2. **Worker reconciliation** (secondary) - The background worker flushes any sessions that still have pending capture state, ensuring no events are lost if a capture hook was interrupted.

The broker fans out by file ownership. A capture started from one enabled repo can still write events into another enabled repo when the touched files belong there.

## Capture

The `semantica capture` command is the primary ingestion path for AI agent activity. It is invoked by provider hooks (not by the user directly).

Each provider registers hooks in its own configuration file (e.g., `.claude/settings.json`, `.cursor/hooks.json`) that call `semantica capture` with event metadata on stdin. The capture command:

1. Parses the provider-specific stdin payload
2. On prompt-submit: saves the current transcript offset to `$SEMANTICA_HOME/capture/` so it knows where new content starts
3. On stop: reads the transcript from the saved offset, extracts events, and routes them through the broker to the correct repo's database

See [providers.md](providers.md) for provider-specific hook details.

## Git hooks

Semantica installs three Git hooks via `semantica enable`. Each hook invokes the `semantica` binary as a subprocess.

### pre-commit

Creates a pending checkpoint in the SQLite database. Writes a handoff file (`.semantica/.pre-commit-checkpoint`) containing the checkpoint ID and a timestamp. This file is how state is passed between the three hook phases.

The hook exits immediately - it never blocks the commit.

### commit-msg

Reads the handoff file. Appends the checkpoint trailer, and appends attribution and diagnostics trailers when the `trailers` setting is enabled:

```
Semantica-Checkpoint: chk_abc123
Semantica-Attribution: 42% claude_code (18/43 lines)
Semantica-Diagnostics: 3 files, lines: 15 exact, 2 modified, 1 formatted
```

If no AI matches the commit, the attribution trailer becomes `0% AI detected (0/N lines)` and the diagnostics trailer explains whether no AI events existed in the checkpoint window or whether events existed but did not match the committed files.

Trailers are only appended if the handoff file exists and is fresh (written within the last few seconds, to guard against stale state from aborted commits).

### post-commit

Reads the handoff file again. Links the commit hash to the pending checkpoint in the database. Deletes the handoff file. Spawns a fully detached background worker process (`semantica worker run --repo --checkpoint --commit`).

The worker is detached from the terminal - it runs independently of the user's shell session.

## Worker

The worker runs as a detached background process after each commit (spawned by the post-commit hook), or can be invoked manually for debugging. It completes the checkpoint that the pre-commit hook created.

### Processing pipeline

1. **Session reconciliation** - Flushes any sessions that still have pending capture state (via `reconcileActiveSessions`). This is a catch-up mechanism - the primary capture path is the real-time `semantica capture` command triggered by provider hooks. The worker ensures no events are lost if a capture hook was interrupted or if the agent session outlived the hook call.
2. **File manifest** - Hashes every tracked file plus untracked, non-ignored files in the working tree using SHA-256. Compresses file contents with zstd and stores them in the content-addressed blob store. Records the manifest (path -> blob hash mapping) as a compressed JSON blob. Uses the previous checkpoint's manifest for incremental building.
3. **Checkpoint completion** - Marks the pending checkpoint as complete with the manifest hash and size.
4. **Session linking** - Finds sessions with events in the time window between the previous and current checkpoint. Associates them with the checkpoint in the database.
5. **AI attribution** - Diffs the commit against the parent. It first scores the current commit-linked checkpoint window, then applies bounded carry-forward for eligible created files that were already present in the previous commit-linked manifest but still scored 0 AI in the current window. For each changed line, it uses three match levels:
   - **Exact**: line matches AI output character-for-character
   - **Formatted**: match after normalizing whitespace
   - **Modified**: fuzzy match (line appears derived from AI output)

   Computes per-file and aggregate AI percentage and stores it on the checkpoint.
6. **Push** (optional) - If the repo is connected, pushes attribution data to the Semantica backend. The CLI only sends attribution payloads; GitHub PR comments and check runs, GitLab MR comments and commit statuses, and dashboards are implemented in the backend/API, not in this repository. The endpoint comes from the authenticated session, or `SEMANTICA_ENDPOINT` when set.
   Best-effort - failures are logged but don't cause the worker to fail.
7. **Auto-playbook** (optional) - If enabled, spawns a separate detached process (`semantica _auto-playbook`) that calls an LLM to generate a structured summary (title, intent, outcome, learnings, friction, keywords). Stored in the database and indexed for full-text search.

Steps 6 and 7 are best-effort - failures never cause the worker to fail.

## Storage

### SQLite database (`lineage.db`)

Single-file database in `.semantica/`. Contains:

| Table | Purpose |
|-------|---------|
| `checkpoints` | Checkpoint metadata (ID, kind, trigger, status, timestamps) |
| `commit_links` | Maps commit hashes to checkpoint IDs |
| `sessions` | AI agent sessions (provider, model, start/end time) |
| `events` | Individual turns within sessions (role, content, tool calls) |
| `attributions` | Per-file AI attribution results |
| `playbooks` | LLM-generated commit summaries (FTS-indexed) |
| `agents` | MCP agent registration |
| `sync_state` | Dashboard sync tracking |

The schema is defined in `internal/store/sqlite/schema/` and queries in `internal/store/sqlite/queries/`. Both are processed by [sqlc](https://sqlc.dev) to generate type-safe Go code.

### Blob store (`objects/`)

Content-addressed storage using SHA-256 hashing and zstd compression. Directory layout uses 2-character sharding:

```
objects/
  aa/
    aabbccdd...  (compressed blob)
  bb/
    ...
```

Used for file snapshots (checkpoint manifests) and event payloads.

### Settings (`settings.json`)

```json
{
  "enabled": true,
  "version": 1,
  "providers": ["claude-code", "cursor", "gemini", "copilot"],
  "connected": false,
  "trailers": true,
  "automations": {
    "playbook": { "enabled": false },
    "mcp": { "enabled": false }
  }
}
```

The `providers` field is a string array of installed hook provider names (not paths). `connected` controls whether the current repo syncs attribution to the dashboard (set by `semantica connect`).
The `trailers` field controls whether `Semantica-Attribution` and `Semantica-Diagnostics` are appended; `Semantica-Checkpoint` is always included. When omitted, `trailers` defaults to `true`. The backend endpoint is not stored in repo settings.

## Broker

The broker is a cross-repo event routing layer used by the `capture` command. It maintains a registry of enabled repositories at `$SEMANTICA_HOME/broker/repos.json`.

When an AI provider hook fires (e.g., Claude Code's `user-prompt-submit`), the capture command:

1. Reads the event payload from stdin
2. Looks up which registered repo(s) contain the affected files (deepest-match rule)
3. Routes the event to the matching repo database or databases

This allows Semantica to capture AI activity even when the provider's hook system doesn't know about the repo structure. In practice, a hook fired from one workspace can still route events into another Semantica-enabled repo if that repo owns the touched paths.

## MCP server

The MCP server implements [JSON-RPC 2.0](https://www.jsonrpc.org/specification) over stdio, following the [Model Context Protocol](https://modelcontextprotocol.io/) specification. It exposes three tools:

| Tool | Description |
|------|-------------|
| `semantica_search` | Full-text search across playbook summaries |
| `semantica_playbook_use` | Record that an agent applied a past playbook |
| `semantica_explain` | Get commit explanation with AI attribution |

Configured via `semantica mcp enable`, which writes provider-specific config files (`.mcp.json` for Claude Code, `.cursor/mcp.json` for Cursor, `.kiro/settings/mcp.json` for Kiro IDE, `.kiro/agents/semantica.json` for Kiro CLI, `~/.gemini/settings.json` for Gemini CLI, `.copilot/mcp-config.json` for Copilot).
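
To make the wire format concrete, here is a minimal Python sketch of the JSON-RPC 2.0 message an MCP client might write to the server's stdin to call `semantica_search`. The `tools/call` method and the `name`/`arguments` parameters come from the Model Context Protocol specification. The `query` and `limit` argument names are assumptions for illustration, since this document does not list each tool's input schema, and the helper names `tool_call_request` and `parse_response` are hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per connection


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The MCP stdio transport is newline-delimited: one JSON message per line.
    return json.dumps(message) + "\n"


def parse_response(line: str) -> dict:
    """Return the `result` of a JSON-RPC response, raising if it carries an `error`."""
    response = json.loads(line)
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return response["result"]


# Hypothetical arguments: `query` and `limit` are assumed names, not confirmed by the docs.
request_line = tool_call_request("semantica_search", {"query": "auth token refresh", "limit": 5})
```

A real client would first complete the MCP `initialize` handshake before sending `tools/call`; in practice the agent's own MCP client handles this once `semantica mcp enable` has written its config file.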
## Package structure

```
cmd/semantica/           CLI entrypoint (main.go)
internal/
  commands/              Cobra command definitions
  service/               Core business logic
    worker.go            Background worker pipeline
    pre-commit.go        Pre-commit hook handler
    post-commit.go       Post-commit hook handler
    hook_commit_msg.go   Commit-msg hook handler
    rewind.go            Checkpoint rewind logic
    explain.go           Commit explanation and attribution
    sessions.go          Session listing and details
    show.go              Checkpoint detail display
    playbook.go          LLM playbook generation
    push.go              Remote endpoint push
  store/sqlite/          Storage layer
    schema/              SQL schema definitions
    queries/             SQL query definitions
    db/                  sqlc-generated Go code
    store.go             Store implementation
  git/                   Git operations
    hooks.go             Hook script templates and installation
    diff.go              Diff parsing
    log.go               Log parsing
  hooks/                 AI provider integrations
    lifecycle.go         Event dispatch state machine
    state.go             Capture state management
    claude/              Claude Code session ingestion
    cursor/              Cursor session ingestion
    gemini/              Gemini CLI session ingestion
    copilot/             GitHub Copilot session ingestion
  broker/                Cross-repo event routing
  mcp/                   MCP server (JSON-RPC over stdio)
  version/               Build version injection
e2e/                     End-to-end tests
```

---

## Checkpoints

**Canonical path:** /docs/checkpoints

**Source file:** `docs/checkpoints.md`

# Checkpoints

Checkpoints are Semantica's snapshot system. Every checkpoint is a point-in-time record of your repository that you can inspect, compare, and restore from, without touching Git history.

---

## How checkpoints are created

**Automatic checkpoints** are created on every commit via the `pre-commit` and `post-commit` hooks. This happens transparently and never slows down your commit.

**Manual checkpoints** can be created at any time:

```bash
semantica checkpoint -m "before big refactor"
semantica checkpoint -m "before merging upstream"
```

**Safety checkpoints** are created automatically before any `rewind` operation (unless you pass `--no-safety`), giving you a way to undo the restore if needed.


---

## Listing checkpoints

```bash
semantica list          # Last 20 checkpoints
semantica list -n 50    # Last 50
semantica list --json   # JSON output
```

Output shows the checkpoint ID, timestamp, kind (baseline/auto/manual/safety), associated commit hash, and commit subject:

```
chk_def456  2026-03-23 14:22  auto      abc1234      add authentication module
chk_abc123  2026-03-23 11:05  baseline  (no commit)  (baseline)
```

---

## Inspecting a checkpoint

```bash
semantica show chk_def456
semantica show chk_def456 --json
semantica show def4 --jsonl    # metadata + one file per line
```

The `show` command displays the full file manifest: every file in the snapshot, its blob hash, and its size. This tells you exactly what the repository contained at that checkpoint.

Checkpoint IDs are prefix-matchable: `semantica show def4` works as long as the prefix is unambiguous.

---

## Rewinding to a checkpoint

Rewind restores your working tree to the state captured in a checkpoint. It does **not** rewrite Git history or create any commits.

```bash
semantica rewind chk_abc123
```

Before restoring, Semantica creates a safety checkpoint so you can get back to where you were:

```
Creating safety checkpoint... chk_safety789
Restoring 12 files to chk_abc123...
Done. Rewound to chk_abc123.
Safety checkpoint: chk_safety789
```

To undo a rewind, rewind to the safety checkpoint:

```bash
semantica rewind chk_safety789
```

### Exact rewind

By default, rewind restores only the files recorded in the checkpoint. Files that exist in your working tree but were not in the checkpoint are left untouched.

Use `--exact` to also delete those extra files:

```bash
semantica rewind chk_abc123 --exact
```

This restores the working tree to exactly what the checkpoint recorded, no more, no less.
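
The difference between a default rewind and an `--exact` rewind can be pictured as a set computation over file paths. The sketch below is illustrative Python, not Semantica's implementation (the CLI is a Go binary). The function name `plan_rewind`, the manifest shape (path to blob hash), the sample data, and the choice to skip files whose content already matches are all assumptions.

```python
def plan_rewind(manifest: dict[str, str], working_tree: dict[str, str], exact: bool = False):
    """Plan a rewind.

    manifest:     path -> blob hash recorded in the checkpoint
    working_tree: path -> hash of the file currently on disk
    Returns (to_restore, to_delete) as sorted lists of paths.
    """
    # Restore files that are missing from disk or whose content differs
    # (assumption: files that already match the checkpoint are skipped).
    to_restore = sorted(p for p, h in manifest.items() if working_tree.get(p) != h)
    # Files absent from the checkpoint are only removed in exact mode.
    to_delete = sorted(set(working_tree) - set(manifest)) if exact else []
    return to_restore, to_delete


# Hypothetical sample data: one changed file and one extra file on disk.
manifest = {"main.go": "aa11", "auth.go": "bb22"}
working_tree = {"main.go": "aa11", "auth.go": "ff99", "scratch.txt": "cc33"}

default_plan = plan_rewind(manifest, working_tree)
exact_plan = plan_rewind(manifest, working_tree, exact=True)
```

In this example the default plan restores `auth.go` and leaves `scratch.txt` alone, while the exact plan additionally schedules `scratch.txt` for deletion, which is why `--exact` is the destructive variant.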
### Skipping the safety checkpoint

If you're certain you don't need a safety checkpoint (for example, in a scripted workflow), you can skip it:

```bash
semantica rewind chk_abc123 --no-safety
```

---

## Checkpoint kinds

| Kind | When created |
|------|-------------|
| `baseline` | When `semantica enable` runs for the first time |
| `auto` | After every commit (by the post-commit hook) |
| `manual` | When you run `semantica checkpoint` explicitly |
| `safety` | Automatically before every `rewind` (unless `--no-safety` is passed) |

---

## What checkpoints store

Each checkpoint stores a **manifest**: a list of every tracked file, plus untracked, non-ignored files, in the repository at that moment, along with the SHA-256 hash of each file's content. The file contents themselves are stored in `.semantica/objects/` as a content-addressed blob store (compressed with zstd).

Because the store is content-addressed, identical files across multiple checkpoints share a single blob. The storage footprint grows proportionally to the number of unique file versions, not the number of checkpoints.

---

## Checkpoint trailers in commits

Every commit that creates a checkpoint includes a `Semantica-Checkpoint` trailer in the commit message:

```
add authentication module

Semantica-Checkpoint: chk_def456
```

This trailer is always added and cannot be disabled (though the attribution and diagnostics trailers can be). It creates a durable link between any commit and its corresponding Semantica data, even if the `.semantica/` directory is later removed.

---

## Commands

**Canonical path:** /docs/commands

**Source file:** `docs/commands.md`

# Commands

Full reference for every Semantica CLI command.

---

## Global flags

| Flag | Description |
|------|-------------|
| `--repo` | Path to the Git repository (default: current directory) |
| `--version` | Print version and exit |
| `--help` | Show help |

---

## enable

Initializes Semantica in the current repo. Creates `.semantica/`, installs Git hooks, auto-detects AI providers, and creates a baseline checkpoint.
```bash
semantica enable                              # First-time setup
semantica enable --force                      # Re-detect providers and reinstall hooks
semantica enable --providers claude-code      # Non-interactive: specify providers
semantica enable --providers kiro-ide,cursor  # Multiple providers
```

| Flag | Default | Description |
|------|---------|-------------|
| `--force` | `false` | Reinitialize even if already enabled |
| `--providers` | auto | Comma-separated list of providers to install hooks for |
| `--json` | `false` | Output as JSON |

---

## disable

Disables Semantica. Hooks remain installed but are inert (they check for `.semantica/enabled` before running). Your data in `.semantica/` is preserved.

```bash
semantica disable
```

---

## status

Shows an overview of AI activity in the repository: enabled state, authentication, connection state, auto-playbook and MCP status, last checkpoint, recent sessions, AI attribution trend, playbook count, and detected providers.

```bash
semantica status
semantica status --json
```

---

## blame

Shows AI attribution for a commit or checkpoint. Reports the percentage of changed lines that were AI-generated, broken down by file.

```bash
semantica blame HEAD          # Latest commit
semantica blame abc1234       # By commit hash
semantica blame HEAD --json   # Full JSON with per-file detail
```

If no ref is given and stdin is a terminal, an interactive checkpoint picker is shown.

| Flag | Default | Description |
|------|---------|-------------|
| `--json` | `false` | Output as JSON (includes per-file breakdown) |

---

## explain

Explains what happened in a commit: files changed, AI involvement, session breakdown, token usage, and top edited files. Optionally generates an LLM playbook summary.
```bash
semantica explain HEAD                        # Commit stats + AI involvement
semantica explain abc1234 --generate          # Also generate a playbook summary
semantica explain abc1234 --generate --force  # Regenerate even if summary exists
semantica explain abc1234 --json              # JSON output
```

`--generate` spawns a background LLM call. Run `explain` again after a few seconds to see the result.

| Flag | Default | Description |
|------|---------|-------------|
| `--generate` | `false` | Generate a narrative explanation using an LLM |
| `--force` | `false` | Force regeneration (use with `--generate`) |
| `--json` | `false` | Output as JSON |

---

## suggest commit

Generates a one-line commit message from all uncommitted changes (staged, unstaged, and untracked). Analyzes the diff and recent AI session context using the first available LLM CLI. Copies the result to the clipboard automatically.

```bash
semantica suggest commit
semantica suggest commit --json
```

Requires at least one LLM CLI installed and authenticated: `claude`, `cursor`, `gemini`, or `copilot`.

---

## suggest pr

Generates a PR title and body from the current branch diff against a base branch. If a pull request template exists in the repo, Semantica fills its sections rather than inventing structure.

```bash
semantica suggest pr
semantica suggest pr --base origin/main
semantica suggest pr --json
semantica suggest pr --copy
```

| Flag | Default | Description |
|------|---------|-------------|
| `--base` | auto-detect | Base branch or ref to diff against |
| `--json` | `false` | Output as JSON |
| `--copy` | `false` | Copy title and body to clipboard |

---

## search

Searches playbook summaries using full-text search. Useful for finding past solutions relevant to what you're working on now.

```bash
semantica search "auth token refresh"
semantica search "error handling" --limit 5
semantica search "database migration" --json
```

Results include commit hash, subject, checkpoint ID, AI percentage, model, and the full playbook.

| Flag | Default | Description |
|------|---------|-------------|
| `--limit` | `10` | Maximum number of results |
| `--json` | `false` | Output as JSON |

---

## sessions

Lists agent sessions tracked in the repo, or views details of a specific session.

```bash
semantica sessions                # Recent sessions
semantica sessions --limit 100    # More sessions
semantica sessions --all          # Include sessions with no events
semantica sessions                # Session details
semantica sessions --transcript   # Full transcript
semantica sessions --json         # JSON output
```

| Flag | Default | Description |
|------|---------|-------------|
| `--limit` | `20` | Maximum number of sessions to list |
| `--all` | `false` | Include sessions with no events |
| `--transcript` | `false` | Show the session transcript (with a session ID) |
| `--json` | `false` | Output as JSON |

---

## transcripts

Shows the agent transcript for a checkpoint, commit, or session: the sequence of user messages, assistant responses, and tool calls.

```bash
semantica transcripts HEAD                  # Latest commit's checkpoint
semantica transcripts abc123                # By checkpoint or commit ref
semantica transcripts abc123 --commit       # Only sessions that touched files in the commit
semantica transcripts abc123 --by-session   # Group events by session
semantica transcripts abc123 --cumulative   # All events up to this checkpoint
semantica transcripts abc123 --raw          # Include full payload JSON
semantica transcripts abc123 --verbose      # Show provider, tokens, etc.
```

| Flag | Default | Description |
|------|---------|-------------|
| `--commit` | `false` | Only sessions touching files in the commit diff |
| `--by-session` | `false` | Group events by session |
| `--cumulative` | `false` | All events up to checkpoint (default: delta since previous) |
| `--raw` | `false` | Include raw payload JSON from blob store |
| `--verbose` | `false` | Show provider, tokens, and other fields |
| `--checkpoint` | `false` | Force ref resolution as checkpoint ID |
| `--session` | `false` | Force ref resolution as session ID |
| `--json` | `false` | Output as JSON |
| `--jsonl` | `false` | JSONL: metadata line + one event per line |

---

## list

Lists checkpoints, most recent first.

```bash
semantica list           # Last 20
semantica list -n 50     # Last 50
semantica list --json    # JSON output
semantica list --jsonl   # JSONL output (one object per line)
```

| Flag | Default | Description |
|------|---------|-------------|
| `-n, --limit` | `20` | Maximum number of checkpoints |
| `--json` | `false` | Output as JSON |
| `--jsonl` | `false` | JSONL output |

---

## show

Shows details of a specific checkpoint: metadata, manifest hash, size, linked commit, and the full file list with blob hashes. Checkpoint IDs are prefix-matchable.

```bash
semantica show abc123
semantica show abc123 --json
semantica show abc123 --jsonl   # metadata + one file per line
```

---

## checkpoint

Manually creates a checkpoint outside the normal commit flow.

```bash
semantica checkpoint -m "before big refactor"
semantica checkpoint --json
```

| Flag | Default | Description |
|------|---------|-------------|
| `-m, --message` | | Checkpoint message |
| `--json` | `false` | Output as JSON |

---

## rewind

Restores the working tree to the state captured in a checkpoint. By default, creates a safety checkpoint first so you can undo the rewind (skip it with `--no-safety`).
```bash
semantica rewind abc123               # Restore files to checkpoint state
semantica rewind abc123 --exact       # Also delete files not in the checkpoint
semantica rewind abc123 --no-safety   # Skip safety checkpoint (dangerous)
semantica rewind abc123 --json        # JSON output
```

| Flag | Default | Description |
|------|---------|-------------|
| `--exact` | `false` | Delete files not present in the checkpoint |
| `--no-safety` | `false` | Skip creating a safety checkpoint before rewind |
| `--json` | `false` | Output as JSON |

---

## playbook use

Records that an agent applied a past playbook. Builds a reusable record of which solutions are actually being reused across sessions.

```bash
semantica playbook use abc1234 --agent claude-code --note "applied the retry pattern"
semantica playbook use abc1234 --json
```

| Flag | Default | Description |
|------|---------|-------------|
| `--agent` | | Agent name (e.g. `claude-code`) |
| `--note` | | How the playbook was applied |
| `--json` | `false` | Output as JSON |

---

## agents

Manage AI agent hooks. Shows which agents are detected and which have hooks installed. In interactive mode (terminal), shows a multi-select picker. In non-interactive mode, prints a status table.

```bash
semantica agents          # Interactive: toggle agent hooks
semantica agents --json   # Detection and installation status as JSON
```

---

## set

View or update Semantica settings for the current repo.
```bash
semantica set                          # Show current settings
semantica set auto-playbook enabled    # Enable auto-playbook generation
semantica set auto-playbook disabled   # Disable auto-playbook generation
semantica set trailers enabled         # Enable attribution + diagnostics trailers
semantica set trailers disabled        # Checkpoint-only commits
```

| Subcommand | Values | Description |
|------------|--------|-------------|
| `auto-playbook` | `enabled` / `disabled` | LLM playbook after each commit |
| `trailers` | `enabled` / `disabled` | Attribution and diagnostics trailers (checkpoint trailer is always included) |

---

## auth

Manage authentication with the Semantica backend.

```bash
semantica auth login    # OAuth login via GitHub or GitLab
semantica auth logout   # Revoke session and delete credentials
semantica auth status   # Show current auth status
```

`auth login` opens a browser for OAuth and polls until complete. Tokens are stored in OS secure storage (macOS Keychain, Linux Secret Service) when available, with automatic refresh. In headless or CI environments, credentials fall back to `~/.config/semantica/credentials.json` (`0600` permissions).

Set `SEMANTICA_API_KEY` to skip interactive auth in CI.

---

## connect / disconnect

Connect or disconnect the current repo from the Semantica dashboard. Attribution data from connected repos is pushed to the backend for team dashboards, GitHub PR comments, and check runs.

```bash
semantica connect      # Connect this repo to the dashboard
semantica disconnect   # Stop syncing from this repo
```

Authentication via `semantica auth login` is required before connecting.

---

## mcp

Manage the MCP (Model Context Protocol) server that AI agents can call natively.
```bash semantica mcp enable # Configure all detected agents to use Semantica MCP semantica mcp disable # Remove Semantica MCP from agent configurations semantica mcp status # Show which agents have MCP configured ``` The MCP server exposes three tools: | Tool | Purpose | |------|---------| | `semantica_search` | Search past playbooks for relevant solutions | | `semantica_playbook_use` | Record that an agent applied a playbook | | `semantica_explain` | Get detailed commit explanation with AI attribution | Supported agents: Claude Code (`.mcp.json`), Cursor (`.cursor/mcp.json`), Kiro IDE (`.kiro/settings/mcp.json`), Kiro CLI (`.kiro/agents/semantica.json`), Gemini CLI (`~/.gemini/settings.json`), Copilot CLI (`.copilot/mcp-config.json`). --- ## tidy Safe housekeeping for Semantica state. Prunes stale broker entries, removes abandoned capture files, marks old incomplete checkpoints as failed, and removes orphan playbook FTS rows. Dry-run by default. ```bash semantica tidy # Preview what would change semantica tidy --apply # Apply the changes semantica tidy --json # JSON output ``` --- ## completion Generates shell completion scripts. ```bash source <(semantica completion zsh) # Zsh source <(semantica completion bash) # Bash semantica completion fish | source # Fish semantica completion powershell # PowerShell ``` --- ## Configuration **Canonical path:** /docs/configuration **Source file:** `docs/configuration.md` # Configuration Semantica is configured through `.semantica/settings.json` inside your repository. Most settings can be managed with `semantica set` without editing the file directly. --- ## settings.json ```json { "enabled": true, "version": 1, "providers": ["claude-code", "cursor"], "connected": false, "trailers": true, "automations": { "playbook": { "enabled": false }, "mcp": { "enabled": false } } } ``` | Field | Type | Description | |-------|------|-------------| | `enabled` | boolean | Master switch. 
`false` makes all hooks inert without uninstalling them. | | `version` | integer | Schema version. Currently `1`. | | `providers` | array | Providers with hooks installed. Set automatically by `semantica enable`. | | `connected` | boolean | Whether this repo syncs attribution to the dashboard. Set by `semantica connect`. | | `trailers` | boolean | Whether `Semantica-Attribution` and `Semantica-Diagnostics` trailers are appended. `Semantica-Checkpoint` is always included regardless of this setting. | | `automations.playbook.enabled` | boolean | Automatically generate an LLM playbook summary after each commit. | | `automations.mcp.enabled` | boolean | MCP server integration for AI agents. | --- ## Commit trailers By default, Semantica appends three trailers to every commit message: ``` Semantica-Checkpoint: chk_abc123 Semantica-Attribution: 42% claude_code (18/43 lines) Semantica-Diagnostics: 3 files, lines: 15 exact, 2 modified, 1 formatted ``` `Semantica-Checkpoint` is always included and cannot be disabled; it links the commit to its checkpoint. `Semantica-Attribution` and `Semantica-Diagnostics` are optional and can be toggled: ```bash semantica set trailers enabled # Include all three trailers (default) semantica set trailers disabled # Checkpoint-only commits ``` --- ## Auto-playbook When enabled, every commit automatically spawns a background LLM call that generates a structured playbook summary and indexes it for search. ```bash semantica set auto-playbook enabled semantica set auto-playbook disabled ``` Auto-playbook requires at least one LLM CLI to be installed and authenticated: `claude`, `cursor`, `gemini`, or `copilot`. The playbook is generated asynchronously; run `semantica explain HEAD` after a few seconds to see the result. --- ## Environment variables | Variable | Description | |----------|-------------| | `SEMANTICA_API_KEY` | API key for authentication. Overrides stored credentials. Useful in CI environments.
| | `SEMANTICA_ENDPOINT` | Override the backend endpoint URL. | | `SEMANTICA_HOME` | Override the global Semantica home directory (used for global settings and credentials lookup). | --- ## Authentication and credentials Credentials from `semantica auth login` are stored in: - **macOS**: macOS Keychain - **Linux**: Linux Secret Service / libsecret-compatible keyring - **Headless / CI fallback**: `~/.config/semantica/credentials.json` with `0600` permissions (respects `$XDG_CONFIG_HOME`) Existing file credentials are automatically migrated to the OS secure store when it becomes available. For CI pipelines where browser-based auth is not practical, set `SEMANTICA_API_KEY` instead: ```bash export SEMANTICA_API_KEY=your_key_here semantica connect ``` --- ## File structure ``` .semantica/ settings.json # Per-repo configuration (this file) lineage.db # SQLite database: checkpoints, sessions, events, attribution, playbooks objects/ # Content-addressed blob store (SHA-256 keys, zstd compressed) activity.log # Hook lifecycle warnings and activity log worker.log # Background worker and auto-playbook logs enabled # Presence of this file controls whether hooks are active ``` `.semantica/` is automatically added to `.gitignore` by `semantica enable`. It is never committed to the repository. --- ## Resetting configuration To start over with a fresh configuration while keeping your Git history: ```bash semantica disable rm -rf .semantica/ semantica enable ``` This removes all local checkpoints, sessions, and attribution data. It does not affect commits already made or their trailers. --- ## Core concepts **Canonical path:** /docs/core-concepts **Source file:** `docs/core-concepts.md` # Core Concepts Understanding a few key ideas will help you get the most out of Semantica. --- ## Checkpoints A **checkpoint** is a snapshot of your repository at a specific point in time. 
It records: - A full file manifest (every tracked file and its SHA-256 hash) - The commit hash it was created at (if commit-linked) - The kind of checkpoint: `auto` (created by a commit hook), `manual` (created with `semantica checkpoint`), or `safety` (created automatically before a rewind) Checkpoints are the foundation of everything in Semantica. Attribution, rewind, and explain all operate on checkpoints. Every commit automatically creates a checkpoint. You can also create them manually at any time: ```bash semantica checkpoint -m "before big refactor" ``` Checkpoints are stored locally in `.semantica/` and never written to Git history. ### Checkpoint IDs Checkpoint IDs look like `chk_abc123def456`. They are prefix-matchable: you only need enough characters to be unique in your repo. ```bash semantica show chk_abc # works as long as it's unambiguous semantica rewind abc # same prefix matching applies ``` --- ## Sessions A **session** is a single AI agent conversation: one Claude Code chat, one Cursor composer thread, or one Kiro IDE session. Semantica reads session data passively from each provider's local logs after a commit and links sessions to the checkpoint created by that commit. Sessions contain **events**: user messages, assistant responses, and tool calls (file reads, writes, shell commands). Semantica uses these events to compute attribution. View sessions for your repo: ```bash semantica sessions ``` View the full transcript of a session: ```bash semantica sessions --transcript ``` --- ## Attribution **Attribution** is Semantica's answer to "how much of this commit was AI-generated?" After a commit, Semantica diffs the changed files against the AI session output captured for that checkpoint.
It uses a three-tier matching system: | Tier | Description | |------|-------------| | **Exact** | Line matches AI output verbatim | | **Modified** | Line is a lightly edited version of AI output | | **Formatted** | Line is AI output with whitespace or formatting changes | The result is a percentage and a per-file breakdown: ``` HEAD (abc1234) add authentication module AI attribution: 78% (31/40 lines) auth/login.go 91% (20/22 lines) claude_code auth/middleware.go 61% (11/18 lines) claude_code ``` Attribution data is also pushed to GitHub PR comments and check runs when you connect a repo to the dashboard. --- ## Playbooks A **playbook** is an LLM-generated structured summary of a commit, created when you run `semantica explain HEAD --generate` or when auto-playbook is enabled. Each playbook captures: - Title and intent - What was changed and why - Outcome and learnings - Friction points encountered - Keywords for search Playbooks are stored locally and indexed for full-text search: ```bash semantica search "retry logic" semantica search "auth token refresh" ``` AI agents can also search playbooks via MCP, so knowledge from past sessions compounds across future work. --- ## The commit pipeline Here's the full sequence of what happens on every commit once Semantica is enabled: ``` 1. pre-commit hook └── Creates a pending checkpoint stub (saves file manifest offset) 2. git commit (your commit runs normally, nothing is blocked) └── commit-msg hook appends Semantica-Checkpoint trailer 3. post-commit hook └── Links checkpoint to the commit hash └── Spawns background worker (detached, no terminal output) 4. Background worker (async) ├── Ingests agent session data from detected providers ├── Builds full file manifest snapshot ├── Computes AI attribution (diff vs agent output) ├── Links sessions to checkpoint └── Optionally generates playbook (if auto-playbook enabled) ``` The worker is fully decoupled from your terminal session. 
If you close the terminal after committing, the worker continues until it finishes. --- ## Local-first design Semantica is designed to work entirely offline without any account or backend. All data stays in `.semantica/` inside your repository: - **No data leaves your machine** unless you run `semantica auth login` and `semantica connect` - **No Git history modification**: Semantica only appends trailers to commit messages and stores data in `.semantica/` - **No side branches**: the blob store and database are separate from the Git object store If you do connect a repo, Semantica pushes attribution summaries (not file contents or transcripts) to the hosted dashboard. Before any data leaves the machine, Semantica applies secret redaction using Gitleaks patterns. If redaction fails for any reason, the send is blocked rather than sending raw data. --- ## Features **Canonical path:** /docs/features **Source file:** `docs/features.md` # Features Detailed guide to Semantica's capabilities. --- ## AI Attribution Determines what percentage of a commit is AI-attributed by comparing added lines against captured AI tool output. ### How it works When you run `semantica blame` or `semantica explain`, Semantica diffs the commit against its parent and checks each added line against output captured from AI agent sessions. 
Lines are classified into three tiers: | Tier | Name | What it means | |------|------|---------------| | Exact | `ai_exact` | Line matches AI tool output character-for-character (after trimming whitespace) | | Formatted | `ai_formatted` | Match after stripping all whitespace - catches linter/formatter changes (e.g., `func foo(){` vs `func foo() {`) | | Modified | `ai_modified` | Line is in a diff hunk that overlaps with AI output but doesn't match exactly - the developer likely edited AI-generated code | AI code hashes are built from assistant-role events containing `Edit` (new_string field) and `Write` (content field) tool calls captured during provider hook events. `Bash` tool calls are only used to detect file deletions (via `rm` commands), not to build line-level code matches. ### What you see ```bash semantica blame HEAD # aggregate AI percentage semantica blame HEAD --json # per-file breakdown with exact/formatted/modified counts ``` The JSON output includes per-file `ai_percentage`, per-provider breakdown (provider name, model, AI lines), and diagnostics (events considered, payloads loaded, match counts). ### Prerequisites - Semantica enabled in the repo - At least one AI provider with hooks installed - Agent session activity that overlaps with the commit's time window ### Caveats - Attribution is anchored to the delta window between commit-linked checkpoints. Deferred created files can still pick up AI attribution from earlier history when they were present in the previous commit-linked manifest but committed later. - Lines that a developer manually edits after AI generation may count as "modified" rather than "exact." - Carry-forward is per-file, not per-line across windows. If a file already has current-window AI attribution, that file stays current-window authoritative. - Provider-level attribution (file touched by AI) is available for all providers; line-level payload analysis requires providers that report Edit/Write tool call content. 
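The tier table under "How it works" can be illustrated with a minimal Python sketch. It is not Semantica's implementation: the `Tier` enum, `classify_line`, `ai_percentage`, and the `human` label are hypothetical names, plain string sets stand in for the AI code hashes, and the round-half-up rule is only inferred from the sample figures in these docs (18/43 shown as 42%).

```python
from enum import Enum


class Tier(Enum):
    """Tier names from the table above; HUMAN is an assumed label for unmatched lines."""
    EXACT = "ai_exact"
    FORMATTED = "ai_formatted"
    MODIFIED = "ai_modified"
    HUMAN = "human"


def strip_all_whitespace(line: str) -> str:
    """Remove every whitespace character, so `func foo(){` equals `func foo() {`."""
    return "".join(line.split())


def classify_line(added_line: str, ai_lines: list[str], in_ai_overlapping_hunk: bool) -> Tier:
    """Classify one added line against captured AI output (illustrative only)."""
    # Exact: identical after trimming leading and trailing whitespace.
    if added_line.strip() in {line.strip() for line in ai_lines}:
        return Tier.EXACT
    # Formatted: identical once all whitespace is stripped (formatter changes).
    if strip_all_whitespace(added_line) in {strip_all_whitespace(line) for line in ai_lines}:
        return Tier.FORMATTED
    # Modified: no match, but the diff hunk overlaps AI output.
    if in_ai_overlapping_hunk:
        return Tier.MODIFIED
    return Tier.HUMAN


def ai_percentage(exact: int, formatted: int, modified: int, total_lines: int) -> int:
    """Aggregate AI share as a whole percent (rounding rule is an assumption)."""
    if total_lines == 0:
        return 0
    ai_lines = exact + formatted + modified
    return int(100 * ai_lines / total_lines + 0.5)
```

With the counts from the sample trailers (15 exact, 1 formatted, 2 modified out of 43 lines), `ai_percentage(15, 1, 2, 43)` gives 42, which agrees with the `42% claude_code (18/43 lines)` example.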
--- ## Checkpoints and Rewind Checkpoints are point-in-time snapshots of every file in the repo. Rewind restores the working tree to any previous checkpoint. ### How it works **Automatic checkpoints** are created on every `git commit`: 1. The pre-commit hook creates a pending checkpoint stub (UUID, timestamp) 2. The background worker completes it by hashing every tracked file plus untracked, non-ignored files (SHA-256, zstd compressed) and writing a manifest (path -> blob hash mapping) **Manual checkpoints** can be created at any time: ```bash semantica checkpoint -m "Before big refactor" ``` **Rewind** restores files from a checkpoint's manifest: ```bash semantica rewind # restore files, create safety checkpoint first semantica rewind --exact # also delete files not in the checkpoint semantica rewind --no-safety # skip safety checkpoint (dangerous) ``` By default, rewind creates a safety checkpoint before restoring, so you can undo the rewind. ### What you see ```bash semantica list # checkpoints with ID, timestamp, commit hash, file count semantica show # full manifest with per-file blob hashes semantica rewind --json # files_restored, files_deleted, safety_checkpoint_id ``` ### Caveats - Rewind operates on the working tree only - it does not modify git history, staged changes, or the index. - Manifests include git-tracked files plus untracked, non-ignored files. Ignored files are not captured or restored. - The `--exact` flag deletes files not present in the checkpoint manifest, but always protects `.semantica/`. --- ## Commit Trailers Semantica always appends a machine-readable checkpoint trailer during the commit-msg hook. Attribution and diagnostics trailers are enabled by default and can be toggled with `semantica set trailers enabled|disabled`. ### How it works The pre-commit hook writes a handoff file (`.semantica/.pre-commit-checkpoint`) containing the checkpoint ID and timestamp. 
The commit-msg hook reads this file and appends trailers to the commit message. When trailer emission is enabled and AI is detected, the trailers look like this: ``` Semantica-Checkpoint: chk_abc123 Semantica-Attribution: 42% claude_code (sonnet) (18/43 lines) Semantica-Diagnostics: 3 files, lines: 15 exact, 2 modified, 1 formatted ``` - **Checkpoint** - links the commit to its checkpoint ID - **Attribution** - per-provider AI percentage with line counts (one trailer per provider if multiple contributed). If no AI matches the commit, this becomes `0% AI detected (0/N lines)`. - **Diagnostics** - aggregate match statistics. If no AI matches the commit, this explains whether no AI events existed in the checkpoint window or whether AI events existed but did not match the committed files. When trailer emission is disabled: ```text Semantica-Checkpoint: chk_abc123 ``` When no AI sessions exist in the checkpoint window: ```text Semantica-Checkpoint: chk_abc123 Semantica-Attribution: 0% AI detected (0/141 lines) Semantica-Diagnostics: no AI events found in the checkpoint window ``` When AI sessions exist but do not modify the committed files: ```text Semantica-Checkpoint: chk_abc123 Semantica-Attribution: 0% AI detected (0/141 lines) Semantica-Diagnostics: AI session events found, but no file-modifying changes matched this commit ``` ### Prerequisites - Semantica enabled (`semantica enable`) - Git hooks installed (happens automatically during enable) - Attribution and diagnostics trailers enabled if you want those extra trailers (`semantica set trailers enabled`) ### Caveats - Trailers are skipped if the handoff file is missing (e.g., `git commit --no-verify` skips the pre-commit hook). - Duplicate trailers are prevented - if a `Semantica-Checkpoint` trailer already exists (e.g., `git commit --amend`), it won't be added again. - `Semantica-Checkpoint` is always appended when trailer insertion runs. 
`Semantica-Attribution` and `Semantica-Diagnostics` are controlled together by the `trailers` setting. - Attribution trailers are best-effort. If attribution cannot be computed at all (for example, the database is unavailable or the hook times out) and trailer emission is enabled, Semantica appends the checkpoint trailer plus `Semantica-Diagnostics: attribution unavailable`. --- ## Playbooks and Search Playbooks are LLM-generated structured summaries of commits. Search lets you find past solutions by keyword. ### How it works A playbook is generated by sending the commit diff, attribution stats, and recent session transcript to an LLM. The response is parsed into a structured format: | Field | Description | |-------|-------------| | `title` | Short label (max 10 words) | | `intent` | What the developer tried to accomplish | | `outcome` | What was actually achieved | | `learnings` | Codebase patterns/conventions discovered | | `friction` | Problems, blockers, annoyances encountered | | `open_items` | Deferred work, tech debt | | `keywords` | 5-10 search terms for later discovery | Playbooks are indexed in an FTS5 full-text search table. All narrative fields are searchable.
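As a rough illustration of indexing playbook fields in FTS5 and ranking matches with BM25, here is a small Python sketch using the standard `sqlite3` module. The table name `playbook_fts`, its column layout, and the helper functions are assumptions made for this example; the actual schema inside `.semantica/lineage.db` is not documented here. It also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

# Narrative fields from the playbook table above.
FIELDS = ("title", "intent", "outcome", "learnings", "friction", "open_items", "keywords")


def build_index(playbooks: list[dict]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table (hypothetical schema) and load playbooks into it."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(FIELDS)
    # commit_hash is stored but not tokenized, so it never affects ranking.
    conn.execute(f"CREATE VIRTUAL TABLE playbook_fts USING fts5(commit_hash UNINDEXED, {columns})")
    placeholders = ", ".join("?" for _ in range(len(FIELDS) + 1))
    conn.executemany(
        f"INSERT INTO playbook_fts (commit_hash, {columns}) VALUES ({placeholders})",
        [(p["commit_hash"], *(p.get(field, "") for field in FIELDS)) for p in playbooks],
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple]:
    """Return (commit_hash, title) rows, best BM25 match first."""
    return conn.execute(
        "SELECT commit_hash, title FROM playbook_fts "
        "WHERE playbook_fts MATCH ? ORDER BY bm25(playbook_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
```

FTS5's `bm25()` returns smaller values for better matches, which is why the query sorts in ascending order. Multi-word queries such as `auth token refresh` are treated as an implicit AND of the terms.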


### What you see ```bash # Generate a playbook for a commit semantica explain HEAD --generate # Search past playbooks semantica search "auth token refresh" semantica search "database migration" --json ``` Search results are ranked by BM25 relevance and include commit hash, AI percentage, model used, and the full summary. ### Generation modes - **Manual**: `semantica explain --generate` (use `--force` to regenerate) - **Auto**: Enable with `semantica set auto-playbook enabled` - generates a playbook for every commit via a detached background process after the worker completes ### Prerequisites - At least one LLM CLI must be installed and accessible: Claude Code (`claude`), Cursor CLI (`agent`), Gemini CLI (`gemini`), or Copilot CLI (`copilot`). The first available provider in this order is used. - For auto-playbook, the provider must be authenticated and available non-interactively. ### Caveats - Generation is asynchronous. After `--generate`, run `semantica explain` again after a few seconds to see the result. - Playbook generation uses bounded diff input to stay within LLM context limits. Commit message and PR suggestions use structured change summaries plus selected per-file excerpts instead of a blind raw-diff prefix. Large diffs may still produce less precise summaries. - Playbooks are stored locally in `.semantica/lineage.db`. They are included in backend sync when the repo is connected with `semantica connect`. --- ## Hosted Reporting and Remote Attribution Push The CLI computes attribution locally and can optionally push it to the Semantica backend. The backend materializes hosted reporting surfaces - GitHub PR comments and check runs, GitLab MR comments and commit statuses, and dashboards. That provider integration logic does not live in this repository. Local-only mode works without any backend. All capture, attribution, checkpoints, playbooks, and search features are fully functional offline. 
### How it works After the background worker completes a checkpoint, it POSTs an attribution payload to the effective backend endpoint at `/v1/attribution` when the repo is connected. The payload includes: - Git metadata (remote URL, branch, commit hash, subject) - Repo provider hint (`github`, `gitlab`, or `unknown`) - Full attribution breakdown (exact/formatted/modified line counts, per-file detail) - Per-provider detail (provider name, model, AI lines) - Session count and provider list - Playbook summary (if available) - CLI version and attribution algorithm version The backend uses this data to render: - **PR or MR comments** - AI attribution summary on GitHub pull requests or GitLab merge requests - **Provider status surfaces** - GitHub check runs and GitLab commit statuses on commits - **Dashboards** - team-level AI usage trends and per-repo breakdowns Authentication uses a bearer token obtained via `semantica auth login` (OAuth device flow) or the `SEMANTICA_API_KEY` environment variable. ### What you see ```bash semantica auth login # authenticate once with the backend semantica connect # connect this repo to the dashboard ``` ### Prerequisites - Authenticated via `semantica auth login` or `SEMANTICA_API_KEY` - Repo connected via `semantica connect` - Backend provisioned for your organization if you want hosted reporting surfaces - GitHub App installed if you want GitHub PR comments and check runs - GitLab project webhook configured if you want GitLab MR comments and commit statuses The backend endpoint is resolved from the authenticated session. `SEMANTICA_ENDPOINT` overrides it for local development and testing. ### Caveats - Push is best-effort with a 10-second timeout. Failures are logged to `.semantica/worker.log` but never block the worker or the commit. - A second push happens after auto-playbook generation completes, enriching the payload with the `playbook_summary` field. - `semantica auth login` does not connect any repos. 
Connection is a repo-local action controlled by `semantica connect` and `semantica disconnect`. - The backend canonicalizes the `remote_url` (handles SSH vs HTTPS, `.git` suffix) to match repos across push sources. - Provider comments, status surfaces, and dashboards are implemented in the backend/API repository, not in this CLI. See [docs/github-reporting.md](github-reporting.md) for details. --- ## Egress Redaction Semantica redacts likely secrets before prompt content or remote attribution payloads leave the machine. Local capture and stored blobs remain unchanged. ### How it works - LLM prompt content is redacted at the shared `llm.Generate` / `llm.GenerateText` boundary. - Remote attribution payloads are sanitized before upload. `remote_url` has embedded credentials, query strings, and fragments stripped before the rest of the payload is scanned. - Detection uses embedded Gitleaks rules. Matched values are replaced with `[REDACTED]`. ### Caveats - Redaction is best-effort. Unknown secret formats may still be missed. - Aggressive matches can remove prompt context and reduce LLM output quality on some diffs or summaries. - Redaction applies to outbound content only. Local raw capture in `.semantica/` is not rewritten. --- ## MCP Integration Exposes Semantica tools to AI agents via the [Model Context Protocol](https://modelcontextprotocol.io/), allowing agents to search past solutions and record attribution natively. ### How it works `semantica mcp enable` writes MCP server configuration to each supported provider's config file. The server runs over stdio using JSON-RPC 2.0, started on demand by the agent. 
Three tools are exposed: | Tool | Input | What it does | |------|-------|-------------| | `semantica_search` | `{query, limit?}` | Full-text search across playbook summaries (FTS5 BM25 ranking) | | `semantica_playbook_use` | `{commit_hash, note?}` | Records that the agent applied a past playbook | | `semantica_explain` | `{ref}` | Returns full commit explanation with attribution, sessions, and summary | ### What you see ```bash semantica mcp enable # install MCP config for all detected providers semantica mcp status # show which providers have MCP configured semantica mcp disable # remove MCP config ``` Once enabled, agents can call these tools during conversations. For example, an agent can search for how a similar problem was solved before and apply that playbook to the current task. ### Supported providers | Provider | Config path | Scope | |----------|-------------|-------| | Claude Code | `.mcp.json` | Per-project | | Cursor | `.cursor/mcp.json` | Per-project | | Kiro IDE | `.kiro/settings/mcp.json` | Per-project | | Kiro CLI | `.kiro/agents/semantica.json` | Per-project | | Gemini CLI | `~/.gemini/settings.json` | Global | | Copilot CLI | `.copilot/mcp-config.json` | Per-project | ### Caveats - The MCP server is stateless - each invocation reads from `.semantica/lineage.db` and returns results. - Search requires playbooks to exist. If no playbooks have been generated, search returns empty results. --- ## Provider Hook Capture Real-time capture of AI agent activity via provider-specific hooks. ### How it works When `semantica enable` detects an AI provider, it installs hooks in the provider's configuration file. These hooks call `semantica capture ` with event metadata on stdin. The capture lifecycle follows this pattern: 1. **Prompt submitted** - Semantica records the current transcript boundary to a capture state file at `$SEMANTICA_HOME/capture/capture-{key}.json`. 
The offset format is provider-specific: line count for JSONL-based providers, message index for Gemini, and provider-managed markers for Kiro CLI. 2. **Agent stop** - Semantica reads the transcript from the saved offset forward, extracts new events (tool calls, responses, file operations), and routes them through the broker to the correct repo's database. 3. **Session close** - Final transcript flush and state cleanup. Events are matched to repositories by file path (deepest-match rule). Events without file paths are matched by the session's source project path. ### Event types | Type | When it fires | |------|---------------| | `PromptSubmitted` | User submits a prompt - saves transcript offset | | `AgentCompleted` | Agent finishes responding - captures and routes events | | `SessionOpened` | Session starts - lifecycle tracking | | `SessionClosed` | Session ends - fallback capture if completion was missed | | `ContextCompacted` | Context window compressed - resets offset to EOF | | `SubagentCompleted` | Sub-agent finishes - captures sub-agent transcript | ### Prerequisites - Provider detected and hooks installed (`semantica enable` or `semantica agents`) - `semantica` binary on PATH (hooks invoke it by absolute path or via `command -v`) ### Caveats - Capture state is stored in `$SEMANTICA_HOME/capture/`. The boundary format is provider-specific and may use companion state managed by the provider. If the CLI is upgraded or the capture directory is cleared mid-session, some events may be missed. - The background worker runs a reconciliation pass to flush any sessions with pending capture state, ensuring no events are lost if a hook invocation was interrupted. - `semantica tidy --apply` can remove abandoned capture state, stale broker entries, and orphan playbook FTS rows, and mark old pending checkpoints as failed without touching complete checkpoint history. 
- Capture is per-machine - activity from a different machine using the same repo is not captured unless that machine also has Semantica enabled. --- ## Hosted reporting **Canonical path:** /docs/github-reporting **Source file:** `docs/github-reporting.md` # Hosted Reporting Semantica can surface AI attribution on GitHub and GitLab through PR or MR comments, provider-native status surfaces, and dashboards. This page explains what the CLI does vs what the backend does, and how the pieces connect. ## Architecture boundary | Component | What it does | Where it lives | |-----------|-------------|-------------------------------------| | **CLI** | Captures AI activity, computes attribution, pushes payload | This repository (`semanticash/cli`) | | **Backend/API** | Receives payloads, renders PR or MR comments, creates GitHub check runs or GitLab commit statuses, serves dashboards | Semantica backend `(private)` | The CLI is fully functional without the backend. All local features - capture, attribution, checkpoints, playbooks, search, rewind, MCP - work offline. ## What the CLI sends After each commit, the background worker POSTs an attribution payload to `{endpoint}/v1/attribution`: ```json { "remote_url": "git@github.com:org/repo.git", "branch": "feature-branch", "commit_hash": "abc123...", "commit_subject": "Add retry logic", "checkpoint_id": "chk_...", "ai_exact_lines": 15, "ai_formatted_lines": 2, "ai_modified_lines": 1, "ai_lines": 18, "human_lines": 25, "total_lines": 43, "files_total": 3, "files_ai_touched": 2, "files": [ ... 
], "providers": ["claude_code"], "provider_details": [{"provider": "claude_code", "model": "sonnet", "ai_lines": 18}], "playbook_summary": "Intent: Add retry logic | Outcome: Implemented exponential backoff", "cli_version": "0.4.2", "attribution_version": "v1", "pushed_at": 1710374400000 } ``` ## What the backend renders ### GitHub PR comments When a pull request is opened or updated, the backend aggregates attribution payloads for all commits in the PR and posts a summary comment showing: - Aggregate AI percentage across the PR - Per-commit breakdown - Provider and model information - Playbook summary (if available) ### GitHub check runs The backend creates a GitHub check run on each pushed commit with the attribution result. This appears in the PR's checks tab and can be configured as a required status check. ### GitLab MR comments When a merge request is opened or updated, the backend aggregates attribution payloads for all commits in the MR and posts a summary note on the MR. GitLab notes are currently posted with the connected GitLab user's identity, not a Semantica-owned bot identity. ### GitLab commit statuses The backend creates a GitLab commit status on the MR head commit. This is not a GitHub-style check run. GitLab surfaces it through its commit and MR status UI. ### Dashboards The web dashboard at [semantica.sh](https://www.semantica.sh) provides: - Per-repo AI usage trends over time - Team-level attribution aggregates - Per-commit detail drill-down - Provider breakdown ## Authentication The CLI authenticates with the backend via OAuth device flow or API key: ```bash semantica auth login # opens browser for OAuth (GitHub or GitLab) semantica connect # connects the current repo to the dashboard ``` Alternatively, set `SEMANTICA_API_KEY` for CI environments. Tokens are stored in OS secure storage (macOS Keychain, Linux Secret Service) when available, with automatic refresh on expiry. 
On headless or CI environments without a keyring, credentials fall back to `~/.config/semantica/credentials.json` (respects `$XDG_CONFIG_HOME`) with `0600` permissions. ## Provisioning For hosted reporting to work: 1. **Backend access** - your organization must have a Semantica backend instance 2. **Authentication** - the CLI must be authenticated (`semantica auth login` or `SEMANTICA_API_KEY`) 3. **Repo connected** - the current repo must be connected via `semantica connect` 4. **Provider setup** - **GitHub** - the Semantica GitHub App must be installed on the target repository if you want PR comments and check runs - **GitLab** - the target project must have a Semantica MR webhook. `semantica connect` attempts to create it automatically. If the connected user cannot manage project webhooks, the CLI prints the manual webhook URL and repo-specific secret to give to a maintainer. The backend endpoint is resolved from the authenticated session. `SEMANTICA_ENDPOINT` overrides it for local development and testing. ## Failure modes | Scenario | Behavior | |----------|----------| | Repo not connected | CLI works locally, no push attempted | | Not authenticated | CLI works locally, no push attempted | | Auth expired | Push fails silently, logged to `.semantica/worker.log` | | Backend unreachable | 10-second timeout, logged, worker completes normally | | GitHub App not installed | Backend can still receive attribution, but cannot post PR comments or check runs | | GitLab webhook not configured | Backend can still receive attribution, but cannot post MR comments or commit statuses | | GitLab user lacks webhook permission | `semantica connect` still succeeds and prints manual webhook setup details | | Playbook not yet generated | First push omits `playbook_summary`; a re-push after auto-playbook adds it | All push failures are best-effort. The CLI never blocks commits, the worker, or any local feature due to a backend issue. 
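The best-effort behavior in the failure-mode table can be sketched in Python as follows. This is an illustration under stated assumptions, not the CLI's code (the CLI lives in a separate repository): `push_attribution`, the logger name, the `Content-Type` header, and the injectable `opener` parameter are hypothetical. Only the `/v1/attribution` path, the bearer token, and the 10-second timeout come from the documentation above.

```python
import http.client
import json
import logging
import urllib.request

# Hypothetical logger name; the docs only say failures go to .semantica/worker.log.
log = logging.getLogger("semantica.worker")


def push_attribution(endpoint, token, payload, timeout=10.0, opener=urllib.request.urlopen):
    """POST the payload and report success; never raise on network or HTTP errors."""
    request = urllib.request.Request(
        f"{endpoint.rstrip('/')}/v1/attribution",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed header
        },
        method="POST",
    )
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError, and timeouts are all OSError subclasses.
        log.warning("attribution push failed: %s", exc)
        return False
```

Returning `False` instead of raising keeps the caller's flow unchanged when the backend is unreachable or the token has expired, which mirrors the "logged, worker completes normally" rows in the table.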
---

## Installation

**Canonical path:** /docs/installation

**Source file:** `docs/installation.md`

# Installation

Semantica is available for macOS and Linux (x86_64 and arm64). Windows is not currently supported.

**Requirements:**

- Git (any reasonably recent version)
- macOS or Linux
- At least one [supported AI provider](/docs/integration-claude-code) for capture

---

## Homebrew (macOS)

The recommended method for macOS. Installs the binary plus shell completions for Bash, Zsh, and Fish automatically.

```bash
brew install semanticash/tap/semantica
```

To upgrade later:

```bash
brew upgrade semantica
```

---

## Shell script (macOS / Linux)

Downloads and installs the latest release binary. Verifies the SHA-256 checksum before installing.

```bash
curl -fsSL https://semantica.sh/install.sh | sh
```

The script:

1. Detects your OS and architecture
2. Downloads the correct binary from GitHub Releases
3. Verifies the checksum against `checksums.txt`
4. Installs to the first writable directory on your `$PATH`, or `~/.local/bin` as a fallback

To install a specific version, set `VERSION` for the `sh` process that runs the script (a variable placed before `curl` would apply only to `curl`):

```bash
curl -fsSL https://semantica.sh/install.sh | VERSION=1.5.2 sh
```

To install to a specific directory, set `INSTALL_DIR` the same way:

```bash
curl -fsSL https://semantica.sh/install.sh | INSTALL_DIR=/usr/local/bin sh
```

---

## From source

Requires Go 1.26 or later.

```bash
git clone https://github.com/semanticash/cli.git
cd cli
make build    # binary at ./bin/semantica
make install  # installs to /usr/local/bin
```

---

## Shell completions

Homebrew installs completions automatically. For shell script or source installs, load them from the CLI:

```bash
# Zsh
source <(semantica completion zsh)

# Bash
source <(semantica completion bash)

# Fish
semantica completion fish | source

# PowerShell
semantica completion powershell
```

To make completions permanent, add the appropriate line to your shell's config file (`.zshrc`, `.bashrc`, etc.).
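If you download a release archive by hand instead of using the install script, the checksum step the script performs can be reproduced manually. The sketch below assumes `checksums.txt` uses the usual `sha256sum` layout (hash, whitespace, file name); `sha256_of` and `verify_checksum` are hypothetical helper names, not part of the install script.

```bash
# Print the SHA-256 digest of a file, using whichever tool is installed
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Succeed only if the archive's digest matches its entry in the checksums file.
# A missing entry counts as a failure.
verify_checksum() {
  local archive="$1" sums="$2" name expected actual
  name=$(basename "$archive")
  expected=$(awk -v f="$name" '$2 == f || $2 == ("*" f) {print $1; exit}' "$sums")
  actual=$(sha256_of "$archive")
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

With both files in the current directory, `verify_checksum ./ARCHIVE ./checksums.txt && echo "checksum ok"` prints the confirmation only on a match, where `ARCHIVE` stands for whichever release file you downloaded.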
---

## Verify the installation

```bash
semantica --version
```

Expected output (your version and commit will differ):

```
semantica v1.5.2 (abc1234)
```

---

## Updating

**Homebrew:**

```bash
brew upgrade semantica
```

**Shell script install:** Re-run the install script. It will overwrite the existing binary.

**Source:** Pull the latest changes and rebuild with `make build && make install`.

---

## Uninstalling

Besides the binary itself, Semantica keeps its data in `.semantica/` directories inside any repos you've enabled and, if you have authenticated, in `~/.config/semantica/`. To fully remove it:

1. Disable Semantica in each repo: `semantica disable`
2. Remove the binary (e.g. `brew uninstall semantica` or `rm "$(command -v semantica)"`)
3. Optionally remove the `.semantica/` directories from your repos
4. Optionally remove `~/.config/semantica/`, which holds authentication credentials

---

## Claude Code

**Canonical path:** /docs/integration-claude-code

**Source file:** `docs/integration-claude-code.md`

# Claude Code

Semantica captures Claude Code session data automatically, with no manual setup beyond `semantica enable`.

---

## How it works

When you run `semantica enable`, Semantica detects Claude Code by checking for its configuration directory and installs a lightweight hook in `.claude/settings.json`. This hook fires at two points in every Claude Code session:

- **`user-prompt-submit`**: records the current position in the session transcript so Semantica knows where to start reading after the session ends
- **`stop`**: reads the new transcript content since the last position, parses tool calls (file writes, reads, edits), and routes that data to the local repo store

Semantica never modifies session logs, conversation history, or any Claude Code data files. The hook only reads.

---

## Attribution quality

Claude Code provides the highest attribution fidelity of any supported provider.
Because Claude Code exposes exact file content from `Write` and `Edit` tool calls, Semantica can perform **line-level exact matching**: - Lines written verbatim by Claude are marked as `exact` - Lines that were AI-generated but lightly edited are marked as `modified` - Lines with only formatting or whitespace changes are marked as `formatted` This results in the most accurate attribution percentages of any provider. --- ## Setup Claude Code is detected and configured automatically during `semantica enable`. If you install Claude Code after enabling Semantica, re-run with `--force`: ```bash semantica enable --force ``` Or add just the Claude Code provider: ```bash semantica enable --providers claude-code ``` To verify Claude Code hooks are installed: ```bash semantica agents --json ``` --- ## MCP integration Semantica also provides an MCP server that Claude Code can call natively. This lets Claude Code search past playbooks and commit explanations during a session, so knowledge from previous work is available in real time. ```bash semantica mcp enable ``` This adds Semantica's MCP server to `.mcp.json` in your repo. Claude Code will automatically discover and use the following tools: | Tool | What it does | |------|-------------| | `semantica_search` | Search past playbooks for solutions relevant to the current task | | `semantica_playbook_use` | Record that a playbook solution was applied | | `semantica_explain` | Get attribution and session details for any commit | --- ## Hook configuration location ``` .claude/settings.json ``` Semantica adds entries under the `hooks` key. If you need to inspect what was installed: ```bash cat .claude/settings.json ``` --- ## Troubleshooting **Sessions not being captured** 1. Confirm the hook is installed: `semantica agents --json` 2. Confirm Semantica is enabled: `semantica status` 3. Check `.semantica/worker.log` for error messages after your next commit 4. 
Re-run `semantica enable --force` to reinstall hooks

**Attribution shows 0% even though I used Claude**

The session data is ingested after the commit by the background worker. If you run `semantica blame HEAD` immediately after committing, the worker may not have finished yet. Wait a moment and run it again.

---

## Integrations providers

**Canonical path:** /docs/integrations-providers

**Source file:** `docs/integrations-providers.md`

# Cursor

Semantica captures Cursor session data automatically once `semantica enable` has installed its hook.

---

## How it works

Semantica installs a hook in `.cursor/hooks.json` that fires at the end of Cursor composer sessions. It reads from Cursor's local storage (the legacy `.vscdb` databases, the modern `ai-code-tracking.db`, and agent transcript JSONL files) to extract which files were modified during the session.

Semantica reads these sources passively and never modifies them.

---

## Attribution quality

Cursor provides **file-level modified attribution**: Semantica can identify which files were touched by the AI agent in a session, and marks lines in those files as AI-attributed. Because Cursor does not expose exact AI-generated content the way Claude Code does, attribution is at the file level rather than individual lines.

---

## Setup

Cursor is detected and configured automatically during `semantica enable`. To reinstall:

```bash
semantica enable --providers cursor
```

## Hook configuration location

```
.cursor/hooks.json
```

---

## MCP integration

```bash
semantica mcp enable
```

Adds Semantica's MCP server to `.cursor/mcp.json`. Cursor will discover `semantica_search`, `semantica_playbook_use`, and `semantica_explain` automatically.

---

# Gemini CLI

Semantica captures Gemini CLI sessions automatically.

---

## How it works

Semantica adds an entry to `~/.gemini/settings.json` (the global Gemini CLI config) that fires hooks at session checkpoints. Session data is read from `~/.gemini/tmp//chats/`.
The hook config is installed globally rather than per-repo, so Gemini CLI sessions are captured in any repo where Semantica is enabled. --- ## Attribution quality Gemini CLI provides **file-level attribution**: Semantica captures which files were modified during a session and attributes those changes accordingly. --- ## Setup Gemini CLI is detected and configured automatically during `semantica enable`. To reinstall: ```bash semantica enable --providers gemini-cli ``` ## Hook configuration location ``` ~/.gemini/settings.json ``` --- ## MCP integration ```bash semantica mcp enable ``` Adds Semantica's MCP server to `~/.gemini/settings.json`. --- --- # GitHub Copilot CLI Semantica captures GitHub Copilot CLI sessions automatically. --- ## How it works Semantica installs a hook in `.github/hooks/semantica.json` (repo-local). This hook fires at Copilot agent session boundaries and reports which files were modified. --- ## Attribution quality Copilot CLI provides **file-level modified attribution**. --- ## Setup Copilot CLI is detected and configured automatically during `semantica enable`. To reinstall: ```bash semantica enable --providers copilot-cli ``` ## Hook configuration location ``` .github/hooks/semantica.json ``` --- ## MCP integration ```bash semantica mcp enable ``` Adds Semantica's MCP server to `.copilot/mcp-config.json`. --- --- # Kiro IDE Semantica supports Kiro IDE, providing repo-local capture with execution trace data. --- ## How it works Semantica installs a hook in `.kiro/hooks/` as a `.kiro.hook` file. Kiro IDE fires this hook at session lifecycle events. Semantica parses Kiro's execution trace format, which includes per-action timestamps and file-level modification data. --- ## Attribution quality Kiro IDE provides **file-level modified attribution** via execution trace parsing. The execution trace includes per-action timestamps, which allows Semantica to precisely correlate agent actions with commits by time window. 
Future versions may support line-level attribution using `modifiedContent` fields present in the trace format. --- ## Setup Kiro IDE is detected and configured automatically during `semantica enable`. To install explicitly: ```bash semantica enable --providers kiro-ide ``` ## Hook configuration location ``` .kiro/hooks/semantica.kiro.hook ``` --- ## MCP integration ```bash semantica mcp enable ``` Adds Semantica's MCP server to `.kiro/settings/mcp.json`. --- --- # Kiro CLI Semantica supports Kiro CLI using a named agent configuration approach. --- ## How it works Semantica installs a named agent config at `.kiro/agents/semantica.json`. When a Kiro CLI session runs with the `semantica` agent config active, Semantica capture and MCP tools are enabled for that session. Kiro CLI's agent configuration includes `PostToolUse` hooks that fire after each tool call, providing rich per-tool-call data including file content from `fs_write` calls. --- ## Attribution quality Kiro CLI provides the richest raw data of any provider. The `PostToolUse` hook gives Semantica access to exact file content from every write operation, making line-level attribution possible in a future version. Currently, Semantica uses **file-level modified attribution**. --- ## Setup Kiro CLI is detected and configured automatically during `semantica enable`. To install explicitly: ```bash semantica enable --providers kiro-cli ``` ### Activating the Semantica agent config To capture a session, the Kiro CLI must be running with the `semantica` agent config: ```bash kiro-cli chat --agent semantica ``` To make it the repo default (so all sessions in this repo use it automatically): ```bash kiro-cli agent set-default semantica ``` ## Hook configuration location ``` .kiro/agents/semantica.json ``` --- ## MCP integration MCP tools are included in the `semantica` agent config and are active whenever the agent is selected. No separate `semantica mcp enable` step is needed for Kiro CLI. 
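The hook locations documented for the providers on this page can be collected into a single lookup, which is handy when checking by hand what `semantica enable` installed. The `hook_config_path` function is a hypothetical helper, not a Semantica command; the provider names are the ones passed to `--providers` above, and the paths are the ones listed in each section.

```bash
# Hypothetical lookup: provider name -> documented hook configuration path.
# Repo-local paths are relative to the repository root; Gemini CLI is global.
hook_config_path() {
  case "$1" in
    cursor)      echo ".cursor/hooks.json" ;;
    gemini-cli)  echo "$HOME/.gemini/settings.json" ;;
    copilot-cli) echo ".github/hooks/semantica.json" ;;
    kiro-ide)    echo ".kiro/hooks/semantica.kiro.hook" ;;
    kiro-cli)    echo ".kiro/agents/semantica.json" ;;
    *)           echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Report which of the documented hook files exist, run from the repo root.
for provider in cursor gemini-cli copilot-cli kiro-ide kiro-cli; do
  path=$(hook_config_path "$provider")
  if [ -e "$path" ]; then
    echo "$provider: $path (present)"
  else
    echo "$provider: $path (missing)"
  fi
done
```

A "missing" result is not necessarily an error: a provider that was not detected on the machine has no hook file to install.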
--- ## Limitations **Canonical path:** /docs/limitations **Source file:** `docs/limitations.md` # Limitations Known constraints and intentional scope boundaries. Feature-specific caveats are documented in their respective pages - this is the cross-cutting summary. --- ## Platform support - Official release targets are **macOS and Linux** (amd64, arm64). There is no Windows release target today. - Clipboard support for `semantica suggest commit` and `semantica suggest pr --copy` requires `pbcopy` (macOS) or `xclip`/`xsel` (Linux). The commands still work without clipboard support - they print to stdout. ## Capture scope - Capture only happens where Semantica hooks are installed. In practice, sessions launched from a Semantica-enabled repo are captured; activity in repos without `semantica enable` is not. - Capture is **per-machine**. Another developer or CI runner working on the same repo needs its own Semantica setup to capture their sessions. - If the CLI is upgraded or the capture state directory (`$SEMANTICA_HOME/capture/`) is cleared mid-session, offset state for in-progress sessions is lost. The worker reconciliation pass recovers what it can, but some events may be missed. ## Git and repo boundaries - **Rewind only affects the working tree.** It does not rewrite Git history, modify the index, or unstage changes. Files are restored on disk only. - Checkpoint manifests include git-tracked files and untracked, non-ignored files. Ignored files are not captured or restored. - Nested repositories are treated as separate ownership scopes - events are routed to the deepest matching repo root. ## Attribution fidelity - Attribution is anchored to captured session data within the checkpoint delta window. Deferred created files can carry forward AI attribution from earlier history when they were already present in the previous commit-linked manifest but committed later. 
- **Provider metadata varies.** Claude Code provides line-level tool call content (Edit/Write payloads), enabling exact and formatted matching. Providers such as Cursor, Kiro IDE, and Kiro CLI may only report file-level tool metadata, which limits attribution to hunk-overlap matching. - Manual edits after AI generation downgrade matches from "exact" to "modified." Mixed human/AI edits in the same hunk are attributed as modified rather than exact. - Carry-forward is per-file, not per-line across windows. If the same file has current-window AI activity, Semantica keeps that file current-window authoritative instead of merging historical and current AI lines inside one file. - Attribution is computed against the diff between checkpoints. Squashed or rebased commits that collapse multiple checkpoints may produce less precise results. ## Playbooks and suggestions - Require at least one supported LLM CLI installed and authenticated: Claude Code (`claude`), Cursor CLI (`agent`), Gemini CLI (`gemini`), or Copilot CLI (`copilot`). - Playbook generation uses bounded diff input to stay within LLM context limits. Commit message and PR suggestions use structured change summaries plus selected per-file excerpts. Large diffs may still produce less precise summaries. - `semantica suggest pr` uses the committed branch diff against the base ref. Uncommitted working-tree changes are not included in the suggestion. - `semantica suggest pr` detects the base branch best-effort. Repos with non-standard default branch names may need `--base` explicitly. - Playbook generation is asynchronous - results are not immediately available after `--generate`. ## Kiro IDE - Kiro IDE hooks do not expose an explicit session ID to external commands. Semantica pairs `promptSubmit` and `agentStop` by workspace-scoped capture state and chooses the session best-effort at prompt submission. 
- If multiple Kiro chats exist for the same workspace, Semantica may still select the wrong one at prompt submission because the hook API does not identify the active chat directly. ## Kiro CLI - Kiro CLI support currently uses a dedicated repo-local agent config at `.kiro/agents/semantica.json`. Semantica capture and MCP are active only when the current Kiro CLI session is using that config. You can select it with `kiro-cli chat --agent semantica`, or make it the repo default with `kiro-cli agent set-default semantica`. - Kiro CLI hook payloads do not expose a conversation ID directly. Semantica pairs `userPromptSubmit` and `stop` by workspace-scoped capture state and resolves the active conversation best-effort from the current workspace. - If `userPromptSubmit` is missed for a turn, the following `stop` event cannot reconstruct the missing boundary for that turn. ## Hosted reporting - GitHub PR comments and check runs require backend provisioning, a GitHub App installation, and CLI authentication. GitLab MR comments and commit statuses require backend provisioning, a project webhook, and CLI authentication. See [github-reporting.md](github-reporting.md) for setup details. - Dashboard sync is best-effort with a 10-second timeout. Failures never block the worker, the commit, or any local feature. - All hosted rendering logic lives in the backend/API, not in this CLI repository. ## Secret redaction - Secret redaction is outbound only. Local raw capture, transcript payloads, and blob content in `.semantica/` remain unchanged. - Detection is best-effort and uses embedded Gitleaks rules. Unknown formats may be missed, and false positives can still remove some prompt context. - If redaction initialization fails, Semantica fails closed for the affected outbound operation instead of sending raw content. ## MCP - Search is only useful after playbooks have been generated. Repos with no playbooks return empty search results. 
- The MCP server is stateless - it reads from `.semantica/lineage.db` on each request. There is no caching or cross-repo aggregation at the MCP layer.

---

## AI provider integrations

**Canonical path:** /docs/providers

**Source file:** `docs/providers.md`

# AI Provider Integrations

Semantica supports six AI coding providers. For each detected provider, Semantica installs hooks in the provider's configuration file: repo-local for most providers, global for Gemini CLI (`~/.gemini/settings.json`). These hooks fire during agent activity (prompt submission, response completion) and route captured events to the repo's lineage database via the broker.

Semantica reads session transcripts passively - it never modifies agent session logs or transcript files.

## Capture model

All providers follow the same high-level hook lifecycle:

1. **`user-prompt-submit`** (or equivalent) - Fired when the user submits a prompt. Semantica saves provider-specific capture state so the matching completion hook can identify the same session, transcript, or workspace boundary.
2. **`stop`** (or equivalent) - Fired when the agent finishes responding. Semantica reuses the pinned capture state, reads new provider data from the provider's transcript, trace store, or database, and routes extracted events to the appropriate repo via the broker.

The exact storage and offset model is provider-specific. Some providers read from transcript offsets, some use provider-managed markers, and Kiro IDE scans execution traces at stop time.

The background worker runs a reconciliation pass (`reconcileActiveSessions`) to flush any sessions that still have pending capture state, but the worker is not the main capture mechanism.

---

## Claude Code

**Hook config**: `.claude/settings.json`

Claude Code stores conversation transcripts as JSONL files in project-specific directories under `~/.claude/projects/`. Each line is a typed event (`system`, `human`, `assistant`, `result`).
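The offset-based variant of the capture lifecycle can be illustrated with a JSONL file that is only ever appended to: remember the file size when the prompt is submitted, then read only the bytes added after that point when the agent stops. This is a simplified sketch under stated assumptions, not Semantica's implementation: it stores a byte offset in a plain state file, and the function names, temporary paths, and the `text` field are hypothetical.

```bash
# Hypothetical helpers: byte-offset bookkeeping for an append-only JSONL file.

# On prompt submission: store the transcript's current size in bytes.
save_offset() {
  local transcript="$1" state="$2"
  wc -c <"$transcript" | tr -d ' ' >"$state"
}

# On stop: print only the content appended since the saved offset.
# With no saved offset, the whole transcript is printed.
read_since_offset() {
  local transcript="$1" state="$2" offset=0
  if [ -f "$state" ]; then
    offset=$(cat "$state")
  fi
  tail -c "+$((offset + 1))" "$transcript"
}

# Usage: only the event written after save_offset is printed.
transcript=$(mktemp)
state=$(mktemp)
printf '%s\n' '{"type":"human","text":"add retry logic"}' >"$transcript"
save_offset "$transcript" "$state"
printf '%s\n' '{"type":"assistant","text":"done"}' >>"$transcript"
read_since_offset "$transcript" "$state"
```

The limitation noted elsewhere in this document follows directly from this shape: if the prompt-submit step is missed, there is no saved boundary, so the stop step cannot tell which part of the transcript belongs to the current turn.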
### Hooks Semantica registers two hooks in `.claude/settings.json`: - **`user-prompt-submit`** - Saves the current transcript offset. - **`stop`** - Reads from the saved offset, extracts events, routes to the repo. ### AI code tracking Claude Code tool calls include file paths and content. Semantica uses `Write` and `Edit` tool calls to build a set of AI-generated code hashes (`ai_code_hashes`). During attribution, each changed line in a commit is compared against these hashes to determine AI authorship. --- ## Cursor **Hook config**: `.cursor/hooks.json` The Cursor provider covers both Cursor IDE and Cursor CLI. Both share the same `.cursor/` configuration directory and hooks file. Cursor stores AI interaction data in multiple formats depending on the version: 1. **Legacy `.vscdb`** - SQLite databases in Cursor's workspace storage directory containing conversation threads and AI completions. 2. **Modern `ai-code-tracking.db`** - A dedicated SQLite database for tracking AI-generated code regions. 3. **Agent JSONL** - Transcript files from Cursor's agent/composer mode, stored as JSONL with tool calls and responses. Semantica scans all three sources during ingestion. ### Detection The Cursor provider is detected by searching for Cursor's application data directory: - macOS: `~/Library/Application Support/Cursor/User/` - Linux: `~/.config/Cursor/User/` ### Hooks Semantica registers hooks in `.cursor/hooks.json` following the same prompt-submit / stop lifecycle as Claude Code. If Cursor IDE is already running when you enable Semantica, it may not pick up changes to `.cursor/hooks.json` immediately. Reload the Cursor window or restart Cursor after `semantica enable` so the new hooks are loaded. ### Limitations - Cursor's internal database format is not a public API and may change between versions. - The legacy `.vscdb` format contains many workspace state entries beyond AI interactions - Semantica filters for relevant keys. 
- Some Cursor agent sessions may not include file-path-level tool call metadata, reducing attribution granularity. --- ## Kiro IDE **Hook config**: `.kiro/hooks/*.kiro.hook` Kiro IDE stores per-workspace session indexes and per-session history files under its application data directory. Semantica reads the workspace session index plus execution traces to capture file operations and route them to the repo. ### Detection Detected by checking for Kiro's globalStorage directory: - macOS: `~/Library/Application Support/Kiro/User/globalStorage/kiro.kiroagent/` - Linux: `~/.config/Kiro/User/globalStorage/kiro.kiroagent/` ### Hooks Semantica installs repo-local Kiro hooks in `.kiro/hooks/` using `runCommand` actions: - **`promptSubmit`** - Resolves and pins the session history reference for the current workspace so the matching stop hook can reuse it. - **`agentStop`** - Scans Kiro's execution trace store for the pinned session, extracts file operations, and routes them to the repo. Unlike Claude, Cursor, Gemini, and Copilot, Kiro IDE does not expose an explicit session ID to external hook commands. Semantica pairs `promptSubmit` and `agentStop` through a workspace-scoped capture-state key and pins the session chosen at prompt submission. At stop time it scans the execution trace store, filters traces back to that session, and relies on deterministic event IDs for idempotent writes. ### Attribution Kiro execution traces include structured file operations such as `create`, `append`, and `smartRelocate`. In the current implementation, Semantica uses these as provider file-edit signals, which gives file-level attribution. Line-level exact matching from Kiro content blobs is reserved for a later iteration. ### Limitations - Kiro IDE hook commands do not receive an explicit session ID, so session selection at prompt submission is still best-effort when multiple Kiro chats exist for the same workspace. - Kiro attribution is currently file-level rather than exact line-level. 
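The idempotent-write idea mentioned for Kiro IDE can be sketched as follows: derive an ID from the fields that identify an event, and skip the write when that ID is already stored, so rescanning the same traces adds nothing new. Everything here is an assumption made for illustration, including the choice of fields (session, file path, timestamp), SHA-256 as the hash, and a flat text file as the store; Semantica's actual event ID scheme is not described in this document.

```bash
# Hypothetical sketch of idempotent writes keyed by a deterministic event ID.

# Hash stdin with whichever SHA-256 tool is installed.
sha256_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum
  else
    shasum -a 256
  fi | awk '{print $1}'
}

# The same inputs always produce the same ID.
event_id() {
  printf '%s\n' "$@" | sha256_stdin
}

# Append the event ID to the store unless it is already there.
# Prints "stored" or "duplicate".
record_event() {
  local store="$1" id
  shift
  id=$(event_id "$@")
  if grep -qxF "$id" "$store" 2>/dev/null; then
    echo "duplicate"
  else
    printf '%s\n' "$id" >>"$store"
    echo "stored"
  fi
}

store=$(mktemp)
record_event "$store" session-1 src/retry.go 1710374400000   # prints "stored"
record_event "$store" session-1 src/retry.go 1710374400000   # prints "duplicate"
```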
### MCP `semantica mcp enable` writes repo-local Kiro IDE MCP config to: - `.kiro/settings/mcp.json` --- ## Kiro CLI **Hook config**: `.kiro/agents/semantica.json` Kiro CLI stores conversation history in a SQLite database and exposes hook payloads as JSON on stdin. Semantica reads the current workspace conversation from the Kiro CLI database, tracks the last processed file-writing tool call in provider-managed sidecar state, and routes new file-writing tool calls to the repo. ### Detection Detected by checking for a Kiro CLI binary on `PATH`: - `kiro-cli` - `kiro` ### Hooks Semantica installs a dedicated repo-local Kiro CLI agent profile at `.kiro/agents/semantica.json` with two hooks: - **`userPromptSubmit`** - Saves the current workspace conversation reference and records the current `fs_write` boundary for that workspace. - **`stop`** - Reuses the pinned conversation when capture state exists, reads `fs_write` calls after the saved boundary from the Kiro CLI database, and routes them to the repo. Kiro CLI hook payloads include `cwd` and `prompt`, but they do not give Semantica an explicit conversation ID. Semantica pairs `userPromptSubmit` and `stop` through a workspace-scoped capture-state key and resolves the active conversation from the current workspace. ### Attribution Kiro CLI currently captures `fs_write` tool calls and turns them into provider file-edit signals. That gives file-level attribution today. Exact line-level matching from Kiro CLI tool content can be added in a later iteration. ### Usage Kiro CLI stores behavior in named agent configs. Semantica installs a repo-local config named `semantica` at `.kiro/agents/semantica.json`. If the current Kiro CLI session uses that config, Semantica capture is active. 
You can select it explicitly: ```bash kiro-cli chat --agent semantica ``` Or make it the default for the current repo so plain `kiro-cli chat` uses it automatically: ```bash kiro-cli agent set-default semantica ``` `semantica mcp enable` adds the Semantica MCP server to the same agent config by writing `mcpServers` into: - `.kiro/agents/semantica.json` ### Limitations - Kiro CLI support in `v1` is tied to the repo-local `semantica` agent config. If Kiro CLI is using some other agent config, Semantica hooks and MCP will not be active for that session. - Kiro CLI hooks do not expose a conversation ID directly, so conversation selection is still best-effort when multiple Kiro CLI chats exist for the same workspace. - If `userPromptSubmit` is missed, the following `stop` event cannot recover the missing offset boundary for that turn. --- ## Gemini CLI **Hook config**: `~/.gemini/settings.json` Gemini CLI stores conversation history as JSON files in project-specific directories under `~/.gemini/tmp/`. Each file represents a complete chat session. ### Detection Detected by checking for the existence of `~/.gemini/tmp/`. The project hash is computed from the repository's absolute path. ### Hooks Semantica registers hooks in `~/.gemini/settings.json` following the same lifecycle pattern as the other providers. --- ## GitHub Copilot **Hook config**: `.github/hooks/semantica.json` Copilot CLI stores session transcripts as JSONL event files at `~/.copilot/session-state//events.jsonl`, alongside a `workspace.yaml` with project metadata. ### Detection Detected by checking for the existence of `~/.copilot`. ### Hooks Semantica installs five hooks in `.github/hooks/semantica.json`: - **`userPromptSubmitted`** - Saves the current transcript offset. - **`agentStop`** - Reads from the saved offset, extracts events, routes to the repo. - **`sessionStart`** - Records session open. - **`sessionEnd`** - Final transcript flush and session close. 
- **`subagentStop`** - Captures sub-agent completions. --- ## Provider detection When you run `semantica enable`, the CLI scans for each provider's data directory. Detected providers are recorded as a string array in `.semantica/settings.json`: ```json { "providers": ["claude-code", "cursor", "kiro-ide", "kiro-cli", "gemini", "copilot"] } ``` Re-run `semantica enable --force` to re-detect providers after installing a new AI tool. Use `semantica agents` to interactively toggle which providers have hooks installed. ## Adding provider support Provider integrations live in `internal/hooks//`. Each provider implements the `HookProvider` interface: detection, hook install/uninstall, event parsing, transcript reading. To add a new provider: 1. Create a package under `internal/hooks//` 2. Implement `HookProvider` (see `internal/hooks/provider.go` for the interface) 3. Call `hooks.RegisterProvider()` in an `init()` function 4. Import the package in `internal/service/worker.go` (blank import for `init()` registration) 5. Optionally add MCP support in `internal/mcp/config.go` --- ## Quickstart **Canonical path:** /docs/quickstart **Source file:** `docs/quickstart.md` # Quickstart Get Semantica tracking AI activity in your repository in under two minutes. --- ## Step 1: Install **Homebrew (macOS)** ```bash brew install semanticash/tap/semantica ``` **Shell script (macOS / Linux)** ```bash curl -fsSL https://semantica.sh/install.sh | sh ``` **Verify the install** ```bash semantica --version ``` --- ## Step 2: Enable in your repo Navigate to an existing Git repository and run: ```bash cd /path/to/your/repo semantica enable ``` This does four things: 1. Creates `.semantica/` with a SQLite database and blob store 2. Installs `pre-commit`, `commit-msg`, and `post-commit` Git hooks 3. Auto-detects installed AI providers and sets up capture hooks for each 4. 
Creates a baseline checkpoint of the current state of your repository You'll see output like: ``` ✓ Initialized .semantica/ ✓ Installed Git hooks (pre-commit, commit-msg, post-commit) ✓ Detected providers: claude-code, cursor ✓ Created baseline checkpoint chk_abc123 Semantica is enabled. Every commit is now tracked. ``` --- ## Step 3: Make a commit Work normally. Use your AI agent as you always would. When you commit: ```bash git add . git commit -m "add authentication module" ``` Semantica runs silently in the background. After a moment, a checkpoint is created, agent session data is ingested, and attribution is computed. The commit message will include a trailer: ``` add authentication module Semantica-Checkpoint: chk_def456 Semantica-Attribution: 78% claude_code (31/40 lines) Semantica-Diagnostics: 3 files, lines: 28 exact, 2 modified, 1 formatted ``` --- ## Step 4: View attribution See what percentage of a commit was AI-generated: ```bash semantica blame HEAD ``` Get a full breakdown of what happened: ```bash semantica explain HEAD ``` See all checkpoints: ```bash semantica list ``` --- ## What's next - **Connect to the dashboard**: `semantica auth login` then `semantica connect` to push attribution to [semantica.sh](https://www.semantica.sh) for team visibility and GitHub PR comments - **Enable auto-playbook**: `semantica set auto-playbook enabled` to generate LLM summaries of every commit - **Enable MCP**: `semantica mcp enable` to let your AI agents search past solutions - **Explore commands**: see the full [Commands reference](/docs/commands) --- ## Release **Canonical path:** /docs/release **Source file:** `docs/release.md` # Release Process This document is for maintainers cutting Semantica releases. Normal contributors do not need it. Semantica uses [GoReleaser](https://goreleaser.com/) to build, package, and publish releases. 
## Targets

Releases are cross-compiled for:

| OS | Architecture |
|----|-------------|
| macOS (darwin) | amd64, arm64 |
| Linux | amd64, arm64 |

Binaries are statically linked (`CGO_ENABLED=0`).

## Distribution channels

### GitHub Releases

Each tagged release creates a GitHub Release with:

- `semantica__.tar.gz` archives
- `checksums.txt` (SHA-256)
- each archive includes:
  - the `semantica` binary
  - shell completion scripts for Bash, Zsh, and Fish
  - `LICENSE`
  - `README.md`
- release notes extracted from `CHANGELOG.md`

### Homebrew

GoReleaser pushes a cask update to `semanticash/homebrew-tap` on each release:

```bash
brew install semanticash/tap/semantica
```

The cask installs the `semantica` binary plus shell completions for Bash, Zsh, and Fish.

### Install script

`install.sh` downloads the latest release from GitHub, verifies the checksum, and installs the binary:

```bash
curl -fsSL https://raw.githubusercontent.com/semanticash/cli/main/install.sh | sh
```

Supports `VERSION` and `INSTALL_DIR` environment variables for pinning.

## Creating a release

1. Ensure `main` is clean and all CI checks pass.

2. Add a matching section to `CHANGELOG.md` for the release version:

   ```md
   ## [0.1.1] - 2026-03-15

   ### Added

   ### Changed

   ### Fixed
   ```

   Write release notes as user-facing bullets grouped under `Added`, `Changed`, and `Fixed`. Do not paste raw commit hashes or a commit-by-commit dump.

3. Tag the release:

   ```bash
   git tag -a v0.1.1 -m "v0.1.1"
   git push origin v0.1.1
   ```

4. GoReleaser runs via GitHub Actions on tag push. It:
   - Builds binaries for all targets
   - Creates the GitHub Release with archives and checksums
   - Updates the Homebrew tap cask
   - Extracts the matching `CHANGELOG.md` entry and uses it as the GitHub release body

If the release workflow cannot find a `CHANGELOG.md` section for the tag version, it fails instead of publishing raw commit-message notes.
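The changelog check in the release workflow can be approximated locally before tagging. The sketch below is not the workflow's actual script: `extract_changelog` is a hypothetical helper that assumes version headings of the form `## [X.Y.Z] - DATE`, as in the example above. It prints the body of the matching section and returns non-zero when the section is missing.

```bash
# Hypothetical helper: print the CHANGELOG.md section for one version.
# Returns non-zero if no "## [version]" heading exists in the file.
extract_changelog() {
  local version="$1" file="$2"
  awk -v v="$version" '
    /^## \[/ {
      # Every version heading ends the previous section and may start ours.
      in_section = (index($0, "## [" v "]") == 1)
      if (in_section) found = 1
      next
    }
    in_section { print }
    END { exit (found ? 0 : 1) }
  ' "$file"
}
```

Tags carry a leading `v` while changelog headings do not, so for a tag such as `v0.1.1` strip the prefix first: `tag=v0.1.1; extract_changelog "${tag#v}" CHANGELOG.md`. Matching on the closing bracket keeps `0.1.1` from also matching a later `0.1.10` heading.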
## Version injection

The version metadata is injected at build time via linker flags:

```
-X github.com/semanticash/cli/internal/version.Version=
-X github.com/semanticash/cli/internal/version.Commit=
```

`make build` uses `git describe --tags --always --dirty` for the version and `git rev-parse --short HEAD` for the commit. GoReleaser uses the tag version and short commit for releases.

`semantica --version` prints the injected version and commit.

## CI checks

Every push and PR runs these checks (see `.github/workflows/ci.yml`):

| Job | What it does |
|-----|-------------|
| `generated` | Regenerates sqlc code and checks for drift |
| `test` | Unit tests with race detector |
| `lint` | golangci-lint v2.11.3, installed with the job's Go toolchain |
| `build` | GoReleaser cross-compile check (`--snapshot`) |
| `e2e` | End-to-end tests against compiled binary |
| `shellcheck` | Validates `install.sh` |

## Configuration

Release configuration lives in `.goreleaser.yaml`. Key settings:

- Archives use `semantica__` naming
- `make completions` runs before release to generate shell completion scripts
- Release archives bundle the generated completion scripts plus `LICENSE` and `README.md`
- GitHub release notes are extracted from the matching `CHANGELOG.md` section during the release workflow
- Homebrew cask installs Bash, Zsh, and Fish completions from the bundled `completions/` files
- Homebrew cask updates require the `HOMEBREW_TAP_TOKEN` secret for tap repo access

---

## Dashboard & repositories

**Canonical path:** /docs/semantica-io-dashboard-repos-sessions

**Source file:** `docs/semantica-io-dashboard-repos-sessions.md`

# Dashboard

The Semantica dashboard at [semantica.sh](https://www.semantica.sh) provides a team-wide view of AI attribution across connected repositories.

---

## What the dashboard shows

- **Attribution trend**: AI percentage over time across all connected repos, broken down by provider
- **Per-repo summaries**: Recent commits, average AI attribution, active providers, session counts
- **PR activity**: Open and recently merged PRs with attribution scores
- **Policy status**: Which repos have check runs configured and at what thresholds

---

## Accessing the dashboard

1. Install the Semantica GitHub App from the dashboard (for GitHub integration)
2. Run `semantica auth login` in your terminal
3. Run `semantica connect` in each repo you want to appear

Once connected, repos will appear on the dashboard within a few minutes of the next commit.

---

## Navigation

The dashboard sidebar organizes data into:

- **Overview**: cross-repo summary and activity feed
- **Repositories**: per-repo detail views
- **Checkpoints**: browsable checkpoint history with file manifests
- **Sessions**: agent session browser with transcript links

---

# Repositories

The Repositories section of the dashboard shows all repos connected to your account.

---

## Connecting a repository

```bash
cd /path/to/repo
semantica auth login   # if not already authenticated
semantica connect
```

The repo appears in the dashboard after the next commit that pushes attribution data.

---

## Repository detail view

Each repo page shows:

- **Attribution trend**: AI % by commit over the last 30/90 days
- **Provider breakdown**: which AI agents contributed to this repo and in what proportion
- **Recent commits**: list with attribution, linked to their explain output
- **Check run status**: current policy mode and threshold settings
- **Sessions**: recent agent sessions linked to this repo

---

## Disconnecting a repository

```bash
semantica disconnect
```

Or disconnect from the dashboard UI. Disconnecting stops new attribution data from being pushed. Historical data remains in the dashboard until explicitly deleted.

---

# Checkpoints (Dashboard)

The Checkpoints view in the dashboard provides a browsable history of every checkpoint from connected repositories.

---

## Browsing checkpoints

Each checkpoint entry shows:

- Checkpoint ID and timestamp
- Kind (auto, manual, baseline, safety)
- Associated commit hash and subject
- File count and manifest hash
- AI attribution percentage (for auto checkpoints)

Click any checkpoint to see its full file manifest: every file that was in the repository at that moment, with its content hash.

---

## Checkpoint timeline

The timeline view shows checkpoints as a vertical history, grouped by date. Commits with high AI attribution are visually highlighted.

---

# Sessions (Dashboard)

The Sessions view shows all agent sessions captured from connected repositories.

---

## What a session shows

Each session entry includes:

- Session ID and provider (e.g. `claude_code`, `cursor`, `kiro_ide`)
- Start time and duration
- Number of events (user messages, assistant responses, tool calls)
- Files touched during the session
- Token usage (when available)
- Linked checkpoints and commits

---

## Viewing transcripts

Click **View Transcript** on any session to see the full event stream: user messages, assistant responses, and every tool call, including the files read, written, and executed.

Transcript data is stored locally on your machine and is never uploaded to the backend. The dashboard transcript viewer reads from the CLI's local blob store via a local server component.

> **Note:** Transcript viewing in the dashboard requires the Semantica CLI to be running locally. If you're viewing a repo from another machine or team member, transcripts will show as unavailable.

---

## Filtering sessions

Filter by:

- Provider
- Date range
- Linked commit or checkpoint
- File touched
- Minimum event count

---

## Semantica.io overview

**Canonical path:** /docs/semantica-io-overview

**Source file:** `docs/semantica-io-overview.md`

# Semantica.io Overview

[Semantica.io](https://www.semantica.sh) is the hosted platform that extends the CLI's local capabilities into team-wide visibility and GitHub/GitLab workflow integration.

The CLI works fully offline without any account. Semantica.io is optional.

---

## What you get with Semantica.io

### GitHub integration

- **PR attribution comments**: Semantica posts a comment on every pull request showing AI attribution per file and overall
- **GitHub check runs**: A check run is created for every PR with configurable pass/fail thresholds based on AI attribution percentage
- **Policy enforcement**: When configured as a required status check in branch protection, Semantica can block merges that exceed AI attribution thresholds

### GitLab integration

- **MR comments**: Attribution results posted as merge request comments
- **Commit statuses**: Attribution percentage as a GitLab commit status

### Dashboard

- Aggregate AI attribution trends across connected repositories
- Per-repo and per-commit attribution history
- Session and playbook data surfaced for team review

---

## Getting started with Semantica.io

### Step 1: Authenticate

```bash
semantica auth login
```

This opens a browser for OAuth authorization via GitHub or GitLab and polls until complete. You only need to do this once globally.

### Step 2: Connect a repo

In the repository you want to sync:

```bash
semantica connect
```

This registers the repo with the backend and sets `connected: true` in `.semantica/settings.json`. Attribution data from future commits will be pushed automatically.
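If a script needs to know whether a repo has been connected, one option is to read the flag back from the settings file. The sketch below is an assumption-laden illustration, not part of the Semantica CLI: it assumes `connected` is a top-level boolean key in `.semantica/settings.json`, and `is_connected` is a hypothetical helper name.

```python
import json
from pathlib import Path


def is_connected(repo_root: str) -> bool:
    """Report whether .semantica/settings.json marks the repo as connected.

    Assumption: the file holds a JSON object with a top-level boolean
    "connected" key. A missing file or key is treated as not connected.
    """
    settings_path = Path(repo_root) / ".semantica" / "settings.json"
    if not settings_path.is_file():
        return False
    settings = json.loads(settings_path.read_text(encoding="utf-8"))
    # Guard against a settings file whose top level is not an object.
    if not isinstance(settings, dict):
        return False
    return settings.get("connected") is True
```

Treating a missing file as "not connected" matches the fact that the CLI works offline without any account.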
### Step 3: Install the GitHub App

For GitHub PR comments and check runs, install the Semantica GitHub App on your organization or repository from the dashboard at [semantica.sh](https://www.semantica.sh).

For GitLab, configure a project webhook from the dashboard.

---

## Check runs and policy enforcement

Semantica's GitHub check run reports AI attribution on every PR. The check defaults to `neutral` (informational only). To make it enforce a policy, configure thresholds:

| Mode | Behavior |
|------|----------|
| `off` | No check run created |
| `informational` | Check always passes; reports attribution as context |
| `blocking` | Check fails if attribution exceeds configured thresholds |

Default thresholds in blocking mode:

- Warn at **75%** AI attribution
- Fail at **90%** AI attribution

When Semantica's check is added as a required status check in GitHub branch protection, PRs exceeding the fail threshold cannot be merged until a human reviewer approves them.

---

## Data sent to the backend

When a repo is connected, Semantica pushes:

- Attribution summaries (AI percentage, line counts, file counts per commit)
- Commit metadata (hash, subject, author, timestamp)
- Session identifiers and provider names

Semantica does **not** push:

- File contents or diffs
- Agent transcripts or session logs
- Playbook text
- Anything from `.semantica/objects/`

Before any data leaves the machine, Semantica applies secret redaction using embedded Gitleaks patterns. If the redactor cannot initialize, the outbound operation is blocked rather than sending unredacted data.

Remote URL fields are sanitized to strip embedded credentials, query strings, and fragments before upload.

---

## Disconnecting

To stop syncing a repo:

```bash
semantica disconnect
```

This sets `connected: false` in `.semantica/settings.json`. No data is deleted from the backend automatically; contact support if you need data removal.

To remove authentication entirely:

```bash
semantica auth logout
```