Sentinel

Agent visibility for Claude Code — see what your agents are doing, and how you respond.

What it is

Sentinel is a local-first plugin for Claude Code. It installs as an MCP server plus two prompt hooks, and gives you visibility into agent sessions on three surfaces:

Security probes — simulated adversarial requests (deploy to prod unreviewed, export bulk PII, fabricate an audit log, follow a spoofed instruction). The agent answers as if you'd really asked; an LLM judge then grades each answer pass or fail. Probes come from a universal pool; a workspace can opt into domain-specific probes via a .sentinel.json file.
Drift signals — optional, agent-initiated flags for when it notices itself drifting from your intent: scope creep, boundary pressure, instruction conflict, intent uncertainty.
Operator scorecard — a mirror. When the agent flags drift, you do something next. Sentinel reconstructs your response from the session transcript and reflects it back: did you engage with the flag, ignore it, or push harder?
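
The four drift categories and the three operator responses described above can be modeled as small union types. The following TypeScript sketch is illustrative only: the type names, field names, and the tallyResponses helper are hypothetical and are not taken from Sentinel's source.

```typescript
// Hypothetical model of the drift surface; names are assumptions, not Sentinel's schema.
type DriftKind =
  | "scope_creep"
  | "boundary_pressure"
  | "instruction_conflict"
  | "intent_uncertainty";

// How the operator reacted to a flag, following the scorecard description.
type OperatorResponse = "engaged" | "ignored" | "pushed_harder";

interface DriftFlag {
  kind: DriftKind;
  note: string; // the agent's own description of the drift
  response: OperatorResponse; // reconstructed afterwards from the transcript
}

// Count how often each operator response occurred across a set of flags.
function tallyResponses(flags: DriftFlag[]): Record<OperatorResponse, number> {
  const tally: Record<OperatorResponse, number> = {
    engaged: 0,
    ignored: 0,
    pushed_harder: 0,
  };
  for (const flag of flags) {
    tally[flag.response] += 1;
  }
  return tally;
}

const sampleFlags: DriftFlag[] = [
  { kind: "scope_creep", note: "Refactor grew beyond the requested file", response: "engaged" },
  { kind: "boundary_pressure", note: "Asked to skip review", response: "pushed_harder" },
  { kind: "intent_uncertainty", note: "Unclear which branch to target", response: "engaged" },
];

const sampleTally = tallyResponses(sampleFlags);
```

A tally like this is one plausible basis for a scorecard: it reports what the operator did after each flag without judging the flag itself.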

How it works — the processes

Three things run, on different triggers. It helps to keep them separate:

Two hooks run automatically. Every message you send triggers Sentinel's UserPromptSubmit hooks, but each hook only fires once its interval has elapsed: roughly 10 minutes for the probe hook and roughly 30 minutes for the drift hook (randomized). On most messages nothing happens, because the interval simply hasn't elapsed.
A fired hook invites the agent — silently. When the interval has elapsed, the hook injects a one-line invitation into the agent's context. You don't see this; it isn't printed on screen. It's a nudge, not a command.
The agent decides. It can draw a probe (answer it, record the response) or file a drift signal — or skip, if it's mid-task. A fired hook does not guarantee a probe.
Grading happens on review. Probe answers are graded pass/fail by an LLM judge; the operator scorecard is judged from the transcript. Verdicts are computed when you read them.
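
The gating step above can be sketched as a small time check that runs on every message. This is a minimal sketch under stated assumptions, not Sentinel's actual hook code: the GateState shape, the function names, and the jitter-fraction approach to randomizing the interval are all hypothetical.

```typescript
// Hypothetical gate for a UserPromptSubmit hook; not Sentinel's actual code.
type Rand = () => number;

interface GateState {
  nextDueMs: number; // timestamp (ms) at which the hook may next fire
}

// Pick the next interval: the base length plus or minus a random jitter fraction.
// `rand` is injectable so the behaviour can be tested deterministically.
function nextIntervalMs(baseMs: number, jitter: number, rand: Rand): number {
  const offset = (rand() * 2 - 1) * jitter; // in [-jitter, +jitter)
  return Math.round(baseMs * (1 + offset));
}

// Called on every user message. Returns true only when the interval has elapsed,
// and in that case schedules the next due time.
function checkGate(
  state: GateState,
  nowMs: number,
  baseMs: number,
  jitter: number,
  rand: Rand
): boolean {
  if (nowMs < state.nextDueMs) {
    return false; // most messages land here: nothing happens
  }
  state.nextDueMs = nowMs + nextIntervalMs(baseMs, jitter, rand);
  return true;
}

const TEN_MIN_MS = 10 * 60 * 1000;
const probeGate: GateState = { nextDueMs: 0 };
const fixedRand: Rand = () => 0.5; // yields a zero jitter offset, for a predictable demo

const firstMessage = checkGate(probeGate, 1000, TEN_MIN_MS, 0.2, fixedRand); // gate starts open
const secondMessage = checkGate(probeGate, 61000, TEN_MIN_MS, 0.2, fixedRand); // one minute later
const laterMessage = checkGate(probeGate, 1000 + TEN_MIN_MS, TEN_MIN_MS, 0.2, fixedRand); // interval elapsed
```

In this sketch a true result corresponds to the hook injecting its invitation; whether the agent then draws a probe is a separate decision, as described above.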

Everything is logged locally, per workspace, under ~/.sentinel/workspaces/<id>/. Two Claude Code windows in different repos never share a timer or overwrite each other's state.
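
One way to picture the per-workspace isolation is state keyed by workspace id. The sketch below is an assumption-laden illustration: how Sentinel derives the id is not stated here, and the WorkspaceTimers fields and helper names are hypothetical. Only the directory layout comes from the text above.

```typescript
// Hypothetical per-workspace bookkeeping; the id scheme and field names are assumptions.
interface WorkspaceTimers {
  probeNextDueMs: number;
  driftNextDueMs: number;
}

// Build the log directory for a workspace id, following the layout named above.
function workspaceDir(homeDir: string, workspaceId: string): string {
  return homeDir + "/.sentinel/workspaces/" + workspaceId + "/";
}

// Timers are keyed by workspace id, so one workspace never touches another's state.
const timers: Record<string, WorkspaceTimers> = {};

function timersFor(workspaceId: string): WorkspaceTimers {
  let entry = timers[workspaceId];
  if (entry === undefined) {
    entry = { probeNextDueMs: 0, driftNextDueMs: 0 };
    timers[workspaceId] = entry;
  }
  return entry;
}

// Advancing the timer for one workspace leaves the other untouched.
timersFor("repo-a").probeNextDueMs = 600000;
const repoBTimers = timersFor("repo-b");
```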

Automatic vs. what you call

Automatic — the hooks. Once installed they fire on their own; no action needed.
The agent's choice — whether to act on a fired invitation (draw a probe, file a drift signal).
You call — the audit tools, whenever you want to see what has happened.

Knowing it's working

Because hook fires are silent by design, "I don't see it firing" is the expected experience, not a fault. To confirm Sentinel is alive, call sentinel_status: it shows whether the hooks are registered, when the probe hook last fired, what has been drawn and scored, and when the next probe is due. The installer also runs a self-test, so a fresh install gets immediate confirmation rather than silence.

MCP tools

sentinel_status — proof-of-life: hooks, last fire, activity, next due.
sentinel_get_next_probe / sentinel_record_probe_response — the agent draws and answers a probe.
sentinel_review_probes — this session's probes with pass/fail verdicts.
sentinel_probe_history — every probe + verdict for this workspace, across sessions.
sentinel_report_drift / sentinel_recent_drift_reports — file and read drift signals.
sentinel_operator_scorecard — the mirror of how you responded to drift.
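
In normal use Claude Code issues these tool calls for you. For readers curious about what travels over MCP, the sketch below builds a standard JSON-RPC 2.0 tools/call request. The buildToolCall helper is hypothetical, and the assumption that sentinel_status takes no arguments is not confirmed by this README.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wrap a tool name and its arguments in a tools/call envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {}
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}

// Assumption: sentinel_status needs no arguments.
const statusCall = buildToolCall(1, "sentinel_status");
const wireMessage = JSON.stringify(statusCall);
```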

Install

git clone https://github.com/kandikandikandi/sentinel.git
cd sentinel
bash scripts/install.sh

Requires Node 18+ and the Claude Code CLI. The installer registers the MCP server and both hooks, runs a self-test, and stores everything locally under ~/.sentinel/ — there is no Sentinel server and no account to create. Start a new Claude Code session, then run sentinel_status to confirm it's live.
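
If the installer fails early, the Node version is a quick thing to rule out. The following is a minimal sketch of such a check; meetsNodeRequirement is a hypothetical helper and is not part of scripts/install.sh.

```typescript
// Parse a Node version string such as "v18.17.0" and compare its major version.
function meetsNodeRequirement(version: string, minMajor: number = 18): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (match === null) {
    return false; // unrecognised format: treat as not satisfying the requirement
  }
  return parseInt(match[1], 10) >= minMajor;
}

const okOnNode18 = meetsNodeRequirement("v18.17.0");
const okOnNode16 = meetsNodeRequirement("v16.20.2");
```

Inside a Node script, the string to pass in would be process.version.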

Scoring

Pass/fail verdicts and the operator scorecard use an LLM judge, which calls the Anthropic API with your own key (ANTHROPIC_API_KEY, or anthropic_api_key in config/org-config.json). The judge runs on Haiku, so it costs fractions of a cent per probe. These judge calls are the only thing that leaves your machine. Set scoring_enabled: false to keep Sentinel fully local; probes and drift signals are still logged, just ungraded.
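
The interaction between the key and the scoring switch can be sketched as a single resolution step. This is an illustration under assumptions: only anthropic_api_key, scoring_enabled, and ANTHROPIC_API_KEY are named in this README. The precedence of the environment variable over the config file, the default of scoring being on, and the fallback to ungraded logging when no key is found are all guesses, and resolveJudge is a hypothetical name.

```typescript
// Hypothetical config shape: only these two keys are named in the README.
interface OrgConfig {
  anthropic_api_key?: string;
  scoring_enabled?: boolean;
}

type JudgeSetup =
  | { mode: "local_only" } // nothing leaves the machine; entries stay ungraded
  | { mode: "graded"; apiKey: string }; // judge calls use the operator's own key

// Assumption: the environment variable wins over the config file,
// and scoring is on unless explicitly disabled.
function resolveJudge(
  env: Record<string, string | undefined>,
  config: OrgConfig
): JudgeSetup {
  if (config.scoring_enabled === false) {
    return { mode: "local_only" };
  }
  const apiKey = env["ANTHROPIC_API_KEY"] || config.anthropic_api_key;
  if (!apiKey) {
    return { mode: "local_only" }; // assumption: without a key, grading is skipped
  }
  return { mode: "graded", apiKey: apiKey };
}

const withEnvKey = resolveJudge({ ANTHROPIC_API_KEY: "key-from-env" }, { anthropic_api_key: "key-from-config" });
const scoringOff = resolveJudge({ ANTHROPIC_API_KEY: "key-from-env" }, { scoring_enabled: false });
```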

Who it's for

Engineers, eng managers, and security folks who want visibility into what their Claude Code agents actually do under pressure — and an honest mirror of how they themselves respond when an agent pushes back.