
VERDICT — Architecture

Architecture diagram with trust boundaries, distinguishing prompt-based guardrails from architectural guardrails.

This document is the single-page visual summary operators reach first. It is the public architecture source for VERDICT's seven-layer product shape, credential modes, and Claude Code primary-interface model.


Architectural pattern claimed (under Amendment A2)

VERDICT combines two architectural patterns:

  1. Direct Agent Extension: Claude Code IS the agent. The operator runs scripts/verdict <evidence> for the one-shot path, or claude / scripts/find-evil at the repo root for interactive exploration; .mcp.json auto-spawns both MCP servers; Claude Code drives the investigation as supervisor + Pool A/B subagents (native Task mechanism; CLAUDE_CODE_FORK_SUBAGENT is a build-time internal and is not used in this product).
  2. Custom MCP Server: two purpose-built MCP servers expose the typed tool surface:
       • findevil-mcp (Rust): 31 DFIR primitives (core Windows memory/disk/log/network verbs plus allow-listed long-tail wrappers such as vol_run, ez_parse, plaso_parse, mac_triage, and cloud_audit). Read-only on evidence; SHA-256 every output. NO execute_shell.
       • findevil-agent-mcp (Python): 12 crypto + ACH + memory + ACP + expert-feedback tools (audit_append/verify, manifest_finalize/verify, verify_finding, detect_contradictions, judge_findings, correlate_findings, memory_remember/recall, pool_handoff, expert_miss_capture). The pre-A5 ots_stamp/ots_verify pair was removed.

The combination is the architectural claim: Claude Code's agent loop never touches a raw shell because the only verbs it has are MCP-typed function calls into one of the two servers.

Maturity note. The 31 Rust verbs are implemented as a typed, allow-listed surface. The long-tail verbs vol_run, ez_parse, plaso_parse, mac_triage, cloud_audit, journalctl_query, login_accounting, ausearch, nfdump_query, suricata_eve, and indx_parse are fixture-tested but not yet exercised on real evidence in a committed run; the committed sample runs prove the core disk/registry/EVTX/MFT/Prefetch/YARA/USN/Hayabusa/Sysmon/Zeek/PCAP, vol_*, vel_collect, and browser_history paths.


Relationship to Protocol SIFT

VERDICT runs on the same SANS SIFT VM (sift-2026.03.24.ova) that Protocol SIFT operates on — they are not in conflict.

Deliberate divergence in the MCP surface:

| Aspect | VERDICT | Protocol SIFT gateway |
|---|---|---|
| Product MCP servers | 2 typed, audit-chained servers (findevil-mcp, findevil-agent-mcp); .mcp.json registers 6 servers total including 4 non-product operator conveniences | 1 gateway (200+ shell-backed tools) |
| Tool count | 43 (31 Rust DFIR + 12 Python crypto/ACH/memory/ACP/expert) | 200+ (dynamic, shell coverage) |
| Shell surface | None (NO execute_shell) | Broad: gateway is a shell pass-through |
| Use case | Repeatable DFIR mechanics for evidence investigation | General-purpose bot connectivity |
| Installation | No conflicts: separate MCP registrations | protocol-sift install installs the gateway independently |

After protocol-sift install on a SIFT VM, both VERDICT's narrow typed surface and Protocol SIFT's broad shell-backed gateway coexist. Operators choose which agent interface to use per investigation; neither requires nor conflicts with the other.

The narrow surface is intentional: it reduces the attack surface from "full shell access" to 31 named Rust DFIR operations and 12 Python cryptographic/ACH/memory/ACP/expert operations, enabling an architectural argument that the agent loop never touches shell primitives directly — all actions flow through typed JSON-RPC schema validation.
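The schema-validation argument above can be illustrated with a minimal Python sketch. It is not VERDICT's code: the `TOOL_SCHEMAS` table, the argument names (`path`, `event_id`, `event`), and `validate_tool_call` are hypothetical, and plain stdlib checks stand in for the Rust type system and the Pydantic models. Only the `tools/call` request shape (a `name` plus an `arguments` object) follows the MCP convention.

```python
import json

# Hypothetical, illustration-only schemas: tool name -> {argument name: type}.
# The real argument shapes of evtx_query and audit_append are not given here.
TOOL_SCHEMAS = {
    "evtx_query": {"path": str, "event_id": int},
    "audit_append": {"event": str},
}


class ToolCallRejected(ValueError):
    """Raised when a request does not match the fixed, typed surface."""


def validate_tool_call(raw: str) -> tuple:
    """Validate a JSON-RPC 2.0 tools/call request against the fixed schemas."""
    req = json.loads(raw)
    if req.get("jsonrpc") != "2.0" or req.get("method") != "tools/call":
        raise ToolCallRejected("not a JSON-RPC 2.0 tools/call request")
    params = req.get("params", {})
    name = params.get("name")
    args = params.get("arguments", {})
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        # No catch-all verb exists, so an unknown name has nowhere to go.
        raise ToolCallRejected(f"unknown tool: {name!r}")
    extra = set(args) - set(schema)
    if extra:
        # Mirrors the spirit of extra="forbid": unknown fields are errors.
        raise ToolCallRejected(f"unknown fields: {sorted(extra)}")
    for field, typ in schema.items():
        if not isinstance(args.get(field), typ):
            raise ToolCallRejected(f"{field} must be {typ.__name__}")
    return name, args
```

In this sketch a request naming `execute_shell` fails at the registry lookup, and a request that smuggles an extra field fails before any handler would run.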


Runtime architecture (the Product that operators run)

flowchart TB
    subgraph Trust0["**TRUST BOUNDARY 0** — Evidence Vault (read-only)"]
        Evidence["/evidence/case-id/<br/>Original .e01<br/>SHA-256 verified<br/>chmod 444 / mount -o ro"]
    end

    subgraph Trust1["**TRUST BOUNDARY 1** — SIFT Tool Subprocesses (unprivileged, sandboxed)"]
        Hayabusa[Hayabusa<br/>AGPL-3.0<br/>subprocess]
        Chainsaw[Chainsaw v2<br/>GPL-2.0<br/>subprocess]
        Volatility[Volatility3<br/>AGPL-3.0<br/>subprocess]
        Velociraptor[Velociraptor<br/>AGPL-3.0<br/>gRPC subprocess]
        YARA[YARA + Forge Core<br/>subprocess scan]
    end

    subgraph Trust2["**TRUST BOUNDARY 2** — Two MCP Servers (typed tool surface)"]
        RustMcp["**findevil-mcp** (Rust, hand-rolled MCP 2024-11-05)<br/>31 typed DFIR tools<br/>NO execute_shell<br/>---<br/>core Windows memory/disk/log/network verbs<br/>+ allow-listed long-tail wrappers"]
        AgentMcp["**findevil-agent-mcp** (Python, mcp SDK 1.x)<br/>12 typed crypto/ACH/memory/ACP/expert-feedback tools<br/>---<br/>audit_append/verify,<br/>manifest_finalize/verify,<br/>verify_finding,<br/>detect_contradictions,<br/>judge_findings,<br/>correlate_findings,<br/>memory_remember/recall,<br/>pool_handoff,<br/>expert_miss_capture"]
        EvtxCrate["evtx crate<br/>MIT, in-process<br/>~1600× python-evtx (upstream)"]
        Merkle["hand-rolled Merkle<br/>(rs_merkle-compatible semantics)<br/>append-only tree"]
        DuckDB["DuckDB L1 case store<br/>(path reserved, not yet initialized)"]
    end

    subgraph Trust3["**TRUST BOUNDARY 3** — Claude Code agent loop (A2 — replaces LangGraph)"]
        Supervisor["Claude Code main agent<br/>= supervisor<br/>reads agent-config/SOUL.md<br/>+ AGENTS.md + MEMORY.md"]
        PoolA["Pool A subagent<br/>(native Task mechanism)<br/>persistence-biased prompt:<br/>Tasks, Services, WMI,<br/>Run, IFEO, LOLBins"]
        PoolB["Pool B subagent<br/>(native Task mechanism)<br/>exfil-biased prompt:<br/>net connections, staging,<br/>certutil/bitsadmin, cloud sync,<br/>USB writes"]
        Contradiction["detect_contradictions<br/>(MCP tool call into agent_mcp)<br/>FIRES BEFORE JUDGE"]
        Judge["judge_findings<br/>credibility-weighted<br/>Estornell ICML 2025"]
        Verifier["verify_finding<br/>re-executes cited tool calls<br/>vetoes uncited Findings"]
        Correlator["correlate_findings<br/>≥2 artifact classes<br/>for execution claims"]
    end

    subgraph Trust4["**TRUST BOUNDARY 4** — Crypto Custody (M2)"]
        SignerTier["Signer tier<br/>Ed25519 default<br/>Sigstore identity tier<br/>stub blocks release"]
        AuditJSONL["audit.jsonl<br/>hash-chained, append-only<br/>prev_hash per line"]
        Manifest["run.manifest.json<br/>signs Merkle root +<br/>audit-log final hash"]
    end

    subgraph Trust5["**TRUST BOUNDARY 5** — Presentation"]
        Terminal["Claude Code terminal<br/>findings / contradictions /<br/>plans rendered as text<br/>(primary UX under A2)"]
        VerdictSh["scripts/verdict<br/>canonical one-shot launcher<br/>preflight + investigate + report"]
        AutoEngine["internal automation engine<br/>find-evil-auto<br/>non-interactive by default"]
        FindEvilSh["scripts/find-evil<br/>interactive helper<br/>= claude (in cwd)"]
        NextJS["Next.js 15 SPA<br/>SSE audit-tail route + debug viewer<br/>local operator aid"]
        MCPWidgets["MCP App widgets SEP-1865<br/>(deferred — week-7 bonus)"]
    end

    Human((Analyst /<br/>Operator)) -->|scripts/verdict| VerdictSh
    Human -->|scripts/find-evil| FindEvilSh
    Human -->|claude| Terminal

    VerdictSh --> AutoEngine
    AutoEngine --> Supervisor
    FindEvilSh --> Terminal
    Terminal --> Supervisor

    Evidence -.->|read-only mount| Trust1
    Trust1 -->|stdout parsed<br/>subprocess boundary| RustMcp
    RustMcp -->|typed JSON-RPC<br/>stdio transport| Trust3
    AgentMcp -->|typed JSON-RPC<br/>stdio transport| Trust3

    Supervisor --> PoolA
    Supervisor --> PoolB
    PoolA --> Contradiction
    PoolB --> Contradiction
    Contradiction -->|ContradictionFound<br/>event surfaced FIRST| Terminal
    Contradiction --> Judge
    Judge --> Verifier
    Verifier --> Correlator
    Correlator --> Trust4

    Trust3 --> SignerTier
    RustMcp -.->|tool output digest<br/>becomes Merkle leaf| Merkle
    AgentMcp -->|audit_append| AuditJSONL
    AgentMcp -->|manifest_finalize| Manifest
    AuditJSONL -->|leaves + final hash| Manifest
    Merkle --> Manifest
    SignerTier --> Manifest

    Human -->|approve / reject<br/>plan + contradictions| Trust3

    style Trust0 fill:#e8f5e9,stroke:#2e7d32,stroke-width:3px
    style Trust1 fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    style Trust2 fill:#e3f2fd,stroke:#1565c0,stroke-width:3px
    style Trust3 fill:#f3e5f5,stroke:#6a1b9a,stroke-width:2px
    style Trust4 fill:#fffde7,stroke:#f9a825,stroke-width:3px
    style Trust5 fill:#fce4ec,stroke:#ad1457,stroke-width:2px

Trust boundary legend

| # | Boundary | Enforcement mechanism | Type |
|---|---|---|---|
| 0 | Evidence vault | Architectural (shipped): originals opened read-only (libewf for .e01); SHA-256 fingerprinted at case_open and re-checked at every verifier replay; no write verb exists anywhere in the 43-tool product surface. Hardened-deployment posture (recommended, not code-enforced): mount -o ro + chmod 444 on the vault, inotifywait write-monitoring | Code-enforced today; filesystem hardening is operator posture |
| 1 | SIFT tool subprocesses | Architectural (shipped): unprivileged user (no root, no CAP_SYS_ADMIN); fixed-argv invocation via Command::new(bin).args([...]), never sh -c, so a path/arg is never shell-parsed (adversarially pinned by services/mcp/tests/bypass_paths.rs). Roadmap (documented, not yet enforced in code): per-call wall-clock budget, cpulimit, tmpfs work dir, binary allowlist | Process-enforced today; resource sandboxing is roadmap |
| 2 | Two typed MCP servers | Architectural: Rust findevil-mcp type system forbids execute_shell; Python findevil-agent-mcp Pydantic input models use extra="forbid"; tool surfaces fixed at compile/build time. Adding a shell passthrough would require a code change + PR + review | Compiler/schema-enforced |
| 3 | Claude Code agent loop | Mixed: agent system prompts (agent-config/SOUL.md for the epistemic hierarchy, AGENTS.md for roles) are prompt-based guardrails; verifier veto (no Finding without tool_call_id) is architectural (Pydantic schema-level enforced at the findevil-agent-mcp boundary). Roadmap (tracked, not yet in code): emit in-chain self-correction (a course_correction/re_evaluation audit record citing the triggering tool_call_id when the agent reverses or down-grades a Finding) so analyst-driven revisions are auditable under manifest_verify (#54) | Mixed: prompt guards behavior, Pydantic guards data |
| 4 | Crypto Custody | Architectural: manifest signing and Merkle root computation happen inside findevil-agent-mcp before any finding is user-visible. Ed25519 is the offline-verifiable default; Sigstore/Rekor is the identity + transparency-log tier; the pre-A5 OpenTimestamps/Bitcoin tier was removed so manifest_finalize is the terminal custody step | Cryptographic |
| 5 | Presentation | DEFERRED to bonus (A2 §2.1). The terminal IS the primary UX. Optional Next.js SSE bus (when shipped) is read-only from the frontend; --unattended mode logs approved_by: "auto" to the audit chain. | Auth-enforced (when present) |
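The fixed-argv property listed for boundary 1 can be demonstrated outside Rust as well. The sketch below is a Python analogue of Command::new(bin).args([...]), not the product code: `run_fixed_argv` is a hypothetical helper, and a child Python process stands in for a SIFT tool so the example runs anywhere.

```python
import subprocess
import sys


def run_fixed_argv(binary: str, *args: str) -> str:
    """Run a tool with an argv list; no shell ever parses the arguments."""
    completed = subprocess.run(
        [binary, *args],      # argv list, passed to the OS as-is
        shell=False,          # the default, stated explicitly for emphasis
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# A path that would run a second command if it were handed to `sh -c`.
hostile_path = "case.evtx; touch /tmp/pwned"

# Stand-in "tool": a child Python process that echoes its first argument.
echoed = run_fixed_argv(
    sys.executable, "-c", "import sys; print(sys.argv[1])", hostile_path
)
print(echoed.strip())
```

The child receives the whole string, semicolon included, as one inert argument. That is the behavior the bypass tests are described as pinning for shell-payload and flag-looking paths.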

Prompt-based vs architectural guardrails — explicit distinction

Prompt-based guardrails (prompts that GUIDE behavior):

  • agent-config/SOUL.md epistemic hierarchy (CONFIRMED > INFERRED > HYPOTHESIS)
  • agent-config/AGENTS.md specialist roles and tool scope
  • agent-config/MEMORY.md DFIR artifact semantics (Amcache ≠ execution time, etc.)
  • agent-config/HEARTBEAT.md canary string self-check every turn

Prompt guardrails can fail — that is the design assumption, not a surprise; when they do, the architectural guardrails below must catch the fallout. What is bypass-tested today is the architectural layer (services/mcp/tests/bypass_paths.rs: shell-payload paths, .. traversal, flag-looking paths — all inert), plus the HEARTBEAT.md canary as the in-run prompt-injection tripwire. Dedicated prompt-injection fixtures in goldens/ are planned and not yet shipped — we say so here rather than claim them.

Architectural guardrails (structural controls that PHYSICALLY PREVENT bad outcomes):

  • Read-only evidence access (code-enforced: libewf read-only open, SHA-256 at case_open re-checked at every replay, no write verb in the tool surface; pair with a read-only mount in hardened deployments)
  • Typed Rust MCP server (findevil-mcp) with no execute_shell (compiler-enforced; adding shell passthrough requires a code change and PR review)
  • Typed Python MCP server (findevil-agent-mcp) with Pydantic extra="forbid" on every input model (boundary-enforced; unknown fields surface as validation errors)
  • Pydantic schema on Finding events requires tool_call_id (schema-enforced; unvalidated Findings can't exit the agent_mcp boundary)
  • Hash-chained audit.jsonl (prev_hash per line; chain replay catches any backdated/mutated entry)
  • manifest signing at the findevil-agent-mcp layer (Ed25519 default; Sigstore/Rekor when configured; explicit stub fallback blocks customer release)
  • Merkle tree append-only at the findevil-agent-mcp layer (agent cannot rebuild the tree to favor a different leaf set)
  • Sigstore/Rekor transparency-log inclusion proof when that tier is configured (agent cannot forge the signed manifest provenance)
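Two of these controls, the prev_hash chain and the Merkle root, are easy to show in miniature. The following Python sketch runs under stated assumptions and is not the findevil-agent-mcp implementation: the record layout (`prev_hash` plus `event`), the all-zero genesis value, the canonical JSON encoding, and the rule of carrying an unpaired Merkle node up unchanged are all choices made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash of the first record


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def chain_append(log: list, event: dict) -> str:
    """Append an event linked to its predecessor; return the new final hash."""
    prev_hash = record_hash(log[-1]) if log else GENESIS
    log.append({"prev_hash": prev_hash, "event": event})
    return record_hash(log[-1])


def chain_verify(log: list, final_hash: str) -> bool:
    """Replay the chain and compare the result with the recorded final hash."""
    expected = GENESIS
    for record in log:
        if record.get("prev_hash") != expected:
            return False
        expected = record_hash(record)
    return expected == final_hash


def merkle_root(leaves: list) -> str:
    """Root over SHA-256 leaf hashes; an unpaired node is carried up unchanged."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                paired.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                paired.append(pair[0])
        level = paired
    return level[0].hex()
```

Editing any earlier record changes its hash and breaks the next record's prev_hash. Editing the last record is caught only because the final hash is recorded elsewhere, which is the role the signed manifest plays in the design described here.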

The no-arbitrary-execution claim is machine-checkable in-repo today: the tool registry is fixed at compile time (services/mcp/src/tools/mod.rs — adding a verb is a code change + review), scripts/divergence-smoke.py asserts the product MCP servers register no execute_shell/bash -c-shaped surface, and services/mcp/tests/bypass_paths.rs exercises the boundary adversarially. (A third-party mcp-scanner pass is on the pre-release checklist; no scanner artifact ships in this tree yet.)
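A registry-level assertion of this kind might look like the sketch below. It does not reproduce scripts/divergence-smoke.py, whose contents are not shown in this document: the `DENIED_TOKENS` set and the `shell_shaped` helper are assumptions, and the tool names are a small sample taken from the lists above.

```python
# Assumed deny-list of name tokens that suggest a shell passthrough.
DENIED_TOKENS = frozenset({"shell", "bash", "sh", "exec", "cmd", "system"})


def shell_shaped(tool_names) -> list:
    """Return the registered names whose underscore-separated tokens look shell-like."""
    return sorted(
        name for name in tool_names
        if DENIED_TOKENS & set(name.lower().split("_"))
    )


# Sample of tool names mentioned in this document (not the full 43-tool surface).
registered = ["case_open", "evtx_query", "vol_run", "audit_append", "manifest_finalize"]
offenders = shell_shaped(registered)
print(offenders)
```

Matching whole tokens rather than substrings keeps allow-listed wrappers such as vol_run from tripping the check, while a name like execute_shell would be reported.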


Credential modes (Amendment A1)

The Product (what operators run) detects three credentials in priority order via scripts/install.sh and services/agent/config.py resolve_credentials():

flowchart TD
    Start(["install.sh / resolve_credentials()"])
    Check1{CLAUDE_CODE_OAUTH_TOKEN<br/>env var set?}
    Check2{~/.claude/<br/>interactive session?}
    Check3{ANTHROPIC_API_KEY<br/>env var set?}

    Mode1[Mode 1:<br/>long-lived token<br/>from claude setup-token<br/>non-interactive<br/>inference-only scope]
    Mode2[Mode 2:<br/>interactive session<br/>from claude auth login<br/>dev/demo use]
    Mode3[Mode 3:<br/>direct API<br/>from console.anthropic.com<br/>metered, < $1/run]
    Fail["FAIL FAST<br/>error message lists<br/>all 3 options"]

    Start --> Check1
    Check1 -->|yes| Mode1
    Check1 -->|no| Check2
    Check2 -->|yes| Mode2
    Check2 -->|no| Check3
    Check3 -->|yes| Mode3
    Check3 -->|no| Fail

    style Mode1 fill:#c8e6c9
    style Mode2 fill:#c8e6c9
    style Mode3 fill:#c8e6c9
    style Fail fill:#ffcdd2

All three modes are fully supported. Operators pick whichever they already have — none is required to build or run.
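The priority order in the flowchart can be written as a short Python sketch. It is not the code in services/agent/config.py: `resolve_credentials_sketch`, the returned mode labels, and `NoCredentialsError` are hypothetical, and treating the mere presence of a ~/.claude/ directory as an interactive session is an assumption, since the real detection rule is not described here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


class NoCredentialsError(RuntimeError):
    """Raised when none of the three credential modes is available."""


def resolve_credentials_sketch(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> str:
    """Return the first available credential mode, in documented priority order."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # Mode 1: long-lived token from `claude setup-token`.
    if env.get("CLAUDE_CODE_OAUTH_TOKEN"):
        return "oauth_token"
    # Mode 2: interactive session. ASSUMPTION: detected by ~/.claude/ existing.
    if (home / ".claude").is_dir():
        return "interactive_session"
    # Mode 3: direct, metered API key.
    if env.get("ANTHROPIC_API_KEY"):
        return "api_key"
    # Fail fast, listing all three options.
    raise NoCredentialsError(
        "No credentials found. Options: "
        "(1) run `claude setup-token` and export CLAUDE_CODE_OAUTH_TOKEN; "
        "(2) run `claude auth login`; "
        "(3) export ANTHROPIC_API_KEY from console.anthropic.com."
    )
```

Passing `env` and `home` explicitly keeps the sketch testable without touching the real environment. When both a token and an API key are set, the token wins, as in the flowchart.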


Data flow — a single investigation from .e01 to verdict (under A2)

  1. Operator runs scripts/verdict <evidence> for a one-shot live investigation, or claude / scripts/find-evil at the repo root for interactive mode. The one-shot launcher performs preflight, starts the optional dashboard unless --no-dashboard is set, and delegates to the internal find-evil-auto engine. The interactive path uses Claude Code, which reads .mcp.json, spawns both MCP servers, and ingests CLAUDE.md + agent-config/* as system context.
  2. In interactive mode, the operator prompts: "investigate fixtures/nist-hacking-case/SCHARDT.001". In one-shot mode, scripts/verdict supplies the evidence path to the internal engine. The supervisor calls case_open (Rust MCP) — SHA-256 verifies the image, opens via libewf read-only, reserves the evidence.ddb path at ~/.findevil/cases/<id>/evidence.ddb (the DuckDB L1 store is not yet initialized), calls audit_append (Python MCP) for the open event.
  3. Claude Code emits a plan as text (no PlanProposed event needed — the terminal IS the channel) and forks two subagents via the native Task mechanism: one with the Pool A persistence prompt, one with Pool B exfil.
  4. Each pool subagent invokes Rust MCP DFIR tools (evtx_query, mft_timeline, hayabusa_scan, etc.); each call's SHA-256 output digest is audit_append-ed and contributes a Merkle leaf at manifest_finalize time.
  5. Both subagents return Findings (each citing a tool_call_id). Supervisor calls detect_contradictions (Python MCP) which surfaces Pool A vs Pool B disagreements before the judge fires.
  6. Analyst resolves contradictions (Trust A / Trust B / Flag) in the terminal, or --unattended mode auto-passes them.
  7. Supervisor calls verify_finding (Python MCP) for each candidate Finding — the wrapper spawns its own short-lived findevil-mcp subprocess and re-runs the cited tool call. Drift downgrades the Finding by one tier.
  8. Supervisor calls judge_findings (Python MCP) — credibility-weighted merge per Estornell ICML 2025.
  9. Supervisor calls correlate_findings (Python MCP) — SOUL.md cross-artifact rule downgrades execution claims that lack ≥2 artifact-class corroboration; Amcache-only execution gets the hard-coded downgrade.
  10. Supervisor calls manifest_finalize (Python MCP) — builds the Merkle root, signs the canonicalized body via the selected signer tier (Ed25519 by default, Sigstore for identity/transparency, or explicit stub for tests), writes run.manifest.json, and finalizes the audit chain. This is the terminal custody step under A5.
  11. Supervisor renders the RunVerdict to the terminal with paths to the manifest and report.
  12. Offline replay: manifest_verify reproduces the proof end-to-end, citing FRE 902(14) with the post-A5 Rekor timestamp trade-off.
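Steps 7 and 9 both adjust a Finding's tier, which a small Python sketch can make concrete. Everything here is illustrative rather than the product schema: the `Finding` fields other than tool_call_id, the `claim_type` value "execution", the one-tier size of the correlation downgrade, and the decision to stop at HYPOTHESIS are assumptions. The tier order comes from the SOUL.md hierarchy quoted earlier.

```python
from dataclasses import dataclass, replace

# Epistemic hierarchy, strongest first.
TIERS = ("CONFIRMED", "INFERRED", "HYPOTHESIS")


@dataclass(frozen=True)
class Finding:
    tool_call_id: str            # required: an uncited Finding is rejected
    claim_type: str              # hypothetical field, e.g. "execution"
    artifact_classes: frozenset  # hypothetical field, e.g. {"prefetch", "evtx"}
    tier: str

    def __post_init__(self):
        if not self.tool_call_id:
            raise ValueError("Finding must cite a tool_call_id")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier}")


def downgrade(finding: Finding) -> Finding:
    """Move one tier down; ASSUMPTION: HYPOTHESIS is the floor."""
    idx = min(TIERS.index(finding.tier) + 1, len(TIERS) - 1)
    return replace(finding, tier=TIERS[idx])


def verify(finding: Finding, recorded_digest: str, replayed_digest: str) -> Finding:
    """Step 7: a replay whose digest drifts from the record costs one tier."""
    return finding if recorded_digest == replayed_digest else downgrade(finding)


def correlate(finding: Finding) -> Finding:
    """Step 9: execution claims need at least two artifact classes."""
    if finding.claim_type == "execution" and len(finding.artifact_classes) < 2:
        return downgrade(finding)  # ASSUMPTION: the downgrade is one tier
    return finding
```

Under these assumptions an Amcache-only execution claim that starts as CONFIRMED leaves `correlate` as INFERRED, while the same claim corroborated by a second artifact class keeps its tier.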

Where we differ from the reference bar (Valhuntir)

| Dimension | Valhuntir (reference) | Us |
|---|---|---|
| MCP server | Python, 8 servers via sift-gateway, 100+ tools | Two audit-chained product MCP servers: Rust findevil-mcp (31 DFIR tools, including the deliberately-redundant vol_pslist + vol_psscan pair plus vol_psxview for DKOM cross-validation, disk mount/extract helpers, network/log triage, and allow-listed long-tail wrappers) + Python findevil-agent-mcp (12 crypto/ACH/memory/ACP/expert-feedback tools); .mcp.json has 6 registered servers total, but the 4 non-product helpers emit no Findings; no execute_shell |
| Agent runtime | Custom Python harness | Claude Code itself ("Direct Agent Extension" pattern); no custom orchestrator to maintain |
| Chain-of-custody | Password-gated HMAC (PBKDF2 2M iter) | Ed25519/Sigstore signer tier + Merkle + audit hash chain (FRE 902(14) self-authenticating, with the A5 timestamp trade-off documented) |
| Agent pattern | Single agent + human approval | ACH dual-agent (persistence vs exfil) via Claude Code forked subagents + judge + contradiction surface |
| Benchmarks published | None (their README: "no performance metrics disclosed") | DFIR-Metric scoring harness + leaderboard wiring present; no score published yet (roadmap) |
| UI | Browser Examiner Portal | Claude Code terminal (primary); Next.js SPA + MCP Apps widgets (week-7 polish bonus, deferred) |
| Install pattern | curl ... \| bash one-liner | curl ... \| bash one-liner (same pattern, our repo) |
| Credential mode | 1 (their gateway config) | 3 (CLAUDE_CODE_OAUTH_TOKEN / interactive / API key) |

We match Valhuntir's architectural discipline and exceed it on three dimensions that are documented, measurable on the cases actually scored, and legally framed.


References

  • README.md + INSTALL.md + QUICKSTART.md — public install and operator contract
  • docs/reference/mcp-and-tools.md — registered MCP servers and product tool inventory
  • agent-config/SOUL.md + AGENTS.md + TOOLS.md + MEMORY.md + HEARTBEAT.md — runtime agent identity