docs/release-evidence/ - release-validation evidence
This directory holds small, reviewable evidence summaries: the historical
`v-submit` L3 fallback packet and the later Stage One and Stage Two packets
described below. Each file documents what it proves; none of them is raw
customer evidence.
Files
| File | Purpose |
|---|---|
| `l3-local-sift.json` | Committed local NIST fallback evidence for `v-submit`. It records the NIST Hacking Case image hash, run/readiness state, artifact hashes, and verification commands used when the GitHub KVM runner label had no capacity. It is not SIFT/KVM parity evidence. |
| `evtx-security-log-clear-trace.jsonl` | Compact structured execution trace from a fresh live EVTX run. It includes agent messages, typed tool calls, ACP handoffs, verifier replay, finding approval, report QA, and release-gate records with timestamps. |
| `evtx-security-log-clear-trace-summary.json` | Reviewer index for the EVTX trace: run command, case id, evidence hash, manifest verification result, token usage ledger, and a spot-check mapping from Finding `f-A-evtx-audit-log-cleared` to `evtx_query` tool call `tc-002`. |
| `natural-self-correction-trace.jsonl` | Verbatim excerpt (seq 169-183) from an organic run's hash-chained `audit.jsonl`: real `registry_query` failures on truncated RegBack hives, `course_correction` narrow/skip decisions, and `heartbeat_failure` escalation to an honest partial verdict. No fault injection. |
| `natural-self-correction-summary.json` | Reviewer index for the self-correction trace: source run, organic (`fault_injection` absent) statement, the failure->adjust->escalate arc, event counts, and how to verify from `seq`/`ts`/`prev_hash`. |
| `nist-schardt-disk-trace.txt` | Captured `scripts/trace-finding` output for a real disk Case on the NIST CFReDS Hacking Case image (`SCHARDT.dd`): audit chain OK, all leaves resolve, all findings traced. |
| `nist-schardt-disk-summary.json` | Reviewer index for the NIST disk Case: `SUSPICIOUS`, 27 findings across 6 parsed artifact classes, the organic `plaso_parse`-unavailable -> `PARTIAL`-timeline pivot, Merkle root, and `manifest_verify.overall=true`. |
| `memory-volatility-summary.json` | Reviewer index for a real ~18 GB memory-image Case: `INDETERMINATE`, 2 findings, the four Volatility tools run, and the honest single-class scope (no second class to corroborate execution). |
Historical L3 fallback packet
The preferred L3 path is a full SIFT run on a KVM-capable GitHub runner. During
final release, the ubuntu-latest-4-core-kvm label had no available jobs, so
this packet recorded the explicit committed local evidence boundary instead of
treating a skipped L3 run as success. It is not a passing L3 recall result.
This is intentionally narrow:
- It does not contain raw evidence, disk images, reports, or case artifacts.
- It records hashes and gate outcomes only.
- It preserves the truth boundary: the packet is `READY_FOR_EXPERT_REVIEW`, not customer-releasable.
- It does not mutate or refresh the GitHub `v-submit` release asset set.
Strict check:
This check is expected to fail the recall threshold while the historical packet records 7/14 NIST recall (50%) against the 71% bar. That failure is the point of the packet: the release evidence remains honest about the L3 gap.
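The arithmetic behind that expected failure can be sketched as follows. This is a minimal illustration, not the project's gate: the function name `meets_recall_bar` is hypothetical, and treating the 71% bar as inclusive is an assumption.

```python
from fractions import Fraction


def meets_recall_bar(found: int, expected: int, bar_percent: int) -> bool:
    """Return True when found/expected reaches bar_percent.

    Hypothetical helper: the inclusive comparison (>=) is an assumption,
    not something the packet documents.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    # Exact rational arithmetic avoids float rounding near the threshold.
    return Fraction(found, expected) >= Fraction(bar_percent, 100)


# Numbers taken from the historical packet: 7 of 14 against a 71% bar.
recall = Fraction(7, 14)
print(f"recall = {float(recall):.0%}")  # recall = 50%
print(meets_recall_bar(7, 14, 71))      # False
```

Under the same assumption, 10 of 14 (about 71.4%) would be the smallest count that clears the bar.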
Stage One execution-log packet
evtx-security-log-clear-trace.jsonl and evtx-security-log-clear-trace-summary.json
exist for the Find Evil! self-check item that asks for structured logs showing
the agent communication and tool execution sequence. They were generated from:
Reviewer spot-check:
- Finding: `f-A-evtx-audit-log-cleared`
- Cited tool call: `tc-002`
- Tool: `evtx_query`
- Trace records: start at `seq=3`, output at `seq=4`, verifier approval at `seq=8`, replay at `seq=9`, approved Finding at `seq=17`
- Token usage: `0` LLM API calls / `0` input tokens / `0` output tokens for this deterministic headless `find-evil-auto` path; credentials are checked during preflight, but the EVTX run itself does not call a completion API.
- Manifest note: the `customer_release_gate` trace record is emitted before manifest finalization; the summary JSON records the later `manifest_verify.overall=true` check.
Trace validation:
jq -e . docs/release-evidence/evtx-security-log-clear-trace-summary.json >/dev/null
jq -e . docs/release-evidence/evtx-security-log-clear-trace.jsonl >/dev/null
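The `jq` commands above only confirm that the files parse. A slightly stricter reviewer check might also confirm that `seq` values increase from record to record. The sketch below assumes each trace line is a JSON object with an integer `seq` field; the `kind` field in the sample data is hypothetical and is not taken from the real trace.

```python
import json
from typing import Iterable


def validate_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL records and require strictly increasing integer `seq` values.

    Assumption: every record is a JSON object carrying an integer `seq`.
    """
    records: list[dict] = []
    last_seq = None
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        seq = record.get("seq")
        if not isinstance(seq, int) or (last_seq is not None and seq <= last_seq):
            raise ValueError(f"line {lineno}: seq {seq!r} is missing or out of order")
        last_seq = seq
        records.append(record)
    return records


# Synthetic sample lines for illustration only (not copied from the trace).
sample = [
    '{"seq": 3, "kind": "tool_call_start"}',
    '{"seq": 4, "kind": "tool_call_output"}',
]
print(len(validate_jsonl(sample)))  # 2
```

To run it against the committed trace, pass an open file handle, for example `validate_jsonl(open("docs/release-evidence/evtx-security-log-clear-trace.jsonl"))`.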
Stage One self-correction packet
natural-self-correction-trace.jsonl and natural-self-correction-summary.json
exist for the requirement that self-correction be visible in the logs, not
only in the demo video (Stage One Check 11 / Stage Two criterion 1).
They are a verbatim excerpt of an organic run (the string fault_injection
never appears in its audit chain). Reviewer spot-check:
- Trigger: `registry_query` fails on truncated Windows RegBack hives (`SAM`, `SECURITY`, `SOFTWARE`) with `registry hive parse failed ... hive truncated (header too small)`.
- Adjustment: each `course_correction` record sets `action = "narrow (skip this key; continue remaining hive triage)"`; the agent abandons the unreadable hive and continues other lanes instead of inventing a result.
- Escalation: after consecutive failures, `heartbeat_failure` sets `action = "escalate"` with a rising `consecutive_failures` counter and a recovery posture that seals an honest `INDETERMINATE`/partial Verdict.
- Integrity: every record carries `seq`, `ts`, and `prev_hash`; the excerpt is contiguous, so each record's `prev_hash` links to the one before it.
This is the deliberately un-edited counterpart to the demo film: a real tool failure and a real recovery, not a staged or injected one.
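The integrity property in the spot-check can be illustrated with a small chain walk. This README does not state how `prev_hash` is computed, so the digest used here (SHA-256 over the previous record serialized as sorted-key, compact JSON) is an assumption, as is the `event` field; only `seq`, `ts`, and `prev_hash` come from the text above. Consult `natural-self-correction-summary.json` for the actual verification procedure.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Digest of one record. ASSUMPTION: SHA-256 over canonical JSON;
    the real audit chain may hash different bytes."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_chain(records: list[dict]) -> bool:
    """Check that seq is contiguous and each prev_hash matches its predecessor.

    The first record's prev_hash cannot be checked from an excerpt alone,
    because its predecessor lies outside the excerpt.
    """
    for prev, cur in zip(records, records[1:]):
        if cur["seq"] != prev["seq"] + 1:
            return False
        if cur["prev_hash"] != record_digest(prev):
            return False
    return True


# Synthetic two-record chain for illustration only (placeholder values).
first = {"seq": 169, "ts": "t0", "prev_hash": "0" * 64, "event": "registry_query"}
second = {"seq": 170, "ts": "t1", "prev_hash": record_digest(first),
          "event": "course_correction"}
print(check_chain([first, second]))  # True
```

Changing any field of `first` after `second` is written changes its digest, so the walk returns `False`; that is the property a hash-chained log is meant to give a reviewer.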
Stage Two real-run packets (disk + memory)
nist-schardt-disk-* and memory-volatility-summary.json are reviewer indexes
for two real scripts/verdict runs, each traced with scripts/trace-finding
and each carrying manifest_verify.overall=true. They cover the depth and the
breadth criteria without committing any raw evidence.
- Disk depth (NIST CFReDS Hacking Case image, `SCHARDT.dd`, public domain): `SUSPICIOUS`, 27 findings, 6 parsed artifact classes (custody, disk/filesystem, MFT, prefetch, registry, and tool-output), with the timeline class sealed `PARTIAL`. `plaso_parse` was genuinely unavailable (attempted, all failed), so the agent sealed the timeline as partial and continued to an honest verdict: an organic course-correction, `fault_injection=0`. See `nist-schardt-disk-trace.txt` for the captured trace.
- Memory breadth (a real ~18 GB memory image): `INDETERMINATE`, 2 findings, with `vol_pslist`/`vol_psscan`/`vol_psxview`/`vol_malfind` all run and the memory class parsed. `INDETERMINATE` is the correct word: memory-only, with no second artifact class to corroborate execution against.
Known gap (kept honest): same-host disk and memory correlated inside one
Case — the disk-vs-memory discrepancy signal — is still pending. A flat evidence
folder was not ingested as a single fusion Case; the two packets above are
separate single-class Cases, and memory-volatility-summary.json records this
gap rather than implying fusion was demonstrated.
Reviewer spot-check:
jq -e '.manifest_verify_overall == true' docs/release-evidence/nist-schardt-disk-summary.json
jq -e '.manifest_verify_overall == true' docs/release-evidence/memory-volatility-summary.json
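The same check can be sketched in Python for reviewers without `jq`. The prose above writes the flag as `manifest_verify.overall`, while the `jq` filter reads a flat `manifest_verify_overall` key; the helper below accepts either shape as an assumption about the summary layout, and its name is hypothetical.

```python
import json


def manifest_verified(summary: dict) -> bool:
    """Return True only when the manifest flag is literally true.

    ASSUMPTION: the flag is stored either as a flat `manifest_verify_overall`
    key (as the jq filter reads it) or nested as `manifest_verify.overall`.
    """
    if "manifest_verify_overall" in summary:
        return summary["manifest_verify_overall"] is True
    nested = summary.get("manifest_verify")
    return isinstance(nested, dict) and nested.get("overall") is True


# Synthetic inputs for illustration only.
print(manifest_verified(json.loads('{"manifest_verify_overall": true}')))      # True
print(manifest_verified(json.loads('{"manifest_verify": {"overall": true}}')))  # True
print(manifest_verified(json.loads('{"verdict": "SUSPICIOUS"}')))               # False
```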
Stage Two evidence map
stage-two-evidence.md indexes each of the six Official Rules criteria to the committed artifact that supports it, with a one-line verify command per criterion. Start there for a criterion-by-criterion walk of the evidence.