` — release-validation evidence

This directory holds small, reviewable evidence summaries for several scopes: the historical v-submit L3 fallback packet, the Stage One EVTX execution-log and self-correction packets, and the Stage Two real-run packets. Each file documents what it proves; none of them is raw customer evidence.

Files

File	Purpose
`l3-local-sift.json`	Committed local NIST fallback evidence for `v-submit`. It records the NIST Hacking Case image hash, run/readiness state, artifact hashes, and verification commands used when the GitHub KVM runner label had no capacity. It is not SIFT/KVM parity evidence.
`evtx-security-log-clear-trace.jsonl`	Compact structured execution trace from a fresh live EVTX run. It includes agent messages, typed tool calls, ACP handoffs, verifier replay, finding approval, report QA, and release-gate records with timestamps.
`evtx-security-log-clear-trace-summary.json`	Reviewer index for the EVTX trace: run command, case id, evidence hash, manifest verification result, token usage ledger, and a spot-check mapping from Finding `f-A-evtx-audit-log-cleared` to `evtx_query` tool call `tc-002`.
`natural-self-correction-trace.jsonl`	Verbatim excerpt (seq 169-183) from an organic run's hash-chained `audit.jsonl`: real `registry_query` failures on truncated RegBack hives, `course_correction` narrow/skip decisions, and `heartbeat_failure` escalation to an honest partial verdict. No fault injection.
`natural-self-correction-summary.json`	Reviewer index for the self-correction trace: source run, organic (`fault_injection` absent) statement, the failure->adjust->escalate arc, event counts, and how to verify from `seq`/`ts`/`prev_hash`.
`nist-schardt-disk-trace.txt`	Captured `scripts/trace-finding` output for a real disk Case on the NIST CFReDS Hacking Case image (SCHARDT.dd): audit chain OK, all leaves resolve, all findings traced.
`nist-schardt-disk-summary.json`	Reviewer index for the NIST disk Case: SUSPICIOUS, 27 findings across 6 parsed artifact classes, the organic `plaso_parse`-unavailable -> PARTIAL-timeline pivot, Merkle root, and `manifest_verify.overall=true`.
`memory-volatility-summary.json`	Reviewer index for a real ~18 GB memory-image Case: INDETERMINATE, 2 findings, the four Volatility tools run, and the honest single-class scope (no second class to corroborate execution).

Historical L3 fallback packet

The preferred L3 path is a full SIFT run on a KVM-capable GitHub runner. During the final release, the ubuntu-latest-4-core-kvm label had no available jobs, so this packet recorded the explicit, committed local evidence boundary instead of treating a skipped L3 run as success. It is not a passing L3 recall result.

This is intentionally narrow:

It does not contain raw evidence, disk images, reports, or case artifacts.
It records hashes and gate outcomes only.
It preserves the truth boundary: the packet is READY_FOR_EXPERT_REVIEW, not customer-releasable.
It does not mutate or refresh the GitHub v-submit release asset set.

Strict check:

This check is expected to fail the recall threshold for as long as the historical packet records 7/14 NIST recall (50%) against the 71% bar. That failure is the point of the packet: the release evidence remains honest about the L3 gap.

python3 scripts/validate-l3-evidence.py docs/release-evidence/l3-local-sift.json
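The recall arithmetic behind that expected failure can be sketched in a few lines. This is only an illustration of the 7/14-versus-71% comparison stated above; `recall_gate` and its parameters are hypothetical names, and nothing here is taken from `scripts/validate-l3-evidence.py` itself.

```python
from fractions import Fraction


def recall_gate(found: int, expected: int, bar_percent: int):
    """Return (recall, passed) for a simple found/expected recall gate.

    Hypothetical helper: the real validator may compute this differently.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    recall = Fraction(found, expected)
    # Exact rational comparison avoids float rounding at the threshold.
    return recall, recall >= Fraction(bar_percent, 100)


recall, passed = recall_gate(found=7, expected=14, bar_percent=71)
print(f"recall={float(recall):.0%} passed={passed}")  # recall=50% passed=False
```

Under this sketch, 9 of 14 (about 64%) would still fail and 10 of 14 (about 71.4%) would be the smallest count that clears a 71% bar.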

Stage One execution-log packet

evtx-security-log-clear-trace.jsonl and evtx-security-log-clear-trace-summary.json exist for the Find Evil! self-check item that asks for structured logs showing the agent communication and tool execution sequence. They were generated from:

scripts/verdict evidence/DE_1102_security_log_cleared.evtx --no-dashboard

Reviewer spot-check:

Finding: f-A-evtx-audit-log-cleared
Cited tool call: tc-002
Tool: evtx_query
Trace records: start at seq=3, output at seq=4, verifier approval at seq=8, replay at seq=9, approved Finding at seq=17
Token usage: 0 LLM API calls / 0 input tokens / 0 output tokens for this deterministic headless find-evil-auto path; credentials are checked during preflight, but the EVTX run itself does not call a completion API.
Manifest note: the customer_release_gate trace record is emitted before manifest finalization; the summary JSON records the later manifest_verify.overall=true check.
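A reviewer who wants to repeat the spot-check without reading the whole trace could list the `seq` of every record that mentions `tc-002`. The sketch below is a minimal way to do that. It assumes each JSONL record carries a top-level `seq` key, and it matches on the raw text of the line so that it does not need to know which field holds the tool-call id. The sample records and their `kind` and `tool_call_id` fields are invented stand-ins, not the real trace schema.

```python
import json


def seqs_mentioning(lines, needle):
    """Return the seq of every JSONL record whose raw text mentions needle."""
    hits = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)  # also proves the line is valid JSON
        # Plain substring match: "tc-002" would also match "tc-0021".
        if needle in raw:
            hits.append(record.get("seq"))
    return hits


# Invented stand-in records; only "seq" is a name taken from this README.
sample = [
    '{"seq": 3, "kind": "tool_call_start", "tool_call_id": "tc-002"}',
    '{"seq": 4, "kind": "tool_call_output", "tool_call_id": "tc-002"}',
    '{"seq": 5, "kind": "agent_message"}',
]
print(seqs_mentioning(sample, "tc-002"))  # [3, 4]
```

Against the committed file, the same function can be fed an open file handle for `docs/release-evidence/evtx-security-log-clear-trace.jsonl`, and the result compared by eye with the seq values listed in the spot-check.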

Trace validation:

jq -e . docs/release-evidence/evtx-security-log-clear-trace-summary.json >/dev/null
jq -e . docs/release-evidence/evtx-security-log-clear-trace.jsonl >/dev/null
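Where `jq` is not installed, a rough Python equivalent can parse the JSONL file line by line and report the first line that fails. This is a sketch, not part of the repository: `first_bad_line` is a hypothetical helper, and it is slightly stricter than `jq` because it expects exactly one JSON value per line.

```python
import json


def first_bad_line(lines):
    """Return (line_number, message) for the first unparsable line, or None."""
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # tolerate blank lines
        try:
            json.loads(raw)
        except json.JSONDecodeError as exc:
            return number, exc.msg
    return None


good = ['{"seq": 1}', '{"seq": 2}']
bad = ['{"seq": 1}', '{"seq": 2']  # second record is missing its closing brace
print(first_bad_line(good))       # None
print(first_bad_line(bad)[0])     # 2
```

Passing an open handle for the trace file instead of the inline lists gives a line number to jump to when a record is malformed.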

Stage One self-correction packet

natural-self-correction-trace.jsonl and natural-self-correction-summary.json exist for the requirement that self-correction be visible in the logs, not only in the demo video (Stage One Check 11 / Stage Two criterion 1).

They are a verbatim excerpt of an organic run (the string fault_injection never appears in its audit chain). Reviewer spot-check:

Trigger: registry_query fails on truncated Windows RegBack hives (SAM, SECURITY, SOFTWARE) with registry hive parse failed ... hive truncated (header too small).
Adjustment: each course_correction record sets action = "narrow (skip this key; continue remaining hive triage)" — the agent abandons the unreadable hive and continues other lanes instead of inventing a result.
Escalation: after consecutive failures, heartbeat_failure sets action = "escalate" with a rising consecutive_failures counter and a recovery posture that seals an honest INDETERMINATE/partial Verdict.
Integrity: every record carries seq, ts, and prev_hash; the excerpt is contiguous, so each record's prev_hash links to the one before it.
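Part of this spot-check can be automated. The sketch below checks three things the list above states: every record has `seq`, `ts`, and `prev_hash`; the `seq` values are contiguous; and the string `fault_injection` never appears. It deliberately does not recompute `prev_hash`, because this README does not state how the hash is derived. `check_excerpt` is a hypothetical helper, it assumes the three keys sit at the top level of each record, and the sample values are placeholders rather than real audit data.

```python
import json

REQUIRED_KEYS = ("seq", "ts", "prev_hash")


def check_excerpt(lines, first_seq, last_seq):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    expected = first_seq
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        if "fault_injection" in raw:
            problems.append(f"record near seq {expected} mentions fault_injection")
        record = json.loads(raw)
        for key in REQUIRED_KEYS:
            if key not in record:
                problems.append(f"record near seq {expected} lacks {key}")
        seq = record.get("seq")
        if seq != expected:
            problems.append(f"expected seq {expected}, got {seq}")
        # Resynchronise on the observed seq so a single gap is reported once.
        expected = seq + 1 if isinstance(seq, int) else expected + 1
    if expected - 1 != last_seq:
        problems.append(f"excerpt ends at seq {expected - 1}, expected {last_seq}")
    return problems


# Placeholder records: the ts and prev_hash values are not real.
sample = [
    '{"seq": 169, "ts": "placeholder", "prev_hash": "placeholder"}',
    '{"seq": 170, "ts": "placeholder", "prev_hash": "placeholder"}',
    '{"seq": 171, "ts": "placeholder", "prev_hash": "placeholder"}',
]
print(check_excerpt(sample, first_seq=169, last_seq=171))  # []
```

For the committed excerpt, the range would be 169 to 183, and the input would be an open handle for `natural-self-correction-trace.jsonl`.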

This is the deliberately un-edited counterpart to the demo film: a real tool failure and a real recovery, not a staged or injected one.

Stage Two real-run packets (disk + memory)

nist-schardt-disk-* and memory-volatility-summary.json are reviewer indexes for two real scripts/verdict runs, each traced with scripts/trace-finding and each carrying manifest_verify.overall=true. They cover the depth and the breadth criteria without committing any raw evidence.

Disk depth — NIST CFReDS Hacking Case image (SCHARDT.dd, public domain): SUSPICIOUS, 27 findings, 6 parsed artifact classes (custody, disk/filesystem, MFT, prefetch, registry, and tool-output), with the timeline class sealed PARTIAL. plaso_parse was genuinely unavailable (attempted, all failed), so the agent sealed the timeline as partial and continued to an honest verdict — an organic course-correction, fault_injection=0. See nist-schardt-disk-trace.txt for the captured trace.
Memory breadth — a real ~18 GB memory image: INDETERMINATE, 2 findings, with vol_pslist / vol_psscan / vol_psxview / vol_malfind all run and the memory class parsed. INDETERMINATE is the correct word: memory-only, with no second artifact class to corroborate execution against.
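Both summaries are described as carrying `manifest_verify.overall=true`, which a reviewer can confirm mechanically. The sketch below assumes the dotted name maps to a nested JSON object (`manifest_verify` containing `overall`); the actual layout of the summary files may differ, and `lookup` is a hypothetical helper. The inline document is a stand-in, not a copy of either summary.

```python
import json


def lookup(document, dotted_path):
    """Follow a dotted path through nested dicts; raise KeyError if a step is missing."""
    node = document
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_path)
        node = node[part]
    return node


# Stand-in for a summary file; the nesting is an assumption.
summary = json.loads('{"manifest_verify": {"overall": true}}')
print(lookup(summary, "manifest_verify.overall") is True)  # True
```

Comparing with `is True` rather than relying on truthiness means a string such as `"true"` would not be accepted as a passing value.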

Known gap (kept honest): correlating same-host disk and memory inside one Case, which is what would produce the disk-vs-memory discrepancy signal, is still pending. A flat evidence folder was not ingested as a single fusion Case; the two packets above are separate single-class Cases, and memory-volatility-summary.json records this gap rather than implying fusion was demonstrated.