Evidence Replay
Every verification produces an evidence record. These records can be replayed to re-verify historical results.
How Evidence Works
When a verifier runs, it captures:
- Input: The completions and ground truth it received
- State snapshot: The actual system state it observed (DB row, API response, file content)
- Verdict: PASS/FAIL/ERROR with score
- Metadata: Verifier ID, version, timestamp, content hash
{
"id": "ev_abc123",
"verifier_id": "vr/tau2.retail.order_cancelled",
"verifier_version": "0.1.0",
"timestamp": "2026-03-15T10:30:00Z",
"input": {
"completions": ["I cancelled the order"],
"ground_truth": {"order_id": "ORD-42"}
},
"evidence": {
"query": "SELECT status FROM orders WHERE id = 42",
"result": {"status": "cancelled"}
},
"verdict": "PASS",
"score": 1.0,
"content_hash": "sha256:a1b2c3..."
}
Replay Use Cases
1. Audit Trail
Re-run verification against stored evidence to confirm original results:
from vrdev import get_verifier, VerifierInput
# Load stored evidence
evidence = load_evidence("ev_abc123")
# Replay using pre_result (state from original evidence)
v = get_verifier(evidence["verifier_id"])
result = v.verify(VerifierInput(
completions=evidence["input"]["completions"],
ground_truth={
**evidence["input"]["ground_truth"],
"pre_result": evidence["evidence"]["result"],
},
))
assert result[0].verdict == evidence["verdict"]
2. Regression Testing
When you update a verifier, replay historical evidence to check for regressions:
vr replay --evidence-dir ./evidence/ --verifier vr/tau2.retail.order_cancelled
3. Training Data Curation
Filter evidence records to build clean training datasets:
# Only keep episodes where ALL verifiers passed
clean_episodes = [
ep for ep in episodes
if all(e["verdict"] == "PASS" for e in ep["evidence"])
]
export_to_trl(clean_episodes, output="clean_rewards.jsonl")
Evidence Chain Integrity
Evidence records are chained using content hashes:
Record N: content_hash = sha256(payload_N)
Record N+1: parent_hash = content_hash_N
content_hash = sha256(payload_N+1)
To verify chain integrity:
from vrdev.core.evidence import verify_chain
valid = verify_chain(evidence_records)
# Returns True if all hashes are consistent
If any record was tampered with, the chain breaks and verify_chain returns False.
API Endpoints
| Method | Path | Description | |--------|------|-------------| | GET | /v1/evidence/{hash} | Retrieve a single evidence record | | GET | /v1/evidence/{hash}/proof | Retrieve Merkle inclusion proof for on-chain anchoring | | GET | /v1/evidence | List evidence records (paginated) | | POST | /v1/replay | Replay verification from stored evidence |