Evidence Replay

Every verification produces an evidence record. These records can be replayed to re-verify historical results.

How Evidence Works

When a verifier runs, it captures:

  1. Input: The completions and ground truth it received
  2. State snapshot: The actual system state it observed (DB row, API response, file content)
  3. Verdict: PASS/FAIL/ERROR with score
  4. Metadata: Verifier ID, version, timestamp, content hash
{
  "id": "ev_abc123",
  "verifier_id": "vr/tau2.retail.order_cancelled",
  "verifier_version": "0.1.0",
  "timestamp": "2026-03-15T10:30:00Z",
  "input": {
    "completions": ["I cancelled the order"],
    "ground_truth": {"order_id": "ORD-42"}
  },
  "evidence": {
    "query": "SELECT status FROM orders WHERE id = 42",
    "result": {"status": "cancelled"}
  },
  "verdict": "PASS",
  "score": 1.0,
  "content_hash": "sha256:a1b2c3..."
}

Replay Use Cases

1. Audit Trail

Re-run verification against stored evidence to confirm original results:

from vrdev import get_verifier, VerifierInput

# Load stored evidence
evidence = load_evidence("ev_abc123")

# Replay using pre_result (state from original evidence)
v = get_verifier(evidence["verifier_id"])
result = v.verify(VerifierInput(
    completions=evidence["input"]["completions"],
    ground_truth={
        **evidence["input"]["ground_truth"],
        "pre_result": evidence["evidence"]["result"],
    },
))

assert result[0].verdict == evidence["verdict"]

2. Regression Testing

When you update a verifier, replay historical evidence to check for regressions:

vr replay --evidence-dir ./evidence/ --verifier vr/tau2.retail.order_cancelled

3. Training Data Curation

Filter evidence records to build clean training datasets:

# Only keep episodes where ALL verifiers passed
clean_episodes = [
    ep for ep in episodes
    if all(e["verdict"] == "PASS" for e in ep["evidence"])
]

export_to_trl(clean_episodes, output="clean_rewards.jsonl")

Evidence Chain Integrity

Evidence records are chained using content hashes:

Record N:   content_hash = sha256(payload_N)
Record N+1: parent_hash  = content_hash_N
            content_hash = sha256(payload_N+1)

To verify chain integrity:

from vrdev.core.evidence import verify_chain

valid = verify_chain(evidence_records)
# Returns True if all hashes are consistent

If any record was tampered with, the chain breaks and verify_chain returns False.

API Endpoints

| Method | Path | Description | |--------|------|-------------| | GET | /v1/evidence/{hash} | Retrieve a single evidence record | | GET | /v1/evidence/{hash}/proof | Retrieve Merkle inclusion proof for on-chain anchoring | | GET | /v1/evidence | List evidence records (paginated) | | POST | /v1/replay | Replay verification from stored evidence |

Evidence Replay | vr.dev Docs