# Cost & Latency Profile
Verification speed matters for RL training loops. Here is the performance profile by tier.
## Per-Tier Latency

| Tier | Typical Latency | Cost | LLM Required |
|------|-----------------|------|--------------|
| HARD | 1-50ms | Free | No |
| SOFT | 500-2000ms | LLM API cost | Yes |
| AGENTIC | 2-10s | LLM + browser | Yes |
## Percentile Breakdown

| Tier | p50 | p95 | p99 | Notes |
|------|-----|-----|-----|-------|
| HARD (live DB) | 12ms | 45ms | 95ms | Network-bound SELECT |
| HARD (BYOS/pre_result) | 0.05ms | 0.1ms | 0.3ms | Pure computation |
| HARD (file parse) | 3ms | 8ms | 15ms | JSON/CSV/YAML |
| SOFT (GPT-4o) | 780ms | 1400ms | 2200ms | LLM API latency dominates |
| SOFT (Claude Sonnet) | 1100ms | 1800ms | 2800ms | Slightly higher base |
| SOFT (local LLM) | 200ms | 800ms | 1500ms | Depends on hardware |
| AGENTIC (browser) | 3.2s | 7.5s | 12s | DOM load + interaction |
For RL training loops, use BYOS (pre_result) mode to keep HARD verifiers at sub-millisecond p99. SOFT tier latency is dominated by LLM inference; use local models to eliminate network variance.
## HARD Tier Details
HARD verifiers are the fastest because they perform simple state checks:
| Verifier Type | Latency | Notes |
|--------------|---------|-------|
| File read + parse | 1-10ms | JSON, CSV, YAML, text |
| Database SELECT | 10-50ms | Single-row lookups |
| HTTP GET | 50-200ms | Network-bound |
| Shell command | 100-5000ms | Depends on command (pytest, ruff) |
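For illustration, a file-parse HARD check is just read + parse + compare. This is a minimal sketch (the `verify_json_field` name and signature are hypothetical, not part of the library's API):

```python
import json
from pathlib import Path

def verify_json_field(path: str, field: str, expected) -> bool:
    """HARD check: parse a JSON file and compare one field.

    Latency is dominated by the file read plus json.loads,
    typically a few milliseconds for small files.
    """
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError):
        return False  # missing or malformed file fails the check
    return data.get(field) == expected
```

A missing or unparseable file returns `False` rather than raising, so a failed check scores the episode instead of crashing the loop.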
### With BYOS (pre_result)
When using the pre_result pattern, all HARD verifiers drop to sub-millisecond latency since they skip external I/O entirely:
| Verifier Type | Latency with pre_result |
|--------------|------------------------|
| Database checks | < 0.1ms |
| API checks | < 0.1ms |
| File checks | < 0.1ms (if content pre-loaded) |
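A sketch of why pre_result checks are sub-millisecond: the state snapshot is captured once, and each verification is then a pure in-memory comparison. The `verify_db_row` helper and the snapshot layout below are illustrative assumptions, not the library's actual interface:

```python
# pre_result-style HARD check: the environment state (e.g. a DB row dump)
# is captured once up front, so the verifier itself does no I/O at all.
def verify_db_row(pre_result: dict, expected: dict) -> bool:
    """Return True if every expected column matches the pre-loaded row."""
    return all(pre_result.get(col) == val for col, val in expected.items())

snapshot = {"id": 42, "status": "shipped", "total": 19.99}  # captured once
print(verify_db_row(snapshot, {"status": "shipped"}))  # True
```

Because the hot path is a dictionary lookup, the same snapshot can be verified thousands of times per second during training.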
## SOFT Tier Details
SOFT verifiers call an LLM to evaluate text against a rubric:
- GPT-4o: ~800ms, ~$0.003/verification
- Claude Sonnet: ~1200ms, ~$0.003/verification
- Local models: Variable, no API cost
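The rubric-evaluation step can be sketched as follows. Here `call_llm` is a stand-in for whatever chat-completion client the deployment uses (hosted or local), and the prompt format and score parsing are illustrative assumptions, not the library's actual protocol:

```python
def soft_verify(text: str, rubric: str, call_llm) -> float:
    """Ask an LLM to score `text` against `rubric`; returns a 0-1 score.

    `call_llm` abstracts the client (OpenAI, Anthropic, or a local
    server): it takes a prompt string and returns the model's reply.
    """
    prompt = (
        f"Rubric:\n{rubric}\n\nResponse:\n{text}\n\n"
        "Reply with a single number between 0 and 1."
    )
    reply = call_llm(prompt)
    try:
        # Clamp to [0, 1] in case the model replies out of range.
        return max(0.0, min(1.0, float(reply.strip())))
    except ValueError:
        return 0.0  # unparseable reply counts as a failed check

# Usage with a stubbed client:
print(soft_verify("The answer is 4.", "Correct arithmetic", lambda p: "0.9"))  # 0.9
```

The per-verification latency in the list above is almost entirely this single LLM round trip, which is why swapping the client changes the profile so much.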
## At Scale: RL Training Loops
For a typical RL training episode with 3 HARD + 1 SOFT verifier:
| Scenario | Total Latency | Cost per Episode |
|----------|--------------|-----------------|
| Live state + LLM | ~2.5s | ~$0.003 |
| BYOS + LLM | ~1.2s | ~$0.003 |
| BYOS only (HARD) | < 1ms | Free |
### 10,000 Episodes
| Mode | Total Time | Total Cost |
|------|-----------|------------|
| Live | ~7 hours | ~$30 |
| BYOS + LLM | ~3.3 hours | ~$30 |
| BYOS HARD-only | ~10 seconds | Free |
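These totals follow directly from the per-episode numbers (one SOFT call per episode at ~$0.003):

```python
episodes = 10_000
cost_per_soft = 0.003  # USD per SOFT verification (one per episode)

live_hours = 2.5 * episodes / 3600      # ~2.5s per episode with live state
byos_llm_hours = 1.2 * episodes / 3600  # HARD checks near-free; LLM dominates
hard_only_secs = 0.001 * episodes       # <1ms per episode, HARD-only

print(f"Live:           {live_hours:.1f} h, ${cost_per_soft * episodes:.0f}")
print(f"BYOS + LLM:     {byos_llm_hours:.1f} h, ${cost_per_soft * episodes:.0f}")
print(f"BYOS HARD-only: {hard_only_secs:.0f} s, $0")
```

Note that BYOS alone does not reduce cost when a SOFT verifier is in the loop; it only removes the live-state latency.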
## Optimization Tips
- Use BYOS for training: Pre-compute state once, verify thousands of times
- Run HARD first: Gate expensive SOFT calls behind cheap HARD checks
- Batch API calls: Use the /v1/batch endpoint for bulk verification
- Cache LLM results: With a fixed model and temperature 0, SOFT verdicts are effectively deterministic for the same input + rubric, so they can be cached