AGENTIC Verifier Maintenance

AGENTIC verifiers are the most powerful - and most fragile - tier. They launch sub-agents that interact with real systems (browsers, APIs, calendars). This page explains how they work and how to keep them healthy.

How AGENTIC Verifiers Work

Unlike HARD verifiers (deterministic state checks) and SOFT verifiers (LLM judges), AGENTIC verifiers:

Launch a sub-agent (browser automation, API client, etc.)
Interact with the target system (navigate pages, read calendar events, query APIs)
Extract state and compare against expected outcomes
Return structured evidence with screenshots, API responses, or DOM snapshots

# Example: calendar event verification
# (get_verifier and VerifierInput are assumed to be imported from the verifier library)
v = get_verifier("vr/aiv.calendar.event_created")
result = v.verify(VerifierInput(
    completions=["Meeting scheduled for 3pm"],
    ground_truth={
        "calendar_id": "primary",
        "expected_summary": "Team standup",
        "expected_time": "2026-03-07T15:00:00Z",
    },
))
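
The flow described above (launch a sub-agent, interact, extract state, compare, return evidence) can be sketched in a few lines. This is only an illustration, not the library's implementation: `Evidence`, `AgenticResult`, `run_agentic_check`, and `fake_calendar_agent` are hypothetical names, and the sub-agent is replaced by a plain callable that returns canned state.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Evidence:
    """Hypothetical evidence record (not the library's type)."""
    kind: str      # e.g. "api_response", "screenshot", "dom_snapshot"
    payload: Any


@dataclass
class AgenticResult:
    """Hypothetical structured result returned by the sketch below."""
    passed: bool
    mismatches: Dict[str, Any] = field(default_factory=dict)
    evidence: List[Evidence] = field(default_factory=list)


def run_agentic_check(
    fetch_state: Callable[[], Dict[str, Any]],
    expected: Dict[str, Any],
) -> AgenticResult:
    # Launch + interact: abstracted here as a callable that returns
    # whatever state the sub-agent observed in the target system.
    observed = fetch_state()

    # Extract + compare: check only the fields the ground truth specifies.
    mismatches = {
        key: {"expected": want, "observed": observed.get(key)}
        for key, want in expected.items()
        if observed.get(key) != want
    }

    # Return structured evidence alongside the verdict.
    return AgenticResult(
        passed=not mismatches,
        mismatches=mismatches,
        evidence=[Evidence(kind="api_response", payload=observed)],
    )


def fake_calendar_agent() -> Dict[str, Any]:
    # Stand-in for a real sub-agent; returns a canned calendar event.
    return {"summary": "Team standup", "start": "2026-03-07T15:00:00Z"}


result = run_agentic_check(
    fake_calendar_agent,
    {"summary": "Team standup", "start": "2026-03-07T15:00:00Z"},
)
print(result.passed, result.mismatches)
```

Keeping the raw observation in the result is what makes a later failure debuggable: the mismatch says what went wrong, and the evidence shows what the sub-agent actually saw.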

Why They Break

AGENTIC verifiers can break when target systems change:

| Change | Impact | Detection |
|--------|--------|-----------|
| UI redesign | Browser selectors fail | Adversarial fixture failure |
| API version bump | Response schema changes | Negative fixture failure |
| Auth flow change | Login steps fail | All fixture types fail |
| Rate limiting | Intermittent failures | Flaky positive fixtures |
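
The detection patterns in this table can be turned into a rough triage heuristic. The sketch below is an assumption-laden illustration: `likely_causes` is a hypothetical helper, the fixture-type keys are made up for the example, and real failures may have causes the table does not list.

```python
from typing import Dict, List

FIXTURE_TYPES = ("positive", "negative", "adversarial")


def likely_causes(failed: Dict[str, bool], flaky_positive: bool = False) -> List[str]:
    """Map fixture failure patterns to the likely cause from the table.

    `failed` maps a fixture type to True if that fixture type failed.
    `flaky_positive` marks positive fixtures that pass only intermittently.
    """
    # All fixture types failing points at something global, such as login.
    if all(failed.get(name, False) for name in FIXTURE_TYPES):
        return ["auth flow change"]

    causes: List[str] = []
    if failed.get("adversarial", False):
        causes.append("UI redesign")
    if failed.get("negative", False):
        causes.append("API version bump")
    if flaky_positive:
        causes.append("rate limiting")
    return causes


print(likely_causes({"adversarial": True}))
print(likely_causes({"positive": True, "negative": True, "adversarial": True}))
```

The result is a starting point for investigation, not a diagnosis; the evidence output remains the authoritative source.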

Maintenance Checklist

Weekly

Run vr test --tier AGENTIC to catch regressions
Check for API deprecation notices from target services

On Failure

Run the adversarial fixtures first - they're designed to catch common breakage patterns
Check evidence output for DOM snapshots or API error responses
Update selectors/schemas in the verifier implementation
Re-run all three fixture types (positive, negative, adversarial)

Versioning

When you update an AGENTIC verifier's implementation, bump the version in VERIFIER.json:

{
  "version": "0.2.0",
  "changelog": "Updated browser selectors for new calendar UI"
}
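
A small helper can make the bump less error-prone. The sketch below assumes the version string follows a MAJOR.MINOR.PATCH scheme and that VERIFIER.json holds `version` and `changelog` at the top level, as in the fragment above; which part to bump is not specified here, so the minor default is an assumption. `bump_version` and `update_manifest` are hypothetical names.

```python
import json
from pathlib import Path


def bump_version(version: str, part: str = "minor") -> str:
    """Return the next version, assuming a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def update_manifest(path: Path, changelog: str, part: str = "minor") -> dict:
    """Bump the version and replace the changelog in a VERIFIER.json file."""
    manifest = json.loads(path.read_text(encoding="utf-8"))
    manifest["version"] = bump_version(manifest["version"], part)
    manifest["changelog"] = changelog
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return manifest


print(bump_version("0.1.0"))
```

Calling `update_manifest(Path("VERIFIER.json"), "Updated browser selectors for new calendar UI")` would rewrite the file in place, so it is best run from a clean working tree.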

AGENTIC vs HARD Fallbacks

For critical pipelines, compose an AGENTIC verifier with a HARD fallback:

pipeline = compose(
    [
        get_verifier("vr/aiv.calendar.event_created"),   # AGENTIC: full check
        get_verifier("vr/api.http.status_ok"),            # HARD: API fallback
    ],
    policy_mode=PolicyMode.ESCALATION,
)

If the AGENTIC verifier times out or errors, the HARD verifier provides a baseline check.
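
The fallback behavior described here can be illustrated without the library. The sketch below is not how `PolicyMode.ESCALATION` is implemented; it only mirrors the sentence above, under the assumption that a timeout or error in one check hands control to the next. `run_with_fallback`, `agentic_check`, and `hard_check` are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Check = Callable[[], bool]


def run_with_fallback(
    checks: Sequence[Tuple[str, Check]],
) -> Tuple[Optional[str], Optional[bool]]:
    """Try each check in order; return (name, verdict) from the first that completes."""
    for name, check in checks:
        try:
            return name, check()
        except Exception:
            # A timeout or error falls through to the next check in the list.
            continue
    # Every check raised: there is no verdict to report.
    return None, None


def agentic_check() -> bool:
    # Simulates a sub-agent that never finishes in time.
    raise TimeoutError("sub-agent timed out")


def hard_check() -> bool:
    # Simulates a cheap deterministic check that succeeds.
    return True


outcome = run_with_fallback([("agentic", agentic_check), ("hard", hard_check)])
print(outcome)
```

Returning the name of the check that produced the verdict matters downstream: a pass from the baseline check is weaker evidence than a pass from the full check, and callers may want to treat the two differently.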

Cost Considerations

AGENTIC verifiers are the most expensive tier ($0.15/call) because they launch sub-agents. In RL training loops, where the verifier runs on every rollout, prefer HARD verifiers for speed and reserve AGENTIC verifiers for final evaluation.
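
To see why this matters, a quick back-of-the-envelope calculation helps. The per-call price comes from the figure above; the call counts are hypothetical and chosen only to show the difference in scale between a training loop and a final evaluation.

```python
AGENTIC_COST_CENTS = 15  # $0.15 per call, kept in cents to avoid float rounding


def agentic_cost_usd(calls: int, cents_per_call: int = AGENTIC_COST_CENTS) -> float:
    """Total cost in USD for a given number of AGENTIC verifier calls."""
    return calls * cents_per_call / 100


training_calls = 100_000  # hypothetical: one verifier call per rollout
eval_calls = 500          # hypothetical: a final evaluation set

print(f"training loop: ${agentic_cost_usd(training_calls):,.2f}")
print(f"final eval:    ${agentic_cost_usd(eval_calls):,.2f}")
```

Under these assumed volumes, verifying every rollout costs $15,000 while the final evaluation costs $75.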
