AGENTIC Verifier Maintenance

AGENTIC verifiers are the most powerful tier and also the most fragile. They launch sub-agents that interact with real systems (browsers, APIs, calendars). This page explains how they work and how to keep them healthy.

How AGENTIC Verifiers Work

Unlike HARD verifiers (deterministic state checks) and SOFT verifiers (LLM judges), AGENTIC verifiers:

  1. Launch a sub-agent (browser automation, API client, etc.)
  2. Interact with the target system (navigate pages, read calendar events, query APIs)
  3. Extract state and compare against expected outcomes
  4. Return structured evidence with screenshots, API responses, or DOM snapshots

# Example: calendar event verification
v = get_verifier("vr/aiv.calendar.event_created")
result = v.verify(VerifierInput(
    completions=["Meeting scheduled for 3pm"],
    ground_truth={
        "calendar_id": "primary",
        "expected_summary": "Team standup",
        "expected_time": "2026-03-07T15:00:00Z",
    },
))
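
To make the four steps concrete, here is a minimal sketch of the same calendar check in plain Python. It does not use the vr library: `Evidence`, `verify_event_created`, and `fake_calendar` are hypothetical names, and the sub-agent is reduced to a callable that returns events, so treat it as an illustration of the flow rather than the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Evidence:
    """Structured result of one agentic check (hypothetical shape)."""
    passed: bool
    observed: dict[str, Any]
    artifacts: list[str] = field(default_factory=list)  # e.g. raw API responses


def verify_event_created(
    fetch_events: Callable[[str], list[dict[str, Any]]],
    ground_truth: dict[str, str],
) -> Evidence:
    # Steps 1-2: the sub-agent (here just a callable) queries the target system.
    events = fetch_events(ground_truth["calendar_id"])
    # Step 3: extract state and compare it against the expected outcome.
    match = next(
        (
            e for e in events
            if e.get("summary") == ground_truth["expected_summary"]
            and e.get("start") == ground_truth["expected_time"]
        ),
        None,
    )
    # Step 4: return structured evidence, including what was actually seen.
    return Evidence(
        passed=match is not None,
        observed=match or {},
        artifacts=[repr(events)],
    )


def fake_calendar(calendar_id: str) -> list[dict[str, Any]]:
    # Stand-in for a real calendar API client (assumption for this sketch).
    return [{"summary": "Team standup", "start": "2026-03-07T15:00:00Z"}]


result = verify_event_created(
    fake_calendar,
    {
        "calendar_id": "primary",
        "expected_summary": "Team standup",
        "expected_time": "2026-03-07T15:00:00Z",
    },
)
print(result.passed)
```

Passing the fetcher in as a parameter keeps the comparison logic testable without a live calendar, which is also what makes fixture-based testing of such a verifier practical.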

Why They Break

AGENTIC verifiers can break when target systems change:

| Change | Impact | Detection |
|--------|--------|-----------|
| UI redesign | Browser selectors fail | Adversarial fixture failure |
| API version bump | Response schema changes | Negative fixture failure |
| Auth flow change | Login steps fail | All fixture types fail |
| Rate limiting | Intermittent failures | Flaky positive fixtures |
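
The detection column can be read as a small triage rule. The sketch below encodes it as a hypothetical helper, `likely_cause`, which takes one outcome per fixture type. The outcome labels and the order in which the patterns are checked are assumptions made for illustration; real failures may not match a single row this cleanly.

```python
def likely_cause(positive: str, negative: str, adversarial: str) -> str:
    """Map fixture outcomes ("pass", "fail", "flaky") to a probable cause.

    The mapping follows the table above; the check order is an assumption.
    """
    outcomes = (positive, negative, adversarial)
    if all(outcome == "fail" for outcome in outcomes):
        return "auth flow change"      # all fixture types fail
    if positive == "flaky":
        return "rate limiting"         # flaky positive fixtures
    if adversarial == "fail":
        return "UI redesign"           # adversarial fixture failure
    if negative == "fail":
        return "API version bump"      # negative fixture failure
    return "no known pattern"


print(likely_cause("pass", "pass", "fail"))
```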

Maintenance Checklist

Weekly

  • Run vr test --tier AGENTIC to catch regressions
  • Check for API deprecation notices from target services

On Failure

  1. Run the adversarial fixtures first - they're designed to catch common breakage patterns
  2. Check evidence output for DOM snapshots or API error responses
  3. Update selectors/schemas in the verifier implementation
  4. Re-run all three fixture types (positive, negative, adversarial)
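
As a rough sketch of steps 1 and 4, the helper below runs the three fixture types with the adversarial ones first and treats a fix as complete only when every type passes. `rerun_fixtures`, `fix_is_complete`, and the `run_fixture_type` callback are hypothetical; how fixtures are actually executed depends on your setup.

```python
from typing import Callable

# Adversarial fixtures go first because they target common breakage patterns.
FIXTURE_ORDER = ("adversarial", "positive", "negative")


def rerun_fixtures(run_fixture_type: Callable[[str], bool]) -> dict[str, bool]:
    """Run every fixture type in order and record whether each one passed."""
    return {kind: run_fixture_type(kind) for kind in FIXTURE_ORDER}


def fix_is_complete(results: dict[str, bool]) -> bool:
    """A fix counts as complete only when all three fixture types pass."""
    return len(results) == len(FIXTURE_ORDER) and all(results.values())


# Example with a stand-in runner where the negative fixtures still fail.
results = rerun_fixtures(lambda kind: kind != "negative")
print(results, fix_is_complete(results))
```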

Versioning

When you update an AGENTIC verifier's implementation, bump the version in VERIFIER.json:

{
  "version": "0.2.0",
  "changelog": "Updated browser selectors for new calendar UI"
}
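
If you prefer to script the bump, a minimal sketch follows. It assumes VERIFIER.json holds a three-part `version` string and a `changelog` field as in the snippet above, and that a selector or schema update warrants a minor bump; the source does not specify which part to increment, so adjust to your own convention. `bump_minor` and `update_manifest` are hypothetical helpers.

```python
import json
from pathlib import Path


def bump_minor(version: str) -> str:
    """Increment the minor part of a MAJOR.MINOR.PATCH string, resetting PATCH."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def update_manifest(path: Path, changelog: str) -> dict:
    """Bump the version in a VERIFIER.json-style file and record the changelog."""
    manifest = json.loads(path.read_text(encoding="utf-8"))
    manifest["version"] = bump_minor(manifest["version"])
    manifest["changelog"] = changelog
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return manifest


print(bump_minor("0.1.0"))  # -> 0.2.0
```

Usage would look like `update_manifest(Path("VERIFIER.json"), "Updated browser selectors for new calendar UI")`, run from the verifier's directory.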

AGENTIC vs HARD Fallbacks

For critical pipelines, compose an AGENTIC verifier with a HARD fallback:

pipeline = compose(
    [
        get_verifier("vr/aiv.calendar.event_created"),   # AGENTIC: full check
        get_verifier("vr/api.http.status_ok"),            # HARD: API fallback
    ],
    policy_mode=PolicyMode.ESCALATION,
)

If the AGENTIC verifier times out or errors, the HARD verifier provides a baseline check.
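
The fallback behavior described here can be pictured with a few lines of plain Python. This is only a sketch of the idea, not how `compose` or `PolicyMode.ESCALATION` is implemented; `verify_with_fallback` and the two stand-in checks are hypothetical.

```python
from typing import Callable

Check = Callable[[], bool]


def verify_with_fallback(agentic: Check, hard: Check) -> tuple[bool, str]:
    """Return (passed, tier_used), falling back to the HARD check on any error."""
    try:
        return agentic(), "AGENTIC"
    except Exception:
        # Timeouts and other errors from the sub-agent land here.
        return hard(), "HARD"


def flaky_agentic_check() -> bool:
    # Stand-in for a browser-driven check that times out.
    raise TimeoutError("sub-agent did not respond")


def hard_status_check() -> bool:
    # Stand-in for a deterministic baseline check.
    return True


print(verify_with_fallback(flaky_agentic_check, hard_status_check))
```

Returning the tier alongside the verdict lets downstream code tell a full AGENTIC pass apart from a baseline-only pass.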

Cost Considerations

AGENTIC verifiers are the most expensive tier ($0.15/call) because they launch sub-agents. For RL training loops, prefer HARD verifiers for speed and use AGENTIC verifiers only for final evaluation.
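
A quick back-of-the-envelope calculation shows why this matters. The sketch below uses the $0.15/call figure quoted above; the call counts (100,000 for a training loop, 500 for a final evaluation) are hypothetical numbers chosen only for illustration.

```python
AGENTIC_COST_CENTS_PER_CALL = 15  # $0.15 per call, kept in cents to avoid float drift


def agentic_cost_cents(calls: int) -> int:
    """Total cost in cents of running an AGENTIC verifier `calls` times."""
    return calls * AGENTIC_COST_CENTS_PER_CALL


def format_usd(cents: int) -> str:
    return f"${cents / 100:,.2f}"


print(format_usd(agentic_cost_cents(100_000)))  # hypothetical training loop -> $15,000.00
print(format_usd(agentic_cost_cents(500)))      # hypothetical final evaluation -> $75.00
```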
