AGENTIC Verifier Maintenance
AGENTIC verifiers are the most powerful tier and also the most fragile. They launch sub-agents that interact with real systems (browsers, APIs, calendars). This page explains how they work and how to keep them healthy.
How AGENTIC Verifiers Work
Unlike HARD verifiers (deterministic state checks) and SOFT verifiers (LLM judges), AGENTIC verifiers:
- Launch a sub-agent (browser automation, API client, etc.)
- Interact with the target system (navigate pages, read calendar events, query APIs)
- Extract state and compare against expected outcomes
- Return structured evidence with screenshots, API responses, or DOM snapshots
# Example: calendar event verification
v = get_verifier("vr/aiv.calendar.event_created")
result = v.verify(VerifierInput(
    completions=["Meeting scheduled for 3pm"],
    ground_truth={
        "calendar_id": "primary",
        "expected_summary": "Team standup",
        "expected_time": "2026-03-07T15:00:00Z",
    },
))
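The four steps listed above can be sketched without any real browser or API. The following is a minimal illustration, not the library's implementation: `Evidence`, `agentic_check`, and `fake_calendar` are hypothetical names, and a plain callable stands in for the sub-agent.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Evidence:
    """Hypothetical evidence record; not the library's real result type."""
    passed: bool
    observed: dict[str, Any]
    artifacts: list[str] = field(default_factory=list)


def agentic_check(fetch_state: Callable[[], dict[str, Any]],
                  expected: dict[str, Any]) -> Evidence:
    # Steps 1-2: the sub-agent interacts with the target system.
    # Here a plain callable stands in for a browser or API client.
    observed = fetch_state()
    # Step 3: compare only the keys that the expected outcome names.
    mismatches = [k for k, v in expected.items() if observed.get(k) != v]
    # Step 4: return structured evidence rather than a bare boolean.
    return Evidence(
        passed=not mismatches,
        observed=observed,
        artifacts=[f"mismatch:{k}" for k in mismatches],
    )


def fake_calendar() -> dict[str, Any]:
    # Stand-in for a real calendar lookup (assumed data).
    return {"summary": "Team standup", "start": "2026-03-07T15:00:00Z"}


ok = agentic_check(fake_calendar, {"summary": "Team standup"})
bad = agentic_check(fake_calendar, {"summary": "Retro"})
print(ok.passed, bad.passed, bad.artifacts)
```

Keeping the observed state in the result is what makes later debugging possible: when a check fails, the evidence shows what the sub-agent actually saw.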
Why They Break
AGENTIC verifiers can break when target systems change:
| Change | Impact | Detection |
|--------|--------|-----------|
| UI redesign | Browser selectors fail | Adversarial fixture failure |
| API version bump | Response schema changes | Negative fixture failure |
| Auth flow change | Login steps fail | All fixture types fail |
| Rate limiting | Intermittent failures | Flaky positive fixtures |
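The table can also be read in reverse, going from which fixtures fail to the most likely cause. The sketch below is a rough triage heuristic under that reading; `likely_cause` and the fixture-type strings are hypothetical and not part of the `vr` tooling, and the order of the checks is an assumption.

```python
from typing import Iterable

ALL_TYPES = {"positive", "negative", "adversarial"}


def likely_cause(failed: Iterable[str], flaky: Iterable[str] = ()) -> str:
    """Map fixture results to the most likely cause from the table."""
    failed, flaky = set(failed), set(flaky)
    if failed >= ALL_TYPES:
        # Every fixture type fails: the agent probably cannot log in.
        return "auth flow change"
    if "positive" in flaky:
        # Positive fixtures pass on some runs and fail on others.
        return "rate limiting"
    if "adversarial" in failed:
        return "UI redesign"
    if "negative" in failed:
        return "API version bump"
    return "no known pattern"


print(likely_cause(["adversarial"]))
print(likely_cause([], flaky=["positive"]))
```

Real breakage can combine several causes, so the result is a starting point for investigation rather than a diagnosis.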
Maintenance Checklist
Weekly
- Run `vr test --tier AGENTIC` to catch regressions
- Check for API deprecation notices from target services
On Failure
- Run the adversarial fixtures first - they're designed to catch common breakage patterns
- Check evidence output for DOM snapshots or API error responses
- Update selectors/schemas in the verifier implementation
- Re-run all three fixture types (positive, negative, adversarial)
Versioning
When you update an AGENTIC verifier's implementation, bump the version in VERIFIER.json:
{
  "version": "0.2.0",
  "changelog": "Updated browser selectors for new calendar UI"
}
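A small helper can keep the bump consistent. The sketch below assumes the version string follows MAJOR.MINOR.PATCH semantics, which the source does not state explicitly; `bump_version` is a hypothetical helper, and the starting manifest (`0.1.0`) is made up for illustration.

```python
import json


def bump_version(manifest: dict, part: str, note: str) -> dict:
    """Return a copy of the manifest with one version component bumped."""
    major, minor, patch = (int(x) for x in manifest["version"].split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    # Build a new dict so the caller's manifest is left untouched.
    return {**manifest, "version": f"{major}.{minor}.{patch}", "changelog": note}


# Assumed previous contents of VERIFIER.json.
manifest = json.loads('{"version": "0.1.0", "changelog": "Initial release"}')
updated = bump_version(
    manifest, "minor", "Updated browser selectors for new calendar UI"
)
print(json.dumps(updated, indent=2))
```

Whether a selector update counts as a minor or a patch change is a project decision; the helper only enforces the arithmetic.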
AGENTIC vs HARD Fallbacks
For critical pipelines, compose an AGENTIC verifier with a HARD fallback:
pipeline = compose(
    [
        get_verifier("vr/aiv.calendar.event_created"),  # AGENTIC: full check
        get_verifier("vr/api.http.status_ok"),          # HARD: API fallback
    ],
    policy_mode=PolicyMode.ESCALATION,
)
If the AGENTIC verifier times out or errors, the HARD verifier provides a baseline check.
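The fallback behavior described above can be illustrated with plain functions. This is one possible reading of that sentence, not the actual semantics of `PolicyMode.ESCALATION`; `run_with_fallback`, `agentic_check`, and `hard_check` are hypothetical. In this sketch a completed check that returns `False` is treated as a verdict and is not escalated.

```python
from typing import Callable, Sequence


def run_with_fallback(
    checks: Sequence[tuple[str, Callable[[], bool]]],
) -> tuple[str, bool]:
    """Return (name, verdict) from the first check that completes."""
    last_error = None
    for name, check in checks:
        try:
            return name, check()
        except Exception as exc:  # TimeoutError is a subclass of Exception
            last_error = exc
    raise RuntimeError("every verifier errored") from last_error


def agentic_check() -> bool:
    # Simulate the sub-agent timing out against the target system.
    raise TimeoutError("sub-agent timed out")


def hard_check() -> bool:
    # Simulate a cheap deterministic check that succeeds.
    return True


used, verdict = run_with_fallback(
    [("agentic", agentic_check), ("hard", hard_check)]
)
print(used, verdict)
```

Returning the name of the check that produced the verdict lets downstream consumers tell a full check apart from a baseline one.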
Cost Considerations
AGENTIC verifiers are the most expensive tier ($0.15/call) because they launch sub-agents. For RL training loops, prefer HARD verifiers for speed and use AGENTIC verifiers only for final evaluation.
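A quick calculation shows why the split matters. The per-call price comes from the figure above; the call volumes are hypothetical and chosen only to illustrate the difference in scale.

```python
AGENTIC_COST_CENTS = 15  # $0.15 per call, kept in cents to avoid float drift


def agentic_cost(calls: int) -> float:
    """Total cost in USD of running an AGENTIC verifier `calls` times."""
    return calls * AGENTIC_COST_CENTS / 100


training_calls = 100_000  # hypothetical: one call per RL rollout
eval_calls = 500          # hypothetical: one call per final-eval sample

print(f"training loop: ${agentic_cost(training_calls):,.2f}")
print(f"final eval:    ${agentic_cost(eval_calls):,.2f}")
```

Under these assumed volumes the training loop would cost $15,000.00 in verifier calls alone, against $75.00 for final evaluation.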