Golden Pipeline Templates
Pre-composed verification pipelines for common agent tasks. Copy, paste, and adapt.
Each template uses the compose() API with policy_mode="fail_closed". SOFT scores only count if all HARD checks pass first.
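The gating rule can be sketched in plain Python without the library. This is a minimal illustration, not the vrdev implementation: `Check` and `gated_score` are hypothetical names, and averaging the SOFT scores is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Check:
    """Hypothetical stand-in for one verifier's result (not a vrdev type)."""
    name: str
    tier: str      # "HARD", "AGENTIC", or "SOFT"
    passed: bool
    score: float   # in [0.0, 1.0]


def gated_score(checks: List[Check]) -> float:
    """Return 0.0 unless every non-SOFT check passed; otherwise average the SOFT scores."""
    gates = [c for c in checks if c.tier != "SOFT"]
    soft = [c for c in checks if c.tier == "SOFT"]
    if not all(c.passed for c in gates):
        return 0.0          # a failed gate zeroes the score, whatever the SOFT scorer said
    if not soft:
        return 1.0          # gates only: pass means full score
    return sum(c.score for c in soft) / len(soft)


broken = [Check("lint", "HARD", False, 0.0), Check("rubric", "SOFT", True, 0.9)]
clean = [Check("lint", "HARD", True, 1.0), Check("rubric", "SOFT", True, 0.9)]
print(gated_score(broken))  # 0.0
print(gated_score(clean))   # 0.9
```

The high rubric score in `broken` never reaches the final result, which is the property every template below relies on.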
1. Cancel Order and Verify (Retail)
Use case: Agent cancels an order, processes a refund, and sends a notification email.
Verifiers: 2× HARD + 1× AGENTIC (fail_closed)
from vrdev import get_verifier, compose, VerifierInput
from vrdev.core.types import PolicyMode
pipeline = compose(
    [
        get_verifier("vr/tau2.retail.order_cancelled"),
        get_verifier("vr/tau2.retail.refund_processed"),
        get_verifier("vr/aiv.email.sent_folder_confirmed"),
    ],
    require_hard=True,
    policy_mode=PolicyMode.FAIL_CLOSED,
)
result = pipeline.verify(VerifierInput(
    completions=["Order cancelled and confirmation sent"],
    ground_truth={
        "order_id": "ORD-42",
        "expected_refund_amount": 49.99,
        "email_subject": "Your order has been cancelled",
    },
))
print(result[0].passed) # True only if ALL checks pass
print(result[0].score) # 1.0 or 0.0
print(result[0].breakdown) # per-verifier results
Why this works: Even if the agent writes a perfect cancellation email, the pipeline fails if the order is still active or the refund wasn't processed.
2. Code Agent with Quality Gate (Dev)
Use case: Agent writes or modifies code. The code must pass linting and tests before the SOFT rubric is scored.
Verifiers: 2× HARD gate + 1× SOFT scorer
from vrdev import get_verifier, compose, VerifierInput
from vrdev.core.types import PolicyMode
pipeline = compose(
    [
        get_verifier("vr/code.python.lint_ruff"),
        get_verifier("vr/code.python.tests_pass"),
        get_verifier("vr/rubric.code.logic_correct"),
    ],
    require_hard=True,
    policy_mode=PolicyMode.FAIL_CLOSED,
)
result = pipeline.verify(VerifierInput(
    completions=["Fixed the bug and committed"],
    ground_truth={
        "file_path": "src/handler.py",
        "repo": ".",
        "test_cmd": "pytest tests/ -q",
    },
))
# If linting or tests fail, rubric score is zeroed out
# Agent can't get a high score by writing "nice-looking" broken code
print(f"Score: {result[0].score:.2f}")
Why this works: The SOFT rubric only contributes to the score if both HARD gates pass. An agent can't game the LLM judge score while submitting code that doesn't lint or pass tests.
3. Email with Tone Check (Support)
Use case: Agent sends a customer email. Must verify the email was actually sent (AGENTIC) before scoring tone quality (SOFT).
Verifiers: 1× AGENTIC gate + 1× SOFT scorer
from vrdev import get_verifier, compose, VerifierInput
from vrdev.core.types import PolicyMode
pipeline = compose(
    [
        get_verifier("vr/aiv.email.sent_folder_confirmed"),
        get_verifier("vr/rubric.email.tone_professional"),
    ],
    policy_mode=PolicyMode.FAIL_CLOSED,
)
result = pipeline.verify(VerifierInput(
    completions=["Sent a response to the customer"],
    ground_truth={
        "email_subject": "Re: Your support ticket #1234",
        "expected_recipient": "customer@example.com",
    },
))
# SOFT score only counts if the email was actually sent
print(f"Sent: {result[0].passed}")
print(f"Score: {result[0].score:.2f}")
Why this works: An agent that generates a beautifully written email but never actually sends it gets a score of 0.0.
Adapting Templates
Change the policy mode
- fail_closed (default): Any HARD/AGENTIC FAIL or ERROR → score 0.0
- fail_open: Only explicit FAIL blocks the pipeline; ERROR is tolerated
- escalation: Run tiers in order, stop when a tier passes
- ensemble: Run all verifiers and aggregate scores
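The difference between fail_closed and fail_open comes down to how an ERROR verdict is treated. The sketch below illustrates only that distinction, following the policy descriptions above; `Verdict` and `gate_blocks` are hypothetical names, not part of the vrdev API, and escalation and ensemble are left out.

```python
from enum import Enum
from typing import Iterable


class Verdict(Enum):
    """Hypothetical gate outcomes, mirroring the PASS/FAIL/ERROR wording above."""
    PASS = "PASS"
    FAIL = "FAIL"
    ERROR = "ERROR"


def gate_blocks(verdicts: Iterable[Verdict], policy: str) -> bool:
    """Return True if the HARD/AGENTIC verdicts should block the pipeline."""
    verdicts = list(verdicts)
    if policy == "fail_closed":
        # Anything other than a clean pass blocks: FAIL and ERROR alike.
        return any(v in (Verdict.FAIL, Verdict.ERROR) for v in verdicts)
    if policy == "fail_open":
        # Only an explicit FAIL blocks; ERROR is tolerated.
        return any(v is Verdict.FAIL for v in verdicts)
    raise ValueError(f"policy not covered by this sketch: {policy}")


flaky = [Verdict.PASS, Verdict.ERROR]   # e.g. one verifier timed out
print(gate_blocks(flaky, "fail_closed"))  # True
print(gate_blocks(flaky, "fail_open"))    # False
```

For reward signals, fail_closed is the more conservative choice: a verifier that could not run is treated as unverified rather than as a pass.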
Use with the CLI
vr compose \
  --verifiers vr/tau2.retail.order_cancelled,vr/tau2.retail.refund_processed \
  --policy fail_closed \
  --ground-truth '{"order_id": "ORD-42"}'
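When the ground truth comes from a script, hand-quoting JSON for the shell is error-prone. One possible approach, using only the Python standard library and the flags shown in the command above, is to build the argument list programmatically. The sketch only constructs and prints the command; it does not run `vr`.

```python
import json
import shlex

verifiers = [
    "vr/tau2.retail.order_cancelled",
    "vr/tau2.retail.refund_processed",
]
ground_truth = {"order_id": "ORD-42"}

# Each flag and its value are separate list items, so no manual quoting is needed.
argv = [
    "vr", "compose",
    "--verifiers", ",".join(verifiers),
    "--policy", "fail_closed",
    "--ground-truth", json.dumps(ground_truth),
]

# shlex.join produces a copy-pasteable shell command with safe quoting.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` would bypass shell quoting entirely.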
Export for RL training
from vrdev import export_to_trl
# Run the pipeline on many episodes and collect the outputs in `results`,
# then export them for GRPO/DPO training
export_to_trl(results, output="training_data.jsonl")
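For intuition on how gated scores could feed preference training, here is a standalone sketch that turns scored completions for one prompt into a `prompt`/`chosen`/`rejected` record, the field names TRL's DPO trainer conventionally expects. It is an illustration only: `to_dpo_pair` is a hypothetical helper, and the file written by `export_to_trl` may be structured differently.

```python
import json
from typing import Dict, Optional


def to_dpo_pair(prompt: str, scored: Dict[str, float]) -> Optional[dict]:
    """Pick the highest- and lowest-scored completions as chosen/rejected.

    Returns None when there is no contrast to learn from.
    """
    if len(scored) < 2:
        return None
    ranked = sorted(scored.items(), key=lambda kv: kv[1])  # ascending by score
    (rejected, low), (chosen, high) = ranked[0], ranked[-1]
    if high == low:
        return None  # all completions scored the same
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = to_dpo_pair(
    "Cancel order ORD-42",
    {
        "Order cancelled and refund issued": 1.0,  # gates passed
        "I have sent an email": 0.0,               # gates failed, score zeroed
    },
)
print(json.dumps(pair))  # one JSONL line
```

Because failed gates force a score of 0.0, completions that only claim success end up on the rejected side of the pair.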
Building Your Own Pipeline
- Browse the Registry to find verifiers for your domain
- Compose HARD checks first (state verification), then SOFT (quality scoring)
- Use fail_closed to prevent reward hacking
- Test with adversarial inputs: use each verifier's built-in adversarial fixtures
- Export results to your training framework
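The adversarial-testing step above can be sketched as follows: feed a check a completion that claims success while the underlying state says otherwise, and confirm the check still fails. `order_cancelled` here is a hypothetical toy check over an in-memory dict, not one of the registry verifiers or their built-in fixtures.

```python
from typing import Dict


def order_cancelled(completion: str, state: Dict[str, str], order_id: str) -> bool:
    """Hypothetical HARD-style check: reads order state and ignores the completion text."""
    return state.get(order_id) == "cancelled"


# Adversarial case: the text claims success, but the order is still active.
adversarial_completion = "Order ORD-42 has been cancelled and refunded."
assert order_cancelled(adversarial_completion, {"ORD-42": "active"}, "ORD-42") is False

# Honest case: terse text, but the state really changed.
assert order_cancelled("done", {"ORD-42": "cancelled"}, "ORD-42") is True
print("adversarial checks behaved as expected")
```

A check that can be flipped by rewording the completion alone is a SOFT signal and should not be used as a gate.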
See also: Composition Engine · BYOS Pattern · Integration Guide