Integration Guide
TRL (Transformer Reinforcement Learning)
Export verified rewards directly to TRL format:
from vrdev import get_verifier, compose, VerifierInput, export_to_trl
from vrdev.core.types import PolicyMode
pipeline = compose(
    [
        get_verifier("vr/tau2.retail.order_cancelled"),
        get_verifier("vr/rubric.email.tone_professional"),
    ],
    require_hard=True,
    policy_mode=PolicyMode.FAIL_CLOSED,
)
# Run verification across your eval set
results = [pipeline.verify(episode) for episode in eval_set]
# Export TRL-compatible reward dataset
export_to_trl(results, output="rewards.jsonl")
The output JSONL contains one record per verification with the reward signal, evidence hash, and trajectory reference.
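Before training, it can help to read the exported file back and check it. The sketch below uses only the standard library. The key name `reward` is an assumption made for illustration, because the exact field names in the export are not listed here; check them against a real `rewards.jsonl` before relying on this. `load_rewards` and `summarize` are hypothetical helpers, not part of vrdev.

```python
import json
from pathlib import Path


def load_rewards(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(records, reward_key="reward"):
    """Return the record count and mean reward.

    reward_key is an assumed field name; adjust it to match the real export.
    """
    rewards = [float(r[reward_key]) for r in records]
    mean = sum(rewards) / len(rewards) if rewards else 0.0
    return {"count": len(rewards), "mean_reward": mean}
```

Calling `summarize(load_rewards("rewards.jsonl"))` gives a quick check that the export is non-empty and that rewards fall in the range you expect.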
VERL
For VERL-based training:
from vrdev import export_to_verl
export_to_verl(results, output="verl_rewards/")
This generates the directory structure VERL expects, including reward shaping metadata.
OpenClaw Adapter
The openclaw.cancel_and_email skill in the registry shows how to integrate vr.dev verifiers as the reward function for an OpenClaw skill:
from openclaw import Skill
from vrdev import compose
reward_fn = compose([
    "tau2.retail.order_cancelled",
    "aiv.email.sent_folder_confirmed",
    "rubric.email.tone_professional",
], policy_mode="fail_closed")
skill = Skill(
    name="cancel_and_email",
    reward=reward_fn,
    max_steps=15,
)
Custom Training Loops
For arbitrary training frameworks, use the raw result objects:
from vrdev import get_verifier, VerifierInput
v = get_verifier("vr/code.python.tests_pass")
result = v.verify(VerifierInput(
    completions=["agent output here"],
    ground_truth={"repo": ".", "test_cmd": "pytest"},
))
reward = result[0].score  # float 0.0-1.0
passed = result[0].passed  # bool
evidence = result[0].evidence  # dict
evidence_hash = result[0].evidence_hash  # str, sha256:...
Map these to your framework's reward format as needed.
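One possible mapping is sketched below. It sets the reward to zero when verification fails, so a partial score from a failed check never reaches the trainer. `StubResult` is a hypothetical stand-in that carries only the four attributes shown above, and `to_reward_record` is an illustrative helper, not part of vrdev. The output dict layout is also an assumption; adapt it to whatever your framework consumes.

```python
from dataclasses import dataclass, field


@dataclass
class StubResult:
    """Hypothetical stand-in exposing the attributes read from a result."""
    score: float
    passed: bool
    evidence: dict = field(default_factory=dict)
    evidence_hash: str = ""


def to_reward_record(result, gate_on_pass=True):
    """Convert one result object into a plain dict for a custom trainer.

    With gate_on_pass=True, a failed verification yields reward 0.0
    regardless of its score; with False, the raw score is passed through.
    """
    reward = float(result.score)
    if gate_on_pass and not result.passed:
        reward = 0.0
    return {
        "reward": reward,
        "passed": bool(result.passed),
        "evidence_hash": result.evidence_hash,
    }
```

Any object with the same attributes can be passed in place of `StubResult`. Keeping `evidence_hash` next to the reward lets a training record be traced back to the evidence that produced it.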
CI/CD Integration
Add verification to your CI pipeline:
# .github/workflows/verify.yml
- name: Verify agent outputs
  run: |
    pip install vrdev
    vr compose \
      --verifiers code.python.lint_ruff,code.python.tests_pass \
      --ground-truth '{"repo": "."}' \
      --policy fail_closed \
      --output json
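If you want the build to fail on the JSON output explicitly, a small gate function can turn that output into an exit code. This is a sketch under a loud assumption: the JSON schema is not documented here, so the code assumes either a single object or a list of objects, each with a boolean `passed` key. `exit_code_for` is a hypothetical helper; adjust the key to match the real output.

```python
import json


def exit_code_for(json_text, passed_key="passed"):
    """Return 0 if every record passed, otherwise 1.

    Assumes json_text is one JSON object or a list of objects, each with
    a boolean under passed_key (an assumed schema, not a documented one).
    """
    data = json.loads(json_text)
    records = data if isinstance(data, list) else [data]
    if not records:
        # Fail closed: an empty result set means nothing was verified.
        return 1
    # A record with no passed_key counts as a failure.
    all_ok = all(bool(r.get(passed_key, False)) for r in records)
    return 0 if all_ok else 1
```

In a workflow step, the captured output could be piped into a short script that calls `sys.exit(exit_code_for(sys.stdin.read()))`. This guide does not say whether `vr compose` already exits non-zero on failure, so check that first; the gate may be redundant.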