About vr.dev
What we do
vr.dev is an open platform for verifying AI agent actions. We maintain a registry of reward verifiers: software modules that check whether an AI agent actually did what it claimed to do, by inspecting real system state rather than trusting the agent's self-report.
Why it matters
Research shows that 27-78% of agent “successes” are procedurally wrong. The agent says it cancelled the order, but the database still shows the order as active. It says it sent the email, but the content doesn't match the rubric. Training on these false positives creates models that learn to appear correct instead of being correct.
Verifiable rewards fix this by introducing ground-truth checks into the training loop, evaluation pipeline, and production monitoring stack.
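To make the cancelled-order example concrete, here is a minimal sketch of a HARD-style check that ignores the agent's self-report and reads the order's status directly from a database. The `orders` table, its `status` column, and the `verify_order_cancelled` function are hypothetical illustrations, not part of the vrdev SDK.

```python
import sqlite3


def verify_order_cancelled(conn: sqlite3.Connection, order_id: int) -> dict:
    """Hypothetical HARD-style check: pass only if the stored status is
    'cancelled', and return the raw row as evidence."""
    row = conn.execute(
        "SELECT id, status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    passed = row is not None and row[1] == "cancelled"
    return {"passed": passed, "evidence": {"row": row}}


# Illustrative in-memory database: the agent claims it cancelled order 1,
# but the row still says "active".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'active'), (2, 'cancelled')")

agent_claim = {"order_id": 1, "action": "cancel", "success": True}
result = verify_order_cancelled(conn, agent_claim["order_id"])
print(result)  # {'passed': False, 'evidence': {'row': (1, 'active')}}
```

The agent's `"success": True` never enters the verdict; only the database row does, and that row is returned alongside the verdict so the decision can be audited.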
The approach
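Every verifier in the registry operates at one of three tiers: HARD (deterministic state checks), SOFT (rubric-based LLM judges), or AGENTIC (agent-driven probing). These compose into reward pipelines where HARD checks gate SOFT scores: if a state check fails, the rubric score does not count, so an agent cannot be rewarded for output that looks right but never changed the underlying system state. This gating is how the pipeline prevents reward hacking by design.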
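A minimal sketch of the gating idea, assuming a simple rule: any failed HARD check sets the reward to 0.0; otherwise the SOFT and AGENTIC scores are averaged. The `Result` type and `gated_reward` function are hypothetical, and the actual vrdev composition engine may combine scores differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    """Outcome of one hypothetical verifier run."""
    name: str
    tier: str     # "HARD", "SOFT", or "AGENTIC"
    passed: bool  # used for gating HARD checks
    score: float  # 0.0 to 1.0; used for SOFT/AGENTIC checks


def gated_reward(results: list[Result]) -> float:
    """Assumed rule: any failed HARD check zeroes the reward; otherwise
    average the SOFT/AGENTIC scores (or return 1.0 if there are none)."""
    if any(not r.passed for r in results if r.tier == "HARD"):
        return 0.0  # the state check vetoes the episode
    graded = [r.score for r in results if r.tier != "HARD"]
    return sum(graded) / len(graded) if graded else 1.0


# The confirmation message reads well, but the order is still active.
blocked = [
    Result("order_status_is_cancelled", "HARD", passed=False, score=0.0),
    Result("confirmation_tone", "SOFT", passed=True, score=1.0),
]
# Same episode after the order really was cancelled.
accepted = [
    Result("order_status_is_cancelled", "HARD", passed=True, score=1.0),
    Result("confirmation_tone", "SOFT", passed=True, score=1.0),
    Result("confirmation_structure", "SOFT", passed=True, score=0.5),
]
print(gated_reward(blocked))   # 0.0
print(gated_reward(accepted))  # 0.75
```

Averaging is only one choice; a weighted sum or a minimum would also work. The essential property is that a perfect SOFT score cannot rescue an episode whose HARD check failed.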
Every verification returns raw evidence payloads for full auditability. The hosted API adds integrity (Ed25519 signatures, content-hashed evidence chains) and optional on-chain anchoring (Merkle roots on Base L2) for third-party-verifiable tamper evidence.
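The sketch below illustrates the general technique of a content-hashed evidence chain plus a Merkle root over its entries, using only the standard library. The record layout, the all-zero genesis value, and the rule of pairing an odd node with itself are assumptions for illustration; vr.dev's actual chain format and Merkle construction may differ. Ed25519 signing of the chain head or root (for example with the `cryptography` package) is left out so the example runs without third-party dependencies.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record's "prev"


def canonical_hash(obj) -> str:
    """SHA-256 over canonical JSON (sorted keys, no extra whitespace)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def chain_evidence(payloads: list[dict]) -> list[dict]:
    """Link each evidence record to the hash of the previous record, so
    editing any earlier payload breaks every later link."""
    chain, prev = [], GENESIS
    for payload in payloads:
        record = {"prev": prev, "payload": payload}
        prev = canonical_hash(record)
        chain.append({**record, "hash": prev})
    return chain


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash and check that each record points at its predecessor."""
    prev = GENESIS
    for rec in chain:
        expected = canonical_hash({"prev": rec["prev"], "payload": rec["payload"]})
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True


def merkle_root(leaf_hashes: list[str]) -> str:
    """Hash hex-encoded leaves pairwise up to one root; an odd node is paired with itself."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


chain = chain_evidence([
    {"verifier": "order_status_is_cancelled", "row": [1, "active"], "passed": False},
    {"verifier": "confirmation_tone", "score": 0.9},
])
root = merkle_root([rec["hash"] for rec in chain])  # the value one would anchor
print(verify_chain(chain))  # True
chain[0]["payload"]["passed"] = True  # tamper with the first record
print(verify_chain(chain))  # False
```

Anchoring only the 32-byte root keeps the on-chain footprint small, while anyone holding the evidence records can recompute it and compare.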
What's open source
The following are open source under the MIT license:
- Python SDK (vrdev package)
- CLI tool (vr)
- All verifier registry specs and verification logic
- Composition engine and fixture data
- MCP server for Claude Desktop / Cursor
You can run everything locally with pip install vrdev. No API key, no hosted service required. HARD and SOFT verifiers run fully offline; AGENTIC verifiers may require network access to probe external services (IMAP, CalDAV, browsers).
Hosted service
The vr.dev API adds managed infrastructure on top of the open-source SDK:
- Managed execution of AGENTIC verifiers (IMAP, CalDAV, browser)
- Evidence storage with configurable retention
- Team/org management and API key provisioning
- Audit trail exports for compliance
- Rate limiting and usage analytics
Pricing
The open-source SDK runs all verifiers locally at zero cost. The hosted API charges per verification via USDC micropayments on Base using the x402 protocol. During beta, payments use Base Sepolia (testnet). Mainnet support is coming soon.
HARD: $0.005 USDC per verification
- Deterministic state checks
- Sub-second, no LLM calls
- Database, filesystem, HTTP, git
SOFT: $0.05 USDC per verification
- Rubric-based LLM judges
- Content quality, tone, structure
- Structured prompt consistency
AGENTIC: $0.15 USDC per verification
- Agent-driven probing
- Browser, calendar, shell
- Multi-step verification
API key billing and team invoicing are coming soon. See the pricing page for full details. The open-source SDK is always free and fully functional for local use.
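Because the hosted API charges a flat price per verification, estimating a batch's cost is simple arithmetic. This minimal sketch assumes the per-tier prices listed above and no volume discounts or extra fees (the pricing page may list other terms). `estimate_cost` is a hypothetical helper, not part of the SDK.

```python
from __future__ import annotations

from decimal import Decimal

# Per-verification prices from the tiers above, in USDC.
# Decimal avoids binary floating-point rounding on money.
PRICE_USDC = {
    "HARD": Decimal("0.005"),
    "SOFT": Decimal("0.05"),
    "AGENTIC": Decimal("0.15"),
}


def estimate_cost(counts: dict[str, int]) -> Decimal:
    """Total hosted-API cost for a batch, given verification counts per tier."""
    return sum((PRICE_USDC[tier] * n for tier, n in counts.items()), Decimal("0"))


# Example: 10,000 HARD, 1,000 SOFT, and 100 AGENTIC verifications.
print(estimate_cost({"HARD": 10_000, "SOFT": 1_000, "AGENTIC": 100}))  # 115.000
```

Running the same verifiers locally through the open-source SDK costs nothing; the estimate applies only to calls made through the hosted API.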
Stage
vr.dev is at v1.0.0. The SDK is stable (342+ test fixtures, 85%+ test coverage), the registry has 38 verifiers across 19 domains, and the hosted API is live with Ed25519-signed evidence chains and optional on-chain anchoring via Base L2. We're actively adding verifiers, improving documentation, and gathering feedback from early users.
Contributions welcome: see the contributing guide. Follow us on Bluesky.
Case Studies
Our benchmark case study demonstrates how HARD-gated verification eliminates 100% of false positives in a 100-episode e-commerce simulation, compared with a 35% false-positive rate under soft-only scoring.
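To show how such a comparison can be computed, here is a minimal sketch using the standard false-positive-rate definition, FP / (FP + TN): the share of episodes that actually failed but were still accepted. The case study may define the rate differently, and the four episodes below are made-up toy data, not the case-study data. Gating can only remove false positives that the HARD checks can detect.

```python
from __future__ import annotations


def false_positive_rate(accepted: list[bool], truly_succeeded: list[bool]) -> float:
    """Accepted-but-failed episodes divided by all episodes that actually failed."""
    negatives = [acc for acc, ok in zip(accepted, truly_succeeded) if not ok]
    if not negatives:
        return 0.0
    return sum(negatives) / len(negatives)


# Made-up toy data for four episodes (not the case-study data).
truly_succeeded = [True, False, False, True]
soft_accepts = [True, True, False, True]  # the rubric judge is fooled once
hard_passes = [True, False, False, True]  # the state check reads real state

# HARD-gated acceptance: both the rubric and the state check must agree.
gated = [s and h for s, h in zip(soft_accepts, hard_passes)]

print(false_positive_rate(soft_accepts, truly_succeeded))  # 0.5
print(false_positive_rate(gated, truly_succeeded))         # 0.0
```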
If you're using vr.dev for RL training, CI/CD gating, or production monitoring and would like to share your experience, reach out at hello@vr.dev.