About vr.dev

What we do

vr.dev is an open platform for verifying AI agent actions. We maintain a registry of reward verifiers: software modules that check whether an AI agent actually did what it claimed to do, by inspecting real system state rather than trusting the agent's self-report.

Why it matters

Research shows that 27-78% of agent “successes” are procedurally wrong. The agent says it cancelled the order, but the database still shows it active. It says it sent the email, but the content doesn't match the rubric. Training on these false positives creates models that learn to appear correct instead of being correct.

Verifiable rewards fix this by introducing ground-truth checks into the training loop, evaluation pipeline, and production monitoring stack.
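As a rough illustration of the cancelled-order case above, the sketch below checks real state with Python's built-in sqlite3 module instead of trusting the agent's claim. It does not use the vrdev package; the function name verify_order_cancelled, the orders table, and the result fields are hypothetical and chosen only for this example.

```python
import sqlite3

def verify_order_cancelled(conn, order_id):
    """HARD-style check: read the order's real status instead of trusting the agent.

    Hypothetical helper for illustration; not part of the vrdev SDK.
    """
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    actual = row[0] if row else None
    return {
        "passed": actual == "cancelled",
        # Raw evidence is returned so the verdict can be audited later.
        "evidence": {"order_id": order_id, "actual_status": actual},
    }

# In-memory database standing in for the real system of record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES ('A1', 'active'), ('A2', 'cancelled')")

agent_claim = "I cancelled order A1"  # self-report; the verifier never reads it
result_a1 = verify_order_cancelled(conn, "A1")
result_a2 = verify_order_cancelled(conn, "A2")
print(result_a1["passed"], result_a2["passed"])
```

Here the agent's claim about order A1 is a false positive: the database still shows the order as active, so the check fails regardless of what the agent reported.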

The approach

Every verifier in the registry operates at one of three tiers: HARD (deterministic state checks), SOFT (rubric-based LLM judges), or AGENTIC (agent-driven probing). These compose into reward pipelines where hard checks gate soft scores, preventing reward hacking by design.
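One way to picture hard checks gating soft scores is the sketch below. It is an assumption about how such gating could work, not the vr.dev composition engine: the Result class and the compose function are hypothetical, and averaging the soft scores is just one possible aggregation.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    """Hypothetical verifier result; not the vrdev SDK's type."""
    tier: str            # "HARD", "SOFT", or "AGENTIC"
    passed: bool
    score: float         # 0.0 to 1.0
    evidence: dict = field(default_factory=dict)

def compose(hard_results, soft_results):
    """Hard checks gate soft scores: any failed hard check forces a reward of 0.0."""
    if not all(r.passed for r in hard_results):
        return 0.0
    if not soft_results:
        return 1.0
    # Assumed aggregation: plain mean of the soft scores.
    return sum(r.score for r in soft_results) / len(soft_results)

soft = [Result("SOFT", True, 0.5), Result("SOFT", True, 1.0)]
gated = compose([Result("HARD", False, 0.0)], soft)    # hard check failed
ungated = compose([Result("HARD", True, 1.0)], soft)   # hard check passed
print(gated, ungated)
```

With this structure, a persuasive but wrong answer cannot earn reward from the LLM judge alone, because the soft score is only consulted after the deterministic checks pass.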

Every verification returns raw evidence payloads for full auditability. The hosted API adds integrity guarantees (Ed25519 signatures, content-hashed evidence chains) and optional on-chain anchoring (Merkle roots on Base L2), so third parties can independently detect tampering.
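The general idea behind a content-hashed evidence chain and a Merkle root can be sketched with the standard library alone. This is a generic illustration, not the hosted API's actual format: the hashing scheme, the function names, and the sample payloads are all assumptions, and Ed25519 signing is left out because it needs a third-party library.

```python
import hashlib
import json

def content_hash(payload, prev_hash):
    """Hash one evidence payload together with the previous link's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()

def build_chain(payloads):
    """Return one hash per payload; each hash depends on everything before it."""
    hashes, prev = [], "0" * 64  # assumed all-zero genesis value
    for payload in payloads:
        prev = content_hash(payload, prev)
        hashes.append(prev)
    return hashes

def merkle_root(leaf_hashes):
    """Fold hex leaf hashes pairwise into one root (an odd node is duplicated)."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

# Hypothetical evidence payloads.
evidence = [
    {"verifier": "order_cancelled", "actual_status": "active"},
    {"verifier": "email_sent", "message_found": True},
    {"verifier": "refund_issued", "amount": "19.99"},
]
chain = build_chain(evidence)
root = merkle_root(chain)

# Editing the first payload changes every later link and therefore the root.
tampered = [dict(evidence[0], actual_status="cancelled")] + evidence[1:]
tampered_chain = build_chain(tampered)
tampered_root = merkle_root(tampered_chain)
print(root != tampered_root)
```

Only the root needs to be published for anchoring; anyone holding the evidence can recompute it and compare.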

What's open source

The following are open source under the MIT license:

Python SDK (vrdev package)
CLI tool (vr)
All verifier registry specs and verification logic
Composition engine and fixture data
MCP server for Claude Desktop / Cursor

You can run everything locally with pip install vrdev. No API key, no hosted service required. HARD and SOFT verifiers run fully offline; AGENTIC verifiers may require network access to probe external services (IMAP, CalDAV, browsers).

Hosted service

The vr.dev API adds managed infrastructure on top of the open-source SDK:

Managed execution of AGENTIC verifiers (IMAP, CalDAV, browser)
Evidence storage with configurable retention
Team/org management and API key provisioning
Audit trail exports for compliance
Rate limiting and usage analytics

Pricing

The open-source SDK runs all verifiers locally at zero cost. The hosted API charges per verification via USDC micropayments on Base using the x402 protocol. During beta, payments use Base Sepolia (testnet); mainnet support is coming soon.

HARD: $0.005 USDC per verification
Deterministic state checks
Sub-second, no LLM calls
Database, filesystem, HTTP, git

SOFT: $0.05 USDC per verification
Rubric-based LLM judges
Content quality, tone, structure
Structured prompt consistency

AGENTIC: $0.15 USDC per verification
Agent-driven probing
Browser, calendar, shell
Multi-step verification

API key billing and team invoicing are coming soon. See the pricing page for full details. The open-source SDK is always free and fully functional for local use.
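To estimate hosted costs from the per-verification prices above, a small calculation like the following may help. The prices are the ones listed on this page; the workload (three HARD checks and one SOFT check per episode, 10,000 episodes) is a made-up example, and estimate_cost is a hypothetical helper.

```python
from decimal import Decimal

# Per-verification prices in USDC, as listed above.
PRICE_USDC = {
    "HARD": Decimal("0.005"),
    "SOFT": Decimal("0.05"),
    "AGENTIC": Decimal("0.15"),
}

def estimate_cost(counts):
    """Sum price * count over tiers; Decimal avoids float rounding on small amounts."""
    return sum((PRICE_USDC[tier] * n for tier, n in counts.items()), Decimal("0"))

# Hypothetical workload: verifications run per training episode.
episode = {"HARD": 3, "SOFT": 1, "AGENTIC": 0}
per_episode = estimate_cost(episode)   # 3 * 0.005 + 1 * 0.05 = 0.065 USDC
run_cost = per_episode * 10_000        # 650 USDC for 10,000 episodes
print(per_episode, run_cost)
```

Running the same verifiers locally through the open-source SDK costs nothing, so an estimate like this only applies to hosted usage.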

Stage

vr.dev is at v1.0.0. The SDK is stable (342+ test fixtures, 85%+ coverage), the registry has 38 verifiers across 19 domains, and the hosted API is live. The hosted API provides Ed25519-signed evidence chains with optional on-chain anchoring via Base L2. We're actively adding verifiers, improving documentation, and gathering feedback from early users.

Contributions welcome: see the contributing guide. Follow us on Bluesky.

Case Studies

Our benchmark case study demonstrates how HARD-gated verification eliminates all false positives in a 100-episode e-commerce simulation, compared with a 35% false-positive rate under soft-only scoring.

If you're using vr.dev for RL training, CI/CD gating, or production monitoring and would like to share your experience, reach out at hello@vr.dev.