MCP Server
The vr.dev SDK includes an MCP (Model Context Protocol) server that exposes verifiers as tools for AI assistants like Claude Desktop and Cursor.
Install
pip install "vrdev[mcp]"
This installs the mcp optional dependency alongside the core SDK.
(The quotes prevent your shell from interpreting the square brackets.)
Start the Server
# stdio transport (for Claude Desktop / Cursor)
python -m vrdev.adapters.mcp_server
Claude Desktop Setup
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "vrdev": {
      "command": "python",
      "args": ["-m", "vrdev.adapters.mcp_server"]
    }
  }
}
Cursor Setup
In Cursor settings, add a new MCP server:
- Name: vrdev
- Command: python -m vrdev.adapters.mcp_server
- Transport: stdio
Available Tools
The MCP server exposes 6 tools:
list_verifiers
List all registered verifier IDs in the vr.dev registry.
→ list_verifiers()
← ["vr/filesystem.file_created", "vr/code.python.lint_ruff", ...]
Use this to discover available verifiers before running them.
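Since IDs follow a vr/&lt;domain&gt;.&lt;name&gt; pattern, the returned list is easy to organize client-side. A small sketch of that idea (group_by_domain is a hypothetical helper for illustration, not part of the SDK):

```python
# Hypothetical helper: group verifier IDs by their domain prefix.
# Assumes IDs shaped like "vr/<domain>.<rest>", as in the examples above.
def group_by_domain(verifier_ids):
    groups = {}
    for vid in verifier_ids:
        # "vr/code.python.lint_ruff" -> domain "code"
        domain = vid.split("/", 1)[1].split(".", 1)[0]
        groups.setdefault(domain, []).append(vid)
    return groups
```

This makes it easy to show a user only the verifiers relevant to their task before calling run_verifier.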
run_verifier
Run a single verifier against agent completions. Returns verdict (PASS/FAIL), score, and evidence.
→ run_verifier(
verifier_id="vr/filesystem.file_created",
completions=["Created output.txt"],
ground_truth={"expected_path": "/tmp/output.txt"}
)
← { verdict: "PASS", score: 1.0, evidence: { ... } }
compose_chain
Run a composed chain of verifiers with hard-gating. With fail_closed policy, if any HARD verifier fails, the entire chain scores 0.0, preventing agents from gaming SOFT judges.
→ compose_chain(
verifier_ids=["vr/tau2.retail.order_cancelled", "vr/rubric.email.tone_professional"],
completions=["I cancelled order ORD-42 and sent confirmation"],
ground_truth={"order_id": "ORD-42"},
policy="fail_closed"
)
← { verdict: "PASS", score: 0.85, breakdown: { ... } }
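The hard-gating behavior can be illustrated with a standalone sketch. The function name, result shape, and the mean aggregation below are assumptions for illustration, not the vrdev internals:

```python
# Illustrative sketch of a fail_closed policy; the real vrdev composition
# logic may differ. Each result is (kind, verdict, score), where kind is
# "HARD" or "SOFT".
def fail_closed_score(results):
    # Any failing HARD verifier zeroes the entire chain, so a high score
    # from a SOFT judge cannot mask a hard failure.
    if any(kind == "HARD" and verdict == "FAIL" for kind, verdict, _ in results):
        return 0.0
    # Otherwise aggregate the individual scores (mean is one possible choice).
    scores = [score for _, _, score in results]
    return sum(scores) / len(scores) if scores else 0.0
```

This is why fail_closed is the safer default for training signals: an agent cannot recover reward by pleasing a SOFT judge after failing a HARD check.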
explain_failure
Run a verifier and get a human-readable markdown explanation of why it passed or failed. Useful for debugging agent behavior.
→ explain_failure(
verifier_id="vr/code.python.lint_ruff",
completions=["import os\nimport os"],
ground_truth={"file_path": "example.py"}
)
← "## Verification: vr/code.python.lint_ruff\n- Verdict: FAIL\n..."
search_verifiers
Search verifiers by keyword across IDs, descriptions, and domains.
→ search_verifiers(query="database")
← ["vr/database.row.exists", "vr/database.row.updated", "vr/database.table.row_count"]
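The search behaves like a case-insensitive substring match over several fields. A minimal sketch, assuming a registry shaped as a dict of ID to metadata (that shape is an assumption, not the actual vrdev registry):

```python
# Minimal sketch of keyword search over a verifier registry.
# The registry shape {id: {"description": ..., "domain": ...}} is an
# assumption for illustration.
def search_verifiers(query, registry):
    q = query.lower()
    return sorted(
        vid
        for vid, meta in registry.items()
        if q in vid.lower()
        or q in meta.get("description", "").lower()
        or q in meta.get("domain", "").lower()
    )
```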
gem_reward
Compute a GEM-compatible reward score from verification results. Useful for training integrations that expect the GEM reward format.
→ gem_reward(
verifier_id="vr/code.python.tests_pass",
completions=["def add(a, b): return a + b"],
ground_truth={"repo": ".", "test_cmd": "pytest"}
)
← { reward: 1.0, metadata: { ... } }
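Conceptually, this tool maps a verification result onto a scalar reward plus metadata. A rough sketch of that mapping (the field names and the PASS/FAIL-to-reward rule are assumptions; the actual GEM format may differ):

```python
# Assumed mapping from a verification result to a GEM-style reward.
# This is a sketch for illustration, not the vrdev implementation.
def to_gem_reward(verdict, score, evidence=None):
    # PASS keeps the verifier's score as the reward; FAIL yields 0.0.
    reward = score if verdict == "PASS" else 0.0
    return {
        "reward": reward,
        "metadata": {"verdict": verdict, "evidence": evidence or {}},
    }
```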
Example Workflow in Claude
- Ask Claude: "Check if my Python file passes linting"
- Claude calls search_verifiers(query="lint") → finds vr/code.python.lint_ruff
- Claude calls run_verifier("vr/code.python.lint_ruff", ...)
- Claude reports the verdict and any linting issues
Example: Composed Verification
- Ask Claude: "Verify that the order was cancelled and the email was sent"
- Claude calls compose_chain(["vr/tau2.retail.order_cancelled", "vr/aiv.email.sent_folder_confirmed"], ..., policy="fail_closed")
- If the order check fails, the entire pipeline fails, even if the email looks fine
Programmatic Usage
You can also create the MCP server programmatically:
from vrdev.adapters.mcp_server import create_mcp_server
mcp = create_mcp_server()
mcp.run() # starts stdio transport
This is useful for embedding the MCP server in a larger application or custom transport.