MCP Server

The vr.dev SDK includes an MCP (Model Context Protocol) server that exposes verifiers as tools for AI assistants like Claude Desktop and Cursor.

Install

pip install "vrdev[mcp]"

This installs the mcp optional dependency alongside the core SDK. (The quotes keep shells such as zsh from treating the square brackets as a glob pattern.)

Start the Server

# stdio transport (for Claude Desktop / Cursor)
python -m vrdev.adapters.mcp_server

Claude Desktop Setup

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "vrdev": {
      "command": "python",
      "args": ["-m", "vrdev.adapters.mcp_server"]
    }
  }
}

Cursor Setup

In Cursor settings, add a new MCP server:

  • Name: vrdev
  • Command: python -m vrdev.adapters.mcp_server
  • Transport: stdio

Available Tools

The MCP server exposes six tools:

list_verifiers

List all registered verifier IDs in the vr.dev registry.

→ list_verifiers()
← ["vr/filesystem.file_created", "vr/code.python.lint_ruff", ...]

Use this to discover available verifiers before running them.

run_verifier

Run a single verifier against agent completions. Returns verdict (PASS/FAIL), score, and evidence.

→ run_verifier(
    verifier_id="vr/filesystem.file_created",
    completions=["Created output.txt"],
    ground_truth={"expected_path": "/tmp/output.txt"}
  )
← { verdict: "PASS", score: 1.0, evidence: { ... } }

compose_chain

Run a composed chain of verifiers with hard gating. Under the fail_closed policy, a single failing HARD verifier scores the entire chain 0.0, so agents cannot game SOFT judges into a passing score.

→ compose_chain(
    verifier_ids=["vr/tau2.retail.order_cancelled", "vr/rubric.email.tone_professional"],
    completions=["I cancelled order ORD-42 and sent confirmation"],
    ground_truth={"order_id": "ORD-42"},
    policy="fail_closed"
  )
← { verdict: "PASS", score: 0.85, breakdown: { ... } }
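The fail_closed gating can be pictured with a small sketch. This is an illustrative model of the policy described above, not the vrdev implementation; the (kind, passed, score) tuple shape and the averaging of SOFT scores are assumptions for the example.

```python
# Illustrative sketch of fail_closed hard gating (NOT the vrdev implementation).
# Each result is a (kind, passed, score) tuple, kind in {"HARD", "SOFT"}.

def chain_score(results):
    # A single failed HARD verifier zeroes the whole chain.
    if any(kind == "HARD" and not passed for kind, passed, _ in results):
        return 0.0
    # Otherwise, aggregate the SOFT judge scores (here: a plain average).
    soft = [score for kind, _, score in results if kind == "SOFT"]
    return sum(soft) / len(soft) if soft else 1.0

# Hard gate passes, soft judge scores 0.85 -> chain scores 0.85:
print(chain_score([("HARD", True, 1.0), ("SOFT", True, 0.85)]))  # 0.85

# Hard gate fails -> the soft score cannot rescue the chain:
print(chain_score([("HARD", False, 0.0), ("SOFT", True, 0.95)]))  # 0.0
```

This is why compose_chain is preferred over running a SOFT judge alone: the objective ground-truth check acts as a gate in front of the subjective score.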

explain_failure

Run a verifier and get a human-readable markdown explanation of why it passed or failed. Useful for debugging agent behavior.

→ explain_failure(
    verifier_id="vr/code.python.lint_ruff",
    completions=["import os\nimport os"],
    ground_truth={"file_path": "example.py"}
  )
← "## Verification: vr/code.python.lint_ruff\n- Verdict: FAIL\n..."

search_verifiers

Search verifiers by keyword across IDs, descriptions, and domains.

→ search_verifiers(query="database")
← ["vr/database.row.exists", "vr/database.row.updated", "vr/database.table.row_count"]
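The search behaves like a case-insensitive keyword match over the registry. A minimal sketch of that idea, assuming a plain id-to-description mapping (the registry shape and descriptions here are illustrative, not the vrdev internals):

```python
# Hypothetical sketch of keyword search over a verifier registry.
# The registry dict and its descriptions are made-up examples.

def search_verifiers(registry, query):
    q = query.lower()
    return [vid for vid, desc in registry.items()
            if q in vid.lower() or q in desc.lower()]

registry = {
    "vr/database.row.exists": "Check that a database row exists",
    "vr/code.python.lint_ruff": "Run ruff lint on Python code",
}

print(search_verifiers(registry, "database"))  # ['vr/database.row.exists']
print(search_verifiers(registry, "lint"))      # ['vr/code.python.lint_ruff']
```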

gem_reward

Compute a GEM-compatible reward score from verification results. Useful for training integrations that expect the GEM reward format.

→ gem_reward(
    verifier_id="vr/code.python.tests_pass",
    completions=["def add(a, b): return a + b"],
    ground_truth={"repo": ".", "test_cmd": "pytest"}
  )
← { reward: 1.0, metadata: { ... } }
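Conceptually, this tool maps a verification result onto a scalar reward plus metadata. A minimal sketch of such a mapping, with field names taken from the examples above; the exact GEM schema and the FAIL-means-zero rule are assumptions for illustration:

```python
# Illustrative mapping from a verification result to a GEM-style reward dict.
# Field names mirror the doc examples; this is not the vrdev schema.

def to_gem_reward(result):
    return {
        # Assumed convention: PASS passes the score through, FAIL yields 0.0.
        "reward": result["score"] if result["verdict"] == "PASS" else 0.0,
        "metadata": {
            "verifier": result["verifier_id"],
            "verdict": result["verdict"],
        },
    }

print(to_gem_reward({"verifier_id": "vr/code.python.tests_pass",
                     "verdict": "PASS", "score": 1.0}))
```

A trainer that consumes GEM rewards can then treat the verifier chain as an ordinary reward function.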

Example Workflow in Claude

  1. Ask Claude: "Check if my Python file passes linting"
  2. Claude calls search_verifiers(query="lint") → finds vr/code.python.lint_ruff
  3. Claude calls run_verifier("vr/code.python.lint_ruff", ...)
  4. Claude reports the verdict and any linting issues

Example: Composed Verification

  1. Ask Claude: "Verify that the order was cancelled and the email was sent"
  2. Claude calls compose_chain(["vr/tau2.retail.order_cancelled", "vr/aiv.email.sent_folder_confirmed"], ..., policy="fail_closed")
  3. If the order check fails, the entire pipeline fails, even if the email looks fine

Programmatic Usage

You can also create the MCP server programmatically:

from vrdev.adapters.mcp_server import create_mcp_server

mcp = create_mcp_server()
mcp.run()  # starts stdio transport

This is useful for embedding the MCP server in a larger application or custom transport.
