Execution Sandbox

Verifiers run in your environment, not ours. This page documents the security model.

Permission Model

Every verifier declares the permissions it requires in its registry entry:

| Permission | What it allows | Example verifiers |
|------------|----------------|-------------------|
| fs:read | Read files from disk | document.json.valid, filesystem.file_created |
| db:read | Execute SELECT queries | database.row.exists, database.row.updated |
| net:http | Make outbound HTTP GET requests | api.http.status_ok |
| shell:exec | Run shell commands | code.python.tests_pass, code.python.lint_ruff |
| llm:call | Call an LLM API | rubric.summary.faithful |
| browser:read | Read browser DOM state | web.browser.element_visible |
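Declared permissions make it possible to check, before a verifier runs, whether your environment grants everything it asks for. The sketch below illustrates the idea only: the `REGISTRY` dict and the `missing_permissions` helper are hypothetical and are not part of the vrdev API; only the permission and verifier names come from the table above.

```python
# Hypothetical in-memory stand-in for registry entries (not the vrdev format).
REGISTRY = {
    "document.json.valid": {"fs:read"},
    "api.http.status_ok": {"net:http"},
    "code.python.tests_pass": {"shell:exec"},
}


def missing_permissions(verifier_id: str, granted: set) -> set:
    """Return the permissions the verifier declares that were not granted."""
    return REGISTRY[verifier_id] - granted


granted = {"fs:read", "net:http"}
print(missing_permissions("document.json.valid", granted))     # set()
print(missing_permissions("code.python.tests_pass", granted))  # {'shell:exec'}
```

A caller could refuse to run any verifier for which this set is non-empty.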

Security Guarantees

Minimal Write Surface

Most HARD verifiers are read-only. They observe state but never modify target systems:

Database verifiers use SELECT, never INSERT/UPDATE/DELETE
File verifiers open in read mode only
API verifiers use GET requests only

Exception: Code verifiers. code.python.tests_pass and code.python.lint_ruff require the fs:write_tmp permission in addition to shell:exec. They write agent-generated code to an isolated temp directory and execute it via pytest/ruff. These verifiers never modify your source tree; writes are confined to OS-managed temp directories that are cleaned up after execution. The permission is declared explicitly in each verifier's registry entry.
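The temp-directory pattern described above can be sketched with the Python standard library. This is an illustration under assumptions, not vrdev's implementation: `run_in_temp_dir` and the file name `candidate.py` are hypothetical, and the sketch runs the script directly rather than through pytest or ruff. Note that a temp directory only confines where files are written; on its own it does not restrict what the executed code can do.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_temp_dir(code: str, timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Write code to a fresh temp directory, run it there, then clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"  # hypothetical file name
        script.write_text(code, encoding="utf-8")
        # cwd=tmp keeps relative-path writes inside the temp directory.
        return subprocess.run(
            [sys.executable, str(script)],
            cwd=tmp,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    # The directory and everything in it is removed when the block exits.


result = run_in_temp_dir("print(2 + 2)")
print(result.returncode, result.stdout.strip())
```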

No Agent Text Execution in Non-Code Verifiers

Non-code verifiers never eval() or execute agent completions. The agent's text output is compared against ground truth, but never treated as code or commands. Code verifiers (code.python.tests_pass, code.python.lint_ruff) do execute agent-provided code, but only in sandboxed temp environments, never in your working directory.

Scoped Network Access

API verifiers only contact URLs specified in ground_truth. They don't follow redirects to different hosts or resolve DNS dynamically.
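One way to express the "same host only" redirect rule is to compare the hostname of the redirect target with the hostname of the URL from ground_truth. The helper below is a hypothetical sketch of that check; it does not show how vrdev enforces the rule, and the example URLs are placeholders.

```python
from urllib.parse import urljoin, urlsplit


def redirect_allowed(original_url: str, location: str) -> bool:
    """Allow a redirect only if the target stays on the original host."""
    # urljoin resolves relative Location headers against the original URL.
    target = urljoin(original_url, location)
    return urlsplit(target).hostname == urlsplit(original_url).hostname


print(redirect_allowed("https://api.example.com/v1/health", "/v2/health"))          # True
print(redirect_allowed("https://api.example.com/v1/health", "https://evil.test/"))  # False
```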

Timeout Enforcement

All external operations (HTTP, database, shell) enforce configurable timeouts:

HTTP: 10 seconds default
Database: 5 seconds default
Shell: 30 seconds default
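As an illustration of timeout enforcement for shell operations, the sketch below wraps `subprocess.run` with a default taken from the list above and converts a timeout into a failed result instead of an unhandled exception. The `DEFAULT_TIMEOUTS_S` dict and `run_shell` function are hypothetical names; how vrdev exposes its timeout configuration is not shown here.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

# Defaults copied from the list above; the dict itself is a hypothetical structure.
DEFAULT_TIMEOUTS_S = {"http": 10.0, "database": 5.0, "shell": 30.0}


def run_shell(cmd: List[str], timeout_s: Optional[float] = None) -> Tuple[bool, str]:
    """Run a command, returning (passed, detail); a timeout counts as a failure."""
    limit = DEFAULT_TIMEOUTS_S["shell"] if timeout_s is None else timeout_s
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=limit)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, f"timed out after {limit}s"
    return done.returncode == 0, done.stdout


ok, detail = run_shell([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
print(ok, detail)  # False timed out after 0.5s
```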

BYOS Bypass

The pre_result pattern (see BYOS docs) lets you skip all external access entirely. When pre_result is provided, the verifier performs pure in-memory comparison with zero system access.
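The idea behind pre_result can be sketched as a verifier that only touches an external system when no pre-fetched state is supplied. The `verify` signature, the dict shapes, and the `fetch` callback below are assumptions made for illustration; the BYOS docs define the actual interface.

```python
from typing import Any, Callable, Dict, Optional


def verify(
    ground_truth: Dict[str, Any],
    fetch: Callable[[], Dict[str, Any]],
    pre_result: Optional[Dict[str, Any]] = None,
) -> bool:
    """Compare observed state with ground truth; fetch() is skipped if pre_result is given."""
    observed = pre_result if pre_result is not None else fetch()
    return all(observed.get(key) == value for key, value in ground_truth.items())


def fetch_from_system() -> Dict[str, Any]:
    # Stand-in for a real HTTP/database/file access; raising proves it is never called.
    raise RuntimeError("external access attempted")


print(verify({"status": 200}, fetch_from_system, pre_result={"status": 200, "body": "ok"}))  # True
```

Because `fetch_from_system` raises if invoked, the example only succeeds when the comparison runs purely in memory.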

Docker Isolation

For production deployments, the vrdev API server runs in a minimal Docker container:

FROM python:3.11-slim
# Slim Debian base: no compiler toolchain and few extra utilities, keeping the attack surface small
RUN pip install --no-cache-dir vrdev

The container has no access to the host filesystem, databases, or network beyond what you explicitly configure.
