Execution Sandbox
Verifiers run in your environment, not ours. This page documents the security model.
Permission Model
Every verifier declares the permissions it requires in its registry entry:
| Permission | What it allows | Example verifiers |
|-----------|---------------|-------------------|
| fs:read | Read files from disk | document.json.valid, filesystem.file_created |
| db:read | Execute SELECT queries | database.row.exists, database.row.updated |
| net:http | Make outbound HTTP GET requests | api.http.status_ok |
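| shell:exec | Run shell commands | code.python.tests_pass, code.python.lint_ruff |
| fs:write_tmp | Write to an isolated temp directory | code.python.tests_pass, code.python.lint_ruff |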
| llm:call | Call an LLM API | rubric.summary.faithful |
| browser:read | Read browser DOM state | web.browser.element_visible |
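As a rough illustration of how a host could act on declared permissions, the sketch below compares what a verifier declares with what the host grants. The `RegistryEntry` class and `missing_permissions` function are hypothetical names made up for this example. The real vrdev registry schema may look different.

```python
from dataclasses import dataclass, field


# Hypothetical registry entry shape; the real vrdev schema may differ.
@dataclass(frozen=True)
class RegistryEntry:
    verifier_id: str
    permissions: frozenset = field(default_factory=frozenset)


def missing_permissions(entry: RegistryEntry, granted: set) -> set:
    """Return the permissions the verifier declares but the host has not granted."""
    return set(entry.permissions) - set(granted)


entry = RegistryEntry(
    verifier_id="api.http.status_ok",
    permissions=frozenset({"net:http"}),
)

# The host grants only filesystem and database reads, so net:http is reported.
print(missing_permissions(entry, {"fs:read", "db:read"}))
```

A host could refuse to run any verifier for which this set is non-empty.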
Security Guarantees
Minimal Write Surface
Most HARD verifiers are read-only. They observe state but never modify target systems:
- Database verifiers use SELECT, never INSERT/UPDATE/DELETE
- File verifiers open in read mode only
- API verifiers use GET requests only
Exception: Code verifiers. code.python.tests_pass and code.python.lint_ruff require fs:write_tmp permission. They write agent-generated code to an isolated temp directory and execute it via pytest/ruff. These verifiers never modify your source tree; writes are confined to OS-managed temp directories that are cleaned up after execution. The permission is declared explicitly in each verifier's registry entry.
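The temp-directory pattern described above might look roughly like the following sketch. It is not vrdev's implementation. The function name `run_in_temp_dir` is hypothetical, and the code runs the plain Python interpreter instead of pytest or ruff so that the example has no third-party dependencies.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_temp_dir(agent_code: str, timeout_s: float = 30.0):
    """Write code to a fresh temp directory, run it there, then delete the directory."""
    with tempfile.TemporaryDirectory(prefix="verifier_") as tmp:
        script = Path(tmp) / "agent_code.py"
        script.write_text(agent_code, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(script)],
            cwd=tmp,              # run inside the temp directory, not the source tree
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        # The directory and everything in it is removed when the block exits.
        return proc.returncode, proc.stdout


code, out = run_in_temp_dir("print('hello from temp dir')")
```

This sketch only changes where files are written and where the process starts. It adds no OS-level isolation on its own.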
No Agent Text Execution in Non-Code Verifiers
Non-code verifiers never eval() or execute agent completions. The agent's text output is compared against ground truth, but never treated as code or commands. Code verifiers (code.python.tests_pass, code.python.lint_ruff) do execute agent-provided code, but only in sandboxed temp environments, never in your working directory.
Scoped Network Access
API verifiers contact only the URLs specified in ground_truth. They do not follow redirects to a different host, and they do not resolve DNS dynamically.
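A same-host redirect rule can be sketched with the standard library alone. The function name `may_follow_redirect` is hypothetical, and the comparison shown here (hostname only, case-insensitive) is an assumption about what "different hosts" means. vrdev's actual check may be stricter.

```python
from urllib.parse import urljoin, urlsplit


def may_follow_redirect(requested_url: str, location: str) -> bool:
    """Allow a redirect only if it stays on the host of the originally requested URL."""
    # A Location header may be relative, so resolve it against the request URL first.
    target = urljoin(requested_url, location)
    original_host = urlsplit(requested_url).hostname
    target_host = urlsplit(target).hostname
    return original_host is not None and original_host == target_host
```

For example, a redirect from `https://api.example.com/v1` to `/v2` passes, while a redirect to `https://other.example.net/` is refused.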
Timeout Enforcement
All external operations (HTTP, database, shell) enforce configurable timeouts:
- HTTP: 10 seconds default
- Database: 5 seconds default
- Shell: 30 seconds default
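The defaults above could be applied as in the following sketch. `DEFAULT_TIMEOUTS` and `run_shell` are illustrative names and not part of any documented vrdev API. Only the shell case is shown, using the `timeout` argument of `subprocess.run`.

```python
import subprocess
import sys
from typing import List, Optional

# Defaults taken from the list above, in seconds.
DEFAULT_TIMEOUTS = {"http": 10.0, "database": 5.0, "shell": 30.0}


def run_shell(args: List[str], timeout_s: Optional[float] = None) -> str:
    """Run a command and report 'timeout' instead of hanging past the limit."""
    limit = DEFAULT_TIMEOUTS["shell"] if timeout_s is None else timeout_s
    try:
        proc = subprocess.run(args, capture_output=True, text=True, timeout=limit)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return "timeout"
    return "ok" if proc.returncode == 0 else "failed"


# A command that sleeps longer than the override is cut off.
print(run_shell([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5))
```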
BYOS Bypass
The pre_result pattern (see BYOS docs) lets you skip all external access entirely. When pre_result is provided, the verifier performs pure in-memory comparison with zero system access.
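The control flow behind this bypass can be sketched as follows. Only the `pre_result` and `ground_truth` names come from this page. The function `verify_row_exists`, the `fetch` callback, and the `"expected"` key are assumptions made for illustration, so consult the BYOS docs for the real interface.

```python
from typing import Any, Callable, Dict, Optional


def verify_row_exists(
    ground_truth: Dict[str, Any],
    pre_result: Optional[Dict[str, Any]] = None,
    fetch: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = None,
) -> bool:
    """Compare observed state with ground truth, preferring caller-supplied state."""
    if pre_result is not None:
        observed = pre_result            # pure in-memory path: no system access
    elif fetch is not None:
        observed = fetch(ground_truth)   # would touch a database, API, or disk
    else:
        raise ValueError("either pre_result or fetch must be provided")
    # Every expected field must match; extra observed fields are ignored.
    return all(observed.get(k) == v for k, v in ground_truth["expected"].items())
```

When `pre_result` is supplied, `fetch` is never called, so the verifier does not need database credentials or network access.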
Docker Isolation
For production deployments, the vrdev API server runs in a minimal Docker container:
FROM python:3.11-slim
# Slim base image: fewer preinstalled packages, smaller attack surface
RUN pip install --no-cache-dir vrdev
The container has no access to the host filesystem, databases, or network beyond what you explicitly configure.