The proving ground

Oracle helps hone the next run.

Point Oracle at your repo. It turns merged PRs into fixtures, reruns your agent against them, scores what changed, and sends back corpus improvements as pull requests.

Oracle view

The same project, tested in isolation

The build floor makes the attempt. Oracle measures what improved and what still drifts.
How it works

Measurement turns correction into compounding project memory.

The point is not a prettier report. It is knowing whether a corpus change made future agent work better, worse, or merely different.

01
PR merges

Oracle reads the merged PR, ticket, review comments, and resulting diff.

02
Fixture generated

The task, golden diff, and review context become a regression fixture in .foundry/fixtures/.

03
Agent reruns

Your corpus plus the fixture task runs in an isolated worktree, using your own LLM keys.

04
Oracle scores

Completion, correctness, craft, efficiency, and precision roll up into a single corpus score.

05
Gap diagnosed

Low scores point to missing context, stale docs, weak guardrails, or over-broad instructions.

06
PR comes back

Oracle proposes the corpus change and tracks whether the next run actually improves.
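The loop above can be sketched in a few lines of Python. This is an illustration only, not Oracle's implementation: the `Fixture` fields mirror the task, golden diff, and review context named in step 02, and the five dimensions come from step 04. The 0 to 1 scale, the unweighted mean, the `tolerance` threshold, and every function name are assumptions.

```python
from dataclasses import dataclass
from statistics import mean

# The five dimensions are named in the text; the 0-1 scale and the
# equal weighting used below are assumptions made for illustration.
DIMENSIONS = ("completion", "correctness", "craft", "efficiency", "precision")


@dataclass(frozen=True)
class Fixture:
    """Hypothetical shape of one regression fixture built from a merged PR."""
    task: str            # what the agent is asked to do
    golden_diff: str     # the diff that was actually merged
    review_context: str  # ticket text and review comments


def corpus_score(run_scores: dict[str, float]) -> float:
    """Roll per-dimension scores (0.0 to 1.0) into one number (unweighted mean)."""
    missing = [d for d in DIMENSIONS if d not in run_scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return mean(run_scores[d] for d in DIMENSIONS)


def verdict(before: float, after: float, tolerance: float = 0.01) -> str:
    """Classify a corpus change as better, worse, or merely different."""
    delta = after - before
    if delta > tolerance:
        return "better"
    if delta < -tolerance:
        return "worse"
    return "different"
```

Comparing the score before and after a corpus change is what separates "better" from "merely different"; a real scorer would likely weight the dimensions and aggregate over many fixtures rather than one run.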

The eval team

Oracle does not build. It measures the work honestly.

The Artificer under test is your own agent setup. Steward keeps the run isolated from the golden answer. Oracle scores the attempt and names the gap.
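One way to picture that isolation, as a minimal sketch: check out a separate git worktree for the run and hand the agent only the task, never the golden diff. The function names, the dictionary keys, and the choice to withhold review context as well are assumptions for illustration; `git worktree add --detach` is a standard git command, and the helper only builds the argument list without running it.

```python
# Assumption: the agent sees only the task. The golden diff must stay hidden;
# withholding review context too is a conservative guess, since review
# comments can leak the answer.
VISIBLE_KEYS = frozenset({"task"})


def worktree_add_command(repo: str, worktree_path: str, base_commit: str) -> list[str]:
    """Build the git command that creates a detached worktree at base_commit."""
    return ["git", "-C", repo, "worktree", "add", "--detach", worktree_path, base_commit]


def agent_visible(fixture: dict[str, str]) -> dict[str, str]:
    """Return only the fixture fields the agent under test is allowed to read."""
    return {key: value for key, value in fixture.items() if key in VISIBLE_KEYS}
```

Using an allowlist rather than a blocklist means any field added to a fixture later stays hidden from the agent until someone explicitly exposes it.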

STEWARD

Preserves isolation

ORACLE

Measures improvement

Runs the attempt against fixtures, scores what improved, catches regressions, and explains the gap.

Bring your own infrastructure

Your project stays yours.

Your repo is the source of truth

Oracle writes fixtures, corpus proposals, and analytics into .foundry/ so the important state stays with you.

Your keys, your LLM

Each evaluation run uses the provider credentials you supply for that run, so we do not need to store your keys.

Minimal hosted surface

The hosted layer coordinates jobs and trend history. Your project, corpus, and fixtures remain portable.

Closed beta

Score your corpus against real work.

We are onboarding design partners first. Leave your email and a repo if you want an early corpus audit.