Oracle reads the merged PR, ticket, review comments, and resulting diff.
Oracle helps hone the next run.
Point Oracle at your repo. It turns merged PRs into fixtures, reruns your agent against them, scores what changed, and sends back corpus improvements as pull requests.


Measurement turns correction into compounding project memory.
The point is not a prettier report. It is knowing whether a corpus change made future agent work better, worse, or merely different.
The task, golden diff, and review context become a regression fixture in .foundry/fixtures/.
Your agent runs the fixture task with your corpus in an isolated worktree, using your own LLM keys.
Completion, correctness, craft, efficiency, and precision roll up into a corpus score.
Low scores point to missing context, stale docs, weak guardrails, or over-broad instructions.
Oracle proposes the corpus change and tracks whether the next run actually improves.
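The roll-up in the steps above can be sketched in a few lines. This is an illustration only, not Oracle's actual scoring: the Fixture field names, the 0.0 to 1.0 scale, the equal weighting, and the 0.5 threshold are all assumptions. Only the five dimension names and the three fixture ingredients come from the text.

```python
from dataclasses import dataclass
from statistics import mean

# The five dimensions named in the text.
DIMENSIONS = ("completion", "correctness", "craft", "efficiency", "precision")


@dataclass(frozen=True)
class Fixture:
    """Hypothetical fixture record; the field names are assumptions."""
    task: str
    golden_diff: str
    review_context: str


def corpus_score(scores: dict[str, float]) -> float:
    """Roll per-dimension scores (assumed 0.0 to 1.0) into one number.

    Equal weighting is an assumption made for this sketch.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return mean(scores[d] for d in DIMENSIONS)


def weakest_dimensions(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the dimensions scoring strictly below the (assumed) threshold."""
    return [d for d in DIMENSIONS if scores[d] < threshold]


run = {
    "completion": 1.0,
    "correctness": 1.0,
    "craft": 0.5,
    "efficiency": 0.5,
    "precision": 0.0,
}
print(corpus_score(run))
print(weakest_dimensions(run))
```

A low dimension is only a pointer. Deciding whether it reflects missing context, stale docs, weak guardrails, or over-broad instructions still takes a look at the run itself.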
Oracle does not build. It measures the work honestly.
The Artificer under test is your own agent setup. Steward keeps the run isolated from the golden answer. Oracle scores the attempt and names the gap.

ORACLE
Runs the attempt against fixtures, scores what improved, catches regressions, and explains the gap.
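One way to picture the regression check is a comparison of per-fixture scores before and after a corpus change. The sketch below is hypothetical: the function names, the tolerance of 0.02, and the fixture names are assumptions, and the three labels mirror the "better, worse, or merely different" distinction drawn earlier on this page.

```python
def classify_change(before: float, after: float, tolerance: float = 0.02) -> str:
    """Label a score change; the tolerance band is an assumed noise margin."""
    delta = after - before
    if delta > tolerance:
        return "better"
    if delta < -tolerance:
        return "worse"
    return "different"


def find_regressions(
    before: dict[str, float],
    after: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return fixtures present in both runs whose score got worse."""
    shared = before.keys() & after.keys()
    return sorted(
        name
        for name in shared
        if classify_change(before[name], after[name], tolerance) == "worse"
    )


# Hypothetical fixture names and scores, for illustration only.
previous = {"fix-login-redirect": 0.9, "add-rate-limit": 0.5}
current = {"fix-login-redirect": 0.5, "add-rate-limit": 0.75}
print(find_regressions(previous, current))
```

Comparing per fixture rather than only in aggregate matters here: in this example the average barely moves, yet one fixture has clearly regressed.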
Your project stays yours.
Oracle writes fixtures, corpus proposals, and analytics into .foundry/ so the important state stays with you.
Each evaluation run uses the provider credentials you supply for that run, so we do not need to store your keys.
The hosted layer coordinates jobs and trend history. Your project, corpus, and fixtures remain portable.
Score your corpus against real work.
We are onboarding design partners first. Leave your email and a repo link if you want an early corpus audit.