AI cost approval before merge

Class1

Turn every AI pull request into a priced, risk-adjusted business decision before it ships.

Class1 reads the diff, models the monthly cost delta, declares P50/P90/P95, checks model fit, names the tail drivers, projects escalation, adds carbon and water, and enforces the budget gate in CI.


pull request #184, budget gate: fail

P50: +$3.6k / mo

P90: +$22.4k / mo

Estimate basis: Class 4

Driver to attack: volume

- model="gpt-4o-mini"
+ model="gpt-4.1"
+ max_tokens=8192
+ retries=5

6,924 priced model rows, effective 2026-06-06

7,389 model metadata rows (drop-nothing spec sheet)

38 coding grades (SWE-Bench Verified)

873 pytest cases across 97 test files

103 Python modules: engine, takeoff, ledger, organism

The category

Not another spend dashboard. A pre-merge cost approval layer.

Dashboards tell you what happened after the money is gone. Class1 answers the decision while the change is still a pull request.

The method is cost engineering: quantity takeoff, rate basis, contingency, escalation, estimate class, and actuals calibration.
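The takeoff-times-rate core of that method fits in a few lines. This is a minimal illustration, not Class1's engine: the `Takeoff` and `RateBasis` classes, the `monthly_cost` and `escalate` helpers, and every number used with them are hypothetical, and the rates are placeholders rather than real vendor prices. Contingency is modeled as a flat percentage and escalation as compounding month-over-month volume growth.

```python
from dataclasses import dataclass


@dataclass
class Takeoff:
    """Hypothetical quantity takeoff for one AI callsite."""
    calls_per_month: float
    input_tokens: float       # prompt + context + tool schemas, per attempt
    output_tokens: float      # generated tokens, per attempt
    attempts_per_call: float  # 1.0 = no retries; above 1.0 includes retries/fallbacks


@dataclass
class RateBasis:
    """Placeholder $ per million tokens; not real vendor prices."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost(t: Takeoff, r: RateBasis, contingency: float = 0.0) -> float:
    """Quantity x rate, then a flat contingency percentage on top."""
    per_attempt = (t.input_tokens * r.input_per_mtok
                   + t.output_tokens * r.output_per_mtok) / 1_000_000
    base = t.calls_per_month * t.attempts_per_call * per_attempt
    return base * (1.0 + contingency)


def escalate(monthly: float, monthly_growth: float, months: int = 12) -> list[float]:
    """Compound an assumed month-over-month growth rate into a cost curve."""
    return [monthly * (1.0 + monthly_growth) ** m for m in range(months)]
```

With 100,000 calls a month, 2,000 input and 500 output tokens per attempt, 1.2 attempts per call, and placeholder rates of $2 and $8 per million tokens, the base is $960 a month, or $1,200 with 25% contingency; `escalate(1200.0, 0.05)` then traces the 12-month curve under an assumed 5% monthly growth.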

Buyer moment

The cost decision belongs in code review because the architecture is still negotiable.

Most AI cost tools start after production telemetry exists. By then the model choice, retry policy, context shape, fallback path, and tool schemas have already hardened into habits. Class1 moves the decision upstream, where the team can still cap output, narrow context, lazy-load tools, choose a fit-for-purpose model, or require a budget owner before merge.

The buyer is not buying a prettier dashboard. The buyer is buying a governance moment: a repeatable way to ask whether a software change creates recurring AI spend, whether that spend is justified, and which control lowers the tail without blocking useful engineering work.

That is why the homepage leads with P90 after scale. Expected cost is useful for discussion, but P90 is the number a finance team can approve against. Class1 keeps both numbers visible and separates the modeled risk band from the estimate class, so a report can be useful without pretending to be definitive.
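A small Monte Carlo run shows why the expected value and P90 diverge once volume and retries have right-skewed tails. This is a hedged sketch, not Class1's engine: the lognormal call volume, the exponential extra attempts, and the $0.02 per-attempt cost are illustrative assumptions, and `simulate_monthly_delta` and `summarize` are hypothetical names.

```python
import math
import random
import statistics


def simulate_monthly_delta(n: int = 20_000, seed: int = 184) -> list[float]:
    """Sample the monthly cost delta of one change under assumed distributions."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    samples = []
    for _ in range(n):
        # Assumption: monthly call volume is lognormal around 50,000 calls.
        calls = rng.lognormvariate(math.log(50_000), 0.6)
        # Assumption: attempts per call = 1 + exponential extra work (mean 0.3).
        attempts = 1.0 + rng.expovariate(1 / 0.3)
        cost_per_attempt = 0.02  # hypothetical $ per attempt
        samples.append(calls * attempts * cost_per_attempt)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Report the mean alongside P50, P90, and P95."""
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points: P1..P99
    return {
        "expected": statistics.fmean(samples),
        "p50": cuts[49],
        "p90": cuts[89],
        "p95": cuts[94],
    }
```

Because both inputs are right-skewed, the mean lands above P50 but below P90: approving against the expected value alone would leave the upper tail unbudgeted.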

CTO

Will this PR create an unstable AI workload?

See callsites, model swaps, max-token changes, retry/fallback exposure, MCP schema overhead, and the exact controls that reduce the tail.

CFO

What recurring spend should we approve at P90?

Review expected, P50, P90, P95, worst case, estimate class, budget gate status, and the 12-month escalation curve.

CEO

Is this feature worth the cost after scale?

Approve, defer, or require controls with one report that engineering and finance can both defend.

Why now

AI spend is leaving the infrastructure budget and entering the product design loop.

Agents multiply hidden work

A single feature can add retries, fallbacks, tool definitions, longer outputs, and larger context windows. The invoice shows the aggregate later; Class1 shows the architectural source before merge.

Cheap per token is not the same as cheap per task

A weak model can look inexpensive until retries, failures, and human rework are counted. Class1 prices the completed task, not only the token.
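The per-task view can be written as an expected-value formula. A minimal sketch under simplifying assumptions: each attempt succeeds independently with the same probability, retries stop at a cap, and a task that exhausts every attempt falls to human rework at a fixed cost. The function name and all numbers are hypothetical.

```python
def expected_cost_per_task(cost_per_attempt: float, success_rate: float,
                           max_attempts: int, rework_cost: float) -> float:
    """Expected $ to finish one task, by the model or, failing that, by a human."""
    fail = 1.0 - success_rate
    # Attempt i (0-based) only happens if the previous i attempts all failed.
    expected_attempts = sum(fail ** i for i in range(max_attempts))
    # If every capped attempt fails, a human finishes the task.
    p_all_fail = fail ** max_attempts
    return cost_per_attempt * expected_attempts + p_all_fail * rework_cost


# Hypothetical comparison: 10x cheaper per attempt, far less reliable.
cheap = expected_cost_per_task(0.002, success_rate=0.50, max_attempts=5, rework_cost=2.00)
strong = expected_cost_per_task(0.020, success_rate=0.95, max_attempts=5, rework_cost=2.00)
```

With these placeholder numbers the model that costs ten times more per attempt is roughly three times cheaper per completed task (about $0.021 versus $0.066), because the weaker model's retries and rework dominate its bill.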

Governance needs a narrow lever

The policy gate is intentionally simple: positive P90 monthly delta versus a declared budget. That makes the enforcement explainable to engineering and finance.
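The gate rule is small enough to state as code. A hedged sketch: `budget_gate` and `main` are hypothetical names, the P90 delta and budget are assumed to arrive already computed in dollars per month, and a CI step would fail on the non-zero return code, for example through `sys.exit(main(sys.argv[1:]))`.

```python
def budget_gate(p90_monthly_delta: float, monthly_budget: float) -> tuple[int, str]:
    """Return (exit_code, message); only a positive P90 delta over budget fails."""
    if p90_monthly_delta <= 0:
        # Changes that hold or reduce spend never block the merge.
        return 0, f"pass: P90 delta {p90_monthly_delta:+,.0f} $/mo adds no spend"
    if p90_monthly_delta <= monthly_budget:
        return 0, (f"pass: P90 delta {p90_monthly_delta:+,.0f} $/mo "
                   f"within budget {monthly_budget:,.0f} $/mo")
    return 1, (f"fail: P90 delta {p90_monthly_delta:+,.0f} $/mo "
               f"exceeds budget {monthly_budget:,.0f} $/mo")


def main(argv: list[str]) -> int:
    """CLI-style entry point: argv = [p90_monthly_delta, monthly_budget]."""
    code, message = budget_gate(float(argv[0]), float(argv[1]))
    print(message)
    return code
```

Against a hypothetical $10,000 monthly budget, the +$22.4k P90 delta from pull request #184 fails the gate, while a +$3.6k delta passes.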

Calibration becomes the moat

Every post-merge actual can become an estimate-actual pair. The product improves because the organisation learns its own retry tails, demand spikes, and model-fit economics.
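One way to turn estimate-actual pairs into a calibration signal can be sketched as follows. `EstimateActual`, `variance_pct`, and `calibrate` are hypothetical names, and using the median actual/P50 ratio as a scale factor and P90 coverage as a band check are illustrative choices, not Class1's documented method.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class EstimateActual:
    """One merged change: what was forecast at review time vs. what was billed."""
    estimated_p50: float  # $/mo forecast at merge time
    estimated_p90: float  # $/mo forecast at merge time
    actual: float         # $/mo observed after merge


def variance_pct(pair: EstimateActual) -> float:
    """Signed variance of the actual against the P50 estimate."""
    return (pair.actual - pair.estimated_p50) / pair.estimated_p50


def calibrate(pairs: list[EstimateActual]) -> dict[str, float]:
    """Summarize a batch of estimate-actual pairs into two calibration signals."""
    ratios = [p.actual / p.estimated_p50 for p in pairs]
    # Share of actuals at or under their P90; close to 0.9 if the band is honest.
    covered = sum(p.actual <= p.estimated_p90 for p in pairs)
    return {
        "p50_scale": median(ratios),  # above 1 means estimates run low
        "p90_coverage": covered / len(pairs),
    }
```

A `p50_scale` above 1 says estimates run low for this organisation; a `p90_coverage` well below 0.9 says the P90 band is too narrow and the tail model needs widening.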

Free to estimate. Paid to enforce.

The wedge is simple: comments are education; blocking checks are governance.

Open core proves the forecast. The Business Pilot sells private repo installation, blocking P90 policy gates, actuals ingestion, a private Blue Book basis, and monthly variance reports.

Apply for the Business Pilot

01 Scan the diff: Python, TypeScript, and JavaScript callsites

02 Price the delta: Monte Carlo plus structured rates

03 Fail if needed: P90 over budget returns a non-zero CI exit code

04 Learn from actuals: estimate -> actual -> variance -> calibration
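The first step, scanning the diff, can be approximated with a line-oriented pass over a unified diff. This is a deliberately naive sketch: `scan_diff` and `COST_KEYS` are hypothetical, the regex only recognises `key=value` and `key: value` pairs whose values are quoted strings or integers, and a production scanner would more likely parse the source to resolve callsites, variables, and multi-line calls.

```python
import re

# Keyword arguments this sketch treats as cost drivers (hypothetical list).
COST_KEYS = ("model", "max_tokens", "retries")
VALUE = r"""("[^"]*"|'[^']*'|\d+)"""
KWARG = re.compile(r"\b(" + "|".join(COST_KEYS) + r")\s*[:=]\s*" + VALUE)


def scan_diff(diff_text: str) -> dict[str, dict[str, list[str]]]:
    """Collect removed (-) and added (+) values for each cost-driving key."""
    changes = {key: {"removed": [], "added": []} for key in COST_KEYS}
    for line in diff_text.splitlines():
        # Skip file headers such as '--- a/app.py' and '+++ b/app.py'.
        if line.startswith(("---", "+++")):
            continue
        if line.startswith("-"):
            side = "removed"
        elif line.startswith("+"):
            side = "added"
        else:
            continue  # context lines and '@@' hunk headers
        for key, value in KWARG.findall(line):
            changes[key][side].append(value.strip("\"'"))
    # Keep only the keys this diff actually touched.
    return {k: v for k, v in changes.items() if v["removed"] or v["added"]}
```

Fed a diff shaped like the pull request #184 example, it reports the model swap from `gpt-4o-mini` to `gpt-4.1` plus the added `max_tokens=8192` and `retries=5`.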

Explore the system

Every page is grounded in a real module, dataset, or test path.

Product: The PR comment, policy gate, config, and CI workflow.

Platform: The cost engine, takeoff, Blue Book, footprint, and autobuild layers.

Ledger: Frozen pricing, specs, capability, actuals, and cloud basis.

Footprint: Carbon, water, and materials as a second currency.

Trust: Tests, assumptions, honest gaps, and reproducibility discipline.

Pilot: How the product becomes revenue without pretending the open items are done.