Product surface
Diff -> callsites -> scenario pair -> PR comment -> policy verdict.
- Python AST scanner
- TS/JS tree-sitter scanner
- estimate_pr pure core
- GitHub Action glue
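As a rough illustration of the "diff -> callsites" step, the sketch below finds AI calls in Python source with the standard `ast` module. The suffix list, function names, and record shape are assumptions made for this example, not Class1's actual scanner.

```python
import ast

# Hypothetical method-name suffixes treated as AI calls. A real scanner
# would need a richer, provider-aware catalogue.
AI_CALL_SUFFIXES = ("chat.completions.create", "messages.create")


def dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as 'client.messages.create' from an AST node."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))


def find_callsites(source: str) -> list:
    """Return one record per call whose dotted name ends with a known suffix."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name.endswith(AI_CALL_SUFFIXES):
            model = None
            for kw in node.keywords:
                # Only literal model names are captured; variables stay None.
                if kw.arg == "model" and isinstance(kw.value, ast.Constant):
                    model = kw.value.value
            found.append({"line": node.lineno, "call": name, "model": model})
    return sorted(found, key=lambda c: c["line"])
```

Running the scanner on the base and head revisions of a changed file and comparing the two lists would give the raw material for a baseline and PR scenario pair.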
Platform
103 Python modules span cost simulation, callsite takeoff, data ledgers, actuals, footprint, and a self-maintenance loop.
- Product surface: Diff -> callsites -> scenario pair -> PR comment -> policy verdict.
- Cost engine: No DB, no GitHub, no UI. Just the deterministic engine behind the forecast.
- Blue Book: The reference data and actuals layer that makes estimates reproducible and calibratable.
- Autobuild: A dry-run-first loop that proposes and verifies improvements against the same cost controls.
Architecture depth
The cost engine is pure math. It does not know about GitHub, databases, UI, or billing. That boundary makes the forecast testable, portable, and usable from local demos, CI, future web demos, or private pilot installs.
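A minimal sketch of what "pure math" can mean in practice: immutable inputs, no I/O, and a result that depends only on its arguments. The class names, fields, and rates here are hypothetical placeholders, not the engine's real interface or real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_mtok: float   # USD per million input tokens (illustrative)
    output_per_mtok: float  # USD per million output tokens (illustrative)


@dataclass(frozen=True)
class Callsite:
    calls_per_month: int
    input_tokens: int   # tokens sent per call
    output_tokens: int  # tokens returned per call


def monthly_cost(site: Callsite, rate: Rate) -> float:
    """Cost of one callsite per month; no network, disk, or clock access."""
    per_call = (site.input_tokens * rate.input_per_mtok
                + site.output_tokens * rate.output_per_mtok) / 1_000_000
    return site.calls_per_month * per_call


def scenario_delta(baseline: list, pr: list, rate: Rate) -> float:
    """PR scenario cost minus baseline scenario cost, in USD per month."""
    return (sum(monthly_cost(s, rate) for s in pr)
            - sum(monthly_cost(s, rate) for s in baseline))
```

Because nothing here touches GitHub or a database, the same functions can be called from a unit test, a local demo, or a CI job with identical results.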
The takeoff layer owns product context: it reads code changes, finds AI callsites, builds baseline and PR scenarios, and renders a report reviewers can understand. The Blue Book owns the basis: prices, structures, specs, capability grades, actuals, and provenance. The split keeps the model honest: the engine calculates, the ledger substantiates, and the product surface explains.
Autobuild is included because it proves the recursive idea. Class1 can meter its own development loop with the same pricing basis it uses for customer workloads, then feed self-calibration into the actuarial table. That does not replace human judgement, but it shows the platform can govern AI work, including its own.
The thesis
Class1 models the fully loaded bill: token rates, cache structures, model capability, MCP/tool schemas, cloud services, committed capacity, owned GPUs, carbon, water, materials, and estimate decay.
Platform proof points
Month-level systemic risk, p50/p90 lognormals, paired common random numbers.
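One way to read "p50/p90 lognormals, paired common random numbers" is sketched below: fit a lognormal to a quoted p50 and p90, then drive the baseline and PR scenarios with the same random draw in each trial so that shared noise cancels in the difference. Function names and the single-draw structure are assumptions for illustration, and month-level systemic risk is not modelled here.

```python
import math
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile, about 1.2816


def lognormal_params(p50: float, p90: float) -> tuple:
    """Return (mu, sigma) of the lognormal whose median and p90 match the inputs."""
    if not 0 < p50 < p90:
        raise ValueError("expected 0 < p50 < p90")
    mu = math.log(p50)
    sigma = (math.log(p90) - mu) / Z90
    return mu, sigma


def paired_deltas(baseline: tuple, pr: tuple, n: int = 10_000, seed: int = 0) -> list:
    """Sample PR-minus-baseline cost using one shared normal draw per trial."""
    rng = random.Random(seed)
    mu_b, sd_b = lognormal_params(*baseline)
    mu_p, sd_p = lognormal_params(*pr)
    deltas = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # common random number for both scenarios
        deltas.append(math.exp(mu_p + sd_p * z) - math.exp(mu_b + sd_b * z))
    return deltas
```

With independent draws, the variance of the difference would be the sum of both scenario variances. Sharing the draw keeps the comparison focused on what the PR actually changed.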
Tornado sensitivity analysis ranks which inputs own the p90 tail.
Price decline, volume growth, and structural drift are separated rather than lumped together as inflation.
Ranks models by cost per completed task using task-specific benchmarks.
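To make the separation concrete, here is a small sketch under a simplifying assumption: monthly cost is the product of a unit price, a call volume, and tokens per call (standing in for "structure"). Under that assumption the log of the cost ratio splits additively into three effects. The field names are hypothetical.

```python
import math

FACTORS = ("price", "calls", "tokens_per_call")


def decompose(before: dict, after: dict) -> dict:
    """Split log(cost_after / cost_before) into per-factor log effects.

    Assumes cost = price * calls * tokens_per_call, which is an
    illustrative simplification, not the platform's actual model.
    """
    effects = {k: math.log(after[k] / before[k]) for k in FACTORS}
    effects["total"] = sum(effects[k] for k in FACTORS)
    return effects
```

A price cut then shows up as a negative price effect even when the total bill rises because volume or prompt size grew.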
Cache, batch, context tiers, reasoning effort, and effective dates.
Tool schemas as recurring input tokens; architecture break-even analysis.
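The arithmetic behind treating tool schemas as recurring input tokens can be sketched as follows. The break-even framing is one assumed reading: an alternative architecture carries a fixed monthly cost but sends fewer schema tokens per call. All names and figures are illustrative.

```python
def schema_cost_per_month(schema_tokens: int, calls_per_month: int,
                          input_usd_per_mtok: float) -> float:
    """Monthly cost of resending tool schemas as input tokens on every call."""
    return schema_tokens * calls_per_month * input_usd_per_mtok / 1_000_000


def break_even_calls(fixed_monthly_usd: float, tokens_saved_per_call: int,
                     input_usd_per_mtok: float) -> float:
    """Calls per month at which the token savings repay the fixed cost."""
    saving_per_call = tokens_saved_per_call * input_usd_per_mtok / 1_000_000
    if saving_per_call <= 0:
        raise ValueError("the alternative must save tokens to break even")
    return fixed_monthly_usd / saving_per_call
```

For example, 3,000 schema tokens on 100,000 calls at an assumed 2 USD per million input tokens comes to 600 USD per month before any prompt content is counted.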
FOCUS cost exports and provider usage data become estimate-actual pairs.
Basis status, hash chains, and reproducible snapshots.
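A hash chain over ledger entries can be sketched in a few lines with the standard library. This is a generic illustration of the technique, assuming JSON-serialisable entries and SHA-256. The entry layout and function names are hypothetical, not the Blue Book's actual format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the entry before the first one


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> list:
    """Return a new chain with one more entry linked to the current tail."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)}
    return chain + [entry]


def verify(chain: list) -> bool:
    """Recompute every link; any edited payload or reordering fails the check."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the one before it, publishing only the tail hash is enough to let a reader confirm that a snapshot of the whole ledger is unchanged.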