Scan
Find LLM callsites in Python, TypeScript, and JavaScript. Detect OpenAI, Anthropic, LiteLLM, Gemini, LangChain, model constructors, aliases, max_tokens, tool/MCP lists, and default-model inference.
Product
Class1 turns a diff into a budget-case approval workflow. It does not wait for spend to appear in production.
Convert only defensible code signals into scenario changes. A model swap changes the rate. A max_tokens change moves the output tail. New callsites are counted, but no traffic volume is fabricated for them.
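As a rough illustration of the conversion rules, the sketch below compares callsites before and after a diff and emits one change per signal. The dictionary shape, the callsite keys, the model labels, and the `kind` names are all hypothetical and are not Class1's actual data model.

```python
def scenario_changes(base: dict, head: dict) -> list[dict]:
    """Turn callsite differences into scenario changes (hypothetical shape)."""
    changes = []
    for key, after in head.items():
        before = base.get(key)
        if before is None:
            # New callsite: counted, but no volume is invented for it.
            changes.append({"callsite": key, "kind": "new_callsite", "volume": None})
            continue
        if after["model"] != before["model"]:
            # A model swap changes the rate basis.
            changes.append({"callsite": key, "kind": "rate",
                            "from": before["model"], "to": after["model"]})
        if after["max_tokens"] != before["max_tokens"]:
            # A max_tokens change moves the output tail.
            changes.append({"callsite": key, "kind": "output_tail",
                            "from": before["max_tokens"], "to": after["max_tokens"]})
    return changes


# Illustrative inputs only; the keys and model labels are made up.
base = {"app.py:summarize": {"model": "small", "max_tokens": 256}}
head = {
    "app.py:summarize": {"model": "frontier", "max_tokens": 1024},
    "app.py:classify": {"model": "small", "max_tokens": 64},
}
changes = scenario_changes(base, head)
```

The new callsite carries `volume: None` on purpose: its magnitude is left for actuals rather than guessed.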
Run paired Monte Carlo with common random numbers. Report expected, P50, P80, P90, P95, and worst monthly delta with exact-zero self-delta.
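A minimal sketch of the paired idea, using only the Python standard library: both scenarios are simulated with the same seed, so each draw of volume and output length is shared and the difference isolates the code change. The cost model, the distributions, and every parameter name here are assumptions for illustration, not Class1's estimator.

```python
import random
import statistics


def simulate_monthly_cost(rate_per_1k_out: float, max_tokens: int,
                          seed: int, n: int = 2000) -> list[float]:
    """Draw n monthly costs from a toy model (all distributions are assumed)."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n):
        requests = rng.lognormvariate(10.0, 0.5)  # hypothetical monthly volume
        fill = rng.betavariate(2.0, 5.0)          # share of max_tokens actually used
        costs.append(requests * fill * max_tokens / 1000 * rate_per_1k_out)
    return costs


def paired_delta(base: dict, head: dict, seed: int = 7) -> dict:
    """Common random numbers: the same seed drives both scenarios."""
    base_costs = simulate_monthly_cost(seed=seed, **base)
    head_costs = simulate_monthly_cost(seed=seed, **head)
    deltas = sorted(h - b for b, h in zip(base_costs, head_costs))

    def pct(p: float) -> float:
        # Simple nearest-rank style percentile on the sorted deltas.
        return deltas[min(len(deltas) - 1, int(p * len(deltas)))]

    return {
        "expected": statistics.fmean(deltas),
        "p50": pct(0.50), "p80": pct(0.80),
        "p90": pct(0.90), "p95": pct(0.95),
        "worst": deltas[-1],
    }


base = {"rate_per_1k_out": 0.002, "max_tokens": 512}
head = {"rate_per_1k_out": 0.010, "max_tokens": 512}
self_delta = paired_delta(base, base)  # every statistic is exactly 0.0
swap_delta = paired_delta(base, head)
```

Because the two runs consume identical random streams, comparing a scenario with itself produces identical costs draw by draw, which is where the exact-zero self-delta comes from.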
Rank tail drivers, project escalation, recommend a model by cost per completed task, and attach the Basis of Estimate.
If a .class1 budget exists and the positive P90 delta exceeds it, the check fails. Without a budget, the check stays advisory. Non-increases never block.
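The three gate rules can be sketched as a small decision function. The function name and the status strings are hypothetical, and treating a delta exactly equal to the budget as passing is an assumption based on the word "exceeds".

```python
from typing import Optional


def gate(p90_delta_usd: float, budget_usd: Optional[float]) -> str:
    """Hypothetical gate decision; status names are illustrative."""
    if p90_delta_usd <= 0:
        return "pass"        # non-increases never block
    if budget_usd is None:
        return "advisory"    # no declared budget: report only
    # Only a positive P90 delta above the declared budget fails the check.
    return "fail" if p90_delta_usd > budget_usd else "pass"
```

For example, `gate(600.0, 500.0)` fails, `gate(600.0, None)` stays advisory, and `gate(-25.0, 500.0)` passes regardless of the budget.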
After merge, actuals pair with the stored estimate. The class improves only when reality proves it.
What the gate actually reads
The scanner looks for LLM callsites and concrete parameters: model identifiers, maximum output tokens, constructor-bound model choices, provider wrappers, LangChain chat classes, Gemini model objects, LiteLLM aliases, and TypeScript or JavaScript patterns when tree-sitter is available.
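To make the Python side concrete, here is a minimal sketch of how literal `model` and `max_tokens` arguments can be read with the standard-library `ast` module. It is not the Class1 scanner: the function name and result shape are hypothetical, the sample model string is only an example, and a real scanner would also have to resolve aliases, constructors, and provider wrappers.

```python
import ast

SOURCE = '''
client.chat.completions.create(model="gpt-4o", max_tokens=512, messages=msgs)
helper(x=1)
'''


def find_llm_callsites(source: str) -> list[dict]:
    """Collect calls that pass a literal string `model` keyword argument."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Map keyword names to their AST value nodes (skip **kwargs, where arg is None).
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg}
        model = kwargs.get("model")
        if not (isinstance(model, ast.Constant) and isinstance(model.value, str)):
            continue
        max_tokens = kwargs.get("max_tokens")
        sites.append({
            "line": node.lineno,
            "model": model.value,
            # Only literal values are reported; anything computed stays unknown.
            "max_tokens": max_tokens.value if isinstance(max_tokens, ast.Constant) else None,
        })
    return sites


sites = find_llm_callsites(SOURCE)
```

The source is only parsed, never executed, which is why this kind of scan can run offline on a diff.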
When a pull request swaps a cheaper model for a frontier model, the rate basis changes. When max_tokens rises, the output tail widens. When tool schemas are loaded into every request, MCP overhead becomes a recurring input-token cost. These are defensible signals because the code itself exposes them.
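The tool-schema case is simple arithmetic, sketched below. The token count, request volume, and price are made-up illustration values, not measurements or published rates, and the function name is hypothetical.

```python
def recurring_tool_overhead_usd(schema_tokens: int,
                                requests_per_month: int,
                                input_usd_per_million: float) -> float:
    """Monthly cost of resending tool schemas as input tokens on every request."""
    return schema_tokens * requests_per_month * input_usd_per_million / 1_000_000


# Hypothetical numbers: 1,500 schema tokens, 200,000 requests, $3 per million input tokens.
overhead = recurring_tool_overhead_usd(1500, 200_000, 3.0)  # 900.0 USD per month
```

Note that the request volume has to come from actuals or an existing baseline; the code only exposes the per-request token overhead.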
The product deliberately does not invent traffic volume from a new callsite. Added callsites are counted and called out, but their magnitude belongs to actuals and calibration. That restraint matters in sales: the report is credible because it refuses to fill unknowns with confident fiction.
PR comment anatomy
Approval semantics
Config surface
The enforcement lever is intentionally narrow: P90 monthly delta against a declared budget. A PR that does not increase cost cannot be blocked by the gate.
fail_pr_if:
  delta_p90_usd: 500
warn_at_fraction: 0.8
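One plausible reading of these values, sketched in Python: the check warns once the P90 delta reaches 80% of the 500 USD budget and fails above the budget itself. The source does not spell out the semantics of warn_at_fraction, so that interpretation, the dictionary layout, and the status names are all assumptions.

```python
# Config values mirrored as a plain dict; the nesting is assumed.
CONFIG = {"fail_pr_if": {"delta_p90_usd": 500}, "warn_at_fraction": 0.8}


def budget_status(p90_delta_usd: float, config: dict = CONFIG) -> str:
    """Classify a P90 delta against the budget and an assumed warning band."""
    budget = config["fail_pr_if"]["delta_p90_usd"]
    warn_at = budget * config["warn_at_fraction"]  # about 400 USD with these values
    if p90_delta_usd > budget:
        return "fail"
    if p90_delta_usd >= warn_at:
        return "warn"
    return "ok"
```

Under this reading, a 450 USD P90 delta would warn without blocking, giving reviewers notice before the budget is actually exceeded.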
Python AST is stdlib/offline. TS/JS uses tree-sitter when installed and degrades cleanly otherwise.
The Action runs tests, estimates the diff, posts the comment, then enforces the gate so the PR still gets context when it fails.
--json writes the gate payload for dashboards, governance systems, and reporting workflows.
--persist stores estimates so that later actuals can close the loop and improve the estimate class.
Product objections
The default mode is advisory. Teams can start by posting comments, then turn on blocking only for repositories or workflows where P90 budget exposure matters.
That is why the estimate class is part of the report. Class 5 and Class 4 are still useful for risk detection, but the product does not pretend they are definitive budgets.
Production usage is necessary for calibration, but it arrives after the decision. Class1 uses actuals to improve future estimates while still reviewing the next risky change before merge.
A failed check is not a moral judgement. It creates a controlled conversation: reduce the tail, change the model, add a budget owner, or explicitly accept the cost.