Blue Book ledger

The estimate is only as credible as its rate basis.

Class1 keeps a frozen, effective-dated, provenance-aware data layer so every PR estimate can be reproduced and audited.

6,924 pricing rows (effective 2026-06-06)

2,257 structured price rows (cache, batch, tiers)

7,389 spec rows (litellm plus models.dev)

4,580 price trend models (historical token index)

1,188 cloud instances (118 dated versions)

Why the ledger matters

The hard part is not displaying a price. The hard part is knowing which price was used, when, and why.

LLM vendors change prices, aggregators rename models, context tiers appear, cache discounts differ by provider, and frontier model releases can reset capability assumptions overnight. A PR estimate that uses a live feed without freezing the basis is difficult to reproduce later.

Class1 treats the pricing basis like an estimator treats a cost book. Raw artifacts are preserved with provenance, normalized into silver rows, and curated into a gold snapshot. The estimate can say which basis it used, how old that basis is, and which items were measured, inferred, assumed, excluded, or unpriced.
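The idea of a frozen, self-describing basis can be sketched in a few lines. This is an illustration only, not Class1's actual schema: the class names, field names, and the digest scheme below are assumptions made for the example; only the five item statuses and the effective-dated snapshot come from the text above.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
import hashlib
import json


class Status(Enum):
    """How a line item in an estimate was obtained (labels from the text)."""
    MEASURED = "measured"
    INFERRED = "inferred"
    ASSUMED = "assumed"
    EXCLUDED = "excluded"
    UNPRICED = "unpriced"


@dataclass(frozen=True)
class RateRow:
    # Hypothetical row shape; the real gold rows may carry more fields.
    model: str
    usd_per_1m_input: float
    usd_per_1m_output: float
    source: str  # provenance: which raw artifact the row came from


@dataclass(frozen=True)
class BasisSnapshot:
    effective: date
    rows: tuple[RateRow, ...]

    def digest(self) -> str:
        """Content hash, so an estimate can name the exact basis it used."""
        payload = json.dumps(
            {"effective": self.effective.isoformat(),
             "rows": [vars(r) for r in self.rows]},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def age_days(self, today: date) -> int:
        return (today - self.effective).days


def estimate_header(snapshot: BasisSnapshot, today: date,
                    items: dict[str, Status]) -> dict:
    """Everything a reviewer needs to reproduce and audit the estimate."""
    return {
        "basis_effective": snapshot.effective.isoformat(),
        "basis_sha256": snapshot.digest(),
        "basis_age_days": snapshot.age_days(today),
        "items": {name: status.value for name, status in items.items()},
    }


# Example usage with made-up values.
snapshot = BasisSnapshot(
    effective=date(2026, 6, 6),
    rows=(RateRow("example-model", 3.0, 15.0, "hypothetical-source"),),
)
header = estimate_header(
    snapshot,
    today=date(2026, 6, 16),
    items={"input tokens": Status.MEASURED, "retry overhead": Status.ASSUMED},
)
```

Because the snapshot is immutable and hashed, two estimates that report the same digest were priced against the same rows, whatever the live feeds say later.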

That makes the Blue Book commercially important. A public basis proves the open-core product. A private basis becomes the paid pilot: customer-specific rates, private actuals, internal model allowances, team allocation rules, and variance reports that cannot be copied from a public dashboard.

Medallion discipline

Bronze: raw source. Silver: normalizer. Gold: estimate basis.

Prices, specs, capability, cloud cost, actuals, carbon, water, and materials all live as frozen snapshots.

The product can refresh the basis, but an estimate never silently depends on a live feed. Staleness is modeled as estimate decay.
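One way to picture staleness as decay is to widen the estimate range as the basis ages. The page does not say how Class1 models decay, so the linear widening, the function name, and both default parameters below are assumptions chosen purely for illustration.

```python
from datetime import date


def decayed_range(point_usd: float, effective: date, today: date,
                  base_spread: float = 0.10,
                  growth_per_30d: float = 0.05) -> tuple[float, float]:
    """Return a (low, high) range that widens with the age of the basis.

    Assumed model: the spread starts at base_spread and grows linearly by
    growth_per_30d for every 30 days since the basis became effective.
    """
    age_days = max((today - effective).days, 0)
    spread = base_spread + growth_per_30d * (age_days / 30)
    low = point_usd * (1 - min(spread, 1.0))  # never below zero
    high = point_usd * (1 + spread)
    return low, high


# A fresh basis and a 60-day-old basis for the same point estimate.
fresh = decayed_range(100.0, date(2026, 6, 6), date(2026, 6, 6))
stale = decayed_range(100.0, date(2026, 6, 6), date(2026, 8, 5))
```

The point estimate is unchanged in both cases; only the stated confidence shrinks, which is what distinguishes decay from a silent re-price.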

What a buyer gets from the Blue Book

A credible estimate needs more than a token price table.

Structured rates: Cache reads, cache writes, batch discounts, reasoning tokens, and context tiers can change the cost per call even when the model name in the code stays the same.

Capability joins: SWE-Bench Verified coding grades let the product explain why the cheapest per-token model may be expensive per completed task.

Actuals bridge: FOCUS and provider usage rows give the calibration loop a place to land after the PR has shipped.

Cloud-side basis: Vector databases, managed endpoints, storage, egress, and GPU commitments belong in the fully loaded AI cost discussion.

Footprint basis: Carbon intensity, water scarcity, WUE, and materials coefficients give the environmental report traceable assumptions.
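To make the structured-rates point concrete, here is a minimal sketch of a cost-per-call calculation. The field names, the rate values, and the tier rule (a single multiplier applied to input and output once the prompt exceeds a threshold) are assumptions for the example; real providers structure these differently, and reasoning tokens are simply counted as output tokens here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructuredRate:
    # All prices are USD per 1M tokens; values used below are made up.
    input_usd_per_1m: float
    output_usd_per_1m: float
    cache_read_usd_per_1m: float
    cache_write_usd_per_1m: float
    batch_discount: float = 0.0                    # fraction, 0.5 means 50% off
    long_context_threshold: Optional[int] = None   # prompt tokens
    long_context_multiplier: float = 1.0


def call_cost_usd(rate: StructuredRate, *, fresh_input: int, cached_input: int,
                  cache_write: int, output: int, batch: bool = False) -> float:
    """Cost of one call given how its tokens were actually billed."""
    prompt_tokens = fresh_input + cached_input + cache_write
    over_tier = (rate.long_context_threshold is not None
                 and prompt_tokens > rate.long_context_threshold)
    tier = rate.long_context_multiplier if over_tier else 1.0

    cost = (
        fresh_input * rate.input_usd_per_1m * tier
        + cached_input * rate.cache_read_usd_per_1m
        + cache_write * rate.cache_write_usd_per_1m
        + output * rate.output_usd_per_1m * tier
    ) / 1_000_000
    if batch:
        cost *= 1 - rate.batch_discount
    return cost


rate = StructuredRate(3.0, 15.0, 0.30, 3.75, batch_discount=0.5,
                      long_context_threshold=200_000,
                      long_context_multiplier=2.0)

# Same model name, same prompt size, different billing shape.
uncached = call_cost_usd(rate, fresh_input=100_000, cached_input=0,
                         cache_write=0, output=2_000)
cached = call_cost_usd(rate, fresh_input=10_000, cached_input=90_000,
                       cache_write=0, output=2_000)
```

A flat per-token table would price both calls identically, which is why the basis keeps structured rows alongside the headline rates.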

pricing.json: 6,924 model rows, USD per 1M tokens

pricing_structure.json: 2,257 structured rates for cache, batch, tiers and related pricing shapes

price_index.json: 4,580 models with historical token-price trend data

capability.json: 38 coding grades from SWE-Bench Verified

spec_sheet.json: 7,389 model metadata records, with raw source retention

actuals_index.json: 8 public actuals sources feeding the cloud-cost basis

Capability sample

Real coding grades join the price basis.

Model	Provider	SWE-Bench Verified score	Date
claude-opus-4-5	anthropic	79.2%	2025-12-05
claude-opus-4-5-20251101	anthropic	79.2%	2025-12-15
Doubao-Seed-Code	unknown	78.8%	2025-09-28
gemini-3-pro-preview	google	77.4%	2025-11-20
claude-sonnet-4-20250514	anthropic	76.8%	2025-08-04
claude-4-sonnet	anthropic	76.4%	2025-08-19
gpt-5	openai	75.6%	2025-09-01
claude-4-sonnet-20250522	anthropic	75.2%	2025-06-12
claude-sonnet-4-5	anthropic	74.8%	2025-11-03
claude-4-sonnet-20250514	anthropic	74.6%	2025-07-20
claude-sonnet-4.5	anthropic	73.8%	2025-11-03
claude-4-opus-20250514	anthropic	73.2%	2025-05-22
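A rough sketch of how a coding grade can be joined to a price to compare models per completed task rather than per token. It rests on a strong simplifying assumption that is not from the source: the benchmark score is treated as the probability that one attempt succeeds, and failed attempts are retried at the same cost, so the expected cost per completed task is the attempt cost divided by the pass rate. Model names and dollar figures are hypothetical.

```python
from typing import Optional


def cost_per_completed_task(
    attempt_cost_usd: dict[str, float],
    pass_rate: dict[str, float],
) -> dict[str, Optional[float]]:
    """Join per-attempt cost with a capability grade, keyed by model name.

    Models with no grade (or a zero grade) map to None instead of being
    dropped, so the gap stays visible in the estimate.
    """
    joined: dict[str, Optional[float]] = {}
    for model, cost in attempt_cost_usd.items():
        rate = pass_rate.get(model)
        joined[model] = cost / rate if rate else None
    return joined


# Hypothetical inputs: per-attempt cost from the price basis,
# pass rate from a capability table like the one above.
costs = {"budget-model": 0.10, "strong-model": 0.30, "ungraded-model": 0.05}
grades = {"budget-model": 0.25, "strong-model": 0.80}

per_task = cost_per_completed_task(costs, grades)
cheapest = min(
    (m for m, c in per_task.items() if c is not None),
    key=lambda m: per_task[m],
)
```

In this made-up case the model that is three times more expensive per attempt comes out cheaper per completed task, and the ungraded model is reported as unknown rather than silently ranked first.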