Blue Book ledger

The estimate is only as credible as its rate basis.

Class1 keeps a frozen, effective-dated, provenance-aware data layer so every PR estimate can be reproduced and audited.

6,924 pricing rows (effective 2026-06-06)
2,257 structured price rows (cache, batch, tiers)
7,389 spec rows (litellm plus models.dev)
4,580 price trend models (historical token index)
1,188 cloud instances (118 dated versions)

Why the ledger matters

The hard part is not displaying a price. The hard part is knowing which price was used, when, and why.

LLM vendors change prices, aggregators rename models, context tiers appear, cache discounts differ by provider, and frontier model releases can reset capability assumptions overnight. A PR estimate that uses a live feed without freezing the basis is difficult to reproduce later.

Class1 treats the pricing basis like an estimator treats a cost book. Raw artifacts are preserved with provenance, normalized into silver rows, and curated into a gold snapshot. The estimate can say which basis it used, how old that basis is, and which items were measured, inferred, assumed, excluded, or unpriced.
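As a rough illustration of the idea above, here is a minimal sketch of a frozen, effective-dated rate row with provenance, plus a lookup that answers "which price applied on this date". The `RateRow` fields, the `rate_as_of` and `status_for` helpers, the file paths, and the prices are all hypothetical; only the status labels come from the text. This is not Class1's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    # How an estimate line item was derived (labels taken from the text).
    MEASURED = "measured"
    INFERRED = "inferred"
    ASSUMED = "assumed"
    EXCLUDED = "excluded"
    UNPRICED = "unpriced"


@dataclass(frozen=True)
class RateRow:
    # Hypothetical gold-snapshot row; field names are assumptions.
    model: str
    usd_per_1m_input: float
    effective: date  # date the rate takes effect
    source: str      # provenance: the raw artifact this row came from


def rate_as_of(rows: list[RateRow], model: str, as_of: date) -> Optional[RateRow]:
    """Return the latest row for `model` effective on or before `as_of`."""
    candidates = [r for r in rows if r.model == model and r.effective <= as_of]
    return max(candidates, key=lambda r: r.effective, default=None)


def status_for(row: Optional[RateRow]) -> Status:
    # Assumption: a missing rate is reported as unpriced instead of guessed,
    # and a rate found in the frozen basis counts as measured.
    return Status.UNPRICED if row is None else Status.MEASURED


# Made-up rows for a made-up model.
rows = [
    RateRow("example-model", 3.00, date(2025, 1, 1), "bronze/vendor-2025-01-01.json"),
    RateRow("example-model", 2.50, date(2025, 9, 1), "bronze/vendor-2025-09-01.json"),
]
hit = rate_as_of(rows, "example-model", date(2025, 6, 15))
miss = rate_as_of(rows, "other-model", date(2025, 6, 15))
print(hit, status_for(hit))
print(miss, status_for(miss))
```

Because the rows are immutable and the lookup takes an explicit date, rerunning the same estimate later against the same snapshot would return the same rate, which is the reproducibility property the paragraph describes.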

That makes the Blue Book commercially important. A public basis proves the open-core product. A private basis becomes the paid pilot: customer-specific rates, private actuals, internal model allowances, team allocation rules, and variance reports that cannot be copied from a public dashboard.

Medallion discipline

Bronze: raw source. Silver: normalizer. Gold: estimate basis.

Prices, specs, capability, cloud cost, actuals, carbon, water, and materials all live as frozen snapshots.

The product can refresh the basis, but an estimate never silently depends on a live feed. Staleness is modeled as estimate decay.
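The text does not say how estimate decay is computed, so the following is only one plausible sketch: confidence halves every `half_life_days`, and the estimate range widens as confidence falls. The function names, the 90-day half-life, and the 50% maximum spread are assumptions chosen for illustration.

```python
from datetime import date


def basis_confidence(effective: date, as_of: date, half_life_days: float = 90.0) -> float:
    """Confidence in (0, 1] that halves every `half_life_days` (assumed decay shape)."""
    age_days = max((as_of - effective).days, 0)
    return 0.5 ** (age_days / half_life_days)


def estimate_range(point_usd: float, confidence: float, max_spread: float = 0.5) -> tuple[float, float]:
    """Widen the range around the point estimate as confidence drops."""
    spread = max_spread * (1.0 - confidence)
    return point_usd * (1.0 - spread), point_usd * (1.0 + spread)


# A basis effective 2026-06-06, evaluated 90 days later.
conf = basis_confidence(date(2026, 6, 6), date(2026, 9, 4))
low, high = estimate_range(100.0, conf)
print(f"confidence={conf:.2f} range=${low:.2f}..${high:.2f}")
```

A fresh basis yields confidence 1.0 and a zero-width range; the older the snapshot, the wider the range an estimate would report.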

What a buyer gets from the Blue Book

A credible estimate needs more than a token price table.

Structured rates: Cache reads, cache writes, batch discounts, reasoning tokens, and context tiers can change the cost per call even when neither the code nor the model name changes.
Capability joins: SWE-Bench Verified coding grades let the product explain why the cheapest per-token model may be expensive per completed task.
Actuals bridge: FOCUS and provider usage rows give the calibration loop a place to land after the PR has shipped.
Cloud-side basis: Vector databases, managed endpoints, storage, egress, and GPU commitments belong in the fully loaded AI cost discussion.
Footprint basis: Carbon intensity, water scarcity, WUE, and materials coefficients give the environmental report traceable assumptions.

pricing.json: 6,924 model rows, USD per 1M tokens
pricing_structure.json: 2,257 structured rates for cache, batch, tiers and related pricing shapes
price_index.json: 4,580 models with historical token-price trend data
capability.json: 38 coding grades from SWE-Bench Verified
spec_sheet.json: 7,389 model metadata records, with raw source retention
actuals_index.json: 8 public actuals sources feeding the cloud-cost basis
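To make the structured-rates point concrete, here is a small sketch of how cache reads, cache writes, a batch discount, and a long-context tier could change the cost of a call. The `StructuredRate` fields and every number are invented for illustration and do not reflect the layout of pricing_structure.json or any vendor's prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredRate:
    # Rates are USD per 1M tokens; all values used below are made up.
    input: float
    output: float
    cache_read: float
    cache_write: float
    batch_discount: float = 0.0         # fraction off, e.g. 0.5 for 50%
    long_context_threshold: int = 0     # prompt tokens; 0 means no tier
    long_context_multiplier: float = 1.0


def call_cost_usd(rate: StructuredRate, fresh_in: int, cached_in: int,
                  written_in: int, out: int, batch: bool = False) -> float:
    """Cost of one call, splitting prompt tokens by how they are billed."""
    prompt_tokens = fresh_in + cached_in + written_in
    tier = 1.0
    if rate.long_context_threshold and prompt_tokens > rate.long_context_threshold:
        tier = rate.long_context_multiplier
    usd = (
        fresh_in * rate.input
        + cached_in * rate.cache_read
        + written_in * rate.cache_write
        + out * rate.output
    ) * tier / 1_000_000
    if batch:
        usd *= 1.0 - rate.batch_discount
    return usd


rate = StructuredRate(input=2.0, output=10.0, cache_read=0.2, cache_write=2.5,
                      batch_discount=0.5, long_context_threshold=200_000,
                      long_context_multiplier=2.0)

# Same model, same 100k-token prompt, different billing shapes.
uncached = call_cost_usd(rate, fresh_in=100_000, cached_in=0, written_in=0, out=1_000)
cached = call_cost_usd(rate, fresh_in=10_000, cached_in=90_000, written_in=0, out=1_000)
batched = call_cost_usd(rate, fresh_in=100_000, cached_in=0, written_in=0, out=1_000, batch=True)
print(f"uncached={uncached:.4f} cached={cached:.4f} batched={batched:.4f}")
```

With these invented rates, the three calls name the same model yet differ several-fold in cost, which is why a flat per-token table is not enough for an estimate.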

Capability sample

Real coding grades join the price basis.

Model                     Provider   SWE-Bench score  Date
claude-opus-4-5           anthropic  79.2%            2025-12-05
claude-opus-4-5-20251101  anthropic  79.2%            2025-12-15
Doubao-Seed-Code          unknown    78.8%            2025-09-28
gemini-3-pro-preview      google     77.4%            2025-11-20
claude-sonnet-4-20250514  anthropic  76.8%            2025-08-04
claude-4-sonnet           anthropic  76.4%            2025-08-19
gpt-5                     openai     75.6%            2025-09-01
claude-4-sonnet-20250522  anthropic  75.2%            2025-06-12
claude-sonnet-4-5         anthropic  74.8%            2025-11-03
claude-4-sonnet-20250514  anthropic  74.6%            2025-07-20
claude-sonnet-4.5         anthropic  73.8%            2025-11-03
claude-4-opus-20250514    anthropic  73.2%            2025-05-22
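One way a coding grade like those above could join the price basis is to divide the cost of an attempt by the chance that the attempt succeeds. The sketch below assumes independent retries until success and treats a benchmark pass rate as a stand-in for the success probability on real tasks, which is a strong simplification. The model names, costs, and pass rates are hypothetical, and the helper is not taken from Class1.

```python
def cost_per_completed_task(cost_per_attempt_usd: float, pass_rate: float) -> float:
    """Expected spend until the first success, assuming independent attempts."""
    if not 0.0 < pass_rate <= 1.0:
        raise ValueError("pass_rate must be in (0, 1]")
    # Expected number of attempts for a geometric process is 1 / pass_rate.
    return cost_per_attempt_usd / pass_rate


# Hypothetical models: one cheap per attempt, one more capable.
cheap = cost_per_completed_task(cost_per_attempt_usd=0.10, pass_rate=0.20)
strong = cost_per_completed_task(cost_per_attempt_usd=0.30, pass_rate=0.75)
print(f"cheap-model: ${cheap:.2f} per completed task")
print(f"strong-model: ${strong:.2f} per completed task")
```

Under these made-up numbers the model that costs three times more per attempt comes out cheaper per completed task, which is the effect the capability join is meant to expose.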