Product — TOLVYN

The moat

The Immutable Ledger

Every AI request that passes through TOLVYN is recorded in an append-only ledger. Each entry is hashed with SHA-256, chained to the previous entry, and signed with HMAC-SHA256. The result: a cryptographically verifiable record of every API call your organization has made through TOLVYN, to every AI provider.

How the chain works

Each ledger record holds a prev_hash field containing the SHA-256 hash of the previous record's serialized payload. Any retroactive modification of a record breaks the chain. Checking the link at any single sequence number is O(1), and verifying the whole ledger is one linear pass over the records.
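The chaining described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TOLVYN's implementation: the record layout, the canonical-JSON serialization, the all-zero genesis hash, and the function names are all hypothetical. The sketch also assumes the hashed bytes cover the record's own prev_hash, so that each link commits to everything before it.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumption: prev_hash used by the first record


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON serialization of the whole record."""
    data = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def append_record(ledger: list, payload: dict) -> dict:
    """Append a record whose prev_hash is the hash of the previous record."""
    prev_hash = record_hash(ledger[-1]) if ledger else GENESIS_HASH
    record = {"seq": len(ledger) + 1, "prev_hash": prev_hash, "payload": payload}
    ledger.append(record)
    return record


def first_broken_link(ledger: list):
    """Return the seq of the first record whose prev_hash mismatches, else None."""
    expected = GENESIS_HASH
    for record in ledger:
        if record["prev_hash"] != expected:
            return record["seq"]
        expected = record_hash(record)
    return None
```

Editing record 1 after the fact makes first_broken_link report sequence number 2, because record 2 still carries the hash of the original record 1. A tampered final record has no successor to expose it, which is one reason a per-record signature is useful on top of the chain.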

HMAC-SHA256 signature

On top of the hash chain, each record is signed with HMAC-SHA256 using a per-tenant key. Signature verification proves both integrity (the record wasn't modified) and authenticity (the record was written by TOLVYN, not forged by someone with read access to the database).
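A minimal sketch of per-record signing with Python's standard hmac module follows. The serialization and the function names are assumptions made for illustration; only the HMAC-SHA256 primitive and the per-tenant key come from the description above.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, tenant_key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON serialization (assumed format)."""
    data = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(tenant_key, data, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, tenant_key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_record(record, tenant_key)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Someone with database access but without the tenant key can change a row, but cannot produce a signature that verify_record accepts for the changed row.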

Advisory lock for sequence integrity

Sequence numbers are allocated under a Postgres advisory lock per tenant. There are no gaps and no duplicates — even under concurrent writes from multiple proxy workers.

Verification endpoint

# Verify the entire ledger for your tenant
$ tolvyn ledger verify --from 1 --to latest
verifying 4,821 records...
✓ hash chain intact
✓ all HMAC signatures valid
✓ no sequence gaps
ledger integrity: PASS

What it proves — and what it doesn't

The ledger proves what was billable: which request, at which timestamp, to which model, with which token counts, at which cost. It does not store prompt or response content. There is nothing in the ledger your finance team can't show an auditor.

Attribution

Six dimensions of cost attribution

Every request is tagged across six dimensions. Slice cost reports by any combination.

Team — X-Tolvyn-Team — engineering, marketing, support
Service — X-Tolvyn-Service — chatbot-api, search-svc, content-gen
Feature — X-Tolvyn-Feature — autocomplete, summarize, classify
Agent — X-Tolvyn-Agent — sdr-agent, support-bot, code-reviewer
User — X-Tolvyn-User — your end user's ID, hashed before storage
End-customer — X-Tolvyn-End-Customer — your customer ID for COGS-per-customer
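As a sketch, the six headers can be assembled by a small helper and merged into whatever headers your HTTP client already sends. The helper name is hypothetical, and whether a dimension may be omitted is an assumption here, not something this page specifies.

```python
def attribution_headers(team=None, service=None, feature=None,
                        agent=None, user=None, end_customer=None):
    """Build the attribution headers, skipping dimensions left unset.

    Assumption: omitting a header is acceptable. Check the API reference.
    """
    candidates = {
        "X-Tolvyn-Team": team,
        "X-Tolvyn-Service": service,
        "X-Tolvyn-Feature": feature,
        "X-Tolvyn-Agent": agent,
        "X-Tolvyn-User": user,
        "X-Tolvyn-End-Customer": end_customer,
    }
    return {name: value for name, value in candidates.items() if value is not None}


headers = attribution_headers(team="engineering", service="chatbot-api",
                              feature="summarize", agent="support-bot")
```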

Hierarchy

Dimensions are independent — but in the dashboard they nest naturally: end-customer → team → service → feature → agent → user. Roll up to any level for board reports; drill down to any level for incident response.

Budgets

Budgets & Enforcement

Set spending limits at any granularity. Choose how strictly they're enforced. Get pre-request cost estimation so the proxy can refuse a request before it hits the provider.

Three enforcement modes

Soft mode — the request goes through and an alert fires. Nothing is ever blocked.
Hard mode — at the cap the proxy returns HTTP 429 with an x-tolvyn-budget header explaining which budget was hit, before the provider is called. Your application can fall back, queue, or surface an error to the user.
Approval mode (approve-and-wait) — at the cap the request is blocked exactly like hard mode, but a pending approval is opened. An administrator grants a bounded extension (an extra dollar amount, valid for a bounded time), after which requests flow again until that grant's amount or time runs out. Governance without a permanent block: route the decision to a human instead of just failing.

Scope hierarchy

Budgets evaluate in order: agent → service → team → org. The first exhausted budget wins. So a $500/mo agent budget will hard-block even when the parent team budget has plenty of headroom — useful for runaway agent protection.
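The evaluation order can be illustrated with a short sketch. The Budget class and its field names are hypothetical. The only behavior taken from the text is the agent → service → team → org order and the first-exhausted-wins rule.

```python
from dataclasses import dataclass
from typing import Optional

SCOPE_ORDER = ("agent", "service", "team", "org")  # narrowest scope first


@dataclass
class Budget:
    scope: str
    name: str
    limit_usd: float
    spent_usd: float

    def exhausted(self) -> bool:
        return self.spent_usd >= self.limit_usd


def first_exhausted(budgets: list) -> Optional[Budget]:
    """Walk the scopes in order and return the first exhausted budget, if any."""
    by_scope = {b.scope: b for b in budgets}
    for scope in SCOPE_ORDER:
        budget = by_scope.get(scope)
        if budget is not None and budget.exhausted():
            return budget
    return None
```

With a $500 agent budget fully spent and a $5,000 team budget at $1,200, first_exhausted returns the agent budget, so the request is blocked despite the team's headroom.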

Pre-request cost estimation

Token counts and per-model pricing are known before the request goes out. The proxy rejects a request that would push you over budget rather than incurring the charge and refusing the response.
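A rough sketch of the check, under two assumptions that this page does not confirm: the output side is bounded by the request's maximum output tokens (actual output length is not known in advance), and prices are quoted per million tokens. The function names and the prices used in the example are illustrative only.

```python
def estimate_cost_usd(input_tokens, max_output_tokens,
                      input_price_per_m, output_price_per_m):
    """Upper-bound cost estimate; prices are USD per million tokens (assumed)."""
    total = input_tokens * input_price_per_m + max_output_tokens * output_price_per_m
    return total / 1_000_000


def would_exceed(spent_usd, limit_usd, estimated_usd):
    """True if sending this request could push spend past the budget limit."""
    return spent_usd + estimated_usd > limit_usd


# Hypothetical prices: $2.50/M input, $10.00/M output.
estimate = estimate_cost_usd(1000, 500, 2.50, 10.00)
```

Here the estimate is $0.0075, so a budget with less than that amount of headroom would refuse the request before it reaches the provider.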

Spend Quotas

Spend quotas & burn-rate forecast

Budgets enforce. Quotas give you foresight. A spend quota tracks spend along one dimension and alerts as it climbs toward a limit — it never blocks a request. Run them alongside budgets: the budget is the wall, the quota is the early-warning line painted well before it.

Dimensions & periods

Track spend per model, team, service, end-customer, or organization total, over a daily, weekly, or monthly period. Set any number of percentage thresholds — an alert fires as each one is crossed within the period.

Burn-rate forecast

Each quota projects whether you're on track to reach the limit before the period resets, by extrapolating recent spend. The result is one of on track, projected to hit the cap on <date>, already over, or not enough data yet.

It's an estimate — not a guarantee

The forecast is a straight-line extrapolation of a trailing window of spend. Real usage is spiky and non-linear, so the projected date is guidance, not a promise — it exists to warn you early, not to predict the future precisely. Quotas alert; they never block. Enforcement is the budget's job.
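To make the straight-line extrapolation concrete, here is a minimal sketch. The trailing window is passed in as a list of daily spend figures. The three-day minimum, the daily granularity, and the function name are assumptions, not documented behavior.

```python
import math
from datetime import date, timedelta


def forecast(daily_spend, spent_so_far, limit, today, period_end, min_days=3):
    """Classify a quota using the average daily spend of a trailing window."""
    if spent_so_far >= limit:
        return "already over"
    if len(daily_spend) < min_days:  # assumed minimum window
        return "not enough data yet"
    rate = sum(daily_spend) / len(daily_spend)  # USD per day
    if rate <= 0:
        return "on track"
    days_to_cap = math.ceil((limit - spent_so_far) / rate)
    hit_date = today + timedelta(days=days_to_cap)
    if hit_date <= period_end:
        return f"projected to hit the cap on {hit_date.isoformat()}"
    return "on track"
```

At $10 per day with $70 of a $100 limit spent on 2025-01-10, the sketch projects the cap on 2025-01-13. A single spiky day in the window would move that date, which is why the projection is guidance only.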

Alerts

Four alert types

Budget threshold — fires at 50% (info), 75% (warning), and 90%/100% (critical) of any budget scope.
Cost anomaly — fires when cost-per-request deviates by more than 3σ from the rolling baseline (e.g. someone shipped a code change that switched gpt-4o-mini → gpt-4o).
Model change — fires when a service starts using a model it has never used before.
Pricing change — fires when a provider you depend on changes a model's price (see Pricing Change Governance).
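The cost-anomaly rule can be sketched as a simple deviation test against a baseline window. How the rolling baseline is actually maintained is not described here, so the window is just a list in this sketch and the function name is hypothetical.

```python
import statistics


def is_cost_anomaly(baseline_costs, new_cost, sigmas=3.0):
    """True if new_cost is more than `sigmas` standard deviations from the mean."""
    if len(baseline_costs) < 2:
        return False  # not enough history to estimate spread
    mean = statistics.fmean(baseline_costs)
    stdev = statistics.stdev(baseline_costs)
    if stdev == 0:
        return new_cost != mean
    return abs(new_cost - mean) > sigmas * stdev
```

A service that normally costs about $0.001 per request and suddenly costs $0.02 per request is far outside three standard deviations, which is the pattern a silent model switch tends to produce.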

Severity

Two levels: warning (informational) and critical (action required). Critical alerts always page; warnings are configurable.

Delivery

In-app · email · webhook. Webhooks are HMAC-signed (see below).

Webhooks

Subscribe to four event types, or use the alert.all wildcard.

alert.budget_threshold — any budget threshold crossed
alert.cost_anomaly — cost anomaly detected
alert.model_change — service switched models
alert.pricing_change — provider changed a model price
alert.all — receive all of the above

HMAC-SHA256 signature verification

Every webhook payload is signed with HMAC-SHA256 using your endpoint secret. Verify the X-Tolvyn-Signature header before processing. Replay protection via X-Tolvyn-Timestamp with a 5-minute tolerance.
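A receiver-side check might look like the sketch below. The signed string (timestamp, a dot, then the raw body) and the hex encoding of the signature are assumptions for illustration. Confirm the exact scheme in the API reference before relying on it.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # the 5-minute replay window


def verify_webhook(secret: bytes, body: bytes, signature: str,
                   timestamp: str, now=None) -> bool:
    """Check X-Tolvyn-Timestamp freshness, then X-Tolvyn-Signature."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if abs(now - sent_at) > TOLERANCE_SECONDS:
        return False  # too old or too far in the future: possible replay
    signed = timestamp.encode() + b"." + body  # assumed signing format
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Verify against the raw request bytes, not a re-serialized JSON object, since any change in whitespace or key order would change the digest.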

Retry policy with jitter

Failed deliveries (non-2xx response or timeout) are retried with exponential backoff (1s, 4s, 16s, 1m, 5m, 30m) plus uniform jitter to prevent a thundering herd. After six retries over ~36 minutes, the event is moved to a dead-letter queue visible in the dashboard.
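The schedule can be written down as a short sketch. The base delays come from the list above. The jitter range (up to 10% of each delay) is an assumption, since the page only says the jitter is uniform.

```python
import random

BASE_DELAYS_SECONDS = (1, 4, 16, 60, 300, 1800)


def retry_delays(jitter_fraction=0.1, rng=None):
    """Return one jittered delay per scheduled retry."""
    rng = rng or random.Random()
    return [d + rng.uniform(0, d * jitter_fraction) for d in BASE_DELAYS_SECONDS]
```

The base delays sum to 2,181 seconds, which is where the figure of roughly 36 minutes comes from. Jitter spreads receivers' retries so that many failed deliveries do not all return at the same instant.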

Reconciliation

Invoice Reconciliation

Every month, you receive invoices from OpenAI, Anthropic, and Google. TOLVYN tells you whether what they billed matches what your applications actually requested.

Upload your invoice

Upload the CSV export from your provider dashboard. TOLVYN parses it, matches each line item against the ledger by model and date, and computes the gap.

Gap = Invoice − TOLVYN

A positive gap means you were billed for usage that didn't pass through the TOLVYN proxy. That's shadow AI: applications using provider keys without your observability. The reconciliation report names the model, the gap amount, and the date range — so you can hunt down the source.
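The gap calculation itself is simple to sketch. Both inputs are assumed to be already parsed into (model, date, cost) rows. The row shape, the rounding to cents, and the function name are illustrative choices, not the product's format.

```python
from collections import defaultdict


def reconcile(invoice_lines, ledger_lines):
    """Return {(model, date): invoice minus ledger} for every nonzero gap."""
    invoice = defaultdict(float)
    ledger = defaultdict(float)
    for model, day, cost in invoice_lines:
        invoice[(model, day)] += cost
    for model, day, cost in ledger_lines:
        ledger[(model, day)] += cost
    gaps = {}
    for key in set(invoice) | set(ledger):
        gap = round(invoice.get(key, 0.0) - ledger.get(key, 0.0), 2)
        if gap != 0:
            gaps[key] = gap
    return gaps
```

If the provider billed $120.00 for gpt-4o on a day when the ledger recorded $100.00, the result contains a $20.00 gap for that model and date, which is the amount of traffic that did not pass through the proxy.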

Three-way match

Available on Growth and above: the proxy ledger, the provider invoice, and your customer billing system can be reconciled simultaneously — so you can prove unit economics per customer line by line.

Savings

Savings Analyzer

Nightly at 02:00 UTC, TOLVYN runs a savings analysis over the previous 30 days of traffic and recommends concrete, dollar-quantified migrations.

Four rules

Small-token requests on premium models — if >80% of your gpt-4o requests use fewer than 500 tokens, gpt-4o-mini is likely the right model.
Duplicate prompts — identical prompt signatures should hit cache, not the provider.
Underutilized prompt cache — long system prompts that don't trigger Anthropic's cache reuse.
Idle models — models you provisioned but barely use; consolidation simplifies governance.
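The first rule above is easy to express as code. This sketch takes a list of per-request token counts for one model. The function names are hypothetical, and the 500-token and 80% thresholds are the ones quoted in the rule.

```python
def small_token_share(token_counts, threshold=500):
    """Fraction of requests that used fewer than `threshold` tokens."""
    if not token_counts:
        return 0.0
    small = sum(1 for tokens in token_counts if tokens < threshold)
    return small / len(token_counts)


def suggest_smaller_model(token_counts, share_cutoff=0.80):
    """True when more than 80% of requests are small-token requests."""
    return small_token_share(token_counts) > share_cutoff
```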

Output

Each recommendation includes estimated monthly savings, the affected service, suggested migration steps, and a confidence interval. You can dismiss, defer, or schedule the recommendation directly from the dashboard.

Kill switch

Kill Switch

A panic button for AI traffic. The proxy returns HTTP 451 ("Unavailable For Legal Reasons") for any request matching a kill-switch rule. Checked before budget enforcement, so killing a runaway agent always works even if all budgets are exhausted.

Five scope types

Org-wide — stop all AI traffic immediately
Provider — stop all traffic to a specific provider
Model — stop all traffic to a specific model
Service — stop all traffic from a specific service
Agent — stop a specific agent (most common)
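The ordering guarantee (kill switch first, budgets second) can be sketched as follows. The rule and request shapes are hypothetical, and the 429 branch stands in for hard-mode budget enforcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillRule:
    scope: str       # "org", "provider", "model", "service", or "agent"
    value: str = ""  # ignored for org-wide rules


def is_killed(request: dict, rules) -> bool:
    """True if any active rule matches the request."""
    for rule in rules:
        if rule.scope == "org":
            return True
        if request.get(rule.scope) == rule.value:
            return True
    return False


def handle(request: dict, rules, budget_exhausted: bool) -> int:
    """Return the HTTP status the proxy would answer with in this sketch."""
    if is_killed(request, rules):  # checked before any budget logic
        return 451
    if budget_exhausted:
        return 429
    return 200
```

Because the kill-switch check runs first, a matching rule yields 451 even when the budget is also exhausted.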

Activation

Dashboard, CLI (tolvyn kill agent sdr-agent), or API. Activation is logged to the ledger with operator identity. Deactivation requires the same role that performed the activation.

Pricing changes

Pricing Change Governance

Provider pricing changes silently. Sometimes a new model is cheaper. Sometimes a model gets repriced upward. Sometimes a model is deprecated and routed to a more expensive replacement. TOLVYN monitors all of this and tells you before you're surprised.

Daily scrape

Every day TOLVYN automatically checks the public pricing pages of every supported provider. Diffs are captured, persisted, and surfaced in the operator console.

Operator approval workflow

A detected change does not auto-update billing. A TOLVYN operator reviews the diff, confirms it's a real change (not a transient page error), and approves. Only then is it propagated to customer accounts.

Customer impact notifications

Any customer whose recent usage included the affected model receives an alert with an estimated dollar impact: "Your projected monthly cost on gpt-4o-mini goes from $1,240 to $1,615 (+30%) at the new $0.30/M output rate."

Cost index

AI Cost Index

Benchmarks from real production AI traffic, aggregated across participating TOLVYN customers. Published monthly, free, no signup required. See cost-index.html for the live data.

k-anonymity ≥ 3

A data point is published only if at least three independent tenants contribute to it. Below that threshold, the cell is suppressed.
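The suppression rule can be sketched as a filter over contributions. The row shape and the use of a mean as the published aggregate are assumptions. The threshold of three distinct tenants is the one stated above.

```python
from collections import defaultdict

K_MIN = 3  # minimum number of distinct tenants per published cell


def publishable_cells(contributions):
    """contributions: iterable of (cell, tenant_id, value) rows.

    Returns {cell: mean value} only for cells with at least K_MIN tenants.
    """
    tenants = defaultdict(set)
    values = defaultdict(list)
    for cell, tenant_id, value in contributions:
        tenants[cell].add(tenant_id)
        values[cell].append(value)
    return {
        cell: sum(values[cell]) / len(values[cell])
        for cell in tenants
        if len(tenants[cell]) >= K_MIN
    }
```

Distinct tenants are counted, not rows, so one tenant contributing many data points cannot push a cell over the threshold on its own.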

Opt-out

Opted in by default on signup. Toggle off anytime in account settings. Opted-out data is excluded from the next nightly collection and is never persisted.

Apache 2.0 data

The published aggregates are released under Apache 2.0. Use them in your own analyses, research, blog posts, or vendor evaluations. Attribution appreciated but not required.

Integration

SDKs & Integration

Python — tolvyn 0.1.8 — drop-in replacement for openai, anthropic, and Google's GenAI client.
Node.js — tolvyn 1.0.10 — drop-in replacement for openai and @anthropic-ai/sdk.
Go — tolvyn-go v0.1.4 — idiomatic Go bindings.
CLI — 56 commands and subcommands across ledger, budgets, alerts, agents, kill-switch, reconciliation, and savings.
Direct HTTP — change the base URL, keep your existing request code.

Documentation

Full reference at docs.tolvyn.io including the API reference and the CLI reference.

Evaluating TOLVYN against an observability-first gateway? See TOLVYN vs Helicone.

Start free — 10,000 requests
