The moat
The Immutable Ledger
Every AI request that passes through TOLVYN is recorded in an append-only ledger. Each entry is hashed with SHA-256, chained to the previous entry, and signed with HMAC-SHA256. The result: a cryptographically verifiable record of every API call your organization made to every AI provider, ever.
How the chain works
Each ledger record holds a prev_hash field containing the
SHA-256 hash of the previous record's serialized payload. Any retroactive modification
of a record breaks the chain at the record that follows it. A single link can be checked in O(1) at any sequence number by recomputing one hash; verifying the whole chain is a linear pass over the records.
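A minimal sketch of the chaining idea described above. The record layout, the all-zero genesis value, and the canonical-JSON serialization are assumptions made for illustration; they are not TOLVYN's actual storage format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash value for the very first record


def payload_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON serialization of the payload (assumed format)."""
    data = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a record whose prev_hash is the hash of the previous payload."""
    prev = payload_hash(chain[-1]["payload"]) if chain else GENESIS
    chain.append({"seq": len(chain) + 1, "prev_hash": prev, "payload": payload})


def first_broken_link(chain: list):
    """Return the seq of the first record whose prev_hash mismatches, else None."""
    expected = GENESIS
    for record in chain:
        if record["prev_hash"] != expected:
            return record["seq"]
        expected = payload_hash(record["payload"])
    return None


chain = []
for cost in ("0.010", "0.020", "0.030"):
    append(chain, {"model": "gpt-4o-mini", "cost_usd": cost})

print(first_broken_link(chain))        # chain is intact
chain[0]["payload"]["cost_usd"] = "0"  # retroactive edit of record 1
print(first_broken_link(chain))        # mismatch surfaces at record 2
```

Checking one link costs a single hash; the loop above is the full linear verification.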
HMAC-SHA256 signature
On top of the hash chain, each record is signed with HMAC-SHA256 using a per-tenant key. Signature verification proves both integrity (the record wasn't modified) and authenticity (the record was written by TOLVYN, not forged by someone with read access to the database).
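As a rough illustration of per-record signing, the sketch below signs a canonical JSON form of a record with a tenant key and verifies it in constant time. The function names, the serialization, and the hex encoding are assumptions, not TOLVYN's wire format.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, tenant_key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON form of the record (assumed serialization)."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(tenant_key, body, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, tenant_key: bytes) -> bool:
    """Recompute the signature and compare without leaking timing information."""
    expected = sign_record(record, tenant_key)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Someone with read access to the database but without the tenant key can copy records, but cannot produce a valid signature for an altered one.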
Advisory lock for sequence integrity
Sequence numbers are allocated under a Postgres advisory lock per tenant. There are no gaps and no duplicates — even under concurrent writes from multiple proxy workers.
Verification endpoint
# Verify the entire ledger for your tenant
$ tolvyn ledger verify --from 1 --to latest
verifying 4,821 records...
✓ hash chain intact
✓ all HMAC signatures valid
✓ no sequence gaps
ledger integrity: PASS
What it proves — and what it doesn't
The ledger proves what was billable: which request, at which timestamp, to which model, with which token counts, at which cost. It does not store prompt or response content. There is nothing in the ledger your finance team can't show an auditor.
Attribution
Six dimensions of cost attribution
Every request is tagged across six dimensions. Slice cost reports by any combination.
- Team (X-Tolvyn-Team): engineering, marketing, support
- Service (X-Tolvyn-Service): chatbot-api, search-svc, content-gen
- Feature (X-Tolvyn-Feature): autocomplete, summarize, classify
- Agent (X-Tolvyn-Agent): sdr-agent, support-bot, code-reviewer
- User (X-Tolvyn-User): your end user's ID, hashed before storage
- End-customer (X-Tolvyn-Customer): your customer ID for COGS-per-customer
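One way to attach these tags is a small helper that maps dimension names to the header names listed above. The helper itself and its keyword names are hypothetical; only the header names come from this page.

```python
# Header names as listed above; the helper and its keyword names are illustrative.
HEADER_BY_DIMENSION = {
    "team": "X-Tolvyn-Team",
    "service": "X-Tolvyn-Service",
    "feature": "X-Tolvyn-Feature",
    "agent": "X-Tolvyn-Agent",
    "user": "X-Tolvyn-User",
    "customer": "X-Tolvyn-Customer",
}


def attribution_headers(**dimensions: str) -> dict:
    """Build attribution headers for whichever dimensions the caller supplies."""
    unknown = set(dimensions) - set(HEADER_BY_DIMENSION)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {HEADER_BY_DIMENSION[name]: value for name, value in dimensions.items()}


headers = attribution_headers(team="engineering", service="chatbot-api", agent="support-bot")
```

The resulting dict can be merged into the headers of whatever HTTP client or SDK call your application already makes.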
Hierarchy
Dimensions are independent — but in the dashboard they nest naturally: end-customer → team → service → feature → agent → user. Roll up to any level for board reports; drill down to any level for incident response.
Budgets
Budgets & Enforcement
Set spending limits at any granularity. Choose how strictly they're enforced. Get pre-request cost estimation so the proxy can refuse a request before it hits the provider.
Hard vs soft mode
Hard mode: the proxy returns HTTP 429 with an x-tolvyn-budget
header explaining which budget was hit. Your application can fall back, queue, or
surface an error to the user.
Soft mode: the request goes through and an alert fires.
Scope hierarchy
Budgets evaluate in order: agent → service → team → org. The first exhausted budget wins. So a $500/mo agent budget will hard-block even when the parent team budget has plenty of headroom — useful for runaway agent protection.
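The evaluation order can be sketched as a walk from the narrowest scope to the widest. The Budget type and its field names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

SCOPE_ORDER = ("agent", "service", "team", "org")  # narrowest scope first


@dataclass
class Budget:
    scope: str
    limit_usd: float
    spent_usd: float


def first_exhausted(budgets: list) -> Optional[Budget]:
    """Return the first exhausted budget in scope order, or None if all have headroom."""
    by_scope = {b.scope: b for b in budgets}
    for scope in SCOPE_ORDER:
        budget = by_scope.get(scope)
        if budget is not None and budget.spent_usd >= budget.limit_usd:
            return budget
    return None


budgets = [
    Budget("team", limit_usd=10_000, spent_usd=2_000),  # plenty of headroom
    Budget("agent", limit_usd=500, spent_usd=500),      # exhausted
]
hit = first_exhausted(budgets)
print(hit.scope if hit else "ok")  # the agent budget blocks first
```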
Pre-request cost estimation
Token counts and per-model pricing are known before the request goes out. The proxy rejects a request that would push you over budget rather than incurring the charge and refusing the response.
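A sketch of what a pre-request estimate might look like. The price table is made up, and treating the caller's maximum output tokens as the worst case is an assumption; the page does not say how output size is estimated.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens; not real provider rates.
PRICE_PER_MILLION = {
    "example-model": {"input": Decimal("1.00"), "output": Decimal("4.00")},
}


def estimate_cost(model: str, input_tokens: int, max_output_tokens: int) -> Decimal:
    """Worst-case cost: input tokens plus the maximum output the caller allows."""
    price = PRICE_PER_MILLION[model]
    total = price["input"] * input_tokens + price["output"] * max_output_tokens
    return total / Decimal(1_000_000)


def would_exceed(spent: Decimal, limit: Decimal, estimate: Decimal) -> bool:
    """True if sending this request could push spend past the limit."""
    return spent + estimate > limit


estimate = estimate_cost("example-model", input_tokens=2_000, max_output_tokens=500)
print(estimate, would_exceed(Decimal("499.999"), Decimal("500"), estimate))
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift.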
Alerts
Four alert types
- Budget threshold — fires at 75% (warning) and 90%/100% (critical) of any budget scope.
- Cost anomaly — fires when cost-per-request deviates >3σ from rolling baseline (e.g. someone shipped a code change that switched gpt-4o-mini → gpt-4o).
- Model change — fires when a service starts using a model it has never used before.
- Pricing change — fires when a provider you depend on changes a model's price (see Pricing Change Governance).
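The cost anomaly rule can be approximated with a plain z-score against a window of recent per-request costs. The window contents and the handling of a zero-variance baseline are assumptions; the page only specifies the 3σ threshold.

```python
import statistics


def is_cost_anomaly(baseline: list, cost: float, threshold_sigma: float = 3.0) -> bool:
    """Flag a request whose cost is more than threshold_sigma deviations from the baseline mean."""
    if len(baseline) < 2:
        return False  # not enough history to estimate a deviation
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return cost != mean  # assumed behavior when the baseline has no variance
    return abs(cost - mean) / stdev > threshold_sigma


recent = [0.010, 0.011, 0.009, 0.010, 0.012, 0.010]  # illustrative USD per request
print(is_cost_anomaly(recent, 0.011))  # within normal range
print(is_cost_anomaly(recent, 0.150))  # e.g. after a switch to a pricier model
```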
Severity
Two levels: warning (informational) and critical (action required). Critical alerts always page; warnings are configurable.
Delivery
In-app · email · webhook. Webhooks are HMAC-signed (see below).
Webhooks
Webhooks
Subscribe to four event types, or use the alert.all wildcard.
- alert.budget_threshold: any budget threshold crossed
- alert.cost_anomaly: cost anomaly detected
- alert.model_change: service switched models
- alert.pricing_change: provider changed a model price
- alert.all: receive all of the above
HMAC-SHA256 signature verification
Every webhook payload is signed with HMAC-SHA256 using your endpoint secret.
Verify the X-Tolvyn-Signature header before processing.
Replay protection via X-Tolvyn-Timestamp with a 5-minute tolerance.
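A receiver-side check might look like the sketch below. The header names come from this page, but the signed string (timestamp, a dot, then the raw body), the hex encoding, and the Unix-seconds timestamp are assumptions; confirm the exact scheme in the API reference before relying on it.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 5 * 60  # 5-minute replay window


def expected_signature(secret: bytes, timestamp: str, body: bytes) -> str:
    """Assumed scheme: hex HMAC-SHA256 over '<timestamp>.<raw body>'."""
    signed = timestamp.encode() + b"." + body
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, headers: dict, body: bytes, now=None) -> bool:
    """Reject stale or tampered deliveries before any processing happens."""
    signature = headers.get("X-Tolvyn-Signature", "")
    timestamp = headers.get("X-Tolvyn-Timestamp", "")
    try:
        sent_at = int(timestamp)  # assumed to be Unix seconds
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > TOLERANCE_SECONDS:
        return False
    expected = expected_signature(secret, timestamp, body)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Verify against the raw request bytes, not a re-serialized JSON object, since re-serialization can change the byte sequence that was signed.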
Retry policy with jitter
Failed deliveries (non-2xx response or timeout) are retried with exponential backoff (1s, 4s, 16s, 1m, 5m, 30m) plus uniform jitter to prevent a thundering herd. That is six attempts over ~36 minutes, after which the event is moved to a dead-letter queue visible in the dashboard.
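The schedule above adds up as follows. The base delays are the ones listed; the jitter range (up to 10% of each delay) is an assumption, since the page only says the jitter is uniform.

```python
import random

BASE_DELAYS_SECONDS = [1, 4, 16, 60, 300, 1800]  # 1s, 4s, 16s, 1m, 5m, 30m


def retry_delays(jitter_fraction: float = 0.1, rng=None) -> list:
    """Each base delay plus uniform jitter in [0, jitter_fraction * delay]."""
    rng = rng or random.Random()
    return [d + rng.uniform(0, d * jitter_fraction) for d in BASE_DELAYS_SECONDS]


total = sum(BASE_DELAYS_SECONDS)
print(total, round(total / 60, 1))  # 2181 seconds, roughly 36 minutes before jitter
```

Jitter spreads retries from many failed deliveries across time so they do not all hit a recovering endpoint at the same instant.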
Reconciliation
Invoice Reconciliation
Every month, you receive an invoice from OpenAI, Anthropic, and Google. TOLVYN tells you whether what they billed matches what your applications actually requested.
Upload your invoice
Upload the CSV export from your provider dashboard. TOLVYN parses it, matches each line item against the ledger by model and date, and computes the gap.
Gap = Invoice − TOLVYN
A positive gap means you were billed for usage that didn't pass through the TOLVYN proxy. That's shadow AI: applications using provider keys without your observability. The reconciliation report names the model, the gap amount, and the date range — so you can hunt down the source.
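The gap calculation can be sketched as a grouped subtraction keyed by model and date. The tuple layout and the sample figures are invented for illustration; real invoice exports differ by provider.

```python
from collections import defaultdict
from decimal import Decimal


def reconcile(invoice_lines, ledger_entries) -> dict:
    """Gap = invoice - ledger per (model, date); items are (model, date, amount_usd)."""
    totals = defaultdict(Decimal)
    for model, day, amount in invoice_lines:
        totals[(model, day)] += amount
    for model, day, amount in ledger_entries:
        totals[(model, day)] -= amount
    # Keep only the cells where the two sources disagree.
    return {key: gap for key, gap in totals.items() if gap != 0}


invoice = [("gpt-4o", "2025-01-01", Decimal("120.00"))]  # made-up figures
ledger = [("gpt-4o", "2025-01-01", Decimal("100.00"))]
print(reconcile(invoice, ledger))  # positive gap: billed usage that bypassed the proxy
```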
Three-way match
Available on Growth and above: the proxy ledger, the provider invoice, and your customer billing system can be reconciled simultaneously — so you can prove unit economics per customer line by line.
Savings
Savings Analyzer
Nightly at 02:00 UTC, TOLVYN runs a savings analysis over the
previous 30 days of traffic and recommends concrete, dollar-quantified migrations.
Four rules
- Small-token requests on premium models — if >80% of your gpt-4o requests use fewer than 500 tokens, gpt-4o-mini is likely the right model.
- Duplicate prompts — identical prompt signatures should hit cache, not the provider.
- Underutilized prompt cache — long system prompts that don't trigger Anthropic's cache reuse.
- Idle models — models you provisioned but barely use; consolidation simplifies governance.
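As an example of how one of these rules could be evaluated, the sketch below checks the small-token rule using the 80% and 500-token thresholds quoted above. The function names and the input shape (a list of per-request token counts) are assumptions.

```python
def small_token_share(token_counts: list, cutoff: int = 500) -> float:
    """Fraction of requests that used fewer than `cutoff` tokens."""
    if not token_counts:
        return 0.0
    return sum(1 for t in token_counts if t < cutoff) / len(token_counts)


def recommend_smaller_model(token_counts: list, share_threshold: float = 0.80,
                            cutoff: int = 500) -> bool:
    """True when more than `share_threshold` of requests are small."""
    return small_token_share(token_counts, cutoff) > share_threshold


counts = [100] * 9 + [2_000]  # 90% of requests under 500 tokens
print(recommend_smaller_model(counts))
```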
Output
Each recommendation includes: estimated monthly savings, affected service, suggested migration steps, and confidence interval. You can dismiss, defer, or schedule the recommendation directly from the dashboard.
Kill switch
Kill Switch
A panic button for AI traffic. The proxy returns HTTP 451 ("Unavailable For Legal Reasons") for any request matching a kill-switch rule. Checked before budget enforcement, so killing a runaway agent always works even if all budgets are exhausted.
Five scope types
- Org-wide — stop all AI traffic immediately
- Provider — stop all traffic to a specific provider
- Model — stop all traffic to a specific model
- Service — stop all traffic from a specific service
- Agent — stop a specific agent (most common)
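The ordering described above (kill switch first, then budgets) can be sketched like this. The rule and request shapes are hypothetical, and 200 simply stands in for "forward to the provider".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KillRule:
    scope: str                   # "org", "provider", "model", "service", or "agent"
    value: Optional[str] = None  # unused for the org-wide scope


def is_killed(request: dict, rules: list) -> bool:
    """True if any kill-switch rule matches the request's attributes."""
    for rule in rules:
        if rule.scope == "org":
            return True
        if rule.value is not None and request.get(rule.scope) == rule.value:
            return True
    return False


def handle(request: dict, rules: list, budget_exhausted: bool) -> int:
    """Kill-switch check runs before budget enforcement."""
    if is_killed(request, rules):
        return 451
    if budget_exhausted:
        return 429
    return 200  # placeholder for forwarding the request


request = {"provider": "openai", "model": "gpt-4o", "service": "chatbot-api", "agent": "sdr-agent"}
print(handle(request, [KillRule("agent", "sdr-agent")], budget_exhausted=True))  # 451, not 429
```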
Activation
Dashboard, CLI (tolvyn kill agent sdr-agent), or API.
Activation is logged to the ledger with operator identity. Deactivation requires the
same role that activated the rule.
Pricing changes
Pricing Change Governance
Provider pricing changes silently. Sometimes a new model is cheaper. Sometimes a model gets repriced upward. Sometimes a model is deprecated and routed to a more expensive replacement. TOLVYN monitors all of this and tells you before you're surprised.
Daily scrape
Every day TOLVYN automatically checks the public pricing pages of every supported provider. Diffs are captured, persisted, and surfaced in the operator console.
Operator approval workflow
A detected change does not auto-update billing. A TOLVYN operator reviews the diff, confirms it's a real change (not a transient page error), and approves. Only then is it propagated to customer accounts.
Customer impact notifications
Any customer whose recent usage included the affected model receives an alert with a dollar-quantified impact estimate: "Your projected monthly cost on gpt-4o-mini goes from $1,240 to $1,615 (+30%) at the new $0.30/M output rate."
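The percentage in the sample alert follows directly from the two projected totals. A minimal check, using only the figures quoted in the example message:

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


change = percent_change(1240, 1615)
print(f"{change:+.0f}%")  # the sample alert rounds this to +30%
```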
Cost index
AI Cost Index
Opt-in benchmarks from real production AI traffic across TOLVYN customers. Published monthly, free, no signup required. See cost-index.html for the live data.
k-anonymity ≥ 3
A data point is published only if at least three independent tenants contribute to it. Below that threshold, the cell is suppressed.
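The suppression rule can be sketched as a filter on the number of distinct contributing tenants. Publishing the mean as the aggregate, and the input shape, are assumptions for illustration.

```python
from statistics import fmean


def publishable_cells(contributions: dict, k: int = 3) -> dict:
    """contributions maps cell -> {tenant_id: value}; cells with fewer than k tenants are dropped."""
    return {
        cell: fmean(by_tenant.values())
        for cell, by_tenant in contributions.items()
        if len(by_tenant) >= k
    }


data = {
    "model-a": {"t1": 1.0, "t2": 2.0, "t3": 3.0},  # three tenants: published
    "model-b": {"t1": 1.0, "t2": 2.0},             # two tenants: suppressed
}
print(publishable_cells(data))
```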
Opt-out
Opted in by default on signup. Toggle off anytime in account settings. Once you opt out, your data is excluded starting with the next nightly collection and is never persisted.
Apache 2.0 data
The published aggregates are released under Apache 2.0. Use them in your own analyses, research, blog posts, or vendor evaluations. Attribution appreciated but not required.
Integration
SDKs & Integration
- Python (tolvyn 0.1.5): drop-in replacement for openai, anthropic, and Google's GenAI client.
- Node.js (tolvyn 1.0.6): drop-in replacement for openai and @anthropic-ai/sdk.
- Go (tolvyn-go v0.1.3): idiomatic Go bindings.
- CLI: 58 commands across ledger, budgets, alerts, agents, kill-switch, reconciliation, and savings.
- Direct HTTP: change the base URL, keep your existing request code.
Documentation
Full reference at docs.tolvyn.io including the API reference and the CLI reference.