Architecture

How TOLVYN works

A lightweight, transparent proxy sits between your application and every AI provider. Sub-millisecond overhead. Total financial visibility.

The proxy model

TOLVYN intercepts every AI request before it reaches the provider — recording, attributing, and enforcing limits in real time.

Your Application SDK / HTTP request response TOLVYN Proxy meter · hash · route attribute · govern forward response AI Provider any model API write immutable ledger Dashboard costs · alerts budgets · audit realtime analytics

Four steps to full visibility

Connect

Recommended for production: install the tolvyn SDK and change one import — a drop-in replacement for your OpenAI, Anthropic, or Google client. SDK mode adds fail-open: if TOLVYN is ever unreachable, calls retry against the provider directly, so your app keeps running.

Simpler alternative: point your existing client at TOLVYN's proxy by changing one base-URL setting — e.g. OPENAI_BASE_URL — with no code changes. Here "no downtime" means no deploy or refactor to integrate; note that proxy mode keeps TOLVYN in the request path, so for automatic fallback if TOLVYN is unreachable, use SDK mode. Works with any HTTP-based model API.

Use your existing API keys. TOLVYN passes them through securely and never stores credentials in plaintext. TLS everywhere, end to end.

Proxy

Every request is received by TOLVYN's proxy layer, which operates with sub-millisecond overhead. The request metadata — model, token counts, latency, status — is captured and appended to the immutable ledger.

Each ledger entry is hash-chained to the previous, making it cryptographically tamper-proof. Any retroactive modification of a record breaks the chain — immediately detectable.

Attribute

Tag requests with arbitrary metadata — team, service, environment, feature — via HTTP headers or the TOLVYN SDK helper. TOLVYN uses these tags to break down costs at any granularity you care about.

See which team spent $2,400 on GPT-4o last week, which microservice is driving token growth, and which model is delivering the best cost-per-output ratio — all in one dashboard.

Govern

Set hard spending limits per team, per service, or per model. Define alert thresholds at 50%, 80%, and 95% of budget. TOLVYN can automatically block requests once a budget is exhausted — no surprise invoices at month end.

Finance teams get a complete audit trail. Engineering teams get guardrails. Everyone stays aligned on AI spend without slowing down development velocity.

Get started

Ready to take control of your AI costs?

Free forever. 10,000 requests/month. No credit card. Up and running in minutes.

Start free — 10,000 requests View pricing