Architecture
A lightweight, transparent proxy sits between your application and every AI provider. Sub-millisecond overhead. Total financial visibility.
Architecture overview
TOLVYN intercepts every AI request before it reaches the provider — recording, attributing, and enforcing limits in real time.
Step by step
Recommended for production: install the tolvyn SDK and change one import — a drop-in replacement for your OpenAI, Anthropic, or Google client. SDK mode adds fail-open: if TOLVYN is ever unreachable, calls retry against the provider directly, so your app keeps running.
Simpler alternative: point your existing client at TOLVYN's proxy by changing one base-URL setting, such as OPENAI_BASE_URL, with no code changes and no refactor. Proxy mode keeps TOLVYN in the request path, so if you need automatic fallback when TOLVYN is unreachable, use SDK mode. Works with any HTTP-based model API.
Use your existing API keys. TOLVYN passes them through securely and never stores credentials in plaintext. Every hop, from your application to TOLVYN and from TOLVYN to the provider, is encrypted with TLS.
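The fail-open behavior described for SDK mode can be pictured roughly as follows. This is a minimal sketch, not the TOLVYN SDK: both URLs, the Transport type, and the send_fail_open function are hypothetical names invented for illustration, and the transport is passed in so the logic can be read and tested without a network.

```python
from typing import Callable

# Hypothetical endpoints for illustration only; not taken from TOLVYN's docs.
PROXY_URL = "https://tolvyn-proxy.example/v1/chat/completions"
PROVIDER_URL = "https://provider.example/v1/chat/completions"

# A transport takes (url, payload) and returns the decoded response.
Transport = Callable[[str, dict], dict]


def send_fail_open(payload: dict, transport: Transport) -> dict:
    """Try the proxy first; if it is unreachable, retry against the provider."""
    try:
        return transport(PROXY_URL, payload)
    except (ConnectionError, TimeoutError):
        # Proxy unreachable: fail open so the application keeps running.
        # This request bypasses metering, so a real client would likely log it.
        return transport(PROVIDER_URL, payload)
```

Only connection-level failures trigger the fallback in this sketch. An error response returned by the proxy itself (for example, a request blocked by a budget) is not retried, since falling back in that case would defeat enforcement.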
Every request is received by TOLVYN's proxy layer, which operates with sub-millisecond overhead. The request metadata — model, token counts, latency, status — is captured and appended to the immutable ledger.
Each ledger entry is hash-chained to the previous one, which makes the ledger tamper-evident: any retroactive modification of a record breaks the chain and is detected the next time the chain is verified.
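To make the hash-chaining idea concrete, here is a minimal sketch of an append-only ledger. It assumes SHA-256 and a canonical JSON encoding of each record; TOLVYN's actual hash function, record schema, and storage format are not stated here, so every name below is illustrative.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical record encoding."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, record: dict) -> None:
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def verify(ledger: list) -> bool:
    """Recompute every hash in order; any edited record breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous hash, changing one record invalidates that entry and every entry after it. Detection still depends on someone running the verification, and on the latest hash being kept somewhere an attacker cannot rewrite.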
Tag requests with arbitrary metadata — team, service, environment, feature — via HTTP headers or the TOLVYN SDK helper. TOLVYN uses these tags to break down costs at any granularity you care about.
See which team spent $2,400 on GPT-4o last week, which microservice is driving token growth, and which model is delivering the best cost-per-output ratio — all in one dashboard.
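As a rough illustration of tag-based attribution, the sketch below extracts tags from request headers and totals cost along one dimension. The header prefix, the record fields (tags, cost_usd), and both function names are assumptions made for this example; they are not TOLVYN's actual header names or schema.

```python
from collections import defaultdict

# Hypothetical header prefix for illustration; not TOLVYN's real header name.
TAG_PREFIX = "x-example-tag-"


def tags_from_headers(headers: dict) -> dict:
    """Collect tags such as team or service from prefixed request headers."""
    tags = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(TAG_PREFIX):
            tags[lowered[len(TAG_PREFIX):]] = value
    return tags


def cost_by(records: list, dimension: str) -> dict:
    """Total cost_usd per value of one tag dimension (e.g. "team")."""
    totals = defaultdict(float)
    for record in records:
        key = record["tags"].get(dimension, "untagged")
        totals[key] += record["cost_usd"]
    return dict(totals)
```

Calling cost_by(records, "team") and cost_by(records, "service") on the same records gives two different breakdowns of the same spend, which is the sense in which tags allow costs to be sliced at any granularity.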
Set hard spending limits per team, per service, or per model. Define alert thresholds at 50%, 80%, and 95% of budget. TOLVYN can automatically block requests once a budget is exhausted — no surprise invoices at month end.
Finance teams get a complete audit trail. Engineering teams get guardrails. Everyone stays aligned on AI spend without slowing down development velocity.
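The budget behavior described above (alerts at 50%, 80%, and 95%, then blocking once the budget is exhausted) can be sketched as a small state machine. The Budget class and its method names are hypothetical, and the sketch assumes a positive limit and that spend is recorded after each request completes.

```python
from dataclasses import dataclass, field

ALERT_THRESHOLDS = (0.50, 0.80, 0.95)  # fractions of the budget, from the text


@dataclass
class Budget:
    limit_usd: float  # assumed to be greater than zero
    spent_usd: float = 0.0
    alerted: set = field(default_factory=set)

    def record(self, cost_usd: float) -> list:
        """Add spend and return any thresholds crossed for the first time."""
        self.spent_usd += cost_usd
        used = self.spent_usd / self.limit_usd
        crossed = [t for t in ALERT_THRESHOLDS
                   if used >= t and t not in self.alerted]
        self.alerted.update(crossed)
        return crossed

    def exhausted(self) -> bool:
        """True once spend has reached the limit; new requests would be blocked."""
        return self.spent_usd >= self.limit_usd
```

A proxy using this sketch would check exhausted() before forwarding a request and call record() once the cost of the response is known. Each threshold fires at most once, so a team does not receive repeated alerts for the same level.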