The AI Spend Platform — How It Works

Product

Govern every dollar of AI spend. Prove every saved dollar, net of quality.

The whole product is one idea. Control and visibility are live today: one base_url, caps, a kill-switch, attribution, and an auditable cost ledger. Proving a saving without breaking the product is the hard part, and that proof gate is what would let us turn on a savings share. We run request-side levers at the gateway (a light source-side slim is planned, and an optional Intake SDK is coming), all through one eval gate and one ledger. Status: verified savings off · proof pending.

Integration

Change one line. Recovea does the rest.

Swap your base URL and key, and your existing SDK works unchanged (Python, JS, cURL, Anthropic-shaped). We wrote the gateway ourselves; no third-party proxy sits in your request path. Your provider keys pass through. We’re fail-open to baseline: before the first token we fall back cleanly to your provider, and mid-stream we surface a clean error so you can retry. Rollback is the same line you changed.

Recovea Intake (Tier 2) is an optional thin SDK: install it, then wrap your retrieval and tool-output steps with a connector for your framework (LangChain / LlamaIndex / raw / MCP). Tier 1 is one line with zero app changes and is available today; Tier 2 is coming in v1.1.

Read the docs

one line in, one line out

from openai import OpenAI
client = OpenAI(
-     base_url="https://api.openai.com/v1",
+     base_url="https://api.recovea.ai/v1",  # your keys pass through · fail-open
    api_key=OPENAI_API_KEY,
)
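
The fail-open behavior described above can be pictured with a small sketch. This is not Recovea's gateway code; it is a minimal illustration, assuming the gateway path and the direct provider path are each exposed as a callable that returns an iterator of text chunks. The names fail_open_stream, via_gateway, direct_to_provider, and MidStreamError are hypothetical.

```python
from typing import Callable, Iterator


class MidStreamError(RuntimeError):
    """The stream failed after output had already reached the caller."""


def fail_open_stream(
    via_gateway: Callable[[], Iterator[str]],
    direct_to_provider: Callable[[], Iterator[str]],
) -> Iterator[str]:
    """Yield chunks from the gateway path, failing open to the provider."""
    delivered_any = False
    try:
        for chunk in via_gateway():
            delivered_any = True
            yield chunk
        return  # the gateway path completed normally
    except Exception as exc:
        if delivered_any:
            # Mid-stream: text is already with the caller, so a silent switch
            # could duplicate or splice output. Surface a clean error instead.
            raise MidStreamError("stream interrupted; retry the request") from exc
    # Failure before the first token: replay the request on the baseline path.
    yield from direct_to_provider()
```

The split on delivered_any is the design point: a failure before any output can be hidden from the caller, while a failure after partial output cannot be hidden safely, so it is reported and the caller retries.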

Instrumentation

We record what every call costs, and what it would have cost untouched.

Per request we log input/output/cache/reasoning tokens, model, latency, and your original baseline model, so the counterfactual is computable from day one. The locked baseline period and normalization rules go into the contract. No prompt bodies logged by default.
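
To make the counterfactual concrete, here is a minimal sketch of a per-request record and the two costs derived from it. The field names, the PRICES table, and its dollar figures are placeholders for illustration, not Recovea's schema or real provider rates. It also simplifies in two ways: cache and reasoning tokens are left out, and the counterfactual assumes the baseline model would have consumed the same token counts.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; not real provider rates.
PRICES = {
    "baseline-large": {"input": 2.50, "output": 10.00},
    "candidate-small": {"input": 0.15, "output": 0.60},
}


@dataclass(frozen=True)
class CallRecord:
    route: str
    model: str           # model that actually served the call
    baseline_model: str  # model the caller originally asked for
    input_tokens: int
    output_tokens: int
    latency_ms: float    # note: no prompt body is stored


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price a call from token counts and the per-million-token price table."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def actual_and_counterfactual(rec: CallRecord) -> tuple[float, float]:
    """Return (what the call cost, what it would have cost on the baseline)."""
    actual = cost_usd(rec.model, rec.input_tokens, rec.output_tokens)
    untouched = cost_usd(rec.baseline_model, rec.input_tokens, rec.output_tokens)
    return actual, untouched
```

Because the baseline model is stored on every record, the counterfactual can be recomputed later from the log alone, without replaying any traffic.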

The lever stack

Zero-risk first. Eval-gated next. Nothing risky without a passing test.

Lever	Typical net band	Risk tier	Status
Prompt-cache prefix hygiene	do-first · eroding without upkeep	do-first	Planned
Batch API migration	do-first · flat provider Batch discount on the slice	do-first	Planned
Reasoning / effort trimming	do-first on simple routes	do-first	Planned
Dedup / single-flight	prevents stampede spikes · concierge-verified	do-first	Concierge · not self-serve billed
Exact cache	varies by repeat rate · concierge-verified	do-first	Concierge · not self-serve billed
Model routing / cascading	on the routable fraction, eval-gated	eval-gated	Per route, after it passes (shadow today)
Light source-side slim (gateway)	obvious RAG / context trim	eval-gated	Planned

Cache/dedup savings (byte-identical, zero quality risk) are concierge-verified hands-on, never billed as a live self-serve dollar; model routing runs in shadow until its gate passes. Levers marked Planned are not wired yet and bill nothing. Verified savings and the 25% share are off · proof pending until the eval gate ships at volume. Other tools’ routing headlines are conversational benchmarks on the routable fraction, not a blended number. Semantic cache, heavy LLMLingua-2 compression, distillation, and the Tier-2 Intake SDK are designed but not in v1; they arrive in v1.1+.
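
As an illustration of the two byte-identical levers, the sketch below combines an exact cache with single-flight dedup: identical requests that arrive together share one upstream call, and later repeats are served from memory. It is a simplified in-process example, not the gateway's implementation. The class name SingleFlightCache and the choice to key on a canonical JSON serialization are assumptions, and there is no expiry or size limit.

```python
import hashlib
import json
import threading
from concurrent.futures import Future
from typing import Any, Callable


class SingleFlightCache:
    def __init__(self, fetch: Callable[[dict], Any]):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._inflight: dict[str, Future] = {}  # key -> call in progress
        self._cache: dict[str, Any] = {}        # key -> finished response

    @staticmethod
    def key_for(request: dict) -> str:
        """Hash a canonical serialization so equal requests share a key."""
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request: dict) -> Any:
        key = self.key_for(request)
        with self._lock:
            if key in self._cache:
                return self._cache[key]          # exact-cache hit
            future = self._inflight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._inflight[key] = future
        if not leader:
            return future.result()               # wait for the leader's call
        try:
            response = self._fetch(request)      # the only upstream call
        except Exception as exc:
            with self._lock:
                del self._inflight[key]
            future.set_exception(exc)            # followers see the same error
            raise
        with self._lock:
            self._cache[key] = response
            del self._inflight[key]
        future.set_result(response)
        return response
```

A late arrival finds either the in-flight call or the cached response, so a burst of identical requests reaches the provider once. That is the stampede protection the table refers to.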

The eval / quality gate

Shadow mode on your real traffic. No user ever sees a test.

Traffic mirroring shadows the cheaper candidate on a small sampled slice; a layered oracle scores it; a per-route non-inferiority test decides; a promotion ladder ramps it. Per-route gating is non-negotiable.

Deterministic checks: schema, required fields, tool-call well-formedness, refusal / truncation detection.
Golden-dataset gate: the route’s curated regression suite.
Cross-family calibrated judge: a judge from a different model family, calibrated against human labels where a labeled set exists for the route.
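
The first oracle layer is the cheapest to picture. Below is a minimal sketch of deterministic checks for a route that returns a JSON object. It covers truncation, well-formedness, and required fields only; refusal detection and tool-call validation are left out. The function name and failure labels are hypothetical. The truncation check assumes the OpenAI-style finish_reason value "length", which signals that output stopped at the token limit.

```python
import json


def deterministic_checks(
    raw_output: str,
    required_fields: list[str],
    finish_reason: str,
) -> list[str]:
    """Return a list of failure labels; an empty list means the output passed."""
    failures = []
    if finish_reason == "length":
        failures.append("truncated")
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return failures + ["invalid_json"]
    if not isinstance(data, dict):
        return failures + ["not_an_object"]
    for field in required_fields:
        if field not in data:
            failures.append(f"missing:{field}")
    return failures
```

Checks like these need no model call, so they can run on every shadowed response before the golden set and the judge are consulted.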

non-inferiority test (sample data): PASS

route: support-classify
candidate: gpt-4o-mini · baseline: gpt-4o
margin δ: 2.0 pts · observed: +0.4
oracle: golden set + x-family judge
→ promote to 5% canary · rollback armed

ladder: shadow → 1% → 5% → 20% → 50% → 100%
gate must hold at each rung · one-config rollback any time
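
The decision rule and the ladder can be sketched in a few lines. The page does not state which statistic the gate uses, so this example makes its own assumptions: scores are paired (the same shadowed request is scored on baseline and candidate), the lower confidence bound uses a one-sided normal approximation, and a failed gate returns the route to shadow. The names non_inferiority, next_rung, and LADDER are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

LADDER = [0, 1, 5, 20, 50, 100]  # percent of live traffic; 0 means shadow only


def non_inferiority(baseline, candidate, margin, alpha=0.05):
    """Pass if the candidate is not worse than baseline by more than margin.

    Returns (passed, observed_difference, lower_confidence_bound), where the
    difference is candidate minus baseline on paired quality scores.
    """
    if len(baseline) != len(candidate) or len(baseline) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = mean(diffs)
    std_err = stdev(diffs) / math.sqrt(len(diffs))
    lower = observed - NormalDist().inv_cdf(1 - alpha) * std_err
    return lower > -margin, observed, lower


def next_rung(current: int, gate_passed: bool) -> int:
    """Advance one rung on a pass; drop back to shadow on a fail."""
    if not gate_passed:
        return 0
    index = LADDER.index(current)
    return LADDER[min(index + 1, len(LADDER) - 1)]
```

With a margin of 2.0 points, a candidate whose lower bound sits above -2.0 passes and moves up one rung, for example from 1% to 5%. The test is rerun at each rung, so a pass at 5% says nothing yet about 20%.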

The cost ledger

An auditable, basis-labeled cost ledger. Savings, once proven, net of quality.

Live today: spend attributed per key, team, feature, and lever, with the basis labeled. The savings method follows IPMVP: Savings = Baseline − Reporting Period ± Adjustments. Changes we didn’t cause (traffic growth, provider price cuts, your prompt edits) are stripped out, and the comparison is made on cost per successful output. Verified savings and the 25% share are off · proof pending until the eval gate ships at volume.
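
A worked example helps show how the adjustments change the number. The sketch below is one possible reading of the formula, not the contractual method: it normalizes the baseline to the reporting period's volume of successful outputs and scales it by a provider price index, then subtracts the reporting-period cost. Prompt-edit adjustments are omitted. The function names, the price_index parameter, and all dollar figures are made up for illustration; the 25% share is the figure quoted on this page.

```python
SAVINGS_SHARE = 0.25  # the 25% share named on this page


def verified_savings(
    baseline_cost: float,
    baseline_successes: int,
    reporting_cost: float,
    reporting_successes: int,
    price_index: float = 1.0,
) -> float:
    """Savings = adjusted baseline minus reporting-period cost.

    price_index is the provider price level in the reporting period relative
    to the baseline period (0.9 means list prices fell by 10%).
    """
    if baseline_successes <= 0:
        raise ValueError("baseline period needs at least one successful output")
    unit_baseline = baseline_cost / baseline_successes
    adjusted_baseline = unit_baseline * price_index * reporting_successes
    return adjusted_baseline - reporting_cost


def share_owed(savings: float) -> float:
    """Share applies only to positive savings."""
    return max(savings, 0.0) * SAVINGS_SHARE


# Baseline: $10,000 for 100,000 successful outputs.
# Reporting: $9,000 for 120,000 successful outputs, after a 10% price cut.
naive = verified_savings(10_000, 100_000, 9_000, 120_000)
adjusted = verified_savings(10_000, 100_000, 9_000, 120_000, price_index=0.9)
print(f"unadjusted for price: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In this example the traffic growth raises the adjusted baseline, while the price cut lowers it, so the provider's own discount is not counted as a saving.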

See the method in the docs

RECOVEA · COST LEDGER (sample data) · acct ████ · period 2026-05

lever	basis	status
prompt-cache prefix hygiene	planned	planned
Batch API migration	planned	planned
reasoning / effort trim	planned	planned
dedup / single-flight	concierge	concierge
exact cache	concierge	concierge
model routing / cascade	-	proof pending
context / RAG trim (gateway)	-	proof pending

spend visibility: live · every provider
cost ledger basis: labeled · auditable
verified savings: off · proof pending

The cost ledger is live and basis-labeled. Verified savings — net of quality, adjustments stripped — turn on only once the eval gate ships at volume; the 25% share stays off until then.

Build vs. buy

We wrap the commodity. We build only the part that’s defensible.

Wrap

Provider Batch APIs (the 50%-off lane) + native prompt caching
OpenRouter (long-tail model access)
A cross-encoder reranker for the light trim
The eval models themselves (rented, cross-family, calibrated)

Build

The gateway hot path: our own code end to end — no third-party proxy in your request path.
The cost ledger: per-lever cost attribution, basis-labeled — the verified counterfactual is off · proof pending.
The eval / quality gate: cross-family non-inferiority, the layered oracle, the promotion ladder.
We don’t train a black-box router — per-lever attribution beats squeezing the last routing point.

FAQ

Blunt answers to the real questions.

Want the number for your traffic?

Free savings scan on a sample of your logs: dollar figure back in 48 hours, no proxy.

Why don’t you advertise the big savings numbers I’ve seen elsewhere?

How do I know quality didn’t drop?

You’d see all our prompts. Isn’t that a security risk?

Providers keep cutting prices / auto-caching. Won’t this evaporate?

How are savings computed?

Can I just scan without integrating?

Can my own engineer do this?

Does it work on Anthropic / Gemini / Bedrock / my provider?
