Host.Rodeo
Live The Console

One token. The whole fleet, wrangled.

Send model:"auto" to one OpenAI-compatible endpoint. Host.Rodeo routes every request to the fit-for-purpose source — cheapest that clears the bar, local when it matters, paid only when needed — and absorbs every provider's outage and rate-limit before your agent ever feels it. The savings, source tier, and fallback proof ride back on every receipt.

free local · owned
Read the manifest Access is invite-only while we dogfood.
Clint — your host on the floor

“Name’s Clint. I ride herd on the models so your agents don’t get bucked off. Pull up a rail — I’ll show you the whole fleet working, and I won’t hide a thing.”

01 / Capabilities

Everything behind the one token

One OpenAI-compatible endpoint, one bearer token — and a whole platform under model:"auto". Here's what's wrangling on your behalf.

Smart auto, no knobs.

A local classifier reads each request, a cascade tries cheap-then-verifies, and a learned router picks the model. You never hand-pick a model again.

Local & private — your call.

Set your account posture once: Cost (cheapest), Local-aware (free/local before paid is the default ladder), or Private (your data never leaves eligible self-hosted hardware — fails closed, never a silent third-party fallback). Override any single call with a header.

It doesn't go down.

Invisible failover across a diverse fleet, plus an engine that predicts each provider's rate-limit and latency walls and routes around them before they hit. Outages absorbed, not forwarded.

It gets smarter.

The learned router improves from real outcomes — poison-resistant by construction: it learns from a frozen first-party benchmark, never from gameable live traffic. Gated; never auto-promoted.

All inference, one door.

Chat, embeddings, rerank, images, speech — and speech-to-text — through the same token, same failover, same receipt. Pin a model for stable vectors; let auto pick when you don't care.

Built for agents.

One token self-describes the whole fleet at /v1/contract, and a two-way feedback spine lets your app flag problems — adjudicated autonomously, pushed back when resolved. Plus a bounded research loop behind model:"auto-deep-research".

Proof on every call.

Counterfactual savings, the model + source that served you, the tier (free, local, or paid), and what we failed over from — all on X-Rodeo-* headers. Paid actual-cost reads "unknown" rather than inflate a number.

02 / Quickstart

Start in three lines

No SDK lock-in, no model picker, no fallback plumbing. One call, and the whole fleet is behind it.

one call · any OpenAI client
curl https://api.host.rodeo/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'

Point any OpenAI SDK at https://api.host.rodeo/v1. Steer with one header — X-Rodeo-Prefer: latency | quality | cost or X-Rodeo-Sensitivity: secret. Discover everything — models, modalities, your account profile — at GET /v1/contract.

03 / Savings

The odometers

What auto-routing saved, and what it would have cost at a premium anchor model. Real money, proven per call — and it compounds with every request.

Saved so far
$0.000000

Summed across every request served on a free or local source instead of paid. Real money, full precision — it climbs as the fleet works.

What it would’ve cost $0.000000 — the counterfactual at a premium anchor model (not money spent)
Not yet audited. These figures are computed from our own receipts; they are not reconciled to a provider billing API yet (auditable: false). We’ll show the badge the day they are. We would rather under-claim than overstate.
Requests served
The fleet is warming up. Numbers appear as real traffic arrives.
Success rate
04 / Fleet

The live registry

Every source the gateway can reach right now, generated live from the self-describing manifest. Sources are grouped by what they can do and whether they are free, local, or paid.

 sources  available now  local tracked

05 / Governance

The platform, governing itself

A live feed of the autonomous loop at work — corroborating a failure, acting on roster drift, turning away a manipulation attempt. No human in the queue. No other gateway shows you this.

Action log · adjudicated autonomously
decisions
  • Watching the bus — the first decision will appear here the moment the loop acts.
How to read it: the adjudicator is intentionally redacted, and a caller’s claim is treated as evidence to verify — never a command to obey. A flag means a human review was queued; the fleet kept running.

Clint, your host

Clint — the host on the floor
“The fleet’s saddled and the gate’s open. Send me anything and I’ll bring you back the best answer on the cheapest honest path — and tell you exactly what it cost and what it saved.”

No knobs required. Point your app at model: "auto" and let me pick. Need it kept close? Mark it sensitive and it never leaves our own hardware.

06 / Account

Your account

You’re signed in.