Smart auto, no knobs.
A local classifier reads each request, a cascade tries cheap-then-verifies, and a learned router picks the model. You never hand-pick a model again.
Send model:"auto" to one OpenAI-compatible endpoint. Host.Rodeo routes every request to the fit-for-purpose source — cheapest that clears the bar, local when it matters, paid only when needed — and absorbs every provider's outage and rate-limit before your agent ever feels it. The savings, source tier, and fallback proof ride back on every receipt.
“Name’s Clint. I ride herd on the models so your agents don’t get bucked off. Pull up a rail — I’ll show you the whole fleet working, and I won’t hide a thing.”
One OpenAI-compatible endpoint, one bearer token — and a whole platform under model:"auto". Here's what's wrangling on your behalf.
A local classifier reads each request, a cascade tries cheap-then-verifies, and a learned router picks the model. You never hand-pick a model again.
Set your account posture once: Cost (cheapest), Local-aware (free/local before paid is the default ladder), or Private (your data never leaves eligible self-hosted hardware — fails closed, never a silent third-party fallback). Override any single call with a header.
Invisible failover across a diverse fleet, plus an engine that predicts each provider's rate-limit and latency walls and routes around them before they hit. Outages absorbed, not forwarded.
The learned router improves from real outcomes — poison-resistant by construction: it learns from a frozen first-party benchmark, never from gameable live traffic. Gated; never auto-promoted.
Chat, embeddings, rerank, images, speech — and speech-to-text — through the same token, same failover, same receipt. Pin a model for stable vectors; let auto pick when you don't care.
One token self-describes the whole fleet at /v1/contract, and a two-way feedback spine lets your app flag problems — adjudicated autonomously, pushed back when resolved. Plus a bounded research loop behind model:"auto-deep-research".
Counterfactual savings, the model + source that served you, the tier (free, local, or paid), and what we failed over from — all on X-Rodeo-* headers. Paid actual-cost reads "unknown" rather than inflate a number.
No SDK lock-in, no model picker, no fallback plumbing. One call, and the whole fleet is behind it.
curl https://api.host.rodeo/v1/chat/completions \
-H "Authorization: Bearer $KEY" \
-d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'
Point any OpenAI SDK at https://api.host.rodeo/v1. Steer with one header — X-Rodeo-Prefer: latency | quality | cost or X-Rodeo-Sensitivity: secret. Discover everything — models, modalities, your account profile — at GET /v1/contract.
What auto-routing saved, and what it would have cost at a premium anchor model. Real money, proven per call — and it compounds with every request.
Summed across every request served on a free or local source instead of paid. Real money, full precision — it climbs as the fleet works.
Every source the gateway can reach right now, generated live from the self-describing manifest. Sources are grouped by what they can do and whether they are free, local, or paid.
A live feed of the autonomous loop at work — corroborating a failure, acting on roster drift, turning away a manipulation attempt. No human in the queue. No other gateway shows you this.
“The fleet’s saddled and the gate’s open. Send me anything and I’ll bring you back the best answer on the cheapest honest path — and tell you exactly what it cost and what it saved.”
No knobs required. Point your app at model: "auto" and let me pick. Need it kept close? Mark it sensitive and it never leaves our own hardware.
You’re signed in.
Your account is active. Point your app at model: "auto" and Clint wrangles the rest — free first, then local, then paid only when forced.
Your keys & playground unlock the moment we hand you accessThe proof, on your own traffic — real money saved, what stayed local, what stayed private, and the outages you never felt.
Make your first call and Clint brings back the answer and the ticket. Point your app at model: "auto", or send one from the playground below. The dollars saved, the local runs, the private runs — they start climbing here from request one.
Real money, full precision — every request served free or local instead of a premium anchor model. It only grows.
Your account’s posture — the default Clint applies to every call. Cost, quality, latency, balanced, or private. Pick one; a per-request X-Rodeo-* header still wins on any single call.
Use one as the bearer token against /v1 — the same OpenAI-compatible endpoint, auto-routed. A full key is shown once at mint; keep it safe.
Send a prompt and watch Clint pick the source. You get the answer and the ticket — which model served it, what it saved, whether it rode the sun.
What auto-routing actually did, per intent and tier — measured from real receipts. Sorted by volume. This is the “smart auto, measured” view.
| Intent | Tier | Model | Requests | Success | Avg ms | Saved |
|---|
What auto learned — the ranked model order it earned per intent, scored against the frozen benchmark, not live traffic. This is the policy the router uses now.
Every autonomous decision, newest first. Pick any one to reconstruct its full lineage — signal → decision → resolution — exactly as the loop saw it.
“Pull up a rail. Show me who you are and I’ll open the gate.”
“New rider? Good. Free developer account, no card. Tell me where to reach you.”
— is on the roster. Here is their one-time password — it’s shown once and we don’t store it in the clear.