API Reference
Complete reference for all OmniRoute API endpoints.
Table of Contents
- Chat Completions
- Embeddings
- Image Generation
- List Models
- Compatibility Endpoints
- Files API
- Batches API
- Search API
- WebSocket Streaming
- Quotas & Issues Reporting
- Semantic Cache
- Dashboard & Management
- Combo Management
- Webhooks
- Registered Keys (Auto-Management)
- Agents Protocol
- Management Proxies
- Resilience (extended)
- Skills
- Memory
- MCP Server
- A2A Server
- Cloud, Evals & Assess
- Request Processing
- Authentication
Chat Completions
```http
POST /v1/chat/completions
Authorization: Bearer your-api-key
Content-Type: application/json

{
  "model": "cc/claude-opus-4-6",
  "messages": [
    {"role": "user", "content": "Write a function to..."}
  ],
  "stream": true
}
```
Custom Headers
| Header | Direction | Description |
|---|---|---|
| X-OmniRoute-No-Cache | Request | Set to true to bypass cache |
| X-OmniRoute-Progress | Request | Set to true for progress events |
| X-Session-Id | Request | Sticky session key for external session affinity |
| x_session_id | Request | Underscore variant also accepted (direct HTTP) |
| Idempotency-Key | Request | Dedup key (5s window) |
| X-Request-Id | Request | Alternative dedup key |
| X-OmniRoute-Cache | Response | HIT or MISS (non-streaming) |
| X-OmniRoute-Idempotent | Response | true if deduplicated |
| X-OmniRoute-Progress | Response | enabled if progress tracking on |
| X-OmniRoute-Session-Id | Response | Effective session ID used by OmniRoute |
Nginx note: if you rely on underscore headers (for example `x_session_id`), enable `underscores_in_headers on;`. Otherwise Nginx drops those headers before they reach OmniRoute.
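The following TypeScript sketch assumes a `fetch`-capable runtime such as Node 18+. It shows how a client might assemble the request above and opt into the request headers from the table. `buildChatRequest` and its option names are hypothetical client-side helpers, not part of OmniRoute:

```typescript
// Hypothetical client-side helper (not part of OmniRoute): builds the URL and a
// fetch-style init object for POST /v1/chat/completions with optional headers.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatOptions {
  noCache?: boolean; // X-OmniRoute-No-Cache: true bypasses the cache
  progress?: boolean; // X-OmniRoute-Progress: true requests progress events
  sessionId?: string; // X-Session-Id: sticky session key
  idempotencyKey?: string; // Idempotency-Key: dedup key (5 s window)
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  stream: boolean,
  opts: ChatOptions = {},
): ChatRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  if (opts.noCache) headers["X-OmniRoute-No-Cache"] = "true";
  if (opts.progress) headers["X-OmniRoute-Progress"] = "true";
  if (opts.sessionId) headers["X-Session-Id"] = opts.sessionId;
  if (opts.idempotencyKey) headers["Idempotency-Key"] = opts.idempotencyKey;
  return {
    // Strip trailing slashes so ".../v1/" and ".../v1" behave the same.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages, stream }) },
  };
}
```

To send it, call `await fetch(req.url, req.init)`. The `X-OmniRoute-Cache` and `X-OmniRoute-Idempotent` response headers report whether the cache or the dedup window applied.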
Embeddings
```http
POST /v1/embeddings
Authorization: Bearer your-api-key
Content-Type: application/json

{
  "model": "nebius/Qwen/Qwen3-Embedding-8B",
  "input": "The food was delicious"
}
```
Available providers: Nebius, OpenAI, Mistral, Together AI, Fireworks, NVIDIA, OpenRouter, GitHub Models.
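The sketch below consumes the response. It assumes OmniRoute returns the OpenAI-compatible embeddings shape (`data[].index`, `data[].embedding`), and the helper names are hypothetical:

```typescript
// Assumes the OpenAI-compatible embeddings response shape:
// { data: [{ index: number, embedding: number[] }, ...], ... }
interface EmbeddingResponse {
  data: { index: number; embedding: number[] }[];
}

// Return vectors ordered by `index`, i.e. by position in the request's `input`.
function vectorsInInputOrder(resp: EmbeddingResponse): number[][] {
  return [...resp.data].sort((a, b) => a.index - b.index).map((d) => d.embedding);
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) throw new Error("zero vector");
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

Sorting by `index` keeps the vectors aligned with the `input` array even if the items arrive out of order.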
```http
# List all embedding models
GET /v1/embeddings
```
Image Generation
```http
POST /v1/images/generations
Authorization: Bearer your-api-key
Content-Type: application/json

{
  "model": "openai/gpt-image-2",
  "prompt": "A beautiful sunset over mountains",
  "size": "1024x1024"
}
```
Available providers: OpenAI (GPT Image 2), xAI (Grok Image), Together AI (FLUX), Fireworks AI, Nebius (FLUX), Hyperbolic, NanoBanana, OpenRouter, SD WebUI (local), ComfyUI (local).
```http
# List all image models
GET /v1/images/generations
```
List Models
```http
GET /v1/models
Authorization: Bearer your-api-key
```
Returns all chat, embedding, and image models, plus combos, in OpenAI format.
Compatibility Endpoints
| Method | Path | Format |
|---|---|---|
| POST | /v1/chat/completions | OpenAI |
| POST | /v1/messages | Anthropic |
| POST | /v1/responses | OpenAI Responses |
| POST | /v1/embeddings | OpenAI |
| POST | /v1/images/generations | OpenAI Images |
| POST | /v1/images/edits | OpenAI Images (edit/inpaint) |
| POST | /v1/videos/generations | OpenAI-style video generation |
| POST | /v1/music/generations | OpenAI-style music generation |
| POST | /v1/audio/transcriptions | OpenAI Audio (STT) |
| POST | /v1/audio/speech | OpenAI TTS (returns audio body) |
| POST | /v1/rerank | Cohere/Voyage-style rerank |
| POST | /v1/moderations | OpenAI Moderations |
| GET | /v1/models | OpenAI |
| POST | /v1/messages/count_tokens | Anthropic |
| GET | /v1beta/models | Gemini |
| POST | /v1beta/models/{...path} | Gemini generateContent |
| POST | /v1/api/chat | Ollama |
| GET | /api/v1/vscode/{token}/ | OpenAI catalog alias |
| GET | /api/v1/vscode/{token}/models | OpenAI models alias |
| POST | /api/v1/vscode/{token}/chat/completions | OpenAI tokenized alias |
| POST | /api/v1/vscode/{token}/responses | OpenAI Responses tokenized alias |
| POST | /api/v1/vscode/{token}/api/chat | Ollama tokenized alias |
| GET | /api/v1/vscode/{token}/api/tags | Ollama tags tokenized alias |
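All JSON POST routes follow the same shape: `Authorization: Bearer your-api-key` plus a Zod-validated JSON body (`v1RerankSchema`, `v1ModerationSchema`, `v1AudioSpeechSchema`, etc.; see `src/shared/validation/schemas.ts`). A body that fails schema validation is rejected with a 4xx error. The multipart routes (`/v1/images/edits`, `/v1/audio/transcriptions`) take form fields instead of a JSON body.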
For clients that cannot attach Authorization: Bearer ..., OmniRoute also accepts API keys in the URL via either query-string compatibility (?token=..., ?apiKey=..., ?api_key=..., ?key=...) or the dedicated /api/v1/vscode/{token}/... endpoints documented below.
```text
# Rerank
POST /v1/rerank { "model": "cohere/rerank-3", "query": "...", "documents": ["..."] }

# Moderations
POST /v1/moderations { "model": "omni-moderation-latest", "input": "..." }

# TTS: returns audio/mpeg (or requested format) body
POST /v1/audio/speech { "model": "openai/tts-1", "input": "Hello", "voice": "alloy" }

# Image edit (multipart)
POST /v1/images/edits -F image=@input.png -F prompt="..." -F mask=@mask.png

# Video / music generation (provider-prefixed model id)
POST /v1/videos/generations { "model": "runway/gen-3", "prompt": "..." }
POST /v1/music/generations { "model": "suno/v3.5", "prompt": "..." }
```
Dedicated Provider Routes
```http
POST /v1/providers/{provider}/chat/completions
POST /v1/providers/{provider}/embeddings
POST /v1/providers/{provider}/images/generations
```
If the model id omits the provider prefix, OmniRoute adds it automatically. A model id that belongs to a different provider than the one in the path returns `400`.
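The prefix rule above can be written as a small function. This is a hypothetical sketch, not OmniRoute's code. `knownProviders` stands in for the server's provider registry. The registry is what lets a multi-segment id like `Qwen/Qwen3-Embedding-8B` be treated as a bare model name:

```typescript
// Hypothetical restatement of the dedicated-route prefix rule, NOT OmniRoute's code.
// knownProviders stands in for whatever provider registry the server consults.
type PrefixResult =
  | { ok: true; model: string }
  | { ok: false; status: 400; error: string };

function applyProviderPrefix(provider: string, model: string, knownProviders: Set<string>): PrefixResult {
  // Already prefixed with the path's provider: keep as-is.
  if (model.startsWith(`${provider}/`)) return { ok: true, model };
  // Prefixed with a *different* known provider: mismatch, reject with 400.
  const first = model.split("/")[0];
  if (model.includes("/") && knownProviders.has(first)) {
    return { ok: false, status: 400, error: `model ${model} does not belong to provider ${provider}` };
  }
  // Otherwise treat the whole id as a bare model name and add the prefix.
  return { ok: true, model: `${provider}/${model}` };
}
```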
Files API
OpenAI-compatible files endpoint for batch input/output and file-purpose uploads.
| Method | Path | Description |
|---|---|---|
| POST | /v1/files | Upload a file (multipart: file, purpose, expires_after[anchor], expires_after[seconds]) — 512 MiB max |
| GET | /v1/files | List files for the authenticated API key |
| GET | /v1/files/[id] | Retrieve a file's metadata |
| DELETE | /v1/files/[id] | Delete a file |
| GET | /v1/files/[id]/content | Stream the raw file body back |
Auth: Bearer API key — files are scoped per-API-key via getApiKeyRequestScope.
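The following is a hypothetical client-side pre-check for uploads. It assumes the multipart field names from the table. The `created_at` expiry anchor mirrors OpenAI's Files API and is an assumption here:

```typescript
// 512 MiB upload limit from the table above.
const MAX_UPLOAD_BYTES = 512 * 1024 * 1024;

// Hypothetical helper: validates the size and returns the non-file multipart fields
// for POST /v1/files. The "created_at" anchor follows OpenAI's Files API (assumption).
function fileUploadFields(
  purpose: string,
  sizeBytes: number,
  expiresAfterSeconds?: number,
): Record<string, string> {
  if (!Number.isInteger(sizeBytes) || sizeBytes <= 0) throw new Error("file must be non-empty");
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    throw new Error(`file is ${sizeBytes} bytes; the limit is ${MAX_UPLOAD_BYTES}`);
  }
  const fields: Record<string, string> = { purpose };
  if (expiresAfterSeconds !== undefined) {
    fields["expires_after[anchor]"] = "created_at";
    fields["expires_after[seconds]"] = String(expiresAfterSeconds);
  }
  return fields;
}
```

Each returned entry becomes a `form.append(name, value)` call on a standard `FormData`, alongside `form.append("file", blob, filename)`.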
Batches API
OpenAI-compatible batch processing.
| Method | Path | Description |
|---|---|---|
| POST | /v1/batches | Create batch — body validated by v1BatchCreateSchema (input_file_id, endpoint, completion_window) |
| GET | /v1/batches | List batches |
| GET | /v1/batches/[id] | Retrieve batch status + request_counts |
| DELETE | /v1/batches/[id] | Delete a finished/failed batch |
| POST | /v1/batches/[id]/cancel | Cancel an in-progress batch |
Auth: Bearer API key. Batches are scoped per-API-key.
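The sketch below prepares a batch. It assumes OmniRoute accepts OpenAI's batch input format, where each line is one JSON request with `custom_id`, `method`, `url`, and `body`. It also assumes OpenAI's `24h` completion window. The helper names are hypothetical:

```typescript
// Assumes the OpenAI batch input format (one JSON request object per line). OmniRoute
// documents OpenAI compatibility here, but confirm the accepted format on your deployment.
interface BatchLine {
  custom_id: string; // must be unique within the file
  method: "POST";
  url: "/v1/chat/completions";
  body: { model: string; messages: { role: "user"; content: string }[] };
}

// Build the JSONL content to upload via POST /v1/files.
function buildBatchJsonl(model: string, prompts: string[]): string {
  if (prompts.length === 0) throw new Error("a batch needs at least one request");
  const lines = prompts.map((content, i) => {
    const line: BatchLine = {
      custom_id: `req-${i}`,
      method: "POST",
      url: "/v1/chat/completions",
      body: { model, messages: [{ role: "user", content }] },
    };
    return JSON.stringify(line);
  });
  return lines.join("\n") + "\n";
}

// Body for POST /v1/batches once the file is uploaded ("24h" as in OpenAI's API).
function createBatchBody(inputFileId: string) {
  return {
    input_file_id: inputFileId,
    endpoint: "/v1/chat/completions",
    completion_window: "24h",
  };
}
```

Upload the JSONL through `POST /v1/files`, then pass the returned file id to `POST /v1/batches`.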
Search API
Web/search provider abstraction (Tavily, Brave, Exa, Serper, etc.).
| Method | Path | Description |
|---|---|---|
| GET | /v1/search | List configured search providers + capabilities |
| POST | /v1/search | Run a search query — body validated by v1SearchSchema, supports caching/coalescing |
| GET | /v1/search/analytics | Per-provider hit/latency/cache stats |
Auth: Bearer API key (extractApiKey + isValidApiKey). Search policy enforced via enforceApiKeyPolicy.
WebSocket Streaming
```http
GET /v1/ws?handshake=1
```
Validates a WebSocket upgrade handshake and returns example wire-protocol messages (`request`, `cancel`). The actual WebSocket frames are handled by the bundled WS server, outside the Next.js route table.
Auth: Bearer API key during handshake.
Responses API over WebSocket (codex only)
```bash
# Same host:port as the HTTP API (default 20128); upgrade the connection:
wscat -c "ws://localhost:20128/v1/responses?api_key=<OMNIROUTE_API_KEY>"
# (or: -H "Authorization: Bearer <OMNIROUTE_API_KEY>")
```
The first frame MUST be `response.create`:
```json
{ "type": "response.create", "model": "gpt-5.5", "input": [ { "role": "user", "content": "hi" } ] }
```
A Responses-API-over-WebSocket proxy is wired exclusively to codex (the ChatGPT backend). It listens on the same port as the API and dashboard, at the paths `/v1/responses`, `/responses`, and `/api/v1/responses`. On the first `response.create` frame it authenticates and prepares via the internal `codex-responses-ws` bridge, selects a codex OAuth connection, and tunnels to `wss://chatgpt.com/backend-api/codex/responses` via the `wreq-js` transport. Non-codex models are rejected (`codex_ws_provider_required`).
For quota-share routing, use `model: "qtSd/<group>/codex/<model>"`. Implemented in `app/server-ws.mjs`, `scripts/dev/responses-ws-proxy.mjs`, and `src/app/api/internal/codex-responses-ws/route.ts`.
Auth: Bearer API key during handshake. The bundled HTTP server (server-ws.mjs)
must be the active entrypoint (it is, by default, when app/server-ws.mjs exists).
Model id: use the bare ChatGPT id (no codex/ prefix)
The OpenAI Codex CLI validates the model name client-side when `supports_websockets = true` and rejects provider-prefixed ids such as `codex/gpt-5.5` ("The 'codex/gpt-5.5' model is not supported when using Codex with a ChatGPT account"). Send the bare id (e.g. `gpt-5.5`). OmniRoute's bridge is codex-only, so it re-resolves a bare id as a codex model (`resolveCodexWsModelInfo`) before tunneling upstream. This happens even though a bare `gpt-5.5` would otherwise route to another provider over HTTP.
Configuring the OpenAI Codex CLI
Point the Codex CLI at OmniRoute by adding a custom provider with WebSocket support to `~/.codex/config.toml`. Use a separate `CODEX_HOME` to avoid touching an existing config:
```toml
model = "gpt-5.5"                       # bare id, NOT "codex/gpt-5.5"
model_provider = "omniroute"

[model_providers.omniroute]
name = "OmniRoute (WS)"
base_url = "http://localhost:20128/v1"  # no trailing slash; the WS URL is derived (use https/wss in production)
wire_api = "responses"                  # only supported value since Feb 2026
supports_websockets = true              # enables the Responses-over-WS transport
env_key = "OMNIROUTE_API_KEY"           # holds the OmniRoute API key (Bearer)
```
```bash
export OMNIROUTE_API_KEY=sk-...   # an OmniRoute API key (any key if REQUIRE_API_KEY=false)
codex exec "Reply only: PONG"
```
The CLI upgrades `base_url` + `/responses` to a WebSocket, and OmniRoute tunnels it to the selected codex OAuth connection. This was validated end-to-end against the local server: ChatGPT returns `codex.rate_limits` and `response.created`, then streams the completion.
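This minimal sketch shows two hypothetical client-side helpers, assuming the behavior described above. The first derives the WebSocket URL from `base_url` (http→ws, https→wss, plus `/responses`). The second builds the mandatory first `response.create` frame with a bare model id:

```typescript
// Hypothetical helpers mirroring the behaviour described above.
// Derive the WebSocket URL: base_url + "/responses", with http(s) mapped to ws(s).
function responsesWsUrl(baseUrl: string): string {
  const trimmed = baseUrl.replace(/\/+$/, "");
  if (trimmed.startsWith("https://")) return "wss://" + trimmed.slice("https://".length) + "/responses";
  if (trimmed.startsWith("http://")) return "ws://" + trimmed.slice("http://".length) + "/responses";
  throw new Error(`unsupported base_url scheme: ${baseUrl}`);
}

// Build the mandatory first frame. A leading "codex/" is stripped because the Codex
// CLI rejects provider-prefixed ids; quota-share ids (qtSd/...) are left untouched.
function responseCreateFrame(model: string, text: string): string {
  const bare = model.startsWith("codex/") ? model.slice("codex/".length) : model;
  return JSON.stringify({
    type: "response.create",
    model: bare,
    input: [{ role: "user", content: text }],
  });
}
```

With the `ws` npm package, for example, you would open `new WebSocket(responsesWsUrl(baseUrl), { headers: { Authorization: "Bearer <key>" } })` and send the frame from its `open` handler.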
Quotas & Issues Reporting
| Method | Path | Description |
|---|---|---|
| GET | /v1/quotas/check | Pre-validate quota for a provider + accountId before issuing a registered key |
| POST | /v1/issues/report | Report a quota/key issuance failure to GitHub (requires GITHUB_ISSUES_REPO + token) |
Auth: Bearer API key (isAuthenticated).
Semantic Cache
```http
# Get cache stats
GET /api/cache/stats

# Clear all caches
DELETE /api/cache/stats
```
Response example:
```json
{
  "semanticCache": {
    "memorySize": 42,
    "memoryMaxSize": 500,
    "dbSize": 128,
    "hitRate": 0.65
  },
  "idempotency": {
    "activeKeys": 3,
    "windowMs": 5000
  }
}
```
Dashboard & Management
Authentication
| Endpoint | Method | Description |
|---|---|---|
| /api/auth/login | POST | Login |
| /api/auth/logout | POST | Logout |
| /api/settings/require-login | GET/PUT | Toggle login required |
Provider Management
| Endpoint | Method | Description |
|---|---|---|
| /api/providers | GET/POST | List / create providers |
| /api/providers/[id] | GET/PUT/DELETE | Manage a provider |
| /api/providers/[id]/test | POST | Test provider connection |
| /api/providers/[id]/models | GET | List provider models |
| /api/providers/validate | POST | Validate provider config |
| /api/provider-nodes* | Various | Provider node management |
| /api/provider-models | GET/POST/PATCH/DELETE | Custom models (add, update, hide/show, delete) |
OAuth Flows
| Endpoint | Method | Description |
|---|---|---|
| /api/oauth/[provider]/[action] | Various | Provider-specific OAuth |
Routing & Config
| Endpoint | Method | Description |
|---|---|---|
| /api/models/alias | GET/POST | Model aliases |
| /api/models/catalog | GET | All models by provider + type |
| /api/combos* | Various | Combo management |
| /api/keys* | Various | API key management |
| /api/pricing | GET | Model pricing |
Usage & Analytics
| Endpoint | Method | Description |
|---|---|---|
| /api/usage/history | GET | Usage history |
| /api/usage/logs | GET | Usage logs |
| /api/usage/request-logs | GET | Request-level logs |
| /api/usage/[connectionId] | GET | Per-connection usage |
| /api/usage/token-limits | GET/POST/DELETE | Per-API-key token-limit budgets |
Settings
| Endpoint | Method | Description |
|---|---|---|
| /api/settings | GET/PUT/PATCH | General settings |
| /api/settings/proxy | GET/PUT | Network proxy config |
| /api/settings/proxy/test | POST | Test proxy connection |
| /api/settings/ip-filter | GET/PUT | IP allowlist/blocklist |
| /api/settings/thinking-budget | GET/PUT | Reasoning token budget |
| /api/settings/system-prompt | GET/PUT | Global system prompt |
| /api/settings/compression | GET/PUT | Global compression config |
Context & Compression
| Endpoint | Method | Description |
|---|---|---|
| /api/compression/preview | POST | Preview off/lite/standard/aggressive/ultra/RTK/stacked compression |
| /api/compression/language-packs | GET | List available Caveman language packs |
| /api/compression/rules | GET | List Caveman rule metadata |
| /api/context/caveman/config | GET/PUT | Caveman-specific settings alias |
| /api/context/rtk/config | GET/PUT | RTK-specific settings, including custom filters and raw-output retention |
| /api/context/rtk/filters | GET | RTK filter catalog and custom-filter diagnostics |
| /api/context/rtk/test | POST | Run RTK preview/test against a text payload |
| /api/context/rtk/raw-output/[id] | GET | Read retained redacted raw output by pointer id |
| /api/context/combos | GET/POST | Compression combo list/create |
| /api/context/combos/[id] | GET/PUT/DELETE | Compression combo detail/update/delete |
| /api/context/combos/[id]/assignments | GET/PUT | Assign compression combos to routing combos |
| /api/context/analytics | GET | Compression analytics alias |
Monitoring
| Endpoint | Method | Description |
|---|---|---|
| /api/sessions | GET | Active session tracking |
| /api/rate-limits | GET | Per-account rate limits |
| /api/monitoring/health | GET | Health check + provider summary (catalogCount, configuredCount, activeCount, monitoredCount) |
| /api/cache/stats | GET/DELETE | Cache stats / clear |
Backup & Export/Import
| Endpoint | Method | Description |
|---|---|---|
| /api/db-backups | GET | List available backups |
| /api/db-backups | PUT | Create a manual backup |
| /api/db-backups | POST | Restore from a specific backup |
| /api/db-backups/export | GET | Download database as .sqlite file |
| /api/db-backups/import | POST | Upload .sqlite file to replace database |
| /api/db-backups/exportAll | GET | Download full backup as .tar.gz archive |
Cloud Sync
| Endpoint | Method | Description |
|---|---|---|
| /api/sync/cloud | Various | Cloud sync operations |
| /api/sync/initialize | POST | Initialize sync |
| /api/cloud/* | Various | Cloud management |
Tunnels
| Endpoint | Method | Description |
|---|---|---|
| /api/tunnels/cloudflared | GET | Read Cloudflare Quick Tunnel install/runtime status for the dashboard |
| /api/tunnels/cloudflared | POST | Enable or disable the Cloudflare Quick Tunnel (action=enable/disable) |
| /api/tunnels/ngrok | GET | Read ngrok Tunnel runtime status for the dashboard |
| /api/tunnels/ngrok | POST | Enable or disable the ngrok Tunnel (action=enable/disable) |
CLI Tools
| Endpoint | Method | Description |
|---|---|---|
| /api/cli-tools/claude-settings | GET | Claude CLI status |
| /api/cli-tools/codex-settings | GET | Codex CLI status |
| /api/cli-tools/droid-settings | GET | Droid CLI status |
| /api/cli-tools/openclaw-settings | GET | OpenClaw CLI status |
| /api/cli-tools/runtime/[toolId] | GET | Generic CLI runtime |
CLI responses include: installed, runnable, command, commandPath, runtimeMode, reason.
ACP Agents
| Endpoint | Method | Description |
|---|---|---|
| /api/acp/agents | GET | List all detected agents (built-in + custom) with status |
| /api/acp/agents | POST | Add custom agent or refresh detection cache |
| /api/acp/agents | DELETE | Remove a custom agent by id query param |
GET response includes agents[] (id, name, binary, version, installed, protocol, isCustom) and summary (total, installed, notFound, builtIn, custom).
Resilience & Rate Limits
| Endpoint | Method | Description |
|---|---|---|
| /api/resilience | GET/PATCH | Get/update request queue, connection cooldown, provider breaker, and wait settings |
| /api/resilience/reset | POST | Reset provider circuit breakers |
| /api/resilience/model-cooldowns | GET | List active per-(provider, connection, model) lockouts, sorted by remaining time |
| /api/resilience/model-cooldowns | DELETE | Clear a model lockout: body {provider, model} or {all: true} to wipe everything |
| /api/rate-limits | GET | Per-account rate limit status |
| /api/rate-limit | GET | Global rate limit configuration |
All `/api/resilience/*` routes require management auth (`requireManagementAuth`). See Resilience (extended) for a full breakdown of provider breaker vs. connection cooldown vs. model lockout.
Evals
| Endpoint | Method | Description |
|---|---|---|
| /api/evals | GET/POST | List eval suites / run evaluation |
Policies
| Endpoint | Method | Description |
|---|---|---|
| /api/policies | GET/POST/DELETE | Manage routing policies |
Compliance
| Endpoint | Method | Description |
|---|---|---|
| /api/compliance/audit-log | GET | Compliance audit log (last N) |
v1beta (Gemini-Compatible)
| Endpoint | Method | Description |
|---|---|---|
| /v1beta/models | GET | List models in Gemini format |
| /v1beta/models/{...path} | POST | Gemini generateContent endpoint |
These endpoints mirror Gemini's API format for clients that expect native Gemini SDK compatibility.
Internal / System APIs
| Endpoint | Method | Description |
|---|---|---|
| /api/init | GET | Application initialization check (used on first run) |
| /api/tags | GET | Ollama-compatible model tags (for Ollama clients) |
| /api/restart | POST | Trigger graceful server restart |
| /api/shutdown | POST | Trigger graceful server shutdown |
| /api/system/env/repair | POST | Repair OAuth provider environment variables |
Note: These endpoints are used internally by the system or for Ollama client compatibility. They are not typically called by end users.
OAuth Environment Repair (v3.6.1+)
```http
POST /api/system/env/repair
Content-Type: application/json

{
  "provider": "claude-code"
}
```
Repairs missing or corrupted OAuth environment variables for a specific provider. Returns:
```json
{
  "success": true,
  "repaired": ["CLAUDE_CODE_OAUTH_CLIENT_ID", "CLAUDE_CODE_OAUTH_CLIENT_SECRET"],
  "backupPath": "/home/user/.omniroute/backups/env-repair-2026-04-11.bak"
}
```
Audio Transcription
```http
POST /v1/audio/transcriptions
Authorization: Bearer your-api-key
Content-Type: multipart/form-data
```
Transcribe audio files using Deepgram or AssemblyAI.
Request:
```bash
curl -X POST http://localhost:20128/v1/audio/transcriptions \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@recording.mp3" \
  -F "model=deepgram/nova-3"
```
Response:
```json
{
  "text": "Hello, this is the transcribed audio content.",
  "task": "transcribe",
  "language": "en",
  "duration": 12.5
}
```
Supported providers: `deepgram/nova-3`, `assemblyai/best`.
Supported formats: mp3, wav, m4a, flac, ogg, webm.
Ollama Compatibility
For clients that use Ollama's API format:
```http
# Chat endpoint (Ollama format)
POST /v1/api/chat

# Model listing (Ollama format)
GET /api/tags
```
Requests are automatically translated between Ollama and internal formats.
Tokenized VS Code / Headerless Aliases
Use these aliases when an integration cannot inject an Authorization header and needs the API key embedded in the base URL.
```http
# OpenAI-style catalog alias
GET /api/v1/vscode/{token}/
GET /api/v1/vscode/{token}/models

# OpenAI-style chat aliases
POST /api/v1/vscode/{token}/chat/completions
POST /api/v1/vscode/{token}/responses

# Ollama-style aliases
POST /api/v1/vscode/{token}/api/chat
GET /api/v1/vscode/{token}/api/tags
```
Example:
```bash
curl https://your-host.example/api/v1/vscode/YOUR_API_KEY/models

curl -X POST https://your-host.example/api/v1/vscode/YOUR_API_KEY/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hello"}]}'
```
Notes:
- The tokenized aliases reuse the same handlers as `/v1/*` and `/api/tags`; response shapes stay identical.
- Prefer `Authorization: Bearer ...` whenever the client supports custom headers.
- URL-based tokens may appear in reverse-proxy logs, browser history, and telemetry outside OmniRoute. Treat them as a compatibility option, not the default authentication mode.
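The sketch below builds both headerless forms, and the helper names are hypothetical. It assumes the server decodes a percent-encoded key in the path segment. That assumption matters only for keys that contain reserved characters; for typical `sk-...` keys the encoding is a no-op:

```typescript
// Hypothetical helpers for clients that cannot send an Authorization header.
// Tokenized base URL: append /chat/completions, /models, /api/chat, ... to it.
function tokenizedBaseUrl(host: string, apiKey: string): string {
  return `${host.replace(/\/+$/, "")}/api/v1/vscode/${encodeURIComponent(apiKey)}`;
}

// Query-string variant (?api_key=...); ?token=, ?apiKey= and ?key= are also accepted.
function withQueryKey(url: string, apiKey: string): string {
  const sep = url.includes("?") ? "&" : "?";
  return `${url}${sep}api_key=${encodeURIComponent(apiKey)}`;
}
```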
Telemetry
```http
# Get latency telemetry summary (p50/p95/p99 per provider)
GET /api/telemetry/summary
```
Response:
```json
{
  "providers": {
    "claudeCode": { "p50": 245, "p95": 890, "p99": 1200, "count": 150 },
    "github": { "p50": 180, "p95": 620, "p99": 950, "count": 320 }
  }
}
```
Budget
```http
# Get budget status for all API keys
GET /api/usage/budget

# Set or update a budget
POST /api/usage/budget
Content-Type: application/json

{
  "apiKeyId": "key-123",
  "dailyLimitUsd": 5.00,
  "weeklyLimitUsd": 30.00,
  "monthlyLimitUsd": 100.00,
  "warningThreshold": 0.8,
  "resetInterval": "monthly"
}
```
Schema notes (`setBudgetSchema`): `apiKeyId` is required, and at least one of `dailyLimitUsd`, `weeklyLimitUsd`, or `monthlyLimitUsd` must be greater than zero. Optional fields: `warningThreshold` (0–1), `resetInterval` (`daily` | `weekly` | `monthly`), `resetTime` (`HH:MM`). The legacy `{keyId, limit, period}` shape returns `400 Bad Request`.
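The following hypothetical client-side pre-check restates the `setBudgetSchema` rules above. The server's Zod schema remains authoritative. The accepted `HH:MM` range (00:00 to 23:59) is an assumption:

```typescript
// Hypothetical client-side pre-check mirroring the documented setBudgetSchema rules.
// It is not the server's Zod schema, which remains authoritative.
interface BudgetInput {
  apiKeyId?: string;
  dailyLimitUsd?: number;
  weeklyLimitUsd?: number;
  monthlyLimitUsd?: number;
  warningThreshold?: number;
  resetInterval?: string;
  resetTime?: string;
}

function budgetErrors(b: BudgetInput): string[] {
  const errors: string[] = [];
  if (!b.apiKeyId) errors.push("apiKeyId is required");
  const limits = [b.dailyLimitUsd, b.weeklyLimitUsd, b.monthlyLimitUsd];
  if (!limits.some((v) => typeof v === "number" && v > 0)) {
    errors.push("at least one of dailyLimitUsd, weeklyLimitUsd, monthlyLimitUsd must be > 0");
  }
  if (b.warningThreshold !== undefined && !(b.warningThreshold >= 0 && b.warningThreshold <= 1)) {
    errors.push("warningThreshold must be between 0 and 1");
  }
  if (b.resetInterval !== undefined && !["daily", "weekly", "monthly"].includes(b.resetInterval)) {
    errors.push("resetInterval must be daily, weekly, or monthly");
  }
  // 24-hour HH:MM; the exact accepted range is an assumption.
  if (b.resetTime !== undefined && !/^([01]\d|2[0-3]):[0-5]\d$/.test(b.resetTime)) {
    errors.push("resetTime must be HH:MM");
  }
  return errors;
}
```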
Token Limits
Per-API-key token budgets (distinct from the USD-based Budget above). Enforced inline on the request path: when a key's current window usage reaches its limit, requests are rejected with 429 Too Many Requests. Limits can be scoped to a specific model, a provider, or applied globally across the key; when several limits match a request, the most restrictive one wins.
```http
# List a key's token limits (includes live window usage)
GET /api/usage/token-limits?apiKeyId=key-123

# Create or update a token limit
POST /api/usage/token-limits
Content-Type: application/json

{
  "apiKeyId": "key-123",
  "scopeType": "model",
  "scopeValue": "openai/gpt-4o",
  "tokenLimit": 1000000,
  "resetInterval": "monthly",
  "enabled": true
}

# Delete a token limit by id
DELETE /api/usage/token-limits?id=tl-abc
```
Schema notes (`setTokenLimitSchema`): `apiKeyId` and `scopeType` (`model` | `provider` | `global`) are required. `scopeValue` is required unless `scopeType` is `global`: a model id for `model` scope, or a provider id for `provider` scope. `tokenLimit` must be a positive integer (coerced from a string). Optional: `id` (omit to create, supply to update), `resetInterval` (`daily` | `weekly` | `monthly`, default `monthly`), `resetTime` (`HH:MM`), `enabled` (default `true`). `GET` responses enrich each limit with `tokensUsed`, `remaining`, `windowStart`, `periodStartAt`, and `nextResetAt`. This is a management-class endpoint (auth is enforced centrally by the authz pipeline).
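The matching rule can be sketched as follows. This is a hypothetical illustration, not OmniRoute's enforcement code. It reads "most restrictive" as "the matching limit with the fewest remaining tokens in its current window":

```typescript
// Hypothetical illustration of the documented matching rule, not OmniRoute's code.
// "Most restrictive" is read here as "least remaining headroom in the current window".
type ScopeType = "model" | "provider" | "global";

interface TokenLimit {
  id: string;
  scopeType: ScopeType;
  scopeValue?: string; // omitted for global scope
  tokenLimit: number;
  tokensUsed: number; // current-window usage, as enriched by GET
  enabled: boolean;
}

type Decision =
  | { allowed: true; limitId?: string; remaining?: number }
  | { allowed: false; status: 429; limitId: string };

function checkTokenLimits(limits: TokenLimit[], model: string, provider: string): Decision {
  const matching = limits.filter(
    (l) =>
      l.enabled &&
      (l.scopeType === "global" ||
        (l.scopeType === "model" && l.scopeValue === model) ||
        (l.scopeType === "provider" && l.scopeValue === provider)),
  );
  if (matching.length === 0) return { allowed: true }; // no budget applies
  const headroom = (l: TokenLimit) => l.tokenLimit - l.tokensUsed;
  // The tightest limit is the one with the smallest remaining budget.
  const tightest = matching.reduce((a, b) => (headroom(b) < headroom(a) ? b : a));
  const remaining = headroom(tightest);
  return remaining > 0
    ? { allowed: true, limitId: tightest.id, remaining }
    : { allowed: false, status: 429, limitId: tightest.id };
}
```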
Request Processing
- Client sends a request to `/v1/*`.
- The route handler calls `handleChat`, `handleEmbedding`, `handleAudioTranscription`, or `handleImageGeneration`.
- The model is resolved (direct provider/model, or alias/combo).
- Credentials are selected from the local DB, with account availability filtering.
- For chat: `handleChatCore` checks the semantic/signature cache and resolves combo compression settings.
- Proactive compression runs before provider translation when enabled (`lite`, Caveman, RTK, or stacked).
- The provider executor sends the upstream request.
- The response is translated back to the client format (chat) or returned as-is (embeddings/images/audio).
- Usage, compression analytics, and request logs are recorded.
- Fallback applies on errors according to combo rules.
Full architecture reference: ARCHITECTURE.md
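The fallback step can be pictured as an ordered walk over a combo's targets. This hypothetical sketch tries each target in turn and stops at the first success. It does not model OmniRoute's actual combo rules or its provider breakers, connection cooldowns, and model lockouts:

```typescript
// Hypothetical ordered-fallback sketch; the real combo rules and resilience state
// (breakers, cooldowns, lockouts) are intentionally not modelled here.
interface Attempt {
  target: string;
  error: string;
}

async function routeWithFallback<T>(
  targets: string[],
  send: (target: string) => Promise<T>,
): Promise<{ target: string; result: T; failed: Attempt[] }> {
  const failed: Attempt[] = [];
  for (const target of targets) {
    try {
      const result = await send(target); // first success wins
      return { target, result, failed };
    } catch (err) {
      // Record the failure and move on to the next target.
      failed.push({ target, error: err instanceof Error ? err.message : String(err) });
    }
  }
  throw new Error(`all targets failed: ${failed.map((f) => `${f.target} (${f.error})`).join(", ")}`);
}
```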
Combo Management
Higher-level routing combos (already summarized under /api/combos*) can also be mapped 1:1 from a model id pattern, allowing transparent redirection of an OpenAI-style model id to a combo.
| Method | Path | Description |
|---|---|---|
| GET | /api/model-combo-mappings | List all model→combo mappings |
| POST | /api/model-combo-mappings | Create mapping — body: {pattern, comboId, priority?, enabled?, description?} |
| GET | /api/model-combo-mappings/[id] | Retrieve a single mapping |
| PUT | /api/model-combo-mappings/[id] | Update fields of an existing mapping |
| DELETE | /api/model-combo-mappings/[id] | Remove a mapping |
Auth: management session/API key (requireManagementAuth).
Webhooks
Outbound webhook subscriptions for OmniRoute events (request completion, quota exhaustion, key rotation, etc.).
| Method | Path | Description |
|---|---|---|
| GET | /api/webhooks | List webhooks (secrets are masked to <prefix>...) |
| POST | /api/webhooks | Create webhook — body: {url, events?: ["*"], secret?, description?} |
| GET | /api/webhooks/[id] | Retrieve a webhook |
| PUT | /api/webhooks/[id] | Update url/events/secret/description |
| DELETE | /api/webhooks/[id] | Remove a webhook |
| POST | /api/webhooks/[id]/test | Send a test payload to the webhook URL and return delivery status |
Auth: management session/API key (requireManagementAuth).
Registered Keys (Auto-Management)
Used by the auto-key management subsystem to issue and rotate API keys against a backing provider/account, with daily/hourly quotas.
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/registered-keys | List registered keys (masked prefix only) |
| POST | /api/v1/registered-keys | Issue a new registered key — body: {name, provider?, accountId?, idempotencyKey?, expiresAt?, dailyBudget?, hourlyBudget?}. Returns the raw key once. Returns 429 on quota refusal. |
| GET | /api/v1/registered-keys/[id] | Retrieve a registered key's metadata (no raw material) |
| DELETE | /api/v1/registered-keys/[id] | Revoke a registered key |
| POST | /api/v1/registered-keys/[id]/revoke | Explicit revoke endpoint (same effect as DELETE) |
Auth: Bearer API key (isAuthenticated). See also /v1/quotas/check and /v1/issues/report.
Agents Protocol
Cloud agent tasks (Claude Code, Codex Cloud, OpenHands, etc.) executed remotely on behalf of OmniRoute users.
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/agents/tasks | List tasks — optional ?provider=, ?status=, ?limit= (1–500, default 50) |
| POST | /api/v1/agents/tasks | Create task — body validated by CreateCloudAgentTaskSchema (providerId, prompt, source, options?). Returns 201 with task envelope |
| DELETE | /api/v1/agents/tasks?id=... | Delete a task |
| GET | /api/v1/agents/tasks/[id] | Read task — synchronously refreshes status from the upstream cloud agent when an external_id is set |
| POST | /api/v1/agents/tasks/[id] | Discriminated action: {action: "approve"}, {action: "message", message}, or {action: "cancel"} |
| DELETE | /api/v1/agents/tasks/[id] | Delete a specific task by id |
Auth: management auth is required on every method (`requireCloudAgentManagementAuth`). Before v3.8.0 these routes were unauthenticated; see commit `588a0333` for the breaking change.
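The following hypothetical typed client covers the task endpoints in the table above. The `TaskAction` union mirrors the three documented action bodies. `listTasksQuery` clamps `limit` to the documented 1–500 range:

```typescript
// Hypothetical typed helpers for the Agents task endpoints.
type TaskAction =
  | { action: "approve" }
  | { action: "message"; message: string }
  | { action: "cancel" };

// JSON body for POST /api/v1/agents/tasks/[id]. The empty-message guard is a
// client-side assumption; the server applies its own validation.
function taskActionBody(a: TaskAction): string {
  if (a.action === "message" && a.message.trim() === "") {
    throw new Error("message must not be empty");
  }
  return JSON.stringify(a);
}

// Query string for GET /api/v1/agents/tasks; limit is clamped to 1-500.
function listTasksQuery(opts: { provider?: string; status?: string; limit?: number } = {}): string {
  const params: string[] = [];
  if (opts.provider) params.push(`provider=${encodeURIComponent(opts.provider)}`);
  if (opts.status) params.push(`status=${encodeURIComponent(opts.status)}`);
  if (opts.limit !== undefined) {
    params.push(`limit=${Math.min(500, Math.max(1, Math.trunc(opts.limit)))}`);
  }
  return params.length > 0 ? `?${params.join("&")}` : "";
}
```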
```bash
# Create a Claude Code cloud task
curl -X POST http://localhost:20128/api/v1/agents/tasks \
  -H "Authorization: Bearer your-management-key" \
  -H "Content-Type: application/json" \
  -d '{"providerId":"claude-code-cloud","prompt":"Fix the failing test","source":{"repo":"...","branch":"..."}}'
```
Management Proxies
Outbound HTTP(S)/SOCKS proxies that can be assigned to providers, accounts, or globally.
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/management/proxies | List proxies (with ?id= returns one; with ?id=&where_used=1 returns the assignment graph) |
| POST | /api/v1/management/proxies | Create proxy — body validated by createProxyRegistrySchema |
| PATCH | /api/v1/management/proxies | Update proxy — body validated by updateProxyRegistrySchema (requires id) |
| DELETE | /api/v1/management/proxies?id=...&force=1 | Delete proxy (use force=1 to detach assignments) |
| GET | /api/v1/management/proxies/assignments | List assignments — filterable by proxy_id, scope, scope_id; pass resolve_connection_id=<id> to resolve the active proxy for a connection |
| PUT | /api/v1/management/proxies/assignments | Assign — body validated by proxyAssignmentSchema ({scope, scopeId?, proxyId?}). Clears dispatcher cache |
| PUT | /api/v1/management/proxies/bulk-assign | Bulk-assign — body validated by bulkProxyAssignmentSchema ({scope, scopeIds[], proxyId?}) |
| GET | /api/v1/management/proxies/health?hours=24 | Aggregate proxy health (success/fail counts, latency) over a window |
Auth: management session/API key on every route (requireManagementAuth).
There are no per-id subroutes such as `POST /api/v1/management/proxies/[id]/assignments` or `POST /api/v1/management/proxies/[id]/health` in the codebase. Those operations are served by the flat `/assignments` and `/health` routes shown above.
Resilience (extended)
OmniRoute exposes three independent temporary-failure mechanisms; the management endpoints below let operators read and override them:
| Scope | State storage | Read | Reset / clear |
|---|---|---|---|
| Provider breaker | domain_circuit_breakers + in-memory | /api/monitoring/health | POST /api/resilience/reset |
| Connection cooldown | rateLimitedUntil on provider connections | /api/rate-limits, /api/providers/[id] | (re-enables lazily; clear via provider PUT) |
| Model lockout | In-memory model-availability registry | GET /api/resilience/model-cooldowns | DELETE /api/resilience/model-cooldowns |
PATCH /api/resilience accepts provider breaker overrides under providerBreaker.oauth and providerBreaker.apikey. Each profile supports degradationThreshold, failureThreshold, and resetTimeoutMs; the same fields are exposed in Dashboard → Settings → Resilience.
```bash
# Clear a single model lockout
curl -X DELETE http://localhost:20128/api/resilience/model-cooldowns \
  -H "Cookie: auth_token=..." \
  -H "Content-Type: application/json" \
  -d '{"provider":"openai","model":"gpt-4o-mini"}'

# Wipe every lockout
curl -X DELETE http://localhost:20128/api/resilience/model-cooldowns \
  -H "Cookie: auth_token=..." \
  -H "Content-Type: application/json" \
  -d '{"all":true}'
```
Full conceptual reference and breaker defaults: see CLAUDE.md → "Resilience Runtime State".
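Here is a generic circuit-breaker sketch (closed → open → half-open) to illustrate what knobs like `failureThreshold` and `resetTimeoutMs` typically control. It is not OmniRoute's implementation. `degradationThreshold` is omitted because this reference does not describe its semantics. The injectable clock exists only to make the sketch testable:

```typescript
// Generic circuit-breaker sketch, NOT OmniRoute's implementation.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Should a request be let through right now?
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // timeout elapsed: allow a trial request
    }
    return this.state !== "open";
  }

  onSuccess(): void {
    this.state = "closed";
    this.failures = 0;
  }

  onFailure(): void {
    this.failures++;
    // A failed trial re-opens immediately; otherwise open once the threshold is hit.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```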
Skills
Skill framework for extending OmniRoute with custom executable handlers, plus marketplace integrations.
| Method | Path | Description |
|---|---|---|
| GET | /api/skills | List installed skills; filterable by `?q=`, `?mode=` (`on`, `off`, `auto`), and `?source=` (`skillsmp`, `skillssh`, `local`); paginated |
| GET | /api/skills/[id] | Retrieve one skill |
| PUT | /api/skills/[id] | Update skill (name, description, mode, schema, handler, tags) |
| DELETE | /api/skills/[id] | Uninstall a skill |
| POST | /api/skills/install | Install a skill from a raw manifest — body: {name, version, description, schema:{input, output}, handlerCode, apiKeyId?} |
| GET | /api/skills/executions | List recent skill executions (audit trail with inputs/outputs/duration) |
| GET | /api/skills/marketplace?q=... | Search/popular list from the SkillsMP marketplace (requires skillsmpApiKey setting) |
| POST | /api/skills/marketplace/install | Install a skill by id from SkillsMP |
| GET | /api/skills/skillssh?q=&limit= | Search the skills.sh registry |
| POST | /api/skills/skillssh/install | Install a skill by id from skills.sh |
Auth: management session/API key. Marketplace search routes accept either management auth or a Bearer API key (isAuthenticated).
Memory
Persistent conversational/factual memory store, scoped per API key / session.
| Method | Path | Description |
|---|---|---|
| GET | /api/memory | List memories — ?apiKeyId=, ?type=, ?sessionId=, ?q=, with offset/limit or page/limit pagination |
| POST | /api/memory | Create memory — body validated by Zod: {content, key, type?, sessionId?, apiKeyId?, metadata?, expiresAt?} |
| GET | /api/memory/[id] | Retrieve one memory |
| DELETE | /api/memory/[id] | Delete a memory |
| GET | /api/memory/health | Memory subsystem health (DB connectivity, embeddings backend, vector index status) |
Auth: management session/API key (requireManagementAuth). type enum: FACTUAL, EPISODIC, SEMANTIC, PROCEDURAL (see MemoryType in src/lib/memory/types.ts).
MCP Server
OmniRoute ships an embedded Model Context Protocol server with 3 transports (stdio, SSE, streamable-http) and scoped tools. The dashboard endpoints below read status/audit data and proxy the HTTP transports.
| Method | Path | Description |
|---|---|---|
| GET | /api/mcp/status | Heartbeat, transport, online state, last call, top tools, 24h success rate |
| GET | /api/mcp/tools | List of MCP tools with name, description, scopes, phase, auditLevel, sourceEndpoints |
| GET | /api/mcp/sse | Open SSE stream for the SSE transport (returns 503 if MCP disabled or transport mismatch) |
| POST | /api/mcp/sse | Send JSON-RPC frame on the SSE transport |
| GET | /api/mcp/stream | Open SSE side of the Streamable HTTP transport (server-initiated messages) |
| POST | /api/mcp/stream | Send JSON-RPC frame on the Streamable HTTP transport |
| DELETE | /api/mcp/stream | End a Streamable HTTP session |
| GET | /api/mcp/audit | Query audit log (?limit=, ?offset=, ?tool=, ?success=true\|false, ?apiKeyId=) |
| GET | /api/mcp/audit/stats | Aggregate audit stats (totals, success rate, avg duration, top tools) |
Auth: the sse/stream transports honor the MCP-specific auth surface (Bearer API key with mcp scope); the status/tools/audit* routes are readable from the dashboard (no extra auth required beyond reaching the dashboard host).
Both HTTP transports are gated by settings.mcpEnabled and settings.mcpTransport: a transport mismatch returns 400, and a disabled MCP server returns 503.
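The frames sent to POST /api/mcp/stream are JSON-RPC 2.0 messages. The sketch below follows standard conventions from the MCP specification:

- method names such as initialize and tools/list;
- an Accept header that offers both JSON and SSE responses;
- the Mcp-Session-Id header, which a server may assign during initialization.

How OmniRoute handles sessions beyond those conventions is not described here.

```typescript
// Minimal JSON-RPC 2.0 request frame.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Builds a frame with a monotonically increasing id; params is omitted when not given.
function mcpRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const frame: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) frame.params = params;
  return frame;
}

// Headers for POST /api/mcp/stream: a Bearer key with the mcp scope, plus an
// Accept header offering both JSON and SSE responses, per the Streamable HTTP transport.
function mcpStreamHeaders(apiKey: string, sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return headers;
}
```

Per the MCP specification, a session begins with an initialize request. After that, a call such as `JSON.stringify(mcpRequest("tools/list"))` can be posted with `mcpStreamHeaders(apiKey, sessionId)`.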
A2A Server
OmniRoute exposes an A2A (Agent-to-Agent) JSON-RPC 2.0 endpoint plus a REST wrapper for inspection/dashboard use.
JSON-RPC
POST /a2a
Authorization: Bearer your-api-key # optional unless OMNIROUTE_API_KEY is set
Content-Type: application/json
{
"jsonrpc": "2.0",
"id": 1,
"method": "message/send",
"params": {
"skill": "smart-routing",
"messages": [{"role": "user", "content": "Route this coding task"}]
}
}
Supported methods (all gated on settings.a2aEnabled):
| Method | Description |
|---|---|
| message/send | Synchronous skill execution; returns {task, artifacts, metadata} |
| message/stream | Streaming SSE execution of the same skill set |
| tasks/get | Fetch a task by taskId |
| tasks/cancel | Cancel a task by taskId |
Built-in skills: smart-routing, quota-management, provider-discovery, cost-analysis, health-report.
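Below is a typed sketch of the request bodies for the methods above. The method and skill names come from the tables. The { taskId } params shape for tasks/get is inferred from its description and may not match the server exactly.

```typescript
type A2AMethod = "message/send" | "message/stream" | "tasks/get" | "tasks/cancel";
type A2ASkill =
  | "smart-routing"
  | "quota-management"
  | "provider-discovery"
  | "cost-analysis"
  | "health-report";

interface ChatMessage {
  role: string;
  content: string;
}

// Generic JSON-RPC 2.0 envelope for POST /a2a.
function a2aFrame(id: number, method: A2AMethod, params: Record<string, unknown>) {
  return { jsonrpc: "2.0" as const, id, method, params };
}

// Synchronous skill execution, matching the message/send example above.
function sendSkillMessage(id: number, skill: A2ASkill, messages: ChatMessage[]) {
  return a2aFrame(id, "message/send", { skill, messages });
}

// Task lookup; the { taskId } params shape is an assumption.
function getTask(id: number, taskId: string) {
  return a2aFrame(id, "tasks/get", { taskId });
}
```

With id 1, `sendSkillMessage(1, "smart-routing", [...])` serializes to the same body as the JSON-RPC example above.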
Agent Card
GET /.well-known/agent.json
Returns the public A2A agent card (name, description, capabilities, skill catalog, auth scheme). It is cached publicly for 1h and requires no auth.
REST helpers
| Method | Path | Description |
|---|---|---|
| GET | /api/a2a/status | A2A enabled + task stats + cached agent card summary |
| GET | /api/a2a/tasks | List tasks; filters: ?state=submitted\|working\|completed\|failed\|cancelled, ?skill=, ?limit= (max 200), ?offset= |
| POST | /api/a2a/tasks | (Not implemented as a REST helper — create via JSON-RPC message/send) |
| GET | /api/a2a/tasks/[id] | Retrieve one task |
| POST | /api/a2a/tasks/[id]/cancel | Cancel a task |
Auth: the REST helpers run without management auth (dashboard-readable); the JSON-RPC /a2a route uses Bearer OMNIROUTE_API_KEY if configured.
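The sketch below builds the GET /api/a2a/tasks query with the documented filters. It clamps limit client-side to the documented maximum of 200; whether the server rejects or clamps larger values is not stated here. The lower bounds of 1 for limit and 0 for offset are assumptions.

```typescript
type TaskState = "submitted" | "working" | "completed" | "failed" | "cancelled";

interface TaskQuery {
  state?: TaskState;
  skill?: string;
  limit?: number;
  offset?: number;
}

// Builds the GET /api/a2a/tasks URL with integer, range-checked paging values.
function buildTaskListUrl(baseUrl: string, query: TaskQuery = {}): string {
  const url = new URL("/api/a2a/tasks", baseUrl);
  if (query.state) url.searchParams.set("state", query.state);
  if (query.skill) url.searchParams.set("skill", query.skill);
  if (query.limit !== undefined) {
    // Clamp to [1, 200]; 200 is the documented maximum, 1 is an assumed minimum.
    const limit = Math.min(200, Math.max(1, Math.floor(query.limit)));
    url.searchParams.set("limit", String(limit));
  }
  if (query.offset !== undefined) {
    url.searchParams.set("offset", String(Math.max(0, Math.floor(query.offset))));
  }
  return url.toString();
}
```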
Cloud, Evals & Assess
| Method | Path | Description |
|---|---|---|
| POST | /api/cloud/auth | Verify a Bearer key and return masked provider connections + model aliases for cloud sync clients |
| POST | /api/cloud/credentials/update | Update encrypted credentials for a cloud-synced provider |
| POST | /api/cloud/model/resolve | Resolve a logical model id to a concrete provider/model using the local routing table |
| GET | /api/cloud/models/alias | List model aliases as exposed to cloud sync |
| GET | /api/assess | Read latest assessment categorizations (per-provider/model) |
| POST | /api/assess | Run an assessment; body: {scope: {type:"all"} \| {type:"provider", providerId} \| {type:"model", modelId}, trigger?} |
| GET | /api/evals | List built-in eval suites + most recent runs |
| POST | /api/evals | Trigger an eval run |
| POST | /api/evals/suites | Create a custom eval suite — body validated by evalSuiteSaveSchema |
| GET | /api/evals/suites/[id] | Retrieve a custom eval suite |
Auth: /api/cloud/auth validates a Bearer key directly; the other /api/cloud/*, /api/evals/*, and /api/assess routes require management session/API key. /api/assess POST uses validateBody with a discriminated-union scope schema.
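The assess scope is a discriminated union on type, which maps directly onto a TypeScript union. This sketch mirrors the documented shape. trigger is typed as an optional string because its type is not documented, and the provider id in the example request is illustrative.

```typescript
type AssessScope =
  | { type: "all" }
  | { type: "provider"; providerId: string }
  | { type: "model"; modelId: string };

interface AssessRequest {
  scope: AssessScope;
  trigger?: string; // type assumed; not documented
}

// Human-readable label for a scope. The switch is exhaustive, so adding a
// new variant without handling it becomes a compile-time error.
function describeScope(scope: AssessScope): string {
  switch (scope.type) {
    case "all":
      return "all providers and models";
    case "provider":
      return `provider ${scope.providerId}`;
    case "model":
      return `model ${scope.modelId}`;
    default: {
      const unreachable: never = scope;
      return unreachable;
    }
  }
}

// Example body for POST /api/assess targeting a single provider.
const exampleRequest: AssessRequest = { scope: { type: "provider", providerId: "openai" } };
```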
ACP (Agent Client Protocol) Management
The ACP framework lets you spawn CLI agents (Claude Code, Codex, Gemini CLI, etc.) as child processes. These endpoints manage ACP agent detection and custom agent registration.
| Method | Path | Description |
|---|---|---|
| GET | /api/acp/agents | List all known CLI agents (built-in + custom) with installation status, version, binary |
| POST | /api/acp/agents | Register a custom ACP agent or refresh cache — body: {id, name, binary, versionCommand, providerAlias, spawnArgs, protocol} or {action: "refresh"} |
| DELETE | /api/acp/agents | Remove a custom ACP agent — query param: ?id=<agentId> |
Response example (GET /api/acp/agents):
{
"agents": [
{
"id": "claude",
"name": "Claude Code CLI",
"binary": "claude",
"version": "1.0.45",
"installed": true,
"protocol": "stdio",
"providerAlias": "claude",
"isCustom": false
},
{
"id": "my-custom-cli",
"name": "My Custom CLI",
"installed": false,
"protocol": "stdio",
"providerAlias": "my-provider",
"isCustom": true
}
],
"cacheTtlMs": 60000,
"cacheAge": 1234
}
Auth: Requires management session (dashboard auth_token cookie) or a management-scoped API key.
See ACP Framework for full details.
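Here is a sketch of consuming the GET /api/acp/agents response. The interface is inferred from the example above; binary and version are optional because the custom entry omits them. Treating cacheAge as milliseconds, like cacheTtlMs, is an assumption.

```typescript
interface AcpAgent {
  id: string;
  name: string;
  binary?: string;
  version?: string;
  installed: boolean;
  protocol: string;
  providerAlias: string;
  isCustom: boolean;
}

interface AcpAgentsResponse {
  agents: AcpAgent[];
  cacheTtlMs: number;
  cacheAge: number; // assumed to be in milliseconds
}

// Agents the server currently reports as installed.
function installedAgents(res: AcpAgentsResponse): AcpAgent[] {
  return res.agents.filter((a) => a.installed);
}

// True once the detection cache has reached its TTL; a client might then
// POST {action: "refresh"} to /api/acp/agents.
function isCacheStale(res: AcpAgentsResponse): boolean {
  return res.cacheAge >= res.cacheTtlMs;
}
```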
Analytics & Observability
Real-time analytics endpoints for monitoring routing, compression, and provider
diversity. These power the /dashboard/analytics/* pages.
Auto-routing analytics
| Method | Path | Description |
|---|---|---|
| GET | /api/analytics/auto-routing | Aggregate auto-routing stats: total calls, strategy distribution, tier distribution, top providers |
| GET | /api/analytics/auto-routing?days=7 | Same stats over a window of ?days= days (without it, the window is 24h) |
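Dashboards usually show the breakdowns as percentages. The helper below assumes the count-map shape shown in the response example that follows (strategyBreakdown, tierBreakdown). Rounding to one decimal place is a presentation choice.

```typescript
// Converts a count breakdown into percentage shares rounded to one decimal place.
function toShares(breakdown: Record<string, number>): Record<string, number> {
  const total = Object.values(breakdown).reduce((sum, n) => sum + n, 0);
  const shares: Record<string, number> = {};
  for (const [key, count] of Object.entries(breakdown)) {
    // Guard against an empty window where every count is zero.
    shares[key] = total === 0 ? 0 : Math.round((count / total) * 1000) / 10;
  }
  return shares;
}
```

Because each share is rounded independently, the shares may not sum to exactly 100.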
Response example:
{
"window": "24h",
"totalCalls": 1234,
"strategyBreakdown": {
"rules": 800,
"cost": 200,
"latency": 150,
"sla-aware": 50,
"lkgp": 34
},
"tierBreakdown": {
"ultra": 100,
"pro": 500,
"standard": 400,
"free": 234
},
"topProviders": [
{ "provider": "openai", "calls": 500, "avgLatencyMs": 850 },
{ "provider": "anthropic", "calls": 300, "avgLatencyMs": 1200 }
]
}
Compression analytics
| Method | Path | Description |
|---|---|---|
| GET | /api/analytics/compression | Aggregate compression stats: tokens saved, savings %, mode distribution, engine usage |
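The totals in the response are related by simple arithmetic. This sketch recomputes tokens saved and the savings percentage from totalOriginalTokens and totalCompressedTokens. Rounding to one decimal place is a presentation choice, not documented server behavior.

```typescript
interface CompressionTotals {
  totalOriginalTokens: number;
  totalCompressedTokens: number;
}

// saved = original - compressed; pct = saved / original * 100 (0 when nothing was sent).
function compressionSavings(t: CompressionTotals): { saved: number; pct: number } {
  const saved = t.totalOriginalTokens - t.totalCompressedTokens;
  const pct = t.totalOriginalTokens === 0 ? 0 : (saved * 100) / t.totalOriginalTokens;
  return { saved, pct: Math.round(pct * 10) / 10 };
}
```

With the example values below (5,000,000 original, 3,500,000 compressed), this yields 1,500,000 tokens saved and 30%, matching totalSavings and savingsPct.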
Response example:
{
"window": "24h",
"totalOriginalTokens": 5000000,
"totalCompressedTokens": 3500000,
"totalSavings": 1500000,
"savingsPct": 30.0,
"modeBreakdown": {
"lite": 400,
"standard": 600,
"aggressive": 100,
"ultra": 50,
"rtk": 84
},
"engineBreakdown": {
"caveman": 800,
"rtk": 434
}
}
Provider diversity tracking
| Method | Path | Description |
|---|---|---|
| GET | /api/analytics/diversity | Shannon entropy-based diversity tracking: prevents single points of failure by measuring provider spread |
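Below is a sketch of Shannon-entropy diversity over a provider-usage map: H = -sum(p * log2(p)), normalized by log2(n). Two choices here are assumptions: the log base, and taking n as the number of providers with non-zero usage. The sketch may therefore not reproduce OmniRoute's figures. For the four shares in the example below it gives about 1.90 bits against a maximum of 2, which suggests the server normalizes over a different provider set.

```typescript
// Accepts raw counts or shares; values are normalized before computing entropy.
function diversity(usage: Record<string, number>): { entropy: number; maxEntropy: number; ratio: number } {
  const values = Object.values(usage).filter((v) => v > 0);
  const total = values.reduce((s, v) => s + v, 0);
  let entropy = 0;
  for (const v of values) {
    const p = v / total;
    entropy -= p * Math.log2(p); // entropy in bits
  }
  // Maximum entropy is reached when all n providers get equal traffic.
  const maxEntropy = values.length > 1 ? Math.log2(values.length) : 0;
  const ratio = maxEntropy === 0 ? 0 : entropy / maxEntropy;
  return { entropy, maxEntropy, ratio };
}
```

A ratio near 1 means traffic is spread evenly. A ratio near 0 means one provider dominates, which is the single-point-of-failure case the endpoint warns about.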
Response example:
{
"window": "24h",
"shannonEntropy": 2.45,
"maxEntropy": 3.17,
"diversityRatio": 0.77,
"providerUsage": {
"openai": 0.40,
"anthropic": 0.25,
"google": 0.20,
"kiro": 0.15
},
"warnings": [
"OpenAI accounts for 40% of traffic — consider diversifying"
]
}
Auth: Requires management session or management-scoped API key.
Admin Operations
Admin-only endpoints for operational management.
| Method | Path | Description |
|---|---|---|
| GET | /api/admin/concurrency | Read current concurrency limits (global + per-provider) |
| POST | /api/admin/concurrency | Update concurrency limits — body: {global?: number, perProvider?: Record<string, number>} |
Auth: Requires management session with admin scope.
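Here is a client-side check for the POST /api/admin/concurrency body. Requiring positive integers and rejecting an empty update are assumptions; the server's actual bounds are not documented here.

```typescript
interface ConcurrencyUpdate {
  global?: number;
  perProvider?: Record<string, number>;
}

// Returns human-readable errors; an empty array means the update passed these checks.
function validateConcurrency(update: ConcurrencyUpdate): string[] {
  const errors: string[] = [];
  const isPositiveInt = (n: number) => Number.isInteger(n) && n > 0; // assumed rule
  if (update.global !== undefined && !isPositiveInt(update.global)) {
    errors.push("global must be a positive integer");
  }
  for (const [provider, limit] of Object.entries(update.perProvider ?? {})) {
    if (!isPositiveInt(limit)) errors.push(`perProvider.${provider} must be a positive integer`);
  }
  if (update.global === undefined && update.perProvider === undefined) {
    errors.push("nothing to update");
  }
  return errors;
}
```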
CLI Tools Management
Manage CLI tools that integrate with OmniRoute (antigravity, chipotle, commandCode, devin-cli, etc.). See Provider Reference for the full list.
| Method | Path | Description |
|---|---|---|
| GET | /api/cli-tools/all-statuses | Status of all CLI tools (installed, version, last seen) |
| GET | /api/cli-tools/[id]/status | Status of a specific CLI tool (id can be: antigravity, chipotle, commandCode, devin-cli, etc.) |
| POST | /api/cli-tools/apply | Apply a CLI tool configuration to a provider connection |
| GET | /api/cli-tools/backups | List CLI tool configuration backups |
| POST | /api/cli-tools/backups | Create a backup of all CLI tool configurations |
| POST | /api/cli-tools/[id]/restore | Restore a CLI tool from a backup |
| GET | /api/cli-tools/antigravity-mitm | Antigravity MITM proxy status (the "antigravity-mitm" CLI tool) |
| POST | /api/cli-tools/antigravity-mitm/alias | Configure antigravity-mitm aliases |
Auth: Requires management session.
Agent Skills
Manage AI agent skills (similar to OpenAI's custom GPTs but for agents).
| Method | Path | Description |
|---|---|---|
| GET | /api/agent-skills | List all agent skills (built-in + custom) |
| GET | /api/agent-skills/[id] | Get a specific agent skill |
| POST | /api/agent-skills | Create a custom agent skill — body: {name, description, prompt, model?, temperature?} |
| PUT | /api/agent-skills/[id] | Update a custom agent skill |
| DELETE | /api/agent-skills/[id] | Delete a custom agent skill |
| GET | /api/agent-skills/[id]/raw | Get raw prompt + metadata (no execution) |
| POST | /api/agent-skills/generate | AI-generate a new skill from a natural language description |
Auth: Requires management session or management-scoped API key.
Cache Management
Manage the semantic cache and reasoning cache.
| Method | Path | Description |
|---|---|---|
| GET | /api/cache | Cache overview: total entries, hit rate, size on disk |
| GET | /api/cache/entries | List cached entries (with pagination) |
| DELETE | /api/cache/entries | Delete cache entries (filter by query parameters) |
| GET | /api/cache/stats | Detailed cache statistics (per-provider, per-model) |
| GET | /api/cache/reasoning | Reasoning cache status (for reasoning replay) |
| DELETE | /api/cache/reasoning | Clear the reasoning cache: ?toolCallId=<id> clears a single entry, ?provider=<p> clears by provider, and no params clears everything |
Auth: Requires management session.
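The three DELETE /api/cache/reasoning modes can be modeled as a union, so a caller cannot combine them by accident. The kind tag below is a hypothetical client-side name; only the toolCallId and provider query parameters come from the table.

```typescript
// Hypothetical client-side model of the three documented clear modes.
type ReasoningCacheTarget =
  | { kind: "toolCall"; toolCallId: string }
  | { kind: "provider"; provider: string }
  | { kind: "all" };

// Builds the DELETE URL; the "all" mode sends no query parameters.
function reasoningCacheDeleteUrl(baseUrl: string, target: ReasoningCacheTarget): string {
  const url = new URL("/api/cache/reasoning", baseUrl);
  if (target.kind === "toolCall") url.searchParams.set("toolCallId", target.toolCallId);
  else if (target.kind === "provider") url.searchParams.set("provider", target.provider);
  return url.toString();
}
```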
Memory System
Manage persistent memory (FTS5 + vector embeddings).
| Method | Path | Description |
|---|---|---|
| GET | /api/memory | List memory entries (filter by scope, type, search query) |
| POST | /api/memory | Create a new memory entry — body: {scope, type, content, metadata?} |
| GET | /api/memory/[id] | Get a specific memory entry |
| PUT | /api/memory/[id] | Update a memory entry |
| DELETE | /api/memory/[id] | Delete a memory entry |
| GET | /api/memory/search | Search memory (FTS5 + vector) |
| POST | /api/memory/clear | Clear memory entries (with filters) |
| GET | /api/memory/stats | Memory statistics (total entries, embedding coverage, etc.) |
Auth: Requires management session or management-scoped API key.
Webhooks
Manage webhook subscriptions for events.
| Method | Path | Description |
|---|---|---|
| GET | /api/webhooks | List all webhook subscriptions |
| POST | /api/webhooks | Create a webhook subscription — body: {url, events[], secret?, active?} |
| GET | /api/webhooks/[id] | Get a specific webhook subscription |
| PUT | /api/webhooks/[id] | Update a webhook subscription |
| DELETE | /api/webhooks/[id] | Delete a webhook subscription |
| GET | /api/webhooks/events | List all available webhook event types |
| GET | /api/webhooks/[id]/deliveries | List delivery history for a webhook (success/failure log) |
| POST | /api/webhooks/[id]/test | Send a test event to a webhook |
Auth: Requires management session.
See Webhooks Framework for full event types.
Skills Framework
Manage Skills (the agentic extensions framework).
| Method | Path | Description |
|---|---|---|
| GET | /api/skills | List all installed skills (built-in + custom) |
| POST | /api/skills/install | Install a skill from a local path or URL |
| DELETE | /api/skills/[id] | Uninstall a skill |
| PUT | /api/skills/[id] | Enable or disable a skill; body: {enabled?: boolean, mode?: "on" \| "off" \| "auto"} |
| POST | /api/skills/executions | Execute a skill — body: {skillName, apiKeyId, input?, sessionId?} |
| GET | /api/skills/executions | List execution history for all skills (filter by ?apiKeyId=) |
Auth: Requires management session or management-scoped API key.
See Skills Framework for full details.
Plugins
Manage OmniRoute plugins (third-party extensions).
| Method | Path | Description |
|---|---|---|
| GET | /api/plugins | List installed plugins |
| POST | /api/plugins/install | Install a plugin from a local path or URL |
| DELETE | /api/plugins/[name] | Uninstall a plugin |
| POST | /api/plugins/[name]/activate | Activate a plugin |
| POST | /api/plugins/[name]/deactivate | Deactivate a plugin |
| GET | /api/plugins/[name]/config | Get plugin configuration |
| PUT | /api/plugins/[name]/config | Update plugin configuration |
Auth: Requires management session.
See Plugins Framework for full details.
Shadow Routing
Shadow / A-B comparison of providers is not a standalone REST surface — it is configured through combo routing (see Auto-Combo). Per-combo comparison metrics are served by GET /api/combos/metrics.
Guardrails
Inspect the runtime guardrails (PII detection, prompt injection detection, vision bridging). Guardrails run on every request; per-call opt-out is via the x-omniroute-disabled-guardrails request header — there is no persisted enable/disable surface.
| Method | Path | Description |
|---|---|---|
| GET | /api/guardrails | List the registered guardrails and their status (name / enabled / priority) |
| POST | /api/guardrails/test | Dry-run the pre-call pipeline over a sample input — body: {input, disabledGuardrails?} |
Auth: Requires management session.
See Security > Guardrails for full details.
Authentication
- Dashboard routes (/dashboard/*) use the auth_token cookie
- Login uses the saved password hash, falling back to INITIAL_PASSWORD
- requireLogin is toggleable via /api/settings/require-login
- /v1/* routes optionally require a Bearer API key when REQUIRE_API_KEY=true
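The sketch below attaches the two credential types listed above from a non-browser client; browsers do not let scripts set the Cookie header. The Credentials interface is hypothetical. Only the auth_token cookie name and the Bearer scheme come from this section.

```typescript
// Hypothetical container for the two documented credential types.
interface Credentials {
  apiKey?: string; // Bearer key for /v1/* (required when REQUIRE_API_KEY=true)
  authToken?: string; // value of the dashboard auth_token cookie
}

// Builds request headers for whichever credentials are present.
function authHeaders(creds: Credentials): Record<string, string> {
  const headers: Record<string, string> = {};
  if (creds.apiKey) headers["Authorization"] = `Bearer ${creds.apiKey}`;
  if (creds.authToken) headers["Cookie"] = `auth_token=${creds.authToken}`;
  return headers;
}
```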
Breaking change (v3.8.0): /api/v1/agents/tasks/* and the cooldown management endpoints now require management auth (dashboard auth_token cookie or a management-scoped API key). Clients that previously called these routes unauthenticated will receive 401 Unauthorized. See commit 588a0333 (fix(auth): require management auth for agent and cooldown APIs).