A unified model gateway that routes tasks to the right AI provider based on complexity. A lightweight judge model classifies each task, then routes it to the cheapest capable tier — delivering frontier-quality results on complex work while running cheap models on everything else. Target: 60–90% cost savings vs. naive all-frontier routing.
Every task enters through the Gateway interface. The Judge-Router classifies complexity,
the config maps tiers to concrete models, the LiteLLM client dispatches via provider adapters,
and cost is recorded on every response.
The judge model classifies every task into one of three tiers. Each tier has a primary model and a fallback chain
configured in config/models.yaml.
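A sketch of what that configuration could look like. The keys, model IDs, and prices below are illustrative assumptions, not the project's actual schema:

```yaml
# Illustrative shape only; field names, model IDs, and prices are assumptions.
tiers:
  simple:
    primary: ollama/llama3.1-8b
    fallbacks: [openai/gpt-4o-mini]
  medium:
    primary: openai/gpt-4o-mini
    fallbacks: [anthropic/claude-3-5-haiku]
  complex:
    primary: anthropic/claude-sonnet-4
    fallbacks: [openai/gpt-4o]
costs:
  openai/gpt-4o-mini: {input_per_mtok: 0.15, output_per_mtok: 0.60}
```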
Five features cover the full gateway stack. Each has its own spec, branch, and implementation.
The core package defines the Gateway, Router, Completer, and CostTracker interfaces plus all request/response types. No business logic, no I/O — pure type definitions. The LiteLLM client dispatches every request to /chat/completions on the LiteLLM proxy.
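The type layer might be sketched as follows. The interface names and method signatures match the table below; the struct fields are assumptions for illustration:

```go
// Sketch of the pure type layer: no business logic, no I/O.
// Interface names and signatures follow the spec; struct fields are assumed.
package main

import "fmt"

// ModelTier labels the capability class chosen by the router.
type ModelTier string

const (
	TierSimple  ModelTier = "simple"
	TierMedium  ModelTier = "medium"
	TierComplex ModelTier = "complex"
)

// TaskSpec is what the Router classifies (illustrative fields).
type TaskSpec struct {
	Prompt string
}

// CompletionRequest/Response are the canonical wire types (illustrative fields).
type CompletionRequest struct {
	Model    string
	Messages []string
}

type CompletionResponse struct {
	Content string
	CostUSD float64
}

// Router maps a task to a capability tier.
type Router interface {
	Route(TaskSpec) (ModelTier, error)
}

// Completer executes one completion against a provider.
type Completer interface {
	Complete(CompletionRequest) (CompletionResponse, error)
}

func main() {
	fmt.Println(TierComplex) // the judge would pick this tier for hard tasks
}
```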
Provider adapters translate the canonical request into each API’s format:
Anthropic clamps temperature ≤1.0, OpenAI zeros temperature for reasoning models,
Ollama strips the ollama/ prefix.
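Two of those adapter rules, sketched as standalone helpers. The function names are illustrative, not the project's actual API:

```go
// Minimal sketch of two adapter rules; helper names are assumptions.
package main

import (
	"fmt"
	"strings"
)

// Anthropic accepts temperature in [0, 1], so out-of-range values are clamped.
func clampAnthropicTemperature(t float64) float64 {
	if t > 1.0 {
		return 1.0
	}
	return t
}

// The "ollama/" routing prefix is internal; strip it before calling Ollama.
func stripOllamaPrefix(model string) string {
	return strings.TrimPrefix(model, "ollama/")
}

func main() {
	fmt.Println(clampAnthropicTemperature(1.7)) // 1
	fmt.Println(stripOllamaPrefix("ollama/llama3.1"))
}
```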
Configuration lives in config/models.yaml: tier definitions, provider settings,
fallback ordering, and cost tables. FallbackCompleter retries with the next
provider in the chain on failure. The config is validated at startup — no hot-reload.
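The fallback behaviour can be sketched like this, with stub providers standing in for real adapters. Only the Completer signature comes from the spec; everything else is a demo assumption:

```go
// Sketch of FallbackCompleter: try each provider in configured order
// until one succeeds. Stub providers below are invented for the demo.
package main

import (
	"errors"
	"fmt"
)

type CompletionRequest struct{ Prompt string }
type CompletionResponse struct{ Content string }

type Completer interface {
	Complete(CompletionRequest) (CompletionResponse, error)
}

// FallbackCompleter walks the chain and returns the first success.
type FallbackCompleter struct {
	Chain []Completer
}

func (f FallbackCompleter) Complete(req CompletionRequest) (CompletionResponse, error) {
	var lastErr error
	for _, c := range f.Chain {
		resp, err := c.Complete(req)
		if err == nil {
			return resp, nil
		}
		lastErr = err
	}
	return CompletionResponse{}, fmt.Errorf("all providers failed: %w", lastErr)
}

// Stubs: one provider that always fails, one that always succeeds.
type failing struct{}

func (failing) Complete(CompletionRequest) (CompletionResponse, error) {
	return CompletionResponse{}, errors.New("rate limited")
}

type ok struct{}

func (ok) Complete(CompletionRequest) (CompletionResponse, error) {
	return CompletionResponse{Content: "done"}, nil
}

func main() {
	fc := FallbackCompleter{Chain: []Completer{failing{}, ok{}}}
	resp, err := fc.Complete(CompletionRequest{Prompt: "hi"})
	fmt.Println(resp.Content, err) // done <nil>
}
```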
InMemoryCostTracker records every call behind a sync.RWMutex and produces the cost reports that prove the 60–90% savings claim.
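A minimal sketch of such a tracker. The Total() helper is an assumption for illustration; the real Report() takes a TimePeriod:

```go
// Sketch of an in-memory cost tracker guarded by sync.RWMutex.
// Record matches the spec's shape; Total is a stand-in for Report.
package main

import (
	"fmt"
	"sync"
)

type CostRecord struct {
	Model   string
	CostUSD float64
}

type InMemoryCostTracker struct {
	mu      sync.RWMutex
	records []CostRecord
}

// Record appends under the write lock, so concurrent request paths are safe.
func (t *InMemoryCostTracker) Record(r CostRecord) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.records = append(t.records, r)
}

// Total reads under the shared lock; many reporters can run concurrently.
func (t *InMemoryCostTracker) Total() float64 {
	t.mu.RLock()
	defer t.mu.RUnlock()
	var sum float64
	for _, r := range t.records {
		sum += r.CostUSD
	}
	return sum
}

func main() {
	t := &InMemoryCostTracker{}
	t.Record(CostRecord{Model: "cheap", CostUSD: 0.001})
	t.Record(CostRecord{Model: "frontier", CostUSD: 0.05})
	fmt.Printf("%.3f\n", t.Total()) // 0.051
}
```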
| Interface | Methods | Implementor | Package |
|---|---|---|---|
| Gateway | `Route()` `Complete()` `GetCostReport()` | Top-level facade | internal/gateway |
| Router | `Route(TaskSpec) (ModelTier, error)` | JudgeRouter | internal/gateway |
| Completer | `Complete(CompletionRequest) (CompletionResponse, error)` | LiteLLMClient, FallbackCompleter | internal/gateway |
| CostTracker | `Record(CostRecord)` `Report(TimePeriod)` | InMemoryCostTracker | internal/gateway |
| FormatAdapter | `Name()` `FormatRequest()` `ParseModelName()` | Anthropic, OpenAI, Ollama | internal/gateway/providers |
| Risk | Likelihood | Mitigation |
|---|---|---|
| Judge model misclassifies task complexity | Medium | Log routing decisions; human override flag; tune with feedback loop |
| LiteLLM proxy adds ~10ms latency per call | Low | Acceptable for agent workloads; batch mode amortises for bulk |
| DeepSeek pricing varies by cache hit/miss (10×) | Medium | Track cache-hit rate separately; alert on unexpected cost spikes |
| Provider adapter misformats requests | Low | Each adapter has dedicated unit tests; roundtrip integration tests in T9 |