# ADR-002: LiteLLM Provider Abstraction
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-03-11 |
| Decision Makers | CA (lead), PO, MEE, IE |
## Context
xOps BC1 calls AI models from two distinct paths:

- Open WebUI (L6): Uses `ANTHROPIC_API_KEY` for direct native Anthropic integration (chat, embeddings, RAG)
- FastAPI + CrewAI (L5): Uses LiteLLM as a gateway for all AI calls (crews, flows, batch jobs)
The platform must support provider changes (Claude → Bedrock → Ollama) without code changes, and maintain consistent behavior across LOCAL/TEST/SIT/PROD environments.
## Decision
Use LiteLLM as the provider abstraction layer for FastAPI+CrewAI (L5). Open WebUI (L6) uses its native Anthropic integration directly.
This two-path architecture means:
- L6 (Open WebUI): `ANTHROPIC_API_KEY` env var → native Claude features (streaming, tool use, vision)
- L5 (FastAPI+CrewAI): `LITELLM_MODEL` env var → LiteLLM gateway → any supported provider
Both paths evolve via environment variables. Zero code change for provider switches.
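On the L5 path, a provider switch reduces to a single environment lookup: the resolved model string is handed to LiteLLM unchanged. A minimal sketch of that lookup; the `DEFAULT_MODEL` value and the Bedrock model identifier are illustrative assumptions, not deployed config:

```python
import os

# Hypothetical default; the actual BC1 model string lives in deployment config.
DEFAULT_MODEL = "anthropic/claude-3-5-sonnet-20240620"


def resolve_model() -> str:
    """Return the LiteLLM model string for the current environment.

    Provider switches (Claude -> Bedrock -> Ollama) are env-var changes only:
    L5 code passes this string to litellm.completion() without modification.
    """
    return os.environ.get("LITELLM_MODEL", DEFAULT_MODEL)


# A config change, not a code change: point L5 at Bedrock.
os.environ["LITELLM_MODEL"] = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
assert resolve_model().startswith("bedrock/")
```

The calling code never branches on provider; everything provider-specific lives in the environment.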
## Consequences

### Gains
- Config-not-code provider switches: `LITELLM_MODEL=bedrock/...` for sovereignty
- Consistent behavior across all environments (same model, same prompts)
- Multi-provider failover via LiteLLM fallback config
- Cost optimization: Prompt Caching (60-80% savings), Batch API (50% savings), intelligent routing (~30%)
- No vendor lock-in: API keys are the only provider-specific configuration
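LiteLLM handles failover through its router/fallback configuration; the pattern it applies can be illustrated with a provider-agnostic sketch. The model strings, chain ordering, and the `flaky_call` stand-in below are illustrative, not LiteLLM's actual API:

```python
from typing import Callable


def complete_with_fallback(prompt: str, chain: list[str],
                           call: Callable[[str, str], str]) -> str:
    """Try each provider's model string in order until one succeeds."""
    errors = []
    for model in chain:
        try:
            return call(model, prompt)
        except Exception as exc:  # a real router would narrow this
            errors.append((model, exc))
    raise RuntimeError(f"all providers failed: {errors}")


# Illustrative fallback chain (model identifiers are assumptions):
CHAIN = [
    "anthropic/claude-3-5-sonnet-20240620",
    "azure/gpt-4o",
    "ollama/llama3",
]


def flaky_call(model: str, prompt: str) -> str:
    """Stand-in for a real completion call; the primary 'fails' here."""
    if model.startswith("anthropic/"):
        raise TimeoutError("primary unavailable")
    return f"{model}: ok"


print(complete_with_fallback("ping", CHAIN, flaky_call))  # → azure/gpt-4o: ok
```

In production the chain lives in LiteLLM config, so reordering providers is again a config change, not a code change.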
### Losses
- Two configuration surfaces: Open WebUI and LiteLLM configured separately
- LiteLLM dependency: Additional Python package in L5 container
- Feature parity risk: Not all LiteLLM-supported providers have identical feature sets (e.g., Ollama lacks native tool use)
## Provider Evolution Path
```
BC1:  LiteLLM → Claude API direct       (golden path — simplest, best quality)
        ↓ config change (zero code change)
BC2+: LiteLLM → Bedrock VPC endpoint    (sovereignty)
        ↓ config change (zero code change)
BC2+: LiteLLM → OpenAI / Azure OpenAI   (redundancy)
        ↓ config change (zero code change)
BC2+: LiteLLM → Ollama local            (privacy + cost at scale)
```
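Each step on the path above is one deployment-variable change. A sketch of that mapping; the stage labels and model identifiers are illustrative assumptions:

```python
# Illustrative mapping of the evolution path to LITELLM_MODEL values
# (stage names and model identifiers are assumptions, not deployed config):
EVOLUTION = {
    "BC1-golden":      "anthropic/claude-3-5-sonnet-20240620",
    "BC2-sovereignty": "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    "BC2-redundancy":  "azure/gpt-4o",
    "BC2-privacy":     "ollama/llama3",
}


def env_line(stage: str) -> str:
    """Render the single deployment change each step requires."""
    return f"LITELLM_MODEL={EVOLUTION[stage]}"


print(env_line("BC2-sovereignty"))
```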
## Alternatives Considered
| Alternative | Verdict | When to Reconsider |
|---|---|---|
| Direct Bedrock API | VPC endpoint complexity unnecessary at BC1 | Sovereignty mandate (APRA CPS 234 finding) |
| Direct OpenAI API | No abstraction = vendor lock-in | Never — LiteLLM wraps OpenAI natively |
| Ollama only | Quality gap vs Claude/GPT-4o for complex reasoning | Privacy mandate + cost at scale (>100 users) |
| Route all through LiteLLM | Loses Open WebUI native Anthropic features | If Open WebUI drops Anthropic native support |
Source: xops.jsx ORCH[].strengths, LAYERS[id=5].features
## Agent Scores
| Agent | Score | Key Rationale |
|---|---|---|
| PO | 95% | Config-change story removes vendor lock-in objection |
| CA | 96% | Two-path architecture correctly separates UI and API concerns |
| MEE | 94% | LiteLLM = standard ADLC provider abstraction pattern |
| IE | 97% | Environment variable deployment = zero Terraform complexity |