# ADR-002: LiteLLM Provider Abstraction
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-03-11 |
| Decision Makers | CA (lead), PO, MEE, IE |
## Context
xOps BC1 calls AI models from two distinct paths:

- Open WebUI (L6): Uses `ANTHROPIC_API_KEY` for direct native Anthropic integration (chat, embeddings, RAG)
- FastAPI + CrewAI (L5): Uses LiteLLM as a gateway for all AI calls (crews, flows, batch jobs)
The platform must support provider changes (Claude → Bedrock → Ollama) without code changes, and maintain consistent behavior across LOCAL/TEST/SIT/PROD environments.
## Decision
Use LiteLLM as the provider abstraction layer for FastAPI+CrewAI (L5). Open WebUI (L6) uses its native Anthropic integration directly.
This two-path architecture means:
- L6 (Open WebUI): `ANTHROPIC_API_KEY` env var → native Claude features (streaming, tool use, vision)
- L5 (FastAPI+CrewAI): `LITELLM_MODEL` env var → LiteLLM gateway → any supported provider
Both paths evolve via environment variables. Zero code change for provider switches.
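On the L5 path, a provider switch reduces to a single environment lookup: the resolved model string is handed to LiteLLM unchanged. A minimal sketch of that lookup; the `DEFAULT_MODEL` value and the Bedrock model identifier are illustrative assumptions, not deployed config:

```python
import os

# Hypothetical default; the actual BC1 model string lives in deployment config.
DEFAULT_MODEL = "anthropic/claude-3-5-sonnet-20240620"


def resolve_model() -> str:
    """Return the LiteLLM model string for the current environment.

    Provider switches (Claude -> Bedrock -> Ollama) are env-var changes only:
    L5 code passes this string to litellm.completion() without modification.
    """
    return os.environ.get("LITELLM_MODEL", DEFAULT_MODEL)


# A config change, not a code change: point L5 at Bedrock.
os.environ["LITELLM_MODEL"] = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
assert resolve_model().startswith("bedrock/")
```

The calling code never branches on provider; everything provider-specific lives in the environment.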
## Consequences

### Gains
- Config-not-code provider switches: `LITELLM_MODEL=bedrock/...` for sovereignty
- Consistent behavior across all environments (same model, same prompts)
- Multi-provider failover via LiteLLM fallback config
- Cost optimization: Prompt Caching (60-80% savings), Batch API (50% savings), intelligent routing (~30%)
- No vendor lock-in: API keys are the only provider-specific configuration
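LiteLLM handles failover through its router/fallback configuration; the pattern it applies can be illustrated with a provider-agnostic sketch. The model strings, chain ordering, and the `flaky_call` stand-in below are illustrative, not LiteLLM's actual API:

```python
from typing import Callable


def complete_with_fallback(prompt: str, chain: list[str],
                           call: Callable[[str, str], str]) -> str:
    """Try each provider's model string in order until one succeeds."""
    errors = []
    for model in chain:
        try:
            return call(model, prompt)
        except Exception as exc:  # a real router would narrow this
            errors.append((model, exc))
    raise RuntimeError(f"all providers failed: {errors}")


# Illustrative fallback chain (model identifiers are assumptions):
CHAIN = [
    "anthropic/claude-3-5-sonnet-20240620",
    "azure/gpt-4o",
    "ollama/llama3",
]


def flaky_call(model: str, prompt: str) -> str:
    """Stand-in for a real completion call; the primary 'fails' here."""
    if model.startswith("anthropic/"):
        raise TimeoutError("primary unavailable")
    return f"{model}: ok"


print(complete_with_fallback("ping", CHAIN, flaky_call))  # → azure/gpt-4o: ok
```

In production the chain lives in LiteLLM config, so reordering providers is again a config change, not a code change.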
### Losses
- Two configuration surfaces: Open WebUI and LiteLLM configured separately
- LiteLLM dependency: Additional Python package in L5 container
- Feature parity risk: Not all LiteLLM-supported providers have identical feature sets (e.g., Ollama lacks native tool use)
## Provider Evolution Path
```
BC1:  LiteLLM → Claude API direct       (golden path — simplest, best quality)
        ↓ config change (zero code change)
BC2+: LiteLLM → Bedrock VPC endpoint    (sovereignty)
        ↓ config change (zero code change)
BC2+: LiteLLM → OpenAI / Azure OpenAI   (redundancy)
        ↓ config change (zero code change)
BC2+: LiteLLM → Ollama local            (privacy + cost at scale)
```
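Each step on the path above is one deployment-variable change. A sketch of that mapping; the stage labels and model identifiers are illustrative assumptions:

```python
# Illustrative mapping of the evolution path to LITELLM_MODEL values
# (stage names and model identifiers are assumptions, not deployed config):
EVOLUTION = {
    "BC1-golden":      "anthropic/claude-3-5-sonnet-20240620",
    "BC2-sovereignty": "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
    "BC2-redundancy":  "azure/gpt-4o",
    "BC2-privacy":     "ollama/llama3",
}


def env_line(stage: str) -> str:
    """Render the single deployment change each step requires."""
    return f"LITELLM_MODEL={EVOLUTION[stage]}"


print(env_line("BC2-sovereignty"))
```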
## Alternatives Considered
| Alternative | Verdict | When to Reconsider |
|---|---|---|
| Direct Bedrock API | VPC endpoint complexity unnecessary at BC1 | Sovereignty mandate (APRA CPS 234 finding) |
| Direct OpenAI API | No abstraction = vendor lock-in | Never — LiteLLM wraps OpenAI natively |
| Ollama only | Quality gap vs Claude/GPT-4o for complex reasoning | Privacy mandate + cost at scale (>100 users) |
| Route all through LiteLLM | Loses Open WebUI native Anthropic features | If Open WebUI drops Anthropic native support |
Source: xops.jsx ORCH[].strengths, LAYERS[id=5].features
## Agent Scores
| Agent | Score | Key Rationale |
|---|---|---|
| PO | 95% | Config-change story removes vendor lock-in objection |
| CA | 96% | Two-path architecture correctly separates UI and API concerns |
| MEE | 94% | LiteLLM = standard ADLC provider abstraction pattern |
| IE | 97% | Environment variable deployment = zero Terraform complexity |