ADR-004: 2 Docker Services, Not Microservices

| Field | Value |
| --- | --- |
| Status | Accepted |
| Date | 2026-03-11 |
| Decision Makers | CA (lead), PO, MEE, IE |

Context

The initial xOps architecture proposal included 8+ services: Open WebUI, FastAPI, CrewAI, Ollama, ChromaDB, Redis/Valkey, analytics, and monitoring. Through iterative, HITL-driven KISS/5S simplification (50+ edits across 2 sessions), the architecture was reduced to 2 Docker services.

The HITL principle: "Every service must pass the BC1 NEED test." If a service can be a library import, a built-in feature, or an optional profile, it is not a standalone service.
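The NEED test can be read as a small classification rule. The sketch below is one possible reading, not the team's actual tooling: the `Candidate` fields, the `Placement` names, and the order of the checks are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    LIBRARY = "library import"
    BUILT_IN = "built-in feature"
    OPTIONAL_PROFILE = "optional profile"
    STANDALONE = "standalone service"


@dataclass(frozen=True)
class Candidate:
    # Hypothetical fields; the ADR does not define a schema for the NEED test.
    name: str
    importable_as_library: bool = False  # can run in-process as a dependency
    covered_by_builtin: bool = False     # an existing service already provides it
    needed_by_default: bool = True       # required in the default deployment


def need_test(candidate: Candidate) -> Placement:
    """Return the lightest placement that still covers the candidate."""
    if candidate.importable_as_library:
        return Placement.LIBRARY
    if candidate.covered_by_builtin:
        return Placement.BUILT_IN
    if not candidate.needed_by_default:
        return Placement.OPTIONAL_PROFILE
    return Placement.STANDALONE


if __name__ == "__main__":
    for c in (
        Candidate("ChromaDB server", importable_as_library=True),
        Candidate("Keycloak", covered_by_builtin=True),
        Candidate("Ollama", needed_by_default=False),
        Candidate("Open WebUI"),
    ):
        print(f"{c.name}: {need_test(c).value}")
```

Only a candidate that fails all three lighter options ends up as a standalone service, which is how the rule keeps the default service count low.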

Decision

2 docker services for BC1:

  1. openwebui — Open WebUI 0.8+ (L6: Interface + built-in RAG + SQLite)
  2. fastapi-crewai — FastAPI + CrewAI + CloudOps-Runbooks (L5: API + crews + MCP)

Plus 1 optional profile:

  • ollama — Ollama for local LLM inference (docker compose --profile ollama up)

The KISS/5S Journey

| Original Service | KISS Decision | Where It Went |
| --- | --- | --- |
| Open WebUI | KEEP | Service 1 (core) |
| FastAPI + CrewAI | MERGE into 1 container | Service 2 (API + crews) |
| Ollama | OPTIONAL PROFILE | --profile ollama (not default) |
| ChromaDB server | REMOVED | Built into CrewAI Knowledge (library, not server) |
| Redis/Valkey | REMOVED | ALB sticky sessions sufficient at BC1 |
| Analytics server | REMOVED | File-based JSON/CSV in tmp/ |
| Monitoring sidecar | REMOVED | CloudWatch Application Signals (managed) |
| Keycloak | REMOVED | Open WebUI built-in auth + IAM Identity Center |

Result: 8 services → 2 services. 75% fewer containers. Same functionality.
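The 75% figure can be checked directly against the table. The sketch below copies the eight rows and their decisions; counting only KEEP and MERGE rows as default services is an assumption about how the result was tallied, and the `DECISIONS` name is illustrative.

```python
# Decisions copied from the KISS/5S table; keys are the original services.
DECISIONS = {
    "Open WebUI": "KEEP",
    "FastAPI + CrewAI": "MERGE",
    "Ollama": "OPTIONAL PROFILE",
    "ChromaDB server": "REMOVED",
    "Redis/Valkey": "REMOVED",
    "Analytics server": "REMOVED",
    "Monitoring sidecar": "REMOVED",
    "Keycloak": "REMOVED",
}

# Assumption: only KEEP and MERGE rows run in the default deployment.
default_services = [name for name, d in DECISIONS.items() if d in ("KEEP", "MERGE")]
reduction = 1 - len(default_services) / len(DECISIONS)

if __name__ == "__main__":
    print(f"{len(DECISIONS)} -> {len(default_services)} ({reduction:.0%} fewer)")
    # prints "8 -> 2 (75% fewer)"
```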

Consequences

Gains

  • $0 container orchestration: docker-compose, not Kubernetes
  • 2-minute local startup: docker compose up -d and you're running
  • Same file everywhere: bare-metal and devcontainer use the same docker-compose.yml
  • Minimal operational surface: 2 health checks, 2 log streams, 2 scaling policies

Losses

  • Coupled scaling: components merged into one service scale together (API and crews in L5, UI and built-in RAG in L6). Fine at BC1 (<50 users), may need separation at BC2+.
  • Larger container images: Combined FastAPI+CrewAI image includes all Python dependencies
  • No service mesh: ECS Service Connect provides discovery but not mTLS between services

Upgrade Path

When the team grows beyond 5 engineers or 6 distinct workloads, decompose into separate services with ECS Service Connect. Beyond 6 services with an mTLS requirement, evaluate an EKS migration. This is a full architecture change, not a config change. See Evolution Architecture for the classification.

Option C Hybrid path: When on-prem, IoT, multi-cloud, or GitOps platform requirements emerge, activate K3S as Stream 2 alongside ECS (Stream 1). ECS handles AI services (CloudOps+FinOps); K3S handles DevOps GitOps (ArgoCD, Vault, Atlantis). The two streams are independent failure domains; see ADR-005 for the full decision record.
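The triggers in this section can be written down as explicit checks, which makes them easier to review when the footprint changes. This is a minimal sketch: the thresholds come from the text above, while the `Footprint` fields and the `upgrade_steps` function are hypothetical names, not part of any existing module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    # Hypothetical inputs describing the current team and platform.
    engineers: int
    workloads: int
    services: int
    needs_mtls: bool = False
    # True when on-prem, IoT, multi-cloud, or GitOps platform requirements emerge.
    needs_second_stream: bool = False


def upgrade_steps(f: Footprint) -> list[str]:
    """Return the upgrade actions this ADR would call for, in order."""
    steps = []
    if f.engineers > 5 or f.workloads > 6:
        steps.append("decompose into separate services with ECS Service Connect")
    if f.services > 6 and f.needs_mtls:
        steps.append("evaluate EKS migration")
    if f.needs_second_stream:
        steps.append("activate K3S as Stream 2 alongside ECS")
    return steps


if __name__ == "__main__":
    bc1 = Footprint(engineers=3, workloads=2, services=2)
    print(upgrade_steps(bc1))  # prints []
```

An empty list means the 2-service layout still fits. Each trigger is evaluated independently, so more than one step can apply at once.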

Alternatives Considered

| Alternative | Cost Impact | Verdict |
| --- | --- | --- |
| EKS | +$73/mo control plane | Over-engineered for 2 services |
| ECS with 6+ services | +$50/mo compute | No business need at BC1 |
| EC2 ASG | ~same | OS patching overhead; Fargate = zero OS ops |
| docker-compose only (no ECS) | -$110/mo | No auto-scaling, no health checks, no CloudWatch |
| K3S GitOps | $0 on-prem / $120 cloud | Excellent for DevOps GitOps (ArgoCD+Atlantis). BC2+ Stream 2 when on-prem/multi-cloud mandated. 161-file IaC exists at DevOps-Terraform/tf-k3s. See ADR-005. |

Source: xops.jsx LAYERS[id=2].whyNot[], KISS/5S retrospective

Agent Scores

| Agent | Score | Key Rationale |
| --- | --- | --- |
| PO | 96% | 2 services = fastest time to value; enterprise buyers understand "simple first" |
| CA | 98% | Architecturally sound; ECS scaling handles BC1 load; clean upgrade path |
| MEE | 95% | ADLC framework operates identically on 2 or 20 services |
| IE | 97% | M2 Terraform module already supports this pattern |