ADR-004: 2 Docker Services, Not Microservices
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-03-11 |
| Decision Makers | CA (lead), PO, MEE, IE |
Context
The initial xOps architecture proposal included 8+ services: Open WebUI, FastAPI, CrewAI, Ollama, ChromaDB, Redis/Valkey, analytics, monitoring. Through iterative HITL-driven KISS/5S simplification (50+ edits across 2 sessions), the architecture was reduced to 2 Docker services.
The HITL principle: "Every service must pass the BC1 NEED test." A component that can be a library import, a built-in feature, or an optional profile does not qualify as a standalone service.
Decision
2 Docker services for BC1:
- openwebui: Open WebUI 0.8+ (L6: Interface + built-in RAG + SQLite)
- fastapi-crewai: FastAPI + CrewAI + CloudOps-Runbooks (L5: API + crews + MCP)
Plus 1 optional profile:
- ollama: Ollama for local LLM inference (docker compose --profile ollama up)
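A minimal sketch of how this decision could look in docker-compose.yml. The service names and the ollama profile come from this ADR; the image tags, host ports, the ./api build context, and the volume names are assumptions for illustration and are not taken from the actual repository. Wiring between the services (environment variables, URLs) is omitted because this record does not specify it.

```yaml
# Hypothetical docker-compose.yml sketch.
# Image tags, ports, build path, and volume names are assumptions.
services:
  openwebui:                    # L6: interface + built-in RAG + SQLite
    image: ghcr.io/open-webui/open-webui:main   # assumption: pin a 0.8+ tag in practice
    ports:
      - "3000:8080"             # Open WebUI listens on 8080 inside the container
    volumes:
      - openwebui-data:/app/backend/data        # persists the built-in SQLite data

  fastapi-crewai:               # L5: API + crews + MCP
    build: ./api                # hypothetical build context
    ports:
      - "8000:8000"             # assumed API port

  ollama:                       # optional local LLM inference
    image: ollama/ollama
    profiles: ["ollama"]        # started only when the ollama profile is enabled
    volumes:
      - ollama-data:/root/.ollama

volumes:
  openwebui-data:
  ollama-data:
```

With a file shaped like this, docker compose up -d starts only openwebui and fastapi-crewai, while docker compose --profile ollama up -d also starts ollama, because Compose skips any service that declares a profile unless that profile is enabled.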
The KISS/5S Journey
| Original Service | KISS Decision | Where It Went |
|---|---|---|
| Open WebUI | KEEP | Service 1 (core) |
| FastAPI + CrewAI | MERGE into 1 container | Service 2 (API + crews) |
| Ollama | OPTIONAL PROFILE | --profile ollama (not default) |
| ChromaDB server | REMOVED | Built into CrewAI Knowledge (library, not server) |
| Redis/Valkey | REMOVED | ALB sticky sessions sufficient at BC1 |
| Analytics server | REMOVED | File-based JSON/CSV in tmp/ |
| Monitoring sidecar | REMOVED | CloudWatch Application Signals (managed) |
| Keycloak | REMOVED | Open WebUI built-in auth + IAM Identity Center |
Result: 8 services → 2 default services (plus 1 optional profile). (8 − 2) / 8 = 75% fewer containers by default. Same functionality.
Consequences
Gains
- $0 container orchestration: docker-compose, not Kubernetes
- 2-minute local startup: docker compose up -d and you're running
- Same file everywhere: bare-metal and devcontainer use the same docker-compose.yml
- Minimal operational surface: 2 health checks, 2 log streams, 2 scaling policies
Losses
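- Coupled scaling: components that share a container scale as one unit, so the API and crews (L5) cannot scale independently of each other, and neither can the UI and built-in RAG (L6). Fine at BC1 (<50 users), may need separation at BC2+.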
- Larger container images: Combined FastAPI+CrewAI image includes all Python dependencies
- No service mesh: ECS Service Connect provides discovery but not mTLS between services
Upgrade Path
When the team grows to >5 engineers or >6 distinct workloads, decompose into separate services with ECS Service Connect. At >6 services with an mTLS requirement, evaluate an EKS migration. This is a full architecture change, not a config change. See Evolution Architecture for the classification.
Option C Hybrid path: When on-prem, IoT, multi-cloud, or GitOps platform requirements emerge, activate K3S as Stream 2 alongside ECS (Stream 1). ECS handles AI services (CloudOps+FinOps), K3S handles DevOps GitOps (ArgoCD, Vault, Atlantis). Independent failure domains — see ADR-005 for the full decision record.
Alternatives Considered
| Alternative | Cost Impact | Verdict |
|---|---|---|
| EKS | +$73/mo control plane | Over-engineered for 2 services |
| ECS with 6+ services | +$50/mo compute | No business need at BC1 |
| EC2 ASG | ~same | OS patching overhead; Fargate = zero OS ops |
| docker-compose only (no ECS) | -$110/mo | No auto-scaling, no health checks, no CloudWatch |
| K3S GitOps | $0 on-prem / $120 cloud | Excellent for DevOps GitOps (ArgoCD+Atlantis). BC2+ Stream 2 when on-prem/multi-cloud mandated. 161-file IaC exists at DevOps-Terraform/tf-k3s. See ADR-005. |
Source: xops.jsx LAYERS[id=2].whyNot[], KISS/5S retrospective
Agent Scores
| Agent | Score | Key Rationale |
|---|---|---|
| PO | 96% | 2 services = fastest time to value; enterprise buyers understand "simple first" |
| CA | 98% | Architecturally sound; ECS scaling handles BC1 load; clean upgrade path |
| MEE | 95% | ADLC framework operates identically on 2 or 20 services |
| IE | 97% | M2 Terraform module already supports this pattern |