ADR-005: Option C Hybrid Architecture

| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-03-11 |
| Decision Makers | CA (lead), PO, MEE, IE (4-agent consensus) |

Context

xOps = CloudOps + FinOps + DevOps. BC1 runs its AI workloads on two ECS Fargate services (ADR-004). However, the enterprise trends expected for 2026-2030 call for capabilities that an ECS-only platform cannot provide:

  • Local-first + hybrid-cloud: Docker + K3D locally, ECS + K3S in production
  • IoT / Edge: K3S runs on ARM64 devices, Raspberry Pi, industrial gateways
  • On-prem: Regulated industries (FSI, Energy) require on-premises compute
  • Multi-cloud: Crossplane on K3S provisions Azure/GCP resources from a single control plane
  • Air-gapped: K3S installs offline; ECS requires internet connectivity

The DevOps domain (ArgoCD, Vault, Atlantis, Crossplane) has fundamentally different operational characteristics from AI services (Open WebUI, FastAPI+CrewAI). Mixing them on one compute platform creates coupling without benefit.

Existing IaC: 161 files at DevOps-Terraform/tf-k3s, 85% ready for the DevOps GitOps platform.

Decision

Two parallel streams with independent failure domains:

| | Stream 1: ECS Fargate | Stream 2: K3S GitOps |
|---|---|---|
| Domain | CloudOps + FinOps (AI Services) | DevOps (GitOps Platform) |
| Services | Open WebUI, FastAPI+CrewAI | ArgoCD, Vault HA, Atlantis, Crossplane |
| Cost | $180/mo (BC1) | $0 on-prem / $120-190 cloud VMs |
| Agent | infrastructure-engineer | kubernetes-engineer |
| Local | docker-compose | K3D |
| Prod | ECS Graviton4 ARM64 | K3S 3-node HA |
| IaC | terraform-aws modules (M1-M4) | DevOps-Terraform/tf-k3s (161 files) |

Stream 2 is activated only when the quantified triggers fire (see Activation Triggers), not proactively. BC1 starts with Stream 1 only.

Consequences

Gains

  • Hybrid-cloud: On-prem, IoT, edge, multi-cloud — all addressed by K3S
  • GitOps platform: ArgoCD + Atlantis = production-grade IaC review and deployment
  • ADLC Principle IV compliance: Hybrid Deployment (LocalStack + K3D + AWS)
  • Independent failure domains: ECS AI services unaffected by K3S operations
  • Existing IaC: 161 files at tf-k3s, 85% ready — minimal new work

Losses

  • Two compute planes: Additional monitoring and operational surface
  • Complexity: Kubernetes knowledge required for Stream 2
  • Cost: $120-190/mo for cloud K3S VMs (on-prem = $0)
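
The cost trade-off can be made concrete with a little arithmetic. The sketch below assumes the two streams are billed independently and that their monthly costs simply add; the function and constant names are hypothetical, and only the dollar figures come from this ADR.

```python
# Monthly figures quoted in this ADR (USD).
STREAM1_ECS_USD = 180                 # BC1 ECS Fargate baseline
STREAM2_CLOUD_VM_USD = (120, 190)     # K3S on cloud VMs (low, high)
STREAM2_ONPREM_USD = (0, 0)           # ADR lists on-prem K3S as $0


def hybrid_monthly_cost(stream2_on_cloud: bool) -> tuple[int, int]:
    """Return the (low, high) monthly cost with both streams active.

    Assumption: stream costs are additive and nothing is shared between them.
    """
    low, high = STREAM2_CLOUD_VM_USD if stream2_on_cloud else STREAM2_ONPREM_USD
    return (STREAM1_ECS_USD + low, STREAM1_ECS_USD + high)


print(hybrid_monthly_cost(stream2_on_cloud=True))   # (300, 370)
print(hybrid_monthly_cost(stream2_on_cloud=False))  # (180, 180)
```

Under these assumptions, activating Stream 2 on cloud VMs raises the BC1 bill from $180 to between $300 and $370 per month, while an on-prem K3S cluster leaves it unchanged.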

Alternatives Considered

| Option | Weighted Score | Verdict |
|---|---|---|
| A: ECS only | 83.3 | Covers BC1 CloudOps+FinOps. Cannot address on-prem/IoT/multi-cloud. |
| B: K3S only | 71.1 | Over-engineered for AI services. Missing managed ECS benefits. |
| C: Hybrid (winner) | 87.1 | ECS for AI (managed, simple) + K3S for DevOps (flexible, portable). Best of both. |

Well-Architected Scores

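| Pillar | ECS Only | K3S Only | Hybrid |
|---|---|---|---|
| Operational Excellence | 92 | 65 | 90 |
| Security | 90 | 58 | 88 |
| Reliability | 88 | 62 | 90 |
| Performance | 85 | 60 | 92 |
| Cost Optimisation | 90 | 55 | 88 |
| Sustainability | 80 | 65 | 92 |
| Average | 88.5 | 60.3 | 89.8 |

With 2026-2030 trend: 91.2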

Agent Scores

| Agent | Score | Key Rationale |
|---|---|---|
| PO | 76.25% | BC1 Stream 2 adds no immediate customer value; justified as BC2+ readiness |
| CA | 91.2% | Architecturally sound; independent failure domains; tf-k3s 85% ready |
| MEE | 93.0% | kubernetes-engineer agent + K3D/K3S commands already in ADLC framework |
| IE | 87.8% | 161-file IaC exists; K3S operational model well-understood |
| Consensus | 87.1% | Architecture agreement: 100% (all agents approve Option C) |
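
The consensus figure is consistent with an unweighted mean of the four agent scores. The ADR does not state how the consensus was derived, so the equal weighting below is an assumption, and the function name is illustrative.

```python
from statistics import mean

# Agent scores for Option C, copied from the Agent Scores table (percent).
AGENT_SCORES = {"PO": 76.25, "CA": 91.2, "MEE": 93.0, "IE": 87.8}


def consensus(scores: dict[str, float]) -> float:
    """Unweighted mean of the agent scores, rounded to one decimal place.

    Assumption: every agent carries equal weight.
    """
    return round(mean(scores.values()), 1)


print(consensus(AGENT_SCORES))  # 87.1
```

The raw mean is 348.25 / 4 = 87.0625, which rounds to the 87.1% shown in the table.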

Activation Triggers

| Trigger | Threshold | Action | Classification |
|---|---|---|---|
| IaC PRs | >5/week | Activate Atlantis on K3S | Service Addition |
| Team size | >3 engineers | ArgoCD + Atlantis for concurrent PR isolation | Service Addition |
| Second cloud | Azure/GCP mandate | Crossplane on K3S | Module Addition |
| On-prem/IoT | Regulatory mandate | K3S edge nodes | Architecture Change |
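
The trigger table can be read as a small rule set. The following is a minimal sketch of how it might be evaluated. It assumes that any single trigger is enough to activate Stream 2, which the ADR does not spell out. The `PlatformMetrics` class and its field names are hypothetical; the thresholds, actions, and classifications are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformMetrics:
    """Hypothetical inputs; field names are illustrative, not from the ADR."""
    iac_prs_per_week: float
    team_size: int
    second_cloud_mandate: bool = False    # Azure/GCP mandate
    onprem_or_iot_mandate: bool = False   # regulatory mandate


def fired_triggers(m: PlatformMetrics) -> list[tuple[str, str, str]]:
    """Return (trigger, action, classification) for each threshold crossed."""
    rules = [
        ("IaC PRs", m.iac_prs_per_week > 5,
         "Activate Atlantis on K3S", "Service Addition"),
        ("Team size", m.team_size > 3,
         "ArgoCD + Atlantis for concurrent PR isolation", "Service Addition"),
        ("Second cloud", m.second_cloud_mandate,
         "Crossplane on K3S", "Module Addition"),
        ("On-prem/IoT", m.onprem_or_iot_mandate,
         "K3S edge nodes", "Architecture Change"),
    ]
    return [(name, action, cls) for name, hit, action, cls in rules if hit]


def stream2_active(m: PlatformMetrics) -> bool:
    """Assumption: one fired trigger is sufficient to activate Stream 2."""
    return bool(fired_triggers(m))


# Example: 6 IaC PRs per week with a 2-person team fires only the first trigger.
for trigger in fired_triggers(PlatformMetrics(iac_prs_per_week=6, team_size=2)):
    print(trigger)
```

Both numeric thresholds are strict (`>`), so exactly 5 PRs per week or exactly 3 engineers keeps BC1 on Stream 1 only.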

Cross-References

  • ADR-004: 2 docker services (KISS/5S) — BC1 ECS baseline
  • Evolution Architecture: Scaling classification + hybrid section
  • Golden Paths: Stage 3B K3S Hybrid-Cloud path
  • xops.jsx: HYBRID_ARCH constant, LAYERS[id=2].whyNot[] K3S entry
  • Coordination: tmp/adlc-framework/coordination-logs/*-2026-03-11-docker-vs-k3s-v2.json