Evolution Architecture: BC1 → BC2+ at Scale
Every BC2+ capability is a configuration change, module addition, or documented migration — never a rewrite.
This is the primary trust signal for ANZ FSI enterprise buyers: xOps BC1 ($180/mo, 2 services) scales to an enterprise platform without architectural redesign.
Scaling Classification
Not all upgrades are equal. Each BC1→BC2+ change is classified by complexity:
| Classification | Definition | Example | Risk |
|---|---|---|---|
| Config Change | Environment variable or Terraform variable | LiteLLM → Bedrock | LOW |
| Module Addition | New Terraform module, existing architecture | EFS module (M4) | LOW |
| Service Addition | New ECS service, same cluster | Monitoring sidecar | MED |
| Data Migration | Schema change, requires migration script + downtime | SQLite → Aurora | MED |
| Architecture Change | New orchestration layer, different operational model | ECS → EKS | HIGH |
SQLite → Aurora and ECS → EKS are NOT config changes. They require migration effort and planning. The golden path defers these to BC2+ because BC1 doesn't need them — not because they're trivial.
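The classification gate above can be expressed as a small lookup. This is an illustrative sketch, not xOps code: the risk levels come from the table, while the names `CLASSIFICATION_RISK` and `requires_planning` are assumptions.

```python
# Risk levels from the scaling-classification table above.
# Names are illustrative, not part of the xOps codebase.
CLASSIFICATION_RISK = {
    "config_change": "LOW",
    "module_addition": "LOW",
    "service_addition": "MED",
    "data_migration": "MED",
    "architecture_change": "HIGH",
}

def requires_planning(classification: str) -> bool:
    """MED/HIGH changes (e.g. SQLite -> Aurora, ECS -> EKS) need an
    explicit plan; LOW changes ship via the golden path."""
    return CLASSIFICATION_RISK[classification] in ("MED", "HIGH")

print(requires_planning("config_change"))   # LiteLLM -> Bedrock: False
print(requires_planning("data_migration"))  # SQLite -> Aurora: True
```

In practice this kind of gate belongs in PR review tooling: a change tagged `data_migration` or `architecture_change` should not merge without a linked migration plan.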
Component Evolution Matrix
| Component | BC1 (Now) | BC2+ (When Needed) | Classification | Trigger | How |
|---|---|---|---|---|---|
| AI Provider | Claude API direct | Bedrock VPC endpoint | Config Change | Sovereignty mandate | LITELLM_MODEL env var |
| AI Redundancy | Single provider | Multi-provider failover | Config Change | Availability SLA | LiteLLM fallback config |
| Database | SQLite + EFS | RDS PostgreSQL | Data Migration | >50 concurrent writes | Migration script + TF module |
| Vector DB | ChromaDB (built-in) | pgvector or Qdrant | Config Change | Cross-system SQL+vector | CrewAI Knowledge config |
| Services | 2 Docker services | 8+ microservices | Service Addition | Team >5 engineers | docker-compose profiles |
| Auth | Open WebUI built-in | Keycloak + SCIM pipeline | Config Change | Enterprise SSO | OIDC env var |
| FinOps Analytics | File-based JSON/CSV | S3 Tables (Iceberg) | Module Addition | Scan volume >1TB | Terraform module |
| Compute | ECS Fargate | EKS with service mesh | Architecture Change | >6 services + mTLS | Full migration plan |
| Cache | ALB sticky sessions | Valkey / ElastiCache | Module Addition | Pub/sub required | Terraform module |
| Edge | CloudFront PriceClass_100 | PriceClass_200 | Config Change | Cross-region latency | TF variable |
| DevOps Platform | GitHub Actions CI | K3S GitOps (ArgoCD+Atlantis) | Service Addition | IaC PRs >5/wk, team >3 | Activate tf-k3s |
| Multi-cloud | AWS-only | Crossplane on K3S | Module Addition | Second cloud provider | Crossplane ProviderConfig |
| Edge/IoT | N/A | K3S ARM64 edge nodes | Architecture Change | On-prem mandate | K3S + edge Ansible |
Hybrid Architecture: Option C (ADR-005)
When enterprise requirements exceed ECS-only capability (on-prem, IoT, multi-cloud, GitOps), Option C Hybrid provides two independent streams:
```
Stream 1: ECS Fargate
├── CloudOps AI services
├── FinOps AI services
├── Open WebUI (L6)
└── FastAPI+CrewAI (L5)

Cost:  $180/mo (BC1)
Agent: infrastructure-engineer
Local: docker-compose
Prod:  ECS Graviton4

Stream 2: K3S GitOps
├── ArgoCD
├── Vault HA
├── Atlantis
├── Crossplane
├── cert-manager
└── external-dns

Cost:  $0 on-prem / $120-190 cloud
Agent: kubernetes-engineer
Local: K3D
Prod:  K3S 3-node HA
```
Key principle: Independent failure domains. ECS AI services operate independently of K3S DevOps platform. Either stream can be activated, scaled, or decommissioned without affecting the other.
K3S IaC: 161 files at DevOps-Terraform/tf-k3s (85% ready for DevOps GitOps platform).
K3S Activation Triggers
| Trigger | Action | Classification |
|---|---|---|
| IaC PRs >5/week | Activate Atlantis on K3S | Service Addition |
| Team >3 engineers | ArgoCD + Atlantis for concurrent PR isolation | Service Addition |
| Second cloud (Azure/GCP) | Crossplane on K3S | Module Addition |
| On-prem/IoT mandate | K3S edge nodes | Architecture Change |
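The activation triggers above are simple threshold checks. The sketch below encodes them; thresholds come from the table, while the `TeamMetrics` fields and function name are assumptions for illustration.

```python
# Illustrative encoding of the K3S activation triggers table.
from dataclasses import dataclass

@dataclass
class TeamMetrics:
    iac_prs_per_week: int
    engineers: int
    second_cloud: bool       # Azure/GCP procurement decision
    on_prem_mandate: bool    # regulatory / data-gravity directive

def k3s_activations(m: TeamMetrics) -> list[str]:
    """Return the K3S activation actions triggered by current metrics."""
    actions = []
    if m.iac_prs_per_week > 5:
        actions.append("Activate Atlantis on K3S")
    if m.engineers > 3:
        actions.append("ArgoCD + Atlantis for concurrent PR isolation")
    if m.second_cloud:
        actions.append("Crossplane on K3S")
    if m.on_prem_mandate:
        actions.append("K3S edge nodes")
    return actions

print(k3s_activations(TeamMetrics(8, 4, False, False)))
```

A team with no triggers fired stays on the BC1 ECS-only stream; each action maps back to the Service Addition / Module Addition / Architecture Change classification in the table.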
2026-2030 Enterprise Trend Coverage
| Trend | ECS Only (BC1) | Hybrid (Option C) |
|---|---|---|
| Local-first (docker) | docker-compose | docker-compose + K3D |
| Local-AI (Ollama) | docker profile | + K3S GPU nodes |
| IoT / Edge | AWS-only | K3S ARM64 any device |
| On-prem | AWS-only | K3S bare metal |
| Multi-cloud | AWS-only | Crossplane from K3S |
| Air-gapped | needs internet | K3S offline install |
Cost Evolution
| Scale | Users | Services | Monthly Cost | ROI vs $2k SaaS |
|---|---|---|---|---|
| BC1 PROD | <50 | 2 | $180 | 11x |
| BC1 PEAK | <50 | 2 | $380 | 5.3x |
| BC2 Growth | 50-200 | 4-6 | $360 | 5.6x |
| BC2+ Enterprise | 200-500 | 8+ | $1,200 | 1.7x |
| SaaS Equivalent | Any | N/A | $2,000+ | 1x (baseline) |
ROI remains positive at all scales. Break-even with SaaS only at >500 users with full EKS+Aurora+Keycloak — at which point you have a platform, not a tool.
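The ROI column is just the $2k SaaS baseline divided by each tier's monthly cost; a quick check reproduces the table (the BC1 PROD row rounds 11.1x down to 11x):

```python
# Reproduce the ROI column: ROI = SaaS baseline / self-hosted monthly cost.
SAAS_BASELINE = 2000  # $/mo, the "SaaS Equivalent" row

TIERS = {
    "BC1 PROD": 180,
    "BC1 PEAK": 380,
    "BC2 Growth": 360,
    "BC2+ Enterprise": 1200,
}

roi = {name: round(SAAS_BASELINE / cost, 1) for name, cost in TIERS.items()}
print(roi)  # {'BC1 PROD': 11.1, 'BC1 PEAK': 5.3, 'BC2 Growth': 5.6, 'BC2+ Enterprise': 1.7}
```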
LiteLLM: The Config-Change Enabler
LiteLLM is the architectural decision that makes most AI provider changes a config change:
```shell
# BC1: Claude API direct (golden path)
LITELLM_MODEL=claude-sonnet-4-6
ANTHROPIC_API_KEY=sk-ant-...

# BC2+: Bedrock VPC (sovereignty)
LITELLM_MODEL=bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0
AWS_REGION=ap-southeast-2

# BC2+: Multi-provider failover
LITELLM_FALLBACKS=["bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0", "openai/gpt-4o"]

# BC2+: Ollama local (privacy + cost at scale)
LITELLM_MODEL=ollama/llama3.1
OLLAMA_API_BASE=http://localhost:11434
```
Same application code. Same prompts. Same CrewAI crews. Zero code change.
Open WebUI (L6) uses ANTHROPIC_API_KEY for direct native Anthropic integration.
FastAPI+CrewAI (L5) uses LiteLLM as a gateway.
Both paths evolve via environment variables — but through different configuration surfaces. See ADR-002 for details.
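The "zero code change" claim rests on the application resolving its provider from the environment. The sketch below illustrates that pattern; `resolve_llm_config` is a hypothetical helper, not the real LiteLLM API (see LiteLLM's own documentation for the actual completion interface):

```python
# Hypothetical helper illustrating env-driven provider selection:
# swapping Claude direct for Bedrock or Ollama changes only env vars.
import os

def resolve_llm_config() -> dict:
    cfg = {"model": os.environ.get("LITELLM_MODEL", "claude-sonnet-4-6")}
    if cfg["model"].startswith("ollama/"):
        # Local models additionally need the Ollama endpoint.
        cfg["api_base"] = os.environ.get(
            "OLLAMA_API_BASE", "http://localhost:11434"
        )
    return cfg

# Simulate the BC2+ sovereignty move: only the env var changes.
os.environ["LITELLM_MODEL"] = "bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0"
print(resolve_llm_config())
```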
Scaling Triggers with Thresholds
| Threshold | Signal | Detection | Action | Phase |
|---|---|---|---|---|
| >50 users | SQLite write contention | PRAGMA journal_mode=WAL errors | Upgrade L4 to RDS | BC2 |
| >100 concurrent | ECS CPU >80% sustained | CloudWatch Application Signals | Scale L5+L6 to 6 replicas | BC1 PEAK |
| >10 crews/hr | CrewAI queue depth | Fargate task pending count | Add Fargate Spot workers | BC1 PEAK |
| >6 services | Service discovery complexity | ECS Service Connect limits | Evaluate EKS migration | BC2+ |
| Sovereignty mandate | Regulatory requirement | APRA CPS 234 audit finding | LiteLLM → Bedrock VPC | BC2+ |
| FinOps scale | Scan volume >1TB | S3 storage growth metrics | S3 Tables (Iceberg) for analytics | BC2+ |
| Cross-region | Latency >200ms from NZ | CloudFront origin latency | PriceClass_200 | BC2+ |
| IaC PRs >5/wk | Team PR velocity | GitHub Actions workflow runs | Atlantis on K3S | BC2+ |
| Team >3 eng | Concurrent PR conflicts | PR merge conflicts/week | ArgoCD + Atlantis | BC2+ |
| Second cloud | Multi-cloud mandate | Procurement decision | Crossplane on K3S | BC2+ |
| On-prem mandate | Regulatory/data gravity | Board/compliance directive | K3S edge nodes | BC2+ |
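The first trigger row can be probed directly with the stdlib. A minimal sketch (the file path is hypothetical): under heavy concurrent writes, even WAL mode serializes writers and raises `sqlite3.OperationalError` ("database is locked"), which is the BC2 signal to move L4 to RDS PostgreSQL.

```python
# Confirm the database runs in WAL mode; write contention on top of WAL
# is the detection signal for the >50-user trigger above.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "xops.db")  # hypothetical DB path
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()
print(mode)  # "wal" for a file-backed database
```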
What Doesn't Change
These BC1 decisions are permanent — they scale without modification:
| Decision | Why It's Permanent |
|---|---|
| ECS Fargate (not EC2) | Zero OS patching at any scale |
| Graviton4 ARM64 | Better price-performance grows with scale |
| CloudFront + WAFv2 | 450+ PoPs, bot protection scales automatically |
| IAM Identity Center | AWS-native, free at any user count |
| ADLC framework governance | 9 agents, 5 hooks, 58 checkpoints — scale-independent |
| 4-way cross-validation | 24 signals work identically at $180/mo and $1,200/mo |
| FOCUS 1.2+ cost tags | Chargeback reporting scales with resources, not users |
Evidence
- Source of truth: `docs/src/pages/xops.jsx` (`LAYERS[].whyNot[]`, `COST_ENV[]`)
- PR/FAQ: xOps BC1 PR/FAQ Q10, Q11
- ADRs: Architecture Decision Records
- Golden Paths: Golden Paths (stage-by-stage progression)
- Coordination: `tmp/adlc-framework/coordination-logs/cloud-architect-2026-03-11-docs-expansion.json`