Evolution Architecture: BC1 → BC2+ at Scale
Every BC2+ capability is a configuration change, module addition, or documented migration — never a rewrite.
This is the primary trust signal for ANZ FSI enterprise buyers: xOps BC1 ($180/mo, 2 services) scales to an enterprise platform without architectural redesign.
Scaling Classification
Not all upgrades are equal. Each BC1→BC2+ change is classified by complexity:
| Classification | Definition | Example | Risk |
|---|---|---|---|
| Config Change | Environment variable or Terraform variable | LiteLLM → Bedrock | LOW |
| Module Addition | New Terraform module, existing architecture | EFS module (M4) | LOW |
| Service Addition | New ECS service, same cluster | Monitoring sidecar | MED |
| Data Migration | Schema change, requires migration script + downtime | SQLite → Aurora | MED |
| Architecture Change | New orchestration layer, different operational model | ECS → EKS | HIGH |
SQLite → Aurora and ECS → EKS are NOT config changes. They require migration effort and planning. The golden path defers these to BC2+ because BC1 doesn't need them — not because they're trivial.
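The classification gate above can be expressed as a small lookup. This is an illustrative sketch, not xOps code: the risk levels come from the table, while the names `CLASSIFICATION_RISK` and `requires_planning` are assumptions.

```python
# Risk levels from the scaling-classification table above.
# Names are illustrative, not part of the xOps codebase.
CLASSIFICATION_RISK = {
    "config_change": "LOW",
    "module_addition": "LOW",
    "service_addition": "MED",
    "data_migration": "MED",
    "architecture_change": "HIGH",
}

def requires_planning(classification: str) -> bool:
    """MED/HIGH changes (e.g. SQLite -> Aurora, ECS -> EKS) need an
    explicit plan; LOW changes ship via the golden path."""
    return CLASSIFICATION_RISK[classification] in ("MED", "HIGH")

print(requires_planning("config_change"))   # LiteLLM -> Bedrock: False
print(requires_planning("data_migration"))  # SQLite -> Aurora: True
```

In practice this kind of gate belongs in PR review tooling: a change tagged `data_migration` or `architecture_change` should not merge without a linked migration plan.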
Component Evolution Matrix
| Component | BC1 (Now) | BC2+ (When Needed) | Classification | Trigger | How |
|---|---|---|---|---|---|
| AI Provider | Claude API direct | Bedrock VPC endpoint | Config Change | Sovereignty mandate | LITELLM_MODEL env var |
| AI Redundancy | Single provider | Multi-provider failover | Config Change | Availability SLA | LiteLLM fallback config |
| Database | SQLite + EFS | RDS PostgreSQL | Data Migration | >50 concurrent writes | Migration script + TF module |
| Vector DB | ChromaDB (built-in) | pgvector or Qdrant | Config Change | Cross-system SQL+vector | CrewAI Knowledge config |
| Services | 2 Docker services | 8+ microservices | Service Addition | Team >5 engineers | docker-compose profiles |
| Auth | Open WebUI built-in | Keycloak + SCIM pipeline | Config Change | Enterprise SSO | OIDC env var |
| FinOps Analytics | File-based JSON/CSV | S3 Tables (Iceberg) | Module Addition | Scan volume >1TB | Terraform module |
| Compute | ECS Fargate | EKS with service mesh | Architecture Change | >6 services + mTLS | Full migration plan |
| Cache | ALB sticky sessions | Valkey / ElastiCache | Module Addition | Pub/sub required | Terraform module |
| Edge | CloudFront PriceClass_100 | PriceClass_200 | Config Change | Cross-region latency | TF variable |
| DevOps Platform | GitHub Actions CI | K3S GitOps (ArgoCD+Atlantis) | Service Addition | IaC PRs >5/wk, team >3 | Activate tf-k3s |
| Multi-cloud | AWS-only | Crossplane on K3S | Module Addition | Second cloud provider | Crossplane ProviderConfig |
| Edge/IoT | N/A | K3S ARM64 edge nodes | Architecture Change | On-prem mandate | K3S + edge Ansible |
Hybrid Architecture: Option C (ADR-005)
When enterprise requirements exceed ECS-only capability (on-prem, IoT, multi-cloud, GitOps), Option C Hybrid provides two independent streams:
```
Stream 1: ECS Fargate
├── CloudOps AI services
├── FinOps AI services
├── Open WebUI (L6)
└── FastAPI+CrewAI (L5)

Cost:  $180/mo (BC1)
Agent: infrastructure-engineer
Local: docker-compose
Prod:  ECS Graviton4

Stream 2: K3S GitOps
├── ArgoCD
├── Vault HA
├── Atlantis
├── Crossplane
├── cert-manager
└── external-dns

Cost:  $0 on-prem / $120-190 cloud
Agent: kubernetes-engineer
Local: K3D
Prod:  K3S 3-node HA
```
Key principle: Independent failure domains. ECS AI services operate independently of K3S DevOps platform. Either stream can be activated, scaled, or decommissioned without affecting the other.
K3S IaC: 161 files at DevOps-Terraform/tf-k3s (85% ready for DevOps GitOps platform).
K3S Activation Triggers
| Trigger | Action | Classification |
|---|---|---|
| IaC PRs >5/week | Activate Atlantis on K3S | Service Addition |
| Team >3 engineers | ArgoCD + Atlantis for concurrent PR isolation | Service Addition |
| Second cloud (Azure/GCP) | Crossplane on K3S | Module Addition |
| On-prem/IoT mandate | K3S edge nodes | Architecture Change |
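The activation triggers above are simple threshold checks. The sketch below encodes them; thresholds come from the table, while the `TeamMetrics` fields and function name are assumptions for illustration.

```python
# Illustrative encoding of the K3S activation triggers table.
from dataclasses import dataclass

@dataclass
class TeamMetrics:
    iac_prs_per_week: int
    engineers: int
    second_cloud: bool       # Azure/GCP procurement decision
    on_prem_mandate: bool    # regulatory / data-gravity directive

def k3s_activations(m: TeamMetrics) -> list[str]:
    """Return the K3S activation actions triggered by current metrics."""
    actions = []
    if m.iac_prs_per_week > 5:
        actions.append("Activate Atlantis on K3S")
    if m.engineers > 3:
        actions.append("ArgoCD + Atlantis for concurrent PR isolation")
    if m.second_cloud:
        actions.append("Crossplane on K3S")
    if m.on_prem_mandate:
        actions.append("K3S edge nodes")
    return actions

print(k3s_activations(TeamMetrics(8, 4, False, False)))
```

A team with no triggers fired stays on the BC1 ECS-only stream; each action maps back to the Service Addition / Module Addition / Architecture Change classification in the table.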
2026-2030 Enterprise Trend Coverage
| Trend | ECS Only (BC1) | Hybrid (Option C) |
|---|---|---|
| Local-first (docker) | docker-compose | docker-compose + K3D |
| Local-AI (Ollama) | docker profile | + K3S GPU nodes |
| IoT / Edge | AWS-only | K3S ARM64 any device |
| On-prem | AWS-only | K3S bare metal |
| Multi-cloud | AWS-only | Crossplane from K3S |
| Air-gapped | needs internet | K3S offline install |
Cost Evolution
| Scale | Users | Services | Monthly Cost | ROI vs $2k SaaS |
|---|---|---|---|---|
| BC1 PROD | <50 | 2 | $180 | 11x |
| BC1 PEAK | <50 | 2 | $380 | 5.3x |
| BC2 Growth | 50-200 | 4-6 | $360 | 5.6x |
| BC2+ Enterprise | 200-500 | 8+ | $1,200 | 1.7x |
| SaaS Equivalent | Any | N/A | $2,000+ | 1x (baseline) |
ROI remains positive at all scales. Break-even with SaaS only at >500 users with full EKS+Aurora+Keycloak — at which point you have a platform, not a tool.
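The ROI column is just the $2k SaaS baseline divided by each tier's monthly cost; a quick check reproduces the table (the BC1 PROD row rounds 11.1x down to 11x):

```python
# Reproduce the ROI column: ROI = SaaS baseline / self-hosted monthly cost.
SAAS_BASELINE = 2000  # $/mo, the "SaaS Equivalent" row

TIERS = {
    "BC1 PROD": 180,
    "BC1 PEAK": 380,
    "BC2 Growth": 360,
    "BC2+ Enterprise": 1200,
}

roi = {name: round(SAAS_BASELINE / cost, 1) for name, cost in TIERS.items()}
print(roi)  # {'BC1 PROD': 11.1, 'BC1 PEAK': 5.3, 'BC2 Growth': 5.6, 'BC2+ Enterprise': 1.7}
```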
LiteLLM: The Config-Change Enabler
LiteLLM is the architectural decision that makes most AI provider changes a config change:
```shell
# BC1: Claude API direct (golden path)
LITELLM_MODEL=claude-sonnet-4-6
ANTHROPIC_API_KEY=sk-ant-...

# BC2+: Bedrock VPC (sovereignty)
LITELLM_MODEL=bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0
AWS_REGION=ap-southeast-2

# BC2+: Multi-provider failover
LITELLM_FALLBACKS=["bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0", "openai/gpt-4o"]

# BC2+: Ollama local (privacy + cost at scale)
LITELLM_MODEL=ollama/llama3.1
OLLAMA_API_BASE=http://localhost:11434
```
Same application code. Same prompts. Same CrewAI crews. Zero code change.
Open WebUI (L6) uses ANTHROPIC_API_KEY for direct native Anthropic integration.
FastAPI+CrewAI (L5) uses LiteLLM as a gateway.
Both paths evolve via environment variables — but through different configuration surfaces. See ADR-002 for details.
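The "zero code change" claim rests on the application resolving its provider from the environment. The sketch below illustrates that pattern; `resolve_llm_config` is a hypothetical helper, not the real LiteLLM API (see LiteLLM's own documentation for the actual completion interface):

```python
# Hypothetical helper illustrating env-driven provider selection:
# swapping Claude direct for Bedrock or Ollama changes only env vars.
import os

def resolve_llm_config() -> dict:
    cfg = {"model": os.environ.get("LITELLM_MODEL", "claude-sonnet-4-6")}
    if cfg["model"].startswith("ollama/"):
        # Local models additionally need the Ollama endpoint.
        cfg["api_base"] = os.environ.get(
            "OLLAMA_API_BASE", "http://localhost:11434"
        )
    return cfg

# Simulate the BC2+ sovereignty move: only the env var changes.
os.environ["LITELLM_MODEL"] = "bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0"
print(resolve_llm_config())
```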
Scaling Triggers with Thresholds
| Threshold | Signal | Detection | Action | Phase |
|---|---|---|---|---|
| >50 users | SQLite write contention | PRAGMA journal_mode=WAL errors | Upgrade L4 to RDS | BC2 |
| >100 concurrent | ECS CPU >80% sustained | CloudWatch Application Signals | Scale L5+L6 to 6 replicas | BC1 PEAK |
| >10 crews/hr | CrewAI queue depth | Fargate task pending count | Add Fargate Spot workers | BC1 PEAK |
| >6 services | Service discovery complexity | ECS Service Connect limits | Evaluate EKS migration | BC2+ |
| Sovereignty mandate | Regulatory requirement | APRA CPS 234 audit finding | LiteLLM → Bedrock VPC | BC2+ |
| FinOps scale | Scan volume >1TB | S3 storage growth metrics | S3 Tables (Iceberg) for analytics | BC2+ |
| Cross-region | Latency >200ms from NZ | CloudFront origin latency | PriceClass_200 | BC2+ |
| IaC PRs >5/wk | Team PR velocity | GitHub Actions workflow runs | Atlantis on K3S | BC2+ |
| Team >3 eng | Concurrent PR conflicts | PR merge conflicts/week | ArgoCD + Atlantis | BC2+ |
| Second cloud | Multi-cloud mandate | Procurement decision | Crossplane on K3S | BC2+ |
| On-prem mandate | Regulatory/data gravity | Board/compliance directive | K3S edge nodes | BC2+ |
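The first trigger row can be probed directly with the stdlib. A minimal sketch (the file path is hypothetical): under heavy concurrent writes, even WAL mode serializes writers and raises `sqlite3.OperationalError` ("database is locked"), which is the BC2 signal to move L4 to RDS PostgreSQL.

```python
# Confirm the database runs in WAL mode; write contention on top of WAL
# is the detection signal for the >50-user trigger above.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "xops.db")  # hypothetical DB path
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()
print(mode)  # "wal" for a file-backed database
```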
What Doesn't Change
These BC1 decisions are permanent — they scale without modification:
| Decision | Why It's Permanent |
|---|---|
| ECS Fargate (not EC2) | Zero OS patching at any scale |
| Graviton4 ARM64 | Better price-performance grows with scale |
| CloudFront + WAFv2 | 450+ PoPs, bot protection scales automatically |
| IAM Identity Center | AWS-native, free at any user count |
| ADLC framework governance | 9 agents, 5 hooks, 58 checkpoints — scale-independent |
| 4-way cross-validation | 24 signals work identically at $180/mo and $1,200/mo |
| FOCUS 1.2+ cost tags | Chargeback reporting scales with resources, not users |
Evidence
- Source of truth: `docs/src/pages/xops.jsx` (`LAYERS[].whyNot[]`, `COST_ENV[]`)
- PR/FAQ: xOps BC1 PR/FAQ Q10, Q11
- ADRs: Architecture Decision Records
- Golden Paths: Golden Paths (stage-by-stage progression)
- Coordination: `tmp/adlc-framework/coordination-logs/cloud-architect-2026-03-11-docs-expansion.json`