Skip to main content

Evolution Architecture: BC1 → BC2+ at Scale

Every BC2+ capability is a configuration change, module addition, or documented migration — never a rewrite.

The primary trust signal for ANZ FSI enterprise buyers: xOps BC1 ($180/mo, 2 services) scales to enterprise platform without architectural redesign.


Scaling Classification

Not all upgrades are equal. Each BC1→BC2+ change is classified by complexity:

ClassificationDefinitionExampleRisk
Config ChangeEnvironment variable or Terraform variableLiteLLM → BedrockLOW
Module AdditionNew Terraform module, existing architectureEFS module (M4)LOW
Service AdditionNew ECS service, same clusterMonitoring sidecarMED
Data MigrationSchema change, requires migration script + downtimeSQLite → AuroraMED
Architecture ChangeNew orchestration layer, different operational modelECS → EKSHIGH
Honest Classification

SQLite → Aurora and ECS → EKS are NOT config changes. They require migration effort and planning. The golden path defers these to BC2+ because BC1 doesn't need them — not because they're trivial.


Component Evolution Matrix

ComponentBC1 (Now)BC2+ (When Needed)ClassificationTriggerHow
AI ProviderClaude API directBedrock VPC endpointConfig ChangeSovereignty mandateLITELLM_MODEL env var
AI RedundancySingle providerMulti-provider failoverConfig ChangeAvailability SLALiteLLM fallback config
DatabaseSQLite + EFSRDS PostgreSQLData Migration>50 concurrent writesMigration script + TF module
Vector DBChromaDB (built-in)pgvector or QdrantConfig ChangeCross-system SQL+vectorCrewAI Knowledge config
Services2 docker services8+ microservicesService AdditionTeam >5 engineersdocker-compose profiles
AuthOpen WebUI built-inKeycloak + SCIM pipelineConfig ChangeEnterprise SSOOIDC env var
FinOps AnalyticsFile-based JSON/CSVS3 Tables (Iceberg)Module AdditionScan volume >1TBTerraform module
ComputeECS FargateEKS with service meshArchitecture Change>6 services + mTLSFull migration plan
CacheALB sticky sessionsValkey / ElastiCacheModule AdditionPub/sub requiredTerraform module
EdgeCloudFront PriceClass_100PriceClass_200Config ChangeCross-region latencyTF variable
DevOps PlatformGitHub Actions CIK3S GitOps (ArgoCD+Atlantis)Service AdditionIaC PRs >5/wk, team >3Activate tf-k3s
Multi-cloudAWS-onlyCrossplane on K3SModule AdditionSecond cloud providerCrossplane ProviderConfig
Edge/IoTN/AK3S ARM64 edge nodesArchitecture ChangeOn-prem mandateK3S + edge Ansible

Hybrid Architecture: Option C (ADR-005)

When enterprise requirements exceed ECS-only capability (on-prem, IoT, multi-cloud, GitOps), Option C Hybrid provides two independent streams:

Stream 1: ECS Fargate                Stream 2: K3S GitOps
├── CloudOps AI services ├── ArgoCD
├── FinOps AI services ├── Vault HA
├── Open WebUI (L6) ├── Atlantis
└── FastAPI+CrewAI (L5) ├── Crossplane
├── cert-manager
Cost: $180/mo (BC1) └── external-dns
Agent: infrastructure-engineer
Local: docker-compose Cost: $0 on-prem / $120-190 cloud
Prod: ECS Graviton4 Agent: kubernetes-engineer
Local: K3D
Prod: K3S 3-node HA

Key principle: Independent failure domains. ECS AI services operate independently of K3S DevOps platform. Either stream can be activated, scaled, or decommissioned without affecting the other.

K3S IaC: 161 files at DevOps-Terraform/tf-k3s (85% ready for DevOps GitOps platform).

K3S Activation Triggers

TriggerActionClassification
IaC PRs >5/weekActivate Atlantis on K3SService Addition
Team >3 engineersArgoCD + Atlantis for concurrent PR isolationService Addition
Second cloud (Azure/GCP)Crossplane on K3SModule Addition
On-prem/IoT mandateK3S edge nodesArchitecture Change

2026-2030 Enterprise Trend Coverage

TrendECS Only (BC1)Hybrid (Option C)
Local-first (docker)docker-composedocker-compose + K3D
Local-AI (Ollama)docker profile+ K3S GPU nodes
IoT / EdgeAWS-onlyK3S ARM64 any device
On-premAWS-onlyK3S bare metal
Multi-cloudAWS-onlyCrossplane from K3S
Air-gappedneeds internetK3S offline install

Cost Evolution

ScaleUsersServicesMonthly CostROI vs $2k SaaS
BC1 PROD<502$18011x
BC1 PEAK<502$3805.3x
BC2 Growth50-2004-6$3605.6x
BC2+ Enterprise200-5008+$1,2001.7x
SaaS EquivalentAnyN/A$2,000+1x (baseline)

ROI remains positive at all scales. Break-even with SaaS only at >500 users with full EKS+Aurora+Keycloak — at which point you have a platform, not a tool.


LiteLLM: The Config-Change Enabler

LiteLLM is the architectural decision that makes most AI provider changes a config change:

# BC1: Claude API direct (golden path)
LITELLM_MODEL=claude-sonnet-4-6
ANTHROPIC_API_KEY=sk-ant-...

# BC2+: Bedrock VPC (sovereignty)
LITELLM_MODEL=bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0
AWS_REGION=ap-southeast-2

# BC2+: Multi-provider failover
LITELLM_FALLBACKS=["bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0", "openai/gpt-4o"]

# BC2+: Ollama local (privacy + cost at scale)
LITELLM_MODEL=ollama/llama3.1
OLLAMA_API_BASE=http://localhost:11434

Same application code. Same prompts. Same CrewAI crews. Zero code change.

Two AI API Paths

Open WebUI (L6) uses ANTHROPIC_API_KEY for direct native Anthropic integration. FastAPI+CrewAI (L5) uses LiteLLM as a gateway.

Both paths evolve via environment variables — but through different configuration surfaces. See ADR-002 for details.


Scaling Triggers with Thresholds

ThresholdSignalDetectionActionPhase
>50 usersSQLite write contentionPRAGMA journal_mode=WAL errorsUpgrade L4 to RDSBC2
>100 concurrentECS CPU >80% sustainedCloudWatch Application SignalsScale L5+L6 to 6 replicasBC1 PEAK
>10 crews/hrCrewAI queue depthFargate task pending countAdd Fargate Spot workersBC1 PEAK
>6 servicesService discovery complexityECS Service Connect limitsEvaluate EKS migrationBC2+
Sovereignty mandateRegulatory requirementAPRA CPS 234 audit findingLiteLLM → Bedrock VPCBC2+
FinOps scaleScan volume >1TBS3 storage growth metricsS3 Tables (Iceberg) for analyticsBC2+
Cross-regionLatency >200ms from NZCloudFront origin latencyPriceClass_200BC2+
IaC PRs >5/wkTeam PR velocityGitHub Actions workflow runsAtlantis on K3SBC2+
Team >3 engConcurrent PR conflictsPR merge conflicts/weekArgoCD + AtlantisBC2+
Second cloudMulti-cloud mandateProcurement decisionCrossplane on K3SBC2+
On-prem mandateRegulatory/data gravityBoard/compliance directiveK3S edge nodesBC2+

What Doesn't Change

These BC1 decisions are permanent — they scale without modification:

DecisionWhy It's Permanent
ECS Fargate (not EC2)Zero OS patching at any scale
Graviton4 ARM64Better price-performance grows with scale
CloudFront + WAFv2450+ PoPs, bot protection scales automatically
IAM Identity CenterAWS-native, free at any user count
ADLC framework governance9 agents, 5 hooks, 58 checkpoints — scale-independent
4-way cross-validation24 signals work identically at $180/mo and $1,200/mo
FOCUS 1.2+ cost tagsChargeback reporting scales with resources, not users

Evidence

  • Source of truth: docs/src/pages/xops.jsx (LAYERS[].whyNot[], COST_ENV[])
  • PR/FAQ: xOps BC1 PR/FAQ Q10, Q11
  • ADRs: Architecture Decision Records
  • Golden Paths: Golden Paths (stage-by-stage progression)
  • Coordination: tmp/adlc-framework/coordination-logs/cloud-architect-2026-03-11-docs-expansion.json