Cross-Validation Framework
Architecture principle: this is cross-validation, NOT replacement validation. All 4 layers must agree within ≤0.5% variance, which corresponds to the ≥99.5% accuracy target. No single layer is ground truth; consensus across independent sources IS the ground truth.
Layer 1 (boto3) ↔ Layer 2 (MCP) ↔ Layer 3 (runbooks PyPI) ↔ Layer 4 (Console screenshots)
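The consensus rule can be sketched as a spread check over the four readings of one signal. This is a minimal sketch, not the framework's own code: the source does not say how "variance" is measured, so the block assumes relative spread, (max − min) / mean. The function names and the sample readings are hypothetical.

```python
from statistics import mean

TOLERANCE = 0.005  # 0.5% variance, per the architecture principle


def relative_spread(values):
    """Return (max - min) / |mean| for the readings (assumed variance measure)."""
    center = mean(values)
    if center == 0:
        # All-zero readings agree exactly; mixed readings around zero do not.
        return 0.0 if max(values) == min(values) else float("inf")
    return (max(values) - min(values)) / abs(center)


def layers_agree(readings, tolerance=TOLERANCE):
    """True when the spread across all layers stays within the tolerance."""
    return relative_spread(list(readings.values())) <= tolerance


# Illustrative monthly cost readings, one per layer (made-up numbers).
readings = {"layer1": 180.00, "layer2": 180.40, "layer3": 179.90, "layer4": 180.10}
print(layers_agree(readings))
```

Because no layer is treated as the reference, the check is symmetric: any one layer drifting outside the band fails the signal, whichever layer it is.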
Layer 1 — AWS Native (boto3/CLI)
Role: Evidence collection · signal calculation A1–A6 · diagram generation
Evidence path: python evidence/collect_layer1.py --env prod → tmp/cloud-infrastructure/layer1/
| Signal | Name | Command | FOCUS Tag |
|---|---|---|---|
| A1 | FinOps Cost Allocation | cost_explorer.get_cost_and_usage() | FOCUS:ChargeType |
| A2 | EFS Throughput + IOPS | cloudwatch.get_metric_data(Namespace='AWS/EFS',MetricName='TotalIOBytes') | FOCUS:Storage |
| A3 | CloudFront Hit Ratio | cloudwatch.get_metric_statistics(MetricName='CacheHitRate') | FOCUS:ProviderSvc |
| A4 | AI API Token Usage | litellm.get_usage() or Anthropic/OpenAI usage dashboard | FOCUS:AI/ML |
| A5 | WAFv2 Block Rate | cloudwatch.get_metric_data(Namespace='AWS/WAFV2',MetricName='BlockedRequests') | FOCUS:Security |
| A6 | ECS Task Count + Health | ecs.list_tasks() + ecs.describe_tasks() | FOCUS:Compute |
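The Command column uses shorthand. In boto3, `get_metric_data` takes a `MetricDataQueries` list plus `StartTime` and `EndTime` rather than `Namespace`/`MetricName` keyword arguments. The sketch below shows one way signal A5 might be expanded into a real call. The web ACL name `example-acl`, the region, the one-hour period, and both helper names are assumptions for illustration; pagination via `NextToken` is not handled.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(namespace, metric_name, dimensions, period=3600, stat="Sum"):
    """Build one MetricDataQueries entry for CloudWatch get_metric_data."""
    return {
        "Id": "m1",  # must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": namespace,
                "MetricName": metric_name,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
            },
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": True,
    }


def fetch_metric_sum(query, hours=24, region="us-east-1"):
    """Sum the datapoints for the last `hours` hours (requires AWS credentials)."""
    import boto3  # imported lazily so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[query],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return sum(response["MetricDataResults"][0]["Values"])


# A5: WAFv2 blocked requests; the dimension values are hypothetical.
query = build_metric_query(
    "AWS/WAFV2",
    "BlockedRequests",
    {"WebACL": "example-acl", "Rule": "ALL", "Region": "us-east-1"},
)
```

A block rate would additionally need the `AllowedRequests` metric as the denominator; the same builder can produce that second query.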
Layer 2 — MCP Real-time API
Role: Live validation against Layer 1 · no evidence files created
Evidence path: mcp_validate.py --layer1 tmp/cloud-infrastructure/layer1/ --tolerance 0.005
| Signal | Name | Command | Cross-checks |
|---|---|---|---|
| M1 | ECS services state | MCP aws:list_ecs_services(cluster=xops-prod) | A6 |
| M2 | Cost & Usage live | MCP aws:get_cost_and_usage(granularity=DAILY) | A1 |
| M3 | EFS filesystem live | MCP aws:describe_file_systems() | A2 |
| M4 | Runbook multi-account | MCP cloudops:describe_account_summary(--all) | Multi-account |
| M5 | AI API usage live | MCP litellm:get_usage() or provider-specific usage API | A4 |
| M6 | CloudFront distribution | MCP aws:list_distributions() | A3 |
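The Cross-checks column pairs each live signal with a Layer 1 signal, and the `--tolerance 0.005` flag sets the allowed drift. A minimal sketch of that comparison follows. It is not the contents of `mcp_validate.py`, which the source does not show; the function name, the flat `{signal_id: number}` input shape, and the sample values are all assumptions. M4 is left out because its cross-check is multi-account scope rather than a Layer 1 signal.

```python
# Layer 2 signal -> Layer 1 signal it cross-checks (from the table above).
CROSS_CHECKS = {"M1": "A6", "M2": "A1", "M3": "A2", "M5": "A4", "M6": "A3"}


def validate(layer1, layer2, tolerance=0.005):
    """Compare live Layer 2 values with Layer 1 evidence, signal by signal."""
    report = {}
    for live_id, evidence_id in CROSS_CHECKS.items():
        if live_id not in layer2 or evidence_id not in layer1:
            report[live_id] = "missing"
            continue
        expected, actual = layer1[evidence_id], layer2[live_id]
        if expected == 0:
            drift = 0.0 if actual == 0 else float("inf")
        else:
            drift = abs(actual - expected) / abs(expected)
        report[live_id] = "pass" if drift <= tolerance else "fail"
    return report


# Illustrative values only; A4/M5 are deliberately absent.
layer1 = {"A1": 180.0, "A2": 1000.0, "A3": 92.0, "A6": 4}
layer2 = {"M1": 4, "M2": 180.5, "M3": 1010.0, "M6": 92.1}
report = validate(layer1, layer2)
print(report)
```

Reporting "missing" separately from "fail" keeps a signal that was never collected from being mistaken for one that drifted.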
Layer 3 — Runbooks PyPI + Rich CLI
Role: Production-grade APIs · multi-account aggregation · FOCUS 1.2+
Evidence path: runbooks finops report --output tmp/cloud-infrastructure/layer3/ → Rich table + JSON
| Signal | Name | Command | Cross-checks |
|---|---|---|---|
| R1 | FOCUS cost report | runbooks finops report --format focus-1.2 --accounts all | All accounts |
| R2 | ECS service status | runbooks cloudops status --cluster xops-prod | A6+M1 |
| R3 | Cost anomaly detection | runbooks finops anomaly --threshold 5% --days 30 | A1+M2 |
| R4 | Tag compliance audit | runbooks cloudops tag-audit --framework focus-1.2 | FinOps governance |
| R5 | EFS storage audit | runbooks cloudops efs-audit | A2+M3 |
| R6 | Security posture | runbooks security prowler-scan --accounts all | CPS 234 evidence |
Layer 4 — Console Screenshots (Ground Truth)
Role: Manager visual verification · Playwright/CDP automated capture · ground truth for what the console displays (still one of the four layers in the consensus, not an override)
Evidence path: playwright test xops-console.spec.ts --screenshot=on → tmp/cloud-infrastructure/screenshots/
| Signal | Name | Command | Verifies |
|---|---|---|---|
| S1 | ECS Services Console | playwright goto /ecs/v2/clusters/xops-prod/services | Visual: task count |
| S2 | Cost Explorer Console | playwright goto /cost-management/cost-explorer | Visual: cost graph |
| S3 | EFS Console | playwright goto /efs/file-systems/ | Visual: throughput/IOPS |
| S4 | CloudFront Metrics | playwright goto /cloudfront/v4/distributions | Visual: hit ratio |
| S5 | WAFv2 Dashboard | playwright goto /wafv2/homev2/web-acls | Visual: block rate |
| S6 | Open WebUI Pipeline Logs | playwright goto localhost:3000/admin/pipelines | Visual: pipeline runs |
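The capture step can also be sketched with Playwright's Python API. This is a hypothetical companion to, not a translation of, `xops-console.spec.ts`, whose contents the source does not show. The console base URL, the `http://` scheme for the local page, the file naming, and both function names are assumptions. AWS Console sign-in is not handled here, so a real run would need an authenticated browser context.

```python
from pathlib import Path

CONSOLE_BASE = "https://console.aws.amazon.com"  # assumed base URL
SCREENSHOT_DIR = Path("tmp/cloud-infrastructure/screenshots")

# Console paths copied from the table above.
CONSOLE_PATHS = {
    "S1": "/ecs/v2/clusters/xops-prod/services",
    "S2": "/cost-management/cost-explorer",
    "S3": "/efs/file-systems/",
    "S4": "/cloudfront/v4/distributions",
    "S5": "/wafv2/homev2/web-acls",
}
LOCAL_PAGES = {"S6": "http://localhost:3000/admin/pipelines"}  # scheme assumed


def capture_plan():
    """Map each signal ID to (url, screenshot path)."""
    plan = {
        sid: (CONSOLE_BASE + path, SCREENSHOT_DIR / f"{sid}.png")
        for sid, path in CONSOLE_PATHS.items()
    }
    for sid, url in LOCAL_PAGES.items():
        plan[sid] = (url, SCREENSHOT_DIR / f"{sid}.png")
    return plan


def capture_all(plan):
    """Visit every URL and save a full-page screenshot (needs playwright installed)."""
    from playwright.sync_api import sync_playwright  # lazy import

    SCREENSHOT_DIR.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for url, target in plan.values():
            page.goto(url)
            page.screenshot(path=str(target), full_page=True)
        browser.close()


plan = capture_plan()
```

Naming each file after its signal ID keeps the screenshots traceable to the rows of the cross-reference matrix.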
Signal Cross-Reference Matrix
| Domain | Layer 1 | Layer 2 | Layer 3 | Layer 4 |
|---|---|---|---|---|
| FinOps Cost | A1 | M2 | R1, R3 | S2 |
| ECS Tasks | A6 | M1 | R2 | S1 |
| Storage | A2 | M3 | R5 | S3 |
| AI Tokens | A4 | M5 | — | — |
| CloudFront | A3 | M6 | — | S4 |
| Security | A5 | — | R6 | S5 |
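The matrix can be held as a plain data structure so coverage gaps are computed rather than read off by eye. The sketch below transcribes the table as-is; the helper names are hypothetical. It lists the domains that lack a layer and the signals from the four layer tables that no domain row references.

```python
# Transcription of the matrix above; an empty list is an empty cell.
MATRIX = {
    "FinOps Cost": {"L1": ["A1"], "L2": ["M2"], "L3": ["R1", "R3"], "L4": ["S2"]},
    "ECS Tasks": {"L1": ["A6"], "L2": ["M1"], "L3": ["R2"], "L4": ["S1"]},
    "Storage": {"L1": ["A2"], "L2": ["M3"], "L3": ["R5"], "L4": ["S3"]},
    "AI Tokens": {"L1": ["A4"], "L2": ["M5"], "L3": [], "L4": []},
    "CloudFront": {"L1": ["A3"], "L2": ["M6"], "L3": [], "L4": ["S4"]},
    "Security": {"L1": ["A5"], "L2": [], "L3": ["R6"], "L4": ["S5"]},
}

# The 24 signals defined in the layer tables: A1-A6, M1-M6, R1-R6, S1-S6.
ALL_SIGNALS = {f"{prefix}{n}" for prefix in "AMRS" for n in range(1, 7)}


def missing_layers(matrix):
    """Return {domain: [layers with no signal]} for domains that have gaps."""
    gaps = {}
    for domain, layers in matrix.items():
        empty = [layer for layer, signals in layers.items() if not signals]
        if empty:
            gaps[domain] = empty
    return gaps


def unmapped_signals(matrix):
    """Return the defined signals that appear in no domain row."""
    used = {s for layers in matrix.values() for signals in layers.values() for s in signals}
    return sorted(ALL_SIGNALS - used)


print(missing_layers(matrix=MATRIX))
print(unmapped_signals(matrix=MATRIX))
```

On the table as written, three domains (FinOps Cost, ECS Tasks, Storage) have all four layers, while AI Tokens, CloudFront, and Security can reach only partial consensus. M4, R4, and S6 are defined in the layer tables but do not appear in any domain row.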
HITL Gate Scoring
The 8 HITL gates are validated by all 4 agents against this framework:
| HITL Gate | Description | Criteria |
|---|---|---|
| H1 CloudOps+Runbooks | PyPI runbooks + MCP + Rich CLI | MCP variance ≤0.5%, 119+ analyzers, Rich CLI output |
| H2 DevOps+TF | Terraform modules + GitOps pipelines | M1–M4 composition, checkov 0 FAILED, infracost ≤5% |
| H3 Orchestrator | CrewAI flows + LiteLLM routing | LiteLLM provider abstraction, Flows v2, Bedrock BC2+ path |
| H4 Frontend | Open WebUI + Docusaurus docs site | Pipeline engine, MCP client, SCIM 2.0, ARM64 image |
| H5 Architecture | ECS AI + K3S GitOps hybrid ADR-005 | 6-layer completeness, Option C hybrid, zero circular deps |
| H6 Cost Model | FOCUS 1.2+ · $180/mo optimised prod | FOCUS 1.2+ tags, $180 verified vs AWS pricing, 6 optimisations |
| H7 Execution Plan | 6-phase ADLC · 1 HITL/phase gate | 5 phases in 10 weeks, M1+M2 head start, PDCA autonomous |
| H8 Cross-Validation | 4-way L1–L4 signal matrix · 24 signals | 24 signals, ≤0.5% tolerance, ≥99.5% accuracy target |