Cross-Validation Framework

Architecture principle: this is cross-validation, not replacement validation. All 4 layers must agree within ≤0.5% variance, which corresponds to a ≥99.5% accuracy target. No single layer is ground truth; consensus across independent sources is the ground truth.

Layer 1 (boto3) ↔ Layer 2 (MCP) ↔ Layer 3 (runbooks PyPI) ↔ Layer 4 (Console screenshots)
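
The agreement rule above can be sketched in a few lines. The source does not say how "variance" is measured, so this sketch assumes it means relative spread, (max - min) / mean, across the four layer readings for one signal. The function names and the sample figures are hypothetical.

```python
from typing import Mapping


def relative_spread(readings: Mapping[str, float]) -> float:
    """Return (max - min) / |mean| across the layer readings for one signal.

    Assumption: this is one plausible reading of "variance" in the
    principle above; the framework may define it differently.
    """
    values = list(readings.values())
    if len(values) < 2:
        raise ValueError("need at least two layer readings to compare")
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    if mean == 0:
        # Identical zero readings agree; anything else cannot be normalised.
        return 0.0 if spread == 0 else float("inf")
    return spread / abs(mean)


def layers_agree(readings: Mapping[str, float], tolerance: float = 0.005) -> bool:
    """True when all layers fall within the tolerance (0.005 = 0.5%)."""
    return relative_spread(readings) <= tolerance


# Hypothetical monthly cost readings (USD) from the four layers.
cost = {"layer1": 180.00, "layer2": 180.40, "layer3": 179.90, "layer4": 180.10}
print(layers_agree(cost))
print(layers_agree({**cost, "layer4": 182.00}))
```

Because no layer is treated as the reference, the spread is normalised by the mean of all readings rather than by any one layer's value.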

Layer 1 — AWS Native (boto3/CLI)

Role: Evidence collection · signal calculation A1–A6 · diagram generation

Evidence path: python evidence/collect_layer1.py --env prod → tmp/cloud-infrastructure/layer1/

| Signal | Name | Command | FOCUS Tag |
|---|---|---|---|
| A1 | FinOps Cost Allocation | cost_explorer.get_cost_and_usage() | FOCUS:ChargeType |
| A2 | EFS Throughput + IOPS | cloudwatch.get_metric_data(Namespace='AWS/EFS',MetricName='TotalIOBytes') | FOCUS:Storage |
| A3 | CloudFront Hit Ratio | cloudwatch.get_metric_statistics(MetricName='CacheHitRate') | FOCUS:ProviderSvc |
| A4 | AI API Token Usage | litellm.get_usage() or Anthropic/OpenAI usage dashboard | FOCUS:AI/ML |
| A5 | WAFv2 Block Rate | cloudwatch.get_metric_data(Namespace='AWS/WAFV2',MetricName='BlockedRequests') | FOCUS:Security |
| A6 | ECS Task Count + Health | ecs.list_tasks() + ecs.describe_tasks() | FOCUS:Compute |
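
The Command column abbreviates the CloudWatch calls. In boto3, `get_metric_data` does not take `Namespace` and `MetricName` as top-level arguments; they are nested inside `MetricDataQueries`. The sketch below shows how the A2 query might be assembled. The helper names, the file system ID, and the choice of `Sum` over 5-minute periods are assumptions for illustration, not taken from `collect_layer1.py`.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(namespace, metric_name, dimensions,
                       stat="Sum", period=300, hours=1, now=None):
    """Build keyword arguments for CloudWatch get_metric_data (hypothetical helper)."""
    end = now or datetime.now(timezone.utc)
    return {
        "MetricDataQueries": [{
            "Id": "m1",  # query IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": k, "Value": v}
                                   for k, v in dimensions.items()],
                },
                "Period": period,  # seconds
                "Stat": stat,
            },
            "ReturnData": True,
        }],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }


def fetch_metric(query, region="us-east-1"):
    """Run the query; needs boto3 and AWS credentials (first page of results only)."""
    import boto3  # imported lazily so the builder works without AWS access
    client = boto3.client("cloudwatch", region_name=region)
    response = client.get_metric_data(**query)
    return response["MetricDataResults"][0]["Values"]


# A2: EFS total I/O bytes. The file system ID is a placeholder.
a2_query = build_metric_query("AWS/EFS", "TotalIOBytes",
                              {"FileSystemId": "fs-EXAMPLE"})
```

The same builder would cover A5 by swapping in `AWS/WAFV2` and `BlockedRequests` with the dimensions of the web ACL being measured.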

Layer 2 — MCP Real-time API

Role: Live validation against Layer 1 · no evidence files created

Evidence path: mcp_validate.py --layer1 tmp/cloud-infrastructure/layer1/ --tolerance 0.005

| Signal | Name | Command | Cross-checks |
|---|---|---|---|
| M1 | ECS services state | MCP aws:list_ecs_services(cluster=xops-prod) | A6 |
| M2 | Cost & Usage live | MCP aws:get_cost_and_usage(granularity=DAILY) | A1 |
| M3 | EFS filesystem live | MCP aws:describe_file_systems() | A2 |
| M4 | Runbook multi-account | MCP cloudops:describe_account_summary(--all) | Multi-account |
| M5 | AI API usage live | MCP litellm:get_usage() or provider-specific usage API | A4 |
| M6 | CloudFront distribution | MCP aws:list_distributions() | A3 |
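
A minimal sketch of the per-signal check implied by the Cross-checks column and the `--tolerance 0.005` flag. It assumes each signal has already been reduced to a single number and that the Layer 1 value is the baseline for the relative difference; the internals of `mcp_validate.py` are not shown in the source, so the function and status names here are hypothetical. M4 is left out because it cross-checks multi-account coverage rather than a Layer 1 signal.

```python
# Layer 2 signal -> Layer 1 signal it cross-checks (from the table above).
CROSS_CHECKS = {"M1": "A6", "M2": "A1", "M3": "A2", "M5": "A4", "M6": "A3"}


def cross_check(layer1, layer2, tolerance=0.005):
    """Compare live Layer 2 values against recorded Layer 1 values.

    Returns {Layer 2 signal: "ok" | "mismatch" | "missing"}.
    """
    results = {}
    for live_signal, recorded_signal in CROSS_CHECKS.items():
        if live_signal not in layer2 or recorded_signal not in layer1:
            results[live_signal] = "missing"
            continue
        recorded = layer1[recorded_signal]
        live = layer2[live_signal]
        if recorded == 0:
            # No baseline to normalise against: require an exact match.
            within = live == 0
        else:
            within = abs(live - recorded) / abs(recorded) <= tolerance
        results[live_signal] = "ok" if within else "mismatch"
    return results


# Hypothetical values: ECS task count and daily cost in USD.
layer1 = {"A6": 4, "A1": 180.0}
layer2 = {"M1": 4, "M2": 181.5}
print(cross_check(layer1, layer2))
```

Reporting absent signals as "missing" rather than skipping them keeps a gap in evidence from passing as agreement.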

Layer 3 — Runbooks PyPI + Rich CLI

Role: Production-grade APIs · multi-account aggregation · FOCUS 1.2+

Evidence path: runbooks finops report --output tmp/cloud-infrastructure/layer3/ → Rich table + JSON

| Signal | Name | Command | Cross-checks |
|---|---|---|---|
| R1 | FOCUS cost report | runbooks finops report --format focus-1.2 --accounts all | All accounts |
| R2 | ECS service status | runbooks cloudops status --cluster xops-prod | A6+M1 |
| R3 | Cost anomaly detection | runbooks finops anomaly --threshold 5% --days 30 | A1+M2 |
| R4 | Tag compliance audit | runbooks cloudops tag-audit --framework focus-1.2 | FinOps governance |
| R5 | EFS storage audit | runbooks cloudops efs-audit | A2+M3 |
| R6 | Security posture | runbooks security prowler-scan --accounts all | CPS 234 evidence |
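
To make the R3 parameters concrete, here is one way a 5% threshold over a 30-day window could be applied. This is an illustrative assumption, not the algorithm used by `runbooks finops anomaly`: it flags any day whose cost deviates from the window median by more than the threshold. The function name and sample costs are hypothetical.

```python
from statistics import median


def find_anomalies(daily_costs, threshold=0.05):
    """Return (day_index, cost) pairs deviating from the median by more than threshold.

    Assumption: the median of the window is the baseline, which keeps a
    single spike from shifting the baseline it is measured against.
    """
    if not daily_costs:
        return []
    baseline = median(daily_costs)
    if baseline == 0:
        # With a zero baseline, any non-zero day counts as a deviation.
        return [(day, cost) for day, cost in enumerate(daily_costs) if cost != 0]
    return [
        (day, cost)
        for day, cost in enumerate(daily_costs)
        if abs(cost - baseline) / abs(baseline) > threshold
    ]


# Hypothetical 30-day window: steady 6.00 USD/day with one spike on the last day.
window = [6.0] * 29 + [9.0]
print(find_anomalies(window))
```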

Layer 4 — Console Screenshots (Ground Truth)

Role: Manager visual verification · Playwright/CDP automated capture · ground truth

Evidence path: playwright test xops-console.spec.ts --screenshot=on → tmp/cloud-infrastructure/screenshots/

| Signal | Name | Command | Verifies |
|---|---|---|---|
| S1 | ECS Services Console | playwright goto /ecs/v2/clusters/xops-prod/services | Visual: task count |
| S2 | Cost Explorer Console | playwright goto /cost-management/cost-explorer | Visual: cost graph |
| S3 | EFS Console | playwright goto /efs/file-systems/ | Visual: throughput/IOPS |
| S4 | CloudFront Metrics | playwright goto /cloudfront/v4/distributions | Visual: hit ratio |
| S5 | WAFv2 Dashboard | playwright goto /wafv2/homev2/web-acls | Visual: block rate |
| S6 | Open WebUI Pipeline Logs | playwright goto localhost:3000/admin/pipelines | Visual: pipeline runs |

Signal Cross-Reference Matrix

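| Domain | Layer 1 | Layer 2 | Layer 3 | Layer 4 |
|---|---|---|---|---|
| FinOps Cost | A1 | M2 | R1, R3 | S2 |
| ECS Tasks | A6 | M1 | R2 | S1 |
| Storage | A2 | M3 | R5 | S3 |
| AI Tokens | A4 | M5 | - | - |
| CloudFront | A3 | M6 | - | S4 |
| Security | A5 | - | R6 | S5 |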
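
The matrix can be held as plain data so that coverage gaps are computed rather than read by eye. This sketch assumes that rows listing fewer than four entries leave the remaining layers empty, with each signal placed in its layer by prefix (A = Layer 1, M = Layer 2, R = Layer 3, S = Layer 4). The function names are hypothetical.

```python
# Domain -> {layer number: signals}, transcribed from the matrix above.
MATRIX = {
    "FinOps Cost": {1: ["A1"], 2: ["M2"], 3: ["R1", "R3"], 4: ["S2"]},
    "ECS Tasks":   {1: ["A6"], 2: ["M1"], 3: ["R2"], 4: ["S1"]},
    "Storage":     {1: ["A2"], 2: ["M3"], 3: ["R5"], 4: ["S3"]},
    "AI Tokens":   {1: ["A4"], 2: ["M5"], 3: [], 4: []},
    "CloudFront":  {1: ["A3"], 2: ["M6"], 3: [], 4: ["S4"]},
    "Security":    {1: ["A5"], 2: [], 3: ["R6"], 4: ["S5"]},
}

# All 24 signals defined in the four layer tables: A1-A6, M1-M6, R1-R6, S1-S6.
ALL_SIGNALS = {f"{prefix}{n}" for prefix in "AMRS" for n in range(1, 7)}


def missing_layers(matrix):
    """Return {domain: [layers with no signal]} for domains lacking 4-way coverage."""
    gaps = {}
    for domain, layers in matrix.items():
        empty = [layer for layer, signals in layers.items() if not signals]
        if empty:
            gaps[domain] = empty
    return gaps


def unreferenced_signals(matrix):
    """Return the defined signals that appear in no matrix row, sorted."""
    used = {s for layers in matrix.values() for sigs in layers.values() for s in sigs}
    return sorted(ALL_SIGNALS - used)


print(missing_layers(MATRIX))        # {'AI Tokens': [3, 4], 'CloudFront': [3], 'Security': [2]}
print(unreferenced_signals(MATRIX))  # ['M4', 'R4', 'S6']
```

Under this reading, only FinOps Cost, ECS Tasks, and Storage have a signal in every layer, so a strict 4-way agreement check applies to those three domains alone.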

HITL Gate Scoring

The 8 HITL gates are validated by all 4 agents against this framework:

| HITL Gate | Description | Criteria |
|---|---|---|
| H1 CloudOps+Runbooks | PyPI runbooks + MCP + Rich CLI | MCP variance ≤0.5%, 119+ analyzers, Rich CLI output |
| H2 DevOps+TF | Terraform modules + GitOps pipelines | M1–M4 composition, checkov 0 FAILED, infracost ≤5% |
| H3 Orchestrator | CrewAI flows + LiteLLM routing | LiteLLM provider abstraction, Flows v2, Bedrock BC2+ path |
| H4 Frontend | Open WebUI + Docusaurus docs site | Pipeline engine, MCP client, SCIM 2.0, ARM64 image |
| H5 Architecture | ECS AI + K3S GitOps hybrid ADR-005 | 6-layer completeness, Option C hybrid, zero circular deps |
| H6 Cost Model | FOCUS 1.2+ · $180/mo optimised prod | FOCUS 1.2+ tags, $180 verified vs AWS pricing, 6 optimisations |
| H7 Execution Plan | 6-phase ADLC · 1 HITL/phase gate | 5 phases in 10 weeks, M1+M2 head start, PDCA autonomous |
| H8 Cross-Validation | 4-way L1–L4 signal matrix · 24 signals | 24 signals, ≤0.5% tolerance, ≥99.5% accuracy target |