ADR-006: MCP Validation Methodology

| Field | Value |
| --- | --- |
| Status | Accepted |
| Date | 2025-08 |
| Decision Makers | Cloud Architect, QA Engineer |
| Context | CloudOps-Runbooks MCP server accuracy requirements |

Context

MCP (Model Context Protocol) servers provide AI agents with live access to AWS APIs. When agents make decisions based on MCP-returned data (cost figures, resource counts, security findings), incorrect data leads to incorrect recommendations. Enterprise environments with 60+ accounts amplify the blast radius of inaccurate data.

The question: How do we ensure MCP server responses are trustworthy enough to base operational decisions on?

Decision

Adopt a cross-validation methodology that compares MCP server responses against native SDK/CLI results, with configurable tolerance thresholds per operation type.

Accuracy Target: ≥99.5%

This target was chosen because:

  • Below 95%: decisions based on MCP data are unreliable — agents may recommend wrong actions
  • 95–99%: acceptable for advisory use but not for automated actions
  • ≥99.5%: sufficient confidence for automated operational decisions with HITL approval gates

Tolerance Thresholds

| Operation Type | Tolerance | Rationale |
| --- | --- | --- |
| Cost data | ±5% | Cost Explorer has inherent lag/rounding; exact match is unrealistic |
| Organisation structure | 0% (exact) | Account hierarchy must be precise — wrong account = wrong blast radius |
| Resource counts | ±5% | Eventual consistency in some AWS APIs; small variance acceptable |
| Security findings | ±5% | Security Hub aggregation timing differences |
| VPC topology | ±1% | Network topology must be highly accurate for routing decisions |
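The per-operation tolerances above can be encoded as a simple lookup plus a relative-deviation check. This is an illustrative sketch, not the framework's actual API; the operation keys and function name are assumptions.

```python
# Hypothetical sketch: per-operation tolerance thresholds from the table
# above, and a relative-deviation comparison helper.
TOLERANCES = {
    "cost_data": 0.05,          # ±5%
    "org_structure": 0.0,       # 0% — exact match required
    "resource_counts": 0.05,    # ±5%
    "security_findings": 0.05,  # ±5%
    "vpc_topology": 0.01,       # ±1%
}

def within_tolerance(mcp_value: float, native_value: float, operation: str) -> bool:
    """Return True if the MCP result is within the operation's tolerance
    of the native SDK/CLI result (treated as ground truth)."""
    tolerance = TOLERANCES[operation]
    if native_value == 0:
        return mcp_value == 0  # avoid division by zero; require exact match
    deviation = abs(mcp_value - native_value) / abs(native_value)
    return deviation <= tolerance
```

For example, an MCP cost figure of 104 against a native figure of 100 (4% deviation) passes, while 106 (6% deviation) fails.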

Validation Process

1. Execute operation via MCP server → capture result
2. Execute same operation via native SDK/CLI → capture result
3. Compare within tolerance → PASS/WARNING/FAIL
4. Log both results + comparison to evidence directory

Status Classification

| Status | Criteria | Action |
| --- | --- | --- |
| PASSED | ≥99.5% accuracy AND ≤30s execution | Proceed with confidence |
| WARNING | 95–99.4% accuracy OR performance concern | Investigate before proceeding |
| FAILED | Below 95% accuracy OR above 60s execution | Block automated actions, investigate |
| ERROR | Validation framework failure | Fix infrastructure before proceeding |
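The PASSED/WARNING/FAILED thresholds reduce to a small decision function. This is a sketch under the assumption that "performance concern" means execution over 30 seconds; ERROR is omitted because it signals a framework failure (typically an exception), not a measured result.

```python
# Hedged sketch of the status classification from the table above.
def classify(accuracy_pct: float, elapsed_s: float) -> str:
    """Map accuracy (0-100) and total validation time (seconds)
    to the ADR's status levels."""
    if accuracy_pct < 95.0 or elapsed_s > 60.0:
        return "FAILED"   # below 95% accuracy OR above 60s execution
    if accuracy_pct < 99.5 or elapsed_s > 30.0:
        return "WARNING"  # 95-99.4% accuracy OR performance concern
    return "PASSED"       # >=99.5% accuracy AND <=30s execution
```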

Consequences

Positive

  • Agents can make data-driven decisions with quantified confidence
  • Cross-validation catches API changes, permission issues, and data staleness early
  • Evidence trail supports compliance requirements (SOC2, PCI-DSS, HIPAA)
  • Performance target (≤30s) ensures validation doesn't bottleneck operations

Negative

  • Doubles API calls (MCP + native validation)
  • Adds ~30s to operations that include validation
  • Requires maintaining both MCP and native SDK code paths

Compliance Mapping

| Framework | Relevant Control | How This ADR Supports |
| --- | --- | --- |
| SOC2 | CC7.2 — Monitoring | Cross-validation provides continuous data accuracy monitoring |
| PCI-DSS | 10.5 — Audit trails | All validation results logged with timestamps |
| HIPAA | §164.312(b) — Audit controls | Evidence directory maintains validation history |

Alternatives Considered

Option A: Trust MCP without validation

  • Rejected: No mechanism to detect silent failures or data drift

Option B: Validate only on first use, cache thereafter

  • Rejected: AWS data changes continuously; cached validation becomes stale

Option C: Sample-based validation (validate 10% of operations)

  • Rejected: Insufficient coverage for enterprise compliance requirements

Implementation Notes

  • Validation runs as part of the operation pipeline, not as a separate step
  • Evidence is written to tmp/<project>/cross-validation/ per ADLC convention
  • Exit codes: 0=PASSED, 1=WARNING, 2=FAILED, 3=ERROR
  • Performance target applies to total validation cycle, not individual API calls
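The exit-code convention above can be captured in a small mapping; the function name is illustrative, and treating an unknown status as ERROR is an assumption, not part of the ADR.

```python
# Minimal sketch of the exit-code convention:
# 0=PASSED, 1=WARNING, 2=FAILED, 3=ERROR.
import sys

EXIT_CODES = {"PASSED": 0, "WARNING": 1, "FAILED": 2, "ERROR": 3}

def exit_with(status: str) -> None:
    """Terminate the validation process with the conventional exit code."""
    sys.exit(EXIT_CODES.get(status, 3))  # unknown status treated as ERROR
```

Mapping statuses to distinct exit codes lets CI pipelines distinguish a blocked action (2) from a degraded-but-usable result (1) without parsing logs.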

Origin: CloudOps-Runbooks MCP validation framework. Elevated to ADR because this methodology governs all MCP integrations across ADLC consumer projects.