# ADR-006: MCP Validation Methodology
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2025-08 |
| Decision Makers | Cloud Architect, QA Engineer |
| Context | CloudOps-Runbooks MCP server accuracy requirements |
## Context
MCP (Model Context Protocol) servers provide AI agents with live access to AWS APIs. When agents make decisions based on MCP-returned data (cost figures, resource counts, security findings), incorrect data leads to incorrect recommendations. Enterprise environments with 60+ accounts amplify the blast radius of inaccurate data.
The question: How do we ensure MCP server responses are trustworthy enough to base operational decisions on?
## Decision
Adopt a cross-validation methodology that compares MCP server responses against native SDK/CLI results, with configurable tolerance thresholds per operation type.
### Accuracy Target: ≥99.5%
This target was chosen because:
- Below 95%: decisions based on MCP data are unreliable — agents may recommend wrong actions
- 95–99%: acceptable for advisory use but not for automated actions
- ≥99.5%: sufficient confidence for automated operational decisions with HITL approval gates
### Tolerance Thresholds
| Operation Type | Tolerance | Rationale |
|---|---|---|
| Cost data | ±5% | Cost Explorer has inherent lag/rounding; exact match is unrealistic |
| Organisation structure | 0% (exact) | Account hierarchy must be precise — wrong account = wrong blast radius |
| Resource counts | ±5% | Eventual consistency in some AWS APIs; small variance acceptable |
| Security findings | ±5% | Security Hub aggregation timing differences |
| VPC topology | ±1% | Network topology must be highly accurate for routing decisions |
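The thresholds above can be sketched as a simple comparison helper. This is a minimal illustration, not the CloudOps-Runbooks implementation; the dictionary keys and function name are assumptions chosen for readability.

```python
# Illustrative per-operation tolerances from the table above.
# Names are hypothetical; the real framework may organise these differently.
TOLERANCES = {
    "cost_data": 0.05,          # ±5%: Cost Explorer lag/rounding
    "org_structure": 0.0,       # exact: wrong account = wrong blast radius
    "resource_counts": 0.05,    # ±5%: eventual consistency
    "security_findings": 0.05,  # ±5%: Security Hub aggregation timing
    "vpc_topology": 0.01,       # ±1%: routing decisions need high accuracy
}

def within_tolerance(mcp_value: float, native_value: float, operation: str) -> bool:
    """Return True when the MCP result agrees with the native SDK result
    within the tolerance configured for this operation type."""
    tolerance = TOLERANCES[operation]
    if tolerance == 0.0:
        return mcp_value == native_value
    if native_value == 0:
        return mcp_value == 0  # avoid division by zero; only exact zero passes
    return abs(mcp_value - native_value) / abs(native_value) <= tolerance
```

A ±5% band means a $104 MCP cost figure validates against a $100 native figure, while $106 does not; organisation structure admits no variance at all.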
### Validation Process
1. Execute operation via MCP server → capture result
2. Execute same operation via native SDK/CLI → capture result
3. Compare within tolerance → PASS/WARNING/FAIL
4. Log both results + comparison to evidence directory
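The four steps above can be sketched as one function. This is a simplified illustration, assuming numeric results and a flat tolerance; `mcp_call` and `native_call` stand in for whatever callables the real pipeline uses, and the evidence path is a placeholder.

```python
import json
import time
from pathlib import Path

def validate(operation, mcp_call, native_call, evidence_dir="tmp/cross-validation"):
    """Sketch of the validation cycle: MCP call, native call, compare, log.
    mcp_call / native_call are caller-supplied zero-argument functions
    returning comparable numeric results."""
    start = time.monotonic()
    mcp_result = mcp_call()        # 1. execute via MCP server
    native_result = native_call()  # 2. execute via native SDK/CLI
    # 3. compare: accuracy as 1 minus relative error against the native result
    if native_result == 0:
        accuracy = 1.0 if mcp_result == 0 else 0.0
    else:
        accuracy = 1 - abs(mcp_result - native_result) / abs(native_result)
    status = "PASS" if accuracy >= 0.995 else "WARNING" if accuracy >= 0.95 else "FAIL"
    # 4. log both results plus the comparison to the evidence directory
    record = {
        "operation": operation,
        "mcp_result": mcp_result,
        "native_result": native_result,
        "accuracy": accuracy,
        "status": status,
        "elapsed_s": round(time.monotonic() - start, 3),
    }
    out = Path(evidence_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{operation}.json").write_text(json.dumps(record, indent=2))
    return record
```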
### Status Classification
| Status | Criteria | Action |
|---|---|---|
| PASSED | ≥99.5% accuracy AND ≤30s execution | Proceed with confidence |
| WARNING | 95–99.4% accuracy OR performance concern | Investigate before proceeding |
| FAILED | Below 95% accuracy OR above 60s execution | Block automated actions, investigate |
| ERROR | Validation framework failure | Fix infrastructure before proceeding |
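The classification table maps directly to a small decision function. A hedged sketch using the thresholds stated in the table; the function name and signature are illustrative.

```python
def classify(accuracy: float, elapsed_s: float, framework_error: bool = False) -> str:
    """Map a validation result to the ADR's status levels.
    Thresholds: PASSED >= 99.5% and <= 30s; FAILED < 95% or > 60s;
    everything in between is a WARNING; framework failures are ERROR."""
    if framework_error:
        return "ERROR"    # validation infrastructure itself failed
    if accuracy < 0.95 or elapsed_s > 60:
        return "FAILED"   # block automated actions, investigate
    if accuracy < 0.995 or elapsed_s > 30:
        return "WARNING"  # accuracy or performance concern; investigate first
    return "PASSED"       # proceed with confidence
```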
## Consequences
### Positive
- Agents can make data-driven decisions with quantified confidence
- Cross-validation catches API changes, permission issues, and data staleness early
- Evidence trail supports compliance requirements (SOC2, PCI-DSS, HIPAA)
- Performance target (≤30s) ensures validation doesn't bottleneck operations
### Negative
- Doubles API calls (MCP + native validation)
- Adds ~30s to operations that include validation
- Requires maintaining both MCP and native SDK code paths
## Compliance Mapping
| Framework | Relevant Control | How This ADR Supports It |
|---|---|---|
| SOC2 | CC7.2 — Monitoring | Cross-validation provides continuous data accuracy monitoring |
| PCI-DSS | 10.5 — Audit trails | All validation results logged with timestamps |
| HIPAA | §164.312(b) — Audit controls | Evidence directory maintains validation history |
## Alternatives Considered
### Option A: Trust MCP without validation
- Rejected: No mechanism to detect silent failures or data drift
### Option B: Validate only on first use, cache thereafter
- Rejected: AWS data changes continuously; cached validation becomes stale
### Option C: Sample-based validation (validate 10% of operations)
- Rejected: Insufficient coverage for enterprise compliance requirements
## Implementation Notes
- Validation runs as part of the operation pipeline, not as a separate step
- Evidence is written to `tmp/<project>/cross-validation/` per ADLC convention
- Exit codes: 0=PASSED, 1=WARNING, 2=FAILED, 3=ERROR
- The performance target applies to the total validation cycle, not to individual API calls
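A process-level sketch of the exit-code convention, in case callers gate on it. `exit_with_status` is a hypothetical helper name, not a documented API; only the code values come from the ADR.

```python
import sys

# Exit-code convention from this ADR: 0=PASSED, 1=WARNING, 2=FAILED, 3=ERROR
STATUS_EXIT_CODES = {"PASSED": 0, "WARNING": 1, "FAILED": 2, "ERROR": 3}

def exit_with_status(status: str) -> None:
    """Terminate the validation run with the matching exit code.
    Unknown statuses fall through to ERROR (3), the safest default."""
    sys.exit(STATUS_EXIT_CODES.get(status, 3))
```

A CI wrapper can then treat exit codes 2 and 3 as hard blocks on automated actions while letting 1 through with a flag for human review.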
Origin: CloudOps-Runbooks MCP validation framework. Elevated to ADR because this methodology governs all MCP integrations across ADLC consumer projects.