# ADR-006: MCP Validation Methodology
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2025-08 |
| Decision Makers | Cloud Architect, QA Engineer |
| Context | CloudOps-Runbooks MCP server accuracy requirements |
## Context
MCP (Model Context Protocol) servers provide AI agents with live access to AWS APIs. When agents make decisions based on MCP-returned data (cost figures, resource counts, security findings), incorrect data leads to incorrect recommendations. Enterprise environments with 60+ accounts amplify the blast radius of inaccurate data.
The question: How do we ensure MCP server responses are trustworthy enough to base operational decisions on?
## Decision
Adopt a cross-validation methodology that compares MCP server responses against native SDK/CLI results, with configurable tolerance thresholds per operation type.
### Accuracy Target: ≥99.5%
This target was chosen because:
- Below 95%: decisions based on MCP data are unreliable — agents may recommend wrong actions
- 95–99%: acceptable for advisory use but not for automated actions
- ≥99.5%: sufficient confidence for automated operational decisions with HITL approval gates
### Tolerance Thresholds
| Operation Type | Tolerance | Rationale |
|---|---|---|
| Cost data | ±5% | Cost Explorer has inherent lag/rounding; exact match is unrealistic |
| Organisation structure | 0% (exact) | Account hierarchy must be precise — wrong account = wrong blast radius |
| Resource counts | ±5% | Eventual consistency in some AWS APIs; small variance acceptable |
| Security findings | ±5% | Security Hub aggregation timing differences |
| VPC topology | ±1% | Network topology must be highly accurate for routing decisions |
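The thresholds above can be sketched as a simple comparison helper. This is a minimal illustration, not the CloudOps-Runbooks implementation; the dictionary keys and function name are assumptions chosen for readability.

```python
# Illustrative per-operation tolerances from the table above.
# Names are hypothetical; the real framework may organise these differently.
TOLERANCES = {
    "cost_data": 0.05,          # ±5%: Cost Explorer lag/rounding
    "org_structure": 0.0,       # exact: wrong account = wrong blast radius
    "resource_counts": 0.05,    # ±5%: eventual consistency
    "security_findings": 0.05,  # ±5%: Security Hub aggregation timing
    "vpc_topology": 0.01,       # ±1%: routing decisions need high accuracy
}

def within_tolerance(mcp_value: float, native_value: float, operation: str) -> bool:
    """Return True when the MCP result agrees with the native SDK result
    within the tolerance configured for this operation type."""
    tolerance = TOLERANCES[operation]
    if tolerance == 0.0:
        return mcp_value == native_value
    if native_value == 0:
        return mcp_value == 0  # avoid division by zero; only exact zero passes
    return abs(mcp_value - native_value) / abs(native_value) <= tolerance
```

A ±5% band means a $104 MCP cost figure validates against a $100 native figure, while $106 does not; organisation structure admits no variance at all.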
### Validation Process
1. Execute operation via MCP server → capture result
2. Execute same operation via native SDK/CLI → capture result
3. Compare within tolerance → PASS/WARNING/FAIL
4. Log both results + comparison to evidence directory
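The four steps above can be sketched as one function. This is a simplified illustration, assuming numeric results and a flat tolerance; `mcp_call` and `native_call` stand in for whatever callables the real pipeline uses, and the evidence path is a placeholder.

```python
import json
import time
from pathlib import Path

def validate(operation, mcp_call, native_call, evidence_dir="tmp/cross-validation"):
    """Sketch of the validation cycle: MCP call, native call, compare, log.
    mcp_call / native_call are caller-supplied zero-argument functions
    returning comparable numeric results."""
    start = time.monotonic()
    mcp_result = mcp_call()        # 1. execute via MCP server
    native_result = native_call()  # 2. execute via native SDK/CLI
    # 3. compare: accuracy as 1 minus relative error against the native result
    if native_result == 0:
        accuracy = 1.0 if mcp_result == 0 else 0.0
    else:
        accuracy = 1 - abs(mcp_result - native_result) / abs(native_result)
    status = "PASS" if accuracy >= 0.995 else "WARNING" if accuracy >= 0.95 else "FAIL"
    # 4. log both results plus the comparison to the evidence directory
    record = {
        "operation": operation,
        "mcp_result": mcp_result,
        "native_result": native_result,
        "accuracy": accuracy,
        "status": status,
        "elapsed_s": round(time.monotonic() - start, 3),
    }
    out = Path(evidence_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{operation}.json").write_text(json.dumps(record, indent=2))
    return record
```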
### Status Classification
| Status | Criteria | Action |
|---|---|---|
| PASSED | ≥99.5% accuracy AND ≤30s execution | Proceed with confidence |
| WARNING | 95–99.4% accuracy OR performance concern | Investigate before proceeding |
| FAILED | Below 95% accuracy OR above 60s execution | Block automated actions, investigate |
| ERROR | Validation framework failure | Fix infrastructure before proceeding |
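The classification table maps directly to a small decision function. A hedged sketch using the thresholds stated in the table; the function name and signature are illustrative.

```python
def classify(accuracy: float, elapsed_s: float, framework_error: bool = False) -> str:
    """Map a validation result to the ADR's status levels.
    Thresholds: PASSED >= 99.5% and <= 30s; FAILED < 95% or > 60s;
    everything in between is a WARNING; framework failures are ERROR."""
    if framework_error:
        return "ERROR"    # validation infrastructure itself failed
    if accuracy < 0.95 or elapsed_s > 60:
        return "FAILED"   # block automated actions, investigate
    if accuracy < 0.995 or elapsed_s > 30:
        return "WARNING"  # accuracy or performance concern; investigate first
    return "PASSED"       # proceed with confidence
```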
## Consequences
### Positive
- Agents can make data-driven decisions with quantified confidence
- Cross-validation catches API changes, permission issues, and data staleness early
- Evidence trail supports compliance requirements (SOC2, PCI-DSS, HIPAA)
- Performance target (≤30s) ensures validation doesn't bottleneck operations
### Negative
- Doubles API calls (MCP + native validation)
- Adds ~30s to operations that include validation
- Requires maintaining both MCP and native SDK code paths
## Compliance Mapping
| Framework | Relevant Control | How This ADR Supports It |
|---|---|---|
| SOC2 | CC7.2 — Monitoring | Cross-validation provides continuous data accuracy monitoring |
| PCI-DSS | 10.5 — Audit trails | All validation results logged with timestamps |
| HIPAA | §164.312(b) — Audit controls | Evidence directory maintains validation history |
## Alternatives Considered
### Option A: Trust MCP without validation
- Rejected: No mechanism to detect silent failures or data drift
### Option B: Validate only on first use, cache thereafter
- Rejected: AWS data changes continuously; cached validation becomes stale
### Option C: Sample-based validation (validate 10% of operations)
- Rejected: Insufficient coverage for enterprise compliance requirements
## Implementation Notes
- Validation runs as part of the operation pipeline, not as a separate step
- Evidence is written to `tmp/<project>/cross-validation/` per ADLC convention
- Exit codes: 0=PASSED, 1=WARNING, 2=FAILED, 3=ERROR
- The performance target applies to the total validation cycle, not to individual API calls
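A process-level sketch of the exit-code convention, in case callers gate on it. `exit_with_status` is a hypothetical helper name, not a documented API; only the code values come from the ADR.

```python
import sys

# Exit-code convention from this ADR: 0=PASSED, 1=WARNING, 2=FAILED, 3=ERROR
STATUS_EXIT_CODES = {"PASSED": 0, "WARNING": 1, "FAILED": 2, "ERROR": 3}

def exit_with_status(status: str) -> None:
    """Terminate the validation run with the matching exit code.
    Unknown statuses fall through to ERROR (3), the safest default."""
    sys.exit(STATUS_EXIT_CODES.get(status, 3))
```

A CI wrapper can then treat exit codes 2 and 3 as hard blocks on automated actions while letting 1 through with a flag for human review.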
Origin: CloudOps-Runbooks MCP validation framework. Elevated to ADR because this methodology governs all MCP integrations across ADLC consumer projects.