Testing & Quality

Enterprise-grade AI agent governance requires enterprise-grade testing. This page summarises the ADLC testing strategy and improvement map for ANZ FSI/Energy/Telecom/Aviation targets.


Enterprise Standards Checklist (2026-2030)

| Standard | Current State | Target | Status |
| --- | --- | --- | --- |
| Test Coverage | 59.27% (2026-03-17) | ≥80% | In Progress |
| 3-Tier Progressive Pipeline | Tier 1+2 automated, Tier 3 manual | Fully automated Tier 1→2→3 | Partial |
| BDD Given/When/Then docstrings | Introduced Sprint 1 | All public APIs | In Progress |
| SAST (bandit) | CI gate live | 0 HIGH/CRITICAL blocking | In Progress |
| SCA (pip-audit) | CI gate live | Weekly scheduled + gate | In Progress |
| SBOM (cyclonedx) | Planned | Every tagged release | Planned |
| Secret scanning (truffleHog) | Planned | Pre-commit + CI | Planned |
| Docker-first CI (act validated) | Adopted | All workflows | Done |
| DORA metrics (all 4) | Collected via hooks | Automated per-sprint | Partial |
| APRA CPS 234 (FSI) | Documented | Auditable | In Progress |
| Pinned SHA digests (actions) | Partial | All workflows | In Progress |
| Wolfi/distroless containers | Adopted for E2E | All images | In Progress |
| Agent consensus quality | 96% achieved (S1) | ≥95% per change | Enforced |

Current Baseline

pytest: 33/33 PASS, 59.27% coverage (2026-03-17). CI gates: 16 active. Hook tests: 190+ cases, 9 suites PASS.


TDD Approach

Test-Driven Development is used as the primary design technique. Passing tests are a by-product; modular, injectable interfaces are the goal.

1. RED: Write a failing test that specifies the behaviour
2. GREEN: Write the minimum code to make the test pass
3. REFACTOR: Clean up while the tests protect against regressions
4. REPEAT

# Step 1 (RED): the test describes the desired behaviour.
# `cli` is the project's CLI entry point; `runner` and `mock_boto3` are pytest fixtures.
def test_list_ec2_instances_returns_table(runner, mock_boto3):
    """Given valid AWS credentials, when listing EC2, then render a Rich table."""
    result = runner.invoke(cli, ["ec2", "list"])
    assert result.exit_code == 0
    assert "Instance ID" in result.output

# Step 2 (GREEN): minimal implementation
# Step 3 (REFACTOR): extract table rendering to rich_utils.py
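The "modular, injectable interfaces" goal can be illustrated with a small sketch. Everything named here (`InstanceSource`, `FakeSource`, `render_instance_table`) is hypothetical and not taken from the project; the point is only that a function which receives its data source as an argument can be tested without AWS or mocks of `boto3`.

```python
from dataclasses import dataclass
from typing import Protocol


class InstanceSource(Protocol):
    """Hypothetical interface: anything that can return instance records."""

    def list_instances(self) -> list[dict]: ...


@dataclass
class FakeSource:
    """In-memory stand-in that tests inject instead of a real AWS client."""

    records: list[dict]

    def list_instances(self) -> list[dict]:
        return self.records


def render_instance_table(source: InstanceSource) -> str:
    """Render a plain-text table; the caller decides where the data comes from."""
    lines = ["Instance ID          State"]
    for record in source.list_instances():
        lines.append(f"{record['id']:<20} {record['state']}")
    return "\n".join(lines)


def test_render_instance_table_lists_each_instance():
    """Given two instances, when rendering, then both IDs appear under the header."""
    source = FakeSource([
        {"id": "i-0abc", "state": "running"},
        {"id": "i-0def", "state": "stopped"},
    ])
    output = render_instance_table(source)
    assert output.splitlines()[0].startswith("Instance ID")
    assert "i-0abc" in output and "i-0def" in output
```

Because the dependency is passed in, the RED test can be written before any AWS-facing code exists, and the later REFACTOR step can swap the rendering without touching the test's setup.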

BDD Approach

Behaviour-Driven Development converts tests from implementation checks into business-readable contracts.

Given a multi-account org with 5 AWS accounts
When querying costs for March 2026
Then all 5 accounts are aggregated with correct totals

# `cli` is the project's CLI entry point; `runner` and `mock_mcp` are pytest fixtures.
def test_cost_explorer_returns_monthly_summary(runner, mock_mcp):
    """
    Given: A valid AWS profile with Cost Explorer access
    When: The user runs `runbooks cost monthly --profile dev`
    Then: A table showing service-level costs is rendered
    And: Total cost is displayed in the summary row
    And: Exit code is 0
    """
    result = runner.invoke(cli, ["cost", "monthly", "--profile", "dev"])
    assert result.exit_code == 0
    assert "Amazon EC2" in result.output
    assert "Total" in result.output
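One way to keep the Given/When/Then convention honest is to check docstrings mechanically. The following is a minimal sketch, not the project's actual tooling: `has_bdd_docstring` and `missing_bdd_docstrings` are hypothetical helpers that use only the standard library to flag `test_*` functions whose docstring lacks the three keywords in order.

```python
import inspect
import re

# Keywords a BDD docstring is expected to contain, in this order.
BDD_KEYWORDS = ("Given", "When", "Then")


def has_bdd_docstring(func) -> bool:
    """Return True if func's docstring contains Given, When, Then in order."""
    doc = inspect.getdoc(func) or ""
    position = 0
    for keyword in BDD_KEYWORDS:
        # Case-insensitive so one-line forms ("Given ..., when ..., then ...") pass.
        match = re.search(rf"\b{keyword}\b", doc[position:], flags=re.IGNORECASE)
        if match is None:
            return False
        position += match.end()
    return True


def missing_bdd_docstrings(namespace: dict) -> list[str]:
    """List the test_* functions in namespace that lack a BDD docstring."""
    return sorted(
        name
        for name, obj in namespace.items()
        if name.startswith("test_")
        and inspect.isfunction(obj)
        and not has_bdd_docstring(obj)
    )


def test_documented():
    """Given a profile, when listing costs, then a table is rendered."""


def test_undocumented():
    """Checks the table."""


print(missing_bdd_docstrings(
    {"test_documented": test_documented, "test_undocumented": test_undocumented}
))  # prints ['test_undocumented']
```

A check like this could run over a test module's `vars()` in CI; whether to treat a miss as blocking or advisory is a policy choice the source does not specify.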

Progressive Testing Pipeline

Tier 1 (Unit, ~2s, $0) → Tier 2 (Integration, ~30s, $0) → Tier 3 (E2E, ~15min, ~$5)

Gates are chained: each tier runs only if the previous one passes, so Tier 2 runs only after Tier 1 passes and Tier 3 only after Tier 2 passes.

| Tier | Scope | Tools | Gate |
| --- | --- | --- | --- |
| 1 (Unit) | Functions, modules | pytest + fixtures | All tests PASS |
| 2 (Integration) | CLI commands, mock AWS | pytest + moto + LocalStack | All PASS + ≥80% coverage |
| 3 (E2E) | Browser + AWS sandbox | Playwright + LocalStack | All E2E PASS + Lighthouse ≥90 |

task test:unit         # Tier 1: run on every save
task test:integration  # Tier 2: run before commit
task test:e2e          # Tier 3: run before PR
task test:progressive  # Full gate-chained pipeline
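The gate-chaining rule can be sketched in a few lines of Python. This is an illustration of the control flow only, not the contents of the project's Taskfile: `run_progressive` is a hypothetical function, and the lambdas stand in for real tier commands.

```python
from typing import Callable

# Each tier is (name, check); check returns True on PASS.
# The lambdas below are placeholders, not the project's real task targets.
Tier = tuple[str, Callable[[], bool]]


def run_progressive(tiers: list[Tier]) -> list[tuple[str, str]]:
    """Run tiers in order; after the first failure, mark later tiers SKIPPED."""
    results = []
    blocked = False
    for name, check in tiers:
        if blocked:
            # An earlier tier failed, so this one never executes.
            results.append((name, "SKIPPED"))
            continue
        passed = check()
        results.append((name, "PASS" if passed else "FAIL"))
        blocked = not passed
    return results


results = run_progressive([
    ("unit", lambda: True),
    ("integration", lambda: False),  # simulated failure
    ("e2e", lambda: True),           # never runs
])
print(results)
# prints [('unit', 'PASS'), ('integration', 'FAIL'), ('e2e', 'SKIPPED')]
```

Ordering the tiers from cheapest to most expensive means a failing unit test stops the run before the roughly 15-minute, roughly $5 E2E tier is ever started.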

Component Coverage Map

| Component | Count | Test Type | Status |
| --- | --- | --- | --- |
| Core Agents | 10 | Frontmatter schema + behavioural spec | Validated |
| Commands | 74 | Frontmatter schema validation | Validated |
| Hooks | 22 | Functional tests: 22/22 tested (300+ cases) | PASS |
| Skills | 20 | Referenced file existence validated | Validated |
| Settings / MANIFEST | 2 | Cross-validated (settings ↔ hooks ↔ agents) | PASS |
| MCPs | 58 | JSON schema + connectivity shape | Valid JSON |
| CLI source modules | 22 | pytest unit + integration | 59% (baseline) |
| E2E / Playwright | 22 tests | Browser automation against Docusaurus | PASS |

Coverage Gap

Source coverage is 59.27% (2026-03-17), about 21 percentage points below the 80% enterprise target. Active remediation in CloudOps-Runbooks S5: unskipping the moto tests and fixing the quality gates. Target: ≥80% by end of S5.


Quality Gates (All 16)

| Gate | Tool | Threshold | Enforcement |
| --- | --- | --- | --- |
| Lint | ruff check | 0 errors | BLOCK |
| Format | ruff format --check | 0 diff | BLOCK |
| Unit tests | pytest tests/unit | 100% PASS | BLOCK |
| Integration tests | pytest tests/integration | 100% PASS | BLOCK |
| Coverage | pytest --cov | ≥80% (advisory until S5) | ADVISORY |
| Type checking | pyright | 0 errors on public API | ADVISORY |
| SAST | bandit -r src | 0 HIGH/CRITICAL | BLOCK |
| SCA | pip-audit | 0 CRITICAL CVEs | BLOCK |
| IaC security | checkov | 0 FAILED policies | BLOCK |
| Container scan | trivy | 0 HIGH/CRITICAL | BLOCK |
| Secrets | truffleHog | 0 detected | BLOCK |
| Infracost | infracost diff | ≤+5% cost delta | ADVISORY |
| Hook tests | bash tests/hooks/run-all-tests.sh | 190+ cases PASS | BLOCK |
| E2E smoke | Playwright @smoke | All PASS | BLOCK |
| Lighthouse | lhci autorun | ≥90 performance | ADVISORY |
| SBOM | cyclonedx-python | Generated | ADVISORY |
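The difference between BLOCK and ADVISORY enforcement can be made concrete with a small sketch. `GateResult` and `evaluate` are hypothetical names, and the assumed semantics (a failed BLOCK gate fails the build, a failed ADVISORY gate only produces a warning) are an interpretation of the table rather than something the source spells out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    enforcement: str  # "BLOCK" or "ADVISORY", as in the gate table


def evaluate(results: list[GateResult]) -> tuple[bool, list[str], list[str]]:
    """Return (ok, blocking_failures, advisory_warnings).

    Assumption: only failed BLOCK gates make the overall result fail.
    """
    blocking = [r.name for r in results if not r.passed and r.enforcement == "BLOCK"]
    advisory = [r.name for r in results if not r.passed and r.enforcement == "ADVISORY"]
    return (not blocking, blocking, advisory)


# Coverage is 59.27% against a >=80% threshold, but the gate is ADVISORY.
ok, blocking, advisory = evaluate([
    GateResult("Lint", True, "BLOCK"),
    GateResult("Coverage", 59.27 >= 80.0, "ADVISORY"),
])
print(ok, blocking, advisory)  # prints True [] ['Coverage']
```

Under these assumptions the current coverage shortfall surfaces as a warning without stopping a merge, which matches the "advisory until S5" note on the Coverage gate.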

Further Reading