Testing & Quality

Enterprise-grade AI agent governance requires enterprise-grade testing. This page summarises the ADLC testing strategy and improvement map for ANZ FSI/Energy/Telecom/Aviation targets.

Enterprise Standards Checklist (2026-2030)

Standard	Current State	Target	Status
Test Coverage	59.27% (2026-03-17)	≥80%	In Progress
3-Tier Progressive Pipeline	Tier 1+2 automated, Tier 3 manual	Fully automated Tier 1→2→3	Partial
BDD Given/When/Then docstrings	Introduced Sprint 1	All public APIs	In Progress
SAST (`bandit`)	CI gate live	0 HIGH-severity findings blocking	In Progress
SCA (`pip-audit`)	CI gate live	Weekly scheduled + gate	In Progress
SBOM (`cyclonedx`)	Planned	Every tagged release	Planned
Secret scanning (`truffleHog`)	Planned	Pre-commit + CI	Planned
Docker-first CI (`act` validated)	Adopted	All workflows	Done
DORA metrics (all 4)	Collected via hooks	Automated per-sprint	Partial
APRA CPS 234 (FSI)	Documented	Auditable	In Progress
Pinned SHA digests (actions)	Partial	All workflows	In Progress
Wolfi/distroless containers	Adopted for E2E	All images	In Progress
Agent consensus quality	96% achieved (S1)	≥95% per change	Enforced

Current Baseline

pytest: 33/33 PASS, 59.27% coverage (2026-03-17). CI gates: 16 active. Hook tests: 190+ cases, 9 suites PASS.

TDD Approach

Test-Driven Development is used as the primary design technique. Passing tests are a by-product; modular, injectable interfaces are the goal.

RED      : Write a failing test that specifies the behaviour
GREEN    : Write the minimum code that makes the test pass
REFACTOR : Improve the design while the passing tests guard against regressions
REPEAT

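# Step 1: RED — test describes desired behaviour
# `runner` (assumed to be a click.testing.CliRunner) and `mock_boto3` are pytest fixtures
# from conftest.py; `cli` is the project's CLI entry point, imported at the top of the module.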
def test_list_ec2_instances_returns_table(runner, mock_boto3):
    """Given valid AWS credentials, when listing EC2, then render a Rich table."""
    result = runner.invoke(cli, ["ec2", "list"])
    assert result.exit_code == 0
    assert "Instance ID" in result.output

# Step 2: GREEN — minimal implementation
# Step 3: REFACTOR — extract table rendering to rich_utils.py
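The GREEN and REFACTOR steps above can be sketched without real AWS access by injecting the EC2 client, which is the "modular, injectable interface" TDD is meant to produce. This is a stdlib-only sketch under stated assumptions: `render_instance_table` is a hypothetical stand-in for the helper extracted to `rich_utils.py`, it renders plain text rather than a Rich table, and `FakeEC2` mimics only the shape of boto3's `describe_instances` response.

```python
from typing import Any, Protocol


class EC2Client(Protocol):
    """The single method the renderer needs; a boto3 EC2 client can be passed in its place."""

    def describe_instances(self) -> dict[str, Any]: ...


def render_instance_table(client: EC2Client) -> str:
    """Hypothetical stand-in for the rich_utils.py helper: one row per instance, plain text."""
    response = client.describe_instances()
    rows = [
        (inst["InstanceId"], inst["InstanceType"], inst["State"]["Name"])
        for reservation in response.get("Reservations", [])
        for inst in reservation.get("Instances", [])
    ]
    header = ("Instance ID", "Type", "State")
    # Column width = longest cell in that column, header included.
    widths = [max(len(row[i]) for row in [header, *rows]) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in [header, *rows]
    )


class FakeEC2:
    """Test double returning a canned, describe_instances-shaped response."""

    def describe_instances(self) -> dict[str, Any]:
        return {
            "Reservations": [
                {"Instances": [{"InstanceId": "i-0abc", "InstanceType": "t3.micro",
                                "State": {"Name": "running"}}]}
            ]
        }


def test_render_instance_table_lists_instances():
    """Given one running instance, when rendering, then the table shows its ID."""
    output = render_instance_table(FakeEC2())
    assert "Instance ID" in output
    assert "i-0abc" in output
```

Because the client is a parameter, the unit test needs neither `moto` nor credentials; the CLI command only has to pass a real client in.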

BDD Approach

Behaviour-Driven Development converts tests from implementation checks into business-readable contracts.

Given a multi-account org with 5 AWS accounts
When querying costs for March 2026
Then all 5 accounts are aggregated with correct totals
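The scenario above can be written as a plain test whose comments mirror the Given/When/Then steps. This is a sketch: `aggregate_costs` and its `(account_id, month, amount)` record shape are hypothetical, and `Decimal` is used so monetary totals compare exactly rather than with float rounding error.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate_costs(records, month):
    """Hypothetical helper: sum cost per account for one 'YYYY-MM' billing month."""
    per_account = defaultdict(Decimal)  # Decimal() starts each account at Decimal('0')
    for account_id, billing_month, amount in records:
        if billing_month == month:
            per_account[account_id] += Decimal(amount)
    return dict(per_account), sum(per_account.values(), Decimal("0"))


def test_march_2026_costs_aggregate_all_accounts():
    # Given a multi-account org with 5 AWS accounts
    accounts = [f"11111111111{i}" for i in range(5)]
    records = [(acct, "2026-03", "100.10") for acct in accounts]
    records.append((accounts[0], "2026-02", "999.99"))  # outside the queried month

    # When querying costs for March 2026
    per_account, total = aggregate_costs(records, "2026-03")

    # Then all 5 accounts are aggregated with correct totals
    assert len(per_account) == 5
    assert total == Decimal("500.50")
```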

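# `runner` and `mock_mcp` are assumed conftest.py fixtures (mock_mcp presumably supplies canned cost data).
def test_cost_explorer_returns_monthly_summary(runner, mock_mcp):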
    """
    Given: A valid AWS profile with Cost Explorer access
    When:  The user runs `runbooks cost monthly --profile dev`
    Then:  A table showing service-level costs is rendered
    And:   Total cost is displayed in the summary row
    And:   Exit code is 0
    """
    result = runner.invoke(cli, ["cost", "monthly", "--profile", "dev"])
    assert result.exit_code == 0
    assert "Amazon EC2" in result.output
    assert "Total" in result.output

Progressive Testing Pipeline

Tier 1 (Unit, ~2s, $0) → Tier 2 (Integration, ~30s, $0) → Tier 3 (E2E, ~15min, ~$5)

Gates are chained: Tier 2 runs only if Tier 1 passes, and Tier 3 runs only if Tier 2 passes.

Tier	Scope	Tools	Gate
1 — Unit	Functions, modules	`pytest` + fixtures	All tests PASS
2 — Integration	CLI commands, mock AWS	`pytest` + `moto` + LocalStack	All PASS + ≥80% coverage
3 — E2E	Browser + AWS sandbox	Playwright + LocalStack	All E2E PASS + Lighthouse ≥90

task test:unit          # Tier 1 — run on every save
task test:integration   # Tier 2 — run before commit
task test:e2e           # Tier 3 — run before PR
task test:progressive   # Full gate-chained pipeline
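The gate chaining behind `task test:progressive` can be sketched in a few lines of Python. This is an illustration, not the project's Taskfile: `run_progressive` and `all_passed` are hypothetical names, and the commands in `EXAMPLE_TIERS` (especially the Playwright invocation) are assumptions about what each tier runs.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class TierResult:
    name: str
    returncode: int


# Assumed commands per tier; the real Taskfile targets may differ.
EXAMPLE_TIERS = [
    ("unit", ["pytest", "tests/unit"]),
    ("integration", ["pytest", "tests/integration"]),
    ("e2e", ["npx", "playwright", "test"]),
]


def run_progressive(tiers: list[tuple[str, list[str]]]) -> list[TierResult]:
    """Run tiers in order; a failing tier closes the gate for every later tier."""
    results: list[TierResult] = []
    for name, command in tiers:
        completed = subprocess.run(command, check=False)
        results.append(TierResult(name, completed.returncode))
        print(f"[{name}] exit code {completed.returncode}", file=sys.stderr)
        if completed.returncode != 0:
            break  # skip the slower, costlier tiers that follow
    return results


def all_passed(results: list[TierResult], expected_tiers: int) -> bool:
    """True only if every tier ran and every tier exited 0."""
    return len(results) == expected_tiers and all(r.returncode == 0 for r in results)
```

Usage: `all_passed(run_progressive(EXAMPLE_TIERS), len(EXAMPLE_TIERS))`. Stopping at the first failure is what keeps a broken unit test from spending the ~15 minutes and ~$5 of a Tier 3 run.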

Component Coverage Map

Component	Count	Test Type	Status
Core Agents	10	Frontmatter schema + behavioural spec	Validated
Commands	74	Frontmatter schema validation	Validated
Hooks	22	Functional tests — 22/22 tested (300+ cases)	PASS
Skills	20	Referenced file existence validated	Validated
Settings / MANIFEST	2	Cross-validated (settings ↔ hooks ↔ agents)	PASS
MCPs	58	JSON schema + connectivity shape	Valid JSON
CLI source modules	22	pytest unit + integration	59% (baseline)
E2E / Playwright	22 tests	Browser automation against Docusaurus	PASS
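The frontmatter schema checks listed for agents and commands can be approximated with a stdlib-only parser. This sketch handles only flat `key: value` pairs (real YAML frontmatter would need a YAML parser), and the required keys `name` and `description` are an assumption rather than the project's actual schema.

```python
REQUIRED_KEYS = {"name", "description"}  # hypothetical schema; substitute the real required fields


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a leading '---' delimited block of simple 'key: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            key, _, value = line.partition(":")  # split on the first colon only
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing '---'")


def validate_frontmatter(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    try:
        fields = parse_frontmatter(text)
    except ValueError as exc:
        return [str(exc)]
    present = set(fields)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - present)]
    problems += [f"empty value: {key}" for key in sorted(REQUIRED_KEYS & present) if not fields[key]]
    return problems
```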

Coverage Gap

Source coverage of 59% is below the 80% enterprise target. Active remediation in CloudOps-Runbooks Sprint 5 (S5): un-skipping the moto-based tests and fixing quality-gate failures. Target: ≥80% by the end of S5.
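Sizing the gap is simple arithmetic on a coverage report. The sketch below reads the `totals.covered_lines` and `totals.num_statements` fields of a coverage.py JSON report (as produced by `coverage json`) and computes how many more statements must be covered to reach 80%, assuming no new code is added. The function names are hypothetical, and the figure is statement coverage, which differs from coverage.py's `percent_covered` when branch coverage is enabled.

```python
def statements_to_target(covered: int, total: int, target_pct: int = 80) -> int:
    """Extra statements to cover (no new code assumed) to reach target_pct."""
    needed = -(-total * target_pct // 100)  # integer ceiling division avoids float error
    return max(0, needed - covered)


def coverage_gap(report: dict) -> dict:
    """Summarise a coverage.py JSON report against the 80% target."""
    totals = report["totals"]
    covered, total = totals["covered_lines"], totals["num_statements"]
    percent = 100.0 if total == 0 else round(100 * covered / total, 2)
    return {"percent": percent, "statements_needed": statements_to_target(covered, total)}
```

For a hypothetical 10,000-statement codebase at 59.27%, this gives 8,000 − 5,927 = 2,073 statements still to cover. Once the gate becomes blocking, pytest-cov's `--cov-fail-under=80` option enforces the same threshold directly.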

Quality Gates (All 16)

Gate	Tool	Threshold	Enforcement
Lint	`ruff check`	0 errors	BLOCK
Format	`ruff format --check`	0 diff	BLOCK
Unit tests	`pytest tests/unit`	100% PASS	BLOCK
Integration tests	`pytest tests/integration`	100% PASS	BLOCK
Coverage	`pytest --cov`	≥80% (advisory until S5)	ADVISORY
Type checking	`pyright`	0 errors on public API	ADVISORY
SAST	`bandit -r src`	0 HIGH severity	BLOCK
SCA	`pip-audit`	0 CRITICAL CVEs	BLOCK
IaC security	`checkov`	0 FAILED policies	BLOCK
Container scan	`trivy`	0 HIGH/CRITICAL	BLOCK
Secrets	`truffleHog`	0 detected	BLOCK
Infracost	`infracost diff`	≤+5% cost delta	ADVISORY
Hook tests	`bash tests/hooks/run-all-tests.sh`	190+ cases PASS	BLOCK
E2E smoke	Playwright `@smoke`	All PASS	BLOCK
Lighthouse	`lhci autorun`	≥90 performance	ADVISORY
SBOM	`cyclonedx-python`	Generated	ADVISORY
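The Enforcement column implies a simple roll-up rule: any failing BLOCK gate fails the run, while failing ADVISORY gates (coverage until S5, type checking, Infracost, Lighthouse, SBOM) only produce warnings. A minimal sketch of that rule, with hypothetical `GateResult` and `evaluate` names:

```python
from dataclasses import dataclass
from enum import Enum


class Enforcement(Enum):
    BLOCK = "BLOCK"
    ADVISORY = "ADVISORY"


@dataclass(frozen=True)
class GateResult:
    gate: str
    passed: bool
    enforcement: Enforcement


def evaluate(results: list[GateResult]) -> tuple[bool, list[str], list[str]]:
    """Return (run_ok, failing BLOCK gates, failing ADVISORY gates)."""
    failed = [r for r in results if not r.passed]
    blocking = [r.gate for r in failed if r.enforcement is Enforcement.BLOCK]
    warnings = [r.gate for r in failed if r.enforcement is Enforcement.ADVISORY]
    return not blocking, blocking, warnings
```

Promoting the coverage gate at the end of S5 then means changing its `enforcement` from `ADVISORY` to `BLOCK`, with no change to the roll-up logic.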
