ITSM Lifecycle Golden Path
ITSM operations are distinct from product development. OPS board = customer incidents. SPM board = product stories. The pattern bridge connects them — after PII scrub.
At a Glance
Every OPS ticket flows through a standardized pipeline: classify the issue, cross-validate against real infrastructure, then — for changes only — create a change record and change request for approval. Post-incident reviews capture lessons learned.
| I have a... | What happens | Command |
|---|---|---|
| New OPS ticket | Classify type + priority, then verify against AWS | /itsm:classify OPS-NNN |
| Classified ticket needing evidence | Cross-validate against 4 sources | /itsm:cross-validate OPS-NNN |
| Infrastructure change to implement | Create change record with risk + rollback plan | /itsm:create-change "title" |
| Change needing CAB approval | Create change request for review board | /itsm:create-cr OPS-NNN |
| Resolved P0/P1 incident | Blameless post-incident review | /itsm:create-pir OPS-NNN |
| Full end-to-end processing | All stages with approval gates | /itsm:lifecycle OPS-NNN |
All commands default to preview mode. Add --execute to apply changes to JIRA.
The Flow
Stage A: Classification (/itsm:classify)
Who: sre-automation-specialist classifies. HITL approves label set before JIRA update.
What: Decision tree assigns ticket type, priority, and 6-prefix label taxonomy.
Why: Correct classification drives SLA, routing, and CAB review requirements. Misclassification at Stage A propagates through all downstream stages.
What-if skip: Tickets misrouted, SLAs misapplied, CAB reviews missed for Changes that require them.
Decision Tree
Ticket received
├── Unplanned disruption to service → Incident
│   └── Underlying cause unknown → Problem (linked)
├── Planned modification to production → Change
│   ├── Standard (pre-approved template) → No CAB required
│   ├── Normal (requires CAB review) → Stage C + D
│   └── Emergency (P0/P1 impact) → Expedited CAB
└── User-requested provisioning → Service Request
    └── Fulfillment via service catalog
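The top level of the decision tree can be sketched as a small function. This is an illustrative sketch only: the Ticket fields below are hypothetical intake flags (the source does not define a ticket schema), and raising an error for unmatched tickets is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class TicketType(Enum):
    INCIDENT = "Incident"
    PROBLEM = "Problem"
    CHANGE = "Change"
    SERVICE_REQUEST = "Service Request"


@dataclass
class Ticket:
    # Hypothetical intake flags, one per branch of the decision tree.
    unplanned_disruption: bool = False
    cause_known: bool = True
    planned_prod_change: bool = False
    user_provisioning: bool = False


def classify(ticket: Ticket) -> list[TicketType]:
    """Walk the tree top to bottom; the first matching branch wins."""
    if ticket.unplanned_disruption:
        types = [TicketType.INCIDENT]
        if not ticket.cause_known:
            # Unknown underlying cause adds a linked Problem.
            types.append(TicketType.PROBLEM)
        return types
    if ticket.planned_prod_change:
        return [TicketType.CHANGE]
    if ticket.user_provisioning:
        return [TicketType.SERVICE_REQUEST]
    # Assumption: anything that matches no branch goes back to a human.
    raise ValueError("ticket matches no branch of the decision tree")
```

For example, classify(Ticket(unplanned_disruption=True, cause_known=False)) yields an Incident plus a linked Problem.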
6-Prefix Label Taxonomy
| Prefix | Values | Purpose |
|---|---|---|
| blast: | single-service, multi-service, platform | Blast radius for impact scoring |
| rca: | code-defect, config-drift, infra-failure, human-error, 3rd-party | Root cause category |
| ke: | known-error-{id} | Known error database reference |
| service: | cloudops, finops, network, identity, compute | Affected service domain |
| env: | prod, staging, dev, dr | Environment scoping |
| tier: | critical, important, best-effort | Business impact tier (not tier-1/2/3) |
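A label set can be checked against the taxonomy before the HITL review. The following is a minimal sketch, not the skill's actual implementation: the function name and error strings are hypothetical, and because the source does not specify the known-error id format, any non-empty suffix after known-error- is accepted.

```python
import re

# Allowed values per prefix, copied from the taxonomy table.
ALLOWED = {
    "blast": {"single-service", "multi-service", "platform"},
    "rca": {"code-defect", "config-drift", "infra-failure", "human-error", "3rd-party"},
    "service": {"cloudops", "finops", "network", "identity", "compute"},
    "env": {"prod", "staging", "dev", "dr"},
    "tier": {"critical", "important", "best-effort"},
}
# Assumption: the id format is unspecified, so any non-empty suffix passes.
KE_VALUE = re.compile(r"^known-error-\S+$")
REQUIRED_PREFIXES = set(ALLOWED) | {"ke"}


def validate_labels(labels: list[str]) -> list[str]:
    """Return a list of problems; an empty list means all 6 prefixes are valid."""
    problems = []
    seen = set()
    for label in labels:
        prefix, sep, value = label.partition(":")
        if not sep or prefix not in REQUIRED_PREFIXES:
            problems.append(f"unknown label: {label}")
        elif prefix == "ke" and not KE_VALUE.match(value):
            problems.append(f"invalid value: {label}")
        elif prefix != "ke" and value not in ALLOWED[prefix]:
            problems.append(f"invalid value: {label}")
        else:
            seen.add(prefix)
    problems += [f"missing prefix: {p}:" for p in sorted(REQUIRED_PREFIXES - seen)]
    return problems
```

A label such as tier:tier-1 is reported both as an invalid value and as a missing tier: prefix, which matches the note in the table that tier-1/2/3 values are not accepted.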
Priority Matrix
| | Low Urgency | High Urgency |
|---|---|---|
| High Impact | P2 (4h SLA) | P0 (15min SLA) |
| Low Impact | P4 (5d SLA) | P3 (8h SLA) |
P1 = High Impact and High Urgency, but not platform-wide. Platform-wide impact is the P0 threshold.
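The matrix and the P1 rule translate into a short lookup. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and since the source gives no SLA for P1, that entry is left unset rather than guessed.

```python
from datetime import timedelta
from typing import Optional

# SLA targets from the matrix. No P1 SLA is stated in the source, so it stays None.
SLA: dict[str, Optional[timedelta]] = {
    "P0": timedelta(minutes=15),
    "P1": None,
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=8),
    "P4": timedelta(days=5),
}


def assign_priority(high_impact: bool, high_urgency: bool, platform_wide: bool = False) -> str:
    """Map Impact x Urgency to P0-P4; platform-wide scope separates P0 from P1."""
    if high_impact and high_urgency:
        return "P0" if platform_wide else "P1"
    if high_impact:
        return "P2"
    if high_urgency:
        return "P3"
    return "P4"
```

For instance, assign_priority(True, True, platform_wide=True) returns "P0", while the same impact and urgency without platform-wide scope returns "P1".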
Quality Gate
- Ticket type assigned (Incident / Problem / Change / Service Request)
- All 6 label prefixes populated
- Priority P0-P4 set based on Impact × Urgency matrix
- HITL approves label set before jira_update_issue call
Stage B: Cross-Validation (/itsm:cross-validate)
Who: sre-automation-specialist gathers evidence. qa-engineer validates accuracy.
What: 4 independent sources corroborate the incident report before any change record is created.
Why: Enforces the 99.5% accuracy gate and prevents SELF_COMPARISON_VALIDATION. Sources must be independent, not same-process exports from the same API call.
What-if skip: Change record based on incorrect scope, wrong systems in blast radius, PIR missing affected components.
4 Independent Sources
| Source | Tool | What It Provides |
|---|---|---|
| JIRA OPS | atlassian-tools MCP | Ticket description, comments, affected systems |
| runbooks CLI | /inventory:discover | Live resource inventory, cross-account |
| AWS API | READONLY profiles ($AWS_OPERATIONS_PROFILE) | CloudWatch alarms, Config events, flow logs |
| Visual | CloudWatch Console / dashboards | Timeline reconstruction, anomaly visualization |
5W1H Evidence Template
**What**: [Specific failure observed — service, endpoint, error message]
**Who**: [Affected users/systems — count, persona, account]
**When**: [Start time, detection time, escalation time — UTC ISO-8601]
**Where**: [Region, VPC, account ID reference, service domain]
**Why**: [Root cause hypothesis — rca: label maps here]
**How**: [Reproduce steps for engineering team]
This block is appended to the JIRA ticket description as an ADF panel node, not a blockquote. For structured nodes, MCP ADF fidelity is preserved via REST API v3.
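As an illustration of what such a panel node might look like, the sketch below builds an Atlassian Document Format panel from the six 5W1H fields. The function name is hypothetical and the "info" panel type is an assumption; how the node is submitted to JIRA is not shown.

```python
import json

FIELDS = ("What", "Who", "When", "Where", "Why", "How")


def five_w1h_panel(evidence: dict[str, str], panel_type: str = "info") -> dict:
    """Build an ADF panel node with one paragraph per 5W1H field."""
    missing = [f for f in FIELDS if not evidence.get(f)]
    if missing:
        raise ValueError(f"missing 5W1H fields: {missing}")
    paragraphs = [
        {
            "type": "paragraph",
            "content": [
                # Bold field name, then the evidence text.
                {"type": "text", "text": field, "marks": [{"type": "strong"}]},
                {"type": "text", "text": f": {evidence[field]}"},
            ],
        }
        for field in FIELDS
    ]
    return {"type": "panel", "attrs": {"panelType": panel_type}, "content": paragraphs}


def as_adf_document(panel: dict) -> str:
    """Wrap the panel in a top-level ADF document and serialize it."""
    return json.dumps({"version": 1, "type": "doc", "content": [panel]})
```

Rejecting incomplete evidence up front keeps a partially filled template from reaching the ticket.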
Quality Gate
- All 4 sources queried (no CROSS_ACCOUNT_SILENT_ZERO: 0 results verified against account scope)
- 5W1H evidence appended to JIRA ticket
- Reproduce prompt attached for engineering team
- Cross-validation delta ≤0.5% across independent sources
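The delta gate can be sketched as follows. The source does not define how the delta is computed, so this example assumes it is the spread between the highest and lowest count reported by the sources, as a percentage of the highest. All names here are hypothetical.

```python
MAX_DELTA_PCT = 0.5  # gate threshold from the quality gate above


def cross_validation_delta(counts: dict[str, int]) -> float:
    """Spread between sources as a percentage of the largest count (assumed definition)."""
    if len(counts) < 2:
        raise ValueError("need at least two independent sources")
    highest = max(counts.values())
    if highest == 0:
        return 0.0
    return (highest - min(counts.values())) / highest * 100.0


def zero_result_sources(counts: dict[str, int]) -> list[str]:
    """Sources that returned 0 and must be verified against account scope."""
    return sorted(name for name, n in counts.items() if n == 0)


def passes_gate(counts: dict[str, int]) -> bool:
    # A zero count is never trusted silently, even if all sources agree on it.
    return not zero_result_sources(counts) and cross_validation_delta(counts) <= MAX_DELTA_PCT
```

Treating any zero as a failure until verified is a deliberately conservative choice, since two sources scoped to the wrong account would otherwise agree perfectly.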
Stage C: Change Management (/itsm:create-change)
Who: sre-automation-specialist drafts. HITL approves before creation.
What: 7-section change record created as JIRA sub-task or linked issue under the incident.
Why: MSP cannot modify production environments without a formal change order. Audit trail required for APRA CPS 234.
What-if skip: Production changes executed without approval, audit gaps, regulatory non-compliance.
Stage C executes only for Change-type tickets (from Stage A classification). Incident and Service Request tickets skip to PIR.
7-Section Change Template
| Section | Content | Required |
|---|---|---|
| 1. Summary | One-line description of the change | Yes |
| 2. Impact Assessment | Systems affected, blast radius, user count | Yes |
| 3. Risk Rating | Low/Medium/High with justification | Yes |
| 4. Rollback Plan | Step-by-step revert procedure with time estimate | Yes |
| 5. Test Plan | Pre/post validation commands with expected outputs | Yes |
| 6. Implementation Steps | Numbered runbook with commands | Yes |
| 7. Approval | CAB reviewer list, approval date/time | Yes (CAB changes) |
HITL Gate
sre-automation-specialist → drafts all 7 sections
                          → attaches evidence from Stage B
                          → presents draft to HITL
HITL                      → reviews rollback plan
                          → approves or requests changes
                          → executes: jira_create_issue (NOT agent-initiated)
Quality Gate
- All 7 sections populated (no placeholders)
- Rollback plan tested in non-prod before change window
- Implementation window confirmed with CAB (for Normal changes)
- Change ticket linked to originating Incident in JIRA
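The "no placeholders" check on the 7 sections could be automated before the draft reaches the HITL. This is a minimal sketch: the source does not define what counts as a placeholder, so the pattern below (empty text, TBD, TODO, or a {braced} token) is an assumption, as are the function and parameter names.

```python
import re

SECTIONS = (
    "Summary",
    "Impact Assessment",
    "Risk Rating",
    "Rollback Plan",
    "Test Plan",
    "Implementation Steps",
    "Approval",
)
# Assumed placeholder markers: blank text, TBD/TODO, or an unfilled {token}.
PLACEHOLDER = re.compile(r"^\s*$|\b(?:TBD|TODO)\b|\{[^}]*\}", re.IGNORECASE)


def incomplete_sections(record: dict[str, str], cab_required: bool = True) -> list[str]:
    """Return the sections that are missing or still contain placeholder text."""
    # Approval is only required for changes that go through CAB.
    required = SECTIONS if cab_required else SECTIONS[:-1]
    return [s for s in required if PLACEHOLDER.search(record.get(s, ""))]
```

An empty result means the draft is ready for review. It does not replace the HITL check of the rollback plan itself.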
Stage D: Change Request (/itsm:create-cr)
Who: sre-automation-specialist creates CAB subtask. HITL schedules review.
What: CAB review subtask with implementation window, rollback plan, and test plan.
Why: Normal changes require explicit CAB approval with documented review. Emergency changes require expedited CAB with post-implementation review.
What-if skip: Changes executed without CAB oversight, audit trail incomplete, regulatory breach.
Stage D executes only when Stage A assigned label cr:normal or cr:emergency. Standard pre-approved changes skip Stage D.
CAB Subtask Fields
| Field | Normal Change | Emergency Change |
|---|---|---|
| Implementation Window | Scheduled (≥48h notice) | Expedited (HITL-approved window) |
| Rollback Plan | Full 7-section plan | Abbreviated (minimum 3 steps) |
| Test Plan | Pre+post validation | Post-implementation only |
| CAB Reviewers | 2 approvers minimum | 1 approver + post-review |
| Evidence Required | Stage B + Stage C | Stage B minimum |
Quality Gate
- CAB subtask linked to Change ticket (not standalone)
- Implementation window confirmed in JIRA
- At least 1 CAB approver assigned before window opens
- Emergency changes trigger post-implementation review within 24h
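The approver and notice rules from the CAB Subtask Fields table can be sketched as a pre-flight check. The function name, argument names, and message strings are hypothetical. The thresholds (2 approvers and 48h notice for Normal, 1 approver for Emergency) come from the table.

```python
from datetime import datetime, timedelta

MIN_APPROVERS = {"normal": 2, "emergency": 1}
NORMAL_NOTICE = timedelta(hours=48)


def cab_issues(change_type: str, approvers: list[str],
               submitted_at: datetime, window_start: datetime) -> list[str]:
    """Return blocking issues for a CAB subtask; an empty list means it can proceed."""
    if change_type == "standard":
        return []  # pre-approved changes skip Stage D entirely
    if change_type not in MIN_APPROVERS:
        raise ValueError(f"unknown change type: {change_type}")
    issues = []
    needed = MIN_APPROVERS[change_type]
    if len(approvers) < needed:
        issues.append(f"needs {needed} approver(s), has {len(approvers)}")
    if change_type == "normal" and window_start - submitted_at < NORMAL_NOTICE:
        issues.append("normal change needs at least 48h notice")
    return issues
```

Emergency changes pass the notice check by design. Their compensating control is the post-implementation review within 24h.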
Post-Resolution: PIR (/itsm:create-pir)
Who: sre-automation-specialist drafts. HITL reviews and publishes to Confluence OPS.
What: Post-Incident Review with 5-Why RCA, timeline, and action items.
Why: Without PIR, incidents recur. Action items with owners and due dates prevent recurrence.
What-if skip: Incidents repeat, team learns nothing, MTTR stagnates, pattern bridge never fires.
PIR Structure
## Post-Incident Review: {Ticket ID}
**Incident**: {title}
**Duration**: {start} → {resolved} ({total minutes})
**Severity**: P{n} | Impact: {blast: label}
## Timeline
| Time (UTC) | Event | Source |
|------------|-------|--------|
| HH:MM | Detection | CloudWatch alarm |
| HH:MM | Escalation | PagerDuty |
| HH:MM | Mitigation applied | Change record |
| HH:MM | Service restored | Monitoring |
## 5-Why Root Cause Analysis
1. Why did the service fail? → {observation}
2. Why did {observation} occur? → {deeper cause}
3. Why did {deeper cause} exist? → {system cause}
4. Why wasn't {system cause} detected? → {monitoring gap}
5. Why wasn't the monitoring gap closed? → {process gap}
**Root Cause**: {concise statement}
**rca: label**: {rca:code-defect | config-drift | infra-failure | human-error | 3rd-party}
## Action Items
| Action | Owner | Due Date | JIRA Ticket |
|--------|-------|---------|-------------|
| Fix monitoring gap | CloudOps Engineer | {date} | OPS-NNN |
| Update runbook | SRE | {date} | OPS-NNN |
Quality Gate
- 5-Why analysis reaches process/system root cause (not stopping at symptom)
- All action items have owners and due dates
- PIR published to Confluence OPS space (not just JIRA)
- JIRA ticket transitioned to Resolved only after PIR approved by HITL
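Two parts of the PIR lend themselves to a mechanical check: the total duration in the header and the rule that every action item has an owner and a due date. The sketch below assumes action items are plain dictionaries with hypothetical keys action, owner, and due.

```python
from datetime import datetime


def duration_minutes(start: datetime, resolved: datetime) -> int:
    """Total incident duration in whole minutes, for the PIR header."""
    if resolved < start:
        raise ValueError("resolved time precedes start time")
    return int((resolved - start).total_seconds() // 60)


def unowned_actions(items: list[dict]) -> list[str]:
    """Action items that lack an owner or a due date and so fail the quality gate."""
    return [i["action"] for i in items if not i.get("owner") or not i.get("due")]
```

A PIR draft with a non-empty unowned_actions result would go back to the author before HITL review.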
Pattern Bridge: OPS to SPM
When 3 or more incidents share the same service: and rca: labels within a 90-day window, the pattern bridge triggers an auto-draft product story in JIRA SPM.
Why: Recurring incidents that exceed the MTTR budget have a root cause that belongs in the product backlog, not the operations queue.
Bridge Conditions
| Condition | Threshold | Action |
|---|---|---|
| Same service: + rca: | 3+ incidents in 90 days | Auto-draft SPM story |
| Same service: + P0 | 1 P0 incident | Immediate SPM story draft |
| MTTR > SLA × 3 | Any severity | Draft SRE capacity story |
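The first two bridge conditions can be sketched as a scan over incident records. This is an illustrative sketch, not the bridge's actual logic: the record keys (service, rca, priority, opened) are hypothetical, and the MTTR condition is left out because the source does not say how MTTR is measured.

```python
from collections import defaultdict
from datetime import date, timedelta


def bridge_triggers(incidents: list[dict], window_days: int = 90, threshold: int = 3) -> dict:
    """Find (service, rca) pairs that recur within the window and services with a P0."""
    recurring = set()
    p0_services = set()
    by_key = defaultdict(list)
    for inc in incidents:
        if inc["priority"] == "P0":
            p0_services.add(inc["service"])  # a single P0 is enough
        by_key[(inc["service"], inc["rca"])].append(inc["opened"])
    window = timedelta(days=window_days)
    for key, dates in by_key.items():
        dates.sort()
        # Slide over sorted dates: any `threshold` consecutive incidents
        # that fit inside the window trigger the bridge.
        for i in range(len(dates) - threshold + 1):
            if dates[i + threshold - 1] - dates[i] <= window:
                recurring.add(key)
                break
    return {"recurring": recurring, "p0": p0_services}
```

Sorting and comparing the first and last of each run of three avoids counting incidents that share labels but are spread over a longer period.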
PII Scrubbing
OPS tickets contain customer names, account IDs, and configuration data. Before creating an SPM story, replace:
- Account IDs with {account_type} placeholders
- Customer names with {customer_type} or persona labels
- IP/CIDR with network tier labels (prod-vpc, dr-vpc)
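A scrub pass along these lines might look like the sketch below. It rests on several assumptions that are not in the source: account IDs are treated as 12-digit numbers, customer names come from a caller-supplied list, the CIDR ranges mapped to prod-vpc and dr-vpc are invented for illustration, and unknown-vpc is a hypothetical fallback label.

```python
import ipaddress
import re

ACCOUNT_ID = re.compile(r"\b\d{12}\b")  # assumption: 12-digit account IDs
IP_OR_CIDR = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b")

# Hypothetical mapping from address range to network tier label.
TIERS = {
    "prod-vpc": ipaddress.ip_network("10.0.0.0/16"),
    "dr-vpc": ipaddress.ip_network("10.1.0.0/16"),
}


def _tier_label(match: re.Match) -> str:
    try:
        addr = ipaddress.ip_interface(match.group(0)).ip
    except ValueError:
        return match.group(0)  # not a valid address (e.g. a version string)
    for label, network in TIERS.items():
        if addr in network:
            return label
    return "unknown-vpc"  # hypothetical fallback for unmapped ranges


def scrub(text: str, customer_names: list[str]) -> str:
    """Replace customer names, IP/CIDR values, and account IDs with placeholders."""
    for name in customer_names:
        text = re.sub(re.escape(name), "{customer_type}", text)
    text = IP_OR_CIDR.sub(_tier_label, text)
    return ACCOUNT_ID.sub("{account_type}", text)
```

Pattern matching of this kind can miss free-text identifiers, so the output would still warrant a human read before the SPM story is created.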
Key: The bridge is intentionally one-directional. Operational incidents drive product improvements. Planned product changes follow the SDLC lifecycle via /sync:jira-push.
Regulatory Compliance
Each ITSM lifecycle stage maps to an ANZ FSI/Energy/Telecom regulatory compliance area:
| Stage | Compliance Area |
|---|---|
| Classify | Asset identification and criticality assessment |
| Cross-Validate | Control testing and evidence gathering |
| Create Change | Change management proportional to risk |
| Create CR | Material change review and approval |
| PIR | Systematic review and continuous improvement |
| Pattern Bridge | Recurring issue escalation to product backlog |
Detailed mapping (APRA CPS 234, CPS 230, NIST CSF 2.0) is documented in ADLC Governance Rules.
Component Map
| Component | Type | Purpose |
|---|---|---|
| /itsm:lifecycle | Command | Full pipeline orchestrator |
| /itsm:classify | Command | Stage A: ticket classification |
| /itsm:cross-validate | Command | Stage B: 4-source evidence |
| /itsm:create-change | Command | Stage C: change record |
| /itsm:create-cr | Command | Stage D: CAB review |
| /itsm:create-pir | Command | Post-incident review |
| sre-automation-specialist | Agent | Executes all ITSM commands (haiku tier) |
| itsm-ticket-classification | Skill | Classification decision tree |
| itsm-change-management | Skill | Change workflow and risk assessment |
| jira-jsm-service-desk | Skill | JSM lifecycle and SLA management |
| atlassian-tools | MCP | 72 tools for JIRA + Confluence |
Common Mistakes (What NOT to Do)
| Mistake | Why It Fails | Fix |
|---|---|---|
| TECHNICAL_WITHOUT_PROCESS | Changes without change order and approver | All infrastructure actions must include approval process |
| AUTONOMOUS_SPRINT_ASSIGNMENT | Agent moves OPS tickets to sprint without HITL | Sprint assignment is HITL-exclusive action |
| INCOMPLETE_EVIDENCE | Change records built on incomplete facts | All 4 sources verified before creating changes |
Last Updated: April 2026 | Status: Active | Maintenance: sre-automation-specialist