ITSM Lifecycle Golden Path

ITSM operations are distinct from product development. OPS board = customer incidents. SPM board = product stories. The pattern bridge connects them — after PII scrub.

At a Glance

Every OPS ticket flows through a standardized pipeline: classify the issue, cross-validate against real infrastructure, then — for changes only — create a change record and change request for approval. Post-incident reviews capture lessons learned.

| I have a... | What happens | Command |
|-------------|--------------|---------|
| New OPS ticket | Classify type + priority, then verify against AWS | /itsm:classify OPS-NNN |
| Classified ticket needing evidence | Cross-validate against 4 sources | /itsm:cross-validate OPS-NNN |
| Infrastructure change to implement | Create change record with risk + rollback plan | /itsm:create-change "title" |
| Change needing CAB approval | Create change request for review board | /itsm:create-cr OPS-NNN |
| Resolved P0/P1 incident | Blameless post-incident review | /itsm:create-pir OPS-NNN |
| Full end-to-end processing | All stages with approval gates | /itsm:lifecycle OPS-NNN |

All commands default to preview mode. Add --execute to apply changes to JIRA.


The Flow


Stage A: Classification (/itsm:classify)

Who: sre-automation-specialist classifies. HITL approves label set before JIRA update.

What: Decision tree assigns ticket type, priority, and 6-prefix label taxonomy.

Why: Correct classification drives SLA, routing, and CAB review requirements. Misclassification at Stage A propagates through all downstream stages.

What-if skip: Tickets misrouted, SLAs misapplied, CAB reviews missed for Changes that require them.

Decision Tree

Ticket received
├── Unplanned disruption to service → Incident
│   └── Underlying cause unknown → Problem (linked)
├── Planned modification to production → Change
│   ├── Standard (pre-approved template) → No CAB required
│   ├── Normal (requires CAB review) → Stage C + D
│   └── Emergency (P0/P1 impact) → Expedited CAB
└── User-requested provisioning → Service Request
    └── Fulfillment via service catalog
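
The tree can be read as a first-match rule chain. The following is a minimal sketch, not the itsm-ticket-classification skill itself: the `Signals` fields and the `classify` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical triage signals; the field names are illustrative only.
@dataclass
class Signals:
    unplanned_disruption: bool = False
    cause_known: bool = True
    planned_production_change: bool = False
    change_category: Optional[str] = None  # "standard", "normal", or "emergency"
    provisioning_request: bool = False


# CAB path per change category, copied from the decision tree.
CAB_PATH = {
    "standard": "No CAB required",
    "normal": "Stage C + D",
    "emergency": "Expedited CAB",
}


def classify(s: Signals) -> dict:
    """Walk the decision tree top to bottom and return the first matching branch."""
    if s.unplanned_disruption:
        # An unknown underlying cause adds a linked Problem ticket.
        return {"type": "Incident", "linked_problem": not s.cause_known}
    if s.planned_production_change:
        if s.change_category not in CAB_PATH:
            raise ValueError(f"unknown change category: {s.change_category!r}")
        return {"type": "Change", "cab": CAB_PATH[s.change_category]}
    if s.provisioning_request:
        return {"type": "Service Request", "fulfillment": "service catalog"}
    raise ValueError("no branch matched; escalate to HITL")
```

Raising on an unmatched ticket rather than guessing a type keeps the HITL in the loop, which matches the approval step described for this stage.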

6-Prefix Label Taxonomy

| Prefix | Values | Purpose |
|--------|--------|---------|
| blast: | single-service, multi-service, platform | Blast radius for impact scoring |
| rca: | code-defect, config-drift, infra-failure, human-error, 3rd-party | Root cause category |
| ke: | known-error-{id} | Known error database reference |
| service: | cloudops, finops, network, identity, compute | Affected service domain |
| env: | prod, staging, dev, dr | Environment scoping |
| tier: | critical, important, best-effort | Business impact tier (not tier-1/2/3) |
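
A label set can be checked against this taxonomy before it is presented for approval. The sketch below assumes labels are written as `prefix:value` strings and that a known-error id is any non-blank token; `validate_labels` is a hypothetical helper, not part of the documented tooling.

```python
import re

# Allowed values per prefix, copied from the taxonomy table.
ALLOWED = {
    "blast": {"single-service", "multi-service", "platform"},
    "rca": {"code-defect", "config-drift", "infra-failure", "human-error", "3rd-party"},
    "service": {"cloudops", "finops", "network", "identity", "compute"},
    "env": {"prod", "staging", "dev", "dr"},
    "tier": {"critical", "important", "best-effort"},
}
# Assumption: the {id} part of ke:known-error-{id} is any run of non-space characters.
KE_PATTERN = re.compile(r"known-error-\S+")


def validate_labels(labels):
    """Return a list of problems; an empty list means all 6 prefixes are valid."""
    seen = {}
    for label in labels:
        prefix, sep, value = label.partition(":")
        if sep:
            seen.setdefault(prefix, []).append(value)

    problems = []
    for prefix in [*ALLOWED, "ke"]:
        values = seen.get(prefix)
        if not values:
            problems.append(f"{prefix}: not populated")
            continue
        for value in values:
            if prefix == "ke":
                ok = KE_PATTERN.fullmatch(value) is not None
            else:
                ok = value in ALLOWED[prefix]
            if not ok:
                problems.append(f"{prefix}:{value} is not an allowed value")
    return problems
```

Labels with other prefixes are ignored rather than rejected, since tickets may carry labels outside this taxonomy.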

Priority Matrix

| | Low Urgency | High Urgency |
|---|-------------|--------------|
| High Impact | P2 (4h SLA) | P0 (15min SLA) |
| Low Impact | P4 (5d SLA) | P3 (8h SLA) |

P1 covers tickets with High Impact and High Urgency that are not platform-wide. Platform-wide scope is the P0 threshold.
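
The matrix and the P1 rule reduce to a small lookup. This is a sketch under two assumptions: the `platform_wide` flag is how the P0 threshold is detected, and "5d" means calendar days. No SLA is listed for P1 in the matrix, so none is invented here.

```python
from datetime import timedelta
from typing import Optional

# SLA targets exactly as listed in the matrix; P1 has no stated SLA.
SLA = {
    "P0": timedelta(minutes=15),
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=8),
    "P4": timedelta(days=5),  # assumption: calendar days
}


def priority(high_impact: bool, high_urgency: bool, platform_wide: bool = False) -> str:
    """Map Impact x Urgency to a priority; platform-wide scope separates P0 from P1."""
    if high_impact and high_urgency:
        return "P0" if platform_wide else "P1"
    if high_impact:
        return "P2"
    if high_urgency:
        return "P3"
    return "P4"


def sla_for(level: str) -> Optional[timedelta]:
    """Return the SLA target, or None when the matrix does not state one."""
    return SLA.get(level)
```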

Quality Gate

  • Ticket type assigned (Incident / Problem / Change / Service Request)
  • All 6 label prefixes populated
  • Priority P0-P4 set based on Impact × Urgency matrix
  • HITL approves label set before jira_update_issue call

Stage B: Cross-Validation (/itsm:cross-validate)

Who: sre-automation-specialist gathers evidence. qa-engineer validates accuracy.

What: 4 independent sources corroborate the incident report before any change record is created.

Why: This stage enforces the 99.5% accuracy gate. It also prevents SELF_COMPARISON_VALIDATION: the sources must be independent, not exports produced by the same process from the same API call.

What-if skip: Change record based on incorrect scope, wrong systems in blast radius, PIR missing affected components.

4 Independent Sources

| Source | Tool | What It Provides |
|--------|------|------------------|
| JIRA OPS | atlassian-tools MCP | Ticket description, comments, affected systems |
| runbooks CLI | /inventory:discover | Live resource inventory, cross-account |
| AWS API | READONLY profiles ($AWS_OPERATIONS_PROFILE) | CloudWatch alarms, Config events, flow logs |
| Visual | CloudWatch Console / dashboards | Timeline reconstruction, anomaly visualization |

5W1H Evidence Template

**What**: [Specific failure observed — service, endpoint, error message]
**Who**: [Affected users/systems — count, persona, account]
**When**: [Start time, detection time, escalation time — UTC ISO-8601]
**Where**: [Region, VPC, account ID reference, service domain]
**Why**: [Root cause hypothesis — rca: label maps here]
**How**: [Reproduce steps for engineering team]

This block is appended to the JIRA ticket description as an ADF panel node, not a blockquote. Structured nodes are written via REST API v3 so that ADF fidelity is preserved.
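
As an illustration of the panel node, the sketch below builds the ADF structure for a 5W1H block and appends it to an existing description document. It only constructs the JSON; it makes no JIRA call. The function names and the choice of the `info` panel type are assumptions made for this example.

```python
from typing import Optional

FIELDS = ("What", "Who", "When", "Where", "Why", "How")


def evidence_panel(evidence: dict) -> dict:
    """Build an ADF panel node with one paragraph per 5W1H field."""
    missing = [f for f in FIELDS if not evidence.get(f, "").strip()]
    if missing:
        raise ValueError(f"5W1H fields missing: {', '.join(missing)}")
    paragraphs = [
        {
            "type": "paragraph",
            "content": [
                # Field name in bold, followed by the evidence text.
                {"type": "text", "text": field, "marks": [{"type": "strong"}]},
                {"type": "text", "text": f": {evidence[field]}"},
            ],
        }
        for field in FIELDS
    ]
    # Assumption: "info" is used here; other ADF panel types would work the same way.
    return {"type": "panel", "attrs": {"panelType": "info"}, "content": paragraphs}


def append_panel(description: Optional[dict], panel: dict) -> dict:
    """Return a new ADF document with the panel appended; the input is not mutated."""
    doc = description or {"version": 1, "type": "doc", "content": []}
    return {**doc, "content": [*doc.get("content", []), panel]}
```

Returning a new document instead of mutating the fetched description makes it easy to show the HITL a before/after preview.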

Quality Gate

  • All 4 sources queried, with no CROSS_ACCOUNT_SILENT_ZERO: any 0-result response is verified against the account scope
  • 5W1H evidence appended to JIRA ticket
  • Reproduce prompt attached for engineering team
  • Cross-validation delta ≤0.5% across independent sources
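
The source does not spell out how the delta is computed. One plausible reading, sketched below, compares a single count (for example, affected resources) reported by each source and measures the relative spread. The `check_sources` helper and its input shape are hypothetical.

```python
def check_sources(counts: dict, max_delta_pct: float = 0.5):
    """Return (delta_pct, issues) for per-source counts of the same metric.

    Assumption: delta is the relative spread, (max - min) / max, in percent.
    """
    if len(counts) < 2:
        raise ValueError("need at least two independent sources to compare")

    issues = []
    for name, count in counts.items():
        if count == 0:
            # A zero may be a query scoped to the wrong account, not a true zero.
            issues.append(f"{name}: zero results, verify account scope")

    highest, lowest = max(counts.values()), min(counts.values())
    delta_pct = 0.0 if highest == 0 else (highest - lowest) / highest * 100
    if delta_pct > max_delta_pct:
        issues.append(f"delta {delta_pct:.2f}% exceeds {max_delta_pct}%")
    return delta_pct, issues
```

For example, counts of 1000, 998, 1000, and 999 give a spread of 0.2% and pass, while a source returning 0 next to a source returning 200 is flagged twice: once for the zero and once for the delta.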

Stage C: Change Management (/itsm:create-change)

Who: sre-automation-specialist drafts. HITL approves before creation.

What: 7-section change record created as JIRA sub-task or linked issue under the incident.

Why: MSP cannot modify production environments without a formal change order. Audit trail required for APRA CPS 234.

What-if skip: Production changes executed without approval, audit gaps, regulatory non-compliance.

Conditional Stage

Stage C executes only for Change-type tickets (from Stage A classification). Incident and Service Request tickets skip to PIR.

7-Section Change Template

| Section | Content | Required |
|---------|---------|----------|
| 1. Summary | One-line description of the change | Yes |
| 2. Impact Assessment | Systems affected, blast radius, user count | Yes |
| 3. Risk Rating | Low/Medium/High with justification | Yes |
| 4. Rollback Plan | Step-by-step revert procedure with time estimate | Yes |
| 5. Test Plan | Pre/post validation commands with expected outputs | Yes |
| 6. Implementation Steps | Numbered runbook with commands | Yes |
| 7. Approval | CAB reviewer list, approval date/time | Yes (CAB changes) |

HITL Gate

sre-automation-specialist → drafts all 7 sections
                          → attaches evidence from Stage B
                          → presents draft to HITL
HITL                      → reviews rollback plan
                          → approves or requests changes
                          → executes: jira_create_issue (NOT agent-initiated)

Quality Gate

  • All 7 sections populated (no placeholders)
  • Rollback plan tested in non-prod before change window
  • Implementation window confirmed with CAB (for Normal changes)
  • Change ticket linked to originating Incident in JIRA
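
The first gate item (all sections populated, no placeholders) lends itself to an automated pre-check before the draft reaches the HITL. This is a sketch only: the section names come from the template, but the placeholder patterns (`TBD`, `TODO`, `{word}`) and the `validate_change_record` helper are assumptions.

```python
import re

SECTIONS = [
    "Summary",
    "Impact Assessment",
    "Risk Rating",
    "Rollback Plan",
    "Test Plan",
    "Implementation Steps",
    "Approval",
]
# Assumed placeholder markers; shell-style ${VAR} references are deliberately not flagged.
PLACEHOLDER = re.compile(r"\b(?:TBD|TODO)\b|(?<!\$)\{\w+\}", re.IGNORECASE)


def validate_change_record(record: dict, needs_cab: bool = True) -> list:
    """Return a list of problems; an empty list means the draft passes the pre-check."""
    problems = []
    for section in SECTIONS:
        if section == "Approval" and not needs_cab:
            continue  # Approval is only required for CAB changes
        text = record.get(section, "").strip()
        if not text:
            problems.append(f"{section}: missing")
        elif PLACEHOLDER.search(text):
            problems.append(f"{section}: contains a placeholder")

    rating = record.get("Risk Rating", "").strip()
    if rating and not re.match(r"(?:Low|Medium|High)\b", rating):
        problems.append("Risk Rating: must start with Low, Medium, or High")
    return problems
```

The remaining gate items (rollback tested in non-prod, window confirmed, ticket linked) depend on external state and are left to the HITL review.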

Stage D: Change Request (/itsm:create-cr)

Who: sre-automation-specialist creates CAB subtask. HITL schedules review.

What: CAB review subtask with implementation window, rollback plan, and test plan.

Why: Normal changes require explicit CAB approval with documented review. Emergency changes require expedited CAB with post-implementation review.

What-if skip: Changes executed without CAB oversight, audit trail incomplete, regulatory breach.

Conditional Stage

Stage D executes only when Stage A assigned the label cr:normal or cr:emergency, which mark Normal and Emergency changes. Standard pre-approved changes skip Stage D.
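
Taken together, the two conditional stages give a simple routing rule for Stages A through D. The sketch below is one reading of those conditions; `stages_for` is a hypothetical helper and does not cover the post-resolution PIR.

```python
def stages_for(ticket_type: str, change_category: str = "") -> list:
    """Return the Stage A-D commands that apply to a ticket.

    change_category is "standard", "normal", or "emergency" and is only
    consulted for Change tickets.
    """
    stages = ["/itsm:classify", "/itsm:cross-validate"]
    if ticket_type == "Change":
        stages.append("/itsm:create-change")
        # Standard pre-approved changes skip the CAB change request.
        if change_category in ("normal", "emergency"):
            stages.append("/itsm:create-cr")
    return stages
```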

CAB Subtask Fields

| Field | Normal Change | Emergency Change |
|-------|---------------|------------------|
| Implementation Window | Scheduled (≥48h notice) | Expedited (HITL-approved window) |
| Rollback Plan | Full 7-section plan | Abbreviated (minimum 3 steps) |
| Test Plan | Pre+post validation | Post-implementation only |
| CAB Reviewers | 2 approvers minimum | 1 approver + post-review |
| Evidence Required | Stage B + Stage C | Stage B minimum |

Quality Gate

  • CAB subtask linked to Change ticket (not standalone)
  • Implementation window confirmed in JIRA
  • At least 1 CAB approver assigned before window opens
  • Emergency changes trigger post-implementation review within 24h
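
The notice period, approver minimums, and 24h review deadline can be checked mechanically. In this sketch the 48h notice is measured from the time the subtask is submitted, and the 24h review deadline from the time the change is implemented; both reference points are assumptions, as are the function names.

```python
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(hours=48)
MIN_APPROVERS = {"normal": 2, "emergency": 1}  # from the CAB Subtask Fields table


def check_cab_subtask(category, parent_change_key, submitted_at, window_start, approvers):
    """Return a list of problems for a CAB subtask; empty means it passes."""
    if category not in MIN_APPROVERS:
        raise ValueError(f"Stage D does not apply to category {category!r}")

    problems = []
    if not parent_change_key:
        problems.append("subtask is not linked to a Change ticket")
    if category == "normal" and window_start - submitted_at < MIN_NOTICE:
        problems.append("Normal change needs at least 48h notice")
    if len(approvers) < MIN_APPROVERS[category]:
        problems.append(
            f"{category} change needs {MIN_APPROVERS[category]} approver(s), "
            f"has {len(approvers)}"
        )
    return problems


def post_review_due(implemented_at: datetime) -> datetime:
    """Deadline for the post-implementation review of an Emergency change."""
    return implemented_at + timedelta(hours=24)
```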

Post-Resolution: PIR (/itsm:create-pir)

Who: sre-automation-specialist drafts. HITL reviews and publishes to Confluence OPS.

What: Post-Incident Review with 5-Why RCA, timeline, and action items.

Why: Without PIR, incidents recur. Action items with owners and due dates prevent recurrence.

What-if skip: Incidents repeat, team learns nothing, MTTR stagnates, pattern bridge never fires.

PIR Structure

## Post-Incident Review: {Ticket ID}

**Incident**: {title}
**Duration**: {start} → {resolved} ({total minutes})
**Severity**: P{n} | Impact: {blast: label}

## Timeline
| Time (UTC) | Event | Source |
|------------|-------|--------|
| HH:MM | Detection | CloudWatch alarm |
| HH:MM | Escalation | PagerDuty |
| HH:MM | Mitigation applied | Change record |
| HH:MM | Service restored | Monitoring |

## 5-Why Root Cause Analysis
1. Why did the service fail? → {observation}
2. Why did {observation} occur? → {deeper cause}
3. Why did {deeper cause} exist? → {system cause}
4. Why wasn't {system cause} detected? → {monitoring gap}
5. Why wasn't the monitoring gap closed? → {process gap}

**Root Cause**: {concise statement}
**rca: label**: {rca:code-defect | config-drift | infra-failure | human-error | 3rd-party}

## Action Items
| Action | Owner | Due Date | JIRA Ticket |
|--------|-------|---------|-------------|
| Fix monitoring gap | CloudOps Engineer | {date} | OPS-NNN |
| Update runbook | SRE | {date} | OPS-NNN |

Quality Gate

  • 5-Why analysis reaches process/system root cause (not stopping at symptom)
  • All action items have owners and due dates
  • PIR published to Confluence OPS space (not just JIRA)
  • JIRA ticket transitioned to Resolved only after PIR approved by HITL
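
The mechanical parts of this gate can be pre-checked before the HITL review. Whether the 5-Why chain truly reaches a process or system cause is a judgment call that code cannot make, so the sketch below only checks that five answers exist. `pir_gate` and its input shape are hypothetical.

```python
from datetime import date


def pir_gate(whys, action_items, published_to_confluence, hitl_approved):
    """Return a list of blockers; the ticket may move to Resolved only when it is empty."""
    problems = []
    if len(whys) < 5 or any(not w.strip() for w in whys):
        problems.append("5-Why analysis needs five non-empty answers")
    for n, item in enumerate(action_items, start=1):
        if not item.get("owner"):
            problems.append(f"action item {n}: no owner")
        if not isinstance(item.get("due"), date):
            problems.append(f"action item {n}: no due date")
    if not published_to_confluence:
        problems.append("PIR not published to the Confluence OPS space")
    if not hitl_approved:
        problems.append("PIR not yet approved by HITL")
    return problems
```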

Pattern Bridge: OPS to SPM

When 3 or more incidents share the same service: and rca: labels within a 90-day window, the pattern bridge auto-drafts a product story in JIRA SPM.

Why: Recurring incidents that exceed the MTTR budget have a root cause that belongs in the product backlog, not the operations queue.

Bridge Conditions

| Condition | Threshold | Action |
|-----------|-----------|--------|
| Same service: + rca: | 3+ incidents in 90 days | Auto-draft SPM story |
| Same service: + P0 | 1 P0 incident | Immediate SPM story draft |
| MTTR > SLA × 3 | Any severity | Draft SRE capacity story |
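
The three conditions can be evaluated over a list of incident records. The sketch below assumes a trailing 90-day window measured from `now` and a hypothetical record shape (`key`, `service`, `rca`, `priority`, `created`, `mttr`, `sla`); it returns the actions to draft and does not create anything in JIRA.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bridge_actions(incidents, now: datetime, window_days: int = 90, min_count: int = 3):
    """Return (action, scope, ticket_keys) tuples for every bridge condition that fires."""
    cutoff = now - timedelta(days=window_days)
    groups = defaultdict(list)
    actions = []

    for inc in incidents:
        # Condition 1 input: group recent incidents by service + root cause.
        if inc["created"] >= cutoff:
            groups[(inc["service"], inc["rca"])].append(inc["key"])
        # Condition 2: a single P0 is enough.
        if inc["priority"] == "P0":
            actions.append(("immediate-spm-story", inc["service"], [inc["key"]]))
        # Condition 3: MTTR above three times the SLA, at any severity.
        if inc["mttr"] > 3 * inc["sla"]:
            actions.append(("sre-capacity-story", inc["service"], [inc["key"]]))

    for (service, rca), keys in groups.items():
        if len(keys) >= min_count:
            actions.append(("auto-draft-spm-story", f"{service}/{rca}", keys))
    return actions
```

Every drafted story still has to pass the PII scrub before it reaches the SPM board.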

PII Scrubbing

OPS tickets contain customer names, account IDs, and configuration data. Before creating an SPM story, replace:

  • Account IDs with {account_type} placeholders
  • Customer names with {customer_type} or persona labels
  • IP/CIDR with network tier labels (prod-vpc, dr-vpc)
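
A minimal sketch of these replacements follows. It assumes account IDs are 12-digit AWS account numbers, that customer names come from a known list, and that network tiers are looked up in a CIDR table; the CIDR ranges, the `{network_tier}` fallback, and the `scrub` function are all made up for illustration. Pattern matching alone will miss free-text PII, so a human review is still needed.

```python
import ipaddress
import re

ACCOUNT_ID = re.compile(r"\b\d{12}\b")  # assumption: 12-digit AWS account IDs
IP_OR_CIDR = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b")

# Hypothetical tier table; real ranges would come from the network inventory.
NETWORK_TIERS = {
    "prod-vpc": ipaddress.ip_network("10.0.0.0/16"),
    "dr-vpc": ipaddress.ip_network("10.1.0.0/16"),
}


def _tier_label(match: re.Match) -> str:
    """Map a matched IP or CIDR to its tier label, or a generic placeholder."""
    try:
        net = ipaddress.ip_network(match.group(0), strict=False)
    except ValueError:
        return "{network_tier}"
    for label, tier in NETWORK_TIERS.items():
        if net.subnet_of(tier):
            return label
    return "{network_tier}"


def scrub(text: str, customer_names) -> str:
    """Replace customer names, IP/CIDR values, and account IDs with placeholders."""
    for name in customer_names:
        text = re.sub(re.escape(name), "{customer_type}", text, flags=re.IGNORECASE)
    text = IP_OR_CIDR.sub(_tier_label, text)
    return ACCOUNT_ID.sub("{account_type}", text)
```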

Key: The bridge is intentionally one-directional. Operational incidents drive product improvements. Planned product changes follow the SDLC lifecycle via /sync:jira-push.


Regulatory Compliance

ITSM lifecycle stages meet ANZ FSI/Energy/Telecom regulatory requirements:

| Stage | Compliance Area |
|-------|-----------------|
| Classify | Asset identification and criticality assessment |
| Cross-Validate | Control testing and evidence gathering |
| Create Change | Change management proportional to risk |
| Create CR | Material change review and approval |
| PIR | Systematic review and continuous improvement |
| Pattern Bridge | Recurring issue escalation to product backlog |

Detailed mapping (APRA CPS 234, CPS 230, NIST CSF 2.0) is documented in ADLC Governance Rules.


Component Map

| Component | Type | Purpose |
|-----------|------|---------|
| /itsm:lifecycle | Command | Full pipeline orchestrator |
| /itsm:classify | Command | Stage A: ticket classification |
| /itsm:cross-validate | Command | Stage B: 4-source evidence |
| /itsm:create-change | Command | Stage C: change record |
| /itsm:create-cr | Command | Stage D: CAB review |
| /itsm:create-pir | Command | Post-incident review |
| sre-automation-specialist | Agent | Executes all ITSM commands (haiku tier) |
| itsm-ticket-classification | Skill | Classification decision tree |
| itsm-change-management | Skill | Change workflow and risk assessment |
| jira-jsm-service-desk | Skill | JSM lifecycle and SLA management |
| atlassian-tools | MCP | 72 tools for JIRA + Confluence |

Common Mistakes (What NOT to Do)

| Mistake | Why It Fails | Fix |
|---------|--------------|-----|
| TECHNICAL_WITHOUT_PROCESS | Changes without change order and approver | All infrastructure actions must include approval process |
| AUTONOMOUS_SPRINT_ASSIGNMENT | Agent moves OPS tickets to sprint without HITL | Sprint assignment is HITL-exclusive action |
| INCOMPLETE_EVIDENCE | Change records built on incomplete facts | All 4 sources verified before creating changes |

Last Updated: April 2026 | Status: Active | Maintenance: sre-automation-specialist