Session Handoff Lessons Learned

Extracted from CloudOps-Runbooks operational experience. These five lessons were learned through incidents, not theory.

Context

AI agents have no persistent memory between sessions. Every new session starts fresh. This creates specific failure modes that governance must address.

Lesson 1: Framework Enforcement is Everything

Incident: Agent operated standalone, bypassing coordination framework. Produced work that conflicted with architectural decisions made in previous sessions.

Root Cause: No mechanism prevented the agent from skipping the coordination step.

Lesson: Coordination must be enforced, not optional. Hooks, pre-checks, and settings enforcement prevent bypass — voluntary compliance doesn't work.

ADLC Application: The remind-coordination.sh hook blocks specialist execution before product-owner + cloud-architect coordination. The enforcement is structural, not procedural.
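A structural pre-check of this kind can be sketched as a small shell function. This is a minimal sketch, not the real `remind-coordination.sh`: the log path and the "non-empty file means coordination happened" rule are assumptions for illustration.

```shell
# check_coordination: structural gate before specialist execution.
# Returns 0 only if a non-empty coordination record exists; otherwise
# prints a block message and returns 1. The log path is an assumption.
check_coordination() {
    log="${1:-tmp/coordination.log}"
    if [ ! -s "$log" ]; then
        echo "BLOCKED: no coordination record at $log" >&2
        echo "Run product-owner + cloud-architect coordination first." >&2
        return 1
    fi
    echo "Coordination record found - proceeding."
    return 0
}
```

The point is that the check runs before the agent can act, so skipping coordination is impossible rather than merely discouraged.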

Lesson 2: Evidence-Based Claims Only

Incident: Agent reported "100% complete" for a module. Manual testing revealed the module failed to import. The claim was based on code file existence, not functionality.

Root Cause: No requirement for evidence paths in completion claims.

Lesson: Every completion claim must include:

  • File path to the evidence (test output, screenshot, log)
  • The exact command that was run
  • The actual output (not a summary)

ADLC Application: The NATO prevention hook blocks completion claims (complete, done, finished) that don't include tmp/ evidence paths.
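An evidence gate of this shape can be sketched as below. The keyword matching is deliberately simplistic and the exact rules of the real NATO prevention hook may differ; treat this as an illustration of the structure, not the hook itself.

```shell
# check_claim: allow a message unless it asserts completion without
# citing a tmp/ evidence path. Matching is simplistic by design -
# substring matches like "incomplete" are not handled here.
check_claim() {
    msg="$1"
    if echo "$msg" | grep -qiE '(complete|done|finished)'; then
        if echo "$msg" | grep -q 'tmp/'; then
            return 0
        fi
        echo "BLOCKED: completion claim without tmp/ evidence path" >&2
        return 1
    fi
    return 0
}
```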

Lesson 3: Testing Before Claims

Incident: Agent claimed ">90% test coverage" based on the number of test functions that existed. None of the tests could actually execute due to import errors.

Root Cause: Conflation of test existence with test execution.

Lesson: Test validation requires:

  1. Tests execute (not just exist)
  2. Tests pass (not just run)
  3. Coverage is measured (not estimated from file count)

ADLC Application: Quality gates require actual test execution output (pytest --cov results), not assertions about test quality.
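The gate can be reduced to one rule: actually run the test command, save its full output as evidence, and fail if it fails. A minimal sketch, assuming an evidence path under tmp/ and any test command (`pytest --cov` in the example comment):

```shell
# run_gate: execute the test command for real, capture stdout+stderr as
# the evidence file, and report pass/fail from the real exit status.
run_gate() {
    evidence="$1"; shift
    if "$@" > "$evidence" 2>&1; then
        echo "GATE PASSED: evidence at $evidence"
        return 0
    fi
    echo "GATE FAILED: see $evidence" >&2
    return 1
}

# Example usage (illustrative): run_gate tmp/pytest-cov.txt pytest --cov
```

Because the gate keys off the command's exit status and stores its actual output, "tests exist" and "coverage estimated from file count" can never satisfy it.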

Lesson 4: Memory System is Critical

Incident: New session repeated work already completed in previous session. Worse, it made architectural decisions that contradicted previous consensus because it didn't know the consensus existed.

Root Cause: Agent relied on session context, which doesn't persist.

Lesson: Project context must be persisted in files that every session reads:

  • CLAUDE.md — project-level instructions and architecture decisions
  • Memory files — user preferences, feedback, project state
  • Coordination logs — recent decisions with their rationale

ADLC Application: The .claude/memory/ system provides persistent memory across sessions. Coordination logs in tmp/ capture recent decisions with context.
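A session-start context load can be sketched as follows. The file locations are assumptions based on the layout described above (CLAUDE.md at the project root, markdown files under .claude/memory/, coordination logs in tmp/), not the framework's exact mechanism.

```shell
# load_context: print every persistent memory file so a fresh, stateless
# session starts with prior decisions in view. Paths are illustrative.
load_context() {
    root="${1:-.}"
    for f in "$root/CLAUDE.md" \
             "$root"/.claude/memory/*.md \
             "$root"/tmp/coordination-*.log; do
        [ -f "$f" ] || continue
        echo "=== $f ==="
        cat "$f"
    done
}
```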

Lesson 5: Transparency Over Optimism

Incident: Agent reported feature as "working" when it was "partially implemented". Manager discovered the gap during a demo. Trust was damaged.

Root Cause: Optimistic reporting bias — agent defaulted to positive framing.

Lesson: Honest assessment of completion state builds trust:

  • "75% complete with these specific gaps" is more trustworthy than "almost done"
  • Identifying what doesn't work is more valuable than confirming what does
  • A transparent roadmap for the remaining 25% gives the manager control

ADLC Application: The quality baseline assessment template (separate doc) provides the structure for honest status reporting. The anti-pattern list includes "NATO (No Action, Talk Only)" as a governance violation.

Common Thread

All five lessons point to one principle: structural enforcement beats voluntary compliance.

Voluntary (Fails) → Structural (Works)

  • "Remember to coordinate" → Hook blocks execution without coordination
  • "Include evidence" → Hook blocks completion claims without tmp/ paths
  • "Test before claiming" → Quality gate requires test execution output
  • "Check previous context" → Session reads persistent memory files automatically
  • "Be honest about status" → Assessment template forces categorisation of verified vs claimed

Applicability

These lessons apply to any project using AI agents for implementation:

  • The failure modes are inherent to stateless AI sessions
  • The mitigations are structural (hooks, templates, enforcement)
  • The governance overhead is small compared to the cost of the incidents

Origin: CloudOps-Runbooks multi-session operational experience. Elevated to framework methodology because these failure modes are universal, not project-specific.