Enterprise GenAI Integration

xOps implements a 6-layer enterprise GenAI architecture. Each layer maps to specific ADLC agents, commands, skills, and hooks.

Architecture Overview

L1: Customer       → CloudOps/FinOps/DevOps engineer
L2: Interaction    → Open WebUI RAG chatbot + runbooks CLI
L3: Generative AI  → CrewAI + LiteLLM + Claude API
L4: Backend Apps   → IAM Identity Center + ADLC hooks + runbooks CLI
L5: Data Sources   → Config Aggregator + Cost Explorer + Security Hub
L6: Infrastructure → Docker ($0) → ECS Fargate ($180/mo) → K3S hybrid

Layer 1 — Customer

Industry Pattern: User logs in, reviews options, completes task or escalates to live agent

xOps Pattern: CloudOps/FinOps/DevOps engineer opens xOps, asks operational question, reviews AI analysis, approves remediation or escalates to HITL manager

| Step | Action | System |
| --- | --- | --- |
| Engineer opens xOps | SSO login via IAM Identity Center | Open WebUI |
| Asks operational question | Natural language: "Show me idle EC2 across all accounts" | Chat Interface |
| Reviews AI analysis | Multi-account dashboard with cost/security/inventory signals | Rich CLI + Tables |
| Requests HITL escalation | Complex remediation needs human approval (Principle I) | SNS + Slack |
| Completes task | Exports report, closes ticket, evidence in tmp/ | CSV/JSON/PDF |

ADLC Agent: product-owner — validates business value at the customer touchpoint

Commands: /product:pr-faq xops, /metrics:daily-standup


Layer 2 — Interaction

Industry Pattern: Chatbot initiates and guides conversation; agent escalation when AI cannot resolve

xOps Pattern: Open WebUI RAG chatbot + runbooks CLI handle 80% of queries; CrewAI crews escalate complex multi-step operations to HITL

| Step | Action | System |
| --- | --- | --- |
| Chatbot activated | xOps RAG pipeline processes query against multi-account AWS knowledge base | Open WebUI + ChromaDB |
| Communicates options | Presents cost analysis, security findings, or inventory results with confidence scores | CrewAI Flows v2 |
| User selects action | Approve recommendation or request alternatives | Pipeline Engine |
| Chatbot responds | Executes approved action via runbooks CLI (READONLY) | runbooks PyPI |
| Pings HITL support | Escalation when action requires write access or compliance approval | SNS → Slack |
| HITL provides solution | Manager reviews evidence, approves terraform apply or remediation | ADLC Phase 3+ |
| Feedback to model | HITL correction feeds back to RAG knowledge base for continuous improvement | CrewAI Knowledge |

ADLC Agents: frontend-docs-engineer (Open WebUI), meta-engineering-expert (pipeline engine)
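
The escalation rule in the steps above (read-only actions run directly, while anything that needs write access or compliance approval goes to a human) can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not the xOps implementation: the `Action` type, its fields, and the `route_action` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical description of an operation requested through the chatbot."""
    name: str
    needs_write: bool = False
    needs_compliance_approval: bool = False


def route_action(action: Action) -> str:
    """Return "execute" for read-only actions and "escalate" for everything else."""
    if action.needs_write or action.needs_compliance_approval:
        return "escalate"  # hand off to a human approver (HITL)
    return "execute"  # safe to run through the read-only CLI path


if __name__ == "__main__":
    print(route_action(Action("list idle EC2 instances")))
    print(route_action(Action("terraform apply", needs_write=True)))
```

Keeping the decision in one pure function makes the gate easy to test independently of the notification channel that delivers the escalation.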


Layer 3 — Generative AI

Industry Pattern: Model receives request, checks policies, generates options, executes next steps

xOps Pattern: CrewAI orchestrates specialist crews (CloudOps/FinOps/DevSecOps) via LiteLLM gateway to Claude API with ADLC governance hooks

| Step | Action | System |
| --- | --- | --- |
| Model receives request | CrewAI Flow receives user query + pulls AWS account context via MCP | CrewAI + LiteLLM |
| Checks compliance policy | APRA CPS 234 data sovereignty check: sensitive queries route to Ollama (local), not cloud | LiteLLM Router |
| Generates analysis | Multi-account cost analysis, security posture, inventory discovery | runbooks analyzers (119+) |
| Presents options | Ranked recommendations with cost impact, risk score, and compliance mapping | Claude API + Prompt Caching |
| Executes approved action | READONLY queries autonomous; write operations require HITL gate (Principle I) | ADLC Hooks |

ADLC Agents: meta-engineering-expert (CrewAI flows), python-engineer (analyzers), qa-engineer (accuracy validation), cloud-architect (AI architecture decisions)
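
The data sovereignty check above routes sensitive queries to a local model instead of a cloud model. A rough sketch of that decision in plain Python follows. It does not use the LiteLLM API; the marker list, the model aliases, and the function names are all illustrative assumptions, and a real policy would come from the compliance team rather than a hard-coded tuple.

```python
# Hypothetical markers of sensitive content (assumption for illustration only).
SENSITIVE_MARKERS = ("account id", "customer", "credential", "secret")

LOCAL_MODEL = "ollama-local"  # assumed alias for a locally hosted model
CLOUD_MODEL = "claude-cloud"  # assumed alias for a cloud-hosted model


def is_sensitive(query: str) -> bool:
    """Return True if the query mentions any sensitive marker (case-insensitive)."""
    lowered = query.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def choose_model(query: str) -> str:
    """Pick the local model for sensitive queries, the cloud model otherwise."""
    return LOCAL_MODEL if is_sensitive(query) else CLOUD_MODEL


if __name__ == "__main__":
    print(choose_model("Show me idle EC2 across all accounts"))
    print(choose_model("List credentials attached to customer workloads"))
```

Substring matching is deliberately naive here; the point is only that the routing decision happens before any request leaves the local environment.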


Layer 4 — Backend Apps

Industry Pattern: Authentication, policy enforcement, booking workflows, agent assignment

xOps Pattern: IAM Identity Center SSO, ADLC constitutional governance (22 hooks), runbooks CLI (131 commands), MCP server orchestration (58 integrations)

| Step | Action | System |
| --- | --- | --- |
| Authentication + authorisation | IAM Identity Center SCIM 2.0 → Open WebUI user sync + OIDC for API auth | M1 terraform-aws-iam-identity-center |
| Policy enforcement | ADLC 22 deterministic hooks: enforce-coordination, validate-bash, detect-nato-violation | .claude/hooks/scripts/ |
| Workflow management | 6-phase ADLC lifecycle: PLAN → BUILD → TEST → DEPLOY → MONITOR → OPERATE | ADLC Constitution v2.1.0 |
| Agent assignment | 15 constitutional agents in 3 tiers: 3 opus (decision) + 8 sonnet (execution) + 4 haiku (operations) | .claude/agents/ |

ADLC Agents: infrastructure-engineer (IaC lifecycle), security-compliance-engineer (APRA CPS 234)
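
The six lifecycle phases and the three agent tiers listed above can be written down as plain data, which makes the ordering and the agent count easy to check. This is a sketch only: the `Phase` enum, `next_phase`, and `AGENT_TIERS` are names introduced here for illustration and are not taken from the ADLC codebase.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The six lifecycle phases, numbered in their stated order."""
    PLAN = 1
    BUILD = 2
    TEST = 3
    DEPLOY = 4
    MONITOR = 5
    OPERATE = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None once OPERATE is reached."""
    if current is Phase.OPERATE:
        return None
    return Phase(current.value + 1)


# Agent counts per model tier, as listed in the agent assignment row.
AGENT_TIERS = {"opus": 3, "sonnet": 8, "haiku": 4}

if __name__ == "__main__":
    print(next_phase(Phase.TEST).name)
    print(sum(AGENT_TIERS.values()))
```

The tier counts add up to the 15 agents stated in the table (3 + 8 + 4).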


Layer 5 — Data Sources

Industry Pattern: Customer ID, booking history, policy rules, agent directories

xOps Pattern: AWS account inventory (Config Aggregator), cost history (Cost Explorer FOCUS 1.2+), security findings (Security Hub), certificate inventory (ACM org-wide)

| Step | Action | System |
| --- | --- | --- |
| Account inventory | Config Aggregator: org-wide resource discovery in seconds (P1 path) | AWS Config |
| Cost history | Cost Explorer: monthly/daily granularity, FOCUS 1.2+ normalisation, multi-account | AWS Cost Explorer |
| Security findings | Security Hub: aggregated CRITICAL/HIGH/MEDIUM across all accounts, SOC2 mapping | AWS Security Hub |
| Certificate inventory | ACM: risk-ranked expiry dashboard across all accounts | AWS ACM + IAM |
| Operational knowledge | ChromaDB RAG: runbooks docs, SOPs, incident history (xOps-S2 roadmap, S2-01) | CrewAI Knowledge |

ADLC Agents: cloud-architect (data architecture), observability-engineer (MELT telemetry instrumentation)
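
As one example of how a data source above might be turned into a signal, the risk-ranked certificate expiry view can be approximated by sorting certificates by days remaining and assigning a band. The sketch below works on in-memory records rather than calling AWS; the `Certificate` type, the function names, and the 30/90-day thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Certificate:
    """Hypothetical certificate record gathered from several accounts."""
    domain: str
    account_id: str
    not_after: date


def risk_band(days_left: int) -> str:
    """Map days until expiry to a band (thresholds are illustrative assumptions)."""
    if days_left < 0:
        return "EXPIRED"
    if days_left <= 30:
        return "HIGH"
    if days_left <= 90:
        return "MEDIUM"
    return "LOW"


def rank_by_expiry(certs: List[Certificate], today: date) -> List[Tuple[Certificate, int, str]]:
    """Return (certificate, days_left, band) tuples, soonest expiry first."""
    rows = [(cert, (cert.not_after - today).days) for cert in certs]
    rows.sort(key=lambda row: row[1])
    return [(cert, days, risk_band(days)) for cert, days in rows]


if __name__ == "__main__":
    sample = [
        Certificate("a.example.com", "111111111111", date(2025, 1, 15)),
        Certificate("b.example.com", "222222222222", date(2025, 6, 1)),
    ]
    for cert, days, band in rank_by_expiry(sample, today=date(2025, 1, 1)):
        print(cert.domain, days, band)
```

Passing `today` explicitly keeps the ranking deterministic, which matters when the same report is regenerated as audit evidence.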


Layer 6 — Infrastructure

Industry Pattern: Cloud/hybrid infrastructure, model orchestration, low-latency, security governance

xOps Pattern: Docker-first ($0 local) → ECS Fargate Graviton4 ($180/mo prod), LiteLLM model orchestration, CloudFront edge delivery, ADLC + WAFv2 security governance

| Step | Action | System |
| --- | --- | --- |
| Cloud / on-premises | LOCAL: docker-compose ($0) → PROD: ECS Fargate Graviton4 ARM64 ($180/mo) → HYBRID: K3S for on-prem sovereignty | M2 terraform-aws-ecs |
| Model orchestration | LiteLLM gateway: Claude API (BC1) → Bedrock VPC (BC2+) → Ollama local (air-gapped). Config change, not arch change. | LiteLLM + CrewAI |
| Low-latency delivery | CloudFront 450+ PoPs, WebSocket passthrough, ALB sticky sessions for streaming | M3 terraform-aws-web |
| Security governance | WAFv2 ATPRuleSet + ADLC 22 hooks + 65 anti-patterns + 17 rules files + APRA CPS 234 | ADLC Framework v3.7.2 |

ADLC Agents: infrastructure-engineer (deployment), cloud-architect (infra design), sre-automation-specialist (reliability)
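
The "config change, not arch change" idea in the model orchestration row can be illustrated with a lookup table: the calling code stays the same and only the configured stage selects a different backend. This is a simplified sketch and not LiteLLM configuration; the stage keys reuse the labels from the table, while `MODEL_BACKENDS`, `resolve_backend`, and `describe_request` are hypothetical names.

```python
# Stage label → model backend, using the labels from the table above.
MODEL_BACKENDS = {
    "BC1": "Claude API",
    "BC2+": "Bedrock VPC",
    "air-gapped": "Ollama local",
}


def resolve_backend(stage: str) -> str:
    """Look up the backend for a stage, rejecting unknown stage labels."""
    try:
        return MODEL_BACKENDS[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None


def describe_request(stage: str, prompt: str) -> dict:
    """Build a request description; only the backend field varies by stage."""
    return {"backend": resolve_backend(stage), "prompt": prompt}


if __name__ == "__main__":
    for stage in MODEL_BACKENDS:
        print(stage, "->", describe_request(stage, "Summarise idle EC2 cost")["backend"])
```

Because callers only ever see `describe_request`, moving from one backend to another touches the mapping and nothing else.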


Agent-to-Layer Mapping

| Layer | Agents | Why |
| --- | --- | --- |
| Customer | product-owner | Validates business value; ensures AI serves real operator needs |
| Interaction | frontend-docs-engineer, meta-engineering-expert | Frontend expertise for Open WebUI + MEE for pipeline engine integration |
| Generative AI | meta-engineering-expert, python-engineer, qa-engineer, cloud-architect | MEE designs CrewAI flows; python-engineer implements analyzers; QA validates accuracy; CA validates AI architecture |
| Backend Apps | infrastructure-engineer, security-compliance-engineer | IaC lifecycle + APRA CPS 234 compliance at every backend touchpoint |
| Data Sources | cloud-architect, observability-engineer | CA designs data architecture; observability-engineer instruments MELT telemetry |
| Infrastructure | infrastructure-engineer, cloud-architect, sre-automation-specialist | Infra-engineer deploys; CA designs infra architecture; SRE ensures reliability |
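
The mapping above can also be read in the other direction, to see which layers a given agent participates in. The sketch below copies the layer and agent names from the table into a dictionary; the `LAYER_AGENTS` and `layers_for` names are introduced here for illustration.

```python
from typing import List

# Layer → agents, transcribed from the mapping table above.
LAYER_AGENTS = {
    "Customer": ["product-owner"],
    "Interaction": ["frontend-docs-engineer", "meta-engineering-expert"],
    "Generative AI": [
        "meta-engineering-expert",
        "python-engineer",
        "qa-engineer",
        "cloud-architect",
    ],
    "Backend Apps": ["infrastructure-engineer", "security-compliance-engineer"],
    "Data Sources": ["cloud-architect", "observability-engineer"],
    "Infrastructure": [
        "infrastructure-engineer",
        "cloud-architect",
        "sre-automation-specialist",
    ],
}


def layers_for(agent: str) -> List[str]:
    """Return the layers an agent is mapped to, in table order."""
    return [layer for layer, agents in LAYER_AGENTS.items() if agent in agents]


if __name__ == "__main__":
    print(layers_for("cloud-architect"))
    print(layers_for("product-owner"))
```

Read this way, cloud-architect spans three layers (Generative AI, Data Sources, Infrastructure), while product-owner appears only at the Customer layer.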