AI-driven agile risk assessment shows a critical threshold in a UMass Amherst experiment (3 sprint teams, 47 story points, standardized deliverable): AI-only wins on planning time (0.38 hrs vs 4.5) and cost per point ($78.50 vs $91), but degrades risk capture to 36.4% (human 78.6%, hybrid 86.7%). Zero percent of novel, context-specific risks were caught by AI: CSS framework incompatibility, API token expiration, CDN font issues, viewport mismatches. Rework rate: AI-only 14.2% vs hybrid 8.6%; scope change recovery: 6.5 hrs vs 3.2. For project managers, the lesson is that efficiency ≠ effectiveness. A Total Cost of Delivery (TCD) model shows hybrid is only 1% more expensive than AI ($4,272 vs $4,229) but delivers 138% better risk capture and wins the blind client evaluation. Synergistic effect: hybrid identifies 13 risks, more than either baseline alone (AI 4, human 11). The cognitive offloading threshold is crossed when AI handles risk identification without mandated human review. HPGF framework (Hybrid Planning Governance): tasks with high contextual ambiguity require human deliberation with AI scaffolding, not full delegation.
Experiment design: Three conditions, eight metrics, controlled scope change
Vierra Digital agency (35-50 personnel), standard Scrum, 2-week sprints:
Three separate teams (no cross-condition learning):
- Experience-matched: 3.2 years average Agile
- Compensation: $47/hr blended rate (constant across conditions)
- Same deliverable: 47 story points, semi-complex landing page
- Three sprints sequential execution
AI-only condition: Claude Sonnet 4.6 via claude.ai handles ALL planning (backlog, estimation, velocity, risk, sequencing). The human team executes the plan without participating in planning decisions.
Human-only condition: No AI tooling. The Scrum team leads all planning through Planning Poker, dedicated risk discussions, and standard collaborative ceremonies.
Hybrid condition: Claude generates an initial backlog, velocity forecast, and baseline risk log BEFORE the planning meeting. The human team reviews the AI outputs, validates estimates, and holds a mandatory structured session for risk identification and assumption documentation.
Controlled disruption: at the 40% completion mark of Sprint 2, the client requests an animation library replacement (third-party → custom-built, due to licensing concerns). A genuine technical dependency shift requiring architectural reassessment.
Eight tracked metrics:
- Efficiency: planning time, completion time, cost per point
- Robustness: backlog revisions, rework rate, documented risks, risk capture rate, scope change recovery time
Blind client evaluation: the client selects the preferred deliverable without knowing which condition produced it.
Risk capture catastrophe: AI-only 0% novel risks
Overall risk capture rates:
- AI-only: 36.4% (4 risks documented, 11 total materialized)
- Human-only: 78.6% (11 documented)
- Hybrid: 86.7% (13 documented) – WINNER
Risk capture by category:
Technical Dependencies: AI 20%, Human 80%, Hybrid 100%
Client Behavior Risks: AI 33%, Human 100%, Hybrid 100%
Third-Party Service: AI 67%, Human 67%, Hybrid 100%
Novel/Context-Specific: AI 0%, Human 67%, Hybrid 100%
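The overall capture rates follow directly from documented vs. materialized risk counts; a quick consistency check in Python. Note an assumption: the AI-only denominator (11) is stated above, but the human and hybrid denominators (14, 15) are back-calculated here from the reported percentages.

```python
# Risk capture rate = documented risks / risks that materialized.
# AI-only counts are as reported; human/hybrid denominators are
# back-calculated from the stated percentages (an inference, not a
# figure quoted in the study summary).
conditions = {
    "AI-only":    {"documented": 4,  "materialized": 11},
    "Human-only": {"documented": 11, "materialized": 14},
    "Hybrid":     {"documented": 13, "materialized": 15},
}

rates = {name: c["documented"] / c["materialized"] for name, c in conditions.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")

# Relative improvement of hybrid over AI-only: ~138%, matching the
# reported figure to rounding.
improvement = rates["Hybrid"] / rates["AI-only"] - 1
print(f"Hybrid vs AI-only: +{improvement:.1%}")
```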
Seven undocumented AI-only risks (all novel/context-specific):
- CSS framework incompatibility with the client component library
- API authentication token expiration
- CDN font availability issue
- Mobile viewport breakpoint mismatch
- Legacy CSS overrides in the client template
- Version deprecation dependencies
- (+ one more undisclosed)
Critical insight: AI captured 0% of novel risks NOT because it miscalculated probabilities; these risks existed OUTSIDE its training distribution. The team received a finished plan without having performed the deliberative labor necessary for a shared mental model.
Statistical significance:
- AI-only → Hybrid improvement: t(2) = -8.45, p = .007
- AI-only → Human improvement: t(2) = -6.12, p = .013
Rework rate driven by unmitigated risks
Sprint-level rework rates:
Sprint 1 (setup): AI 2.3%, Human 1.4%, Hybrid 2.0%
Sprint 2 (core build – CRITICAL): AI 20.5%, Human 12.8%, Hybrid 11.3%
Sprint 3 (finalization): AI 12.9%, Human 7.5%, Hybrid 9.7%
Overall: AI 14.2%, Human 9.1%, Hybrid 8.6%
AI-only Sprint 2 driver: a single unmitigated risk (CSS framework incompatibility) caused 7.2 hours of rework, 64.9% of total phase rework. Discovered at the 60% completion mark during an integration task.
Human-only prevention: identified as a medium-likelihood, high-impact risk during planning. The team allocated 1.5 hours upfront to validate compatibility, and the incident was avoided entirely.
Hybrid result: caught the risk similarly, reducing Sprint 2 rework to 11.3%.
Statistical significance: Sprint 2 rework AI → Hybrid reduction: t(2)=4.12, p=.027
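As a back-of-envelope on the CSS incompatibility incident: the hours (1.5 upfront, 7.2 rework) and the $47/hr blended rate are figures reported above, while the return-on-mitigation framing is my own illustration, not a metric from the study.

```python
# Illustrative mitigation economics for the CSS framework incident.
BLENDED_RATE = 47.0        # $/hr, constant across conditions (reported)
upfront_validation = 1.5   # hrs the human-only team spent validating compatibility
rework_avoided = 7.2       # hrs of rework the unmitigated risk caused AI-only

cost_mitigation = upfront_validation * BLENDED_RATE
cost_incident = rework_avoided * BLENDED_RATE
net_saving = cost_incident - cost_mitigation
print(f"Mitigation ${cost_mitigation:.2f} avoided ${cost_incident:.2f} of rework "
      f"(net ${net_saving:.2f}, {rework_avoided / upfront_validation:.1f}x hour return)")
```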
Total Cost of Delivery reframe: 1% premium, 138% risk improvement
Standard industry evaluation (AI wins):
- Planning speed: AI 0.38 hrs vs Human 4.5 hrs
- Cost per point: AI $78.50 vs Human $91
TCD model (incorporates rework + scope change + planning ceremony):
Execution Cost: AI $3,689.50, Human $4,277, Hybrid $3,854
Rework Cost: AI $521.70, Human $390.10, Hybrid $333.70
Planning Ceremony Cost: AI $17.86, Human $211.50, Hybrid $84.60
TOTAL: AI $4,229.06, Human $4,878.60, Hybrid $4,272.30 (+$43.24 = +1.0% vs AI)
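The TCD totals reconcile as straight sums of the three components; a minimal sketch, with the component figures copied from the table above:

```python
# Total Cost of Delivery = execution + rework + planning ceremony cost.
# Component figures per condition as reported above.
tcd_components = {
    "AI-only":    (3689.50, 521.70, 17.86),
    "Human-only": (4277.00, 390.10, 211.50),
    "Hybrid":     (3854.00, 333.70, 84.60),
}
tcd = {name: sum(parts) for name, parts in tcd_components.items()}

premium = tcd["Hybrid"] - tcd["AI-only"]       # absolute hybrid premium over AI-only
premium_pct = premium / tcd["AI-only"] * 100   # the reported ~1% premium
for name, total in tcd.items():
    print(f"{name}: ${total:,.2f}")
print(f"Hybrid premium: ${premium:.2f} (+{premium_pct:.1f}%)")
```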
For a 1% premium, hybrid delivers:
- 138.2% improvement in risk capture rate
- 50.8% improvement in scope change recovery (3.2 hrs vs 6.5)
- Blind client evaluation winner
- 86.7% risk capture vs 36.4%
Understated advantage: TCD only accounts for risks that materialized in the experiment. AI's 36.4% capture rate implies a substantially higher risk exposure premium for unmaterialized risks in more complex, high-stakes projects.
Synergistic effect: 13 risks, more than either baseline alone
A simple interpolation would predict hybrid falling somewhere between the AI baseline (4 risks) and the human baseline (11 risks).
Actual result: hybrid identified 13 risks – MORE than either baseline alone, including a risk that appeared in neither.
Mechanism (cognitive scaffolding effect):
Human-only vulnerability: unstructured deliberation invites availability bias. The team anchored on cognitively salient risks from recent projects and overlooked less memorable dependencies (the API version deprecation in Sprint 2).
Hybrid advantage: the AI-structured backlog and baseline risk log forced systematic review of broader risk categories, mitigating availability bias.
At the same time, mandatory human review with an explicit requirement to identify AI-missed risks activated critical interrogation (suppressed in AI-only).
Superadditive example: Risk R13 (undocumented legacy CSS overrides in the client's existing template) was novel, context-specific, and appeared in NEITHER baseline condition. Catching it prevented an estimated 5-7 hours of rework.
When AI outputs are used as cognitive scaffolds (not authoritative directives), they enhance rather than erode the shared mental model.
Cognitive offloading threshold: Where delegation becomes harmful
Definition: cognitive offloading = the externalization of cognitive work onto tools and environments. NOT inherently harmful.
Safe delegation: velocity forecasting, backlog formatting. These free human attention for higher-order judgment without sacrificing quality.
Threshold crossed when delegation extends to:
- Tasks requiring tacit organizational knowledge
- Novel technical dependency evaluation
- Unstated client preference alignment
Result: the team receives a finished planning artifact WITHOUT having performed the deliberative labor necessary for a shared mental model and vulnerability identification.
Vierra quantification: AI-only captured 0% of novel context-specific risks. Not a miscalculation: these risks existed OUTSIDE the training distribution.
HPGF framework: Four quadrants, distinct governance rules
Hybrid Planning Governance Framework categorizes tasks:
Axis 1 (Computational Complexity): the degree to which the task relies on large historical datasets and quantitative optimization
Axis 2 (Contextual Ambiguity): the degree to which the task relies on tacit knowledge, novel dependencies, and unstated preferences
Four quadrants:
LOW ambiguity + LOW complexity: Full AI Automation (administrative docs, status reporting, retrospective formatting). No cognitive risk → full delegation appropriate.
LOW ambiguity + HIGH complexity: AI Delegation with Human Review (velocity forecasting, throughput analysis, routine estimation). AI generates → human confirms applicability to the current context.
HIGH ambiguity + LOW complexity: Human Deliberation with AI Scaffolding ⚠️ CRITICAL (risk identification, assumption articulation, contingency planning). Human leads → AI provides baseline structure to interrogate. AI-only failed here (0% novel risks).
HIGH ambiguity + HIGH complexity: Iterative Human-AI Collaboration (scope change impact analysis, architectural refactoring estimation). Human frames context → AI models quantitative implications. Hybrid showed superior scope change recovery (3.2 hrs vs 6.5).
Most critical quadrant: lower-right (high ambiguity, low complexity). HPGF mandates that AI outputs be treated explicitly as a starting point for deliberation, NOT a conclusion. Governance must REQUIRE the team to document risks the AI didn't identify, activating the critical interrogation that prevents automation complacency.
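The quadrant logic reduces to a lookup over the two axes. A minimal sketch; the `Level` enum and function names are my own, not from the paper, and the rule strings are the quadrant labels above:

```python
from enum import Enum

class Level(Enum):
    LOW = 0
    HIGH = 1

# HPGF governance rules keyed by (contextual ambiguity, computational complexity).
HPGF_RULES = {
    (Level.LOW,  Level.LOW):  "Full AI Automation",
    (Level.LOW,  Level.HIGH): "AI Delegation with Human Review",
    (Level.HIGH, Level.LOW):  "Human Deliberation with AI Scaffolding",
    (Level.HIGH, Level.HIGH): "Iterative Human-AI Collaboration",
}

def governance_rule(ambiguity: Level, complexity: Level) -> str:
    """Return the HPGF governance rule for a planning task's axis ratings."""
    return HPGF_RULES[(ambiguity, complexity)]

# Example classifications from the quadrant descriptions above:
print(governance_rule(Level.LOW, Level.HIGH))   # velocity forecasting
print(governance_rule(Level.HIGH, Level.LOW))   # risk identification: never fully delegated
```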
Five practical implementation strategies
1. Replace cost-per-point with TCD evaluation: standard metrics (planning speed, initial cost) drive AI adoption while leaving success rates stagnant. TCD incorporates rework, scope change recovery, and planning ceremony costs. Organizations adopting TCD as their standard will find hybrid economically compelling.
2. Mandate risk interrogation protocols: an explicit requirement that the team MUST document risks the AI didn't identify. This activates critical thinking. Template question: “Which context-specific dependencies has the AI missed?” It forces systematic review beyond the AI baseline.
3. Classify planning tasks via HPGF before deployment: map each task to a quadrant. High contextual ambiguity (lower-right, upper-right) → never full AI delegation. Velocity forecasting (upper-left) → safe delegation with review. Risk identification (lower-right) → human-led with an AI scaffold.
4. Treat AI outputs as scaffolds, not conclusions: especially in high-ambiguity quadrants. The AI baseline risk log is a starting point for structured human deliberation. This prevents both unstructured availability bias (human-only) and automation complacency (AI-only).
5. Track capability trajectory, not just delivery metrics: the longitudinal concern is that continuous AI-assisted planning may deskill junior developers. If estimation and risk identification are permanently offloaded, the capacity to evaluate AI outputs gradually erodes into a dangerous dependency loop. Monitor: can the team perform planning without AI? What are error detection rates when AI is unavailable? Run skill preservation programs.
Bottom line
AI-driven agile risk assessment requires a hybrid approach: the UMass experiment (3 teams, 47 points, 8 metrics) shows AI-only wins planning time (0.38 hrs) and cost ($78.50/point) but degrades risk capture to 36.4% (human 78.6%, hybrid 86.7%). Zero percent of novel context-specific risks were caught. Rework: 14.2% vs 8.6% hybrid. TCD model: hybrid only 1% more expensive ($4,272 vs $4,229) but 138% better risk capture plus blind client winner. Synergistic effect: 13 risks identified, more than either baseline alone (AI 4, human 11). The cognitive offloading threshold is crossed when AI handles risk without mandated review. HPGF framework: high contextual ambiguity tasks (risk identification, assumptions, contingencies) require human deliberation with AI scaffolding. Implementation: TCD evaluation, mandated interrogation protocols, HPGF task classification, scaffolds not conclusions, capability trajectory tracking.
Source: “Cognitive Offloading in Agile Teams: How Artificial Intelligence Reshapes Risk Assessment and Planning Quality” by Adriana Caraeni, Alexander Shick & Andrew Lan, University of Massachusetts Amherst, published April 15, 2026.
