Agile team productivity with GenAI shows a P-A-E divergence in a 13-month longitudinal study (3 teams, 21 developers). Performance rose +59.1% (story points) and ≈82% of developers perceived increased efficiency, but Activity stayed flat (commits/LoC unchanged, p>0.05). This reveals that GenAI improves productivity not through increased volume but through increased value density: developers deliver more value per line of code. For you as a project manager, this means: don't just measure activity (commits, LoC); measure value delivery (performance metrics such as story points). Otherwise you miss the entire productivity transformation.
P-A-E Divergence: The critical finding
- Historical period (pre-GenAI): 281 story points completed
- Research period (post-GenAI): 447 story points completed
- Increase: +59.1% (statistically significant: p=4.94e-07, Cohen’s d=0.49)
But commits/LoC: no significant difference (p=0.928)
Implication: the teams achieved 59% more throughput at the same activity level. This is not “work harder” but “work smarter” through value concentration.
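The headline figure is simple arithmetic. The Python sketch below reproduces the +59.1% from the reported story points and shows why flat activity implies rising value density: if the activity denominator stays constant, output per unit of activity grows by the same percentage. The commit count is a hypothetical placeholder, not a number from the study, and the helper names are illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


def value_density(output: float, activity: float) -> float:
    """Output (e.g. story points) per unit of activity (e.g. commits)."""
    if activity <= 0:
        raise ValueError("activity must be positive")
    return output / activity


# Completed story points reported in the study.
POINTS_BEFORE, POINTS_AFTER = 281, 447
print(f"Performance: {percent_change(POINTS_BEFORE, POINTS_AFTER):+.1f}%")  # +59.1%

# Hypothetical commit count, held constant because activity did not change significantly.
COMMITS = 500
density_before = value_density(POINTS_BEFORE, COMMITS)
density_after = value_density(POINTS_AFTER, COMMITS)
print(f"Value density: {percent_change(density_before, density_after):+.1f}%")  # also +59.1%
```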
Qualitative evidence: “After using Copilot, need for research in forums greatly reduced. GitHub Copilot became my main source of reference” (Profile 03)
“[The] fact of not needing [to] leave [the] IDE and its understanding of [the] application context helped solve problems in little time” (Anonymous)
For project managers: if you only track activity (commits, LoC), you conclude “no change”. If you track value delivery (story points, features shipped), you see a massive gain. Multi-dimensional measurement is mandatory.
SPACE Framework: Holistic productivity lens
The study uses the SPACE framework (Satisfaction and well-being, Performance, Activity, Communication and collaboration, Efficiency and flow) to capture the complete picture:
S – Satisfaction: 3.78/5.0 mean, 4.0/5.0 median; ≈90% satisfied overall. But satisfaction is task-dependent:
- Unit test coverage: ≈75% positive
- New features: ≈65% positive
- Legacy APIs/Kafka: <25% positive
P – Performance: +59.1% story points. Planned story points also increased by more than 150% (447→1155), suggesting increased team confidence.
A – Activity: Flat. Number of committed lines unchanged (p=0.928).
C – Communication: Shifted from “shared construction” to “shared evaluation”: less co-writing, more review and integration.
E – Efficiency: ≈82% perceive increased speed; ≈75% report avoiding repetitive work and focusing on high-value tasks.
For project managers: traditional activity metrics (A) miss the transformation. SPACE-inspired multi-metric tracking is essential.
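One way to operationalize this is to flag the case where Performance changes significantly while Activity does not. The sketch below encodes the p-values reported in the study; the 0.05 threshold and the `pa_divergence` helper are assumptions for illustration, not the study's own analysis code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DimensionResult:
    name: str
    change_pct: Optional[float]  # reported relative change, if any
    p_value: Optional[float]     # p-value of the before/after comparison, if tested


def is_significant(result: DimensionResult, alpha: float = 0.05) -> bool:
    """True when a p-value exists and falls below alpha (an untested dimension counts as not significant)."""
    return result.p_value is not None and result.p_value < alpha


def pa_divergence(performance: DimensionResult, activity: DimensionResult,
                  alpha: float = 0.05) -> bool:
    """Flag the P-A pattern: Performance changed significantly, Activity did not."""
    return is_significant(performance, alpha) and not is_significant(activity, alpha)


# Values reported in the study.
performance = DimensionResult("Performance (story points)", 59.1, 4.94e-07)
activity = DimensionResult("Activity (commits/LoC)", None, 0.928)

print(pa_divergence(performance, activity))  # True
```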
Task dependency: Where GenAI shines vs struggles
High satisfaction tasks (≈65-75% positive):
- Unit test coverage (API + Frontend)
- New feature development
- Test scenario creation
- “Really very useful for replicating test patterns almost impeccably” (profile_12)
Low satisfaction tasks (<25-50% positive):
- APIs integration (especially legacy)
- ETL migration
- Legacy Kafka integration
- “Only works well for simple methods, for complex methods fails to be reliable” (profile_12)
For project managers: don’t expect uniform gains. GenAI excels at well-defined, pattern-heavy work (tests, boilerplate) but struggles with context-heavy integration. Allocate work accordingly.
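A hypothetical routing rule based on the satisfaction bands above might look like this. The thresholds (0.65 and 0.50) are assumptions derived from the ≈65-75% and <25-50% bands; for legacy integration the study only reports an upper bound (<25%), so 0.25 is used as a conservative stand-in.

```python
HIGH_THRESHOLD = 0.65  # assumption: lower edge of the "high satisfaction" band
LOW_THRESHOLD = 0.50   # assumption: upper edge of the "low satisfaction" band


def genai_suitability(positive_share: float) -> str:
    """Classify a task category by the share of developers rating GenAI positively."""
    if not 0.0 <= positive_share <= 1.0:
        raise ValueError("positive_share must be in [0, 1]")
    if positive_share >= HIGH_THRESHOLD:
        return "high"
    if positive_share < LOW_THRESHOLD:
        return "low"
    return "review case by case"


# Approximate shares from the study; the legacy value is an upper bound ("<25%").
tasks = {
    "Unit test coverage": 0.75,
    "New feature development": 0.65,
    "Legacy APIs/Kafka": 0.25,
}
for task, share in tasks.items():
    print(f"{task}: {genai_suitability(share)}")
```

Tasks falling between the two thresholds are deliberately left to human judgment rather than forced into either bucket.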
Code quality: Mixed results requiring monitoring
SonarQube analysis (High Severity issues):
- Case F: 313 → 261 (reduction ✓)
- Case J: 1624 → 825 (massive reduction ✓)
- Case D: 33 → 62 (increase ✗)
Takeaway: quality improvement is not automatic. Two of three teams improved; one degraded. The outcome is context-dependent and requires active monitoring.
For project managers: increased throughput ≠ guaranteed quality. Implement continuous quality gates; don’t assume GenAI maintains standards by default.
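A minimal sketch of such a gate, assuming you export high-severity issue counts per team, compares each count against a baseline and fails when it grows beyond a tolerance. This is not SonarQube's own Quality Gate configuration (which lives on the SonarQube server); the `quality_gate` helper and its zero-tolerance default are assumptions. Applied to the three reported cases:

```python
from typing import NamedTuple, Tuple


class SeverityCounts(NamedTuple):
    case_id: str
    before: int  # high-severity issues before GenAI adoption
    after: int   # high-severity issues after GenAI adoption


def quality_gate(counts: SeverityCounts, max_increase_pct: float = 0.0) -> Tuple[bool, float]:
    """Return (passed, percent change); fail if issues grew beyond the tolerance."""
    if counts.before == 0:
        raise ValueError("baseline count must be non-zero")
    change = (counts.after - counts.before) / counts.before * 100.0
    return change <= max_increase_pct, change


# High-severity SonarQube issue counts reported in the study.
cases = [
    SeverityCounts("F", 313, 261),
    SeverityCounts("J", 1624, 825),
    SeverityCounts("D", 33, 62),
]
for c in cases:
    passed, change = quality_gate(c)
    print(f"Case {c.case_id}: {change:+.1f}% -> {'pass' if passed else 'FAIL'}")
```

With zero tolerance, cases F (-16.6%) and J (-49.2%) pass and case D (+87.9%) fails, matching the takeaway above.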
Five concrete implementation insights
1. Mandatory standardized training: the study provided a 2-hour workshop on prompt engineering and Copilot usage for all participants. This minimizes learning-curve bias and establishes a common baseline.
2. Collaborative channels for knowledge sharing: a Microsoft Teams channel where developers shared prompts, experiences and challenges. Communication shifted from “co-writing code” to “co-evaluating AI output”.
3. Multi-dimensional metrics dashboard: track P (story points), A (commits), E (perceived efficiency) and S (satisfaction) simultaneously. Any single metric on its own is misleading.
4. Task-appropriate AI allocation: high GenAI suitability for tests, documentation and new features (well-defined work); low GenAI suitability for legacy integration and complex APIs (context-heavy work).
5. Quality monitoring protocols: don’t assume quality will be maintained. Implement SonarQube-style continuous monitoring. Case D shows that quality can degrade despite productivity gains.
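The dashboard in item 3 could be sketched as per-sprint records aggregated into side-by-side SPACE dimensions, with story points per commit as a value-density proxy. The field names, the 1-5 survey scales and the sprint data below are hypothetical; the study does not prescribe this structure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class SprintRecord:
    story_points: int    # P: completed story points
    commits: int         # A: commits merged during the sprint
    efficiency: float    # E: mean perceived-efficiency score (hypothetical 1-5 survey)
    satisfaction: float  # S: mean satisfaction score (1-5 survey)


def dashboard(records: List[SprintRecord]) -> Dict[str, float]:
    """Report several SPACE dimensions side by side instead of a single number."""
    if not records:
        raise ValueError("need at least one sprint")
    total_points = sum(r.story_points for r in records)
    total_commits = sum(r.commits for r in records)
    return {
        "P_story_points": float(total_points),
        "A_commits": float(total_commits),
        "P_per_commit": total_points / total_commits,  # value-density proxy
        "E_mean": mean(r.efficiency for r in records),
        "S_mean": mean(r.satisfaction for r in records),
    }


# Illustrative sprints; these numbers are not from the study.
sprints = [
    SprintRecord(story_points=30, commits=60, efficiency=3.5, satisfaction=3.8),
    SprintRecord(story_points=42, commits=58, efficiency=4.0, satisfaction=4.0),
]
for metric, value in dashboard(sprints).items():
    print(f"{metric}: {value:.2f}")
```

Watching `P_per_commit` next to `A_commits` is what would surface a P-A divergence that a commit count alone hides.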
Long-term risks participants identified
- “Excessive tool dependence” risk
- “Skill atrophy” concern (especially for juniors)
- Validation overhead for complex AI outputs
- Generated unit tests that “required [a] lot of work to adapt”
For project managers: balance short-term gains against long-term capability preservation. A 13-month study is too short to capture the full effects of skill atrophy.
Bottom line for agile teams
GenAI transforms agile team productivity through value density, not volume. The P-A-E divergence (Performance +59%, Activity flat) shows why multi-dimensional frameworks such as SPACE are essential: measuring Activity alone misses the entire transformation. Task dependency is critical: 65-75% satisfaction for tests and features, under 25% for legacy integration. Quality is mixed: two of three teams improved, one degraded. Perceived efficiency gains (reported by ≈82% of developers) are driven by reduced cognitive friction and less context-switching. Communication shifted from construction to evaluation. Standardized training and collaborative channels accelerate adoption. Long-term risks (skill atrophy, dependency) require monitoring beyond 13 months. Future measurement must track value-delivery metrics alongside traditional activity metrics.
Source: “Impacts of Generative AI on Agile Teams’ Productivity: A Multi-Case Longitudinal Study” by Rafael Tomaz et al., Pontifical Catholic University of Rio de Janeiro, published 14 February 2026.
