admin/_docs/06_metrics/retro_2026-05-13.md

# Retrospective — 2026-05-13 (Cycle 1, end of cycle)

**Mode**: cycle-end
**Cycle**: 1
**Window**: 2026-04-16 (Phase A baseline) → 2026-05-13 (Phase B feature cycle complete + Deploy)
**Previous retro**: N/A — first retrospective

## Implementation Summary

| Metric | Phase A (baseline) | Phase B (cycle 1) | Total |
|--------|-------------------:|------------------:|------:|
| Total tasks | 7 | 4 | **11** |
| Total batches | 4 | 2 | **6** |
| Total complexity points | 29 | 11 | **40** |
| Avg tasks per batch | 1.75 | 2.0 | 1.83 |
| Avg complexity per batch | 7.25 | 5.5 | 6.67 |
| Tasks per task spec | — | — | 1 |

Per-task complexity (Phase B): AZ-513 (3) + AZ-196 (2) + AZ-183 (3, reverted) + AZ-197 (3) = 11 points.
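
The per-batch averages in the table can be reproduced from the phase totals. The sketch below is illustrative only: the `PhaseTotals` type and its property names are hypothetical, and only the input numbers come from the table. It shows that the Total column is computed from pooled totals (11 tasks over 6 batches), not by averaging the two phase averages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseTotals:
    """Hypothetical container for one column of the summary table."""
    tasks: int
    batches: int
    points: int

    def __add__(self, other: "PhaseTotals") -> "PhaseTotals":
        # Pool raw totals so that averages are weighted by batch count.
        return PhaseTotals(
            self.tasks + other.tasks,
            self.batches + other.batches,
            self.points + other.points,
        )

    @property
    def tasks_per_batch(self) -> float:
        return self.tasks / self.batches

    @property
    def points_per_batch(self) -> float:
        return self.points / self.batches


phase_a = PhaseTotals(tasks=7, batches=4, points=29)
phase_b = PhaseTotals(tasks=4, batches=2, points=11)
total = phase_a + phase_b

for name, phase in (("Phase A", phase_a), ("Phase B", phase_b), ("Total", total)):
    print(f"{name}: {phase.tasks_per_batch:.2f} tasks/batch, "
          f"{phase.points_per_batch:.2f} points/batch")
```

Averaging the two phase averages instead would give 1.875 tasks per batch, which overweights Phase B's two batches.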

## Quality Metrics

### Code Review Results

| Verdict | Count | % |
|---------|------:|--:|
| PASS | 5 | 83% |
| PASS_WITH_WARNINGS | 1 | 17% |
| FAIL | 0 | 0% |

### Findings by Severity (code review only — security audit findings counted separately below)

| Severity | Count | Source |
|----------|------:|--------|
| Critical | 0 | — |
| High | 0 | — |
| Medium | 1 | batch_05 F1 (race on sequential serial) |
| Low | 3 | batch_05 F2/F3/F4 (uniqueness, key rotation, default empty key) |

### Findings by Category

| Category | Count | Top Files |
|----------|------:|-----------|
| Bug | 1 | `Azaion.Services/UserService.cs` (RegisterDevice) |
| Maintainability | 3 | `Azaion.Services/ResourceUpdateService.cs` (×2), `Azaion.AdminApi/appsettings.json` |
| Spec-Gap | 0 | — |
| Security | 0 *(code review)* / 13 *(security audit)* | — |
| Performance | 0 | — |
| Style | 0 | — |
| Scope | 0 | — |

### Security Audit (out-of-band, post-implementation)

| Severity | Count | Status at end of cycle |
|----------|------:|------------------------|
| Critical | 0 | — |
| High | 3 | F-1 closed (OTA reverted), F-3 closed (UNIQUE INDEX), D-1 closed (Newtonsoft 13.0.4); 1 pre-existing (F-2 path traversal) deferred to AZ-516 |
| Medium | 5 | 0 closed in audit; recorded as AZ-517..AZ-520 |
| Low | 5 | 0 closed; recorded as AZ-521 (bundle) |

> The audit found one **regression** introduced by cycle-1 work: F-1 (`/get-update` exposed plaintext encryption keys, AZ-183). The fix was a full revert of AZ-183. F-3 amplified a pre-existing race (the rows inserted by `RegisterDevice` were not protected by a UNIQUE INDEX); the audit closed it by adding `env/db/06_users_email_unique.sql` and consolidating `RegisterDevice` to delegate row insertion to `RegisterUser`.
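
The shape of the F-3 fix can be illustrated with a small, self-contained sketch. This is not the project's code: it uses Python's built-in `sqlite3` as a stand-in database, and the table layout, the index name, and the device e-mail format are all assumptions made for illustration. It shows the two ideas named above: a unique index turns the losing side of a race into a constraint error, and the device path delegates insertion to a single user-registration function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
# Hypothetical stand-in for the uniqueness constraint added by the audit fix.
conn.execute("CREATE UNIQUE INDEX users_email_unique ON users (email)")


def register_user(email: str) -> bool:
    """Insert a user row; return False if the e-mail is already taken."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The database, not application code, rejects the duplicate.
        return False


def register_device(serial: str) -> bool:
    """Delegate row insertion to register_user (assumed e-mail format)."""
    return register_user(f"{serial}@devices.example")


first = register_device("SN-0001")
second = register_device("SN-0001")  # simulates the losing side of a race
print(first, second)
```

Without the unique index, both calls would succeed and leave two rows for the same device, which is the failure mode a check-then-insert sequence cannot rule out under concurrency.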

### Performance Test

| Verdict | NFT thresholds met | Coverage gaps |
|---------|--------------------|---------------|
| PASS | 2/2 (NFT-PERF-01 login p95=33 ms vs 500 ms; NFT-PERF-04 user-list p95=152 ms vs 1000 ms) | NFT-PERF-02/03 obsolete (OTA reverted); no `/classes` perf coverage yet |
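
A p95 gate such as NFT-PERF-01 compares the 95th-percentile latency against a fixed threshold. The sketch below shows one way such a check might be evaluated; the function names are hypothetical, the samples are synthetic rather than measurements from this cycle, and the project's actual perf script may compute percentiles differently.

```python
import statistics


def p95(samples_ms: list[float]) -> float:
    """95th percentile, linearly interpolated over the observed samples."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def evaluate_threshold(name: str, samples_ms: list[float], threshold_ms: float) -> dict:
    """Return a small verdict record for one NFT-style latency threshold."""
    observed = p95(samples_ms)
    return {
        "test": name,
        "p95_ms": observed,
        "threshold_ms": threshold_ms,
        "passed": observed <= threshold_ms,
    }


# Synthetic latencies (1..100 ms), NOT measurements from this cycle.
synthetic = [float(ms) for ms in range(1, 101)]
result = evaluate_threshold("NFT-PERF-01", synthetic, threshold_ms=500)
print(result["test"], "passed" if result["passed"] else "failed")
```

Returning a record rather than a bare boolean keeps the observed value next to the threshold, which is the form the verdict table above reports.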

### Deploy Audit (this step)

| Drift | Severity | Resolved this cycle | Carried forward |
|-------|---------:|--------------------:|----------------:|
| A — host pulls `:latest`, CI never produces it | Medium | yes | — |
| B — no secret manager | Medium | yes (sops + age) | — |
| C — container runs as root | Medium | yes (`USER app`) | — |
| D — stale `.woodpecker/build-arm.yml` reference | Low | yes (doc + actual files audited) | — |
| E — perf script run-on-demand | Low | spec'd; auto-gating deferred | I |
| F — no vulnerable-dep gate | Low | yes (deps-audit step) | — |
| G — unused `docker.test/Dockerfile` | Low | yes (deleted) | — |
| H — TCP-only healthcheck in test compose | Low | yes (curl /health/live) | — |
| I — no coverage threshold | Low | — | yes |
| J — manual DB migrations | Low | — | yes |
| K — no metrics / tracing implemented | Medium | spec only | yes |
| L — no central log aggregator | Low | — | yes |
| M — no tracing exporter | Low | — | yes |
| N — no zero-downtime deploy | Medium | — | yes |
| O — no remote SSH wrapper | Low | — | yes |

**7 resolved this cycle, 8 carried forward.**

## Efficiency Metrics

| Metric | Value | Notes |
|--------|------:|-------|
| Blocked tasks | 0 | — |
| Tasks requiring fixes after review | 0 | All findings were deferred or descoped; none required cycle re-entry |
| Auto-fix attempts triggered | 0 | Across all 6 batches |
| Stuck agents | 0 | — |
| Reverts after main code shipped | 1 | **AZ-183** — same-day revert after security audit finding F-1 |
| Skipped tests with documented reason | 1 | AZ-195 AC-1 (DB recovery test needs Docker socket access) |
| Test pass rate (E2E suite, end of Step 7) | 44/44 | After Dockerfile + healthcheck changes |

### Blocker Analysis

No blockers occurred, but there were two notable mid-cycle pivots:

| Event | Type | Prevention idea |
|-------|------|------------------|
| User clarified mid-implement (2026-05-13) that the Loader is architecturally retired → AZ-197 was rescoped from cross-workspace to admin-only | Spec ambiguity discovered late | Add an "implicit assumptions" review gate to `new-task` Step 5 (Acceptance Criteria) that explicitly asks: which other workspaces does this touch? Are they still active? |
| Security audit found AZ-183 ships plaintext encryption keys → entire feature reverted same day | Threat model gap not caught at planning | Add a lightweight "what new authenticated endpoints / persistence does this introduce?" prompt to `new-task` Step 5; route any non-zero answer through a 5-minute threat-model check before complexity is finalized |
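
The second prevention idea amounts to a simple gate: if a task spec declares any new authenticated endpoint or any new persistence, route it through a threat-model check. The sketch below is one possible shape for that gate. The field labels follow the wording proposed under Top 3 Improvement Actions; the parsing rules (one field per line, comma-separated items, optional square brackets) and the function names are assumptions, not an existing skill implementation.

```python
import re

FIELDS = (
    "New authenticated endpoints introduced",
    "New persistent data introduced",
)


def declared_items(spec_text: str, field: str) -> list[str]:
    """Return the items declared for a field; raise if the field is absent."""
    # [ \t]* (not \s*) so an empty field does not swallow the next line.
    pattern = rf"^{re.escape(field)}:[ \t]*(.*)$"
    match = re.search(pattern, spec_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"task spec is missing the field: {field!r}")
    raw = match.group(1).strip().strip("[]")
    items = [item.strip() for item in raw.split(",")]
    return [item for item in items if item and item.lower() != "none"]


def needs_threat_model(spec_text: str) -> bool:
    """True if either field declares at least one item."""
    return any(declared_items(spec_text, field) for field in FIELDS)


risky_spec = """\
## Acceptance Criteria
New authenticated endpoints introduced: [/get-update]
New persistent data introduced: []
"""

safe_spec = """\
## Acceptance Criteria
New authenticated endpoints introduced: []
New persistent data introduced: none
"""

print(needs_threat_model(risky_spec), needs_threat_model(safe_spec))
```

Raising on a missing field, instead of treating it as empty, is a deliberate choice in this sketch: a spec that never answered the question should not pass the gate silently.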

## Structural Snapshot

This is the first retro, so no delta is computed. The snapshot is persisted to `_docs/06_metrics/structure_2026-05-13.md` as a placeholder: `module-layout.md` lists 5 conceptual sub-components but only **one** ownership boundary in the registry, so cross-component edge counting is degenerate for this workspace.

| Metric | Value | Source |
|--------|------:|--------|
| Components (registry) | 1 (`Admin API`) | `_docs/02_document/module-layout.md` |
| Conceptual sub-components | 5 | same |
| csproj projects | 5 | `Azaion.AdminApi.sln` (4 prod + 1 e2e) |
| Cycles in module graph | 0 | inspection (single deployable, no cross-component edges in the registry) |
| New Architecture violations this cycle | 0 | no `cumulative_review_batches_*.md` exists; verified by inspection of batch reviews — no Architecture-category findings |
| Resolved Architecture violations | 0 | — |
| Net Architecture delta | 0 | — |
| Public-API contract files (`_docs/02_document/contracts/`) | 0 | folder absent |
| Contract coverage % | n/a | n/a |

> Contract files are not part of this project's documentation set today. If future cycles introduce them (e.g., as part of a UI ↔ admin contract test effort), this section will start carrying real coverage numbers.

## Trend Comparison

| Metric | Previous | Current | Change |
|--------|----------|--------:|--------|
| Pass rate | n/a | 83% (5/6) | n/a |
| Avg findings per batch | n/a | 0.67 | n/a |
| Reverts | n/a | 1 | n/a |
| Carried-forward operational drifts | n/a | 8 | n/a |

## Top 3 Improvement Actions

1. **Add a security threat-model micro-step to `new-task` Step 5 (Acceptance Criteria)**
   - **What**: Two extra lines on every task spec — "New authenticated endpoints introduced: [list]" and "New persistent data introduced: [list]". If either is non-empty, the next sub-step is a 5-minute threat-model check (data flow, secrets exposure, replay surface). Output recorded in the task spec under `## Threat Model Notes`.
   - **Impact**: catches the AZ-183-style "endpoint exposes plaintext key" class of regression at planning time, before the 3-pt budget is committed. Saves at least one cycle of implement → security-audit → revert per occurrence.
   - **Effort**: low (skill text edit + template addition).

2. **Adopt the `_cycleN_` batch-report naming convention starting cycle 2**
   - **What**: Rename forward — every new batch report and code-review file in cycle 2+ uses `batch_NN_cycleM_report.md` and `batch_NN_cycleM_review.md`. Cycle-1 files stay as `batch_NN_report.md` for history. Update the `implement` skill's report-filename template.
   - **Impact**: prevents silent overwrite of cycle-1 batch reports when cycle 2's `batch_07` lands (it would collide with `batch_07_report.md` if that name were already in use). The existing-code flow Step 10 already documents this convention; this action enforces it.
   - **Effort**: low (one edit in `.cursor/skills/implement/`).

3. **File the 8 carried-forward deploy drifts as Jira tickets in cycle 2 backlog**
   - **What**: I, J, K, L, M, N, O are real backlog items (coverage gates, automated migrations, metrics + tracing, central logs, exporter, zero-downtime deploy, remote SSH wrapper). They currently live only as references in `_docs/04_deploy/*.md`. Promote them to AZ-tickets with story points.
   - **Impact**: makes operational debt visible alongside feature work; protects against silent erosion of the deploy plan over multiple cycles.
   - **Effort**: medium (≈ 30 min of ticket creation + sizing).
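
The naming convention from action 2 can be expressed as a small filename helper. This is a sketch under assumptions: the function name is hypothetical, `NN` is taken to be a zero-padded two-digit batch number, and `M` the plain cycle number; the real template lives in the `implement` skill and may differ.

```python
def batch_file_name(batch: int, cycle: int, kind: str = "report") -> str:
    """Build a batch report or review filename for the given cycle."""
    if kind not in ("report", "review"):
        raise ValueError(f"unknown file kind: {kind!r}")
    if batch < 1 or cycle < 1:
        raise ValueError("batch and cycle numbers start at 1")
    if cycle == 1:
        # Cycle-1 files keep their historical names.
        return f"batch_{batch:02d}_{kind}.md"
    return f"batch_{batch:02d}_cycle{cycle}_{kind}.md"


print(batch_file_name(5, cycle=1))
print(batch_file_name(5, cycle=2))
print(batch_file_name(5, cycle=2, kind="review"))
```

With the cycle embedded in the name, the same batch number in two cycles can no longer resolve to the same file, regardless of whether batch numbering restarts or continues in cycle 2.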

## Suggested Rule / Skill Updates

| File | Change | Rationale |
|------|--------|-----------|
| `.cursor/skills/new-task/SKILL.md` | Add Step 5.5 — "Threat-Model Micro-Check" with the two prompts above | AZ-183 revert (cycle 1) |
| `.cursor/skills/implement/SKILL.md` | Update batch-report filename template to `batch_NN_cycleM_report.md` (and review file analogously) | Naming-collision risk on cycle 2 |
| `.cursor/rules/coderule.mdc` | Add bullet: "Do not reuse retired numeric error codes (gaps are intentional)" | Batch 6 deleted codes 40 and 45 from `ExceptionEnum`; a rule is needed so cycle 2 reviewers know not to fill the gap |
| `_docs/04_deploy/`-derived backlog | New AZ-* tickets for drifts I, J, K, L, M, N, O | Top action 3 above |
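
The proposed `coderule.mdc` bullet could also be backed by a mechanical check. The sketch below models the idea in Python for illustration only; the real `ExceptionEnum` is part of the project's own codebase, and the member names and values here are hypothetical. Only the retired values 40 and 45 come from this retro.

```python
from enum import IntEnum

# Retired in cycle 1; the gaps are intentional and must stay unused.
RETIRED_CODES = frozenset({40, 45})


class ExceptionEnum(IntEnum):
    # Hypothetical members for illustration.
    EXAMPLE_BEFORE_GAP = 35
    EXAMPLE_AFTER_GAP = 50


def reused_retired_codes(enum_cls: type[IntEnum]) -> set[int]:
    """Return any retired numeric codes that an enum has reassigned."""
    return {member.value for member in enum_cls} & RETIRED_CODES


class BadEnum(IntEnum):
    # A later change that fills the gap by mistake.
    NEW_ERROR = 40


print(reused_retired_codes(ExceptionEnum), reused_retired_codes(BadEnum))
```

A check of this kind could run as a unit test, so that reusing a retired code fails the build instead of depending on reviewer memory.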

## Notes

- **First retrospective.** No prior baseline; cycle 2 will be the first one with delta numbers.
- **Cycle health**: green. 0 FAIL verdicts, 0 stuck agents, 0 auto-fix attempts, 44/44 E2E tests pass after Step 7's code edits. The single revert (AZ-183) was caught by the next-step security audit and resolved before deploy — the system worked, but the goal of the threat-model micro-check is to catch it one step earlier.
- **Operator burden after this cycle**: the 8 carried-forward drifts represent ≈ 22 story points of follow-up infrastructure work (rough sizing — to be confirmed when filed as tickets per Top Action 3).