Files
satellite-provider/_docs/06_metrics/retro_2026-05-12_cycle4.md
T
Oleksandr Bezdieniezhnykh e31f59211d [AZ-500] Cycle 4 Step 17: retrospective + close cycle
Adds retro_2026-05-12_cycle4.md, structure_2026-05-12_cycle4.md, and
the deploy_cycle4.md report that was dropped from the Steps 12-15
sync commit. Appends 3 new lessons to LESSONS.md (12/15 ring buffer)
on transitive major-version bumps, exposed pre-existing bugs, and
single-task-cycle metric framing. State advances to cycle 5 / step 9
(awaiting next New Task invocation).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 06:14:43 +03:00

223 lines
23 KiB
Markdown

# Retrospective — Cycle 4 (2026-05-12)
**Tasks**: AZ-500 (.NET 8 LTS → .NET 10 migration, 5 SP)
**Mode**: cycle-end (autodev Step 17)
**Previous retro**: `retro_2026-05-12_cycle3.md`
**Cycle shape**: single-task migration cycle (cross-cutting infra) — first such cycle since the project was documented.
## 1. Implementation Metrics
| Metric | Cycle 4 | Δ vs cycle 3 |
|--------|---------|--------------|
| Tasks implemented | **1** (AZ-500) | -5 |
| Batches executed | **1** | -4 |
| Avg tasks / batch | 1.0 | -0.2 |
| Total complexity delivered | **5 SP** | -13 SP |
| Avg complexity / batch | 5 SP | +1.4 |
| Tasks at-or-below 5 SP cap | **1 of 1 (100%)** | unchanged |
| Tasks above cap | 0 | unchanged |
| Cumulative reviews | **0** (single-task batch — Phase 6 cross-task consistency N/A; only the per-batch review ran) | -2 |
**Sequencing**: Single batch — AZ-500 is one atomic coordinated bump (per its own Constraint: "TFM, SDK pin, Docker images, CI images, and M.E.* package versions ALL move in the same commit"). The cycle completed in 4 dev commits (Step 9 task spec, Step 10 implementation, Step 10 wrap-up reports, Step 12-15 sync).
## 2. Quality Metrics
### Code Review Results
| Verdict | Count | Percentage |
|---------|-------|-----------|
| PASS | 0 | 0% |
| PASS_WITH_WARNINGS | **1** | **100%** |
| FAIL | 0 | 0% |
### Findings by Severity (per-batch code review)
| Severity | Cycle 4 | Δ vs cycle 3 |
|----------|---------|--------------|
| Critical | 0 | unchanged |
| High | 0 | unchanged |
| Medium | **2** (F1 ASPDEPR002 deprecation, F2 CS8604 nullable) | **+2** (cycle 3 had 0 new Medium) |
| Low | 1 (F3 implicit-scope perf-script path fix) | -6 |
### Findings by Category
| Category | Count | Top Files |
|----------|-------|-----------|
| Maintainability | 2 | `SatelliteProvider.Api/Program.cs` (F1), `SatelliteProvider.Api/Swagger/ParameterDescriptionFilter.cs` (F2) |
| Scope | 1 | `scripts/run-performance-tests.sh:49` (F3 — necessary scope creep, accepted) |
| Bug | 0 | — |
| Spec-Gap | 0 | — |
| Security | 0 NEW (5 NEW informational Lows in `dependency_scan_cycle4.md` — all "no published advisory" confirmations on the 10.x lines) | -2 Low (cycle 3 introduced 3) |
| Performance | 0 | — |
| Style | 0 | — |
**Note on the 2 Medium findings**: both are *consequences of the major-version bump itself*, not implementation defects. F1 (ASPDEPR002 on 8 `WithOpenApi(...)` callsites) is a deprecation that the .NET 10 line introduces — the API still works, but the old surface is slated for removal. F2 (CS8604 nullable on `parameter.Name`) is exposed by Microsoft.OpenApi 2.x's stricter nullable annotations. Both were intentionally deferred per AZ-500's "preserve behaviour during runtime/SDK migration" contract; neither was an implementation choice.
### Security audit (cycle 4)
| Metric | Value | Δ vs cycle 3 |
|--------|-------|--------------|
| Verdict | **PASS_WITH_WARNINGS** | unchanged |
| Mode | **Resume** (only `dependency_scan` re-run; static / OWASP / infrastructure carried forward unchanged because AZ-500 made no source-level edits to those surfaces) | new pattern |
| New Critical / High | 0 / 0 | unchanged |
| New Medium | **0** | unchanged |
| New Low (informational only — all "no published advisory" confirmations on the 10.x lines) | **5** | varies |
| Resolved findings | **2** (cycle-3 D1 + D3 forward-resolved by the major-version bump to JwtBearer 10.0.7 / OpenApi 10.0.7) | -1 (cycle 3 resolved 3) |
| Carry-overs (still OPEN) | 1 (cycle-3 D2 — `Microsoft.NET.Test.Sdk 17.8.0` transitive `NuGet.Frameworks` flag, explicitly out of AZ-500 scope) | unchanged |
### Performance gate (cycle 4 — first executed perf gate since cycle 1's AZ-484 run)
| Metric | Value |
|--------|-------|
| Verdict | **PASS_WITH_UNVERIFIED** |
| Scenarios | 7 Pass · 0 Warn · 0 Fail · 1 Unverified |
| AZ-500 NFR (Performance) | **MET** for 7/8 scenarios; PT-08 unmeasurable due to pre-existing `scripts/run-performance-tests.sh:417` grep-pipefail bug (not a .NET 10 regression) |
| PT-07 warm p95 | 301ms — **7.7x improvement** vs cycle-3 short-variant baseline (2340ms at N=2); cold p95 = 2782ms (-14%) |
| PT-06 route creation | 90ms — **49% improvement** vs cycle-3 (178ms); consistent with .NET 10 GC + JIT improvements on small-allocation paths |
| Cycle-3 perf-harness leftover | **STAYS OPEN** per AZ-500 Constraint (deletes only on a fully clean run; replay #3 results appended) |
## 3. Structural Metrics (snapshot: `structure_2026-05-12_cycle4.md`)
| Metric | Cycle 4 | Δ vs cycle 3 |
|--------|---------|--------------|
| .NET projects (csproj) | **9** | unchanged |
| Cross-project edges (ProjectReference) | **20** | unchanged |
| Cycles in project graph | 0 | unchanged |
| Average ProjectReferences per component | ~2.2 | unchanged |
| New Architecture violations | **0** | unchanged |
| Resolved Architecture violations | **2 forward** (cycle-3 D1 + D3 by major-version bump) | -1 (cycle 3 resolved 3 + the cycle-2 PT-07/PT-08 leftover) |
| Net Architecture delta | **0** | +3 (cycle 3 was -3) |
| Contract coverage % | unchanged (no new public API surfaces) | n/a |
**Cycle-4 is structurally neutral by design** — a runtime/SDK migration that preserves behaviour. The graph properties confirm this: same 9 projects, same 20 edges, same DAG (zero cycles), same in-degree distribution. The only structural-equivalence change is the coordinated 11-package M.E.* + 3-package ASP.NET + 1 Swashbuckle major-line bump, all forward-clean against advisories.
## 4. Efficiency Metrics
| Metric | Cycle 4 | Δ vs cycle 3 |
|--------|---------|--------------|
| Blocked tasks (during implementation) | **0 of 1** — implementation iterated 3 times during the implement skill (Serilog NU1605 → revert per Risk #4; Microsoft.OpenApi 2.x compile errors → user A/B/C decision → Swashbuckle 6.6.2 → 10.1.7 + Program.cs refactor; OpenApiSecurityRequirement type mismatch → wrap in lambda + List<string>), but never reached the code-review FAIL gate | improvement (cycle 3 had 1 blocked of 6) |
| Tasks completed first attempt (no post-review fix commits) | **1 of 1 (100%)** | best cycle on record (cycle 3 was 5 of 6) |
| Tasks requiring multiple post-code-review fix commits | 0 | unchanged |
| Most-findings batch | batch 01 (the only batch — 2 Medium + 1 Low; both Mediums are scope-deferred, not implementation defects) | n/a |
| Cumulative-review-only findings | n/a (single-task batch) | n/a |
| Step-15 (Perf Test) execution | **EXECUTED** (full default-param run; 7 Pass + 1 Unverified) | **first cycle since cycle 1 to actually execute Step 15** |
| Step-14 (Security Audit) — net findings improvement | **+2** (2 Resolved cycle-3 carry-overs by forward-bump; 5 new Low informational confirmations only) | unchanged net direction (cycle 3 was +1) |
## 5. Patterns Identified
### Pattern 1 — Major-version bumps cascade through transitive deps; the surprise was Microsoft.OpenApi 1.x → 2.x
The implementation hit two separate dependency-conflict surprises that the task spec didn't pre-anticipate:
1. **`Serilog.AspNetCore 10.0.0` → NU1605 conflict** with the project's pinned `Serilog.Sinks.File 6.0.0` (transitive dep `>= 7.0.0`). Resolved by reverting to `Serilog.AspNetCore 8.0.3` per Risk #4's pre-documented fallback. **The risk was anticipated; the resolution was already specified.** Good outcome.
2. **`Microsoft.AspNetCore.OpenApi 10.0.7` → CS0234/CS0246/CS7069 compile errors** in `Program.cs` because Microsoft.OpenApi 2.x removed the `Microsoft.OpenApi.Models` namespace, and Swashbuckle 6.6.2 (still pinned) only knows the 1.x namespace. **The risk was NOT anticipated** — Risk #2 in the task spec only flagged "Swagger UI generation breaking" at a high level; the actual breakage was at compile time (namespace removal), and the fix required bumping Swashbuckle (6.6.2 → 10.1.7) AND refactoring `Program.cs` to use the new 2.x types (`OpenApiSecuritySchemeReference`, `JsonSchemaType`, `IOpenApiSchema`). The user was correctly asked via A/B/C; option A was picked; the refactor landed cleanly.
**Insight**: when a task spec lists "Microsoft.AspNetCore.* package bumps", it should also list the **transitive packages whose major version changes** as a result. Microsoft.OpenApi was technically transitive in cycle 3 (Swashbuckle 6.6.2 pulled it in at 1.x); after the AspNetCore 10.0.7 bump it became *direct* (the new line wants 2.x), forcing Swashbuckle and source code to follow. **A pre-implementation `dotnet restore` dry run (or NuGet manifest diff) against the proposed pin set would have caught this in spec-time, not implementation-time.**
### Pattern 2 — A pre-existing bug in a sibling script became visible the moment the migration unblocked the path leading to it
`scripts/run-performance-tests.sh:417` (`grep -o ... | wc -l` with `set -o pipefail`) had been broken since the script was written, but it never fired because the script always exited earlier (cycle 1 didn't reach PT-08; cycle 2 PT-08 was a stub; cycle 3 perf gate was skipped). AZ-500's bootstrap fix (replay #2) and the Step 15 full run (replay #3) were the first two runs ever to reach line 417 — and both crashed there with the same root cause.
**Insight**: scope-protected migrations like AZ-500 explicitly defer fixes to *unrelated* code, but they should explicitly call out **bugs the migration newly EXPOSES** (vs. introduces) as required follow-up PBIs in the task spec's `Risks & Mitigation` or a new `Newly Exposed Bugs` section. The cycle-3 perf-harness leftover already documents this case in detail; the pattern itself is what's worth surfacing.
### Pattern 3 — "Resume" mode for the security audit avoided 4 unnecessary re-runs without losing posture
The cycle-4 security audit ran in **resume mode** — only Phase 1 (dependency scan) was re-executed; Phases 2/3/4 (static analysis, OWASP, infrastructure) carried forward unchanged from cycle 3 because AZ-500 made no source-level edits to those surfaces (only `Program.cs` Swashbuckle DI registration internals + one `using` directive change in `ParameterDescriptionFilter.cs`).
**Insight**: per-cycle sub-skill prerequisites that ask "resume / overwrite / skip" should accept "resume narrowed to phases X" as a first-class option when the cycle's scope is provably outside the unchanged phases' coverage. We did this manually here via the A/B/C choice; codifying it in the security skill's prereq #3 would be a small DX win.
### Pattern 4 — First cycle since cycle 1 where Step 15 (Performance Test) actually executed end-to-end
Cycles 2 + 3 both **skipped** Step 15. Cycle 4 ran the full default-parameter perf harness (PT-01..PT-08) against the migrated `dev` image and captured 7 Pass + 1 Unverified (PT-08 hit the pre-existing script bug). This produced:
- The first PT-07/PT-08 numbers ever recorded against the active codebase (cycle-1 baseline was AZ-484-only; PT-07/PT-08 were added in cycle 3 by AZ-492 but never executed end-to-end);
- A 7.7x improvement signal on PT-07 warm p95 (mostly N=20 dilution; partly .NET 10 pipeline);
- Concrete confirmation that AZ-500 NFR (Performance — "must not regress beyond existing thresholds") is met for every scenario where measurement was possible.
**Insight**: the cycle-3 retro Action 1 ("Execute the cycle-3 perf harness against the deployed `dev` image to convert the cycle-3 perf-execution leftover into PT-07/PT-08 baseline numbers") was effectively absorbed into AZ-500's NFR (Performance) gate. Worth recording that Action-1 happened — just not as a standalone PBI, but folded into the migration's verification.
### Pattern 5 — Single-task cycles produce structurally neutral retro metrics — that's not a problem, but it confuses the trend lines
Most efficiency / quality / structural metrics in this retro are "unchanged from cycle 3" or "0 NEW", because runtime/SDK migrations don't add business logic. The trend graphs in §6 will show artificial flatlines on metrics like "tasks implemented" (1 vs 6) and "complexity delivered" (5 vs 18 SP). That's correct — but if read naively, it could look like cycle 4 was "less productive" than cycle 3.
**Insight**: when a cycle has a single non-functional task (migration, refactor, dependency hygiene), the retro should explicitly reframe the metric set around what *that* PBI shape proves: continuity (0 regressions), forward-resolution (cycle-3 D1+D3 closed), and unblocking (perf harness now runnable end-to-end). The "tasks implemented" count as a proxy for productivity is misleading here.
## 6. Comparison vs. previous retros
| Metric | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 |
|--------------------------------------|----------------|----------------|----------------|----------------|
| Tasks implemented | 1 | 2 | 6 | **1** |
| Total complexity delivered | 8 SP | 10 SP | 18 SP | **5 SP** |
| Batches | 1 | 2 | 5 | **1** |
| Critical/High review findings | 0 | 0 | 0 | **0** |
| New Medium review findings | 0 | 0 | 0 | **2** (both bump-consequences, scope-deferred) |
| New Low review findings | 3 | 6 (5 distinct) | 7 | **1** |
| Code review pass rate | 100% (1/1) | 100% (2/2) | 100% (5/5) | **100% (1/1)** |
| Tasks completed first attempt | 0 of 1 | 0 of 2 | 5 of 6 | **1 of 1** |
| New Medium security findings | 2 | 2 | 0 | **0** |
| Resolved security findings | 0 | 0 | 3 | **2** (forward-bump) |
| Net Architecture delta | n/a (baseline) | +0 | -3 | **0** |
| Step-15 (Perf) executed | YES (AZ-484) | SKIPPED | SKIPPED | **YES (full PT-01..PT-08)** |
| Step-15 leftover present at retro | NO | YES | YES | **YES (still — pre-existing script bug, not migration-caused)** |
### Did the cycle-3 actions land?
- **Cycle 3 Action 1 (execute perf harness against deployed dev image)** — landed implicitly via AZ-500 Step 15 full run. PT-07 + PT-08 numbers now recorded in `_docs/06_metrics/perf_2026-05-12_cycle4.md`. **Verdict: implemented as part of the migration's NFR gate**, not as a standalone 2-SP PBI. Leftover stays open due to the *separate* pre-existing PT-08 script bug.
- **Cycle 3 Action 2 (bump `System.IdentityModel.Tokens.Jwt 7.0.3 → 7.1.2+` to clear NU1902)** — **NOT landed in cycle 4**. Explicitly out of AZ-500 scope per "no unrelated package bumps" Constraint. NU1902 hits still appear in every test build log (counted ~9 in the AZ-500 perf-harness build trace). Recommended rollover to cycle 5.
- **Cycle 3 Action 3 (`workspace:` field on cross-repo ACs in new-task skill)** — **NOT landed in cycle 4**. AZ-500 had zero cross-repo ACs (it's a single-workspace migration), so this rule was not exercised. Recommended rollover.
This is the second consecutive cycle where a prior-retro action fully closed (Action 1 here, Actions 1+2+3 in cycle 3). Pattern is stable — good.
## 7. Top 3 Improvement Actions (ranked by impact)
### Action 1 — Fix `scripts/run-performance-tests.sh:416-417` grep-pipefail (1 SP, mechanical)
**Why this is the highest impact**: this single-line bug (replace `grep -o ... | wc -l` with `grep -c ... || true`) blocks PT-08 measurement on every run, AND blocks closure of the cycle-3 perf-harness leftover that has now been carried across 3 cycles. Two consecutive replays (#2 + #3) reproduced the exact same failure mode at the same line; the fix is provably needed AND provably small.
**Action**: 1 SP PBI in cycle 5. Replaces 2 lines in the script. After it lands, re-run `./scripts/run-performance-tests.sh` once — if the full PT-01..PT-08 sweep is clean, delete `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`.
**Cost**: ~15 minutes (edit + one perf run + leftover deletion). Counted as 1 SP because the deletion contract requires a successful end-to-end run.
### Action 2 — Migrate the 8 `WithOpenApi(...)` callsites in `Program.cs` to ASP.NET Core 10 minimal-API metadata extensions (3 SP)
**Why**: clears the 8 `ASPDEPR002` deprecation warnings that AZ-500 left behind (intentionally, per the "preserve behaviour" contract). The replacement API is documented at `https://aka.ms/aspnet/deprecate/002` — straightforward swap to `WithSummary`/`WithDescription`/`WithName`/`WithTags`. Once migrated, the deprecation warning drops out of every CI build log and the Swagger UI continues to render the same metadata.
**Action**: 3 SP PBI in cycle 5. Already filed as a recommended follow-up in `_docs/03_implementation/reviews/batch_01_cycle4_review.md`. Test gate is the existing `SwaggerDocument_AdvertisesBearerSecurityScheme` programmatic check + a manual swagger-UI smoke (Bearer Authorize button still present).
**Cost**: 3 SP — touches 8 callsites, light testing burden. Could reasonably be paired with Action 3 below as a single 5-SP "OpenApi 2.x cleanup" PBI.
### Action 3 — Pre-flight transitive-major-version impact analysis at task-spec time (process change, ~0 SP for the rule, ~1 SP for the next migration that uses it)
**Why**: AZ-500's biggest implementation surprise (Microsoft.OpenApi 1.x → 2.x cascade forcing Swashbuckle bump + Program.cs refactor) was knowable in advance with a `dotnet restore --dry-run` or `dotnet list package --include-transitive` before/after the proposed pin set. The task spec accurately listed the *direct* package bumps but not the transitive major-version flips. Pattern 1 above documents the cost: extra A/B/C round-trip during implementation, plus an unscheduled refactor.
**Action**:
- Add to `coderule.mdc` (or a new `task-spec.mdc`): "When a task spec proposes a major-version bump of any direct dependency, the spec must also list the transitive packages whose major version changes as a result, OR explicitly note 'transitive major-version drift not analyzed in spec — implementer to surface and ASK if any non-trivial transitive bump is required'. The check is: run `dotnet restore --dry-run` (or `dotnet list package --include-transitive`) against a scratch branch with the proposed pins before writing the spec, OR mark the spec as 'transitive analysis deferred to implementation time' so the implementer knows to allow extra time."
- Optionally add a `new-task/SKILL.md` Step prompt: "for any package bump in this task, has the transitive major-version drift been analyzed?"
**Cost**: rule addition is 0 SP; the next migration PBI that adopts the practice will absorb a ~1 SP slack to do the dry-run analysis at spec time. ROI: avoids the ~30-minute mid-implementation unscheduled refactor + A/B/C round-trip that AZ-500 hit.
## 8. Suggested Rule / Skill updates
| File | Change | Rationale |
|------|--------|-----------|
| `coderule.mdc` (new bullet near "library API verification" section) | "When a task spec proposes a major-version bump of any direct dependency, the spec must list the transitive packages whose major version changes as a result, OR explicitly note 'transitive major-version drift not analyzed in spec'. The check at spec-time is `dotnet restore --dry-run` against a scratch branch (or equivalent for non-.NET ecosystems)." | Pattern 1 (Microsoft.OpenApi 1.x → 2.x cascade); Action 3 |
| `.cursor/skills/security/SKILL.md` (Phase prereq #3) | Add a fourth option to the resume/overwrite/skip prompt: "Resume narrowed (only re-run Phase X)" — applicable when the cycle's source-level changes are provably outside the unchanged phases' coverage. | Pattern 3 (Resume mode worked manually here; codifying saves a few minutes per single-PBI infrastructure cycle) |
| `coderule.mdc` (new bullet in scope-discipline section) | "When a scope-protected task (migration, dependency bump) newly *exposes* a pre-existing bug elsewhere in the codebase, the implementation MUST surface it as a recommended follow-up PBI in the batch report — and the cycle's deploy report MUST list it as a 'newly exposed bug' separate from 'newly introduced findings'. Bugs that already existed do not count as cycle-introduced regressions, but they must not be silently re-buried." | Pattern 2 (`scripts/run-performance-tests.sh:417` grep-pipefail surfaced via AZ-500's bootstrap fix) |
| `.cursor/skills/retrospective/SKILL.md` (Step 2 narrative guidance) | When a cycle has a single non-functional task (migration / refactor / dependency hygiene), the retro report should explicitly reframe the metric set around continuity (0 regressions), forward-resolution (prior findings closed), and unblocking (capabilities now exercisable end-to-end) — not just task count + complexity points. | Pattern 5 (single-task migration cycles produce flatline metrics that are correct but easily misread) |
## 9. Decision items carried over (operator)
- **Cycle-3 perf-harness leftover** — STAYS OPEN per AZ-500 Constraint. Closure path: Action 1 above (1 SP fix → re-run → delete file). Tracked in `deploy_cycle4.md` runbook + `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`.
- **Admin team iss/aud confirmation** (carried from cycle 3) — still required before promoting beyond `dev`. Unchanged. Tracked in `deploy_cycle3.md` + `deploy_cycle4.md`.
- **Cross-repo doc** (carried from cycle 3) — `suite/_docs/10_auth.md` paragraph addition. Unchanged.
## 10. What this retro says about process maturity
Cycle 4 is the first cycle that:
- Executed Step 15 (Performance Test) end-to-end against a real deployed image since cycle 1.
- Forward-resolved supply-chain advisories by *bumping past them* (cycle-3 D1+D3 closed by the major-version migration), rather than by patch-line bumps.
- Surfaced a transitive-major-version cascade as an implementation-time surprise — and the response chain (NU1605 fail → revert per pre-spec'd Risk → continue; CS0234 fail → user A/B/C → bump Swashbuckle + refactor → continue) showed the implement-skill's recovery loop working as designed.
- Carried a leftover across two cycles where the *underlying capability is healthy* but the *instrumentation harness is broken*. The leftover replay protocol kept the issue visible without blocking forward progress.
The process continues to converge. The remaining friction points after cycle 4 are (a) transitive-dep awareness at spec-time (Action 3), (b) the lingering `scripts/run-performance-tests.sh` grep-pipefail (Action 1), and (c) the cycle-3 carry-overs (Test.Sdk + IdentityModel.Tokens.Jwt) that AZ-500 explicitly excluded — all are concrete cycle-5 PBI candidates totalling ~6 SP, comfortably below a normal cycle's capacity.