Drains all three deferred perf-harness items in one batch: - PT-01..PT-06 now carry Authorization: Bearer minted via the canonical SatelliteProvider.TestSupport.JwtTokenFactory (AZ-491) — no third copy of JWT logic in the shell. - PT-07 implemented as cold + warm dual-pass distribution (N=20 each), reports p50/p95 for both passes and fails if warm p95 >= cold p95. - PT-08 implemented as 20-batch upload distribution with batch p95 gated at the AZ-488 2000 ms target; per-item gate cost reported as derived proxy (batch_p95 / batch_size). New SatelliteProvider.IntegrationTests/PerfBootstrap.cs adds two CLI short-circuit subcommands (--mint-only and --gen-uav-fixture <path>) invoked by the shell so the perf script never inlines the JWT or JPEG-fixture logic. The dispatch sits at the top of Program.cs Main and runs before any HTTP / DB / readiness setup. performance-tests.md PT-07 + PT-08 flip from Deferred to Implemented. traceability-matrix.md PT-07 + PT-08 rows move from recorded to covered (PT-08 partial due to per-item proxy — flagged Low in batch-4 review). _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; the leftovers directory is now empty. Closes cycle-2 retro Action 2; LESSONS.md [process] rule about Deferred NFRs remains in force as a guardrail. Also includes the previously-uncommitted cumulative review report for cycle-3 batches 01-03 (generated at the end of batch 3 but not staged). Co-authored-by: Cursor <cursoragent@cursor.com>
19 KiB
Retrospective — Cycle 2 (AZ-487 + AZ-488, 2026-05-11)
Cycle: 2 (Phase B feature cycle, two tasks)
Tasks: AZ-487 (JWT validation baseline, 2 SP) + AZ-488 (UAV tile upload + 5-rule quality gate, 8 SP — user-accepted over-cap)
Mode: cycle-end (autodev Step 17)
Previous retro: retro_2026-05-11.md (cycle 1, AZ-484)
1. Implementation Metrics
| Metric | Value | Δ vs cycle 1 |
|---|---|---|
| Tasks implemented | 2 (AZ-487, AZ-488) | +1 |
| Batches | 2 (batch 01 + batch 02 of cycle 2) | +1 |
| Avg tasks / batch | 1.0 | unchanged |
| Complexity points delivered | 10 SP (2 + 8) | +5 SP |
| Avg complexity / batch | 5.0 SP | unchanged |
| Tasks at-or-below 5 SP cap | 1 of 2 (AZ-487) | AZ-488 above cap |
| Source files modified (production) | 6 | +2 |
| Source files added (production) | 11 (JWT extension, permission requirement, UAV gate + handler + DTOs + config + request envelope) | +8 |
| Test files added | 11 (4 JWT-side unit, 1 permission unit, 3 UAV-gate/handler/path unit, 1 image factory, 1 UAV integration suite, 1 JWT integration suite) | +10 |
| Doc files added | 4 (uav-tile-upload.md contract, ripple_log_cycle2.md, batch_01_cycle2_report.md, batch_02_cycle2_report.md, plus 2 review files, deploy report) |
+6 |
| Migrations added | 0 (AZ-488 reuses AZ-484 schema) | -1 |
| Frozen contracts produced | 1 (uav-tile-upload v1.0.0) |
unchanged trend (cycle 1 also produced 1) |
| Total source diff in cycle | +2,488 / -80 lines across 28 files | — |
| Total doc diff in cycle | +1,811 / -64 lines across 35 files | — |
2. Quality Metrics
| Metric | Value | Δ vs cycle 1 |
|---|---|---|
| Code review pass rate | 2/2 = 100% (both PASS_WITH_WARNINGS, neither FAIL) | unchanged |
| Code review findings — Critical | 0 | unchanged |
| Code review findings — High | 0 | unchanged |
| Code review findings — Medium | 0 | unchanged |
| Code review findings — Low | 6 (2 in batch 01, 4 in batch 02 — but batch-02 F1 is the same item as batch-01 F1 carried over, so 5 distinct items) | +5 |
| Distinct findings (deduped) | 5 (F1 doc-path missing, F2 dev JWT secret committed, batch-02 F2 mutable JpegMagicBytes, F3 UAV path divergence, F4 byte[].ToArray() per write) | +5 |
| Code review FAIL count | 0 | unchanged |
| Auto-fix attempts during code review | 1 (batch 02 in-flight build fix: removed unused using Microsoft.AspNetCore.Http;) |
+1 |
| Post-code-review test-failure iterations | 5 (CS0104 ambiguity, IDX12401 unit-side, IDX12401 integration-side, uniform-grey test, _coordinateCounter collision) |
+3 vs cycle 1 (which had 2) |
| Security audit verdict | PASS_WITH_WARNINGS | unchanged |
| Security findings introduced by cycle 2 | 2 new Medium (F-AUTH-2 iss/aud unvalidated, F-UAV-1+F-DEPS-UAV ImageSharp decode surface) + 4 new Low + 1 Informational | +2 Medium |
3. Structural Metrics (snapshot: structure_2026-05-11_cycle2.md)
| Metric | Value | Δ vs cycle 1 |
|---|---|---|
| Components | 6 | 0 |
| ProjectReference edges | 12 | 0 (no new cross-component edges) |
| Cycles in import graph | 0 | 0 (still clean DAG) |
| Avg ProjectReferences / component | 2.0 | 0 |
| Architecture violations newly introduced | 0 | — |
| Architecture violations resolved | 0 | — |
| Net architecture delta | 0 | second clean cycle in a row |
| Frozen contracts | 2 | +1 (uav-tile-upload v1.0.0) |
| Contract coverage % | ~29% | +15 pp |
| External NuGet edges added | +2 (JwtBearer 8.0.21, ImageSharp 3.1.11) | — |
4. Efficiency Metrics
| Metric | Value | Δ vs cycle 1 |
|---|---|---|
| Blocked task count | 0 | unchanged |
| Tasks completed first attempt (post-review) | 0 of 2 — both required test-gate fix iterations | unchanged (0 of 1 in cycle 1) |
| Tasks requiring multiple post-code-review fix commits | 2 of 2 — AZ-487 had 3 separate fix: commits; AZ-488 had 1 wallclock fix + 1 in-flight build fix |
+1 (cycle 1 had 1 of 1) |
| First-attempt-runtime-pass rate | 0 / 2 = 0% | unchanged (cycle 1 had 0/1) |
| Most-findings batch | batch 02 (4 findings, all Low) | — |
| Spec accuracy ("doc paths exist") | 0 of 2 task specs accurate — both referenced _docs/02_document/components/01_web_api/description.md which doesn't exist |
new pattern this cycle (was AZ-484 specific in cycle 1) |
| Performance gate scenarios run | 0 of 7 active (PT-01..PT-06) + 0 of 2 deferred (PT-07, PT-08) — Step 15 skipped by user, run-performance-tests.sh doesn't attach JWT |
0 → 0 (still no perf runs landing) |
5. Patterns Identified
Pattern 1 — Duplicate JWT mint helpers diverged then both needed identical fixes
SatelliteProvider.Tests/TestUtilities/JwtTokenFactory.cs and SatelliteProvider.IntegrationTests/JwtTestHelpers.cs were authored as two near-identical copies of the same logic during AZ-487. They were not the same file because the unit and integration test projects don't share a reference. Both contained the same Expires < NotBefore bug for negative-lifetime tokens; both needed the same fix in separate commits (f64d0d7 for unit-side, 11b7074 for integration-side).
This is a classic copy-paste-diverge pattern. The unit-side helper is more mature (has happy + tampered + expired variants); the integration-side helper started as a subset and inherited the same bug. The fix-twice cost was small this time (one extra commit), but if the divergence had been a security-relevant difference the consequences would be larger.
Pattern 2 — Pre-existing test-side bugs unmasked by downstream test pressure
The AZ-487 batch report shipped with status "test gate not yet run" because the implement skill committed before Step 11. Three separate AZ-487 test bugs (CS0104, IDX12401 unit, IDX12401 integration) did not surface until AZ-488 layered new tests on top and ran the full suite. The bugs were introduced by AZ-487 but only observable during AZ-488's test gate.
The fix flow worked correctly (separate fix: commits, no scope creep), but the metric "AZ-487 batch report: 8/8 ACs covered, PASS_WITH_WARNINGS" is misleading — at that point the new tests had not actually been executed end-to-end. A more honest framing: "AZ-487 declared done while its own tests had compile errors masked by build resolution order."
Pattern 3 — Cycle 1 retro Action 2 not executed; perf gate now has 3 deferred items
Cycle 1 Action 2 ("couple NFR additions in test-spec with runner-script updates in the same step") was approved and the lesson appended to LESSONS.md. Cycle 2 added PT-08 to performance-tests.md under the Deferred branch sanctioned by that lesson — which is correct by the letter of the rule. But the underlying harness work (scripts/run-performance-tests.sh PT-07 implementation) still has not landed, and cycle 2 added a third perf-side item (the runner script doesn't attach JWT tokens against the new auth-required endpoints).
The improvement action helped us stop adding undocumented Unverified scenarios. It did NOT help us reduce the backlog. The PT-07 leftover is now 1 cycle old; PT-08 was born deferred; the JWT-attach rot was discovered during the cycle 2 Step 15 gate. The perf gate is currently 0 actual scenarios verified for 2 cycles running.
Pattern 4 — Task-spec doc-path drift repeated, no decision made
Both AZ-487 and AZ-488 task specs referenced _docs/02_document/components/01_web_api/description.md — a folder that has never existed. Both code reviews flagged it as F1 (Low / Style). Both batches worked around it by updating modules/api_program.md instead. The decision item (create the stub folder vs. formalize the WebApi-lives-in-modules convention) is a 1-turn decision; until it's made, every cycle that touches WebApi will repeat the same finding and the same workaround.
Pattern 5 — Integration test state leakage from persistent Postgres volume
UavUploadTests failed mid-test-run with duplicate key value violates unique constraint "idx_tiles_unique_location_source" because _coordinateCounter reset to 0 on every test-runner process start, while the Postgres named volume in docker-compose.yml persists tile rows across docker-compose down cycles. The wallclock-seeded counter (dc3dabe) makes per-run coordinates unique enough to avoid collision, but it's a workaround — the test isn't isolated, it just probabilistically doesn't collide.
The real fixture-isolation options are: (a) docker-compose down -v in scripts/run-tests.sh, (b) explicit DELETE-from-tiles at integration test startup, or (c) a per-test transaction with rollback. Option (b) is the smallest fix; option (a) is the easiest. Option (c) is the proper TDD answer but probably overkill for this codebase.
Pattern 6 — 8 SP task came with measurable friction even with explicit user override
AZ-488 was 8 SP (over the 5 SP user-rule cap), and the cycle landed cleanly — but at observable cost: 1 in-flight build fix (auto-fix during implement), 1 wallclock-counter fix during test-run, 1 PT-08 NFR carried as a Deferred entry, 1 in-flight DTO-layering refactor (Common vs Api placement). None of these were failures. All of them were small, in-isolation harmless edits. But this is exactly the friction profile that the 5 SP guidance is designed to flag, and the cumulative cost should be visible in retros so future "should I split or accept the user override?" decisions are calibrated.
This is NOT a recommendation to never accept user-overrides on SP cap — the cycle 2 user-override was correct given AZ-488's natural unit-of-work shape. It IS a recommendation to track the over-cap friction so we can see if the pattern persists across cycles.
6. Comparison vs. previous retro
| Metric | Cycle 1 | Cycle 2 | Direction |
|---|---|---|---|
| First-attempt-runtime-pass rate | 0/1 | 0/2 | unchanged at 0% |
| Code review pass rate | 100% | 100% | unchanged |
| Test-failure iterations post-review | 2 | 5 | worsened — but cycle 2 had a 2-batch surface area (5/2 ≈ 2.5 per batch, slightly higher than cycle 1's 2/1) |
| Net architecture delta | 0 | 0 | unchanged (good) |
| Contracts coverage % | 14% | 29% | improved +15 pp |
| Security findings introduced | 0 | 2 Medium | worsened — but both are bounded by existing mitigations |
| Performance scenarios run | 0/7 | 0/8 | unchanged at 0% (Action 2 didn't move the needle) |
Effectiveness check on cycle 1's top-3 actions:
- Cycle 1 Action 1 (persisted-enum DB-roundtrip rule) — landed in LESSONS.md and
coderule.mdc. AZ-484-style enum bugs did not recur in cycle 2 (cycle 2 didn't introduce any new persisted enums, so the rule wasn't exercised but also wasn't violated). Verdict: cannot evaluate — no triggering work in cycle 2. - Cycle 1 Action 2 (NFR + runner same step) — landed in LESSONS.md. Cycle 2 PT-08 was added as Deferred (legal per the rule). But the underlying harness work didn't progress; one new perf-rot item appeared (JWT-attach). Verdict: rule prevented one new failure mode, did not reduce the existing backlog.
- Cycle 1 Action 3 (periodic doc-base-rate refresh) — recorded as backlog item, never scheduled. The pre-existing
tests_unit.mdstaleness was incidentally refreshed during cycle 2's Step 13 doc sync (we walked the file as part of touching it). Verdict: opportunistic fix happened; the recurring task was not scheduled.
7. Top 3 Improvement Actions (ranked by impact)
Action 1 — Consolidate JWT token-mint helpers into one shared utility
Why: Pattern 1. Same bug existed in two near-identical files (SatelliteProvider.Tests/TestUtilities/JwtTokenFactory.cs and SatelliteProvider.IntegrationTests/JwtTestHelpers.cs); both needed independent fixes. The two files will continue to drift, and the next time they drift on something security-relevant (claim handling, signing algorithm, audience binding) the cost will be larger than one extra commit.
Concrete action:
- Move the canonical
JwtTokenFactoryto a shared location that both unit and integration test projects can reference. Two viable shapes:- Option A (preferred): create a new tiny test-support project
SatelliteProvider.TestSupport(class library, no test framework) holdingJwtTokenFactory,UavTileImageFactory, and similar. Reference it from bothSatelliteProvider.TestsandSatelliteProvider.IntegrationTests. Cost: ~1 PBI, ~3 SP. - Option B (cheaper): have
SatelliteProvider.IntegrationTestsadd aProjectReferencetoSatelliteProvider.Testsand consume the existingTestUtilities/JwtTokenFactory.csdirectly. Cost: ~1 SP. Downside: integration tests acquire a dependency on unit tests, slightly unusual.
- Option A (preferred): create a new tiny test-support project
- Either way, delete
IntegrationTests/JwtTestHelpers.cs::MintValidTokenand have the integration tests call the unified factory. - Add a code-review checklist item to
.cursor/skills/code-review/SKILL.mdPhase 6 (Cross-Task Consistency): "If two test projects contain the same helper logic, flag as a Medium finding for consolidation."
Owner: whoever picks up the next test-infrastructure PBI. Recommend scheduling as AZ-XXX Unify test JWT mint helper at 3 SP.
Estimated impact: removes the dual-fix tax on every future JWT-side change; prevents drift on the surface that mints test credentials.
Action 2 — Schedule PT-07 harness work as actual feature work, not a leftover — RESOLVED in cycle 3 (AZ-492)
Why: Pattern 3 + Comparison checkpoint. Cycle 1 Action 2 made the spec-side honest (no more silent Unverified scenarios) but did nothing to drain the backlog. Cycle 2 added a 3rd deferred item (the script-rot on run-performance-tests.sh). The perf gate has been 0-of-N for two cycles running and is now actively misleading — anyone reading the gate banner sees "skipped, will check next cycle" without realizing "next cycle" never arrives.
Concrete action:
- Promote
_docs/_process_leftovers/2026-05-11_perf-pt07-harness.mditems into a real Jira PBI (e.g.AZ-XXX Perf harness: implement PT-07 + PT-08 + JWT-attach in run-performance-tests.sh). 3 SP estimate. - The PBI's exit criteria: PT-07 + PT-08 + at least one of PT-01..PT-06 actually return a real measurement (not
Unverified) in_docs/06_metrics/perf_<date>.md. - After the PBI lands, delete the leftover file. Until it lands, do not accept new Deferred-status NFR entries — either the harness exists or the spec entry doesn't get added. Update
.cursor/skills/test-spec/SKILL.mdaccordingly.
Owner: next planning loop. Sequence it before any AC that has a hard latency or throughput requirement. Estimated impact: turns the perf gate from theatre into a real gate. Reduces the surface area where regressions can hide.
Cycle 3 resolution (AZ-492): PT-07 + PT-08 + Bearer-token attach for PT-01..PT-06 all landed in AZ-492 (cycle 3 batch 4). Leftover _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted. PT-07 and PT-08 status flipped from Deferred to Implemented in _docs/02_document/tests/performance-tests.md; traceability-matrix.md PT-07 + PT-08 rows moved from ◐ recorded to ✓. The cycle-1 [process] LESSONS.md rule about Deferred-status NFRs remains in force as a guardrail.
Action 3 — Reset integration-test DB state between runs (real fix, not workaround)
Why: Pattern 5. The wallclock-seed for _coordinateCounter is a workaround that lets cycle 2 ship but doesn't fix the fundamental issue: integration tests are not isolated from the persistent Postgres volume. The next test class that inserts into tiles will have to invent its own collision-avoidance scheme. Eventually two tests will collide despite both using wallclock seeds (e.g. parallel test execution, fast back-to-back runs in CI).
Concrete action:
- Add a
truncate_or_reset_test_datastep toSatelliteProvider.IntegrationTests/Program.csstartup that explicitly clearstiles,regions,routes,route_points,route_regions(in FK-safe order) whenASPNETCORE_ENVIRONMENT=Testing(or when an explicit--reset-stateflag is passed). The test runner has direct Postgres access already (used byUavUploadTestsfor seeding). - Alternative: extend
scripts/run-tests.shto calldocker-compose down -vbetween--fullruns by default, with an explicit--keep-stateflag for debugging. Choose this if data-reset-in-runner is too invasive. - Document the chosen pattern in
_docs/02_document/modules/tests_integration.mdso future integration-test authors don't reinvent collision-avoidance.
Owner: same PBI as Action 2 ("Test infrastructure hardening: perf harness + DB isolation") OR a separate 2 SP PBI. Estimated impact: enables parallel test execution in the future; eliminates the recurring "why are my inserts colliding?" debug loop.
8. Recommended Rule / Skill updates
| Target file | Change |
|---|---|
.cursor/skills/code-review/SKILL.md (Phase 6) |
Add the duplicate-helper detection rule from Action 1 |
.cursor/skills/test-spec/SKILL.md |
Tighten the cycle-1 Action 2 rule: "Deferred-status entries are allowed at most once per NFR. If a Deferred NFR has not landed by the end of the next cycle in which it was deferred, the spec MUST be locked to that NFR and the harness work scheduled as a real PBI before any new NFR is accepted as Deferred." |
.cursor/skills/implement/SKILL.md (test-gate step) |
Add: "A batch report's 'AC test coverage' claim is provisional until Step 11 (Run Tests) has executed end-to-end. Mark the batch as 'tests-not-yet-exercised' in its report header if the tests were committed but not run." — guards against Pattern 2. |
(optional) .cursor/rules/coderule.mdc § Testing |
Add a note that integration-test data isolation is required, not optional — link to Action 3's chosen pattern when implemented. |
9. Decision items carried over (operator)
These are not part of the top-3 improvement actions because they're 1-turn decisions, not multi-cycle initiatives:
- F1 (cycle 1 + cycle 2 carry) — RESOLVED in cycle 3 (AZ-495):
_docs/02_document/components/01_web_api/description.mddoesn't exist. Choose: (a) create stub folder that defers tomodules/api_program.md, (b) formalize "WebApi has nocomponents/*folder" in the module-layout doc and update all task spec templates to point atmodules/api_program.md. Recommendation: (b) —modules/api_program.mdis already the de-facto canonical location. Resolution: chose Option B;_docs/02_document/module-layout.md§ Documentation Layout and.cursor/skills/new-task/SKILL.mdStep 4 updated to anchor WebApi documentation atmodules/api_program.md; cycle-2 batch-review F1 entries explicitly marked resolved with back-reference to AZ-495. - R2 (cycle 2 security report): confirm with admin team the expected
iss/audvalues for production JWT tokens; flipValidateIssuer/ValidateAudiencetotruein a small follow-up PBI when the values are known. - R3 (cycle 2 deploy report): coordinate with
gps-denied-onboardand mission-planner-UI teams to attach Bearer tokens BEFORE promoting cycle-2 image pastdev.
10. Self-verification
- All cycle 2 implementation artifacts parsed (2 batch reports, 2 reviews, 0 cumulative review since K=3 not reached, 1 deploy report)
- All metric categories computed
- Structural snapshot written (
structure_2026-05-11_cycle2.md); cycle 1 baseline (structure_2026-05-11.md) used for delta calc - Comparison vs previous retro (
retro_2026-05-11.md) included - Top 3 actions are concrete, owner-named, SP-estimated
- Suggested rule/skill updates are file-specific
- Cycle-1 action effectiveness explicitly checked