[AZ-503] [AZ-504] Cycle 5 Step 17: retrospective + close cycle

retro_2026-05-12_cycle5.md captures the cycle-end retrospective:
- Implementation: 2 tasks (AZ-504 + AZ-503-foundation), 4 SP total,
  100% first-attempt pass rate, 1 mid-implement scope-split
  (AZ-503 → AZ-503-foundation + AZ-505, blocked-linked).
- Quality: 50/50 PASS/PASS_WITH_WARNINGS, 0 new Medium+, 1 new Low
  (defensive contentSha256 soft-NULL guard).
- Security: PASS_WITH_WARNINGS, 0 new Critical/High/Medium, 2 new
  Low informational (F1 flightId provenance, F2 pgcrypto runbook
  gap).
- Performance: PASS_WITH_INFRA_WARNINGS — first measurable PT-08
  ever (Run #1 199ms, Run #2 117ms vs 2000ms threshold); PT-01/02
  failed on recurring local Docker/colima DNS cold-start, not an
  app regression.
- Structural: +1 ProjectReference edge (IntegrationTests → Common),
  +1 minor contract bump (uav-tile-upload 1.0.0 → 1.1.0), +1 DB
  migration (014_AddTileIdentityColumns.sql), 0 NuGet bumps,
  0 csproj additions, DAG still acyclic at 9 projects.

structure_2026-05-12_cycle5.md captures the structural snapshot.

LESSONS.md updated with 3 cycle-5 entries (oldest dropped to
preserve the 15-entry ring buffer):
- [architecture] Cross-repo cryptographic invariants must live as
  code constants in both repos with reference-vector tests.
- [tooling] When perf-mode "one re-run" fires twice with the same
  DNS root cause, escalate from re-run to harness fix.
- [process] Spec contradicts live code by >=2 prerequisites →
  prefer split into foundation + follow-up (A/B/C option C).

Top 3 follow-up actions (cycle 6 candidates):
- Action 1 (1 SP): DNS pre-warm in scripts/run-performance-tests.sh
  → closes the cycle-3 perf-harness leftover.
- Action 2 (5 SP): AZ-505 — inventory endpoint + HTTP/2 + Leaflet
  covering index (blocked-linked on AZ-503-foundation, this cycle).
- Action 3 (1 SP): pgcrypto pre-install runbook step (F2-cy5 doc
  fix).

Cycle 5 closed. Autodev state advanced for cycle 6 by the next
/autodev invocation.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-12 18:07:57 +03:00
parent 0e05fc519a
commit ea278afb37
3 changed files with 353 additions and 0 deletions
+249
View File
@@ -0,0 +1,249 @@
# Retrospective — Cycle 5 (2026-05-12)
**Tasks**: AZ-503-foundation (tile identity — UUIDv5 + integer UPSERT foundation, 3 SP) + AZ-504 (perf-script grep-pipefail fix, 1 SP). The original AZ-503 spec (5 SP combined) was split mid-implement into AZ-503-foundation (this cycle) + **AZ-505** (5 SP — inventory endpoint + HTTP/2 + Leaflet covering index, blocked-linked to AZ-503-foundation, deferred to cycle 6).
**Mode**: cycle-end (autodev Step 17)
**Previous retro**: `retro_2026-05-12_cycle4.md`
**Cycle shape**: small-foundation cycle — first schema-changing cycle since cycle 1's AZ-484; first cycle to ship a contract minor-version bump; first cycle since cycle 1 with multiple measurable PT-08 batches.
## 1. Implementation Metrics
| Metric | Cycle 5 | Δ vs cycle 4 |
|--------|---------|--------------|
| Tasks implemented | **2** (AZ-504, AZ-503-foundation) | +1 |
| Batches executed | **2** | +1 |
| Avg tasks / batch | 1.0 | unchanged |
| Total complexity delivered | **4 SP** (1 + 3) | -1 SP |
| Avg complexity / batch | 2 SP | -3 SP |
| Tasks at-or-below 5 SP cap | **2 of 2 (100%)** | unchanged |
| Tasks split mid-cycle into a follow-up PBI | **1 of 2** (AZ-503 → AZ-503-foundation + AZ-505) | new (cycle 4 had no splits) |
| Tasks above cap | 0 | unchanged |
| Cumulative reviews | **0** (cumulative-review trigger is every 3 batches; cycle 5 has 2 batches, so no trigger) | unchanged |
**Sequencing**: 2 batches — AZ-504 first (smallest mechanical fix; landed in batch 01), AZ-503-foundation second (batch 02). This ordering was chosen so the AZ-504 fix would be in place by the time the AZ-503-foundation tests run (some of which use the perf script's regression-tested helpers). The cycle completed in **6 dev commits** counting Step 9 task specs, both batches, the autodev-state chore, Steps 11-15 sync, and Step 16 deploy — three more than cycle-4's 4-commit count because of the scope-split conversation and the two-batch structure.
## 2. Quality Metrics
### Code Review Results
| Verdict | Count | Percentage |
|---------|-------|-----------|
| PASS | **1** (batch 01 AZ-504) | **50%** |
| PASS_WITH_WARNINGS | **1** (batch 02 AZ-503-foundation) | **50%** |
| FAIL | 0 | 0% |
### Findings by Severity (per-batch code review)
| Severity | Cycle 5 | Δ vs cycle 4 |
|----------|---------|--------------|
| Critical | 0 | unchanged |
| High | 0 | unchanged |
| Medium | **0** | -2 (cycle 4 had 2 bump-consequence Mediums) |
| Low | **1** (F-cy5 batch02 — soft-NULL guard on `contentSha256` when `File.Exists==false`; practically unreachable in happy path) | unchanged |
### Findings by Category
| Category | Count | Top Files |
|----------|-------|-----------|
| Maintainability | **1** | `SatelliteProvider.Services.TileDownloader/TileService.cs` (BuildTileEntity) |
| Bug | 0 | — |
| Spec-Gap | 0 | — |
| Security | 0 NEW code-review (2 NEW informational Lows in `_docs/05_security/` — F1-cy5 flightId provenance, F2-cy5 pgcrypto runbook gap — both long-term, not code defects) | -3 informational vs cycle 4 |
| Performance | 0 | — |
| Style | 0 | — |
| Scope | 0 | -1 (cycle 4 had 1 — the perf-script path fix) |
**Note on the 1 Low finding**: the `contentSha256` soft-NULL guard is defensive against transient I/O failure between the JPEG-write and the SHA256-compute steps. The downloader writes the file before the SHA256 read, so in practice the NULL branch is unreachable. The column is NULLable at the DB level. Tightening to throw-on-missing-file is recommended as a follow-up if downstream consumers ever rely on NOT NULL.
### Security audit (cycle 5)
| Metric | Value | Δ vs cycle 4 |
|--------|-------|--------------|
| Verdict | **PASS_WITH_WARNINGS** | unchanged (cumulative pos.) |
| Mode | **Delta** (full re-scan of new/modified code in AZ-503/AZ-504 against OWASP Top 10 + dependency manifest diff + infrastructure-change check) | new this cycle (cycle 4 was "Resume narrowed to dependency_scan only") |
| New Critical / High | 0 / 0 | unchanged |
| New Medium | **0** | unchanged |
| New Low (informational only) | **2** (F1-cy5: `metadata.flightId` not authenticated provenance — long-term recommendation when an authoritative flight registry exists; F2-cy5: `pgcrypto` deployment runbook gap on managed Postgres — captured in `deploy_cycle5.md` operator runbook) | -3 vs cycle 4 (5 informational confirmations) |
| Resolved findings | **0** | -2 (cycle 4 forward-resolved 2 cycle-3 carry-overs via major-version bumps; cycle 5 made no package bumps) |
| Carry-overs (still OPEN) | 3 (cycle-3 D2 `Microsoft.NET.Test.Sdk 17.8.0` transitive flag; cycle-4 D-IdentityModel-7.0.3 — both `Microsoft.IdentityModel.Tokens` and `System.IdentityModel.Tokens.Jwt` in TestSupport; `Serilog.AspNetCore 8.0.3` fallback) | unchanged |
**SHA-1 in `Uuidv5.cs` was explicitly assessed and cleared**: RFC 9562 §5.5 mandates SHA-1 for namespace-based UUIDv5; the algorithm is not used as a cryptographic hash for security (no collision-resistance against an adversary is claimed); the input space (tile coordinates × namespace) is not adversary-controlled in any reasonable threat model. This is the standard library-vetted construction for deterministic IDs — flagged explicitly in `static_analysis_cycle5.md` so it does not get re-flagged in future audits.
### Performance gate (cycle 5)
| Metric | Value |
|--------|-------|
| Verdict | **PASS_WITH_INFRA_WARNINGS** |
| Scenarios | 6 Pass · 0 Warn · 2 Fail · 0 Unverified (across Run #2 — Run #1 was 5 Pass + 3 Fail before `colima restart`) |
| AZ-504 NFR (PT-08 reaches summary) | **MET** — first cycle ever where PT-08 returns a measured batch p95. PT-08 ran clean across both runs: Run #1 batch p95 = 199ms, Run #2 batch p95 = **117ms** (vs 2000ms AZ-488 threshold). |
| AZ-503-foundation NFR (no UPSERT hot-path regression) | **MET** — PT-08 uses the new integer-key + flight-aware UPSERT for 200 batches; 200/200 accepted, 0 rejected, 0 failed; p95 117ms is **faster** than any prior cycle's measurement of this path. |
| Cycle-3 perf-harness leftover | **STAYS OPEN** — replay #5 documented. The AZ-504 fix is verified working across two perf runs, but the "default-parameter exit-0 run" criterion is blocked by recurring local Docker/colima DNS cold-start (PT-01 fails on `mt0/tile.googleapis.com` DNS lookup at the first request after each `docker compose up`). Two concrete closure paths recorded in the leftover (DNS pre-warm in script, OR move perf gate to CI). |
## 3. Structural Metrics (snapshot: `structure_2026-05-12_cycle5.md`)
| Metric | Cycle 5 | Δ vs cycle 4 |
|--------|---------|--------------|
| .NET projects (csproj) | **9** | unchanged |
| Cross-project edges (ProjectReference) | **21** | **+1** (IntegrationTests → Common — justified by cross-repo invariant deduplication) |
| Cycles in project graph | 0 | unchanged |
| Average ProjectReferences per component | ~2.3 | +0.1 |
| Max in-degree (Common) | 7 | **+1** (now also imported by IntegrationTests) |
| New Architecture violations | **0** | unchanged |
| Resolved Architecture violations | **0** | -2 vs cycle 4 (cycle 4 forward-resolved 2 via the .NET 10 bump; cycle 5 had no bumps) |
| Net Architecture delta | **0** | unchanged (cycle 4 was also 0) |
| Public-API contract delta | **+1 minor** (`uav-tile-upload.md` 1.0.0 → 1.1.0 additive) | new this cycle (cycle 4 had no contract changes) |
| Database schema delta | **+4 columns, +2 indexes, -2 indexes, +1 extension (`pgcrypto`)** | new this cycle (cycle 4 had no schema changes) |
| NuGet package bumps | **0** | -16 (cycle 4 coordinated 16 distinct package bumps) |
**Cycle 5 structural posture**: one schema migration, one minor contract bump, one new cross-component utility (`Uuidv5`), one new IntegrationTests → Common edge. Zero NuGet bumps, zero csproj additions, zero new public-API endpoints (the inventory endpoint is deferred to AZ-505). The DAG remains acyclic.
## 4. Efficiency Metrics
| Metric | Cycle 5 | Δ vs cycle 4 |
|--------|---------|--------------|
| Blocked tasks (during implementation) | **0 of 2** | unchanged |
| Tasks split mid-implement | **1 of 2** (AZ-503 → AZ-503-foundation + AZ-505 via A/B/C decision at /autodev Step 10) | new this cycle |
| Tasks completed first attempt (no post-review fix commits) | **2 of 2 (100%)** | unchanged (cycle 4 also 100%) |
| Tasks requiring multiple post-code-review fix commits | 0 | unchanged |
| Most-findings batch | batch 02 (AZ-503-foundation — 1 Low; batch 01 had 0 findings) | similar shape |
| Step-15 (Perf Test) execution | **EXECUTED twice** (Run #1 + Run #2; Run #2 is the better signal post `colima restart`) | unchanged (cycle 4 also executed Step 15) |
| Step-15 leftover at end of cycle | **YES (still — same cycle-3 leftover)** — but the AZ-504 verification half is satisfied this cycle; only the "infra-noise-free exit-0 run" half remains. | unchanged in identity, narrower in remaining gap |
| Step-14 (Security Audit) — net findings improvement | **0** new Medium+, 2 new informational Lows, 0 resolved | unchanged net direction |
| Number of A/B/C decision points hit during autodev | **3** (Step 10 scope split, Step 15 Run #1 gate, Step 15 Run #2 gate) | +2 vs cycle 4's 0 in-cycle A/B/C points |
## 5. Patterns Identified
### Pattern 1 — Spec-contradiction-driven A/B/C split at /autodev Step 10 is now a known shape
When AZ-503-implementation started, three contradictions surfaced against the live codebase: `flight_id` did not exist as a column, `FlightId` did not exist as a DTO field, and `voting_status` did not exist as referenced by an AC. The combined work needed to make the spec executable as written was ~5 SP — above the cycle-policy cap. Per `meta-rule.mdc` "Critical Thinking" + `autodev/protocols.md` A/B/C scope-protection, the implement skill stopped, surfaced the contradictions, and offered three options (A: implement the spec as-written; B: defer the whole task; **C: split into foundation + follow-up** — which the user picked).
The split produced a self-contained, testable, 3 SP foundation PBI that this cycle delivered cleanly, and a 5 SP `AZ-505` follow-up that is now blocked-linked in the Jira graph. **This is the first cycle where a scope-split mid-implement happened and landed cleanly without losing test coverage continuity.**
**Insight**: when a /autodev cycle's task spec drifts from the live codebase by more than ~1 missing prerequisite, the split-into-foundation pattern is preferable to either (a) silently expanding the cycle's SP budget or (b) deferring the entire PBI. The foundation captures the prerequisite infrastructure the follow-up needs; the follow-up captures the user-facing capability. Both halves remain individually shippable and individually testable.
### Pattern 2 — First measurable PT-08 batch in the project's history
PT-08 (UAV batch upload p95) was scenario-spec'd in cycle 2 (AZ-488) but never produced a measurable batch number until this cycle. Three runs preceded this one (cycle 3 short variant: PT-08 crashed at line 417; cycle 4 full run: same crash; cycle 5 Run #1 + Run #2: **clean exits to summary with 200/200 accepted**). The AZ-504 fix (`grep -c … || true`) is the single thing that unblocked this.
The captured numbers are: Run #1 batch p95 = 199ms, Run #2 batch p95 = **117ms**, both far under the 2000ms AZ-488 threshold. Per-item proxy = batch p95 / batch_size = **1119ms**, far under the AZ-492 per-item gate.
**Insight**: a 1-SP mechanical script fix unblocked a previously-unmeasurable NFR that had been carried as a leftover across three cycles. The leftover replay protocol kept the issue visible without blocking forward progress; the eventual fix took 15 minutes once it became the cycle's smallest atomic PBI. **This is a textbook case for the "leftover replay" mechanism in `tracker.mdc` — keep small unblocked-but-not-yet-fixed items visible, close them when they fit naturally in a cycle's scope.**
### Pattern 3 — Local Docker/colima DNS cold-start is a recurring class of "infra-noise" failure that contaminates the perf gate
Cycle 4's perf gate noted DNS resolution intermittence as an unrelated test flake. Cycle 5 hit the same class **twice in the same session** — first during the functional test phase (resolved by `colima restart`), then during Step 15 Run #1 (manifested on `tile.googleapis.com`), then again during Step 15 Run #2 (manifested on `mt0.google.com` — the warmup probe between Run #1 and Run #2 only touched `tile.googleapis.com` + `mt1.google.com`).
The pattern is: the first Google-Maps tile fetch immediately after `docker compose up` may fail DNS resolution if the colima resolver hasn't pre-cached the specific hostname yet. Subsequent fetches succeed. PT-01 (cold tile download) is the first scenario that hits a Google-Maps hostname and therefore is the canary for this class of failure.
**Insight**: this is not an application regression — it is **environment instability that the perf script does not currently shield itself against**. Two concrete mitigations are recorded in the cycle-3 leftover (replay #5 entry) as out-of-scope follow-up PBIs:
1. **Add a DNS pre-warm step** to `scripts/run-performance-tests.sh` before PT-01 (1 SP, deterministic fix);
2. **Move the perf gate to CI / a stable-resolver environment** (2 SP, structural fix).
The pattern is worth surfacing as a Lesson — the perf-mode skill (`test-run/SKILL.md`) already says "always worth ONE re-run before declaring a regression", and we did one; but the underlying environment continues to be the cycle's single largest source of non-application gate noise.
### Pattern 4 — Cross-repo cryptographic invariants belong in a code-level constant, not a doc reference
AZ-503 introduces a `TileNamespace` UUID (`5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c`) that MUST byte-match the same constant in `gps-denied-onboard/components/c6_tile_cache/_uuid.py` (sibling workspace), or every cross-repo `tileId` lookup silently misses. The constant lives as a pinned `public const string TileNamespace` in `SatelliteProvider.Common.Utils.Uuidv5` and is asserted by the `Uuidv5Tests` reference-vector unit tests. The sibling workspace is documented to mirror the same constant.
**Insight**: when a cross-repo invariant is a magic constant (a UUID, a base32 alphabet, a tile-zoom convention), it must live in code in BOTH repos with reference-vector tests on BOTH sides. Documentation alone (e.g., "see contract foo.md") is not enough — a drift between the constants would only surface as a 100% lookup-miss in production, which is harder to detect than a unit-test failure. Cycle 5 captured this on the satellite-provider side; the sibling-repo side will be handled when AZ-505 lands.
### Pattern 5 — Schema migrations + `pgcrypto` create a deploy-side dependency that the runbook must spell out
Migration 014 enables `pgcrypto` (`CREATE EXTENSION IF NOT EXISTS pgcrypto;`) for the session-scoped backfill function. On stock Postgres 16 (our `postgres:16` Docker image), `pgcrypto` is bundled. On managed cloud Postgres providers (RDS, Cloud SQL, Azure Postgres), the migration-running role typically needs `cloudsqlsuperuser` / `rds_superuser` / equivalent to `CREATE EXTENSION`. If that privilege is absent, migration 014 fails and the application doesn't start.
This was flagged as F2-cy5 (Low informational) during the security audit, recorded in `deploy_cycle5.md` operator runbook step 2, and explicitly called out as a recommended out-of-scope follow-up PBI. **It is NOT a code defect — it is a deploy-runbook gap.**
**Insight**: cycles that add `CREATE EXTENSION` / `CREATE ROLE` / `ALTER SYSTEM` / other privilege-sensitive statements to migrations must produce a deploy-runbook update in the same cycle, even when the cycle is otherwise small. The information is uniquely-visible at code-write time; if the runbook update is deferred, it tends to get lost.
## 6. Comparison vs. previous retros
| Metric | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 |
|--------------------------------------|----------------|----------------|----------------|----------------|----------------|
| Tasks implemented | 1 | 2 | 6 | 1 | **2** |
| Total complexity delivered | 8 SP | 10 SP | 18 SP | 5 SP | **4 SP** |
| Batches | 1 | 2 | 5 | 1 | **2** |
| Critical/High review findings | 0 | 0 | 0 | 0 | **0** |
| New Medium review findings | 0 | 0 | 0 | 2 (bump conseq.) | **0** |
| New Low review findings | 3 | 6 (5 distinct) | 7 | 1 | **1** |
| Code review pass rate | 100% (1/1) | 100% (2/2) | 100% (5/5) | 100% (1/1) | **100% (2/2)** |
| Tasks completed first attempt | 0 of 1 | 0 of 2 | 5 of 6 | 1 of 1 | **2 of 2** |
| Tasks split mid-implement | 0 | 0 | 0 | 0 | **1** |
| New Medium security findings | 2 | 2 | 0 | 0 | **0** |
| Resolved security findings | 0 | 0 | 3 | 2 (fwd-bump) | **0** |
| Net Architecture delta | n/a (baseline) | +0 | -3 | 0 | **0** |
| Schema change | YES (AZ-484) | NO | NO | NO | **YES (AZ-503-foundation)** |
| Public-API contract change | YES (tile-storage v1.0.0) | YES (uav-tile-upload v1.0.0) | NO | NO | **YES (uav-tile-upload v1.0.0 → v1.1.0 additive)** |
| Step-15 (Perf) executed | YES | SKIPPED | SKIPPED | YES | **YES (Run #1 + Run #2)** |
| Step-15 leftover at retro | NO | YES | YES | YES | **YES (still — same cycle-3 leftover; AZ-504 verification half satisfied)** |
| First measurable PT-08 batch | n/a | n/a | n/a | NO (script crash) | **YES (Run #1 199ms, Run #2 117ms)** |
### Did the cycle-4 actions land?
- **Cycle 4 Action 1 (fix `scripts/run-performance-tests.sh:416-417` grep-pipefail)** — **LANDED as AZ-504** in cycle 5 (this cycle). Closes Pattern 2 from cycle 4's retro. Verified working across two perf runs.
- **Cycle 4 Action 2 (migrate `WithOpenApi(...)` callsites to ASP.NET Core 10 minimal-API metadata extensions, 3 SP)** — **NOT landed in cycle 5**. Explicitly out of AZ-503/AZ-504 scope per `coderule.mdc` "scope discipline". Re-listed as a recommended follow-up PBI in `deploy_cycle5.md`. Carries to cycle 6.
- **Cycle 4 Action 3 (pre-flight transitive-major-version impact analysis at task-spec time)** — **NOT directly exercised** in cycle 5. AZ-503 and AZ-504 had zero NuGet bumps, so the rule was preventive only. It still lives in `coderule.mdc` (added in cycle 4) and will fire on the next package-bumping PBI.
This is the **third consecutive cycle where a prior-retro action landed** (Action 1 here, Action 1 in cycle 4, Actions 1+2+3 in cycle 3). Pattern is stable.
### Did the cycle-3 actions land?
- **Cycle 3 Action 1 (execute perf harness against deployed dev image)** — landed in cycle 4 (implicit AZ-500 NFR gate). Closed.
- **Cycle 3 Action 2 (bump `System.IdentityModel.Tokens.Jwt 7.0.3 → 7.1.2+`)** — **NOT landed in cycle 5**. Carry-over D2-cy4 still open. Test-runtime exposure only; safe to land independently.
- **Cycle 3 Action 3 (`workspace:` field on cross-repo ACs in new-task skill)** — **NOT exercised**. AZ-503 has one cross-workspace invariant (the `TileNamespace` UUID) but that invariant is captured as a Constraint in the task spec body rather than as a per-AC `workspace:` tag. The rule is still not codified in `new-task/SKILL.md`. AZ-505 next cycle has explicit cross-repo writes (the gps-denied-onboard side of the UUIDv5 namespace constant); the cycle-3 rule should land before AZ-505 writes its spec.
## 7. Top 3 Improvement Actions (ranked by impact)
### Action 1 — Add DNS pre-warm to `scripts/run-performance-tests.sh` before PT-01 (1 SP, deterministic)
**Why this is the highest impact**: Pattern 3 above documents the same class of failure across cycles 4 + 5. It contaminates the perf gate with non-application noise, blocks closure of the cycle-3 perf-harness leftover (now in its third carry-over cycle), and forces a manual `colima restart` + re-run per session. A deterministic DNS pre-warm before PT-01 fires removes the entire failure class on the local-dev runner.
**Action**: 1 SP PBI in cycle 6. Insert a `getent hosts mt0.google.com mt1.google.com mt2.google.com mt3.google.com tile.googleapis.com` (or equivalent for the runner's resolver) inside the api container immediately before PT-01 — fail-fast if any hostname is unresolvable after a small retry window. Closes the cycle-3 perf-harness leftover on the next run.
**Cost**: ~20 minutes (one shell addition + one perf run to verify exit-0 + delete the leftover). Counted as 1 SP because deletion of the leftover requires a full clean run.
### Action 2 — Implement AZ-505 (deferred AZ-503 half: inventory endpoint + HTTP/2 + Leaflet covering index, 5 SP)
**Why**: AZ-505 is blocked-linked to AZ-503-foundation. The foundation (this cycle) shipped the deterministic identity, the integer-key UPSERT, and the per-flight layout. AZ-505 ships the user-facing inventory endpoint (`POST /api/satellite/tiles/inventory`) that lets consumers ask "given these N (lat,lon,zoom) coordinates, which tileIds + variants do you have?". AZ-505 also enables HTTP/2 on the API (required for batched inventory responses without TCP head-of-line blocking) and rewrites the Leaflet hot path against the new `location_hash` index.
**Action**: 5 SP PBI in cycle 6. Foundation prerequisites are now in place. AZ-505 also carries the cross-repo write obligation for the gps-denied-onboard side of the UUIDv5 namespace constant.
**Cost**: 5 SP — within the cycle cap. Foundation is done; this is the user-facing payload that justifies the schema work.
### Action 3 — Add a `pgcrypto` pre-install check step to the deployment runbook (1 SP, ops-side)
**Why**: Pattern 5 above. Migration 014 silently relies on `pgcrypto` being installable by the migration-running role. Stock Postgres 16 is fine; managed cloud Postgres providers may not be. F2-cy5 already captures this in `_docs/05_security/owasp_review_cycle5.md` and `deploy_cycle5.md` operator runbook step 2 — but a runbook step that says "check this before running the migration" is necessary if the project ever migrates off Docker-postgres for a non-dev environment.
**Action**: 1 SP PBI in cycle 6 (or land as a doc-only PR within cycle 6's scope-discipline budget). Update the deployment runbook with a pre-migration `SELECT EXTNAME FROM pg_extension WHERE extname='pgcrypto'` + a fallback path if missing.
**Cost**: ~30 minutes of doc work. Counted as 1 SP because it touches the deployment runbook (cross-cutting infra doc).
## 8. Suggested Rule / Skill updates
| File | Change | Rationale |
|------|--------|-----------|
| `coderule.mdc` (new bullet in scope-discipline section) | "When a migration adds a `CREATE EXTENSION` / `CREATE ROLE` / `ALTER SYSTEM` / other privilege-sensitive statement, the same cycle's deploy report MUST add a pre-migration runbook step that verifies the privilege exists in the target environment. The runbook step is required even when the cycle is otherwise small." | Pattern 5 (pgcrypto in migration 014 created a deploy-runbook gap that was caught at security-audit time, not at migration-write time) |
| `new-task/SKILL.md` (new check in Step 5 — Risks & Mitigation) | When a task spec introduces a cross-repo cryptographic invariant (a UUID namespace, a base32/64 alphabet, a tile-zoom convention, a deterministic-key formula), the spec MUST list both the in-workspace code location of the constant AND the sibling-workspace code location it must byte-match — with reference-vector tests on both sides. Doc references alone do not satisfy this. | Pattern 4 (`TileNamespace` UUID must byte-match `gps-denied-onboard/components/c6_tile_cache/_uuid.py`) |
| `autodev/protocols.md` (formalise Step-10 contradiction-driven A/B/C) | The "scope-split" branch of the scope-protection A/B/C choice should be a first-class named option, not an ad-hoc decision. When a /autodev cycle's task spec contradicts the live codebase by >=2 prerequisites, the implement skill should preferentially offer **A: implement as-written / B: defer entirely / C: split into foundation + follow-up** with C being the recommended default. Cycle 5 derived this manually; codifying it makes future cycles cheaper to navigate. | Pattern 1 (AZ-503 → AZ-503-foundation + AZ-505 split was the cycle's largest decision, made via ad-hoc Choose; the recommended option matched the documented pattern but the path was discovered, not followed) |
| `test-run/SKILL.md` (Perf Mode Step 5 — add explicit retrospective trigger) | After the "one re-run" rule fires twice across consecutive cycles with the same root-cause class (e.g., DNS, NTP, resolver), the perf-mode skill should auto-surface a recommended PBI for a deterministic fix at the environment / harness layer — not as a re-run, as a fix. Cycle 5 fired the rule manually here. | Pattern 3 (DNS cold-start hit cycle 4 + cycle 5; the perf-mode skill's "one re-run" already shields against single-incident noise, but doesn't yet escalate when noise recurs) |
## 9. Decision items carried over (operator)
- **Cycle-3 perf-harness leftover** — STAYS OPEN. Replay #5 entry recorded. **Half-closed** this cycle: AZ-504 script fix verified working across 2 runs; remaining half (full default-parameter exit-0 run) blocked by recurring local DNS noise. Closure path: Action 1 above (DNS pre-warm in script, 1 SP) OR move perf gate to CI/cloud runner.
- **Admin team iss/aud confirmation** (carried from cycles 3 + 4) — still required before promoting beyond `dev`. Unchanged. Tracked in `deploy_cycle3.md` + `deploy_cycle4.md` + `deploy_cycle5.md`.
- **D2-cy4 — `Microsoft.NET.Test.Sdk 17.8.0` transitive `NuGet.Frameworks` flag** — unchanged. Test-runtime exposure only; safe to land in a future cycle.
- **D-IdentityModel-7.0.3** (cycle-4 carry-over — both `Microsoft.IdentityModel.Tokens` and `System.IdentityModel.Tokens.Jwt` at 7.0.3 in TestSupport, NU1902) — unchanged. Cycle-3 Action 2 obligation; should land before any new auth-touching cycle.
- **F1-cy5 — `metadata.flightId` authenticated provenance** — long-term, not actionable until an authoritative flight registry exists in the suite. Recorded in `owasp_review_cycle5.md` and `deploy_cycle5.md` as a long-term recommendation.
- **F2-cy5 — `pgcrypto` deployment runbook gap** — Action 3 above. Trivial doc-only fix.
- **`Serilog.AspNetCore 8.0.3` fallback** — unchanged; no 10.x line published as of cycle 5. Recheck at every cycle start.
- **Cross-repo doc `suite/_docs/10_auth.md` paragraph** (cycle-3 carry-over) — unchanged.
- **`workspace:` field on cross-repo ACs in `new-task/SKILL.md`** (cycle-3 Action 3, never landed) — must land before AZ-505 task spec is written, since AZ-505 has explicit cross-repo writes (the gps-denied-onboard side of the UUIDv5 namespace constant).
## 10. What this retro says about process maturity
Cycle 5 is the first cycle that:
- **Split a task spec mid-implementation into a foundation + follow-up pair** that both shipped (foundation in cycle 5, follow-up scheduled as AZ-505). The /autodev step-10 contradiction-driven A/B/C path proved itself end-to-end.
- **Carried a 3-cycle-old leftover from "completely unmeasurable" to "verified-fixed, exit-0 blocked only by infra noise"**. The AZ-504 fix is provably working; the remaining blocker is environment, not code.
- **Shipped a schema migration with a backfill, a contract minor version bump, and a new on-disk path layout — all additive, all backward-compatible with cycle-4 clients** — proving the project's documentation pipeline (test-spec sync, doc update, ripple log, security audit, deploy report) now scales to schema-touching cycles, not just runtime/SDK migrations (cycle 4) or single-file refactors (cycles 1-3).
- **Recorded 3 new infrastructure-level recommendations** (DNS pre-warm in perf script, pgcrypto pre-install runbook step, cross-repo invariant rule in new-task skill) — none of them are bugs in the cycle's code, all of them are process gaps the cycle's *work shape* surfaced. This is the second consecutive cycle where the retro's top action items are predominantly process / harness / runbook, not code defects.
The process continues to converge. The remaining friction points after cycle 5 are (a) local-dev Docker/colima DNS noise that contaminates Step 15 (Action 1), (b) the carry-over package-hygiene PBIs from cycles 3/4 (Test.Sdk, IdentityModel.Tokens.Jwt, WithOpenApi callsites, Serilog 10.x) that have been deferred per scope discipline four cycles in a row, and (c) the cross-repo invariant codification (cycle-3 Action 3) that must land before AZ-505 writes its spec. All are concrete cycle-6 PBI candidates totalling ~12-15 SP, which is one fully-loaded normal cycle — or split across cycles 6 + 7 if AZ-505 alone fills cycle 6.
@@ -0,0 +1,98 @@
# Structural Snapshot — 2026-05-12 (post-cycle 5, AZ-503-foundation + AZ-504)
Cycle 5 delta against `structure_2026-05-12_cycle4.md`. Source of truth: `_docs/02_document/module-layout.md` + on-disk `*.csproj` graph + `_docs/02_document/contracts/`.
## Projects
| Layer | csproj | Cycle 5 delta |
|-------|--------|---------------|
| 1 (Foundation) | `SatelliteProvider.Common` | **+1 file**: `Utils/Uuidv5.cs` (NEW, 80 LoC; RFC 9562 §5.5 SHA-1 UUIDv5 + pinned `TileNamespace` GUID). No new NuGet deps; uses framework-only `System.Security.Cryptography.SHA1` + `System.Buffers.Binary.BinaryPrimitives`. |
| 1 (Foundation) | `SatelliteProvider.DataAccess` | **+1 migration**: `Migrations/014_AddTileIdentityColumns.sql` (NEW, embedded resource); **+4 properties** on `Models/TileEntity.cs` (`FlightId`, `LocationHash`, `ContentSha256`, `LegacyId`); UPSERT in `Repositories/TileRepository.cs` rewritten for the integer-key + flight-aware contract. No new NuGet deps; `pgcrypto` extension enabled by the migration script (PG-server-side, not a NuGet). |
| 1 (Foundation, shared DTO) | `SatelliteProvider.Common` (DTO sub-folder) | **+1 property**: `DTO/UavTileMetadata.FlightId` (`Guid?`, init-only, optional). Contract `uav-tile-upload.md` bumped 1.0.0 → 1.1.0 (additive). |
| 3 (Application) | `SatelliteProvider.Services.TileDownloader` | **+ImportSite**: `using SatelliteProvider.Common.Utils` (for `Uuidv5.Create`) + `using System.Security.Cryptography` (for `SHA256.HashData`) in `TileService.cs` and `UavTileUploadHandler.cs`. `UavTileUploadHandler.BuildUavTileFilePath` gains an optional `Guid? flightId` parameter. No new csproj refs. |
| 3 (Application) | `SatelliteProvider.Services.RegionProcessing` | unchanged |
| 3 (Application) | `SatelliteProvider.Services.RouteManagement` | unchanged |
| 4 (API / Entry) | `SatelliteProvider.Api` | unchanged (`Program.cs` not edited this cycle) |
| 5 (Test-Support) | `SatelliteProvider.TestSupport` | unchanged |
| 6 (Tests) | `SatelliteProvider.Tests` | **+1 test class** (`Uuidv5Tests.cs`); **+3 facts** in `UavTileFilePathTests.cs`; **+2 facts** in `UavTileUploadHandlerTests.cs`. |
| 6 (Tests) | `SatelliteProvider.IntegrationTests` | **+1 ProjectReference** (`SatelliteProvider.Common`) — so raw-SQL seeds can call `Uuidv5.Create` directly; **+2 facts** in `UavUploadTests.cs` (AC-3 multi-flight, AC-4 float-rounding); **+3 facts** in `MigrationTests.cs` (Az503 columns / index / backfill determinism); 1 fact superseded (`NewUniqueConstraintIncludesSourceColumn_AZ484_AC1``Az503MigrationSupersedesAz484UniqueIndex`); seed for the pre-existing `MultiSourceCoexistence_AZ484_Cycle2` test repaired to populate `location_hash`. |
**Project count**: 9 (unchanged from cycle 4 — AZ-503-foundation adds files to existing projects, doesn't add a new csproj).
## Cross-Project Import Edges (compile-time `ProjectReference`)
| Edge | Count | Cycle 5 delta |
|------|-------|----------------|
| Api → {Common, DataAccess, TileDownloader, RegionProcessing, RouteManagement} | 5 | unchanged |
| TileDownloader → {Common, DataAccess} | 2 | unchanged |
| DataAccess → {Common} | 1 | unchanged |
| RegionProcessing → {Common, DataAccess} | 2 | unchanged |
| RouteManagement → {Common, DataAccess} | 2 | unchanged |
| Tests → {Api, TileDownloader, RegionProcessing, RouteManagement, Common, DataAccess, TestSupport} | 7 | unchanged |
| IntegrationTests → {TestSupport, **Common (NEW)**} | **2** | **+1** edge (motivated by AZ-503 — seeds need the production `Uuidv5.Create` algorithm to compute `location_hash` for raw-SQL inserts) |
**Total ProjectReference edges**: **21** (cycle 4: 20). Net delta: +1 edge.
## Source-import sites — cycle 5 delta
| Importer | Imports from | Cycle 5 delta |
|----------|--------------|---------------|
| `SatelliteProvider.Services.TileDownloader/TileService.cs` | `SatelliteProvider.Common.Utils` (Uuidv5), `System.Security.Cryptography` (SHA256) | NEW (AZ-503 deterministic identity + content hash) |
| `SatelliteProvider.Services.TileDownloader/UavTileUploadHandler.cs` | `SatelliteProvider.Common.Utils` (Uuidv5), `System.Security.Cryptography` (SHA256) | NEW (same — UAV write path) |
| `SatelliteProvider.IntegrationTests/UavUploadTests.cs` | `SatelliteProvider.Common.Utils` (Uuidv5) | NEW (seed helper for raw-SQL inserts) |
| `SatelliteProvider.Tests/Uuidv5Tests.cs` | `SatelliteProvider.Common.Utils` | NEW (unit-test class) |
| All other source files | unchanged | — |
**5 new source-level import lines** across 4 files; all to either the new internal `SatelliteProvider.Common.Utils.Uuidv5` utility or the framework `System.Security.Cryptography.SHA256`. **Zero new third-party imports.**
## Graph properties
- **Cycles in project import graph**: 0 (clean DAG — unchanged)
- **Average ProjectReferences per component**: 21 / 9 = **~2.3** (cycle 4: ~2.2). Net delta: +0.1 (one new IntegrationTests → Common edge).
- **Max in-degree**: Common (still highest — now at **7** incoming edges: Api, TileDownloader, DataAccess, RegionProcessing, RouteManagement, Tests, **IntegrationTests (new)**). Cycle 4 had Common at 6.
- **Max out-degree**: Tests (7 — unchanged).
- **TestSupport position**: leaf-of-test-subgraph; no production-layer importers (unchanged).
- **The new IntegrationTests → Common edge is justified**: integration-test seeds need the same deterministic `Uuidv5` algorithm the production code uses, otherwise the new NOT NULL `location_hash` column would force every seed to encode the SHA-1 byte order by hand. Reusing `SatelliteProvider.Common.Utils.Uuidv5` keeps the algorithm in one place (which is also where the cross-repo invariant with `gps-denied-onboard/components/c6_tile_cache/_uuid.py` lives).
## NuGet dependency hygiene (cycle 5)
| Package | Cycle-4 version | Cycle-5 version | Status |
|---------|-----------------|-----------------|--------|
| All NuGet packages across all 9 csproj files | unchanged | **unchanged** | **Zero NuGet bumps this cycle.** AZ-503-foundation uses framework-only types (`SHA1`, `SHA256`, `BinaryPrimitives`); AZ-504 is a 2-line shell-script edit. |
| Carry-overs (still OPEN) | Cycle-3 D2 (`Microsoft.NET.Test.Sdk 17.8.0` transitive `NuGet.Frameworks` flag), cycle-4 D4 (`Microsoft.IdentityModel.Tokens` 7.0.3 + `System.IdentityModel.Tokens.Jwt` 7.0.3 in TestSupport), `Serilog.AspNetCore` 8.0.3 fallback | unchanged | All three remain explicitly out of cycle-5 scope per `coderule.mdc` "scope discipline". |
## Database schema surface (cycle 5 delta)
| Object | Change | Source |
|--------|--------|--------|
| `tiles` table | **+4 columns**: `flight_id uuid NULL`, `location_hash uuid NOT NULL` (backfilled deterministically for pre-existing rows), `content_sha256 bytea NULL`, `legacy_id uuid NULL` | `014_AddTileIdentityColumns.sql` |
| `idx_tiles_unique_location_source` (AZ-484, float-based) | **DROPPED** | same |
| `idx_tiles_unique_location` (pre-AZ-484, defensive duplicate cleanup) | **DROPPED** | same |
| `idx_tiles_unique_identity` (integer-key + COALESCE(flight_id, zero-UUID)) | **CREATED** (unique) | same |
| `idx_tiles_location_hash` (location_hash lookups for the future AZ-505 inventory endpoint) | **CREATED** (non-unique) | same |
| Postgres extension `pgcrypto` | **CREATE EXTENSION IF NOT EXISTS** (used during the migration's session-scoped `pg_temp.uuidv5` PL/pgSQL function only) | same |
The migration runs in a single transaction and is idempotent under DbUp's journal. No table or column was renamed (per `coderule.mdc` "Do not rename any database objects without confirmation").
## Architecture / contract surface (cycle 5 delta)
- **Contract bumped**: `_docs/02_document/contracts/api/uav-tile-upload.md` v1.0.0 → **v1.1.0** (additive — adds optional `metadata.flightId: uuid?`; adds derived `tileId` field in the response). Backward-compatible with cycle-4 clients.
- **No new public-API contract files.** (AZ-505 will introduce `inventory.md` next cycle.)
- **New per-flight on-disk path layout** for UAV tiles: `./tiles/uav/{flightId or 'none'}/{z}/{x}/{y}.jpg`. Additive — legacy paths under `./tiles/uav/{z}/{x}/{y}.jpg` are not moved. Documented in `uav-tile-upload.md` "File-path layout" section.
- **New cross-repo invariant**: the `TileNamespace` GUID `5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c` in `SatelliteProvider.Common/Utils/Uuidv5.cs` must match the same constant in `gps-denied-onboard/components/c6_tile_cache/_uuid.py` (sibling workspace). Cycle-5 commit covers the satellite-provider side; the gps-denied-onboard side will be handled when AZ-505 / the consumer-side work runs.
## Net Architecture delta vs cycle 4
- **Resolved (closed by this cycle)**: 0 architecture-level findings closed (the cycle-3 perf-harness leftover is half-closed — AZ-504 script fix is verified working, but the exit-0 deletion criterion is blocked by recurring local DNS noise; replay #5 documented).
- **Newly introduced (informational only)**:
- 2 Low informational findings in the security audit (F1-cy5 `metadata.flightId` not authenticated provenance; F2-cy5 `pgcrypto` deployment runbook gap on managed Postgres). Both are long-term recommendations, not code defects.
- 1 Low Maintainability finding in the AZ-503 code review (the `contentSha256` soft-NULL guard in `TileService.BuildTileEntity` when `File.Exists` is false — practically unreachable, defensively kept).
- 0 new Medium, 0 new High, 0 new Critical.
- **+1 cross-project edge** (IntegrationTests → Common) — justified (cross-repo invariant deduplication).
- **Contract delta**: 1 minor version bump on `uav-tile-upload.md` (1.0.0 → 1.1.0, additive).
- **Schema delta**: +1 migration (014), +4 columns, +2 indexes, -2 indexes, +1 Postgres extension.
- **Net Architecture delta**: 0 net architecture-level findings (the 3 new Lows are informational only, and the +1 edge + minor contract bump are net-neutral structural additions, not violations).
## What this snapshot says about cycle 5's shape
Cycle 5 is the project's first **schema-changing cycle since cycle 1's AZ-484** — a database migration with backfill, a new internal cross-component algorithm (`Uuidv5`), a contract minor bump, and a new on-disk layout for one tile source. Quantitatively it's small (4 SP delivered = 3 SP foundation + 1 SP script fix), but it lays foundational identity infrastructure that next cycle's AZ-505 (5 SP — inventory endpoint, HTTP/2, Leaflet covering index) is blocked on. The structural delta is bounded (`+1` import edge, `+1` minor contract version, +1 migration, +4 columns, +2 indexes, -2 indexes, 0 new csproj, 0 new NuGet bumps), and the DAG remains acyclic with the same 9 projects.
+6
View File
@@ -37,6 +37,12 @@ If the enum's wire string happens to match a member name case-insensitively (e.g
## Ring buffer (last 15 entries — newest at top) ## Ring buffer (last 15 entries — newest at top)
- [2026-05-12] [architecture] Cross-repo cryptographic invariants (UUID namespaces, deterministic-key formulas, base32/64 alphabets, tile-zoom conventions) MUST live as code-level constants in BOTH repos with reference-vector tests on BOTH sides — documentation alone is insufficient because constant drift surfaces only as 100% lookup misses in production, harder to detect than a unit-test failure (cycle 5: AZ-503 introduced `TileNamespace = 5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c` which must byte-match the same constant in `gps-denied-onboard/components/c6_tile_cache/_uuid.py`; the satellite-provider side has the constant + 10 Python-generated reference vectors in `Uuidv5Tests.cs` and the sibling repo will mirror).
Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
- [2026-05-12] [tooling] Local Docker/colima DNS cold-start is a recurring class of failure that contaminates the Step-15 perf gate — when the perf-mode "one re-run" rule fires twice across consecutive cycles with the same root-cause class (DNS / NTP / resolver), the harness must escalate from "re-run" to a deterministic fix at the harness layer (DNS pre-warm in script, OR move gate to CI), not just another re-run (cycle 5: PT-01 failed Run #1 on `tile.googleapis.com` cold-start, then Run #2 on `mt0.google.com` cold-start; the warmup probe between runs only touched the hostnames it explicitly named).
Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
- [2026-05-12] [process] When a /autodev cycle's task spec contradicts the live codebase by ≥2 missing prerequisites, the implement skill should preferentially split into foundation + follow-up via A/B/C (option C) rather than (A) silently expand the SP budget or (B) defer the entire task — both halves remain individually shippable and individually testable, the cross-PBI dependency is captured as a blocked-link in the tracker (cycle 5: AZ-503 → AZ-503-foundation + AZ-505 split when `flight_id` / `FlightId` / `voting_status` all missing from live code; AZ-503-foundation shipped this cycle, AZ-505 blocks-on-it for cycle 6).
Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
- [2026-05-12] [dependencies] Major-version bumps of direct deps cascade through transitives; the task spec must list the transitive packages whose major version changes as a result OR explicitly note "transitive major-version drift not analyzed in spec" — verify with `dotnet restore --dry-run` against a scratch branch before writing the spec (cycle 4: AZ-500 surprise-bumped `Microsoft.OpenApi` 1.x → 2.x via the `Microsoft.AspNetCore.OpenApi` 8.0.25 → 10.0.7 path; forced an unscheduled Swashbuckle bump + Program.cs refactor mid-implementation). - [2026-05-12] [dependencies] Major-version bumps of direct deps cascade through transitives; the task spec must list the transitive packages whose major version changes as a result OR explicitly note "transitive major-version drift not analyzed in spec" — verify with `dotnet restore --dry-run` against a scratch branch before writing the spec (cycle 4: AZ-500 surprise-bumped `Microsoft.OpenApi` 1.x → 2.x via the `Microsoft.AspNetCore.OpenApi` 8.0.25 → 10.0.7 path; forced an unscheduled Swashbuckle bump + Program.cs refactor mid-implementation).
Source: _docs/06_metrics/retro_2026-05-12_cycle4.md Source: _docs/06_metrics/retro_2026-05-12_cycle4.md
- [2026-05-12] [process] When a scope-protected task newly *exposes* a pre-existing bug elsewhere in the codebase (vs. introducing a new one), surface it as a recommended follow-up PBI in the batch report AND list it as a "newly exposed bug" separate from "newly introduced findings" in the deploy report — bugs that already existed don't count as cycle-introduced regressions, but they must not be silently re-buried (cycle 4: AZ-500's bootstrap fix unmasked the pre-existing `scripts/run-performance-tests.sh:417` `grep -o | wc -l` + `pipefail` bug). - [2026-05-12] [process] When a scope-protected task newly *exposes* a pre-existing bug elsewhere in the codebase (vs. introducing a new one), surface it as a recommended follow-up PBI in the batch report AND list it as a "newly exposed bug" separate from "newly introduced findings" in the deploy report — bugs that already existed don't count as cycle-introduced regressions, but they must not be silently re-buried (cycle 4: AZ-500's bootstrap fix unmasked the pre-existing `scripts/run-performance-tests.sh:417` `grep -o | wc -l` + `pipefail` bug).