diff --git a/_docs/06_metrics/retro_2026-05-12_cycle5.md b/_docs/06_metrics/retro_2026-05-12_cycle5.md new file mode 100644 index 0000000..141d0fe --- /dev/null +++ b/_docs/06_metrics/retro_2026-05-12_cycle5.md @@ -0,0 +1,249 @@ +# Retrospective — Cycle 5 (2026-05-12) + +**Tasks**: AZ-503-foundation (tile identity — UUIDv5 + integer UPSERT foundation, 3 SP) + AZ-504 (perf-script grep-pipefail fix, 1 SP). The original AZ-503 spec (5 SP combined) was split mid-implement into AZ-503-foundation (this cycle) + **AZ-505** (5 SP — inventory endpoint + HTTP/2 + Leaflet covering index, blocked-linked to AZ-503-foundation, deferred to cycle 6). +**Mode**: cycle-end (autodev Step 17) +**Previous retro**: `retro_2026-05-12_cycle4.md` +**Cycle shape**: small-foundation cycle — first schema-changing cycle since cycle 1's AZ-484; first cycle to ship a contract minor-version bump; first cycle since cycle 1 with multiple measurable PT-08 batches. + +## 1. Implementation Metrics + +| Metric | Cycle 5 | Δ vs cycle 4 | +|--------|---------|--------------| +| Tasks implemented | **2** (AZ-504, AZ-503-foundation) | +1 | +| Batches executed | **2** | +1 | +| Avg tasks / batch | 1.0 | unchanged | +| Total complexity delivered | **4 SP** (1 + 3) | -1 SP | +| Avg complexity / batch | 2 SP | -3 SP | +| Tasks at-or-below 5 SP cap | **2 of 2 (100%)** | unchanged | +| Tasks split mid-cycle into a follow-up PBI | **1 of 2** (AZ-503 → AZ-503-foundation + AZ-505) | new (cycle 4 had no splits) | +| Tasks above cap | 0 | unchanged | +| Cumulative reviews | **0** (cumulative-review trigger is every 3 batches; cycle 5 has 2 batches, so no trigger) | unchanged | + +**Sequencing**: 2 batches — AZ-504 first (smallest mechanical fix; landed in batch 01), AZ-503-foundation second (batch 02). This ordering was chosen so the AZ-504 fix would be in place by the time the AZ-503-foundation tests run (some of which use the perf script's regression-tested helpers). The cycle completed in **6 dev commits** counting Step 9 task specs, both batches, the autodev-state chore, Steps 11-15 sync, and Step 16 deploy — three more than cycle-4's 4-commit count because of the scope-split conversation and the two-batch structure. + +## 2. Quality Metrics + +### Code Review Results + +| Verdict | Count | Percentage | +|---------|-------|-----------| +| PASS | **1** (batch 01 AZ-504) | **50%** | +| PASS_WITH_WARNINGS | **1** (batch 02 AZ-503-foundation) | **50%** | +| FAIL | 0 | 0% | + +### Findings by Severity (per-batch code review) + +| Severity | Cycle 5 | Δ vs cycle 4 | +|----------|---------|--------------| +| Critical | 0 | unchanged | +| High | 0 | unchanged | +| Medium | **0** | -2 (cycle 4 had 2 bump-consequence Mediums) | +| Low | **1** (F-cy5 batch02 — soft-NULL guard on `contentSha256` when `File.Exists==false`; practically unreachable in happy path) | unchanged | + +### Findings by Category + +| Category | Count | Top Files | +|----------|-------|-----------| +| Maintainability | **1** | `SatelliteProvider.Services.TileDownloader/TileService.cs` (BuildTileEntity) | +| Bug | 0 | — | +| Spec-Gap | 0 | — | +| Security | 0 NEW code-review (2 NEW informational Lows in `_docs/05_security/` — F1-cy5 flightId provenance, F2-cy5 pgcrypto runbook gap — both long-term, not code defects) | -3 informational vs cycle 4 | +| Performance | 0 | — | +| Style | 0 | — | +| Scope | 0 | -1 (cycle 4 had 1 — the perf-script path fix) | + +**Note on the 1 Low finding**: the `contentSha256` soft-NULL guard is defensive against transient I/O failure between the JPEG-write and the SHA256-compute steps. The downloader writes the file before the SHA256 read, so in practice the NULL branch is unreachable. The column is NULLable at the DB level. Tightening to throw-on-missing-file is recommended as a follow-up if downstream consumers ever rely on NOT NULL. + +### Security audit (cycle 5) + +| Metric | Value | Δ vs cycle 4 | +|--------|-------|--------------| +| Verdict | **PASS_WITH_WARNINGS** | unchanged (cumulative pos.) | +| Mode | **Delta** (full re-scan of new/modified code in AZ-503/AZ-504 against OWASP Top 10 + dependency manifest diff + infrastructure-change check) | new this cycle (cycle 4 was "Resume narrowed to dependency_scan only") | +| New Critical / High | 0 / 0 | unchanged | +| New Medium | **0** | unchanged | +| New Low (informational only) | **2** (F1-cy5: `metadata.flightId` not authenticated provenance — long-term recommendation when an authoritative flight registry exists; F2-cy5: `pgcrypto` deployment runbook gap on managed Postgres — captured in `deploy_cycle5.md` operator runbook) | -3 vs cycle 4 (5 informational confirmations) | +| Resolved findings | **0** | -2 (cycle 4 forward-resolved 2 cycle-3 carry-overs via major-version bumps; cycle 5 made no package bumps) | +| Carry-overs (still OPEN) | 3 (cycle-3 D2 `Microsoft.NET.Test.Sdk 17.8.0` transitive flag; cycle-4 D-IdentityModel-7.0.3 — both `Microsoft.IdentityModel.Tokens` and `System.IdentityModel.Tokens.Jwt` in TestSupport; `Serilog.AspNetCore 8.0.3` fallback) | unchanged | + +**SHA-1 in `Uuidv5.cs` was explicitly assessed and cleared**: RFC 9562 §5.5 mandates SHA-1 for namespace-based UUIDv5; the algorithm is not used as a cryptographic hash for security (no collision-resistance against an adversary is claimed); the input space (tile coordinates × namespace) is not adversary-controlled in any reasonable threat model. This is the standard library-vetted construction for deterministic IDs — flagged explicitly in `static_analysis_cycle5.md` so it does not get re-flagged in future audits. + +### Performance gate (cycle 5) + +| Metric | Value | +|--------|-------| +| Verdict | **PASS_WITH_INFRA_WARNINGS** | +| Scenarios | 6 Pass · 0 Warn · 2 Fail · 0 Unverified (across Run #2 — Run #1 was 5 Pass + 3 Fail before `colima restart`) | +| AZ-504 NFR (PT-08 reaches summary) | **MET** — first cycle ever where PT-08 returns a measured batch p95. PT-08 ran clean across both runs: Run #1 batch p95 = 199ms, Run #2 batch p95 = **117ms** (vs 2000ms AZ-488 threshold). | +| AZ-503-foundation NFR (no UPSERT hot-path regression) | **MET** — PT-08 uses the new integer-key + flight-aware UPSERT for 200 batches; 200/200 accepted, 0 rejected, 0 failed; p95 117ms is **faster** than any prior cycle's measurement of this path. | +| Cycle-3 perf-harness leftover | **STAYS OPEN** — replay #5 documented. The AZ-504 fix is verified working across two perf runs, but the "default-parameter exit-0 run" criterion is blocked by recurring local Docker/colima DNS cold-start (PT-01 fails on `mt0/tile.googleapis.com` DNS lookup at the first request after each `docker compose up`). Two concrete closure paths recorded in the leftover (DNS pre-warm in script, OR move perf gate to CI). | + +## 3. Structural Metrics (snapshot: `structure_2026-05-12_cycle5.md`) + +| Metric | Cycle 5 | Δ vs cycle 4 | +|--------|---------|--------------| +| .NET projects (csproj) | **9** | unchanged | +| Cross-project edges (ProjectReference) | **21** | **+1** (IntegrationTests → Common — justified by cross-repo invariant deduplication) | +| Cycles in project graph | 0 | unchanged | +| Average ProjectReferences per component | ~2.3 | +0.1 | +| Max in-degree (Common) | 7 | **+1** (now also imported by IntegrationTests) | +| New Architecture violations | **0** | unchanged | +| Resolved Architecture violations | **0** | -2 vs cycle 4 (cycle 4 forward-resolved 2 via the .NET 10 bump; cycle 5 had no bumps) | +| Net Architecture delta | **0** | unchanged (cycle 4 was also 0) | +| Public-API contract delta | **+1 minor** (`uav-tile-upload.md` 1.0.0 → 1.1.0 additive) | new this cycle (cycle 4 had no contract changes) | +| Database schema delta | **+4 columns, +2 indexes, -2 indexes, +1 extension (`pgcrypto`)** | new this cycle (cycle 4 had no schema changes) | +| NuGet package bumps | **0** | -16 (cycle 4 coordinated 16 distinct package bumps) | + +**Cycle 5 structural posture**: one schema migration, one minor contract bump, one new cross-component utility (`Uuidv5`), one new IntegrationTests → Common edge. Zero NuGet bumps, zero csproj additions, zero new public-API endpoints (the inventory endpoint is deferred to AZ-505). The DAG remains acyclic. + +## 4. Efficiency Metrics + +| Metric | Cycle 5 | Δ vs cycle 4 | +|--------|---------|--------------| +| Blocked tasks (during implementation) | **0 of 2** | unchanged | +| Tasks split mid-implement | **1 of 2** (AZ-503 → AZ-503-foundation + AZ-505 via A/B/C decision at /autodev Step 10) | new this cycle | +| Tasks completed first attempt (no post-review fix commits) | **2 of 2 (100%)** | unchanged (cycle 4 also 100%) | +| Tasks requiring multiple post-code-review fix commits | 0 | unchanged | +| Most-findings batch | batch 02 (AZ-503-foundation — 1 Low; batch 01 had 0 findings) | similar shape | +| Step-15 (Perf Test) execution | **EXECUTED twice** (Run #1 + Run #2; Run #2 is the better signal post `colima restart`) | unchanged (cycle 4 also executed Step 15) | +| Step-15 leftover at end of cycle | **YES (still — same cycle-3 leftover)** — but the AZ-504 verification half is satisfied this cycle; only the "infra-noise-free exit-0 run" half remains. | unchanged in identity, narrower in remaining gap | +| Step-14 (Security Audit) — net findings improvement | **0** new Medium+, 2 new informational Lows, 0 resolved | unchanged net direction | +| Number of A/B/C decision points hit during autodev | **3** (Step 10 scope split, Step 15 Run #1 gate, Step 15 Run #2 gate) | +2 vs cycle 4's 0 in-cycle A/B/C points | + +## 5. Patterns Identified + +### Pattern 1 — Spec-contradiction-driven A/B/C split at /autodev Step 10 is now a known shape + +When AZ-503-implementation started, three contradictions surfaced against the live codebase: `flight_id` did not exist as a column, `FlightId` did not exist as a DTO field, and `voting_status` did not exist as referenced by an AC. The combined work needed to make the spec executable as written was ~5 SP — above the cycle-policy cap. Per `meta-rule.mdc` "Critical Thinking" + `autodev/protocols.md` A/B/C scope-protection, the implement skill stopped, surfaced the contradictions, and offered three options (A: implement the spec as-written; B: defer the whole task; **C: split into foundation + follow-up** — which the user picked). + +The split produced a self-contained, testable, 3 SP foundation PBI that this cycle delivered cleanly, and a 5 SP `AZ-505` follow-up that is now blocked-linked in the Jira graph. **This is the first cycle where a scope-split mid-implement happened and landed cleanly without losing test coverage continuity.** + +**Insight**: when a /autodev cycle's task spec drifts from the live codebase by more than ~1 missing prerequisite, the split-into-foundation pattern is preferable to either (a) silently expanding the cycle's SP budget or (b) deferring the entire PBI. The foundation captures the prerequisite infrastructure the follow-up needs; the follow-up captures the user-facing capability. Both halves remain individually shippable and individually testable. + +### Pattern 2 — First measurable PT-08 batch in the project's history + +PT-08 (UAV batch upload p95) was scenario-spec'd in cycle 2 (AZ-488) but never produced a measurable batch number until this cycle. Three runs preceded this one (cycle 3 short variant: PT-08 crashed at line 417; cycle 4 full run: same crash; cycle 5 Run #1 + Run #2: **clean exits to summary with 200/200 accepted**). The AZ-504 fix (`grep -c … || true`) is the single thing that unblocked this. + +The captured numbers are: Run #1 batch p95 = 199ms, Run #2 batch p95 = **117ms**, both far under the 2000ms AZ-488 threshold. Per-item proxy = batch p95 / batch_size = **11–19ms**, far under the AZ-492 per-item gate. + +**Insight**: a 1-SP mechanical script fix unblocked a previously-unmeasurable NFR that had been carried as a leftover across three cycles. The leftover replay protocol kept the issue visible without blocking forward progress; the eventual fix took 15 minutes once it became the cycle's smallest atomic PBI. **This is a textbook case for the "leftover replay" mechanism in `tracker.mdc` — keep small unblocked-but-not-yet-fixed items visible, close them when they fit naturally in a cycle's scope.** + +### Pattern 3 — Local Docker/colima DNS cold-start is a recurring class of "infra-noise" failure that contaminates the perf gate + +Cycle 4's perf gate noted DNS resolution intermittence as an unrelated test flake. Cycle 5 hit the same class **twice in the same session** — first during the functional test phase (resolved by `colima restart`), then during Step 15 Run #1 (manifested on `tile.googleapis.com`), then again during Step 15 Run #2 (manifested on `mt0.google.com` — the warmup probe between Run #1 and Run #2 only touched `tile.googleapis.com` + `mt1.google.com`). + +The pattern is: the first Google-Maps tile fetch immediately after `docker compose up` may fail DNS resolution if the colima resolver hasn't pre-cached the specific hostname yet. Subsequent fetches succeed. PT-01 (cold tile download) is the first scenario that hits a Google-Maps hostname and therefore is the canary for this class of failure. + +**Insight**: this is not an application regression — it is **environment instability that the perf script does not currently shield itself against**. Two concrete mitigations are recorded in the cycle-3 leftover (replay #5 entry) as out-of-scope follow-up PBIs: +1. **Add a DNS pre-warm step** to `scripts/run-performance-tests.sh` before PT-01 (1 SP, deterministic fix); +2. **Move the perf gate to CI / a stable-resolver environment** (2 SP, structural fix). + +The pattern is worth surfacing as a Lesson — the perf-mode skill (`test-run/SKILL.md`) already says "always worth ONE re-run before declaring a regression", and we did one; but the underlying environment continues to be the cycle's single largest source of non-application gate noise. + +### Pattern 4 — Cross-repo cryptographic invariants belong in a code-level constant, not a doc reference + +AZ-503 introduces a `TileNamespace` UUID (`5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c`) that MUST byte-match the same constant in `gps-denied-onboard/components/c6_tile_cache/_uuid.py` (sibling workspace), or every cross-repo `tileId` lookup silently misses. The constant lives as a pinned `public const string TileNamespace` in `SatelliteProvider.Common.Utils.Uuidv5` and is asserted by the `Uuidv5Tests` reference-vector unit tests. The sibling workspace is documented to mirror the same constant. + +**Insight**: when a cross-repo invariant is a magic constant (a UUID, a base32 alphabet, a tile-zoom convention), it must live in code in BOTH repos with reference-vector tests on BOTH sides. Documentation alone (e.g., "see contract foo.md") is not enough — a drift between the constants would only surface as a 100% lookup-miss in production, which is harder to detect than a unit-test failure. Cycle 5 captured this on the satellite-provider side; the sibling-repo side will be handled when AZ-505 lands. + +### Pattern 5 — Schema migrations + `pgcrypto` create a deploy-side dependency that the runbook must spell out + +Migration 014 enables `pgcrypto` (`CREATE EXTENSION IF NOT EXISTS pgcrypto;`) for the session-scoped backfill function. On stock Postgres 16 (our `postgres:16` Docker image), `pgcrypto` is bundled. On managed cloud Postgres providers (RDS, Cloud SQL, Azure Postgres), the migration-running role typically needs `cloudsqlsuperuser` / `rds_superuser` / equivalent to `CREATE EXTENSION`. If that privilege is absent, migration 014 fails and the application doesn't start. + +This was flagged as F2-cy5 (Low informational) during the security audit, recorded in `deploy_cycle5.md` operator runbook step 2, and explicitly called out as a recommended out-of-scope follow-up PBI. **It is NOT a code defect — it is a deploy-runbook gap.** + +**Insight**: cycles that add `CREATE EXTENSION` / `CREATE ROLE` / `ALTER SYSTEM` / other privilege-sensitive statements to migrations must produce a deploy-runbook update in the same cycle, even when the cycle is otherwise small. The information is uniquely-visible at code-write time; if the runbook update is deferred, it tends to get lost. + +## 6. Comparison vs. previous retros + +| Metric | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | +|--------------------------------------|----------------|----------------|----------------|----------------|----------------| +| Tasks implemented | 1 | 2 | 6 | 1 | **2** | +| Total complexity delivered | 8 SP | 10 SP | 18 SP | 5 SP | **4 SP** | +| Batches | 1 | 2 | 5 | 1 | **2** | +| Critical/High review findings | 0 | 0 | 0 | 0 | **0** | +| New Medium review findings | 0 | 0 | 0 | 2 (bump conseq.) | **0** | +| New Low review findings | 3 | 6 (5 distinct) | 7 | 1 | **1** | +| Code review pass rate | 100% (1/1) | 100% (2/2) | 100% (5/5) | 100% (1/1) | **100% (2/2)** | +| Tasks completed first attempt | 0 of 1 | 0 of 2 | 5 of 6 | 1 of 1 | **2 of 2** | +| Tasks split mid-implement | 0 | 0 | 0 | 0 | **1** | +| New Medium security findings | 2 | 2 | 0 | 0 | **0** | +| Resolved security findings | 0 | 0 | 3 | 2 (fwd-bump) | **0** | +| Net Architecture delta | n/a (baseline) | +0 | -3 | 0 | **0** | +| Schema change | YES (AZ-484) | NO | NO | NO | **YES (AZ-503-foundation)** | +| Public-API contract change | YES (tile-storage v1.0.0) | YES (uav-tile-upload v1.0.0) | NO | NO | **YES (uav-tile-upload v1.0.0 → v1.1.0 additive)** | +| Step-15 (Perf) executed | YES | SKIPPED | SKIPPED | YES | **YES (Run #1 + Run #2)** | +| Step-15 leftover at retro | NO | YES | YES | YES | **YES (still — same cycle-3 leftover; AZ-504 verification half satisfied)** | +| First measurable PT-08 batch | n/a | n/a | n/a | NO (script crash) | **YES (Run #1 199ms, Run #2 117ms)** | + +### Did the cycle-4 actions land? + +- **Cycle 4 Action 1 (fix `scripts/run-performance-tests.sh:416-417` grep-pipefail)** — **LANDED as AZ-504** in cycle 5 (this cycle). Closes Pattern 2 from cycle 4's retro. Verified working across two perf runs. +- **Cycle 4 Action 2 (migrate `WithOpenApi(...)` callsites to ASP.NET Core 10 minimal-API metadata extensions, 3 SP)** — **NOT landed in cycle 5**. Explicitly out of AZ-503/AZ-504 scope per `coderule.mdc` "scope discipline". Re-listed as a recommended follow-up PBI in `deploy_cycle5.md`. Carries to cycle 6. +- **Cycle 4 Action 3 (pre-flight transitive-major-version impact analysis at task-spec time)** — **NOT directly exercised** in cycle 5. AZ-503 and AZ-504 had zero NuGet bumps, so the rule was preventive only. It still lives in `coderule.mdc` (added in cycle 4) and will fire on the next package-bumping PBI. + +This is the **third consecutive cycle where a prior-retro action landed** (Action 1 here, Action 1 in cycle 4, Actions 1+2+3 in cycle 3). Pattern is stable. + +### Did the cycle-3 actions land? + +- **Cycle 3 Action 1 (execute perf harness against deployed dev image)** — landed in cycle 4 (implicit AZ-500 NFR gate). Closed. +- **Cycle 3 Action 2 (bump `System.IdentityModel.Tokens.Jwt 7.0.3 → 7.1.2+`)** — **NOT landed in cycle 5**. Carry-over D2-cy4 still open. Test-runtime exposure only; safe to land independently. +- **Cycle 3 Action 3 (`workspace:` field on cross-repo ACs in new-task skill)** — **NOT exercised**. AZ-503 has one cross-workspace invariant (the `TileNamespace` UUID) but that invariant is captured as a Constraint in the task spec body rather than as a per-AC `workspace:` tag. The rule is still not codified in `new-task/SKILL.md`. AZ-505 next cycle has explicit cross-repo writes (the gps-denied-onboard side of the UUIDv5 namespace constant); the cycle-3 rule should land before AZ-505 writes its spec. + +## 7. Top 3 Improvement Actions (ranked by impact) + +### Action 1 — Add DNS pre-warm to `scripts/run-performance-tests.sh` before PT-01 (1 SP, deterministic) + +**Why this is the highest impact**: Pattern 3 above documents the same class of failure across cycles 4 + 5. It contaminates the perf gate with non-application noise, blocks closure of the cycle-3 perf-harness leftover (now in its third carry-over cycle), and forces a manual `colima restart` + re-run per session. A deterministic DNS pre-warm before PT-01 fires removes the entire failure class on the local-dev runner. + +**Action**: 1 SP PBI in cycle 6. Insert a `getent hosts mt0.google.com mt1.google.com mt2.google.com mt3.google.com tile.googleapis.com` (or equivalent for the runner's resolver) inside the api container immediately before PT-01 — fail-fast if any hostname is unresolvable after a small retry window. Closes the cycle-3 perf-harness leftover on the next run. + +**Cost**: ~20 minutes (one shell addition + one perf run to verify exit-0 + delete the leftover). Counted as 1 SP because deletion of the leftover requires a full clean run. + +### Action 2 — Implement AZ-505 (deferred AZ-503 half: inventory endpoint + HTTP/2 + Leaflet covering index, 5 SP) + +**Why**: AZ-505 is blocked-linked to AZ-503-foundation. The foundation (this cycle) shipped the deterministic identity, the integer-key UPSERT, and the per-flight layout. AZ-505 ships the user-facing inventory endpoint (`POST /api/satellite/tiles/inventory`) that lets consumers ask "given these N (lat,lon,zoom) coordinates, which tileIds + variants do you have?". AZ-505 also enables HTTP/2 on the API (required for batched inventory responses without TCP head-of-line blocking) and rewrites the Leaflet hot path against the new `location_hash` index. + +**Action**: 5 SP PBI in cycle 6. Foundation prerequisites are now in place. AZ-505 also carries the cross-repo write obligation for the gps-denied-onboard side of the UUIDv5 namespace constant. + +**Cost**: 5 SP — within the cycle cap. Foundation is done; this is the user-facing payload that justifies the schema work. + +### Action 3 — Add a `pgcrypto` pre-install check step to the deployment runbook (1 SP, ops-side) + +**Why**: Pattern 5 above. Migration 014 silently relies on `pgcrypto` being installable by the migration-running role. Stock Postgres 16 is fine; managed cloud Postgres providers may not be. F2-cy5 already captures this in `_docs/05_security/owasp_review_cycle5.md` and `deploy_cycle5.md` operator runbook step 2 — but a runbook step that says "check this before running the migration" is necessary if the project ever migrates off Docker-postgres for a non-dev environment. + +**Action**: 1 SP PBI in cycle 6 (or land as a doc-only PR within cycle 6's scope-discipline budget). Update the deployment runbook with a pre-migration `SELECT EXTNAME FROM pg_extension WHERE extname='pgcrypto'` + a fallback path if missing. + +**Cost**: ~30 minutes of doc work. Counted as 1 SP because it touches the deployment runbook (cross-cutting infra doc). + +## 8. Suggested Rule / Skill updates + +| File | Change | Rationale | +|------|--------|-----------| +| `coderule.mdc` (new bullet in scope-discipline section) | "When a migration adds a `CREATE EXTENSION` / `CREATE ROLE` / `ALTER SYSTEM` / other privilege-sensitive statement, the same cycle's deploy report MUST add a pre-migration runbook step that verifies the privilege exists in the target environment. The runbook step is required even when the cycle is otherwise small." | Pattern 5 (pgcrypto in migration 014 created a deploy-runbook gap that was caught at security-audit time, not at migration-write time) | +| `new-task/SKILL.md` (new check in Step 5 — Risks & Mitigation) | When a task spec introduces a cross-repo cryptographic invariant (a UUID namespace, a base32/64 alphabet, a tile-zoom convention, a deterministic-key formula), the spec MUST list both the in-workspace code location of the constant AND the sibling-workspace code location it must byte-match — with reference-vector tests on both sides. Doc references alone do not satisfy this. | Pattern 4 (`TileNamespace` UUID must byte-match `gps-denied-onboard/components/c6_tile_cache/_uuid.py`) | +| `autodev/protocols.md` (formalise Step-10 contradiction-driven A/B/C) | The "scope-split" branch of the scope-protection A/B/C choice should be a first-class named option, not an ad-hoc decision. When a /autodev cycle's task spec contradicts the live codebase by >=2 prerequisites, the implement skill should preferentially offer **A: implement as-written / B: defer entirely / C: split into foundation + follow-up** with C being the recommended default. Cycle 5 derived this manually; codifying it makes future cycles cheaper to navigate. | Pattern 1 (AZ-503 → AZ-503-foundation + AZ-505 split was the cycle's largest decision, made via ad-hoc Choose; the recommended option matched the documented pattern but the path was discovered, not followed) | +| `test-run/SKILL.md` (Perf Mode Step 5 — add explicit retrospective trigger) | After the "one re-run" rule fires twice across consecutive cycles with the same root-cause class (e.g., DNS, NTP, resolver), the perf-mode skill should auto-surface a recommended PBI for a deterministic fix at the environment / harness layer — not as a re-run, as a fix. Cycle 5 fired the rule manually here. | Pattern 3 (DNS cold-start hit cycle 4 + cycle 5; the perf-mode skill's "one re-run" already shields against single-incident noise, but doesn't yet escalate when noise recurs) | + +## 9. Decision items carried over (operator) + +- **Cycle-3 perf-harness leftover** — STAYS OPEN. Replay #5 entry recorded. **Half-closed** this cycle: AZ-504 script fix verified working across 2 runs; remaining half (full default-parameter exit-0 run) blocked by recurring local DNS noise. Closure path: Action 1 above (DNS pre-warm in script, 1 SP) OR move perf gate to CI/cloud runner. +- **Admin team iss/aud confirmation** (carried from cycles 3 + 4) — still required before promoting beyond `dev`. Unchanged. Tracked in `deploy_cycle3.md` + `deploy_cycle4.md` + `deploy_cycle5.md`. +- **D2-cy4 — `Microsoft.NET.Test.Sdk 17.8.0` transitive `NuGet.Frameworks` flag** — unchanged. Test-runtime exposure only; safe to land in a future cycle. +- **D-IdentityModel-7.0.3** (cycle-4 carry-over — both `Microsoft.IdentityModel.Tokens` and `System.IdentityModel.Tokens.Jwt` at 7.0.3 in TestSupport, NU1902) — unchanged. Cycle-3 Action 2 obligation; should land before any new auth-touching cycle. +- **F1-cy5 — `metadata.flightId` authenticated provenance** — long-term, not actionable until an authoritative flight registry exists in the suite. Recorded in `owasp_review_cycle5.md` and `deploy_cycle5.md` as a long-term recommendation. +- **F2-cy5 — `pgcrypto` deployment runbook gap** — Action 3 above. Trivial doc-only fix. +- **`Serilog.AspNetCore 8.0.3` fallback** — unchanged; no 10.x line published as of cycle 5. Recheck at every cycle start. +- **Cross-repo doc `suite/_docs/10_auth.md` paragraph** (cycle-3 carry-over) — unchanged. +- **`workspace:` field on cross-repo ACs in `new-task/SKILL.md`** (cycle-3 Action 3, never landed) — must land before AZ-505 task spec is written, since AZ-505 has explicit cross-repo writes (the gps-denied-onboard side of the UUIDv5 namespace constant). + +## 10. What this retro says about process maturity + +Cycle 5 is the first cycle that: + +- **Split a task spec mid-implementation into a foundation + follow-up pair** that both shipped (foundation in cycle 5, follow-up scheduled as AZ-505). The /autodev step-10 contradiction-driven A/B/C path proved itself end-to-end. +- **Carried a 3-cycle-old leftover from "completely unmeasurable" to "verified-fixed, exit-0 blocked only by infra noise"**. The AZ-504 fix is provably working; the remaining blocker is environment, not code. +- **Shipped a schema migration with a backfill, a contract minor version bump, and a new on-disk path layout — all additive, all backward-compatible with cycle-4 clients** — proving the project's documentation pipeline (test-spec sync, doc update, ripple log, security audit, deploy report) now scales to schema-touching cycles, not just runtime/SDK migrations (cycle 4) or single-file refactors (cycles 1-3). +- **Recorded 3 new infrastructure-level recommendations** (DNS pre-warm in perf script, pgcrypto pre-install runbook step, cross-repo invariant rule in new-task skill) — none of them are bugs in the cycle's code, all of them are process gaps the cycle's *work shape* surfaced. This is the second consecutive cycle where the retro's top action items are predominantly process / harness / runbook, not code defects. + +The process continues to converge. The remaining friction points after cycle 5 are (a) local-dev Docker/colima DNS noise that contaminates Step 15 (Action 1), (b) the carry-over package-hygiene PBIs from cycles 3/4 (Test.Sdk, IdentityModel.Tokens.Jwt, WithOpenApi callsites, Serilog 10.x) that have been deferred per scope discipline four cycles in a row, and (c) the cross-repo invariant codification (cycle-3 Action 3) that must land before AZ-505 writes its spec. All are concrete cycle-6 PBI candidates totalling ~12-15 SP, which is one fully-loaded normal cycle — or split across cycles 6 + 7 if AZ-505 alone fills cycle 6. diff --git a/_docs/06_metrics/structure_2026-05-12_cycle5.md b/_docs/06_metrics/structure_2026-05-12_cycle5.md new file mode 100644 index 0000000..7ab963a --- /dev/null +++ b/_docs/06_metrics/structure_2026-05-12_cycle5.md @@ -0,0 +1,98 @@ +# Structural Snapshot — 2026-05-12 (post-cycle 5, AZ-503-foundation + AZ-504) + +Cycle 5 delta against `structure_2026-05-12_cycle4.md`. Source of truth: `_docs/02_document/module-layout.md` + on-disk `*.csproj` graph + `_docs/02_document/contracts/`. + +## Projects + +| Layer | csproj | Cycle 5 delta | +|-------|--------|---------------| +| 1 (Foundation) | `SatelliteProvider.Common` | **+1 file**: `Utils/Uuidv5.cs` (NEW, 80 LoC; RFC 9562 §5.5 SHA-1 UUIDv5 + pinned `TileNamespace` GUID). No new NuGet deps; uses framework-only `System.Security.Cryptography.SHA1` + `System.Buffers.Binary.BinaryPrimitives`. | +| 1 (Foundation) | `SatelliteProvider.DataAccess` | **+1 migration**: `Migrations/014_AddTileIdentityColumns.sql` (NEW, embedded resource); **+4 properties** on `Models/TileEntity.cs` (`FlightId`, `LocationHash`, `ContentSha256`, `LegacyId`); UPSERT in `Repositories/TileRepository.cs` rewritten for the integer-key + flight-aware contract. No new NuGet deps; `pgcrypto` extension enabled by the migration script (PG-server-side, not a NuGet). | +| 1 (Foundation, shared DTO) | `SatelliteProvider.Common` (DTO sub-folder) | **+1 property**: `DTO/UavTileMetadata.FlightId` (`Guid?`, init-only, optional). Contract `uav-tile-upload.md` bumped 1.0.0 → 1.1.0 (additive). | +| 3 (Application) | `SatelliteProvider.Services.TileDownloader` | **+ImportSite**: `using SatelliteProvider.Common.Utils` (for `Uuidv5.Create`) + `using System.Security.Cryptography` (for `SHA256.HashData`) in `TileService.cs` and `UavTileUploadHandler.cs`. `UavTileUploadHandler.BuildUavTileFilePath` gains an optional `Guid? flightId` parameter. No new csproj refs. | +| 3 (Application) | `SatelliteProvider.Services.RegionProcessing` | unchanged | +| 3 (Application) | `SatelliteProvider.Services.RouteManagement` | unchanged | +| 4 (API / Entry) | `SatelliteProvider.Api` | unchanged (`Program.cs` not edited this cycle) | +| 5 (Test-Support) | `SatelliteProvider.TestSupport` | unchanged | +| 6 (Tests) | `SatelliteProvider.Tests` | **+1 test class** (`Uuidv5Tests.cs`); **+3 facts** in `UavTileFilePathTests.cs`; **+2 facts** in `UavTileUploadHandlerTests.cs`. | +| 6 (Tests) | `SatelliteProvider.IntegrationTests` | **+1 ProjectReference** (`SatelliteProvider.Common`) — so raw-SQL seeds can call `Uuidv5.Create` directly; **+2 facts** in `UavUploadTests.cs` (AC-3 multi-flight, AC-4 float-rounding); **+3 facts** in `MigrationTests.cs` (Az503 columns / index / backfill determinism); 1 fact superseded (`NewUniqueConstraintIncludesSourceColumn_AZ484_AC1` → `Az503MigrationSupersedesAz484UniqueIndex`); seed for the pre-existing `MultiSourceCoexistence_AZ484_Cycle2` test repaired to populate `location_hash`. | + +**Project count**: 9 (unchanged from cycle 4 — AZ-503-foundation adds files to existing projects, doesn't add a new csproj). + +## Cross-Project Import Edges (compile-time `ProjectReference`) + +| Edge | Count | Cycle 5 delta | +|------|-------|----------------| +| Api → {Common, DataAccess, TileDownloader, RegionProcessing, RouteManagement} | 5 | unchanged | +| TileDownloader → {Common, DataAccess} | 2 | unchanged | +| DataAccess → {Common} | 1 | unchanged | +| RegionProcessing → {Common, DataAccess} | 2 | unchanged | +| RouteManagement → {Common, DataAccess} | 2 | unchanged | +| Tests → {Api, TileDownloader, RegionProcessing, RouteManagement, Common, DataAccess, TestSupport} | 7 | unchanged | +| IntegrationTests → {TestSupport, **Common (NEW)**} | **2** | **+1** edge (motivated by AZ-503 — seeds need the production `Uuidv5.Create` algorithm to compute `location_hash` for raw-SQL inserts) | + +**Total ProjectReference edges**: **21** (cycle 4: 20). Net delta: +1 edge. + +## Source-import sites — cycle 5 delta + +| Importer | Imports from | Cycle 5 delta | +|----------|--------------|---------------| +| `SatelliteProvider.Services.TileDownloader/TileService.cs` | `SatelliteProvider.Common.Utils` (Uuidv5), `System.Security.Cryptography` (SHA256) | NEW (AZ-503 deterministic identity + content hash) | +| `SatelliteProvider.Services.TileDownloader/UavTileUploadHandler.cs` | `SatelliteProvider.Common.Utils` (Uuidv5), `System.Security.Cryptography` (SHA256) | NEW (same — UAV write path) | +| `SatelliteProvider.IntegrationTests/UavUploadTests.cs` | `SatelliteProvider.Common.Utils` (Uuidv5) | NEW (seed helper for raw-SQL inserts) | +| `SatelliteProvider.Tests/Uuidv5Tests.cs` | `SatelliteProvider.Common.Utils` | NEW (unit-test class) | +| All other source files | unchanged | — | + +**5 new source-level import lines** across 4 files; all to either the new internal `SatelliteProvider.Common.Utils.Uuidv5` utility or the framework `System.Security.Cryptography.SHA256`. **Zero new third-party imports.** + +## Graph properties + +- **Cycles in project import graph**: 0 (clean DAG — unchanged) +- **Average ProjectReferences per component**: 21 / 9 = **~2.3** (cycle 4: ~2.2). Net delta: +0.1 (one new IntegrationTests → Common edge). +- **Max in-degree**: Common (still highest — now at **7** incoming edges: Api, TileDownloader, DataAccess, RegionProcessing, RouteManagement, Tests, **IntegrationTests (new)**). Cycle 4 had Common at 6. +- **Max out-degree**: Tests (7 — unchanged). +- **TestSupport position**: leaf-of-test-subgraph; no production-layer importers (unchanged). +- **The new IntegrationTests → Common edge is justified**: integration-test seeds need the same deterministic `Uuidv5` algorithm the production code uses, otherwise the new NOT NULL `location_hash` column would force every seed to encode the SHA-1 byte order by hand. Reusing `SatelliteProvider.Common.Utils.Uuidv5` keeps the algorithm in one place (which is also where the cross-repo invariant with `gps-denied-onboard/components/c6_tile_cache/_uuid.py` lives). + +## NuGet dependency hygiene (cycle 5) + +| Package | Cycle-4 version | Cycle-5 version | Status | +|---------|-----------------|-----------------|--------| +| All NuGet packages across all 9 csproj files | unchanged | **unchanged** | **Zero NuGet bumps this cycle.** AZ-503-foundation uses framework-only types (`SHA1`, `SHA256`, `BinaryPrimitives`); AZ-504 is a 2-line shell-script edit. | +| Carry-overs (still OPEN) | Cycle-3 D2 (`Microsoft.NET.Test.Sdk 17.8.0` transitive `NuGet.Frameworks` flag), cycle-4 D4 (`Microsoft.IdentityModel.Tokens` 7.0.3 + `System.IdentityModel.Tokens.Jwt` 7.0.3 in TestSupport), `Serilog.AspNetCore` 8.0.3 fallback | unchanged | All three remain explicitly out of cycle-5 scope per `coderule.mdc` "scope discipline". | + +## Database schema surface (cycle 5 delta) + +| Object | Change | Source | +|--------|--------|--------| +| `tiles` table | **+4 columns**: `flight_id uuid NULL`, `location_hash uuid NOT NULL` (backfilled deterministically for pre-existing rows), `content_sha256 bytea NULL`, `legacy_id uuid NULL` | `014_AddTileIdentityColumns.sql` | +| `idx_tiles_unique_location_source` (AZ-484, float-based) | **DROPPED** | same | +| `idx_tiles_unique_location` (pre-AZ-484, defensive duplicate cleanup) | **DROPPED** | same | +| `idx_tiles_unique_identity` (integer-key + COALESCE(flight_id, zero-UUID)) | **CREATED** (unique) | same | +| `idx_tiles_location_hash` (location_hash lookups for the future AZ-505 inventory endpoint) | **CREATED** (non-unique) | same | +| Postgres extension `pgcrypto` | **CREATE EXTENSION IF NOT EXISTS** (used during the migration's session-scoped `pg_temp.uuidv5` PL/pgSQL function only) | same | + +The migration runs in a single transaction and is idempotent under DbUp's journal. No table or column was renamed (per `coderule.mdc` "Do not rename any database objects without confirmation"). + +## Architecture / contract surface (cycle 5 delta) + +- **Contract bumped**: `_docs/02_document/contracts/api/uav-tile-upload.md` v1.0.0 → **v1.1.0** (additive — adds optional `metadata.flightId: uuid?`; adds derived `tileId` field in the response). Backward-compatible with cycle-4 clients. +- **No new public-API contract files.** (AZ-505 will introduce `inventory.md` next cycle.) +- **New per-flight on-disk path layout** for UAV tiles: `./tiles/uav/{flightId or 'none'}/{z}/{x}/{y}.jpg`. Additive — legacy paths under `./tiles/uav/{z}/{x}/{y}.jpg` are not moved. Documented in `uav-tile-upload.md` "File-path layout" section. +- **New cross-repo invariant**: the `TileNamespace` GUID `5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c` in `SatelliteProvider.Common/Utils/Uuidv5.cs` must match the same constant in `gps-denied-onboard/components/c6_tile_cache/_uuid.py` (sibling workspace). Cycle-5 commit covers the satellite-provider side; the gps-denied-onboard side will be handled when AZ-505 / the consumer-side work runs. + +## Net Architecture delta vs cycle 4 + +- **Resolved (closed by this cycle)**: 0 architecture-level findings closed (the cycle-3 perf-harness leftover is half-closed — AZ-504 script fix is verified working, but the exit-0 deletion criterion is blocked by recurring local DNS noise; replay #5 documented). +- **Newly introduced (informational only)**: + - 2 Low informational findings in the security audit (F1-cy5 `metadata.flightId` not authenticated provenance; F2-cy5 `pgcrypto` deployment runbook gap on managed Postgres). Both are long-term recommendations, not code defects. + - 1 Low Maintainability finding in the AZ-503 code review (the `contentSha256` soft-NULL guard in `TileService.BuildTileEntity` when `File.Exists` is false — practically unreachable, defensively kept). + - 0 new Medium, 0 new High, 0 new Critical. +- **+1 cross-project edge** (IntegrationTests → Common) — justified (cross-repo invariant deduplication). +- **Contract delta**: 1 minor version bump on `uav-tile-upload.md` (1.0.0 → 1.1.0, additive). +- **Schema delta**: +1 migration (014), +4 columns, +2 indexes, -2 indexes, +1 Postgres extension. +- **Net Architecture delta**: 0 net architecture-level findings (the 3 new Lows are informational only, and the +1 edge + minor contract bump are net-neutral structural additions, not violations). + +## What this snapshot says about cycle 5's shape + +Cycle 5 is the project's first **schema-changing cycle since cycle 1's AZ-484** — a database migration with backfill, a new internal cross-component algorithm (`Uuidv5`), a contract minor bump, and a new on-disk layout for one tile source. Quantitatively it's small (4 SP delivered = 3 SP foundation + 1 SP script fix), but it lays foundational identity infrastructure that next cycle's AZ-505 (5 SP — inventory endpoint, HTTP/2, Leaflet covering index) is blocked on. The structural delta is bounded (`+1` import edge, `+1` minor contract version, +1 migration, +4 columns, +2 indexes, -2 indexes, 0 new csproj, 0 new NuGet bumps), and the DAG remains acyclic with the same 9 projects. diff --git a/_docs/LESSONS.md b/_docs/LESSONS.md index e2f7381..fbe3060 100644 --- a/_docs/LESSONS.md +++ b/_docs/LESSONS.md @@ -37,6 +37,12 @@ If the enum's wire string happens to match a member name case-insensitively (e.g ## Ring buffer (last 15 entries — newest at top) +- [2026-05-12] [architecture] Cross-repo cryptographic invariants (UUID namespaces, deterministic-key formulas, base32/64 alphabets, tile-zoom conventions) MUST live as code-level constants in BOTH repos with reference-vector tests on BOTH sides — documentation alone is insufficient because constant drift surfaces only as 100% lookup misses in production, harder to detect than a unit-test failure (cycle 5: AZ-503 introduced `TileNamespace = 5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c` which must byte-match the same constant in `gps-denied-onboard/components/c6_tile_cache/_uuid.py`; the satellite-provider side has the constant + 10 Python-generated reference vectors in `Uuidv5Tests.cs` and the sibling repo will mirror). + Source: _docs/06_metrics/retro_2026-05-12_cycle5.md +- [2026-05-12] [tooling] Local Docker/colima DNS cold-start is a recurring class of failure that contaminates the Step-15 perf gate — when the perf-mode "one re-run" rule fires twice across consecutive cycles with the same root-cause class (DNS / NTP / resolver), the harness must escalate from "re-run" to a deterministic fix at the harness layer (DNS pre-warm in script, OR move gate to CI), not just another re-run (cycle 5: PT-01 failed Run #1 on `tile.googleapis.com` cold-start, then Run #2 on `mt0.google.com` cold-start; the warmup probe between runs only touched the hostnames it explicitly named). + Source: _docs/06_metrics/retro_2026-05-12_cycle5.md +- [2026-05-12] [process] When a /autodev cycle's task spec contradicts the live codebase by ≥2 missing prerequisites, the implement skill should preferentially split into foundation + follow-up via A/B/C (option C) rather than (A) silently expand the SP budget or (B) defer the entire task — both halves remain individually shippable and individually testable, the cross-PBI dependency is captured as a blocked-link in the tracker (cycle 5: AZ-503 → AZ-503-foundation + AZ-505 split when `flight_id` / `FlightId` / `voting_status` all missing from live code; AZ-503-foundation shipped this cycle, AZ-505 blocks-on-it for cycle 6). + Source: _docs/06_metrics/retro_2026-05-12_cycle5.md - [2026-05-12] [dependencies] Major-version bumps of direct deps cascade through transitives; the task spec must list the transitive packages whose major version changes as a result OR explicitly note "transitive major-version drift not analyzed in spec" — verify with `dotnet restore --dry-run` against a scratch branch before writing the spec (cycle 4: AZ-500 surprise-bumped `Microsoft.OpenApi` 1.x → 2.x via the `Microsoft.AspNetCore.OpenApi` 8.0.25 → 10.0.7 path; forced an unscheduled Swashbuckle bump + Program.cs refactor mid-implementation). Source: _docs/06_metrics/retro_2026-05-12_cycle4.md - [2026-05-12] [process] When a scope-protected task newly *exposes* a pre-existing bug elsewhere in the codebase (vs. introducing a new one), surface it as a recommended follow-up PBI in the batch report AND list it as a "newly exposed bug" separate from "newly introduced findings" in the deploy report — bugs that already existed don't count as cycle-introduced regressions, but they must not be silently re-buried (cycle 4: AZ-500's bootstrap fix unmasked the pre-existing `scripts/run-performance-tests.sh:417` `grep -o | wc -l` + `pipefail` bug).