retro_2026-05-12_cycle5.md captures the cycle-end retrospective:
- Implementation: 2 tasks (AZ-504 + AZ-503-foundation), 4 SP total, 100% first-attempt pass rate, 1 mid-implement scope split (AZ-503 → AZ-503-foundation + AZ-505, blocked-linked).
- Quality: 50/50 PASS/PASS_WITH_WARNINGS, 0 new Medium+, 1 new Low (defensive contentSha256 soft-NULL guard).
- Security: PASS_WITH_WARNINGS, 0 new Critical/High/Medium, 2 new Low informational (F1 flightId provenance, F2 pgcrypto runbook gap).
- Performance: PASS_WITH_INFRA_WARNINGS. First measurable PT-08 ever (Run #1 199 ms, Run #2 117 ms vs. the 2000 ms threshold); PT-01/02 failed on the recurring local Docker/colima DNS cold-start, not an app regression.
- Structural: +1 ProjectReference edge (IntegrationTests → Common), +1 minor contract bump (uav-tile-upload 1.0.0 → 1.1.0), +1 DB migration (014_AddTileIdentityColumns.sql), 0 NuGet bumps, 0 csproj additions, DAG still acyclic at 9 projects.

structure_2026-05-12_cycle5.md captures the structural snapshot.

LESSONS.md updated with 3 cycle-5 entries (oldest dropped to preserve the 15-entry ring buffer):
- [architecture] Cross-repo cryptographic invariants must live as code constants in both repos with reference-vector tests.
- [tooling] When perf-mode "one re-run" fires twice with the same DNS root cause, escalate from re-run to harness fix.
- [process] Spec contradicts live code by ≥2 prerequisites → prefer split into foundation + follow-up (A/B/C option C).

Top 3 follow-up actions (cycle 6 candidates):
- Action 1 (1 SP): DNS pre-warm in scripts/run-performance-tests.sh → closes the cycle-3 perf-harness leftover.
- Action 2 (5 SP): AZ-505: inventory endpoint + HTTP/2 + Leaflet covering index (blocked-linked on AZ-503-foundation, this cycle).
- Action 3 (1 SP): pgcrypto pre-install runbook step (F2-cy5 doc fix).

Cycle 5 closed. Autodev state will be advanced for cycle 6 by the next /autodev invocation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Engineering Lessons
Recurring bugs, surprising library behaviors, and process insights extracted from completed cycles. Newest at the top. Keep entries short — this is for fast scanning at the start of new cycles, not exhaustive history.
This file has two layers:
- Deep engineering lessons (L-NNN): library bugs, architectural insights, multi-paragraph context. Persist forever.
- Ring buffer at the bottom: the 15 most recent single-sentence lessons emitted by the retrospective skill, consumed by new-task/plan/decompose/autodev Step 0. New entries go on top; the oldest drop off the bottom.
Categories: estimation · architecture · testing · dependencies · tooling · process
L-001 — Dapper TypeHandler<T> is bypassed for enum types during read deserialization
Cycle: 1 (AZ-484)
Discovered by: integration test failure (Error parsing column 12 (source=google_maps - String)); root-caused via web search to long-standing Dapper issue #259.
Affects: Dapper 2.1.35 (and most other versions until the proposed Settings.PreferTypeHandlersForEnums opt-in in PR #2200, not yet merged).
What happens
Registering SqlMapper.AddTypeHandler(new MyEnumHandler()) for an enum type, even via SqlMapper.TypeHandler<TEnum>, works for writes (the handler's SetValue is invoked for parameter binding) but is silently bypassed for reads. Dapper's IL-emitted deserializer checks IsEnum first and, for string columns, falls back to Enum.Parse(type, value, ignoreCase: true).
Why this is dangerous
If the enum's wire string happens to match a member name case-insensitively (e.g., RegionStatus.Failed ↔ "failed"), the bypass goes unnoticed and the round-trip works by accident. The bug only surfaces when the wire format diverges from the C# member name. For example, with TileSource.GoogleMaps ↔ "google_maps", no member name matches "google_maps" because of the underscore. The parse therefore throws, and Dapper reports it as a column-parsing error.
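The following is a minimal Python analogue of case-insensitive member-name parsing. It shows why the accidental round-trip hides the problem. Python's Enum stands in for the C# enums, and parse_by_member_name is a hypothetical helper, not Dapper code. It returns None where a name-based parser would fail.

```python
from enum import Enum


class RegionStatus(Enum):
    Failed = 1      # wire value "failed": matches the member name case-insensitively


class TileSource(Enum):
    GoogleMaps = 1  # wire value "google_maps": does NOT match the member name


def parse_by_member_name(enum_type, wire_value):
    """Case-insensitive lookup by member NAME only; no custom handler is consulted.

    Returns None where a name-based parser would fail.
    """
    for member in enum_type:
        if member.name.lower() == wire_value.lower():
            return member
    return None


print(parse_by_member_name(RegionStatus, "failed"))     # RegionStatus.Failed
print(parse_by_member_name(TileSource, "google_maps"))  # None
```

Any custom mapping registered elsewhere is simply never consulted on this path. That is why the "failed" case passes and tells you nothing about the handler.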
Recommended approach
- Do not rely on SqlMapper.TypeHandler<TEnum> for read-side enum mapping unless the wire values match the enum member names case-insensitively.
- For enums whose wire format diverges (snake_case, kebab-case, custom IDs), store the entity field as string and provide an explicit converter (*Converter.ToWireValue/FromWireValue) for use at the service-layer boundary. This is what AZ-484 does for TileEntity.Source ↔ TileSourceConverter.
- Unit-test the converter directly. Do not assume that round-tripping through Dapper proves anything for enums.
Detection
- Unit tests of the type handler in isolation will pass even when the handler is bypassed at runtime.
- Failure surfaces only at integration-test time when the actual SELECT runs.
- If you must keep an enum-typed field, write at minimum one integration test that reads the enum back through Dapper from a real database row.
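Here is a minimal shape for that integration test, sketched in Python. The standard-library sqlite3 module stands in for Dapper + Postgres, and the table, column, and helper names are hypothetical. The test writes a row, reads the raw column back through a real SELECT, and only then converts it. It also clears the table first so rows left by earlier runs cannot mask a failure.

```python
import sqlite3
from enum import Enum


class TileSource(Enum):
    GoogleMaps = 1


# Minimal stand-in for the explicit converter (hypothetical table).
TO_WIRE = {TileSource.GoogleMaps: "google_maps"}
FROM_WIRE = {wire: member for member, wire in TO_WIRE.items()}


def open_test_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for the real Postgres test database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tiles (id INTEGER PRIMARY KEY, source TEXT NOT NULL)")
    return conn


def read_back(conn: sqlite3.Connection, source: TileSource) -> TileSource:
    """Write one row, then SELECT it and convert: the read path is what must be proven."""
    conn.execute("DELETE FROM tiles")  # explicit reset so earlier rows cannot interfere
    conn.execute("INSERT INTO tiles (source) VALUES (?)", (TO_WIRE[source],))
    (wire,) = conn.execute("SELECT source FROM tiles").fetchone()
    return FROM_WIRE[wire]
```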
Ring buffer (last 15 entries — newest at top)
- [2026-05-12] [architecture] Cross-repo cryptographic invariants (UUID namespaces, deterministic-key formulas, base32/64 alphabets, tile-zoom conventions) MUST live as code-level constants in BOTH repos with reference-vector tests on BOTH sides. Documentation alone is insufficient because constant drift surfaces only as 100% lookup misses in production, which are harder to detect than a unit-test failure (cycle 5: AZ-503 introduced TileNamespace = 5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c, which must byte-match the same constant in gps-denied-onboard/components/c6_tile_cache/_uuid.py; the satellite-provider side has the constant + 10 Python-generated reference vectors in Uuidv5Tests.cs, and the sibling repo will mirror them). Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
- [2026-05-12] [tooling] Local Docker/colima DNS cold-start is a recurring class of failure that contaminates the Step-15 perf gate. When the perf-mode "one re-run" rule fires twice across consecutive cycles with the same root-cause class (DNS / NTP / resolver), the harness must escalate from "re-run" to a deterministic fix at the harness layer (DNS pre-warm in the script, OR move the gate to CI), not just another re-run (cycle 5: PT-01 failed Run #1 on tile.googleapis.com cold-start, then Run #2 on mt0.google.com cold-start; the warmup probe between runs only touched the hostnames it explicitly named). Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
- [2026-05-12] [process] When a /autodev cycle's task spec contradicts the live codebase by ≥2 missing prerequisites, the implement skill should prefer to split into foundation + follow-up via A/B/C (option C) rather than (A) silently expand the SP budget or (B) defer the entire task. Both halves remain individually shippable and individually testable, and the cross-PBI dependency is captured as a blocked-link in the tracker (cycle 5: AZ-503 → AZ-503-foundation + AZ-505 split when flight_id/FlightId/voting_status were all missing from live code; AZ-503-foundation shipped this cycle, and AZ-505 blocks on it for cycle 6). Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
- [2026-05-12] [dependencies] Major-version bumps of direct deps cascade through transitives. The task spec must list the transitive packages whose major version changes as a result OR explicitly note "transitive major-version drift not analyzed in spec". Verify with dotnet restore --dry-run against a scratch branch before writing the spec (cycle 4: AZ-500 surprise-bumped Microsoft.OpenApi 1.x → 2.x via the Microsoft.AspNetCore.OpenApi 8.0.25 → 10.0.7 path, which forced an unscheduled Swashbuckle bump + Program.cs refactor mid-implementation). Source: _docs/06_metrics/retro_2026-05-12_cycle4.md
- [2026-05-12] [process] When a scope-protected task newly exposes a pre-existing bug elsewhere in the codebase (vs. introducing a new one), surface it as a recommended follow-up PBI in the batch report. In the deploy report, list it as a "newly exposed bug", separate from "newly introduced findings". Bugs that already existed don't count as cycle-introduced regressions, but they must not be silently re-buried (cycle 4: AZ-500's bootstrap fix unmasked the pre-existing scripts/run-performance-tests.sh:417 grep -o | wc -l + pipefail bug). Source: _docs/06_metrics/retro_2026-05-12_cycle4.md
- [2026-05-12] [process] When a cycle has a single non-functional task (migration / refactor / dependency hygiene), the retro must reframe the metric set around three things: continuity (0 regressions), forward-resolution (prior findings closed by the bump itself), and unblocking (capabilities now exercisable end-to-end). Task count + complexity points read as misleading flatlines that look like under-productivity (cycle 4: AZ-500 alone delivered 5 SP vs. cycle 3's 18 SP, but the cycle's value was forward-resolving 2 cycle-3 advisories and finally executing PT-01..PT-08 end-to-end against the migrated build). Source: _docs/06_metrics/retro_2026-05-12_cycle4.md
- [2026-05-12] [process] For cross-team blockers (admin team must supply config values, etc.), prefer an Option-B forcing function (ship the validation/scaffolding with prod-empty config that fails-fast at deploy) over deferring the entire task — the fail-fast contract makes the cross-team conversation impossible to skip and ships the in-workspace work in the current cycle (cycle 3: AZ-494 shipped iss/aud validation with empty prod appsettings so deploy must supply real values). Source: _docs/06_metrics/retro_2026-05-12_cycle3.md
- [2026-05-12] [process] ACs that prescribe a specific measurement or sentinel mechanism (e.g. "per-item latency < 50ms", "guard fires when DB name contains _test") should also prescribe — or explicitly defer — the path for collecting / enforcing it, or implementations will substitute proxies / equivalents that look like spec drift in review (cycle 3: AZ-492 PT-08 per-item gate cost became a derived proxy; AZ-493 DB-name guard became Host-allowlist). Source: _docs/06_metrics/retro_2026-05-12_cycle3.md
- [2026-05-12] [process] ACs that require cross-repo writes should be tagged with the target workspace and rendered separately in the traceability matrix. Mixing them with in-workspace ACs makes "correctly deferred" indistinguishable from "incomplete work" (cycle 3: AZ-494 AC-7 was deferred for the suite-repo write; the matrix renders it as ◐ deferred, which is ambiguous). Source: _docs/06_metrics/retro_2026-05-12_cycle3.md
- [2026-05-11] [testing] Test helpers shared across unit + integration projects must live in one consolidated location. Duplicate near-identical copies will diverge and require parallel fixes (cycle 2: JwtTokenFactory.cs and JwtTestHelpers.cs had the same Expires < NotBefore bug, fixed in separate commits). Source: _docs/06_metrics/retro_2026-05-11_cycle2.md
- [2026-05-11] [process] Deferred-status NFR entries are allowed at most ONCE per NFR. If a Deferred NFR has not landed by the end of the cycle that follows the one in which it was deferred, the harness work must be promoted to a real PBI before any new NFR is accepted as Deferred (cycle 2 inherited cycle 1's PT-07 + added PT-08 + JWT-attach script-rot). Source: _docs/06_metrics/retro_2026-05-11_cycle2.md
- [2026-05-11] [testing] Integration tests must explicitly reset DB state at startup. Relying on wallclock seeds or "tests probably won't collide" is a workaround, not isolation, and the persistent Postgres volume in docker-compose makes test-data accumulation the default state (cycle 2: the UavUploadTests._coordinateCounter collision was patched with a wallclock seed instead of a real DB-reset hook). Source: _docs/06_metrics/retro_2026-05-11_cycle2.md
- [2026-05-11] [testing] Persisted enums need a Dapper read-roundtrip integration test; unit-testing the type handler in isolation does not prove read-side behavior (see L-001). Source: _docs/06_metrics/retro_2026-05-11.md
- [2026-05-11] [process] NFR test-spec additions must include the runner-script implementation in the same step, or be tagged "Deferred — harness work tracked in "; otherwise scenarios accumulate as Unverified across cycles. Source: _docs/06_metrics/retro_2026-05-11.md
- [2026-05-11] [estimation] Task-spec test-site-count estimates must be backed by an explicit grep evidence block, not pattern-matched against neighboring code (AZ-484 spec said ~3 sites in RegionServiceTests; actual = 0). Source: _docs/06_metrics/retro_2026-05-11.md
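The cycle-5 [architecture] entry on cross-repo cryptographic invariants calls for a reference-vector test, which might look like the Python sketch below. TILE_NAMESPACE is the value quoted in that entry. The "z/x/y" name format in tile_uuid and the helper names are hypothetical, since the real key formula lives in AZ-503 and _uuid.py. The one pinned vector is the uuid5(NAMESPACE_DNS, "python.org") example from the Python uuid documentation; it only proves the harness itself. The project's own vectors would be generated once and copied verbatim into both repos.

```python
import uuid

# Value quoted in the cycle-5 lesson; must byte-match the sibling repo's constant.
TILE_NAMESPACE = uuid.UUID("5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c")


def tile_uuid(zoom: int, x: int, y: int) -> uuid.UUID:
    # Hypothetical name format; the real deterministic-key formula is NOT defined here.
    return uuid.uuid5(TILE_NAMESPACE, f"{zoom}/{x}/{y}")


def vector_mismatches(fn, vectors):
    """Return (args, expected, actual) for every pinned vector whose output differs.

    Both repos run this against the SAME pinned table, so constant or formula
    drift on either side fails a unit test instead of causing lookup misses.
    """
    mismatches = []
    for args, expected in vectors:
        actual = str(fn(*args))
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches


# Sanity vector from the Python uuid documentation (checks the UUIDv5 machinery).
STDLIB_VECTORS = [(("python.org",), "886313e1-3b8a-5372-9b90-0c9aee199e5d")]
```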
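For the cycle-5 [tooling] entry, the deterministic harness fix could pre-resolve every hostname the perf scenarios touch before the first measured request. The real change belongs in scripts/run-performance-tests.sh. This Python sketch only shows the logic, and hosts_from_urls/prewarm_dns are hypothetical names. Deriving the host list from the scenario URLs, rather than naming hosts by hand, targets the cycle-5 failure mode in which the warmup probe only touched the hostnames it explicitly named.

```python
import socket
import time
from urllib.parse import urlsplit


def hosts_from_urls(urls):
    """Unique hostnames, in first-seen order, for every URL the perf run will hit."""
    seen = []
    for url in urls:
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen


def prewarm_dns(hosts, attempts=5, delay_s=1.0, resolve=socket.getaddrinfo):
    """Resolve each host until it succeeds or attempts run out.

    Returns the hosts that never resolved, so the caller can abort the run
    up front instead of letting a cold resolver fail a measured scenario.
    """
    unresolved = []
    for host in hosts:
        for attempt in range(attempts):
            try:
                resolve(host, 443)
                break
            except OSError:  # socket.gaierror and timeouts are both OSError subclasses
                if attempt + 1 < attempts:
                    time.sleep(delay_s)
        else:
            unresolved.append(host)
    return unresolved
```

Failing fast on a non-empty return value turns a DNS cold-start into an explicit harness error instead of a contaminated PT-01 measurement. The injectable resolve parameter keeps the logic testable without network access.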
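For the cycle-4 [dependencies] entry, comparing package snapshots from the current branch and a scratch branch makes transitive major-version drift explicit. The sketch assumes both snapshots have already been parsed into name → version maps, for example from dotnet list package --include-transitive output or from packages.lock.json. That parsing step is omitted, and the function names are hypothetical.

```python
def major(version: str) -> int:
    """Leading integer of a NuGet-style version, e.g. "10.0.7" -> 10, "2.0.0-preview.1" -> 2."""
    return int(version.split(".", 1)[0].split("-", 1)[0])


def major_drift(before: dict, after: dict) -> dict:
    """Packages present in both snapshots whose major version changed: name -> (old, new)."""
    return {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if major(before[name]) != major(after[name])
    }
```

Any non-empty result is the list the task spec should name. On the cycle-4 path, a transitive 1.x → 2.x change like Microsoft.OpenApi's would show up here next to the direct Microsoft.AspNetCore.OpenApi bump.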
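The cycle-4 entry mentions the pre-existing grep -o | wc -l + pipefail bug. Below is a hypothetical reproduction of that bug class (not the actual line 417), driven from Python; it requires bash on PATH. With set -o pipefail, grep exits 1 when it finds no matches. The whole count pipeline therefore fails, and set -e aborts the script even though wc -l would have printed 0.

```python
import subprocess

# Hypothetical reproduction of the bug class, not the real line 417.
# Zero matches -> grep exits 1 -> with pipefail the pipeline exits 1
# -> set -e aborts on the assignment before "count=" is printed.
BUGGY = 'set -euo pipefail; n=$(echo alpha | grep -o zzz | wc -l); echo "count=$n"'

# Fix that keeps per-match counting: treat "no matches" as success for grep only.
FIXED = 'set -euo pipefail; n=$( { echo alpha | grep -o zzz || true; } | wc -l); echo "count=$n"'


def run_bash(script: str) -> subprocess.CompletedProcess:
    """Run a bash snippet and capture its exit status and output."""
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)


for name, script in (("buggy", BUGGY), ("fixed", FIXED)):
    result = run_bash(script)
    print(name, result.returncode, result.stdout.strip())
```

Switching to grep -c is not a drop-in fix: it counts matching lines rather than matches, and it still exits 1 when the count is 0. Note also that || true hides grep's exit status 2 (a real error such as an unreadable file). If that matters, test for status 1 explicitly.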
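For the cycle-3 entry on cross-team blockers, the Option-B forcing function amounts to validating required settings at startup, so that an empty production value stops the deploy instead of failing the first request. This Python sketch uses a plain dict in place of appsettings/environment configuration. JWT_ISSUER, JWT_AUDIENCE, and the function and class names are hypothetical.

```python
from dataclasses import dataclass


class ConfigError(RuntimeError):
    """Raised at startup when required configuration is missing."""


@dataclass(frozen=True)
class JwtSettings:
    # Hypothetical fields mirroring the iss/aud validation shipped in AZ-494.
    issuer: str
    audience: str


def load_jwt_settings(env: dict) -> JwtSettings:
    """Fail at startup, not at first request, when required values are empty."""
    settings = JwtSettings(
        issuer=env.get("JWT_ISSUER", "").strip(),
        audience=env.get("JWT_AUDIENCE", "").strip(),
    )
    missing = [name for name, value in vars(settings).items() if not value]
    if missing:
        raise ConfigError(f"missing required JWT settings: {', '.join(missing)}")
    return settings
```

Shipping prod config with these values deliberately empty means the deploy cannot succeed until the admin team supplies them. That is the conversation the entry wants to force.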
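For the cycle-2 [testing] entry on shared test helpers, one consolidated claims helper can make the Expires < NotBefore bug structurally impossible by deriving exp from nbf. The sketch is Python rather than the project's C#, and make_test_claims is a hypothetical name. The claim names sub, iat, nbf, and exp are the registered JWT claims.

```python
from datetime import datetime, timedelta, timezone


def make_test_claims(subject, lifetime=timedelta(minutes=5), not_before=None, now=None):
    """Single shared helper for test JWT claims (unit + integration tests).

    exp is always computed from nbf, so the two can only be out of order if
    the lifetime itself is non-positive, which is rejected here once.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    nbf = not_before if not_before is not None else now
    exp = nbf + lifetime
    if exp <= nbf:
        raise ValueError("token would expire before (or when) it becomes valid")
    return {
        "sub": subject,
        "iat": int(now.timestamp()),
        "nbf": int(nbf.timestamp()),
        "exp": int(exp.timestamp()),
    }
```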