Files
satellite-provider/_docs/LESSONS.md
T
Oleksandr Bezdieniezhnykh 62d6b8310a
ci/woodpecker/push/01-test Pipeline was successful
ci/woodpecker/push/02-build-push Pipeline was successful
[AZ-808] [AZ-809] [AZ-810] [AZ-811] [AZ-812] Cycle 8 retro + close
Cycle 8 retrospective (cycle-end mode): 5 tickets shipped, 17 SP, 4
batches across 1 cycle theme (strict input validation for the 4
AZ-795 child endpoints + AZ-812 region API field rename).

Artifacts:
- _docs/06_metrics/retro_2026-05-23_cycle8.md
- _docs/06_metrics/structure_2026-05-23_cycle8.md (gap-filled; last
  structural snapshot was cycle 5)

Key cycle-8 findings (now in _docs/LESSONS.md ring buffer):
- Step-14 security-audit Medium findings under the small-fix
  threshold should be resolved in-cycle, not deferred (F-AZ809-1
  closed in commit 8fca6e0, ~30 min from discovery to fix).
- Retro recommendations ship end-to-end when they name concrete
  tickets/files + size as a coherent cycle theme (cycle 7 Action 3
  -> cycle 8 strict-validation slate, first end-to-end traceable
  cross-cycle improvement action in project history).
- Contract wire-format updates (new required field / rename) need a
  ripgrep probe across all consumer paths (perf script, probe
  scripts, README, deploy docs, OpenAPI examples) — partial syncs
  surface at Step 15 perf gate (PT-06 missed AZ-809 requestMaps +
  createTilesZip, fixed in commit 32bc5c1).

Carry-overs to cycle 9: track PT-07 cache-pollution false positive
(harness, not regression), reduce 3 cycles in a row of misleading
"PT-07 fails on warm/cold ratio" entries in the perf report.

Marks Step 17 completed; cycle 8 closed. Next /autodev invocation
starts cycle 9 from Step 0.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-23 17:17:03 +03:00

14 KiB

Engineering Lessons

Recurring bugs, surprising library behaviors, and process insights extracted from completed cycles. Newest at the top. Keep entries short — this is for fast scanning at the start of new cycles, not exhaustive history.

This file has two layers:

  • Deep engineering lessons (L-NNN): library bugs, architectural insights, multi-paragraph context. Persist forever.
  • Ring buffer at the bottom: most recent 15 single-sentence lessons emitted by the retrospective skill, consumed by new-task / plan / decompose / autodev Step 0. Oldest entries drop off the top.

Categories: estimation · architecture · testing · dependencies · tooling · process


L-001 — Dapper TypeHandler<T> is bypassed for enum types during read deserialization

Cycle: 1 (AZ-484) Discovered by: integration test failure (Error parsing column 12 (source=google_maps - String)); root-caused via web search to long-standing Dapper issue #259. Affects: Dapper 2.1.35 (and most other versions until the proposed Settings.PreferTypeHandlersForEnums opt-in in PR #2200, not yet merged).

What happens Registering SqlMapper.AddTypeHandler(new MyEnumHandler()) for an enum type — even via SqlMapper.TypeHandler<TEnum> — works for writes (the handler's SetValue is invoked for parameter binding) but is silently bypassed for reads. Dapper's IL-emitted deserializer checks IsEnum first and falls back to Enum.TryParse(string, ignoreCase: true).

Why this is dangerous If the enum's wire string happens to match a member name case-insensitively (e.g., RegionStatus.Failed"failed"), the bypass goes unnoticed and round-trip works accidentally. The bug only surfaces when the wire format diverges from the C# member name (e.g., TileSource.GoogleMaps"google_maps"Enum.TryParse("google_maps") does not match GoogleMaps because of the underscore).

Recommended approach

  • Do not rely on SqlMapper.TypeHandler<TEnum> for read-side enum mapping unless the wire values match the enum member names case-insensitively.
  • For enums whose wire format diverges (snake_case, kebab-case, custom IDs), store the entity field as string and provide an explicit converter (*Converter.ToWireValue / FromWireValue) for use at the service-layer boundary. This is what AZ-484 does for TileEntity.SourceTileSourceConverter.
  • Unit-test the converter directly. Do not assume that round-tripping through Dapper proves anything for enums.

Detection

  • Unit tests of the type handler in isolation will pass even when the handler is bypassed at runtime.
  • Failure surfaces only at integration-test time when the actual SELECT runs.
  • If you must keep an enum-typed field, write at minimum one integration test that reads the enum back through Dapper from a real database row.

Ring buffer (last 15 entries — newest at top)

  • [2026-05-23] [process] Step-14 security-audit Medium findings whose remediation fits the small-fix threshold (≤2 SP, ≤2 files, ≤1 contract bump) should be resolved within the same autodev invocation rather than deferred to cycle N+1 — the fix lands with the same commit chain that introduced the surface, the contract version reflects the fix immediately, and the traceability matrix and blackbox-tests.md sub-cases are written while the finding is fresh; codify the option as a first-class A/B/C choice in .cursor/skills/autodev/flows/existing-code.md Step-14 action (cycle 8: F-AZ809-1 unbounded geofences.polygons DoS — discovered in commit ac40a8b, resolved in commit 8fca6e0 with MaxPolygons = 50 cap + unit + integration test + route-creation.md v1.0.1 patch bump, ~30 minutes from finding to fix landed). Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
  • [2026-05-23] [process] Retrospective recommendations ship end-to-end in the next cycle only when they (a) name concrete tracker tickets / files / endpoints in the action text, (b) are sized as a coherent cycle theme rather than scattered one-off fixes, and (c) the next cycle's planning phase pulls the recommendation directly into the task slate without re-deriving scope — phrase recommendations to satisfy all three or they become multi-cycle carry-overs (cycle 7 Action 3 named the 4 AZ-795 child endpoints + the SP sizing → cycle 8 shipped AZ-808 + AZ-809 + AZ-810 + AZ-811 + AZ-812 as the coherent strict-validation theme, first directly-traceable cross-cycle improvement-action end-to-end in project history). Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
  • [2026-05-23] [tooling] When a contract introduces a new required field or renames a wire key, the test-spec sync step must ripgrep all consumer paths (scripts/run-performance-tests.sh, scripts/probe_*.sh, README.md example URLs, _docs/04_deploy/*.md example bodies, _docs/02_document/tests/blackbox-tests.md trigger excerpts, OpenAPI examples) and verify they reference the new shape — otherwise partial syncs surface at Step 15 Performance Test gate or worse in production (cycle 8: AZ-812 lat/lon rename updated the perf script but AZ-809's newly-required requestMaps + createTilesZip fields did not propagate to PT-06; caught at Step 15 with a 1-line script fix in commit 32bc5c1 — a static-check probe at batch closure would have caught it 2 days earlier). Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
  • [2026-05-23] [process] When verifying a "no-regression" AC for an input-validation change ("AZ-NNN does not break existing tests"), the only sound evidence is a green integration-test run — tracing fixture variables back to their generators in source is insufficient because helpers can produce values outside the new bounds and previously slipped through silently when no validator existed; document the standard as "verified by reading source" → unconfirmed, "verified by full test run" → confirmed, and gate the batch report's AC table on the latter before the implement skill closes the batch (cycle 8: AZ-810 batch_04 AC-9 claimed "no AZ-488 regression" based on tracing latitude = coord.Latitude in test source, but NextTestCoordinate seeded by (Ticks/TicksPerSecond) % 1_000_000 produced lat far above 90° at runtime; the false-PASS only surfaced at autodev Step 11 when the integration test run returned HTTP 400 from the new validator on the AZ-488 happy path). Source: _docs/03_implementation/batch_04_cycle8_report.md (AC-9 row)
  • [2026-05-22] [process] When the implement skill ships a cycle's batch commit without writing _docs/03_implementation/implementation_report_*_cycle{N}.md, downstream skills (test-spec cycle-update, document task mode, retrospective Step 1) must fall back to reading the cycle's task specs in _docs/02_tasks/done/ plus the commit body via git log --grep='[AZ-...]' — codify the fallback in those skills' instructions instead of leaving it as per-cycle improvisation, because the implicit contract between Step 10 and Steps 11-17 broke silently this cycle and only succeeded because every downstream skill happened to be robust enough to substitute (cycle 7: AZ-794+AZ-795+AZ-796 shipped as commit 865dfdb with no report artifact; doc-skill auto-walked the diff, test-spec read the task specs, retrospective wrote from the deploy + security + perf reports — all worked, but the contract was never formal). Status: closed by cycle 8 — the implement skill self-corrected and produced both per-batch reports AND a consolidated implementation report AND a completeness report AND a cumulative cross-batch review. Source: _docs/06_metrics/retro_2026-05-22_cycle7.md
  • [2026-05-22] [testing] When a strict-validation layer ships (JsonSerializerOptions.UnmappedMemberHandling.Disallow, FluentValidation rules, explicit DTO [JsonRequired]), expect the project's own integration tests to surface latent bugs the prior lenient defaults had been masking — silent PascalCase fallback property names, out-of-range fixture coordinates, wrong-cased JSON keys; correct them in the same PR or the test suite goes red and the strict layer looks like a regression instead of the bug-finder it is (cycle 7: IdempotentPostTests.RoutePoint had been posting {"Lat":...} against a [JsonPropertyName("lat")] DTO for months; the new strict deserializer caught it and the 2-line payload fix landed alongside the strict layer; cycle 8 instance: UavUploadTests.NextTestCoordinate produced lat > 90°, caught by AZ-810 validator, 2-file clamp fix in batch 4). Source: _docs/06_metrics/retro_2026-05-22_cycle7.md
  • [2026-05-22] [architecture] Contract MAJOR bumps for projects with ≤2 known consumers should ship without a wire-format adapter — the cost of maintaining a dual-accepting endpoint outweighs the benefit when the operator runbook can coordinate the consumer cut-over directly; only invest in an adapter when consumer count or release-cadence asymmetry makes coordinated cut-over impractical (cycle 7: tile-inventory.md 1.0.0 → 2.0.0 renamed tileZoom/tileX/tileY → z/x/y and shipped breaking; the only consumer is gps-denied-onboard and the AZ-777 follow-up loop handled the coordination via the operator runbook in deploy_cycle7.md). Source: _docs/06_metrics/retro_2026-05-22_cycle7.md
  • [2026-05-12] [tooling] Kestrel HttpProtocols.Http1AndHttp2 silently serves only HTTP/1.1 over a plaintext listener — ALPN requires TLS, so any "enable HTTP/2" task without TLS in its definition-of-done will downgrade transparently and the only log line is at INFO; tasks that mention HTTP/2 / h2 / multiplexing / ALPN MUST resolve the TLS-vs-h2c choice at spec-write time and the test gate MUST assert HttpVersion == 2.0 not just a 200 (cycle 6: AZ-505 AC-5 first landed on h2c plaintext, required a post-merge TLS+ALPN pivot with dev-cert plumbing across compose/tests/perf/docs). Source: _docs/06_metrics/retro_2026-05-12_cycle6.md
  • [2026-05-12] [testing] When a test bypasses Dapper to gain access to a feature Dapper doesn't expose (e.g. ANY($1::uuid[]) array params, raw NpgsqlCommand for performance fixtures), the test owns the Npgsql type-conversion contract that Dapper used to handle silently — DateTime.Kind=Utc must be converted to Unspecified before binding into a timestamp without time zone column (cycle 6: AZ-505 introduced two Dapper-bypassing paths and all three new test files hit the same Cannot write DateTime with Kind=UTC error until DateTime.SpecifyKind(..., Unspecified) was added at the bind sites). Source: _docs/06_metrics/retro_2026-05-12_cycle6.md
  • [2026-05-12] [testing] Tests that assert specific schema artifact names (idx_<name> / pk_<name> / fk_<name>) need cross-migration awareness — phrase assertions at the capability abstraction level ("any index whose first column is location_hash") rather than the artifact-name level when possible, otherwise drop/rename migrations require fixture co-updates in the same PR (cycle 6: MigrationTests.Az503NewUniqueIndexCoversIntegerKeyAndFlightId hardcoded idx_tiles_location_hash from migration 014; migration 015 dropped it, broke the assertion until broadened to accept either index name). Source: _docs/06_metrics/retro_2026-05-12_cycle6.md
  • [2026-05-12] [architecture] Cross-repo cryptographic invariants (UUID namespaces, deterministic-key formulas, base32/64 alphabets, tile-zoom conventions) MUST live as code-level constants in BOTH repos with reference-vector tests on BOTH sides — documentation alone is insufficient because constant drift surfaces only as 100% lookup misses in production, harder to detect than a unit-test failure (cycle 5: AZ-503 introduced TileNamespace = 5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c which must byte-match the same constant in gps-denied-onboard/components/c6_tile_cache/_uuid.py; the satellite-provider side has the constant + 10 Python-generated reference vectors in Uuidv5Tests.cs and the sibling repo will mirror). Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
  • [2026-05-12] [tooling] Local Docker/colima DNS cold-start is a recurring class of failure that contaminates the Step-15 perf gate — when the perf-mode "one re-run" rule fires twice across consecutive cycles with the same root-cause class (DNS / NTP / resolver), the harness must escalate from "re-run" to a deterministic fix at the harness layer (DNS pre-warm in script, OR move gate to CI), not just another re-run (cycle 5: PT-01 failed Run #1 on tile.googleapis.com cold-start, then Run #2 on mt0.google.com cold-start; the warmup probe between runs only touched the hostnames it explicitly named). Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
  • [2026-05-12] [process] When a /autodev cycle's task spec contradicts the live codebase by ≥2 missing prerequisites, the implement skill should preferentially split into foundation + follow-up via A/B/C (option C) rather than (A) silently expand the SP budget or (B) defer the entire task — both halves remain individually shippable and individually testable, the cross-PBI dependency is captured as a blocked-link in the tracker (cycle 5: AZ-503 → AZ-503-foundation + AZ-505 split when flight_id / FlightId / voting_status all missing from live code; AZ-503-foundation shipped this cycle, AZ-505 blocks-on-it for cycle 6). Source: _docs/06_metrics/retro_2026-05-12_cycle5.md
  • [2026-05-12] [dependencies] Major-version bumps of direct deps cascade through transitives; the task spec must list the transitive packages whose major version changes as a result OR explicitly note "transitive major-version drift not analyzed in spec" — verify with dotnet restore --dry-run against a scratch branch before writing the spec (cycle 4: AZ-500 surprise-bumped Microsoft.OpenApi 1.x → 2.x via the Microsoft.AspNetCore.OpenApi 8.0.25 → 10.0.7 path; forced an unscheduled Swashbuckle bump + Program.cs refactor mid-implementation). Source: _docs/06_metrics/retro_2026-05-12_cycle4.md
  • [2026-05-12] [process] When a scope-protected task newly exposes a pre-existing bug elsewhere in the codebase (vs. introducing a new one), surface it as a recommended follow-up PBI in the batch report AND list it as a "newly exposed bug" separate from "newly introduced findings" in the deploy report — bugs that already existed don't count as cycle-introduced regressions, but they must not be silently re-buried (cycle 4: AZ-500's bootstrap fix unmasked the pre-existing scripts/run-performance-tests.sh:417 grep -o | wc -l + pipefail bug). Source: _docs/06_metrics/retro_2026-05-12_cycle4.md