Files
satellite-provider/_docs/LESSONS.md
T
Oleksandr Bezdieniezhnykh fb9e3ee31a [AZ-1132] [AZ-1133] Cycle 15 closeout and cycle 16 task setup
Doc ripple for FluentValidation bump, cycle 15 retro, and AZ-1133
perf preflight task queued for cycle 16.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-26 17:50:29 +03:00

82 lines
14 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Engineering Lessons
Recurring bugs, surprising library behaviors, and process insights extracted from completed cycles. Newest at the top. Keep entries short — this is for fast scanning at the start of new cycles, not exhaustive history.
This file has two layers:
- **Deep engineering lessons** (L-NNN): library bugs, architectural insights, multi-paragraph context. Persist forever.
- **Ring buffer** at the bottom: most recent 15 single-sentence lessons emitted by the retrospective skill, consumed by `new-task` / `plan` / `decompose` / `autodev` Step 0. Oldest entries drop off the top.
Categories: `estimation` · `architecture` · `testing` · `dependencies` · `tooling` · `process`
---
## L-001 — Dapper `TypeHandler<T>` is bypassed for enum types during read deserialization
**Cycle**: 1 (AZ-484)
**Discovered by**: integration test failure (`Error parsing column 12 (source=google_maps - String)`); root-caused via web search to long-standing Dapper issue [#259](https://github.com/DapperLib/Dapper/issues/259).
**Affects**: Dapper 2.1.35 (and most other versions until the proposed `Settings.PreferTypeHandlersForEnums` opt-in in PR #2200, not yet merged).
**What happens**
Registering `SqlMapper.AddTypeHandler(new MyEnumHandler())` for an `enum` type — even via `SqlMapper.TypeHandler<TEnum>` — works for **writes** (the handler's `SetValue` is invoked for parameter binding) but is **silently bypassed for reads**. Dapper's IL-emitted deserializer checks `IsEnum` first and falls back to `Enum.TryParse(string, ignoreCase: true)`.
**Why this is dangerous**
If the enum's wire string happens to match a member name case-insensitively (e.g., `RegionStatus.Failed``"failed"`), the bypass goes unnoticed and round-trip works *accidentally*. The bug only surfaces when the wire format diverges from the C# member name (e.g., `TileSource.GoogleMaps``"google_maps"``Enum.TryParse("google_maps")` does not match `GoogleMaps` because of the underscore).
**Recommended approach**
- Do **not** rely on `SqlMapper.TypeHandler<TEnum>` for read-side enum mapping unless the wire values match the enum member names case-insensitively.
- For enums whose wire format diverges (snake_case, kebab-case, custom IDs), store the entity field as `string` and provide an explicit converter (`*Converter.ToWireValue` / `FromWireValue`) for use at the service-layer boundary. This is what AZ-484 does for `TileEntity.Source``TileSourceConverter`.
- Unit-test the converter directly. Do not assume that round-tripping through Dapper proves anything for enums.
**Detection**
- Unit tests of the type handler in isolation will pass even when the handler is bypassed at runtime.
- Failure surfaces only at integration-test time when the actual SELECT runs.
- If you must keep an enum-typed field, write at minimum one integration test that reads the enum back through Dapper from a real database row.
---
- [2026-06-25] [process] Documentation-only autodev cycles should still run Step 12 traceability rows (doc-verified ACs) and Step 13 ripple logs, while Steps 1415 are appropriately skipped when no code surface changes — avoids empty cycle artifacts without running meaningless security/perf gates (cycle 11: AZ-1123).
Source: _docs/06_metrics/retro_2026-06-25_cycle11.md
- [2026-06-26] [tooling] Host-side gRPC perf (`--run-pt10`) must load cert-only PEM CA files via `X509Certificate2.CreateFromPem(text)``CreateFromPemFile(api.crt)` throws when no private key is present; REST curl `--cacert` tolerates cert-only but .NET gRPC channel setup does not (cycle 12: PT-10 Step 15 first run failed, one-line fix in `GrpcTestHelpers`).
Source: _docs/06_metrics/perf_2026-06-26_cycle12.md
- [2026-06-26] [process] Docs-only cycles that close a named retro carry-over (stale command in `environment.md` / `README.md` / `AGENTS.md`) ship cleanly as a 1 SP sole theme with Steps 1415 skipped and smoke as Step 11 regression evidence — cycle 14 AZ-1131 closed the cycle 12/13 integration-compose doc mismatch in one batch.
Source: _docs/06_metrics/retro_2026-06-26_cycle14.md
- [2026-06-26] [dependencies] Sole-theme dependency bumps (FluentValidation 12.0.0 → 12.1.1) ship cleanly in one 1 SP batch when security scan artifacts are updated in the same commit — D-AZ795-1 closed with full suite regression and Steps 1415 skipped (cycle 15: AZ-1132).
Source: _docs/06_metrics/retro_2026-06-26_cycle15.md
- [2026-06-26] [process] Dependency bumps should ripple version pins through `module-layout.md`, `architecture.md`, and `error-shape.md` in Step 13 — leaving 12.0.0 references in those files after AZ-1132 required a doc-sync pass (cycle 15).
Source: _docs/06_metrics/retro_2026-06-26_cycle15.md
- [2026-06-26] [process] Cycle 14 retro Action #1 (D-AZ795-1) closed end-to-end when phrased as a concrete tracker ticket + ≤2 SP sole theme — same pattern as cycle 13 F-AZ810-2 and cycle 10 F-AZ795-* (cycle 15).
Source: _docs/06_metrics/retro_2026-06-26_cycle15.md
## Ring buffer (last 15 entries — newest at top)
- [2026-06-26] [process] Multi-cycle security carry-overs that name a concrete finding ID and fit ≤2 SP ship cleanly as a sole cycle theme — cycle 12 retro Action #1 → cycle 13 AZ-1126 closed F-AZ810-2 in one batch with full security + perf gate coverage.
Source: _docs/06_metrics/retro_2026-06-26_cycle13.md
- [2026-06-26] [testing] Custom `JsonConverter` exceptions must propagate through boundary filters — a generic metadata parse string in `UavUploadValidationFilter` masked `UtcOffsetRequiredDateTimeOffsetConverter` diagnostics until integration tests failed (cycle 13 AZ-1126 auto-fix).
Source: _docs/06_metrics/retro_2026-06-26_cycle13.md
- [2026-06-26] [tooling] Step 15 perf gate exit 7 on first run when the perf compose stack is not up — preflight with `docker compose -f docker-compose.yml -f docker-compose.perf.yml up -d` before `run-performance-tests.sh` or add a health check to the script (cycle 13).
Source: _docs/06_metrics/retro_2026-06-26_cycle13.md
- [2026-06-25] [testing] PT-07 cold-vs-warm region latency is sensitive to outlier cold p95 on a warm compose volume — the perf gate should drain the region queue before the warm pass and accept warm p50 < cold p50 when p95 is within noise (cycle 10: two marginal PT-07 FAILs before harness fix; AZ-1113 did not touch region paths).
Source: _docs/06_metrics/retro_2026-06-25_cycle10.md
- [2026-06-25] [process] Retrospective security recommendations that name concrete finding IDs (F-AZ795-1/2, F-AZ810-1) and fit ≤2 SP can ship as the sole cycle theme and close multi-cycle carry-overs in one batch — cycle 9 Action #3 → cycle 10 AZ-1113 resolved all three Low A09 findings.
Source: _docs/06_metrics/retro_2026-06-25_cycle10.md
- [2026-06-25] [tooling] Creating `docker-compose.perf.yml` without documenting it in deployment docs leaves the next cycle rediscovering the same host-port conflict — ship the compose overlay and a one-paragraph playbook in `containerization.md` in the same cycle (cycle 10: file added, doc still pending).
Source: _docs/06_metrics/retro_2026-06-25_cycle10.md
- [2026-06-25] [tooling] When host port 5433 is occupied by a sibling Postgres container, integration and perf gates must not depend on publishing postgres to the host — use a self-contained test compose file (internal network only) or a documented `ports: !reset []` override on the dev stack so Step 11/15 can run without stopping sibling projects (cycle 9: `fleet-viewer-dev-db` blocked both integration tests and perf until compose was adjusted).
Source: _docs/06_metrics/retro_2026-06-25_cycle9.md
- [2026-06-25] [testing] Adding a new transport (gRPC) over shared orchestrator logic does not automatically extend the perf harness — REST PT-01..PT-08 can pass while the new RPC surface stays Unverified until an explicit PT-NN scenario and threshold land in `performance-tests.md` + `run-performance-tests.sh` (cycle 9: gRPC DeliverRouteTiles had no perf scenario; gate passed on REST-only evidence).
Source: _docs/06_metrics/retro_2026-06-25_cycle9.md
- [2026-06-25] [architecture] Shared wire contracts belong in a leaf `*.GrpcContracts` (or equivalent) project referenced by both server and test client — keeping proto in the API project forces test projects to reference the full API assembly and couples codegen to the delivery layer (cycle 9: `SatelliteProvider.GrpcContracts` extracted from Api).
Source: _docs/06_metrics/retro_2026-06-25_cycle9.md
- [2026-05-23] [process] Step-14 security-audit Medium findings whose remediation fits the small-fix threshold (≤2 SP, ≤2 files, ≤1 contract bump) should be resolved within the same autodev invocation rather than deferred to cycle N+1 — the fix lands with the same commit chain that introduced the surface, the contract version reflects the fix immediately, and the traceability matrix and blackbox-tests.md sub-cases are written while the finding is fresh; codify the option as a first-class A/B/C choice in `.cursor/skills/autodev/flows/existing-code.md` Step-14 action (cycle 8: F-AZ809-1 unbounded `geofences.polygons` DoS — discovered in commit `ac40a8b`, resolved in commit `8fca6e0` with `MaxPolygons = 50` cap + unit + integration test + `route-creation.md` v1.0.1 patch bump, ~30 minutes from finding to fix landed).
Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
- [2026-05-23] [process] Retrospective recommendations ship end-to-end in the next cycle only when they (a) name concrete tracker tickets / files / endpoints in the action text, (b) are sized as a coherent cycle theme rather than scattered one-off fixes, and (c) the next cycle's planning phase pulls the recommendation directly into the task slate without re-deriving scope — phrase recommendations to satisfy all three or they become multi-cycle carry-overs (cycle 7 Action 3 named the 4 AZ-795 child endpoints + the SP sizing → cycle 8 shipped AZ-808 + AZ-809 + AZ-810 + AZ-811 + AZ-812 as the coherent strict-validation theme, first directly-traceable cross-cycle improvement-action end-to-end in project history).
Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
- [2026-05-23] [tooling] When a contract introduces a new required field or renames a wire key, the test-spec sync step must ripgrep all consumer paths (`scripts/run-performance-tests.sh`, `scripts/probe_*.sh`, `README.md` example URLs, `_docs/04_deploy/*.md` example bodies, `_docs/02_document/tests/blackbox-tests.md` trigger excerpts, OpenAPI examples) and verify they reference the new shape — otherwise partial syncs surface at Step 15 Performance Test gate or worse in production (cycle 8: AZ-812 lat/lon rename updated the perf script but AZ-809's newly-required `requestMaps` + `createTilesZip` fields did not propagate to PT-06; caught at Step 15 with a 1-line script fix in commit `32bc5c1` — a static-check probe at batch closure would have caught it 2 days earlier).
Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
- [2026-05-23] [process] When verifying a "no-regression" AC for an input-validation change ("AZ-NNN does not break existing tests"), the only sound evidence is a green integration-test run — tracing fixture variables back to their generators in source is insufficient because helpers can produce values outside the new bounds and previously slipped through silently when no validator existed; document the standard as "verified by reading source" → unconfirmed, "verified by full test run" → confirmed, and gate the batch report's AC table on the latter before the implement skill closes the batch (cycle 8: AZ-810 batch_04 AC-9 claimed "no AZ-488 regression" based on tracing `latitude = coord.Latitude` in test source, but `NextTestCoordinate` seeded by `(Ticks/TicksPerSecond) % 1_000_000` produced lat far above 90° at runtime; the false-PASS only surfaced at autodev Step 11 when the integration test run returned HTTP 400 from the new validator on the AZ-488 happy path).
Source: _docs/03_implementation/batch_04_cycle8_report.md (AC-9 row)
- [2026-05-22] [process] When the implement skill ships a cycle's batch commit without writing `_docs/03_implementation/implementation_report_*_cycle{N}.md`, downstream skills (test-spec cycle-update, document task mode, retrospective Step 1) must fall back to reading the cycle's task specs in `_docs/02_tasks/done/` plus the commit body via `git log --grep='[AZ-...]'` — codify the fallback in those skills' instructions instead of leaving it as per-cycle improvisation, because the implicit contract between Step 10 and Steps 11-17 broke silently this cycle and only succeeded because every downstream skill happened to be robust enough to substitute (cycle 7: AZ-794+AZ-795+AZ-796 shipped as commit `865dfdb` with no report artifact; doc-skill auto-walked the diff, test-spec read the task specs, retrospective wrote from the deploy + security + perf reports — all worked, but the contract was never formal). **Status**: closed by cycle 8 — the implement skill self-corrected and produced both per-batch reports AND a consolidated implementation report AND a completeness report AND a cumulative cross-batch review.
Source: _docs/06_metrics/retro_2026-05-22_cycle7.md
- [2026-05-22] [testing] When a strict-validation layer ships (`JsonSerializerOptions.UnmappedMemberHandling.Disallow`, FluentValidation rules, explicit DTO `[JsonRequired]`), expect the project's own integration tests to surface latent bugs the prior lenient defaults had been masking — silent PascalCase fallback property names, out-of-range fixture coordinates, wrong-cased JSON keys; correct them in the same PR or the test suite goes red and the strict layer looks like a regression instead of the bug-finder it is (cycle 7: `IdempotentPostTests.RoutePoint` had been posting `{"Lat":...}` against a `[JsonPropertyName("lat")]` DTO for months; the new strict deserializer caught it and the 2-line payload fix landed alongside the strict layer; **cycle 8 instance**: `UavUploadTests.NextTestCoordinate` produced lat > 90°, caught by AZ-810 validator, 2-file clamp fix in batch 4).
Source: _docs/06_metrics/retro_2026-05-22_cycle7.md