mirror of
https://github.com/azaion/satellite-provider.git
synced 2026-06-27 12:31:14 +00:00
80ef5608f1
Co-authored-by: Cursor <cursoragent@cursor.com>
74 lines
13 KiB
Markdown
74 lines
13 KiB
Markdown
# Engineering Lessons
|
||
|
||
Recurring bugs, surprising library behaviors, and process insights extracted from completed cycles. Newest at the top. Keep entries short — this is for fast scanning at the start of new cycles, not exhaustive history.
|
||
|
||
This file has two layers:
|
||
|
||
- **Deep engineering lessons** (L-NNN): library bugs, architectural insights, multi-paragraph context. Persist forever.
|
||
- **Ring buffer** at the bottom: most recent 15 single-sentence lessons emitted by the retrospective skill, consumed by `new-task` / `plan` / `decompose` / `autodev` Step 0. Oldest entries drop off the top.
|
||
|
||
Categories: `estimation` · `architecture` · `testing` · `dependencies` · `tooling` · `process`
|
||
|
||
---
|
||
|
||
## L-001 — Dapper `TypeHandler<T>` is bypassed for enum types during read deserialization
|
||
|
||
**Cycle**: 1 (AZ-484)
|
||
**Discovered by**: integration test failure (`Error parsing column 12 (source=google_maps - String)`); root-caused via web search to long-standing Dapper issue [#259](https://github.com/DapperLib/Dapper/issues/259).
|
||
**Affects**: Dapper 2.1.35 (and most other versions until the proposed `Settings.PreferTypeHandlersForEnums` opt-in in PR #2200, not yet merged).
|
||
|
||
**What happens**
|
||
Registering `SqlMapper.AddTypeHandler(new MyEnumHandler())` for an `enum` type — even via `SqlMapper.TypeHandler<TEnum>` — works for **writes** (the handler's `SetValue` is invoked for parameter binding) but is **silently bypassed for reads**. Dapper's IL-emitted deserializer checks `IsEnum` first and falls back to `Enum.TryParse(string, ignoreCase: true)`.
|
||
|
||
**Why this is dangerous**
|
||
If the enum's wire string happens to match a member name case-insensitively (e.g., `RegionStatus.Failed` ↔ `"failed"`), the bypass goes unnoticed and round-trip works *accidentally*. The bug only surfaces when the wire format diverges from the C# member name (e.g., `TileSource.GoogleMaps` ↔ `"google_maps"` — `Enum.TryParse("google_maps")` does not match `GoogleMaps` because of the underscore).
|
||
|
||
**Recommended approach**
|
||
- Do **not** rely on `SqlMapper.TypeHandler<TEnum>` for read-side enum mapping unless the wire values match the enum member names case-insensitively.
|
||
- For enums whose wire format diverges (snake_case, kebab-case, custom IDs), store the entity field as `string` and provide an explicit converter (`*Converter.ToWireValue` / `FromWireValue`) for use at the service-layer boundary. This is what AZ-484 does for `TileEntity.Source` ↔ `TileSourceConverter`.
|
||
- Unit-test the converter directly. Do not assume that round-tripping through Dapper proves anything for enums.
|
||
|
||
**Detection**
|
||
- Unit tests of the type handler in isolation will pass even when the handler is bypassed at runtime.
|
||
- Failure surfaces only at integration-test time when the actual SELECT runs.
|
||
- If you must keep an enum-typed field, write at minimum one integration test that reads the enum back through Dapper from a real database row.
|
||
|
||
---
|
||
|
||
- [2026-06-25] [process] Documentation-only autodev cycles should still run Step 12 traceability rows (doc-verified ACs) and Step 13 ripple logs, while Steps 14–15 are appropriately skipped when no code surface changes — avoids empty cycle artifacts without running meaningless security/perf gates (cycle 11: AZ-1123).
|
||
Source: _docs/06_metrics/retro_2026-06-25_cycle11.md
|
||
- [2026-06-26] [tooling] Host-side gRPC perf (`--run-pt10`) must load cert-only PEM CA files via `X509Certificate2.CreateFromPem(text)` — `CreateFromPemFile(api.crt)` throws when no private key is present; REST curl `--cacert` tolerates cert-only but .NET gRPC channel setup does not (cycle 12: PT-10 Step 15 first run failed, one-line fix in `GrpcTestHelpers`).
|
||
Source: _docs/06_metrics/perf_2026-06-26_cycle12.md
|
||
## Ring buffer (last 15 entries — newest at top)
|
||
|
||
- [2026-06-26] [process] Multi-cycle security carry-overs that name a concrete finding ID and fit ≤2 SP ship cleanly as a sole cycle theme — cycle 12 retro Action #1 → cycle 13 AZ-1126 closed F-AZ810-2 in one batch with full security + perf gate coverage.
|
||
Source: _docs/06_metrics/retro_2026-06-26_cycle13.md
|
||
- [2026-06-26] [testing] Custom `JsonConverter` exceptions must propagate through boundary filters — a generic metadata parse string in `UavUploadValidationFilter` masked `UtcOffsetRequiredDateTimeOffsetConverter` diagnostics until integration tests failed (cycle 13 AZ-1126 auto-fix).
|
||
Source: _docs/06_metrics/retro_2026-06-26_cycle13.md
|
||
- [2026-06-26] [tooling] Step 15 perf gate exit 7 on first run when the perf compose stack is not up — preflight with `docker compose -f docker-compose.yml -f docker-compose.perf.yml up -d` before `run-performance-tests.sh` or add a health check to the script (cycle 13).
|
||
Source: _docs/06_metrics/retro_2026-06-26_cycle13.md
|
||
- [2026-06-25] [testing] PT-07 cold-vs-warm region latency is sensitive to outlier cold p95 on a warm compose volume — the perf gate should drain the region queue before the warm pass and accept warm p50 < cold p50 when p95 is within noise (cycle 10: two marginal PT-07 FAILs before harness fix; AZ-1113 did not touch region paths).
|
||
Source: _docs/06_metrics/retro_2026-06-25_cycle10.md
|
||
- [2026-06-25] [process] Retrospective security recommendations that name concrete finding IDs (F-AZ795-1/2, F-AZ810-1) and fit ≤2 SP can ship as the sole cycle theme and close multi-cycle carry-overs in one batch — cycle 9 Action #3 → cycle 10 AZ-1113 resolved all three Low A09 findings.
|
||
Source: _docs/06_metrics/retro_2026-06-25_cycle10.md
|
||
- [2026-06-25] [tooling] Creating `docker-compose.perf.yml` without documenting it in deployment docs leaves the next cycle rediscovering the same host-port conflict — ship the compose overlay and a one-paragraph playbook in `containerization.md` in the same cycle (cycle 10: file added, doc still pending).
|
||
Source: _docs/06_metrics/retro_2026-06-25_cycle10.md
|
||
- [2026-06-25] [tooling] When host port 5433 is occupied by a sibling Postgres container, integration and perf gates must not depend on publishing postgres to the host — use a self-contained test compose file (internal network only) or a documented `ports: !reset []` override on the dev stack so Step 11/15 can run without stopping sibling projects (cycle 9: `fleet-viewer-dev-db` blocked both integration tests and perf until compose was adjusted).
|
||
Source: _docs/06_metrics/retro_2026-06-25_cycle9.md
|
||
- [2026-06-25] [testing] Adding a new transport (gRPC) over shared orchestrator logic does not automatically extend the perf harness — REST PT-01..PT-08 can pass while the new RPC surface stays Unverified until an explicit PT-NN scenario and threshold land in `performance-tests.md` + `run-performance-tests.sh` (cycle 9: gRPC DeliverRouteTiles had no perf scenario; gate passed on REST-only evidence).
|
||
Source: _docs/06_metrics/retro_2026-06-25_cycle9.md
|
||
- [2026-06-25] [architecture] Shared wire contracts belong in a leaf `*.GrpcContracts` (or equivalent) project referenced by both server and test client — keeping proto in the API project forces test projects to reference the full API assembly and couples codegen to the delivery layer (cycle 9: `SatelliteProvider.GrpcContracts` extracted from Api).
|
||
Source: _docs/06_metrics/retro_2026-06-25_cycle9.md
|
||
- [2026-05-23] [process] Step-14 security-audit Medium findings whose remediation fits the small-fix threshold (≤2 SP, ≤2 files, ≤1 contract bump) should be resolved within the same autodev invocation rather than deferred to cycle N+1 — the fix lands with the same commit chain that introduced the surface, the contract version reflects the fix immediately, and the traceability matrix and blackbox-tests.md sub-cases are written while the finding is fresh; codify the option as a first-class A/B/C choice in `.cursor/skills/autodev/flows/existing-code.md` Step-14 action (cycle 8: F-AZ809-1 unbounded `geofences.polygons` DoS — discovered in commit `ac40a8b`, resolved in commit `8fca6e0` with `MaxPolygons = 50` cap + unit + integration test + `route-creation.md` v1.0.1 patch bump, ~30 minutes from finding to fix landed).
|
||
Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
|
||
- [2026-05-23] [process] Retrospective recommendations ship end-to-end in the next cycle only when they (a) name concrete tracker tickets / files / endpoints in the action text, (b) are sized as a coherent cycle theme rather than scattered one-off fixes, and (c) the next cycle's planning phase pulls the recommendation directly into the task slate without re-deriving scope — phrase recommendations to satisfy all three or they become multi-cycle carry-overs (cycle 7 Action 3 named the 4 AZ-795 child endpoints + the SP sizing → cycle 8 shipped AZ-808 + AZ-809 + AZ-810 + AZ-811 + AZ-812 as the coherent strict-validation theme, first directly-traceable cross-cycle improvement-action end-to-end in project history).
|
||
Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
|
||
- [2026-05-23] [tooling] When a contract introduces a new required field or renames a wire key, the test-spec sync step must ripgrep all consumer paths (`scripts/run-performance-tests.sh`, `scripts/probe_*.sh`, `README.md` example URLs, `_docs/04_deploy/*.md` example bodies, `_docs/02_document/tests/blackbox-tests.md` trigger excerpts, OpenAPI examples) and verify they reference the new shape — otherwise partial syncs surface at Step 15 Performance Test gate or worse in production (cycle 8: AZ-812 lat/lon rename updated the perf script but AZ-809's newly-required `requestMaps` + `createTilesZip` fields did not propagate to PT-06; caught at Step 15 with a 1-line script fix in commit `32bc5c1` — a static-check probe at batch closure would have caught it 2 days earlier).
|
||
Source: _docs/06_metrics/retro_2026-05-23_cycle8.md
|
||
- [2026-05-23] [process] When verifying a "no-regression" AC for an input-validation change ("AZ-NNN does not break existing tests"), the only sound evidence is a green integration-test run — tracing fixture variables back to their generators in source is insufficient because helpers can produce values outside the new bounds and previously slipped through silently when no validator existed; document the standard as "verified by reading source" → unconfirmed, "verified by full test run" → confirmed, and gate the batch report's AC table on the latter before the implement skill closes the batch (cycle 8: AZ-810 batch_04 AC-9 claimed "no AZ-488 regression" based on tracing `latitude = coord.Latitude` in test source, but `NextTestCoordinate` seeded by `(Ticks/TicksPerSecond) % 1_000_000` produced lat far above 90° at runtime; the false-PASS only surfaced at autodev Step 11 when the integration test run returned HTTP 400 from the new validator on the AZ-488 happy path).
|
||
Source: _docs/03_implementation/batch_04_cycle8_report.md (AC-9 row)
|
||
- [2026-05-22] [process] When the implement skill ships a cycle's batch commit without writing `_docs/03_implementation/implementation_report_*_cycle{N}.md`, downstream skills (test-spec cycle-update, document task mode, retrospective Step 1) must fall back to reading the cycle's task specs in `_docs/02_tasks/done/` plus the commit body via `git log --grep='[AZ-...]'` — codify the fallback in those skills' instructions instead of leaving it as per-cycle improvisation, because the implicit contract between Step 10 and Steps 11-17 broke silently this cycle and only succeeded because every downstream skill happened to be robust enough to substitute (cycle 7: AZ-794+AZ-795+AZ-796 shipped as commit `865dfdb` with no report artifact; doc-skill auto-walked the diff, test-spec read the task specs, retrospective wrote from the deploy + security + perf reports — all worked, but the contract was never formal). **Status**: closed by cycle 8 — the implement skill self-corrected and produced both per-batch reports AND a consolidated implementation report AND a completeness report AND a cumulative cross-batch review.
|
||
Source: _docs/06_metrics/retro_2026-05-22_cycle7.md
|
||
- [2026-05-22] [testing] When a strict-validation layer ships (`JsonSerializerOptions.UnmappedMemberHandling.Disallow`, FluentValidation rules, explicit DTO `[JsonRequired]`), expect the project's own integration tests to surface latent bugs the prior lenient defaults had been masking — silent PascalCase fallback property names, out-of-range fixture coordinates, wrong-cased JSON keys; correct them in the same PR or the test suite goes red and the strict layer looks like a regression instead of the bug-finder it is (cycle 7: `IdempotentPostTests.RoutePoint` had been posting `{"Lat":...}` against a `[JsonPropertyName("lat")]` DTO for months; the new strict deserializer caught it and the 2-line payload fix landed alongside the strict layer; **cycle 8 instance**: `UavUploadTests.NextTestCoordinate` produced lat > 90°, caught by AZ-810 validator, 2-file clamp fix in batch 4).
|
||
Source: _docs/06_metrics/retro_2026-05-22_cycle7.md
|