[AZ-270] [AZ-272] [AZ-279] [AZ-281] [AZ-283] Compose root + FDR schema + 3 Layer-1 helpers

AZ-270: composition root with strategy registry, tier-gated lookup,
topo-order construction, all-or-nothing teardown, StrategyNotLinkedError
payload.
AZ-272: orjson-backed FdrRecord serialise/parse with forward-compat for
unknown payload + top-level fields and canonical overrun-record shape.
AZ-279: pyproj-backed WGS84/ECEF/ENU + OSM slippy-map tile math with
WgsConversionError for shape/range/zoom guards.
AZ-281: strict EngineFilenameSchema build/parse/matches_host with
anchored regex + enum validation; round-trip identity by construction.
AZ-283: dtype-preserving (fp16/fp32) single + batch L2 normaliser with
zero-norm safety and descriptor_metric() source-of-truth.
pyproject.toml pins pyproj>=3.6 and orjson>=3.9 (named-backend deps per
the AZ-272 / AZ-279 contracts). New DTOs LatLonAlt + BoundingBox and
EngineCacheKey + HostCapabilities land in _types/ to back the helper
contracts.
203 unit tests pass (64 new). Review verdict: PASS_WITH_WARNINGS;
findings are perf-NFR deferrals + dep amendment + minor docstring polish.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 02:03:36 +03:00
parent 8e71f6c002
commit 3acc7f33dd
24 changed files with 2381 additions and 97 deletions
@@ -0,0 +1,108 @@
# Composition Root + StrategyNotLinkedError
**Task**: AZ-270_compose_root
**Name**: Composition Root
**Description**: Implement `compose_root(config) -> RuntimeRoot` for the airborne process and `compose_operator(config) -> OperatorRoot` for the operator-side tooling. Both functions construct every component instance, inject dependencies against component interfaces, and refuse to start when the config selects a strategy whose `BUILD_<NAME>` flag was OFF in the linked binary (raises `StrategyNotLinkedError`).
**Complexity**: 3 points
**Dependencies**: AZ-269_config_loader
**Component**: shared.config (cross-cutting; epic AZ-246 / E-CC-CONF)
**Tracker**: AZ-270
**Epic**: AZ-246 (E-CC-CONF)
## Problem
Per ADR-009 (interface-first DI), only ONE place in the codebase may import concrete component implementations — the composition root. Without a single, tested composition function, components grow direct cross-imports and the build-time exclusion gate (ADR-002) loses its third enforcement point at runtime.
## Outcome
- A single `compose_root(config)` call returns a fully-wired airborne `RuntimeRoot` whose component graph matches the `Config`-selected strategies.
- Strategy/build-flag mismatch raises `StrategyNotLinkedError` with a clear message naming the missing strategy, the owning component, and the strategies actually linked into this binary.
- `compose_operator(config)` returns the operator-side `OperatorRoot` with only operator-tier components (e.g. C11 TileManager, C12 operator tooling) and refuses to wire C1-C5 / C7 / C13 (airborne-only) even if asked.
- `runtime_root.py` exits with code 0 on a valid Config when no components do work (reachability proof per epic AC-4).
## Scope
### Included
- `compose_root(config: Config) -> RuntimeRoot` per the composition_root_protocol contract.
- `compose_operator(config: Config) -> OperatorRoot` per the same contract.
- `StrategyNotLinkedError` exception with `strategy_name`, `component_slug`, `available_strategies` payload.
- Strategy/build-flag consistency check that runs at the start of both compose functions; ADR-002 enforcement gate #3.
- Component construction order respects the dependency graph in `_docs/02_document/architecture.md` (foundational components first).
- Composition-root code is the ONLY allowed importer of concrete component classes; module-layout.md's Layout Rule 6 is enforced at code-review time.
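A minimal sketch of the exception payload and the consistency gate described above, not the contract's actual implementation. The `LINKED` discovery map, the `placeholder_vio` strategy name, and the `check_strategies_linked` helper are hypothetical names introduced for illustration; only the three payload fields come from this spec.

```python
class StrategyNotLinkedError(RuntimeError):
    """Raised when Config selects a strategy that is not linked into this binary."""

    def __init__(self, strategy_name, component_slug, available_strategies):
        self.strategy_name = strategy_name
        self.component_slug = component_slug
        # Sorted tuple so the payload is stable across runs.
        self.available_strategies = tuple(sorted(available_strategies))
        super().__init__(
            f"strategy {strategy_name!r} for component {component_slug!r} is not "
            f"linked into this binary; linked strategies: "
            f"{list(self.available_strategies)}"
        )


# Hypothetical discovery map: component slug -> names of linked strategies.
LINKED = {"c1_vio": {"placeholder_vio"}}


def check_strategies_linked(selected, linked):
    """Gate that runs before any component is constructed (assumed shape)."""
    for component_slug, strategy_name in selected.items():
        available = linked.get(component_slug, set())
        if strategy_name not in available:
            raise StrategyNotLinkedError(strategy_name, component_slug, available)
```

Running the check before any constructor is called keeps the failure cheap: nothing has been built yet, so there is nothing to tear down.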
### Excluded
- The `RuntimeRoot` and `OperatorRoot` internal class definitions — owned by E-BOOT (AZ-263) for the skeleton; per-component `add_to_root` registration logic lives in each component epic.
- Per-component config blocks — owned by each component epic.
- Per-component strategy registration — each component epic registers its strategies into a discovery map; this task only wires what's been registered.
## Acceptance Criteria
**AC-1: Default deployment composes**
Given a default-deployment-binary `Config` and a binary built with the deployment `BUILD_*` flag set
When `compose_root(config)` runs
Then it returns a `RuntimeRoot` whose every component slot is populated by the strategy declared in `Config`
**AC-2: Strategy/build-flag mismatch rejected**
Given a `Config` that selects `vins_mono` for `c1_vio` and a binary built with `BUILD_VINS_MONO=OFF`
When `compose_root(config)` runs
Then it raises `StrategyNotLinkedError` with `strategy_name="vins_mono"`, `component_slug="c1_vio"`, `available_strategies` listing the strategies actually linked
**AC-3: Operator-side excludes airborne**
Given an operator `Config` that accidentally references an airborne-only component (e.g. `c1_vio`)
When `compose_operator(config)` runs
Then it raises `StrategyNotLinkedError` (or a clearly-named subclass) noting the component is airborne-only
**AC-4: Reachability proof**
Given a valid `Config` with all components stubbed to do nothing
When `runtime_root.py` runs `compose_root(config)` and exits
Then exit code is 0 and no exception is raised
**AC-5: Construction order respects dependencies**
Given a `Config` that selects `c5_state` (which depends on `c1_vio` and `c4_pose`)
When `compose_root(config)` constructs the graph
Then `c1_vio` and `c4_pose` instances exist before `c5_state` is constructed (verified by an order-tracing fake)
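One way to obtain the order AC-5 asks for is a topological sort over the dependency graph. The sketch below uses the standard-library `graphlib.TopologicalSorter`; the `DEPENDS_ON` map, the `compose` function, and the order-tracing factory are assumptions for illustration, not the shape of the real composition root.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: component slug -> slugs it depends on.
DEPENDS_ON = {
    "c1_vio": (),
    "c4_pose": (),
    "c5_state": ("c1_vio", "c4_pose"),
}


def construction_order(depends_on):
    """Dependencies always come before the components that need them."""
    return list(TopologicalSorter(depends_on).static_order())


def compose(depends_on, factories):
    """Build each component after its dependencies and inject them by slug."""
    built = {}
    for slug in construction_order(depends_on):
        deps = {dep: built[dep] for dep in depends_on[slug]}
        built[slug] = factories[slug](**deps)
    return built


# Order-tracing fake: records the slug each time a factory is invoked.
trace = []


def tracing_factory(slug):
    def factory(**deps):
        trace.append(slug)
        return (slug, tuple(sorted(deps)))
    return factory


root = compose(DEPENDS_ON, {slug: tracing_factory(slug) for slug in DEPENDS_ON})
```

`TopologicalSorter` raises `graphlib.CycleError` on a cyclic graph, so a dependency cycle would surface at compose time rather than as a half-built root.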
**AC-6: Single import point enforced**
Given the codebase
When the architecture lint check (added under code-review skill, Phase 7) runs
Then only `compose_root` and `compose_operator` import from `components.<name>.<concrete>` — every other module imports only from `components.<name>` (Public API)
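The lint in AC-6 could be approximated with the standard-library `ast` module. This is a rough sketch under the assumption that concrete modules are exactly those importable as `components.<name>.<concrete>`; the `concrete_imports` function is a hypothetical name, and the sketch does not handle relative imports or `from components.<name> import <submodule>`.

```python
import ast
import re

# Assumed layout: `components.<name>` is the public API,
# `components.<name>.<concrete>` is a concrete implementation module.
_CONCRETE = re.compile(r"(^|\.)components\.\w+\.\w+")


def concrete_imports(source):
    """Return the concrete-module names imported by one Python source string."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        found.extend(name for name in names if _CONCRETE.search(name))
    return found
```

A lint driver would run this over every module except the composition root and report any non-empty result as a finding.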
## Non-Functional Requirements
**Performance**
- `compose_root(config)` ≤ 750 ms on Tier-2 (combined with AZ-269's 250 ms loader budget for the 1 s total).
**Reliability**
- Composition is deterministic: same `Config` → same component graph (verified by structural equality on the fake recorder).
- A failure mid-composition leaves no partially-constructed singletons (composition is all-or-nothing; on error, every constructed instance is closed).
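The all-or-nothing rule can be sketched with `contextlib.ExitStack`: each constructed instance registers its `close`, and the callbacks are detached only if every constructor succeeds. `FakeComponent` and `compose_all_or_nothing` are hypothetical names, and the assumption that every component exposes `close()` is mine, not the contract's.

```python
from contextlib import ExitStack


class FakeComponent:
    """Test double that logs its own close() call."""

    def __init__(self, slug, log, fail=False):
        if fail:
            raise RuntimeError(f"{slug} failed in __init__")
        self.slug = slug
        self._log = log

    def close(self):
        self._log.append(self.slug)


def compose_all_or_nothing(factories):
    """Build every component or none; on error, close what was built (LIFO)."""
    with ExitStack() as stack:
        built = {}
        for slug, factory in factories.items():
            instance = factory()
            stack.callback(instance.close)
            built[slug] = instance
        # Success: move the callbacks to a new stack so nothing is closed here.
        teardown = stack.pop_all()
    return built, teardown
```

The caller keeps `teardown` and calls `teardown.close()` at shutdown, which closes the instances in reverse construction order.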
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Default Config + deployment-flag binary | Every component slot populated |
| AC-2 | Config selects unlinked strategy | `StrategyNotLinkedError` with full payload |
| AC-3 | Operator Config references airborne-only component | `StrategyNotLinkedError` (or subclass) noting tier mismatch |
| AC-4 | `runtime_root.py` smoke run with stubbed components | exit code 0 |
| AC-5 | `compose_root` with construction-order recorder | dependency order respected |
| AC-6 | Architecture lint over the codebase | Only compose_root / compose_operator import concrete strategies |
| NFR-perf | Microbench `compose_root` over a representative Config | p99 ≤ 750 ms on Tier-2 |
| NFR-reliability | Force a mid-composition failure (one strategy raises in `__init__`) | No partial state; every prior instance closed |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_config/composition_root_protocol.md` v1.0.0.
- Composition-root code is the ONLY place concrete strategy classes may be imported. Code-review Phase 7 emits an Architecture finding (High) on any other importer.
## Risks & Mitigation
**Risk 1: Component registration not fully discoverable at compose time**
- *Risk*: A component epic forgets to register its strategies into the discovery map, leaving `compose_root` unable to construct it.
- *Mitigation*: A startup self-check enumerates required components from the architecture spec and asserts every one has at least one registered strategy; missing → loud error at compose start.
## Contract
This task produces (jointly with AZ-269 config loader) the contract at `_docs/02_document/contracts/shared_config/composition_root_protocol.md`.
Consumers MUST read that file — not this task spec — to discover the interface.
@@ -0,0 +1,123 @@
# FdrRecord Schema + Versioned Serialiser
**Task**: AZ-272_fdr_record_schema
**Name**: FdrRecord Schema
**Description**: Define the `FdrRecord` versioned schema (one record kind per payload class — `log`, `vio.tick`, `state.tick`, `tile_match`, `overrun`, `segment_rollover`, `failed_tile_thumbnail`, `mid_flight_tile_snapshot`, etc.) and the matching serialiser/parser pair so every onboard producer emits and post-flight tooling reads the same wire format. Library choice (orjson or msgpack) is pinned at E-BOOT; the schema layer is library-agnostic.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-266_log_module
**Component**: shared.fdr_client (cross-cutting; epic AZ-247 / E-CC-FDR-CLIENT)
**Tracker**: AZ-272
**Epic**: AZ-247 (E-CC-FDR-CLIENT)
## Problem
C13 (FdrWriter) and every onboard producer must agree on a single, versioned wire format for FDR records. Without one frozen schema:
- Producers drift in field naming over time, breaking post-flight analysis.
- Forward-compatible parsing is impossible — a new field added in version N+1 silently breaks tooling pinned at version N.
- The cross-component "no silent drops" guarantee (AC-NEW-3) is unenforceable because the `kind=overrun` record has no canonical shape.
## Outcome
- A single `FdrRecord` definition is the only record type any onboard process emits, and the only one C13's writer thread + post-flight tooling parses.
- The schema carries a top-level `schema_version` integer; the parser is forward-compatible — a record at version N is readable by tooling pinned at version N-1 with documented field-set degradation rules.
- Adding a new record `kind` is a minor version bump; renaming or removing a field is a major version bump (covered by `Versioning Rules` in the contract).
## Scope
### Included
- `FdrRecord` outer envelope: `schema_version: int`, `ts: str (ISO 8601 UTC, µs)`, `producer_id: str`, `kind: str`, `payload: object`.
- A closed enum of supported `kind` values for v1.0.0 covering: `log`, `vio.tick`, `state.tick`, `tile_match`, `overrun`, `segment_rollover`, `failed_tile_thumbnail`, `mid_flight_tile_snapshot`, `flight_header`, `flight_footer`. Per-`kind` payload shape is documented in the contract.
- A `serialise(record: FdrRecord) -> bytes` and `parse(buf: bytes) -> FdrRecord` pair. Library is pinned at E-BOOT (orjson or msgpack); the public API hides the choice.
- A forward-compat parser: unknown future fields inside `payload` are preserved on read (deserialised into a generic `extra: dict[str, Any]` bucket); unknown future `kind` values surface as `FdrRecord(kind=<raw>, payload=<raw>)` so tooling can skip them rather than crash.
- Public interface contract published at `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`.
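The envelope and the forward-compat rule can be sketched as follows. This is an illustration only: stdlib `json` stands in for the pinned library, the `KNOWN_PAYLOAD_FIELDS` table and the `log` field names (`level`, `msg`) are hypothetical, and required-field validation is left out, so a malformed record would raise `KeyError` here instead of the `FdrSchemaError` the contract requires.

```python
import json
from dataclasses import dataclass
from typing import Any

# Hypothetical per-kind field sets; the real ones live in the contract.
KNOWN_PAYLOAD_FIELDS = {"log": {"level", "msg"}}


@dataclass(frozen=True)
class FdrRecord:
    schema_version: int
    ts: str
    producer_id: str
    kind: str
    payload: Any


def serialise(record: FdrRecord) -> bytes:
    payload = record.payload
    if record.kind in KNOWN_PAYLOAD_FIELDS:
        # Flatten preserved unknown fields back into the wire payload.
        payload = {k: v for k, v in record.payload.items() if k != "extra"}
        payload.update(record.payload.get("extra", {}))
    wire = {
        "schema_version": record.schema_version,
        "ts": record.ts,
        "producer_id": record.producer_id,
        "kind": record.kind,
        "payload": payload,
    }
    # Sorted keys and fixed separators keep the bytes identical per input.
    return json.dumps(wire, sort_keys=True, separators=(",", ":")).encode("utf-8")


def parse(buf: bytes) -> FdrRecord:
    wire = json.loads(buf)
    kind, payload = wire["kind"], wire["payload"]
    known = KNOWN_PAYLOAD_FIELDS.get(kind)
    if known is not None:
        # Known kind: move fields this version does not know into `extra`.
        extra = {k: v for k, v in payload.items() if k not in known}
        payload = {k: v for k, v in payload.items() if k in known}
        if extra:
            payload["extra"] = extra
    # Unknown kinds pass through with the raw decoded payload.
    return FdrRecord(wire["schema_version"], wire["ts"], wire["producer_id"], kind, payload)
```

Flattening `extra` on write and re-bucketing it on read is what lets a record carrying unknown fields survive a parse/serialise/parse cycle unchanged.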
### Excluded
- The lock-free ring buffer (`FdrClient.enqueue`) — owned by AZ-XX (next task in this epic).
- Drop-oldest + overrun-emission policy — owned by the third task in this epic.
- The `FakeFdrSink` test double — owned by the fourth task in this epic.
- The C13 writer thread, segment files, and 64 GB cap — owned by E-C13 (AZ-248).
## Acceptance Criteria
**AC-1: One envelope, every kind**
Given any of the v1.0.0 `kind` values
When a producer constructs an `FdrRecord(kind=<kind>, payload=<payload>)` and `serialise` is called
Then the resulting bytes parse back to a deep-equal `FdrRecord` via `parse`
**AC-2: Forward-compatible parser**
Given a record serialised at schema_version 1.1 (a hypothetical future minor) with an additional payload field `new_field`
When tooling pinned at schema_version 1.0 calls `parse`
Then the record parses successfully; `new_field` is preserved under `payload.extra["new_field"]`
**AC-3: Unknown kind tolerated**
Given a record whose `kind` is not in the v1.0.0 closed enum
When `parse` runs
Then a valid `FdrRecord` is returned with `kind` set to the raw string and `payload` set to the raw decoded object — no exception is raised
**AC-4: Schema version is mandatory**
Given a serialised record missing `schema_version` (or with a non-integer value)
When `parse` runs
Then `FdrSchemaError` is raised with a message naming the offending field
**AC-5: Overrun record shape is canonical**
Given a `kind="overrun"` record
When constructed by any producer
Then `payload` MUST contain `producer_id: str` and `dropped_count: int (>0)` — schema validation rejects payloads missing either field
**AC-6: Producer ID is required on every record**
Given any `FdrRecord` with an empty or missing top-level `producer_id`
When `serialise` runs
Then `FdrSchemaError` is raised — there are no anonymous records on the wire
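AC-4 to AC-6 amount to a small set of envelope checks. The sketch below shows one possible shape, assuming the record has already been decoded to a plain `dict`; `validate_envelope` is a hypothetical helper name and the error messages are illustrative.

```python
class FdrSchemaError(Exception):
    """Stand-in for the contract's schema-violation exception."""


def validate_envelope(wire):
    """Raise FdrSchemaError naming the first offending field, else return None."""
    version = wire.get("schema_version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise FdrSchemaError("schema_version: missing or not an integer")

    producer_id = wire.get("producer_id")
    if not isinstance(producer_id, str) or not producer_id:
        raise FdrSchemaError("producer_id: missing or empty")

    if wire.get("kind") == "overrun":
        payload = wire.get("payload")
        if not isinstance(payload, dict):
            raise FdrSchemaError("payload: overrun payload must be an object")
        if not isinstance(payload.get("producer_id"), str):
            raise FdrSchemaError("payload.producer_id: missing or not a string")
        dropped = payload.get("dropped_count")
        if isinstance(dropped, bool) or not isinstance(dropped, int) or dropped <= 0:
            raise FdrSchemaError("payload.dropped_count: must be an integer > 0")
```

Using `dict.get` throughout means a missing key never surfaces as `KeyError`, which matches the rule that `FdrSchemaError` is the only exception type callers see.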
## Non-Functional Requirements
**Performance**
- `serialise` p99 ≤ 20 µs on Tier-2 for a record whose `payload` has at most 16 scalar entries. Because 20 µs exceeds the enqueue path's 5 µs hot-path target, serialisation may run on the writer thread instead of the producer; the contract test asserts that the producer path does not call `serialise`.
- `parse` p99 ≤ 50 µs on the same record shape.
**Reliability**
- `serialise` and `parse` are pure functions: same input → byte-identical output (or deep-equal record).
- `FdrSchemaError` is the ONLY exception type either function raises on schema violation; no `KeyError`, `ValueError`, or library-specific exceptions leak to callers.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Round-trip every v1.0.0 kind | `parse(serialise(r)) == r` for every kind |
| AC-2 | Parse a synthetic v1.1 record (extra field added) at v1.0 | record parses; extra field preserved in `payload.extra` |
| AC-3 | Parse a record with `kind="future.kind"` | record parses; `kind` and `payload` opaque |
| AC-4 | Missing `schema_version` | `FdrSchemaError` mentions `schema_version` |
| AC-5 | `overrun` record missing `dropped_count` | `FdrSchemaError` mentions `dropped_count` |
| AC-6 | Serialise with empty `producer_id` | `FdrSchemaError` mentions `producer_id` |
| NFR-perf | Microbench `serialise` and `parse` on a 16-entry payload | p99 within budget over 10k iterations |
| NFR-reliability | Call `serialise` twice with same input | byte-identical outputs |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md` v1.0.0.
- Underlying serialisation library (orjson vs. msgpack) is pinned in `pyproject.toml` at E-BOOT and must NOT leak through the public API — the contract talks about bytes in/out only.
- No new dependency beyond what AZ-263 / E-BOOT pinned.
## Risks & Mitigation
**Risk 1: Library choice (orjson vs msgpack) changes after consumers exist**
- *Risk*: Switching from JSON-bytes to msgpack-bytes after producers are wired forces a wire-format migration.
- *Mitigation*: The library is pinned in `pyproject.toml` at AZ-263 / E-BOOT; the contract's `Shape` section documents the chosen wire format and its `magic` prefix so post-flight tooling can detect a wrong-format file fast.
**Risk 2: `payload.extra` preserves bytes-blob fields and bloats memory**
- *Risk*: A future schema adds a large binary field; old tooling preserves it under `extra` and balloons in-memory record size.
- *Mitigation*: The forward-compat rule documents that fields larger than 4 KiB MUST be referenced by sidecar path, not embedded — enforced in the contract `Invariants` section.
## Runtime Completeness
- **Named capability**: `FdrRecord` versioned schema + serialiser/parser pair (architecture / E-CC-FDR-CLIENT / AC-NEW-3, AC-8.5).
- **Production code that must exist**: real schema enforcement (every required field validated), real round-trip tested against the chosen library.
- **Allowed external stubs**: none — the library (orjson/msgpack) is the production dependency.
- **Unacceptable substitutes**: hand-rolled `repr()` -> `eval()` round-trip; "for now we just store dicts and worry about schema later"; serialiser that drops unknown fields silently (breaks forward-compat AC-2).
## Contract
This task produces the contract at `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`.
Consumers MUST read that file — not this task spec — to discover the interface.
@@ -0,0 +1,153 @@
# WgsConverter Helper Module
**Task**: AZ-279_wgs_converter
**Name**: WgsConverter Helper
**Description**: Implement the shared `WgsConverter` helper for WGS84 ↔ local-tangent-plane (ENU) ↔ tile-pixel coordinate conversions, backed by `pyproj`. Used by C4, C5, C6, C8, C10, C11, and C12 — every component that crosses the geographic-vs-local-frame boundary. Stateless static-only design (per `coderule.mdc`); slippy-map tile convention matches `satellite-provider`'s on-disk layout.
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.helpers.wgs_converter (cross-cutting; epic AZ-264 / E-CC-HELPERS)
**Tracker**: AZ-279
**Epic**: AZ-264 (E-CC-HELPERS)
### Document Dependencies
- `_docs/02_document/contracts/shared_helpers/wgs_converter.md` — frozen public interface this task produces.
- `_docs/02_document/common-helpers/04_helper_wgs_converter.md` — design rationale and consumer mapping.
## Problem
Seven components (C4, C5, C6, C8, C10, C11, C12) need to cross the geographic-vs-local-frame boundary:
- C4 compares pose-in-WGS to pose-in-ENU; C5 initialises iSAM2 from a WGS origin.
- C6's tile bbox queries map between lat/lon and tile-pixel `(zoom, x, y)`.
- C8 encodes pose for FC emission; C10 / C11 resolve operator-entered bboxes to tile lists; C12 takes the operator's bbox input.
Without a shared helper:
- Each component re-derives the WGS84 → ECEF → ENU pipeline; sign conventions (ENU vs NED) drift; altitude treatment (ellipsoidal vs orthometric) diverges.
- Tile-xy conversions go through OSM-style math in some places and Mercator-projection in others, breaking on-disk compatibility with `satellite-provider`'s `{zoom}/{x}/{y}.jpg` layout.
- A future datum or geoid change becomes a 7-place coordinated edit instead of a single helper update.
## Outcome
- A single `helpers.wgs_converter` module is the only place that performs WGS84 / ECEF / ENU / tile-xy conversions across the codebase. Component imports go through the helper.
- All conversions are pure static functions: same input → byte-equal output (deep-equal numpy / `LatLonAlt`).
- ENU sign convention is locked to `(east, north, up)` and documented; consumers cannot drift to NED accidentally.
- Slippy-map tile convention matches `satellite-provider`'s on-disk layout — the contract test pins the `(zoom=18, lat=50.45, lon=30.52) → (x, y)` round-trip against a known-good fixture.
- Out-of-range inputs (zoom > 22, lat outside Web-Mercator-valid range, ECEF shape mismatch, tile-xy out of `[0, 2^zoom)`) raise `WgsConversionError` rather than silently producing garbage.
## Scope
### Included
- Static methods on `WgsConverter`: `latlonalt_to_ecef`, `ecef_to_latlonalt`, `latlonalt_to_local_enu`, `local_enu_to_latlonalt`, `latlon_to_tile_xy`, `tile_xy_to_latlon_bounds`.
- `WgsConversionError` exception type.
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/wgs_converter.md`.
### Excluded
- Datum-shift logic / non-WGS84 datums — out of scope for v1.0.0.
- UTM / MGRS conversions — out of scope.
- Geoid-height corrections (orthometric vs. ellipsoidal altitude) — out of scope; the contract documents that altitude is ellipsoidal.
- Vincenty / great-circle distance helpers — out of scope.
- Body-frame ↔ ECEF rotation transforms — `helpers.se3_utils` + per-deployment `CameraCalibration`.
- The `LatLonAlt` / `BoundingBox` DTOs themselves — owned by `_types/` (AZ-263).
## Acceptance Criteria
**AC-1: ECEF round-trip**
Given `p = LatLonAlt(50.0, 30.0, 100.0)`
When `ecef_to_latlonalt(latlonalt_to_ecef(p))` runs
Then the returned `LatLonAlt` matches `p` within `atol=1e-9` deg lat/lon and `1e-6` m altitude
**AC-2: ENU round-trip within 10 km**
Given an `origin` and a `p` ~10 km away
When `local_enu_to_latlonalt(origin, latlonalt_to_local_enu(origin, p))` runs
Then the returned `LatLonAlt` matches `p` within 1 m horizontal + 1 cm vertical
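To make the `(east, north, up)` sign convention concrete, here is a sketch of the WGS84 → ECEF → ENU pipeline using only stdlib `math`. It is an illustration, not the helper: the production module is `pyproj`-backed, and `geodetic_to_ecef` and `enu_from_origin` are hypothetical names that take plain `(lat_deg, lon_deg, alt_m)` tuples instead of `LatLonAlt`. The ENU step is the exact rotation of the ECEF difference vector, not a flat-earth approximation.

```python
import math

# WGS84 ellipsoid constants.
_A = 6378137.0              # semi-major axis, metres
_F = 1.0 / 298.257223563    # flattening
_E2 = _F * (2.0 - _F)       # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Closed-form WGS84 geodetic -> ECEF; altitude is ellipsoidal."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = _A / math.sqrt(1.0 - _E2 * math.sin(lat) ** 2)  # prime-vertical radius
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - _E2) + alt_m) * math.sin(lat)
    return x, y, z


def enu_from_origin(origin, point):
    """Return (east, north, up) in metres of `point` relative to `origin`."""
    lat0, lon0 = math.radians(origin[0]), math.radians(origin[1])
    ox, oy, oz = geodetic_to_ecef(*origin)
    px, py, pz = geodetic_to_ecef(*point)
    dx, dy, dz = px - ox, py - oy, pz - oz
    sin_lat, cos_lat = math.sin(lat0), math.cos(lat0)
    sin_lon, cos_lon = math.sin(lon0), math.cos(lon0)
    # Rotate the ECEF difference into the tangent plane at the origin.
    east = -sin_lon * dx + cos_lon * dy
    north = -sin_lat * cos_lon * dx - sin_lat * sin_lon * dy + cos_lat * dz
    up = cos_lat * cos_lon * dx + cos_lat * sin_lon * dy + sin_lat * dz
    return east, north, up
```

A point at a higher latitude than the origin yields a positive `north`, a point at a higher longitude yields a positive `east`, and a point directly above yields a positive `up`. A NED consumer would have to swap and negate explicitly.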
**AC-3: Slippy-map tile round-trip at z18**
Given `(zoom=18, lat=50.45, lon=30.52)`
When `tile_xy_to_latlon_bounds(zoom, *latlon_to_tile_xy(zoom, lat, lon))` runs
Then the returned bounding box contains the input lat/lon AND the `(x, y)` matches the OSM-pinned fixture for the same coordinates
**AC-4: Web-Mercator latitude range guard**
Given `lat = 95.0` passed to `latlon_to_tile_xy`
When the call runs
Then `WgsConversionError` is raised mentioning the Web-Mercator-valid range `[-85.0511, 85.0511]`
**AC-5: Zoom range guard**
Given `zoom = 25`
When `latlon_to_tile_xy` or `tile_xy_to_latlon_bounds` runs
Then `WgsConversionError` is raised mentioning the supported zoom range `[0, 22]`
**AC-6: Tile-xy range guard**
Given `(zoom=18, x=2^18, y=0)`
When `tile_xy_to_latlon_bounds` runs
Then `WgsConversionError` is raised mentioning the valid `(x, y)` range `[0, 2^zoom)`
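The tile functions and their three guards (AC-4 to AC-6) can be sketched with the standard OSM slippy-map formulas. This is a sketch under assumptions: the bounds are returned as a plain `(south, west, north, east)` tuple instead of the `BoundingBox` DTO, the longitude guard is my addition, and the exception class here is a local stand-in.

```python
import math

MAX_ZOOM = 22
MAX_LAT = 85.0511  # Web-Mercator-valid latitude bound quoted in AC-4


class WgsConversionError(ValueError):
    """Local stand-in for the helper's exception type."""


def _check_zoom(zoom):
    if not 0 <= zoom <= MAX_ZOOM:
        raise WgsConversionError(f"zoom {zoom} outside supported range [0, {MAX_ZOOM}]")


def latlon_to_tile_xy(zoom, lat, lon):
    _check_zoom(zoom)
    if not -MAX_LAT <= lat <= MAX_LAT:
        raise WgsConversionError(
            f"lat {lat} outside Web-Mercator-valid range [-{MAX_LAT}, {MAX_LAT}]"
        )
    if not -180.0 <= lon <= 180.0:
        raise WgsConversionError(f"lon {lon} outside range [-180, 180]")
    n = 1 << zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # lon == 180 lands exactly on n; clamp both indices into [0, n).
    return min(x, n - 1), min(max(y, 0), n - 1)


def tile_xy_to_latlon_bounds(zoom, x, y):
    _check_zoom(zoom)
    n = 1 << zoom
    if not (0 <= x < n and 0 <= y < n):
        raise WgsConversionError(f"tile ({x}, {y}) outside valid range [0, 2^{zoom})")

    def lat_of(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    # Row y is the northern edge of the tile, row y + 1 the southern edge.
    return lat_of(y + 1), west, lat_of(y), east
```

Tile rows grow southwards in this convention, which is why the southern bound comes from `y + 1`.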
**AC-7: ECEF shape contract**
Given an array of shape `(2,)` passed to `ecef_to_latlonalt`
When the call runs
Then `WgsConversionError` is raised mentioning the expected shape `(3,)`
**AC-8: Determinism**
Given the same input
When any helper function is called twice
Then both outputs are byte-equal
**AC-9: No upward imports (Layer 1 invariant)**
Given the helper module
When a static-import check runs
Then it imports ONLY from `_types`, `pyproj`, numpy, and stdlib — no `gps_denied_onboard.components.*` imports anywhere
## Non-Functional Requirements
**Performance**
- No specific latency budget per `_docs/02_document/common-helpers/04_helper_wgs_converter.md` (consumers are pre-flight / post-landing). Each function p99 ≤ 200 µs on Tier-2 as a sanity bound.
**Reliability**
- Pure deterministic; same input → byte-equal output.
- `WgsConversionError` is the ONLY exception type the public surface raises on shape / range violations. `pyproj`'s lower-level exceptions MUST be wrapped.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | ECEF round-trip on 100 random valid `LatLonAlt`s | all match within `atol=1e-9` deg + `1e-6` m |
| AC-2 | ENU round-trip on 100 origin/point pairs within 10 km | all match within 1 m + 1 cm |
| AC-3 | Slippy-map round-trip at z18 with OSM-pinned fixture | `(x, y)` matches fixture; bounds contain input |
| AC-4 | `latlon_to_tile_xy(18, 95.0, 0.0)` | `WgsConversionError`; mentions Web-Mercator range |
| AC-5 | `latlon_to_tile_xy(25, 0, 0)` | `WgsConversionError`; mentions zoom range |
| AC-6 | `tile_xy_to_latlon_bounds(18, 2**18, 0)` | `WgsConversionError`; mentions tile-xy range |
| AC-7 | `ecef_to_latlonalt(np.zeros(2))` | `WgsConversionError`; mentions shape `(3,)` |
| AC-8 | each helper called twice with same input | byte-equal outputs |
| AC-9 | importlinter / grep gate | no `components.*` imports |
| NFR-perf | microbench each helper (10k iterations on Tier-2 fixture) | p99 ≤ 200 µs each |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_helpers/wgs_converter.md` v1.0.0.
- Layer 1 Foundation only.
- `pyproj` is the single geodesy backend; pinned in `pyproject.toml` at AZ-263 / E-BOOT.
- Static-only design satisfies `coderule.mdc` ("only use static methods for pure self-contained computations") — every operation is a pure mathematical function of its arguments.
- No new dependency beyond what AZ-263 / E-BOOT pinned.
## Risks & Mitigation
**Risk 1: Tangent-plane approximation degrades silently beyond 100 km**
- *Risk*: A consumer (e.g., C12 operator tooling with a continent-scale bbox) calls `latlonalt_to_local_enu` on a point 500 km from origin; the helper returns a result with O(1 km) error; consumer uses it as ground truth.
- *Mitigation*: The contract `Invariants` section documents the 100 km validity range. Consumers that need wider range explicitly chain ECEF↔ENU through a closer origin.
**Risk 2: Datum drift if `pyproj` upgrades silently change WGS84 parameters**
- *Risk*: A future `pyproj` minor version changes the WGS84 ellipsoid parameters; all conversions shift by sub-metre amounts, breaking the round-trip ACs.
- *Mitigation*: `pyproj` is pinned at AZ-263; round-trip ACs are the canary that detects drift on dependency upgrade.
## Runtime Completeness
- **Named capability**: WGS84 ↔ ECEF ↔ ENU ↔ tile-xy conversions via `pyproj` (architecture / E-CC-HELPERS / `04_helper_wgs_converter.md`).
- **Production code that must exist**: real `pyproj`-backed conversions; real slippy-map tile math matching `satellite-provider`'s on-disk layout.
- **Allowed external stubs**: none — `pyproj` is the production runtime.
- **Unacceptable substitutes**: hand-rolled flat-earth ENU approximation (silently breaks AC-2 beyond a few km); custom Mercator tile math that drifts from OSM convention (breaks `satellite-provider` compatibility); skipping out-of-range guards (silent garbage for high latitudes).
## Contract
This task produces the contract at `_docs/02_document/contracts/shared_helpers/wgs_converter.md`.
Consumers MUST read that file — not this task spec — to discover the interface.
@@ -0,0 +1,158 @@
# EngineFilenameSchema Helper Module
**Task**: AZ-281_engine_filename_schema
**Name**: EngineFilenameSchema Helper
**Description**: Implement the shared `EngineFilenameSchema` helper for the self-describing `.engine` filename schema (D-C10-7). TensorRT engines are NOT portable across `(SM, JetPack, TRT, precision)` tuples; encoding the tuple in the filename makes mismatch instantly visible at takeoff load (F2). Used by C7 (writes engines on compile, reads on `deserialize_engine`) and C10 (compiles engines via C7 and writes them to the cache root). Stateless static-only design.
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.helpers.engine_filename_schema (cross-cutting; epic AZ-264 / E-CC-HELPERS)
**Tracker**: AZ-281
**Epic**: AZ-264 (E-CC-HELPERS)
### Document Dependencies
- `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md` — frozen public interface this task produces.
- `_docs/02_document/common-helpers/06_helper_engine_filename_schema.md` — design rationale (D-C10-7).
## Problem
TensorRT engines are not portable. An engine compiled for SM 87 / JetPack 6.2 / TRT 10.3 / FP16 will fail to deserialize — or, worse, deserialize and silently produce wrong output — on a host with a different `(sm, jp, trt, precision)` tuple. Without a self-describing filename:
- C7's `deserialize_engine` cannot tell whether an engine in the cache root matches the host capabilities until it tries to load it (an expensive, partially side-effecting operation).
- C10 has to maintain an out-of-band sidecar mapping filenames to tuples; that sidecar drifts.
- An operator who copies an engine from a different deployment by mistake gets opaque "deserialize failed" errors at takeoff instead of a clear "engine was built for sm87, host is sm72".
## Outcome
- A single `helpers.engine_filename_schema` module is the only path through which any onboard process composes or parses `.engine` filenames.
- The schema makes `(model_name, sm, jetpack, trt, precision)` part of the filename: `{model}__sm{SM}_jp{JP}_trt{TRT}_{precision}.engine`. F2 takeoff load uses `matches_host` to decide which engines to deserialize and which to refuse before paying the deserialise cost.
- The schema is strict — invalid model names, non-dotted version strings, unknown precisions are rejected at `build` time; malformed filenames are rejected at `parse` time. Both raise `EngineFilenameSchemaError` with messages that name the offending field.
- Round-trip identity: `parse(build(*args)) == EngineCacheKey(*args)` for any valid args. Round-trip is the contract test that catches any future format drift.
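A sketch of `build` and `parse` for the format above, using an anchored regex and one shared validator so both directions enforce the same rules. The `EngineCacheKey` field names are assumptions (the real DTO lives in `_types/`), the error messages are illustrative, and these are module-level functions here, not static methods.

```python
import re
from dataclasses import dataclass

PRECISIONS = ("fp16", "int8", "mixed")


@dataclass(frozen=True)
class EngineCacheKey:  # stand-in DTO; field names are assumptions
    model_name: str
    sm: int
    jetpack: str
    trt: str
    precision: str


class EngineFilenameSchemaError(ValueError):
    """Local stand-in for the helper's exception type."""


_MODEL = re.compile(r"[a-z0-9_]+")
_VERSION = re.compile(r"[0-9]+\.[0-9]+")
_FILENAME = re.compile(
    r"(?P<model>[a-z0-9_]+?)__sm(?P<sm>[0-9]+)_jp(?P<jp>[0-9]+\.[0-9]+)"
    r"_trt(?P<trt>[0-9]+\.[0-9]+)_(?P<precision>[a-z0-9]+)\.engine"
)


def _validate(key):
    if not _MODEL.fullmatch(key.model_name):
        raise EngineFilenameSchemaError("model_name: allowed characters are [a-z0-9_]")
    if "__" in key.model_name:
        raise EngineFilenameSchemaError("model_name: '__' is the reserved separator")
    if isinstance(key.sm, bool) or not isinstance(key.sm, int) or key.sm < 0:
        raise EngineFilenameSchemaError("sm: expected a non-negative integer")
    for field_name in ("jetpack", "trt"):
        if not _VERSION.fullmatch(getattr(key, field_name)):
            raise EngineFilenameSchemaError(f"{field_name}: expected dotted <major>.<minor>")
    if key.precision not in PRECISIONS:
        allowed = ", ".join(PRECISIONS)
        raise EngineFilenameSchemaError("precision: allowed values are {" + allowed + "}")


def build(model_name, sm, jetpack, trt, precision):
    _validate(EngineCacheKey(model_name, sm, jetpack, trt, precision))
    return f"{model_name}__sm{sm}_jp{jetpack}_trt{trt}_{precision}.engine"


def parse(filename):
    if not filename.endswith(".engine"):
        raise EngineFilenameSchemaError("filename: missing required '.engine' suffix")
    match = _FILENAME.fullmatch(filename)  # fullmatch anchors both ends
    if match is None:
        raise EngineFilenameSchemaError(f"filename: {filename!r} does not match the schema")
    key = EngineCacheKey(
        match["model"], int(match["sm"]), match["jp"], match["trt"], match["precision"]
    )
    _validate(key)
    return key
```

For the reference tuple, `build("ultravpr", 87, "6.2", "10.3", "fp16")` yields the filename given in AC-1, and `parse` of that filename returns the equal `EngineCacheKey`.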
## Scope
### Included
- `EngineFilenameSchema` static methods: `build`, `parse`, `matches_host`.
- `EngineFilenameSchemaError` exception type.
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md`.
### Excluded
- Schema versioning (no `schema_version` field) — adding a new tuple dimension is a Plan-phase carryforward.
- Engine compilation / compatibility resolution — C7.
- Hot-loading / lazy materialisation — C7.
- Filename collision detection across cache roots — C10's Manifest.
- The `EngineCacheKey` / `HostCapabilities` types themselves — owned by `_types/manifests.py` (AZ-263).
## Acceptance Criteria
**AC-1: Reference example builds correctly**
Given `("ultravpr", 87, "6.2", "10.3", "fp16")`
When `build` runs
Then the result is exactly `"ultravpr__sm87_jp6.2_trt10.3_fp16.engine"`
**AC-2: Round-trip identity**
Given 10 random valid tuples
When each round-trips through `parse(build(*args))`
Then each produces deep-equal `EngineCacheKey` outputs
**AC-3: Host-match exact**
Given a filename built for `(sm=87, jp=6.2, trt=10.3)` and a `HostCapabilities(sm=87, jp=6.2, trt=10.3)`
When `matches_host` runs
Then the result is True
**AC-4: Host-mismatch on any tuple element returns False (no exception)**
Given a filename built for `(sm=87, jp=6.2, trt=10.3)` and a host with `sm=72`
When `matches_host` runs
Then the result is False (NOT an exception — tuple mismatch is the expected "not a match" path)
**AC-5: Precision enum strictness**
Given `build(..., precision="bf16")`
When the call runs
Then `EngineFilenameSchemaError` is raised mentioning the allowed enum `{fp16, int8, mixed}`
**AC-6: Model-name character set**
Given `build("UltraVPR", ...)` (uppercase letters)
When the call runs
Then `EngineFilenameSchemaError` is raised mentioning the allowed `[a-z0-9_]` set
**AC-7: Reserved separator collision**
Given `build("ultra__vpr", ...)` (double underscore in model name)
When the call runs
Then `EngineFilenameSchemaError` is raised mentioning the reserved `__` separator
**AC-8: Version format strictness**
Given `build(..., jetpack="6.2.1", ...)` (three-segment version)
When the call runs
Then `EngineFilenameSchemaError` is raised mentioning the dotted `<major>.<minor>` format
**AC-9: Parse rejects malformed filenames**
Given `parse("not_an_engine_file.bin")`
When the call runs
Then `EngineFilenameSchemaError` is raised
**AC-10: Parse requires `.engine` suffix**
Given `parse("ultravpr__sm87_jp6.2_trt10.3_fp16")` (missing `.engine`)
When the call runs
Then `EngineFilenameSchemaError` is raised mentioning the required suffix
**AC-11: No upward imports (Layer 1 invariant)**
Given the helper module
When a static-import check runs
Then it imports ONLY from `_types`, `re`, and stdlib — no `gps_denied_onboard.components.*` imports anywhere
## Non-Functional Requirements
**Performance**
- No specific latency budget per `_docs/02_document/common-helpers/06_helper_engine_filename_schema.md` (consumers are pre-flight / takeoff-load). Sanity bound: each helper call ≤ 50 µs on Tier-2.
**Reliability**
- Pure deterministic; same input → byte-equal output.
- `EngineFilenameSchemaError` is the ONLY exception type the public surface raises on validation / parse errors.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | reference example | exact filename match |
| AC-2 | round-trip 10 random valid tuples | deep-equal `EngineCacheKey` outputs |
| AC-3 | matching host | True |
| AC-4 | mismatched `sm` | False; no exception |
| AC-5 | `precision="bf16"` | `EngineFilenameSchemaError`; mentions enum |
| AC-6 | uppercase model name | `EngineFilenameSchemaError`; mentions `[a-z0-9_]` |
| AC-7 | double-underscore model name | `EngineFilenameSchemaError`; mentions reserved separator |
| AC-8 | three-segment version | `EngineFilenameSchemaError`; mentions dotted format |
| AC-9 | malformed filename | `EngineFilenameSchemaError` |
| AC-10 | missing `.engine` suffix | `EngineFilenameSchemaError`; mentions suffix |
| AC-11 | importlinter / grep gate | no `components.*` imports |
| NFR-perf | microbench each helper (10k iterations on Tier-2 fixture) | p99 ≤ 50 µs each |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md` v1.0.0.
- Layer 1 Foundation only.
- Static-only design satisfies `coderule.mdc`.
- No new dependency beyond what AZ-263 / E-BOOT pinned (only `re` and stdlib are needed).
- The `EngineCacheKey` / `HostCapabilities` types live in `_types/manifests.py` (AZ-263 responsibility).
## Risks & Mitigation
**Risk 1: A future format change breaks existing cache roots**
- *Risk*: Adding a tuple dimension (e.g., `BUILD_*` flag combination) requires re-writing every existing `.engine` filename; deployments with stale cache roots fail silently.
- *Mitigation*: The contract's `Versioning Rules` section mandates a major-version bump for any format change. C7's `deserialize_engine` should also reject unrecognised filename patterns rather than guess; wiring that on top of this helper's `parse` is C7's responsibility.
**Risk 2: `matches_host` returns False without explanation**
- *Risk*: An operator copies an engine from a different deployment; takeoff-load skips it; the operator sees "no engine matches host" without knowing which tuple element mismatched.
- *Mitigation*: This helper is just the predicate. The error-surfacing UX is C7's / C10's responsibility — they call `parse` to extract the engine's tuple AND read `host_capabilities`, then format an actionable error. The contract documents the predicate's "True iff all tuple elements match" semantics so consumers can produce that message themselves.
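A consumer-side sketch of the mitigation for Risk 2 might look like the following. It is not part of this helper. The `HostCapabilities` field names, the choice of which tuple elements are compared against the host, and the `describe_mismatches` function are all hypothetical; the real types live in `_types/manifests.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineCacheKey:
    """Hypothetical stand-in for the parsed engine tuple."""
    model: str
    sm: str
    jetpack: str
    tensorrt: str
    precision: str


@dataclass(frozen=True)
class HostCapabilities:
    """Hypothetical stand-in; the field names are assumptions."""
    sm: str
    jetpack: str
    tensorrt: str


# Assumed set of tuple elements that depend on the host.
_HOST_FIELDS = ("sm", "jetpack", "tensorrt")


def describe_mismatches(key: EngineCacheKey, host: HostCapabilities) -> list[str]:
    """Return one message per tuple element that differs from the host."""
    messages = []
    for name in _HOST_FIELDS:
        engine_value = getattr(key, name)
        host_value = getattr(host, name)
        if engine_value != host_value:
            messages.append(
                f"{name}: engine={engine_value!r}, host={host_value!r}"
            )
    return messages
```

An empty list corresponds to the predicate's "True iff all tuple elements match" case, so C7 / C10 could format the non-empty list directly into the operator-facing error.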
## Runtime Completeness
- **Named capability**: self-describing engine filename schema (D-C10-7 / `06_helper_engine_filename_schema.md`).
- **Production code that must exist**: real format builder + parser + host-match predicate; real strict validation for all five tuple elements.
- **Allowed external stubs**: none — pure string parsing on stdlib.
- **Unacceptable substitutes**: `f"{model}_{sm}_{jp}_{trt}_{precision}.engine"` (single-underscore separators make the boundary between `model` and `sm` ambiguous, because model names may themselves contain `_`); silently truncating `jetpack="6.2.1"` to `6.2`; matching the host by substring instead of exact equality.
## Contract
This task produces the contract at `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md`.
Consumers MUST read that file — not this task spec — to discover the interface.
@@ -0,0 +1,175 @@
# DescriptorNormaliser Helper Module
**Task**: AZ-283_descriptor_normaliser
**Name**: DescriptorNormaliser Helper
**Description**: Implement the shared `DescriptorNormaliser` helper that L2-normalises descriptors so cosine similarity aligns with FAISS HNSW's inner-product metric. Used by C2 (query-side per-frame embedding before FAISS lookup), C2.5 (descriptor pre-processing for re-rank), C3 (descriptor pre-processing for cross-domain matching), and C10 (corpus-side per-tile embedding before FAISS index population). The same helper on both sides is what guarantees the index returns useful neighbours rather than garbage. Stateless static-only; dtype-preserving (`float16`/`float32` in → same out).
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.helpers.descriptor_normaliser (cross-cutting; epic AZ-264 / E-CC-HELPERS)
**Tracker**: AZ-283
**Epic**: AZ-264 (E-CC-HELPERS)
### Document Dependencies
- `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` — frozen public interface this task produces.
- `_docs/02_document/common-helpers/08_helper_descriptor_normaliser.md` — design rationale (FAISS metric alignment, corpus-vs-query coupling).
## Problem
FAISS HNSW operates on Euclidean / inner-product spaces, but the upstream backbones (UltraVPR, MegaLoc, MixVPR, SelaVPR, EigenPlaces, NetVLAD, SALAD) emit raw embeddings that are meant to be compared by cosine similarity. The standard FAISS-idiomatic recipe is "L2-normalise both sides + use inner-product metric", but this recipe is fragile:
- If C10 (corpus side, pre-flight) and C2 (query side, runtime) drift on whether to normalise, or how to handle zero-norm vectors, the FAISS index returns garbage.
- If one side silently up-casts `float16` to `float32`, the index gets built with a different precision than the queries, producing wrong neighbours.
- A future contributor "improves" one side's normalisation (e.g., adds whitening) without the other; recall drops silently.
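The coupling described above rests on a simple identity: once both vectors have unit L2 norm, their inner product equals their cosine similarity. The short numpy check below illustrates that, and also shows what happens when only one side is normalised. The variable names and the random test vectors are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(512).astype(np.float32)  # stands in for a query descriptor
b = rng.standard_normal(512).astype(np.float32)  # stands in for a corpus descriptor

# Cosine similarity computed directly from the raw vectors.
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Inner product after L2-normalising BOTH sides.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
inner_both = float(a_hat @ b_hat)

# Inner product when only the query side was normalised (the drift case):
# the score is scaled by ||b||, so corpus vectors with larger norms are
# favoured regardless of direction.
inner_drifted = float(a_hat @ b)
```

In the drifted case the score equals the cosine similarity multiplied by the corpus vector's norm, which is why the ranking returned by the index stops reflecting cosine similarity.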
## Outcome
- A single `helpers.descriptor_normaliser` module is the only path through which any onboard process L2-normalises descriptors.
- The metric ("inner_product") is exposed via `descriptor_metric()` so C6's `DescriptorIndex.search_topk` and C10's index-build code consult the same source — no hard-coded `"l2"` or `"cosine"` strings anywhere.
- dtype is preserved: `float16` in → `float16` out (preserves the precision the backbone chose); `float32` in → `float32` out. No silent up-cast.
- Zero-norm input vectors are returned as the zero vector (no division-by-zero); callers filter or accept that such descriptors will match nothing.
- L2 normalisation is idempotent (byte-equal for `float32`, near-byte-equal for `float16` due to half-precision rounding) so accidentally normalising twice is harmless.
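A minimal sketch of the single-vector path, under stated assumptions, is shown below. Computing the norm in `float32` internally and casting the result back to the input dtype is this sketch's choice, not something the spec prescribes; the error messages and the `ValueError` base class are likewise assumptions. The sketch makes no attempt to demonstrate the byte-equal idempotence that the outcome list calls for; that property would have to be verified against the real implementation.

```python
import numpy as np


class DescriptorNormaliserError(ValueError):
    """Shape / dtype violation (the ValueError base is an assumption)."""


_ALLOWED_DTYPES = (np.dtype(np.float16), np.dtype(np.float32))


class DescriptorNormaliser:
    @staticmethod
    def l2_normalise(descriptor: np.ndarray) -> np.ndarray:
        if not isinstance(descriptor, np.ndarray):
            raise DescriptorNormaliserError("descriptor must be a numpy array")
        if descriptor.dtype not in _ALLOWED_DTYPES:
            raise DescriptorNormaliserError(
                f"dtype must be float16 or float32, got {descriptor.dtype}"
            )
        if descriptor.ndim != 1:
            raise DescriptorNormaliserError(
                f"expected a 1-D array, got {descriptor.ndim}-D"
            )
        # astype() returns a new array here, so the caller's input is never
        # mutated. float32 is used only for the intermediate arithmetic.
        work = descriptor.astype(np.float32)
        norm = np.linalg.norm(work)
        if norm == 0.0:
            # Zero in, zero out: no division, hence no NaN.
            return np.zeros_like(descriptor)
        # Cast back so the output dtype matches the input dtype.
        return (work / norm).astype(descriptor.dtype)

    @staticmethod
    def descriptor_metric() -> str:
        return "inner_product"
```

For example, `DescriptorNormaliser.l2_normalise(np.array([3.0, 4.0], dtype=np.float32))` yields approximately `[0.6, 0.8]` as `float32`.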
## Scope
### Included
- `DescriptorNormaliser` static methods: `l2_normalise(descriptor) -> ndarray`, `l2_normalise_batch(descriptors) -> ndarray`, `descriptor_metric() -> str`.
- `DescriptorNormaliserError` exception type.
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md`.
### Excluded
- Whitening / mean-subtraction — out of scope; consumers that need it apply it before / after this helper.
- PCA / dimensionality reduction — out of scope.
- GPU-accelerated normalisation — out of scope for v1.0.0.
- Quantisation (PQ, IVF) — owned by C6 / C10 around the FAISS index.
- Auto-detection of descriptor dim — helper is shape-agnostic for any `D >= 1`.
## Acceptance Criteria
**AC-1: Unit-vector example**
Given `np.array([3.0, 4.0], dtype=float32)`
When `l2_normalise` runs
Then the result equals `np.array([0.6, 0.8], dtype=float32)` within `atol=1e-6`; norm ≈ 1.0
**AC-2: Batch normalisation**
Given `np.array([[3.0, 4.0], [1.0, 0.0]], dtype=float32)`
When `l2_normalise_batch` runs
Then the rows are `[0.6, 0.8]` and `[1.0, 0.0]`; each row's norm is ≈ 1.0
**AC-3: dtype preservation — float16**
Given a random `float16` descriptor of dim 512
When `l2_normalise` runs
Then `result.dtype == float16` AND `np.linalg.norm(result.astype(float32))` ≈ 1.0 within `atol=1e-3`
**AC-4: dtype preservation — float32**
Given a random `float32` descriptor of dim 512
When `l2_normalise` runs
Then `result.dtype == float32` AND `np.linalg.norm(result)` ≈ 1.0 within `atol=1e-6`
**AC-5: Zero-vector handling**
Given `np.zeros(128, dtype=float32)`
When `l2_normalise` runs
Then the result is `np.zeros(128, dtype=float32)` (no exception, no NaN)
**AC-6: Idempotence — float32**
Given a random `float32` descriptor `x`
When `l2_normalise(l2_normalise(x))` runs
Then it is byte-equal to `l2_normalise(x)`
**AC-7: Idempotence — float16**
Given a random `float16` descriptor `x`
When `l2_normalise(l2_normalise(x))` runs
Then it matches `l2_normalise(x)` within `atol=1e-3` (half-precision rounding)
**AC-8: No in-place mutation**
Given `x` is a `float32` descriptor
When `l2_normalise(x)` runs
Then `x` is bit-identical to its original value
**AC-9: Metric source of truth**
Given a call to `descriptor_metric()`
When it runs
Then it returns the string `"inner_product"`
**AC-10: dtype contract — float64 rejected**
Given a `float64` array
When `l2_normalise` runs
Then `DescriptorNormaliserError` is raised mentioning `float16` / `float32` only
**AC-11: Shape contract — 1-D for single, 2-D for batch**
Given a 2-D array passed to `l2_normalise` (single)
When the call runs
Then `DescriptorNormaliserError` is raised mentioning the 1-D shape requirement
And given a 1-D array passed to `l2_normalise_batch`
When the call runs
Then `DescriptorNormaliserError` is raised mentioning the 2-D shape requirement
**AC-12: No upward imports (Layer 1 invariant)**
Given the helper module
When a static-import check runs
Then it imports ONLY from `_types`, numpy, and stdlib — no `gps_denied_onboard.components.*` imports anywhere
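The batch criteria (AC-2, AC-11, and the zero-row case implied by AC-5) can be sketched as a row-wise variant. As before, this is an illustration under assumptions: the module-level function, the `float32` intermediate, and the error wording are not taken from the contract.

```python
import numpy as np


class DescriptorNormaliserError(ValueError):
    """Shape / dtype violation (the ValueError base is an assumption)."""


_ALLOWED_DTYPES = (np.dtype(np.float16), np.dtype(np.float32))


def l2_normalise_batch(descriptors: np.ndarray) -> np.ndarray:
    if not isinstance(descriptors, np.ndarray):
        raise DescriptorNormaliserError("descriptors must be a numpy array")
    if descriptors.dtype not in _ALLOWED_DTYPES:
        raise DescriptorNormaliserError(
            f"dtype must be float16 or float32, got {descriptors.dtype}"
        )
    if descriptors.ndim != 2:
        raise DescriptorNormaliserError(
            f"expected a 2-D (N, D) array, got {descriptors.ndim}-D"
        )
    work = descriptors.astype(np.float32)  # new array; input is not mutated
    norms = np.linalg.norm(work, axis=1, keepdims=True)  # shape (N, 1)
    # Replace zero norms with 1 so zero rows divide to zero instead of NaN.
    safe_norms = np.where(norms > 0.0, norms, np.float32(1.0))
    return (work / safe_norms).astype(descriptors.dtype)
```

Substituting `1.0` for zero norms keeps the whole operation vectorised: a zero row divided by one stays a zero row, so no per-row branching is needed.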
## Non-Functional Requirements
**Performance**
- `l2_normalise` p99 ≤ 50 µs on Tier-2 for `D=512` (matches the per-frame VPR query budget).
- `l2_normalise_batch` p99 ≤ 5 ms on Tier-2 for `(N=1000, D=512)` (matches the C10 batch index-build chunk size).
- Helper-level overhead vs. inline `x / np.linalg.norm(x)` ≤ 5 % (per E-CC-HELPERS hot-path NFR).
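One way to measure these budgets is a small p99 harness like the sketch below. The function names, the warm-up count, and the decision to compute overhead from p99 values are assumptions; the spec does not say which statistic the 5 % overhead bound applies to, and the numbers only mean something when run on the Tier-2 fixture.

```python
import time

import numpy as np


def p99_microseconds(fn, arg, iterations=10_000, warmup=100):
    """Time fn(arg) repeatedly and return the 99th-percentile latency in µs."""
    for _ in range(warmup):
        fn(arg)  # warm caches before sampling
    samples = np.empty(iterations, dtype=np.float64)
    for i in range(iterations):
        start = time.perf_counter_ns()
        fn(arg)
        samples[i] = (time.perf_counter_ns() - start) / 1_000.0
    return float(np.percentile(samples, 99))


def overhead_percent(helper_p99, baseline_p99):
    """Relative overhead of the helper over the inline baseline, in percent."""
    return 100.0 * (helper_p99 - baseline_p99) / baseline_p99


def inline_normalise(x):
    # The inline baseline named in the overhead NFR.
    return x / np.linalg.norm(x)


x = np.random.default_rng(0).standard_normal(512).astype(np.float32)
baseline_p99 = p99_microseconds(inline_normalise, x)
```

The helper under test would be timed with the same `p99_microseconds` call and compared against `baseline_p99` through `overhead_percent`.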
**Reliability**
- `DescriptorNormaliserError` is the ONLY exception type the public surface raises on shape / dtype violations.
- Purely deterministic; same input → byte-equal output (within `float16` rounding).
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `[3.0, 4.0]` fp32 | result `[0.6, 0.8]` within `atol=1e-6` |
| AC-2 | batch `[[3, 4], [1, 0]]` | rows `[0.6, 0.8]` and `[1.0, 0.0]` |
| AC-3 | random fp16 dim-512 | result.dtype fp16; norm ≈ 1.0 within `atol=1e-3` |
| AC-4 | random fp32 dim-512 | result.dtype fp32; norm ≈ 1.0 within `atol=1e-6` |
| AC-5 | zero vector | returned as zero vector; no exception |
| AC-6 | double-normalise fp32 | byte-equal to single-normalise |
| AC-7 | double-normalise fp16 | matches single-normalise within `atol=1e-3` |
| AC-8 | mutation check | input unchanged after call |
| AC-9 | `descriptor_metric()` | exact string `"inner_product"` |
| AC-10 | fp64 input | `DescriptorNormaliserError`; mentions fp16/fp32 |
| AC-11 | 2-D into single, 1-D into batch | `DescriptorNormaliserError` for each |
| AC-12 | importlinter / grep gate | no `components.*` imports |
| NFR-perf | microbench `l2_normalise` (D=512, 10k iterations on Tier-2 fixture) | p99 ≤ 50 µs; overhead ≤ 5 % |
| NFR-perf-batch | microbench `l2_normalise_batch` (N=1000, D=512, 10k iterations) | p99 ≤ 5 ms |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` v1.0.0.
- Layer 1 Foundation only.
- Numpy is the single backend; numpy-CUDA may be used opportunistically but the contract surface is dtype-only (no GPU array types leak through).
- Static-only design satisfies `coderule.mdc`.
- No new dependency beyond what AZ-263 / E-BOOT pinned.
## Risks & Mitigation
**Risk 1: Silent dtype up-cast hides corpus-vs-query precision drift**
- *Risk*: A future change up-casts `float16` → `float32` "for numerical stability"; C10's corpus is built with fp32 normalisations while C2's queries are still fp16 raw embeddings; recall silently drops.
- *Mitigation*: AC-3 / AC-4 pin dtype preservation. The contract test is the canary.
**Risk 2: Whitening creep**
- *Risk*: A contributor adds optional whitening "for `MixVPR` only" inside this helper; one consumer calls it with `whiten=True`, the other doesn't; index becomes inconsistent.
- *Mitigation*: The contract's `Non-Goals` section explicitly excludes whitening. Whitening lives elsewhere (or not at all in v1.0.0). Any whitening-related contract change requires a major version bump with a forced index rebuild.
**Risk 3: Zero-norm vectors crash the FAISS index build**
- *Risk*: Zero-norm input → `0 / 0` evaluates to NaN, which propagates into the FAISS index; the index becomes corrupt.
- *Mitigation*: AC-5 pins the zero-vector handling: zero in → zero out, no NaN. C10 / C2 are responsible for filtering zero descriptors before / after this helper if they don't want them in the index.
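The failure mode in Risk 3, and the consumer-side filtering that the mitigation leaves to C10 / C2, can be illustrated as follows. The `drop_zero_rows` function is hypothetical and only sketches what such a filter could look like.

```python
import numpy as np

zero = np.zeros(128, dtype=np.float32)

# Unguarded division: the norm is 0, so every element becomes 0 / 0 = NaN.
# errstate only silences the RuntimeWarning; it does not change the result.
with np.errstate(invalid="ignore", divide="ignore"):
    naive = zero / np.linalg.norm(zero)


def drop_zero_rows(batch: np.ndarray):
    """Return the rows with at least one non-zero element, plus the keep-mask."""
    keep = np.any(batch != 0, axis=1)
    return batch[keep], keep
```

Returning the mask alongside the filtered rows lets the caller keep any parallel arrays (tile ids, frame ids) aligned with the descriptors that remain.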
## Runtime Completeness
- **Named capability**: L2 normalisation aligning cosine-similar embeddings to FAISS inner-product metric (architecture / E-CC-HELPERS / `08_helper_descriptor_normaliser.md`).
- **Production code that must exist**: real numpy-backed L2 normalisation; real dtype-preserving path; real zero-norm-safe handling.
- **Allowed external stubs**: none — numpy is stdlib-tier production dep.
- **Unacceptable substitutes**: silent dtype up-cast; `np.divide(x, np.linalg.norm(x))` without zero-norm guard (NaN propagation); hard-coded metric string in C6 / C10 instead of consulting `descriptor_metric()`.
## Contract
This task produces the contract at `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md`.
Consumers MUST read that file — not this task spec — to discover the interface.