Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,145 @@
# Contract: CacheProvisioner (C10)
**Type**: Python Protocol (`@runtime_checkable`) — local in-process API.
**Producer task**: AZ-325_c10_cache_provisioner
**Consumers**:
- C12 Operator Tooling: orchestrates the F1 build sequence `C11 TileDownloader → CacheProvisioner.build_cache_artifacts` and surfaces the `BuildReport` to the operator (E-C12 / AZ-253).
- C13 FDR: out of scope for build (F1 is offline / pre-flight); F2's verify is owned by the `ManifestVerifier` contract.
## Purpose
`CacheProvisioner` is the public top-level surface for the C10 build phase. It composes `EngineCompiler` (AZ-321), `DescriptorBatcher` (AZ-322), and `ManifestBuilder` (AZ-323) into a single idempotent operation that the operator runs after `C11 TileDownloader` has populated C6. The Provisioner enforces D-C10-1 idempotence (skip rebuild when the build-identity hash matches the prior Manifest), D-C10-3 ManifestCoverageError (every shipped artifact under `cache_root` MUST be in the Manifest — no smuggled files), and D-C10-6 hardware-tied engine reuse (delegated to AZ-321). It does NOT touch `satellite-provider` (per epic § Architecture notes); tile I/O is C11's responsibility.
## Public Surface
```python
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class CacheProvisioner(Protocol):
    """Public top-level orchestrator for C10 cache build.

    Idempotent: if the prior Manifest's build-identity hash matches the
    request's, returns `outcome=IDEMPOTENT_NO_OP` without rebuilding.
    Otherwise composes engine compile + descriptor population + Manifest
    write + coverage check.
    """

    def build_cache_artifacts(self, request: BuildRequest) -> BuildReport: ...
    def compile_engines_for_corpus(self, request: EngineCompileRequest) -> tuple[EngineCacheEntry, ...]: ...
```
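The conformance check implied by `@runtime_checkable` can be illustrated with a minimal sketch. `Provisioner`, `FullImpl`, `WrongArity`, and `MissingMethod` are hypothetical stand-ins, not names from this contract. The point it shows: `isinstance` against a runtime-checkable Protocol only confirms that the method names exist, so it does not catch a wrong signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Provisioner(Protocol):
    # Hypothetical stand-in with the same two method names as CacheProvisioner.
    def build_cache_artifacts(self, request: object) -> object: ...
    def compile_engines_for_corpus(self, request: object) -> tuple: ...


class FullImpl:
    def build_cache_artifacts(self, request: object) -> object:
        return "report"

    def compile_engines_for_corpus(self, request: object) -> tuple:
        return ()


class WrongArity:
    # Method names match but the signatures do not; isinstance still passes.
    def build_cache_artifacts(self) -> None: ...
    def compile_engines_for_corpus(self) -> None: ...


class MissingMethod:
    # Lacks compile_engines_for_corpus, so isinstance fails.
    def build_cache_artifacts(self, request: object) -> object:
        return "report"
```

A static type checker is still needed to catch the `WrongArity` case; the runtime check is only a presence test.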
### DTOs
```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class SectorClassification(Enum):
    ACTIVE_CONFLICT = "active_conflict"
    STABLE_REAR = "stable_rear"


class BuildOutcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    IDEMPOTENT_NO_OP = "idempotent_no_op"


@dataclass(frozen=True)
class Bbox:
    lat_min: float
    lon_min: float
    lat_max: float
    lon_max: float


@dataclass(frozen=True)
class BuildRequest:
    bbox: Bbox
    zoom_levels: tuple[int, ...]
    sector_class: SectorClassification
    calibration_path: Path
    cache_root: Path
    key_path: Path  # operator signing key per C10-ST-01


@dataclass(frozen=True)
class BuildReport:
    outcome: BuildOutcome
    engines_built: int
    engines_reused: int
    descriptors_generated: int
    manifest_hash: str | None
    manifest_path: Path | None
    failure_reason: str | None
    elapsed_s: float
```
(`EngineCompileRequest` and `EngineCacheEntry` are AZ-321's; re-exported for convenience.)
### Exceptions
| Exception | When raised | Caller action |
|-----------|------------|---------------|
| `BuildLockHeldError` | Another `build_cache_artifacts` invocation holds the cache_root lockfile (per description.md § 7 race-condition mitigation). | Operator waits / kills the other process; not retried automatically. |
| `ManifestCoverageError` | After build, an orphan file exists under `cache_root` that is not listed in the Manifest. | Build is rolled back to prior-good Manifest (if present); operator inspects the orphan. |
| `EngineBuildError`, `CalibrationCacheError` | Propagated from AZ-321 / AZ-298. | Operator triages GPU / calibration. |
| `DescriptorBatchError` | Propagated from AZ-322. | Operator triages GPU OOM / model. |
| `ManifestWriteError` | Propagated from AZ-323 (key fingerprint mismatch in operator mode, key load failure, atomic-write failure). | Operator inspects key / disk. |
`BuildOutcome.FAILURE` is reserved for soft failures captured in `BuildReport` (missing tiles in C6, coverage warning when configured non-strict). Hard errors raise.
## Invariants
| ID | Invariant | Why |
|----|-----------|-----|
| CP-INV-1 | Idempotence: if `Manifest.json` exists at `cache_root` AND its `manifest_hash` equals the build-identity hash for the new request → `outcome=IDEMPOTENT_NO_OP`, ZERO new compiles, ZERO new embeds, ZERO new Manifest writes; the existing Manifest is left untouched. | D-C10-1; warm re-run ≤ 1 min envelope (C10-PT-01). |
| CP-INV-2 | A failed `build_cache_artifacts` does NOT leave the cache in a worse state than at the start: new engines may exist (cache hits) but the Manifest is either the previous-good one OR rolled back; the FAISS index is either the previous-good one OR atomically replaced. | Operators can retry safely. |
| CP-INV-3 | After a SUCCESS outcome, the coverage check has passed without raising `ManifestCoverageError`: every file under `cache_root` (recursively, excluding the Manifest itself, its sidecars, and its signature) is listed in the Manifest's artifacts. | D-C10-3: no smuggled artifacts in the takeoff cache. |
| CP-INV-4 | Concurrent `build_cache_artifacts` calls on the same `cache_root` are mutually exclusive via a filesystem lockfile at `cache_root/.c10.lock`. | description.md § 7 race-condition mitigation. |
| CP-INV-5 | `cache_root` must already exist; `build_cache_artifacts` does NOT create the directory tree (operator workflow places it). | Avoids accidental builds in unintended paths. |
| CP-INV-6 | No network calls (no `satellite-provider`, no Postgres TLS to a remote DB beyond the local instance, no metric push). | Epic § Architecture notes: C10 is workstation-local. |
| CP-INV-7 | The operator key file at `request.key_path` is opened exactly once (via AZ-323's signer) and zeroized when out of scope; this contract does NOT cache the key in memory across calls. | Operator key hygiene. |
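The mutual exclusion in CP-INV-4 could be sketched as follows. This is one possible approach, assuming atomic file creation with `O_CREAT | O_EXCL` is acceptable on the operator workstation's filesystem. The `CacheRootLock` class is hypothetical, and writing the PID into the lockfile is an illustrative choice, not something the contract specifies.

```python
import os
from pathlib import Path


class BuildLockHeldError(RuntimeError):
    """Raised when another build already holds the cache_root lock."""


class CacheRootLock:
    """Hypothetical exclusive lock backed by atomic file creation."""

    def __init__(self, cache_root: Path) -> None:
        self._lock_path = cache_root / ".c10.lock"  # path named in CP-INV-4
        self._held = False

    def __enter__(self) -> "CacheRootLock":
        try:
            # O_EXCL makes creation fail if the lockfile already exists.
            fd = os.open(self._lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError as exc:
            raise BuildLockHeldError(f"lock held: {self._lock_path}") from exc
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # assumption: PID aids operator triage
        self._held = True
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Only the holder removes the lockfile.
        if self._held:
            self._lock_path.unlink(missing_ok=True)
            self._held = False
```

A lockfile created this way survives a crashed process, which matches the "operator waits / kills the other process" caller action: stale locks are cleared by hand rather than automatically.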
## Non-Goals
- Tile fetch from `satellite-provider` — owned by E-C11 / C11 TileDownloader.
- Engine deserialization at takeoff — owned by E-C7 / AZ-298 + C5 takeoff arming.
- Manifest verification — owned by AZ-324's `ManifestVerifier` (separate contract).
- Multi-cache management (rotating between sector caches) — operator runs `build_cache_artifacts` per cache_root.
- Garbage collection of stale engines — explicit operator action; not part of the build flow.
- Resumable build (mid-build process kill → resume from last batch) — out of scope; restart from scratch.
## Versioning
- v1.0.0 — initial Protocol surface (this document).
- Breaking changes: changing `BuildRequest` shape, removing a `BuildOutcome`, adding a required field — bump major.
- Additive changes: new optional kwarg, new `BuildOutcome` value, new field on `BuildReport` — bump minor. Consumers MUST handle unknown outcomes gracefully (treat as FAILURE).
- Patch: clarifications, doc edits.
| Version | Date | Notes | Author |
|---------|------|-------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — produced by AZ-325 (E-C10 decomposition) | autodev |
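The "treat unknown outcomes as FAILURE" rule can be sketched on the consumer side with a value allowlist. The helper `cache_is_ready` and the `FutureBuildOutcome` enum (including its `DEGRADED` member) are hypothetical and exist only to simulate a newer producer.

```python
from enum import Enum

# Outcome values that mean "the cache is ready"; anything else is a failure.
_READY_VALUES = frozenset({"success", "idempotent_no_op"})


def cache_is_ready(outcome: Enum) -> bool:
    """Return True only for known-good outcomes; unknown values fall to False."""
    return outcome.value in _READY_VALUES


class FutureBuildOutcome(Enum):
    # Simulates a later minor version that added a value this consumer
    # has never seen. DEGRADED is purely illustrative.
    SUCCESS = "success"
    FAILURE = "failure"
    IDEMPOTENT_NO_OP = "idempotent_no_op"
    DEGRADED = "degraded"
```

Matching on an allowlist rather than a denylist means a new outcome cannot be mistaken for success by an older consumer.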
## Test Cases (consumer side)
| ID | Scenario | Expected Outcome |
|----|----------|------------------|
| CP-TC-1 | Cold build with all dependencies satisfied | `outcome=SUCCESS`; counts > 0; Manifest at `cache_root/Manifest.json` |
| CP-TC-2 | Warm build, identical request | `outcome=IDEMPOTENT_NO_OP`; counts all 0; Manifest unchanged on disk |
| CP-TC-3 | Warm build, different bbox | `outcome=SUCCESS`; rebuild happens; new Manifest replaces old (atomic) |
| CP-TC-4 | C6 has zero tiles for the requested scope | `outcome=FAILURE`; `failure_reason` directs operator to run C11 first |
| CP-TC-5 | Concurrent invocation while another build in progress | `BuildLockHeldError`; second invocation does not corrupt state |
| CP-TC-6 | An orphan file exists under `cache_root` after build | `ManifestCoverageError`; rolled back to prior Manifest if present |
| CP-TC-7 | Operator key file fingerprint not in allowlist (operator mode) | `ManifestWriteError` (propagated from AZ-323); ZERO file writes |
| CP-TC-8 | `EngineBuildError` mid-compile | Exception propagates; partial cache state consistent (atomic engines on disk for those that succeeded; Manifest NOT updated) |
| CP-TC-9 | `DescriptorBatchError` (persistent CUDA OOM) | Exception propagates; engines may be on disk; Manifest NOT updated |
| CP-TC-10 | Conformance: `isinstance(impl, CacheProvisioner)` | `True` |
| CP-TC-11 | `compile_engines_for_corpus` directly callable for re-compile-only flows | Returns `tuple[EngineCacheEntry, ...]`; no descriptor / Manifest work |
| CP-TC-12 | Cold build wall-clock benchmark on Tier-1 dev workstation, 1k tiles, 3 backbones | ≤ 12 min (NFR C10-PT-01) |
| CP-TC-13 | Warm idempotent re-run benchmark | ≤ 1 min (NFR C10-PT-01) |
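The orphan detection behind CP-TC-6 and CP-INV-3 could look roughly like the sketch below. `find_orphans` is a hypothetical helper, and the exclusion set is an assumption: the contract names the Manifest, its sidecars, and its signature, while excluding the `.c10.lock` file is a guess made here for illustration.

```python
from pathlib import Path

# Assumed exclusions: Manifest, sidecar, signature, plus the build lockfile.
EXCLUDED = frozenset(
    {"Manifest.json", "Manifest.json.sha256", "Manifest.json.sig", ".c10.lock"}
)


def find_orphans(cache_root: Path, manifest_paths: set[str]) -> list[str]:
    """Return files under cache_root that the Manifest does not list."""
    orphans: list[str] = []
    for path in sorted(cache_root.rglob("*")):
        if not path.is_file():
            continue
        # Compare using POSIX-style relative paths, as a Manifest would store.
        rel = path.relative_to(cache_root).as_posix()
        if rel in EXCLUDED or rel in manifest_paths:
            continue
        orphans.append(rel)
    return orphans
```

A non-empty result would correspond to raising `ManifestCoverageError` in the real build flow.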
@@ -0,0 +1,134 @@
# Contract: ManifestVerifier (C10)
**Type**: Python Protocol (`@runtime_checkable`) — local in-process API.
**Producer task**: AZ-324_c10_manifest_verifier
**Consumers**:
- C5 State Estimator / takeoff-arming gate (F2 phase) — refuses to arm if `verify_manifest` does not return `outcome=pass`. (E-C5 / AZ-249.)
- C12 Operator Tooling — runs verify before flight handoff to surface drift between F1 build time and F2 takeoff (E-C12 / AZ-253).
- C13 FDR — emits a `manifest.verify` record on every airborne verify call (`outcome` field gates downstream).
## Purpose
`ManifestVerifier` is the read-only validator for the C10-produced cache Manifest. It is the takeoff trust anchor for AC-NEW-1 ("no engine deserialization at takeoff before manifest verify") and D-C10-3 ("SHA-256 content-hash gate over every shipped artifact"). At F2 takeoff, every artifact listed in the Manifest is re-hashed and compared to its recorded digest; any mismatch fails the verdict and prevents arming. The Ed25519 signature over the Manifest is verified against a pinned operator public key before any artifact is touched — defence-in-depth against a spliced Manifest pointing at attacker-chosen content hashes.
## Public Surface
```python
from pathlib import Path
from typing import Protocol, runtime_checkable
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


@runtime_checkable
class ManifestVerifier(Protocol):
    """Read-only verifier for a C10-produced Manifest.json.

    Fail-closed: any deviation in signature, schema, or per-artifact hash
    yields `VerificationResult(outcome=FAIL, ...)`. Never raises on a verify
    failure; operators / takeoff arming code branch on `outcome`.
    A missing Manifest.json is reported as `MANIFEST_NOT_FOUND`, not raised.
    Raises only on resource errors that prevent a verdict (for example an
    I/O error while reading); those are environment problems, not verify
    outcomes.
    """

    def verify_manifest(
        self,
        *,
        manifest_path: Path,
        trusted_public_keys: tuple[Ed25519PublicKey, ...],
    ) -> VerificationResult: ...
```
### DTOs
```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class VerifyOutcome(Enum):
    PASS = "pass"
    FAIL = "fail"


class VerifyFailReason(Enum):
    MANIFEST_NOT_FOUND = "manifest_not_found"
    SIGNATURE_NOT_FOUND = "signature_not_found"
    SIGNATURE_INVALID = "signature_invalid"
    UNTRUSTED_PUBLIC_KEY = "untrusted_public_key"
    SCHEMA_VIOLATION = "schema_violation"
    ARTIFACT_MISSING = "artifact_missing"
    ARTIFACT_HASH_MISMATCH = "artifact_hash_mismatch"
    TILES_COVERAGE_MISMATCH = "tiles_coverage_mismatch"
    MANIFEST_SELF_HASH_MISMATCH = "manifest_self_hash_mismatch"


@dataclass(frozen=True)
class ArtifactCheck:
    relative_path: str
    expected_sha256: str
    actual_sha256: str | None  # None if file missing
    matched: bool


@dataclass(frozen=True)
class VerificationResult:
    outcome: VerifyOutcome
    fail_reasons: tuple[VerifyFailReason, ...]
    fail_details: tuple[str, ...]  # human-readable diagnostic per reason
    signing_public_key_fingerprint: str | None  # populated when signature parses, even if untrusted
    per_artifact_checks: tuple[ArtifactCheck, ...]
    elapsed_ms: int
```
## Invariants
| ID | Invariant | Why |
|----|-----------|-----|
| MV-INV-1 | The verifier is fail-closed: any deviation produces `outcome=FAIL` with at least one `VerifyFailReason`; never returns `PASS` with non-empty `fail_reasons`. | AC-NEW-1 / D-C10-3 — takeoff cannot arm on a partial verify. |
| MV-INV-2 | Signature verification happens BEFORE per-artifact hashing. If the signature is invalid or untrusted, no file content is read beyond the Manifest itself. | Defence-in-depth: a malicious Manifest must not trick the verifier into hashing attacker-chosen file paths. |
| MV-INV-3 | The Manifest's own `Manifest.json.sha256` sidecar (written by AZ-323) must match `sha256(Manifest.json)`; mismatch is `MANIFEST_SELF_HASH_MISMATCH`. | The sidecar is the entry point of the chain of trust — drift here means tampering or atomic-write failure. |
| MV-INV-4 | Per-artifact paths are interpreted relative to `manifest_path.parent`; absolute paths in the Manifest are rejected as `SCHEMA_VIOLATION`. | Prevents a malicious Manifest from pointing outside `cache_root`. |
| MV-INV-5 | `tiles_coverage` mismatch is reported separately from `ARTIFACT_HASH_MISMATCH` because tiles are hashed in aggregate (per AZ-323). The verifier re-derives the aggregate hash from a `TileMetadataStore` query if available, OR (in airborne F2 mode) treats the recorded `tiles_coverage_sha256` as authoritative and only verifies the Manifest signature + non-tile artifacts. | Airborne C5 may not load 100k per-tile rows just to arm; the trust chain is signature → manifest_hash → tiles_coverage_sha256. C12 / operator mode does the full re-derivation. |
| MV-INV-6 | The verifier never writes to disk, never opens network sockets, never calls C13. Telemetry is the caller's responsibility. | Read-only contract — composable in airborne C5 + operator C12 contexts without side-effect surprise. |
| MV-INV-7 | `elapsed_ms` is recorded for every call (pass or fail) so operators and C5 can observe drift in verify cost on slow disks. | NFR for C10-PT-01's takeoff load budget. |
## Non-Goals
- **Signature production** — owned by AZ-323's `ManifestSigner`. The verifier never signs.
- **Cache repair** — the verifier reports failures; rebuild is owned by AZ-325 (the orchestrator).
- **Trusted-key distribution / revocation** — `trusted_public_keys` is supplied by the caller; this contract does not define a key registry.
- **Coverage check (orphan files in cache_root)** — owned by AZ-325 (`ManifestCoverageError`); the verifier checks "every Manifest entry exists and matches", not "every cache_root file is in the Manifest".
- **Rollback to prior-good Manifest** — out of scope; caller decides next action on `FAIL`.
## Versioning
- v1.0.0 — initial Protocol surface (this document).
- Breaking changes — adding a required argument, removing a `VerifyFailReason`, changing semantics of an existing one — bump major.
- Additive changes — new `VerifyFailReason` value, new optional kwarg on `verify_manifest`, new field on `VerificationResult` — bump minor. Consumers MUST handle unknown reasons gracefully (default to FAIL).
- Patch — clarifications, doc edits, bug-fix tests.
| Version | Date | Notes | Author |
|---------|------|-------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — produced by AZ-324 (E-C10 decomposition) | autodev |
## Test Cases (consumer side)
| ID | Scenario | Expected Outcome |
|----|----------|------------------|
| MV-TC-1 | Valid Manifest + trusted key + all artifacts present + hashes match | `outcome=PASS`, empty `fail_reasons`, `per_artifact_checks` all `matched=True` |
| MV-TC-2 | Manifest.json missing | `outcome=FAIL`, `fail_reasons=(MANIFEST_NOT_FOUND,)`; no further work |
| MV-TC-3 | Manifest.json.sig missing | `outcome=FAIL`, `fail_reasons=(SIGNATURE_NOT_FOUND,)`; `signing_public_key_fingerprint=None` |
| MV-TC-4 | Signature does not verify | `outcome=FAIL`, `fail_reasons=(SIGNATURE_INVALID,)`; no per-artifact checks performed |
| MV-TC-5 | Signature verifies but key is not in `trusted_public_keys` | `outcome=FAIL`, `fail_reasons=(UNTRUSTED_PUBLIC_KEY,)`; fingerprint populated |
| MV-TC-6 | Schema violation (missing required key, absolute path, wrong types) | `outcome=FAIL`, `fail_reasons=(SCHEMA_VIOLATION,)` with detail naming the field |
| MV-TC-7 | One engine missing on disk | `outcome=FAIL`, `fail_reasons=(ARTIFACT_MISSING,)`; `per_artifact_checks` shows that engine with `actual_sha256=None, matched=False` |
| MV-TC-8 | One engine present but bytes drifted | `outcome=FAIL`, `fail_reasons=(ARTIFACT_HASH_MISMATCH,)`; offending check has `matched=False` |
| MV-TC-9 | Multiple failures (missing + drifted + signature OK) | `fail_reasons` contains BOTH `ARTIFACT_MISSING` and `ARTIFACT_HASH_MISMATCH`; per-artifact checks complete (don't short-circuit on first failure) |
| MV-TC-10 | `Manifest.json.sha256` sidecar mismatch | `outcome=FAIL`, `fail_reasons=(MANIFEST_SELF_HASH_MISMATCH,)`; signature path NOT consulted |
| MV-TC-11 | Tampered Manifest body but matching sidecar | `outcome=FAIL`, `fail_reasons=(SIGNATURE_INVALID,)` (the signature cannot match if body changed even by 1 byte) |
| MV-TC-12 | Conformance: `isinstance(my_impl, ManifestVerifier)` | `True` |
| MV-TC-13 | Tier-2 Tile-coverage check (operator mode with TileMetadataStore) | If recomputed `tiles_coverage_sha256` differs → `TILES_COVERAGE_MISMATCH`; if matches → that part passes |
| MV-TC-14 | Empty `trusted_public_keys` | `outcome=FAIL`, `fail_reasons=(UNTRUSTED_PUBLIC_KEY,)` (every key is untrusted by definition) |
| MV-TC-15 | Pristine Manifest verified inside 100 ms on Tier-2 (excludes per-tile re-walk) | `elapsed_ms ≤ 100` for the signature + non-tile artifact path |
@@ -0,0 +1,115 @@
# Contract: tile_downloader
**Component**: c11_tilemanager
**Producer task**: AZ-316_c11_tile_downloader
**Consumer tasks**: AZ-253 (E-C12 Operator Pre-flight Tooling — TBD at C12 decompose time)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
The `TileDownloader` Protocol is C11's operator-side download interface. C12 invokes it during F1 (pre-flight cache build) to fetch satellite tiles from the parent suite's `satellite-provider` GET surface, apply RESTRICT-SAT-4 resolution gating at the C11 boundary, and write accepted tiles into C6. Freshness rejections surfacing from C6 (AZ-307) are counted and surfaced in the report.
C11 is operator-side ONLY; ADR-004 forbids the airborne companion image from importing this module.
## Shape
### Function / method API
```python
from typing import Protocol, runtime_checkable
from pathlib import Path


@runtime_checkable
class TileDownloader(Protocol):
    def download_tiles_for_area(self, request: DownloadRequest) -> DownloadBatchReport: ...
    def enumerate_remote_coverage(self, bbox: Bbox, zoom_levels: list[int]) -> list[TileSummary]: ...
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `download_tiles_for_area` | `(request: DownloadRequest) -> DownloadBatchReport` | `SatelliteProviderError`, `RateLimitedError`, `ResolutionRejectionError`, `CacheBudgetExceededError`, `TileFsError`, `TileMetadataError` | sync (offline; minutes) |
| `enumerate_remote_coverage` | `(bbox: Bbox, zoom_levels: list[int]) -> list[TileSummary]` | `SatelliteProviderError`, `RateLimitedError` | sync (seconds) |
### Data DTOs
```python
import uuid
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass(frozen=True)
class DownloadRequest:
    bbox: Bbox                          # from c6_tile_cache
    zoom_levels: tuple[int, ...]
    sector_class: SectorClassification  # from c6_tile_cache
    satellite_provider_url: str         # parent-suite base URL
    service_api_key: str                # TLS + service-internal
    cache_root: Path                    # operator workstation
    flight_id: uuid.UUID                # tags downloads in C6 metadata


@dataclass(frozen=True)
class DownloadBatchReport:
    tiles_downloaded: int
    tiles_rejected_freshness: int       # raised by AZ-307 at C6 boundary
    tiles_rejected_resolution: int      # rejected by C11 (RESTRICT-SAT-4)
    tiles_downgraded: int               # stable_rear stale → DOWNGRADED label
    freshness_summary: dict[FreshnessLabel, int]
    outcome: DownloadOutcome            # success | failure | idempotent_no_op
    failure_reason: str | None


@dataclass(frozen=True)
class TileSummary:
    tile_id: TileId                     # from c6_tile_cache
    produced_at: datetime
    resolution_m_per_px: float
    estimated_bytes: int
```
| Field | Type | Required | Description | Constraints |
|-------|------|----------|-------------|-------------|
| `DownloadRequest.bbox` | `Bbox` | yes | Operational area | min_lat ≤ max_lat, min_lon ≤ max_lon |
| `DownloadRequest.zoom_levels` | `tuple[int, ...]` | yes | Zoom levels to fetch | each in `[0, 21]`; deduplicated |
| `DownloadRequest.sector_class` | `SectorClassification` | yes | Drives freshness rule applied at C6 | `ACTIVE_CONFLICT \| STABLE_REAR` |
| `DownloadRequest.cache_root` | `Path` | yes | Operator workstation cache dir | must exist; must be writable |
| `DownloadBatchReport.tiles_downloaded` | `int` | yes | Tiles written to C6 successfully | ≥ 0 |
| `DownloadBatchReport.tiles_rejected_resolution` | `int` | yes | Tiles rejected at C11 boundary for < 0.5 m/px | ≥ 0 |
| `DownloadBatchReport.tiles_rejected_freshness` | `int` | yes | Count of `FreshnessRejectionError` raised by C6 (AZ-307) | ≥ 0 |
| `DownloadBatchReport.outcome` | `DownloadOutcome` | yes | Aggregate outcome | enum |
## Invariants
- I-1: `tiles_downloaded + tiles_rejected_resolution + tiles_rejected_freshness` equals the number of tiles attempted. The report accounts for every tile the downloader attempted; no silent drops.
- I-2: A re-run of `download_tiles_for_area` for the same `(bbox, zoom_levels, sector_class, flight_id)` after a successful prior run is idempotent: `outcome = idempotent_no_op` and no GETs are issued. Idempotence is enforced by C11's download-progress journal under `cache_root/.c11/journal/`.
- I-3: Every accepted tile passes BOTH the C11 resolution gate (≥ 0.5 m/px per RESTRICT-SAT-4) AND the C6 freshness gate (AZ-307). A tile that fails either is excluded from `tiles_downloaded`.
- I-4: TLS + service-internal API key authenticate the GET; auth failure surfaces as `SatelliteProviderError` and aborts the run with `outcome = failure`. The downloader does NOT fall back to plaintext or unauthenticated requests.
- I-5: The downloader writes via the AZ-303 `TileStore`/`TileMetadataStore` Protocols; it does NOT touch C6's filesystem layout directly.
- I-6: A `CacheBudgetExceededError` aborts pre-write with no partial write and `outcome = failure`. The C6 cache budget enforcer (AZ-308) drives the headroom check.
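Invariants I-1 and I-3 above lend themselves to a small executable check. The sketch below is illustrative only: `Counts`, `passes_resolution_gate`, and `accounts_for_all` are hypothetical names, and the 0.5 threshold is taken from the wording of I-3.

```python
from dataclasses import dataclass

MIN_RESOLUTION_M_PER_PX = 0.5  # threshold quoted in I-3 (RESTRICT-SAT-4)


@dataclass(frozen=True)
class Counts:
    # Subset of DownloadBatchReport fields needed for the I-1 check.
    tiles_downloaded: int
    tiles_rejected_freshness: int
    tiles_rejected_resolution: int


def passes_resolution_gate(resolution_m_per_px: float) -> bool:
    """Apply the C11 gate as I-3 words it: accept when value >= 0.5 m/px."""
    return resolution_m_per_px >= MIN_RESOLUTION_M_PER_PX


def accounts_for_all(counts: Counts, attempted: int) -> bool:
    """I-1: the three counters must add up to the number of tiles attempted."""
    total = (
        counts.tiles_downloaded
        + counts.tiles_rejected_freshness
        + counts.tiles_rejected_resolution
    )
    return total == attempted
```

A check like `accounts_for_all` fits naturally as an assertion in the integration tests that exercise the happy path and the rejection paths.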
## Non-Goals
- Not covered: airborne or in-flight downloads (RESTRICT-SAT-1 forbids them; airborne process cannot import this module per ADR-004).
- Not covered: orchestration of when the operator runs F1 — owned by C12.
- Not covered: cache artifact build (descriptors, FAISS index) — owned by C10 after the downloader populates C6.
- Not covered: tile uploads to `satellite-provider` ingest — owned by `TileUploader` (separate contract).
- Not covered: parsing or validation of `satellite-provider`'s authentication payload beyond what `httpx` provides — out of scope for the onboard side.
## Versioning Rules
- **Breaking changes** (renamed method, removed required field, changed return type) require a major version bump. C12 is the sole consumer today; coordinate via Choose A/B/C/D when bumping.
- **Non-breaking additions** (new optional field on the report, new error variant the consumer already catches via the family) require a minor version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| download-happy-path | `DownloadRequest` for Derkachi bbox with mix of fresh active_conflict + stable_rear tiles | `DownloadBatchReport` with `tiles_downloaded > 0`; sum of report counts equals attempt count; tiles present in C6 | C11-IT-01 |
| freshness-rejection-counts | source returns stale tiles in active_conflict sector | `DownloadBatchReport.tiles_rejected_freshness > 0`; matches C6's AZ-307 rejection count for that batch | C11-IT-02 |
| resolution-gate-rejects | source returns tile with `resolution_m_per_px = 0.3` (< 0.5) | tile excluded from `tiles_downloaded`; `tiles_rejected_resolution += 1`; no C6 write attempted | RESTRICT-SAT-4 |
| auth-failure-aborts | invalid `service_api_key` | first GET raises `SatelliteProviderError`; `outcome = failure`; no tiles written | I-4 |
| budget-exceeded-aborts | pre-write check shows insufficient headroom | `CacheBudgetExceededError`; `outcome = failure`; zero partial writes | I-6 |
| idempotent-rerun | second call with identical request after success | `outcome = idempotent_no_op`; zero GETs observed | I-2 |
| rate-limited-honors-retry-after | source returns 429 with `Retry-After: 30` | downloader sleeps ≥ 30s before retry; no `RateLimitedError` raised on success path | RFC 6585 |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — produced by AZ-316 (E-C11 decomposition) | autodev |
@@ -0,0 +1,114 @@
# Contract: tile_uploader
**Component**: c11_tilemanager
**Producer task**: AZ-319_c11_tile_uploader
**Consumer tasks**: AZ-253 (E-C12 Operator Pre-flight Tooling — TBD at C12 decompose time)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
The `TileUploader` Protocol is C11's operator-side post-landing upload interface. C12 invokes it during F10 (post-landing) to read mid-flight tiles flagged pending-upload from C6 (`source = onboard_ingest`, `voting_status = pending`), package them per the D-PROJ-2 ingest contract sketch, sign each tile payload with the per-flight ephemeral key (AZ-318), and POST to `satellite-provider`'s `/api/satellite/tiles/ingest` endpoint. Acknowledged tiles are marked uploaded in C6.
The uploader gates on `flight_state == ON_GROUND` (AZ-317) before any network egress. C11 is operator-side ONLY; ADR-004 forbids the airborne companion image from importing this module.
## Shape
### Function / method API
```python
import uuid
from typing import Protocol, runtime_checkable


@runtime_checkable
class TileUploader(Protocol):
    def upload_pending_tiles(self, request: UploadRequest) -> UploadBatchReport: ...
    def enumerate_pending_tiles(self, flight_id: uuid.UUID | None = None) -> list[TileMetadata]: ...
    def confirm_flight_state(self) -> FlightStateSignal: ...
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `upload_pending_tiles` | `(request: UploadRequest) -> UploadBatchReport` | `FlightStateNotOnGroundError`, `SatelliteProviderError`, `RateLimitedError`, `SignatureRejectedError`, `TileMetadataError` | sync (post-landing; minutes) |
| `enumerate_pending_tiles` | `(flight_id: uuid.UUID \| None) -> list[TileMetadata]` | `TileMetadataError` | sync (seconds) |
| `confirm_flight_state` | `() -> FlightStateSignal` | `FlightStateNotOnGroundError` | sync (≤ 1 ms) |
### Data DTOs
```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadRequest:
    flight_id: uuid.UUID | None         # None = all flights with pending
    batch_size: int                     # tiles per HTTP POST
    satellite_provider_url: str         # parent-suite ingest base URL


@dataclass(frozen=True)
class UploadBatchReport:
    batch_uuid: uuid.UUID               # assigned by parent-suite ingest
    per_tile_status: tuple[PerTileStatus, ...]
    retry_count: int
    next_retry_at_s: int | None         # set when partial-success
    outcome: UploadOutcome              # success | partial | failure
    public_key_fingerprint: str         # 16-hex; from AZ-318


@dataclass(frozen=True)
class PerTileStatus:
    tile_id: TileId                     # from c6_tile_cache
    status: IngestStatus                # queued | rejected | duplicate | superseded
    rejection_reason: str | None
```
| Field | Type | Required | Description | Constraints |
|-------|------|----------|-------------|-------------|
| `UploadRequest.flight_id` | `UUID \| None` | no | Restricts batch to one flight | None = all pending across flights |
| `UploadRequest.batch_size` | `int` | yes | Tiles per HTTP POST | `1 ≤ batch_size ≤ 200` |
| `UploadBatchReport.batch_uuid` | `UUID` | yes | Parent-suite batch identifier | Server-assigned per D-PROJ-2 |
| `UploadBatchReport.per_tile_status` | `tuple[PerTileStatus, ...]` | yes | Per-tile result | Length = number of tiles attempted in this report |
| `UploadBatchReport.outcome` | `UploadOutcome` | yes | Aggregate outcome | `success` (all queued/duplicate/superseded) \| `partial` (some rejected/timeout) \| `failure` (gate blocked or full failure) |
| `UploadBatchReport.public_key_fingerprint` | `str` | yes | Identifies the per-flight signing key | 16 hex chars from AZ-318 |
| `PerTileStatus.status` | `IngestStatus` | yes | Server response status | `queued` \| `rejected` \| `duplicate` \| `superseded` |
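The aggregate `outcome` rule in the table above can be sketched as a pure function over the per-tile statuses. The enum member names and the helper `aggregate_outcome` are assumptions (the contract lists only the string values). Mapping an all-rejected batch to `failure` is one reading of "full failure" and is not confirmed by the contract.

```python
from enum import Enum


class IngestStatus(Enum):
    QUEUED = "queued"
    REJECTED = "rejected"
    DUPLICATE = "duplicate"
    SUPERSEDED = "superseded"


class UploadOutcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


# Statuses the parent suite treats as acknowledged.
ACKNOWLEDGED = frozenset(
    {IngestStatus.QUEUED, IngestStatus.DUPLICATE, IngestStatus.SUPERSEDED}
)


def aggregate_outcome(statuses: tuple[IngestStatus, ...]) -> UploadOutcome:
    """Collapse per-tile statuses into one batch outcome."""
    if all(status in ACKNOWLEDGED for status in statuses):
        return UploadOutcome.SUCCESS  # also covers the empty pending set
    if any(status in ACKNOWLEDGED for status in statuses):
        return UploadOutcome.PARTIAL  # some acknowledged, some rejected
    return UploadOutcome.FAILURE  # assumption: nothing acknowledged at all
```

Timeouts, which the table also lists under `partial`, are not modelled here because `IngestStatus` has no timeout value.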
## Invariants
- I-1: `confirm_flight_state` is called by `upload_pending_tiles` BEFORE any C6 read or network egress; if `FlightStateNotOnGroundError` is raised, NO tiles are read, NO POSTs are issued, NO C6 mutation occurs. The gate is closed by default.
- I-2: Every uploaded tile carries a signature produced by the AZ-318 per-flight key manager's `sign(payload)`. The parent suite verifies against the public key it received via the safety officer's pre-flight enrolment OR the `kind="c11.upload.session.key.public"` FDR record.
- I-3: A tile acknowledged as `queued`, `duplicate`, or `superseded` by the parent suite is marked `uploaded` in C6 (`mark_uploaded(tile_id)`); a tile acknowledged as `rejected` is NOT marked uploaded — it remains `pending` for human review.
- I-4: The per-flight signing key is zeroised at the end of `upload_pending_tiles` regardless of success or failure (try/finally in the caller; AZ-318's `end_session()`).
- I-5: A `SignatureRejectedError` from the parent suite triggers an FDR alert (AZ-318's `record_signature_rejection`); it is NEVER silently caught.
- I-6: The uploader writes via the AZ-303 `TileMetadataStore.mark_uploaded` Protocol; it does NOT update the metadata table directly.
- I-7: Partial-success batches are reported (not raised as failures) so the caller can re-invoke for the unacked tiles; idempotent retry behaviour is owned by the AZ-320 decorator that wraps this Protocol's impl.
- I-8: The signed payload includes `capture_timestamp` per the D-PROJ-2 contract sketch; the parent suite's nonce / timestamp validation owns replay defence.
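The ordering in I-1 and the cleanup guarantee in I-4 can be sketched together. Everything here is a stand-in: `SessionKeyManager` imitates the AZ-318 manager (only `end_session()` is named in this contract; `start_session()` is hypothetical), and `upload_with_gate` reduces the upload to a single callable.

```python
from typing import Callable


class FlightStateNotOnGroundError(RuntimeError):
    """Raised when the gate finds the aircraft is not ON_GROUND."""


class SessionKeyManager:
    """Stand-in for the AZ-318 per-flight key manager."""

    def __init__(self) -> None:
        self.private_key: bytes | None = None
        self.sessions_ended = 0

    def start_session(self) -> None:  # hypothetical method name
        self.private_key = b"ephemeral-key-material"

    def end_session(self) -> None:
        # Drop the key reference; real zeroisation is AZ-318's concern.
        self.private_key = None
        self.sessions_ended += 1


def upload_with_gate(
    on_ground: bool,
    key_manager: SessionKeyManager,
    post_batch: Callable[[], None],
) -> str:
    # I-1: the gate runs before any key generation, C6 read, or POST.
    if not on_ground:
        raise FlightStateNotOnGroundError("flight_state is not ON_GROUND")
    key_manager.start_session()
    try:
        post_batch()
        return "success"
    finally:
        # I-4: the key is released whether the POST succeeded or raised.
        key_manager.end_session()
```

Placing the gate ahead of `start_session()` also keeps the "zero key generation" expectation when the gate blocks.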
## Non-Goals
- Not covered: airborne or in-flight uploads (RESTRICT-SAT-1 forbids them; airborne process cannot import this module per ADR-004).
- Not covered: orchestration of when the operator runs F10 — owned by C12.
- Not covered: tile downloads from `satellite-provider` — owned by `TileDownloader` (separate contract).
- Not covered: parent-suite voting / trust-promotion of uploaded tiles — owned by D-PROJ-2 design task #2 (`satellite-provider`).
- Not covered: HSM / TPM-backed key storage — out of scope this cycle (in-memory key with zeroisation).
- Not covered: mid-upload key rotation — one key per session.
- Not covered: idempotent retry across partial-success batches — separate task in this epic decorates this contract.
## Versioning Rules
- **Breaking changes** (renamed method, removed required field, changed return type, changed signature contract) require a major version bump. C12 is the sole consumer today; coordinate via Choose A/B/C/D when bumping.
- **Non-breaking additions** (new optional field on the report, new `IngestStatus` enum value the consumer already tolerates via `_ = status`) require a minor version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| upload-happy-path | 50 pending tiles, ON_GROUND, parent-suite returns 202 with all `queued` | `UploadBatchReport.outcome = success`; all 50 marked `uploaded` in C6; signature verifies on each | C11-IT-03 |
| flight-state-blocks | `FlightStateSource` returns `IN_FLIGHT` | `FlightStateNotOnGroundError`; zero C6 reads; zero POSTs | C11-IT-04 |
| signature-rejected | Parent suite returns `rejected` for 1 tile with reason `"invalid signature"` | `PerTileStatus.status = rejected`; `outcome = partial`; FDR `c11.upload.signature_rejected` emitted; the tile NOT marked uploaded | I-5 |
| duplicate-acknowledged | Parent suite returns `duplicate` for 5 tiles (already ingested in a prior batch) | All 5 marked `uploaded`; `outcome = success` | I-3 |
| signing-key-zeroised | Run a successful upload, then assert the AZ-318 manager's `_private_key is None` | Always zeroised; FDR `c11.upload.session.key.zeroised` recorded | I-4 |
| signing-key-zeroised-on-failure | Network drop mid-batch raises `SatelliteProviderError`, then assert key zeroised | Always zeroised even on failure | I-4 |
| empty-pending-set | No pending tiles | `outcome = success` with empty `per_tile_status`; zero POSTs; zero key generation | edge case |
| public-key-in-fdr-before-first-post | Capture FDR records | `kind="c11.upload.session.key.public"` precedes any `c11.upload.tile.*` records | safety-officer correlation |
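The two `signing-key-zeroised*` rows describe an "always zeroised, even on failure" guarantee. A minimal sketch of how that guarantee could be structured, assuming a context manager around the session key; `session_key` and its `num_bytes` parameter are hypothetical names and are not the AZ-318 manager's API. CPython cannot guarantee that no other copy of the key bytes survives in memory, so this only illustrates the control flow:

```python
import secrets
from contextlib import contextmanager


@contextmanager
def session_key(num_bytes: int = 32):
    """Hypothetical helper: yield a mutable key buffer, wipe it on exit."""
    key = bytearray(secrets.token_bytes(num_bytes))
    try:
        yield key
    finally:
        # Runs on normal exit AND when the body raises (e.g. a network drop).
        for i in range(len(key)):
            key[i] = 0
```

Because the wipe sits in `finally`, an exception raised mid-batch still leaves the buffer zeroed before it propagates to the caller.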
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — produced by AZ-319 (E-C11 decomposition) | autodev |
@@ -0,0 +1,106 @@
# Contract: operator_command_transport
**Component**: c12_operator_tooling
**Producer task**: AZ-330 — `_docs/02_tasks/todo/AZ-330_c12_operator_reloc_service.md`
**Consumer tasks**: TBD — a future E-C8 (AZ-261) task implements `MavlinkOperatorCommandTransport` against pymavlink
**Version**: 1.0.0
**Status**: frozen
**Last Updated**: 2026-05-10
## Purpose
Defines the operator-workstation ↔ companion command channel for AC-3.4 operator-relocalization. C12 owns the Protocol shape; E-C8 (AZ-261) ships the pymavlink-backed concrete implementation that encodes the hint into a MAVLink message and transmits it over the GCS link to the airborne companion. Decoupling the two sides through this Protocol prevents C12 from having to know MAVLink details, and prevents E-C8 from having to know operator-tool internals — they meet at this contract.
## Shape
### DTOs
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatLonAlt:
    latitude_deg: float   # -90 ≤ value ≤ 90
    longitude_deg: float  # -180 < value ≤ 180
    altitude_m: float     # WGS84 ellipsoidal height; no documented bound

# If shared_helpers/wgs_converter.md already defines LatLonAlt, this contract REUSES that definition.
# The shape above is the canonical fallback if no shared definition exists.

@dataclass(frozen=True)
class ReLocHint:
    approximate_position_wgs84: LatLonAlt  # operator's best guess of current aircraft position
    confidence_radius_m: float             # > 0; operator's uncertainty radius around the position
    reason: str                            # non-empty; free-text operator note for forensics

# Validates `confidence_radius_m > 0` and `reason != ""` in __post_init__.
```
| Field | Type | Required | Description | Constraints |
|-------|------|----------|-------------|-------------|
| `LatLonAlt.latitude_deg` | `float` | yes | WGS84 latitude in degrees | `-90 ≤ x ≤ 90` |
| `LatLonAlt.longitude_deg` | `float` | yes | WGS84 longitude in degrees | `-180 < x ≤ 180` |
| `LatLonAlt.altitude_m` | `float` | yes | WGS84 ellipsoidal altitude in metres | no bound |
| `ReLocHint.approximate_position_wgs84` | `LatLonAlt` | yes | Operator's best guess | per `LatLonAlt` constraints |
| `ReLocHint.confidence_radius_m` | `float` | yes | Operator's uncertainty radius | `> 0` strictly |
| `ReLocHint.reason` | `str` | yes | Free-text operator note | non-empty; no length cap; no charset restriction |
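The constraints in the table above can be enforced at construction time. A minimal sketch, assuming validation lives in `__post_init__` on both DTOs (the contract states this for `ReLocHint`; doing the same for `LatLonAlt` is an assumption consistent with TC-5) and that `ValueError` is the error type (per TC-2). The error message wording is illustrative only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatLonAlt:
    latitude_deg: float
    longitude_deg: float
    altitude_m: float

    def __post_init__(self) -> None:
        # Range checks mirror the Constraints column; NaN fails the comparison
        # and is therefore rejected as well.
        if not (-90.0 <= self.latitude_deg <= 90.0):
            raise ValueError(f"latitude_deg out of range: {self.latitude_deg}")
        if not (-180.0 < self.longitude_deg <= 180.0):
            raise ValueError(f"longitude_deg out of range: {self.longitude_deg}")


@dataclass(frozen=True)
class ReLocHint:
    approximate_position_wgs84: LatLonAlt
    confidence_radius_m: float
    reason: str

    def __post_init__(self) -> None:
        if not (self.confidence_radius_m > 0):
            raise ValueError("confidence_radius_m must be > 0")
        if self.reason == "":
            raise ValueError("reason must be non-empty")
```

With validation on the DTOs themselves, an invalid hint cannot exist as an object, which is what lets the transport rely on INV-1 without re-checking.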
### Protocol
```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class OperatorCommandTransport(Protocol):
    def send_reloc_hint(self, hint: ReLocHint) -> None: ...
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `send_reloc_hint` | `(hint: ReLocHint) -> None` | `GcsLinkError` (any failure to transmit: signal lost, link timeout, framing error, mavlink encode error) | sync |
### Errors
```python
class GcsLinkError(Exception):
    reason: str                         # operator-friendly one-line description (e.g. "link signal lost")
    wrapped_exception_repr: str | None  # repr() of the underlying transport exception, if any
    remediation: str = "Check GCS link signal strength; re-issue the re-loc command when the link recovers."
```
The transport implementation MUST raise `GcsLinkError` (and only `GcsLinkError`) on any failure to transmit. C12's `OperatorReLocService` catches and re-raises with C12-specific context.
## Invariants
- **INV-1 (validation already done)**: when `send_reloc_hint(hint)` is called, the `hint` is already validated (`confidence_radius_m > 0`, `reason != ""`, lat/lon in range). The transport MAY skip re-validation but MUST NOT perform a different validation pass that rejects values C12 considers valid.
- **INV-2 (single transmission attempt)**: `send_reloc_hint` MUST attempt transmission exactly once. The transport MUST NOT retry internally — best-effort semantics per description.md § 7 are enforced at the C12 / operator level, not at the transport layer.
- **INV-3 (no return value contract)**: `send_reloc_hint` returning normally means the transport believes the hint left the operator workstation; it does NOT mean the airborne companion received or processed it (no ack mechanism in v1.0.0).
- **INV-4 (preserve `reason` byte-for-byte)**: the transport MUST encode `reason` such that the airborne side decodes the identical UTF-8 byte sequence, up to the MAVLink message's documented field-length limit. If `reason` exceeds the MAVLink message capacity, the transport MUST raise `GcsLinkError(reason="reason field exceeds MAVLink encoding capacity: <N> bytes > <max> bytes")` rather than silently truncate.
- **INV-5 (no side effects beyond transmission)**: `send_reloc_hint` MUST NOT write to the local filesystem, emit FDR records, or change any operator-workstation state beyond the network transmission. C12 owns side effects (FDR record, log).
- **INV-6 (thread-safety)**: a single `OperatorCommandTransport` instance MUST NOT be called from more than one thread per session. Concurrent calls from multiple threads are undefined behaviour and MAY raise `GcsLinkError(reason="concurrent send")`.
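INV-2 and INV-4, together with the "only `GcsLinkError`" rule, can be illustrated with a small test double. This is a sketch under stated assumptions, not the pymavlink-backed implementation: `InMemoryTransport`, `link_send`, and `max_reason_bytes` are hypothetical names, the `GcsLinkError` constructor shape is assumed, and mapping every underlying failure to `"link signal lost"` is a simplification.

```python
from __future__ import annotations


class GcsLinkError(Exception):
    """Stand-in for the contract's error type; the constructor shape is an assumption."""

    def __init__(self, reason: str, wrapped_exception_repr: str | None = None) -> None:
        super().__init__(reason)
        self.reason = reason
        self.wrapped_exception_repr = wrapped_exception_repr


class InMemoryTransport:
    """Hypothetical test double for OperatorCommandTransport."""

    def __init__(self, link_send, max_reason_bytes: int) -> None:
        self._link_send = link_send              # callable(bytes) -> None; may raise
        self._max_reason_bytes = max_reason_bytes

    def send_reloc_hint(self, hint) -> None:
        payload = hint.reason.encode("utf-8")
        if len(payload) > self._max_reason_bytes:
            # INV-4: refuse rather than silently truncate.
            raise GcsLinkError(
                "reason field exceeds MAVLink encoding capacity: "
                f"{len(payload)} bytes > {self._max_reason_bytes} bytes"
            )
        try:
            self._link_send(payload)             # INV-2: one attempt, no retry loop
        except Exception as exc:
            # Only GcsLinkError may leave the transport; keep the cause for forensics.
            raise GcsLinkError("link signal lost", repr(exc)) from exc
```

The size check counts encoded UTF-8 bytes rather than characters, since INV-4 is stated in terms of the byte sequence the airborne side decodes.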
## Non-Goals
- **Acknowledgement / round-trip** — v1.0.0 is fire-and-forget. A future v2.0.0 may add an ack channel via FDR + STATUSTEXT; out of scope here.
- **Encryption / signing of the re-loc payload** — covered by the MAVLink 2.0 message-signing on the wired channel per ADR-009 / D-C8-9; this Protocol does not re-specify it.
- **Multiple companions** — one transport instance addresses one companion; multi-companion broadcast is out of scope.
- **Retry / backoff** — best-effort per description.md § 7. The operator decides when to re-issue.
- **Backpressure / flow control** — `send_reloc_hint` is sync and unbounded; if the operator issues 100 re-loc commands in 1 s, the transport sends 100 messages. The MAVLink physical layer's bandwidth is the natural bound.
- **GCS-link health probing** — this Protocol does NOT expose a `is_link_healthy()` method. Liveness is observed via `GcsLinkError` raised by `send_reloc_hint`.
## Versioning Rules
- **Breaking changes** (renaming `send_reloc_hint`, removing it, changing its signature, changing `ReLocHint` field types or names, removing `confidence_radius_m`, etc.) require a new major version (v2.0.0). The producer (this contract owner, AZ-330's owner) bumps the version, updates the Change Log, and notifies all consumers via the autodev tracker leftovers mechanism.
- **Non-breaking additions** (new optional kwarg with default, new method on the Protocol that consumers don't need to implement, new optional field in `ReLocHint` with a documented default) require a minor version bump (v1.1.0). Existing implementations remain valid.
- **Patch changes** (clarifying invariants, adding test cases, fixing typos) require a patch version bump (v1.0.1).
- A breaking change requires a deprecation period of at least one Plan cycle (one major release) before consumers may stop supporting the old version.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| TC-1 valid-minimal | `ReLocHint(LatLonAlt(49.99, 36.12, 100.0), confidence_radius_m=50.0, reason="lost track at WP3")` + healthy transport | `send_reloc_hint` returns `None`; airborne side decodes `reason="lost track at WP3"` and `confidence_radius_m=50.0` byte-identical | minimal happy path; verifies INV-4 |
| TC-2 invalid-radius | `ReLocHint(..., confidence_radius_m=0.0, ...)` constructed first (raises `ValueError` at DTO `__post_init__`); the transport is NEVER called | `ValueError` at construction; transport spy shows zero calls | producer-side validation (INV-1) — transport is not the gatekeeper |
| TC-3 link-failure | Healthy hint + transport whose underlying link drops mid-encode | `send_reloc_hint` raises `GcsLinkError(reason="link signal lost", wrapped_exception_repr="...")` | INV-2 (single attempt, no internal retry); INV-3 (return semantics) |
| TC-4 reason-too-long | `ReLocHint(..., reason="x" * 10000)` against a transport whose MAVLink encoding capacity is, say, 2000 bytes | `send_reloc_hint` raises `GcsLinkError(reason="reason field exceeds MAVLink encoding capacity: 10000 bytes > 2000 bytes")` | INV-4 enforcement; no silent truncation |
| TC-5 lat-lon-out-of-range | `LatLonAlt(latitude_deg=91.0, ...)` constructed first | `ValueError` at construction; transport never reached | producer-side validation; transport never called |
| TC-6 concurrent-call | Two threads calling `send_reloc_hint` on the same instance simultaneously | EITHER both succeed in some order, OR one raises `GcsLinkError(reason="concurrent send")` | INV-6 — undefined-behaviour-with-bounds; either outcome is contract-conformant; deterministic single-threaded use is the recommended pattern |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — frozen Protocol shape + DTO + error type + 6 test cases. | autodev (AZ-330 decompose) |
@@ -0,0 +1,166 @@
# Contract: VioStrategy Protocol
**Component**: c1_vio
**Producer task**: AZ-331 — `_docs/02_tasks/todo/AZ-331_c1_vio_strategy_protocol.md`
**Consumer tasks**:
- AZ-332 (OKVIS2 implementation — implements)
- AZ-333 (VINS-Mono implementation — implements)
- AZ-334 (KLT/RANSAC implementation — implements)
- AZ-335 (warm-start + F8 reboot recovery wiring — invokes `reset_to_warm_start`)
- E-C5 state estimator tasks under AZ-260 (consume `VioOutput`)
- E-C13 FDR writer tasks under AZ-248 (consume `VioHealth`)
- `runtime_root` composition under AZ-270 (selects strategy by config)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Defines the typed boundary between the on-Jetson visual / visual-inertial odometry runtime and every downstream consumer (C5 state estimator, C13 FDR, runtime_root composition). The Protocol is the single point of contact that lets ADR-001 select between three concrete strategies (OKVIS2 production-default, VINS-Mono research-only, KLT/RANSAC mandatory simple-baseline) at startup without consumers caring which is wired. Per-frame DTOs (`VioOutput`, `VioHealth`) are frozen here so C5 fusion and C13 FDR records do not drift across implementations.
## Shape
### Protocol surface
The Protocol is a `typing.Protocol` (PEP 544 structural typing) decorated with `@runtime_checkable`.
| Method | Signature | Throws / Errors | Blocking? |
|--------|-----------|-----------------|-----------|
| `process_frame` | `(frame: NavCameraFrame, imu: ImuWindow, calibration: CameraCalibration) -> VioOutput` | `VioInitializingError`, `VioDegradedError`, `VioFatalError` | sync (camera-ingest hot path; bound by C1-PT-01 latency budget) |
| `reset_to_warm_start` | `(hint: WarmStartPose) -> None` | `VioFatalError` (only on irrecoverable backend init failure) | sync |
| `health_snapshot` | `() -> VioHealth` | — | sync |
| `current_strategy_label` | `() -> Literal["okvis2", "vins_mono", "klt_ransac"]` | — | sync |
### DTOs
`NavCameraFrame`, `ImuSample`, `ImuWindow`, `ImuBias`, `CameraCalibration` are owned by `gps_denied_onboard._types.nav` (AZ-263). This contract owns `WarmStartPose`, `VioOutput`, `VioHealth`, `FeatureQuality`, and the `VioState` enum, all `@dataclass(frozen=True)` (or `enum.Enum`). `VioOutput` and `VioHealth` are placed in `_types/nav.py` for cross-component access; the `VioStrategy` Protocol itself lives in `components/c1_vio/interface.py`.
```python
from dataclasses import dataclass
from enum import Enum
from typing import Protocol, Literal, runtime_checkable

from gps_denied_onboard._types.nav import (
    NavCameraFrame, ImuWindow, ImuBias, CameraCalibration,
)
from gps_denied_onboard._types.geom import SE3, Vector3, Matrix6


class VioState(str, Enum):
    INIT = "init"
    TRACKING = "tracking"
    DEGRADED = "degraded"
    LOST = "lost"


@dataclass(frozen=True)
class WarmStartPose:
    body_T_world: SE3
    velocity_b: Vector3
    bias: ImuBias
    captured_at_ns: int  # monotonic_ns when the hint was produced


@dataclass(frozen=True)
class FeatureQuality:
    tracked: int
    new: int
    lost: int
    mean_parallax: float
    mre_px: float


@dataclass(frozen=True)
class VioOutput:
    frame_id: str  # echoes NavCameraFrame.frame_id
    relative_pose_T: SE3
    pose_covariance_6x6: Matrix6
    imu_bias: ImuBias
    feature_quality: FeatureQuality
    emitted_at_ns: int


@dataclass(frozen=True)
class VioHealth:
    state: VioState
    consecutive_lost: int
    bias_norm: float


@runtime_checkable
class VioStrategy(Protocol):
    def process_frame(
        self,
        frame: NavCameraFrame,
        imu: ImuWindow,
        calibration: CameraCalibration,
    ) -> VioOutput: ...

    def reset_to_warm_start(self, hint: WarmStartPose) -> None: ...

    def health_snapshot(self) -> VioHealth: ...

    def current_strategy_label(self) -> Literal["okvis2", "vins_mono", "klt_ransac"]: ...
```
### Error hierarchy
All under `gps_denied_onboard.components.c1_vio.errors`:
```
VioError (base; subclasses Exception)
├── VioInitializingError (state == INIT; no VioOutput emitted; C5 falls back to FC IMU prior)
├── VioDegradedError (state == DEGRADED; output IS still emitted with inflated covariance — see Invariants)
└── VioFatalError (state == LOST after configurable consecutive frames; AC-5.2 fallback path)
```
`VioDegradedError` is part of the documented family but is **not raised** by `process_frame` while the strategy is merely degraded: degraded operation returns a `VioOutput` with inflated covariance, and `health_snapshot()` reports `VioHealth.state = DEGRADED`. The error type exists for the rare case where degradation transitions to fatality and consumer wrappers want to catch the whole family with `except VioError`.
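The hierarchy above could be declared and consumed roughly as follows. This is a minimal sketch: the class names come from the tree, while `consumer_reaction` and the action labels it returns are hypothetical and stand in for whatever C5 or a wrapper actually does.

```python
class VioError(Exception):
    """Base of the C1 VIO error family."""


class VioInitializingError(VioError):
    """state == INIT; no VioOutput emitted."""


class VioDegradedError(VioError):
    """Reserved for the degraded-to-fatal transition."""


class VioFatalError(VioError):
    """state == LOST after the configured number of consecutive frames."""


def consumer_reaction(call) -> str:
    """Hypothetical consumer wrapper: map the error family to a coarse action."""
    try:
        call()
    except VioInitializingError:
        return "use_fc_imu_prior"
    except VioFatalError:
        return "ac_5_2_fallback"
    except VioError:
        # Catches VioDegradedError and any family member added later.
        return "treat_as_degraded"
    return "ok"
```

Ordering the specific handlers before the `VioError` catch-all keeps the wrapper correct if a new subclass is added in a minor version.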
### Composition-root selection
```python
def build_vio_strategy(config: Config, *, fdr_client: FdrClient) -> VioStrategy: ...
```
Lives at `src/gps_denied_onboard/runtime_root/vio_factory.py`. Selects the strategy by `config.vio.strategy` (`okvis2 | vins_mono | klt_ransac`) and respects compile-time `BUILD_*` gating (`BUILD_OKVIS2`, `BUILD_VINS_MONO`, `BUILD_KLT_RANSAC`). Requesting a strategy whose `BUILD_*` flag is OFF raises `StrategyNotAvailableError` at composition time (NOT at first frame). Lazy-imports the concrete strategy module so a Tier-0 workstation build without OKVIS2 native libs still composes successfully when only KLT/RANSAC is requested.
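The flag check followed by a lazy import can be sketched as below. Everything beyond the three strategy labels and `BUILD_*` flag names is an assumption made for illustration: the module paths, class names, the `build_flags` mapping, the injectable `importer`, and the choice to raise `StrategyNotAvailableError` for an unknown label are all hypothetical, and the real factory takes a `Config` and an `fdr_client` instead.

```python
import importlib
from typing import Any, Callable, Mapping


class StrategyNotAvailableError(Exception):
    """Raised at composition time when the requested strategy cannot be built."""


# label -> (build flag, module path, class name); paths and class names are hypothetical.
_REGISTRY = {
    "okvis2": ("BUILD_OKVIS2", "gps_denied_onboard.components.c1_vio.okvis2", "Okvis2Strategy"),
    "vins_mono": ("BUILD_VINS_MONO", "gps_denied_onboard.components.c1_vio.vins_mono", "VinsMonoStrategy"),
    "klt_ransac": ("BUILD_KLT_RANSAC", "gps_denied_onboard.components.c1_vio.klt_ransac", "KltRansacStrategy"),
}


def build_vio_strategy_sketch(
    label: str,
    build_flags: Mapping[str, bool],
    importer: Callable[[str], Any] = importlib.import_module,
) -> Any:
    try:
        flag, module_path, class_name = _REGISTRY[label]
    except KeyError:
        raise StrategyNotAvailableError(f"unknown strategy {label!r}") from None
    if not build_flags.get(flag, False):
        # Checked BEFORE any import, so a disabled backend is never loaded.
        raise StrategyNotAvailableError(f"{label!r} requires {flag}=ON")
    module = importer(module_path)  # lazy: only the requested backend is imported
    return getattr(module, class_name)()
```

Making `importer` injectable is a testing convenience in this sketch; it lets a test confirm that a disabled strategy triggers no import at all, which is the property the `factory-build-flag-respected` test case checks through `sys.modules`.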
## Invariants
- **6×6 SPD covariance always returned**: `pose_covariance_6x6` is symmetric and positive-definite for every `VioOutput`. Implementations MUST NOT return a "tightened" covariance (smaller Frobenius norm) during a degradation event; honest covariance is the safety floor for AC-NEW-4 and AC-NEW-7. A test (covariance-monotonicity contract test, deferred to Step 9 / E-BBT) asserts this across all three strategies.
- **`frame_id` echo**: `VioOutput.frame_id` equals the input `NavCameraFrame.frame_id`. C5 relies on this for time-aligned factor insertion.
- **Single-threaded by contract**: each `VioStrategy` instance is bound to one writer thread (the camera ingest thread). Concurrent calls to `process_frame` on the same instance are undefined behaviour. The composition root binds one instance per ingest thread.
- **`reset_to_warm_start` is destructive**: clears the strategy's keyframe window, IMU integration state, and feature track buffer; subsequent `process_frame` calls re-initialise from the hint. Calling `reset_to_warm_start` mid-flight is allowed (F8 reboot recovery) but must not be issued concurrently with a `process_frame` call on the same instance.
- **`current_strategy_label()` is constant per instance**: returns the same string for the lifetime of the instance and matches `config.vio.strategy` exactly. The label is FDR-stamped on every `VioHealth` event for AC-NEW-3 audit.
- **No ambient state**: implementations MUST NOT read environment variables, wall clock, or filesystem inside `process_frame`; calibration arrives via constructor + per-call argument; logging uses the injected logger only.
- **Error envelope is closed**: `process_frame` raises only members of `VioError` (the family). Lower-level exceptions from OpenCV / OKVIS2 / VINS-Mono / GTSAM MUST be caught and rewrapped.
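The SPD invariant at the top of this list can be checked without external dependencies. A minimal sketch, assuming the covariance is available as a nested sequence of floats (how `Matrix6` exposes its entries is not specified here); `is_spd` and `sym_tol` are hypothetical names. It relies on the standard fact that a symmetric real matrix is positive-definite exactly when its Cholesky factorisation succeeds with strictly positive pivots, which is equivalent to the "eigenvalues > 0" wording in the test table:

```python
import math
from typing import Sequence


def is_spd(m: Sequence[Sequence[float]], sym_tol: float = 1e-9) -> bool:
    """Return True if m is square, symmetric (within sym_tol) and positive-definite."""
    n = len(m)
    if n == 0 or any(len(row) != n for row in m):
        return False
    for i in range(n):
        for j in range(i + 1, n):
            if abs(m[i][j] - m[j][i]) > sym_tol:
                return False
    # Row-by-row Cholesky: m = L * L^T with L lower-triangular.
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if not (s > 0.0):  # also rejects NaN
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True
```

A conformance test could call this on every emitted `pose_covariance_6x6`; a production check would more likely use a linear-algebra library.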
## Non-Goals
- IMU preintegration mathematics — owned by AZ-276 / `helpers.imu_preintegrator`. Strategies feed `ImuWindow` to the helper; they do NOT implement preintegration internally.
- Bias estimation policy — each strategy decides when to update its bias; the contract does not prescribe a schedule.
- WarmStartPose persistence (write to disk after takeoff, read after F8 reboot) — owned by the warm-start + F8 reboot recovery wiring task in this same epic. The contract here only defines the in-memory DTO and the `reset_to_warm_start` method.
- C5 fusion semantics — owned by E-C5; this contract only delivers `VioOutput`.
- Multi-camera strategies — out of scope this cycle (single nav-camera per ADR / RESTRICT-UAV-3).
## Versioning Rules
- **Breaking changes** (method renamed/removed, parameter type changed, return type changed, invariant relaxed) require a new major version + a deprecation pass through every consumer task in the header.
- **Non-breaking additions** (new optional method, new diagnostic accessor that does not mutate state, new `VioState` enum variant added at the end) require a minor version bump.
- The `VioState` enum is treated as a closed set for switch-style consumer code (C5 fusion); adding a new variant is a minor bump but consumers MUST handle the new state defensively (default branch → treat as LOST).
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| protocol-conformance | three concrete strategy classes | `isinstance(impl, VioStrategy)` returns True for each | Catches drift between impl and Protocol surface |
| frozen-dto-mutation | a constructed `VioOutput` instance and an attempt to set `.relative_pose_T` | `dataclasses.FrozenInstanceError` raised | Confirms DTOs are immutable |
| error-family-catchable | each of `VioInitializingError`, `VioDegradedError`, `VioFatalError` raised | `except VioError` catches all three; `except ValueError` does NOT | Confirms error envelope |
| factory-build-flag-respected | `config.vio.strategy = "vins_mono"` and `BUILD_VINS_MONO=OFF` | `StrategyNotAvailableError` raised at composition; `sys.modules` has no `vins_mono` entry | Confirms lazy-import gating |
| current-strategy-label-exact-match | each strategy constructed via factory with matching config | `current_strategy_label()` returns the literal config value | AC-NEW-3 audit gate |
| frame-id-echoed | a `NavCameraFrame` with a known UUID fed into `process_frame` | the returned `VioOutput.frame_id` equals the input UUID | C5 alignment invariant |
| covariance-spd | inspect 100 emitted `VioOutput.pose_covariance_6x6` matrices | every matrix is symmetric and positive-definite (eigenvalues > 0) | AC-1.4 floor |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/components/01_c1_vio/description.md` § 2 + AZ-254 epic child issue #1 | autodev decompose Step 2 |
@@ -0,0 +1,183 @@
# Contract: `ReRankStrategy` Protocol
**Owner**: c2_5_rerank (epic AZ-256 / E-C2.5)
**Producer task**: AZ-342 (`ReRankStrategy` Protocol + factory + composition)
**Consumer tasks**: AZ-343 (`InlierCountReRanker` impl); downstream c3_matcher (epic AZ-257 / E-C3 — TBD at AZ-257 decompose time) which consumes `RerankResult`
**Version**: 1.0.0
**Status**: draft, awaiting AZ-342 implementation
**Last Updated**: 2026-05-10
**Module-layout home**: `src/gps_denied_onboard/components/c2_5_rerank/interface.py` (Protocol), `src/gps_denied_onboard/components/c2_5_rerank/__init__.py` (re-exports), `src/gps_denied_onboard/runtime_root/rerank_factory.py` (factory)
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — Protocol surface, DTOs, error hierarchy, factory signature, 8 invariants, drop-and-continue contract (INV-8) | autodev / decompose Step 2 |
## Purpose
Defines the public interface for the C2.5 inlier-based re-rank strategy: `rerank` consumes a C2 `VprResult` (top-K=10) and produces a `RerankResult` (top-N=3) ranked by single-pair LightGlue inlier count against each candidate's tile pixels. The re-rank step is the architectural boundary between cheap descriptor retrieval (C2) and expensive cross-domain matching (C3) — it pays a small extra GPU cost so C3 only operates on the most promising candidates.
`ReRankStrategy` is a Strategy interface with a single concrete implementation today (`InlierCountReRanker`). Future re-rank algorithms (e.g., learned re-rankers) can be added as additional implementations behind the same interface, gated by `BUILD_RERANK_<variant>` build flags per ADR-002.
The shared `LightGlueRuntime` helper (AZ-278 / `helpers.lightglue_runtime`) is constructor-injected — neither C2.5 nor C3 owns the helper. This resolves R14 (apparent C2.5↔C3 cycle) by making both components sibling consumers of the helper.
## Public API
### Protocol: `ReRankStrategy`
```python
from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import NavCameraFrame, CameraCalibration, VprResult, RerankResult


@runtime_checkable
class ReRankStrategy(Protocol):
    """Single-camera re-rank strategy. Stateless per-frame; the only persistent state is the constructor-injected `LightGlueRuntime` helper handle and the `TileStore` Public API reference."""

    def rerank(
        self,
        frame: NavCameraFrame,
        vpr_result: VprResult,
        n: int,
        calibration: CameraCalibration,
    ) -> RerankResult:
        """Re-rank the top-K candidates from `vpr_result` down to top-N by single-pair LightGlue inlier count.

        For each candidate in `vpr_result.candidates`:
          1. Fetch tile pixels via `TileStore.get_tile_pixels(candidate.tile_id)`.
          2. Run a single-pair LightGlue forward via the shared `LightGlueRuntime` (frame ↔ tile).
          3. Record the inlier count.

        Sort candidates descending by inlier count; return the top-N as a `RerankResult`.

        Drop-and-continue semantics: if a per-candidate failure occurs (`TileFetchError` from C6 OR `RerankBackboneError` from LightGlue), the candidate is dropped from the rerank set and a per-candidate ERROR log + FDR record is emitted. Sorting and top-N selection proceed against the surviving candidates.

        If FEWER than N candidates survive, the strategy returns `RerankResult` with whatever it has (length 1..N-1); C3 proceeds with reduced N. If ZERO candidates survive, the strategy raises `RerankAllCandidatesFailedError`; downstream C5 falls back to VIO-only with provenance `visual_propagated` (AC-3.5).

        Raises:
            RerankAllCandidatesFailedError: every candidate's LightGlue or tile-fetch failed; no rerank result possible.
        """
        ...
```
**Invariants** (every implementation MUST guarantee):
1. **Single-threaded by contract** — each instance is bound to one ingest thread (composition root enforces). The shared `LightGlueRuntime` requires serial access (per description.md § 7); concurrent `rerank` calls on a single instance race the GPU stream.
2. **Stateless per-frame** — no implicit dependency on prior frames; reordering `rerank` calls (which the live path NEVER does, but tests do) MUST yield identical `RerankResult` content (same surviving candidates in same order, given same inputs).
3. **Top-N ordering by inlier count descending**: `RerankResult.candidates` is sorted descending by `inlier_count`. Ties broken deterministically by `descriptor_distance` ascending (carried forward from C2). Stable, reproducible across runs.
4. **`RerankResult.candidates` length is bounded** — `0 < len <= n` when returned (zero raises `RerankAllCandidatesFailedError`); never exceeds `n`; never exceeds `len(vpr_result.candidates)`.
5. **`descriptor_distance` is carried forward unchanged** — re-rank does NOT compute a new descriptor distance; the C2-stage value is preserved on every surviving `RerankCandidate` for FDR provenance.
6. **`tile_pixels_handle` is a reference, NOT a copy** — `RerankCandidate.tile_pixels_handle` is the same handle returned by `TileStore.get_tile_pixels` (page-cache backed). Copying tile pixels at re-rank time would defeat AC-4.1's latency budget.
7. **Deterministic per (frame, vpr_result, corpus, helper) tuple** — given identical inputs and an identical `LightGlueRuntime` helper state, two calls return bit-identical `RerankResult` (same inlier counts, same ordering, same surviving candidates).
8. **Drop-and-continue is the ONLY per-candidate failure mode** — a per-candidate exception NEVER propagates out of `rerank` unless every candidate fails. This is the contract that lets C3 absorb partial failures gracefully.
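Invariants 3, 4, and 8 interact inside one loop, which can be sketched as follows. This is an illustration under stated assumptions rather than the `InlierCountReRanker` implementation: `Candidate`, `rerank_sketch`, `score`, and `on_drop` are hypothetical names; `score` stands in for "fetch tile pixels, run LightGlue, count inliers"; and the sketch catches any `Exception`, where a real implementation would catch only `TileFetchError` and `RerankBackboneError`.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class RerankAllCandidatesFailedError(Exception):
    """Stand-in for the contract's all-failed error."""


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a C2 VprCandidate."""
    tile_id: tuple
    descriptor_distance: float


def rerank_sketch(
    candidates: Sequence[Candidate],
    n: int,
    score: Callable[[Candidate], int],
    on_drop: Callable[[Candidate, Exception], None] = lambda cand, exc: None,
) -> List[Tuple[int, Candidate]]:
    scored: List[Tuple[int, Candidate]] = []
    for cand in candidates:
        try:
            inliers = score(cand)
        except Exception as exc:  # sketch only; see the assumption above
            on_drop(cand, exc)    # where the ERROR log + FDR record would go
            continue              # drop this candidate, keep the batch alive
        scored.append((inliers, cand))
    if not scored:
        raise RerankAllCandidatesFailedError("zero survivors")
    # Inlier count descending; ties broken by descriptor_distance ascending.
    scored.sort(key=lambda pair: (-pair[0], pair[1].descriptor_distance))
    return scored[:n]
```

Slicing with `[:n]` after the sort gives the length bound directly: the result is never longer than `n` and never longer than the number of survivors.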
### DTOs (in `_types/rerank.py`)
```python
from dataclasses import dataclass
from uuid import UUID

import numpy as np


@dataclass(frozen=True, slots=True)
class RerankCandidate:
    """One re-rank survivor. Carries the C2-stage descriptor_distance forward for FDR provenance plus the new inlier_count from single-pair LightGlue."""
    tile_id: tuple              # composite (zoomLevel, lat, lon); see C6 TileRecord
    inlier_count: int           # single-pair LightGlue inliers; > 0 for any survivor
    descriptor_distance: float  # carried forward from C2's VprCandidate
    descriptor_dim: int         # carried forward from C2 for sanity assertions
    tile_pixels_handle: object  # opaque page-cache-backed pixel reference; see C6 TileStore contract


@dataclass(frozen=True, slots=True)
class RerankResult:
    """Top-N survivors from `ReRankStrategy.rerank`. Consumed by C3 CrossDomainMatcher."""
    frame_id: UUID
    candidates: list[RerankCandidate]  # 0 < len <= n; sorted descending by inlier_count, ties broken by descriptor_distance ascending
    reranked_at: int                   # monotonic_ns
    rerank_label: str                  # non-empty; matches BUILD_RERANK_<variant> lowercase (e.g., "inlier_count")
    candidates_input: int              # len(vpr_result.candidates) at entry; for FDR observability
    candidates_dropped: int            # candidates_input - len(candidates)
```
### Error Hierarchy (in `c2_5_rerank/errors.py`)
```python
class RerankError(Exception):
    """Base for all C2.5 re-rank errors. Caught at the runtime root; downstream effect: C5 falls back to VIO-only with provenance `visual_propagated` (AC-3.5) only when `RerankAllCandidatesFailedError` is raised."""


class RerankBackboneError(RerankError):
    """Per-candidate LightGlue forward-pass failure (CUDA OOM, TRT engine deserialize mismatch). Logged at ERROR; per-occurrence FDR record. Drop-and-continue: the candidate is dropped from the rerank set, NOT the whole batch."""


class RerankAllCandidatesFailedError(RerankError):
    """Every candidate's LightGlue or tile fetch failed; zero survivors. Logged at ERROR; per-occurrence FDR record `kind=rerank.all_failed`. C5 falls back to VIO-only."""
```
`TileFetchError` is owned by C6 (`components.c6_tile_cache`); C2.5 catches it inside the per-candidate loop and treats it identically to `RerankBackboneError` (drop-and-continue + ERROR log + FDR record `kind=rerank.tile_fetch_error`).
## Composition-Root Factory
```python
# src/gps_denied_onboard/runtime_root/rerank_factory.py
from gps_denied_onboard.config import Config
from gps_denied_onboard.components.c2_5_rerank import ReRankStrategy
from gps_denied_onboard.components.c6_tile_cache import TileStore
from gps_denied_onboard.helpers.lightglue_runtime import LightGlueRuntime


def build_rerank_strategy(
    config: Config,
    tile_store: TileStore,
    lightglue_runtime: LightGlueRuntime,
) -> ReRankStrategy:
    """Composition-root factory. Reads `config.rerank.strategy` (currently only `"inlier_count"` is defined; future strategies extend the table); lazy-imports the concrete strategy module gated by its CMake `BUILD_RERANK_<variant>` flag; refuses to instantiate a strategy whose flag is OFF (raises `ConfigurationError` pointing at the offending strategy name + missing flag).

    Strategy resolution table:

    | config.rerank.strategy | Implementation        | Module                                            | Build flag                |
    |------------------------|-----------------------|---------------------------------------------------|---------------------------|
    | "inlier_count"         | InlierCountReRanker   | components.c2_5_rerank.inlier_based_reranker      | BUILD_RERANK_INLIER_COUNT |

    The shared `LightGlueRuntime` is constructor-injected; the factory does NOT own its lifecycle. The runtime root constructs ONE `LightGlueRuntime` instance and passes the same reference to both this factory (for C2.5) and the C3 matcher factory.

    Returns a fully-constructed strategy ready for `rerank` invocation. The caller (runtime root) is responsible for binding the instance to one ingest thread.
    """
    ...
```
## Versioning
- The `ReRankStrategy` Protocol's method signature is part of the cross-component public API. Any change (new method, removed method, parameter rename, return-type change) is a major bump and requires updating every concrete implementation in lockstep.
- DTO field additions are minor (frozen dataclasses with new optional fields default to None); field removals are major.
- The drop-and-continue contract (Invariant 8) is non-negotiable; changing it would break C3's tolerance of partial input.
## Test Cases (protocol conformance — runs against every concrete strategy)
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| INV-1 (single-thread) | Composition root rejects multi-thread binding | `RuntimeError` on second binding attempt |
| INV-2 (stateless) | `rerank(frame_A)` then `rerank(frame_B)` then `rerank(frame_A)` again with the same `vpr_result` | First and third call return identical `RerankResult` (same surviving candidates, same order) |
| INV-3 (top-N order) | Mixed inlier counts (e.g., [412, 198, 287, 0, 153, ...]) on K=10 input with N=3 | Returned candidates sorted descending by inlier_count: [412, 287, 198] |
| INV-3 (tie-break) | Two candidates with identical inlier_count but different descriptor_distance | Lower descriptor_distance ranked first |
| INV-4 (length bound) | N=3 with K=10 input, all 10 succeeding | `len(result.candidates) == 3` |
| INV-4 (length under failure) | N=3 with K=10 input, 8 candidates fail | `len(result.candidates) == 2`; `candidates_dropped == 8` |
| INV-5 (descriptor_distance carried) | Each survivor's `descriptor_distance` | Equals the C2-stage value from `vpr_result.candidates[i].descriptor_distance` |
| INV-6 (handle is reference) | Mutate the underlying tile pixel buffer and re-read via `tile_pixels_handle` | Mutation visible (proves no copy) |
| INV-7 (deterministic) | `rerank(same inputs)` × 3 | All three return bit-identical `RerankResult` (same inlier_counts, same ordering, same surviving tile_ids) |
| INV-8 (drop-and-continue) | One candidate raises `RerankBackboneError`; nine succeed | Result has 3 survivors from the surviving 9; ONE ERROR log per failed candidate; the success path is NOT interrupted |
| AC-2.5-IT-01 (top-1 promotion rate) | `rerank` against fixture corpus where C2 top-1 was correct | Top-1 promotion rate ≥ 0.98 (C2's top-1 is preserved as result top-1 in ≥ 98% of frames) |
| AC-2.5-IT-02 (drop-and-continue smoke) | Inject `RerankBackboneError` for one candidate | Drop semantics hold; surviving candidates re-ranked |
| AC-2.5-IT-03 (helper serial-access) | Two `rerank` calls on the same instance from a single thread | Second call sees no `LightGlueRuntime` state corruption from the first; results bit-identical to single-threaded baseline |
| All-fail | Inject `RerankBackboneError` for every candidate | `RerankAllCandidatesFailedError` raised; per-candidate ERROR logs + final `kind=rerank.all_failed` FDR record |
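The ordering and drop rules exercised by INV-3, INV-4, and INV-8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the C2.5 implementation: `Scored`, `select_top_n`, and the `score_fn` callback are hypothetical names, and the two error classes are redefined locally as stand-ins for the real ones.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class RerankBackboneError(Exception):
    """Local stand-in for the per-candidate backbone failure."""


class RerankAllCandidatesFailedError(Exception):
    """Local stand-in for the all-candidates-failed error."""


@dataclass(frozen=True)
class Scored:
    """Hypothetical survivor record (not the real RerankResult candidate)."""
    tile_id: tuple
    inlier_count: int
    descriptor_distance: float  # carried over from the C2 stage (INV-5)


def select_top_n(
    candidates: Iterable[tuple[tuple, float]],
    score_fn: Callable[[tuple], int],
    n: int,
) -> tuple[list[Scored], int]:
    """Return (top-n survivors, number of dropped candidates)."""
    survivors: list[Scored] = []
    dropped = 0
    for tile_id, distance in candidates:
        try:
            survivors.append(Scored(tile_id, score_fn(tile_id), distance))
        except RerankBackboneError:
            dropped += 1  # drop-and-continue; real code would also log at ERROR
    if not survivors:
        raise RerankAllCandidatesFailedError("every candidate failed")
    # inlier_count descending, then descriptor_distance ascending (tie-break).
    survivors.sort(key=lambda s: (-s.inlier_count, s.descriptor_distance))
    return survivors[:n], dropped
```

Because the sort key is a pure function of the inputs, repeated calls with the same candidates yield the same order, which is the property INV-7 tests.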
## Open Questions / Risks
- **Risk: the shared `LightGlueRuntime` helper's serial-access invariant must be enforced upstream** — by the composition root binding both C2.5 and C3 to the same single ingest thread. *Mitigation*: AZ-278 (helper) ships with an internal assertion on each call that the calling thread matches the binding thread; AZ-342 (this Protocol task) consumes the helper as a constructor dependency and does NOT need to add a per-call check.
- **Risk: `tile_pixels_handle` semantics drift between C6's `TileStore` Public API and C2.5's expectation** — C2.5 expects a page-cache-backed reference, NOT a copy; C6's `get_tile_pixels` MUST guarantee that. *Mitigation*: cross-referenced in AZ-303 (`tile_store` contract) — the contract test for `get_tile_pixels` asserts the returned object is the same identity across two calls within a TTL window.
- **Risk: `n` parameter clamping vs. epic spec** — the epic fixes K=10, N=3; the Protocol leaves `n` parametric for testability. *Mitigation*: composition root binds `n=3` from `config.rerank.top_n` (default 3); the Protocol accepts arbitrary `n` so tests can use smaller values.
- **Risk: drop-and-continue can mask a backbone-wide regression** — if every flight has 3/10 candidates failing silently, recall degrades without any single failure being investigated. *Mitigation*: `RerankResult.candidates_dropped` is published per-frame; an FDR aggregate alert (post-flight tooling) flags flights with `candidates_dropped` p95 > 1.
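The post-flight aggregate mentioned in the last mitigation could look roughly like the sketch below. The contract does not say how the p95 is computed, so the nearest-rank definition used here is an assumption, and `p95_nearest_rank` and `flight_needs_review` are hypothetical names.

```python
def p95_nearest_rank(values: list[int]) -> int:
    """95th percentile by the nearest-rank method (an assumed definition)."""
    if not values:
        raise ValueError("no frames recorded")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def flight_needs_review(dropped_per_frame: list[int], limit: int = 1) -> bool:
    """Flag a flight whose per-frame `candidates_dropped` p95 exceeds `limit`."""
    return p95_nearest_rank(dropped_per_frame) > limit
```

For example, a flight where 10% of frames dropped three candidates is flagged, while one where only 5% of frames did so is not.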
@@ -0,0 +1,214 @@
# Contract: `VprStrategy` Protocol + `BackbonePreprocessor` Protocol
**Owner**: c2_vpr (epic AZ-255 / E-C2)
**Producer task**: AZ-336 (`VprStrategy` Protocol + factory + composition)
**Consumer tasks**: AZ-337 (UltraVPR), AZ-338 (NetVLAD baseline), AZ-339 (MegaLoc + MixVPR), AZ-340 (SelaVPR + EigenPlaces + SALAD), AZ-341 (FAISS HNSW retrieve wiring), and downstream c2_5_rerank (AZ-256 / E-C2.5)
**Module-layout home**: `src/gps_denied_onboard/components/c2_vpr/interface.py` (Protocols), `src/gps_denied_onboard/components/c2_vpr/__init__.py` (re-exports), `src/gps_denied_onboard/runtime_root/vpr_factory.py` (factory)
**Status**: draft, awaiting AZ-336 implementation
## Purpose
Defines the public interface for every C2 VPR backbone strategy: `embed_query` produces a `VprQuery` from a `NavCameraFrame`, `retrieve_topk` runs the FAISS HNSW lookup against the C6-owned descriptor index, and `descriptor_dim` advertises the embedding dimensionality so the composition root can pre-validate index/strategy compatibility. Every concrete backbone (UltraVPR, NetVLAD, MegaLoc, MixVPR, SelaVPR, EigenPlaces, SALAD) implements this Protocol; the composition root selects exactly one at startup based on `config.vpr.strategy` and refuses to wire a strategy whose `BUILD_VPR_<variant>` flag is OFF (ADR-002 + ADR-009).
`BackbonePreprocessor` is the C2-internal helper Protocol for resize/crop/normalise per backbone's input contract. It lives next to the strategy (NOT in `helpers/`) because preprocessing parameters are tightly coupled to the backbone weights; sharing across backbones is forbidden — each strategy owns its own concrete preprocessor.
## Public API
### Protocol: `VprStrategy`
```python
from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import NavCameraFrame, CameraCalibration, VprQuery, VprResult


@runtime_checkable
class VprStrategy(Protocol):
    """Single-camera visual place recognition strategy. Stateless per-frame; the only persistent state is the loaded backbone weights and the C6-owned FAISS index handle (passed in via constructor)."""

    def embed_query(
        self,
        frame: NavCameraFrame,
        calibration: CameraCalibration,
    ) -> VprQuery:
        """Run the backbone forward pass on the provided frame and return a `VprQuery` carrying the descriptor embedding.

        Calibration is consumed for input preprocessing (resize / crop / normalise per the backbone's input contract — owned by the strategy's internal `BackbonePreprocessor`).

        Raises:
            VprBackboneError: backbone forward pass failed (CUDA OOM, TRT engine deserialize mismatch, etc.).
        """
        ...

    def retrieve_topk(self, query: VprQuery, k: int) -> VprResult:
        """Run the FAISS HNSW top-K lookup against the corpus descriptor index.

        The strategy holds the FAISS index handle (constructor-injected from C6's `TileStore` Public API). Top-K candidates are returned in ascending `descriptor_distance` order.

        Raises:
            IndexUnavailableError: FAISS index handle invalid (e.g., post-F8 reboot before warm-up, or out-of-band file replacement caught by the underlying mmap defence).
            VprBackboneError: descriptor distance computation failed unexpectedly.
        """
        ...

    def descriptor_dim(self) -> int:
        """Backbone embedding dimensionality (e.g., 512 for UltraVPR, 4096 for NetVLAD-VGG16). Stable for the strategy's lifetime; consumed by the composition root to pre-validate index compatibility (the C6 index file declares its own dim in its sidecar; mismatch → `ConfigurationError` at startup, NOT at first frame)."""
        ...
```
**Invariants** (every implementation MUST guarantee):
1. **Single-threaded by contract** — each instance is bound to one ingest thread (composition root enforces; concurrent `embed_query` calls on a single instance race the GPU stream).
2. **Stateless per-frame** — no implicit dependency on prior frames; reordering `embed_query` calls (which the live path NEVER does, but tests do) MUST yield identical embeddings.
3. **L2-normalised embeddings** — the `VprQuery.embedding` MUST be L2-normalised (via `helpers.descriptor_normaliser`) so cosine similarity aligns with Euclidean distance for FAISS HNSW lookup. Strategies that produce raw embeddings (e.g., NetVLAD) MUST normalise before returning.
4. **`retrieve_topk` returns exactly `k` candidates, sorted ascending by `descriptor_distance`** — never fewer, never more, never unordered. If the corpus has fewer than `k` tiles, the strategy raises `IndexUnavailableError` (production deployments stage corpora with ≥1000 tiles; `k=10`).
5. **`backbone_label` is non-empty** — every `VprResult` carries the strategy's name (e.g., `"ultra_vpr"`, `"net_vlad"`) for FDR provenance. This MUST match the `BUILD_VPR_<variant>` flag's lowercase form.
6. **`embed_query` and `retrieve_topk` are deterministic** — given the same frame + calibration + corpus, identical embeddings and identical top-K candidates (in identical order). This is required for the C2-IT-02 invariant test and post-flight forensics.
7. **`descriptor_dim()` is stable for the strategy's lifetime** — never changes after construction; the value reflects the loaded weights' output dim, NOT a config knob.
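Invariant 3 relies on the fact that, for unit-length vectors, squared Euclidean distance and cosine similarity are tied by `||a - b||^2 = 2 - 2 * cos(a, b)`. The sketch below checks that identity on two small vectors. It uses plain Python lists for illustration only; the real path goes through `helpers.descriptor_normaliser`, whose API is not shown in this contract, and `l2_normalise` here is a hypothetical stand-in.

```python
import math


def l2_normalise(vec: list[float]) -> list[float]:
    """Scale `vec` to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = l2_normalise([3.0, 4.0, 0.0])
tile = l2_normalise([1.0, 2.0, 2.0])

# For unit vectors the two sides agree, so ranking by ascending Euclidean
# distance is the same as ranking by descending cosine similarity.
lhs = squared_euclidean(query, tile)
rhs = 2.0 - 2.0 * cosine_similarity(query, tile)
```

If a strategy skipped normalisation, the identity would no longer hold and the two rankings could disagree, which is why raw-embedding backbones must normalise before returning.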
### DTOs (in `_types/vpr.py`)
```python
from dataclasses import dataclass
from uuid import UUID

import numpy as np


@dataclass(frozen=True, slots=True)
class VprQuery:
    """Backbone embedding for a single nav-camera frame. Produced by `VprStrategy.embed_query`; consumed by `VprStrategy.retrieve_topk` (same instance) or — in the C10 corpus-build path — by `DescriptorIndexBuilder` to populate the corpus descriptor matrix."""

    frame_id: UUID
    embedding: np.ndarray  # shape (D,), dtype float16 or float32; L2-normalised
    produced_at: int  # monotonic_ns


@dataclass(frozen=True, slots=True)
class VprCandidate:
    """One retrieval candidate from the top-K result."""

    tile_id: tuple  # composite (zoomLevel, lat, lon); see C6 TileRecord
    descriptor_distance: float  # backbone-specific metric (cosine for L2-normalised; Euclidean for raw)
    descriptor_dim: int


@dataclass(frozen=True, slots=True)
class VprResult:
    """Top-K candidates from `VprStrategy.retrieve_topk`. Consumed by C2.5 ReRanker."""

    frame_id: UUID
    candidates: list[VprCandidate]  # length == k, sorted ascending by descriptor_distance
    retrieved_at: int  # monotonic_ns
    backbone_label: str  # non-empty; matches BUILD_VPR_<variant> lowercase
```
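The comments on `VprResult` encode three checkable properties: exactly `k` candidates, ascending `descriptor_distance`, and a non-empty `backbone_label`. A conformance helper along the following lines could assert them. `CandidateSketch` and `check_vpr_result` are hypothetical names for illustration; the stand-in dataclass carries only the fields the check needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSketch:
    """Hypothetical stand-in for `VprCandidate` (only the fields checked here)."""
    tile_id: tuple
    descriptor_distance: float


def check_vpr_result(
    candidates: list[CandidateSketch], k: int, backbone_label: str
) -> None:
    """Raise ValueError if the top-K size, order, or label rule is violated."""
    if len(candidates) != k:
        raise ValueError(f"expected exactly {k} candidates, got {len(candidates)}")
    distances = [c.descriptor_distance for c in candidates]
    # Ascending order with ties allowed: no distance may be below its predecessor.
    if any(later < earlier for earlier, later in zip(distances, distances[1:])):
        raise ValueError("candidates are not sorted ascending by descriptor_distance")
    if not backbone_label:
        raise ValueError("backbone_label must be non-empty")
```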
### Protocol: `BackbonePreprocessor` (C2-internal; lives in `c2_vpr/_preprocessor.py`)
```python
from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import NavCameraFrame, CameraCalibration
import numpy as np


@runtime_checkable
class BackbonePreprocessor(Protocol):
    """Resize / crop / normalise per backbone's input contract. Each `VprStrategy` implementation owns its concrete preprocessor (NOT shared across backbones — preprocessing parameters are tightly coupled to weights)."""

    def preprocess(
        self,
        frame: NavCameraFrame,
        calibration: CameraCalibration,
    ) -> np.ndarray:
        """Return the preprocessed input tensor in the layout the backbone's forward pass expects (e.g., (1, 3, H, W) NCHW float16 for TRT).

        Raises:
            VprPreprocessError: input frame violates the backbone's contract (wrong colour channels, calibration mismatch).
        """
        ...

    def input_shape(self) -> tuple[int, ...]:
        """The (H, W) resize target the backbone expects. Stable for the preprocessor's lifetime; consumed by tests to assert preprocessing fidelity."""
        ...
```
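One caveat worth illustrating for both Protocols: `@runtime_checkable` enables `isinstance` checks, but those checks only confirm that the named methods exist, not that their signatures match. The sketch below uses a trimmed, hypothetical `PreprocessorSketch` Protocol and three toy classes to show the difference.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PreprocessorSketch(Protocol):
    """Trimmed stand-in for `BackbonePreprocessor` (types replaced by `object`)."""

    def preprocess(self, frame: object, calibration: object) -> object: ...

    def input_shape(self) -> tuple[int, ...]: ...


class GoodPreprocessor:
    def preprocess(self, frame, calibration):
        return frame

    def input_shape(self):
        return (224, 224)


class WrongSignature:
    def preprocess(self):  # wrong arity, yet isinstance() still passes
        return None

    def input_shape(self):
        return (224, 224)


class MissingMethod:
    def preprocess(self, frame, calibration):
        return frame


good_ok = isinstance(GoodPreprocessor(), PreprocessorSketch)
wrong_sig_ok = isinstance(WrongSignature(), PreprocessorSketch)
missing_ok = isinstance(MissingMethod(), PreprocessorSketch)
```

Here `good_ok` and `wrong_sig_ok` are both `True` and `missing_ok` is `False`, so an `isinstance` check at the composition root is a useful smoke test but does not replace the conformance tests or a static type checker.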
### Error Hierarchy (in `c2_vpr/errors.py`)
```python
class VprError(Exception):
    """Base for all C2 VPR errors. Caught at the runtime root; downstream effect: C5 falls back to VIO-only with provenance `visual_propagated` (AC-1.4)."""


class VprBackboneError(VprError):
    """Backbone forward pass failed (CUDA OOM, TRT engine deserialize mismatch, ONNX runtime IO mismatch). Logged at ERROR; per-occurrence FDR record."""


class VprPreprocessError(VprError):
    """Input frame violates backbone's preprocessing contract (wrong colour channels, calibration mismatch). Logged at ERROR; per-occurrence FDR record."""


class IndexUnavailableError(VprError):
    """FAISS index handle invalid (post-F8 reboot before warm-up; out-of-band file replacement). Logged at ERROR; recovery: F8 reboot path re-mmaps the index. Per C2-ST-01 the strategy MUST raise this rather than return stale candidates."""
```
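Because every C2 error derives from `VprError`, the runtime root can catch the base class once and fall back. The sketch below shows that pattern in isolation. The error classes are redefined locally, `localise_frame` and the `run_vpr` callback are hypothetical, and the `"satellite_matched"` label is an invented placeholder; only `visual_propagated` comes from the contract.

```python
import logging

log = logging.getLogger("vpr_fallback_sketch")


class VprError(Exception):
    """Local stand-in for the C2 base error."""


class IndexUnavailableError(VprError):
    """Local stand-in for the stale-index error."""


def localise_frame(run_vpr, frame):
    """Return (provenance, result); fall back to VIO-only on any VprError."""
    try:
        # "satellite_matched" is a placeholder label, not taken from the contract.
        return "satellite_matched", run_vpr(frame)
    except VprError as exc:
        log.error("VPR failed for frame, falling back to VIO-only: %s", exc)
        return "visual_propagated", None
```

Catching the base class keeps the fallback in one place, so adding a new `VprError` subclass later does not require touching the runtime root.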
## Composition-Root Factory
```python
# src/gps_denied_onboard/runtime_root/vpr_factory.py
from typing import TYPE_CHECKING

from gps_denied_onboard.config import Config
from gps_denied_onboard.components.c2_vpr import VprStrategy
from gps_denied_onboard.components.c6_tile_cache import TileStore
from gps_denied_onboard.components.c7_inference import InferenceRuntime


def build_vpr_strategy(
    config: Config,
    tile_store: TileStore,
    inference_runtime: InferenceRuntime,
) -> VprStrategy:
    """Composition-root factory. Reads `config.vpr.strategy` and `config.vpr.backbone_weights_path`; lazy-imports the concrete strategy module gated by its CMake `BUILD_VPR_<variant>` flag; refuses to instantiate a strategy whose flag is OFF (raises `ConfigurationError` pointing at the offending strategy name + missing flag).

    Strategy resolution table:

    | config.vpr.strategy | Implementation      | Module                         | Build flag            |
    |---------------------|---------------------|--------------------------------|-----------------------|
    | "ultra_vpr"         | UltraVprStrategy    | components.c2_vpr.ultra_vpr    | BUILD_VPR_ULTRA_VPR   |
    | "net_vlad"          | NetVladStrategy     | components.c2_vpr.net_vlad     | BUILD_VPR_NETVLAD     |
    | "mega_loc"          | MegaLocStrategy     | components.c2_vpr.mega_loc     | BUILD_VPR_MEGALOC     |
    | "mix_vpr"           | MixVprStrategy      | components.c2_vpr.mix_vpr      | BUILD_VPR_MIXVPR      |
    | "sela_vpr"          | SelaVprStrategy     | components.c2_vpr.sela_vpr     | BUILD_VPR_SELAVPR     |
    | "eigen_places"      | EigenPlacesStrategy | components.c2_vpr.eigen_places | BUILD_VPR_EIGENPLACES |
    | "salad"             | SaladStrategy       | components.c2_vpr.salad        | BUILD_VPR_SALAD       |

    Pre-flight validation: after constructing the strategy, the factory queries `strategy.descriptor_dim()` and asserts it matches the C6 corpus index's declared `descriptor_dim` (read from the FAISS index sidecar). Mismatch → `ConfigurationError` at startup, NOT at first frame.

    Returns a fully-constructed strategy ready for `embed_query` / `retrieve_topk` invocation. The caller (runtime root) is responsible for binding the instance to one ingest thread.
    """
    ...
```
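The resolution and pre-flight steps described in the factory docstring might be structured as in the sketch below. Everything here is an assumption made for illustration: how build flags are exposed to Python is not specified in this contract, so `enabled_flags` is a plain set; `StrategyEntry`, `resolve_strategy`, and the `loader` callables are hypothetical; and `ConfigurationError` is redefined locally.

```python
from dataclasses import dataclass
from typing import Callable


class ConfigurationError(Exception):
    """Local stand-in for the startup configuration error."""


@dataclass(frozen=True)
class StrategyEntry:
    build_flag: str
    loader: Callable[[], type]  # deferred import: only called when selected


def resolve_strategy(
    name: str,
    registry: dict[str, StrategyEntry],
    enabled_flags: set[str],
    index_dim: int,
):
    """Pick one strategy, refuse disabled ones, and pre-validate descriptor_dim."""
    entry = registry.get(name)
    if entry is None:
        raise ConfigurationError(f"unknown VPR strategy {name!r}")
    if entry.build_flag not in enabled_flags:
        raise ConfigurationError(f"strategy {name!r} requires {entry.build_flag}=ON")
    strategy = entry.loader()()  # import lazily, then construct
    dim = strategy.descriptor_dim()
    if dim != index_dim:
        raise ConfigurationError(
            f"strategy {name!r} emits dim {dim}, index sidecar declares {index_dim}"
        )
    return strategy
```

Keeping the loader behind a callable means a disabled backbone's module is never imported, which mirrors the lazy-import behaviour the docstring describes.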
## Versioning
- The `VprStrategy` Protocol's method signatures are part of the cross-component public API. Any change (new method, removed method, parameter rename, return-type change) is a major bump and requires updating every concrete implementation in lockstep.
- DTO field additions are minor (frozen dataclasses with new optional fields default to None); field removals are major.
- `BackbonePreprocessor` is C2-internal; backwards-compat is per-strategy, not cross-strategy.
## Test Cases (protocol conformance — runs against every concrete strategy)
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| INV-1 (single-thread) | Concurrent `embed_query` from 2 threads on one instance | Documented as forbidden in test docstring; test asserts composition root rejects multi-thread binding |
| INV-2 (stateless) | `embed_query(frame_A)` then `embed_query(frame_B)` then `embed_query(frame_A)` again | First and third call return identical embeddings (bit-exact for float32 embeddings; ULP-tolerant for float16) |
| INV-3 (L2-normalised) | `||VprQuery.embedding||_2` after `embed_query` | Equal to 1.0 ± 1e-3 (tolerance for float16) |
| INV-4 (top-K size + order) | `retrieve_topk(query, k=10)` against a 100-tile fixture corpus | `len(candidates) == 10`; distances are non-decreasing (ascending, ties allowed) |
| INV-5 (backbone_label non-empty) | Every `VprResult` from `retrieve_topk` | `backbone_label` is a non-empty string and matches the strategy's `BUILD_VPR_<variant>` lowercase |
| INV-6 (deterministic) | `embed_query(same frame)` × 3 then `retrieve_topk(same query)` × 3 | All three pairs return bit-exact embeddings + identical top-K (tile_ids in same order) |
| INV-7 (descriptor_dim stable) | `descriptor_dim()` × 100 calls | Returns the same value every call |
| AC-2.1b (recall floor) | UltraVPR + NetVLAD on Derkachi normal-segment corpus | UltraVPR recall@10 ≥ 0.95; NetVLAD recall@10 ≥ 0.85 (engine rule check; AZ-338) |
| AC-NEW-7 (poisoned tile) | Top-1 distance to poisoned tile in NFT-SEC-01 corpus | Within AC-NEW-7 relaxed CI |
| C2-ST-01 (stale index) | Out-of-band corpus file replacement | `retrieve_topk` raises `IndexUnavailableError`; no candidates returned |
## Open Questions / Risks
- **Risk: backbone weights' descriptor_dim drifts across upstream code drops** (e.g., a new UltraVPR release changes embedding dim from 512 to 768). *Mitigation*: the factory's pre-flight check of `descriptor_dim()` against the C6 sidecar catches this at startup; the operator must rebuild the C6 corpus before the new weights can be used.
- **Risk: SALAD is mentioned in description.md but NOT in the original epic's child issues** — included here for completeness because module-layout.md `BUILD_VPR_<variant>` table lists SALAD. *Decision*: SALAD lives in AZ-340 (with SelaVPR + EigenPlaces). If the team decides SALAD is out of scope this cycle, that task drops one backbone with no other changes needed.
@@ -0,0 +1,170 @@
# Contract: `ConditionalRefiner` Protocol
**Owner**: c3_5_adhop (epic AZ-258 / E-C3.5)
**Producer task**: AZ-348 (Protocol + factory + DTOs + composition + `PassthroughRefiner`)
**Consumer tasks**: AZ-349 (`AdHoPRefiner` real refinement); downstream c4_pose (epic AZ-259) which consumes the (possibly refined) `MatchResult`
**Version**: 1.0.0
**Status**: draft, awaiting Producer task implementation
**Last Updated**: 2026-05-10
**Module-layout home**: `src/gps_denied_onboard/components/c3_5_adhop/interface.py` (Protocol), `src/gps_denied_onboard/components/c3_5_adhop/__init__.py` (re-exports), `src/gps_denied_onboard/runtime_root/refiner_factory.py` (factory)
> **Public API symbol naming.** The component's public interface symbol is named `ConditionalRefiner` in `description.md` § 2 and `AdHoPRefinementStrategy` in `module-layout.md` § c3_5_adhop. Both refer to the SAME Protocol; the canonical class name in code is `ConditionalRefiner`, because it names the role (as `description.md` does) and matches the method `refine_if_needed`. The producer task ALSO updates `module-layout.md` to align (`AdHoPRefinementStrategy` → `ConditionalRefiner`) so the two documents agree.
## Purpose
Defines the public interface for every C3.5 refinement strategy: `refine_if_needed(frame, mr, residual_threshold_px)` returns a `MatchResult` that is either (a) the input unchanged ("passthrough") OR (b) enriched with refined inlier correspondences from OrthoLoC AdHoP perspective preconditioning. The conditional gate is a configurable residual threshold: if the input `MatchResult.reprojection_residual_px` ≤ threshold the refiner returns the input unchanged; otherwise the refiner runs the AdHoP backbone and returns an enriched `MatchResult`. `was_invoked()` exposes the last-call decision for FDR provenance and for NFT-PERF-01 invocation-rate accounting.
Two concrete strategies are linked into the production binary by default: `AdHoPRefiner` (production-default; conditional invocation) and `PassthroughRefiner` (always passes through; non-conditional baseline used by smoke tests and by IT-12's "no refinement" comparison). Both implementations co-exist at build time per ADR-001 — gating is at runtime via `config.refiner.strategy`. Build-time exclusion (ADR-002) is NOT used here because both strategies are tiny (passthrough is a no-op; AdHoP's backbone is a single TRT engine shared with C7).
The shared `RansacFilter` helper (AZ-282) is constructor-injected — `c3_5_adhop` imports the SAME helper used by `c3_matcher` and `c4_pose`; the runtime root constructs ONE instance and identity-shares it across all three components.
## Public API
### Protocol: `ConditionalRefiner`
```python
from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import (
    NavCameraFrame, MatchResult,
)


@runtime_checkable
class ConditionalRefiner(Protocol):
    """Conditional refinement strategy invoked between C3 (matcher) and C4 (pose). Stateless per-frame; the only persistent state is the constructor-injected backbone runtime handle + the last-invocation flag."""

    def refine_if_needed(
        self,
        frame: NavCameraFrame,
        mr: MatchResult,
        residual_threshold_px: float,
    ) -> MatchResult:
        """If `mr.reprojection_residual_px <= residual_threshold_px` (the steady-state path), return `mr` unchanged AND set `was_invoked()` to False. Otherwise, run the strategy's refinement procedure and return an enriched `MatchResult` with `refinement_label` set, AND set `was_invoked()` to True.

        On `RefinerBackboneError` (AdHoP backbone failure during the invoked path), the refiner MUST fall through to passthrough — return `mr` unchanged with `refinement_label = "passthrough"` AND `was_invoked()` = True (the attempt counts towards the invocation rate even on failure). The error is logged at ERROR level + emitted to FDR; downstream pose estimation may then trigger F6 satellite re-localisation if quality gates fail.

        Determinism: same inputs MUST produce the same output. The conditional gate is a `<=` comparison only — no probabilistic gating, no time-based gating.
        """
        ...

    def was_invoked(self) -> bool:
        """Return True iff the last call to `refine_if_needed` actually entered the refinement procedure (regardless of whether it produced a refined result or fell through to passthrough on backbone error). Reset to False at the start of every `refine_if_needed` call. Used by FDR per-frame provenance and by NFT-PERF-01 / C3.5-IT-03 invocation-rate accounting."""
        ...
```
**Invariants**:
1. **Single-threaded by contract** — each instance is bound to one ingest thread (composition root enforces; same thread as C3 because they share the C-frame ingest path).
2. **Stateless per-frame for `refine_if_needed`** — except for the `was_invoked()` flag, no implicit dependency on prior frames; reordering `refine_if_needed` calls (tests only) MUST yield identical output `MatchResult` content.
3. **Conditional gate is a pure comparison** — `mr.reprojection_residual_px <= threshold` → passthrough; `>` → invoke. No tolerance, no smoothing, no hysteresis. The threshold is a parameter (NOT a hidden internal constant) so operator tooling can tune pre-flight per AC-NEW-5 / R10.
4. **Passthrough fall-through on backbone error** — `RefinerBackboneError` raised inside the invoked path is caught by the strategy and converted to passthrough output (input `MatchResult` returned unchanged with `refinement_label = "passthrough"`); the error is logged at ERROR level. The exception is NEVER re-raised out of `refine_if_needed` (downstream pose estimation gets a usable `MatchResult` and decides whether to trigger F6).
5. **Bit-identical correspondences on passthrough** — when `refinement_label == "passthrough"`, every `inlier_correspondences` ndarray in the output equals the input ndarray bit-for-bit (`np.array_equal` AND same dtype). Refinement may NEVER silently rewrite correspondences when the gate decided not to invoke.
6. **`refinement_label` is `"adhop"` OR `"passthrough"`** — exactly one of those two values. `PassthroughRefiner` always emits `"passthrough"`; `AdHoPRefiner` emits `"adhop"` only when AdHoP ran successfully and `"passthrough"` otherwise (gate-decided passthrough or fall-through on backbone error). Readers who need to tell those two passthrough cases apart check `was_invoked()`.
7. **`refinement_added_latency_ms` is the STRATEGY-INTERNAL added latency** — not the matcher's or pose estimator's; covers exactly the work done inside `refine_if_needed`. Always ≥ 0; near-zero on passthrough; up to ~90 ms on AdHoP invoke per AC C3.5-PT-01.
8. **`was_invoked()` semantics** — set to True iff the strategy entered the refinement procedure (post-gate, regardless of whether AdHoP succeeded or fell through). On passthrough strategy + every gate-decided-passthrough call: False.
9. **Threshold validation** — the strategy MUST reject `residual_threshold_px <= 0` (raise `ValueError`); the composition root validates the config-loaded threshold at startup so this in-method check is defensive.
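Invariants 3, 4, 8, and 9 interact closely, so a compact sketch may help. It is an illustration under stated assumptions, not the `AdHoPRefiner` implementation: `MatchSketch` is a hypothetical two-field stand-in for `MatchResult`, `GatedRefiner` and its `backbone` callable are invented names, and `RefinerBackboneError` is redefined locally.

```python
import logging
from dataclasses import dataclass, replace
from typing import Callable

log = logging.getLogger("refiner_sketch")


class RefinerBackboneError(Exception):
    """Local stand-in for the C3.5 backbone failure."""


@dataclass(frozen=True)
class MatchSketch:
    """Hypothetical stand-in for `MatchResult` (only the fields the gate needs)."""
    reprojection_residual_px: float
    refinement_label: str = "passthrough"


class GatedRefiner:
    def __init__(self, backbone: Callable[[MatchSketch], MatchSketch]) -> None:
        self._backbone = backbone
        self._invoked = False

    def refine_if_needed(
        self, frame: object, mr: MatchSketch, residual_threshold_px: float
    ) -> MatchSketch:
        self._invoked = False  # reset at the start of every call
        if residual_threshold_px <= 0:
            raise ValueError("residual_threshold_px must be > 0")
        if mr.reprojection_residual_px <= residual_threshold_px:
            return mr  # inclusive gate: same object, untouched correspondences
        self._invoked = True  # the attempt counts even if the backbone fails
        try:
            return replace(self._backbone(mr), refinement_label="adhop")
        except RefinerBackboneError as exc:
            log.error("refinement backbone failed, passing input through: %s", exc)
            return mr  # fall-through: never re-raise

    def was_invoked(self) -> bool:
        return self._invoked
```

Returning the input object itself on both passthrough paths is one simple way to satisfy the bit-identical requirement of Invariant 5, since nothing is copied or rewritten.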
### DTOs (in `_types/refiner.py` — additions; reuse `MatchResult` from `_types/matcher.py`)
The output of `refine_if_needed` is a `MatchResult` (same DTO as C3 produces) with the following NEW optional fields populated by C3.5:
```python
# Additions to existing MatchResult in _types/matcher.py (NOT a new DTO; in-place extension)
@dataclass(frozen=True, slots=True)
class MatchResult:
    # ... existing fields from C3 ...
    # NEW (populated by C3.5; default values for non-refined frames):
    refinement_label: str = "passthrough"  # "adhop" | "passthrough"
    refinement_added_latency_ms: float = 0.0  # added latency due to refinement; 0 on pure passthrough
```
Rationale: `MatchResult` is produced by C3 and consumed by C3.5 (which may enrich it); since `MatchResult` is a frozen dataclass, C3.5 produces a NEW `MatchResult` instance via `dataclasses.replace(...)` whenever it enriches. The new fields default to the passthrough values so a C3 producer that never goes through C3.5 still yields a valid downstream-readable `MatchResult`.
> **Cross-task coordination.** AZ-344 (C3 Protocol task) defines the `MatchResult` DTO with the C3 fields. The C3.5 Producer task (AZ-348) extends `MatchResult` with the two NEW fields (with their defaults) in the SAME `_types/matcher.py` file. Because the fields default to passthrough values, the addition is backward-compatible for AZ-344's tests; AZ-344's `MatchResult` constructor stays valid. The C3.5 Producer task is responsible for updating AZ-344's frozen-dataclass tests (if any) to assert the new field defaults.
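The enrichment pattern described in the rationale can be shown on a reduced dataclass. `MatchResultSketch` is a hypothetical three-field stand-in for the real DTO, and the 37.5 ms figure is an arbitrary illustrative value.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class MatchResultSketch:
    """Hypothetical stand-in: one C3 field plus the two C3.5 additions."""
    reprojection_residual_px: float
    refinement_label: str = "passthrough"
    refinement_added_latency_ms: float = 0.0


# A C3 producer that knows nothing about C3.5 still builds a valid instance.
c3_output = MatchResultSketch(reprojection_residual_px=4.2)

# C3.5 cannot mutate the frozen instance, so it derives a new one.
enriched = replace(
    c3_output,
    refinement_label="adhop",
    refinement_added_latency_ms=37.5,  # illustrative value only
)

try:
    c3_output.refinement_label = "adhop"  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

The original `c3_output` keeps its passthrough defaults after the call, so any component still holding a reference to it sees unchanged data.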
### Error hierarchy (in `c3_5_adhop/errors.py`)
```python
class RefinerError(Exception):
    """Base class for all C3.5 refinement-strategy errors."""


class RefinerBackboneError(RefinerError):
    """AdHoP backbone forward failed (TensorRT exception, OOM, NaN, shape mismatch). Caught inside `refine_if_needed`; converted to passthrough fall-through; never re-raised out of the strategy."""


class RefinerConfigError(RefinerError):
    """Composition-root rejected the refiner config (unknown strategy, invalid threshold). Raised at startup ONLY; never per-frame."""
```
The error hierarchy is intentionally small — drop-and-continue at the C3 matcher level handles per-candidate failures already; at C3.5 the only failure mode is the AdHoP backbone, and it is contained within the strategy via passthrough fall-through (Invariant 4).
### Composition-root factory
```python
# In src/gps_denied_onboard/runtime_root/refiner_factory.py
from gps_denied_onboard._types import config
from gps_denied_onboard.helpers.ransac_filter import RansacFilter
from gps_denied_onboard.components.c7_inference.interface import InferenceRuntime
from gps_denied_onboard.components.c3_5_adhop.interface import ConditionalRefiner


def build_refiner_strategy(
    config: config.AppConfig,
    ransac_filter: RansacFilter,
    inference_runtime: InferenceRuntime,
) -> ConditionalRefiner:
    """Construct the configured C3.5 strategy at composition-root time. Selects between `AdHoPRefiner` and `PassthroughRefiner` per `config.refiner.strategy`. Both strategies are imported eagerly (no `BUILD_REFINER_*` flag gating — both are linked unconditionally) — runtime selection only.

    Raises:
        RefinerConfigError: unknown strategy name OR invalid threshold (≤ 0).
    """
    ...
```
Strategy resolution table:
| `config.refiner.strategy` | Module path | Class | Notes |
|---|---|---|---|
| `"adhop"` | `gps_denied_onboard.components.c3_5_adhop.adhop_refiner` | `AdHoPRefiner` | production-default; conditional invocation. |
| `"passthrough"` | `gps_denied_onboard.components.c3_5_adhop.passthrough_refiner` | `PassthroughRefiner` | always-passthrough; baseline / smoke / IT-12 comparison. |
Config-load-time validation (in AZ-269):
- `config.refiner.strategy` (enum, required): `"adhop"` | `"passthrough"`.
- `config.refiner.residual_threshold_px` (float, default `2.5`): must be > 0.
- `config.refiner.invocation_rate_warn_threshold` (float, default `0.25`): rolling-60s threshold above which a WARN log is emitted (per description.md § 9). Must be in `(0, 1)`.
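The three rules above translate into a small validator. The sketch below is an assumption about shape only: the real schema lives in AZ-269 and is not shown here, so `RefinerConfigSketch` and `validate_refiner_config` are hypothetical names, and raising `RefinerConfigError` (redefined locally) at load time is a guess based on the factory's documented behaviour.

```python
from dataclasses import dataclass


class RefinerConfigError(Exception):
    """Local stand-in for the startup-only config error."""


ALLOWED_STRATEGIES = ("adhop", "passthrough")


@dataclass(frozen=True)
class RefinerConfigSketch:
    strategy: str  # required, no default
    residual_threshold_px: float = 2.5
    invocation_rate_warn_threshold: float = 0.25


def validate_refiner_config(cfg: RefinerConfigSketch) -> RefinerConfigSketch:
    """Return `cfg` unchanged if valid; raise RefinerConfigError otherwise."""
    if cfg.strategy not in ALLOWED_STRATEGIES:
        raise RefinerConfigError(f"unknown refiner strategy {cfg.strategy!r}")
    # `not x > 0` also rejects NaN, which `x <= 0` would let through.
    if not cfg.residual_threshold_px > 0:
        raise RefinerConfigError("residual_threshold_px must be > 0")
    if not 0 < cfg.invocation_rate_warn_threshold < 1:
        raise RefinerConfigError("invocation_rate_warn_threshold must be in (0, 1)")
    return cfg
```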
## Test expectations summarised by Invariant
| Invariant | Test name | Assertion |
|---|---|---|
| 1 | thread-binding | composition root binds the strategy to ONE ingest thread; second binding raises `RuntimeError`. |
| 2 | stateless reorder | shuffle 10 frames → output content identical to in-order pass; `was_invoked()` flags identical positionwise. |
| 3 | gate semantics | residual = threshold → passthrough (`<=` is inclusive); residual = threshold + 1e-6 → invoked. |
| 4 | backbone-error fall-through | monkey-patch backbone to raise `RefinerBackboneError`; `refine_if_needed` returns input unchanged with `refinement_label = "passthrough"`; ERROR log emitted; `was_invoked()` is True. |
| 5 | bit-identical on passthrough | when `refinement_label == "passthrough"`, every `inlier_correspondences` array satisfies `np.array_equal(out, in_) and out.dtype == in_.dtype`. |
| 6 | label values | every output's `refinement_label` is in `{"adhop", "passthrough"}`. |
| 7 | added-latency monotonic | every output's `refinement_added_latency_ms >= 0`; passthrough p95 ≤ 0.5 ms; AdHoP-invoked p95 ≤ 90 ms. |
| 8 | `was_invoked()` semantics | gate-passthrough: False; AdHoP-success: True; AdHoP-fall-through: True; PassthroughRefiner: always False. |
| 9 | threshold validation | `residual_threshold_px = 0` → `ValueError` raised by the strategy; `RefinerConfigError` raised by `build_refiner_strategy` at startup. |
## What this contract does NOT define
- The AdHoP TRT engine compile path — owned by AZ-321 (engine compiler).
- The AdHoP forward pass implementation — owned by C7 `InferenceRuntime` consumers.
- The `RansacFilter` API — owned by AZ-282; this contract only consumes it.
- The downstream pose estimator's behaviour when `refinement_added_latency_ms` is high — owned by E-C4 (D-CROSS-LATENCY-1 hybrid is C4-internal).
## Producer-task / consumer-task split
- The Protocol task (AZ-348) ships: Protocol, DTO extension to `MatchResult`, error hierarchy, composition-root factory, config schema extension, AND the `PassthroughRefiner` (because it is a 1-pt no-op that naturally accompanies the Protocol task and acts as the reference implementation for tests).
- The AdHoPRefiner task (AZ-349) ships: `AdHoPRefiner` only (TRT engine load, perspective preconditioning, conditional gate, backbone-error fall-through to passthrough). Composition-root wiring path for `config.refiner.strategy = "adhop"`.
## Versioning + change policy
- Protocol method-signature changes (signatures of `refine_if_needed` or `was_invoked`) are MAJOR-version bumps. Every concrete strategy must be updated lockstep.
- DTO field additions (e.g., a future `refinement_iterations: int`) are MINOR. Field removals are MAJOR.
- Adding a third strategy (e.g., a learned-conditional refiner) is a feature-cycle change; it adds an entry to the resolution table without changing this contract's surface.
@@ -0,0 +1,170 @@
# Contract: `CrossDomainMatcher` Protocol
**Owner**: c3_matcher (epic AZ-257 / E-C3)
**Producer task**: AZ-344 (`CrossDomainMatcher` Protocol + factory + composition)
**Consumer tasks**: AZ-345 (DISK+LightGlue primary), AZ-346 (ALIKED+LightGlue secondary), AZ-347 (XFeat alternate); downstream c3_5_adhop (epic AZ-258) which consumes `MatchResult`
**Version**: 1.0.0
**Status**: draft, awaiting AZ-344 implementation
**Last Updated**: 2026-05-10
**Module-layout home**: `src/gps_denied_onboard/components/c3_matcher/interface.py` (Protocol), `src/gps_denied_onboard/components/c3_matcher/__init__.py` (re-exports), `src/gps_denied_onboard/runtime_root/matcher_factory.py` (factory)
## Purpose
Defines the public interface for every C3 cross-domain matcher strategy: `match(frame, rerank_result, calibration)` produces a `MatchResult` containing per-candidate inlier counts + RANSAC-filtered correspondences + median reprojection residual; `health_snapshot()` returns rolling matcher health for AC-NEW-7 cache-poisoning detection. Every concrete matcher (DISK+LightGlue, ALIKED+LightGlue, XFeat) implements this Protocol; the composition root selects exactly one at startup based on `config.matcher.strategy` and refuses to wire a strategy whose `BUILD_MATCHER_<variant>` flag is OFF (ADR-002 + ADR-009).
The shared `LightGlueRuntime` helper (AZ-278) is constructor-injected — neither C2.5 nor C3 owns its lifecycle (R14 fix); the runtime root constructs ONE instance and passes the same reference to both. The shared `RansacFilter` helper (AZ-282) is also constructor-injected and consumed by C3, C3.5, and C4.
## Public API
### Protocol: `CrossDomainMatcher`
```python
from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import (
    NavCameraFrame, CameraCalibration, RerankResult, MatchResult, MatcherHealth,
)


@runtime_checkable
class CrossDomainMatcher(Protocol):
    """Cross-domain (nav-camera ↔ satellite-imagery) matcher strategy. Stateless per-frame; the only persistent state is the constructor-injected backbone runtime handles + the rolling health window."""

    def match(
        self,
        frame: NavCameraFrame,
        rerank_result: RerankResult,
        calibration: CameraCalibration,
    ) -> MatchResult:
        """Run feature extraction + matching + RANSAC + reprojection-residual computation against each top-N=3 candidate in `rerank_result`. Pick the best candidate by inlier count (deterministic tie-break: lower median residual ranked higher).

        Drop-and-continue per candidate: per-candidate `MatcherBackboneError` (backbone forward failure) → candidate dropped, ERROR log + FDR record, success path continues. If ALL candidates fail OR every candidate's inlier count falls below `config.matcher.min_inliers_threshold`: raise `InsufficientInliersError`; downstream C5 falls back to VIO-only with provenance `visual_propagated` (AC-3.5).

        Raises:
            InsufficientInliersError: every candidate failed or every candidate's inlier count is below the configured floor.
        """
        ...

    def health_snapshot(self) -> MatcherHealth:
        """Return a rolling-window snapshot of matcher health: consecutive low-inlier frames, mean inliers over the last 60 s. Used by C5's spoof-promotion gate (AC-NEW-2 / AC-NEW-7) and by post-flight forensics."""
        ...
```
**Invariants**:
1. **Single-threaded by contract** — each instance is bound to one ingest thread (composition root enforces; same thread as C2.5 because they share `LightGlueRuntime`).
2. **Stateless per-frame for `match`** — except for the rolling health window, no implicit dependency on prior frames; reordering `match` calls (tests only) MUST yield identical `MatchResult` content.
3. **Best-candidate selection is deterministic** — `MatchResult.best_candidate_idx == argmax(inlier_count)` over `per_candidate`; ties broken by `per_candidate_residual_px` ascending (lower residual wins).
4. **Drop-and-continue per candidate** — per-candidate exceptions never propagate out of `match` unless every candidate fails. Mirrors C2.5 INV-8.
5. **`per_candidate` length is bounded** — `0 < len <= len(rerank_result.candidates)` (zero raises `InsufficientInliersError`); never exceeds the input N.
6. **`matcher_label` is non-empty** — every `MatchResult` carries the strategy's name (e.g., `"disk_lightglue"`) for FDR provenance. MUST match `BUILD_MATCHER_<variant>` lowercase.
7. **`inlier_correspondences` shape contract** — `ndarray[I, 4, dtype=float32]`, columns `(px_query, py_query, px_tile, py_tile)`; rows are RANSAC inliers only; `I == inlier_count`.
8. **`reprojection_residual_px` is the BEST candidate's median residual** — not the mean, not a max; downstream C3.5's threshold gate compares against this value.
9. **`health_snapshot()` is cheap** — O(1); reads the rolling window's pre-computed accumulators. Never recomputes over the window contents.
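Invariant 3 can be read as a two-key sort. The following is a minimal sketch under stated assumptions: `Candidate` is a pared-down stand-in for `CandidateMatchSet` that carries only the two ranking fields, and `rank_candidates` is a hypothetical helper name, not part of this contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in carrying only the two ranking fields."""
    inlier_count: int
    per_candidate_residual_px: float


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by inlier count descending, then median residual ascending."""
    return sorted(
        candidates,
        key=lambda c: (-c.inlier_count, c.per_candidate_residual_px),
    )


ranked = rank_candidates([
    Candidate(inlier_count=120, per_candidate_residual_px=1.8),
    Candidate(inlier_count=150, per_candidate_residual_px=2.4),
    Candidate(inlier_count=150, per_candidate_residual_px=1.1),
])
best = ranked[0]  # index 0 is the best candidate once the list is sorted
```

Because Python's `sorted` is stable, candidates that tie on both keys keep their input order, so the result stays deterministic for a fixed input.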
### DTOs (in `_types/matcher.py`)
```python
from dataclasses import dataclass
from uuid import UUID

import numpy as np


@dataclass(frozen=True, slots=True)
class CandidateMatchSet:
    """Per-candidate matching outcome inside a MatchResult."""
    tile_id: tuple  # composite (zoomLevel, lat, lon)
    inlier_count: int
    inlier_correspondences: np.ndarray  # shape (I, 4) float32; (px_query, py_query, px_tile, py_tile)
    ransac_outlier_count: int
    per_candidate_residual_px: float  # median residual on inliers


@dataclass(frozen=True, slots=True)
class MatchResult:
    """Cross-domain match outcome for one frame. Consumed by C3.5 ConditionalRefiner."""
    frame_id: UUID
    per_candidate: list[CandidateMatchSet]  # 0 < len <= N=3, ranked by inlier_count descending; ties broken by per_candidate_residual_px ascending
    best_candidate_idx: int  # 0 by construction (sorted)
    reprojection_residual_px: float  # best candidate's median residual
    matched_at: int  # monotonic_ns
    matcher_label: str  # non-empty; matches BUILD_MATCHER_<variant> lowercase
    candidates_input: int  # len(rerank_result.candidates) at entry
    candidates_dropped: int  # candidates_input - len(per_candidate)


@dataclass(frozen=True, slots=True)
class MatcherHealth:
    """Rolling-window matcher health snapshot."""
    consecutive_low_inlier: int  # consecutive frames where inlier_count < min_inliers_threshold
    mean_inliers_60s: float  # rolling 60 s mean of best-candidate inlier_count
    backbone_error_count_60s: int  # rolling 60 s count of MatcherBackboneError occurrences
```
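Invariant 9 asks for `health_snapshot()` to read pre-computed accumulators rather than rescan the window. One way this could look is sketched below, assuming that expired samples are evicted when a frame is recorded (so the snapshot itself never iterates). `RollingInlierWindow` and its method names are hypothetical; the default floor of 60 mirrors the `config.matcher.min_inliers_threshold` default mentioned under Open Questions.

```python
from collections import deque


class RollingInlierWindow:
    """Hypothetical helper: rolling mean with O(1) reads."""

    def __init__(self, window_ns: int = 60 * 1_000_000_000, min_inliers: int = 60) -> None:
        self._window_ns = window_ns
        self._min_inliers = min_inliers
        self._samples: deque[tuple[int, int]] = deque()  # (timestamp_ns, inlier_count)
        self._sum = 0
        self._consecutive_low = 0

    def record(self, timestamp_ns: int, inlier_count: int) -> None:
        """Add one frame and evict expired samples (amortised O(1))."""
        self._samples.append((timestamp_ns, inlier_count))
        self._sum += inlier_count
        while self._samples and timestamp_ns - self._samples[0][0] > self._window_ns:
            _, expired = self._samples.popleft()
            self._sum -= expired
        if inlier_count < self._min_inliers:
            self._consecutive_low += 1
        else:
            self._consecutive_low = 0

    def snapshot(self) -> tuple[int, float]:
        """Return (consecutive_low_inlier, mean_inliers) from the accumulators only."""
        mean = self._sum / len(self._samples) if self._samples else 0.0
        return self._consecutive_low, mean


window = RollingInlierWindow()
window.record(0, 100)
window.record(30_000_000_000, 40)
window.record(70_000_000_000, 20)  # the sample at t=0 is now older than 60 s
low, mean = window.snapshot()
```

The trade-off in this sketch is that the mean reflects the window as of the last recorded frame, not as of the snapshot call.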
### Error Hierarchy (in `c3_matcher/errors.py`)
```python
class MatcherError(Exception):
    """Base for all C3 matcher errors. Caught at the runtime root; downstream effect: C5 falls back to VIO-only with provenance `visual_propagated` (AC-3.5)."""


class MatcherBackboneError(MatcherError):
    """Per-candidate backbone forward-pass failure (CUDA OOM, TRT engine deserialize mismatch). Drop-and-continue inside `match`."""


class InsufficientInliersError(MatcherError):
    """Every candidate failed OR every candidate's inlier count is below `config.matcher.min_inliers_threshold`. Raised by `match`. C5 falls back to VIO-only."""
```
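The drop-and-continue rule (Invariant 4) together with the all-fail and below-threshold cases can be sketched as a single loop. This is an illustration only: the exception classes are re-declared so the block is self-contained, `match_all` and `fake_backbone` are hypothetical names, tiles are plain strings, and the FDR record that the contract requires per failure is reduced to a log call.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("c3_matcher")


class MatcherError(Exception): ...
class MatcherBackboneError(MatcherError): ...
class InsufficientInliersError(MatcherError): ...


def match_all(
    candidates: Sequence[str],
    match_one: Callable[[str], int],
    min_inliers: int,
) -> dict[str, int]:
    """Return inlier counts of the candidates whose backbone pass succeeded."""
    survivors: dict[str, int] = {}
    for tile in candidates:
        try:
            survivors[tile] = match_one(tile)
        except MatcherBackboneError:
            # Drop this candidate only; the remaining candidates still run.
            logger.error("backbone failure on candidate %s; dropped", tile)
    if not survivors or all(n < min_inliers for n in survivors.values()):
        raise InsufficientInliersError("no candidate reached the inlier floor")
    return survivors


def fake_backbone(tile: str) -> int:
    """Simulated backbone: one tile fails, the others return fixed counts."""
    if tile == "tile_b":
        raise MatcherBackboneError("simulated CUDA OOM")
    return {"tile_a": 95, "tile_c": 30}[tile]


result = match_all(["tile_a", "tile_b", "tile_c"], fake_backbone, min_inliers=60)
```

Whether below-floor survivors stay in `per_candidate` is not stated in this contract; the sketch keeps them and only raises when none reaches the floor.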
## Composition-Root Factory
```python
# src/gps_denied_onboard/runtime_root/matcher_factory.py
def build_matcher_strategy(
    config: Config,
    lightglue_runtime: LightGlueRuntime,
    ransac_filter: RansacFilter,
    inference_runtime: InferenceRuntime,
) -> CrossDomainMatcher:
    """Composition-root factory. Reads `config.matcher.strategy` and lazy-imports the concrete module gated by `BUILD_MATCHER_<variant>`.

    Strategy resolution table:

    | config.matcher.strategy | Implementation         | Module                                 | Build flag                     |
    |-------------------------|------------------------|----------------------------------------|--------------------------------|
    | "disk_lightglue"        | DiskLightGlueMatcher   | components.c3_matcher.disk_lightglue   | BUILD_MATCHER_DISK_LIGHTGLUE   |
    | "aliked_lightglue"      | AlikedLightGlueMatcher | components.c3_matcher.aliked_lightglue | BUILD_MATCHER_ALIKED_LIGHTGLUE |
    | "xfeat"                 | XFeatMatcher           | components.c3_matcher.xfeat            | BUILD_MATCHER_XFEAT            |

    The shared `LightGlueRuntime` and `RansacFilter` are constructor-injected; the factory does NOT own their lifecycles. The runtime root constructs ONE `LightGlueRuntime` and passes the SAME reference to both this factory and the C2.5 ReRank factory (per AZ-342 AC-10).
    """
    ...
```
## Versioning
- The `CrossDomainMatcher` Protocol's method signatures are part of the cross-component public API. Any change is a major bump and requires updating every concrete implementation in lockstep.
- DTO field additions are minor; field removals are major. The drop-and-continue contract (Invariant 4) is non-negotiable.
## Test Cases (protocol conformance — runs against every concrete strategy)
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| INV-1 (single-thread) | Composition root rejects multi-thread binding | `RuntimeError` on second binding attempt |
| INV-2 (stateless `match`) | Reorder calls; replay calls | `MatchResult.per_candidate` content is identical (ignoring `matched_at`) |
| INV-3 (best-candidate det.) | Mixed inlier counts with one tie | Best candidate is the tied one with lower median residual |
| INV-4 (drop-and-continue) | One candidate's backbone raises | Result has remaining survivors; ERROR log + FDR record per failure |
| INV-5 (length bound) | N=3 input, 2 candidates fail | `len(per_candidate) == 1` |
| INV-6 (matcher_label) | Every MatchResult | `matcher_label` non-empty + matches `BUILD_MATCHER_<variant>` lowercase |
| INV-7 (correspondences shape) | Each `CandidateMatchSet` | `inlier_correspondences.shape == (I, 4)`, `dtype == float32`, `I == inlier_count` |
| INV-8 (median residual) | Median of inliers' residual list | `per_candidate_residual_px` matches numpy.median computed independently |
| INV-9 (`health_snapshot` cheap) | Microbench `health_snapshot` × 1000 | p99 ≤ 50 µs |
| AC-1.1 floor | Inlier count p5 across a fixture | ≥ 80 (AC-1.1 partition) |
| All-fail | Every candidate's backbone raises | `InsufficientInliersError`; all-failed FDR record |
| Below-threshold | Every candidate's inlier_count < `config.matcher.min_inliers_threshold` | `InsufficientInliersError` |
## Open Questions / Risks
- **Risk: D-C3-1 IT-12 verdict may shift the production-default backbone** from DISK+LightGlue to ALIKED+LightGlue or another. *Mitigation*: every backbone implements the same Protocol; switching is a config change. The contract holds.
- **Risk: `LightGlueRuntime` shared with C2.5** — both must serialise through one ingest thread. *Mitigation*: composition root binds both to the same ingest thread; helper has internal thread-binding assertion (AZ-278).
- **Risk: `min_inliers_threshold` is not yet calibrated** — the AC-1.1 floor (p5 ≥ 80) is the production target; the threshold may need to be lower (e.g., 40) to leave headroom. *Mitigation*: `config.matcher.min_inliers_threshold` is config-driven (default 60); FT-P-19 telemetry will tune it.
@@ -0,0 +1,194 @@
# Contract: `PoseEstimator` Protocol
**Owner**: c4_pose (epic AZ-259 / E-C4)
**Producer task**: AZ-355 (Protocol + DTO + factory + composition)
**Consumer tasks**: AZ-358 (`OpenCVGtsamPoseEstimator` Marginals path), AZ-361 (D-CROSS-LATENCY-1 hybrid: Jacobian fallback + thermal-state-driven mode switch). Downstream c5_state (epic AZ-260) which consumes `PoseEstimate`.
**Version**: 1.0.0
**Status**: draft, awaiting Producer task implementation
**Last Updated**: 2026-05-10
**Module-layout home**: `src/gps_denied_onboard/components/c4_pose/interface.py` (Protocol), `src/gps_denied_onboard/components/c4_pose/__init__.py` (re-exports), `src/gps_denied_onboard/runtime_root/pose_factory.py` (factory)
## Purpose
Defines the public interface for the C4 pose estimator: `estimate(match_result, calibration, thermal_state) -> PoseEstimate` produces a WGS84 position + 6×6 covariance + provenance label by running OpenCV `solvePnPRansac` (`SOLVEPNP_IPPE`) and recovering the posterior 6×6 covariance via GTSAM `Marginals.marginalCovariance(pose_key)` against C5's shared iSAM2 graph. Under thermal throttle (D-CROSS-LATENCY-1 / ADR-006), the implementation switches per-frame to Jacobian-derived covariance, accepting a ~5–10% accuracy loss, to preserve the AC-4.1 latency budget. `current_covariance_mode()` exposes the per-frame decision for FDR provenance and AC-NEW-5 verification.
There is exactly ONE concrete implementation (`OpenCVGtsamPoseEstimator`); the Protocol exists for ADR-009 (interface-first DI) so consumers (C5, runtime root) hold a typed reference rather than the concrete class. ADR-002 build-time exclusion does NOT apply (one strategy only) — but lazy-import via the factory remains the entry-point pattern for symmetry with C2 / C2.5 / C3 / C3.5.
The shared `RansacFilter` (AZ-282), `WgsConverter` (AZ-279), and `SE3Utils` (AZ-277) helpers are constructor-injected. The C5 iSAM2 graph handle is constructor-injected from the runtime root; C4 NEVER owns the graph (ADR-003 shared substrate).
## Public API
### Protocol: `PoseEstimator`
```python
from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import (
    MatchResult, CameraCalibration, ThermalState, PoseEstimate, CovarianceMode,
)


@runtime_checkable
class PoseEstimator(Protocol):
    """Single-pose estimator producing WGS84 + 6×6 covariance + provenance label. Stateless per-frame except for the constructor-injected shared GTSAM substrate (owned by C5)."""

    def estimate(
        self,
        match_result: MatchResult,
        calibration: CameraCalibration,
        thermal_state: ThermalState,
    ) -> PoseEstimate:
        """Run PnP → factor add → covariance recovery. Per-frame thermal decision: `thermal_state.throttle == True` → Jacobian path (cheap, ~5–10% accuracy loss); `False` → Marginals path (production-default).

        Raises:
            PnpFailureError: RANSAC convergence failure or degenerate match geometry. C5 falls back to VIO-only with `source_label = "visual_propagated"`. NEVER converted to a fallback PoseEstimate; C5 is the place where the fallback decision is taken.
        """
        ...

    def current_covariance_mode(self) -> CovarianceMode:
        """Return the mode used for the LAST `estimate` call: `CovarianceMode.MARGINALS` or `CovarianceMode.JACOBIAN`. Used by C5 for FDR provenance and by C4-IT-03 to verify the per-frame switch."""
        ...
```
**Invariants**:
1. **Single-threaded by contract** — bound to the SAME ingest thread as C5 (composition root enforces; shared GTSAM substrate per ADR-003 is non-thread-safe).
2. **Stateless w.r.t. flight history for `estimate`** — relies solely on inputs + the shared iSAM2 graph (which carries history but is C5-owned).
3. **Per-frame mode decision** — `thermal_state.throttle` is read at call entry; the choice between Marginals/Jacobian is made on EVERY call independently. NO hysteresis, NO smoothing, NO operator-tooling override at this layer (R10 covers operator tuning at a higher layer via `config`).
4. **Mode-switch latency ≤ 1 frame** — switching from JACOBIAN to MARGINALS or back happens immediately on the next `estimate` call when the thermal flag flips. C4-IT-03 verifies.
5. **`PoseEstimate.covariance_6x6` is always SPD** — both paths produce SPD matrices; non-SPD is a bug. C4-IT-02 verifies.
6. **`PoseEstimate.covariance_mode` matches the path actually taken** — never reports MARGINALS while computing Jacobian.
7. **`source_label` is set by C4 to `"satellite_anchored"`** unconditionally on success; C5 is the component that may downgrade it to `"visual_propagated"` or `"dead_reckoned"` when the gate decides. C4 never emits `"visual_propagated"` from `estimate` directly.
8. **`last_satellite_anchor_age_ms` is provided BY C5 and PASSED THROUGH** — C4 receives the current value via the runtime root + caches it; on emit, the value reflects the time since C5's last anchor add. C4 does not compute this metric independently.
9. **`PnpFailureError` is the ONLY non-warning exception escaping `estimate`** — `CovarianceDegradedWarning` is a Python `Warning` (filterwarnings-compatible) emitted via `warnings.warn`; it is never raised.
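Invariant 5 (and the C4-IT-02 check it names) comes down to a symmetry test plus a Cholesky attempt. A minimal sketch, assuming NumPy is available; `is_spd` and its tolerance are hypothetical and not the project's actual test helper.

```python
import numpy as np


def is_spd(matrix: np.ndarray, symmetry_atol: float = 1e-9) -> bool:
    """Return True if `matrix` is square, symmetric and positive-definite."""
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return False
    if not np.allclose(matrix, matrix.T, atol=symmetry_atol):
        return False
    try:
        np.linalg.cholesky(matrix)  # raises LinAlgError unless positive-definite
    except np.linalg.LinAlgError:
        return False
    return True


good = np.diag([0.5, 0.5, 1.0, 0.01, 0.01, 0.02])
bad = good.copy()
bad[0, 0] = -0.5  # a negative variance breaks positive-definiteness
```

The explicit symmetry check matters because `np.linalg.cholesky` reads only one triangle of its input and would not notice an asymmetric matrix on its own.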
### DTOs (in `_types/pose.py`)
```python
from dataclasses import dataclass
from enum import Enum
from uuid import UUID

import numpy as np


class CovarianceMode(Enum):
    MARGINALS = "marginals"
    JACOBIAN = "jacobian"


class PoseSourceLabel(Enum):
    SATELLITE_ANCHORED = "satellite_anchored"
    VISUAL_PROPAGATED = "visual_propagated"
    DEAD_RECKONED = "dead_reckoned"


@dataclass(frozen=True, slots=True)
class LatLonAlt:
    """WGS84 position. lat/lon in degrees, alt in metres MSL."""
    lat_deg: float
    lon_deg: float
    alt_m_msl: float


@dataclass(frozen=True, slots=True)
class Quat:
    """Unit quaternion (w, x, y, z); scalar-first."""
    w: float
    x: float
    y: float
    z: float


@dataclass(frozen=True, slots=True)
class PoseEstimate:
    """Pose estimate emitted by C4 to C5."""
    frame_id: UUID
    position_wgs84: LatLonAlt
    orientation_world_T_body: Quat
    covariance_6x6: np.ndarray  # shape (6, 6) float64; SPD; position (3x3) | orientation (3x3) blocks
    covariance_mode: CovarianceMode
    source_label: PoseSourceLabel  # C4 always emits SATELLITE_ANCHORED on success
    last_satellite_anchor_age_ms: int
    emitted_at: int  # monotonic_ns
```
### Error hierarchy (in `c4_pose/errors.py`)
```python
class PoseEstimatorError(Exception):
    """Base class."""


class PnpFailureError(PoseEstimatorError):
    """RANSAC convergence failure or degenerate match geometry. NEVER converted to a fallback PoseEstimate by C4 itself; C5 owns the fallback decision."""


class CovarianceDegradedWarning(Warning):
    """Per-frame thermal-state-driven Jacobian-path engagement. Never raised as an exception. Emitted via `warnings.warn(...)` at the start of every Jacobian-path frame; users SHOULD limit it to one warning per 60 s window to avoid log flooding. Note: `warnings.simplefilter("once")` shows the warning once per process, not once per 60 s, so a windowed limit needs a caller-side filter."""
```
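How a warning differs from a raised error at the call site can be shown with the standard `warnings` module. The sketch below is illustrative only: `CovarianceDegradedWarning` is re-declared so the block is self-contained, and `estimate_covariance_mode` is a hypothetical stand-in for the per-frame decision inside `estimate` (it returns plain strings instead of `CovarianceMode`).

```python
import warnings


class CovarianceDegradedWarning(Warning):
    """Re-declared here so the sketch is self-contained."""


def estimate_covariance_mode(throttle: bool) -> str:
    """Hypothetical stand-in for the per-frame mode decision."""
    if throttle:
        # The warning is emitted, not raised: control flow continues normally.
        warnings.warn("thermal throttle: Jacobian covariance path", CovarianceDegradedWarning)
        return "jacobian"
    return "marginals"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every occurrence for the demonstration
    modes = [estimate_covariance_mode(t) for t in (False, True, True, False)]
```

The mode follows the flag on every call with no carry-over from the previous frame, which is the behaviour Invariants 3 and 4 describe.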
### Composition-root factory
```python
# In src/gps_denied_onboard/runtime_root/pose_factory.py
def build_pose_estimator(
    config: AppConfig,
    ransac_filter: RansacFilter,
    wgs_converter: WgsConverter,
    se3_utils: SE3Utils,
    isam2_graph_handle: ISam2GraphHandle,  # owned by C5, constructor-injected
) -> PoseEstimator:
    """Construct the configured C4 estimator at composition-root time. Currently only `"opencv_gtsam"` is defined; the Protocol exists for ADR-009.

    Raises:
        PoseEstimatorConfigError: invalid config; missing camera calibration; invalid `isam2_graph_handle`.
    """
    ...
```
Strategy resolution table:
| `config.pose.strategy` | Module path | Class | Notes |
|---|---|---|---|
| `"opencv_gtsam"` | `gps_denied_onboard.components.c4_pose.opencv_gtsam_estimator` | `OpenCVGtsamPoseEstimator` | production-default; only strategy. |
Config-load-time validation:
- `config.pose.strategy` (enum, default `"opencv_gtsam"`).
- `config.pose.ransac_iterations` (int, default 200).
- `config.pose.ransac_reprojection_threshold_px` (float, default 4.0).
- `config.pose.thermal_throttle_threshold_celsius` (float, default 75.0) — informational only; the actual `ThermalState.throttle` decision is owned by C7, not C4.
## Test expectations summarised by Invariant
| Invariant | Test name | Assertion |
|---|---|---|
| 1 | thread-binding | composition root binds to the same thread as C5; second binding raises `RuntimeError`. |
| 2 | stateless reorder | shuffle 10 frames → same outputs (modulo iSAM2 graph state which is C5-owned). |
| 3 | per-frame mode decision | thermal flag flipped between consecutive frames → mode flips immediately. |
| 4 | mode-switch latency | switch happens on the NEXT `estimate` call after the flag changes (no buffering). |
| 5 | covariance SPD | every emitted `covariance_6x6` is symmetric AND positive-definite (Cholesky succeeds). |
| 6 | mode reporting honesty | when Jacobian path runs, `covariance_mode == JACOBIAN` AND `current_covariance_mode()` returns `JACOBIAN`. |
| 7 | source_label = SATELLITE_ANCHORED on success | C4 always emits SATELLITE_ANCHORED; downgrade is C5's job. |
| 8 | `last_satellite_anchor_age_ms` pass-through | matches the last value from C5's broadcast. |
| 9 | only `PnpFailureError` escapes | `CovarianceDegradedWarning` is via `warnings.warn` not `raise`. |
## What this contract does NOT define
- The OpenCV `solvePnPRansac` configuration — owned by the producer task.
- The GTSAM `Marginals` factor add path — owned by the Marginals task.
- The Jacobian covariance derivation — owned by the hybrid task.
- The C5 iSAM2 graph internals — owned by E-C5 (AZ-260).
- The `ThermalState` source — owned by E-C7 (AZ-249 / AZ-302).
## Producer-task / consumer-task split
- **Protocol task (TBD)**: Protocol, `PoseEstimate` + `LatLonAlt` + `Quat` + `CovarianceMode` + `PoseSourceLabel` DTOs, error hierarchy, factory, config schema extension.
- **Marginals task (TBD)**: `OpenCVGtsamPoseEstimator` core (PnP + IPPE + GTSAM `Marginals` factor add against C5's iSAM2 graph). Steady-state path only; fails fast if `thermal_state.throttle` is True (raises `NotImplementedError` until the hybrid task lands).
- **Hybrid task (TBD)**: D-CROSS-LATENCY-1 — Jacobian fallback + per-frame thermal-state-driven mode switch. Adds the JACOBIAN code path; replaces the Marginals task's `NotImplementedError` with the actual Jacobian implementation; verifies AC-NEW-5 (workstation portion).
## Versioning + change policy
- Protocol method-signature changes are MAJOR version bumps (lockstep update of consumers).
- DTO field additions are MINOR; field removals are MAJOR.
- Adding a third covariance mode (e.g., a learned-prior covariance) is a feature-cycle change; it adds an entry to `CovarianceMode` without changing the Protocol surface.
@@ -0,0 +1,143 @@
# Contract: `StateEstimator` Protocol
**Owner**: c5_state (epic AZ-260 / E-C5)
**Producer task**: AZ-381 (Protocol + DTOs + factory + composition + concrete `ISam2GraphHandle`)
**Consumer tasks**: AZ-382 (iSAM2 + IncrementalFixedLagSmoother wiring), AZ-383 (Factor adds), AZ-384 (Marginals + outputs), AZ-385 (Source-label + spoof gate), AZ-386 (ESKF baseline), AZ-387 (Smoothed history → FDR), AZ-388 (AC-5.2 fallback), AZ-389 (Orthorectifier → C6).
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
**Module-layout home**: `src/gps_denied_onboard/components/c5_state/interface.py`, `src/gps_denied_onboard/components/c5_state/__init__.py`, `src/gps_denied_onboard/runtime_root/state_factory.py`
## Purpose
Defines the public interface for the C5 state estimator: fuses `VioOutput` (C1), `PoseEstimate` (C4), and FC `ImuWindow` (C8 inbound) into the posterior pose with native 6×6 covariance. Two concrete strategies linked at build time per ADR-002: `GtsamIsam2StateEstimator` (production-default; iSAM2 + IncrementalFixedLagSmoother K=10–20 per D-C5-3) and `EskfStateEstimator` (mandatory simple-baseline per IT-12 engine rule). Selected at startup via `config.state.strategy` with `BUILD_STATE_<variant>` flag gating per ADR-002.
C5 owns the GTSAM iSAM2 graph (ADR-003 shared substrate); C4's `OpenCVGtsamPoseEstimator` adds factors to this graph via the `ISam2GraphHandle` Protocol (defined by AZ-355 stub; concrete impl owned by AZ-381 — first child of E-C5). Single-writer thread invariant: composition root binds C5 to the same ingest thread as C4.
The shared `ImuPreintegrator` (AZ-276), `SE3Utils` (AZ-277), and `WgsConverter` (AZ-279) helpers are constructor-injected.
## Public API
### Protocol: `StateEstimator`
```python
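from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import (  # assumed re-exports, as in the C3 / C4 contracts
    VioOutput, PoseEstimate, ImuWindow, EstimatorOutput, EstimatorHealth,
)


@runtime_checkable
class StateEstimator(Protocol):
    def add_vio(self, vio: VioOutput) -> None: ...
    def add_pose_anchor(self, pose: PoseEstimate) -> None: ...
    def add_fc_imu(self, imu_window: ImuWindow) -> None: ...
    def current_estimate(self) -> EstimatorOutput: ...
    def smoothed_history(self, n_keyframes: int) -> list[EstimatorOutput]: ...
    def health_snapshot(self) -> EstimatorHealth: ...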
```
**Invariants**:
1. **Single-writer thread** — every `add_*` and `current_estimate`/`smoothed_history` runs on the same ingest thread; ADR-003 GTSAM substrate is non-thread-safe.
2. **`add_*` calls are timestamp-ordered** — composition root provides a merge queue; out-of-order arrivals are rejected with `EstimatorDegradedError`.
3. **`add_pose_anchor(pose)` MUST inspect `pose.covariance_mode`** — `JACOBIAN` mode adds the pose to the running estimate but DOES NOT add an iSAM2 factor (per AZ-361 cross-task interaction); `MARGINALS` mode triggers the full factor add + iSAM2 update.
4. **`current_estimate()` ALWAYS returns a fresh `EstimatorOutput`** — never None on the steady-state path; `EstimatorFatalError` propagates if iSAM2 is unrecoverable.
5. **`source_label` reflects gate state** — `SATELLITE_ANCHORED` only when the spoof-promotion gate confirms (≥10 s `STABLE_NON_SPOOFED` AND visual-consistent next anchor); else `VISUAL_PROPAGATED` or `DEAD_RECKONED`.
6. **`smoothed_history(n)` returns up to K keyframes** — K bounded by `IncrementalFixedLagSmoother` window (D-C5-3 K=10–20); out-of-window keyframes are NOT recoverable.
7. **`smoothed_history(n)` entries have `smoothed=True`** — distinguishes from `current_estimate()` which has `smoothed=False`.
8. **Spoof-rejection events ALWAYS land in FDR + GCS STATUSTEXT** — never silent (R07; C5-ST-01).
9. **AC-5.2 fallback on 3 s no-estimate** — if `current_estimate()` would raise OR the keyframe window is empty for ≥3 s, downstream C8 emits FC IMU-only.
10. **`covariance_6x6` is always SPD** — both strategies enforce; on numerical failure raise `EstimatorFatalError`.
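Invariant 2's rejection of out-of-order arrivals can be illustrated with a small guard. This is a sketch under assumptions: `TimestampOrderGuard` is a hypothetical name, timestamps are monotonic nanoseconds, and equal timestamps are accepted (the contract only says "timestamp-ordered"). The error classes are re-declared so the block runs on its own.

```python
from typing import Optional


class StateEstimatorError(Exception): ...
class EstimatorDegradedError(StateEstimatorError): ...


class TimestampOrderGuard:
    """Hypothetical guard: rejects measurements older than the last accepted one."""

    def __init__(self) -> None:
        self._last_ns: Optional[int] = None

    def check(self, timestamp_ns: int) -> None:
        """Accept `timestamp_ns` or raise if it precedes the last accepted value."""
        if self._last_ns is not None and timestamp_ns < self._last_ns:
            raise EstimatorDegradedError(
                f"out-of-order measurement: {timestamp_ns} < {self._last_ns}"
            )
        self._last_ns = timestamp_ns


guard = TimestampOrderGuard()
guard.check(100)
guard.check(200)
try:
    guard.check(150)  # older than the last accepted measurement
    rejected = False
except EstimatorDegradedError:
    rejected = True
```

A rejected measurement leaves the guard's state untouched, so a later in-order arrival is still accepted.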
### DTOs (in `_types/state.py`)
```python
from dataclasses import dataclass
from enum import Enum
from uuid import UUID

import numpy as np

from gps_denied_onboard._types.pose import LatLonAlt, PoseSourceLabel, Quat


@dataclass(frozen=True, slots=True)
class EstimatorOutput:
    frame_id: UUID
    position_wgs84: LatLonAlt
    orientation_world_T_body: Quat
    velocity_world_mps: tuple[float, float, float]
    covariance_6x6: np.ndarray
    source_label: PoseSourceLabel
    last_satellite_anchor_age_ms: int
    smoothed: bool
    emitted_at: int


class IsamState(Enum):
    INIT = "init"
    TRACKING = "tracking"
    DEGRADED = "degraded"
    LOST = "lost"


@dataclass(frozen=True, slots=True)
class EstimatorHealth:
    isam2_state: IsamState
    keyframe_count: int
    cov_norm_growing_for_s: float
    spoof_promotion_blocked: bool
```
### Error hierarchy (in `c5_state/errors.py`)
```python
class StateEstimatorError(Exception): pass
class EstimatorDegradedError(StateEstimatorError): pass # poor convergence; emit degraded estimate
class EstimatorFatalError(StateEstimatorError): pass # numerical failure; AC-5.2 path
class StateEstimatorConfigError(StateEstimatorError): pass # composition-time
```
### Composition-root factory
```python
def build_state_estimator(
    config: AppConfig,
    imu_preintegrator: ImuPreintegrator,
    se3_utils: SE3Utils,
    wgs_converter: WgsConverter,
    fdr_client: FdrClient,
) -> tuple[StateEstimator, ISam2GraphHandle]:
    """Construct the configured state estimator + return the iSAM2 graph handle for C4 to inject. Selects between gtsam_isam2 / eskf via config; ADR-002 BUILD_STATE_<variant> gating."""
    ...
```
Strategy resolution table:
| `config.state.strategy` | Module path | Class |
|---|---|---|
| `"gtsam_isam2"` | `gps_denied_onboard.components.c5_state.gtsam_isam2_estimator` | `GtsamIsam2StateEstimator` |
| `"eskf"` | `gps_denied_onboard.components.c5_state.eskf_baseline` | `EskfStateEstimator` |
Config schema additions:
- `config.state.strategy` (enum; required)
- `config.state.keyframe_window_size` (int, default 15) — D-C5-3 K=10–20
- `config.state.spoof_promotion_min_stable_s` (float, default 10.0) — AC-NEW-2
- `config.state.spoof_promotion_visual_consistency_tol_m` (float, default 30.0) — AC-NEW-8
- `config.state.no_estimate_fallback_s` (float, default 3.0) — AC-5.2
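The `no_estimate_fallback_s` setting drives the AC-5.2 timeout from Invariant 9. A minimal sketch of such a detector follows, assuming monotonic nanosecond timestamps and assuming the timer only starts once a first estimate has been seen; `NoEstimateWatchdog` and its methods are hypothetical names.

```python
from typing import Optional


class NoEstimateWatchdog:
    """Hypothetical detector for the AC-5.2 no-estimate timeout."""

    def __init__(self, fallback_s: float = 3.0) -> None:
        self._timeout_ns = int(fallback_s * 1_000_000_000)
        self._last_estimate_ns: Optional[int] = None

    def on_estimate(self, now_ns: int) -> None:
        """Record the time of the most recent successful estimate."""
        self._last_estimate_ns = now_ns

    def fallback_required(self, now_ns: int) -> bool:
        """Return True once the gap since the last estimate reaches the timeout."""
        if self._last_estimate_ns is None:
            return False  # assumption: the timer starts at the first estimate
        return now_ns - self._last_estimate_ns >= self._timeout_ns


watchdog = NoEstimateWatchdog(fallback_s=3.0)
watchdog.on_estimate(0)
before = watchdog.fallback_required(2_999_999_999)
after = watchdog.fallback_required(3_000_000_000)
```

Passing the clock value in, rather than reading it inside the class, keeps the detector deterministic under test.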
## Test expectations summarised by Invariant
| Invariant | Test | Assertion |
|---|---|---|
| 1 | Thread-binding | second binding from a different thread → `RuntimeError` |
| 2 | Timestamp ordering | out-of-order `add_*` → `EstimatorDegradedError` |
| 3 | `add_pose_anchor` mode dispatch | JACOBIAN: no iSAM2 factor add; MARGINALS: factor + update |
| 4 | `current_estimate` shape | always returns fresh `EstimatorOutput` on steady state |
| 5 | Spoof gate | label reflects gate state |
| 6 | Smoothed history bounded | `len(smoothed_history(100))` ≤ K |
| 7 | Smoothed flag | every `smoothed_history` entry has `smoothed=True`; `current_estimate` has `smoothed=False` |
| 8 | Spoof-rejection logging | FDR + GCS STATUSTEXT both fire on every gate decision |
| 9 | AC-5.2 timeout | 3 s no estimate → fallback signal emitted |
| 10 | SPD covariance | every emitted `covariance_6x6` is SPD |
## Producer-task / consumer-task split
1. **Protocol + composition + ISam2GraphHandle concrete** (Producer; 3 pts): Protocol, DTOs, error hierarchy, factory, config schema, Concrete `ISam2GraphHandle` impl extending the AZ-355 stub.
2. **iSAM2 + IncrementalFixedLagSmoother K wiring** (5 pts): GTSAM graph construction, K=10–20 window, key management.
3. **Factor adds (VIO + Pose + IMU)** (5 pts): `BetweenFactorPose3`, `GenericProjectionFactorCal3DS2`, `CombinedImuFactor` per the input DTO.
4. **Marginals + outputs** (3 pts): `current_estimate` / `smoothed_history` / `health_snapshot` body using `Marginals`.
5. **Source-label + spoof-promotion gate** (5 pts): `SourceLabelStateMachine` + AC-NEW-2 / AC-NEW-8 logic.
6. **ESKF baseline** (5 pts): `EskfStateEstimator` mandatory simple-baseline (IT-12 engine rule).
7. **Smoothed history → FDR** (3 pts): writer path + AC-4.5 invariant (NOT to FC).
8. **AC-5.2 fallback** (3 pts): 3 s no-estimate detector + signal emission.
9. **Orthorectifier → C6 mid-flight tile** (3 pts): orthorectifier sub-path.
Total: 35 pts (within the XL 34–55 band).
@@ -0,0 +1,122 @@
# Contract: DescriptorIndex Protocol
**Component**: c6_tile_cache
**Producer task**: AZ-303 — `_docs/02_tasks/todo/AZ-303_c6_storage_interfaces.md`
**Consumer tasks**:
- AZ-TBD-c6-faiss-descriptor-index (implements: FAISS HNSW)
- TBD at decompose time: E-C2 (AZ-255 — sole runtime consumer; per-frame top-K=10 retrieval), E-C10 (AZ-252 — F1 pre-flight index build via `rebuild_from_descriptors`)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Defines the typed boundary to the per-flight descriptor retrieval index. C2 VPR queries the index per frame at 3 Hz (top-K=10) to nominate candidate tiles for C2.5 ReRanker. The concrete impl is FAISS HNSW (`FaissDescriptorIndex`), but consumers depend only on this Protocol so a future swap (e.g., ScaNN, custom index) does not ripple. C10 CacheProvisioner (`AZ-252`) is the F1 pre-flight write-side caller — it builds the `.index` file once per provisioning; in flight the index is read-only mmap.
## Shape
### Protocol surface
`typing.Protocol` (PEP 544) decorated with `@runtime_checkable`. All methods are sync; the index is held in memory-mapped form.
| Method | Signature | Throws / Errors | Blocking? |
|--------|-----------|-----------------|-----------|
| `search_topk` | `(query: np.ndarray, k: int) -> list[tuple[TileId, float]]` | `IndexUnavailableError` | sync (HNSW; ≤ 5 ms p95 warm; first call ≤ 1 s cold for mmap page-in) |
| `descriptor_dim` | `() -> int` | — | sync; constant-time |
| `mmap_handle` | `() -> Path` | `IndexUnavailableError` | sync; returns the `.index` file path (consumers needing custom mmap-aware tooling — e.g., operator post-flight inspection — call this) |
| `rebuild_from_descriptors` | `(descriptors: np.ndarray, tile_ids: list[TileId], hnsw_params: HnswParams) -> None` | `IndexBuildError`, `TileFsError` | sync (offline; minutes for a full-area corpus). Atomic file replacement via the AZ-280 sidecar pattern. |
| `index_metadata` | `() -> IndexMetadata` | `IndexUnavailableError` | sync; reads the sidecar metadata block |
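Read as Python, the table above might translate to the Protocol sketched below. This is a reading of the table, not the producer task's code: `TileId` is aliased to a bare `tuple` as an assumption, `HnswParams` and `IndexMetadata` are left as forward references, and `FullIndex` / `PartialIndex` are hypothetical dummies that mirror the two protocol-conformance test cases.

```python
from pathlib import Path
from typing import Protocol, runtime_checkable

import numpy as np

TileId = tuple  # assumption: composite (zoomLevel, lat, lon), as in the C3 DTOs


@runtime_checkable
class DescriptorIndex(Protocol):
    def search_topk(self, query: np.ndarray, k: int) -> list[tuple[TileId, float]]: ...
    def descriptor_dim(self) -> int: ...
    def mmap_handle(self) -> Path: ...
    def rebuild_from_descriptors(
        self, descriptors: np.ndarray, tile_ids: list[TileId], hnsw_params: "HnswParams"
    ) -> None: ...
    def index_metadata(self) -> "IndexMetadata": ...


class FullIndex:
    """Dummy with all five methods present."""
    def search_topk(self, query, k): return []
    def descriptor_dim(self): return 512
    def mmap_handle(self): return Path("area.index")
    def rebuild_from_descriptors(self, descriptors, tile_ids, hnsw_params): return None
    def index_metadata(self): return None


class PartialIndex:
    """Dummy that lacks `index_metadata`."""
    def search_topk(self, query, k): return []
    def descriptor_dim(self): return 512
    def mmap_handle(self): return Path("area.index")
    def rebuild_from_descriptors(self, descriptors, tile_ids, hnsw_params): return None


full_ok = isinstance(FullIndex(), DescriptorIndex)
partial_ok = isinstance(PartialIndex(), DescriptorIndex)
```

Note that `isinstance` against a runtime-checkable Protocol only verifies that the members exist; it does not compare signatures or return types, so a static type checker is still needed for those.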
### DTOs
```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class HnswParams:
    """HNSW build hyperparameters. See description.md § 5; defaults from the
    FAISS team's HNSW32+M=32 / efConstruction=200 / efSearch=64 baseline."""
    m: int = 32  # number of connections per node
    ef_construction: int = 200  # build-time candidate list size
    ef_search: int = 64  # query-time candidate list size
    metric: str = "L2"  # "L2" | "INNER_PRODUCT"


@dataclass(frozen=True)
class IndexMetadata:
    descriptor_dim: int  # dimension of the indexed vectors
    n_vectors: int  # number of indexed tiles
    backbone_label: str  # producer backbone, e.g. "ultra_vpr_v0"
    backbone_sha256_hex: str  # producer backbone weights hash (D-C10-3 chain)
    built_at: datetime  # ISO 8601 UTC
    hnsw_params: HnswParams
    sidecar_sha256_hex: str  # canonical content hash of the .index file
    file_path: Path  # absolute path to the .index file
```
### Numpy contract
- `query`: shape `(descriptor_dim,)`, dtype `float32`, C-contiguous. The Protocol does NOT auto-pad batches; per-frame is a single query (C2's per-frame call site).
- `descriptors`: shape `(N, descriptor_dim)`, dtype `float32`, C-contiguous; `N == len(tile_ids)`. The Protocol does NOT validate shape mismatch — the impl raises `IndexBuildError` on dtype/shape violation.
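A caller-side pre-check for the single-query contract could look like the sketch below. It is an assumption-laden illustration: `validate_query` is a hypothetical helper and raises `ValueError`, whereas invariant I-3 says the impl itself reports a dimension mismatch as `IndexUnavailableError`.

```python
import numpy as np


def validate_query(query: np.ndarray, descriptor_dim: int) -> None:
    """Raise ValueError unless `query` satisfies the single-query contract."""
    if query.shape != (descriptor_dim,):
        raise ValueError(f"expected shape ({descriptor_dim},), got {query.shape}")
    if query.dtype != np.float32:
        raise ValueError(f"expected float32, got {query.dtype}")
    if not query.flags["C_CONTIGUOUS"]:
        raise ValueError("query must be C-contiguous")


def is_valid(query: np.ndarray, descriptor_dim: int) -> bool:
    """Boolean wrapper around `validate_query` for quick checks."""
    try:
        validate_query(query, descriptor_dim)
    except ValueError:
        return False
    return True


ok = is_valid(np.zeros(4, dtype=np.float32), 4)
wrong_dtype = is_valid(np.zeros(4, dtype=np.float64), 4)
strided = is_valid(np.zeros(8, dtype=np.float32)[::2], 4)  # right shape, not contiguous
```

A strided view such as `arr[::2]` has the right shape and dtype but fails the contiguity check; `np.ascontiguousarray` would produce a compliant copy.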
### Errors
```
TileCacheError (shared with TileStore / TileMetadataStore)
└── IndexUnavailableError # mmap handle invalid, file missing, or sidecar mismatched
IndexBuildError # raised only by rebuild_from_descriptors; NOT in the read-side envelope
```
`IndexBuildError` is intentionally NOT a subclass of `TileCacheError` — the build path is offline, lives in C10's pre-flight provisioning, and has different fault semantics than the runtime-read path. C2 (the only runtime consumer) catches `IndexUnavailableError`; C10 catches `IndexBuildError`.
## Invariants
- **I-1 (immutable in flight):** once an `.index` file is opened via the impl's loader, the file's content MUST NOT change for the lifetime of the impl instance. F1 pre-flight is the only legal write path; a mid-flight rebuild is forbidden (the impl raises `IndexUnavailableError` if it detects a content-hash mismatch on a periodic sidecar re-check — out-of-band tampering signal).
- **I-2 (top-K is best-effort):** `search_topk(query, k=K)` MAY return fewer than K results when the corpus has fewer than K vectors. Consumers (C2) tolerate fewer-than-K results.
- **I-3 (descriptor-dim is fixed at build):** `descriptor_dim()` returns the value baked into the `.index` file at build time; if a consumer's query vector dimension does not match, the impl raises `IndexUnavailableError` (NOT a separate `DimensionMismatchError` — keeps the read-side envelope to a single error type).
- **I-4 (no GPU resident memory):** the impl MUST hold the index in CPU mmap'd memory only. FAISS GPU index variants are explicitly excluded — the F3 hot path's GPU is reserved for `c7_inference` engines (per NFT-LIM-01 / D-CROSS-LATENCY-1).
- **I-5 (atomic rebuild):** `rebuild_from_descriptors` MUST write to a temporary path, sync to disk, atomically rename to the target path, write the sidecar `.sha256`, and only then return. A crash mid-rebuild leaves the prior index intact.
- **I-6 (sidecar coherence):** `mmap_handle()` returns a path whose `.sha256` sidecar matches the file's actual content hash; if the sidecar is missing or mismatched, `IndexUnavailableError` is raised on the FIRST `search_topk` of the flight (not lazily on the read that hits the corrupted region). C10's pre-flight gate is the canonical place this is validated; this Protocol carries the runtime-side check as defence-in-depth.
- **I-7 (frozen DTOs):** `HnswParams`, `IndexMetadata` are `@dataclass(frozen=True)`.
- **I-8 (single-thread search):** `search_topk` is NOT re-entrant; the F3 hot path is single-threaded per the description.md assumption. Future multi-threaded callers MUST use a per-thread impl instance (out of scope this cycle).
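The ordering that I-5 requires (temporary file, sync, atomic rename, sidecar) can be sketched with the standard library alone. This is not the AZ-280 sidecar helper, whose API is not shown here: `atomic_write_with_sidecar` is a hypothetical function, the payload is opaque bytes standing in for a serialized index, and the sidecar is assumed to be named `<file>.sha256`.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def atomic_write_with_sidecar(target: Path, payload: bytes) -> str:
    """Write `payload` to `target` atomically, then write `<target>.sha256`."""
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # data must be on disk before the rename
        os.replace(tmp_name, target)  # atomic when both paths share a filesystem
    except BaseException:
        # A failure before the rename leaves the prior index untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    digest = hashlib.sha256(payload).hexdigest()
    sidecar = target.with_name(target.name + ".sha256")
    sidecar.write_text(digest + "\n")
    return digest


with tempfile.TemporaryDirectory() as workdir:
    index_path = Path(workdir) / "area.index"
    index_path.write_bytes(b"old index bytes")
    digest = atomic_write_with_sidecar(index_path, b"new index bytes")
    final_bytes = index_path.read_bytes()
    sidecar_text = (Path(workdir) / "area.index.sha256").read_text().strip()
    leftover = sorted(p.name for p in Path(workdir).iterdir())
```

Creating the temporary file in the target's own directory keeps the rename on one filesystem, which is what makes `os.replace` atomic. The sketch writes the sidecar non-atomically; a reader that sees a new index next to a stale sidecar would fail the I-6 check, which is the safe direction.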
## Non-Goals
- **Not covered: tile pixel I/O.** That's `TileStore`.
- **Not covered: tile metadata bbox queries.** That's `TileMetadataStore`.
- **Not covered: incremental updates / online learning.** F1 pre-flight is full-rebuild only. Future task if needed.
- **Not covered: GPU FAISS variants.** I-4 forbids them this cycle.
- **Not covered: cross-flight index sharing.** Each flight provisions its own per-area `.index`; cross-flight is a parent-suite concern (D-PROJ-2).
- **Not covered: descriptor compression / PQ quantisation.** HNSW32 raw float32 is the only supported variant this cycle. Future task if AC-8.3 (10 GB cap) becomes binding.
- **Not covered: backbone retraining.** This Protocol is consumer-facing; the producer side (C10's compile of an UltraVPR engine) lives in E-C7 / E-C10.
## Versioning Rules
Same rules as `tile_store.md` § Versioning Rules. Note that `IndexMetadata.backbone_sha256_hex` ties this contract's lifecycle to the C7 engine cache (AZ-298 / AZ-301 / AZ-281): a backbone-weights bump invalidates every prior `.index` and requires a coordinated update. Such a bump is recorded as a major version of THIS contract only when the field's shape changes; backbone-weight refreshes within the existing schema are non-breaking content updates handled by C10.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| protocol-conformance-full | A class implementing all 5 methods | `isinstance(impl, DescriptorIndex) == True` | Producer AC-1 |
| protocol-conformance-partial | A class missing `index_metadata` | `isinstance == False` | CI drift gate |
| search-topk-warm | Query vector of correct dim against a 10k-vector index, OS page cache warm | Returns `[(tile_id, distance), ...]` length ≤ k; p95 ≤ 5 ms | I-2 / Consumer C2-PT-01 |
| search-topk-fewer-than-k | k=20 against a 10-vector index | Returns 10 results, ordered by distance ascending | I-2 |
| search-topk-dim-mismatch | Query vector of wrong dim | `IndexUnavailableError` | I-3 |
| search-topk-corrupted-sidecar | Index file present, sidecar missing | First `search_topk` raises `IndexUnavailableError`; subsequent calls also raise (no silent recovery) | I-6 |
| descriptor-dim | After a rebuild with `descriptors.shape == (N, 512)` | `descriptor_dim() == 512` | I-3 |
| rebuild-atomic-on-crash | Simulated `os._exit` mid-rebuild | The original `.index` file is intact and still loadable; partial temp file is cleaned up at next start | I-5 |
| rebuild-sidecar-content-hash | Successful rebuild | `.sha256` sidecar matches `sha256(.index)` | I-6 / AZ-280 contract |
| index-metadata | After rebuild | Returns `IndexMetadata` with matching `descriptor_dim`, `n_vectors`, `built_at` (within 1 s of call), `hnsw_params` (mirrors input), `sidecar_sha256_hex` (matches sidecar content) | I-7 |
| frozen-dto-mutation | `HnswParams(m=32, ...).m = 64` | `FrozenInstanceError` | I-7 |
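
The sidecar rows above (`search-topk-corrupted-sidecar`, `rebuild-sidecar-content-hash`) rely on a coherence check along these lines. This is a hedged sketch: `verify_sidecar` is a hypothetical helper, `IndexUnavailableError` is a local stand-in for the contract's error type, and the sidecar is assumed to hold the hex digest as its first whitespace-separated token.

```python
import hashlib
from pathlib import Path


class IndexUnavailableError(Exception):
    """Local stand-in for the contract's error type."""


def verify_sidecar(index_path: Path) -> str:
    """Return the index file's sha256 hex digest, or raise if the sidecar disagrees."""
    index_path = Path(index_path)
    sidecar = index_path.with_name(index_path.name + ".sha256")
    if not index_path.is_file():
        raise IndexUnavailableError(f"index missing: {index_path}")
    if not sidecar.is_file():
        raise IndexUnavailableError(f"sidecar missing: {sidecar}")

    tokens = sidecar.read_text(encoding="ascii").split()
    if not tokens:
        raise IndexUnavailableError(f"sidecar empty: {sidecar}")
    expected = tokens[0].lower()

    # Hash in 1 MiB chunks so a large index is never read into memory at once.
    hasher = hashlib.sha256()
    with index_path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    actual = hasher.hexdigest()

    if actual != expected:
        raise IndexUnavailableError(f"sidecar mismatch for {index_path}")
    return actual
```

Running this once before the first `search_topk` is one way to get the eager failure that I-6 asks for, rather than discovering corruption lazily.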
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — 5-method Protocol + HNSW params DTO + IndexMetadata sidecar shape + immutable-in-flight + atomic-rebuild invariants. | autodev (decompose Step 2 of AZ-250 / E-C6) |
@@ -0,0 +1,132 @@
# Contract: TileMetadataStore Protocol
**Component**: c6_tile_cache
**Producer task**: AZ-303 — `_docs/02_tasks/todo/AZ-303_c6_storage_interfaces.md`
**Consumer tasks**:
- AZ-TBD-c6-postgres-filesystem-store (implements)
- AZ-TBD-c6-freshness-gate (insert hook + sector classification reader)
- AZ-TBD-c6-cache-budget-eviction (LRU candidate enumeration + delete coordination)
- TBD at decompose time: E-C10 (AZ-252 — manifest + provisioning), E-C11 (AZ-251 — both `TileDownloader` insert and `TileUploader` reader queries), E-C12 (AZ-253 — operator pre-flight tooling)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Defines the typed boundary to the Postgres-backed spatial index over `TileMetadata`. Concrete impls (today only `PostgresFilesystemStore` — same class also implements `TileStore`) own row insert / bbox query / voting-state transitions. Pre-flight cache builders (C10 / C11 / C12), the F4 mid-flight orthorectifier path (via C5 → C6), and post-landing tooling (C11 `TileUploader`) all consume this surface.
## Shape
### Protocol surface
`typing.Protocol` (PEP 544) decorated with `@runtime_checkable`. All methods are sync; the Postgres connection pool is owned inside the impl.
| Method | Signature | Throws / Errors | Blocking? |
|--------|-----------|-----------------|-----------|
| `query_by_bbox` | `(bbox: Bbox, zoom: int, *, voting_filter: Optional[VotingStatus] = None, source_filter: Optional[TileSource] = None) -> list[TileMetadata]` | `TileMetadataError` | sync (btree index; ≤ 50 ms typical) |
| `insert_metadata` | `(metadata: TileMetadata) -> None` | `TileMetadataError`, `FreshnessRejectionError` | sync (single-row insert) |
| `update_voting_status` | `(tile_id: TileId, status: VotingStatus) -> None` | `TileMetadataError`, `TileNotFoundError` | sync |
| `mark_uploaded` | `(tile_id: TileId, uploaded_at: datetime) -> None` | `TileMetadataError`, `TileNotFoundError` | sync |
| `pending_uploads` | `() -> list[TileMetadata]` | `TileMetadataError` | sync (filtered query: `source = ONBOARD_INGEST AND uploaded_at IS NULL`) |
| `record_lru_access` | `(tile_id: TileId, accessed_at: datetime) -> None` | `TileMetadataError` | sync (timestamp update only — no row-level read) |
| `lru_candidates` | `(*, max_count: int) -> list[TileMetadata]` | `TileMetadataError` | sync (oldest-`accessed_at`-first; bounded result set) |
| `total_disk_bytes` | `() -> int` | `TileMetadataError` | sync (sum of `disk_bytes` column; ≤ 100 ms even at 100k rows) |
| `get_by_id` | `(tile_id: TileId) -> Optional[TileMetadata]` | `TileMetadataError` | sync; returns `None` if absent (NOT `TileNotFoundError`) |
### DTOs
Reuses `TileId`, `TileMetadata`, `TileQualityMetadata`, `TileSource`, `FreshnessLabel`, `VotingStatus` from `tile_store.md`. The same DTOs are shared across both Protocols by design (single source of truth in `c6_tile_cache._types`).
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bbox:
    """Axis-aligned WGS84 bounding box. Inclusive on min, exclusive on max."""

    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
```
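
The "inclusive on min, exclusive on max" rule means adjacent boxes never both claim a point on their shared edge. A minimal containment sketch follows; `bbox_contains` is a hypothetical helper, and it assumes boxes do not wrap across the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bbox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


def bbox_contains(bbox: Bbox, lat: float, lon: float) -> bool:
    """Half-open containment test: min edges are inside, max edges are outside."""
    return (bbox.min_lat <= lat < bbox.max_lat
            and bbox.min_lon <= lon < bbox.max_lon)


west = Bbox(0.0, 0.0, 1.0, 1.0)
east = Bbox(0.0, 1.0, 1.0, 2.0)
# A point on the shared edge (lon == 1.0) belongs to exactly one box.
on_edge = (bbox_contains(west, 0.5, 1.0), bbox_contains(east, 0.5, 1.0))  # (False, True)
```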
In addition, `TileMetadata` is extended with three columns owned by the metadata store (NOT meaningful to `TileStore`; see Invariants):
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TileMetadataPersistent:
    metadata: TileMetadata           # the read-only DTO from tile_store.md
    accessed_at: datetime            # LRU clock: last read time
    uploaded_at: Optional[datetime]  # set when C11 TileUploader has confirmed upload
    disk_bytes: int                  # JPEG body size on disk; tracked for cache-budget enforcement
```
The Protocol returns `TileMetadata` from queries. `TileMetadataPersistent` is the in-process view of LRU and disk-budget state, accessible only via `lru_candidates` / `record_lru_access` / `total_disk_bytes`.
### Sector classification (read-only input to the freshness gate)
```python
from dataclasses import dataclass
from enum import Enum


class SectorClassification(str, Enum):
    ACTIVE_CONFLICT = "active_conflict"
    STABLE_REAR = "stable_rear"


@dataclass(frozen=True)
class SectorBoundary:
    bbox: Bbox
    classification: SectorClassification
```
`SectorClassification` is set pre-flight by the operator via C12; the metadata store reads `SectorBoundary` rows from a sibling table (`sector_boundaries`) at insert-time to decide which freshness rule to apply. The Protocol does NOT expose insert-side methods for `SectorBoundary` rows — that surface lives in C12.
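
The insert-time decision described here, together with the age rules stated under Invariants (I-2 / I-3), can be sketched as a pure function. Several things are assumptions made for illustration: `GateDecision` and `gate_decision` are hypothetical names, "6 months" and "12 months" are approximated as 183 and 365 days, the first matching sector wins when boundaries overlap, and a point outside every sector is accepted unchanged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional, Sequence


class SectorClassification(str, Enum):
    ACTIVE_CONFLICT = "active_conflict"
    STABLE_REAR = "stable_rear"


class GateDecision(str, Enum):  # hypothetical, not part of the contract
    ACCEPT = "accept"
    DOWNGRADE = "downgrade"
    REJECT = "reject"


@dataclass(frozen=True)
class Bbox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


@dataclass(frozen=True)
class SectorBoundary:
    bbox: Bbox
    classification: SectorClassification


def classify(lat: float, lon: float,
             sectors: Sequence[SectorBoundary]) -> Optional[SectorClassification]:
    """Return the classification of the first sector containing the point, if any."""
    for sector in sectors:
        b = sector.bbox
        if b.min_lat <= lat < b.max_lat and b.min_lon <= lon < b.max_lon:
            return sector.classification
    return None


def gate_decision(lat: float, lon: float, capture_timestamp: datetime, now: datetime,
                  sectors: Sequence[SectorBoundary],
                  active_conflict_max_age: timedelta = timedelta(days=183),
                  stable_rear_max_age: timedelta = timedelta(days=365)) -> GateDecision:
    """Reject stale tiles in active-conflict sectors; downgrade stale rear tiles."""
    sector = classify(lat, lon, sectors)
    age = now - capture_timestamp
    if sector is SectorClassification.ACTIVE_CONFLICT and age > active_conflict_max_age:
        return GateDecision.REJECT
    if sector is SectorClassification.STABLE_REAR and age > stable_rear_max_age:
        return GateDecision.DOWNGRADE
    return GateDecision.ACCEPT


now = datetime(2026, 5, 10, tzinfo=timezone.utc)
sectors = [SectorBoundary(Bbox(49.0, 36.0, 50.0, 37.0), SectorClassification.ACTIVE_CONFLICT)]
decision = gate_decision(49.5, 36.5, now - timedelta(days=213), now, sectors)
```

In a real store the sector lookup and the row insert would share one transaction; the sketch only covers the decision itself.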
## Invariants
- **I-1 (composite key uniqueness):** `(zoom_level, lat, lon, source)` is the unique key in the `tiles` table. Re-inserting the same key with a different `content_sha256_hex` raises `TileMetadataError`; there is no silent overwrite.
- **I-2 (freshness gate at insert):** `insert_metadata` rejects (raises `FreshnessRejectionError`) iff the tile's `(lat, lon)` falls inside an `ACTIVE_CONFLICT` sector AND `capture_timestamp < now() - active_conflict_max_age`. The freshness rules table is configured per-flight (default 6 months for active_conflict; 12 months for stable_rear which downgrades rather than rejects).
- **I-3 (downgrade marking):** when a tile in a `STABLE_REAR` sector is older than `stable_rear_max_age`, the row is inserted with `freshness_label=DOWNGRADED` (NOT rejected). `query_by_bbox` returns the downgrade flag intact so consumers (C2 / C3 spoof-rejection) can act on it.
- **I-4 (LRU clock):** `record_lru_access` updates `accessed_at = max(current accessed_at, supplied timestamp)`; clock skew never sets `accessed_at` backward. `lru_candidates` returns oldest-first.
- **I-5 (disk-budget invariant):** `total_disk_bytes` MUST equal `SUM(disk_bytes)` over all rows where `voting_status != REJECTED`. Rejected rows are tombstones: the on-disk file is deleted, but the row is retained for the manifest's content-hash check (D-C10-3).
- **I-6 (frozen DTOs):** `Bbox`, `SectorBoundary`, `TileMetadataPersistent` are `@dataclass(frozen=True)`.
- **I-7 (transactional writes):** `insert_metadata` is a single transaction over the `tiles` table; the freshness check + the row insert MUST be atomic (a parallel sector-boundary update MUST NOT race the gate).
- **I-8 (no silent voting-status downgrade):** `update_voting_status` accepts only forward transitions: `PENDING → TRUSTED`, `PENDING → REJECTED`, and `TRUSTED → REJECTED` (the last covers the cache-poisoning recall path). Any backward transition raises `TileMetadataError`.
- **I-9 (`pending_uploads` is the single source for C11 TileUploader):** the uploader MUST NOT scan the filesystem for pending tiles; it MUST drive its loop off `pending_uploads()`. The metadata store is the bookkeeping.
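
I-4 and I-8 are both small pure rules, sketched below. The helper names are hypothetical and `TileMetadataError` is a local stand-in. The contract does not say how a same-state "transition" (for example `TRUSTED → TRUSTED`) is handled; this sketch treats it as an error, which is an assumption.

```python
from datetime import datetime, timezone
from enum import Enum


class VotingStatus(str, Enum):
    PENDING = "pending"
    TRUSTED = "trusted"
    REJECTED = "rejected"


class TileMetadataError(Exception):
    """Local stand-in for the contract's error type."""


# The three forward transitions named in I-8.
_ALLOWED_TRANSITIONS = frozenset({
    (VotingStatus.PENDING, VotingStatus.TRUSTED),
    (VotingStatus.PENDING, VotingStatus.REJECTED),
    (VotingStatus.TRUSTED, VotingStatus.REJECTED),
})


def check_voting_transition(current: VotingStatus, new: VotingStatus) -> None:
    """Raise TileMetadataError unless (current -> new) is a listed forward transition."""
    if (current, new) not in _ALLOWED_TRANSITIONS:
        raise TileMetadataError(f"illegal transition {current.value} -> {new.value}")


def next_accessed_at(current: datetime, supplied: datetime) -> datetime:
    """I-4: the LRU clock never moves backward, whatever timestamp is supplied."""
    return max(current, supplied)


ts0 = datetime(2026, 5, 10, 12, 0, tzinfo=timezone.utc)
ts1 = datetime(2026, 5, 10, 12, 5, tzinfo=timezone.utc)
kept = next_accessed_at(ts1, ts0)  # stays at ts1
```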
## Non-Goals
- **Not covered: filesystem JPEG I/O.** That's `TileStore`.
- **Not covered: descriptor index queries.** That's `DescriptorIndex`.
- **Not covered: sector boundary insert / update.** Owned by C12 operator-tooling against a sibling table; this Protocol is read-only on `SectorBoundary` and does NOT expose CRUD.
- **Not covered: cross-flight aggregation / voting threshold computation.** That's `satellite-provider`'s D-PROJ-2 trust layer (parent suite); C6 just stamps the per-row `voting_status`.
- **Not covered: full-text search / arbitrary-WHERE queries.** Only the methods above; ad-hoc queries go through DBA tooling, not this Protocol.
- **Not covered: schema migrations.** Migration scripts live in `c6_tile_cache/_alembic/`; the Protocol is shape-only.
## Versioning Rules
Same rules as `tile_store.md` § Versioning Rules.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| protocol-conformance-full | A class implementing all 9 methods | `isinstance(impl, TileMetadataStore) == True` | Producer AC-1 |
| query-by-bbox-basic | bbox covering 100 inserted tiles at zoom=18 | Returns exactly the 100 tiles; `voting_filter=None` returns all statuses | Smoke |
| query-by-bbox-voting-filter | Same with `voting_filter=TRUSTED` | Returns only TRUSTED tiles in bbox | Used by C10 manifest builder |
| insert-duplicate-key | Insert (z=18, lat, lon, src=GOOGLEMAPS) twice with different content_sha256 | First succeeds; second raises `TileMetadataError` | I-1 |
| insert-active-conflict-stale | Insert into ACTIVE_CONFLICT sector, capture_timestamp = now - 7 months | `FreshnessRejectionError`; row not committed | I-2 / C6-IT-02 |
| insert-stable-rear-stale | Insert into STABLE_REAR sector, capture_timestamp = now - 13 months | Row inserted with `freshness_label=DOWNGRADED` | I-3 |
| update-voting-status-forward | PENDING → TRUSTED | Succeeds | I-8 |
| update-voting-status-backward | TRUSTED → PENDING | `TileMetadataError` | I-8 |
| update-voting-status-trusted-to-rejected | TRUSTED → REJECTED | Succeeds (recall path) | I-8 |
| pending-uploads-empty | No ONBOARD_INGEST tiles | Returns `[]` | I-9 |
| pending-uploads-after-mark | Insert + `mark_uploaded` for half | Returns the unmarked half | I-9 |
| record-lru-access-monotonic | `record_lru_access(t, ts1)` then `record_lru_access(t, ts0 < ts1)` | `accessed_at` stays at `ts1` | I-4 |
| lru-candidates-order | Mixed `accessed_at` for 100 rows; `lru_candidates(max_count=10)` | Returns the 10 oldest in ascending `accessed_at` order | I-4 |
| total-disk-bytes-sum | Insert 5 tiles with known disk_bytes, mark 1 REJECTED | `total_disk_bytes()` excludes the rejected row | I-5 |
| get-by-id-missing | Random tile_id never inserted | Returns `None` (not `TileNotFoundError`) | Documented null-return semantic |
| frozen-dto-mutation | `Bbox(0, 0, 1, 1).min_lat = 5.0` | `FrozenInstanceError` | I-6 |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — 9-method Protocol + LRU/disk-budget extensions + freshness gate semantics + composite-key uniqueness invariant. | autodev (decompose Step 2 of AZ-250 / E-C6) |
@@ -0,0 +1,166 @@
# Contract: TileStore Protocol
**Component**: c6_tile_cache
**Producer task**: AZ-303 — `_docs/02_tasks/todo/AZ-303_c6_storage_interfaces.md`
**Consumer tasks**:
- AZ-TBD-c6-postgres-filesystem-store (implements)
- AZ-TBD-c6-freshness-gate (insert hook collaborator)
- AZ-TBD-c6-cache-budget-eviction (uses `tile_exists` + `delete_tile`)
- TBD at decompose time: E-C2.5 (AZ-256), E-C3 (AZ-257), E-C11 (AZ-251 — both `TileDownloader` and `TileUploader`)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Defines the typed boundary between filesystem-resident tile pixel I/O and every component that produces or consumes JPEG tile bytes. Concrete impls (today only `PostgresFilesystemStore`) write JPEGs to a layout byte-identical to `satellite-provider`'s on-disk format so the C11 `TileUploader` post-landing upload (F10) is a straight copy.
## Shape
### Protocol surface
`typing.Protocol` (PEP 544 structural typing) decorated with `@runtime_checkable`.
| Method | Signature | Throws / Errors | Blocking? |
|--------|-----------|-----------------|-----------|
| `read_tile_pixels` | `(tile_id: TileId) -> TilePixelHandle` | `TileNotFoundError`, `TileFsError`, `TileMetadataError` | sync (mmap, ≤ 0.5 ms warm; ≤ 50 ms cold) |
| `write_tile` | `(tile_blob: bytes, metadata: TileMetadata) -> None` | `TileFsError`, `TileMetadataError`, `ContentHashMismatchError`, `FreshnessRejectionError` | sync (atomic fs write + sidecar) |
| `tile_exists` | `(tile_id: TileId) -> bool` | — | sync (page-cache lookup; ≤ 1 ms) |
| `delete_tile` | `(tile_id: TileId) -> bool` | `TileFsError` | sync (returns `True` if a file was removed; `False` if missing — no-error path for the cache-eviction caller) |
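
One caveat when testing conformance against this surface: an `isinstance` check on a `@runtime_checkable` Protocol only verifies that the named members exist, not that their signatures match. The sketch below uses a simplified, untyped stand-in (`TileStoreLike` is a hypothetical name) to show the behaviour.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TileStoreLike(Protocol):
    """Untyped stand-in carrying the four method names from the table above."""

    def read_tile_pixels(self, tile_id): ...
    def write_tile(self, tile_blob, metadata): ...
    def tile_exists(self, tile_id): ...
    def delete_tile(self, tile_id): ...


class Full:
    def read_tile_pixels(self, tile_id): ...
    def write_tile(self, tile_blob, metadata): ...
    def tile_exists(self, tile_id): ...
    def delete_tile(self, tile_id): ...


class MissingDelete:
    def read_tile_pixels(self, tile_id): ...
    def write_tile(self, tile_blob, metadata): ...
    def tile_exists(self, tile_id): ...


class WrongSignature:
    # All four names are present, but the parameter lists are wrong.
    def read_tile_pixels(self): ...
    def write_tile(self): ...
    def tile_exists(self): ...
    def delete_tile(self): ...


full_ok = isinstance(Full(), TileStoreLike)              # True
partial_ok = isinstance(MissingDelete(), TileStoreLike)  # False
wrong_ok = isinstance(WrongSignature(), TileStoreLike)   # True: names only
```

A conformance test that also needs matching signatures would have to compare them separately, for instance with `inspect.signature`, or lean on a static type checker.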
### DTOs
```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class TileId:
    zoom_level: int  # 0..21; `satellite-provider` legal range
    lat: float       # WGS84 centre latitude
    lon: float       # WGS84 centre longitude


class TileSource(str, Enum):
    GOOGLEMAPS = "googlemaps"
    ONBOARD_INGEST = "onboard_ingest"


class FreshnessLabel(str, Enum):
    FRESH = "fresh"
    STALE_ACTIVE_CONFLICT = "stale_active_conflict"
    STALE_REAR = "stale_rear"
    DOWNGRADED = "downgraded"


class VotingStatus(str, Enum):
    PENDING = "pending"
    TRUSTED = "trusted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class TileQualityMetadata:
    estimator_label: str  # "satellite_anchored" | "visual_propagated" | "dead_reckoned"
    covariance_2x2: tuple[tuple[float, float], tuple[float, float]]
    last_anchor_age_ms: int
    mre_px: float
    imu_bias_norm: float


@dataclass(frozen=True)
class TileMetadata:
    tile_id: TileId
    tile_size_meters: float
    tile_size_pixels: int
    capture_timestamp: datetime                      # ISO 8601 UTC
    source: TileSource
    content_sha256_hex: str                          # canonical sha256 of the JPEG body
    freshness_label: FreshnessLabel
    flight_id: Optional[str]                         # uuid; set for ONBOARD_INGEST
    companion_id: Optional[str]                      # set for ONBOARD_INGEST
    quality_metadata: Optional[TileQualityMetadata]  # set for ONBOARD_INGEST
    voting_status: VotingStatus                      # default PENDING for ONBOARD_INGEST


class TilePixelHandle:
    """Opaque handle: filesystem path + mmap pointer. Consumer MUST NOT copy the bytes
    or close the underlying mapping; the handle's lifetime is bounded by the caller's
    use-site `with` block."""

    @property
    def filesystem_path(self) -> Path: ...

    def __enter__(self) -> memoryview: ...

    def __exit__(self, *exc) -> None: ...
```
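
One possible shape for a concrete `TilePixelHandle` is sketched below, assuming a plain `mmap` over the JPEG file. `MmapTilePixelHandle` is a hypothetical name; the real impl may differ. The sketch relies on `mmap.ACCESS_READ` exporting a read-only buffer, which is what makes the returned `memoryview` immutable.

```python
import mmap
from pathlib import Path


class MmapTilePixelHandle:
    """Read-only mmap view over a tile file, scoped to a `with` block."""

    def __init__(self, path: Path) -> None:
        self._path = Path(path)
        self._file = None
        self._map = None
        self._view = None

    @property
    def filesystem_path(self) -> Path:
        return self._path

    def __enter__(self) -> memoryview:
        self._file = self._path.open("rb")
        try:
            # Length 0 maps the whole file; an empty file cannot be mapped.
            self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        except BaseException:
            self._file.close()
            raise
        self._view = memoryview(self._map)
        return self._view

    def __exit__(self, *exc) -> None:
        # Release the view first: closing a map that still has exported
        # buffers raises BufferError.
        self._view.release()
        self._map.close()
        self._file.close()
```

A consumer would use it as `with handle as view: ...` and must not keep slices of `view` alive past the block, since a surviving slice would keep the mapping exported at close time.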
### Error types
All under `c6_tile_cache.errors`:
```
TileCacheError (Exception subclass)
├── TileNotFoundError # tile_id not present on disk
├── TileFsError # I/O error on read/write/rename
├── TileMetadataError # row missing despite file present, or vice-versa (consistency violation)
├── ContentHashMismatchError # supplied JPEG bytes don't match declared content_sha256
└── FreshnessRejectionError # rejected by the C6 freshness gate (raised on insert in active_conflict)
```
`IndexUnavailableError` lives under the same package but is exclusively raised by `DescriptorIndex` — it is not part of `TileStore`'s envelope.
### Filesystem layout
JPEG body lands at `<root>/tiles/{zoom_level}/{x}/{y}.jpg` where `(x, y)` is derived from `(lat, lon, zoom_level)` per the same Web-Mercator tile-coordinate function `satellite-provider` uses (see `satellite-provider/README.md`). A sidecar file `<root>/tiles/{zoom_level}/{x}/{y}.jpg.sha256` carries the canonical content hash (produced by `helpers.sha256_sidecar.atomic_write_with_sidecar` per AZ-280 contract).
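
For orientation, the widely used Web-Mercator "slippy map" tile formula is sketched below. Whether `satellite-provider` uses exactly this function (including rounding and clamping) is an assumption here; I-1 requires checking against its actual implementation. `latlon_to_tile_xy` and `tile_paths` are hypothetical helper names.

```python
import math
from pathlib import Path


def latlon_to_tile_xy(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Standard XYZ tile indices for a WGS84 coordinate at the given zoom."""
    n = 1 << zoom  # number of tiles along each axis
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon == 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_paths(root: Path, lat: float, lon: float, zoom: int) -> tuple[Path, Path]:
    """Return (jpeg_path, sidecar_path) following the layout described above."""
    x, y = latlon_to_tile_xy(lat, lon, zoom)
    jpeg = Path(root) / "tiles" / str(zoom) / str(x) / f"{y}.jpg"
    return jpeg, jpeg.with_name(jpeg.name + ".sha256")


origin_tile = latlon_to_tile_xy(0.0, 0.0, 1)  # (1, 1)
```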
## Invariants
- **I-1 (byte-identity with satellite-provider):** for any `(zoom_level, lat, lon)`, the filesystem path computed by C6 `write_tile` MUST equal the path that `satellite-provider` would compute for the same coordinate; any deviation breaks AC-8.4 / F10 upload.
- **I-2 (atomic write + sidecar invariant):** a successful `write_tile` returns only after BOTH the JPEG file AND its `.sha256` sidecar are durable on disk; partial states (file without sidecar or sidecar without file) MUST NOT be observable to readers.
- **I-3 (content-hash gate):** `write_tile` rejects (raises `ContentHashMismatchError`) if `sha256(tile_blob) != metadata.content_sha256_hex`; the cache-poisoning safety budget (D-C10-3 + AC-NEW-7) is bound to this check.
- **I-4 (read mmap is read-only):** `TilePixelHandle.__enter__()` returns a read-only `memoryview`; consumers MUST NOT mutate; a writer that mutates through the mmap is a `Reliability` finding (Critical) at code-review time.
- **I-5 (race-free reads under concurrent F4 writes):** C2 / C2.5 / C3 readers see either the pre-write tile bytes or the post-write tile bytes — never partial bytes. Enforced by `atomicwrites` rename semantics on the writer side.
- **I-6 (idempotent delete):** `delete_tile` returns `False` when the tile is missing; it does NOT raise. The cache-eviction caller relies on this no-error path because it deletes by LRU and may race with a concurrent eviction sweep.
- **I-7 (frozen DTOs):** `TileId`, `TileMetadata`, `TileQualityMetadata` are `@dataclass(frozen=True)`. Mutation raises `FrozenInstanceError`.
- **I-8 (fail-fast on consistency violation):** if a row exists in the metadata store but the JPEG file is missing (or vice-versa), `read_tile_pixels` raises `TileMetadataError`, NOT `TileNotFoundError`. Keeping the two errors distinct gives the operator a clear signal: `TileMetadataError` means the cache is in a degraded state and needs reprovisioning, while `TileNotFoundError` means the tile was simply never written.
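
The content-hash gate in I-3 is the cheapest of these invariants to illustrate. The sketch below uses a hypothetical `check_content_hash` helper and a local stand-in for the error type; lower-casing the declared digest before comparison is an assumption. Running the check before any filesystem write is what makes the "no file written; no sidecar written" expectation easy to satisfy.

```python
import hashlib


class ContentHashMismatchError(Exception):
    """Local stand-in for the contract's error type."""


def check_content_hash(tile_blob: bytes, declared_sha256_hex: str) -> None:
    """Raise ContentHashMismatchError unless the blob hashes to the declared digest."""
    actual = hashlib.sha256(tile_blob).hexdigest()
    if actual != declared_sha256_hex.lower():
        raise ContentHashMismatchError(
            f"declared {declared_sha256_hex!r} but blob hashes to {actual!r}"
        )


blob = b"\xff\xd8 fake jpeg body"
check_content_hash(blob, hashlib.sha256(blob).hexdigest())  # passes silently
```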
## Non-Goals
- **Not covered: tile descriptor index.** Descriptor mmap + HNSW search is `DescriptorIndex` — separate Protocol, separate contract.
- **Not covered: spatial bbox queries.** `query_by_bbox` is on `TileMetadataStore` — separate Protocol.
- **Not covered: HTTP transport to satellite-provider.** C11 `TileDownloader` / `TileUploader` own transport; they call `TileStore.write_tile` / `TileStore.read_tile_pixels` for the local-side persistence step.
- **Not covered: eviction policy.** `delete_tile` is the eviction primitive; the LRU policy lives in the cache-budget enforcer (separate task).
- **Not covered: multi-process readers writing concurrently.** Single-process producer/consumer per flight; multi-process scenarios are out of scope this cycle.
## Versioning Rules
- **Breaking** (renamed method, removed field, type change, required→optional flip, error class removed from family) requires a major version bump and a coordinated update of every consumer task listed in this header. Producer task MUST surface the change to the user via Choose format before merging.
- **Non-breaking additions** (new optional method via Protocol structural compatibility, new optional field on a DTO with a default, new error variant added to `TileCacheError`) require a minor version bump.
- **Patch** (clarification only, no shape change) is documentation-only.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| protocol-conformance-full | A class implementing all four methods with matching signatures | `isinstance(impl, TileStore) == True` | Producer AC-1 |
| protocol-conformance-partial | A class missing `delete_tile` | `isinstance == False` | Drift detection at CI time |
| frozen-dto-mutation | `TileId(zoom_level=18, lat=49.94, lon=36.31).lat = 0.0` | `FrozenInstanceError` | I-7 |
| write-tile-byte-identical | `write_tile(blob, metadata)` for `(zoom=18, lat=49.94, lon=36.31)` | Filesystem path equals `satellite-provider`'s path for same coord; JPEG bytes equal `blob`; sidecar contains `sha256(blob)` | I-1 / I-2 / C6-IT-01 |
| write-tile-content-hash-mismatch | `write_tile(blob, dataclasses.replace(metadata, content_sha256_hex="0x00..."))` | `ContentHashMismatchError`; no file written; no sidecar written | I-3 / C6-ST-01 |
| write-tile-freshness-reject | active_conflict sector + stale tile | `FreshnessRejectionError`; no file/row written | Hand-off to freshness-gate task |
| read-tile-pixels-warm | `read_tile_pixels(tile_id)` after a prior write; OS page cache warm | `TilePixelHandle.__enter__()` returns within 0.5 ms; bytes equal the written JPEG body | C6-PT-01 |
| read-tile-pixels-missing | `read_tile_pixels(tile_id)` for never-written tile | `TileNotFoundError` | I-8 (the row-missing-and-file-missing case) |
| read-tile-pixels-row-without-file | metadata row exists; JPEG file deleted out-of-band | `TileMetadataError` (not `TileNotFoundError`) | I-8 |
| concurrent-write-and-read | F4 writer + 9 Hz C2.5 reader on same tile | Reader sees either pre-write or post-write bytes — never partial | I-5 |
| delete-tile-missing | `delete_tile` for a never-written tile | Returns `False`; no exception raised | I-6 |
| delete-tile-existing | `delete_tile` after a prior write | Returns `True`; subsequent `tile_exists` returns `False`; sidecar also removed | I-6 |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — Protocol + DTOs + 5-error family + filesystem byte-identity invariant. | autodev (decompose Step 2 of AZ-250 / E-C6) |
@@ -0,0 +1,176 @@
# Contract: InferenceRuntime Protocol
**Component**: c7_inference
**Producer task**: AZ-297 — `_docs/02_tasks/todo/AZ-297_c7_runtime_protocol.md`
**Consumer tasks**:
- AZ-298 (TensorrtRuntime — implements)
- AZ-299 (OnnxTrtEpRuntime — implements)
- AZ-300 (PytorchFp16Runtime — implements)
- AZ-301 (EngineGate — uses error types)
- AZ-302 (ThermalState publisher — extends `ThermalState` DTO with `is_telemetry_available`)
- TBD at decompose time: E-C2 (AZ-250), E-C2.5 (AZ-251), E-C3 (AZ-252), E-C3.5 (AZ-253), E-C4 (AZ-254 — `ThermalState` consumer), E-C10 (AZ-257 — `compile_engine` caller)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Defines the typed boundary between the on-Jetson inference runtime (engine compilation, deserialisation, per-call inference, GPU memory management, thermal-throttle telemetry) and every downstream component that depends on GPU inference. The Protocol is the single point of contact that lets ADR-001 select between three concrete strategies (TensorRT 10.3 production, ONNX Runtime + TRT EP fallback, PyTorch FP16 simple-baseline) at startup without consumers caring which is wired.
## Shape
### Protocol surface
The Protocol is a `typing.Protocol` (PEP 544 structural typing) decorated with `@runtime_checkable`.
| Method | Signature | Throws / Errors | Blocking? |
|--------|-----------|-----------------|-----------|
| `compile_engine` | `(model_path: Path, build_config: BuildConfig) -> EngineCacheEntry` | `EngineBuildError`, `CalibrationCacheError` | sync (offline; minutes for INT8) |
| `deserialize_engine` | `(entry: EngineCacheEntry) -> EngineHandle` | `EngineDeserializeError`, `EngineHashMismatchError`, `EngineSchemaMismatchError`, `EngineSidecarMissingError`, `OutOfMemoryError` | sync |
| `infer` | `(handle: EngineHandle, inputs: dict[str, np.ndarray]) -> dict[str, np.ndarray]` | `InferenceError`, `OutOfMemoryError` | sync (GPU stream sync) |
| `release_engine` | `(handle: EngineHandle) -> None` | — (idempotent) | sync |
| `thermal_state` | `() -> ThermalState` | `TelemetryUnavailableError` (only on cold-start fail; steady-state defaults to `is_telemetry_available=False`) | sync |
| `current_runtime_label` | `() -> Literal["tensorrt", "onnx_trt_ep", "pytorch_fp16"]` | — | sync |
### DTOs
All DTOs are stdlib `@dataclass(frozen=True)` (`EngineHandle` is the exception — opaque marker class).
```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class PrecisionMode(str, Enum):
    FP16 = "fp16"
    INT8 = "int8"
    MIXED = "mixed"


@dataclass(frozen=True)
class OptimizationProfile:
    input_name: str
    min_shape: tuple[int, ...]
    opt_shape: tuple[int, ...]
    max_shape: tuple[int, ...]


@dataclass(frozen=True)
class BuildConfig:
    precision: PrecisionMode
    workspace_mb: int
    calibration_dataset: Optional[Path]  # required for INT8; None for FP16/Mixed
    optimization_profiles: tuple[OptimizationProfile, ...]
    use_trtexec: bool = False            # TRT-only hint; ignored by ORT / PyTorch


@dataclass(frozen=True)
class EngineCacheEntry:
    engine_path: Path       # `.engine` for TRT/ORT; `.onnx` for ORT-direct; `.pt` for PyTorch
    sha256_hex: str         # canonical sha256 of engine_path
    sm: Optional[int]       # None for PyTorch (hardware-portable)
    jp: Optional[str]       # JetPack version, e.g. "6.2"
    trt: Optional[str]      # TensorRT version, e.g. "10.3"
    precision: PrecisionMode
    extras: dict[str, str]  # implementation-specific (e.g., calibration cache path)


class EngineHandle:
    """Opaque marker class. Consumers MUST NOT introspect; pass back to the same runtime."""

    pass


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: Optional[float]
    gpu_temp_c: Optional[float]
    thermal_throttle_active: bool      # default False on telemetry unavailability
    measured_clock_mhz: Optional[int]
    measured_at_ns: int                # monotonic_ns of poll
    is_telemetry_available: bool       # False if the source is hung/absent (default-safe path)
```
### Error hierarchy
All errors live under `c7_inference.errors`:
```
RuntimeError (Exception subclass — NOT stdlib RuntimeError)
├── EngineBuildError
├── EngineDeserializeError
├── EngineHashMismatchError
├── EngineSchemaMismatchError
├── EngineSidecarMissingError
├── CalibrationCacheError
├── InferenceError
├── OutOfMemoryError
└── TelemetryUnavailableError
RuntimeNotAvailableError (composition-root only; NOT a Protocol family error)
ConfigSchemaError (config-load only; NOT a Protocol family error)
```
Consumers catch the family with `except c7_inference.errors.RuntimeError as e`. Implementations MUST raise only members of this family from Protocol methods; third-party library errors (TRT C++ exceptions, ORT internal errors, PyTorch CUDA errors) MUST be caught and rewrapped.
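
The catch-and-rewrap requirement can be met with a small decorator. The sketch below is an assumption about one way to do it, not the mandated mechanism: `rewrap_as` is a hypothetical helper, `C7RuntimeError` stands in for `c7_inference.errors.RuntimeError`, and a `ValueError` plays the part of a third-party backend error.

```python
import functools


class C7RuntimeError(Exception):
    """Stand-in for c7_inference.errors.RuntimeError (the family root)."""


class InferenceError(C7RuntimeError):
    pass


def rewrap_as(family_error):
    """Decorator: let family errors through, rewrap anything else as family_error."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except C7RuntimeError:
                raise  # already a family member; do not double-wrap
            except Exception as exc:
                # Chain the original so the backend traceback is preserved.
                raise family_error(str(exc)) from exc
        return wrapper
    return decorator


@rewrap_as(InferenceError)
def infer(x: int) -> int:
    if x < 0:
        raise ValueError("backend exploded")  # stands in for a third-party error
    return x * 2
```

Consumers then only ever see the family: `except C7RuntimeError` catches the rewrapped error, and `__cause__` still points at the original backend exception for diagnostics.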
### Composition-root factory
Defined in `runtime_root/inference_factory.py` (NOT in `c7_inference` itself; the factory is the wiring layer):
```python
def build_inference_runtime(config: Config) -> InferenceRuntime:
    """
    Selects exactly one strategy by config.inference.runtime + BUILD_* flag gating.
    Raises RuntimeNotAvailableError if the requested strategy's BUILD_* flag is OFF.
    """
```
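
The flag-gated selection can be sketched with zero-argument loader callables, so that nothing for a disabled strategy is imported. This is illustrative only: `select_runtime`, the `build_flags` mapping keyed by runtime label, and the fake runtime class are all hypothetical, and the sketch assumes the label was already validated at config-load time.

```python
from typing import Callable, Mapping


class RuntimeNotAvailableError(Exception):
    """Stand-in for the composition-root error named in the contract."""


def select_runtime(label: str,
                   build_flags: Mapping[str, bool],
                   loaders: Mapping[str, Callable[[], object]]) -> object:
    """Invoke the loader for `label` only if its build flag is ON."""
    if not build_flags.get(label, False):
        raise RuntimeNotAvailableError(f"runtime {label!r} is not enabled in this build")
    # The loader runs only past the flag check, so a disabled strategy's
    # module is never imported.
    return loaders[label]()


loaded: list[str] = []


class FakeTensorrtRuntime:
    def current_runtime_label(self) -> str:
        return "tensorrt"


def load_tensorrt() -> FakeTensorrtRuntime:
    # A real loader would perform the import here, inside the function body,
    # e.g. `from c7_inference.tensorrt_runtime import TensorrtRuntime`.
    loaded.append("tensorrt")
    return FakeTensorrtRuntime()


runtime = select_runtime("tensorrt", {"tensorrt": True}, {"tensorrt": load_tensorrt})
```

Keeping the import inside the loader body is what lets a test assert, via `sys.modules`, that a disabled strategy's module was never loaded.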
## Invariants
- **I-1 (single source of truth for runtime label):** `current_runtime_label()` returns a string equal to `config.inference.runtime`. AC-NEW-3 audit relies on this exact-match property.
- **I-2 (Protocol-family error envelope):** Every Protocol method raises only members of `c7_inference.errors.RuntimeError` family or returns normally. Third-party exceptions are caught and rewrapped.
- **I-3 (frozen DTOs):** `BuildConfig`, `EngineCacheEntry`, `ThermalState`, and `OptimizationProfile` are `@dataclass(frozen=True)`. Mutation attempts raise `FrozenInstanceError`.
- **I-4 (opaque EngineHandle):** Consumers MUST NOT introspect `EngineHandle` fields. Implementations subclass with private state; the Protocol surface is unchanged.
- **I-5 (lazy-import gating):** Concrete strategies are imported only inside the factory's `if BUILD_*:` blocks. The package `__init__.py` exports only the Protocol, DTOs, and errors. A Tier-0 build with `BUILD_TENSORRT_RUNTIME=OFF` MUST NOT load `c7_inference.tensorrt_runtime` (verifiable via `sys.modules`).
- **I-6 (default-safe thermal):** When `ThermalState.is_telemetry_available == False`, `ThermalState.thermal_throttle_active == False` (the steady-state default; consumers may choose to ignore the throttle bit when telemetry is unavailable).
- **I-7 (idempotent release):** `release_engine(handle)` may be called more than once on the same handle; second-and-later calls return silently.
- **I-8 (sync-stream `infer`):** `infer` returns only after the GPU stream has synchronised; the returned dict's tensors are host-resident (numpy arrays) and ready for consumer use.
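
I-6 lends itself to a one-line predicate that a publisher test can assert. In the sketch below, `unavailable_thermal_state` and `respects_i6` are hypothetical helpers, and the DTO is repeated only to keep the example self-contained.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: Optional[float]
    gpu_temp_c: Optional[float]
    thermal_throttle_active: bool
    measured_clock_mhz: Optional[int]
    measured_at_ns: int
    is_telemetry_available: bool


def unavailable_thermal_state() -> ThermalState:
    """Default-safe value for a hung or absent telemetry source."""
    return ThermalState(
        cpu_temp_c=None,
        gpu_temp_c=None,
        thermal_throttle_active=False,  # I-6: never report throttle without telemetry
        measured_clock_mhz=None,
        measured_at_ns=time.monotonic_ns(),
        is_telemetry_available=False,
    )


def respects_i6(state: ThermalState) -> bool:
    """True unless the state claims throttling while telemetry is unavailable."""
    return state.is_telemetry_available or not state.thermal_throttle_active
```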
## Non-Goals
- **Not covered: multi-stream concurrent inference.** One CUDA stream per Runtime instance this cycle. Future work if the F3 hot path becomes multi-threaded.
- **Not covered: cross-process engine cache reuse.** Engines are per-process; a separate process must deserialise from the on-disk cache.
- **Not covered: per-frame input/output type negotiation.** Inputs / outputs are numpy arrays in named dicts; type / dtype negotiation is per-strategy and per-engine.
- **Not covered: streaming / iterative inference.** `infer` is request/response; no callbacks, no chunked outputs.
- **Not covered: dynamic batch.** `OptimizationProfile` carries `min_shape / opt_shape / max_shape`, but the consumer is responsible for picking the actual runtime shape; the Protocol does not auto-batch.
- **Not covered: engine versioning / hot-reload.** Engines are loaded at takeoff (F2) and held for the flight; a new engine requires a process restart.
## Versioning Rules
- **Breaking changes** (renamed method, removed field, type change, required→optional flip, error-class removed from family) require a new major version (`2.0.0`) and a deprecation path for every consumer task listed in the contract header. The change log MUST list each consuming task that needs a coordinated update.
- **Non-breaking additions** (new optional method via Protocol structural compatibility, new optional field on a DTO with a default, new error variant added to the family) require a minor version bump (e.g., `1.1.0`).
- **Patch** (clarification only; no shape change) is documentation-only.
The current contract is `1.0.0`. It already includes `ThermalState.is_telemetry_available` from AZ-302, which would otherwise have been a `1.1.0` extension; because the field landed pre-freeze, the contract remains `1.0.0` at its first freeze.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| protocol-conformance-full | A class implementing all six methods | `isinstance(impl, InferenceRuntime) == True` | AZ-297 AC-1 |
| protocol-conformance-partial | A class missing `thermal_state` | `isinstance == False` | AZ-297 AC-1 |
| frozen-dto-mutation | `BuildConfig(precision=PrecisionMode.FP16, ...).precision = PrecisionMode.INT8` | `FrozenInstanceError` | AZ-297 AC-2 / I-3 |
| error-family-catch-all | Raise each of the nine error subtypes | All caught by `except c7_inference.errors.RuntimeError` | AZ-297 AC-3 / I-2 |
| factory-tensorrt-on | `config.inference.runtime="tensorrt"` + `BUILD_TENSORRT_RUNTIME=ON` | Returns `TensorrtRuntime`; label `"tensorrt"` | AZ-297 AC-4 |
| factory-tensorrt-off | Same config + `BUILD_TENSORRT_RUNTIME=OFF` | `RuntimeNotAvailableError`; `sys.modules` does NOT contain `c7_inference.tensorrt_runtime` | AZ-297 AC-5 / I-5 |
| factory-unknown-runtime | `config.inference.runtime="tensorflow_lite"` | `ConfigSchemaError` at config-load time | AZ-297 AC-6 |
| label-exact-match | Runtime constructed for each of the three strategies | `current_runtime_label()` == `config.inference.runtime` | AZ-297 AC-7 / I-1 |
| contract-introspection-parity | Parse this file's Shape section vs. the runtime Protocol | All methods, fields, errors match | AZ-297 AC-8 |
| thermal-default-safe | `ThermalState(is_telemetry_available=False, thermal_throttle_active=True)` | Implementations MUST NOT construct this: invariant I-6 requires `thermal_throttle_active=False` whenever `is_telemetry_available=False`. A test asserts the publisher's output respects this. | I-6 |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract — Protocol + 4 DTOs + 9-error family + composition-root factory + lazy-import gating. Includes the `ThermalState.is_telemetry_available` field added by AZ-302 (no separate version bump because the field landed before first freeze). | autodev (AZ-297 / AZ-302 coordination) |
@@ -0,0 +1,181 @@
# Contract: `FcAdapter` / `GcsAdapter` Protocols
**Owner**: c8_fc_adapter (epic AZ-261 / E-C8)
**Producer task**: AZ-390 (FcAdapter / GcsAdapter Protocols + DTOs + errors + factories + composition)
**Consumer tasks**: AZ-391 (Inbound subscription + telemetry dispatch), AZ-392 (CovarianceProjector), AZ-393 (PymavlinkArdupilotAdapter outbound), AZ-394 (Msp2InavAdapter outbound), AZ-395 (MAVLink 2.0 signing handshake), AZ-396 (Source-set switch), AZ-397 (GcsAdapter + QgcTelemetryAdapter).
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
**Module-layout home**: `src/gps_denied_onboard/components/c8_fc_adapter/interface.py`, `src/gps_denied_onboard/components/c8_fc_adapter/__init__.py`, `src/gps_denied_onboard/runtime_root/fc_factory.py`
## Purpose
Defines the public interfaces for C8: per-FC inbound telemetry subscription + outbound external-position emission, plus the GCS link. Two production `FcAdapter` strategies linked at build time per ADR-002: `PymavlinkArdupilotAdapter` (ArduPilot Plane via MAVLink 2.0 with signing) and `Msp2InavAdapter` (iNav via MSP2, unsigned per RESTRICT-COMM-2). One production `GcsAdapter` strategy: `QgcTelemetryAdapter` (downsampled 1-2 Hz summary to QGroundControl + operator command ingestion). Selected at startup via `config.fc.adapter` and `config.gcs.adapter` with `BUILD_FC_<variant>` / `BUILD_GCS_<variant>` flag gating per ADR-002.
C8 is the **single source** of FC inbound telemetry — C1 (VIO) and C5 (StateEstimator) receive `ImuWindow` / `AttitudeWindow` / `GpsHealth` / `FlightStateSignal` exclusively via a constructor-injected `FcAdapter`. C8 is also the **single sink** of outbound external-position — C5's `EstimatorOutput` is encoded into the per-FC wire format at 5 Hz with honest 6×6 → 2×2 covariance projection.
Replay extensions (AZ-265 / E-DEMO-REPLAY) live inside the same component but ship under separate `BUILD_TLOG_REPLAY_ADAPTER` / `BUILD_REPLAY_SINK_JSONL` flags; they implement the same Protocols and are out of scope for E-C8 itself.
The shared `WgsConverter` (AZ-279), `SE3Utils` (AZ-277), and `FdrClient` (AZ-273) helpers are constructor-injected.
## Public API
### Protocol: `FcAdapter`
```python
@runtime_checkable
class FcAdapter(Protocol):
    def open(self, port: PortConfig, signing_key: bytes | None) -> None: ...
    def close(self) -> None: ...
    def subscribe_telemetry(
        self, callback: Callable[[FcTelemetryFrame], None]
    ) -> Subscription: ...
    def emit_external_position(self, output: EstimatorOutput) -> None: ...
    def emit_status_text(self, msg: str, severity: Severity) -> None: ...
    def request_source_set_switch(self) -> None: ...  # AP-only; iNav raises SourceSetSwitchNotSupportedError
    def current_flight_state(self) -> FlightStateSignal: ...
```
### Protocol: `GcsAdapter`
```python
@runtime_checkable
class GcsAdapter(Protocol):
    def open(self, port: PortConfig) -> None: ...
    def close(self) -> None: ...
    def emit_summary(self, output: EstimatorOutput) -> None: ...  # internally rate-limited to 1-2 Hz
    def subscribe_operator_commands(
        self, callback: Callable[[OperatorCommand], None]
    ) -> Subscription: ...
    def emit_status_text(self, msg: str, severity: Severity) -> None: ...
```
### DTOs (frozen, slotted)
```python
@dataclass(frozen=True, slots=True)
class PortConfig:
    device: str  # e.g. /dev/ttyTHS1
    baud: int
    fc_kind: FcKind  # enum {ARDUPILOT_PLANE, INAV}


class FcKind(Enum):
    ARDUPILOT_PLANE = "ardupilot_plane"
    INAV = "inav"


class Severity(Enum):
    INFO = 6
    WARNING = 4
    ERROR = 3  # values mirror MAVLink STATUSTEXT severities


@dataclass(frozen=True, slots=True)
class FcTelemetryFrame:
    kind: TelemetryKind  # enum {IMU_SAMPLE, ATTITUDE, GPS_HEALTH, MAV_STATE}
    payload: TelemetryPayload  # union; see _types/fc.py
    received_at: int  # monotonic_ns
    signed: bool  # true ONLY for AP signed frames


@dataclass(frozen=True, slots=True)
class FlightStateSignal:
    state: FlightState  # enum {INIT, ARMED, IN_FLIGHT, ON_GROUND, FAILED}
    last_valid_gps_hint_wgs84: LatLonAlt | None  # for AC-5.1 warm-start
    last_valid_gps_age_ms: int | None
    captured_at: int  # monotonic_ns


@dataclass(frozen=True, slots=True)
class GpsHealth:
    status: GpsStatus  # enum {NO_FIX, DEGRADED, STABLE, STABLE_NON_SPOOFED, SPOOFED}
    fix_age_ms: int
    captured_at: int


@dataclass(frozen=True, slots=True)
class EmittedExternalPosition:
    fc_kind: FcKind
    horiz_accuracy_m: float  # AP horiz_accuracy / iNav hPosAccuracy (mm internally)
    source_label: PoseSourceLabel
    emitted_at: int  # monotonic_ns
    sequence_number: int
```
### Error hierarchy
```python
class FcAdapterError(Exception): ...
class FcOpenError(FcAdapterError): ...
class FcEmitError(FcAdapterError): ...
class SigningHandshakeError(FcAdapterError): ...
class SigningKeyExpiredError(FcAdapterError): ...
class SourceSetSwitchError(FcAdapterError): ...
class SourceSetSwitchNotSupportedError(SourceSetSwitchError): ...
class FcAdapterConfigError(FcAdapterError): ...
class GcsAdapterError(Exception): ...
class GcsEmitError(GcsAdapterError): ...
class GcsAdapterConfigError(GcsAdapterError): ...
```
### Composition-root factories
```python
def build_fc_adapter(
    config: AppConfig,
    wgs_converter: WgsConverter,
    se3_utils: SE3Utils,
    covariance_projector: CovarianceProjector,
    fdr_client: FdrClient,
    clock: Clock,
) -> FcAdapter: ...


def build_gcs_adapter(
    config: AppConfig,
    fdr_client: FdrClient,
    clock: Clock,
) -> GcsAdapter: ...
```
Selection: `config.fc.adapter ∈ {"ardupilot_plane", "inav"}` → corresponding strategy, gated by `BUILD_FC_ARDUPILOT_PLANE` / `BUILD_FC_INAV`. `config.gcs.adapter ∈ {"qgc_mavlink"}` → `QgcTelemetryAdapter`, gated by `BUILD_GCS_QGC_MAVLINK`. Unknown strategy → `FcAdapterConfigError` / `GcsAdapterConfigError` at config load. Build-flag OFF for the requested strategy → same error class with the disabled-flag name in the message.
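
The selection rule above can be sketched as a small lookup. This is an illustration only: the registry `_FC_STRATEGIES`, the helper `resolve_fc_strategy`, and the way build flags are passed in as a dict are all assumptions, and the real factory would lazily import and construct the adapter rather than return its name.

```python
class FcAdapterConfigError(Exception):
    """Local stand-in for the contract's FcAdapterConfigError."""


# Hypothetical registry: config value -> (build flag name, strategy class name).
_FC_STRATEGIES = {
    "ardupilot_plane": ("BUILD_FC_ARDUPILOT_PLANE", "PymavlinkArdupilotAdapter"),
    "inav": ("BUILD_FC_INAV", "Msp2InavAdapter"),
}


def resolve_fc_strategy(adapter: str, build_flags: dict[str, bool]) -> str:
    """Return the strategy class name selected by config.fc.adapter."""
    try:
        flag, strategy = _FC_STRATEGIES[adapter]
    except KeyError:
        raise FcAdapterConfigError(f"unknown fc adapter: {adapter!r}") from None
    if not build_flags.get(flag, False):
        # Same error class; the message names the disabled flag.
        raise FcAdapterConfigError(f"strategy {adapter!r} requires {flag}=ON")
    return strategy
```

Both failure modes raise the same class, so callers distinguish them only by message, as the paragraph above describes.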
## Invariants
1. **Single open**: `open(...)` MUST be called exactly once per adapter instance. Re-open raises `FcOpenError`. `close()` is idempotent.
2. **Signing key required for AP**: `PymavlinkArdupilotAdapter.open(...)` with `signing_key=None` raises `SigningHandshakeError`. `Msp2InavAdapter.open(...)` MUST reject any non-None `signing_key` with `FcAdapterConfigError` (RESTRICT-COMM-2 — iNav has no signing).
3. **5 Hz periodic emit**: `emit_external_position` is called at exactly 5 Hz by the runtime root's emit timer. The adapter does NOT drive its own timer; it only encodes + writes when called. Rate limiting of emission lives in the runtime root.
4. **Honest covariance projection**: every emitted external-position MUST have `horiz_accuracy_m` derived from the input `EstimatorOutput.covariance_6x6` via the shared `CovarianceProjector`, with Frobenius-norm equivalence to the source 2×2 horizontal block within 1% (C8-IT-01). NEVER substitute a constant or downsampled estimate.
5. **Source-label propagation**: `EstimatorOutput.source_label` MUST be re-emitted via the per-FC out-of-band channel (AP: `NAMED_VALUE_FLOAT` + STATUSTEXT; iNav: STATUSTEXT only via the MAVLink telemetry side-channel).
6. **Smoothed estimates rejected**: `emit_external_position` MUST raise `FcEmitError` if `output.smoothed == True`. The forward-time invariant (AC-4.5 revised) is enforced at the C8 boundary as a defensive backstop on top of C5's filtering.
7. **Inbound timestamp monotonicity**: `FcTelemetryFrame.received_at` MUST be monotonically non-decreasing per kind. Out-of-order frames are dropped + logged at WARN.
8. **Single-writer thread for outbound**: `emit_external_position`, `emit_status_text`, and `request_source_set_switch` MUST be called from the same thread. Multi-thread write raises `RuntimeError`. Inbound subscribe-callbacks fire on the inbound decode thread; consumers must handle the thread boundary themselves.
9. **iNav signing assertion**: the iNav adapter MUST never emit a MAVLink2 frame with the signed-flag set, even on the side-channel telemetry link. Verified by C8-IT-08.
10. **Per-flight key zeroisation**: at `close()` (or process exit), the AP signing key buffer MUST be overwritten with zeroes before deallocation. The key MUST never be written to disk. Verified by C8-ST-02.
11. **Source-set switch idempotence**: `request_source_set_switch()` is safe to call multiple times in the same flight. Re-entry within 1 s is no-op'd (rate-limited); re-entry after a successful switch logs INFO + sends STATUSTEXT but does not re-issue the command.
12. **GcsAdapter downsampling**: `emit_summary` is invoked at 5 Hz by the runtime; the adapter internally downsamples to 1-2 Hz (configurable; default 2 Hz). Downsampling is rate-based (every Nth call), not selection-based.
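
Invariant 12's rate-based downsampling (every Nth call) might look like the sketch below. The class name `EveryNthGate` is hypothetical, and the contract does not say how N is derived when the input rate is not a multiple of the target (5 Hz in, 2 Hz out); rounding N up, so that the output never exceeds the target rate, is an assumption made here.

```python
import math


class EveryNthGate:
    """Pass every Nth call through; N is fixed at construction."""

    def __init__(self, input_hz: float, target_hz: float) -> None:
        if not 0 < target_hz <= input_hz:
            raise ValueError("target_hz must be in (0, input_hz]")
        # Assumption: round N up so the output rate never exceeds target_hz.
        self._n = math.ceil(input_hz / target_hz)
        self._calls = 0

    def should_emit(self) -> bool:
        """Return True on calls 0, N, 2N, ... and False otherwise."""
        emit = self._calls % self._n == 0
        self._calls += 1
        return emit
```

With 5 Hz input and a 2 Hz target this gives N = 3, so roughly 1.67 Hz on the wire. The gate never inspects the payload, which is what distinguishes rate-based from selection-based downsampling.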
## Producer / Consumer Split
| Task ID | Scope |
|---------|-------|
| AZ-390 (Producer) | Protocols, DTOs, error hierarchy, factories, composition root extension, `FcKind` / `FlightState` / `GpsStatus` / `Severity` enums, `FcAdapterStub` baseline (test-only no-op accepted by composition). NO concrete production adapter, NO wire encoding. |
| AZ-391 (Consumer 1) | Inbound subscription path: MAVLink 2.0 telemetry decoder (RAW_IMU/ATTITUDE/GPS_RAW_INT/MAV_STATE/HEARTBEAT) for AP + MSP2 telemetry decoder for iNav; produces `FcTelemetryFrame` + bounded ring buffers; emits `ImuWindow` / `AttitudeWindow` / `GpsHealth` / `FlightStateSignal` to subscribers. Backpressure + drop-oldest on overflow. |
| AZ-392 (Consumer 2) | `CovarianceProjector` helper inside C8: 6×6 → 3×3 position sub-matrix → 2×2 horizontal sub-matrix → equivalent_radius (m for AP, mm for iNav). Honest projection per the AC-4.3 formula. |
| AZ-393 (Consumer 3) | `PymavlinkArdupilotAdapter` outbound path: encode `EstimatorOutput` as `GPS_INPUT` (5 Hz); side-channel `NAMED_VALUE_FLOAT` for `source_label` + `STATUSTEXT` mirror; uses the CovarianceProjector. NO signing logic (delivered separately). |
| AZ-394 (Consumer 4) | `Msp2InavAdapter` outbound path: encode `EstimatorOutput` as `MSP2_SENSOR_GPS` (5 Hz) via YAMSPy + INAV-Toolkit; STATUSTEXT mirror via the secondary MAVLink telemetry channel. iNav-specific quirks (mm units, sequence numbers). |
| AZ-395 (Consumer 5) | MAVLink 2.0 per-flight signing handshake (AP only): generate ephemeral key at `open(...)`, complete pymavlink signing handshake, key rotation logging to FDR, key zeroisation on close. D-C8-9 R03 risk; gated for production by IT-3 SITL pass. |
| AZ-396 (Consumer 6) | `MAV_CMD_SET_EKF_SOURCE_SET` D-C8-2 source-set switch (AP only): `request_source_set_switch()` body, ACK handling, `SourceSetSwitchError` on timeout, idempotence per Invariant 11. Wired to C5's spoof-recovery gate via the runtime root. |
| AZ-397 (Consumer 7) | `QgcTelemetryAdapter` GcsAdapter: open MAVLink 2.0 channel, downsample 5 Hz → 1-2 Hz `emit_summary`, operator command ingestion (`subscribe_operator_commands`), STATUSTEXT mirror. |
Tests C8-IT-01..08 + C8-PT-01 + C8-ST-01..02 are deferred to E-BBT (AZ-262) per the project's E-BBT pattern.
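
The AZ-392 projection chain (6×6 → 3×3 position block → 2×2 horizontal block → equivalent radius) can be illustrated in pure Python. The AC-4.3 formula is not reproduced in this contract, so the radius definition below is only one possible reading of "Frobenius-norm equivalence": the radius r for which the isotropic covariance r²·I₂ has the same Frobenius norm as the 2×2 block. The state ordering (position first, x and y before z) and all function names are likewise assumptions.

```python
import math

Matrix = list[list[float]]


def sub_block(m: Matrix, size: int) -> Matrix:
    """Top-left size x size block of a square matrix."""
    return [row[:size] for row in m[:size]]


def frobenius_norm(m: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in m for v in row))


def equivalent_radius_m(cov_6x6: Matrix) -> float:
    """Assumed reading: r such that ||r^2 * I_2||_F == ||horizontal 2x2||_F."""
    position_3x3 = sub_block(cov_6x6, 3)  # assumes position occupies indices 0..2
    horizontal_2x2 = sub_block(position_3x3, 2)  # assumes x, y come before z
    # ||r^2 * I_2||_F = r^2 * sqrt(2), hence r = sqrt(||block||_F / sqrt(2)).
    return math.sqrt(frobenius_norm(horizontal_2x2) / math.sqrt(2.0))


def equivalent_radius_mm(cov_6x6: Matrix) -> float:
    """Same quantity in millimetres, the unit the table lists for iNav."""
    return equivalent_radius_m(cov_6x6) * 1000.0
```

For an isotropic horizontal covariance with variance σ² on both axes, this definition returns σ, which is a useful sanity check whatever the final AC-4.3 formula turns out to be.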
## Constraints
- `@runtime_checkable` Protocols; DTOs `frozen=True, slots=True`.
- Lazy-import per ADR-002.
- Public API restricted to `interface.py` + `__init__.py` re-exports per `module-layout.md`.
- `pymavlink` is bundled unmodified per D-C8-3.
- Signing key MUST never appear in a log line, FDR record, or stderr trace.
- The `PortConfig.device` and `signing_key` are constructor-time inputs to `open(...)`; they MUST NOT be re-readable from the adapter post-open (no `get_port_config()` accessor).
## Risks / Mitigations
- **R03** (MAVLink 2.0 per-flight signing has no operator-deployed precedent): gated by IT-3 SITL pass before flight-test sign-off.
- **R09** (signing key compromise): per-flight ephemeral keys + zeroisation; never persisted; never logged.
- Cross-adapter drift (AP vs iNav contract): the shared `CovarianceProjector` + `FcTelemetryFrame` enforce wire-agnostic semantics. Per-FC quirks (mm units, signing) are quarantined to the variant adapter.
@@ -0,0 +1,161 @@
# Contract: Replay Mode (`FrameSource` + `ReplaySink` + `Clock` + replay composition)
**Owner**: replay (epic AZ-265 / E-DEMO-REPLAY) — strategies live inside existing components (`frame_source/`, `c8_fc_adapter/`); only the composition root and CLI are net-new top-level files.
**Producer task**: AZ-398 (`FrameSource` Protocol + `VideoFileFrameSource` + `LiveCameraFrameSource` retrofit + `Clock` Protocol)
**Consumer tasks**: AZ-399 (TlogReplayFcAdapter), AZ-400 (ReplaySink + JsonlReplaySink), AZ-401 (compose_replay + Clock injection), AZ-402 (gps-denied-replay CLI), AZ-403 (Dockerfile + CI matrix + SBOM diff), AZ-404 (E2E replay fixture test), AZ-405 (Auto-sync IMU take-off detection).
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
**Module-layout home**:
- `src/gps_denied_onboard/frame_source/interface.py`, `__init__.py`: `FrameSource` Protocol (Layer 1 cross-cutting per `module-layout.md`).
- `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py`: `TlogReplayFcAdapter` (gated `BUILD_TLOG_REPLAY_ADAPTER`).
- `src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py`: `ReplaySink` interface + `JsonlReplaySink` (gated `BUILD_REPLAY_SINK_JSONL`).
- `src/gps_denied_onboard/clock/interface.py`, `__init__.py`: `Clock` Protocol.
- `src/gps_denied_onboard/runtime_root/replay.py`: `compose_replay(config) -> ReplayRoot`.
## Purpose
Defines the public interfaces enabling **offline replay mode** per epic AZ-265: run the production C1-C5 pipeline against historical inputs (1-2 min Derkachi-style clip + matching pymavlink `.tlog`) so the parent-suite UI demo has end-to-end fidelity equal to a live flight. Production C1-C5 components MUST remain mode-agnostic: replay-aware logic lives ONLY in the composition root, the new strategies, and the CLI. The replay binary is a fourth Docker image (`gps-denied-replay-cli`) containing C1-C5 + replay strategies but NOT C6/C10/C11/C12 (no operator-side workflows; tile cache is read pre-built).
This contract defines four Protocols and the replay composition surface:
- **`FrameSource`** — the formalised cross-cutting interface for camera-frame ingestion (previously implicit). Two strategies: `LiveCameraFrameSource` (retrofit; existing camera plumbing renamed and put behind the Protocol) and `VideoFileFrameSource` (replay-only, gated `BUILD_VIDEO_FILE_FRAME_SOURCE`).
- **`Clock`** — the wall-clock vs. tlog-derived time abstraction (R-DEMO-4 mitigation). Two strategies: `WallClock` (live/research/operator) and `TlogDerivedClock` (replay only).
- **`ReplaySink`** — the offline `EstimatorOutput` consumer interface. One strategy: `JsonlReplaySink` (one `EstimatorOutput` per JSONL line; gated `BUILD_REPLAY_SINK_JSONL`).
- **`TlogReplayFcAdapter`** — replay-only `FcAdapter` strategy (per AZ-261 `FcAdapter` Protocol from `_docs/02_document/contracts/c8_fc_adapter/fc_adapter_protocol.md`); parses pymavlink `.tlog` and emits `ImuWindow` / `AttitudeWindow` / `GpsHealth` / `FlightStateSignal` at tlog-timestamp cadence (or wall-clock-paced per `--pace`). Gated `BUILD_TLOG_REPLAY_ADAPTER`.
The shared `WgsConverter` (AZ-279) is constructor-injected into the tlog adapter for tlog-GPS → local-tangent-plane conversion.
## Public API
### Protocol: `FrameSource`
```python
@runtime_checkable
class FrameSource(Protocol):
    def next_frame(self) -> NavCameraFrame | None: ...  # None on end-of-stream
    def close(self) -> None: ...
```
### Protocol: `Clock`
```python
@runtime_checkable
class Clock(Protocol):
    def monotonic_ns(self) -> int: ...
    def time_ns(self) -> int: ...  # wall-clock (UTC) for log timestamps
    def sleep_until_ns(self, target_ns: int) -> None: ...  # honoured in --pace realtime; no-op in --pace asap
```
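
To make the two pacing behaviours concrete, here is a minimal sketch of two classes that satisfy the `Clock` Protocol structurally. `WallClock` is named in this contract, but its body here is a guess. `SteppedClock` is a hypothetical simplification of what a tlog-derived clock could look like. Interpreting `target_ns` on the monotonic timeline is also an assumption.

```python
import time


class WallClock:
    """Live strategy: delegates to the OS clocks."""

    def monotonic_ns(self) -> int:
        return time.monotonic_ns()

    def time_ns(self) -> int:
        return time.time_ns()

    def sleep_until_ns(self, target_ns: int) -> None:
        # Assumption: target_ns is expressed on the monotonic timeline.
        remaining_ns = target_ns - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)


class SteppedClock:
    """Hypothetical ASAP-style clock: time moves only when the driver advances it."""

    def __init__(self, start_ns: int = 0, utc_offset_ns: int = 0) -> None:
        self._now_ns = start_ns
        self._utc_offset_ns = utc_offset_ns

    def advance_to(self, timestamp_ns: int) -> None:
        # Never step backwards, so monotonic_ns() stays non-decreasing.
        self._now_ns = max(self._now_ns, timestamp_ns)

    def monotonic_ns(self) -> int:
        return self._now_ns

    def time_ns(self) -> int:
        return self._now_ns + self._utc_offset_ns

    def sleep_until_ns(self, target_ns: int) -> None:
        return None  # no-op, matching --pace asap
```

The clock strategy is the one place where calling `time.monotonic_ns()` directly is legitimate; components only ever see the three Protocol methods.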
### Protocol: `ReplaySink`
```python
@runtime_checkable
class ReplaySink(Protocol):
    def emit(self, output: EstimatorOutput) -> None: ...
    def close(self) -> None: ...
```
### Concrete: `TlogReplayFcAdapter`
```python
class TlogReplayFcAdapter(FcAdapter):
    def __init__(
        self,
        tlog_path: Path,
        target_fc_dialect: FcKind,  # ARDUPILOT_PLANE | INAV
        clock: Clock,
        wgs_converter: WgsConverter,
        time_offset_ms: int = 0,  # auto-detected by AZ-405 auto-sync task or set via --time-offset-ms
        pace: ReplayPace = ReplayPace.ASAP,  # REALTIME | ASAP
    ): ...
```
The `TlogReplayFcAdapter` implements the full `FcAdapter` Protocol from AZ-261. `emit_external_position` raises `FcEmitError("replay adapter does not emit to FC")` (replay is read-only on the FC side; downstream consumers use `ReplaySink` instead). `request_source_set_switch` raises `SourceSetSwitchNotSupportedError`. `subscribe_telemetry` is the primary surface — fans out IMU/attitude/GPS-health/flight-state from the tlog at the configured pace.
### CLI surface
```
gps-denied-replay
--video PATH
--tlog PATH
--output results.jsonl
--camera-calibration calib.json
--config config.yaml
[--pace {realtime,asap}] # default asap
[--time-offset-ms N] # overrides auto-sync
```
### Composition root extension
```python
def compose_replay(config: Config) -> ReplayRoot: ...
```
`ReplayRoot` is a dataclass holding all wired components plus the `FrameSource`, `TlogReplayFcAdapter`, `ReplaySink`, and `Clock` chosen for the replay run. The runtime loop is:
```
loop:
    frame = frame_source.next_frame()
    if frame is None: break
    c1 = vio.process(frame)                       # C1
    candidates = vpr.lookup(c1)                   # C2
    reranked = rerank.rerank(candidates)          # C2.5
    matched = matcher.match(reranked)             # C3
    refined = refiner.refine_if_needed(matched)   # C3.5
    pose = pose_estimator.estimate(refined)       # C4
    state.add_pose_anchor(pose)                   # C5
    state.add_vio(c1.vio_output)                  # C5
    output = state.current_estimate()
    replay_sink.emit(output)
replay_sink.close()
```
The tlog adapter's `subscribe_telemetry` callbacks are wired to C5's `add_fc_imu` and to C1's IMU prior on the same threads as in the live binary.
## Invariants
1. **Mode-agnostic C1-C5**: production components MUST NOT contain `if replay_mode:` branches. Mode-specific behaviour lives in the strategy (Frame source / FC adapter / Sink / Clock). Verified by an explicit grep guard in CI.
2. **Single `Clock` per process**: the composition root resolves `Clock` exactly once at startup. All time-driven logic (AC-5.2 fallback timer, STATUSTEXT rate-limits, key rotation logging) consumes the injected `Clock` via constructor — never `time.monotonic_ns()` directly. Verified by an AST scan in CI for direct `time.monotonic_ns` / `time.time_ns` references in components.
3. **Frame source ordering**: `next_frame()` returns frames in monotonically non-decreasing `monotonic_ns` order. Out-of-order frames raise `FrameSourceError` (NOT silently dropped — replay must be deterministic).
4. **End-of-stream is None**: `next_frame()` returns `None` ONLY when the stream is permanently exhausted. Transient I/O failures raise `FrameSourceError`.
5. **TlogReplayFcAdapter emit-only-via-sink**: `emit_external_position` and `emit_status_text` raise `FcEmitError("replay adapter does not emit to FC")`. Downstream consumers MUST emit to `ReplaySink` instead.
6. **Pace mode honoured by Clock**: `pace=REALTIME` → `Clock.sleep_until_ns(target_ns)` blocks until wall-clock catches up; `pace=ASAP` → no-op. The pace flag is consumed ONLY by the `Clock` and the tlog adapter; components see only the `Clock` Protocol.
7. **JsonlReplaySink one-line-per-emit**: each `emit(output)` writes exactly one JSON object + newline; the file is fsync'd on `close()`. Schema matches `EstimatorOutput` (frozen dataclass serialised via `dataclasses.asdict` + `orjson.dumps`).
8. **Time-offset honoured**: when constructed with `time_offset_ms != 0`, the tlog adapter shifts every emitted timestamp by that offset before passing to subscribers. `time_offset_ms` is set ONCE at construction (no live re-tuning).
9. **Build-flag gating**: `VideoFileFrameSource`, `TlogReplayFcAdapter`, `JsonlReplaySink` MUST refuse construction when their respective `BUILD_*` flag is OFF (per ADR-002 — replay binary has them ON; airborne / research / operator have them OFF).
10. **Determinism**: same `(video, tlog, config, time_offset_ms, pace=ASAP)` input → same JSONL output, with ≤ 1e-6 float drift in position fields (AC-5).
## Producer / Consumer Split
| Task ID | Scope |
|---------|-------|
| AZ-398 (Producer) | `FrameSource` Protocol; `Clock` Protocol; `VideoFileFrameSource` (gated `BUILD_VIDEO_FILE_FRAME_SOURCE`); `LiveCameraFrameSource` retrofit (rename existing camera-ingest plumbing into the Protocol shape — no behaviour change); `WallClock` + `TlogDerivedClock` strategies; composition wiring in the existing `compose_root`/`compose_operator` (Clock = WallClock there). NO tlog parsing, NO sink, NO replay composition. |
| AZ-399 (Consumer 1) | `TlogReplayFcAdapter`: pymavlink stream-parser (DO NOT materialise; R-DEMO-2 throughput floor); maps tlog message types → `FcTelemetryFrame`; supports both AP and iNav dialects; `subscribe_telemetry` fan-out at the configured pace; respects `time_offset_ms`; honours `Clock` for pacing; fail-fast at startup if required message types absent (R-DEMO-3). |
| AZ-400 (Consumer 2) | `ReplaySink` Protocol + `JsonlReplaySink` (one JSON object per line; orjson serialiser; `close()` fsyncs). |
| AZ-401 (Consumer 3) | `compose_replay(config) -> ReplayRoot`: full strategy resolution for the replay binary; `Clock` strategy selection (TlogDerivedClock for ASAP, WallClock for REALTIME; documented per R-DEMO-4); `FrameSource` = `VideoFileFrameSource`; `FcAdapter` = `TlogReplayFcAdapter`; `Sink` = `JsonlReplaySink`; ALL of C1-C5 wired with the same Public API as the live binary. NO C6/C10/C11/C12. Configuration loading + camera-calibration loading. |
| AZ-402 (Consumer 4) | `gps-denied-replay` CLI entrypoint: argparse, config + calibration loader, runtime loop (the loop body documented in this contract above), structured-error exit codes (0=success, 2=AC-8 sync-impossible, 1=any other error). |
| AZ-403 (Consumer 5) | `gps-denied-replay-cli` Dockerfile (multi-stage; Python + C1-C5 + cpp/* + replay strategies; NO C6/C10/C11/C12; NO HTTP server) + GitHub Actions matrix entry + SBOM diff CI step verifying absence of excluded components per AC-4. |
| AZ-404 (Consumer 6) | E2E replay fixture test: `tests/e2e/replay/test_derkachi_1min.py` runs the CLI against a 1-2 min Derkachi clip + matching tlog; asserts AC-3 (≤ 100 m for ≥ 80 % of ticks); gated by `RUN_REPLAY_E2E=1` in CI. |
| AZ-405 (Consumer 7) | Auto-sync of video ↔ tlog via IMU take-off detection (AC-7 / AC-8). Take-off pattern: sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s (typical quadcopter signature). Confidence-scored; falls back to WARN + best-guess if < 80 %; `--time-offset-ms` always overrides; AC-8 hard-fail (exit 2) if neither auto-detect nor manual offset produces > 95 % frame-window match. |
## Constraints
- `@runtime_checkable` on all Protocols; DTOs `frozen=True, slots=True`.
- Lazy-import per ADR-002 with the new `BUILD_VIDEO_FILE_FRAME_SOURCE`, `BUILD_TLOG_REPLAY_ADAPTER`, `BUILD_REPLAY_SINK_JSONL` flags.
- C1-C5 components MUST remain mode-agnostic (Invariant 1).
- All time-driven logic in components MUST consume the injected `Clock` (Invariant 2).
- No HTTP server in the replay binary (parent-suite UI shells out to the CLI; defer until subprocess shape is proven insufficient).
- pymavlink bundled unmodified per D-C8-3.
- The tlog parser MUST stream-parse — never materialise the entire tlog into memory (R-DEMO-2; multi-GB tlogs).
## Risks / Mitigations
- **R-DEMO-1** (tlog ↔ video timestamp drift / unsynchronised recordings): auto-sync via IMU take-off detection (AC-7) + `--time-offset-ms` manual override. Fixed-wing hand-launch fallback documented.
- **R-DEMO-2** (pymavlink slow on multi-GB tlogs): stream-parse, never materialise. Throughput floor benchmarked + documented in CI.
- **R-DEMO-3** (demo footage missing required FC messages): `TlogReplayFcAdapter.open(...)` fails fast at startup, listing missing message types and the components that need them.
- **R-DEMO-4** (production C1-C5 paths bake real-time-cadence assumptions): `Clock` injection (Invariants 1, 2). Documented as ADR amendment in next architecture-doc cycle.
## Notes for the Implementer
- The `LiveCameraFrameSource` retrofit is a no-op restructure: the existing camera-ingest thread becomes a class implementing `FrameSource`. Its behaviour is unchanged. This is what allows C1 to consume `FrameSource` via constructor without becoming replay-aware.
- The `TlogReplayFcAdapter`'s `subscribe_telemetry` fan-out runs on a dedicated thread (mirroring the live `PymavlinkArdupilotAdapter` decode-thread semantics). This way C1 and C5 see identical thread boundaries in live and replay.
- The `Clock` Protocol is the SAME interface in live and replay — only the strategy differs. This is the single Liskov-clean line that lets components consume `Clock` without knowing the mode.
@@ -0,0 +1,83 @@
# Contract: composition_root_protocol
**Component**: shared_config (cross-cutting concern owned by E-CC-CONF / AZ-246)
**Producer tasks**: AZ-269 (config loader + outer Config) and AZ-270 (compose_root + compose_operator + StrategyNotLinkedError)
**Consumer tasks**: every component task that takes a config block; `runtime_root.py` and `operator_tool/__main__.py` (the two composition-root entrypoints)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Frozen public surface for the configuration loader and the two composition-root functions. Components depend on these signatures (and the precedence rule) to know how their per-component config arrives at construction time and how they will be wired against their declared interfaces.
## Shape
### Function signatures (pythonic; binding is stdlib `dataclasses` / `attrs`-style)
```python
@frozen
class Config:
    """Outer config object. Populated by union of every component's config block.
    Each component contributes one immutable nested dataclass field named after its slug
    (e.g. config.c2_vpr, config.c5_state). Components MUST NOT read other components' blocks;
    the composition root is the only consumer of the full Config."""


def load_config(env: Mapping[str, str], paths: Sequence[Path]) -> Config: ...
def compose_root(config: Config) -> RuntimeRoot: ...
def compose_operator(config: Config) -> OperatorRoot: ...


class StrategyNotLinkedError(RuntimeError):
    """Raised by compose_root / compose_operator when the config selects a strategy whose
    BUILD_<NAME> flag was OFF in the linked binary (ADR-002 enforcement gate #3, after
    SBOM diff and runtime self-check)."""

    strategy_name: str  # the strategy class identifier the config requested
    component_slug: str  # owning component (e.g. "c1_vio")
    available_strategies: list[str]  # strategies actually linked into this binary
```
| Symbol | Required | Description | Constraints |
|--------|----------|-------------|-------------|
| `Config` | yes | Outer frozen dataclass | One nested field per component slug; nested fields are immutable |
| `load_config` | yes | Builds `Config` from env + YAML files | Precedence: env > YAML > documented defaults |
| `compose_root` | yes | Wires the airborne `RuntimeRoot` | Constructs every component instance, injects dependencies, returns root |
| `compose_operator` | yes | Wires the operator-side `OperatorRoot` | Same contract, different component subset |
| `StrategyNotLinkedError` | yes | Raised on strategy/build-flag mismatch | Carries `strategy_name`, `component_slug`, `available_strategies` |
## Invariants
- `load_config` is pure with respect to its inputs: same `env` + same file contents always yields the same `Config`.
- Precedence is **env > YAML > defaults** for every key. Two YAML files merge with later paths winning over earlier ones.
- `compose_root` and `compose_operator` MUST NOT mutate the passed `Config`.
- `StrategyNotLinkedError` is the only error type these functions raise on a strategy/build-flag mismatch — never `ValueError`, `KeyError`, or a generic `RuntimeError`.
- Cold-start `load_config` + `compose_root` ≤ 1 s on Tier-2 (counts toward AC-NEW-1's 30 s startup budget).
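
The precedence and merge invariants above (env > YAML > defaults, later YAML paths win, pure with respect to inputs) can be sketched over already-parsed dictionaries. Everything named here is hypothetical: `deep_merge`, `resolve_settings`, and in particular `env_map`, which stands in for whatever rule the real loader uses to map an environment variable to a config key. YAML parsing and construction of the frozen `Config` are left out.

```python
from typing import Any, Mapping, Sequence


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Recursively merge two nested mappings; values in override win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_settings(
    defaults: Mapping[str, Any],
    yaml_docs: Sequence[Mapping[str, Any]],  # already parsed, in `paths` order
    env: Mapping[str, str],
    env_map: Mapping[str, tuple[str, ...]],  # hypothetical: env var -> key path
) -> dict[str, Any]:
    resolved = dict(defaults)
    for doc in yaml_docs:  # later paths win over earlier ones
        resolved = deep_merge(resolved, doc)
    for var, key_path in env_map.items():  # env wins over YAML and defaults
        if var in env:
            override: Any = env[var]
            for key in reversed(key_path):
                override = {key: override}
            resolved = deep_merge(resolved, override)
    return resolved
```

Because `deep_merge` builds new dictionaries instead of mutating its arguments, calling `resolve_settings` twice with the same inputs yields equal results, in line with the purity invariant.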
## Non-Goals
- This contract does NOT define the Config dataclass field set — each component owns its own block (defined in its component epic). The contract only fixes the OUTER container's composition rule (one nested field per component slug, frozen).
- This contract does NOT define the YAML schema — that follows from the per-component config blocks.
- This contract does NOT define `RuntimeRoot` / `OperatorRoot` internal structure — only that they are returned from these functions.
## Versioning Rules
- **Breaking changes** (function rename, new required positional arg, exception class rename, precedence change) require a new major version + a deprecation pass through every component config block.
- **Non-breaking additions** (new keyword-only arg with default, new optional method on `RuntimeRoot`) require a minor version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| precedence-env-wins | env sets `LOG_LEVEL=DEBUG`; YAML sets `log.level=INFO` | `config.log.level == "DEBUG"` | env > YAML |
| precedence-yaml-wins | YAML sets `log.level=INFO`; no env entry | `config.log.level == "INFO"` | YAML > defaults |
| precedence-defaults | neither env nor YAML set `log.level` | `config.log.level == <documented default>` | defaults baseline |
| compose-root-default-binary | valid Config with default strategies | returns `RuntimeRoot` whose component count matches the airborne profile | reachability proof |
| compose-root-strategy-missing | config selects `vins_mono`; binary built with `BUILD_VINS_MONO=OFF` | raises `StrategyNotLinkedError` with `strategy_name="vins_mono"`, `component_slug="c1_vio"`, `available_strategies=["okvis2", "klt_ransac"]` | ADR-002 enforcement |
| compose-operator-no-airborne | operator-side config | returns `OperatorRoot` containing only operator-tier components (e.g. C11, C12) | wrong-tier components excluded |
| load-config-purity | call `load_config(env, paths)` twice with same inputs | identical `Config` objects (or deep-equal) | reproducibility |
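
The compose-root-strategy-missing case can be illustrated with a small error class carrying the three documented fields. The constructor signature and the `select_strategy` helper are assumptions made for the sketch; the contract fixes only the field names and the base class.

```python
class StrategyNotLinkedError(RuntimeError):
    """Carries the three fields listed in the contract; constructor shape is assumed."""

    def __init__(
        self,
        strategy_name: str,
        component_slug: str,
        available_strategies: list[str],
    ) -> None:
        super().__init__(
            f"{component_slug}: strategy {strategy_name!r} is not linked; "
            f"available: {available_strategies}"
        )
        self.strategy_name = strategy_name
        self.component_slug = component_slug
        self.available_strategies = available_strategies


def select_strategy(component_slug: str, requested: str, linked: list[str]) -> str:
    """Hypothetical helper: return `requested` if it is linked, else raise."""
    if requested not in linked:
        raise StrategyNotLinkedError(requested, component_slug, list(linked))
    return requested
```

A test for the table row would then call `select_strategy("c1_vio", "vins_mono", ["okvis2", "klt_ransac"])` and assert on the three attributes of the raised error.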
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from E-CC-CONF epic (AZ-246) | autodev decompose Step 2 |
@@ -0,0 +1,107 @@
# Contract: fdr_client_protocol
**Component**: shared_fdr_client (cross-cutting concern owned by E-CC-FDR-CLIENT / AZ-247)
**Producer task**: AZ-273 — `_docs/02_tasks/todo/AZ-273_fdr_client_ringbuf.md`
**Consumer tasks**: every onboard component that emits FDR records (C1-C13), the C13 writer thread (AZ-248 / E-C13), the overrun-policy task (AZ-XX / E-CC-FDR-CLIENT #3), `FakeFdrSink` (AZ-XX / E-CC-FDR-CLIENT #4), and the composition root (`runtime_root.py`)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Frozen public surface for the producer-side FDR queue. Every onboard producer holds exactly one `FdrClient(producer_id)`, calls `enqueue(record)`, and never blocks. The C13 writer thread is the sole consumer via `pop_one` / `drain`. The `on_overrun` hook is the documented extension point through which the overrun-policy PBI (next task in this epic) implements drop-oldest + `kind="overrun"` emission — without this hook, overrun behaviour would be hard-coded into the queue and AC-NEW-3 ("no silent drops") would be unobservable from outside.
## Shape
### Function and method APIs
```python
from __future__ import annotations  # keeps the `Config` annotation lazy

from typing import Callable

from .fdr_record_schema import FdrRecord  # owned by AZ-272

# `Config` is the configuration type returned by `load_config`
# (see the composition-root contract); its import is omitted here.


class EnqueueResult:
    OK = "ok"
    OVERRUN = "overrun"


class FdrSpscViolationError(RuntimeError):
    """Raised when the SPSC contract is violated (concurrent dequeue, multi-producer enqueue)."""


class FdrClient:
    def __init__(self, producer_id: str, capacity: int) -> None: ...

    @property
    def producer_id(self) -> str: ...

    @property
    def on_overrun(self) -> Callable[[FdrRecord], None] | None: ...

    @on_overrun.setter
    def on_overrun(self, hook: Callable[[FdrRecord], None] | None) -> None: ...

    # Producer-side (single-threaded per FdrClient; lock-free; never blocks).
    def enqueue(self, record: FdrRecord) -> EnqueueResult: ...

    # Consumer-side (C13 writer; single-threaded per FdrClient; SPSC contract).
    def pop_one(self) -> FdrRecord | None: ...
    def drain(self, max_records: int) -> list[FdrRecord]: ...

    # Test-only.
    def flush(self) -> None: ...


# Module-level factory; preferred entrypoint for production code.
def make_fdr_client(producer_id: str, config: Config) -> FdrClient: ...
```
| Symbol | Required | Description | Constraints |
|--------|----------|-------------|-------------|
| `FdrClient(producer_id, capacity)` | yes | Construct a per-producer client; `capacity` MUST be `>= 16` and a power of two (ring-buffer-friendly) | `producer_id` non-empty; raises `ValueError` otherwise |
| `enqueue(record)` | yes | Non-blocking single-producer enqueue | Returns `OK` on success or `OVERRUN` when the buffer is full; never raises into the caller in normal operation (the opt-in SPSC guard is the one exception); allocation-free in steady state |
| `on_overrun` (property) | yes | Hook invoked exactly once per overrun event with the would-be-enqueued record | Set by the overrun-policy PBI; default is `None`. Leaving it unset drops records silently, which is NOT acceptable in production: AC-NEW-3 requires the hook to be wired in `compose_root` |
| `pop_one()` | yes | Single-consumer dequeue; returns the next record or `None` if empty | SPSC: only ONE thread may call `pop_one` / `drain` |
| `drain(max_records)` | yes | Pop up to `max_records` records in a single call | Same SPSC constraint as `pop_one` |
| `flush()` | yes | Test-only: blocks the calling thread until the buffer is empty | Production code MUST NOT call this on the hot path |
| `make_fdr_client(producer_id, config)` | yes | Factory; reads capacity from `config.fdr_client.<producer_id>.capacity` with documented default; caches one instance per `producer_id` | Two calls with the same `producer_id` return the same instance |
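The capacity constraint and the per-`producer_id` caching described in the table can be sketched as follows. This is a minimal sketch under stated assumptions: `SketchClient`, `make_sketch_client`, and `is_valid_capacity` are hypothetical names, the capacity is passed directly instead of being read from `Config`, and raising `ValueError` for a bad capacity is an assumption (the contract only states it for an empty `producer_id`).

```python
_CLIENTS: dict[str, "SketchClient"] = {}


def is_valid_capacity(capacity: int) -> bool:
    """True when capacity is >= 16 and a power of two."""
    return capacity >= 16 and (capacity & (capacity - 1)) == 0


class SketchClient:
    def __init__(self, producer_id: str, capacity: int) -> None:
        if not producer_id:
            raise ValueError("producer_id must be non-empty")
        if not is_valid_capacity(capacity):
            # Assumption: the real client may signal this differently.
            raise ValueError("capacity must be a power of two and >= 16")
        self.producer_id = producer_id
        self.capacity = capacity


def make_sketch_client(producer_id: str, capacity: int) -> SketchClient:
    """Return the cached client for producer_id, creating it on first use."""
    client = _CLIENTS.get(producer_id)
    if client is None:
        client = SketchClient(producer_id, capacity)
        _CLIENTS[producer_id] = client
    return client
```

The power-of-two check uses the usual bit trick: a power of two has exactly one bit set, so `capacity & (capacity - 1)` is zero.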
## Invariants
- **Lock-free**: `enqueue` and `pop_one` MUST NOT acquire a lock that any other thread can hold. They MAY use atomic primitives (CAS, single-word reads/writes, memory barriers) — these are not "locks" in the queue's sense.
- **Non-blocking enqueue**: `enqueue` returns in O(1) time and never puts the calling thread into a BLOCKED state. When the buffer is full, it returns `OVERRUN` synchronously and invokes `on_overrun(record)` exactly once if the hook is set.
- **Allocation-free steady state**: `enqueue` for an in-buffer record (slot is free) MUST NOT allocate any heap object. The contract test verifies this with a `tracemalloc` snapshot diff (0 new objects).
- **SPSC**: each `FdrClient` instance has at most ONE producer thread (calls `enqueue`) and at most ONE consumer thread (calls `pop_one` / `drain`). Multi-producer or multi-consumer use is undefined behaviour. The instance includes an opt-in guard that raises `FdrSpscViolationError` on concurrent entry — used by the contract test.
- **One client per producer_id**: `make_fdr_client(producer_id, config)` returns the same cached instance on repeat calls. Tests use `_reset_for_tests()` (private, documented in Non-Goals) to clear the cache.
- **Producer_id stamped on every record**: `enqueue` does NOT mutate `record.producer_id` — the caller is responsible for setting it. The contract test verifies that `enqueue(FdrRecord(..., producer_id="c1_vio"))` lands on the consumer side with `producer_id == "c1_vio"`.
- **Cold-start budget**: constructing all FdrClient instances during `compose_root` is a one-time cost; the contract requires per-instance construction p99 ≤ 1 ms on Tier-2 (so ≤ 13 producers × 1 ms = 13 ms within the 1 s `compose_root` budget from the composition_root_protocol contract).
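The non-blocking enqueue and exactly-once overrun invariants above can be illustrated with a small ring buffer. This is only a control-flow sketch, not the AZ-273 implementation: `RingSketch` is a hypothetical name, it relies on CPython's interpreter lock rather than explicit atomics, and it makes no claim to be allocation-free.

```python
from typing import Any, Callable, Optional

OK, OVERRUN = "ok", "overrun"


class RingSketch:
    """Bounded SPSC ring: the producer advances _tail, the consumer advances _head."""

    def __init__(self, capacity: int) -> None:
        self._slots: list[Any] = [None] * capacity  # preallocated once
        self._capacity = capacity
        self._mask = capacity - 1  # valid only because capacity is a power of two
        self._head = 0  # next index to pop (consumer-owned)
        self._tail = 0  # next index to write (producer-owned)
        self.on_overrun: Optional[Callable[[Any], None]] = None

    def enqueue(self, record: Any) -> str:
        if self._tail - self._head == self._capacity:
            # Full: report the overrun and return immediately, never wait.
            if self.on_overrun is not None:
                self.on_overrun(record)
            return OVERRUN
        self._slots[self._tail & self._mask] = record
        self._tail += 1
        return OK

    def pop_one(self) -> Any:
        if self._head == self._tail:
            return None  # empty
        index = self._head & self._mask
        record = self._slots[index]
        self._slots[index] = None  # release the reference
        self._head += 1
        return record
```

Each index has a single writer, which is what makes the single-producer / single-consumer restriction sufficient here; adding a second producer or consumer would break that assumption.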
## Non-Goals
- This contract does NOT define the drop-oldest behaviour or what `on_overrun` does — that is the next PBI (AZ-XX) in this epic. The contract only defines the hook signature and the "exactly-once" invariant.
- This contract does NOT define the C13 writer thread, segment files, segment rotation, or 64 GB cap — owned by E-C13 (AZ-248).
- This contract does NOT define the `FdrRecord` schema or its serialisation — owned by AZ-272.
- This contract does NOT define `FakeFdrSink` — owned by the fourth PBI in this epic. `FakeFdrSink` SHOULD conform to `FdrClient`'s public surface so it is a drop-in replacement for component tests.
- `_reset_for_tests()` is intentionally private and test-only. Production code calling it is a contract violation.
## Versioning Rules
- **Breaking changes** (renaming a public method, changing return types, removing a method, weakening an invariant) → new major version + a deprecation pass through every consumer.
- **Non-breaking additions** (adding a new method, adding an optional kwarg with a default, strengthening an invariant) → minor version bump.
- **Patch changes** (doc clarification, performance budget tightening within tested limits) → patch bump.
- The contract test (`tests/contract/fdr_client_protocol.py`) MUST be updated alongside any version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-enqueue-pop-roundtrip | One `enqueue(record)` followed by one `pop_one()` | record returned; second `pop_one()` returns None | basic happy path |
| nonblocking-stalled-consumer | Consumer never calls `pop_one`; producer calls `enqueue` 1025 times into 1024-cap client | every call returns within 50 µs; #1025 returns `OVERRUN` | covers AC-1 |
| allocation-free-steady-state | Warmup, then `tracemalloc` snapshot diff across one `enqueue` | 0 new heap objects | covers AC-2 |
| capacity-from-config | `make_fdr_client("c1_vio", config_with_capacity_4096)` | `client._capacity() == 4096` | covers AC-3 |
| spsc-guard-rejects-multi-consumer | Two threads call `pop_one()` concurrently with guard enabled | `FdrSpscViolationError` raised | covers AC-4 |
| on-overrun-fires-once | Recording closure on `on_overrun`; force one overrun | closure called exactly once with the offending record | covers AC-5 |
| flush-drains | N records buffered, draining consumer thread, call `flush()` | returns only after buffer empty | covers AC-6 |
| empty-producer-id-rejected | `FdrClient(producer_id="", capacity=1024)` | `ValueError` mentioning `producer_id` | covers AC-7 |
| invariant-cached-instance | Two `make_fdr_client("c1_vio", config)` calls | same instance | NFR-reliability |
| spsc-guard-rejects-multi-producer | Two threads call `enqueue` concurrently on same client with guard enabled | `FdrSpscViolationError` raised | strengthens AC-4 |
| no-mutation-of-producer-id | `enqueue(FdrRecord(producer_id="c1_vio"))` then `pop_one()` | popped record has `producer_id == "c1_vio"` | invariant test |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from E-CC-FDR-CLIENT epic (AZ-247) | autodev decompose Step 2 |
@@ -0,0 +1,107 @@
# Contract: fdr_record_schema
**Component**: shared_fdr_client (cross-cutting concern owned by E-CC-FDR-CLIENT / AZ-247)
**Producer task**: AZ-272 — `_docs/02_tasks/todo/AZ-272_fdr_record_schema.md`
**Consumer tasks**: every onboard component that emits FDR records (C1-C13), the C13 writer (AZ-248 / E-C13), post-flight tooling (E-C12 operator side), the FdrClient ring buffer (AZ-XX / E-CC-FDR-CLIENT next task), and `FakeFdrSink` (AZ-XX / E-CC-FDR-CLIENT fourth task)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Frozen, versioned wire format for every record written to the Flight Data Recorder. Every onboard producer (logs, VIO ticks, state ticks, tile matches, overruns, rollovers, failed-tile thumbnails, mid-flight tile snapshots, flight headers/footers) MUST round-trip through this schema, and the C13 writer + post-flight tooling MUST be the only readers. The schema enforces forward-compatibility so post-flight tooling pinned at version N keeps working when producers move to N+1.
## Shape
### Outer envelope (one of these per record on the wire)
```python
# Conceptual dataclass; the actual implementation may emit via an orjson- or msgpack-backed serialiser pinned at E-BOOT.
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FdrRecord:
    schema_version: int      # MUST be >= 1; reader uses this to pick the right parser branch
    ts: str                  # ISO 8601 UTC, microsecond precision, e.g. "2026-05-10T03:14:15.123456Z"
    producer_id: str         # non-empty; component slug from module-layout.md (e.g. "c2_vpr") or "shared.<name>" for cross-cutting producers
    kind: str                # one of the v1.0.0 kinds (closed enum below) OR an unknown future tag (preserved opaquely)
    payload: dict[str, Any]  # kind-specific shape; well-known shapes documented per kind below
    # Forward-compat bucket: populated by the parser when the wire bytes carry fields the local schema does not know.
    # NEVER set by producers; producers leave it empty.
    extra: dict[str, Any] = field(default_factory=dict)
```
| Field | Type | Required | Description | Constraints |
|-------|------|----------|-------------|-------------|
| `schema_version` | integer | yes | Schema major version as an integer (1 for 1.x, 2 for 2.x); minor and patch versions are not encoded in this field | `>= 1` |
| `ts` | string (ISO 8601 UTC, µs) | yes | Emit timestamp | RFC 3339 with `Z` suffix |
| `producer_id` | string | yes | Origin producer slug | non-empty; matches a module-layout component slug or `shared.<name>` |
| `kind` | string | yes | Record category | dotted snake_case, max 64 chars; v1.0.0 closed enum below |
| `payload` | object | yes (may be `{}` for kinds whose payload is empty) | Kind-specific data | JSON-safe / msgpack-safe scalars, nested dicts/arrays, no binary blobs >4 KiB |
| `extra` | object | parser-only | Forward-compat bucket for unknown future fields | populated by parser; producers MUST leave empty |
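The `ts` format in the table (ISO 8601 UTC, microsecond precision, `Z` suffix) can be produced with the standard library alone. A minimal sketch, assuming a hypothetical helper name `format_ts`; the contract does not say which function stamps the timestamp.

```python
from datetime import datetime, timezone


def format_ts(moment: datetime) -> str:
    """Render an aware datetime as ISO 8601 UTC with microseconds and a Z suffix."""
    if moment.tzinfo is None:
        # A naive datetime has no defined offset, so refuse rather than guess.
        raise ValueError("pass a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```

For example, `format_ts(datetime(2026, 5, 10, 3, 14, 15, 123456, tzinfo=timezone.utc))` yields the string shown in the envelope comment, `"2026-05-10T03:14:15.123456Z"`.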
### v1.0.0 closed enum of `kind` values
| `kind` | Producer | Payload shape (required keys) | Notes |
|--------|----------|-------------------------------|-------|
| `log` | every component (via E-CC-LOG bridge) | `{level, component, frame_id?, kind, msg, kv, exc?}` (matches `log_record_schema` v1.0.0) | Forwarded WARN/ERROR records (per AZ-267 fdr_log_bridge) |
| `vio.tick` | C1 | `{frame_id, R, t, P, last_anchor_age_ms, mre_px?, imu_bias_norm?}` | Per-frame VIO output |
| `state.tick` | C5 | `{frame_id, fused_pose, covariance_2x2, estimator_label}` | Smoothed fused-pose tick |
| `tile_match` | C2.5 / C3 | `{frame_id, tile_id, score, match_count, ransac_inliers}` | Tile-matching diagnostics |
| `overrun` | E-CC-FDR-CLIENT itself | `{producer_id, dropped_count}` (`dropped_count > 0`) | AC-NEW-3: never silent. Emitted by drop-oldest hook |
| `segment_rollover` | E-C13 (writer) | `{old_segment, new_segment, total_bytes_after}` | Emitted on segment rotation, including 64 GB-cap drops |
| `failed_tile_thumbnail` | C6 / C11 | `{frame_id, tile_id, jpeg_bytes_b64}` (≤ 0.1 Hz rate cap) | AC-8.5 forensic exception |
| `mid_flight_tile_snapshot` | C13 (snapshot path) | `{snapshot_path, captured_at}` | AC-8.4 mid-flight snapshot pointer |
| `flight_header` | C13 (writer) | `{flight_id, started_at, schema_version, build_info}` | Single record at flight open |
| `flight_footer` | C13 (writer) | `{flight_id, ended_at, records_written, records_dropped}` | Single record at flight close |
### Wire bytes
- `serialise(record: FdrRecord) -> bytes` returns a single self-delimited byte string (length-prefixed if msgpack, single-line UTF-8 if orjson — pinned at E-BOOT in `pyproject.toml`).
- `parse(buf: bytes) -> FdrRecord` is the inverse for a single record. Streaming parser (multi-record) is not part of this contract — C13 writer/reader own that.
## Invariants
- `schema_version >= 1` on every record; missing or non-integer values are rejected by `parse` with `FdrSchemaError`.
- `producer_id` is non-empty on every record. Anonymous records on the wire are a contract violation — `serialise` rejects them with `FdrSchemaError`.
- For `kind="overrun"`: `payload.producer_id` MUST equal the originating producer's slug, and `payload.dropped_count` MUST be `> 0`. (The OUTER envelope's `producer_id` is `"shared.fdr_client"` because the overrun record is emitted by the FdrClient itself, not by the producer whose enqueue overran.)
- Forward-compatible parser: a record at minor version N+1 carrying fields unknown at version N parses without exception; unknown payload fields land in `payload.extra`; unknown top-level fields land in record-level `extra`. Tooling MAY then choose to skip the record.
- Unknown future `kind` values do NOT raise — `parse` returns an `FdrRecord` with `kind` set to the raw string and `payload` set to whatever decoded; tooling MAY skip.
- Renaming a field, changing a field type, or removing a required field requires a major version bump (schema_version 2.x).
- Embedded binary blobs ≤ 4 KiB only. Bigger payloads (e.g. mid-flight tile JPEGs, ML inference inputs) MUST be referenced by sidecar path on disk; the contract test rejects oversized inline blobs.
- `serialise` and `parse` are pure: same input → byte-identical output (or deep-equal record).
- `FdrSchemaError` is the ONLY exception type either function raises on schema violation; library-specific exceptions (`orjson.JSONDecodeError`, `msgpack.UnpackException`, etc.) MUST be wrapped before crossing the public API.
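Several of the invariants above (version check, non-empty `producer_id`, the `overrun` rule, top-level forward compatibility, and exception wrapping) can be sketched in one parser. This is an illustration under stated assumptions: it uses the standard-library `json` module as a stand-in for the orjson / msgpack choice pinned at E-BOOT, returns a plain dict instead of an `FdrRecord`, and the name `parse_sketch` is hypothetical. Payload-level `extra` handling is left out because it needs the per-kind shapes.

```python
import json
from typing import Any

KNOWN_FIELDS = ("schema_version", "ts", "producer_id", "kind", "payload")


class FdrSchemaError(ValueError):
    """Single exception type surfaced on schema violations."""


def parse_sketch(buf: bytes) -> dict[str, Any]:
    try:
        raw = json.loads(buf)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        raise FdrSchemaError(f"undecodable record: {exc}") from exc
    if not isinstance(raw, dict):
        raise FdrSchemaError("record must be an object")
    version = raw.get("schema_version")
    if isinstance(version, bool) or not isinstance(version, int) or version < 1:
        raise FdrSchemaError("schema_version must be an integer >= 1")
    if not raw.get("producer_id"):
        raise FdrSchemaError("producer_id must be non-empty")
    payload = raw.get("payload", {})
    if not isinstance(payload, dict):
        raise FdrSchemaError("payload must be an object")
    if raw.get("kind") == "overrun":
        dropped = payload.get("dropped_count")
        if not isinstance(dropped, int) or dropped <= 0:
            raise FdrSchemaError("dropped_count must be an integer > 0")
    record = {name: raw.get(name) for name in KNOWN_FIELDS}
    record["payload"] = payload
    # Unknown top-level fields are preserved rather than rejected.
    record["extra"] = {k: v for k, v in raw.items() if k not in KNOWN_FIELDS}
    return record
```

Note that an unknown `kind` passes through untouched: only the `overrun` kind carries an extra rule in this sketch.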
## Non-Goals
- This contract does NOT define the lock-free SPSC ring buffer (`FdrClient`) — owned by the next task in E-CC-FDR-CLIENT.
- This contract does NOT define the writer thread, segment files, or 64 GB cap — owned by E-C13 (AZ-248).
- This contract does NOT define what triggers a record (per-component § 9 logging policies, VIO tick rate, etc. are owned by component epics).
- This contract does NOT define multi-record framing on disk — that is C13's segment file format, owned separately.
## Versioning Rules
- **Breaking changes** (field renamed/removed, type changed, ordering changed for length-prefixed wire format, library choice changed) → new major version (e.g. 2.0.0) + a deprecation pass through every consumer + a paired major bump on this contract.
- **Non-breaking additions** (new optional payload field appended, new `kind` value, new top-level optional field) → minor version bump. Forward-compat parser tolerates these by design.
- **Patch changes** (clarification, doc-only, no wire change) → patch bump.
- The contract test (`tests/contract/fdr_record_schema.py`) MUST be updated alongside any version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-roundtrip-log | `kind="log", payload={"level":"INFO","component":"c2_vpr","kind":"vpr.warmup","msg":"loaded","kv":{"model":"salad"}}` | `parse(serialise(r)) == r` | covers AC-1 |
| valid-roundtrip-overrun | `kind="overrun", producer_id="shared.fdr_client", payload={"producer_id":"c1_vio","dropped_count":42}` | round-trips; both producer_ids preserved | covers AC-1 + AC-5 |
| forward-compat-future-field | wire bytes carry `payload.new_field="x"` (hypothetical v1.1) parsed at v1.0 | record parses; `payload.extra["new_field"] == "x"` | covers AC-2 |
| forward-compat-unknown-kind | `kind="future.kind", payload={"foo":1}` | record parses opaquely; no exception | covers AC-3 |
| invalid-missing-version | bytes missing `schema_version` field | `FdrSchemaError`; message names `schema_version` | covers AC-4 |
| invalid-overrun-missing-dropped-count | `kind="overrun", payload={"producer_id":"c1_vio"}` | `FdrSchemaError`; message names `dropped_count` | covers AC-5 |
| invalid-overrun-zero-dropped-count | `kind="overrun", payload={"producer_id":"c1_vio","dropped_count":0}` | `FdrSchemaError`; message names `dropped_count` | covers AC-5 (must be `> 0`) |
| invalid-empty-producer-id | `producer_id=""` on serialise | `FdrSchemaError`; message names `producer_id` | covers AC-6 |
| invalid-oversized-blob | `payload={"jpeg":<8 KiB bytes>}` | `FdrSchemaError`; message says "use sidecar path" | invariant: ≤ 4 KiB inline |
| pure-determinism | call `serialise(r)` twice | byte-identical outputs | NFR-reliability |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from E-CC-FDR-CLIENT epic (AZ-247) | autodev decompose Step 2 |
@@ -0,0 +1,82 @@
# Contract: descriptor_normaliser
**Component**: shared_helpers / `helpers.descriptor_normaliser` (cross-cutting concern owned by E-CC-HELPERS / AZ-264)
**Producer task**: AZ-283 — `_docs/02_tasks/todo/AZ-283_descriptor_normaliser.md`
**Consumer tasks**: every C2 task that produces a query embedding before FAISS lookup; every C2.5 task that pre-processes descriptors for re-rank; every C3 task that pre-processes descriptors for cross-domain matching; every C10 task that builds the corpus side of the FAISS index during pre-flight provisioning
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
L2-normalise descriptors so that cosine similarity and FAISS's Euclidean / inner-product metrics rank neighbours identically. This is required because FAISS HNSW operates on Euclidean / inner-product spaces, while the upstream backbones (UltraVPR, MegaLoc, MixVPR, etc.) emit raw embeddings that are meant to be compared by cosine similarity. The same normalisation MUST be applied at both the **corpus** side (C10 during F1 provisioning) and the **query** side (C2 at runtime); otherwise the index returns garbage. Centralising the helper guarantees the two sides cannot drift apart. Per `_docs/02_document/common-helpers/08_helper_descriptor_normaliser.md`.
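Why normalisation makes the metrics interchangeable can be written out in three lines. This is standard linear algebra rather than something taken from the helper document; $a$ and $b$ denote two non-zero descriptors and $\hat{a}$, $\hat{b}$ their L2-normalised forms.

```latex
\begin{align}
\hat{a} &= \frac{a}{\lVert a \rVert_2}, \qquad \hat{b} = \frac{b}{\lVert b \rVert_2} \\
\cos(a, b) &= \frac{a \cdot b}{\lVert a \rVert_2 \, \lVert b \rVert_2} = \hat{a} \cdot \hat{b} \\
\lVert \hat{a} - \hat{b} \rVert_2^2 &= \lVert \hat{a} \rVert_2^2 + \lVert \hat{b} \rVert_2^2 - 2\,\hat{a} \cdot \hat{b} = 2 - 2\,\hat{a} \cdot \hat{b}
\end{align}
```

On unit vectors the inner product equals the cosine similarity, and the squared Euclidean distance is a decreasing function of it, so all three orderings of nearest neighbours agree.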
## Shape
### For function / method APIs
```python
import numpy as np


class DescriptorNormaliser:
    @staticmethod
    def l2_normalise(descriptor: np.ndarray) -> np.ndarray: ...  # shape (D,)

    @staticmethod
    def l2_normalise_batch(descriptors: np.ndarray) -> np.ndarray: ...  # shape (N, D)

    @staticmethod
    def descriptor_metric() -> str: ...  # always "inner_product"
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `l2_normalise` | `(descriptor: (D,)) -> (D,)` | `DescriptorNormaliserError` if shape is not 1-D, `D < 1`, or dtype is not `float16` / `float32` | sync, hot-path |
| `l2_normalise_batch` | `(descriptors: (N, D)) -> (N, D)` | `DescriptorNormaliserError` if shape is not 2-D, `N < 1`, `D < 1`, or dtype is not `float16` / `float32` | sync, hot-path |
| `descriptor_metric` | `() -> str` | none | sync, in-memory, returns `"inner_product"` |
Numpy arrays. dtype contract: `float16` in → `float16` out; `float32` in → `float32` out (no silent up-cast). The helper does NOT mutate inputs in place — it returns a new array.
## Invariants
- **Stateless**: no module-level state; static methods only. Stateless static-only design satisfies `coderule.mdc`.
- **dtype-preserving**: `float16` in → `float16` out; `float32` in → `float32` out. The helper does NOT silently up-cast or down-cast. Other dtypes (e.g., `float64`, `int8`) are rejected.
- **Zero-norm vector handling**: a zero-norm input vector is returned as the zero vector (no division-by-zero, no exception). Callers must filter or accept that such descriptors will match nothing on FAISS lookup. Documented invariant.
- **No in-place mutation**: every call returns a new numpy array; the input is never modified.
- **Single source of truth for metric**: `descriptor_metric()` always returns `"inner_product"`. C6's `DescriptorIndex.search_topk` and C10's index-build code MUST call this helper for the FAISS index distance metric — never hard-code `"l2"` or `"cosine"`.
- **L2 idempotence**: for non-zero `x`, `l2_normalise(l2_normalise(x))` equals `l2_normalise(x)`: byte-equal (`atol=0`) for `float32`, and within `atol=1e-3` for `float16` due to half-precision rounding. Re-normalising an already-normalised vector is therefore a no-op up to those tolerances.
- **No upward imports** (Layer 1): the module imports ONLY from `_types`, numpy, and stdlib. No `gps_denied_onboard.components.*` imports.
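The dtype-preserving, zero-safe, non-mutating behaviour described above can be sketched for the single-vector case. This is a minimal sketch, not the AZ-283 implementation: `l2_normalise_sketch` is a hypothetical name, and computing the norm in `float32` before casting back is an assumption made to avoid overflow in half precision. It does not attempt the byte-equal idempotence guarantee.

```python
import numpy as np

_ALLOWED = (np.dtype(np.float16), np.dtype(np.float32))


class DescriptorNormaliserError(ValueError):
    """Raised on shape or dtype violations."""


def l2_normalise_sketch(descriptor: np.ndarray) -> np.ndarray:
    if descriptor.ndim != 1 or descriptor.shape[0] < 1:
        raise DescriptorNormaliserError("expected a 1-D array with D >= 1")
    if descriptor.dtype not in _ALLOWED:
        raise DescriptorNormaliserError("dtype must be float16 or float32")
    wide = descriptor.astype(np.float32)  # astype copies, so the input is untouched
    norm = float(np.linalg.norm(wide))
    if norm == 0.0:
        return np.zeros_like(descriptor)  # zero vector in, zero vector out
    return (wide / norm).astype(descriptor.dtype)  # cast back to the input dtype
```

Returning a fresh zero array for the zero-norm case keeps the "new array on every call" property even on that branch.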
## Non-Goals
- Whitening / mean-subtraction — out of scope; consumers that need it apply it before / after this helper.
- PCA / dimensionality reduction — owned elsewhere (or out of scope entirely).
- GPU-accelerated normalisation — out of scope for v1.0.0; numpy / numpy-CUDA is fine for descriptor vector sizes (≤ 8192 dims) at the per-frame rate.
- Quantisation (PQ, IVF) — owned by C6 / C10 around the FAISS index, not by this helper.
- Auto-detection of descriptor dim — the helper is shape-agnostic for any `D >= 1`; consumers ensure the corpus and query side use the same `D`.
## Versioning Rules
- **Breaking changes** (function renamed/removed, signature changed, dtype contract relaxed, return value of `descriptor_metric()` changed) require a new major version + a re-build of every FAISS index built with the previous version (since the corpus-side normalisation and the metric are baked into each index).
- **Non-breaking additions** (new helper function, new optional kwarg with safe default) require a minor version bump.
- Changing `descriptor_metric()` return value is ALWAYS a major version because it forces every downstream FAISS index to be rebuilt.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-unit-vector | `np.array([3.0, 4.0], dtype=float32)` | `np.array([0.6, 0.8], dtype=float32)`; norm ≈ 1.0 within `atol=1e-6` | Round-trip happy path |
| valid-batch | `np.array([[3.0, 4.0], [1.0, 0.0]], dtype=float32)` | rows `[0.6, 0.8]` and `[1.0, 0.0]`; each row's norm ≈ 1.0 | Batch path |
| valid-fp16-roundtrip | random `float16` descriptor of dim 512 | `result.dtype == float16`; norm ≈ 1.0 within `atol=1e-3` | dtype preservation |
| valid-fp32-roundtrip | random `float32` descriptor of dim 512 | `result.dtype == float32`; norm ≈ 1.0 within `atol=1e-6` | dtype preservation |
| valid-zero-vector | `np.zeros(128, dtype=float32)` | returned as `np.zeros(128, dtype=float32)`; no exception, no NaN | Zero-norm invariant |
| valid-idempotent-fp32 | `l2_normalise(l2_normalise(x))` for `float32` `x` | byte-equal to `l2_normalise(x)` | Idempotence (fp32) |
| valid-idempotent-fp16 | `l2_normalise(l2_normalise(x))` for `float16` `x` | matches within `atol=1e-3` | Idempotence (fp16, looser due to half-precision) |
| valid-no-mutation | call `l2_normalise(x)`; check `x` afterward | `x` is bit-identical to its original value | No in-place mutation |
| valid-metric | `descriptor_metric()` | returns the string `"inner_product"` | Single source of truth |
| invalid-dtype-float64 | `np.array([1.0, 2.0], dtype=float64)` | `DescriptorNormaliserError` mentions `float16` / `float32` only | dtype contract |
| invalid-shape-2d-on-single | `np.zeros((2, 3), dtype=float32)` passed to `l2_normalise` (single) | `DescriptorNormaliserError` mentions 1-D shape required | Shape contract (single) |
| invalid-shape-1d-on-batch | `np.zeros(128, dtype=float32)` passed to `l2_normalise_batch` | `DescriptorNormaliserError` mentions 2-D shape required | Shape contract (batch) |
| no-upward-imports | static import scan | only `_types`, numpy, stdlib | Layer 1 invariant |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/common-helpers/08_helper_descriptor_normaliser.md` | autodev decompose Step 2 |
@@ -0,0 +1,92 @@
# Contract: engine_filename_schema
**Component**: shared_helpers / `helpers.engine_filename_schema` (cross-cutting concern owned by E-CC-HELPERS / AZ-264)
**Producer task**: AZ-281 — `_docs/02_tasks/todo/AZ-281_engine_filename_schema.md`
**Consumer tasks**: every C7 task that writes / reads `.engine` files via the inference runtime; every C10 task that compiles engines through C7 and writes them to the cache root
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Self-describing `.engine` filename schema per D-C10-7. TensorRT engines are NOT portable across `(SM, JetPack, TRT, precision)` tuples; encoding the tuple in the filename makes mismatch instantly visible at takeoff load (F2) so refusing-to-deserialize-on-mismatch becomes trivial. Per `_docs/02_document/common-helpers/06_helper_engine_filename_schema.md`.
## Shape
### For function / method APIs
```python
from gps_denied_onboard._types.manifests import EngineCacheKey, HostCapabilities


class EngineFilenameSchema:
    @staticmethod
    def build(model_name: str, sm: int, jetpack: str, trt: str, precision: str) -> str: ...

    @staticmethod
    def parse(filename: str) -> EngineCacheKey: ...

    @staticmethod
    def matches_host(filename: str, host_capabilities: HostCapabilities) -> bool: ...
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `build` | `(model_name, sm, jetpack, trt, precision) -> str` | `EngineFilenameSchemaError` if any input fails validation (see Invariants) | sync, pure |
| `parse` | `(filename) -> EngineCacheKey` | `EngineFilenameSchemaError` if filename does not match the format | sync, pure |
| `matches_host` | `(filename, host_capabilities) -> bool` | `EngineFilenameSchemaError` only if the filename itself is malformed (returns False on tuple mismatch — that's the expected "not a match" path) | sync, pure |
`EngineCacheKey` and `HostCapabilities` are imported from `gps_denied_onboard._types.manifests`. The `EngineCacheKey` Protocol exposes: `model_name: str`, `sm: int`, `jetpack: str`, `trt: str`, `precision: str` (where `precision in {"fp16", "int8", "mixed"}`).
### Filename format
```
{model}__sm{SM}_jp{JP_dotted}_trt{TRT_dotted}_{precision}.engine
```
Example: `ultravpr__sm87_jp6.2_trt10.3_fp16.engine`
## Invariants
- **Stateless**: no module-level state; static methods only. The static-only design satisfies the coderule.mdc constraint ("only use static methods for pure self-contained computations") because filename parsing is a pure mathematical function of its arguments.
- **Format strictness**: filenames MUST follow `{model}__sm{SM}_jp{JP}_trt{TRT}_{precision}.engine` exactly. The double underscore (`__`) after `model` is intentional — it is the field separator that lets `model` itself contain single underscores (e.g., `ultra_vpr__sm87_...`).
- **Field validation**:
- `model_name`: non-empty, only `[a-z0-9_]` characters (no double underscores), max 64 chars.
- `sm`: positive integer (e.g., 87 for Jetson Orin modules such as Orin Nano Super and AGX Orin; 72 for Xavier).
- `jetpack`: dotted version string `<major>.<minor>` (e.g., `6.2`); each segment is a non-negative integer.
- `trt`: dotted version string `<major>.<minor>` (e.g., `10.3`); same rules as `jetpack`.
- `precision`: strictly one of `"fp16"`, `"int8"`, `"mixed"`.
- The dotted-version format must round-trip cleanly through filesystems — no `/` or `\` in `model_name` or version segments.
- **`matches_host` is exact-match**: returns True iff every host-tuple element matches exactly (`sm == current_sm`, `jetpack == current_jetpack`, `trt == current_trt`). `precision` and `model_name` do not affect host-matching but ARE preserved in the parsed key.
- **Round-trip identity**: `parse(build(*args)) == EngineCacheKey(*args)` for any valid args. `build(**parse(filename)._asdict())` returns the same filename for any valid filename.
- **No upward imports** (Layer 1): the module imports ONLY from `_types`, `re`, and stdlib. No `gps_denied_onboard.components.*` imports.
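The format and field-validation rules above can be sketched with one regular expression that serves both `build` and `parse`. This is an illustration, not the AZ-281 implementation: `KeySketch` and `SchemaError` are hypothetical stand-ins for `EngineCacheKey` and `EngineFilenameSchemaError`, and `matches_host` takes flat host values because the fields of `HostCapabilities` are not shown in this contract.

```python
import re
from typing import NamedTuple

_VERSION = r"[0-9]+\.[0-9]+"
_PATTERN = re.compile(
    r"(?P<model_name>[a-z0-9_]+?)__sm(?P<sm>[1-9][0-9]*)"
    rf"_jp(?P<jetpack>{_VERSION})_trt(?P<trt>{_VERSION})"
    r"_(?P<precision>fp16|int8|mixed)\.engine"
)


class SchemaError(ValueError):
    """Raised when a filename or one of its fields is invalid."""


class KeySketch(NamedTuple):
    model_name: str
    sm: int
    jetpack: str
    trt: str
    precision: str


def parse(filename: str) -> KeySketch:
    match = _PATTERN.fullmatch(filename)
    if match is None:
        raise SchemaError(f"not a valid engine filename: {filename!r}")
    model = match["model_name"]
    if "__" in model or len(model) > 64:
        raise SchemaError("model_name must not contain '__' or exceed 64 chars")
    return KeySketch(model, int(match["sm"]), match["jetpack"], match["trt"], match["precision"])


def build(model_name: str, sm: int, jetpack: str, trt: str, precision: str) -> str:
    name = f"{model_name}__sm{sm}_jp{jetpack}_trt{trt}_{precision}.engine"
    parse(name)  # reuse the parser as the single validator
    return name


def matches_host(filename: str, host_sm: int, host_jetpack: str, host_trt: str) -> bool:
    key = parse(filename)  # malformed names raise; a mismatch just returns False
    return (key.sm, key.jetpack, key.trt) == (host_sm, host_jetpack, host_trt)
```

Validating inside `build` by calling `parse` keeps a single source of truth for the format, so the two directions cannot disagree about what counts as valid.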
## Non-Goals
- Versioning of the schema itself — there is no `schema_version` field. Adding a new tuple dimension is a Plan-phase carryforward (see Caveats in `_docs/02_document/common-helpers/06_helper_engine_filename_schema.md`).
- Engine compilation / compatibility resolution — owned by C7.
- Hot-loading engines / lazy materialisation — owned by C7.
- Filename collision detection across cache roots — owned by C10's Manifest.
## Versioning Rules
- **Breaking changes** (filename format changed, separator changed, new mandatory field added, precision enum reduced) require a new major version + a re-write pass over every existing `.engine` filename in the cache root.
- **Non-breaking additions** (new accessor function, new optional kwarg with safe default, new `precision` enum value appended) require a minor version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-build-ultravpr | `("ultravpr", 87, "6.2", "10.3", "fp16")` | `"ultravpr__sm87_jp6.2_trt10.3_fp16.engine"` | Reference example from helper doc |
| valid-roundtrip | `parse(build(*args))` for 10 random valid tuples | each round-trip returns deep-equal `EngineCacheKey` | Round-trip invariant |
| valid-matches-host-true | filename built for `(sm=87, jp=6.2, trt=10.3)`, host with same | `matches_host` returns True | Exact match |
| valid-matches-host-false-sm | filename built for `sm=87`, host with `sm=72` | `matches_host` returns False (no exception) | Tuple mismatch |
| valid-matches-host-false-trt | filename built for `trt=10.3`, host with `trt=10.4` | `matches_host` returns False | Minor-version mismatch is still a mismatch |
| invalid-precision-enum | `build(..., precision="bf16")` | `EngineFilenameSchemaError` mentions allowed enum | Precision strictness |
| invalid-model-uppercase | `build("UltraVPR", ...)` | `EngineFilenameSchemaError` mentions `[a-z0-9_]` | Model-name strictness |
| invalid-model-double-underscore | `build("ultra__vpr", ...)` | `EngineFilenameSchemaError` mentions reserved separator | Separator collision guard |
| invalid-jetpack-format | `jetpack="6.2.1"` | `EngineFilenameSchemaError` mentions dotted `<major>.<minor>` format | Version strictness |
| invalid-parse-malformed | `parse("not_an_engine_file.bin")` | `EngineFilenameSchemaError` raised | Parse strictness |
| invalid-parse-missing-suffix | `parse("ultravpr__sm87_jp6.2_trt10.3_fp16")` (no `.engine`) | `EngineFilenameSchemaError` raised | Suffix required |
| no-upward-imports | static import scan | only `_types`, `re`, stdlib | Layer 1 invariant |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/common-helpers/06_helper_engine_filename_schema.md` | autodev decompose Step 2 |
@@ -0,0 +1,82 @@
# Contract: imu_preintegrator
**Component**: shared_helpers / `helpers.imu_preintegrator` (cross-cutting concern owned by E-CC-HELPERS / AZ-264)
**Producer task**: AZ-276 — `_docs/02_tasks/todo/AZ-276_imu_preintegrator.md`
**Consumer tasks**: every C1 VIO task that consumes IMU windows; every C5 state-estimator task that builds GTSAM `CombinedImuFactor`s
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Centralise GTSAM `CombinedImuFactor` preintegration so C1 (VIO) and C5 (StateEstimator) cannot drift into two slightly-different IMU integrations of the same FC IMU window. The helper owns the GTSAM `PreintegrationCombinedParams` + `PreintegratedCombinedMeasurements` lifecycle; consumers feed samples and read closed factors. Per `_docs/02_document/common-helpers/01_helper_imu_preintegrator.md`.
## Shape
### For function / method APIs
```python
class ImuPreintegrator:
    def __init__(self, params: PreintegrationCombinedParams) -> None: ...
    def reset_with_bias(self, bias: ImuBias) -> None: ...
    def integrate_sample(self, sample: ImuSample) -> None: ...
    def integrate_window(self, window: ImuWindow) -> None: ...
    def current_preintegration(self) -> CombinedImuFactor: ...
    def reset_for_new_keyframe(self) -> CombinedImuFactor: ...
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `reset_with_bias` | `(bias: ImuBias) -> None` | none | sync, in-memory |
| `integrate_sample` | `(sample: ImuSample) -> None` | `ImuPreintegrationError` if `sample.ts_ns` is not strictly monotonic vs. last sample | sync, hot-path |
| `integrate_window` | `(window: ImuWindow) -> None` | `ImuPreintegrationError` on monotonicity violation | sync, hot-path |
| `current_preintegration` | `() -> CombinedImuFactor` | `ImuPreintegrationError` if zero samples integrated since last reset | sync |
| `reset_for_new_keyframe` | `() -> CombinedImuFactor` | `ImuPreintegrationError` if zero samples integrated since last reset | sync; clears internal state |
`ImuSample`, `ImuWindow`, `ImuBias` types are imported from `gps_denied_onboard._types.nav`. `CombinedImuFactor` is the GTSAM-native factor type (re-exported from `helpers.imu_preintegrator` so consumers do not import GTSAM directly).
### Construction
```python
def make_imu_preintegrator(calibration: CameraCalibration) -> ImuPreintegrator: ...
```
`make_imu_preintegrator` reads gyro/accel noise covariances from `CameraCalibration` (which carries the IMU noise model per-deployment per `_docs/02_document/components/01_c1_vio/description.md`) and returns an instance with the right `PreintegrationCombinedParams`. Composition root binds one instance per writer thread.
## Invariants
- **Single-threaded by design**: no internal lock. The composition root binds ONE preintegrator instance to ONE writer thread; concurrent calls from multiple threads are undefined behaviour. The contract test asserts the helper does not acquire any locks.
- **Strict monotonic timestamps**: every sample fed through `integrate_sample` / `integrate_window` MUST have `ts_ns` strictly greater than the previously-integrated sample's `ts_ns`. Violations raise `ImuPreintegrationError`; the preintegrator state is NOT mutated by a rejected sample.
- **Bias drift is the consumer's responsibility**: the preintegrator never re-estimates bias internally. Consumers (C1, C5) call `reset_with_bias(...)` whenever their bias estimate changes; until then, integration uses the last-set bias.
- **No clock ownership**: every IMU sample carries its own monotonic timestamp. The preintegrator never reads a wall clock and never injects timestamps.
- **Consumers receive GTSAM types**: `current_preintegration()` and `reset_for_new_keyframe()` return GTSAM `CombinedImuFactor` instances that consumers attach to their factor graphs. The factor object is owned by the caller after return (no lingering references inside the helper).
- **`reset_for_new_keyframe` is destructive**: it returns the closed factor AND resets internal accumulators. Callers MUST capture the return value or lose the integration.
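
The strict-monotonic and destructive-reset invariants above can be sketched without GTSAM. This is a minimal illustration of the bookkeeping only: `MonotonicGuardSketch`, the local `ImuSample` stand-in, and returning the raw sample list in place of a `CombinedImuFactor` are assumptions made for the example, not the helper's implementation. Keeping the last timestamp across a keyframe reset is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


class ImuPreintegrationError(Exception):
    """Raised on a monotonicity violation or an empty preintegration."""


@dataclass(frozen=True)
class ImuSample:
    # Hypothetical stand-in for the real type in gps_denied_onboard._types.nav.
    ts_ns: int
    gyro: Tuple[float, float, float]
    accel: Tuple[float, float, float]


class MonotonicGuardSketch:
    """Models only the timestamp and reset bookkeeping; no integration happens."""

    def __init__(self) -> None:
        self._last_ts_ns: Optional[int] = None
        self._samples: List[ImuSample] = []

    def integrate_sample(self, sample: ImuSample) -> None:
        # Validate BEFORE touching state so a rejected sample leaves no trace.
        if self._last_ts_ns is not None and sample.ts_ns <= self._last_ts_ns:
            raise ImuPreintegrationError(
                f"non-monotonic ts_ns: {sample.ts_ns} <= {self._last_ts_ns}"
            )
        self._samples.append(sample)
        self._last_ts_ns = sample.ts_ns

    def reset_for_new_keyframe(self) -> List[ImuSample]:
        if not self._samples:
            raise ImuPreintegrationError("no samples since reset")
        # Hand the closed window to the caller and drop our own reference.
        closed, self._samples = self._samples, []
        return closed
```

Validating before mutating is what lets the next valid sample integrate as if the rejected one never arrived.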
## Non-Goals
- Bias estimation / re-bias logic — owned by C1 and C5.
- Multi-threaded sample feeding — out of scope; helper is single-thread by contract.
- IMU sample acquisition / FC adapter integration — owned by C8.
- Serialising preintegrated factors to FDR records — owned by C13 / E-CC-FDR-CLIENT.
## Versioning Rules
- **Breaking changes** (method renamed/removed, parameter type changed, return type changed, monotonicity invariant relaxed) require a new major version + a deprecation pass through C1 and C5.
- **Non-breaking additions** (new optional method, new diagnostic accessor that does not mutate state) require a minor version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-monotonic-sequence | 100 samples with strictly increasing `ts_ns`, then `current_preintegration()` | factor returned with `deltaTij` matching the time span; non-zero `delta_pose` | Round-trip happy path |
| valid-window-then-keyframe | one `integrate_window(N samples)` then `reset_for_new_keyframe()` | factor returned; subsequent `current_preintegration()` raises `ImuPreintegrationError` (state cleared) | Confirms destructive reset |
| invalid-non-monotonic-sample | sample with `ts_ns <= last_ts_ns` (equal or earlier) | `ImuPreintegrationError` raised; internal state unchanged (next valid sample integrates as if rejected sample never came) | Strict-monotonic invariant |
| valid-rebias | `reset_with_bias(bias_a)`, integrate 50 samples, `reset_with_bias(bias_b)`, integrate 50 more, `current_preintegration()` | factor reflects bias_b applied to second half | Re-bias mid-window |
| invalid-empty-preintegration | `current_preintegration()` after `reset_for_new_keyframe()` with no further samples | `ImuPreintegrationError` mentions "no samples since reset" | Guard against empty factor |
| determinism | same `(bias, samples)` integrated twice into two instances | deep-equal `CombinedImuFactor` outputs | Pure-function determinism |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/common-helpers/01_helper_imu_preintegrator.md` | autodev decompose Step 2 |
@@ -0,0 +1,93 @@
# Contract: lightglue_runtime
**Component**: shared_helpers / `helpers.lightglue_runtime` (cross-cutting concern owned by E-CC-HELPERS / AZ-264)
**Producer task**: AZ-278 — `_docs/02_tasks/todo/AZ-278_lightglue_runtime.md`
**Consumer tasks**: C2.5 InlierBasedReranker (single-pair LightGlue inlier counter); C3 CrossDomainMatcher (heavier matching pass)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Single owner of the LightGlue inference engine. C2.5 does single-pair LightGlue matching for inlier counting on K=10 candidates per frame; C3 does the heavier matching pass on the surviving N=3 candidates. Both consume the SAME LightGlue engine — sharing avoids paying the engine-build / GPU-memory cost twice and structurally prevents the C2.5 ↔ C3 import cycle (R14 fix in `_docs/02_document/epics.md`). Per `_docs/02_document/common-helpers/03_helper_lightglue_runtime.md`.
## Shape
### For function / method APIs
```python
class LightGlueRuntime:
    def __init__(self, engine_handle: EngineHandle) -> None: ...
    def descriptor_dim(self) -> int: ...
    def match(
        self,
        features_a: KeypointSet,
        features_b: KeypointSet,
    ) -> CorrespondenceSet: ...
    def match_batch(
        self,
        features_a_list: list[KeypointSet],
        features_b_list: list[KeypointSet],
    ) -> list[CorrespondenceSet]: ...
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `__init__` | `(engine_handle: EngineHandle) -> None` | `LightGlueRuntimeError` if `engine_handle` is None or descriptor_dim < 1 | sync, one-time |
| `descriptor_dim` | `() -> int` | none | sync, in-memory |
| `match` | `(KeypointSet, KeypointSet) -> CorrespondenceSet` | `LightGlueRuntimeError` if descriptor dims mismatch the engine's expected dim; `LightGlueConcurrentAccessError` if a concurrent caller tries to enter | sync, GPU-bound |
| `match_batch` | `(list[KeypointSet], list[KeypointSet]) -> list[CorrespondenceSet]` | same as `match` | sync, GPU-bound |
`EngineHandle`, `KeypointSet`, and `CorrespondenceSet` are imported from `gps_denied_onboard._types`. `EngineHandle` is a Protocol (NOT a concrete class) so this helper does not import any Layer 2+ component; the production handle is created by C7's `InferenceRuntime.deserialize_engine` and injected by the composition root.
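
To illustrate why a Protocol keeps this helper free of Layer 2+ imports, here is a hedged sketch of what a structural `EngineHandle` could look like. The member names `forward` and `descriptor_dim` come from this contract; their exact signatures, the property form of `descriptor_dim`, and the `FakeEngineHandle` test double are assumptions for the example.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class EngineHandle(Protocol):
    """Hypothetical shape of the opaque handle; the real Protocol lives in _types."""

    @property
    def descriptor_dim(self) -> int: ...

    def forward(self, *inputs: Any) -> Any: ...


class FakeEngineHandle:
    """Test double: satisfies the Protocol structurally, with no inheritance."""

    def __init__(self, descriptor_dim: int) -> None:
        self._descriptor_dim = descriptor_dim

    @property
    def descriptor_dim(self) -> int:
        return self._descriptor_dim

    def forward(self, *inputs: Any) -> Any:
        # A production handle would run inference here; the fake returns nothing.
        return []
```

Because the match is structural, C7's production handle never has to be imported by the helper, and tests can inject a fake without touching GPU code.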
### Construction
The composition root constructs the runtime once at takeoff:
```python
engine_handle = inference_runtime.deserialize_engine(LIGHTGLUE_ENGINE_CACHE_ENTRY)
runtime = LightGlueRuntime(engine_handle)
# inject the SAME instance into both consumers
c2_5_reranker = InlierBasedReranker(..., lightglue_runtime=runtime, ...)
c3_matcher = CrossDomainMatcher(..., lightglue_runtime=runtime, ...)
```
## Invariants
- **Serial-access invariant** (R14 cross-component): the runtime owns ONE CUDA stream. Concurrent calls to `match` / `match_batch` from multiple threads are FORBIDDEN. The composition root binds the runtime to the single F3 hot-path thread (per `_docs/02_document/epics.md` R14 entry). The helper's contract test asserts a guard exists that rejects concurrent entry with `LightGlueConcurrentAccessError`.
- **Backbone consistency**: features fed in MUST come from the same backbone the LightGlue engine was trained for (DISK is the production default; ALIKED and XFeat are alternates). Mixing backbones is a runtime error caught by the input shape check (`descriptor_dim` mismatch raises `LightGlueRuntimeError`). The helper does NOT silently coerce dimensions.
- **No shared mutable state**: the runtime exposes no `set_*` / `update_*` methods. Once constructed with an `engine_handle`, its behaviour is fixed for its lifetime.
- **No upward imports** (Layer 1): the module imports ONLY from `_types`, numpy, and stdlib. NO `gps_denied_onboard.components.*` imports — neither C2.5 nor C3 nor C7 — under any circumstance. This is the structural fix for R14: the helper sits below the components in the layering, so the C2.5 ↔ C3 cycle becomes impossible to express.
- **Engine handle is opaque**: the helper does not know whether the handle wraps a TensorRT engine, an ONNX session, or a PyTorch model. It calls a fixed Protocol surface (`forward(...)`, `descriptor_dim`); the implementation owner is C7.
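
The serial-access invariant above calls for a guard that rejects concurrent entry instead of queueing it. One possible shape, sketched with a non-blocking lock: `SerialAccessGuard` is a hypothetical name, and making `LightGlueConcurrentAccessError` a subclass of `LightGlueRuntimeError` is an assumption, not something this contract states.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class LightGlueRuntimeError(Exception):
    """Base error for the runtime (name taken from the contract)."""


class LightGlueConcurrentAccessError(LightGlueRuntimeError):
    """Raised when a second caller tries to enter while a match is running."""


class SerialAccessGuard:
    """Fail-fast guard: a second entrant is rejected, never blocked."""

    def __init__(self) -> None:
        self._busy = threading.Lock()

    @contextmanager
    def enter(self) -> Iterator[None]:
        # blocking=False returns immediately with False if the lock is held.
        if not self._busy.acquire(blocking=False):
            raise LightGlueConcurrentAccessError(
                "match/match_batch entered while another call is in flight"
            )
        try:
            yield
        finally:
            self._busy.release()
```

Under this sketch, `match` and `match_batch` would wrap their bodies in `with self._guard.enter():`. Rejecting rather than waiting surfaces a wiring mistake in the composition root immediately instead of hiding it behind added latency.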
## Non-Goals
- Engine compilation / serialisation — owned by C7 (via `EngineFilenameSchema` + the inference runtime).
- Engine cache management / takeoff load — owned by C10 (`CacheProvisioner`).
- Backbone-specific feature extraction (DISK, ALIKED, XFeat) — owned by C3 / C7.
- Multi-GPU sharding — out of scope; production target is single-GPU Tier-2.
- Mixed-backbone matching (cross-DISK-ALIKED) — out of scope; consumers ensure backbone consistency before calling.
## Versioning Rules
- **Breaking changes** (method renamed/removed, signature changed, `EngineHandle` Protocol changed, serial-access invariant relaxed) require a new major version + a deprecation pass through C2.5 and C3.
- **Non-breaking additions** (new optional kwarg with safe default, new diagnostic accessor) require a minor version bump.
- Changing the underlying engine format (TensorRT → ONNX) is NOT a contract change because the helper's surface treats the handle as opaque — but it IS a C7 contract change and must follow C7's versioning rules.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-single-pair | two `KeypointSet`s of matching descriptor dim | `CorrespondenceSet` returned with `len > 0` for a synthetic-overlap pair | Round-trip happy path (C2.5 use) |
| valid-batch-3 | three pairs of `KeypointSet`s | three `CorrespondenceSet`s returned in order | Batch path (C3 use) |
| invalid-dim-mismatch | features with `descriptor_dim` not matching the engine | `LightGlueRuntimeError` mentions the expected vs actual dim | Backbone-consistency invariant |
| invalid-concurrent-access | two threads call `match` simultaneously | `LightGlueConcurrentAccessError` raised in the second-entering thread | R14 serial-access invariant |
| invalid-empty-handle | `LightGlueRuntime(engine_handle=None)` | `LightGlueRuntimeError` raised at construction | Construction guard |
| no-upward-imports | static import scan | only `_types`, numpy, stdlib — no `components.*` | R14 structural fix |
| determinism-given-engine | same `(features_a, features_b)` matched twice with the same engine handle | byte-equal `CorrespondenceSet` outputs | Pure-function determinism downstream of the engine |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/common-helpers/03_helper_lightglue_runtime.md` (R14 fix) | autodev decompose Step 2 |
@@ -0,0 +1,95 @@
# Contract: ransac_filter
**Component**: shared_helpers / `helpers.ransac_filter` (cross-cutting concern owned by E-CC-HELPERS / AZ-264)
**Producer task**: AZ-282 — `_docs/02_tasks/todo/AZ-282_ransac_filter.md`
**Consumer tasks**: every C2.5 task that runs RANSAC over single-pair LightGlue matches; every C3 task that runs RANSAC over 2D-2D correspondences for the per-candidate inlier count; every C3.5 task that recomputes residual after AdHoP refinement; every C4 task that computes the per-frame final reprojection residual for FDR provenance
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Thin, deterministic wrapper around OpenCV's RANSAC + reprojection-residual computation. Keeps the four call sites (C2.5, C3, C3.5, C4) on one canonical inlier-filtering algorithm and one canonical residual definition (median pixel residual). Per `_docs/02_document/common-helpers/07_helper_ransac_filter.md`.
## Shape
### For function / method APIs
```python
class RansacFilter:
    @staticmethod
    def filter_correspondences(
        correspondences: np.ndarray,  # shape (N, 4): [x_a, y_a, x_b, y_b]
        ransac_threshold_px: float,
        min_inliers: int,
    ) -> RansacResult: ...
    @staticmethod
    def compute_reprojection_residual(
        correspondences: np.ndarray,  # shape (I, 4): inlier set
        K: np.ndarray,                # shape (3, 3): camera intrinsics
        distortion: np.ndarray,       # shape (5,) or (8,): OpenCV distortion model
        pose: SE3,
    ) -> float: ...
```
`RansacResult` is a frozen dataclass:
```python
@dataclass(frozen=True)
class RansacResult:
    inlier_correspondences: np.ndarray  # shape (I, 4)
    inlier_count: int                   # I
    outlier_count: int                  # N - I
    median_residual_px: float
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `filter_correspondences` | `(correspondences, ransac_threshold_px, min_inliers) -> RansacResult` | `RansacFilterError` if `correspondences` shape != `(N, 4)`, `ransac_threshold_px <= 0`, `min_inliers < 0`, or `N < 4` (RANSAC needs ≥4 points for homography) | sync, CPU |
| `compute_reprojection_residual` | `(correspondences, K, distortion, pose) -> float` | `RansacFilterError` on shape / dtype mismatch (correspondences must be `(I, 4)`, `K` must be `(3, 3)`, distortion must be `(5,)` or `(8,)`); returns `NaN` if `I == 0` | sync, CPU |
`SE3` is the type alias from `helpers.se3_utils` (re-exported GTSAM `Pose3`). All numpy arrays use `dtype=float64`.
## Invariants
- **Stateless**: no module-level state; static methods only. Stateless static-only design satisfies `coderule.mdc` ("only use static methods for pure self-contained computations").
- **Deterministic given fixed seed**: `cv2.findHomography(..., cv2.RANSAC)` is non-deterministic by default. The helper sets `cv2.setRNGSeed(0)` (or uses the explicit `seed` kwarg where the OpenCV API supports it) so the same input correspondences always produce the same `RansacResult`. Deterministic behaviour is part of the contract.
- **Median residual semantics**: `compute_reprojection_residual` returns the MEDIAN reprojection residual in pixels (NOT the mean — outliers in the 2D residual distribution should not bias the consumer's quality signal). Returns `NaN` if `correspondences.shape[0] == 0`.
- **OpenCV-internal RANSAC ownership note**: for C4's `solvePnPRansac` (2D-3D RANSAC), OpenCV does its own internal RANSAC. THIS helper's `filter_correspondences` is for the standalone 2D-2D case (C3, C2.5, C3.5). C4 uses ONLY `compute_reprojection_residual` from this helper.
- **Min-inliers semantics**: `min_inliers` is informational — `RansacResult.inlier_count` may be less than `min_inliers`. The helper does NOT raise when the count falls short; the consumer decides whether to proceed (`InsufficientInliersError` etc. live in the consuming components).
- **No upward imports** (Layer 1): the module imports ONLY from `_types`, `helpers.se3_utils` (allowed — same Layer 1), `cv2`, `numpy`, and stdlib. No `gps_denied_onboard.components.*` imports.
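
The median-residual semantics above can be shown with a small stdlib-only sketch. It assumes the projected points are already available, so camera projection, distortion and pose handling are deliberately left out. `reprojection_residuals_px` and `median_residual_px` are hypothetical names, not part of the helper's surface.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def reprojection_residuals_px(
    observed: Sequence[Point], projected: Sequence[Point]
) -> List[float]:
    """Per-correspondence Euclidean pixel distance between observation and projection."""
    return [
        math.hypot(ox - px, oy - py)
        for (ox, oy), (px, py) in zip(observed, projected)
    ]


def median_residual_px(residuals_px: Sequence[float]) -> float:
    """Median residual in pixels; NaN (not an exception) for an empty inlier set."""
    if len(residuals_px) == 0:
        return math.nan
    return statistics.median(residuals_px)
```

With residuals `[0.2, 0.3, 0.4, 0.5, 50.0]` the median is `0.4`, while the mean would be pulled above 10 px by the single outlier. That is the bias the contract avoids by fixing the statistic to the median.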
## Non-Goals
- 2D-3D RANSAC inside `solvePnPRansac` — OpenCV does it internally; this helper does not wrap it.
- Per-component RANSAC threshold defaults — they are documented per-component in C2.5, C3, C3.5, C4 specs. This helper takes the threshold as a parameter; defaults belong to the consumers.
- Adaptive RANSAC (PROSAC, USAC) — out of scope for v1.0.0.
- GPU-accelerated RANSAC — out of scope for v1.0.0.
- Confidence / iteration-count tuning of the underlying `cv2.findHomography` call — exposed only via the `ransac_threshold_px` parameter; if a future consumer needs to tune iterations, that's a minor-version contract addition.
## Versioning Rules
- **Breaking changes** (function renamed/removed, signature changed, return shape changed, residual statistic changed from median to mean) require a new major version + a deprecation pass through C2.5, C3, C3.5, C4.
- **Non-breaking additions** (new optional kwarg with safe default, new accessor on `RansacResult`) require a minor version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-clean-correspondences | 100 perfect homography correspondences | `inlier_count == 100`, `outlier_count == 0`, `median_residual_px ≈ 0.0` | Round-trip happy path |
| valid-mixed | 80 inliers + 20 outlier correspondences with threshold 1.5 px | `inlier_count` ∈ `[78, 82]` (RANSAC noise tolerance), `outlier_count == 100 - inlier_count` | Mixed-quality input |
| valid-determinism | same input run twice through `filter_correspondences` | byte-equal `RansacResult` outputs | Deterministic-seed invariant |
| valid-residual-zero-on-clean | 4 perfect 2D-2D correspondences with known pose | `median_residual_px ≈ 0.0` | Clean residual |
| valid-residual-nan-on-empty | empty inlier array | returns `NaN` (no exception) | Empty-input semantics |
| invalid-shape | `correspondences.shape = (10, 3)` | `RansacFilterError`; mentions `(N, 4)` shape | Shape contract |
| invalid-threshold | `ransac_threshold_px = -1.0` | `RansacFilterError`; mentions positive threshold | Threshold guard |
| invalid-too-few-points | `correspondences.shape = (3, 4)` | `RansacFilterError`; mentions minimum 4 points | RANSAC point-count guard |
| invalid-K-shape | `K.shape = (4, 4)` in residual call | `RansacFilterError`; mentions `(3, 3)` shape | K shape contract |
| no-upward-imports | static import scan | only `_types`, `helpers.se3_utils`, `cv2`, `numpy`, stdlib | Layer 1 invariant |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/common-helpers/07_helper_ransac_filter.md` | autodev decompose Step 2 |
@@ -0,0 +1,78 @@
# Contract: se3_utils
**Component**: shared_helpers / `helpers.se3_utils` (cross-cutting concern owned by E-CC-HELPERS / AZ-264)
**Producer task**: AZ-277 — `_docs/02_tasks/todo/AZ-277_se3_utils.md`
**Consumer tasks**: every C1 VIO task that produces relative poses, every C2.5 / C3 / C3.5 task that handles 4x4 → SE(3) conversion, every C4 task that converts `solvePnPRansac` output into a GTSAM factor, every C5 task that builds iSAM2 graph keys, every C8 task that encodes pose for FC emission
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Centralise SE(3) ↔ 4×4-matrix conversion and Lie-algebra exponential / logarithm / adjoint so every component that crosses the matrix-vs-pose boundary uses the same numerical convention. Per `_docs/02_document/common-helpers/02_helper_se3_utils.md`. Backed by GTSAM `Pose3` primitives where available; pure numpy fallback otherwise.
## Shape
### For function / method APIs
```python
def matrix_to_se3(T_4x4: np.ndarray) -> SE3: ...
def se3_to_matrix(pose: SE3) -> np.ndarray: ...
def exp_map(xi: np.ndarray) -> SE3: ... # xi shape (6,)
def log_map(pose: SE3) -> np.ndarray: ... # returns shape (6,)
def adjoint(pose: SE3) -> np.ndarray: ... # returns shape (6, 6)
def is_valid_rotation(R_3x3: np.ndarray, *, atol: float = 1e-6) -> bool: ...
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `matrix_to_se3` | `(T_4x4) -> SE3` | `Se3InvalidMatrixError` if shape != (4,4), bottom row != [0,0,0,1], rotation is not orthogonal within `atol`, or `det(R)` is not ≈ +1 | sync, pure |
| `se3_to_matrix` | `(SE3) -> np.ndarray (4,4)` | none | sync, pure |
| `exp_map` | `(xi: (6,)) -> SE3` | `Se3InvalidMatrixError` if shape != (6,) | sync, pure |
| `log_map` | `(SE3) -> np.ndarray (6,)` | none | sync, pure |
| `adjoint` | `(SE3) -> np.ndarray (6,6)` | none | sync, pure |
| `is_valid_rotation` | `(R_3x3, *, atol=1e-6) -> bool` | none (returns False for any invalid input) | sync, pure |
`SE3` is a type alias for the GTSAM `Pose3` (re-exported from `helpers.se3_utils` so consumers do not import GTSAM directly). All numpy arrays use `dtype=float64`; passing `float32` raises `Se3InvalidMatrixError`.
## Invariants
- **Stateless**: no module-level state; every function is pure. The same input always produces the same output (deep-equal).
- **Right-handed convention**: rotation order is right-handed; `T_4x4` follows the standard `[[R, t], [0, 1]]` block layout.
- **Orthogonal-rotation guarantee on the way in**: callers MUST orthogonalise their rotation matrices before `matrix_to_se3`. The helper rejects matrices whose `R^T R` deviates from `I` by more than `atol`. The helper does NOT silently re-orthogonalise.
- **Positive-determinant rotation**: `det(R) ≈ +1`. Mirror matrices (`det(R) ≈ -1`) are rejected.
- **Round-trip identity**: `se3_to_matrix(matrix_to_se3(T)) == T` for any valid `T` within numerical tolerance (`np.allclose(..., atol=1e-9)`).
- **Lie-algebra round-trip**: `exp_map(log_map(p)) == p` for any non-degenerate `p` within `atol=1e-9`. Near-identity edge cases (twist norm < 1e-10) MUST NOT raise; the implementation falls back to the small-angle Taylor expansion documented in GTSAM.
- **No upward imports** (Layer 1): the module imports ONLY from `_types`, GTSAM, numpy, and stdlib. No `gps_denied_onboard.components.*` imports.
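
The orthogonality and positive-determinant invariants above reduce to two checks on the 3×3 rotation block. The sketch below uses plain nested sequences so it stays dependency-free, whereas the helper itself works on `float64` numpy arrays. Measuring the deviation of `R^T R` from `I` element-wise is an assumption (this contract does not say which norm is used), and `is_valid_rotation_sketch` is a hypothetical name.

```python
from typing import Any


def is_valid_rotation_sketch(R: Any, *, atol: float = 1e-6) -> bool:
    """True iff R is 3x3, R^T R is within atol of I (element-wise), and det(R) is within atol of +1."""
    try:
        rows = [[float(v) for v in row] for row in R]
    except (TypeError, ValueError):
        return False  # not a nested numeric sequence
    if len(rows) != 3 or any(len(row) != 3 for row in rows):
        return False
    # Orthogonality: entry (i, j) of R^T R is the dot product of columns i and j.
    for i in range(3):
        for j in range(3):
            dot = sum(rows[k][i] * rows[k][j] for k in range(3))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > atol:
                return False
    # Positive determinant: rejects mirror matrices, which pass the check above.
    det = (
        rows[0][0] * (rows[1][1] * rows[2][2] - rows[1][2] * rows[2][1])
        - rows[0][1] * (rows[1][0] * rows[2][2] - rows[1][2] * rows[2][0])
        + rows[0][2] * (rows[1][0] * rows[2][1] - rows[1][1] * rows[2][0])
    )
    return abs(det - 1.0) <= atol
```

The determinant test is needed in addition to orthogonality because a reflection such as `diag(1, 1, -1)` satisfies `R^T R = I` exactly.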
## Non-Goals
- Quaternion utilities (`Rotation` / `Quaternion`) — out of scope; consumers that need a quaternion are expected to convert inline via SciPy's `Rotation.from_matrix` / `Rotation.from_quat`.
- SE(2) / planar pose helpers — out of scope.
- Pose interpolation / Slerp — out of scope (consumers that need it implement it locally on top of `exp_map` / `log_map`).
- Manifold operators richer than exp/log/adjoint (e.g., parallel transport, twist composition Jacobians) — out of scope; revisit when a consumer needs them.
## Versioning Rules
- **Breaking changes** (function renamed/removed, signature changed, error type changed, dtype contract relaxed) require a new major version + a deprecation pass through C1, C2.5, C3, C3.5, C4, C5, C8.
- **Non-breaking additions** (new helper function, new optional kwarg with safe default) require a minor version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-roundtrip-4x4 | random valid `T_4x4` | `np.allclose(se3_to_matrix(matrix_to_se3(T)), T, atol=1e-9)` | Round-trip happy path |
| valid-roundtrip-lie | random `xi` of norm ≈ 1.0 | `np.allclose(log_map(exp_map(xi)), xi, atol=1e-9)` | Lie-algebra round-trip |
| valid-near-identity | `xi = [1e-12]*6` | `exp_map(xi)` returns identity within `atol=1e-9`; no exception | Small-angle stability |
| invalid-non-orthogonal | `T_4x4` whose `R` has `R^T R - I` of norm 1e-3 | `Se3InvalidMatrixError` raised; helper does NOT silently re-orthogonalise | Strict caller-orthogonalisation rule |
| invalid-mirror | `T_4x4` with `det(R) = -1` | `Se3InvalidMatrixError` raised | Positive-det invariant |
| invalid-bottom-row | `T_4x4` with bottom row `[0,0,0,2]` | `Se3InvalidMatrixError` raised | Block-layout guard |
| invalid-dtype | `T_4x4` with `dtype=float32` | `Se3InvalidMatrixError` raised mentioning dtype | dtype contract |
| determinism | same `T_4x4` through `matrix_to_se3 → se3_to_matrix` twice | byte-equal numpy outputs | Pure-function determinism |
| no-upward-imports | static import scan of `helpers.se3_utils` | only `_types`, GTSAM, numpy, stdlib | Layer 1 invariant |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/common-helpers/02_helper_se3_utils.md` | autodev decompose Step 2 |
@@ -0,0 +1,77 @@
# Contract: sha256_sidecar
**Component**: shared_helpers / `helpers.sha256_sidecar` (cross-cutting concern owned by E-CC-HELPERS / AZ-264)
**Producer task**: AZ-280 — `_docs/02_tasks/todo/AZ-280_sha256_sidecar.md`
**Consumer tasks**: every C6 task that writes the FAISS index / descriptor sidecar; every C7 task that writes engine cache files + INT8 calibration cache; every C10 task that writes the Manifest; every C11 task that verifies tile artifacts before serving them
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Centralise the atomic-write + SHA-256 content-hash sidecar pattern (D-C10-3). Every persistent artifact that takeoff-load (F2) must verify gets written atomically AND has a `.sha256` sidecar that the verifier can independently recompute. Without a shared helper, C6 / C7 / C10 / C11 each grow their own slightly-different implementation; the takeoff-load gate breaks the moment one of them drifts. Per `_docs/02_document/common-helpers/05_helper_sha256_sidecar.md`.
## Shape
### For function / method APIs
```python
class Sha256Sidecar:
    @staticmethod
    def write_atomic(path: Path, payload: bytes) -> str: ...  # returns hex digest
    @staticmethod
    def write_atomic_and_sidecar(path: Path, payload: bytes) -> str: ...  # returns hex digest
    @staticmethod
    def verify(path: Path) -> bool: ...  # checks payload hash against sidecar
    @staticmethod
    def aggregate_hash(paths: list[Path]) -> str: ...  # for Manifest covering many files
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `write_atomic` | `(path, payload) -> str` | `Sha256SidecarError` if parent dir missing or filesystem rejects rename; underlying `OSError` is wrapped | sync, I/O |
| `write_atomic_and_sidecar` | `(path, payload) -> str` | same as `write_atomic` plus failure to write the sidecar atomically | sync, I/O |
| `verify` | `(path) -> bool` | `Sha256SidecarError` if `path` exists but `<path>.sha256` is missing or malformed (returns `False` if `path` itself is missing) | sync, I/O |
| `aggregate_hash` | `(list[Path]) -> str` | `Sha256SidecarError` if any path is missing | sync, I/O |
`Path` is `pathlib.Path`. Hex digests are lowercase 64-char strings.
## Invariants
- **Atomic write**: `write_atomic` writes to a temp file in the same directory as `path` and renames to `path` once the bytes are flushed. The rename is filesystem-level — partial files NEVER appear at `path`.
- **Sidecar format**: `write_atomic_and_sidecar` writes `<path>.sha256` containing ONLY the lowercase hex digest, no JSON wrapper, no trailing newline. Keeps verification trivial (`open(...).read().strip() == expected`).
- **Verify is independent**: `verify(path)` recomputes the digest from the file's bytes and compares to the sidecar; it does NOT trust the sidecar's value alone.
- **Aggregate hash is order-deterministic**: `aggregate_hash` sorts the input paths first (case-sensitive, full path) so two runs that read the same files always yield the same aggregate. The aggregate is the SHA-256 of the concatenation of `<filename>\0<file-hex-digest>\n` lines (in sorted order).
- **No upward imports** (Layer 1): the module imports ONLY from `_types`, `atomicwrites`, `hashlib`, `pathlib`, and stdlib. No `gps_denied_onboard.components.*` imports.
- **Production filesystem requirement**: the atomic rename is filesystem-level — works on POSIX local filesystems, not on NFS / SMB / overlayfs. The cache root MUST live on a local filesystem in production. Documented in the contract's Caveats section; not enforced at runtime (it would require an OS-specific check that adds no value when the deployment is locked).
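
The invariants above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the helper's implementation: it swaps the `atomicwrites` dependency for `tempfile.mkstemp` plus `os.replace`, exposes module-level functions instead of the `Sha256Sidecar` static methods, does not wrap `OSError` into `Sha256SidecarError`, and reads `<filename>` in the aggregate line as the path's base name.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import List


class Sha256SidecarError(Exception):
    """Raised for a missing or malformed sidecar."""


def write_atomic(path: Path, payload: bytes) -> str:
    # The temp file lives in the SAME directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic rename on POSIX local filesystems
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # never leave a partial temp file behind
        raise
    return hashlib.sha256(payload).hexdigest()


def write_atomic_and_sidecar(path: Path, payload: bytes) -> str:
    digest = write_atomic(path, payload)
    # Sidecar body is ONLY the lowercase hex digest, with no trailing newline.
    write_atomic(Path(str(path) + ".sha256"), digest.encode("ascii"))
    return digest


def verify(path: Path) -> bool:
    if not path.exists():
        return False
    sidecar = Path(str(path) + ".sha256")
    if not sidecar.exists():
        raise Sha256SidecarError(f"missing sidecar: {sidecar}")
    expected = sidecar.read_text(encoding="ascii").strip()
    if len(expected) != 64 or any(c not in "0123456789abcdef" for c in expected):
        raise Sha256SidecarError(f"malformed digest in {sidecar}")
    # Recompute from the payload bytes; the sidecar alone is never trusted.
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected


def aggregate_hash(paths: List[Path]) -> str:
    outer = hashlib.sha256()
    for p in sorted(paths, key=str):  # case-sensitive sort on the full path
        file_digest = hashlib.sha256(p.read_bytes()).hexdigest()
        outer.update(f"{p.name}\0{file_digest}\n".encode("utf-8"))
    return outer.hexdigest()
```

Sorting inside `aggregate_hash` rather than trusting caller order is what makes the Manifest hash reproducible across runs that enumerate the cache root differently.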
## Non-Goals
- Cryptographic signing — the sidecar protects against accidental corruption + file-replacement-after-staging, NOT against an attacker with write access. Threat model treats the operator workstation as trusted; the companion's write access is restricted to F4 (mid-flight tile gen) which has its own per-flight signing key path (out of scope for this helper).
- Streaming hashing of files larger than RAM — the helper's API takes `payload: bytes`, so the entire payload is in memory at write time. Files larger than RAM are out of scope (and outside the operational constraints of the cache root anyway).
- Compression / on-disk encoding — payload is written verbatim.
- Sidecar format versioning — there is no version byte; if the format ever changes, the verifier rejects the old format and forces a re-write.
## Versioning Rules
- **Breaking changes** (sidecar format changed, function renamed/removed, return type changed, atomicity invariant relaxed) require a new major version + a deprecation pass through C6, C7, C10, C11.
- **Non-breaking additions** (new helper function, new optional kwarg with safe default) require a minor version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-write-and-verify | random 1 MiB payload, write to tmp path, then `verify` | `verify` returns True; sidecar contains the hex digest of the payload | Round-trip happy path |
| valid-aggregate-deterministic | 3 files written with the helper, then `aggregate_hash` called twice with paths in different order | both calls return the same hex digest | Order-deterministic invariant |
| valid-atomic-no-partial | inject a fault between temp write and rename (e.g., raise `OSError` mid-write); call `verify` afterward | `path` does NOT exist (or pre-existing version unchanged); no partial file at the target name | Atomicity invariant |
| invalid-sidecar-mismatch | manually overwrite `path` with different bytes after the sidecar was written | `verify(path)` returns False | Independent verification |
| invalid-missing-sidecar | `verify` on a path whose `.sha256` was deleted | `Sha256SidecarError` raised mentioning the missing sidecar | Strict sidecar requirement |
| invalid-malformed-sidecar | sidecar contains `not a hex digest` | `Sha256SidecarError` raised mentioning malformed digest | Sidecar format strictness |
| invalid-missing-file-in-aggregate | `aggregate_hash` on a list including a non-existent path | `Sha256SidecarError` raised mentioning the missing path | Aggregate input validation |
| no-upward-imports | static import scan | only `_types`, `atomicwrites`, `hashlib`, `pathlib`, stdlib | Layer 1 invariant |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/common-helpers/05_helper_sha256_sidecar.md` | autodev decompose Step 2 |
@@ -0,0 +1,88 @@
# Contract: wgs_converter
**Component**: shared_helpers / `helpers.wgs_converter` (cross-cutting concern owned by E-CC-HELPERS / AZ-264)
**Producer task**: AZ-279 — `_docs/02_tasks/todo/AZ-279_wgs_converter.md`
**Consumer tasks**: every C4 pose-estimation task that compares pose-in-WGS to pose-in-ENU; every C5 state-estimator task that initialises the iSAM2 graph from a WGS origin; every C6 task that maps a tile bbox to lat/lon; every C8 task that encodes pose for FC emission; every C10 / C11 task that resolves a bbox to a tile-id list; every C12 task where the operator enters a bbox
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Centralise WGS84 ↔ local-tangent-plane (ENU) ↔ tile-pixel coordinate conversions. Required by every component that interacts with geographic positions. Per `_docs/02_document/common-helpers/04_helper_wgs_converter.md`. Backed by `pyproj` for the geodesy primitives; tile_xy math uses the standard slippy-map convention so it matches `satellite-provider`'s on-disk layout.
## Shape
### For function / method APIs
```python
import numpy as np

from gps_denied_onboard._types import BoundingBox, LatLonAlt


class WgsConverter:
    @staticmethod
    def latlonalt_to_ecef(p: LatLonAlt) -> np.ndarray: ...  # shape (3,)
    @staticmethod
    def ecef_to_latlonalt(p_ecef: np.ndarray) -> LatLonAlt: ...
    @staticmethod
    def latlonalt_to_local_enu(origin: LatLonAlt, p: LatLonAlt) -> np.ndarray: ...  # shape (3,)
    @staticmethod
    def local_enu_to_latlonalt(origin: LatLonAlt, p_enu: np.ndarray) -> LatLonAlt: ...
    @staticmethod
    def latlon_to_tile_xy(zoom: int, lat: float, lon: float) -> tuple[int, int]: ...
    @staticmethod
    def tile_xy_to_latlon_bounds(zoom: int, x: int, y: int) -> BoundingBox: ...
```
| Name | Signature | Throws / Errors | Blocking? |
|------|-----------|-----------------|-----------|
| `latlonalt_to_ecef` | `(LatLonAlt) -> np.ndarray (3,)` | `WgsConversionError` if lat / lon / alt are out of range | sync, pure |
| `ecef_to_latlonalt` | `(np.ndarray (3,)) -> LatLonAlt` | `WgsConversionError` on shape mismatch | sync, pure |
| `latlonalt_to_local_enu` | `(origin, p) -> np.ndarray (3,)` | `WgsConversionError` on origin / point validation | sync, pure |
| `local_enu_to_latlonalt` | `(origin, p_enu) -> LatLonAlt` | `WgsConversionError` on origin / shape | sync, pure |
| `latlon_to_tile_xy` | `(zoom, lat, lon) -> (int, int)` | `WgsConversionError` if zoom < 0 or > 22, lat out of `[-85.0511, 85.0511]`, lon out of `[-180, 180]` | sync, pure |
| `tile_xy_to_latlon_bounds` | `(zoom, x, y) -> BoundingBox` | `WgsConversionError` if `x` or `y` out of `[0, 2^zoom)` | sync, pure |
`LatLonAlt` and `BoundingBox` are imported from `gps_denied_onboard._types`. Numpy arrays use `dtype=float64`. `WgsConversionError` is the only exception type the public surface raises.
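The two tile functions follow the standard slippy-map formulas, which can be sketched with the standard library alone. This is an illustrative sketch, not the AZ-279 implementation: the module-level functions, the local `WgsConversionError` stand-in, and the `(south, west, north, east)` tuple returned in place of `BoundingBox` (whose field layout is not given here) are all assumptions.

```python
import math

MAX_ZOOM = 22
MAX_LAT = 85.0511  # Web-Mercator latitude limit quoted in the table above


class WgsConversionError(ValueError):
    """Stand-in for the contract's error type; its real base class is not specified."""


def latlon_to_tile_xy(zoom: int, lat: float, lon: float) -> tuple[int, int]:
    """Return the slippy-map tile (x, y) that contains the given lat/lon."""
    if not 0 <= zoom <= MAX_ZOOM:
        raise WgsConversionError(f"zoom {zoom} outside [0, {MAX_ZOOM}]")
    if not -MAX_LAT <= lat <= MAX_LAT:
        raise WgsConversionError(f"lat {lat} outside Web-Mercator range")
    if not -180.0 <= lon <= 180.0:
        raise WgsConversionError(f"lon {lon} outside [-180, 180]")
    n = 1 << zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # lon == 180 (or a latitude on the southern limit) can land on n itself;
    # keep both indices inside [0, n).
    return min(x, n - 1), min(y, n - 1)


def tile_xy_to_latlon_bounds(zoom: int, x: int, y: int) -> tuple[float, float, float, float]:
    """Return (south, west, north, east) in degrees for tile (x, y)."""
    if not 0 <= zoom <= MAX_ZOOM:
        raise WgsConversionError(f"zoom {zoom} outside [0, {MAX_ZOOM}]")
    n = 1 << zoom
    if not (0 <= x < n and 0 <= y < n):
        raise WgsConversionError(f"tile ({x}, {y}) outside [0, {n})")

    def lat_of_row(row: int) -> float:
        # Latitude of the northern edge of tile row `row`.
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return lat_of_row(y + 1), west, lat_of_row(y), east
```

Row indices grow southwards, so the northern edge of a tile belongs to row `y` and the southern edge to row `y + 1`. This is why the bounds of a tile returned by `latlon_to_tile_xy` contain the input point, as the `valid-tile-roundtrip-z18` case requires.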
## Invariants
- **Stateless**: no module-level state; static methods only. The static-only design satisfies the coderule.mdc constraint ("only use static methods for pure self-contained computations") because every operation is a pure mathematical function of its arguments.
- **WGS84 ellipsoid only**: all conversions use the WGS84 ellipsoid; no datum-shift logic. If a future deployment needs another datum, that is the point at which to switch to an instance-based factory.
- **Slippy-map tile convention**: `latlon_to_tile_xy` matches OSM / `satellite-provider`'s on-disk `{zoom}/{x}/{y}.jpg` layout. Latitude is restricted to the Web-Mercator-valid range `[-85.0511, 85.0511]`; values outside it are not silently clamped and raise `WgsConversionError`.
- **ENU sign convention**: `latlonalt_to_local_enu` returns `(east, north, up)` in metres. Origin altitude IS used (height above ellipsoid); zero altitude is NOT silently substituted.
- **Round-trip identity**: `local_enu_to_latlonalt(origin, latlonalt_to_local_enu(origin, p)) ≈ p` to within 1 m horizontally and 1 cm vertically (the tolerance used by the `valid-roundtrip-enu` test case) for `p` within 100 km of `origin`. Beyond 100 km the tangent-plane approximation degrades, so the contract guarantees this tolerance only inside that radius.
- **Zoom-level dependence**: `tile_xy_to_latlon_bounds` and `latlon_to_tile_xy` are sensitive to `zoom`; callers MUST pass the right zoom for the tile in question (typically `zoomLevel` from `TileMetadata`).
- **No upward imports** (Layer 1): the module imports ONLY from `_types`, `pyproj`, numpy, and stdlib. NO `gps_denied_onboard.components.*` imports.
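The ENU sign convention and the use of the origin's ellipsoidal altitude can be illustrated with a closed-form sketch. The contract states that the real helper is backed by `pyproj`; the version below deliberately swaps in the textbook geodetic-to-ECEF formula and ENU rotation so that it runs on the standard library alone. The function names and the plain `(lat_deg, lon_deg, alt_m)` tuples standing in for `LatLonAlt` are assumptions made for illustration.

```python
import math

WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared

Geodetic = tuple[float, float, float]  # (lat_deg, lon_deg, alt_m), hypothetical stand-in


def geodetic_to_ecef(p: Geodetic) -> tuple[float, float, float]:
    """Closed-form WGS84 geodetic -> ECEF, metres."""
    lat, lon, alt = math.radians(p[0]), math.radians(p[1]), p[2]
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt) * math.cos(lat) * math.cos(lon)
    y = (n + alt) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(origin: Geodetic, p: Geodetic) -> tuple[float, float, float]:
    """Return (east, north, up) of `p` in metres in the tangent plane at `origin`."""
    lat0, lon0 = math.radians(origin[0]), math.radians(origin[1])
    ox, oy, oz = geodetic_to_ecef(origin)  # origin altitude is used, not zeroed
    px, py, pz = geodetic_to_ecef(p)
    dx, dy, dz = px - ox, py - oy, pz - oz
    # Rotate the ECEF difference vector into the local east/north/up axes.
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up
```

A point directly above the origin maps to a pure `up` offset, and a point slightly to the north maps to a positive `north` value with a small negative `up` because the ellipsoid curves away from the tangent plane.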
## Non-Goals
- Datum-shift logic / non-WGS84 datums — out of scope for v1.0.0.
- UTM / MGRS conversions — out of scope.
- Geoid-height corrections (orthometric vs. ellipsoidal altitude) — out of scope; consumers using altitude do so under the ellipsoid convention or apply geoid correction themselves.
- Vincenty / great-circle distance helpers — out of scope.
- Coordinate transforms involving rotation (body-frame ↔ ECEF) — owned by `helpers.se3_utils` plus the per-deployment `CameraCalibration`.
## Versioning Rules
- **Breaking changes** (function renamed/removed, signature changed, ENU sign convention flipped, return shape changed) require a new major version + a deprecation pass through C4, C5, C6, C8, C10, C11, C12.
- **Non-breaking additions** (new helper function, new optional kwarg with safe default) require a minor version bump.
- Adding a new datum is a major version (the static-only design assumes WGS84).
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-roundtrip-ecef | `LatLonAlt(50.0, 30.0, 100.0)` | `ecef_to_latlonalt(latlonalt_to_ecef(p))` matches `p` within `atol=1e-9 deg, 1e-6 m` | Round-trip happy path |
| valid-roundtrip-enu | origin + point ~10 km away | `local_enu_to_latlonalt(origin, latlonalt_to_local_enu(origin, p))` matches `p` within 1 m horizontal + 1 cm vertical | ENU round-trip |
| valid-tile-roundtrip-z18 | `(zoom=18, lat=50.45, lon=30.52)` | `latlon_to_tile_xy` returns valid `(x, y)`; `tile_xy_to_latlon_bounds(zoom, x, y)` contains the input lat/lon | Slippy-map convention |
| valid-tile-bounds-z18 | `(zoom=18, x=148000, y=89400)` | bounds returned with non-zero area; corners at expected slippy-map lat/lon | Tile bounds |
| invalid-lat-out-of-range | lat = 95.0 in `latlon_to_tile_xy` | `WgsConversionError` mentions Web-Mercator latitude range | Slippy-map invariant |
| invalid-zoom-too-high | zoom = 25 | `WgsConversionError` mentions zoom range `[0, 22]` | Zoom guard |
| invalid-tile-xy-out-of-range | `(zoom=18, x=2^18, y=0)` | `WgsConversionError` mentions tile-xy range | Tile-xy guard |
| invalid-shape | `ecef_to_latlonalt(np.array([1.0, 2.0]))` (shape (2,)) | `WgsConversionError` mentions expected shape (3,) | Shape contract |
| no-upward-imports | static import scan | only `_types`, `pyproj`, numpy, stdlib | Layer 1 invariant |
| determinism | same input through any function twice | byte-equal outputs | Pure-function determinism |
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from `_docs/02_document/common-helpers/04_helper_wgs_converter.md` | autodev decompose Step 2 |
@@ -0,0 +1,84 @@
# Contract: log_record_schema
**Component**: shared_logging (cross-cutting concern owned by E-CC-LOG / AZ-245)
**Producer task**: AZ-266 — `_docs/02_tasks/todo/AZ-266_log_module.md`
**Consumer tasks**: every component task that emits logs (C1-C13 components, plus C12 operator tooling)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
## Purpose
Frozen, machine-parseable JSON envelope for every log record emitted by any onboard component. Stable field set + ordering is a hard requirement for FDR analysis tooling (`kind="log"` records are post-flight queryable) and for the contract test that verifies field-name + ordering invariants.
## Shape
### One JSON object per log line, UTF-8, no trailing comma, newline-terminated
```python
# Conceptual dataclass; the actual implementation may emit via orjson / python-json-logger
@frozen
class LogRecord:
    ts: str               # ISO 8601 UTC, microsecond precision, e.g. "2026-05-10T03:14:15.123456Z"
    level: str            # one of {"DEBUG", "INFO", "WARN", "ERROR"}; "WARN" deliberately replaces the Python stdlib levelname "WARNING"
    component: str        # component slug from module-layout.md, e.g. "c2_vpr", "c5_state", "shared.logging"
    frame_id: int | None  # monotonic per-flight frame counter; None for non-frame-correlated records (startup, shutdown, periodic)
    kind: str             # categorical tag, e.g. "vio.tick", "vpr.query", "fdr.write", "log.diag"
    msg: str              # human-readable short message, no PII, no stack traces (those go in `exc`)
    kv: dict[str, Any]    # arbitrary structured key-value payload, JSON-safe scalars + nested dict/list only
    exc: str | None       # optional formatted exception traceback for ERROR/WARN; None otherwise
```
| Field | Type | Required | Description | Constraints |
|-------|------|----------|-------------|-------------|
| `ts` | string (ISO 8601 UTC, µs) | yes | Emit timestamp | RFC 3339 with `Z` suffix |
| `level` | string | yes | Log level | strictly one of `DEBUG`, `INFO`, `WARN`, `ERROR` |
| `component` | string | yes | Origin component slug | snake_case, must match a module-layout entry or `shared.<name>` |
| `frame_id` | integer or null | no | Per-flight monotonic frame index | non-negative when present |
| `kind` | string | yes | Record category tag | dotted snake_case, max 64 chars |
| `msg` | string | yes | Human message | no embedded newlines (use `kv` for multi-line context) |
| `kv` | object | yes (may be `{}`) | Structured key-value payload | JSON-safe scalars + nested objects/arrays |
| `exc` | string or null | no | Exception traceback for ERROR/WARN | absent or `null` for INFO/DEBUG |
### Field ordering (REQUIRED — verified by contract test)
`ts, level, component, frame_id, kind, msg, kv, exc` — formatter MUST emit keys in this order. Re-ordering breaks downstream column-aligned parsers used by FDR tooling.
## Invariants
- Every record is a single JSON object on a single line (newline-terminated, no embedded newlines in any field value).
- `level` value uses `WARN` not `WARNING` (intentional, simpler grep target).
- `frame_id` is left as `null` when the emitter has no current frame context; a value is never invented.
- `kv` values must be JSON-serialisable without custom encoders; binary payloads are base64-encoded strings within `kv`.
- `exc` is present only for `level in {WARN, ERROR}` records that originated from an exception; otherwise it is `null` or absent.
- The schema is strictly additive — no field is ever removed or renamed without a major version bump and a matching FDR record-schema migration in E-CC-FDR-CLIENT.
## Non-Goals
- This contract does not define WHAT to log (per-component § 9 sections own that).
- This contract does not define log routing (stdout vs journald vs FDR — see handler topology in E-CC-LOG epic).
- This contract does not define structured event types — `kind` is a free-form tag, not a closed enum.
## Versioning Rules
- **Breaking changes** (field renamed/removed, type changed, ordering changed, level enum reduced) require a new major version + a deprecation pass through every consumer.
- **Non-breaking additions** (new optional field appended at the end of the order, new `kind` tag, new `level` value) require a minor version bump.
- The contract test (`tests/contract/log_schema.py`) MUST be updated alongside any version bump.
## Test Cases
| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| valid-info-no-frame | `level=INFO, component="c2_vpr", kind="vpr.warmup", msg="loaded model", kv={"model": "salad"}` | accepted; `frame_id=null`, `exc=null`; field order matches spec | Startup-time INFO record |
| valid-warn-with-frame | `level=WARN, component="c5_state", frame_id=4321, kind="state.cov_spike", msg="covariance jumped 5x", kv={"jump_factor": 5.2}` | accepted; key order locked; FDR bridge MUST forward this record | Cross-cuts AC: WARN flows into FDR |
| valid-error-with-exc | `level=ERROR, component="c11_tilemanager", kind="tile.upload_fail", msg="HTTP 503", kv={"tile": "z18/x12345/y67890"}, exc="Traceback (most recent call last):..."` | accepted; `exc` present and non-null; FDR bridge MUST forward | Cross-cuts AC: ERROR + exc captured |
| invalid-bad-level | `level="WARNING"` | rejected with `LogSchemaError` (or formatter logs at ERROR and drops record) | Contract test enforces `WARN` not `WARNING` |
| invalid-multiline-msg | `msg="line1\nline2"` | rejected OR newline replaced with `\\n` literal — formatter must guarantee single-line output | One JSON object per line invariant |
| invalid-non-serialisable-kv | `kv={"obj": <numpy.ndarray>}` | rejected with `LogSchemaError` (caller must convert to list before passing) | JSON-safe-only invariant |
| ordering-stable | any valid record | emitted JSON keys appear in `ts, level, component, frame_id, kind, msg, kv, exc` order regardless of construction order | Contract test parses raw bytes and asserts key order |
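The `ordering-stable` case depends on reading key order from the raw bytes rather than from a parsed dict. One way to do that, sketched here with hypothetical helper names, is `json.loads` with `object_pairs_hook=list`, which hands back the top-level object as a list of `(key, value)` pairs in source order. The sketch assumes that fields marked as not required may be absent, so it checks relative order rather than demanding all eight keys.

```python
import json

EXPECTED_ORDER = ["ts", "level", "component", "frame_id", "kind", "msg", "kv", "exc"]


def key_order(raw_line: bytes) -> list[str]:
    """Return the top-level keys of one log line in the order they were emitted."""
    # With object_pairs_hook=list every JSON object decodes to a list of
    # (key, value) pairs, so the emitted order survives parsing.
    pairs = json.loads(raw_line.decode("utf-8"), object_pairs_hook=list)
    return [key for key, _ in pairs]


def keys_in_contract_order(raw_line: bytes) -> bool:
    """True if the line is newline-terminated and its keys follow the contract order."""
    if not raw_line.endswith(b"\n") or raw_line.count(b"\n") != 1:
        return False
    keys = key_order(raw_line)
    # Filtering the expected order down to the keys present rejects both
    # re-ordered and unknown keys while tolerating absent optional fields.
    return [name for name in EXPECTED_ORDER if name in keys] == keys
```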
## Change Log
| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract derived from E-CC-LOG epic (AZ-245) | autodev decompose Step 2 |