mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 21:41:12 +00:00
5adf3dd04f
Re-design replay mode per user direction: replay is no longer a fourth Docker image with a reduced component set, but a `config.mode = "replay"` branch of the single airborne binary. The pre-flight workflow (route in suite UI -> C12 tile download via real satellite-provider -> C10 manifest+engines build) is identical between live and replay; only three strategies swap at compose time: FrameSource: Live <-> Video FcAdapter: Pymavlink/MSP2 <-> TlogReplay MavlinkTransport: Serial <-> Noop The C8 outbound MAVLink encoders run unchanged in both modes; their bytes hit `NoopMavlinkTransport` in replay and disappear. A new `JsonlReplaySink` taps C5's `EstimatorOutput` stream so the parent-suite UI sees per-tick coordinates by tailing `results.jsonl`. MAVLink 2.0 signing key remains mandatory (operator supplies a dummy file). A new `replay_input/` Layer-4 cross-cutting coordinator owns `(video, tlog) -> (FrameSource, FcAdapter, Clock)` convergence; the composition root sees only standard interfaces past `.open()`. Docs: - architecture.md: new ADR-011 with full rationale; ADR-002 binary narrative updated. - contracts/replay/replay_protocol.md: bumped to v2.0.0; 12 invariants (notably mode-agnosticism + encoder byte-equality + signing key mandatory + real C6 cache in replay). - module-layout.md: Build-Time Exclusion Map dropped from 4 to 3 binary columns; replay-mode `BUILD_*` flags default ON in airborne; `shared/replay_input` cross-cutting entry added. - epics.md: E-DEMO-REPLAY scope reframed; story points 27-32 -> 19-24. Task respecs: - AZ-401: shrunk 3 -> 2 pts; `compose_root` mode branch + JSONL sink + NoopMavlinkTransport wiring; legacy `compose_replay` export deleted. - AZ-402: console-script wrapper that mutates `config.mode = "replay"` and dispatches into the shared airborne main; `--mavlink-signing-key` mandatory. - AZ-403: CANCELLED. 
Moved to done/ with banner; Jira transition deferred via `_docs/_process_leftovers/2026-05-14_az_403_cancellation_pending_tracker.md`. - AZ-404: AC-4 reworded as mode-agnosticism AST scan + encoder byte-equality test; new AC-8 operator-workflow rehearsal. - AZ-405: also owns the `replay_input/` module + `ReplayInputAdapter`. _dependencies_table.md updated: AZ-401 gains AZ-405 dep; AZ-404 drops AZ-403 dep; AZ-403 row marked CANCELLED. Co-authored-by: Cursor <cursoragent@cursor.com>
2291 lines
91 KiB
Markdown

# Work-Item Epics — gps-denied-onboard Plan cycle 1

This file is the local epic draft for Plan Step 6. Tracker IDs (`AZ-XXX`) are now populated for every epic — they live in Jira project `AZ`. The canonical `E-*` ↔ `AZ-NN` mapping below is the source of truth referenced from each Jira epic's description.

## Conventions

- **Issue type**: Epic.
- **Epic descriptions are self-contained** per the plan-skill rule: a developer reading only the epic should understand the full context. Each epic has the 14 required sections (system context, problem, scope, architecture notes, interface spec, data flow, dependencies, AC, NFRs, risks, effort, child issues, key constraints, testing strategy).
- **Effort sizing**: T-shirt size + story-points range for the epic. Per-task story points (PBI complexity) follow the user rule: the scale is 1, 2, 3, 5, 8, but no single PBI may exceed 5 points; most PBIs should be 2 or 3, with 5 used sparingly.
- **Cross-cutting epics** parent exactly one shared implementation task; component epics consuming the concern declare a dependency and never re-implement it locally.
- **Dependency rule**: no epic depends on a later one in this index.
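
The sizing and dependency-order conventions are mechanical enough to lint. Below is a minimal Python sketch. It assumes the index has already been parsed into `(epic_id, depends_on)` pairs in row order and the child issues into `(title, points)` pairs. The function names are hypothetical; nothing like this is claimed to exist in the repo.

```python
"""Hypothetical lint for the epic-index conventions (sketch, not repo code)."""
from __future__ import annotations

ALLOWED_POINTS = {1, 2, 3, 5, 8}  # the story-point scale
MAX_PBI_POINTS = 5                # ...but no single PBI may exceed 5


def check_pbi_points(pbis: list[tuple[str, int]]) -> list[str]:
    """Return one message per PBI whose estimate breaks the sizing rule."""
    problems = []
    for title, pts in pbis:
        if pts not in ALLOWED_POINTS:
            problems.append(f"{title}: {pts} is not on the 1/2/3/5/8 scale")
        elif pts > MAX_PBI_POINTS:
            problems.append(f"{title}: {pts} pts exceeds the {MAX_PBI_POINTS}-pt PBI cap; split it")
    return problems


def check_dependency_order(epics: list[tuple[str, list[str]]]) -> list[str]:
    """Return one message per dependency on a later (or unknown) epic in index order."""
    position = {epic_id: i for i, (epic_id, _) in enumerate(epics)}
    problems = []
    for i, (epic_id, deps) in enumerate(epics):
        for dep in deps:
            if dep not in position:
                problems.append(f"{epic_id}: unknown dependency {dep}")
            elif position[dep] >= i:  # same row or a later row breaks the rule
                problems.append(f"{epic_id}: depends on later epic {dep}")
    return problems
```

If you feed it rows 16 and 17 of the index, the order check flags E-C4 → E-C5. The index itself annotates that pair as co-developed.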

## Decompose-time amendment (cycle 1, dated 2026-05-10)

Row 20 (E-CC-HELPERS / AZ-264) was added during Decompose Step 2 to comply with the cross-cutting rule. The 8 shared helpers (`ImuPreintegrator`, `SE3Utils`, `LightGlueRuntime`, `WgsConverter`, `Sha256Sidecar`, `EngineFilenameSchema`, `RansacFilter`, `DescriptorNormaliser`) were originally listed as child issues inside their largest-consumer component epics (e.g., `ImuPreintegrator` under E-C1 child #5, `LightGlueRuntime` under E-C2.5 child #2). Those child-issue listings are now superseded — helper ownership moves to E-CC-HELPERS, and component epics consume helpers as dependencies. The original component epic descriptions in Jira still reference the helpers in their child-issue tables; those will be reconciled at the next epic-edit pass (or at Step 4 cross-verification).

## Index

| # | Epic ID | Title | Type | Tracker | T-shirt | Story Pts | Depends on |
|---|---------|-------|------|---------|---------|-----------|------------|
| 1 | E-BOOT | Bootstrap & Initial Structure | bootstrap | AZ-244 | M | 13–21 | — |
| 2 | E-CC-LOG | Cross-Cutting: Structured JSON Logging | cross-cutting | AZ-245 | S | 5–8 | E-BOOT |
| 3 | E-CC-CONF | Cross-Cutting: Configuration & Composition Root | cross-cutting | AZ-246 | S | 5–8 | E-BOOT |
| 4 | E-CC-FDR-CLIENT | Cross-Cutting: FDR Producer Client (lock-free queue + record schema) | cross-cutting | AZ-247 | M | 8–13 | E-BOOT, E-CC-LOG |
| 5 | E-C13 | C13 Flight Data Recorder (writer thread + segments + cap) | component | AZ-248 | L | 21–34 | E-BOOT, E-CC-LOG, E-CC-CONF, E-CC-FDR-CLIENT |
| 6 | E-C7 | C7 On-Jetson Inference Runtime | component | AZ-249 | L | 21–34 | E-BOOT, E-CC-CONF, E-CC-FDR-CLIENT |
| 7 | E-C6 | C6 Tile Cache + Spatial Index | component | AZ-250 | M | 13–21 | E-BOOT, E-CC-LOG, E-CC-CONF |
| 8 | E-C11 | C11 Tile Manager (TileDownloader + TileUploader) | component | AZ-251 | M | 13–21 | E-C6, E-CC-CONF, E-CC-LOG |
| 9 | E-C10 | C10 Pre-flight Cache Provisioning | component | AZ-252 | M | 13–21 | E-C6, E-C7, E-CC-LOG |
| 10 | E-C12 | C12 Operator Pre-flight Orchestrator | component | AZ-253 | M | 13–21 | E-C10, E-C11, E-CC-LOG |
| 11 | E-C1 | C1 Visual / Visual-Inertial Odometry | component | AZ-254 | XL | 34–55 | E-BOOT, E-CC-FDR-CLIENT, E-C7 |
| 12 | E-C2 | C2 Visual Place Recognition | component | AZ-255 | L | 21–34 | E-C6, E-C7, E-CC-FDR-CLIENT |
| 13 | E-C2.5 | C2.5 Inlier-based Re-rank | component | AZ-256 | S | 5–8 | E-C2, E-C7, E-C6 (LightGlue helper shared with C3) |
| 14 | E-C3 | C3 Cross-Domain Matcher | component | AZ-257 | L | 21–34 | E-C2.5, E-C7 |
| 15 | E-C3.5 | C3.5 AdHoP-Conditional Refinement | component | AZ-258 | M | 8–13 | E-C3, E-C7 |
| 16 | E-C4 | C4 Pose Estimator | component | AZ-259 | M | 13–21 | E-C3.5, E-C5 (shared GTSAM substrate; co-developed) |
| 17 | E-C5 | C5 State Estimator | component | AZ-260 | XL | 34–55 | E-C1, E-C4 (shared graph), E-CC-FDR-CLIENT |
| 18 | E-C8 | C8 FC + GCS Adapter | component | AZ-261 | L | 21–34 | E-C5, E-CC-CONF, E-CC-LOG |
| 19 | E-BBT | Blackbox Tests (FT/NFT scenarios) | tests | AZ-262 | M | 13–21 | every component epic ships its component-internal tests under its own epic; this one parents the suite-level FT/NFT scenarios in `_docs/02_document/tests/*.md` |
| 20 | E-CC-HELPERS | Cross-Cutting: Common Helpers (8 shared utilities) | cross-cutting | AZ-264 | M | 13–21 | E-BOOT, E-CC-LOG (added in Decompose Step 2 — supersedes per-component helper child-issues from cycle 1) |
| 21 | E-DEMO-REPLAY | Offline replay mode (video + tlog → per-tick coordinate stream) — configuration of the airborne binary (ADR-011), NOT a separate image | feature | AZ-265 | M | 19–24 | E-C1, E-C2, E-C2.5, E-C3, E-C3.5, E-C4, E-C5, E-C6, E-C8, E-CC-CONF (added in Decompose Step 2 — enables parent-suite UI demo via subprocess + JSONL streaming) |

## High-level component dependency diagram

```mermaid
flowchart TB
BOOT[E-BOOT Bootstrap]
LOG[E-CC-LOG Logging]
CONF[E-CC-CONF Config + Composition Root]
FDRC[E-CC-FDR-CLIENT FDR Producer Client]
C13[E-C13 FDR]
C7[E-C7 Inference Runtime]
C6[E-C6 Tile Cache]
C11[E-C11 Tile Manager]
C10[E-C10 Cache Provisioning]
C12[E-C12 Operator Tooling]
C1[E-C1 VIO]
C2[E-C2 VPR]
C25[E-C2.5 Re-rank]
C3[E-C3 Matcher]
C35[E-C3.5 AdHoP]
C4[E-C4 Pose]
C5[E-C5 State]
C8[E-C8 FC Adapter]
BBT[E-BBT Blackbox Tests]
HELP[E-CC-HELPERS Common Helpers]
DEMO[E-DEMO-REPLAY Offline Replay Mode]

BOOT --> LOG --> FDRC --> C13
BOOT --> CONF --> C13
BOOT --> CONF --> C7
BOOT --> LOG --> HELP
C13 -.-> C7
CONF --> C6 --> C11
C6 --> C10
C7 --> C10
C10 --> C12
C11 --> C12
C7 --> C2 --> C25 --> C3 --> C35 --> C4
C6 --> C2
C6 --> C25
C1 --> C5
C4 <--> C5
C5 --> C8
FDRC --> C1
FDRC --> C5
C8 --> BBT
C12 --> BBT
HELP -.-> C1
HELP -.-> C2
HELP -.-> C25
HELP -.-> C3
HELP -.-> C35
HELP -.-> C4
HELP -.-> C5
HELP -.-> C6
HELP -.-> C7
HELP -.-> C8
HELP -.-> C10
HELP -.-> C11
HELP -.-> C12
C1 --> DEMO
C5 --> DEMO
C8 --> DEMO
CONF --> DEMO
```

---

## E-BOOT — Bootstrap & Initial Structure

**Tracker**: AZ-244
**Type**: bootstrap
**T-shirt**: M | **Story points**: 13–21
**Owner**: onboard team

### System context

```mermaid
flowchart LR
EBOOT[E-BOOT scaffolding] --> SRC[src/ component dirs]
EBOOT --> CICD[CI Tier-1 + Tier-2 jobs]
EBOOT --> DOCKER[docker-compose.test.yml]
EBOOT --> DB[Postgres init scripts]
EBOOT --> TESTROOT[tests/ + tests/fixtures/]
```

### Problem / Context

No source layout exists yet. Every downstream epic assumes a defined repo skeleton: `src/components/<id>_<name>/`, `src/shared/<concern>/`, `tests/`, `tests/fixtures/`, plus the Tier-1 Docker compose, the Tier-2 CI job, the Postgres init scripts that match `data_model.md`, and the operator-orchestrator tarball build path. Until this exists, no other epic can start.

### Scope

**In scope**:
- Create `src/components/<id>_<name>/` for all 14 components with empty package init.
- Create `src/shared/{logging,config,fdr_client,crypto,calibration_loader}/` placeholders.
- `pyproject.toml` (Python) + `CMakeLists.txt` (C++ where used by C1) with the project's pinned dep set.
- Tier-1 `docker-compose.test.yml` skeleton (companion + Postgres + e2e-runner; mock-suite-sat-service compose pulled in only by upload tests).
- Tier-2 CI job that runs on the bench Jetson runner, with the JetPack 6.2 / TRT 10.3 / SM 87 image pinned per ADR-005.
- Postgres init scripts for the schema in `data_model.md`.
- `tests/` directory with `tests/fixtures/`, `tests/tmp/`, `tests/conftest.py`.
- Minimal `runtime_root.py` for the airborne composition root (it exits with a "no components configured" message until components are wired) + `operator_tool/__main__.py` for the operator side.
- `.gitignore` covering binaries, engine caches, FDR segments, ephemeral keys.
- README with run commands.

**Out of scope**:
- Any per-component logic (each component's epic owns its own implementation).
- Cross-cutting impl (logging / config / FDR client live in their own epics).

### Architecture notes

- ADR-005 (Tier-1 / Tier-2 are first-class) drives the CI split.
- ADR-009 (composition root) places `runtime_root.py` at the airborne entrypoint and `operator_tool/__main__.py` at the operator side.
- ADR-002 (build-time exclusion) requires per-implementation CMake `BUILD_*` flags and the SBOM diff to be wired in CI from day one.
- ADR-004 (process isolation) requires the airborne build target to refuse `c11_tilemanager/` symbols. The SBOM diff hook lives here from Bootstrap onward.

### Interface specification

This epic exposes no runtime interface; it ships repository scaffolding only.

### Data flow

N/A.

### Dependencies

- Epic dependencies: none.
- External: GitHub Actions runner pool (Tier-1 Docker), bench Jetson runner (Tier-2), pinned base images (JetPack 6.2, Postgres 16, mcr.microsoft.com/dotnet/aspnet:8.0-alpine for the test fixture).

### Acceptance criteria

- `docker compose -f docker-compose.test.yml up -d` brings up companion + Postgres + e2e-runner cleanly on a fresh workstation.
- Tier-2 CI smoke-job (`echo $JETPACK_VERSION` + `nvidia-smi`) passes on the bench Jetson.
- `pytest tests/ -q --collect-only` discovers the empty `tests/` tree without errors.
- The SBOM diff CI step exists and fails the build if `c11_tilemanager` ever appears in the airborne `production-binary` artifact (R02 enforcement seed).
- `runtime_root.py` runs and exits cleanly with a "no components configured" message (proves composition root wiring).
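
Here is one way the R02 SBOM gate could look. The sketch assumes the airborne build emits a CycloneDX-style JSON SBOM with a top-level `components` array; this epic does not specify the SBOM format or tooling. All function names are hypothetical.

```python
from __future__ import annotations

import json
import sys

# Component names that must never appear in the airborne artifact (ADR-004 / R02).
FORBIDDEN_AIRBORNE = ("c11_tilemanager",)


def component_names(sbom: dict) -> set[str]:
    """Names listed in a CycloneDX-style SBOM (top-level ``components`` array)."""
    return {c.get("name", "") for c in sbom.get("components", [])}


def forbidden_components(sbom: dict, forbidden: tuple[str, ...] = FORBIDDEN_AIRBORNE) -> list[str]:
    """Sorted component names containing any forbidden substring."""
    return sorted(n for n in component_names(sbom) if any(f in n for f in forbidden))


def added_components(current: dict, baseline: dict) -> set[str]:
    """The 'diff' part: components present now but absent from a baseline SBOM."""
    return component_names(current) - component_names(baseline)


def main(sbom_path: str) -> int:
    """CI entry point: a return value of 1 should fail the job."""
    with open(sbom_path, encoding="utf-8") as fh:
        sbom = json.load(fh)
    hits = forbidden_components(sbom)
    if hits:
        print(f"R02 violation: airborne SBOM contains {hits}", file=sys.stderr)
        return 1
    return 0
```

Substring matching is deliberately coarse. Any package whose name merely contains `c11_tilemanager` also trips the gate, which is the safer failure mode for an isolation check.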

### Non-functional requirements

- CI cold-build wall-clock ≤ 10 min on Tier-1; ≤ 6 min on Tier-2 (just the smoke-job).
- Repo size at this stage ≤ 5 MB (no fixtures committed).

### Risks & mitigations

- **R12** (single deployment camera) — Bootstrap's CI must not assume the unit is plugged in; Tier-2 smoke-job runs without the camera, only against TRT/SM/JP version.

### Effort

T-shirt M; 13–21 story points across child PBIs (each ≤ 5 points).

### Child issues (PBIs)

| # | Title | Pts |
|---|-------|-----|
| 1 | Repo scaffolding: `src/components/`, `src/shared/`, `tests/`, `runtime_root.py` | 2 |
| 2 | `pyproject.toml` + `CMakeLists.txt` with pinned deps | 3 |
| 3 | Tier-1 `docker-compose.test.yml` skeleton + Postgres init | 3 |
| 4 | Tier-2 CI smoke-job on bench Jetson | 3 |
| 5 | SBOM diff CI step (R02 enforcement seed; fails on `c11_tilemanager` in airborne artifact) | 3 |
| 6 | `.gitignore` + `README.md` + run commands | 2 |
| 7 | `runtime_root.py` minimum (compose root + "no components configured" exit path) | 2 |

### Key constraints

- RESTRICT-HW-1 (Jetson Orin Nano Super, 8 GB shared LPDDR5, 25 W) — Tier-2 image pins SM 87 / JP 6.2 / TRT 10.3.
- RESTRICT-FC-1 (AP + iNav supported; PX4 out of scope) — composition root wires only AP + iNav adapters.

### Testing strategy

- CI smoke tests on every PR (Tier-1 compose-up, Tier-2 nvidia-smi).
- No unit tests yet — those live in component epics.

---

## E-CC-LOG — Cross-Cutting: Structured JSON Logging

**Tracker**: AZ-245
**Type**: cross-cutting
**T-shirt**: S | **Story points**: 5–8

### System context

Every component's `§ 9 Logging Strategy` mandates structured JSON logging at ERROR / WARN / INFO / DEBUG levels with per-frame fields (`frame_id`, `kind`, component-specific keys). A single shared logger module under `src/shared/logging/` produces these records; every component imports it.

```mermaid
flowchart LR
COMP[Any component] --> LOGGER[src/shared/logging<br/>structured JSON]
LOGGER --> STDOUT[stdout / journald]
LOGGER --> FDR["FDR (via E-CC-FDR-CLIENT for ERROR + WARN)"]
```

### Problem / Context

If every component rolls its own logger, format drift is guaranteed. The `traceability-matrix` and post-flight FDR analysis rely on a stable JSON schema, and a single shared logger is the only reliable way to guarantee it.

### Scope

**In scope**:
- `src/shared/logging/__init__.py` exporting `get_logger(component_id: str) -> Logger`.
- JSON formatter with stable field ordering (`ts, level, component, frame_id, kind, msg, ...kv`).
- Drop-in `RotatingStdoutHandler` for Tier-1 dev; `JournaldHandler` for Tier-2 production.
- Bridge into the FDR client for ERROR + WARN levels (handler subscribes to log records and enqueues a `kind = "log"` FdrRecord).
- Helpers for the documented per-frame log shapes (`vio.frame_id`, `vpr.top10_distances`, etc.) so component code is short.

**Out of scope**: per-component log content (lives in each component epic's child PBIs).

### Architecture notes

Stdlib `logging` + `python-json-logger` (or `orjson` formatter for speed). No new dependency beyond what's already in `pyproject.toml`. No third-party log aggregator — Tier-1 uses Docker stdout capture; Tier-2 uses journald.

### Interface specification

```python
import logging

def get_logger(component_id: str) -> logging.Logger: ...

class StructuredJsonHandler(logging.Handler):
    """JSON formatter + FDR bridge for ERROR/WARN."""

class FdrLogBridge:
    """Subscribed by the logger; forwards ERROR + WARN to E-CC-FDR-CLIENT.enqueue."""
```
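
The following is a minimal stdlib-only sketch of the logger shape described above. A formatter emits the fixed leading fields in order and appends any `extra=` key/values after them, sorted. A handler forwards only WARNING and above to an injected FDR `enqueue` callable. The `WARNING` → `WARN` mapping, the `payload` key, and the injected callable are assumptions of this sketch, not the shipped API.

```python
import json
import logging
import sys

# Attribute names every LogRecord carries; anything else arrived via ``extra=``.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message", "asctime"}
_FIXED_EXTRAS = ("frame_id", "kind")  # extras promoted to fixed leading fields
_LEVEL_NAMES = {"WARNING": "WARN"}    # stdlib says WARNING; the documented schema says WARN


def _level(record: logging.LogRecord) -> str:
    return _LEVEL_NAMES.get(record.levelname, record.levelname)


def _extras(record: logging.LogRecord) -> dict:
    return {k: v for k, v in vars(record).items()
            if k not in _STANDARD_ATTRS and k not in _FIXED_EXTRAS}


class OrderedJsonFormatter(logging.Formatter):
    """ts, level, component, frame_id, kind, msg first; other extras sorted after."""

    def format(self, record: logging.LogRecord) -> str:
        out = {
            "ts": record.created,
            "level": _level(record),
            "component": record.name,
            "frame_id": getattr(record, "frame_id", None),
            "kind": getattr(record, "kind", None),
            "msg": record.getMessage(),
        }
        for key, value in sorted(_extras(record).items()):
            out.setdefault(key, value)  # an extra can never overwrite a fixed field
        return json.dumps(out, default=str)  # dicts keep insertion order, so field order is stable


class FdrBridgeHandler(logging.Handler):
    """Forward WARNING+ records to an injected FDR ``enqueue`` callable as kind="log"."""

    def __init__(self, enqueue) -> None:
        super().__init__(level=logging.WARNING)  # INFO and DEBUG never reach the FDR
        self._enqueue = enqueue

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._enqueue({
                "kind": "log",
                "level": _level(record),
                "component": record.name,
                "msg": record.getMessage(),
                "payload": _extras(record),
            })
        except Exception:
            self.handleError(record)  # stdlib convention: logging failures never raise into the caller


def get_logger(component_id: str, fdr_enqueue=None) -> logging.Logger:
    logger = logging.getLogger(component_id)
    if not logger.handlers:  # configure each named logger only once
        stdout = logging.StreamHandler(sys.stdout)
        stdout.setFormatter(OrderedJsonFormatter())
        logger.addHandler(stdout)
        if fdr_enqueue is not None:
            logger.addHandler(FdrBridgeHandler(fdr_enqueue))
        logger.setLevel(logging.DEBUG)
        logger.propagate = False
    return logger
```

The bridge handler's `WARNING` level is what keeps INFO and DEBUG out of the FDR. The logger still emits those records to stdout, but `Logger.callHandlers` skips any handler whose level is above the record's.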

### Data flow

```mermaid
sequenceDiagram
participant C as Component
participant L as Logger
participant S as stdout
participant F as FDR Client
C->>L: log.warning("VPR top-1 above threshold", extra={"distance": 0.42})
L->>S: {"level":"WARN", "component":"c2", ...}
L->>F: enqueue(kind="log", level="WARN", payload=...)
```

### Dependencies

- Depends on E-BOOT.
- External: `python-json-logger` or `orjson` (whichever is already pinned).

### Acceptance criteria

- Every component test that asserts a log message uses the shared logger and finds the expected JSON shape.
- ERROR + WARN records appear in FDR with `kind = "log"` and a back-reference to the originating component.
- INFO + DEBUG do NOT appear in FDR (per-component § 9 storage rule).
- Log format passes a contract test (`tests/contract/log_schema.py`) verifying field names + ordering + required keys.

### Non-functional requirements

- Per-record latency p99 ≤ 0.2 ms (lock-free emit on the hot path).
- No allocation in the steady-state DEBUG path beyond the message string itself.

### Risks & mitigations

- **R13** (FDR queue overrun) — the FDR bridge uses E-CC-FDR-CLIENT's drop-oldest semantics; it never blocks the caller.

### Effort

T-shirt S; 5–8 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `src/shared/logging/` module + JSON formatter + handlers | 3 |
| 2 | FDR log bridge (ERROR + WARN → kind=log) | 2 |
| 3 | Contract test `tests/contract/log_schema.py` | 2 |

### Key constraints

- AC-NEW-3 (FDR ≤ 64 GB / flight, no silent drops): INFO and DEBUG must not flow into the FDR; verified by the contract test.

### Testing strategy

- Unit tests for the formatter (field ordering + escaping).
- Contract test against the FDR record schema (kind=log).
- Integration via every component's tests.md (each component asserts at least one log message).

---

## E-CC-CONF — Cross-Cutting: Configuration & Composition Root

**Tracker**: AZ-246
**Type**: cross-cutting
**T-shirt**: S | **Story points**: 5–8

### System context

ADR-001 (runtime selection by config) + ADR-009 (composition root) together require a single shared loader that materialises the `Config` object at process startup, plus a `compose_root(config)` function that constructs each strategy/component instance with its dependencies. No component instantiates another component itself.

```mermaid
flowchart LR
ENV[ENV vars] --> LOADER
YAML[config.yaml] --> LOADER
CALIB[Camera calibration JSON] --> LOADER
LOADER[src/shared/config/loader] --> ROOT[runtime_root.py / operator_tool/__main__.py]
ROOT --> COMPS[component instances]
```

### Problem / Context

Without a single source of truth for configuration, the `BUILD_*` flag and runtime strategy-selection rules of ADR-001/002/009 collapse: components silently fall back to defaults, and the composition root grows local config-parsing logic that drifts. The CI gate that ensures only the *linked* strategies are selectable also lives here.

### Scope

**In scope**:
- `src/shared/config/loader.py`: env + YAML + camera-calibration JSON merging with explicit precedence (env > YAML > defaults).
- `Config` dataclass (frozen) covering every component's startup knob.
- `compose_root(config) -> RuntimeRoot` for the airborne process; `compose_operator(config) -> OperatorRoot` for the tooling side.
- Strategy-vs-build-flag consistency check at startup: refuse to start if config selects a strategy whose `BUILD_*` flag was off in the linked binary.

**Out of scope**: any component's specific config shape (defined inside its own epic).

### Architecture notes

- ADR-001, ADR-002, ADR-009 all converge here.
- The composition root is the only place `import` of a concrete `VioStrategy` / `VprStrategy` / etc. is allowed; component code imports the abstract interface only.

### Interface specification

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Config: ...  # populated by union of every component's config schema

def load_config(env: dict[str, str], paths: list[Path]) -> Config: ...

def compose_root(config: Config) -> RuntimeRoot: ...
def compose_operator(config: Config) -> OperatorRoot: ...
```
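
Here is a sketch of the two startup checks. It assumes flat config keys, a hypothetical `GPSD_` env-var prefix (the real naming is not specified here), and YAML that has already been parsed into a dict. Camera-calibration JSON merging is left out.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

ENV_PREFIX = "GPSD_"  # hypothetical prefix, for illustration only


class StrategyNotLinkedError(RuntimeError):
    """Config selected a strategy whose BUILD_* flag was off in this binary."""


def merge_config(defaults: Mapping[str, Any], yaml_layer: Mapping[str, Any],
                 env: Mapping[str, str]) -> dict[str, Any]:
    """Flat-key merge with precedence env > YAML > defaults.

    Only keys declared in ``defaults`` are accepted, so a YAML typo fails loudly.
    Env values arrive as strings and are coerced to the default's type, which
    means every default must be a concrete (non-None) value.
    """
    unknown = set(yaml_layer) - set(defaults)
    if unknown:
        raise KeyError(f"unknown config keys in YAML: {sorted(unknown)}")
    merged = dict(defaults)
    merged.update(yaml_layer)  # YAML overrides defaults
    for key, default in defaults.items():
        raw = env.get(ENV_PREFIX + key.upper())
        if raw is None:
            continue
        if isinstance(default, bool):  # bool("false") would be True, so parse explicitly
            merged[key] = raw.strip().lower() in ("1", "true", "yes", "on")
        else:
            merged[key] = type(default)(raw)  # env overrides YAML
    return merged


def check_strategies_linked(selected: Mapping[str, str],
                            linked: Mapping[str, set[str]]) -> None:
    """Refuse to start if any selected strategy was not compiled into this binary."""
    for slot, strategy in selected.items():
        available = linked.get(slot, set())
        if strategy not in available:
            raise StrategyNotLinkedError(
                f"{slot}={strategy!r} is not linked; available: {sorted(available)}"
            )
```

Rejecting unknown keys is a design choice of this sketch. It turns a typo into a startup error rather than a silent default, in the same spirit as `StrategyNotLinkedError`.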

### Data flow

Startup-only — runs once per process. No per-frame path.

### Dependencies

- Depends on E-BOOT.

### Acceptance criteria

- `compose_root` constructs a runnable airborne process for every documented config preset (default deployment, IT-12 research-binary, smoke-test minimal).
- Strategy/build-flag mismatch triggers an explicit `StrategyNotLinkedError` with a clear message (no silent fallback).
- Config precedence (env > YAML > defaults) verified by unit tests for at least 3 keys per layer.
- `runtime_root.py` exits with code 0 when given a valid config and no components actually do work (reachability proof).

### Non-functional requirements

- Cold-start config load + compose ≤ 1 s on Tier-2 (counts toward AC-NEW-1's 30 s budget).

### Risks & mitigations

- **R02** (ADR-004 process isolation) — `compose_root`'s strategy/build-flag check is the third enforcement gate (after SBOM diff and runtime self-check) preventing C11 from running airborne.

### Effort

T-shirt S; 5–8 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `src/shared/config/loader.py` + `Config` dataclass | 3 |
| 2 | `compose_root` + `compose_operator` skeletons + StrategyNotLinkedError | 3 |
| 3 | Unit tests for env/YAML/defaults precedence | 2 |

### Key constraints

- ADR-002 (build-time exclusion) — only linked strategies selectable.

### Testing strategy

- Unit: precedence + `StrategyNotLinkedError`.
- Integration: every documented preset starts cleanly.

---

## E-CC-FDR-CLIENT — Cross-Cutting: FDR Producer Client

**Tracker**: AZ-247
**Type**: cross-cutting
**T-shirt**: M | **Story points**: 8–13

### System context

C13 owns the FDR writer thread, segment files, and the 64 GB cap. Every other component publishes via a producer-side client: lock-free enqueue + an `FdrRecord` schema versioned in `RecordSchema`. This epic owns ONLY the producer side; the writer-thread internals belong to E-C13.

```mermaid
flowchart LR
PROD[Component producer] --> Q[lock-free ring buffer]
Q --> WRITER[E-C13 writer thread]
WRITER --> SEG[segment file on NVM]
```

### Problem / Context

Producer-side correctness (drop-oldest with rollover-log, schema versioning, never-block) is independent of where the file lands. Co-locating producer logic inside E-C13 would force every component test to spin up the writer thread; a thin shared client lets component tests use a fake sink.

### Scope

**In scope**:
- `src/shared/fdr_client/__init__.py` exporting `FdrClient`, instantiated once per producer as `FdrClient(producer_id: str)`.
- Lock-free SPSC ring buffer per producer; capacity configurable (default per producer in `Config`).
- `FdrRecord` versioned schema (orjson or msgpack — pinned in E-BOOT).
- Drop-oldest behaviour writing a structured `kind=overrun` record with `producer_id` + dropped count (never silent).
- `FakeFdrSink` for component-level tests.

**Out of scope**: writer thread, segment files, 64 GB cap, rollover policy (E-C13).

### Architecture notes

- AC-NEW-3 (no silent drops) is enforced HERE: drop-oldest always emits the overrun record.
- Schema versioning prevents post-flight tooling breakage when payload classes evolve.

### Interface specification

```python
from __future__ import annotations

class FdrClient:
    def __init__(self, producer_id: str): ...
    def enqueue(self, record: FdrRecord) -> None: ...  # lock-free, never blocks
    def flush(self) -> None: ...  # used by tests only
```
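
The drop-oldest behaviour can be illustrated with a fixed-capacity ring in Python. This sketch shows the semantics only. It is not lock-free: it assumes one producer and one consumer and performs no atomic index publication. Emitting the `kind=overrun` record at drain time is one possible design, not necessarily the one AZ-247 ships.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FdrRecord:
    kind: str
    producer_id: str
    payload: dict[str, Any] = field(default_factory=dict)


class DropOldestRing:
    """Pre-sized ring buffer that overwrites its oldest record when full."""

    def __init__(self, producer_id: str, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.producer_id = producer_id
        self._slots: list[FdrRecord | None] = [None] * capacity  # allocated once
        self._capacity = capacity
        self._head = 0     # index of the oldest buffered record
        self._size = 0     # number of buffered records
        self._dropped = 0  # records overwritten since the last drain

    def enqueue(self, record: FdrRecord) -> None:
        """Never blocks: when full, overwrite the oldest record and count the drop."""
        tail = (self._head + self._size) % self._capacity
        self._slots[tail] = record
        if self._size == self._capacity:
            # When full, tail == head, so the write above replaced the oldest record.
            self._head = (self._head + 1) % self._capacity
            self._dropped += 1
        else:
            self._size += 1

    def drain(self) -> list[FdrRecord]:
        """Consumer side: an overrun record (if any drops happened) followed by records oldest-first."""
        out: list[FdrRecord] = []
        if self._dropped:
            out.append(FdrRecord("overrun", self.producer_id, {"dropped_count": self._dropped}))
            self._dropped = 0
        while self._size:
            out.append(self._slots[self._head])
            self._slots[self._head] = None
            self._head = (self._head + 1) % self._capacity
            self._size -= 1
        return out
```

With capacity 3 and five enqueues, a drain yields an overrun record with `dropped_count == 2`, followed by records 2, 3 and 4.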

### Data flow

```mermaid
sequenceDiagram
participant C as Component
participant Q as Ring buffer
participant W as Writer (E-C13)
C->>Q: enqueue(record)
alt overrun
Q->>Q: drop oldest + emit kind=overrun record
end
W->>Q: dequeue (in writer thread)
```

### Dependencies

- Depends on E-BOOT, E-CC-LOG.
- Consumed by every component that emits FDR records.

### Acceptance criteria

- `enqueue` never blocks even under writer-thread stall (verified by C13-IT-05 from the C13 tests.md).
- Every overrun event produces a structured record with non-zero `dropped_count` and the originating `producer_id`.
- Schema version bump (e.g., adding a new field) does not break post-flight tooling that reads at version N-1 (forward-compatible parser).
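
For additive changes, the N-1 forward-compatibility requirement mostly comes down to readers ignoring fields they do not know. Below is a minimal sketch using a frozen dataclass. `OverrunRecordV1` and its `schema_version` field are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class OverrunRecordV1:
    """Hypothetical N-1 reader view of a ``kind=overrun`` payload."""

    producer_id: str
    dropped_count: int
    schema_version: int = 1


def parse_forward_compatible(cls: type[T], raw: dict[str, Any]) -> T:
    """Build dataclass ``cls`` from ``raw``, ignoring keys this reader does not know.

    Fields added by a newer writer are skipped instead of raising, so tooling
    built against version N-1 keeps working. A field the reader requires but
    the record lacks still raises ``TypeError`` from the dataclass constructor.
    """
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})
```

Additive changes are the easy case. Renaming or removing a field still needs a default or an explicit migration on the reader side.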

### Non-functional requirements

- `enqueue` p99 ≤ 5 µs on Tier-2 (no allocation on the steady-state path; pre-sized buffers).
- Per-producer ring buffer size ≤ configured cap (no unbounded growth).

### Risks & mitigations

- **R13** (queue overrun) — the design IS the mitigation: drop-oldest + always log.

### Effort

T-shirt M; 8–13 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | Lock-free SPSC ring buffer per producer | 5 |
| 2 | `FdrRecord` schema + versioned serialiser (orjson/msgpack) | 3 |
| 3 | Drop-oldest + `kind=overrun` record emission | 2 |
| 4 | `FakeFdrSink` for component tests | 2 |

### Key constraints

- AC-NEW-3 (no silent drops).

### Testing strategy

- Unit: ring buffer correctness under contention; overrun record emitted.
- Property tests: forward-compat parser at version N-1.

---

## E-C13 — C13 Flight Data Recorder

**Tracker**: AZ-248 | **Type**: component | **T-shirt**: L | **Story points**: 21–34

### System context

```mermaid
flowchart LR
ALL[All components] -->|enqueue via E-CC-FDR-CLIENT| Q[per-producer queues]
Q --> W[C13 writer thread]
W --> SEGS[segmented files on NVM]
SEGS -.->|post-landing| OPTOOL[E-C12 retrieval]
```

### Problem / Context

C13 keeps a per-flight record (≤ 64 GB) of every onboard payload class, with no silent drops. Raw frames are excluded, except for the failed-tile thumbnail forensic exception, which is capped at ≤ 0.1 Hz (AC-NEW-3, AC-8.5). A single writer thread owns the files; every other component is a producer.

### Scope

**In scope**: writer thread, segment file lifecycle, 64 GB cap with oldest-segment-dropped policy, per-flight `FlightHeader` + `FlightFooter`, atomic segment rotation, mid-flight tile snapshot path, failed-tile thumbnail rate cap, refusal of takeoff when `open_flight` fails.

**Out of scope**: producer-side enqueue (E-CC-FDR-CLIENT); post-flight retrieval UI (E-C12).

### Architecture notes

- File: `_docs/02_document/components/14_c13_fdr/description.md` is the canonical spec.
- `IncrementalFixedLagSmoother` from C5 publishes smoothed past-keyframes via FDR ONLY (AC-4.5 revised) — NOT into the FC stream.
- Segment rotation uses `atomicwrites`; cross-process safety on the FDR root via `filelock`.

### Interface specification

```python
from __future__ import annotations

class FdrWriter:
    def open_flight(self, header: FlightHeader) -> None: ...  # raises FdrOpenError
    def write_record(self, record: FdrRecord) -> None: ...  # lock-free; FdrQueueOverrunError logged, not raised
    def close_flight(self) -> FlightFooter: ...
    def current_size_bytes(self) -> int: ...
    def is_rolling(self) -> bool: ...
```

### Data flow

```mermaid
sequenceDiagram
participant Prods as Producers (every component)
participant Q as Per-producer queues
participant W as Writer thread
participant FS as NVM segment file
Prods->>Q: enqueue(record)
W->>Q: dequeue
W->>FS: serialise + append
alt segment >= cap
W->>FS: atomic rotate, drop oldest if total > 64 GB, emit kind=segment_rollover
end
```
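
Here is a sketch of the cap policy from the data-flow diagram. It assumes segments are tracked oldest-first and that the segment currently being written is never deleted. The text does not say whether "64 GB" means 10^9 or 2^30 bytes; the constant below assumes decimal gigabytes. Deleting the files themselves is left out.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass

# Assumption: "64 GB" read as decimal gigabytes; use 64 * 2**30 if GiB is meant.
FDR_CAP_BYTES = 64 * 10**9


@dataclass
class Segment:
    name: str
    size_bytes: int


def enforce_cap(segments: deque[Segment], cap_bytes: int = FDR_CAP_BYTES) -> list[dict]:
    """Drop the oldest segments until the total fits under ``cap_bytes``.

    ``segments`` is ordered oldest-first; the last entry is the segment being
    written and is never dropped. Every drop yields a ``kind=segment_rollover``
    record so that no loss is silent (AC-NEW-3).
    """
    total = sum(s.size_bytes for s in segments)
    rollovers: list[dict] = []
    while total > cap_bytes and len(segments) > 1:
        oldest = segments.popleft()
        total -= oldest.size_bytes
        rollovers.append({
            "kind": "segment_rollover",
            "dropped_segment": oldest.name,
            "dropped_bytes": oldest.size_bytes,
            "total_bytes_after": total,
        })
    return rollovers
```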

### Dependencies

- E-BOOT, E-CC-LOG, E-CC-CONF, E-CC-FDR-CLIENT.

### Acceptance criteria

- AC-NEW-3: synthetic 8 h replay produces ≤ 64 GB on disk, with every drop accompanied by a `kind=overrun` and/or `kind=segment_rollover` record.
- AC-8.5: `kind=raw_nav_frame` writes raise `RawFrameWriteForbiddenError`; `kind=failed_tile_thumbnail` rate-limited to ≤ 0.1 Hz.
- AC-1.4 / AC-4.5: every smoothed past-keyframe revision lands in FDR; the FC emission stream is unchanged.
- AC-NEW-3 takeoff gate: `FdrOpenError` aborts takeoff before the FC adapter is opened.
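
The AC-8.5 admission rules could be enforced before records reach the writer roughly as follows. The class name, the injected clock and the `suppressed` counter are assumptions of this sketch. The 10 s minimum spacing between accepted thumbnails is the ≤ 0.1 Hz cap restated.

```python
from __future__ import annotations

import time
from collections.abc import Callable


class RawFrameWriteForbiddenError(Exception):
    """AC-8.5: raw navigation frames must never be written to the FDR."""


class ThumbnailGate:
    """Admission check for forensic payloads (sketch; kind names follow the AC text)."""

    def __init__(self, min_interval_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s  # 10 s between accepts == 0.1 Hz
        self._clock = clock                    # injected so tests can drive time
        self._last_accepted: float | None = None
        self.suppressed = 0                    # rate-limited thumbnails, to be reported

    def admit(self, kind: str) -> bool:
        if kind == "raw_nav_frame":
            raise RawFrameWriteForbiddenError(kind)
        if kind != "failed_tile_thumbnail":
            return True  # other kinds are not rate-limited by this gate
        now = self._clock()
        if self._last_accepted is None or now - self._last_accepted >= self._min_interval_s:
            self._last_accepted = now
            return True
        self.suppressed += 1
        return False
```

Counting suppressed thumbnails, rather than discarding them without trace, keeps the gate in line with AC-NEW-3. The count could, for example, be reported in `FlightFooter`.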

### Non-functional requirements

- Writer throughput ≥ 200 Hz aggregate (per C13-PT-01).
- Per-record serialise + write p95 ≤ 5 ms.

### Risks & mitigations

- **R13** (queue overrun) — drop-oldest + always-log.
- **R02** (ADR-004) — C13 runs in the airborne process; no cross-process FDR root contention with C11 (C11 not airborne).

### Effort

T-shirt L; 21–34 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | Writer thread + segment file open/close/rotate | 5 |
| 2 | `FlightHeader` / `FlightFooter` + records-written/dropped accounting | 3 |
| 3 | 64 GB cap + oldest-segment-dropped policy + `kind=segment_rollover` record | 5 |
| 4 | Mid-flight tile snapshot path + filesystem layout | 3 |
| 5 | Failed-tile thumbnail ≤ 0.1 Hz rate limiter + AC-8.5 enforcement | 3 |
| 6 | `FdrOpenError` takeoff abort path | 2 |
| 7 | Component-internal tests C13-IT-01..06 + C13-PT-01 + C13-ST-01 | 5 |

### Key constraints

- AC-NEW-3, AC-8.5, AC-4.5 (revised), RESTRICT-UAV-4.

### Testing strategy

- Per `_docs/02_document/components/14_c13_fdr/tests.md`: the six component-internal integration tests (C13-IT-01..06) plus C13-PT-01 and C13-ST-01, and the 8 h NFT-LIM-02 at the suite level.

---
## E-C7 — C7 On-Jetson Inference Runtime

**Tracker**: AZ-249 | **Type**: component | **T-shirt**: L | **Story points**: 21–34

### System context

```mermaid
flowchart LR
    C2[C2 VPR] --> C7
    C25[C2.5 Re-rank] --> C7
    C3[C3 Matcher] --> C7
    C35[C3.5 AdHoP] --> C7
    C7 --> TRT[TensorRT]
    C7 --> ORT[ONNX Runtime]
    C7 --> PT[PyTorch FP16]
    C7 --> THERMAL[ThermalState publish]
    THERMAL --> C4[C4 Pose hybrid]
```

### Problem / Context

Centralise GPU inference on Jetson: engine compilation, deserialise + warm-up, per-call inference, fallback chain (TRT → ONNX-RT+TRT-EP → PyTorch FP16), and `ThermalState` telemetry that drives D-CROSS-LATENCY-1.

### Scope

**In scope**: engine cache lifecycle, deserialise + warm-up budget (AC-NEW-1), `ThermalState` publisher from `jetson-stats`, D-C10-3 takeoff content-hash gate (engine-side), D-C10-7 filename-schema enforcement, ONNX-RT fallback path.

**Out of scope**: cache artifact build (E-C10), tile cache (E-C6), the per-frame consumers (their own epics).

### Architecture notes

- File: `components/09_c7_inference/description.md`.
- Python in-process abstraction over C++ TRT bindings; no separate process.
- Engines hardware-tied (SM 87 / JP 6.2 / TRT 10.3 / FP16) per D-C10-6.
- Helper `EngineFilenameSchema` is shared with E-C10.

### Interface specification

```python
from __future__ import annotations  # DTO annotations stay unevaluated at import time


class InferenceRuntime:
    def load_engine(self, model_id: str) -> EngineHandle: ...
    def infer(self, handle: EngineHandle, batch: Tensor) -> Tensor: ...
    def thermal_state(self) -> ThermalState: ...
    def warm_up(self, handle: EngineHandle) -> None: ...
```

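
The fallback chain from Problem / Context (TRT → ONNX-RT+TRT-EP → PyTorch FP16) can be sketched independently of any backend. This is a minimal sketch, not the C7 implementation. `load_with_fallback`, `NoBackendAvailableError`, and the `(name, loader)` pairs are hypothetical; the loaders stand in for real TensorRT, ONNX Runtime, and PyTorch calls. One design point is worth keeping. An `EngineHashMismatchError` (D-C10-3) is re-raised rather than masked by falling through to the next backend, so a tampered engine still aborts takeoff.

```python
from typing import Callable, List, Sequence, Tuple

Loader = Callable[[str], object]


class EngineHashMismatchError(RuntimeError):
    """Stand-in for the D-C10-3 error named in this epic."""


class NoBackendAvailableError(RuntimeError):
    """Hypothetical: every backend in the chain failed to load the model."""


def load_with_fallback(model_id: str, chain: Sequence[Tuple[str, Loader]]) -> Tuple[str, object]:
    """Try each backend in priority order; return (backend_name, handle)."""
    failures: List[str] = []
    for backend_name, loader in chain:
        try:
            return backend_name, loader(model_id)
        except EngineHashMismatchError:
            raise  # integrity failure: never fall back to another backend
        except Exception as exc:  # a real runtime would catch narrower errors
            failures.append(f"{backend_name}: {exc!r}")
    raise NoBackendAvailableError(f"{model_id}: " + "; ".join(failures))
```

Passing the chain in as data keeps the priority order configurable per model. That fits the strategy and composition-root pattern used elsewhere in this plan.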
### Data flow

```mermaid
sequenceDiagram
    participant Caller as C2/C2.5/C3/C3.5
    participant C7 as InferenceRuntime
    participant GPU as TRT/ONNX/PT
    Caller->>C7: infer(handle, batch)
    C7->>GPU: forward
    GPU-->>C7: output
    C7-->>Caller: tensor
    C7-->>C4: ThermalState pub (≥1 Hz)
```

### Dependencies

- E-BOOT, E-CC-CONF, E-CC-FDR-CLIENT.
- External: TensorRT 10.3, ONNX Runtime + TRT EP, PyTorch FP16, jetson-stats.

### Acceptance criteria

- AC-NEW-1 cold-start: every required engine deserialises + warms in ≤ 30 s p95 (C7-IT-01).
- AC-NEW-5: `ThermalState` updates ≥ 1 Hz; throttle-detection latency ≤ 1 s; C4 hybrid switch within 1 frame (C7-IT-02).
- D-C10-3: `EngineHashMismatchError` aborts F2 takeoff; no GPU memory allocated on mismatch (C7-IT-03).
- D-C10-7: filename-schema mismatch refused at parse time (C7-IT-04).
- ONNX-RT fallback path produces correct results when the TRT engine is missing (C7-IT-05).

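
The two takeoff gates above (D-C10-7 filename schema, D-C10-3 content hash) both run before any deserialisation, so no GPU memory is touched on a mismatch. This is a minimal sketch with a hypothetical filename layout, `<model>.sm<NN>.jp<NN>.trt<NNN>.<precision>.engine`. The real `EngineFilenameSchema` format is defined together with E-C10 and is not reproduced here. `check_engine` and `EngineFilenameError` are illustrative names, and the expected SHA-256 would come from the signed Manifest.

```python
import hashlib
import re

# Hypothetical layout; the real EngineFilenameSchema is shared with E-C10.
ENGINE_NAME = re.compile(
    r"^(?P<model>[a-z0-9_]+)\.sm(?P<sm>\d+)\.jp(?P<jp>\d+)\.trt(?P<trt>\d+)\.(?P<prec>fp16|fp32|int8)\.engine$"
)
# SM 87 / JP 6.2 / TRT 10.3 / FP16 per D-C10-6, encoded in the assumed layout.
EXPECTED_TARGET = {"sm": "87", "jp": "62", "trt": "103", "prec": "fp16"}


class EngineFilenameError(ValueError):
    """Hypothetical D-C10-7 refusal."""


class EngineHashMismatchError(RuntimeError):
    """Stand-in for the D-C10-3 error named in this epic."""


def check_engine(filename: str, engine_bytes: bytes, manifest_sha256: str) -> str:
    """Validate name and content hash, return the model id; runs before any GPU allocation."""
    match = ENGINE_NAME.match(filename)
    if match is None:
        raise EngineFilenameError(f"unparseable engine filename: {filename}")
    for key, wanted in EXPECTED_TARGET.items():
        if match.group(key) != wanted:
            raise EngineFilenameError(f"{filename}: {key}={match.group(key)}, expected {wanted}")
    digest = hashlib.sha256(engine_bytes).hexdigest()
    if digest != manifest_sha256:
        raise EngineHashMismatchError(f"{filename}: content hash does not match the Manifest")
    return match.group("model")
```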
### Non-functional requirements

- Per-model p95 latencies (C7-PT-01): UltraVPR ≤ 60 ms, LightGlue ≤ 30 ms, AdHoP ≤ 90 ms, DISK ≤ 50 ms.
- GPU memory with all engines resident ≤ 4 GB; system RAM ≤ 1.5 GB (C7-PT-02).

### Risks & mitigations

- **R04** (engine cache hardware-tied) — D-C10-7 + D-C10-3 enforced at C7's deserialise path.
- **R10** (Marginals under thermal throttle) — C7's `ThermalState` publish is the upstream input to the C4 hybrid.

### Effort

T-shirt L; 21–34 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | TRT engine load + warm-up + cache lifecycle | 5 |
| 2 | ONNX-RT + TRT-EP fallback path | 3 |
| 3 | PyTorch FP16 simple-baseline path | 3 |
| 4 | D-C10-3 content-hash gate + D-C10-7 filename schema enforcement | 3 |
| 5 | `ThermalState` publisher from `jetson-stats` | 3 |
| 6 | Component-internal tests C7-IT-01..05 + C7-PT-01..02 + C7-ST-01 | 5 |

### Key constraints

- RESTRICT-HW-1 (Jetson + 25 W TDP), AC-4.2 (8 GB system memory), AC-NEW-5 (thermal envelope).

### Testing strategy

- Per `components/09_c7_inference/tests.md`.

---

## E-C6 — C6 Tile Cache + Spatial Index

**Tracker**: AZ-250 | **Type**: component | **T-shirt**: M | **Story points**: 13–21

### System context

```mermaid
flowchart LR
    C11[C11 TileDownloader] --> C6
    C10[C10 build] --> C6
    C5[C5 mid-flight gen] --> C6
    C2[C2 VPR] --> C6
    C25[C2.5 rerank] --> C6
    C3[C3 matcher] --> C6
    C11U[C11 TileUploader] --> C6
```

### Problem / Context

Persistent imagery store byte-identical to `satellite-provider`'s on-disk layout, plus the FAISS HNSW spatial index for VPR. Sole writers: C11 TileDownloader (production) + C5/orthorectifier (mid-flight). Sole readers: C2/C2.5/C3 (per-frame) + C11 TileUploader (post-landing).

### Scope

**In scope**: Postgres `tiles` schema, filesystem JPEG layout matching `satellite-provider`, FAISS HNSW build/load (the index FILE — population via C10), per-sector freshness gates at write-time, 10 GB cache budget enforcement with LRU eviction, content-SHA-256 invariant on insert, mid-flight tile insert with `quality_metadata`.

**Out of scope**: tile fetch (E-C11), descriptor population (E-C10), inference (E-C7).

### Architecture notes

- File: `components/08_c6_tile_cache/description.md`, `data_model.md`.
- Schema in `data_model.md`; Postgres 16; SHA-256 sidecar via helper `Sha256Sidecar`.

### Interface specification

```python
from __future__ import annotations  # DTO annotations stay unevaluated at import time


class TileStore:
    def insert(self, tile: TileRecord, jpeg: bytes) -> None: ...  # raises FreshnessRejected, ContentHashMismatch
    def get_tile_pixels(self, tile_id: TileId) -> bytes: ...
    def query_spatial(self, bbox: Bbox, zoom: int) -> list[TileRecord]: ...
    def mark_uploaded(self, tile_id: TileId) -> None: ...
    def pending_uploads(self) -> list[TileRecord]: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant W as Writer (C11 / C5)
    participant C6 as C6
    participant DB as Postgres
    participant FS as Filesystem
    W->>C6: insert(tile, jpeg)
    C6->>C6: freshness gate + sha256 check
    C6->>FS: write jpeg + sidecar
    C6->>DB: insert row
```

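
The write path in the diagram (freshness gate, SHA-256 check, JPEG + sidecar write, row insert) can be sketched with in-memory dicts standing in for the filesystem and Postgres. This is illustrative only. The per-sector age limits, the `TileRecord` fields, and the `.sha256` sidecar naming are assumptions. The two exception names and the reject-vs-downgrade behaviour per sector come from the interface comment and AC-8.2.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


class FreshnessRejected(Exception):
    pass


class ContentHashMismatch(Exception):
    pass


# Hypothetical per-sector age limits; the real values live in the C6 spec.
MAX_AGE = {"active_conflict": timedelta(days=30), "stable_rear": timedelta(days=365)}


@dataclass
class TileRecord:
    tile_id: str
    sector: str  # "active_conflict" or "stable_rear"
    captured_at: datetime
    sha256: str  # expected content hash supplied by the writer
    stale_flag: bool = False


def insert(tile: TileRecord, jpeg: bytes, files: dict, rows: dict, now: datetime) -> None:
    # 1. Freshness gate: reject in active_conflict, downgrade-flag in stable_rear.
    if now - tile.captured_at > MAX_AGE[tile.sector]:
        if tile.sector == "active_conflict":
            raise FreshnessRejected(tile.tile_id)
        tile.stale_flag = True
    # 2. Content-SHA-256 invariant.
    digest = hashlib.sha256(jpeg).hexdigest()
    if digest != tile.sha256:
        raise ContentHashMismatch(tile.tile_id)
    # 3. JPEG + sidecar first, then the row (dicts stand in for the FS and Postgres).
    files[f"{tile.tile_id}.jpg"] = jpeg
    files[f"{tile.tile_id}.jpg.sha256"] = digest
    rows[tile.tile_id] = tile
```

The sketch follows the diagram's order and writes the files before the row. With that order, a crash between the two steps leaves at most an orphan file, never a row that points at missing pixels.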
### Dependencies

- E-BOOT, E-CC-LOG, E-CC-CONF.
- External: PostgreSQL 16, FAISS.

### Acceptance criteria

- AC-8.1: filesystem layout byte-identical to `satellite-provider` for the same coordinate (C6-IT-01).
- AC-8.2 / AC-NEW-6: per-sector freshness gate rejects in active_conflict, downgrade-flags in stable_rear (C6-IT-02 / C6-IT-05).
- AC-8.4: every mid-flight tile carries `quality_metadata` (C6-IT-03).
- AC-NEW-3: peak F4 burst (5 Hz, 100 tiles) writes without dropping (C6-IT-04).
- RESTRICT-SAT-2: 10 GB cap enforced with LRU eviction, every eviction logged (C6-IT-06).
- Defensive: SHA-256 mismatch rejects insert (C6-ST-01).

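
The RESTRICT-SAT-2 budget (10 GB, LRU eviction, every eviction logged) maps naturally onto an ordered dict keyed by tile id. This is a minimal sketch. `LruTileBudget` is a hypothetical name, and "10 GB" is read as decimal gigabytes. The sketch tracks sizes only: deleting the evicted JPEG, sidecar, and row is left to the caller, and each tile id is assumed to be added once.

```python
import logging
from collections import OrderedDict
from typing import List

log = logging.getLogger("c6.eviction")

CACHE_CAP_BYTES = 10 * 10**9  # RESTRICT-SAT-2: 10 GB, read as decimal gigabytes (assumption)


class LruTileBudget:
    """Hypothetical size ledger for the tile cache, least recently used first."""

    def __init__(self, cap_bytes: int = CACHE_CAP_BYTES) -> None:
        self.cap_bytes = cap_bytes
        self.sizes: "OrderedDict[str, int]" = OrderedDict()
        self.used_bytes = 0

    def touch(self, tile_id: str) -> None:
        """Mark a tile as recently read (e.g. from get_tile_pixels)."""
        self.sizes.move_to_end(tile_id)

    def add(self, tile_id: str, size_bytes: int) -> List[str]:
        """Record a new tile, evicting LRU tiles first; return the evicted ids."""
        evicted: List[str] = []
        while self.sizes and self.used_bytes + size_bytes > self.cap_bytes:
            old_id, old_size = self.sizes.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_id)
            log.info("evicted tile %s (%d bytes) to stay under %d bytes", old_id, old_size, self.cap_bytes)
        self.sizes[tile_id] = size_bytes
        self.used_bytes += size_bytes
        return evicted
```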
### Non-functional requirements

- Per-tile read p95 (warm mmap) ≤ 0.5 ms; cold ≤ 50 ms (C6-PT-01).

### Risks & mitigations

- **R08** (freshness drift in active_conflict) — write-side gate is the primary mitigation.

### Effort

T-shirt M; 13–21 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | Postgres `tiles` schema + migration | 3 |
| 2 | Filesystem JPEG store byte-identical to satellite-provider | 3 |
| 3 | FAISS HNSW load/save + mmap | 3 |
| 4 | Freshness gate + sector classification | 3 |
| 5 | 10 GB LRU eviction with logging | 3 |
| 6 | Component-internal tests C6-IT-01..06 + C6-PT-01 + C6-ST-01 | 5 |

### Key constraints

- AC-8.1, AC-8.2, AC-NEW-6, RESTRICT-SAT-2, RESTRICT-UAV-4.

### Testing strategy

Per `components/08_c6_tile_cache/tests.md`.

---

## E-C11 — C11 Tile Manager (TileDownloader + TileUploader)

**Tracker**: AZ-251 | **Type**: component | **T-shirt**: M | **Story points**: 13–21

### System context

```mermaid
flowchart LR
    SP[satellite-provider] -->|GET| DL[C11 TileDownloader]
    DL --> C6[C6 cache]
    C6 --> UP[C11 TileUploader]
    UP -->|POST /ingest| SP
    classDef airborne fill:#fee
    classDef operator fill:#cef
    class DL,UP operator
```

### Problem / Context

Sole operator-side network I/O against `satellite-provider`, both directions. Strict ADR-004: never loaded into the airborne companion image. Bundled because download + upload share auth, HTTP client, deployment unit, and the airborne-exclusion property.

### Scope

**In scope**: `TileDownloader.fetch` (download → freshness gate → write to C6), `TileUploader.upload_pending` (read C6 pending → sign → POST → mark uploaded), per-flight ephemeral signing key, idempotent retry on partial-success batches.

**Out of scope**: any airborne code; cache artifact build (E-C10); orchestration (E-C12 — including the post-landing safety gate, which moved to C12 in Batch 44).

### Architecture notes

- File: `components/12_c11_tilemanager/description.md`.
- ADR-004 enforcement via E-BOOT's SBOM diff + runtime self-check.
- Test substitute: e2e-test `mock-suite-sat-service` fixture under `tests/fixtures/` (R01).

### Interface specification

```python
from __future__ import annotations  # DTO annotations stay unevaluated at import time


class TileDownloader:
    def fetch(self, req: FetchRequest) -> DownloadBatchReport: ...


class TileUploader:
    def upload_pending(self, req: UploadRequest) -> UploadBatchReport: ...
    # contract v2.0.0 (frozen) — C11 no longer gates on flight state;
    # the post-landing safety check lives in C12's PostLandingUploadOrchestrator
    # (reads flight_footer.clean_shutdown from FDR) per Batch 44 SRP refactor.
```

### Data flow

```mermaid
sequenceDiagram
    participant Op as Operator
    participant DL as TileDownloader
    participant SP as satellite-provider
    participant C6 as C6
    Op->>DL: fetch(area, sector_classification)
    DL->>SP: GET tiles
    SP-->>DL: tiles + metadata
    DL->>C6: insert (after freshness gate)
    DL-->>Op: DownloadBatchReport
```

### Dependencies

- E-C6, E-CC-CONF, E-CC-LOG.
- External: real `satellite-provider` (download); D-PROJ-2 endpoint OR e2e-test fixture (upload).

### Acceptance criteria

- C11-IT-01: TileDownloader fetch + freshness gate + C6 write byte-identical layout.
- C11-IT-02: stale-rejection counts surface in `DownloadBatchReport`.
- C11-IT-03: TileUploader posts pending, signs payloads, marks uploaded on 202.
- C11-IT-04: post-landing safety gate is now a C12 concern — see `_docs/02_document/components/13_c12_operator_orchestrator/tests.md` C12-IT-03 (Batch 44 SRP refactor; AZ-317 superseded).
- C11-IT-05: idempotent retry — already-acked tiles not re-sent.
- C11-ST-01: airborne process cannot import `c11_tilemanager` (R02 enforcement).
- C11-ST-02: NFT-SEC-02 network-egress test passes.
- C11-ST-03: per-flight key zeroised after upload.

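
C11-IT-03 and C11-IT-05 together describe a retry loop that re-sends only the tiles the endpoint has not yet acknowledged. The sketch below shows that loop under explicit assumptions. `post` is a hypothetical callable that returns the set of tile ids the endpoint accepted with a 202. HMAC-SHA256 stands in for whatever signature scheme the per-flight key actually uses.

```python
import hashlib
import hmac
from typing import Callable, Iterable, List, Set, Tuple

SignedTile = Tuple[str, bytes, str]  # (tile_id, jpeg, signature_hex)


def upload_pending(
    pending: Iterable[Tuple[str, bytes]],
    acked: Set[str],
    post: Callable[[List[SignedTile]], Set[str]],
    key: bytes,
    max_attempts: int = 3,
) -> Set[str]:
    """Sign and POST pending tiles; each retry re-sends only tiles not yet acknowledged.

    `acked` is updated in place so acknowledgements survive across calls
    (the analogue of TileStore.mark_uploaded in this sketch).
    """
    pending = list(pending)
    for _ in range(max_attempts):
        todo = [(tile_id, jpeg) for tile_id, jpeg in pending if tile_id not in acked]
        if not todo:
            break
        batch = [(tile_id, jpeg, hmac.new(key, jpeg, hashlib.sha256).hexdigest()) for tile_id, jpeg in todo]
        acked |= post(batch)  # ids the endpoint answered 202 for
    return acked
```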
### Non-functional requirements

- Download throughput ≥ 50 MB/s on a 1 Gbps link (C11-PT-01).
- Upload throughput ≥ 20 tiles/s with signing (C11-PT-02).

### Risks & mitigations

- **R01** (D-PROJ-2 not yet shipped) — TileUploader works against the e2e-test fixture, which is retired once the real production endpoint lands.
- **R02** (ADR-004 break) — three enforcement gates; C11 tests verify each.
- **R09** (key compromise) — per-flight ephemeral keys; voting layer for compromise detection.

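
The per-flight ephemeral key and its zeroisation (C11-ST-03, R09) can be sketched as a context manager around a mutable buffer. Treat this as a best-effort illustration, not a security guarantee. CPython may hold other copies of the key (for example the bytes returned by `secrets.token_bytes`, or state inside the HMAC object). HMAC-SHA256 is again a stand-in for the real signature scheme, and `EphemeralFlightKey` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


class EphemeralFlightKey:
    """Hypothetical per-flight key holder, zeroised when the upload session ends."""

    def __init__(self, nbytes: int = 32) -> None:
        self._key = bytearray(secrets.token_bytes(nbytes))  # mutable, so it can be overwritten
        self.zeroised = False

    def sign(self, payload: bytes) -> str:
        if self.zeroised:
            raise RuntimeError("per-flight key already zeroised")
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()  # stand-in scheme

    def zeroise(self) -> None:
        for i in range(len(self._key)):
            self._key[i] = 0
        self.zeroised = True

    def __enter__(self) -> "EphemeralFlightKey":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.zeroise()  # runs on normal exit and when the upload raises
```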
### Effort

T-shirt M; 13–21 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | TileDownloader: GET + freshness gate + C6 write | 5 |
| 2 | TileUploader: read pending + sign + POST + mark uploaded | 5 |
| 3 | Idempotent retry on partial-success batch | 3 |
| 4 | ~~`flight_state == ON_GROUND` gate~~ — moved to C12 `PostLandingUploadOrchestrator` (Batch 44 SRP refactor; AZ-317 superseded) | n/a |
| 5 | Per-flight ephemeral signing key + zeroisation | 3 |
| 6 | Component-internal tests C11-IT-01..05 + C11-PT-01..02 + C11-ST-01..03 + C11-AT-01 | 5 |

### Key constraints

- ADR-004, RESTRICT-SAT-1 (no in-flight Service calls), AC-8.3, AC-8.4, AC-NEW-6.

### Testing strategy

Per `components/12_c11_tilemanager/tests.md`.

---

## E-C10 — C10 Pre-flight Cache Provisioning

**Tracker**: AZ-252 | **Type**: component | **T-shirt**: M | **Story points**: 13–21

### System context

```mermaid
flowchart LR
    C6[C6 already populated by C11] --> C10
    C10 --> ENGINES[TRT engines]
    C10 --> DESCS[FAISS descriptors]
    C10 --> MAN[signed Manifest]
    ENGINES & DESCS & MAN --> AIRBORNE[airborne image at F2 takeoff]
```

### Problem / Context

Build model-derived artifacts from an already-populated C6: TRT engines, VPR descriptors (calling C2's `embed_query` over the corpus), and the signed Manifest with content hashes. Re-running on an unchanged C6 is idempotent.

### Scope

**In scope**: `CacheProvisioner.build_artifacts`, `ManifestVerifier.verify`, idempotence (D-C10-1), Manifest covers every shipped artifact, hardware-tied engine compile (D-C10-6), filename schema (D-C10-7), operator-key requirement.

**Out of scope**: tile fetch (E-C11), tile cache writes (E-C6), engine deserialisation (E-C7).

### Architecture notes

- File: `components/11_c10_provisioning/description.md`.
- C10 narrowed in this Plan cycle: it does NOT talk to `satellite-provider`. Tiles must be present in C6 before C10 runs.

### Interface specification

```python
from __future__ import annotations  # DTO annotations stay unevaluated at import time
from pathlib import Path


class CacheProvisioner:
    def build_artifacts(self, corpus_root: Path, key_path: Path) -> BuildReport: ...


class ManifestVerifier:
    def verify(self, manifest: Path, public_key: PublicKey) -> ManifestVerdict: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant Op as Operator
    participant C10 as CacheProvisioner
    participant C2 as C2 embed_query
    participant FS as Filesystem
    Op->>C10: build_artifacts(corpus_root, key)
    C10->>C2: embed every tile
    C2-->>C10: descriptors
    C10->>FS: write engines + faiss + manifest
    C10->>FS: sign manifest with operator key
    C10-->>Op: BuildReport
```

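
D-C10-1 idempotence (C10-IT-03: same hash, no recompile) can be expressed as a fingerprint over every build input, compared against the fingerprint recorded by the previous successful run. This is a minimal sketch. Which inputs the real fingerprint covers is an assumption; here it is tile content hashes, model ids, and the toolchain tuple. `compile_engine` and the `state` dict are hypothetical stand-ins for the TRT compile step and the on-disk build state.

```python
import hashlib
import json
from typing import Callable, Dict, List


def build_fingerprint(tile_hashes: Dict[str, str], model_ids: List[str], toolchain: Dict[str, str]) -> str:
    """SHA-256 over a canonical JSON encoding of the (assumed) build inputs."""
    canonical = json.dumps(
        {"tiles": sorted(tile_hashes.items()), "models": sorted(model_ids), "toolchain": toolchain},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_if_changed(
    state: dict,
    tile_hashes: Dict[str, str],
    model_ids: List[str],
    toolchain: Dict[str, str],
    compile_engine: Callable[[str], None],
) -> bool:
    """Return True if a rebuild ran, False when inputs match the last successful build."""
    fingerprint = build_fingerprint(tile_hashes, model_ids, toolchain)
    if state.get("fingerprint") == fingerprint:
        return False
    for model_id in sorted(model_ids):
        compile_engine(model_id)
    state["fingerprint"] = fingerprint  # recorded only after every compile succeeded
    return True
```

A finer-grained version would fingerprint engines and descriptors separately. TRT engines depend on the model and the target hardware, not on tile content, so a tile refresh would then re-embed descriptors without recompiling engines.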
### Dependencies

- E-C6, E-C7, E-CC-LOG.

### Acceptance criteria

- C10-IT-01: end-to-end build produces engines + descriptors + signed Manifest.
- C10-IT-02: ManifestVerifier rejects tampered or wrong-key Manifests.
- C10-IT-03: idempotent re-run — same hash, no recompile (D-C10-1).
- C10-IT-04: ManifestCoverageError on orphan files (no smuggled artifacts).
- C10-IT-05: Tier-2 build produces SM 87 / JP 6.2 / TRT 10.3 / FP16 engines (D-C10-6).
- C10-ST-01: build refuses dev-key signing in operator mode.

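
C10-IT-02 and C10-IT-04 rest on the Manifest's content-hash table covering every shipped file. The sketch below checks three things: coverage (no orphan files), completeness, and per-file hashes. It deliberately omits the operator-key signature over the table. `ManifestMismatchError` is a hypothetical name (only `ManifestCoverageError` appears in this epic), and a dict of `{relative_path: bytes}` stands in for the artifact directory.

```python
import hashlib
from typing import Dict


class ManifestCoverageError(Exception):
    """Named in C10-IT-04: a shipped file is not listed in the Manifest."""


class ManifestMismatchError(Exception):
    """Hypothetical: a listed file is missing or its content hash differs."""


def build_manifest(artifacts: Dict[str, bytes]) -> Dict[str, str]:
    """Content-hash table {relative_path: sha256_hex}; signing is omitted in this sketch."""
    return {path: hashlib.sha256(blob).hexdigest() for path, blob in sorted(artifacts.items())}


def check_against_manifest(manifest: Dict[str, str], artifacts: Dict[str, bytes]) -> None:
    orphans = sorted(set(artifacts) - set(manifest))
    if orphans:
        raise ManifestCoverageError(f"not covered by Manifest: {orphans}")
    for path, expected in manifest.items():
        if path not in artifacts:
            raise ManifestMismatchError(f"missing artifact: {path}")
        if hashlib.sha256(artifacts[path]).hexdigest() != expected:
            raise ManifestMismatchError(f"content hash differs: {path}")
```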
### Non-functional requirements

- Cold build wall-clock ≤ 12 min on developer laptop with NVIDIA GPU; warm idempotent re-run ≤ 1 min (C10-PT-01).

### Risks & mitigations

- **R04** (engine cache hardware-tied) — owner of the build side; deserialise side is C7.

### Effort

T-shirt M; 13–21 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | TRT engine compile (per-model) | 5 |
| 2 | FAISS descriptor population via C2's embed path | 3 |
| 3 | Signed Manifest builder + content-hash table | 3 |
| 4 | ManifestVerifier with operator-key requirement | 3 |
| 5 | Idempotent re-run + ManifestCoverageError | 3 |
| 6 | Component-internal tests C10-IT-01..05 + C10-PT-01 + C10-ST-01 | 5 |

### Key constraints

- AC-8.3, AC-NEW-1, D-C10-1 / D-C10-3 / D-C10-6 / D-C10-7.

### Testing strategy

Per `components/11_c10_provisioning/tests.md`.

---

## E-C12 — C12 Operator Pre-flight Orchestrator

**Tracker**: AZ-253 | **Type**: component | **T-shirt**: M | **Story points**: 13–21

### System context

```mermaid
flowchart LR
    CLI[operator-orchestrator CLI]
    CLI --> C11D[C11 TileDownloader]
    CLI --> C10[C10 CacheProvisioner]
    CLI --> C11U[C11 TileUploader]
    CLI --> RELOC[AC-3.4 re-loc workflow]
    CLI --> FDR[FDR retrieval]
```

### Problem / Context

Operator-facing CLI that sequences pre-flight (C11 download → C10 build) and post-landing (C11 upload), surfaces actionable failures, and handles the AC-3.4 re-localization workflow. Delivered as part of the operator-orchestrator tarball.

### Scope

**In scope**: CLI subcommands (`download`, `build-cache`, `upload-pending`, `reloc-confirm`), `CacheBuildReport` aggregation, `PostLandingUploadOrchestrator` (post-landing safety gate reading `flight_footer.clean_shutdown` from FDR via `FdrFooterReader` — Batch 44 SRP refactor; supersedes the former C11-internal gate), `OperatorReLocService` (AC-3.4 visual-loss hint dispatched via `OperatorCommandTransport` Protocol — E-C8 ships the concrete pymavlink-backed impl), sector-classification UI hook, FDR retrieval helpers.

**Out of scope**: actual download/upload (E-C11; C11 no longer gates internally); engine compile (E-C10); FDR write side (E-C13); concrete `OperatorCommandTransport` (E-C8).

### Architecture notes

- File: `components/13_c12_operator_orchestrator/description.md`.
- Strict process boundary: C12 is operator-side only, in the same image as C11, but never airborne.

### Interface specification

```python
from __future__ import annotations  # DTO annotations stay unevaluated at import time
from typing import Protocol


class BuildCacheOrchestrator:
    def build_cache(self, request: BuildCacheRequest) -> CacheBuildReport: ...


class PostLandingUploadOrchestrator:
    def trigger_post_landing_upload(self, request: PostLandingUploadRequest) -> UploadBatchReportCut: ...
    # raises FlightStateNotConfirmedError(reason) for {footer_missing,
    # unclean_shutdown, flight_id_not_found, fdr_unreadable: <repr>}
    # or SatelliteProviderError on C11 transport failures.


class OperatorReLocService:
    def request_reloc(self, reloc_hint: ReLocHint) -> None: ...
    # raises GcsLinkError with "C12 reloc-confirm: " prefix on link failure.


class FdrFooterReader(Protocol):
    def read_flight_footer(self, flight_id: FlightId) -> FlightFooterRecord | None: ...


class OperatorCommandTransport(Protocol):
    def send_reloc_hint(self, hint: ReLocHint) -> None: ...
    # concrete impl owned by E-C8 (pymavlink-backed); pattern matches
    # AZ-322 BackboneEmbedder (C10 owns Protocol; C2 implements later).
```

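
The footer-based gate in `PostLandingUploadOrchestrator` reduces to a small decision over what `FdrFooterReader` returns. This is a minimal sketch with its assumptions labeled. The reader is modeled as a callable that raises `KeyError` for an unknown flight and `OSError` when the FDR cannot be read. The numeric exit codes are placeholders, and `FlightFooterRecord` is reduced to one field. Only the four refusal reasons come from this epic (C12-IT-03).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FlightFooterRecord:
    """Hypothetical minimal footer shape; only clean_shutdown matters here."""

    clean_shutdown: bool


class FlightStateNotConfirmedError(Exception):
    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


# Placeholder exit codes: distinct per refusal mode, values not taken from the spec.
EXIT_CODES = {"footer_missing": 20, "unclean_shutdown": 21, "flight_id_not_found": 22, "fdr_unreadable": 23}


def confirm_clean_landing(flight_id: str, read_flight_footer: Callable[[str], Optional[FlightFooterRecord]]) -> None:
    """Raise FlightStateNotConfirmedError unless the FDR footer shows a clean shutdown."""
    try:
        footer = read_flight_footer(flight_id)
    except KeyError:
        raise FlightStateNotConfirmedError("flight_id_not_found") from None
    except OSError as exc:
        raise FlightStateNotConfirmedError(f"fdr_unreadable: {exc!r}") from exc
    if footer is None:
        raise FlightStateNotConfirmedError("footer_missing")
    if not footer.clean_shutdown:
        raise FlightStateNotConfirmedError("unclean_shutdown")


def exit_code_for(err: FlightStateNotConfirmedError) -> int:
    return EXIT_CODES[err.reason.split(":", 1)[0]]
```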
### Data flow

```mermaid
sequenceDiagram
    participant Op as Operator
    participant C12 as OperatorTool
    participant C11 as C11
    participant C10 as C10
    Op->>C12: build_cache(area)
    C12->>C11: TileDownloader.fetch
    C11-->>C12: DownloadBatchReport
    C12->>C10: build_artifacts
    C10-->>C12: BuildReport
    C12-->>Op: CacheBuildReport
```

### Dependencies

- E-C10, E-C11, E-CC-LOG.

### Acceptance criteria

- C12-IT-01: operator re-loc workflow (`OperatorReLocService.request_reloc`) returns SUT to `satellite_anchored` ≤ 30 s (AC-3.4); on `GcsLinkError`, CLI exits with `EXIT_GCS_LINK_ERROR` and operator-actionable remediation text.
- C12-IT-02: `build_cache` orchestrates C11 then C10; download failure aborts before C10.
- C12-IT-03: `trigger_post_landing_upload` reads `flight_footer.clean_shutdown` from FDR via `FdrFooterReader` (Batch 44 footer-based gate; replaces the prior 30-s ON_GROUND heuristic). Refusal modes: `footer_missing`, `unclean_shutdown`, `flight_id_not_found`, `fdr_unreadable: <repr>` — each maps to a distinct CLI exit code.
- C12-IT-04: actionable failure messages + non-zero exit on stale-tile rate > 30% or manifest signature failure.
- C12-ST-01: no CLI command path imports across the airborne package boundary.

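
C12-IT-02 and C12-IT-04 fix the ordering (C11 download strictly before C10 build) and how failures surface. The sketch below shows one way to encode that as exit codes, under these assumptions:

- The stale-tile rate is rejected / (fetched + rejected), checked before C10 runs.
- The exit-code values are placeholders.
- `fetch` and `build` are hypothetical callables wrapping `TileDownloader.fetch` and `CacheProvisioner.build_artifacts`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DownloadSummary:
    """Hypothetical subset of DownloadBatchReport."""

    fetched: int
    stale_rejected: int


class DownloadFailed(Exception):
    """Hypothetical wrapper for a C11 download failure."""


STALE_RATE_LIMIT = 0.30  # C12-IT-04: non-zero exit above 30 %
EXIT_OK, EXIT_DOWNLOAD_FAILED, EXIT_TOO_STALE, EXIT_MANIFEST_INVALID = 0, 2, 3, 4  # placeholders


def build_cache(fetch: Callable[[], DownloadSummary], build: Callable[[], bool]) -> int:
    """Run C11 then C10 and return a CLI exit code; build() returns False on a bad manifest signature."""
    try:
        summary = fetch()
    except DownloadFailed as exc:
        print(f"download failed; C10 build not started: {exc}")
        return EXIT_DOWNLOAD_FAILED
    total = summary.fetched + summary.stale_rejected
    stale_rate = summary.stale_rejected / total if total else 0.0
    if stale_rate > STALE_RATE_LIMIT:
        print(f"stale-tile rate {stale_rate:.0%} exceeds {STALE_RATE_LIMIT:.0%}; refresh the area before building")
        return EXIT_TOO_STALE
    if not build():
        print("manifest signature check failed; cache not usable for takeoff")
        return EXIT_MANIFEST_INVALID
    return EXIT_OK
```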
### Non-functional requirements

- End-to-end `build_cache` wall-clock ≤ 18 min on developer laptop with NVIDIA GPU (C12-PT-01).

### Risks & mitigations

- **R08** (freshness drift) — actionable failure surfacing in CacheBuildReport.

### Effort

T-shirt M; 13–21 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | CLI scaffolding + subcommand routing (AZ-326) | 3 |
| 2 | `BuildCacheOrchestrator` — C11 then C10 sequenced flow + lockfile (AZ-328) | 5 |
| 3 | `PostLandingUploadOrchestrator` + `FdrFooterReader` — Batch 44 footer-based gate (AZ-329) | 3 |
| 4 | `OperatorReLocService` + `OperatorCommandTransport` Protocol — AC-3.4 (AZ-330) | 3 |
| 5 | Companion bringup (SSH-based pre-flight verification) (AZ-327) | 3 |
| 6 | `FlightsApiClient` — operator-origin path (AZ-489) | 3 |
| 7 | Component-internal tests C12-IT-01..04 + C12-PT-01 + C12-ST-01 + C12-AT-01 | 5 |

### Key constraints

- ADR-004 (C12 lives operator-side); AC-3.4, AC-8.3, AC-8.4.

### Testing strategy

Per `components/13_c12_operator_orchestrator/tests.md`.

---

## E-C1 — C1 Visual / Visual-Inertial Odometry

**Tracker**: AZ-254 | **Type**: component | **T-shirt**: XL | **Story points**: 34–55

### System context

```mermaid
flowchart LR
    NAVCAM[Nav camera 3 Hz] --> C1
    C8IMU[C8 ImuWindow 100-200 Hz] --> C1
    CAL[CameraCalibration] --> C1
    C1 --> C5[C5 StateEstimator]
```

### Problem / Context

Per-frame relative pose SE(3) + 6×6 covariance + IMU bias estimate from nav-camera + FC IMU. Three pluggable strategies (Okvis2 production-default, VinsMono research-only, KltRansac mandatory simple-baseline) selected at startup, build-time gated, never hot-swappable. Largest single epic by complexity.

### Scope

**In scope**: `VioStrategy` interface + the three concrete strategies, `ImuPreintegrator` helper, warm-start path (AC-5.1), reboot recovery (AC-5.3), KltRansac as the simple-baseline AC-2.1a check, honest covariance under degradation.

**Out of scope**: state fusion (E-C5), pose estimation (E-C4), satellite anchoring (E-C2/C3/C4 chain).

### Architecture notes

- File: `components/01_c1_vio/description.md`.
- Strategy + composition root + build-time exclusion (ADR-001 / ADR-002 / ADR-009).
- C++ strategies via pybind11; KltRansac thin Python wrapper around OpenCV.
- `ImuPreintegrator` shared with E-C5 (built once, used twice).

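
`ImuPreintegrator` is shared by C1 and C5. Its core is the standard on-manifold IMU preintegration recursion, sketched below with NumPy. The sketch propagates only the mean increments (ΔR, Δv, Δp) between two camera frames, with constant gyro and accel biases. Noise-covariance propagation and bias Jacobians are omitted, although a real implementation needs them for honest covariance. Gravity is not subtracted here; in the usual formulation it enters later, when the increments are related to the two states. All function names are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """3x3 cross-product matrix of a 3-vector."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def so3_exp(phi: np.ndarray) -> np.ndarray:
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = float(np.linalg.norm(phi))
    if theta < 1e-9:
        return np.eye(3) + skew(phi)  # first-order approximation near zero
    k = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)


def preintegrate(
    gyro: np.ndarray,  # (N, 3) angular rate, rad/s, body frame
    accel: np.ndarray,  # (N, 3) specific force, m/s^2, body frame
    dt: float,
    gyro_bias: Optional[np.ndarray] = None,
    accel_bias: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    bg = np.zeros(3) if gyro_bias is None else gyro_bias
    ba = np.zeros(3) if accel_bias is None else accel_bias
    d_rot, d_vel, d_pos = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        a_first = d_rot @ (a - ba)  # bias-corrected accel rotated into the first sample's frame
        d_pos = d_pos + d_vel * dt + 0.5 * a_first * dt**2
        d_vel = d_vel + a_first * dt
        d_rot = d_rot @ so3_exp((w - bg) * dt)
    return d_rot, d_vel, d_pos
```

With no rotation and a constant 1 m/s² along x for 1 s, the increments come out as Δv = 1 m/s and Δp = 0.5 m, matching the closed-form kinematics.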
### Interface specification

```python
from __future__ import annotations  # DTO annotations stay unevaluated at import time
from typing import Protocol


class VioStrategy(Protocol):
    def process_frame(self, frame: NavCameraFrame, imu: ImuWindow, cal: CameraCalibration) -> VioOutput: ...
    def reset_to_warm_start(self, pose: WarmStartPose) -> None: ...
    def health_snapshot(self) -> VioHealth: ...
```

DTOs in `components/01_c1_vio/description.md` § 2.

### Data flow

```mermaid
sequenceDiagram
    participant CAM as Nav camera
    participant C1 as VioStrategy
    participant C5 as C5
    participant FDR as FDR
    CAM->>C1: NavCameraFrame
    C1->>C1: IMU preintegrate + feature tracking
    C1->>C5: VioOutput (relative pose + 6x6 cov + bias)
    C1->>FDR: VioHealth (ERROR + WARN only, DEBUG goes to stdout)
```

### Dependencies

- E-BOOT, E-CC-FDR-CLIENT, E-C7 (only for the simple-baseline KltRansac path; OKVIS2 / VinsMono are CPU-bound, not GPU).

### Acceptance criteria

- C1-IT-01: honest covariance norm rises monotonically under a feature-loss event (AC-1.3 / AC-1.4).
- C1-IT-02: `VioOutput` schema invariants — SPD covariance + matched frame_id (AC-1.4).
- C1-IT-03: KltRansac ≥ 95% tracked-frame ratio on Derkachi normal segment (AC-2.1a engine rule).
- C1-IT-04: MRE p95 < 1 px frame-to-frame for Okvis2 + KltRansac (AC-2.2).
- C1-IT-05: warm-start converges within 5 frames (AC-5.1).
- C1-IT-06: F8 reboot recovery from warm-start hint without fake confidence (AC-5.3).

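
C1-IT-01 and C1-IT-02 are both checks on the 6×6 covariance that `VioOutput` carries. The sketch below shows the two predicates using NumPy. Symmetric positive-definiteness is tested with a Cholesky factorisation, which fails for symmetric matrices that are not positive-definite. The epic does not name the norm used for the monotonicity check, so the Frobenius norm here is an assumption.

```python
from typing import Sequence

import numpy as np


def is_spd(cov: np.ndarray, atol: float = 1e-12) -> bool:
    """True if cov is a symmetric positive-definite 6x6 matrix."""
    if cov.shape != (6, 6) or not np.allclose(cov, cov.T, rtol=0.0, atol=atol):
        return False
    try:
        np.linalg.cholesky(cov)  # succeeds only for positive-definite input
    except np.linalg.LinAlgError:
        return False
    return True


def rises_monotonically(covs: Sequence[np.ndarray]) -> bool:
    """True if the covariance norm never decreases frame to frame (Frobenius norm assumed)."""
    norms = [float(np.linalg.norm(c, ord="fro")) for c in covs]
    return all(later >= earlier for earlier, later in zip(norms, norms[1:]))
```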
### Non-functional requirements

- C1-PT-01: `process_frame` p95 ≤ 80 ms (Okvis2) at 3 Hz on Tier-2 with C2 backbone running concurrently; throughput ≥ 3 Hz sustained.
- CPU ≤ 30% of one core; resident memory ≤ 1.5 GB.

### Risks & mitigations

- **R10** (latency under thermal throttle) — C1's budget partition is fixed; thermal-driven hybrid lives in C4.
- **R12** (single deployment camera) — KltRansac engine-rule path stays camera-agnostic; comparative IT-12 study uses static fixtures.

### Effort

T-shirt XL; 34–55 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `VioStrategy` interface + composition wiring | 3 |
| 2 | OKVIS2 strategy (pybind11 binding + integration) | 5 |
| 3 | VinsMono strategy (research-only; behind BUILD_VINS_MONO) | 5 |
| 4 | KltRansac simple-baseline strategy | 5 |
| 5 | `ImuPreintegrator` helper (shared with C5) | 3 |
| 6 | Warm-start + F8 reboot recovery paths | 3 |
| 7 | Honest-covariance contract tests | 3 |
| 8 | Component-internal tests C1-IT-01..06 + C1-PT-01 | 5 |

### Key constraints

- AC-1.3, AC-1.4, AC-2.1a, AC-2.2, AC-4.1, AC-5.1, AC-5.3; RESTRICT-UAV-3 (sharp turns < 5% overlap).

### Testing strategy

Per `components/01_c1_vio/tests.md` + suite-level FT-P-02 / FT-P-04 / FT-P-05.

---

## E-C2 — C2 Visual Place Recognition

**Tracker**: AZ-255 | **Type**: component | **T-shirt**: L | **Story points**: 21–34

### System context

```mermaid
flowchart LR
    CAM[Nav camera] --> C2
    C7[C7 backbone] --> C2
    C6[C6 FAISS index] --> C2
    C2 --> C25[C2.5 Re-rank]
```

### Problem / Context

Top-K=10 candidate retrieval from the pre-cached corpus by descriptor similarity. UltraVPR primary, MegaLoc secondary, NetVLAD mandatory simple-baseline. Boundary between cheap retrieval and expensive matching.

### Scope

**In scope**: `VprStrategy` + multiple backbones, FAISS HNSW lookup, descriptor pre-processing (resize/crop/normalise), L2 normalisation via `DescriptorNormaliser`, descriptor population entry-point used by C10.

**Out of scope**: re-rank (E-C2.5), matching (E-C3), index build (E-C10).

### Architecture notes

- File: `components/02_c2_vpr/description.md`.
- Strategy + ADR-001/002/009.

### Interface specification

```python
from __future__ import annotations  # DTO annotations stay unevaluated at import time
from typing import Protocol


class VprStrategy(Protocol):
    def embed_query(self, frame: NavCameraFrame, cal: CameraCalibration) -> VprQuery: ...
    def retrieve_topk(self, query: VprQuery, k: int) -> VprResult: ...
    def descriptor_dim(self) -> int: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant CAM as Nav camera
    participant C2 as VprStrategy
    participant C7 as C7
    participant C6 as FAISS
    CAM->>C2: NavCameraFrame
    C2->>C7: backbone forward
    C7-->>C2: embedding
    C2->>C6: HNSW search k=10
    C6-->>C2: candidates
    C2-->>C25: VprResult
```

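
The retrieval step reduces to nearest-neighbour search over L2-normalised descriptors. The sketch below uses exact brute-force search in NumPy as a stand-in for the FAISS HNSW index, which returns approximate neighbours under the same metric. `l2_normalise` and `retrieve_topk` are illustrative functions, not the `DescriptorNormaliser` or `VprStrategy` APIs. On unit vectors, squared L2 distance equals 2 − 2·cosine similarity, so ranking by either gives the same order.

```python
from typing import Tuple

import numpy as np


def l2_normalise(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row (or a single vector) to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_topk(query: np.ndarray, corpus: np.ndarray, k: int = 10) -> Tuple[np.ndarray, np.ndarray]:
    """Exact top-k by squared L2 on unit descriptors; returns (indices, distances) ascending."""
    q = l2_normalise(query)
    c = l2_normalise(corpus)
    dists = np.sum((c - q) ** 2, axis=1)  # equals 2 - 2 * cosine similarity
    k = min(k, len(c))
    candidates = np.argpartition(dists, k - 1)[:k]  # the k smallest, unordered
    order = candidates[np.argsort(dists[candidates], kind="stable")]
    return order, dists[order]
```

Returning distances in ascending order matches the sorted-distances invariant that C2-IT-02 checks on `VprResult`.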
### Dependencies

- E-C6, E-C7, E-CC-FDR-CLIENT.

### Acceptance criteria

- C2-IT-01: UltraVPR recall@10 ≥ 0.95; NetVLAD ≥ 0.85 on Derkachi (AC-2.1b + engine rule).
- C2-IT-02: `VprResult` invariants (length, sorted distances, label).
- C2-IT-03: poisoned-tile top-1 rate within AC-NEW-7 relaxed CI.
- C2-IT-04: scale-ratio ±20% recall@10 ≥ 0.85 (AC-8.6 scale half).
- C2-ST-01: index handle invalidation rejected with `IndexUnavailableError`.

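
C2-IT-01 and C2-IT-04 use recall@K. A query counts as a hit when at least one ground-truth tile appears in its top K results. A plain-Python sketch follows. How ground-truth tiles are defined for a Derkachi frame (for example, tiles within some distance of the true position) is an assumption this epic does not fix.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[Sequence[str]], ground_truth: Sequence[Set[str]], k: int = 10) -> float:
    """Fraction of queries whose top-k list contains at least one ground-truth tile."""
    if not ranked or len(ranked) != len(ground_truth):
        raise ValueError("need one ground-truth set per ranked query list")
    hits = sum(1 for top, truth in zip(ranked, ground_truth) if truth.intersection(top[:k]))
    return hits / len(ranked)
```

C2-IT-01 would then assert `recall_at_k(...) >= 0.95` for UltraVPR and `>= 0.85` for NetVLAD at `k=10`.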
### Non-functional requirements

- C2-PT-01: `embed_query` p95 ≤ 60 ms; `retrieve_topk` p95 ≤ 2 ms; combined ≤ 65 ms (AC-4.1 partition).
- GPU ≤ 600 MB resident; system mem ≤ 200 MB for index handle.

### Risks & mitigations

- **R06** (VPR top-1 false positive) — C2.5 + C3 + AC-NEW-7 downstream.

### Effort

T-shirt L; 21–34 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `VprStrategy` interface + composition | 3 |
| 2 | UltraVPR backbone (TRT) | 5 |
| 3 | MegaLoc, MixVPR, SelaVPR, EigenPlaces secondary backbones | 5 |
| 4 | NetVLAD mandatory simple-baseline | 3 |
| 5 | FAISS HNSW load + lookup wiring | 3 |
| 6 | `DescriptorNormaliser` helper (shared with C10) | 2 |
| 7 | Component-internal tests C2-IT-01..04 + C2-PT-01 + C2-ST-01 | 5 |

### Key constraints

- AC-2.1b, AC-2.2, AC-4.1, AC-8.6, AC-NEW-7.

### Testing strategy

Per `components/02_c2_vpr/tests.md`.

---

## E-C2.5 — C2.5 Inlier-based Re-rank

**Tracker**: AZ-256 | **Type**: component | **T-shirt**: S | **Story points**: 5–8

### System context

```mermaid
flowchart LR
    C2[C2 K=10] --> C25
    C7[C7 LightGlueRuntime helper] --> C25
    C6[C6 tile pixels] --> C25
    C25 --> C3[C3 N=3]
```

### Problem / Context

K=10 → N=3 by single-pair LightGlue inlier count. Boundary between cheap retrieval and expensive matching. Shares `LightGlueRuntime` helper with C3 (R14 — owned by helper, not by either component).

### Scope

**In scope**: `ReRankStrategy` + `InlierCountReRanker`, drop-and-continue on per-candidate failure.

**Out of scope**: matching itself (E-C3); LightGlue runtime ownership (the helper is its own module).

### Architecture notes

- File: `components/03_c2_5_rerank/description.md`.
- Helper-ownership decision documented in R14 / risk_mitigations.md.

### Interface specification

```python
from __future__ import annotations  # DTO annotations stay unevaluated at import time
from typing import Protocol


class ReRankStrategy(Protocol):
    def rerank(self, frame: NavCameraFrame, vpr_result: VprResult, n: int) -> RerankResult: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant C2 as C2
    participant C25 as C2.5
    participant LG as LightGlueRuntime helper
    participant C6 as C6
    C2->>C25: VprResult (k=10)
    loop 10 candidates
        C25->>C6: get_tile_pixels
        C25->>LG: single-pair inlier count
        LG-->>C25: inlier count
    end
    C25-->>C3: top-N=3 by inlier count
```

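
The K=10 → N=3 step with drop-and-continue can be sketched independently of LightGlue. The sketch makes three assumptions:

- `count_inliers` is a hypothetical callable wrapping the C6 pixel fetch plus the shared `LightGlueRuntime` single-pair match.
- Ties are broken by the original C2 rank.
- The function returns plain `(tile_id, inliers)` pairs instead of the real `RerankResult`.

```python
from typing import Callable, List, Sequence, Tuple


class RerankBackboneError(Exception):
    """Stand-in for the per-candidate error named in C2.5-IT-02."""


def rerank(candidates: Sequence[str], count_inliers: Callable[[str], int], n: int = 3) -> List[Tuple[str, int]]:
    """Keep the n candidates with the most LightGlue inliers; drop candidates that fail."""
    scored: List[Tuple[int, int, str]] = []
    for c2_rank, tile_id in enumerate(candidates):
        try:
            scored.append((count_inliers(tile_id), c2_rank, tile_id))
        except RerankBackboneError:
            continue  # drop-and-continue: one bad candidate must not sink the frame
    scored.sort(key=lambda item: (-item[0], item[1]))  # more inliers first; ties keep C2 order
    return [(tile_id, inliers) for inliers, _, tile_id in scored[:n]]
```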
### Dependencies

- E-C2, E-C7, E-C6, shared `LightGlueRuntime` helper (with C3).

### Acceptance criteria

- C2.5-IT-01: top-1 promotion rate ≥ 0.98 (rerank rarely overrides correct C2 top-1).
- C2.5-IT-02: drop-and-continue on per-candidate `RerankBackboneError`.
- C2.5-IT-03: shared `LightGlueRuntime` serial-access invariant (no deadlock; bit-identical to single-threaded).

### Non-functional requirements

- C2.5-PT-01: `rerank` p95 ≤ 80 ms for 10 single-pair LightGlue passes; a single engine instance is reused across calls.
- GPU mem ≤ 300 MB for the shared LightGlue engine.

### Risks & mitigations

- **R14** (apparent C2.5↔C3 cycle) — resolved this iteration via helper ownership.

### Effort

T-shirt S; 5–8 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `InlierCountReRanker` + drop-and-continue | 3 |
| 2 | Shared `LightGlueRuntime` helper module | 3 |
| 3 | Component-internal tests C2.5-IT-01..03 + C2.5-PT-01 | 2 |

### Key constraints

- AC-2.1b, AC-4.1, AC-NEW-7.

### Testing strategy

Per `components/03_c2_5_rerank/tests.md`.

---

## E-C3 — C3 Cross-Domain Matcher

**Tracker**: AZ-257 | **Type**: component | **T-shirt**: L | **Story points**: 21–34

### System context

```mermaid
flowchart LR
    C25[C2.5 N=3] --> C3
    C7[C7] --> C3
    CAL[CameraCalibration] --> C3
    C6[C6 tiles] --> C3
    C3 --> C35[C3.5 AdHoP]
```

### Problem / Context

Produces 2D-3D correspondences between the nav-camera frame and the top-N=3 satellite tiles, with RANSAC inlier filtering and a reprojection residual. This is the dominant compute cost in F3. The backbone choice is locked to DISK+LightGlue (D-C3-1 = (a)), pending the IT-12 verdict.

### Scope

**In scope**: `CrossDomainMatcher` + DISK+LightGlue (primary) + ALIKED+LightGlue (secondary) + XFeat (alternate); RANSAC + reprojection residual via the `RansacFilter` helper; `InsufficientInliersError` propagation.

**Out of scope**: refinement (E-C3.5); pose estimation (E-C4); LightGlue runtime ownership (owned by the shared `LightGlueRuntime` helper).

### Architecture notes

- File: `components/04_c3_matcher/description.md`.

### Interface specification

```python
from __future__ import annotations  # domain types are defined elsewhere

from typing import Protocol


class CrossDomainMatcher(Protocol):
    def match(self, frame: NavCameraFrame, rerank: RerankResult, cal: CameraCalibration) -> MatchResult: ...
    def health_snapshot(self) -> MatcherHealth: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant C25 as C2.5
    participant C3 as C3
    participant C7 as C7
    participant C35 as C3.5
    C25->>C3: RerankResult (n=3)
    loop 3 candidates
        C3->>C7: backbone forward
        C3->>C3: RANSAC + residual
    end
    C3-->>C35: MatchResult (best by inlier count)
```

### Dependencies

- E-C2.5, E-C7, shared `LightGlueRuntime` helper, shared `RansacFilter` helper.

### Acceptance criteria

- C3-IT-01: best-candidate inlier count p5 ≥ 80 (AC-1.1 partition).
- C3-IT-02: deterministic `best_candidate_idx == argmax(inlier_count)` with deterministic tie-break.
- C3-IT-03: cross-domain MRE p95 < 2.5 px (AC-2.2).
- C3-IT-04: tilt ±20° + 350 m outliers — inlier count p10 ≥ 40 (AC-3.1).
- C3-IT-05: `InsufficientInliersError` propagation when all N=3 fail.

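
C3-IT-02 and C3-IT-05 together pin down the selection rule. The sketch below is minimal and assumes a per-candidate summary (`CandidateMatch`) and a hypothetical `min_inliers` floor. The tie-break order (lower mean residual, then lower candidate index) is an illustrative choice, not the one fixed by the spec.

```python
from dataclasses import dataclass
from typing import Sequence


class InsufficientInliersError(RuntimeError):
    """Raised when no candidate clears the inlier floor (name from C3-IT-05)."""


@dataclass(frozen=True)
class CandidateMatch:  # hypothetical per-candidate summary after RANSAC
    candidate_idx: int  # index into the RerankResult (0..N-1)
    inlier_count: int
    mean_residual_px: float


def select_best(matches: Sequence[CandidateMatch], min_inliers: int) -> CandidateMatch:
    viable = [m for m in matches if m.inlier_count >= min_inliers]
    if not viable:
        raise InsufficientInliersError(
            f"all {len(matches)} candidates below {min_inliers} inliers")
    # argmax(inlier_count) with a deterministic tie-break:
    # lower residual first, then lower candidate index.
    return min(viable, key=lambda m: (-m.inlier_count, m.mean_residual_px, m.candidate_idx))
```

Because the key is a total order over the candidate fields, the result does not depend on the order in which the N candidates finish.
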
### Non-functional requirements

- C3-PT-01: `match` p95 ≤ 180 ms; per-candidate ≤ 60 ms; throughput ≥ 3 Hz; GPU memory ≤ 800 MB combined.

### Risks & mitigations

- **R06** (false-positive match) — mitigated by RANSAC, the reprojection-residual check, and downstream AC-NEW-7.
- **R10** (Marginals under throttle) — the D-CROSS-LATENCY-1 hybrid affects C4, not C3 (C3's budget is fixed).

### Effort

T-shirt L; 21–34 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `CrossDomainMatcher` interface + composition | 3 |
| 2 | DISK+LightGlue primary | 5 |
| 3 | ALIKED+LightGlue secondary | 3 |
| 4 | XFeat alternate (lightweight) | 3 |
| 5 | `RansacFilter` helper (shared C3/C3.5/C4) | 3 |
| 6 | Component-internal tests C3-IT-01..05 + C3-PT-01 | 5 |

### Key constraints

- AC-1.1, AC-2.2, AC-3.1, AC-4.1.

### Testing strategy

Per `components/04_c3_matcher/tests.md`.

---

## E-C3.5 — C3.5 AdHoP-Conditional Refinement

**Tracker**: AZ-258 | **Type**: component | **T-shirt**: M | **Story points**: 8–13

### System context

```mermaid
flowchart LR
    C3[C3 MatchResult] --> C35
    C7[C7 AdHoP backbone] --> C35
    C35 --> C4[C4 Pose]
```

### Problem / Context

Applies perspective preconditioning only when the reprojection residual exceeds a threshold, and passes the `MatchResult` through unchanged otherwise. This preserves the AC-4.1 budget on the steady-state path while keeping refinement available for hard frames.

### Scope

**In scope**: `ConditionalRefiner` + `AdHoPRefiner` + `PassthroughRefiner` (both linked into the deployment binary); residual threshold configuration; passthrough fall-through on `RefinerBackboneError`.

**Out of scope**: matcher (E-C3); pose (E-C4).

### Architecture notes

- File: `components/05_c3_5_adhop/description.md`.
- Both implementations are linked into the deployment binary; the runtime gate is a config knob.

### Interface specification

```python
from __future__ import annotations  # domain types are defined elsewhere

from typing import Protocol


class ConditionalRefiner(Protocol):
    def refine_if_needed(self, frame: NavCameraFrame, mr: MatchResult, threshold: float) -> MatchResult: ...
    def was_invoked(self) -> bool: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant C3 as C3
    participant C35 as C3.5
    participant C7 as C7
    participant C4 as C4
    C3->>C35: MatchResult (residual=R)
    alt R > threshold
        C35->>C7: AdHoP backbone forward
        C7-->>C35: refined correspondences
        C35-->>C4: enriched MatchResult
    else
        C35-->>C4: passthrough MatchResult
    end
```

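
The gate in the sequence above, including the passthrough fall-through checked by C3.5-IT-02, can be sketched as follows. `MatchResult` here is a hypothetical two-field stand-in, and `adhop` is any callable standing in for the TRT-backed `AdHoPRefiner`. Returning the input object unchanged is what makes the passthrough trivially bit-identical.

```python
from dataclasses import dataclass
from typing import Any, Callable


class RefinerBackboneError(RuntimeError):
    """AdHoP backbone failure (name from C3.5-IT-02)."""


@dataclass(frozen=True)
class MatchResult:  # hypothetical minimal stand-in for the real MatchResult
    correspondences: tuple
    residual_px: float


class GatedRefiner:
    def __init__(self, adhop: Callable[[Any, MatchResult], MatchResult]) -> None:
        self._adhop = adhop
        self._invoked = False

    def refine_if_needed(self, frame: Any, mr: MatchResult, threshold: float) -> MatchResult:
        self._invoked = mr.residual_px > threshold
        if not self._invoked:
            return mr  # passthrough: same object, so correspondences are bit-identical
        try:
            return self._adhop(frame, mr)
        except RefinerBackboneError:
            return mr  # fall through to passthrough on backbone failure

    def was_invoked(self) -> bool:
        # True when the gate fired, even if the backbone then failed.
        return self._invoked
```
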
### Dependencies

- E-C3, E-C7.

### Acceptance criteria

- C3.5-IT-01: residual reduction ≥ 90% of invocations (AC-2.2 hard-frame portion).
- C3.5-IT-02: passthrough fall-through on `RefinerBackboneError` with bit-identical correspondences.
- C3.5-IT-03: invocation rate < 0.30 on the Derkachi normal segment.

### Non-functional requirements

- C3.5-PT-01: invoked p95 ≤ 90 ms; passthrough p95 ≤ 0.5 ms; aggregated added latency ≤ 25 ms.

### Risks & mitigations

- **R10** (latency under throttle) — threshold tunable via the operator-orchestrator pre-flight.

### Effort

T-shirt M; 8–13 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `AdHoPRefiner` (TRT engine + perspective preconditioning) | 5 |
| 2 | `PassthroughRefiner` no-op | 1 |
| 3 | Conditional gate + passthrough fall-through | 2 |
| 4 | Component-internal tests C3.5-IT-01..03 + C3.5-PT-01 | 3 |

### Key constraints

- AC-2.2, AC-4.1.

### Testing strategy

Per `components/05_c3_5_adhop/tests.md`.

---

## E-C4 — C4 Pose Estimator

**Tracker**: AZ-259 | **Type**: component | **T-shirt**: M | **Story points**: 13–21

### System context

```mermaid
flowchart LR
    C35[C3.5] --> C4
    CAL[CameraCalibration] --> C4
    C7[C7 ThermalState] --> C4
    C5GRAPH[C5 iSAM2 graph] --> C4
    C4 --> C5[C5 add_pose_anchor]
```

### Problem / Context

Converts a `MatchResult` into a `PoseEstimate` (WGS84 + 6×6 covariance + provenance label). Uses OpenCV `solvePnPRansac` plus GTSAM `Marginals` for a native 6×6 covariance; under thermal throttle, the D-CROSS-LATENCY-1 hybrid degrades to a Jacobian-based covariance.

### Scope

**In scope**: `OpenCVGtsamPoseEstimator`, GTSAM `Marginals` integration with C5's iSAM2 graph, Jacobian fallback, per-frame thermal-state-driven mode switch, `WgsConverter` helper usage.

**Out of scope**: state fusion (E-C5); thermal telemetry source (E-C7).

### Architecture notes

- File: `components/06_c4_pose/description.md`.
- ADR-003 shared substrate: C4 adds factors to C5's graph; co-developed.
- ADR-006: the Jacobian fallback's ~5–10% accuracy loss is accepted under throttle.

### Interface specification

```python
from __future__ import annotations  # domain types are defined elsewhere

from typing import Protocol


class PoseEstimator(Protocol):
    def estimate(self, mr: MatchResult, cal: CameraCalibration, thermal: ThermalState) -> PoseEstimate: ...
    def current_covariance_mode(self) -> CovarianceMode: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant C35 as C3.5
    participant C4 as C4
    participant C5 as C5 graph
    participant C7 as C7 thermal
    C35->>C4: MatchResult
    C7-->>C4: ThermalState
    alt thermal.throttle
        C4->>C4: Jacobian covariance
    else
        C4->>C5: add factor
        C5->>C5: Marginals.marginalCovariance
        C5-->>C4: Sigma
    end
    C4-->>C5: PoseEstimate (add_pose_anchor)
```

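
The per-frame D-CROSS-LATENCY-1 switch can be sketched as follows, assuming NumPy. The JACOBIAN branch uses the standard first-order least-squares approximation Sigma ≈ sigma² (JᵀJ)⁻¹. Here J is the 2M×6 reprojection Jacobian at the PnP solution and sigma is the pixel noise. The MARGINALS branch is left as a callable because, in the real design, it is GTSAM `Marginals.marginalCovariance` on C5's graph. `ThermalState` and the function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

import numpy as np


class CovarianceMode(Enum):
    MARGINALS = "marginals"
    JACOBIAN = "jacobian"


@dataclass(frozen=True)
class ThermalState:  # hypothetical minimal stand-in
    throttle: bool


def jacobian_covariance(J: np.ndarray, sigma_px: float) -> np.ndarray:
    """First-order 6x6 pose covariance sigma^2 * (J^T J)^-1 from a (2M x 6) Jacobian."""
    return np.linalg.inv(J.T @ J) * sigma_px**2


def pose_covariance(thermal: ThermalState, J: np.ndarray, sigma_px: float,
                    marginals: Callable[[], np.ndarray]) -> tuple[CovarianceMode, np.ndarray]:
    # Decided per frame, so a throttle change takes effect on the very next frame.
    if thermal.throttle:
        return CovarianceMode.JACOBIAN, jacobian_covariance(J, sigma_px)
    return CovarianceMode.MARGINALS, marginals()
```

The Jacobian path ignores correlations with earlier keyframes that `Marginals` would capture. That omission is the source of the accuracy loss accepted in ADR-006.
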
### Dependencies

- E-C3.5, E-C5 (co-developed shared substrate), shared `RansacFilter`, shared `WgsConverter`, shared `SE3Utils`.

### Acceptance criteria

- C4-IT-01: WGS84 accuracy p80 ≤ 50 m, p50 ≤ 20 m on Derkachi (AC-1.1 / AC-1.2).
- C4-IT-02: 6×6 covariance is SPD and stays honest under inlier degradation (AC-1.4).
- C4-IT-03: D-CROSS-LATENCY-1 mode switch within 1 frame (AC-NEW-5 workstation portion).
- C4-IT-04: shared-graph integration with C5; prior keyframe perturbations stay within tolerance.

### Non-functional requirements

- C4-PT-01: `estimate` p95 ≤ 90 ms in MARGINALS mode and ≤ 15 ms in JACOBIAN mode; mode switch ≤ 1 frame.

### Risks & mitigations

- **R10** (Marginals throttle) — primary owner of the hybrid switch.

### Effort

T-shirt M; 13–21 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `solvePnPRansac` + IPPE wiring | 3 |
| 2 | GTSAM `Marginals` factor add to C5 graph | 5 |
| 3 | Jacobian-degraded fallback | 3 |
| 4 | Per-frame thermal-state-driven switch | 2 |
| 5 | `WgsConverter` helper (shared with C8) | 3 |
| 6 | Component-internal tests C4-IT-01..04 + C4-PT-01 | 3 |

### Key constraints

- AC-1.1, AC-1.2, AC-1.4, AC-4.1, AC-NEW-5.

### Testing strategy

Per `components/06_c4_pose/tests.md`.

---

## E-C5 — C5 State Estimator

**Tracker**: AZ-260 | **Type**: component | **T-shirt**: XL | **Story points**: 34–55

### System context

```mermaid
flowchart LR
    C1[C1 VioOutput] --> C5
    C4[C4 PoseEstimate] --> C5
    C8I[C8 IMU/attitude/gps_health] --> C5
    C5 --> C8O[C8 outbound 5 Hz]
    C5 --> ORTHO[Orthorectifier → C6 mid-flight tile]
    C5 --> FDR[FDR smoothed history]
```

### Problem / Context

Owns GTSAM iSAM2 + IncrementalFixedLagSmoother (K=10–20). Fuses VIO, pose anchors and FC IMU into the posterior state. Emits the smoothed current frame to C8 and smoothed past keyframes to the FDR (AC-4.5 revised: past keyframes are NOT pushed retroactively to the FC). Also owns the spoof-promotion gate (AC-NEW-2 / AC-NEW-8). This is the largest epic alongside C1.

### Scope

**In scope**: `StateEstimator` + `GtsamIsam2StateEstimator` (production-default) + `EskfStateEstimator` (mandatory simple-baseline); spoof-promotion gate; source-label state machine; smoothed history → FDR; AC-5.2 fallback path.

**Out of scope**: VIO (E-C1); pose (E-C4); FC adapter (E-C8); orthorectifier (it could be split out, but per the spec it stays inside C5 as an internal subcomponent).

### Architecture notes

- File: `components/07_c5_state/description.md`.
- ADR-003 (shared GTSAM substrate with C4); co-developed.
- ADR-008 and the spoof-gate logic.

### Interface specification

```python
from __future__ import annotations  # domain types are defined elsewhere

from typing import Protocol


class StateEstimator(Protocol):
    def add_vio(self, o: VioOutput) -> None: ...
    def add_pose_anchor(self, p: PoseEstimate) -> None: ...
    def add_fc_imu(self, w: ImuWindow) -> None: ...
    def current_estimate(self) -> EstimatorOutput: ...
    def smoothed_history(self, n: int) -> list[EstimatorOutput]: ...
    def health_snapshot(self) -> EstimatorHealth: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant C1 as C1
    participant C4 as C4
    participant C8I as C8 inbound
    participant C5 as C5 iSAM2
    participant C8O as C8 outbound
    participant FDR as FDR
    C1->>C5: add_vio
    C4->>C5: add_pose_anchor (factor add)
    C8I->>C5: add_fc_imu
    C5->>C5: iSAM2 update + Marginals
    C5->>C8O: current_estimate (5 Hz)
    C5->>FDR: smoothed_history (per AC-4.5)
```

### Dependencies

- E-C1, E-C4, E-CC-FDR-CLIENT, E-C8 inbound side, shared `ImuPreintegrator`, `SE3Utils`, `WgsConverter`.

### Acceptance criteria

- C5-IT-01: `last_satellite_anchor_age_ms` resets on each anchor and rises monotonically between anchors (AC-1.3 binning).
- C5-IT-02: the smoothed current estimate carries an honest covariance (AC-1.4).
- C5-IT-03: VIO-only fallback under matcher failure (AC-3.5).
- C5-IT-04: smoothed past keyframes → FDR but NOT to the FC stream (AC-4.5 revised).
- C5-IT-05: 3 s without an estimate triggers the AC-5.2 fallback.
- C5-IT-06: spoof-promotion gate ≥ 10 s + visual consistency (AC-NEW-2).
- C5-IT-07: visual blackout + spoof escalation (AC-NEW-8).
- C5-ST-01: spoof-rejection logging cannot be silenced.

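
Here is a minimal sketch of the time-hold half of the spoof-promotion gate in C5-IT-06. It assumes the caller supplies a scalar GPS-versus-visual disagreement on each update. `max_disagreement_m` is a hypothetical threshold, and the consistency test is deliberately simplified. Time is passed in by the caller, so the same logic runs under wall-clock or tlog-driven time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpoofPromotionGate:
    """Hypothetical sketch: promote GPS back to trusted only after it has agreed
    with the visual estimate for `hold_s` seconds without interruption."""

    hold_s: float = 10.0              # 10 s hold per C5-IT-06
    max_disagreement_m: float = 50.0  # hypothetical consistency threshold
    _consistent_since: float | None = field(default=None, init=False, repr=False)

    def update(self, now_s: float, gps_vs_visual_m: float) -> bool:
        """Feed one comparison; returns True once promotion is allowed."""
        if gps_vs_visual_m > self.max_disagreement_m:
            self._consistent_since = None  # any inconsistency restarts the hold
            return False
        if self._consistent_since is None:
            self._consistent_since = now_s
        return now_s - self._consistent_since >= self.hold_s
```

Resetting the hold on any single inconsistent sample is the conservative reading of "premature promotion" (R07). A real gate would also log each rejection, per C5-ST-01.
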
### Non-functional requirements

- C5-PT-01: `add_pose_anchor` + `current_estimate` p95 ≤ 60 ms; memory ≤ 100 MB resident.

### Risks & mitigations

- **R05** (iSAM2 silent factor-add failure) — every factor add logs its success/failure result.
- **R07** (spoof premature promotion) — primary owner of the gate.

### Effort

T-shirt XL; 34–55 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `StateEstimator` interface + composition | 3 |
| 2 | iSAM2 + IncrementalFixedLagSmoother K=10-20 wiring | 5 |
| 3 | `BetweenFactorPose3` (VIO) + `GenericProjectionFactorCal3DS2` (pose) | 5 |
| 4 | `Marginals.marginalCovariance` integration | 3 |
| 5 | Source-label state machine + spoof-promotion gate | 5 |
| 6 | `EskfStateEstimator` mandatory simple-baseline | 5 |
| 7 | Smoothed-history → FDR path (NOT to FC) | 3 |
| 8 | AC-5.2 fallback path | 3 |
| 9 | Orthorectifier → C6 mid-flight tile gen sub-path | 3 |
| 10 | Component-internal tests C5-IT-01..07 + C5-PT-01 + C5-ST-01 | 5 |

### Key constraints

- AC-1.3, AC-1.4, AC-3.5, AC-4.5 (revised), AC-5.2, AC-NEW-2, AC-NEW-8.

### Testing strategy

Per `components/07_c5_state/tests.md`.

---

## E-C8 — C8 FC + GCS Adapter

**Tracker**: AZ-261 | **Type**: component | **T-shirt**: L | **Story points**: 21–34

### System context

```mermaid
flowchart LR
    FCIN[FC inbound MAVLink/MSP2] --> C8I[C8 inbound]
    C8I --> C5[C5]
    C8I --> C1[C1]
    C5 --> C8O[C8 outbound]
    C8O -->|GPS_INPUT / MSP2_SENSOR_GPS| FCOUT[FC]
    C8O -->|telemetry 1-2 Hz| GCS[QGroundControl]
```

### Problem / Context

Handles per-FC inbound and outbound traffic. Inbound: subscribes to FC IMU/attitude/GPS-health/MAV_STATE and publishes `ImuWindow`/`AttitudeWindow`/`GpsHealth`/`FlightStateSignal`. Outbound: encodes `EstimatorOutput` for AP (`GPS_INPUT`) and iNav (`MSP2_SENSOR_GPS`) at 5 Hz with an honest 6×6 → 2×2 covariance projection. Owns MAVLink 2.0 signing on the AP wired channel (D-C8-9 = (d), R03 risk) plus per-flight key rotation. Also feeds the GCS at 1–2 Hz.

### Scope

**In scope**: `FcAdapter` + `PymavlinkArdupilotAdapter` + `Msp2InavAdapter`; `GcsAdapter` + `QgcTelemetryAdapter`; signing handshake + per-flight ephemeral key + zeroisation; D-C8-2 source-set switch (gated by IT-3); honest covariance projection.

**Out of scope**: state estimation (E-C5); GCS workflow logic (operator side, E-C12).

### Architecture notes

- File: `components/10_c8_fc_adapter/description.md`.
- Both AP and iNav adapters are typically linked into the deployment binary (per ADR-002; config picks one at runtime).
- ADR-008 source-set switch gated by IT-3.

### Interface specification

```python
from __future__ import annotations  # domain types are defined elsewhere

from typing import Callable, Protocol


class FcAdapter(Protocol):
    def open(self, port: PortConfig, signing_key: bytes | None) -> None: ...
    def subscribe_telemetry(self, cb: Callable[[FcTelemetryFrame], None]) -> Subscription: ...
    def emit_external_position(self, o: EstimatorOutput) -> None: ...
    def emit_status_text(self, msg: str, severity: Severity) -> None: ...
    def request_source_set_switch(self) -> None: ...  # AP only
    def current_flight_state(self) -> FlightStateSignal: ...
```

### Data flow

```mermaid
sequenceDiagram
    participant FC as FC
    participant C8I as C8 inbound
    participant C5 as C5
    participant C8O as C8 outbound
    FC->>C8I: IMU + attitude + gps_health
    C8I->>C5: ImuWindow / AttitudeWindow / GpsHealth
    C5->>C8O: EstimatorOutput
    C8O->>FC: GPS_INPUT / MSP2_SENSOR_GPS @ 5 Hz
```

### Dependencies

- E-C5, E-CC-CONF, E-CC-LOG.
- External: pymavlink, MSP2 client, ArduPilot SITL, QGroundControl SITL.

### Acceptance criteria

- C8-IT-01: 6×6 → 2×2 honest covariance projection within 1% norm.
- C8-IT-02: 5 Hz emission jitter ≤ ±5%.
- C8-IT-03: warm-start GPS from FC EKF ≤ 1 s after C8 ready (AC-5.1).
- C8-IT-04: GCS stream 1–2 Hz (AC-6.1).
- C8-IT-05: GCS commands accepted (AC-6.2).
- C8-IT-06: WGS84 round-trip ≤ 1 cm position residual (AC-6.3).
- C8-IT-07: source-set switch ≤ 3 s of gate-clear (AC-NEW-2).
- C8-IT-08: the iNav adapter never attempts signing; the AP adapter always signs (RESTRICT-COMM-2).
- C8-ST-01: MAVLink 2.0 signing handshake passes IT-3 SITL gate (R03).
- C8-ST-02: per-flight key never persists across flights.

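
The 6×6 → 2×2 projection checked by C8-IT-01 is a Gaussian marginalisation. For a covariance matrix, that is simply the sub-block of the two horizontal position components; taking the sub-block of the *information* matrix instead would give the over-confident conditional covariance. The sketch assumes NumPy. `pos_idx` is a hypothetical parameter, because the state ordering must match the real `EstimatorOutput` layout; GTSAM's `Pose3` tangent space, for example, orders rotation before translation. Reducing the 2×2 to the scalar `horiz_accuracy` of `GPS_INPUT` via the largest eigenvalue is one conservative choice, not necessarily the project's.

```python
import numpy as np


def project_horizontal(cov6: np.ndarray, pos_idx: tuple[int, int] = (0, 1)) -> np.ndarray:
    """Marginal 2x2 covariance of the two horizontal position components."""
    if cov6.shape != (6, 6):
        raise ValueError(f"expected a 6x6 covariance, got {cov6.shape}")
    i, j = pos_idx
    return cov6[np.ix_([i, j], [i, j])]


def horiz_accuracy_m(cov2: np.ndarray) -> float:
    """Conservative 1-sigma scalar: sqrt of the largest eigenvalue (error-ellipse semi-major axis)."""
    return float(np.sqrt(np.linalg.eigvalsh(cov2).max()))
```
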
### Non-functional requirements

- C8-PT-01: `emit_external_position` p95 ≤ 5 ms; inbound IMU callback p95 ≤ 1 ms.

### Risks & mitigations

- **R03** (no precedent for the signing handshake) — gated by IT-3; D-C8-2-FALLBACK options recorded.
- **R09** (key compromise) — per-flight ephemeral keys + zeroisation.

### Effort

T-shirt L; 21–34 points.

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `FcAdapter` interface + composition | 3 |
| 2 | `PymavlinkArdupilotAdapter` outbound `GPS_INPUT` | 5 |
| 3 | `Msp2InavAdapter` outbound `MSP2_SENSOR_GPS` | 3 |
| 4 | Inbound IMU/attitude/gps_health/MAV_STATE subscription | 3 |
| 5 | Honest 6×6 → 2×2 covariance projection | 3 |
| 6 | MAVLink 2.0 per-flight signing handshake (AP) | 5 |
| 7 | Source-set switch (AP D-C8-2 gated by IT-3) | 3 |
| 8 | `GcsAdapter` + downsampled telemetry | 3 |
| 9 | Component-internal tests C8-IT-01..08 + C8-PT-01 + C8-ST-01..02 | 5 |

### Key constraints

- AC-4.3, AC-4.4, AC-5.1, AC-5.2, AC-6.1, AC-6.2, AC-6.3, AC-NEW-2; RESTRICT-FC-1 / FC-2 / FC-3, RESTRICT-COMM-1 / COMM-2.

### Testing strategy

Per `components/10_c8_fc_adapter/tests.md`.

---

## E-BBT — Blackbox Tests (FT/NFT scenarios)

**Tracker**: AZ-262 | **Type**: tests | **T-shirt**: M | **Story points**: 13–21

### System context

```mermaid
flowchart LR
    TESTROOT[tests/ runner] --> FTP[FT-P functional positive]
    TESTROOT --> FTN[FT-N functional negative]
    TESTROOT --> NFTPERF[NFT-PERF Tier-2]
    TESTROOT --> NFTLIM[NFT-LIM resource]
    TESTROOT --> NFTSEC[NFT-SEC security]
    TESTROOT --> NFTRES[NFT-RES resilience]
    TESTROOT --> IT[IT integration]
```

### Problem / Context

Per-component epics ship their own component-internal unit/contract tests; this epic parents the **suite-level** scenarios already specified in `_docs/02_document/tests/*.md`. They exercise end-to-end ACs and restrictions and bind multiple components together.

### Scope

**In scope**: implementing the FT-P, FT-N, NFT-PERF, NFT-LIM, NFT-SEC, NFT-RES, IT scenario IDs cited in `traceability-matrix.md`. Test data setup, fixtures (Derkachi flight + AerialVL S03 + e2e-test mock-suite-sat-service), Tier-2 runner orchestration.

**Out of scope**: per-component unit/contract tests (live in each component epic).

### Architecture notes

- Files: `_docs/02_document/tests/blackbox-tests.md`, `performance-tests.md`, `security-tests.md`, `resource-limit-tests.md`, `resilience-tests.md`, `environment.md`, `test-data.md`, `traceability-matrix.md`.
- Tier-1 vs Tier-2 split per ADR-005.

### Interface specification

Tests are pytest scenarios; no runtime interface beyond the test runner CLI.

### Data flow

```mermaid
sequenceDiagram
    participant CI as CI runner
    participant FX as Fixtures
    participant SUT as System under test (compose / Tier-2 binary)
    CI->>FX: stage Derkachi corpus + SITL containers
    CI->>SUT: bring up
    CI->>SUT: drive scenario inputs
    SUT-->>CI: emitted MAVLink + FDR records
    CI->>CI: assert per scenario pass criteria
```

### Dependencies

- All component epics (each must ship ready-to-test).

### Acceptance criteria

- Every scenario ID cited in `traceability-matrix.md` exists, runs, and passes on its target tier.
- Coverage ≥ 75% gate held (currently 92.4% inclusive / 89.8% strict — confirmed pre-Step-6).
- PARTIAL / NOT COVERED rows have linked leftover entries explaining the deferral.

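
For the first acceptance criterion, a hypothetical pre-check can flag scenario IDs that are cited in `traceability-matrix.md` but have no implementation. The ID grammar below is inferred from the IDs quoted in this epic (FT-P-*, NFT-SEC-01, IT-3, ...). How the set of implemented IDs is collected (for example, from pytest markers) is an assumption.

```python
import re

# Hypothetical ID grammar inferred from IDs quoted in this epic.
# The look-arounds stop component-internal IDs such as "C3-IT-01" from matching as "IT-01".
SCENARIO_ID = re.compile(r"(?<![\w-])(?:FT-[PN]|NFT-(?:PERF|LIM|SEC|RES)|IT)-\d+(?![\w-])")


def missing_scenarios(matrix_markdown: str, implemented_ids: set[str]) -> list[str]:
    """Scenario IDs cited in the traceability matrix that no test implements."""
    cited = set(SCENARIO_ID.findall(matrix_markdown))
    return sorted(cited - implemented_ids)
```
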
### Non-functional requirements

- Tier-1 full suite wall-clock ≤ 30 min on a developer laptop.
- Tier-2 NFT suite wall-clock ≤ 90 min on the bench Jetson.

### Risks & mitigations

- **R11** (statistical headroom) — NFT-RES-03 / NFT-SEC-01 use Monte-Carlo-with-CI per the AC-text relaxation.

### Effort

T-shirt M; 13–21 points (test implementation; scenario specs already exist).

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | Test environment scaffolding (`tests/conftest.py`, fixtures dir, Postgres + SITL bring-up) | 3 |
| 2 | FT-P-* implementation (positive functional scenarios) | 5 |
| 3 | FT-N-* implementation (negative functional scenarios) | 3 |
| 4 | NFT-PERF / NFT-LIM Tier-2 runner integration | 5 |
| 5 | NFT-SEC implementation (incl. NFT-SEC-02 network egress + NFT-SEC-01 cache poisoning) | 5 |
| 6 | NFT-RES resilience scenarios | 3 |
| 7 | IT-3 ArduPilot SITL signing handshake (R03 gate) | 5 |
| 8 | IT-12 comparative-study runner | 3 |

### Key constraints

- AC-NEW-3, AC-NEW-5, AC-NEW-7, RESTRICT-HW-1.

### Testing strategy

This epic IS the testing strategy for system-level scenarios. Per-component testing belongs to component epics.

---

## E-DEMO-REPLAY — Offline replay mode (video + tlog → per-tick coordinate stream)

**Tracker**: AZ-265 | **Type**: feature (deployment-adjacent) | **T-shirt**: M | **Story points**: 19–24

**Added**: Decompose Step 2 (cycle 1, 2026-05-10) — **revised 2026-05-14** per ADR-011 (replay-as-configuration; replaces the v1.0.0 four-binary design)

**Source notes**: `_docs/how_to_test.md` (user-written demo requirements — auto-sync incorporated as child task #8)

### System context

Demonstrate the GPS-denied positioning pipeline against historical flight data: a video file from the nav camera + a `.tlog` file from the FC. **Per ADR-011, replay is a configuration of the airborne binary, NOT a separate image.** The replay configuration runs the **same C1–C7 + C13 pipeline** the airborne binary runs in live mode; only three strategies differ at startup (chosen by `config.mode = "replay"`):

- `FrameSource`: `VideoFileFrameSource` instead of `LiveCameraFrameSource`.
- `FcAdapter`: `TlogReplayFcAdapter` instead of `PymavlinkArdupilotAdapter` / `Msp2InavAdapter`.
- `MavlinkTransport`: `NoopMavlinkTransport` instead of `SerialMavlinkTransport` — the C8 outbound encoders run unchanged (the MAVLink bytes are produced and dropped; the user-confirmed design intent is that the only UI-visible output in replay is per-tick `EstimatorOutput` via `JsonlReplaySink`, see below).

Additionally, the composition root attaches a `JsonlReplaySink` as an extra listener on C5's `EstimatorOutput` stream — the parent-suite UI tails the resulting JSONL file for the per-tick coordinate display. C13 (FDR) still writes a real flight record (just driven by historical inputs); C8 outbound encoders still run their signing handshake + per-flight key rotation (the operator supplies a dummy signing key); C6 reads the same pre-built tile cache the operator built via the normal pre-flight C10/C11/C12 flow.

NO ROS dependency is added — replay reuses the existing C8 `FcAdapter` interface via the strategy pattern.

```mermaid
flowchart LR
    subgraph LIVE["Airborne mode — config.mode = 'live'"]
        CAM[Live camera] --> FS1[LiveCameraFrameSource] --> C1L[C1 VIO]
        FCL[Live FC MAVLink wire] --> SMTL[SerialMavlinkTransport in] --> FCAL[PymavlinkArdupilotAdapter] --> C1L
        C1L --> C2C5L[C2..C5]
        C2C5L --> C8OL[C8 outbound encoders] --> SMTLOUT[SerialMavlinkTransport out] --> FCL
        C2C5L --> FDR[C13 FDR]
    end
    subgraph REPLAY["Replay mode — config.mode = 'replay'"]
        VID[Video file .mp4/.h264] --> RIA1[ReplayInputAdapter]
        TLOG[tlog file] --> RIA1
        RIA1 --> FS2[VideoFileFrameSource] --> C1R[C1 VIO]
        RIA1 --> FCAR[TlogReplayFcAdapter] --> C1R
        C1R --> C2C5R[C2..C5]
        C2C5R --> C8OR[C8 outbound encoders] --> NMTOUT[NoopMavlinkTransport out — bytes dropped]
        C2C5R --> RSINK[JsonlReplaySink] --> JSONL[results.jsonl — UI tails this]
        C2C5R --> FDR2[C13 FDR]
    end
```

### Problem / Context

The parent-suite UI (in `ui/` workspace, out of scope for this repo) needs to demo the GPS-denied positioning end-to-end. Per-component fixtures or simulators would not give the demo end-to-end fidelity. Instead, replay mode runs the production pipeline against historical inputs — demo confidence equals field test confidence on the same footage. **ADR-011 makes this fidelity structural**: the same binary runs in both contexts, so any drift between them is a behavioural-test failure that any unit/integration test can catch, not an SBOM-diff failure between two separate source trees.

ROS as the input transport was considered and rejected: the system is MAVLink-native; introducing ROS would (a) add a major new dependency, (b) split production vs. demo code paths, and (c) duplicate code. Reusing the existing C8 `FcAdapter` interface with a tlog-replay strategy is strictly better.

### Scope

**In scope**:
- `FrameSource` interface (formalised cross-cutting; previously implicit "camera ingest thread") + `VideoFileFrameSource` strategy + `LiveCameraFrameSource` retrofit (no-op restructure of existing camera plumbing).
- `TlogReplayFcAdapter` strategy (new C8 `FcAdapter` impl) parsing pymavlink `.tlog` files and emitting `ImuWindow` / `AttitudeWindow` / `GpsHealth` / `FlightStateSignal` at tlog timestamp cadence.
- `ReplaySink` interface + `JsonlReplaySink` impl (one `EstimatorOutput` per line).
- `MavlinkTransport` Protocol seam in `c8_fc_adapter/` + `SerialMavlinkTransport` retrofit (no-op restructure of existing live MAVLink transport code) + `NoopMavlinkTransport` strategy — together they keep the C8 outbound encoders byte-identical between live and replay (per replay protocol Invariant 5).
- `replay_input/` Layer-4 cross-cutting coordinator (`ReplayInputAdapter`) that owns `(video, tlog)` lifecycle, applies the time-offset (manual or auto), and instantiates the three replay strategies above. Composition root sees only standard `FrameSource` + `FcAdapter` + `Clock` after the coordinator is opened.
- `Clock` injection (per R-DEMO-4) so timer-driven logic in C1–C5 works in both wall-clock (live + replay-realtime) and tlog-simulated (replay-asap) modes.
- Extension of `compose_root(config)` with a `config.mode == "replay"` branch (NO separate `compose_replay` function; ADR-011).
- `gps-denied-replay` CLI: thin console-script wrapper that loads `config.yaml`, sets `config.mode = "replay"`, applies the replay-specific paths/flags, and dispatches into the same companion entry point as `gps-denied-onboard`.
- E2E replay test on a 1–2 min Derkachi clip + matching tlog asserting estimated track within ≤ 100 m of ground-truth GPS for ≥ 80 % of ticks. Asserts mode-agnosticism (replay protocol Invariant 1) via AST scan.

**Out of scope**:
- ROS / ROS2 dependency.
- HTTP wrapper microservice (parent-suite UI backend shells out to the CLI; defer until subprocess-shape is proven insufficient).
- Modifying any C1–C7 + C13 component to be replay-aware — they MUST remain mode-agnostic (replay protocol Invariant 1).
- C6 mid-flight write path (replay reads a pre-built tile cache via the same pre-flight C10/C11/C12 flow; doesn't write).
- A fourth Docker image (`gps-denied-replay-cli`) — **dropped per ADR-011**; the airborne image IS the replay image; AZ-403 is cancelled.
- An SBOM-diff CI step for the replay binary — **dropped per ADR-011**; no separate binary exists to diff.

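
The E2E bound in scope (estimate within ≤ 100 m of ground-truth GPS for ≥ 80 % of ticks, the same bound as AC-3) reduces to a per-tick distance and a ratio. This sketch swaps in a haversine great-circle distance for simplicity. The real test would more likely use the project's `WgsConverter` local-tangent-plane L2; at a 100 m scale the two differ by well under 1 %.

```python
import math
from typing import Sequence

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a sphere is adequate at the 100 m scale


def horizontal_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two WGS84 points, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(est: Sequence[tuple[float, float]], truth: Sequence[tuple[float, float]],
                    limit_m: float = 100.0) -> float:
    """Share of ticks whose estimate lies within limit_m of the paired ground truth."""
    if len(est) != len(truth) or not est:
        raise ValueError("need equal-length, non-empty tick sequences")
    ok = sum(horizontal_distance_m(*e, *t) <= limit_m for e, t in zip(est, truth))
    return ok / len(est)
```

The E2E assertion is then `fraction_within(est, truth) >= 0.80` on the tick-aligned sequences.
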
### Architecture notes

- ADR-001 / ADR-002 / ADR-009 / **ADR-011** all apply. ADR-011 is the design-defining decision for this epic — read it first.
- New `BUILD_*` flags: `BUILD_VIDEO_FILE_FRAME_SOURCE`, `BUILD_TLOG_REPLAY_ADAPTER`, `BUILD_REPLAY_SINK_JSONL` (the last one gates BOTH `JsonlReplaySink` and `NoopMavlinkTransport`). **All three default ON for the airborne and research binaries; OFF for operator-orchestrator.** The airborne binary serves both `config.mode = "live"` and `config.mode = "replay"` from a single image.
- New cross-cutting `FrameSource` interface lives at `src/gps_denied_onboard/frame_source/` (Layer 1 Foundation per `module-layout.md` § layering).
- New cross-cutting `Clock` interface lives at `src/gps_denied_onboard/clock/` (Layer 1 Foundation).
- New cross-cutting `replay_input/` coordinator lives at `src/gps_denied_onboard/replay_input/` (Layer 4 Adapters — it instantiates Layer-4 strategies).
- `compose_root(config)` in `runtime_root/__init__.py` gains a `config.mode` branch. **No separate `compose_replay` function.**

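
The `config.mode` branch in `compose_root` can be pictured with the sketch below. Every strategy class is an empty placeholder carrying the real strategy's name, and `Config`/`Runtime` are reduced to the fields the branch touches (all hypothetical). In the actual design, the replay frame source and FC adapter come out of `ReplayInputAdapter.open()` rather than being constructed inline.

```python
from dataclasses import dataclass, field
from typing import Any


# Empty placeholders named after the real strategies (hypothetical, for illustration only).
class LiveCameraFrameSource: ...
class VideoFileFrameSource: ...
class PymavlinkArdupilotAdapter: ...  # or Msp2InavAdapter, picked by FC config
class TlogReplayFcAdapter: ...
class SerialMavlinkTransport: ...
class NoopMavlinkTransport: ...
class JsonlReplaySink: ...


@dataclass
class Config:
    mode: str = "live"


@dataclass
class Runtime:
    frame_source: Any
    fc_adapter: Any
    mavlink_transport: Any
    estimator_listeners: list[Any] = field(default_factory=list)


def compose_root(config: Config) -> Runtime:
    if config.mode == "live":
        rt = Runtime(LiveCameraFrameSource(), PymavlinkArdupilotAdapter(), SerialMavlinkTransport())
    elif config.mode == "replay":
        # Real design: the frame source and FC adapter come from ReplayInputAdapter.open().
        rt = Runtime(VideoFileFrameSource(), TlogReplayFcAdapter(), NoopMavlinkTransport())
        rt.estimator_listeners.append(JsonlReplaySink())  # extra listener on C5 output
    else:
        raise ValueError(f"unknown config.mode: {config.mode!r}")
    # Everything past this point (C1..C5 graph, C8 encoders, C13 FDR) is wired
    # identically in both modes; components never read config.mode.
    return rt
```

Keeping the branch confined to this one function is what lets the Invariant 1 AST scan require zero `config.mode` references inside C1–C7 + C13.
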
### Interface specification

```python
from __future__ import annotations  # domain types (NavCameraFrame, FcKind, ...) are defined elsewhere

from pathlib import Path
from typing import Protocol


class FrameSource(Protocol):
    def next_frame(self) -> NavCameraFrame | None: ...
    def close(self) -> None: ...


class VideoFileFrameSource(FrameSource):
    def __init__(self, video_path: Path, frame_rate_hz: float, camera_id: str) -> None: ...


class TlogReplayFcAdapter(FcAdapter):  # FcAdapter from AZ-261 / E-C8; outbound emits delegate to MavlinkTransport
    def __init__(self, tlog_path: Path, target_fc_dialect: FcKind,  # ARDUPILOT or INAV
                 clock: Clock, wgs_converter: WgsConverter,
                 mavlink_transport: MavlinkTransport,  # NoopMavlinkTransport in replay
                 time_offset_ms: int = 0, pace: ReplayPace = ReplayPace.ASAP) -> None: ...


class MavlinkTransport(Protocol):  # new tiny Protocol seam introduced by AZ-400
    def write(self, payload: bytes) -> None: ...
    def close(self) -> None: ...


class NoopMavlinkTransport(MavlinkTransport):
    def __init__(self) -> None: ...
    def bytes_written(self) -> int: ...  # observability (FDR + INFO log at close)


class ReplaySink(Protocol):
    def emit(self, output: EstimatorOutput) -> None: ...
    def close(self) -> None: ...


class JsonlReplaySink(ReplaySink):
    def __init__(self, output_path: Path) -> None: ...


class ReplayInputAdapter:  # cross-cutting coordinator in replay_input/
    def __init__(self, *, video_path: Path, tlog_path: Path,
                 camera_calibration: CameraCalibration, target_fc_dialect: FcKind,
                 wgs_converter: WgsConverter, pace: ReplayPace,
                 manual_time_offset_ms: int | None,
                 auto_sync_config: AutoSyncConfig) -> None: ...
    def open(self) -> ReplayInputBundle: ...  # FrameSource + FcAdapter + Clock + resolved offset
    def close(self) -> None: ...


def compose_root(config: Config) -> Runtime: ...  # branches on config.mode internally
```

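
Here are concrete versions of the two simplest replay strategies from the specification above, as a sketch. `EstimatorOutput` is a hypothetical four-field stand-in (the real schema is richer). The per-line `flush()` is an assumption, made so that a UI tailing `results.jsonl` sees each tick without waiting for the write buffer to fill.

```python
import json
import logging
from dataclasses import asdict, dataclass
from pathlib import Path

log = logging.getLogger(__name__)


@dataclass(frozen=True)
class EstimatorOutput:  # hypothetical minimal stand-in; the real schema has more fields
    t_ms: int
    lat_deg: float
    lon_deg: float
    source_label: str


class NoopMavlinkTransport:
    """Accepts encoded MAVLink frames and drops them, counting bytes for observability."""

    def __init__(self) -> None:
        self._bytes = 0

    def write(self, payload: bytes) -> None:
        self._bytes += len(payload)

    def bytes_written(self) -> int:
        return self._bytes

    def close(self) -> None:
        log.info("NoopMavlinkTransport dropped %d bytes", self._bytes)


class JsonlReplaySink:
    """Writes one EstimatorOutput per line and flushes after each tick."""

    def __init__(self, output_path: Path) -> None:
        self._fh = output_path.open("w", encoding="utf-8")

    def emit(self, output: EstimatorOutput) -> None:
        self._fh.write(json.dumps(asdict(output), separators=(",", ":")) + "\n")
        self._fh.flush()

    def close(self) -> None:
        self._fh.close()
```

Because the encoders upstream of `NoopMavlinkTransport.write` are untouched, the byte counter also gives the Invariant 5 test something cheap to compare against a live capture.
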
### Data flow

Startup → load config / calibration → if `config.mode == "replay"`: build `ReplayInputAdapter` → `.open()` → wire its bundle into the same C1–C5 graph as live + add `JsonlReplaySink` listener + pick `NoopMavlinkTransport`. Per-frame loop is identical to live: `FrameSource → C1 → C2 → C2.5 → C3 → C3.5 → C4 → C5 → emit_external_position (encoder bytes → noop transport in replay) + fdr.write + replay_sink.emit (replay only)`. End of input → close sink → exit.

`--pace realtime` paces frames at wall-clock; `--pace asap` runs uncapped (default). The injected `Clock` is wall-clock-derived in `realtime` mode and tlog-timestamp-derived in `asap` mode so component fallback timers (e.g., AC-5.2 3 s no-estimate fallback) trigger consistently in both.

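
The clock split described above can be sketched with two `Clock` implementations and a component-side timer that only ever reads the injected clock. The `now_ms`/`advance_to` method names and `NoEstimateWatchdog` are hypothetical. The 3 s timeout mirrors the AC-5.2 fallback.

```python
import time
from typing import Protocol


class Clock(Protocol):
    def now_ms(self) -> int: ...


class WallClock:
    """Live and --pace realtime: monotonic wall time."""

    def now_ms(self) -> int:
        return time.monotonic_ns() // 1_000_000


class TlogClock:
    """--pace asap: time moves only when the replay driver advances it to the next tlog timestamp."""

    def __init__(self, start_ms: int = 0) -> None:
        self._now = start_ms

    def advance_to(self, tlog_ms: int) -> None:
        if tlog_ms < self._now:
            raise ValueError("tlog timestamps must be non-decreasing")
        self._now = tlog_ms

    def now_ms(self) -> int:
        return self._now


class NoEstimateWatchdog:
    """Component-side timer written only against Clock, so it behaves the same in every mode."""

    def __init__(self, clock: Clock, timeout_ms: int = 3000) -> None:  # 3 s per AC-5.2
        self._clock, self._timeout = clock, timeout_ms
        self._last = clock.now_ms()

    def on_estimate(self) -> None:
        self._last = self._clock.now_ms()

    def fallback_due(self) -> bool:
        return self._clock.now_ms() - self._last >= self._timeout
```

In `asap` mode, a 3 s gap in the tlog fires the fallback even if the replay covers that gap in a few milliseconds of wall time, which is the consistency the paragraph above asks for.
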
### Dependencies

- E-C1, E-C2, E-C2.5, E-C3, E-C3.5, E-C4, E-C5, E-C8 (every per-frame component).
- **E-C6** — replay uses the real C6 `FaissDescriptorIndex` to query tiles, identically to live. (This is the architectural change vs. the v1.0.0 epic spec, which excluded C6 from the replay binary.)
- E-CC-CONF (AZ-246) for `compose_root` extension.
- E-CC-HELPERS (AZ-264) for `WgsConverter` (tlog GPS → local-tangent-plane).
- Does NOT depend on E-C10 / E-C11 / E-C12 — these are operator-side concerns; the operator runs the normal pre-flight C10/C11/C12 flow against the operator-orchestrator binary BEFORE the replay run on the airborne binary.

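`WgsConverter` is referenced here only by name. The sketch below shows one standard way to map tlog GPS fixes into a local tangent plane (WGS84 geodetic → ECEF → ENU about a reference point); whether `WgsConverter` uses exactly this formulation is an assumption, and both function names are hypothetical.

```python
import math

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0
WGS84_F = 1.0 / 298.257223563
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float, alt_m: float):
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    sin_lat = math.sin(lat)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * sin_lat * sin_lat)  # prime-vertical radius
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * sin_lat
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, alt_m, ref_lat_deg, ref_lon_deg, ref_alt_m):
    """East/North/Up offset in metres of a fix relative to a reference point."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, alt_m)
    x0, y0, z0 = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_alt_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(ref_lat_deg), math.radians(ref_lon_deg)
    sl, cl = math.sin(lat0), math.cos(lat0)
    so, co = math.sin(lon0), math.cos(lon0)
    # Rotate the ECEF difference vector into the local ENU frame.
    east = -so * dx + co * dy
    north = -sl * co * dx - sl * so * dy + cl * dz
    up = cl * co * dx + cl * so * dy + sl * dz
    return east, north, up
```

At the equator, a 0.001° step north maps to about 110.57 m of northing and a 0.001° step east to about 111.32 m of easting, which is a quick sanity check for any replacement implementation.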
### Acceptance criteria

- AC-1: `gps-denied-replay` exits 0 on a valid 1-min fixture and produces JSONL with one `EstimatorOutput` line per tlog tick (within ±5 % of the `GLOBAL_POSITION_INT` count).
- AC-2: Each line is a valid JSON object matching the `EstimatorOutput` schema.
- AC-3: For a fixture with known ground-truth GPS, the horizontal L2 distance between the estimated and ground-truth positions is ≤ 100 m for ≥ 80 % of ticks (matches the AC-1.3 cumulative-drift bound).
- AC-4 (revised per ADR-011): The C1–C7 + C13 components and the C8 outbound encoders run unchanged in the airborne binary whether `config.mode == "replay"` or `config.mode == "live"`. Verified via Invariant 1 (no-mode-branches AST scan in components) + Invariant 5 (encoder-byte-stream diff in unit tests) in AZ-404. **No SBOM diff** — there is only one binary.
- AC-5: Same input → same output (deterministic), with ≤ 1e-6 float drift in position fields.
- AC-6: `--pace realtime` runs the 1-min fixture in 60 ± 5 s; `--pace asap` in ≤ 30 s on Tier-1 hardware.
- AC-7: Without `--time-offset-ms`, the CLI auto-detects the video ↔ tlog offset by correlating video motion onset (or the first-frame timestamp) with the tlog IMU take-off pattern (sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s, matching the typical quadcopter take-off signature). On a fixture with a known correct offset, the auto-detected offset is within ± 200 ms of ground truth. If auto-detect confidence is < 80 %, the CLI logs a WARN and proceeds with the best-guess offset; `--time-offset-ms N` always overrides the auto-detect.
- AC-8: If neither auto-detect nor manual offset can produce > 95 % of frames with at least one matching IMU window within ± 100 ms, the CLI exits with code 2 and prints both the auto-detected offset (if any) and the percentage of frames-with-IMU-window so the operator can debug.
- AC-9 (new per ADR-011): The operator's pre-flight workflow for a replay run is identical to a live flight up to the final "fly" step — plan route in suite UI → C12 build cache from real `satellite-provider` → confirm content-hash → run `gps-denied-replay` instead of running the airborne binary on the UAV. Verified by the AZ-404 E2E fixture's setup (which runs the operator pre-flight flow before invoking the replay CLI).
- AC-10 (new per ADR-011): The `--mavlink-signing-key PATH` CLI arg is mandatory in replay mode (the operator supplies a dummy key file); the C8 outbound signing handshake runs in replay and its bytes are dropped by `NoopMavlinkTransport`. Verified by a unit test asserting `NoopMavlinkTransport.bytes_written() > 0` after a replay run.

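A minimal sketch of the AC-7 take-off detector and the AC-8 coverage check, under stated assumptions: IMU samples arrive as hypothetical `(t_s, vertical_accel_g, attitude_rate_rad_s)` tuples with gravity already removed from the vertical acceleration, the attitude-rate condition is read as a magnitude threshold that must hold together with the acceleration condition, the video motion-onset time is supplied by the caller, and a positive offset means "add to video time to get tlog time". Confidence scoring is omitted and all function names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left
from typing import Iterable, Sequence

ACCEL_G_MIN = 0.5         # sustained vertical accel threshold (AC-7)
RATE_RAD_S_MIN = 1.0      # attitude-rate threshold (AC-7)
MIN_DURATION_S = 0.5      # how long both conditions must hold (AC-7)
AC8_MIN_COVERAGE = 0.95   # coverage must exceed this, else exit code 2 (AC-8)


def detect_takeoff(samples: Iterable[tuple[float, float, float]]) -> float | None:
    """Return the tlog time at which the take-off signature starts, or None."""
    run_start = None
    for t, accel_g, rate in samples:
        if accel_g > ACCEL_G_MIN and abs(rate) > RATE_RAD_S_MIN:
            if run_start is None:
                run_start = t
            if t - run_start >= MIN_DURATION_S:
                return run_start
        else:
            run_start = None  # signature interrupted; start over
    return None


def auto_offset_ms(video_motion_onset_s: float, tlog_samples) -> int | None:
    """Offset (ms) to add to video timestamps to land on the tlog timeline."""
    takeoff = detect_takeoff(tlog_samples)
    if takeoff is None:
        return None
    return round((takeoff - video_motion_onset_s) * 1000)


def imu_window_coverage(frame_times_s: Sequence[float], imu_times_s: Sequence[float],
                        offset_ms: int, tol_s: float = 0.1) -> float:
    """Fraction of frames with at least one IMU sample within ±tol_s."""
    imu = sorted(imu_times_s)
    hits = 0
    for ft in frame_times_s:
        t = ft + offset_ms / 1000.0
        i = bisect_left(imu, t - tol_s)  # first IMU sample not earlier than t - tol
        if i < len(imu) and imu[i] <= t + tol_s:
            hits += 1
    return hits / len(frame_times_s) if frame_times_s else 0.0
```

Per AC-8, the CLI would exit with code 2 when the best available offset gives `imu_window_coverage(...) <= AC8_MIN_COVERAGE`; a manual `--time-offset-ms` value would simply replace the output of `auto_offset_ms(...)` as the input to the coverage check.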
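The quantitative checks in AC-1, AC-3 and AC-5 reduce to three small predicates. A sketch, assuming positions have already been projected to a local east/north plane in metres and that estimate and ground-truth ticks are aligned one-to-one; the function names are hypothetical, and the 1e-6 tolerance is applied per coordinate because AC-5 does not state a unit.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (east_m, north_m) in a local tangent plane


def tick_count_ok(jsonl_lines: int, global_position_int_count: int, tol: float = 0.05) -> bool:
    """AC-1: JSONL line count within ±5 % of the tlog GLOBAL_POSITION_INT count."""
    return abs(jsonl_lines - global_position_int_count) <= tol * global_position_int_count


def drift_ok(estimates: Sequence[Point], truth: Sequence[Point],
             max_err_m: float = 100.0, min_fraction: float = 0.8) -> bool:
    """AC-3: horizontal L2 error <= 100 m on at least 80 % of ticks."""
    if len(estimates) != len(truth) or not truth:
        raise ValueError("estimates and truth must be non-empty and aligned per tick")
    within = sum(
        math.hypot(e[0] - t[0], e[1] - t[1]) <= max_err_m
        for e, t in zip(estimates, truth)
    )
    return within / len(truth) >= min_fraction


def deterministic(run_a: Sequence[Point], run_b: Sequence[Point], tol: float = 1e-6) -> bool:
    """AC-5: two runs on the same input agree to within tol per position field."""
    return len(run_a) == len(run_b) and all(
        abs(a_i - b_i) <= tol for a, b in zip(run_a, run_b) for a_i, b_i in zip(a, b)
    )
```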
### Non-functional requirements

- Cold-start ≤ 5 s (not subject to AC-NEW-1's 30 s budget, which applies to live airborne operation only).
- Throughput ≥ 5 × real time on Jetson AGX Orin for `--pace asap`.
- Memory ≤ 4 GB resident (the airborne image's nominal memory budget is 8 GB shared on Jetson Orin Nano Super; replay has the same memory headroom as live).

### Risks & mitigations

- **R-DEMO-1**: Tlog ↔ video timestamp drift across long flights, AND the more common case that recordings on the operator workstation are not synchronised at all (camera and FC start independently, often minutes apart). **Mitigation**: auto-sync via IMU take-off detection (AC-7) is the default; `--time-offset-ms N` is the manual override. If the take-off pattern is ambiguous (e.g., a fixed-wing hand-launch instead of a quadcopter take-off, or a tlog that includes pre-arm motion), the CLI WARNs and falls back to the manual override.
- **R-DEMO-2**: Pymavlink is slow on multi-GB tlogs. **Mitigation**: stream-parse, never materialise; benchmark and document the throughput floor.
- **R-DEMO-3**: Demo recordings are missing required FC messages (HIL mode etc.). **Mitigation**: `ReplayInputAdapter.open()` fails fast at startup, listing the missing message types and the components that need them.
- **R-DEMO-4**: Production C1–C5 paths bake in real-time-cadence assumptions (e.g., 5 s fallback timer). **Mitigation**: `Clock` injection (wall-clock for live + replay-realtime, tlog-derived for replay-asap); captured in ADR-011.
- **R-DEMO-5 (new per ADR-011)**: Live and replay diverge silently because they share one composition root. **Mitigation**: replay protocol Invariant 1 (no mode-aware branches in components) enforced by AST scan in AZ-404 + Invariant 5 (encoder byte streams identical between modes) enforced by unit-test diff. Any drift becomes a test failure, not a silent dependency-set divergence as it would have been under the v1.0.0 four-binary design.

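One way to implement the R-DEMO-3 fail-fast check while respecting R-DEMO-2 (stream, never materialise) is a single pass over message type names that stops as soon as every required type has been seen. The message-to-consumer mapping below is purely illustrative (the real list of required messages and the components that need them is not specified here), and the function and exception names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Mapping

# Illustrative mapping only: MAVLink message type -> consumers that need it.
REQUIRED_MESSAGES: Mapping[str, tuple[str, ...]] = {
    "GLOBAL_POSITION_INT": ("tick count / ground truth",),
    "ATTITUDE": ("attitude input",),
    "RAW_IMU": ("video-tlog auto-sync",),
}


class MissingFcMessagesError(RuntimeError):
    pass


def check_required_messages(message_types: Iterable[str],
                            required: Mapping[str, tuple[str, ...]] = REQUIRED_MESSAGES) -> None:
    """Single streaming pass over message type names; raise if any are absent."""
    missing = set(required)
    for msg_type in message_types:  # consumes an iterator; never builds a list
        missing.discard(msg_type)
        if not missing:
            return  # every required type seen: stop scanning early
    if missing:
        lines = [f"{m}: needed by {', '.join(required[m])}" for m in sorted(missing)]
        raise MissingFcMessagesError("tlog lacks required messages:\n" + "\n".join(lines))
```

The early return matters on multi-GB tlogs: in a healthy recording every required type usually appears within the first few seconds, so the pre-scan cost stays small.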
### Effort

T-shirt M; 19–24 points across 7 child tasks (was 27–32 across 8; AZ-403 dropped per ADR-011; AZ-401 shrank from 3 to 2 points).

### Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | `FrameSource` interface (cross-cutting) + `VideoFileFrameSource` strategy + `LiveCameraFrameSource` retrofit | 3 |
| 2 | `TlogReplayFcAdapter` strategy (pymavlink stream parser → inbound DTOs; outbound emits via injected `MavlinkTransport`) | 5 |
| 3 | `ReplaySink` interface + `JsonlReplaySink` impl + `MavlinkTransport` Protocol seam + `SerialMavlinkTransport` retrofit + `NoopMavlinkTransport` | 3 |
| 4 | Extend `compose_root(config)` with `config.mode == "replay"` branch (NO separate composition root); wire JSONL sink + `NoopMavlinkTransport` | 2 |
| 5 | `gps-denied-replay` console-script wrapper (mode-config dispatcher) | 3 |
| ~~6~~ | ~~`gps-denied-replay-cli` Dockerfile + GitHub Actions matrix entry + SBOM diff~~ | ~~CANCELLED per ADR-011~~ |
| 7 | E2E replay fixture test (Derkachi 1–2 min clip + tlog; AC-3 ≤ 100 m ≥ 80 % assertion + AC-4 mode-agnosticism + AC-9 operator workflow) | 5 |
| 8 | Auto-sync of video ↔ tlog via IMU take-off detection (AC-7 / AC-8) + the `ReplayInputAdapter` coordinator under `replay_input/` | 5 |

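Child issue 3 and AC-10 describe two small strategies. A sketch, assuming a `write(bytes)` method on the transport and a three-field `EstimatorOutput`; both are hypothetical simplifications of the real Protocol and DTO, while `bytes_written()` is the accessor named in AC-10.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class EstimatorOutput:  # hypothetical subset of the real DTO's fields
    timestamp_s: float
    lat_deg: float
    lon_deg: float


class MavlinkTransport(Protocol):
    def write(self, data: bytes) -> None: ...


class NoopMavlinkTransport:
    """Accepts encoder bytes and drops them, counting how many were written."""
    def __init__(self) -> None:
        self._written = 0

    def write(self, data: bytes) -> None:
        self._written += len(data)

    def bytes_written(self) -> int:
        return self._written


class JsonlReplaySink:
    """Writes one JSON object per EstimatorOutput, flushing after each line."""
    def __init__(self, output_path: Path) -> None:
        self._fh = open(output_path, "w", encoding="utf-8")

    def emit(self, output: EstimatorOutput) -> None:
        self._fh.write(json.dumps(asdict(output)) + "\n")
        self._fh.flush()

    def close(self) -> None:
        self._fh.close()
```

Flushing after each line lets an external reader tail `results.jsonl` while the run is still in progress, at the cost of one write syscall per tick.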
### Key constraints

- ADR-001 / ADR-002 / ADR-009 / **ADR-011**.
- C1–C7 + C13 components MUST remain mode-agnostic; replay-aware logic lives only in the composition root branch, the new strategies (FrameSource / FcAdapter / MavlinkTransport / ReplaySink / Clock), the `replay_input/` coordinator, and the CLI wrapper.
- No HTTP server in the airborne binary regardless of mode; HTTP wrapper, if added later, lives in operator-orchestrator per `module-layout.md` Layer-4 placement.
- MAVLink 2.0 signing key is mandatory in both modes (replay protocol Invariant 11).

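Invariant 1's AST scan can be approximated with the standard-library `ast` module. This is a heuristic sketch, not AZ-404's actual check: it flags any `.mode` attribute read and any comparison against the literals `"replay"` or `"live"`. It will also catch unrelated `.mode` attributes (a file object's, for instance), so a real check would need an allow-list, and it is meant to be run only over the C1–C7 + C13 component packages. Function names are hypothetical.

```python
from __future__ import annotations

import ast
from pathlib import Path

FORBIDDEN_ATTR = "mode"  # assumption: mode is read as `<something>.mode`
FORBIDDEN_LITERALS = {"replay", "live"}


def mode_branch_violations(source: str, filename: str = "<src>") -> list[str]:
    """Flag reads of `.mode` and comparisons against "replay"/"live"."""
    violations = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Attribute) and node.attr == FORBIDDEN_ATTR:
            violations.append(f"{filename}:{node.lineno}: reads .{node.attr}")
        elif isinstance(node, ast.Compare):
            for operand in [node.left, *node.comparators]:
                if isinstance(operand, ast.Constant) and operand.value in FORBIDDEN_LITERALS:
                    violations.append(f"{filename}:{node.lineno}: compares against {operand.value!r}")
    return violations


def scan_package(root: Path) -> list[str]:
    """Run the check over every .py file under one component package."""
    found = []
    for path in sorted(root.rglob("*.py")):
        found += mode_branch_violations(path.read_text(encoding="utf-8"), str(path))
    return found
```

A unit test would then assert `scan_package(<component root>) == []` for each mode-agnostic package, turning any new mode branch into a test failure as R-DEMO-5 requires.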
### Testing strategy

Unit tests live under `tests/unit/frame_source/`, `tests/unit/c8_fc_adapter/test_tlog_replay_adapter.py`, `tests/unit/c8_fc_adapter/test_replay_sink.py`, `tests/unit/c8_fc_adapter/test_noop_mavlink_transport.py`, `tests/unit/replay_input/`, and `tests/unit/cli/test_replay_cli.py`. E2E tests live under `tests/e2e/replay/` and run the CLI against the Derkachi fixture (Tier-1 capable; gated by `RUN_REPLAY_E2E=1` in CI). This epic has no FT/NFT scenarios — those live in E-BBT.

---

## Lessons applied (Step 6 step-0 retrospective)

`_docs/LESSONS.md` does not yet exist (this is the project's first cycle), so no prior estimation/architecture/dependencies lessons were folded into the sizing above. When this cycle ends, the Final step's quality checklist should propose a lessons file capturing:

- C2.5 ↔ C3 helper-ownership (R14) — generalisable lesson: when two siblings share a runtime, place ownership in a shared helper from day one rather than discovering the cycle in a 4a evaluator pass.
- ADR-007 reversal (mock-as-fixture) — generalisable lesson: a test fixture is not a component; promoting one inflates architectural surface and risks contract drift.
- D-PROJ-2 / D-PROJ-3 carryforwards — generalisable lesson: cross-suite design dependencies belong in `_process_leftovers/` from the moment they are recognised, with full payload so a later cycle can replay them.

These three are candidates for the next cycle's `LESSONS.md`.