mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 08:41:12 +00:00

Oleksandr Bezdieniezhnykh 5fe67023b2 [AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor

Implements two new C12 services and rebalances the C11/C12 boundary
in one atomic commit:

* AZ-329 PostLandingUploadOrchestrator — gates C11 upload on the
  `flight_footer` FDR record's `clean_shutdown` field; 4 refusal
  modes; new FdrFooterReader Protocol + LocalFdrFooterReader.
* AZ-330 OperatorReLocService — AC-3.4 visual-loss re-localization
  hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol
  cut (E-C8 owns the future pymavlink concrete); new FDR record
  kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals,
  reason 200 chars).
* AZ-523 C11 internal flight-state gate removed (SRP refactor):
  `confirm_flight_state` / `FlightStateSignal` use /
  `FlightStateNotOnGroundError` deleted from C11; TileUploader
  contract bumped to v2.0.0 (frozen) with migration note; AZ-317
  superseded.
* AZ-524 Package rename `c12_operator_tooling` →
  `c12_operator_orchestrator` across source, tests, pyproject,
  CMake, Dockerfile, compose, CI, runtime-root services class
  (`OperatorOrchestratorServices`) + factory function
  (`build_operator_orchestrator`), logger namespaces, config slug,
  docs, and the E-C12 epic title.

Tests: 1543 passed, 80 skipped (all environment gates). Targeted
AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start
NFR-perf still ≤ 500 ms p99.

Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump
comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523
+ AZ-524 created and closed as audit-trail tickets.

See `_docs/03_implementation/batch_44_cycle1_report.md`.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-13 19:42:46 +03:00


Work-Item Epics — gps-denied-onboard Plan cycle 1

This file is the local epic draft for Plan Step 6. Tracker IDs (AZ-XXX) are now populated for every epic — they live in Jira project AZ. The canonical E-* ↔ AZ-NN mapping below is the source of truth referenced from each Jira epic's description.

Conventions

Issue type: Epic.
Epic descriptions are self-contained per the plan-skill rule: a developer reading only the epic should understand the full context. Each epic has the 14 required sections (system context, problem, scope, architecture notes, interface spec, data flow, dependencies, AC, NFRs, risks, effort, child issues, key constraints, testing strategy).
Effort sizing: each epic gets a T-shirt size plus a story-point range. Per-task story points (PBI complexity) follow the user rule of 1, 2, 3, 5, 8, with no PBI above 5 (prefer 2 or 3; use 5 sparingly).
Cross-cutting epics parent exactly one shared implementation task; component epics consuming the concern declare a dependency, never re-implement locally.
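Dependency rule: no epic depends on a later one in this index. The one exception is E-C4 (row 16), which is co-developed with E-C5 (row 17) on a shared GTSAM substrate.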

Decompose-time amendment (cycle 1, dated 2026-05-10)

Row 20 (E-CC-HELPERS / AZ-264) was added during Decompose Step 2 to comply with the cross-cutting rule. The 8 shared helpers (ImuPreintegrator, SE3Utils, LightGlueRuntime, WgsConverter, Sha256Sidecar, EngineFilenameSchema, RansacFilter, DescriptorNormaliser) were originally listed as child issues inside their largest-consumer component epics (e.g., ImuPreintegrator under E-C1 child #5, LightGlueRuntime under E-C2.5 child #2). Those child-issue listings are now superseded — helper ownership moves to E-CC-HELPERS, and component epics consume helpers as dependencies. The original component epic descriptions in Jira still reference the helpers in their child-issue tables; those will be reconciled at the next epic-edit pass (or at Step 4 cross-verification).

Index

#	Epic ID	Title	Type	Tracker	T-shirt	Story Pts	Depends on
1	E-BOOT	Bootstrap & Initial Structure	bootstrap	AZ-244	M	13–21	—
2	E-CC-LOG	Cross-Cutting: Structured JSON Logging	cross-cutting	AZ-245	S	5–8	E-BOOT
3	E-CC-CONF	Cross-Cutting: Configuration & Composition Root	cross-cutting	AZ-246	S	5–8	E-BOOT
4	E-CC-FDR-CLIENT	Cross-Cutting: FDR Producer Client (lock-free queue + record schema)	cross-cutting	AZ-247	M	8–13	E-BOOT, E-CC-LOG
5	E-C13	C13 Flight Data Recorder (writer thread + segments + cap)	component	AZ-248	L	21–34	E-BOOT, E-CC-LOG, E-CC-CONF, E-CC-FDR-CLIENT
6	E-C7	C7 On-Jetson Inference Runtime	component	AZ-249	L	21–34	E-BOOT, E-CC-CONF, E-CC-FDR-CLIENT
7	E-C6	C6 Tile Cache + Spatial Index	component	AZ-250	M	13–21	E-BOOT, E-CC-LOG, E-CC-CONF
8	E-C11	C11 Tile Manager (TileDownloader + TileUploader)	component	AZ-251	M	13–21	E-C6, E-CC-CONF, E-CC-LOG
9	E-C10	C10 Pre-flight Cache Provisioning	component	AZ-252	M	13–21	E-C6, E-C7, E-CC-LOG
10	E-C12	C12 Operator Pre-flight Orchestrator	component	AZ-253	M	13–21	E-C10, E-C11, E-CC-LOG
11	E-C1	C1 Visual / Visual-Inertial Odometry	component	AZ-254	XL	34–55	E-BOOT, E-CC-FDR-CLIENT, E-C7
12	E-C2	C2 Visual Place Recognition	component	AZ-255	L	21–34	E-C6, E-C7, E-CC-FDR-CLIENT
13	E-C2.5	C2.5 Inlier-based Re-rank	component	AZ-256	S	5–8	E-C2, E-C7, E-C6 (LightGlue helper shared with C3)
14	E-C3	C3 Cross-Domain Matcher	component	AZ-257	L	21–34	E-C2.5, E-C7
15	E-C3.5	C3.5 AdHoP-Conditional Refinement	component	AZ-258	M	8–13	E-C3, E-C7
16	E-C4	C4 Pose Estimator	component	AZ-259	M	13–21	E-C3.5, E-C5 (shared GTSAM substrate; co-developed)
17	E-C5	C5 State Estimator	component	AZ-260	XL	34–55	E-C1, E-C4 (shared graph), E-CC-FDR-CLIENT
18	E-C8	C8 FC + GCS Adapter	component	AZ-261	L	21–34	E-C5, E-CC-CONF, E-CC-LOG
19	E-BBT	Blackbox Tests (FT/NFT scenarios)	tests	AZ-262	M	13–21	all component epics (each ships its own component-internal tests; this epic parents the suite-level FT/NFT scenarios in `_docs/02_document/tests/*.md`)
20	E-CC-HELPERS	Cross-Cutting: Common Helpers (8 shared utilities)	cross-cutting	AZ-264	M	13–21	E-BOOT, E-CC-LOG (added in Decompose Step 2 — supersedes per-component helper child-issues from cycle 1)
21	E-DEMO-REPLAY	Offline replay mode (video + tlog → per-tick coordinate stream)	feature	AZ-265	M	22–27	E-C1, E-C2, E-C2.5, E-C3, E-C3.5, E-C4, E-C5, E-C8, E-CC-CONF (added in Decompose Step 2 — enables parent-suite UI demo via subprocess + JSONL streaming)

High-level component dependency diagram

flowchart TB
  BOOT[E-BOOT Bootstrap]
  LOG[E-CC-LOG Logging]
  CONF[E-CC-CONF Config + Composition Root]
  FDRC[E-CC-FDR-CLIENT FDR Producer Client]
  C13[E-C13 FDR]
  C7[E-C7 Inference Runtime]
  C6[E-C6 Tile Cache]
  C11[E-C11 Tile Manager]
  C10[E-C10 Cache Provisioning]
  C12[E-C12 Operator Orchestrator]
  C1[E-C1 VIO]
  C2[E-C2 VPR]
  C25[E-C2.5 Re-rank]
  C3[E-C3 Matcher]
  C35[E-C3.5 AdHoP]
  C4[E-C4 Pose]
  C5[E-C5 State]
  C8[E-C8 FC Adapter]
  BBT[E-BBT Blackbox Tests]
  HELP[E-CC-HELPERS Common Helpers]
  DEMO[E-DEMO-REPLAY Offline Replay Mode]

  BOOT --> LOG --> FDRC --> C13
  BOOT --> CONF --> C13
  BOOT --> CONF --> C7
  BOOT --> LOG --> HELP
  C13 -.-> C7
  CONF --> C6 --> C11
  C6 --> C10
  C7 --> C10
  C10 --> C12
  C11 --> C12
  C7 --> C2 --> C25 --> C3 --> C35 --> C4
  C6 --> C2
  C6 --> C25
  C1 --> C5
  C4 <--> C5
  C5 --> C8
  FDRC --> C1
  FDRC --> C5
  C8 --> BBT
  C12 --> BBT
  HELP -.-> C1
  HELP -.-> C2
  HELP -.-> C25
  HELP -.-> C3
  HELP -.-> C35
  HELP -.-> C4
  HELP -.-> C5
  HELP -.-> C6
  HELP -.-> C7
  HELP -.-> C8
  HELP -.-> C10
  HELP -.-> C11
  HELP -.-> C12
  C1 --> DEMO
  C5 --> DEMO
  C8 --> DEMO
  CONF --> DEMO

E-BOOT — Bootstrap & Initial Structure

Tracker: AZ-244 | Type: bootstrap | T-shirt: M | Story points: 13–21 | Owner: onboard team

System context

flowchart LR
  EBOOT[E-BOOT scaffolding] --> SRC[src/ component dirs]
  EBOOT --> CICD[CI Tier-1 + Tier-2 jobs]
  EBOOT --> DOCKER[docker-compose.test.yml]
  EBOOT --> DB[Postgres init scripts]
  EBOOT --> TESTROOT[tests/ + tests/fixtures/]

Problem / Context

No source layout exists yet. Every downstream epic assumes a defined repo skeleton: src/components/<id>_<name>/, src/shared/<concern>/, tests/, tests/fixtures/, plus the Tier-1 Docker compose, the Tier-2 CI job, the Postgres init scripts that match data_model.md, and the operator-orchestrator tarball build path. Until this exists, no other epic can start.

Scope

In scope:

Create src/components/<id>_<name>/ for all 14 components with empty package init.
Create src/shared/{logging,config,fdr_client,crypto,calibration_loader}/ placeholders.
pyproject.toml (Python) + CMakeLists.txt (C++ where used by C1) with the project's pinned dep set.
Tier-1 docker-compose.test.yml skeleton (companion + Postgres + e2e-runner; mock-suite-sat-service compose pulled in only by upload tests).
Tier-2 CI job that runs on the bench Jetson runner, with the JetPack 6.2 / TRT 10.3 / SM 87 image pinned per ADR-005.
Postgres init scripts for the schema in data_model.md.
tests/ directory with tests/fixtures/, tests/tmp/, tests/conftest.py.
Stub runtime_root.py for the airborne composition root (exits cleanly with a "no components configured" message) + operator_tool/__main__.py for the operator side.
.gitignore covering binaries, engine caches, FDR segments, ephemeral keys.
README with run commands.
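A minimal sketch of the runtime_root.py stub that the Bootstrap scope and acceptance criteria describe, assuming components are registered as zero-argument factories in a dict. The `COMPONENT_FACTORIES` registry and the `compose` helper are hypothetical names, not part of the ADR-009 spec; with nothing registered, the stub prints "no components configured" and exits 0.

```python
"""runtime_root.py: Bootstrap-stage composition root (illustrative stub)."""
import sys
from collections.abc import Callable, Mapping

# Hypothetical registry: component id -> zero-argument factory.
# Empty at Bootstrap; component epics would add entries later.
COMPONENT_FACTORIES: Mapping[str, Callable[[], object]] = {}


def compose(factories: Mapping[str, Callable[[], object]]) -> dict[str, object]:
    """Construct every registered component; the only place concretes get built."""
    return {cid: factory() for cid, factory in factories.items()}


def main(factories: Mapping[str, Callable[[], object]] = COMPONENT_FACTORIES) -> int:
    components = compose(factories)
    if not components:
        print("no components configured")
        return 0
    print(f"composed {len(components)} component(s): {', '.join(sorted(components))}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping construction in one `compose` function means later epics only add registry entries; no component ever instantiates another.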

Out of scope:

Any per-component logic (each component's epic owns its own implementation).
Cross-cutting impl (logging / config / FDR client live in their own epics).

Architecture notes

ADR-005 (Tier-1 / Tier-2 are first-class) drives the CI split.
ADR-009 (composition root) places runtime_root.py at the airborne entrypoint and operator_tool/__main__.py at the operator side.
ADR-002 (build-time exclusion) requires per-implementation CMake BUILD_* flags and the SBOM diff to be wired in CI from day one.
ADR-004 (process isolation) requires the airborne build target to exclude all c11_tilemanager/ symbols; the SBOM diff hook that enforces this lives here from Bootstrap onward.

Interface specification

This epic exposes no runtime interface; it ships repository scaffolding only.

Data flow

N/A.

Dependencies

Epic dependencies: none.
External: GitHub Actions runner pool (Tier-1 Docker), bench Jetson runner (Tier-2), pinned base images (JetPack 6.2, Postgres 16, mcr.microsoft.com/dotnet/aspnet:8.0-alpine for the test fixture).

Acceptance criteria

docker compose -f docker-compose.test.yml up -d brings up companion + Postgres + e2e-runner cleanly on a fresh workstation.
Tier-2 CI smoke-job (echo $JETPACK_VERSION + nvidia-smi) passes on the bench Jetson.
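pytest tests/ -q --collect-only discovers the empty tests/ tree without import or collection errors (pytest exits with code 5 when it collects no tests, so the CI step must accept that code or the tree must contain at least one placeholder test).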
The SBOM diff CI step exists and fails the build if c11_tilemanager ever appears in the airborne production-binary artifact (R02 enforcement seed).
runtime_root.py runs and exits cleanly with a "no components configured" message (proves composition root wiring).
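The SBOM criterion above can be sketched as a CI script that fails when a forbidden package shows up in the airborne artifact. This assumes the airborne SBOM is exported as CycloneDX JSON (a top-level `components` array whose entries carry a `name` and may nest further `components`). It is a simplified prefix scan rather than a full diff of two SBOMs; the script, `FORBIDDEN_PREFIXES`, and the message text are illustrative.

```python
import json
import sys
from collections.abc import Iterator

# Hypothetical guard list: package-name prefixes that must never reach the airborne artifact.
FORBIDDEN_PREFIXES = ("c11_tilemanager",)


def iter_component_names(sbom: dict) -> Iterator[str]:
    """Yield every component name in a CycloneDX-style JSON SBOM, including nested ones."""
    stack = list(sbom.get("components", []))
    while stack:
        component = stack.pop()
        name = component.get("name")
        if name:
            yield name
        stack.extend(component.get("components", []))


def forbidden_components(sbom: dict, prefixes: tuple[str, ...] = FORBIDDEN_PREFIXES) -> list[str]:
    return sorted(name for name in iter_component_names(sbom) if name.startswith(prefixes))


def main(argv: list[str]) -> int:
    with open(argv[1], encoding="utf-8") as fh:
        hits = forbidden_components(json.load(fh))
    for name in hits:
        print(f"FORBIDDEN in airborne artifact: {name}", file=sys.stderr)
    return 1 if hits else 0  # non-zero exit fails the CI step


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

In CI this would run as, for example, `python sbom_guard.py airborne.cdx.json` (both file names hypothetical).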

Non-functional requirements

CI cold-build wall-clock ≤ 10 min on Tier-1; ≤ 6 min on Tier-2 (just the smoke-job).
Repo size at this stage ≤ 5 MB (no fixtures committed).

Risks & mitigations

R12 (single deployment camera) — Bootstrap's CI must not assume the unit is plugged in; Tier-2 smoke-job runs without the camera, only against TRT/SM/JP version.

Effort

T-shirt M; 13–21 story points across child PBIs (each ≤ 5 points).

Child issues (PBIs)

#	Title	Pts
1	Repo scaffolding: `src/components/`, `src/shared/`, `tests/`, `runtime_root.py`	2
2	`pyproject.toml` + `CMakeLists.txt` with pinned deps	3
3	Tier-1 `docker-compose.test.yml` skeleton + Postgres init	3
4	Tier-2 CI smoke-job on bench Jetson	3
5	SBOM diff CI step (R02 enforcement seed; fails on `c11_tilemanager` in airborne artifact)	3
6	`.gitignore` + `README.md` + run commands	2
7	`runtime_root.py` minimum (compose root + "no components configured" exit path)	2

Key constraints

RESTRICT-HW-1 (Jetson Orin Nano Super, 8 GB shared LPDDR5, 25 W) — Tier-2 image pins SM 87 / JP 6.2 / TRT 10.3.
RESTRICT-FC-1 (AP + iNav supported; PX4 out of scope) — composition root wires only AP + iNav adapters.

Testing strategy

CI smoke tests on every PR (Tier-1 compose-up, Tier-2 nvidia-smi).
No unit tests yet — those live in component epics.

E-CC-LOG — Cross-Cutting: Structured JSON Logging

Tracker: AZ-245 | Type: cross-cutting | T-shirt: S | Story points: 5–8

System context

Every component's § 9 Logging Strategy mandates structured JSON logging at ERROR / WARN / INFO / DEBUG levels with per-frame fields (frame_id, kind, component-specific keys). A single shared logger module under src/shared/logging/ produces these records; every component imports it.

flowchart LR
  COMP[Any component] --> LOGGER[src/shared/logging<br/>structured JSON]
  LOGGER --> STDOUT[stdout / journald]
  LOGGER --> FDR["FDR (via E-CC-FDR-CLIENT for ERROR + WARN)"]

Problem / Context

If every component rolls its own logger, format drift is guaranteed. The traceability matrix and post-flight FDR analysis rely on a stable JSON schema, and a single shared logger is the only reliable way to keep that schema stable.

Scope

In scope:

src/shared/logging/__init__.py exporting get_logger(component_id: str) -> Logger.
JSON formatter with stable field ordering (ts, level, component, frame_id, kind, msg, ...kv).
Drop-in RotatingStdoutHandler for Tier-1 dev; JournaldHandler for Tier-2 production.
Bridge into the FDR client for ERROR + WARN levels (handler subscribes to log records and enqueues a kind = "log" FdrRecord).
Helpers for the documented per-frame log shapes (vio.frame_id, vpr.top10_distances, etc.) so component code is short.

Out of scope: per-component log content (lives in each component epic's child PBIs).

Architecture notes

Stdlib logging + python-json-logger (or orjson formatter for speed). No new dependency beyond what's already in pyproject.toml. No third-party log aggregator — Tier-1 uses Docker stdout capture; Tier-2 uses journald.

Interface specification

import logging

def get_logger(component_id: str) -> logging.Logger: ...

class StructuredJsonHandler(logging.Handler):
    """JSON formatter + FDR bridge for ERROR/WARN."""

class FdrLogBridge:
    """Subscribed by the logger; forwards ERROR + WARN to E-CC-FDR-CLIENT.enqueue."""
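A minimal sketch of the formatter and FDR bridge using only the standard library (`json` stands in for python-json-logger/orjson, and no latency tuning is attempted). Assumptions: `frame_id` and component-specific keys arrive through the stdlib `extra=` argument, with the keys grouped under a hypothetical `kv` attribute, and `fdr_client` is any object with an `enqueue` method standing in for E-CC-FDR-CLIENT. Here `FdrLogBridge` is modelled as a `logging.Handler`.

```python
import json
import logging
import sys
from datetime import datetime, timezone

_LEVEL_NAMES = {"WARNING": "WARN"}  # match the documented "WARN" spelling


class JsonFormatter(logging.Formatter):
    """Emit ts, level, component, frame_id, kind, msg first, then sorted extras."""

    def format(self, record: logging.LogRecord) -> str:
        doc = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": _LEVEL_NAMES.get(record.levelname, record.levelname),
            "component": record.name,
            "frame_id": getattr(record, "frame_id", None),
            "kind": getattr(record, "kind", "log"),
            "msg": record.getMessage(),
        }
        kv = getattr(record, "kv", {})
        for key in sorted(kv):
            doc.setdefault(key, kv[key])  # never overwrite the fixed fields
        return json.dumps(doc, separators=(",", ":"), default=str)


class FdrLogBridge(logging.Handler):
    """Forward WARN and ERROR records to an FDR client; INFO/DEBUG never reach FDR."""

    def __init__(self, fdr_client) -> None:
        super().__init__(level=logging.WARNING)
        self._fdr = fdr_client

    def emit(self, record: logging.LogRecord) -> None:
        if record.levelno not in (logging.WARNING, logging.ERROR):
            return
        self._fdr.enqueue({
            "kind": "log",
            "level": _LEVEL_NAMES.get(record.levelname, record.levelname),
            "component": record.name,
            "payload": self.format(record),
        })


def get_logger(component_id: str, fdr_client=None) -> logging.Logger:
    logger = logging.getLogger(component_id)
    if not logger.handlers:  # idempotent: configure each component logger once
        stdout = logging.StreamHandler(sys.stdout)
        stdout.setFormatter(JsonFormatter())
        logger.addHandler(stdout)
        if fdr_client is not None:
            bridge = FdrLogBridge(fdr_client)
            bridge.setFormatter(JsonFormatter())
            logger.addHandler(bridge)
        logger.propagate = False
    return logger
```

A component would then log with `get_logger("c2").warning("VPR top-1 above threshold", extra={"frame_id": 7, "kv": {"distance": 0.42}})`.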

Data flow

sequenceDiagram
  participant C as Component
  participant L as Logger
  participant S as stdout
  participant F as FDR Client
  C->>L: log.warning("VPR top-1 above threshold", distance=0.42)
  L->>S: {"level":"WARN", "component":"c2", ...}
  L->>F: enqueue(kind="log", level="WARN", payload=...)

Dependencies

Depends on E-BOOT.
External: python-json-logger or orjson (whichever is already pinned).

Acceptance criteria

Every component test that asserts a log message uses the shared logger and finds the expected JSON shape.
ERROR + WARN records appear in FDR with kind = "log" and a back-reference to the originating component.
INFO + DEBUG do NOT appear in FDR (per-component § 9 storage rule).
Log format passes a contract test (tests/contract/log_schema.py) verifying field names + ordering + required keys.

Non-functional requirements

Per-record latency p99 ≤ 0.2 ms (lock-free emit on the hot path).
No allocation in the steady-state DEBUG path beyond the message string itself.

Risks & mitigations

R13 (FDR queue overrun) — the FDR bridge uses E-CC-FDR-CLIENT's drop-oldest semantics; it never blocks the caller.

Effort

T-shirt S; 5–8 points.

Child issues

#	Title	Pts
1	`src/shared/logging/` module + JSON formatter + handlers	3
2	FDR log bridge (ERROR + WARN → kind=log)	2
3	Contract test `tests/contract/log_schema.py`	2

Key constraints

AC-NEW-3 (FDR ≤ 64 GB / flight, no silent drops) — DEBUG must not flow into FDR; verified by the contract test.

Testing strategy

Unit tests for the formatter (field ordering + escaping).
Contract test against the FDR record schema (kind=log).
Integration via every component's tests.md (each component asserts at least one log message).

E-CC-CONF — Cross-Cutting: Configuration & Composition Root

Tracker: AZ-246 | Type: cross-cutting | T-shirt: S | Story points: 5–8

System context

ADR-001 (runtime selection by config) + ADR-009 (composition root) together require a single shared loader that materialises the Config object at process startup, plus a compose_root(config) function that constructs each strategy/component instance with its dependencies. No component instantiates another component itself.

flowchart LR
  ENV[ENV vars] --> LOADER
  YAML[config.yaml] --> LOADER
  CALIB[Camera calibration JSON] --> LOADER
  LOADER[src/shared/config/loader] --> ROOT[runtime_root.py / operator_tool/__main__.py]
  ROOT --> COMPS[component instances]

Problem / Context

Without a single source of truth for configuration, the BUILD_* + runtime-strategy-selection rules of ADR-001/002/009 collapse — components silently fall back to defaults, and the composition root grows local config-parsing logic that drifts. The CI gate that ensures only the linked strategies are selectable also lives here.

Scope

In scope:

src/shared/config/loader.py: env + YAML + camera-calibration JSON merging with explicit precedence (env > YAML > defaults).
Config dataclass (frozen) covering every component's startup knob.
compose_root(config) -> RuntimeRoot for the airborne process; compose_operator(config) -> OperatorRoot for the tooling side.
Strategy-vs-build-flag consistency check at startup: refuse to start if config selects a strategy whose BUILD_* flag was off in the linked binary.

Out of scope: any component's specific config shape (defined inside its own epic).

Architecture notes

ADR-001, ADR-002, ADR-009 all converge here.
The composition root is the only place where a concrete VioStrategy / VprStrategy / etc. may be imported; component code imports only the abstract interface.

Interface specification

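from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Config: ...  # populated by union of every component's config schema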

def load_config(env: dict[str, str], paths: list[Path]) -> Config: ...

def compose_root(config: Config) -> RuntimeRoot: ...
def compose_operator(config: Config) -> OperatorRoot: ...
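The env > YAML > defaults precedence can be sketched as a layered merge over a frozen dataclass. This takes an already-parsed YAML mapping (the real load_config reads YAML files from `paths`), leaves out the camera-calibration JSON layer, and uses three hypothetical knobs plus a hypothetical `GPSD_` environment prefix. Unknown YAML keys raise instead of being ignored, in line with the no-silent-fallback rule above.

```python
from collections.abc import Mapping
from dataclasses import dataclass, fields
from typing import Any

ENV_PREFIX = "GPSD_"  # hypothetical prefix for environment overrides


@dataclass(frozen=True)
class Config:
    # Hypothetical knobs; the real Config is the union of every component's schema.
    vio_strategy: str = "default"
    fdr_queue_capacity: int = 4096
    log_level: str = "INFO"


def _coerce(raw: str, target: type) -> Any:
    """Env values arrive as strings; convert them to the type of the default."""
    if target is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return target(raw)


def merge_config(env: Mapping[str, str], yaml_doc: Mapping[str, Any]) -> Config:
    """Apply defaults < YAML < env, refusing unknown YAML keys instead of ignoring them."""
    known = {f.name for f in fields(Config)}
    unknown = set(yaml_doc) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    values: dict[str, Any] = {}
    for f in fields(Config):
        if f.name in yaml_doc:
            values[f.name] = yaml_doc[f.name]
        env_key = ENV_PREFIX + f.name.upper()
        if env_key in env:
            values[f.name] = _coerce(env[env_key], type(f.default))
    return Config(**values)  # fields not set above keep their dataclass defaults
```

Because each layer only overwrites keys it actually sets, the precedence unit tests reduce to setting the same key in two layers and checking which value wins.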

Data flow

Startup-only — runs once per process. No per-frame path.

Dependencies

Depends on E-BOOT.

Acceptance criteria

compose_root constructs a runnable airborne process for every documented config preset (default deployment, IT-12 research-binary, smoke-test minimal).
Strategy/build-flag mismatch triggers an explicit StrategyNotLinkedError with a clear message (no silent fallback).
Config precedence (env > YAML > defaults) verified by unit tests for at least 3 keys per layer.
runtime_root.py exits with code 0 when given a valid config and no components actually do work (reachability proof).
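The StrategyNotLinkedError criterion can be sketched as a startup check against a table of linked strategies. The table contents and slot names are hypothetical; in a real build the table would presumably be generated from the BUILD_* flags so that config can only select what was actually compiled in.

```python
from collections.abc import Mapping


class StrategyNotLinkedError(RuntimeError):
    """Config selects a strategy whose BUILD_* flag was off in this binary."""


# Hypothetical table of strategies linked into this binary, keyed by slot.
LINKED_STRATEGIES: Mapping[str, frozenset[str]] = {
    "vio": frozenset({"baseline_vio"}),
    "vpr": frozenset({"baseline_vpr"}),
}


def check_strategies_linked(
    selected: Mapping[str, str],
    linked: Mapping[str, frozenset[str]] = LINKED_STRATEGIES,
) -> None:
    """Raise on the first selection that is not linked; no silent fallback."""
    for slot, name in selected.items():
        available = linked.get(slot, frozenset())
        if name not in available:
            raise StrategyNotLinkedError(
                f"config selects {slot}={name!r}, but this binary only links "
                f"{sorted(available)} for {slot!r}; refusing to start"
            )
```

Selecting a slot that the airborne binary never linked (for example a hypothetical `tile_uploader` slot) fails the same way, which is how this check acts as the R02 gate against C11 running airborne.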

Non-functional requirements

Cold-start config load + compose ≤ 1 s on Tier-2 (counts toward AC-NEW-1's 30 s budget).

Risks & mitigations

R02 (ADR-004 process isolation) — compose_root's strategy/build-flag check is the third enforcement gate (after SBOM diff and runtime self-check) preventing C11 from running airborne.

Effort

T-shirt S; 5–8 points.

Child issues

#	Title	Pts
1	`src/shared/config/loader.py` + `Config` dataclass	3
2	`compose_root` + `compose_operator` skeletons + StrategyNotLinkedError	3
3	Unit tests for env/YAML/defaults precedence	2

Key constraints

ADR-002 (build-time exclusion) — only linked strategies selectable.

Testing strategy

Unit: precedence + StrategyNotLinkedError.
Integration: every documented preset starts cleanly.

E-CC-FDR-CLIENT — Cross-Cutting: FDR Producer Client

Tracker: AZ-247 | Type: cross-cutting | T-shirt: M | Story points: 8–13

System context

C13 owns the FDR writer thread, segment files, and the 64 GB cap. Every other component publishes via a producer-side client: lock-free enqueue + an FdrRecord schema versioned in RecordSchema. This epic owns ONLY the producer side; the writer-thread internals belong to E-C13.

flowchart LR
  PROD[Component producer] --> Q[lock-free ring buffer]
  Q --> WRITER[E-C13 writer thread]
  WRITER --> SEG[segment file on NVM]

Problem / Context

Producer-side correctness (drop-oldest with rollover-log, schema versioning, never-block) is independent of where the file lands. Co-locating producer logic inside E-C13 would force every component test to spin up the writer thread; a thin shared client lets component tests use a fake sink.

Scope

In scope:

src/shared/fdr_client/__init__.py exporting FdrClient(producer_id: str) -> Client.
Lock-free SPSC ring buffer per producer; capacity configurable (default per producer in Config).
FdrRecord versioned schema (orjson or msgpack — pinned in E-BOOT).
Drop-oldest behaviour writing a structured kind=overrun record with producer_id + dropped count (never silent).
FakeFdrSink for component-level tests.

Out of scope: writer thread, segment files, 64 GB cap, rollover policy (E-C13).

Architecture notes

AC-NEW-3 (no silent drops) is enforced HERE: drop-oldest always emits the overrun record.
Schema versioning prevents post-flight tooling breakage when payload classes evolve.

Interface specification

class FdrClient:
    def __init__(self, producer_id: str): ...
    def enqueue(self, record: FdrRecord) -> None: ...   # lock-free, never blocks
    def flush(self) -> None: ...                         # used by tests only

Data flow

sequenceDiagram
  participant C as Component
  participant Q as Ring buffer
  participant W as Writer (E-C13)
  C->>Q: enqueue(record)
  alt overrun
    Q->>Q: drop oldest + emit kind=overrun record
  end
  W->>Q: dequeue (in writer thread)
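The drop-oldest accounting from the sequence above can be sketched with a bounded `collections.deque`. This is a single-threaded illustration, not the lock-free SPSC ring the epic specifies. The `FdrRecord` fields shown are a hypothetical minimal shape. One design choice here is to emit the overrun record lazily on the writer's next dequeue: the overrun record can then never be evicted by the same full buffer, so drops are always reported.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class FdrRecord:
    kind: str
    producer_id: str
    payload: dict[str, Any] = field(default_factory=dict)


class DropOldestBuffer:
    """Bounded per-producer buffer that never blocks and never drops silently.

    Single-threaded illustration of the accounting only; it is not lock-free.
    """

    def __init__(self, producer_id: str, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.producer_id = producer_id
        self._buf: deque[FdrRecord] = deque(maxlen=capacity)
        self._unreported_drops = 0

    def enqueue(self, record: FdrRecord) -> None:
        if len(self._buf) == self._buf.maxlen:
            self._unreported_drops += 1  # append below evicts the oldest entry
        self._buf.append(record)

    def dequeue(self) -> Optional[FdrRecord]:
        # Surface drops before newer data so the writer persists them first.
        if self._unreported_drops:
            overrun = FdrRecord("overrun", self.producer_id,
                                {"dropped_count": self._unreported_drops})
            self._unreported_drops = 0
            return overrun
        return self._buf.popleft() if self._buf else None
```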

Dependencies

Depends on E-BOOT, E-CC-LOG.
Consumed by every component that emits FDR records.

Acceptance criteria

enqueue never blocks even under writer-thread stall (verified by C13-IT-05 from the C13 tests.md).
Every overrun event produces a structured record with non-zero dropped_count and the originating producer_id.
Schema version bump (e.g., adding a new field) does not break post-flight tooling that reads at version N-1 (forward-compatible parser).
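The forward-compatibility criterion is commonly met with a "tolerant reader": require the fields the reader knows, and set aside anything newer writers added. The sketch below uses stdlib `json` in place of orjson/msgpack, and `RecordV1` is a hypothetical minimal schema. This only covers additive changes; renaming or removing a field would still break a version N-1 reader.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class RecordV1:
    """Fields a version-1 reader understands (hypothetical minimal schema)."""
    schema_version: int
    kind: str
    producer_id: str


_KNOWN = tuple(f.name for f in fields(RecordV1))


def parse_tolerant(line: str) -> tuple[RecordV1, dict[str, Any]]:
    """Parse a JSON record written at any version >= 1.

    Known fields are required (KeyError if missing); fields added by newer
    writers are returned separately instead of raising.
    """
    raw = json.loads(line)
    record = RecordV1(**{name: raw[name] for name in _KNOWN})
    extras = {k: v for k, v in raw.items() if k not in _KNOWN}
    return record, extras
```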

Non-functional requirements

enqueue p99 ≤ 5 µs on Tier-2 (no allocation on the steady-state path; pre-sized buffers).
Per-producer ring buffer size ≤ configured cap (no unbounded growth).

Risks & mitigations

R13 (queue overrun) — the design IS the mitigation: drop-oldest + always log.

Effort

T-shirt M; 8–13 points.

Child issues

#	Title	Pts
1	Lock-free SPSC ring buffer per producer	5
2	`FdrRecord` schema + versioned serialiser (orjson/msgpack)	3
3	Drop-oldest + `kind=overrun` record emission	2
4	`FakeFdrSink` for component tests	2

Key constraints

AC-NEW-3 (no silent drops).

Testing strategy

Unit: ring buffer correctness under contention; overrun record emitted.
Property tests: forward-compat parser at version N-1.

E-C13 — C13 Flight Data Recorder

Tracker: AZ-248 | Type: component | T-shirt: L | Story points: 21–34

System context

flowchart LR
  ALL[All components] -->|enqueue via E-CC-FDR-CLIENT| Q[per-producer queues]
  Q --> W[C13 writer thread]
  W --> SEGS[segmented files on NVM]
  SEGS -.->|post-landing| OPTOOL[E-C12 retrieval]

Problem / Context

C13 keeps a per-flight record (≤ 64 GB) of every onboard payload class, with no silent drops. Raw frames are excluded, except for the failed-tile thumbnail forensic exception capped at ≤ 0.1 Hz (AC-NEW-3, AC-8.5). C13 owns the single writer thread; every other component is a producer.

Scope

In scope: writer thread, segment file lifecycle, 64 GB cap with oldest-segment-dropped policy, per-flight FlightHeader + FlightFooter, atomic segment rotation, mid-flight tile snapshot path, failed-tile thumbnail rate cap, refusal of takeoff when open_flight fails.

Out of scope: producer-side enqueue (E-CC-FDR-CLIENT); post-flight retrieval UI (E-C12).

Architecture notes

File: _docs/02_document/components/14_c13_fdr/description.md is the canonical spec.
IncrementalFixedLagSmoother from C5 publishes smoothed past-keyframes via FDR ONLY (AC-4.5 revised) — NOT into the FC stream.
Segment rotation uses atomicwrites; cross-process safety on the FDR root via filelock.

Interface specification

class FdrWriter:
    def open_flight(self, header: FlightHeader) -> None: ...     # raises FdrOpenError
    def write_record(self, record: FdrRecord) -> None: ...        # lock-free; FdrQueueOverrunError logged, not raised
    def close_flight(self) -> FlightFooter: ...
    def current_size_bytes(self) -> int: ...
    def is_rolling(self) -> bool: ...

Data flow

sequenceDiagram
  participant Prods as Producers (every component)
  participant Q as Per-producer queues
  participant W as Writer thread
  participant FS as NVM segment file
  Prods->>Q: enqueue(record)
  W->>Q: dequeue
  W->>FS: serialise + append
  alt segment >= cap
    W->>FS: atomic rotate; drop oldest if total > 64 GB; emit kind=segment_rollover
  end
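The rollover branch above can be sketched as in-memory budget accounting. Each closed segment is added to a FIFO, and the oldest segments are dropped while the total exceeds the cap, with one `segment_rollover` record produced per dropped segment. Actual file deletion, atomic rotation (atomicwrites) and the filelock guard are outside the sketch. The record field names are hypothetical, and in production `cap_bytes` would be the configured 64 GB limit.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    size_bytes: int


class SegmentBudget:
    """Track closed segments and enforce a total-bytes cap by dropping the oldest."""

    def __init__(self, cap_bytes: int) -> None:
        self.cap_bytes = cap_bytes
        self.total_bytes = 0
        self._segments: deque[Segment] = deque()

    def add_closed_segment(self, seg: Segment) -> list[dict]:
        """Register a closed segment; return one rollover record per dropped segment."""
        self._segments.append(seg)
        self.total_bytes += seg.size_bytes
        rollovers = []
        # Never drop the segment just closed; only older ones are candidates.
        while self.total_bytes > self.cap_bytes and len(self._segments) > 1:
            old = self._segments.popleft()
            self.total_bytes -= old.size_bytes
            rollovers.append({"kind": "segment_rollover",
                              "dropped_segment": old.name,
                              "dropped_bytes": old.size_bytes})
        return rollovers
```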

Dependencies

E-BOOT, E-CC-LOG, E-CC-CONF, E-CC-FDR-CLIENT.

Acceptance criteria

AC-NEW-3: synthetic 8 h replay produces ≤ 64 GB on disk, with every drop accompanied by a kind=overrun and/or kind=segment_rollover record.
AC-8.5: kind=raw_nav_frame writes raise RawFrameWriteForbiddenError; kind=failed_tile_thumbnail rate-limited to ≤ 0.1 Hz.
AC-1.4 / AC-4.5: every smoothed past-keyframe revision lands in FDR; the FC emission stream is unchanged.
AC-NEW-3 takeoff gate: FdrOpenError aborts takeoff before the FC adapter is opened.
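The two AC-8.5 rules can be sketched as a small admission gate in front of the writer. `raw_nav_frame` always raises, and `failed_tile_thumbnail` is admitted at most once per 10 s (0.1 Hz). Timestamps are assumed to be monotonic seconds supplied by the caller. The class name and the suppressed-thumbnail counter are hypothetical; the counter exists so that rate-limited thumbnails can still be accounted for rather than vanishing silently.

```python
from typing import Optional


class RawFrameWriteForbiddenError(Exception):
    """AC-8.5: raw navigation frames must never be written to the FDR."""


class ForensicWriteGate:
    """Enforce the raw-frame ban and the failed-tile thumbnail rate cap."""

    MIN_THUMBNAIL_INTERVAL_S = 10.0  # 0.1 Hz -> at most one thumbnail per 10 s

    def __init__(self) -> None:
        self._last_thumbnail_ts: Optional[float] = None
        self.suppressed_thumbnails = 0  # exposed so rate limiting is never silent

    def admit(self, kind: str, ts_monotonic_s: float) -> bool:
        if kind == "raw_nav_frame":
            raise RawFrameWriteForbiddenError("raw_nav_frame writes are forbidden")
        if kind == "failed_tile_thumbnail":
            last = self._last_thumbnail_ts
            if last is not None and ts_monotonic_s - last < self.MIN_THUMBNAIL_INTERVAL_S:
                self.suppressed_thumbnails += 1
                return False
            self._last_thumbnail_ts = ts_monotonic_s
        return True
```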

Non-functional requirements

Writer throughput ≥ 200 Hz aggregate (per C13-PT-01).
Per-record serialise + write p95 ≤ 5 ms.

Risks & mitigations

R13 (queue overrun) — drop-oldest + always-log.
R02 (ADR-004) — C13 runs in the airborne process; no cross-process FDR root contention with C11 (C11 not airborne).

Effort

T-shirt L; 21–34 points.

Child issues

#	Title	Pts
1	Writer thread + segment file open/close/rotate	5
2	`FlightHeader` / `FlightFooter` + records-written/dropped accounting	3
3	64 GB cap + oldest-segment-dropped policy + `kind=segment_rollover` record	5
4	Mid-flight tile snapshot path + filesystem layout	3
5	Failed-tile thumbnail ≤ 0.1 Hz rate limiter + AC-8.5 enforcement	3
6	`FdrOpenError` takeoff abort path	2
7	Component-internal tests C13-IT-01..06 + C13-PT-01 + C13-ST-01	5

Key constraints

AC-NEW-3, AC-8.5, AC-4.5 (revised), RESTRICT-UAV-4.

Testing strategy

Per _docs/02_document/components/14_c13_fdr/tests.md — six component-internal tests + 8 h NFT-LIM-02 at the suite level.

E-C7 — C7 On-Jetson Inference Runtime

Tracker: AZ-249 | Type: component | T-shirt: L | Story points: 21–34

System context

flowchart LR
  C2[C2 VPR] --> C7
  C25[C2.5 Re-rank] --> C7
  C3[C3 Matcher] --> C7
  C35[C3.5 AdHoP] --> C7
  C7 --> TRT[TensorRT]
  C7 --> ORT[ONNX Runtime]
  C7 --> PT[PyTorch FP16]
  C7 --> THERMAL[ThermalState publish]
  THERMAL --> C4[C4 Pose hybrid]

Problem / Context

Centralise GPU inference on Jetson: engine compilation, deserialise + warm-up, per-call inference, fallback chain (TRT → ONNX-RT+TRT-EP → PyTorch FP16), and ThermalState telemetry that drives D-CROSS-LATENCY-1.

Scope

In scope: engine cache lifecycle, deserialise + warm-up budget (AC-NEW-1), ThermalState publisher from jetson-stats, D-C10-3 takeoff content-hash gate (engine-side), D-C10-7 filename-schema enforcement, ONNX-RT fallback path.

Out of scope: cache artifact build (E-C10), tile cache (E-C6), the per-frame consumers (their own epics).

Architecture notes

File: components/09_c7_inference/description.md.
Python in-process abstraction over C++ TRT bindings; no separate process.
Engines hardware-tied (SM 87 / JP 6.2 / TRT 10.3 / FP16) per D-C10-6.
Helper EngineFilenameSchema is shared with E-C10.

Interface specification

class InferenceRuntime:
    def load_engine(self, model_id: str) -> EngineHandle: ...
    def infer(self, handle: EngineHandle, batch: Tensor) -> Tensor: ...
    def thermal_state(self) -> ThermalState: ...
    def warm_up(self, handle: EngineHandle) -> None: ...

Data flow

sequenceDiagram
  participant Caller as C2/C2.5/C3/C3.5
  participant C7 as InferenceRuntime
  participant GPU as TRT/ONNX/PT
  Caller->>C7: infer(handle, batch)
  C7->>GPU: forward
  GPU-->>C7: output
  C7-->>Caller: tensor
  C7-->>C4: ThermalState pub (≥1 Hz)
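The TRT → ONNX-RT+TRT-EP → PyTorch FP16 fallback chain can be sketched as an ordered list of backend loaders tried in turn. The loaders are placeholder callables, not real TensorRT/ONNX Runtime/PyTorch calls, and `EngineUnavailable` and this `EngineHandle` shape are hypothetical. Only `EngineUnavailable` triggers a fallback; any other exception, such as a content-hash mismatch, propagates, so a D-C10-3 failure still aborts rather than quietly degrading to a slower backend.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


class EngineUnavailable(Exception):
    """Raised by a backend loader that cannot serve the requested model."""


@dataclass(frozen=True)
class EngineHandle:
    model_id: str
    backend: str
    impl: object


Loader = Callable[[str], object]


def load_with_fallback(model_id: str, chain: Sequence[tuple[str, Loader]]) -> EngineHandle:
    """Try each (backend, loader) in order; only EngineUnavailable moves on."""
    errors = []
    for backend, loader in chain:
        try:
            return EngineHandle(model_id, backend, loader(model_id))
        except EngineUnavailable as exc:
            errors.append(f"{backend}: {exc}")
    raise EngineUnavailable(f"{model_id}: no backend available ({'; '.join(errors)})")
```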

Dependencies

E-BOOT, E-CC-CONF, E-CC-FDR-CLIENT.
External: TensorRT 10.3, ONNX Runtime + TRT EP, PyTorch FP16, jetson-stats.

Acceptance criteria

AC-NEW-1 cold-start: every required engine deserialises + warms in ≤ 30 s p95 (C7-IT-01).
AC-NEW-5: ThermalState updates ≥ 1 Hz; throttle-detection latency ≤ 1 s; C4 hybrid switch within 1 frame (C7-IT-02).
D-C10-3: EngineHashMismatchError aborts F2 takeoff; no GPU memory allocated on mismatch (C7-IT-03).
D-C10-7: filename-schema mismatch refused at parse time (C7-IT-04).
ONNX-RT fallback path produces correct results when TRT engine missing (C7-IT-05).
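The D-C10-3 gate can be sketched as a SHA-256 check on the raw engine bytes before any deserialise call, so a mismatch never reaches GPU allocation. Where the expected digest comes from (for example a sidecar produced by the Sha256Sidecar helper) is an assumption, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


class EngineHashMismatchError(RuntimeError):
    """D-C10-3: engine bytes do not match the expected content hash."""


def verify_engine_blob(blob: bytes, expected_sha256: str) -> bytes:
    """Return the blob unchanged if its SHA-256 matches; raise otherwise."""
    actual = hashlib.sha256(blob).hexdigest()
    if actual != expected_sha256.strip().lower():
        raise EngineHashMismatchError(f"expected {expected_sha256}, got {actual}")
    return blob


def read_verified_engine(path: Path, expected_sha256: str) -> bytes:
    # The hash check runs on raw bytes, before any runtime deserialise call,
    # so a mismatching engine is rejected without touching the GPU.
    return verify_engine_blob(path.read_bytes(), expected_sha256)
```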

Non-functional requirements

Per-model p95 latencies (C7-PT-01): UltraVPR ≤ 60 ms, LightGlue ≤ 30 ms, AdHoP ≤ 90 ms, DISK ≤ 50 ms.
GPU memory all engines resident ≤ 4 GB; system RAM ≤ 1.5 GB (C7-PT-02).

Risks & mitigations

R04 (engine cache hardware-tied) — D-C10-7 + D-C10-3 enforced at C7's deserialise path.
R10 (Marginals under thermal throttle) — C7's ThermalState publish is the upstream input to the C4 hybrid.

Effort

T-shirt L; 21–34 points.

Child issues

#	Title	Pts
1	TRT engine load + warm-up + cache lifecycle	5
2	ONNX-RT + TRT-EP fallback path	3
3	PyTorch FP16 simple-baseline path	3
4	D-C10-3 content-hash gate + D-C10-7 filename schema enforcement	3
5	`ThermalState` publisher from `jetson-stats`	3
6	Component-internal tests C7-IT-01..05 + C7-PT-01..02 + C7-ST-01	5

Key constraints

RESTRICT-HW-1 (Jetson + 25 W TDP), AC-4.2 (8 GB system memory), AC-NEW-5 (thermal envelope).

Testing strategy

Per components/09_c7_inference/tests.md.

E-C6 — C6 Tile Cache + Spatial Index

Tracker: AZ-250 | Type: component | T-shirt: M | Story points: 13–21

System context

flowchart LR
  C11[C11 TileDownloader] --> C6
  C10[C10 build] --> C6
  C5[C5 mid-flight gen] --> C6
  C2[C2 VPR] --> C6
  C25[C2.5 rerank] --> C6
  C3[C3 matcher] --> C6
  C11U[C11 TileUploader] --> C6

Problem / Context

Persistent imagery store whose on-disk layout is byte-identical to satellite-provider's, plus the FAISS HNSW spatial index for VPR. Tile writers are limited to C11 TileDownloader (production) and C5/orthorectifier (mid-flight); tile readers are limited to C2/C2.5/C3 (per-frame) and C11 TileUploader (post-landing).

Scope

In scope: Postgres tiles schema, filesystem JPEG layout matching satellite-provider, FAISS HNSW build/load (the index FILE — population via C10), per-sector freshness gates at write-time, 10 GB cache budget enforcement with LRU eviction, content-SHA-256 invariant on insert, mid-flight tile insert with quality_metadata.

Out of scope: tile fetch (E-C11), descriptor population (E-C10), inference (E-C7).

Architecture notes

File: components/08_c6_tile_cache/description.md, data_model.md.
Schema in data_model.md; Postgres 16; SHA-256 sidecar via helper Sha256Sidecar.

Interface specification

class TileStore:
    def insert(self, tile: TileRecord, jpeg: bytes) -> None: ...     # raises FreshnessRejected, ContentHashMismatch
    def get_tile_pixels(self, tile_id: TileId) -> bytes: ...
    def query_spatial(self, bbox: Bbox, zoom: int) -> list[TileRecord]: ...
    def mark_uploaded(self, tile_id: TileId) -> None: ...
    def pending_uploads(self) -> list[TileRecord]: ...

Data flow

sequenceDiagram
  participant W as Writer (C11 / C5)
  participant C6 as C6
  participant DB as Postgres
  participant FS as Filesystem
  W->>C6: insert(tile, jpeg)
  C6->>C6: freshness gate + sha256 check
  C6->>FS: write jpeg + sidecar
  C6->>DB: insert row
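The "freshness gate + sha256 check" step can be sketched as a pure validation function run before any filesystem or database write. The content hash is checked first, then the tile's age against a per-sector limit: stale tiles are rejected in `active_conflict` and kept with a downgrade flag in `stable_rear`. The age limits, the two-value `Sector` enum, and the function name are hypothetical; the real thresholds and sector classification come from C6's configuration.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Sector(Enum):
    ACTIVE_CONFLICT = "active_conflict"
    STABLE_REAR = "stable_rear"


class FreshnessRejected(Exception): ...


class ContentHashMismatch(Exception): ...


@dataclass(frozen=True)
class FreshnessPolicy:
    max_age_days: float  # hypothetical knob; real limits come from config


def admit_tile(jpeg: bytes, claimed_sha256: str, age_days: float,
               sector: Sector, policy: dict[Sector, FreshnessPolicy]) -> bool:
    """Validate a tile before insert.

    Returns True if the tile is accepted but must carry a downgrade flag,
    False if it is accepted as fresh; raises if it must be rejected.
    """
    if hashlib.sha256(jpeg).hexdigest() != claimed_sha256.lower():
        raise ContentHashMismatch("tile bytes do not match claimed SHA-256")
    if age_days <= policy[sector].max_age_days:
        return False
    if sector is Sector.ACTIVE_CONFLICT:
        raise FreshnessRejected(f"tile is {age_days:.0f} days old in active_conflict")
    return True  # stale but in stable_rear: keep it, flagged as downgraded
```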

Dependencies

E-BOOT, E-CC-LOG, E-CC-CONF.
External: PostgreSQL 16, FAISS.

Acceptance criteria

AC-8.1: filesystem layout byte-identical to satellite-provider for the same coordinate (C6-IT-01).
AC-8.2 / AC-NEW-6: per-sector freshness gate rejects in active_conflict, downgrade-flags in stable_rear (C6-IT-02 / C6-IT-05).
AC-8.4: every mid-flight tile carries quality_metadata (C6-IT-03).
AC-NEW-3: peak F4 burst (5 Hz, 100 tiles) writes without dropping (C6-IT-04).
RESTRICT-SAT-2: 10 GB cap enforced with LRU eviction, every eviction logged (C6-IT-06).
Defensive: SHA-256 mismatch rejects insert (C6-ST-01).
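The RESTRICT-SAT-2 budget can be sketched as a byte-budget LRU over tile ids, using `OrderedDict` to keep least-recently-used entries first. This covers eviction order and logging only; removing the JPEG, the sidecar and the Postgres row is left out, and the log message format is hypothetical.

```python
import logging
from collections import OrderedDict

log = logging.getLogger("c6_tile_cache")


class LruTileBudget:
    """Byte-budget LRU over tile ids; decides eviction order, no file I/O."""

    def __init__(self, cap_bytes: int) -> None:
        self.cap_bytes = cap_bytes
        self.total_bytes = 0
        self._sizes: "OrderedDict[str, int]" = OrderedDict()  # least recent first

    def touch(self, tile_id: str) -> None:
        """Mark a tile as recently read."""
        self._sizes.move_to_end(tile_id)

    def add(self, tile_id: str, size_bytes: int) -> list[str]:
        """Record an insert and return the tile ids evicted to stay under the cap."""
        if tile_id in self._sizes:
            self.total_bytes -= self._sizes.pop(tile_id)
        self._sizes[tile_id] = size_bytes
        self.total_bytes += size_bytes
        evicted: list[str] = []
        while self.total_bytes > self.cap_bytes and len(self._sizes) > 1:
            old_id, old_size = self._sizes.popitem(last=False)
            self.total_bytes -= old_size
            evicted.append(old_id)
            # Every eviction is logged, per RESTRICT-SAT-2.
            log.info("c6.evict tile_id=%s bytes=%d", old_id, old_size)
        return evicted
```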

Non-functional requirements

Per-tile read p95 (warm mmap) ≤ 0.5 ms; cold ≤ 50 ms (C6-PT-01).

Risks & mitigations

R08 (freshness drift in active_conflict) — write-side gate is the primary mitigation.

Effort

T-shirt M; 13–21 points.

Child issues

#	Title	Pts
1	Postgres `tiles` schema + migration	3
2	Filesystem JPEG store byte-identical to satellite-provider	3
3	FAISS HNSW load/save + mmap	3
4	Freshness gate + sector classification	3
5	10 GB LRU eviction with logging	3
6	Component-internal tests C6-IT-01..06 + C6-PT-01 + C6-ST-01	5

Key constraints

AC-8.1, AC-8.2, AC-NEW-6, RESTRICT-SAT-2, RESTRICT-UAV-4.

Testing strategy

Per components/08_c6_tile_cache/tests.md.

E-C11 — C11 Tile Manager (TileDownloader + TileUploader)

Tracker: AZ-251 | Type: component | T-shirt: M | Story points: 13–21

System context

flowchart LR
  SP[satellite-provider] -->|GET| DL[C11 TileDownloader]
  DL --> C6[C6 cache]
  C6 --> UP[C11 TileUploader]
  UP -->|POST /ingest| SP
  classDef airborne fill:#fee
  classDef operator fill:#cef
  class DL,UP operator

Problem / Context

Sole operator-side network I/O against satellite-provider, both directions. Strict ADR-004: never loaded into the airborne companion image. Bundled because download + upload share auth, HTTP client, deployment unit, and the airborne-exclusion property.

Scope

In scope: TileDownloader.fetch (download → freshness gate → write to C6), TileUploader.upload_pending (read C6 pending → sign → POST → mark uploaded), per-flight ephemeral signing key, idempotent retry on partial-success batches, flight_state == ON_GROUND gate (defense-in-depth atop ADR-004).

Out of scope: any airborne code; cache artifact build (E-C10); orchestration (E-C12).

Architecture notes

File: components/12_c11_tilemanager/description.md.
ADR-004 enforcement via E-BOOT's SBOM diff + runtime self-check.
Test substitute: e2e-test mock-suite-sat-service fixture under tests/fixtures/ (R01).

Interface specification

class TileDownloader:
    def fetch(self, req: FetchRequest) -> DownloadBatchReport: ...

class TileUploader:
    def upload_pending(self, flight_state: FlightStateSignal) -> UploadBatchReport: ...
    # raises UploadGateBlockedError if flight_state != ON_GROUND

Data flow

sequenceDiagram
  participant Op as Operator
  participant DL as TileDownloader
  participant SP as satellite-provider
  participant C6 as C6
  Op->>DL: fetch(area, sector_classification)
  DL->>SP: GET tiles
  SP-->>DL: tiles + metadata
  DL->>C6: insert (after freshness gate)
  DL-->>Op: DownloadBatchReport

Dependencies

E-C6, E-CC-CONF, E-CC-LOG.
External: real satellite-provider (download); D-PROJ-2 endpoint OR e2e-test fixture (upload).

Acceptance criteria

C11-IT-01: TileDownloader fetch + freshness gate + C6 write byte-identical layout.
C11-IT-02: stale-rejection counts surface in DownloadBatchReport.
C11-IT-03: TileUploader posts pending, signs payloads, marks uploaded on 202.
C11-IT-04: UploadGateBlockedError when not ON_GROUND.
C11-IT-05: idempotent retry — already-acked tiles not re-sent.
C11-ST-01: airborne process cannot import c11_tilemanager (R02 enforcement).
C11-ST-02: NFT-SEC-02 network-egress test passes.
C11-ST-03: per-flight key zeroised after upload.
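The ON_GROUND gate (C11-IT-04) and the idempotent-retry rule (C11-IT-05) can be sketched together. Some names are hypothetical:

- `post` stands in for the signed HTTP POST and returns a status code.
- The `acked` set stands in for `mark_uploaded` state, so tiles acknowledged in an earlier, partially successful batch are skipped.
- The enum values are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class FlightState(Enum):
    ON_GROUND = "on_ground"
    AIRBORNE = "airborne"


class UploadGateBlockedError(Exception):
    pass


@dataclass
class UploadBatchReport:
    acked: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def upload_pending(
    state: FlightState,
    pending: list[str],
    acked: set[str],
    post: Callable[[str], int],  # hypothetical: returns the HTTP status
) -> UploadBatchReport:
    # Defense-in-depth gate: refuse outright unless the aircraft is on the ground.
    if state is not FlightState.ON_GROUND:
        raise UploadGateBlockedError(state.value)
    report = UploadBatchReport()
    for tile_id in pending:
        if tile_id in acked:       # idempotent retry: never re-send an acked tile
            continue
        if post(tile_id) == 202:   # 202 Accepted => mark uploaded
            acked.add(tile_id)
            report.acked.append(tile_id)
        else:
            report.failed.append(tile_id)
    return report
```

Failed tiles stay out of `acked`, so a later retry re-sends only those.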

Non-functional requirements

Download throughput ≥ 50 MB/s on 1 Gbps link (C11-PT-01).
Upload throughput ≥ 20 tile/s with signing (C11-PT-02).

Risks & mitigations

R01 (D-PROJ-2 not yet shipped) — TileUploader works against the e2e-test fixture; the fixture is retired as the production target once the real endpoint lands.
R02 (ADR-004 break) — three enforcement gates; C11 tests verify each.
R09 (key compromise) — per-flight ephemeral keys; voting layer for compromise detection.
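The per-flight ephemeral key and its zeroisation (C11-ST-03) could look roughly like the class below. The source does not name the signing algorithm. HMAC-SHA256 is used here purely as an assumption, and the real design may use an asymmetric scheme instead. Zeroisation in Python is best-effort only. Keeping the key in a mutable `bytearray` allows an in-place wipe, but the interpreter and `hmac` internals may still hold transient copies.

```python
import hashlib
import hmac
import secrets


class EphemeralSigner:
    """Per-flight signer whose key buffer is overwritten when the flight ends."""

    def __init__(self) -> None:
        self._key = bytearray(secrets.token_bytes(32))  # fresh random key per flight
        self._closed = False

    def sign(self, payload: bytes) -> str:
        if self._closed:
            raise RuntimeError("signer already zeroised")
        # hmac.new accepts a bytearray key directly, so no immutable copy is made here.
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def close(self) -> None:
        # Overwrite the key bytes in place; bytes objects could not be wiped like this.
        for i in range(len(self._key)):
            self._key[i] = 0
        self._closed = True
```

Calling `close()` after the upload batch makes any later `sign()` fail loudly instead of silently reusing the key.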

Effort

T-shirt M; 13–21 points.

Child issues

#	Title	Pts
1	TileDownloader: GET + freshness gate + C6 write	5
2	TileUploader: read pending + sign + POST + mark uploaded	5
3	Idempotent retry on partial-success batch	3
4	`flight_state == ON_GROUND` gate (defense-in-depth)	2
5	Per-flight ephemeral signing key + zeroisation	3
6	Component-internal tests C11-IT-01..05 + C11-PT-01..02 + C11-ST-01..03 + C11-AT-01	5

Key constraints

ADR-004, RESTRICT-SAT-1 (no in-flight Service calls), AC-8.3, AC-8.4, AC-NEW-6.

Testing strategy

Per components/12_c11_tilemanager/tests.md.

E-C10 — C10 Pre-flight Cache Provisioning

Tracker: AZ-252 | Type: component | T-shirt: M | Story points: 13–21

System context

flowchart LR
  C6[C6 already populated by C11] --> C10
  C10 --> ENGINES[TRT engines]
  C10 --> DESCS[FAISS descriptors]
  C10 --> MAN[signed Manifest]
  ENGINES & DESCS & MAN --> AIRBORNE[airborne image at F2 takeoff]

Problem / Context

Build model-derived artifacts from an already-populated C6: TRT engines, VPR descriptors (calling C2's embed_query over the corpus), the signed Manifest with content-hashes. Idempotent re-run on unchanged C6.

Scope

In scope: CacheProvisioner.build_artifacts, ManifestVerifier.verify, idempotence (D-C10-1), Manifest covers every shipped artifact, hardware-tied engine compile (D-C10-6), filename schema (D-C10-7), operator-key requirement.

Out of scope: tile fetch (E-C11), tile cache writes (E-C6), engine deserialisation (E-C7).

Architecture notes

File: components/11_c10_provisioning/description.md.
C10 narrowed in this Plan cycle: it does NOT talk to satellite-provider. Tiles must be present in C6 before C10 runs.

Interface specification

class CacheProvisioner:
    def build_artifacts(self, corpus_root: Path, key_path: Path) -> BuildReport: ...

class ManifestVerifier:
    def verify(self, manifest: Path, public_key: PublicKey) -> ManifestVerdict: ...

Data flow

sequenceDiagram
  participant Op as Operator
  participant C10 as CacheProvisioner
  participant C2 as C2 embed_query
  participant FS as Filesystem
  Op->>C10: build_artifacts(corpus_root, key)
  C10->>C2: embed every tile
  C2-->>C10: descriptors
  C10->>FS: write engines + faiss + manifest
  C10->>FS: sign manifest with operator key
  C10-->>Op: BuildReport

Dependencies

E-C6, E-C7, E-CC-LOG.

Acceptance criteria

C10-IT-01: end-to-end build produces engines + descriptors + signed Manifest.
C10-IT-02: ManifestVerifier rejects tampered or wrong-key Manifests.
C10-IT-03: idempotent re-run — same hash, no recompile (D-C10-1).
C10-IT-04: ManifestCoverageError on orphan files (no smuggled artifacts).
C10-IT-05: Tier-2 build produces SM 87 / JP 6.2 / TRT 10.3 / FP16 engines (D-C10-6).
C10-ST-01: build refuses dev-key signing in operator mode.
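The Manifest's content-hash table and the orphan-file check (C10-IT-04) can be sketched with `hashlib` and `pathlib`. Signing the table with the operator key is deliberately omitted. The `manifest.json` file name and the use of `ValueError` for hash mismatches are hypothetical.

```python
import hashlib
from pathlib import Path


class ManifestCoverageError(Exception):
    """A file ships in the artifact tree but is not listed in the Manifest."""


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def build_hash_table(root: Path) -> dict[str, str]:
    """Relative POSIX path -> SHA-256, built in sorted order for determinism."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != "manifest.json"  # hypothetical manifest name
    }


def check_coverage(root: Path, manifest: dict[str, str]) -> None:
    actual = build_hash_table(root)
    orphans = sorted(set(actual) - set(manifest))
    if orphans:  # no smuggled artifacts
        raise ManifestCoverageError(orphans)
    bad = sorted(k for k, v in manifest.items() if actual.get(k) != v)
    if bad:
        raise ValueError(f"hash mismatch or missing file: {bad}")
```

The same deterministic table gives one possible basis for D-C10-1 idempotence: if a rebuilt table equals the stored one, nothing changed.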

Non-functional requirements

Cold build wall-clock ≤ 12 min on developer laptop with NVIDIA GPU; warm idempotent re-run ≤ 1 min (C10-PT-01).

Risks & mitigations

R04 (engine cache hardware-tied) — owner of the build side; deserialise side is C7.

Effort

T-shirt M; 13–21 points.

Child issues

#	Title	Pts
1	TRT engine compile (per-model)	5
2	FAISS descriptor population via C2's embed path	3
3	Signed Manifest builder + content-hash table	3
4	ManifestVerifier with operator-key requirement	3
5	Idempotent re-run + ManifestCoverageError	3
6	Component-internal tests C10-IT-01..05 + C10-PT-01 + C10-ST-01	5

Key constraints

AC-8.3, AC-NEW-1, D-C10-1 / D-C10-3 / D-C10-6 / D-C10-7.

Testing strategy

Per components/11_c10_provisioning/tests.md.

E-C12 — C12 Operator Pre-flight Orchestrator

Tracker: AZ-253 | Type: component | T-shirt: M | Story points: 13–21

System context

flowchart LR
  CLI[operator-orchestrator CLI]
  CLI --> C11D[C11 TileDownloader]
  CLI --> C10[C10 CacheProvisioner]
  CLI --> C11U[C11 TileUploader]
  CLI --> RELOC[AC-3.4 re-loc workflow]
  CLI --> FDR[FDR retrieval]

Problem / Context

Operator-facing CLI that sequences pre-flight (C11 download → C10 build) and post-landing (C11 upload), surfaces actionable failures, and handles the AC-3.4 re-localization workflow. Delivered as part of the operator-orchestrator tarball.

Scope

In scope: CLI subcommands (download, build-cache, upload-pending, reloc-confirm), CacheBuildReport aggregation, post-landing flight_state == ON_GROUND confirmation from FDR, sector-classification UI hook, FDR retrieval helpers.

Out of scope: actual download/upload (E-C11); engine compile (E-C10); FDR write side (E-C13).

Architecture notes

File: components/13_c12_operator_orchestrator/description.md.
Strict process boundary: C12 is operator-side only, in the same image as C11, but never airborne.

Interface specification

class OperatorTool:
    def build_cache(self, area: Area, sector_classification: SectorMap) -> CacheBuildReport: ...
    def trigger_post_landing_upload(self, fdr_root: Path) -> UploadBatchReport: ...
    def confirm_relocation(self, candidate: ReLocCandidate) -> None: ...

Data flow

sequenceDiagram
  participant Op as Operator
  participant C12 as OperatorTool
  participant C11 as C11
  participant C10 as C10
  Op->>C12: build_cache(area)
  C12->>C11: TileDownloader.fetch
  C11-->>C12: DownloadBatchReport
  C12->>C10: build_artifacts
  C10-->>C12: BuildReport
  C12-->>Op: CacheBuildReport
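The sequencing above (C11 first, C10 only if the download is usable) can be sketched with callables standing in for `TileDownloader.fetch` and `CacheProvisioner.build_artifacts`. Several details are choices of this sketch rather than the source:

- The exit codes and message texts are hypothetical.
- The stale rate is computed as rejected / (accepted + rejected).
- A stale rate above 30% (C12-IT-04) aborts before C10 runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DownloadBatchReport:
    fetched: int          # tiles accepted into C6
    stale_rejected: int   # tiles refused by the freshness gate


@dataclass
class CacheBuildReport:
    ok: bool
    message: str
    exit_code: int        # hypothetical exit codes: 0 ok, non-zero failure


STALE_RATE_LIMIT = 0.30


def build_cache(
    fetch: Callable[[], DownloadBatchReport],  # stands in for TileDownloader.fetch
    build_artifacts: Callable[[], None],       # stands in for CacheProvisioner.build_artifacts
) -> CacheBuildReport:
    try:
        dl = fetch()
    except Exception as exc:  # a download failure aborts before C10 runs
        return CacheBuildReport(False, f"download failed: {exc}", 2)
    total = dl.fetched + dl.stale_rejected
    stale_rate = dl.stale_rejected / total if total else 0.0
    if stale_rate > STALE_RATE_LIMIT:
        return CacheBuildReport(
            False,
            f"stale-tile rate {stale_rate:.0%} exceeds 30%; re-fetch or review sector classification",
            3,
        )
    build_artifacts()
    return CacheBuildReport(True, "cache built", 0)
```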

Dependencies

E-C10, E-C11, E-CC-LOG.

Acceptance criteria

C12-IT-01: operator re-loc workflow returns SUT to satellite_anchored ≤ 30 s (AC-3.4).
C12-IT-02: build_cache orchestrates C11 then C10; download failure aborts before C10.
C12-IT-03: trigger_post_landing_upload requires ≥ 30 s confirmed ON_GROUND in FDR.
C12-IT-04: actionable failure messages + non-zero exit on stale-tile rate > 30% or manifest signature failure.
C12-ST-01: no CLI command path imports into airborne package boundary.
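The C12-IT-03 precondition can be sketched as a scan over time-ordered flight-state records. The `(timestamp_s, state)` tuple shape is a hypothetical projection of the FDR records. The sketch measures the trailing ON_GROUND run against the FDR's own timestamps rather than wall-clock time, so a recorder that stopped writing cannot inflate the duration.

```python
from typing import Sequence, Tuple


def on_ground_confirmed(
    records: Sequence[Tuple[float, str]],  # hypothetical (timestamp_s, state), time-ordered
    min_s: float = 30.0,
) -> bool:
    """True if the trailing run of ON_GROUND records spans at least min_s seconds."""
    if not records or records[-1][1] != "ON_GROUND":
        return False
    run_start = records[-1][0]
    for ts, state in reversed(records):
        if state != "ON_GROUND":
            break
        run_start = ts  # earliest record of the trailing ON_GROUND run
    return records[-1][0] - run_start >= min_s
```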

Non-functional requirements

End-to-end build_cache wall-clock ≤ 18 min on developer laptop with NVIDIA GPU (C12-PT-01).

Risks & mitigations

R08 (freshness drift) — actionable failure surfacing in CacheBuildReport.

Effort

T-shirt M; 13–21 points.

Child issues

#	Title	Pts
1	CLI scaffolding + subcommand routing	3
2	`build_cache` orchestration (C11 then C10)	3
3	`trigger_post_landing_upload` with FDR-state confirmation	3
4	AC-3.4 re-localization workflow	3
5	Actionable failure surfacing in CacheBuildReport	2
6	Component-internal tests C12-IT-01..04 + C12-PT-01 + C12-ST-01 + C12-AT-01	5

Key constraints

ADR-004 (C12 lives operator-side); AC-3.4, AC-8.3, AC-8.4.

Testing strategy

Per components/13_c12_operator_orchestrator/tests.md.

E-C1 — C1 Visual / Visual-Inertial Odometry

Tracker: AZ-254 | Type: component | T-shirt: XL | Story points: 34–55

System context

flowchart LR
  NAVCAM[Nav camera 3 Hz] --> C1
  C8IMU[C8 ImuWindow 100-200 Hz] --> C1
  CAL[CameraCalibration] --> C1
  C1 --> C5[C5 StateEstimator]

Problem / Context

Per-frame relative pose SE(3) + 6×6 covariance + IMU bias estimate from nav-camera + FC IMU. Three pluggable strategies (Okvis2 production-default, VinsMono research-only, KltRansac mandatory simple-baseline) selected at startup, build-time gated, never hot-swappable. Largest single epic by complexity.

Scope

In scope: VioStrategy interface + the three concrete strategies, ImuPreintegrator helper, warm-start path (AC-5.1), reboot recovery (AC-5.3), KltRansac as the simple-baseline AC-2.1a check, honest covariance under degradation.

Out of scope: state fusion (E-C5), pose estimation (E-C4), satellite anchoring (E-C2/C3/C4 chain).

Architecture notes

File: components/01_c1_vio/description.md.
Strategy + composition root + build-time exclusion (ADR-001 / ADR-002 / ADR-009).
C++ strategies via pybind11; KltRansac thin Python wrapper around OpenCV.
ImuPreintegrator shared with E-C5 (built once, used twice).

Interface specification

class VioStrategy(Protocol):
    def process_frame(self, frame: NavCameraFrame, imu: ImuWindow, cal: CameraCalibration) -> VioOutput: ...
    def reset_to_warm_start(self, pose: WarmStartPose) -> None: ...
    def health_snapshot(self) -> VioHealth: ...

DTOs in components/01_c1_vio/description.md § 2.

Data flow

sequenceDiagram
  participant CAM as Nav camera
  participant C1 as VioStrategy
  participant C5 as C5
  participant FDR as FDR
  CAM->>C1: NavCameraFrame
  C1->>C1: IMU preintegrate + feature tracking
  C1->>C5: VioOutput (relative pose + 6x6 cov + bias)
  C1->>FDR: VioHealth (ERROR + WARN; DEBUG to stdout)

Dependencies

E-BOOT, E-CC-FDR-CLIENT, E-C7 (only for the simple-baseline KltRansac path; OKVIS2 / VinsMono are CPU-bound, not GPU).

Acceptance criteria

C1-IT-01: honest cov norm rises monotonically under feature-loss event (AC-1.3 / AC-1.4).
C1-IT-02: VioOutput schema invariants — SPD covariance + matched frame_id (AC-1.4).
C1-IT-03: KltRansac ≥ 95% tracked-frame ratio on Derkachi normal segment (AC-2.1a engine rule).
C1-IT-04: MRE p95 < 1 px frame-to-frame for Okvis2 + KltRansac (AC-2.2).
C1-IT-05: warm-start converges within 5 frames (AC-5.1).
C1-IT-06: F8 reboot recovery from warm-start hint without fake confidence (AC-5.3).
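Two of the covariance invariants above can be checked with small pure-Python helpers:

- **SPD check (C1-IT-02):** the matrix is tested for symmetry, then a Cholesky factorisation is attempted. The factorisation succeeds only for a positive-definite matrix.
- **Monotonic rise (C1-IT-01):** the norms of successive covariances must never decrease.

The source does not say which norm C1-IT-01 uses. The Frobenius norm here is an assumption.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def is_spd(m: Matrix, tol: float = 1e-9) -> bool:
    """Symmetric and Cholesky-factorisable => symmetric positive definite."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    for i in range(n):
        for j in range(i + 1, n):
            if abs(m[i][j] - m[j][i]) > tol:
                return False
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):          # Cholesky-Banachiewicz, row by row
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:    # non-positive pivot => not positive definite
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def frobenius(m: Matrix) -> float:
    return math.sqrt(sum(x * x for row in m for x in row))


def rises_monotonically(covs: Sequence[Matrix]) -> bool:
    norms = [frobenius(c) for c in covs]
    return all(b >= a for a, b in zip(norms, norms[1:]))
```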

Non-functional requirements

C1-PT-01: process_frame p95 ≤ 80 ms (Okvis2) at 3 Hz on Tier-2 with C2 backbone running concurrently; throughput ≥ 3 Hz sustained.
CPU ≤ 30% one core; memory ≤ 1.5 GB resident.

Risks & mitigations

R10 (latency under thermal throttle) — C1's budget partition is fixed; thermal-driven hybrid lives in C4.
R12 (single deployment camera) — KltRansac engine-rule path stays camera-agnostic; comparative IT-12 study uses static fixtures.

Effort

T-shirt XL; 34–55 points.

Child issues

#	Title	Pts
1	`VioStrategy` interface + composition wiring	3
2	OKVIS2 strategy (pybind11 binding + integration)	5
3	VinsMono strategy (research-only; behind BUILD_VINS_MONO)	5
4	KltRansac simple-baseline strategy	5
5	`ImuPreintegrator` helper (shared with C5)	3
6	Warm-start + F8 reboot recovery paths	3
7	Honest-covariance contract tests	3
8	Component-internal tests C1-IT-01..06 + C1-PT-01	5

Key constraints

AC-1.3, AC-1.4, AC-2.1a, AC-2.2, AC-4.1, AC-5.1, AC-5.3; RESTRICT-UAV-3 (sharp turns < 5% overlap).

Testing strategy

Per components/01_c1_vio/tests.md + suite-level FT-P-02 / FT-P-04 / FT-P-05.

E-C2 — C2 Visual Place Recognition

Tracker: AZ-255 | Type: component | T-shirt: L | Story points: 21–34

System context

flowchart LR
  CAM[Nav camera] --> C2
  C7[C7 backbone] --> C2
  C6[C6 FAISS index] --> C2
  C2 --> C25[C2.5 Re-rank]

Problem / Context

Top-K=10 candidate retrieval from the pre-cached corpus by descriptor similarity. UltraVPR primary, MegaLoc secondary, NetVLAD mandatory simple-baseline. C2 is the cheap-retrieval side of the boundary with expensive matching.

Scope

In scope: VprStrategy + multiple backbones, FAISS HNSW lookup, descriptor pre-processing (resize/crop/normalise), L2 normalisation via DescriptorNormaliser, descriptor population entry-point used by C10.

Out of scope: re-rank (E-C2.5), matching (E-C3), index build (E-C10).

Architecture notes

File: components/02_c2_vpr/description.md.
Strategy + ADR-001/002/009.

Interface specification

class VprStrategy(Protocol):
    def embed_query(self, frame: NavCameraFrame, cal: CameraCalibration) -> VprQuery: ...
    def retrieve_topk(self, query: VprQuery, k: int) -> VprResult: ...
    def descriptor_dim(self) -> int: ...

Data flow

sequenceDiagram
  participant CAM as Nav camera
  participant C2 as VprStrategy
  participant C7 as C7
  participant C6 as FAISS
  CAM->>C2: NavCameraFrame
  C2->>C7: backbone forward
  C7-->>C2: embedding
  C2->>C6: HNSW search k=10
  C6-->>C2: candidates
  C2-->>C25: VprResult
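The normalise-then-search step can be illustrated with an exact brute-force search. FAISS HNSW performs approximate search, so this sketch is a test oracle rather than the production path. In the real pipeline, corpus descriptors would be normalised once at index-build time. Here they are normalised on the fly to keep the example self-contained. The function names are illustrative, not the `DescriptorNormaliser` API.

```python
import heapq
import math
from typing import Sequence


def l2_normalise(v: Sequence[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("zero descriptor cannot be normalised")
    return [x / n for x in v]


def retrieve_topk(
    query: Sequence[float], corpus: dict[str, Sequence[float]], k: int = 10
) -> list[tuple[str, float]]:
    """Exact top-k by squared L2 distance on unit vectors, sorted ascending."""
    q = l2_normalise(query)
    dists = (
        (tile_id, sum((a - b) ** 2 for a, b in zip(q, l2_normalise(d))))
        for tile_id, d in corpus.items()
    )
    return heapq.nsmallest(k, dists, key=lambda t: t[1])
```

On unit vectors, squared L2 distance equals 2 − 2·cosine, so ranking by L2 and by cosine similarity agree.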

Dependencies

E-C6, E-C7, E-CC-FDR-CLIENT.

Acceptance criteria

C2-IT-01: UltraVPR recall@10 ≥ 0.95; NetVLAD ≥ 0.85 on Derkachi (AC-2.1b + engine rule).
C2-IT-02: VprResult invariants (length, sorted distances, label).
C2-IT-03: poisoned-tile top-1 rate within AC-NEW-7 relaxed CI.
C2-IT-04: scale-ratio ±20% recall@10 ≥ 0.85 (AC-8.6 scale half).
C2-ST-01: index handle invalidation rejected with IndexUnavailableError.

Non-functional requirements

C2-PT-01: embed_query p95 ≤ 60 ms; retrieve_topk p95 ≤ 2 ms; combined ≤ 65 ms (AC-4.1 partition).
GPU ≤ 600 MB resident; system mem ≤ 200 MB for index handle.

Risks & mitigations

R06 (VPR top-1 false positive) — C2.5 + C3 + AC-NEW-7 downstream.

Effort

T-shirt L; 21–34 points.

Child issues

#	Title	Pts
1	`VprStrategy` interface + composition	3
2	UltraVPR backbone (TRT)	5
3	MegaLoc, MixVPR, SelaVPR, EigenPlaces secondary backbones	5
4	NetVLAD mandatory simple-baseline	3
5	FAISS HNSW load + lookup wiring	3
6	`DescriptorNormaliser` helper (shared with C10)	2
7	Component-internal tests C2-IT-01..04 + C2-PT-01 + C2-ST-01	5

Key constraints

AC-2.1b, AC-2.2, AC-4.1, AC-8.6, AC-NEW-7.

Testing strategy

Per components/02_c2_vpr/tests.md.

E-C2.5 — C2.5 Inlier-based Re-rank

Tracker: AZ-256 | Type: component | T-shirt: S | Story points: 5–8

System context

flowchart LR
  C2[C2 K=10] --> C25
  C7[C7 LightGlueRuntime helper] --> C25
  C6[C6 tile pixels] --> C25
  C25 --> C3[C3 N=3]

Problem / Context

K=10 → N=3 by single-pair LightGlue inlier count. Boundary between cheap retrieval and expensive matching. Shares LightGlueRuntime helper with C3 (R14 — owned by helper, not by either component).

Scope

In scope: ReRankStrategy + InlierCountReRanker, drop-and-continue on per-candidate failure.

Out of scope: matching itself (E-C3); LightGlue runtime ownership (the helper is its own module).

Architecture notes

File: components/03_c2_5_rerank/description.md.
Helper-ownership decision documented in R14 / risk_mitigations.md.

Interface specification

class ReRankStrategy(Protocol):
    def rerank(self, frame: NavCameraFrame, vpr_result: VprResult, n: int) -> RerankResult: ...

Data flow

sequenceDiagram
  participant C2 as C2
  participant C25 as C2.5
  participant LG as LightGlueRuntime helper
  participant C6 as C6
  C2->>C25: VprResult (k=10)
  loop 10 candidates
    C25->>C6: get_tile_pixels
    C25->>LG: single-pair inlier count
    LG-->>C25: inlier count
  end
  C25-->>C3: top-N=3 by inlier count
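The K=10 → N=3 reduction with drop-and-continue can be sketched as follows. `inlier_count` stands in for the shared LightGlueRuntime single-pair pass. Breaking ties by the original C2 rank is an assumption of this sketch.

```python
from typing import Callable, Sequence


class RerankBackboneError(Exception):
    pass


def rerank(
    candidates: Sequence[str],
    inlier_count: Callable[[str], int],  # stands in for the LightGlue single-pair pass
    n: int = 3,
) -> list[tuple[str, int]]:
    scored = []
    for rank, tile_id in enumerate(candidates):
        try:
            count = inlier_count(tile_id)
        except RerankBackboneError:
            continue  # drop-and-continue: one failed candidate does not abort the frame
        scored.append((tile_id, count, rank))
    # Highest inlier count first; ties keep the C2 order (lower rank wins).
    scored.sort(key=lambda t: (-t[1], t[2]))
    return [(tile_id, count) for tile_id, count, _ in scored[:n]]
```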

Dependencies

E-C2, E-C7, E-C6, shared LightGlueRuntime helper (with C3).

Acceptance criteria

C2.5-IT-01: top-1 promotion rate ≥ 0.98 (rerank rarely overrides correct C2 top-1).
C2.5-IT-02: drop-and-continue on per-candidate RerankBackboneError.
C2.5-IT-03: shared LightGlueRuntime serial-access invariant (no deadlock; bit-identical to single-threaded).

Non-functional requirements

C2.5-PT-01: rerank p95 ≤ 80 ms for 10 single-pair LightGlue passes; engine reuse single instance across calls.
GPU mem ≤ 300 MB shared LightGlue engine.

Risks & mitigations

R14 (apparent C2.5↔C3 cycle) — resolved this iteration via helper ownership.

Effort

T-shirt S; 5–8 points.

Child issues

#	Title	Pts
1	`InlierCountReRanker` + drop-and-continue	3
2	Shared `LightGlueRuntime` helper module	3
3	Component-internal tests C2.5-IT-01..03 + C2.5-PT-01	2

Key constraints

AC-2.1b, AC-4.1, AC-NEW-7.

Testing strategy

Per components/03_c2_5_rerank/tests.md.

E-C3 — C3 Cross-Domain Matcher

Tracker: AZ-257 | Type: component | T-shirt: L | Story points: 21–34

System context

flowchart LR
  C25[C2.5 N=3] --> C3
  C7[C7] --> C3
  CAL[CameraCalibration] --> C3
  C6[C6 tiles] --> C3
  C3 --> C35[C3.5 AdHoP]

Problem / Context

2D-3D correspondences between nav-camera and the top-N=3 satellite tiles, with RANSAC inliers + reprojection residual. Dominant compute cost in F3. Backbone choice provisionally locked to DISK+LightGlue (D-C3-1 = (a)), pending the IT-12 verdict.

Scope

In scope: CrossDomainMatcher + DISK+LightGlue (primary) + ALIKED+LightGlue (secondary) + XFeat (alternate); RANSAC + reprojection residual via RansacFilter helper; InsufficientInliersError propagation.

Out of scope: refinement (E-C3.5); pose estimation (E-C4); LightGlue runtime ownership (helper).

Architecture notes

File: components/04_c3_matcher/description.md.

Interface specification

class CrossDomainMatcher(Protocol):
    def match(self, frame: NavCameraFrame, rerank: RerankResult, cal: CameraCalibration) -> MatchResult: ...
    def health_snapshot(self) -> MatcherHealth: ...

Data flow

sequenceDiagram
  participant C25 as C2.5
  participant C3 as C3
  participant C7 as C7
  C25->>C3: RerankResult (n=3)
  loop 3 candidates
    C3->>C7: backbone forward
    C3->>C3: RANSAC + residual
  end
  C3-->>C35: MatchResult (best by inlier count)

Dependencies

E-C2.5, E-C7, shared LightGlueRuntime helper, shared RansacFilter helper.

Acceptance criteria

C3-IT-01: best-candidate inlier count p5 ≥ 80 (AC-1.1 partition).
C3-IT-02: deterministic best_candidate_idx == argmax(inlier_count) with deterministic tie-break.
C3-IT-03: cross-domain MRE p95 < 2.5 px (AC-2.2).
C3-IT-04: tilt ±20° + 350 m outliers — inlier count p10 ≥ 40 (AC-3.1).
C3-IT-05: InsufficientInliersError propagation when all N=3 fail.
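The C3-IT-02 and C3-IT-05 rules can be expressed in a few lines:

- **Best candidate:** argmax over inlier counts, with ties going to the lowest index (the better C2.5 rank).
- **Failure:** raise when no candidate clears a minimum.

The `MIN_INLIERS` floor of 40 is hypothetical; the source does not state a per-candidate acceptance threshold.

```python
from typing import Sequence


class InsufficientInliersError(Exception):
    pass


MIN_INLIERS = 40  # hypothetical per-candidate acceptance floor


def best_candidate_idx(inlier_counts: Sequence[int], min_inliers: int = MIN_INLIERS) -> int:
    """argmax(inlier_count) among valid candidates; ties resolve to the lowest index."""
    valid = [(count, -idx) for idx, count in enumerate(inlier_counts) if count >= min_inliers]
    if not valid:  # all N candidates failed
        raise InsufficientInliersError(list(inlier_counts))
    _, neg_idx = max(valid)  # larger -idx means smaller idx on equal counts
    return -neg_idx
```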

Non-functional requirements

C3-PT-01: match p95 ≤ 180 ms; per-candidate ≤ 60 ms; throughput ≥ 3 Hz; GPU mem ≤ 800 MB combined.

Risks & mitigations

R06 (false positive) — RANSAC + residual + downstream AC-NEW-7.
R10 (Marginals under throttle) — D-CROSS-LATENCY-1 hybrid touches C4, not C3 (C3's budget is fixed).

Effort

T-shirt L; 21–34 points.

Child issues

#	Title	Pts
1	`CrossDomainMatcher` interface + composition	3
2	DISK+LightGlue primary	5
3	ALIKED+LightGlue secondary	3
4	XFeat alternate (lightweight)	3
5	`RansacFilter` helper (shared C3/C3.5/C4)	3
6	Component-internal tests C3-IT-01..05 + C3-PT-01	5

Key constraints

AC-1.1, AC-2.2, AC-3.1, AC-4.1.

Testing strategy

Per components/04_c3_matcher/tests.md.

E-C3.5 — C3.5 AdHoP-Conditional Refinement

Tracker: AZ-258 | Type: component | T-shirt: M | Story points: 8–13

System context

flowchart LR
  C3[C3 MatchResult] --> C35
  C7[C7 AdHoP backbone] --> C35
  C35 --> C4[C4 Pose]

Problem / Context

Conditional perspective preconditioning when residual exceeds threshold; passthrough otherwise. Preserves AC-4.1 budget on the steady-state path while keeping refinement for hard frames.

Scope

In scope: ConditionalRefiner + AdHoPRefiner + PassthroughRefiner (both linked); residual threshold configuration; passthrough fall-through on RefinerBackboneError.

Out of scope: matcher (E-C3); pose (E-C4).

Architecture notes

File: components/05_c3_5_adhop/description.md.
Both implementations linked into the deployment binary; runtime gate is a config knob.

Interface specification

class ConditionalRefiner(Protocol):
    def refine_if_needed(self, frame: NavCameraFrame, mr: MatchResult, threshold: float) -> MatchResult: ...
    def was_invoked(self) -> bool: ...

Data flow

sequenceDiagram
  participant C3 as C3
  participant C35 as C3.5
  participant C7 as C7
  C3->>C35: MatchResult (residual=R)
  alt R > threshold
    C35->>C7: AdHoP backbone forward
    C7-->>C35: refined correspondences
    C35-->>C4: enriched MatchResult
  else
    C35-->>C4: passthrough MatchResult
  end
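The alt/else branch above can be sketched as a small gate class:

- `refine` stands in for the AdHoP backbone path.
- The frame argument of the real interface is omitted.
- `MatchResult` is reduced to a residual and opaque correspondences.

Returning the same object on passthrough or on `RefinerBackboneError` guarantees bit-identical correspondences (C3.5-IT-02).

```python
from dataclasses import dataclass
from typing import Callable


class RefinerBackboneError(Exception):
    """Raised by the refinement backbone, e.g. on an engine failure."""


@dataclass(frozen=True)
class MatchResult:
    residual_px: float
    correspondences: tuple  # opaque in this sketch


class ConditionalGate:
    """Run the expensive refiner only when the match residual exceeds the threshold."""

    def __init__(self, refine: Callable[[MatchResult], MatchResult]) -> None:
        self._refine = refine  # stands in for the AdHoP backbone path
        self._invoked = False

    def refine_if_needed(self, mr: MatchResult, threshold: float) -> MatchResult:
        self._invoked = mr.residual_px > threshold
        if not self._invoked:
            return mr  # passthrough: the same object, untouched
        try:
            return self._refine(mr)
        except RefinerBackboneError:
            return mr  # fall through with bit-identical correspondences

    def was_invoked(self) -> bool:
        return self._invoked
```

Counting `was_invoked()` over a segment gives the invocation rate that C3.5-IT-03 bounds.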

Dependencies

E-C3, E-C7.

Acceptance criteria

C3.5-IT-01: residual reduced in ≥ 90% of invocations (AC-2.2 hard-frame portion).
C3.5-IT-02: passthrough fall-through on RefinerBackboneError with bit-identical correspondences.
C3.5-IT-03: invocation rate < 0.30 on Derkachi normal segment.

Non-functional requirements

C3.5-PT-01: invoked p95 ≤ 90 ms; passthrough p95 ≤ 0.5 ms; aggregated added latency ≤ 25 ms.

Risks & mitigations

R10 (latency under throttle) — threshold tunable via operator-orchestrator pre-flight.

Effort

T-shirt M; 8–13 points.

Child issues

#	Title	Pts
1	`AdHoPRefiner` (TRT engine + perspective preconditioning)	5
2	`PassthroughRefiner` no-op	1
3	Conditional gate + passthrough fall-through	2
4	Component-internal tests C3.5-IT-01..03 + C3.5-PT-01	3

Key constraints

AC-2.2, AC-4.1.

Testing strategy

Per components/05_c3_5_adhop/tests.md.

E-C4 — C4 Pose Estimator

Tracker: AZ-259 | Type: component | T-shirt: M | Story points: 13–21

System context

flowchart LR
  C35[C3.5] --> C4
  CAL[CameraCalibration] --> C4
  C7[C7 ThermalState] --> C4
  C5GRAPH[C5 iSAM2 graph] --> C4
  C4 --> C5[C5 add_pose_anchor]

Problem / Context

Convert MatchResult into PoseEstimate (WGS84 + 6×6 covariance + provenance label). OpenCV solvePnPRansac + GTSAM Marginals for native 6×6; D-CROSS-LATENCY-1 hybrid degrades to Jacobian under thermal throttle.

Scope

In scope: OpenCVGtsamPoseEstimator, GTSAM Marginals integration with C5's iSAM2 graph, Jacobian fallback, per-frame thermal-state-driven mode switch, WgsConverter helper usage.

Out of scope: state fusion (E-C5); thermal telemetry source (E-C7).

Architecture notes

File: components/06_c4_pose/description.md.
ADR-003 shared substrate: C4 adds factors to C5's graph; co-developed.
ADR-006 (Jacobian fallback ~5–10% accuracy loss accepted under throttle).

Interface specification

class PoseEstimator(Protocol):
    def estimate(self, mr: MatchResult, cal: CameraCalibration, thermal: ThermalState) -> PoseEstimate: ...
    def current_covariance_mode(self) -> CovarianceMode: ...

Data flow

sequenceDiagram
  participant C35 as C3.5
  participant C4 as C4
  participant C5 as C5 graph
  participant C7 as C7 thermal
  C35->>C4: MatchResult
  C7-->>C4: ThermalState
  alt thermal.throttle
    C4->>C4: Jacobian covariance
  else
    C4->>C5: add factor
    C5->>C5: Marginals.marginalCovariance
    C5-->>C4: Sigma
  end
  C4-->>C5: PoseEstimate (add_pose_anchor)
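The per-frame thermal switch can be sketched as a selector that re-decides the covariance path on every call. This is why a throttle change takes effect within one frame (C4-IT-03). The two callables stand in for the GTSAM Marginals query and the Jacobian fallback. The boolean `throttle` field is an assumed shape of `ThermalState`. No hysteresis is modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class CovarianceMode(Enum):
    MARGINALS = "marginals"
    JACOBIAN = "jacobian"


@dataclass
class ThermalState:
    throttle: bool  # assumed shape of the C7 thermal telemetry


class CovarianceSelector:
    """Pick the covariance path per frame from the current thermal state."""

    def __init__(self, marginals: Callable[[], list], jacobian: Callable[[], list]) -> None:
        self._marginals = marginals  # stands in for Marginals.marginalCovariance via C5
        self._jacobian = jacobian    # stands in for the Jacobian-degraded fallback
        self.mode = CovarianceMode.MARGINALS

    def covariance(self, thermal: ThermalState) -> list:
        # Decided on every call, so a throttle change applies to the very next frame.
        self.mode = CovarianceMode.JACOBIAN if thermal.throttle else CovarianceMode.MARGINALS
        path = self._jacobian if self.mode is CovarianceMode.JACOBIAN else self._marginals
        return path()
```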

Dependencies

E-C3.5, E-C5 (co-developed shared substrate), shared RansacFilter, shared WgsConverter, shared SE3Utils.
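The core of a WgsConverter-style helper is a WGS84 geodetic ↔ ECEF conversion; the helper's actual API is not specified here. The sketch below uses the standard WGS84 constants and a fixed-iteration latitude solve. That solve is adequate away from the poles, where `cos(lat)` approaches zero. The round trip can be checked against the ≤ 1 cm residual that C8-IT-06 cites.

```python
import math

A = 6378137.0              # WGS84 semi-major axis [m]
F = 1 / 298.257223563      # WGS84 flattening
E2 = F * (2 - F)           # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float, h: float) -> tuple[float, float, float]:
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)  # prime-vertical radius of curvature
    return (
        (n + h) * math.cos(lat) * math.cos(lon),
        (n + h) * math.cos(lat) * math.sin(lon),
        (n * (1 - E2) + h) * math.sin(lat),
    )


def ecef_to_geodetic(x: float, y: float, z: float, iters: int = 10) -> tuple[float, float, float]:
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - E2))  # initial guess (h = 0)
    for _ in range(iters):
        n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - E2 * n / (n + h)))
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), h
```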

Acceptance criteria

C4-IT-01: WGS84 accuracy p80 ≤ 50 m, p50 ≤ 20 m on Derkachi (AC-1.1 / AC-1.2).
C4-IT-02: 6×6 SPD covariance + honest under inlier degradation (AC-1.4).
C4-IT-03: D-CROSS-LATENCY-1 mode switch within 1 frame (AC-NEW-5 workstation portion).
C4-IT-04: shared-graph integration with C5 — prior keyframe perturbations within tolerance.

Non-functional requirements

C4-PT-01: estimate p95 MARGINALS ≤ 90 ms; JACOBIAN ≤ 15 ms; switch ≤ 1 frame.

Risks & mitigations

R10 (Marginals throttle) — primary owner of the hybrid switch.

Effort

T-shirt M; 13–21 points.

Child issues

#	Title	Pts
1	`solvePnPRansac` + IPPE wiring	3
2	GTSAM `Marginals` factor add to C5 graph	5
3	Jacobian-degraded fallback	3
4	Per-frame thermal-state-driven switch	2
5	`WgsConverter` helper (shared with C8)	3
6	Component-internal tests C4-IT-01..04 + C4-PT-01	3

Key constraints

AC-1.1, AC-1.2, AC-1.4, AC-4.1, AC-NEW-5.

Testing strategy

Per components/06_c4_pose/tests.md.

E-C5 — C5 State Estimator

Tracker: AZ-260 | Type: component | T-shirt: XL | Story points: 34–55

System context

flowchart LR
  C1[C1 VioOutput] --> C5
  C4[C4 PoseEstimate] --> C5
  C8I[C8 IMU/attitude/gps_health] --> C5
  C5 --> C8O[C8 outbound 5 Hz]
  C5 --> ORTHO[Orthorectifier → C6 mid-flight tile]
  C5 --> FDR[FDR smoothed history]

Problem / Context

Own GTSAM iSAM2 + IncrementalFixedLagSmoother (K=10–20). Fuse VIO + Pose + FC IMU into the posterior state; emit smoothed current frame to C8 + smoothed past keyframes to FDR (AC-4.5 revised, NOT FC retroactive). Spoof-promotion gate (AC-NEW-2 / AC-NEW-8). Largest epic alongside C1.

Scope

In scope: StateEstimator + GtsamIsam2StateEstimator (production-default) + EskfStateEstimator (mandatory simple-baseline); spoof-promotion gate; source-label state machine; smoothed history → FDR; AC-5.2 fallback path.

Out of scope: VIO (E-C1); pose (E-C4); FC adapter (E-C8). The orthorectifier is not split into its own epic: per the spec it stays inside C5 as an internal subcomponent, although a later split remains possible.

Architecture notes

File: components/07_c5_state/description.md.
ADR-003 (shared GTSAM substrate with C4); co-developed.
ADR-008 + spoof gate logic.

Interface specification

class StateEstimator(Protocol):
    def add_vio(self, o: VioOutput) -> None: ...
    def add_pose_anchor(self, p: PoseEstimate) -> None: ...
    def add_fc_imu(self, w: ImuWindow) -> None: ...
    def current_estimate(self) -> EstimatorOutput: ...
    def smoothed_history(self, n: int) -> list[EstimatorOutput]: ...
    def health_snapshot(self) -> EstimatorHealth: ...

Data flow

sequenceDiagram
  participant C1 as C1
  participant C4 as C4
  participant C8I as C8 inbound
  participant C5 as C5 iSAM2
  participant C8O as C8 outbound
  participant FDR as FDR
  C1->>C5: add_vio
  C4->>C5: add_pose_anchor (factor add)
  C8I->>C5: add_fc_imu
  C5->>C5: iSAM2 update + Marginals
  C5->>C8O: current_estimate (5 Hz)
  C5->>FDR: smoothed_history (per AC-4.5)

Dependencies

E-C1, E-C4, E-CC-FDR-CLIENT, E-C8 inbound side, shared ImuPreintegrator, SE3Utils, WgsConverter.

Acceptance criteria

C5-IT-01: last_satellite_anchor_age_ms reset/monotonic-rise (AC-1.3 binning).
C5-IT-02: smoothed-current honest covariance (AC-1.4).
C5-IT-03: VIO-only fallback under matcher failure (AC-3.5).
C5-IT-04: smoothed past-keyframes → FDR but NOT to FC stream (AC-4.5 revised).
C5-IT-05: 3 s no-estimate triggers AC-5.2 fallback.
C5-IT-06: spoof-promotion gate ≥ 10 s + visual consistency (AC-NEW-2).
C5-IT-07: visual blackout + spoof escalation (AC-NEW-8).
C5-ST-01: spoof-rejection logging cannot be silenced.
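The AC-NEW-2 promotion rule (at least 10 s of healthy GPS that also agrees with the visual estimate) can be sketched as a small timer that any violation resets. The 50 m disagreement tolerance is hypothetical. So is reducing "visual consistency" to a single GPS-vs-visual distance.

```python
from dataclasses import dataclass, field
from typing import Optional

HOLD_S = 10.0          # AC-NEW-2: at least 10 s of consistent GPS
MAX_DISAGREE_M = 50.0  # hypothetical visual-consistency tolerance


@dataclass
class SpoofPromotionGate:
    """Promote GPS back to trusted only after a continuous consistent window."""

    hold_s: float = HOLD_S
    max_disagree_m: float = MAX_DISAGREE_M
    _since: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, t_s: float, gps_healthy: bool, gps_vs_visual_m: float) -> bool:
        consistent = gps_healthy and gps_vs_visual_m <= self.max_disagree_m
        if not consistent:
            self._since = None  # any violation restarts the clock
            return False
        if self._since is None:
            self._since = t_s
        return t_s - self._since >= self.hold_s
```

Resetting the clock on any violation means a spoofer cannot accumulate the 10 s window in fragments.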

Non-functional requirements

C5-PT-01: add_pose_anchor + current_estimate p95 ≤ 60 ms; memory ≤ 100 MB resident.

Risks & mitigations

R05 (iSAM2 silent factor-add failure) — every factor add logs its success or failure.
R07 (spoof premature promotion) — primary owner of the gate.

Effort

T-shirt XL; 34–55 points.

Child issues

#	Title	Pts
1	`StateEstimator` interface + composition	3
2	iSAM2 + IncrementalFixedLagSmoother K=10-20 wiring	5
3	`BetweenFactorPose3` (VIO) + `GenericProjectionFactorCal3DS2` (pose)	5
4	`Marginals.marginalCovariance` integration	3
5	Source-label state machine + spoof-promotion gate	5
6	`EskfStateEstimator` mandatory simple-baseline	5
7	Smoothed-history → FDR path (NOT to FC)	3
8	AC-5.2 fallback path	3
9	Orthorectifier → C6 mid-flight tile gen sub-path	3
10	Component-internal tests C5-IT-01..07 + C5-PT-01 + C5-ST-01	5

Key constraints

AC-1.3, AC-1.4, AC-3.5, AC-4.5 (revised), AC-5.2, AC-NEW-2, AC-NEW-8.

Testing strategy

Per components/07_c5_state/tests.md.

E-C8 — C8 FC + GCS Adapter

Tracker: AZ-261 | Type: component | T-shirt: L | Story points: 21–34

System context

flowchart LR
  FCIN[FC inbound MAVLink/MSP2] --> C8I[C8 inbound]
  C8I --> C5[C5]
  C8I --> C1[C1]
  C5 --> C8O[C8 outbound]
  C8O -->|GPS_INPUT / MSP2_SENSOR_GPS| FCOUT[FC]
  C8O -->|telemetry 1-2 Hz| GCS[QGroundControl]

Problem / Context

Per-FC inbound + outbound. Inbound: subscribe to FC IMU/attitude/GPS-health/MAV_STATE; publish ImuWindow/AttitudeWindow/GpsHealth/FlightStateSignal. Outbound: encode EstimatorOutput for AP (GPS_INPUT) and iNav (MSP2_SENSOR_GPS) at 5 Hz with honest 6×6 → 2×2 covariance projection. Owns MAVLink 2.0 signing on AP wired channel (D-C8-9 = (d), R03 risk) + per-flight key rotation. Also feeds GCS at 1–2 Hz.

Scope

In scope: FcAdapter + PymavlinkArdupilotAdapter + Msp2InavAdapter; GcsAdapter + QgcTelemetryAdapter; signing handshake + per-flight ephemeral key + zeroisation; D-C8-2 source-set switch (gated by IT-3); honest covariance projection.

Out of scope: state estimation (E-C5); GCS workflow logic (operator side, E-C12).

Architecture notes

File: components/10_c8_fc_adapter/description.md.
Both AP + iNav adapters typically linked into the deployment binary (per ADR-002 — config picks one at runtime).
ADR-008 source-set switch gated by IT-3.

Interface specification

class FcAdapter(Protocol):
    def open(self, port: PortConfig, signing_key: bytes | None) -> None: ...
    def subscribe_telemetry(self, cb: Callable[[FcTelemetryFrame], None]) -> Subscription: ...
    def emit_external_position(self, o: EstimatorOutput) -> None: ...
    def emit_status_text(self, msg: str, severity: Severity) -> None: ...
    def request_source_set_switch(self) -> None: ...    # AP only
    def current_flight_state(self) -> FlightStateSignal: ...

Data flow

sequenceDiagram
  participant FC as FC
  participant C8I as C8 inbound
  participant C5 as C5
  participant C8O as C8 outbound
  FC->>C8I: IMU + attitude + gps_health
  C8I->>C5: ImuWindow / AttitudeWindow / GpsHealth
  C5->>C8O: EstimatorOutput
  C8O->>FC: GPS_INPUT / MSP2_SENSOR_GPS @ 5 Hz
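The 5 Hz jitter bound (C8-IT-02) needs a concrete metric. One plausible reading, assumed here rather than taken from the source, is the largest relative deviation of any inter-emission gap from the nominal 200 ms period.

```python
from typing import Sequence


def max_period_jitter(timestamps_s: Sequence[float], rate_hz: float = 5.0) -> float:
    """Largest relative deviation of any inter-emission gap from the nominal period."""
    period = 1.0 / rate_hz
    gaps = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    if not gaps:
        raise ValueError("need at least two emissions")
    return max(abs(g - period) / period for g in gaps)
```

Under this reading, a run passes when `max_period_jitter(ts) <= 0.05`, with a small floating-point tolerance.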

Dependencies

E-C5, E-CC-CONF, E-CC-LOG.
External: pymavlink, MSP2 client, ArduPilot SITL, QGroundControl SITL.

Acceptance criteria

C8-IT-01: 6×6 → 2×2 honest covariance projection within 1% norm.
C8-IT-02: 5 Hz emission jitter ≤ ±5%.
C8-IT-03: warm-start GPS from FC EKF ≤ 1 s after C8 ready (AC-5.1).
C8-IT-04: GCS stream 1–2 Hz (AC-6.1).
C8-IT-05: GCS commands accepted (AC-6.2).
C8-IT-06: WGS84 round-trip ≤ 1 cm position residual (AC-6.3).
C8-IT-07: source-set switch ≤ 3 s of gate-clear (AC-NEW-2).
C8-IT-08: iNav adapter never attempts signing; AP always (RESTRICT-COMM-2).
C8-ST-01: MAVLink 2.0 signing handshake passes IT-3 SITL gate (R03).
C8-ST-02: per-flight key never persists across flights.
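C8-IT-02 can be checked from recorded emission timestamps. The sketch below assumes jitter is measured per inter-emission interval against the nominal 200 ms period (5 Hz). The C8 test spec may instead measure drift against an absolute schedule. Both function names are hypothetical.

```python
from collections.abc import Sequence


def max_relative_jitter(timestamps_s: Sequence[float], nominal_hz: float = 5.0) -> float:
    """Worst-case |dt - T| / T over consecutive emissions, where T = 1 / nominal_hz."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two emission timestamps")
    period = 1.0 / nominal_hz
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    return max(abs(dt - period) / period for dt in intervals)


def meets_c8_it_02(timestamps_s: Sequence[float], nominal_hz: float = 5.0,
                   tolerance: float = 0.05) -> bool:
    """True when every interval is within +/- tolerance of the nominal period."""
    return max_relative_jitter(timestamps_s, nominal_hz) <= tolerance
```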

Non-functional requirements

C8-PT-01: emit_external_position p95 ≤ 5 ms; inbound IMU callback p95 ≤ 1 ms.

Risks & mitigations

R03 (signing handshake no precedent) — gated by IT-3; D-C8-2-FALLBACK options recorded.
R09 (key compromise) — per-flight ephemeral keys + zeroisation.
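The MAVLink 2 signing spec defines the 48-bit signature as sha256_48(secret_key + header + payload + CRC + link-ID + timestamp) over a 32-byte secret key. The sketch below shows only the per-flight key lifecycle around that formula: a fresh random key per flight, held in a mutable buffer and overwritten when the flight ends. The class is hypothetical. The caller is assumed to pass the already-concatenated bytes that follow the key. pymavlink ships its own signing support, which a real adapter would more likely configure than re-implement. Python also cannot guarantee that no other copy of the key survives in memory, so production zeroisation likely belongs in native code.

```python
import hashlib
import secrets

KEY_LEN = 32  # MAVLink 2 signing secret keys are 32 bytes


class PerFlightSigningKey:
    """Hypothetical holder for one flight's ephemeral MAVLink 2 signing key."""

    def __init__(self) -> None:
        # bytearray (not bytes) so the buffer can be overwritten in place.
        self._key = bytearray(secrets.token_bytes(KEY_LEN))
        self._zeroised = False

    @property
    def zeroised(self) -> bool:
        return self._zeroised

    def sha256_48(self, signed_bytes: bytes) -> bytes:
        """First 6 bytes of SHA-256(secret_key + signed_bytes)."""
        if self._zeroised:
            raise RuntimeError("key already zeroised")
        h = hashlib.sha256()
        h.update(self._key)
        h.update(signed_bytes)
        return h.digest()[:6]

    def zeroise(self) -> None:
        """Overwrite the key buffer with zeros; the key is unusable afterwards."""
        for i in range(len(self._key)):
            self._key[i] = 0
        self._zeroised = True

    def __enter__(self) -> "PerFlightSigningKey":
        return self

    def __exit__(self, *exc_info: object) -> None:
        # End of flight (normal or exceptional): always zeroise.
        self.zeroise()
```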

Effort

T-shirt L; 21–34 points.

Child issues

#	Title	Pts
1	`FcAdapter` interface + composition	3
2	`PymavlinkArdupilotAdapter` outbound `GPS_INPUT`	5
3	`Msp2InavAdapter` outbound `MSP2_SENSOR_GPS`	3
4	Inbound IMU/attitude/gps_health/MAV_STATE subscription	3
5	Honest 6×6 → 2×2 covariance projection	3
6	MAVLink 2.0 per-flight signing handshake (AP)	5
7	Source-set switch (AP D-C8-2 gated by IT-3)	3
8	`GcsAdapter` + downsampled telemetry	3
9	Component-internal tests C8-IT-01..08 + C8-PT-01 + C8-ST-01..02	5

Key constraints

AC-4.3, AC-4.4, AC-5.1, AC-5.2, AC-6.1, AC-6.2, AC-6.3, AC-NEW-2; RESTRICT-FC-1 / FC-2 / FC-3, RESTRICT-COMM-1 / COMM-2.

Testing strategy

Per components/10_c8_fc_adapter/tests.md.

E-BBT — Blackbox Tests (FT/NFT scenarios)

Tracker: AZ-262 | Type: tests | T-shirt: M | Story points: 13–21

System context

flowchart LR
  TESTROOT[tests/ runner] --> FTP[FT-P functional positive]
  TESTROOT --> FTN[FT-N functional negative]
  TESTROOT --> NFTPERF[NFT-PERF Tier-2]
  TESTROOT --> NFTLIM[NFT-LIM resource]
  TESTROOT --> NFTSEC[NFT-SEC security]
  TESTROOT --> NFTRES[NFT-RES resilience]
  TESTROOT --> IT[IT integration]

Problem / Context

Per-component epics ship their own component-internal unit/contract tests; this epic parents the suite-level scenarios already specified in _docs/02_document/tests/*.md. They exercise end-to-end ACs and restrictions and bind multiple components together.

Scope

In scope: implementing the FT-P, FT-N, NFT-PERF, NFT-LIM, NFT-SEC, NFT-RES, IT scenario IDs cited in traceability-matrix.md. Test data setup, fixtures (Derkachi flight + AerialVL S03 + e2e-test mock-suite-sat-service), Tier-2 runner orchestration.

Out of scope: per-component unit/contract tests (live in each component epic).

Architecture notes

Files: _docs/02_document/tests/blackbox-tests.md, performance-tests.md, security-tests.md, resource-limit-tests.md, resilience-tests.md, environment.md, test-data.md, traceability-matrix.md.
Tier-1 vs Tier-2 split per ADR-005.

Interface specification

Tests are pytest scenarios; no runtime interface beyond the test runner CLI.

Data flow

sequenceDiagram
  participant CI as CI runner
  participant FX as Fixtures
  participant SUT as System under test (compose / Tier-2 binary)
  CI->>FX: stage Derkachi corpus + SITL containers
  CI->>SUT: bring up
  CI->>SUT: drive scenario inputs
  SUT-->>CI: emitted MAVLink + FDR records
  CI->>CI: assert per scenario pass criteria

Dependencies

All component epics (each must ship ready-to-test).

Acceptance criteria

Every scenario ID cited in traceability-matrix.md exists, runs, and passes on its target tier.
Coverage ≥ 75% gate held (currently 92.4% inclusive / 89.8% strict — confirmed pre-Step-6).
PARTIAL / NOT COVERED rows have linked leftover entries explaining the deferral.
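The first criterion lends itself to a mechanical check: extract every suite-level scenario ID cited in traceability-matrix.md and diff it against the IDs the test suite implements. The ID shapes below are assumptions inferred from the IDs cited in this epic (FT-P-*, FT-N-*, NFT-PERF/LIM/SEC/RES-*, IT-*). The function names are hypothetical.

```python
import re
from collections.abc import Iterable

# The lookbehind stops component-internal IDs such as C8-IT-01 from matching as IT-01.
SCENARIO_ID = re.compile(r"(?<![\w-])(?:FT-[PN]|NFT-(?:PERF|LIM|SEC|RES)|IT)-\d+\b")


def cited_scenarios(matrix_text: str) -> set[str]:
    """All suite-level scenario IDs mentioned in the traceability matrix text."""
    return set(SCENARIO_ID.findall(matrix_text))


def missing_scenarios(matrix_text: str, implemented: Iterable[str]) -> list[str]:
    """Cited IDs with no implementation, sorted for stable CI output."""
    return sorted(cited_scenarios(matrix_text) - set(implemented))
```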

Non-functional requirements

Tier-1 full suite wall-clock ≤ 30 min on a developer laptop.
Tier-2 NFT suite wall-clock ≤ 90 min on the bench Jetson.

Risks & mitigations

R11 (statistical headroom) — NFT-RES-03 / NFT-SEC-01 use Monte-Carlo-with-CI per the AC-text relaxation.
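The document does not say which interval the Monte-Carlo-with-CI relaxation uses. The Wilson score interval is one common choice for pass-rate estimates from N trials, and it behaves sensibly at 0 % and 100 %. The sketch assumes a scenario passes only when the lower confidence bound clears the required rate. That pass rule and both function names are assumptions.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95 %)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2n = z * z / trials
    centre = (p + z2n / 2) / (1 + z2n)
    half = z * math.sqrt(p * (1 - p) / trials + z2n / (4 * trials)) / (1 + z2n)
    return max(0.0, centre - half), min(1.0, centre + half)


def passes_with_confidence(successes: int, trials: int, required_rate: float,
                           z: float = 1.96) -> bool:
    """Pass only if the lower bound of the interval meets the required rate."""
    lower, _ = wilson_interval(successes, trials, z)
    return lower >= required_rate
```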

Effort

T-shirt M; 13–21 points (test implementation; scenario specs already exist).

Child issues

#	Title	Pts
1	Test environment scaffolding (`tests/conftest.py`, fixtures dir, Postgres + SITL bring-up)	3
2	FT-P-* implementation (positive functional scenarios)	5
3	FT-N-* implementation (negative functional scenarios)	3
4	NFT-PERF / NFT-LIM Tier-2 runner integration	5
5	NFT-SEC implementation (incl. NFT-SEC-02 network egress + NFT-SEC-01 cache poisoning)	5
6	NFT-RES resilience scenarios	3
7	IT-3 ArduPilot SITL signing handshake (R03 gate)	5
8	IT-12 comparative-study runner	3

Key constraints

AC-NEW-3, AC-NEW-5, AC-NEW-7, RESTRICT-HW-1.

Testing strategy

This epic IS the testing strategy for system-level scenarios. Per-component testing belongs to component epics.

E-DEMO-REPLAY — Offline replay mode (video + tlog → per-tick coordinate stream)

Tracker: AZ-265 | Type: feature (deployment-adjacent) | T-shirt: M | Story points: 27–32 | Added: Decompose Step 2 (cycle 1, 2026-05-10) | Source notes: _docs/how_to_test.md (user-written demo requirements; auto-sync incorporated as child task #8)

System context

Demonstrate the GPS-denied positioning pipeline against historical flight data: a video file from the nav camera + a .tlog file from the FC. The replay mode runs the same C1–C5 inference pipeline the airborne binary runs; only the input transport (live camera → video file; live MAVLink → tlog) and output sink (FC MAVLink emit → JSONL) differ. NO ROS dependency is added — replay reuses the existing C8 FcAdapter interface via the strategy pattern.

flowchart LR
  subgraph LIVE[Airborne mode — unchanged]
    CAM[Live camera] --> C1L[C1 VIO]
    FCL[Live FC MAVLink] --> C8L[C8 inbound]
    C8L --> C1L
    C1L --> C2L[C2..C5]
    C2L --> C8OL[C8 outbound] --> FCL
  end
  subgraph REPLAY[Replay mode — this epic]
    VID[Video file .mp4/.h264] --> VFFS[VideoFileFrameSource] --> C1R[C1 VIO]
    TLOG[tlog file] --> TLR[TlogReplayFcAdapter] --> C1R
    C1R --> C2R[C2..C5]
    C2R --> RSINK[JsonlReplaySink] --> JSONL[results.jsonl - one EstimatorOutput per tick]
  end

Problem / Context

The parent-suite UI (in the ui/ workspace, out of scope for this repo) needs to demo GPS-denied positioning end-to-end. Per-component fixtures or simulators would not give the demo end-to-end fidelity. Instead, replay mode runs the production pipeline against historical inputs, so demo confidence equals field-test confidence on the same footage.

ROS as the input transport was considered and rejected: the system is MAVLink-native; introducing ROS would (a) add a major new dependency, (b) split production vs. demo code paths, and (c) duplicate code. Reusing the existing C8 FcAdapter interface with a tlog-replay strategy is strictly better.

Scope

In scope:

FrameSource interface (now formalised as cross-cutting; previously an implicit "camera ingest thread") + VideoFileFrameSource strategy + LiveCameraFrameSource retrofit (a no-op restructure of the existing camera plumbing).
TlogReplayFcAdapter strategy (new C8 FcAdapter impl) parsing pymavlink .tlog files and emitting ImuWindow / AttitudeWindow / GpsHealth / FlightStateSignal at tlog timestamp cadence.
ReplaySink interface + JsonlReplaySink impl (one EstimatorOutput per line).
compose_replay(config) -> ReplayRoot composition root extending E-CC-CONF (AZ-246).
Clock injection (per R-DEMO-4) so timer-driven logic in C1–C5 works in both wall-clock (live) and tlog-simulated (replay) modes.
gps-denied-replay CLI: --video PATH --tlog PATH --output results.jsonl --camera-calibration calib.json --config config.yaml --pace {realtime,asap} [--time-offset-ms N].
Fourth Docker image gps-denied-replay-cli (Python + C1–C5 + cpp/* + replay strategies; NO C6/C10/C11/C12; NO HTTP server).
E2E replay test on a 1–2 min Derkachi clip + matching tlog, asserting the estimated track is within 100 m of ground-truth GPS for ≥ 80 % of ticks.
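The CLI surface listed above maps directly onto the standard-library argparse module. The sketch below makes these assumptions:

* Every flag except --pace and --time-offset-ms is required.
* --pace defaults to asap, as the data-flow section states.
* An omitted --time-offset-ms means "auto-detect".

build_parser is a hypothetical name.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        prog="gps-denied-replay",
        description="Replay nav-camera video + FC tlog through C1-C5 and write JSONL.",
    )
    p.add_argument("--video", type=Path, required=True, help="nav-camera video (.mp4/.h264)")
    p.add_argument("--tlog", type=Path, required=True, help="FC telemetry log (.tlog)")
    p.add_argument("--output", type=Path, required=True,
                   help="JSONL output, one EstimatorOutput per line")
    p.add_argument("--camera-calibration", type=Path, required=True, help="calibration JSON")
    p.add_argument("--config", type=Path, required=True, help="config YAML")
    p.add_argument("--pace", choices=("realtime", "asap"), default="asap")
    # None means "auto-detect the video/tlog offset"; a value always overrides it.
    p.add_argument("--time-offset-ms", type=int, default=None)
    return p
```

Note that argparse itself exits with status 2 on usage errors. That is the same code AC-8 assigns to sync failure, so a caller that must distinguish the two may need a different code for one of them.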

Out of scope:

ROS / ROS2 dependency.
HTTP wrapper microservice (parent-suite UI backend shells out to the CLI; defer until subprocess-shape is proven insufficient).
Modifying any C1–C5 component to be replay-aware — they MUST remain mode-agnostic.
C6 mid-flight write path (replay reads a pre-built tile cache; doesn't write).

Architecture notes

ADR-001 / ADR-002 / ADR-009 all apply unchanged.
New BUILD_* flags: BUILD_VIDEO_FILE_FRAME_SOURCE, BUILD_TLOG_REPLAY_ADAPTER, BUILD_REPLAY_SINK_JSONL. Default ON for the new replay-cli binary; OFF for airborne, research, and operator-orchestrator.
New cross-cutting FrameSource interface lives at src/gps_denied_onboard/frame_source/ (Layer 1 Foundation per module-layout.md § layering).
compose_replay lives in runtime_root.py alongside compose_root and compose_operator.

Interface specification

from pathlib import Path
from typing import Literal, Protocol

class FrameSource(Protocol):
    def next_frame(self) -> NavCameraFrame | None: ...
    def close(self) -> None: ...

class VideoFileFrameSource(FrameSource):
    def __init__(self, video_path: Path, frame_rate_hz: float, camera_id: str): ...

class TlogReplayFcAdapter(FcAdapter):  # FcAdapter from AZ-261 / E-C8
    def __init__(self, tlog_path: Path, target_fc_dialect: Literal["ARDUPILOT", "INAV"]): ...

class ReplaySink(Protocol):
    def emit(self, output: EstimatorOutput) -> None: ...
    def close(self) -> None: ...

class JsonlReplaySink(ReplaySink):
    def __init__(self, output_path: Path): ...

def compose_replay(config: Config) -> ReplayRoot: ...
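A minimal JsonlReplaySink only needs to serialise each output onto its own line. The EstimatorOutput dataclass below is a hypothetical stand-in with a handful of fields; the real C5 DTO has more fields. The sink satisfies the ReplaySink protocol structurally. sort_keys is used so that identical inputs yield byte-identical files, which helps AC-5.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class EstimatorOutput:
    # HYPOTHETICAL stand-in for the C5 DTO; field names are illustrative only.
    timestamp_us: int
    lat_deg: float
    lon_deg: float
    alt_m: float


class JsonlReplaySink:
    """Writes one JSON object per emitted EstimatorOutput (JSON Lines)."""

    def __init__(self, output_path: Path) -> None:
        self._fh = output_path.open("w", encoding="utf-8")

    def emit(self, output: EstimatorOutput) -> None:
        # sort_keys pins the key order so identical runs produce identical bytes.
        self._fh.write(json.dumps(asdict(output), sort_keys=True) + "\n")

    def close(self) -> None:
        if not self._fh.closed:
            self._fh.close()
```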

Data flow

Startup → load config and calibration → align tlog and video by timestamp → for each frame: camera ingest → C1 → C2 → C2.5 → C3 → C3.5 → C4 → C5 → emit EstimatorOutput to JsonlReplaySink. End of input → close sink → exit.

--pace realtime paces frames at wall-clock speed; --pace asap (the default) runs uncapped. The injected Clock is wall-clock-derived in realtime mode and tlog-timestamp-derived in asap mode, so component fallback timers (e.g., the AC-5.2 3 s no-estimate fallback) trigger consistently in both.
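The Clock injection described above keeps C1–C5 mode-agnostic: components read time only from the injected clock, and the composition root decides whether that clock follows the wall or the tlog. The sketch uses hypothetical names throughout (Clock.now_s, TlogClock, NoEstimateFallbackTimer). The 3 s default is taken from the AC-5.2 example in this paragraph.

```python
import time
from typing import Protocol


class Clock(Protocol):
    def now_s(self) -> float: ...


class WallClock:
    """Live and --pace realtime: monotonic wall-clock seconds."""

    def now_s(self) -> float:
        return time.monotonic()


class TlogClock:
    """--pace asap: time moves only when the replay loop feeds a tlog timestamp."""

    def __init__(self, start_us: int) -> None:
        self._now_s = start_us / 1e6

    def advance_to_us(self, tlog_timestamp_us: int) -> None:
        t = tlog_timestamp_us / 1e6
        if t < self._now_s:
            raise ValueError("tlog time went backwards")
        self._now_s = t

    def now_s(self) -> float:
        return self._now_s


class NoEstimateFallbackTimer:
    """Component-side timer that sees only the injected Clock, never the mode."""

    def __init__(self, clock: Clock, timeout_s: float = 3.0) -> None:
        self._clock = clock
        self._timeout_s = timeout_s
        self._last_estimate_s = clock.now_s()

    def on_estimate(self) -> None:
        self._last_estimate_s = self._clock.now_s()

    def expired(self) -> bool:
        return self._clock.now_s() - self._last_estimate_s >= self._timeout_s
```

Because the timer never calls time.monotonic() directly, an asap replay that processes 3 s of tlog in a few milliseconds still trips the fallback at the same tlog instant a live flight would.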

Dependencies

E-C1, E-C2, E-C2.5, E-C3, E-C3.5, E-C4, E-C5, E-C8 (every per-frame component).
E-CC-CONF (AZ-246) for compose_root extension.
E-CC-HELPERS (AZ-264) for WgsConverter (tlog GPS → local-tangent-plane).
Does NOT depend on E-C6 / E-C10 / E-C11 / E-C12 (replay reads pre-built cache; no operator-side workflows).
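The tlog GPS → local-tangent-plane step that E-CC-HELPERS' WgsConverter provides is commonly done via ECEF. The sketch below follows that route: geodetic → ECEF on the WGS84 ellipsoid, then a rotation of the difference vector into east/north/up at a reference point. It does not reproduce WgsConverter's actual API; both function names are hypothetical. NED, if that is the frame C5 uses, is (north, east, -up).

```python
import math

# WGS84 ellipsoid: semi-major axis and flattening.
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float, h_m: float) -> tuple[float, float, float]:
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    sin_lat = math.sin(lat)
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * sin_lat * sin_lat)  # prime-vertical radius
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h_m) * sin_lat
    return x, y, z


def geodetic_to_enu(lat_deg: float, lon_deg: float, h_m: float,
                    ref_lat_deg: float, ref_lon_deg: float, ref_h_m: float) -> tuple[float, float, float]:
    """East/north/up metres of a point relative to a reference point."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h_m)
    x0, y0, z0 = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_h_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(ref_lat_deg), math.radians(ref_lon_deg)
    sl, cl = math.sin(lat0), math.cos(lat0)
    so, co = math.sin(lon0), math.cos(lon0)
    east = -so * dx + co * dy
    north = -sl * co * dx - sl * so * dy + cl * dz
    up = cl * co * dx + cl * so * dy + sl * dz
    return east, north, up
```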

Acceptance criteria

AC-1: CLI exits 0 on a valid 1-min fixture and produces JSONL with one EstimatorOutput line per tlog tick (within ±5 % of GLOBAL_POSITION_INT count).
AC-2: Each line is a valid JSON object matching the EstimatorOutput schema.
AC-3: For a fixture with known ground-truth GPS, the L2 horizontal distance between the estimate and ground truth is ≤ 100 m for ≥ 80 % of ticks (matches the AC-1.3 cumulative-drift bound).
AC-4: Replay binary contains C1–C5 + replay strategies; SBOM diff CI step verifies absence of C6/C10/C11/C12.
AC-5: Same input produces the same output (deterministic), with ≤ 1e-6 float drift in position fields.
AC-6: --pace realtime runs the 1-min fixture in 60 ± 5 s; --pace asap in ≤ 30 s on Tier-1 hardware.
AC-7: Without --time-offset-ms, the CLI auto-detects the video ↔ tlog offset by correlating video motion-onset (or first-frame timestamp) with the tlog IMU take-off pattern (sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s, matching the typical quadcopter take-off signature). On a fixture with known correct offset, the auto-detected offset is within ± 200 ms of ground truth. If auto-detect confidence is < 80 % the CLI logs a WARN and proceeds with the best-guess offset; --time-offset-ms N always overrides the auto-detect.
AC-8: If neither auto-detect nor manual offset can produce > 95 % of frames with at least one matching IMU window within ± 100 ms, the CLI exits with code 2 and prints both the auto-detected offset (if any) and the percentage of frames-with-IMU-window so the operator can debug.
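The AC-7 take-off signature can be detected with a single pass over IMU samples. The pass looks for the earliest run in which both thresholds hold continuously for at least 0.5 s. The AC text leaves the exact quantities open, so the sample fields below are assumptions: vertical acceleration in g with gravity already removed, and the magnitude of the attitude-rate change in rad/s. The sign convention for the offset (tlog time = video time + offset) is also an assumption. All names are hypothetical.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImuSample:
    t_s: float                   # tlog time, seconds
    vert_accel_g: float          # ASSUMPTION: upward accel, gravity removed, in g
    attitude_rate_change: float  # ASSUMPTION: |change in attitude rate|, rad/s


def detect_takeoff(samples: Iterable[ImuSample], accel_g: float = 0.5,
                   rate_rad_s: float = 1.0, min_duration_s: float = 0.5) -> Optional[float]:
    """Start time of the first run where both thresholds hold for >= min_duration_s."""
    run_start: Optional[float] = None
    for s in samples:
        if s.vert_accel_g > accel_g and s.attitude_rate_change > rate_rad_s:
            if run_start is None:
                run_start = s.t_s
            if s.t_s - run_start >= min_duration_s:
                return run_start
        else:
            run_start = None  # signature broken: restart the run
    return None


def estimate_offset_ms(video_motion_onset_s: float, tlog_takeoff_s: float) -> int:
    """Offset to add to video timestamps to land on tlog time."""
    return round((tlog_takeoff_s - video_motion_onset_s) * 1000)
```

Returning None rather than a guess lets the caller apply the AC-7 confidence and AC-8 exit-code rules itself.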

Non-functional requirements

Cold-start ≤ 5 s (not subject to AC-NEW-1's 30 s budget — that's airborne-only).
Throughput ≥ 5 × real time on Jetson AGX Orin for --pace asap.
Memory ≤ 4 GB resident (lean image; no FAISS index unless tile lookup is needed).

Risks & mitigations

R-DEMO-1: Tlog ↔ video timestamp drift across long flights, AND the more-common case that recordings on the operator workstation are not synchronised at all (camera and FC start independently, often minutes apart). Mitigation: auto-sync via IMU take-off detection (AC-7) is the default; --time-offset-ms N is the manual override. If take-off pattern is ambiguous (e.g., fixed-wing hand-launch instead of quadcopter, or tlog includes pre-arm motion), CLI WARNs and falls back to the manual override.
R-DEMO-2: Pymavlink slow on multi-GB tlogs. Mitigation: stream-parse, never materialise; benchmark + document throughput floor.
R-DEMO-3: Demo footage missing required FC messages (HIL mode etc.). Mitigation: CLI fails fast at startup listing missing message types and the components that need them.
R-DEMO-4: Production C1–C5 paths bake real-time-cadence assumptions (e.g., 5 s fallback timer). Mitigation: Clock injection (wall-clock for live, tlog-derived for replay); documented as ADR amendment in next architecture-doc cycle.
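For R-DEMO-2, "stream-parse, never materialise" means reading one tlog record at a time. A .tlog file is a sequence of records, each an 8-byte big-endian microsecond timestamp followed by one raw MAVLink packet. The sketch below frames MAVLink v1 (0xFE) and v2 (0xFD) packets from their length fields without decoding or CRC-checking them. In production, pymavlink would normally do this; the generator only illustrates the constant-memory shape. iter_tlog is a hypothetical name. A truncated trailing record simply ends the stream.

```python
import struct
from collections.abc import Iterator
from typing import BinaryIO

MAVLINK_V1_STX = 0xFE
MAVLINK_V2_STX = 0xFD
IFLAG_SIGNED = 0x01   # v2 incompat flag: 13-byte signature follows the CRC
SIGNATURE_LEN = 13
CRC_LEN = 2


def iter_tlog(stream: BinaryIO) -> Iterator[tuple[int, bytes]]:
    """Yield (timestamp_us, raw_packet) pairs one record at a time."""
    while True:
        ts_raw = stream.read(8)
        if len(ts_raw) < 8:
            return  # EOF (or truncated trailing timestamp)
        (timestamp_us,) = struct.unpack(">Q", ts_raw)
        head = stream.read(2)  # STX + payload length
        if len(head) < 2:
            return
        stx, payload_len = head[0], head[1]
        if stx == MAVLINK_V2_STX:
            rest = stream.read(8)  # incompat, compat, seq, sysid, compid, msgid[3]
            if len(rest) < 8:
                return
            body_len = payload_len + CRC_LEN
            if rest[0] & IFLAG_SIGNED:
                body_len += SIGNATURE_LEN
        elif stx == MAVLINK_V1_STX:
            rest = stream.read(4)  # seq, sysid, compid, msgid
            if len(rest) < 4:
                return
            body_len = payload_len + CRC_LEN
        else:
            raise ValueError(f"unexpected MAVLink STX 0x{stx:02x}")
        body = stream.read(body_len)
        if len(body) < body_len:
            return  # truncated final packet
        yield timestamp_us, head + rest + body
```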

Effort

T-shirt M; 27–32 points across 8 child tasks.

Child issues

#	Title	Pts
1	`FrameSource` interface (cross-cutting) + `VideoFileFrameSource` strategy + `LiveCameraFrameSource` retrofit	3
2	`TlogReplayFcAdapter` strategy (pymavlink stream parser → inbound DTOs)	5
3	`ReplaySink` interface + `JsonlReplaySink` impl	3
4	`compose_replay(config)` + `Clock` injection (per R-DEMO-4)	3
5	`gps-denied-replay` CLI entrypoint + arg parser + camera-calibration loader	3
6	`gps-denied-replay-cli` Dockerfile + GitHub Actions matrix entry + SBOM diff (excludes C6/C10/C11/C12)	3
7	E2E replay fixture test (Derkachi 1–2 min clip + tlog; AC-3 ≤100 m ≥ 80 % assertion)	5
8	Auto-sync of video ↔ tlog via IMU take-off detection (AC-7 / AC-8; `--time-offset-ms` remains the manual override)	5
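The AC-3 assertion behind child task #7 reduces to a per-tick distance check plus a pass fraction. The sketch assumes the estimated and ground-truth tracks are already tick-aligned (lat, lon) pairs. It uses a spherical haversine distance, whose error against the ellipsoid is well under 1 % and therefore negligible at a 100 m bound. Both function names are hypothetical.

```python
import math
from collections.abc import Sequence

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ac3_passes(estimates: Sequence[tuple[float, float]], truth: Sequence[tuple[float, float]],
               max_error_m: float = 100.0, min_fraction: float = 0.80) -> bool:
    """True when at least min_fraction of ticks are within max_error_m of truth."""
    if not estimates or len(estimates) != len(truth):
        raise ValueError("need equal-length, non-empty, tick-aligned tracks")
    within = sum(haversine_m(*e, *g) <= max_error_m for e, g in zip(estimates, truth))
    return within / len(estimates) >= min_fraction
```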

Key constraints

ADR-001 / ADR-002 / ADR-009.
C1–C5 components MUST remain mode-agnostic; replay-aware logic lives only in the composition root, the new strategies, and the CLI.
No HTTP server in any companion binary (airborne or replay); HTTP wrapper, if added later, lives in operator-orchestrator per module-layout.md Layer-4 placement.

Testing strategy

Unit tests under tests/unit/frame_source/, tests/unit/c8_fc_adapter/test_tlog_replay_adapter.py, tests/unit/c8_fc_adapter/test_replay_sink.py, tests/unit/cli/test_replay_cli.py. E2E under tests/e2e/replay/ running the CLI against the Derkachi fixture (Tier-1 capable; gated by RUN_REPLAY_E2E=1 in CI). No FT/NFT scenarios at this epic — those live in E-BBT.

Lessons applied (Step 6 step-0 retrospective)

_docs/LESSONS.md does not yet exist (this is the project's first cycle), so no prior estimation/architecture/dependencies lessons were folded into the sizing above. When this cycle ends, the Final step's quality checklist should propose a lessons file capturing:

C2.5 ↔ C3 helper-ownership (R14) — generalisable lesson: when two siblings share a runtime, place ownership in a shared helper from day one rather than discovering the cycle in a 4a evaluator pass.
ADR-007 reversal (mock-as-fixture) — generalisable lesson: a test fixture is not a component; promoting one inflates architectural surface and risks contract drift.
D-PROJ-2 / D-PROJ-3 carryforwards — generalisable lesson: cross-suite design dependencies belong in _process_leftovers/ from the moment they are recognised, with full payload so a later cycle can replay them.

These three are candidates for the next cycle's LESSONS.md.
