gps-denied-onboard/_docs/02_document/epics.md
Oleksandr Bezdieniezhnykh 5fe67023b2 [AZ-329] [AZ-330] [AZ-523] [AZ-524] Batch 44 atomic refactor
Implements two new C12 services and rebalances the C11/C12 boundary
in one atomic commit:

* AZ-329 PostLandingUploadOrchestrator — gates C11 upload on the
  `flight_footer` FDR record's `clean_shutdown` field; 4 refusal
  modes; new FdrFooterReader Protocol + LocalFdrFooterReader.
* AZ-330 OperatorReLocService — AC-3.4 visual-loss re-localization
  hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol
  cut (E-C8 owns the future pymavlink concrete); new FDR record
  kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals,
  reason 200 chars).
* AZ-523 C11 internal flight-state gate removed (SRP refactor):
  `confirm_flight_state` / `FlightStateSignal` use /
  `FlightStateNotOnGroundError` deleted from C11; TileUploader
  contract bumped to v2.0.0 (frozen) with migration note; AZ-317
  superseded.
* AZ-524 Package rename `c12_operator_tooling` →
  `c12_operator_orchestrator` across source, tests, pyproject,
  CMake, Dockerfile, compose, CI, runtime-root services class
  (`OperatorOrchestratorServices`) + factory function
  (`build_operator_orchestrator`), logger namespaces, config slug,
  docs, and the E-C12 epic title.

Tests: 1543 passed, 80 skipped (all environment gates). Targeted
AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start
NFR-perf still ≤ 500 ms p99.

Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump
comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523
+ AZ-524 created and closed as audit-trail tickets.

See `_docs/03_implementation/batch_44_cycle1_report.md`.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 19:42:46 +03:00


Work-Item Epics — gps-denied-onboard Plan cycle 1

This file is the local epic draft for Plan Step 6. Tracker IDs (AZ-XXX) are now populated for every epic; they live in Jira project AZ. The canonical E-* to AZ-NN mapping below is the source of truth referenced from each Jira epic's description.

Conventions

  • Issue type: Epic.
  • Epic descriptions are self-contained per the plan-skill rule: a developer reading only the epic should understand the full context. Each epic has the 14 required sections (system context, problem, scope, architecture notes, interface spec, data flow, dependencies, AC, NFRs, risks, effort, child issues, key constraints, testing strategy).
  • Effort sizing: T-shirt size + story-points range for the epic; per-task story points (PBI complexity) follow the user rule (1, 2, 3, 5, 8 — no PBI > 5; create 2/3, sometimes 5).
  • Cross-cutting epics parent exactly one shared implementation task; component epics consuming the concern declare a dependency, never re-implement locally.
  • Dependency rule: no epic depends on a later one in this index.
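The dependency rule above lends itself to a mechanical check. The following is a minimal sketch, assuming the index is available as an ordered list of `(epic_id, depends_on)` pairs; the function name `find_forward_dependencies` and the `SAMPLE_INDEX` excerpt are illustrative and not part of the plan tooling.

```python
def find_forward_dependencies(index: list[tuple[str, list[str]]]) -> list[tuple[str, str]]:
    """Return (epic, dependency) pairs where the dependency appears later in the index."""
    position = {epic: row for row, (epic, _) in enumerate(index)}
    violations = []
    for epic, deps in index:
        for dep in deps:
            # Unknown IDs are ignored in this sketch; a stricter check could reject them.
            if dep in position and position[dep] > position[epic]:
                violations.append((epic, dep))
    return violations


# A short excerpt of the index, in index order (not the full table).
SAMPLE_INDEX = [
    ("E-BOOT", []),
    ("E-CC-LOG", ["E-BOOT"]),
    ("E-CC-CONF", ["E-BOOT"]),
    ("E-CC-FDR-CLIENT", ["E-BOOT", "E-CC-LOG"]),
]

print(find_forward_dependencies(SAMPLE_INDEX))  # prints []
```

A check like this could run in CI against the index so that a reordering or a new "Depends on" entry cannot silently break the rule.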

Decompose-time amendment (cycle 1, dated 2026-05-10)

Row 20 (E-CC-HELPERS / AZ-264) was added during Decompose Step 2 to comply with the cross-cutting rule. The 8 shared helpers (ImuPreintegrator, SE3Utils, LightGlueRuntime, WgsConverter, Sha256Sidecar, EngineFilenameSchema, RansacFilter, DescriptorNormaliser) were originally listed as child issues inside their largest-consumer component epics (e.g., ImuPreintegrator under E-C1 child #5, LightGlueRuntime under E-C2.5 child #2). Those child-issue listings are now superseded — helper ownership moves to E-CC-HELPERS, and component epics consume helpers as dependencies. The original component epic descriptions in Jira still reference the helpers in their child-issue tables; those will be reconciled at the next epic-edit pass (or at Step 4 cross-verification).

Index

| # | Epic ID | Title | Type | Tracker | T-shirt | Story Pts | Depends on |
|---|---|---|---|---|---|---|---|
| 1 | E-BOOT | Bootstrap & Initial Structure | bootstrap | AZ-244 | M | 13-21 | |
| 2 | E-CC-LOG | Cross-Cutting: Structured JSON Logging | cross-cutting | AZ-245 | S | 5-8 | E-BOOT |
| 3 | E-CC-CONF | Cross-Cutting: Configuration & Composition Root | cross-cutting | AZ-246 | S | 5-8 | E-BOOT |
| 4 | E-CC-FDR-CLIENT | Cross-Cutting: FDR Producer Client (lock-free queue + record schema) | cross-cutting | AZ-247 | M | 8-13 | E-BOOT, E-CC-LOG |
| 5 | E-C13 | C13 Flight Data Recorder (writer thread + segments + cap) | component | AZ-248 | L | 21-34 | E-BOOT, E-CC-LOG, E-CC-CONF, E-CC-FDR-CLIENT |
| 6 | E-C7 | C7 On-Jetson Inference Runtime | component | AZ-249 | L | 21-34 | E-BOOT, E-CC-CONF, E-CC-FDR-CLIENT |
| 7 | E-C6 | C6 Tile Cache + Spatial Index | component | AZ-250 | M | 13-21 | E-BOOT, E-CC-LOG, E-CC-CONF |
| 8 | E-C11 | C11 Tile Manager (TileDownloader + TileUploader) | component | AZ-251 | M | 13-21 | E-C6, E-CC-CONF, E-CC-LOG |
| 9 | E-C10 | C10 Pre-flight Cache Provisioning | component | AZ-252 | M | 13-21 | E-C6, E-C7, E-CC-LOG |
| 10 | E-C12 | C12 Operator Pre-flight Orchestrator | component | AZ-253 | M | 13-21 | E-C10, E-C11, E-CC-LOG |
| 11 | E-C1 | C1 Visual / Visual-Inertial Odometry | component | AZ-254 | XL | 34-55 | E-BOOT, E-CC-FDR-CLIENT, E-C7 |
| 12 | E-C2 | C2 Visual Place Recognition | component | AZ-255 | L | 21-34 | E-C6, E-C7, E-CC-FDR-CLIENT |
| 13 | E-C2.5 | C2.5 Inlier-based Re-rank | component | AZ-256 | S | 5-8 | E-C2, E-C7, E-C6 (LightGlue helper shared with C3) |
| 14 | E-C3 | C3 Cross-Domain Matcher | component | AZ-257 | L | 21-34 | E-C2.5, E-C7 |
| 15 | E-C3.5 | C3.5 AdHoP-Conditional Refinement | component | AZ-258 | M | 8-13 | E-C3, E-C7 |
| 16 | E-C4 | C4 Pose Estimator | component | AZ-259 | M | 13-21 | E-C3.5, E-C5 (shared GTSAM substrate; co-developed) |
| 17 | E-C5 | C5 State Estimator | component | AZ-260 | XL | 34-55 | E-C1, E-C4 (shared graph), E-CC-FDR-CLIENT |
| 18 | E-C8 | C8 FC + GCS Adapter | component | AZ-261 | L | 21-34 | E-C5, E-CC-CONF, E-CC-LOG |
| 19 | E-BBT | Blackbox Tests (FT/NFT scenarios) | tests | AZ-262 | M | 13-21 | every component epic ships its component-internal tests under its own epic; this one parents the suite-level FT/NFT scenarios in _docs/02_document/tests/*.md |
| 20 | E-CC-HELPERS | Cross-Cutting: Common Helpers (8 shared utilities) | cross-cutting | AZ-264 | M | 13-21 | E-BOOT, E-CC-LOG (added in Decompose Step 2; supersedes per-component helper child-issues from cycle 1) |
| 21 | E-DEMO-REPLAY | Offline replay mode (video + tlog → per-tick coordinate stream) | feature | AZ-265 | M | 22-27 | E-C1, E-C2, E-C2.5, E-C3, E-C3.5, E-C4, E-C5, E-C8, E-CC-CONF (added in Decompose Step 2; enables parent-suite UI demo via subprocess + JSONL streaming) |

High-level component dependency diagram

flowchart TB
  BOOT[E-BOOT Bootstrap]
  LOG[E-CC-LOG Logging]
  CONF[E-CC-CONF Config + Composition Root]
  FDRC[E-CC-FDR-CLIENT FDR Producer Client]
  C13[E-C13 FDR]
  C7[E-C7 Inference Runtime]
  C6[E-C6 Tile Cache]
  C11[E-C11 Tile Manager]
  C10[E-C10 Cache Provisioning]
  C12[E-C12 Operator Orchestrator]
  C1[E-C1 VIO]
  C2[E-C2 VPR]
  C25[E-C2.5 Re-rank]
  C3[E-C3 Matcher]
  C35[E-C3.5 AdHoP]
  C4[E-C4 Pose]
  C5[E-C5 State]
  C8[E-C8 FC Adapter]
  BBT[E-BBT Blackbox Tests]
  HELP[E-CC-HELPERS Common Helpers]
  DEMO[E-DEMO-REPLAY Offline Replay Mode]

  BOOT --> LOG --> FDRC --> C13
  BOOT --> CONF --> C13
  BOOT --> CONF --> C7
  BOOT --> LOG --> HELP
  C13 -.-> C7
  CONF --> C6 --> C11
  C6 --> C10
  C7 --> C10
  C10 --> C12
  C11 --> C12
  C7 --> C2 --> C25 --> C3 --> C35 --> C4
  C6 --> C2
  C6 --> C25
  C1 --> C5
  C4 <--> C5
  C5 --> C8
  FDRC --> C1
  FDRC --> C5
  C8 --> BBT
  C12 --> BBT
  HELP -.-> C1
  HELP -.-> C2
  HELP -.-> C25
  HELP -.-> C3
  HELP -.-> C35
  HELP -.-> C4
  HELP -.-> C5
  HELP -.-> C6
  HELP -.-> C7
  HELP -.-> C8
  HELP -.-> C10
  HELP -.-> C11
  HELP -.-> C12
  C1 --> DEMO
  C5 --> DEMO
  C8 --> DEMO
  CONF --> DEMO

E-BOOT — Bootstrap & Initial Structure

Tracker: AZ-244 | Type: bootstrap | T-shirt: M | Story points: 13-21 | Owner: onboard team

System context

flowchart LR
  EBOOT[E-BOOT scaffolding] --> SRC[src/ component dirs]
  EBOOT --> CICD[CI Tier-1 + Tier-2 jobs]
  EBOOT --> DOCKER[docker-compose.test.yml]
  EBOOT --> DB[Postgres init scripts]
  EBOOT --> TESTROOT[tests/ + tests/fixtures/]

Problem / Context

No source layout exists yet. Every downstream epic assumes a defined repo skeleton: src/components/<id>_<name>/, src/shared/<concern>/, tests/, tests/fixtures/, plus the Tier-1 Docker compose, the Tier-2 CI job, the Postgres init scripts that match data_model.md, and the operator-orchestrator tarball build path. Until this exists, no other epic can start.

Scope

In scope:

  • Create src/components/<id>_<name>/ for all 14 components with empty package init.
  • Create src/shared/{logging,config,fdr_client,crypto,calibration_loader}/ placeholders.
  • pyproject.toml (Python) + CMakeLists.txt (C++ where used by C1) with the project's pinned dep set.
  • Tier-1 docker-compose.test.yml skeleton (companion + Postgres + e2e-runner; mock-suite-sat-service compose pulled in only by upload tests).
  • Tier-2 CI job that runs on the bench Jetson runner, with the JetPack 6.2 / TRT 10.3 / SM 87 image pinned per ADR-005.
  • Postgres init scripts for the schema in data_model.md.
  • tests/ directory with tests/fixtures/, tests/tmp/, tests/conftest.py.
  • Empty runtime_root.py for the airborne composition root + operator_tool/__main__.py for the operator side.
  • .gitignore covering binaries, engine caches, FDR segments, ephemeral keys.
  • README with run commands.

Out of scope:

  • Any per-component logic (each component's epic owns its own implementation).
  • Cross-cutting impl (logging / config / FDR client live in their own epics).

Architecture notes

  • ADR-005 (Tier-1 / Tier-2 are first-class) drives the CI split.
  • ADR-009 (composition root) places runtime_root.py at the airborne entrypoint and operator_tool/__main__.py at the operator side.
  • ADR-002 (build-time exclusion) requires per-implementation CMake BUILD_* flags and the SBOM diff to be wired in CI from day one.
  • ADR-004 (process isolation) requires the airborne build target to refuse c11_tilemanager/ symbols. SBOM diff hook lives here from Bootstrap onward.

Interface specification

This epic exposes no runtime interface; it ships repository scaffolding only.

Data flow

N/A.

Dependencies

  • Epic dependencies: none.
  • External: GitHub Actions runner pool (Tier-1 Docker), bench Jetson runner (Tier-2), pinned base images (JetPack 6.2, Postgres 16, mcr.microsoft.com/dotnet/aspnet:8.0-alpine for the test fixture).

Acceptance criteria

  • docker compose -f docker-compose.test.yml up -d brings up companion + Postgres + e2e-runner cleanly on a fresh workstation.
  • Tier-2 CI smoke-job (echo $JETPACK_VERSION + nvidia-smi) passes on the bench Jetson.
  • pytest tests/ -q --collect-only discovers the empty tests/ tree without errors.
  • The SBOM diff CI step exists and fails the build if c11_tilemanager ever appears in the airborne production-binary artifact (R02 enforcement seed).
  • runtime_root.py runs and exits cleanly with a "no components configured" message (proves composition root wiring).
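The SBOM diff criterion can be illustrated with a small check. This is a sketch only: it assumes the SBOM has already been reduced to a flat list of component names, and the names `forbidden_components` and `sbom_gate` are hypothetical. The real SBOM format and tooling are not specified in this document.

```python
# Marker taken from the acceptance criterion; anything matching it must not ship airborne.
FORBIDDEN_IN_AIRBORNE = ("c11_tilemanager",)


def forbidden_components(
    component_names: list[str],
    forbidden: tuple[str, ...] = FORBIDDEN_IN_AIRBORNE,
) -> list[str]:
    """Return the component names that contain a forbidden marker."""
    return [name for name in component_names if any(marker in name for marker in forbidden)]


def sbom_gate(component_names: list[str]) -> int:
    """Return a process exit code: 0 when clean, 1 when a forbidden component is present."""
    hits = forbidden_components(component_names)
    for name in hits:
        print(f"forbidden component in airborne artifact: {name}")
    return 1 if hits else 0


# Illustrative component names (assumed, not read from a real artifact).
print(sbom_gate(["c6_tile_cache", "c13_fdr"]))  # prints 0
```

A CI step would pass the result to `sys.exit` so that a non-zero code fails the build.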

Non-functional requirements

  • CI cold-build wall-clock ≤ 10 min on Tier-1; ≤ 6 min on Tier-2 (just the smoke-job).
  • Repo size at this stage ≤ 5 MB (no fixtures committed).

Risks & mitigations

  • R12 (single deployment camera) — Bootstrap's CI must not assume the unit is plugged in; Tier-2 smoke-job runs without the camera, only against TRT/SM/JP version.

Effort

T-shirt M; 13-21 story points across child PBIs (each ≤ 5 points).

Child issues (PBIs)

| # | Title | Pts |
|---|---|---|
| 1 | Repo scaffolding: src/components/, src/shared/, tests/, runtime_root.py | 2 |
| 2 | pyproject.toml + CMakeLists.txt with pinned deps | 3 |
| 3 | Tier-1 docker-compose.test.yml skeleton + Postgres init | 3 |
| 4 | Tier-2 CI smoke-job on bench Jetson | 3 |
| 5 | SBOM diff CI step (R02 enforcement seed; fails on c11_tilemanager in airborne artifact) | 3 |
| 6 | .gitignore + README.md + run commands | 2 |
| 7 | runtime_root.py minimum (compose root + "no components configured" exit path) | 2 |

Key constraints

  • RESTRICT-HW-1 (Jetson Orin Nano Super, 8 GB shared LPDDR5, 25 W) — Tier-2 image pins SM 87 / JP 6.2 / TRT 10.3.
  • RESTRICT-FC-1 (AP + iNav supported; PX4 out of scope) — composition root wires only AP + iNav adapters.

Testing strategy

  • CI smoke tests on every PR (Tier-1 compose-up, Tier-2 nvidia-smi).
  • No unit tests yet — those live in component epics.

E-CC-LOG — Cross-Cutting: Structured JSON Logging

Tracker: AZ-245 | Type: cross-cutting | T-shirt: S | Story points: 5-8

System context

Every component's § 9 Logging Strategy mandates structured JSON logging at ERROR / WARN / INFO / DEBUG levels with per-frame fields (frame_id, kind, component-specific keys). A single shared logger module under src/shared/logging/ produces these records; every component imports it.

flowchart LR
  COMP[Any component] --> LOGGER[src/shared/logging<br/>structured JSON]
  LOGGER --> STDOUT[stdout / journald]
  LOGGER --> FDR["FDR (via E-CC-FDR-CLIENT for ERROR + WARN)"]

Problem / Context

If every component rolls its own logger, format drift is guaranteed. The traceability matrix and post-flight FDR analysis rely on a stable JSON schema, and a single shared logger is the simplest way to guarantee one.

Scope

In scope:

  • src/shared/logging/__init__.py exporting get_logger(component_id: str) -> Logger.
  • JSON formatter with stable field ordering (ts, level, component, frame_id, kind, msg, ...kv).
  • Drop-in RotatingStdoutHandler for Tier-1 dev; JournaldHandler for Tier-2 production.
  • Bridge into the FDR client for ERROR + WARN levels (handler subscribes to log records and enqueues a kind = "log" FdrRecord).
  • Helpers for the documented per-frame log shapes (vio.frame_id, vpr.top10_distances, etc.) so component code is short.

Out of scope: per-component log content (lives in each component epic's child PBIs).

Architecture notes

Stdlib logging + python-json-logger (or orjson formatter for speed). No new dependency beyond what's already in pyproject.toml. No third-party log aggregator — Tier-1 uses Docker stdout capture; Tier-2 uses journald.
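As an illustration of the stable field ordering described in the scope, here is a minimal stdlib-only sketch. It assumes extra key/values are passed through `extra=` in a `kv` dict; that convention, the class name `OrderedJsonFormatter`, and the handler wiring are assumptions of this example rather than the shipped module.

```python
import json
import logging


class OrderedJsonFormatter(logging.Formatter):
    """Emit ts, level, component, frame_id, kind, msg first, then extra keys."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": record.created,
            "level": record.levelname,
            "component": record.name,
            "frame_id": getattr(record, "frame_id", None),
            "kind": getattr(record, "kind", None),
            "msg": record.getMessage(),
        }
        # Hypothetical convention: extra key/values arrive in a `kv` dict via `extra=`.
        for key, value in sorted(getattr(record, "kv", {}).items()):
            payload.setdefault(key, value)  # never overwrite a leading field
        return json.dumps(payload)  # dicts keep insertion order, so field order is stable


def get_logger(component_id: str) -> logging.Logger:
    logger = logging.getLogger(component_id)
    if not logger.handlers:  # attach the handler once per component
        handler = logging.StreamHandler()
        handler.setFormatter(OrderedJsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
        logger.propagate = False
    return logger


get_logger("c2").warning(
    "VPR top-1 above threshold",
    extra={"frame_id": 17, "kind": "vpr", "kv": {"distance": 0.42}},
)
```

Note that the stdlib level name is `WARNING`; mapping it to the `WARN` spelling used in this document would be one extra line in the formatter.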

Interface specification

def get_logger(component_id: str) -> logging.Logger: ...

class StructuredJsonHandler(logging.Handler):
    """JSON formatter + FDR bridge for ERROR/WARN."""

class FdrLogBridge:
    """Subscribed by the logger; forwards ERROR + WARN to E-CC-FDR-CLIENT.enqueue."""

Data flow

sequenceDiagram
  participant C as Component
  participant L as Logger
  participant S as stdout
  participant F as FDR Client
  C->>L: log.warning("VPR top-1 above threshold", extra={"distance": 0.42})
  L->>S: {"level":"WARN", "component":"c2", ...}
  L->>F: enqueue(kind="log", level="WARN", payload=...)

Dependencies

  • Depends on E-BOOT.
  • External: python-json-logger or orjson (whichever is already pinned).

Acceptance criteria

  • Every component test that asserts a log message uses the shared logger and finds the expected JSON shape.
  • ERROR + WARN records appear in FDR with kind = "log" and a back-reference to the originating component.
  • INFO + DEBUG do NOT appear in FDR (per-component § 9 storage rule).
  • Log format passes a contract test (tests/contract/log_schema.py) verifying field names + ordering + required keys.

Non-functional requirements

  • Per-record latency p99 ≤ 0.2 ms (lock-free emit on the hot path).
  • No allocation in the steady-state DEBUG path beyond the message string itself.

Risks & mitigations

  • R13 (FDR queue overrun) — the FDR bridge uses E-CC-FDR-CLIENT's drop-oldest semantics; it never blocks the caller.

Effort

T-shirt S; 5-8 points.

Child issues

| # | Title | Pts |
|---|---|---|
| 1 | src/shared/logging/ module + JSON formatter + handlers | 3 |
| 2 | FDR log bridge (ERROR + WARN → kind=log) | 2 |
| 3 | Contract test tests/contract/log_schema.py | 2 |

Key constraints

  • AC-NEW-3 (FDR ≤ 64 GB / flight, no silent drops) — DEBUG must not flow into FDR; verified by the contract test.

Testing strategy

  • Unit tests for the formatter (field ordering + escaping).
  • Contract test against the FDR record schema (kind=log).
  • Integration via every component's tests.md (each component asserts at least one log message).

E-CC-CONF — Cross-Cutting: Configuration & Composition Root

Tracker: AZ-246 | Type: cross-cutting | T-shirt: S | Story points: 5-8

System context

ADR-001 (runtime selection by config) + ADR-009 (composition root) together require a single shared loader that materialises the Config object at process startup, plus a compose_root(config) function that constructs each strategy/component instance with its dependencies. No component instantiates another component itself.

flowchart LR
  ENV[ENV vars] --> LOADER
  YAML[config.yaml] --> LOADER
  CALIB[Camera calibration JSON] --> LOADER
  LOADER[src/shared/config/loader] --> ROOT[runtime_root.py / operator_tool/__main__.py]
  ROOT --> COMPS[component instances]

Problem / Context

Without a single source of truth for configuration, the BUILD_* + runtime-strategy-selection rules of ADR-001/002/009 collapse — components silently fall back to defaults, and the composition root grows local config-parsing logic that drifts. The CI gate that ensures only the linked strategies are selectable also lives here.

Scope

In scope:

  • src/shared/config/loader.py: env + YAML + camera-calibration JSON merging with explicit precedence (env > YAML > defaults).
  • Config dataclass (frozen) covering every component's startup knob.
  • compose_root(config) -> RuntimeRoot for the airborne process; compose_operator(config) -> OperatorRoot for the tooling side.
  • Strategy-vs-build-flag consistency check at startup: refuse to start if config selects a strategy whose BUILD_* flag was off in the linked binary.

Out of scope: any component's specific config shape (defined inside its own epic).
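The precedence rule (env > YAML > defaults) and the strategy/build-flag check can be sketched as follows. This is an illustration under assumptions: the two config keys, the `APP_` environment prefix, the strategy names, and the function names `merge_config` and `check_strategy_linked` are hypothetical, and the YAML layer is represented as an already-parsed dict.

```python
from dataclasses import dataclass


class StrategyNotLinkedError(RuntimeError):
    """Config selected a strategy whose BUILD_* flag was off in the linked binary."""


@dataclass(frozen=True)
class Config:
    vio_strategy: str  # hypothetical knob
    log_level: str     # hypothetical knob


DEFAULTS = {"vio_strategy": "strategy_a", "log_level": "INFO"}


def merge_config(env: dict[str, str], yaml_values: dict[str, str], prefix: str = "APP_") -> Config:
    merged = dict(DEFAULTS)  # lowest precedence
    merged.update({k: v for k, v in yaml_values.items() if k in DEFAULTS})
    for key in DEFAULTS:  # environment variables win over YAML
        env_key = prefix + key.upper()
        if env_key in env:
            merged[key] = env[env_key]
    return Config(**merged)


def check_strategy_linked(config: Config, linked: frozenset[str]) -> None:
    """Refuse to start instead of silently falling back to another strategy."""
    if config.vio_strategy not in linked:
        raise StrategyNotLinkedError(
            f"strategy {config.vio_strategy!r} is not linked into this binary"
        )


cfg = merge_config({"APP_LOG_LEVEL": "DEBUG"}, {"log_level": "WARNING", "vio_strategy": "strategy_b"})
print(cfg)  # prints Config(vio_strategy='strategy_b', log_level='DEBUG')
```

Keeping the merge as a pure function over dicts makes the per-layer precedence tests straightforward to write.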

Architecture notes

  • ADR-001, ADR-002, ADR-009 all converge here.
  • The composition root is the only place import of a concrete VioStrategy / VprStrategy / etc. is allowed; component code imports the abstract interface only.

Interface specification

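from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Config: ...  # populated by union of every component's config schema

def load_config(env: dict[str, str], paths: list[Path]) -> Config: ...

# RuntimeRoot and OperatorRoot are the composed process objects for each side.
def compose_root(config: Config) -> RuntimeRoot: ...
def compose_operator(config: Config) -> OperatorRoot: ...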

Data flow

Startup-only — runs once per process. No per-frame path.

Dependencies

  • Depends on E-BOOT.

Acceptance criteria

  • compose_root constructs a runnable airborne process for every documented config preset (default deployment, IT-12 research-binary, smoke-test minimal).
  • Strategy/build-flag mismatch triggers an explicit StrategyNotLinkedError with a clear message (no silent fallback).
  • Config precedence (env > YAML > defaults) verified by unit tests for at least 3 keys per layer.
  • runtime_root.py exits with code 0 when given a valid config and no components actually do work (reachability proof).

Non-functional requirements

  • Cold-start config load + compose ≤ 1 s on Tier-2 (counts toward AC-NEW-1's 30 s budget).

Risks & mitigations

  • R02 (ADR-004 process isolation) — compose_root's strategy/build-flag check is the third enforcement gate (after SBOM diff and runtime self-check) preventing C11 from running airborne.

Effort

T-shirt S; 5-8 points.

Child issues

| # | Title | Pts |
|---|---|---|
| 1 | src/shared/config/loader.py + Config dataclass | 3 |
| 2 | compose_root + compose_operator skeletons + StrategyNotLinkedError | 3 |
| 3 | Unit tests for env/YAML/defaults precedence | 2 |

Key constraints

  • ADR-002 (build-time exclusion) — only linked strategies selectable.

Testing strategy

  • Unit: precedence + StrategyNotLinkedError.
  • Integration: every documented preset starts cleanly.

E-CC-FDR-CLIENT — Cross-Cutting: FDR Producer Client

Tracker: AZ-247 | Type: cross-cutting | T-shirt: M | Story points: 8-13

System context

C13 owns the FDR writer thread, segment files, and the 64 GB cap. Every other component publishes via a producer-side client: lock-free enqueue + an FdrRecord schema versioned in RecordSchema. This epic owns ONLY the producer side; the writer-thread internals belong to E-C13.

flowchart LR
  PROD[Component producer] --> Q[lock-free ring buffer]
  Q --> WRITER[E-C13 writer thread]
  WRITER --> SEG[segment file on NVM]

Problem / Context

Producer-side correctness (drop-oldest with rollover-log, schema versioning, never-block) is independent of where the file lands. Co-locating producer logic inside E-C13 would force every component test to spin up the writer thread; a thin shared client lets component tests use a fake sink.

Scope

In scope:

  • src/shared/fdr_client/__init__.py exporting FdrClient(producer_id: str) -> Client.
  • Lock-free SPSC ring buffer per producer; capacity configurable (default per producer in Config).
  • FdrRecord versioned schema (orjson or msgpack — pinned in E-BOOT).
  • Drop-oldest behaviour writing a structured kind=overrun record with producer_id + dropped count (never silent).
  • FakeFdrSink for component-level tests.

Out of scope: writer thread, segment files, 64 GB cap, rollover policy (E-C13).

Architecture notes

  • AC-NEW-3 (no silent drops) is enforced HERE: drop-oldest always emits the overrun record.
  • Schema versioning prevents post-flight tooling breakage when payload classes evolve.
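The drop-oldest rule with a mandatory overrun record can be illustrated with a small bounded buffer. This is a sketch of the semantics only: it uses `collections.deque` and is not lock-free, the class name `DropOldestBuffer` is hypothetical, and surfacing the overrun record at dequeue time is a choice made for this example.

```python
from collections import deque
from typing import Optional


class DropOldestBuffer:
    """Bounded buffer: enqueue never blocks, and every drop is reported."""

    def __init__(self, producer_id: str, capacity: int):
        self.producer_id = producer_id
        self._capacity = capacity
        self._items: deque = deque()
        self._dropped = 0

    def enqueue(self, record: dict) -> None:
        if len(self._items) >= self._capacity:
            self._items.popleft()  # drop the oldest record to make room
            self._dropped += 1
        self._items.append(record)

    def dequeue(self) -> Optional[dict]:
        # Report pending drops before handing out more data, so loss is never silent.
        if self._dropped:
            overrun = {
                "kind": "overrun",
                "producer_id": self.producer_id,
                "dropped_count": self._dropped,
            }
            self._dropped = 0
            return overrun
        return self._items.popleft() if self._items else None


buf = DropOldestBuffer("c2", capacity=2)
for n in (1, 2, 3):
    buf.enqueue({"n": n})
print(buf.dequeue())  # prints {'kind': 'overrun', 'producer_id': 'c2', 'dropped_count': 1}
```

The production client would need a real SPSC ring buffer with pre-sized slots to meet the latency budget; the accounting shown here is the part the acceptance criteria test.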

Interface specification

class FdrClient:
    def __init__(self, producer_id: str): ...
    def enqueue(self, record: FdrRecord) -> None: ...   # lock-free, never blocks
    def flush(self) -> None: ...                         # used by tests only

Data flow

sequenceDiagram
  participant C as Component
  participant Q as Ring buffer
  participant W as Writer (E-C13)
  C->>Q: enqueue(record)
  alt overrun
    Q->>Q: drop oldest + emit kind=overrun record
  end
  W->>Q: dequeue (in writer thread)

Dependencies

  • Depends on E-BOOT, E-CC-LOG.
  • Consumed by every component that emits FDR records.

Acceptance criteria

  • enqueue never blocks even under writer-thread stall (verified by C13-IT-05 from the C13 tests.md).
  • Every overrun event produces a structured record with non-zero dropped_count and the originating producer_id.
  • Schema version bump (e.g., adding a new field) does not break post-flight tooling that reads at version N-1 (forward-compatible parser).

Non-functional requirements

  • enqueue p99 ≤ 5 µs on Tier-2 (no allocation on the steady-state path; pre-sized buffers).
  • Per-producer ring buffer size ≤ configured cap (no unbounded growth).

Risks & mitigations

  • R13 (queue overrun) — the design IS the mitigation: drop-oldest + always log.

Effort

T-shirt M; 8-13 points.

Child issues

| # | Title | Pts |
|---|---|---|
| 1 | Lock-free SPSC ring buffer per producer | 5 |
| 2 | FdrRecord schema + versioned serialiser (orjson/msgpack) | 3 |
| 3 | Drop-oldest + kind=overrun record emission | 2 |
| 4 | FakeFdrSink for component tests | 2 |

Key constraints

  • AC-NEW-3 (no silent drops).

Testing strategy

  • Unit: ring buffer correctness under contention; overrun record emitted.
  • Property tests: forward-compat parser at version N-1.

E-C13 — C13 Flight Data Recorder

Tracker: AZ-248 | Type: component | T-shirt: L | Story points: 21-34

System context

flowchart LR
  ALL[All components] -->|enqueue via E-CC-FDR-CLIENT| Q[per-producer queues]
  Q --> W[C13 writer thread]
  W --> SEGS[segmented files on NVM]
  SEGS -.->|post-landing| OPTOOL[E-C12 retrieval]

Problem / Context

Per-flight ≤ 64 GB record of every payload class onboard, no silent drops, raw frames excluded except the ≤ 0.1 Hz failed-tile thumbnail forensic exception (AC-NEW-3, AC-8.5). Single writer thread; every other component produces.

Scope

In scope: writer thread, segment file lifecycle, 64 GB cap with oldest-segment-dropped policy, per-flight FlightHeader + FlightFooter, atomic segment rotation, mid-flight tile snapshot path, failed-tile thumbnail rate cap, refusal of takeoff when open_flight fails.

Out of scope: producer-side enqueue (E-CC-FDR-CLIENT); post-flight retrieval UI (E-C12).
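The cap with the oldest-segment-dropped policy can be sketched as a pure accounting function. The assumptions here are that segments are tracked as `(name, size_bytes)` pairs, oldest first, that the cap is decimal gigabytes (the document does not say GB versus GiB), and that the newest segment is never dropped; the function and field names are hypothetical.

```python
from collections import deque

CAP_BYTES = 64 * 10**9  # assumption: decimal GB


def enforce_cap(segments: deque, cap_bytes: int = CAP_BYTES) -> list[dict]:
    """Drop oldest segments until the total fits; return one rollover record per drop."""
    rollovers = []
    total = sum(size for _, size in segments)
    # Keep at least the newest (active) segment even if it alone exceeds the cap.
    while total > cap_bytes and len(segments) > 1:
        name, size = segments.popleft()
        total -= size
        rollovers.append(
            {"kind": "segment_rollover", "dropped_segment": name, "dropped_bytes": size}
        )
    return rollovers


segments = deque([("seg-0001", 40), ("seg-0002", 40), ("seg-0003", 30)])
print(len(enforce_cap(segments, cap_bytes=64)))  # prints 2
```

Returning the rollover records instead of writing them directly keeps the policy testable without touching the filesystem.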

Architecture notes

  • File: _docs/02_document/components/14_c13_fdr/description.md is the canonical spec.
  • IncrementalFixedLagSmoother from C5 publishes smoothed past-keyframes via FDR ONLY (AC-4.5 revised) — NOT into the FC stream.
  • Segment rotation uses atomicwrites; cross-process safety on the FDR root via filelock.

Interface specification

class FdrWriter:
    def open_flight(self, header: FlightHeader) -> None: ...     # raises FdrOpenError
    def write_record(self, record: FdrRecord) -> None: ...        # lock-free; FdrQueueOverrunError logged not raised
    def close_flight(self) -> FlightFooter: ...
    def current_size_bytes(self) -> int: ...
    def is_rolling(self) -> bool: ...

Data flow

sequenceDiagram
  participant Prods as Producers (every component)
  participant Q as Per-producer queues
  participant W as Writer thread
  participant FS as NVM segment file
  Prods->>Q: enqueue(record)
  W->>Q: dequeue
  W->>FS: serialise + append
  alt segment >= cap
    W->>FS: atomic rotate; drop oldest if total > 64 GB; emit kind=segment_rollover
  end

Dependencies

  • E-BOOT, E-CC-LOG, E-CC-CONF, E-CC-FDR-CLIENT.

Acceptance criteria

  • AC-NEW-3: synthetic 8 h replay produces ≤ 64 GB on disk, with every drop accompanied by a kind=overrun and/or kind=segment_rollover record.
  • AC-8.5: kind=raw_nav_frame writes raise RawFrameWriteForbiddenError; kind=failed_tile_thumbnail rate-limited to ≤ 0.1 Hz.
  • AC-1.4 / AC-4.5: every smoothed past-keyframe revision lands in FDR; the FC emission stream is unchanged.
  • AC-NEW-3 takeoff gate: FdrOpenError aborts takeoff before the FC adapter is opened.
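The ≤ 0.1 Hz thumbnail cap in AC-8.5 amounts to a minimum interval of 10 s between accepted writes. A minimal sketch of such a limiter follows; the class name `MinIntervalLimiter` is hypothetical, and the caller is assumed to supply a monotonic timestamp in seconds.

```python
from typing import Optional


class MinIntervalLimiter:
    """Accept an event only if enough time has passed since the last accepted one."""

    def __init__(self, max_rate_hz: float):
        self._min_interval_s = 1.0 / max_rate_hz
        self._last_accept_s: Optional[float] = None

    def allow(self, now_s: float) -> bool:
        if self._last_accept_s is not None and now_s - self._last_accept_s < self._min_interval_s:
            return False  # too soon: the caller should skip (and count) this thumbnail
        self._last_accept_s = now_s
        return True


limiter = MinIntervalLimiter(max_rate_hz=0.1)
print(limiter.allow(0.0), limiter.allow(5.0), limiter.allow(10.5))  # prints True False True
```

Passing the clock in as an argument, rather than calling `time.monotonic()` inside, keeps the limiter deterministic under test.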

Non-functional requirements

  • Writer throughput ≥ 200 Hz aggregate (per C13-PT-01).
  • Per-record serialise + write p95 ≤ 5 ms.

Risks & mitigations

  • R13 (queue overrun) — drop-oldest + always-log.
  • R02 (ADR-004) — C13 runs in the airborne process; no cross-process FDR root contention with C11 (C11 not airborne).

Effort

T-shirt L; 21-34 points.

Child issues

| # | Title | Pts |
|---|---|---|
| 1 | Writer thread + segment file open/close/rotate | 5 |
| 2 | FlightHeader / FlightFooter + records-written/dropped accounting | 3 |
| 3 | 64 GB cap + oldest-segment-dropped policy + kind=segment_rollover record | 5 |
| 4 | Mid-flight tile snapshot path + filesystem layout | 3 |
| 5 | Failed-tile thumbnail ≤ 0.1 Hz rate limiter + AC-8.5 enforcement | 3 |
| 6 | FdrOpenError takeoff abort path | 2 |
| 7 | Component-internal tests C13-IT-01..06 + C13-PT-01 + C13-ST-01 | 5 |

Key constraints

  • AC-NEW-3, AC-8.5, AC-4.5 (revised), RESTRICT-UAV-4.

Testing strategy

  • Per _docs/02_document/components/14_c13_fdr/tests.md — six component-internal tests + 8 h NFT-LIM-02 at the suite level.

E-C7 — C7 On-Jetson Inference Runtime

Tracker: AZ-249 | Type: component | T-shirt: L | Story points: 21-34

System context

flowchart LR
  C2[C2 VPR] --> C7
  C25[C2.5 Re-rank] --> C7
  C3[C3 Matcher] --> C7
  C35[C3.5 AdHoP] --> C7
  C7 --> TRT[TensorRT]
  C7 --> ORT[ONNX Runtime]
  C7 --> PT[PyTorch FP16]
  C7 --> THERMAL[ThermalState publish]
  THERMAL --> C4[C4 Pose hybrid]

Problem / Context

Centralise GPU inference on Jetson: engine compilation, deserialise + warm-up, per-call inference, fallback chain (TRT → ONNX-RT+TRT-EP → PyTorch FP16), and ThermalState telemetry that drives D-CROSS-LATENCY-1.

Scope

In scope: engine cache lifecycle, deserialise + warm-up budget (AC-NEW-1), ThermalState publisher from jetson-stats, D-C10-3 takeoff content-hash gate (engine-side), D-C10-7 filename-schema enforcement, ONNX-RT fallback path.

Out of scope: cache artifact build (E-C10), tile cache (E-C6), the per-frame consumers (their own epics).

Architecture notes

  • File: components/09_c7_inference/description.md.
  • Python in-process abstraction over C++ TRT bindings; no separate process.
  • Engines hardware-tied (SM 87 / JP 6.2 / TRT 10.3 / FP16) per D-C10-6.
  • Helper EngineFilenameSchema is shared with E-C10.

Interface specification

class InferenceRuntime:
    def load_engine(self, model_id: str) -> EngineHandle: ...
    def infer(self, handle: EngineHandle, batch: Tensor) -> Tensor: ...
    def thermal_state(self) -> ThermalState: ...
    def warm_up(self, handle: EngineHandle) -> None: ...

Data flow

sequenceDiagram
  participant Caller as C2/C2.5/C3/C3.5
  participant C7 as InferenceRuntime
  participant GPU as TRT/ONNX/PT
  Caller->>C7: infer(handle, batch)
  C7->>GPU: forward
  GPU-->>C7: output
  C7-->>Caller: tensor
  C7-->>C4: ThermalState pub (≥1 Hz)

Dependencies

  • E-BOOT, E-CC-CONF, E-CC-FDR-CLIENT.
  • External: TensorRT 10.3, ONNX Runtime + TRT EP, PyTorch FP16, jetson-stats.

Acceptance criteria

  • AC-NEW-1 cold-start: every required engine deserialises + warms in ≤ 30 s p95 (C7-IT-01).
  • AC-NEW-5: ThermalState updates ≥ 1 Hz; throttle-detection latency ≤ 1 s; C4 hybrid switch within 1 frame (C7-IT-02).
  • D-C10-3: EngineHashMismatchError aborts F2 takeoff; no GPU memory allocated on mismatch (C7-IT-03).
  • D-C10-7: filename-schema mismatch refused at parse time (C7-IT-04).
  • ONNX-RT fallback path produces correct results when TRT engine missing (C7-IT-05).

Non-functional requirements

  • Per-model p95 latencies (C7-PT-01): UltraVPR ≤ 60 ms, LightGlue ≤ 30 ms, AdHoP ≤ 90 ms, DISK ≤ 50 ms.
  • GPU memory all engines resident ≤ 4 GB; system RAM ≤ 1.5 GB (C7-PT-02).

Risks & mitigations

  • R04 (engine cache hardware-tied) — D-C10-7 + D-C10-3 enforced at C7's deserialise path.
  • R10 (Marginals under thermal throttle) — C7's ThermalState publish is the upstream input to the C4 hybrid.

Effort

T-shirt L; 21-34 points.

Child issues

| # | Title | Pts |
|---|---|---|
| 1 | TRT engine load + warm-up + cache lifecycle | 5 |
| 2 | ONNX-RT + TRT-EP fallback path | 3 |
| 3 | PyTorch FP16 simple-baseline path | 3 |
| 4 | D-C10-3 content-hash gate + D-C10-7 filename schema enforcement | 3 |
| 5 | ThermalState publisher from jetson-stats | 3 |
| 6 | Component-internal tests C7-IT-01..05 + C7-PT-01..02 + C7-ST-01 | 5 |

Key constraints

  • RESTRICT-HW-1 (Jetson + 25 W TDP), AC-4.2 (8 GB system memory), AC-NEW-5 (thermal envelope).

Testing strategy

  • Per components/09_c7_inference/tests.md.

E-C6 — C6 Tile Cache + Spatial Index

Tracker: AZ-250 | Type: component | T-shirt: M | Story points: 13-21

System context

flowchart LR
  C11[C11 TileDownloader] --> C6
  C10[C10 build] --> C6
  C5[C5 mid-flight gen] --> C6
  C2[C2 VPR] --> C6
  C25[C2.5 rerank] --> C6
  C3[C3 matcher] --> C6
  C11U[C11 TileUploader] --> C6

Problem / Context

Persistent imagery store byte-identical to satellite-provider's on-disk layout, plus the FAISS HNSW spatial index for VPR. Writers are limited to C11 TileDownloader (production) and C5/orthorectifier (mid-flight). Readers are limited to C2/C2.5/C3 (per-frame) and C11 TileUploader (post-landing).

Scope

In scope: Postgres tiles schema, filesystem JPEG layout matching satellite-provider, FAISS HNSW build/load (the index FILE — population via C10), per-sector freshness gates at write-time, 10 GB cache budget enforcement with LRU eviction, content-SHA-256 invariant on insert, mid-flight tile insert with quality_metadata.

Out of scope: tile fetch (E-C11), descriptor population (E-C10), inference (E-C7).
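The cache budget with LRU eviction can be sketched with an `OrderedDict` that tracks tile sizes in least-recently-used order. This is an illustration of the policy only: the class name `LruByteBudget` and its methods are hypothetical, and the real store would also delete the JPEG, its sidecar, and the database row for each evicted tile.

```python
from collections import OrderedDict


class LruByteBudget:
    """Track tile sizes and evict least-recently-used tiles once a byte cap is exceeded."""

    def __init__(self, cap_bytes: int):
        self._cap = cap_bytes
        self._sizes: OrderedDict = OrderedDict()  # tile_id -> size, least recent first
        self._total = 0

    def touch(self, tile_id: str) -> None:
        self._sizes.move_to_end(tile_id)  # a read makes the tile most recent

    def add(self, tile_id: str, size: int) -> list[str]:
        """Insert a tile and return the IDs evicted to stay under the cap."""
        if tile_id in self._sizes:
            self._total -= self._sizes.pop(tile_id)
        self._sizes[tile_id] = size
        self._total += size
        evicted = []
        while self._total > self._cap and len(self._sizes) > 1:
            old_id, old_size = self._sizes.popitem(last=False)
            self._total -= old_size
            evicted.append(old_id)  # the caller logs every eviction
        return evicted


budget = LruByteBudget(cap_bytes=10)
budget.add("a", 4)
budget.add("b", 4)
budget.touch("a")
print(budget.add("c", 4))  # prints ['b']
```

Returning the evicted IDs lets the caller emit one log record per eviction, which is what the cap's acceptance test checks.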

Architecture notes

  • File: components/08_c6_tile_cache/description.md, data_model.md.
  • Schema in data_model.md; Postgres 16; SHA-256 sidecar via helper Sha256Sidecar.

Interface specification

class TileStore:
    def insert(self, tile: TileRecord, jpeg: bytes) -> None: ...     # raises FreshnessRejected, ContentHashMismatch
    def get_tile_pixels(self, tile_id: TileId) -> bytes: ...
    def query_spatial(self, bbox: Bbox, zoom: int) -> list[TileRecord]: ...
    def mark_uploaded(self, tile_id: TileId) -> None: ...
    def pending_uploads(self) -> list[TileRecord]: ...

Data flow

sequenceDiagram
  participant W as Writer (C11 / C5)
  participant C6 as C6
  participant DB as Postgres
  participant FS as Filesystem
  W->>C6: insert(tile, jpeg)
  C6->>C6: freshness gate + sha256 check
  C6->>FS: write jpeg + sidecar
  C6->>DB: insert row

Dependencies

  • E-BOOT, E-CC-LOG, E-CC-CONF.
  • External: PostgreSQL 16, FAISS.

Acceptance criteria

  • AC-8.1: filesystem layout byte-identical to satellite-provider for the same coordinate (C6-IT-01).
  • AC-8.2 / AC-NEW-6: per-sector freshness gate rejects in active_conflict, downgrade-flags in stable_rear (C6-IT-02 / C6-IT-05).
  • AC-8.4: every mid-flight tile carries quality_metadata (C6-IT-03).
  • AC-NEW-3: peak F4 burst (5 Hz, 100 tiles) writes without dropping (C6-IT-04).
  • RESTRICT-SAT-2: 10 GB cap enforced with LRU eviction, every eviction logged (C6-IT-06).
  • Defensive: SHA-256 mismatch rejects insert (C6-ST-01).
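
One way to picture the RESTRICT-SAT-2 criterion is a byte-budgeted LRU index that records every eviction. The sketch below is an assumption-laden illustration: the class name `LruByteBudget` and its methods are hypothetical, and it tracks sizes only in memory, whereas the real cache would also delete files and rows.

```python
from collections import OrderedDict


class LruByteBudget:
    """Tracks tile sizes in least-recently-used order under a byte budget."""

    def __init__(self, budget_bytes: int) -> None:
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self._sizes = OrderedDict()   # tile_id -> size, oldest first
        self.eviction_log = []        # every eviction is recorded here

    def touch(self, tile_id: str) -> None:
        """Mark a tile as recently read."""
        if tile_id in self._sizes:
            self._sizes.move_to_end(tile_id)

    def add(self, tile_id: str, size_bytes: int) -> None:
        """Insert a tile, evicting least-recently-used tiles if over budget."""
        if tile_id in self._sizes:
            self.used_bytes -= self._sizes.pop(tile_id)
        self._sizes[tile_id] = size_bytes
        self.used_bytes += size_bytes
        # Never evict the tile that was just added.
        while self.used_bytes > self.budget_bytes and len(self._sizes) > 1:
            victim, victim_size = self._sizes.popitem(last=False)
            self.used_bytes -= victim_size
            self.eviction_log.append(victim)
```

With the 10 GB cap from this epic, `budget_bytes` would be `10 * 1024**3` (or `10 * 10**9`, depending on which definition of GB the component docs intend).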

Non-functional requirements

  • Per-tile read p95 (warm mmap) ≤ 0.5 ms; cold ≤ 50 ms (C6-PT-01).

Risks & mitigations

  • R08 (freshness drift in active_conflict) — write-side gate is the primary mitigation.

Effort

T-shirt M; 13-21 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | Postgres tiles schema + migration | 3 |
| 2 | Filesystem JPEG store byte-identical to satellite-provider | 3 |
| 3 | FAISS HNSW load/save + mmap | 3 |
| 4 | Freshness gate + sector classification | 3 |
| 5 | 10 GB LRU eviction with logging | 3 |
| 6 | Component-internal tests C6-IT-01..06 + C6-PT-01 + C6-ST-01 | 5 |

Key constraints

  • AC-8.1, AC-8.2, AC-NEW-6, RESTRICT-SAT-2, RESTRICT-UAV-4.

Testing strategy

Per components/08_c6_tile_cache/tests.md.


E-C11 — C11 Tile Manager (TileDownloader + TileUploader)

Tracker: AZ-251 | Type: component | T-shirt: M | Story points: 13-21

System context

flowchart LR
  SP[satellite-provider] -->|GET| DL[C11 TileDownloader]
  DL --> C6[C6 cache]
  C6 --> UP[C11 TileUploader]
  UP -->|POST /ingest| SP
  classDef airborne fill:#fee
  classDef operator fill:#cef
  class DL,UP operator

Problem / Context

Sole operator-side network I/O against satellite-provider, both directions. Strict ADR-004: never loaded into the airborne companion image. Bundled because download + upload share auth, HTTP client, deployment unit, and the airborne-exclusion property.

Scope

In scope: TileDownloader.fetch (download → freshness gate → write to C6), TileUploader.upload_pending (read C6 pending → sign → POST → mark uploaded), per-flight ephemeral signing key, idempotent retry on partial-success batches, flight_state == ON_GROUND gate (defense-in-depth atop ADR-004).

Out of scope: any airborne code; cache artifact build (E-C10); orchestration (E-C12).

Architecture notes

  • File: components/12_c11_tilemanager/description.md.
  • ADR-004 enforcement via E-BOOT's SBOM diff + runtime self-check.
  • Test substitute: e2e-test mock-suite-sat-service fixture under tests/fixtures/ (R01).

Interface specification

class TileDownloader:
    def fetch(self, req: FetchRequest) -> DownloadBatchReport: ...

class TileUploader:
    def upload_pending(self, flight_state: FlightStateSignal) -> UploadBatchReport: ...
    # raises UploadGateBlockedError if flight_state != ON_GROUND

Data flow

sequenceDiagram
  participant Op as Operator
  participant DL as TileDownloader
  participant SP as satellite-provider
  participant C6 as C6
  Op->>DL: fetch(area, sector_classification)
  DL->>SP: GET tiles
  SP-->>DL: tiles + metadata
  DL->>C6: insert (after freshness gate)
  DL-->>Op: DownloadBatchReport

Dependencies

  • E-C6, E-CC-CONF, E-CC-LOG.
  • External: real satellite-provider (download); D-PROJ-2 endpoint OR e2e-test fixture (upload).

Acceptance criteria

  • C11-IT-01: TileDownloader fetch + freshness gate + C6 write byte-identical layout.
  • C11-IT-02: stale-rejection counts surface in DownloadBatchReport.
  • C11-IT-03: TileUploader posts pending, signs payloads, marks uploaded on 202.
  • C11-IT-04: UploadGateBlockedError when not ON_GROUND.
  • C11-IT-05: idempotent retry — already-acked tiles not re-sent.
  • C11-ST-01: airborne process cannot import c11_tilemanager (R02 enforcement).
  • C11-ST-02: NFT-SEC-02 network-egress test passes.
  • C11-ST-03: per-flight key zeroised after upload.
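
The idempotent-retry criterion (C11-IT-05) can be illustrated with a small loop that skips tiles already acknowledged. This is a sketch under stated assumptions: the free function, the `acked` set (standing in for `TileStore.mark_uploaded`), the `post` callable, `max_attempts` and the report keys are all hypothetical. Only the 202 success status is taken from C11-IT-03.

```python
from typing import Callable, Iterable


def upload_pending(
    pending: Iterable[str],
    acked: set,
    post: Callable[[str], int],
    max_attempts: int = 3,
) -> dict:
    """Post each pending tile; tiles already acknowledged are never re-sent."""
    report = {"sent": 0, "skipped": 0, "failed": 0}
    for tile_id in pending:
        if tile_id in acked:
            report["skipped"] += 1
            continue
        for _ in range(max_attempts):
            if post(tile_id) == 202:
                acked.add(tile_id)  # stands in for TileStore.mark_uploaded
                report["sent"] += 1
                break
        else:
            # All attempts failed; the tile stays pending for the next run.
            report["failed"] += 1
    return report
```

Because acknowledgement is recorded per tile rather than per batch, re-running after a partial success only touches the tiles that failed.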

Non-functional requirements

  • Download throughput ≥ 50 MB/s on 1 Gbps link (C11-PT-01).
  • Upload throughput ≥ 20 tile/s with signing (C11-PT-02).

Risks & mitigations

  • R01 (D-PROJ-2 not yet shipped) — TileUploader works against the e2e-test fixture; the fixture is retired once the real production endpoint lands.
  • R02 (ADR-004 break) — three enforcement gates; C11 tests verify each.
  • R09 (key compromise) — per-flight ephemeral keys; voting layer for compromise detection.

Effort

T-shirt M; 13-21 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | TileDownloader: GET + freshness gate + C6 write | 5 |
| 2 | TileUploader: read pending + sign + POST + mark uploaded | 5 |
| 3 | Idempotent retry on partial-success batch | 3 |
| 4 | flight_state == ON_GROUND gate (defense-in-depth) | 2 |
| 5 | Per-flight ephemeral signing key + zeroisation | 3 |
| 6 | Component-internal tests C11-IT-01..05 + C11-PT-01..02 + C11-ST-01..03 + C11-AT-01 | 5 |

Key constraints

  • ADR-004, RESTRICT-SAT-1 (no in-flight Service calls), AC-8.3, AC-8.4, AC-NEW-6.

Testing strategy

Per components/12_c11_tilemanager/tests.md.


E-C10 — C10 Pre-flight Cache Provisioning

Tracker: AZ-252 | Type: component | T-shirt: M | Story points: 13-21

System context

flowchart LR
  C6[C6 already populated by C11] --> C10
  C10 --> ENGINES[TRT engines]
  C10 --> DESCS[FAISS descriptors]
  C10 --> MAN[signed Manifest]
  ENGINES & DESCS & MAN --> AIRBORNE[airborne image at F2 takeoff]

Problem / Context

Build model-derived artifacts from an already-populated C6: TRT engines, VPR descriptors (calling C2's embed_query over the corpus), the signed Manifest with content-hashes. Idempotent re-run on unchanged C6.

Scope

In scope: CacheProvisioner.build_artifacts, ManifestVerifier.verify, idempotence (D-C10-1), Manifest covers every shipped artifact, hardware-tied engine compile (D-C10-6), filename schema (D-C10-7), operator-key requirement.

Out of scope: tile fetch (E-C11), tile cache writes (E-C6), engine deserialisation (E-C7).

Architecture notes

  • File: components/11_c10_provisioning/description.md.
  • C10 narrowed in this Plan cycle: it does NOT talk to satellite-provider. Tiles must be present in C6 before C10 runs.

Interface specification

class CacheProvisioner:
    def build_artifacts(self, corpus_root: Path, key_path: Path) -> BuildReport: ...

class ManifestVerifier:
    def verify(self, manifest: Path, public_key: PublicKey) -> ManifestVerdict: ...

Data flow

sequenceDiagram
  participant Op as Operator
  participant C10 as CacheProvisioner
  participant C2 as C2 embed_query
  participant FS as Filesystem
  Op->>C10: build_artifacts(corpus_root, key)
  C10->>C2: embed every tile
  C2-->>C10: descriptors
  C10->>FS: write engines + faiss + manifest
  C10->>FS: sign manifest with operator key
  C10-->>Op: BuildReport

Dependencies

  • E-C6, E-C7, E-CC-LOG.

Acceptance criteria

  • C10-IT-01: end-to-end build produces engines + descriptors + signed Manifest.
  • C10-IT-02: ManifestVerifier rejects tampered or wrong-key Manifests.
  • C10-IT-03: idempotent re-run — same hash, no recompile (D-C10-1).
  • C10-IT-04: ManifestCoverageError on orphan files (no smuggled artifacts).
  • C10-IT-05: Tier-2 build produces SM 87 / JP 6.2 / TRT 10.3 / FP16 engines (D-C10-6).
  • C10-ST-01: build refuses dev-key signing in operator mode.
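
The content-hash table and the orphan check behind C10-IT-03 and C10-IT-04 might look roughly like this. The sketch works on an in-memory mapping of artifact name to bytes so it stays self-contained; the function names are hypothetical, and signing is deliberately left out.

```python
import hashlib


class ManifestCoverageError(Exception):
    pass


def build_hash_table(artifacts: dict) -> dict:
    """Map each artifact name to the SHA-256 hex digest of its bytes."""
    return {
        name: hashlib.sha256(data).hexdigest()
        for name, data in sorted(artifacts.items())
    }


def check_coverage(artifacts: dict, manifest: dict) -> None:
    """Reject any shipped artifact that the manifest does not list."""
    orphans = sorted(set(artifacts) - set(manifest))
    if orphans:
        raise ManifestCoverageError(f"artifacts missing from manifest: {orphans}")
```

Since the table depends only on artifact names and bytes, rebuilding it from unchanged inputs yields the same digests, which is the property an idempotent re-run can compare against before deciding to recompile.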

Non-functional requirements

  • Cold build wall-clock ≤ 12 min on developer laptop with NVIDIA GPU; warm idempotent re-run ≤ 1 min (C10-PT-01).

Risks & mitigations

  • R04 (engine cache hardware-tied) — owner of the build side; deserialise side is C7.

Effort

T-shirt M; 13-21 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | TRT engine compile (per-model) | 5 |
| 2 | FAISS descriptor population via C2's embed path | 3 |
| 3 | Signed Manifest builder + content-hash table | 3 |
| 4 | ManifestVerifier with operator-key requirement | 3 |
| 5 | Idempotent re-run + ManifestCoverageError | 3 |
| 6 | Component-internal tests C10-IT-01..05 + C10-PT-01 + C10-ST-01 | 5 |

Key constraints

  • AC-8.3, AC-NEW-1, D-C10-1 / D-C10-3 / D-C10-6 / D-C10-7.

Testing strategy

Per components/11_c10_provisioning/tests.md.


E-C12 — C12 Operator Pre-flight Orchestrator

Tracker: AZ-253 | Type: component | T-shirt: M | Story points: 13-21

System context

flowchart LR
  CLI[operator-orchestrator CLI]
  CLI --> C11D[C11 TileDownloader]
  CLI --> C10[C10 CacheProvisioner]
  CLI --> C11U[C11 TileUploader]
  CLI --> RELOC[AC-3.4 re-loc workflow]
  CLI --> FDR[FDR retrieval]

Problem / Context

Operator-facing CLI that sequences pre-flight (C11 download → C10 build) and post-landing (C11 upload), surfaces actionable failures, and handles the AC-3.4 re-localization workflow. Delivered as part of the operator-orchestrator tarball.

Scope

In scope: CLI subcommands (download, build-cache, upload-pending, reloc-confirm), CacheBuildReport aggregation, post-landing flight_state == ON_GROUND confirmation from FDR, sector-classification UI hook, FDR retrieval helpers.

Out of scope: actual download/upload (E-C11); engine compile (E-C10); FDR write side (E-C13).

Architecture notes

  • File: components/13_c12_operator_orchestrator/description.md.
  • Strict process boundary: C12 is operator-side only, in the same image as C11, but never airborne.

Interface specification

class OperatorTool:
    def build_cache(self, area: Area, sector_classification: SectorMap) -> CacheBuildReport: ...
    def trigger_post_landing_upload(self, fdr_root: Path) -> UploadBatchReport: ...
    def confirm_relocation(self, candidate: ReLocCandidate) -> None: ...

Data flow

sequenceDiagram
  participant Op as Operator
  participant C12 as OperatorTool
  participant C11 as C11
  participant C10 as C10
  Op->>C12: build_cache(area)
  C12->>C11: TileDownloader.fetch
  C11-->>C12: DownloadBatchReport
  C12->>C10: build_artifacts
  C10-->>C12: BuildReport
  C12-->>Op: CacheBuildReport

Dependencies

  • E-C10, E-C11, E-CC-LOG.

Acceptance criteria

  • C12-IT-01: operator re-loc workflow returns SUT to satellite_anchored ≤ 30 s (AC-3.4).
  • C12-IT-02: build_cache orchestrates C11 then C10; download failure aborts before C10.
  • C12-IT-03: trigger_post_landing_upload requires ≥ 30 s confirmed ON_GROUND in FDR.
  • C12-IT-04: actionable failure messages + non-zero exit on stale-tile rate > 30% or manifest signature failure.
  • C12-ST-01: no CLI command path imports into airborne package boundary.
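
The ordering and failure rules in C12-IT-02 and C12-IT-04 can be sketched as a small orchestration function. Everything below is an assumption made for illustration: the report fields, the exit-code values, the choice to check the stale-tile rate before starting C10, and the definition of that rate as rejected over total tiles. Only the 30% limit and the "download failure aborts before C10" rule come from the criteria above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DownloadBatchReport:
    fetched: int
    stale_rejected: int
    ok: bool = True


@dataclass
class CacheBuildReport:
    exit_code: int
    message: str
    built: bool = False


STALE_RATE_LIMIT = 0.30  # limit from C12-IT-04


def build_cache(
    fetch: Callable[[], DownloadBatchReport],
    build_artifacts: Callable[[], bool],
) -> CacheBuildReport:
    """Run the C11 download, then the C10 build, stopping at the first failure."""
    dl = fetch()
    if not dl.ok:
        return CacheBuildReport(1, "download failed; C10 build not started")
    total = dl.fetched + dl.stale_rejected
    stale_rate = dl.stale_rejected / total if total else 0.0
    if stale_rate > STALE_RATE_LIMIT:
        return CacheBuildReport(2, f"stale-tile rate {stale_rate:.0%} exceeds 30%")
    if not build_artifacts():
        return CacheBuildReport(3, "artifact build or manifest signing failed")
    return CacheBuildReport(0, "cache built", built=True)
```

A CLI wrapper would print `message` and pass `exit_code` to `sys.exit`, which gives the operator both the actionable text and the non-zero status.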

Non-functional requirements

  • End-to-end build_cache wall-clock ≤ 18 min on developer laptop with NVIDIA GPU (C12-PT-01).

Risks & mitigations

  • R08 (freshness drift) — actionable failure surfacing in CacheBuildReport.

Effort

T-shirt M; 13-21 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | CLI scaffolding + subcommand routing | 3 |
| 2 | build_cache orchestration (C11 then C10) | 3 |
| 3 | trigger_post_landing_upload with FDR-state confirmation | 3 |
| 4 | AC-3.4 re-localization workflow | 3 |
| 5 | Actionable failure surfacing in CacheBuildReport | 2 |
| 6 | Component-internal tests C12-IT-01..04 + C12-PT-01 + C12-ST-01 + C12-AT-01 | 5 |

Key constraints

  • ADR-004 (C12 lives operator-side); AC-3.4, AC-8.3, AC-8.4.

Testing strategy

Per components/13_c12_operator_orchestrator/tests.md.


E-C1 — C1 Visual / Visual-Inertial Odometry

Tracker: AZ-254 | Type: component | T-shirt: XL | Story points: 34-55

System context

flowchart LR
  NAVCAM[Nav camera 3 Hz] --> C1
  C8IMU[C8 ImuWindow 100-200 Hz] --> C1
  CAL[CameraCalibration] --> C1
  C1 --> C5[C5 StateEstimator]

Problem / Context

Per-frame relative pose SE(3) + 6×6 covariance + IMU bias estimate from nav-camera + FC IMU. Three pluggable strategies (Okvis2 production-default, VinsMono research-only, KltRansac mandatory simple-baseline) selected at startup, build-time gated, never hot-swappable. Largest single epic by complexity.

Scope

In scope: VioStrategy interface + the three concrete strategies, ImuPreintegrator helper, warm-start path (AC-5.1), reboot recovery (AC-5.3), KltRansac as the simple-baseline AC-2.1a check, honest covariance under degradation.

Out of scope: state fusion (E-C5), pose estimation (E-C4), satellite anchoring (E-C2/C3/C4 chain).

Architecture notes

  • File: components/01_c1_vio/description.md.
  • Strategy + composition root + build-time exclusion (ADR-001 / ADR-002 / ADR-009).
  • C++ strategies via pybind11; KltRansac thin Python wrapper around OpenCV.
  • ImuPreintegrator shared with E-C5 (built once, used twice).

Interface specification

class VioStrategy(Protocol):
    def process_frame(self, frame: NavCameraFrame, imu: ImuWindow, cal: CameraCalibration) -> VioOutput: ...
    def reset_to_warm_start(self, pose: WarmStartPose) -> None: ...
    def health_snapshot(self) -> VioHealth: ...

DTOs in components/01_c1_vio/description.md § 2.

Data flow

sequenceDiagram
  participant CAM as Nav camera
  participant C1 as VioStrategy
  participant C5 as C5
  participant FDR as FDR
  CAM->>C1: NavCameraFrame
  C1->>C1: IMU preintegrate + feature tracking
  C1->>C5: VioOutput (relative pose + 6x6 cov + bias)
  C1->>FDR: VioHealth (ERROR + WARN; DEBUG to stdout)

Dependencies

  • E-BOOT, E-CC-FDR-CLIENT, E-C7 (only for the simple-baseline KltRansac path; OKVIS2 / VinsMono are CPU-bound, not GPU).

Acceptance criteria

  • C1-IT-01: honest cov norm rises monotonically under feature-loss event (AC-1.3 / AC-1.4).
  • C1-IT-02: VioOutput schema invariants — SPD covariance + matched frame_id (AC-1.4).
  • C1-IT-03: KltRansac ≥ 95% tracked-frame ratio on Derkachi normal segment (AC-2.1a engine rule).
  • C1-IT-04: MRE p95 < 1 px frame-to-frame for Okvis2 + KltRansac (AC-2.2).
  • C1-IT-05: warm-start converges within 5 frames (AC-5.1).
  • C1-IT-06: F8 reboot recovery from warm-start hint without fake confidence (AC-5.3).
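
The SPD half of the C1-IT-02 invariant can be checked with a Cholesky attempt: a symmetric matrix is positive definite exactly when the factorisation succeeds with strictly positive pivots. The helper below is a dependency-free sketch; the name `is_spd` and the tolerance are assumptions, and a real test would more likely call a linear-algebra library.

```python
import math


def is_spd(m, tol: float = 1e-12) -> bool:
    """Return True if the square matrix m is symmetric positive definite."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    # Symmetry check on the strictly lower triangle.
    if any(abs(m[i][j] - m[j][i]) > tol for i in range(n) for j in range(i)):
        return False
    # Cholesky factorisation; a non-positive pivot means m is not SPD.
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True
```

Applied to the 6×6 covariance in `VioOutput`, a failure here indicates a schema violation rather than merely a large uncertainty.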

Non-functional requirements

  • C1-PT-01: process_frame p95 ≤ 80 ms (Okvis2) at 3 Hz on Tier-2 with C2 backbone running concurrently; throughput ≥ 3 Hz sustained.
  • CPU ≤ 30% one core; memory ≤ 1.5 GB resident.

Risks & mitigations

  • R10 (latency under thermal throttle) — C1's budget partition is fixed; thermal-driven hybrid lives in C4.
  • R12 (single deployment camera) — KltRansac engine-rule path stays camera-agnostic; comparative IT-12 study uses static fixtures.

Effort

T-shirt XL; 34-55 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | VioStrategy interface + composition wiring | 3 |
| 2 | OKVIS2 strategy (pybind11 binding + integration) | 5 |
| 3 | VinsMono strategy (research-only; behind BUILD_VINS_MONO) | 5 |
| 4 | KltRansac simple-baseline strategy | 5 |
| 5 | ImuPreintegrator helper (shared with C5) | 3 |
| 6 | Warm-start + F8 reboot recovery paths | 3 |
| 7 | Honest-covariance contract tests | 3 |
| 8 | Component-internal tests C1-IT-01..06 + C1-PT-01 | 5 |

Key constraints

  • AC-1.3, AC-1.4, AC-2.1a, AC-2.2, AC-4.1, AC-5.1, AC-5.3; RESTRICT-UAV-3 (sharp turns < 5% overlap).

Testing strategy

Per components/01_c1_vio/tests.md + suite-level FT-P-02 / FT-P-04 / FT-P-05.


E-C2 — C2 Visual Place Recognition

Tracker: AZ-255 | Type: component | T-shirt: L | Story points: 21-34

System context

flowchart LR
  CAM[Nav camera] --> C2
  C7[C7 backbone] --> C2
  C6[C6 FAISS index] --> C2
  C2 --> C25[C2.5 Re-rank]

Problem / Context

Top-K=10 candidate retrieval from the pre-cached corpus by descriptor similarity. UltraVPR primary, MegaLoc secondary, NetVLAD mandatory simple-baseline. Boundary between cheap retrieval and expensive matching.

Scope

In scope: VprStrategy + multiple backbones, FAISS HNSW lookup, descriptor pre-processing (resize/crop/normalise), L2 normalisation via DescriptorNormaliser, descriptor population entry-point used by C10.

Out of scope: re-rank (E-C2.5), matching (E-C3), index build (E-C10).

Architecture notes

  • File: components/02_c2_vpr/description.md.
  • Strategy + ADR-001/002/009.

Interface specification

class VprStrategy(Protocol):
    def embed_query(self, frame: NavCameraFrame, cal: CameraCalibration) -> VprQuery: ...
    def retrieve_topk(self, query: VprQuery, k: int) -> VprResult: ...
    def descriptor_dim(self) -> int: ...

Data flow

sequenceDiagram
  participant CAM as Nav camera
  participant C2 as VprStrategy
  participant C7 as C7
  participant C6 as FAISS
  CAM->>C2: NavCameraFrame
  C2->>C7: backbone forward
  C7-->>C2: embedding
  C2->>C6: HNSW search k=10
  C6-->>C2: candidates
  C2-->>C25: VprResult
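
To make the retrieval step concrete, here is a brute-force stand-in for the HNSW search: descriptors are L2-normalised and the k nearest corpus entries are returned in ascending distance order. It is only a sketch of the contract (sorted distances, at most k results); the real path uses a FAISS HNSW index, and the function names and the dict-based corpus are assumptions.

```python
import heapq
import math


def l2_normalise(v):
    """Scale a descriptor to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-norm descriptor")
    return [x / norm for x in v]


def retrieve_topk(query, corpus, k=10):
    """Return up to k (tile_id, distance) pairs, nearest first."""
    q = l2_normalise(query)
    scored = []
    for tile_id, desc in corpus.items():
        d = l2_normalise(desc)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d)))
        scored.append((dist, tile_id))
    # nsmallest returns results already sorted by (distance, tile_id).
    return [(tile_id, dist) for dist, tile_id in heapq.nsmallest(k, scored)]
```

An exact search like this is also a convenient reference when measuring recall@10 of an approximate index on a small fixture.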

Dependencies

  • E-C6, E-C7, E-CC-FDR-CLIENT.

Acceptance criteria

  • C2-IT-01: UltraVPR recall@10 ≥ 0.95; NetVLAD ≥ 0.85 on Derkachi (AC-2.1b + engine rule).
  • C2-IT-02: VprResult invariants (length, sorted distances, label).
  • C2-IT-03: poisoned-tile top-1 rate within AC-NEW-7 relaxed CI.
  • C2-IT-04: scale-ratio ±20% recall@10 ≥ 0.85 (AC-8.6 scale half).
  • C2-ST-01: index handle invalidation rejected with IndexUnavailableError.

Non-functional requirements

  • C2-PT-01: embed_query p95 ≤ 60 ms; retrieve_topk p95 ≤ 2 ms; combined ≤ 65 ms (AC-4.1 partition).
  • GPU ≤ 600 MB resident; system mem ≤ 200 MB for index handle.

Risks & mitigations

  • R06 (VPR top-1 false positive) — C2.5 + C3 + AC-NEW-7 downstream.

Effort

T-shirt L; 21-34 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | VprStrategy interface + composition | 3 |
| 2 | UltraVPR backbone (TRT) | 5 |
| 3 | MegaLoc, MixVPR, SelaVPR, EigenPlaces secondary backbones | 5 |
| 4 | NetVLAD mandatory simple-baseline | 3 |
| 5 | FAISS HNSW load + lookup wiring | 3 |
| 6 | DescriptorNormaliser helper (shared with C10) | 2 |
| 7 | Component-internal tests C2-IT-01..04 + C2-PT-01 + C2-ST-01 | 5 |

Key constraints

  • AC-2.1b, AC-2.2, AC-4.1, AC-8.6, AC-NEW-7.

Testing strategy

Per components/02_c2_vpr/tests.md.


E-C2.5 — C2.5 Inlier-based Re-rank

Tracker: AZ-256 | Type: component | T-shirt: S | Story points: 5-8

System context

flowchart LR
  C2[C2 K=10] --> C25
  C7[C7 LightGlueRuntime helper] --> C25
  C6[C6 tile pixels] --> C25
  C25 --> C3[C3 N=3]

Problem / Context

K=10 → N=3 by single-pair LightGlue inlier count. Boundary between cheap retrieval and expensive matching. Shares LightGlueRuntime helper with C3 (R14 — owned by helper, not by either component).

Scope

In scope: ReRankStrategy + InlierCountReRanker, drop-and-continue on per-candidate failure.

Out of scope: matching itself (E-C3); LightGlue runtime ownership (the helper is its own module).

Architecture notes

  • File: components/03_c2_5_rerank/description.md.
  • Helper-ownership decision documented in R14 / risk_mitigations.md.

Interface specification

class ReRankStrategy(Protocol):
    def rerank(self, frame: NavCameraFrame, vpr_result: VprResult, n: int) -> RerankResult: ...

Data flow

sequenceDiagram
  participant C2 as C2
  participant C25 as C2.5
  participant LG as LightGlueRuntime helper
  participant C6 as C6
  C2->>C25: VprResult (k=10)
  loop 10 candidates
    C25->>C6: get_tile_pixels
    C25->>LG: single-pair inlier count
    LG-->>C25: inlier count
  end
  C25-->>C3: top-N=3 by inlier count
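
The loop in the diagram above, including drop-and-continue, can be sketched in a few lines. The `count_inliers` callable stands in for the tile fetch plus single-pair LightGlue pass, and breaking ties by the original VPR rank is an assumption of this example rather than something the epic specifies.

```python
from typing import Callable


class RerankBackboneError(Exception):
    pass


def rerank(candidates: list, count_inliers: Callable[[str], int], n: int = 3) -> list:
    """Keep the n candidates with the most inliers, skipping failed ones."""
    scored = []
    for rank, tile_id in enumerate(candidates):
        try:
            inliers = count_inliers(tile_id)
        except RerankBackboneError:
            continue  # drop-and-continue: one bad candidate must not abort the frame
        scored.append((-inliers, rank, tile_id))
    scored.sort()  # most inliers first; ties fall back to the original VPR rank
    return [(tile_id, -neg) for neg, _rank, tile_id in scored[:n]]
```

If every candidate fails, the result is an empty list, and the caller decides how to treat a frame with nothing to match.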

Dependencies

  • E-C2, E-C7, E-C6, shared LightGlueRuntime helper (with C3).

Acceptance criteria

  • C2.5-IT-01: top-1 promotion rate ≥ 0.98 (rerank rarely overrides correct C2 top-1).
  • C2.5-IT-02: drop-and-continue on per-candidate RerankBackboneError.
  • C2.5-IT-03: shared LightGlueRuntime serial-access invariant (no deadlock; bit-identical to single-threaded).

Non-functional requirements

  • C2.5-PT-01: rerank p95 ≤ 80 ms for 10 single-pair LightGlue passes; engine reuse single instance across calls.
  • GPU mem ≤ 300 MB shared LightGlue engine.

Risks & mitigations

  • R14 (apparent C2.5↔C3 cycle) — resolved this iteration via helper ownership.

Effort

T-shirt S; 5-8 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | InlierCountReRanker + drop-and-continue | 3 |
| 2 | Shared LightGlueRuntime helper module | 3 |
| 3 | Component-internal tests C2.5-IT-01..03 + C2.5-PT-01 | 2 |

Key constraints

  • AC-2.1b, AC-4.1, AC-NEW-7.

Testing strategy

Per components/03_c2_5_rerank/tests.md.


E-C3 — C3 Cross-Domain Matcher

Tracker: AZ-257 | Type: component | T-shirt: L | Story points: 21-34

System context

flowchart LR
  C25[C2.5 N=3] --> C3
  C7[C7] --> C3
  CAL[CameraCalibration] --> C3
  C6[C6 tiles] --> C3
  C3 --> C35[C3.5 AdHoP]

Problem / Context

2D-3D correspondences between nav-camera and the top-N=3 satellite tiles, with RANSAC inliers + reprojection residual. Dominant compute cost in F3. Backbone choice locked (DISK+LightGlue per D-C3-1 = (a)) pending IT-12 verdict.

Scope

In scope: CrossDomainMatcher + DISK+LightGlue (primary) + ALIKED+LightGlue (secondary) + XFeat (alternate); RANSAC + reprojection residual via RansacFilter helper; InsufficientInliersError propagation.

Out of scope: refinement (E-C3.5); pose estimation (E-C4); LightGlue runtime ownership (helper).

Architecture notes

  • File: components/04_c3_matcher/description.md.

Interface specification

class CrossDomainMatcher(Protocol):
    def match(self, frame: NavCameraFrame, rerank: RerankResult, cal: CameraCalibration) -> MatchResult: ...
    def health_snapshot(self) -> MatcherHealth: ...

Data flow

sequenceDiagram
  participant C25 as C2.5
  participant C3 as C3
  participant C7 as C7
  C25->>C3: RerankResult (n=3)
  loop 3 candidates
    C3->>C7: backbone forward
    C3->>C3: RANSAC + residual
  end
  C3-->>C35: MatchResult (best by inlier count)

Dependencies

  • E-C2.5, E-C7, shared LightGlueRuntime helper, shared RansacFilter helper.

Acceptance criteria

  • C3-IT-01: best-candidate inlier count p5 ≥ 80 (AC-1.1 partition).
  • C3-IT-02: deterministic best_candidate_idx == argmax(inlier_count) with deterministic tie-break.
  • C3-IT-03: cross-domain MRE p95 < 2.5 px (AC-2.2).
  • C3-IT-04: tilt ±20° + 350 m outliers — inlier count p10 ≥ 40 (AC-3.1).
  • C3-IT-05: InsufficientInliersError propagation when all N=3 fail.
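
C3-IT-02 and C3-IT-05 together describe a small selection rule, sketched below. The `MIN_INLIERS` floor and the lowest-index tie-break are assumptions chosen for the example; the epic only requires that the tie-break be deterministic.

```python
class InsufficientInliersError(Exception):
    pass


MIN_INLIERS = 12  # hypothetical floor; the real threshold is not stated here


def best_candidate_idx(inlier_counts, min_inliers: int = MIN_INLIERS) -> int:
    """Index of the candidate with the most inliers; lowest index wins ties."""
    viable = [(count, -idx) for idx, count in enumerate(inlier_counts) if count >= min_inliers]
    if not viable:
        raise InsufficientInliersError(f"no candidate reached {min_inliers} inliers")
    _count, neg_idx = max(viable)  # max on (count, -idx) prefers the lower index on ties
    return -neg_idx
```

Encoding the tie-break in the sort key keeps the function free of any dependence on iteration order, so repeated runs on the same input always pick the same candidate.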

Non-functional requirements

  • C3-PT-01: match p95 ≤ 180 ms; per-candidate ≤ 60 ms; throughput ≥ 3 Hz; GPU mem ≤ 800 MB combined.

Risks & mitigations

  • R06 (false positive) — RANSAC + residual + downstream AC-NEW-7.
  • R10 (Marginals under throttle) — D-CROSS-LATENCY-1 hybrid touches C4 not C3 (C3's budget is fixed).

Effort

T-shirt L; 21-34 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | CrossDomainMatcher interface + composition | 3 |
| 2 | DISK+LightGlue primary | 5 |
| 3 | ALIKED+LightGlue secondary | 3 |
| 4 | XFeat alternate (lightweight) | 3 |
| 5 | RansacFilter helper (shared C3/C3.5/C4) | 3 |
| 6 | Component-internal tests C3-IT-01..05 + C3-PT-01 | 5 |

Key constraints

  • AC-1.1, AC-2.2, AC-3.1, AC-4.1.

Testing strategy

Per components/04_c3_matcher/tests.md.


E-C3.5 — C3.5 AdHoP-Conditional Refinement

Tracker: AZ-258 | Type: component | T-shirt: M | Story points: 8-13

System context

flowchart LR
  C3[C3 MatchResult] --> C35
  C7[C7 AdHoP backbone] --> C35
  C35 --> C4[C4 Pose]

Problem / Context

Conditional perspective preconditioning when residual exceeds threshold; passthrough otherwise. Preserves AC-4.1 budget on the steady-state path while keeping refinement for hard frames.

Scope

In scope: ConditionalRefiner + AdHoPRefiner + PassthroughRefiner (both linked); residual threshold configuration; passthrough fall-through on RefinerBackboneError.

Out of scope: matcher (E-C3); pose (E-C4).

Architecture notes

  • File: components/05_c3_5_adhop/description.md.
  • Both implementations linked into the deployment binary; runtime gate is a config knob.

Interface specification

class ConditionalRefiner(Protocol):
    def refine_if_needed(self, frame: NavCameraFrame, mr: MatchResult, threshold: float) -> MatchResult: ...
    def was_invoked(self) -> bool: ...

Data flow

sequenceDiagram
  participant C3 as C3
  participant C35 as C3.5
  participant C7 as C7
  C3->>C35: MatchResult (residual=R)
  alt R > threshold
    C35->>C7: AdHoP backbone forward
    C7-->>C35: refined correspondences
    C35-->>C4: enriched MatchResult
  else
    C35-->>C4: passthrough MatchResult
  end
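
The gate in the diagram above, together with the fall-through on `RefinerBackboneError`, can be sketched as follows. The simplified `MatchResult`, the injected `refine` callable and the choice to report `was_invoked() == True` even when the backbone fails are assumptions of this example; the `frame` argument from the interface is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable


class RefinerBackboneError(Exception):
    pass


@dataclass(frozen=True)
class MatchResult:
    correspondences: tuple
    residual_px: float


class ThresholdRefiner:
    """Invokes the refiner only when the residual exceeds the threshold."""

    def __init__(self, refine: Callable[[MatchResult], MatchResult]) -> None:
        self._refine = refine
        self._invoked = False

    def refine_if_needed(self, mr: MatchResult, threshold: float) -> MatchResult:
        self._invoked = False
        if mr.residual_px <= threshold:
            return mr  # steady-state path: passthrough
        self._invoked = True
        try:
            return self._refine(mr)
        except RefinerBackboneError:
            return mr  # fall through with the input object untouched

    def was_invoked(self) -> bool:
        return self._invoked
```

Returning the original object on failure, rather than a copy, is what makes the "bit-identical correspondences" check in C3.5-IT-02 trivially true.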

Dependencies

  • E-C3, E-C7.

Acceptance criteria

  • C3.5-IT-01: residual reduction ≥ 90% of invocations (AC-2.2 hard-frame portion).
  • C3.5-IT-02: passthrough fall-through on RefinerBackboneError with bit-identical correspondences.
  • C3.5-IT-03: invocation rate < 0.30 on Derkachi normal segment.

Non-functional requirements

  • C3.5-PT-01: invoked p95 ≤ 90 ms; passthrough p95 ≤ 0.5 ms; aggregated added latency ≤ 25 ms.

Risks & mitigations

  • R10 (latency under throttle) — threshold tunable via operator-orchestrator pre-flight.

Effort

T-shirt M; 8-13 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | AdHoPRefiner (TRT engine + perspective preconditioning) | 5 |
| 2 | PassthroughRefiner no-op | 1 |
| 3 | Conditional gate + passthrough fall-through | 2 |
| 4 | Component-internal tests C3.5-IT-01..03 + C3.5-PT-01 | 3 |

Key constraints

  • AC-2.2, AC-4.1.

Testing strategy

Per components/05_c3_5_adhop/tests.md.


E-C4 — C4 Pose Estimator

Tracker: AZ-259 | Type: component | T-shirt: M | Story points: 13-21

System context

flowchart LR
  C35[C3.5] --> C4
  CAL[CameraCalibration] --> C4
  C7[C7 ThermalState] --> C4
  C5GRAPH[C5 iSAM2 graph] --> C4
  C4 --> C5[C5 add_pose_anchor]

Problem / Context

Convert MatchResult into PoseEstimate (WGS84 + 6×6 covariance + provenance label). OpenCV solvePnPRansac + GTSAM Marginals for native 6×6; D-CROSS-LATENCY-1 hybrid degrades to Jacobian under thermal throttle.

Scope

In scope: OpenCVGtsamPoseEstimator, GTSAM Marginals integration with C5's iSAM2 graph, Jacobian fallback, per-frame thermal-state-driven mode switch, WgsConverter helper usage.

Out of scope: state fusion (E-C5); thermal telemetry source (E-C7).

Architecture notes

  • File: components/06_c4_pose/description.md.
  • ADR-003 shared substrate: C4 adds factors to C5's graph; co-developed.
  • ADR-006 (Jacobian fallback ~5-10% accuracy loss accepted under throttle).

Interface specification

class PoseEstimator(Protocol):
    def estimate(self, mr: MatchResult, cal: CameraCalibration, thermal: ThermalState) -> PoseEstimate: ...
    def current_covariance_mode(self) -> CovarianceMode: ...

Data flow

sequenceDiagram
  participant C35 as C3.5
  participant C4 as C4
  participant C5 as C5 graph
  participant C7 as C7 thermal
  C35->>C4: MatchResult
  C7-->>C4: ThermalState
  alt thermal.throttle
    C4->>C4: Jacobian covariance
  else
    C4->>C5: add factor
    C5->>C5: Marginals.marginalCovariance
    C5-->>C4: Sigma
  end
  C4-->>C5: PoseEstimate (add_pose_anchor)

Dependencies

  • E-C3.5, E-C5 (co-developed shared substrate), shared RansacFilter, shared WgsConverter, shared SE3Utils.

Acceptance criteria

  • C4-IT-01: WGS84 accuracy p80 ≤ 50 m, p50 ≤ 20 m on Derkachi (AC-1.1 / AC-1.2).
  • C4-IT-02: 6×6 SPD covariance + honest under inlier degradation (AC-1.4).
  • C4-IT-03: D-CROSS-LATENCY-1 mode switch within 1 frame (AC-NEW-5 workstation portion).
  • C4-IT-04: shared-graph integration with C5 — prior keyframe perturbations within tolerance.

Non-functional requirements

  • C4-PT-01: estimate p95 MARGINALS ≤ 90 ms; JACOBIAN ≤ 15 ms; switch ≤ 1 frame.

Risks & mitigations

  • R10 (Marginals throttle) — primary owner of the hybrid switch.

Effort

T-shirt M; 13-21 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | solvePnPRansac + IPPE wiring | 3 |
| 2 | GTSAM Marginals factor add to C5 graph | 5 |
| 3 | Jacobian-degraded fallback | 3 |
| 4 | Per-frame thermal-state-driven switch | 2 |
| 5 | WgsConverter helper (shared with C8) | 3 |
| 6 | Component-internal tests C4-IT-01..04 + C4-PT-01 | 3 |

Key constraints

  • AC-1.1, AC-1.2, AC-1.4, AC-4.1, AC-NEW-5.

Testing strategy

Per components/06_c4_pose/tests.md.


E-C5 — C5 State Estimator

Tracker: AZ-260 | Type: component | T-shirt: XL | Story points: 34-55

System context

flowchart LR
  C1[C1 VioOutput] --> C5
  C4[C4 PoseEstimate] --> C5
  C8I[C8 IMU/attitude/gps_health] --> C5
  C5 --> C8O[C8 outbound 5 Hz]
  C5 --> ORTHO[Orthorectifier → C6 mid-flight tile]
  C5 --> FDR[FDR smoothed history]

Problem / Context

Own GTSAM iSAM2 + IncrementalFixedLagSmoother (K=10-20). Fuse VIO + Pose + FC IMU into the posterior state; emit smoothed current frame to C8 + smoothed past keyframes to FDR (AC-4.5 revised, NOT FC retroactive). Spoof-promotion gate (AC-NEW-2 / AC-NEW-8). Largest epic alongside C1.

Scope

In scope: StateEstimator + GtsamIsam2StateEstimator (production-default) + EskfStateEstimator (mandatory simple-baseline); spoof-promotion gate; source-label state machine; smoothed history → FDR; AC-5.2 fallback path.

Out of scope: VIO (E-C1); pose (E-C4); FC adapter (E-C8); orthorectifier (lives within C5 as an internal subcomponent OR could split — kept inside C5 per the spec).

Architecture notes

  • File: components/07_c5_state/description.md.
  • ADR-003 (shared GTSAM substrate with C4); co-developed.
  • ADR-008 + spoof gate logic.

Interface specification

class StateEstimator(Protocol):
    def add_vio(self, o: VioOutput) -> None: ...
    def add_pose_anchor(self, p: PoseEstimate) -> None: ...
    def add_fc_imu(self, w: ImuWindow) -> None: ...
    def current_estimate(self) -> EstimatorOutput: ...
    def smoothed_history(self, n: int) -> list[EstimatorOutput]: ...
    def health_snapshot(self) -> EstimatorHealth: ...

Data flow

sequenceDiagram
  participant C1 as C1
  participant C4 as C4
  participant C8I as C8 inbound
  participant C5 as C5 iSAM2
  participant C8O as C8 outbound
  participant FDR as FDR
  C1->>C5: add_vio
  C4->>C5: add_pose_anchor (factor add)
  C8I->>C5: add_fc_imu
  C5->>C5: iSAM2 update + Marginals
  C5->>C8O: current_estimate (5 Hz)
  C5->>FDR: smoothed_history (per AC-4.5)

Dependencies

  • E-C1, E-C4, E-CC-FDR-CLIENT, E-C8 inbound side, shared ImuPreintegrator, SE3Utils, WgsConverter.

Acceptance criteria

  • C5-IT-01: last_satellite_anchor_age_ms reset/monotonic-rise (AC-1.3 binning).
  • C5-IT-02: smoothed-current honest covariance (AC-1.4).
  • C5-IT-03: VIO-only fallback under matcher failure (AC-3.5).
  • C5-IT-04: smoothed past-keyframes → FDR but NOT to FC stream (AC-4.5 revised).
  • C5-IT-05: 3 s no-estimate triggers AC-5.2 fallback.
  • C5-IT-06: spoof-promotion gate ≥ 10 s + visual consistency (AC-NEW-2).
  • C5-IT-07: visual blackout + spoof escalation (AC-NEW-8).
  • C5-ST-01: spoof-rejection logging cannot be silenced.
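
Two of the timing criteria above (C5-IT-01 and C5-IT-05) reduce to simple bookkeeping on timestamps, sketched here. The class name, the millisecond integer clock and the decision to start the 3 s timer at construction are assumptions made for illustration; the estimator itself is not modelled.

```python
from typing import Optional


class EstimateWatchdog:
    """Tracks anchor age and the no-estimate timeout from caller-supplied timestamps."""

    def __init__(self, start_ms: int, timeout_ms: int = 3000) -> None:
        self.timeout_ms = timeout_ms
        self._last_estimate_ms = start_ms
        self._last_anchor_ms: Optional[int] = None

    def on_estimate(self, now_ms: int, satellite_anchored: bool) -> None:
        """Record a new estimate; anchored estimates also reset the anchor age."""
        self._last_estimate_ms = now_ms
        if satellite_anchored:
            self._last_anchor_ms = now_ms

    def last_satellite_anchor_age_ms(self, now_ms: int) -> Optional[int]:
        """Age of the last satellite anchor, or None if there has been none."""
        if self._last_anchor_ms is None:
            return None
        return now_ms - self._last_anchor_ms

    def fallback_required(self, now_ms: int) -> bool:
        """True once no estimate has arrived for timeout_ms or longer."""
        return now_ms - self._last_estimate_ms >= self.timeout_ms
```

Passing `now_ms` in explicitly keeps the logic testable with a replayed clock instead of wall time.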

Non-functional requirements

  • C5-PT-01: add_pose_anchor + current_estimate p95 ≤ 60 ms; memory ≤ 100 MB resident.

Risks & mitigations

  • R05 (iSAM2 silent factor-add failure) — every add logs its success/failure result.
  • R07 (spoof premature promotion) — primary owner of the gate.

Effort

T-shirt XL; 34-55 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | StateEstimator interface + composition | 3 |
| 2 | iSAM2 + IncrementalFixedLagSmoother K=10-20 wiring | 5 |
| 3 | BetweenFactorPose3 (VIO) + GenericProjectionFactorCal3DS2 (pose) | 5 |
| 4 | Marginals.marginalCovariance integration | 3 |
| 5 | Source-label state machine + spoof-promotion gate | 5 |
| 6 | EskfStateEstimator mandatory simple-baseline | 5 |
| 7 | Smoothed-history → FDR path (NOT to FC) | 3 |
| 8 | AC-5.2 fallback path | 3 |
| 9 | Orthorectifier → C6 mid-flight tile gen sub-path | 3 |
| 10 | Component-internal tests C5-IT-01..07 + C5-PT-01 + C5-ST-01 | 5 |

Key constraints

  • AC-1.3, AC-1.4, AC-3.5, AC-4.5 (revised), AC-5.2, AC-NEW-2, AC-NEW-8.

Testing strategy

Per components/07_c5_state/tests.md.


E-C8 — C8 FC + GCS Adapter

Tracker: AZ-261 | Type: component | T-shirt: L | Story points: 21-34

System context

flowchart LR
  FCIN[FC inbound MAVLink/MSP2] --> C8I[C8 inbound]
  C8I --> C5[C5]
  C8I --> C1[C1]
  C5 --> C8O[C8 outbound]
  C8O -->|GPS_INPUT / MSP2_SENSOR_GPS| FCOUT[FC]
  C8O -->|telemetry 1-2 Hz| GCS[QGroundControl]

Problem / Context

Per-FC inbound + outbound. Inbound: subscribe to FC IMU/attitude/GPS-health/MAV_STATE; publish ImuWindow/AttitudeWindow/GpsHealth/FlightStateSignal. Outbound: encode EstimatorOutput for AP (GPS_INPUT) and iNav (MSP2_SENSOR_GPS) at 5 Hz with honest 6×6 → 2×2 covariance projection. Owns MAVLink 2.0 signing on AP wired channel (D-C8-9 = (d), R03 risk) + per-flight key rotation. Also feeds GCS at 1-2 Hz.

Scope

In scope: FcAdapter + PymavlinkArdupilotAdapter + Msp2InavAdapter; GcsAdapter + QgcTelemetryAdapter; signing handshake + per-flight ephemeral key + zeroisation; D-C8-2 source-set switch (gated by IT-3); honest covariance projection.

Out of scope: state estimation (E-C5); GCS workflow logic (operator side, E-C12).

Architecture notes

  • File: components/10_c8_fc_adapter/description.md.
  • Both AP + iNav adapters typically linked into the deployment binary (per ADR-002 — config picks one at runtime).
  • ADR-008 source-set switch gated by IT-3.

Interface specification

class FcAdapter(Protocol):
    def open(self, port: PortConfig, signing_key: bytes | None) -> None: ...
    def subscribe_telemetry(self, cb: Callable[[FcTelemetryFrame], None]) -> Subscription: ...
    def emit_external_position(self, o: EstimatorOutput) -> None: ...
    def emit_status_text(self, msg: str, severity: Severity) -> None: ...
    def request_source_set_switch(self) -> None: ...    # AP only
    def current_flight_state(self) -> FlightStateSignal: ...

Data flow

sequenceDiagram
  participant FC as FC
  participant C8I as C8 inbound
  participant C5 as C5
  participant C8O as C8 outbound
  FC->>C8I: IMU + attitude + gps_health
  C8I->>C5: ImuWindow / AttitudeWindow / GpsHealth
  C5->>C8O: EstimatorOutput
  C8O->>FC: GPS_INPUT / MSP2_SENSOR_GPS @ 5 Hz

Dependencies

  • E-C5, E-CC-CONF, E-CC-LOG.
  • External: pymavlink, MSP2 client, ArduPilot SITL, QGroundControl SITL.

Acceptance criteria

  • C8-IT-01: 6×6 → 2×2 honest covariance projection within 1% norm.
  • C8-IT-02: 5 Hz emission jitter ≤ ±5%.
  • C8-IT-03: warm-start GPS from FC EKF ≤ 1 s after C8 ready (AC-5.1).
  • C8-IT-04: GCS stream 1-2 Hz (AC-6.1).
  • C8-IT-05: GCS commands accepted (AC-6.2).
  • C8-IT-06: WGS84 round-trip ≤ 1 cm position residual (AC-6.3).
  • C8-IT-07: source-set switch ≤ 3 s of gate-clear (AC-NEW-2).
  • C8-IT-08: iNav adapter never attempts signing; AP always (RESTRICT-COMM-2).
  • C8-ST-01: MAVLink 2.0 signing handshake passes IT-3 SITL gate (R03).
  • C8-ST-02: per-flight key never persists across flights.
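One way the C8-IT-02 jitter bound could be asserted, as a sketch only: it assumes "jitter ≤ ±5%" means every inter-emission interval stays within 5% of the nominal 200 ms period. The helper name and that interpretation are assumptions, not taken from tests.md.

```python
def max_interval_jitter(timestamps_s, nominal_hz=5.0):
    """Worst relative deviation of inter-emission intervals from the nominal period."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two emission timestamps")
    period_s = 1.0 / nominal_hz
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    return max(abs(dt - period_s) / period_s for dt in intervals)
```

A test would collect emit timestamps from the adapter under load and compare the returned value against 0.05, with a small float tolerance.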

Non-functional requirements

  • C8-PT-01: emit_external_position p95 ≤ 5 ms; inbound IMU callback p95 ≤ 1 ms.

Risks & mitigations

  • R03 (signing handshake no precedent) — gated by IT-3; D-C8-2-FALLBACK options recorded.
  • R09 (key compromise) — per-flight ephemeral keys + zeroisation.
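The R09 mitigation can be sketched as a context manager that owns the key for exactly one flight. This assumes the 32-byte secret-key length used by MAVLink 2 message signing; `per_flight_signing_key` is a hypothetical name. Zeroisation in pure Python is best-effort only, because the interpreter may hold transient copies of the random bytes.

```python
import secrets
from contextlib import contextmanager

MAVLINK2_SIGNING_KEY_LEN = 32  # bytes


@contextmanager
def per_flight_signing_key():
    """Yield a fresh random key and overwrite it with zeros on exit."""
    # A mutable bytearray lets us overwrite the key in place afterwards.
    key = bytearray(secrets.token_bytes(MAVLINK2_SIGNING_KEY_LEN))
    try:
        yield key
    finally:
        for i in range(len(key)):
            key[i] = 0
```

The key is never written to disk in this sketch, which is the property C8-ST-02 checks; a production implementation would likely need a native buffer for stronger guarantees.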

Effort

T-shirt L; 21-34 points.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | FcAdapter interface + composition | 3 |
| 2 | PymavlinkArdupilotAdapter outbound GPS_INPUT | 5 |
| 3 | Msp2InavAdapter outbound MSP2_SENSOR_GPS | 3 |
| 4 | Inbound IMU/attitude/gps_health/MAV_STATE subscription | 3 |
| 5 | Honest 6×6 → 2×2 covariance projection | 3 |
| 6 | MAVLink 2.0 per-flight signing handshake (AP) | 5 |
| 7 | Source-set switch (AP D-C8-2 gated by IT-3) | 3 |
| 8 | GcsAdapter + downsampled telemetry | 3 |
| 9 | Component-internal tests C8-IT-01..08 + C8-PT-01 + C8-ST-01..02 | 5 |

Key constraints

  • AC-4.3, AC-4.4, AC-5.1, AC-5.2, AC-6.1, AC-6.2, AC-6.3, AC-NEW-2; RESTRICT-FC-1 / FC-2 / FC-3, RESTRICT-COMM-1 / COMM-2.

Testing strategy

Per components/10_c8_fc_adapter/tests.md.


E-BBT — Blackbox Tests (FT/NFT scenarios)

Tracker: AZ-262 | Type: tests | T-shirt: M | Story points: 13-21

System context

flowchart LR
  TESTROOT[tests/ runner] --> FTP[FT-P functional positive]
  TESTROOT --> FTN[FT-N functional negative]
  TESTROOT --> NFTPERF[NFT-PERF Tier-2]
  TESTROOT --> NFTLIM[NFT-LIM resource]
  TESTROOT --> NFTSEC[NFT-SEC security]
  TESTROOT --> NFTRES[NFT-RES resilience]
  TESTROOT --> IT[IT integration]

Problem / Context

Per-component epics ship their own component-internal unit/contract tests; this epic parents the suite-level scenarios already specified in _docs/02_document/tests/*.md. They exercise end-to-end ACs and restrictions and bind multiple components together.

Scope

In scope: implementing the FT-P, FT-N, NFT-PERF, NFT-LIM, NFT-SEC, NFT-RES, IT scenario IDs cited in traceability-matrix.md. Test data setup, fixtures (Derkachi flight + AerialVL S03 + e2e-test mock-suite-sat-service), Tier-2 runner orchestration.

Out of scope: per-component unit/contract tests (live in each component epic).

Architecture notes

  • Files: _docs/02_document/tests/blackbox-tests.md, performance-tests.md, security-tests.md, resource-limit-tests.md, resilience-tests.md, environment.md, test-data.md, traceability-matrix.md.
  • Tier-1 vs Tier-2 split per ADR-005.

Interface specification

Tests are pytest scenarios; no runtime interface beyond the test runner CLI.

Data flow

sequenceDiagram
  participant CI as CI runner
  participant FX as Fixtures
  participant SUT as System under test (compose / Tier-2 binary)
  CI->>FX: stage Derkachi corpus + SITL containers
  CI->>SUT: bring up
  CI->>SUT: drive scenario inputs
  SUT-->>CI: emitted MAVLink + FDR records
  CI->>CI: assert per scenario pass criteria

Dependencies

  • All component epics (each must ship ready-to-test).

Acceptance criteria

  • Every scenario ID cited in traceability-matrix.md exists, runs, and passes on its target tier.
  • Coverage ≥ 75% gate held (currently 92.4% inclusive / 89.8% strict — confirmed pre-Step-6).
  • PARTIAL / NOT COVERED rows have linked leftover entries explaining the deferral.

Non-functional requirements

  • Tier-1 full suite wall-clock ≤ 30 min on a developer laptop.
  • Tier-2 NFT suite wall-clock ≤ 90 min on the bench Jetson.

Risks & mitigations

  • R11 (statistical headroom) — NFT-RES-03 / NFT-SEC-01 use Monte-Carlo-with-CI per the AC-text relaxation.
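For R11, a Monte-Carlo pass rate is usually reported with a confidence interval rather than a bare percentage. The sketch below uses the Wilson score interval as one common choice; which interval the NFT-RES-03 / NFT-SEC-01 scenarios actually use is not stated here, so treat the method and the function name as assumptions.

```python
import math


def wilson_interval(passes, trials, z=1.96):
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= passes <= trials:
        raise ValueError("need trials > 0 and 0 <= passes <= trials")
    p = passes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

A scenario would then assert on the lower bound, for example that it clears the AC threshold, so a lucky small sample cannot pass on its own.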

Effort

T-shirt M; 13-21 points (test implementation; scenario specs already exist).

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | Test environment scaffolding (tests/conftest.py, fixtures dir, Postgres + SITL bring-up) | 3 |
| 2 | FT-P-* implementation (positive functional scenarios) | 5 |
| 3 | FT-N-* implementation (negative functional scenarios) | 3 |
| 4 | NFT-PERF / NFT-LIM Tier-2 runner integration | 5 |
| 5 | NFT-SEC implementation (incl. NFT-SEC-02 network egress + NFT-SEC-01 cache poisoning) | 5 |
| 6 | NFT-RES resilience scenarios | 3 |
| 7 | IT-3 ArduPilot SITL signing handshake (R03 gate) | 5 |
| 8 | IT-12 comparative-study runner | 3 |

Key constraints

  • AC-NEW-3, AC-NEW-5, AC-NEW-7, RESTRICT-HW-1.

Testing strategy

This epic IS the testing strategy for system-level scenarios. Per-component testing belongs to component epics.


E-DEMO-REPLAY — Offline replay mode (video + tlog → per-tick coordinate stream)

Tracker: AZ-265 | Type: feature (deployment-adjacent) | T-shirt: M | Story points: 27-32 | Added: Decompose Step 2 (cycle 1, 2026-05-10) | Source notes: _docs/how_to_test.md (user-written demo requirements; auto-sync incorporated as child task #8)

System context

Demonstrate the GPS-denied positioning pipeline against historical flight data: a video file from the nav camera + a .tlog file from the FC. The replay mode runs the same C1-C5 inference pipeline the airborne binary runs; only the input transport (live camera → video file; live MAVLink → tlog) and output sink (FC MAVLink emit → JSONL) differ. NO ROS dependency is added: replay reuses the existing C8 FcAdapter interface via the strategy pattern.

flowchart LR
  subgraph LIVE[Airborne mode — unchanged]
    CAM[Live camera] --> C1L[C1 VIO]
    FCL[Live FC MAVLink] --> C8L[C8 inbound]
    C8L --> C1L
    C1L --> C2L[C2..C5]
    C2L --> C8OL[C8 outbound] --> FCL
  end
  subgraph REPLAY[Replay mode — this epic]
    VID[Video file .mp4/.h264] --> VFFS[VideoFileFrameSource] --> C1R[C1 VIO]
    TLOG[tlog file] --> TLR[TlogReplayFcAdapter] --> C1R
    C1R --> C2R[C2..C5]
    C2R --> RSINK[JsonlReplaySink] --> JSONL[results.jsonl - one EstimatorOutput per tick]
  end

Problem / Context

The parent-suite UI (in ui/ workspace, out of scope for this repo) needs to demo the GPS-denied positioning end-to-end. Per-component fixtures or simulators would not give the demo end-to-end fidelity. Instead, replay mode runs the production pipeline against historical inputs — demo confidence equals field test confidence on the same footage.

ROS as the input transport was considered and rejected: the system is MAVLink-native; introducing ROS would (a) add a major new dependency, (b) split production vs. demo code paths, and (c) duplicate code. Reusing the existing C8 FcAdapter interface with a tlog-replay strategy is strictly better.

Scope

In scope:

  • FrameSource interface (formalised cross-cutting; previously implicit "camera ingest thread") + VideoFileFrameSource strategy + LiveCameraFrameSource retrofit (no-op restructure of existing camera plumbing).
  • TlogReplayFcAdapter strategy (new C8 FcAdapter impl) parsing pymavlink .tlog files and emitting ImuWindow / AttitudeWindow / GpsHealth / FlightStateSignal at tlog timestamp cadence.
  • ReplaySink interface + JsonlReplaySink impl (one EstimatorOutput per line).
  • compose_replay(config) -> ReplayRoot composition root extending E-CC-CONF (AZ-246).
  • Clock injection (per R-DEMO-4) so timer-driven logic in C1-C5 works in both wall-clock (live) and tlog-simulated (replay) modes.
  • gps-denied-replay CLI: --video PATH --tlog PATH --output results.jsonl --camera-calibration calib.json --config config.yaml --pace {realtime,asap} [--time-offset-ms N].
  • Fourth Docker image gps-denied-replay-cli (Python + C1-C5 + cpp/* + replay strategies; NO C6/C10/C11/C12; NO HTTP server).
  • E2E replay test on a 1-2 min Derkachi clip + matching tlog asserting estimated track within ≤ 100 m of ground-truth GPS for ≥ 80 % of ticks.

Out of scope:

  • ROS / ROS2 dependency.
  • HTTP wrapper microservice (parent-suite UI backend shells out to the CLI; defer until subprocess-shape is proven insufficient).
  • Modifying any C1-C5 component to be replay-aware; they MUST remain mode-agnostic.
  • C6 mid-flight write path (replay reads a pre-built tile cache; doesn't write).
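The gps-denied-replay flag set listed under "In scope" maps directly onto argparse. The sketch below is illustrative: which flags are required is an assumption, and `build_parser` is a hypothetical name. The asap default follows the Data flow section of this epic.

```python
import argparse
from pathlib import Path


def build_parser():
    """Argument parser mirroring the gps-denied-replay flags named in this epic."""
    p = argparse.ArgumentParser(prog="gps-denied-replay")
    # ASSUMPTION: all five path arguments are mandatory.
    p.add_argument("--video", type=Path, required=True)
    p.add_argument("--tlog", type=Path, required=True)
    p.add_argument("--output", type=Path, required=True)
    p.add_argument("--camera-calibration", type=Path, required=True)
    p.add_argument("--config", type=Path, required=True)
    p.add_argument("--pace", choices=("realtime", "asap"), default="asap")
    # None means "auto-detect the video/tlog offset"; an integer overrides it.
    p.add_argument("--time-offset-ms", type=int, default=None)
    return p
```

Keeping `--time-offset-ms` as `None` by default lets the composition root distinguish "not supplied" from an explicit zero offset.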

Architecture notes

  • ADR-001 / ADR-002 / ADR-009 all apply unchanged.
  • New BUILD_* flags: BUILD_VIDEO_FILE_FRAME_SOURCE, BUILD_TLOG_REPLAY_ADAPTER, BUILD_REPLAY_SINK_JSONL. Default ON for the new replay-cli binary; OFF for airborne, research, and operator-orchestrator.
  • New cross-cutting FrameSource interface lives at src/gps_denied_onboard/frame_source/ (Layer 1 Foundation per module-layout.md § layering).
  • compose_replay lives in runtime_root.py alongside compose_root and compose_operator.

Interface specification

class FrameSource(Protocol):
    def next_frame(self) -> NavCameraFrame | None: ...
    def close(self) -> None: ...

class VideoFileFrameSource(FrameSource):
    def __init__(self, video_path: Path, frame_rate_hz: float, camera_id: str): ...

class TlogReplayFcAdapter(FcAdapter):  # FcAdapter from AZ-261 / E-C8
    def __init__(self, tlog_path: Path, target_fc_dialect: enum {ARDUPILOT, INAV}): ...

class ReplaySink(Protocol):
    def emit(self, output: EstimatorOutput) -> None: ...
    def close(self) -> None: ...

class JsonlReplaySink(ReplaySink):
    def __init__(self, output_path: Path): ...

def compose_replay(config: Config) -> ReplayRoot: ...
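A possible shape for JsonlReplaySink, shown as a sketch only. The `EstimatorOutput` dataclass here is a hypothetical stand-in with made-up fields; the real DTO and its schema belong to E-C5.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class EstimatorOutput:
    """HYPOTHETICAL stand-in; field names are illustrative only."""
    t_s: float
    lat_deg: float
    lon_deg: float
    alt_m: float


class JsonlReplaySink:
    """Writes one JSON object per emitted EstimatorOutput, one per line."""

    def __init__(self, output_path: Path):
        self._fh = open(output_path, "w", encoding="utf-8")

    def emit(self, output: EstimatorOutput) -> None:
        # sort_keys keeps key order stable across runs.
        self._fh.write(json.dumps(asdict(output), sort_keys=True) + "\n")

    def close(self) -> None:
        if not self._fh.closed:  # safe to call more than once
            self._fh.close()
```

Stable key ordering is a small step toward the determinism asked for in AC-5; float formatting would still need to be pinned down separately.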

Data flow

Startup → load config / calibration → process tlog + video timestamp-aligned → for each frame: camera-ingest → C1 → C2 → C2.5 → C3 → C3.5 → C4 → C5 → emit EstimatorOutput to JsonlReplaySink. End of input → close sink → exit.

--pace realtime paces frames at wall-clock; --pace asap runs uncapped (default). The injected Clock is wall-clock-derived in realtime mode and tlog-timestamp-derived in asap mode so component fallback timers (e.g., AC-5.2 3 s no-estimate fallback) trigger consistently in both.
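Clock injection can be sketched as a small Protocol with two strategies. All names below (`Clock`, `WallClock`, `TlogClock`, `NoEstimateFallbackTimer`) are hypothetical; only the 3 s timeout comes from the AC-5.2 reference above.

```python
import time
from typing import Protocol


class Clock(Protocol):
    def now_s(self) -> float: ...


class WallClock:
    """Live / realtime pacing: time comes from the host."""

    def now_s(self) -> float:
        return time.monotonic()


class TlogClock:
    """asap pacing: the replay loop advances time from tlog timestamps."""

    def __init__(self, start_s: float = 0.0):
        self._now_s = start_s

    def advance_to(self, t_s: float) -> None:
        self._now_s = max(self._now_s, t_s)  # never step backwards

    def now_s(self) -> float:
        return self._now_s


class NoEstimateFallbackTimer:
    """Expires when no estimate has been noted for timeout_s of clock time."""

    def __init__(self, clock: Clock, timeout_s: float = 3.0):
        self._clock = clock
        self._timeout_s = timeout_s
        self._last_s = clock.now_s()

    def note_estimate(self) -> None:
        self._last_s = self._clock.now_s()

    def expired(self) -> bool:
        return self._clock.now_s() - self._last_s >= self._timeout_s
```

Because the timer only ever asks the injected clock for the time, the same component code fires after 3 s of flight time whether the replay runs at wall-clock speed or uncapped.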

Dependencies

  • E-C1, E-C2, E-C2.5, E-C3, E-C3.5, E-C4, E-C5, E-C8 (every per-frame component).
  • E-CC-CONF (AZ-246) for compose_root extension.
  • E-CC-HELPERS (AZ-264) for WgsConverter (tlog GPS → local-tangent-plane).
  • Does NOT depend on E-C6 / E-C10 / E-C11 / E-C12 (replay reads pre-built cache; no operator-side workflows).

Acceptance criteria

  • AC-1: CLI exits 0 on a valid 1-min fixture and produces JSONL with one EstimatorOutput line per tlog tick (within ±5 % of GLOBAL_POSITION_INT count).
  • AC-2: Each line is a valid JSON object matching the EstimatorOutput schema.
  • AC-3: For a fixture with known ground-truth GPS, the L2 horizontal distance ≤ 100 m for ≥ 80 % of ticks (matches AC-1.3 cumulative-drift bound).
  • AC-4: Replay binary contains C1-C5 + replay strategies; SBOM diff CI step verifies absence of C6/C10/C11/C12.
  • AC-5: Same input → same output (deterministic) within ≤ 1e-6 float drift in position fields.
  • AC-6: --pace realtime runs the 1-min fixture in 60 ± 5 s; --pace asap in ≤ 30 s on Tier-1 hardware.
  • AC-7: Without --time-offset-ms, the CLI auto-detects the video ↔ tlog offset by correlating video motion-onset (or first-frame timestamp) with the tlog IMU take-off pattern (sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s, matching the typical quadcopter take-off signature). On a fixture with known correct offset, the auto-detected offset is within ± 200 ms of ground truth. If auto-detect confidence is < 80 % the CLI logs a WARN and proceeds with the best-guess offset; --time-offset-ms N always overrides the auto-detect.
  • AC-8: If neither auto-detect nor manual offset can produce > 95 % of frames with at least one matching IMU window within ± 100 ms, the CLI exits with code 2 and prints both the auto-detected offset (if any) and the percentage of frames-with-IMU-window so the operator can debug.
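The AC-3 assertion could be computed roughly as below. This is a sketch under assumptions: it uses the haversine great-circle distance on a spherical Earth as a stand-in for the horizontal L2 distance (the project would presumably go through WgsConverter instead), and both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(estimates, truths, limit_m=100.0):
    """Share of ticks whose estimate lies within limit_m of ground truth."""
    if not estimates or len(estimates) != len(truths):
        raise ValueError("need equal-length, non-empty sequences")
    hits = sum(haversine_m(*e, *t) <= limit_m for e, t in zip(estimates, truths))
    return hits / len(estimates)
```

The E2E test would then assert `fraction_within(estimated, ground_truth) >= 0.80` over the matched ticks.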

Non-functional requirements

  • Cold-start ≤ 5 s (not subject to AC-NEW-1's 30 s budget — that's airborne-only).
  • Throughput ≥ 5 × real time on Jetson AGX Orin for --pace asap.
  • Memory ≤ 4 GB resident (lean image; no FAISS index unless tile lookup is needed).

Risks & mitigations

  • R-DEMO-1: Tlog ↔ video timestamp drift across long flights, AND the more-common case that recordings on the operator workstation are not synchronised at all (camera and FC start independently, often minutes apart). Mitigation: auto-sync via IMU take-off detection (AC-7) is the default; --time-offset-ms N is the manual override. If take-off pattern is ambiguous (e.g., fixed-wing hand-launch instead of quadcopter, or tlog includes pre-arm motion), CLI WARNs and falls back to the manual override.
  • R-DEMO-2: Pymavlink slow on multi-GB tlogs. Mitigation: stream-parse, never materialise; benchmark + document throughput floor.
  • R-DEMO-3: Demo footage missing required FC messages (HIL mode etc.). Mitigation: CLI fails fast at startup listing missing message types and the components that need them.
  • R-DEMO-4: Production C1-C5 paths bake real-time-cadence assumptions (e.g., 5 s fallback timer). Mitigation: Clock injection (wall-clock for live, tlog-derived for replay); documented as ADR amendment in next architecture-doc cycle.
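For R-DEMO-1, the AC-8 coverage gate can be sketched as follows. It simplifies an "IMU window" to a single timestamp and assumes the offset is added to video frame time to obtain tlog time; both are assumptions, and the function names are hypothetical.

```python
import bisect


def imu_coverage(frame_ts_ms, imu_ts_ms, offset_ms=0, tolerance_ms=100):
    """Fraction of frames with an IMU sample within +/- tolerance_ms after offset."""
    if not frame_ts_ms:
        raise ValueError("no frames")
    imu = sorted(imu_ts_ms)
    covered = 0
    for t in frame_ts_ms:
        target = t + offset_ms
        # First IMU sample at or after the lower edge of the tolerance band.
        i = bisect.bisect_left(imu, target - tolerance_ms)
        if i < len(imu) and imu[i] <= target + tolerance_ms:
            covered += 1
    return covered / len(frame_ts_ms)


def sync_exit_code(coverage, required=0.95):
    """AC-8: exit 2 unless more than 95 % of frames have a matching IMU sample."""
    return 0 if coverage > required else 2
```

The CLI would evaluate this for the auto-detected offset and, if given, the manual one, and print the coverage percentage alongside exit code 2 so the operator can debug.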

Effort

T-shirt M; 27-32 points across 8 child tasks.

Child issues

| # | Title | Pts |
|---|-------|-----|
| 1 | FrameSource interface (cross-cutting) + VideoFileFrameSource strategy + LiveCameraFrameSource retrofit | 3 |
| 2 | TlogReplayFcAdapter strategy (pymavlink stream parser → inbound DTOs) | 5 |
| 3 | ReplaySink interface + JsonlReplaySink impl | 3 |
| 4 | compose_replay(config) + Clock injection (per R-DEMO-4) | 3 |
| 5 | gps-denied-replay CLI entrypoint + arg parser + camera-calibration loader | 3 |
| 6 | gps-denied-replay-cli Dockerfile + GitHub Actions matrix entry + SBOM diff (excludes C6/C10/C11/C12) | 3 |
| 7 | E2E replay fixture test (Derkachi 1-2 min clip + tlog; AC-3 ≤ 100 m ≥ 80 % assertion) | 5 |
| 8 | Auto-sync of video ↔ tlog via IMU take-off detection (AC-7 / AC-8; --time-offset-ms remains the manual override) | 5 |

Key constraints

  • ADR-001 / ADR-002 / ADR-009.
  • C1-C5 components MUST remain mode-agnostic; replay-aware logic lives only in the composition root, the new strategies, and the CLI.
  • No HTTP server in any companion binary (airborne or replay); HTTP wrapper, if added later, lives in operator-orchestrator per module-layout.md Layer-4 placement.

Testing strategy

Unit tests under tests/unit/frame_source/, tests/unit/c8_fc_adapter/test_tlog_replay_adapter.py, tests/unit/c8_fc_adapter/test_replay_sink.py, tests/unit/cli/test_replay_cli.py. E2E under tests/e2e/replay/ running the CLI against the Derkachi fixture (Tier-1 capable; gated by RUN_REPLAY_E2E=1 in CI). No FT/NFT scenarios at this epic — those live in E-BBT.


Lessons applied (Step 6 step-0 retrospective)

_docs/LESSONS.md does not yet exist (this is the project's first cycle), so no prior estimation/architecture/dependencies lessons were folded into the sizing above. When this cycle ends, the Final step's quality checklist should propose a lessons file capturing:

  • C2.5 ↔ C3 helper-ownership (R14) — generalisable lesson: when two siblings share a runtime, place ownership in a shared helper from day one rather than discovering the cycle in a 4a evaluator pass.
  • ADR-007 reversal (mock-as-fixture) — generalisable lesson: a test fixture is not a component; promoting one inflates architectural surface and risks contract drift.
  • D-PROJ-2 / D-PROJ-3 carryforwards — generalisable lesson: cross-suite design dependencies belong in _process_leftovers/ from the moment they are recognised, with full payload so a later cycle can replay them.

These three are candidates for the next cycle's LESSONS.md.