Implements two new C12 services and rebalances the C11/C12 boundary in one atomic commit:

* AZ-329 PostLandingUploadOrchestrator: gates C11 upload on the `flight_footer` FDR record's `clean_shutdown` field; 4 refusal modes; new FdrFooterReader Protocol + LocalFdrFooterReader.
* AZ-330 OperatorReLocService: AC-3.4 visual-loss re-localization hint; reuses shared LatLonAlt; OperatorCommandTransport Protocol cut (E-C8 owns the future pymavlink concrete); new FDR record kind `c12.reloc.requested`; log redaction (lat/lon 5 decimals, reason 200 chars).
* AZ-523 C11 internal flight-state gate removed (SRP refactor): `confirm_flight_state` / `FlightStateSignal` use / `FlightStateNotOnGroundError` deleted from C11; TileUploader contract bumped to v2.0.0 (frozen) with migration note; AZ-317 superseded.
* AZ-524 Package rename `c12_operator_tooling` → `c12_operator_orchestrator` across source, tests, pyproject, CMake, Dockerfile, compose, CI, runtime-root services class (`OperatorOrchestratorServices`) + factory function (`build_operator_orchestrator`), logger namespaces, config slug, docs, and the E-C12 epic title.

Tests: 1543 passed, 80 skipped (all environment gates). Targeted AC suite (AZ-329 + AZ-330 + FdrFooterReader): 37 passed. Cold-start NFR-perf still ≤ 500 ms p99.

Tracker: AZ-317 → Done (superseded); AZ-319 v2.0.0 contract bump comment; AZ-329/AZ-330 → In Testing; AZ-253 epic renamed; AZ-523 + AZ-524 created and closed as audit-trail tickets.

See `_docs/03_implementation/batch_44_cycle1_report.md`.

Co-authored-by: Cursor <cursoragent@cursor.com>
Work-Item Epics — gps-denied-onboard Plan cycle 1
This file is the local epic draft for Plan Step 6. Tracker IDs (AZ-XXX) are now populated for every epic — they live in Jira project AZ. The canonical E-* ↔ AZ-NN mapping below is the source of truth referenced from each Jira epic's description.
Conventions
- Issue type: Epic.
- Epic descriptions are self-contained per the plan-skill rule: a developer reading only the epic should understand the full context. Each epic has the 14 required sections (system context, problem, scope, architecture notes, interface spec, data flow, dependencies, AC, NFRs, risks, effort, child issues, key constraints, testing strategy).
- Effort sizing: T-shirt size + story-points range for the epic; per-task story points (PBI complexity) follow the user rule (1, 2, 3, 5, 8 — no PBI > 5; create 2/3, sometimes 5).
- Cross-cutting epics parent exactly one shared implementation task; component epics consuming the concern declare a dependency, never re-implement locally.
- Dependency rule: no epic depends on a later one in this index.
Decompose-time amendment (cycle 1, dated 2026-05-10)
Row 20 (E-CC-HELPERS / AZ-264) was added during Decompose Step 2 to comply with the cross-cutting rule. The 8 shared helpers (ImuPreintegrator, SE3Utils, LightGlueRuntime, WgsConverter, Sha256Sidecar, EngineFilenameSchema, RansacFilter, DescriptorNormaliser) were originally listed as child issues inside their largest-consumer component epics (e.g., ImuPreintegrator under E-C1 child #5, LightGlueRuntime under E-C2.5 child #2). Those child-issue listings are now superseded — helper ownership moves to E-CC-HELPERS, and component epics consume helpers as dependencies. The original component epic descriptions in Jira still reference the helpers in their child-issue tables; those will be reconciled at the next epic-edit pass (or at Step 4 cross-verification).
Index
| # | Epic ID | Title | Type | Tracker | T-shirt | Story Pts | Depends on |
|---|---|---|---|---|---|---|---|
| 1 | E-BOOT | Bootstrap & Initial Structure | bootstrap | AZ-244 | M | 13–21 | — |
| 2 | E-CC-LOG | Cross-Cutting: Structured JSON Logging | cross-cutting | AZ-245 | S | 5–8 | E-BOOT |
| 3 | E-CC-CONF | Cross-Cutting: Configuration & Composition Root | cross-cutting | AZ-246 | S | 5–8 | E-BOOT |
| 4 | E-CC-FDR-CLIENT | Cross-Cutting: FDR Producer Client (lock-free queue + record schema) | cross-cutting | AZ-247 | M | 8–13 | E-BOOT, E-CC-LOG |
| 5 | E-C13 | C13 Flight Data Recorder (writer thread + segments + cap) | component | AZ-248 | L | 21–34 | E-BOOT, E-CC-LOG, E-CC-CONF, E-CC-FDR-CLIENT |
| 6 | E-C7 | C7 On-Jetson Inference Runtime | component | AZ-249 | L | 21–34 | E-BOOT, E-CC-CONF, E-CC-FDR-CLIENT |
| 7 | E-C6 | C6 Tile Cache + Spatial Index | component | AZ-250 | M | 13–21 | E-BOOT, E-CC-LOG, E-CC-CONF |
| 8 | E-C11 | C11 Tile Manager (TileDownloader + TileUploader) | component | AZ-251 | M | 13–21 | E-C6, E-CC-CONF, E-CC-LOG |
| 9 | E-C10 | C10 Pre-flight Cache Provisioning | component | AZ-252 | M | 13–21 | E-C6, E-C7, E-CC-LOG |
| 10 | E-C12 | C12 Operator Pre-flight Orchestrator | component | AZ-253 | M | 13–21 | E-C10, E-C11, E-CC-LOG |
| 11 | E-C1 | C1 Visual / Visual-Inertial Odometry | component | AZ-254 | XL | 34–55 | E-BOOT, E-CC-FDR-CLIENT, E-C7 |
| 12 | E-C2 | C2 Visual Place Recognition | component | AZ-255 | L | 21–34 | E-C6, E-C7, E-CC-FDR-CLIENT |
| 13 | E-C2.5 | C2.5 Inlier-based Re-rank | component | AZ-256 | S | 5–8 | E-C2, E-C7, E-C6 (LightGlue helper shared with C3) |
| 14 | E-C3 | C3 Cross-Domain Matcher | component | AZ-257 | L | 21–34 | E-C2.5, E-C7 |
| 15 | E-C3.5 | C3.5 AdHoP-Conditional Refinement | component | AZ-258 | M | 8–13 | E-C3, E-C7 |
| 16 | E-C4 | C4 Pose Estimator | component | AZ-259 | M | 13–21 | E-C3.5, E-C5 (shared GTSAM substrate; co-developed) |
| 17 | E-C5 | C5 State Estimator | component | AZ-260 | XL | 34–55 | E-C1, E-C4 (shared graph), E-CC-FDR-CLIENT |
| 18 | E-C8 | C8 FC + GCS Adapter | component | AZ-261 | L | 21–34 | E-C5, E-CC-CONF, E-CC-LOG |
| 19 | E-BBT | Blackbox Tests (FT/NFT scenarios) | tests | AZ-262 | M | 13–21 | every component epic ships its component-internal tests under its own epic; this one parents the suite-level FT/NFT scenarios in _docs/02_document/tests/*.md |
| 20 | E-CC-HELPERS | Cross-Cutting: Common Helpers (8 shared utilities) | cross-cutting | AZ-264 | M | 13–21 | E-BOOT, E-CC-LOG (added in Decompose Step 2 — supersedes per-component helper child-issues from cycle 1) |
| 21 | E-DEMO-REPLAY | Offline replay mode (video + tlog → per-tick coordinate stream) | feature | AZ-265 | M | 22–27 | E-C1, E-C2, E-C2.5, E-C3, E-C3.5, E-C4, E-C5, E-C8, E-CC-CONF (added in Decompose Step 2 — enables parent-suite UI demo via subprocess + JSONL streaming) |
High-level component dependency diagram
flowchart TB
BOOT[E-BOOT Bootstrap]
LOG[E-CC-LOG Logging]
CONF[E-CC-CONF Config + Composition Root]
FDRC[E-CC-FDR-CLIENT FDR Producer Client]
C13[E-C13 FDR]
C7[E-C7 Inference Runtime]
C6[E-C6 Tile Cache]
C11[E-C11 Tile Manager]
C10[E-C10 Cache Provisioning]
C12[E-C12 Operator Orchestrator]
C1[E-C1 VIO]
C2[E-C2 VPR]
C25[E-C2.5 Re-rank]
C3[E-C3 Matcher]
C35[E-C3.5 AdHoP]
C4[E-C4 Pose]
C5[E-C5 State]
C8[E-C8 FC Adapter]
BBT[E-BBT Blackbox Tests]
HELP[E-CC-HELPERS Common Helpers]
DEMO[E-DEMO-REPLAY Offline Replay Mode]
BOOT --> LOG --> FDRC --> C13
BOOT --> CONF --> C13
BOOT --> CONF --> C7
BOOT --> LOG --> HELP
C13 -.-> C7
CONF --> C6 --> C11
C6 --> C10
C7 --> C10
C10 --> C12
C11 --> C12
C7 --> C2 --> C25 --> C3 --> C35 --> C4
C6 --> C2
C6 --> C25
C1 --> C5
C4 <--> C5
C5 --> C8
FDRC --> C1
FDRC --> C5
C8 --> BBT
C12 --> BBT
HELP -.-> C1
HELP -.-> C2
HELP -.-> C25
HELP -.-> C3
HELP -.-> C35
HELP -.-> C4
HELP -.-> C5
HELP -.-> C6
HELP -.-> C7
HELP -.-> C8
HELP -.-> C10
HELP -.-> C11
HELP -.-> C12
C1 --> DEMO
C5 --> DEMO
C8 --> DEMO
CONF --> DEMO
E-BOOT — Bootstrap & Initial Structure
Tracker: AZ-244 | Type: bootstrap | T-shirt: M | Story points: 13–21 | Owner: onboard team
System context
flowchart LR
EBOOT[E-BOOT scaffolding] --> SRC[src/ component dirs]
EBOOT --> CICD[CI Tier-1 + Tier-2 jobs]
EBOOT --> DOCKER[docker-compose.test.yml]
EBOOT --> DB[Postgres init scripts]
EBOOT --> TESTROOT[tests/ + tests/fixtures/]
Problem / Context
No source layout exists yet. Every downstream epic assumes a defined repo skeleton: src/components/<id>_<name>/, src/shared/<concern>/, tests/, tests/fixtures/, plus the Tier-1 Docker compose, the Tier-2 CI job, the Postgres init scripts that match data_model.md, and the operator-orchestrator tarball build path. Until this exists, no other epic can start.
Scope
In scope:
- Create `src/components/<id>_<name>/` for all 14 components with empty package init.
- Create `src/shared/{logging,config,fdr_client,crypto,calibration_loader}/` placeholders.
- `pyproject.toml` (Python) + `CMakeLists.txt` (C++ where used by C1) with the project's pinned dep set.
- Tier-1 `docker-compose.test.yml` skeleton (companion + Postgres + e2e-runner; mock-suite-sat-service compose pulled in only by upload tests).
- Tier-2 CI job that runs on the bench Jetson runner, with the JetPack 6.2 / TRT 10.3 / SM 87 image pinned per ADR-005.
- Postgres init scripts for the schema in `data_model.md`.
- `tests/` directory with `tests/fixtures/`, `tests/tmp/`, `tests/conftest.py`.
- Empty `runtime_root.py` for the airborne composition root + `operator_tool/__main__.py` for the operator side.
- `.gitignore` covering binaries, engine caches, FDR segments, ephemeral keys.
- README with run commands.
Out of scope:
- Any per-component logic (each component's epic owns its own implementation).
- Cross-cutting impl (logging / config / FDR client live in their own epics).
Architecture notes
- ADR-005 (Tier-1 / Tier-2 are first-class) drives the CI split.
- ADR-009 (composition root) places `runtime_root.py` at the airborne entrypoint and `operator_tool/__main__.py` at the operator side.
- ADR-002 (build-time exclusion) requires per-implementation CMake `BUILD_*` flags and the SBOM diff to be wired in CI from day one.
- ADR-004 (process isolation) requires the airborne build target to refuse `c11_tilemanager/` symbols. SBOM diff hook lives here from Bootstrap onward.
Interface specification
This epic exposes no runtime interface; it ships repository scaffolding only.
Data flow
N/A.
Dependencies
- Epic dependencies: none.
- External: GitHub Actions runner pool (Tier-1 Docker), bench Jetson runner (Tier-2), pinned base images (JetPack 6.2, Postgres 16, mcr.microsoft.com/dotnet/aspnet:8.0-alpine for the test fixture).
Acceptance criteria
- `docker compose -f docker-compose.test.yml up -d` brings up companion + Postgres + e2e-runner cleanly on a fresh workstation.
- Tier-2 CI smoke-job (`echo $JETPACK_VERSION` + `nvidia-smi`) passes on the bench Jetson.
- `pytest tests/ -q --collect-only` discovers the empty `tests/` tree without errors.
- The SBOM diff CI step exists and fails the build if `c11_tilemanager` ever appears in the airborne `production-binary` artifact (R02 enforcement seed).
- `runtime_root.py` runs and exits cleanly with a "no components configured" message (proves composition root wiring).
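
The SBOM diff criterion above can be illustrated with a small check. This is a minimal sketch, assuming the SBOM is a JSON document with a top-level `components` array whose entries carry a `name` field (the shape CycloneDX JSON uses). The function name `forbidden_components` and the sample component names are hypothetical, not taken from the project's CI.

```python
import json

# ADR-004: these package prefixes must never ship in the airborne artifact.
FORBIDDEN_PREFIXES = ("c11_tilemanager",)


def forbidden_components(sbom_json: str, forbidden=FORBIDDEN_PREFIXES) -> list[str]:
    """Return the SBOM component names that violate the airborne exclusion."""
    sbom = json.loads(sbom_json)
    names = [c.get("name", "") for c in sbom.get("components", [])]
    # str.startswith accepts a tuple of prefixes.
    return sorted(n for n in names if n.startswith(forbidden))


# Hypothetical sample SBOMs for illustration only.
clean = json.dumps({"components": [{"name": "c13_fdr"}, {"name": "c7_inference"}]})
dirty = json.dumps({"components": [{"name": "c13_fdr"}, {"name": "c11_tilemanager"}]})
```

A CI step built on this idea would exit non-zero whenever the returned list is non-empty.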
Non-functional requirements
- CI cold-build wall-clock ≤ 10 min on Tier-1; ≤ 6 min on Tier-2 (just the smoke-job).
- Repo size at this stage ≤ 5 MB (no fixtures committed).
Risks & mitigations
- R12 (single deployment camera) — Bootstrap's CI must not assume the unit is plugged in; Tier-2 smoke-job runs without the camera, only against TRT/SM/JP version.
Effort
T-shirt M; 13–21 story points across child PBIs (each ≤ 5 points).
Child issues (PBIs)
| # | Title | Pts |
|---|---|---|
| 1 | Repo scaffolding: `src/components/`, `src/shared/`, `tests/`, `runtime_root.py` | 2 |
| 2 | `pyproject.toml` + `CMakeLists.txt` with pinned deps | 3 |
| 3 | Tier-1 `docker-compose.test.yml` skeleton + Postgres init | 3 |
| 4 | Tier-2 CI smoke-job on bench Jetson | 3 |
| 5 | SBOM diff CI step (R02 enforcement seed; fails on `c11_tilemanager` in airborne artifact) | 3 |
| 6 | `.gitignore` + `README.md` + run commands | 2 |
| 7 | `runtime_root.py` minimum (compose root + "no components configured" exit path) | 2 |
Key constraints
- RESTRICT-HW-1 (Jetson Orin Nano Super, 8 GB shared LPDDR5, 25 W) — Tier-2 image pins SM 87 / JP 6.2 / TRT 10.3.
- RESTRICT-FC-1 (AP + iNav supported; PX4 out of scope) — composition root wires only AP + iNav adapters.
Testing strategy
- CI smoke tests on every PR (Tier-1 compose-up, Tier-2 nvidia-smi).
- No unit tests yet — those live in component epics.
E-CC-LOG — Cross-Cutting: Structured JSON Logging
Tracker: AZ-245 | Type: cross-cutting | T-shirt: S | Story points: 5–8
System context
Every component's § 9 Logging Strategy mandates structured JSON logging at ERROR / WARN / INFO / DEBUG levels with per-frame fields (frame_id, kind, component-specific keys). A single shared logger module under src/shared/logging/ produces these records; every component imports it.
flowchart LR
COMP[Any component] --> LOGGER[src/shared/logging<br/>structured JSON]
LOGGER --> STDOUT[stdout / journald]
LOGGER --> FDR["FDR (via E-CC-FDR-CLIENT for ERROR + WARN)"]
Problem / Context
If every component rolls its own logger, format drift is guaranteed. The traceability matrix and post-flight FDR analysis rely on a stable JSON schema, and a single shared logger is the only practical way to keep that schema stable.
Scope
In scope:
- `src/shared/logging/__init__.py` exporting `get_logger(component_id: str) -> Logger`.
- JSON formatter with stable field ordering (`ts, level, component, frame_id, kind, msg, ...kv`).
- Drop-in `RotatingStdoutHandler` for Tier-1 dev; `JournaldHandler` for Tier-2 production.
- Bridge into the FDR client for ERROR + WARN levels (handler subscribes to log records and enqueues a `kind = "log"` FdrRecord).
- Helpers for the documented per-frame log shapes (`vio.frame_id`, `vpr.top10_distances`, etc.) so component code is short.
Out of scope: per-component log content (lives in each component epic's child PBIs).
Architecture notes
Stdlib logging + python-json-logger (or orjson formatter for speed). No new dependency beyond what's already in pyproject.toml. No third-party log aggregator — Tier-1 uses Docker stdout capture; Tier-2 uses journald.
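
The stable field ordering can be sketched with the standard library alone. This is a minimal sketch, not the project's implementation. It assumes extra key/values travel on the record under a `kv` attribute (a hypothetical convention) and uses `json` in place of `python-json-logger` or `orjson`. The class name `StableJsonFormatter` is also hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

FIXED_FIELDS = ("ts", "level", "component", "frame_id", "kind", "msg")


class StableJsonFormatter(logging.Formatter):
    """Emit one JSON object per record with a fixed leading field order."""

    def format(self, record: logging.LogRecord) -> str:
        out = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            # The stdlib names this level WARNING; the schema here says WARN.
            "level": "WARN" if record.levelname == "WARNING" else record.levelname,
            "component": record.name,
            "frame_id": getattr(record, "frame_id", None),
            "kind": getattr(record, "kind", None),
            "msg": record.getMessage(),
        }
        # Extra key/values follow the fixed fields, sorted for determinism;
        # setdefault keeps them from overwriting a fixed field.
        for key, value in sorted(getattr(record, "kv", {}).items()):
            out.setdefault(key, value)
        return json.dumps(out)


def get_logger(component_id: str) -> logging.Logger:
    """Return a logger for one component, wired to the JSON formatter once."""
    logger = logging.getLogger(component_id)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(StableJsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
        logger.propagate = False
    return logger
```

With this sketch a component would call `get_logger("c2").warning("VPR top-1 above threshold", extra={"frame_id": 17, "kind": "vpr", "kv": {"distance": 0.42}})`.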
Interface specification
def get_logger(component_id: str) -> logging.Logger: ...

class StructuredJsonHandler(logging.Handler):
    """JSON formatter + FDR bridge for ERROR/WARN."""

class FdrLogBridge:
    """Subscribed by the logger; forwards ERROR + WARN to E-CC-FDR-CLIENT.enqueue."""
Data flow
sequenceDiagram
participant C as Component
participant L as Logger
participant S as stdout
participant F as FDR Client
C->>L: log.warn("VPR top-1 above threshold", distance=0.42)
L->>S: {"level":"WARN", "component":"c2", ...}
L->>F: enqueue(kind="log", level="WARN", payload=...)
Dependencies
- Depends on E-BOOT.
- External: `python-json-logger` or `orjson` (whichever is already pinned).
Acceptance criteria
- Every component test that asserts a log message uses the shared logger and finds the expected JSON shape.
- ERROR + WARN records appear in FDR with `kind = "log"` and a back-reference to the originating component.
- INFO + DEBUG do NOT appear in FDR (per-component § 9 storage rule).
- Log format passes a contract test (`tests/contract/log_schema.py`) verifying field names + ordering + required keys.
Non-functional requirements
- Per-record latency p99 ≤ 0.2 ms (lock-free emit on the hot path).
- No allocation in the steady-state DEBUG path beyond the message string itself.
Risks & mitigations
- R13 (FDR queue overrun) — the FDR bridge uses E-CC-FDR-CLIENT's drop-oldest semantics; it never blocks the caller.
Effort
T-shirt S; 5–8 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | `src/shared/logging/` module + JSON formatter + handlers | 3 |
| 2 | FDR log bridge (ERROR + WARN → kind=log) | 2 |
| 3 | Contract test `tests/contract/log_schema.py` | 2 |
Key constraints
- AC-NEW-3 (FDR ≤ 64 GB / flight, no silent drops) — DEBUG must not flow into FDR; verified by the contract test.
Testing strategy
- Unit tests for the formatter (field ordering + escaping).
- Contract test against the FDR record schema (kind=log).
- Integration via every component's tests.md (each component asserts at least one log message).
E-CC-CONF — Cross-Cutting: Configuration & Composition Root
Tracker: AZ-246 | Type: cross-cutting | T-shirt: S | Story points: 5–8
System context
ADR-001 (runtime selection by config) + ADR-009 (composition root) together require a single shared loader that materialises the Config object at process startup, plus a compose_root(config) function that constructs each strategy/component instance with its dependencies. No component instantiates another component itself.
flowchart LR
ENV[ENV vars] --> LOADER
YAML[config.yaml] --> LOADER
CALIB[Camera calibration JSON] --> LOADER
LOADER[src/shared/config/loader] --> ROOT[runtime_root.py / operator_tool/__main__.py]
ROOT --> COMPS[component instances]
Problem / Context
Without a single source of truth for configuration, the BUILD_* + runtime-strategy-selection rules of ADR-001/002/009 collapse — components silently fall back to defaults, and the composition root grows local config-parsing logic that drifts. The CI gate that ensures only the linked strategies are selectable also lives here.
Scope
In scope:
- `src/shared/config/loader.py`: env + YAML + camera-calibration JSON merging with explicit precedence (env > YAML > defaults).
- `Config` dataclass (frozen) covering every component's startup knob.
- `compose_root(config) -> RuntimeRoot` for the airborne process; `compose_operator(config) -> OperatorRoot` for the tooling side.
- Strategy-vs-build-flag consistency check at startup: refuse to start if config selects a strategy whose `BUILD_*` flag was off in the linked binary.
Out of scope: any component's specific config shape (defined inside its own epic).
Architecture notes
- ADR-001, ADR-002, ADR-009 all converge here.
- The composition root is the only place `import` of a concrete `VioStrategy` / `VprStrategy` / etc. is allowed; component code imports the abstract interface only.
Interface specification
@frozen
class Config: ... # populated by union of every component's config schema
def load_config(env: dict[str, str], paths: list[Path]) -> Config: ...
def compose_root(config: Config) -> RuntimeRoot: ...
def compose_operator(config: Config) -> OperatorRoot: ...
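
The precedence rule (env > YAML > defaults) and the strategy/build-flag check can be sketched as follows. This is a minimal sketch under stated assumptions: the two config knobs, the `GPSD_` environment prefix, the strategy names, and the helper names `merge_layers` and `check_linked` are all hypothetical. The real `Config` covers every component's startup knobs.

```python
from dataclasses import dataclass


class StrategyNotLinkedError(RuntimeError):
    """Config selected a strategy that was not compiled into this binary."""


@dataclass(frozen=True)
class Config:
    vio_strategy: str  # hypothetical knob
    log_level: str     # hypothetical knob


DEFAULTS = {"vio_strategy": "strategy_a", "log_level": "INFO"}
ENV_PREFIX = "GPSD_"  # hypothetical env-var naming scheme


def merge_layers(env: dict[str, str], yaml_values: dict[str, str]) -> Config:
    """Resolve each known key with precedence env > YAML > defaults."""
    merged = {}
    for key, default in DEFAULTS.items():
        value = yaml_values.get(key, default)             # YAML beats defaults
        value = env.get(ENV_PREFIX + key.upper(), value)  # env beats YAML
        merged[key] = value
    return Config(**merged)


def check_linked(config: Config, linked_strategies: set[str]) -> None:
    """Refuse to start when the selected strategy is absent from the build."""
    if config.vio_strategy not in linked_strategies:
        raise StrategyNotLinkedError(
            f"vio_strategy={config.vio_strategy!r} is not linked into this binary"
        )
```

Unknown YAML keys are ignored here for brevity; a stricter loader could reject them instead.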
Data flow
Startup-only — runs once per process. No per-frame path.
Dependencies
- Depends on E-BOOT.
Acceptance criteria
- `compose_root` constructs a runnable airborne process for every documented config preset (default deployment, IT-12 research-binary, smoke-test minimal).
- Strategy/build-flag mismatch triggers an explicit `StrategyNotLinkedError` with a clear message (no silent fallback).
- Config precedence (env > YAML > defaults) verified by unit tests for at least 3 keys per layer.
- `runtime_root.py` exits with code 0 when given a valid config and no components actually do work (reachability proof).
Non-functional requirements
- Cold-start config load + compose ≤ 1 s on Tier-2 (counts toward AC-NEW-1's 30 s budget).
Risks & mitigations
- R02 (ADR-004 process isolation) — compose_root's strategy/build-flag check is the third enforcement gate (after SBOM diff and runtime self-check) preventing C11 from running airborne.
Effort
T-shirt S; 5–8 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | `src/shared/config/loader.py` + `Config` dataclass | 3 |
| 2 | `compose_root` + `compose_operator` skeletons + `StrategyNotLinkedError` | 3 |
| 3 | Unit tests for env/YAML/defaults precedence | 2 |
Key constraints
- ADR-002 (build-time exclusion) — only linked strategies selectable.
Testing strategy
- Unit: precedence + `StrategyNotLinkedError`.
- Integration: every documented preset starts cleanly.
E-CC-FDR-CLIENT — Cross-Cutting: FDR Producer Client
Tracker: AZ-247 | Type: cross-cutting | T-shirt: M | Story points: 8–13
System context
C13 owns the FDR writer thread, segment files, and the 64 GB cap. Every other component publishes via a producer-side client: lock-free enqueue + an FdrRecord schema versioned in RecordSchema. This epic owns ONLY the producer side; the writer-thread internals belong to E-C13.
flowchart LR
PROD[Component producer] --> Q[lock-free ring buffer]
Q --> WRITER[E-C13 writer thread]
WRITER --> SEG[segment file on NVM]
Problem / Context
Producer-side correctness (drop-oldest with rollover-log, schema versioning, never-block) is independent of where the file lands. Co-locating producer logic inside E-C13 would force every component test to spin up the writer thread; a thin shared client lets component tests use a fake sink.
Scope
In scope:
- `src/shared/fdr_client/__init__.py` exporting `FdrClient(producer_id: str) -> Client`.
- Lock-free SPSC ring buffer per producer; capacity configurable (default per producer in `Config`).
- `FdrRecord` versioned schema (orjson or msgpack, pinned in E-BOOT).
- Drop-oldest behaviour writing a structured `kind=overrun` record with `producer_id` + dropped count (never silent).
- `FakeFdrSink` for component-level tests.
Out of scope: writer thread, segment files, 64 GB cap, rollover policy (E-C13).
Architecture notes
- AC-NEW-3 (no silent drops) is enforced HERE: drop-oldest always emits the overrun record.
- Schema versioning prevents post-flight tooling breakage when payload classes evolve.
Interface specification
class FdrClient:
    def __init__(self, producer_id: str): ...
    def enqueue(self, record: FdrRecord) -> None: ...  # lock-free, never blocks
    def flush(self) -> None: ...  # used by tests only
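
The drop-oldest semantics can be sketched in a few lines. This is an illustrative, single-threaded stand-in: it uses `collections.deque` rather than a lock-free SPSC ring buffer, and the names `DropOldestQueue` and `drain` are hypothetical. One design choice specific to this sketch: the `kind=overrun` record is synthesised at drain time, so it cannot itself be dropped.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FdrRecord:
    kind: str
    payload: dict = field(default_factory=dict)


class DropOldestQueue:
    """Bounded queue: when full, the oldest record is dropped and counted."""

    def __init__(self, producer_id: str, capacity: int):
        self.producer_id = producer_id
        self.capacity = capacity
        self._items = deque()
        self._dropped = 0

    def enqueue(self, record: FdrRecord) -> None:
        # Never blocks: make room by discarding the oldest entry.
        if len(self._items) >= self.capacity:
            self._items.popleft()
            self._dropped += 1
        self._items.append(record)

    def drain(self) -> list[FdrRecord]:
        """Writer side: report any drops first, then hand over the survivors."""
        out = []
        if self._dropped:
            out.append(FdrRecord(
                "overrun",
                {"producer_id": self.producer_id, "dropped_count": self._dropped},
            ))
            self._dropped = 0
        out.extend(self._items)
        self._items.clear()
        return out
```

A `FakeFdrSink` for component tests could wrap the same structure and expose the drained list for assertions.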
Data flow
sequenceDiagram
participant C as Component
participant Q as Ring buffer
participant W as Writer (E-C13)
C->>Q: enqueue(record)
alt overrun
Q->>Q: drop oldest + emit kind=overrun record
end
W->>Q: dequeue (in writer thread)
Dependencies
- Depends on E-BOOT, E-CC-LOG.
- Consumed by every component that emits FDR records.
Acceptance criteria
- `enqueue` never blocks even under writer-thread stall (verified by C13-IT-05 from the C13 tests.md).
- Every overrun event produces a structured record with non-zero `dropped_count` and the originating `producer_id`.
- Schema version bump (e.g., adding a new field) does not break post-flight tooling that reads at version N-1 (forward-compatible parser).
Non-functional requirements
- `enqueue` p99 ≤ 5 µs on Tier-2 (no allocation on the steady-state path; pre-sized buffers).
- Per-producer ring buffer size ≤ configured cap (no unbounded growth).
Risks & mitigations
- R13 (queue overrun) — the design IS the mitigation: drop-oldest + always log.
Effort
T-shirt M; 8–13 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | Lock-free SPSC ring buffer per producer | 5 |
| 2 | `FdrRecord` schema + versioned serialiser (orjson/msgpack) | 3 |
| 3 | Drop-oldest + `kind=overrun` record emission | 2 |
| 4 | `FakeFdrSink` for component tests | 2 |
Key constraints
- AC-NEW-3 (no silent drops).
Testing strategy
- Unit: ring buffer correctness under contention; overrun record emitted.
- Property tests: forward-compat parser at version N-1.
E-C13 — C13 Flight Data Recorder
Tracker: AZ-248 | Type: component | T-shirt: L | Story points: 21–34
System context
flowchart LR
ALL[All components] -->|enqueue via E-CC-FDR-CLIENT| Q[per-producer queues]
Q --> W[C13 writer thread]
W --> SEGS[segmented files on NVM]
SEGS -.->|post-landing| OPTOOL[E-C12 retrieval]
Problem / Context
Per-flight ≤ 64 GB record of every payload class onboard, no silent drops, raw frames excluded except the ≤ 0.1 Hz failed-tile thumbnail forensic exception (AC-NEW-3, AC-8.5). Single writer thread; every other component produces.
Scope
In scope: writer thread, segment file lifecycle, 64 GB cap with oldest-segment-dropped policy, per-flight FlightHeader + FlightFooter, atomic segment rotation, mid-flight tile snapshot path, failed-tile thumbnail rate cap, refusal of takeoff when open_flight fails.
Out of scope: producer-side enqueue (E-CC-FDR-CLIENT); post-flight retrieval UI (E-C12).
Architecture notes
- File: `_docs/02_document/components/14_c13_fdr/description.md` is the canonical spec.
- `IncrementalFixedLagSmoother` from C5 publishes smoothed past-keyframes via FDR ONLY (AC-4.5 revised), NOT into the FC stream.
- Segment rotation uses `atomicwrites`; cross-process safety on the FDR root via `filelock`.
Interface specification
class FdrWriter:
    def open_flight(self, header: FlightHeader) -> None: ...  # raises FdrOpenError
    def write_record(self, record: FdrRecord) -> None: ...  # lock-free; FdrQueueOverrunError logged not raised
    def close_flight(self) -> FlightFooter: ...
    def current_size_bytes(self) -> int: ...
    def is_rolling(self) -> bool: ...
Data flow
sequenceDiagram
participant Prods as Producers (every component)
participant Q as Per-producer queues
participant W as Writer thread
participant FS as NVM segment file
Prods->>Q: enqueue(record)
W->>Q: dequeue
W->>FS: serialise + append
alt segment >= cap
W->>FS: atomic rotate, drop oldest if total > 64 GB, emit kind=segment_rollover
end
Dependencies
- E-BOOT, E-CC-LOG, E-CC-CONF, E-CC-FDR-CLIENT.
Acceptance criteria
- AC-NEW-3: synthetic 8 h replay produces ≤ 64 GB on disk, with every drop accompanied by a `kind=overrun` and/or `kind=segment_rollover` record.
- AC-8.5: `kind=raw_nav_frame` writes raise `RawFrameWriteForbiddenError`; `kind=failed_tile_thumbnail` rate-limited to ≤ 0.1 Hz.
- AC-1.4 / AC-4.5: every smoothed past-keyframe revision lands in FDR; the FC emission stream is unchanged.
- AC-NEW-3 takeoff gate: `FdrOpenError` aborts takeoff before the FC adapter is opened.
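
The AC-8.5 behaviour (raw frames refused, thumbnails capped at 0.1 Hz) can be sketched as an admission gate. This is a minimal sketch with hypothetical names (`RecordGate`, `admit`). It takes the timestamp as an argument so the logic stays deterministic in tests. A 0.1 Hz cap is read here as "at most one thumbnail per 10 s", which is an assumption about how the limit is measured.

```python
from typing import Optional


class RawFrameWriteForbiddenError(Exception):
    """AC-8.5: raw navigation frames must never be written to the FDR."""


class RecordGate:
    def __init__(self, min_interval_s: float = 10.0):  # 0.1 Hz => one per 10 s
        self.min_interval_s = min_interval_s
        self._last_thumbnail_s: Optional[float] = None

    def admit(self, kind: str, now_s: float) -> bool:
        """Return True if a record of this kind may be written at time now_s."""
        if kind == "raw_nav_frame":
            raise RawFrameWriteForbiddenError(kind)
        if kind != "failed_tile_thumbnail":
            return True  # other kinds are not rate-limited by this gate
        last = self._last_thumbnail_s
        if last is not None and now_s - last < self.min_interval_s:
            return False  # too soon after the previous thumbnail
        self._last_thumbnail_s = now_s
        return True
```

In production the caller would pass a monotonic clock reading such as `time.monotonic()` for `now_s`.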
Non-functional requirements
- Writer throughput ≥ 200 Hz aggregate (per C13-PT-01).
- Per-record serialise + write p95 ≤ 5 ms.
Risks & mitigations
- R13 (queue overrun) — drop-oldest + always-log.
- R02 (ADR-004) — C13 runs in the airborne process; no cross-process FDR root contention with C11 (C11 not airborne).
Effort
T-shirt L; 21–34 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | Writer thread + segment file open/close/rotate | 5 |
| 2 | `FlightHeader` / `FlightFooter` + records-written/dropped accounting | 3 |
| 3 | 64 GB cap + oldest-segment-dropped policy + `kind=segment_rollover` record | 5 |
| 4 | Mid-flight tile snapshot path + filesystem layout | 3 |
| 5 | Failed-tile thumbnail ≤ 0.1 Hz rate limiter + AC-8.5 enforcement | 3 |
| 6 | `FdrOpenError` takeoff abort path | 2 |
| 7 | Component-internal tests C13-IT-01..06 + C13-PT-01 + C13-ST-01 | 5 |
Key constraints
- AC-NEW-3, AC-8.5, AC-4.5 (revised), RESTRICT-UAV-4.
Testing strategy
- Per `_docs/02_document/components/14_c13_fdr/tests.md`: six component-internal tests + 8 h NFT-LIM-02 at the suite level.
E-C7 — C7 On-Jetson Inference Runtime
Tracker: AZ-249 | Type: component | T-shirt: L | Story points: 21–34
System context
flowchart LR
C2[C2 VPR] --> C7
C25[C2.5 Re-rank] --> C7
C3[C3 Matcher] --> C7
C35[C3.5 AdHoP] --> C7
C7 --> TRT[TensorRT]
C7 --> ORT[ONNX Runtime]
C7 --> PT[PyTorch FP16]
C7 --> THERMAL[ThermalState publish]
THERMAL --> C4[C4 Pose hybrid]
Problem / Context
Centralise GPU inference on Jetson: engine compilation, deserialise + warm-up, per-call inference, fallback chain (TRT → ONNX-RT+TRT-EP → PyTorch FP16), and ThermalState telemetry that drives D-CROSS-LATENCY-1.
Scope
In scope: engine cache lifecycle, deserialise + warm-up budget (AC-NEW-1), ThermalState publisher from jetson-stats, D-C10-3 takeoff content-hash gate (engine-side), D-C10-7 filename-schema enforcement, ONNX-RT fallback path.
Out of scope: cache artifact build (E-C10), tile cache (E-C6), the per-frame consumers (their own epics).
Architecture notes
- File: `components/09_c7_inference/description.md`.
- Python in-process abstraction over C++ TRT bindings; no separate process.
- Engines hardware-tied (SM 87 / JP 6.2 / TRT 10.3 / FP16) per D-C10-6.
- Helper `EngineFilenameSchema` is shared with E-C10.
Interface specification
class InferenceRuntime:
    def load_engine(self, model_id: str) -> EngineHandle: ...
    def infer(self, handle: EngineHandle, batch: Tensor) -> Tensor: ...
    def thermal_state(self) -> ThermalState: ...
    def warm_up(self, handle: EngineHandle) -> None: ...
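
The fallback chain (TRT → ONNX-RT+TRT-EP → PyTorch FP16) can be sketched as an ordered list of loaders tried in turn. This is a minimal sketch: the backends are stubs, not real TensorRT, ONNX Runtime, or PyTorch calls. `EngineUnavailableError`, `load_with_fallback`, and the backend labels are hypothetical names.

```python
from typing import Callable, Sequence


class EngineUnavailableError(RuntimeError):
    """Hypothetical error a backend raises when it cannot serve a model."""


Loader = Callable[[str], object]


def load_with_fallback(
    model_id: str, backends: Sequence[tuple[str, Loader]]
) -> tuple[str, object]:
    """Try each backend in order; return (backend_name, handle) of the first success."""
    failures = []
    for name, loader in backends:
        try:
            return name, loader(model_id)
        except EngineUnavailableError as exc:
            failures.append(f"{name}: {exc}")
    raise EngineUnavailableError("all backends failed: " + "; ".join(failures))


# Stub loaders standing in for the real runtimes.
def missing_engine(model_id: str) -> object:
    raise EngineUnavailableError(f"no engine for {model_id}")


def stub_engine(model_id: str) -> object:
    return f"handle:{model_id}"


CHAIN = [
    ("trt", missing_engine),
    ("onnxrt_trt_ep", stub_engine),
    ("pytorch_fp16", stub_engine),
]
```

Returning the backend name alongside the handle lets the caller log which path was taken, which matters when a fallback silently changes latency.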
Data flow
sequenceDiagram
participant Caller as C2/C2.5/C3/C3.5
participant C7 as InferenceRuntime
participant GPU as TRT/ONNX/PT
Caller->>C7: infer(handle, batch)
C7->>GPU: forward
GPU-->>C7: output
C7-->>Caller: tensor
C7-->>C4: ThermalState pub (≥1 Hz)
Dependencies
- E-BOOT, E-CC-CONF, E-CC-FDR-CLIENT.
- External: TensorRT 10.3, ONNX Runtime + TRT EP, PyTorch FP16, jetson-stats.
Acceptance criteria
- AC-NEW-1 cold-start: every required engine deserialises + warms in ≤ 30 s p95 (C7-IT-01).
- AC-NEW-5: `ThermalState` updates ≥ 1 Hz; throttle-detection latency ≤ 1 s; C4 hybrid switch within 1 frame (C7-IT-02).
- D-C10-3: `EngineHashMismatchError` aborts F2 takeoff; no GPU memory allocated on mismatch (C7-IT-03).
- D-C10-7: filename-schema mismatch refused at parse time (C7-IT-04).
- ONNX-RT fallback path produces correct results when TRT engine missing (C7-IT-05).
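
The D-C10-3 ordering (hash check first, deserialisation only afterwards) can be sketched as below. This is a minimal sketch, assuming the expected digest is a SHA-256 hex string supplied by the caller. Where that digest is stored is not specified here, and `verify_engine_hash` and `load_engine_checked` are hypothetical helper names.

```python
import hashlib
from typing import Callable


class EngineHashMismatchError(RuntimeError):
    """The engine file's content hash does not match the expected digest."""


def verify_engine_hash(engine_bytes: bytes, expected_sha256: str) -> None:
    actual = hashlib.sha256(engine_bytes).hexdigest()
    if actual != expected_sha256.lower():
        raise EngineHashMismatchError(f"expected {expected_sha256}, got {actual}")


def load_engine_checked(
    engine_bytes: bytes,
    expected_sha256: str,
    deserialise: Callable[[bytes], object],
) -> object:
    # The gate runs before deserialisation, so a mismatch never reaches the
    # step that would allocate GPU memory.
    verify_engine_hash(engine_bytes, expected_sha256)
    return deserialise(engine_bytes)
```

Passing `deserialise` as a callable keeps the gate testable without a GPU.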
Non-functional requirements
- Per-model p95 latencies (C7-PT-01): UltraVPR ≤ 60 ms, LightGlue ≤ 30 ms, AdHoP ≤ 90 ms, DISK ≤ 50 ms.
- GPU memory all engines resident ≤ 4 GB; system RAM ≤ 1.5 GB (C7-PT-02).
Risks & mitigations
- R04 (engine cache hardware-tied) — D-C10-7 + D-C10-3 enforced at C7's deserialise path.
- R10 (Marginals under thermal throttle): C7's `ThermalState` publish is the upstream input to the C4 hybrid.
Effort
T-shirt L; 21–34 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | TRT engine load + warm-up + cache lifecycle | 5 |
| 2 | ONNX-RT + TRT-EP fallback path | 3 |
| 3 | PyTorch FP16 simple-baseline path | 3 |
| 4 | D-C10-3 content-hash gate + D-C10-7 filename schema enforcement | 3 |
| 5 | `ThermalState` publisher from jetson-stats | 3 |
| 6 | Component-internal tests C7-IT-01..05 + C7-PT-01..02 + C7-ST-01 | 5 |
Key constraints
- RESTRICT-HW-1 (Jetson + 25 W TDP), AC-4.2 (8 GB system memory), AC-NEW-5 (thermal envelope).
Testing strategy
- Per `components/09_c7_inference/tests.md`.
E-C6 — C6 Tile Cache + Spatial Index
Tracker: AZ-250 | Type: component | T-shirt: M | Story points: 13–21
System context
flowchart LR
C11[C11 TileDownloader] --> C6
C10[C10 build] --> C6
C5[C5 mid-flight gen] --> C6
C2[C2 VPR] --> C6
C25[C2.5 rerank] --> C6
C3[C3 matcher] --> C6
C11U[C11 TileUploader] --> C6
Problem / Context
Persistent imagery store byte-identical to satellite-provider's on-disk layout, plus the FAISS HNSW spatial index for VPR. Sole writers: C11 TileDownloader (production) + C5/orthorectifier (mid-flight). Sole readers: C2/C2.5/C3 (per-frame) + C11 TileUploader (post-landing).
Scope
In scope: Postgres tiles schema, filesystem JPEG layout matching satellite-provider, FAISS HNSW build/load (the index FILE — population via C10), per-sector freshness gates at write-time, 10 GB cache budget enforcement with LRU eviction, content-SHA-256 invariant on insert, mid-flight tile insert with quality_metadata.
Out of scope: tile fetch (E-C11), descriptor population (E-C10), inference (E-C7).
Architecture notes
- File: `components/08_c6_tile_cache/description.md`, `data_model.md`.
- Schema in `data_model.md`; Postgres 16; SHA-256 sidecar via helper `Sha256Sidecar`.
Interface specification
class TileStore:
    def insert(self, tile: TileRecord, jpeg: bytes) -> None: ...  # raises FreshnessRejected, ContentHashMismatch
    def get_tile_pixels(self, tile_id: TileId) -> bytes: ...
    def query_spatial(self, bbox: Bbox, zoom: int) -> list[TileRecord]: ...
    def mark_uploaded(self, tile_id: TileId) -> None: ...
    def pending_uploads(self) -> list[TileRecord]: ...
Data flow
sequenceDiagram
participant W as Writer (C11 / C5)
participant C6 as C6
participant DB as Postgres
participant FS as Filesystem
W->>C6: insert(tile, jpeg)
C6->>C6: freshness gate + sha256 check
C6->>FS: write jpeg + sidecar
C6->>DB: insert row
Dependencies
- E-BOOT, E-CC-LOG, E-CC-CONF.
- External: PostgreSQL 16, FAISS.
Acceptance criteria
- AC-8.1: filesystem layout byte-identical to `satellite-provider` for the same coordinate (C6-IT-01).
- AC-8.2 / AC-NEW-6: per-sector freshness gate rejects in active_conflict, downgrade-flags in stable_rear (C6-IT-02 / C6-IT-05).
- AC-8.4: every mid-flight tile carries `quality_metadata` (C6-IT-03).
- AC-NEW-3: peak F4 burst (5 Hz, 100 tiles) writes without dropping (C6-IT-04).
- RESTRICT-SAT-2: 10 GB cap enforced with LRU eviction, every eviction logged (C6-IT-06).
- Defensive: SHA-256 mismatch rejects insert (C6-ST-01).
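
The RESTRICT-SAT-2 criterion (byte budget with LRU eviction, every eviction logged) can be sketched with an `OrderedDict`. This is a minimal in-memory sketch. The real cache tracks tiles in Postgres and on disk, and the class name `LruByteBudget` plus the `evicted` list (a stand-in for the eviction log) are hypothetical.

```python
from collections import OrderedDict


class LruByteBudget:
    """Track tile sizes and evict least-recently-used tiles over a byte budget."""

    def __init__(self, budget_bytes: int):
        self.budget_bytes = budget_bytes
        self._sizes = OrderedDict()  # tile_id -> size, oldest first
        self.total_bytes = 0
        self.evicted = []  # stand-in for the eviction log

    def touch(self, tile_id: str) -> None:
        # A read marks the tile as most recently used.
        self._sizes.move_to_end(tile_id)

    def insert(self, tile_id: str, size_bytes: int) -> None:
        if tile_id in self._sizes:
            self.total_bytes -= self._sizes.pop(tile_id)
        self._sizes[tile_id] = size_bytes
        self.total_bytes += size_bytes
        # Evict from the cold end, but never the tile just inserted.
        while self.total_bytes > self.budget_bytes and len(self._sizes) > 1:
            victim, victim_size = self._sizes.popitem(last=False)
            self.total_bytes -= victim_size
            self.evicted.append(victim)
```

For the documented cap, `budget_bytes` would be set to the 10 GB limit.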
Non-functional requirements
- Per-tile read p95 (warm mmap) ≤ 0.5 ms; cold ≤ 50 ms (C6-PT-01).
Risks & mitigations
- R08 (freshness drift in active_conflict) — write-side gate is the primary mitigation.
Effort
T-shirt M; 13–21 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | Postgres `tiles` schema + migration | 3 |
| 2 | Filesystem JPEG store byte-identical to satellite-provider | 3 |
| 3 | FAISS HNSW load/save + mmap | 3 |
| 4 | Freshness gate + sector classification | 3 |
| 5 | 10 GB LRU eviction with logging | 3 |
| 6 | Component-internal tests C6-IT-01..06 + C6-PT-01 + C6-ST-01 | 5 |
Key constraints
- AC-8.1, AC-8.2, AC-NEW-6, RESTRICT-SAT-2, RESTRICT-UAV-4.
Testing strategy
Per components/08_c6_tile_cache/tests.md.
E-C11 — C11 Tile Manager (TileDownloader + TileUploader)
Tracker: AZ-251 | Type: component | T-shirt: M | Story points: 13–21
System context
flowchart LR
SP[satellite-provider] -->|GET| DL[C11 TileDownloader]
DL --> C6[C6 cache]
C6 --> UP[C11 TileUploader]
UP -->|POST /ingest| SP
classDef airborne fill:#fee
classDef operator fill:#cef
class DL,UP operator
Problem / Context
Sole operator-side network I/O against satellite-provider, both directions. Strict ADR-004: never loaded into the airborne companion image. Bundled because download + upload share auth, HTTP client, deployment unit, and the airborne-exclusion property.
Scope
In scope: TileDownloader.fetch (download → freshness gate → write to C6), TileUploader.upload_pending (read C6 pending → sign → POST → mark uploaded), per-flight ephemeral signing key, idempotent retry on partial-success batches, flight_state == ON_GROUND gate (defense-in-depth atop ADR-004).
Out of scope: any airborne code; cache artifact build (E-C10); orchestration (E-C12).
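Idempotent retry on a partial-success batch, as listed in the scope above, can be sketched like this. The callables `post_batch` and `mark_uploaded` and the `max_attempts` parameter are hypothetical stand-ins; the real TileUploader also signs payloads and talks HTTP.

```python
from typing import Callable, Iterable


def upload_with_retry(
    pending_ids: Iterable[str],
    post_batch: Callable[[list[str]], set[str]],
    mark_uploaded: Callable[[str], None],
    max_attempts: int = 3,
) -> list[str]:
    """Send pending tiles; return the ids still unacknowledged at the end."""
    remaining = list(pending_ids)
    for _ in range(max_attempts):
        if not remaining:
            break
        acked = post_batch(remaining)  # ids the server acknowledged
        for tile_id in remaining:
            if tile_id in acked:
                mark_uploaded(tile_id)
        # Only unacknowledged tiles are retried, so acked tiles are never re-sent.
        remaining = [t for t in remaining if t not in acked]
    return remaining
```

Marking each tile as uploaded as soon as it is acknowledged is what makes a crash between attempts safe: the next run starts from the pending set in C6.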
Architecture notes
- File: components/12_c11_tilemanager/description.md.
- ADR-004 enforcement via E-BOOT's SBOM diff + runtime self-check.
- Test substitute: e2e-test mock-suite-sat-service fixture under tests/fixtures/ (R01).
Interface specification
class TileDownloader:
    def fetch(self, req: FetchRequest) -> DownloadBatchReport: ...
class TileUploader:
    def upload_pending(self, flight_state: FlightStateSignal) -> UploadBatchReport: ...
    # raises UploadGateBlockedError if flight_state != ON_GROUND
Data flow
sequenceDiagram
participant Op as Operator
participant DL as TileDownloader
participant SP as satellite-provider
participant C6 as C6
Op->>DL: fetch(area, sector_classification)
DL->>SP: GET tiles
SP-->>DL: tiles + metadata
DL->>C6: insert (after freshness gate)
DL-->>Op: DownloadBatchReport
Dependencies
- E-C6, E-CC-CONF, E-CC-LOG.
- External: real satellite-provider (download); D-PROJ-2 endpoint OR e2e-test fixture (upload).
Acceptance criteria
- C11-IT-01: TileDownloader fetch + freshness gate + C6 write byte-identical layout.
- C11-IT-02: stale-rejection counts surface in DownloadBatchReport.
- C11-IT-03: TileUploader posts pending, signs payloads, marks uploaded on 202.
- C11-IT-04: UploadGateBlockedError when not ON_GROUND.
- C11-IT-05: idempotent retry — already-acked tiles not re-sent.
- C11-ST-01: airborne process cannot import c11_tilemanager (R02 enforcement).
- C11-ST-02: NFT-SEC-02 network-egress test passes.
- C11-ST-03: per-flight key zeroised after upload.
Non-functional requirements
- Download throughput ≥ 50 MB/s on 1 Gbps link (C11-PT-01).
- Upload throughput ≥ 20 tile/s with signing (C11-PT-02).
Risks & mitigations
- R01 (D-PROJ-2 not yet shipped) — TileUploader works against the e2e-test fixture; the fixture is retired once the real endpoint lands.
- R02 (ADR-004 break) — three enforcement gates; C11 tests verify each.
- R09 (key compromise) — per-flight ephemeral keys; voting layer for compromise detection.
Effort
T-shirt M; 13–21 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | TileDownloader: GET + freshness gate + C6 write | 5 |
| 2 | TileUploader: read pending + sign + POST + mark uploaded | 5 |
| 3 | Idempotent retry on partial-success batch | 3 |
| 4 | flight_state == ON_GROUND gate (defense-in-depth) | 2 |
| 5 | Per-flight ephemeral signing key + zeroisation | 3 |
| 6 | Component-internal tests C11-IT-01..05 + C11-PT-01..02 + C11-ST-01..03 + C11-AT-01 | 5 |
Key constraints
- ADR-004, RESTRICT-SAT-1 (no in-flight Service calls), AC-8.3, AC-8.4, AC-NEW-6.
Testing strategy
Per components/12_c11_tilemanager/tests.md.
E-C10 — C10 Pre-flight Cache Provisioning
Tracker: AZ-252 | Type: component | T-shirt: M | Story points: 13–21
System context
flowchart LR
C6[C6 already populated by C11] --> C10
C10 --> ENGINES[TRT engines]
C10 --> DESCS[FAISS descriptors]
C10 --> MAN[signed Manifest]
ENGINES & DESCS & MAN --> AIRBORNE[airborne image at F2 takeoff]
Problem / Context
Build model-derived artifacts from an already-populated C6: TRT engines, VPR descriptors (calling C2's embed_query over the corpus), the signed Manifest with content-hashes. Idempotent re-run on unchanged C6.
Scope
In scope: CacheProvisioner.build_artifacts, ManifestVerifier.verify, idempotence (D-C10-1), Manifest covers every shipped artifact, hardware-tied engine compile (D-C10-6), filename schema (D-C10-7), operator-key requirement.
Out of scope: tile fetch (E-C11), tile cache writes (E-C6), engine deserialisation (E-C7).
Architecture notes
- File: components/11_c10_provisioning/description.md.
- C10 narrowed in this Plan cycle: it does NOT talk to satellite-provider. Tiles must be present in C6 before C10 runs.
Interface specification
class CacheProvisioner:
    def build_artifacts(self, corpus_root: Path, key_path: Path) -> BuildReport: ...
class ManifestVerifier:
    def verify(self, manifest: Path, public_key: PublicKey) -> ManifestVerdict: ...
Data flow
sequenceDiagram
participant Op as Operator
participant C10 as CacheProvisioner
participant C2 as C2 embed_query
participant FS as Filesystem
Op->>C10: build_artifacts(corpus_root, key)
C10->>C2: embed every tile
C2-->>C10: descriptors
C10->>FS: write engines + faiss + manifest
C10->>FS: sign manifest with operator key
C10-->>Op: BuildReport
Dependencies
- E-C6, E-C7, E-CC-LOG.
Acceptance criteria
- C10-IT-01: end-to-end build produces engines + descriptors + signed Manifest.
- C10-IT-02: ManifestVerifier rejects tampered or wrong-key Manifests.
- C10-IT-03: idempotent re-run — same hash, no recompile (D-C10-1).
- C10-IT-04: ManifestCoverageError on orphan files (no smuggled artifacts).
- C10-IT-05: Tier-2 build produces SM 87 / JP 6.2 / TRT 10.3 / FP16 engines (D-C10-6).
- C10-ST-01: build refuses dev-key signing in operator mode.
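The orphan-file check behind C10-IT-04 can be illustrated with a content-hash table. This is a sketch under assumptions: the manifest is modelled as a plain `dict` of relative path to SHA-256 hex digest and is stored outside the artifact root; `content_hashes` and `check_coverage` are hypothetical names, and signing is not shown.

```python
import hashlib
from pathlib import Path


class ManifestCoverageError(Exception):
    """Raised when a shipped file is not listed in the manifest."""


def content_hashes(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_coverage(root: Path, manifest: dict[str, str]) -> None:
    """Reject any file under root that the manifest does not cover."""
    orphans = sorted(set(content_hashes(root)) - set(manifest))
    if orphans:
        raise ManifestCoverageError(f"files not in manifest: {orphans}")
```

The same hash table also gives a cheap idempotence signal: if a re-run produces an identical table, nothing needs recompiling.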
Non-functional requirements
- Cold build wall-clock ≤ 12 min on developer laptop with NVIDIA GPU; warm idempotent re-run ≤ 1 min (C10-PT-01).
Risks & mitigations
- R04 (engine cache hardware-tied) — owner of the build side; deserialise side is C7.
Effort
T-shirt M; 13–21 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | TRT engine compile (per-model) | 5 |
| 2 | FAISS descriptor population via C2's embed path | 3 |
| 3 | Signed Manifest builder + content-hash table | 3 |
| 4 | ManifestVerifier with operator-key requirement | 3 |
| 5 | Idempotent re-run + ManifestCoverageError | 3 |
| 6 | Component-internal tests C10-IT-01..05 + C10-PT-01 + C10-ST-01 | 5 |
Key constraints
- AC-8.3, AC-NEW-1, D-C10-1 / D-C10-3 / D-C10-6 / D-C10-7.
Testing strategy
Per components/11_c10_provisioning/tests.md.
E-C12 — C12 Operator Pre-flight Orchestrator
Tracker: AZ-253 | Type: component | T-shirt: M | Story points: 13–21
System context
flowchart LR
CLI[operator-orchestrator CLI]
CLI --> C11D[C11 TileDownloader]
CLI --> C10[C10 CacheProvisioner]
CLI --> C11U[C11 TileUploader]
CLI --> RELOC[AC-3.4 re-loc workflow]
CLI --> FDR[FDR retrieval]
Problem / Context
Operator-facing CLI that sequences pre-flight (C11 download → C10 build) and post-landing (C11 upload), surfaces actionable failures, and handles the AC-3.4 re-localization workflow. Delivered as part of the operator-orchestrator tarball.
Scope
In scope: CLI subcommands (download, build-cache, upload-pending, reloc-confirm), CacheBuildReport aggregation, post-landing flight_state == ON_GROUND confirmation from FDR, sector-classification UI hook, FDR retrieval helpers.
Out of scope: actual download/upload (E-C11); engine compile (E-C10); FDR write side (E-C13).
Architecture notes
- File: components/13_c12_operator_orchestrator/description.md.
- Strict process boundary: C12 is operator-side only, in the same image as C11, but never airborne.
Interface specification
class OperatorTool:
    def build_cache(self, area: Area, sector_classification: SectorMap) -> CacheBuildReport: ...
    def trigger_post_landing_upload(self, fdr_root: Path) -> UploadBatchReport: ...
    def confirm_relocation(self, candidate: ReLocCandidate) -> None: ...
Data flow
sequenceDiagram
participant Op as Operator
participant C12 as OperatorTool
participant C11 as C11
participant C10 as C10
Op->>C12: build_cache(area)
C12->>C11: TileDownloader.fetch
C11-->>C12: DownloadBatchReport
C12->>C10: build_artifacts
C10-->>C12: BuildReport
C12-->>Op: CacheBuildReport
Dependencies
- E-C10, E-C11, E-CC-LOG.
Acceptance criteria
- C12-IT-01: operator re-loc workflow returns SUT to satellite_anchored ≤ 30 s (AC-3.4).
- C12-IT-02: build_cache orchestrates C11 then C10; download failure aborts before C10.
- C12-IT-03: trigger_post_landing_upload requires ≥ 30 s confirmed ON_GROUND in FDR.
- C12-IT-04: actionable failure messages + non-zero exit on stale-tile rate > 30% or manifest signature failure.
- C12-ST-01: no CLI command path imports into airborne package boundary.
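One way to read the "≥ 30 s confirmed ON_GROUND in FDR" requirement of C12-IT-03 is as a check on the trailing run of flight-state samples. The record shape below, a time-ordered sequence of `(timestamp_s, state)` tuples, and the function name are assumptions; the actual FDR schema is defined elsewhere.

```python
from typing import Sequence


def on_ground_confirmed(
    records: Sequence[tuple[float, str]],
    min_duration_s: float = 30.0,
) -> bool:
    """True if the trailing run of ON_GROUND samples spans min_duration_s."""
    if not records or records[-1][1] != "ON_GROUND":
        return False
    end_t = records[-1][0]
    start_t = end_t
    # Walk backwards until the first sample that is not ON_GROUND.
    for t, state in reversed(records):
        if state != "ON_GROUND":
            break
        start_t = t
    return end_t - start_t >= min_duration_s
```

Only the final uninterrupted run counts, so a brief touch-and-go earlier in the log cannot satisfy the gate.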
Non-functional requirements
- End-to-end build_cache wall-clock ≤ 18 min on developer laptop with NVIDIA GPU (C12-PT-01).
Risks & mitigations
- R08 (freshness drift) — actionable failure surfacing in CacheBuildReport.
Effort
T-shirt M; 13–21 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | CLI scaffolding + subcommand routing | 3 |
| 2 | build_cache orchestration (C11 then C10) | 3 |
| 3 | trigger_post_landing_upload with FDR-state confirmation | 3 |
| 4 | AC-3.4 re-localization workflow | 3 |
| 5 | Actionable failure surfacing in CacheBuildReport | 2 |
| 6 | Component-internal tests C12-IT-01..04 + C12-PT-01 + C12-ST-01 + C12-AT-01 | 5 |
Key constraints
- ADR-004 (C12 lives operator-side); AC-3.4, AC-8.3, AC-8.4.
Testing strategy
Per components/13_c12_operator_orchestrator/tests.md.
E-C1 — C1 Visual / Visual-Inertial Odometry
Tracker: AZ-254 | Type: component | T-shirt: XL | Story points: 34–55
System context
flowchart LR
NAVCAM[Nav camera 3 Hz] --> C1
C8IMU[C8 ImuWindow 100-200 Hz] --> C1
CAL[CameraCalibration] --> C1
C1 --> C5[C5 StateEstimator]
Problem / Context
Per-frame relative pose SE(3) + 6×6 covariance + IMU bias estimate from nav-camera + FC IMU. Three pluggable strategies (Okvis2 production-default, VinsMono research-only, KltRansac mandatory simple-baseline) selected at startup, build-time gated, never hot-swappable. Largest single epic by complexity.
Scope
In scope: VioStrategy interface + the three concrete strategies, ImuPreintegrator helper, warm-start path (AC-5.1), reboot recovery (AC-5.3), KltRansac as the simple-baseline AC-2.1a check, honest covariance under degradation.
Out of scope: state fusion (E-C5), pose estimation (E-C4), satellite anchoring (E-C2/C3/C4 chain).
Architecture notes
- File: components/01_c1_vio/description.md.
- Strategy + composition root + build-time exclusion (ADR-001 / ADR-002 / ADR-009).
- C++ strategies via pybind11; KltRansac thin Python wrapper around OpenCV.
- ImuPreintegrator shared with E-C5 (built once, used twice).
Interface specification
class VioStrategy(Protocol):
    def process_frame(self, frame: NavCameraFrame, imu: ImuWindow, cal: CameraCalibration) -> VioOutput: ...
    def reset_to_warm_start(self, pose: WarmStartPose) -> None: ...
    def health_snapshot(self) -> VioHealth: ...
DTOs in components/01_c1_vio/description.md § 2.
Data flow
sequenceDiagram
participant CAM as Nav camera
participant C1 as VioStrategy
participant C5 as C5
participant FDR as FDR
CAM->>C1: NavCameraFrame
C1->>C1: IMU preintegrate + feature tracking
C1->>C5: VioOutput (relative pose + 6x6 cov + bias)
C1->>FDR: VioHealth (ERROR + WARN; DEBUG to stdout)
Dependencies
- E-BOOT, E-CC-FDR-CLIENT, E-C7 (only for the simple-baseline KltRansac path; OKVIS2 / VinsMono are CPU-bound, not GPU).
Acceptance criteria
- C1-IT-01: honest cov norm rises monotonically under feature-loss event (AC-1.3 / AC-1.4).
- C1-IT-02: VioOutput schema invariants — SPD covariance + matched frame_id (AC-1.4).
- C1-IT-03: KltRansac ≥ 95% tracked-frame ratio on Derkachi normal segment (AC-2.1a engine rule).
- C1-IT-04: MRE p95 < 1 px frame-to-frame for Okvis2 + KltRansac (AC-2.2).
- C1-IT-05: warm-start converges within 5 frames (AC-5.1).
- C1-IT-06: F8 reboot recovery from warm-start hint without fake confidence (AC-5.3).
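The SPD-covariance invariant in C1-IT-02 can be checked with a Cholesky factorisation, which succeeds exactly when a symmetric matrix is positive definite. The sketch below is pure Python for illustration; the function name and the symmetry tolerance are assumptions, and a real test would more likely use a numerical library.

```python
import math
from typing import Sequence


def is_spd(m: Sequence[Sequence[float]], sym_tol: float = 1e-9) -> bool:
    """True if m is square, symmetric within sym_tol, and positive definite."""
    n = len(m)
    if n == 0 or any(len(row) != n for row in m):
        return False
    for i in range(n):
        for j in range(i):
            if abs(m[i][j] - m[j][i]) > sym_tol:
                return False
    # Cholesky: fails (non-positive pivot) if and only if m is not positive definite.
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True
```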
Non-functional requirements
- C1-PT-01: process_frame p95 ≤ 80 ms (Okvis2) at 3 Hz on Tier-2 with C2 backbone running concurrently; throughput ≥ 3 Hz sustained.
- CPU ≤ 30% one core; memory ≤ 1.5 GB resident.
Risks & mitigations
- R10 (latency under thermal throttle) — C1's budget partition is fixed; thermal-driven hybrid lives in C4.
- R12 (single deployment camera) — KltRansac engine-rule path stays camera-agnostic; comparative IT-12 study uses static fixtures.
Effort
T-shirt XL; 34–55 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | VioStrategy interface + composition wiring | 3 |
| 2 | OKVIS2 strategy (pybind11 binding + integration) | 5 |
| 3 | VinsMono strategy (research-only; behind BUILD_VINS_MONO) | 5 |
| 4 | KltRansac simple-baseline strategy | 5 |
| 5 | ImuPreintegrator helper (shared with C5) | 3 |
| 6 | Warm-start + F8 reboot recovery paths | 3 |
| 7 | Honest-covariance contract tests | 3 |
| 8 | Component-internal tests C1-IT-01..06 + C1-PT-01 | 5 |
Key constraints
- AC-1.3, AC-1.4, AC-2.1a, AC-2.2, AC-4.1, AC-5.1, AC-5.3; RESTRICT-UAV-3 (sharp turns < 5% overlap).
Testing strategy
Per components/01_c1_vio/tests.md + suite-level FT-P-02 / FT-P-04 / FT-P-05.
E-C2 — C2 Visual Place Recognition
Tracker: AZ-255 | Type: component | T-shirt: L | Story points: 21–34
System context
flowchart LR
CAM[Nav camera] --> C2
C7[C7 backbone] --> C2
C6[C6 FAISS index] --> C2
C2 --> C25[C2.5 Re-rank]
Problem / Context
Top-K=10 candidate retrieval from the pre-cached corpus by descriptor similarity. UltraVPR primary, MegaLoc secondary, NetVLAD mandatory simple-baseline. Boundary between cheap retrieval and expensive matching.
Scope
In scope: VprStrategy + multiple backbones, FAISS HNSW lookup, descriptor pre-processing (resize/crop/normalise), L2 normalisation via DescriptorNormaliser, descriptor population entry-point used by C10.
Out of scope: re-rank (E-C2.5), matching (E-C3), index build (E-C10).
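The L2 normalisation and top-K lookup named in the scope above can be sketched in a few lines. Note the swap: the brute-force search below stands in for the FAISS HNSW index purely for illustration, and both function names are hypothetical rather than the DescriptorNormaliser API.

```python
import math
from typing import Sequence


def l2_normalise(descriptor: Sequence[float], eps: float = 1e-12) -> list[float]:
    """Scale a descriptor to unit L2 norm; reject near-zero vectors."""
    norm = math.sqrt(sum(x * x for x in descriptor))
    if norm < eps:
        raise ValueError("zero-norm descriptor cannot be normalised")
    return [x / norm for x in descriptor]


def brute_force_topk(
    query: Sequence[float],
    corpus: dict[str, Sequence[float]],
    k: int,
) -> list[tuple[str, float]]:
    """Exhaustive stand-in for the index: k nearest by squared L2 distance."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, vec)), tile_id)
        for tile_id, vec in corpus.items()
    )
    return [(tile_id, dist) for dist, tile_id in scored[:k]]
```

Results come back sorted by ascending distance, which is the ordering that the VprResult invariants in C2-IT-02 ask for.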
Architecture notes
- File: components/02_c2_vpr/description.md.
- Strategy + ADR-001/002/009.
Interface specification
class VprStrategy(Protocol):
    def embed_query(self, frame: NavCameraFrame, cal: CameraCalibration) -> VprQuery: ...
    def retrieve_topk(self, query: VprQuery, k: int) -> VprResult: ...
    def descriptor_dim(self) -> int: ...
Data flow
sequenceDiagram
participant CAM as Nav camera
participant C2 as VprStrategy
participant C7 as C7
participant C6 as FAISS
CAM->>C2: NavCameraFrame
C2->>C7: backbone forward
C7-->>C2: embedding
C2->>C6: HNSW search k=10
C6-->>C2: candidates
C2-->>C25: VprResult
Dependencies
- E-C6, E-C7, E-CC-FDR-CLIENT.
Acceptance criteria
- C2-IT-01: UltraVPR recall@10 ≥ 0.95; NetVLAD ≥ 0.85 on Derkachi (AC-2.1b + engine rule).
- C2-IT-02: VprResult invariants (length, sorted distances, label).
- C2-IT-03: poisoned-tile top-1 rate within AC-NEW-7 relaxed CI.
- C2-IT-04: scale-ratio ±20% recall@10 ≥ 0.85 (AC-8.6 scale half).
- C2-ST-01: index handle invalidation rejected with IndexUnavailableError.
Non-functional requirements
- C2-PT-01: embed_query p95 ≤ 60 ms; retrieve_topk p95 ≤ 2 ms; combined ≤ 65 ms (AC-4.1 partition).
- GPU ≤ 600 MB resident; system mem ≤ 200 MB for index handle.
Risks & mitigations
- R06 (VPR top-1 false positive) — C2.5 + C3 + AC-NEW-7 downstream.
Effort
T-shirt L; 21–34 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | VprStrategy interface + composition | 3 |
| 2 | UltraVPR backbone (TRT) | 5 |
| 3 | MegaLoc, MixVPR, SelaVPR, EigenPlaces secondary backbones | 5 |
| 4 | NetVLAD mandatory simple-baseline | 3 |
| 5 | FAISS HNSW load + lookup wiring | 3 |
| 6 | DescriptorNormaliser helper (shared with C10) | 2 |
| 7 | Component-internal tests C2-IT-01..04 + C2-PT-01 + C2-ST-01 | 5 |
Key constraints
- AC-2.1b, AC-2.2, AC-4.1, AC-8.6, AC-NEW-7.
Testing strategy
Per components/02_c2_vpr/tests.md.
E-C2.5 — C2.5 Inlier-based Re-rank
Tracker: AZ-256 | Type: component | T-shirt: S | Story points: 5–8
System context
flowchart LR
C2[C2 K=10] --> C25
C7[C7 LightGlueRuntime helper] --> C25
C6[C6 tile pixels] --> C25
C25 --> C3[C3 N=3]
Problem / Context
K=10 → N=3 by single-pair LightGlue inlier count. Boundary between cheap retrieval and expensive matching. Shares LightGlueRuntime helper with C3 (R14 — owned by helper, not by either component).
Scope
In scope: ReRankStrategy + InlierCountReRanker, drop-and-continue on per-candidate failure.
Out of scope: matching itself (E-C3); LightGlue runtime ownership (the helper is its own module).
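The K=10 to N=3 re-rank with drop-and-continue, as scoped above, can be sketched as follows. `count_inliers` is a hypothetical callable standing in for the single-pair LightGlue pass, candidates are reduced to tile ids, and the tie-break on original VPR rank is an assumption rather than a documented rule.

```python
from typing import Callable, Sequence


class RerankBackboneError(Exception):
    """Per-candidate backbone failure; the candidate is dropped."""


def rerank_by_inliers(
    candidates: Sequence[str],
    count_inliers: Callable[[str], int],
    n: int = 3,
) -> list[tuple[str, int]]:
    """Return the top-n (tile_id, inlier_count) pairs, most inliers first."""
    scored = []
    for rank, tile_id in enumerate(candidates):
        try:
            inliers = count_inliers(tile_id)
        except RerankBackboneError:
            continue  # drop this candidate and keep going
        scored.append((-inliers, rank, tile_id))
    scored.sort()  # ties fall back to the original VPR order
    return [(tile_id, -neg) for neg, _rank, tile_id in scored[:n]]
```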
Architecture notes
- File: components/03_c2_5_rerank/description.md.
- Helper-ownership decision documented in R14 / risk_mitigations.md.
Interface specification
class ReRankStrategy(Protocol):
    def rerank(self, frame: NavCameraFrame, vpr_result: VprResult, n: int) -> RerankResult: ...
Data flow
sequenceDiagram
participant C2 as C2
participant C25 as C2.5
participant LG as LightGlueRuntime helper
participant C6 as C6
C2->>C25: VprResult (k=10)
loop 10 candidates
C25->>C6: get_tile_pixels
C25->>LG: single-pair inlier count
LG-->>C25: inlier count
end
C25-->>C3: top-N=3 by inlier count
Dependencies
- E-C2, E-C7, E-C6, shared LightGlueRuntime helper (with C3).
Acceptance criteria
- C2.5-IT-01: top-1 promotion rate ≥ 0.98 (rerank rarely overrides correct C2 top-1).
- C2.5-IT-02: drop-and-continue on per-candidate RerankBackboneError.
- C2.5-IT-03: shared LightGlueRuntime serial-access invariant (no deadlock; bit-identical to single-threaded).
Non-functional requirements
- C2.5-PT-01: rerank p95 ≤ 80 ms for 10 single-pair LightGlue passes; a single engine instance is reused across calls.
- GPU mem ≤ 300 MB shared LightGlue engine.
Risks & mitigations
- R14 (apparent C2.5↔C3 cycle) — resolved this iteration via helper ownership.
Effort
T-shirt S; 5–8 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | InlierCountReRanker + drop-and-continue | 3 |
| 2 | Shared LightGlueRuntime helper module | 3 |
| 3 | Component-internal tests C2.5-IT-01..03 + C2.5-PT-01 | 2 |
Key constraints
- AC-2.1b, AC-4.1, AC-NEW-7.
Testing strategy
Per components/03_c2_5_rerank/tests.md.
E-C3 — C3 Cross-Domain Matcher
Tracker: AZ-257 | Type: component | T-shirt: L | Story points: 21–34
System context
flowchart LR
C25[C2.5 N=3] --> C3
C7[C7] --> C3
CAL[CameraCalibration] --> C3
C6[C6 tiles] --> C3
C3 --> C35[C3.5 AdHoP]
Problem / Context
2D-3D correspondences between nav-camera and the top-N=3 satellite tiles, with RANSAC inliers + reprojection residual. Dominant compute cost in F3. Backbone choice locked (DISK+LightGlue per D-C3-1 = (a)) pending IT-12 verdict.
Scope
In scope: CrossDomainMatcher + DISK+LightGlue (primary) + ALIKED+LightGlue (secondary) + XFeat (alternate); RANSAC + reprojection residual via RansacFilter helper; InsufficientInliersError propagation.
Out of scope: refinement (E-C3.5); pose estimation (E-C4); LightGlue runtime ownership (helper).
Architecture notes
- File: components/04_c3_matcher/description.md.
Interface specification
class CrossDomainMatcher(Protocol):
    def match(self, frame: NavCameraFrame, rerank: RerankResult, cal: CameraCalibration) -> MatchResult: ...
    def health_snapshot(self) -> MatcherHealth: ...
Data flow
sequenceDiagram
participant C25 as C2.5
participant C3 as C3
participant C7 as C7
C25->>C3: RerankResult (n=3)
loop 3 candidates
C3->>C7: backbone forward
C3->>C3: RANSAC + residual
end
C3-->>C35: MatchResult (best by inlier count)
Dependencies
- E-C2.5, E-C7, shared LightGlueRuntime helper, shared RansacFilter helper.
Acceptance criteria
- C3-IT-01: best-candidate inlier count p5 ≥ 80 (AC-1.1 partition).
- C3-IT-02: deterministic best_candidate_idx == argmax(inlier_count) with deterministic tie-break.
- C3-IT-03: cross-domain MRE p95 < 2.5 px (AC-2.2).
- C3-IT-04: tilt ±20° + 350 m outliers — inlier count p10 ≥ 40 (AC-3.1).
- C3-IT-05: InsufficientInliersError propagation when all N=3 fail.
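The deterministic argmax of C3-IT-02 and the error propagation of C3-IT-05 fit in one small function. The tie-break rule used here (lowest candidate index wins) and the `min_inliers` parameter are assumptions; the source only requires that the tie-break be deterministic.

```python
from typing import Sequence


class InsufficientInliersError(Exception):
    """Raised when no candidate reaches the minimum inlier count."""


def best_candidate_idx(inlier_counts: Sequence[int], min_inliers: int = 1) -> int:
    """Index of the candidate with the most inliers; lowest index wins ties."""
    if not inlier_counts or max(inlier_counts) < min_inliers:
        raise InsufficientInliersError("all candidates failed the inlier floor")
    # list.index returns the first occurrence, which fixes the tie-break.
    return list(inlier_counts).index(max(inlier_counts))
```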
Non-functional requirements
- C3-PT-01: match p95 ≤ 180 ms; per-candidate ≤ 60 ms; throughput ≥ 3 Hz; GPU mem ≤ 800 MB combined.
Risks & mitigations
- R06 (false positive) — RANSAC + residual + downstream AC-NEW-7.
- R10 (Marginals under throttle) — D-CROSS-LATENCY-1 hybrid touches C4 not C3 (C3's budget is fixed).
Effort
T-shirt L; 21–34 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | CrossDomainMatcher interface + composition | 3 |
| 2 | DISK+LightGlue primary | 5 |
| 3 | ALIKED+LightGlue secondary | 3 |
| 4 | XFeat alternate (lightweight) | 3 |
| 5 | RansacFilter helper (shared C3/C3.5/C4) | 3 |
| 6 | Component-internal tests C3-IT-01..05 + C3-PT-01 | 5 |
Key constraints
- AC-1.1, AC-2.2, AC-3.1, AC-4.1.
Testing strategy
Per components/04_c3_matcher/tests.md.
E-C3.5 — C3.5 AdHoP-Conditional Refinement
Tracker: AZ-258 | Type: component | T-shirt: M | Story points: 8–13
System context
flowchart LR
C3[C3 MatchResult] --> C35
C7[C7 AdHoP backbone] --> C35
C35 --> C4[C4 Pose]
Problem / Context
Conditional perspective preconditioning when residual exceeds threshold; passthrough otherwise. Preserves AC-4.1 budget on the steady-state path while keeping refinement for hard frames.
Scope
In scope: ConditionalRefiner + AdHoPRefiner + PassthroughRefiner (both linked); residual threshold configuration; passthrough fall-through on RefinerBackboneError.
Out of scope: matcher (E-C3); pose (E-C4).
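The conditional gate with passthrough fall-through, as scoped above, can be sketched like this. The `refine` callable stands in for the AdHoP refiner, the match object is left generic, and returning a `(result, refined)` pair is an illustrative choice rather than the documented `was_invoked` contract.

```python
from typing import Callable, TypeVar

M = TypeVar("M")


class RefinerBackboneError(Exception):
    """Backbone failure inside the refiner."""


def refine_if_needed(
    match: M,
    residual_px: float,
    threshold_px: float,
    refine: Callable[[M], M],
) -> tuple[M, bool]:
    """Refine only above the threshold; on backbone failure pass through."""
    if residual_px <= threshold_px:
        return match, False  # steady-state path: no added latency
    try:
        return refine(match), True
    except RefinerBackboneError:
        # Fall through with the untouched input object.
        return match, False
```

Returning the same object on both passthrough paths is what makes the "bit-identical correspondences" check in C3.5-IT-02 trivially true.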
Architecture notes
- File: components/05_c3_5_adhop/description.md.
- Both implementations linked into the deployment binary; runtime gate is a config knob.
Interface specification
class ConditionalRefiner(Protocol):
    def refine_if_needed(self, frame: NavCameraFrame, mr: MatchResult, threshold: float) -> MatchResult: ...
    def was_invoked(self) -> bool: ...
Data flow
sequenceDiagram
participant C3 as C3
participant C35 as C3.5
participant C7 as C7
C3->>C35: MatchResult (residual=R)
alt R > threshold
C35->>C7: AdHoP backbone forward
C7-->>C35: refined correspondences
C35-->>C4: enriched MatchResult
else
C35-->>C4: passthrough MatchResult
end
Dependencies
- E-C3, E-C7.
Acceptance criteria
- C3.5-IT-01: residual is reduced in ≥ 90% of invocations (AC-2.2 hard-frame portion).
- C3.5-IT-02: passthrough fall-through on RefinerBackboneError with bit-identical correspondences.
- C3.5-IT-03: invocation rate < 0.30 on Derkachi normal segment.
Non-functional requirements
- C3.5-PT-01: invoked p95 ≤ 90 ms; passthrough p95 ≤ 0.5 ms; aggregated added latency ≤ 25 ms.
Risks & mitigations
- R10 (latency under throttle) — threshold tunable via operator-orchestrator pre-flight.
Effort
T-shirt M; 8–13 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | AdHoPRefiner (TRT engine + perspective preconditioning) | 5 |
| 2 | PassthroughRefiner no-op | 1 |
| 3 | Conditional gate + passthrough fall-through | 2 |
| 4 | Component-internal tests C3.5-IT-01..03 + C3.5-PT-01 | 3 |
Key constraints
- AC-2.2, AC-4.1.
Testing strategy
Per components/05_c3_5_adhop/tests.md.
E-C4 — C4 Pose Estimator
Tracker: AZ-259 | Type: component | T-shirt: M | Story points: 13–21
System context
flowchart LR
C35[C3.5] --> C4
CAL[CameraCalibration] --> C4
C7[C7 ThermalState] --> C4
C5GRAPH[C5 iSAM2 graph] --> C4
C4 --> C5[C5 add_pose_anchor]
Problem / Context
Convert MatchResult into PoseEstimate (WGS84 + 6×6 covariance + provenance label). OpenCV solvePnPRansac + GTSAM Marginals for native 6×6; D-CROSS-LATENCY-1 hybrid degrades to Jacobian under thermal throttle.
Scope
In scope: OpenCVGtsamPoseEstimator, GTSAM Marginals integration with C5's iSAM2 graph, Jacobian fallback, per-frame thermal-state-driven mode switch, WgsConverter helper usage.
Out of scope: state fusion (E-C5); thermal telemetry source (E-C7).
Architecture notes
- File: components/06_c4_pose/description.md.
- ADR-003 shared substrate: C4 adds factors to C5's graph; co-developed.
- ADR-006 (Jacobian fallback ~5–10% accuracy loss accepted under throttle).
Interface specification
class PoseEstimator(Protocol):
    def estimate(self, mr: MatchResult, cal: CameraCalibration, thermal: ThermalState) -> PoseEstimate: ...
    def current_covariance_mode(self) -> CovarianceMode: ...
Data flow
sequenceDiagram
participant C35 as C3.5
participant C4 as C4
participant C5 as C5 graph
participant C7 as C7 thermal
C35->>C4: MatchResult
C7-->>C4: ThermalState
alt thermal.throttle
C4->>C4: Jacobian covariance
else
C4->>C5: add factor
C5->>C5: Marginals.marginalCovariance
C5-->>C4: Sigma
end
C4-->>C5: PoseEstimate (add_pose_anchor)
Dependencies
- E-C3.5, E-C5 (co-developed shared substrate), shared RansacFilter, shared WgsConverter, shared SE3Utils.
Acceptance criteria
- C4-IT-01: WGS84 accuracy p80 ≤ 50 m, p50 ≤ 20 m on Derkachi (AC-1.1 / AC-1.2).
- C4-IT-02: 6×6 SPD covariance + honest under inlier degradation (AC-1.4).
- C4-IT-03: D-CROSS-LATENCY-1 mode switch within 1 frame (AC-NEW-5 workstation portion).
- C4-IT-04: shared-graph integration with C5 — prior keyframe perturbations within tolerance.
Non-functional requirements
- C4-PT-01: estimate p95 MARGINALS ≤ 90 ms; JACOBIAN ≤ 15 ms; switch ≤ 1 frame.
Risks & mitigations
- R10 (Marginals throttle) — primary owner of the hybrid switch.
Effort
T-shirt M; 13–21 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | solvePnPRansac + IPPE wiring | 3 |
| 2 | GTSAM Marginals factor add to C5 graph | 5 |
| 3 | Jacobian-degraded fallback | 3 |
| 4 | Per-frame thermal-state-driven switch | 2 |
| 5 | WgsConverter helper (shared with C8) | 3 |
| 6 | Component-internal tests C4-IT-01..04 + C4-PT-01 | 3 |
Key constraints
- AC-1.1, AC-1.2, AC-1.4, AC-4.1, AC-NEW-5.
Testing strategy
Per components/06_c4_pose/tests.md.
E-C5 — C5 State Estimator
Tracker: AZ-260 | Type: component | T-shirt: XL | Story points: 34–55
System context
flowchart LR
C1[C1 VioOutput] --> C5
C4[C4 PoseEstimate] --> C5
C8I[C8 IMU/attitude/gps_health] --> C5
C5 --> C8O[C8 outbound 5 Hz]
C5 --> ORTHO[Orthorectifier → C6 mid-flight tile]
C5 --> FDR[FDR smoothed history]
Problem / Context
Own GTSAM iSAM2 + IncrementalFixedLagSmoother (K=10–20). Fuse VIO + Pose + FC IMU into the posterior state; emit smoothed current frame to C8 + smoothed past keyframes to FDR (AC-4.5 revised, NOT FC retroactive). Spoof-promotion gate (AC-NEW-2 / AC-NEW-8). Largest epic alongside C1.
Scope
In scope: StateEstimator + GtsamIsam2StateEstimator (production-default) + EskfStateEstimator (mandatory simple-baseline); spoof-promotion gate; source-label state machine; smoothed history → FDR; AC-5.2 fallback path.
Out of scope: VIO (E-C1); pose (E-C4); FC adapter (E-C8); orthorectifier (lives within C5 as an internal subcomponent OR could split — kept inside C5 per the spec).
Architecture notes
- File: components/07_c5_state/description.md.
- ADR-003 (shared GTSAM substrate with C4); co-developed.
- ADR-008 + spoof gate logic.
Interface specification
class StateEstimator(Protocol):
    def add_vio(self, o: VioOutput) -> None: ...
    def add_pose_anchor(self, p: PoseEstimate) -> None: ...
    def add_fc_imu(self, w: ImuWindow) -> None: ...
    def current_estimate(self) -> EstimatorOutput: ...
    def smoothed_history(self, n: int) -> list[EstimatorOutput]: ...
    def health_snapshot(self) -> EstimatorHealth: ...
Data flow
sequenceDiagram
participant C1 as C1
participant C4 as C4
participant C8I as C8 inbound
participant C5 as C5 iSAM2
participant C8O as C8 outbound
participant FDR as FDR
C1->>C5: add_vio
C4->>C5: add_pose_anchor (factor add)
C8I->>C5: add_fc_imu
C5->>C5: iSAM2 update + Marginals
C5->>C8O: current_estimate (5 Hz)
C5->>FDR: smoothed_history (per AC-4.5)
Dependencies
- E-C1, E-C4, E-CC-FDR-CLIENT, E-C8 inbound side, shared ImuPreintegrator, SE3Utils, WgsConverter.
Acceptance criteria
- C5-IT-01: last_satellite_anchor_age_ms reset/monotonic-rise (AC-1.3 binning).
- C5-IT-02: smoothed-current honest covariance (AC-1.4).
- C5-IT-03: VIO-only fallback under matcher failure (AC-3.5).
- C5-IT-04: smoothed past-keyframes → FDR but NOT to FC stream (AC-4.5 revised).
- C5-IT-05: 3 s no-estimate triggers AC-5.2 fallback.
- C5-IT-06: spoof-promotion gate ≥ 10 s + visual consistency (AC-NEW-2).
- C5-IT-07: visual blackout + spoof escalation (AC-NEW-8).
- C5-ST-01: spoof-rejection logging cannot be silenced.
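One reading of the spoof-promotion gate in C5-IT-06 is a dwell timer: promotion is allowed only after the visual-consistency check has held continuously for at least 10 s, and any inconsistent sample restarts the clock. The class below is a sketch under that assumption; the name, the `update` signature and the reset-on-failure rule are not taken from the spec.

```python
class SpoofPromotionGate:
    """Hypothetical dwell gate: open only after dwell_s of unbroken consistency."""

    def __init__(self, dwell_s: float = 10.0) -> None:
        self.dwell_s = dwell_s
        self._consistent_since = None  # timestamp of the current consistent run

    def update(self, t_s: float, visually_consistent: bool) -> bool:
        """Feed one sample; return True when promotion is currently allowed."""
        if not visually_consistent:
            self._consistent_since = None  # any failure restarts the dwell
            return False
        if self._consistent_since is None:
            self._consistent_since = t_s
        return t_s - self._consistent_since >= self.dwell_s
```

Every transition of the gate would be logged by the caller, in line with C5-ST-01.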
Non-functional requirements
- C5-PT-01: add_pose_anchor + current_estimate p95 ≤ 60 ms; memory ≤ 100 MB resident.
Risks & mitigations
- R05 (iSAM2 silent factor-add failure) — every add logs success/failure.
- R07 (spoof premature promotion) — primary owner of the gate.
Effort
T-shirt XL; 34–55 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | StateEstimator interface + composition | 3 |
| 2 | iSAM2 + IncrementalFixedLagSmoother K=10-20 wiring | 5 |
| 3 | BetweenFactorPose3 (VIO) + GenericProjectionFactorCal3DS2 (pose) | 5 |
| 4 | Marginals.marginalCovariance integration | 3 |
| 5 | Source-label state machine + spoof-promotion gate | 5 |
| 6 | EskfStateEstimator mandatory simple-baseline | 5 |
| 7 | Smoothed-history → FDR path (NOT to FC) | 3 |
| 8 | AC-5.2 fallback path | 3 |
| 9 | Orthorectifier → C6 mid-flight tile gen sub-path | 3 |
| 10 | Component-internal tests C5-IT-01..07 + C5-PT-01 + C5-ST-01 | 5 |
Key constraints
- AC-1.3, AC-1.4, AC-3.5, AC-4.5 (revised), AC-5.2, AC-NEW-2, AC-NEW-8.
Testing strategy
Per components/07_c5_state/tests.md.
E-C8 — C8 FC + GCS Adapter
Tracker: AZ-261 | Type: component | T-shirt: L | Story points: 21–34
System context
flowchart LR
FCIN[FC inbound MAVLink/MSP2] --> C8I[C8 inbound]
C8I --> C5[C5]
C8I --> C1[C1]
C5 --> C8O[C8 outbound]
C8O -->|GPS_INPUT / MSP2_SENSOR_GPS| FCOUT[FC]
C8O -->|telemetry 1-2 Hz| GCS[QGroundControl]
Problem / Context
Per-FC inbound + outbound. Inbound: subscribe to FC IMU/attitude/GPS-health/MAV_STATE; publish ImuWindow/AttitudeWindow/GpsHealth/FlightStateSignal. Outbound: encode EstimatorOutput for AP (GPS_INPUT) and iNav (MSP2_SENSOR_GPS) at 5 Hz with honest 6×6 → 2×2 covariance projection. Owns MAVLink 2.0 signing on AP wired channel (D-C8-9 = (d), R03 risk) + per-flight key rotation. Also feeds GCS at 1–2 Hz.
Scope
In scope: FcAdapter + PymavlinkArdupilotAdapter + Msp2InavAdapter; GcsAdapter + QgcTelemetryAdapter; signing handshake + per-flight ephemeral key + zeroisation; D-C8-2 source-set switch (gated by IT-3); honest covariance projection.
Out of scope: state estimation (E-C5); GCS workflow logic (operator side, E-C12).
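The 6×6 → 2×2 covariance projection in scope above can be illustrated as a block extraction followed by a conservative scalar summary. Two assumptions are made here and are not from the spec: the state ordering puts the two horizontal position axes first, and the reported 1-sigma figure is the square root of the largest eigenvalue of the 2×2 block, so the uncertainty is never under-reported in any horizontal direction.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def project_horizontal(cov6: Matrix) -> list[list[float]]:
    """Top-left 2x2 block, assuming indices 0 and 1 are the horizontal axes."""
    return [[cov6[0][0], cov6[0][1]], [cov6[1][0], cov6[1][1]]]


def horizontal_sigma_m(cov2: Matrix) -> float:
    """Square root of the largest eigenvalue of a symmetric 2x2 covariance."""
    a = cov2[0][0]
    b = 0.5 * (cov2[0][1] + cov2[1][0])  # symmetrise the off-diagonal term
    c = cov2[1][1]
    lam_max = 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return math.sqrt(max(lam_max, 0.0))
```

For a diagonal block with variances 4 m² and 9 m², the summary is 3 m, the larger of the two axis sigmas.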
Architecture notes
- File: components/10_c8_fc_adapter/description.md.
- Both AP + iNav adapters typically linked into the deployment binary (per ADR-002 — config picks one at runtime).
- ADR-008 source-set switch gated by IT-3.
Interface specification
class FcAdapter(Protocol):
    def open(self, port: PortConfig, signing_key: bytes | None) -> None: ...
    def subscribe_telemetry(self, cb: Callable[[FcTelemetryFrame], None]) -> Subscription: ...
    def emit_external_position(self, o: EstimatorOutput) -> None: ...
    def emit_status_text(self, msg: str, severity: Severity) -> None: ...
    def request_source_set_switch(self) -> None: ...  # AP only
    def current_flight_state(self) -> FlightStateSignal: ...
Data flow
sequenceDiagram
participant FC as FC
participant C8I as C8 inbound
participant C5 as C5
participant C8O as C8 outbound
FC->>C8I: IMU + attitude + gps_health
C8I->>C5: ImuWindow / AttitudeWindow / GpsHealth
C5->>C8O: EstimatorOutput
C8O->>FC: GPS_INPUT / MSP2_SENSOR_GPS @ 5 Hz
Dependencies
- E-C5, E-CC-CONF, E-CC-LOG.
- External: pymavlink, MSP2 client, ArduPilot SITL, QGroundControl SITL.
Acceptance criteria
- C8-IT-01: 6×6 → 2×2 honest covariance projection within 1% norm.
- C8-IT-02: 5 Hz emission jitter ≤ ±5%.
- C8-IT-03: warm-start GPS from FC EKF ≤ 1 s after C8 ready (AC-5.1).
- C8-IT-04: GCS stream 1–2 Hz (AC-6.1).
- C8-IT-05: GCS commands accepted (AC-6.2).
- C8-IT-06: WGS84 round-trip ≤ 1 cm position residual (AC-6.3).
- C8-IT-07: source-set switch within 3 s of gate-clear (AC-NEW-2).
- C8-IT-08: iNav adapter never attempts signing; AP always (RESTRICT-COMM-2).
- C8-ST-01: MAVLink 2.0 signing handshake passes IT-3 SITL gate (R03).
- C8-ST-02: per-flight key never persists across flights.
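As an illustration of how C8-IT-02 could be asserted, the sketch below measures the worst relative deviation of consecutive emission intervals from the nominal 200 ms period. The function names are hypothetical, and the epic does not say whether jitter is judged per interval or statistically; a per-interval worst case is assumed here.

```python
def max_interval_jitter(timestamps_s, nominal_hz=5.0):
    """Largest relative deviation of consecutive intervals from the nominal period."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    period = 1.0 / nominal_hz
    worst = 0.0
    for prev, cur in zip(timestamps_s, timestamps_s[1:]):
        worst = max(worst, abs((cur - prev) - period) / period)
    return worst


def within_jitter_budget(timestamps_s, nominal_hz=5.0, budget=0.05):
    """True when every interval stays within +/- budget of the nominal period."""
    return max_interval_jitter(timestamps_s, nominal_hz) <= budget
```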
Non-functional requirements
- C8-PT-01: `emit_external_position` p95 ≤ 5 ms; inbound IMU callback p95 ≤ 1 ms.
Risks & mitigations
- R03 (signing handshake no precedent) — gated by IT-3; D-C8-2-FALLBACK options recorded.
- R09 (key compromise) — per-flight ephemeral keys + zeroisation.
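A minimal sketch of the per-flight ephemeral key with zeroisation named in R09, assuming the key is held in a mutable `bytearray` for the duration of one flight. The context-manager name is hypothetical; the 32-byte length follows the MAVLink 2 message-signing secret-key size.

```python
import secrets
from contextlib import contextmanager

MAVLINK2_SIGNING_KEY_LEN = 32  # MAVLink 2 signing uses a 32-byte secret key


@contextmanager
def per_flight_signing_key():
    """Yield a fresh random key and overwrite it with zeros on exit."""
    key = bytearray(secrets.token_bytes(MAVLINK2_SIGNING_KEY_LEN))
    try:
        yield key
    finally:
        # Best-effort zeroisation of the mutable buffer we control.
        for i in range(len(key)):
            key[i] = 0
```

In CPython this is best effort only: the immutable `bytes` object returned by `secrets.token_bytes` and any copies made by a signing library are outside the buffer being cleared, so a production implementation would likely need a lower-level mechanism.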
Effort
T-shirt L; 21–34 points.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | FcAdapter interface + composition | 3 |
| 2 | PymavlinkArdupilotAdapter outbound GPS_INPUT | 5 |
| 3 | Msp2InavAdapter outbound MSP2_SENSOR_GPS | 3 |
| 4 | Inbound IMU/attitude/gps_health/MAV_STATE subscription | 3 |
| 5 | Honest 6×6 → 2×2 covariance projection | 3 |
| 6 | MAVLink 2.0 per-flight signing handshake (AP) | 5 |
| 7 | Source-set switch (AP D-C8-2 gated by IT-3) | 3 |
| 8 | GcsAdapter + downsampled telemetry | 3 |
| 9 | Component-internal tests C8-IT-01..08 + C8-PT-01 + C8-ST-01..02 | 5 |
Key constraints
- AC-4.3, AC-4.4, AC-5.1, AC-5.2, AC-6.1, AC-6.2, AC-6.3, AC-NEW-2; RESTRICT-FC-1 / FC-2 / FC-3, RESTRICT-COMM-1 / COMM-2.
Testing strategy
Per components/10_c8_fc_adapter/tests.md.
E-BBT — Blackbox Tests (FT/NFT scenarios)
Tracker: AZ-262 | Type: tests | T-shirt: M | Story points: 13–21
System context
flowchart LR
TESTROOT[tests/ runner] --> FTP[FT-P functional positive]
TESTROOT --> FTN[FT-N functional negative]
TESTROOT --> NFTPERF[NFT-PERF Tier-2]
TESTROOT --> NFTLIM[NFT-LIM resource]
TESTROOT --> NFTSEC[NFT-SEC security]
TESTROOT --> NFTRES[NFT-RES resilience]
TESTROOT --> IT[IT integration]
Problem / Context
Per-component epics ship their own component-internal unit/contract tests; this epic parents the suite-level scenarios already specified in _docs/02_document/tests/*.md. They exercise end-to-end ACs and restrictions and bind multiple components together.
Scope
In scope: implementing the FT-P, FT-N, NFT-PERF, NFT-LIM, NFT-SEC, NFT-RES, IT scenario IDs cited in traceability-matrix.md. Test data setup, fixtures (Derkachi flight + AerialVL S03 + e2e-test mock-suite-sat-service), Tier-2 runner orchestration.
Out of scope: per-component unit/contract tests (live in each component epic).
Architecture notes
- Files: `_docs/02_document/tests/blackbox-tests.md`, `performance-tests.md`, `security-tests.md`, `resource-limit-tests.md`, `resilience-tests.md`, `environment.md`, `test-data.md`, `traceability-matrix.md`.
- Tier-1 vs Tier-2 split per ADR-005.
Interface specification
Tests are pytest scenarios; no runtime interface beyond the test runner CLI.
Data flow
sequenceDiagram
participant CI as CI runner
participant FX as Fixtures
participant SUT as System under test (compose / Tier-2 binary)
CI->>FX: stage Derkachi corpus + SITL containers
CI->>SUT: bring up
CI->>SUT: drive scenario inputs
SUT-->>CI: emitted MAVLink + FDR records
CI->>CI: assert per scenario pass criteria
Dependencies
- All component epics (each must ship ready-to-test).
Acceptance criteria
- Every scenario ID cited in `traceability-matrix.md` exists, runs, and passes on its target tier.
- Coverage ≥ 75% gate held (currently 92.4% inclusive / 89.8% strict; confirmed pre-Step-6).
- PARTIAL / NOT COVERED rows have linked leftover entries explaining the deferral.
Non-functional requirements
- Tier-1 full suite wall-clock ≤ 30 min on a developer laptop.
- Tier-2 NFT suite wall-clock ≤ 90 min on the bench Jetson.
Risks & mitigations
- R11 (statistical headroom) — NFT-RES-03 / NFT-SEC-01 use Monte-Carlo-with-CI per the AC-text relaxation.
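The epic does not say which confidence interval the Monte-Carlo scenarios use. As one hedged illustration, a Wilson score interval on the observed pass rate could gate a scenario on its lower bound; the function names and the 95 % level (z = 1.96) are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95 % level)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def passes_with_confidence(successes, trials, required_rate, z=1.96):
    """Pass only if the lower confidence bound meets the required rate."""
    lower, _ = wilson_interval(successes, trials, z)
    return lower >= required_rate
```

Gating on the lower bound means a run with too few trials fails even at a 100 % observed pass rate, which makes the statistical headroom explicit.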
Effort
T-shirt M; 13–21 points (test implementation; scenario specs already exist).
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | Test environment scaffolding (tests/conftest.py, fixtures dir, Postgres + SITL bring-up) | 3 |
| 2 | FT-P-* implementation (positive functional scenarios) | 5 |
| 3 | FT-N-* implementation (negative functional scenarios) | 3 |
| 4 | NFT-PERF / NFT-LIM Tier-2 runner integration | 5 |
| 5 | NFT-SEC implementation (incl. NFT-SEC-02 network egress + NFT-SEC-01 cache poisoning) | 5 |
| 6 | NFT-RES resilience scenarios | 3 |
| 7 | IT-3 ArduPilot SITL signing handshake (R03 gate) | 5 |
| 8 | IT-12 comparative-study runner | 3 |
Key constraints
- AC-NEW-3, AC-NEW-5, AC-NEW-7, RESTRICT-HW-1.
Testing strategy
This epic IS the testing strategy for system-level scenarios. Per-component testing belongs to component epics.
E-DEMO-REPLAY — Offline replay mode (video + tlog → per-tick coordinate stream)
Tracker: AZ-265
Type: feature (deployment-adjacent)
T-shirt: M | Story points: 27–32
Added: Decompose Step 2 (cycle 1, 2026-05-10)
Source notes: _docs/how_to_test.md (user-written demo requirements — auto-sync incorporated as child task #8)
System context
Demonstrate the GPS-denied positioning pipeline against historical flight data: a video file from the nav camera + a .tlog file from the FC. The replay mode runs the same C1–C5 inference pipeline the airborne binary runs; only the input transport (live camera → video file; live MAVLink → tlog) and output sink (FC MAVLink emit → JSONL) differ. NO ROS dependency is added — replay reuses the existing C8 FcAdapter interface via the strategy pattern.
flowchart LR
subgraph LIVE[Airborne mode — unchanged]
CAM[Live camera] --> C1L[C1 VIO]
FCL[Live FC MAVLink] --> C8L[C8 inbound]
C8L --> C1L
C1L --> C2L[C2..C5]
C2L --> C8OL[C8 outbound] --> FCL
end
subgraph REPLAY[Replay mode — this epic]
VID[Video file .mp4/.h264] --> VFFS[VideoFileFrameSource] --> C1R[C1 VIO]
TLOG[tlog file] --> TLR[TlogReplayFcAdapter] --> C1R
C1R --> C2R[C2..C5]
C2R --> RSINK[JsonlReplaySink] --> JSONL[results.jsonl - one EstimatorOutput per tick]
end
Problem / Context
The parent-suite UI (in ui/ workspace, out of scope for this repo) needs to demo the GPS-denied positioning end-to-end. Per-component fixtures or simulators would not give the demo end-to-end fidelity. Instead, replay mode runs the production pipeline against historical inputs — demo confidence equals field test confidence on the same footage.
ROS as the input transport was considered and rejected: the system is MAVLink-native; introducing ROS would (a) add a major new dependency, (b) split production vs. demo code paths, and (c) duplicate code. Reusing the existing C8 FcAdapter interface with a tlog-replay strategy is strictly better.
Scope
In scope:
- `FrameSource` interface (formalised cross-cutting; previously implicit "camera ingest thread") + `VideoFileFrameSource` strategy + `LiveCameraFrameSource` retrofit (no-op restructure of existing camera plumbing).
- `TlogReplayFcAdapter` strategy (new C8 `FcAdapter` impl) parsing pymavlink `.tlog` files and emitting `ImuWindow` / `AttitudeWindow` / `GpsHealth` / `FlightStateSignal` at tlog timestamp cadence.
- `ReplaySink` interface + `JsonlReplaySink` impl (one `EstimatorOutput` per line).
- `compose_replay(config) -> ReplayRoot` composition root extending E-CC-CONF (AZ-246).
- `Clock` injection (per R-DEMO-4) so timer-driven logic in C1–C5 works in both wall-clock (live) and tlog-simulated (replay) modes.
- `gps-denied-replay` CLI: `--video PATH --tlog PATH --output results.jsonl --camera-calibration calib.json --config config.yaml --pace {realtime,asap} [--time-offset-ms N]`.
- Fourth Docker image `gps-denied-replay-cli` (Python + C1–C5 + cpp/* + replay strategies; NO C6/C10/C11/C12; NO HTTP server).
- E2E replay test on a 1–2 min Derkachi clip + matching tlog asserting estimated track within ≤ 100 m of ground-truth GPS for ≥ 80 % of ticks.
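The CLI surface listed above could be parsed roughly as follows. This is a sketch with the standard-library `argparse`; the flag names come from the scope list, while which flags are mandatory and the builder function name are assumptions. The `asap` default follows the Data flow note that uncapped pacing is the default.

```python
import argparse


def build_replay_arg_parser():
    """Argument parser for the gps-denied-replay CLI (illustrative only)."""
    parser = argparse.ArgumentParser(prog="gps-denied-replay")
    # Required-ness of each path flag is assumed, not stated in the epic.
    parser.add_argument("--video", required=True, metavar="PATH")
    parser.add_argument("--tlog", required=True, metavar="PATH")
    parser.add_argument("--output", required=True, metavar="PATH")
    parser.add_argument("--camera-calibration", required=True, metavar="PATH")
    parser.add_argument("--config", required=True, metavar="PATH")
    parser.add_argument("--pace", choices=("realtime", "asap"), default="asap")
    # None means "auto-detect the video/tlog offset"; a value overrides it.
    parser.add_argument("--time-offset-ms", type=int, default=None, metavar="N")
    return parser
```

Leaving `--time-offset-ms` as `None` when absent lets the caller distinguish "auto-detect" from an explicit offset of 0 ms.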
Out of scope:
- ROS / ROS2 dependency.
- HTTP wrapper microservice (parent-suite UI backend shells out to the CLI; defer until subprocess-shape is proven insufficient).
- Modifying any C1–C5 component to be replay-aware — they MUST remain mode-agnostic.
- C6 mid-flight write path (replay reads a pre-built tile cache; doesn't write).
Architecture notes
- ADR-001 / ADR-002 / ADR-009 all apply unchanged.
- New `BUILD_*` flags: `BUILD_VIDEO_FILE_FRAME_SOURCE`, `BUILD_TLOG_REPLAY_ADAPTER`, `BUILD_REPLAY_SINK_JSONL`. Default ON for the new replay-cli binary; OFF for airborne, research, and operator-orchestrator.
- New cross-cutting `FrameSource` interface lives at `src/gps_denied_onboard/frame_source/` (Layer 1 Foundation per `module-layout.md` § layering).
- `compose_replay` lives in `runtime_root.py` alongside `compose_root` and `compose_operator`.
Interface specification
class FrameSource(Protocol):
    def next_frame(self) -> NavCameraFrame | None: ...
    def close(self) -> None: ...

class VideoFileFrameSource(FrameSource):
    def __init__(self, video_path: Path, frame_rate_hz: float, camera_id: str): ...

class TlogReplayFcAdapter(FcAdapter):  # FcAdapter from AZ-261 / E-C8
    # FcDialect: enum with members ARDUPILOT and INAV (type name assumed here)
    def __init__(self, tlog_path: Path, target_fc_dialect: FcDialect): ...

class ReplaySink(Protocol):
    def emit(self, output: EstimatorOutput) -> None: ...
    def close(self) -> None: ...

class JsonlReplaySink(ReplaySink):
    def __init__(self, output_path: Path): ...

def compose_replay(config: Config) -> ReplayRoot: ...
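To make the `ReplaySink` contract concrete, here is a minimal sketch of a JSONL sink that writes one JSON object per `emit` call. `EstimatorOutputStub` and its fields are hypothetical stand-ins; the real `EstimatorOutput` schema is owned by E-C5 and may serialise differently.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class EstimatorOutputStub:
    """Hypothetical stand-in for EstimatorOutput; field names are assumed."""
    t_s: float
    lat_deg: float
    lon_deg: float
    alt_m: float


class JsonlReplaySink:
    """Writes one JSON object per line; safe to close more than once."""

    def __init__(self, output_path: Path):
        self._fh = open(output_path, "w", encoding="utf-8")

    def emit(self, output) -> None:
        # sort_keys keeps the byte output stable across runs for a given input.
        self._fh.write(json.dumps(asdict(output), sort_keys=True) + "\n")

    def close(self) -> None:
        if not self._fh.closed:
            self._fh.close()
```

Stable key ordering is one small contribution toward the determinism criterion; floating-point reproducibility of the pipeline itself is a separate concern.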
Data flow
Startup → load config / calibration → process tlog + video timestamp-aligned → for each frame: camera-ingest → C1 → C2 → C2.5 → C3 → C3.5 → C4 → C5 → emit EstimatorOutput to JsonlReplaySink. End of input → close sink → exit.
--pace realtime paces frames at wall-clock; --pace asap runs uncapped (default). The injected Clock is wall-clock-derived in realtime mode and tlog-timestamp-derived in asap mode so component fallback timers (e.g., AC-5.2 3 s no-estimate fallback) trigger consistently in both.
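The clock injection described above might look like the following sketch. All class and method names other than `Clock` are hypothetical, and the 3 s timeout simply mirrors the AC-5.2 example; the point is that a timer reading time only through the injected clock behaves identically under wall-clock and tlog-derived time.

```python
import time
from typing import Protocol


class Clock(Protocol):
    def now_s(self) -> float: ...


class WallClock:
    """Wall-clock-derived time (live mode and --pace realtime)."""

    def now_s(self) -> float:
        return time.monotonic()


class TlogClock:
    """Simulated time, advanced by the replay loop to each consumed tlog timestamp."""

    def __init__(self, start_s: float = 0.0):
        self._now_s = start_s

    def advance_to(self, t_s: float) -> None:
        # Never move backwards, even if tlog records arrive slightly out of order.
        self._now_s = max(self._now_s, t_s)

    def now_s(self) -> float:
        return self._now_s


class NoEstimateFallback:
    """Fallback timer that reads time only via the injected clock."""

    def __init__(self, clock: Clock, timeout_s: float = 3.0):
        self._clock = clock
        self._timeout_s = timeout_s
        self._last_estimate_s = clock.now_s()

    def note_estimate(self) -> None:
        self._last_estimate_s = self._clock.now_s()

    def expired(self) -> bool:
        return self._clock.now_s() - self._last_estimate_s >= self._timeout_s
```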
Dependencies
- E-C1, E-C2, E-C2.5, E-C3, E-C3.5, E-C4, E-C5, E-C8 (every per-frame component).
- E-CC-CONF (AZ-246) for `compose_root` extension.
- E-CC-HELPERS (AZ-264) for `WgsConverter` (tlog GPS → local-tangent-plane).
- Does NOT depend on E-C6 / E-C10 / E-C11 / E-C12 (replay reads pre-built cache; no operator-side workflows).
Acceptance criteria
- AC-1: CLI exits 0 on a valid 1-min fixture and produces JSONL with one `EstimatorOutput` line per tlog tick (within ±5 % of `GLOBAL_POSITION_INT` count).
- AC-2: Each line is a valid JSON object matching the `EstimatorOutput` schema.
- AC-3: For a fixture with known ground-truth GPS, the L2 horizontal distance between the estimated and ground-truth positions is ≤ 100 m for ≥ 80 % of ticks (matches AC-1.3 cumulative-drift bound).
- AC-4: Replay binary contains C1–C5 + replay strategies; SBOM diff CI step verifies absence of C6/C10/C11/C12.
- AC-5: Same input → same output (deterministic) within ≤ 1e-6 float drift in position fields.
- AC-6: `--pace realtime` runs the 1-min fixture in 60 ± 5 s; `--pace asap` in ≤ 30 s on Tier-1 hardware.
- AC-7: Without `--time-offset-ms`, the CLI auto-detects the video ↔ tlog offset by correlating video motion-onset (or first-frame timestamp) with the tlog IMU take-off pattern (sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s, matching the typical quadcopter take-off signature). On a fixture with known correct offset, the auto-detected offset is within ± 200 ms of ground truth. If auto-detect confidence is < 80 %, the CLI logs a WARN and proceeds with the best-guess offset; `--time-offset-ms N` always overrides the auto-detect.
- AC-8: If neither auto-detect nor manual offset can produce > 95 % of frames with at least one matching IMU window within ± 100 ms, the CLI exits with code 2 and prints both the auto-detected offset (if any) and the percentage of frames-with-IMU-window so the operator can debug.
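The AC-3 assertion could be computed along these lines. This is a sketch assuming both tracks are available as per-tick (latitude, longitude) pairs in degrees and using a spherical-Earth haversine distance; the E2E test may instead work in the local tangent plane via `WgsConverter`. Function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def fraction_within(estimated, truth, limit_m=100.0):
    """Fraction of ticks whose horizontal error is at most limit_m."""
    if not estimated or len(estimated) != len(truth):
        raise ValueError("tracks must be non-empty and of equal length")
    hits = sum(
        1
        for (e_lat, e_lon), (t_lat, t_lon) in zip(estimated, truth)
        if haversine_m(e_lat, e_lon, t_lat, t_lon) <= limit_m
    )
    return hits / len(estimated)
```

A test would then assert `fraction_within(estimated, truth) >= 0.80`.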
Non-functional requirements
- Cold-start ≤ 5 s (not subject to AC-NEW-1's 30 s budget — that's airborne-only).
- Throughput ≥ 5 × real time on Jetson AGX Orin for `--pace asap`.
- Memory ≤ 4 GB resident (lean image; no FAISS index unless tile lookup is needed).
Risks & mitigations
- R-DEMO-1: Tlog ↔ video timestamp drift across long flights, AND the more-common case that recordings on the operator workstation are not synchronised at all (camera and FC start independently, often minutes apart). Mitigation: auto-sync via IMU take-off detection (AC-7) is the default; `--time-offset-ms N` is the manual override. If take-off pattern is ambiguous (e.g., fixed-wing hand-launch instead of quadcopter, or tlog includes pre-arm motion), CLI WARNs and falls back to the manual override.
- R-DEMO-2: Pymavlink slow on multi-GB tlogs. Mitigation: stream-parse, never materialise; benchmark + document throughput floor.
- R-DEMO-3: Demo footage missing required FC messages (HIL mode etc.). Mitigation: CLI fails fast at startup listing missing message types and the components that need them.
- R-DEMO-4: Production C1–C5 paths bake real-time-cadence assumptions (e.g., 5 s fallback timer). Mitigation: `Clock` injection (wall-clock for live, tlog-derived for replay); documented as ADR amendment in next architecture-doc cycle.
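The take-off signature used for auto-sync (AC-7, R-DEMO-1) could be detected with a simple run-length scan, sketched below. It assumes the tlog has already been reduced to time-ordered samples of (timestamp, vertical acceleration in g, attitude-rate change in rad/s); how those quantities are derived from raw IMU messages, and the function name, are not specified by the epic.

```python
def detect_takeoff_onset(samples, accel_g_min=0.5, rate_min=1.0, hold_s=0.5):
    """Return the start time of the first run where both thresholds hold for >= hold_s.

    samples: iterable of (t_s, vertical_accel_g, attitude_rate_change_rad_s),
             ordered by time. Returns None when no such run exists.
    """
    run_start = None
    for t_s, accel_g, rate in samples:
        if accel_g > accel_g_min and rate > rate_min:
            if run_start is None:
                run_start = t_s
            if t_s - run_start >= hold_s:
                return run_start
        else:
            # Any sample below either threshold resets the sustained-run window.
            run_start = None
    return None
```

A `None` result corresponds to the ambiguous case in R-DEMO-1, where the CLI would warn and rely on the manual `--time-offset-ms` override.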
Effort
T-shirt M; 27–32 points across 8 child tasks.
Child issues
| # | Title | Pts |
|---|---|---|
| 1 | FrameSource interface (cross-cutting) + VideoFileFrameSource strategy + LiveCameraFrameSource retrofit | 3 |
| 2 | TlogReplayFcAdapter strategy (pymavlink stream parser → inbound DTOs) | 5 |
| 3 | ReplaySink interface + JsonlReplaySink impl | 3 |
| 4 | compose_replay(config) + Clock injection (per R-DEMO-4) | 3 |
| 5 | gps-denied-replay CLI entrypoint + arg parser + camera-calibration loader | 3 |
| 6 | gps-denied-replay-cli Dockerfile + GitHub Actions matrix entry + SBOM diff (excludes C6/C10/C11/C12) | 3 |
| 7 | E2E replay fixture test (Derkachi 1–2 min clip + tlog; AC-3 ≤100 m ≥ 80 % assertion) | 5 |
| 8 | Auto-sync of video ↔ tlog via IMU take-off detection (AC-7 / AC-8; --time-offset-ms remains the manual override) | 5 |
Key constraints
- ADR-001 / ADR-002 / ADR-009.
- C1–C5 components MUST remain mode-agnostic; replay-aware logic lives only in the composition root, the new strategies, and the CLI.
- No HTTP server in any companion binary (airborne or replay); HTTP wrapper, if added later, lives in operator-orchestrator per `module-layout.md` Layer-4 placement.
Testing strategy
Unit tests under tests/unit/frame_source/, tests/unit/c8_fc_adapter/test_tlog_replay_adapter.py, tests/unit/c8_fc_adapter/test_replay_sink.py, tests/unit/cli/test_replay_cli.py. E2E under tests/e2e/replay/ running the CLI against the Derkachi fixture (Tier-1 capable; gated by RUN_REPLAY_E2E=1 in CI). No FT/NFT scenarios at this epic — those live in E-BBT.
Lessons applied (Step 6 step-0 retrospective)
_docs/LESSONS.md does not yet exist (this is the project's first cycle), so no prior estimation/architecture/dependencies lessons were folded into the sizing above. When this cycle ends, the Final step's quality checklist should propose a lessons file capturing:
- C2.5 ↔ C3 helper-ownership (R14) — generalisable lesson: when two siblings share a runtime, place ownership in a shared helper from day one rather than discovering the cycle in a 4a evaluator pass.
- ADR-007 reversal (mock-as-fixture) — generalisable lesson: a test fixture is not a component; promoting one inflates architectural surface and risks contract drift.
- D-PROJ-2 / D-PROJ-3 carryforwards; generalisable lesson: cross-suite design dependencies belong in `_process_leftovers/` from the moment they are recognised, with full payload so a later cycle can replay them.
These three are candidates for the next cycle's LESSONS.md.