diff --git a/_docs/02_document/architecture.md b/_docs/02_document/architecture.md new file mode 100644 index 0000000..a746938 --- /dev/null +++ b/_docs/02_document/architecture.md @@ -0,0 +1,597 @@ +# GPS-Denied Onboard Pose Estimation — Architecture + +> Date: 2026-05-09 (Plan Phase 2a — initial draft). +> Inputs: `_docs/00_problem/{problem,acceptance_criteria,restrictions}.md`, `_docs/00_problem/input_data/*`, `_docs/01_solution/solution.md`, `_docs/02_document/glossary.md`, `_docs/02_document/tests/*`. + +## Architecture Vision + +> User-confirmed in Plan Phase 2a.0 (2026-05-09). This section is the spine of the document; nothing below it may contradict it without a recorded ADR. + +The system is a **Jetson Orin Nano Super-hosted onboard companion** that delivers a GPS-equivalent WGS84 position (with honest 6×6 covariance and provenance label `{satellite_anchored | visual_propagated | dead_reckoned}`) to a fixed-wing UAV's flight controller in GPS-denied or GPS-spoofed environments. It runs as a **single Python-with-C++-extensions monolithic process per binary track** on the companion PC, fusing pre-flight-cached satellite tiles served by the parent-suite `satellite-provider` with live nav-camera frames (3 Hz) and FC-supplied IMU/attitude (100–200 Hz). A canonical hierarchical pipeline `VIO → retrieval → re-rank → matching → AdHoP-conditional refinement → pose → fusion` drives the per-frame loop within a 400 ms p95 latency budget. Cross-component coupling routes through a shared GTSAM substrate so posterior covariance is recovered natively (D-C5-5 = (c)). The companion is **read-only against `satellite-provider` while airborne**; mid-flight tiles are generated locally and uploaded post-landing only via a separate operator-side tool, with the airborne companion image not loading the upload code path at all (process-level isolation, AC-8.4). 
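
The output contract described above (WGS84 position, 6×6 covariance, provenance label) can be sketched as a small value type. This is a minimal illustration, not the project's actual `PoseEstimate` definition: the class name `PoseEstimateSketch`, the helper `diag6`, the covariance ordering `[east, north, up, roll, pitch, yaw]`, the choice of "square root of the largest eigenvalue of the 2×2 horizontal block" as the 1-sigma horizontal accuracy, and the sample coordinates are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Provenance(Enum):
    """Provenance labels quoted from the vision statement."""
    SATELLITE_ANCHORED = "satellite_anchored"
    VISUAL_PROPAGATED = "visual_propagated"
    DEAD_RECKONED = "dead_reckoned"


@dataclass(frozen=True)
class PoseEstimateSketch:
    """Hypothetical value type; field names are illustrative only."""
    lat_deg: float
    lon_deg: float
    alt_m: float
    # ASSUMPTION: 6x6 covariance ordered [east, north, up, roll, pitch, yaw],
    # with position terms in m^2. The real ordering may differ.
    covariance: Sequence[Sequence[float]]
    provenance: Provenance

    def __post_init__(self) -> None:
        if len(self.covariance) != 6 or any(len(row) != 6 for row in self.covariance):
            raise ValueError("covariance must be 6x6")

    def horiz_sigma_m(self) -> float:
        """1-sigma horizontal accuracy from the 2x2 east/north block.

        ASSUMPTION: accuracy = sqrt(largest eigenvalue), i.e. the semi-major
        axis of the 1-sigma error ellipse (a conservative choice).
        """
        var_e = self.covariance[0][0]
        cov_en = self.covariance[0][1]
        var_n = self.covariance[1][1]
        mean = (var_e + var_n) / 2.0
        radius = math.hypot((var_e - var_n) / 2.0, cov_en)
        return math.sqrt(mean + radius)


def diag6(values: Sequence[float]) -> list:
    """Build a 6x6 diagonal matrix from six variances (helper for the demo)."""
    return [[values[i] if i == j else 0.0 for j in range(6)] for i in range(6)]


# Placeholder numbers, not taken from any flight data.
estimate = PoseEstimateSketch(
    lat_deg=48.0,
    lon_deg=37.0,
    alt_m=800.0,
    covariance=diag6([9.0, 4.0, 25.0, 0.01, 0.01, 0.01]),
    provenance=Provenance.SATELLITE_ANCHORED,
)
print(estimate.provenance.value, estimate.horiz_sigma_m())
```

Taking the largest eigenvalue rather than, say, the trace keeps the reported figure from under-stating error along the worst axis, which matches the document's stance that under-reported accuracy is a defect.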
+ +### Components — intent-level (formal decomposition belongs to Step 3) + +- **C1 — Visual / Visual-Inertial Odometry**: pluggable `VioStrategy` (Okvis2 default, VinsMono in research builds only, KltRansac mandatory simple-baseline), config-selected at startup, not hot-swappable mid-flight. +- **C2 — Visual Place Recognition**: pre-cached satellite-tile retrieval (UltraVPR primary, MegaLoc secondary, MixVPR / SelaVPR / EigenPlaces / NetVLAD / SALAD additional candidates), all behind a single `VprStrategy` interface; concrete implementation chosen by config at startup. +- **C2.5 — Top-N inlier-based re-rank**: re-ranks the top-K=10 VPR candidates by single-pair LightGlue inlier count down to top-N=3. +- **C3 — Cross-domain matcher**: DISK+LightGlue (D-C3-1 = (a)) over the N=3 retained candidates; ALIKED+LightGlue secondary; XFeat alternate. +- **C3.5 — AdHoP-conditional refinement**: invoked only when initial reprojection residual exceeds threshold; bypassed otherwise to preserve AC-4.1. +- **C4 — Pose estimation**: OpenCV ≥4.12.0 `solvePnPRansac` (IPPE) wrapped in GTSAM `Marginals` for native 6×6 covariance recovery (D-C4-2 = (b); auto-degrades to Jacobian-based covariance D-C4-2 = (a) under thermal throttle per D-CROSS-LATENCY-1). +- **C5 — State estimator**: GTSAM iSAM2 + `CombinedImuFactor` + `IncrementalFixedLagSmoother` (K=10–20 keyframes, D-C5-3); native posterior covariance via `Marginals`; **AC-4.5 = internal smoothing only**, not FC retroactive correction. +- **C6 — Tile cache + spatial index**: PostgreSQL btree spatial index over filesystem `./tiles/{zoomLevel}/{x}/{y}.jpg` mirroring `satellite-provider`'s on-disk layout, plus FAISS HNSW index for VPR descriptors (`.index` written via `faiss.write_index` + atomicwrites + SHA-256 content-hash gate, D-C10-3). 
+- **C7 — On-Jetson inference runtime**: TensorRT 10.3 engines (Polygraphy / trtexec / IBuilderConfig hybrid orchestration), JetPack 6.2, SM 87; ONNX Runtime + TRT EP fallback; pure PyTorch FP16 baseline. +- **C8 — Flight-Controller adapter**: `pymavlink` `GPS_INPUT` for ArduPilot Plane (MAVLink 2.0 message signing on the companion ↔ AP wired channel, D-C8-9 = (d)) and `YAMSPy` / INAV-Toolkit `MSP2_SENSOR_GPS` for iNav (signing-gap accepted residual risk). +- **C10 — Pre-flight cache provisioning**: orchestrates download-from-`satellite-provider` → tile cache; descriptor generation; engine compilation; calibration loading; manifest+content-hash verification (D-C10-1, D-C10-3, D-C10-6, D-C10-7). +- **C11 — Post-landing tile upload tool** (operator-side, distinct binary/image): only loaded when `flight_state == ON_GROUND`; pushes locally-saved mid-flight tiles to `satellite-provider`'s ingest endpoint (D-PROJ-2 contract; not yet implemented service-side). +- **C12 — Operator pre-flight tooling** (Plan-phase carryforward, deferred from research): cache provisioning UI, sector classification (active-conflict vs stable rear), freshness pipeline workflow. +- **C13 — Flight Data Recorder (FDR)**: per-flight ≤64 GB NVM record of estimates + IMU traces + emitted MAVLink + system health + mid-flight tiles + ≤0.1 Hz failed-tile thumbnails; raw nav/AI-cam frames excluded (AC-8.5, AC-NEW-3). +- **`mock-suite-sat-service`** (testing-time component boundary, F7): a process that publishes the `satellite-provider` ingest contract during NFT-SEC-01 / FT-P-17 / IT runs; not a fixture but a real component boundary. +- **External: `satellite-provider`** (parent-suite .NET 8 service): tile producer pre-flight; tile sink post-landing (D-PROJ-2). Treated as a planned external dependency on the upload + voting paths. + +### Architectural principles / non-negotiables + +1. 
**Camera-specific math enters only via a `Camera calibration artifact` JSON** (intrinsics + distortion + body-to-camera extrinsics + acquisition method `factory_sheet | checkerboard_refined | hybrid`). No hard-coded camera math anywhere; test fixtures (`adti26`) and production deployments (`adti20`) load different artifacts on the same code path. +2. **VioStrategy is selected at startup via config; not hot-swappable mid-flight.** +3. **Build-time exclusion of unused `Strategy` implementations.** A given binary links only the implementations it actually uses at runtime. The default deployment binary links the production-default strategies (e.g. OKVIS2 on C1) plus the engine-rule-mandatory simple-baseline (KltRansac on C1); the IT-12 comparative-study binary links all C1 implementations side-by-side. The mechanism is per-component CMake `BUILD_*` flags (`BUILD_VINS_MONO`, `BUILD_SALAD`, …) plus the per-binary composition root choosing among the linked implementations at startup. **Justification is technical** — binary size on the 8 GB shared Jetson, boot/load time inside the AC-NEW-1 30 s budget, deployed dependency / attack surface, and accidental-selection risk reduction (a binary with only OKVIS2 + KltRansac linked cannot be misconfigured into running VINS-Mono). **Component licenses do not drive this decision** — see ADR-002. CI emits both the deployment binary and the research binary on every PR. +4. **In-air outbound writes to `satellite-provider` are forbidden.** Enforced primarily by **process-level isolation** — the upload daemon (C11) is not loaded in the airborne companion image. Software guard on `flight_state == ON_GROUND` is a defense-in-depth check, not the primary control. +5. **All persistent imagery is in `satellite-provider`'s on-disk tile format** (`./tiles/{zoomLevel}/{x}/{y}.jpg` + matching metadata) so post-landing upload is byte-identical. No raw frames on disk except the AC-8.5 forensic ≤0.1 Hz failed-tile thumbnail log inside FDR. +6. 
**Honest 6×6 posterior covariance via GTSAM `Marginals`** is the safety floor for AC-NEW-4 and AC-NEW-7. Under-reported `horiz_accuracy` is a defect, not a tuning knob. +7. **MAVLink 2.0 message signing on the companion ↔ ArduPilot wired channel**, with per-flight key rotation (D-C8-9 = (d)). iNav has no signing equivalent — accepted residual risk, Plan-phase carryforward proposes an iNav firmware feature request. +8. **D-CROSS-LATENCY-1 hybrid**: K=3 baseline auto-degrades to K=2 + Jacobian covariance under Jetson thermal throttle, preserving AC-4.1 at +50 °C ambient at the cost of ~5–10 % accuracy loss (still inside AC-NEW-4). +9. **Two execution tiers** (Tier-1 workstation Docker = fast/cheap; Tier-2 Jetson hardware = AC-bound) appear in the deployment plan and CI matrix per finding F6. +10. **Camera intrinsics and full-altitude footage are calibration prerequisites**, not implementation gaps. Production accuracy claims are gated on D-PROJ-1 closure (hybrid factory + checkerboard refinement). Test fixtures use `adti26` calibration sourced from public/factory references. +11. **Spoofed GPS never re-enters the estimator** unless FC GPS health is stable + non-spoofed for ≥10 s **AND** a visual/satellite consistency check has succeeded (AC-NEW-8). +12. **AC-4.5 is internal smoothing only.** GTSAM iSAM2 retroactively refines past keyframes onboard and emits the corrected current frame; the FC log is forward-time only — neither ArduPilot nor iNav supports FC-side retroactive correction (Mode B Fact #107). +13. **Interface-first components with constructor-injected dependencies.** Every component is **defined as an interface (Python `Protocol` or `ABC`) before any concrete implementation exists**, lives in its **own folder under `src/components//`**, and is wired together via **constructor injection** at a single composition root. 
Components never reach out to a global registry, a singleton, or `import` a sibling component's concrete class directly — they receive their collaborators as `__init__` arguments typed against the sibling's interface. Multiple interchangeable implementations of the same interface MUST be supported by design (e.g., C1 has three `VioStrategy` implementations; C2 has UltraVPR + MegaLoc + MixVPR + … behind a single `VprStrategy`; C8 has two FC-adapter implementations behind a single `FcAdapter`). Selection happens once, at startup, by config; the composition root resolves config → concrete implementation → wires the graph; the rest of the runtime sees only interfaces. **Side benefit (NOTE)**: this design also gives the project **packaging optionality** — different combinations of `BUILD_*` flags can produce binaries tailored to specific deployment targets, customer bundles, or (if/when relevant later) end-product licensing strategies, **without any source-level change in application code**. That optionality is a *consequence* of the interface-first design, not a driver — the architectural decisions in this document are made on technical grounds; component licenses do not influence them. See ADR-002 § Consequences and ADR-009. + +### Open architectural items (tracked, NOT blocking Phase 2a) + +- **D-PROJ-1** (camera calibration acquisition): CLOSED in this Plan cycle as hybrid factory + checkerboard refinement (~1 day per deployed unit). No physical hardware available this cycle, so production calibration is documented as instructions only; runtime path uses test-fixture calibration for `adti26` images. +- **D-PROJ-2** (parent-suite `satellite-provider` ingest endpoint + multi-flight voting layer): open, parent-suite work, tracked in `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`. Onboard-side proceeds with `mock-suite-sat-service` standing in for the contract. 
+- **D-PROJ-3** (multi-flight fixture acquisition for AC-NEW-4 / AC-NEW-7 statistical headroom): not pursued this cycle; AC-text was relaxed 2026-05-09 to Monte-Carlo-over-current-data with stated 95 % CI; multi-flight statistical headroom is residual risk in the Step 4 risk register. +- **D-C8-2 runtime gate** (companion-driven `MAV_CMD_SET_EKF_SOURCE_SET` switch): pattern is firmware-supported but not deployed-precedent. ArduPilot Plane SITL validation (IT-3) is the lock gate; D-C8-2-FALLBACK options recorded. +- **D-C2-12** (DINOv2-feature-based matcher evaluation): carryforward research item; potentially closes D-C3-1 retrain cost. + +--- + +## 1. System Context + +**Problem being solved**: a fixed-wing UAV operating in eastern/southern Ukraine must continue to navigate and report position to its flight controller when GPS is **denied** (no fix) or **spoofed** (false fix). The onboard system replaces real GPS with a WGS84 position estimate derived from pre-cached satellite tiles + live nav-camera frames + FC IMU/attitude, with honest covariance and a provenance label. Mission profile: 8 h flights, ~60 km/h cruise, ≤1 km AGL, ≤400 km² total cached area. 
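
As a rough sanity check on the mission profile above, the stated figures can be turned into a few derived quantities. The sketch below is plain arithmetic on numbers quoted in this document (8 h, ~60 km/h, ≤400 km², and the 3 Hz nav-camera rate from the Architecture Vision). The function name `mission_envelope` and the "square-equivalent side" are illustrative assumptions; the cached area need not be square.

```python
import math

FLIGHT_HOURS = 8.0        # mission duration from the profile
CRUISE_KMH = 60.0         # approximate cruise speed
FRAME_RATE_HZ = 3.0       # nav-camera rate from the Architecture Vision
CACHED_AREA_KM2 = 400.0   # upper bound on total cached area


def mission_envelope() -> dict:
    """Derive back-of-envelope quantities from the mission profile."""
    flight_seconds = FLIGHT_HOURS * 3600.0
    cruise_ms = CRUISE_KMH / 3.6
    return {
        # Distance flown at constant cruise speed.
        "track_length_km": FLIGHT_HOURS * CRUISE_KMH,
        # Nav-camera frames produced over one full flight.
        "frames_per_flight": int(FRAME_RATE_HZ * flight_seconds),
        # Ground distance covered between consecutive frames.
        "ground_advance_per_frame_m": cruise_ms / FRAME_RATE_HZ,
        # Side of a square with the same area (illustration only).
        "square_equivalent_side_km": math.sqrt(CACHED_AREA_KM2),
    }


for name, value in mission_envelope().items():
    print(f"{name}: {value:.2f}")
```

At these numbers the aircraft advances roughly 5.6 m between frames and flies about 480 km per sortie, far more track than the side of a 20 km × 20 km square-equivalent cache. The implied route shape (loitering or revisiting within the cached area) is an inference, not something this document states.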
+ +**System boundaries** (inside vs outside): + +| Inside the system (this project) | Outside the system | +|---|---| +| Companion PC runtime (Jetson Orin Nano Super, JetPack 6.2) | Flight controller firmware (ArduPilot Plane, iNav) | +| All onboard pose-estimation logic (C1–C8, C13) | Parent-suite `satellite-provider` (.NET 8 REST microservice) | +| Pre-flight cache provisioning (C10) | GCS (QGroundControl) | +| Operator-side post-landing upload tool (C11) | Nav camera hardware (`adti20`); AI-camera hardware | +| Operator pre-flight tooling (C12) | UAV airframe / FC IMU / sensors | +| FDR writer (C13) | Operator's workstation OS / authentication | +| Camera calibration artifact format + loader | The act of calibration itself (operator runs checkerboard rig) | +| `mock-suite-sat-service` (testing-time, F7) | Cellular / satellite GCS link bandwidth | + +**External systems**: + +| System | Integration Type | Direction | Purpose | +|---|---|---|---| +| `satellite-provider` (parent-suite .NET 8) | REST + filesystem (read), REST (post-landing write, D-PROJ-2) | Both | Pre-flight tile source; post-landing tile sink (planned) | +| ArduPilot Plane FC | MAVLink 2.0 over UART/USB (signed) | Both | Companion → FC: external position via `GPS_INPUT`, EKF source-set commands. FC → companion: IMU, attitude, GPS health | +| iNav FC | MSP2 over UART (unsigned); MAVLink telemetry from FC | Both | Companion → FC: external position via `MSP2_SENSOR_GPS` (companion is sole GPS source on iNav). FC → companion: IMU/attitude/telemetry | +| QGroundControl (GCS) | MAVLink 2.0 (link-bandwidth-limited) | Both | 1–2 Hz downsampled summary out (AC-6.1); operator commands in (AC-6.2) | +| Nav camera (USB/MIPI-CSI/GigE) | Camera SDK / V4L2 | Inbound | 3 Hz nadir frames at 5472×3648 px | +| AI camera | Camera SDK + gimbal/zoom telemetry | Inbound | AC-7.x object localization (deferred to follow-up cycle) | +| Operator workstation | Filesystem + USB/Ethernet | Both | Pre-flight: stages cache + calibration onto companion. 
Post-flight: triggers upload tool, reads FDR | + +--- + +## 2. Technology Stack + +| Layer | Technology | Version | Rationale | +|---|---|---|---| +| Language (host) | Python | 3.10 (JetPack 6.2 default) | Glue layer for GTSAM/FAISS/OpenCV/pymavlink/YAMSPy; matches every selected library's primary binding | +| Language (perf-critical) | C++ | C++17 | OKVIS2, VINS-Mono, GTSAM core, OpenCV, FAISS native; Python wrappers cross the boundary | +| Inference runtime | TensorRT | 10.3 (JetPack 6.2 pin) | Pinned per D-C7-9; fallback ONNX Runtime + TRT EP; pure PyTorch FP16 baseline for mandatory simple-baseline track | +| Visual matching | DISK + LightGlue | upstream HEAD pinned per Plan-phase | D-C3-1 = (a); replaces SuperPoint+SuperGlue (Magic Leap noncommercial canonical) | +| VPR (primary) | UltraVPR | RAL 2025 / ICRA 2026 (cbbhuxx/UltraVPR) | Documentary Lead PRIMARY; rotation-invariant, unsupervised aerial pretrain (multi-heading aerial flight + closes D-C2-1 retrain cost) | +| VPR (secondary) | MegaLoc, MixVPR, SelaVPR, EigenPlaces, NetVLAD | upstream HEAD pinned per Plan-phase | Mode B Fact #110/#113 + mandatory simple-baseline (NetVLAD/MixVPR) | +| State estimator | GTSAM + `gtsam_unstable.IncrementalFixedLagSmoother` | per Plan-phase pin (no published CVE at audit time) | Native 6×6 covariance; D-C5-5 = (c) `PriorFactorPose3` only | +| Image / pose math | OpenCV (Python+C++) | **≥ 4.12.0** | CVE-2025-53644 mitigation (Mode B Fact #112); IPPE flags for D-C4-1 = (b) | +| VPR descriptor index | FAISS HNSW | upstream HEAD pinned per Plan-phase | `faiss.write_index` + atomicwrites + SHA-256 content-hash gate (D-C10-3) | +| FC adapter (ArduPilot) | `pymavlink` + MAVLink 2.0 signing | bundled unmodified per D-C8-3 | Verified Source #4; ArduPilot canonical signing per Source #128 | +| FC adapter (iNav) | YAMSPy + INAV-Toolkit MSP2 | MIT throughout | iNav has no inbound MAVLink ext-positioning handler (SQ6) | +| VIO (production) | OKVIS2 (BSD-3-Clause) | upstream HEAD 
pinned per Plan-phase | D-C1-1-SUB-A = (a) production-default | +| VIO (research / IT-12) | VINS-Mono | upstream HEAD pinned per Plan-phase | Research binary only (`BUILD_VINS_MONO=ON`) for IT-12 comparative study; build-time exclusion from deployment binary per ADR-002 | +| VIO (mandatory baseline) | KLT+RANSAC over OpenCV | OpenCV ≥ 4.12.0 | Engine-rule-required mandatory simple-baseline | +| Tile cache backend | PostgreSQL + filesystem | PostgreSQL 16 (mirror of `satellite-provider`) | C6 mirrors `satellite-provider`'s on-disk and table layout for byte-identical post-landing upload | +| Container runtime | Docker (Tier-1) + bare JetPack (Tier-2) | Docker 27.x; JetPack 6.2 | Tier-1 workstation Docker; Tier-2 Jetson native (no Docker — direct JetPack to keep INT8 calibration cache trustworthy per D-C10-6) | +| Build system | CMake + Python `pyproject.toml` | CMake ≥ 3.27 | CMake `option(BUILD_VINS_MONO ...)` D-C1-1-SUB-A; Python wheels built per Jetson via cibuildwheel-equivalent recipe | +| CI/CD | GitHub Actions (Tier-1) + self-hosted Jetson runner (Tier-2) | latest pinned action versions | Two-binary emit on every PR (production + research); Tier-2 runs are AC-bound jobs only | +| Configuration | YAML (per-flight) + Camera calibration JSON | n/a | Single config root; the only camera-specific entry point is the calibration JSON | + +**Key constraints from `restrictions.md` and how they shape the stack**: + +- **Hardware pinned to Jetson Orin Nano Super (8 GB shared, 25 W)** → forces TensorRT engine compilation on-device + INT8/FP16 mix per D-C7-1; rules out heavy multi-process stacks (D-C1-1-SUB-A = (b) was rejected on latency budget). +- **Python is the host language but ROS-bound C++ is unavoidable for VIO** → both production and research binaries are CMake projects that produce a Python-importable `.so` per `VioStrategy`; the rest of the runtime is pure Python. 
+- **PX4 is out of scope, ArduPilot Plane + iNav both required** → C8 must split per FC, with no single message contract spanning both. +- **Build-time exclusion of unused `Strategy` implementations (ADR-002)** → CMake `BUILD_*` flags (`BUILD_VINS_MONO`, `BUILD_SALAD`, …) determine which implementations are linked into each binary; the deployment binary links the production-default + the mandatory simple-baseline; the IT-12 research binary links all strategies. Justification is technical (binary size on 8 GB shared Jetson, AC-NEW-1 boot budget, dependency surface, accidental-selection risk). Component licenses do not influence this decision. +- **MAVLink message-signing posture asymmetry** → `pymavlink` signing handshake is part of takeoff load on the AP path; iNav unsigned link is documented as accepted residual risk in `security_analysis.md` carryforward. +- **No raw-frame storage (AC-8.5)** → all camera ingestion is streaming; the only persistence path for frame imagery is via tile orthorectification (AC-8.4). +- **8 h continuous duty cycle at 25 W up to +50 °C ambient** → the auto-degrade hybrid (D-CROSS-LATENCY-1) is a first-class concern of every latency-sensitive component, not an afterthought. + +--- + +## 3. 
Deployment Model + +**Environments**: + +| Environment | Purpose | Hardware | +|---|---|---| +| `dev-tier1` | Fast iterative development; unit + most integration tests | Workstation (any Linux x86_64 + NVIDIA GPU optional); Docker | +| `dev-tier2` | Hardware-bound development checks | Jetson Orin Nano Super dev kit (developer's desk) | +| `staging-tier1` | CI runs that don't require Jetson hardware | GitHub-hosted runner (x86_64); Docker | +| `staging-tier2` | CI runs that require Jetson (AC-bound jobs only) | Self-hosted Jetson runner; bare JetPack (no Docker) | +| `production` | Deployed companion image on a UAV | Jetson Orin Nano Super (pinned); bare JetPack; no inbound network listening (defense-in-depth, NFT-SEC-05) | +| `production-operator-workstation` | Pre-flight cache build + post-landing upload + FDR retrieval | Operator's Linux workstation; Docker for `satellite-provider` mirror | + +**Infrastructure**: + +- **No cloud orchestration**. The companion is an embedded edge device; the operator's workstation is a single host that runs the operator tooling (C11, C12) and a local `satellite-provider` mirror or VPN-reaches the lab `satellite-provider`. +- **Two binaries shipped on every PR** (ADR-002): `deployment-binary` (links the production-default strategy on each component + the mandatory simple-baseline; CMake `BUILD_VINS_MONO=OFF`, `BUILD_SALAD=OFF`, …) and `research-binary` (links every available strategy on every component; all `BUILD_*` flags `ON`, used for the IT-12 comparative study). The deployment binary is what installs onto an operational Jetson; the research binary runs on dev/lab Jetson hardware for the comparative-study report. The same code base produces both — ADR-002 mechanism scales to additional binary variants later if packaging strategy requires it. +- **Container scope**: Tier-1 uses Docker (`docker compose` for the developer setup including a `mock-suite-sat-service` container, the operator-tool container, and a Postgres for C6). 
**Tier-2 (Jetson) does NOT use Docker**: TensorRT INT8 calibration caches and `jetson-stats` thermal telemetry are most reliable without a container layer, per D-C7-9 + D-C10-6. The deployed image on the Jetson is a JetPack-based system image with the deployment binary preinstalled. +- **Scaling**: not applicable (per-UAV, single companion). Failover is per-airframe (the FC's IMU-only fallback at AC-5.2 is the system's "scale-out"). + +**Environment-specific configuration**: + +| Config | dev-tier1 | staging-tier2 | production | +|---|---|---|---| +| `satellite-provider` host | local Docker (`satellite-provider:5100`) | mock-suite-sat-service or test fixture | operator workstation (pre-flight only) | +| Camera calibration source | test-fixture artifact (`adti26.json`) | test-fixture artifact | `adti20.json` (D-PROJ-1 hybrid output) | +| Logging sink | console (DEBUG) | journald + FDR | FDR (per-flight, ≤ 64 GB rolling) | +| MAVLink signing key | dev key (committed to test fixtures) | per-flight key from test config | per-flight key generated at takeoff load, rotated per flight | +| Inference engine source | pre-built engines OR on-the-fly compile | pre-built (Tier-2 cache) | pre-built (verified content-hash gate) | +| `BUILD_VINS_MONO` (binary track) | both (developer's choice) | both | OFF (production-only) | +| Network egress | unrestricted | locked to test endpoints | **none in flight** (DNS blackhole + iptables OUTPUT REJECT, NFT-SEC-05) | + +**Image / artifact pipeline**: + +``` +source repo + ├─→ CI matrix + │ ├─ tier1 lint + unit + most integration → Docker + │ ├─ tier1 build deployment-binary + research-binary (CMake split) + │ ├─ tier1 SBOM diff (deployment-binary must NOT include vins_mono) + │ └─ tier2 (self-hosted Jetson) AC-bound suite (NFT-PERF-*, NFT-LIM-*, IT-12) + │ + ├─→ release artifacts: + │ ├─ deployment-binary tarball (production-default strategies + mandatory baselines, ADR-002) + │ ├─ research-binary tarball (all strategies linked; for IT-12 
comparative study) + │ ├─ JetPack image (deployment-binary preinstalled) + │ └─ operator-tooling tarball (C11 + C12 + mock-suite-sat-service compose) + │ + └─→ deploy paths: + ├─ Jetson operational deploy: JetPack image flash (deployment-binary) + ├─ Lab/research deploy: research-binary install on dev Jetson + └─ Operator workstation: Docker compose for C11+C12+local satellite-provider mirror +``` + +--- + +## 4. Data Model Overview + +> Detailed per-component data models live in component specs (Step 3); per-entity migration strategies live in `data_model.md` (Phase 2b). + +**Core entities**: + +| Entity | Description | Owned by component | +|---|---|---| +| `NavCameraFrame` | 5472×3648 px nadir RGB frame + capture timestamp + camera ID | Camera ingest → C1, C2 | +| `ImuSample` / `ImuWindow` | IMU sample (accel + gyro + timestamp) at 100–200 Hz; windowed view sent to C1 | FC adapter (C8 inbound side) | +| `VioOutput` | Per-frame relative pose SE(3) + 6×6 covariance + IMU bias estimate + feature quality | C1 | +| `VprQuery` | Image embedding (UltraVPR/MegaLoc/etc) | C2 | +| `VprResult` | Top-K=10 candidate tile IDs ranked by descriptor distance | C2 | +| `RerankResult` | Top-N=3 candidate tiles ranked by inlier count | C2.5 | +| `MatchResult` | 2D-3D correspondences with RANSAC inliers from C3 / C3.5 | C3, C3.5 | +| `CameraCalibration` | Intrinsics K + distortion + body-to-camera extrinsics + acquisition method | Loaded once at startup; consumed by C1, C3, C4 | +| `PoseEstimate` | WGS84 position + 6×6 covariance + provenance label + `last_satellite_anchor_age_ms` | C4 → C5 | +| `Tile` | JPEG body + center lat/lon + zoomLevel + tile_size_meters/pixels + capture_timestamp + source + freshness flag + (mid-flight only) quality_metadata | C6 | +| `TileQualityMetadata` | `estimator_label`, 2×2 covariance sub-matrix, `last_anchor_age_ms`, MRE, IMU bias norm — sufficient for D-PROJ-2 voting | C6 (write side from C5/C4 outputs) | +| `EmittedExternalPosition` | WGS84 + 
honest `horiz_accuracy` + per-FC encoding (MAVLink `GPS_INPUT` for AP, MSP2 `MSP2_SENSOR_GPS` for iNav) | C8 | +| `FlightStateSignal` | `IN_AIR \| ON_GROUND` two-state flag derived from FC `MAV_STATE` | C8 inbound side; published to C11 only post-landing | +| `FdrRecord` | Estimates + IMU traces + emitted MAVLink + system health + tiles + thumbnails (≤ 64 GB / flight) | C13 | +| `Manifest` | Hash of (model + calibration + corpus + sector classification) for D-C10-1 idempotence | C10 | +| `EngineCacheEntry` | TRT engine + INT8 calibration cache keyed by SM/JP/TRT/precision tuple (D-C10-7) | C10, C7 | +| `SectorClassification` | `active_conflict \| stable_rear` per area, drives freshness threshold | C12 (operator-set) → C6, C10 | + +**Key relationships**: + +- `NavCameraFrame` → `VioOutput` (via C1) and `VprQuery` (via C2): same frame, two consumers. +- `VprResult.tileIds` ⊆ `Tile.id` (FK into the tile cache). +- `MatchResult` references both `NavCameraFrame.id` and `Tile.id` (cross-domain pair). +- `PoseEstimate` aggregates `MatchResult` + `VioOutput` + `ImuWindow` through C4 + C5. +- `EmittedExternalPosition` is a per-FC projection of `PoseEstimate`; the projection rule lives in C8 (per-FC unit conversion D-C8-8 = (b)). +- `Tile` (mid-flight) is produced from `NavCameraFrame` + `PoseEstimate` via orthorectification; carries `TileQualityMetadata` referencing the `PoseEstimate` it was emitted from. +- `FdrRecord` is the union of all emitted streams + all inputs (excluding raw nav/AI-cam frames); rollover policy = oldest segment dropped first. + +**Data flow summary** (one line each; full sequences in `system-flows.md`): + +- Pre-flight: `satellite-provider` → C10 → `Tile` cache + `EngineCacheEntry` + `Manifest` + descriptor `.index` (atomic write + content-hash gate). +- Takeoff load: `Manifest` content-hash verify + FAISS mmap + TRT deserialize + MAVLink signing handshake → ready. 
+- Per-frame runtime: `NavCameraFrame` + `ImuWindow` → C1 (`VioOutput`) → C2 → C2.5 → C3 → C3.5 → C4 → C5 → C8 → `EmittedExternalPosition` to FC. +- Mid-flight tile gen: `NavCameraFrame` + `PoseEstimate` → orthorectify → dedup → write to local C6 (no upload). +- GCS telemetry: C5 → C8 → 1–2 Hz downsampled summary to QGroundControl. +- FDR: every emitted/received stream → C13 ring with per-flight ≤ 64 GB cap. +- Post-landing: operator triggers C11 → reads C6 → uploads to `satellite-provider` ingest endpoint (D-PROJ-2 contract). + +--- + +## 5. Integration Points + +### Internal Communication + +> All in-process Python calls; the system is a single host process per binary track. "Pattern" describes the interaction shape. + +| From | To | Protocol | Pattern | Notes | +|---|---|---|---|---| +| Camera ingest thread | C1 (`VioStrategy.process_frame`) | In-process queue (bounded, drop-oldest) | Producer-consumer | Frame skip is allowed under sustained load (AC-4.1 "~10% may drop") | +| Camera ingest thread | C2 (`vpr_pipeline.query`) | In-process queue (bounded, drop-oldest) | Producer-consumer | Same frame fan-out, distinct queue depths | +| C2 | C2.5 | Direct call | Function call | C2.5 wraps C3 matcher; no queue | +| C2.5 | C3 / C3.5 | Direct call | Function call | C3.5 invoked iff `MatchResult.reprojection_residual > threshold` | +| C3 / C3.5 | C4 | Direct call | Function call | `MatchResult` passed as DTO | +| C1 + C4 | C5 | In-process queue (timestamp-ordered merge) | Pub/sub | C5 holds the GTSAM `iSAM2` state; one writer thread | +| C5 | C8 (FC outbound) | In-process queue (per-FC encoder) | Pub/sub | One encoder per active FC profile; selected at startup | +| C8 (FC inbound) | C1 (`ImuWindow`), C5 (FC IMU/attitude prior) | In-process pub/sub (timestamp-aligned) | Pub/sub | Single source of truth for FC IMU; both consumers see the same window | +| C8 (FC inbound) | flight-state guard (process boundary) | In-process pub/sub | Event | Used by FDR + GCS heartbeat; 
airborne companion does not load C11 at all | +| C5 → orthorectifier → C6 | C6 (write-only while airborne) | In-process function call | Command | Write path is in-process; the in-air image has no upload code path | +| All components | C13 (FDR writer) | In-process queue (lossy on overrun) | Pub/sub | Overrun = logged rollover, never silent drop (AC-NEW-3) | + +### External Integrations + +| External system | Protocol | Auth | Rate limits | Failure mode | +|---|---|---|---|---| +| ArduPilot Plane FC | MAVLink 2.0 (`GPS_INPUT` 5 Hz; `MAV_CMD_SET_EKF_SOURCE_SET`; `STATUSTEXT` / `NAMED_VALUE_FLOAT`) over UART/USB | MAVLink 2.0 message signing, per-flight key (D-C8-9 = (d)) | 5 Hz periodic emit; signing handshake at takeoff load (≤ 5 s, AC-NEW-1) | Signing handshake fail → companion refuses takeoff; mid-flight signing key compromise → FC ignores unsigned messages, AC-5.2 takes over | +| iNav FC | MSP2 `MSP2_SENSOR_GPS` over UART; MAVLink outbound for telemetry | None (iNav has no signing) — accepted residual risk per Mode B Source #129 | 5 Hz periodic emit | Mid-flight bad-frame → iNav `mspGPSReceiveNewData()` receives only the latest frame; honest `hPosAccuracy` is the only safety net | +| QGroundControl (GCS) | MAVLink 2.0 (`STATUSTEXT`, `NAMED_VALUE_FLOAT`, `GPS_RAW_INT`) | Same MAVLink 2.0 signing as the AP path (AP profile); no signing on iNav profile | 1–2 Hz downsampled (AC-6.1); operator commands are best-effort | GCS link drop → companion continues; no mid-flight reconfiguration is required from GCS | +| `satellite-provider` (pre-flight) | REST over HTTP, OpenAPI at `/swagger`; filesystem access if co-located | TLS + service-internal API key (operator workstation only); the companion never reaches `satellite-provider` directly while airborne | Off-line pre-flight; not time-critical | Cache miss → C10 fails fast pre-flight; takeoff blocked | +| `satellite-provider` (post-landing ingest, D-PROJ-2, **planned**) | REST `POST /api/satellite/tiles/ingest` (multipart) 
| Per-flight onboard signing key (carried with each tile); rate-limited | Bursty post-landing | Endpoint not yet implemented service-side → C11 keeps batches queued locally; never blocks the pre-flight cycle | +| Operator workstation (pre-flight stage) | Filesystem (USB / Ethernet) | OS-level (operator login) | Not time-critical | Bad-stage detection via Manifest content-hash gate (D-C10-3) | +| Nav camera | USB / MIPI-CSI / GigE (lens-module dependent) | n/a | 3 Hz | Frame drop / hardware fault → "VISUAL_BLACKOUT" path (AC-3.5, AC-NEW-8) | + +### `satellite-provider` upload contract (per D-PROJ-2 carryforward) + +The onboard side of D-PROJ-2 is fully specified in `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`. From this architecture's standpoint: + +- **`Tile` writes are append-only and idempotent** (the same `(zoomLevel, lat, lon, capture_timestamp, companion_id, flight_id)` tuple is the dedup key). +- **Quality metadata is mandatory on every uploaded tile** so the planned voting layer can promote `pending → trusted` without re-deriving statistics on the service side. +- **Onboard tiles never claim the `trusted` status**; they are uploaded as `pending` and the parent-suite voting layer (D-PROJ-2 design task #2) decides promotion. +- **Mock substitute**: `mock-suite-sat-service` (testing-time component, F7) implements the contract for NFT-SEC-01 / FT-P-17 / IT runs — it is treated as a real component boundary, not a test fixture. + +--- + +## 6. Non-Functional Requirements + +> Targets are taken verbatim from `acceptance_criteria.md` and `tests/traceability-matrix.md`. The tests column points to the canonical `tests/` files where each NFR is exercised. 
+ +| Requirement | Target | Measurement | Priority | Tests | +|---|---|---|---|---| +| End-to-end latency (AC-4.1) | p95 ≤ 400 ms (steady-state and thermal-throttle hybrid) | NFT-PERF-01 (Tier-2); D-CROSS-LATENCY-1 partition | High | `tests/performance-tests.md` | +| Tail latency under thermal stress (AC-NEW-5 + AC-4.1) | p99 ≤ 600 ms; p95 ≤ 400 ms at +50 °C 8 h | NFT-9 hot-soak | High | `tests/performance-tests.md` | +| Memory cap (AC-4.2) | < 8 GB shared (CPU + GPU) on Jetson Orin Nano Super | NFT-LIM-01 8 h replay | High | `tests/resource-limit-tests.md` | +| Cold-start TTFF (AC-NEW-1) | p95 < 30 s from companion boot to first valid frame | NFT-PERF-03 (50× cold boot) | High | `tests/performance-tests.md` | +| Spoofing-promotion latency (AC-NEW-2) | p95 < 3 s on each FC | NFT-PERF-04 (SITL on AP + iNav) | High | `tests/performance-tests.md` | +| FDR storage (AC-NEW-3) | ≤ 64 GB / flight; no silent drops | NFT-LIM-02 8 h synthetic | Medium | `tests/resource-limit-tests.md` | +| False-position safety (AC-NEW-4) | P(err > 500 m) < 0.1 %; P(err > 1 km) < 0.01 %, with stated 95 % CI over current corpus | NFT-RES-03 Monte Carlo | High | `tests/resilience-tests.md` | +| Operating envelope (AC-NEW-5) | −20 °C to +50 °C; 25 W; 8 h no throttle | NFT-LIM-04 workstation baseline (chamber deferred) | High | `tests/resource-limit-tests.md` | +| Imagery freshness (AC-NEW-6, AC-8.2) | Reject/downgrade tiles violating 6 mo / 12 mo thresholds | FT-N-05 / FT-N-06 | High | `tests/blackbox-tests.md` | +| Cache-poisoning safety (AC-NEW-7) | Onboard-side: P(misalign > 30 m) < 1 %, P(> 100 m) < 0.1 %, with stated 95 % CI | NFT-SEC-01 onboard Monte Carlo + synthetic over-confidence injection | High | `tests/security-tests.md` | +| Visual blackout failsafe (AC-NEW-8) | Mode transition ≤ 400 ms; covariance grows monotonically; spoofed GPS never re-promoted without 10 s + visual consistency gate | FT-N-04 + NFT-RES-04 | High | `tests/resilience-tests.md` + `tests/blackbox-tests.md` | +| 
Cross-FC covariance honesty (AC-NEW-4 cross-FC) | `horiz_accuracy` (m, AP) and `hPosAccuracy` (mm, iNav) carry mathematically equivalent values from the same 2×2 sub-matrix | IT-10 cross-FC | High | `tests/blackbox-tests.md` | +| MAVLink message-signing posture (AC-4.3 + D-C8-9) | Signing enabled on AP wired channel; per-flight key rotation logged to FDR; iNav documented residual risk | NFT-8 + NFT-SEC-03 | High | `tests/security-tests.md` | +| Dependency CVE pinning (D-CROSS-CVE-1) | OpenCV ≥ 4.12.0; SBOM clean of unpatched CVEs at audit time; monthly re-scan | NFT-10 SBOM CVE audit | High | `tests/security-tests.md` | +| GCS bandwidth budget (AC-6.1) | 1–2 Hz downsampled summary | FT-P-12 | Medium | `tests/blackbox-tests.md` | +| Frame-by-frame streaming (AC-4.4) | No batching/delay; estimates emitted per frame | NFT-PERF-02 | High | `tests/performance-tests.md` | +| Smoothing-loop look-back (AC-4.5, Mode B Fact #107) | FDR contains smoothed past-frame estimates; smoothing horizon converges within X m of ground truth at K = 10–20 keyframes | IT-11 | Medium | `tests/blackbox-tests.md` | + +--- + +## 7. Security Architecture + +**Threat model** (one-page summary; full extraction lives in carryforward `security_analysis.md`): + +- The companion is a **remote untrusted endpoint** from the parent-suite's standpoint: a downed UAV's companion can be physically captured. Persistent secrets must therefore be **per-flight ephemeral** wherever feasible. +- The **wired companion ↔ FC link** is the only physical-access-required attack surface for in-flight injection. MAVLink 2.0 signing on the AP path mitigates CVE-2026-1579 (D-C8-9 = (d)). iNav has no signing — accepted residual risk. +- The **GCS link** is bandwidth-limited and best-effort; a hostile GCS can spoof operator commands but cannot inject pose data (the system never accepts pose from GCS). +- **GPS spoofing** is treated as expected, not anomalous (AC-3.5, AC-NEW-2, AC-NEW-8). 
The system never lets a spoofed GPS source re-enter the estimator without a 10 s + visual-consistency gate. +- **Cache poisoning** is the dominant cross-flight attack vector (AC-NEW-7): a compromised companion could write a misaligned tile that becomes the next flight's anchor. The mitigation has two halves: onboard (honest covariance + quality metadata) and parent-suite (D-PROJ-2 voting layer, not yet implemented). +- **Pre-flight cache stage** is on the operator's workstation; the SHA-256 content-hash gate (D-C10-3) detects in-place tampering between stage and takeoff. +- **In-flight network egress is forbidden** (defense-in-depth: DNS blackhole + iptables OUTPUT REJECT, NFT-SEC-05). The only outbound path from the companion is MAVLink to the FC and signed STATUSTEXT to the GCS. + +**Authentication** (per integration): + +| Integration | Mechanism | +|---|---| +| Companion ↔ ArduPilot Plane FC | MAVLink 2.0 message signing, per-flight key rotation (D-C8-9 = (d)) | +| Companion ↔ iNav FC | None (iNav has no signing implementation; accepted residual risk per Mode B Source #129) | +| Companion ↔ GCS (AP profile) | MAVLink 2.0 signing inherited from the FC channel | +| Operator workstation ↔ `satellite-provider` (pre-flight) | TLS + service-internal API key (workstation only; never on the airborne companion) | +| Companion ↔ `satellite-provider` (post-landing upload, **D-PROJ-2 planned**) | Per-flight onboard signing key carried with each uploaded tile; the planned ingest endpoint verifies the key | +| Operator workstation pre-flight stage | OS-level (operator login + workstation hardening — operator-tooling concern, C12) | + +**Authorization**: + +- **Onboard runtime**: a single principal (the runtime process); no in-process privilege boundaries. The post-landing upload tool (C11) runs as a different principal with write access to `satellite-provider` and only loads when `flight_state == ON_GROUND`. The airborne image does not contain the C11 binary at all. 
+- **GCS**: operator commands (`AC-6.2`) are best-effort hints; the operator cannot promote a pose, override covariance, or reach the `satellite-provider` write path. Operator re-loc requests trigger the satellite re-localization flow (F6) but do not bypass any safety gate. + +**Data protection**: + +- **At rest**: tile cache + descriptor index + FDR are written to the companion's local NVM. No application-level encryption (the threat model treats a captured companion as compromised; encryption would buy little against physical access). Operator-side `satellite-provider` storage is the parent-suite's concern. +- **In transit**: MAVLink 2.0 message signing on the AP channel; MSP2 unsigned on iNav. The post-landing upload runs over TLS to `satellite-provider`. +- **Secrets management**: + - **Per-flight MAVLink signing key**: generated at takeoff load; rotated per flight; logged to FDR. + - **Per-flight onboard signing key for tile upload**: generated at takeoff load; baked into mid-flight tile metadata; consumed by C11 post-landing. + - **Pre-flight service API key**: stays on the operator workstation; never written to the companion image. + - **No long-lived secrets on the companion image** beyond firmware-level boot signatures (out of scope). 
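The per-flight secrets listed above can be illustrated with a minimal sketch. It assumes a 32-byte secret (the key size MAVLink 2 signing uses) and assumes that the FDR rotation entry carries only a short fingerprint of the key rather than the key itself; the source says rotation is logged to FDR but does not say in what form. `PerFlightKey`, `generate_per_flight_key`, and `rotation_event` are hypothetical names, not part of the documented design.

```python
import hashlib
import secrets
from dataclasses import dataclass, field

SIGNING_KEY_BYTES = 32  # MAVLink 2 signing uses a 32-byte secret key


@dataclass(frozen=True)
class PerFlightKey:
    """Ephemeral secret that lives for exactly one flight (hypothetical type)."""

    flight_id: str
    purpose: str  # e.g. "mavlink_signing" or "tile_upload"
    secret: bytes = field(repr=False)  # keep the raw key out of repr() and logs

    def fingerprint(self) -> str:
        """Short SHA-256 digest that identifies the key without revealing it."""
        return hashlib.sha256(self.secret).hexdigest()[:16]


def generate_per_flight_key(flight_id: str, purpose: str) -> PerFlightKey:
    """Draw a fresh key from the OS CSPRNG at takeoff load."""
    return PerFlightKey(flight_id, purpose, secrets.token_bytes(SIGNING_KEY_BYTES))


def rotation_event(key: PerFlightKey) -> dict:
    """Build an FDR-style record; assumption: only the fingerprint is logged."""
    return {
        "event": "key_rotated",
        "flight_id": key.flight_id,
        "purpose": key.purpose,
        "fingerprint": key.fingerprint(),
    }
```

Generating the two keys independently (one per purpose) keeps a leak of the tile-upload key from weakening the FC link, which matches the separation the list above draws between the two secrets.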
+ +**Audit logging**: + +| What | Where | Retention | +|---|---|---| +| All emitted external-position frames + covariance + provenance label | FDR (C13) | per flight (≤ 64 GB; rollover oldest-first) | +| All received MAVLink + MSP2 frames (raw `tlog` stream) | FDR | per flight | +| MAVLink 2.0 signing key rotation events | FDR | per flight | +| Spoofing-promotion / spoofing-rejection events | FDR + GCS STATUSTEXT | per flight + best-effort GCS link | +| `VISUAL_BLACKOUT_*` STATUSTEXT events (AC-3.5, AC-NEW-8) | FDR + GCS STATUSTEXT | per flight + best-effort | +| C10 content-hash gate fail events | FDR + companion refuses takeoff | per flight | +| Mid-flight tile-gen failures | ≤ 0.1 Hz thumbnail log inside FDR (AC-8.5 forensic exception) | per flight | +| Component health (CPU/GPU/temp/throttle) | FDR | per flight | +| Source-set switch events (D-C8-2 EKF source-set) | FDR + GCS STATUSTEXT | per flight | +| Production binary SBOM provenance | release artifacts; not on the deployed companion | per release | + +--- + +## 8. Key Architectural Decisions + +> These ADRs distill the user-confirmed Mode-B locks plus this architecture's first-time choices. ADRs are also tracked in `_docs/00_research/06_component_fit_matrix/MODEB_revisions.md` and (for cross-component gates) `99_cross_component_gates.md`. Step 4 (Risk Review) iterates on them; this section is the authoritative entry point. + +### ADR-001 — VioStrategy is selected at startup via config; not hot-swappable + +**Context**: Three VIO implementations are required (OKVIS2 production-default, VINS-Mono research-only, KLT+RANSAC mandatory simple-baseline). Hot-swap mid-flight would add re-initialisation cost on every switch and would require keeping multiple solvers warm in 8 GB shared memory. + +**Decision**: VioStrategy is selected at startup from a single config knob (`vio.strategy: okvis2 | vins_mono | klt_ransac`), and the choice is constant for the flight. 
The `VioStrategy` interface owns the abstraction; concrete strategies own their per-strategy concerns (OKVIS2's ROS bring-up, VINS-Mono's build flag, KLT's degraded covariance). Build-time inclusion / exclusion of individual strategies is governed separately by ADR-002. + +**Alternatives considered**: +1. Hot-swap at runtime — rejected: re-init cost + memory footprint inside AC-4.2. +2. Single-strategy build per binary — rejected: defeats the IT-12 comparative-study objective on the research binary. + +**Consequences**: A flight is locked to one VIO; failure of the active strategy = AC-5.2 fallback (FC IMU-only). The comparative study is a per-replay artifact, not a runtime decision. + +### ADR-002 — Build-time exclusion of unused `Strategy` implementations (D-C1-1-SUB-A = (a)) + +**Context**: The architecture deliberately requires multiple interchangeable implementations per component (three `VioStrategy` for C1; multiple `VprStrategy` for C2; two FC adapters for C8). At runtime each binary uses exactly one of them per component. Linking *all* implementations into every binary would inflate binary size on the 8 GB shared Jetson, increase boot/load time inside the AC-NEW-1 ≤ 30 s p95 budget, expand the deployed dependency / attack surface, and create accidental-selection risk (a misconfigured runtime accidentally booting a non-deployment-default strategy). A single binary with all strategies present is also harder to reason about for the IT-12 comparative study, which deliberately wants the *opposite* — every strategy present and replayed against the same footage. + +This decision is made on **technical grounds only**. Component licenses (BSD/Apache/MIT/LGPL/GPL/etc.) **do not influence** which strategy is the deployment-default — that choice is the IT-12 measured-performance verdict on the project's operating context (Jetson Orin Nano Super + ADTi 20MP 20L V1 + Derkachi-class footage). + +**Decision**: + +1. 
**Per-component CMake `BUILD_*` flag** controls whether each implementation is linked into a given binary (`BUILD_VINS_MONO`, `BUILD_SALAD`, etc.). The default deployment binary links the production-default strategy (OKVIS2 on C1 today, pending IT-12 verdict) plus the engine-rule-mandatory simple-baseline (KltRansac on C1). The research binary links every available strategy of every component for IT-12. +2. **The Strategy interface boundary makes the exclusion architectural** rather than configurational: sibling components import only the `Strategy` interface, never a concrete implementation. The composition root (one per binary, see ADR-009) is the only place that names concrete classes, and a class whose file is not part of the CMake target cannot be named there — so a misconfigured deployment cannot accidentally pull in an unintended strategy. +3. **Selection at startup** (config-driven; ADR-001) picks among the linked-in strategies. A binary with only OKVIS2 + KltRansac linked exposes only those two values for `vio.strategy`; the config validator fails fast if asked for `vins_mono`. +4. **CI emits both binaries on every PR** (deployment + research) so the comparative-study artifact is always reproducible alongside the deployable artifact. + +**Alternatives considered**: + +1. **Single binary with all strategies linked, runtime config picks one** — rejected on binary size + boot time + accidental-selection risk + unnecessary dependency surface on the deployed device. +2. **Process-isolation IPC for the unused strategies** — rejected on latency budget conflict (D-CROSS-LATENCY-1) and operational complexity of two-process deployments on a 25 W edge device. +3. **Multiple deployment-binary variants tailored to specific customer bundles** — out of scope of this ADR; supported as a *consequence* (see Consequences NOTE) but not a driver of the decision. + +**Consequences**: + +- Two CI binaries on every PR; both must build and test green. 
+- Adding any new strategy to a component is a folder-add + a CMake `BUILD_*` flag + an entry in the relevant binary's composition root. No call-site changes anywhere. +- The deployment binary's SBOM is what it is — a *consequence* of which `BUILD_*` flags were `ON`, not a driver of which flags should be `ON`. +- **NOTE — packaging optionality (deferred / non-binding).** Because the exclusion is per-implementation per-CMake-flag, the same code base can produce additional binaries — for different deployment targets, different customer bundles, or different end-product licensing bundles **if and when product licensing is decided later**. This architecture **deliberately makes no licensing decisions today**: component licenses do not influence which strategy is the deployment-default, and the decision above is purely technical. When packaging strategy is finalized, the same `BUILD_*` flag mechanism produces the right bundle without source-level changes — that optionality is a *side benefit* of the interface-first design (Principle #13 + ADR-009), not a justification for it. + +### ADR-003 — Honest 6×6 covariance via GTSAM Marginals is the safety floor (D-C5-5 = (c)) + +**Context**: AC-NEW-4 and the cross-FC covariance honesty (IT-10) require a single, mathematically-recoverable 6×6 posterior covariance per emitted frame. ESKF-style Jacobian-based covariance is faster but loses information across the C4–C5 boundary. + +**Decision**: C5 is GTSAM iSAM2 + `CombinedImuFactor` + `BetweenFactorPose3` + `GenericProjectionFactorCal3DS2`, with `Marginals.marginalCovariance(pose_key)` recovering the 6×6 posterior. C4 is OpenCV `solvePnPRansac` wrapped in a GTSAM factor so C4 and C5 share the same substrate. D-CROSS-LATENCY-1 hybrid auto-degrades C4 covariance to Jacobian-based (D-C4-2 = (a)) under thermal throttle, but C5 stays on Marginals. + +**Alternatives considered**: +1. 
ESKF-only with Jacobian covariance — rejected: loses cross-component covariance honesty; ESKF is retained only as the engine-rule-mandatory simple-baseline. +2. Dual estimators (ESKF + iSAM2) — rejected: memory + complexity + the hybrid auto-degrade already covers thermal stress. + +**Consequences**: GTSAM is a hard runtime dependency; AC-4.5 internal smoothing comes for free; per-frame covariance recovery costs 30–90 ms in steady state (auto-degrades to 5–15 ms under thermal throttle). + +### ADR-004 — Process-level isolation for in-air upload prevention (AC-8.4 enforcement) + +**Context**: AC-8.4 forbids in-air outbound writes to `satellite-provider` for drone-security reasons. A software guard checking `flight_state == ON_GROUND` can be bypassed by code injection if the upload code path is ever loaded. + +**Decision**: The post-landing upload tool (C11) is a **separate binary / image** that runs only on the operator's workstation post-landing; the airborne companion image does not contain the C11 binary at all. The `flight_state == ON_GROUND` software guard remains as defense-in-depth. The local mid-flight tile format is byte-identical to `satellite-provider`'s on-disk layout so no transformation is needed at upload time. + +**Alternatives considered**: +1. Single binary with software-only guard — rejected on principle: a runtime guard cannot be the primary control for an "is the system airborne?" safety property. +2. Hardware-level switch (e.g., physical write-enable jumper) — rejected: adds operations cost; software image isolation gives equivalent assurance for this threat model. + +**Consequences**: Two binaries to maintain (companion image + operator-tooling image). CI builds and tests both. The operator workflow has an explicit post-landing step ("run the upload tool"), which is itself a feature, not a bug. 
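The defense-in-depth guard that ADR-004 keeps inside the upload tool could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the `FlightState` members other than `ON_GROUND`, the `UploadRefused` exception, and the `run_upload` entry point are hypothetical, and the actual transfer is replaced by a placeholder. The point it shows is fail-closed behaviour: anything short of a positive `ON_GROUND` confirmation refuses.

```python
from enum import Enum


class FlightState(Enum):
    ON_GROUND = "ON_GROUND"
    AIRBORNE = "AIRBORNE"  # hypothetical member, for illustration only
    UNKNOWN = "UNKNOWN"    # hypothetical member, for illustration only


class UploadRefused(RuntimeError):
    """Raised when the upload tool starts without a confirmed ground state."""


def require_on_ground(state: FlightState) -> None:
    # Fail closed: only an explicit ON_GROUND confirmation lets the tool proceed.
    if state is not FlightState.ON_GROUND:
        raise UploadRefused(f"upload refused: flight_state={state.value}")


def run_upload(state: FlightState, batches: list[str]) -> list[str]:
    """Check the guard, then hand each batch to the (placeholder) transfer step."""
    require_on_ground(state)
    # Placeholder for the real transfer; this sketch only reports what it would send.
    return [f"queued:{batch}" for batch in batches]
```

Because the airborne image does not ship this code at all, the guard is a second line of defense on the operator side rather than the primary control.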
+ +### ADR-005 — Two execution tiers (Tier-1 / Tier-2) are first-class architectural concerns (F6) + +**Context**: AC-4.1 latency, AC-4.2 memory, AC-NEW-1 cold-start, AC-NEW-3 FDR storage, AC-NEW-5 thermal envelope, and AC-NEW-7 cache-poisoning all have validation locations on Jetson hardware that cannot be replicated on a workstation. Conversely, most logic, integration, and contract tests run in seconds on Tier-1 and would take orders of magnitude longer on Tier-2. + +**Decision**: Tier-1 = workstation Docker (fast/cheap; runs lint + unit + most integration + Mock `satellite-provider`); Tier-2 = Jetson hardware (AC-bound jobs only; runs NFT-PERF-* + NFT-LIM-* + NFT-RES-* + IT-12). Both tiers are documented in the deployment plan and the CI matrix; failure on either tier is release-blocking. Tier-2 runner availability is itself a risk-register entry. + +**Alternatives considered**: +1. Tier-2-only — rejected: order-of-magnitude slower iteration loop; runner-availability risk dominates. +2. Tier-1-only — rejected: AC-bound NFTs cannot pass without Jetson hardware in the loop. + +**Consequences**: CI is split; some tests have an explicit `tier: 2` annotation in `tests/environment.md`; release artifacts include both tier results. + +### ADR-006 — D-CROSS-LATENCY-1 hybrid is the AC-4.1 budget strategy + +**Context**: At +50 °C ambient (AC-NEW-5 upper-temp), the Jetson auto-throttles, collapsing the steady-state K=3 latency budget. AC-4.1 has no thermal carve-out — the 400 ms p95 must hold across the operating envelope. + +**Decision**: K=3 baseline (DISK+LightGlue × 3 candidates from C2.5; GTSAM Marginals 6×6 covariance recovery in C4) auto-degrades to K=2 + Jacobian-based covariance under thermal throttle. The trigger is the Jetson's thermal-throttle telemetry crossing a configurable temperature/clock threshold (set per D-C7-9 JetPack 6.2 + TensorRT 10.3 lock). NFT-9 hot-soak validates the hybrid. + +**Alternatives considered**: +1. 
K=3 fixed + larger latency budget — rejected: AC-4.1 is the contract. +2. K=2 always — rejected: ~5–10 % accuracy loss at steady state hurts AC-NEW-4 headroom. + +**Consequences**: ~5–10 % accuracy loss at the upper thermal envelope (still inside AC-NEW-4). The hybrid is part of the runtime, not a config knob; the threshold is. + +### ADR-007 — `mock-suite-sat-service` is a real component boundary, not a fixture (F7) + +**Context**: D-PROJ-2 (parent-suite ingest endpoint + voting layer) is not yet implemented. NFT-SEC-01 / FT-P-17 / IT runs need a counterparty for the post-landing upload contract. Treating the counterparty as a "fixture" buried inside C8 hides the actual contract. + +**Decision**: `mock-suite-sat-service` is documented as a testing-time component boundary with its own description in `components/`. Component decomposition (Step 3) treats it as a peer of `satellite-provider`, not a sub-implementation of C8. The mock implements the contract sketched in `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`. + +**Alternatives considered**: +1. Embed mock inside C8 test fixtures — rejected: hides the contract. +2. Wait for D-PROJ-2 service-side implementation — rejected: blocks the onboard cycle. + +**Consequences**: The mock is a release artifact (not a test-only fixture); it is the source of truth for the onboard side of the D-PROJ-2 contract until the parent suite catches up. When `satellite-provider` ships the real endpoint, the mock is retired and the contract is replayed against the real service. + +### ADR-008 — D-C8-2 source-set switch is `Selected with runtime gate` (Mode B Fact #111) + +**Context**: AC-NEW-2 requires spoofing-promotion latency < 3 s. The companion-driven `MAV_CMD_SET_EKF_SOURCE_SET` switch (D-C8-2 = (b)) is firmware-supported but has no production-deployed precedent — the project would establish the canonical pattern. 
+ +**Decision**: D-C8-2 = (b) is selected with a runtime gate: ArduPilot Plane SITL validation (IT-3) is the lock gate. If IT-3 fails, D-C8-2-FALLBACK options are recorded — (a) operator-manual RC aux switch with relaxed AC-NEW-2 wording; (b) operator-warning STATUSTEXT instead of automated switch; (c) escalate to ArduPilot dev community. + +**Alternatives considered**: see D-C8-2-FALLBACK above. + +**Consequences**: AC-NEW-2 contractual latency is contingent on IT-3 passing. If IT-3 fails, AC-NEW-2 wording is renegotiated as part of D-C8-2-FALLBACK = (a). + +### ADR-009 — Interface-first components, constructor injection, one folder per component + +**Context**: The architecture deliberately requires multiple interchangeable implementations per component (three `VioStrategy` for C1; UltraVPR / MegaLoc / MixVPR / SelaVPR / EigenPlaces / NetVLAD / SALAD candidates for C2; pymavlink-AP and YAMSPy-iNav adapters for C8). ADR-002 further mandates that the **same logical component** ship in different concrete forms across binaries (deployment binary vs IT-12 research binary; future packaging variants if/when needed). Without a strict interface boundary, sibling components import each other's concrete classes; build-time exclusion via `BUILD_*` flags becomes a fragile compile-time afterthought rather than an architectural property; testing each strategy in isolation requires monkey-patching; and adding a new strategy ripples into every call site. The interface-first pattern is the architectural mechanism that makes ADR-001 (runtime selection) and ADR-002 (build-time exclusion) tractable simultaneously. + +**Decision**: + +1. **Interface first.** Every component is specified as a Python `Protocol` (or `abc.ABC`, when concrete defaults are useful) **before** any concrete implementation is written. The interface is the contract; concrete implementations satisfy it. 
Step 3 component specs document the interface signature; concrete implementations are documented under their own header inside the component spec. + +2. **One folder per component.** Source layout (per `coderule.mdc` "place source code under `src/`"): + + ``` + src/ + components/ + c1_vio/ + __init__.py + interface.py # VioStrategy Protocol + VioOutput, VioConfig DTOs + okvis2_strategy.py # deployment-default (pending IT-12 verdict) + vins_mono_strategy.py # research-only; behind BUILD_VINS_MONO (ADR-002) + klt_ransac_strategy.py # engine-rule-mandatory simple-baseline + tests/ + c2_vpr/ + __init__.py + interface.py # VprStrategy Protocol + ultra_vpr.py # deployment-default (Documentary Lead PRIMARY) + mega_loc.py + mix_vpr.py # mandatory simple-baseline alternate + sela_vpr.py + eigen_places.py + net_vlad.py # mandatory simple-baseline classical + salad.py # additional candidate; behind BUILD_SALAD (ADR-002) + tests/ + c2_5_rerank/ + interface.py # ReRankStrategy + inlier_count_rerank.py + tests/ + c3_matcher/ + interface.py # CrossDomainMatcher + disk_lightglue.py + aliked_lightglue.py + xfeat.py + tests/ + c3_5_adhop/ + interface.py # ConditionalRefiner + adhop_refiner.py + passthrough_refiner.py # for non-conditional baseline + tests/ + c4_pose/ + interface.py # PoseEstimator + opencv_gtsam_estimator.py + tests/ + c5_state/ + interface.py # StateEstimator + gtsam_isam2_estimator.py + eskf_estimator.py # mandatory simple-baseline + tests/ + c6_tile_cache/ + interface.py # TileStore + TileMetadataStore + DescriptorIndex + postgres_filesystem_store.py + faiss_descriptor_index.py + tests/ + c7_inference/ + interface.py # InferenceRuntime + tensorrt_runtime.py + onnx_trt_ep_runtime.py + pytorch_fp16_runtime.py + tests/ + c8_fc_adapter/ + interface.py # FcAdapter (in+out), GcsAdapter + pymavlink_ardupilot_adapter.py + msp2_inav_adapter.py + qgc_telemetry_adapter.py + tests/ + c10_cache_provisioning/ + interface.py # CacheProvisioner, ManifestVerifier + provisioner.py 
+ tests/ + c11_post_landing_upload/ # SEPARATE BINARY — never linked into airborne image + interface.py + uploader.py + tests/ + c13_fdr/ + interface.py # FdrWriter + file_fdr_writer.py + tests/ + composition/ + runtime_root.py # composition root: config -> concrete graph + upload_tool_root.py # composition root for the C11 operator-side tool + research_root.py # composition root for the research/dev binary + ``` + +3. **Constructor injection only.** Every component class declares its collaborators as **typed `__init__` arguments**, against the sibling's interface (not the concrete class). Example sketch: + + ```python + # src/components/c4_pose/interface.py + from typing import Protocol + class PoseEstimator(Protocol): + def estimate(self, match: MatchResult, calibration: CameraCalibration) -> PoseEstimate: ... + + # src/components/c5_state/gtsam_isam2_estimator.py + class GtsamIsam2StateEstimator: + def __init__( + self, + *, + pose_estimator: PoseEstimator, # interface, not concrete + imu_source: ImuSource, # interface + fdr: FdrWriter, # interface + config: StateEstimatorConfig, + ) -> None: + self._pose = pose_estimator + self._imu = imu_source + self._fdr = fdr + self._cfg = config + ``` + +4. **Composition root** (`src/composition/runtime_root.py`) is the **only** place that knows about concrete classes. It reads config, picks each concrete implementation, validates that every named implementation is actually linked into the active binary (fails fast otherwise), and wires the graph. Every other module sees only interfaces. **Build-time exclusion (ADR-002) becomes architectural**, not configurational: the deployment binary's composition root literally cannot wire `VinsMonoVioStrategy` because that file is not linked into the deployment binary (`BUILD_VINS_MONO=OFF`). Future packaging variants (e.g., a customer bundle with a different `VprStrategy` set) work the same way — a different `BUILD_*` flag combination + the same composition root code. + +5. 
**Python DI mechanism**: hand-rolled constructor injection in the composition root is the default — it has no extra dependency, is trivially understandable, and matches the pattern of "select once at startup, never hot-swap". A heavier DI library (`dependency-injector`, `injector`, `punq`) is **only** introduced if the composition root grows past ~150 lines or test-side wiring becomes repetitive; that is a Plan-phase deferred decision (carryforward), not a current architectural commitment. Mocking in tests is via simple stub classes that satisfy the same `Protocol` — no monkey-patching, no `unittest.mock.patch`. + +6. **Test wiring**: each component's `tests/` folder owns the test composition for that component. Test composition roots wire the unit-under-test against in-memory / fake implementations of every interface dependency. Cross-component integration tests (Tier-1) compose multiple real components with a fake `FcAdapter` + fake `TileStore` + fake `InferenceRuntime`. End-to-end Tier-2 tests run against the real composition root. + +**Alternatives considered**: + +1. **Sibling concrete imports** (`from c5_state.gtsam_isam2 import GtsamIsam2StateEstimator`) — rejected: makes ADR-002 build-time exclusion a CMake / SBOM artifact rather than an architectural property; couples C4 to a specific C5 implementation and vice versa; defeats the per-component test wiring; ripples into every call site whenever a new strategy is added. +2. **Service locator / global registry** (e.g., a process-wide DI singleton accessed via `get_service(VioStrategy)`) — rejected: hides the dependency graph from constructors, makes test isolation harder, and re-introduces the singletons banned in coderule.mdc. +3. **Function-based DI** (passing factories instead of instances) — rejected as the default: more cognitive overhead than constructor injection for a startup-bound, never-hot-swapped runtime. 
Reserved for the few call sites where lazy construction is genuinely required (e.g., the per-flight MAVLink signing key generator). +4. **Heavy DI framework** (`dependency-injector`, `injector`, `punq`) from day one — rejected as default: introduces a runtime dependency for a problem the composition root can solve in plain Python; reserved as an opt-in if the composition root outgrows hand-rolled wiring. + +**Consequences**: + +- Step 3 component decomposition produces, for **every** component: an `interface.py` description + ≥ 1 concrete implementation description + a test composition. +- The composition root is itself a reviewable artifact (a single Python file per binary track) that documents which concrete implementations a given binary contains. +- Build-time exclusion (ADR-002) becomes architectural: the deployment composition root *cannot* `import` a strategy whose file is not part of the deployment binary's CMake target. The same property scales to any future packaging variant — including, if/when product licensing strategy is decided, license-driven bundles (Principle #13 NOTE), without any source-level change in application code. +- Per-component folders give each implementation a natural home for its own `tests/`, fixtures, and adapter-specific helpers — matching coderule.mdc's "logic specific to a platform, variant, or environment belongs in the class that owns that variant". +- Adding a new C2 VPR backbone (e.g., a future foundation-model retrieval backbone via D-C2-12) is a folder-add + interface-conformance change; no other component is touched. diff --git a/_docs/02_document/glossary.md b/_docs/02_document/glossary.md index ede6813..5390d52 100644 --- a/_docs/02_document/glossary.md +++ b/_docs/02_document/glossary.md @@ -86,7 +86,7 @@ Terms are alphabetical. Each entry: one-line definition + parenthetical source. 
**UAV** — Fixed-wing unmanned aerial vehicle this system runs on; ~60 km/h cruise, ≤1 km AGL, 8 h flights, eastern/southern Ukraine theater. (source: `restrictions.md`) -**VioStrategy** — Pluggable interface (Okvis2 / VinsMono / KltRansac) selected at startup by config. Production binary excludes the GPL-3.0 implementation per D-C1-1-SUB-A=(a) build-config exclusion; research/dev binary links all three for the comparative study (IT-12). (source: `solution.md` §C1) +**VioStrategy** — Pluggable interface (Okvis2 / VinsMono / KltRansac) selected at startup by config; not hot-swappable mid-flight. The interchangeable-strategy pattern (ADR-001) plus build-time exclusion via per-implementation CMake `BUILD_*` flags (ADR-002) lets the project produce a small **deployment binary** (links the production-default + the engine-rule-mandatory simple-baseline) and a separate **research binary** that links every available strategy for the IT-12 comparative-study report. ADR-002 is purely technical (binary size on 8 GB shared Jetson, AC-NEW-1 boot budget, dependency / attack surface, accidental-selection risk); component licenses do not influence which strategy is the deployment-default. (source: `solution.md` §C1, `architecture.md` ADR-001 + ADR-002 + ADR-009) **VIO / Visual-Inertial Odometry** — Frame-to-frame motion + IMU bias estimation via fused camera + IMU streams (component C1). (source: `solution.md` §C1) diff --git a/_docs/02_document/system-flows.md b/_docs/02_document/system-flows.md new file mode 100644 index 0000000..5797843 --- /dev/null +++ b/_docs/02_document/system-flows.md @@ -0,0 +1,1006 @@ +# GPS-Denied Onboard Pose Estimation — System Flows + +> Date: 2026-05-09 (Plan Phase 2a — initial draft). +> Companion document to `architecture.md`. Component IDs (C1, C2, … C13) match the architecture's intent-level decomposition; concrete interfaces are defined in Step 3. 
+> +> Diagram conventions follow `.cursor/skills/plan/templates/system-flows.md` § Mermaid Diagram Conventions: component-named participants, camelCase node IDs, `{Question?}` decisions, `([label])` start/end, `[[label]]` for external systems, no inline styling. + +## Flow Inventory + +| # | Flow Name | Trigger | Primary Components | Criticality | +|---|-----------|---------|--------------------|-------------| +| F1 | Pre-flight cache provisioning | Operator runs C12 cache-build CLI on workstation | C12 (operator), [[`satellite-provider`]], C10, C6, C7 | High | +| F2 | Takeoff load | Companion boot detected by FC `MAV_STATE` ARMED OR companion process start with armed FC | C10, C7, C8 (signing handshake), C13 | High | +| F3 | Steady-state per-frame estimation | Nav camera frame received (3 Hz nominal) | C1, C2, C2.5, C3, C3.5, C4, C5, C8 (out), C13 | High | +| F4 | Mid-flight tile generation + local cache write | Successful satellite-anchored frame with quality metadata above threshold | C5, C6, C13 (no C8/C11 path) | High | +| F5 | Visual blackout + spoofed-GPS failsafe | Camera unusable AND/OR FC GPS reports denial/spoof | C1, C5, C8, C13 (degraded-mode escalation per AC-NEW-8) | High | +| F6 | Sharp-turn / disconnected-segment re-localization | Frame-to-frame registration fails for ≥ 1 frame (AC-3.2 / AC-3.3) | C1, C2, C2.5, C3, C3.5, C4, C5, C8, C13; optionally operator (AC-3.4) | High | +| F7 | Spoofing-promotion via EKF source-set switch | FC reports GPS denial/spoof while companion estimate is healthy | C5, C8, [[ArduPilot Plane FC]] | High | +| F8 | Companion reboot recovery | Companion process restart while FC remains armed | C8 (FC IMU pose ingest), C5, C10 (warm-cache verify), C13 | Medium | +| F9 | GCS telemetry stream | Per-frame estimate available + GCS link healthy | C5, C8, [[QGroundControl]] | Medium | +| F10 | Post-landing tile upload | Operator triggers C11 with `flight_state == ON_GROUND` confirmed | C11 (operator-side), C6 (read), 
[[`satellite-provider`]] (D-PROJ-2 endpoint, planned) | High | + +## Flow Dependencies + +| Flow | Depends on | Shares data with | +|------|-----------|------------------| +| F1 | none | F2 (Manifest, EngineCacheEntry, Tile cache, FAISS index) | +| F2 | F1 (cache + engines + manifests on disk; SHA-256 content-hash gate) | F3 (warmed pipeline state) | +| F3 | F2 (warm pipeline) | F4 (PoseEstimate for tile gen), F9 (downsampled summary), F11 (smoothing extends F3 internally — see F3 § Notes), F13 FDR (cross-cutting) | +| F4 | F3 (PoseEstimate + quality metadata) | F10 (uploaded tiles), F13 FDR | +| F5 | F3 (last trusted state), F8 (FC IMU prior) | F8 if covariance trips fail-threshold | +| F6 | F3 (frame-to-frame failure detection) | F3 resumes once anchor recovers | +| F7 | F3 (companion estimate health), F8 IMU prior | F3 (becomes primary FC source after switch) | +| F8 | F1 + F2 (warm cache survives reboot via content-hash verify) | F3 (resumes once warm), F5 (degraded mode if recovery fails) | +| F9 | F3 | n/a (read-only outbound) | +| F10 | F4 (locally-saved tiles), F8 confirmed `flight_state == ON_GROUND`, parent-suite D-PROJ-2 endpoint availability | F1 of the next flight (uploaded tiles enter the basemap once promoted to `trusted`) | + +**Cross-cutting**: F13 FDR-write is not a flow per se — every flow above has an FDR write side-effect. AC-NEW-3 requires every payload class (estimate, IMU, MAVLink, mid-flight tile, system health, failed-tile thumbnail) to be present; rollover is logged, never silent. + +--- + +## Flow F1: Pre-flight cache provisioning + +### Description + +The operator builds (or refreshes) the per-mission cache on the companion before takeoff: downloads tiles from `satellite-provider` for the operational area, generates VPR descriptors, compiles TensorRT engines, applies sector-classified freshness rules, and writes a manifest with a SHA-256 content-hash gate. 
This flow is offline and not time-critical; it is the only path that reaches `satellite-provider` from the companion side. + +### Preconditions + +- Operator workstation has network reach to `satellite-provider` (TLS + service-internal API key). +- Operator has classified the operational area (`active_conflict | stable_rear`) — drives the freshness threshold (AC-8.2 / AC-NEW-6). +- Camera calibration JSON for the deployed unit is available (`adti20..json` from D-PROJ-1 hybrid). +- Companion is connected to the operator workstation (USB or Ethernet) and writable. +- Free space on the companion's NVM is at least the projected cache size (`≤ 10 GB` per AC-8.3). + +### Sequence Diagram + +```mermaid +sequenceDiagram + participant Operator + participant C12OperatorTool as C12 Operator Tool (workstation) + participant SatelliteProvider as [[satellite-provider]] (.NET 8) + participant C10Provisioner as C10 CacheProvisioner (companion) + participant C7Inference as C7 InferenceRuntime + participant C2Backbone as C2 VPR backbone (TensorRT) + participant C6TileStore as C6 TileStore + DescriptorIndex (Postgres + filesystem + FAISS) + + Operator->>C12OperatorTool: build_cache(area, sector_class, calibration_file) + C12OperatorTool->>SatelliteProvider: GET /api/satellite/tiles?bbox=&zoom= + SatelliteProvider-->>C12OperatorTool: Tile blobs + metadata (paged) + C12OperatorTool->>C12OperatorTool: filter by AC-NEW-6 freshness (sector-class-driven) + C12OperatorTool->>C10Provisioner: stage(tiles, manifest_inputs, calibration) + C10Provisioner->>C6TileStore: write tiles to ./tiles/{zoomLevel}/{x}/{y}.jpg + Postgres rows + C10Provisioner->>C7Inference: load VPR backbone ONNX + C7Inference-->>C10Provisioner: TRT engine compiled (cached per SM/JP/TRT/precision tuple) + C10Provisioner->>C2Backbone: per-tile descriptor generation (batched on Jetson) + C2Backbone-->>C10Provisioner: descriptor matrix (FP16/INT8 per D-C7-1) + C10Provisioner->>C6TileStore: faiss.write_index (HNSW) + atomicwrites + 
SHA-256 content-hash + C10Provisioner->>C10Provisioner: write Manifest (hash of model + calibration + corpus + sector_class) + C10Provisioner-->>C12OperatorTool: provisioning report (counts, hashes, freshness summary) + C12OperatorTool-->>Operator: PASS / FAIL summary +``` + +### Flowchart + +```mermaid +flowchart TD + Start([Operator invokes C12 build]) --> Classify[Operator classifies sector active_conflict OR stable_rear] + Classify --> Download[C12 downloads tiles from satellite-provider for bbox + zoom] + Download --> FreshnessFilter{Freshness ok per AC-8.2 + AC-NEW-6?} + FreshnessFilter -->|stale and stable_rear| RejectOrDowngrade[Reject or downgrade tile] + FreshnessFilter -->|stale and active_conflict| RejectOrDowngrade + FreshnessFilter -->|fresh| Stage[C12 stages tiles + calibration on companion] + RejectOrDowngrade --> Stage + Stage --> InvokeC10[C12 invokes C10 CacheProvisioner on companion] + InvokeC10 --> WriteTiles[C10 writes tiles to filesystem + Postgres] + WriteTiles --> CompileEngines[C10 compiles TRT engines via C7 InferenceRuntime] + CompileEngines --> EngineCacheHit{EngineCacheEntry already valid for SM JP TRT precision tuple?} + EngineCacheHit -->|yes D-C10-6| ReuseEngine[Reuse cached engine and INT8 calibration cache] + EngineCacheHit -->|no| BuildEngine[Polygraphy or trtexec or IBuilderConfig hybrid build] + ReuseEngine --> Descriptors + BuildEngine --> Descriptors[C10 batches each tile through C2 backbone for descriptors] + Descriptors --> WriteIndex[faiss.write_index HNSW + atomicwrites + SHA-256 content-hash] + WriteIndex --> WriteManifest[Write Manifest with hash of model + calibration + corpus + sector_class] + WriteManifest --> ManifestHashCheck{Idempotence check D-C10-1: same manifest hash as last build?} + ManifestHashCheck -->|same| SkipRebuild[Skip rebuild and emit no-op report] + ManifestHashCheck -->|different| Done([Provisioning complete; cache + engines + manifest staged]) + SkipRebuild --> Done +``` + +### Data flow + +| Step 
| From | To | Data | Format | +|------|------|----|------|--------| +| 1 | Operator | C12 | (`bounding_box`, `zoom_levels`, `sector_class`, `calibration_path`) | CLI args / GUI form | +| 2 | C12 | `satellite-provider` REST | `GET /api/satellite/tiles?bbox=…&zoom=…` | HTTPS query | +| 3 | `satellite-provider` | C12 | Paged tile blobs + metadata rows | JPEG + JSON metadata | +| 4 | C12 | C10 (over USB/Eth) | Staged tile bundle + calibration JSON | Tarball + manifest stub | +| 5 | C10 | C6 filesystem | Tile JPEG bodies | `./tiles/{zoomLevel}/{x}/{y}.jpg` | +| 6 | C10 | C6 PostgreSQL | Tile metadata rows | SQL INSERT (mirror of `satellite-provider`'s `tiles` table) | +| 7 | C10 → C7 | TRT engine cache | TRT engines | `.engine` files keyed by `(SM, JP, TRT, precision)` (D-C10-7) | +| 8 | C2 backbone | C6 FAISS index | Descriptor matrix | `.index` (FAISS HNSW), atomicwrites, SHA-256 sidecar | +| 9 | C10 | filesystem | Manifest | YAML or JSON; carries hashes | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| `satellite-provider` unreachable | Step 2 | HTTP timeout / 5xx | C12 fails with explicit error; operator retries when network is available; takeoff blocked | +| Tile fails freshness | Step 4 | `tile.capture_timestamp` vs `sector_class` threshold | Reject (active_conflict) or downgrade-no-`satellite_anchored`-label (rear), per AC-NEW-6; report to operator | +| Resolution below 0.5 m/px | Step 4 | Tile metadata GSD check (RESTRICT-SAT-4) | Reject; report; takeoff blocked | +| Insufficient cache budget | Step 5 | Filesystem free-space check pre-write | Fail fast with explicit budget delta; no partial write | +| Engine compile failure | Step 7 | Polygraphy / trtexec exit code; no output `.engine` | Surface error to operator; takeoff blocked; **never silently fall back** | +| Descriptor generation OOM on Jetson | Step 8 | CUDA OOM | Halve batch size and retry once; if still OOM, surface to operator | +| 
Atomic-write or SHA-256 mismatch | Step 8 | `atomicwrites` rollback or content-hash sidecar mismatch | Mark cache invalid; rebuild from staged tiles; if persistent, surface to operator | +| Tampered cache (post-write, pre-takeoff) | (caught at takeoff in F2, not here) | F2 SHA-256 content-hash gate | F2 refuses takeoff (IT-7) | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| End-to-end provisioning time (~400 km², worst case) | ≤ tens of minutes (offline, not time-critical per AC-8.3) | Dominated by tile download bandwidth + descriptor batching on Jetson | +| Engine cache hit re-build | < 30 s per IT-9 | D-C10-6 calibration-cache reuse + D-C10-7 self-describing filename schema | +| Idempotent re-run with no inputs changed | Skip rebuild via D-C10-1 manifest-hash trigger | IT-8 | +| Descriptor cache footprint | Inside the 10 GB AC-8.3 budget (incl. tiles + indices + overviews) | Carve-out per chosen VPR backbone descriptor dimension (D-C2-6 / D-C2-9 / D-C2-10) | + +--- + +## Flow F2: Takeoff load + +### Description + +From companion process start to **first valid emitted external-position frame**, within the AC-NEW-1 ≤ 30 s p95 cold-start TTFF budget. Takeoff load verifies the cache (SHA-256 content-hash gate, D-C10-3), mmaps the FAISS HNSW index, deserialises pre-built TensorRT engines, completes the MAVLink 2.0 signing handshake on the AP wired channel (D-C8-9 = (d)), and arms the per-frame pipeline. + +### Preconditions + +- F1 (pre-flight cache provisioning) has completed successfully. +- Manifest + tiles + descriptor index + TRT engines exist on the companion's NVM. +- Camera calibration JSON for the deployed unit is present at the path declared in config. +- FC is reachable on the configured UART/USB; FC firmware is `ArduPilot Plane ≥ Aug 2021 PR #18345` (for D-C8-2) or `iNav 8.0+`. 
+- Operator has staged the per-flight MAVLink 2.0 signing key seed (one half) and the FC has the matching pair (the AP path); iNav path skips this step. + +### Sequence Diagram + +```mermaid +sequenceDiagram + participant Companion as Companion process (composition root) + participant ContentHash as C10 ManifestVerifier + participant FaissIndex as C6 DescriptorIndex (FAISS HNSW) + participant TrtRuntime as C7 InferenceRuntime + participant FcAdapter as C8 FcAdapter (per-FC) + participant FC as [[Flight Controller]] + participant Pipeline as C1+C2+C2.5+C3+C3.5+C4+C5 pipeline (warm) + participant Fdr as C13 FdrWriter + + Companion->>ContentHash: verify(manifest, descriptor.index, tiles dir) + ContentHash-->>Companion: pass (or refuse takeoff) + Companion->>FaissIndex: faiss.read_index(IO_FLAG_MMAP_IFC) + FaissIndex-->>Companion: ready + Companion->>TrtRuntime: deserializeCudaEngine per cached .engine + TrtRuntime-->>Companion: engines ready + alt FC is ArduPilot Plane + Companion->>FcAdapter: open MAVLink 2.0 + signing handshake + FcAdapter->>FC: signing seed + key handshake (D-C8-9 = (d)) + FC-->>FcAdapter: signed handshake ack + FcAdapter-->>Companion: AP signed channel ready; key rotation logged to FDR + else FC is iNav + Companion->>FcAdapter: open MSP2 channel (unsigned, accepted residual risk) + FcAdapter-->>Companion: iNav channel ready + end + Companion->>FC: subscribe to FC IMU + attitude + GPS health (telemetry) + FC-->>Companion: first telemetry frame + Companion->>FC: query FC EKF last valid GPS + IMU-extrapolated pose (AC-5.1) + FC-->>Companion: warm-start pose + Companion->>Pipeline: warm with calibration + warm-start pose + Pipeline-->>Companion: ready (no estimate emitted yet) + Companion->>Fdr: open per-flight FDR record; log signing key rotation event + Note over Companion: Wait for first nav frame (F3 entry) +``` + +### Flowchart + +```mermaid +flowchart TD + Start([Companion process start]) --> ReadManifest[C10 ManifestVerifier reads Manifest + 
sidecars] + ReadManifest --> ContentHashCheck{D-C10-3 SHA-256 content-hash gate passes?} + ContentHashCheck -->|no| RefuseTakeoff[Companion refuses takeoff: STATUSTEXT, no GPS_INPUT emit, FDR log] + ContentHashCheck -->|yes| MmapIndex[FAISS read_index IO_FLAG_MMAP_IFC] + MmapIndex --> LoadEngines[Per .engine: deserializeCudaEngine on C7] + LoadEngines --> EnginesOk{All engines deserialized OK?} + EnginesOk -->|no| RefuseTakeoff + EnginesOk -->|yes| FcDetect{Configured FC?} + FcDetect -->|ArduPilot Plane| ApSign[MAVLink 2.0 signing handshake D-C8-9] + FcDetect -->|iNav| InavOpen[Open MSP2 channel unsigned residual risk] + ApSign --> SignOk{Signing handshake OK?} + SignOk -->|no| RefuseTakeoff + SignOk -->|yes| WarmStart + InavOpen --> WarmStart[Query FC EKF last valid GPS + IMU-extrapolated pose AC-5.1] + WarmStart --> WarmPipeline[Warm C1 + C2 + C2.5 + C3 + C3.5 + C4 + C5 with calibration + warm-start pose] + WarmPipeline --> OpenFdr[C13 opens per-flight FDR; logs signing key rotation event] + OpenFdr --> Ready([Ready; awaiting first nav frame]) +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | Companion | C10 | (`manifest_path`) | filesystem read | +| 2 | C10 | filesystem | content-hash sidecars | SHA-256 hex digests | +| 3 | Companion | FAISS | `.index` mmap pointer | C++ FAISS API | +| 4 | Companion | C7 / TensorRT | `.engine` deserialize | TensorRT IRuntime | +| 5 | Companion | FC (AP) | signing seed + handshake | MAVLink 2.0 signing | +| 6 | FC | Companion | warm-start pose + IMU/attitude/GPS health | MAVLink (AP) / MSP2 + MAVLink outbound (iNav) | +| 7 | Companion | C13 FDR | startup record (config snapshot, signing key rotation event, content-hash digests) | FDR record | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| Content-hash mismatch | Step 2 | D-C10-3 sidecar verify | Refuse takeoff; STATUSTEXT to GCS; FDR records the event; 
operator must re-run F1 | +| FAISS mmap failure | Step 3 | C++ FAISS exception | Refuse takeoff; same as above | +| TRT deserialize failure | Step 4 | TensorRT API error | Refuse takeoff; report mismatched `(SM, JP, TRT, precision)` tuple to operator | +| Signing handshake fail (AP) | Step 5 | Handshake timeout / signed-message rejection | Refuse takeoff; clear-text reason via STATUSTEXT (handshake never succeeded → unsigned STATUSTEXT is acceptable for this case only) | +| FC unreachable | Step 6 | UART/USB read timeout | Retry with backoff; after `N` retries refuse takeoff | +| EKF returns no warm-start pose | Step 6 | Empty `GLOBAL_POSITION_INT` and no IMU prior | Defer pipeline warm-up until first valid prior; bound the wait by AC-NEW-1 budget; if exceeded, refuse takeoff | +| FDR open failure | Step 7 | Filesystem write error | Refuse takeoff (per AC-NEW-3 every payload class must be present from t=0) | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| Total takeoff load (boot → first valid frame) | p95 < 30 s (AC-NEW-1) | Validated by IT-2 (50× cold boot SITL) and NFT-PERF-03 | +| FAISS mmap cost | sub-second (mmap is lazy) | First query in F3 pays the page-in cost | +| TRT deserialize per engine | 1–5 s typical on Jetson Orin Nano Super | Engines per `(SM 87, JetPack 6.2, TRT 10.3, precision)` cached on disk | +| Signing handshake (AP) | sub-second | Wired UART/USB; per-flight key | + +--- + +## Flow F3: Steady-state per-frame estimation + +### Description + +The system's **hot path**. For each nav-camera frame at 3 Hz nominal, run the canonical hierarchical pipeline `VIO → retrieval → re-rank → matching → AdHoP-conditional refinement → pose → fusion`, emit an `EmittedExternalPosition` to the FC at 5 Hz periodic, and write to FDR. The end-to-end latency budget is AC-4.1 p95 ≤ 400 ms; the partition is the D-CROSS-LATENCY-1 hybrid. + +### Preconditions + +- F2 (Takeoff load) completed; pipeline is warm. 
+- Camera ingest thread is running; FC IMU/attitude telemetry is flowing. +- `flight_state == IN_AIR` (hands-on indication that the upload code path is not loaded — F4 also gates on this). +- Last `EmittedExternalPosition` is either fresh or AC-5.2 fallback has not been triggered. + +### Sequence Diagram + +```mermaid +sequenceDiagram + participant Camera as Nav camera + participant C1 as C1 VioStrategy + participant C2 as C2 VprStrategy + participant C2_5 as C2.5 ReRanker + participant C3 as C3 CrossDomainMatcher + participant C3_5 as C3.5 ConditionalRefiner + participant C4 as C4 PoseEstimator (OpenCV solvePnPRansac + GTSAM Marginals) + participant C5 as C5 StateEstimator (GTSAM iSAM2) + participant C8 as C8 FcAdapter + participant FC as [[Flight Controller]] + participant Fdr as C13 FdrWriter + + Camera->>C1: NavCameraFrame_t + Camera->>C2: NavCameraFrame_t (parallel fan-out) + C1->>C5: VioOutput_t (relative pose + 6x6 cov + IMU bias) + C2->>C2_5: top-K=10 VprResult + C2_5->>C3: top-N=3 RerankResult + C3->>C3_5: MatchResult (with reprojection residual) + alt residual exceeds threshold + C3_5->>C3_5: invoke AdHoP refinement (~+30..90 ms p99) + C3_5->>C4: refined MatchResult + else residual below threshold + C3_5->>C4: passthrough MatchResult + end + C4->>C5: PoseEstimate (with 6x6 covariance from GTSAM Marginals or Jacobian degraded) + C5->>C5: iSAM2 update + IncrementalFixedLagSmoother (K=10..20 keyframes) + C5->>C8: PoseEstimate with provenance label + C8->>FC: GPS_INPUT (AP) or MSP2_SENSOR_GPS (iNav) at 5 Hz periodic + C8->>FC: STATUSTEXT or NAMED_VALUE_FLOAT for source label (out-of-band) + Note over C5: AC-4.5 internal smoothing — emits corrected current frame; logs smoothed past-frames to FDR + C8->>Fdr: emitted external-position record + C5->>Fdr: per-frame estimate + smoothed-past entries (NFT-6) +``` + +### Flowchart + +```mermaid +flowchart TD + Start([NavCameraFrame received at 3 Hz]) --> Fanout[Fan out to C1 VIO and C2 VPR in parallel] + Fanout --> 
C1Out[C1 produces VioOutput: relative pose + 6x6 cov + IMU bias + feature quality] + Fanout --> C2Out[C2 produces top-K=10 VprResult against FAISS HNSW] + C2Out --> C2_5[C2.5 single-pair LightGlue per candidate; rank by inlier count -> top-N=3] + C2_5 --> C3[C3 DISK + LightGlue × N pairs FP16] + C3 --> ResidualCheck{Reprojection residual > AdHoP threshold?} + ResidualCheck -->|yes| C3_5[C3.5 AdHoP refinement; ~+30..90 ms p99] + ResidualCheck -->|no| Passthrough[C3.5 passthrough] + C3_5 --> C4 + Passthrough --> C4[C4 OpenCV solvePnPRansac IPPE] + C4 --> ThermalCheck{D-CROSS-LATENCY-1 thermal-throttle telemetry crosses threshold?} + ThermalCheck -->|no, steady-state| GtsamMarginals[GTSAM Marginals 6x6 cov D-C4-2 = b] + ThermalCheck -->|yes, hybrid degraded| Jacobian[Jacobian 6x6 cov D-C4-2 = a; degrade C2.5 N to 2] + GtsamMarginals --> C5 + Jacobian --> C5 + C1Out --> C5[C5 iSAM2 + CombinedImuFactor + IncrementalFixedLagSmoother K=10..20 keyframes] + C5 --> Provenance{Provenance label?} + Provenance -->|fresh anchor| LabelSat[satellite_anchored] + Provenance -->|propagated under no fresh anchor| LabelVisual[visual_propagated] + Provenance -->|IMU-only| LabelDeadReckon[dead_reckoned] + LabelSat --> Emit[C8 emits per-FC GPS_INPUT or MSP2_SENSOR_GPS at 5 Hz; STATUSTEXT for source label] + LabelVisual --> Emit + LabelDeadReckon --> Emit + Emit --> FdrWrite[C13 FDR: emitted record + per-frame estimate + smoothed-past entries] + FdrWrite --> Done([Frame complete]) +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | Camera | C1 | `NavCameraFrame` | RGB pixel buffer + timestamp | +| 1 | Camera | C2 | `NavCameraFrame` (same frame) | same | +| 2 | C8 inbound | C1, C5 | `ImuWindow` (timestamp-aligned to frame) | DTO; same window for both consumers | +| 3 | C1 | C5 | `VioOutput` | relative SE(3) + 6×6 cov + bias + feature quality | +| 4 | C2 | C2.5 | `VprResult` (top-K=10 tile IDs ranked by descriptor distance) | DTO | +| 5 
| C2.5 | C3 / C3.5 | `RerankResult` (top-N=3 tile IDs ranked by inlier count) | DTO | +| 6 | C3 → C3.5 → C4 | match pipeline | `MatchResult` (2D-3D corresp. + RANSAC inliers + reprojection residual) | DTO | +| 7 | C4 | C5 | `PoseEstimate` (WGS84 + 6×6 cov + provenance + `last_satellite_anchor_age_ms`) | DTO | +| 8 | C5 | C8 | smoothed/refined `PoseEstimate` | DTO | +| 9 | C8 | FC | `EmittedExternalPosition` | MAVLink `GPS_INPUT` (AP) or MSP2 `MSP2_SENSOR_GPS` (iNav) | +| 10 | C8 | FC | provenance label | MAVLink `STATUSTEXT` / `NAMED_VALUE_FLOAT` (AP) or MSP equivalent (iNav) | +| 11 | C5 + C8 | C13 FDR | per-frame estimate + emitted MAVLink frame + smoothed past-frame entries | FDR record | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| Frame-to-frame registration failure | C1 | VioOutput marks low feature quality OR matcher fails | F6 sharp-turn / disconnected-segment re-localization | +| Cross-domain matching insufficient inliers | C3 | RANSAC inlier count below threshold | Mark frame as no satellite anchor; provenance becomes `visual_propagated`; F6 if persists | +| Reprojection residual exceeds AdHoP threshold | C3 | residual > threshold | C3.5 AdHoP refinement invoked (worst-case 2× C3 latency on triggered frames; budgeted in D-CROSS-LATENCY-1) | +| GTSAM Marginals exceeds latency budget | C4 | per-frame timer | D-CROSS-LATENCY-1 hybrid auto-degrade: drop to Jacobian covariance + N=2 | +| Sustained latency overrun (multi-frame) | end-to-end | rolling p95 monitor | Drop oldest frame from camera ingest queue (~10% drop budget per AC-4.1); FDR logs the drop | +| FC GPS reports denial/spoof | C8 inbound | MAVLink GPS health bit / spoof flag | F7 spoofing-promotion + F5 if visual is also lost | +| FC stops accepting `GPS_INPUT` (AP) | C8 outbound | no source-set acknowledgement after `MAV_CMD_SET_EKF_SOURCE_SET` | D-C8-2-FALLBACK path; AC-5.2 IMU-only fallback if persistent | +| Camera frame drop | 
Camera | ingest queue overflow | Drop oldest frame; log in FDR | +| Dead-reckoning >3 s | C5 | watchdog | AC-5.2 — system logs failure; FC enters IMU-only | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| End-to-end latency (camera capture → FC GPS frame) | AC-4.1 p95 ≤ 400 ms | D-CROSS-LATENCY-1 partition; NFT-PERF-01 + NFT-9 | +| Tail | p99 ≤ 600 ms (allows AdHoP-triggered frames) | NFT-9 | +| Memory | < 8 GB shared on Jetson | AC-4.2; NFT-LIM-01 | +| Frame rate | 3 Hz nominal; ~10 % drop allowed under sustained load | AC-4.1 | +| C8 emit cadence | 5 Hz periodic per D-C8-5 | independent of nav-frame rate; last-known-pose if no new estimate | +| Mode-transition into degraded label | ≤ 1 frame OR ≤ 400 ms (AC-3.5) | applies on transition to `visual_propagated` / `dead_reckoned` | + +### Notes — AC-4.5 internal smoothing (sub-flow of F3) + +GTSAM iSAM2 with `IncrementalFixedLagSmoother` retroactively refines past keyframes (window K = 10–20 per D-C5-3). The current frame emitted to the FC carries the smoothing-corrected state — but the FC log itself remains forward-time only (Mode B Fact #107). FDR (C13) MUST log the smoothed past-frame estimates so post-mission analysis can validate AC-4.5. IT-11 measures the smoothing-loop look-back accuracy independently of FC consumption. + +--- + +## Flow F4: Mid-flight tile generation + local cache write + +### Description + +For every successful satellite-anchored frame whose `TileQualityMetadata` clears the publish threshold, orthorectify the nav frame onto basemap projection, deduplicate against the existing local tile cache, and write the result locally in `satellite-provider`-compatible on-disk format. **No outbound write while airborne** — process-level isolation enforces this: the C11 upload path is not loaded in the airborne companion image (ADR-004). The post-landing tool (F10) is a separate process / image. 
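
The publish gate and the "latest/highest-quality wins" dedup rule of this flow can be sketched as a single pure decision function. This is a minimal illustration under stated assumptions, not the project's implementation: `CandidateTile`, `cov_metric_m`, `quality_score`, `decide_tile_write`, and the threshold parameter are hypothetical names, and how `TileQualityMetadata` is reduced to one comparable score is left open by the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TileDecision(Enum):
    WRITE = "write"
    SKIP_NOT_ANCHORED = "skip_not_anchored"
    SKIP_COVARIANCE = "skip_covariance"
    SKIP_DUPLICATE = "skip_duplicate"


@dataclass(frozen=True)
class CandidateTile:
    # Provenance label carried by the PoseEstimate that produced the tile.
    provenance: str
    # Hypothetical scalar summary of the pose covariance, in metres.
    cov_metric_m: float
    # Hypothetical scalar quality score; higher means better.
    quality_score: float


def decide_tile_write(
    candidate: CandidateTile,
    existing_quality: Optional[float],
    publish_cov_threshold_m: float,
) -> TileDecision:
    """Return what F4 should do with one candidate tile.

    existing_quality is None when the local cache has no tile for the cell.
    """
    if candidate.provenance != "satellite_anchored":
        return TileDecision.SKIP_NOT_ANCHORED
    # The flow requires covariance strictly below the publish threshold.
    if candidate.cov_metric_m >= publish_cov_threshold_m:
        return TileDecision.SKIP_COVARIANCE
    # New cell, or the candidate beats the tile already stored for the cell.
    if existing_quality is None or candidate.quality_score > existing_quality:
        return TileDecision.WRITE
    return TileDecision.SKIP_DUPLICATE
```

Keeping the decision free of I/O means every skip reason can be logged to the FDR by the caller, and the rule can be unit-tested without a filesystem or Postgres instance.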
+ +### Preconditions + +- F3 produced a `PoseEstimate` with provenance `satellite_anchored` and covariance below the publish threshold. +- `flight_state == IN_AIR` is signalled by FC `MAV_STATE`; the in-air image does not contain C11 (process-level isolation). +- Local C6 tile store has free quota (per AC-NEW-3 FDR sub-budget allocation). +- Mid-flight tile metadata schema (quality_metadata) is configured per AC-NEW-7 + D-PROJ-2 contract sketch. + +### Sequence Diagram + +```mermaid +sequenceDiagram + participant Frame as NavCameraFrame_t (post-F3) + participant Pose as PoseEstimate_t (from F3) + participant Ortho as Orthorectifier (C6 sub-component) + participant Dedup as Deduper (latest/highest-quality wins) + participant TileStore as C6 TileStore (filesystem + Postgres) + participant Fdr as C13 FdrWriter + + Pose->>Ortho: provenance == satellite_anchored AND covariance below threshold? + alt yes + Frame->>Ortho: NavCameraFrame_t + Ortho->>Ortho: orthorectify with calibration + pose + Ortho->>Dedup: candidate Tile (zoomLevel, lat, lon, capture_timestamp, quality_metadata) + Dedup->>TileStore: dedup query by (zoomLevel, lat, lon) + alt new or higher-quality + Dedup->>TileStore: write JPEG to ./tiles/{zoomLevel}/{x}/{y}.jpg + Dedup->>TileStore: insert/update Postgres row with source=onboard_ingest, voting_status=pending + Dedup->>Fdr: tile-write event + quality metadata + else duplicate or lower-quality + Dedup->>Fdr: skip event + end + else no (provenance not satellite_anchored OR cov above threshold) + Pose->>Fdr: skip event (rationale logged) + end +``` + +### Flowchart + +```mermaid +flowchart TD + Start([F3 emitted PoseEstimate_t]) --> Provenance{provenance == satellite_anchored AND cov below threshold?} + Provenance -->|no| LogSkip[FDR logs skip with rationale] + Provenance -->|yes| Ortho[Orthorectify NavCameraFrame_t with calibration + pose] + Ortho --> BuildMeta[Build TileQualityMetadata: estimator_label + 2x2 cov + last_anchor_age + MRE + IMU bias norm] + 
BuildMeta --> DedupQuery{Dedup vs existing tiles by zoomLevel + lat + lon} + DedupQuery -->|new cell| WriteFs[Write JPEG to filesystem . tiles . zoom . x . y . jpg] + DedupQuery -->|candidate higher quality than existing| WriteFs + DedupQuery -->|candidate same or lower quality than existing| LogSkip + WriteFs --> InsertDb[Postgres INSERT or UPDATE with source=onboard_ingest, voting_status=pending] + InsertDb --> FdrLog[FDR logs tile-write event + metadata] + FdrLog --> Done([Tile available locally; awaits F10 post-landing upload]) + LogSkip --> Done +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | F3 | C6 ortho | (`NavCameraFrame`, `PoseEstimate`, `CameraCalibration`) | DTO | +| 2 | C6 ortho | C6 dedup | candidate `Tile` + `TileQualityMetadata` | JPEG body + metadata DTO | +| 3 | C6 dedup | C6 store filesystem | tile JPEG | `./tiles/{zoomLevel}/{x}/{y}.jpg` (mirror of `satellite-provider`) | +| 4 | C6 dedup | C6 Postgres | tile row + metadata | SQL INSERT/UPDATE; `source=onboard_ingest`, `voting_status=pending` | +| 5 | C6 | FDR | tile-write event | FDR record (counts against AC-NEW-3 budget) | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| Filesystem write fails | Step 3 | filesystem error | Skip tile; FDR logs error; pipeline continues (tile generation is best-effort, not safety-critical) | +| Postgres insert fails | Step 4 | DB error | Skip tile; FDR logs error | +| Local cache quota exhausted | Step 3 | pre-write free-space check | LRU-evict oldest **mid-flight** tile (never evict pre-flight `satellite-provider` tiles); FDR logs eviction | +| `flight_state` glitch reports `ON_GROUND` mid-flight | architectural | software guard — but C11 is not loaded anyway | Defense-in-depth holds: even if guard misfires, C11 binary is not present in the airborne image | +| Dedup race (two threads writing same cell) | Step 4 | DB unique constraint or filesystem `O_EXCL` | Retry once 
with the freshest candidate; FDR logs race | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| Per-tile orthorectification cost | not on the AC-4.1 critical path | Runs off the F3 hot loop; dropped first under thermal throttle | +| Per-tile write latency | < 1 frame interval typical (333 ms @ 3 Hz) | If exceeded, drop the tile rather than back-pressure F3 | +| Cache footprint growth | bounded by AC-NEW-3 mid-flight tile sub-budget | LRU-evict mid-flight tiles only | + +--- + +## Flow F5: Visual blackout + spoofed-GPS failsafe + +### Description + +When the navigation camera becomes fully unusable (clouds, occlusion, whiteout, hardware fault) **and/or** the FC reports GPS denial/spoof, the system must NOT pretend to have visual or GPS data. It transitions to `dead_reckoned` propagation from the last trusted state + FC IMU/attitude/airspeed/altitude, grows covariance monotonically, escalates the MAVLink fix-quality field as the covariance crosses thresholds, and never re-promotes spoofed GPS without a 10-s GPS-health + visual-consistency gate. Reference: AC-3.5, AC-NEW-8. + +### Preconditions + +- F3 was running normally before the trigger. +- FC is still reachable (the link itself works; it's the GPS source / camera that failed). 
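
The covariance-driven escalation that this flow applies to the emitted fix quality can be sketched as a small classifier. The 100 m, 500 m, 30 s, and 999.0 values are the ones used in this flow's diagrams; the names `FixLevel` and `escalation_level` are hypothetical, and mapping each level onto concrete `GPS_INPUT` / `MSP2_SENSOR_GPS` fields is left to the C8 adapter.

```python
from enum import Enum


class FixLevel(Enum):
    HONEST = "honest_horiz_accuracy"   # 95% semi-major axis <= 100 m
    DEGRADED_2D = "2d_fix_or_worse"    # 95% semi-major axis in (100, 500] m
    NO_FIX = "no_fix"                  # > 500 m, or blackout longer than 30 s


# Value emitted as horiz_accuracy in the no-fix case, per this flow.
NO_FIX_HORIZ_ACCURACY_M = 999.0


def escalation_level(cov95_semi_major_m: float, blackout_s: float) -> FixLevel:
    """Classify the degraded-mode output from covariance and blackout age."""
    # The no-fix condition wins over the covariance bands.
    if cov95_semi_major_m > 500.0 or blackout_s > 30.0:
        return FixLevel.NO_FIX
    if cov95_semi_major_m > 100.0:
        return FixLevel.DEGRADED_2D
    return FixLevel.HONEST
```

Checking the no-fix condition first keeps the blackout timer authoritative: a small covariance cannot mask a blackout that has exceeded 30 s.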
+ +### Sequence Diagram + +```mermaid +sequenceDiagram + participant Camera as Nav camera + participant C1 as C1 VioStrategy + participant C5 as C5 StateEstimator + participant C8 as C8 FcAdapter + participant FC as [[Flight Controller]] + participant Gcs as [[QGroundControl]] + participant Fdr as C13 FdrWriter + + alt camera unusable (whiteout / hw fault) + Camera->>C1: degraded or no frame + C1->>C5: VioOutput with low feature_quality OR no output + end + alt FC reports GPS denial/spoof + FC->>C8: GPS health bit / spoof flag set + C8->>C5: gps_health_event(denied | spoofed) + end + C5->>C5: switch label to dead_reckoned within ≤ 1 frame OR ≤ 400 ms + C5->>C5: propagate from last trusted state + FC IMU/attitude/airspeed/altitude; cov grows monotonically + C5->>C8: PoseEstimate (label = dead_reckoned) + C8->>C8: degrade horiz_accuracy field per AC-NEW-8 thresholds + alt 95% cov semi-major axis ≤ 100 m + C8->>FC: GPS_INPUT/MSP2 with honest horiz_accuracy + else 95% cov in (100, 500] m + C8->>FC: GPS_INPUT/MSP2 with fix_quality "2D fix or worse" + else 95% cov > 500 m OR blackout > 30 s + C8->>FC: GPS_INPUT/MSP2 with horiz_accuracy=999.0 (no fix) + C8->>Gcs: VISUAL_BLACKOUT_FAILSAFE STATUSTEXT + end + C8->>Gcs: VISUAL_BLACKOUT_IMU_ONLY STATUSTEXT at 1–2 Hz + C5->>Fdr: degraded-mode entry + per-frame estimate + cov + Note over C5,FC: spoofed GPS NEVER re-enters the estimator unless FC GPS health stable + non-spoofed for ≥10 s AND visual/satellite consistency check succeeds +``` + +### Flowchart + +```mermaid +flowchart TD + Start([Visual blackout AND/OR GPS denial/spoof detected]) --> SwitchLabel[C5 switches label to dead_reckoned within ≤1 frame OR ≤400 ms] + SwitchLabel --> RejectSpoof[Reject spoofed GPS as estimator input] + RejectSpoof --> Propagate[Propagate from last trusted state + FC IMU/attitude/airspeed/altitude] + Propagate --> CovGrow[Covariance grows monotonically] + CovGrow --> Threshold{95 percent cov semi-major axis?} + Threshold -->|≤ 100 m| 
EmitNormal[C8 emits GPS_INPUT or MSP2 with honest horiz_accuracy] + Threshold -->|100 m to 500 m| EmitDegraded[C8 emits with fix_quality 2D fix or worse] + Threshold -->|gt 500 m OR blackout gt 30 s| EmitNoFix[C8 emits horiz_accuracy=999.0 + VISUAL_BLACKOUT_FAILSAFE STATUSTEXT] + EmitNormal --> Recovery{Anchor recovers OR GPS-health stable + non-spoofed for >=10 s + visual consistency check?} + EmitDegraded --> Recovery + EmitNoFix --> Recovery + Recovery -->|yes anchor| ResumeF3[Resume F3 with provenance restoration] + Recovery -->|yes GPS gate| ConsentReturn[Allow real-GPS back into estimator] + Recovery -->|no| Continue[Continue degraded mode; FDR per-frame] + Continue --> Threshold + ResumeF3 --> Done([Recovered]) + ConsentReturn --> Done +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | Camera / C1 | C5 | degraded `VioOutput` or no output | DTO | +| 2 | FC / C8 inbound | C5 | GPS-health / spoof event | event DTO | +| 3 | C5 | C8 | `PoseEstimate` with `provenance=dead_reckoned` and growing covariance | DTO | +| 4 | C8 | FC | per-FC degraded `GPS_INPUT` / `MSP2_SENSOR_GPS` | MAVLink / MSP2 | +| 5 | C8 | GCS | `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT (1–2 Hz); escalates to `VISUAL_BLACKOUT_FAILSAFE` at thresholds | MAVLink STATUSTEXT | +| 6 | C5 / C8 | FDR | degraded-mode entry + per-frame estimate + thresholds crossed | FDR record | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| 30-s budget exhausted with no anchor | C5 | timer | Escalate to no-fix; FC then handles AC-5.2 IMU-only fallback | +| Spoofed GPS attempts to re-enter | C5 | re-entry gate (10-s health + visual-consistency) | Reject; FDR logs the rejection; STATUSTEXT to GCS | +| Camera comes back but FC still spoofed | F3 / F7 | per-frame check | Resume `satellite_anchored` provenance via F6 re-localization; trigger F7 spoofing-promotion | +| FDR write back-pressure during degraded mode 
| C13 | queue overflow | Logged rollover (NFT-6); never silent | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| Mode-transition latency | ≤ 1 frame OR ≤ 400 ms (AC-3.5) | NFT-RES-04 / FT-N-04 | +| Threshold escalation cadence | per-frame | AC-NEW-8 | +| GCS STATUSTEXT cadence | 1–2 Hz | AC-6.1 + AC-NEW-8 | +| Recovery — visual anchor | ≤ 1–2 frames after first valid match | F6 sharp-turn / disconnected-segment re-localization | +| Recovery — GPS re-promotion | NEVER < 10 s + visual-consistency check | AC-NEW-8 | + +--- + +## Flow F6: Sharp-turn / disconnected-segment re-localization + +### Description + +Frame-to-frame registration may fail under sharp turns (<5 % overlap, AC-3.2), disconnected segments (AC-3.3), or after a brief visual blackout. F6 restores `satellite_anchored` provenance via the C2 → C2.5 → C3 → C3.5 → C4 → C5 path, re-anchoring the estimate. If failure persists for ≥ 3 consecutive frames AND ≥ 2 s, the system requests an operator re-loc hint via GCS (AC-3.4) while continuing dead-reckoned propagation. + +### Preconditions + +- F3 was running normally; frame-to-frame registration just failed. +- Visual is **not** in full blackout (else go to F5). +- FC GPS may or may not be present (the re-loc path doesn't depend on FC GPS). 
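The AC-3.4 escalation rule from the F6 description (≥ 3 consecutive failed frames AND ≥ 2 s) can be sketched as a small counter. This is an illustrative assumption, not the C5 implementation: `RelocOutageTracker` and its fields are hypothetical names, and the sketch assumes the 2 s window is measured from the first failed frame of the current streak.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelocOutageTracker:
    """Tracks a streak of failed re-localization frames (hypothetical helper)."""
    min_frames: int = 3            # AC-3.4: at least 3 consecutive failed frames
    min_duration_s: float = 2.0    # AC-3.4: and at least 2 s of outage
    failed_frames: int = 0
    first_fail_s: Optional[float] = None

    def on_frame(self, now_s: float, reloc_ok: bool) -> bool:
        """Return True when an operator re-loc hint should be requested."""
        if reloc_ok:
            # A successful re-anchor ends the streak and clears the timer.
            self.failed_frames = 0
            self.first_fail_s = None
            return False
        if self.first_fail_s is None:
            self.first_fail_s = now_s
        self.failed_frames += 1
        # Both conditions must hold: frame count AND elapsed time.
        return (self.failed_frames >= self.min_frames
                and now_s - self.first_fail_s >= self.min_duration_s)
```

At the 3 Hz nav-camera rate the frame-count condition is met after under a second, so in this sketch the 2 s duration is the binding condition; dead-reckoned propagation continues regardless of the return value.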
+ +### Sequence Diagram + +```mermaid +sequenceDiagram + participant C1 as C1 VioStrategy + participant C2 as C2 VprStrategy + participant C2_5 as C2.5 ReRanker + participant C3 as C3 CrossDomainMatcher + participant C5 as C5 StateEstimator + participant C8 as C8 FcAdapter + participant Gcs as [[QGroundControl]] + participant Operator as Operator + participant Fdr as C13 FdrWriter + + C1->>C5: VioOutput marks frame-to-frame fail (or low feature quality) + C5->>C2: trigger satellite re-localization for current frame + C2->>C2_5: top-K=10 (full pipeline retried) + C2_5->>C3: top-N=3 + alt re-localization succeeds within 1–2 frames + C3->>C5: MatchResult with sufficient inliers + C5->>C5: restore satellite_anchored provenance + C5->>C8: PoseEstimate (label = satellite_anchored) + C8->>Fdr: recovery event + else re-localization fails ≥ 3 consecutive frames AND ≥ 2 s + C5->>C8: PoseEstimate (label = visual_propagated → dead_reckoned) + C8->>Gcs: STATUSTEXT requesting operator re-loc hint (AC-3.4) + Gcs->>Operator: prompt + Operator-->>Gcs: re-loc hint (region / pose seed) + Gcs-->>C8: NAMED_VALUE_FLOAT or custom-dialect re-loc hint + C8->>C5: re-loc hint + C5->>C2: prior-anchored retry (limit search by hint region) + end + C5->>Fdr: per-frame estimate + outage event chain +``` + +### Flowchart + +```mermaid +flowchart TD + Start([Frame-to-frame registration fails]) --> Trigger[C5 triggers C2 satellite re-localization] + Trigger --> RetryCount{≥ 1–2 frames since trigger?} + RetryCount -->|yes| RetryFull[Full C2 → C2.5 → C3 → C3.5 → C4 retry] + RetryFull --> Inliers{Sufficient inliers from C3?} + Inliers -->|yes| Restore[Restore satellite_anchored; resume F3] + Inliers -->|no| Counter[Increment outage counter] + Counter --> ThreeFrameTwoSecond{≥ 3 frames AND ≥ 2 s?} + ThreeFrameTwoSecond -->|no| RetryFull + ThreeFrameTwoSecond -->|yes| Operator[STATUSTEXT to GCS requesting operator re-loc hint AC-3.4] + Operator --> WaitHint{Hint received within bound?} + WaitHint 
-->|yes| BoundedRetry[C2 retry with hint region prior] + WaitHint -->|no| Degraded[Continue dead-reckoned propagation; F5 thresholds apply] + BoundedRetry --> Inliers + Restore --> Done([Re-anchored; F3 resumes]) + Degraded --> Done +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | C1 / C5 | C2 | re-localization trigger + last trusted pose prior | DTO | +| 2 | C2 → C2.5 → C3 → C3.5 → C4 | C5 | `MatchResult` + `PoseEstimate` (or fail) | DTO | +| 3 | C8 | GCS | re-loc-hint-request STATUSTEXT (after AC-3.4 thresholds) | MAVLink STATUSTEXT | +| 4 | GCS / Operator | C8 | re-loc hint (`NAMED_VALUE_FLOAT` or custom-dialect) | MAVLink | +| 5 | C8 | C5 → C2 | hint region prior | DTO | +| 6 | C5 / C8 | FDR | per-frame estimate + outage event chain | FDR record | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| Re-localization fails after operator hint | C2/C3 | per-frame inlier count | Continue dead-reckoned; F5 thresholds escalate `horiz_accuracy` | +| Operator hint never arrives | GCS link | bounded wait | Continue dead-reckoned; FDR logs no-hint case | +| GCS link fully down | C8 | link-health monitor | Continue dead-reckoned; FDR logs unreachable-GCS case | +| Hint region invalidates the cache | C6 | cache miss | Fall back to global re-localization (full C2 candidate set) | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| Re-anchor on sharp turn | within 1–2 frames after first valid match (AC-3.2) | FT-P-07 + IT-4 | +| Disconnected-segment recovery | ≥ 3 disconnected segments per flight (AC-3.3) | core capability, not degraded mode | +| Operator-hint round-trip | best-effort (GCS bandwidth-limited) | AC-3.4 + AC-6.2 | + +--- + +## Flow F7: Spoofing-promotion via EKF source-set switch + +### Description + +When the FC reports GPS denial/spoof while the companion estimate is healthy, the companion publishes 
its estimate to the FC's EKF source-set 2 and issues `MAV_CMD_SET_EKF_SOURCE_SET` to make set 2 primary (D-C8-2 = (b)). When the companion is unavailable, the FC switches back to set 1 (real GPS). On iNav, the companion is the sole GPS source and there is no source-set switching; the equivalent is simply to keep `MSP2_SENSOR_GPS` flowing. + +This flow is a **hot path**: AC-NEW-2 ≤ 3 s p95 from spoof onset to companion estimate becoming primary. Status: D-C8-2 = (b) is `Selected with runtime gate`; IT-3 SITL validation is the lock gate (Mode B Fact #111). + +### Preconditions + +- ArduPilot Plane FC (D-C8-2 only applies to AP path). +- FC reports `GPS_RAW_INT` health degradation OR a spoof flag. +- Companion estimate is healthy (provenance = `satellite_anchored` OR `visual_propagated` within fresh anchor age). + +### Sequence Diagram (ArduPilot Plane path) + +```mermaid +sequenceDiagram + participant FC as [[ArduPilot Plane FC]] + participant C8 as C8 FcAdapter (AP) + participant C5 as C5 StateEstimator + participant Gcs as [[QGroundControl]] + participant Fdr as C13 FdrWriter + + FC->>C8: GPS_RAW_INT health degraded OR spoof flag set + C8->>C5: gps_health_event(denied | spoofed) + alt companion estimate is healthy + C8->>FC: GPS_INPUT (5 Hz periodic) on source-set 2 with signed MAVLink 2.0 + C8->>FC: MAV_CMD_SET_EKF_SOURCE_SET to make set 2 primary + FC-->>C8: command ack + C8->>Fdr: source-set switch event + signing key reference + C8->>Gcs: STATUSTEXT "EKF_SOURCE_SET=2" + else companion estimate not healthy + Note over C8,FC: stay on source-set 1 (F5 failsafe thresholds apply) + end + Note over FC: when companion becomes unavailable, FC auto-switches back to source-set 1 +``` + +### Flowchart + +```mermaid +flowchart TD + Start([FC reports GPS denial OR spoof]) --> CompanionHealthy{Companion estimate healthy?} + CompanionHealthy -->|no| F5Path[Stay on source-set 1; F5 failsafe thresholds apply] + CompanionHealthy -->|yes| FcType{FC type?} + FcType -->|ArduPilot 
Plane| PublishSet2[C8 publishes GPS_INPUT to source-set 2 signed MAVLink 2.0] + FcType -->|iNav| ContinueMSP[Continue MSP2_SENSOR_GPS; iNav has no source-set] + PublishSet2 --> SetCmd[Send MAV_CMD_SET_EKF_SOURCE_SET set=2] + SetCmd --> Ack{Ack within latency budget?} + Ack -->|yes| LogSwitch[FDR + STATUSTEXT EKF_SOURCE_SET=2] + Ack -->|no, IT-3 fail| FallbackPath[D-C8-2-FALLBACK options a or b or c] + ContinueMSP --> Done([Companion is sole GPS source on iNav by construction]) + LogSwitch --> Done + FallbackPath --> Done +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | FC | C8 | GPS health / spoof event | MAVLink `GPS_RAW_INT` + flags | +| 2 | C8 | C5 | gps_health_event | DTO | +| 3 | C8 | FC | `GPS_INPUT` on source-set 2 + `MAV_CMD_SET_EKF_SOURCE_SET` | MAVLink 2.0 signed | +| 4 | FC | C8 | command ack | MAVLink | +| 5 | C8 | FDR + GCS | switch event + STATUSTEXT | FDR record + MAVLink STATUSTEXT | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| FC does not ack `MAV_CMD_SET_EKF_SOURCE_SET` | Step 4 | timeout | Retry once; if persistent, D-C8-2-FALLBACK; FDR logs | +| Real GPS becomes healthy mid-spoof (not actually spoof) | Step 1 | FC GPS health restored AND spoof flag cleared | Source-set switch back to 1 (FC-driven); companion stays on standby | +| Spoofed real-GPS attempts re-promotion | C5 / C8 | 10-s + visual-consistency gate | Reject; AC-NEW-8 | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| Spoof onset → primary switch (AP) | p95 < 3 s (AC-NEW-2) | NFT-PERF-04; IT-3 SITL is the runtime gate for D-C8-2 = (b) | +| iNav companion-as-sole-GPS lateral check | continuous; no switch needed | iNav has no source-set arbitration | + +--- + +## Flow F8: Companion reboot recovery + +### Description + +The companion process restarts mid-flight (crash, watchdog reset, voltage glitch). 
The FC remains armed and continues IMU-only dead reckoning during the gap (~500 m drift max at 60 km/h cruise per AC-NEW-1). On restart, the companion re-runs F2 (Takeoff load), seeded with the FC's current IMU-extrapolated pose. The ≤ 30 s p95 cold-start TTFF budget (AC-NEW-1) is the same as for a clean takeoff. + +### Preconditions + +- FC remained armed during the gap; FC IMU is still reporting. +- The companion's NVM cache survived the reboot (warm cache; D-C10-3 SHA-256 content-hash gate verifies integrity). + +### Sequence Diagram + +```mermaid +sequenceDiagram + participant Companion as Companion (post-reboot) + participant C10 as C10 ManifestVerifier + participant FaissIndex as C6 DescriptorIndex + participant TrtRuntime as C7 InferenceRuntime + participant C8 as C8 FcAdapter + participant FC as [[Flight Controller]] + participant Pipeline as C1+...+C5 (warm) + participant Fdr as C13 FdrWriter + + Note over Companion: Process restart, FC continues IMU-only dead reckoning + Companion->>C10: warm-cache content-hash verify + C10-->>Companion: pass (or refuse to re-arm on tampering) + Companion->>FaissIndex: faiss.read_index mmap + Companion->>TrtRuntime: deserializeCudaEngine + Companion->>C8: re-establish MAVLink 2.0 signing handshake (per-flight key still valid) + C8->>FC: signing handshake + FC-->>C8: ack + C8->>FC: query GLOBAL_POSITION_INT + GPS health + FC-->>C8: current IMU-extrapolated pose + Companion->>Pipeline: warm with IMU-extrapolated pose (AC-5.3) + Companion->>Fdr: open continuation record (same flight_id, NOT a new flight) + reboot event + Note over Companion: Pipeline re-enters F3 with first valid frame budget = AC-NEW-1 30 s +``` + +### Flowchart + +```mermaid +flowchart TD + Start([Companion process restart while FC armed]) --> Verify[D-C10-3 SHA-256 content-hash gate] + Verify --> Pass{Pass?} + Pass -->|no| Refuse[Refuse to re-arm; STATUSTEXT to GCS; FDR log] + Pass -->|yes| LoadCache[FAISS mmap + TRT deserialize] + LoadCache --> Reconnect[Re-establish 
MAVLink 2.0 signing handshake] + Reconnect --> WarmStart[Query FC IMU-extrapolated pose AC-5.3] + WarmStart --> WarmPipe[Warm C1+...+C5] + WarmPipe --> ReopenFdr[FDR opens continuation record same flight_id] + ReopenFdr --> ReenterF3([Re-enter F3 within AC-NEW-1 budget]) +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | Companion | C10 | manifest path + content-hash sidecars | filesystem read | +| 2 | Companion | FAISS / C7 | mmap pointer + TRT engines | runtime API | +| 3 | C8 | FC | MAVLink 2.0 signing handshake (re-handshake; per-flight key valid) | MAVLink 2.0 | +| 4 | FC | C8 | IMU-extrapolated pose | `GLOBAL_POSITION_INT` | +| 5 | C13 | FDR | reboot continuation record | FDR record | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| Cache integrity check fails | Step 1 | SHA-256 mismatch | Refuse to re-arm; STATUSTEXT; companion stays out of source-set 2 | +| MAVLink signing re-handshake fails | Step 3 | handshake timeout | Refuse to re-arm | +| AC-NEW-1 budget exceeded | end-to-end | timer | F5 dead-reckoned mode kicks in once first frame is emitted; FDR logs the over-budget event | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| Reboot → first valid frame | p95 < 30 s (AC-NEW-1) | Same budget as cold takeoff | +| FC dead-reckoning drift during gap | ≤ ~500 m at 60 km/h cruise | inherited from AC-NEW-1 rationale | + +--- + +## Flow F9: GCS telemetry stream + +### Description + +Send a 1–2 Hz downsampled summary of the per-frame estimate to QGroundControl over MAVLink (AC-6.1). High-rate per-frame data stays on the local FDR; the GCS link is bandwidth-limited and best-effort. + +### Preconditions + +- F2 completed; pipeline is warm. +- GCS link is healthy (link drop is non-fatal; companion continues; FDR retains everything). 
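The 1–2 Hz downsampling in F9 can be sketched as a minimum-period gate over the 3 Hz per-frame stream. This is a hedged illustration only: `TelemetryDownsampler`, its 500 ms default, and the integer-millisecond clock are assumptions made for the example, not names or parameters taken from C8.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TelemetryDownsampler:
    """Emit at most one GCS summary per min_period_ms (hypothetical helper)."""
    min_period_ms: int = 500              # 500 ms floor gives a 2 Hz ceiling
    last_emit_ms: Optional[int] = None

    def should_emit(self, now_ms: int) -> bool:
        # The first frame always emits; later frames emit once the period has elapsed.
        if self.last_emit_ms is None or now_ms - self.last_emit_ms >= self.min_period_ms:
            self.last_emit_ms = now_ms
            return True
        return False


# Usage: frames arriving at roughly 3 Hz are thinned to every second frame.
ds = TelemetryDownsampler()
frame_times_ms = [0, 333, 667, 1000, 1333, 1667, 2000]
emitted = [t for t in frame_times_ms if ds.should_emit(t)]
```

With a 3 Hz input and a 500 ms floor, every second frame passes, which works out to about 1.5 Hz and stays inside the AC-6.1 band; frames that are not emitted to the GCS are still expected to reach the FDR at full rate.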
+ +### Sequence Diagram + +```mermaid +sequenceDiagram + participant C5 as C5 StateEstimator + participant C8 as C8 FcAdapter / Telemetry + participant Gcs as [[QGroundControl]] + participant Fdr as C13 FdrWriter + + loop every 500–1000 ms + C5->>C8: latest PoseEstimate + provenance + cov + system health (CPU/GPU/temp/throttle) + C8->>C8: downsample + serialize per AC-6.1 + C8->>Gcs: STATUSTEXT (provenance + degraded-mode flags) + NAMED_VALUE_FLOAT (cov ellipse axis) + GPS_RAW_INT (downsampled pos) + C8->>Fdr: telemetry-emit record + end + alt operator command inbound (AC-6.2) + Gcs->>C8: STATUSTEXT or NAMED_VALUE_FLOAT or custom-dialect command + C8->>C5: forward command (e.g., re-loc hint, sector reclassification) + end +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | C5 | C8 | latest `PoseEstimate` + system health | DTO | +| 2 | C8 | GCS | downsampled summary | MAVLink STATUSTEXT + NAMED_VALUE_FLOAT + GPS_RAW_INT | +| 3 | GCS | C8 | operator command | MAVLink | +| 4 | C8 | C5 / C12 | parsed command | DTO | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| GCS link drop | Step 2 | link health monitor | Continue; FDR retains everything; reconnect when link returns | +| Operator command malformed | Step 3 | parser error | Reject; STATUSTEXT explanation; FDR logs | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| Telemetry rate | 1–2 Hz (AC-6.1) | NFT-PERF + FT-P-12 | +| Per-frame data on FDR | full rate | AC-NEW-3 | + +--- + +## Flow F10: Post-landing tile upload + +### Description + +After the UAV has landed and `flight_state == ON_GROUND` is confirmed, the operator triggers C11 (a separate operator-side process / image — **not present in the airborne companion image**, ADR-004) which reads locally-saved mid-flight tiles from C6 and uploads them to `satellite-provider`'s ingest endpoint per the 
D-PROJ-2 contract sketch. Each tile carries quality metadata sufficient for the parent-suite voting layer to decide promotion `pending → trusted` (D-PROJ-2 design task #2; not yet implemented service-side — `mock-suite-sat-service` stands in for testing). + +### Preconditions + +- `flight_state == ON_GROUND` confirmed by the FC's `MAV_STATE` (operator's workstation reads this off the FC or from the FDR). +- Operator workstation has network reach to `satellite-provider` (or `mock-suite-sat-service` in test). +- Local C6 tile store has mid-flight tiles with `voting_status=pending` and quality metadata. +- Per-flight onboard signing key (generated at takeoff load, baked into tile metadata) is available to C11 for payload signing. + +### Sequence Diagram + +```mermaid +sequenceDiagram + participant Operator + participant C11 as C11 PostLandingUploadTool (workstation) + participant C6 as C6 TileStore (companion or workstation mirror) + participant SatelliteProvider as [[satellite-provider]] (D-PROJ-2 endpoint, planned) + participant Fdr as C13 FdrWriter + + Operator->>C11: trigger upload(flight_id) + C11->>C6: read mid-flight tiles where voting_status=pending AND flight_id=... 
+ C6-->>C11: batched Tile + TileQualityMetadata + loop per batch + C11->>SatelliteProvider: POST /api/satellite/tiles/ingest (multipart) signed with per-flight key + SatelliteProvider-->>C11: 202 Accepted with batch UUID + per-tile status (queued | rejected | duplicate | superseded) + C11->>C6: update voting_status=uploaded for accepted tiles + C11->>Fdr: upload-batch event + service response + end + C11-->>Operator: upload report (counts, rejections, duplicates) + Note over SatelliteProvider: voting layer (D-PROJ-2 design task #35;2) eventually promotes pending → trusted, out of scope for this flow +``` + +### Flowchart + +```mermaid +flowchart TD + Start([Operator triggers C11 with flight_id]) --> StateCheck{flight_state == ON_GROUND confirmed?} + StateCheck -->|no| Refuse[Refuse to upload; report to operator] + StateCheck -->|yes| ReadTiles[Read mid-flight tiles voting_status=pending] + ReadTiles --> Empty{Any tiles to upload?} + Empty -->|no| NoOp[No-op; report to operator] + Empty -->|yes| Batch[Batch by configurable size] + Batch --> Sign[Sign payload with per-flight onboard signing key] + Sign --> Post[POST /api/satellite/tiles/ingest with multipart batch] + Post --> Endpoint{Endpoint responds?} + Endpoint -->|2xx| Update[Update voting_status=uploaded for accepted tiles] + Endpoint -->|429 rate limit| Backoff[Back off and retry] + Endpoint -->|5xx OR network error| Retry[Retry with bounded retries] + Endpoint -->|endpoint not yet implemented| Queue[Keep batches queued locally; never block] + Update --> More{More batches?} + More -->|yes| Batch + More -->|no| Report[Report to operator with counts and rejections] + Backoff --> Post + Retry --> Post + Queue --> Report + NoOp --> Done([Done]) + Report --> Done +``` + +### Data flow + +| Step | From | To | Data | Format | +|------|------|----|------|--------| +| 1 | Operator | C11 | (`flight_id`) | CLI / GUI | +| 2 | C11 | C6 | SELECT tiles WHERE `voting_status=pending` AND `flight_id=...` | SQL + filesystem reads | +| 3 | C11 | `satellite-provider` 
| multipart batch (tile JPEG + metadata + signature) | per D-PROJ-2 contract sketch | +| 4 | `satellite-provider` | C11 | 202 Accepted with batch UUID + per-tile statuses | JSON | +| 5 | C11 | C6 | UPDATE voting_status | SQL UPDATE | +| 6 | C11 | FDR | upload-batch event + service response | FDR record | + +### Error scenarios + +| Error | Where | Detection | Recovery | +|-------|-------|-----------|----------| +| `flight_state != ON_GROUND` | Step 1 | FC `MAV_STATE` query | Refuse upload; never proceed (architectural invariant) | +| `satellite-provider` ingest endpoint not yet implemented (D-PROJ-2 open) | Step 3 | 404 / 501 / connection refused | Keep batches queued locally; report to operator; retry on next operator trigger | +| Network rate-limit (429) | Step 3 | HTTP 429 | Back off + retry | +| Per-tile rejected by service | Step 4 | per-tile status `rejected` | Mark `voting_status=rejected_by_service`; FDR logs reason; do not retry that tile | +| Per-tile duplicate / superseded | Step 4 | per-tile status `duplicate` / `superseded` | Mark accordingly; not an error | +| Signature verification fails service-side | Step 3 | service rejects all tiles in batch | Investigate per-flight signing key; FDR logs; do NOT downgrade or remove signing | +| Operator workstation runs out of disk space mid-upload | Step 5 | filesystem check | Pause; surface to operator; never silently drop tiles | + +### Performance expectations + +| Metric | Target | Notes | +|--------|--------|-------| +| End-to-end upload time | not time-critical | post-landing; bursty | +| Batch size | configurable; default sized to workstation bandwidth | tunable per deployment | +| Idempotence | service-side dedup is the dedup mechanism (per `(zoomLevel, lat, lon, capture_timestamp, companion_id, flight_id)`) | onboard-side does not need to track delivery transactions | + +--- + +## Cross-cutting: FDR write side-effect + +Every flow above produces FDR records (per AC-NEW-3). 
The cross-cutting rules are: + +- **Every payload class must be present** for the duration of the flight (per-frame estimates with covariance + source-label, FC IMU traces full-rate, all emitted external-position MAVLink frames, raw MAVLink stream `tlog`, system health, mid-flight tiles, ≤ 0.1 Hz failed-tile thumbnails). +- **No raw nav/AI-cam frames** (AC-8.5). +- **64 GB cap per flight**; oldest segment dropped first on rollover; **rollover is logged**, never silent (NFT-6). +- **Smoothed past-frame entries** are mandatory per Mode B Fact #107 so post-mission analysis can verify AC-4.5 internal-smoothing scope. +- **Reboot continuation** (F8) opens a continuation record under the same `flight_id`, never a new flight. + +The FDR is the post-mission single source of truth; everything emitted to FC + GCS is also FDR-logged so AC-NEW-4 / AC-NEW-7 / IT-10 / IT-11 / NFT-* analyses can be replayed offline. diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index 6f3c6c5..6670051 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -6,8 +6,8 @@ step: 3 name: Plan status: in_progress sub_step: - phase: 6 - name: plan-step2-phase2a-architecture-flows - detail: "" + phase: 10 + name: plan-step2-phase2a-blocking-review + detail: "ADR-002 reframed to technical-only build-time exclusion; license-driven framing removed; packaging optionality NOTE added" retry_count: 0 cycle: 1