gps-denied-onboard/_docs/02_document/architecture.md
Oleksandr Bezdieniezhnykh e2bebefdfc [AZ-507] [AZ-323] [AZ-324] C10 Manifest build + verify + AZ-270 hygiene
AZ-507: codify cross-component import rule. Added
_types/inference_errors.py shim re-exporting EngineBuildError +
CalibrationCacheError from c7_inference; narrowed C10
EngineCompiler's except Exception to the two typed errors so unknown
exceptions propagate (AC-3). Rewrote module-layout.md "Imports from"
sections for 9 components + added Rule 9; appended an
architecture.md ADR-009 note explaining why components must go
through _types/*.

AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via
orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha
sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate
(C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest
AND manifest_hash so re-planned routes change the cache identity
(AC-15/AC-16). 20 unit tests cover all 16 ACs.

AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json
sidecar self-hash, Ed25519 trust-key set, schema parse with
absolute/.. path rejection + takeoff_origin in-bbox check, stream
SHA-256 per artifact with multi-failure accumulation. Operator mode
re-derives tiles_coverage_sha256 from C6; airborne mode trusts the
signed aggregate. 19 unit tests cover all 17 ACs.

Composition root: c10_factory.build_manifest_builder +
build_manifest_verifier + c6_tile_metadata_store_to_tiles_query
adapter (the one place that legitimately imports both C6 and C10
without violating the AZ-270 lint).

Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml.

Tests: 1300 passed, 80 skipped (env-only), ruff clean for all
AZ-323/324 files.

AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++
pybind11 toolchain not present in this environment.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 02:37:14 +03:00


GPS-Denied Onboard Pose Estimation — Architecture

Date: 2026-05-09 (Plan Phase 2a — initial draft). Inputs: _docs/00_problem/{problem,acceptance_criteria,restrictions}.md, _docs/00_problem/input_data/*, _docs/01_solution/solution.md, _docs/02_document/glossary.md, _docs/02_document/tests/*.

Architecture Vision

User-confirmed in Plan Phase 2a.0 (2026-05-09). This section is the spine of the document; nothing below it may contradict it without a recorded ADR.

The system is a Jetson Orin Nano Super-hosted onboard companion that delivers a GPS-equivalent WGS84 position (with honest 6×6 covariance and provenance label {satellite_anchored | visual_propagated | dead_reckoned}) to a fixed-wing UAV's flight controller in GPS-denied or GPS-spoofed environments. It runs as a single Python-with-C++-extensions monolithic process per binary track on the companion PC, fusing pre-flight-cached satellite tiles served by the parent-suite satellite-provider with live nav-camera frames (3 Hz) and FC-supplied IMU/attitude (100–200 Hz). A canonical hierarchical pipeline VIO → retrieval → re-rank → matching → AdHoP-conditional refinement → pose → fusion drives the per-frame loop within a 400 ms p95 latency budget. Cross-component coupling routes through a shared GTSAM substrate so posterior covariance is recovered natively (D-C5-5 = (c)). The companion is read-only against satellite-provider while airborne: both the pre-flight tile download and the post-landing tile upload run from the operator-side Tile Manager (C11), a separate binary that is excluded from the airborne CMake target so the companion image cannot load either code path even via reflection or config error (process-level isolation, AC-8.4).

Components — intent-level (formal decomposition belongs to Step 3)

  • C1 — Visual / Visual-Inertial Odometry: pluggable VioStrategy (Okvis2 default, VinsMono in research builds only, KltRansac mandatory simple-baseline), config-selected at startup, not hot-swappable mid-flight.
  • C2 — Visual Place Recognition: pre-cached satellite-tile retrieval (UltraVPR primary, MegaLoc secondary, MixVPR / SelaVPR / EigenPlaces / NetVLAD / SALAD additional candidates), all behind a single VprStrategy interface; concrete implementation chosen by config at startup.
  • C2.5 — Top-N inlier-based re-rank: re-ranks the top-K=10 VPR candidates by single-pair LightGlue inlier count down to top-N=3.
  • C3 — Cross-domain matcher: DISK+LightGlue (D-C3-1 = (a)) over the N=3 retained candidates; ALIKED+LightGlue secondary; XFeat alternate.
  • C3.5 — AdHoP-conditional refinement: invoked only when initial reprojection residual exceeds threshold; bypassed otherwise to preserve AC-4.1.
  • C4 — Pose estimation: OpenCV ≥4.12.0 solvePnPRansac (IPPE) wrapped in GTSAM Marginals for native 6×6 covariance recovery (D-C4-2 = (b); auto-degrades to Jacobian-based covariance D-C4-2 = (a) under thermal throttle per D-CROSS-LATENCY-1).
  • C5 — State estimator: GTSAM iSAM2 + CombinedImuFactor + IncrementalFixedLagSmoother (K=10–20 keyframes, D-C5-3); native posterior covariance via Marginals; AC-4.5 = internal smoothing only, not FC retroactive correction.
  • C6 — Tile cache + spatial index: PostgreSQL btree spatial index over filesystem ./tiles/{zoomLevel}/{x}/{y}.jpg mirroring satellite-provider's on-disk layout, plus FAISS HNSW index for VPR descriptors (.index written via faiss.write_index + atomicwrites + SHA-256 content-hash gate, D-C10-3).
  • C7 — On-Jetson inference runtime: TensorRT 10.3 engines (Polygraphy / trtexec / IBuilderConfig hybrid orchestration), JetPack 6.2, SM 87; ONNX Runtime + TRT EP fallback; pure PyTorch FP16 baseline.
  • C8 — Flight-Controller adapter: pymavlink GPS_INPUT for ArduPilot Plane (MAVLink 2.0 message signing on the companion ↔ AP wired channel, D-C8-9 = (d)) and YAMSPy / INAV-Toolkit MSP2_SENSOR_GPS for iNav (signing-gap accepted residual risk).
  • C10 — Pre-flight cache provisioning: builds the model-derived cache artifacts (descriptor generation, engine compilation, manifest + content-hash) on top of an already-populated tile store; F2 takeoff verifier (D-C10-1, D-C10-3, D-C10-6, D-C10-7). C10 does NOT touch satellite-provider — tile network I/O lives in C11.
  • C11 — Tile Manager (operator-side, distinct binary/image, ADR-004 process-isolated): owns operator-side network I/O against satellite-provider in both directions. TileDownloader interface fetches tiles into C6 during F1 (TLS + service-internal API key); TileUploader interface, gated on flight_state == ON_GROUND, pushes mid-flight tiles to satellite-provider's ingest endpoint (D-PROJ-2 contract; not yet implemented service-side). The component bundles both interfaces because they share auth, HTTP client, deployment unit, and the airborne-exclusion property.
  • C12 — Operator pre-flight tooling (Plan-phase carryforward, deferred from research): cache provisioning UI, sector classification (active-conflict vs stable rear), freshness pipeline workflow.
  • C13 — Flight Data Recorder (FDR): per-flight ≤64 GB NVM record of estimates + IMU traces + emitted MAVLink + system health + mid-flight tiles + ≤0.1 Hz failed-tile thumbnails; raw nav/AI-cam frames excluded (AC-8.5, AC-NEW-3).
  • External: satellite-provider (parent-suite .NET 8 service): tile producer pre-flight; tile sink post-landing (D-PROJ-2). Treated as a planned external dependency on the upload + voting paths.
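
The C2.5 step listed above is essentially a re-ordering: score each of the top-K VPR candidates with a single-pair matcher pass and keep the N with the most inliers. The following is a minimal sketch under stated assumptions. The function name `rerank_by_inliers`, the `count_inliers` callback (standing in for a LightGlue pass), and the tie-break on original VPR rank are hypothetical and not taken from the project code.

```python
from typing import Callable, List, Sequence


def rerank_by_inliers(
    candidates: Sequence[str],
    count_inliers: Callable[[str], int],
    top_n: int = 3,
) -> List[str]:
    """Keep the top_n candidate tile IDs with the highest inlier counts.

    `candidates` is assumed to arrive in VPR rank order (best first). Ties on
    inlier count fall back to that order (an assumption, not a project rule).
    """
    scored = [
        (count_inliers(tile_id), rank, tile_id)
        for rank, tile_id in enumerate(candidates)
    ]
    # Inlier count descending, then original VPR rank ascending.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tile_id for _, _, tile_id in scored[:top_n]]


# Usage with made-up inlier counts for a top-K=10 candidate list.
fake_inliers = {
    f"tile_{i}": n for i, n in enumerate([12, 40, 7, 40, 3, 55, 0, 9, 21, 1])
}
top3 = rerank_by_inliers(list(fake_inliers), fake_inliers.__getitem__)
```

Passing the matcher in as a callback keeps the re-rank logic testable without a GPU; in the real pipeline the callback would wrap whichever matcher the composition root selected.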

Architectural principles / non-negotiables

  1. Camera-specific math enters only via a Camera calibration artifact JSON (intrinsics + distortion + body-to-camera extrinsics + acquisition method factory_sheet | checkerboard_refined | hybrid). No hard-coded camera math anywhere; test fixtures (adti26) and production deployments (adti20) load different artifacts on the same code path.
  2. VioStrategy is selected at startup via config; not hot-swappable mid-flight.
  3. Build-time exclusion of unused Strategy implementations. A given binary links only the implementations it actually uses at runtime. The default deployment binary links the production-default strategies (e.g. OKVIS2 on C1) plus the engine-rule-mandatory simple-baseline (KltRansac on C1); the IT-12 comparative-study binary links all C1 implementations side-by-side. The mechanism is per-component CMake BUILD_* flags (BUILD_VINS_MONO, BUILD_SALAD, …) plus the per-binary composition root choosing among the linked implementations at startup. Justification is technical — binary size on the 8 GB shared Jetson, boot/load time inside the AC-NEW-1 30 s budget, deployed dependency / attack surface, and accidental-selection risk reduction (a binary with only OKVIS2 + KltRansac linked cannot be misconfigured into running VINS-Mono). Component licenses do not drive this decision — see ADR-002. CI emits both the deployment binary and the research binary on every PR.
  4. In-air network I/O against satellite-provider is forbidden — in BOTH directions. Enforced primarily by process-level isolation — the Tile Manager (C11), which carries both the TileDownloader and the TileUploader interfaces, is not loaded in the airborne companion image. Software guard on flight_state == ON_GROUND (upload) is a defense-in-depth check, not the primary control. The companion is read-only against C6 in flight; both pre-flight tile fetching and post-landing tile upload happen on the operator workstation.
  5. All persistent imagery is in satellite-provider's on-disk tile format (./tiles/{zoomLevel}/{x}/{y}.jpg + matching metadata) so post-landing upload is byte-identical. No raw frames on disk except the AC-8.5 forensic ≤0.1 Hz failed-tile thumbnail log inside FDR.
  6. Honest 6×6 posterior covariance via GTSAM Marginals is the safety floor for AC-NEW-4 and AC-NEW-7. Under-reported horiz_accuracy is a defect, not a tuning knob.
  7. MAVLink 2.0 message signing on the companion ↔ ArduPilot wired channel, with per-flight key rotation (D-C8-9 = (d)). iNav has no signing equivalent — accepted residual risk, Plan-phase carryforward proposes an iNav firmware feature request.
  8. D-CROSS-LATENCY-1 hybrid: K=3 baseline auto-degrades to K=2 + Jacobian covariance under Jetson thermal throttle, preserving AC-4.1 at +50 °C ambient at the cost of ~5–10 % accuracy loss (still inside AC-NEW-4).
  9. Two execution tiers (Tier-1 workstation Docker = fast/cheap; Tier-2 Jetson hardware = AC-bound) appear in the deployment plan and CI matrix per finding F6.
  10. Camera intrinsics and full-altitude footage are calibration prerequisites, not implementation gaps. Production accuracy claims are gated on D-PROJ-1 closure (hybrid factory + checkerboard refinement). Test fixtures use adti26 calibration sourced from public/factory references.
  11. Spoofed GPS never re-enters the estimator unless the FC GPS report passes a three-part gate (AC-NEW-8 + AZ-490 follow-up): (a) FC GPS health stable + non-spoofed for ≥ 10 s, (b) a visual/satellite consistency check has succeeded on the next anchor frame, AND (c) the FC's reported position is within ≤ 200 m of the companion's last emitted PoseEstimate. The third clause is the mid-flight bounded-delta gate — even a "stable, non-spoofed" GPS frame is rejected if it disagrees with the companion's posterior by more than the configurable budget. Real GPS that passes the gate is fused via add_pose_anchor with the FC's covariance (treated as one more anchor source, never overriding the visual pipeline without the gate).
  12. Operator-planned mission is the primary cold-start trust anchor, not the FC EKF (AZ-490 follow-up). The operator authors the route in the parent-suite Mission Planner UI (suite/ui), the route persists in the parent-suite flights REST service (suite/flights), and C12 (operator tooling) reads the Flight from that service to: (a) derive the cache bbox as the envelope of the waypoint lat/lon plus a configurable buffer, (b) extract the first-ordered waypoint as the takeoff origin (lat / lon / alt), and (c) bake the takeoff origin into the C10 Manifest so the airborne C5 can warm-start from it via set_takeoff_origin(origin, sigma_horiz_m, sigma_vert_m) before any FC IMU / VIO sample arrives. This unblocks the GPS-jammed-at-takeoff scenario the FC-EKF-only cold-start path (AZ-419 today) cannot handle. The FC EKF's last valid GPS becomes a secondary cold-start input — used only when the operator origin is missing from the Manifest OR when the FC EKF reading passes the same bounded-delta consistency check against the operator origin.
  13. AC-4.5 is internal smoothing only. GTSAM iSAM2 retroactively refines past keyframes onboard and emits the corrected current frame; the FC log is forward-time only — neither ArduPilot nor iNav supports FC-side retroactive correction (Mode B Fact #107).
  14. Interface-first components with constructor-injected dependencies. Every component is defined as an interface (Python Protocol or ABC) before any concrete implementation exists, lives in its own folder under src/components/<component>/, and is wired together via constructor injection at a single composition root. Components never reach out to a global registry, a singleton, or import a sibling component's concrete class directly — they receive their collaborators as __init__ arguments typed against the sibling's interface. Multiple interchangeable implementations of the same interface MUST be supported by design (e.g., C1 has three VioStrategy implementations; C2 has UltraVPR + MegaLoc + MixVPR + … behind a single VprStrategy; C8 has two FC-adapter implementations behind a single FcAdapter). Selection happens once, at startup, by config; the composition root resolves config → concrete implementation → wires the graph; the rest of the runtime sees only interfaces. Side benefit (NOTE): this design also gives the project packaging optionality — different combinations of BUILD_* flags can produce binaries tailored to specific deployment targets, customer bundles, or (if/when relevant later) end-product licensing strategies, without any source-level change in application code. That optionality is a consequence of the interface-first design, not a driver — the architectural decisions in this document are made on technical grounds; component licenses do not influence them. See ADR-002 § Consequences and ADR-009.
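
The three-part GPS re-entry gate in principle 11 can be read as a conjunction of three independent checks. The sketch below illustrates that reading only. `FcGpsReport`, `gps_may_reenter`, and the field names are hypothetical; the 10 s and 200 m defaults come from the principle text, and the distance uses a spherical-Earth haversine approximation rather than whatever geodesy the project actually uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FcGpsReport:
    """Hypothetical summary of what the FC reports about its GPS."""
    lat_deg: float
    lon_deg: float
    stable_non_spoofed_s: float  # continuous seconds of healthy, non-spoofed GPS


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth (approximation)."""
    radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(min(1.0, a)))


def gps_may_reenter(
    report: FcGpsReport,
    last_pose_lat_deg: float,
    last_pose_lon_deg: float,
    visual_check_passed: bool,
    min_stable_s: float = 10.0,
    max_delta_m: float = 200.0,
) -> bool:
    """Return True only if clauses (a), (b) and (c) all hold."""
    clause_a = report.stable_non_spoofed_s >= min_stable_s
    clause_b = visual_check_passed
    delta_m = haversine_m(report.lat_deg, report.lon_deg,
                          last_pose_lat_deg, last_pose_lon_deg)
    clause_c = delta_m <= max_delta_m
    return clause_a and clause_b and clause_c
```

Keeping clause (c) as a separate term makes the bounded-delta behaviour explicit: a report that is stable and visually consistent is still rejected when it sits too far from the last emitted PoseEstimate.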

Open architectural items (tracked, NOT blocking Phase 2a)

  • D-PROJ-1 (camera calibration acquisition): CLOSED in this Plan cycle as hybrid factory + checkerboard refinement (~1 day per deployed unit). No physical hardware available this cycle, so production calibration is documented as instructions only; runtime path uses test-fixture calibration for adti26 images.
  • D-PROJ-2 (parent-suite satellite-provider ingest endpoint + multi-flight voting layer): open, parent-suite work, tracked in _docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md. Onboard-side proceeds against the real satellite-provider — and uses an e2e-test-only mock-suite-sat-service fixture (under tests/fixtures/) to stand in for the not-yet-shipped POST contract during integration tests.
  • D-PROJ-3 (multi-flight fixture acquisition for AC-NEW-4 / AC-NEW-7 statistical headroom): not pursued this cycle; AC-text was relaxed 2026-05-09 to Monte-Carlo-over-current-data with stated 95 % CI; multi-flight statistical headroom is residual risk in the Step 4 risk register.
  • D-C8-2 runtime gate (companion-driven MAV_CMD_SET_EKF_SOURCE_SET switch): pattern is firmware-supported but not deployed-precedent. ArduPilot Plane SITL validation (IT-3) is the lock gate; D-C8-2-FALLBACK options recorded.
  • D-C2-12 (DINOv2-feature-based matcher evaluation): carryforward research item; potentially closes D-C3-1 retrain cost.

1. System Context

Problem being solved: a fixed-wing UAV operating in eastern/southern Ukraine must continue to navigate and report position to its flight controller when GPS is denied (no fix) or spoofed (false fix). The onboard system replaces real GPS with a WGS84 position estimate derived from pre-cached satellite tiles + live nav-camera frames + FC IMU/attitude, with honest covariance and a provenance label. Mission profile: 8 h flights, ~60 km/h cruise, ≤1 km AGL, ≤400 km² total cached area.

System boundaries (inside vs outside):

| Inside the system (this project) | Outside the system |
| --- | --- |
| Companion PC runtime (Jetson Orin Nano Super, JetPack 6.2) | Flight controller firmware (ArduPilot Plane, iNav) |
| All onboard pose-estimation logic (C1–C8, C13) | Parent-suite satellite-provider (.NET 8 REST microservice) |
| Pre-flight cache artifact build (C10 — engines + descriptors + manifest) | Parent-suite flights REST service (.NET 8; owns the Flight + Waypoint DTOs) |
| Operator-side Tile Manager (C11 — pre-flight download + post-landing upload) | Parent-suite Mission Planner UI (suite/ui — where operators plan the route) |
| Operator pre-flight tooling (C12) | GCS (QGroundControl) |
| FDR writer (C13) | Nav camera hardware (adti20); AI-camera hardware |
| Camera calibration artifact format + loader | UAV airframe / FC IMU / sensors |
| | Operator's workstation OS / authentication |
| | The act of calibration itself (operator runs checkerboard rig) |

External systems:

| System | Integration Type | Direction | Purpose |
| --- | --- | --- | --- |
| satellite-provider (parent-suite .NET 8) | REST + filesystem (read), REST (post-landing write, D-PROJ-2) | Both | Pre-flight tile source; post-landing tile sink (planned) |
| flights REST service (parent-suite .NET 8) | REST (read) over HTTPS | Inbound to C12 | Source of the operator-planned Flight (waypoints, ordering, altitudes). C12 derives bbox + takeoff origin from the Flight. Operator workstation only — never reached from the airborne companion |
| Mission Planner UI (suite/ui) | Indirect via flights REST | Inbound (mediated) | Where the operator authors the route before C12 consumes it. Out of scope for this project, but the API contract it produces IS in scope |
| ArduPilot Plane FC | MAVLink 2.0 over UART/USB (signed) | Both | Inbound: external position via GPS_INPUT. Outbound: IMU, attitude, GPS health, EKF source-set commands |
| iNav FC | MSP2 over UART (unsigned), MAVLink outbound | Both | Inbound: external position via MSP2_SENSOR_GPS (companion is sole GPS source on iNav). Outbound: IMU/attitude/telemetry |
| QGroundControl (GCS) | MAVLink 2.0 (link-bandwidth-limited) | Both | 1–2 Hz downsampled summary out (AC-6.1); operator commands in (AC-6.2) |
| Nav camera (USB/MIPI-CSI/GigE) | Camera SDK / V4L2 | Inbound | 3 Hz nadir frames at 5472×3648 px |
| AI camera | Camera SDK + gimbal/zoom telemetry | Inbound | AC-7.x object localization (deferred to follow-up cycle) |
| Operator workstation | Filesystem + USB/Ethernet | Both | Pre-flight: stages cache + calibration onto companion. Post-flight: triggers upload tool, reads FDR |

2. Technology Stack

| Layer | Technology | Version | Rationale |
| --- | --- | --- | --- |
| Language (host) | Python | 3.10 (JetPack 6.2 default) | Glue layer for GTSAM/FAISS/OpenCV/pymavlink/YAMSPy; matches every selected library's primary binding |
| Language (perf-critical) | C++ | C++17 | OKVIS2, VINS-Mono, GTSAM core, OpenCV, FAISS native; Python wrappers cross the boundary |
| Inference runtime | TensorRT | 10.3 (JetPack 6.2 pin) | Pinned per D-C7-9; fallback ONNX Runtime + TRT EP; pure PyTorch FP16 baseline for mandatory simple-baseline track |
| Visual matching | DISK + LightGlue | upstream HEAD pinned per Plan-phase | D-C3-1 = (a); replaces SuperPoint+SuperGlue (Magic Leap noncommercial canonical) |
| VPR (primary) | UltraVPR | RAL 2025 / ICRA 2026 (cbbhuxx/UltraVPR) | Documentary Lead PRIMARY; rotation-invariant, unsupervised aerial pretrain (multi-heading aerial flight + closes D-C2-1 retrain cost) |
| VPR (secondary) | MegaLoc, MixVPR, SelaVPR, EigenPlaces, NetVLAD | upstream HEAD pinned per Plan-phase | Mode B Fact #110/#113 + mandatory simple-baseline (NetVLAD/MixVPR) |
| State estimator | GTSAM + gtsam_unstable.IncrementalFixedLagSmoother | per Plan-phase pin (no published CVE at audit time) | Native 6×6 covariance; D-C5-5 = (c) PriorFactorPose3 only |
| Image / pose math | OpenCV (Python+C++) | ≥ 4.12.0 | CVE-2025-53644 mitigation (Mode B Fact #112); IPPE flags for D-C4-1 = (b) |
| VPR descriptor index | FAISS HNSW | upstream HEAD pinned per Plan-phase | faiss.write_index + atomicwrites + SHA-256 content-hash gate (D-C10-3) |
| FC adapter (ArduPilot) | pymavlink + MAVLink 2.0 signing | bundled unmodified per D-C8-3 | Verified Source #4; ArduPilot canonical signing per Source #128 |
| FC adapter (iNav) | YAMSPy + INAV-Toolkit MSP2 | MIT throughout | iNav has no inbound MAVLink ext-positioning handler (SQ6) |
| VIO (production) | OKVIS2 (BSD-3-Clause) | upstream HEAD pinned per Plan-phase | D-C1-1-SUB-A = (a) production-default |
| VIO (research / IT-12) | VINS-Mono | upstream HEAD pinned per Plan-phase | Research binary only (BUILD_VINS_MONO=ON) for IT-12 comparative study; build-time exclusion from deployment binary per ADR-002 |
| VIO (mandatory baseline) | KLT+RANSAC over OpenCV | OpenCV ≥ 4.12.0 | Engine-rule-required mandatory simple-baseline |
| Tile cache backend | PostgreSQL + filesystem | PostgreSQL 16 (mirror of satellite-provider) | C6 mirrors satellite-provider's on-disk and table layout so C11 TileUploader's post-landing payload is byte-identical to what the parent suite already serves |
| Container runtime | Docker (Tier-1) + bare JetPack (Tier-2) | Docker 27.x; JetPack 6.2 | Tier-1 workstation Docker; Tier-2 Jetson native (no Docker — direct JetPack to keep INT8 calibration cache trustworthy per D-C10-6) |
| Build system | CMake + Python pyproject.toml | CMake ≥ 3.27 | CMake option(BUILD_VINS_MONO ...) D-C1-1-SUB-A; Python wheels built per Jetson via cibuildwheel-equivalent recipe |
| CI/CD | GitHub Actions (Tier-1) + self-hosted Jetson runner (Tier-2) | latest pinned action versions | Two-binary emit on every PR (production + research); Tier-2 runs are AC-bound jobs only |
| Configuration | YAML (per-flight) + Camera calibration JSON | n/a | Single config root; the only camera-specific entry point is the calibration JSON |

Key constraints from restrictions.md and how they shape the stack:

  • Hardware pinned to Jetson Orin Nano Super (8 GB shared, 25 W) → forces TensorRT engine compilation on-device + INT8/FP16 mix per D-C7-1; rules out heavy multi-process stacks (D-C1-1-SUB-A = (b) was rejected on latency budget).
  • Python is the host language but ROS-bound C++ is unavoidable for VIO → both production and research binaries are CMake projects that produce a Python-importable .so per VioStrategy; the rest of the runtime is pure Python.
  • PX4 is out of scope, ArduPilot Plane + iNav both required → C8 must split per FC, with no single message contract spanning both.
  • Build-time exclusion of unused Strategy implementations (ADR-002) → CMake BUILD_* flags (BUILD_VINS_MONO, BUILD_SALAD, …) determine which implementations are linked into each binary; the deployment binary links the production-default + the mandatory simple-baseline; the IT-12 research binary links all strategies. Justification is technical (binary size on 8 GB shared Jetson, AC-NEW-1 boot budget, dependency surface, accidental-selection risk). Component licenses do not influence this decision.
  • MAVLink message-signing posture asymmetry → pymavlink signing handshake is part of takeoff load on the AP path; iNav unsigned link is documented as accepted residual risk in security_analysis.md carryforward.
  • No raw-frame storage (AC-8.5) → all camera ingestion is streaming; the only persistence path for frame imagery is via tile orthorectification (AC-8.4).
  • 8 h continuous duty cycle at 25 W up to +50 °C ambient → the auto-degrade hybrid (D-CROSS-LATENCY-1) is a first-class concern of every latency-sensitive component, not an afterthought.
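
The constraints above combine two ideas: each binary links only some Strategy implementations, and the composition root picks one of the linked implementations once at startup. A minimal Python sketch of that shape follows. It is an illustration under assumptions, not the project's code: `FrameLoop`, `LINKED_VIO`, `build_runtime`, the config key `vio_strategy`, and the string return values are hypothetical; only the `VioStrategy` name and its `process_frame` method appear in this document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class VioStrategy(Protocol):
    """Interface that every C1 implementation satisfies (signature assumed)."""

    def process_frame(self, frame_id: int) -> str: ...


class Okvis2Vio:
    """Stand-in for the production-default implementation."""

    def process_frame(self, frame_id: int) -> str:
        return f"okvis2:{frame_id}"


class KltRansacVio:
    """Stand-in for the mandatory simple-baseline implementation."""

    def process_frame(self, frame_id: int) -> str:
        return f"klt_ransac:{frame_id}"


@dataclass
class FrameLoop:
    """Hypothetical consumer that sees only the interface."""

    vio: VioStrategy  # injected by the composition root

    def step(self, frame_id: int) -> str:
        return self.vio.process_frame(frame_id)


# Implementations "linked" into this binary; a deployment build would omit
# research-only entries, so they cannot be selected by mistake.
LINKED_VIO: Dict[str, Callable[[], VioStrategy]] = {
    "okvis2": Okvis2Vio,
    "klt_ransac": KltRansacVio,
}


def build_runtime(config: dict) -> FrameLoop:
    """Composition root: resolve config once at startup, then wire the graph."""
    name = config["vio_strategy"]
    if name not in LINKED_VIO:
        raise ValueError(f"VIO strategy {name!r} is not linked into this binary")
    return FrameLoop(vio=LINKED_VIO[name]())
```

With this layout, asking for a strategy that was excluded at build time fails at startup instead of silently falling back, which matches the accidental-selection argument made for ADR-002.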

3. Deployment Model

Environments:

| Environment | Purpose | Hardware |
| --- | --- | --- |
| dev-tier1 | Fast iterative development; unit + most integration tests | Workstation (any Linux x86_64 + NVIDIA GPU optional); Docker |
| dev-tier2 | Hardware-bound development checks | Jetson Orin Nano Super dev kit (developer's desk) |
| staging-tier1 | CI runs that don't require Jetson hardware | GitHub-hosted runner (x86_64); Docker |
| staging-tier2 | CI runs that require Jetson (AC-bound jobs only) | Self-hosted Jetson runner; bare JetPack (no Docker) |
| production | Deployed companion image on a UAV | Jetson Orin Nano Super (pinned); bare JetPack; no inbound network listening (defense-in-depth, NFT-SEC-05) |
| production-operator-workstation | Pre-flight tile download + cache artifact build (C10) + post-landing tile upload (C11) + FDR retrieval | Operator's Linux workstation; Docker for satellite-provider mirror |

Infrastructure:

  • No cloud orchestration. The companion is an embedded edge device; the operator's workstation is a single host that runs the operator tooling (C11 Tile Manager + C12 Operator Pre-flight Tooling) and a local satellite-provider mirror or VPN-reaches the lab satellite-provider.
  • Two binaries shipped on every PR (ADR-002): deployment-binary (links the production-default strategy on each component + the mandatory simple-baseline; CMake BUILD_VINS_MONO=OFF, BUILD_SALAD=OFF, …) and research-binary (links every available strategy on every component; all BUILD_* flags ON, used for the IT-12 comparative study). The deployment binary is what installs onto an operational Jetson; the research binary runs on dev/lab Jetson hardware for the comparative-study report. The same code base produces both — ADR-002 mechanism scales to additional binary variants later if packaging strategy requires it.
  • Container scope: Tier-1 uses Docker (docker compose for the developer setup including a mock-suite-sat-service container, the operator-tool container, and a Postgres for C6). Tier-2 (Jetson) does NOT use Docker — TensorRT INT8 calibration caches and jetson-stats thermal telemetry are most reliable without a container layer, per D-C7-9 + D-C10-6. The deployed image on the Jetson is a JetPack-based system image with the deployment binary preinstalled.
  • Scaling: not applicable (per-UAV, single companion). Failover is per-airframe (the FC's IMU-only fallback at AC-5.2 is the system's "scale-out").

Environment-specific configuration:

| Config | dev-tier1 | staging-tier2 | production |
| --- | --- | --- | --- |
| satellite-provider host | local Docker (satellite-provider:5100) | real satellite-provider Docker (download path; existing) + e2e-test mock-suite-sat-service fixture (POST/upload only, until D-PROJ-2 lands) | operator workstation (pre-flight only) |
| Camera calibration source | test-fixture artifact (adti26.json) | test-fixture artifact | adti20.json (D-PROJ-1 hybrid output) |
| Logging sink | console (DEBUG) | journald + FDR | FDR (per-flight, ≤ 64 GB rolling) |
| MAVLink signing key | dev key (committed to test fixtures) | per-flight key from test config | per-flight key generated at takeoff load, rotated per flight |
| Inference engine source | pre-built engines OR on-the-fly compile | pre-built (Tier-2 cache) | pre-built (verified content-hash gate) |
| BUILD_VINS_MONO (binary track) | both (developer's choice) | both | OFF (production-only) |
| Network egress | unrestricted | locked to test endpoints | none in flight (DNS blackhole + iptables OUTPUT REJECT, NFT-SEC-05) |

Image / artifact pipeline:

source repo
   ├─→ CI matrix
   │     ├─ tier1 lint + unit + most integration → Docker
   │     ├─ tier1 build deployment-binary + research-binary (CMake split)
   │     ├─ tier1 SBOM diff (production must NOT include vins_mono)
   │     └─ tier2 (self-hosted Jetson) AC-bound suite (NFT-PERF-*, NFT-LIM-*, IT-12)
   │
   ├─→ release artifacts:
   │     ├─ deployment-binary tarball (production-default strategies + mandatory baselines, ADR-002)
   │     ├─ research-binary tarball (all strategies linked; for IT-12 comparative study)
   │     ├─ JetPack image (deployment-binary preinstalled)
   │     └─ operator-tooling tarball (C11 + C12 + e2e-test mock-suite-sat-service compose for offline integration testing)
   │
   └─→ deploy paths:
         ├─ Jetson operational deploy: JetPack image flash (deployment-binary)
         ├─ Lab/research deploy: research-binary install on dev Jetson
         └─ Operator workstation: Docker compose for C11+C12+local satellite-provider mirror

4. Data Model Overview

Detailed per-component data models live in component specs (Step 3); per-entity migration strategies live in data_model.md (Phase 2b).

Core entities:

| Entity | Description | Owned by component |
| --- | --- | --- |
| NavCameraFrame | 5472×3648 px nadir RGB frame + capture timestamp + camera ID | Camera ingest → C1, C2 |
| ImuSample / ImuWindow | IMU sample (accel + gyro + timestamp) at 100–200 Hz; windowed view sent to C1 | FC adapter (C8 inbound side) |
| VioOutput | Per-frame relative pose SE(3) + 6×6 covariance + IMU bias estimate + feature quality | C1 |
| VprQuery | Image embedding (UltraVPR/MegaLoc/etc) | C2 |
| VprResult | Top-K=10 candidate tile IDs ranked by descriptor distance | C2 |
| RerankResult | Top-N=3 candidate tiles ranked by inlier count | C2.5 |
| MatchResult | 2D-3D correspondences with RANSAC inliers from C3 / C3.5 | C3, C3.5 |
| CameraCalibration | Intrinsics K + distortion + body-to-camera extrinsics + acquisition method | Loaded once at startup; consumed by C1, C3, C4 |
| PoseEstimate | WGS84 position + 6×6 covariance + provenance label + last_satellite_anchor_age_ms | C4 → C5 |
| Tile | JPEG body + center lat/lon + zoomLevel + tile_size_meters/pixels + capture_timestamp + source + freshness flag + (mid-flight only) quality_metadata | C6 |
| TileQualityMetadata | estimator_label, 2×2 covariance sub-matrix, last_anchor_age_ms, MRE, IMU bias norm — sufficient for D-PROJ-2 voting | C6 (write side from C5/C4 outputs) |
| EmittedExternalPosition | WGS84 + honest horiz_accuracy + per-FC encoding (MAVLink GPS_INPUT for AP, MSP2 MSP2_SENSOR_GPS for iNav) | C8 |
| FlightStateSignal | IN_AIR / ON_GROUND boolean derived from FC MAV_STATE | |
| FdrRecord | Estimates + IMU traces + emitted MAVLink + system health + tiles + thumbnails (≤ 64 GB / flight) | C13 |
| Manifest | Hash of (model + calibration + corpus + sector classification + takeoff origin) for D-C10-1 idempotence | C10 |
| EngineCacheEntry | TRT engine + INT8 calibration cache keyed by SM/JP/TRT/precision tuple (D-C10-7) | C10, C7 |
| SectorClassification | active_conflict / stable_rear per area, drives freshness threshold | |
| Flight | Operator-planned mission: ordered Waypoint list + metadata, persisted in the parent-suite flights REST service. Read by C12 via FlightsApiClient; never reached from the airborne companion | External (suite/flights) → C12 |
| Waypoint | Ordered (lat, lon, alt, objective, source) entry inside a Flight. C12 envelopes waypoint lat/lon → bbox; first-ordered waypoint → takeoff origin | External (suite/flights) → C12 |
| TakeoffOrigin | LatLonAlt carried in the C10 Manifest; baked in by C12 at build time from Flight.waypoints[0]; consumed at boot by C5 via set_takeoff_origin(origin, sigma_horiz_m, sigma_vert_m) (AZ-490) | C12 → C10 Manifest → C5 |
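
The Waypoint and TakeoffOrigin rows describe two derivations C12 performs on a Flight: the cache bbox as the envelope of waypoint lat/lon plus a buffer, and the takeoff origin as the first-ordered waypoint. A minimal sketch follows, assuming a local flat-Earth conversion of the buffer from metres to degrees and ignoring antimeridian and polar cases. The `Waypoint` dataclass here carries only three of the fields, and `derive_cache_inputs` and its default buffer are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Waypoint:
    lat: float  # degrees
    lon: float  # degrees
    alt: float  # metres


BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def derive_cache_inputs(
    waypoints: List[Waypoint], buffer_m: float = 2000.0
) -> Tuple[BBox, Waypoint]:
    """Return (buffered bbox, takeoff origin) for an ordered waypoint list."""
    if not waypoints:
        raise ValueError("a Flight needs at least one waypoint")
    origin = waypoints[0]  # first-ordered waypoint is the takeoff origin
    lats = [w.lat for w in waypoints]
    lons = [w.lon for w in waypoints]
    metres_per_deg_lat = 111_320.0  # approximate, spherical Earth
    mid_lat = (min(lats) + max(lats)) / 2
    dlat = buffer_m / metres_per_deg_lat
    # A degree of longitude shrinks with cos(latitude).
    dlon = buffer_m / (metres_per_deg_lat * math.cos(math.radians(mid_lat)))
    bbox = (min(lats) - dlat, min(lons) - dlon, max(lats) + dlat, max(lons) + dlon)
    return bbox, origin
```

Because the origin feeds the Manifest, re-ordering the waypoints changes the takeoff origin even when the bbox stays the same.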

Key relationships:

  • NavCameraFrame → VioOutput (via C1) and VprQuery (via C2): same frame, two consumers.
  • VprResult.tileIds → Tile.id (FK into the tile cache).
  • MatchResult references both NavCameraFrame.id and Tile.id (cross-domain pair).
  • PoseEstimate aggregates MatchResult + VioOutput + ImuWindow through C4 + C5.
  • EmittedExternalPosition is a per-FC projection of PoseEstimate; the projection rule lives in C8 (per-FC unit conversion D-C8-8 = (b)).
  • Tile (mid-flight) is produced from NavCameraFrame + PoseEstimate via orthorectification; carries TileQualityMetadata referencing the PoseEstimate it was emitted from.
  • FdrRecord is the union of all emitted streams + all inputs (excluding raw nav/AI-cam frames); rollover policy = oldest segment dropped first.
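
The FdrRecord rollover policy (oldest segment dropped first, under a per-flight byte cap) can be sketched as a size-bounded ring of segments. The class below is a hypothetical illustration: `SegmentRing`, its fields, and the in-memory bookkeeping are assumptions, and the cap is scaled down from the ≤ 64 GB figure for readability. Dropped segments are recorded so that a rollover is never silent.

```python
from collections import deque
from typing import Deque, List, Tuple


class SegmentRing:
    """Byte-capped ring of (segment_id, size_bytes), oldest on the left."""

    def __init__(self, cap_bytes: int) -> None:
        self.cap_bytes = cap_bytes
        self.segments: Deque[Tuple[str, int]] = deque()
        self.total_bytes = 0
        self.rollovers: List[str] = []  # IDs of dropped segments, for logging

    def append(self, segment_id: str, size_bytes: int) -> None:
        if size_bytes > self.cap_bytes:
            raise ValueError("segment is larger than the whole cap")
        # Drop the oldest segments until the new one fits under the cap.
        while self.total_bytes + size_bytes > self.cap_bytes:
            old_id, old_size = self.segments.popleft()
            self.total_bytes -= old_size
            self.rollovers.append(old_id)
        self.segments.append((segment_id, size_bytes))
        self.total_bytes += size_bytes


# Usage: a 100-byte cap holds two 40-byte segments; the third evicts the first.
ring = SegmentRing(cap_bytes=100)
for seg in ("seg-a", "seg-b", "seg-c"):
    ring.append(seg, 40)
```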

Data flow summary (one-line each; full sequences in system-flows.md):

  • Pre-flight: satellite-provider → C11 TileDownloader → Tile cache (C6) → C10 → EngineCacheEntry + Manifest + descriptor .index (atomic write + content-hash gate).
  • Takeoff load: Manifest content-hash verify + FAISS mmap + TRT deserialize + MAVLink signing handshake → ready.
  • Per-frame runtime: NavCameraFrame + ImuWindow → C1 (VioOutput) → C2 → C2.5 → C3 → C3.5 → C4 → C5 → C8 → EmittedExternalPosition to FC.
  • Mid-flight tile gen: NavCameraFrame + PoseEstimate → orthorectify → dedup → write to local C6 (no upload).
  • GCS telemetry: C5 → C8 → 1–2 Hz downsampled summary to QGroundControl.
  • FDR: every emitted/received stream → C13 ring with per-flight ≤ 64 GB cap.
  • Post-landing: operator triggers C11 TileUploader → reads C6 → uploads to satellite-provider ingest endpoint (D-PROJ-2 contract).
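
The FDR rollover rule stated above (oldest segment dropped first, under a per-flight cap) can be sketched as a small in-memory model. This is an illustration only: `SegmentRing`, its fields, and the byte-based accounting are hypothetical names and are not the C13 API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SegmentRing:
    """Hypothetical in-memory model of the FDR segment ring (not the C13 API)."""

    cap_bytes: int
    _segments: deque = field(default_factory=deque)  # (segment_id, size_bytes), oldest first
    _used: int = 0
    rollovers: list = field(default_factory=list)    # ids of dropped segments, kept for logging

    def append(self, segment_id: str, size_bytes: int) -> None:
        if size_bytes > self.cap_bytes:
            raise ValueError("segment is larger than the whole cap")
        # Drop the oldest segments until the new one fits. Every drop is
        # recorded, so a rollover is logged rather than silent.
        while self._used + size_bytes > self.cap_bytes:
            old_id, old_size = self._segments.popleft()
            self._used -= old_size
            self.rollovers.append(old_id)
        self._segments.append((segment_id, size_bytes))
        self._used += size_bytes

    def segment_ids(self) -> list:
        return [sid for sid, _ in self._segments]


# Toy cap of 100 bytes with 30-byte segments (the real cap is 64 GB per flight).
ring = SegmentRing(cap_bytes=100)
for i in range(5):
    ring.append(f"seg-{i}", 30)
```

After the loop, the ring holds the three newest segments and `ring.rollovers` lists the two that were dropped, in drop order.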

5. Integration Points

Internal Communication

All in-process Python calls; the system is a single host process per binary track. "Pattern" describes the interaction shape.

| From | To | Protocol | Pattern | Notes |
|---|---|---|---|---|
| Camera ingest thread | C1 (VioStrategy.process_frame) | In-process queue (bounded, drop-oldest) | Producer-consumer | Frame skip is allowed under sustained load (AC-4.1 "~10% may drop") |
| Camera ingest thread | C2 (vpr_pipeline.query) | In-process queue (bounded, drop-oldest) | Producer-consumer | Same frame fan-out, distinct queue depths |
| C2 | C2.5 | Direct call | Function call | C2.5 wraps C3 matcher; no queue |
| C2.5 | C3 / C3.5 | Direct call | Function call | C3.5 invoked iff MatchResult.reprojection_residual > threshold |
| C3 / C3.5 | C4 | Direct call | Function call | MatchResult passed as DTO |
| C1 + C4 | C5 | In-process queue (timestamp-ordered merge) | Pub/sub | C5 holds the GTSAM iSAM2 state; one writer thread |
| C5 | C8 (FC outbound) | In-process queue (per-FC encoder) | Pub/sub | One encoder per active FC profile; selected at startup |
| C8 (FC inbound) | C1 (ImuWindow), C5 (FC IMU/attitude prior) | In-process pub/sub (timestamp-aligned) | Pub/sub | Single source of truth for FC IMU; both consumers see the same window |
| C8 (FC inbound) | flight-state guard (process boundary) | In-process pub/sub | Event | Used by FDR + GCS heartbeat; airborne companion does not load C11 at all |
| C5 → orthorectifier → C6 | C6 (write-only while airborne) | In-process function call | Command | Write path is in-process; the in-air image has no upload code path |
| All components | C13 (FDR writer) | In-process queue (lossy on overrun) | Pub/sub | Overrun = logged rollover, never silent drop (AC-NEW-3) |
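
The "bounded, drop-oldest" queue used for the camera fan-out can be sketched with the standard library. This is a minimal sketch, not the project's queue: `DropOldestQueue` and its `dropped` counter are hypothetical, and the queue depths are illustrative values.

```python
import threading
from collections import deque
from typing import Optional


class DropOldestQueue:
    """Bounded queue that evicts the oldest item when full (hypothetical helper)."""

    def __init__(self, maxlen: int) -> None:
        self._items = deque(maxlen=maxlen)  # a full deque discards from the left on append
        self._cond = threading.Condition()
        self.dropped = 0

    def put(self, item) -> None:
        with self._cond:
            if len(self._items) == self._items.maxlen:
                self.dropped += 1  # count the frame that is about to be evicted
            self._items.append(item)
            self._cond.notify()

    def get(self, timeout: Optional[float] = None):
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout=timeout):
                raise TimeoutError("no frame available")
            return self._items.popleft()


# The same frame is fanned out to two consumers with distinct queue depths.
vio_queue, vpr_queue = DropOldestQueue(maxlen=4), DropOldestQueue(maxlen=2)
for frame_id in range(5):
    vio_queue.put(frame_id)
    vpr_queue.put(frame_id)
```

The producer never blocks: a slow consumer only loses its own oldest frames, and the drop count stays observable for logging.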

External Integrations

| External system | Protocol | Auth | Rate limits | Failure mode |
|---|---|---|---|---|
| ArduPilot Plane FC | MAVLink 2.0 (GPS_INPUT 5 Hz; MAV_CMD_SET_EKF_SOURCE_SET; STATUSTEXT / NAMED_VALUE_FLOAT) over UART/USB | MAVLink 2.0 message signing, per-flight key (D-C8-9 = (d)) | 5 Hz periodic emit; signing handshake at takeoff load (≤ 5 s, AC-NEW-1) | Signing handshake fail → companion refuses takeoff; mid-flight signing key compromise → FC ignores unsigned messages, AC-5.2 takes over |
| iNav FC | MSP2 MSP2_SENSOR_GPS over UART; MAVLink outbound for telemetry | None (iNav has no signing) — accepted residual risk per Mode B Source #129 | 5 Hz periodic emit | Mid-flight bad-frame → iNav mspGPSReceiveNewData() receives only the latest frame; honest hPosAccuracy is the only safety net |
| QGroundControl (GCS) | MAVLink 2.0 (STATUSTEXT, NAMED_VALUE_FLOAT, GPS_RAW_INT) | Same MAVLink 2.0 signing as the AP path (AP profile); no signing on iNav profile | 1–2 Hz downsampled (AC-6.1); operator commands are best-effort | GCS link drop → companion continues; no mid-flight reconfiguration is required from GCS |
| satellite-provider (pre-flight) | REST over HTTP, OpenAPI at /swagger; filesystem access if co-located | TLS + service-internal API key (operator workstation only); the companion never reaches satellite-provider directly while airborne | Off-line pre-flight; not time-critical | Cache miss → C11 TileDownloader fails fast pre-flight; C10 build is blocked downstream; takeoff blocked |
| satellite-provider (post-landing ingest, D-PROJ-2, planned) | REST POST /api/satellite/tiles/ingest (multipart) | Per-flight onboard signing key (carried with each tile); rate-limited | Bursty post-landing | Endpoint not yet implemented service-side → C11 keeps batches queued locally; never blocks the pre-flight cycle |
| Operator workstation (pre-flight stage) | Filesystem (USB / Ethernet) | OS-level (operator login) | Not time-critical | Bad-stage detection via Manifest content-hash gate (D-C10-3) |
| Nav camera | USB / MIPI-CSI / GigE (lens-module dependent) | n/a | 3 Hz | Frame drop / hardware fault → "VISUAL_BLACKOUT" path (AC-3.5, AC-NEW-8) |

satellite-provider upload contract (per D-PROJ-2 carryforward)

The onboard side of D-PROJ-2 is fully specified in _docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md. From this architecture's standpoint:

  • Tile writes are append-only and idempotent (the same (zoomLevel, lat, lon, capture_timestamp, companion_id, flight_id) tuple is the dedup key).
  • Quality metadata is mandatory on every uploaded tile so the planned voting layer can promote pending → trusted without re-deriving statistics on the service side.
  • Onboard tiles never claim the trusted status; they are uploaded as pending and the parent-suite voting layer (D-PROJ-2 design task #2) decides promotion.
  • Test substitute: mock-suite-sat-service is an e2e-test-only fixture (under tests/fixtures/mock-suite-sat-service/) that implements the upload contract for NFT-SEC-01 / FT-P-17 / IT runs until D-PROJ-2 lands service-side. It is not a component in the architectural sense — the production architectural counterparty for both download and upload is the real satellite-provider. The fixture is retired the moment the real ingest endpoint ships.
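
The append-only, idempotent write rule above can be illustrated with a dedup key built from the stated tuple. This is a sketch under assumptions: `TileKey`, `PendingTileStore`, and the dict-based storage are hypothetical, and the real contract lives in the D-PROJ-2 design-task file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileKey:
    """Dedup key from the upload contract; field types are assumed."""

    zoom_level: int
    lat: float
    lon: float
    capture_timestamp: str
    companion_id: str
    flight_id: str


class PendingTileStore:
    """Hypothetical in-memory stand-in for the ingest side of the contract."""

    def __init__(self) -> None:
        self._tiles: dict = {}

    def ingest(self, key: TileKey, payload: bytes, quality: dict) -> bool:
        if not quality:
            raise ValueError("quality metadata is mandatory on every uploaded tile")
        if key in self._tiles:
            return False  # replayed upload: idempotent no-op, nothing is overwritten
        # Onboard tiles are always stored as pending, never as trusted.
        self._tiles[key] = {"payload": payload, "quality": quality, "status": "pending"}
        return True

    def status(self, key: TileKey) -> str:
        return self._tiles[key]["status"]


store = PendingTileStore()
key = TileKey(18, 49.99, 36.23, "2026-05-09T10:00:00Z", "companion-01", "flight-42")
first = store.ingest(key, b"tile-bytes", {"inlier_ratio": 0.8})
replay = store.ingest(key, b"tile-bytes", {"inlier_ratio": 0.8})
```

A retried batch therefore cannot duplicate or mutate a tile that has already been accepted.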

6. Non-Functional Requirements

Targets are taken verbatim from acceptance_criteria.md and tests/traceability-matrix.md. The tests column points to the canonical tests/ files where each NFR is exercised.

| Requirement | Target | Measurement | Priority | Tests |
|---|---|---|---|---|
| End-to-end latency (AC-4.1) | p95 ≤ 400 ms (steady-state and thermal-throttle hybrid) | NFT-PERF-01 (Tier-2); D-CROSS-LATENCY-1 partition | High | tests/performance-tests.md |
| Tail latency under thermal stress (AC-NEW-5 + AC-4.1) | p99 ≤ 600 ms; p95 ≤ 400 ms at +50 °C 8 h | NFT-9 hot-soak | High | tests/performance-tests.md |
| Memory cap (AC-4.2) | < 8 GB shared (CPU + GPU) on Jetson Orin Nano Super | NFT-LIM-01 8 h replay | High | tests/resource-limit-tests.md |
| Cold-start TTFF (AC-NEW-1) | p95 < 30 s from companion boot to first valid frame | NFT-PERF-03 (50× cold boot) | High | tests/performance-tests.md |
| Spoofing-promotion latency (AC-NEW-2) | p95 < 3 s on each FC | NFT-PERF-04 (SITL on AP + iNav) | High | tests/performance-tests.md |
| FDR storage (AC-NEW-3) | ≤ 64 GB / flight; no silent drops | NFT-LIM-02 8 h synthetic | Medium | tests/resource-limit-tests.md |
| False-position safety (AC-NEW-4) | P(err > 500 m) < 0.1 %; P(err > 1 km) < 0.01 %, with stated 95 % CI over current corpus | NFT-RES-03 Monte Carlo | High | tests/resilience-tests.md |
| Operating envelope (AC-NEW-5) | −20 °C to +50 °C; 25 W; 8 h no throttle | NFT-LIM-04 workstation baseline (chamber deferred) | High | tests/resource-limit-tests.md |
| Imagery freshness (AC-NEW-6, AC-8.2) | Reject/downgrade tiles violating 6 mo / 12 mo thresholds | FT-N-05 / FT-N-06 | High | tests/blackbox-tests.md |
| Cache-poisoning safety (AC-NEW-7) | Onboard-side: P(misalign > 30 m) < 1 %, P(> 100 m) < 0.1 %, with stated 95 % CI | NFT-SEC-01 onboard Monte Carlo + synthetic over-confidence injection | High | tests/security-tests.md |
| Visual blackout failsafe (AC-NEW-8) | Mode transition ≤ 400 ms; covariance grows monotonically; spoofed GPS never re-promoted without 10 s + visual consistency gate | FT-N-04 + NFT-RES-04 | High | tests/resilience-tests.md + tests/blackbox-tests.md |
| Cross-FC covariance honesty (AC-NEW-4 cross-FC) | horiz_accuracy (m, AP) and hPosAccuracy (mm, iNav) carry mathematically equivalent values from the same 2×2 sub-matrix | IT-10 cross-FC | High | tests/blackbox-tests.md |
| MAVLink message-signing posture (AC-4.3 + D-C8-9) | Signing enabled on AP wired channel; per-flight key rotation logged to FDR; iNav documented residual risk | NFT-8 + NFT-SEC-03 | High | tests/security-tests.md |
| Dependency CVE pinning (D-CROSS-CVE-1) | OpenCV ≥ 4.12.0; SBOM clean of unpatched CVEs at audit time; monthly re-scan | NFT-10 SBOM CVE audit | High | tests/security-tests.md |
| GCS bandwidth budget (AC-6.1) | 1–2 Hz downsampled summary | FT-P-12 | Medium | tests/blackbox-tests.md |
| Frame-by-frame streaming (AC-4.4) | No batching/delay; estimates emitted per frame | NFT-PERF-02 | High | tests/performance-tests.md |
| Smoothing-loop look-back (AC-4.5, Mode B Fact #107) | FDR contains smoothed past-frame estimates; smoothing horizon converges within X m of ground truth at K = 10–20 keyframes | IT-11 | Medium | tests/blackbox-tests.md |
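
The cross-FC covariance honesty target (the same 2×2 horizontal sub-matrix expressed in metres for AP and millimetres for iNav) can be illustrated as below. The reduction from the 2×2 block to one scalar is an assumption made here for illustration: this sketch uses the square root of the largest eigenvalue, and the source does not state which reduction C8 applies. All function names are hypothetical.

```python
import math


def horizontal_sigma_m(cov_xx: float, cov_xy: float, cov_yy: float) -> float:
    """1-sigma along the worst horizontal axis of a symmetric 2x2 covariance (m).

    ASSUMPTION: sqrt(largest eigenvalue) is one possible reduction; the actual
    projection rule lives in C8 and is not specified here.
    """
    mean = 0.5 * (cov_xx + cov_yy)
    half_diff = 0.5 * (cov_xx - cov_yy)
    lambda_max = mean + math.hypot(half_diff, cov_xy)
    return math.sqrt(lambda_max)


def encode_for_ap(sigma_m: float) -> float:
    # AP path: the value is carried in metres.
    return sigma_m


def encode_for_inav(sigma_m: float) -> int:
    # iNav path: the same value, carried in integer millimetres.
    return round(sigma_m * 1000.0)


sigma = horizontal_sigma_m(cov_xx=9.0, cov_xy=0.0, cov_yy=4.0)
ap_value = encode_for_ap(sigma)
inav_value = encode_for_inav(sigma)
```

Both encoders start from one scalar, so the two flight controllers cannot receive accuracy figures that disagree beyond the millimetre rounding.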

7. Security Architecture

Threat model (one-page summary; full extraction lives in carryforward security_analysis.md):

  • The companion is a remote untrusted endpoint from the parent-suite's standpoint: a downed UAV's companion can be physically captured. Persistent secrets must therefore be per-flight ephemeral wherever feasible.
  • The wired companion ↔ FC link is the only physical-access-required attack surface for in-flight injection. MAVLink 2.0 signing on the AP path mitigates CVE-2026-1579 (D-C8-9 = (d)). iNav has no signing — accepted residual risk.
  • The GCS link is bandwidth-limited and best-effort; a hostile GCS can spoof operator commands but cannot inject pose data (the system never accepts pose from GCS).
  • GPS spoofing is treated as expected, not anomalous (AC-3.5, AC-NEW-2, AC-NEW-8). The system never lets a spoofed GPS source re-enter the estimator without a 10 s + visual-consistency gate.
  • Cache poisoning is the dominant cross-flight attack vector (AC-NEW-7): a compromised companion could write a misaligned tile that becomes the next flight's anchor. The mitigation has two halves: onboard (honest covariance + quality metadata) and parent-suite (D-PROJ-2 voting layer, not yet implemented).
  • Pre-flight cache stage is on the operator's workstation; the SHA-256 content-hash gate (D-C10-3) detects in-place tampering between stage and takeoff.
  • In-flight network egress is forbidden (defense-in-depth: DNS blackhole + iptables OUTPUT REJECT, NFT-SEC-05). The only outbound paths from the companion are MAVLink to the FC and signed STATUSTEXT to the GCS.
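
The SHA-256 content-hash gate mentioned in the threat model can be sketched as a streaming verifier that accumulates every failing artifact instead of stopping at the first one. This is a minimal sketch: the function names and the flat `{relative_path: hex_digest}` manifest shape are assumptions, not the C10 Manifest schema.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large artifacts never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(root: Path, expected: dict) -> list:
    """Return the relative paths that fail the gate; an empty list means pass."""
    failures = []
    for rel_path, want in sorted(expected.items()):
        target = root / rel_path
        # A missing file and a digest mismatch both fail closed.
        if not target.is_file() or not hmac.compare_digest(sha256_file(target), want):
            failures.append(rel_path)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tile.bin").write_bytes(b"tile-bytes")
    manifest = {"tile.bin": sha256_file(root / "tile.bin")}
    ok_failures = verify_artifacts(root, manifest)
    (root / "tile.bin").write_bytes(b"tampered")  # simulate in-place tampering
    tampered_failures = verify_artifacts(root, manifest)
```

Hashing alone only detects accidental or unsophisticated changes; it is the signature over the manifest that binds the expected digests to a trusted key.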

Authentication (per integration):

| Integration | Mechanism |
|---|---|
| Companion ↔ ArduPilot Plane FC | MAVLink 2.0 message signing, per-flight key rotation (D-C8-9 = (d)) |
| Companion ↔ iNav FC | None (iNav has no signing implementation; accepted residual risk per Mode B Source #129) |
| Companion ↔ GCS (AP profile) | MAVLink 2.0 signing inherited from the FC channel |
| Operator workstation ↔ satellite-provider (pre-flight) | TLS + service-internal API key (workstation only; never on the airborne companion) |
| Companion ↔ satellite-provider (post-landing upload, D-PROJ-2 planned) | Per-flight onboard signing key carried with each uploaded tile; the planned ingest endpoint verifies the key |
| Operator workstation pre-flight stage | OS-level (operator login + workstation hardening — operator-tooling concern, C12) |

Authorization:

  • Onboard runtime: a single principal (the runtime process); no in-process privilege boundaries. The Tile Manager (C11) runs as a different principal on the operator workstation, holding the only credentials that reach satellite-provider (TLS API key for download; per-flight onboard signing key for post-landing upload). The airborne image does not contain the C11 binary at all.
  • GCS: operator commands (AC-6.2) are best-effort hints; the operator cannot promote a pose, override covariance, or reach the satellite-provider write path. Operator re-loc requests trigger the satellite re-localization flow (F6) but do not bypass any safety gate.

Data protection:

  • At rest: tile cache + descriptor index + FDR are written to the companion's local NVM. No application-level encryption (the threat model treats a captured companion as compromised; encryption would buy little against physical access). Operator-side satellite-provider storage is the parent-suite's concern.
  • In transit: MAVLink 2.0 message signing on the AP channel; MSP2 unsigned on iNav. The post-landing upload runs over TLS to satellite-provider.
  • Secrets management:
    • Per-flight MAVLink signing key: generated at takeoff load; rotated per flight; the rotation event is logged to FDR.
    • Per-flight onboard signing key for tile upload: generated at takeoff load; baked into mid-flight tile metadata; consumed by C11 post-landing.
    • Pre-flight service API key: stays on the operator workstation; never written to the companion image.
    • No long-lived secrets on the companion image beyond firmware-level boot signatures (out of scope).
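
Per-flight key generation can be sketched with the standard library. MAVLink 2.0 signing uses a 32-byte secret key. Everything else here is an assumption for illustration: the function names, the event shape, and in particular the choice to record only a truncated SHA-256 fingerprint in the rotation event rather than the key itself.

```python
import hashlib
import secrets


def new_flight_signing_key() -> bytes:
    """Generate a fresh 32-byte secret from the OS CSPRNG."""
    return secrets.token_bytes(32)


def key_fingerprint(key: bytes) -> str:
    """Short identifier for audit records; ASSUMPTION: the key itself is never logged."""
    return hashlib.sha256(key).hexdigest()[:16]


previous_key = new_flight_signing_key()
current_key = new_flight_signing_key()  # rotated at the next takeoff load

# Hypothetical FDR rotation event: identifies the key without disclosing it.
rotation_event = {
    "event": "mavlink_signing_key_rotated",
    "fingerprint": key_fingerprint(current_key),
}
```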

Audit logging:

| What | Where | Retention |
|---|---|---|
| All emitted external-position frames + covariance + provenance label | FDR (C13) | per flight (≤ 64 GB; rollover oldest-first) |
| All received MAVLink + MSP2 frames (raw tlog stream) | FDR | per flight |
| MAVLink 2.0 signing key rotation events | FDR | per flight |
| Spoofing-promotion / spoofing-rejection events | FDR + GCS STATUSTEXT | per flight + best-effort GCS link |
| VISUAL_BLACKOUT_* STATUSTEXT events (AC-3.5, AC-NEW-8) | FDR + GCS STATUSTEXT | per flight + best-effort |
| C10 content-hash gate fail events | FDR + companion refuses takeoff | per flight |
| Mid-flight tile-gen failures | ≤ 0.1 Hz thumbnail log inside FDR (AC-8.5 forensic exception) | per flight |
| Component health (CPU/GPU/temp/throttle) | FDR | per flight |
| Source-set switch events (D-C8-2 EKF source-set) | FDR + GCS STATUSTEXT | per flight |
| Production binary SBOM provenance | release artifacts; not on the deployed companion | per release |

8. Key Architectural Decisions

These ADRs distill the user-confirmed Mode-B locks plus this architecture's first-time choices. ADRs are also tracked in _docs/00_research/06_component_fit_matrix/MODEB_revisions.md and (for cross-component gates) 99_cross_component_gates.md. Step 4 (Risk Review) iterates on them; this section is the authoritative entry point.

ADR-001 — VioStrategy is selected at startup via config; not hot-swappable

Context: Three VIO implementations are required (OKVIS2 production-default, VINS-Mono research-only, KLT+RANSAC mandatory simple-baseline). Hot-swap mid-flight would add re-initialisation cost on every switch and would require keeping multiple solvers warm in 8 GB shared memory.

Decision: VioStrategy is selected at startup from a single config knob (vio.strategy: okvis2 | vins_mono | klt_ransac), and the choice is constant for the flight. The VioStrategy interface owns the abstraction; concrete strategies own their per-strategy concerns (OKVIS2's ROS bring-up, VINS-Mono's build flag, KLT's degraded covariance). Build-time inclusion / exclusion of individual strategies is governed separately by ADR-002.

Alternatives considered:

  1. Hot-swap at runtime — rejected: re-init cost + memory footprint inside AC-4.2.
  2. Single-strategy build per binary — rejected: defeats the IT-12 comparative-study objective on the research binary.

Consequences: A flight is locked to one VIO; failure of the active strategy = AC-5.2 fallback (FC IMU-only). The comparative study is a per-replay artifact, not a runtime decision.

ADR-002 — Build-time exclusion of unused Strategy implementations (D-C1-1-SUB-A = (a))

Context: The architecture deliberately requires multiple interchangeable implementations per component (three VioStrategy for C1; multiple VprStrategy for C2; two FC adapters for C8). At runtime each binary uses exactly one of them per component. Linking all implementations into every binary would inflate binary size on the 8 GB shared Jetson, increase boot/load time inside the AC-NEW-1 ≤ 30 s p95 budget, expand the deployed dependency / attack surface, and create accidental-selection risk (a misconfigured runtime accidentally booting a non-deployment-default strategy). A single binary with all strategies present is also harder to reason about for the IT-12 comparative study, which deliberately wants the opposite — every strategy present and replayed against the same footage.

This decision is made on technical grounds only. Component licenses (BSD/Apache/MIT/LGPL/GPL/etc.) do not influence which strategy is the deployment-default — that choice is the IT-12 measured-performance verdict on the project's operating context (Jetson Orin Nano Super + ADTi 20MP 20L V1 + Derkachi-class footage).

Decision:

  1. Per-component CMake BUILD_* flag controls whether each implementation is linked into a given binary (BUILD_VINS_MONO, BUILD_SALAD, etc.). The default deployment binary links the production-default strategy (OKVIS2 on C1 today, pending IT-12 verdict) plus the engine-rule-mandatory simple-baseline (KltRansac on C1). The research binary links every available strategy of every component for IT-12.
  2. The Strategy interface boundary makes the exclusion architectural rather than configurational: sibling components import only the Strategy interface, never a concrete implementation. The composition root (one per binary, see ADR-009) is the only place that names concrete classes, and a class whose file is not part of the CMake target cannot be named there — so a misconfigured deployment cannot accidentally pull in an unintended strategy.
  3. Selection at startup (config-driven; ADR-001) picks among the linked-in strategies. A binary with only OKVIS2 + KltRansac linked exposes only those two values for vio.strategy; the config validator fails fast if asked for vins_mono.
  4. CI emits both binaries on every PR (deployment + research) so the comparative-study artifact is always reproducible alongside the deployable artifact.
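
The fail-fast behaviour in item 3 can be sketched as a lookup against the set of strategies linked into the current binary. The names below (`ConfigError`, `validate_vio_strategy`, the stub classes, and the registry dict) are hypothetical and stand in for whatever the composition root actually uses.

```python
from typing import Callable, Dict


class ConfigError(ValueError):
    """Raised when config names a strategy that this binary does not contain."""


def validate_vio_strategy(
    requested: str, linked: Dict[str, Callable[[], object]]
) -> Callable[[], object]:
    try:
        return linked[requested]
    except KeyError:
        raise ConfigError(
            f"vio.strategy={requested!r} is not linked into this binary; "
            f"available: {sorted(linked)}"
        ) from None


class Okvis2Stub:
    """Placeholder for the deployment-default strategy."""


class KltRansacStub:
    """Placeholder for the mandatory simple-baseline strategy."""


# Deployment binary: vins_mono is absent because it was excluded at build time.
DEPLOYMENT_LINKED = {"okvis2": Okvis2Stub, "klt_ransac": KltRansacStub}

chosen = validate_vio_strategy("okvis2", DEPLOYMENT_LINKED)
```

Asking the same registry for `vins_mono` raises `ConfigError` at startup, before any component is constructed.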

Alternatives considered:

  1. Single binary with all strategies linked, runtime config picks one — rejected on binary size + boot time + accidental-selection risk + unnecessary dependency surface on the deployed device.
  2. Process-isolation IPC for the unused strategies — rejected on latency budget conflict (D-CROSS-LATENCY-1) and operational complexity of two-process deployments on a 25 W edge device.
  3. Multiple deployment-binary variants tailored to specific customer bundles — out of scope of this ADR; supported as a consequence (see Consequences NOTE) but not a driver of the decision.

Consequences:

  • Two CI binaries on every PR; both must build and test green.
  • Adding any new strategy to a component is a folder-add + a CMake BUILD_* flag + an entry in the relevant binary's composition root. No call-site changes anywhere.
  • The deployment binary's SBOM is what it is — a consequence of which BUILD_* flags were ON, not a driver of which flags should be ON.
  • NOTE — packaging optionality (deferred / non-binding). Because the exclusion is per-implementation per-CMake-flag, the same code base can produce additional binaries — for different deployment targets, different customer bundles, or different end-product licensing bundles if and when product licensing is decided later. This architecture deliberately makes no licensing decisions today: component licenses do not influence which strategy is the deployment-default, and the decision above is purely technical. When packaging strategy is finalized, the same BUILD_* flag mechanism produces the right bundle without source-level changes — that optionality is a side benefit of the interface-first design (Principle #13 + ADR-009), not a justification for it.

ADR-003 — Honest 6×6 covariance via GTSAM Marginals is the safety floor (D-C5-5 = (c))

Context: AC-NEW-4 and the cross-FC covariance honesty (IT-10) require a single, mathematically-recoverable 6×6 posterior covariance per emitted frame. ESKF-style Jacobian-based covariance is faster but loses information across the C4–C5 boundary.

Decision: C5 is GTSAM iSAM2 + CombinedImuFactor + BetweenFactorPose3 + GenericProjectionFactorCal3DS2, with Marginals.marginalCovariance(pose_key) recovering the 6×6 posterior. C4 is OpenCV solvePnPRansac wrapped in a GTSAM factor so C4 and C5 share the same substrate. D-CROSS-LATENCY-1 hybrid auto-degrades C4 covariance to Jacobian-based (D-C4-2 = (a)) under thermal throttle, but C5 stays on Marginals.

Alternatives considered:

  1. ESKF-only with Jacobian covariance — rejected: loses cross-component covariance honesty; engine-rule mandatory simple-baseline only.
  2. Dual estimators (ESKF + iSAM2) — rejected: memory + complexity + the hybrid auto-degrade already covers thermal stress.

Consequences: GTSAM is a hard runtime dependency; AC-4.5 internal smoothing comes for free; per-frame covariance recovery costs 30–90 ms in steady state (auto-degrades to 5–15 ms under thermal throttle).

ADR-004 — Process-level isolation for in-air upload prevention (AC-8.4 enforcement)

Context: AC-8.4 forbids in-air outbound writes to satellite-provider for drone-security reasons. The companion is also read-only against satellite-provider while airborne — there is no operational reason to fetch tiles in flight either, since the pre-flight cache is the contract. A software guard checking flight_state == ON_GROUND can be bypassed by code injection if the network I/O code path is ever loaded.

Decision: The Tile Manager (C11) is a separate binary / image that runs only on the operator's workstation; the airborne companion image does not contain the C11 binary at all — neither the TileDownloader (pre-flight) nor the TileUploader (post-landing) code paths can be reached from the airborne process. The flight_state == ON_GROUND software guard inside the TileUploader remains as defense-in-depth for the upload direction. The local mid-flight tile format is byte-identical to satellite-provider's on-disk layout so no transformation is needed at upload time.

Enforcement gates (per R02 risk register):

  1. CI SBOM diff: the build pipeline fails the airborne production-binary artifact if any symbol from c11_tilemanager/ (or any module that transitively imports c11_tilemanager) appears in the linked image. This is an extension of the per-implementation SBOM enforcement already in ADR-002.
  2. Runtime self-check in runtime_root.py: at startup, before opening the FC adapter, the airborne composition root attempts importlib.util.find_spec("c11_tilemanager") and panics if the spec resolves to anything other than None. Cost: one import lookup at startup; benefit: catches a build-system regression even if SBOM diff was bypassed.
  3. Network egress test (NFT-SEC-02): the airborne process is run inside a network namespace with no route to satellite-provider's host; any attempted outbound TCP connection to it is a release-blocking test failure.
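
Enforcement gate 2 can be sketched directly from its description. This is a minimal sketch of the check, not the contents of runtime_root.py: `ForbiddenModuleError` and `assert_module_absent` are hypothetical names, and raising an exception is used here as the Python equivalent of "panics".

```python
import importlib.util


class ForbiddenModuleError(RuntimeError):
    """Raised when a module that must not ship in this image is importable."""


def assert_module_absent(module_name: str) -> None:
    # find_spec only locates a top-level module; it does not execute it.
    if importlib.util.find_spec(module_name) is not None:
        raise ForbiddenModuleError(
            f"{module_name!r} is importable in the airborne image; refusing to start"
        )


# Called before the FC adapter is opened. In an image without the operator-side
# package this returns silently.
assert_module_absent("c11_tilemanager")
```

The check complements the SBOM diff rather than replacing it: it only sees what is on the import path of the running process.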

Alternatives considered:

  1. Single binary with software-only guard — rejected on principle: a runtime guard cannot be the primary control for an "is the system airborne?" safety property.
  2. Hardware-level switch (e.g., physical write-enable jumper) — rejected: adds operations cost; software-image-isolation gives equivalent assurance for this threat model.

Consequences: Two binaries to maintain (companion image + operator-tooling image). CI builds and tests both. The operator workflow has an explicit post-landing step ("run the upload tool") which is itself a feature, not a bug.

ADR-005 — Two execution tiers (Tier-1 / Tier-2) are first-class architectural concerns (F6)

Context: AC-4.1 latency, AC-4.2 memory, AC-NEW-1 cold-start, AC-NEW-3 FDR storage, AC-NEW-5 thermal envelope, and AC-NEW-7 cache-poisoning all have validation locations on Jetson hardware that cannot be replicated on a workstation. Conversely, most logic, integration, and contract tests run in seconds on Tier-1 and would take orders of magnitude longer on Tier-2.

Decision: Tier-1 = workstation Docker (fast/cheap; runs lint + unit + most integration + Mock satellite-provider); Tier-2 = Jetson hardware (AC-bound jobs only; runs NFT-PERF-* + NFT-LIM-* + NFT-RES-* + IT-12). Both tiers are documented in the deployment plan and the CI matrix; failure on either tier is release-blocking. Tier-2 runner availability is itself a risk-register entry.

Alternatives considered:

  1. Tier-2-only — rejected: order-of-magnitude slower iteration loop; runner-availability risk dominates.
  2. Tier-1-only — rejected: AC-bound NFTs cannot pass without Jetson hardware in the loop.

Consequences: CI is split; some tests have an explicit tier: 2 annotation in tests/environment.md; release artifacts include both tier results.

ADR-006 — D-CROSS-LATENCY-1 hybrid is the AC-4.1 budget strategy

Context: At +50 °C ambient (AC-NEW-5 upper-temp), the Jetson auto-throttles, collapsing the steady-state K=3 latency budget. AC-4.1 has no thermal carve-out — the 400 ms p95 must hold across the operating envelope.

Decision: K=3 baseline (DISK+LightGlue × 3 candidates from C2.5; GTSAM Marginals 6×6 covariance recovery in C4) auto-degrades to K=2 + Jacobian-based covariance under thermal throttle. The trigger is the Jetson's thermal-throttle telemetry crossing a configurable temperature/clock threshold (set per D-C7-9 JetPack 6.2 + TensorRT 10.3 lock). NFT-9 hot-soak validates the hybrid.
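
The auto-degrade decision can be sketched as a pure function over throttle telemetry. The names and the numeric thresholds below are hypothetical: the source only states that the trigger is a configurable temperature/clock threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchBudget:
    candidates_k: int       # number of re-ranked candidates passed to matching
    c4_covariance: str      # "marginals" or "jacobian"; C5 stays on Marginals either way


STEADY = MatchBudget(candidates_k=3, c4_covariance="marginals")
THROTTLED = MatchBudget(candidates_k=2, c4_covariance="jacobian")


def select_budget(
    soc_temp_c: float,
    clock_mhz: float,
    *,
    temp_limit_c: float,
    min_clock_mhz: float,
) -> MatchBudget:
    """Pick the per-frame budget from telemetry; the limits come from config."""
    throttled = soc_temp_c >= temp_limit_c or clock_mhz <= min_clock_mhz
    return THROTTLED if throttled else STEADY


# Illustrative limits only; real values would come from the D-C7-9 lock.
limits = {"temp_limit_c": 85.0, "min_clock_mhz": 600.0}
cool = select_budget(60.0, 1000.0, **limits)
hot = select_budget(90.0, 1000.0, **limits)
clocked_down = select_budget(70.0, 500.0, **limits)
```

A production trigger would likely add hysteresis so the budget does not flap around the threshold; that is omitted here for brevity.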

Alternatives considered:

  1. K=3 fixed + larger latency budget — rejected: AC-4.1 is the contract.
  2. K=2 always — rejected: ~5–10 % accuracy loss at steady state hurts AC-NEW-4 headroom.

Consequences: ~5–10 % accuracy loss at the upper thermal envelope (still inside AC-NEW-4). The hybrid is part of the runtime, not a config knob; only the trigger threshold is configurable.

ADR-007 — mock-suite-sat-service is an e2e-test fixture, not a first-class component (REVERSED 2026-05-09)

Context: D-PROJ-2 (parent-suite ingest endpoint + voting layer) is not yet implemented. NFT-SEC-01 / FT-P-17 / IT runs need a counterparty for the post-landing upload contract. An earlier iteration of this ADR promoted the mock to a first-class component boundary peer of satellite-provider, with its own description under components/ and its own deployable image — to make the contract auditable.

Decision (current): the mock is an e2e-test fixture only, scoped under tests/fixtures/mock-suite-sat-service/. The architectural counterparty for both the existing download path and the planned D-PROJ-2 upload path is the real satellite-provider. The contract sketch lives in _docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md (the source of truth for the parent-suite work) and is mirrored in C11 Tile Manager's external API section (the onboard consumer's view). The mock implements that contract in tests; production never reaches it.

Why reversed: promoting an e2e-test fixture to a component boundary inflated the architectural surface and risked the test fixture drifting away from the real contract once D-PROJ-2 lands. The contract sketch in the leftover file is sufficient as the auditable source of truth without a separate component spec.

Alternatives considered:

  1. Keep ADR-007 as originally written — rejected: see "Why reversed".
  2. Wait for D-PROJ-2 service-side implementation before any tests — rejected: blocks the onboard cycle.

Consequences: The mock continues to ship in the operator-tooling tarball's compose file as a test-time service, but it is no longer documented under _docs/02_document/components/. Test specs and CI references treat it as a fixture. When satellite-provider ships the real endpoint, the fixture is replaced by pointing tests at the real service; no architectural changes flow from that switch.

ADR-008 — D-C8-2 source-set switch is Selected with runtime gate (Mode B Fact #111)

Context: AC-NEW-2 requires spoofing-promotion latency < 3 s. The companion-driven MAV_CMD_SET_EKF_SOURCE_SET switch (D-C8-2 = (b)) is firmware-supported but has no production-deployed precedent — the project would establish the canonical pattern.

Decision: D-C8-2 = (b) is selected with a runtime gate: ArduPilot Plane SITL validation (IT-3) is the lock gate. If IT-3 fails, D-C8-2-FALLBACK options are recorded — (a) operator-manual RC aux switch with relaxed AC-NEW-2 wording; (b) operator-warning STATUSTEXT instead of automated switch; (c) escalate to ArduPilot dev community.

Alternatives considered: see D-C8-2-FALLBACK above.

Consequences: AC-NEW-2 contractual latency is contingent on IT-3 passing. If IT-3 fails, AC-NEW-2 wording is renegotiated as part of D-C8-2-FALLBACK = (a).

ADR-009 — Interface-first components, constructor injection, one folder per component

Context: The architecture deliberately requires multiple interchangeable implementations per component (three VioStrategy for C1; UltraVPR / MegaLoc / MixVPR / SelaVPR / EigenPlaces / NetVLAD / SALAD candidates for C2; pymavlink-AP and YAMSPy-iNav adapters for C8). ADR-002 further mandates that the same logical component ship in different concrete forms across binaries (deployment binary vs IT-12 research binary; future packaging variants if/when needed). Without a strict interface boundary, sibling components import each other's concrete classes; build-time exclusion via BUILD_* flags becomes a fragile compile-time afterthought rather than an architectural property; testing each strategy in isolation requires monkey-patching; and adding a new strategy ripples into every call site. The interface-first pattern is the architectural mechanism that makes ADR-001 (runtime selection) and ADR-002 (build-time exclusion) tractable simultaneously.

Decision:

  1. Interface first. Every component is specified as a Python Protocol (or abc.ABC, when concrete defaults are useful) before any concrete implementation is written. The interface is the contract; concrete implementations satisfy it. Step 3 component specs document the interface signature; concrete implementations are documented under their own header inside the component spec.

  2. One folder per component. Source layout (per coderule.mdc "place source code under src/"):

    src/
      components/
        c1_vio/
          __init__.py
          interface.py                 # VioStrategy Protocol + VioOutput, VioConfig DTOs
          okvis2_strategy.py           # deployment-default (pending IT-12 verdict)
          vins_mono_strategy.py        # research-only; behind BUILD_VINS_MONO (ADR-002)
          klt_ransac_strategy.py       # engine-rule-mandatory simple-baseline
          tests/
        c2_vpr/
          __init__.py
          interface.py                 # VprStrategy Protocol
          ultra_vpr.py                 # deployment-default (Documentary Lead PRIMARY)
          mega_loc.py
          mix_vpr.py                   # mandatory simple-baseline alternate
          sela_vpr.py
          eigen_places.py
          net_vlad.py                  # mandatory simple-baseline classical
          salad.py                     # additional candidate; behind BUILD_SALAD (ADR-002)
          tests/
        c2_5_rerank/
          interface.py                 # ReRankStrategy
          inlier_count_rerank.py
          tests/
        c3_matcher/
          interface.py                 # CrossDomainMatcher
          disk_lightglue.py
          aliked_lightglue.py
          xfeat.py
          tests/
        c3_5_adhop/
          interface.py                 # ConditionalRefiner
          adhop_refiner.py
          passthrough_refiner.py       # for non-conditional baseline
          tests/
        c4_pose/
          interface.py                 # PoseEstimator
          opencv_gtsam_estimator.py
          tests/
        c5_state/
          interface.py                 # StateEstimator
          gtsam_isam2_estimator.py
          eskf_estimator.py            # mandatory simple-baseline
          tests/
        c6_tile_cache/
          interface.py                 # TileStore + TileMetadataStore + DescriptorIndex
          postgres_filesystem_store.py
          faiss_descriptor_index.py
          tests/
        c7_inference/
          interface.py                 # InferenceRuntime
          tensorrt_runtime.py
          onnx_trt_ep_runtime.py
          pytorch_fp16_runtime.py
          tests/
        c8_fc_adapter/
          interface.py                 # FcAdapter (in+out), GcsAdapter
          pymavlink_ardupilot_adapter.py
          msp2_inav_adapter.py
          qgc_telemetry_adapter.py
          tests/
        c10_cache_provisioning/
          interface.py                 # CacheProvisioner, ManifestVerifier
          provisioner.py
          tests/
        c11_tilemanager/               # SEPARATE BINARY — never linked into airborne image
          interface.py                 # TileDownloader, TileUploader (two interfaces in one component)
          http_tile_downloader.py
          http_tile_uploader.py
          tests/
        c13_fdr/
          interface.py                 # FdrWriter
          file_fdr_writer.py
          tests/
      composition/
        runtime_root.py                # composition root: config -> concrete graph
        tilemanager_root.py            # composition root for the C11 operator-side tool (download + upload)
        research_root.py               # composition root for the research/dev binary
    
  3. Constructor injection only. Every component class declares its collaborators as typed __init__ arguments, against the sibling's interface (not the concrete class). Example sketch:

    # src/components/c4_pose/interface.py
    from typing import Protocol
    class PoseEstimator(Protocol):
        def estimate(self, match: MatchResult, calibration: CameraCalibration) -> PoseEstimate: ...
    
    # src/components/c5_state/gtsam_isam2_estimator.py
    class GtsamIsam2StateEstimator:
        def __init__(
            self,
            *,
            pose_estimator: PoseEstimator,           # interface, not concrete
            imu_source: ImuSource,                   # interface
            fdr: FdrWriter,                          # interface
            config: StateEstimatorConfig,
        ) -> None:
            self._pose = pose_estimator
            self._imu = imu_source
            self._fdr = fdr
            self._cfg = config
    
  4. Composition root (src/composition/runtime_root.py) is the only place that knows about concrete classes. It reads config, picks each concrete implementation, validates that every named implementation is actually linked into the active binary (fails fast otherwise), and wires the graph. Every other module sees only interfaces. Build-time exclusion (ADR-002) becomes architectural, not configurational: the deployment binary's composition root literally cannot wire VinsMonoVioStrategy because that file is not linked into the deployment binary (BUILD_VINS_MONO=OFF). Future packaging variants (e.g., a customer bundle with a different VprStrategy set) work the same way — a different BUILD_* flag combination + the same composition root code.

  5. Python DI mechanism: hand-rolled constructor injection in the composition root is the default — it has no extra dependency, is trivially understandable, and matches the pattern of "select once at startup, never hot-swap". A heavier DI library (dependency-injector, injector, punq) is only introduced if the composition root grows past ~150 lines or test-side wiring becomes repetitive; that is a Plan-phase deferred decision (carryforward), not a current architectural commitment. Mocking in tests is via simple stub classes that satisfy the same Protocol — no monkey-patching, no unittest.mock.patch.

  6. Test wiring: each component's tests/ folder owns the test composition for that component. Test composition roots wire the unit-under-test against in-memory / fake implementations of every interface dependency. Cross-component integration tests (Tier-1) compose multiple real components with a fake FcAdapter + fake TileStore + fake InferenceRuntime. End-to-end Tier-2 tests run against the real composition root.
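
As a rough illustration of the fail-fast selection step described in item 4, the sketch below picks a concrete implementation by its configured name and refuses to start when that name is not available in the current binary. The registry dict, `RuntimeConfig`, `ImplementationNotLinkedError`, and the `describe` method are hypothetical names introduced for this example only; the real composition root may detect linked implementations differently.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class StateEstimator(Protocol):
    """Stand-in interface; `describe` is a hypothetical method for this sketch."""

    def describe(self) -> str: ...


class EskfStateEstimator:
    """Stand-in for a concrete implementation that IS part of this binary."""

    def describe(self) -> str:
        return "eskf"


# Hypothetical registry: a binary that excludes an implementation has no entry for it.
LINKED_STATE_ESTIMATORS: dict[str, Callable[[], StateEstimator]] = {
    "eskf": EskfStateEstimator,
}


class ImplementationNotLinkedError(RuntimeError):
    """Raised at startup when config names an implementation this binary lacks."""


@dataclass(frozen=True)
class RuntimeConfig:
    state_estimator: str


def build_state_estimator(config: RuntimeConfig) -> StateEstimator:
    """Resolve the configured name to a concrete instance, or fail fast."""
    try:
        factory = LINKED_STATE_ESTIMATORS[config.state_estimator]
    except KeyError:
        linked = ", ".join(sorted(LINKED_STATE_ESTIMATORS))
        raise ImplementationNotLinkedError(
            f"state_estimator={config.state_estimator!r} is not linked into "
            f"this binary; linked: {linked}"
        ) from None
    return factory()
```

Only this function knows the concrete class; callers receive a value typed as the interface, which is the property the composition root is meant to preserve.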

Alternatives considered:

  1. Sibling concrete imports (from c5_state.gtsam_isam2_estimator import GtsamIsam2StateEstimator) — rejected: makes ADR-002 build-time exclusion a CMake / SBOM artifact rather than an architectural property; couples C4 to a specific C5 implementation and vice versa; defeats the per-component test wiring; ripples into every call site whenever a new strategy is added.
  2. Service locator / global registry (e.g., a process-wide DI singleton accessed via get_service(VioStrategy)) — rejected: hides the dependency graph from constructors, makes test isolation harder, and re-introduces the singletons banned in coderule.mdc.
  3. Function-based DI (passing factories instead of instances) — rejected as the default: more cognitive overhead than constructor injection for a startup-bound, never-hot-swapped runtime. Reserved for the few call sites where lazy construction is genuinely required (e.g., the per-flight MAVLink signing key generator).
  4. Heavy DI framework (dependency-injector, injector, punq) from day one — rejected as default: introduces a runtime dependency for a problem the composition root can solve in plain Python; reserved as an opt-in if the composition root outgrows hand-rolled wiring.

Consequences:

  • Step 3 component decomposition produces, for every component: an interface.py description + ≥ 1 concrete implementation description + a test composition.
  • The composition root is itself a reviewable artifact (a single Python file per binary track) that documents which concrete implementations a given binary contains.
  • Build-time exclusion (ADR-002) becomes architectural: the deployment composition root cannot import a strategy whose file is not part of the deployment binary's CMake target. The same property scales to any future packaging variant — including, if/when product licensing strategy is decided, license-driven bundles (Principle #13 NOTE), without any source-level change in application code.
  • Per-component folders give each implementation a natural home for its own tests/, fixtures, and adapter-specific helpers — matching coderule.mdc's "logic specific to a platform, variant, or environment belongs in the class that owns that variant".
  • Adding a new C2 VPR backbone (e.g., a future foundation-model retrieval backbone via D-C2-12) is a folder-add + interface-conformance change; no other component is touched.
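
The per-component test composition named in the first consequence can be sketched as follows: the unit under test receives an in-memory stub that structurally satisfies the interface, so no patching is needed. `HeartbeatLogger`, `InMemoryFdrWriter`, and the `write` method are hypothetical names for illustration; the actual `FdrWriter` surface is not shown in this section.

```python
from typing import Protocol


class FdrWriter(Protocol):
    """Stand-in interface; the `write` signature is an assumption."""

    def write(self, record: dict) -> None: ...


class InMemoryFdrWriter:
    """Test stub: satisfies FdrWriter structurally and keeps records in a list."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def write(self, record: dict) -> None:
        self.records.append(record)


class HeartbeatLogger:
    """Hypothetical unit under test, wired by constructor injection."""

    def __init__(self, *, fdr: FdrWriter) -> None:
        self._fdr = fdr

    def beat(self, seq: int) -> None:
        self._fdr.write({"kind": "heartbeat", "seq": seq})


def test_heartbeat_is_recorded() -> None:
    # Test composition: the real class wired against a fake collaborator.
    fdr = InMemoryFdrWriter()
    HeartbeatLogger(fdr=fdr).beat(7)
    assert fdr.records == [{"kind": "heartbeat", "seq": 7}]
```

Because the stub is an ordinary class, the assertion inspects its state directly instead of relying on call-recording mocks.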

Cross-Component Contract Surface (AZ-507)

The ADR-009 "interface, not concrete" rule has an architectural sibling: cross-component imports go through _types/*.py (DTOs + typed-error envelopes such as _types.inference_errors), never through components.X (Public API). The only exception is runtime_root/* (the composition root), which is allowed to import concrete strategies across components precisely because it is the single place that resolves Protocol parameters to concrete classes. Every other module under components/**/*.py consumes cross-component contracts via (a) shared DTOs in _types/*, and (b) consumer-side structural Protocol cuts defined locally inside the consuming component (e.g. c10_provisioning.engine_compiler.CompileEngineCallable for the narrow compile_engine surface of the C7 InferenceRuntime). This is the same architectural property as constructor-injection-against-interface, applied to the import graph rather than the call graph. The AZ-270 test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies lint enforces this on every components/**/*.py; AZ-507 reconciles module-layout.md with the lint so the documentation and the build gate agree.
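
A minimal sketch of a consumer-side Protocol cut, under stated assumptions: the two error classes below are local stand-ins for the typed errors that would be imported from `_types.inference_errors`, and the two-path call signature of `CompileEngineCallable` is assumed, not taken from the real contract. The consumer depends only on the narrow callable it needs and handles only the typed errors, so anything unexpected propagates.

```python
from pathlib import Path
from typing import Protocol


# Stand-ins for the typed errors shared via _types.inference_errors.
class EngineBuildError(Exception):
    pass


class CalibrationCacheError(Exception):
    pass


class CompileEngineCallable(Protocol):
    """Narrow structural cut of the runtime: only the compile call (signature assumed)."""

    def __call__(self, onnx_path: Path, engine_path: Path) -> None: ...


class EngineCompiler:
    """Consumer that never imports the producing component's concrete class."""

    def __init__(self, *, compile_engine: CompileEngineCallable) -> None:
        self._compile_engine = compile_engine

    def compile(self, onnx_path: Path, engine_path: Path) -> bool:
        """Return True on success, False on a known typed failure."""
        try:
            self._compile_engine(onnx_path, engine_path)
        except (EngineBuildError, CalibrationCacheError):
            # Known, typed failure modes are handled here.
            return False
        # Any other exception type is deliberately not caught.
        return True
```

The composition root would pass the bound method of the concrete runtime as `compile_engine`; the consuming module itself never names that class.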

ADR-010 — Operator-planned mission is the cold-start trust anchor; FC GPS is secondary

Context: The original cold-start design (AZ-419 / FT-P-11) assumed the FC EKF's last valid GPS fix is available at takeoff to seed C5. Field reality contradicts this: a UAV operating in a contested-EW environment may have GPS jammed before takeoff (the jamming radius reaches the launch site, the unit launches under a jammer's umbrella, etc.). In that case the FC EKF has no GPS fix to give, and the companion has nothing to anchor the initial pose to — the entire downstream pipeline (VIO bootstrap, VPR retrieval scope, satellite anchoring) collapses or runs blind. At the same time, the parent suite already requires the operator to author a route in the Mission Planner UI (suite/ui) and persist it to the flights REST service (suite/flights) before any flight runs. The waypoint ordering is operationally meaningful: waypoint[0] is the planned takeoff point. The operator therefore already declares the takeoff position with operationally relevant accuracy (typically a few tens of metres) hours before launch, in a context that has no dependency on GPS at all. This information is the natural cold-start trust anchor.

Decision:

  1. Flight is read pre-flight, not in-flight. C12 (the operator-side tool, separate binary from the airborne companion — per ADR-002) calls the parent-suite flights REST service via a typed client (AZ-489 FlightsApiClient) when the operator runs gps-denied-cli build-cache --flight-id <Guid>. An offline path (--flight-file <path>) reads the same DTO shape from a JSON export so the workflow survives operator workstations that have no path to the flights service. The companion binary never depends on the flights service at runtime (Principle #9 — denied-environment operation).
  2. C12 derives bbox + takeoff origin from the Flight. The bbox is the envelope of waypoint lat/lon plus a configurable buffer (default 1 km, AZ-489 AC-3). The takeoff origin is Flight.waypoints[0].(lat, lon, alt) — the operator's authored launch point.
  3. Both fields are baked into the C10 Manifest. BuildRequest and Manifest carry takeoff_origin: LatLonAlt | None (AZ-323 / AZ-325 / AZ-324 amendments). The hash that drives D-C10-1 idempotence includes takeoff_origin, so a re-plan of the route produces a new cache identity and the verifier (AZ-324) rejects a mismatched cache at boot.
  4. C5 consumes the origin before any sensor sample. The companion's composition root reads takeoff_origin from the cache manifest at boot and invokes set_takeoff_origin(origin, sigma_horiz_m, sigma_vert_m) on the active StateEstimator (AZ-490) before the first add_vio / add_fc_imu call. Both GtsamIsam2StateEstimator and EskfStateEstimator accept the origin as a Bayesian prior — iSAM2 attaches a PriorFactorPose3 at Pose3.Identity() (the operator origin BECOMES the local-ENU (0,0,0) anchor) with diagonal sigmas [5°, 5°, 5°, sigma_horiz_m, sigma_horiz_m, sigma_vert_m]; ESKF seeds the nominal position to (0,0,0) and writes the position block of the error covariance to diag(sigma_horiz_m², sigma_horiz_m², sigma_vert_m²). Defaults are sigma_horiz_m = 5.0 m, sigma_vert_m = 10.0 m from C5StateConfig.
  5. FC GPS is a secondary, gated input. If the FC EKF later produces a GPS reading (in-flight or at takeoff), it is fused through the existing add_pose_anchor machinery only after passing the three-part gate of Principle #11 — including the ≤ 200 m bounded-delta check against the companion's last emitted PoseEstimate. Real GPS that passes the gate is one more measurement, never an override.
  6. Failure modes. If the Manifest has no takeoff_origin AND the FC EKF has no usable GPS at takeoff, C5 stays in INITIALIZING and the FC adapter (C8) emits a non-fused source label; the FT-P-11 takeoff-abort policy (AZ-419 amended) applies. If the Manifest has takeoff_origin AND the FC EKF GPS is wildly inconsistent with it at takeoff (e.g., > 200 m), the operator origin wins and the FC GPS is logged as suspect — this is the GPS-spoofed-at-takeoff case and is the entire point of this ADR.
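
The derivation in item 2 can be sketched as below. This is an illustration, not the C12 implementation: `Waypoint`, `BBox`, `derive_bbox`, and `derive_takeoff_origin` are hypothetical names, the metre-to-degree conversion uses a spherical Earth approximation, and routes that cross the antimeridian or approach the poles are not handled.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


@dataclass(frozen=True)
class Waypoint:
    lat: float  # degrees
    lon: float  # degrees
    alt: float  # metres


@dataclass(frozen=True)
class BBox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


def derive_bbox(waypoints: list[Waypoint], buffer_m: float = 1000.0) -> BBox:
    """Envelope of the waypoint lat/lon, padded by buffer_m on every side."""
    if not waypoints:
        raise ValueError("a flight needs at least one waypoint")
    lats = [w.lat for w in waypoints]
    lons = [w.lon for w in waypoints]
    dlat = math.degrees(buffer_m / EARTH_RADIUS_M)
    # Longitude degrees shrink with latitude; pad using the latitude farthest
    # from the equator so the buffer is at least buffer_m everywhere.
    widest_lat = max(abs(lat) for lat in lats)
    dlon = math.degrees(
        buffer_m / (EARTH_RADIUS_M * math.cos(math.radians(widest_lat)))
    )
    return BBox(min(lats) - dlat, min(lons) - dlon, max(lats) + dlat, max(lons) + dlon)


def derive_takeoff_origin(waypoints: list[Waypoint]) -> Waypoint:
    """The first waypoint is the operator's authored launch point."""
    if not waypoints:
        raise ValueError("a flight needs at least one waypoint")
    return waypoints[0]
```

With a non-negative buffer the takeoff origin always falls inside the derived bbox, which is consistent with the in-bbox check performed by the verifier.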

Alternatives considered:

  1. Keep FC EKF as primary (status quo of AZ-419) — rejected: cannot survive GPS-denied takeoff, which is in scope per Principles #1 and #9. Field reports of pre-launch jamming make this a realistic, not edge-case, failure mode.
  2. Operator types the origin into a CLI prompt at build-cache time — rejected: duplicates information the Mission Planner UI already captures, drifts from the canonical route, and breaks if the operator re-plans without re-typing. The Flight DTO is the single source of truth.
  3. Pull Flight from the companion at runtime over a back-channel — rejected: violates Principle #9 (denied-environment operation; no egress from the companion to anything other than the FC). The flights service is an operator-workstation concern only.
  4. Treat operator origin as a hard assignment instead of a prior — rejected: a hard assignment cannot be fused with a later high-quality posterior, breaks ADR-003's "honest covariance" property, and prevents the add_pose_anchor fusion path from ever correcting the origin if it was authored with imprecision.
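
To make the fourth alternative concrete, here is a one-dimensional toy version of the argument. It uses the scalar Kalman measurement update, not the actual iSAM2 or ESKF code, and `fuse` is a hypothetical helper. A prior with non-zero sigma can be pulled toward a later measurement, while a hard assignment (sigma of zero) cannot move at all.

```python
import math


def fuse(prior_mean: float, prior_sigma: float,
         meas: float, meas_sigma: float) -> tuple[float, float]:
    """Scalar Kalman update: returns (posterior mean, posterior sigma).

    meas_sigma must be > 0 so the gain is well defined even when
    prior_sigma is 0 (the hard-assignment case).
    """
    p = prior_sigma ** 2
    r = meas_sigma ** 2
    gain = p / (p + r)
    mean = prior_mean + gain * (meas - prior_mean)
    var = (1.0 - gain) * p
    return mean, math.sqrt(var)


# Operator origin as a prior: 0 m with sigma 5 m (the default sigma_horiz_m).
# A later anchor says the true position is 4 m away, also with sigma 5 m.
soft_mean, soft_sigma = fuse(0.0, 5.0, 4.0, 5.0)   # mean 2.0, sigma about 3.54

# Operator origin as a hard assignment: sigma 0, so the gain is 0.
hard_mean, hard_sigma = fuse(0.0, 0.0, 4.0, 5.0)   # mean 0.0, sigma 0.0
```

In the soft case the estimate moves halfway toward the measurement and its uncertainty shrinks; in the hard case the same measurement is ignored entirely.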

Consequences:

  • AZ-419 (FT-P-11) is amended: the primary cold-start path is operator-origin-from-manifest; FC-EKF-GPS is the fallback path with its own sub-AC.
  • C10 contracts gain a takeoff_origin field in BuildRequest, Manifest, and the verifier's validation set (AZ-323 / AZ-325 / AZ-324). Contract version bumps to v1.1.0.
  • C5 gains a set_takeoff_origin(origin, sigma_horiz_m, sigma_vert_m) method on the StateEstimator protocol (AZ-490). Protocol contract version bumps to v1.1.0.
  • C12 gains the FlightsApiClient boundary + offline --flight-file path (AZ-489).
  • Principle #11 (the spoofed-GPS gate) is extended with the bounded-delta clause; the gate now serves both takeoff and mid-flight.
  • The companion binary's network surface is unchanged — only C12 (operator-side, separate binary) talks to the flights service.
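
The bounded-delta clause can be illustrated with a great-circle distance check. This sketch covers only that clause, not the other parts of the Principle #11 gate, and the function names, the haversine formulation, and the spherical Earth radius are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation
BOUNDED_DELTA_M = 200.0       # limit stated in the ADR


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def passes_bounded_delta(
    gps_lat: float, gps_lon: float,
    est_lat: float, est_lon: float,
    limit_m: float = BOUNDED_DELTA_M,
) -> bool:
    """True when the FC GPS reading lies within limit_m of the last emitted estimate."""
    return haversine_m(gps_lat, gps_lon, est_lat, est_lon) <= limit_m
```

A reading roughly 111 m north of the last estimate (0.001 degrees of latitude) would pass this clause, while one roughly 334 m away (0.003 degrees) would be rejected and logged as suspect.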