Oleksandr Bezdieniezhnykh 5391d2c710 update reserach skill
2026-04-29 11:58:37 +03:00

GPS-Denied Onboard System - Architecture

Architecture Vision

The GPS-Denied stack is an onboard companion-computer software system for a fixed-wing UAV. It runs on a Jetson Orin Nano Super beside an ArduPilot flight controller and produces GPS-equivalent WGS84 position fixes when the real GPS is denied, jammed, or spoofed. The runtime shape is a ROS 2 Humble pipeline: camera frames, FC IMU/attitude/altitude telemetry, and a pre-loaded satellite tile cache flow through a cheap nadir motion bridge, conditional visual place recognition, cross-view matching for anchors/corrections, covariance calibration, MAVLink output, in-flight tile generation, and slim companion-side flight recording. The system owns only the onboard stack, the Jetson UI pre-flight workflow, the local tile-cache read/write behavior, and the provisioning workflow; upstream satellite sourcing and trusted basemap promotion remain owned by the Azaion Suite Satellite Service.

Components and intent-level responsibilities:

Component | Responsibility
C-1 Satellite Tile Cache & Descriptor Index | Maintain the local MBTiles tile cache, descriptor indexes, freshness metadata, sidecar metadata, and read/write access for runtime localization and tile generation.
C-1b Ortho-Tile Generator | Orthorectify eligible nav-cam frames into basemap-aligned z=19 tiles using camera calibration, pose, and per-sector DEM data.
C-2 Visual Place Recognition | Retrieve candidate basemap chunks only on re-localization triggers using DINOv2 SALAD/BoQ or fallback descriptors over a FAISS index.
C-3 Cross-View Matching & PnP | Match nav-cam frames to satellite basemap candidates, estimate absolute pose, and produce covariance for the output stage.
C-4 Nadir Motion Bridge | Provide cheap per-frame/near-frame relative motion from nav-frame overlap, homography/optical flow, FC IMU, attitude, altitude, and DEM constraints between satellite-anchored fixes.
C-5 Companion-Side Output Stage | Calibrate covariance, apply a Mahalanobis outlier gate, assign source labels, and emit GPS-equivalent fixes without doing state propagation.
C-6 MAVLink Integration & Source Promotion | Manage MAVLink telemetry, signed GPS_INPUT injection, sysid separation, and spoofing-promotion behavior.
C-7 Failsafe, Health & Re-Localization | Detect loss of confidence, raise re-localization workflows, report health, and keep FC failsafe behavior explicit.
C-8 Object Localization | Compute coordinates for AI-camera detections from current GPS-Denied pose, gimbal angle, zoom, and altitude.
C-9 Software Platform & Process Topology | Package and orchestrate the ROS 2 / MAVROS / rclpy / CUDA-TensorRT node graph on JetPack 6.
C-10 Companion-Side FDR | Store only companion-unique forensic signals, tile-generation metadata, failure thumbnails, and Jetson health under a rolling retention cap.
C-11 Confidence Score | Define and validate the (sigma_xy, source_label) contract carried by every emitted fix.
C-12 Provisioning Tool | Provide provision-fc.py, guided camera calibration, FC parameter pinning, MAVLink2 signing bootstrap, smoke tests, and signed airframe-manifest generation.

Architectural principles and non-negotiables:

  • Q1) This build is the sole authority for the tile-cache and sidecar schema.
  • Q2) The operator-hint MAVLink mechanism is out of scope (OOS) for v1; QGC is the only GCS, and no custom Azaion GCS service exists.
  • Q3) The tile-cache folder is the boundary contract with the Suite Satellite Service. Pre-flight, the operator uses the Jetson UI (deployed on the Jetson, with internet access) to draw the route and sector; the flights API requests tiles from the Suite Satellite Service; and the tiles land in /var/azaion/tile-cache/ (path owned by ../satellite-service/). The companion reads and writes the cache during and after flight. The initial pre-load (~700 MB at z=19) is throughput-sensitive.
  • Q4) Three-scope configuration model. Global: bundled in the companion deployment image. Per-airframe: signed manifest produced by the Provisioning Tool (new component C-12) at client setup. Per-mission: YAML written by the Jetson UI alongside the tile fetch (operational area polygon, sector classes, mission ID).
  • Q5) Authorship of the Suite Satellite Service contract lives in ../satellite-service/. This build references that contract but does not own it. The contract includes the tile-cache folder layout and file allocation.
  • Q6 / F2b) SLIM companion FDR - companion-side-unique signals only: per-frame source-label decisions + gate values (matcher inliers, VPR top-K scores, sigma_xy pre/post calibration, Mahalanobis distance), tile-generation events with sidecar metadata, failure thumbnails (<=0.1 Hz), Jetson health (CPU/GPU/temp/throttle, 1 Hz delta-encoded). Format: FlatBuffers + zstd at segment close. Retention: rolling 3 flights, oldest auto-removed. Storage cap: ~2 GB total (~500 MB/flight). FC dataflash remains authoritative for IMU, GPS_INPUT receipts, EKF3 state, MAVLink tlog - companion does NOT duplicate.
  • F3b) Storage tile zoom locked to z=19 (60 cm/px), 5x scale ratio vs nav cam. Documented fall-back to z=20 if matcher bench-off shows AC-2.2 (cross-domain MRE <2.5 px) failure.
  • FC connection (per F1 elaboration): Provisioning Tool provision-fc.py one-button workflow - detect FC, pin SERIAL2_BAUD/PROTOCOL, pin GPS1_TYPE=14 + EK3_SRC1_*, bootstrap MAVLink2 signing key, smoke-test (RAW_IMU rate + GPS_INPUT round-trip). Manual residuals: physical wiring (labeled harness), checkerboard intrinsics cal (guided by Jetson UI), FC compass/accelerometer cal (standard QGC procedure).

Open architecture research item:

  • Cross-view matcher accuracy at z=19 (5x scale ratio) on Orin Nano Super must validate AC-2.2 cross-domain MRE <2.5 px before the z=19 lock is treated as final for deployment. z=20 remains the documented fallback if bench-off fails.

1. System Context

Problem being solved: the UAV must continue estimating the GPS coordinates of navigation-camera frame centers, and AI-camera object centers, when true GPS is denied or spoofed. It has an initial GPS handoff, a downward fixed nav camera, FC IMU/attitude telemetry, and pre-loaded satellite imagery, but must operate in real time on a constrained Jetson platform for long fixed-wing missions.

System boundaries:

  • Inside this build: Jetson UI, companion runtime nodes, local tile cache schema and sidecars, VPR/matcher/VO/output/FDR nodes, provisioning tool, FC parameter bootstrap, and post-flight tile upload client behavior.
  • Outside this build: satellite imagery procurement, trusted-basemap promotion, Suite Service ingest/voting implementation, QGroundControl itself, ArduPilot firmware internals, AI detection model internals, and GCS-side custom application development.

External systems:

System | Integration Type | Direction | Purpose
ArduPilot FC | MAVLink2 over serial/USB | Both | Source of IMU/attitude/GPS-health telemetry; sink for signed GPS_INPUT position fixes.
QGroundControl | MAVLink via FC telemetry | Outbound primarily | Operator situational awareness through 1-2 Hz summary and STATUSTEXT events.
Azaion Suite Satellite Service | HTTPS/API + tile-cache folder contract | Both, pre/post flight only | Supplies pre-flight tiles and receives candidate onboard tiles after landing.
AI detection systems | Local process/topic API | Inbound request, outbound coordinates | Request AI-camera object geolocation using GPS-Denied pose.
Public/field test corpora | Offline files | Inbound to tests/bench-off | Provide validation data for matcher, VO, and robustness acceptance criteria.

2. Technology Stack

Layer | Technology | Version / Constraint | Rationale
OS / BSP | JetPack 6 on Ubuntu 22.04 | Jetson Orin Nano Super, 25 W mode | Matches hardware target and CUDA/TensorRT availability.
Robotics middleware | ROS 2 Humble | Locked v1 orchestration choice | Provides node contracts, bags, topic introspection, diagnostics, and MAVROS integration without assuming cuVSLAM applicability.
Runtime languages | C++ and Python | C++ for Isaac/MAVROS nodes; Python 3.11/3.12 for rclpy nodes | Reuses existing ROS ecosystem while keeping model and orchestration logic simple.
Nadir motion bridge | Custom/bench-selected fixed-wing nadir VO: homography/optical flow + FC IMU/attitude/altitude + DEM constraints | Single fixed nadir camera, fixed-wing flight | Matches the actual sensor topology. Isaac ROS cuVSLAM is not selected for v1 because official docs center on stereo/multi-camera VIO and do not prove support for one monocular nadir camera in this flight envelope.
Matching / retrieval | TensorRT, SP+LG/GIM-LG, LiteSAM fallback, FAISS | FP16 default, INT8 when validated | Keeps inline matching under latency budget while preserving a rare re-loc fallback.
Tile storage | MBTiles SQLite + WAL | z=19 storage, sidecar schema owned here | One persistent local cache with transactional writes and predictable file footprint.
Data serialization | FlatBuffers + zstd | Segment compression at close | Low-overhead FDR format with bounded storage.
MAVLink | MAVROS, pymavlink, MAVSDK | GPS_INPUT via pymavlink sysid=11; telemetry via MAVSDK sysid=10 | MAVSDK lacks native GPS_INPUT; sysid separation avoids router daemon.
Orthorectification | Orthority with fallback to OpenCV + DEM | SRTM-30 m DEM | Uses a maintained projection library, with a measured fallback if F-T14 fails.
Database | None for app state; SQLite inside MBTiles | Local file only | No server database belongs on the aircraft runtime path.
CI/CD | Containerized build and hardware bench gates | Later deployment artifacts | Must build TRT engines at install time and run SITL/bench-off gates before flight release.
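
The tile-storage row (MBTiles on SQLite with WAL and transactional writes) can be illustrated with a minimal sketch using only the Python standard library. The `tiles` table layout and the TMS row flip follow the public MBTiles specification; the function names and the batch shape are assumptions made for illustration, not this project's actual C-1 API, and the sidecar schema is deliberately omitted.

```python
import sqlite3


def open_tile_cache(path):
    """Open (or create) an MBTiles file and switch it to WAL journaling."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom_level INTEGER, tile_column INTEGER, "
        "tile_row INTEGER, tile_data BLOB)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS tile_index "
        "ON tiles (zoom_level, tile_column, tile_row)"
    )
    return conn


def xyz_to_tms_row(z, y):
    """MBTiles stores rows in TMS order, flipped relative to XYZ tile rows."""
    return (1 << z) - 1 - y


def write_tiles(conn, tiles):
    """Write a batch of (z, x, y_xyz, image_bytes) tuples in one transaction."""
    rows = [(z, x, xyz_to_tms_row(z, y), data) for z, x, y, data in tiles]
    with conn:  # commits on success, rolls back if an insert raises
        conn.executemany(
            "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)
```

Batching writes inside a single `with conn:` block is one way to get the transaction batching the stack calls for; WAL mode lets readers such as the VPR and matcher nodes keep reading while the tile generator writes.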

Key constraints from restrictions.md and Phase 2a decisions:

  • Runtime hardware is Jetson Orin Nano Super with 8 GB shared memory and 25 W thermal envelope.
  • Flights last up to 8 hours at about 60 km/h over operational areas of up to 400 km².
  • Satellite imagery is pre-loaded only; in-flight network access to the Service is not part of the runtime path.
  • QGroundControl is the only supported GCS in v1.
  • ArduPilot is the only v1 autopilot target; PX4 is out of scope.
  • The confirmed architecture supersedes the stale source wording (z=20 storage, full FDR) with z=19 storage and a slim companion FDR.

3. Deployment Model

Environments:

Environment | Purpose | Representative Runtime
Development | Local component development, unit tests, fixture generation | Dev machine plus containerized ROS 2 nodes where possible.
Bench / SITL | MAVLink loop, cold start, source switching, and signing tests | Jetson or Jetson-equivalent container + ArduPilot SITL.
HIL / Field Test | Thermal, camera, FC, and real hardware validation | Actual Jetson, FC, cameras, NVMe, and harness.
Production Flight | Operational aircraft runtime | Signed companion image, signed airframe manifest, mission YAML, local tile cache.

Infrastructure:

  • Onboard deployment is a companion image installed on the Jetson with ROS 2 packages, Python nodes, TensorRT engines, CUDA assets, calibration files, and provisioning tooling.
  • No cloud service is required in flight. The Suite Satellite Service is contacted before flight for tile fetch and after flight for candidate upload.
  • The local persistent storage layout separates the service-owned tile-cache boundary from FDR segments, logs, mission YAML, and airframe manifests.

Environment-specific configuration:

Config Scope | Development / Bench | Production Flight
Global | Repository defaults and dev image values | Bundled in signed companion deployment image.
Per-airframe | Test manifest | Signed manifest from Provisioning Tool containing FC params, sysids, signing fingerprint, and camera intrinsics.
Per-mission | Fixture YAML | Mission YAML written by Jetson UI with operational polygon, sector classes, and mission ID.
Secrets | Dev-only keys, never shared | MAVLink2 signing key provisioned to FC; no hardcoded keys in image.
Logging | Console + bags + local logs | Slim FDR + selected ROS 2 bags where explicitly enabled for tests.
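
One way to read the Global / Per-airframe / Per-mission scopes is as layered dictionaries that are merged at startup. The sketch below assumes that narrower scopes override broader ones and that nested sections merge key by key; that precedence rule, the function name, and every key shown are hypothetical, since this document does not define the merge semantics.

```python
from copy import deepcopy


def merge_scopes(global_cfg, airframe_cfg, mission_cfg):
    """Merge the three scopes; later (narrower) scopes win on conflicts.

    Assumption: precedence is global < per-airframe < per-mission.
    """

    def _merge(base, override):
        out = deepcopy(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                out[key] = _merge(out[key], value)  # merge nested sections
            else:
                out[key] = deepcopy(value)  # scalar or new key: override
        return out

    return _merge(_merge(global_cfg, airframe_cfg), mission_cfg)


# Hypothetical example values, for illustration only.
example_global = {"tile_zoom": 19, "mavlink": {"gps_input_sysid": 11, "telemetry_sysid": 10}}
example_airframe = {"mavlink": {"signing_fingerprint": "ab12"}, "camera": {"fx_px": 1000.0}}
example_mission = {"mission_id": "m-001"}
example_effective = merge_scopes(example_global, example_airframe, example_mission)
```

The merge copies its inputs, so the signed manifest and the bundled defaults remain unmodified in memory after the effective configuration is built.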

4. Data Model Overview

Entity | Description | Owned By Component
Airframe Manifest | Signed per-airframe record of FC parameter pins, sysids, signing key fingerprint, and camera calibration references. | C-12
Mission YAML | Per-mission operational polygon, sector classification, mission ID, and cache-fetch metadata. | Jetson UI / C-12
Tile Cache | Persistent MBTiles store and sidecar schema under /var/azaion/tile-cache/. | C-1
Tile Sidecar | Metadata for tiles: capture date, trust level, parent pose sigma, terrain class, source, and freshness state. | C-1 / C-1b
VPR Chunk | Ground-footprint retrieval unit derived from storage tiles, multi-scale in active-conflict sectors. | C-2
Pose Hypothesis | Relative or absolute pose plus covariance from VO or matcher. | C-3 / C-4
Position Estimate | WGS84 fix, covariance, source label, gate values, and timestamp emitted toward FC. | C-5 / C-11
Tile Generation Event | Eligibility, quality, dedup, write decision, and sidecar metadata for an onboard tile. | C-1b / C-10
FDR Segment | FlatBuffers segment compressed with zstd at close, stored under rolling retention. | C-10
Object Geolocation Request | AI-camera detection plus gimbal/zoom/altitude inputs and resulting object coordinate. | C-8

Key relationships:

  • One airframe has one active signed airframe manifest and many missions.
  • One mission has one mission YAML, one operational area, many sectors, many VPR chunks, and a bounded local tile cache subset.
  • One position estimate may be satellite-anchored, VO-extrapolated, or dead-reckoned; only valid estimates are emitted as GPS_INPUT.
  • One onboard-generated tile is tied to a parent position estimate and may later be uploaded as a candidate to the Suite Satellite Service.
  • One FDR segment contains many position estimate decisions, gate values, tile-generation events, health samples, and failure thumbnails.
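
The position-estimate relationship above (three possible sources, only valid estimates emitted) could be captured in a small typed record. This is a sketch under stated assumptions: the field names, the validity rule, and the `max_sigma_xy_m` cap are hypothetical and are not taken from the C-11 contract.

```python
import math
from dataclasses import dataclass
from enum import Enum


class SourceLabel(Enum):
    SATELLITE_ANCHORED = "satellite_anchored"
    VO_EXTRAPOLATED = "vo_extrapolated"
    DEAD_RECKONED = "dead_reckoned"


@dataclass(frozen=True)
class PositionEstimate:
    lat_deg: float
    lon_deg: float
    sigma_xy_m: float
    source_label: SourceLabel
    timestamp_us: int

    def is_emittable(self, max_sigma_xy_m=500.0):
        """Hypothetical validity rule: finite in-range coordinates and a
        finite, positive sigma no larger than an assumed cap."""
        coords_ok = (
            math.isfinite(self.lat_deg)
            and math.isfinite(self.lon_deg)
            and -90.0 <= self.lat_deg <= 90.0
            and -180.0 <= self.lon_deg <= 180.0
        )
        sigma_ok = (
            math.isfinite(self.sigma_xy_m)
            and 0.0 < self.sigma_xy_m <= max_sigma_xy_m
        )
        return coords_ok and sigma_ok
```

Making the record frozen keeps an estimate immutable once the output stage has labeled it, which simplifies logging the same object to the FDR and to the MAVLink injector.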

Data flow summary:

  • Suite Satellite Service -> Jetson UI -> Tile Cache: pre-flight tile fetch and descriptor/index preparation.
  • Nav Cam + FC IMU/attitude/altitude -> Nadir Motion Bridge -> Output Stage -> ArduPilot FC: cheap per-frame or near-frame motion propagation between satellite anchors.
  • Tile Cache -> VPR/Matcher -> Output Stage: initial anchors, periodic drift correction, re-localization, and confidence reset. Satellite matching is not the every-frame path.
  • Nav Cam + Position Estimate + DEM -> Ortho-Tile Generator -> Tile Cache -> Suite Service: in-flight tile write and post-flight candidate upload.
  • Companion Nodes -> FDR: companion-unique forensic record only.

5. Integration Points

Internal Communication

From | To | Protocol / Format | Pattern | Notes
Nav camera ingest | Nadir motion bridge, matcher, tile generator | ROS 2 image topics | Stream | Raw frames are processed in memory and not retained.
MAVROS | Nadir motion bridge, output stage, object localization | ROS 2 topics | Stream | Provides FC IMU, attitude, altitude, EKF/GPS health.
Tile cache | VPR, matcher, tile generator | SQLite/MBTiles + sidecar API | Local file access | WAL and transaction batching required.
VPR | Matcher | Candidate chunk shortlist | Request/response | Invoked conditionally, not on every frame.
Matcher/PnP | Output stage | Pose hypothesis + covariance | Event | Input to Mahalanobis gate and covariance calibration.
Output stage | MAVLink injector | GPS_INPUT payload | Command | v1 emits GPS_INPUT only.
Runtime nodes | FDR writer | FlatBuffers records | Append | Records companion-side-unique signals.
AI detection systems | Object localization | Local API/topic | Request/response | Uses current position and AI camera pose inputs.
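
For the object-localization path in the last row, a flat-terrain projection gives a feel for how the current pose and gimbal angle turn into an object coordinate. This is a simplified sketch, not the C-8 algorithm: it assumes the detection sits at the image center (so zoom does not enter), flat ground at the given height above ground level, and a small-offset spherical approximation. All names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as a sphere radius


def project_to_ground(lat_deg, lon_deg, agl_m, off_nadir_deg, bearing_deg):
    """Project the camera boresight onto flat ground.

    off_nadir_deg: angle between the boresight and straight down.
    bearing_deg:   compass bearing of the boresight (0 = north, 90 = east).
    """
    if not 0.0 <= off_nadir_deg < 90.0:
        raise ValueError("boresight must point below the horizon")
    ground_range_m = agl_m * math.tan(math.radians(off_nadir_deg))
    north_m = ground_range_m * math.cos(math.radians(bearing_deg))
    east_m = ground_range_m * math.sin(math.radians(bearing_deg))
    # Convert metric offsets to angular offsets on a sphere.
    dlat_deg = math.degrees(north_m / EARTH_RADIUS_M)
    dlon_deg = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return lat_deg + dlat_deg, lon_deg + dlon_deg
```

Because the ground range grows with the tangent of the off-nadir angle, any attitude or gimbal error is amplified at shallow look angles; a real implementation would also need terrain height from the DEM and the pixel offset of the detection within the frame.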

External Integrations

External System | Protocol | Auth | Rate / Limits | Failure Mode
ArduPilot FC | MAVLink2 | Mandatory signing on flight link | Per-frame GPS_INPUT, health telemetry | If output stops >3 s, FC falls back to IMU-only dead reckoning.
QGroundControl | MAVLink via FC | FC/GCS link controls | 1-2 Hz downsampled summary | Operator sees degraded status; no custom GCS service exists.
Suite Satellite Service | HTTPS/API + folder contract | Suite auth outside this build | Pre-flight/post-flight only | Flight can proceed only with sufficient local cache; no in-flight fetch.
Field/bench corpora | Files/datasets | Dataset-specific | Offline tests only | Deferred-corpus rows remain blocked until data is acquired.

6. Non-Functional Requirements

Requirement | Target | Measurement | Priority
Frame-center accuracy | >=80% within 50 m; >=50% within 20 m in normal segments | Replay against ground truth | High
Cross-domain registration | MRE <2.5 px | Matcher bench-off and replay tests | High
VO drift | <100 m VO-only or <50 m with IMU between anchors | AerialVL / field replay | High
End-to-end latency | <400 ms p95 camera-to-FC output | Jetson 25 W benchmark | High
Memory | <8 GB shared LPDDR5 | Runtime telemetry under load | High
Cold start TTFF | <30 s p95 | 50x boot bench | High
Spoofing promotion | <3 s p95 | SITL spoofing scenario | High
False-position safety | P(error >500 m) <0.1%; P(error >1 km) <0.01% per flight | Monte Carlo replay | High
Tile poisoning safety | P(tile error >30 m) <1%; >100 m <0.1% per flight | Multi-flight Monte Carlo | High
Thermal envelope | 25 W sustained for 8 h at +50 °C, no throttling | Hot-soak HIL | High
FDR retention | Rolling 3 flights; ~2 GB total cap | 8-hour synthetic load | Medium
Tile cache footprint | About 700 MB for 400 km² at z=19 plus index/DEM overhead | Cache-generation run | Medium
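
The frame-center accuracy and p95 latency targets lend themselves to a small replay-side check. The sketch below is illustrative only: it treats "within" as less than or equal, uses the nearest-rank definition of a percentile, and the function names are hypothetical rather than part of the project's test harness.

```python
import math


def fraction_within(errors_m, threshold_m):
    """Share of replayed frames whose position error is <= threshold_m."""
    if not errors_m:
        raise ValueError("no samples")
    return sum(1 for e in errors_m if e <= threshold_m) / len(errors_m)


def meets_frame_center_accuracy(errors_m):
    """>=80% of frames within 50 m and >=50% within 20 m."""
    return (
        fraction_within(errors_m, 50.0) >= 0.80
        and fraction_within(errors_m, 20.0) >= 0.50
    )


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_budget(latencies_ms, budget_ms=400.0):
    """p95 camera-to-FC latency must stay below the budget."""
    return percentile_nearest_rank(latencies_ms, 95) < budget_ms
```

Keeping the criteria as plain functions over lists of per-frame errors makes the same check reusable across replay datasets and bench runs.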

7. Security Architecture

Authentication and trust boundaries:

  • MAVLink2 signing is mandatory for flight links between the companion and FC.
  • The airframe manifest is signed and produced during provisioning; runtime refuses to silently invent missing FC/signing/camera parameters.
  • Suite Service auth is used only during pre-flight tile fetch and post-flight candidate upload.

Authorization:

  • The Jetson UI exposes provisioning and mission setup functions only in the ground/pre-flight context.
  • Runtime flight nodes do not accept arbitrary GCS commands; v1 operator hints are out of scope except standard QGC status/telemetry visibility.

Data protection:

  • At rest: no raw nav-cam or AI-camera frames are retained in normal operation. FDR contains bounded forensic metadata and failure thumbnails only.
  • In transit: MAVLink2 signing protects the companion-FC link; Suite Service calls must use TLS.
  • Secrets: no MAVLink signing key or Suite credential is hardcoded in source or image defaults.

Audit logging:

  • FDR records source-label decisions, gate values, tile-generation events, signing/provisioning status, promotion/demotion events, and Jetson health.
  • FC dataflash remains authoritative for IMU, received GPS_INPUT, EKF3 state, and MAVLink tlog.

8. Key Architectural Decisions

ADR-001: ROS 2 Humble Runtime, Without cuVSLAM as v1 VO Lead

Context: the runtime needs ROS 2 integration for camera ingest, FC telemetry, node observability, and MAVLink plumbing. However, the aircraft has one fixed nadir monocular nav camera. Isaac ROS Visual SLAM / cuVSLAM is not documented as a proven fit for this exact sensor topology and fixed-wing altitude/speed envelope; its official material centers on stereo/multi-camera visual-inertial SLAM.

Decision: use ROS 2 Humble on JetPack 6 as the v1 runtime platform, but do not select cuVSLAM as the v1 visual-odometry lead. C-4 is a nadir-specific motion bridge using frame overlap/homography/optical flow plus FC IMU/attitude/altitude and DEM constraints. cuVSLAM is rejected for v1 unless future bench evidence proves reliable operation with one monocular nadir camera under the actual fixed-wing flight envelope.

Alternatives considered:

  1. Isaac ROS cuVSLAM as the VO lead - rejected for v1 because exact-fit support is unproven for one monocular nadir camera at the required flight conditions.
  2. Single Python asyncio orchestrator - rejected because FC telemetry, camera ingest, diagnostics, and replay tooling benefit from ROS 2 contracts.
  3. Mixed ad hoc subprocess topology - rejected because observability, topic contracts, and failure recovery would be weaker.

Consequences: the design accepts DDS overhead and a larger image footprint in exchange for robotics tooling, but keeps the motion-estimation algorithm matched to the actual aircraft sensor geometry. C-4 must be validated with a bench-off against the project acceptance criteria before implementation lock.

ADR-002: GPS_INPUT Only in v1

Context: parallel GPS_INPUT and ODOMETRY for overlapping axes risks EKF3 double-fusion behavior.

Decision: v1 emits signed GPS_INPUT only; ODOMETRY remains disabled until a v1.1 SITL gate proves clean source switching.

Alternatives considered:

  1. Emit GPS_INPUT and ODOMETRY together - rejected because of documented EKF3 instability risks.
  2. ODOMETRY-primary v1 - rejected because the product framing and ArduPilot parameter path are currently GPS-substitute oriented.

Consequences: covariance fidelity is limited to GPS_INPUT fields, so companion-side calibration and gate correctness become more important.
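
To make the covariance limitation concrete, the sketch below builds the field values of a MAVLink GPS_INPUT message from a position estimate. The field names, the degE7 scaling of latitude and longitude, and the ignore-flag bit values follow the MAVLink common message definition; the choice to ignore velocity and DOP fields, the `gps_id` and `satellites_visible` values, and the direct mapping of sigma to the accuracy fields are assumptions made for illustration. Sending the message through pymavlink is intentionally left out.

```python
# Bit values from the MAVLink GPS_INPUT_IGNORE_FLAGS enum.
IGNORE_HDOP = 2
IGNORE_VDOP = 4
IGNORE_VEL_HORIZ = 8
IGNORE_VEL_VERT = 16
IGNORE_SPEED_ACCURACY = 32


def build_gps_input_fields(time_usec, lat_deg, lon_deg, alt_m,
                           sigma_xy_m, sigma_z_m):
    """Return GPS_INPUT field values for one fix as a plain dict."""
    return {
        "time_usec": int(time_usec),
        "gps_id": 0,                      # hypothetical receiver id
        "ignore_flags": (IGNORE_HDOP | IGNORE_VDOP | IGNORE_VEL_HORIZ
                         | IGNORE_VEL_VERT | IGNORE_SPEED_ACCURACY),
        "time_week_ms": 0,                # GPS time left unset in this sketch
        "time_week": 0,
        "fix_type": 3,                    # 3 = 3D fix
        "lat": round(lat_deg * 1e7),      # degE7
        "lon": round(lon_deg * 1e7),      # degE7
        "alt": float(alt_m),              # metres
        "hdop": 0.0,
        "vdop": 0.0,
        "vn": 0.0,
        "ve": 0.0,
        "vd": 0.0,
        "speed_accuracy": 0.0,
        "horiz_accuracy": float(sigma_xy_m),  # the only horizontal uncertainty slot
        "vert_accuracy": float(sigma_z_m),
        "satellites_visible": 10,         # hypothetical placeholder
    }
```

The message carries one scalar for horizontal uncertainty and one for vertical, so any anisotropy or cross-correlation in the matcher covariance has to be collapsed before injection. That is why calibration and gating on the companion side carry so much weight.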

ADR-003: No Companion-Side State-Propagating Filter in v1

Context: a companion EKF feeding FC EKF3 duplicates fusion responsibilities.

Decision: C-5 calibrates covariance, gates outliers, labels source, and emits fixes; ArduPilot EKF3 remains the only state-propagating filter.

Alternatives considered:

  1. Companion EKF - rejected because it creates double-fusion and ownership ambiguity.
  2. Companion ESKF - reserved for v1.x only if evidence requires it.

Consequences: VO and matcher components must expose honest covariance; F-T18 and AC-NEW-4 become critical safety gates.
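
The outlier gate that C-5 applies can be sketched in two dimensions. The residual is assumed here to be the horizontal difference, in metres, between a candidate fix and the predicted position, and the covariance is the combined 2x2 uncertainty of that difference; both definitions, and the default threshold (the 99% chi-square quantile for two degrees of freedom, about 9.21), are illustrative assumptions rather than the project's calibrated values.

```python
def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance r^T S^-1 r for a 2-vector r and 2x2 S."""
    rx, ry = residual
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # S^-1 = (1 / det) * [[d, -b], [-c, a]]
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def passes_gate(residual, cov, threshold_sq=9.21):
    """Accept the candidate fix when its squared distance is inside the gate."""
    return mahalanobis_sq_2d(residual, cov) <= threshold_sq
```

A 5 m residual is rejected when the claimed sigma is 1 m per axis but accepted when it is 5 m, which is exactly why upstream components must report honest covariance: an overconfident matcher gets its good fixes rejected, and an underconfident one lets bad fixes through.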

ADR-004: z=19 Tile Storage with z=20 Fallback

Context: Phase 2a.0 locked z=19 to reduce cache preload from about 2.8 GB to about 700 MB for 400 km², accepting a 5x scale ratio to the nav camera.

Decision: store the runtime basemap as z=19 MBTiles and validate cross-domain accuracy empirically before deployment lock.

Alternatives considered:

  1. z=20 storage - safer matcher scale ratio but heavier preload; retained as fallback if z=19 fails AC-2.2.
  2. z=18 storage - rejected because scale ratio is too high for reliable matching.

Consequences: the matcher bench-off must explicitly test z=19 cross-view MRE <2.5 px on Orin Nano Super.
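
The preload figures in this ADR can be cross-checked with a few lines of arithmetic. The sketch takes the 60 cm/px figure this document states for z=19, halves it for z=20 (each zoom level doubles the linear resolution), and assumes 256-pixel square tiles; the tile size and the resulting average bytes per tile are assumptions derived here, not measured values.

```python
def tile_count(area_km2, metres_per_px, tile_px=256):
    """Approximate number of square tiles needed to cover an area."""
    tile_side_m = metres_per_px * tile_px
    return area_km2 * 1_000_000 / (tile_side_m ** 2)


AREA_KM2 = 400
tiles_z19 = tile_count(AREA_KM2, 0.60)      # about 16,954 tiles
tiles_z20 = tile_count(AREA_KM2, 0.30)      # four times as many
bytes_per_tile = 700e6 / tiles_z19          # roughly 41 kB, implied by 700 MB
cache_z20_bytes = tiles_z20 * bytes_per_tile
```

Under these assumptions the z=20 cache comes out at about 2.8 GB, matching the figure quoted in the context: one zoom level costs a factor of four in tiles and therefore in preload size.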

ADR-005: Tile-Cache Folder Boundary with Suite Satellite Service

Context: the Suite Satellite Service owns upstream imagery sourcing and trusted-basemap promotion, while this build owns onboard runtime use.

Decision: this build owns tile-cache schema and sidecars locally, but treats the folder layout/file allocation contract as authored by ../satellite-service/.

Alternatives considered:

  1. Direct commercial-provider integration onboard - rejected as out of scope and unavailable in flight.
  2. This build owning Service ingest/voting - rejected because it belongs to the sibling Satellite Service.

Consequences: Plan Step 3 must produce clear interface tasks and avoid implementing Service internals in this repo.

ADR-006: Slim Companion-Side FDR

Context: older AC wording asked the companion to retain IMU, GPS_INPUT receipts, MAVLink tlog, and larger per-flight payloads that the FC already records.

Decision: the companion FDR stores companion-side-unique signals only, compressed as FlatBuffers + zstd, rolling 3 flights under about 2 GB total.

Alternatives considered:

  1. Full duplicate FDR on companion - rejected because FC dataflash is authoritative and duplicate storage increases write load and retention risk.
  2. No companion FDR - rejected because gate values and failure thumbnails are needed for false-position and tile-poisoning analysis.

Consequences: AC-NEW-3 and related test docs need a coordinated wording update in a later Plan/Decompose cleanup.
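
The retention rule in this ADR (rolling 3 flights under about 2 GB total) can be expressed as a small pruning function. This is a sketch under assumptions: flights are given oldest first as `(flight_id, size_bytes)` pairs, the cap is taken as 2,000,000,000 bytes, and the newest flight is never pruned even if it alone exceeds the cap. None of these details are specified by the document.

```python
def flights_to_prune(flights, max_flights=3, cap_bytes=2_000_000_000):
    """Return the flight ids to delete, oldest first.

    flights: list of (flight_id, size_bytes) tuples ordered oldest -> newest.
    """
    kept = list(flights)
    removed = []
    # Always keep the newest flight; drop oldest while either limit is exceeded.
    while len(kept) > 1 and (
        len(kept) > max_flights
        or sum(size for _, size in kept) > cap_bytes
    ):
        removed.append(kept.pop(0)[0])
    return removed
```

Running the check at segment close, rather than only at boot, would keep the cap enforced during a long flight; that scheduling choice is likewise an assumption.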

ADR-007: Provisioning Tool as a First-Class Component

Context: FC parameter pinning, signing setup, camera calibration, and manifest creation are too safety-critical to leave as manual tribal knowledge.

Decision: add C-12 Provisioning Tool to v1 deliverables.

Alternatives considered:

  1. Manual runbook only - rejected because repeatability and auditability are insufficient.
  2. Fold provisioning into runtime nodes - rejected because provisioning is a ground-time workflow with different privileges and failure modes.

Consequences: component decomposition must include provisioning tasks, manifest validation, FC smoke tests, and Jetson UI guided calibration steps.

9. Known Source Deltas to Resolve Later

These deltas are intentionally not side-edited during Phase 2a. They should be handled as a coordinated cleanup after architecture confirmation or during Plan Step 3 / Decompose:

  • _docs/00_problem/restrictions.md still states z=20 and about 2.8 GB cache; update to z=19 and about 700 MB if this architecture is confirmed.
  • _docs/00_problem/acceptance_criteria.md AC-NEW-3 still describes a large duplicate FDR; rewrite to the slim companion-side FDR contract.
  • _docs/00_problem/acceptance_criteria.md AC-2.2 should explicitly call out the z=19 5x scale-ratio validation and z=20 fallback.
  • _docs/01_solution/solution.md needs Component 1, Component 10, and new Component 12 updates to match this architecture.