gps-denied-onboard/_docs/02_document/tests/environment.md
# Test Environment

## Overview

System under test (SUT): the GPS-Denied Onboard companion-computer software stack — a set of ROS 2 Humble + Isaac ROS 3.2 nodes (cuVSLAM, VPR, cross-view matcher, Component 5 calibrator, Component 1b ortho-tile generator, Component 6 MAVLink bridge, Component 10 FDR, Component 7 health/failsafe, Component 8 object localizer) running on a Jetson Orin Nano Super (or x86+CUDA emulator for non-hardware tiers).

SUT entry points (public interfaces, all black-box):

| Entry point | Protocol | Direction | Bound to | Purpose |
|---|---|---|---|---|
| MAVLink GPS_INPUT | MAVLink2 (signed), serial/UDP | SUT → FC | sysid=11 | Primary position output (AC-4.3, AC-6.3, AC-NEW-1, AC-NEW-2) |
| MAVLink STATUSTEXT / NAMED_VALUE_FLOAT | MAVLink2 (signed) | SUT → GCS | sysid=10 | Telemetry summary, RELOC_REQ (AC-3.4, AC-6.1, AC-6.2) |
| MAVLink RAW_IMU / SCALED_IMU / ATTITUDE / GPS_RAW_INT / EKF_STATUS_REPORT / GLOBAL_POSITION_INT | MAVLink2 | FC → SUT | sysid=10 | IMU + autopilot inputs to cuVSLAM, ortho, source promotion |
| HTTP/HTTPS REST (e.g., /health, /sessions, /objects/locate) | HTTPS+JWT | external → SUT | TBD port | Object localization, health, session management (AC-7.1, AC-8.1 cache interface, results_report rows 27–33) |
| HTTP SSE (/sessions/{id}/stream) | HTTPS+SSE | SUT → external | TBD port | 1 Hz position stream for monitoring (results_report row 32) |
| ROS 2 topics (test-only sniffer) | DDS | SUT internal, observed black-box | topic ports | F-T19 ROS rate sanity test only — NOT used by functional tests |
| MBTiles cache file (read-only check) | SQLite read | external → cache fs | mounted volume | AC-8.3 / AC-8.4 verification at the cache boundary; never reads SUT internals |

Consumer app purpose: a standalone pytest-based black-box test runner exercising the SUT through the MAVLink wire, the HTTP API, and the cache-boundary file artifacts. The runner has no source-code access to the SUT, no Python imports of SUT modules, and no DDS subscriptions to internal-only topics (only the public nav_msgs/Odometry / sensor_msgs/Image subscriptions that are documented as the SUT contract).
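The black-box boundary can be illustrated with a minimal sketch of the kind of filtering the runner applies to sniffed traffic. This is illustrative only: the real runner captures frames with pymavlink on the UDP sniff ports; here messages are modeled as plain dicts, and the field names (`type`, `target_sysid`) are assumptions of this sketch, not the SUT contract.

```python
def filter_gps_input(messages, fc_sysid=11):
    """Keep only GPS_INPUT frames bound for the FC sysid (11 per the table above)."""
    return [
        m for m in messages
        if m.get("type") == "GPS_INPUT" and m.get("target_sysid") == fc_sysid
    ]

# A captured mix of frames: only the first is an FC-bound position output.
captured = [
    {"type": "GPS_INPUT", "target_sysid": 11, "lat": 48.10, "lon": 11.50},
    {"type": "STATUSTEXT", "target_sysid": 10, "text": "RELOC_REQ"},
    {"type": "GPS_INPUT", "target_sysid": 10, "lat": 0.0, "lon": 0.0},
]
fc_bound = filter_gps_input(captured)
```

Everything the runner asserts on is derived from wire-visible frames like these, never from SUT internals.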

## Docker Environment

### Services

| Service | Image / Build | Purpose | Ports |
|---|---|---|---|
| sut | build context ./ (multi-stage Dockerfile producing the JetPack 6 runtime image; built for linux/arm64 on the HW tier and linux/amd64+cuda on the SW-emulation tier) | The full GPS-Denied stack (all ROS 2 nodes) | UDP 14550 (MAVLink to FC), UDP 14560 (MAVLink to GCS), TCP 8443 (HTTPS API), TCP 8080 (HTTP SSE), TCP 9090 (Prometheus metrics) |
| ardupilot-sitl | ardupilot/ardupilot-sitl:4.5-PR30080-pinned | Autopilot SITL (ArduCopter / ArduPlane) — provides FC behaviour for F-T9, F-T11, F-T12, AC-4.3, AC-NEW-1, AC-NEW-2 | UDP 14550 ↔ sut, UDP 14570 ↔ qgc-mock |
| qgc-mock | build ./fixtures/qgc-mock/ (a MAVLink-only mock GCS that records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY and sends operator hints) | Records GCS-bound telemetry; sends operator re-localization hints (AC-6.1, AC-6.2, AC-3.4) | UDP 14570 |
| tile-cache-init | build ./fixtures/tile-cache-init/ (one-shot loader that materialises fixtures/satellite_tiles_AD0000xx_z20/ MBTiles + sidecar) | Pre-populates the satellite cache before each test | — (one-shot) |
| gps-spoof-injector | build ./fixtures/gps-spoof-injector/ (publishes GPS_RAW_INT with crafted lat/lon/sat/hdop) | F-T12 / AC-NEW-2 spoof scenarios | UDP 14571 → sut |
| e2e-runner | build ./e2e/ (Python 3.11 + pytest + pymavlink + httpx + pyserial) | Black-box test runner | — |
| prom | prom/prometheus:v2.51.0 | Scrape SUT metrics (CPU, GPU, temp) for NF-T2 / NF-T3 / AC-4.2 / AC-NEW-5 | TCP 9091 |
| nvidia-smi-exporter | utkuozdemir/nvidia_gpu_exporter:1.2.0 (HW tier only) | Jetson tegrastats / nvidia-smi metrics | TCP 9092 |

### Networks

| Network | Services | Purpose |
|---|---|---|
| e2e-mavlink-net | sut, ardupilot-sitl, qgc-mock, gps-spoof-injector | MAVLink traffic (single broadcast domain so distinct sysids share routing realistically) |
| e2e-api-net | sut, e2e-runner | HTTPS + SSE traffic for the object-localization / health endpoints |
| e2e-metrics-net | sut, prom, nvidia-smi-exporter, e2e-runner | Resource-monitoring scrape path |

### Volumes

| Volume | Mounted to | Purpose |
|---|---|---|
| tile-cache | sut:/var/lib/gpsdenied/tiles (rw), tile-cache-init:/init/tiles (rw), e2e-runner:/probe/tiles (ro) | Persistent satellite + onboard tile cache (AC-8.3, AC-8.4) |
| fdr | sut:/var/lib/gpsdenied/fdr (rw), e2e-runner:/probe/fdr (ro) | Flight Data Recorder output (AC-NEW-3) |
| fixtures-images | sut:/fixtures/images (ro), e2e-runner:/fixtures/images (ro) | The 60 nav-cam JPGs + the AerialVL S03 slice |
| fixtures-imu | sut:/fixtures/imu (ro), ardupilot-sitl:/fixtures/imu (ro) | SITL-replay IMU traces (AerialVL S03 + synthetic from coordinates.csv) |
| fixtures-expected | e2e-runner:/fixtures/expected_results (ro) | _docs/00_problem/input_data/expected_results/ mounted into the runner |
| e2e-results | e2e-runner:/results (rw, host bind) | CSV report output |

### docker-compose structure

```yaml
# Outline only — not runnable code
services:
  sut:
    build: .
    networks: [e2e-mavlink-net, e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/var/lib/gpsdenied/tiles
      - fdr:/var/lib/gpsdenied/fdr
      - fixtures-images:/fixtures/images:ro
      - fixtures-imu:/fixtures/imu:ro
    environment:
      - MAVLINK_FC_URL=udp://ardupilot-sitl:14550
      - MAVLINK_GCS_URL=udp://qgc-mock:14570
      - GPSD_API_BIND=0.0.0.0:8443
      - GPSD_TILE_DIR=/var/lib/gpsdenied/tiles
      - GPSD_FDR_DIR=/var/lib/gpsdenied/fdr
    runtime: nvidia      # HW tier
  ardupilot-sitl:
    image: ardupilot/ardupilot-sitl:4.5-PR30080-pinned
    networks: [e2e-mavlink-net]
    volumes:
      - fixtures-imu:/fixtures/imu:ro   # per the Volumes table above
    command: ["--vehicle=ArduPlane", "--frame=plane", "--imu-replay=/fixtures/imu/AD0000xx.csv"]
  qgc-mock:
    build: ./fixtures/qgc-mock/
    networks: [e2e-mavlink-net]
  tile-cache-init:
    build: ./fixtures/tile-cache-init/
    volumes:
      - tile-cache:/init/tiles
    restart: "no"
  gps-spoof-injector:
    build: ./fixtures/gps-spoof-injector/
    networks: [e2e-mavlink-net]
  e2e-runner:
    build: ./e2e/
    depends_on: [sut, ardupilot-sitl, qgc-mock, tile-cache-init]
    networks: [e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/probe/tiles:ro
      - fdr:/probe/fdr:ro
      - fixtures-images:/fixtures/images:ro
      - fixtures-expected:/fixtures/expected_results:ro
      - e2e-results:/results
    command: ["pytest", "-q", "--junit-xml=/results/junit.xml", "--csv=/results/report.csv"]
  prom:
    image: prom/prometheus:v2.51.0
    networks: [e2e-metrics-net]
```

## Consumer Application

Tech stack: Python 3.11 / pytest 8.x / pymavlink (matching the SUT version) / httpx[http2] / pyserial / numpy / pandas / pytest-csv / pytest-timeout. No SUT source imports.

Entry point: pytest -q inside e2e-runner, with marker-based selection per tier (pytest -m "blackbox and pipeline" → 60-image slice; pytest -m "blackbox and deferred-corpus" → AerialVL S03; etc.).
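The tier-to-marker mapping can be sketched as a small helper. The tier and marker names below come from this document; the helper itself (`pytest_args`) is hypothetical, not part of the runner's published CLI.

```python
# Marker expressions per tier, as documented for the e2e-runner entry point.
TIER_MARKERS = {
    "pipeline": "blackbox and pipeline",
    "deferred-corpus": "blackbox and deferred-corpus",
    "deferred-sitl": "blackbox and deferred-sitl",
    "deferred-hil": "blackbox and deferred-hil",
    "deferred-field": "blackbox and deferred-field",
}

def pytest_args(tier: str) -> list:
    """Build the pytest invocation arguments for one tier's marker expression."""
    return ["-q", "-m", TIER_MARKERS[tier]]
```

For example, `pytest_args("pipeline")` yields the `-m "blackbox and pipeline"` selection that runs the 60-image slice.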

### Communication with system under test

| Interface | Protocol | Endpoint / Topic | Authentication |
|---|---|---|---|
| GPS_INPUT capture | MAVLink2 over UDP | udp://qgc-mock:14570 (sniffed) and udp://ardupilot-sitl:14550 (target) | MAVLink2 signing key shared with FC for round-trip verification |
| STATUSTEXT / NAMED_VALUE_FLOAT capture | MAVLink2 over UDP | udp://qgc-mock:14570 (sniffed) | MAVLink2 signing key |
| Object localization | HTTPS + JSON | POST sut:8443/objects/locate | JWT bearer (test-only key in e2e-runner config) |
| Health probe | HTTPS + JSON | GET sut:8443/health | JWT bearer |
| Session management | HTTPS + JSON | POST sut:8443/sessions, GET sut:8443/sessions/{id}/stream | JWT bearer |
| Operator hint | MAVLink2 STATUSTEXT | injected via qgc-mock | MAVLink2 signing key |
| Spoofed GPS injection | MAVLink2 GPS_RAW_INT | injected via gps-spoof-injector (separate sysid) | MAVLink2 signing key |
| Tile cache file probe | filesystem read | /probe/tiles/*.mbtiles + sidecar JSON | — (read-only mount) |
| FDR file probe | filesystem read | /probe/fdr/**/* | — (read-only mount) |
| Metrics scrape | HTTP GET | prom:9091/api/v1/query?… | — (test net only) |
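The metrics-scrape row uses the standard Prometheus instant-query HTTP API. A stdlib sketch of building such a query URL for the NF-T2 / NF-T3 resource checks (the metric name `gpsd_cpu_percent` is a placeholder, not a documented SUT metric):

```python
from urllib.parse import urlencode

def prom_query_url(expr: str, base: str = "http://prom:9091") -> str:
    """Build a Prometheus /api/v1/query URL with the PromQL expression encoded."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"

# e.g. peak CPU over the last 5 minutes of a soak run
url = prom_query_url("max_over_time(gpsd_cpu_percent[5m])")
```

The runner would issue this GET over e2e-metrics-net and assert on the returned JSON `data.result` values.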

### What the consumer does NOT have access to

- No direct DB / SQLite write access against the SUT's tile or FDR stores.
- No Python imports of SUT modules.
- No DDS subscriptions to internal-only topics (e.g., the matcher's intermediate keypoint topic, the calibrator's residual topic) — only the documented contract topics consumed in F-T19.
- No CUDA context, no shared memory, no /proc access into the SUT container.
- No log-file scraping that bypasses the public health/STATUSTEXT path.

## Test Tiers

The runner stratifies execution by what artefact set is present. Each tier maps to a pytest marker and to a data_status column value in traceability-matrix.md.

| Tier | Marker | Corpus / fixtures required | Coverage scope |
|---|---|---|---|
| T1 pipeline-correctness | pipeline | _docs/00_problem/input_data/ 60-image slice + coordinates.csv + placeholder satellite tiles + SITL-replayed IMU | Validates pipeline plumbing only, NOT deployment-binding numbers (per Phase 1 D2). |
| T2 deferred-corpus | deferred-corpus | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight | Deployment-binding accuracy & drift for AC-1.1, AC-1.2, AC-1.3, AC-2.1, AC-2.2, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9. |
| T3 deferred-sitl | deferred-sitl | ArduPilot SITL pinned to PR #30080-class build + scripted scenarios | F-T9 source-switching matrix (AC-4.3, AC-NEW-2). |
| T4 deferred-hil | deferred-hil | Real Jetson Orin Nano Super on bench + thermal chamber + bench MAVLink loop | AC-4.1 latency on real HW, AC-4.2 memory cap, AC-NEW-5 thermal envelope, AC-NEW-1 cold-start TTFF on real HW. |
| T5 deferred-field | deferred-field | Recorded fixed-wing sortie FT-1 / FT-2 / FT-3 | Final field validation. |

Pipeline-tier (T1) tests are the only ones whose pass/fail numbers are NOT treated as deployment evidence — they verify that the pipeline produces output of the right shape, not that the output meets the deployment-binding accuracy budget. Deployment-binding tests live in T2–T5.

## CI/CD Integration

| Tier | When to run | Pipeline stage | Gate behavior | Timeout |
|---|---|---|---|---|
| T1 pipeline | Every PR to dev; nightly | After unit tests | Block merge on FAIL | 30 min |
| T2 deferred-corpus | Nightly; on tag push | Pre-release | Block release on FAIL | 4 h (Monte Carlo NF-T4 dominates) |
| T3 deferred-sitl | Nightly | Pre-release | Block release on FAIL | 1 h |
| T4 deferred-hil | Bench-on-demand + weekly thermal cycle | Bench-only stage | Manual approval | 12 h (NF-T3 8 h soak) |
| T5 deferred-field | Field-test plan (per-sortie) | Field stage | Out-of-band sign-off | per sortie |

## Reporting

Format: CSV (one row per test execution) plus JUnit XML for CI.

CSV columns: test_id, test_name, tier, marker, traces_to_acs (semicolon-joined), traces_to_restricts, data_status (present / deferred-corpus / deferred-sitl / deferred-hil / deferred-field), started_at, execution_time_ms, result (PASS / FAIL / SKIP / BLOCKED-DATA), expected_metric, actual_metric, tolerance, error_message (if FAIL or BLOCKED-DATA), git_sha, image_tag, runner_host.
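A stdlib sketch of enforcing that column contract in the runner; the column names are from this document, while the `report_row` helper itself is illustrative, not the runner's actual API.

```python
import csv
import io

# Column order as specified for report.csv.
COLUMNS = [
    "test_id", "test_name", "tier", "marker", "traces_to_acs",
    "traces_to_restricts", "data_status", "started_at", "execution_time_ms",
    "result", "expected_metric", "actual_metric", "tolerance",
    "error_message", "git_sha", "image_tag", "runner_host",
]

def report_row(**fields):
    """Return a full row dict, defaulting unset columns to empty strings."""
    unknown = set(fields) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown report columns: {sorted(unknown)}")
    return {c: fields.get(c, "") for c in COLUMNS}

# Write one example row; traces_to_acs is semicolon-joined per the spec.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(report_row(test_id="F-T12", tier="T3", marker="deferred-sitl",
                           traces_to_acs="AC-NEW-2;AC-4.3", result="PASS"))
```

Keeping the schema in one place makes the derived coverage_by_ac.csv and per_tier.csv reports simple groupbys over the same rows.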

Output paths:

- e2e-results:/results/report.csv — primary CSV report
- e2e-results:/results/junit.xml — JUnit XML
- e2e-results:/results/coverage_by_ac.csv — derived: AC → covering test IDs → aggregate result
- e2e-results:/results/per_tier.csv — derived: tier → pass/fail/skip/blocked-data counts

BLOCKED-DATA handling: when a test's required fixture is missing (e.g., AerialVL S03 not yet downloaded in CI), the test must emit BLOCKED-DATA rather than FAIL or SKIP — this preserves the data_status signal in the matrix without polluting the failure rate.
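The BLOCKED-DATA rule can be sketched as a small classification helper: a missing required fixture must surface as BLOCKED-DATA, never FAIL or SKIP. The fixture name and `/nonexistent` root below are illustrative.

```python
from pathlib import Path

def classify(required_fixtures, run_test, root):
    """Return (result, detail): BLOCKED-DATA if any fixture is absent,
    otherwise PASS/FAIL from actually running the test callable."""
    missing = [f for f in required_fixtures if not (root / f).exists()]
    if missing:
        return "BLOCKED-DATA", "missing fixtures: " + ", ".join(missing)
    return ("PASS" if run_test() else "FAIL"), ""

# A corpus that is not mounted in this environment blocks rather than fails:
result, detail = classify(["aerialvl_s03"], run_test=lambda: True,
                          root=Path("/nonexistent"))
```

Checking fixture presence before invoking the test body keeps the failure rate clean while the data_status column still records why the test did not run.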

## Test Execution

Decision: both (per-tier split). The system is hardware-dependent (Jetson Orin Nano Super + CUDA + TensorRT + thermal envelope + USB/MIPI cameras + MAVLink hardware loop), so execution is split between Docker (T1/T2/T3 — pipeline-correctness, deferred-corpus, deferred-sitl) and real-hardware bench / field (T4 deferred-hil, T5 deferred-field).

### Hardware dependencies found

| Source | Indicator |
|---|---|
| _docs/00_problem/restrictions.md:26 | Cameras over USB / MIPI-CSI / GigE |
| _docs/00_problem/restrictions.md:41 | Jetson Orin Nano Super — 67 TOPS INT8, 8 GB LPDDR5, 25 W TDP |
| _docs/00_problem/restrictions.md:42 | JetPack + CUDA + TensorRT |
| _docs/00_problem/restrictions.md:43 | Sustained 25 W for 8 h at upper-envelope temperature (AC-NEW-5) |
| _docs/00_problem/restrictions.md:48-51 | IMU + MAVLink2 from FC (serial/UDP); ArduPilot only |
| _docs/01_solution/solution.md | cuVSLAM (GPU), VPR DINOv2-VLAD (TensorRT), cross-view matcher (TensorRT) |
| this file (environment.md) | runtime: nvidia; linux/arm64 HW tier + linux/amd64+cuda SW-emulation tier; nvidia-smi-exporter |

Source-code scan is deferred to the first implement cycle (no source code yet at Plan Step 1).

### Mode A — Docker (T1 / T2 / T3)

Prerequisites:

- Docker 24.x+ with Compose v2
- For HW-tier runners: NVIDIA Container Toolkit + a host with an NVIDIA GPU (sm_87 for true Orin parity; sm_86 acceptable for SW emulation)
- For SW-emulation runners: linux/amd64 host; CUDA emulation layer enabled in the SUT image's linux/amd64+cuda build target
- T2 only: deferred-corpus volumes mounted (AerialVL S03, etc. — see test-data.md)
- T3 only: ardupilot-sitl PR-#30080-pinned image pulled

Run:

```bash
# T1 pipeline
docker compose -f e2e/docker-compose.test.yml run --rm e2e-runner \
    pytest -m "blackbox and pipeline" --csv=/results/report.csv

# T2 deferred-corpus (corpus volumes must be present)
docker compose -f e2e/docker-compose.test.yml --profile corpus run --rm e2e-runner \
    pytest -m "blackbox and deferred-corpus" --csv=/results/report.csv

# T3 deferred-sitl
docker compose -f e2e/docker-compose.test.yml --profile sitl run --rm e2e-runner \
    pytest -m "blackbox and deferred-sitl" --csv=/results/report.csv
```

Result collection: host bind-mount e2e-results:./results — produces report.csv, junit.xml, coverage_by_ac.csv, per_tier.csv.

Environment variables (key): MAVLINK_FC_URL, MAVLINK_GCS_URL, GPSD_API_BIND, GPSD_TILE_DIR, GPSD_FDR_DIR, MAVLINK2_SIGNING_KEY, JWT_SIGNING_KEY — full list in e2e/.env.example (to be produced in Phase 4 / Decompose).

### Mode B — Local on bench Jetson (T4 deferred-hil)

Prerequisites:

- Real Jetson Orin Nano Super dev kit with JetPack 6.x, CUDA 12.x, TensorRT 10.x
- Bench MAVLink loop (a second Jetson or a USB-MAVLink dongle running ardupilot-sitl against a recorded IMU stream, OR a real autopilot board on the bench)
- Thermal chamber (AC-NEW-5 only; otherwise lab ambient is sufficient for AC-4.1 / AC-4.2 / AC-NEW-1 cold-start / AC-NEW-3 8-h soak)
- tegrastats and nvidia-smi available
- Single-tenant scheduling — no other tests share the Jetson during a T4 run

Run:

```bash
# T4 perf binding on real HW
./scripts/run-tests.sh --tier=t4
# Or specifically the perf script for AC-4.1 / AC-NEW-5 binding
./scripts/run-performance-tests.sh --tier=t4 --thermal-profile=hot-soak
```

Result collection: the bench runner copies report.csv + junit.xml + tegrastats.log + power.csv to a network share (path TBD by Decompose).

### Mode C — Field (T5 deferred-field)

Out-of-band per the field-test plan; not part of CI. Captured here for completeness — the runner is the same e2e-runner image plus a recorded-flight replay harness defined in the field-test plan.

### CI runner mapping

| Tier | CI runner type | Mode | Cadence |
|---|---|---|---|
| T1 pipeline | Linux x86 + NVIDIA GPU (any sm_86+) OR Linux x86 with CUDA emulation | Docker | Every PR + nightly |
| T2 deferred-corpus | Linux x86 + NVIDIA GPU (sm_86+) with corpus volume mounted | Docker | Nightly + on-tag |
| T3 deferred-sitl | Linux x86 (CPU-only OK) | Docker | Nightly |
| T4 deferred-hil | Self-hosted Jetson Orin Nano Super bench runner | Local | Bench-on-demand + weekly thermal cycle |
| T5 deferred-field | n/a (per-sortie, out-of-band) | Field | Per field-test plan |

Phase 4 (run-tests.sh, run-performance-tests.sh) consumes this section to choose between the Docker and bench-local code paths via the --tier= flag.

## External Dependencies

The SUT does not call commercial satellite providers at runtime (AC-8.1). All upstream sourcing is the Suite Satellite Service's responsibility, which is out of scope for this build. The runner therefore supplies the following mocks and stand-ins:

- tile-cache-init provides the cache contents the SUT would normally have synced from the Service pre-flight.
- qgc-mock is a black-box GCS sniffer + operator-hint injector — not a real QGroundControl instance, but it speaks the same MAVLink wire.
- gps-spoof-injector simulates a malicious GPS signal for AC-NEW-2 / F-T12.
- ardupilot-sitl is the only autopilot under test (PX4 is out of scope per restrictions).
- The SUT's HTTPS API is exercised against the SUT directly — there is no upstream identity provider; JWTs are minted by the runner against a test-only signing key shared at SUT start.

No external mocks have access to internal SUT state.
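One way the runner could mint those test-only JWTs with only the standard library is an HS256 sketch like the following. The claim names (`sub`, `aud`) and the shared-secret handling are assumptions of this sketch; the document only states "JWT bearer against a test-only signing key".

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_jwt(claims: dict, key: bytes) -> str:
    """Build an HS256 JWT: header.payload.signature over the shared test key."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

# Claim names and key are illustrative, not the SUT contract.
token = mint_jwt({"sub": "e2e-runner", "aud": "gpsd-api"}, b"test-only-key")
```

The resulting token would be sent as `Authorization: Bearer <token>` on the HTTPS endpoints in the communication table; the SUT validates it against the same key shared at start.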