gps-denied-onboard/_docs/02_document/tests/environment.md
# Test Environment

## Overview

System under test (SUT): the GPS-Denied Onboard companion-computer software stack — a set of ROS 2 Humble + Isaac ROS 3.2 nodes (cuVSLAM, VPR, cross-view matcher, Component 5 calibrator, Component 1b ortho-tile generator, Component 6 MAVLink bridge, Component 10 FDR, Component 7 health/failsafe, Component 8 object localizer) running on a Jetson Orin Nano Super (or x86+CUDA emulator for non-hardware tiers).

SUT entry points (public interfaces, all black-box):

| Entry point | Protocol | Direction | Bound to | Purpose |
|---|---|---|---|---|
| MAVLink GPS_INPUT | MAVLink2 (signed), serial/UDP | SUT → FC | sysid=11 | Primary position output (AC-4.3, AC-6.3, AC-NEW-1, AC-NEW-2) |
| MAVLink STATUSTEXT / NAMED_VALUE_FLOAT | MAVLink2 (signed) | SUT → GCS | sysid=10 | Telemetry summary, RELOC_REQ (AC-3.4, AC-6.1, AC-6.2) |
| MAVLink RAW_IMU / SCALED_IMU / ATTITUDE / GPS_RAW_INT / EKF_STATUS_REPORT / GLOBAL_POSITION_INT | MAVLink2 | FC → SUT | sysid=10 | IMU + autopilot inputs to cuVSLAM, ortho, source promotion |
| HTTP/HTTPS REST (e.g., /health, /sessions, /objects/locate) | HTTPS+JWT | external → SUT | TBD port | Object localization, health, session management (AC-7.1, AC-8.1 cache interface, results_report rows 27–33) |
| HTTP SSE (/sessions/{id}/stream) | HTTPS+SSE | SUT → external | TBD port | 1 Hz position stream for monitoring (results_report row 32) |
| ROS 2 topics (test-only sniffer) | DDS | SUT internal, observed black-box | topic ports | F-T19 ROS rate sanity test only — NOT used by functional tests |
| MBTiles cache file (read-only check) | SQLite read | external → cache fs | mounted volume | AC-8.3 / AC-8.4 verification at the cache boundary; never reads SUT internals |

Consumer app purpose: a standalone pytest-based black-box test runner exercising the SUT through the MAVLink wire, the HTTP API, and the cache-boundary file artifacts. The runner has no source-code access to the SUT, no Python imports of SUT modules, and no DDS subscriptions to internal-only topics (only the public nav_msgs/Odometry / sensor_msgs/Image subscriptions that are documented as the SUT contract).
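The black-box boundary can be illustrated with a minimal sketch of the kind of filtering the runner applies to sniffed traffic. This is illustrative only: the real runner captures frames with pymavlink on the UDP sniff ports; here messages are modeled as plain dicts, and the field names (`type`, `target_sysid`) are assumptions of this sketch, not the SUT contract.

```python
def filter_gps_input(messages, fc_sysid=11):
    """Keep only GPS_INPUT frames bound for the FC sysid (11 per the table above)."""
    return [
        m for m in messages
        if m.get("type") == "GPS_INPUT" and m.get("target_sysid") == fc_sysid
    ]

# A captured mix of frames: only the first is an FC-bound position output.
captured = [
    {"type": "GPS_INPUT", "target_sysid": 11, "lat": 48.10, "lon": 11.50},
    {"type": "STATUSTEXT", "target_sysid": 10, "text": "RELOC_REQ"},
    {"type": "GPS_INPUT", "target_sysid": 10, "lat": 0.0, "lon": 0.0},
]
fc_bound = filter_gps_input(captured)
```

Everything the runner asserts on is derived from wire-visible frames like these, never from SUT internals.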

## Docker Environment

### Services

| Service | Image / Build | Purpose | Ports |
|---|---|---|---|
| sut | build context ./ (multi-stage Dockerfile producing the JetPack 6 runtime image; built for linux/arm64 on the HW tier and linux/amd64+cuda on the SW-emulation tier) | The full GPS-Denied stack (all ROS 2 nodes) | UDP 14550 (MAVLink to FC), UDP 14560 (MAVLink to GCS), TCP 8443 (HTTPS API), TCP 8080 (HTTP SSE), TCP 9090 (Prometheus metrics) |
| ardupilot-sitl | ardupilot/ardupilot-sitl:4.5-PR30080-pinned | Autopilot SITL (ArduCopter / ArduPlane) — provides FC behaviour for F-T9, F-T11, F-T12, AC-4.3, AC-NEW-1, AC-NEW-2 | UDP 14550 ↔ sut, UDP 14570 ↔ qgc-mock |
| qgc-mock | build ./fixtures/qgc-mock/ (a MAVLink-only mock GCS that records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY and sends operator hints) | Records GCS-bound telemetry; sends operator re-localization hints (AC-6.1, AC-6.2, AC-3.4) | UDP 14570 |
| tile-cache-init | build ./fixtures/tile-cache-init/ (one-shot loader that materialises fixtures/satellite_tiles_AD0000xx_z20/ MBTiles + sidecar) | Pre-populates the satellite cache before each test | — (one-shot) |
| gps-spoof-injector | build ./fixtures/gps-spoof-injector/ (publishes GPS_RAW_INT with crafted lat/lon/sat/hdop) | F-T12 / AC-NEW-2 spoof scenarios | UDP 14571 → sut |
| e2e-runner | build ./e2e/ (Python 3.11 + pytest + pymavlink + httpx + pyserial) | Black-box test runner | — |
| prom | prom/prometheus:v2.51.0 | Scrape SUT metrics (CPU, GPU, temp) for NF-T2 / NF-T3 / AC-4.2 / AC-NEW-5 | TCP 9091 |
| nvidia-smi-exporter | utkuozdemir/nvidia_gpu_exporter:1.2.0 (HW tier only) | Jetson tegrastats / nvidia-smi metrics | TCP 9092 |

### Networks

| Network | Services | Purpose |
|---|---|---|
| e2e-mavlink-net | sut, ardupilot-sitl, qgc-mock, gps-spoof-injector | MAVLink traffic (single broadcast domain so distinct sysids share routing realistically) |
| e2e-api-net | sut, e2e-runner | HTTPS + SSE traffic for the object-localization / health endpoints |
| e2e-metrics-net | sut, prom, nvidia-smi-exporter, e2e-runner | Resource-monitoring scrape path |

### Volumes

| Volume | Mounted to | Purpose |
|---|---|---|
| tile-cache | sut:/var/lib/gpsdenied/tiles (rw), tile-cache-init:/init/tiles (rw), e2e-runner:/probe/tiles (ro) | Persistent satellite + onboard tile cache (AC-8.3, AC-8.4) |
| fdr | sut:/var/lib/gpsdenied/fdr (rw), e2e-runner:/probe/fdr (ro) | Flight Data Recorder output (AC-NEW-3) |
| fixtures-images | sut:/fixtures/images (ro), e2e-runner:/fixtures/images (ro) | The 60 nav-cam JPGs + the AerialVL S03 slice |
| fixtures-imu | sut:/fixtures/imu (ro), ardupilot-sitl:/fixtures/imu (ro) | SITL-replay IMU traces (AerialVL S03 + synthetic from coordinates.csv) |
| fixtures-expected | e2e-runner:/fixtures/expected_results (ro) | _docs/00_problem/input_data/expected_results/ mounted into the runner |
| e2e-results | e2e-runner:/results (rw, host bind) | CSV report output |

### docker-compose structure

```yaml
# Outline only — not runnable code
services:
  sut:
    build: .
    networks: [e2e-mavlink-net, e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/var/lib/gpsdenied/tiles
      - fdr:/var/lib/gpsdenied/fdr
      - fixtures-images:/fixtures/images:ro
      - fixtures-imu:/fixtures/imu:ro
    environment:
      - MAVLINK_FC_URL=udp://ardupilot-sitl:14550
      - MAVLINK_GCS_URL=udp://qgc-mock:14570
      - GPSD_API_BIND=0.0.0.0:8443
      - GPSD_TILE_DIR=/var/lib/gpsdenied/tiles
      - GPSD_FDR_DIR=/var/lib/gpsdenied/fdr
    runtime: nvidia      # HW tier
  ardupilot-sitl:
    image: ardupilot/ardupilot-sitl:4.5-PR30080-pinned
    networks: [e2e-mavlink-net]
    volumes:
      - fixtures-imu:/fixtures/imu:ro   # per the Volumes table above
    command: ["--vehicle=ArduPlane", "--frame=plane", "--imu-replay=/fixtures/imu/AD0000xx.csv"]
  qgc-mock:
    build: ./fixtures/qgc-mock/
    networks: [e2e-mavlink-net]
  tile-cache-init:
    build: ./fixtures/tile-cache-init/
    volumes:
      - tile-cache:/init/tiles
    restart: "no"
  gps-spoof-injector:
    build: ./fixtures/gps-spoof-injector/
    networks: [e2e-mavlink-net]
  e2e-runner:
    build: ./e2e/
    depends_on: [sut, ardupilot-sitl, qgc-mock, tile-cache-init]
    networks: [e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/probe/tiles:ro
      - fdr:/probe/fdr:ro
      - fixtures-images:/fixtures/images:ro
      - fixtures-expected:/fixtures/expected_results:ro
      - e2e-results:/results
    command: ["pytest", "-q", "--junit-xml=/results/junit.xml", "--csv=/results/report.csv"]
  prom:
    image: prom/prometheus:v2.51.0
    networks: [e2e-metrics-net]
```

## Consumer Application

Tech stack: Python 3.11 / pytest 8.x / pymavlink (matching the SUT version) / httpx[http2] / pyserial / numpy / pandas / pytest-csv / pytest-timeout. No SUT source imports.

Entry point: pytest -q inside e2e-runner, with marker-based selection per tier (pytest -m "blackbox and pipeline" → 60-image slice; pytest -m "blackbox and deferred-corpus" → AerialVL S03; etc.).
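The tier-to-marker mapping can be sketched as a small helper. The tier and marker names below come from this document; the helper itself (`pytest_args`) is hypothetical, not part of the runner's published CLI.

```python
# Marker expressions per tier, as documented for the e2e-runner entry point.
TIER_MARKERS = {
    "pipeline": "blackbox and pipeline",
    "deferred-corpus": "blackbox and deferred-corpus",
    "deferred-sitl": "blackbox and deferred-sitl",
    "deferred-hil": "blackbox and deferred-hil",
    "deferred-field": "blackbox and deferred-field",
}

def pytest_args(tier: str) -> list:
    """Build the pytest invocation arguments for one tier's marker expression."""
    return ["-q", "-m", TIER_MARKERS[tier]]
```

For example, `pytest_args("pipeline")` yields the `-m "blackbox and pipeline"` selection that runs the 60-image slice.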

### Communication with system under test

| Interface | Protocol | Endpoint / Topic | Authentication |
|---|---|---|---|
| GPS_INPUT capture | MAVLink2 over UDP | udp://qgc-mock:14570 (sniffed) and udp://ardupilot-sitl:14550 (target) | MAVLink2 signing key shared with FC for round-trip verification |
| STATUSTEXT / NAMED_VALUE_FLOAT capture | MAVLink2 over UDP | udp://qgc-mock:14570 (sniffed) | MAVLink2 signing key |
| Object localization | HTTPS + JSON | POST sut:8443/objects/locate | JWT bearer (test-only key in e2e-runner config) |
| Health probe | HTTPS + JSON | GET sut:8443/health | JWT bearer |
| Session management | HTTPS + JSON | POST sut:8443/sessions, GET sut:8443/sessions/{id}/stream | JWT bearer |
| Operator hint | MAVLink2 STATUSTEXT | injected via qgc-mock | MAVLink2 signing key |
| Spoofed GPS injection | MAVLink2 GPS_RAW_INT | injected via gps-spoof-injector (separate sysid) | MAVLink2 signing key |
| Tile cache file probe | filesystem read | /probe/tiles/*.mbtiles + sidecar JSON | — (read-only mount) |
| FDR file probe | filesystem read | /probe/fdr/**/* | — (read-only mount) |
| Metrics scrape | HTTP GET | prom:9091/api/v1/query?… | — (test net only) |
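The metrics-scrape row uses the standard Prometheus instant-query HTTP API. A stdlib sketch of building such a query URL for the NF-T2 / NF-T3 resource checks (the metric name `gpsd_cpu_percent` is a placeholder, not a documented SUT metric):

```python
from urllib.parse import urlencode

def prom_query_url(expr: str, base: str = "http://prom:9091") -> str:
    """Build a Prometheus /api/v1/query URL with the PromQL expression encoded."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"

# e.g. peak CPU over the last 5 minutes of a soak run
url = prom_query_url("max_over_time(gpsd_cpu_percent[5m])")
```

The runner would issue this GET over e2e-metrics-net and assert on the returned JSON `data.result` values.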

### What the consumer does NOT have access to

- No direct DB / SQLite write access against the SUT's tile or FDR stores.
- No Python imports of SUT modules.
- No DDS subscriptions to internal-only topics (e.g., the matcher's intermediate keypoint topic, the calibrator's residual topic) — only the documented contract topics consumed in F-T19.
- No CUDA context, no shared memory, no /proc access into the SUT container.
- No log-file scraping that bypasses the public health/STATUSTEXT path.

## Test Tiers

The runner stratifies execution by what artefact set is present. Each tier maps to a pytest marker and to a data_status column value in traceability-matrix.md.

| Tier | Marker | Corpus / fixtures required | Coverage scope |
|---|---|---|---|
| T1 pipeline-correctness | pipeline | _docs/00_problem/input_data/ 60-image slice + coordinates.csv + placeholder satellite tiles + SITL-replayed IMU | Validates pipeline plumbing only, NOT deployment-binding numbers (per Phase 1 D2). |
| T2 deferred-corpus | deferred-corpus | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight | Deployment-binding accuracy & drift for AC-1.1, AC-1.2, AC-1.3, AC-2.1, AC-2.2, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9. |
| T3 deferred-sitl | deferred-sitl | ArduPilot SITL pinned to PR #30080-class build + scripted scenarios | F-T9 source-switching matrix (AC-4.3, AC-NEW-2). |
| T4 deferred-hil | deferred-hil | Real Jetson Orin Nano Super on bench + thermal chamber + bench MAVLink loop | AC-4.1 latency on real HW, AC-4.2 memory cap, AC-NEW-5 thermal envelope, AC-NEW-1 cold-start TTFF on real HW. |
| T5 deferred-field | deferred-field | Recorded fixed-wing sortie FT-1 / FT-2 / FT-3 | Final field validation. |

Pipeline-tier (T1) tests are the only ones whose pass/fail numbers are NOT treated as deployment evidence — they verify that the pipeline produces output of the right shape, not that the output meets the deployment-binding accuracy budget. Deployment-binding tests live in T2–T5.

## CI/CD Integration

| Tier | When to run | Pipeline stage | Gate behavior | Timeout |
|---|---|---|---|---|
| T1 pipeline | Every PR to dev; nightly | After unit tests | Block merge on FAIL | 30 min |
| T2 deferred-corpus | Nightly; on tag push | Pre-release | Block release on FAIL | 4 h (Monte Carlo NF-T4 dominates) |
| T3 deferred-sitl | Nightly | Pre-release | Block release on FAIL | 1 h |
| T4 deferred-hil | Bench-on-demand + weekly thermal cycle | Bench-only stage | Manual approval | 12 h (NF-T3 8 h soak) |
| T5 deferred-field | Field-test plan (per-sortie) | Field stage | Out-of-band sign-off | per sortie |

## Reporting

Format: CSV (one row per test execution) plus JUnit XML for CI.

CSV columns: test_id, test_name, tier, marker, traces_to_acs (semicolon-joined), traces_to_restricts, data_status (present / deferred-corpus / deferred-sitl / deferred-hil / deferred-field), started_at, execution_time_ms, result (PASS / FAIL / SKIP / BLOCKED-DATA), expected_metric, actual_metric, tolerance, error_message (if FAIL or BLOCKED-DATA), git_sha, image_tag, runner_host.
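A stdlib sketch of enforcing that column contract in the runner; the column names are from this document, while the `report_row` helper itself is illustrative, not the runner's actual API.

```python
import csv
import io

# Column order as specified for report.csv.
COLUMNS = [
    "test_id", "test_name", "tier", "marker", "traces_to_acs",
    "traces_to_restricts", "data_status", "started_at", "execution_time_ms",
    "result", "expected_metric", "actual_metric", "tolerance",
    "error_message", "git_sha", "image_tag", "runner_host",
]

def report_row(**fields):
    """Return a full row dict, defaulting unset columns to empty strings."""
    unknown = set(fields) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown report columns: {sorted(unknown)}")
    return {c: fields.get(c, "") for c in COLUMNS}

# Write one example row; traces_to_acs is semicolon-joined per the spec.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(report_row(test_id="F-T12", tier="T3", marker="deferred-sitl",
                           traces_to_acs="AC-NEW-2;AC-4.3", result="PASS"))
```

Keeping the schema in one place makes the derived coverage_by_ac.csv and per_tier.csv reports simple groupbys over the same rows.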

Output paths:

- e2e-results:/results/report.csv — primary CSV report
- e2e-results:/results/junit.xml — JUnit XML
- e2e-results:/results/coverage_by_ac.csv — derived: AC → covering test IDs → aggregate result
- e2e-results:/results/per_tier.csv — derived: tier → pass/fail/skip/blocked-data counts

BLOCKED-DATA handling: when a test's required fixture is missing (e.g., AerialVL S03 not yet downloaded in CI), the test must emit BLOCKED-DATA rather than FAIL or SKIP — this preserves the data_status signal in the matrix without polluting the failure rate.
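The BLOCKED-DATA rule can be sketched as a small classification helper: a missing required fixture must surface as BLOCKED-DATA, never FAIL or SKIP. The fixture name and `/nonexistent` root below are illustrative.

```python
from pathlib import Path

def classify(required_fixtures, run_test, root):
    """Return (result, detail): BLOCKED-DATA if any fixture is absent,
    otherwise PASS/FAIL from actually running the test callable."""
    missing = [f for f in required_fixtures if not (root / f).exists()]
    if missing:
        return "BLOCKED-DATA", "missing fixtures: " + ", ".join(missing)
    return ("PASS" if run_test() else "FAIL"), ""

# A corpus that is not mounted in this environment blocks rather than fails:
result, detail = classify(["aerialvl_s03"], run_test=lambda: True,
                          root=Path("/nonexistent"))
```

Checking fixture presence before invoking the test body keeps the failure rate clean while the data_status column still records why the test did not run.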

## Test Execution

Decision: both (per-tier split). The system is hardware-dependent (Jetson Orin Nano Super + CUDA + TensorRT + thermal envelope + USB/MIPI cameras + MAVLink hardware loop), so execution is split between Docker (T1/T2/T3 — pipeline-correctness, deferred-corpus, deferred-sitl) and real-hardware bench / field (T4 deferred-hil, T5 deferred-field).

### Hardware dependencies found

| Source | Indicator |
|---|---|
| _docs/00_problem/restrictions.md:26 | Cameras over USB / MIPI-CSI / GigE |
| _docs/00_problem/restrictions.md:41 | Jetson Orin Nano Super — 67 TOPS INT8, 8 GB LPDDR5, 25 W TDP |
| _docs/00_problem/restrictions.md:42 | JetPack + CUDA + TensorRT |
| _docs/00_problem/restrictions.md:43 | Sustained 25 W for 8 h at upper-envelope temperature (AC-NEW-5) |
| _docs/00_problem/restrictions.md:48-51 | IMU + MAVLink2 from FC (serial/UDP); ArduPilot only |
| _docs/01_solution/solution.md | cuVSLAM (GPU), VPR DINOv2-VLAD (TensorRT), cross-view matcher (TensorRT) |
| this file (environment.md) | runtime: nvidia; linux/arm64 HW tier + linux/amd64+cuda SW-emulation tier; nvidia-smi-exporter |

Source-code scan is deferred to the first implement cycle (no source code yet at Plan Step 1).

### Mode A — Docker (T1 / T2 / T3)

Prerequisites:

- Docker 24.x+ with Compose v2
- For HW-tier runners: NVIDIA Container Toolkit + a host with an NVIDIA GPU (sm_87 for true Orin parity; sm_86 acceptable for SW emulation)
- For SW-emulation runners: linux/amd64 host; CUDA emulation layer enabled in the SUT image's linux/amd64+cuda build target
- T2 only: deferred-corpus volumes mounted (AerialVL S03, etc. — see test-data.md)
- T3 only: ardupilot-sitl PR-#30080-pinned image pulled

Run:

```bash
# T1 pipeline
docker compose -f e2e/docker-compose.test.yml run --rm e2e-runner \
    pytest -m "blackbox and pipeline" --csv=/results/report.csv

# T2 deferred-corpus (corpus volumes must be present)
docker compose -f e2e/docker-compose.test.yml --profile corpus run --rm e2e-runner \
    pytest -m "blackbox and deferred-corpus" --csv=/results/report.csv

# T3 deferred-sitl
docker compose -f e2e/docker-compose.test.yml --profile sitl run --rm e2e-runner \
    pytest -m "blackbox and deferred-sitl" --csv=/results/report.csv
```

Result collection: host bind-mount e2e-results:./results — produces report.csv, junit.xml, coverage_by_ac.csv, per_tier.csv.

Environment variables (key): MAVLINK_FC_URL, MAVLINK_GCS_URL, GPSD_API_BIND, GPSD_TILE_DIR, GPSD_FDR_DIR, MAVLINK2_SIGNING_KEY, JWT_SIGNING_KEY — full list in e2e/.env.example (to be produced in Phase 4 / Decompose).

### Mode B — Local on bench Jetson (T4 deferred-hil)

Prerequisites:

- Real Jetson Orin Nano Super dev kit with JetPack 6.x, CUDA 12.x, TensorRT 10.x
- Bench MAVLink loop (a second Jetson or a USB-MAVLink dongle running ardupilot-sitl against a recorded IMU stream, OR a real autopilot board on the bench)
- Thermal chamber (AC-NEW-5 only; otherwise lab ambient is sufficient for AC-4.1 / AC-4.2 / AC-NEW-1 cold-start / AC-NEW-3 8-h soak)
- tegrastats and nvidia-smi available
- Single-tenant scheduling — no other tests share the Jetson during a T4 run

Run:

```bash
# T4 perf binding on real HW
./scripts/run-tests.sh --tier=t4
# Or specifically the perf script for AC-4.1 / AC-NEW-5 binding
./scripts/run-performance-tests.sh --tier=t4 --thermal-profile=hot-soak
```

Result collection: the bench runner copies report.csv + junit.xml + tegrastats.log + power.csv to a network share (path TBD by Decompose).

### Mode C — Field (T5 deferred-field)

Out-of-band per the field-test plan; not part of CI. Captured here for completeness — the runner is the same e2e-runner image plus a recorded-flight replay harness defined in the field-test plan.

### CI runner mapping

| Tier | CI runner type | Mode | Cadence |
|---|---|---|---|
| T1 pipeline | Linux x86 + NVIDIA GPU (any sm_86+) OR Linux x86 with CUDA emulation | Docker | Every PR + nightly |
| T2 deferred-corpus | Linux x86 + NVIDIA GPU (sm_86+) with corpus volume mounted | Docker | Nightly + on-tag |
| T3 deferred-sitl | Linux x86 (CPU-only OK) | Docker | Nightly |
| T4 deferred-hil | Self-hosted Jetson Orin Nano Super bench runner | Local | Bench-on-demand + weekly thermal cycle |
| T5 deferred-field | n/a (per-sortie, out-of-band) | Field | Per field-test plan |

Phase 4 (run-tests.sh, run-performance-tests.sh) consumes this section to choose between the Docker and bench-local code paths via the --tier= flag.

## External Dependencies

The SUT does not call commercial satellite providers at runtime (AC-8.1). All upstream sourcing is the Suite Satellite Service's responsibility, which is out of scope for this build. The runner therefore supplies the following mocks and stand-ins:

- tile-cache-init provides the cache contents the SUT would normally have synced from the Service pre-flight.
- qgc-mock is a black-box GCS sniffer + operator-hint injector — not a real QGroundControl instance, but it speaks the same MAVLink wire.
- gps-spoof-injector simulates a malicious GPS signal for AC-NEW-2 / F-T12.
- ardupilot-sitl is the only autopilot under test (PX4 is out of scope per restrictions).
- The SUT's HTTPS API is exercised against the SUT directly — there is no upstream identity provider; JWTs are minted by the runner against a test-only signing key shared at SUT start.

No external mocks have access to internal SUT state.
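One way the runner could mint those test-only JWTs with only the standard library is an HS256 sketch like the following. The claim names (`sub`, `aud`) and the shared-secret handling are assumptions of this sketch; the document only states "JWT bearer against a test-only signing key".

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_jwt(claims: dict, key: bytes) -> str:
    """Build an HS256 JWT: header.payload.signature over the shared test key."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

# Claim names and key are illustrative, not the SUT contract.
token = mint_jwt({"sub": "e2e-runner", "aud": "gpsd-api"}, b"test-only-key")
```

The resulting token would be sent as `Authorization: Bearer <token>` on the HTTPS endpoints in the communication table; the SUT validates it against the same key shared at start.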