Test Environment
Overview
System under test (SUT): the GPS-Denied Onboard companion-computer software stack — a set of ROS 2 Humble + Isaac ROS 3.2 nodes (cuVSLAM, VPR, cross-view matcher, Component 5 calibrator, Component 1b ortho-tile generator, Component 6 MAVLink bridge, Component 10 FDR, Component 7 health/failsafe, Component 8 object localizer) running on a Jetson Orin Nano Super (or x86+CUDA emulator for non-hardware tiers).
SUT entry points (public interfaces, all black-box):
| Entry point | Protocol | Direction | Bound to | Purpose |
|---|---|---|---|---|
| MAVLink GPS_INPUT | MAVLink2 (signed), serial/UDP | SUT → FC | sysid=11 | Primary position output (AC-4.3, AC-6.3, AC-NEW-1, AC-NEW-2) |
| MAVLink STATUSTEXT / NAMED_VALUE_FLOAT | MAVLink2 (signed) | SUT → GCS | sysid=10 | Telemetry summary, RELOC_REQ (AC-3.4, AC-6.1, AC-6.2) |
| MAVLink RAW_IMU / SCALED_IMU / ATTITUDE / GPS_RAW_INT / EKF_STATUS_REPORT / GLOBAL_POSITION_INT | MAVLink2 | FC → SUT | sysid=10 | IMU + autopilot inputs to cuVSLAM, ortho, source-promotion |
| HTTP/HTTPS REST (e.g., /health, /sessions, /objects/locate) | HTTPS+JWT | external → SUT | TBD port | Object localization, health, session management (AC-7.1, AC-8.1 cache interface, results_report rows 27–33) |
| HTTP SSE (/sessions/{id}/stream) | HTTPS+SSE | SUT → external | TBD port | 1 Hz position stream for monitoring (results_report row 32) |
| ROS 2 topics (test-only sniffer) | DDS | SUT internal | observed black-box via topic ports | F-T19 ROS rate sanity test only; NOT used by functional tests |
| MBTiles cache file (read-only check) | SQLite read | external → cache fs | mounted volume | AC-8.3 / AC-8.4 verification at the cache boundary; never reads SUT internals |
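Several entry points above are marked "MAVLink2 (signed)". As a rough sketch of what the runner has to reproduce to check a captured frame, the snippet below appends and verifies the 13-byte signature block from the MAVLink2 signing specification: one link-ID byte, a 48-bit timestamp in 10 µs ticks since 2015-01-01 00:00 UTC, and the first 6 bytes of SHA-256 over secret key + frame + link ID + timestamp. It assumes the caller already holds a complete unsigned frame whose incompat_flags had the signed bit (0x01) set before the CRC was computed. The function names are hypothetical; in practice pymavlink's built-in signing support would normally do this.

```python
import hashlib
import hmac
import struct
from datetime import datetime, timezone

# MAVLink2 signing epoch (2015-01-01 00:00:00 UTC); one tick is 10 microseconds.
_EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)
SIGNATURE_LEN = 13  # link_id (1) + timestamp (6) + signature (6)


def signing_timestamp(now: datetime) -> int:
    """Return the 48-bit signing timestamp for `now` (10 us ticks since the epoch)."""
    delta = now - _EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * 100_000 + delta.microseconds // 10
    return ticks & 0xFFFF_FFFF_FFFF


def sign_frame(frame: bytes, key: bytes, link_id: int, timestamp: int) -> bytes:
    """Append link ID, timestamp and 48-bit SHA-256 signature to an unsigned frame.

    `frame` runs from the 0xFD magic byte through the CRC, with the
    signed bit already set in incompat_flags (hypothetical helper).
    """
    tail = struct.pack("<B", link_id) + timestamp.to_bytes(6, "little")
    sig = hashlib.sha256(key + frame + tail).digest()[:6]
    return frame + tail + sig


def verify_frame(signed: bytes, key: bytes) -> bool:
    """Recompute the signature of a signed frame and compare in constant time."""
    body, tail = signed[:-SIGNATURE_LEN], signed[-SIGNATURE_LEN:]
    expected = hashlib.sha256(key + body + tail[:7]).digest()[:6]
    return hmac.compare_digest(expected, tail[7:])
```

A real verifier would also reject timestamps that do not increase per (sysid, compid, link ID) stream, which this sketch omits.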
Consumer app purpose: a standalone pytest-based black-box test runner exercising the SUT through the MAVLink wire, the HTTP API, and the cache-boundary file artifacts. The runner has no source-code access to the SUT, no Python imports of SUT modules, and no DDS subscriptions to internal-only topics (only the public nav_msgs/Odometry / sensor_msgs/Image subscriptions that are documented as the SUT contract).
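For the 1 Hz SSE position stream, a check only needs to split the stream into events and look at their spacing. A minimal, dependency-free sketch, assuming the runner feeds in raw lines without newlines and records its own arrival time per event; the field handling follows the WHATWG server-sent events format, while `parse_sse`, `rate_ok`, and the ±20 % tolerance are illustrative assumptions:

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {'event', 'data', 'id'} dicts from raw SSE lines (no trailing newlines)."""
    event, data, last_id = "message", [], None
    for line in lines:
        if line == "":
            if data:  # a blank line dispatches the buffered event
                yield {"event": event, "data": "\n".join(data), "id": last_id}
            event, data = "message", []
            continue
        if line.startswith(":"):  # comment / keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field == "event":
            event = value
        elif field == "id":
            last_id = value  # the last event ID persists across events


def rate_ok(arrivals_s: list[float], hz: float = 1.0, tol: float = 0.2) -> bool:
    """True if every inter-arrival gap is within +/- tol (fraction) of 1/hz."""
    period = 1.0 / hz
    gaps = [b - a for a, b in zip(arrivals_s, arrivals_s[1:])]
    return bool(gaps) and all(abs(g - period) <= tol * period for g in gaps)
```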
Docker Environment
Services
| Service | Image / Build | Purpose | Ports |
|---|---|---|---|
| sut | build context ./ (multi-stage Dockerfile producing the JetPack 6 runtime image; compiled for linux/arm64 for the HW tier and linux/amd64+cuda for the SW emulation tier) | The full GPS-Denied stack (all ROS 2 nodes) | UDP 14550 (MAVLink to FC), UDP 14560 (MAVLink to GCS), TCP 8443 (HTTPS API), TCP 8080 (HTTP SSE), TCP 9090 (Prometheus metrics) |
| ardupilot-sitl | ardupilot/ardupilot-sitl:4.5-PR30080-pinned | Autopilot SITL (ArduCopter / ArduPlane); provides FC behaviour for F-T9, F-T11, F-T12, AC-4.3, AC-NEW-1, AC-NEW-2 | UDP 14550 ↔ sut, UDP 14570 ↔ qgc-mock |
| qgc-mock | build ./fixtures/qgc-mock/ (a MAVLink-only mock GCS that records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY and sends operator hints) | Records GCS-bound telemetry; sends operator re-localization hints (AC-6.1, AC-6.2, AC-3.4) | UDP 14570 |
| tile-cache-init | build ./fixtures/tile-cache-init/ (one-shot loader that materialises fixtures/satellite_tiles_AD0000xx_z20/ MBTiles + sidecar) | Pre-populates the satellite cache before each test | none (one-shot) |
| gps-spoof-injector | build ./fixtures/gps-spoof-injector/ (publishes GPS_RAW_INT with crafted lat/lon/sat/hdop) | F-T12 / AC-NEW-2 spoof scenarios | UDP 14571 → sut |
| e2e-runner | build ./e2e/ (Python 3.11 + pytest + pymavlink + httpx + pyserial) | Black-box test runner | none |
| prom | prom/prometheus:v2.51.0 | Scrape SUT metrics (CPU, GPU, temp) for NF-T2 / NF-T3 / AC-4.2 / AC-NEW-5 | TCP 9091 |
| nvidia-smi-exporter | utkuozdemir/nvidia_gpu_exporter:1.2.0 (HW tier only) | Jetson tegrastats / nvidia-smi metrics | TCP 9092 |
Networks
| Network | Services | Purpose |
|---|---|---|
| e2e-mavlink-net | sut, ardupilot-sitl, qgc-mock, gps-spoof-injector | MAVLink traffic (single broadcast domain so that distinct sysids share routing realistically) |
| e2e-api-net | sut, e2e-runner | HTTPS + SSE traffic for object-localization / health endpoints |
| e2e-metrics-net | sut, prom, nvidia-smi-exporter, e2e-runner | Resource-monitoring scrape path |
Volumes
| Volume | Mounted to | Purpose |
|---|---|---|
| tile-cache | sut:/var/lib/gpsdenied/tiles (rw), tile-cache-init:/init/tiles (rw), e2e-runner:/probe/tiles (ro) | Persistent satellite + onboard tile cache (AC-8.3, AC-8.4) |
| fdr | sut:/var/lib/gpsdenied/fdr (rw), e2e-runner:/probe/fdr (ro) | Flight Data Recorder output (AC-NEW-3) |
| fixtures-images | sut:/fixtures/images (ro), e2e-runner:/fixtures/images (ro) | The 60 nav-cam JPGs + AerialVL S03 slice |
| fixtures-imu | sut:/fixtures/imu (ro), ardupilot-sitl:/fixtures/imu (ro) | SITL replay IMU traces (AerialVL S03 + synthetic from coordinates.csv) |
| fixtures-expected | e2e-runner:/fixtures/expected_results (ro) | _docs/00_problem/input_data/expected_results/ mounted into the runner |
| e2e-results | e2e-runner:/results (rw, host bind) | CSV report output |
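The runner reads the tile-cache volume without ever writing to it. A sketch of such a probe, assuming the cache files follow the MBTiles layout (a `metadata(name, value)` table and a `tiles(zoom_level, tile_column, tile_row, tile_data)` table whose rows use TMS numbering); `open_readonly`, `xyz_to_tms_row`, and `probe_tiles` are hypothetical names, and the sidecar JSON is not covered:

```python
import sqlite3
from pathlib import Path


def open_readonly(path: Path) -> sqlite3.Connection:
    """Open an MBTiles file read-only so the probe cannot mutate the SUT's cache."""
    return sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)


def xyz_to_tms_row(z: int, y: int) -> int:
    """MBTiles rows use TMS order (origin bottom-left); XYZ/slippy y is flipped."""
    return (1 << z) - 1 - y


def probe_tiles(path: Path, z: int, xs_ys: list[tuple[int, int]]) -> dict:
    """Return the metadata table plus which requested XYZ tiles exist at zoom z."""
    con = open_readonly(path)
    try:
        meta = dict(con.execute("SELECT name, value FROM metadata"))
        present = {}
        for x, y in xs_ys:
            row = con.execute(
                "SELECT 1 FROM tiles WHERE zoom_level=? AND tile_column=? AND tile_row=?",
                (z, x, xyz_to_tms_row(z, y)),
            ).fetchone()
            present[(x, y)] = row is not None
        return {"metadata": meta, "present": present}
    finally:
        con.close()
```

Opening with SQLite's `mode=ro` URI parameter enforces the "never read SUT internals, never write" rule at the driver level rather than relying only on the `:ro` mount.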
docker-compose structure
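```yaml
# Outline only; not runnable as-is
services:
  sut:
    build: .
    networks: [e2e-mavlink-net, e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/var/lib/gpsdenied/tiles
      - fdr:/var/lib/gpsdenied/fdr
      - fixtures-images:/fixtures/images:ro
      - fixtures-imu:/fixtures/imu:ro
    environment:
      - MAVLINK_FC_URL=udp://ardupilot-sitl:14550
      - MAVLINK_GCS_URL=udp://qgc-mock:14570
      - GPSD_API_BIND=0.0.0.0:8443
      - GPSD_TILE_DIR=/var/lib/gpsdenied/tiles
      - GPSD_FDR_DIR=/var/lib/gpsdenied/fdr
    depends_on:
      tile-cache-init:
        condition: service_completed_successfully  # cache must be loaded before the SUT starts
    runtime: nvidia  # HW tier

  ardupilot-sitl:
    image: ardupilot/ardupilot-sitl:4.5-PR30080-pinned
    networks: [e2e-mavlink-net]
    volumes:
      - fixtures-imu:/fixtures/imu:ro  # per the Volumes table; read by --imu-replay
    command: ["--vehicle=ArduPlane", "--frame=plane", "--imu-replay=/fixtures/imu/AD0000xx.csv"]

  qgc-mock:
    build: ./fixtures/qgc-mock/
    networks: [e2e-mavlink-net]

  tile-cache-init:
    build: ./fixtures/tile-cache-init/
    volumes:
      - tile-cache:/init/tiles
    restart: "no"

  gps-spoof-injector:
    build: ./fixtures/gps-spoof-injector/
    networks: [e2e-mavlink-net]

  e2e-runner:
    build: ./e2e/
    depends_on:
      sut:
        condition: service_started
      ardupilot-sitl:
        condition: service_started
      qgc-mock:
        condition: service_started
      tile-cache-init:
        condition: service_completed_successfully  # one-shot loader must finish first
    networks: [e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/probe/tiles:ro
      - fdr:/probe/fdr:ro
      - fixtures-images:/fixtures/images:ro
      - fixtures-expected:/fixtures/expected_results:ro
      - e2e-results:/results  # host bind to ./results per the Volumes table
    command: ["pytest", "-q", "--junit-xml=/results/junit.xml", "--csv=/results/report.csv"]

  prom:
    image: prom/prometheus:v2.51.0
    networks: [e2e-metrics-net]
    # Listen on 9091 (Services table) so it does not clash with the SUT's 9090 exporter.
    command: ["--config.file=/etc/prometheus/prometheus.yml", "--web.listen-address=:9091"]

networks:
  e2e-mavlink-net: {}
  e2e-api-net: {}
  e2e-metrics-net: {}

volumes:
  tile-cache: {}
  fdr: {}
  fixtures-images: {}
  fixtures-imu: {}
  fixtures-expected: {}
  e2e-results: {}
```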
Consumer Application
Tech stack: Python 3.11 / pytest 8.x / pymavlink (matching the SUT version) / httpx[http2] / pyserial / numpy / pandas / pytest-csv / pytest-timeout. No SUT source imports.
Entry point: pytest -q inside e2e-runner, with marker-based selection per tier (pytest -m "blackbox and pipeline" → 60-image slice; pytest -m "blackbox and deferred-corpus" → AerialVL S03; etc.).
Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|---|---|---|---|
| GPS_INPUT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) and `udp://ardupilot-sitl:14550` (target) | MAVLink2 signing key shared with FC for round-trip verification |
| STATUSTEXT / NAMED_VALUE_FLOAT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) | MAVLink2 signing key |
| Object localization | HTTPS + JSON | `POST sut:8443/objects/locate` | JWT bearer (test-only key in e2e-runner config) |
| Health probe | HTTPS + JSON | `GET sut:8443/health` | JWT bearer |
| Session management | HTTPS + JSON | `POST sut:8443/sessions`, `GET sut:8443/sessions/{id}/stream` | JWT bearer |
| Operator hint | MAVLink2 STATUSTEXT | injected via qgc-mock | MAVLink2 signing key |
| Spoofed GPS injection | MAVLink2 GPS_RAW_INT | injected via gps-spoof-injector (separate sysid) | MAVLink2 signing key |
| Tile cache file probe | filesystem read | `/probe/tiles/*.mbtiles` + sidecar JSON | none (read-only mount) |
| FDR file probe | filesystem read | `/probe/fdr/**/*` | none (read-only mount) |
| Metrics scrape | HTTP | `GET prom:9091/api/v1/query?…` | none (test net only) |
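The metrics scrape goes through Prometheus's HTTP query API rather than the SUT. A sketch of building an instant-query URL and extracting sample values from the documented response shape (`{"status": "success", "data": {"resultType": "vector", "result": [...]}}`); `instant_query_url` and `vector_values` are hypothetical helpers, and any metric name used with them is a placeholder because this document does not fix the SUT's metric names:

```python
import json
from urllib.parse import urlencode

PROM_BASE = "http://prom:9091"  # prom service port from the Services table


def instant_query_url(promql: str, base: str = PROM_BASE) -> str:
    """Build a GET URL for Prometheus' /api/v1/query instant-query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def vector_values(body: str) -> dict[str, float]:
    """Map each series (rendered as sorted k=v labels) to its float sample value."""
    doc = json.loads(body)
    if doc.get("status") != "success" or doc["data"]["resultType"] != "vector":
        raise ValueError(f"unexpected Prometheus response: {doc.get('status')}")
    out = {}
    for series in doc["data"]["result"]:
        labels = ",".join(f"{k}={v}" for k, v in sorted(series["metric"].items()))
        _ts, value = series["value"]  # [unix_timestamp, "value as string"]
        out[labels] = float(value)
    return out
```

The HTTP GET itself would be issued with httpx from the runner; keeping URL building and response parsing separate lets both be tested without a live Prometheus.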
What the consumer does NOT have access to
- No direct DB / SQLite write access against the SUT's tile or FDR stores.
- No Python imports of SUT modules.
- No DDS subscriptions to internal-only topics (e.g., the matcher's intermediate keypoint topic, the calibrator's residual topic). Only the documented contract topics consumed in F-T19.
- No CUDA context, no shared memory, and no `/proc` access into the SUT container.
- No log-file scraping that bypasses the public health/STATUSTEXT path.
Test Tiers
The runner stratifies execution by what artefact set is present. Each tier maps to a pytest marker and to a data_status column value in traceability-matrix.md.
| Tier | Marker | Corpus / fixtures required | Coverage scope |
|---|---|---|---|
| T1 pipeline-correctness | pipeline | _docs/00_problem/input_data/ 60-image slice + coordinates.csv + placeholder satellite tiles + SITL-replayed IMU | Validates pipeline plumbing only, NOT deployment-binding numbers (per Phase 1 D2). |
| T2 deferred-corpus | deferred-corpus | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight | Deployment-binding accuracy & drift for AC-1.1, AC-1.2, AC-1.3, AC-2.1, AC-2.2, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9. |
| T3 deferred-sitl | deferred-sitl | ArduPilot SITL pinned to PR #30080-class build + scripted scenarios | F-T9 source-switching matrix (AC-4.3, AC-NEW-2). |
| T4 deferred-hil | deferred-hil | Real Jetson Orin Nano Super on bench + thermal chamber + bench MAVLink loop | AC-4.1 latency on real HW, AC-4.2 memory cap, AC-NEW-5 thermal envelope, AC-NEW-1 cold-start TTFF on real HW. |
| T5 deferred-field | deferred-field | Recorded fixed-wing sortie | FT-1 / FT-2 / FT-3 final field validation. |
Pipeline-tier (T1) tests are the only ones whose pass/fail numbers are NOT treated as deployment evidence — they verify that the pipeline produces some output of the right shape, not that the output meets the deployment-binding accuracy budget. Deployment-binding tests live in T2–T5.
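Because the tier markers contain hyphens (`deferred-corpus` and so on), they cannot be written with plain attribute syntax such as `@pytest.mark.deferred-corpus`. One possible arrangement, sketched below under that assumption, registers them from a `conftest.py` `pytest_configure` hook (so `--strict-markers` accepts them) and applies them in test modules via `getattr`. The `TIER_MARKERS` table and its descriptions are illustrative; the stub config used in the test exists only so the hook can be exercised outside pytest.

```python
# conftest.py (sketch): register the tier markers so --strict-markers accepts them.
TIER_MARKERS = {
    "blackbox": "talks to the SUT only via MAVLink, HTTPS, or cache/FDR files",
    "pipeline": "T1 pipeline-correctness, not deployment evidence",
    "deferred-corpus": "T2, requires AerialVL S03 and other deferred corpora",
    "deferred-sitl": "T3, requires the pinned ArduPilot SITL image",
    "deferred-hil": "T4, requires the bench Jetson Orin Nano Super",
    "deferred-field": "T5, recorded field sortie",
}


def pytest_configure(config):
    """pytest hook: declare each marker as a 'markers' ini line ('name: help')."""
    for name, help_text in TIER_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {help_text}")


# In a test module, hyphenated markers are applied through getattr, e.g.:
#   deferred_corpus = getattr(pytest.mark, "deferred-corpus")
#   @deferred_corpus
#   def test_ac_1_1_drift(): ...
```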
CI/CD Integration
| Tier | When to run | Pipeline stage | Gate behavior | Timeout |
|---|---|---|---|---|
| T1 pipeline | Every PR to dev; nightly | After unit tests | Block merge on FAIL | 30 min |
| T2 deferred-corpus | Nightly; on tag push | Pre-release | Block release on FAIL | 4 h (Monte Carlo NF-T4 dominates) |
| T3 deferred-sitl | Nightly | Pre-release | Block release on FAIL | 1 h |
| T4 deferred-hil | Bench-on-demand + weekly thermal cycle | Bench-only stage | Manual approval | 12 h (NF-T3 8 h soak) |
| T5 deferred-field | Field-test plan (per-sortie) | Field stage | Out-of-band sign-off | per sortie |
Reporting
Format: CSV (one row per test execution) plus JUnit XML for CI.
CSV columns: test_id, test_name, tier, marker, traces_to_acs (semicolon-joined), traces_to_restricts, data_status (present / deferred-corpus / deferred-sitl / deferred-hil / deferred-field), started_at, execution_time_ms, result (PASS / FAIL / SKIP / BLOCKED-DATA), expected_metric, actual_metric, tolerance, error_message (if FAIL or BLOCKED-DATA), git_sha, image_tag, runner_host.
Output paths:
- `e2e-results:/results/report.csv`: primary CSV report
- `e2e-results:/results/junit.xml`: JUnit XML
- `e2e-results:/results/coverage_by_ac.csv`: derived; AC → covering test IDs → aggregate result
- `e2e-results:/results/per_tier.csv`: derived; tier → pass/fail/skip/blocked-data counts
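Both derived files can be computed from report.csv alone. A standard-library sketch, assuming the column names listed above and a semicolon-joined `traces_to_acs`; the "worst result wins" ordering used for the per-AC aggregate is an assumption, since this document does not define how multiple results for one AC combine:

```python
import csv
import io
from collections import Counter, defaultdict

# Assumed severity order for the per-AC aggregate: the worst result wins.
SEVERITY = {"PASS": 0, "SKIP": 1, "BLOCKED-DATA": 2, "FAIL": 3}


def derive(report_csv: str) -> tuple[dict, dict]:
    """Return (per-tier result counts, per-AC {test_ids, aggregate}) from report.csv text."""
    per_tier = defaultdict(Counter)
    by_ac = {}
    for row in csv.DictReader(io.StringIO(report_csv)):
        per_tier[row["tier"]][row["result"]] += 1
        for ac in filter(None, (a.strip() for a in row["traces_to_acs"].split(";"))):
            entry = by_ac.setdefault(ac, {"test_ids": [], "aggregate": "PASS"})
            entry["test_ids"].append(row["test_id"])
            if SEVERITY[row["result"]] > SEVERITY[entry["aggregate"]]:
                entry["aggregate"] = row["result"]
    return dict(per_tier), by_ac
```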
BLOCKED-DATA handling: when a test's required fixture is missing (e.g., AerialVL S03 not yet downloaded in CI), the test must emit BLOCKED-DATA rather than FAIL or SKIP — this preserves the data_status signal in the matrix without polluting the failure rate.
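A minimal sketch of keeping missing data distinct from a failure. It assumes a hypothetical `BlockedData` exception raised by fixtures and a reporting layer (for example a `pytest_runtest_makereport` hook in conftest.py) that passes the raised exception to `csv_result`; only the mapping and failure-rate logic are shown, and all names are illustrative:

```python
from pathlib import Path
from typing import Optional, Union


class BlockedData(Exception):
    """Raised when a tier's required fixture (corpus, SITL image, bench) is absent."""


def require_fixture(path: Union[str, Path]) -> Path:
    """Return the fixture path, or raise BlockedData instead of failing the test."""
    p = Path(path)
    if not p.exists():
        raise BlockedData(f"missing fixture: {p}")
    return p


def csv_result(outcome: str, exc: Optional[BaseException] = None) -> str:
    """Map a pytest outcome (plus the raised exception, if any) to report.csv's result."""
    if isinstance(exc, BlockedData):
        return "BLOCKED-DATA"  # never counted as FAIL or SKIP
    return {"passed": "PASS", "failed": "FAIL", "skipped": "SKIP"}[outcome]


def failure_rate(results: list) -> float:
    """FAIL share over executed tests only; BLOCKED-DATA and SKIP are excluded."""
    executed = [r for r in results if r in ("PASS", "FAIL")]
    return executed.count("FAIL") / len(executed) if executed else 0.0
```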
Test Execution
Decision: both (per-tier split). The system is hardware-dependent (Jetson Orin Nano Super + CUDA + TensorRT + thermal envelope + USB/MIPI cameras + MAVLink hardware loop), so execution is split between Docker (T1/T2/T3 — pipeline-correctness, deferred-corpus, deferred-sitl) and real-hardware bench / field (T4 deferred-hil, T5 deferred-field).
Hardware dependencies found
| Source | Indicator |
|---|---|
| _docs/00_problem/restrictions.md:26 | Cameras over USB / MIPI-CSI / GigE |
| _docs/00_problem/restrictions.md:41 | Jetson Orin Nano Super: 67 TOPS INT8, 8 GB LPDDR5, 25 W TDP |
| _docs/00_problem/restrictions.md:42 | JetPack + CUDA + TensorRT |
| _docs/00_problem/restrictions.md:43 | Sustained 25 W for 8 h at upper-envelope temperature (AC-NEW-5) |
| _docs/00_problem/restrictions.md:48-51 | IMU + MAVLink2 from FC (serial/UDP); ArduPilot only |
| _docs/01_solution/solution.md | cuVSLAM (GPU), VPR DINOv2-VLAD (TensorRT), cross-view matcher (TensorRT) |
| this file (environment.md) | runtime: nvidia; linux/arm64 HW tier + linux/amd64+cuda SW emulation tier; nvidia-smi-exporter |
Source-code scan is deferred to the first implement cycle (no source code yet at Plan Step 1).
Mode A — Docker (T1 / T2 / T3)
Prerequisites:
- Docker 24.x+ with Compose v2
- For HW-tier runners: NVIDIA Container Toolkit + a host with an NVIDIA GPU (sm_87 for true Orin parity; sm_86 acceptable for SW emulation)
- For SW-emulation runners: `linux/amd64` host; CUDA emulation layer enabled in the SUT image's `linux/amd64+cuda` build target
- T2 only: deferred-corpus volumes mounted (AerialVL S03, etc.; see `test-data.md`)
- T3 only: `ardupilot-sitl` PR-#30080-pinned image pulled
Run:
```bash
# T1 pipeline
docker compose -f e2e/docker-compose.test.yml run --rm e2e-runner \
  pytest -m "blackbox and pipeline" --csv=/results/report.csv

# T2 deferred-corpus (corpus volumes must be present)
docker compose -f e2e/docker-compose.test.yml --profile corpus run --rm e2e-runner \
  pytest -m "deferred-corpus" --csv=/results/report.csv

# T3 deferred-sitl
docker compose -f e2e/docker-compose.test.yml --profile sitl run --rm e2e-runner \
  pytest -m "deferred-sitl" --csv=/results/report.csv
```
Result collection: the e2e-results volume is bind-mounted to ./results on the host and receives report.csv, junit.xml, coverage_by_ac.csv, and per_tier.csv.
Environment variables (key): MAVLINK_FC_URL, MAVLINK_GCS_URL, GPSD_API_BIND, GPSD_TILE_DIR, GPSD_FDR_DIR, MAVLINK2_SIGNING_KEY, JWT_SIGNING_KEY — full list in e2e/.env.example (to be produced in Phase 4 / Decompose).
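A malformed .env is cheaper to catch before `docker compose` starts than after a confusing test failure. A sketch of such a pre-flight check, assuming values shaped like those in the docker-compose outline (`udp://host:port` for the MAVLink URLs, `host:port` for the API bind); `check_env` is a hypothetical helper, and the authoritative variable list remains e2e/.env.example:

```python
from typing import Mapping
from urllib.parse import urlsplit

KEY_VARS = ("MAVLINK_FC_URL", "MAVLINK_GCS_URL", "GPSD_API_BIND", "GPSD_TILE_DIR",
            "GPSD_FDR_DIR", "MAVLINK2_SIGNING_KEY", "JWT_SIGNING_KEY")


def _bad_udp(url: str) -> bool:
    """True unless `url` looks like udp://host:port."""
    parts = urlsplit(url)
    try:
        port = parts.port
    except ValueError:  # non-numeric or out-of-range port
        return True
    return parts.scheme != "udp" or not parts.hostname or port is None


def check_env(env: Mapping[str, str]) -> list:
    """Return human-readable problems with the key variables; empty means OK."""
    problems = [f"missing {k}" for k in KEY_VARS if not env.get(k)]
    for k in ("MAVLINK_FC_URL", "MAVLINK_GCS_URL"):
        if env.get(k) and _bad_udp(env[k]):
            problems.append(f"{k} is not udp://host:port: {env[k]!r}")
    bind = env.get("GPSD_API_BIND", "")
    host, sep, port = bind.rpartition(":")
    if bind and not (sep and host and port.isdigit()):
        problems.append(f"GPSD_API_BIND is not host:port: {bind!r}")
    return problems
```

Called as `check_env(os.environ)` from a wrapper script, a non-empty result would abort the run before any container starts.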
Mode B — Local on bench Jetson (T4 deferred-hil)
Prerequisites:
- Real Jetson Orin Nano Super dev kit with JetPack 6.x, CUDA 12.x, TensorRT 10.x
- Bench MAVLink loop (a second Jetson or a USB-MAVLink dongle running `ardupilot-sitl` against a recorded IMU stream, OR a real autopilot board on bench)
- Thermal chamber (AC-NEW-5 only; otherwise lab ambient is sufficient for AC-4.1 / AC-4.2 / AC-NEW-1 cold-start / AC-NEW-3 8-h soak)
- `tegrastats` and `nvidia-smi` available
- Single-tenant scheduling: no other tests share the Jetson during a T4 run
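For AC-4.2 (memory cap) and AC-NEW-5 (thermal envelope), the bench run keeps a tegrastats.log. A hedged parsing sketch: the regular expressions assume lines carrying a `RAM used/totalMB` field and `name@tempC` temperature fields, which is the general shape of tegrastats output, but field names and layout vary between JetPack releases, so the sample line in the test is illustrative only and the patterns should be checked against a real log from the bench image:

```python
import re

_RAM = re.compile(r"\bRAM (\d+)/(\d+)MB")
_TEMP = re.compile(r"\b([A-Za-z0-9_]+)@(-?\d+(?:\.\d+)?)C\b")


def parse_tegrastats_line(line: str) -> dict:
    """Extract RAM usage (MB) and per-zone temperatures (deg C) from one log line."""
    out = {"ram_used_mb": None, "ram_total_mb": None, "temps_c": {}}
    if m := _RAM.search(line):
        out["ram_used_mb"], out["ram_total_mb"] = int(m.group(1)), int(m.group(2))
    out["temps_c"] = {name.lower(): float(t) for name, t in _TEMP.findall(line)}
    return out


def peaks(lines: list) -> dict:
    """Peak RAM and peak temperature per zone across a whole soak log."""
    ram, temps = 0, {}
    for line in lines:
        rec = parse_tegrastats_line(line)
        ram = max(ram, rec["ram_used_mb"] or 0)
        for zone, t in rec["temps_c"].items():
            temps[zone] = max(temps.get(zone, t), t)
    return {"peak_ram_mb": ram, "peak_temp_c": temps}
```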
Run:
```bash
# T4 perf binding on real HW
./scripts/run-tests.sh --tier=t4

# Or specifically the perf script for AC-4.1 / AC-NEW-5 binding
./scripts/run-performance-tests.sh --tier=t4 --thermal-profile=hot-soak
```
Result collection: the bench runner copies report.csv + junit.xml + tegrastats.log + power.csv to a network share (path TBD by Decompose).
Mode C — Field (T5 deferred-field)
Out-of-band per the field-test plan; not part of CI. Captured here for completeness — the runner is the same e2e-runner image plus a recorded-flight replay harness defined in the field-test plan.
CI runner mapping
| Tier | CI runner type | Mode | Cadence |
|---|---|---|---|
| T1 pipeline | Linux x86 + NVIDIA GPU (any sm_86+) OR Linux x86 with CUDA emulation | Docker | Every PR + nightly |
| T2 deferred-corpus | Linux x86 + NVIDIA GPU (sm_86+) with corpus volume mounted | Docker | Nightly + on-tag |
| T3 deferred-sitl | Linux x86 (CPU-only OK) | Docker | Nightly |
| T4 deferred-hil | Self-hosted Jetson Orin Nano Super bench runner | Local | Bench-on-demand + weekly thermal cycle |
| T5 deferred-field | n/a (per-sortie out-of-band) | Field | Per field-test plan |
Phase 4 (run-tests.sh, run-performance-tests.sh) consumes this section to choose between the Docker and bench-local code paths via the --tier= flag.
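The tier-to-mode mapping is small enough to express as data. A sketch of how the `--tier=` dispatch might be prototyped: the Docker commands for t1 to t3 mirror the Mode A commands, and the timeouts come from the CI/CD Integration table, while `plan`, the bare-pytest command for t4, and its local CSV path are illustrative assumptions:

```python
COMPOSE = ["docker", "compose", "-f", "e2e/docker-compose.test.yml"]

# tier -> (mode, compose profile or None, pytest -m expression, stage timeout in seconds)
TIERS = {
    "t1": ("docker", None, "blackbox and pipeline", 30 * 60),
    "t2": ("docker", "corpus", "deferred-corpus", 4 * 3600),
    "t3": ("docker", "sitl", "deferred-sitl", 3600),
    "t4": ("local", None, "deferred-hil", 12 * 3600),
}


def plan(tier: str) -> tuple:
    """Return (argv, timeout_s) for a tier; T5 is field-only and has no CI path."""
    if tier == "t5":
        raise ValueError("T5 deferred-field runs out-of-band per the field-test plan")
    mode, profile, expr, timeout_s = TIERS[tier]
    # Inside Docker the report lands on the e2e-results volume; the bench path is assumed.
    csv_path = "/results/report.csv" if mode == "docker" else "results/report.csv"
    pytest_argv = ["pytest", "-m", expr, f"--csv={csv_path}"]
    if mode == "local":
        return pytest_argv, timeout_s  # bench Jetson: run pytest directly, no compose
    argv = COMPOSE + (["--profile", profile] if profile else [])
    return argv + ["run", "--rm", "e2e-runner"] + pytest_argv, timeout_s
```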
External Dependencies
The SUT does not call commercial satellite providers at runtime (AC-8.1). All upstream sourcing is the Suite Satellite Service's responsibility, which is out of scope for this build. The runner therefore mocks:
- `tile-cache-init` provides the cache contents the SUT would normally have synced from the Service pre-flight.
- `qgc-mock` is a black-box GCS sniffer and operator-hint injector; it is not a real QGroundControl instance but speaks the same MAVLink wire protocol.
- `gps-spoof-injector` simulates a malicious GPS signal for AC-NEW-2 / F-T12.
- `ardupilot-sitl` is the only autopilot under test (PX4 is out of scope per restrictions).
- The SUT's HTTPS API is exercised against the SUT directly. There is no upstream identity provider; JWTs are minted by the runner against a test-only signing key shared at SUT start.
No external mocks have access to internal SUT state.
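Since there is no identity provider, the runner mints its own bearer tokens. A standard-library sketch, assuming HS256 (HMAC-SHA256) keyed with the shared JWT_SIGNING_KEY; the algorithm, the `sub`/`iat`/`exp` claim set, and the five-minute lifetime are all assumptions, since this document does not specify them:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url without padding, as used in JWS compact serialization."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def mint_jwt(key: bytes, subject: str = "e2e-runner", ttl_s: int = 300,
             now: Optional[int] = None) -> str:
    """Return a compact HS256 JWT carrying sub/iat/exp claims (hypothetical claim set)."""
    iat = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "iat": iat, "exp": iat + ttl_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def auth_header(token: str) -> dict:
    """Bearer header for the SUT's HTTPS endpoints."""
    return {"Authorization": f"Bearer {token}"}
```

The returned dict can be passed to httpx as `headers=` on each request; a short lifetime keeps a leaked test token from being useful outside the run.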