# Test Environment
## Overview
**System under test (SUT)**: the GPS-Denied Onboard companion-computer software stack — a set of ROS 2 Humble + Isaac ROS 3.2 nodes (cuVSLAM, VPR, cross-view matcher, Component 5 calibrator, Component 1b ortho-tile generator, Component 6 MAVLink bridge, Component 10 FDR, Component 7 health/failsafe, Component 8 object localizer) running on a Jetson Orin Nano Super (or x86+CUDA emulator for non-hardware tiers).
**SUT entry points (public interfaces, all black-box)**:
| Entry point | Protocol | Direction | Bound to | Purpose |
|-------------|----------|-----------|----------|---------|
| `MAVLink GPS_INPUT` | MAVLink2 (signed), serial/UDP | SUT → FC | sysid=11 | Primary position output (AC-4.3, AC-6.3, AC-NEW-1, AC-NEW-2) |
| `MAVLink STATUSTEXT / NAMED_VALUE_FLOAT` | MAVLink2 (signed) | SUT → GCS | sysid=10 | Telemetry summary, RELOC_REQ (AC-3.4, AC-6.1, AC-6.2) |
| `MAVLink RAW_IMU / SCALED_IMU / ATTITUDE / GPS_RAW_INT / EKF_STATUS_REPORT / GLOBAL_POSITION_INT` | MAVLink2 | FC → SUT | sysid=10 | IMU + autopilot inputs to cuVSLAM, ortho, source-promotion |
| `HTTP/HTTPS REST` (e.g., `/health`, `/sessions`, `/objects/locate`) | HTTPS+JWT | external → SUT | TBD port | Object localization, health, session management (AC-7.1, AC-8.1 cache interface, results_report rows 27-33) |
| `HTTP SSE` (`/sessions/{id}/stream`) | HTTPS+SSE | SUT → external | TBD port | 1 Hz position stream for monitoring (results_report row 32) |
| `ROS 2 topics` (test-only sniffer) | DDS | SUT internal | documented contract topics only, observed black-box | F-T19 ROS rate sanity test only; NOT used by functional tests |
| `MBTiles cache file` (read-only check) | SQLite read | external → cache fs | mounted volume | AC-8.3 / AC-8.4 verification at the cache boundary; never reads SUT internals |
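The read-only check at the cache boundary can be sketched with the standard-library `sqlite3` module. This is a minimal sketch, not the project's probe: it assumes the cache files follow the MBTiles layout (a `tiles` table with `zoom_level` / `tile_column` / `tile_row` / `tile_data` and a `metadata` name/value table) and opens them with SQLite's `mode=ro` URI parameter, so the probe cannot write even if the volume were accidentally mounted read-write. The function names are hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def open_mbtiles_readonly(path: Path) -> sqlite3.Connection:
    """Open an MBTiles file so that any write raises sqlite3.OperationalError."""
    # Path.as_uri() yields file:///abs/path; mode=ro is a standard SQLite URI parameter.
    uri = f"{path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)


def tile_counts_by_zoom(path: Path) -> dict[int, int]:
    """Count cached tiles per zoom level (e.g. to check z20 coverage)."""
    with closing(open_mbtiles_readonly(path)) as conn:
        rows = conn.execute(
            "SELECT zoom_level, COUNT(*) FROM tiles GROUP BY zoom_level ORDER BY zoom_level"
        ).fetchall()
    return {int(zoom): int(count) for zoom, count in rows}


def read_metadata(path: Path) -> dict[str, str]:
    """Return the MBTiles metadata table as a plain dict."""
    with closing(open_mbtiles_readonly(path)) as conn:
        return dict(conn.execute("SELECT name, value FROM metadata").fetchall())
```

A test would call, for example, `tile_counts_by_zoom(Path("/probe/tiles/<file>.mbtiles"))` against the runner's read-only mount and compare the result with the expected fixture coverage.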
**Consumer app purpose**: a standalone `pytest`-based black-box test runner exercising the SUT through the MAVLink wire, the HTTP API, and the cache-boundary file artifacts. The runner has **no source-code access** to the SUT, no Python imports of SUT modules, and no DDS subscriptions to internal-only topics (only the public `nav_msgs/Odometry` / `sensor_msgs/Image` subscriptions that are documented as the SUT contract).
## Docker Environment
### Services
| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| `sut` | build context `./` (multi-stage Dockerfile producing the JetPack 6 runtime image; compiled for `linux/arm64` for HW tier and `linux/amd64+cuda` for SW emulation tier) | The full GPS-Denied stack (all ROS 2 nodes) | UDP 14550 (MAVLink to FC), UDP 14560 (MAVLink to GCS), TCP 8443 (HTTPS API), TCP 8080 (HTTP SSE), TCP 9090 (Prometheus metrics) |
| `ardupilot-sitl` | `ardupilot/ardupilot-sitl:4.5-PR30080-pinned` | Autopilot SITL (ArduCopter / ArduPlane) — provides FC behaviour for F-T9, F-T11, F-T12, AC-4.3, AC-NEW-1, AC-NEW-2 | UDP 14550 ↔ sut, UDP 14570 ↔ qgc-mock |
| `qgc-mock` | build `./fixtures/qgc-mock/` (a MAVLink-only mock GCS that records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY, sends operator hints) | Records GCS-bound telemetry; sends operator re-localization hints (AC-6.1, AC-6.2, AC-3.4) | UDP 14570 |
| `tile-cache-init` | build `./fixtures/tile-cache-init/` (one-shot loader that materialises `fixtures/satellite_tiles_AD0000xx_z20/` MBTiles + sidecar) | Pre-populates the satellite cache before each test | — (one-shot) |
| `gps-spoof-injector` | build `./fixtures/gps-spoof-injector/` (publishes `GPS_RAW_INT` with crafted lat/lon/sat/hdop) | F-T12 / AC-NEW-2 spoof scenarios | UDP 14571 → sut |
| `e2e-runner` | build `./e2e/` (Python 3.11 + pytest + pymavlink + httpx + pyserial) | Black-box test runner | — |
| `prom` | `prom/prometheus:v2.51.0` | Scrape SUT metrics (CPU, GPU, temp) for NF-T2 / NF-T3 / AC-4.2 / AC-NEW-5 | TCP 9091 |
| `nvidia-smi-exporter` | `utkuozdemir/nvidia_gpu_exporter:1.2.0` (HW tier only) | Jetson tegrastats / nvidia-smi metrics | TCP 9092 |
### Networks
| Network | Services | Purpose |
|---------|----------|---------|
| `e2e-mavlink-net` | `sut`, `ardupilot-sitl`, `qgc-mock`, `gps-spoof-injector` | MAVLink traffic (single broadcast domain so distinct sysids share routing realistically) |
| `e2e-api-net` | `sut`, `e2e-runner` | HTTPS + SSE traffic for object-localization / health endpoints |
| `e2e-metrics-net` | `sut`, `prom`, `nvidia-smi-exporter`, `e2e-runner` | Resource-monitoring scrape path |
### Volumes
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache` | `sut:/var/lib/gpsdenied/tiles` (rw), `tile-cache-init:/init/tiles` (rw), `e2e-runner:/probe/tiles` (ro) | Persistent satellite + onboard tile cache (AC-8.3, AC-8.4) |
| `fdr` | `sut:/var/lib/gpsdenied/fdr` (rw), `e2e-runner:/probe/fdr` (ro) | Flight Data Recorder output (AC-NEW-3) |
| `fixtures-images` | `sut:/fixtures/images` (ro), `e2e-runner:/fixtures/images` (ro) | The 60 nav-cam JPGs + AerialVL S03 slice |
| `fixtures-imu` | `sut:/fixtures/imu` (ro), `ardupilot-sitl:/fixtures/imu` (ro) | SITL replay IMU traces (AerialVL S03 + synthetic from `coordinates.csv`) |
| `fixtures-expected` | `e2e-runner:/fixtures/expected_results` (ro) | `_docs/00_problem/input_data/expected_results/` mounted into the runner |
| `e2e-results` | `e2e-runner:/results` (rw, host bind) | CSV report output |
### docker-compose structure
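```yaml
# Sketch: structurally valid Compose; image tags, build contexts and the SITL
# flags are placeholders from this plan and have not been verified.
services:
  sut:
    build: .
    runtime: nvidia  # HW tier
    depends_on:
      tile-cache-init:
        condition: service_completed_successfully  # cache must be populated first
    networks: [e2e-mavlink-net, e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/var/lib/gpsdenied/tiles
      - fdr:/var/lib/gpsdenied/fdr
      - fixtures-images:/fixtures/images:ro
      - fixtures-imu:/fixtures/imu:ro
    environment:
      - MAVLINK_FC_URL=udp://ardupilot-sitl:14550
      - MAVLINK_GCS_URL=udp://qgc-mock:14570
      - GPSD_API_BIND=0.0.0.0:8443
      - GPSD_TILE_DIR=/var/lib/gpsdenied/tiles
      - GPSD_FDR_DIR=/var/lib/gpsdenied/fdr
      - MAVLINK2_SIGNING_KEY  # passed through from the host environment
      - JWT_SIGNING_KEY       # test-only key shared with the runner
  ardupilot-sitl:
    image: ardupilot/ardupilot-sitl:4.5-PR30080-pinned
    networks: [e2e-mavlink-net]
    volumes:
      - fixtures-imu:/fixtures/imu:ro  # needed by --imu-replay below
    command: ["--vehicle=ArduPlane", "--frame=plane", "--imu-replay=/fixtures/imu/AD0000xx.csv"]
  qgc-mock:
    build: ./fixtures/qgc-mock/
    networks: [e2e-mavlink-net]
  tile-cache-init:
    build: ./fixtures/tile-cache-init/
    volumes:
      - tile-cache:/init/tiles
    restart: "no"
  gps-spoof-injector:
    build: ./fixtures/gps-spoof-injector/
    networks: [e2e-mavlink-net]
  e2e-runner:
    build: ./e2e/
    depends_on:
      sut: {condition: service_started}
      ardupilot-sitl: {condition: service_started}
      qgc-mock: {condition: service_started}
      tile-cache-init: {condition: service_completed_successfully}
    networks: [e2e-api-net, e2e-metrics-net]
    environment:
      - MAVLINK2_SIGNING_KEY
      - JWT_SIGNING_KEY
    volumes:
      - tile-cache:/probe/tiles:ro
      - fdr:/probe/fdr:ro
      - fixtures-images:/fixtures/images:ro
      - fixtures-expected:/fixtures/expected_results:ro
      - e2e-results:/results
    command: ["pytest", "-q", "--junit-xml=/results/junit.xml", "--csv=/results/report.csv"]
  prom:
    image: prom/prometheus:v2.51.0
    networks: [e2e-metrics-net]

networks:
  e2e-mavlink-net:
  e2e-api-net:
  e2e-metrics-net:

volumes:
  tile-cache:
  fdr:
  fixtures-images:
  fixtures-imu:
  fixtures-expected:
  e2e-results:
```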
## Consumer Application
**Tech stack**: Python 3.11 / pytest 8.x / `pymavlink` (matching the SUT version) / `httpx[http2]` / `pyserial` / `numpy` / `pandas` / `pytest-csv` / `pytest-timeout`. **No SUT source imports.**
**Entry point**: `pytest -q` inside `e2e-runner`, with marker-based selection per tier (`pytest -m "blackbox and pipeline"` → 60-image slice; `pytest -m "blackbox and deferred-corpus"` → AerialVL S03; etc.).
### Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| GPS_INPUT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) and `udp://ardupilot-sitl:14550` (target) | MAVLink2 signing key shared with FC for round-trip verification |
| STATUSTEXT / NAMED_VALUE_FLOAT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) | MAVLink2 signing key |
| Object localization | HTTPS + JSON | `POST sut:8443/objects/locate` | JWT bearer (test-only key in `e2e-runner` config) |
| Health probe | HTTPS + JSON | `GET sut:8443/health` | JWT bearer |
| Session management | HTTPS + JSON | `POST sut:8443/sessions`, `GET sut:8443/sessions/{id}/stream` | JWT bearer |
| Operator hint | MAVLink2 STATUSTEXT | injected via `qgc-mock` | MAVLink2 signing key |
| Spoofed GPS injection | MAVLink2 GPS_RAW_INT | injected via `gps-spoof-injector` (separate sysid) | MAVLink2 signing key |
| Tile cache file probe | filesystem read | `/probe/tiles/*.mbtiles` + sidecar JSON | — (read-only mount) |
| FDR file probe | filesystem read | `/probe/fdr/**/*` | — (read-only mount) |
| Metrics scrape | HTTP | `GET prom:9091/api/v1/query?…` | — (test net only) |
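For round-trip verification of signed traffic, the runner has to check the 13-byte MAVLink 2 signature trailer (link ID, 48-bit timestamp, 48-bit signature). The sketch below follows the MAVLink 2 message-signing scheme: the signature is the first 6 bytes of SHA-256 over secret key + header (starting at the `0xFD` magic byte) + payload + CRC + link ID + timestamp, with the timestamp counted in 10 µs ticks since 2015-01-01 UTC. In practice pymavlink's `setup_signing()` on a `mavutil` connection handles this; the helper names here are hypothetical and only make the check explicit.

```python
import hashlib
import hmac
import struct

MAVLINK_EPOCH_UNIX = 1_420_070_400  # 2015-01-01T00:00:00Z as a Unix timestamp
SIGNATURE_TRAILER_LEN = 13  # link_id (1) + timestamp (6) + signature (6)


def signing_timestamp(unix_time: float) -> int:
    """48-bit MAVLink 2 signing timestamp: 10 us ticks since the 2015 epoch."""
    return int((unix_time - MAVLINK_EPOCH_UNIX) * 100_000) & 0xFFFF_FFFF_FFFF


def signature_trailer(secret_key: bytes, frame: bytes, link_id: int, timestamp: int) -> bytes:
    """Build the trailer for `frame` = header (from 0xFD) + payload + CRC.

    The frame's incompat_flags byte must already carry the SIGNED bit (0x01).
    """
    if len(secret_key) != 32:
        raise ValueError("MAVLink 2 secret keys are 32 bytes")
    link_and_ts = struct.pack("<BQ", link_id, timestamp)[:7]  # 1 B link id + 6 B LE timestamp
    digest = hashlib.sha256(secret_key + frame + link_and_ts).digest()
    return link_and_ts + digest[:6]


def verify_trailer(secret_key: bytes, frame: bytes, trailer: bytes) -> bool:
    """Recompute the signature from the received link id/timestamp and compare."""
    if len(trailer) != SIGNATURE_TRAILER_LEN:
        return False
    link_id = trailer[0]
    timestamp = int.from_bytes(trailer[1:7], "little")
    expected = signature_trailer(secret_key, frame, link_id, timestamp)
    return hmac.compare_digest(expected, trailer)
```

A complete check would also reject timestamps that go backwards on a given link (replay protection); that per-link bookkeeping is left out of this sketch.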
### What the consumer does NOT have access to
- No direct DB / SQLite write access against the SUT's tile or FDR stores.
- No Python imports of SUT modules.
- No DDS subscriptions to internal-only topics (e.g., the matcher's intermediate keypoint topic, the calibrator's residual topic). The only DDS subscriptions are to the documented contract topics consumed in F-T19.
- No CUDA context, no shared memory, no `/proc` access into the SUT container.
- No log-file scraping that bypasses the public health/STATUSTEXT path.
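The "no Python imports of SUT modules" rule can be enforced statically over the runner's own source tree. This is a minimal sketch using the standard-library `ast` module; the package names in `FORBIDDEN_ROOTS` are hypothetical placeholders, since the SUT's real module names are not fixed yet. It only sees literal `import` statements, so dynamic imports via `importlib` would need a separate check.

```python
import ast
from pathlib import Path

# Hypothetical top-level SUT package names; replace once the SUT source exists.
FORBIDDEN_ROOTS = frozenset({"gpsdenied", "gps_denied_onboard"})


def forbidden_imports(source: str, forbidden: frozenset[str] = FORBIDDEN_ROOTS) -> list[str]:
    """Return 'line N: module' for every absolute import whose root package is forbidden."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue  # relative imports and non-import nodes are ignored
        found.extend((node.lineno, m) for m in modules if m.split(".")[0] in forbidden)
    return [f"line {n}: {m}" for n, m in sorted(found)]


def scan_runner_tree(root: Path, forbidden: frozenset[str] = FORBIDDEN_ROOTS) -> dict[str, list[str]]:
    """Map each offending .py file under `root` to its forbidden imports."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"), forbidden)
        if hits:
            report[str(path)] = hits
    return report
```

Run as a pre-test gate (e.g. failing CI when `scan_runner_tree(Path("e2e"))` is non-empty), this keeps the black-box boundary from eroding silently.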
## Test Tiers
The runner stratifies execution by **what artefact set is present**. Each tier maps to a pytest marker and to a `data_status` column value in `traceability-matrix.md`.
| Tier | Marker | Corpus / fixtures required | Coverage scope |
|------|--------|---------------------------|----------------|
| **T1 pipeline-correctness** | `pipeline` | `_docs/00_problem/input_data/` 60-image slice + `coordinates.csv` + placeholder satellite tiles + SITL-replayed IMU | Validates pipeline plumbing only, **NOT** deployment-binding numbers (per Phase 1 D2). |
| **T2 deferred-corpus** | `deferred-corpus` | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight | Deployment-binding accuracy & drift for AC-1.1, AC-1.2, AC-1.3, AC-2.1, AC-2.2, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9. |
| **T3 deferred-sitl** | `deferred-sitl` | ArduPilot SITL pinned to PR #30080-class build + scripted scenarios | F-T9 source-switching matrix (AC-4.3, AC-NEW-2). |
| **T4 deferred-hil** | `deferred-hil` | Real Jetson Orin Nano Super on bench + thermal chamber + bench MAVLink loop | AC-4.1 latency on real HW, AC-4.2 memory cap, AC-NEW-5 thermal envelope, AC-NEW-1 cold-start TTFF on real HW. |
| **T5 deferred-field** | `deferred-field` | Recorded fixed-wing sortie | FT-1 / FT-2 / FT-3 final field validation. |
Pipeline-tier (T1) tests are the only ones whose pass/fail numbers are **NOT** treated as deployment evidence: they verify that the pipeline produces *some* output of the right shape, not that the output meets the deployment-binding accuracy budget. Deployment-binding tests live in T2-T5.
## CI/CD Integration
| Tier | When to run | Pipeline stage | Gate behavior | Timeout |
|------|-------------|----------------|---------------|---------|
| T1 pipeline | Every PR to `dev`; nightly | After unit tests | Block merge on FAIL | 30 min |
| T2 deferred-corpus | Nightly; on tag push | Pre-release | Block release on FAIL | 4 h (Monte Carlo NF-T4 dominates) |
| T3 deferred-sitl | Nightly | Pre-release | Block release on FAIL | 1 h |
| T4 deferred-hil | Bench-on-demand + weekly thermal cycle | Bench-only stage | Manual approval | 12 h (NF-T3 8 h soak) |
| T5 deferred-field | Field-test plan (per-sortie) | Field stage | Out-of-band sign-off | per sortie |
## Reporting
**Format**: CSV (one row per test execution) plus JUnit XML for CI.
**CSV columns**: `test_id`, `test_name`, `tier`, `marker`, `traces_to_acs` (semicolon-joined), `traces_to_restricts`, `data_status` (`present` / `deferred-corpus` / `deferred-sitl` / `deferred-hil` / `deferred-field`), `started_at`, `execution_time_ms`, `result` (`PASS` / `FAIL` / `SKIP` / `BLOCKED-DATA`), `expected_metric`, `actual_metric`, `tolerance`, `error_message` (if FAIL or BLOCKED-DATA), `git_sha`, `image_tag`, `runner_host`.
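The column contract above can be enforced at write time. This is a minimal sketch with the standard-library `csv` module, assuming a custom writer (or a post-processing step) is used; `pytest-csv`'s built-in columns may not cover project-specific fields such as `traces_to_acs`. The function name is hypothetical, and only `traces_to_acs` is semicolon-joined because that is the only column the spec marks that way.

```python
import csv
from typing import Iterable, Mapping, TextIO

REPORT_COLUMNS = [
    "test_id", "test_name", "tier", "marker", "traces_to_acs", "traces_to_restricts",
    "data_status", "started_at", "execution_time_ms", "result",
    "expected_metric", "actual_metric", "tolerance", "error_message",
    "git_sha", "image_tag", "runner_host",
]
DATA_STATUSES = {"present", "deferred-corpus", "deferred-sitl", "deferred-hil", "deferred-field"}
RESULTS = {"PASS", "FAIL", "SKIP", "BLOCKED-DATA"}


def write_report(rows: Iterable[Mapping[str, object]], stream: TextIO) -> None:
    """Write one CSV row per test execution in the fixed column order."""
    # restval fills absent columns with ""; extrasaction rejects unknown keys.
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLUMNS, restval="", extrasaction="raise")
    writer.writeheader()
    for raw in rows:
        row = dict(raw)
        if row.get("data_status") not in DATA_STATUSES:
            raise ValueError(f"bad data_status: {row.get('data_status')!r}")
        if row.get("result") not in RESULTS:
            raise ValueError(f"bad result: {row.get('result')!r}")
        if row["result"] in {"FAIL", "BLOCKED-DATA"} and not row.get("error_message"):
            raise ValueError("FAIL / BLOCKED-DATA rows need an error_message")
        acs = row.get("traces_to_acs", "")
        if isinstance(acs, (list, tuple)):
            row["traces_to_acs"] = ";".join(acs)  # semicolon-joined per the column spec
        writer.writerow(row)
```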
**Output paths**:
- `e2e-results:/results/report.csv` — primary CSV report
- `e2e-results:/results/junit.xml` — JUnit XML
- `e2e-results:/results/coverage_by_ac.csv` — derived: AC → covering test IDs → aggregate result
- `e2e-results:/results/per_tier.csv` — derived: tier → pass/fail/skip/blocked-data counts
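The two derived reports can be computed from the rows of `report.csv`. This sketch assumes one roll-up rule that the document does not specify: an AC's aggregate result is the worst result among its covering tests, ranked FAIL > BLOCKED-DATA > SKIP > PASS. Function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, Mapping

# Assumed roll-up precedence: one FAIL fails the AC, then missing data, then skips.
SEVERITY = {"PASS": 0, "SKIP": 1, "BLOCKED-DATA": 2, "FAIL": 3}


def per_tier(rows: Iterable[Mapping[str, str]]) -> dict[str, dict[str, int]]:
    """tier -> count of each result value (zeros included)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        counts[row["tier"]][row["result"]] += 1
    return {tier: {result: c[result] for result in SEVERITY} for tier, c in sorted(counts.items())}


def coverage_by_ac(rows: Iterable[Mapping[str, str]]) -> dict[str, tuple[list[str], str]]:
    """AC -> (sorted covering test IDs, worst result among them)."""
    tests: dict[str, set[str]] = defaultdict(set)
    worst: dict[str, str] = {}
    for row in rows:
        for ac in filter(None, row["traces_to_acs"].split(";")):
            tests[ac].add(row["test_id"])
            if ac not in worst or SEVERITY[row["result"]] > SEVERITY[worst[ac]]:
                worst[ac] = row["result"]
    return {ac: (sorted(tests[ac]), worst[ac]) for ac in sorted(tests)}
```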
**`BLOCKED-DATA` handling**: when a test's required fixture is missing (e.g., AerialVL S03 not yet downloaded in CI), the test must emit `BLOCKED-DATA` rather than `FAIL` or `SKIP` — this preserves the data_status signal in the matrix without polluting the failure rate.
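pytest has no built-in `BLOCKED-DATA` outcome, so one option is a dedicated exception that the reporting layer maps to that result. This is a hedged sketch: `BlockedData`, `require_fixtures` and `classify` are hypothetical names, and wiring `classify` into pytest (for instance from a `pytest_runtest_makereport` hook in `conftest.py` that inspects `call.excinfo`) is left out. Explicit skips are assumed to be recognised by the caller before this mapping.

```python
from __future__ import annotations

from pathlib import Path


class BlockedData(Exception):
    """A required fixture is absent: report BLOCKED-DATA, never FAIL or SKIP."""


def require_fixtures(*paths: Path) -> None:
    """Raise BlockedData naming every missing fixture path."""
    missing = [str(p) for p in paths if not p.exists()]
    if missing:
        raise BlockedData("missing fixture(s): " + ", ".join(missing))


def classify(exc: BaseException | None) -> tuple[str, str]:
    """Map a test's terminating exception (or None) to (result, error_message)."""
    if exc is None:
        return "PASS", ""
    if isinstance(exc, BlockedData):
        return "BLOCKED-DATA", str(exc)
    return "FAIL", f"{type(exc).__name__}: {exc}"
```

A T2 test would start with something like `require_fixtures(Path("/fixtures/aerialvl_s03"))`, where the path is a hypothetical corpus mount point.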
## Test Execution
**Decision: both (per-tier split).** The system is hardware-dependent (Jetson Orin Nano Super + CUDA + TensorRT + thermal envelope + USB/MIPI cameras + MAVLink hardware loop), so execution is split between Docker (T1/T2/T3 — pipeline-correctness, deferred-corpus, deferred-sitl) and real-hardware bench / field (T4 deferred-hil, T5 deferred-field).
### Hardware dependencies found
| Source | Indicator |
|--------|-----------|
| `_docs/00_problem/restrictions.md:26` | Cameras over USB / MIPI-CSI / GigE |
| `_docs/00_problem/restrictions.md:41` | Jetson Orin Nano Super — 67 TOPS INT8, 8 GB LPDDR5, 25 W TDP |
| `_docs/00_problem/restrictions.md:42` | JetPack + CUDA + TensorRT |
| `_docs/00_problem/restrictions.md:43` | Sustained 25 W for 8 h at upper-envelope temperature (AC-NEW-5) |
| `_docs/00_problem/restrictions.md:48-51` | IMU + MAVLink2 from FC (serial/UDP); ArduPilot only |
| `_docs/01_solution/solution.md` | cuVSLAM (GPU), VPR DINOv2-VLAD (TensorRT), cross-view matcher (TensorRT) |
| this file (`environment.md`) | `runtime: nvidia`; `linux/arm64` HW tier + `linux/amd64+cuda` SW emulation tier; `nvidia-smi-exporter` |
Source-code scan is deferred to the first implement cycle (no source code yet at Plan Step 1).
### Mode A — Docker (T1 / T2 / T3)
**Prerequisites:**
- Docker 24.x+ with Compose v2
- For HW-tier runners: NVIDIA Container Toolkit + a host with an NVIDIA GPU (sm_87 for true Orin parity; sm_86 acceptable for SW emulation)
- For SW-emulation runners: `linux/amd64` host; CUDA emulation layer enabled in the SUT image's `linux/amd64+cuda` build target
- T2 only: deferred-corpus volumes mounted (AerialVL S03, etc. — see `test-data.md`)
- T3 only: `ardupilot-sitl` PR-#30080-pinned image pulled
**Run:**
```bash
# Arguments after the service name replace the service's compose `command`,
# so the JUnit flag is repeated on every invocation.

# T1 pipeline
docker compose -f e2e/docker-compose.test.yml run --rm e2e-runner \
  pytest -m "blackbox and pipeline" \
  --junit-xml=/results/junit.xml --csv=/results/report.csv

# T2 deferred-corpus (corpus volumes must be present)
docker compose -f e2e/docker-compose.test.yml --profile corpus run --rm e2e-runner \
  pytest -m "deferred-corpus" \
  --junit-xml=/results/junit.xml --csv=/results/report.csv

# T3 deferred-sitl
docker compose -f e2e/docker-compose.test.yml --profile sitl run --rm e2e-runner \
  pytest -m "deferred-sitl" \
  --junit-xml=/results/junit.xml --csv=/results/report.csv
```
**Result collection:** the `e2e-results` volume maps to `./results` on the host; runs write `report.csv` and `junit.xml` there, alongside the derived `coverage_by_ac.csv` and `per_tier.csv`.
**Environment variables (key):** `MAVLINK_FC_URL`, `MAVLINK_GCS_URL`, `GPSD_API_BIND`, `GPSD_TILE_DIR`, `GPSD_FDR_DIR`, `MAVLINK2_SIGNING_KEY`, `JWT_SIGNING_KEY` — full list in `e2e/.env.example` (to be produced in Phase 4 / Decompose).
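A runner-side fail-fast check on these variables avoids tests failing later with opaque connection errors. This is a minimal sketch under the assumption that all seven key variables must be non-empty; it validates only the endpoint shapes shown in the compose file (`udp://host:port` and `host:port`) and deliberately does not assume any encoding for the signing keys. `load_env` is a hypothetical name.

```python
import os
from typing import Mapping
from urllib.parse import urlparse

REQUIRED_ENV = (
    "MAVLINK_FC_URL", "MAVLINK_GCS_URL", "GPSD_API_BIND",
    "GPSD_TILE_DIR", "GPSD_FDR_DIR", "MAVLINK2_SIGNING_KEY", "JWT_SIGNING_KEY",
)


def load_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Fail fast on missing keys and obviously malformed endpoints."""
    missing = [key for key in REQUIRED_ENV if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    cfg = {key: env[key] for key in REQUIRED_ENV}
    for key in ("MAVLINK_FC_URL", "MAVLINK_GCS_URL"):
        url = urlparse(cfg[key])
        # .port raises ValueError itself if the port is not numeric.
        if url.scheme != "udp" or not url.hostname or url.port is None:
            raise ValueError(f"{key} must look like udp://host:port, got {cfg[key]!r}")
    host, sep, port = cfg["GPSD_API_BIND"].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"GPSD_API_BIND must look like host:port, got {cfg['GPSD_API_BIND']!r}")
    return cfg
```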
### Mode B — Local on bench Jetson (T4 deferred-hil)
**Prerequisites:**
- Real Jetson Orin Nano Super dev kit with JetPack 6.x, CUDA 12.x, TensorRT 10.x
- Bench MAVLink loop (a second Jetson or a USB-MAVLink dongle running `ardupilot-sitl` against a recorded IMU stream, OR a real autopilot board on bench)
- Thermal chamber (AC-NEW-5 only; otherwise lab ambient is sufficient for AC-4.1 / AC-4.2 / AC-NEW-1 cold-start / AC-NEW-3 8-h soak)
- `tegrastats` and `nvidia-smi` available
- Single-tenant scheduling — no other tests share the Jetson during a T4 run
**Run:**
```bash
# T4 perf binding on real HW
./scripts/run-tests.sh --tier=t4
# Or specifically the perf script for AC-4.1 / AC-NEW-5 binding
./scripts/run-performance-tests.sh --tier=t4 --thermal-profile=hot-soak
```
**Result collection:** the bench runner copies `report.csv` + `junit.xml` + `tegrastats.log` + `power.csv` to a network share (path TBD by Decompose).
### Mode C — Field (T5 deferred-field)
Out-of-band per the field-test plan; not part of CI. Captured here for completeness — the runner is the same `e2e-runner` image plus a recorded-flight replay harness defined in the field-test plan.
### CI runner mapping
| Tier | CI runner type | Mode | Cadence |
|------|---------------|------|---------|
| T1 pipeline | Linux x86 + NVIDIA GPU (any sm_86+) OR Linux x86 with CUDA emulation | Docker | Every PR + nightly |
| T2 deferred-corpus | Linux x86 + NVIDIA GPU (sm_86+) with corpus volume mounted | Docker | Nightly + on-tag |
| T3 deferred-sitl | Linux x86 (CPU-only OK) | Docker | Nightly |
| T4 deferred-hil | Self-hosted Jetson Orin Nano Super bench runner | Local | Bench-on-demand + weekly thermal cycle |
| T5 deferred-field | n/a (per-sortie out-of-band) | Field | Per field-test plan |
Phase 4 (`run-tests.sh`, `run-performance-tests.sh`) consumes this section to choose between the Docker and bench-local code paths via the `--tier=` flag.
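The tier-to-mode mapping can be expressed as a small lookup table. This Python sketch only mirrors the Mode A commands and the CI runner table; the real entry points are the shell scripts, and `TierPlan`, `TIERS` and `docker_command` are hypothetical names. The T4 marker comes from the Test Tiers table; T4 and T5 are rejected because they do not run through Docker.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TierPlan:
    mode: str                    # "docker", "local" (bench Jetson) or "field"
    marker_expr: str | None      # pytest -m expression
    compose_profile: str | None  # docker compose --profile, if any


TIERS = {
    "t1": TierPlan("docker", "blackbox and pipeline", None),
    "t2": TierPlan("docker", "deferred-corpus", "corpus"),
    "t3": TierPlan("docker", "deferred-sitl", "sitl"),
    "t4": TierPlan("local", "deferred-hil", None),
    "t5": TierPlan("field", None, None),
}
COMPOSE_FILE = "e2e/docker-compose.test.yml"


def docker_command(tier: str) -> list[str]:
    """argv for a Docker-mode tier; bench and field tiers are rejected."""
    plan = TIERS[tier.lower()]
    if plan.mode != "docker" or plan.marker_expr is None:
        raise ValueError(f"{tier} runs in {plan.mode} mode, not via Docker")
    argv = ["docker", "compose", "-f", COMPOSE_FILE]
    if plan.compose_profile:
        argv += ["--profile", plan.compose_profile]
    argv += ["run", "--rm", "e2e-runner", "pytest", "-m", plan.marker_expr,
             "--junit-xml=/results/junit.xml", "--csv=/results/report.csv"]
    return argv
```

A dispatcher would pass the result to `subprocess.run(docker_command(tier), check=True)` for T1-T3 and branch to the bench path for T4.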
## External Dependencies
The SUT does not call commercial satellite providers at runtime (AC-8.1). All upstream sourcing is the Suite Satellite Service's responsibility, which is **out of scope** for this build. The runner therefore mocks:
- `tile-cache-init` provides the cache contents the SUT would normally have synced from the Service pre-flight.
- `qgc-mock` is a black-box GCS sniffer + operator-hint injector — not a real QGroundControl instance, but speaks the same MAVLink wire.
- `gps-spoof-injector` simulates a malicious GPS signal for AC-NEW-2 / F-T12.
- `ardupilot-sitl` is the only autopilot under test (PX4 is out of scope per restrictions).
- The SUT's HTTPS API is exercised against the SUT directly — there is no upstream identity provider; JWTs are minted by the runner against a test-only signing key shared at SUT start.
No external mocks have access to internal SUT state.
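Minting a runner-side JWT needs only the standard library. This is a minimal sketch assuming an HMAC-SHA256 (`HS256`) key and `sub` / `iat` / `exp` claims; the document fixes neither the algorithm nor the claim set, and a library such as PyJWT would normally be used instead. `mint_test_jwt` is a hypothetical name.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as JWS compact serialization requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _segment(obj: dict) -> str:
    """Compact-JSON encode and base64url one JWT segment."""
    return _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def mint_test_jwt(key: bytes, subject: str, ttl_s: int = 300, now: int | None = None) -> str:
    """Return an HS256-signed compact JWT for the test-only key (assumed algorithm)."""
    iat = int(time.time()) if now is None else now
    signing_input = _segment({"alg": "HS256", "typ": "JWT"}) + "." + _segment(
        {"sub": subject, "iat": iat, "exp": iat + ttl_s}
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)
```

The token is then sent as an `Authorization: Bearer <token>` header on calls such as `GET sut:8443/health`.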