Update autodev state documentation to reflect completion of Plan Step 1, including detailed progress on phases and next steps. Revised phase details to clarify user-level blocking gates and hardware assessment outcomes.

Oleksandr Bezdieniezhnykh
2026-04-27 06:23:53 +03:00
parent 9eba1689b3
commit f321268e1b
9 changed files with 2561 additions and 3 deletions
@@ -0,0 +1,268 @@
# Test Environment
## Overview
**System under test (SUT)**: the GPS-Denied Onboard companion-computer software stack — a set of ROS 2 Humble + Isaac ROS 3.2 nodes (cuVSLAM, VPR, cross-view matcher, Component 5 calibrator, Component 1b ortho-tile generator, Component 6 MAVLink bridge, Component 10 FDR, Component 7 health/failsafe, Component 8 object localizer) running on a Jetson Orin Nano Super (or x86+CUDA emulator for non-hardware tiers).
**SUT entry points (public interfaces, all black-box)**:
| Entry point | Protocol | Direction | Bound to | Purpose |
|-------------|----------|-----------|----------|---------|
| `MAVLink GPS_INPUT` | MAVLink2 (signed), serial/UDP | SUT → FC | sysid=11 | Primary position output (AC-4.3, AC-6.3, AC-NEW-1, AC-NEW-2) |
| `MAVLink STATUSTEXT / NAMED_VALUE_FLOAT` | MAVLink2 (signed) | SUT → GCS | sysid=10 | Telemetry summary, RELOC_REQ (AC-3.4, AC-6.1, AC-6.2) |
| `MAVLink RAW_IMU / SCALED_IMU / ATTITUDE / GPS_RAW_INT / EKF_STATUS_REPORT / GLOBAL_POSITION_INT` | MAVLink2 | FC → SUT | sysid=10 | IMU + autopilot inputs to cuVSLAM, ortho, source-promotion |
| `HTTP/HTTPS REST` (e.g., `/health`, `/sessions`, `/objects/locate`) | HTTPS+JWT | external → SUT | TBD port | Object localization, health, session management (AC-7.1, AC-8.1 cache interface, results_report rows 27-33) |
| `HTTP SSE` (`/sessions/{id}/stream`) | HTTPS+SSE | SUT → external | TBD port | 1 Hz position stream for monitoring (results_report row 32) |
| `ROS 2 topics` (test-only sniffer) | DDS | SUT internal | observed black-box via topic ports | F-T19 ROS rate sanity test only — NOT used by functional tests |
| `MBTiles cache file` (read-only check) | SQLite read | external → cache fs | mounted volume | AC-8.3 / AC-8.4 verification at cache boundary, never read SUT internals |
**Consumer app purpose**: a standalone `pytest`-based black-box test runner exercising the SUT through the MAVLink wire, the HTTP API, and the cache-boundary file artifacts. The runner has **no source-code access** to the SUT, no Python imports of SUT modules, and no DDS subscriptions to internal-only topics (only the public `nav_msgs/Odometry` / `sensor_msgs/Image` subscriptions that are documented as the SUT contract).
## Docker Environment
### Services
| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| `sut` | build context `./` (multi-stage Dockerfile producing the JetPack 6 runtime image; compiled for `linux/arm64` for HW tier and `linux/amd64+cuda` for SW emulation tier) | The full GPS-Denied stack (all ROS 2 nodes) | UDP 14550 (MAVLink to FC), UDP 14560 (MAVLink to GCS), TCP 8443 (HTTPS API), TCP 8080 (HTTP SSE), TCP 9090 (Prometheus metrics) |
| `ardupilot-sitl` | `ardupilot/ardupilot-sitl:4.5-PR30080-pinned` | Autopilot SITL (ArduCopter / ArduPlane) — provides FC behaviour for F-T9, F-T11, F-T12, AC-4.3, AC-NEW-1, AC-NEW-2 | UDP 14550 ↔ sut, UDP 14570 ↔ qgc-mock |
| `qgc-mock` | build `./fixtures/qgc-mock/` (a MAVLink-only mock GCS that records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY, sends operator hints) | Records GCS-bound telemetry; sends operator re-localization hints (AC-6.1, AC-6.2, AC-3.4) | UDP 14570 |
| `tile-cache-init` | build `./fixtures/tile-cache-init/` (one-shot loader that materialises `fixtures/satellite_tiles_AD0000xx_z20/` MBTiles + sidecar) | Pre-populates the satellite cache before each test | — (one-shot) |
| `gps-spoof-injector` | build `./fixtures/gps-spoof-injector/` (publishes `GPS_RAW_INT` with crafted lat/lon/sat/hdop) | F-T12 / AC-NEW-2 spoof scenarios | UDP 14571 → sut |
| `e2e-runner` | build `./e2e/` (Python 3.11 + pytest + pymavlink + httpx + pyserial) | Black-box test runner | — |
| `prom` | `prom/prometheus:v2.51.0` | Scrape SUT metrics (CPU, GPU, temp) for NF-T2 / NF-T3 / AC-4.2 / AC-NEW-5 | TCP 9091 |
| `nvidia-smi-exporter` | `utkuozdemir/nvidia_gpu_exporter:1.2.0` (HW tier only) | Jetson tegrastats / nvidia-smi metrics | TCP 9092 |
### Networks
| Network | Services | Purpose |
|---------|----------|---------|
| `e2e-mavlink-net` | `sut`, `ardupilot-sitl`, `qgc-mock`, `gps-spoof-injector` | MAVLink traffic (single broadcast domain so distinct sysids share routing realistically) |
| `e2e-api-net` | `sut`, `e2e-runner` | HTTPS + SSE traffic for object-localization / health endpoints |
| `e2e-metrics-net` | `sut`, `prom`, `nvidia-smi-exporter`, `e2e-runner` | Resource-monitoring scrape path |
### Volumes
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache` | `sut:/var/lib/gpsdenied/tiles` (rw), `tile-cache-init:/init/tiles` (rw), `e2e-runner:/probe/tiles` (ro) | Persistent satellite + onboard tile cache (AC-8.3, AC-8.4) |
| `fdr` | `sut:/var/lib/gpsdenied/fdr` (rw), `e2e-runner:/probe/fdr` (ro) | Flight Data Recorder output (AC-NEW-3) |
| `fixtures-images` | `sut:/fixtures/images` (ro), `e2e-runner:/fixtures/images` (ro) | The 60 nav-cam JPGs + AerialVL S03 slice |
| `fixtures-imu` | `sut:/fixtures/imu` (ro), `ardupilot-sitl:/fixtures/imu` (ro) | SITL replay IMU traces (AerialVL S03 + synthetic from `coordinates.csv`) |
| `fixtures-expected` | `e2e-runner:/fixtures/expected_results` (ro) | `_docs/00_problem/input_data/expected_results/` mounted into the runner |
| `e2e-results` | `e2e-runner:/results` (rw, host bind) | CSV report output |
### docker-compose structure
```yaml
# Outline only; not runnable code
services:
  sut:
    build: .
    networks: [e2e-mavlink-net, e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/var/lib/gpsdenied/tiles
      - fdr:/var/lib/gpsdenied/fdr
      - fixtures-images:/fixtures/images:ro
      - fixtures-imu:/fixtures/imu:ro
    environment:
      - MAVLINK_FC_URL=udp://ardupilot-sitl:14550
      - MAVLINK_GCS_URL=udp://qgc-mock:14570
      - GPSD_API_BIND=0.0.0.0:8443
      - GPSD_TILE_DIR=/var/lib/gpsdenied/tiles
      - GPSD_FDR_DIR=/var/lib/gpsdenied/fdr
    runtime: nvidia  # HW tier
  ardupilot-sitl:
    image: ardupilot/ardupilot-sitl:4.5-PR30080-pinned
    networks: [e2e-mavlink-net]
    command: ["--vehicle=ArduPlane", "--frame=plane", "--imu-replay=/fixtures/imu/AD0000xx.csv"]
  qgc-mock:
    build: ./fixtures/qgc-mock/
    networks: [e2e-mavlink-net]
  tile-cache-init:
    build: ./fixtures/tile-cache-init/
    volumes:
      - tile-cache:/init/tiles
    restart: "no"
  gps-spoof-injector:
    build: ./fixtures/gps-spoof-injector/
    networks: [e2e-mavlink-net]
  e2e-runner:
    build: ./e2e/
    depends_on: [sut, ardupilot-sitl, qgc-mock, tile-cache-init]
    networks: [e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/probe/tiles:ro
      - fdr:/probe/fdr:ro
      - fixtures-images:/fixtures/images:ro
      - fixtures-expected:/fixtures/expected_results:ro
      - e2e-results:/results
    command: ["pytest", "-q", "--junit-xml=/results/junit.xml", "--csv=/results/report.csv"]
  prom:
    image: prom/prometheus:v2.51.0
    networks: [e2e-metrics-net]
```
## Consumer Application
**Tech stack**: Python 3.11 / pytest 8.x / `pymavlink` (matching the SUT version) / `httpx[http2]` / `pyserial` / `numpy` / `pandas` / `pytest-csv` / `pytest-timeout`. **No SUT source imports.**
**Entry point**: `pytest -q` inside `e2e-runner`, with marker-based selection per tier (`pytest -m "blackbox and pipeline"` → 60-image slice; `pytest -m "blackbox and deferred-corpus"` → AerialVL S03; etc.).
### Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| GPS_INPUT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) and `udp://ardupilot-sitl:14550` (target) | MAVLink2 signing key shared with FC for round-trip verification |
| STATUSTEXT / NAMED_VALUE_FLOAT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) | MAVLink2 signing key |
| Object localization | HTTPS + JSON | `POST sut:8443/objects/locate` | JWT bearer (test-only key in `e2e-runner` config) |
| Health probe | HTTPS + JSON | `GET sut:8443/health` | JWT bearer |
| Session management | HTTPS + JSON | `POST sut:8443/sessions`, `GET sut:8443/sessions/{id}/stream` | JWT bearer |
| Operator hint | MAVLink2 STATUSTEXT | injected via `qgc-mock` | MAVLink2 signing key |
| Spoofed GPS injection | MAVLink2 GPS_RAW_INT | injected via `gps-spoof-injector` (separate sysid) | MAVLink2 signing key |
| Tile cache file probe | filesystem read | `/probe/tiles/*.mbtiles` + sidecar JSON | — (read-only mount) |
| FDR file probe | filesystem read | `/probe/fdr/**/*` | — (read-only mount) |
| Metrics scrape | HTTP | `GET prom:9091/api/v1/query?…` | — (test net only) |
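The tile cache file probe in the table above can be sketched as a read-only SQLite open. This is a minimal sketch, assuming the cache files follow the standard MBTiles layout (a `metadata` name/value table and a `tiles` table or view with a `zoom_level` column). The function name `probe_mbtiles` and the shape of the returned summary are hypothetical, not part of the SUT contract.

```python
import sqlite3
from pathlib import Path


def probe_mbtiles(path):
    """Open an MBTiles file read-only and summarise what it contains.

    Hypothetical helper: assumes the standard MBTiles schema.
    """
    # mode=ro makes SQLite itself refuse writes, in addition to the
    # read-only volume mount described in the Volumes table.
    uri = f"{Path(path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        metadata = dict(conn.execute("SELECT name, value FROM metadata"))
        tile_count, min_zoom, max_zoom = conn.execute(
            "SELECT COUNT(*), MIN(zoom_level), MAX(zoom_level) FROM tiles"
        ).fetchone()
    finally:
        conn.close()
    return {
        "metadata": metadata,
        "tile_count": tile_count,
        "min_zoom": min_zoom,
        "max_zoom": max_zoom,
    }
```

Inside the runner this would be called against a file under `/probe/tiles/`, for example `probe_mbtiles("/probe/tiles/<name>.mbtiles")`, and the summary compared with the expected values for AC-8.3 / AC-8.4.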
### What the consumer does NOT have access to
- No direct DB / SQLite write access against the SUT's tile or FDR stores.
- No Python imports of SUT modules.
- No DDS subscriptions to internal-only topics (e.g., the matcher's intermediate keypoint topic, the calibrator's residual topic). Only the documented contract topics consumed in F-T19.
- No CUDA context, no shared memory, no `/proc` access into the SUT container.
- No log-file scraping that bypasses the public health/STATUSTEXT path.
## Test Tiers
The runner stratifies execution by **what artefact set is present**. Each tier maps to a pytest marker and to a `data_status` column value in `traceability-matrix.md`.
| Tier | Marker | Corpus / fixtures required | Coverage scope |
|------|--------|---------------------------|----------------|
| **T1 pipeline-correctness** | `pipeline` | `_docs/00_problem/input_data/` 60-image slice + `coordinates.csv` + placeholder satellite tiles + SITL-replayed IMU | Validates pipeline plumbing only, **NOT** deployment-binding numbers (per Phase 1 D2). |
| **T2 deferred-corpus** | `deferred-corpus` | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight | Deployment-binding accuracy & drift for AC-1.1, AC-1.2, AC-1.3, AC-2.1, AC-2.2, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9. |
| **T3 deferred-sitl** | `deferred-sitl` | ArduPilot SITL pinned to PR #30080-class build + scripted scenarios | F-T9 source-switching matrix (AC-4.3, AC-NEW-2). |
| **T4 deferred-hil** | `deferred-hil` | Real Jetson Orin Nano Super on bench + thermal chamber + bench MAVLink loop | AC-4.1 latency on real HW, AC-4.2 memory cap, AC-NEW-5 thermal envelope, AC-NEW-1 cold-start TTFF on real HW. |
| **T5 deferred-field** | `deferred-field` | Recorded fixed-wing sortie | FT-1 / FT-2 / FT-3 final field validation. |
Pipeline-tier (T1) tests are the only ones whose pass/fail numbers are **NOT** treated as deployment evidence: they verify that the pipeline produces *some* output of the right shape, not that the output meets the deployment-binding accuracy budget. Deployment-binding tests live in T2-T5.
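The tier-to-marker mapping above could be registered in the runner's `conftest.py` roughly as follows. This is a sketch under assumptions: the `TIER_MARKERS` table and its description strings are illustrative, and the `blackbox` marker is included only because the documented `pytest -m "blackbox and pipeline"` selection uses it. `pytest_configure` and `config.addinivalue_line("markers", ...)` are the standard pytest hooks for registering markers.

```python
# conftest.py (sketch): register the tier markers so that
# `pytest --strict-markers -m "blackbox and pipeline"` accepts them.

# Marker name -> description. The names come from the tier table;
# the descriptions are illustrative.
TIER_MARKERS = {
    "blackbox": "black-box test driven only through public SUT interfaces",
    "pipeline": "T1 pipeline-correctness",
    "deferred-corpus": "T2 deployment-binding accuracy and drift",
    "deferred-sitl": "T3 ArduPilot SITL scenarios",
    "deferred-hil": "T4 hardware-in-the-loop on bench Jetson",
    "deferred-field": "T5 field validation",
}


def pytest_configure(config):
    """pytest hook: declare every tier marker with its description."""
    for name, description in TIER_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")


# Hyphenated names are not valid Python attribute names, so a test
# cannot be decorated with `@pytest.mark.deferred-corpus`. One option:
#     deferred_corpus = getattr(pytest.mark, "deferred-corpus")
#     @deferred_corpus
#     def test_something(): ...
```

If the hyphenated marker names prove awkward in practice, renaming them with underscores would avoid the `getattr` indirection, at the cost of changing the `data_status` values used elsewhere in this document.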
## CI/CD Integration
| Tier | When to run | Pipeline stage | Gate behavior | Timeout |
|------|-------------|----------------|---------------|---------|
| T1 pipeline | Every PR to `dev`; nightly | After unit tests | Block merge on FAIL | 30 min |
| T2 deferred-corpus | Nightly; on tag push | Pre-release | Block release on FAIL | 4 h (Monte Carlo NF-T4 dominates) |
| T3 deferred-sitl | Nightly | Pre-release | Block release on FAIL | 1 h |
| T4 deferred-hil | Bench-on-demand + weekly thermal cycle | Bench-only stage | Manual approval | 12 h (NF-T3 8 h soak) |
| T5 deferred-field | Field-test plan (per-sortie) | Field stage | Out-of-band sign-off | per sortie |
## Reporting
**Format**: CSV (one row per test execution) plus JUnit XML for CI.
**CSV columns**: `test_id`, `test_name`, `tier`, `marker`, `traces_to_acs` (semicolon-joined), `traces_to_restricts`, `data_status` (`present` / `deferred-corpus` / `deferred-sitl` / `deferred-hil` / `deferred-field`), `started_at`, `execution_time_ms`, `result` (`PASS` / `FAIL` / `SKIP` / `BLOCKED-DATA`), `expected_metric`, `actual_metric`, `tolerance`, `error_message` (if FAIL or BLOCKED-DATA), `git_sha`, `image_tag`, `runner_host`.
**Output paths**:
- `e2e-results:/results/report.csv` — primary CSV report
- `e2e-results:/results/junit.xml` — JUnit XML
- `e2e-results:/results/coverage_by_ac.csv` — derived: AC → covering test IDs → aggregate result
- `e2e-results:/results/per_tier.csv` — derived: tier → pass/fail/skip/blocked-data counts
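The two derived reports can be computed from the rows of `report.csv` along these lines. This is a minimal sketch: the function names are hypothetical, and the aggregation rule for an AC (worst result wins, in the order `FAIL`, `BLOCKED-DATA`, `SKIP`, `PASS`) is an assumption, since this document does not define how per-test results roll up.

```python
from collections import Counter, defaultdict

# ASSUMPTION: worst-result-wins ordering; not specified by this document.
RESULT_PRECEDENCE = ("FAIL", "BLOCKED-DATA", "SKIP", "PASS")


def per_tier_counts(rows):
    """Return {tier: {result: count}} from report rows (dicts)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["tier"]][row["result"]] += 1
    return {tier: dict(counter) for tier, counter in counts.items()}


def coverage_by_ac(rows):
    """Return {ac: {"test_ids": [...], "aggregate": result}}."""
    covering = defaultdict(list)
    for row in rows:
        # traces_to_acs is semicolon-joined, per the CSV column list.
        for ac in row["traces_to_acs"].split(";"):
            ac = ac.strip()
            if ac:
                covering[ac].append((row["test_id"], row["result"]))
    coverage = {}
    for ac, pairs in covering.items():
        seen = {result for _, result in pairs}
        aggregate = next((r for r in RESULT_PRECEDENCE if r in seen), "UNKNOWN")
        coverage[ac] = {
            "test_ids": [test_id for test_id, _ in pairs],
            "aggregate": aggregate,
        }
    return coverage
```

The rows can come from `csv.DictReader` over `report.csv`, and both results can be written back out with `csv.DictWriter`.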
**`BLOCKED-DATA` handling**: when a test's required fixture is missing (e.g., AerialVL S03 not yet downloaded in CI), the test must emit `BLOCKED-DATA` rather than `FAIL` or `SKIP` — this preserves the data_status signal in the matrix without polluting the failure rate.
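A minimal sketch of that rule, assuming each test declares the fixture paths it needs. The helper name `resolve_result` and its signature are hypothetical.

```python
from pathlib import Path


def resolve_result(outcome, required_fixtures):
    """Map a raw test outcome to the value written to the `result` column.

    outcome: "PASS", "FAIL" or "SKIP" as produced by the test itself.
    required_fixtures: iterable of paths the test cannot run without.
    Returns (result, error_message).
    """
    missing = [str(p) for p in required_fixtures if not Path(p).exists()]
    if missing:
        # A missing fixture overrides the raw outcome, so the failure
        # rate only counts tests that actually had their data.
        return "BLOCKED-DATA", "missing fixture(s): " + ", ".join(missing)
    return outcome, ""
```

In a pytest-based runner this check would typically run in a fixture or a reporting hook before the test body, so that a test without its corpus never reaches a `FAIL` state.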
## Test Execution
**Decision: both (per-tier split).** The system is hardware-dependent (Jetson Orin Nano Super + CUDA + TensorRT + thermal envelope + USB/MIPI cameras + MAVLink hardware loop), so execution is split between Docker (T1/T2/T3 — pipeline-correctness, deferred-corpus, deferred-sitl) and real-hardware bench / field (T4 deferred-hil, T5 deferred-field).
### Hardware dependencies found
| Source | Indicator |
|--------|-----------|
| `_docs/00_problem/restrictions.md:26` | Cameras over USB / MIPI-CSI / GigE |
| `_docs/00_problem/restrictions.md:41` | Jetson Orin Nano Super — 67 TOPS INT8, 8 GB LPDDR5, 25 W TDP |
| `_docs/00_problem/restrictions.md:42` | JetPack + CUDA + TensorRT |
| `_docs/00_problem/restrictions.md:43` | Sustained 25 W for 8 h at upper-envelope temperature (AC-NEW-5) |
| `_docs/00_problem/restrictions.md:48-51` | IMU + MAVLink2 from FC (serial/UDP); ArduPilot only |
| `_docs/01_solution/solution.md` | cuVSLAM (GPU), VPR DINOv2-VLAD (TensorRT), cross-view matcher (TensorRT) |
| this file (`environment.md`) | `runtime: nvidia`; `linux/arm64` HW tier + `linux/amd64+cuda` SW emulation tier; `nvidia-smi-exporter` |
Source-code scan is deferred to the first implement cycle (no source code yet at Plan Step 1).
### Mode A — Docker (T1 / T2 / T3)
**Prerequisites:**
- Docker 24.x+ with Compose v2
- For HW-tier runners: NVIDIA Container Toolkit + a host with an NVIDIA GPU (sm_87 for true Orin parity; sm_86 acceptable for SW emulation)
- For SW-emulation runners: `linux/amd64` host; CUDA emulation layer enabled in the SUT image's `linux/amd64+cuda` build target
- T2 only: deferred-corpus volumes mounted (AerialVL S03, etc. — see `test-data.md`)
- T3 only: `ardupilot-sitl` PR-#30080-pinned image pulled
**Run:**
```bash
# T1 pipeline
docker compose -f e2e/docker-compose.test.yml run --rm e2e-runner \
  pytest -m "blackbox and pipeline" --csv=/results/report.csv

# T2 deferred-corpus (corpus volumes must be present)
docker compose -f e2e/docker-compose.test.yml --profile corpus run --rm e2e-runner \
  pytest -m "deferred-corpus" --csv=/results/report.csv

# T3 deferred-sitl
docker compose -f e2e/docker-compose.test.yml --profile sitl run --rm e2e-runner \
  pytest -m "deferred-sitl" --csv=/results/report.csv
```
**Result collection:** host bind-mount `e2e-results:./results` — produces `report.csv`, `junit.xml`, `coverage_by_ac.csv`, `per_tier.csv`.
**Environment variables (key):** `MAVLINK_FC_URL`, `MAVLINK_GCS_URL`, `GPSD_API_BIND`, `GPSD_TILE_DIR`, `GPSD_FDR_DIR`, `MAVLINK2_SIGNING_KEY`, `JWT_SIGNING_KEY` — full list in `e2e/.env.example` (to be produced in Phase 4 / Decompose).
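A fail-fast check of these variables at runner start-up might look like the sketch below. Treating every listed variable as mandatory is an assumption; the authoritative list is the `e2e/.env.example` file that does not exist yet. The function name `load_env` is hypothetical.

```python
import os

# The key variables named above. ASSUMPTION: all are mandatory.
REQUIRED_ENV = (
    "MAVLINK_FC_URL",
    "MAVLINK_GCS_URL",
    "GPSD_API_BIND",
    "GPSD_TILE_DIR",
    "GPSD_FDR_DIR",
    "MAVLINK2_SIGNING_KEY",
    "JWT_SIGNING_KEY",
)


def load_env(environ=None):
    """Return the required settings, or raise listing every missing one."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_ENV if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_ENV}
```

Reporting all missing names in one error, rather than stopping at the first, saves a rerun per variable when a new CI runner is being configured.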
### Mode B — Local on bench Jetson (T4 deferred-hil)
**Prerequisites:**
- Real Jetson Orin Nano Super dev kit with JetPack 6.x, CUDA 12.x, TensorRT 10.x
- Bench MAVLink loop (a second Jetson or a USB-MAVLink dongle running `ardupilot-sitl` against a recorded IMU stream, OR a real autopilot board on bench)
- Thermal chamber (AC-NEW-5 only; otherwise lab ambient is sufficient for AC-4.1 / AC-4.2 / AC-NEW-1 cold-start / AC-NEW-3 8-h soak)
- `tegrastats` and `nvidia-smi` available
- Single-tenant scheduling — no other tests share the Jetson during a T4 run
**Run:**
```bash
# T4 perf binding on real HW
./scripts/run-tests.sh --tier=t4
# Or specifically the perf script for AC-4.1 / AC-NEW-5 binding
./scripts/run-performance-tests.sh --tier=t4 --thermal-profile=hot-soak
```
**Result collection:** the bench runner copies `report.csv` + `junit.xml` + `tegrastats.log` + `power.csv` to a network share (path TBD by Decompose).
### Mode C — Field (T5 deferred-field)
Out-of-band per the field-test plan; not part of CI. Captured here for completeness — the runner is the same `e2e-runner` image plus a recorded-flight replay harness defined in the field-test plan.
### CI runner mapping
| Tier | CI runner type | Mode | Cadence |
|------|---------------|------|---------|
| T1 pipeline | Linux x86 + NVIDIA GPU (any sm_86+) OR Linux x86 with CUDA emulation | Docker | Every PR + nightly |
| T2 deferred-corpus | Linux x86 + NVIDIA GPU (sm_86+) with corpus volume mounted | Docker | Nightly + on-tag |
| T3 deferred-sitl | Linux x86 (CPU-only OK) | Docker | Nightly |
| T4 deferred-hil | Self-hosted Jetson Orin Nano Super bench runner | Local | Bench-on-demand + weekly thermal cycle |
| T5 deferred-field | n/a (per-sortie out-of-band) | Field | Per field-test plan |
Phase 4 (`run-tests.sh`, `run-performance-tests.sh`) consumes this section to choose between the Docker and bench-local code paths via the `--tier=` flag.
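The selection logic behind `--tier=` is shown here in Python for illustration only, since the actual scripts are shell and not yet written. The table contents come from the runner mapping and tier tables above. The lowercase tier spellings follow the `--tier=t4` example; `TIER_PLAN` and `plan_for` are hypothetical names.

```python
import argparse

# Tier -> (execution mode, pytest marker expression).
# The T1-T3 expressions match the Mode A commands; T4/T5 use the
# markers from the Test Tiers table.
TIER_PLAN = {
    "t1": ("docker", "blackbox and pipeline"),
    "t2": ("docker", "deferred-corpus"),
    "t3": ("docker", "deferred-sitl"),
    "t4": ("local", "deferred-hil"),
    "t5": ("field", "deferred-field"),
}


def plan_for(argv):
    """Parse `--tier=<tN>` and return the execution plan for that tier."""
    parser = argparse.ArgumentParser(prog="run-tests")
    parser.add_argument("--tier", required=True, choices=sorted(TIER_PLAN))
    args = parser.parse_args(argv)
    mode, marker_expr = TIER_PLAN[args.tier]
    return {"tier": args.tier, "mode": mode, "marker_expr": marker_expr}
```

For example, `plan_for(["--tier=t4"])` selects the bench-local path with the `deferred-hil` marker, while `t1` to `t3` resolve to the Docker path.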
## External Dependencies
The SUT does not call commercial satellite providers at runtime (AC-8.1). All upstream sourcing is the Suite Satellite Service's responsibility, which is **out of scope** for this build. The test environment therefore relies on the following mocks and fixtures, with these scope notes:
- `tile-cache-init` provides the cache contents the SUT would normally have synced from the Service pre-flight.
- `qgc-mock` is a black-box GCS sniffer + operator-hint injector — not a real QGroundControl instance, but speaks the same MAVLink wire.
- `gps-spoof-injector` simulates a malicious GPS signal for AC-NEW-2 / F-T12.
- `ardupilot-sitl` is the only autopilot under test (PX4 is out of scope per restrictions).
- The SUT's HTTPS API is exercised against the SUT directly — there is no upstream identity provider; JWTs are minted by the runner against a test-only signing key shared at SUT start.
No external mocks have access to internal SUT state.
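Minting a test JWT in the runner can be sketched with the standard library alone. This is a sketch under assumptions: the document does not state the signing algorithm or the claims the SUT expects, so HS256 (HMAC-SHA256 over a shared secret) and the `sub` / `iat` / `exp` claims below are illustrative. `mint_test_jwt` and `auth_header` are hypothetical helper names.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by the JWS compact form."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_test_jwt(signing_key: bytes, subject="e2e-runner", ttl_s=300, now=None):
    """Build an HS256 JWT. ASSUMPTION: algorithm and claims are illustrative."""
    now = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": now, "exp": now + ttl_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        signing_key, signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def auth_header(token: str) -> dict:
    """HTTP header carrying the token as a bearer credential."""
    return {"Authorization": f"Bearer {token}"}
```

The resulting header dict can be passed to the HTTP client for calls such as `GET sut:8443/health`. A short `ttl_s` keeps a leaked test token from remaining useful for long.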