Enhanced the SKILL.md file to enforce conciseness rules for the state file, specifying acceptable content and file size limits. Updated the autodev state to reflect the transition to the planning phase, including changes to the current step and sub-step details. Revised acceptance criteria to clarify validation requirements and external dependencies, ensuring alignment with the latest research findings. Added a new overlay for Mode B revisions to track changes and decisions made during the assessment process.
Test Environment
Overview
System under test (SUT): gps-denied-onboard companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
- Inbound (nav-camera frames): V4L2 / GStreamer source (production: USB / MIPI-CSI / GigE per `restrictions.md`; tests: file-backed source replaying `_docs/00_problem/input_data/AD0000NN.jpg` or `flight_derkachi/flight_derkachi.mp4`).
- Inbound (FC telemetry): MAVLink (ArduPilot) or MSP2 (iNav) inbound stream carrying `SCALED_IMU2`, `ATTITUDE`, `GLOBAL_POSITION_INT` (or MSP equivalents). Tests replay `flight_derkachi/data_imu.csv` through a thin replayer.
- Inbound (satellite tile cache): filesystem + on-disk index (FAISS HNSW + tile manifest). Tests load a fixture cache mounted as a Docker volume.
- Outbound (FC external-positioning): MAVLink `GPS_INPUT` (ArduPilot Plane) OR MSP2 `MSP2_SENSOR_GPS` (iNav). Tests observe these by spinning up the corresponding open-source SITL and reading what reaches the FC.
- Outbound (GCS telemetry): MAVLink to QGroundControl (1-2 Hz downsample of estimates + STATUSTEXT). Tests subscribe via a passive MAVLink listener.
- Outbound (Flight Data Recorder): NVM filesystem (per AC-NEW-3). Tests read the resulting FDR archive after the run.
Consumer app purpose: The e2e harness drives the SUT through these public boundaries — replaying frames + telemetry, mounting tile-cache fixtures, observing FC-side acceptance via SITL, and parsing FDR output. It NEVER imports SUT modules, NEVER queries SUT internal state, and NEVER touches the SUT's filesystem outside the FDR output directory.
Two-tier execution profile
This project requires two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
| Tier | Hardware | What it covers | What it skips |
|---|---|---|---|
| Tier-1 (workstation Docker) | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All FT-* correctness, schema, NFT-RES-* resilience scenarios, NFT-SEC-* security scenarios, NFT-LIM-* storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| Tier-2 (Jetson hardware loop) | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) |
CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightly cadence and pre-release gate; results are imported into the same CSV report format as Tier-1.
Docker Environment (Tier-1)
Services
| Service | Image / Build | Purpose | Ports |
|---|---|---|---|
| `gps-denied-onboard` | local build (`docker/Dockerfile`) | The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON` | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| `ardupilot-plane-sitl` | `ardupilot/ardupilot-sitl:plane-stable` | ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| `inav-sitl` | `inavflight/inav-sitl:9.0.0` | iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| `mock-suite-sat-service` | local build (`tests/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`tests/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios | none |
| `mavproxy-listener` | `ardupilot/mavproxy:latest` | Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions | 14551/udp |
Networks
| Network | Services | Purpose |
|---|---|---|
| `e2e-net` | all | Isolated test network. No host networking, no internet. Per RESTRICT-SAT-1, the SUT must NEVER reach an external satellite provider during a flight; a deny-all egress rule on `e2e-net` enforces this and is itself a security test (NFT-SEC-02). |
Volumes
| Volume | Mounted to | Purpose |
|---|---|---|
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run by `tests/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `fdr-output` | `gps-denied-onboard:/var/azaion/fdr` | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume) |
| `input-data` | `e2e-runner:/test-data:ro` | Bind mount of `_docs/00_problem/input_data/` for replay |
| `expected-results` | `e2e-runner:/expected:ro` | Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions |
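Because the runner sees `fdr-output` only as a read-only mount, a harness-side size check is one way to assert the AC-NEW-3 cap independently of how the volume limit is enforced. The sketch below is illustrative only: the function names are hypothetical, and reading "64 GB" as 64 GiB (to match `size=64g`) is an assumption.

```python
import os

# Assumption: the AC-NEW-3 "64 GB" cap is read as 64 GiB, matching size=64g.
FDR_CAP_BYTES = 64 * 1024**3


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fdr_within_cap(root, cap_bytes=FDR_CAP_BYTES):
    """Return True when the FDR output directory fits inside the cap."""
    return directory_size_bytes(root) <= cap_bytes
```

Inside `e2e-runner` this would be called as `fdr_within_cap("/fdr")` after the run completes.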
docker-compose structure
services:
  gps-denied-onboard:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
    networks: [e2e-net]
    volumes:
      - tile-cache-fixture:/var/azaion/tile-cache:ro
      - fdr-output:/var/azaion/fdr
    environment:
      ONBOARD_FC_ADAPTER: ${FC_ADAPTER}  # ardupilot | inav, set per scenario
      ONBOARD_VIO_STRATEGY: ${VIO_STRATEGY}  # okvis2 | klt_ransac (production); vins_mono only in research build
      MAVLINK_SIGNING_PASSKEY_FILE: /run/secrets/mavlink_passkey
    depends_on:
      - mock-suite-sat-service
  ardupilot-plane-sitl:
    image: ardupilot/ardupilot-sitl:plane-stable
    networks: [e2e-net]
    command: ["--vehicle=ArduPlane", "--gps-type=14"]  # GPS_TYPE=14 = MAV per ArduPilot SITL_simulation_parameters.html
  inav-sitl:
    image: inavflight/inav-sitl:9.0.0
    networks: [e2e-net]
    # iNav SITL exposes MSP on TCP 5760 (UART1) per docs/SITL/SITL.md
  mock-suite-sat-service:
    build: ../fixtures/mock-suite-sat
    networks: [e2e-net]
    # Egress restriction enforced at network level, not service level
  e2e-runner:
    build: ../runner
    networks: [e2e-net]
    volumes:
      - input-data:/test-data:ro
      - expected-results:/expected:ro
      - fdr-output:/fdr:ro
    depends_on:
      - gps-denied-onboard
      - ardupilot-plane-sitl
      - inav-sitl
      - mavproxy-listener
  mavproxy-listener:
    image: ardupilot/mavproxy:latest
    networks: [e2e-net]
networks:
  e2e-net:
    driver: bridge
    internal: true  # NO external connectivity (enforces RESTRICT-SAT-1)
volumes:
  tile-cache-fixture: {}
  fdr-output: {}
Consumer Application
Tech stack: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), msp_gps_toy (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.12.0 (frame source replay), numpy + scipy (geodesic-distance assertions in WGS84).
Entry point: pytest tests/e2e/ from inside e2e-runner. Each scenario is a parameterized pytest case keyed by FC adapter (ardupilot / inav).
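The geodesic-distance assertions named in the tech stack could look roughly like the sketch below. It is not the harness implementation: it uses only the standard library rather than numpy/scipy, it approximates the WGS84 ellipsoid with a sphere (haversine, typically within about 0.5% of the ellipsoidal distance), and the function names and the error threshold parameter are hypothetical.

```python
import math

# Mean Earth radius in metres; a spherical stand-in for the WGS84 ellipsoid.
EARTH_MEAN_RADIUS_M = 6_371_008.8


def haversine_m(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1_deg, lon1_deg, lat2_deg, lon2_deg))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(a))


def assert_position_within(estimate, truth, max_error_m):
    """Fail when the estimate (lat, lon) is farther than max_error_m from truth."""
    error_m = haversine_m(estimate[0], estimate[1], truth[0], truth[1])
    assert error_m <= max_error_m, f"position error {error_m:.1f} m exceeds {max_error_m} m"
    return error_m
```

For tight accuracy thresholds an ellipsoidal method (for example Vincenty or Karney) would be the safer choice; the spherical form is shown only because it is short and dependency-free.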
Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|---|---|---|---|
| Frame source | V4L2 / GStreamer file source | UNIX domain socket / shared `/test-data` mount | none (local) |
| FC telemetry inbound | MAVLink (AP) or MSP2 (iNav) | `udp:gps-denied-onboard:14550` (AP) or `tcp:gps-denied-onboard:5760` (iNav) | MAVLink 2.0 message signing on AP per D-C8-9 (passkey via Docker secret); iNav unsigned per accepted residual risk |
| Tile cache | Filesystem read | `/var/azaion/tile-cache` (read-only mount) | filesystem perms |
| FC external-pos outbound observation | Read SITL EKF source-set + `GLOBAL_POSITION_INT` replay back from SITL | `udp:ardupilot-plane-sitl:14550` or `tcp:inav-sitl:5760` | passive listener |
| GCS telemetry observation | MAVLink listener | `udp:mavproxy-listener:14551` (forwarded from SUT 14550) | none |
| FDR output | Filesystem read post-run | `/fdr` (read-only mount) | filesystem perms |
| Suite Sat Service mock | HTTP/JSON | `http://mock-suite-sat-service:8080` | none (test) |
What the consumer does NOT have access to
- No direct access to the SUT's internal state (GTSAM iSAM2 graph, FAISS index in-memory, OpenCV intermediate buffers, VioStrategy implementation pointer).
- No internal Python/C++ module imports from the SUT.
- No shared memory or filesystem with the SUT outside the four explicit mounts (`tile-cache-fixture` r/o, `fdr-output` r/o from runner side, `input-data` r/o, `expected-results` r/o).
- No bypass of the FC-side acceptance check: every AC-4.3 assertion goes through SITL.
CI/CD Integration
When to run:
- Tier-1 (workstation Docker): on every PR to the `dev` branch and nightly on `dev` HEAD.
- Tier-2 (Jetson hardware loop): nightly on `dev`, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: monthly on chamber-attached Jetson runner; failures block release tags only.
Pipeline stage:
- Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut).
- Tier-2 is a separate workflow on the `self-hosted-jetson-orin` runner.
Gate behavior: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
Timeout:
- Tier-1: 60 min per matrix entry.
- Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).
Reporting
Format: CSV (one row per test).
Columns: test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths
- `traces_to`: comma-separated AC/RESTRICT IDs from the traceability matrix.
- `fc_adapter`: `ardupilot` | `inav` | `n/a`.
- `vio_strategy`: `okvis2` | `klt_ransac` | `vins_mono` | `n/a` (research-build only for `vins_mono`).
- `tier`: `tier1-docker` | `tier2-jetson` | `tier2-chamber`.
- `result`: `PASS` | `FAIL` | `SKIP` | `XFAIL` (XFAIL only allowed for AC explicitly marked NOT COVERED in the traceability matrix and not yet promoted to a real test).
- `evidence_paths`: comma-separated paths inside the run-output bundle (`.tlog` files, FDR archives, screenshots, profiler traces) supporting the verdict.
Output path: e2e-results/run-${RUN_ID}/report.csv plus a per-run bundle of evidence at e2e-results/run-${RUN_ID}/evidence/.
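To make the report contract concrete, the following sketch writes and validates rows against the columns and enumerations listed in this section. It is a minimal illustration, not the runner's actual writer: `validate_row`, `write_report`, and every value in `example_row` are placeholders.

```python
import csv
import io

REPORT_COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message", "evidence_paths",
]

# Enumerated columns and their allowed values, as listed in this section.
ALLOWED_VALUES = {
    "fc_adapter": {"ardupilot", "inav", "n/a"},
    "vio_strategy": {"okvis2", "klt_ransac", "vins_mono", "n/a"},
    "tier": {"tier1-docker", "tier2-jetson", "tier2-chamber"},
    "result": {"PASS", "FAIL", "SKIP", "XFAIL"},
}


def validate_row(row):
    """Raise ValueError if a column is missing or an enumerated value is unknown."""
    missing = [col for col in REPORT_COLUMNS if col not in row]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for col, allowed in ALLOWED_VALUES.items():
        if row[col] not in allowed:
            raise ValueError(f"{col}={row[col]!r} not in {sorted(allowed)}")


def write_report(rows, stream):
    """Write a header plus one validated CSV row per test to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLUMNS)
    writer.writeheader()
    for row in rows:
        validate_row(row)
        writer.writerow(row)


# Placeholder row; none of these values come from a real run.
example_row = {
    "test_id": "EXAMPLE-01", "test_name": "placeholder", "traces_to": "AC-4.3,AC-NEW-2",
    "fc_adapter": "ardupilot", "vio_strategy": "okvis2", "tier": "tier1-docker",
    "started_at_utc": "2026-05-09T00:00:00Z", "execution_time_ms": 1234,
    "result": "PASS", "error_message": "", "evidence_paths": "evidence/example.tlog",
}
```

Since `traces_to` and `evidence_paths` are themselves comma-separated, the `csv` module quotes those fields automatically, so they survive a round trip through `csv.DictReader`.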
Test Execution
Decision (2026-05-09): both — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
Hardware dependencies found (Phase 3 → Hardware Assessment scan)
| Category | Indicator | Source file |
|---|---|---|
| GPU / CUDA | TensorRT engines (`.engine`, SM 87, JetPack 6.2, TRT 10.3) | `_docs/01_solution/solution.md` PRE-FLIGHT block |
| GPU / CUDA | DISK+LightGlue FP16 inference | _docs/01_solution/solution.md RUNTIME block (C3) |
| GPU / CUDA pin | Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W) | _docs/00_problem/restrictions.md § Onboard Hardware |
| Sensors / Cameras | ADTi 20MP 20L V1 nadir camera over USB / MIPI-CSI / GigE | _docs/00_problem/restrictions.md § Cameras |
| Sensors / Cameras | V4L2 / GStreamer frame source (production) | _docs/02_document/tests/environment.md § Overview |
| OS-specific services | High-rate IMU via UART/MAVLink to FC | _docs/00_problem/restrictions.md § Sensors & Integration |
| OS-specific services | Per-FC inbound (MAVLink GPS_INPUT for AP, MSP2 over UART for iNav) | _docs/00_problem/restrictions.md § Sensors & Integration |
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | _docs/02_document/tests/resource-limit-tests.md NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | _docs/00_problem/restrictions.md § Failsafe & Safety + AC-NEW-5 |
(Step 2 Code scan returned zero indicators because no source code exists yet — this is the planning phase. Decompose → Implement will produce requirements.txt / pyproject.toml / Cargo.toml entries that confirm: tensorrt, pycuda, pymavlink, gtsam, faiss-gpu, opencv-python>=4.12.0, jetson-stats.)
Execution instructions — Tier-1 (Docker)
Prerequisites:
- Docker 24+ with Compose v2.
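- NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path; TensorRT itself needs an NVIDIA GPU, so without one the TensorRT path is not exercised and the SUT must use whatever CPU inference fallback it provides).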
- ≥16 GB host RAM, ≥80 GB free disk for `tile-cache-fixture` + `fdr-output` + image build cache.
How to start:
cd e2e/docker
export FC_ADAPTER=ardupilot # or: inav (parameterized per scenario in CI)
export VIO_STRATEGY=okvis2 # or: klt_ransac (production binary)
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit --exit-code-from e2e-runner e2e-runner
The run reports to ./e2e-results/run-${RUN_ID}/report.csv (see § Reporting). Exit code matches the test verdict.
Environment variables:
- `FC_ADAPTER` ∈ `{ardupilot, inav}`: selects which SITL the SUT talks to.
- `VIO_STRATEGY` ∈ `{okvis2, klt_ransac}` for the production binary; `vins_mono` only with the research binary (built with `BUILD_VINS_MONO=ON`).
- `MAVLINK_SIGNING_PASSKEY_FILE`: path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
Skipped on Tier-1: NFT-PERF-01 (AC-4.1 latency p95 — Jetson-bound), NFT-LIM-01 (AC-4.2 memory — Jetson-bound), NFT-PERF-03 (AC-NEW-1 cold-start — Jetson-bound), NFT-LIM-04 (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).
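One way the runner could apply this skip list is a small lookup keyed by test ID and tier, as sketched below. The mapping mirrors the list above; the function name `skip_reason` is hypothetical, and the chamber portion of AC-NEW-5 is left out because the text gives it no test ID.

```python
# Test IDs skipped on Tier-1, with the reason recorded in the report.
TIER1_SKIPPED = {
    "NFT-PERF-01": "AC-4.1 latency p95 (Jetson-bound)",
    "NFT-LIM-01": "AC-4.2 memory (Jetson-bound)",
    "NFT-PERF-03": "AC-NEW-1 cold-start (Jetson-bound)",
    "NFT-LIM-04": "AC-NEW-5 chamber baseline (Jetson-bound)",
}


def skip_reason(test_id, tier):
    """Return the skip reason for a test on the given tier, or None if it runs."""
    if tier == "tier1-docker":
        return TIER1_SKIPPED.get(test_id)
    return None
```

A non-`None` return would map to `result = SKIP` in the CSV report, with the reason placed in `error_message`; that mapping is a suggestion, not something the report format mandates.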
Execution instructions — Tier-2 (Jetson hardware loop)
Prerequisites:
- Jetson Orin Nano Super (per `restrictions.md` § Onboard Hardware).
- JetPack 6.2 + CUDA + TensorRT 10.3 + cuDNN per D-C7-9.
- Workstation thermal-day environment for NFT-LIM-04 baseline. Chamber-attached runner for AC-NEW-5 chamber portion (separate quarterly job; not run in standard CI).
- ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network (both are supported).
- Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all `AC-2.x` cross-validation is `XFAIL` for that run).
How to start:
cd e2e/jetson
sudo systemctl restart gps-denied-onboard.service
./run-tier2.sh --fc-adapter ardupilot --vio-strategy okvis2 --duration 8h
# or:
./run-tier2.sh --fc-adapter inav --vio-strategy klt_ransac --duration 5min
Outputs the same CSV format as Tier-1 (one report.csv per run).
Environment variables: same as Tier-1 plus:
- `TIER2_CHAMBER_AMBIENT_C`: ambient temperature for AC-NEW-5 chamber runs.
- `TIER2_CAMERA_DEVICE`: `/dev/video0` (production) or file path for replay mode.
CI runner mapping
- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
- `self-hosted-jetson-orin` → Tier-2 Jetson, nightly on `dev` HEAD + pre-release gate. ~4 hr per matrix entry.
- `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.
Matrix dimensions: FC_ADAPTER × VIO_STRATEGY × build_kind where build_kind ∈ {production, research}. Production vins_mono is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
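The matrix rule can be spelled out by enumerating the combinations, as in the sketch below. The value sets come from this document; the names `VIO_BY_BUILD` and `matrix_entries` are illustrative only.

```python
from itertools import product

FC_ADAPTERS = ("ardupilot", "inav")

# Production excludes vins_mono (D-C1-1-SUB-A); research allows all three.
VIO_BY_BUILD = {
    "production": ("okvis2", "klt_ransac"),
    "research": ("okvis2", "klt_ransac", "vins_mono"),
}


def matrix_entries():
    """Return every allowed (fc_adapter, vio_strategy, build_kind) combination."""
    return [
        {"fc_adapter": fc, "vio_strategy": vio, "build_kind": build}
        for build, vios in VIO_BY_BUILD.items()
        for fc, vio in product(FC_ADAPTERS, vios)
    ]
```

Under these sets the matrix has 10 entries: 4 production (2 adapters × 2 strategies) and 6 research (2 adapters × 3 strategies).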