gps-denied-onboard/_docs/02_document/tests/environment.md
Oleksandr Bezdieniezhnykh c19c76481c Update autodev skill documentation and acceptance criteria
Enhanced the SKILL.md file to enforce conciseness rules for the state file, specifying acceptable content and file size limits. Updated the autodev state to reflect the transition to the planning phase, including changes to the current step and sub-step details. Revised acceptance criteria to clarify validation requirements and external dependencies, ensuring alignment with the latest research findings. Added a new overlay for Mode B revisions to track changes and decisions made during the assessment process.
2026-05-09 03:10:57 +03:00

Test Environment

Overview

System under test (SUT): gps-denied-onboard companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):

  • Inbound — nav-camera frames: V4L2 / GStreamer source (production: USB / MIPI-CSI / GigE per restrictions.md; tests: file-backed source replaying _docs/00_problem/input_data/AD0000NN.jpg or flight_derkachi/flight_derkachi.mp4).
  • Inbound — FC telemetry: MAVLink (ArduPilot) or MSP2 (iNav) inbound stream carrying SCALED_IMU2, ATTITUDE, GLOBAL_POSITION_INT (or MSP equivalents). Tests replay flight_derkachi/data_imu.csv through a thin replayer.
  • Inbound — satellite tile cache: filesystem + on-disk index (FAISS HNSW + tile manifest). Tests load a fixture cache mounted as a Docker volume.
  • Outbound — FC external-positioning: MAVLink GPS_INPUT (ArduPilot Plane) OR MSP2 MSP2_SENSOR_GPS (iNav). Tests observe these by spinning up the corresponding open-source SITL and reading what reaches the FC.
  • Outbound — GCS telemetry: MAVLink to QGroundControl (1-2 Hz downsample of estimates + STATUSTEXT). Tests subscribe via a passive MAVLink listener.
  • Outbound — Flight Data Recorder: NVM filesystem (per AC-NEW-3). Tests read the resulting FDR archive after the run.

Consumer app purpose: The e2e harness drives the SUT through these public boundaries — replaying frames + telemetry, mounting tile-cache fixtures, observing FC-side acceptance via SITL, and parsing FDR output. It NEVER imports SUT modules, NEVER queries SUT internal state, and NEVER touches the SUT's filesystem outside the FDR output directory.
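The telemetry replay described above can be sketched as a time-paced generator. This is a minimal illustration, not the project's replayer: the column name `t_s` (seconds since the start of the recording) is a hypothetical stand-in, since the actual layout of data_imu.csv is not specified here, and the injectable `sleep` / `clock` parameters exist only so the pacing can be exercised without waiting in real time.

```python
import time
from typing import Callable, Iterable, Iterator


def replay_paced(
    rows: Iterable[dict],
    time_key: str = "t_s",  # hypothetical column: seconds since recording start
    speed: float = 1.0,     # 2.0 replays twice as fast as recorded
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[dict]:
    """Yield each row no earlier than its recorded offset from the first row."""
    first_t = None
    start_wall = None
    for row in rows:
        t = float(row[time_key])
        if first_t is None:
            # Anchor the recording timeline to the wall clock at the first row.
            first_t, start_wall = t, clock()
        due = start_wall + (t - first_t) / speed
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        yield row
```

A caller would typically wrap `csv.DictReader(open(path, newline=""))` and forward each yielded row to whatever encoder produces the MAVLink or MSP2 messages; that encoding step is deliberately left out of the sketch.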

Two-tier execution profile

This project requires two distinct test environments because the production target is Jetson hardware, and AC-4.1, AC-4.2, AC-NEW-1 and AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.

| Tier | Hardware | What it covers | What it skips |
| --- | --- | --- | --- |
| Tier-1 (workstation Docker) | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All FT-* correctness, schema, NFT-RES-* resilience scenarios, NFT-SEC-* security scenarios, NFT-LIM-* storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| Tier-2 (Jetson hardware loop) | Jetson Orin Nano Super (pinned hardware per restrictions.md), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) |

CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners nightly and as a pre-release gate; its results are imported into the same CSV report format as Tier-1.

Docker Environment (Tier-1)

Services

| Service | Image / Build | Purpose | Ports |
| --- | --- | --- | --- |
| gps-denied-onboard | local build (docker/Dockerfile) | The SUT. Production binary built with BUILD_VINS_MONO=OFF per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with BUILD_VINS_MONO=ON | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| ardupilot-plane-sitl | ardupilot/ardupilot-sitl:plane-stable | ArduPilot Plane SITL. Receives GPS_INPUT from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| inav-sitl | inavflight/inav-sitl:9.0.0 | iNav SITL. Receives MSP2_SENSOR_GPS from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| mock-suite-sat-service | local build (tests/fixtures/mock-suite-sat) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| e2e-runner | local build (tests/runner) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios | none |
| mavproxy-listener | ardupilot/mavproxy:latest | Passive MAVLink listener that captures the SUT → GCS stream into a per-run .tlog for assertions | 14551/udp |

Networks

| Network | Services | Purpose |
| --- | --- | --- |
| e2e-net | all | Isolated test network. No host networking, no internet. Per RESTRICT-SAT-1, the SUT must NEVER reach an external satellite provider during a flight; a deny-all egress rule on e2e-net enforces this and is itself a security test (NFT-SEC-02). |
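A check in the spirit of NFT-SEC-02 could be sketched as below. It is an illustration only: how NFT-SEC-02 is actually implemented is not specified here, the probe would have to run from a container attached to e2e-net (for example the runner, since the harness never executes code inside the SUT), and the target host and port are whatever external endpoint the test chooses.

```python
import socket
from typing import Callable


def egress_blocked(
    host: str,
    port: int,
    timeout_s: float = 3.0,
    connect: Callable = socket.create_connection,  # injectable for offline tests
) -> bool:
    """Return True if an outbound TCP connection to host:port cannot be opened."""
    try:
        conn = connect((host, port), timeout=timeout_s)
    except OSError:
        # Refused connections, unreachable networks, DNS failures
        # (socket.gaierror) and timeouts all subclass OSError.
        return True
    conn.close()
    return False
```

A probe like this only covers TCP to a single endpoint, so a passing result is evidence of isolation rather than proof; a fuller scenario would also probe DNS resolution and UDP.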

Volumes

| Volume | Mounted to | Purpose |
| --- | --- | --- |
| tile-cache-fixture | gps-denied-onboard:/var/azaion/tile-cache:ro | Pre-built FAISS HNSW index + tile filesystem. Built once per test run by tests/fixtures/tile-cache-builder/ from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| fdr-output | gps-denied-onboard:/var/azaion/fdr | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker --storage-opt size=64g on this volume) |
| input-data | e2e-runner:/test-data:ro | Bind mount of _docs/00_problem/input_data/ for replay |
| expected-results | e2e-runner:/expected:ro | Bind mount of _docs/00_problem/input_data/expected_results/ for assertions |
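Independently of the Docker-level limit, the runner could assert the AC-NEW-3 cap on the FDR output it reads after a run. The sketch below sums file sizes under a directory; it assumes "64 GB" means 64 GiB (matching `size=64g`), which is an interpretation rather than something AC-NEW-3 is quoted as stating here.

```python
import os

FDR_CAP_BYTES = 64 * 1024**3  # assumption: "64 GB" is read as 64 GiB


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fdr_within_cap(root: str, cap_bytes: int = FDR_CAP_BYTES) -> bool:
    """Return True if the FDR directory does not exceed the storage cap."""
    return dir_size_bytes(root) <= cap_bytes
```

From the runner's side this would be called as `fdr_within_cap("/fdr")`, using the read-only mount of the fdr-output volume.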

docker-compose structure

services:
  gps-denied-onboard:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
    networks: [e2e-net]
    volumes:
      - tile-cache-fixture:/var/azaion/tile-cache:ro
      - fdr-output:/var/azaion/fdr
    environment:
      ONBOARD_FC_ADAPTER: ${FC_ADAPTER}        # ardupilot | inav, set per scenario
      ONBOARD_VIO_STRATEGY: ${VIO_STRATEGY}    # okvis2 | klt_ransac (production); vins_mono only in research build
      MAVLINK_SIGNING_PASSKEY_FILE: /run/secrets/mavlink_passkey
    depends_on:
      - mock-suite-sat-service

  ardupilot-plane-sitl:
    image: ardupilot/ardupilot-sitl:plane-stable
    networks: [e2e-net]
    command: ["--vehicle=ArduPlane", "--gps-type=14"]   # GPS_TYPE=14 = MAV per ArduPilot SITL_simulation_parameters.html

  inav-sitl:
    image: inavflight/inav-sitl:9.0.0
    networks: [e2e-net]
    # iNav SITL exposes MSP on TCP 5760 (UART1) per docs/SITL/SITL.md

  mock-suite-sat-service:
    build: ../fixtures/mock-suite-sat
    networks: [e2e-net]
    # Egress restriction enforced at network level, not service level

  e2e-runner:
    build: ../runner
    networks: [e2e-net]
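    volumes:
      # input-data and expected-results are bind mounts (see § Volumes), so they use host paths
      # rather than undeclared named volumes; paths assume the repo root is ../.. as in build.context above
      - ../../_docs/00_problem/input_data:/test-data:ro
      - ../../_docs/00_problem/input_data/expected_results:/expected:ro
      - fdr-output:/fdr:ro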
    depends_on:
      - gps-denied-onboard
      - ardupilot-plane-sitl
      - inav-sitl
      - mavproxy-listener

  mavproxy-listener:
    image: ardupilot/mavproxy:latest
    networks: [e2e-net]

networks:
  e2e-net:
    driver: bridge
    internal: true   # NO external connectivity (enforces RESTRICT-SAT-1)

volumes:
  tile-cache-fixture: {}
  fdr-output: {}

Consumer Application

Tech stack: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), msp_gps_toy (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.12.0 (frame source replay), numpy + scipy (geodesic-distance assertions in WGS84).
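As an illustration of the kind of distance assertion the harness makes, the sketch below computes a great-circle distance with the haversine formula using only the standard library. It is a spherical approximation, not a true WGS84 ellipsoidal geodesic: the two can differ by a few tenths of a percent, so an assertion whose tolerance is comparable to that error would need an ellipsoidal solver instead. The function name is illustrative.

```python
import math

EARTH_MEAN_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1_deg: float, lon1_deg: float,
                lat2_deg: float, lon2_deg: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
```

A test would compare `haversine_m(est_lat, est_lon, ref_lat, ref_lon)` against the error bound of the relevant AC, with the reference coordinates taken from the expected-results fixtures.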

Entry point: pytest tests/e2e/ from inside e2e-runner. Each scenario is a parameterized pytest case keyed by FC adapter (ardupilot / inav).

Communication with system under test

| Interface | Protocol | Endpoint / Topic | Authentication |
| --- | --- | --- | --- |
| Frame source | V4L2 / GStreamer file source | UNIX domain socket / shared /test-data mount | none (local) |
| FC telemetry inbound | MAVLink (AP) or MSP2 (iNav) | udp:gps-denied-onboard:14550 (AP) or tcp:gps-denied-onboard:5760 (iNav) | MAVLink 2.0 message signing on AP per D-C8-9 (passkey via Docker secret); iNav unsigned per accepted residual risk |
| Tile cache | Filesystem read | /var/azaion/tile-cache (read-only mount) | filesystem perms |
| FC external-pos outbound observation | Read SITL EKF source-set + GLOBAL_POSITION_INT replay back from SITL | udp:ardupilot-plane-sitl:14550 or tcp:inav-sitl:5760 | passive listener |
| GCS telemetry observation | MAVLink listener | udp:mavproxy-listener:14551 (forwarded from SUT 14550) | none |
| FDR output | Filesystem read post-run | /fdr (read-only mount) | filesystem perms |
| Suite Sat Service mock | HTTP/JSON | http://mock-suite-sat-service:8080 | none (test) |

What the consumer does NOT have access to

  • No direct access to the SUT's internal state (GTSAM iSAM2 graph, FAISS index in-memory, OpenCV intermediate buffers, VioStrategy implementation pointer).
  • No internal Python/C++ module imports from the SUT.
  • No shared memory or filesystem with the SUT outside the four explicit mounts (tile-cache-fixture r/o, fdr-output r/o from runner side, input-data r/o, expected-results r/o).
  • No bypass of the FC-side acceptance check — every AC-4.3 assertion goes through SITL.

CI/CD Integration

When to run:

  • Tier-1 (workstation Docker): on every PR to dev branch and nightly on dev HEAD.
  • Tier-2 (Jetson hardware loop): nightly on dev, and as a hard gate before any release tag.
  • AC-NEW-5 thermal envelope: quarterly and before any release tag, on the chamber-attached Jetson runner; failures block release tags only.

Pipeline stage:

  • Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut).
  • Tier-2 is a separate workflow on self-hosted-jetson-orin runner.

Gate behavior: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
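The gate rules can be expressed as a small lookup. This is a sketch under stated assumptions: the tier names reuse the values of the tier report column, the event names `pr` and `release_tag` are hypothetical labels, and any tier/event combination not named in the gate rule is treated as warning-only, which the rule itself does not spell out.

```python
# (tier, event) pairs for which a test failure blocks the pipeline.
BLOCKING = {
    ("tier1-docker", "pr"),            # Tier-1 blocks PR merge
    ("tier2-jetson", "release_tag"),   # Tier-2 blocks release tag
    ("tier2-chamber", "release_tag"),  # chamber tests block release tags only
}


def blocks(tier: str, event: str, any_failure: bool) -> bool:
    """Return True if a failing run of this tier must block this event."""
    # Combinations absent from BLOCKING are treated as warning-only (assumption).
    return any_failure and (tier, event) in BLOCKING
```

For example, `blocks("tier2-chamber", "pr", True)` is false, which corresponds to the warning-only behaviour of chamber tests on PRs.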

Timeout:

  • Tier-1: 60 min per matrix entry.
  • Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
  • Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).

Reporting

Format: CSV (one row per test).

Columns: test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths

  • traces_to: comma-separated AC/RESTRICT IDs from the traceability matrix.
  • fc_adapter: ardupilot | inav | n/a.
  • vio_strategy: okvis2 | klt_ransac | vins_mono | n/a (research-build only for vins_mono).
  • tier: tier1-docker | tier2-jetson | tier2-chamber.
  • result: PASS | FAIL | SKIP | XFAIL (XFAIL only allowed for AC explicitly marked NOT COVERED in the traceability matrix and not yet promoted to a real test).
  • evidence_paths: comma-separated paths inside the run-output bundle (.tlog files, FDR archives, screenshots, profiler traces) supporting the verdict.

Output path: e2e-results/run-${RUN_ID}/report.csv plus a per-run bundle of evidence at e2e-results/run-${RUN_ID}/evidence/.
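A minimal sketch of writing a report with these columns, using the standard csv module, is shown below. The function name and the decision to reject unknown result values are illustrative choices, not part of the specification. Because traces_to and evidence_paths are themselves comma-separated, the example relies on the csv module quoting those fields.

```python
import csv
from typing import Iterable, TextIO

COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message",
    "evidence_paths",
]
RESULTS = {"PASS", "FAIL", "SKIP", "XFAIL"}


def write_report(rows: Iterable[dict], out: TextIO) -> None:
    """Write one CSV row per test, with a header, rejecting unknown verdicts."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        if row["result"] not in RESULTS:
            raise ValueError(f"unknown result {row['result']!r} for {row['test_id']}")
        # Fields containing commas are quoted automatically by the csv module.
        writer.writerow(row)
```

When writing to disk, the file should be opened with `newline=""` so the csv module controls line endings, for example `open(path, "w", newline="")`.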

Test Execution

Decision (2026-05-09): both — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.

Hardware dependencies found (Phase 3 → Hardware Assessment scan)

| Category | Indicator | Source file |
| --- | --- | --- |
| GPU / CUDA | TensorRT engines (.engine, SM 87, JetPack 6.2, TRT 10.3) | _docs/01_solution/solution.md PRE-FLIGHT block |
| GPU / CUDA | DISK+LightGlue FP16 inference | _docs/01_solution/solution.md RUNTIME block (C3) |
| GPU / CUDA | Hardware pin: Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W) | _docs/00_problem/restrictions.md § Onboard Hardware |
| Sensors / Cameras | ADTi 20MP 20L V1 nadir camera over USB / MIPI-CSI / GigE | _docs/00_problem/restrictions.md § Cameras |
| Sensors / Cameras | V4L2 / GStreamer frame source (production) | _docs/02_document/tests/environment.md § Overview |
| OS-specific services | High-rate IMU via UART/MAVLink to FC | _docs/00_problem/restrictions.md § Sensors & Integration |
| OS-specific services | Per-FC inbound (MAVLink GPS_INPUT for AP, MSP2 over UART for iNav) | _docs/00_problem/restrictions.md § Sensors & Integration |
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | _docs/02_document/tests/resource-limit-tests.md NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | _docs/00_problem/restrictions.md § Failsafe & Safety + AC-NEW-5 |

(Step 2 Code scan returned zero indicators because no source code exists yet — this is the planning phase. Decompose → Implement will produce requirements.txt / pyproject.toml / Cargo.toml entries that confirm: tensorrt, pycuda, pymavlink, gtsam, faiss-gpu, opencv-python>=4.12.0, jetson-stats.)

Execution instructions — Tier-1 (Docker)

Prerequisites:

  • Docker 24+ with Compose v2.
  • NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path). TensorRT itself requires an NVIDIA GPU, so without one the SUT has to fall back to a non-TensorRT CPU inference path.
  • ≥16 GB host RAM, ≥80 GB free disk for tile-cache-fixture + fdr-output + image build cache.

How to start:

cd e2e/docker
export FC_ADAPTER=ardupilot          # or: inav  (parameterized per scenario in CI)
export VIO_STRATEGY=okvis2           # or: klt_ransac  (production binary)
# --exit-code-from implies --abort-on-container-exit and propagates the runner's exit status
docker compose -f docker-compose.test.yml up --build --exit-code-from e2e-runner e2e-runner

The run reports to ./e2e-results/run-${RUN_ID}/report.csv (see § Reporting). Exit code matches the test verdict.
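One way the runner's exit code could be tied to the report is sketched below. It assumes that only FAIL rows make the run fail and that SKIP and XFAIL rows do not; that policy is an assumption, not something this section states.

```python
import csv
from typing import TextIO


def exit_code_from_report(report_file: TextIO) -> int:
    """Return 1 if any report row has result FAIL, otherwise 0."""
    reader = csv.DictReader(report_file)
    # Assumption: SKIP and XFAIL do not fail the run; only FAIL does.
    return 1 if any(row["result"] == "FAIL" for row in reader) else 0
```

A wrapper script would open report.csv with `newline=""`, pass the handle to this function, and hand the result to `sys.exit`.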

Environment variables:

  • FC_ADAPTER ∈ {ardupilot, inav} — selects which SITL the SUT talks to.
  • VIO_STRATEGY ∈ {okvis2, klt_ransac} for the production binary; vins_mono only when the research binary (BUILD_VINS_MONO=ON) is the build.
  • MAVLINK_SIGNING_PASSKEY_FILE — path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.

Skipped on Tier-1: NFT-PERF-01 (AC-4.1 latency p95 — Jetson-bound), NFT-LIM-01 (AC-4.2 memory — Jetson-bound), NFT-PERF-03 (AC-NEW-1 cold-start — Jetson-bound), NFT-LIM-04 (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).

Execution instructions — Tier-2 (Jetson hardware loop)

Prerequisites:

  • Jetson Orin Nano Super (per restrictions.md § Onboard Hardware).
  • JetPack 6.2 + CUDA + TensorRT 10.3 + cuDNN per D-C7-9.
  • Workstation thermal-day environment for NFT-LIM-04 baseline. Chamber-attached runner for AC-NEW-5 chamber portion (separate quarterly job; not run in standard CI).
  • ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
  • Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all AC-2.x cross-validation is XFAIL for that run).

How to start:

cd e2e/jetson
sudo systemctl restart gps-denied-onboard.service
./run-tier2.sh --fc-adapter ardupilot --vio-strategy okvis2 --duration 8h
# or:
./run-tier2.sh --fc-adapter inav --vio-strategy klt_ransac --duration 5min

Outputs the same CSV format as Tier-1 (one report.csv per run).

Environment variables: same as Tier-1 plus:

  • TIER2_CHAMBER_AMBIENT_C — ambient temperature for AC-NEW-5 chamber runs.
  • TIER2_CAMERA_DEVICE = /dev/video0 (production) or a file path for replay mode.

CI runner mapping

  • ubuntu-24.04 (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
  • self-hosted-jetson-orin → Tier-2 Jetson, nightly on dev HEAD + pre-release gate. ~4 hr per matrix entry.
  • self-hosted-jetson-orin-chamber → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.

Matrix dimensions: FC_ADAPTER × VIO_STRATEGY × build_kind where build_kind ∈ {production, research}. Production vins_mono is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
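The matrix and its exclusion can be enumerated as follows. This is a sketch: the dictionary keys mirror the report column names and build_kind, and the function name is illustrative.

```python
from itertools import product

FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
BUILD_KINDS = ("production", "research")


def ci_matrix() -> list:
    """Enumerate FC_ADAPTER x VIO_STRATEGY x build_kind, minus production vins_mono."""
    return [
        {"fc_adapter": fc, "vio_strategy": vio, "build_kind": build}
        for fc, vio, build in product(FC_ADAPTERS, VIO_STRATEGIES, BUILD_KINDS)
        # Production builds are compiled with BUILD_VINS_MONO=OFF (D-C1-1-SUB-A).
        if not (build == "production" and vio == "vins_mono")
    ]
```

Under these dimensions the full product has 12 combinations and the exclusion removes 2, leaving 10 entries: 4 production and 6 research.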