Test Environment
Active policy (2026-05-20, refined): the canonical CI / release-gate test environment is the Jetson Orin Nano Super (or a Jetson-equivalent arm64 agent).

Unit tests (`pytest tests/unit/`) MAY be run on a local developer workstation for fast iteration. They are hardware-agnostic by construction and the suite is fully synthetic, so Jetson SSH round-trips add latency without adding signal.

Blackbox / e2e / performance / resilience / security / resource-limit tests (`tests/e2e/`, `e2e/tests/`, `tests/perf/`, etc.) MUST run on the Jetson and never on a local workstation. Their pass criteria are tied to Jetson wall-clock latency, the thermal envelope, and the real-camera + real-FC SITL loop.

Workstation x86 Docker (the historical "Tier-1" path) is deprecated as a supported e2e environment. The Tier-1 sections below are retained as historical reference and for traceability only.

CI e2e pipelines target the colocated arm64 Jetson Woodpecker agent (see `_docs/04_deploy/ci_cd_pipeline.md`). Local-development e2e runs SHOULD use `scripts/run-tests-jetson.sh` against the configured `jetson-e2e` SSH alias rather than `scripts/run-tests.sh`.

This refinement supersedes the 2026-05-20 "all tiers on Jetson" wording and the 2026-05-09 "both" decision recorded in § Test Execution.
Where each tier runs (active policy)
| Tier | Local workstation | Jetson (canonical) | Local run acceptable when Jetson is unavailable |
|---|---|---|---|
| Unit (`tests/unit/`) | ✅ allowed and encouraged for dev iteration | ✅ also run as part of the Jetson CI lane | always |
| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | ❌ forbidden: placeholder fixtures + missing hardware = false-negative runs | ✅ required for any merge / release decision | never; if the Jetson is unreachable, the e2e verdict is "not run" rather than a local result |
| Performance / resilience / security / resource-limit | ❌ forbidden | ✅ required | never |
| Thermal chamber (AC-NEW-5) | ❌ forbidden | ✅ chamber Jetson only | never |
Practical consequences:
- A PR may merge on green local unit tests + green Jetson e2e tests.
- A PR MAY NOT merge on green local unit tests alone — the Jetson e2e lane is the binding signal.
- When the Jetson agent is offline, the e2e verdict is "pending Jetson". Record the gap (e.g. via `_docs/_process_leftovers/`) rather than substituting a local run.
- Tests in `tests/e2e/` that gate on `RUN_REPLAY_E2E` or `@pytest.mark.tier2` will SKIP locally. This is correct behaviour, not a failure to investigate.
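Taken together, these merge rules reduce to a small decision function. The sketch below is for illustration only. `LaneResult` and `merge_verdict` are hypothetical names and are not part of the harness or `.woodpecker/01-test.yml`. Treating a local unit run that never happened as blocking is an assumption, drawn from the "necessary-but-not-sufficient" wording in § Test Execution.

```python
from enum import Enum


class LaneResult(Enum):
    """Outcome of one test lane as seen by a (hypothetical) merge gate."""
    GREEN = "green"
    RED = "red"
    NOT_RUN = "not_run"  # e.g. the Jetson agent was offline


def merge_verdict(local_unit: LaneResult, jetson_e2e: LaneResult) -> str:
    """Combine lane outcomes per the documented merge rules.

    Any red lane blocks. A missing Jetson run is "pending Jetson" and is
    never replaced by a local result. Green local unit tests alone are
    necessary but not sufficient.
    """
    if LaneResult.RED in (local_unit, jetson_e2e):
        return "blocked"
    if jetson_e2e is LaneResult.NOT_RUN:
        return "pending Jetson"  # record the gap instead of substituting a local run
    if local_unit is LaneResult.GREEN:
        return "mergeable"  # both lanes green
    return "blocked"  # local unit tests never ran
```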
Overview
System under test (SUT): gps-denied-onboard companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
- Inbound (nav-camera frames): V4L2 / GStreamer source. Production uses USB / MIPI-CSI / GigE per `restrictions.md`. Tests use a file-backed source replaying `_docs/00_problem/input_data/AD0000NN.jpg` or `flight_derkachi/flight_derkachi.mp4`.
- Inbound (FC telemetry): MAVLink (ArduPilot) or MSP2 (iNav) inbound stream carrying `SCALED_IMU2`, `ATTITUDE` and `GLOBAL_POSITION_INT` (or their MSP equivalents). Tests replay `flight_derkachi/data_imu.csv` through a thin replayer.
- Inbound (satellite tile cache): filesystem + on-disk index (FAISS HNSW + tile manifest). Tests load a fixture cache mounted as a Docker volume.
- Outbound (FC external positioning): MAVLink `GPS_INPUT` (ArduPilot Plane) OR MSP2 `MSP2_SENSOR_GPS` (iNav). Tests observe these by spinning up the corresponding open-source SITL and reading what reaches the FC.
- Outbound (GCS telemetry): MAVLink to QGroundControl (1-2 Hz downsample of estimates + STATUSTEXT). Tests subscribe via a passive MAVLink listener.
- Outbound (Flight Data Recorder): NVM filesystem (per AC-NEW-3). Tests read the resulting FDR archive after the run.
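`GPS_INPUT` and `GLOBAL_POSITION_INT` both carry latitude and longitude as scaled integers, so boundary assertions usually begin with a unit conversion. This is a minimal sketch. It assumes the MAVLink common-message-set field definitions:

- `lat`/`lon` are int32 degE7 in both messages.
- `GPS_INPUT.alt` is float metres.
- `GLOBAL_POSITION_INT.alt` is int32 millimetres.

The helper names are hypothetical, and the MSP2 payload layout is not covered.

```python
def deg_to_degE7(deg: float) -> int:
    """Encode decimal degrees as the int32 degE7 value used by GPS_INPUT.lat/.lon."""
    value = int(round(deg * 1e7))
    if not -2**31 <= value < 2**31:
        raise ValueError(f"{deg} deg does not fit in int32 degE7")
    return value


def degE7_to_deg(value: int) -> float:
    """Decode an int32 degE7 field (e.g. GLOBAL_POSITION_INT.lat) to decimal degrees."""
    return value / 1e7


def global_position_int_to_wgs84(lat: int, lon: int, alt_mm: int) -> tuple[float, float, float]:
    """(lat degE7, lon degE7, alt mm) -> (lat deg, lon deg, alt m)."""
    return degE7_to_deg(lat), degE7_to_deg(lon), alt_mm / 1000.0
```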
Consumer app purpose: The e2e harness drives the SUT through these public boundaries — replaying frames + telemetry, mounting tile-cache fixtures, observing FC-side acceptance via SITL, and parsing FDR output. It NEVER imports SUT modules, NEVER queries SUT internal state, and NEVER touches the SUT's filesystem outside the FDR output directory.
Two-tier execution profile
SUPERSEDED — 2026-05-20: the two-tier model below is retained for historical traceability. The active policy is Jetson-only (see banner at the top of this doc). Tier-1 (workstation Docker) is deprecated; only the Tier-2 row continues to describe a supported environment.
This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
| Tier | Hardware | What it covers | What it skips |
|---|---|---|---|
| Tier-1 (workstation Docker) (deprecated 2026-05-20) | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All FT-* correctness, schema, NFT-RES-* resilience scenarios, NFT-SEC-* security scenarios, NFT-LIM-* storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| Jetson (canonical, 2026-05-20) (formerly "Tier-2") | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: FT-* correctness, schema, NFT-RES-*, NFT-SEC-*, NFT-LIM-*, NFT-PERF-* (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing; anything that doesn't run here doesn't run at all |
CI runs the Jetson pipeline (01-test.yml) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on self-hosted-jetson-orin-chamber on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.
Docker Environment (Tier-1)
Services
| Service | Image / Build | Purpose | Ports |
|---|---|---|---|
| `gps-denied-onboard` | local build (`docker/Dockerfile`) | The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON` | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| `ardupilot-plane-sitl` | `ardupilot/ardupilot-sitl:plane-stable` | ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| `inav-sitl` | `inavflight/inav-sitl:9.0.0` | iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| `mock-suite-sat-service` | local build (`e2e/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for the AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`e2e/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins up SITL scenarios. See § Harness Implementation Layout below for the per-evaluator inventory. | none |
| `mavproxy-listener` | `ardupilot/mavproxy:latest` | Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions | 14551/udp |
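`inav-sitl` receives `MSP2_SENSOR_GPS` as MSP v2 frames over TCP 5760. Seeing the framing can help when writing probes or checking fixtures. This is a minimal sketch that assumes the standard MSP v2 layout:

- `$X`, then a direction byte.
- A flag byte, then the little-endian u16 function ID and u16 payload size.
- The payload.
- A CRC-8/DVB-S2 computed over everything after the direction byte.

The `MSP2_SENSOR_GPS = 0x1F03` constant is taken from iNav's sensor message IDs. Treat it as an assumption to verify against the pinned 9.0.0 sources. The GPS payload layout itself is out of scope.

```python
import struct

# Assumption: value from iNav's MSP2 sensor message IDs; verify against the pinned iNav 9.0.0 tree.
MSP2_SENSOR_GPS = 0x1F03


def crc8_dvb_s2(data: bytes, crc: int = 0) -> int:
    """CRC-8/DVB-S2 (poly 0xD5, init 0x00, no reflection), the checksum MSP v2 uses."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0xD5) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def encode_msp_v2(function: int, payload: bytes, flag: int = 0, direction: bytes = b"<") -> bytes:
    """Build one MSP v2 frame: $X + direction + flag + function(u16 LE) + size(u16 LE) + payload + crc8."""
    body = struct.pack("<BHH", flag, function, len(payload)) + payload
    return b"$X" + direction + body + bytes([crc8_dvb_s2(body)])
```

`encode_msp_v2(MSP2_SENSOR_GPS, payload)` yields the bytes a client would write to `tcp:inav-sitl:5760`. `msp_gps_toy` remains the harness's actual MSP2 ground side.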
Networks
| Network | Services | Purpose |
|---|---|---|
| `e2e-net` | all | Isolated test network. No host networking, no internet. Per RESTRICT-SAT-1, the SUT must NEVER reach an external satellite provider during a flight; a deny-all egress rule on `e2e-net` enforces this and is itself a security test (NFT-SEC-02). |
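NFT-SEC-02 treats the deny-all egress rule as a test in its own right. One way to probe it from inside `e2e-runner` is a plain TCP connect that must fail. This is a sketch under that assumption. `egress_blocked` is a hypothetical helper, and the target host and port are whatever external endpoint the test chooses.

```python
import socket


def egress_blocked(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connect to host:port fails (DNS failure, refusal, or timeout).

    On an `internal: true` bridge network, connects to public addresses are
    expected to fail; a successful connect would be an NFT-SEC-02 violation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return False  # connection succeeded: egress is open
    except OSError:  # covers socket.gaierror, ConnectionRefusedError, and timeouts
        return True
```

Inside the compose stack, `assert egress_blocked("example.com", 443)` is expected to hold. On a host with internet access, the same call returns `False`.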
Volumes
| Volume | Mounted to | Purpose |
|---|---|---|
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run by `e2e/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. The read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `fdr-output` | `gps-denied-onboard:/var/azaion/fdr` | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume) |
| `input-data` | `e2e-runner:/test-data:ro` | Bind mount of `_docs/00_problem/input_data/` for replay |
| `expected-results` | `e2e-runner:/expected:ro` | Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions |
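The 64 GB cap is enforced at the storage layer. A post-run check from the runner side, which mounts `fdr-output` read-only at `/fdr`, can still assert the budget explicitly. This is a minimal sketch. `fdr_usage_bytes` and `fdr_within_budget` are hypothetical helpers. It also assumes decimal gigabytes; use `64 * 2**30` if GiB is meant.

```python
import os
from pathlib import Path

# AC-NEW-3 "64 GB" cap; assumption: decimal GB (64 * 2**30 if GiB is intended).
FDR_CAP_BYTES = 64 * 10**9


def fdr_usage_bytes(root: Path) -> int:
    """Sum the sizes of all regular (non-symlink) files under the FDR output directory."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow directory symlinks
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


def fdr_within_budget(root: Path, cap_bytes: int = FDR_CAP_BYTES) -> bool:
    """True if the FDR archive fits within the storage budget."""
    return fdr_usage_bytes(root) <= cap_bytes
```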
docker-compose structure
```yaml
services:
  gps-denied-onboard:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
    networks: [e2e-net]
    volumes:
      - tile-cache-fixture:/var/azaion/tile-cache:ro
      - fdr-output:/var/azaion/fdr
    environment:
      ONBOARD_FC_ADAPTER: ${FC_ADAPTER}      # ardupilot | inav, set per scenario
      ONBOARD_VIO_STRATEGY: ${VIO_STRATEGY}  # okvis2 | klt_ransac (production); vins_mono only in research build
      MAVLINK_SIGNING_PASSKEY_FILE: /run/secrets/mavlink_passkey
    secrets: [mavlink_passkey]               # surfaces the passkey at /run/secrets/mavlink_passkey
    depends_on:
      - mock-suite-sat-service

  ardupilot-plane-sitl:
    image: ardupilot/ardupilot-sitl:plane-stable
    networks: [e2e-net]
    command: ["--vehicle=ArduPlane", "--gps-type=14"]  # GPS_TYPE=14 = MAV per ArduPilot SITL_simulation_parameters.html

  inav-sitl:
    image: inavflight/inav-sitl:9.0.0
    networks: [e2e-net]
    # iNav SITL exposes MSP on TCP 5760 (UART1) per docs/SITL/SITL.md

  mock-suite-sat-service:
    build: ../fixtures/mock-suite-sat
    networks: [e2e-net]
    # Egress restriction enforced at network level, not service level

  e2e-runner:
    build: ../runner
    networks: [e2e-net]
    volumes:
      # input-data / expected-results are bind mounts (paths relative to e2e/docker/)
      - ../../_docs/00_problem/input_data:/test-data:ro
      - ../../_docs/00_problem/input_data/expected_results:/expected:ro
      - fdr-output:/fdr:ro
    depends_on:
      - gps-denied-onboard
      - ardupilot-plane-sitl
      - inav-sitl
      - mavproxy-listener

  mavproxy-listener:
    image: ardupilot/mavproxy:latest
    networks: [e2e-net]

networks:
  e2e-net:
    driver: bridge
    internal: true  # NO external connectivity (enforces RESTRICT-SAT-1)

volumes:
  tile-cache-fixture: {}
  fdr-output: {}

secrets:
  mavlink_passkey:
    file: ./secrets/mavlink-passkey  # assumed filename; see e2e/docker/secrets/
```
Consumer Application
Tech stack: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), msp_gps_toy (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.11.0,<4.12 (frame source replay; see _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md — pin is held below 4.12 until gtsam ships numpy-2 wheels; D-CROSS-CVE-1 leftover remains open), numpy ≥1.26,<2.0 + scipy (geodesic-distance assertions in WGS84).
Entry point: pytest e2e/tests/ from inside e2e-runner. Each scenario is a parameterized pytest case keyed by FC adapter (ardupilot / inav) and VioStrategy (okvis2 / klt_ransac) via the session-scoped conftest fixtures.
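The tech stack lists scipy for WGS84 geodesic-distance assertions. The sketch below instead uses a spherical haversine approximation, with the mean Earth radius and roughly 0.5 % worst-case error against the ellipsoid. It only illustrates the shape of a threshold-style accuracy assertion. `haversine_m`, `assert_within`, and any threshold passed to them are hypothetical, not the project's evaluator API.

```python
import math

MEAN_EARTH_RADIUS_M = 6_371_008.8  # IUGG mean radius; spherical stand-in for the WGS84 ellipsoid


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(a))


def assert_within(estimate: tuple[float, float], truth: tuple[float, float], threshold_m: float) -> float:
    """Raise AssertionError if the estimate lies farther than threshold_m from truth; return the error."""
    err = haversine_m(*estimate, *truth)
    assert err <= threshold_m, f"position error {err:.1f} m exceeds {threshold_m} m"
    return err
```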
Harness Implementation Layout
The blackbox harness implementation lives under e2e/ (NOT the SUT source tree — public-boundary discipline enforced by e2e/README.md):
```text
e2e/
├── docker/                              Tier-1 entrypoint
│   ├── docker-compose.test.yml          Compose stack (services from § Services above)
│   ├── docker-compose.tier2-bridge.yml  Compose override for paired-host Tier-2 SITL bridging
│   ├── run-tier1.sh                     AZ-444 selector-parity wrapper
│   └── secrets/                         Mounted Docker secrets (mavlink-passkey)
├── jetson/                              Tier-2 entrypoint
│   ├── run-tier2.sh                     AZ-444 selector-parity wrapper (control-host side)
│   ├── tier2-on-jetson.sh               SSH-orchestrated on-Jetson half
│   ├── tier2.service                    systemd unit template
│   ├── jtop_parser.py                   jetson_stats / jtop telemetry parser (NFT-LIM-01)
│   └── tegrastats_parser.py             tegrastats parser (NFT-LIM-04)
├── runner/                              e2e-runner image
│   ├── Dockerfile, conftest.py, pytest.ini, requirements.txt
│   ├── helpers/                         Per-AC evaluator + observer modules (47 evaluators
│   │                                    covering accuracy, AP/iNav contract, blackout-spoof,
│   │                                    cache poisoning, cold-start, companion reboot,
│   │                                    CVE probe, e2e latency, egress observer, escalation
│   │                                    ladder, FDR reader, frame-source replay, IMU replay,
│   │                                    injector fixtures, MAVLink signing, MAVProxy tlog,
│   │                                    memory budget, mid-flight tile, mock suite-sat audit,
│   │                                    Monte Carlo envelope, MRE, multi-segment, outage
│   │                                    request, outlier tolerance, registration classifier,
│   │                                    retrieval, sharp-turn, sitl_observer, smoothing,
│   │                                    spoof promotion, storage budget, streaming, thermal
│   │                                    envelope, tile-cache inspector, TTFF; see
│   │                                    `e2e/runner/helpers/` for the authoritative list)
│   └── reporting/                       CSV reporter + evidence bundler (AZ-445/446)
│       ├── csv_reporter.py              Emits `report.csv` per § Reporting
│       ├── evidence_bundler.py          Collects per-run `.tlog`, FDR, telemetry CSVs
│       └── nfr_recorder.py              NFR per-stage latency + budget recorder
├── fixtures/                            Fixture builders + captured fixtures
│   ├── tile-cache-builder/              `tile-cache-fixture` builder
│   ├── age-injector/                    `synth-age-tile-set` builder (FT-N-05)
│   ├── injectors/                       Runtime injectors:
│   │   ├── outlier.py                   `outlier-injection-derkachi` (FT-N-01)
│   │   ├── blackout_spoof.py            `blackout-spoof-derkachi` (FT-N-04, NFT-RES-04)
│   │   ├── multi_segment.py             `multi-segment-derkachi` (FT-P-08)
│   │   ├── cold_boot.py                 `cold-boot-fixture` (NFT-PERF-03)
│   │   └── fc_proxy.py                  FC-inbound blackout/spoof proxy (FT-N-04 driver)
│   ├── sitl_replay/                     Captured offline FDR-replay fixtures
│   │   └── p01/                         FT-P-01 capture set (see test-data.md)
│   ├── sitl_replay_builder/             Captured-fixture builder framework (AZ-598-600)
│   │   ├── builder.py                   VideoSource × TlogSource × FdrProjection strategies
│   │   ├── build_p01_fixtures.py        FT-P-01 still-image builder
│   │   └── build_p02_fixtures.py        FT-P-02 Derkachi builder
│   ├── mock-suite-sat/                  `mock-suite-sat-service` Docker image
│   ├── secrets/                         Test-only secrets (mavlink-test-passkey.txt)
│   └── security/                        Security fixtures (cve-2025-53644.jpg)
├── tests/                               Pytest target: positive/, negative/, performance/,
│                                        resilience/, security/, resource_limit/
└── _unit_tests/                         Out-of-container unit tests for harness internals
                                         (runs as part of project pytest, no Docker required)
```
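The tree lists `tegrastats_parser.py` for NFT-LIM-04. As an illustration of what such a parser has to handle, this minimal sketch pulls two things out of a single tegrastats line: the `RAM used/totalMB` field and every `<sensor>@<temp>C` token. It is not the project's parser. Tegrastats output varies across JetPack releases, so both regular expressions are assumptions to check against captured Orin Nano output.

```python
import re
from typing import Optional

# Assumed token shapes: "RAM 3024/7620MB" and "<sensor>@<celsius>C" (e.g. "cpu@47.75C").
RAM_RE = re.compile(r"\bRAM (\d+)/(\d+)MB\b")
TEMP_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9_]*)@(-?\d+(?:\.\d+)?)C\b")


def parse_tegrastats_line(line: str) -> tuple[Optional[tuple[int, int]], dict[str, float]]:
    """Return ((RAM used MB, RAM total MB) or None, {sensor name: temperature in °C})."""
    ram_match = RAM_RE.search(line)
    ram = (int(ram_match.group(1)), int(ram_match.group(2))) if ram_match else None
    # CPU frequency tokens such as "2%@729" have no name directly before "@", so they do not match.
    temps = {name.lower(): float(value) for name, value in TEMP_RE.findall(line)}
    return ram, temps
```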
Replay-Mode Skip Gating
Several FT-* and FT-N-* scenarios rely on a pre-captured FDR-replay fixture instead of a live SITL run. When the E2E_SITL_REPLAY_DIR environment variable is unset, those scenarios skip cleanly via a sitl_replay_ready pytest marker (per AZ-594/595/598/599). To activate them:
```bash
E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01 \
  pytest e2e/tests/positive/test_ft_p_01_still_image_accuracy.py
```
The captured-fixture builder framework (e2e/fixtures/sitl_replay_builder/) regenerates these fixtures from _docs/00_problem/input_data/ against a live compose stack; the captured artifacts are then committed under e2e/fixtures/sitl_replay/<scenario>/. See e2e/fixtures/sitl_replay_builder/README.md for the framework, supported scenarios, and per-scenario builder invocations.
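The skip rule can be expressed as a pure function that a `conftest.py` hook consults. This is a minimal sketch that assumes only the documented contract: an unset `E2E_SITL_REPLAY_DIR` means skip. The name `sitl_replay_skip_reason` is hypothetical, and so is the extra check that the directory exists.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def sitl_replay_skip_reason(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a skip reason for `sitl_replay_ready` scenarios, or None when replay mode is active."""
    replay_dir = env.get("E2E_SITL_REPLAY_DIR", "").strip()
    if not replay_dir:
        return "E2E_SITL_REPLAY_DIR unset: captured-fixture replay mode inactive"
    if not Path(replay_dir).is_dir():  # hypothetical extra guard, not part of the documented contract
        return f"E2E_SITL_REPLAY_DIR={replay_dir!r} is not a directory"
    return None
```

In a `conftest.py`, a `pytest_collection_modifyitems(config, items)` hook could call this once. When it returns a reason, the hook would call `item.add_marker(pytest.mark.skip(reason=reason))` on every item where `item.get_closest_marker("sitl_replay_ready")` is not `None`.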
Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|---|---|---|---|
| Frame source | V4L2 / GStreamer file source | UNIX domain socket / shared `/test-data` mount | none (local) |
| FC telemetry inbound | MAVLink (AP) or MSP2 (iNav) | `udp:gps-denied-onboard:14550` (AP) or `tcp:gps-denied-onboard:5760` (iNav) | MAVLink 2.0 message signing on AP per D-C8-9 (passkey via Docker secret); iNav unsigned per accepted residual risk |
| Tile cache | Filesystem read | `/var/azaion/tile-cache` (read-only mount) | filesystem perms |
| FC external-pos outbound observation | Read SITL EKF source-set + `GLOBAL_POSITION_INT` replayed back from SITL | `udp:ardupilot-plane-sitl:14550` or `tcp:inav-sitl:5760` | passive listener |
| GCS telemetry observation | MAVLink listener | `udp:mavproxy-listener:14551` (forwarded from SUT 14550) | none |
| FDR output | Filesystem read post-run | `/fdr` (read-only mount) | filesystem perms |
| Suite Sat Service mock | HTTP/JSON | `http://mock-suite-sat-service:8080` | none (test) |
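On the AP path, inbound MAVLink is signed with a passkey delivered as a Docker secret. MAVLink 2 signing uses a 32-byte secret key and a 48-bit timestamp counted in 10 µs units since 2015-01-01 00:00 UTC. The signature is the first 6 bytes of SHA-256 over the key, the header, the payload, the CRC, the link ID, and the timestamp.

This document does not say how the SUT turns the passkey file into that key. The sketch below assumes the MAVProxy convention (SHA-256 of the passphrase). All three function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

MAVLINK_SIGNING_EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)


def key_from_passkey_file(path: Path) -> bytes:
    """Derive a 32-byte signing key from a passphrase file.

    Assumption: SHA-256 of the UTF-8 passphrase (the MAVProxy convention);
    confirm against the SUT's signing setup before relying on it.
    """
    passphrase = path.read_text(encoding="utf-8").strip()
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


def signing_timestamp(now: datetime) -> int:
    """48-bit MAVLink 2 signing timestamp: 10-microsecond units since 2015-01-01 UTC."""
    delta = now - MAVLINK_SIGNING_EPOCH
    units = (delta.days * 86_400 + delta.seconds) * 100_000 + delta.microseconds // 10
    return units & 0xFFFF_FFFF_FFFF


def signature(key: bytes, frame_without_signature: bytes, link_id: int, timestamp: int) -> bytes:
    """First 6 bytes of SHA-256(key + header + payload + CRC + link_id + timestamp)."""
    h = hashlib.sha256()
    h.update(key)
    # Header (from the 0xFD start byte, with the signed incompat flag set) + payload + CRC.
    h.update(frame_without_signature)
    h.update(bytes([link_id]))
    h.update(timestamp.to_bytes(6, "little"))  # MAVLink fields are little-endian
    return h.digest()[:6]
```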
What the consumer does NOT have access to
- No direct access to the SUT's internal state (GTSAM iSAM2 graph, FAISS index in-memory, OpenCV intermediate buffers, VioStrategy implementation pointer).
- No internal Python/C++ module imports from the SUT.
- No shared memory or filesystem with the SUT outside the four explicit mounts (`tile-cache-fixture` r/o, `fdr-output` r/o from the runner side, `input-data` r/o, `expected-results` r/o).
- No bypass of the FC-side acceptance check: every AC-4.3 assertion goes through SITL.
CI/CD Integration
2026-05-20: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.
When to run (active policy):
- Jetson (colocated arm64 Woodpecker agent): on every PR to the `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: quarterly on the chamber-attached Jetson runner; failures block release tags only.
Pipeline stage: a single Jetson workflow (.woodpecker/01-test.yml) on the self-hosted-jetson-orin runner exercises the full suite — there is no longer a parallel x86 lane.
Gate behavior: Jetson blocks PR merge on any test failure and blocks release tags on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
Timeout:
- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).
Reporting
Format: CSV (one row per test).
Columns: `test_id`, `test_name`, `traces_to`, `fc_adapter`, `vio_strategy`, `tier`, `started_at_utc`, `execution_time_ms`, `result`, `error_message`, `evidence_paths`

- `traces_to`: comma-separated AC/RESTRICT IDs from the traceability matrix.
- `fc_adapter`: one of `ardupilot`, `inav`, `n/a`.
- `vio_strategy`: one of `okvis2`, `klt_ransac`, `vins_mono`, `n/a` (`vins_mono` only in research builds).
- `tier`: one of `tier1-docker`, `tier2-jetson`, `tier2-chamber`.
- `result`: one of `PASS`, `FAIL`, `SKIP`, `XFAIL`. XFAIL is only allowed for an AC explicitly marked NOT COVERED in the traceability matrix and not yet promoted to a real test.
- `evidence_paths`: comma-separated paths inside the run-output bundle (`.tlog` files, FDR archives, screenshots, profiler traces) supporting the verdict.
Output path: e2e-results/run-${RUN_ID}/report.csv plus a per-run bundle of evidence at e2e-results/run-${RUN_ID}/evidence/.
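A reporter that emits these columns can rely on `csv.DictWriter`, which correctly quotes the comma-separated `traces_to` and `evidence_paths` cells. This is a minimal sketch, not the project's `csv_reporter.py`. `write_report` and the list-joining convention are assumptions, and only the enumerated columns are validated.

```python
import csv
from typing import Iterable, Mapping, TextIO

COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message", "evidence_paths",
]
ALLOWED = {
    "fc_adapter": {"ardupilot", "inav", "n/a"},
    "vio_strategy": {"okvis2", "klt_ransac", "vins_mono", "n/a"},
    "tier": {"tier1-docker", "tier2-jetson", "tier2-chamber"},
    "result": {"PASS", "FAIL", "SKIP", "XFAIL"},
}


def _cell(value: object) -> object:
    """Join list-valued cells (traces_to, evidence_paths) with commas; csv handles the quoting."""
    if isinstance(value, (list, tuple)):
        return ",".join(str(item) for item in value)
    return value


def write_report(rows: Iterable[Mapping[str, object]], out: TextIO) -> int:
    """Write report rows as CSV with a header; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="raise")  # unknown keys raise
    writer.writeheader()
    count = 0
    for row in rows:
        for column, allowed in ALLOWED.items():
            if row.get(column) not in allowed:
                raise ValueError(f"{column}={row.get(column)!r} is not one of {sorted(allowed)}")
        writer.writerow({key: _cell(value) for key, value in row.items()})
        count += 1
    return count
```

When writing the real `e2e-results/run-${RUN_ID}/report.csv`, open it with `newline=""`, as the `csv` module documentation recommends.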
Test Execution
Decision (2026-05-20, refined later that day) — Jetson is the binding e2e environment; unit tests may run locally. This refines the earlier "Jetson only for everything" wording. Rationale captured in _docs/LESSONS.md (2026-05-20 entries):
- The original "Jetson-only across all tiers" decision came from repeated workstation-vs-Jetson environment divergences in the e2e / build path (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration). Those divergences are real and continue to justify Jetson as the binding e2e environment.
- Forcing the unit-test suite over an SSH-orchestrated Jetson loop added 30–90 s per iteration without producing any signal the local interpreter doesn't already produce. The unit suite is fully synthetic (no camera, no SITL, no Jetson-specific runtime), so a local PASS is equivalent to a Jetson PASS for that tier.
Operational entry points:
| Tier | Entry point | Where it runs |
|---|---|---|
| Unit (`tests/unit/`) | `pytest tests/unit/ -q` directly, or `scripts/run-tests.sh` | local workstation (Python 3.10+ venv) |
| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | `scripts/run-tests-jetson.sh` (local dev) / `.woodpecker/01-test.yml` (CI) | colocated arm64 Jetson Woodpecker agent; see `_docs/04_deploy/ci_cd_pipeline.md` |
| Performance / resilience / security / resource-limit | same as e2e | Jetson only |
| AC-NEW-5 thermal chamber | quarterly + pre-release | `self-hosted-jetson-orin-chamber` |
A green local unit-test run is necessary-but-not-sufficient for merge; the Jetson e2e lane is the binding signal.
The remainder of this section preserves the original 2026-05-09 decision context for traceability.
Decision (2026-05-09, SUPERSEDED): both — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
Hardware dependencies found (Phase 3 → Hardware Assessment scan)
| Category | Indicator | Source file |
|---|---|---|
| GPU / CUDA | TensorRT engines (`.engine`, SM 87, JetPack 6.2, TRT 10.3) | `_docs/01_solution/solution.md` PRE-FLIGHT block |
| GPU / CUDA | DISK+LightGlue FP16 inference | _docs/01_solution/solution.md RUNTIME block (C3) |
| GPU / CUDA pin | Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W) | _docs/00_problem/restrictions.md § Onboard Hardware |
| Sensors / Cameras | ADTi 20MP 20L V1 nadir camera over USB / MIPI-CSI / GigE | _docs/00_problem/restrictions.md § Cameras |
| Sensors / Cameras | V4L2 / GStreamer frame source (production) | _docs/02_document/tests/environment.md § Overview |
| OS-specific services | High-rate IMU via UART/MAVLink to FC | _docs/00_problem/restrictions.md § Sensors & Integration |
| OS-specific services | Per-FC inbound (MAVLink GPS_INPUT for AP, MSP2 over UART for iNav) | _docs/00_problem/restrictions.md § Sensors & Integration |
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | _docs/02_document/tests/resource-limit-tests.md NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | _docs/00_problem/restrictions.md § Failsafe & Safety + AC-NEW-5 |
(Step 2 Code scan from the planning phase returned zero indicators because no source code existed yet. Post-implementation, `pyproject.toml` confirms:

- `tensorrt`, `pymavlink`, `gtsam==4.2.1` and `faiss-gpu`.
- `opencv-python>=4.11.0.86,<4.12`. This is a cycle-1 relaxation per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`; the original `>=4.12.0` target is restored once gtsam ships numpy-2 wheels.
- `jetson-stats`.

`pycuda` was NOT added. The TensorRT EP is invoked via ONNX Runtime + the `onnx_trt_ep_runtime` factory, which uses TensorRT's Python bindings directly without pycuda.)
Execution instructions — Tier-1 (Docker)
Prerequisites:
- Docker 24+ with Compose v2.
- NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU. This lets the SUT exercise the TensorRT path. Without a dGPU the SUT falls back to a CPU inference path, since TensorRT itself requires an NVIDIA GPU.
- ≥16 GB host RAM, ≥80 GB free disk for `tile-cache-fixture` + `fdr-output` + the image build cache.
How to start (preferred — selector-parity wrapper from AZ-444):
```text
./e2e/docker/run-tier1.sh \
  --fc-adapter ardupilot \
  --vio-strategy okvis2 \
  [-k <pytest selector>] \
  [--build-kind production|asan] \
  [--enable-chamber]
```
run-tier1.sh and e2e/jetson/run-tier2.sh accept the same -k <selector> flag and emit the same pytest invocation modulo the TIER env var (AZ-444 AC-1).
Raw-compose equivalent (when bypassing the wrapper for debugging):
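```bash
cd e2e/docker
export FC_ADAPTER=ardupilot VIO_STRATEGY=okvis2
# --exit-code-from (implies --abort-on-container-exit) propagates e2e-runner's exit status
docker compose -f docker-compose.test.yml up --build --exit-code-from e2e-runner e2e-runner
```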
The run reports to ./e2e-results/run-${RUN_ID}/report.csv (see § Reporting). Exit code matches the test verdict.
Environment variables:
- `FC_ADAPTER` ∈ {`ardupilot`, `inav`}: selects which SITL the SUT talks to.
- `VIO_STRATEGY` ∈ {`okvis2`, `klt_ransac`} for the production binary. `vins_mono` is allowed only when the research binary (`BUILD_VINS_MONO=ON`) is the build.
- `MAVLINK_SIGNING_PASSKEY_FILE`: path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
- `E2E_SITL_REPLAY_DIR`: when set, activates captured-fixture FDR-replay mode for scenarios that gate on `sitl_replay_ready`. When unset, those scenarios skip cleanly (see § Replay-Mode Skip Gating above).
- `RUN_ID`: per-invocation run identifier. It defaults to `local-${USER}-${EPOCH}` in development; CI sets it from the workflow run id. It determines the `e2e-results/run-${RUN_ID}/` output directory.
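These variables interact in two ways. The allowed `VIO_STRATEGY` values depend on the build, and `RUN_ID` falls back to a local default. A wrapper can validate both up front. This is a minimal sketch. `RunConfig`, `load_run_config` and the `research_build` flag (standing in for `BUILD_VINS_MONO=ON`) are hypothetical names, not the wrappers' actual code.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

FC_ADAPTERS = {"ardupilot", "inav"}
PRODUCTION_VIO = {"okvis2", "klt_ransac"}
RESEARCH_VIO = PRODUCTION_VIO | {"vins_mono"}  # only with BUILD_VINS_MONO=ON


@dataclass(frozen=True)
class RunConfig:
    fc_adapter: str
    vio_strategy: str
    run_id: str

    @property
    def output_dir(self) -> Path:
        """The e2e-results/run-${RUN_ID}/ directory for this run."""
        return Path("e2e-results") / f"run-{self.run_id}"


def load_run_config(env: Mapping[str, str], research_build: bool = False,
                    now: Optional[float] = None) -> RunConfig:
    """Validate FC_ADAPTER / VIO_STRATEGY and resolve RUN_ID from an environment mapping."""
    fc = env.get("FC_ADAPTER", "")
    if fc not in FC_ADAPTERS:
        raise ValueError(f"FC_ADAPTER must be one of {sorted(FC_ADAPTERS)}, got {fc!r}")
    allowed = RESEARCH_VIO if research_build else PRODUCTION_VIO
    vio = env.get("VIO_STRATEGY", "")
    if vio not in allowed:
        raise ValueError(f"VIO_STRATEGY={vio!r} not allowed for this build; expected {sorted(allowed)}")
    epoch = int(time.time() if now is None else now)
    # Development default local-${USER}-${EPOCH}; CI is expected to set RUN_ID explicitly.
    run_id = env.get("RUN_ID") or f"local-{env.get('USER', 'unknown')}-{epoch}"
    return RunConfig(fc, vio, run_id)
```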
Skipped on Tier-1: NFT-PERF-01 (AC-4.1 latency p95 — Jetson-bound), NFT-LIM-01 (AC-4.2 memory — Jetson-bound), NFT-PERF-03 (AC-NEW-1 cold-start — Jetson-bound), NFT-LIM-04 (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).
Execution instructions — Tier-2 (Jetson hardware loop)
Prerequisites:
- Jetson Orin Nano Super (per `restrictions.md` § Onboard Hardware).
- JetPack 6.2 + CUDA + TensorRT 10.3 + cuDNN per D-C7-9.
- Workstation thermal-day environment for NFT-LIM-04 baseline. Chamber-attached runner for AC-NEW-5 chamber portion (separate quarterly job; not run in standard CI).
- ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
- Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE, OR a file-replay source if the camera is unavailable. In the file-replay case, all AC-2.x cross-validation is `XFAIL` for that run.
How to start (AZ-444 selector-parity wrapper):
```text
./e2e/jetson/run-tier2.sh \
  --fc-adapter ardupilot \
  --vio-strategy okvis2 \
  [-k <pytest selector>] \
  [--build-kind production|asan] \
  [--duration 5min|8h] \
  [--enable-chamber] \
  [--reflash]
```
When the Tier-2 SITL stack runs on a paired x86 host rather than on the Jetson itself, bring it up with:
```bash
docker compose \
  -f e2e/docker/docker-compose.test.yml \
  -f e2e/docker/docker-compose.tier2-bridge.yml up ...
```
When invoked on a control host (typical), the script SSH-orchestrates the Jetson half (tier2-on-jetson.sh). When TIER2_HOST=localhost and the script runs on the Jetson itself, it delegates directly without SSH. Outputs the same CSV format as Tier-1 (one report.csv per run) plus tegrastats + jtop CSVs in the evidence bundle.
Environment variables: same as Tier-1 plus:
- `TIER2_HOST` / `TIER2_USER` / `TIER2_KEY_PATH`: control-host → Jetson SSH wiring (required when `TIER2_HOST != localhost`).
- `TIER2_CHAMBER_AMBIENT_C`: ambient temperature for AC-NEW-5 chamber runs.
- `TIER2_CAMERA_DEVICE`: `/dev/video0` (production) or a file path for replay mode.
gps-denied-onboard.service (or gps-denied-onboard-asan.service for --build-kind=asan) MUST be installed via systemd on the Jetson — e2e/jetson/tier2.service is the template. See _docs/03_implementation/jetson_harness_setup.md for the physical provisioning steps.
CI runner mapping
Active mapping (2026-05-20):
- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs: every PR + nightly + pre-release. ~4 hr per matrix entry. This is the single canonical CI test runner.
- `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.
Removed (2026-05-20):
- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. 30-45 min per matrix entry.~~ Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.
Matrix dimensions: FC_ADAPTER × VIO_STRATEGY × build_kind where build_kind ∈ {production, research}. Production vins_mono is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
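Enumerating the matrix makes the exclusion explicit: 2 FC adapters × (2 production + 3 research VioStrategy values) = 10 entries. This is a minimal sketch. `ci_matrix` and `MatrixEntry` are hypothetical names and do not reflect how the Woodpecker matrix is actually declared.

```python
from itertools import product
from typing import NamedTuple

FC_ADAPTERS = ("ardupilot", "inav")
VIO_BY_BUILD = {
    "production": ("okvis2", "klt_ransac"),              # vins_mono excluded (D-C1-1-SUB-A)
    "research": ("okvis2", "klt_ransac", "vins_mono"),   # all three VioStrategy values
}


class MatrixEntry(NamedTuple):
    fc_adapter: str
    vio_strategy: str
    build_kind: str


def ci_matrix() -> list[MatrixEntry]:
    """Expand FC_ADAPTER × VIO_STRATEGY × build_kind, honouring the per-build VIO restriction."""
    entries = []
    for build_kind, vio_values in VIO_BY_BUILD.items():
        for fc, vio in product(FC_ADAPTERS, vio_values):
            entries.append(MatrixEntry(fc, vio, build_kind))
    return entries
```

At the documented 4 hr per-entry timeout, running all 10 entries back to back on one agent would take up to 40 hr in the worst case.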