gps-denied-onboard/_docs/02_document/tests/environment.md
Oleksandr Bezdieniezhnykh b66b68ff76 [AZ-700] gps-denied-render-map: HTML map of estimated vs truth tracks
New operator-side console-script renders a self-contained HTML map
(folium / Leaflet) comparing the estimator's JSONL track against
the tlog ground-truth track. Pinned visual style: red truth + blue
estimated polylines, start/end markers per track, 100 m + 50 m
scale circles, optional AZ-699 accuracy-summary banner, and an
--offline-tiles mode (with optional local tile-URL template) for
Jetsons without internet.

folium is gated behind a new [operator-tools] optional-dep so the
airborne binary's cold-start NFR is unaffected (C12 binary doesn't
import the new module). 14 new unit tests pin polyline count,
marker count, scale-circle radii, summary embedding, offline-tile
behaviour, and full CLI smoke. Zero mypy --strict errors.

Refines the 2026-05-20 Jetson-only test policy: unit tests may run
locally, e2e/perf/resilience/security stay Jetson-only. Documented
in _docs/02_document/tests/environment.md (Where each tier runs)
and .cursor/rules/testing.mdc (Test environment for this project).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-20 17:04:01 +03:00

# Test Environment
> **Active policy — 2026-05-20 (refined)**: the canonical CI / release-gate
> test environment is the Jetson Orin Nano Super (or a Jetson-equivalent
> arm64 agent). **Unit tests** (`pytest tests/unit/`) MAY be run on a local
> developer workstation for fast iteration — they are hardware-agnostic by
> construction, the suite is fully synthetic, and Jetson SSH round-trips add
> latency without adding signal. **Blackbox / e2e / performance / resilience
> / security / resource-limit tests** (`tests/e2e/`, `e2e/tests/`,
> `tests/perf/`, etc.) MUST run on the Jetson — never on a local workstation
> — because their pass criteria are tied to Jetson wall-clock latency,
> thermal envelope, and the real-camera + real-FC SITL loop. Workstation x86
> Docker (the historical "Tier-1" path) is **deprecated** as a supported
> e2e environment; the Tier-1 sections below are retained as historical
> reference / traceability only. CI e2e pipelines target the colocated
> arm64 Jetson Woodpecker agent (see `_docs/04_deploy/ci_cd_pipeline.md`);
> local-development e2e runs SHOULD use `scripts/run-tests-jetson.sh`
> against the configured `jetson-e2e` SSH alias rather than
> `scripts/run-tests.sh`. This refinement supersedes the 2026-05-20 "all
> tiers on Jetson" wording and the 2026-05-09 "both" decision recorded in
> the § Test Execution section.
## Where each tier runs (active policy)
| Tier | Local workstation | Jetson (canonical) | When local is the only option |
|------|--------------------|--------------------|-------------------------------|
| Unit (`tests/unit/`) | ✅ allowed and encouraged for dev iteration | ✅ also run as part of the Jetson CI lane | always |
| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | ❌ forbidden — placeholder fixtures + missing hardware = false-negative runs | ✅ required for any merge / release decision | never — if Jetson is unreachable, the e2e verdict is "not run" rather than a local result |
| Performance / resilience / security / resource-limit | ❌ forbidden | ✅ required | never |
| Thermal chamber (AC-NEW-5) | ❌ forbidden | ✅ chamber Jetson only | never |
Practical consequences:
- A PR may merge on green local unit tests + green Jetson e2e tests.
- A PR MAY NOT merge on green local unit tests alone — the Jetson e2e lane is the binding signal.
- When the Jetson agent is offline, the e2e verdict is "pending Jetson" — record the gap (e.g. via `_docs/_process_leftovers/`) rather than substituting a local run.
- Tests in `tests/e2e/` that gate on `RUN_REPLAY_E2E` or `@pytest.mark.tier2` will SKIP locally; this is correct behaviour, not a failure to investigate.
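The merge rules above reduce to a small decision function. The following is a minimal sketch only: `merge_verdict` and its string verdicts are hypothetical names chosen for illustration, not part of the CI pipeline, and it assumes the Jetson e2e lane reports either a boolean or "not run" (`None`).

```python
from typing import Optional


def merge_verdict(unit_green: bool, jetson_e2e_green: Optional[bool]) -> str:
    """Combine lane outcomes into a merge verdict (illustrative names only).

    jetson_e2e_green is None when the Jetson agent was unreachable and the
    e2e lane did not run; a local e2e result is never substituted for it.
    """
    if not unit_green:
        return "blocked"
    if jetson_e2e_green is None:
        # Green unit tests alone are not sufficient for merge.
        return "pending Jetson"
    return "mergeable" if jetson_e2e_green else "blocked"


if __name__ == "__main__":
    for unit, e2e in [(True, True), (True, None), (True, False), (False, True)]:
        print(unit, e2e, "->", merge_verdict(unit, e2e))
```

The sketch deliberately has no branch that turns a local e2e run into a verdict, which mirrors the "not run rather than a local result" rule in the table.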
## Overview
**System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
- **Inbound — nav-camera frames**: V4L2 / GStreamer source (production: USB / MIPI-CSI / GigE per `restrictions.md`; tests: file-backed source replaying `_docs/00_problem/input_data/AD0000NN.jpg` or `flight_derkachi/flight_derkachi.mp4`).
- **Inbound — FC telemetry**: MAVLink (ArduPilot) or MSP2 (iNav) inbound stream carrying `SCALED_IMU2`, `ATTITUDE`, `GLOBAL_POSITION_INT` (or MSP equivalents). Tests replay `flight_derkachi/data_imu.csv` through a thin replayer.
- **Inbound — satellite tile cache**: filesystem + on-disk index (FAISS HNSW + tile manifest). Tests load a fixture cache mounted as a Docker volume.
- **Outbound — FC external-positioning**: MAVLink `GPS_INPUT` (ArduPilot Plane) OR MSP2 `MSP2_SENSOR_GPS` (iNav). Tests observe these by spinning up the corresponding open-source SITL and reading what reaches the FC.
- **Outbound — GCS telemetry**: MAVLink to QGroundControl (1-2 Hz downsample of estimates + STATUSTEXT). Tests subscribe via a passive MAVLink listener.
- **Outbound — Flight Data Recorder**: NVM filesystem (per AC-NEW-3). Tests read the resulting FDR archive after the run.
**Consumer app purpose**: The e2e harness drives the SUT through these public boundaries — replaying frames + telemetry, mounting tile-cache fixtures, observing FC-side acceptance via SITL, and parsing FDR output. It NEVER imports SUT modules, NEVER queries SUT internal state, and NEVER touches the SUT's filesystem outside the FDR output directory.
## Two-tier execution profile
> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for
> historical traceability. Under the active policy (see banner at the top of
> this doc) the Jetson is the only supported e2e environment, and only unit
> tests may run locally. Tier-1 (workstation Docker) is deprecated; only
> the Tier-2 row continues to describe a supported environment.
This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
| Tier | Hardware | What it covers | What it skips |
|------|----------|----------------|---------------|
| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all |
CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.
## Docker Environment (Tier-1)
### Services
| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| `gps-denied-onboard` | local build (`docker/Dockerfile`) | The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON` | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| `ardupilot-plane-sitl` | `ardupilot/ardupilot-sitl:plane-stable` | ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| `inav-sitl` | `inavflight/inav-sitl:9.0.0` | iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| `mock-suite-sat-service` | local build (`e2e/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`e2e/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios. See § Harness Implementation Layout below for the per-evaluator inventory. | — |
| `mavproxy-listener` | `ardupilot/mavproxy:latest` | Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions | 14551/udp |
### Networks
| Network | Services | Purpose |
|---------|----------|---------|
| `e2e-net` | all | Isolated test network. No host networking, no internet. Per RESTRICT-SAT-1, the SUT must NEVER reach an external satellite provider during a flight; a deny-all egress rule on `e2e-net` enforces this and is itself a security test (NFT-SEC-02). |
### Volumes
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run from `e2e/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `fdr-output` | `gps-denied-onboard:/var/azaion/fdr` | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume) |
| `input-data` | `e2e-runner:/test-data:ro` | Bind mount of `_docs/00_problem/input_data/` for replay |
| `expected-results` | `e2e-runner:/expected:ro` | Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions |
### docker-compose structure
```yaml
services:
  gps-denied-onboard:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
    networks: [e2e-net]
    volumes:
      - tile-cache-fixture:/var/azaion/tile-cache:ro
      - fdr-output:/var/azaion/fdr
    environment:
      ONBOARD_FC_ADAPTER: ${FC_ADAPTER} # ardupilot | inav, set per scenario
      ONBOARD_VIO_STRATEGY: ${VIO_STRATEGY} # okvis2 | klt_ransac (production); vins_mono only in research build
      MAVLINK_SIGNING_PASSKEY_FILE: /run/secrets/mavlink_passkey
    depends_on:
      - mock-suite-sat-service
  ardupilot-plane-sitl:
    image: ardupilot/ardupilot-sitl:plane-stable
    networks: [e2e-net]
    command: ["--vehicle=ArduPlane", "--gps-type=14"] # GPS_TYPE=14 = MAV per ArduPilot SITL_simulation_parameters.html
  inav-sitl:
    image: inavflight/inav-sitl:9.0.0
    networks: [e2e-net]
    # iNav SITL exposes MSP on TCP 5760 (UART1) per docs/SITL/SITL.md
  mock-suite-sat-service:
    build: ../fixtures/mock-suite-sat
    networks: [e2e-net]
    # Egress restriction enforced at network level, not service level
  e2e-runner:
    build: ../runner
    networks: [e2e-net]
    volumes:
      - input-data:/test-data:ro
      - expected-results:/expected:ro
      - fdr-output:/fdr:ro
    depends_on:
      - gps-denied-onboard
      - ardupilot-plane-sitl
      - inav-sitl
      - mavproxy-listener
  mavproxy-listener:
    image: ardupilot/mavproxy:latest
    networks: [e2e-net]
networks:
  e2e-net:
    driver: bridge
    internal: true # NO external connectivity (enforces RESTRICT-SAT-1)
volumes:
  tile-cache-fixture: {}
  fdr-output: {}
```
## Consumer Application
**Tech stack**: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), `msp_gps_toy` (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.11.0,<4.12 (frame source replay; see `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — pin is held below 4.12 until gtsam ships numpy-2 wheels; D-CROSS-CVE-1 leftover remains open), numpy ≥1.26,<2.0 + scipy (geodesic-distance assertions in WGS84).
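As an illustration of the geodesic-distance assertions mentioned in the tech stack, here is a minimal stdlib-only sketch. It is an assumption-laden stand-in, not the harness code: it uses the spherical haversine formula with a mean Earth radius, which can differ from a true WGS84 ellipsoidal distance by a fraction of a percent, and `haversine_m` / `within_tolerance` are hypothetical helper names.

```python
import math

# Mean Earth radius in metres; a spherical approximation, NOT the WGS84 ellipsoid.
EARTH_MEAN_RADIUS_M = 6_371_008.8


def haversine_m(lat1_deg: float, lon1_deg: float,
                lat2_deg: float, lon2_deg: float) -> float:
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp against floating-point overshoot before taking the square root.
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def within_tolerance(estimate: tuple, truth: tuple, tolerance_m: float) -> bool:
    """True when the estimated (lat, lon) lies within tolerance_m of truth."""
    return haversine_m(*estimate, *truth) <= tolerance_m
```

A test would then assert `within_tolerance(estimate, truth, tolerance_m)` per frame, with the tolerance taken from the relevant acceptance criterion rather than hard-coded.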
**Entry point**: `pytest e2e/tests/` from inside `e2e-runner`. Each scenario is a parameterized pytest case keyed by FC adapter (`ardupilot` / `inav`) and VioStrategy (`okvis2` / `klt_ransac`) via the session-scoped conftest fixtures.
### Harness Implementation Layout
The blackbox harness implementation lives under `e2e/` (NOT the SUT source tree — public-boundary discipline enforced by `e2e/README.md`):
```
e2e/
├── docker/ Tier-1 entrypoint
│ ├── docker-compose.test.yml Compose stack (services from § Services above)
│ ├── docker-compose.tier2-bridge.yml Compose override for paired-host Tier-2 SITL bridging
│ ├── run-tier1.sh AZ-444 selector-parity wrapper
│ └── secrets/ Mounted Docker secrets (mavlink-passkey)
├── jetson/ Tier-2 entrypoint
│ ├── run-tier2.sh AZ-444 selector-parity wrapper (control-host side)
│ ├── tier2-on-jetson.sh SSH-orchestrated on-Jetson half
│ ├── tier2.service systemd unit template
│ ├── jtop_parser.py jetson_stats / jtop telemetry parser (NFT-LIM-01)
│ └── tegrastats_parser.py tegrastats parser (NFT-LIM-04)
├── runner/ e2e-runner image
│ ├── Dockerfile, conftest.py, pytest.ini, requirements.txt
│ ├── helpers/ Per-AC evaluator + observer modules (47 evaluators
│ │ covering accuracy, AP/iNav contract, blackout-spoof,
│ │ cache poisoning, cold-start, companion reboot,
│ │ CVE probe, e2e latency, egress observer, escalation
│ │ ladder, FDR reader, frame-source replay, IMU replay,
│ │ injector fixtures, MAVLink signing, MAVProxy tlog,
│ │ memory budget, mid-flight tile, mock suite-sat audit,
│ │ Monte Carlo envelope, MRE, multi-segment, outage
│ │ request, outlier tolerance, registration classifier,
│ │ retrieval, sharp-turn, sitl_observer, smoothing,
│ │ spoof promotion, storage budget, streaming, thermal
│ │ envelope, tile-cache inspector, TTFF — see
│ │ `e2e/runner/helpers/` for the authoritative list)
│ └── reporting/ CSV reporter + evidence bundler (AZ-445/446)
│ ├── csv_reporter.py Emits `report.csv` per § Reporting
│ ├── evidence_bundler.py Collects per-run `.tlog`, FDR, telemetry CSVs
│ └── nfr_recorder.py NFR per-stage latency + budget recorder
├── fixtures/ Fixture builders + captured fixtures
│ ├── tile-cache-builder/ `tile-cache-fixture` builder
│ ├── age-injector/ `synth-age-tile-set` builder (FT-N-05)
│ ├── injectors/ Runtime injectors:
│ │ ├── outlier.py `outlier-injection-derkachi` (FT-N-01)
│ │ ├── blackout_spoof.py `blackout-spoof-derkachi` (FT-N-04, NFT-RES-04)
│ │ ├── multi_segment.py `multi-segment-derkachi` (FT-P-08)
│ │ ├── cold_boot.py `cold-boot-fixture` (NFT-PERF-03)
│ │ └── fc_proxy.py FC-inbound blackout/spoof proxy (FT-N-04 driver)
│ ├── sitl_replay/ Captured offline FDR-replay fixtures
│ │ └── p01/ FT-P-01 capture set (see test-data.md)
│ ├── sitl_replay_builder/ Captured-fixture builder framework (AZ-598-600)
│ │ ├── builder.py VideoSource × TlogSource × FdrProjection strategies
│ │ ├── build_p01_fixtures.py FT-P-01 still-image builder
│ │ └── build_p02_fixtures.py FT-P-02 Derkachi builder
│ ├── mock-suite-sat/ `mock-suite-sat-service` Docker image
│ ├── secrets/ Test-only secrets (mavlink-test-passkey.txt)
│ └── security/ Security fixtures (cve-2025-53644.jpg)
├── tests/ Pytest target: positive/, negative/, performance/,
│ resilience/, security/, resource_limit/
└── _unit_tests/ Out-of-container unit tests for harness internals
(runs as part of project pytest, no Docker required)
```
### Replay-Mode Skip Gating
Several FT-* and FT-N-* scenarios rely on a pre-captured FDR-replay fixture instead of a live SITL run. When the `E2E_SITL_REPLAY_DIR` environment variable is unset, those scenarios skip cleanly via a `sitl_replay_ready` pytest marker (per AZ-594/595/598/599). To activate them:
```bash
E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01 \
pytest e2e/tests/positive/test_ft_p_01_still_image_accuracy.py
```
The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) regenerates these fixtures from `_docs/00_problem/input_data/` against a live compose stack; the captured artifacts are then committed under `e2e/fixtures/sitl_replay/<scenario>/`. See `e2e/fixtures/sitl_replay_builder/README.md` for the framework, supported scenarios, and per-scenario builder invocations.
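The skip gating described in this section could be wired roughly as follows. This is a sketch under stated assumptions, not the harness's actual `sitl_replay_ready` implementation: `replay_skip_reason` is a hypothetical helper, and the extra "must be an existing directory" check is an assumption the document does not state.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def replay_skip_reason(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a skip reason, or None when replay scenarios may run.

    Hypothetical helper: only the "unset means skip" rule comes from the
    document; the directory check is an illustrative assumption.
    """
    raw = environ.get("E2E_SITL_REPLAY_DIR", "").strip()
    if not raw:
        return "E2E_SITL_REPLAY_DIR is unset; replay scenarios skip"
    if not Path(raw).is_dir():
        return f"E2E_SITL_REPLAY_DIR={raw!r} is not a directory"
    return None
```

In a `conftest.py`, the result could feed `pytest.mark.skipif(replay_skip_reason() is not None, reason=...)` so that gated scenarios report SKIP rather than FAIL when no fixture directory is configured.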
### Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| Frame source | V4L2 / GStreamer file source | UNIX domain socket / shared `/test-data` mount | none (local) |
| FC telemetry inbound | MAVLink (AP) or MSP2 (iNav) | `udp:gps-denied-onboard:14550` (AP) or `tcp:gps-denied-onboard:5760` (iNav) | MAVLink 2.0 message signing on AP per D-C8-9 (passkey via Docker secret); iNav unsigned per accepted residual risk |
| Tile cache | Filesystem read | `/var/azaion/tile-cache` (read-only mount) | filesystem perms |
| FC external-pos outbound observation | Read SITL EKF source-set + GLOBAL_POSITION_INT replay back from SITL | `udp:ardupilot-plane-sitl:14550` or `tcp:inav-sitl:5760` | passive listener |
| GCS telemetry observation | MAVLink listener | `udp:mavproxy-listener:14551` (forwarded from SUT 14550) | none |
| FDR output | Filesystem read post-run | `/fdr` (read-only mount) | filesystem perms |
| Suite Sat Service mock | HTTP/JSON | `http://mock-suite-sat-service:8080` | none (test) |
### What the consumer does NOT have access to
- No direct access to the SUT's internal state (GTSAM iSAM2 graph, FAISS index in-memory, OpenCV intermediate buffers, VioStrategy implementation pointer).
- No internal Python/C++ module imports from the SUT.
- No shared memory or filesystem with the SUT outside the four explicit mounts (`tile-cache-fixture` r/o, `fdr-output` r/o from runner side, `input-data` r/o, `expected-results` r/o).
- No bypass of the FC-side acceptance check — every AC-4.3 assertion goes through SITL.
## CI/CD Integration
> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.
**When to run** (active policy):
- Jetson (colocated arm64 Woodpecker agent): on every PR to `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: quarterly on the chamber-attached Jetson runner; failures block release tags only.
**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane.
**Gate behavior**: any test failure on the Jetson lane blocks both PR merge and release tags. Chamber tests are warning-only on PRs and blocking on release tags.
**Timeout**:
- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).
## Reporting
**Format**: CSV (one row per test).
**Columns**: `test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths`
- `traces_to`: comma-separated AC/RESTRICT IDs from the traceability matrix.
- `fc_adapter`: `ardupilot` | `inav` | `n/a`.
- `vio_strategy`: `okvis2` | `klt_ransac` | `vins_mono` | `n/a` (research-build only for `vins_mono`).
- `tier`: `tier1-docker` | `tier2-jetson` | `tier2-chamber`.
- `result`: `PASS` | `FAIL` | `SKIP` | `XFAIL` (XFAIL only allowed for AC explicitly marked NOT COVERED in the traceability matrix and not yet promoted to a real test).
- `evidence_paths`: comma-separated paths inside the run-output bundle (`.tlog` files, FDR archives, screenshots, profiler traces) supporting the verdict.
**Output path**: `e2e-results/run-${RUN_ID}/report.csv` plus a per-run bundle of evidence at `e2e-results/run-${RUN_ID}/evidence/`.
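A minimal sketch of a writer for this format is shown below. It is not the project's `csv_reporter.py`: `render_report` is a hypothetical function, and the example row holds placeholder values only. The column names and the allowed `result` values are taken from the list above.

```python
import csv
import io

COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message",
    "evidence_paths",
]
ALLOWED_RESULTS = {"PASS", "FAIL", "SKIP", "XFAIL"}


def render_report(rows) -> str:
    """Render rows (dicts keyed by COLUMNS) as CSV text, one row per test."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["result"] not in ALLOWED_RESULTS:
            raise ValueError(f"unexpected result value: {row['result']!r}")
        # The csv module quotes fields that contain commas, so the
        # comma-separated traces_to / evidence_paths values stay in one cell.
        writer.writerow(row)
    return buf.getvalue()


# Placeholder values for illustration only; not taken from a real run.
EXAMPLE_ROW = {
    "test_id": "FT-P-01", "test_name": "example", "traces_to": "AC-4.3,AC-NEW-2",
    "fc_adapter": "ardupilot", "vio_strategy": "okvis2", "tier": "tier2-jetson",
    "started_at_utc": "2026-05-20T14:04:01Z", "execution_time_ms": "1234",
    "result": "PASS", "error_message": "", "evidence_paths": "evidence/run.tlog",
}

if __name__ == "__main__":
    print(render_report([EXAMPLE_ROW]), end="")
```

Relying on `csv.DictWriter` rather than manual string joins keeps multi-valued cells parseable by any standard CSV reader.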
## Test Execution
**Decision (2026-05-20, refined later that day)**: **Jetson is the binding e2e environment; unit tests may run locally.** This refines the earlier "Jetson only for everything" wording. Rationale is captured in `_docs/LESSONS.md` (2026-05-20 entries):
- The original "Jetson-only across all tiers" decision came from repeated workstation-vs-Jetson environment divergences in the e2e / build path (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration). Those divergences are real and continue to justify Jetson as the binding e2e environment.
- Forcing the unit-test suite over an SSH-orchestrated Jetson loop added 30-90 s per iteration without producing any signal the local interpreter doesn't already produce. The unit suite is fully synthetic — no camera, no SITL, no Jetson-specific runtime — so a local PASS is equivalent to a Jetson PASS for that tier.
**Operational entry points**:
| Tier | Entry point | Where it runs |
|------|-------------|---------------|
| Unit (`tests/unit/`) | `pytest tests/unit/ -q` directly, or `scripts/run-tests.sh` | local workstation (Python 3.10+ venv) |
| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | `scripts/run-tests-jetson.sh` (local dev) / `.woodpecker/01-test.yml` (CI) | colocated arm64 Jetson Woodpecker agent — see `_docs/04_deploy/ci_cd_pipeline.md` |
| Performance / resilience / security / resource-limit | same as e2e | Jetson only |
| AC-NEW-5 thermal chamber | quarterly + pre-release | `self-hosted-jetson-orin-chamber` |
A green local unit-test run is necessary-but-not-sufficient for merge; the Jetson e2e lane is the binding signal.
The remainder of this section preserves the original 2026-05-09 decision context for traceability.
---
**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
### Hardware dependencies found (Phase 3 → Hardware Assessment scan)
| Category | Indicator | Source file |
|---|---|---|
| GPU / CUDA | TensorRT engines (`.engine`, SM 87, JetPack 6.2, TRT 10.3) | `_docs/01_solution/solution.md` PRE-FLIGHT block |
| GPU / CUDA | DISK+LightGlue FP16 inference | `_docs/01_solution/solution.md` RUNTIME block (C3) |
| GPU / CUDA pin | Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W) | `_docs/00_problem/restrictions.md` § Onboard Hardware |
| Sensors / Cameras | ADTi 20MP 20L V1 nadir camera over USB / MIPI-CSI / GigE | `_docs/00_problem/restrictions.md` § Cameras |
| Sensors / Cameras | V4L2 / GStreamer frame source (production) | `_docs/02_document/tests/environment.md` § Overview |
| OS-specific services | High-rate IMU via UART/MAVLink to FC | `_docs/00_problem/restrictions.md` § Sensors & Integration |
| OS-specific services | Per-FC inbound (MAVLink GPS_INPUT for AP, MSP2 over UART for iNav) | `_docs/00_problem/restrictions.md` § Sensors & Integration |
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | `_docs/02_document/tests/resource-limit-tests.md` NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | `_docs/00_problem/restrictions.md` § Failsafe & Safety + AC-NEW-5 |
(Step 2 Code scan from the planning phase returned zero indicators because no source code existed yet. Post-implementation: `pyproject.toml` confirms `tensorrt`, `pymavlink`, `gtsam==4.2.1`, `faiss-gpu`, `opencv-python>=4.11.0.86,<4.12` (cycle-1 relaxation per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — the original `>=4.12.0` target replays once gtsam ships numpy-2 wheels), and `jetson-stats`. `pycuda` was NOT added — TensorRT EP is invoked via ONNX Runtime + the `onnx_trt_ep_runtime` factory, which uses TensorRT's Python bindings directly without `pycuda`.)
### Execution instructions — Tier-1 (Docker)
**Prerequisites**:
- Docker 24+ with Compose v2.
- NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path; otherwise falls back to CPU TensorRT).
- ≥16 GB host RAM, ≥80 GB free disk for `tile-cache-fixture` + `fdr-output` + image build cache.
**How to start** (preferred — selector-parity wrapper from AZ-444):
```bash
./e2e/docker/run-tier1.sh \
--fc-adapter ardupilot \
--vio-strategy okvis2 \
[-k <pytest selector>] \
[--build-kind production|asan] \
[--enable-chamber]
```
`run-tier1.sh` and `e2e/jetson/run-tier2.sh` accept the same `-k <selector>` flag and emit the same pytest invocation modulo the `TIER` env var (AZ-444 AC-1).
Raw-compose equivalent (when bypassing the wrapper for debugging):
```bash
cd e2e/docker
export FC_ADAPTER=ardupilot VIO_STRATEGY=okvis2
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner
```
The run reports to `./e2e-results/run-${RUN_ID}/report.csv` (see § Reporting). Exit code matches the test verdict.
**Environment variables**:
- `FC_ADAPTER` ∈ `{ardupilot, inav}` — selects which SITL the SUT talks to.
- `VIO_STRATEGY` ∈ `{okvis2, klt_ransac}` for the production binary; `vins_mono` is valid only with the research build (`BUILD_VINS_MONO=ON`).
- `MAVLINK_SIGNING_PASSKEY_FILE` — path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
- `E2E_SITL_REPLAY_DIR` — when set, activates captured-fixture FDR-replay mode for scenarios that gate on `sitl_replay_ready`; unset → those scenarios skip cleanly (see § Replay-Mode Skip Gating above).
- `RUN_ID` — per-invocation run identifier; defaults to `local-${USER}-${EPOCH}` in development, CI sets it from the workflow run id. Determines the `e2e-results/run-${RUN_ID}/` output directory.
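The `RUN_ID` defaulting rule and the resulting output path can be sketched as below. The helper names are hypothetical, the `"unknown"` fallback for a missing `USER` is an assumption, and the way CI populates `RUN_ID` is not modelled here.

```python
import os
import time
from pathlib import PurePosixPath
from typing import Mapping, Optional


def resolve_run_id(environ: Mapping[str, str] = os.environ,
                   now_epoch: Optional[int] = None) -> str:
    """Use RUN_ID when set, else fall back to local-${USER}-${EPOCH}."""
    run_id = environ.get("RUN_ID")
    if run_id:
        return run_id
    user = environ.get("USER", "unknown")  # assumed fallback, not from the doc
    epoch = int(time.time()) if now_epoch is None else now_epoch
    return f"local-{user}-{epoch}"


def report_path(run_id: str) -> PurePosixPath:
    """Path of the per-run CSV report under e2e-results/."""
    return PurePosixPath("e2e-results") / f"run-{run_id}" / "report.csv"


if __name__ == "__main__":
    print(report_path(resolve_run_id()))
```

Passing `now_epoch` explicitly keeps the fallback deterministic in tests; production use would leave it unset.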
**Skipped on Tier-1**: `NFT-PERF-01` (AC-4.1 latency p95 — Jetson-bound), `NFT-LIM-01` (AC-4.2 memory — Jetson-bound), `NFT-PERF-03` (AC-NEW-1 cold-start — Jetson-bound), `NFT-LIM-04` (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).
### Execution instructions — Tier-2 (Jetson hardware loop)
**Prerequisites**:
- Jetson Orin Nano Super (per `restrictions.md` § Onboard Hardware).
- JetPack 6.2 + CUDA + TensorRT 10.3 + cuDNN per D-C7-9.
- Workstation thermal-day environment for NFT-LIM-04 baseline. Chamber-attached runner for AC-NEW-5 chamber portion (separate quarterly job; not run in standard CI).
- ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
- Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all `AC-2.x` cross-validation is `XFAIL` for that run).
**How to start** (AZ-444 selector-parity wrapper):
```bash
./e2e/jetson/run-tier2.sh \
--fc-adapter ardupilot \
--vio-strategy okvis2 \
[-k <pytest selector>] \
[--build-kind production|asan] \
[--duration 5min|8h] \
[--enable-chamber] \
[--reflash]
```
When the Tier-2 SITL stack runs on a paired x86 host rather than on the Jetson itself, start it via:
```bash
docker compose \
-f e2e/docker/docker-compose.test.yml \
-f e2e/docker/docker-compose.tier2-bridge.yml up ...
```
When invoked on a control host (the typical case), `run-tier2.sh` SSH-orchestrates the Jetson half (`tier2-on-jetson.sh`). When `TIER2_HOST=localhost` and the script runs on the Jetson itself, it delegates directly without SSH. The run emits the same CSV format as Tier-1 (one `report.csv` per run) plus tegrastats and jtop CSVs in the evidence bundle.
**Environment variables**: same as Tier-1 plus:
- `TIER2_HOST` / `TIER2_USER` / `TIER2_KEY_PATH` — control-host → Jetson SSH wiring (required when `TIER2_HOST != localhost`).
- `TIER2_CHAMBER_AMBIENT_C` — ambient temperature for AC-NEW-5 chamber runs.
- `TIER2_CAMERA_DEVICE`: `/dev/video0` (production) or a file path for replay mode.
`gps-denied-onboard.service` (or `gps-denied-onboard-asan.service` for `--build-kind=asan`) MUST be installed via systemd on the Jetson — `e2e/jetson/tier2.service` is the template. See `_docs/03_implementation/jetson_harness_setup.md` for the physical provisioning steps.
### CI runner mapping
**Active mapping (2026-05-20)**:
- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.**
- `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.
**Removed (2026-05-20)**:
- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.
**Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
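The matrix rule can be enumerated in a few lines. This is an illustrative sketch, not the Woodpecker configuration: `ci_matrix` is a hypothetical helper, and the value sets are the ones named in this document.

```python
from itertools import product

FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
BUILD_KINDS = ("production", "research")


def ci_matrix() -> list:
    """Enumerate FC_ADAPTER x VIO_STRATEGY x build_kind entries.

    Production builds exclude vins_mono (D-C1-1-SUB-A); research builds
    keep all three VioStrategy values.
    """
    return [
        {"FC_ADAPTER": fc, "VIO_STRATEGY": vio, "build_kind": kind}
        for fc, vio, kind in product(FC_ADAPTERS, VIO_STRATEGIES, BUILD_KINDS)
        if not (kind == "production" and vio == "vins_mono")
    ]


if __name__ == "__main__":
    for entry in ci_matrix():
        print(entry)
```

Under these value sets the enumeration yields 4 production entries and 6 research entries.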