gps-denied-onboard/_docs/02_document/tests/environment.md

# Test Environment

> **Active policy — 2026-05-20 (refined)**: the canonical CI / release-gate
> test environment is the Jetson Orin Nano Super (or a Jetson-equivalent
> arm64 agent). **Unit tests** (`pytest tests/unit/`) MAY be run on a local
> developer workstation for fast iteration — they are hardware-agnostic by
> construction, the suite is fully synthetic, and Jetson SSH round-trips add
> latency without adding signal. **Blackbox / e2e / performance / resilience
> / security / resource-limit tests** (`tests/e2e/`, `e2e/tests/`,
> `tests/perf/`, etc.) MUST run on the Jetson — never on a local workstation
> — because their pass criteria are tied to Jetson wall-clock latency,
> thermal envelope, and the real-camera + real-FC SITL loop. Workstation x86
> Docker (the historical "Tier-1" path) is **deprecated** as a supported
> e2e environment; the Tier-1 sections below are retained as historical
> reference / traceability only. CI e2e pipelines target the colocated
> arm64 Jetson Woodpecker agent (see `_docs/04_deploy/ci_cd_pipeline.md`);
> local-development e2e runs SHOULD use `scripts/run-tests-jetson.sh`
> against the configured `jetson-e2e` SSH alias rather than
> `scripts/run-tests.sh`. This refinement supersedes the 2026-05-20 "all
> tiers on Jetson" wording and the 2026-05-09 "both" decision recorded in
> the § Test Execution section.

## Where each tier runs (active policy)

| Tier | Local workstation | Jetson (canonical) | When local is the only option |
|------|--------------------|--------------------|-------------------------------|
| Unit (`tests/unit/`) | ✅ allowed and encouraged for dev iteration | ✅ also run as part of the Jetson CI lane | always |
| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | ❌ forbidden — placeholder fixtures + missing hardware = false-negative runs | ✅ required for any merge / release decision | never — if Jetson is unreachable, the e2e verdict is "not run" rather than a local result |
| Performance / resilience / security / resource-limit | ❌ forbidden | ✅ required | never |
| Thermal chamber (AC-NEW-5) | ❌ forbidden | ✅ chamber Jetson only | never |

Practical consequences:

- A PR may merge on green local unit tests + green Jetson e2e tests.
- A PR MUST NOT merge on green local unit tests alone; the Jetson e2e lane is the binding signal.
- When the Jetson agent is offline, the e2e verdict is "pending Jetson" — record the gap (e.g. via `_docs/_process_leftovers/`) rather than substituting a local run.
- Tests in `tests/e2e/` that gate on `RUN_REPLAY_E2E` or `@pytest.mark.tier2` will SKIP locally; this is correct behaviour, not a failure to investigate.
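
The merge rule in the bullets above can be written down as a small predicate. This is a minimal sketch for illustration only: `E2EVerdict` and `may_merge` are hypothetical names, not part of the harness or the CI configuration.

```python
from enum import Enum


class E2EVerdict(Enum):
    """Possible states of the Jetson e2e lane (hypothetical names)."""

    GREEN = "green"
    RED = "red"
    PENDING_JETSON = "pending Jetson"  # agent offline: the verdict is "not run"


def may_merge(unit_green: bool, e2e: E2EVerdict) -> bool:
    """Return True only when unit tests are green AND the Jetson lane is green.

    A local e2e result is deliberately not an input: it can never
    substitute for the Jetson lane.
    """
    return unit_green and e2e is E2EVerdict.GREEN
```

With this shape, `may_merge(True, E2EVerdict.PENDING_JETSON)` is `False`, which matches the "record the gap rather than substituting a local run" rule.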

## Overview

**System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):

- **Inbound — nav-camera frames**: V4L2 / GStreamer source (production: USB / MIPI-CSI / GigE per `restrictions.md`; tests: file-backed source replaying `_docs/00_problem/input_data/AD0000NN.jpg` or `flight_derkachi/flight_derkachi.mp4`).
- **Inbound — FC telemetry**: MAVLink (ArduPilot) or MSP2 (iNav) inbound stream carrying `SCALED_IMU2`, `ATTITUDE`, `GLOBAL_POSITION_INT` (or MSP equivalents). Tests replay `flight_derkachi/data_imu.csv` through a thin replayer.
- **Inbound — satellite tile cache**: filesystem + on-disk index (FAISS HNSW + tile manifest). Tests load a fixture cache mounted as a Docker volume.
- **Outbound — FC external-positioning**: MAVLink `GPS_INPUT` (ArduPilot Plane) OR MSP2 `MSP2_SENSOR_GPS` (iNav). Tests observe these by spinning up the corresponding open-source SITL and reading what reaches the FC.
- **Outbound — GCS telemetry**: MAVLink to QGroundControl (1-2 Hz downsample of estimates + STATUSTEXT). Tests subscribe via a passive MAVLink listener.
- **Outbound — Flight Data Recorder**: NVM filesystem (per AC-NEW-3). Tests read the resulting FDR archive after the run.

**Consumer app purpose**: The e2e harness drives the SUT through these public boundaries — replaying frames + telemetry, mounting tile-cache fixtures, observing FC-side acceptance via SITL, and parsing FDR output. It NEVER imports SUT modules, NEVER queries SUT internal state, and NEVER touches the SUT's filesystem outside the FDR output directory.

## Two-tier execution profile

> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for
> historical traceability. The active policy is **Jetson-only for e2e and
> above** (unit tests may also run locally; see the banner at the top of this
> doc). Tier-1 (workstation Docker) is deprecated; only
> the Tier-2 row continues to describe a supported environment.

This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.

| Tier | Hardware | What it covers | What it skips |
|------|----------|----------------|---------------|
| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all |

CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.

## Docker Environment (Tier-1)

### Services

| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| `gps-denied-onboard` | local build (`docker/Dockerfile`) | The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON` | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| `ardupilot-plane-sitl` | `ardupilot/ardupilot-sitl:plane-stable` | ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| `inav-sitl` | `inavflight/inav-sitl:9.0.0` | iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| `mock-suite-sat-service` | local build (`e2e/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`e2e/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios. See § Harness Implementation Layout below for the per-evaluator inventory. | — |
| `mavproxy-listener` | `ardupilot/mavproxy:latest` | Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions | 14551/udp |

### Networks

| Network | Services | Purpose |
|---------|----------|---------|
| `e2e-net` | all | Isolated test network. No host networking, no internet. Per RESTRICT-SAT-1, the SUT must NEVER reach an external satellite provider during a flight; a deny-all egress rule on `e2e-net` enforces this and is itself a security test (NFT-SEC-02). |

### Volumes

| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run by `e2e/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `fdr-output` | `gps-denied-onboard:/var/azaion/fdr` | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume) |
| `input-data` | `e2e-runner:/test-data:ro` | Bind mount of `_docs/00_problem/input_data/` for replay |
| `expected-results` | `e2e-runner:/expected:ro` | Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions |

### docker-compose structure

```yaml
services:
  gps-denied-onboard:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
    networks: [e2e-net]
    volumes:
      - tile-cache-fixture:/var/azaion/tile-cache:ro
      - fdr-output:/var/azaion/fdr
    environment:
      ONBOARD_FC_ADAPTER: ${FC_ADAPTER}        # ardupilot | inav, set per scenario
      ONBOARD_VIO_STRATEGY: ${VIO_STRATEGY}    # okvis2 | klt_ransac (production); vins_mono only in research build
      MAVLINK_SIGNING_PASSKEY_FILE: /run/secrets/mavlink_passkey
    depends_on:
      - mock-suite-sat-service

  ardupilot-plane-sitl:
    image: ardupilot/ardupilot-sitl:plane-stable
    networks: [e2e-net]
    command: ["--vehicle=ArduPlane", "--gps-type=14"]   # GPS_TYPE=14 = MAV per ArduPilot SITL_simulation_parameters.html

  inav-sitl:
    image: inavflight/inav-sitl:9.0.0
    networks: [e2e-net]
    # iNav SITL exposes MSP on TCP 5760 (UART1) per docs/SITL/SITL.md

  mock-suite-sat-service:
    build: ../fixtures/mock-suite-sat
    networks: [e2e-net]
    # Egress restriction enforced at network level, not service level

  e2e-runner:
    build: ../runner
    networks: [e2e-net]
    volumes:
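      # Bind mounts (see § Volumes); paths are relative to this compose file in e2e/docker/
      - ../../_docs/00_problem/input_data:/test-data:ro
      - ../../_docs/00_problem/input_data/expected_results:/expected:ro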
      - fdr-output:/fdr:ro
    depends_on:
      - gps-denied-onboard
      - ardupilot-plane-sitl
      - inav-sitl
      - mavproxy-listener

  mavproxy-listener:
    image: ardupilot/mavproxy:latest
    networks: [e2e-net]

networks:
  e2e-net:
    driver: bridge
    internal: true   # NO external connectivity (enforces RESTRICT-SAT-1)

volumes:
  tile-cache-fixture: {}
  fdr-output: {}
```

## Consumer Application

**Tech stack**: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), `msp_gps_toy` (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.11.0,<4.12 (frame source replay; see `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — pin is held below 4.12 until gtsam ships numpy-2 wheels; D-CROSS-CVE-1 leftover remains open), numpy ≥1.26,<2.0 + scipy (geodesic-distance assertions in WGS84).

**Entry point**: `pytest e2e/tests/` from inside `e2e-runner`. Each scenario is a parameterized pytest case keyed by FC adapter (`ardupilot` / `inav`) and VioStrategy (`okvis2` / `klt_ransac`) via the session-scoped conftest fixtures.

### Harness Implementation Layout

The blackbox harness implementation lives under `e2e/` (NOT the SUT source tree — public-boundary discipline enforced by `e2e/README.md`):

```
e2e/
├── docker/                            Tier-1 entrypoint
│   ├── docker-compose.test.yml        Compose stack (services from § Services above)
│   ├── docker-compose.tier2-bridge.yml  Compose override for paired-host Tier-2 SITL bridging
│   ├── run-tier1.sh                   AZ-444 selector-parity wrapper
│   └── secrets/                       Mounted Docker secrets (mavlink-passkey)
├── jetson/                            Tier-2 entrypoint
│   ├── run-tier2.sh                   AZ-444 selector-parity wrapper (control-host side)
│   ├── tier2-on-jetson.sh             SSH-orchestrated on-Jetson half
│   ├── tier2.service                  systemd unit template
│   ├── jtop_parser.py                 jetson_stats / jtop telemetry parser (NFT-LIM-01)
│   └── tegrastats_parser.py           tegrastats parser (NFT-LIM-04)
├── runner/                            e2e-runner image
│   ├── Dockerfile, conftest.py, pytest.ini, requirements.txt
│   ├── helpers/                       Per-AC evaluator + observer modules (47 evaluators
│   │                                  covering accuracy, AP/iNav contract, blackout-spoof,
│   │                                  cache poisoning, cold-start, companion reboot,
│   │                                  CVE probe, e2e latency, egress observer, escalation
│   │                                  ladder, FDR reader, frame-source replay, IMU replay,
│   │                                  injector fixtures, MAVLink signing, MAVProxy tlog,
│   │                                  memory budget, mid-flight tile, mock suite-sat audit,
│   │                                  Monte Carlo envelope, MRE, multi-segment, outage
│   │                                  request, outlier tolerance, registration classifier,
│   │                                  retrieval, sharp-turn, sitl_observer, smoothing,
│   │                                  spoof promotion, storage budget, streaming, thermal
│   │                                  envelope, tile-cache inspector, TTFF — see
│   │                                  `e2e/runner/helpers/` for the authoritative list)
│   └── reporting/                     CSV reporter + evidence bundler (AZ-445/446)
│       ├── csv_reporter.py            Emits `report.csv` per § Reporting
│       ├── evidence_bundler.py        Collects per-run `.tlog`, FDR, telemetry CSVs
│       └── nfr_recorder.py            NFR per-stage latency + budget recorder
├── fixtures/                          Fixture builders + captured fixtures
│   ├── tile-cache-builder/            `tile-cache-fixture` builder
│   ├── age-injector/                  `synth-age-tile-set` builder (FT-N-05)
│   ├── injectors/                     Runtime injectors:
│   │   ├── outlier.py                 `outlier-injection-derkachi` (FT-N-01)
│   │   ├── blackout_spoof.py          `blackout-spoof-derkachi` (FT-N-04, NFT-RES-04)
│   │   ├── multi_segment.py           `multi-segment-derkachi` (FT-P-08)
│   │   ├── cold_boot.py               `cold-boot-fixture` (NFT-PERF-03)
│   │   └── fc_proxy.py                FC-inbound blackout/spoof proxy (FT-N-04 driver)
│   ├── sitl_replay/                   Captured offline FDR-replay fixtures
│   │   └── p01/                       FT-P-01 capture set (see test-data.md)
│   ├── sitl_replay_builder/           Captured-fixture builder framework (AZ-598-600)
│   │   ├── builder.py                 VideoSource × TlogSource × FdrProjection strategies
│   │   ├── build_p01_fixtures.py      FT-P-01 still-image builder
│   │   └── build_p02_fixtures.py      FT-P-02 Derkachi builder
│   ├── mock-suite-sat/                `mock-suite-sat-service` Docker image
│   ├── secrets/                       Test-only secrets (mavlink-test-passkey.txt)
│   └── security/                      Security fixtures (cve-2025-53644.jpg)
├── tests/                             Pytest target: positive/, negative/, performance/,
│                                      resilience/, security/, resource_limit/
└── _unit_tests/                       Out-of-container unit tests for harness internals
                                       (runs as part of project pytest, no Docker required)
```

### Replay-Mode Skip Gating

Several FT-* and FT-N-* scenarios rely on a pre-captured FDR-replay fixture instead of a live SITL run. When the `E2E_SITL_REPLAY_DIR` environment variable is unset, those scenarios skip cleanly via a `sitl_replay_ready` pytest marker (per AZ-594/595/598/599). To activate them:

```bash
E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01 \
    pytest e2e/tests/positive/test_ft_p_01_still_image_accuracy.py
```

The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) regenerates these fixtures from `_docs/00_problem/input_data/` against a live compose stack; the captured artifacts are then committed under `e2e/fixtures/sitl_replay/<scenario>/`. See `e2e/fixtures/sitl_replay_builder/README.md` for the framework, supported scenarios, and per-scenario builder invocations.
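
The skip gating described above could be wired into a `conftest.py` roughly as follows. This is a sketch under stated assumptions, not the harness's actual implementation: `replay_skip_reason` is a hypothetical helper, and treating a non-existent directory the same as an unset variable is an assumption. Only the `E2E_SITL_REPLAY_DIR` variable and the `sitl_replay_ready` marker name come from this document.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def replay_skip_reason(env: Mapping[str, str]) -> Optional[str]:
    """Return a skip reason, or None when replay scenarios may run."""
    raw = env.get("E2E_SITL_REPLAY_DIR", "").strip()
    if not raw:
        return "E2E_SITL_REPLAY_DIR is unset"
    if not Path(raw).is_dir():
        # Assumption: a missing directory is handled like an unset variable.
        return f"E2E_SITL_REPLAY_DIR={raw!r} is not a directory"
    return None


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip every collected test marked `sitl_replay_ready`."""
    import pytest  # lazy import keeps the helper above importable on its own

    reason = replay_skip_reason(os.environ)
    if reason is None:
        return  # fixtures are available, run the replay scenarios
    skip = pytest.mark.skip(reason=reason)
    for item in items:
        if item.get_closest_marker("sitl_replay_ready") is not None:
            item.add_marker(skip)
```

Keeping the decision in a pure function makes the gating itself unit-testable without a pytest session.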

### Communication with system under test

| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| Frame source | V4L2 / GStreamer file source | UNIX domain socket / shared `/test-data` mount | none (local) |
| FC telemetry inbound | MAVLink (AP) or MSP2 (iNav) | `udp:gps-denied-onboard:14550` (AP) or `tcp:gps-denied-onboard:5760` (iNav) | MAVLink 2.0 message signing on AP per D-C8-9 (passkey via Docker secret); iNav unsigned per accepted residual risk |
| Tile cache | Filesystem read | `/var/azaion/tile-cache` (read-only mount) | filesystem perms |
| FC external-pos outbound observation | Read SITL EKF source-set + GLOBAL_POSITION_INT replay back from SITL | `udp:ardupilot-plane-sitl:14550` or `tcp:inav-sitl:5760` | passive listener |
| GCS telemetry observation | MAVLink listener | `udp:mavproxy-listener:14551` (forwarded from SUT 14550) | none |
| FDR output | Filesystem read post-run | `/fdr` (read-only mount) | filesystem perms |
| Suite Sat Service mock | HTTP/JSON | `http://mock-suite-sat-service:8080` | none (test) |

### What the consumer does NOT have access to

- No direct access to the SUT's internal state (GTSAM iSAM2 graph, FAISS index in-memory, OpenCV intermediate buffers, VioStrategy implementation pointer).
- No internal Python/C++ module imports from the SUT.
- No shared memory or filesystem with the SUT outside the four explicit mounts (`tile-cache-fixture` r/o, `fdr-output` r/o from runner side, `input-data` r/o, `expected-results` r/o).
- No bypass of the FC-side acceptance check — every AC-4.3 assertion goes through SITL.
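
The "no SUT imports" rule lends itself to a static check. The sketch below scans harness source text with the standard `ast` module. The package name `gps_denied_onboard` is a hypothetical stand-in for the SUT's import root, and `forbidden_imports` is an illustrative helper, not an existing harness function.

```python
import ast
from typing import FrozenSet, List

# Hypothetical: the real SUT import root may be named differently.
FORBIDDEN_ROOTS: FrozenSet[str] = frozenset({"gps_denied_onboard"})


def forbidden_imports(source: str, forbidden: FrozenSet[str] = FORBIDDEN_ROOTS) -> List[str]:
    """Return imported module names whose top-level package is forbidden."""
    hits: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # absolute "from X import ..." only
        else:
            continue  # relative imports stay inside the harness package
        hits.extend(name for name in names if name.split(".")[0] in forbidden)
    return hits
```

A harness unit test could run this over every `.py` file under `e2e/` and fail on a non-empty result.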

## CI/CD Integration

> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.

**When to run** (active policy):

- Jetson (colocated arm64 Woodpecker agent): on every PR to the `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: quarterly and before any release tag, on the chamber-attached Jetson runner; failures block release tags only.

**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane.

**Gate behavior**: any Jetson test failure blocks both PR merge and release tags. Chamber test failures are warning-only on PRs and blocking on release tags.

**Timeout**:
- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).

## Reporting

**Format**: CSV (one row per test).

**Columns**: `test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths`

- `traces_to`: comma-separated AC/RESTRICT IDs from the traceability matrix.
- `fc_adapter`: `ardupilot` | `inav` | `n/a`.
- `vio_strategy`: `okvis2` | `klt_ransac` | `vins_mono` | `n/a` (research-build only for `vins_mono`).
- `tier`: `tier1-docker` | `tier2-jetson` | `tier2-chamber`.
- `result`: `PASS` | `FAIL` | `SKIP` | `XFAIL` (XFAIL only allowed for AC explicitly marked NOT COVERED in the traceability matrix and not yet promoted to a real test).
- `evidence_paths`: comma-separated paths inside the run-output bundle (`.tlog` files, FDR archives, screenshots, profiler traces) supporting the verdict.

**Output path**: `e2e-results/run-${RUN_ID}/report.csv` plus a per-run bundle of evidence at `e2e-results/run-${RUN_ID}/evidence/`.
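
As an illustration of the row format, the following sketch validates the enumerated columns and renders CSV text with the standard `csv` module. It is not the code in `csv_reporter.py`: `render_report` and the example row (including its test ID and AC references) are hypothetical, while the column names and allowed values are taken from the list above.

```python
import csv
import io
from typing import Iterable, Mapping

COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy",
    "tier", "started_at_utc", "execution_time_ms", "result",
    "error_message", "evidence_paths",
]
ALLOWED = {
    "fc_adapter": {"ardupilot", "inav", "n/a"},
    "vio_strategy": {"okvis2", "klt_ransac", "vins_mono", "n/a"},
    "tier": {"tier1-docker", "tier2-jetson", "tier2-chamber"},
    "result": {"PASS", "FAIL", "SKIP", "XFAIL"},
}


def render_report(rows: Iterable[Mapping[str, str]]) -> str:
    """Validate rows against the documented enums and return CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        for column, allowed in ALLOWED.items():
            if row[column] not in allowed:
                raise ValueError(f"{column}={row[column]!r} not in {sorted(allowed)}")
        writer.writerow(row)  # comma-separated cells are quoted automatically
    return buf.getvalue()


# Illustrative row only; the IDs below are placeholders, not real results.
example_row = {
    "test_id": "EXAMPLE-01", "test_name": "example_scenario",
    "traces_to": "AC-4.1,AC-4.2", "fc_adapter": "ardupilot",
    "vio_strategy": "okvis2", "tier": "tier2-jetson",
    "started_at_utc": "2026-05-20T12:00:00Z", "execution_time_ms": "1234",
    "result": "PASS", "error_message": "",
    "evidence_paths": "evidence/example.tlog",
}
report_text = render_report([example_row])
```

Because `traces_to` and `evidence_paths` are themselves comma-separated, any consumer should parse `report.csv` with a real CSV reader rather than splitting lines on commas.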

## Test Execution

**Decision (2026-05-20, refined later that day)** — **Jetson is the binding e2e environment; unit tests may run locally.** This refines the earlier "Jetson only for everything" wording. Rationale captured in `_docs/LESSONS.md` (2026-05-20 entries):

- The original "Jetson-only across all tiers" decision came from repeated workstation-vs-Jetson environment divergences in the e2e / build path (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration). Those divergences are real and continue to justify Jetson as the binding e2e environment.
- Forcing the unit-test suite over an SSH-orchestrated Jetson loop added 30–90 s per iteration without producing any signal the local interpreter doesn't already produce. The unit suite is fully synthetic — no camera, no SITL, no Jetson-specific runtime — so a local PASS is equivalent to a Jetson PASS for that tier.

**Operational entry points**:

| Tier | Entry point | Where it runs |
|------|-------------|---------------|
| Unit (`tests/unit/`) | `pytest tests/unit/ -q` directly, or `scripts/run-tests.sh` | local workstation (Python 3.10+ venv) |
| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | `scripts/run-tests-jetson.sh` (local dev) / `.woodpecker/01-test.yml` (CI) | colocated arm64 Jetson Woodpecker agent — see `_docs/04_deploy/ci_cd_pipeline.md` |
| Performance / resilience / security / resource-limit | same as e2e | Jetson only |
| AC-NEW-5 thermal chamber | quarterly + pre-release | `self-hosted-jetson-orin-chamber` |

A green local unit-test run is necessary-but-not-sufficient for merge; the Jetson e2e lane is the binding signal.

The remainder of this section preserves the original 2026-05-09 decision context for traceability.

---

**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.

### Hardware dependencies found (Phase 3 → Hardware Assessment scan)

| Category | Indicator | Source file |
|---|---|---|
| GPU / CUDA | TensorRT engines (`.engine`, SM 87, JetPack 6.2, TRT 10.3) | `_docs/01_solution/solution.md` PRE-FLIGHT block |
| GPU / CUDA | DISK+LightGlue FP16 inference | `_docs/01_solution/solution.md` RUNTIME block (C3) |
| GPU / CUDA pin | Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W) | `_docs/00_problem/restrictions.md` § Onboard Hardware |
| Sensors / Cameras | ADTi 20MP 20L V1 nadir camera over USB / MIPI-CSI / GigE | `_docs/00_problem/restrictions.md` § Cameras |
| Sensors / Cameras | V4L2 / GStreamer frame source (production) | `_docs/02_document/tests/environment.md` § Overview |
| OS-specific services | High-rate IMU via UART/MAVLink to FC | `_docs/00_problem/restrictions.md` § Sensors & Integration |
| OS-specific services | Per-FC inbound (MAVLink GPS_INPUT for AP, MSP2 over UART for iNav) | `_docs/00_problem/restrictions.md` § Sensors & Integration |
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | `_docs/02_document/tests/resource-limit-tests.md` NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | `_docs/00_problem/restrictions.md` § Failsafe & Safety + AC-NEW-5 |

(The Step 2 code scan from the planning phase returned zero indicators because no source code existed yet. Post-implementation, `pyproject.toml` confirms `tensorrt`, `pymavlink`, `gtsam==4.2.1`, `faiss-gpu`, `opencv-python>=4.11.0.86,<4.12`, and `jetson-stats`. The OpenCV range is a cycle-1 relaxation per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md`; the original `>=4.12.0` target is to be restored once gtsam ships numpy-2 wheels. `pycuda` was NOT added: the TensorRT EP is invoked via ONNX Runtime + the `onnx_trt_ep_runtime` factory, which uses TensorRT's Python bindings directly without `pycuda`.)

### Execution instructions — Tier-1 (Docker)

**Prerequisites**:
- Docker 24+ with Compose v2.
- NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path; without a GPU the SUT falls back to CPU inference, since TensorRT itself requires an NVIDIA GPU).
- ≥16 GB host RAM, ≥80 GB free disk for `tile-cache-fixture` + `fdr-output` + image build cache.

**How to start** (preferred — selector-parity wrapper from AZ-444):
```bash
./e2e/docker/run-tier1.sh \
    --fc-adapter ardupilot \
    --vio-strategy okvis2 \
    [-k <pytest selector>] \
    [--build-kind production|asan] \
    [--enable-chamber]
```

`run-tier1.sh` and `e2e/jetson/run-tier2.sh` accept the same `-k <selector>` flag and emit the same pytest invocation modulo the `TIER` env var (AZ-444 AC-1).

Raw-compose equivalent (when bypassing the wrapper for debugging):
```bash
cd e2e/docker
export FC_ADAPTER=ardupilot VIO_STRATEGY=okvis2
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner
```

The run reports to `./e2e-results/run-${RUN_ID}/report.csv` (see § Reporting). Exit code matches the test verdict.

**Environment variables**:
- `FC_ADAPTER` ∈ `{ardupilot, inav}` — selects which SITL the SUT talks to.
- `VIO_STRATEGY` ∈ `{okvis2, klt_ransac}` for the production binary; `vins_mono` is valid only with the research binary (built with `BUILD_VINS_MONO=ON`).
- `MAVLINK_SIGNING_PASSKEY_FILE` — path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
- `E2E_SITL_REPLAY_DIR` — when set, activates captured-fixture FDR-replay mode for scenarios that gate on `sitl_replay_ready`; unset → those scenarios skip cleanly (see § Replay-Mode Skip Gating above).
- `RUN_ID` — per-invocation run identifier; defaults to `local-${USER}-${EPOCH}` in development, CI sets it from the workflow run id. Determines the `e2e-results/run-${RUN_ID}/` output directory.
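
The variable rules above can be sketched as a small resolver. This is illustrative only: `resolve_run_config` is a hypothetical helper, the real wrapper is a shell script, and reading `${EPOCH}` as Unix epoch seconds is an assumption.

```python
import time
from typing import Dict, Mapping

FC_ADAPTERS = {"ardupilot", "inav"}
PRODUCTION_VIO = {"okvis2", "klt_ransac"}
RESEARCH_VIO = PRODUCTION_VIO | {"vins_mono"}  # research build only


def resolve_run_config(env: Mapping[str, str], research_build: bool = False) -> Dict[str, str]:
    """Validate FC_ADAPTER / VIO_STRATEGY and derive the run output directory."""
    fc = env.get("FC_ADAPTER", "")
    if fc not in FC_ADAPTERS:
        raise ValueError(f"FC_ADAPTER={fc!r} not in {sorted(FC_ADAPTERS)}")
    allowed = RESEARCH_VIO if research_build else PRODUCTION_VIO
    vio = env.get("VIO_STRATEGY", "")
    if vio not in allowed:
        raise ValueError(f"VIO_STRATEGY={vio!r} not in {sorted(allowed)}")
    # Assumption: ${EPOCH} in the documented default means Unix epoch seconds.
    default_id = f"local-{env.get('USER', 'unknown')}-{int(time.time())}"
    run_id = env.get("RUN_ID") or default_id
    return {
        "fc_adapter": fc,
        "vio_strategy": vio,
        "run_id": run_id,
        "output_dir": f"e2e-results/run-{run_id}",
    }
```

Rejecting `vins_mono` up front for production builds surfaces a misconfigured matrix entry before any container starts.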

**Skipped on Tier-1**: `NFT-PERF-01` (AC-4.1 latency p95 — Jetson-bound), `NFT-LIM-01` (AC-4.2 memory — Jetson-bound), `NFT-PERF-03` (AC-NEW-1 cold-start — Jetson-bound), `NFT-LIM-04` (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).

### Execution instructions — Tier-2 (Jetson hardware loop)

**Prerequisites**:
- Jetson Orin Nano Super (per `restrictions.md` § Onboard Hardware).
- JetPack 6.2 + CUDA + TensorRT 10.3 + cuDNN per D-C7-9.
- Workstation thermal-day environment for NFT-LIM-04 baseline. Chamber-attached runner for AC-NEW-5 chamber portion (separate quarterly job; not run in standard CI).
- ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
- Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all `AC-2.x` cross-validation is `XFAIL` for that run).

**How to start** (AZ-444 selector-parity wrapper):
```bash
./e2e/jetson/run-tier2.sh \
    --fc-adapter ardupilot \
    --vio-strategy okvis2 \
    [-k <pytest selector>] \
    [--build-kind production|asan] \
    [--duration 5min|8h] \
    [--enable-chamber] \
    [--reflash]
```

When the SITL stack runs on a paired x86 host rather than on the Jetson itself, start it via:
```bash
docker compose \
    -f e2e/docker/docker-compose.test.yml \
    -f e2e/docker/docker-compose.tier2-bridge.yml up ...
```

When invoked on a control host (typical), the script SSH-orchestrates the Jetson half (`tier2-on-jetson.sh`). When `TIER2_HOST=localhost` and the script runs on the Jetson itself, it delegates directly without SSH. Either way, it emits the same CSV format as Tier-1 (one `report.csv` per run) plus tegrastats + jtop CSVs in the evidence bundle.

**Environment variables**: same as Tier-1 plus:
- `TIER2_HOST` / `TIER2_USER` / `TIER2_KEY_PATH` — control-host → Jetson SSH wiring (required when `TIER2_HOST != localhost`).
- `TIER2_CHAMBER_AMBIENT_C` — ambient temperature for AC-NEW-5 chamber runs.
- `TIER2_CAMERA_DEVICE` — `/dev/video0` (production) or file path for replay mode.
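
The local-versus-SSH delegation can be pictured as follows. This is a sketch, not a transcription of `run-tier2.sh`: `tier2_launch_argv` is a hypothetical helper, the on-Jetson script path is assumed to match the repo-relative path, and requiring `TIER2_HOST` to be set explicitly is an assumption.

```python
from typing import List, Mapping

# Assumption: the on-Jetson script is reachable at its repo-relative path.
ON_JETSON_SCRIPT = "e2e/jetson/tier2-on-jetson.sh"


def tier2_launch_argv(env: Mapping[str, str]) -> List[str]:
    """Build the argv that starts the on-Jetson half of a Tier-2 run."""
    host = env.get("TIER2_HOST", "")
    if not host:
        raise ValueError("TIER2_HOST must be set")  # assumed: no implicit default
    if host == "localhost":
        # Already on the Jetson: run the script directly, no SSH hop.
        return [ON_JETSON_SCRIPT]
    missing = [key for key in ("TIER2_USER", "TIER2_KEY_PATH") if not env.get(key)]
    if missing:
        raise ValueError(f"missing for remote host: {', '.join(missing)}")
    target = f"{env['TIER2_USER']}@{host}"
    return ["ssh", "-i", env["TIER2_KEY_PATH"], target, ON_JETSON_SCRIPT]
```

Returning an argv list instead of a shell string avoids quoting problems if a key path contains spaces.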

`gps-denied-onboard.service` (or `gps-denied-onboard-asan.service` for `--build-kind=asan`) MUST be installed via systemd on the Jetson — `e2e/jetson/tier2.service` is the template. See `_docs/03_implementation/jetson_harness_setup.md` for the physical provisioning steps.

### CI runner mapping

**Active mapping (2026-05-20)**:

- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.**
- `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.

**Removed (2026-05-20)**:

- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.

**Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
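
The matrix rule can be enumerated directly. The sketch below is illustrative (the CI matrix itself lives in the Woodpecker workflow, not in Python); `matrix_entries` is a hypothetical helper, while the dimension values come from this document.

```python
from itertools import product
from typing import List, Tuple

FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
BUILD_KINDS = ("production", "research")


def matrix_entries() -> List[Tuple[str, str, str]]:
    """Enumerate (fc_adapter, vio_strategy, build_kind), minus production vins_mono."""
    return [
        (fc, vio, build)
        for fc, vio, build in product(FC_ADAPTERS, VIO_STRATEGIES, BUILD_KINDS)
        if not (build == "production" and vio == "vins_mono")
    ]
```

The full product has 2 × 3 × 2 = 12 combinations; excluding production `vins_mono` for both adapters leaves 10 entries (4 production, 6 research).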