mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 20:41:13 +00:00


Oleksandr Bezdieniezhnykh b66b68ff76 [AZ-700] gps-denied-render-map: HTML map of estimated vs truth tracks

New operator-side console-script renders a self-contained HTML map
(folium / Leaflet) comparing the estimator's JSONL track against
the tlog ground-truth track. Pinned visual style: red truth + blue
estimated polylines, start/end markers per track, 100 m + 50 m
scale circles, optional AZ-699 accuracy-summary banner, and an
--offline-tiles mode (with optional local tile-URL template) for
Jetsons without internet.

folium is gated behind a new [operator-tools] optional-dep so the
airborne binary's cold-start NFR is unaffected (C12 binary doesn't
import the new module). 14 new unit tests pin polyline count,
marker count, scale-circle radii, summary embedding, offline-tile
behaviour, and full CLI smoke. Zero mypy --strict errors.

Refines the 2026-05-20 Jetson-only test policy: unit tests may run
locally, e2e/perf/resilience/security stay Jetson-only. Documented
in _docs/02_document/tests/environment.md (Where each tier runs)
and .cursor/rules/testing.mdc (Test environment for this project).

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-20 17:04:01 +03:00


Test Environment

Active policy — 2026-05-20 (refined): the canonical CI / release-gate test environment is the Jetson Orin Nano Super (or a Jetson-equivalent arm64 agent). Unit tests (pytest tests/unit/) MAY be run on a local developer workstation for fast iteration — they are hardware-agnostic by construction, the suite is fully synthetic, and Jetson SSH round-trips add latency without adding signal. Blackbox / e2e / performance / resilience / security / resource-limit tests (tests/e2e/, e2e/tests/, tests/perf/, etc.) MUST run on the Jetson — never on a local workstation — because their pass criteria are tied to Jetson wall-clock latency, thermal envelope, and the real-camera + real-FC SITL loop. Workstation x86 Docker (the historical "Tier-1" path) is deprecated as a supported e2e environment; the Tier-1 sections below are retained as historical reference / traceability only. CI e2e pipelines target the colocated arm64 Jetson Woodpecker agent (see _docs/04_deploy/ci_cd_pipeline.md); local-development e2e runs SHOULD use scripts/run-tests-jetson.sh against the configured jetson-e2e SSH alias rather than scripts/run-tests.sh. This refinement supersedes the 2026-05-20 "all tiers on Jetson" wording and the 2026-05-09 "both" decision recorded in the § Test Execution section.

Where each tier runs (active policy)

Tier	Local workstation	Jetson (canonical)	When local is the only option
Unit (`tests/unit/`)	✅ allowed and encouraged for dev iteration	✅ also run as part of the Jetson CI lane	always
Blackbox / e2e (`tests/e2e/`, `e2e/tests/`)	❌ forbidden — placeholder fixtures + missing hardware = false-negative runs	✅ required for any merge / release decision	never — if Jetson is unreachable, the e2e verdict is "not run" rather than a local result
Performance / resilience / security / resource-limit	❌ forbidden	✅ required	never
Thermal chamber (AC-NEW-5)	❌ forbidden	✅ chamber Jetson only	never

Practical consequences:

A PR may merge only when both the unit tests (local or CI) and the Jetson e2e tests are green.
A PR MUST NOT merge on green local unit tests alone; the Jetson e2e lane is the binding signal.
When the Jetson agent is offline, the e2e verdict is "pending Jetson" — record the gap (e.g. via _docs/_process_leftovers/) rather than substituting a local run.
Tests in tests/e2e/ that gate on RUN_REPLAY_E2E or @pytest.mark.tier2 will SKIP locally; this is correct behaviour, not a failure to investigate.
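
The merge rule above can be sketched as a small decision function. This is an illustration only: `E2EVerdict` and `merge_allowed` are hypothetical names, not part of the repository, and the real gate is enforced by the CI pipeline.

```python
from enum import Enum


class E2EVerdict(Enum):
    """Possible outcomes of the Jetson e2e lane (hypothetical names)."""
    PASS = "pass"
    FAIL = "fail"
    PENDING_JETSON = "pending Jetson"  # agent offline, lane not run


def merge_allowed(unit_green: bool, jetson_e2e: E2EVerdict) -> bool:
    """Allow a merge only when unit tests and the Jetson e2e lane are both green.

    A local e2e run is never an input here: when the Jetson is unreachable
    the verdict stays PENDING_JETSON and the merge is blocked.
    """
    return unit_green and jetson_e2e is E2EVerdict.PASS


if __name__ == "__main__":
    for unit, e2e in [(True, E2EVerdict.PASS),
                      (True, E2EVerdict.PENDING_JETSON),
                      (False, E2EVerdict.PASS)]:
        print(unit, e2e.value, "->", merge_allowed(unit, e2e))
```

Modelling "pending Jetson" as its own state, rather than as a missing value, keeps the gap visible so it can be recorded instead of silently treated as a pass.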

Overview

System under test (SUT): gps-denied-onboard companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):

Inbound — nav-camera frames: V4L2 / GStreamer source (production: USB / MIPI-CSI / GigE per restrictions.md; tests: file-backed source replaying _docs/00_problem/input_data/AD0000NN.jpg or flight_derkachi/flight_derkachi.mp4).
Inbound — FC telemetry: MAVLink (ArduPilot) or MSP2 (iNav) inbound stream carrying SCALED_IMU2, ATTITUDE, GLOBAL_POSITION_INT (or MSP equivalents). Tests replay flight_derkachi/data_imu.csv through a thin replayer.
Inbound — satellite tile cache: filesystem + on-disk index (FAISS HNSW + tile manifest). Tests load a fixture cache mounted as a Docker volume.
Outbound — FC external-positioning: MAVLink GPS_INPUT (ArduPilot Plane) OR MSP2 MSP2_SENSOR_GPS (iNav). Tests observe these by spinning up the corresponding open-source SITL and reading what reaches the FC.
Outbound — GCS telemetry: MAVLink to QGroundControl (1-2 Hz downsample of estimates + STATUSTEXT). Tests subscribe via a passive MAVLink listener.
Outbound — Flight Data Recorder: NVM filesystem (per AC-NEW-3). Tests read the resulting FDR archive after the run.

Consumer app purpose: The e2e harness drives the SUT through these public boundaries — replaying frames + telemetry, mounting tile-cache fixtures, observing FC-side acceptance via SITL, and parsing FDR output. It NEVER imports SUT modules, NEVER queries SUT internal state, and NEVER touches the SUT's filesystem outside the FDR output directory.

Two-tier execution profile

SUPERSEDED (2026-05-20): the two-tier model below is retained for historical traceability. Under the active policy the Jetson is the only supported e2e environment; only unit tests may additionally run on a workstation (see banner at the top of this doc). Tier-1 (workstation Docker) is deprecated; only the Jetson (formerly Tier-2) row continues to describe a supported environment.

This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.

Tier	Hardware	What it covers	What it skips
Tier-1 (workstation Docker) (deprecated 2026-05-20)	x86 dev workstation, optional NVIDIA dGPU for TensorRT validation	All `FT-` correctness, schema, `NFT-RES-` resilience scenarios, `NFT-SEC-` security scenarios, `NFT-LIM-` storage budgets	Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5
Jetson (canonical, 2026-05-20) (formerly "Tier-2")	Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5	Everything: `FT-` correctness, schema, `NFT-RES-`, `NFT-SEC-`, `NFT-LIM-`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only)	Nothing — anything that doesn't run here doesn't run at all

CI runs the Jetson pipeline (01-test.yml) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on self-hosted-jetson-orin-chamber on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.

Docker Environment (Tier-1)

Services

Service	Image / Build	Purpose	Ports
`gps-denied-onboard`	local build (`docker/Dockerfile`)	The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON`	14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL)
`ardupilot-plane-sitl`	`ardupilot/ardupilot-sitl:plane-stable`	ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x	14550/udp (MAVLink)
`inav-sitl`	`inavflight/inav-sitl:9.0.0`	iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state	5760/tcp (MSP2 over TCP per iNav SITL convention)
`mock-suite-sat-service`	local build (`e2e/fixtures/mock-suite-sat`)	Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles	8080/tcp
`e2e-runner`	local build (`e2e/runner`)	Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios. See § Harness Implementation Layout below for the per-evaluator inventory.	—
`mavproxy-listener`	`ardupilot/mavproxy:latest`	Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions	14551/udp

Networks

Network	Services	Purpose
`e2e-net`	all	Isolated test network. No host networking, no internet. Per RESTRICT-SAT-1, the SUT must NEVER reach an external satellite provider during a flight; a deny-all egress rule on `e2e-net` enforces this and is itself a security test (NFT-SEC-02).

Volumes

Volume	Mounted to	Purpose
`tile-cache-fixture`	`gps-denied-onboard:/var/azaion/tile-cache:ro`	Pre-built FAISS HNSW index + tile filesystem. Built once per test run from `e2e/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior.
`fdr-output`	`gps-denied-onboard:/var/azaion/fdr`	Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume)
`input-data`	`e2e-runner:/test-data:ro`	Bind mount of `_docs/00_problem/input_data/` for replay
`expected-results`	`e2e-runner:/expected:ro`	Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions

docker-compose structure

services:
  gps-denied-onboard:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
    networks: [e2e-net]
    volumes:
      - tile-cache-fixture:/var/azaion/tile-cache:ro
      - fdr-output:/var/azaion/fdr
    environment:
      ONBOARD_FC_ADAPTER: ${FC_ADAPTER}        # ardupilot | inav, set per scenario
      ONBOARD_VIO_STRATEGY: ${VIO_STRATEGY}    # okvis2 | klt_ransac (production); vins_mono only in research build
      MAVLINK_SIGNING_PASSKEY_FILE: /run/secrets/mavlink_passkey
    depends_on:
      - mock-suite-sat-service

  ardupilot-plane-sitl:
    image: ardupilot/ardupilot-sitl:plane-stable
    networks: [e2e-net]
    command: ["--vehicle=ArduPlane", "--gps-type=14"]   # GPS_TYPE=14 = MAV per ArduPilot SITL_simulation_parameters.html

  inav-sitl:
    image: inavflight/inav-sitl:9.0.0
    networks: [e2e-net]
    # iNav SITL exposes MSP on TCP 5760 (UART1) per docs/SITL/SITL.md

  mock-suite-sat-service:
    build: ../fixtures/mock-suite-sat
    networks: [e2e-net]
    # Egress restriction enforced at network level, not service level

  e2e-runner:
    build: ../runner
    networks: [e2e-net]
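    volumes:
      # input-data and expected-results are bind mounts (see § Volumes), so they
      # are given as host paths relative to this compose file, not as named volumes
      - ../../_docs/00_problem/input_data:/test-data:ro
      - ../../_docs/00_problem/input_data/expected_results:/expected:ro
      - fdr-output:/fdr:ro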
    depends_on:
      - gps-denied-onboard
      - ardupilot-plane-sitl
      - inav-sitl
      - mavproxy-listener

  mavproxy-listener:
    image: ardupilot/mavproxy:latest
    networks: [e2e-net]

networks:
  e2e-net:
    driver: bridge
    internal: true   # NO external connectivity (enforces RESTRICT-SAT-1)

volumes:
  tile-cache-fixture: {}
  fdr-output: {}

Consumer Application

Tech stack: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), msp_gps_toy (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.11.0,<4.12 (frame source replay; see _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md — pin is held below 4.12 until gtsam ships numpy-2 wheels; D-CROSS-CVE-1 leftover remains open), numpy ≥1.26,<2.0 + scipy (geodesic-distance assertions in WGS84).

Entry point: pytest e2e/tests/ from inside e2e-runner. Each scenario is a parameterized pytest case keyed by FC adapter (ardupilot / inav) and VioStrategy (okvis2 / klt_ransac) via the session-scoped conftest fixtures.

Harness Implementation Layout

The blackbox harness implementation lives under e2e/ (NOT the SUT source tree — public-boundary discipline enforced by e2e/README.md):

e2e/
├── docker/                            Tier-1 entrypoint
│   ├── docker-compose.test.yml        Compose stack (services from § Services above)
│   ├── docker-compose.tier2-bridge.yml  Compose override for paired-host Tier-2 SITL bridging
│   ├── run-tier1.sh                   AZ-444 selector-parity wrapper
│   └── secrets/                       Mounted Docker secrets (mavlink-passkey)
├── jetson/                            Tier-2 entrypoint
│   ├── run-tier2.sh                   AZ-444 selector-parity wrapper (control-host side)
│   ├── tier2-on-jetson.sh             SSH-orchestrated on-Jetson half
│   ├── tier2.service                  systemd unit template
│   ├── jtop_parser.py                 jetson_stats / jtop telemetry parser (NFT-LIM-01)
│   └── tegrastats_parser.py           tegrastats parser (NFT-LIM-04)
├── runner/                            e2e-runner image
│   ├── Dockerfile, conftest.py, pytest.ini, requirements.txt
│   ├── helpers/                       Per-AC evaluator + observer modules (47 evaluators
│   │                                  covering accuracy, AP/iNav contract, blackout-spoof,
│   │                                  cache poisoning, cold-start, companion reboot,
│   │                                  CVE probe, e2e latency, egress observer, escalation
│   │                                  ladder, FDR reader, frame-source replay, IMU replay,
│   │                                  injector fixtures, MAVLink signing, MAVProxy tlog,
│   │                                  memory budget, mid-flight tile, mock suite-sat audit,
│   │                                  Monte Carlo envelope, MRE, multi-segment, outage
│   │                                  request, outlier tolerance, registration classifier,
│   │                                  retrieval, sharp-turn, sitl_observer, smoothing,
│   │                                  spoof promotion, storage budget, streaming, thermal
│   │                                  envelope, tile-cache inspector, TTFF — see
│   │                                  `e2e/runner/helpers/` for the authoritative list)
│   └── reporting/                     CSV reporter + evidence bundler (AZ-445/446)
│       ├── csv_reporter.py            Emits `report.csv` per § Reporting
│       ├── evidence_bundler.py        Collects per-run `.tlog`, FDR, telemetry CSVs
│       └── nfr_recorder.py            NFR per-stage latency + budget recorder
├── fixtures/                          Fixture builders + captured fixtures
│   ├── tile-cache-builder/            `tile-cache-fixture` builder
│   ├── age-injector/                  `synth-age-tile-set` builder (FT-N-05)
│   ├── injectors/                     Runtime injectors:
│   │   ├── outlier.py                 `outlier-injection-derkachi` (FT-N-01)
│   │   ├── blackout_spoof.py          `blackout-spoof-derkachi` (FT-N-04, NFT-RES-04)
│   │   ├── multi_segment.py           `multi-segment-derkachi` (FT-P-08)
│   │   ├── cold_boot.py               `cold-boot-fixture` (NFT-PERF-03)
│   │   └── fc_proxy.py                FC-inbound blackout/spoof proxy (FT-N-04 driver)
│   ├── sitl_replay/                   Captured offline FDR-replay fixtures
│   │   └── p01/                       FT-P-01 capture set (see test-data.md)
│   ├── sitl_replay_builder/           Captured-fixture builder framework (AZ-598-600)
│   │   ├── builder.py                 VideoSource × TlogSource × FdrProjection strategies
│   │   ├── build_p01_fixtures.py      FT-P-01 still-image builder
│   │   └── build_p02_fixtures.py      FT-P-02 Derkachi builder
│   ├── mock-suite-sat/                `mock-suite-sat-service` Docker image
│   ├── secrets/                       Test-only secrets (mavlink-test-passkey.txt)
│   └── security/                      Security fixtures (cve-2025-53644.jpg)
├── tests/                             Pytest target: positive/, negative/, performance/,
│                                      resilience/, security/, resource_limit/
└── _unit_tests/                       Out-of-container unit tests for harness internals
                                       (runs as part of project pytest, no Docker required)

Replay-Mode Skip Gating

Several FT-* and FT-N-* scenarios rely on a pre-captured FDR-replay fixture instead of a live SITL run. When the E2E_SITL_REPLAY_DIR environment variable is unset, those scenarios skip cleanly via a sitl_replay_ready pytest marker (per AZ-594/595/598/599). To activate them:

E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01 \
    pytest e2e/tests/positive/test_ft_p_01_still_image_accuracy.py

The captured-fixture builder framework (e2e/fixtures/sitl_replay_builder/) regenerates these fixtures from _docs/00_problem/input_data/ against a live compose stack; the captured artifacts are then committed under e2e/fixtures/sitl_replay/<scenario>/. See e2e/fixtures/sitl_replay_builder/README.md for the framework, supported scenarios, and per-scenario builder invocations.
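
A minimal sketch of the gating decision, assuming the only input is the `E2E_SITL_REPLAY_DIR` variable. The helper name `resolve_replay_dir` is hypothetical; the real `sitl_replay_ready` marker lives in the harness and may apply further checks.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_replay_dir(env: Mapping[str, str]) -> Optional[Path]:
    """Return the replay fixture directory, or None when replay mode is off.

    An unset or empty E2E_SITL_REPLAY_DIR means "skip cleanly", not "fail".
    """
    raw = env.get("E2E_SITL_REPLAY_DIR", "").strip()
    return Path(raw) if raw else None


if __name__ == "__main__":
    replay_dir = resolve_replay_dir(os.environ)
    if replay_dir is None:
        print("replay scenarios would SKIP (E2E_SITL_REPLAY_DIR unset)")
    else:
        print(f"replay scenarios would run against {replay_dir}")
```

In a pytest fixture the `None` case would typically translate into a `pytest.skip(...)` call with a reason string, so the skip shows up in the report rather than disappearing.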

Communication with system under test

Interface	Protocol	Endpoint / Topic	Authentication
Frame source	V4L2 / GStreamer file source	UNIX domain socket / shared `/test-data` mount	none (local)
FC telemetry inbound	MAVLink (AP) or MSP2 (iNav)	`udp:gps-denied-onboard:14550` (AP) or `tcp:gps-denied-onboard:5760` (iNav)	MAVLink 2.0 message signing on AP per D-C8-9 (passkey via Docker secret); iNav unsigned per accepted residual risk
Tile cache	Filesystem read	`/var/azaion/tile-cache` (read-only mount)	filesystem perms
FC external-pos outbound observation	Read SITL EKF source-set + GLOBAL_POSITION_INT replay back from SITL	`udp:ardupilot-plane-sitl:14550` or `tcp:inav-sitl:5760`	passive listener
GCS telemetry observation	MAVLink listener	`udp:mavproxy-listener:14551` (forwarded from SUT 14550)	none
FDR output	Filesystem read post-run	`/fdr` (read-only mount)	filesystem perms
Suite Sat Service mock	HTTP/JSON	`http://mock-suite-sat-service:8080`	none (test)

What the consumer does NOT have access to

No direct access to the SUT's internal state (GTSAM iSAM2 graph, FAISS index in-memory, OpenCV intermediate buffers, VioStrategy implementation pointer).
No internal Python/C++ module imports from the SUT.
No shared memory or filesystem with the SUT outside the four explicit mounts (tile-cache-fixture r/o, fdr-output r/o from runner side, input-data r/o, expected-results r/o).
No bypass of the FC-side acceptance check — every AC-4.3 assertion goes through SITL.

CI/CD Integration

2026-05-20: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.

When to run (active policy):

Jetson (colocated arm64 Woodpecker agent): on every PR to the dev branch, nightly on dev HEAD, and as a hard gate before any release tag.
AC-NEW-5 thermal envelope: quarterly and before any release tag, on the chamber-attached Jetson runner; failures block release tags only.

Pipeline stage: a single Jetson workflow (.woodpecker/01-test.yml) on the self-hosted-jetson-orin runner exercises the full suite — there is no longer a parallel x86 lane.

Gate behavior: the Jetson lane blocks both PR merges and release tags on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.

Timeout:

Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).

Reporting

Format: CSV (one row per test).

Columns: test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths

traces_to: comma-separated AC/RESTRICT IDs from the traceability matrix.
fc_adapter: ardupilot | inav | n/a.
vio_strategy: okvis2 | klt_ransac | vins_mono | n/a (research-build only for vins_mono).
tier: tier1-docker | tier2-jetson | tier2-chamber.
result: PASS | FAIL | SKIP | XFAIL (XFAIL only allowed for AC explicitly marked NOT COVERED in the traceability matrix and not yet promoted to a real test).
evidence_paths: comma-separated paths inside the run-output bundle (.tlog files, FDR archives, screenshots, profiler traces) supporting the verdict.

Output path: e2e-results/run-${RUN_ID}/report.csv plus a per-run bundle of evidence at e2e-results/run-${RUN_ID}/evidence/.
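
The report layout above can be sketched with the standard-library `csv` module. This is not the project's `csv_reporter.py`: `write_report` and the sample row are hypothetical, and every value in the sample row is a placeholder rather than real run data.

```python
import csv
import io

COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message",
    "evidence_paths",
]
ALLOWED_RESULTS = {"PASS", "FAIL", "SKIP", "XFAIL"}


def write_report(rows, stream):
    """Write one CSV row per test; list-valued fields are comma-joined."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        if row["result"] not in ALLOWED_RESULTS:
            raise ValueError(f"unknown result {row['result']!r}")
        out = dict(row)
        out["traces_to"] = ",".join(row["traces_to"])
        out["evidence_paths"] = ",".join(row["evidence_paths"])
        writer.writerow(out)


# Placeholder row: IDs, timestamp, and paths are illustrative only.
EXAMPLE_ROW = {
    "test_id": "EXAMPLE-01",
    "test_name": "example_scenario",
    "traces_to": ["AC-X", "AC-Y"],
    "fc_adapter": "ardupilot",
    "vio_strategy": "okvis2",
    "tier": "tier2-jetson",
    "started_at_utc": "2026-01-01T00:00:00Z",
    "execution_time_ms": 1234,
    "result": "PASS",
    "error_message": "",
    "evidence_paths": ["evidence/example.tlog"],
}

if __name__ == "__main__":
    buf = io.StringIO(newline="")
    write_report([EXAMPLE_ROW], buf)
    print(buf.getvalue())
```

Because `traces_to` and `evidence_paths` are themselves comma-separated, the `csv` writer quotes those cells; consumers should parse the file with a CSV reader rather than splitting lines on commas.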

Test Execution

Decision (2026-05-20, refined later that day) — Jetson is the binding e2e environment; unit tests may run locally. This refines the earlier "Jetson only for everything" wording. Rationale captured in _docs/LESSONS.md (2026-05-20 entries):

The original "Jetson-only across all tiers" decision came from repeated workstation-vs-Jetson environment divergences in the e2e / build path (Dockerfile build order, missing libgl1, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration). Those divergences are real and continue to justify Jetson as the binding e2e environment.
Forcing the unit-test suite over an SSH-orchestrated Jetson loop added 30–90 s per iteration without producing any signal the local interpreter doesn't already produce. The unit suite is fully synthetic — no camera, no SITL, no Jetson-specific runtime — so a local PASS is equivalent to a Jetson PASS for that tier.

Operational entry points:

Tier	Entry point	Where it runs
Unit (`tests/unit/`)	`pytest tests/unit/ -q` directly, or `scripts/run-tests.sh`	local workstation (Python 3.10+ venv)
Blackbox / e2e (`tests/e2e/`, `e2e/tests/`)	`scripts/run-tests-jetson.sh` (local dev) / `.woodpecker/01-test.yml` (CI)	colocated arm64 Jetson Woodpecker agent — see `_docs/04_deploy/ci_cd_pipeline.md`
Performance / resilience / security / resource-limit	same as e2e	Jetson only
AC-NEW-5 thermal chamber	quarterly + pre-release	`self-hosted-jetson-orin-chamber`

A green local unit-test run is necessary-but-not-sufficient for merge; the Jetson e2e lane is the binding signal.

The remainder of this section preserves the original 2026-05-09 decision context for traceability.

Decision (2026-05-09, SUPERSEDED): both — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.

Hardware dependencies found (Phase 3 → Hardware Assessment scan)

Category	Indicator	Source file
GPU / CUDA	TensorRT engines (`.engine`, SM 87, JetPack 6.2, TRT 10.3)	`_docs/01_solution/solution.md` PRE-FLIGHT block
GPU / CUDA	DISK+LightGlue FP16 inference	`_docs/01_solution/solution.md` RUNTIME block (C3)
GPU / CUDA pin	Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W)	`_docs/00_problem/restrictions.md` § Onboard Hardware
Sensors / Cameras	ADTi 20MP 20L V1 nadir camera over USB / MIPI-CSI / GigE	`_docs/00_problem/restrictions.md` § Cameras
Sensors / Cameras	V4L2 / GStreamer frame source (production)	`_docs/02_document/tests/environment.md` § Overview
OS-specific services	High-rate IMU via UART/MAVLink to FC	`_docs/00_problem/restrictions.md` § Sensors & Integration
OS-specific services	Per-FC inbound (MAVLink GPS_INPUT for AP, MSP2 over UART for iNav)	`_docs/00_problem/restrictions.md` § Sensors & Integration
OS-specific services	tegrastats / jetson_stats for thermal telemetry	`_docs/02_document/tests/resource-limit-tests.md` NFT-LIM-04
Thermal envelope	-20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle	`_docs/00_problem/restrictions.md` § Failsafe & Safety + AC-NEW-5

(Step 2 Code scan from the planning phase returned zero indicators because no source code existed yet. Post-implementation: pyproject.toml confirms tensorrt, pymavlink, gtsam==4.2.1, faiss-gpu, opencv-python>=4.11.0.86,<4.12 (cycle-1 relaxation per _docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md; the original >=4.12.0 target is restored once gtsam ships numpy-2 wheels), and jetson-stats. pycuda was NOT added: TensorRT EP is invoked via ONNX Runtime + the onnx_trt_ep_runtime factory, which uses TensorRT's Python bindings directly without pycuda.)

Execution instructions — Tier-1 (Docker)

Prerequisites:

Docker 24+ with Compose v2.
NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path; without a GPU the SUT falls back to a CPU inference path, since TensorRT itself requires an NVIDIA GPU).
≥16 GB host RAM, ≥80 GB free disk for tile-cache-fixture + fdr-output + image build cache.

How to start (preferred — selector-parity wrapper from AZ-444):

./e2e/docker/run-tier1.sh \
    --fc-adapter ardupilot \
    --vio-strategy okvis2 \
    [-k <pytest selector>] \
    [--build-kind production|asan] \
    [--enable-chamber]

run-tier1.sh and e2e/jetson/run-tier2.sh accept the same -k <selector> flag and emit the same pytest invocation modulo the TIER env var (AZ-444 AC-1).
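
The selector-parity property can be illustrated as follows. This is a sketch under assumptions: `build_invocation` is a hypothetical helper, and the `TIER` values are borrowed from the `tier` column in § Reporting rather than taken from the wrapper scripts.

```python
from typing import Dict, List, Optional


def build_invocation(tier: str, selector: Optional[str] = None,
                     tests_dir: str = "e2e/tests") -> Dict[str, object]:
    """Return the environment and argv a wrapper would hand to pytest.

    Only the TIER variable depends on the tier; the argv is identical.
    """
    argv: List[str] = ["pytest", tests_dir]
    if selector:
        argv += ["-k", selector]
    return {"env": {"TIER": tier}, "argv": argv}


if __name__ == "__main__":
    tier1 = build_invocation("tier1-docker", "ft_p_01")
    tier2 = build_invocation("tier2-jetson", "ft_p_01")
    print(tier1["argv"] == tier2["argv"])  # same pytest command line
    print(tier1["env"], tier2["env"])      # differs only in TIER
```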

Raw-compose equivalent (when bypassing the wrapper for debugging):

cd e2e/docker
export FC_ADAPTER=ardupilot VIO_STRATEGY=okvis2
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner

The run reports to ./e2e-results/run-${RUN_ID}/report.csv (see § Reporting). Exit code matches the test verdict.

Environment variables:

FC_ADAPTER ∈ {ardupilot, inav} — selects which SITL the SUT talks to.
VIO_STRATEGY ∈ {okvis2, klt_ransac} for the production binary; vins_mono is available only in the research build (BUILD_VINS_MONO=ON).
MAVLINK_SIGNING_PASSKEY_FILE — path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
E2E_SITL_REPLAY_DIR — when set, activates captured-fixture FDR-replay mode for scenarios that gate on sitl_replay_ready; unset → those scenarios skip cleanly (see § Replay-Mode Skip Gating above).
RUN_ID — per-invocation run identifier; defaults to local-${USER}-${EPOCH} in development, CI sets it from the workflow run id. Determines the e2e-results/run-${RUN_ID}/ output directory.
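
How `RUN_ID` maps to the output directory can be sketched as below. The helper names are hypothetical, and the `unknown` fallback for a missing `USER` is an assumption of this sketch, not documented behaviour.

```python
import os
import time
from pathlib import Path
from typing import Mapping, Optional


def resolve_run_id(env: Mapping[str, str], now: Optional[float] = None) -> str:
    """Use RUN_ID when set (CI), else build local-${USER}-${EPOCH}."""
    run_id = env.get("RUN_ID")
    if run_id:
        return run_id
    epoch = int(time.time() if now is None else now)
    user = env.get("USER", "unknown")  # fallback value is an assumption
    return f"local-{user}-{epoch}"


def report_path(run_id: str, root: str = "e2e-results") -> Path:
    """Location of report.csv for a given run."""
    return Path(root) / f"run-{run_id}" / "report.csv"


if __name__ == "__main__":
    rid = resolve_run_id(os.environ)
    print(report_path(rid))
```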

Skipped on Tier-1: NFT-PERF-01 (AC-4.1 latency p95 — Jetson-bound), NFT-LIM-01 (AC-4.2 memory — Jetson-bound), NFT-PERF-03 (AC-NEW-1 cold-start — Jetson-bound), NFT-LIM-04 (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).

Execution instructions — Tier-2 (Jetson hardware loop)

Prerequisites:

Jetson Orin Nano Super (per restrictions.md § Onboard Hardware).
JetPack 6.2 + CUDA + TensorRT 10.3 + cuDNN per D-C7-9.
Workstation thermal-day environment for NFT-LIM-04 baseline. Chamber-attached runner for AC-NEW-5 chamber portion (separate quarterly job; not run in standard CI).
ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all AC-2.x cross-validation is XFAIL for that run).

How to start (AZ-444 selector-parity wrapper):

./e2e/jetson/run-tier2.sh \
    --fc-adapter ardupilot \
    --vio-strategy okvis2 \
    [-k <pytest selector>] \
    [--build-kind production|asan] \
    [--duration 5min|8h] \
    [--enable-chamber] \
    [--reflash]

The Tier-2 SITL stack runs on a paired x86 host via:

docker compose \
    -f e2e/docker/docker-compose.test.yml \
    -f e2e/docker/docker-compose.tier2-bridge.yml up ...

When invoked on a control host (typical), the script SSH-orchestrates the Jetson half (tier2-on-jetson.sh). When TIER2_HOST=localhost and the script runs on the Jetson itself, it delegates directly without SSH. The run outputs the same CSV format as Tier-1 (one report.csv per run) plus tegrastats + jtop CSVs in the evidence bundle.

Environment variables: same as Tier-1 plus:

TIER2_HOST / TIER2_USER / TIER2_KEY_PATH — control-host → Jetson SSH wiring (required when TIER2_HOST != localhost).
TIER2_CHAMBER_AMBIENT_C — ambient temperature for AC-NEW-5 chamber runs.
TIER2_CAMERA_DEVICE — /dev/video0 (production) or file path for replay mode.
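
The "required when TIER2_HOST != localhost" rule can be checked up front, before any SSH attempt. A minimal sketch, assuming the three variables are the complete SSH wiring; `missing_tier2_vars` is a hypothetical helper, not part of `run-tier2.sh`.

```python
import os
from typing import List, Mapping

SSH_VARS = ("TIER2_HOST", "TIER2_USER", "TIER2_KEY_PATH")


def missing_tier2_vars(env: Mapping[str, str]) -> List[str]:
    """Return the SSH-wiring variables that are required but unset.

    When TIER2_HOST is localhost the script runs on the Jetson itself,
    so no SSH wiring is needed and nothing is reported as missing.
    """
    if env.get("TIER2_HOST") == "localhost":
        return []
    return [name for name in SSH_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_tier2_vars(os.environ)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("Tier-2 SSH wiring looks complete")
```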

gps-denied-onboard.service (or gps-denied-onboard-asan.service for --build-kind=asan) MUST be installed via systemd on the Jetson — e2e/jetson/tier2.service is the template. See _docs/03_implementation/jetson_harness_setup.md for the physical provisioning steps.

CI runner mapping

Active mapping (2026-05-20):

self-hosted-jetson-orin (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. This is the single canonical CI test runner.
self-hosted-jetson-orin-chamber → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.

Removed (2026-05-20):

~~ubuntu-24.04 (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in 02-build-push.yml, but the test lane is Jetson-only.

Matrix dimensions: FC_ADAPTER × VIO_STRATEGY × build_kind where build_kind ∈ {production, research}. Production vins_mono is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
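
The matrix rule can be enumerated directly. A sketch assuming the adapter and strategy values listed in this document; `ci_matrix` is a hypothetical helper and the actual matrix is defined in the Woodpecker pipeline files.

```python
from itertools import product
from typing import Dict, List

FC_ADAPTERS = ("ardupilot", "inav")
# Production excludes vins_mono (D-C1-1-SUB-A); research includes all three.
VIO_BY_BUILD = {
    "production": ("okvis2", "klt_ransac"),
    "research": ("okvis2", "klt_ransac", "vins_mono"),
}


def ci_matrix() -> List[Dict[str, str]]:
    """Expand FC_ADAPTER x VIO_STRATEGY x build_kind with the exclusion applied."""
    entries = []
    for build_kind, strategies in VIO_BY_BUILD.items():
        for fc_adapter, vio_strategy in product(FC_ADAPTERS, strategies):
            entries.append({
                "FC_ADAPTER": fc_adapter,
                "VIO_STRATEGY": vio_strategy,
                "build_kind": build_kind,
            })
    return entries


if __name__ == "__main__":
    for entry in ci_matrix():
        print(entry)
```

With two adapters, two production strategies, and three research strategies, this expands to 2 × 2 + 2 × 3 = 10 matrix entries.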
