# Test Environment

> **Active policy — 2026-05-20 (refined)**: the canonical CI / release-gate
> test environment is the Jetson Orin Nano Super (or a Jetson-equivalent
> arm64 agent). **Unit tests** (`pytest tests/unit/`) MAY be run on a local
> developer workstation for fast iteration — they are hardware-agnostic by
> construction, the suite is fully synthetic, and Jetson SSH round-trips add
> latency without adding signal. **Blackbox / e2e / performance / resilience
> / security / resource-limit tests** (`tests/e2e/`, `e2e/tests/`,
> `tests/perf/`, etc.) MUST run on the Jetson — never on a local workstation
> — because their pass criteria are tied to Jetson wall-clock latency,
> thermal envelope, and the real-camera + real-FC SITL loop. Workstation x86
> Docker (the historical "Tier-1" path) is **deprecated** as a supported
> e2e environment; the Tier-1 sections below are retained as historical
> reference / traceability only. CI e2e pipelines target the colocated
> arm64 Jetson Woodpecker agent (see `_docs/04_deploy/ci_cd_pipeline.md`);
> local-development e2e runs SHOULD use `scripts/run-tests-jetson.sh`
> against the configured `jetson-e2e` SSH alias rather than
> `scripts/run-tests.sh`. This refinement supersedes the 2026-05-20 "all
> tiers on Jetson" wording and the 2026-05-09 "both" decision recorded in
> the § Test Execution section.

## Where each tier runs (active policy)

| Tier | Local workstation | Jetson (canonical) | When a local-only result is acceptable |
|------|--------------------|--------------------|----------------------------------------|
| Unit (`tests/unit/`) | ✅ allowed and encouraged for dev iteration | ✅ also run as part of the Jetson CI lane | always |
| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | ❌ forbidden — placeholder fixtures + missing hardware = false-negative runs | ✅ required for any merge / release decision | never — if Jetson is unreachable, the e2e verdict is "not run" rather than a local result |
| Performance / resilience / security / resource-limit | ❌ forbidden | ✅ required | never |
| Thermal chamber (AC-NEW-5) | ❌ forbidden | ✅ chamber Jetson only | never |

Practical consequences:

- A PR may merge on green local unit tests + green Jetson e2e tests.
- A PR MUST NOT merge on green local unit tests alone — the Jetson e2e lane is the binding signal.
- When the Jetson agent is offline, the e2e verdict is "pending Jetson" — record the gap (e.g. via `_docs/_process_leftovers/`) rather than substituting a local run.
- Tests in `tests/e2e/` that gate on `RUN_REPLAY_E2E` or `@pytest.mark.tier2` will SKIP locally; this is correct behaviour, not a failure to investigate.

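The merge rule in the bullets above can be sketched as a small predicate. This is an illustration only: the `Verdict` enum and `merge_allowed` function are hypothetical names and are not part of the repository or the CI configuration.

```python
from enum import Enum


class Verdict(Enum):
    """Outcome of one test lane (hypothetical type for this sketch)."""

    PASS = "pass"
    FAIL = "fail"
    NOT_RUN = "not run"  # e.g. Jetson agent offline, recorded as "pending Jetson"


def merge_allowed(local_unit: Verdict, jetson_e2e: Verdict) -> bool:
    """Return True only when both lanes are green.

    A green local unit run never substitutes for a missing or failed
    Jetson e2e run, so NOT_RUN on the Jetson lane blocks the merge.
    """
    return local_unit is Verdict.PASS and jetson_e2e is Verdict.PASS


if __name__ == "__main__":
    cases = [
        (Verdict.PASS, Verdict.PASS),
        (Verdict.PASS, Verdict.NOT_RUN),
        (Verdict.PASS, Verdict.FAIL),
        (Verdict.FAIL, Verdict.PASS),
    ]
    for unit, e2e in cases:
        print(unit.name, e2e.name, "->", merge_allowed(unit, e2e))
```

Keeping `NOT_RUN` as its own value, rather than folding it into `FAIL`, mirrors the policy of recording the gap instead of substituting a local result.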
## Overview

**System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):

- **Inbound — nav-camera frames**: V4L2 / GStreamer source (production: USB / MIPI-CSI / GigE per `restrictions.md`; tests: file-backed source replaying `_docs/00_problem/input_data/AD0000NN.jpg` or `flight_derkachi/flight_derkachi.mp4`).
- **Inbound — FC telemetry**: MAVLink (ArduPilot) or MSP2 (iNav) inbound stream carrying `SCALED_IMU2`, `ATTITUDE`, `GLOBAL_POSITION_INT` (or MSP equivalents). Tests replay `flight_derkachi/data_imu.csv` through a thin replayer.
- **Inbound — satellite tile cache**: filesystem + on-disk index (FAISS HNSW + tile manifest). Tests load a fixture cache mounted as a Docker volume.
- **Outbound — FC external-positioning**: MAVLink `GPS_INPUT` (ArduPilot Plane) OR MSP2 `MSP2_SENSOR_GPS` (iNav). Tests observe these by spinning up the corresponding open-source SITL and reading what reaches the FC.
- **Outbound — GCS telemetry**: MAVLink to QGroundControl (1-2 Hz downsample of estimates + STATUSTEXT). Tests subscribe via a passive MAVLink listener.
- **Outbound — Flight Data Recorder**: NVM filesystem (per AC-NEW-3). Tests read the resulting FDR archive after the run.

**Consumer app purpose**: The e2e harness drives the SUT through these public boundaries — replaying frames + telemetry, mounting tile-cache fixtures, observing FC-side acceptance via SITL, and parsing FDR output. It NEVER imports SUT modules, NEVER queries SUT internal state, and NEVER touches the SUT's filesystem outside the FDR output directory.

## Two-tier execution profile

> **SUPERSEDED — 2026-05-20**: the two-tier model below is retained for
> historical traceability. Under the active policy the Jetson is the **only
> supported e2e environment**, while unit tests may still run locally (see
> banner at the top of this doc). Tier-1 (workstation Docker) is deprecated;
> only the Tier-2 row continues to describe a supported environment.

This project originally specified two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.

| Tier | Hardware | What it covers | What it skips |
|------|----------|----------------|---------------|
| **Tier-1 (workstation Docker)** *(deprecated 2026-05-20)* | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| **Jetson (canonical, 2026-05-20)** *(formerly "Tier-2")* | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | Everything: `FT-*` correctness, schema, `NFT-RES-*`, `NFT-SEC-*`, `NFT-LIM-*`, `NFT-PERF-*` (AC-4.1 latency p95), AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Nothing — anything that doesn't run here doesn't run at all |

CI runs the Jetson pipeline (`01-test.yml`) on the colocated arm64 Jetson agent. Chamber-only AC-NEW-5 runs on `self-hosted-jetson-orin-chamber` on the documented quarterly + pre-release cadence; results are recorded in the same CSV report format.

## Docker Environment (Tier-1)

### Services

| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| `gps-denied-onboard` | local build (`docker/Dockerfile`) | The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON` | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| `ardupilot-plane-sitl` | `ardupilot/ardupilot-sitl:plane-stable` | ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| `inav-sitl` | `inavflight/inav-sitl:9.0.0` | iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| `mock-suite-sat-service` | local build (`e2e/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`e2e/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios. See § Harness Implementation Layout below for the per-evaluator inventory. | — |
| `mavproxy-listener` | `ardupilot/mavproxy:latest` | Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions | 14551/udp |

### Networks

| Network | Services | Purpose |
|---------|----------|---------|
| `e2e-net` | all | Isolated test network. No host networking, no internet. Per RESTRICT-SAT-1, the SUT must NEVER reach an external satellite provider during a flight; a deny-all egress rule on `e2e-net` enforces this and is itself a security test (NFT-SEC-02). |

### Volumes

| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run by `e2e/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `fdr-output` | `gps-denied-onboard:/var/azaion/fdr` | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume) |
| `input-data` | `e2e-runner:/test-data:ro` | Bind mount of `_docs/00_problem/input_data/` for replay |
| `expected-results` | `e2e-runner:/expected:ro` | Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions |

### docker-compose structure

```yaml
services:
  gps-denied-onboard:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
    networks: [e2e-net]
    volumes:
      - tile-cache-fixture:/var/azaion/tile-cache:ro
      - fdr-output:/var/azaion/fdr
    environment:
      ONBOARD_FC_ADAPTER: ${FC_ADAPTER}        # ardupilot | inav, set per scenario
      ONBOARD_VIO_STRATEGY: ${VIO_STRATEGY}    # okvis2 | klt_ransac (production); vins_mono only in research build
      MAVLINK_SIGNING_PASSKEY_FILE: /run/secrets/mavlink_passkey
    depends_on:
      - mock-suite-sat-service

  ardupilot-plane-sitl:
    image: ardupilot/ardupilot-sitl:plane-stable
    networks: [e2e-net]
    command: ["--vehicle=ArduPlane", "--gps-type=14"]  # GPS_TYPE=14 = MAV per ArduPilot SITL_simulation_parameters.html

  inav-sitl:
    image: inavflight/inav-sitl:9.0.0
    networks: [e2e-net]
    # iNav SITL exposes MSP on TCP 5760 (UART1) per docs/SITL/SITL.md

  mock-suite-sat-service:
    build: ../fixtures/mock-suite-sat
    networks: [e2e-net]
    # Egress restriction enforced at network level, not service level

  e2e-runner:
    build: ../runner
    networks: [e2e-net]
    volumes:
      - input-data:/test-data:ro
      - expected-results:/expected:ro
      - fdr-output:/fdr:ro
    depends_on:
      - gps-denied-onboard
      - ardupilot-plane-sitl
      - inav-sitl
      - mavproxy-listener

  mavproxy-listener:
    image: ardupilot/mavproxy:latest
    networks: [e2e-net]

networks:
  e2e-net:
    driver: bridge
    internal: true   # NO external connectivity (enforces RESTRICT-SAT-1)

volumes:
  tile-cache-fixture: {}
  fdr-output: {}
```

## Consumer Application

**Tech stack**: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), `msp_gps_toy` (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.11.0,<4.12 (frame source replay; see `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — pin is held below 4.12 until gtsam ships numpy-2 wheels; D-CROSS-CVE-1 leftover remains open), numpy ≥1.26,<2.0 + scipy (geodesic-distance assertions in WGS84).

**Entry point**: `pytest e2e/tests/` from inside `e2e-runner`. Each scenario is a parameterized pytest case keyed by FC adapter (`ardupilot` / `inav`) and VioStrategy (`okvis2` / `klt_ransac`) via the session-scoped conftest fixtures.

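To make the "geodesic-distance assertions in WGS84" mentioned in the tech stack concrete, here is a minimal sketch of such an assertion. It is not the harness's evaluator: it uses the standard-library `math` module and the spherical haversine approximation in place of the numpy/scipy code the harness relies on, and the names `haversine_m`, `assert_within`, and the tolerance argument are hypothetical.

```python
import math

# Mean Earth radius in metres. Treating the Earth as a sphere is an
# approximation; a true WGS84 ellipsoidal distance differs slightly.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1_deg: float, lon1_deg: float,
                lat2_deg: float, lon2_deg: float) -> float:
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def assert_within(estimate, truth, tolerance_m: float) -> float:
    """Fail if the estimated (lat, lon) is farther than tolerance_m from truth."""
    error_m = haversine_m(estimate[0], estimate[1], truth[0], truth[1])
    assert error_m <= tolerance_m, (
        f"position error {error_m:.1f} m exceeds tolerance {tolerance_m:.1f} m"
    )
    return error_m


if __name__ == "__main__":
    # Arbitrary illustrative coordinates and tolerance, not project data.
    truth = (50.0, 36.0)
    estimate = (50.0001, 36.0001)
    print(f"error = {assert_within(estimate, truth, tolerance_m=50.0):.2f} m")
```

Returning the measured error from the assertion helper lets a caller record it as evidence alongside the pass/fail verdict.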
### Harness Implementation Layout

The blackbox harness implementation lives under `e2e/` (NOT the SUT source tree — public-boundary discipline enforced by `e2e/README.md`):

```
e2e/
├── docker/                              Tier-1 entrypoint
│   ├── docker-compose.test.yml          Compose stack (services from § Services above)
│   ├── docker-compose.tier2-bridge.yml  Compose override for paired-host Tier-2 SITL bridging
│   ├── run-tier1.sh                     AZ-444 selector-parity wrapper
│   └── secrets/                         Mounted Docker secrets (mavlink-passkey)
├── jetson/                              Tier-2 entrypoint
│   ├── run-tier2.sh                     AZ-444 selector-parity wrapper (control-host side)
│   ├── tier2-on-jetson.sh               SSH-orchestrated on-Jetson half
│   ├── tier2.service                    systemd unit template
│   ├── jtop_parser.py                   jetson_stats / jtop telemetry parser (NFT-LIM-01)
│   └── tegrastats_parser.py             tegrastats parser (NFT-LIM-04)
├── runner/                              e2e-runner image
│   ├── Dockerfile, conftest.py, pytest.ini, requirements.txt
│   ├── helpers/                         Per-AC evaluator + observer modules (47 evaluators
│   │                                    covering accuracy, AP/iNav contract, blackout-spoof,
│   │                                    cache poisoning, cold-start, companion reboot,
│   │                                    CVE probe, e2e latency, egress observer, escalation
│   │                                    ladder, FDR reader, frame-source replay, IMU replay,
│   │                                    injector fixtures, MAVLink signing, MAVProxy tlog,
│   │                                    memory budget, mid-flight tile, mock suite-sat audit,
│   │                                    Monte Carlo envelope, MRE, multi-segment, outage
│   │                                    request, outlier tolerance, registration classifier,
│   │                                    retrieval, sharp-turn, sitl_observer, smoothing,
│   │                                    spoof promotion, storage budget, streaming, thermal
│   │                                    envelope, tile-cache inspector, TTFF — see
│   │                                    `e2e/runner/helpers/` for the authoritative list)
│   └── reporting/                       CSV reporter + evidence bundler (AZ-445/446)
│       ├── csv_reporter.py              Emits `report.csv` per § Reporting
│       ├── evidence_bundler.py          Collects per-run `.tlog`, FDR, telemetry CSVs
│       └── nfr_recorder.py              NFR per-stage latency + budget recorder
├── fixtures/                            Fixture builders + captured fixtures
│   ├── tile-cache-builder/              `tile-cache-fixture` builder
│   ├── age-injector/                    `synth-age-tile-set` builder (FT-N-05)
│   ├── injectors/                       Runtime injectors:
│   │   ├── outlier.py                   `outlier-injection-derkachi` (FT-N-01)
│   │   ├── blackout_spoof.py            `blackout-spoof-derkachi` (FT-N-04, NFT-RES-04)
│   │   ├── multi_segment.py             `multi-segment-derkachi` (FT-P-08)
│   │   ├── cold_boot.py                 `cold-boot-fixture` (NFT-PERF-03)
│   │   └── fc_proxy.py                  FC-inbound blackout/spoof proxy (FT-N-04 driver)
│   ├── sitl_replay/                     Captured offline FDR-replay fixtures
│   │   └── p01/                         FT-P-01 capture set (see test-data.md)
│   ├── sitl_replay_builder/             Captured-fixture builder framework (AZ-598-600)
│   │   ├── builder.py                   VideoSource × TlogSource × FdrProjection strategies
│   │   ├── build_p01_fixtures.py        FT-P-01 still-image builder
│   │   └── build_p02_fixtures.py        FT-P-02 Derkachi builder
│   ├── mock-suite-sat/                  `mock-suite-sat-service` Docker image
│   ├── secrets/                         Test-only secrets (mavlink-test-passkey.txt)
│   └── security/                        Security fixtures (cve-2025-53644.jpg)
├── tests/                               Pytest target: positive/, negative/, performance/,
│                                        resilience/, security/, resource_limit/
└── _unit_tests/                         Out-of-container unit tests for harness internals
                                         (runs as part of project pytest, no Docker required)
```

### Replay-Mode Skip Gating

Several FT-* and FT-N-* scenarios rely on a pre-captured FDR-replay fixture instead of a live SITL run. When the `E2E_SITL_REPLAY_DIR` environment variable is unset, those scenarios skip cleanly via a `sitl_replay_ready` pytest marker (per AZ-594/595/598/599). To activate them:

```bash
E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01 \
  pytest e2e/tests/positive/test_ft_p_01_still_image_accuracy.py
```

The captured-fixture builder framework (`e2e/fixtures/sitl_replay_builder/`) regenerates these fixtures from `_docs/00_problem/input_data/` against a live compose stack; the captured artifacts are then committed under `e2e/fixtures/sitl_replay/<scenario>/`. See `e2e/fixtures/sitl_replay_builder/README.md` for the framework, supported scenarios, and per-scenario builder invocations.

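The skip gating described above could be wired up roughly as follows. This is a sketch under assumptions, not the project's `conftest.py`: `replay_skip_reason` is a hypothetical helper, and the extra "is it a directory" check goes beyond the documented unset-means-skip behaviour. The pytest calls used (`get_closest_marker`, `add_marker`, `pytest.mark.skip`) are standard pytest API.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

REPLAY_ENV_VAR = "E2E_SITL_REPLAY_DIR"


def replay_skip_reason(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a skip reason, or None when replay scenarios may run."""
    raw = environ.get(REPLAY_ENV_VAR, "").strip()
    if not raw:
        return f"{REPLAY_ENV_VAR} is unset; captured-fixture replay is skipped"
    # Assumption for this sketch: a path that is not a directory also skips.
    if not Path(raw).is_dir():
        return f"{REPLAY_ENV_VAR}={raw!r} is not a directory"
    return None


def pytest_collection_modifyitems(config, items):
    """conftest.py hook: skip every test marked `sitl_replay_ready`."""
    import pytest  # imported lazily so the helper above works without pytest

    reason = replay_skip_reason()
    if reason is None:
        return
    for item in items:
        if item.get_closest_marker("sitl_replay_ready"):
            item.add_marker(pytest.mark.skip(reason=reason))
```

Separating the environment check from the pytest hook keeps the decision unit-testable without a pytest session.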
### Communication with system under test

| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| Frame source | V4L2 / GStreamer file source | UNIX domain socket / shared `/test-data` mount | none (local) |
| FC telemetry inbound | MAVLink (AP) or MSP2 (iNav) | `udp:gps-denied-onboard:14550` (AP) or `tcp:gps-denied-onboard:5760` (iNav) | MAVLink 2.0 message signing on AP per D-C8-9 (passkey via Docker secret); iNav unsigned per accepted residual risk |
| Tile cache | Filesystem read | `/var/azaion/tile-cache` (read-only mount) | filesystem perms |
| FC external-pos outbound observation | Read SITL EKF source-set + GLOBAL_POSITION_INT replay back from SITL | `udp:ardupilot-plane-sitl:14550` or `tcp:inav-sitl:5760` | passive listener |
| GCS telemetry observation | MAVLink listener | `udp:mavproxy-listener:14551` (forwarded from SUT 14550) | none |
| FDR output | Filesystem read post-run | `/fdr` (read-only mount) | filesystem perms |
| Suite Sat Service mock | HTTP/JSON | `http://mock-suite-sat-service:8080` | none (test) |

### What the consumer does NOT have access to

- No direct access to the SUT's internal state (GTSAM iSAM2 graph, FAISS index in-memory, OpenCV intermediate buffers, VioStrategy implementation pointer).
- No internal Python/C++ module imports from the SUT.
- No shared memory or filesystem with the SUT outside the four explicit mounts (`tile-cache-fixture` r/o, `fdr-output` r/o from runner side, `input-data` r/o, `expected-results` r/o).
- No bypass of the FC-side acceptance check — every AC-4.3 assertion goes through SITL.

## CI/CD Integration

> **2026-05-20**: rewritten for the Jetson-only policy. Tier-1 references in the historical sub-sections below are no longer operative.

**When to run** (active policy):

- Jetson (colocated arm64 Woodpecker agent): on every PR to `dev` branch, nightly on `dev` HEAD, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: quarterly and before any release tag, on the chamber-attached Jetson runner; failures block release tags only.

**Pipeline stage**: a single Jetson workflow (`.woodpecker/01-test.yml`) on the `self-hosted-jetson-orin` runner exercises the full suite — there is no longer a parallel x86 lane.

**Gate behavior**: any test failure on the Jetson lane blocks both PR merges and release tags. Chamber tests are warning-only on PRs and blocking on release tags.

**Timeout**:

- Jetson: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).

## Reporting

**Format**: CSV (one row per test).

**Columns**: `test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths`

- `traces_to`: comma-separated AC/RESTRICT IDs from the traceability matrix.
- `fc_adapter`: `ardupilot` | `inav` | `n/a`.
- `vio_strategy`: `okvis2` | `klt_ransac` | `vins_mono` | `n/a` (research-build only for `vins_mono`).
- `tier`: `tier1-docker` | `tier2-jetson` | `tier2-chamber`.
- `result`: `PASS` | `FAIL` | `SKIP` | `XFAIL` (XFAIL only allowed for AC explicitly marked NOT COVERED in the traceability matrix and not yet promoted to a real test).
- `evidence_paths`: comma-separated paths inside the run-output bundle (`.tlog` files, FDR archives, screenshots, profiler traces) supporting the verdict.

**Output path**: `e2e-results/run-${RUN_ID}/report.csv` plus a per-run bundle of evidence at `e2e-results/run-${RUN_ID}/evidence/`.

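As an illustration of the report format above, the following sketch writes one row with the documented columns using Python's standard `csv` module. It is not the project's `csv_reporter.py`; `write_report` and `EXAMPLE_ROW` are hypothetical, and every value in the example row is a placeholder rather than a real result or a real traceability entry.

```python
import csv
import io

# Column order as listed in the Reporting section.
COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy",
    "tier", "started_at_utc", "execution_time_ms", "result",
    "error_message", "evidence_paths",
]
ALLOWED_RESULTS = {"PASS", "FAIL", "SKIP", "XFAIL"}

# Placeholder values for illustration only.
EXAMPLE_ROW = {
    "test_id": "FT-P-01",
    "test_name": "still_image_accuracy",
    "traces_to": "AC-EXAMPLE-1,AC-EXAMPLE-2",
    "fc_adapter": "ardupilot",
    "vio_strategy": "okvis2",
    "tier": "tier2-jetson",
    "started_at_utc": "2026-05-20T00:00:00Z",
    "execution_time_ms": 1234,
    "result": "PASS",
    "error_message": "",
    "evidence_paths": "evidence/example.tlog",
}


def write_report(rows, stream) -> None:
    """Write a header plus one CSV line per test row to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        if row["result"] not in ALLOWED_RESULTS:
            raise ValueError(f"unknown result value: {row['result']!r}")
        writer.writerow(row)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_report([EXAMPLE_ROW], buffer)
    print(buffer.getvalue())
```

Because `traces_to` and `evidence_paths` are themselves comma-separated, letting `csv.DictWriter` handle quoting avoids corrupting the column layout.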
## Test Execution

**Decision (2026-05-20, refined later that day)** — **Jetson is the binding e2e environment; unit tests may run locally.** This refines the earlier "Jetson only for everything" wording. Rationale captured in `_docs/LESSONS.md` (2026-05-20 entries):

- The original "Jetson-only across all tiers" decision came from repeated workstation-vs-Jetson environment divergences in the e2e / build path (Dockerfile build order, missing `libgl1`, gtsam wheel availability, venv symlink resolution, lazy-import side-effect registration). Those divergences are real and continue to justify Jetson as the binding e2e environment.
- Forcing the unit-test suite over an SSH-orchestrated Jetson loop added 30–90 s per iteration without producing any signal the local interpreter doesn't already produce. The unit suite is fully synthetic — no camera, no SITL, no Jetson-specific runtime — so a local PASS is equivalent to a Jetson PASS for that tier.

**Operational entry points**:

| Tier | Entry point | Where it runs |
|------|-------------|---------------|
| Unit (`tests/unit/`) | `pytest tests/unit/ -q` directly, or `scripts/run-tests.sh` | local workstation (Python 3.10+ venv) |
| Blackbox / e2e (`tests/e2e/`, `e2e/tests/`) | `scripts/run-tests-jetson.sh` (local dev) / `.woodpecker/01-test.yml` (CI) | colocated arm64 Jetson Woodpecker agent — see `_docs/04_deploy/ci_cd_pipeline.md` |
| Performance / resilience / security / resource-limit | same as e2e | Jetson only |
| AC-NEW-5 thermal chamber | quarterly + pre-release | `self-hosted-jetson-orin-chamber` |

A green local unit-test run is necessary-but-not-sufficient for merge; the Jetson e2e lane is the binding signal.

The remainder of this section preserves the original 2026-05-09 decision context for traceability.

---

**Decision (2026-05-09, SUPERSEDED)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.

### Hardware dependencies found (Phase 3 → Hardware Assessment scan)

| Category | Indicator | Source file |
|---|---|---|
| GPU / CUDA | TensorRT engines (`.engine`, SM 87, JetPack 6.2, TRT 10.3) | `_docs/01_solution/solution.md` PRE-FLIGHT block |
| GPU / CUDA | DISK+LightGlue FP16 inference | `_docs/01_solution/solution.md` RUNTIME block (C3) |
| GPU / CUDA pin | Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W) | `_docs/00_problem/restrictions.md` § Onboard Hardware |
| Sensors / Cameras | ADTi 20MP 20L V1 nadir camera over USB / MIPI-CSI / GigE | `_docs/00_problem/restrictions.md` § Cameras |
| Sensors / Cameras | V4L2 / GStreamer frame source (production) | `_docs/02_document/tests/environment.md` § Overview |
| OS-specific services | High-rate IMU via UART/MAVLink to FC | `_docs/00_problem/restrictions.md` § Sensors & Integration |
| OS-specific services | Per-FC inbound (MAVLink GPS_INPUT for AP, MSP2 over UART for iNav) | `_docs/00_problem/restrictions.md` § Sensors & Integration |
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | `_docs/02_document/tests/resource-limit-tests.md` NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | `_docs/00_problem/restrictions.md` § Failsafe & Safety + AC-NEW-5 |

(Step 2 Code scan from the planning phase returned zero indicators because no source code existed yet. Post-implementation: `pyproject.toml` confirms `tensorrt`, `pymavlink`, `gtsam==4.2.1`, `faiss-gpu`, `opencv-python>=4.11.0.86,<4.12` (cycle-1 relaxation per `_docs/_process_leftovers/2026-05-11_d_cross_cve_1_opencv_pin_deferred.md` — the original `>=4.12.0` target replays once gtsam ships numpy-2 wheels), and `jetson-stats`. `pycuda` was NOT added — TensorRT EP is invoked via ONNX Runtime + the `onnx_trt_ep_runtime` factory, which uses TensorRT's Python bindings directly without `pycuda`.)

### Execution instructions — Tier-1 (Docker)

**Prerequisites**:

- Docker 24+ with Compose v2.
- NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path; otherwise the SUT falls back to CPU inference, since TensorRT itself requires an NVIDIA GPU).
- ≥16 GB host RAM, ≥80 GB free disk for `tile-cache-fixture` + `fdr-output` + image build cache.

**How to start** (preferred — selector-parity wrapper from AZ-444):

```bash
./e2e/docker/run-tier1.sh \
  --fc-adapter ardupilot \
  --vio-strategy okvis2 \
  [-k <pytest selector>] \
  [--build-kind production|asan] \
  [--enable-chamber]
```

`run-tier1.sh` and `e2e/jetson/run-tier2.sh` accept the same `-k <selector>` flag and emit the same pytest invocation modulo the `TIER` env var (AZ-444 AC-1).

Raw-compose equivalent (when bypassing the wrapper for debugging):

```bash
cd e2e/docker
export FC_ADAPTER=ardupilot VIO_STRATEGY=okvis2
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner
```

The run writes its report to `./e2e-results/run-${RUN_ID}/report.csv` (see § Reporting). Exit code matches the test verdict.

**Environment variables**:

- `FC_ADAPTER` ∈ `{ardupilot, inav}` — selects which SITL the SUT talks to.
- `VIO_STRATEGY` ∈ `{okvis2, klt_ransac}` for production binary; `vins_mono` only when the research binary `BUILD_VINS_MONO=ON` is the build.
- `MAVLINK_SIGNING_PASSKEY_FILE` — path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
- `E2E_SITL_REPLAY_DIR` — when set, activates captured-fixture FDR-replay mode for scenarios that gate on `sitl_replay_ready`; unset → those scenarios skip cleanly (see § Replay-Mode Skip Gating above).
- `RUN_ID` — per-invocation run identifier; defaults to `local-${USER}-${EPOCH}` in development, CI sets it from the workflow run id. Determines the `e2e-results/run-${RUN_ID}/` output directory.

**Skipped on Tier-1**: `NFT-PERF-01` (AC-4.1 latency p95 — Jetson-bound), `NFT-LIM-01` (AC-4.2 memory — Jetson-bound), `NFT-PERF-03` (AC-NEW-1 cold-start — Jetson-bound), `NFT-LIM-04` (AC-NEW-5 chamber baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).

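The `RUN_ID` default and the output directory it determines, as described in the environment-variable list above, can be sketched like this. The function names are hypothetical, and reading `${EPOCH}` as Unix epoch seconds is an assumption of this sketch rather than something the wrapper scripts are confirmed to do.

```python
import os
import time
from pathlib import Path
from typing import Callable, Mapping


def resolve_run_id(environ: Mapping[str, str] = os.environ,
                   now: Callable[[], float] = time.time) -> str:
    """Use RUN_ID when set (e.g. by CI), else build the development default."""
    run_id = environ.get("RUN_ID")
    if run_id:
        return run_id
    user = environ.get("USER", "unknown")
    # Assumption: ${EPOCH} means whole seconds since the Unix epoch.
    return f"local-{user}-{int(now())}"


def report_path(run_id: str, root: Path = Path("e2e-results")) -> Path:
    """Location of report.csv for a given run identifier."""
    return root / f"run-{run_id}" / "report.csv"


if __name__ == "__main__":
    run_id = resolve_run_id()
    print(run_id)
    print(report_path(run_id))
```

Passing the environment and clock as parameters keeps the default reproducible in tests.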
### Execution instructions — Tier-2 (Jetson hardware loop)

**Prerequisites**:

- Jetson Orin Nano Super (per `restrictions.md` § Onboard Hardware).
- JetPack 6.2 + CUDA + TensorRT 10.3 + cuDNN per D-C7-9.
- Workstation thermal-day environment for NFT-LIM-04 baseline. Chamber-attached runner for AC-NEW-5 chamber portion (separate quarterly job; not run in standard CI).
- ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
- Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all `AC-2.x` cross-validation is `XFAIL` for that run).

**How to start** (AZ-444 selector-parity wrapper):

```bash
./e2e/jetson/run-tier2.sh \
  --fc-adapter ardupilot \
  --vio-strategy okvis2 \
  [-k <pytest selector>] \
  [--build-kind production|asan] \
  [--duration 5min|8h] \
  [--enable-chamber] \
  [--reflash]
```

When the Tier-2 SITL stack runs on a paired x86 host, start it via:

```bash
docker compose \
  -f e2e/docker/docker-compose.test.yml \
  -f e2e/docker/docker-compose.tier2-bridge.yml up ...
```

When invoked on a control host (typical), the script SSH-orchestrates the Jetson half (`tier2-on-jetson.sh`). When `TIER2_HOST=localhost` and the script runs on the Jetson itself, it delegates directly without SSH. The run outputs the same CSV format as Tier-1 (one `report.csv` per run) plus tegrastats + jtop CSVs in the evidence bundle.

**Environment variables**: same as Tier-1 plus:

- `TIER2_HOST` / `TIER2_USER` / `TIER2_KEY_PATH` — control-host → Jetson SSH wiring (required when `TIER2_HOST != localhost`).
- `TIER2_CHAMBER_AMBIENT_C` — ambient temperature for AC-NEW-5 chamber runs.
- `TIER2_CAMERA_DEVICE` — `/dev/video0` (production) or file path for replay mode.

`gps-denied-onboard.service` (or `gps-denied-onboard-asan.service` for `--build-kind=asan`) MUST be installed via systemd on the Jetson — `e2e/jetson/tier2.service` is the template. See `_docs/03_implementation/jetson_harness_setup.md` for the physical provisioning steps.

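The local-versus-SSH delegation described above can be sketched as a function that builds the command to run. This is an illustration of the documented rule, not the logic of `run-tier2.sh`: `tier2_command` is a hypothetical name, the remote script path is assumed to match the repository-relative path, and treating an unset `TIER2_HOST` as an error is an assumption. The `ssh -i <key> user@host <command>` form is standard OpenSSH usage.

```python
from typing import List, Mapping

# Assumed location of the on-Jetson half, relative to the checkout.
ON_JETSON_SCRIPT = "e2e/jetson/tier2-on-jetson.sh"


def tier2_command(environ: Mapping[str, str],
                  script: str = ON_JETSON_SCRIPT) -> List[str]:
    """Return the argv that starts the on-Jetson half of a Tier-2 run."""
    host = environ.get("TIER2_HOST", "")
    if not host:
        # Assumption of this sketch: an unset host is a configuration error.
        raise ValueError("TIER2_HOST must be set")
    if host == "localhost":
        # Running on the Jetson itself: delegate directly, no SSH.
        return [script]
    missing = [name for name in ("TIER2_USER", "TIER2_KEY_PATH")
               if not environ.get(name)]
    if missing:
        raise ValueError(f"required for remote runs: {', '.join(missing)}")
    target = f"{environ['TIER2_USER']}@{host}"
    return ["ssh", "-i", environ["TIER2_KEY_PATH"], target, script]


if __name__ == "__main__":
    print(tier2_command({"TIER2_HOST": "localhost"}))
    print(tier2_command({
        "TIER2_HOST": "jetson-e2e",
        "TIER2_USER": "ci",               # placeholder user
        "TIER2_KEY_PATH": "/path/to/key",  # placeholder key path
    }))
```

Returning an argument list rather than a shell string avoids quoting problems if the result is later passed to `subprocess.run`.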
### CI runner mapping

**Active mapping (2026-05-20)**:

- `self-hosted-jetson-orin` (colocated arm64 Woodpecker agent) → all test runs, every PR + nightly + pre-release. ~4 hr per matrix entry. **This is the single canonical CI test runner.**
- `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.

**Removed (2026-05-20)**:

- ~~`ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.~~ — Tier-1 workstation Docker is deprecated; no x86 CI agent participates in the test path. CI build-push lanes that ship images may still run on amd64 if/when that matrix dimension is uncommented in `02-build-push.yml`, but the test lane is Jetson-only.

**Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.

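The matrix rule above can be enumerated with a short sketch. The value sets are taken from this document; the `ci_matrix` function is a hypothetical illustration and does not reflect how the Woodpecker workflow actually declares its matrix.

```python
from itertools import product

FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
BUILD_KINDS = ("production", "research")


def ci_matrix():
    """All FC_ADAPTER x VIO_STRATEGY x build_kind entries except production vins_mono."""
    return [
        (fc, vio, build)
        for fc, vio, build in product(FC_ADAPTERS, VIO_STRATEGIES, BUILD_KINDS)
        if not (build == "production" and vio == "vins_mono")
    ]


if __name__ == "__main__":
    entries = ci_matrix()
    for fc, vio, build in entries:
        print(f"{build:<10} {fc:<9} {vio}")
    print(f"{len(entries)} matrix entries")
```

Under these value sets the full product has 12 combinations, and the exclusion removes the two production `vins_mono` entries (one per FC adapter), leaving 10.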