[AZ-406] Blackbox test harness bootstrap (Tier-1 + Tier-2 scaffold)

Bootstraps the public-boundary blackbox test harness owned by epic AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root, fully separated from src/gps_denied_onboard/** and from the in-process tests/** tree, and commits to the contracts every subsequent test ticket (AZ-407..AZ-446) builds against.

Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL + mock Suite Sat Service + mavproxy listener + e2e-runner onto one e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 / NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host while the SUT runs natively on the Jetson under systemd.

Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats / jtop parsers feed per-sample telemetry into the evidence bundle.

Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs (pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx, orjson, pydantic, structlog, pytest 8.x). The runner deliberately does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only, chamber_only, vins_mono, deferred_ac) tied to environment.md parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per test with the exact 11-column schema from environment.md § Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths). XFAIL surfaced only when a test carries @pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture that copies per-test artifacts (.tlog, FDR archives, screenshots, tegrastats / jtop CSVs) into the run bundle and records relative paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces (concrete implementations owned by AZ-407 / AZ-408 / AZ-416 / AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships today (no downstream task dep): WGS84 distance / forward-bearing / offset via pyproj with NaN rejection.

Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up), GET /tiles/audit + /mock/audit (per-run read-back), POST /mock/config (force-status, response delay), POST /mock/reset (clears audit between tests), GET /mock/health.

Fixture scaffolds (e2e/fixtures/{tile-cache-builder,age-injector,injectors,cold-boot,secrets,security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot fixture), AZ-439 (CVE-2025-53644 JPEG generator).

Test tree (e2e/tests/{positive,negative,performance,resilience,security,resource_limit}/):
- Mirror of the test-spec category grouping in _docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke run inside the e2e-runner image once Docker brings everything up.

Out-of-container unit tests (e2e/_unit_tests/):
- Exercises the harness internals (CSV reporter plugin lifecycle, conftest skip rules, helper modules, parsers, mock app, compose YAML structural contract, public-boundary enforcement) without Docker / SITL. 97 unit tests, all passing.

Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the mock-app TestClient unit test.

AC coverage:
- AC-1 (Tier-1 boot) → compose YAML test + directory layout + smoke test (Docker-bound)
- AC-2 (mock services) → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check deferred to AZ-416 / AZ-417
- AC-4 (CSV columns) → in-process plugin lifecycle test emits the exact 11-column schema
- AC-5 (egress isolation) → static config test + runtime probe in Docker-bound smoke
- AC-6 (Tier-2 contract) → tegrastats + jtop parser unit tests + jetson/* layout test; full Tier-2 contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix) → vins_mono skip-rule cases + tests/positive/test_smoke
- AC-9 (skip semantics) → 9 conftest skip-rule unit tests

Module layout entry for blackbox_tests was added in 2026-05-16 preparatory commit d7a17a8 so this diff stays focused on the harness scaffold. AZ-406 advances to In Testing on commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 10:51:13 +00:00 · 2026-05-16 16:22:44 +03:00
parent d7a17a8248
commit 59d9116d36
72 changed files with 3515 additions and 6 deletions
@@ -0,0 +1,291 @@
+# Test Infrastructure
+
+**Task**: AZ-406_test_infrastructure
+**Name**: Blackbox Test Infrastructure Bootstrap (Tier-1 Docker + Tier-2 Jetson harness scaffold)
+**Description**: Scaffold the e2e blackbox test project — `e2e/` directory, pytest runner, docker-compose.test.yml, mock services, fixture builders, secrets handling, CSV reporter. This is the foundation every blackbox test depends on.
+**Complexity**: 5 points
+**Dependencies**: None
+**Component**: Blackbox Tests (epic AZ-262 / E-BBT)
+**Tracker**: AZ-406
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+The product (`gps-denied-onboard`) needs a behavioral verification layer that drives the SUT exclusively through its declared public boundaries (frame source, FC inbound, tile cache mount, FC outbound via SITL, GCS via mavproxy, FDR filesystem). Without a unified test harness — Docker compose for Tier-1, Jetson runner harness for Tier-2, fixture builders, mock Suite Sat Service, MAVLink listener, CSV reporter — every individual blackbox / performance / resilience / security / resource-limit task would re-invent its own scaffolding. This task delivers that shared scaffold once.
+
+## Outcome
+
+- A single `cd e2e/docker && docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner` command brings up the full Tier-1 environment and runs the full pytest suite (when test tasks land).
+- A single `./e2e/jetson/run-tier2.sh --fc-adapter <ardupilot|inav> --vio-strategy <okvis2|klt_ransac>` runs the Tier-2 hardware-loop with the same CSV reporter contract.
+- The matrix dimensions `FC_ADAPTER × VIO_STRATEGY × build_kind` (per `environment.md` § CI runner mapping) are first-class CI parameters; CI YAML scaffold provided.
+- Every external dependency named in `environment.md` (`ardupilot-plane-sitl`, `inav-sitl`, `mock-suite-sat-service`, `mavproxy-listener`) is provisioned by the compose file and reachable inside `e2e-net`.
+- Egress isolation (`internal: true` on `e2e-net`) is enforced by default, satisfying RESTRICT-SAT-1 / NFT-SEC-02 at the network layer.
+- A single `pytest`-based runner discovers tests under `e2e/tests/`, parameterizes by `FC_ADAPTER` + `VIO_STRATEGY`, and emits the CSV report at `e2e-results/run-${RUN_ID}/report.csv` with the columns specified in `environment.md` § Reporting.
+- Fixture builders for `tile-cache-fixture`, `synth-age-tile-set`, `outlier-injection-derkachi`, `blackout-spoof-derkachi`, `multi-segment-derkachi`, `cold-boot-fixture`, `cve-jpeg-fixture`, `mavlink-passkey` exist as separate Dockerized helpers under `e2e/fixtures/`.
+
+## Test Project Folder Layout
+
+```
+e2e/
+├── docker/
+│   ├── docker-compose.test.yml         # Tier-1 entrypoint
+│   ├── docker-compose.tier2-bridge.yml # optional override for Jetson-attached SITLs
+│   └── secrets/
+│       └── mavlink_passkey             # Docker-secret mount target (test passkey)
+├── jetson/
+│   ├── run-tier2.sh                    # Tier-2 entrypoint
+│   ├── tier2.service                   # systemd unit template for SUT lifecycle
+│   ├── tegrastats_parser.py            # parse tegrastats → per-sample CSV rows
+│   └── jtop_parser.py                  # parse jetson-stats jtop API → per-sample CSV
+├── runner/
+│   ├── Dockerfile                      # e2e-runner image (Python 3.12 + pytest 8.x)
+│   ├── requirements.txt                # pytest, pymavlink, msp_gps_toy bridge, opencv-python>=4.12.0, numpy, scipy, geopy
+│   ├── conftest.py                     # session/module/function fixtures, FC_ADAPTER/VIO_STRATEGY parameterization
+│   ├── reporting/
+│   │   ├── csv_reporter.py             # pytest plugin emitting environment.md § Reporting columns
+│   │   └── evidence_bundler.py         # collects .tlog, FDR archives, profiler traces, screenshots
+│   └── helpers/
+│       ├── frame_source_replay.py      # replay images / video to V4L2 file source
+│       ├── imu_replay.py               # replay data_imu.csv at 10 Hz to FC inbound
+│       ├── sitl_observer.py            # AP/iNav state-read helpers (param read, GPS_RAW_INT, MSP queries)
+│       ├── mavproxy_tlog_reader.py     # parse .tlog from mavproxy-listener
+│       ├── fdr_reader.py               # post-run filesystem read of FDR archive
+│       └── geo.py                      # Vincenty / WGS84 geodesic helpers
+├── fixtures/
+│   ├── tile-cache-builder/             # builds tile-cache-fixture from input_data + curated public subset
+│   ├── age-injector/                   # mutates tile manifest dates → synth-age-tile-set
+│   ├── injectors/
+│   │   ├── outlier.py                  # outlier-injection-derkachi
+│   │   ├── blackout_spoof.py           # blackout-spoof-derkachi (5/15/35 s windows)
+│   │   ├── multi_segment.py            # multi-segment-derkachi
+│   │   └── cold_boot.py                # cold-boot-fixture (frozen FC pose JSON)
+│   ├── secrets/
+│   │   └── mavlink-test-passkey.txt    # 32-byte hex; "TEST ONLY"
+│   ├── security/
+│   │   └── cve-2025-53644.jpg          # crafted JPEG for NFT-SEC-04 (license-checked PoC)
+│   └── mock-suite-sat/                 # FastAPI stub for mock-suite-sat-service
+│       ├── Dockerfile
+│       └── app.py                      # 202 on well-formed publish; 400 on malformed
+└── tests/
+    ├── positive/                       # FT-P-* scenarios
+    ├── negative/                       # FT-N-* scenarios
+    ├── performance/                    # NFT-PERF-*
+    ├── resilience/                     # NFT-RES-*
+    ├── security/                       # NFT-SEC-*
+    └── resource_limit/                 # NFT-LIM-*
+```
+
+### Layout Rationale
+
+- `e2e/docker/` and `e2e/jetson/` separate the two execution tiers, mirroring `environment.md` § Two-tier execution profile. Each tier has its own entrypoint script — the runner image and CSV-reporter contract are shared.
+- `e2e/runner/helpers/` keeps reusable boundary-driving primitives (frame replay, IMU replay, SITL observers, FDR reader, mavproxy `.tlog` reader, geodesic helpers) out of individual test modules; every blackbox task imports from here, not from the SUT.
+- `e2e/fixtures/` holds *fixture builders*, not the data itself. Heavy fixture content (Derkachi video, 60 still images) is bind-mounted from `_docs/00_problem/input_data/` per `test-data.md`.
+- `e2e/tests/<category>/` mirrors the `_docs/02_document/tests/*-tests.md` grouping so a reader can map any spec scenario to its test file.
+
+## Mock Services
+
+| Mock Service | Replaces | Endpoints | Behavior |
+|---|---|---|---|
+| `mock-suite-sat-service` | Azaion Suite Satellite Service ingest API | `POST /tiles` (publish), `GET /tiles/audit` (test-side audit retrieval) | Returns 202 on well-formed publish; 400 on malformed; logs every received tile + per-tile quality metadata to `/audit/<run-id>.jsonl`; `GET /tiles/audit` reads the log back. Deterministic; same input → same response. |
+| `ardupilot-plane-sitl` | ArduPilot Plane FC | UDP 14550 (MAVLink) | Real `ardupilot/ardupilot-sitl:plane-stable`; `GPS_TYPE=14`. Tests OBSERVE, do not patch. |
+| `inav-sitl` | iNav FC | TCP 5760 (MSP2) | Real `inavflight/inav-sitl:9.0.0`; GPS provider configured to MSP per `docs/SITL/SITL.md`. Tests OBSERVE. |
+| `mavproxy-listener` | QGroundControl GCS | UDP 14551 | Passive MAVLink listener; captures SUT → GCS stream into `/var/log/tlogs/<run-id>.tlog` for assertions. |
+
+### Mock Control Surface
+
+The mock-suite-sat-service exposes:
+
+- `POST /mock/config` — test-time behavior control (e.g., simulate downtime, inject 400 errors for negative-path scenarios)
+- `GET /mock/audit` — returns received tiles + their declared quality metadata for assertion
+- `POST /mock/reset` — clears audit log between tests for isolation
+
+The two SITL services (ArduPilot, iNav) are NOT control-surface mocks — they are real flight-controller stacks running in simulation. Tests interact via standard MAVLink / MSP2 protocols.
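
The control-surface semantics listed above can be modelled in memory before any HTTP layer is involved. The following is a minimal sketch under stated assumptions, not the committed `app.py`: the class name `MockSuiteSatState`, its method names, and the rule that "well-formed" means a JSON object carrying a `tile_id` key are all hypothetical, and the sketch assumes that only accepted tiles reach the audit log and that a reset leaves the forced-status config untouched.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MockSuiteSatState:
    """Hypothetical in-memory model of the mock's behavior (illustration only)."""

    forced_status: Optional[int] = None
    audit: list[dict[str, Any]] = field(default_factory=list)

    def configure(self, forced_status: Optional[int]) -> None:
        """Models POST /mock/config: force a status on every publish (None = normal)."""
        self.forced_status = forced_status

    def publish(self, payload: Any) -> int:
        """Models POST /tiles: return the HTTP status the mock would answer with."""
        if self.forced_status is not None:
            return self.forced_status
        # Assumption: a well-formed publish is a JSON object with a "tile_id" key.
        if not isinstance(payload, dict) or "tile_id" not in payload:
            return 400
        self.audit.append(payload)
        return 202

    def read_audit(self) -> list[dict[str, Any]]:
        """Models GET /mock/audit: a copy of every tile accepted so far."""
        return list(self.audit)

    def reset(self) -> None:
        """Models POST /mock/reset: clear the audit log for the next test."""
        self.audit.clear()
```

Keeping the state separate from the web framework would let the status logic be unit-tested without starting a server; whether the real mock is structured this way is not stated in the spec.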
+
+## Docker Test Environment
+
+### docker-compose.test.yml structure
+
+The full structure is defined in `_docs/02_document/tests/environment.md` § Docker Environment. The test infrastructure task implements that structure verbatim with the following behaviors:
+
+- All services on the `e2e-net` bridge network with `internal: true` (no external connectivity — RESTRICT-SAT-1 / NFT-SEC-02).
+- Volumes: `tile-cache-fixture` (RO mount into SUT), `fdr-output` (RW from SUT, RO from runner), `input-data` (RO bind from `_docs/00_problem/input_data/`), `expected-results` (RO bind from `_docs/00_problem/input_data/expected_results/`).
+- `fdr-output` sized exactly 64 GB via Docker `--storage-opt size=64g` to enforce AC-NEW-3 capacity at the volume layer (NFT-LIM-02 cross-checks rotation behavior).
+- `MAVLINK_SIGNING_PASSKEY_FILE` injected as a Docker secret from `e2e/docker/secrets/mavlink_passkey`.
+- `FC_ADAPTER` and `VIO_STRATEGY` pulled from environment, default `ardupilot` + `okvis2`. CI matrix sets these per job.
+
+### Networks and Volumes
+
+| Network / Volume | Type | Purpose |
+|---|---|---|
+| `e2e-net` | bridge, `internal: true` | All test traffic; enforces no-external-egress at network layer |
+| `tile-cache-fixture` | named volume | Pre-built FAISS HNSW index + tile filesystem; built once per CI run |
+| `fdr-output` | named volume, 64 GB cap | Per-flight FDR write target |
+| `input-data` | bind mount | RO bind of `_docs/00_problem/input_data/` |
+| `expected-results` | bind mount | RO bind of `_docs/00_problem/input_data/expected_results/` |
+
+### Tier-2 Bridge
+
+Tier-2 runs the SUT on real Jetson hardware with `systemctl start gps-denied-onboard.service`. SITL containers (ArduPilot, iNav) run either on the same Jetson (constrained CPU sharing) OR on a paired x86 host on the same network — `docker-compose.tier2-bridge.yml` provisions the SITL-only subset with the same `e2e-net` definition so the runner observes the same boundaries.
+
+## Test Runner Configuration
+
+**Framework**: pytest 8.x
+**Plugins and libraries**:
+- `pytest-csv` — CSV emission per `environment.md` § Reporting (one row per test)
+- `pytest-xdist` — parallel test execution within a tier (Tier-1 only; Tier-2 runs serially due to single-Jetson constraint)
+- `pytest-timeout` — per-test wall-clock budget enforcement (matches the per-scenario `Max execution time` in test specs)
+- `pymavlink` — MAVLink ground side
+- `msp_gps_toy` (Rust binary, called via subprocess) — MSP2 ground side
+- `opencv-python>=4.12.0` — frame source replay (CVE-mitigated per D-CROSS-CVE-1)
+- `numpy` + `scipy` + `geopy` (Vincenty) — geodesic-distance assertions in WGS84
+- `pytest-forked` — `--forked` mode for hermetic-critical scenarios
+
+**Entry point**: `pytest e2e/tests/ --csv=e2e-results/run-${RUN_ID}/report.csv --csv-columns="test_id,test_name,traces_to,fc_adapter,vio_strategy,tier,started_at_utc,execution_time_ms,result,error_message,evidence_paths"`
+
+### Fixture Strategy
+
+| Fixture | Scope | Purpose |
+|---|---|---|
+| `fc_adapter` | session | parametrized over `{ardupilot, inav}`; selects which SITL to bind |
+| `vio_strategy` | session | parametrized over `{okvis2, klt_ransac}` (production); `vins_mono` only on research-build sessions |
+| `tile_cache` | session | mounts `tile-cache-fixture` once per session |
+| `clean_sut` | function | `docker compose restart gps-denied-onboard` between tests; resets `fdr-output` |
+| `clean_sut_forked` | function (--forked) | full `docker compose down/up` per test; for hermetic-critical scenarios |
+| `mavproxy_tlog` | function | starts a fresh `.tlog` capture window for the duration of the test |
+| `fdr_reader` | function | post-run helper for parsing the FDR archive |
+| `sitl_observer` | function | AP/iNav state-read helper |
+
+### Parameterization
+
+Every test file is parameterized by `(fc_adapter, vio_strategy)` unless the spec explicitly skips one or both. The conftest skip-rules:
+
+- AC-7.x scenarios: skipped on every run (NOT COVERED per traceability matrix; `pytest.skip(reason="AC-7.x deferred — see traceability matrix")`).
+- `vins_mono`: skipped on production-build runs (`pytest.skip(reason="vins_mono is research-build-only per D-C1-1-SUB-A")`).
+- Tier-2-only scenarios (NFT-PERF-01, NFT-LIM-01, NFT-PERF-03, NFT-LIM-04): skipped on Tier-1 with `pytest.skip(reason="Tier-2 only — Jetson hardware required")`.
+- Chamber-only scenarios (AC-NEW-5 hot-soak portion): skipped on Tier-2 workstation-thermal runs; gated by `--enable-chamber` flag.
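
The four skip rules above can be expressed as one pure decision function that a conftest hook would call per test. This is a sketch under assumptions, not the committed `conftest.py`: `RunContext`, `matching_skip_rule`, the marker name `ac7_deferred`, and the tier value `tier1-docker` used in the usage note are hypothetical, while `tier2_only`, `chamber_only`, `vins_mono`, and `tier2-jetson` follow names that appear in this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunContext:
    """Hypothetical description of the current run."""

    tier: str                     # e.g. "tier2-jetson"
    build_kind: str               # "production" or "research"
    enable_chamber: bool = False  # mirrors the --enable-chamber flag


def matching_skip_rule(
    markers: set[str], vio_strategy: str, ctx: RunContext
) -> Optional[str]:
    """Return the name of the first skip rule that applies, or None to run the test."""
    if "ac7_deferred" in markers:
        return "ac7_deferred"      # AC-7.x: skipped on every run
    if vio_strategy == "vins_mono" and ctx.build_kind != "research":
        return "vins_mono"         # research-build-only strategy
    if "tier2_only" in markers and ctx.tier != "tier2-jetson":
        return "tier2_only"        # Jetson hardware required
    if "chamber_only" in markers and not ctx.enable_chamber:
        return "chamber_only"      # gated by --enable-chamber
    return None
```

For example, `matching_skip_rule({"tier2_only"}, "okvis2", RunContext("tier1-docker", "production"))` returns `"tier2_only"`; the caller would then map the rule name to the reason string listed in the corresponding bullet.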
+
+## Test Data Fixtures
+
+| Data Set | Source | Format | Used By |
+|---|---|---|---|
+| `still-image-set-60` | bind mount from `_docs/00_problem/input_data/` | JPEG + CSV GT | FT-P-01, FT-P-03, FT-P-05, FT-P-06, FT-P-15, FT-P-19, NFT-RES-03, NFT-PERF-04 |
+| `still-image-sat-refs-2` | same | PNG | FT-P-05, FT-P-19 |
+| `derkachi-fixture` | bind mount | MP4 + CSV | FT-P-02, FT-P-04, FT-P-07, FT-P-10, FT-N-01..04, NFT-PERF-01..02, NFT-RES-01..04, NFT-LIM-02, NFT-LIM-04 |
+| `tile-cache-fixture` | named volume built by `e2e/fixtures/tile-cache-builder/` | FAISS HNSW + tile filesystem | FT-P-01, FT-P-05, FT-P-15..17, FT-P-19, FT-N-05..06, NFT-LIM-03, NFT-PERF-01, NFT-PERF-04, NFT-SEC-01..02 |
+| `synth-age-tile-set` | built from `tile-cache-fixture` by `age-injector/` | tile filesystem with mutated manifest dates | FT-N-05, FT-N-06 |
+| `outlier-injection-derkachi` | runtime-generated by `injectors/outlier.py` | tmpfs frame source | FT-N-01 |
+| `blackout-spoof-derkachi` | runtime-generated by `injectors/blackout_spoof.py` | tmpfs + spoof injector on FC inbound | FT-N-04, NFT-RES-04 |
+| `multi-segment-derkachi` | runtime-generated by `injectors/multi_segment.py` | tmpfs frame source | FT-P-08 |
+| `cold-boot-fixture` | static JSON fixture in `e2e/fixtures/cold-boot/` | JSON pose snapshot | FT-P-11, NFT-PERF-03 |
+| `mavlink-passkey` | `e2e/fixtures/secrets/mavlink-test-passkey.txt` | 32-byte hex | FT-P-09-AP, NFT-SEC-03 |
+| `cve-jpeg-fixture` | `e2e/fixtures/security/cve-2025-53644.jpg` | crafted JPEG | NFT-SEC-04 |
+
+### Data Isolation
+
+Per `test-data.md` § Data Isolation Strategy:
+
+- Each test runs against a fresh SUT container (`docker compose restart` between tests, OR `--forked` pytest mode for hermetic-critical scenarios).
+- `tile-cache-fixture` and `input-data` mounts are read-only — cross-contamination at the SUT input layer is impossible.
+- `fdr-output` volume is reset between tests (`docker volume rm` + recreate).
+- Synthetic-injection fixtures generate to per-test tmpfs; never written back to a persistent volume.
+- For Tier-2: same isolation discipline at the systemd-service level (`systemctl restart`); `/var/azaion/fdr` wiped between tests.
+
+## Test Reporting
+
+**Format**: CSV (one row per test) — exactly per `environment.md` § Reporting.
+
+**Columns**: `test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths`
+
+**Output path**: `e2e-results/run-${RUN_ID}/report.csv` plus a per-run bundle of evidence at `e2e-results/run-${RUN_ID}/evidence/` (assembled by `evidence_bundler.py` from `.tlog` files, FDR archives, screenshots, profiler traces, tegrastats CSV, jtop CSV).
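
A reporter that holds itself to the column contract above might look like the following. It is a minimal sketch using only the standard library; `write_report` is a hypothetical helper, not the committed `csv_reporter.py`, and joining several evidence paths with a semicolon is an assumption.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Column order taken from the Columns line above.
COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message",
    "evidence_paths",
]


def write_report(rows: Iterable[Mapping[str, object]], stream: TextIO) -> None:
    """Write one CSV row per test, rejecting rows that drift from the schema."""
    # extrasaction="raise" makes DictWriter reject unknown keys.
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, extrasaction="raise")
    writer.writeheader()
    for row in rows:
        # DictWriter silently fills missing keys, so check for them explicitly.
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {row.get('test_id')!r} is missing {missing}")
        writer.writerow(row)


if __name__ == "__main__":
    # Illustrative values only; none of them come from a real run.
    example = dict.fromkeys(COLUMNS, "")
    example.update(test_id="FT-P-01", fc_adapter="ardupilot",
                   vio_strategy="okvis2", tier="tier2-jetson", result="PASS",
                   evidence_paths="evidence/a.tlog;evidence/b.csv")
    buffer = io.StringIO()
    write_report([example], buffer)
    print(buffer.getvalue())
```

Rejecting both extra and missing keys keeps Tier-1 and Tier-2 reports byte-compatible at the header level, which is what downstream tooling relies on.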
+
+## Acceptance Criteria
+
+**AC-1: Test environment starts (Tier-1)**
+Given a clean checkout of the repo
+When `cd e2e/docker && docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner` is executed
+Then all services in `environment.md` § Docker Environment start, the e2e-runner image builds, and pytest discovers ≥1 test file (sample test in `e2e/tests/positive/test_smoke.py`).
+
+**AC-2: Mock services respond**
+Given the test environment is running
+When the e2e-runner POSTs a well-formed tile-publish JSON to `mock-suite-sat-service`
+Then the service responds 202 and records the tile in its audit log; subsequent `GET /mock/audit` returns the recorded entry.
+
+**AC-3: SITL services accept SUT output**
+Given the test environment is running with a placeholder SUT that emits one valid `GPS_INPUT` (AP) AND one valid `MSP2_SENSOR_GPS` (iNav)
+When the e2e-runner reads `EK3_SRC1_POSXY` from `ardupilot-plane-sitl` AND queries iNav GPS state via MSP from `inav-sitl`
+Then both SITLs reflect the test-injected GPS source as primary.
+
+**AC-4: CSV report generated with required columns**
+Given the test runner executes
+When the test run completes
+Then `e2e-results/run-${RUN_ID}/report.csv` exists with exactly the columns from `environment.md` § Reporting, and a per-run evidence bundle exists at `e2e-results/run-${RUN_ID}/evidence/`.
+
+**AC-5: Egress isolation enforced**
+Given the test environment is running with `e2e-net.internal: true`
+When the e2e-runner attempts a TCP connect to `1.1.1.1:443` from inside the SUT container
+Then the connection fails (network-layer block); no DNS resolution succeeds for non-`e2e-net` names.
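
The TCP half of the AC-5 probe reduces to a connect attempt that must fail. The sketch below shows only that check; `egress_blocked` is a hypothetical helper name, the 3-second timeout is an assumption, and how the probe gets executed inside the SUT container is left out.

```python
import socket


def egress_blocked(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) cannot be established.

    Timeouts, refusals, unreachable-network errors, and DNS failures are all
    subclasses of OSError, so each of them counts as "blocked".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return False
    except OSError:
        return True


if __name__ == "__main__":
    # On an internal-only network this is expected to print True.
    print(egress_blocked("1.1.1.1", 443))
```

A timeout and an immediate "network unreachable" both satisfy the criterion, so the probe does not distinguish between them.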
+
+**AC-6: Tier-2 runner harness contract**
+Given a Jetson Orin Nano Super with the SUT installed via systemd
+When `./e2e/jetson/run-tier2.sh --fc-adapter ardupilot --vio-strategy okvis2 --duration 5min` is executed
+Then the same CSV report format is produced at `e2e-results/run-${RUN_ID}/report.csv`, with `tier=tier2-jetson` for every row, and `tegrastats` + `jtop` per-sample CSVs land in the evidence bundle.
+
+**AC-7: Fixture builders are reproducible**
+Given a clean Docker volume state
+When `e2e/fixtures/tile-cache-builder/build.sh` runs
+Then the same tile-cache-fixture content is produced bit-for-bit twice in a row (same FAISS index, same tile manifest hashes); same idempotency for `age-injector` and the static JSON fixtures.
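
One way to check the bit-for-bit claim in AC-7 is to hash the builder output twice and compare digests. The sketch below is an assumption about how such a check could be written, not the AZ-407 implementation; `tree_digest` is a hypothetical helper, and it covers file paths and contents only (not permissions or timestamps).

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return a SHA-256 digest over the relative paths and bytes of every file."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on filesystem enumeration order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()  # fine for a sketch; stream large files in practice
        # Length-prefix each field so (name, content) boundaries are unambiguous.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Running the builder into two fresh directories and asserting `tree_digest(first) == tree_digest(second)` would flag any nondeterminism, such as embedded build timestamps or unordered index serialization.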
+
+**AC-8: Parameterization matrix coverage**
+Given the conftest sets up `(fc_adapter, vio_strategy, tier, build_kind)` parameterization
+When CI runs the standard matrix
+Then every produced report row has well-defined values for `fc_adapter`, `vio_strategy`, `tier`; `vins_mono` rows appear only on `build_kind=research` runs; `Tier-2`-only test_ids are SKIP on Tier-1 with the expected reason string.
+
+**AC-9: Skips per traceability matrix**
+Given the e2e-runner starts
+When the discoverer encounters a test mapped to AC-7.1, AC-7.2, AC-NEW-5 chamber portion, AC-8.6 scene-change subset, RESTRICT-CAM-2, or RESTRICT-HW-2 chamber portion
+Then those test rows show `result=SKIP` (or `XFAIL` for AC-8.6 scene-change PARTIAL) with an `error_message` referencing the traceability-matrix mitigation entry.
+
+## Constraints
+
+- **Public-boundary discipline**: the e2e-runner image MUST NOT import any module from the SUT source tree. The only legal interaction surfaces are MAVLink / MSP2 / HTTP / filesystem — same as a real consumer would have.
+- **OpenCV pin**: the runner image's OpenCV version MUST be `>=4.12.0` (D-CROSS-CVE-1); pinned in `e2e/runner/requirements.txt`.
+- **MAVLink-passkey provenance**: the test passkey is a checked-in fixture explicitly labeled "TEST ONLY"; the production passkey path is `/run/secrets/mavlink_passkey` per environment.md and is never the test fixture.
+- **Tier separation**: Tier-1 and Tier-2 produce IDENTICAL CSV row formats so the same downstream tooling (badge generators, regression detectors) can consume both.
+- **No internal state probes**: no test may read SUT internal state (GTSAM iSAM2 graph, FAISS in-memory index, internal Python/C++ buffers, logger handles). Only public boundaries + FDR archive + SITL observation are legal evidence sources.
+
+## Risks & Mitigation
+
+**Risk 1: Tier-1 runner image build slow / large**
+- *Risk*: pulling `tensorrt`, `gtsam`, `faiss-gpu`, `opencv-python>=4.12.0` plus dev dependencies into a single image bloats the e2e-runner build to ≥30 min and ≥10 GB.
+- *Mitigation*: the e2e-runner image is **separate from the SUT image** — the runner only needs ground-side libs (`pymavlink`, `msp_gps_toy`, `opencv-python`, `numpy`, `scipy`, `geopy`, `pytest`). The SUT image is what carries the heavy ML stack. Keep the runner image lean (target ≤2 GB).
+
+**Risk 2: SITL containers flaky / non-deterministic timing**
+- *Risk*: `ardupilot-plane-sitl` and `inav-sitl` boot times vary; tests may race the SITL's parameter-init phase.
+- *Mitigation*: conftest fixture polls SITL readiness via a known parameter read (e.g., `EK3_ENABLE`) before any test runs. Failure to reach readiness within 60 s fails the SITL fixture, not the individual test.
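
The readiness gate described in this mitigation is a poll-with-deadline loop. Below is a minimal sketch of that loop; `wait_until_ready` and `SitlNotReady` are hypothetical names, the probe is passed in as a plain callable (the actual parameter read over MAVLink or MSP is not shown), and the 0.5 s poll interval is an assumption.

```python
import time
from typing import Callable


class SitlNotReady(RuntimeError):
    """Raised when the SITL does not become ready before the deadline."""


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
) -> float:
    """Poll `probe` until it returns True; return the elapsed seconds.

    The probe is always tried at least once. OSError from the probe (for
    example, a socket that is not listening yet) is treated as "not ready".
    """
    start = time.monotonic()  # monotonic clock is immune to wall-clock jumps
    deadline = start + timeout_s
    while True:
        try:
            if probe():
                return time.monotonic() - start
        except OSError:
            pass
        if time.monotonic() >= deadline:
            raise SitlNotReady(f"SITL not ready after {timeout_s:.0f} s")
        time.sleep(interval_s)
```

Calling this from a session-scoped fixture means a slow SITL boot surfaces as one fixture error rather than as a scatter of unrelated test failures.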
+
+**Risk 3: Mock Suite Sat Service drift from D-PROJ-2 contract**
+- *Risk*: when the real Suite Sat Service ingest contract ships (D-PROJ-2), the mock may diverge silently.
+- *Mitigation*: the mock's request/response schema is sourced from the contract sketch in `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`; a contract test in NFT-SEC-01 asserts the mock's accepted-fields match that sketch. When the real endpoint ships, the mock is retired (per F7 in traceability matrix).
+
+**Risk 4: `--storage-opt size=64g` not portable**
+- *Risk*: Docker's `--storage-opt size=64g` for volumes requires specific storage drivers (overlay2 with xfs backing); may not work on all CI runners.
+- *Mitigation*: fallback strategy in the docker-compose: if the volume cannot be size-capped at the Docker layer, the SUT enforces the cap internally per AC-NEW-3 and NFT-LIM-02 verifies via volume-size sampling. CI runner config flagged in the runner README.
+
+**Risk 5: `cve-jpeg-fixture` license / distribution**
+- *Risk*: PoC JPEG for CVE-2025-53644 may have unclear redistribution terms.
+- *Mitigation*: license-checked at fixture-import time; if license unclear, the fixture is generated programmatically following the published PoC structure (no copyrighted bytes); generation script is itself part of `e2e/fixtures/security/`.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/environment.md` — full Docker environment spec (services, networks, volumes, secrets, ports)
+- `_docs/02_document/tests/test-data.md` — fixture sources, formats, isolation strategy, validation rules
+- `_docs/02_document/tests/traceability-matrix.md` — AC coverage map (drives the SKIP / XFAIL rules in conftest)
+- `_docs/02_document/tests/blackbox-tests.md` + `performance-tests.md` + `resilience-tests.md` + `security-tests.md` + `resource-limit-tests.md` — list of test categories that the `e2e/tests/<category>/` folders mirror
+
+## Excluded
+
+- The SUT (`gps-denied-onboard`) container build — owned by the BUILD-side epics (E-CC-CONF / E-BOOT and per-component epics). The test infrastructure references the SUT image but does NOT build it.
+- Individual test scenario implementations (FT-P-*, FT-N-*, NFT-*) — owned by the per-scenario tasks decomposed in Step 3.
+- The Suite Sat Service real endpoint contract — owned by the parent suite (D-PROJ-2); the mock here mirrors a sketch only.
+- The thermal chamber AC-NEW-5 hot-soak run — physical hardware, deferred per traceability matrix.
+- The AI-camera fixture (AC-7.x) — out of scope per Phase 1 gate.