Bootstraps the public-boundary blackbox test harness owned by epic
AZ-262 (E-BBT). Establishes the e2e/ directory tree at the repo root,
fully separated from src/gps_denied_onboard/** and from the in-process
tests/** tree, and commits to the contracts every subsequent test
ticket (AZ-407..AZ-446) builds against.
Tier-1 (workstation Docker):
- docker/docker-compose.test.yml wires SUT + ArduPilot SITL + iNav SITL
+ mock Suite Sat Service + mavproxy listener + e2e-runner onto one
e2e-net bridge with internal: true (enforces RESTRICT-SAT-1 /
NFT-SEC-02 egress isolation at the network layer).
- docker/docker-compose.tier2-bridge.yml override disables the in-
compose SUT so Tier-2 pairs SITLs + mock + runner on an x86 host
while the SUT runs natively on the Jetson under systemd.
Tier-2 (Jetson):
- jetson/run-tier2.sh + tier2.service systemd unit + tegrastats /
jtop parsers feed per-sample telemetry into the evidence bundle.
Runner image (e2e/runner/):
- Dockerfile + requirements.txt install ONLY ground-side libs
(pymavlink, opencv-python>=4.12, numpy/scipy/geopy/pyproj, httpx,
orjson, pydantic, structlog, pytest 8.x). The runner deliberately
does NOT install the SUT package.
- conftest.py implements the AC-9 skip-rule mapping (tier2_only,
chamber_only, vins_mono, deferred_ac) tied to environment.md
parametrize axes.
- reporting/csv_reporter.py is a pytest plugin emitting one row per
test with the exact 11-column schema from environment.md §
Reporting (test_id, test_name, traces_to, fc_adapter, vio_strategy,
tier, started_at_utc, execution_time_ms, result, error_message,
evidence_paths). XFAIL is surfaced only when a test carries
@pytest.mark.deferred_ac(verdict="xfail", reason=...).
- reporting/evidence_bundler.py exposes the attach_evidence fixture
that copies per-test artifacts (.tlog, FDR archives, screenshots,
tegrastats / jtop CSVs) into the run bundle and records relative
paths into the reporter's evidence_paths column.
- helpers/{frame_source_replay,imu_replay,sitl_observer,
mavproxy_tlog_reader,fdr_reader}.py declare the public surfaces
(concrete implementations owned by AZ-407 / AZ-408 / AZ-416 /
AZ-417 / AZ-441 per the dependency table); helpers/geo.py ships
today (no downstream task dep) — WGS84 distance / forward-bearing
/ offset via pyproj with NaN rejection.
Mock Suite Sat Service (e2e/fixtures/mock-suite-sat/):
- FastAPI app: POST /tiles (ingest contract from D-PROJ-2 follow-up),
GET /tiles/audit + /mock/audit (per-run read-back), POST
/mock/config (force-status, response delay), POST /mock/reset
(clears audit between tests), GET /mock/health.
Fixture scaffolds (e2e/fixtures/{tile-cache-builder, age-injector,
injectors, cold-boot, secrets, security}/):
- Public surfaces only. Concrete builders land in AZ-407 (static
fixtures), AZ-408 (runtime synthetic injection), AZ-419 (cold-boot
fixture), AZ-439 (CVE-2025-53644 JPEG generator).
Test tree (e2e/tests/{positive,negative,performance,resilience,
security,resource_limit}/):
- Mirror of the test-spec category grouping in
_docs/02_document/tests/*-tests.md.
- tests/positive/test_smoke.py is the AC-1 harness-boot smoke test, run
inside the e2e-runner image once Docker brings every service up.
Out-of-container unit tests (e2e/_unit_tests/):
- Exercises the harness internals (CSV reporter plugin lifecycle,
conftest skip rules, helper modules, parsers, mock app, compose
YAML structural contract, public-boundary enforcement) without
Docker / SITL. 97 unit tests, all passing.
Build / config:
- pyproject.toml: testpaths extended with e2e/_unit_tests; pythonpath
extended with e2e; fastapi>=0.111,<0.120 added to dev extras for the
mock-app TestClient unit test.
AC coverage:
- AC-1 (Tier-1 boot) → compose YAML test + directory layout
+ smoke test (Docker-bound)
- AC-2 (mock services) → 6 FastAPI TestClient unit tests
- AC-3 (SITLs accept output) → contract present; concrete check
deferred to AZ-416 / AZ-417
- AC-4 (CSV columns) → in-process plugin lifecycle test
emits the exact 11-column schema
- AC-5 (egress isolation) → static config test + runtime probe
in Docker-bound smoke
- AC-6 (Tier-2 contract) → tegrastats + jtop parser unit tests
+ jetson/* layout test; full Tier-2
contract is AZ-444
- AC-7 (fixture reproducibility) → deferred to AZ-407 per task spec
- AC-8 (parametrize matrix) → vins_mono skip-rule cases +
tests/positive/test_smoke
- AC-9 (skip semantics) → 9 conftest skip-rule unit tests
The module layout entry for blackbox_tests was added in the 2026-05-16
preparatory commit d7a17a8 so this diff stays focused on the harness
scaffold. AZ-406 advances to In Testing on commit.
Co-authored-by: Cursor <cursoragent@cursor.com>
Test Infrastructure
Task: AZ-406_test_infrastructure
Name: Blackbox Test Infrastructure Bootstrap (Tier-1 Docker + Tier-2 Jetson harness scaffold)
Description: Scaffold the e2e blackbox test project — e2e/ directory, pytest runner, docker-compose.test.yml, mock services, fixture builders, secrets handling, CSV reporter. This is the foundation every blackbox test depends on.
Complexity: 5 points
Dependencies: None
Component: Blackbox Tests (epic AZ-262 / E-BBT)
Tracker: AZ-406
Epic: AZ-262 (E-BBT)
Problem
The product (gps-denied-onboard) needs a behavioral verification layer that drives the SUT exclusively through its declared public boundaries (frame source, FC inbound, tile cache mount, FC outbound via SITL, GCS via mavproxy, FDR filesystem). Without a unified test harness — Docker compose for Tier-1, Jetson runner harness for Tier-2, fixture builders, mock Suite Sat Service, MAVLink listener, CSV reporter — every individual blackbox / performance / resilience / security / resource-limit task would re-invent its own scaffolding. This task delivers that shared scaffold once.
Outcome
- A single `cd e2e/docker && docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner` command brings up the full Tier-1 environment and runs the full pytest suite (once the test tasks land).
- A single `./e2e/jetson/run-tier2.sh --fc-adapter <ardupilot|inav> --vio-strategy <okvis2|klt_ransac>` command runs the Tier-2 hardware loop with the same CSV reporter contract.
- The matrix dimensions `FC_ADAPTER` × `VIO_STRATEGY` × `build_kind` (per `environment.md` § CI runner mapping) are first-class CI parameters; a CI YAML scaffold is provided.
- Every external dependency named in `environment.md` (`ardupilot-plane-sitl`, `inav-sitl`, `mock-suite-sat-service`, `mavproxy-listener`) is provisioned by the compose file and reachable inside `e2e-net`.
- Egress isolation (`internal: true` on `e2e-net`) is enforced by default, satisfying RESTRICT-SAT-1 / NFT-SEC-02 at the network layer.
- A single pytest-based runner discovers tests under `e2e/tests/`, parameterizes by `FC_ADAPTER` + `VIO_STRATEGY`, and emits the CSV report at `e2e-results/run-${RUN_ID}/report.csv` with the columns specified in `environment.md` § Reporting.
- Fixture builders for `tile-cache-fixture`, `synth-age-tile-set`, `outlier-injection-derkachi`, `blackout-spoof-derkachi`, `multi-segment-derkachi`, `cold-boot-fixture`, `cve-jpeg-fixture`, and `mavlink-passkey` exist as separate Dockerized helpers under `tests/fixtures/`.
Test Project Folder Layout
e2e/
├── docker/
│ ├── docker-compose.test.yml # Tier-1 entrypoint
│ ├── docker-compose.tier2-bridge.yml # optional override for Jetson-attached SITLs
│ └── secrets/
│ └── mavlink_passkey # Docker-secret mount target (test passkey)
├── jetson/
│ ├── run-tier2.sh # Tier-2 entrypoint
│ ├── tier2.service # systemd unit template for SUT lifecycle
│ ├── tegrastats_parser.py # parse tegrastats → per-sample CSV rows
│ └── jtop_parser.py # parse jetson-stats jtop API → per-sample CSV
├── runner/
│ ├── Dockerfile # e2e-runner image (Python 3.12 + pytest 8.x)
│ ├── requirements.txt # pytest, pymavlink, msp_gps_toy bridge, opencv-python>=4.12.0, numpy, scipy, geopy
│ ├── conftest.py # session/module/function fixtures, FC_ADAPTER/VIO_STRATEGY parameterization
│ ├── reporting/
│ │ ├── csv_reporter.py # pytest plugin emitting environment.md § Reporting columns
│ │ └── evidence_bundler.py # collects .tlog, FDR archives, profiler traces, screenshots
│ └── helpers/
│ ├── frame_source_replay.py # replay images / video to V4L2 file source
│ ├── imu_replay.py # replay data_imu.csv at 10 Hz to FC inbound
│ ├── sitl_observer.py # AP/iNav state-read helpers (param read, GPS_RAW_INT, MSP queries)
│ ├── mavproxy_tlog_reader.py # parse .tlog from mavproxy-listener
│ ├── fdr_reader.py # post-run filesystem read of FDR archive
│ └── geo.py # Vincenty / WGS84 geodesic helpers
├── fixtures/
│ ├── tile-cache-builder/ # builds tile-cache-fixture from input_data + curated public subset
│ ├── age-injector/ # mutates tile manifest dates → synth-age-tile-set
│ ├── injectors/
│ │ ├── outlier.py # outlier-injection-derkachi
│ │ ├── blackout_spoof.py # blackout-spoof-derkachi (5/15/35 s windows)
│ │ ├── multi_segment.py # multi-segment-derkachi
│ │ └── cold_boot.py # cold-boot-fixture (frozen FC pose JSON)
│ ├── secrets/
│ │ └── mavlink-test-passkey.txt # 32-byte hex; "TEST ONLY"
│ ├── security/
│ │ └── cve-2025-53644.jpg # crafted JPEG for NFT-SEC-04 (license-checked PoC)
│ └── mock-suite-sat/ # FastAPI stub for mock-suite-sat-service
│ ├── Dockerfile
│ └── app.py # 202 on well-formed publish; 400 on malformed
└── tests/
├── positive/ # FT-P-* scenarios
├── negative/ # FT-N-* scenarios
├── performance/ # NFT-PERF-*
├── resilience/ # NFT-RES-*
├── security/ # NFT-SEC-*
└── resource_limit/ # NFT-LIM-*
Layout Rationale
- `e2e/docker/` and `e2e/jetson/` separate the two execution tiers, mirroring `environment.md` § Two-tier execution profile. Each tier has its own entrypoint script; the runner image and CSV-reporter contract are shared.
- `e2e/runner/helpers/` keeps reusable boundary-driving primitives (frame replay, IMU replay, SITL observers, FDR reader, MAVLink listener) out of individual test modules. Every blackbox task imports from here, not from the SUT.
- `e2e/fixtures/` holds fixture builders, not the data itself. Heavy fixture content (Derkachi video, 60 still images) is bind-mounted from `_docs/00_problem/input_data/` per `test-data.md`.
- `e2e/tests/<category>/` mirrors the `_docs/02_document/tests/*-tests.md` grouping so a reader can map any spec scenario to its test file.
Mock Services
| Mock Service | Replaces | Endpoints | Behavior |
|---|---|---|---|
| `mock-suite-sat-service` | Azaion Suite Satellite Service ingest API | `POST /tiles` (publish), `GET /tiles/audit` (test-side audit retrieval) | Returns 202 on well-formed publish and 400 on malformed; logs every received tile + per-tile quality metadata to `/audit/<run-id>.jsonl`; `GET /tiles/audit` reads the log back. Deterministic: same input → same response. |
| `ardupilot-plane-sitl` | ArduPilot Plane FC | UDP 14550 (MAVLink) | Real `ardupilot/ardupilot-sitl:plane-stable`; `GPS_TYPE=14`. Tests OBSERVE, do not patch. |
| `inav-sitl` | iNav FC | TCP 5760 (MSP2) | Real `inavflight/inav-sitl:9.0.0`; GPS provider configured to MSP per `docs/SITL/SITL.md`. Tests OBSERVE. |
| `mavproxy-listener` | QGroundControl GCS | UDP 14551 | Passive MAVLink listener; captures the SUT → GCS stream into `/var/log/tlogs/<run-id>.tlog` for assertions. |
Mock Control Surface
The mock-suite-sat-service exposes:
- `POST /mock/config`: test-time behavior control (e.g., simulate downtime, inject 400 errors for negative-path scenarios)
- `GET /mock/audit`: returns received tiles + their declared quality metadata for assertion
- `POST /mock/reset`: clears the audit log between tests for isolation
The two SITL services (ArduPilot, iNav) are NOT control-surface mocks — they are real flight-controller stacks running in simulation. Tests interact via standard MAVLink / MSP2 protocols.
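The mock control surface can be illustrated with a framework-free core that the FastAPI handlers in `app.py` would delegate to. This is a minimal sketch, not the mock's actual code. The class name `MockSuiteSat` and the fields in `REQUIRED_FIELDS` are hypothetical placeholders for the D-PROJ-2 contract sketch, and the JSONL audit file is replaced by an in-memory list.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical required fields; the real list comes from the D-PROJ-2 contract sketch.
REQUIRED_FIELDS = ("tile_id", "run_id", "quality")


@dataclass
class MockSuiteSat:
    """In-memory core of the mock; FastAPI routes would only map return values to responses."""

    audit: list = field(default_factory=list)
    forced_status: Optional[int] = None  # set via POST /mock/config (downtime, injected errors)

    def publish(self, payload: Any) -> int:
        """POST /tiles: 202 on a well-formed publish, 400 on a malformed one."""
        if self.forced_status is not None:
            return self.forced_status
        if not isinstance(payload, dict) or any(k not in payload for k in REQUIRED_FIELDS):
            return 400
        # Deep copy so later mutation by the caller cannot alter the audit trail.
        self.audit.append(copy.deepcopy(payload))
        return 202

    def configure(self, force_status: Optional[int]) -> None:
        """POST /mock/config: force every publish to return force_status (None restores normal behavior)."""
        self.forced_status = force_status

    def read_audit(self, run_id: Optional[str] = None) -> list:
        """GET /mock/audit: recorded tiles, optionally filtered to one run."""
        return [copy.deepcopy(e) for e in self.audit if run_id is None or e["run_id"] == run_id]

    def reset(self) -> None:
        """POST /mock/reset: clear the audit log and any forced status between tests."""
        self.audit.clear()
        self.forced_status = None
```

Keeping the status-code logic out of the web layer lets unit tests exercise the 202, 400, and forced-status paths without spinning up a TestClient.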
Docker Test Environment
docker-compose.test.yml structure
The full structure is defined in _docs/02_document/tests/environment.md § Docker Environment. The test infrastructure task implements that structure verbatim with the following behaviors:
- All services are on the `e2e-net` bridge network with `internal: true` (no external connectivity; RESTRICT-SAT-1 / NFT-SEC-02).
- Volumes: `tile-cache-fixture` (RO mount into SUT), `fdr-output` (RW from SUT, RO from runner), `input-data` (RO bind from `_docs/00_problem/input_data/`), `expected-results` (RO bind from `_docs/00_problem/input_data/expected_results/`).
- `fdr-output` is sized exactly 64 GB via Docker `--storage-opt size=64g` to enforce AC-NEW-3 capacity at the volume layer (NFT-LIM-02 cross-checks rotation behavior).
- `MAVLINK_SIGNING_PASSKEY_FILE` is injected as a Docker secret from `e2e/docker/secrets/mavlink_passkey`.
- `FC_ADAPTER` and `VIO_STRATEGY` are pulled from the environment, defaulting to `ardupilot` + `okvis2`. The CI matrix sets these per job.
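The network and volume rules above lend themselves to a static check on the parsed compose file. The sketch below makes three assumptions: the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`), the SUT service is named `gps-denied-onboard` as in the `clean_sut` fixture, and the function name is hypothetical. It covers only the `e2e-net` attachment and the read-only `tile-cache-fixture` mount.

```python
from typing import Any

SUT_SERVICE = "gps-denied-onboard"  # name used by `docker compose restart` in clean_sut
RO_VOLUMES = ("tile-cache-fixture",)


def _mount_source(entry: Any) -> str:
    if isinstance(entry, dict):  # long syntax: {type, source, target, read_only}
        return str(entry.get("source", ""))
    return str(entry).split(":")[0]  # short syntax: SOURCE:TARGET[:MODE]


def _mount_read_only(entry: Any) -> bool:
    if isinstance(entry, dict):
        return entry.get("read_only") is True
    parts = str(entry).split(":")
    return len(parts) == 3 and "ro" in parts[2].split(",")


def compose_contract_violations(compose: dict) -> list:
    """Return human-readable violations of the Tier-1 network/volume contract."""
    violations = []
    net = compose.get("networks", {}).get("e2e-net")
    if net is None:
        violations.append("network e2e-net is not declared")
    elif net.get("driver", "bridge") != "bridge" or net.get("internal") is not True:
        violations.append("e2e-net must be a bridge network with internal: true")
    services = compose.get("services", {})
    for name, svc in services.items():
        # `networks` may be a list or a mapping; iterating either yields network names.
        attached = set(svc.get("networks", []))
        if attached != {"e2e-net"}:
            violations.append(f"{name}: must attach only to e2e-net, got {sorted(attached)}")
    for entry in services.get(SUT_SERVICE, {}).get("volumes", []):
        source = _mount_source(entry)
        if source in RO_VOLUMES and not _mount_read_only(entry):
            violations.append(f"{SUT_SERVICE}: {source} must be mounted read-only")
    return violations
```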
Networks and Volumes
| Network / Volume | Type | Purpose |
|---|---|---|
| `e2e-net` | bridge, `internal: true` | All test traffic; enforces no-external-egress at the network layer |
| `tile-cache-fixture` | named volume | Pre-built FAISS HNSW index + tile filesystem; built once per CI run |
| `fdr-output` | named volume, 64 GB cap | Per-flight FDR write target |
| `input-data` | bind mount | RO bind of `_docs/00_problem/input_data/` |
| `expected-results` | bind mount | RO bind of `_docs/00_problem/input_data/expected_results/` |
Tier-2 Bridge
Tier-2 runs the SUT on real Jetson hardware with systemctl start gps-denied-onboard.service. SITL containers (ArduPilot, iNav) run either on the same Jetson (constrained CPU sharing) OR on a paired x86 host on the same network — docker-compose.tier2-bridge.yml provisions the SITL-only subset with the same e2e-net definition so the runner observes the same boundaries.
Test Runner Configuration
Framework: pytest 8.x
Plugins and ground-side libraries:
- `pytest-csv`: CSV emission per `environment.md` § Reporting (one row per test)
- `pytest-xdist`: parallel test execution within a tier (Tier-1 only; Tier-2 runs serially due to the single-Jetson constraint)
- `pytest-timeout`: per-test wall-clock budget enforcement (matches the per-scenario Max execution time in the test specs)
- `pymavlink`: MAVLink ground side
- `msp_gps_toy` (Rust binary, called via subprocess): MSP2 ground side
- `opencv-python>=4.12.0`: frame source replay (CVE-mitigated per D-CROSS-CVE-1)
- `numpy` + `scipy` + `geopy`: geodesic-distance assertions in WGS84 (geopy 2.x provides Karney's `geodesic`; its older `vincenty` function was removed)
- `pytest-forked`: `--forked` mode for hermetic-critical scenarios
Entry point: pytest e2e/tests/ --csv=e2e-results/run-${RUN_ID}/report.csv --csv-columns="test_id,test_name,traces_to,fc_adapter,vio_strategy,tier,started_at_utc,execution_time_ms,result,error_message,evidence_paths"
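The WGS84 geodesic assertions can be illustrated without third-party packages. The sketch below implements Vincenty's inverse formula, the method named in the `geo.py` layout comment, and rejects NaN inputs. The commit's `helpers/geo.py` uses pyproj instead; the function name and tolerances here are illustrative assumptions. Vincenty's iteration can fail to converge for nearly antipodal points, so the sketch raises an error in that case rather than returning a wrong distance.

```python
import math

# WGS84 ellipsoid parameters.
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
WGS84_B = WGS84_A * (1 - WGS84_F)


def vincenty_distance_m(lat1: float, lon1: float, lat2: float, lon2: float,
                        tol: float = 1e-12, max_iter: int = 200) -> float:
    """Ellipsoidal distance in metres between two WGS84 points given in degrees."""
    if any(math.isnan(c) for c in (lat1, lon1, lat2, lon2)):
        raise ValueError("NaN coordinate rejected")
    a, b, f = WGS84_A, WGS84_B, WGS84_F
    L = math.radians(lon2 - lon1)
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))  # reduced latitudes
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sinU1, cosU1 = math.sin(U1), math.cos(U1)
    sinU2, cosU2 = math.sin(U2), math.cos(U2)
    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam, cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # On an equatorial line cos^2(alpha) is 0 and this term is taken as 0.
        cos_2sm = cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha if cos2_alpha != 0 else 0.0
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sm + C * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty inverse did not converge (nearly antipodal points)")
    u2 = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    delta_sigma = B * sin_sigma * (cos_2sm + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - B / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return b * A * (sigma - delta_sigma)
```

For one degree of longitude along the equator the result equals the semi-major axis times π/180 (about 111319.49 m), which makes a convenient sanity check.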
Fixture Strategy
| Fixture | Scope | Purpose |
|---|---|---|
| `fc_adapter` | session | parametrized over {ardupilot, inav}; selects which SITL to bind |
| `vio_strategy` | session | parametrized over {okvis2, klt_ransac} (production); `vins_mono` only on research-build sessions |
| `tile_cache` | session | mounts `tile-cache-fixture` once per session |
| `clean_sut` | function | `docker compose restart gps-denied-onboard` between tests; resets `fdr-output` |
| `clean_sut_forked` | function (`--forked`) | full `docker compose down/up` per test; for hermetic-critical scenarios |
| `mavproxy_tlog` | function | starts a fresh `.tlog` capture window for the duration of the test |
| `fdr_reader` | function | post-run helper for parsing the FDR archive |
| `sitl_observer` | function | AP/iNav state-read helper |
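The (fc_adapter, vio_strategy) axes can be expanded per CI job from `build_kind`. This is a minimal sketch; the function name and the `production` / `research` labels are assumptions based on the build kinds this section mentions.

```python
from itertools import product

FC_ADAPTERS = ("ardupilot", "inav")
PRODUCTION_VIO = ("okvis2", "klt_ransac")
RESEARCH_ONLY_VIO = ("vins_mono",)


def parameter_matrix(build_kind: str) -> list:
    """(fc_adapter, vio_strategy) pairs for one CI job; vins_mono only on research builds."""
    if build_kind not in ("production", "research"):
        raise ValueError(f"unknown build_kind: {build_kind!r}")
    vio = PRODUCTION_VIO + (RESEARCH_ONLY_VIO if build_kind == "research" else ())
    return list(product(FC_ADAPTERS, vio))
```

In a conftest, a `pytest_generate_tests(metafunc)` hook could pass these pairs to `metafunc.parametrize("fc_adapter,vio_strategy", pairs, scope="session")`. Filtering `vins_mono` out at generation time means production reports contain no `vins_mono` rows at all, which matches AC-8. The runtime skip rule then acts only as a backstop.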
Parameterization
Every test file is parameterized by (fc_adapter, vio_strategy) unless the spec explicitly skips one or both. The conftest skip-rules:
- AC-7.x scenarios: skipped on every run (NOT COVERED per traceability matrix; `pytest.skip(reason="AC-7.x deferred — see traceability matrix")`).
- `vins_mono`: skipped on production-build runs (`pytest.skip(reason="vins_mono is research-build-only per D-C1-1-SUB-A")`).
- Tier-2-only scenarios (NFT-PERF-01, NFT-LIM-01, NFT-PERF-03, NFT-LIM-04): skipped on Tier-1 with `pytest.skip(reason="Tier-2 only — Jetson hardware required")`.
- Chamber-only scenarios (AC-NEW-5 hot-soak portion): skipped on Tier-2 workstation-thermal runs; gated by the `--enable-chamber` flag.
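The four skip rules can be expressed as a pure function that the conftest calls before invoking `pytest.skip(reason=...)`. This sketch rests on a few assumptions. The keyword arguments and the chamber reason wording are hypothetical, and `tier2-jetson` is the tier value from AC-6. The other reason strings are copied from the rules above, written with `\u2014` escapes for the em dash they contain.

```python
from typing import Optional

AC7_REASON = "AC-7.x deferred \u2014 see traceability matrix"
VINS_REASON = "vins_mono is research-build-only per D-C1-1-SUB-A"
TIER2_REASON = "Tier-2 only \u2014 Jetson hardware required"
CHAMBER_REASON = "chamber-only scenario; rerun with --enable-chamber"  # hypothetical wording

TIER2_ONLY = frozenset({"NFT-PERF-01", "NFT-LIM-01", "NFT-PERF-03", "NFT-LIM-04"})


def skip_reason(*, traces_to: set, scenario_id: str, vio_strategy: str, tier: str,
                build_kind: str, chamber_only: bool, enable_chamber: bool) -> Optional[str]:
    """Return the skip reason for one parametrized test, or None if it should run."""
    if any(t.startswith("AC-7.") for t in traces_to):
        return AC7_REASON
    if vio_strategy == "vins_mono" and build_kind != "research":
        return VINS_REASON
    if scenario_id in TIER2_ONLY and tier != "tier2-jetson":
        return TIER2_REASON
    if chamber_only and not enable_chamber:
        return CHAMBER_REASON
    return None
```

Returning a string instead of calling `pytest.skip` directly keeps the rules unit-testable and lets the CSV reporter copy the same reason into `error_message`.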
Test Data Fixtures
| Data Set | Source | Format | Used By |
|---|---|---|---|
| `still-image-set-60` | bind mount from `_docs/00_problem/input_data/` | JPEG + CSV GT | FT-P-01, FT-P-03, FT-P-05, FT-P-06, FT-P-15, FT-P-19, NFT-RES-03, NFT-PERF-04 |
| `still-image-sat-refs-2` | same | PNG | FT-P-05, FT-P-19 |
| `derkachi-fixture` | bind mount | MP4 + CSV | FT-P-02, FT-P-04, FT-P-07, FT-P-10, FT-N-01..04, NFT-PERF-01..02, NFT-RES-01..04, NFT-LIM-02, NFT-LIM-04 |
| `tile-cache-fixture` | named volume built by `tests/fixtures/tile-cache-builder/` | FAISS HNSW + tile filesystem | FT-P-01, FT-P-05, FT-P-15..17, FT-P-19, FT-N-05..06, NFT-LIM-03, NFT-PERF-01, NFT-PERF-04, NFT-SEC-01..02 |
| `synth-age-tile-set` | built from `tile-cache-fixture` by `age-injector/` | tile filesystem with mutated manifest dates | FT-N-05, FT-N-06 |
| `outlier-injection-derkachi` | runtime-generated by `injectors/outlier.py` | tmpfs frame source | FT-N-01 |
| `blackout-spoof-derkachi` | runtime-generated by `injectors/blackout_spoof.py` | tmpfs + spoof injector on FC inbound | FT-N-04, NFT-RES-04 |
| `multi-segment-derkachi` | runtime-generated by `injectors/multi_segment.py` | tmpfs frame source | FT-P-08 |
| `cold-boot-fixture` | static JSON fixture in `tests/fixtures/cold-boot/` | JSON pose snapshot | FT-P-11, NFT-PERF-03 |
| `mavlink-passkey` | `tests/fixtures/secrets/mavlink-test-passkey.txt` | 32-byte hex | FT-P-09-AP, NFT-SEC-03 |
| `cve-jpeg-fixture` | `tests/fixtures/security/cve-2025-53644.jpg` | crafted JPEG | NFT-SEC-04 |
Data Isolation
Per test-data.md § Data Isolation Strategy:
- Each test runs against a freshly restarted SUT container (`docker compose restart` between tests) or, for hermetic-critical scenarios, a fully recreated one (`--forked` pytest mode with `docker compose down/up`).
- `tile-cache-fixture` and `input-data` mounts are read-only, so cross-contamination at the SUT input layer is impossible.
- The `fdr-output` volume is reset between tests (`docker volume rm` + recreate).
- Synthetic-injection fixtures generate to per-test tmpfs and are never written back to a persistent volume.
- For Tier-2: the same isolation discipline applies at the systemd-service level (`systemctl restart`); `/var/azaion/fdr` is wiped between tests.
Test Reporting
Format: CSV (one row per test) — exactly per environment.md § Reporting.
Columns: test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths
Output path: e2e-results/run-${RUN_ID}/report.csv plus a per-run bundle of evidence at e2e-results/run-${RUN_ID}/evidence/ (assembled by evidence_bundler.py from .tlog files, FDR archives, screenshots, profiler traces, tegrastats CSV, jtop CSV).
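Row emission can be sketched with the standard `csv` module. This sketch assumes list-valued `traces_to` / `evidence_paths` cells are joined with `;` and that results use a PASS/FAIL/SKIP/XFAIL vocabulary; neither convention comes from `environment.md`. `csv.DictWriter` with `extrasaction="raise"` rejects unexpected columns, and the explicit check rejects missing ones, so every row carries exactly the 11 columns.

```python
import csv
from typing import Any, Iterable, TextIO

# Column order from environment.md § Reporting.
COLUMNS = (
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message", "evidence_paths",
)
RESULTS = {"PASS", "FAIL", "SKIP", "XFAIL"}  # assumed result vocabulary
MULTI_VALUED = ("traces_to", "evidence_paths")


def write_report(rows: Iterable[dict], out: TextIO) -> int:
    """Write the header plus one row per test; return the number of data rows."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="raise", lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {row.get('test_id')!r} is missing columns {missing}")
        if row["result"] not in RESULTS:
            raise ValueError(f"unexpected result {row['result']!r}")
        flat: dict[str, Any] = dict(row)
        # Join multi-valued cells so each test stays on exactly one CSV row.
        for key in MULTI_VALUED:
            if isinstance(flat[key], (list, tuple)):
                flat[key] = ";".join(flat[key])
        writer.writerow(flat)  # raises ValueError on any key outside COLUMNS
        count += 1
    return count
```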
Acceptance Criteria
AC-1: Test environment starts (Tier-1)
Given a clean checkout of the repo
When cd e2e/docker && docker compose -f docker-compose.test.yml up --build --abort-on-container-exit e2e-runner is executed
Then all services in environment.md § Docker Environment start, the e2e-runner image builds, and pytest discovers ≥1 test file (sample test in e2e/tests/positive/test_smoke.py).
AC-2: Mock services respond
Given the test environment is running
When the e2e-runner POSTs a well-formed tile-publish JSON to mock-suite-sat-service
Then the service responds 202 and records the tile in its audit log; subsequent GET /mock/audit returns the recorded entry.
AC-3: SITL services accept SUT output
Given the test environment is running with a placeholder SUT that emits one valid GPS_INPUT (AP) AND one valid MSP2_SENSOR_GPS (iNav)
When the e2e-runner reads EK3_SRC1_POSXY from ardupilot-plane-sitl AND queries iNav GPS state via MSP from inav-sitl
Then both SITLs reflect the test-injected GPS source as primary.
AC-4: CSV report generated with required columns
Given the test runner executes
When the test run completes
Then e2e-results/run-${RUN_ID}/report.csv exists with exactly the columns from environment.md § Reporting, and a per-run evidence bundle exists at e2e-results/run-${RUN_ID}/evidence/.
AC-5: Egress isolation enforced
Given the test environment is running with e2e-net.internal: true
When the e2e-runner attempts a TCP connect to 1.1.1.1:443 from inside the SUT container
Then the connection fails (network-layer block); no DNS resolution succeeds for non-e2e-net names.
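A runtime probe for AC-5 can be sketched with the standard `socket` module. It would need to run inside the SUT container's network namespace (for example via `docker compose exec`). That wiring, the function names, and the 3 s timeout are assumptions. Any `OSError` (refused, unreachable, or timed out) counts as blocked, since an `internal: true` network can fail the connect in any of those ways.

```python
import socket


def egress_blocked(host: str = "1.1.1.1", port: int = 443, timeout_s: float = 3.0) -> bool:
    """True if a TCP connect to host:port fails; False if the handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return False
    except OSError:
        # ConnectionRefusedError, "network is unreachable" and timeouts all derive from OSError.
        return True


def dns_blocked(name: str) -> bool:
    """True if name does not resolve (expected for any non-e2e-net name)."""
    try:
        socket.getaddrinfo(name, 443)
    except socket.gaierror:
        return True
    return False
```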
AC-6: Tier-2 runner harness contract
Given a Jetson Orin Nano Super with the SUT installed via systemd
When ./e2e/jetson/run-tier2.sh --fc-adapter ardupilot --vio-strategy okvis2 --duration 5min is executed
Then the same CSV report format is produced at e2e-results/run-${RUN_ID}/report.csv, with tier=tier2-jetson for every row, and tegrastats + jtop per-sample CSVs land in the evidence bundle.
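Per-sample telemetry extraction for the evidence bundle can be sketched with regular expressions over one tegrastats line. The field layout varies between JetPack releases. The patterns below (`RAM used/totalMB`, `CPU [load%@freq,...]`, `GR3D_FREQ n%`) are therefore assumptions to verify against the Orin Nano's actual output. The dataclass and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

_RAM = re.compile(r"\bRAM (\d+)/(\d+)MB")
_CPU = re.compile(r"\bCPU \[([^\]]*)\]")
_GR3D = re.compile(r"\bGR3D_FREQ (\d+)%")


@dataclass(frozen=True)
class TegraSample:
    ram_used_mb: int
    ram_total_mb: int
    cpu_load_pct: tuple  # one entry per core; None for cores reported as "off"
    gpu_load_pct: Optional[int]


def parse_tegrastats_line(line: str) -> TegraSample:
    """Parse the RAM, per-core CPU and GR3D fields of one tegrastats output line."""
    ram = _RAM.search(line)
    if ram is None:
        raise ValueError(f"no RAM field in tegrastats line: {line!r}")
    loads = []
    cpu = _CPU.search(line)
    if cpu is not None:
        for core in cpu.group(1).split(","):
            core = core.strip()
            if not core:
                continue
            # Active cores look like "15%@1510" (load%@MHz); parked cores print "off".
            loads.append(None if core == "off" else int(core.split("%", 1)[0]))
    gr3d = _GR3D.search(line)
    return TegraSample(
        ram_used_mb=int(ram.group(1)),
        ram_total_mb=int(ram.group(2)),
        cpu_load_pct=tuple(loads),
        gpu_load_pct=int(gr3d.group(1)) if gr3d else None,
    )
```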
AC-7: Fixture builders are reproducible
Given a clean Docker volume state
When tests/fixtures/tile-cache-builder/build.sh runs
Then the same tile-cache-fixture content is produced bit-for-bit twice in a row (same FAISS index, same tile manifest hashes); same idempotency for age-injector and the static JSON fixtures.
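Bit-for-bit reproducibility can be checked by digesting the fixture tree after each build and comparing the results. This sketch assumes that file content and relative paths are what must match; timestamps and permissions are ignored. `tree_digest` is a hypothetical helper name.

```python
import hashlib
from functools import partial
from pathlib import Path


def tree_digest(root: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 over every file's relative path and bytes, visited in sorted path order."""
    h = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        # Length prefixes stop ("ab", "c") and ("a", "bc") from hashing identically.
        h.update(len(rel).to_bytes(8, "big"))
        h.update(rel)
        h.update(path.stat().st_size.to_bytes(8, "big"))
        with path.open("rb") as fh:
            for chunk in iter(partial(fh.read, chunk_size), b""):
                h.update(chunk)
    return h.hexdigest()
```

Running the builder into two fresh volumes and comparing their digests would then be the AC-7 assertion. A FAISS index will only match if the builder's inputs and insertion order are fixed, along with any random seeds it uses.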
AC-8: Parameterization matrix coverage
Given the conftest sets up (fc_adapter, vio_strategy, tier, build_kind) parameterization
When CI runs the standard matrix
Then every produced report row has well-defined values for fc_adapter, vio_strategy, tier; vins_mono rows appear only on build_kind=research runs; Tier-2-only test_ids are SKIP on Tier-1 with the expected reason string.
AC-9: Skips per traceability matrix
Given the e2e-runner starts
When the discoverer encounters a test mapped to AC-7.1, AC-7.2, AC-NEW-5 chamber portion, AC-8.6 scene-change subset, RESTRICT-CAM-2, or RESTRICT-HW-2 chamber portion
Then those test rows show result=SKIP (or XFAIL for AC-8.6 scene-change PARTIAL) with an error_message referencing the traceability-matrix mitigation entry.
Constraints
- Public-boundary discipline: the e2e-runner image MUST NOT import any module from the SUT source tree. The only legal interaction surfaces are MAVLink / MSP2 / HTTP / filesystem — same as a real consumer would have.
- OpenCV pin: the runner image's OpenCV version MUST be `>=4.12.0` (D-CROSS-CVE-1); it is pinned in `e2e/runner/requirements.txt`.
- MAVLink-passkey provenance: the test passkey is a checked-in fixture explicitly labeled "TEST ONLY"; the production passkey path is `/run/secrets/mavlink_passkey` per `environment.md` and is never the test fixture.
- Tier separation: Tier-1 and Tier-2 produce IDENTICAL CSV row formats so the same downstream tooling (badge generators, regression detectors) can consume both.
- No internal state probes: no test may read SUT internal state (GTSAM iSAM2 graph, FAISS in-memory index, internal Python/C++ buffers, logger handles). Only public boundaries + FDR archive + SITL observation are legal evidence sources.
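The passkey-provenance constraint can be enforced when the runner loads its key. This sketch rests on hypothetical choices: the fixture layout (a `#` comment line containing "TEST ONLY" plus one 64-character hex line) and the function name are invented for illustration. The 32-byte length matches the MAVLink 2 signing secret key size.

```python
from pathlib import Path

PRODUCTION_PASSKEY_PATH = Path("/run/secrets/mavlink_passkey")
TEST_ONLY_MARKER = "TEST ONLY"
KEY_LEN = 32  # MAVLink 2 signing secret key length in bytes


def load_test_passkey(path: Path) -> bytes:
    """Load the checked-in test passkey, refusing anything that looks like production."""
    if path.resolve() == PRODUCTION_PASSKEY_PATH:
        raise ValueError("the production passkey path must never be used as a test fixture")
    lines = [ln.strip() for ln in path.read_text(encoding="ascii").splitlines() if ln.strip()]
    comments = [ln for ln in lines if ln.startswith("#")]
    payload = [ln for ln in lines if not ln.startswith("#")]
    if not any(TEST_ONLY_MARKER in ln for ln in comments):
        raise ValueError(f"fixture is not labeled {TEST_ONLY_MARKER!r}")
    if len(payload) != 1:
        raise ValueError("expected exactly one hex key line")
    key = bytes.fromhex(payload[0])  # raises ValueError on non-hex input
    if len(key) != KEY_LEN:
        raise ValueError(f"expected a {KEY_LEN}-byte key, got {len(key)} bytes")
    return key
```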
Risks & Mitigation
Risk 1: Tier-1 runner image build slow / large
- Risk: pulling `tensorrt`, `gtsam`, `faiss-gpu`, `opencv-python>=4.12.0` plus dev dependencies into a single image bloats the e2e-runner build to ≥30 min and ≥10 GB.
- Mitigation: the e2e-runner image is separate from the SUT image; the runner only needs ground-side libs (`pymavlink`, `msp_gps_toy`, `opencv-python`, `numpy`, `scipy`, `geopy`, `pytest`). The SUT image is what carries the heavy ML stack. Keep the runner image lean (target ≤2 GB).
Risk 2: SITL containers flaky / non-deterministic timing
- Risk: `ardupilot-plane-sitl` and `inav-sitl` boot times vary; tests may race the SITL's parameter-init phase.
- Mitigation: a conftest fixture polls SITL readiness via a known parameter read (e.g., `EKF_ENABLE`) before any test runs. Failure to reach readiness within 60 s fails the SITL fixture, not the individual test.
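The readiness poll can be sketched independently of the transport. Here `probe` stands for whatever readiness check the fixture uses, for instance a pymavlink `PARAM_REQUEST_READ` for the chosen parameter followed by `recv_match(type="PARAM_VALUE", ...)`. The function and exception names are hypothetical. Clock and sleep are injectable so the 60 s budget can be unit-tested without waiting.

```python
import time
from typing import Callable


class SitlNotReady(RuntimeError):
    """Raised from the SITL fixture, so pytest reports a setup error, not a test failure."""


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll probe() until it returns True; return the elapsed seconds or raise SitlNotReady."""
    start = clock()
    while True:
        try:
            if probe():
                return clock() - start
        except OSError:
            pass  # SITL port not accepting connections yet: treat as "not ready"
        elapsed = clock() - start
        if elapsed >= timeout_s:
            raise SitlNotReady(f"SITL not ready after {elapsed:.1f} s")
        sleep(min(interval_s, timeout_s - elapsed))
```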
Risk 3: Mock Suite Sat Service drift from D-PROJ-2 contract
- Risk: when the real Suite Sat Service ingest contract ships (D-PROJ-2), the mock may diverge silently.
- Mitigation: the mock's request/response schema is sourced from the contract sketch in
_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md; a contract test in NFT-SEC-01 asserts the mock's accepted-fields match that sketch. When the real endpoint ships, the mock is retired (per F7 in traceability matrix).
Risk 4: --storage-opt size=64g not portable
- Risk: Docker's `--storage-opt size=64g` for volumes requires specific storage drivers (overlay2 with xfs backing), so it may not work on all CI runners.
- Mitigation: a fallback strategy in the compose setup. If the volume cannot be size-capped at the Docker layer, the SUT enforces the cap internally per AC-NEW-3, and NFT-LIM-02 verifies it via volume-size sampling. The CI runner configuration requirement is flagged in the runner README.
Risk 5: cve-jpeg-fixture license / distribution
- Risk: PoC JPEG for CVE-2025-53644 may have unclear redistribution terms.
- Mitigation: license-checked at fixture-import time; if license unclear, the fixture is generated programmatically following the published PoC structure (no copyrighted bytes); generation script is itself part of
tests/fixtures/security/.
Document Dependencies
- `_docs/02_document/tests/environment.md`: full Docker environment spec (services, networks, volumes, secrets, ports)
- `_docs/02_document/tests/test-data.md`: fixture sources, formats, isolation strategy, validation rules
- `_docs/02_document/tests/traceability-matrix.md`: AC coverage map (drives the SKIP / XFAIL rules in conftest)
- `_docs/02_document/tests/blackbox-tests.md` + `performance-tests.md` + `resilience-tests.md` + `security-tests.md` + `resource-limit-tests.md`: list of test categories that the `e2e/tests/<category>/` folders mirror
Excluded
- The SUT (`gps-denied-onboard`) container build: owned by the BUILD-side epics (E-CC-CONF / E-BOOT and per-component epics). The test infrastructure references the SUT image but does NOT build it.
- Individual test scenario implementations (FT-P-*, FT-N-*, NFT-*): owned by the per-scenario tasks decomposed in Step 3.
- The Suite Sat Service real endpoint contract — owned by the parent suite (D-PROJ-2); the mock here mirrors a sketch only.
- The thermal chamber AC-NEW-5 hot-soak run — physical hardware, deferred per traceability matrix.
- The AI-camera fixture (AC-7.x) — out of scope per Phase 1 gate.