autopilot/_docs/02_document/tests/environment.md
Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout
Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 11:02:01 +03:00


Test Environment

Authored by /test-spec Phase 2 (2026-05-19) against:

  • _docs/00_problem/problem.md, acceptance_criteria.md, restrictions.md, security_approach.md
  • _docs/01_solution/solution_draft01.md
  • _docs/02_document/architecture.md (incl. §6 NFR Targets, §7 Detailed Design)
  • _docs/00_problem/input_data/data_parameters.md, services.md, fixtures/README.md, expected_results/results_report.md

Per .cursor/rules/artifact-srp.mdc this artifact owns ONLY the test environment / harness shape — measurable thresholds belong in acceptance_criteria.md, fixture inventory belongs in test-data.md, and per-test specs belong in the sibling *-tests.md files.


Overview

System under test (SUT): autopilot, a single Rust binary that runs on the Jetson Orin Nano Super of a reconnaissance UAV. Its observable external surfaces:

| Surface | Direction | Protocol | Source/Sink in production |
|---|---|---|---|
| Tier-1 detection RPC | autopilot ⇄ detector | bi-directional gRPC streaming (local) | ../detections |
| MAVLink command/telemetry | autopilot ⇄ airframe | MAVLink v2 over UDP (or serial) | ArduPilot / PX4 |
| Camera RTSP feed | camera → autopilot | H.264/265 1080p, 30/60 fps | ViewPro A40 |
| Gimbal control + telemetry | autopilot ⇄ camera | ViewPro vendor UDP | ViewPro A40 |
| Mission + MapObjects REST | autopilot ⇄ central | HTTPS JSON | missions service |
| Operator stream (telemetry out, commands in) | autopilot ⇄ GS | Suite-level modem protocol, signed commands | Ground Station |
| Deep-analysis VLM IPC (optional) | autopilot ⇄ VLM | Unix-domain socket | local-onboard VLM |
| Health endpoint | autopilot → ops | HTTP/JSON | scraped by ops |
| Structured logs | autopilot → ops | JSON to stdout | log shipper |

The harness exercises every one of those surfaces from outside the SUT process. No test reaches inside the binary (no module imports, no direct DB peeks, no shared memory).

Consumer app purpose: a black-box test runner (e2e-consumer) that:

  1. Brings up the SUT in a controlled topology (with mock or live peers).
  2. Drives inputs through public surfaces.
  3. Captures every observable: outbound network frames, MAVLink commands, gimbal UDP commands, REST calls, operator-stream messages, health-endpoint JSON, log lines, plus passive resource metrics (RSS, CPU, GPU).
  4. Compares each observation against the expected result tagged in _docs/00_problem/input_data/expected_results/results_report.md and emits a CSV report.
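Step 4 above can be sketched as a small comparison function. This is a minimal illustration, not the consumer's actual code: the `Verdict` and `CompareMethod` names and the two comparison methods shown are assumptions, since the real list of methods lives in the expected-results documents.

```rust
/// Outcome of comparing one observation with its expected-result row.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Hypothetical comparison methods (the authoritative list is not shown here).
#[derive(Debug, Clone, Copy)]
pub enum CompareMethod {
    /// Actual must equal expected within an absolute tolerance.
    AbsTolerance(f64),
    /// Actual must not exceed the expected upper bound (e.g. a latency budget).
    AtMost,
}

/// Compares a quantitative observation against its expected value.
/// A NaN observation never satisfies either comparison, so it fails.
pub fn compare(actual: f64, expected: f64, method: CompareMethod) -> Verdict {
    let ok = match method {
        CompareMethod::AbsTolerance(tol) => (actual - expected).abs() <= tol,
        CompareMethod::AtMost => actual <= expected,
    };
    if ok {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

Keeping the comparison pure (no I/O) would let the same function serve every tier; the CSV writer only needs the resulting verdict plus the method and tolerance that produced it.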

Test execution tiers

Five execution tiers exist; each test scenario declares which tier(s) it must run in:

| Tier | Purpose | What is real vs mocked | When it runs |
|---|---|---|---|
| U (unit) | Pure in-process logic with no external surface (state-machine transitions, geometry helpers, schema validators) | Everything in-process | Per commit (cargo test) |
| I (component-integration) | One autopilot component against mocks for every peer | SUT component real; all peers stubbed/replayed | Per commit; isolates contract drift |
| B (blackbox / harness) | Full SUT binary against mock peers in containers | SUT binary real; every external peer mocked (HTTPS mock, gRPC replay, MAVLink SITL, scripted operator trace, RTSP loopback) | Per commit + nightly |
| E (suite-e2e) | Full SUT against live siblings (../detections, ../missions, ArduPilot SITL, Ground Station replay) | All real services in the suite-e2e compose | Nightly + pre-release |
| HW (hardware/replay benchmark) | SUT binary on representative Jetson hardware OR on a benchmarked replay of that hardware | Real Jetson Orin Nano Super OR benchmarked replay | Pre-release; the only path that satisfies the acceptance_criteria.md → Acceptance Gates (project-level) hardware gate |

Hardware-dependency analysis (which AC rows require HW vs replay vs commodity) is produced by the test-spec phases/hardware-assessment.md step before Phase 4 runner scripts are generated and is appended to this file as ## Hardware Execution Matrix.

Docker environment (Tier B + E)

The suite-e2e compose lives at the monorepo level (../e2e/docker-compose.suite-e2e.yml, owned by the monorepo-e2e skill — see _docs/00_problem/input_data/services.md). The autopilot-local harness lives at e2e/docker-compose.autopilot-e2e.yml (created by Phase 4) and brings up only the SUT + mocks needed for Tier-B runs.

Services (Tier B — autopilot-local harness)

| Service | Image / Build | Purpose | Ports |
|---|---|---|---|
| autopilot | build: . (cross to aarch64-unknown-linux-gnu for HW, native for Tier B) | SUT | health: 9100/tcp; log: stdout; MAVLink: 14550/udp; gimbal: 9201/udp; operator: 9301/tcp |
| detections-mock | build: e2e/mocks/detections-mock (Python) | Bi-directional gRPC mock replaying recorded Detections streams | 50051/tcp |
| missions-mock | build: e2e/mocks/missions-mock (Python FastAPI) | HTTPS REST mock: GET/POST /missions/{id} + /mapobjects | 8443/tcp (TLS) |
| rtsp-loopback | image: bluenviron/mediamtx | RTSP server playing back recorded .mp4 frame sequences at 30/60 fps | 8554/tcp |
| gimbal-mock | build: e2e/mocks/gimbal-mock (Rust) | ViewPro UDP echo + scripted yaw/pitch/zoom telemetry replays | 9200/udp |
| mavlink-sitl | image: ardupilot/ardupilot-sitl | ArduPilot SITL: MAVLink v2 endpoint for the autopilot to drive | 14551/udp |
| vlm-mock | build: e2e/mocks/vlm-mock (Python, UDS) | Optional Tier-3 VLM IPC mock; replays recorded VlmAssessment JSON | (UDS only) |
| operator-replay | build: e2e/mocks/operator-replay (Python) | Scripted Ground Station session trace: connect / push frame / push telemetry / operator-click / modem-drop / reconnect / lost-link | 9300/tcp |
| time-injector | build: e2e/mocks/time-injector (Rust) | Injects clock-drift / NTP-loss scenarios into the SUT container's clock via faketime LD_PRELOAD shim | |
| e2e-consumer | build: e2e/consumer (Rust + assert crates) | The black-box test runner that drives scenarios + compares observables to expected results | |

Networks

| Network | Services | Purpose |
|---|---|---|
| autopilot-e2e | all | Isolated test network; no egress |

Volumes

| Volume | Mounted to | Purpose |
|---|---|---|
| fixtures-ro | every mock service (read-only) | Mounts _docs/00_problem/input_data/fixtures/ for replay sources |
| expected-ro | e2e-consumer:/expected:ro | Mounts _docs/00_problem/input_data/expected_results/ for assertion comparison |
| reports-rw | e2e-consumer:/reports | CSV + JSON test output |
| autopilot-state | autopilot:/var/lib/autopilot | On-device persistent store (R3, Mp4); wiped between runs |

docker-compose structure (outline only — not runnable)

services:
  autopilot:
    build: .
    depends_on: [detections-mock, missions-mock, rtsp-loopback, gimbal-mock, mavlink-sitl, operator-replay]
    networks: [autopilot-e2e]
    environment:
      DETECTOR_GRPC: detections-mock:50051
      MISSIONS_URL: https://missions-mock:8443
      RTSP_URL: rtsp://rtsp-loopback:8554/feed
      GIMBAL_UDP: gimbal-mock:9200
      MAVLINK_UDP: mavlink-sitl:14551
      OPERATOR_TCP: operator-replay:9300
      VLM_SOCK: /tmp/vlm.sock
      AUTOPILOT_CONFIG: /etc/autopilot/test.toml
    volumes:
      - autopilot-state:/var/lib/autopilot
  detections-mock: { build: e2e/mocks/detections-mock, volumes: [fixtures-ro:/fixtures:ro] }
  missions-mock:   { build: e2e/mocks/missions-mock,   volumes: [fixtures-ro:/fixtures:ro] }
  rtsp-loopback:   { image: bluenviron/mediamtx,       volumes: [fixtures-ro:/fixtures:ro] }
  gimbal-mock:     { build: e2e/mocks/gimbal-mock,     volumes: [fixtures-ro:/fixtures:ro] }
  mavlink-sitl:    { image: ardupilot/ardupilot-sitl }
  vlm-mock:        { build: e2e/mocks/vlm-mock,        volumes: [fixtures-ro:/fixtures:ro] }
  operator-replay: { build: e2e/mocks/operator-replay, volumes: [fixtures-ro:/fixtures:ro] }
  time-injector:   { build: e2e/mocks/time-injector }
  e2e-consumer:
    build: e2e/consumer
    depends_on: [autopilot]
    volumes: [expected-ro:/expected:ro, reports-rw:/reports]
networks:
  autopilot-e2e: {}
volumes:
  fixtures-ro: { driver_opts: { type: none, o: bind, device: ${PWD}/_docs/00_problem/input_data/fixtures } }
  expected-ro: { driver_opts: { type: none, o: bind, device: ${PWD}/_docs/00_problem/input_data/expected_results } }
  reports-rw: {}
  autopilot-state: {}

Suite-e2e compose (Tier E) — referenced, not redefined

For Tier-E runs the harness uses ../e2e/docker-compose.suite-e2e.yml (owned by monorepo-e2e). It adds the real ../detections, real ../missions, and a richer mavlink-sitl configuration. Autopilot's Tier-E entries in this file MUST mirror the suite-e2e topology — drift is reconciled by the monorepo-e2e skill, not here.

Consumer application (e2e-consumer)

Tech stack: Rust + assert_cmd + testcontainers-rs + prost/tonic (for gRPC observation) + mavlink-rs (for MAVLink observation) + reqwest/hyper (for HTTPS observation) + tokio-tungstenite (for operator-stream observation). Tests are organised one-scenario-per-file under e2e/consumer/tests/scenarios/.

Entry point: cargo test --release --test scenarios (orchestrated by scripts/run-tests.sh, produced in Phase 4).

Communication with the system under test

| Interface | Protocol | Endpoint / Topic | Authentication |
|---|---|---|---|
| Health endpoint | HTTP GET | http://autopilot:9100/health | none (loopback) |
| Structured log stream | line-delimited JSON on stdout | docker-compose log tail | none |
| MAVLink observed | MAVLink v2 / UDP | mavlink-sitl:14551 (the harness records both sides of the link) | per Q6: MAVLink-2 message signing if configured |
| Gimbal observed | ViewPro UDP | gimbal-mock:9200 (commands recorded + telemetry replayed) | none |
| RTSP delivered | RTSP | rtsp://rtsp-loopback:8554/feed (consumer schedules which clip plays per scenario) | none |
| Detection RPC observed | gRPC streaming | detections-mock:50051 (consumer scripts the recorded replay served) | none |
| Mission REST observed | HTTPS | missions-mock:8443 (consumer scripts JSON fixtures + asserts captured request bodies) | TLS cert (self-signed for test) |
| Operator stream observed | Suite modem protocol | operator-replay:9300 (consumer scripts session traces + signed-command envelopes) | per Q9: signed envelope (HMAC / ed25519 / MAVLink-2-ext) |
| VLM IPC observed (when enabled) | Unix-domain socket | /tmp/vlm.sock shared with vlm-mock | peer-credential check (security_approach §"Local IPC peer authorisation") |

What the consumer does NOT have access to

  • No direct database access to the autopilot's on-device persistent store (autopilot-state volume) — the consumer reads it only via the health endpoint, the operator telemetry stream, or as a post-run forensic check (the storage AC R3 is checked via the BIT health response, not by peeking at SQLite rows).
  • No internal Rust module imports — the consumer is a separate crate compiled against published public proto/schema files only.
  • No shared memory, no /proc/$pid/... inspection beyond passive resource metrics.
  • No direct reading of in-flight POI queue ordering — ordering is observed indirectly via the operator-stream emission order and the gimbal command stream.

External dependency mocks

| Dependency | Mock service | Determinism guarantee | Source fixture(s) |
|---|---|---|---|
| ../detections Tier-1 RPC | detections-mock | Replays recorded Detections stream byte-for-byte; same input → same output | `<DEFERRED: tier1_replay/*.replay; services.md §1>` (live ../detections used as fallback in Tier-E) |
| missions API | missions-mock | Static JSON responses per scenario; recorded round-trip captured for POST | `<DEFERRED: missions_fixtures/*.json; services.md §2>` |
| ViewPro A40 camera frames | rtsp-loopback (mediamtx) | Plays back .mp4 at exact configured fps; frame timestamps deterministic | fixtures/videos/94d42580bd1ad6ff.mp4, fixtures/movement/video0[1-4].mp4 |
| ViewPro A40 gimbal control | gimbal-mock | Replays gimbal.csv per scenario; echoes commands with bounded latency budget per scenario | `<DEFERRED: gimbal_csv/*.csv paired with movement videos; services.md §6>` |
| ArduPilot airframe | mavlink-sitl (ArduPilot SITL) | Deterministic seed + scripted mission | scripted per scenario; no fixture file required for Tier B (SITL is the fixture) |
| Ground Station modem session | operator-replay | Replays (t, event) script per scenario | `<DEFERRED: operator_sessions/*.script; services.md §3>` |
| Local VLM (Tier-3 optional) | vlm-mock | Returns paired (roi.png → VlmAssessment) from disk; schema-violation fixtures for fail-closed tests | `<DEFERRED: vlm_io_pairs/*.json; services.md §7>` |
| Wall-clock / GPS / NTP | time-injector (faketime LD_PRELOAD) | Scripted offset / jump / source-loss; injected at SUT process start | scripted per scenario; no fixture file required |

Mocks that are marked <DEFERRED:> are bridged through _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md. Scenarios that consume those mocks declare Test status: DEFERRED — input fixture not yet acquired (see leftover row N) in their entry under the relevant *-tests.md file.
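The deferral rule can be sketched as a check on the fixture cell of a mock: if the cell carries a `<DEFERRED...>` marker, the scenario reports DEFERRED instead of running. The function names and the `READY` status string below are hypothetical; only the marker syntax comes from the table above.

```rust
/// Returns the note inside a `<DEFERRED: ...>` marker, or `None` when the
/// fixture cell names a concrete fixture. A bare `<DEFERRED>` yields `Some("")`.
pub fn deferred_note(fixture_cell: &str) -> Option<&str> {
    const TAG: &str = "<DEFERRED";
    let start = fixture_cell.find(TAG)?;
    let rest = &fixture_cell[start + TAG.len()..];
    let end = rest.find('>')?;
    Some(rest[..end].trim_start_matches(':').trim())
}

/// Hypothetical status mapping: a deferred fixture means the scenario is
/// reported as DEFERRED rather than executed.
pub fn scenario_status(fixture_cell: &str) -> &'static str {
    if deferred_note(fixture_cell).is_some() {
        "DEFERRED"
    } else {
        "READY"
    }
}
```

For example, `scenario_status("<DEFERRED: vlm_io_pairs/*.json; services.md §7>")` yields `"DEFERRED"`, and the extracted note can be copied into the `failure_reason` column of the report.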

CI/CD integration

| Stage | Tier(s) | When | Gate | Timeout |
|---|---|---|---|---|
| PR pipeline | U, I | on every PR push | block merge on FAIL | 10 min |
| dev-branch nightly | U, I, B | nightly | warn on FAIL; report attached | 60 min |
| weekly suite-e2e | U, I, B, E | weekly + on release branch | block release on FAIL | 180 min |
| pre-release HW benchmark | HW | manual + pre-release | block release on FAIL | 240 min |

Owned in _docs/02_document/deployment/ci_cd_pipeline.md. This file only declares which tier each scenario MUST run in; the pipeline orchestration is documented there.

Reporting

Format: CSV (one row per scenario per run).

Columns:

| Column | Type | Notes |
|---|---|---|
| test_id | string | e.g. FT-P-001, NFT-PERF-L1, NFT-SEC-O9 |
| test_name | string | short title from the scenario header |
| tier | enum | U / I / B / E / HW |
| seed | int | deterministic seed used (where applicable) |
| start_ts_utc | ISO 8601 | scenario start |
| duration_ms | int | total execution time |
| result | enum | PASS / FAIL / SKIP / DEFERRED |
| expected_result_ref | string | row id in expected_results/results_report.md (e.g. L1, Mp3) |
| actual_value | string | quantitative observation (latency_ms, count, etc.) |
| compare_method | string | one of expected-results.md methods |
| tolerance | string | as declared in the expected-results row |
| failure_reason | string | populated only on FAIL or DEFERRED |
| artifacts_path | string | path under `/reports/<run-id>/` for captured logs / pcaps / mavlink dumps |

Output path: e2e/consumer/reports/<run-id>/report.csv (mounted host-side to ./reports/<run-id>/report.csv).
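One way the report row could be modelled and serialised is sketched below. The column names and order follow the Columns table; the `ReportRow` struct, the `csv_escape` helper, and the choice of RFC 4180 style quoting are assumptions made for illustration (a real consumer might use a CSV crate instead).

```rust
/// CSV header, in the column order declared for the report.
pub const HEADER: &str = "test_id,test_name,tier,seed,start_ts_utc,duration_ms,result,\
expected_result_ref,actual_value,compare_method,tolerance,failure_reason,artifacts_path";

/// One report row (hypothetical struct; enum-like columns kept as strings).
pub struct ReportRow {
    pub test_id: String,
    pub test_name: String,
    pub tier: String,
    pub seed: Option<u64>, // empty cell when no seed applies
    pub start_ts_utc: String,
    pub duration_ms: u64,
    pub result: String,
    pub expected_result_ref: String,
    pub actual_value: String,
    pub compare_method: String,
    pub tolerance: String,
    pub failure_reason: String,
    pub artifacts_path: String,
}

/// Quotes a field when it contains a comma, a double quote, or a line break.
pub fn csv_escape(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

impl ReportRow {
    /// Serialises the row as one CSV line (no trailing newline).
    pub fn to_csv_line(&self) -> String {
        let seed = self.seed.map(|s| s.to_string()).unwrap_or_default();
        let duration = self.duration_ms.to_string();
        let cells: [&str; 13] = [
            &self.test_id, &self.test_name, &self.tier, &seed,
            &self.start_ts_utc, &duration, &self.result,
            &self.expected_result_ref, &self.actual_value,
            &self.compare_method, &self.tolerance,
            &self.failure_reason, &self.artifacts_path,
        ];
        cells.iter().map(|c| csv_escape(c)).collect::<Vec<_>>().join(",")
    }
}
```

Quoting matters mainly for `test_name` and `failure_reason`, which are free text and may contain commas.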

Sidecar artifacts per scenario (one folder per test_id): stdout.log, stderr.log, mavlink.tlog (where applicable), pcap.bin (where applicable), health-trace.jsonl, actual-output.json.
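The per-scenario folder layout can be sketched with `std::path`. The file names are the ones listed above; the function names and the `/reports` root used in the example are illustrative assumptions.

```rust
use std::path::{Path, PathBuf};

/// Sidecar files captured per scenario; some apply only to certain scenarios.
pub const SIDECAR_FILES: [&str; 6] = [
    "stdout.log",
    "stderr.log",
    "mavlink.tlog",
    "pcap.bin",
    "health-trace.jsonl",
    "actual-output.json",
];

/// Folder holding one scenario's artifacts: <reports_root>/<run_id>/<test_id>.
pub fn scenario_dir(reports_root: &Path, run_id: &str, test_id: &str) -> PathBuf {
    reports_root.join(run_id).join(test_id)
}

/// Full paths of every sidecar file for one scenario.
pub fn sidecar_paths(reports_root: &Path, run_id: &str, test_id: &str) -> Vec<PathBuf> {
    let dir = scenario_dir(reports_root, run_id, test_id);
    SIDECAR_FILES.iter().map(|name| dir.join(name)).collect()
}
```

The value written to the `artifacts_path` column would then be the result of `scenario_dir` for that row.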

Test Execution

Decision (recorded 2026-05-19 by phases/hardware-assessment.md): local-only on Jetson Orin Nano Super. Every scenario — Tier B, Tier E, Tier HW — runs on representative Jetson hardware (the same hardware the airborne payload deploys to). Docker is used for service orchestration (mocks, sibling services) on the Jetson host, NOT for SUT execution on x86.

Hardware dependencies found

| File | Dependency surfaced |
|---|---|
| _docs/00_problem/restrictions.md → "Hardware" | Jetson Orin Nano Super (aarch64), 8 GB shared LPDDR5, 67 TOPS INT8; ViewPro A40 (40× optical zoom + vendor UDP); ViewPro Z40K compatibility |
| _docs/00_problem/restrictions.md → "Software environment" | FP16 precision (INT8 rejected); no cloud egress; Tier 1 + local large models share Jetson GPU with mutual exclusion |
| _docs/01_solution/solution_draft01.md | "single Rust binary on Jetson Orin Nano Super (aarch64)"; TensorRT FP16; Tokio + Unix-domain-socket VLM IPC |
| _docs/02_document/architecture.md §6 (NFR Targets) + §7.6 (Solution Architecture) + §7.14 (Tech Stack) | cross-compile target aarch64-unknown-linux-gnu; TensorRT engine; gimbal UDP; MAVLink-v2 transport |
| _docs/02_document/components/*/description.md (13 components) | physical UDP (gimbal_controller), RTSP capture (frame_ingest), MAVLink airframe link (mavlink_layer), local-onboard model (semantic_analyzer + vlm_client) |

Why local-only on Jetson

The choice rejects two alternatives:

  • Docker-only on x86 would leave Tier-HW rows (L1–L9, Re1, Re2, NFT-RES-LIM-CPU, NFT-RES-LIM-GPU) SKIPPED-NO-HW. That defeats the project-level Acceptance Gate (acceptance_criteria.md → "Acceptance Gates (project-level)": every latency criterion MUST be measured on the deployed compute device).
  • Both x86 + Jetson would split the test surface and let Tier-B scenarios pass on x86 while masking real-hardware regressions (e.g. GPU contention is invisible on x86). The honest path is to exercise the actual hardware path uniformly.

Execution instructions (local on Jetson)

Prerequisites (one-time, per Jetson runner):

  • JetPack 6.x SDK + L4T r36.x (matches the airborne deployment image).
  • Rust toolchain pinned to the workspace's rust-toolchain.toml (added by Step 7 Implement); aarch64-unknown-linux-gnu is the native host target on the Jetson, so no extra rustup target needs to be added.
  • Docker + Docker Compose v2 (for orchestrating the mock services + sibling repos in Tier-E mode).
  • mavlink-router, tegrastats, iperf3, tc (network shaping).
  • ViewPro A40 (or Z40K for the Z40K-swap regression run) connected over Ethernet at the documented control endpoint.
  • ArduPilot SITL binary installed natively (the Docker image is x86-only; on Jetson aarch64 we run SITL natively or via Apptainer).
  • A representative ViewPro A40 RTSP feed source — either the physical camera or a recorded .mp4 looped through a local mediamtx.

How to start services: docker compose -f e2e/docker-compose.autopilot-e2e.yml up -d brings up detections-mock, missions-mock, rtsp-loopback, gimbal-mock, vlm-mock, operator-replay, time-injector on the Jetson host. The SUT (autopilot binary) runs outside the compose — cargo run --release on the Jetson directly, with env vars pointing at the compose-side mock endpoints. For Tier E, swap detections-mock → live ../detections and missions-mock → live missions per ../e2e/docker-compose.suite-e2e.yml.
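Because the SUT runs on the host rather than inside the compose network, its endpoint variables have to point at host-reachable mock addresses. The sketch below builds that environment and the launch command. The variable names and port numbers are taken from the compose outline; the assumption that the compose file publishes the mock ports on `mock_host`, and the helper names themselves, are illustrative.

```rust
use std::process::Command;

/// Endpoint variables for an SUT started outside the compose.
/// `mock_host` is where the mock ports are assumed to be published
/// (for example 127.0.0.1 on the Jetson host).
pub fn sut_env(mock_host: &str) -> Vec<(&'static str, String)> {
    vec![
        ("DETECTOR_GRPC", format!("{mock_host}:50051")),
        ("MISSIONS_URL", format!("https://{mock_host}:8443")),
        ("RTSP_URL", format!("rtsp://{mock_host}:8554/feed")),
        ("GIMBAL_UDP", format!("{mock_host}:9200")),
        ("MAVLINK_UDP", format!("{mock_host}:14551")),
        ("OPERATOR_TCP", format!("{mock_host}:9300")),
        ("VLM_SOCK", "/tmp/vlm.sock".to_string()),
    ]
}

/// Builds (but does not spawn) the `cargo run --release` command for the SUT.
pub fn sut_command(mock_host: &str, config_path: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["run", "--release"]);
    cmd.envs(sut_env(mock_host));
    cmd.env("AUTOPILOT_CONFIG", config_path);
    cmd
}
```

A runner would call `sut_command("127.0.0.1", "/etc/autopilot/test.toml").spawn()` after the compose services are up; for Tier E only the detector and missions entries would change.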

How to run the test runner: scripts/run-tests.sh (to be created by a Decompose task per traceability-matrix.md → "Phase 4 SKIPPED" handoff) orchestrates: bring up compose → start SUT → run cargo test --release --test scenarios -p e2e-consumer → tear down. The runner reads RUN_TIER ∈ {B, E, HW} to decide which scenarios to execute.
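The orchestration order can be sketched as a small driver that always tears down, even when an earlier step fails. This is an assumption about how scripts/run-tests.sh might behave, not a description of it; the `Step` enum and `run_session` function are hypothetical, and the executor is passed in as a closure so the ordering can be exercised without Docker.

```rust
/// Orchestration steps, in the order described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    ComposeUp,
    StartSut,
    RunScenarios,
    StopSut,
    ComposeDown,
}

/// Runs the session through `exec`. Teardown steps run regardless of earlier
/// failures, and the first error encountered is the one returned.
pub fn run_session<E>(mut exec: impl FnMut(Step) -> Result<(), E>) -> Result<(), E> {
    let mut outcome = exec(Step::ComposeUp);
    let mut sut_started = false;
    if outcome.is_ok() {
        outcome = exec(Step::StartSut);
        sut_started = outcome.is_ok();
    }
    if outcome.is_ok() {
        outcome = exec(Step::RunScenarios);
    }
    // `Result::and` evaluates its argument eagerly, so each teardown step
    // executes even when `outcome` already holds an error.
    if sut_started {
        outcome = outcome.and(exec(Step::StopSut));
    }
    outcome.and(exec(Step::ComposeDown))
}
```

In a real runner the closure would map `ComposeUp` to the `docker compose ... up -d` call, `RunScenarios` to `cargo test --release --test scenarios -p e2e-consumer`, and so on.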

Environment variables (consumed by both the SUT and the consumer):

  • RUN_TIER (B | E | HW) — selects scenario set per the matrix below.
  • AUTOPILOT_CONFIG — path to the test profile TOML (overrides per-scenario thresholds + Q-tagged defaults).
  • AUTOPILOT_RNG_SEED — deterministic-seed per scenario; captured in the CSV report.
  • JETSON_RUNNER_ID — identifier for the physical Jetson + camera+gimbal hardware combo; carried into every CSV row for forensic comparison across runners.
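A sketch of how the consumer might interpret `RUN_TIER` follows. The accepted values (B, E, HW) come from the list above; the `RunTier` type, the error type, and the selection rule (a scenario runs when its declared tiers include the requested one) are assumptions.

```rust
use std::str::FromStr;

/// Tier requested through the RUN_TIER environment variable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunTier {
    B,
    E,
    Hw,
}

impl FromStr for RunTier {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "B" => Ok(RunTier::B),
            "E" => Ok(RunTier::E),
            "HW" => Ok(RunTier::Hw),
            other => Err(format!("RUN_TIER must be B, E, or HW; got {other:?}")),
        }
    }
}

/// Reads and parses RUN_TIER; an unset variable is reported as an error.
pub fn run_tier_from_env() -> Result<RunTier, String> {
    std::env::var("RUN_TIER")
        .map_err(|e| e.to_string())?
        .parse::<RunTier>()
}

/// Keeps the ids of scenarios whose declared tiers include `run_tier`.
pub fn select<'a>(run_tier: RunTier, scenarios: &[(&'a str, Vec<RunTier>)]) -> Vec<&'a str> {
    scenarios
        .iter()
        .filter(|(_, tiers)| tiers.contains(&run_tier))
        .map(|(id, _)| *id)
        .collect()
}
```

Rejecting unknown values outright, rather than defaulting to B, avoids silently running the wrong scenario set on a hardware runner.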

CI/CD addendum (overrides the earlier ## CI/CD integration table)

The earlier table assumed a Docker-on-x86 PR pipeline. Under this decision, every tier runs on a Jetson runner. Operationally that means:

| Stage | Tier(s) | When | Gate | Timeout | Runner |
|---|---|---|---|---|---|
| PR pipeline | U, I | on every PR push | block merge on FAIL | 10 min | Jetson runner (native cargo test for U + I) |
| dev-branch nightly | U, I, B | nightly | warn on FAIL; report attached | 60 min | Jetson runner |
| weekly suite-e2e | U, I, B, E | weekly + on release branch | block release on FAIL | 180 min | Jetson runner + live siblings reachable from it |
| pre-release HW benchmark | HW | manual + pre-release | block release on FAIL | 240 min | Jetson runner + physical A40 + airframe SITL/HW |

Capacity note: the PR pipeline running on Jetson trades x86 throughput for execution honesty. If PR latency becomes painful, the team's mitigation is to add more Jetson runners — NOT to fall back to x86 for Tier B (that would defeat the choice).
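The stage table can be expressed as data so that a runner script derives its gate behaviour from one place. The stage names, tiers, gates, and timeouts below are copied from the table; the `Stage` struct and the exit-code convention (0 keeps the pipeline green, 1 blocks) are assumptions.

```rust
/// What a FAIL means for a pipeline stage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gate {
    BlockMerge,
    Warn,
    BlockRelease,
}

/// One row of the CI/CD stage table.
#[derive(Debug, Clone, Copy)]
pub struct Stage {
    pub name: &'static str,
    pub tiers: &'static [&'static str],
    pub gate: Gate,
    pub timeout_min: u32,
}

pub const STAGES: [Stage; 4] = [
    Stage { name: "PR pipeline", tiers: &["U", "I"], gate: Gate::BlockMerge, timeout_min: 10 },
    Stage { name: "dev-branch nightly", tiers: &["U", "I", "B"], gate: Gate::Warn, timeout_min: 60 },
    Stage { name: "weekly suite-e2e", tiers: &["U", "I", "B", "E"], gate: Gate::BlockRelease, timeout_min: 180 },
    Stage { name: "pre-release HW benchmark", tiers: &["HW"], gate: Gate::BlockRelease, timeout_min: 240 },
];

/// Hypothetical exit-code convention: a warn-only stage never fails the
/// pipeline; a blocking stage fails it as soon as one scenario fails.
pub fn exit_code(stage: &Stage, failed_scenarios: usize) -> i32 {
    match (failed_scenarios, stage.gate) {
        (0, _) => 0,
        (_, Gate::Warn) => 0,
        _ => 1,
    }
}
```

Under this convention the nightly stage still attaches its report on failure but exits 0, matching the "warn on FAIL" gate.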

Hardware Execution Matrix

Per the local-only-on-Jetson decision, every tier runs on Jetson. The matrix below is collapsed accordingly: it records what each scenario actually exercises on the Jetson (which hardware surface is the load-bearing one) so that a runner-capacity planner can predict which scenarios contend for the same physical resource.

| Scenario | Tier | Jetson surface exercised | Concurrent-with constraint |
|---|---|---|---|
| FT-P-001 (D6 Tier-1 contract) | B + E | GPU (Tier 1 inference) | conflicts with NFT-RES-LIM-Re2 / GPU |
| FT-P-002 to FT-P-006 (D1–D5) | E + HW | GPU (Tier 1 inference) | as above |
| FT-P-007 to FT-P-010 (M1–M4) | B + E | CPU (movement) + GPU (Tier 1 inputs) | as above |
| FT-P-011 to FT-P-015 (S1–S5) | B + E | CPU + gimbal UDP + GPU (Tier 3 in S5) | gimbal contention serialises S1/S2/S3 |
| FT-P-016 to FT-P-022 (O1–O7, O8 happy) | B + E | CPU + operator-stream | low contention |
| FT-P-023 (R1 BIT pass) | B + E | every dep mocked | none |
| FT-N-001 to FT-N-002 (R2/R3) | B + E | none (storage seed manipulation) | none |
| FT-N-003 (Mp2 cache-fallback) | B + E | mock timeout on missions-mock | none |
| FT-N-004 (O4 below-threshold) | B | CPU only | none |
| FT-P-024 / FT-P-025 / FT-P-026 (Mp1/Mp3/Mp5) | B + E | network + persistent store | persistent-store contention serialises |
| NFT-PERF-L1 | HW | GPU (Tier 1) | dedicate runner (measurement integrity) |
| NFT-PERF-L2 | HW + B | GPU (Tier 2) | conflicts with L1/L3/L8; serialise |
| NFT-PERF-L3 | HW + B (vlm-mock) | GPU (Tier 3 VLM) | conflicts with L1/L2; serialise |
| NFT-PERF-L4 | HW | A40 physical zoom motor | dedicate runner (physical motion) |
| NFT-PERF-L5 | HW + B | CPU + gimbal UDP | serialise with L4/L8 |
| NFT-PERF-L6 / L7 | B + E | CPU + ego-motion + GPU (Tier 1 inputs) | serialise with L1 |
| NFT-PERF-L8 | HW + B | A40 physical zoom + Tier 1 GPU | dedicate runner |
| NFT-PERF-L9 | B + E | CPU + operator-stream | low contention |
| NFT-PERF-T1 | B | CPU + queue | none |
| NFT-PERF-T2 | B + E | airframe link | low |
| NFT-PERF-T3 | B | RTSP throttling + health | none |
| NFT-RES-R4–R9 | B + E | airframe link + persistent store | serialise per-mission |
| NFT-RES-Mp2 / Mp4 | B + E | network + persistent store | low |
| NFT-SEC-O9 / O10 | B + E | operator-stream + crypto path | low |
| NFT-SEC-CraftedFrame / OversizeCrop | B | decoder CPU | low |
| NFT-SEC-VlmSchemaViolation / FreeFormText | B (vlm-mock) | UDS IPC | low |
| NFT-SEC-IpcPeerAuth | B | UDS IPC + peer-cred | low |
| NFT-SEC-Tier1SchemaViolation | B | Tier-1 RPC | none |
| NFT-SEC-MavlinkUnsigned | B + E | airframe link (Q6 dep) | low |
| NFT-SEC-HealthExposesSecurity | B | counters + health | low |
| NFT-RES-LIM-Re1 | HW | full Jetson workload (RSS) | dedicate runner (measurement integrity) |
| NFT-RES-LIM-Re2 | HW | Tier 1 + autopilot workload concurrent | runs back-to-back with NFT-PERF-L1 in same session |
| NFT-RES-LIM-Storage | B + HW | persistent store | low |
| NFT-RES-LIM-CPU | HW | full CPU | dedicate runner |
| NFT-RES-LIM-GPU | HW | GPU mutex (Tier 1 vs Tier 3) | dedicate runner |
| NFT-RES-LIM-FileHandles | B + HW | `/proc/<pid>/fd` | low |

Bold Tier values mark scenarios that REQUIRE physical Jetson + (sometimes) physical A40 to satisfy the project-level Acceptance Gate; surrogate replay does NOT count for those rows.

Capacity rule: scenarios marked dedicate runner MUST NOT run concurrently with any other scenario on the same Jetson — measurement integrity depends on the workload being exclusively the SUT.
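The capacity rule can be sketched as a batch planner: every "dedicate runner" scenario gets a batch of its own, and the remaining scenarios share one. This is a simplified illustration with hypothetical names; it deliberately ignores the finer "serialise with ..." constraints in the matrix, which a real planner would also need to honour.

```rust
/// Whether a scenario needs the Jetson to itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Contention {
    /// Marked "dedicate runner" in the matrix.
    Dedicated,
    /// May share the runner with other scenarios.
    Shared,
}

/// Splits scenarios into batches that may run on one Jetson at a time.
/// Dedicated scenarios come first, each alone and in input order; all
/// shared scenarios follow in a single final batch.
pub fn plan_batches<'a>(scenarios: &[(&'a str, Contention)]) -> Vec<Vec<&'a str>> {
    let mut batches = Vec::new();
    let mut shared = Vec::new();
    for (id, contention) in scenarios {
        match contention {
            Contention::Dedicated => batches.push(vec![*id]),
            Contention::Shared => shared.push(*id),
        }
    }
    if !shared.is_empty() {
        batches.push(shared);
    }
    batches
}
```

Running the dedicated batches first keeps the measurement-sensitive scenarios away from any residual load left by the shared batch.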

Open dependencies that affect the harness

| Open Q | Affects | Default until resolved |
|---|---|---|
| Q6 (MAVLink-2 signing) | mavlink-sitl config + observed-MAVLink assertions | signing disabled; tests skip signing assertions until Q6 lands |
| Q8 (MapObjects conflict resolution) | Mp5 fixture shape | `<DEFERRED>` |
| Q9 (Operator-command auth scheme) | operator-replay envelope format + signature validator | `<DEFERRED>` for O9/O10; O8 runs the happy path only |
| Q11 (multi-operator session policy) | operator-replay session-id semantics | single-operator only |
| Q14 (movement-detection classical vs learned-CV) | M4 benchmark fixture shape | `<DEFERRED>` |