Test Environment
Authored by /test-spec Phase 2 (2026-05-19) against:
- _docs/00_problem/problem.md, acceptance_criteria.md, restrictions.md, security_approach.md
- _docs/01_solution/solution_draft01.md
- _docs/02_document/architecture.md (incl. §6 NFR Targets, §7 Detailed Design)
- _docs/00_problem/input_data/data_parameters.md, services.md, fixtures/README.md, expected_results/results_report.md
Per .cursor/rules/artifact-srp.mdc this artifact owns ONLY the test environment / harness shape — measurable thresholds belong in acceptance_criteria.md, fixture inventory belongs in test-data.md, and per-test specs belong in the sibling *-tests.md files.
Overview
System under test (SUT): autopilot, a single Rust binary that runs on the Jetson Orin Nano Super of a reconnaissance UAV. Its observable external surfaces:
| Surface | Direction | Protocol | Source/Sink in production |
|---|---|---|---|
| Tier-1 detection RPC | autopilot ⇄ detector | bi-directional gRPC streaming (local) | ../detections |
| MAVLink command/telemetry | autopilot ⇄ airframe | MAVLink v2 over UDP (or serial) | ArduPilot / PX4 |
| Camera RTSP feed | camera → autopilot | H.264/265 1080p, 30/60 fps | ViewPro A40 |
| Gimbal control + telemetry | autopilot ⇄ camera | ViewPro vendor UDP | ViewPro A40 |
| Mission + MapObjects REST | autopilot ⇄ central | HTTPS JSON | missions service |
| Operator stream (telemetry out, commands in) | autopilot ⇄ GS | Suite-level modem protocol, signed commands | Ground Station |
| Deep-analysis VLM IPC (optional) | autopilot ⇄ VLM | Unix-domain socket | local-onboard VLM |
| Health endpoint | autopilot → ops | HTTP/JSON | scraped by ops |
| Structured logs | autopilot → ops | JSON to stdout | log shipper |
The harness exercises every one of those surfaces from outside the SUT process. No test reaches inside the binary (no module imports, no direct DB peeks, no shared memory).
Consumer app purpose: a black-box test runner (e2e-consumer) that:
- Brings up the SUT in a controlled topology (with mock or live peers).
- Drives inputs through public surfaces.
- Captures every observable: outbound network frames, MAVLink commands, gimbal UDP commands, REST calls, operator-stream messages, health-endpoint JSON, log lines, plus passive resource metrics (RSS, CPU, GPU).
- Compares each observation against the expected result tagged in _docs/00_problem/input_data/expected_results/results_report.md and emits a CSV report.
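The comparison step in the last bullet can be sketched as follows. This is a minimal illustration, not the consumer's implementation (the real consumer is a Rust crate; Python is used here only for brevity). The function name `compare` and the method names `exact`, `max`, and `abs_tol` are assumptions; the authoritative set of methods is whatever the expected-results rows declare.

```python
import math


def compare(actual, expected, method, tolerance=0.0):
    """Return True when `actual` satisfies the expected-results row.

    The method names below are illustrative assumptions, not the
    project's declared list of compare methods.
    """
    if method == "exact":
        return actual == expected
    if method == "max":
        # Upper bound, e.g. a latency budget in milliseconds.
        return actual <= expected
    if method == "abs_tol":
        # Symmetric absolute tolerance around the expected value.
        return math.isclose(actual, expected, rel_tol=0.0, abs_tol=tolerance)
    raise ValueError(f"unknown compare method: {method}")
```

Raising on an unknown method (rather than returning False) keeps a typo in an expected-results row from being reported as an ordinary test failure.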
Test execution tiers
Five execution tiers exist; each test scenario declares which tier(s) it must run in:
| Tier | Purpose | What is real vs mocked | When it runs |
|---|---|---|---|
| U — unit | Pure in-process logic with no external surface (state-machine transitions, geometry helpers, schema validators) | Everything in-process | Per commit (cargo test) |
| I — component-integration | One autopilot component against mocks for every peer | SUT component real; all peers stubbed/replayed | Per commit; isolates contract drift |
| B — blackbox / harness | Full SUT binary against mock peers in containers | SUT binary real; every external peer mocked (HTTPS mock, gRPC replay, MAVLink SITL, scripted operator trace, RTSP loopback) | Per commit + nightly |
| E — suite-e2e | Full SUT against live siblings (../detections, ../missions, ArduPilot SITL, Ground Station replay) | All real services in the suite-e2e compose | Nightly + pre-release |
| HW — hardware/replay benchmark | SUT binary on representative Jetson hardware OR on a benchmarked replay of that hardware | Real Jetson Orin Nano Super OR benchmarked replay | Pre-release; the only path that satisfies the acceptance_criteria.md → Acceptance Gates (project-level) hardware gate |
Hardware-dependency analysis (which AC rows require HW vs replay vs commodity) is produced by the test-spec phases/hardware-assessment.md step before Phase 4 runner scripts are generated and is appended to this file as ## Hardware Execution Matrix.
Docker environment (Tier B + E)
The suite-e2e compose lives at the monorepo level (../e2e/docker-compose.suite-e2e.yml, owned by the monorepo-e2e skill — see _docs/00_problem/input_data/services.md). The autopilot-local harness lives at e2e/docker-compose.autopilot-e2e.yml (created by Phase 4) and brings up only the SUT + mocks needed for Tier-B runs.
Services (Tier B — autopilot-local harness)
| Service | Image / Build | Purpose | Ports |
|---|---|---|---|
| autopilot | build: . (cross to aarch64-unknown-linux-gnu for HW, native for Tier B) | SUT | health: 9100/tcp; log: stdout; MAVLink: 14550/udp; gimbal: 9201/udp; operator: 9301/tcp |
| detections-mock | build: e2e/mocks/detections-mock (Python) | Bi-directional gRPC mock replaying recorded Detections streams | 50051/tcp |
| missions-mock | build: e2e/mocks/missions-mock (Python FastAPI) | HTTPS REST mock: GET/POST /missions/{id} + /mapobjects | 8443/tcp (TLS) |
| rtsp-loopback | image: bluenviron/mediamtx | RTSP server playing back recorded .mp4 frame sequences at 30/60 fps | 8554/tcp |
| gimbal-mock | build: e2e/mocks/gimbal-mock (Rust) | ViewPro UDP echo + scripted yaw/pitch/zoom telemetry replays | 9200/udp |
| mavlink-sitl | image: ardupilot/ardupilot-sitl | ArduPilot SITL: MAVLink v2 endpoint for the autopilot to drive | 14551/udp |
| vlm-mock | build: e2e/mocks/vlm-mock (Python, UDS) | Optional Tier-3 VLM IPC mock; replays recorded VlmAssessment JSON | (UDS only) |
| operator-replay | build: e2e/mocks/operator-replay (Python) | Scripted Ground Station session trace: connect / push frame / push telemetry / operator-click / modem-drop / reconnect / lost-link | 9300/tcp |
| time-injector | build: e2e/mocks/time-injector (Rust) | Injects clock-drift / NTP-loss scenarios into the SUT container's clock via faketime LD_PRELOAD shim | none |
| e2e-consumer | build: e2e/consumer (Rust + assert crates) | The black-box test runner that drives scenarios + compares observables to expected results | none |
Networks
| Network | Services | Purpose |
|---|---|---|
| autopilot-e2e | all | Isolated test network; no egress |
Volumes
| Volume | Mounted to | Purpose |
|---|---|---|
| fixtures-ro | every mock service (read-only) | Mounts _docs/00_problem/input_data/fixtures/ for replay sources |
| expected-ro | e2e-consumer:/expected:ro | Mounts _docs/00_problem/input_data/expected_results/ for assertion comparison |
| reports-rw | e2e-consumer:/reports | CSV + JSON test output |
| autopilot-state | autopilot:/var/lib/autopilot | On-device persistent store (R3, Mp4); wiped between runs |
docker-compose structure (outline only — not runnable)
```yaml
services:
  autopilot:
    build: .
    depends_on: [detections-mock, missions-mock, rtsp-loopback, gimbal-mock, mavlink-sitl, operator-replay]
    networks: [autopilot-e2e]
    environment:
      # Every peer endpoint points at a mock service on the test network.
      DETECTOR_GRPC: detections-mock:50051
      MISSIONS_URL: https://missions-mock:8443
      RTSP_URL: rtsp://rtsp-loopback:8554/feed
      GIMBAL_UDP: gimbal-mock:9200
      MAVLINK_UDP: mavlink-sitl:14551
      OPERATOR_TCP: operator-replay:9300
      VLM_SOCK: /tmp/vlm.sock
      AUTOPILOT_CONFIG: /etc/autopilot/test.toml
    volumes:
      - autopilot-state:/var/lib/autopilot
  detections-mock: { build: e2e/mocks/detections-mock, volumes: ["fixtures-ro:/fixtures:ro"] }
  missions-mock: { build: e2e/mocks/missions-mock, volumes: ["fixtures-ro:/fixtures:ro"] }
  rtsp-loopback: { image: bluenviron/mediamtx, volumes: ["fixtures-ro:/fixtures:ro"] }
  gimbal-mock: { build: e2e/mocks/gimbal-mock, volumes: ["fixtures-ro:/fixtures:ro"] }
  mavlink-sitl: { image: ardupilot/ardupilot-sitl }
  vlm-mock: { build: e2e/mocks/vlm-mock, volumes: ["fixtures-ro:/fixtures:ro"] }
  operator-replay: { build: e2e/mocks/operator-replay, volumes: ["fixtures-ro:/fixtures:ro"] }
  time-injector: { build: e2e/mocks/time-injector }
  e2e-consumer:
    build: e2e/consumer
    depends_on: [autopilot]
    volumes: ["expected-ro:/expected:ro", "reports-rw:/reports"]
networks:
  autopilot-e2e: {}
volumes:
  # Values containing ${...} or colons are quoted so they parse inside flow-style YAML.
  fixtures-ro: { driver_opts: { type: none, o: bind, device: "${PWD}/_docs/00_problem/input_data/fixtures" } }
  expected-ro: { driver_opts: { type: none, o: bind, device: "${PWD}/_docs/00_problem/input_data/expected_results" } }
  reports-rw: {}
  autopilot-state: {}
```
Suite-e2e compose (Tier E) — referenced, not redefined
For Tier-E runs the harness uses ../e2e/docker-compose.suite-e2e.yml (owned by monorepo-e2e). It adds the real ../detections, real ../missions, and a richer mavlink-sitl configuration. Autopilot's Tier-E entries in this file MUST mirror the suite-e2e topology — drift is reconciled by the monorepo-e2e skill, not here.
Consumer application (e2e-consumer)
Tech stack: Rust + assert_cmd + testcontainers-rs + prost/tonic (for gRPC observation) + mavlink-rs (for MAVLink observation) + reqwest/hyper (for HTTPS observation) + tokio-tungstenite (for operator-stream observation). Tests are organised one-scenario-per-file under e2e/consumer/tests/scenarios/.
Entry point: cargo test --release --test scenarios (orchestrated by scripts/run-tests.sh, produced in Phase 4).
Communication with the system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|---|---|---|---|
| Health endpoint | HTTP GET | http://autopilot:9100/health | none (loopback) |
| Structured log stream | line-delimited JSON on stdout | docker-compose log tail | none |
| MAVLink observed | MAVLink v2 / UDP | mavlink-sitl:14551 (the harness records both sides of the link) | per Q6: MAVLink-2 message signing if configured |
| Gimbal observed | ViewPro UDP | gimbal-mock:9200 (commands recorded + telemetry replayed) | none |
| RTSP delivered | RTSP | rtsp://rtsp-loopback:8554/feed (consumer schedules which clip plays per scenario) | none |
| Detection RPC observed | gRPC streaming | detections-mock:50051 (consumer scripts the recorded replay served) | none |
| Mission REST observed | HTTPS | missions-mock:8443 (consumer scripts JSON fixtures + asserts captured request bodies) | TLS cert (self-signed for test) |
| Operator stream observed | Suite modem protocol | operator-replay:9300 (consumer scripts session traces + signed-command envelopes) | per Q9: signed envelope (HMAC / ed25519 / MAVLink-2-ext) |
| VLM IPC observed (when enabled) | Unix-domain socket | /tmp/vlm.sock shared with vlm-mock | peer-credential check (security_approach §"Local IPC peer authorisation") |
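As an illustration of how the structured log stream row might be consumed, the sketch below parses line-delimited JSON and tolerates non-JSON lines. It is a hedged example rather than the harness code (the real consumer is Rust; Python is used for brevity). The function name `parse_log_lines` and the field names in the usage example are hypothetical; the log schema itself is not defined in this file.

```python
import json


def parse_log_lines(lines):
    """Return (records, skipped) for a line-delimited JSON stream.

    Lines that are not a JSON object (for example a raw backtrace) are
    counted in `skipped` instead of raising, so one bad line does not
    abort the capture for the whole scenario.
    """
    records, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines carry no information
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if isinstance(obj, dict):
            records.append(obj)
        else:
            skipped += 1  # valid JSON, but not a structured log record
    return records, skipped
```

A scenario could then assert on `records` and additionally fail when `skipped` is non-zero, since the SUT is expected to emit only JSON on stdout.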
What the consumer does NOT have access to
- No direct database access to the autopilot's on-device persistent store (autopilot-state volume). The consumer reads it only via the health endpoint, the operator telemetry stream, or as a post-run forensic check (the storage AC R3 is checked via the BIT health response, not by peeking at SQLite rows).
- No internal Rust module imports. The consumer is a separate crate compiled against published public proto/schema files only.
- No shared memory, no /proc/$pid/... inspection beyond passive resource metrics.
- No direct reading of in-flight POI queue ordering. Ordering is observed indirectly via the operator-stream emission order and the gimbal command stream.
External dependency mocks
| Dependency | Mock service | Determinism guarantee | Source fixture(s) |
|---|---|---|---|
| ../detections Tier-1 RPC | detections-mock | Replays recorded Detections stream byte-for-byte; same input → same output | <DEFERRED: tier1_replay/*.replay; services.md §1> (live ../detections used as fallback in Tier-E) |
| missions API | missions-mock | Static JSON responses per scenario; recorded round-trip captured for POST | <DEFERRED: missions_fixtures/*.json; services.md §2> |
| ViewPro A40 camera frames | rtsp-loopback (mediamtx) | Plays back .mp4 at exact configured fps; frame timestamps deterministic | fixtures/videos/94d42580bd1ad6ff.mp4, fixtures/movement/video0[1-4].mp4 |
| ViewPro A40 gimbal control | gimbal-mock | Replays gimbal.csv per scenario; echoes commands with bounded latency budget per scenario | <DEFERRED: gimbal_csv/*.csv paired with movement videos; services.md §6> |
| ArduPilot airframe | mavlink-sitl (ArduPilot SITL) | Deterministic seed + scripted mission | scripted per scenario; no fixture file required for Tier B (SITL is the fixture) |
| Ground Station modem session | operator-replay | Replays (t, event) script per scenario | <DEFERRED: operator_sessions/*.script; services.md §3> |
| Local VLM (Tier-3 optional) | vlm-mock | Returns paired (roi.png → VlmAssessment) from disk; schema-violation fixtures for fail-closed tests | <DEFERRED: vlm_io_pairs/*.json; services.md §7> |
| Wall-clock / GPS / NTP | time-injector (faketime LD_PRELOAD) | Scripted offset / jump / source-loss; injected at SUT process start | scripted per scenario; no fixture file required |
Mocks that are marked <DEFERRED:> are bridged through _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md. Scenarios that consume those mocks declare Test status: DEFERRED — input fixture not yet acquired (see leftover row N) in their entry under the relevant *-tests.md file.
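One way a runner could derive the DEFERRED status automatically is to check whether any fixture matching the deferred pattern exists yet. The sketch below assumes that approach; the function name `fixture_status` and the `READY` label are hypothetical, and the glob patterns are the ones listed in the table above.

```python
from pathlib import Path


def fixture_status(fixtures_root, pattern):
    """Return "READY" if at least one fixture matches `pattern`, else "DEFERRED".

    `pattern` is a glob relative to the fixtures directory, for example
    "tier1_replay/*.replay". "READY" is an illustrative label only.
    """
    matches = sorted(Path(fixtures_root).glob(pattern))
    return "READY" if matches else "DEFERRED"
```

With this check, a scenario flips from DEFERRED to runnable as soon as its fixture lands, without editing the scenario entry by hand; whether the project wants that automatic behaviour or an explicit leftover-row sign-off is a process decision this sketch does not settle.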
CI/CD integration
| Stage | Tier(s) | When | Gate | Timeout |
|---|---|---|---|---|
| PR pipeline | U, I | on every PR push | block merge on FAIL | 10 min |
| dev-branch nightly | U, I, B | nightly | warn on FAIL; report attached | 60 min |
| weekly suite-e2e | U, I, B, E | weekly + on release branch | block release on FAIL | 180 min |
| pre-release HW benchmark | HW | manual + pre-release | block release on FAIL | 240 min |
Owned in _docs/02_document/deployment/ci_cd_pipeline.md. This file only declares which tier each scenario MUST run in; the pipeline orchestration is documented there.
Reporting
Format: CSV (one row per scenario per run).
Columns:
| Column | Type | Notes |
|---|---|---|
| test_id | string | e.g. FT-P-001, NFT-PERF-L1, NFT-SEC-O9 |
| test_name | string | short title from the scenario header |
| tier | enum | U / I / B / E / HW |
| seed | int | deterministic seed used (where applicable) |
| start_ts_utc | ISO 8601 | scenario start |
| duration_ms | int | total execution time |
| result | enum | PASS / FAIL / SKIP / DEFERRED |
| expected_result_ref | string | row id in expected_results/results_report.md (e.g. L1, Mp3) |
| actual_value | string | quantitative observation (latency_ms, count, etc.) |
| compare_method | string | one of expected-results.md methods |
| tolerance | string | as declared in the expected-results row |
| failure_reason | string | populated only on FAIL or DEFERRED |
| artifacts_path | string | path under /reports/<run-id>/ for captured logs / pcaps / mavlink dumps |
Output path: e2e/consumer/reports/<run-id>/report.csv (mounted host-side to ./reports/<run-id>/report.csv).
Sidecar artifacts per scenario (one folder per test_id): stdout.log, stderr.log, mavlink.tlog (where applicable), pcap.bin (where applicable), health-trace.jsonl, actual-output.json.
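The report layout above can be illustrated with a short writer. This is a sketch under stated assumptions, not the consumer's code (which is Rust): the column order is taken from the table, while the function name `write_report` and the two validation rules are illustrative choices.

```python
import csv
import io

# Column order as listed in the Reporting table.
COLUMNS = [
    "test_id", "test_name", "tier", "seed", "start_ts_utc", "duration_ms",
    "result", "expected_result_ref", "actual_value", "compare_method",
    "tolerance", "failure_reason", "artifacts_path",
]
RESULTS = {"PASS", "FAIL", "SKIP", "DEFERRED"}


def write_report(rows, out):
    """Write one CSV row per scenario; missing columns are left empty."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row.get("result") not in RESULTS:
            raise ValueError(f"invalid result: {row.get('result')!r}")
        # failure_reason is populated only on FAIL or DEFERRED.
        if row["result"] in {"PASS", "SKIP"} and row.get("failure_reason"):
            raise ValueError("failure_reason set on a non-failing row")
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_report([{"test_id": "FT-P-001", "tier": "B", "result": "PASS"}], buf)
    print(buf.getvalue(), end="")
```

`csv.DictWriter` rejects keys that are not in `COLUMNS`, so a misspelled column name fails loudly instead of silently producing a malformed report.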
Test Execution
Decision (recorded 2026-05-19 by phases/hardware-assessment.md): local-only on Jetson Orin Nano Super. Every scenario — Tier B, Tier E, Tier HW — runs on representative Jetson hardware (the same hardware the airborne payload deploys to). Docker is used for service orchestration (mocks, sibling services) on the Jetson host, NOT for SUT execution on x86.
Hardware dependencies found
| File | Dependency surfaced |
|---|---|
| _docs/00_problem/restrictions.md → "Hardware" | Jetson Orin Nano Super (aarch64), 8 GB shared LPDDR5, 67 TOPS INT8; ViewPro A40 (40× optical zoom + vendor UDP); ViewPro Z40K compatibility |
| _docs/00_problem/restrictions.md → "Software environment" | FP16 precision (INT8 rejected); no cloud egress; Tier 1 + local large models share Jetson GPU with mutual exclusion |
| _docs/01_solution/solution_draft01.md | "single Rust binary on Jetson Orin Nano Super (aarch64)"; TensorRT FP16; Tokio + Unix-domain-socket VLM IPC |
| _docs/02_document/architecture.md §6 (NFR Targets) + §7.6 (Solution Architecture) + §7.14 (Tech Stack) | cross-compile target aarch64-unknown-linux-gnu; TensorRT engine; gimbal UDP; MAVLink-v2 transport |
| _docs/02_document/components/*/description.md (13 components) | physical UDP (gimbal_controller), RTSP capture (frame_ingest), MAVLink airframe link (mavlink_layer), local-onboard model (semantic_analyzer + vlm_client) |
Why local-only on Jetson
The choice rejects two alternatives:
- Docker-only on x86 would leave Tier-HW rows (L1–L9, Re1, Re2, NFT-RES-LIM-CPU, NFT-RES-LIM-GPU) SKIPPED-NO-HW. That defeats the project-level Acceptance Gate (acceptance_criteria.md → "Acceptance Gates (project-level)": every latency criterion MUST be measured on the deployed compute device).
- Both x86 + Jetson would split the test surface and let Tier-B scenarios pass on x86 while masking real-hardware regressions (e.g. GPU contention is invisible on x86). The honest path is to exercise the actual hardware path uniformly.
Execution instructions (local on Jetson)
Prerequisites (one-time, per Jetson runner):
- JetPack 6.x SDK + L4T r36.x (matches the airborne deployment image).
- Rust toolchain pinned to the workspace's rust-toolchain.toml (added by Step 7 Implement); rustup target aarch64-unknown-linux-gnu already native here.
- Docker + Docker Compose v2 (for orchestrating the mock services + sibling repos in Tier-E mode).
- mavlink-router, tegrastats, iperf3, tc (network shaping).
- ViewPro A40 (or Z40K for the Z40K-swap regression run) connected over Ethernet at the documented control endpoint.
- ArduPilot SITL binary installed natively (the Docker image is x86-only; on Jetson aarch64 we run SITL natively or via Apptainer).
- A representative ViewPro A40 RTSP feed source: either the physical camera or a recorded .mp4 looped through a local mediamtx.
How to start services: docker compose -f e2e/docker-compose.autopilot-e2e.yml up -d brings up detections-mock, missions-mock, rtsp-loopback, gimbal-mock, vlm-mock, operator-replay, time-injector on the Jetson host. The SUT (autopilot binary) runs outside the compose — cargo run --release on the Jetson directly, with env vars pointing at the compose-side mock endpoints. For Tier E, swap detections-mock → live ../detections and missions-mock → live missions per ../e2e/docker-compose.suite-e2e.yml.
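Because the SUT runs outside the compose network in this mode, the compose service names (detections-mock, missions-mock, and so on) would not resolve from the host process. The sketch below shows one way to build the SUT environment instead. It assumes the compose file publishes each mock's port on the Jetson host, which this document does not state; the variable names and port numbers come from the compose outline and the Services table, and the helper name `sut_env` is hypothetical.

```python
def sut_env(host="127.0.0.1"):
    """Environment for a SUT started with `cargo run --release` on the host.

    Assumes every mock port is published on `host`; adjust if the compose
    file binds the mocks differently.
    """
    return {
        "DETECTOR_GRPC": f"{host}:50051",
        "MISSIONS_URL": f"https://{host}:8443",
        "RTSP_URL": f"rtsp://{host}:8554/feed",
        "GIMBAL_UDP": f"{host}:9200",
        "MAVLINK_UDP": f"{host}:14551",  # natively installed ArduPilot SITL
        "OPERATOR_TCP": f"{host}:9300",
        "VLM_SOCK": "/tmp/vlm.sock",
        "AUTOPILOT_CONFIG": "/etc/autopilot/test.toml",
    }
```

A runner script could merge this into its own environment, for example `subprocess.Popen(["cargo", "run", "--release"], env={**os.environ, **sut_env()})`. Under the same assumption, the self-signed test certificate served by missions-mock would need to be valid for whichever host name or address is used here.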
How to run the test runner: scripts/run-tests.sh (to be created by a Decompose task per traceability-matrix.md → "Phase 4 SKIPPED" handoff) orchestrates: bring up compose → start SUT → run cargo test --release --test scenarios -p e2e-consumer → tear down. The runner reads RUN_TIER ∈ {B, E, HW} to decide which scenarios to execute.
Environment variables (consumed by both the SUT and the consumer):
- RUN_TIER (B|E|HW): selects scenario set per the matrix below.
- AUTOPILOT_CONFIG: path to the test profile TOML (overrides per-scenario thresholds + Q-tagged defaults).
- AUTOPILOT_RNG_SEED: deterministic seed per scenario; captured in the CSV report.
- JETSON_RUNNER_ID: identifier for the physical Jetson + camera+gimbal hardware combo; carried into every CSV row for forensic comparison across runners.
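The tier selection driven by RUN_TIER can be sketched as below. This is an illustration only (the actual runner is scripts/run-tests.sh plus the Rust consumer); the function names and the in-memory scenario mapping are assumptions, and the sample tier sets are copied from the Hardware Execution Matrix.

```python
import os

VALID_TIERS = ("B", "E", "HW")


def run_tier(environ=None):
    """Read and validate RUN_TIER; fail fast on a missing or unknown value."""
    environ = os.environ if environ is None else environ
    tier = environ.get("RUN_TIER", "")
    if tier not in VALID_TIERS:
        raise ValueError(f"RUN_TIER must be one of {VALID_TIERS}, got {tier!r}")
    return tier


def select(scenarios, tier):
    """Return the sorted test ids whose declared tiers include `tier`.

    `scenarios` maps test_id -> set of tiers the scenario may run in.
    """
    return sorted(tid for tid, tiers in scenarios.items() if tier in tiers)


if __name__ == "__main__":
    sample = {
        "FT-P-001": {"B", "E"},
        "NFT-PERF-L1": {"HW"},
        "NFT-PERF-L2": {"HW", "B"},
    }
    print(select(sample, "B"))
```

Failing on an unset RUN_TIER, instead of defaulting to a tier, avoids a run that silently executes the wrong scenario set.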
CI/CD addendum (overrides the earlier ## CI/CD integration table)
The earlier table assumed a Docker-on-x86 PR pipeline. Under this decision, every tier runs on a Jetson runner. Operationally that means:
| Stage | Tier(s) | When | Gate | Timeout | Runner |
|---|---|---|---|---|---|
| PR pipeline | U, I | on every PR push | block merge on FAIL | 10 min | Jetson runner (native cargo test for U + I) |
| dev-branch nightly | U, I, B | nightly | warn on FAIL; report attached | 60 min | Jetson runner |
| weekly suite-e2e | U, I, B, E | weekly + on release branch | block release on FAIL | 180 min | Jetson runner + live siblings reachable from it |
| pre-release HW benchmark | HW | manual + pre-release | block release on FAIL | 240 min | Jetson runner + physical A40 + airframe SITL/HW |
Capacity note: the PR pipeline running on Jetson trades x86 throughput for execution honesty. If PR latency becomes painful, the team's mitigation is to add more Jetson runners — NOT to fall back to x86 for Tier B (that would defeat the choice).
Hardware Execution Matrix
Per the local-only-on-Jetson decision, every tier runs on Jetson. The matrix below is collapsed accordingly: it records what each scenario actually exercises on the Jetson (which hardware surface is the load-bearing one) so that a runner-capacity planner can predict which scenarios contend for the same physical resource.
| Scenario | Tier | Jetson surface exercised | Concurrent-with constraint |
|---|---|---|---|
| FT-P-001 (D6 Tier-1 contract) | B + E | GPU (Tier 1 inference) | conflicts with NFT-RES-LIM-Re2 / GPU |
| FT-P-002 — FT-P-006 (D1–D5) | E + HW | GPU (Tier 1 inference) | as above |
| FT-P-007 — FT-P-010 (M1–M4) | B + E | CPU (movement) + GPU (Tier 1 inputs) | as above |
| FT-P-011 — FT-P-015 (S1–S5) | B + E | CPU + gimbal UDP + GPU (Tier 3 in S5) | gimbal contention serialises S1/S2/S3 |
| FT-P-016 — FT-P-022 (O1–O7, O8 happy) | B + E | CPU + operator-stream | low contention |
| FT-P-023 (R1 BIT pass) | B + E | every dep mocked | none |
| FT-N-001 — FT-N-002 (R2/R3) | B + E | none (storage seed manipulation) | none |
| FT-N-003 (Mp2 cache-fallback) | B + E | mock timeout on missions-mock | none |
| FT-N-004 (O4 below-threshold) | B | CPU only | none |
| FT-P-024 / FT-P-025 / FT-P-026 (Mp1/Mp3/Mp5) | B + E | network + persistent store | persistent-store contention serialises |
| NFT-PERF-L1 | HW | GPU (Tier 1) | dedicate runner — measurement integrity |
| NFT-PERF-L2 | HW + B | GPU (Tier 2) | conflicts with L1/L3/L8 — serialise |
| NFT-PERF-L3 | HW + B (vlm-mock) | GPU (Tier 3 VLM) | conflicts with L1/L2 — serialise |
| NFT-PERF-L4 | HW | A40 physical zoom motor | dedicate runner — physical motion |
| NFT-PERF-L5 | HW + B | CPU + gimbal UDP | serialise with L4/L8 |
| NFT-PERF-L6 / L7 | B + E | CPU + ego-motion + GPU (Tier 1 inputs) | serialise with L1 |
| NFT-PERF-L8 | HW + B | A40 physical zoom + Tier 1 GPU | dedicate runner |
| NFT-PERF-L9 | B + E | CPU + operator-stream | low contention |
| NFT-PERF-T1 | B | CPU + queue | none |
| NFT-PERF-T2 | B + E | airframe link | low |
| NFT-PERF-T3 | B | RTSP throttling + health | none |
| NFT-RES-R4–R9 | B + E | airframe link + persistent store | serialise per-mission |
| NFT-RES-Mp2 / Mp4 | B + E | network + persistent store | low |
| NFT-SEC-O9 / O10 | B + E | operator-stream + crypto path | low |
| NFT-SEC-CraftedFrame / OversizeCrop | B | decoder CPU | low |
| NFT-SEC-VlmSchemaViolation / FreeFormText | B (vlm-mock) | UDS IPC | low |
| NFT-SEC-IpcPeerAuth | B | UDS IPC + peer-cred | low |
| NFT-SEC-Tier1SchemaViolation | B | Tier-1 RPC | none |
| NFT-SEC-MavlinkUnsigned | B + E | airframe link (Q6 dep) | low |
| NFT-SEC-HealthExposesSecurity | B | counters + health | low |
| NFT-RES-LIM-Re1 | HW | full Jetson workload (RSS) | dedicate runner — measurement integrity |
| NFT-RES-LIM-Re2 | HW | Tier 1 + autopilot workload concurrent | runs back-to-back with NFT-PERF-L1 in same session |
| NFT-RES-LIM-Storage | B + HW | persistent store | low |
| NFT-RES-LIM-CPU | HW | full CPU | dedicate runner |
| NFT-RES-LIM-GPU | HW | GPU mutex (Tier 1 vs Tier 3) | dedicate runner |
| NFT-RES-LIM-FileHandles | B + HW | /proc/<pid>/fd | low |
Bold Tier values mark scenarios that REQUIRE physical Jetson + (sometimes) physical A40 to satisfy the project-level Acceptance Gate; surrogate replay does NOT count for those rows.
Capacity rule: scenarios marked dedicate runner MUST NOT run concurrently with any other scenario on the same Jetson — measurement integrity depends on the workload being exclusively the SUT.
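A runner-capacity planner could apply the capacity rule with a simple greedy batching pass, sketched below. The approach is an assumption, not the project's scheduler: each scenario is reduced to a set of resource labels (a hypothetical simplification of the "Jetson surface exercised" column) plus a flag for "dedicate runner". Dedicated scenarios always get a batch of their own; the others share a batch only when their resource sets do not overlap.

```python
def plan_batches(scenarios):
    """Group scenarios into batches that may run concurrently on one Jetson.

    `scenarios` is a list of (test_id, resources, dedicated) tuples.
    Returns a list of batches, each a list of test ids, in input order.
    """
    batches = []
    for test_id, resources, dedicated in scenarios:
        if dedicated:
            # Exclusive batch: nothing else may ever join it.
            batches.append({"ids": [test_id], "resources": set(resources), "exclusive": True})
            continue
        for batch in batches:
            if not batch["exclusive"] and batch["resources"].isdisjoint(resources):
                batch["ids"].append(test_id)
                batch["resources"] |= set(resources)
                break
        else:
            # No compatible batch found: open a new shared batch.
            batches.append({"ids": [test_id], "resources": set(resources), "exclusive": False})
    return [batch["ids"] for batch in batches]
```

The greedy pass does not minimise the number of batches; it only guarantees that no batch mixes a dedicated scenario with anything else and that no two scenarios in a batch contend for the same labelled resource.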
Open dependencies that affect the harness
| Open Q | Affects | Default until resolved |
|---|---|---|
| Q6 (MAVLink-2 signing) | mavlink-sitl config + observed-MAVLink assertions | signing disabled; tests skip signing assertions until Q6 lands |
| Q8 (MapObjects conflict resolution) | Mp5 fixture shape | <DEFERRED> |
| Q9 (Operator-command auth scheme) | operator-replay envelope format + signature validator | <DEFERRED> for O9/O10; O8 runs the happy path only |
| Q11 (multi-operator session policy) | operator-replay session-id semantics | single-operator only |
| Q14 (movement-detection classical vs learned-CV) | M4 benchmark fixture shape | <DEFERRED> |