Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy Qt/C++ to a Rust workspace. - Remove legacy Qt/C++ tree (ai_controller, drone_controller, misc/camera, python_scaffold, root Dockerfile, autopilot.pro, legacy main.py / requirements.txt). - Add _docs/00_problem (problem, restrictions, acceptance criteria, security approach, input data + fixtures). - Add _docs/01_solution/solution_draft01. - Add _docs/02_document (architecture, system-flows, data_model, glossary, decision-rationale, deployment, 13 component descriptions, tests/ specs, FINAL_report, module-layout). - Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one bootstrap + 46 component tasks) and _dependencies_table.md. - Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for canonical _docs artifacts). - Track autodev state in _docs/_autodev_state.md (Step 6 completed, ready for Step 7 Implement). Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks AZ-640..AZ-686. Total complexity 173 points across 12 epics. Co-authored-by: Cursor <cursoragent@cursor.com>
# Test Environment

Authored by `/test-spec` Phase 2 (2026-05-19) against:

- `_docs/00_problem/problem.md`, `acceptance_criteria.md`, `restrictions.md`, `security_approach.md`
- `_docs/01_solution/solution_draft01.md`
- `_docs/02_document/architecture.md` (incl. §6 NFR Targets, §7 Detailed Design)
- `_docs/00_problem/input_data/data_parameters.md`, `services.md`, `fixtures/README.md`, `expected_results/results_report.md`

Per `.cursor/rules/artifact-srp.mdc` this artifact owns ONLY the test environment / harness shape — measurable thresholds belong in `acceptance_criteria.md`, fixture inventory belongs in `test-data.md`, and per-test specs belong in the sibling `*-tests.md` files.

---

## Overview

**System under test (SUT)**: `autopilot` — a single Rust binary that runs on the Jetson Orin Nano Super of a reconnaissance UAV. Its observable external surfaces:

| Surface | Direction | Protocol | Source/Sink in production |
|---|---|---|---|
| Tier-1 detection RPC | autopilot ⇄ detector | bi-directional gRPC streaming (local) | `../detections` |
| MAVLink command/telemetry | autopilot ⇄ airframe | MAVLink v2 over UDP (or serial) | ArduPilot / PX4 |
| Camera RTSP feed | camera → autopilot | H.264/265 1080p, 30/60 fps | ViewPro A40 |
| Gimbal control + telemetry | autopilot ⇄ camera | ViewPro vendor UDP | ViewPro A40 |
| Mission + MapObjects REST | autopilot ⇄ central | HTTPS JSON | `missions` service |
| Operator stream (telemetry out, commands in) | autopilot ⇄ GS | Suite-level modem protocol, signed commands | Ground Station |
| Deep-analysis VLM IPC (optional) | autopilot ⇄ VLM | Unix-domain socket | local-onboard VLM |
| Health endpoint | autopilot → ops | HTTP/JSON | scraped by ops |
| Structured logs | autopilot → ops | JSON to stdout | log shipper |

The harness exercises every one of those surfaces from outside the SUT process. No test reaches inside the binary (no module imports, no direct DB peeks, no shared memory).

**Consumer app purpose**: a black-box test runner (`e2e-consumer`) that:

1. Brings up the SUT in a controlled topology (with mock or live peers).
2. Drives inputs through public surfaces.
3. Captures every observable: outbound network frames, MAVLink commands, gimbal UDP commands, REST calls, operator-stream messages, health-endpoint JSON, log lines, plus passive resource metrics (RSS, CPU, GPU).
4. Compares each observation against the expected result tagged in `_docs/00_problem/input_data/expected_results/results_report.md` and emits a CSV report.

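Step 4 of the list above (compare an observation against its expected result) could look roughly like the sketch below. It is an illustration only: the `Expected` struct, the `Verdict` enum, and the absolute-tolerance rule are assumptions made for the example. The real compare methods and tolerances are whatever `results_report.md` declares per row.

```rust
/// Hypothetical shape of one expected-result row (field names are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct Expected {
    pub reference: String, // row id, e.g. "L1"
    pub value: f64,        // expected quantity
    pub tolerance: f64,    // absolute tolerance, same unit as `value`
}

/// Outcome of comparing one observation against one expected row.
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Pass,
    Fail(String),
}

/// Compare an observed value with its expected row using an absolute tolerance.
/// A NaN observation never satisfies `<=`, so it is reported as a failure.
pub fn compare(expected: &Expected, observed: f64) -> Verdict {
    let delta = (observed - expected.value).abs();
    if delta <= expected.tolerance {
        Verdict::Pass
    } else {
        Verdict::Fail(format!(
            "{}: observed {} vs expected {} (delta {}, tolerance {})",
            expected.reference, observed, expected.value, delta, expected.tolerance
        ))
    }
}
```

The failure message is a natural candidate for the report's `failure_reason` column, although that mapping is also an assumption here.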
## Test execution tiers

Five execution tiers exist; each test scenario declares which tier(s) it must run in:

| Tier | Purpose | What is real vs mocked | When it runs |
|---|---|---|---|
| **U** — unit | Pure in-process logic with no external surface (state-machine transitions, geometry helpers, schema validators) | Everything in-process | Per commit (cargo test) |
| **I** — component-integration | One autopilot component against mocks for every peer | SUT component real; all peers stubbed/replayed | Per commit; isolates contract drift |
| **B** — blackbox / harness | Full SUT binary against mock peers in containers | SUT binary real; every external peer mocked (HTTPS mock, gRPC replay, MAVLink SITL, scripted operator trace, RTSP loopback) | Per commit + nightly |
| **E** — suite-e2e | Full SUT against live siblings (`../detections`, `../missions`, ArduPilot SITL, Ground Station replay) | All real services in the suite-e2e compose | Nightly + pre-release |
| **HW** — hardware/replay benchmark | SUT binary on representative Jetson hardware OR on a benchmarked replay of that hardware | Real Jetson Orin Nano Super OR benchmarked replay | Pre-release; the only path that satisfies the `acceptance_criteria.md → Acceptance Gates (project-level)` hardware gate |

Hardware-dependency analysis (which AC rows require HW vs replay vs commodity) is produced by the test-spec `phases/hardware-assessment.md` step before Phase 4 runner scripts are generated and is appended to this file as `## Hardware Execution Matrix`.

## Docker environment (Tier B + E)

The suite-e2e compose lives at the monorepo level (`../e2e/docker-compose.suite-e2e.yml`, owned by the `monorepo-e2e` skill — see `_docs/00_problem/input_data/services.md`). The autopilot-local harness lives at `e2e/docker-compose.autopilot-e2e.yml` (created by Phase 4) and brings up only the SUT + mocks needed for Tier-B runs.

### Services (Tier B — autopilot-local harness)

| Service | Image / Build | Purpose | Ports |
|---|---|---|---|
| `autopilot` | build: `.` (cross to `aarch64-unknown-linux-gnu` for HW, native for Tier B) | SUT | health: 9100/tcp; log: stdout; MAVLink: 14550/udp; gimbal: 9201/udp; operator: 9301/tcp |
| `detections-mock` | build: `e2e/mocks/detections-mock` (Python) | Bi-directional gRPC mock replaying recorded `Detections` streams | 50051/tcp |
| `missions-mock` | build: `e2e/mocks/missions-mock` (Python FastAPI) | HTTPS REST mock — `GET/POST /missions/{id}` + `/mapobjects` | 8443/tcp (TLS) |
| `rtsp-loopback` | image: `bluenviron/mediamtx` | RTSP server playing back recorded `.mp4` frame sequences at 30/60 fps | 8554/tcp |
| `gimbal-mock` | build: `e2e/mocks/gimbal-mock` (Rust) | ViewPro UDP echo + scripted yaw/pitch/zoom telemetry replays | 9200/udp |
| `mavlink-sitl` | image: `ardupilot/ardupilot-sitl` | ArduPilot SITL — MAVLink v2 endpoint for the autopilot to drive | 14551/udp |
| `vlm-mock` | build: `e2e/mocks/vlm-mock` (Python, UDS) | Optional Tier-3 VLM IPC mock; replays recorded `VlmAssessment` JSON | (UDS only) |
| `operator-replay` | build: `e2e/mocks/operator-replay` (Python) | Scripted Ground Station session trace: connect / push frame / push telemetry / operator-click / modem-drop / reconnect / lost-link | 9300/tcp |
| `time-injector` | build: `e2e/mocks/time-injector` (Rust) | Injects clock-drift / NTP-loss scenarios into the SUT container's clock via `faketime` LD_PRELOAD shim | — |
| `e2e-consumer` | build: `e2e/consumer` (Rust + assert crates) | The black-box test runner that drives scenarios + compares observables to expected results | — |

### Networks

| Network | Services | Purpose |
|---|---|---|
| `autopilot-e2e` | all | Isolated test network; no egress |

### Volumes

| Volume | Mounted to | Purpose |
|---|---|---|
| `fixtures-ro` | every mock service (read-only) | Mounts `_docs/00_problem/input_data/fixtures/` for replay sources |
| `expected-ro` | `e2e-consumer:/expected:ro` | Mounts `_docs/00_problem/input_data/expected_results/` for assertion comparison |
| `reports-rw` | `e2e-consumer:/reports` | CSV + JSON test output |
| `autopilot-state` | `autopilot:/var/lib/autopilot` | On-device persistent store (R3, Mp4) — wiped between runs |

### docker-compose structure (outline only — not runnable)

```yaml
services:
  autopilot:
    build: .
    depends_on: [detections-mock, missions-mock, rtsp-loopback, gimbal-mock, mavlink-sitl, operator-replay]
    networks: [autopilot-e2e]
    environment:
      DETECTOR_GRPC: detections-mock:50051
      MISSIONS_URL: https://missions-mock:8443
      RTSP_URL: rtsp://rtsp-loopback:8554/feed
      GIMBAL_UDP: gimbal-mock:9200
      MAVLINK_UDP: mavlink-sitl:14551
      OPERATOR_TCP: operator-replay:9300
      VLM_SOCK: /tmp/vlm.sock
      AUTOPILOT_CONFIG: /etc/autopilot/test.toml
    volumes:
      - autopilot-state:/var/lib/autopilot
  detections-mock: { build: e2e/mocks/detections-mock, volumes: [fixtures-ro:/fixtures:ro] }
  missions-mock: { build: e2e/mocks/missions-mock, volumes: [fixtures-ro:/fixtures:ro] }
  rtsp-loopback: { image: bluenviron/mediamtx, volumes: [fixtures-ro:/fixtures:ro] }
  gimbal-mock: { build: e2e/mocks/gimbal-mock, volumes: [fixtures-ro:/fixtures:ro] }
  mavlink-sitl: { image: ardupilot/ardupilot-sitl }
  vlm-mock: { build: e2e/mocks/vlm-mock, volumes: [fixtures-ro:/fixtures:ro] }
  operator-replay: { build: e2e/mocks/operator-replay, volumes: [fixtures-ro:/fixtures:ro] }
  time-injector: { build: e2e/mocks/time-injector }
  e2e-consumer:
    build: e2e/consumer
    depends_on: [autopilot]
    volumes: [expected-ro:/expected:ro, reports-rw:/reports]
networks:
  autopilot-e2e: {}
volumes:
  # Values containing ${...} are quoted so they stay valid inside a flow mapping.
  fixtures-ro: { driver_opts: { type: none, o: bind, device: "${PWD}/_docs/00_problem/input_data/fixtures" } }
  expected-ro: { driver_opts: { type: none, o: bind, device: "${PWD}/_docs/00_problem/input_data/expected_results" } }
  reports-rw: {}
  autopilot-state: {}
```

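Several of the SUT's environment entries in the outline above (`DETECTOR_GRPC`, `GIMBAL_UDP`, `MAVLINK_UDP`, `OPERATOR_TCP`) carry plain `host:port` values. A harness-side helper that validates such a value before a scenario starts might look like the sketch below. The function name `parse_endpoint` is hypothetical, and URL-style values such as `MISSIONS_URL` or `RTSP_URL` are deliberately out of scope.

```rust
/// Split a `host:port` value such as `gimbal-mock:9200` into its parts.
/// Returns `None` when the separator or host is missing, or when the port
/// is not a valid 16-bit number.
pub fn parse_endpoint(value: &str) -> Option<(String, u16)> {
    // Split on the last ':' so the port is always the trailing component.
    let (host, port) = value.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    let port: u16 = port.parse().ok()?;
    Some((host.to_string(), port))
}
```

Rejecting a malformed endpoint up front gives a clearer failure than a connection timeout halfway through a scenario.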
### Suite-e2e compose (Tier E) — referenced, not redefined

For Tier-E runs the harness uses `../e2e/docker-compose.suite-e2e.yml` (owned by `monorepo-e2e`). It adds the real `../detections`, real `../missions`, and a richer `mavlink-sitl` configuration. Autopilot's Tier-E entries in this file MUST mirror the suite-e2e topology — drift is reconciled by the `monorepo-e2e` skill, not here.

## Consumer application (`e2e-consumer`)

**Tech stack**: Rust + `assert_cmd` + `testcontainers-rs` + `prost`/`tonic` (for gRPC observation) + `mavlink-rs` (for MAVLink observation) + `reqwest`/`hyper` (for HTTPS observation) + `tokio-tungstenite` (for operator-stream observation). Tests are organised one-scenario-per-file under `e2e/consumer/tests/scenarios/`.

**Entry point**: `cargo test --release --test scenarios` (orchestrated by `scripts/run-tests.sh`, produced in Phase 4).

### Communication with the system under test

| Interface | Protocol | Endpoint / Topic | Authentication |
|---|---|---|---|
| Health endpoint | HTTP GET | `http://autopilot:9100/health` | none (loopback) |
| Structured log stream | line-delimited JSON on stdout | docker-compose log tail | none |
| MAVLink observed | MAVLink v2 / UDP | `mavlink-sitl:14551` (the harness records both sides of the link) | per Q6: MAVLink-2 message signing if configured |
| Gimbal observed | ViewPro UDP | `gimbal-mock:9200` (commands recorded + telemetry replayed) | none |
| RTSP delivered | RTSP | `rtsp://rtsp-loopback:8554/feed` (consumer schedules which clip plays per scenario) | none |
| Detection RPC observed | gRPC streaming | `detections-mock:50051` (consumer scripts the recorded replay served) | none |
| Mission REST observed | HTTPS | `missions-mock:8443` (consumer scripts JSON fixtures + asserts captured request bodies) | TLS cert (self-signed for test) |
| Operator stream observed | Suite modem protocol | `operator-replay:9300` (consumer scripts session traces + signed-command envelopes) | per Q9: signed envelope (HMAC / ed25519 / MAVLink-2-ext) |
| VLM IPC observed (when enabled) | Unix-domain socket | `/tmp/vlm.sock` shared with `vlm-mock` | peer-credential check (security_approach §"Local IPC peer authorisation") |

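Because the health endpoint is the first interface in the table above that the consumer can query, a scenario would plausibly wait for it before driving any input. The sketch below shows one way to structure that wait. It is an assumption, not documented behaviour: `wait_until_healthy` is a hypothetical name, and the `probe` closure stands in for an HTTP GET on `http://autopilot:9100/health` so that the retry logic stays independent of any HTTP client.

```rust
use std::time::Duration;

/// Poll `probe` until it reports healthy or `max_attempts` is exhausted.
/// `probe` returns `true` when a healthy response was observed.
/// On success the number of attempts used is returned.
pub fn wait_until_healthy<F>(
    mut probe: F,
    max_attempts: u32,
    delay: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> bool,
{
    for attempt in 1..=max_attempts {
        if probe() {
            return Ok(attempt);
        }
        // Sleep only between attempts, not after the final failure.
        if attempt < max_attempts {
            std::thread::sleep(delay);
        }
    }
    Err(format!("SUT not healthy after {} attempts", max_attempts))
}
```

Passing the probe as a closure also lets the retry logic be exercised with a fake probe, without a running SUT.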
### What the consumer does NOT have access to

- No direct database access to the autopilot's on-device persistent store (`autopilot-state` volume) — the consumer reads it only via the health endpoint, the operator telemetry stream, or as a post-run forensic check (the storage AC R3 is checked via the BIT health response, not by peeking at SQLite rows).
- No internal Rust module imports — the consumer is a separate crate compiled against published public proto/schema files only.
- No shared memory, no `/proc/$pid/...` inspection beyond passive resource metrics.
- No direct reading of in-flight POI queue ordering — ordering is observed indirectly via the operator-stream emission order and the gimbal command stream.

## External dependency mocks

| Dependency | Mock service | Determinism guarantee | Source fixture(s) |
|---|---|---|---|
| `../detections` Tier-1 RPC | `detections-mock` | Replays recorded `Detections` stream byte-for-byte; same input → same output | `<DEFERRED: tier1_replay/*.replay; services.md §1>` (live `../detections` used as fallback in Tier-E) |
| `missions` API | `missions-mock` | Static JSON responses per scenario; recorded round-trip captured for `POST` | `<DEFERRED: missions_fixtures/*.json; services.md §2>` |
| ViewPro A40 camera frames | `rtsp-loopback` (mediamtx) | Plays back `.mp4` at exact configured fps; frame timestamps deterministic | `fixtures/videos/94d42580bd1ad6ff.mp4`, `fixtures/movement/video0[1-4].mp4` |
| ViewPro A40 gimbal control | `gimbal-mock` | Replays `gimbal.csv` per scenario; echoes commands with bounded latency budget per scenario | `<DEFERRED: gimbal_csv/*.csv paired with movement videos; services.md §6>` |
| ArduPilot airframe | `mavlink-sitl` (ArduPilot SITL) | Deterministic seed + scripted mission | scripted per scenario; no fixture file required for Tier B (SITL is the fixture) |
| Ground Station modem session | `operator-replay` | Replays `(t, event)` script per scenario | `<DEFERRED: operator_sessions/*.script; services.md §3>` |
| Local VLM (Tier-3 optional) | `vlm-mock` | Returns paired `(roi.png → VlmAssessment)` from disk; schema-violation fixtures for fail-closed tests | `<DEFERRED: vlm_io_pairs/*.json; services.md §7>` |
| Wall-clock / GPS / NTP | `time-injector` (faketime LD_PRELOAD) | Scripted offset / jump / source-loss; injected at SUT process start | scripted per scenario; no fixture file required |

Mocks that are marked `<DEFERRED:>` are bridged through `_docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md`. Scenarios that consume those mocks declare `Test status: DEFERRED — input fixture not yet acquired (see leftover row N)` in their entry under the relevant `*-tests.md` file.

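The `(t, event)` replay that the table attributes to `operator-replay` can be made deterministic by driving it from a caller-supplied clock rather than wall time. The sketch below illustrates the idea. The `Step` and `Replay` types are hypothetical, the script format is still `<DEFERRED>`, and the documented mock is a Python service, so this is a model of the technique rather than its implementation.

```rust
/// One scripted step: emit `event` at `t_ms` milliseconds after scenario start.
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub t_ms: u64,
    pub event: String,
}

/// Replays a script in timestamp order, driven by a caller-supplied clock.
pub struct Replay {
    steps: Vec<Step>,
    next: usize, // index of the first step not yet emitted
}

impl Replay {
    pub fn new(mut steps: Vec<Step>) -> Self {
        // A stable sort keeps the scripted order for steps sharing a timestamp.
        steps.sort_by_key(|s| s.t_ms);
        Replay { steps, next: 0 }
    }

    /// Return every step that is due at `now_ms` and has not been emitted yet.
    pub fn due(&mut self, now_ms: u64) -> Vec<Step> {
        let start = self.next;
        while self.next < self.steps.len() && self.steps[self.next].t_ms <= now_ms {
            self.next += 1;
        }
        self.steps[start..self.next].to_vec()
    }
}
```

Given the same script and the same sequence of `now_ms` values, `due` always returns the same events in the same order, which is the property the "Determinism guarantee" column asks for.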
## CI/CD integration

| Stage | Tier(s) | When | Gate | Timeout |
|---|---|---|---|---|
| PR pipeline | U, I | on every PR push | block merge on FAIL | 10 min |
| dev-branch nightly | U, I, B | nightly | warn on FAIL; report attached | 60 min |
| weekly suite-e2e | U, I, B, E | weekly + on release branch | block release on FAIL | 180 min |
| pre-release HW benchmark | HW | manual + pre-release | block release on FAIL | 240 min |

Owned in `_docs/02_document/deployment/ci_cd_pipeline.md`. This file only declares which tier each scenario MUST run in; the pipeline orchestration is documented there.

## Reporting

**Format**: CSV (one row per scenario per run).

**Columns**:

| Column | Type | Notes |
|---|---|---|
| `test_id` | string | e.g. `FT-P-001`, `NFT-PERF-L1`, `NFT-SEC-O9` |
| `test_name` | string | short title from the scenario header |
| `tier` | enum | U / I / B / E / HW |
| `seed` | int | deterministic seed used (where applicable) |
| `start_ts_utc` | ISO 8601 | scenario start |
| `duration_ms` | int | total execution time |
| `result` | enum | PASS / FAIL / SKIP / DEFERRED |
| `expected_result_ref` | string | row id in `expected_results/results_report.md` (e.g. `L1`, `Mp3`) |
| `actual_value` | string | quantitative observation (latency_ms, count, etc.) |
| `compare_method` | string | one of `expected-results.md` methods |
| `tolerance` | string | as declared in the expected-results row |
| `failure_reason` | string | populated only on FAIL or DEFERRED |
| `artifacts_path` | string | path under `/reports/<run-id>/` for captured logs / pcaps / mavlink dumps |

**Output path**: `e2e/consumer/reports/<run-id>/report.csv` (mounted host-side to `./reports/<run-id>/report.csv`).

**Sidecar artifacts** per scenario (one folder per `test_id`): `stdout.log`, `stderr.log`, `mavlink.tlog` (where applicable), `pcap.bin` (where applicable), `health-trace.jsonl`, `actual-output.json`.

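Free-text columns such as `failure_reason` can contain commas or quotes, so a report writer needs to quote fields. The sketch below shows one way to serialise a row with the thirteen columns listed above, using RFC 4180 style quoting. The helper names `csv_field` and `csv_line` are hypothetical, and whether the real report includes a header row is not stated in this document.

```rust
/// Column order as listed in the Reporting table.
pub const COLUMNS: [&str; 13] = [
    "test_id", "test_name", "tier", "seed", "start_ts_utc", "duration_ms",
    "result", "expected_result_ref", "actual_value", "compare_method",
    "tolerance", "failure_reason", "artifacts_path",
];

/// Quote a field when it contains a comma, a double quote, or a line break.
/// Embedded double quotes are doubled, as in RFC 4180.
pub fn csv_field(raw: &str) -> String {
    let needs_quoting = raw.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r'));
    if needs_quoting {
        format!("\"{}\"", raw.replace('"', "\"\""))
    } else {
        raw.to_string()
    }
}

/// Join exactly thirteen fields (one per column) into a single CSV line.
pub fn csv_line(fields: &[&str; 13]) -> String {
    fields
        .iter()
        .map(|f| csv_field(f))
        .collect::<Vec<_>>()
        .join(",")
}
```

Calling `csv_line(&COLUMNS)` produces a header line; each scenario result would then contribute one further line with the same field order.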
## Test Execution

**Decision** (recorded 2026-05-19 by `phases/hardware-assessment.md`): **local-only on Jetson Orin Nano Super**. Every scenario — Tier B, Tier E, Tier HW — runs on representative Jetson hardware (the same hardware the airborne payload deploys to). Docker is used for **service orchestration** (mocks, sibling services) on the Jetson host, NOT for SUT execution on x86.

### Hardware dependencies found

| File | Dependency surfaced |
|---|---|
| `_docs/00_problem/restrictions.md → "Hardware"` | Jetson Orin Nano Super (aarch64), 8 GB shared LPDDR5, 67 TOPS INT8; ViewPro A40 (40× optical zoom + vendor UDP); ViewPro Z40K compatibility |
| `_docs/00_problem/restrictions.md → "Software environment"` | FP16 precision (INT8 rejected); no cloud egress; Tier 1 + local large models share Jetson GPU with mutual exclusion |
| `_docs/01_solution/solution_draft01.md` | "single Rust binary on Jetson Orin Nano Super (aarch64)"; TensorRT FP16; Tokio + Unix-domain-socket VLM IPC |
| `_docs/02_document/architecture.md §6` (NFR Targets) + `§7.6` (Solution Architecture) + `§7.14` (Tech Stack) | cross-compile target `aarch64-unknown-linux-gnu`; TensorRT engine; gimbal UDP; MAVLink-v2 transport |
| `_docs/02_document/components/*/description.md` (13 components) | physical UDP (gimbal_controller), RTSP capture (frame_ingest), MAVLink airframe link (mavlink_layer), local-onboard model (semantic_analyzer + vlm_client) |

### Why local-only on Jetson

The choice rejects two alternatives:

- **Docker-only on x86** would leave Tier-HW rows (L1–L9, Re1, Re2, NFT-RES-LIM-CPU, NFT-RES-LIM-GPU) `SKIPPED-NO-HW`. That defeats the project-level Acceptance Gate (`acceptance_criteria.md → "Acceptance Gates (project-level)"`: every latency criterion MUST be measured on the deployed compute device).
- **Both x86 + Jetson** would split the test surface and let Tier-B scenarios pass on x86 while masking real-hardware regressions (e.g. GPU contention is invisible on x86). The honest path is to exercise the actual hardware path uniformly.

### Execution instructions (local on Jetson)

**Prerequisites** (one-time, per Jetson runner):

- JetPack 6.x SDK + L4T r36.x (matches the airborne deployment image).
- Rust toolchain pinned to the workspace's `rust-toolchain.toml` (added by Step 7 Implement); rustup target `aarch64-unknown-linux-gnu` already native here.
- Docker + Docker Compose v2 (for orchestrating the mock services + sibling repos in Tier-E mode).
- `mavlink-router`, `tegrastats`, `iperf3`, `tc` (network shaping).
- ViewPro A40 (or Z40K for the Z40K-swap regression run) connected over Ethernet at the documented control endpoint.
- ArduPilot SITL binary installed natively (the Docker image is x86-only; on Jetson aarch64 we run SITL natively or via Apptainer).
- A representative ViewPro A40 RTSP feed source — either the physical camera or a recorded `.mp4` looped through a local `mediamtx`.

**How to start services**: `docker compose -f e2e/docker-compose.autopilot-e2e.yml up -d` brings up `detections-mock`, `missions-mock`, `rtsp-loopback`, `gimbal-mock`, `vlm-mock`, `operator-replay`, `time-injector` on the Jetson host. The SUT (`autopilot` binary) runs **outside** the compose — `cargo run --release` on the Jetson directly, with env vars pointing at the compose-side mock endpoints. For Tier E, swap `detections-mock` → live `../detections` and `missions-mock` → live `missions` per `../e2e/docker-compose.suite-e2e.yml`.

**How to run the test runner**: `scripts/run-tests.sh` (to be created by a Decompose task per `traceability-matrix.md → "Phase 4 SKIPPED"` handoff) orchestrates: bring up compose → start SUT → run `cargo test --release --test scenarios -p e2e-consumer` → tear down. The runner reads `RUN_TIER ∈ {B, E, HW}` to decide which scenarios to execute.

**Environment variables** (consumed by both the SUT and the consumer):

- `RUN_TIER` (`B` | `E` | `HW`) — selects scenario set per the matrix below.
- `AUTOPILOT_CONFIG` — path to the test profile TOML (overrides per-scenario thresholds + Q-tagged defaults).
- `AUTOPILOT_RNG_SEED` — deterministic-seed per scenario; captured in the CSV report.
- `JETSON_RUNNER_ID` — identifier for the physical Jetson + camera+gimbal hardware combo; carried into every CSV row for forensic comparison across runners.

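How the runner might combine `RUN_TIER` with each scenario's declared tiers can be sketched as follows. This is an illustration under assumptions: the `Tier` enum, `parse_declared`, and `should_run` are hypothetical names, and the sketch only accepts plain declarations such as `HW + B`, not annotated ones such as `HW + B (vlm-mock)`.

```rust
/// The five execution tiers named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    U,
    I,
    B,
    E,
    Hw,
}

impl Tier {
    /// Parse a single tier token, ignoring surrounding whitespace.
    pub fn parse(token: &str) -> Option<Tier> {
        match token.trim() {
            "U" => Some(Tier::U),
            "I" => Some(Tier::I),
            "B" => Some(Tier::B),
            "E" => Some(Tier::E),
            "HW" => Some(Tier::Hw),
            _ => None,
        }
    }
}

/// Parse a declaration such as "HW + B"; any unknown token fails the whole parse.
pub fn parse_declared(decl: &str) -> Option<Vec<Tier>> {
    decl.split('+').map(Tier::parse).collect()
}

/// Decide whether a scenario runs under the given `RUN_TIER` value.
/// Only B, E and HW are accepted, matching `RUN_TIER ∈ {B, E, HW}`.
pub fn should_run(declared: &[Tier], run_tier: &str) -> bool {
    match Tier::parse(run_tier) {
        Some(t) if matches!(t, Tier::B | Tier::E | Tier::Hw) => declared.contains(&t),
        _ => false,
    }
}
```

In a real runner the `run_tier` argument would typically come from `std::env::var("RUN_TIER")`. Treating an unset or unknown value as "run nothing" is a design choice made for this sketch, not something the document specifies.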
### CI/CD addendum (overrides the earlier `## CI/CD integration` table)

The earlier table assumed a Docker-on-x86 PR pipeline. Under this decision, every tier runs on a Jetson runner. Operationally that means:

| Stage | Tier(s) | When | Gate | Timeout | Runner |
|---|---|---|---|---|---|
| PR pipeline | U, I | on every PR push | block merge on FAIL | 10 min | Jetson runner (native cargo test for U + I) |
| dev-branch nightly | U, I, B | nightly | warn on FAIL; report attached | 60 min | Jetson runner |
| weekly suite-e2e | U, I, B, E | weekly + on release branch | block release on FAIL | 180 min | Jetson runner + live siblings reachable from it |
| pre-release HW benchmark | HW | manual + pre-release | block release on FAIL | 240 min | Jetson runner + physical A40 + airframe SITL/HW |

Capacity note: the PR pipeline running on Jetson trades x86 throughput for execution honesty. If PR latency becomes painful, the team's mitigation is to add more Jetson runners — NOT to fall back to x86 for Tier B (that would defeat the choice).

## Hardware Execution Matrix

Per the local-only-on-Jetson decision, every tier runs on Jetson. The matrix below is collapsed accordingly: it records **what each scenario actually exercises on the Jetson** (which hardware surface is the load-bearing one) so that a runner-capacity planner can predict which scenarios contend for the same physical resource.

| Scenario | Tier | Jetson surface exercised | Concurrent-with constraint |
|---|---|---|---|
| FT-P-001 (D6 Tier-1 contract) | B + E | GPU (Tier 1 inference) | conflicts with NFT-RES-LIM-Re2 / GPU |
| FT-P-002 — FT-P-006 (D1–D5) | E + HW | GPU (Tier 1 inference) | as above |
| FT-P-007 — FT-P-010 (M1–M4) | B + E | CPU (movement) + GPU (Tier 1 inputs) | as above |
| FT-P-011 — FT-P-015 (S1–S5) | B + E | CPU + gimbal UDP + GPU (Tier 3 in S5) | gimbal contention serialises S1/S2/S3 |
| FT-P-016 — FT-P-022 (O1–O7, O8 happy) | B + E | CPU + operator-stream | low contention |
| FT-P-023 (R1 BIT pass) | B + E | every dep mocked | none |
| FT-N-001 — FT-N-002 (R2/R3) | B + E | none (storage seed manipulation) | none |
| FT-N-003 (Mp2 cache-fallback) | B + E | mock timeout on `missions-mock` | none |
| FT-N-004 (O4 below-threshold) | B | CPU only | none |
| FT-P-024 / FT-P-025 / FT-P-026 (Mp1/Mp3/Mp5) | B + E | network + persistent store | persistent-store contention serialises |
| NFT-PERF-L1 | **HW** | GPU (Tier 1) | dedicate runner — measurement integrity |
| NFT-PERF-L2 | HW + B | GPU (Tier 2) | conflicts with L1/L3/L8 — serialise |
| NFT-PERF-L3 | HW + B (vlm-mock) | GPU (Tier 3 VLM) | conflicts with L1/L2 — serialise |
| NFT-PERF-L4 | **HW** | A40 physical zoom motor | dedicate runner — physical motion |
| NFT-PERF-L5 | HW + B | CPU + gimbal UDP | serialise with L4/L8 |
| NFT-PERF-L6 / L7 | B + E | CPU + ego-motion + GPU (Tier 1 inputs) | serialise with L1 |
| NFT-PERF-L8 | HW + B | A40 physical zoom + Tier 1 GPU | dedicate runner |
| NFT-PERF-L9 | B + E | CPU + operator-stream | low contention |
| NFT-PERF-T1 | B | CPU + queue | none |
| NFT-PERF-T2 | B + E | airframe link | low |
| NFT-PERF-T3 | B | RTSP throttling + health | none |
| NFT-RES-R4–R9 | B + E | airframe link + persistent store | serialise per-mission |
| NFT-RES-Mp2 / Mp4 | B + E | network + persistent store | low |
| NFT-SEC-O9 / O10 | B + E | operator-stream + crypto path | low |
| NFT-SEC-CraftedFrame / OversizeCrop | B | decoder CPU | low |
| NFT-SEC-VlmSchemaViolation / FreeFormText | B (vlm-mock) | UDS IPC | low |
| NFT-SEC-IpcPeerAuth | B | UDS IPC + peer-cred | low |
| NFT-SEC-Tier1SchemaViolation | B | Tier-1 RPC | none |
| NFT-SEC-MavlinkUnsigned | B + E | airframe link (Q6 dep) | low |
| NFT-SEC-HealthExposesSecurity | B | counters + health | low |
| NFT-RES-LIM-Re1 | **HW** | full Jetson workload (RSS) | dedicate runner — measurement integrity |
| NFT-RES-LIM-Re2 | **HW** | Tier 1 + autopilot workload concurrent | runs back-to-back with NFT-PERF-L1 in same session |
| NFT-RES-LIM-Storage | B + HW | persistent store | low |
| NFT-RES-LIM-CPU | **HW** | full CPU | dedicate runner |
| NFT-RES-LIM-GPU | **HW** | GPU mutex (Tier 1 vs Tier 3) | dedicate runner |
| NFT-RES-LIM-FileHandles | B + HW | `/proc/<pid>/fd` | low |

**Bold Tier values** mark scenarios that REQUIRE physical Jetson + (sometimes) physical A40 to satisfy the project-level Acceptance Gate; surrogate replay does NOT count for those rows.

**Capacity rule**: scenarios marked `dedicate runner` MUST NOT run concurrently with any other scenario on the same Jetson — measurement integrity depends on the workload being exclusively the SUT.

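A runner-capacity planner could apply the capacity rule above along the lines of the following sketch. It is deliberately simplified and rests on assumptions: the `Scenario` struct and `plan_batches` are hypothetical, and only the `dedicate runner` flag is modelled. The finer-grained "serialise with ..." constraints from the matrix are ignored here and would need their own conflict model.

```rust
/// Minimal scenario description for capacity planning (illustrative fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Scenario {
    pub test_id: String,
    /// True when the matrix marks the scenario `dedicate runner`.
    pub dedicate_runner: bool,
}

/// Group scenarios into batches for one Jetson runner.
/// Scenarios inside one batch may run concurrently; batches run one after another.
/// Every dedicated scenario gets a batch of its own, and all remaining
/// scenarios share a single final batch.
pub fn plan_batches(scenarios: &[Scenario]) -> Vec<Vec<String>> {
    let mut batches = Vec::new();
    let mut shared = Vec::new();
    for scenario in scenarios {
        if scenario.dedicate_runner {
            batches.push(vec![scenario.test_id.clone()]);
        } else {
            shared.push(scenario.test_id.clone());
        }
    }
    if !shared.is_empty() {
        batches.push(shared);
    }
    batches
}
```

With this shape, a dedicated scenario such as `NFT-PERF-L1` never shares a batch, so nothing else competes with the SUT while it is being measured.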
## Open dependencies that affect the harness

| Open Q | Affects | Default until resolved |
|---|---|---|
| Q6 (MAVLink-2 signing) | `mavlink-sitl` config + observed-MAVLink assertions | signing disabled; tests skip signing assertions until Q6 lands |
| Q8 (MapObjects conflict resolution) | Mp5 fixture shape | `<DEFERRED>` |
| Q9 (Operator-command auth scheme) | `operator-replay` envelope format + signature validator | `<DEFERRED>` for O9/O10; O8 runs the happy path only |
| Q11 (multi-operator session policy) | `operator-replay` session-id semantics | single-operator only |
| Q14 (movement-detection classical vs learned-CV) | M4 benchmark fixture shape | `<DEFERRED>` |