mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-27 13:56:37 +00:00
Update autodev state documentation to reflect completion of Plan Step 1, including detailed progress on phases and next steps. Revised phase details to clarify user-level blocking gates and hardware assessment outcomes.
@@ -0,0 +1,268 @@
# Test Environment

## Overview

**System under test (SUT)**: the GPS-Denied Onboard companion-computer software stack — a set of ROS 2 Humble + Isaac ROS 3.2 nodes (cuVSLAM, VPR, cross-view matcher, Component 5 calibrator, Component 1b ortho-tile generator, Component 6 MAVLink bridge, Component 10 FDR, Component 7 health/failsafe, Component 8 object localizer) running on a Jetson Orin Nano Super (or x86+CUDA emulator for non-hardware tiers).

**SUT entry points (public interfaces, all black-box)**:

| Entry point | Protocol | Direction | Bound to | Purpose |
|-------------|----------|-----------|----------|---------|
| `MAVLink GPS_INPUT` | MAVLink2 (signed), serial/UDP | SUT → FC | sysid=11 | Primary position output (AC-4.3, AC-6.3, AC-NEW-1, AC-NEW-2) |
| `MAVLink STATUSTEXT / NAMED_VALUE_FLOAT` | MAVLink2 (signed) | SUT → GCS | sysid=10 | Telemetry summary, RELOC_REQ (AC-3.4, AC-6.1, AC-6.2) |
| `MAVLink RAW_IMU / SCALED_IMU / ATTITUDE / GPS_RAW_INT / EKF_STATUS_REPORT / GLOBAL_POSITION_INT` | MAVLink2 | FC → SUT | sysid=10 | IMU + autopilot inputs to cuVSLAM, ortho, source-promotion |
| `HTTP/HTTPS REST` (e.g., `/health`, `/sessions`, `/objects/locate`) | HTTPS+JWT | external → SUT | TBD port | Object localization, health, session management (AC-7.1, AC-8.1 cache interface, results_report rows 27–33) |
| `HTTP SSE` (`/sessions/{id}/stream`) | HTTPS+SSE | SUT → external | TBD port | 1 Hz position stream for monitoring (results_report row 32) |
| `ROS 2 topics` (test-only sniffer) | DDS | SUT internal | observed black-box via topic ports | F-T19 ROS rate sanity test only — NOT used by functional tests |
| `MBTiles cache file` (read-only check) | SQLite read | external → cache fs | mounted volume | AC-8.3 / AC-8.4 verification at cache boundary, never read SUT internals |

**Consumer app purpose**: a standalone `pytest`-based black-box test runner exercising the SUT through the MAVLink wire, the HTTP API, and the cache-boundary file artifacts. The runner has **no source-code access** to the SUT, no Python imports of SUT modules, and no DDS subscriptions to internal-only topics (only the public `nav_msgs/Odometry` / `sensor_msgs/Image` subscriptions that are documented as the SUT contract).

## Docker Environment

### Services

| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| `sut` | build context `./` (multi-stage Dockerfile producing the JetPack 6 runtime image; compiled for `linux/arm64` for HW tier and `linux/amd64+cuda` for SW emulation tier) | The full GPS-Denied stack (all ROS 2 nodes) | UDP 14550 (MAVLink to FC), UDP 14560 (MAVLink to GCS), TCP 8443 (HTTPS API), TCP 8080 (HTTP SSE), TCP 9090 (Prometheus metrics) |
| `ardupilot-sitl` | `ardupilot/ardupilot-sitl:4.5-PR30080-pinned` | Autopilot SITL (ArduCopter / ArduPlane) — provides FC behaviour for F-T9, F-T11, F-T12, AC-4.3, AC-NEW-1, AC-NEW-2 | UDP 14550 ↔ sut, UDP 14570 ↔ qgc-mock |
| `qgc-mock` | build `./fixtures/qgc-mock/` (a MAVLink-only mock GCS that records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY, sends operator hints) | Records GCS-bound telemetry; sends operator re-localization hints (AC-6.1, AC-6.2, AC-3.4) | UDP 14570 |
| `tile-cache-init` | build `./fixtures/tile-cache-init/` (one-shot loader that materialises `fixtures/satellite_tiles_AD0000xx_z20/` MBTiles + sidecar) | Pre-populates the satellite cache before each test | — (one-shot) |
| `gps-spoof-injector` | build `./fixtures/gps-spoof-injector/` (publishes `GPS_RAW_INT` with crafted lat/lon/sat/hdop) | F-T12 / AC-NEW-2 spoof scenarios | UDP 14571 → sut |
| `e2e-runner` | build `./e2e/` (Python 3.11 + pytest + pymavlink + httpx + pyserial) | Black-box test runner | — |
| `prom` | `prom/prometheus:v2.51.0` | Scrape SUT metrics (CPU, GPU, temp) for NF-T2 / NF-T3 / AC-4.2 / AC-NEW-5 | TCP 9091 |
| `nvidia-smi-exporter` | `utkuozdemir/nvidia_gpu_exporter:1.2.0` (HW tier only) | Jetson tegrastats / nvidia-smi metrics | TCP 9092 |

### Networks

| Network | Services | Purpose |
|---------|----------|---------|
| `e2e-mavlink-net` | `sut`, `ardupilot-sitl`, `qgc-mock`, `gps-spoof-injector` | MAVLink traffic (single broadcast domain so distinct sysids share routing realistically) |
| `e2e-api-net` | `sut`, `e2e-runner` | HTTPS + SSE traffic for object-localization / health endpoints |
| `e2e-metrics-net` | `sut`, `prom`, `nvidia-smi-exporter`, `e2e-runner` | Resource-monitoring scrape path |

### Volumes

| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache` | `sut:/var/lib/gpsdenied/tiles` (rw), `tile-cache-init:/init/tiles` (rw), `e2e-runner:/probe/tiles` (ro) | Persistent satellite + onboard tile cache (AC-8.3, AC-8.4) |
| `fdr` | `sut:/var/lib/gpsdenied/fdr` (rw), `e2e-runner:/probe/fdr` (ro) | Flight Data Recorder output (AC-NEW-3) |
| `fixtures-images` | `sut:/fixtures/images` (ro), `e2e-runner:/fixtures/images` (ro) | The 60 nav-cam JPGs + AerialVL S03 slice |
| `fixtures-imu` | `sut:/fixtures/imu` (ro), `ardupilot-sitl:/fixtures/imu` (ro) | SITL replay IMU traces (AerialVL S03 + synthetic from `coordinates.csv`) |
| `fixtures-expected` | `e2e-runner:/fixtures/expected_results` (ro) | `_docs/00_problem/input_data/expected_results/` mounted into the runner |
| `e2e-results` | `e2e-runner:/results` (rw, host bind) | CSV report output |

### docker-compose structure

```yaml
# Outline only — not runnable code
services:
  sut:
    build: .
    networks: [e2e-mavlink-net, e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/var/lib/gpsdenied/tiles
      - fdr:/var/lib/gpsdenied/fdr
      - fixtures-images:/fixtures/images:ro
      - fixtures-imu:/fixtures/imu:ro
    environment:
      - MAVLINK_FC_URL=udp://ardupilot-sitl:14550
      - MAVLINK_GCS_URL=udp://qgc-mock:14570
      - GPSD_API_BIND=0.0.0.0:8443
      - GPSD_TILE_DIR=/var/lib/gpsdenied/tiles
      - GPSD_FDR_DIR=/var/lib/gpsdenied/fdr
    runtime: nvidia # HW tier
  ardupilot-sitl:
    image: ardupilot/ardupilot-sitl:4.5-PR30080-pinned
    networks: [e2e-mavlink-net]
    command: ["--vehicle=ArduPlane", "--frame=plane", "--imu-replay=/fixtures/imu/AD0000xx.csv"]
  qgc-mock:
    build: ./fixtures/qgc-mock/
    networks: [e2e-mavlink-net]
  tile-cache-init:
    build: ./fixtures/tile-cache-init/
    volumes:
      - tile-cache:/init/tiles
    restart: "no"
  gps-spoof-injector:
    build: ./fixtures/gps-spoof-injector/
    networks: [e2e-mavlink-net]
  e2e-runner:
    build: ./e2e/
    depends_on: [sut, ardupilot-sitl, qgc-mock, tile-cache-init]
    networks: [e2e-api-net, e2e-metrics-net]
    volumes:
      - tile-cache:/probe/tiles:ro
      - fdr:/probe/fdr:ro
      - fixtures-images:/fixtures/images:ro
      - fixtures-expected:/fixtures/expected_results:ro
      - e2e-results:/results
    command: ["pytest", "-q", "--junit-xml=/results/junit.xml", "--csv=/results/report.csv"]
  prom:
    image: prom/prometheus:v2.51.0
    networks: [e2e-metrics-net]
```

## Consumer Application

**Tech stack**: Python 3.11 / pytest 8.x / `pymavlink` (matching the SUT version) / `httpx[http2]` / `pyserial` / `numpy` / `pandas` / `pytest-csv` / `pytest-timeout`. **No SUT source imports.**

**Entry point**: `pytest -q` inside `e2e-runner`, with marker-based selection per tier (`pytest -m "blackbox and pipeline"` → 60-image slice; `pytest -m "blackbox and deferred-corpus"` → AerialVL S03; etc.).

### Communication with system under test

| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| GPS_INPUT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) and `udp://ardupilot-sitl:14550` (target) | MAVLink2 signing key shared with FC for round-trip verification |
| STATUSTEXT / NAMED_VALUE_FLOAT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) | MAVLink2 signing key |
| Object localization | HTTPS + JSON | `POST sut:8443/objects/locate` | JWT bearer (test-only key in `e2e-runner` config) |
| Health probe | HTTPS + JSON | `GET sut:8443/health` | JWT bearer |
| Session management | HTTPS + JSON | `POST sut:8443/sessions`, `GET sut:8443/sessions/{id}/stream` | JWT bearer |
| Operator hint | MAVLink2 STATUSTEXT | injected via `qgc-mock` | MAVLink2 signing key |
| Spoofed GPS injection | MAVLink2 GPS_RAW_INT | injected via `gps-spoof-injector` (separate sysid) | MAVLink2 signing key |
| Tile cache file probe | filesystem read | `/probe/tiles/*.mbtiles` + sidecar JSON | — (read-only mount) |
| FDR file probe | filesystem read | `/probe/fdr/**/*` | — (read-only mount) |
| Metrics scrape | HTTP | `GET prom:9091/api/v1/query?…` | — (test net only) |

### What the consumer does NOT have access to

- No direct DB / SQLite write access against the SUT's tile or FDR stores.
- No Python imports of SUT modules.
- No DDS subscriptions to internal-only topics (e.g., the matcher's intermediate keypoint topic, the calibrator's residual topic). Only the documented contract topics consumed in F-T19.
- No CUDA context, no shared memory, no `/proc` access into the SUT container.
- No log-file scraping that bypasses the public health/STATUSTEXT path.

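The tile-cache probe can respect the "no write access" rule at the SQLite level as well as at the mount level. The following is a minimal sketch, assuming the cache files follow the standard MBTiles layout (a `metadata` table and a `tiles` table); the function name `probe_mbtiles` and the shape of the returned summary are hypothetical, and the sidecar JSON is not covered because its schema is not specified here.

```python
import sqlite3
from pathlib import Path


def probe_mbtiles(path: Path) -> dict:
    """Open an MBTiles file read-only and summarise what it contains."""
    # mode=ro makes SQLite itself refuse writes, in addition to the ro mount.
    uri = f"{Path(path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # Standard MBTiles tables: metadata(name, value) and
        # tiles(zoom_level, tile_column, tile_row, tile_data).
        metadata = dict(conn.execute("SELECT name, value FROM metadata"))
        tile_count, min_zoom, max_zoom = conn.execute(
            "SELECT COUNT(*), MIN(zoom_level), MAX(zoom_level) FROM tiles"
        ).fetchone()
    finally:
        conn.close()
    return {
        "metadata": metadata,
        "tile_count": tile_count,
        "min_zoom": min_zoom,
        "max_zoom": max_zoom,
    }
```

A test for AC-8.3 / AC-8.4 could call `probe_mbtiles(Path("/probe/tiles/<name>.mbtiles"))` before and after a scenario and compare the two summaries, without ever touching SUT internals.
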
## Test Tiers

The runner stratifies execution by **what artefact set is present**. Each tier maps to a pytest marker and to a `data_status` column value in `traceability-matrix.md`.

| Tier | Marker | Corpus / fixtures required | Coverage scope |
|------|--------|---------------------------|----------------|
| **T1 pipeline-correctness** | `pipeline` | `_docs/00_problem/input_data/` 60-image slice + `coordinates.csv` + placeholder satellite tiles + SITL-replayed IMU | Validates pipeline plumbing only, **NOT** deployment-binding numbers (per Phase 1 D2). |
| **T2 deferred-corpus** | `deferred-corpus` | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight | Deployment-binding accuracy & drift for AC-1.1, AC-1.2, AC-1.3, AC-2.1, AC-2.2, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9. |
| **T3 deferred-sitl** | `deferred-sitl` | ArduPilot SITL pinned to PR #30080-class build + scripted scenarios | F-T9 source-switching matrix (AC-4.3, AC-NEW-2). |
| **T4 deferred-hil** | `deferred-hil` | Real Jetson Orin Nano Super on bench + thermal chamber + bench MAVLink loop | AC-4.1 latency on real HW, AC-4.2 memory cap, AC-NEW-5 thermal envelope, AC-NEW-1 cold-start TTFF on real HW. |
| **T5 deferred-field** | `deferred-field` | Recorded fixed-wing sortie | FT-1 / FT-2 / FT-3 final field validation. |

Pipeline-tier (T1) tests are the only ones whose pass/fail numbers are **NOT** treated as deployment evidence — they verify that the pipeline produces *some* output of the right shape, not that the output meets the deployment-binding accuracy budget. Deployment-binding tests live in T2–T5.

## CI/CD Integration

| Tier | When to run | Pipeline stage | Gate behavior | Timeout |
|------|-------------|----------------|---------------|---------|
| T1 pipeline | Every PR to `dev`; nightly | After unit tests | Block merge on FAIL | 30 min |
| T2 deferred-corpus | Nightly; on tag push | Pre-release | Block release on FAIL | 4 h (Monte Carlo NF-T4 dominates) |
| T3 deferred-sitl | Nightly | Pre-release | Block release on FAIL | 1 h |
| T4 deferred-hil | Bench-on-demand + weekly thermal cycle | Bench-only stage | Manual approval | 12 h (NF-T3 8 h soak) |
| T5 deferred-field | Field-test plan (per-sortie) | Field stage | Out-of-band sign-off | per sortie |

## Reporting

**Format**: CSV (one row per test execution) plus JUnit XML for CI.

**CSV columns**: `test_id`, `test_name`, `tier`, `marker`, `traces_to_acs` (semicolon-joined), `traces_to_restricts`, `data_status` (`present` / `deferred-corpus` / `deferred-sitl` / `deferred-hil` / `deferred-field`), `started_at`, `execution_time_ms`, `result` (`PASS` / `FAIL` / `SKIP` / `BLOCKED-DATA`), `expected_metric`, `actual_metric`, `tolerance`, `error_message` (if FAIL or BLOCKED-DATA), `git_sha`, `image_tag`, `runner_host`.

**Output paths**:
- `e2e-results:/results/report.csv` — primary CSV report
- `e2e-results:/results/junit.xml` — JUnit XML
- `e2e-results:/results/coverage_by_ac.csv` — derived: AC → covering test IDs → aggregate result
- `e2e-results:/results/per_tier.csv` — derived: tier → pass/fail/skip/blocked-data counts

**`BLOCKED-DATA` handling**: when a test's required fixture is missing (e.g., AerialVL S03 not yet downloaded in CI), the test must emit `BLOCKED-DATA` rather than `FAIL` or `SKIP` — this preserves the data_status signal in the matrix without polluting the failure rate.

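The column list and the `BLOCKED-DATA` rule above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the runner's actual reporting code: the helper names `resolve_result` and `write_report` are hypothetical, and the choice that a missing fixture overrides any other outcome is one reading of the rule.

```python
import csv
import io
from pathlib import Path

# Column order taken from the "CSV columns" list above.
CSV_COLUMNS = [
    "test_id", "test_name", "tier", "marker", "traces_to_acs",
    "traces_to_restricts", "data_status", "started_at", "execution_time_ms",
    "result", "expected_metric", "actual_metric", "tolerance",
    "error_message", "git_sha", "image_tag", "runner_host",
]


def resolve_result(required_fixtures, outcome):
    """Return (result, error_message); a missing fixture yields BLOCKED-DATA."""
    missing = [str(p) for p in required_fixtures if not Path(p).exists()]
    if missing:
        # Assumption: missing data wins over PASS/FAIL/SKIP.
        return "BLOCKED-DATA", "missing fixture(s): " + ", ".join(missing)
    return outcome, ""


def write_report(rows, stream):
    """Write one CSV row per test execution; absent columns stay empty."""
    writer = csv.DictWriter(stream, fieldnames=CSV_COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        row = dict(row)
        # traces_to_acs is stored semicolon-joined.
        if isinstance(row.get("traces_to_acs"), (list, tuple)):
            row["traces_to_acs"] = ";".join(row["traces_to_acs"])
        writer.writerow(row)
```

In a pytest setting the same check would typically live in a fixture or hook so that individual tests do not have to repeat it; where exactly it is wired in is left open here.
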
## Test Execution

**Decision: both (per-tier split).** The system is hardware-dependent (Jetson Orin Nano Super + CUDA + TensorRT + thermal envelope + USB/MIPI cameras + MAVLink hardware loop), so execution is split between Docker (T1/T2/T3 — pipeline-correctness, deferred-corpus, deferred-sitl) and real-hardware bench / field (T4 deferred-hil, T5 deferred-field).

### Hardware dependencies found

| Source | Indicator |
|--------|-----------|
| `_docs/00_problem/restrictions.md:26` | Cameras over USB / MIPI-CSI / GigE |
| `_docs/00_problem/restrictions.md:41` | Jetson Orin Nano Super — 67 TOPS INT8, 8 GB LPDDR5, 25 W TDP |
| `_docs/00_problem/restrictions.md:42` | JetPack + CUDA + TensorRT |
| `_docs/00_problem/restrictions.md:43` | Sustained 25 W for 8 h at upper-envelope temperature (AC-NEW-5) |
| `_docs/00_problem/restrictions.md:48-51` | IMU + MAVLink2 from FC (serial/UDP); ArduPilot only |
| `_docs/01_solution/solution.md` | cuVSLAM (GPU), VPR DINOv2-VLAD (TensorRT), cross-view matcher (TensorRT) |
| this file (`environment.md`) | `runtime: nvidia`; `linux/arm64` HW tier + `linux/amd64+cuda` SW emulation tier; `nvidia-smi-exporter` |

Source-code scan is deferred to the first implement cycle (no source code yet at Plan Step 1).

### Mode A — Docker (T1 / T2 / T3)

**Prerequisites:**

- Docker 24.x+ with Compose v2
- For HW-tier runners: NVIDIA Container Toolkit + a host with an NVIDIA GPU (sm_87 for true Orin parity; sm_86 acceptable for SW emulation)
- For SW-emulation runners: `linux/amd64` host; CUDA emulation layer enabled in the SUT image's `linux/amd64+cuda` build target
- T2 only: deferred-corpus volumes mounted (AerialVL S03, etc. — see `test-data.md`)
- T3 only: `ardupilot-sitl` PR-#30080-pinned image pulled

**Run:**

```bash
# T1 pipeline
docker compose -f e2e/docker-compose.test.yml run --rm e2e-runner \
  pytest -m "blackbox and pipeline" --csv=/results/report.csv

# T2 deferred-corpus (corpus volumes must be present)
docker compose -f e2e/docker-compose.test.yml --profile corpus run --rm e2e-runner \
  pytest -m "deferred-corpus" --csv=/results/report.csv

# T3 deferred-sitl
docker compose -f e2e/docker-compose.test.yml --profile sitl run --rm e2e-runner \
  pytest -m "deferred-sitl" --csv=/results/report.csv
```

**Result collection:** host bind-mount `e2e-results:./results` — produces `report.csv`, `junit.xml`, `coverage_by_ac.csv`, `per_tier.csv`.

**Environment variables (key):** `MAVLINK_FC_URL`, `MAVLINK_GCS_URL`, `GPSD_API_BIND`, `GPSD_TILE_DIR`, `GPSD_FDR_DIR`, `MAVLINK2_SIGNING_KEY`, `JWT_SIGNING_KEY` — full list in `e2e/.env.example` (to be produced in Phase 4 / Decompose).

### Mode B — Local on bench Jetson (T4 deferred-hil)

**Prerequisites:**

- Real Jetson Orin Nano Super dev kit with JetPack 6.x, CUDA 12.x, TensorRT 10.x
- Bench MAVLink loop (a second Jetson or a USB-MAVLink dongle running `ardupilot-sitl` against a recorded IMU stream, OR a real autopilot board on bench)
- Thermal chamber (AC-NEW-5 only; otherwise lab ambient is sufficient for AC-4.1 / AC-4.2 / AC-NEW-1 cold-start / AC-NEW-3 8-h soak)
- `tegrastats` and `nvidia-smi` available
- Single-tenant scheduling — no other tests share the Jetson during a T4 run

**Run:**

```bash
# T4 perf binding on real HW
./scripts/run-tests.sh --tier=t4
# Or specifically the perf script for AC-4.1 / AC-NEW-5 binding
./scripts/run-performance-tests.sh --tier=t4 --thermal-profile=hot-soak
```

**Result collection:** the bench runner copies `report.csv` + `junit.xml` + `tegrastats.log` + `power.csv` to a network share (path TBD by Decompose).

### Mode C — Field (T5 deferred-field)

Out-of-band per the field-test plan; not part of CI. Captured here for completeness — the runner is the same `e2e-runner` image plus a recorded-flight replay harness defined in the field-test plan.

### CI runner mapping

| Tier | CI runner type | Mode | Cadence |
|------|---------------|------|---------|
| T1 pipeline | Linux x86 + NVIDIA GPU (any sm_86+) OR Linux x86 with CUDA emulation | Docker | Every PR + nightly |
| T2 deferred-corpus | Linux x86 + NVIDIA GPU (sm_86+) with corpus volume mounted | Docker | Nightly + on-tag |
| T3 deferred-sitl | Linux x86 (CPU-only OK) | Docker | Nightly |
| T4 deferred-hil | Self-hosted Jetson Orin Nano Super bench runner | Local | Bench-on-demand + weekly thermal cycle |
| T5 deferred-field | n/a (per-sortie out-of-band) | Field | Per field-test plan |

Phase 4 (`run-tests.sh`, `run-performance-tests.sh`) consumes this section to choose between the Docker and bench-local code paths via the `--tier=` flag.

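The `--tier=` dispatch can be pictured as a small lookup from tier to execution mode. The sketch below is hypothetical: the scripts do not exist yet at this plan step, so `TIERS`, `build_command`, and the decision to run `pytest` directly on the bench Jetson for T4 are assumptions. The compose file path, profiles, and marker expressions are taken from the Mode A commands.

```python
COMPOSE_FILE = "e2e/docker-compose.test.yml"

# tier -> (mode, compose profile or None, pytest marker expression)
TIERS = {
    "t1": ("docker", None, "blackbox and pipeline"),
    "t2": ("docker", "corpus", "deferred-corpus"),
    "t3": ("docker", "sitl", "deferred-sitl"),
    "t4": ("local", None, "deferred-hil"),
}


def build_command(tier, results_dir="/results"):
    """Return the argv list that would run the given tier."""
    key = tier.lower()
    if key == "t5":
        # T5 is out-of-band per the field-test plan and never runs in CI.
        raise ValueError("T5 runs per the field-test plan, not via this script")
    if key not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    mode, profile, marker = TIERS[key]
    pytest_cmd = ["pytest", "-m", marker, f"--csv={results_dir}/report.csv"]
    if mode == "local":
        # Assumption: on the bench Jetson pytest runs directly, without Docker.
        return pytest_cmd
    cmd = ["docker", "compose", "-f", COMPOSE_FILE]
    if profile:
        cmd += ["--profile", profile]
    return cmd + ["run", "--rm", "e2e-runner"] + pytest_cmd
```

Returning an argv list rather than a shell string keeps quoting of the marker expression (`blackbox and pipeline`) out of the picture when the command is passed to `subprocess.run`.
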
## External Dependencies

The SUT does not call commercial satellite providers at runtime (AC-8.1). All upstream sourcing is the Suite Satellite Service's responsibility, which is **out of scope** for this build. The runner therefore mocks:

- `tile-cache-init` provides the cache contents the SUT would normally have synced from the Service pre-flight.
- `qgc-mock` is a black-box GCS sniffer + operator-hint injector — not a real QGroundControl instance, but speaks the same MAVLink wire.
- `gps-spoof-injector` simulates a malicious GPS signal for AC-NEW-2 / F-T12.
- `ardupilot-sitl` is the only autopilot under test (PX4 is out of scope per restrictions).
- The SUT's HTTPS API is exercised against the SUT directly — there is no upstream identity provider; JWTs are minted by the runner against a test-only signing key shared at SUT start.

No external mocks have access to internal SUT state.

@@ -0,0 +1,248 @@
# Performance Tests

> Deployment-binding numbers require Tier T4 (real Jetson Orin Nano Super @ 25 W). T1 runs are functional plausibility checks only — same caveat as `test-data.md` D2.

---

### NFT-PERF-01: End-to-end latency p95 ≤400 ms (AC-4.1)

**Summary**: From camera-frame capture to GPS_INPUT emission, p95 latency ≤ 400 ms on Orin Nano Super @ 25 W.
**Traces to**: AC-4.1. Tier: T4 (`deferred-hil`) for binding result; T1 functional smoke.
**Metric**: end-to-end latency in ms, sampled per-frame, aggregated to p50 / p95 / p99.

**Preconditions**:
- Tier T4: real Jetson Orin Nano Super, 25 W power mode (`nvpmodel -m 0` + 25 W profile), thermals stabilized at +25 °C ambient.
- TRT engines warmed (≥1 min steady-state replay before measurement).
- 30-min sustained replay of `synthetic_8h_load` slice (or AerialVL S03 mid-segment).
- Frame timestamping uses the camera-shim `time_usec` and matches against the GPS_INPUT `time_usec`.

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Stream nav-cam frames at 3 fps for 30 min after warm-up | per-frame `(t_emit_gps_input - t_capture)` |
| 2 | Drop the first 60 s as warm-up | aggregate the rest |
| 3 | Compute p50, p95, p99, max | report |
| 4 | Verify drop rate | `dropped_frames / total_frames ≤ 10%` |

**Pass criteria**: p95 ≤ 400 ms; drop rate ≤ 10 % (per AC-4.1's "skip-allowed" clause).
**Duration**: 30 min + 60 s warm-up.

---

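The aggregation in steps 2 to 4 of NFT-PERF-01 can be sketched as below. It is an illustration only: the nearest-rank percentile definition, the function names, and the input shape (one `(t_capture_s, t_emit_s)` pair per streamed frame, with `None` as the emit time for a dropped frame) are assumptions, since the plan does not fix how percentiles are computed.

```python
import math


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, q in (0, 100]."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_latency(frames, warmup_s=60.0, p95_budget_ms=400.0,
                      max_drop_rate=0.10):
    """frames: list of (t_capture_s, t_emit_s or None), one per streamed frame."""
    if not frames:
        raise ValueError("no frames supplied")
    t0 = min(t_capture for t_capture, _ in frames)
    # Step 2: discard everything captured during the warm-up period.
    measured = [(c, e) for c, e in frames if c - t0 >= warmup_s]
    latencies = sorted((e - c) * 1000.0 for c, e in measured if e is not None)
    if not latencies:
        raise ValueError("no latency samples after warm-up")
    # Step 4: a frame with no matching GPS_INPUT counts as dropped.
    drop_rate = 1.0 - len(latencies) / len(measured)
    summary = {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "max_ms": latencies[-1],
        "drop_rate": drop_rate,
    }
    summary["passed"] = (
        summary["p95_ms"] <= p95_budget_ms and drop_rate <= max_drop_rate
    )
    return summary
```

Dropped frames are excluded from the latency distribution and counted only in `drop_rate`, which mirrors the two separate pass criteria.
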
### NFT-PERF-02: cuVSLAM single-frame latency ≤20 ms

**Summary**: cuVSLAM inference completes within 20 ms per frame.
**Traces to**: results_report row 37, F-T1b. Tier: T4 binding; T1 functional.
**Metric**: cuVSLAM per-frame inference duration, p95.

**Preconditions**: cuVSLAM warmed; mono+IMU mode.

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay 5 min of nav-cam frames at 3 fps | per-frame `cuvslam_inference_ms` (publicly exposed metric) |
| 2 | p95 over the run | report |

**Pass criteria**: p95 ≤ 20 ms.
**Duration**: 5 min.

---

### NFT-PERF-03: Cross-view matcher latency

**Summary**: Inline matcher (SP+LG TRT FP16/INT8) ≤ 200 ms / pair; LiteSAM re-loc fallback ≤ 2000 ms / pair.
**Traces to**: AC-4.1 (sub-budget), results_report row 38. Tier: T4 binding.
**Metric**: per-pair matcher inference time, p95.

**Preconditions**: matcher warmed; representative resolution (1024×768 SP+LG / GIM-LG).

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay 1000 cross-view pairs through inline path | `inline_matcher_ms` per pair |
| 2 | Replay 100 cross-view pairs through re-loc path | `reloc_matcher_ms` per pair |

**Pass criteria**: inline p95 ≤ 200 ms; re-loc p95 ≤ 2000 ms.
**Duration**: ≤30 min.

---

### NFT-PERF-04: Orthority per-frame latency ≤50 ms

**Summary**: Orthority's per-frame ortho call on Orin Nano Super stays within budget.
**Traces to**: F-T14, M-27. Tier: T4 binding. If exceeded, fall back to `cv2.warpPerspective + bilinear DEM` per Component 1b documented fall-back.
**Metric**: ortho per-frame duration, p95.

**Preconditions**: Orthority loaded; SRTM-30 m DEM mmap warm; sector classified `flat` or `moderate`.

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay 1000 frames | per-frame `ortho_ms` |

**Pass criteria**: p95 ≤ 50 ms. If FAIL: open task to switch to fall-back path (not a blocking gate at this test, but a flow trigger).
**Duration**: ≤10 min.

---

### NFT-PERF-05: Spoofing-promotion latency ≤3 s p95 (AC-NEW-2)

**Summary**: Time from spoof onset to SUT promotion as primary GPS source.
**Traces to**: AC-NEW-2. Tier: T3 (`deferred-sitl`).
**Metric**: t_promote = `t_promotion_event - t_spoof_onset`, p95 over 50 trials.

**Preconditions**: SITL + `gps-spoof-injector`; FC EKF3 lane-switch event observable via `EKF_STATUS_REPORT`.

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | At t=0 inject spoof signal | observe SUT GPS_INPUT promotion (raised `fix_type` to 3D-fix-with-priority + STATUSTEXT `PROMOTE`) |
| 2 | Repeat 50 trials with randomised spoof magnitudes | distribution |

**Pass criteria**: p95 ≤ 3 s.
**Duration**: ≤30 min.

---

### NFT-PERF-06: Frame-by-frame output cadence (AC-4.4)

**Summary**: GPS_INPUT is streamed per-frame, not batched.
**Traces to**: AC-4.4. Tier: T1 + T4.
**Metric**: inter-frame interval distribution.

**Preconditions**: 30 min steady-state replay.

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay at 3 fps | sniff GPS_INPUT timestamps |
| 2 | Compute inter-arrival deltas | distribution |
| 3 | Verify no frame is delayed >1 inter-frame interval | — |

**Pass criteria**: |Δt - 1/3 s| ≤ 50 ms for ≥99 % of frames; no batches (no clusters of frames within the same 50 ms window).
**Duration**: 30 min.

---

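The cadence criterion of NFT-PERF-06 reduces to two checks on consecutive arrival times. The sketch below is one interpretation, not a definitive implementation: the function name `check_cadence` is hypothetical, and "batch" is read here as any pair of consecutive GPS_INPUT messages arriving less than 50 ms apart.

```python
def check_cadence(timestamps_s, fps=3.0, tol_s=0.050, min_fraction=0.99):
    """timestamps_s: ascending GPS_INPUT arrival times in seconds."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    period = 1.0 / fps
    deltas = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    # Criterion 1: |dt - 1/fps| <= tolerance for at least min_fraction of deltas.
    on_time = sum(1 for d in deltas if abs(d - period) <= tol_s)
    # Criterion 2 (assumed reading): no two messages inside one tolerance window.
    batched_pairs = sum(1 for d in deltas if d < tol_s)
    fraction = on_time / len(deltas)
    return {
        "on_time_fraction": fraction,
        "batched_pairs": batched_pairs,
        "passed": fraction >= min_fraction and batched_pairs == 0,
    }
```

If the SUT is configured to emit duplicate GPS_INPUT messages between frames, the timestamps would need to be reduced to one per source frame before this check; how duplicates are identified is not specified here.
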
### NFT-PERF-07: GPS_INPUT message rate (results_report row 9)

**Summary**: GPS_INPUT emitted at 5–10 Hz continuous (matches per-frame at 3 fps + duplicates for FC stability when configured).
**Traces to**: AC-4.3, results_report row 9. Tier: T1.
**Metric**: rate over 60 s windows.

**Preconditions**: steady-state tracking.

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Sniff GPS_INPUT for 5 min | per-second rate |

**Pass criteria**: rate ∈ [5, 10] Hz throughout.
**Duration**: 5 min.

---

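The per-second rate measurement in NFT-PERF-07 can be sketched as a simple bucket count. The helper names are hypothetical, and two choices are assumptions: seconds are counted from the first captured message, and the final partial second is ignored so that it cannot fail the band check spuriously.

```python
from collections import Counter


def per_second_rates(timestamps_s):
    """Count GPS_INPUT messages in each whole second of the capture."""
    if not timestamps_s:
        return []
    t0 = timestamps_s[0]
    counts = Counter(int(t - t0) for t in timestamps_s)
    # Skip the last, possibly partial, second of the capture.
    full_seconds = int(timestamps_s[-1] - t0)
    return [counts.get(second, 0) for second in range(full_seconds)]


def rate_in_band(timestamps_s, lo_hz=5, hi_hz=10):
    """True when every full second holds between lo_hz and hi_hz messages."""
    rates = per_second_rates(timestamps_s)
    return bool(rates) and all(lo_hz <= r <= hi_hz for r in rates)
```

A second with zero messages shows up as a rate of 0 and fails the check, so short outages are caught even when the 5-minute average stays inside the band.
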
### NFT-PERF-08: VPR latency under conditional invocation

**Summary**: VPR's DINOv2 forward only fires on re-loc triggers; in cruise it stays near zero CPU/GPU.
**Traces to**: AC-8.6, restrictions §Satellite (VPR retrieval unit). Tier: T4.
**Metric**: VPR invocations / second; cruise idle vs re-loc burst.

**Preconditions**: 60-min replay with scripted re-loc triggers (cold start, sharp turn, σ_xy > 50 m, VO failure ≥2 frames).

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Run replay | per-second `vpr_invocations` counter |
| 2 | Compute average across cruise window vs re-loc window | — |

**Pass criteria**:
- Cruise window (no triggers): VPR invocations / 100 frames ≤ 1 (i.e., not invoked per-frame).
- Re-loc window: VPR invokes within 1 frame of trigger; latency ≤ 200 ms p95 for the DINOv2 forward.

**Duration**: 60 min.

---

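The invocation-gating part of the NFT-PERF-08 pass criteria can be sketched in terms of frame indices. Everything specific here is an assumption for illustration: the function name, the idea that invocations and triggers are available as frame indices, and especially `reloc_window_frames`, since the plan does not say how long a re-loc window lasts after a trigger. The DINOv2 latency criterion is not covered.

```python
def check_vpr_gating(invocations, triggers, total_frames,
                     reloc_window_frames=30):
    """invocations / triggers: frame indices; returns cruise rate and late triggers."""
    # Frames belonging to any re-loc window (assumed fixed length per trigger).
    reloc = set()
    for t in triggers:
        reloc.update(range(t, min(t + reloc_window_frames, total_frames)))
    cruise_frames = total_frames - len(reloc)
    cruise_calls = sum(1 for f in invocations if f not in reloc)
    per_100 = 100.0 * cruise_calls / cruise_frames if cruise_frames else 0.0
    # A trigger is served on time if VPR ran on the trigger frame or the next one.
    invoked = set(invocations)
    late = [t for t in triggers if not ({t, t + 1} & invoked)]
    return {
        "cruise_per_100_frames": per_100,
        "late_triggers": late,
        "passed": per_100 <= 1.0 and not late,
    }
```

For a 60-minute replay at 3 fps, `total_frames` would be 10800.
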
### NFT-PERF-09: Top-K dynamic sizing matches sector / σ_xy

**Summary**: VPR top-K honours AC-8.6 dynamic-K rules.
**Traces to**: AC-8.6. Tier: T1 + T4.
**Metric**: K value selected per VPR call vs sector class + σ_xy.

**Preconditions**: scripted scenarios with (sector ∈ {stable, active}) × (σ_xy ∈ {10, 30, 60}).

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Trigger VPR in each combination | observe `vpr_top_k` metric |

**Pass criteria**:
- stable + σ_xy ≤ 20 m → K=5.
- active-conflict → K=20.
- expanding-window fallback (σ_xy > 50 m or fail-N) → K=50.

**Duration**: 5 min.

---

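The three NFT-PERF-09 rules can be written as an expected-value oracle for the `vpr_top_k` metric. This is a sketch under two assumptions that the pass criteria do not settle: the expanding-window fallback is taken to override the sector rule, and the combination "stable with 20 m < σ_xy ≤ 50 m" is reported as unspecified (`None`) rather than guessed. The function name is hypothetical.

```python
from typing import Optional


def expected_top_k(sector: str, sigma_xy_m: float,
                   fail_n: bool = False) -> Optional[int]:
    """Expected K for one VPR call, or None where the rules are silent."""
    # Assumed precedence: the expanding-window fallback wins over the sector.
    if sigma_xy_m > 50.0 or fail_n:
        return 50
    if sector == "active":
        return 20
    if sector == "stable" and sigma_xy_m <= 20.0:
        return 5
    return None  # e.g. stable sector with 20 m < sigma_xy <= 50 m


# Scenario matrix from the preconditions: sector x sigma_xy.
MATRIX = {
    (sector, sigma): expected_top_k(sector, sigma)
    for sector in ("stable", "active")
    for sigma in (10, 30, 60)
}
```

A test would compare the observed `vpr_top_k` against `MATRIX` for each scripted scenario and treat `None` entries as "record the value, do not assert".
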
### NFT-PERF-10: Failsafe latency ≤3 s no-fix → FC fallback (AC-5.2)

**Summary**: When SUT cannot produce any estimate for >3 s, FC observably falls back to IMU-only DR.
**Traces to**: AC-5.2. Tier: T3.
**Metric**: time from last-fix-emission to FC fallback signal in `EKF_STATUS_REPORT`.

**Preconditions**: scripted blackout in SITL.

**Steps**: blackout pipeline; observe FC.

**Pass criteria**: FC fallback observable within 4 s of blackout (3 s budget + 1 s observation latency).
**Duration**: 5 min.

---

### NFT-PERF-11: Bench-off candidates — accuracy-vs-latency frontier

**Summary**: Score inline matcher candidates on the documented bench-off corpora.
**Traces to**: AC-1.1 / AC-1.2 / AC-2.2 / R2 / R3, F-T15. Tier: T2.
**Metric**: per-candidate (recall@30 m, p95 latency, peak GPU mem, sustained 30-min thermal stability, seasonal-robustness score).

**Preconditions**: AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN, TartanAir V2, internal Mavic.

**Steps**: run each candidate (SP+LG, GIM-LG, XFeat sparse, XFeat semi-dense) and each ceiling reference (RoMa v2, MASt3R-SLAM, MapGlue, MATCHA — offline only) over the corpora.

**Pass criteria**:
- Inline candidates must fit in 200 ms / pair on Orin Nano Super @ 25 W.
- Re-loc candidates (LiteSAM) must fit in 2 s / pair.
- Selected inline matcher's recall@30 m on AerialVL S03 must support AC-1.1 / AC-1.2.

**Duration**: 4 h Monte Carlo.

---

### NFT-PERF-12: Latency under adversarial input — no infinite stall
|
||||||
|
|
||||||
|
**Summary**: Pathological inputs (uniform-grey frame, all-black frame, very low contrast) do not cause unbounded latency.
|
||||||
|
**Traces to**: AC-3.x (resilience), AC-4.1 (negative). Tier: T1.
|
||||||
|
**Metric**: per-frame latency capped.
|
||||||
|
|
||||||
|
**Preconditions**: replay with 5 % of frames replaced by uniform-grey or all-black.
|
||||||
|
|
||||||
|
**Steps**: replay 30 min; observe latency CDF.
|
||||||
|
|
||||||
|
**Pass criteria**: each frame's latency ≤ 600 ms (1.5× p95 budget); pipeline never stalls beyond a single frame interval.
|
||||||
|
**Duration**: 30 min.
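The per-frame cap can be checked directly on the recorded latency series. The sketch below assumes the runner already holds one latency value per frame in milliseconds; the function name is hypothetical.

```python
from typing import List, Sequence

LATENCY_CAP_MS = 600.0  # from the pass criteria: 1.5 x the 400 ms p95 budget


def frames_over_cap(latencies_ms: Sequence[float],
                    cap_ms: float = LATENCY_CAP_MS) -> List[int]:
    """Return the indices of frames whose latency exceeds the cap."""
    return [i for i, value in enumerate(latencies_ms) if value > cap_ms]
```

An empty result means the cap held for every frame, including the injected grey and black ones.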
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test execution caveats
|
||||||
|
|
||||||
|
- **T1 runs**: the numbers produced are NOT deployment-binding. AC-4.1 / NFT-PERF-01 specifically require an Orin Nano Super at 25 W (T4) for a binding pass.
|
||||||
|
- **T4 runs**: bench scheduler enforces single-tenant access; thermal warm-up ≥1 min before measurement window starts.
|
||||||
|
- **Frame-rate floor**: AC-4.1 allows ~10 % drop under sustained load. Drop rate IS measured and reported in NFT-PERF-01.
|
||||||
@@ -0,0 +1,309 @@
|
|||||||
|
# Resilience Tests
|
||||||
|
|
||||||
|
> Each test defines fault injection + observable recovery + quantifiable pass/fail. All run through the public interfaces from `environment.md`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-01: Companion-computer process kill mid-flight (AC-5.3, AC-NEW-1)
|
||||||
|
|
||||||
|
**Summary**: SUT process killed mid-flight; SUT restarts and recovers from FC's IMU-extrapolated position within 30 s.
|
||||||
|
**Traces to**: AC-5.3, AC-NEW-1, F-T11, results_report row 25. Tier: T1.
|
||||||
|
|
||||||
|
**Preconditions**: SUT in steady-state tracking; FC continues to fly.
|
||||||
|
|
||||||
|
**Fault injection**:
|
||||||
|
- `docker kill -s SIGKILL <sut>` followed by `docker start <sut>`.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | SIGKILL SUT | SUT process exits non-gracefully; FC continues IMU-only DR per AC-5.2 |
|
||||||
|
| 2 | Restart SUT | container starts |
|
||||||
|
| 3 | Measure time from container start to first valid GPS_INPUT (`fix_type==3`) | t_recovery ≤ 30 s |
|
||||||
|
| 4 | Read `GLOBAL_POSITION_INT` from FC at SUT-start; assert pipeline seeds from it | source recovery via FC pose |
|
||||||
|
| 5 | After first satellite match, error ≤ 50 m | accuracy restored |
|
||||||
|
|
||||||
|
**Pass criteria**: t_recovery ≤ 30 s p95 over 50 trials; AC-5.2 fallback observable on FC during the gap; accuracy restored ≤ 50 m after first match.
|
||||||
|
**Duration**: 60 s per trial; 50-trial campaign on T4.
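The p95 criterion over 50 trials can be evaluated as follows. The source does not say which percentile definition applies; this sketch assumes the nearest-rank method, and both function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct % at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[max(rank, 1) - 1]


def recovery_passes(t_recovery_s: Sequence[float], budget_s: float = 30.0) -> bool:
    """True when the 95th percentile of the recovery times is within budget."""
    return percentile_nearest_rank(t_recovery_s, 95.0) <= budget_s
```

With 50 trials the nearest-rank p95 is the 48th-smallest value, so at most two trials may exceed the 30 s budget.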
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-02: GPS spoofing — promotion within 3 s (AC-NEW-2)
|
||||||
|
|
||||||
|
**Summary**: FC GPS-loss / lane-switch event signalled → SUT promotes its estimate to primary within 3 s.
|
||||||
|
**Traces to**: AC-NEW-2, F-T12. Tier: T3 (`deferred-sitl`).
|
||||||
|
|
||||||
|
**Preconditions**: SITL + `gps-spoof-injector`.
|
||||||
|
|
||||||
|
**Fault injection**:
|
||||||
|
- Inject malicious `GPS_RAW_INT` with 1 km lat/lon offset starting at scripted t=0.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | t=0: inject spoof | FC observes anomaly; emits EKF lane-switch / fix-loss in `EKF_STATUS_REPORT` |
|
||||||
|
| 2 | SUT subscribes to `GPS_RAW_INT`, `EKF_STATUS_REPORT`, `SYS_STATUS` and maintains a "real-GPS health" rolling average | health drops below threshold |
|
||||||
|
| 3 | Within 3 s, SUT raises GPS_INPUT to primary mode + emits STATUSTEXT `PROMOTE` to GCS | promotion event observable |
|
||||||
|
|
||||||
|
**Pass criteria**: 95th percentile of t_promote ≤ 3 s over 50 trials.
|
||||||
|
**Duration**: 30 min campaign.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-03: 3-s no-fix → FC fallback to IMU-only DR (AC-5.2)
|
||||||
|
|
||||||
|
**Summary**: Pipeline blackout for >3 s — FC falls back to IMU-only DR; SUT logs the failure.
|
||||||
|
**Traces to**: AC-5.2, restrictions §Failsafe. Tier: T3.
|
||||||
|
|
||||||
|
**Fault injection**: scripted scenario where SUT cannot produce any estimate for 3.5 s (e.g., cuVSLAM tracking loss + cache poisoned + matcher offline).
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | Inject blackout | SUT publishes STATUSTEXT WARN within 1 s of blackout |
|
||||||
|
| 2 | At t=3 s of blackout, SUT emits a single STATUSTEXT FAILSAFE | recorded |
|
||||||
|
| 3 | Observe FC `EKF_STATUS_REPORT` | FC switches to IMU-only DR within 4 s of blackout start |
|
||||||
|
| 4 | After 5 s, restore pipeline | SUT re-emits valid GPS_INPUT; FC re-fuses |
|
||||||
|
|
||||||
|
**Pass criteria**: FC fallback observable within 4 s; SUT recovers within 30 s of pipeline restore (matches AC-NEW-1 budget).
|
||||||
|
**Duration**: 60 s per trial.
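The timing expectations in the steps above can be checked against a recorded event log. This is a sketch under assumptions: the event names (`WARN`, `FAILSAFE`, `FC_FALLBACK`) and the `(timestamp, name)` tuple layout are hypothetical stand-ins for whatever the runner extracts from STATUSTEXT and `EKF_STATUS_REPORT`.

```python
from typing import Dict, List, Optional, Tuple


def check_blackout_timeline(events: List[Tuple[float, str]],
                            t_blackout_s: float) -> Dict[str, bool]:
    """Evaluate the WARN / FAILSAFE / FC-fallback timing against the blackout start."""

    def first_offset(name: str) -> Optional[float]:
        times = [t for t, n in events if n == name]
        return min(times) - t_blackout_s if times else None

    warn = first_offset("WARN")
    fallback = first_offset("FC_FALLBACK")
    failsafe_count = sum(1 for _, n in events if n == "FAILSAFE")
    return {
        "warn_within_1s": warn is not None and warn <= 1.0,
        "single_failsafe": failsafe_count == 1,
        "fc_fallback_within_4s": fallback is not None and fallback <= 4.0,
    }
```

A trial passes only when every value in the returned dictionary is `True`.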
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-04: 3-consecutive-failures → RELOC_REQ + waiting state (AC-3.4)
|
||||||
|
|
||||||
|
**Summary**: When SUT cannot determine position for ≥3 consecutive frames AND ≥2 s, it sends a re-localization request.
|
||||||
|
**Traces to**: AC-3.4, results_report rows 20, 21, 46. Tier: T1.
|
||||||
|
|
||||||
|
**Fault injection**: scripted 3 frames of failed satellite matching + cuVSLAM degraded.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | Trigger 3 consecutive frame failures spanning ≥2 s | counter increments |
|
||||||
|
| 2 | Within 2 s of the third failure, STATUSTEXT `RELOC_REQ: last_lat=… last_lon=… uncertainty=…m` emitted | regex match |
|
||||||
|
| 3 | While waiting, SUT continues VO/IMU dead reckoning (`fix_type==0`, source `dead_reckoned`) and continues satellite-match attempts (counter increments) | observable |
|
||||||
|
| 4 | FC continues with last known position + IMU extrapolation | `EKF_STATUS_REPORT` consistent |
|
||||||
|
|
||||||
|
**Pass criteria**: regex matches; SUT continues emitting GPS_INPUT in waiting state; satellite-match counter increments.
|
||||||
|
**Duration**: 60 s.
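The "regex match" in step 2 could look like the sketch below. The field order follows the STATUSTEXT template in the step table; the numeric format (signed decimal, no exponent) and the helper name are assumptions, since the source elides the actual values.

```python
import re
from typing import Optional, Tuple

_NUM = r"[-+]?\d+(?:\.\d+)?"  # assumption: plain signed decimals
RELOC_REQ_RE = re.compile(
    rf"^RELOC_REQ: last_lat=(?P<lat>{_NUM}) last_lon=(?P<lon>{_NUM}) "
    rf"uncertainty=(?P<unc>{_NUM})m$"
)


def parse_reloc_req(text: str) -> Optional[Tuple[float, float, float]]:
    """Return (lat, lon, uncertainty_m) or None when the text does not match."""
    match = RELOC_REQ_RE.match(text)
    if match is None:
        return None
    return float(match["lat"]), float(match["lon"]), float(match["unc"])
```

The coordinates used in any test vector are illustrative only; the runner would apply the parser to the decoded STATUSTEXT text field.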
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-05: Operator hint workflow (AC-3.4, AC-6.2)
|
||||||
|
|
||||||
|
**Summary**: Operator hint is consumed as a 500 m seed for VPR/cross-view re-loc.
|
||||||
|
**Traces to**: AC-3.4, AC-6.2, F-T10, results_report row 22. Tier: T1.
|
||||||
|
|
||||||
|
**Preconditions**: SUT in re-loc waiting (after NFT-RES-04).
|
||||||
|
|
||||||
|
**Fault injection** (cooperative): `qgc-mock` sends STATUSTEXT `RELOC_HINT: lat=… lon=… sigma=500m`.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | Send hint | SUT consumes hint; STATUSTEXT `HINT_RECEIVED` echoed |
|
||||||
|
| 2 | First fix after hint | error ≤ 500 m |
|
||||||
|
| 3 | After next satellite match | error ≤ 50 m; `tracking_state == NORMAL` |
|
||||||
|
|
||||||
|
**Pass criteria**: as above.
|
||||||
|
**Duration**: 60 s.
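The error thresholds in steps 2 and 3 need a horizontal distance between the emitted fix and ground truth. A minimal sketch using the haversine formula on a spherical Earth follows; the helper names are hypothetical, and a project that already has a geodesy library would use that instead.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS-84 lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_error(estimate: Tuple[float, float], truth: Tuple[float, float],
                 limit_m: float) -> bool:
    """True when the estimate lies within limit_m of the ground-truth point."""
    return haversine_m(*estimate, *truth) <= limit_m
```

The same helper serves both the 500 m check after the hint and the 50 m check after the next satellite match.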
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-06: Sharp turn — VO-loss → satellite re-loc (AC-3.2)
|
||||||
|
|
||||||
|
**Summary**: <5 % overlap, <70°, <200 m drift triggers VO loss; satellite re-loc recovers within 3 frames.
|
||||||
|
**Traces to**: AC-3.2, F-T7. Tier: T1.
|
||||||
|
|
||||||
|
**Fault injection**: synthetic sharp-turn pair injected into `nav_cam_60_slice`.
|
||||||
|
|
||||||
|
**Steps**: see FT-P-14; resilience perspective: cuVSLAM tracking-loss event → matcher invocation via re-loc trigger → recovery.
|
||||||
|
|
||||||
|
**Pass criteria**: error ≤ 50 m within 3 frames of turn; cuVSLAM tracking-state returns to NORMAL.
|
||||||
|
**Duration**: 60 s.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-07: Disconnected-segment recovery (AC-3.3)
|
||||||
|
|
||||||
|
**Summary**: ≥3 disconnected segments per flight; each segment connects to prior trajectory via global retrieval.
|
||||||
|
**Traces to**: AC-3.3, F-T8. Tier: T1.
|
||||||
|
|
||||||
|
**Fault injection**: `disconnected_segments_replay` with ≥3 large gaps.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | Replay segment N (after gap) | VPR retrieves top-K candidate chunks; matcher relocalizes within 10 frames |
|
||||||
|
| 2 | After re-loc, trajectory continuity restored (no jump in EKF position beyond gap-expected) | `tracking_state == NORMAL` |
|
||||||
|
| 3 | Repeat for ≥3 segments | all 3 succeed |
|
||||||
|
|
||||||
|
**Pass criteria**: 3/3 segments recover within 10 frames; trajectory continuity maintained.
|
||||||
|
**Duration**: 5 min.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-08: cuVSLAM-degraded fall-back path
|
||||||
|
|
||||||
|
**Summary**: If cuVSLAM underperforms (tracking lost repeatedly), SUT degrades gracefully and emits `dead_reckoned` source label rather than producing wild estimates.
|
||||||
|
**Traces to**: AC-1.4, AC-3.x, R8 reframed. Tier: T1.
|
||||||
|
|
||||||
|
**Fault injection**: scripted cuVSLAM tracking loss for 30 s.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | Force cuVSLAM tracking-loss for 30 s | source label switches to `dead_reckoned`; horiz_accuracy grows |
|
||||||
|
| 2 | After 30 s, restore cuVSLAM | source label returns to `vo_extrapolated` or `satellite_anchored` |
|
||||||
|
| 3 | Verify GPS_INPUT during the 30 s window does not contain wild jumps | per-frame Δposition ≤ IMU integration bound |
|
||||||
|
|
||||||
|
**Pass criteria**: source label correctly transitions; no wild jumps; behaviour reversible.
|
||||||
|
**Duration**: 60 s.
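Step 3 compares each position step with an "IMU integration bound" that this document does not define. The sketch below substitutes a simple constant-acceleration bound, `v·dt + ½·a·dt²`, purely as an illustration; the bound, the function names and the local east/north track format are all assumptions.

```python
import math
from typing import List, Tuple


def max_step_m(speed_mps: float, accel_mps2: float, dt_s: float) -> float:
    """Largest plausible displacement over dt_s (assumed kinematic stand-in)."""
    return speed_mps * dt_s + 0.5 * accel_mps2 * dt_s ** 2


def find_wild_jumps(track: List[Tuple[float, float, float]],
                    speed_mps: float, accel_mps2: float) -> List[int]:
    """track holds (t_s, east_m, north_m); returns indices of implausible steps."""
    jumps = []
    for i in range(1, len(track)):
        t0, e0, n0 = track[i - 1]
        t1, e1, n1 = track[i]
        step = math.hypot(e1 - e0, n1 - n0)
        if step > max_step_m(speed_mps, accel_mps2, t1 - t0):
            jumps.append(i)
    return jumps
```

An empty list over the 30 s degraded window corresponds to the "no wild jumps" criterion.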
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-09: Tile-cache corruption — graceful degradation
|
||||||
|
|
||||||
|
**Summary**: Corrupted MBTiles entry triggers reject + WARN, not a crash.
|
||||||
|
**Traces to**: AC-8.3, AC-3.x. Tier: T1.
|
||||||
|
|
||||||
|
**Fault injection**: overwrite a tile sidecar JSON with garbage between SUT runs.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | Inject corruption | SUT logs WARN at cache-load |
|
||||||
|
| 2 | Replay frames over the affected sector | matcher does not consume the corrupt tile; falls through to next candidate |
|
||||||
|
| 3 | SUT process | does NOT crash; tracking_state may go DEGRADED for affected frames, then NORMAL |
|
||||||
|
|
||||||
|
**Pass criteria**: process alive; corrupt tile never produces `satellite_anchored`; recovery on next valid sector.
|
||||||
|
**Duration**: 60 s.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-10: SITL F-T9 source-switching (AC-4.3 Option A)
|
||||||
|
|
||||||
|
**Summary**: ArduPilot SITL fuses GPS_INPUT correctly; failover to `EK3_SRC2_*` when primary unavailable.
|
||||||
|
**Traces to**: AC-4.3, F-T9 Option A. Tier: T3.
|
||||||
|
|
||||||
|
**Fault injection**: temporarily stop SUT GPS_INPUT emission for 5 s; observe FC failover.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | SUT stops emitting | FC EKF3 detects loss; switches to `EK3_SRC2_*=GPS` |
|
||||||
|
| 2 | Resume SUT emission | EKF3 switches back; no double-fusion (no #30076 / #32506 symptoms) |
|
||||||
|
|
||||||
|
**Pass criteria**: clean switch in both directions; EKF3 logs show no double-fusion symptoms.
|
||||||
|
**Duration**: 15 min.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-11: MAVLink2 signing failure — FC rejects, SUT logs
|
||||||
|
|
||||||
|
**Summary**: When the runner sends a deliberately mis-signed GPS_INPUT, FC rejects and SUT/FC log the rejection.
|
||||||
|
**Traces to**: M-7, S-T1, F-T9 signing assertion. Tier: T3.
|
||||||
|
|
||||||
|
**Fault injection**: send a GPS_INPUT with valid schema but invalid signing tag.
|
||||||
|
|
||||||
|
**Steps**: see FT-N-14.
|
||||||
|
|
||||||
|
**Pass criteria**: FC ARM-rejects the message; STATUSTEXT WARN observable; FC continues on prior valid source.
|
||||||
|
**Duration**: 30 s.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-12: Stale-tile rejection (AC-NEW-6)
|
||||||
|
|
||||||
|
**Summary**: Tile beyond freshness budget (or grace zone) is rejected — `satellite_anchored` source label NEVER produced from it.
|
||||||
|
**Traces to**: AC-8.2, AC-NEW-6, NF-T6. Tier: T1.
|
||||||
|
|
||||||
|
**Fault injection**: `stale_tile_scenarios` with ages 7 / 11 / 13 / 18 months for active-conflict + stable-rear sectors.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | For each combination, replay frames over the affected sector | matcher invocation either skipped or scored 0 |
|
||||||
|
| 2 | Assert source label of resulting GPS_INPUT | NEVER `satellite_anchored` from stale tile |
|
||||||
|
| 3 | Confidence weight on tiles in 30-day grace zone | linearly decayed per spec |
|
||||||
|
|
||||||
|
**Pass criteria**: as above.
|
||||||
|
**Duration**: 5 min.
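The expected confidence weight in step 3 can be modelled for comparison against the SUT's behaviour. This sketch assumes the weight is 1.0 inside the freshness budget, decays linearly to 0.0 across the 30-day grace zone, and is 0.0 (rejected) beyond it; the actual decay endpoints and the per-sector budgets are defined in the spec, not here.

```python
GRACE_DAYS = 30.0  # grace-zone length named in the step table


def tile_confidence_weight(age_days: float, budget_days: float,
                           grace_days: float = GRACE_DAYS) -> float:
    """Expected weight for a tile of the given age (assumed linear decay)."""
    if age_days <= budget_days:
        return 1.0
    days_over = age_days - budget_days
    if days_over >= grace_days:
        return 0.0  # stale: must never yield satellite_anchored
    return 1.0 - days_over / grace_days
```

The 180-day budget used in any check is illustrative; the runner would pass the budget that applies to the sector under test.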
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-13: F-T16 cloud-occlusion injection
|
||||||
|
|
||||||
|
**Summary**: Synthetic cloud occlusion on a fraction of frames does not cause cascading failure.
|
||||||
|
**Traces to**: F-T16, AC-3.x. Tier: T2 (`deferred-corpus`).
|
||||||
|
|
||||||
|
**Fault injection**: 30 % of frames in AerialVL S03 replay overlaid with synthetic cloud cover.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | Run replay | matcher fails on cloud-occluded frames; pipeline degrades to `vo_extrapolated` |
|
||||||
|
| 2 | After cloud passes, satellite re-loc resumes | source returns to `satellite_anchored` |
|
||||||
|
|
||||||
|
**Pass criteria**: AC-1.1 / AC-1.2 still met on the non-cloud-frame subset; pipeline does not enter unrecoverable state.
|
||||||
|
**Duration**: 90 min.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-14: 8-hour soak — no FDR rollover loss (AC-NEW-3)
|
||||||
|
|
||||||
|
**Summary**: Sustained 8 h replay; FDR caps at 64 GB and rolls over without silently dropping a payload class.
|
||||||
|
**Traces to**: AC-NEW-3, NF-T5. Tier: T4 (`deferred-hil`).
|
||||||
|
|
||||||
|
**Fault injection**: replay `synthetic_8h_load` continuously for 8 h.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Action | Expected Behavior |
|
||||||
|
|------|--------|------------------|
|
||||||
|
| 1 | Run replay | FDR populates |
|
||||||
|
| 2 | Inspect at every hour boundary | size monotonic up to cap; rollover events logged |
|
||||||
|
| 3 | After 8 h | FDR ≤ 64 GB; all payload classes present (positions, IMU, GPS_INPUT, tlog, system health, mid-flight tiles, failure-thumbnail log) |
|
||||||
|
|
||||||
|
**Pass criteria**: ≤ 64 GB; all classes present in the latest segment; rollover events logged for any class that hit cap.
|
||||||
|
**Duration**: 8 h.
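The hourly inspection and the final check can share one helper. In this sketch the class identifiers are hypothetical labels for the payload classes listed in step 3, and 64 GB is assumed to mean 64 × 10⁹ bytes.

```python
from typing import Iterable, List, Sequence

FDR_CAP_BYTES = 64 * 10**9  # assumption: decimal gigabytes
REQUIRED_CLASSES = frozenset({
    "positions", "imu", "gps_input", "tlog",
    "system_health", "mid_flight_tiles", "failure_thumbnails",
})


def check_fdr(hourly_sizes_bytes: Sequence[int],
              classes_present: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if any(size > FDR_CAP_BYTES for size in hourly_sizes_bytes):
        problems.append("cap exceeded")
    missing = REQUIRED_CLASSES - set(classes_present)
    if missing:
        problems.append("missing classes: " + ", ".join(sorted(missing)))
    return problems
```

The rollover-log assertion is left to the runner, since the log format is not described here.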
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-15: AC-NEW-7 cache-poisoning Service-side voting
|
||||||
|
|
||||||
|
**Summary**: Single-flight onboard tile is NOT promoted to trusted basemap until ≥2 voting flights confirm.
|
||||||
|
**Traces to**: AC-NEW-7, F-T3. Tier: T1 (with `service-stub`).
|
||||||
|
|
||||||
|
**Fault injection** (cooperative): submit a single-flight tile with deliberately deflated EKF covariance.
|
||||||
|
|
||||||
|
**Steps**: see FT-N-17.
|
||||||
|
|
||||||
|
**Pass criteria**: candidate stays `trust_level=candidate`; promotion only after N≥2 voting; for active sectors, single-flight promotion only when σ_xy ≤ 3 m AND OSM-road-overlap ≥ 70 %.
|
||||||
|
**Duration**: 5 min.
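The promotion rule in the pass criteria reduces to a short predicate the runner can use as an oracle against `service-stub`. The function and parameter names are hypothetical; the thresholds are the ones stated above.

```python
from typing import Optional


def may_promote(votes: int, sector_active: bool,
                sigma_xy_m: Optional[float] = None,
                osm_road_overlap: Optional[float] = None) -> bool:
    """True when a candidate tile may leave trust_level=candidate."""
    if votes >= 2:
        return True
    # Single-flight exception applies to active sectors only.
    if votes == 1 and sector_active and sigma_xy_m is not None \
            and osm_road_overlap is not None:
        return sigma_xy_m <= 3.0 and osm_road_overlap >= 0.70
    return False
```

A tile with deliberately deflated covariance but poor road overlap, as in the fault injection, must still return `False`.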
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-16: ROS 2 topic-rate sanity (F-T19)
|
||||||
|
|
||||||
|
**Summary**: Under simulated load, all expected ROS 2 contract topics meet expected publish rates.
|
||||||
|
**Traces to**: F-T19, Q6 → A. Tier: T1 (uses ROS 2 sniffer that subscribes only to documented contract topics, treating internal topics as opaque).
|
||||||
|
|
||||||
|
**Fault injection**: synthetic load (load generator publishes pseudo-image frames at 3 fps + IMU at 200 Hz).
|
||||||
|
|
||||||
|
**Steps**: subscribe to `nav_msgs/Odometry` (cuVSLAM output), `sensor_msgs/Image` (camera input), `mavros/global_position/global` (FC bridge), `mavros/imu/data` (FC bridge).
|
||||||
|
|
||||||
|
**Pass criteria**: each contract topic publishes at expected rate ± 10 % over a 5 min window.
|
||||||
|
**Duration**: 5 min.
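The ±10 % rate criterion can be evaluated from a message count over the measurement window. A minimal sketch, assuming the sniffer reports a per-topic count; the function name is hypothetical.

```python
def rate_within_tolerance(msg_count: int, window_s: float,
                          expected_hz: float, tolerance: float = 0.10) -> bool:
    """True when the measured publish rate is within tolerance of expected_hz."""
    measured_hz = msg_count / window_s
    return abs(measured_hz - expected_hz) <= tolerance * expected_hz
```

For the 5 min window, a 3 fps image topic is expected to deliver roughly 900 messages and a 200 Hz IMU topic roughly 60 000.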
|
||||||
@@ -0,0 +1,177 @@
|
|||||||
|
# Resource Limit Tests
|
||||||
|
|
||||||
|
> All tests measure resources via the `prom` (Prometheus) and `nvidia-smi-exporter` services defined in `environment.md`. None of these tests touch SUT internals.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-01: Memory ≤8 GB shared (AC-4.2)
|
||||||
|
|
||||||
|
**Summary**: Peak resident memory + GPU memory remains under the 8 GB shared LPDDR5 cap.
|
||||||
|
**Traces to**: AC-4.2, results_report row 35, NF-T2. Tier: T1 (Docker mem accounting) + T4 (`tegrastats`).
|
||||||
|
|
||||||
|
**Preconditions**: 30-min sustained replay on Orin Nano Super 25 W (T4) or 30-min replay on x86+CUDA emulation (T1 functional only).
|
||||||
|
|
||||||
|
**Monitoring**:
|
||||||
|
- `prom` scrapes the SUT's `/metrics` endpoint for `process_resident_memory_bytes`.
|
||||||
|
- `nvidia-smi-exporter` (T4) scrapes Jetson `tegrastats` for shared-LPDDR5 usage.
|
||||||
|
|
||||||
|
**Duration**: 30 min replay.
|
||||||
|
|
||||||
|
**Pass criteria**:
|
||||||
|
- T4 binding: peak shared LPDDR5 usage < 8192 MB throughout; growth ≤ 50 MB over the 30-min window (no leak).
|
||||||
|
- T1 functional: peak resident memory < 8192 MB; growth ≤ 50 MB.
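Both pass criteria can be computed from the scraped memory series. The source does not define how growth is measured; this sketch assumes the difference between the mean of the last few samples and the mean of the first few, and treats 1 MB as 2²⁰ bytes. Names are hypothetical.

```python
from statistics import mean
from typing import Sequence

MB = 1024 * 1024  # assumption: binary megabytes


def memory_check(samples_bytes: Sequence[int], cap_mb: float = 8192.0,
                 max_growth_mb: float = 50.0, edge: int = 5) -> bool:
    """True when peak stays under the cap and start-to-end growth is bounded."""
    peak_mb = max(samples_bytes) / MB
    growth_mb = (mean(samples_bytes[-edge:]) - mean(samples_bytes[:edge])) / MB
    return peak_mb < cap_mb and growth_mb <= max_growth_mb
```

Averaging over the edges keeps a single noisy sample from being reported as a leak.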
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-02: Thermal — junction temperature ≤80 °C, no throttle (results_report row 36)
|
||||||
|
|
||||||
|
**Summary**: SoC junction temperature stays below 80 °C; no thermal throttle event.
|
||||||
|
**Traces to**: results_report row 36, AC-NEW-5 (sub-budget). Tier: T4.
|
||||||
|
|
||||||
|
**Preconditions**: T4 only; +25 °C ambient.
|
||||||
|
|
||||||
|
**Monitoring**: `nvidia-smi-exporter` reads junction temp every 1 s.
|
||||||
|
|
||||||
|
**Duration**: 30 min replay.
|
||||||
|
|
||||||
|
**Pass criteria**: max(junction_temp_c) ≤ 80 °C; throttle_event_count == 0 (per `tegrastats throttle` indicator).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-03: AC-NEW-5 thermal envelope — 8 h @ 25 W @ +50 °C ambient
|
||||||
|
|
||||||
|
**Summary**: Cooling solution sustains 25 W for 8 h at +50 °C ambient without thermal throttling.
|
||||||
|
**Traces to**: AC-NEW-5, NF-T3, restriction §Onboard Hardware. Tier: T4 (`deferred-hil`) — requires hot-soak chamber.
|
||||||
|
|
||||||
|
**Preconditions**: hot-soak chamber, +50 °C ambient stabilized; SUT in 25 W mode running `synthetic_8h_load`.
|
||||||
|
|
||||||
|
**Monitoring**: junction temp + throttle indicator via `tegrastats`; ambient temp probe; FDR thermal log (AC-NEW-3 includes thermal traces).
|
||||||
|
|
||||||
|
**Duration**: 8 h.
|
||||||
|
|
||||||
|
**Pass criteria**: throttle_event_count == 0 over 8 h. If a throttle event does occur, a STATUSTEXT must be emitted to GCS automatically (verify this behaviour with a deliberate throttle injection in a separate run).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-04: AC-NEW-5 cold-soak cold-start
|
||||||
|
|
||||||
|
**Summary**: Cold-start TTFF at −20 °C ambient meets AC-NEW-1 budget.
|
||||||
|
**Traces to**: AC-NEW-5 cold corner, AC-NEW-1, NF-T3 cold-soak. Tier: T4 (`deferred-hil`) — requires cold chamber.
|
||||||
|
|
||||||
|
**Preconditions**: chamber stabilized at −20 °C with SUT powered off; nav-cam + IMU sources cold-replay-ready.
|
||||||
|
|
||||||
|
**Monitoring**: TTFF timer (per FT-P-16 / FT-P-T4 cold).
|
||||||
|
|
||||||
|
**Duration**: 50 cold boots within the cold chamber.
|
||||||
|
|
||||||
|
**Pass criteria**: 95th percentile TTFF ≤ 30 s.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-05: FDR — 8-h cap + rollover (AC-NEW-3, NF-T5)
|
||||||
|
|
||||||
|
**Summary**: After 8 h replay, FDR is ≤ 64 GB and no payload class silently dropped.
|
||||||
|
**Traces to**: AC-NEW-3, AC-8.5, NF-T5. Tier: T1 (volume-size accounting) + T4 (real disk).
|
||||||
|
|
||||||
|
**Preconditions**: clean `fdr` volume at start; `synthetic_8h_load` replay.
|
||||||
|
|
||||||
|
**Monitoring**: filesystem accounting per directory class; FDR rollover log (must record every dropped segment).
|
||||||
|
|
||||||
|
**Duration**: 8 h.
|
||||||
|
|
||||||
|
**Pass criteria**:
|
||||||
|
- Total FDR ≤ 64 GB.
|
||||||
|
- All payload classes present in the latest segment: per-frame positions w/ covariance + source-label, FC IMU full-rate, GPS_INPUT frames, MAVLink raw stream (tlog), system health (CPU / GPU / temp / throttle), mid-flight tiles, ≤0.1 Hz failure-thumbnail log.
|
||||||
|
- For each rollover, a STATUSTEXT or rollover log entry exists; no silent drop.
|
||||||
|
- Raw nav-cam / AI-cam frames are NOT present (AC-8.5 cross-check).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-06: Tile cache ≤ 10 GB persistent (restrictions §UAV)
|
||||||
|
|
||||||
|
**Summary**: Persistent satellite-tile cache for the 400 km² operational area + onboard-generated tiles fits in 10 GB.
|
||||||
|
**Traces to**: restrictions §UAV ("~10 GB" tile-cache budget). Tier: T1.
|
||||||
|
|
||||||
|
**Preconditions**: simulate 400 km² operational area (satellite tiles + DEM tiles + VPR chunk index) loaded; run a flight that generates onboard tiles; let cache settle.
|
||||||
|
|
||||||
|
**Monitoring**: filesystem size of `/probe/tiles/`.
|
||||||
|
|
||||||
|
**Duration**: 30 min replay (enough to populate onboard tiles).
|
||||||
|
|
||||||
|
**Pass criteria**: total cache size ≤ 10 GB after the flight; deduplication keeps onboard tiles per sector ≤ 1.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-07: GPU memory peak
|
||||||
|
|
||||||
|
**Summary**: TensorRT engines (cuVSLAM + matcher + VPR) collectively fit within Orin Nano Super shared LPDDR5 with headroom for the rest of the system.
|
||||||
|
**Traces to**: AC-4.2, NF-T2 (extended for ROS 2 image growth). Tier: T4.
|
||||||
|
|
||||||
|
**Preconditions**: all TRT engines loaded.
|
||||||
|
|
||||||
|
**Monitoring**: `tegrastats` GPU memory line.
|
||||||
|
|
||||||
|
**Duration**: steady-state 5 min after warm-up.
|
||||||
|
|
||||||
|
**Pass criteria**: GPU memory ≤ 4 GB (leaves ≥ 4 GB for ROS 2 nodes + working set + OS); engine reservation ≥ 1 GB for matcher + VPR (per NF-T2 extended).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-08: Per-frame GPU latency budget breakdown
|
||||||
|
|
||||||
|
**Summary**: Sum of (cuVSLAM + matcher + VPR + Component 5 calibrator + Component 1b ortho) ≤ 400 ms p95 per AC-4.1.
|
||||||
|
**Traces to**: AC-4.1, NFT-PERF-01..04. Tier: T4.
|
||||||
|
|
||||||
|
**Monitoring**: per-stage timers exposed via `/metrics`.
|
||||||
|
|
||||||
|
**Duration**: 30 min replay.
|
||||||
|
|
||||||
|
**Pass criteria**: Σ p95(per-stage) ≤ 400 ms; each component within its sub-budget (cuVSLAM ≤ 20 ms, matcher inline ≤ 200 ms, ortho ≤ 50 ms, VPR conditional ≤ 200 ms only on triggers, calibrator ≤ 5 ms).
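Because the sub-budgets add up to more than 400 ms, the total and the per-stage limits have to be checked separately. The sketch below assumes the sub-budget figures are in milliseconds and that the runner has already reduced each `/metrics` timer to a p95 value; the stage keys are hypothetical.

```python
from typing import Dict, Tuple

TOTAL_BUDGET_MS = 400.0
SUB_BUDGET_MS = {
    "cuvslam": 20.0,
    "matcher_inline": 200.0,
    "ortho": 50.0,
    "vpr": 200.0,        # conditional: counts only on trigger frames
    "calibrator": 5.0,
}


def check_stage_budgets(p95_ms: Dict[str, float]) -> Tuple[Dict[str, float], bool]:
    """Return (stages over their sub-budget, whether the summed p95 fits the total)."""
    over = {name: value for name, value in p95_ms.items()
            if value > SUB_BUDGET_MS[name]}
    return over, sum(p95_ms.values()) <= TOTAL_BUDGET_MS
```

A run can keep every stage inside its own limit and still fail the total, which is why both results are returned.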
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-09: ROS 2 + Isaac ROS image footprint
|
||||||
|
|
||||||
|
**Summary**: Deployment image fits the documented ~200 MB growth budget over the DIY-Python baseline.
|
||||||
|
**Traces to**: M-29 cost / benefit, NF-T2 extended. Tier: T1 (image inspection).
|
||||||
|
|
||||||
|
**Steps**: build the deployment image; compare against a baseline DIY-Python image manifest; assert delta ≤ 200 MB.
|
||||||
|
|
||||||
|
**Pass criteria**: delta ≤ 200 MB; matcher + VPR engine reservation ≥ 1 GB available at runtime.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-10: CPU usage — DDS overhead bound
|
||||||
|
|
||||||
|
**Summary**: ROS 2 DDS + topic serialisation overhead stays within the documented 2–5 % CPU.
|
||||||
|
**Traces to**: M-29 (Q6 → A cost / benefit). Tier: T4.
|
||||||
|
|
||||||
|
**Monitoring**: per-process CPU via `prom`; DDS process / `rmw_*` thread CPU specifically.
|
||||||
|
|
||||||
|
**Duration**: 30 min replay.
|
||||||
|
|
||||||
|
**Pass criteria**: DDS CPU mean ≤ 5 %; total SUT CPU ≤ 80 % to leave headroom for spikes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-11: Operational area ≤ 400 km² and 8-h flight cap
|
||||||
|
|
||||||
|
**Summary**: SUT correctly handles the documented operational ceiling (sector 150 km² + corridor 50 km² ≈ 200 km² typical, up to 400 km² total).
|
||||||
|
**Traces to**: restrictions §UAV. Tier: T1 (smoke + audit).
|
||||||
|
|
||||||
|
**Steps**: configure SUT with a 400 km² operational area; verify boot-time pre-allocation respects budget; run a synthetic flight at 60 km/h cruise for 30 min (a scaled-down proxy for the 8 h flight cap).
|
||||||
|
|
||||||
|
**Pass criteria**: SUT loads tile descriptors + VPR index without OOM; 30 min replay sustained at expected fps; resource budgets (NFT-RES-LIM-01..10) all green at this scale.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-RES-LIM-12: Disk I/O — FDR write rate sustainable
|
||||||
|
|
||||||
|
**Summary**: FDR write rate sustained over 8 h does not back up the writer or interfere with the inline pipeline.
|
||||||
|
**Traces to**: AC-NEW-3, AC-4.1 (no interference). Tier: T4.
|
||||||
|
|
||||||
|
**Monitoring**: NVMe write throughput (MB/s) via Prometheus + I/O wait via `vmstat`.
|
||||||
|
|
||||||
|
**Duration**: 8 h.
|
||||||
|
|
||||||
|
**Pass criteria**: write rate ≤ NVMe sustained throughput minus 30 % headroom; I/O wait does not contribute to AC-4.1 latency violations (NFT-PERF-01 still passes during the 8-h window).
|
||||||
@@ -0,0 +1,222 @@
|
|||||||
|
# Security Tests
|
||||||
|
|
||||||
|
> Black-box security scenarios at the public interfaces. Code-level vulnerability scanning is out of scope here (handled by Phase 4 security audit / `security/SKILL.md`).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-01: MAVLink2 signing — invalid signature rejected (S-T1)
|
||||||
|
|
||||||
|
**Summary**: A GPS_INPUT or other companion-bound MAVLink frame with invalid signing tag is rejected by the FC; SUT and FC both log the rejection.
|
||||||
|
**Traces to**: M-7, R10, restrictions §Sensors (MAVLink2 signing mandatory), S-T1, F-T9. Tier: T3 (`deferred-sitl`).
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | Runner injects a GPS_INPUT with valid schema but signing tag computed against a wrong key | FC discards frame; STATUSTEXT WARN visible at GCS |
|
||||||
|
| 2 | Inspect FC log | rejection event recorded |
|
||||||
|
| 3 | Subsequent valid GPS_INPUT | accepted normally |
|
||||||
|
|
||||||
|
**Pass criteria**: invalid frame discarded; FC continues on prior valid source; valid frames still accepted.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-02: HTTPS unauthenticated requests are rejected
|
||||||
|
|
||||||
|
**Summary**: All HTTPS API endpoints require valid JWT.
|
||||||
|
**Traces to**: results_report row 33, restriction "JWT auth on the HTTP boundary". Tier: T1.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Endpoint | Auth | Expected Response |
|
||||||
|
|------|---------|------|-------------------|
|
||||||
|
| 1 | `POST /sessions` | none | HTTP 401 |
|
||||||
|
| 2 | `POST /objects/locate` | none | HTTP 401 |
|
||||||
|
| 3 | `GET /sessions/{id}/stream` | none | HTTP 401 |
|
||||||
|
| 4 | `GET /health` | none | HTTP 200 (health is intentionally unauthenticated for liveness probes — confirm via S-T2) OR 401 if it requires auth |
|
||||||
|
|
||||||
|
**Pass criteria**: 1–3 return 401; 4's behaviour matches the documented contract (test asserts whichever the contract states). If `/health` is unauthenticated, body still must NOT leak sensitive state (no flight data, no key fingerprints).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-03: HTTPS — malformed / expired / wrong-issuer JWT
|
||||||
|
|
||||||
|
**Summary**: JWTs that fail validation are rejected.
|
||||||
|
**Traces to**: derived from results_report row 33. Tier: T1.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Token | Expected Response |
|
||||||
|
|------|-------|-------------------|
|
||||||
|
| 1 | malformed (`.foo.bar`) | HTTP 401 |
|
||||||
|
| 2 | expired (`exp` in the past) | HTTP 401 |
|
||||||
|
| 3 | wrong issuer | HTTP 401 |
|
||||||
|
| 4 | wrong signing algorithm (`none` algorithm) | HTTP 401 |
|
||||||
|
| 5 | missing required claim (e.g., `sub`) | HTTP 401 |
|
||||||
|
|
||||||
|
**Pass criteria**: all return 401 with no leaked state in the body.
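Two of the negative tokens need no signing key and can be built with the standard library alone. This is a sketch: the claim values and helper names are hypothetical, and the expired, wrong-issuer and missing-claim cases are omitted because they must be signed with a valid key so that only the targeted check fails.

```python
import base64
import json
import time
from typing import Dict, Optional


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def unsigned_token(claims: dict, alg: str = "none") -> str:
    """Build header.payload. with an empty signature segment."""
    header = {"alg": alg, "typ": "JWT"}
    return ".".join([
        _b64url(json.dumps(header).encode()),
        _b64url(json.dumps(claims).encode()),
        "",
    ])


def negative_tokens(now: Optional[int] = None) -> Dict[str, str]:
    """Tokens for rows 1 and 4 of the step table."""
    now = int(time.time()) if now is None else now
    return {
        "malformed": ".foo.bar",
        "alg_none": unsigned_token({"sub": "runner", "exp": now + 300}),
    }
```

Each token would be sent as a bearer credential to a protected endpoint, with HTTP 401 as the expected response.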
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-04: TLS — minimum version + downgrade rejection
|
||||||
|
|
||||||
|
**Summary**: TLS ≥1.2; weaker / downgrade attempts rejected.
|
||||||
|
**Traces to**: S-T2, derived from restriction "telemetry plumbing uses MAVSDK + HTTPS API". Tier: T1.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | Connect with TLSv1.0 / TLSv1.1 | refused |
|
||||||
|
| 2 | Connect with cipher suite from a known weak set (e.g., RC4) | refused |
|
||||||
|
| 3 | Valid TLSv1.2+ + modern cipher | accepted |
|
||||||
|
|
||||||
|
**Pass criteria**: all weak attempts refused; modern accepted.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-05: Tile-cache write attempt by unauthorized API path
|
||||||
|
|
||||||
|
**Summary**: SUT does not expose any HTTP path that allows external clients to write to the tile cache.
|
||||||
|
**Traces to**: AC-8.5 (storage policy), AC-NEW-7 (cache integrity), restriction §Satellite. Tier: T1.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | `POST /tiles` (or any guess) with valid JWT | 404 or 405 (no such endpoint) |
|
||||||
|
| 2 | Try `PUT /var/lib/gpsdenied/tiles/...` via any exposed API | 404 / 405 |
|
||||||
|
| 3 | Inspect the documented OpenAPI contract | no tile-write endpoints |
|
||||||
|
|
||||||
|
**Pass criteria**: no successful tile-write paths exist via HTTP; only the post-flight uploader (out-bound to `service-stub`) writes outside the SUT.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-06: Spoofed sysid / sysid collision (M-31)
|
||||||
|
|
||||||
|
**Summary**: A second device claiming sysid 11 (the SUT's sysid) — FC handles per ArduPilot routing rules.
|
||||||
|
**Traces to**: M-31, F-T9. Tier: T3.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | Runner publishes a fake GPS_INPUT from a sysid-collision sender | FC routing handles per documented behaviour (latest-talker wins or rejects) |
|
||||||
|
| 2 | Confirm FC parameter audit prints the actual sysid configured | matches deployment runbook (M-31 sysid collision-check) |
|
||||||
|
|
||||||
|
**Pass criteria**: behaviour matches documented FC routing rule; STATUSTEXT WARN observable; test verifies the deploy runbook's collision-check (M-31) catches this in pre-flight.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-07: Operator-hint injection — only signed STATUSTEXT consumed
|
||||||
|
|
||||||
|
**Summary**: Unsigned operator hints (or hints from a non-allowed sender) are not consumed.
|
||||||
|
**Traces to**: AC-6.2, M-7. Tier: T3.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | Send `RELOC_HINT` STATUSTEXT with invalid MAVLink2 signing | SUT discards; emits WARN |
|
||||||
|
| 2 | Send from a sysid not on the allowed-list | SUT discards |
|
||||||
|
| 3 | Send signed by allowed sender | SUT consumes (NFT-RES-05 covers happy path) |
|
||||||
|
|
||||||
|
**Pass criteria**: only authenticated, allowed-sender hints are consumed.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-08: GPS_RAW_INT spoofing chain — SUT promotion is the safety boundary
|
||||||
|
|
||||||
|
**Summary**: A spoofed `GPS_RAW_INT` cannot influence SUT's GPS_INPUT directly; SUT only uses GPS_RAW_INT for source-promotion logic, not for fusing.
|
||||||
|
**Traces to**: AC-NEW-2, restriction §Failsafe. Tier: T3.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | Inject GPS_RAW_INT with high-quality false fix | SUT does NOT use it as a position seed; only uses it for the "real-GPS health" rolling average |
|
||||||
|
| 2 | After scripted spoofing-pattern, SUT promotes its own estimate per AC-NEW-2 | promotion event observable |
|
||||||
|
|
||||||
|
**Pass criteria**: SUT GPS_INPUT positions never influenced by spoofed GPS_RAW_INT lat/lon (compare SUT GPS_INPUT vs ground truth from `coordinates.csv` during the spoof window).
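
The comparison in the pass criteria can be sketched as below. This is a minimal illustration, not the runner's actual code: the function names, the aligned-by-index sample layout, and the `max_error_m` threshold are assumptions made for the example, and the distance uses a spherical-Earth approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def spoof_influence_report(sut_fixes, truth_fixes, spoofed_fixes, max_error_m):
    """Return the samples where SUT output looks pulled towards the spoofed fix.

    Each argument is a list of (lat, lon) tuples aligned by index over the
    spoof window. A sample is flagged when the SUT error against ground truth
    exceeds max_error_m (hypothetical threshold) or when the SUT position is
    closer to the spoofed fix than to the ground truth.
    """
    violations = []
    for i, (sut, truth, spoof) in enumerate(zip(sut_fixes, truth_fixes, spoofed_fixes)):
        err_truth = haversine_m(*sut, *truth)
        err_spoof = haversine_m(*sut, *spoof)
        if err_truth > max_error_m or err_spoof < err_truth:
            violations.append((i, err_truth, err_spoof))
    return violations
```

An empty list means no sample in the window was flagged; any entry gives the frame index and both distances for triage.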
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-09: USB bypass surface — bench-only
|
||||||
|
|
||||||
|
**Summary**: Per the restriction, USB bypasses MAVLink2 signing, so the USB MAVLink endpoint must be **disabled in production** runtime config.
|
||||||
|
**Traces to**: M-7, restrictions §Onboard Hardware. Tier: T1 (config audit).
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | At SUT boot, inspect runtime config | USB MAVLink endpoint disabled in production profile (env var `MAVLINK_USB_ALLOWED=false` or absent) |
|
||||||
|
| 2 | Attempt to connect via USB | refused |
|
||||||
|
|
||||||
|
**Pass criteria**: production config refuses USB MAVLink; bench config (env var explicitly enabled) accepts.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-10: FDR — no sensitive-data leak
|
||||||
|
|
||||||
|
**Summary**: FDR contains the documented payload classes only — no private keys, no plaintext JWTs, no MAVLink2 signing keys, no raw frames (AC-8.5).
|
||||||
|
**Traces to**: AC-8.5, AC-NEW-3, S-T3 (data-at-rest). Tier: T1.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | After a 30 min replay, scan FDR for known-sensitive byte patterns (test-only signing key bytes; test JWT) | none found |
|
||||||
|
| 2 | Scan for raw JPEG headers in non-thumbnail-log payload classes | none |
|
||||||
|
| 3 | Verify failure-thumbnail log is ≤ 0.1 Hz and within FDR cap | as spec'd |
|
||||||
|
|
||||||
|
**Pass criteria**: no leaks; raw-frame storage policy enforced.
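
Steps 1 and 2 amount to a byte-pattern scan over each FDR payload. The sketch below assumes the payload is available as raw bytes and that the test-only secrets are known to the runner; `scan_fdr_payload` and its argument layout are hypothetical names. It only catches verbatim occurrences, so secrets stored in another encoding would need their own patterns.

```python
JPEG_SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker followed by the next marker prefix


def find_all(haystack: bytes, needle: bytes):
    """Return every offset at which needle occurs in haystack."""
    if not needle:
        raise ValueError("empty pattern")
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


def scan_fdr_payload(payload: bytes, secrets: dict):
    """Scan one FDR payload for known test secrets and raw JPEG headers.

    secrets maps a label (e.g. "signing_key") to the exact bytes to look for.
    Returns {label: [offsets]} containing only the labels that were found.
    """
    findings = {name: find_all(payload, pattern) for name, pattern in secrets.items()}
    findings["jpeg_header"] = find_all(payload, JPEG_SOI)
    return {name: hits for name, hits in findings.items() if hits}
```

For a non-thumbnail payload class the expected result is an empty dict; a `jpeg_header` hit inside the thumbnail log is expected and would be excluded by the caller.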
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-11: External-host network policy
|
||||||
|
|
||||||
|
**Summary**: SUT does not call external commercial satellite providers at runtime.
|
||||||
|
**Traces to**: AC-8.1, restrictions §Satellite. Tier: T1.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | Run a 5-min replay with `iptables` / Docker network policy capturing all out-bound connections | none of the captured destinations resolves to Maxar / Airbus / Planet / Sentinel-2 / Esri / etc. |
|
||||||
|
| 2 | The only allowed out-bound is to `service-stub` (the Suite Satellite Service candidate-pool endpoint, post-flight) | matches |
|
||||||
|
|
||||||
|
**Pass criteria**: no out-bound to commercial / public ortho providers at runtime.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-12: HTTPS — payload size + path-traversal hardening
|
||||||
|
|
||||||
|
**Summary**: Pathological HTTP requests do not crash the SUT or leak filesystem content.
|
||||||
|
**Traces to**: AC-3.x (resilience), restrictions (security defaults). Tier: T1.
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | `POST /objects/locate` with a 100 MB body | HTTP 413 (payload too large) |
|
||||||
|
| 2 | Path-traversal `GET /sessions/../../etc/passwd` | HTTP 404 / 400; no filesystem leak |
|
||||||
|
| 3 | Header-injection (`X-Forwarded-For: \r\nSet-Cookie: …`) | sanitised; no echo back |
|
||||||
|
|
||||||
|
**Pass criteria**: as above; SUT alive; no leak.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### NFT-SEC-13: AC-NEW-7 over-confidence injection — gate rejects
|
||||||
|
|
||||||
|
**Summary**: Synthetic over-confidence injection (1.5×–3× covariance deflation) does not let bad tiles into the trusted basemap.
|
||||||
|
**Traces to**: AC-NEW-7. Tier: T2 (`deferred-corpus`).
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected Response |
|
||||||
|
|------|----------------|-------------------|
|
||||||
|
| 1 | Replay AerialVL + Mavic + AerialExtreMatch with synthetic deflation | per-tile geo-misalignment computed |
|
||||||
|
| 2 | At the σ_xy boundaries (3 m, 5 m, 10 m), assert hard-gate behaviour | tiles with σ_xy > 5 m never written; tiles with σ_xy in (3, 5] m marked `trust_level=soft`; tiles with σ_xy ≤ 3 m marked `trust_level=candidate` |
|
||||||
|
|
||||||
|
**Pass criteria**: P(misalign > 30 m) < 1 %, P(misalign > 100 m) < 0.1 %; voting layer prevents single-flight promotion in non-active sectors.
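
The gate in step 2 and the exceedance probabilities in the pass criteria can be expressed as two small checks. This is a sketch under stated assumptions: the function names are illustrative, rejecting non-finite or negative σ_xy is an added safety assumption, and the probabilities are estimated as simple empirical rates over the replayed tiles.

```python
import math


def classify_tile(sigma_xy_m: float):
    """Map a tile's sigma_xy (metres) to a trust level, or None if it must not be written."""
    if not math.isfinite(sigma_xy_m) or sigma_xy_m < 0:
        return None  # assumption: invalid covariance is treated as a rejection
    if sigma_xy_m <= 3.0:
        return "candidate"
    if sigma_xy_m <= 5.0:
        return "soft"
    return None  # sigma_xy > 5 m: never written


def exceedance_rate(misalign_m, limit_m):
    """Fraction of tiles whose geo-misalignment exceeds limit_m."""
    return sum(1 for m in misalign_m if m > limit_m) / len(misalign_m)


def passes_misalignment_budget(misalign_m):
    """Pass criteria: P(misalign > 30 m) < 1 % and P(misalign > 100 m) < 0.1 %."""
    return exceedance_rate(misalign_m, 30.0) < 0.01 and exceedance_rate(misalign_m, 100.0) < 0.001
```

Note that the second criterion needs at least a few thousand tiles before a single 100 m outlier can be distinguished from a budget violation.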
|
||||||
@@ -0,0 +1,164 @@
|
|||||||
|
# Test Data Management
|
||||||
|
|
||||||
|
## Important Caveat — 60-image slice scope (per Phase 1 D2)
|
||||||
|
|
||||||
|
The 60 nav-cam JPGs in `_docs/00_problem/input_data/AD000001.jpg … AD000060.jpg` were captured at **400 m AGL** with the **ADTi Surveyor Lite 26S v2 (26 MP, 6252 × 4168, 25 mm, 23.5 mm sensor)** — **not** the deployment camera (ADTi 20MP 20L V1, APS-C, ~5472 × 3648) and **not** the deployment altitude (≤1 km AGL). This corpus is therefore **pipeline-correctness only**:
|
||||||
|
|
||||||
|
- It validates that the pipeline (cuVSLAM → VPR → matcher → Component 5 → MAVLink GPS_INPUT) produces the right **shape** of output, in the right **order**, with the right **categorical labels** and **MAVLink schema**.
|
||||||
|
- It does **NOT** validate the deployment-binding accuracy budgets (AC-1.1 ≥80 %@50 m, AC-1.2 ≥50 %@20 m), the GSD-band assumptions, the matcher resolution sweeps, or the latency budget for the deployed 1 km AGL / 20 MP path.
|
||||||
|
- Pass numbers from this slice on AC-1.1 / AC-1.2 / AC-2.1 / AC-2.2 / AC-NEW-8 are **functional, not deployment-binding**. The deployment-binding numbers come from the deferred-corpus tier (AerialVL S03, UAV-VisLoc, AerialExtreMatch, internal Mavic, first internal fixed-wing flight).
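
To make the GSD point concrete, the ground sample distance of the 60-image slice can be worked out from the parameters quoted above. The sketch assumes a nadir-pointing pinhole camera over flat terrain, that the 23.5 mm figure is the sensor width along the 6252 px axis, and square pixels; the function name is illustrative.

```python
def ground_sample_distance_m(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Nadir GSD in metres per pixel: altitude * sensor_width / (focal_length * image_width)."""
    return altitude_m * sensor_width_mm / (focal_length_mm * image_width_px)


# Parameters of the 60-image slice as stated in this section.
gsd = ground_sample_distance_m(400.0, 23.5, 25.0, 6252)
footprint_w = gsd * 6252  # ground width covered by one frame
footprint_h = gsd * 4168  # ground height, assuming square pixels

print(f"GSD {gsd * 100:.2f} cm/px, footprint {footprint_w:.0f} m x {footprint_h:.0f} m")
# prints "GSD 6.01 cm/px, footprint 376 m x 251 m"
```

The same formula applied to the deployment camera and altitude would give a different GSD, which is why accuracy and matcher-resolution numbers from this slice do not transfer.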
|
||||||
|
|
||||||
|
## Seed Data Sets
|
||||||
|
|
||||||
|
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|
||||||
|
|----------|-------------|---------------|-----------|---------|
|
||||||
|
| `nav_cam_60_slice` | 60 JPGs `AD000001.jpg`…`AD000060.jpg`, 6252×4168, captured at 400 m AGL | T1 pipeline-correctness tests (FT-P-01..FT-P-08, FT-N-01..FT-N-04) | volume mount `fixtures-images:/fixtures/images:ro` | volume is read-only — no cleanup |
|
||||||
|
| `nav_cam_60_slice_coordinates` | `coordinates.csv`: per-frame WGS84 ground truth | All T1 accuracy tests | mount path `/fixtures/images/coordinates.csv` | — |
|
||||||
|
| `nav_cam_60_slice_imu` *(synthetic, fixture)* | `fixtures/imu_AD0000xx.csv`: 200 Hz IMU traces synthesised by SITL ArduPilot replay of `coordinates.csv` as ground-truth trajectory | T1 cuVSLAM tests; F-T1c IMU-sync-jitter measurement | mount path `/fixtures/imu/` ; `ardupilot-sitl --imu-replay=...` | regenerated per test session |
|
||||||
|
| `satellite_tiles_AD0000xx_z20` *(placeholder fixture)* | z=20 ortho-tiles for the bbox of `coordinates.csv`, fetched offline by `tile-cache-init` from public ortho service (Esri / Mapbox / Sentinel-2 fallback gated to ≥0.5 m/px) | T1 cross-view matcher / VPR tests | volume `tile-cache:/var/lib/gpsdenied/tiles` | volume rebuilt per test session |
|
||||||
|
| `satellite_tile_descriptors_z20` | Pre-extracted SuperPoint keypoints + DINOv2-VLAD global descriptors for `satellite_tiles_AD0000xx_z20` | T1 VPR + matcher tests | same volume, sidecar `.descriptors.h5` files | same |
|
||||||
|
| `aerialvl_s03` *(deferred-corpus)* | AerialVL S03: 70 km of fixed-wing flight at 1 km AGL with synced IMU + GPS truth + nav-cam stream | T2 AC-1.3, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9 | external download script (data team task — Decompose); mount when present | not removed (large, kept across sessions) |
|
||||||
|
| `uav_visloc` *(deferred-corpus)* | UAV-VisLoc public dataset | T2 matcher / VPR seasonal-robustness regression | external download script | not removed |
|
||||||
|
| `aerialextrematch` *(deferred-corpus)* | AerialExtreMatch open-review dataset | T2 matcher seasonal-robustness regression | external download script | not removed |
|
||||||
|
| `2chadcnn_seasons` *(deferred-corpus)* | 2chADCNN season set (cross-season scene-change benchmark) | T2 NF-T*-season-robustness | external download script | not removed |
|
||||||
|
| `tartanair_v2` *(deferred-corpus)* | TartanAir V2 synthetic scenes | T2 matcher distillation evaluation | external download script | not removed |
|
||||||
|
| `internal_mavic` *(deferred-corpus)* | Internal Mavic 3 Pro Mini recorded flights (legacy attempt; no IMU per problem.md, used for visual-only checks) | T2 matcher visual-only regression | external `data team` mount | not removed |
|
||||||
|
| `internal_fixed_wing_first_sortie` *(deferred-field)* | First internal fixed-wing flight with synced IMU + GPS truth | T5 FT-1 / FT-2 / FT-3, AC-1.3 lock | field-test mount | not removed |
|
||||||
|
| `synthetic_8h_load` *(synthesisable)* | 8-hour synthetic 3 fps nav-frame replay sequence assembled from `nav_cam_60_slice` looped + jittered | NF-T3 thermal soak, NF-T5 FDR rollover (AC-NEW-3), AC-NEW-5 | generated at fixture build time by `fixtures/synth-8h-loader/` | regenerated per session |
|
||||||
|
| `cold_soak_corpus` *(deferred-hil)* | A short replay loop run at −20 °C ambient | T4 NF-T3 cold-soak, AC-NEW-1 cold | bench HW only | — |
|
||||||
|
| `hot_soak_corpus` *(deferred-hil)* | Same replay loop run at +50 °C ambient for 8 h | T4 NF-T3 hot-soak, AC-NEW-5 | bench HW only | — |
|
||||||
|
| `spoofing_scenarios` | Scripted MAVLink GPS_RAW_INT injections: jam-onset, lat/lon offset, sat-count drop, hdop spike | T3 F-T9 / F-T12, AC-NEW-2 | `gps-spoof-injector` config files | regenerated per session |
|
||||||
|
| `operator_hint_scenarios` | Scripted operator STATUSTEXT messages with approximate `(lat, lon, sigma_xy=500m)` | T3 F-T10, AC-3.4, AC-6.2, results_report row 22 | `qgc-mock` config | regenerated per session |
|
||||||
|
| `stale_tile_scenarios` | Synthetic-age tiles (1, 5, 7, 11, 13, 18 months old; both active-conflict and stable-rear sectors) | T1 NF-T6, AC-8.2 / AC-NEW-6 | injected into `tile-cache` by `tile-cache-init --inject-stale` | volume rebuilt per session |
|
||||||
|
| `cache_poisoning_scenarios` | Multi-flight Monte Carlo with synthetic over-confidence injection (EKF covariance deflated by 1.5×–3×) | T2 NF-T4b, AC-NEW-7 | generated by `fixtures/cache-poison-mc/` | regenerated per session |
|
||||||
|
| `cold_start_replay_50` | 50× cold-boot replay: SUT process killed and restarted with simulated FC pose injection | T1+T4 F-T11, AC-NEW-1 | scripted in `e2e-runner` test | — |
|
||||||
|
| `disconnected_segments_replay` | Synthetic ≥3 disconnected flight segments stitched from `nav_cam_60_slice` with gaps | T1 F-T8, AC-3.3 | generated at fixture build time | regenerated per session |
|
||||||
|
| `tile_dedup_replay` | A flight where ground sectors are visited twice — used to verify deduplication (AC-8.4) | T1 F-T2 | generated at fixture build time | regenerated per session |
|
||||||
|
| `mavlink2_signing_keys` | Test-only per-airframe HMAC-SHA256 signing keys | T1 / T3 F-T9, S-T1, MAVLink2 signing assertions | env var `MAVLINK2_SIGNING_KEY=…` shared SUT + runner + FC | rotated per session |
|
||||||
|
| `tls_test_certs` | Self-signed CA + SUT cert + client cert (test-only) | T1 S-T1..S-T5 HTTPS auth tests | mount `tls-test-certs:/etc/gpsdenied/tls:ro` | regenerated per session |
|
||||||
|
|
||||||
|
## Data Isolation Strategy
|
||||||
|
|
||||||
|
- **Container scope**: each test session starts with a clean `sut` container (no cache poisoning between sessions).
|
||||||
|
- **Volume scope**: `tile-cache` and `fdr` volumes are **rebuilt per test session** (not per test) — within a session, tests that depend on cache state are ordered or use namespaced subdirectories. `fixtures-images`, `fixtures-imu`, `fixtures-expected` are read-only; cannot be polluted.
|
||||||
|
- **Cross-test contamination**: tests that mutate state (cache writes, FDR writes) declare `pytest.mark.mutates_state` and are run in a serial group. Read-only tests run in parallel within a tier.
|
||||||
|
- **Identity isolation**: each session generates a fresh `mavlink2_signing_keys` set and JWT signing key — replay across sessions is impossible.
|
||||||
|
- **Resource isolation**: T4 deferred-hil tests do **not** share a Jetson with any other test; bench scheduler enforces single-tenant access.
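
One possible way to keep `mutates_state` tests apart from read-only tests is a collection hook in `conftest.py` that moves them to the end of the run. The sketch below is an assumption about how this could be wired, not the project's actual configuration; it only reorders tests, and pinning the mutating group to a single worker would additionally depend on whichever parallel runner is chosen.

```python
def pytest_collection_modifyitems(config, items):
    """Move tests marked `mutates_state` after all read-only tests.

    Relative order inside each group is preserved, so any explicit ordering
    between cache-dependent tests stays intact.
    """
    read_only, mutating = [], []
    for item in items:
        if item.get_closest_marker("mutates_state") is not None:
            mutating.append(item)
        else:
            read_only.append(item)
    items[:] = read_only + mutating  # in-place update, as pytest expects
```

The marker itself would also need to be registered in the pytest configuration to avoid unknown-marker warnings.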
|
||||||
|
|
||||||
|
## Input Data Mapping
|
||||||
|
|
||||||
|
| Input Data File | Source Location | Description | Covers Scenarios |
|
||||||
|
|-----------------|----------------|-------------|-----------------|
|
||||||
|
| `AD000001.jpg`…`AD000060.jpg` | `_docs/00_problem/input_data/` | 60 nav-cam JPGs, 6252×4168, 400 m AGL, ADTi 26S v2 | FT-P-01..FT-P-08, FT-N-01..FT-N-04, NF-RES-LIM-01..03 (T1) |
|
||||||
|
| `coordinates.csv` | `_docs/00_problem/input_data/` | Frame index → WGS84 ground truth | results_report rows 1–4, FT-P-01, FT-P-02, NFT-PERF-01 |
|
||||||
|
| `data_parameters.md` | `_docs/00_problem/input_data/` | Corpus-shoot params (400 m AGL, 26S v2, 25 mm, 23.5 mm sensor) | All T1 tests — context for pipeline-correctness scope |
|
||||||
|
| `AD000001_gmaps.png`, `AD000002_gmaps.png` | `_docs/00_problem/input_data/` | Two satellite reference thumbnails (frames 1–2 only) | Smoke-test only; not used as the cross-view reference (placeholder fixture is) |
|
||||||
|
| `expected_results/results_report.md` | `_docs/00_problem/input_data/` | 46-scenario expected results mapping | All T1 tests + most T2 tests; canonical pass/fail thresholds |
|
||||||
|
| `expected_results/position_accuracy.csv` | `_docs/00_problem/input_data/` | Per-frame ground truth + thresholds | results_report rows 1–3, FT-P-01, FT-P-02 |
|
||||||
|
|
||||||
|
## Expected Results Mapping
|
||||||
|
|
||||||
|
The canonical mapping is `_docs/00_problem/input_data/expected_results/results_report.md`. The traceability matrix references that file by row number. The summary table below lists the rows by the test scenario IDs that consume them.
|
||||||
|
|
||||||
|
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|
||||||
|
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
|
||||||
|
| FT-P-01 | `coordinates.csv` (60 frames) + `nav_cam_60_slice` + `satellite_tiles_AD0000xx_z20` + `nav_cam_60_slice_imu` | ≥80 % within 50 m | `percentage` | ≥80 % | `results_report` row 1; `position_accuracy.csv` |
|
||||||
|
| FT-P-02 | same | ≥50 % within 20 m | `percentage` | ≥50 % | `results_report` row 2; `position_accuracy.csv` |
|
||||||
|
| FT-P-03 | same | each frame ≤100 m error | `numeric_tolerance` | ±100 m max per frame | `results_report` row 3 |
|
||||||
|
| FT-P-04 | same | cumulative VO drift between satellite anchors ≤100 m mono / ≤50 m mono+IMU | `threshold_max` | mono: ≤100 m; mono+IMU: ≤50 m | `results_report` row 4 ; AC-1.3 / AC-NEW-8 |
|
||||||
|
| FT-P-05 | single frame + IMU | `fix_type=3, horiz_accuracy ∈ [1,50] m, satellites_visible=10` | `exact` (fix_type, sat) + `range` (h_acc) | as stated | `results_report` row 5 |
|
||||||
|
| FT-P-06 | sequence, no satellite >30 s | `fix_type=3, horiz_accuracy ∈ [20,100]` | `exact` + `range` | as stated | `results_report` row 6 |
|
||||||
|
| FT-P-07 | sequence, VO lost + no satellite | `fix_type=2, h_acc ≥ 50 m` (growing) | `exact` + `threshold_min` | as stated | `results_report` row 7 |
|
||||||
|
| FT-P-08 | VO lost + 3 sat failures | `fix_type=0, h_acc=999.0` | `exact` | N/A | `results_report` row 8 |
|
||||||
|
| FT-P-09 | tier transitions | tier ∈ {HIGH, MEDIUM, LOW, FAILED} per conditions | `exact` | N/A | `results_report` rows 10–13 |
|
||||||
|
| FT-P-10 | 60 frames | registration rate ≥95 % (T1 functional only) | `percentage` | ≥95 % (functional) | `results_report` row 14 |
|
||||||
|
| FT-P-11 | 60 frames | MRE < 1.0 px VO frame-to-frame; < 2.5 px cross-domain | `threshold_max` | <1.0 / <2.5 | `results_report` row 15 ; AC-2.2 |
|
||||||
|
| FT-P-12 | frames 32–43 (turn area) | system continues producing position estimates through turn | `threshold_min` | ≥1 position output / frame | `results_report` row 16 |
|
||||||
|
| FT-P-13 | 350 m gap synthetic | error ≤100 m after recovery | `threshold_max` | ≤100 m | `results_report` row 17 |
|
||||||
|
| FT-P-14 | sharp-turn synthetic | satellite re-loc triggers; error ≤50 m within 3 frames | `threshold_max` | ≤50 m | `results_report` row 18 |
|
||||||
|
| FT-P-15 | VO loss + sat success | `tracking_state == NORMAL` after recovery | `exact` | N/A | `results_report` row 19 |
|
||||||
|
| FT-P-16 | startup with `GLOBAL_POSITION_INT` | first GPS_INPUT within 30 s of boot, p95 | `threshold_max` | ≤30 s p95 | `results_report` row 23 ; AC-NEW-1 |
|
||||||
|
| FT-P-17 | startup + first satellite match | error ≤50 m after first match | `threshold_max` | ≤50 m | `results_report` row 24 |
|
||||||
|
| FT-P-18 | reboot mid-flight | recovery time ≤30 s | `threshold_max` | ≤30 s | `results_report` row 25 ; AC-NEW-1 |
|
||||||
|
| FT-P-19 | post-reboot first match | error ≤50 m | `threshold_max` | ≤50 m | `results_report` row 26 |
|
||||||
|
| FT-P-20 | object localize valid request | response with lat/lon within `accuracy_m` of ground truth | `numeric_tolerance` | per response.accuracy_m | `results_report` row 27 |
|
||||||
|
| FT-P-21 | round-trip GPS→NED→pixel→GPS | error ≤0.1 m | `threshold_max` | ≤0.1 m | `results_report` row 29 |
|
||||||
|
| FT-P-22 | `GET /health` | 200 + JSON with `status`, `memory_mb`, `gpu_temp_c` | `exact` + `regex` | as stated | `results_report` row 30 |
|
||||||
|
| FT-P-23 | `POST /sessions` | 200 or 201 + session id | `exact` | status ∈ {200,201} | `results_report` row 31 |
|
||||||
|
| FT-P-24 | `GET /sessions/{id}/stream` | SSE events at ~1 Hz with schema fields | `regex` + rate | per SSE schema | `results_report` row 32 |
|
||||||
|
| FT-P-25 | TRT engine load | ≤10 s total | `threshold_max` | ≤10 s | `results_report` row 39 |
|
||||||
|
| FT-P-26 | mission area definition | 300–1000 MB tile storage | `range` | [300, 1000] MB | `results_report` row 40 |
|
||||||
|
| FT-P-27 | EKF position ± 3σ | tile mosaic radius ≥500 m | `threshold_min` | ≥500 m | `results_report` row 41 |
|
||||||
|
| FT-P-28 | tile dedup replay | ≤1 tile per ground sector visited ≥2× | `exact` | per-sector count == 1 | AC-8.4, F-T2 |
|
||||||
|
| FT-P-29 | post-flight upload | tiles uploaded to candidate pool with `trust_level=candidate` | `exact` | as stated | AC-8.4, F-T3 |
|
||||||
|
| FT-P-30 | telemetry | NAMED_VALUE_FLOAT at 1 Hz ± 0.2 Hz | `numeric_tolerance` | 1 Hz ± 0.2 Hz | `results_report` row 45 |
|
||||||
|
| FT-N-01 | corrupted JPG | system continues with `tracking_state == DEGRADED`, no crash | `exact` | tracking_state ∈ {DEGRADED, NORMAL} | derived from AC-3.x |
|
||||||
|
| FT-N-02 | invalid object localize pixel | HTTP 422 | `exact` | status == 422 | `results_report` row 28 |
|
||||||
|
| FT-N-03 | unauthenticated `POST /sessions` | HTTP 401 | `exact` | status == 401 | `results_report` row 33 |
|
||||||
|
| FT-N-04 | tile older than freshness budget | tile rejected or its confidence reduced; never `satellite_anchored` | `exact` | as stated | AC-8.2, AC-NEW-6 |
|
||||||
|
| FT-N-05 | tile in 30-day grace zone | confidence linearly decayed | `numeric_tolerance` | per spec curve | AC-NEW-6 |
|
||||||
|
| FT-N-06 | sharp turn (no overlap, <70°, <200 m) | satellite re-loc within 3 frames | `threshold_max` | ≤50 m within 3 frames | `results_report` row 18 ; AC-3.2 |
|
||||||
|
| FT-N-07 | VO loss + 3 sat failures | `RELOC_REQ` regex pattern emitted via STATUSTEXT | `regex` | per pattern | `results_report` rows 20, 46 |
|
||||||
|
| FT-N-08 | re-loc active | `fix_type=0`, IMU prediction continues, sat attempts continue | `exact` | as stated | `results_report` row 21 |
|
||||||
|
| FT-N-09 | operator hint received | hint used as 500 m seed for VPR; ≤500 m initially, ≤50 m after match | `threshold_max` | as stated | `results_report` row 22 |
|
||||||
|
| NFT-PERF-01 | single 6252×4168 frame on Orin Nano Super 25 W (T4) | end-to-end latency ≤400 ms p95 | `threshold_max` | ≤400 ms p95 | `results_report` row 34 ; AC-4.1 |
|
||||||
|
| NFT-PERF-02 | cuVSLAM single frame | ≤20 ms / frame | `threshold_max` | ≤20 ms | `results_report` row 37 |
|
||||||
|
| NFT-PERF-03 | matcher single pair on Orin Nano Super 25 W | inline ≤200 ms; re-loc fallback ≤2000 ms | `threshold_max` | as stated | `results_report` row 38 |
|
||||||
|
| NFT-PERF-04 | Orthority per-frame on Orin Nano Super | ≤50 ms / frame | `threshold_max` | ≤50 ms / frame | F-T14, M-27 |
|
||||||
|
| NFT-PERF-05 | spoof onset → SUT promotion | ≤3 s p95 | `threshold_max` | ≤3 s p95 | AC-NEW-2 ; F-T12 |
|
||||||
|
| NFT-PERF-06 | per-frame end-to-end (frame-by-frame, not batched) | inter-frame interval matches camera rate | `numeric_tolerance` | per frame within ±50 ms of camera rate | AC-4.4 |
|
||||||
|
| NFT-RES-01 | SUT process killed mid-flight | recovery ≤30 s, restart from FC pose | `threshold_max` | ≤30 s | `results_report` row 25 ; AC-5.3, AC-NEW-1 |
|
||||||
|
| NFT-RES-02 | spoofing onset | promotion ≤3 s | `threshold_max` | ≤3 s | AC-NEW-2 |
|
||||||
|
| NFT-RES-03 | network partition with FC | failsafe at 3 s no fix | `threshold_max` | ≤3 s | AC-5.2 |
|
||||||
|
| NFT-RES-04 | EKF3 lane-switch / fix-loss event | source-promotion responds | `exact` | promotion within budget | AC-NEW-2 |
|
||||||
|
| NFT-SEC-01 | unsigned MAVLink injection | FC rejects | `exact` | acceptance==false | F-T9, S-T1 |
|
||||||
|
| NFT-SEC-02 | unauthenticated REST | 401 / 403 | `exact` | per endpoint | results_report row 33 |
|
||||||
|
| NFT-SEC-03 | malformed JWT | 401 | `exact` | status==401 | derived |
|
||||||
|
| NFT-SEC-04 | TLS downgrade attempt | rejected | `exact` | TLS ≥1.2 only | S-T2 |
|
||||||
|
| NFT-SEC-05 | tile-cache write attempt by unauthorized API | 404 / 405 (no such endpoint) | `exact` | as stated | AC-8.5, AC-NEW-7 |
|
||||||
|
| NFT-RES-LIM-01 | 30-min sustained load (T1+T4) | peak < 8192 MB; growth ≤50 MB / 30 min | `threshold_max` | as stated | results_report row 35 ; AC-4.2 |
|
||||||
|
| NFT-RES-LIM-02 | 30-min sustained load | SoC junction ≤80 °C | `threshold_max` | ≤80 °C | results_report row 36 |
|
||||||
|
| NFT-RES-LIM-03 | 8-h sustained 25 W @ +50 °C ambient (T4) | no thermal throttle | `exact` | throttle_event_count == 0 | AC-NEW-5, NF-T3 |
|
||||||
|
| NFT-RES-LIM-04 | FDR 8-h synthetic load | FDR ≤64 GB; rollover logged; no payload class silently dropped | `threshold_max` + audit | as stated | AC-NEW-3, NF-T5 |
|
||||||
|
| NFT-RES-LIM-05 | tile cache 400 km² | ≤10 GB persistent | `threshold_max` | ≤10 GB | restrictions §UAV |
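
The comparison methods named in the table could be implemented as small predicates along the following lines. This is a sketch only: the function names mirror the table's labels, but their signatures are assumptions, and bounds are treated as inclusive even though some rows (for example the MRE limits) are stated as strict.

```python
import re


def percentage(values, predicate, minimum_pct):
    """True if at least minimum_pct percent of values satisfy predicate."""
    passed = sum(1 for v in values if predicate(v))
    return 100.0 * passed / len(values) >= minimum_pct


def threshold_max(value, limit):
    """True if value does not exceed limit."""
    return value <= limit


def threshold_min(value, limit):
    """True if value is at least limit."""
    return value >= limit


def in_range(value, low, high):
    """True if low <= value <= high (the table's `range` method)."""
    return low <= value <= high


def numeric_tolerance(value, expected, tol):
    """True if value is within +/- tol of expected."""
    return abs(value - expected) <= tol


def regex(text, pattern):
    """True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


# Example for FT-P-01: per-frame position errors in metres (made-up values).
errors_m = [12.0] * 48 + [75.0] * 12
ft_p_01_pass = percentage(errors_m, lambda e: e <= 50.0, 80.0)
```

With the made-up errors above, 48 of 60 frames are within 50 m, which is exactly 80 % and therefore passes the inclusive bound.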
|
||||||
|
|
||||||
|
## External Dependency Mocks
|
||||||
|
|
||||||
|
| External Service | Mock/Stub | How Provided | Behavior |
|
||||||
|
|-----------------|-----------|-------------|----------|
|
||||||
|
| Azaion Suite Satellite Service (pre-flight cache sync) | `tile-cache-init` one-shot loader | Docker service that materialises MBTiles + sidecar before SUT starts | Returns the same fixture set every run; deterministic |
|
||||||
|
| Azaion Suite Satellite Service (post-flight upload) | candidate-pool stub inside `qgc-mock` (or a dedicated `service-stub` container) | HTTP server with `POST /candidates` accepting tile uploads, recording to a file | Records what the SUT sends; never alters the cache used by the next test |
|
||||||
|
| QGroundControl GCS | `qgc-mock` | Custom MAVLink-only mock | Records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY frames; can inject operator-hint STATUSTEXT |
|
||||||
|
| ArduPilot autopilot | `ardupilot-sitl` (PR #30080-pinned) | Official ArduPilot SITL container | Replays IMU from fixture; runs EKF3; exposes `RAW_IMU`, `ATTITUDE`, `GLOBAL_POSITION_INT`, `EKF_STATUS_REPORT`, `GPS_RAW_INT` |
|
||||||
|
| Spoofing GPS adversary | `gps-spoof-injector` | Custom MAVLink injector | Sends crafted `GPS_RAW_INT` with configurable lat/lon offset, sat count, hdop |
|
||||||
|
| Identity provider (JWT) | in-runner key generator | Test-only HMAC-SHA256 key shared at SUT boot via env var | Mints valid + invalid + expired JWTs |
|
||||||
|
| External satellite providers (Maxar, Airbus, Planet) | **NOT MOCKED** — out of scope per AC-8.1; SUT does not call them at runtime | — | The SUT must never make outbound HTTP to these hosts; F-T2 / NFT-SEC-11 include a network-policy assertion |
|
||||||
|
|
||||||
|
All mocks are deterministic — same input always produces same output — except the spoof / operator-hint scenarios that explicitly schedule events on a wall-clock so the SUT's timing budgets (AC-NEW-1, AC-NEW-2) are exercised.
|
||||||
|
|
||||||
|
## Data Validation Rules
|
||||||
|
|
||||||
|
| Data Type | Validation | Invalid Examples | Expected System Behavior |
|
||||||
|
|-----------|-----------|-----------------|------------------------|
|
||||||
|
| Nav-cam frame | non-zero size; JPEG / PNG decodable; expected resolution within ±1 % of `data_parameters.md` | 0-byte file, truncated JPEG header, wildly wrong resolution | log error; `tracking_state` transitions to `DEGRADED` if loss >2 frames; never crash |
|
||||||
|
| IMU sample | rate 200 Hz ± 10 %; timestamps monotonic; covariance present | timestamp regression, rate < 50 Hz, NaN / Inf | drop sample with WARN log; if loss > 0.5 s → cuVSLAM degrade; AC-5.2 path eligible |
|
||||||
|
| Satellite tile | MBTiles schema valid; descriptors present; `capture_date` within freshness budget for sector | corrupt MBTiles, missing sidecar, beyond-grace freshness | reject with WARN; AC-8.2 / AC-NEW-6 |
|
||||||
|
| MAVLink GPS_RAW_INT (FC inputs) | well-formed; signing valid (when MAVLink2 signing on) | unsigned frame, malformed length, sysid spoofing | reject; F-T9 + S-T1 cover this |
|
||||||
|
| HTTPS request body | JSON parse OK; required fields present; pixel coords ∈ frame bounds | missing fields, NaN, out-of-bounds pixel | HTTP 422 |
|
||||||
|
| JWT | signature valid; not expired; subject is allowed | expired, wrong sig, missing claims | HTTP 401 |
|
||||||
|
| Tile descriptor | dimension matches index; checksum match | wrong dims, mismatched hash | reject load; cache marks as corrupt; F-T2 |
|
||||||
|
| Operator hint STATUSTEXT | parseable `RELOC_HINT: lat=… lon=… sigma=…`; numeric ranges sane | malformed, NaN, negative sigma, lat > 90 / lon > 180 | reject hint; emit STATUSTEXT WARN; do not seed VPR |
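
As an illustration of the operator-hint rule in the last row, a validating parser might look like the sketch below. The exact wire format is elided in this spec, so the `lat=<float> lon=<float> sigma=<float>` layout, the function name, and the rejection of a zero sigma are all assumptions made for the example.

```python
import math
import re

# Assumed layout: "RELOC_HINT: lat=<float> lon=<float> sigma=<float>"
_HINT_RE = re.compile(
    r"^RELOC_HINT:\s*lat=(?P<lat>\S+)\s+lon=(?P<lon>\S+)\s+sigma=(?P<sigma>\S+)\s*$"
)


def parse_reloc_hint(text: str):
    """Return (lat, lon, sigma_m) for a valid hint, or None if it must be rejected."""
    match = _HINT_RE.match(text)
    if match is None:
        return None  # malformed message
    try:
        lat, lon, sigma = (float(match.group(k)) for k in ("lat", "lon", "sigma"))
    except ValueError:
        return None  # non-numeric field
    if not all(math.isfinite(v) for v in (lat, lon, sigma)):
        return None  # NaN / Inf
    if abs(lat) > 90.0 or abs(lon) > 180.0 or sigma <= 0.0:
        return None  # out-of-range coordinates or non-positive sigma
    return lat, lon, sigma
```

A `None` result corresponds to the "reject hint; emit STATUSTEXT WARN; do not seed VPR" behaviour in the table.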
|
||||||
|
|
||||||
|
## Pending Data (Phase 1 D3 — placeholder fixtures)
|
||||||
|
|
||||||
|
The following fixtures are **declared by name** in this spec but **not yet present** at the time of writing. Phase 3's HARD GATE will surface them as **`pending data`**, not "remove":
|
||||||
|
|
||||||
|
| Fixture | Generator / source | Owner | Phase 3 treatment |
|
||||||
|
|---------|-------------------|-------|-------------------|
|
||||||
|
| `fixtures/satellite_tiles_AD0000xx_z20/` | `tile-cache-init` script: fetch z=20 ortho tiles for the bbox of `coordinates.csv` from a public ortho service (Esri / Mapbox / Sentinel-2 ≥ 0.5 m/px); pre-extract SuperPoint + DINOv2-VLAD descriptors | Decompose / impl. team task | `pending data` — not removed; `data_status: deferred-corpus` retained until generator script is committed |
|
||||||
|
| `fixtures/imu_AD0000xx.csv` | SITL ArduPilot replay of `coordinates.csv` as ground-truth trajectory at 200 Hz | Decompose / impl. team task | `pending data` — not removed; `data_status: deferred-corpus` |
|
||||||
|
| `aerialvl_s03`, `uav_visloc`, `aerialextrematch`, `2chadcnn_seasons`, `tartanair_v2`, `internal_mavic` | External downloads + curation | data team task (Decompose creates a "dataset acquisition" task) | `data_status: deferred-corpus` |
|
||||||
|
| `internal_fixed_wing_first_sortie` | Field-test plan | operations team | `data_status: deferred-field` |
|
||||||
|
| `cold_soak_corpus`, `hot_soak_corpus` | Bench HW + chamber | bench team | `data_status: deferred-hil` |
|
||||||
|
| `synthetic_8h_load` | `fixtures/synth-8h-loader/` script | impl. team | regenerated per session — synthesisable, no external dependency |
|
||||||
|
| `cache_poisoning_scenarios` | `fixtures/cache-poison-mc/` script | impl. team | regenerated per session |
|
||||||
@@ -0,0 +1,138 @@
|
|||||||
|
# Traceability Matrix
|
||||||
|
|
||||||
|
> **`data_status` legend** (Phase 1 decision D4):
|
||||||
|
> - `present` — fixture / corpus is in `_docs/00_problem/input_data/` and ready.
|
||||||
|
> - `deferred-corpus` — relies on an external dataset declared by name (AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, multi-flight Monte Carlo) — fixture path is reserved; data not yet downloaded / curated.
|
||||||
|
> - `deferred-sitl` — requires SITL ArduPilot environment (PR #30080-pinned) to be provisioned.
|
||||||
|
> - `deferred-hil` — requires real Jetson Orin Nano Super on bench + thermal chamber.
|
||||||
|
> - `deferred-field` — requires a real field-test sortie.
|
||||||
|
> - `pending data` — placeholder fixture declared by name (Phase 1 D3) but generator script not yet committed (`fixtures/satellite_tiles_AD0000xx_z20/`, `fixtures/imu_AD0000xx.csv`).
|
||||||
|
>
|
||||||
|
> Per Phase 1 D4: tests are specified for **all 38 ACs** + the documented restrictions, even where data is not yet present. Phase 3's HARD GATE will surface fixtures as **`pending data`** rather than removing tests.

## Acceptance Criteria Coverage

| AC ID | Acceptance Criterion (one-line) | Test IDs | data_status | Coverage |
|-------|-----------|----------|-------------|----------|
| AC-1.1 | ≥80 % within 50 m on normal flight (functional pipeline + deployment-binding) | FT-P-01 (T1), FT-P-T2 (T2 binding), NFT-PERF-11 (bench-off) | T1 `present`; T2 `deferred-corpus` (AerialVL S03) | Covered |
| AC-1.2 | ≥50 % within 20 m | FT-P-02 (T1), FT-P-T2 (T2 binding) | same | Covered |
| AC-1.3 | VO drift <100 m mono / <50 m mono+IMU between satellite anchors | FT-P-04 (T1 functional + T2 binding via AerialVL) | T1 `pending data` (synthetic IMU + placeholder tiles); T2 `deferred-corpus` | Covered |
| AC-1.4 | Quantitative confidence score (covariance + categorical label) | FT-P-05, FT-P-06, FT-P-07, FT-P-09, NFT-RES-08 | `present` (T1) | Covered |
| AC-2.1 | Image registration rate >95 % under normal-flight definition | FT-P-10 (T1 functional + T2 binding) | T1 `present`; T2 `deferred-corpus` | Covered |
| AC-2.2 | MRE <1.0 px VO frame-to-frame; <2.5 px cross-domain | FT-P-11 (T1 functional + T2 binding) | T1 `pending data` (placeholder tiles); T2 `deferred-corpus` | Covered |
| AC-3.1 | Survives 350 m outliers from ±20° tilt | FT-P-13 | `present` (synthetic injection over 60-image slice) | Covered |
| AC-3.2 | Sharp turn (<5 % overlap, <70°, <200 m drift) handled by satellite re-loc | FT-P-14, FT-N-06, NFT-RES-06 | `present` (synthetic injection) + `pending data` (placeholder tiles) | Covered |
| AC-3.3 | ≥3 disconnected segments per flight via global retrieval + RANSAC pose-graph re-loc | FT-P-31, NFT-RES-07 | `present` (synthetic) + `pending data` (placeholder tiles) | Covered |
| AC-3.4 | RELOC_REQ on ≥3 frames AND ≥2 s no-position; continues VO/IMU DR while waiting | FT-N-07, FT-N-08, FT-N-09, NFT-RES-04, NFT-RES-05 | `present` | Covered |
| AC-4.1 | End-to-end latency <400 ms p95 on Orin Nano Super 25 W | NFT-PERF-01 (T4 binding), NFT-PERF-12 | T1 `present` (functional smoke); T4 `deferred-hil` (binding) | Covered |
| AC-4.2 | Memory <8 GB shared on Jetson Orin Nano Super | NFT-RES-LIM-01, NFT-RES-LIM-07 | T1 `present` (functional); T4 `deferred-hil` (binding) | Covered |
| AC-4.3 | Two parallel MAVLink channels; v1 ships GPS_INPUT only (ODOMETRY disabled) | FT-P-05, FT-N-11, FT-N-15, FT-N-16 | T1 `present`; T3 `deferred-sitl` for SITL matrix | Covered |
| AC-4.4 | Frame-by-frame output, no batching | NFT-PERF-06, FT-P-12 | `present` | Covered |
| AC-4.5 | Refinement / corrections to prior fixes | FT-P-32 | `present` | Covered |
| AC-5.1 | Initialise from FC's last-known GPS + IMU-extrapolated position at GPS denial | FT-P-17 | `present` | Covered |
| AC-5.2 | >3 s no-fix → IMU-only DR + log failure | NFT-RES-03, NFT-PERF-10, FT-N-13 | T3 `deferred-sitl` (binding); T1 `present` for SUT-side observable | Covered |
| AC-5.3 | Re-init on companion reboot from FC's IMU-extrapolated position | FT-P-18, FT-P-19, NFT-RES-01 | `present` | Covered |
| AC-6.1 | QGC telemetry; per-frame on local link, 1–2 Hz GCS | FT-P-22, FT-P-23, FT-P-24, FT-P-30 | `present` | Covered |
| AC-6.2 | GCS commands (operator hint via STATUSTEXT / NAMED_VALUE_FLOAT / custom dialect) | FT-N-09, FT-N-10, NFT-RES-05, NFT-SEC-07 | `present` | Covered |
| AC-6.3 | Output coordinates in WGS84 | FT-P-05, FT-P-21 | `present` | Covered |
| AC-7.1 | Object loc accuracy = frame-center accuracy in level flight; bound published in maneuver | FT-P-20, FT-P-33, FT-N-21 | `present` | Covered |
| AC-7.2 | Object loc trigonometric (gimbal angle + zoom + altitude + flat-terrain) | FT-P-20, FT-P-21 | `present` | Covered |
| AC-8.1 | Cache interface ≥0.5 m/px ideal 0.3 m/px; no direct calls to Maxar/Airbus/Planet | FT-N-19, NFT-SEC-11 | `present` | Covered |
| AC-8.2 | Tile freshness <6 mo active / <12 mo stable | FT-N-04, FT-N-05, NFT-RES-12 | `present` (synthetic-age tiles) | Covered |
| AC-8.3 | Pre-loaded + pre-processed cache; pre-extracted descriptors | FT-P-26, FT-P-27, NFT-RES-09 | T1 `present` for cache-shape; deployment binding `pending data` (real Service-supplied corpus) | Covered |
| AC-8.4 | Mid-flight tile generation, dedup, post-flight upload | FT-P-28, FT-P-29, FT-P-34, F-T2 (within FT-P-28) | `present` (dedup replay) + `pending data` (`service-stub` records) | Covered |
| AC-8.5 | No raw nav-cam / AI-cam frame retention; tiles + ≤0.1 Hz failure thumbnail log only | FT-N-18, NFT-SEC-10, NFT-RES-LIM-05 | `present` | Covered |
| AC-8.6 | VPR retrieval unit decoupled from storage tile; multi-scale; dynamic K; conditional invocation | NFT-PERF-08, NFT-PERF-09 | T1 `pending data` (placeholder tiles + descriptors); T2 binding `deferred-corpus` | Covered |
| AC-NEW-1 | Cold-start TTFF <30 s p95 | FT-P-16 (T1 N=10), FT-P-T4 cold (T4 N=50), FT-P-25, NFT-RES-LIM-04 | T1 `present` (functional smoke); T4 `deferred-hil` for cold-soak binding | Covered |
| AC-NEW-2 | Spoofing-promotion <3 s p95 | NFT-PERF-05, NFT-RES-02, FT-N-12 | T3 `deferred-sitl` | Covered |
| AC-NEW-3 | Flight Data Recorder, 64 GB cap, no raw frames, all classes preserved | NFT-RES-14, NFT-RES-LIM-05, NFT-SEC-10, FT-N-18 | T1 `present` (volume accounting); T4 `deferred-hil` for 8-h soak binding | Covered |
| AC-NEW-4 | False-position safety budget P(>500 m)<0.1 %, P(>1 km)<0.01 % | covered via Monte Carlo on AerialVL S03 + Mavic + AerialExtreMatch (statistical analysis bundled into FT-P-T2 + FT-P-35 + dedicated NF-T4 Monte Carlo run) | T2 `deferred-corpus` (Monte Carlo over ≥100 simulated flights) | Covered |
| AC-NEW-5 | Operating temp −20 °C to +50 °C; 25 W sustained 8 h with no thermal throttle | NFT-RES-LIM-02, NFT-RES-LIM-03, NFT-RES-LIM-04 | T4 `deferred-hil` (chamber) | Covered |
| AC-NEW-6 | Stale-tile rejection / decay across 30-day grace | FT-N-04, FT-N-05, NFT-RES-12 | `present` (synthetic-age tiles) | Covered |
| AC-NEW-7 | Cache-poisoning safety budget P(>30 m)<1 %, P(>100 m)<0.1 %; voting layer | FT-P-34, FT-N-17, FT-P-35, NFT-RES-15, NFT-SEC-13 | T1 `present` (gate behaviour) + `pending data` (`service-stub` voting); T2 `deferred-corpus` (Monte Carlo binding) | Covered |
| AC-NEW-8 | cuVSLAM mono+IMU drift ≤50 m / mono ≤100 m on AerialVL fixed-wing trajectories | FT-P-04 (binding split) | T2 `deferred-corpus` (AerialVL S03) | Covered |
| AC-NEW-9 | Companion-side covariance calibration: empirical residuals lie within reported h_acc/v_acc with prob ≥95 % | FT-P-36, FT-P-37 | T2 `deferred-corpus` (AerialVL S03) | Covered |

## Restrictions Coverage

| Restriction ID | Restriction (one-line) | Test IDs | data_status | Coverage |
|----------------|------------------------|----------|-------------|----------|
| RESTRICT-UAV-01 | Fixed-wing UAV only | FT-P-T2 (binding via AerialVL fixed-wing) | T2 `deferred-corpus` | Covered |
| RESTRICT-UAV-02 | Nav cam fixed downward, not gimbal-stabilized | FT-P-01..FT-P-04 (assumed by replay shape) | `present` | Covered |
| RESTRICT-UAV-03 | Operational area: east/south Ukraine | environmental envelope (AC-NEW-5 covers thermal); no separate test required | — | Implicit (envelope captured by AC-NEW-5 + AC-8.6 active-conflict sector handling) |
| RESTRICT-UAV-04 | 8-h flights at ~60 km/h; sector + corridor up to 400 km² total | NFT-RES-LIM-06, NFT-RES-LIM-11, NFT-RES-14 | T4 `deferred-hil` for 8-h | Covered |
| RESTRICT-UAV-05 | ≤1 km AGL; flat-terrain assumption | AC-7.1 / AC-7.2 tests (flat-terrain) + Component 1b ortho terrain-class check (F-T14 within NFT-PERF-04) | `pending data` (DEM tiles) | Covered |
| RESTRICT-UAV-06 | Predominantly sunny daytime | bench-off seasonal-robustness (NFT-PERF-11 + NFT-RES-13) | T2 `deferred-corpus` | Covered |
| RESTRICT-UAV-07 | Sharp turns are exception (<5 % overlap) | FT-P-14, FT-N-06, NFT-RES-06 | `present` | Covered |
| RESTRICT-UAV-08 | No photo-count cap | FT-N-20 | `present` | Covered |
| RESTRICT-CAM-01 | Nav cam: ADTi 20MP 20L V1 APS-C; GSD 10–20 cm/px @ 1 km AGL | FT-P-T2 binding (AerialVL S03 stand-in until first internal fixed-wing flight) | T5 `deferred-field` for the deployment camera proper | Covered (caveat: 60-image slice = 26 MP @ 400 m AGL, pipeline-correctness only — see test-data.md D2 caveat) |
| RESTRICT-CAM-02 | AI cam pose info = gimbal angle + zoom only; airframe attitude not published | FT-P-33, FT-N-21 | `present` | Covered |
| RESTRICT-CAM-03 | Cameras connect via USB / MIPI-CSI / GigE | not separately testable at black-box level | — | Hardware-integration concern; covered by FT-1 / FT-2 / FT-3 field tests at T5 |
| RESTRICT-SAT-01 | Source = Azaion Suite Satellite Service; SUT consumes via offline cache | NFT-SEC-11 | `present` | Covered |
| RESTRICT-SAT-02 | No in-flight Service calls (offline cache only) | NFT-SEC-11 | `present` | Covered |
| RESTRICT-SAT-03 | Mid-flight tile generation + post-flight upload | FT-P-28, FT-P-29, NFT-RES-15 | `present` + `pending data` (`service-stub`) | Covered |
| RESTRICT-SAT-04 | No raw photo storage | FT-N-18, NFT-SEC-10 | `present` | Covered |
| RESTRICT-SAT-05 | Cache resolution ≥0.5 m/px | FT-N-19 | `present` | Covered |
| RESTRICT-SAT-06 | Storage tile zoom z=20 | FT-P-26 + cache-shape audit | `present` | Covered |
| RESTRICT-SAT-07 | Freshness gates: 6 mo active / 12 mo stable | FT-N-04, FT-N-05, NFT-RES-12 | `present` | Covered |
| RESTRICT-SAT-08 | Free public Sentinel-2 not on runtime path | FT-N-19, NFT-SEC-11 | `present` | Covered |
| RESTRICT-HW-01 | Jetson Orin Nano Super: 67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W TDP | NFT-PERF-01, NFT-RES-LIM-01, NFT-RES-LIM-07 | T4 `deferred-hil` (binding) | Covered |
| RESTRICT-HW-02 | JetPack + CUDA + TensorRT | FT-P-25 + NFT-PERF-02..04 | T4 `deferred-hil` | Covered |
| RESTRICT-HW-03 | Cooling sustains 25 W for 8 h at upper temp | NFT-RES-LIM-03 | T4 `deferred-hil` (chamber) | Covered |
| RESTRICT-HW-04 | NVMe ≥ 10 GB cache + 64 GB FDR | NFT-RES-LIM-05, NFT-RES-LIM-06, NFT-RES-LIM-12 | T1 + T4 mix | Covered |
| RESTRICT-INTEG-01 | IMU via MAVLink from FC | F-T1c within FT-P-04 (cuVSLAM mono vs mono+IMU) | T1 `pending data` (synthetic IMU); T2 `deferred-corpus` for AerialVL IMU | Covered |
| RESTRICT-INTEG-02 | MAVLink comm: MAVSDK + pymavlink, distinct sysids via ArduPilot routing, no `mavlink-router` | FT-P-05, FT-N-11, NFT-SEC-06 (sysid) | T1 + T3 | Covered |
| RESTRICT-INTEG-03 | ArduPilot only; no PX4 | F-T9 SITL matrix runs only against ArduPilot SITL (FT-N-15, FT-N-16, NFT-RES-10) | T3 `deferred-sitl` | Covered |
| RESTRICT-INTEG-04 | WGS84 output | FT-P-05, FT-P-21 | `present` | Covered |
| RESTRICT-INTEG-05 | QGroundControl GCS only; no Mission Planner | by `qgc-mock` only — Mission Planner not exercised | `present` | Covered |
| RESTRICT-FAIL-01 | 3 s no-fix → IMU DR fallback | NFT-RES-03, NFT-PERF-10 | T3 `deferred-sitl` | Covered |
| RESTRICT-FAIL-02 | False-position safety (AC-NEW-4) | identical coverage as AC-NEW-4 | T2 `deferred-corpus` | Covered |
| RESTRICT-FAIL-03 | Cold-start TTFF + spoofing-promotion latency budgets | identical to AC-NEW-1 + AC-NEW-2 | T1+T3+T4 mix | Covered |

## Coverage Summary

| Category | Total Items | Covered | Not Covered | Coverage % |
|----------|-----------|---------|-------------|-----------|
| Acceptance Criteria | 38 | 38 | 0 | 100 % |
| Restrictions | 31 | 31 | 0 | 100 % |
| **Total** | **69** | **69** | **0** | **100 %** |
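
The totals row can be re-derived from the two category rows, which is a cheap consistency check whenever ACs or restrictions are added. The sketch below is illustrative only: `CategoryCoverage` and `summarise` are hypothetical names, and the counts are simply the figures from the table above (38 ACs, 31 restrictions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryCoverage:
    """One row of the coverage summary table."""
    name: str
    total: int
    covered: int

    @property
    def not_covered(self) -> int:
        return self.total - self.covered

    @property
    def percent(self) -> float:
        # Guard against an empty category to avoid division by zero.
        return 100.0 * self.covered / self.total if self.total else 0.0


def summarise(categories: list[CategoryCoverage]) -> CategoryCoverage:
    """Aggregate per-category rows into the 'Total' row."""
    return CategoryCoverage(
        name="Total",
        total=sum(c.total for c in categories),
        covered=sum(c.covered for c in categories),
    )


# Figures taken from the table above.
rows = [
    CategoryCoverage("Acceptance Criteria", total=38, covered=38),
    CategoryCoverage("Restrictions", total=31, covered=31),
]
total = summarise(rows)
print(f"{total.name}: {total.covered}/{total.total} = {total.percent:.0f} %")
# prints "Total: 69/69 = 100 %"
```

Note that this counts spec-level coverage only. It says nothing about whether the data behind a test is available, which the `data_status` column tracks separately.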

### Coverage by `data_status`

| `data_status` | Scope (tests or items where this status appears for ≥1 test) | Notes |
|---------------|-----------|-------|
| `present` | majority of T1 tests | Covers all 60-image-slice pipeline-correctness ACs/restrictions and all behavioural-shape tests. |
| `pending data` | satellite tile + IMU placeholder fixtures | Covers AC-1.3, AC-2.2 cross-domain, AC-3.2 sat re-loc, AC-3.3 segments, AC-8.6 VPR descriptors, AC-NEW-7 voting, RESTRICT-UAV-05 DEM, RESTRICT-INTEG-01 IMU. Surfaced as a Phase 3 HARD-GATE finding, not removed. |
| `deferred-corpus` | AC-1.1, AC-1.2 deployment-binding; AC-1.3 binding; AC-2.1 binding; AC-2.2 binding; AC-NEW-4; AC-NEW-7 Monte Carlo; AC-NEW-8; AC-NEW-9; bench-off corpora | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN, TartanAir V2, internal Mavic. Decompose creates a "dataset acquisition" task. |
| `deferred-sitl` | AC-4.3 SITL matrix (FT-N-15, FT-N-16); AC-NEW-2; RESTRICT-INTEG-03; RESTRICT-FAIL-01 | ArduPilot SITL pinned to PR #30080-class build. |
| `deferred-hil` | AC-4.1 binding; AC-4.2 binding; AC-NEW-1 cold corner; AC-NEW-3 8-h soak; AC-NEW-5 thermal envelope; RESTRICT-HW-01..03 | Real Jetson + thermal chamber. |
| `deferred-field` | RESTRICT-CAM-01 deployment-camera binding (first internal fixed-wing flight) | Field-test plan. |

## Uncovered Items Analysis

| Item | Reason Not Covered | Risk | Mitigation |
|------|-------------------|------|-----------|
| (none) | — | — | — |

All 38 ACs and 31 restrictions are covered by ≥1 test, per Phase 1 D4. **No uncovered items.** Coverage is 100 % at the spec level; data availability — not coverage — is the gating concern, surfaced via the `data_status` column.

## Pipeline-Correctness vs Deployment-Binding Boundary

The 60-image slice (`present` data_status) is **pipeline-correctness only** for the accuracy ACs. Deployment-binding numbers come from the `deferred-corpus` and `deferred-hil` tiers. This is per Phase 1 decision D2 and is documented in `test-data.md`. The matrix marks these ACs as "Covered" in its Coverage column; the table below makes explicit which tier supplies which evidence:

| AC | Pipeline-correctness (T1, `present`) | Deployment-binding |
|----|---------------------------------------|--------------------|
| AC-1.1 | FT-P-01 (functional check) | FT-P-T2 (T2 `deferred-corpus` AerialVL S03) |
| AC-1.2 | FT-P-02 | FT-P-T2 |
| AC-1.3 | FT-P-04 (functional, with `pending data`) | FT-P-04 binding split (T2) |
| AC-2.1 | FT-P-10 | FT-P-10 binding (T2) |
| AC-2.2 | FT-P-11 | FT-P-11 binding (T2) |
| AC-4.1 | NFT-PERF-01 functional smoke | NFT-PERF-01 binding (T4) |
| AC-4.2 | NFT-RES-LIM-01 functional | NFT-RES-LIM-01 binding (T4) |
| AC-NEW-1 | FT-P-16 (T1 N=10) | FT-P-T4 cold (T4 N=50) + NFT-RES-LIM-04 |
| AC-NEW-3 | NFT-RES-LIM-05 functional | NFT-RES-14 + NFT-RES-LIM-05 binding (T4 8-h) |
| AC-NEW-4 | (none — Monte Carlo only) | FT-P-35 (T2 binding) |
| AC-NEW-5 | (none — chamber only) | NFT-RES-LIM-02..04 (T4 chamber) |
| AC-NEW-7 | FT-P-34 + FT-N-17 functional | FT-P-35 + NFT-SEC-13 binding (T2) |
| AC-NEW-8 | (none — fixed-wing only) | FT-P-04 binding (T2) |
| AC-NEW-9 | (none — covariance evaluation requires ground-truth corpus) | FT-P-36 + FT-P-37 (T2) |
@@ -6,9 +6,9 @@ step: 3
|
|||||||
name: Plan
|
name: Plan
|
||||||
status: in_progress
|
status: in_progress
|
||||||
sub_step:
|
sub_step:
|
||||||
phase: 2
|
phase: 5
|
||||||
name: test-scenarios
|
name: plan-step-1-complete
|
||||||
detail: "Plan Step 1 (test-spec) Phase 1 COMPLETE. User cleared the BLOCKING gate (all 4 questions = A): D1 fix stale docs inline; D2 60-image slice = pipeline-correctness corpus only; D3 spec with placeholder fixtures (satellite tiles + IMU); D4 spec all 46 ACs with data_status markers in traceability-matrix.md. Stale-doc fixes already applied to results_report.md rows 2/19/22/23/25/38 and AC-4.3 v1-scope clause added to acceptance_criteria.md. Findings + locked decisions saved to _docs/02_document/tests/_phase1_findings.md. NEXT on resume: Phase 2 (test-spec/phases/02-test-scenarios.md) — generate 8 artifacts under _docs/02_document/tests/ (environment.md, test-data.md, blackbox-tests.md, performance-tests.md, resilience-tests.md, security-tests.md, resource-limit-tests.md, traceability-matrix.md). Recommended fresh conversation due to context-budget caution zone."
|
detail: "Plan Step 1 (test-spec sub-skill) COMPLETE. Phase 1 (input-data analysis) DONE earlier; Phase 2 (test scenarios, 8 artifacts) DONE; Phase 3 HARD GATE PASSED (100% spec-level coverage, 0 truly-missing items, 0 removed tests, defer-don't-remove per Phase 1 D4); Hardware Assessment DONE — `## Test Execution` section appended to environment.md classifying project as hardware-dependent and recording the Mode-C (both: Docker for T1/T2/T3 + bench-local for T4 + field for T5) per-tier split decision; Phase 4 (runner-scripts) SKIPPED per skill rule (planning context — script creation deferred to Decompose as tasks). Plan Step 1 user-level BLOCKING gate (test coverage confirmation) was satisfied by the Phase 2 → Phase 3 confirmation earlier in this session. Next: Plan Step 2 (Solution Analysis), opening with BLOCKING Phase 2a.0 (Glossary + Architecture Vision)."
|
||||||
retry_count: 0
|
retry_count: 0
|
||||||
cycle: 1
|
cycle: 1
|
||||||
tracker: jira
|
tracker: jira
|
||||||