Update autodev state documentation to reflect completion of Plan Step 1, including detailed progress on phases and next steps. Revised phase details to clarify user-level blocking gates and hardware assessment outcomes.

2026-04-27 13:56:37 +00:00 · 2026-04-27 06:23:53 +03:00
parent 9eba1689b3
commit f321268e1b
9 changed files with 2561 additions and 3 deletions
@@ -0,0 +1,268 @@
 # Test Environment
 ## Overview
 **System under test (SUT)**: the GPS-Denied Onboard companion-computer software stack — a set of ROS 2 Humble + Isaac ROS 3.2 nodes (cuVSLAM, VPR, cross-view matcher, Component 5 calibrator, Component 1b ortho-tile generator, Component 6 MAVLink bridge, Component 10 FDR, Component 7 health/failsafe, Component 8 object localizer) running on a Jetson Orin Nano Super (or x86+CUDA emulator for non-hardware tiers).
 **SUT entry points (public interfaces, all black-box)**:
 | Entry point | Protocol | Direction | Bound to | Purpose |
 |-------------|----------|-----------|----------|---------|
 | `MAVLink GPS_INPUT` | MAVLink2 (signed), serial/UDP | SUT → FC | sysid=11 | Primary position output (AC-4.3, AC-6.3, AC-NEW-1, AC-NEW-2) |
 | `MAVLink STATUSTEXT / NAMED_VALUE_FLOAT` | MAVLink2 (signed) | SUT → GCS | sysid=10 | Telemetry summary, RELOC_REQ (AC-3.4, AC-6.1, AC-6.2) |
 | `MAVLink RAW_IMU / SCALED_IMU / ATTITUDE / GPS_RAW_INT / EKF_STATUS_REPORT / GLOBAL_POSITION_INT` | MAVLink2 | FC → SUT | sysid=10 | IMU + autopilot inputs to cuVSLAM, ortho, source-promotion |
 | `HTTP/HTTPS REST` (e.g., `/health`, `/sessions`, `/objects/locate`) | HTTPS+JWT | external → SUT | TBD port | Object localization, health, session management (AC-7.1, AC-8.1 cache interface, results_report rows 27–33) |
 | `HTTP SSE` (`/sessions/{id}/stream`) | HTTPS+SSE | SUT → external | TBD port | 1 Hz position stream for monitoring (results_report row 32) |
 | `ROS 2 topics` (test-only sniffer) | DDS | SUT internal | observed black-box via topic ports | F-T19 ROS rate sanity test only — NOT used by functional tests |
 | `MBTiles cache file` (read-only check) | SQLite read | external → cache fs | mounted volume | AC-8.3 / AC-8.4 verification at cache boundary, never read SUT internals |
 **Consumer app purpose**: a standalone `pytest`-based black-box test runner exercising the SUT through the MAVLink wire, the HTTP API, and the cache-boundary file artifacts. The runner has **no source-code access** to the SUT, no Python imports of SUT modules, and no DDS subscriptions to internal-only topics (only the public `nav_msgs/Odometry` / `sensor_msgs/Image` subscriptions that are documented as the SUT contract).
 ## Docker Environment
 ### Services
 | Service | Image / Build | Purpose | Ports |
 |---------|--------------|---------|-------|
 | `sut` | build context `./` (multi-stage Dockerfile producing the JetPack 6 runtime image; compiled for `linux/arm64` for HW tier and `linux/amd64+cuda` for SW emulation tier) | The full GPS-Denied stack (all ROS 2 nodes) | UDP 14550 (MAVLink to FC), UDP 14560 (MAVLink to GCS), TCP 8443 (HTTPS API), TCP 8080 (HTTP SSE), TCP 9090 (Prometheus metrics) |
 | `ardupilot-sitl` | `ardupilot/ardupilot-sitl:4.5-PR30080-pinned` | Autopilot SITL (ArduCopter / ArduPlane) — provides FC behaviour for F-T9, F-T11, F-T12, AC-4.3, AC-NEW-1, AC-NEW-2 | UDP 14550 ↔ sut, UDP 14570 ↔ qgc-mock |
 | `qgc-mock` | build `./fixtures/qgc-mock/` (a MAVLink-only mock GCS that records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT and ODOMETRY, and sends operator hints) | Records GCS-bound telemetry; sends operator re-localization hints (AC-6.1, AC-6.2, AC-3.4) | UDP 14570 |
 | `tile-cache-init` | build `./fixtures/tile-cache-init/` (one-shot loader that materialises `fixtures/satellite_tiles_AD0000xx_z20/` MBTiles + sidecar) | Pre-populates the satellite cache before each test | — (one-shot) |
 | `gps-spoof-injector` | build `./fixtures/gps-spoof-injector/` (publishes `GPS_RAW_INT` with crafted lat/lon/sat/hdop) | F-T12 / AC-NEW-2 spoof scenarios | UDP 14571 → sut |
 | `e2e-runner` | build `./e2e/` (Python 3.11 + pytest + pymavlink + httpx + pyserial) | Black-box test runner | — |
 | `prom` | `prom/prometheus:v2.51.0` | Scrape SUT metrics (CPU, GPU, temp) for NF-T2 / NF-T3 / AC-4.2 / AC-NEW-5 | TCP 9091 |
 | `nvidia-smi-exporter` | `utkuozdemir/nvidia_gpu_exporter:1.2.0` (HW tier only) | Jetson tegrastats / nvidia-smi metrics | TCP 9092 |
 ### Networks
 | Network | Services | Purpose |
 |---------|----------|---------|
 | `e2e-mavlink-net` | `sut`, `ardupilot-sitl`, `qgc-mock`, `gps-spoof-injector`, `e2e-runner` | MAVLink traffic (single broadcast domain so distinct sysids share routing realistically); the runner joins it to capture GPS_INPUT / STATUSTEXT |
 | `e2e-api-net` | `sut`, `e2e-runner` | HTTPS + SSE traffic for object-localization / health endpoints |
 | `e2e-metrics-net` | `sut`, `prom`, `nvidia-smi-exporter`, `e2e-runner` | Resource-monitoring scrape path |
 ### Volumes
 | Volume | Mounted to | Purpose |
 |--------|-----------|---------|
 | `tile-cache` | `sut:/var/lib/gpsdenied/tiles` (rw), `tile-cache-init:/init/tiles` (rw), `e2e-runner:/probe/tiles` (ro) | Persistent satellite + onboard tile cache (AC-8.3, AC-8.4) |
 | `fdr` | `sut:/var/lib/gpsdenied/fdr` (rw), `e2e-runner:/probe/fdr` (ro) | Flight Data Recorder output (AC-NEW-3) |
 | `fixtures-images` | `sut:/fixtures/images` (ro), `e2e-runner:/fixtures/images` (ro) | The 60 nav-cam JPGs + AerialVL S03 slice |
 | `fixtures-imu` | `sut:/fixtures/imu` (ro), `ardupilot-sitl:/fixtures/imu` (ro) | SITL replay IMU traces (AerialVL S03 + synthetic from `coordinates.csv`) |
 | `fixtures-expected` | `e2e-runner:/fixtures/expected_results` (ro) | `_docs/00_problem/input_data/expected_results/` mounted into the runner |
 | `e2e-results` | `e2e-runner:/results` (rw, host bind) | CSV report output |
 ### docker-compose structure
 ```yaml
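 # Outline: build contexts, the SITL --imu-replay flag and fixture provisioning are
 # placeholders. Relative paths assume the repo root as project directory, i.e.
 # docker compose --project-directory . -f e2e/docker-compose.test.yml ...
 services:
   sut:
     build: .
     runtime: nvidia          # HW tier
     depends_on:
       tile-cache-init:
         condition: service_completed_successfully   # cache populated before start
     networks: [e2e-mavlink-net, e2e-api-net, e2e-metrics-net]
     volumes:
       - tile-cache:/var/lib/gpsdenied/tiles
       - fdr:/var/lib/gpsdenied/fdr
       - fixtures-images:/fixtures/images:ro
       - fixtures-imu:/fixtures/imu:ro
     environment:
       - MAVLINK_FC_URL=udp://ardupilot-sitl:14550
       - MAVLINK_GCS_URL=udp://qgc-mock:14570
       - GPSD_API_BIND=0.0.0.0:8443
       - GPSD_TILE_DIR=/var/lib/gpsdenied/tiles
       - GPSD_FDR_DIR=/var/lib/gpsdenied/fdr
       - MAVLINK2_SIGNING_KEY   # value passed through from the host environment / .env
       - JWT_SIGNING_KEY
   ardupilot-sitl:
     image: ardupilot/ardupilot-sitl:4.5-PR30080-pinned
     networks: [e2e-mavlink-net]
     volumes:
       - fixtures-imu:/fixtures/imu:ro
     command: ["--vehicle=ArduPlane", "--frame=plane", "--imu-replay=/fixtures/imu/AD0000xx.csv"]
   qgc-mock:
     build: ./fixtures/qgc-mock/
     networks: [e2e-mavlink-net]
   tile-cache-init:
     build: ./fixtures/tile-cache-init/
     volumes:
       - tile-cache:/init/tiles
     restart: "no"
   gps-spoof-injector:
     build: ./fixtures/gps-spoof-injector/
     networks: [e2e-mavlink-net]
   e2e-runner:
     build: ./e2e/
     depends_on:
       sut: {condition: service_started}
       ardupilot-sitl: {condition: service_started}
       qgc-mock: {condition: service_started}
       tile-cache-init: {condition: service_completed_successfully}
     # e2e-mavlink-net is needed to capture GPS_INPUT / STATUSTEXT
     networks: [e2e-mavlink-net, e2e-api-net, e2e-metrics-net]
     volumes:
       - tile-cache:/probe/tiles:ro
       - fdr:/probe/fdr:ro
       - fixtures-images:/fixtures/images:ro
       - fixtures-expected:/fixtures/expected_results:ro
       - ./results:/results     # "e2e-results": host bind mount
     command: ["pytest", "-q", "--junit-xml=/results/junit.xml", "--csv=/results/report.csv"]
   prom:
     image: prom/prometheus:v2.51.0
     networks: [e2e-metrics-net]
     volumes:
       - ./e2e/prometheus.yml:/etc/prometheus/prometheus.yml:ro   # hypothetical scrape config for sut:9090
     command: ["--config.file=/etc/prometheus/prometheus.yml", "--web.listen-address=:9091"]
 networks:
   e2e-mavlink-net:
   e2e-api-net:
   e2e-metrics-net:
 volumes:
   tile-cache:
   fdr:
   # Fixture volumes are assumed to be created and populated before the run.
   fixtures-images:
     external: true
   fixtures-imu:
     external: true
   fixtures-expected:
     external: true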
 ```
 ## Consumer Application
 **Tech stack**: Python 3.11 / pytest 8.x / `pymavlink` (matching the SUT version) / `httpx[http2]` / `pyserial` / `numpy` / `pandas` / `pytest-csv` / `pytest-timeout`. **No SUT source imports.**
 **Entry point**: `pytest -q` inside `e2e-runner`, with marker-based selection per tier (`pytest -m "blackbox and pipeline"` → 60-image slice; `pytest -m "blackbox and deferred-corpus"` → AerialVL S03; etc.).
 ### Communication with system under test
 | Interface | Protocol | Endpoint / Topic | Authentication |
 |-----------|----------|-----------------|----------------|
 | GPS_INPUT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) and `udp://ardupilot-sitl:14550` (target) | MAVLink2 signing key shared with FC for round-trip verification |
 | STATUSTEXT / NAMED_VALUE_FLOAT capture | MAVLink2 over UDP | `udp://qgc-mock:14570` (sniffed) | MAVLink2 signing key |
 | Object localization | HTTPS + JSON | `POST sut:8443/objects/locate` | JWT bearer (test-only key in `e2e-runner` config) |
 | Health probe | HTTPS + JSON | `GET sut:8443/health` | JWT bearer |
 | Session management | HTTPS + JSON | `POST sut:8443/sessions`, `GET sut:8443/sessions/{id}/stream` | JWT bearer |
 | Operator hint | MAVLink2 STATUSTEXT | injected via `qgc-mock` | MAVLink2 signing key |
 | Spoofed GPS injection | MAVLink2 GPS_RAW_INT | injected via `gps-spoof-injector` (separate sysid) | MAVLink2 signing key |
 | Tile cache file probe | filesystem read | `/probe/tiles/*.mbtiles` + sidecar JSON | — (read-only mount) |
 | FDR file probe | filesystem read | `/probe/fdr/**/*` | — (read-only mount) |
 | Metrics scrape | HTTP | `GET prom:9091/api/v1/query?…` | — (test net only) |
 ### What the consumer does NOT have access to
 - No direct DB / SQLite write access against the SUT's tile or FDR stores.
 - No Python imports of SUT modules.
 - No DDS subscriptions to internal-only topics (e.g., the matcher's intermediate keypoint topic, the calibrator's residual topic). Only the documented contract topics consumed in F-T19.
 - No CUDA context, no shared memory, no `/proc` access into the SUT container.
 - No log-file scraping that bypasses the public health/STATUSTEXT path.
 ## Test Tiers
 The runner stratifies execution by **what artefact set is present**. Each tier maps to a pytest marker and to a `data_status` column value in `traceability-matrix.md`.
 | Tier | Marker | Corpus / fixtures required | Coverage scope |
 |------|--------|---------------------------|----------------|
 | **T1 pipeline-correctness** | `pipeline` | `_docs/00_problem/input_data/` 60-image slice + `coordinates.csv` + placeholder satellite tiles + SITL-replayed IMU | Validates pipeline plumbing only, **NOT** deployment-binding numbers (per Phase 1 D2). |
 | **T2 deferred-corpus** | `deferred-corpus` | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight | Deployment-binding accuracy & drift for AC-1.1, AC-1.2, AC-1.3, AC-2.1, AC-2.2, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9. |
 | **T3 deferred-sitl** | `deferred-sitl` | ArduPilot SITL pinned to PR #30080-class build + scripted scenarios | F-T9 source-switching matrix (AC-4.3, AC-NEW-2). |
 | **T4 deferred-hil** | `deferred-hil` | Real Jetson Orin Nano Super on bench + thermal chamber + bench MAVLink loop | AC-4.1 latency on real HW, AC-4.2 memory cap, AC-NEW-5 thermal envelope, AC-NEW-1 cold-start TTFF on real HW. |
 | **T5 deferred-field** | `deferred-field` | Recorded fixed-wing sortie | FT-1 / FT-2 / FT-3 final field validation. |
 Pipeline-tier (T1) tests are the only ones whose pass/fail numbers are **NOT** treated as deployment evidence — they verify that the pipeline produces *some* output of the right shape, not that the output meets the deployment-binding accuracy budget. Deployment-binding tests live in T2–T5.
 ## CI/CD Integration
 | Tier | When to run | Pipeline stage | Gate behavior | Timeout |
 |------|-------------|----------------|---------------|---------|
 | T1 pipeline | Every PR to `dev`; nightly | After unit tests | Block merge on FAIL | 30 min |
 | T2 deferred-corpus | Nightly; on tag push | Pre-release | Block release on FAIL | 4 h (Monte Carlo NF-T4 dominates) |
 | T3 deferred-sitl | Nightly | Pre-release | Block release on FAIL | 1 h |
 | T4 deferred-hil | Bench-on-demand + weekly thermal cycle | Bench-only stage | Manual approval | 12 h (NF-T3 8 h soak) |
 | T5 deferred-field | Field-test plan (per-sortie) | Field stage | Out-of-band sign-off | per sortie |
 ## Reporting
 **Format**: CSV (one row per test execution) plus JUnit XML for CI.
 **CSV columns**: `test_id`, `test_name`, `tier`, `marker`, `traces_to_acs` (semicolon-joined), `traces_to_restricts`, `data_status` (`present` / `deferred-corpus` / `deferred-sitl` / `deferred-hil` / `deferred-field`), `started_at`, `execution_time_ms`, `result` (`PASS` / `FAIL` / `SKIP` / `BLOCKED-DATA`), `expected_metric`, `actual_metric`, `tolerance`, `error_message` (if FAIL or BLOCKED-DATA), `git_sha`, `image_tag`, `runner_host`.
 **Output paths**:
 - `e2e-results:/results/report.csv` — primary CSV report
 - `e2e-results:/results/junit.xml` — JUnit XML
 - `e2e-results:/results/coverage_by_ac.csv` — derived: AC → covering test IDs → aggregate result
 - `e2e-results:/results/per_tier.csv` — derived: tier → pass/fail/skip/blocked-data counts
 **`BLOCKED-DATA` handling**: when a test's required fixture is missing (e.g., AerialVL S03 not yet downloaded in CI), the test must emit `BLOCKED-DATA` rather than `FAIL` or `SKIP` — this preserves the data_status signal in the matrix without polluting the failure rate.
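A minimal sketch of this rule, assuming the runner knows each test's required fixture paths. `Execution`, `csv_result` and `per_tier_counts` are hypothetical names, not a defined runner API; the point is that a missing fixture overrides the pytest outcome, so an absent corpus is reported as `BLOCKED-DATA` rather than `FAIL` or `SKIP`, and the same mapping feeds `per_tier.csv`.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-execution record collected by the runner.
@dataclass
class Execution:
    test_id: str
    tier: str                      # e.g. "T1" ... "T5"
    outcome: str                   # pytest outcome: "passed" | "failed" | "skipped"
    required_fixtures: Tuple[str, ...] = ()

PYTEST_TO_CSV = {"passed": "PASS", "failed": "FAIL", "skipped": "SKIP"}

def missing_fixtures(paths: Iterable[str]) -> List[str]:
    """Required fixture paths that are absent on the runner's filesystem."""
    return [p for p in paths if not Path(p).exists()]

def csv_result(ex: Execution) -> Tuple[str, str]:
    """Return (result, error_message) for one report.csv row.

    A missing fixture takes precedence over the pytest outcome. For FAIL rows the
    error text would come from the pytest report, so it is left empty here.
    """
    missing = missing_fixtures(ex.required_fixtures)
    if missing:
        return "BLOCKED-DATA", "missing fixture(s): " + ", ".join(missing)
    return PYTEST_TO_CSV[ex.outcome], ""

def per_tier_counts(executions: Iterable[Execution]) -> Dict[str, Counter]:
    """Aggregate result counts per tier, as written to per_tier.csv."""
    counts: Dict[str, Counter] = {}
    for ex in executions:
        result, _ = csv_result(ex)
        counts.setdefault(ex.tier, Counter())[result] += 1
    return counts
```

In a `conftest.py`, the `Execution` records could be built from pytest's `pytest_runtest_logreport` hook.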
 ## Test Execution
 **Decision: both (per-tier split).** The system is hardware-dependent (Jetson Orin Nano Super + CUDA + TensorRT + thermal envelope + USB/MIPI cameras + MAVLink hardware loop), so execution is split between Docker (T1/T2/T3 — pipeline-correctness, deferred-corpus, deferred-sitl) and real-hardware bench / field (T4 deferred-hil, T5 deferred-field).
 ### Hardware dependencies found
 | Source | Indicator |
 |--------|-----------|
 | `_docs/00_problem/restrictions.md:26` | Cameras over USB / MIPI-CSI / GigE |
 | `_docs/00_problem/restrictions.md:41` | Jetson Orin Nano Super — 67 TOPS INT8, 8 GB LPDDR5, 25 W TDP |
 | `_docs/00_problem/restrictions.md:42` | JetPack + CUDA + TensorRT |
 | `_docs/00_problem/restrictions.md:43` | Sustained 25 W for 8 h at upper-envelope temperature (AC-NEW-5) |
 | `_docs/00_problem/restrictions.md:48-51` | IMU + MAVLink2 from FC (serial/UDP); ArduPilot only |
 | `_docs/01_solution/solution.md` | cuVSLAM (GPU), VPR DINOv2-VLAD (TensorRT), cross-view matcher (TensorRT) |
 | this file (`environment.md`) | `runtime: nvidia`; `linux/arm64` HW tier + `linux/amd64+cuda` SW emulation tier; `nvidia-smi-exporter` |
 Source-code scan is deferred to the first implement cycle (no source code yet at Plan Step 1).
 ### Mode A — Docker (T1 / T2 / T3)
 **Prerequisites:**
 - Docker 24.x+ with Compose v2
 - For HW-tier runners: NVIDIA Container Toolkit + a host with an NVIDIA GPU (sm_87 for true Orin parity; sm_86 acceptable for SW emulation)
 - For SW-emulation runners: `linux/amd64` host; CUDA emulation layer enabled in the SUT image's `linux/amd64+cuda` build target
 - T2 only: deferred-corpus volumes mounted (AerialVL S03, etc. — see `test-data.md`)
 - T3 only: `ardupilot-sitl` PR-#30080-pinned image pulled
 **Run:**
 ```bash
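 # Run from the repo root; --project-directory resolves the compose file's
 # relative build contexts and bind mounts against the repo root.
 # T1 pipeline
 docker compose --project-directory . -f e2e/docker-compose.test.yml run --rm e2e-runner \
    pytest -m "blackbox and pipeline" \
    --junit-xml=/results/junit.xml --csv=/results/report.csv
 # T2 deferred-corpus (corpus volumes must be present)
 docker compose --project-directory . -f e2e/docker-compose.test.yml --profile corpus run --rm e2e-runner \
    pytest -m "blackbox and deferred-corpus" \
    --junit-xml=/results/junit.xml --csv=/results/report.csv
 # T3 deferred-sitl
 docker compose --project-directory . -f e2e/docker-compose.test.yml --profile sitl run --rm e2e-runner \
    pytest -m "blackbox and deferred-sitl" \
    --junit-xml=/results/junit.xml --csv=/results/report.csv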
 ```
 **Result collection:** the `e2e-results` volume is a host bind mount of `./results`; it receives `report.csv` and `junit.xml` from pytest, plus the derived `coverage_by_ac.csv` and `per_tier.csv`.
 **Environment variables (key):** `MAVLINK_FC_URL`, `MAVLINK_GCS_URL`, `GPSD_API_BIND`, `GPSD_TILE_DIR`, `GPSD_FDR_DIR`, `MAVLINK2_SIGNING_KEY`, `JWT_SIGNING_KEY` — full list in `e2e/.env.example` (to be produced in Phase 4 / Decompose).
 ### Mode B — Local on bench Jetson (T4 deferred-hil)
 **Prerequisites:**
 - Real Jetson Orin Nano Super dev kit with JetPack 6.x, CUDA 12.x, TensorRT 10.x
 - Bench MAVLink loop: either a second Jetson running `ardupilot-sitl` against a recorded IMU stream, linked over UDP or a USB-MAVLink adapter, OR a real autopilot board on the bench
 - Thermal chamber (AC-NEW-5 only; otherwise lab ambient is sufficient for AC-4.1 / AC-4.2 / AC-NEW-1 cold-start / AC-NEW-3 8-h soak)
 - `tegrastats` and `nvidia-smi` available
 - Single-tenant scheduling — no other tests share the Jetson during a T4 run
 **Run:**
 ```bash
 # T4 perf binding on real HW
 ./scripts/run-tests.sh --tier=t4
 # Or specifically the perf script for AC-4.1 / AC-NEW-5 binding
 ./scripts/run-performance-tests.sh --tier=t4 --thermal-profile=hot-soak
 ```
 **Result collection:** the bench runner copies `report.csv` + `junit.xml` + `tegrastats.log` + `power.csv` to a network share (path TBD by Decompose).
 ### Mode C — Field (T5 deferred-field)
 Out-of-band per the field-test plan; not part of CI. Captured here for completeness — the runner is the same `e2e-runner` image plus a recorded-flight replay harness defined in the field-test plan.
 ### CI runner mapping
 | Tier | CI runner type | Mode | Cadence |
 |------|---------------|------|---------|
 | T1 pipeline | Linux x86 + NVIDIA GPU (any sm_86+) OR Linux x86 with CUDA emulation | Docker | Every PR + nightly |
 | T2 deferred-corpus | Linux x86 + NVIDIA GPU (sm_86+) with corpus volume mounted | Docker | Nightly + on-tag |
 | T3 deferred-sitl | Linux x86 (CPU-only OK) | Docker | Nightly |
 | T4 deferred-hil | Self-hosted Jetson Orin Nano Super bench runner | Local | Bench-on-demand + weekly thermal cycle |
 | T5 deferred-field | n/a (per-sortie out-of-band) | Field | Per field-test plan |
 Phase 4 (`run-tests.sh`, `run-performance-tests.sh`) consumes this section to choose between the Docker and bench-local code paths via the `--tier=` flag.
 ## External Dependencies
 The SUT does not call commercial satellite providers at runtime (AC-8.1). All upstream sourcing is the Suite Satellite Service's responsibility, which is **out of scope** for this build. The test environment therefore replaces or simulates the remaining dependencies as follows:
 - `tile-cache-init` provides the cache contents the SUT would normally have synced from the Service pre-flight.
 - `qgc-mock` is a black-box GCS sniffer + operator-hint injector — not a real QGroundControl instance, but speaks the same MAVLink wire.
 - `gps-spoof-injector` simulates a malicious GPS signal for AC-NEW-2 / F-T12.
 - `ardupilot-sitl` is the only autopilot under test (PX4 is out of scope per restrictions).
 - The SUT's HTTPS API is exercised against the SUT directly — there is no upstream identity provider; JWTs are minted by the runner against a test-only signing key shared at SUT start.
 No external mocks have access to internal SUT state.
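The JWT algorithm and claims are not fixed above. Assuming HS256 with the single test-only shared key, and hypothetical `sub` / `iat` / `exp` claims, runner-side minting reduces to the RFC 7519 compact form. The sketch below uses only the standard library to show the structure; a maintained library such as PyJWT would normally be preferred.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def mint_test_jwt(key: bytes, subject: str = "e2e-runner", ttl_s: int = 300,
                  now: Optional[float] = None) -> str:
    """Mint an HS256 JWT signed with the test-only shared key.

    The claim set (sub, iat, exp) is an assumption, not a documented SUT contract.
    """
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "iat": issued, "exp": issued + ttl_s}
    # Signing input is base64url(header) + "." + base64url(claims).
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

The token would then travel as `Authorization: Bearer <token>` on each HTTPS request to `sut:8443`.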
@@ -0,0 +1,248 @@
 # Performance Tests
 > Deployment-binding numbers require Tier T4 (real Jetson Orin Nano Super @ 25 W). T1 runs are functional plausibility checks only — same caveat as `test-data.md` D2.
 ---
 ### NFT-PERF-01: End-to-end latency p95 ≤400 ms (AC-4.1)
 **Summary**: From camera-frame capture to GPS_INPUT emission, p95 latency ≤ 400 ms on Orin Nano Super @ 25 W.
 **Traces to**: AC-4.1. Tier: T4 (`deferred-hil`) for binding result; T1 functional smoke.
 **Metric**: end-to-end latency in ms, sampled per-frame, aggregated to p50 / p95 / p99.
 **Preconditions**:
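 - Tier T4: real Jetson Orin Nano Super in its 25 W power mode (selected with `nvpmodel -m <mode-id>`; mode IDs differ between JetPack releases, so confirm the active mode with `nvpmodel -q`), thermals stabilized at +25 °C ambient.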
 - TRT engines warmed (≥1 min steady-state replay before measurement).
 - 30-min sustained replay of `synthetic_8h_load` slice (or AerialVL S03 mid-segment).
 - Frame timestamping uses the camera-shim `time_usec` and matches against the GPS_INPUT `time_usec`.
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | Stream nav-cam frames at 3 fps for 30 min after warm-up | per-frame `(t_emit_gps_input - t_capture)` |
 | 2 | Drop the first 60 s as warm-up | aggregate the rest |
 | 3 | Compute p50, p95, p99, max | report |
 | 4 | Verify drop rate | `dropped_frames / total_frames ≤ 10%` |
 **Pass criteria**: p95 ≤ 400 ms; drop rate ≤ 10 % (per AC-4.1's "skip-allowed" clause).
 **Duration**: 30 min + 60 s warm-up.
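The aggregation in steps 1 to 4 can be sketched as follows. The sketch assumes the camera-shim `time_usec` and the sniffer's receive timestamps share one clock, and that the SUT copies each frame's `time_usec` into the matching GPS_INPUT so the two can be joined; percentiles use the nearest-rank definition. Function names are illustrative.

```python
import math
from typing import Dict, List

def percentile(values: List[float], q: int) -> float:
    """Nearest-rank percentile for integer q in 1..100; values must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def latency_report(capture_us: List[int], emit_us: Dict[int, int],
                   warmup_s: float = 60.0) -> Dict[str, float]:
    """Summarise NFT-PERF-01 latency.

    capture_us: camera-shim time_usec of every frame streamed to the SUT.
    emit_us:    GPS_INPUT.time_usec -> sniffer receive time (same clock, us).
    Frames in the first warmup_s seconds are discarded; measured frames with no
    matching GPS_INPUT count as dropped.
    """
    t0 = min(capture_us)
    measured = [t for t in capture_us if t - t0 >= warmup_s * 1e6]
    latencies_ms = [(emit_us[t] - t) / 1000.0 for t in measured if t in emit_us]
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
        "drop_rate": (len(measured) - len(latencies_ms)) / len(measured),
    }

def perf01_pass(report: Dict[str, float]) -> bool:
    """AC-4.1 binding check: p95 <= 400 ms and drop rate <= 10 %."""
    return report["p95"] <= 400.0 and report["drop_rate"] <= 0.10
```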
 ---
 ### NFT-PERF-02: cuVSLAM single-frame latency ≤20 ms
 **Summary**: cuVSLAM inference completes within 20 ms per frame.
 **Traces to**: results_report row 37, F-T1b. Tier: T4 binding; T1 functional.
 **Metric**: cuVSLAM per-frame inference duration, p95.
 **Preconditions**: cuVSLAM warmed; mono+IMU mode.
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | Replay 5 min of nav-cam frames at 3 fps | per-frame `cuvslam_inference_ms` (publicly exposed metric) |
 | 2 | p95 over the run | report |
 **Pass criteria**: p95 ≤ 20 ms.
 **Duration**: 5 min.
 ---
 ### NFT-PERF-03: Cross-view matcher latency
 **Summary**: Inline matcher (SP+LG TRT FP16/INT8) ≤ 200 ms / pair; LiteSAM re-loc fallback ≤ 2000 ms / pair.
 **Traces to**: AC-4.1 (sub-budget), results_report row 38. Tier: T4 binding.
 **Metric**: per-pair matcher inference time, p95.
 **Preconditions**: matcher warmed; representative resolution (1024×768 SP+LG / GIM-LG).
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | Replay 1000 cross-view pairs through inline path | `inline_matcher_ms` per pair |
 | 2 | Replay 100 cross-view pairs through re-loc path | `reloc_matcher_ms` per pair |
 **Pass criteria**: inline p95 ≤ 200 ms; re-loc p95 ≤ 2000 ms.
 **Duration**: ≤30 min.
 ---
 ### NFT-PERF-04: Orthority per-frame latency ≤50 ms
 **Summary**: Orthority's per-frame ortho call on Orin Nano Super stays within budget.
 **Traces to**: F-T14, M-27. Tier: T4 binding. If exceeded, fall back to `cv2.warpPerspective + bilinear DEM`, the fallback documented for Component 1b.
 **Metric**: ortho per-frame duration, p95.
 **Preconditions**: Orthority loaded; SRTM-30 m DEM mmap warm; sector classified `flat` or `moderate`.
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | Replay 1000 frames | per-frame `ortho_ms` |
 **Pass criteria**: p95 ≤ 50 ms. If FAIL: open a task to switch to the fallback path (a flow trigger, not a blocking gate for this test).
 **Duration**: ≤10 min.
 ---
 ### NFT-PERF-05: Spoofing-promotion latency ≤3 s p95 (AC-NEW-2)
 **Summary**: Time from spoof onset to SUT promotion as primary GPS source.
 **Traces to**: AC-NEW-2. Tier: T3 (`deferred-sitl`).
 **Metric**: t_promote = `t_promotion_event - t_spoof_onset`, p95 over 50 trials.
 **Preconditions**: SITL + `gps-spoof-injector`; FC EKF3 lane-switch event observable via `EKF_STATUS_REPORT`.
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | At t=0 inject spoof signal | observe SUT GPS_INPUT promotion (raised `fix_type` to 3D-fix-with-priority + STATUSTEXT `PROMOTE`) |
 | 2 | Repeat 50 trials with randomised spoof magnitudes | distribution |
 **Pass criteria**: p95 ≤ 3 s.
 **Duration**: ≤30 min.
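A sketch of the t_promote computation and the 50-trial p95 gate, assuming the runner logs, per trial, the spoof-onset time and the timestamped STATUSTEXT texts sniffed via `qgc-mock`. A trial with no `PROMOTE` message is scored as infinite latency, so three or more missed promotions out of 50 fail the nearest-rank p95 (the 48th value) on their own. Names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

def t_promote(onset_s: float, statustext: Iterable[Tuple[float, str]]) -> float:
    """Seconds from spoof onset to the first STATUSTEXT containing PROMOTE (inf if none)."""
    for t, text in sorted(statustext):
        if t >= onset_s and "PROMOTE" in text:
            return t - onset_s
    return math.inf

def p95_nearest_rank(values: List[float]) -> float:
    ordered = sorted(values)
    return ordered[max(1, math.ceil(95 * len(ordered) / 100)) - 1]

def promotion_gate(trials: List[Tuple[float, List[Tuple[float, str]]]],
                   budget_s: float = 3.0) -> bool:
    """AC-NEW-2: p95 of t_promote over all trials must be <= budget_s."""
    return p95_nearest_rank([t_promote(onset, msgs) for onset, msgs in trials]) <= budget_s
```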
 ---
 ### NFT-PERF-06: Frame-by-frame output cadence (AC-4.4)
 **Summary**: GPS_INPUT is streamed per-frame, not batched.
 **Traces to**: AC-4.4. Tier: T1 + T4.
 **Metric**: inter-frame interval distribution.
 **Preconditions**: 30 min steady-state replay.
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | Replay at 3 fps | sniff GPS_INPUT timestamps |
 | 2 | Compute inter-arrival deltas | distribution |
 | 3 | Verify no frame is delayed >1 inter-frame interval | — |
 **Pass criteria**: |Δt - 1/3 s| ≤ 50 ms for ≥99 % of frames; no batches (no clusters of frames within the same 50 ms window).
 **Duration**: 30 min.
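One way to evaluate these criteria from sniffed GPS_INPUT messages is sketched below. It assumes, beyond what the test states, that duplicate GPS_INPUT messages re-sent for FC stability (see NFT-PERF-07) repeat the same `time_usec`, so only the first arrival of each `time_usec` counts as a new frame. A "batch" is read as two fresh outputs arriving less than 50 ms apart.

```python
from typing import List

FRAME_S = 1 / 3      # nominal inter-frame interval at 3 fps
TOL_S = 0.050        # +/- 50 ms tolerance from the pass criteria

def fresh_arrivals(arrivals_s: List[float], time_usec: List[int]) -> List[float]:
    """Keep the first arrival of each distinct GPS_INPUT.time_usec (drops re-sends)."""
    seen = set()
    fresh = []
    for t, tu in zip(arrivals_s, time_usec):
        if tu not in seen:
            seen.add(tu)
            fresh.append(t)
    return fresh

def cadence_ok(arrivals_s: List[float], time_usec: List[int]) -> bool:
    """>= 99 % of inter-arrival deltas within 1/3 s +/- 50 ms, and no batching."""
    fresh = fresh_arrivals(arrivals_s, time_usec)
    deltas = [b - a for a, b in zip(fresh, fresh[1:])]
    on_time = sum(1 for d in deltas if abs(d - FRAME_S) <= TOL_S)
    batched = any(d < TOL_S for d in deltas)
    return bool(deltas) and on_time >= 0.99 * len(deltas) and not batched
```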
 ---
 ### NFT-PERF-07: GPS_INPUT message rate (results_report row 9)
 **Summary**: GPS_INPUT is emitted continuously at 5–10 Hz (per-frame output at 3 fps, plus duplicates for FC stability when configured).
 **Traces to**: AC-4.3, results_report row 9. Tier: T1.
 **Metric**: rate over 60 s windows.
 **Preconditions**: steady-state tracking.
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | Sniff GPS_INPUT for 5 min | per-second rate |
 **Pass criteria**: rate ∈ [5, 10] Hz throughout.
 **Duration**: 5 min.
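A sketch of the "throughout" check, reading it as every 1 s bin of the capture; this is an interpretation, and the 60 s window means are computed only for reporting. Arrival times are seconds relative to the start of sniffing; names are illustrative.

```python
import math
from collections import Counter
from typing import List

def per_second_rates(arrivals_s: List[float], duration_s: int) -> List[int]:
    """GPS_INPUT count in each 1 s bin [k, k+1) for k in 0..duration_s-1."""
    bins = Counter(math.floor(t) for t in arrivals_s)
    return [bins.get(k, 0) for k in range(duration_s)]

def window_means(rates: List[int], window_s: int = 60) -> List[float]:
    """Mean rate over consecutive non-overlapping windows, for the report."""
    return [sum(rates[i:i + window_s]) / len(rates[i:i + window_s])
            for i in range(0, len(rates), window_s)]

def rate_ok(arrivals_s: List[float], duration_s: int, lo: int = 5, hi: int = 10) -> bool:
    """Pass only if every 1 s bin holds between lo and hi messages."""
    return all(lo <= r <= hi for r in per_second_rates(arrivals_s, duration_s))
```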
 ---
 ### NFT-PERF-08: VPR latency under conditional invocation
 **Summary**: VPR's DINOv2 forward pass fires only on re-loc triggers; in cruise, VPR's CPU/GPU load stays near zero.
 **Traces to**: AC-8.6, restrictions §Satellite (VPR retrieval unit). Tier: T4.
 **Metric**: VPR invocations / second; cruise idle vs re-loc burst.
 **Preconditions**: 60-min replay with scripted re-loc triggers (cold start, sharp turn, σ_xy > 50 m, VO failure ≥2 frames).
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | Run replay | per-second `vpr_invocations` counter |
 | 2 | Compute average across cruise window vs re-loc window | — |
 **Pass criteria**:
 - Cruise window (no triggers): VPR invocations / 100 frames ≤ 1 (i.e., not invoked per-frame).
 - Re-loc window: VPR is invoked within 1 frame of the trigger; DINOv2 forward-pass latency p95 ≤ 200 ms.
 **Duration**: 60 min.
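The two invocation criteria can be expressed as below, assuming the `vpr_invocations` counter is resampled to one count per nav-cam frame (at 3 fps, three frames per 1 s sample) and that trigger frame indices come from the scripted scenario. Function names are illustrative.

```python
from typing import Iterable, List

def cruise_ok(invocations_per_frame: List[int]) -> bool:
    """Cruise window: at most 1 VPR invocation per 100 frames."""
    return 100 * sum(invocations_per_frame) <= len(invocations_per_frame)

def reloc_ok(trigger_frames: Iterable[int], invocation_frames: Iterable[int],
             max_lag_frames: int = 1) -> bool:
    """Every scripted trigger is followed by a VPR invocation within max_lag_frames."""
    invoked = set(invocation_frames)
    return all(any(f in invoked for f in range(t, t + max_lag_frames + 1))
               for t in trigger_frames)
```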
 ---
 ### NFT-PERF-09: Top-K dynamic sizing matches sector / σ_xy
 **Summary**: VPR top-K honours AC-8.6 dynamic-K rules.
 **Traces to**: AC-8.6. Tier: T1 + T4.
 **Metric**: K value selected per VPR call vs sector class + σ_xy.
 **Preconditions**: scripted scenarios with (sector ∈ {stable, active}) × (σ_xy ∈ {10, 30, 60}).
 **Steps**:
 | Step | Consumer Action | Measurement |
 |------|----------------|-------------|
 | 1 | Trigger VPR in each combination | observe `vpr_top_k` metric |
 **Pass criteria**:
 - stable + σ_xy ≤ 20 m → K=5.
 - active-conflict → K=20.
 - expanding-window fallback (σ_xy > 50 m or fail-N) → K=50.
 **Duration**: 5 min.
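An executable reading of the K rules, usable as the oracle for the scenario grid. Two points are assumptions rather than statements of AC-8.6: the precedence (the expanding-window fallback overrides the sector rule) and treating fail-N as a boolean flag. The criteria as written assign no K to a stable sector with 20 m < σ_xy ≤ 50 m (the σ_xy = 30 m cells of the grid), so the sketch returns `None` there instead of guessing.

```python
from itertools import product
from typing import Optional

def expected_top_k(sector: str, sigma_xy_m: float, fail_n: bool = False) -> Optional[int]:
    """Expected VPR top-K per the NFT-PERF-09 pass criteria (None = unspecified)."""
    if sigma_xy_m > 50 or fail_n:        # expanding-window fallback (assumed to win)
        return 50
    if sector == "active":               # active-conflict sector
        return 20
    if sector == "stable" and sigma_xy_m <= 20:
        return 5
    return None                          # gap in the criteria, e.g. stable + 30 m

# Oracle for the 2 x 3 scenario grid in the preconditions.
GRID = {(s, sig): expected_top_k(s, sig)
        for s, sig in product(("stable", "active"), (10, 30, 60))}
```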
 ---
 ### NFT-PERF-10: Failsafe latency ≤3 s no-fix → FC fallback (AC-5.2)
 **Summary**: When SUT cannot produce any estimate for >3 s, FC observably falls back to IMU-only DR.
 **Traces to**: AC-5.2. Tier: T3.
 **Metric**: time from last-fix-emission to FC fallback signal in `EKF_STATUS_REPORT`.
 **Preconditions**: scripted blackout in SITL.
 **Steps**: black out the pipeline with the scripted SITL scenario; observe the FC.
 **Pass criteria**: FC fallback observable within 4 s of blackout (3 s budget + 1 s observation latency).
 **Duration**: 5 min.
 ---
 ### NFT-PERF-11: Bench-off candidates — accuracy-vs-latency frontier
 **Summary**: Score inline matcher candidates on the documented bench-off corpora.
 **Traces to**: AC-1.1 / AC-1.2 / AC-2.2 / R2 / R3, F-T15. Tier: T2.
 **Metric**: per-candidate (recall@30 m, p95 latency, peak GPU mem, sustained 30-min thermal stability, seasonal-robustness score).
 **Preconditions**: AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN, TartanAir V2, internal Mavic.
 **Steps**: run each candidate (SP+LG, GIM-LG, XFeat sparse, XFeat semi-dense) and each ceiling reference (RoMa v2, MASt3R-SLAM, MapGlue, MATCHA — offline only) over the corpora.
 **Pass criteria**:
 - Inline candidates must fit in 200 ms / pair on Orin Nano Super @ 25 W.
 - Re-loc candidates (LiteSAM) must fit in 2 s / pair.
 - Selected inline matcher's recall@30 m on AerialVL S03 must support AC-1.1 / AC-1.2.
 **Duration**: 4 h Monte Carlo.
 ---
 ### NFT-PERF-12: Latency under adversarial input — no infinite stall
 **Summary**: Pathological inputs (uniform-grey frame, all-black frame, very low contrast) do not cause unbounded latency.
 **Traces to**: AC-3.x (resilience), AC-4.1 (negative). Tier: T1.
 **Metric**: per-frame latency capped.
 **Preconditions**: replay with 5 % of frames replaced by uniform-grey or all-black.
 **Steps**: replay 30 min; observe latency CDF.
 **Pass criteria**: each frame's latency ≤ 600 ms (1.5× p95 budget); pipeline never stalls beyond a single frame interval.
 **Duration**: 30 min.
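A sketch of the fault schedule and the pass check. The seeded selection keeps the 5 % replacement reproducible across runs. "Never stalls beyond a single frame interval" is read here as no gap between consecutive GPS_INPUT outputs longer than two nominal frame intervals (one missed frame), which is an interpretation. Names are hypothetical.

```python
import random
from typing import Dict, List

def pathological_schedule(n_frames: int, fraction: float = 0.05,
                          seed: int = 0) -> Dict[int, str]:
    """Map frame index -> replacement kind for a reproducible 5 % of frames."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n_frames), k=round(n_frames * fraction))
    return {i: rng.choice(("uniform-grey", "all-black")) for i in sorted(chosen)}

def perf12_pass(latencies_ms: List[float], output_times_s: List[float],
                cap_ms: float = 600.0, frame_s: float = 1 / 3) -> bool:
    """Every frame's latency <= cap_ms and no output gap > 2 frame intervals."""
    gaps = [b - a for a, b in zip(output_times_s, output_times_s[1:])]
    return max(latencies_ms) <= cap_ms and all(g <= 2 * frame_s for g in gaps)
```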
 ---
 ## Test execution caveats
 - **T1 runs**: produced numbers are NOT deployment-binding. AC-4.1 / NFT-PERF-01 specifically requires Orin Nano Super 25 W (T4) for binding pass.
 - **T4 runs**: bench scheduler enforces single-tenant access; thermal warm-up ≥1 min before measurement window starts.
 - **Frame-rate floor**: AC-4.1 allows ~10 % drop under sustained load. Drop rate IS measured and reported in NFT-PERF-01.
@@ -0,0 +1,309 @@
 # Resilience Tests
 > Each test defines fault injection + observable recovery + quantifiable pass/fail. All run through the public interfaces from `environment.md`.
 ---
 ### NFT-RES-01: Companion-computer process kill mid-flight (AC-5.3, AC-NEW-1)
 **Summary**: SUT process killed mid-flight; SUT restarts and recovers from FC's IMU-extrapolated position within 30 s.
 **Traces to**: AC-5.3, AC-NEW-1, F-T11, results_report row 25. Tier: T1 (functional); the 50-trial campaign runs on T4.
 **Preconditions**: SUT in steady-state tracking; FC continues to fly.
 **Fault injection**:
 - `docker kill -s SIGKILL <sut>` followed by `docker start <sut>`.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | SIGKILL SUT | SUT process exits non-gracefully; FC continues IMU-only DR per AC-5.2 |
 | 2 | Restart SUT | container starts |
 | 3 | Time from container start to first valid GPS_INPUT (`fix_type==3`) | t_recovery ≤ 30 s |
 | 4 | Read `GLOBAL_POSITION_INT` from FC at SUT-start; assert pipeline seeds from it | source recovery via FC pose |
 | 5 | After first satellite match, error ≤ 50 m | accuracy restored |
 **Pass criteria**: t_recovery ≤ 30 s p95 over 50 trials; AC-5.2 fallback observable on FC during the gap; accuracy restored ≤ 50 m after first match.
 **Duration**: 60 s per trial; 50-trial campaign on T4.
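Step 5 above, like the error bounds in NFT-RES-05 and NFT-RES-06, compares a GPS_INPUT fix with ground truth. A minimal sketch, assuming ground truth in decimal degrees (for example from `coordinates.csv`); GPS_INPUT carries `lat` / `lon` as int32 degrees × 1e7. The haversine formula on a spherical Earth (mean radius 6 371 008.8 m) errs by well under 1 % of the distance, which is negligible against 50 m and 500 m thresholds.

```python
import math

EARTH_MEAN_RADIUS_M = 6_371_008.8   # IUGG mean Earth radius

def deg_e7_to_deg(value: int) -> float:
    """GPS_INPUT lat/lon are int32 degrees * 1e7."""
    return value / 1e7

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points (spherical model)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(a))

def fix_error_m(lat_e7: int, lon_e7: int, truth_lat: float, truth_lon: float) -> float:
    """Horizontal error of one GPS_INPUT fix against a ground-truth point."""
    return haversine_m(deg_e7_to_deg(lat_e7), deg_e7_to_deg(lon_e7), truth_lat, truth_lon)
```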
 ---
 ### NFT-RES-02: GPS spoofing — promotion within 3 s (AC-NEW-2)
 **Summary**: FC GPS-loss / lane-switch event signalled → SUT promotes its estimate to primary within 3 s.
 **Traces to**: AC-NEW-2, F-T12. Tier: T3 (`deferred-sitl`).
 **Preconditions**: SITL + `gps-spoof-injector`.
 **Fault injection**:
 - Inject malicious `GPS_RAW_INT` with 1 km lat/lon offset starting at scripted t=0.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | t=0: inject spoof | FC observes anomaly; emits EKF lane-switch / fix-loss in `EKF_STATUS_REPORT` |
 | 2 | SUT subscribes to `GPS_RAW_INT`, `EKF_STATUS_REPORT`, `SYS_STATUS` and maintains a "real-GPS health" rolling average | health drops below threshold |
 | 3 | Within 3 s, SUT raises GPS_INPUT to primary mode + emits STATUSTEXT `PROMOTE` to GCS | promotion event observable |
 **Pass criteria**: 95th percentile of t_promote ≤ 3 s over 50 trials.
 **Duration**: 30 min campaign.
 ---
 ### NFT-RES-03: 3-s no-fix → FC fallback to IMU-only DR (AC-5.2)
 **Summary**: Pipeline blackout for >3 s — FC falls back to IMU-only DR; SUT logs the failure.
 **Traces to**: AC-5.2, restrictions §Failsafe. Tier: T3.
 **Fault injection**: scripted scenario where SUT cannot produce any estimate for 3.5 s (e.g., cuVSLAM tracking loss + cache poisoned + matcher offline).
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | Inject blackout | SUT publishes STATUSTEXT WARN within 1 s of blackout |
 | 2 | At t=3 s of blackout, SUT emits a single STATUSTEXT FAILSAFE | recorded |
 | 3 | Observe FC `EKF_STATUS_REPORT` | FC switches to IMU-only DR within 4 s of blackout start |
 | 4 | After 5 s, restore pipeline | SUT re-emits valid GPS_INPUT; FC re-fuses |
 **Pass criteria**: FC fallback observable within 4 s; SUT recovers within 30 s of pipeline restore (matches AC-NEW-1 budget).
 **Duration**: 60 s per trial.
 ---
 ### NFT-RES-04: 3-consecutive-failures → RELOC_REQ + waiting state (AC-3.4)
 **Summary**: When SUT cannot determine position for ≥3 consecutive frames AND ≥2 s, it sends a re-localization request.
 **Traces to**: AC-3.4, results_report rows 20, 21, 46. Tier: T1.
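 **Fault injection**: scripted run of consecutive failed satellite-matching frames with cuVSLAM degraded, lasting at least 3 frames and at least 2 s (at 3 fps, 3 frames cover only about 1 s, so the run needs roughly 7 frames).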
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | Trigger 3 consecutive frame failures spanning ≥2 s | failure counter increments |
 | 2 | Wait up to 2 s after the third failure | STATUSTEXT `RELOC_REQ: last_lat=… last_lon=… uncertainty=…m` emitted (regex match) |
 | 3 | Observe SUT while it waits for re-loc | SUT continues VO/IMU dead reckoning (`fix_type==0`, source `dead_reckoned`) and keeps attempting satellite matches (match-attempt counter increments) |
 | 4 | Observe FC | FC continues with last known position + IMU extrapolation; `EKF_STATUS_REPORT` consistent |
 **Pass criteria**: regex matches; SUT continues emitting GPS_INPUT in waiting state; satellite-match counter increments.
 **Duration**: 60 s.
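Here is a possible matcher for the step-2 regex. It assumes decimal-degree coordinates and an integer or decimal metre uncertainty, because the exact number format is not specified above. STATUSTEXT's `text` field holds 50 characters. A message like `RELOC_REQ: last_lat=48.12345 last_lon=37.12345 uncertainty=500m` is longer than that. The sketch therefore assumes the runner has already reassembled MAVLink2 chunked STATUSTEXT (the `id` / `chunk_seq` extension fields) before matching.

```python
import re
from typing import Optional, Tuple

# Assumed number format: optional sign, digits, optional decimal part.
_NUM = r"-?\d+(?:\.\d+)?"
RELOC_REQ_RE = re.compile(
    rf"^RELOC_REQ: last_lat=(?P<lat>{_NUM}) last_lon=(?P<lon>{_NUM}) "
    r"uncertainty=(?P<unc>\d+(?:\.\d+)?)m$"
)


def parse_reloc_req(text: str) -> Optional[Tuple[float, float, float]]:
    """Return (lat, lon, uncertainty_m) for a reassembled STATUSTEXT, else None."""
    m = RELOC_REQ_RE.match(text.rstrip("\x00"))  # MAVLink char arrays may be NUL-padded
    if m is None:
        return None
    lat, lon, unc = float(m["lat"]), float(m["lon"]), float(m["unc"])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None  # matches syntactically but is not a valid WGS84 position
    return lat, lon, unc
```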
 ---
 ### NFT-RES-05: Operator hint workflow (AC-3.4, AC-6.2)
 **Summary**: An operator hint is consumed as a seed (σ = 500 m) for VPR / cross-view re-localization.
 **Traces to**: AC-3.4, AC-6.2, F-T10, results_report row 22. Tier: T1.
 **Preconditions**: SUT in re-loc waiting (after NFT-RES-04).
 **Fault injection** (cooperative): `qgc-mock` sends STATUSTEXT `RELOC_HINT: lat=… lon=… sigma=500m`.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | Send hint | SUT consumes hint; STATUSTEXT `HINT_RECEIVED` echoed |
 | 2 | First fix after hint | error ≤ 500 m |
 | 3 | After next satellite match | error ≤ 50 m; `tracking_state == NORMAL` |
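 **Pass criteria**: `HINT_RECEIVED` echoed; first post-hint fix within 500 m of ground truth; after the next satellite match, error within 50 m with `tracking_state == NORMAL`.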
 **Duration**: 60 s.
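The 500 m and 50 m thresholds are great-circle errors against `coordinates.csv`. This minimal sketch uses the haversine formula on a spherical Earth, which is adequate at these thresholds. It also includes a hypothetical `qgc-mock` formatter that keeps the hint within STATUSTEXT's 50-character `text` field. The six-decimal-place precision is an assumption.

```python
import math
from typing import Tuple

EARTH_MEAN_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical model

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def format_hint(lat: float, lon: float, sigma_m: float) -> str:
    """Hypothetical qgc-mock payload; refuses text that would not fit STATUSTEXT."""
    text = f"RELOC_HINT: lat={lat:.6f} lon={lon:.6f} sigma={sigma_m:.0f}m"
    if len(text) > 50:
        raise ValueError(f"hint is {len(text)} chars; STATUSTEXT text holds 50")
    return text


def hint_workflow_ok(first_fix: LatLon, first_truth: LatLon,
                     anchored_fix: LatLon, anchored_truth: LatLon,
                     tracking_state: str) -> bool:
    """Steps 2-3: first post-hint fix <= 500 m; post-match fix <= 50 m and NORMAL."""
    return (haversine_m(first_fix, first_truth) <= 500.0
            and haversine_m(anchored_fix, anchored_truth) <= 50.0
            and tracking_state == "NORMAL")
```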
 ---
 ### NFT-RES-06: Sharp turn — VO-loss → satellite re-loc (AC-3.2)
 **Summary**: A sharp turn (< 5 % frame overlap, turn angle < 70°, drift < 200 m) triggers VO loss; satellite re-localization recovers within 3 frames.
 **Traces to**: AC-3.2, F-T7. Tier: T1.
 **Fault injection**: synthetic sharp-turn pair injected into `nav_cam_60_slice`.
 **Steps**: see FT-P-14; resilience perspective: cuVSLAM tracking-loss event → matcher invocation via re-loc trigger → recovery.
 **Pass criteria**: error ≤ 50 m within 3 frames of turn; cuVSLAM tracking-state returns to NORMAL.
 **Duration**: 60 s.
 ---
 ### NFT-RES-07: Disconnected-segment recovery (AC-3.3)
 **Summary**: ≥3 disconnected segments per flight; each segment connects to prior trajectory via global retrieval.
 **Traces to**: AC-3.3, F-T8. Tier: T1.
 **Fault injection**: `disconnected_segments_replay` with ≥3 large gaps.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | Replay segment N (after gap) | VPR retrieves top-K candidate chunks; matcher relocalizes within 10 frames |
 | 2 | After re-loc, check trajectory continuity (no EKF position jump larger than the displacement expected across the gap) | `tracking_state == NORMAL` |
 | 3 | Repeat for every segment (≥3) | all segments succeed |
 **Pass criteria**: every segment (≥3) recovers within 10 frames; trajectory continuity maintained.
 **Duration**: 5 min.
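NFT-RES-06 and NFT-RES-07 both count frames from a trigger event (the turn, or the first frame after a gap) to recovery. This sketch assumes the runner has per-frame horizontal error and `tracking_state` for each frame. It counts the first frame after the event as frame 1, which is an interpretation of "within N frames". The state labels are assumed.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Frame(NamedTuple):
    err_m: float         # horizontal error vs ground truth, metres
    tracking_state: str  # e.g. "NORMAL", "DEGRADED", "LOST" (labels assumed)


def frames_to_recover(frames: Sequence[Frame], event_idx: int,
                      max_err_m: float = math.inf) -> Optional[int]:
    """Frames after event_idx until the first NORMAL frame with err <= max_err_m."""
    for offset, fr in enumerate(frames[event_idx + 1:], start=1):
        if fr.tracking_state == "NORMAL" and fr.err_m <= max_err_m:
            return offset
    return None  # never recovered within the recording


def all_recover(frames: Sequence[Frame], event_indices: Sequence[int],
                max_frames: int, max_err_m: float = math.inf) -> bool:
    """NFT-RES-06: one turn, 3 frames, 50 m. NFT-RES-07: every gap, 10 frames, state only."""
    for idx in event_indices:
        n = frames_to_recover(frames, idx, max_err_m)
        if n is None or n > max_frames:
            return False
    return True
```

For NFT-RES-07, the runner would also assert that the list of gap indices has at least three entries.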
 ---
 ### NFT-RES-08: cuVSLAM-degraded fall-back path
 **Summary**: If cuVSLAM underperforms (tracking lost repeatedly), SUT degrades gracefully and emits `dead_reckoned` source label rather than producing wild estimates.
 **Traces to**: AC-1.4, AC-3.x, R8 reframed. Tier: T1.
 **Fault injection**: scripted cuVSLAM tracking loss for 30 s.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | Force cuVSLAM tracking-loss for 30 s | source label switches to `dead_reckoned`; horiz_accuracy grows |
 | 2 | After 30 s, restore cuVSLAM | source label returns to `vo_extrapolated` or `satellite_anchored` |
 | 3 | Verify GPS_INPUT during the 30 s window does not contain wild jumps | per-frame Δposition ≤ IMU integration bound |
 **Pass criteria**: source label correctly transitions; no wild jumps; behaviour reversible.
 **Duration**: 60 s.
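The "IMU integration bound" is not defined numerically here. As a stand-in, this sketch bounds each per-frame step by the kinematic envelope v_max·dt + ½·a_max·dt². It operates on positions already projected to local east/north metres. `v_max_mps` and `a_max_mps2` are hypothetical airframe limits. A tighter check would integrate the logged IMU directly.

```python
import math
from typing import List, Sequence, Tuple

# (time_s, east_m, north_m): SUT GPS_INPUT positions projected to a local tangent plane.
Sample = Tuple[float, float, float]


def max_step_m(v_max_mps: float, a_max_mps2: float, dt_s: float) -> float:
    """Loose upper bound on displacement in dt: v_max plus a_max of extra acceleration."""
    return v_max_mps * dt_s + 0.5 * a_max_mps2 * dt_s ** 2


def find_wild_jumps(track: Sequence[Sample], v_max_mps: float,
                    a_max_mps2: float, slack: float = 1.0) -> List[int]:
    """Indices i where the step from track[i-1] to track[i] exceeds the envelope."""
    bad = []
    for i in range(1, len(track)):
        t0, e0, n0 = track[i - 1]
        t1, e1, n1 = track[i]
        dt = t1 - t0
        if dt <= 0:
            bad.append(i)  # non-monotonic timestamps are themselves a defect
            continue
        step = math.hypot(e1 - e0, n1 - n0)
        if step > slack * max_step_m(v_max_mps, a_max_mps2, dt):
            bad.append(i)
    return bad
```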
 ---
 ### NFT-RES-09: Tile-cache corruption — graceful degradation
 **Summary**: Corrupted MBTiles entry triggers reject + WARN, not a crash.
 **Traces to**: AC-8.3, AC-3.x. Tier: T1.
 **Fault injection**: overwrite a tile sidecar JSON with garbage between SUT runs.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | Inject corruption | SUT logs WARN at cache-load |
 | 2 | Replay frames over the affected sector | matcher does not consume the corrupt tile; falls through to next candidate |
 | 3 | Observe SUT process | does NOT crash; tracking_state may go DEGRADED for affected frames, then NORMAL |
 **Pass criteria**: process alive; corrupt tile never produces `satellite_anchored`; recovery on next valid sector.
 **Duration**: 60 s.
 ---
 ### NFT-RES-10: SITL F-T9 source-switching (AC-4.3 Option A)
 **Summary**: ArduPilot SITL fuses GPS_INPUT correctly; failover to `EK3_SRC2_*` when primary unavailable.
 **Traces to**: AC-4.3, F-T9 Option A. Tier: T3.
 **Fault injection**: temporarily stop SUT GPS_INPUT emission for 5 s; observe FC failover.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | SUT stops emitting | FC EKF3 detects loss; switches to `EK3_SRC2_*=GPS` |
 | 2 | Resume SUT emission | EKF3 switches back; no double-fusion (no #30076 / #32506 symptoms) |
 **Pass criteria**: clean switch in both directions; EKF3 logs show no double-fusion symptoms.
 **Duration**: 15 min.
 ---
 ### NFT-RES-11: MAVLink2 signing failure — FC rejects, SUT logs
 **Summary**: When the runner sends a deliberately mis-signed GPS_INPUT, FC rejects and SUT/FC log the rejection.
 **Traces to**: M-7, S-T1, F-T9 signing assertion. Tier: T3.
 **Fault injection**: send a GPS_INPUT with valid schema but invalid signing tag.
 **Steps**: see FT-N-14.
 **Pass criteria**: FC rejects the message; STATUSTEXT WARN observable; FC continues on prior valid source.
 **Duration**: 30 s.
 ---
 ### NFT-RES-12: Stale-tile rejection (AC-NEW-6)
 **Summary**: A tile older than its freshness budget plus grace zone is rejected: the `satellite_anchored` source label is NEVER produced from it.
 **Traces to**: AC-8.2, AC-NEW-6, NF-T6. Tier: T1.
 **Fault injection**: `stale_tile_scenarios` with ages 7 / 11 / 13 / 18 months for active-conflict + stable-rear sectors.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | For each (tile age, sector type) combination, replay frames over the affected sector | matcher invocation either skipped or scored 0 |
 | 2 | Assert source label of resulting GPS_INPUT | NEVER `satellite_anchored` from stale tile |
 | 3 | Confidence weight on tiles in 30-day grace zone | linearly decayed per spec |
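 **Pass criteria**: no `satellite_anchored` GPS_INPUT is ever derived from a tile past its freshness budget and grace zone; tiles inside the 30-day grace zone carry the linearly decayed confidence weight.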
 **Duration**: 5 min.
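Step 3 says grace-zone confidence decays linearly but does not give the endpoints. This sketch assumes full weight up to the sector's freshness budget, then a linear ramp from 1 to 0 across the 30-day grace zone, then rejection (weight 0) beyond it. The per-sector budgets are passed in as arguments because this section does not state them.

```python
def freshness_weight(age_days: float, budget_days: float, grace_days: float = 30.0) -> float:
    """1.0 within budget, linear ramp to 0.0 across the grace zone, 0.0 beyond it."""
    if age_days <= budget_days:
        return 1.0
    if age_days >= budget_days + grace_days:
        return 0.0
    return 1.0 - (age_days - budget_days) / grace_days


def may_anchor(age_days: float, budget_days: float, grace_days: float = 30.0) -> bool:
    """A tile may yield `satellite_anchored` only while its weight is positive."""
    return freshness_weight(age_days, budget_days, grace_days) > 0.0
```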
 ---
 ### NFT-RES-13: F-T16 cloud-occlusion injection
 **Summary**: Synthetic cloud occlusion on a fraction of frames does not cause cascading failure.
 **Traces to**: F-T16, AC-3.x. Tier: T2 (`deferred-corpus`).
 **Fault injection**: 30 % of frames in AerialVL S03 replay overlaid with synthetic cloud cover.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | Run replay | matcher fails on cloud-occluded frames; pipeline degrades to `vo_extrapolated` |
 | 2 | After cloud passes, satellite re-loc resumes | source returns to `satellite_anchored` |
 **Pass criteria**: AC-1.1 / AC-1.2 still met on the non-cloud-frame subset; pipeline does not enter unrecoverable state.
 **Duration**: 90 min.
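The cloud-occlusion pass criterion re-applies AC-1.1 (≥ 80 % of frames within 50 m) and AC-1.2 (≥ 50 % within 20 m) to the frames that were not occluded. This sketch assumes per-frame errors in metres. It also assumes a seeded random choice of the 30 % occluded frames, so that runs are reproducible. The function names and the seed are hypothetical.

```python
import random
from typing import Sequence, Set, Tuple


def pick_cloud_frames(n_frames: int, fraction: float = 0.30, seed: int = 0) -> Set[int]:
    """Reproducible choice of which frame indices get the synthetic cloud overlay."""
    k = round(fraction * n_frames)
    return set(random.Random(seed).sample(range(n_frames), k))


def clear_sky_accuracy(errors_m: Sequence[float],
                       cloud_frames: Set[int]) -> Tuple[bool, float, float]:
    """AC-1.1 / AC-1.2 on non-occluded frames: (passes, frac_within_50m, frac_within_20m)."""
    clear = [e for i, e in enumerate(errors_m) if i not in cloud_frames]
    if not clear:
        raise ValueError("no clear-sky frames to evaluate")
    within_50 = sum(e <= 50.0 for e in clear) / len(clear)
    within_20 = sum(e <= 20.0 for e in clear) / len(clear)
    return within_50 >= 0.80 and within_20 >= 0.50, within_50, within_20
```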
 ---
 ### NFT-RES-14: 8-hour soak — no FDR rollover loss (AC-NEW-3)
 **Summary**: Sustained 8 h replay; FDR caps at 64 GB and rolls over without silently dropping a payload class.
 **Traces to**: AC-NEW-3, NF-T5. Tier: T4 (`deferred-hil`).
 **Fault injection**: replay `synthetic_8h_load` continuously for 8 h.
 **Steps**:
 | Step | Action | Expected Behavior |
 |------|--------|------------------|
 | 1 | Run replay | FDR populates |
 | 2 | Inspect at every hour boundary | size monotonic up to cap; rollover events logged |
 | 3 | After 8 h | FDR ≤ 64 GB; all payload classes present (positions, IMU, GPS_INPUT, tlog, system health, mid-flight tiles, failure-thumbnail log) |
 **Pass criteria**: ≤ 64 GB; all classes present in the latest segment; rollover events logged for any class that hit cap.
 **Duration**: 8 h.
 ---
 ### NFT-RES-15: AC-NEW-7 cache-poisoning Service-side voting
 **Summary**: Single-flight onboard tile is NOT promoted to trusted basemap until ≥2 voting flights confirm.
 **Traces to**: AC-NEW-7, F-T3. Tier: T1 (with `service-stub`).
 **Fault injection** (cooperative): submit a single-flight tile with deliberately deflated EKF covariance.
 **Steps**: see FT-N-17.
 **Pass criteria**: candidate stays `trust_level=candidate`; promotion only after ≥ 2 voting flights confirm; for active sectors, single-flight promotion only when σ_xy ≤ 3 m AND OSM-road-overlap ≥ 70 %.
 **Duration**: 5 min.
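The promotion rule in the pass criteria can be written as a small decision function. Doing so makes the boundary cases (exactly 3 m, exactly 70 %) explicit. This is only a sketch: the `"promoted"` label is hypothetical. The sketch assumes `n_voting_flights` counts the distinct flights that confirmed the tile.

```python
def promotion_decision(sector_active: bool, n_voting_flights: int,
                       sigma_xy_m: float, osm_road_overlap: float) -> str:
    """Return 'promoted' or 'candidate' for an onboard-generated tile.

    n_voting_flights: distinct flights that confirmed the tile (assumed semantics).
    osm_road_overlap: fraction in [0, 1].
    """
    if n_voting_flights >= 2:
        return "promoted"
    if (sector_active and n_voting_flights == 1
            and sigma_xy_m <= 3.0 and osm_road_overlap >= 0.70):
        return "promoted"  # active-sector single-flight exception
    return "candidate"
```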
 ---
 ### NFT-RES-16: ROS 2 topic-rate sanity (F-T19)
 **Summary**: Under simulated load, all expected ROS 2 contract topics meet expected publish rates.
 **Traces to**: F-T19, Q6 → A. Tier: T1 (uses ROS 2 sniffer that subscribes only to documented contract topics, treating internal topics as opaque).
 **Fault injection**: synthetic load (load generator publishes pseudo-image frames at 3 fps + IMU at 200 Hz).
 **Steps**: subscribe to the cuVSLAM odometry output (`nav_msgs/Odometry`), the camera input (`sensor_msgs/Image`), and the FC-bridge topics `mavros/global_position/global` and `mavros/imu/data`; record a receive timestamp for every message over the window.
 **Pass criteria**: each contract topic publishes at expected rate ± 10 % over a 5 min window.
 **Duration**: 5 min.
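Here is a sketch of the ±10 % rate check. It assumes the sniffer records a receive timestamp for each message. Expected rates are inputs, because this section only fixes the load generator's rates (3 fps frames, 200 Hz IMU). The estimate is (messages − 1) / span, computed offline over the whole window.

```python
from typing import Sequence


def observed_rate_hz(stamps_s: Sequence[float]) -> float:
    """Mean rate over the window: (messages - 1) / (last - first)."""
    if len(stamps_s) < 2:
        raise ValueError("need at least two messages")
    ordered = sorted(stamps_s)
    span = ordered[-1] - ordered[0]
    if span <= 0:
        raise ValueError("timestamps do not span any time")
    return (len(ordered) - 1) / span


def rate_within_tolerance(stamps_s: Sequence[float], expected_hz: float,
                          tol: float = 0.10) -> bool:
    """True if the observed mean rate is within +/- tol of expected_hz."""
    return abs(observed_rate_hz(stamps_s) - expected_hz) <= tol * expected_hz
```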
@@ -0,0 +1,177 @@
 # Resource Limit Tests
 > All tests measure resources via the `prom` (Prometheus) and `nvidia-smi-exporter` services defined in `environment.md`. None of these tests touch SUT internals.
 ---
 ### NFT-RES-LIM-01: Memory ≤8 GB shared (AC-4.2)
 **Summary**: Peak resident memory + GPU memory remains under the 8 GB shared LPDDR5 cap.
 **Traces to**: AC-4.2, results_report row 35, NF-T2. Tier: T1 (Docker mem accounting) + T4 (`tegrastats`).
 **Preconditions**: 30-min sustained replay on Orin Nano Super 25 W (T4) or 30-min replay on x86+CUDA emulation (T1 functional only).
 **Monitoring**:
 - `prom` scrapes the SUT's `/metrics` endpoint for `process_resident_memory_bytes`.
 - `nvidia-smi-exporter` (T4) scrapes Jetson `tegrastats` for shared-LPDDR5 usage.
 **Duration**: 30 min replay.
 **Pass criteria**:
 - T4 binding: peak shared LPDDR5 usage < 8192 MB throughout; growth ≤ 50 MB over the 30-min window (no leak).
 - T1 functional: peak resident memory < 8192 MB; growth ≤ 50 MB.
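The "growth ≤ 50 MB" clause needs a definition that ignores warm-up noise. One option, sketched here as an assumption, is to compare the median of the first 5 minutes of samples with the median of the last 5 minutes. The sketch also reads "8192 MB" as MiB (8 GiB), which matches the 8 GB LPDDR5 figure.

```python
from statistics import median
from typing import Dict, Sequence, Tuple

MIB = 1024 ** 2


def memory_verdict(samples: Sequence[Tuple[float, int]], cap_mib: int = 8192,
                   max_growth_mib: float = 50.0,
                   window_s: float = 300.0) -> Dict[str, object]:
    """samples: (time_s, bytes) pairs, e.g. process_resident_memory_bytes scrapes."""
    ordered = sorted(samples)
    t_first, t_last = ordered[0][0], ordered[-1][0]
    peak = max(b for _, b in ordered)
    head = [b for t, b in ordered if t - t_first <= window_s]  # first 5 min
    tail = [b for t, b in ordered if t_last - t <= window_s]   # last 5 min
    growth = median(tail) - median(head)
    return {
        "peak_ok": peak < cap_mib * MIB,
        "growth_ok": growth <= max_growth_mib * MIB,
        "peak_mib": peak / MIB,
        "growth_mib": growth / MIB,
    }
```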
 ---
 ### NFT-RES-LIM-02: Thermal — junction temperature ≤80 °C, no throttle (results_report row 36)
 **Summary**: SoC junction temperature stays at or below 80 °C; no thermal throttle event.
 **Traces to**: results_report row 36, AC-NEW-5 (sub-budget). Tier: T4.
 **Preconditions**: T4 only; +25 °C ambient.
 **Monitoring**: `nvidia-smi-exporter` reads junction temp every 1 s.
 **Duration**: 30 min replay.
 **Pass criteria**: max(junction_temp_c) ≤ 80 °C; throttle_event_count == 0 (per `tegrastats throttle` indicator).
 ---
 ### NFT-RES-LIM-03: AC-NEW-5 thermal envelope — 8 h @ 25 W @ +50 °C ambient
 **Summary**: Cooling solution sustains 25 W for 8 h at +50 °C ambient without thermal throttling.
 **Traces to**: AC-NEW-5, NF-T3, restriction §Onboard Hardware. Tier: T4 (`deferred-hil`) — requires hot-soak chamber.
 **Preconditions**: hot-soak chamber, +50 °C ambient stabilized; SUT in 25 W mode running `synthetic_8h_load`.
 **Monitoring**: junction temp + throttle indicator via `tegrastats`; ambient temp probe; FDR thermal log (AC-NEW-3 includes thermal traces).
 **Duration**: 8 h.
 **Pass criteria**: throttle_event_count == 0 over 8 h. If a throttle event does occur, the SUT must automatically emit a STATUSTEXT to the GCS; verify this behaviour with a deliberate throttle injection in a separate run.
 ---
 ### NFT-RES-LIM-04: AC-NEW-5 cold-soak cold-start
 **Summary**: Cold-start TTFF at −20 °C ambient meets AC-NEW-1 budget.
 **Traces to**: AC-NEW-5 cold corner, AC-NEW-1, NF-T3 cold-soak. Tier: T4 (`deferred-hil`) — requires cold chamber.
 **Preconditions**: chamber stabilized at −20 °C with SUT powered off; nav-cam + IMU sources cold-replay-ready.
 **Monitoring**: TTFF timer (per FT-P-16 / FT-P-T4 cold).
 **Duration**: 50 cold boots within the cold chamber.
 **Pass criteria**: 95th percentile TTFF ≤ 30 s.
 ---
 ### NFT-RES-LIM-05: FDR — 8-h cap + rollover (AC-NEW-3, NF-T5)
 **Summary**: After 8 h replay, FDR is ≤ 64 GB and no payload class silently dropped.
 **Traces to**: AC-NEW-3, AC-8.5, NF-T5. Tier: T1 (volume-size accounting) + T4 (real disk).
 **Preconditions**: clean `fdr` volume at start; `synthetic_8h_load` replay.
 **Monitoring**: filesystem accounting per directory class; FDR rollover log (must record every dropped segment).
 **Duration**: 8 h.
 **Pass criteria**:
 - Total FDR ≤ 64 GB.
 - All payload classes present in the latest segment: per-frame positions w/ covariance + source-label, FC IMU full-rate, GPS_INPUT frames, MAVLink raw stream (tlog), system health (CPU / GPU / temp / throttle), mid-flight tiles, ≤0.1 Hz failure-thumbnail log.
 - For each rollover, a STATUSTEXT or rollover log entry exists; no silent drop.
 - Raw nav-cam / AI-cam frames are NOT present (AC-8.5 cross-check).
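Here is a sketch of the NFT-RES-LIM-05 audit. It runs over data the runner would collect from the `fdr` volume and the rollover log. The class directory names are hypothetical stand-ins for the payload classes listed above. "64 GB" is read as decimal gigabytes. Both should be adjusted to match the FDR's actual layout.

```python
from typing import Iterable, List, Mapping

# Hypothetical directory-class names for the payload classes in NFT-RES-LIM-05.
REQUIRED_CLASSES = (
    "positions",           # per-frame positions w/ covariance + source label
    "fc_imu",              # FC IMU, full rate
    "gps_input",           # GPS_INPUT frames
    "tlog",                # raw MAVLink stream
    "system_health",       # CPU / GPU / temp / throttle
    "midflight_tiles",
    "failure_thumbnails",  # <= 0.1 Hz
)
FORBIDDEN_CLASSES = ("raw_nav_cam", "raw_ai_cam")  # AC-8.5 cross-check
FDR_CAP_BYTES = 64 * 10**9  # assumption: decimal gigabytes


def audit_fdr(bytes_per_class: Mapping[str, int],
              latest_segment_classes: Iterable[str],
              dropped_segment_ids: Iterable[str],
              logged_rollover_ids: Iterable[str]) -> List[str]:
    """Return audit failures; an empty list means the FDR passed."""
    problems = []
    total = sum(bytes_per_class.values())
    if total > FDR_CAP_BYTES:
        problems.append(f"FDR total {total} B exceeds cap {FDR_CAP_BYTES} B")
    latest = set(latest_segment_classes)
    missing = [c for c in REQUIRED_CLASSES if c not in latest]
    if missing:
        problems.append("missing from latest segment: " + ", ".join(missing))
    forbidden = [c for c in FORBIDDEN_CLASSES if bytes_per_class.get(c, 0) > 0]
    if forbidden:
        problems.append("forbidden raw-frame classes present: " + ", ".join(forbidden))
    # A dropped segment without a matching rollover entry is a silent drop.
    silent = sorted(set(dropped_segment_ids) - set(logged_rollover_ids))
    if silent:
        problems.append("silently dropped segments: " + ", ".join(silent))
    return problems
```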
 ---
 ### NFT-RES-LIM-06: Tile cache ≤ 10 GB persistent (restrictions §UAV)
 **Summary**: Persistent satellite-tile cache for the 400 km² operational area + onboard-generated tiles fits in 10 GB.
 **Traces to**: restrictions §UAV ("~10 GB" tile-cache budget). Tier: T1.
 **Preconditions**: simulate 400 km² operational area (satellite tiles + DEM tiles + VPR chunk index) loaded; run a flight that generates onboard tiles; let cache settle.
 **Monitoring**: filesystem size of `/probe/tiles/`.
 **Duration**: 30 min replay (enough to populate onboard tiles).
 **Pass criteria**: total cache size ≤ 10 GB after the flight; deduplication keeps at most one onboard-generated tile per sector.
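The size of `/probe/tiles/` can be measured with a plain directory walk. This sketch assumes apparent file size is the quantity of interest; `du` reports allocated blocks, which can differ. It also assumes the runner can list the sector ID of each onboard-generated tile by some means not specified here. "10 GB" is read as decimal gigabytes.

```python
import os
from collections import Counter
from typing import Iterable, List

TILE_CACHE_BUDGET_BYTES = 10 * 10**9  # assumption: decimal gigabytes


def dir_size_bytes(root: str) -> int:
    """Sum of apparent sizes of regular files under root (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def duplicated_sectors(onboard_tile_sectors: Iterable[str]) -> List[str]:
    """Sector IDs that hold more than one onboard-generated tile."""
    counts = Counter(onboard_tile_sectors)
    return sorted(s for s, n in counts.items() if n > 1)


def tile_cache_ok(root: str, onboard_tile_sectors: Iterable[str]) -> bool:
    """True if the cache fits the budget and no sector holds duplicate onboard tiles."""
    return (dir_size_bytes(root) <= TILE_CACHE_BUDGET_BYTES
            and not duplicated_sectors(onboard_tile_sectors))
```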
 ---
 ### NFT-RES-LIM-07: GPU memory peak
 **Summary**: TensorRT engines (cuVSLAM + matcher + VPR) collectively fit within Orin Nano Super shared LPDDR5 with headroom for the rest of the system.
 **Traces to**: AC-4.2, NF-T2 (extended for ROS 2 image growth). Tier: T4.
 **Preconditions**: all TRT engines loaded.
 **Monitoring**: `tegrastats` GPU memory line.
 **Duration**: steady-state 5 min after warm-up.
 **Pass criteria**: GPU memory ≤ 4 GB (leaves ≥ 4 GB for ROS 2 nodes + working set + OS); engine reservation ≥ 1 GB for matcher + VPR (per NF-T2 extended).
 ---
 ### NFT-RES-LIM-08: Per-frame GPU latency budget breakdown
 **Summary**: Sum of (cuVSLAM + matcher + VPR + Component 5 calibrator + Component 1b ortho) ≤ 400 ms p95 per AC-4.1.
 **Traces to**: AC-4.1, NFT-PERF-01..04. Tier: T4.
 **Monitoring**: per-stage timers exposed via `/metrics`.
 **Duration**: 30 min replay.
 **Pass criteria**: Σ p95(per-stage) ≤ 400 ms; each component within its sub-budget (cuVSLAM ≤ 20 ms, matcher inline ≤ 200 ms, ortho ≤ 50 ms, VPR ≤ 200 ms and only on trigger frames, calibrator ≤ 5 ms).
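The listed sub-budgets sum to 475 ms on a VPR-trigger frame (20 + 200 + 50 + 200 + 5). Per-stage compliance alone therefore does not guarantee the 400 ms total, so the sketch checks both conditions. It assumes per-stage samples in milliseconds from `/metrics`, with VPR samples taken only from trigger frames. It uses the nearest-rank p95, and the stage keys are hypothetical.

```python
from typing import Dict, List, Sequence

# Sub-budgets from the pass criteria, in ms. Stage keys are hypothetical metric names.
SUB_BUDGET_MS = {"cuvslam": 20, "matcher": 200, "ortho": 50, "vpr": 200, "calibrator": 5}
TOTAL_BUDGET_MS = 400


def p95_nearest_rank(samples: Sequence[float]) -> float:
    """95th percentile, nearest-rank method."""
    ordered = sorted(samples)
    return ordered[(95 * len(ordered) + 99) // 100 - 1]


def latency_verdict(per_stage_ms: Dict[str, Sequence[float]]) -> Dict[str, object]:
    """per_stage_ms: stage -> samples; 'vpr' should hold trigger-frame samples only."""
    stage_p95 = {s: p95_nearest_rank(v) for s, v in per_stage_ms.items() if v}
    over: List[str] = [s for s, p in stage_p95.items() if p > SUB_BUDGET_MS[s]]
    total = sum(stage_p95.values())
    return {
        "stage_p95_ms": stage_p95,
        "over_budget": over,
        "sum_p95_ms": total,
        "pass": not over and total <= TOTAL_BUDGET_MS,
    }
```

The second case in the test below meets every sub-budget and still fails on the 400 ms total.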
 ---
 ### NFT-RES-LIM-09: ROS 2 + Isaac ROS image footprint
 **Summary**: Deployment image fits the documented ~200 MB growth budget over the DIY-Python baseline.
 **Traces to**: M-29 cost / benefit, NF-T2 extended. Tier: T1 (image inspection).
 **Steps**: build the deployment image; compare against a baseline DIY-Python image manifest; assert delta ≤ 200 MB.
 **Pass criteria**: delta ≤ 200 MB; matcher + VPR engine reservation ≥ 1 GB available at runtime.
 ---
 ### NFT-RES-LIM-10: CPU usage — DDS overhead bound
 **Summary**: ROS 2 DDS + topic serialisation overhead stays within the documented 2–5 % CPU.
 **Traces to**: M-29 (Q6 → A cost / benefit). Tier: T4.
 **Monitoring**: per-process CPU via `prom`; DDS process / `rmw_*` thread CPU specifically.
 **Duration**: 30 min replay.
 **Pass criteria**: DDS CPU mean ≤ 5 %; total SUT CPU ≤ 80 % to leave headroom for spikes.
 ---
 ### NFT-RES-LIM-11: Operational area ≤ 400 km² and 8-h flight cap
 **Summary**: SUT correctly handles the documented operational ceiling (sector 150 km² + corridor 50 km² ≈ 200 km² typical, up to 400 km² total).
 **Traces to**: restrictions §UAV. Tier: T1 (smoke + audit).
 **Steps**: configure SUT with a 400 km² operational area; verify boot-time pre-allocation respects the budget; run a synthetic flight at 60 km/h cruise for 30 min (a scaled-down proxy for the 8 h flight).
 **Pass criteria**: SUT loads tile descriptors + VPR index without OOM; 30 min replay sustained at expected fps; resource budgets (NFT-RES-LIM-01..10) all green at this scale.
 ---
 ### NFT-RES-LIM-12: Disk I/O — FDR write rate sustainable
 **Summary**: FDR write rate sustained over 8 h does not back up the writer or interfere with the inline pipeline.
 **Traces to**: AC-NEW-3, AC-4.1 (no interference). Tier: T4.
 **Monitoring**: NVMe write throughput (MB/s) via Prometheus + I/O wait via `vmstat`.
 **Duration**: 8 h.
 **Pass criteria**: write rate ≤ 70 % of NVMe sustained write throughput (30 % headroom); I/O wait does not contribute to AC-4.1 latency violations (NFT-PERF-01 still passes during the 8-h window).
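For scale: filling the 64 GB cap in exactly 8 h requires an average of 64 × 10⁹ B / 28 800 s ≈ 2.2 MB/s. The headroom criterion is then a simple comparison. In this sketch, the NVMe sustained-write figure is left as an input because it is device-specific, and "GB" is assumed to be decimal.

```python
FDR_CAP_BYTES = 64 * 10**9  # assumption: decimal gigabytes
FLIGHT_S = 8 * 3600


def mean_fill_rate_mb_s(cap_bytes: int = FDR_CAP_BYTES, duration_s: int = FLIGHT_S) -> float:
    """Average write rate that fills the FDR cap exactly at the end of the flight."""
    return cap_bytes / duration_s / 1e6


def write_rate_ok(observed_mb_s: float, nvme_sustained_mb_s: float,
                  headroom: float = 0.30) -> bool:
    """Observed rate must leave `headroom` of the device's sustained write throughput."""
    return observed_mb_s <= (1.0 - headroom) * nvme_sustained_mb_s
```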
@@ -0,0 +1,222 @@
 # Security Tests
 > Black-box security scenarios at the public interfaces. Code-level vulnerability scanning is out of scope here (handled by Phase 4 security audit / `security/SKILL.md`).
 ---
 ### NFT-SEC-01: MAVLink2 signing — invalid signature rejected (S-T1)
 **Summary**: A GPS_INPUT or other companion-bound MAVLink frame with invalid signing tag is rejected by the FC; SUT and FC both log the rejection.
 **Traces to**: M-7, R10, restrictions §Sensors (MAVLink2 signing mandatory), S-T1, F-T9. Tier: T3 (`deferred-sitl`).
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | Runner injects a GPS_INPUT with valid schema but signing tag computed against a wrong key | FC discards frame; STATUSTEXT WARN visible at GCS |
 | 2 | Inspect FC log | rejection event recorded |
 | 3 | Subsequent valid GPS_INPUT | accepted normally |
 **Pass criteria**: invalid frame discarded; FC continues on prior valid source; valid frames still accepted.
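For the wrong-key injection, the runner must produce a frame whose signature is well-formed but computed with a different key. Per the MAVLink 2 message-signing specification, the signature is the first 48 bits of SHA-256 over the following inputs: the 32-byte secret key, the frame up to and including the CRC, the link ID, and the 48-bit timestamp (10 µs units since 2015-01-01 UTC). The following is a standalone sketch of that computation and does not use the pymavlink API. In practice, the runner would more likely use pymavlink's built-in signing support with a deliberately wrong key.

```python
import hashlib
import hmac
import struct

TS_MASK_48 = (1 << 48) - 1


def mavlink2_signature(secret_key: bytes, frame_without_sig: bytes,
                       link_id: int, timestamp_10us: int) -> bytes:
    """48-bit signature over key + frame (magic byte through CRC) + link ID + timestamp.

    The frame's header must already carry the 0x01 (signed) incompat flag.
    """
    if len(secret_key) != 32:
        raise ValueError("MAVLink 2 secret keys are 32 bytes")
    # link_id (1 byte) then the low 48 bits of the timestamp, little-endian.
    tail = struct.pack("<BQ", link_id, timestamp_10us & TS_MASK_48)[:7]
    return hashlib.sha256(secret_key + frame_without_sig + tail).digest()[:6]


def signature_valid(secret_key: bytes, frame_without_sig: bytes, link_id: int,
                    timestamp_10us: int, received_sig: bytes) -> bool:
    """Constant-time comparison; a real verifier must also reject stale timestamps."""
    expected = mavlink2_signature(secret_key, frame_without_sig, link_id, timestamp_10us)
    return hmac.compare_digest(expected, received_sig)
```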
 ---
 ### NFT-SEC-02: HTTPS unauthenticated requests are rejected
 **Summary**: All HTTPS API endpoints require valid JWT.
 **Traces to**: results_report row 33, restriction "JWT auth on the HTTP boundary". Tier: T1.
 **Steps**:
 | Step | Endpoint | Auth | Expected Response |
 |------|---------|------|-------------------|
 | 1 | `POST /sessions` | none | HTTP 401 |
 | 2 | `POST /objects/locate` | none | HTTP 401 |
 | 3 | `GET /sessions/{id}/stream` | none | HTTP 401 |
 | 4 | `GET /health` | none | HTTP 200 (health is intentionally unauthenticated for liveness probes — confirm via S-T2) OR 401 if it requires auth |
 **Pass criteria**: 1–3 return 401; 4's behaviour matches the documented contract (test asserts whichever the contract states). If `/health` is unauthenticated, body still must NOT leak sensitive state (no flight data, no key fingerprints).
 ---
 ### NFT-SEC-03: HTTPS — malformed / expired / wrong-issuer JWT
 **Summary**: JWTs that fail validation are rejected.
 **Traces to**: derived from results_report row 33. Tier: T1.
 **Steps**:
 | Step | Token | Expected Response |
 |------|-------|-------------------|
 | 1 | malformed (`.foo.bar`) | HTTP 401 |
 | 2 | expired (`exp` in the past) | HTTP 401 |
 | 3 | wrong issuer | HTTP 401 |
 | 4 | wrong signing algorithm (`none` algorithm) | HTTP 401 |
 | 5 | missing required claim (e.g., `sub`) | HTTP 401 |
 **Pass criteria**: all return 401 with no leaked state in the body.
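The negative tokens can be generated with the standard library. This sketch assumes HS256 with a test-only shared key. The SUT may use a different algorithm, and the issuer string and claim set here are hypothetical. The `alg_none` case is an Unsecured JWT as defined in RFC 7519, Section 6, with an empty signature segment. A `valid` control token is also included, so the runner can confirm that the key and issuer are otherwise accepted.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _encode_json(obj: dict) -> str:
    return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def make_hs256(claims: dict, key: bytes) -> str:
    signing_input = _encode_json({"alg": "HS256", "typ": "JWT"}) + "." + _encode_json(claims)
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def make_alg_none(claims: dict) -> str:
    # Unsecured JWT: header.payload. with an empty signature segment.
    return _encode_json({"alg": "none", "typ": "JWT"}) + "." + _encode_json(claims) + "."


def claims_of(token: str) -> dict:
    return json.loads(b64url_decode(token.split(".")[1]))


def negative_tokens(key: bytes, issuer: str = "gpsdenied-test",
                    now: Optional[int] = None) -> Dict[str, str]:
    now = int(time.time()) if now is None else now
    good = {"sub": "test-operator", "iss": issuer, "iat": now, "exp": now + 300}
    no_sub = {k: v for k, v in good.items() if k != "sub"}
    return {
        "valid": make_hs256(good, key),  # control: must be accepted
        "malformed": ".foo.bar",
        "expired": make_hs256({**good, "iat": now - 600, "exp": now - 300}, key),
        "wrong_issuer": make_hs256({**good, "iss": "not-" + issuer}, key),
        "alg_none": make_alg_none(good),
        "missing_sub": make_hs256(no_sub, key),
    }
```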
 ---
 ### NFT-SEC-04: TLS — minimum version + downgrade rejection
 **Summary**: TLS ≥1.2; weaker / downgrade attempts rejected.
 **Traces to**: S-T2, derived from restriction "telemetry plumbing uses MAVSDK + HTTPS API". Tier: T1.
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | Connect with TLSv1.0 / TLSv1.1 | refused |
 | 2 | Connect with cipher suite from a known weak set (e.g., RC4) | refused |
 | 3 | Valid TLSv1.2+ + modern cipher | accepted |
 **Pass criteria**: all weak attempts refused; modern accepted.
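A probe client for the TLS steps can be built with Python's `ssl` module. It pins the client's protocol range, attempts a handshake, and records either the negotiated version or the failure. This sketch disables certificate verification because only protocol acceptance is under test. On OpenSSL 3 clients, TLS 1.0/1.1 and weak ciphers such as RC4 are often unavailable locally. A refused handshake may therefore reflect the client rather than the SUT. The `@SECLEVEL=0` cipher string lowers the local floor where it is supported, but weak-cipher probing may still need a dedicated legacy client.

```python
import socket
import ssl
from typing import Optional


def client_context(min_version: ssl.TLSVersion, max_version: ssl.TLSVersion,
                   legacy_ciphers: bool = False) -> ssl.SSLContext:
    """Client context pinned to [min_version, max_version] for handshake probing."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # must be cleared before disabling verification
    ctx.verify_mode = ssl.CERT_NONE  # test-only: protocol acceptance is what we probe
    ctx.minimum_version = min_version
    ctx.maximum_version = max_version
    if legacy_ciphers:
        try:
            ctx.set_ciphers("ALL:@SECLEVEL=0")  # lower the local OpenSSL floor where supported
        except ssl.SSLError:
            pass  # this TLS library rejects the cipher string; keep its defaults
    return ctx


def negotiated_version(host: str, port: int, ctx: ssl.SSLContext,
                       timeout_s: float = 5.0) -> Optional[str]:
    """Return e.g. 'TLSv1.3' if the handshake succeeds, or None if it is refused."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version()
    except OSError:  # includes ssl.SSLError, refused connections and timeouts
        return None
```

For step 1, the runner would pass `client_context(ssl.TLSVersion.MINIMUM_SUPPORTED, ssl.TLSVersion.TLSv1_1, legacy_ciphers=True)` and expect `None`. For step 3, it would use a `TLSv1_2` to `MAXIMUM_SUPPORTED` range and expect a version string. Recent Python versions emit a `DeprecationWarning` for the legacy `TLSVersion` members.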
 ---
 ### NFT-SEC-05: Tile-cache write attempt by unauthorized API path
 **Summary**: SUT does not expose any HTTP path that allows external clients to write to the tile cache.
 **Traces to**: AC-8.5 (storage policy), AC-NEW-7 (cache integrity), restriction §Satellite. Tier: T1.
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | `POST /tiles` (or any guess) with valid JWT | 404 or 405 (no such endpoint) |
 | 2 | Try `PUT /var/lib/gpsdenied/tiles/...` via any exposed API | 404 / 405 |
 | 3 | Inspect the documented OpenAPI contract | no tile-write endpoints |
 **Pass criteria**: no successful tile-write paths exist via HTTP; only the post-flight uploader (out-bound to `service-stub`) writes outside the SUT.
 ---
 ### NFT-SEC-06: Spoofed sysid / sysid collision (M-31)
 **Summary**: A second device claiming sysid 11 (the SUT's sysid) — FC handles per ArduPilot routing rules.
 **Traces to**: M-31, F-T9. Tier: T3.
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | Runner publishes a fake GPS_INPUT from a sysid-collision sender | FC routing handles per documented behaviour (latest-talker wins or rejects) |
 | 2 | Confirm FC parameter audit prints the actual sysid configured | matches deployment runbook (M-31 sysid collision-check) |
 **Pass criteria**: behaviour matches documented FC routing rule; STATUSTEXT WARN observable; test verifies the deploy runbook's collision-check (M-31) catches this in pre-flight.
 ---
 ### NFT-SEC-07: Operator-hint injection — only signed STATUSTEXT consumed
 **Summary**: Unsigned operator hints (or hints from a non-allowed sender) are not consumed.
 **Traces to**: AC-6.2, M-7. Tier: T3.
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | Send `RELOC_HINT` STATUSTEXT with invalid MAVLink2 signing | SUT discards; emits WARN |
 | 2 | Send from a sysid not on the allowed-list | SUT discards |
 | 3 | Send signed by allowed sender | SUT consumes (NFT-RES-05 covers happy path) |
 **Pass criteria**: only authenticated, allowed-sender hints are consumed.
 ---
 ### NFT-SEC-08: GPS_RAW_INT spoofing chain — SUT promotion is the safety boundary
 **Summary**: A spoofed `GPS_RAW_INT` cannot influence SUT's GPS_INPUT directly; SUT only uses GPS_RAW_INT for source-promotion logic, not for fusing.
 **Traces to**: AC-NEW-2, restriction §Failsafe. Tier: T3.
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | Inject GPS_RAW_INT with high-quality false fix | SUT does NOT use it as a position seed; only uses it for the "real-GPS health" rolling average |
 | 2 | After scripted spoofing-pattern, SUT promotes its own estimate per AC-NEW-2 | promotion event observable |
 **Pass criteria**: SUT GPS_INPUT positions never influenced by spoofed GPS_RAW_INT lat/lon (compare SUT GPS_INPUT vs ground truth from `coordinates.csv` during the spoof window).
 ---
 ### NFT-SEC-09: USB bypass surface — bench-only
 **Summary**: USB bypasses MAVLink2 signing per restriction; this must be **disabled in production** runtime config.
 **Traces to**: M-7, restrictions §Onboard Hardware. Tier: T1 (config audit).
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | At SUT boot, inspect runtime config | USB MAVLink endpoint disabled in production profile (env var `MAVLINK_USB_ALLOWED=false` or absent) |
 | 2 | Attempt to connect via USB | refused |
 **Pass criteria**: production config refuses USB MAVLink; bench config (env var explicitly enabled) accepts.
 ---
 ### NFT-SEC-10: FDR — no sensitive-data leak
 **Summary**: FDR contains the documented payload classes only — no private keys, no plaintext JWTs, no MAVLink2 signing keys, no raw frames (AC-8.5).
 **Traces to**: AC-8.5, AC-NEW-3, S-T3 (data-at-rest). Tier: T1.
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | After a 30 min replay, scan FDR for known-sensitive byte patterns (test-only signing key bytes; test JWT) | none found |
 | 2 | Scan for raw JPEG headers in non-thumbnail-log payload classes | none |
 | 3 | Verify failure-thumbnail log is ≤ 0.1 Hz and within FDR cap | as spec'd |
 **Pass criteria**: no leaks; raw-frame storage policy enforced.
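A byte-pattern scan for steps 1 and 2 can stream each FDR file in chunks. It keeps an overlap between chunks so that a pattern split across a chunk boundary is still found. This sketch assumes the test key may be stored raw, as hex or as base64, and searches for all three. For JPEG detection, a 3-byte SOI marker (`FF D8 FF`) will occur by chance in gigabytes of compressed data. Hits should therefore be confirmed, for example by decoding, rather than treated as automatic failures.

```python
import base64
from typing import Dict, Iterable

JPEG_SOI = b"\xff\xd8\xff"  # also matches by chance in large compressed payloads; confirm hits


def needle_variants(secret: bytes) -> Dict[str, bytes]:
    """Encodings under which a test secret might land in the FDR."""
    return {
        "raw": secret,
        "hex": secret.hex().encode("ascii"),
        "base64": base64.b64encode(secret),
    }


def chunks_contain(chunks: Iterable[bytes], needle: bytes) -> bool:
    """True if needle occurs in the concatenated chunks, including across boundaries."""
    if not needle:
        raise ValueError("empty needle")
    overlap = len(needle) - 1
    tail = b""
    for chunk in chunks:
        buf = tail + chunk
        if needle in buf:
            return True
        tail = buf[-overlap:] if overlap > 0 else b""
    return False


def file_contains(path: str, needle: bytes, chunk_size: int = 1 << 20) -> bool:
    """Stream a file in chunk_size pieces and search for needle."""
    with open(path, "rb") as fh:
        return chunks_contain(iter(lambda: fh.read(chunk_size), b""), needle)
```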
 ---
 ### NFT-SEC-11: External-host network policy
 **Summary**: SUT does not call external commercial satellite providers at runtime.
 **Traces to**: AC-8.1, restrictions §Satellite. Tier: T1.
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | Run a 5-min replay with `iptables` / Docker network policy capturing all out-bound connections | none of the captured destinations resolves to Maxar / Airbus / Planet / Sentinel-2 / Esri / etc. |
 | 2 | The only allowed out-bound is to `service-stub` (the Suite Satellite Service candidate-pool endpoint, post-flight) | matches |
 **Pass criteria**: no out-bound to commercial / public ortho providers at runtime.
 ---
 ### NFT-SEC-12: HTTPS — payload size + path-traversal hardening
 **Summary**: Pathological HTTP requests do not crash the SUT or leak filesystem content.
 **Traces to**: AC-3.x (resilience), restrictions (security defaults). Tier: T1.
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | `POST /objects/locate` with a 100 MB body | HTTP 413 (payload too large) |
 | 2 | Path-traversal `GET /sessions/../../etc/passwd` | HTTP 404 / 400; no filesystem leak |
 | 3 | Header-injection (`X-Forwarded-For: \r\nSet-Cookie: …`) | sanitised; no echo back |
 **Pass criteria**: as above; SUT alive; no leak.
 ---
 ### NFT-SEC-13: AC-NEW-7 over-confidence injection — gate rejects
 **Summary**: Synthetic over-confidence injection (1.5×–3× covariance deflation) does not let bad tiles into the trusted basemap.
 **Traces to**: AC-NEW-7. Tier: T2 (`deferred-corpus`).
 **Steps**:
 | Step | Consumer Action | Expected Response |
 |------|----------------|-------------------|
 | 1 | Replay AerialVL + Mavic + AerialExtreMatch with synthetic deflation | per-tile geo-misalignment computed |
 | 2 | At each σ_xy boundary (3 m, 5 m, 10 m), assert hard-gate behaviour | tiles with σ_xy > 5 m never written; tiles in (3, 5] m marked `trust_level=soft`; tiles ≤ 3 m marked `trust_level=candidate` |
 **Pass criteria**: P(misalign > 30 m) < 1 %, P(misalign > 100 m) < 0.1 %; voting layer prevents single-flight promotion in non-active sectors.
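Two quantitative notes apply to these pass criteria. First, the σ_xy hard gate maps to a three-way write decision. Second, demonstrating P(misalign > 100 m) < 0.1 % requires many tiles. With zero exceedances in n tiles, the one-sided 95 % upper bound on the exceedance rate is 1 − 0.05^(1/n), roughly 3/n. About 3 000 tiles are therefore needed. The sketch below covers both points. The function names are hypothetical, and the 95 % confidence level is an assumption that is not part of AC-NEW-7.

```python
import math
from typing import Optional, Sequence, Tuple


def write_gate(sigma_xy_m: float) -> Optional[str]:
    """Trust level at write time per the step-2 hard gate; None means never written."""
    if sigma_xy_m > 5.0:
        return None
    if sigma_xy_m > 3.0:
        return "soft"
    return "candidate"


def tail_fractions(misalign_m: Sequence[float]) -> Tuple[float, float]:
    """Empirical P(misalign > 30 m) and P(misalign > 100 m)."""
    n = len(misalign_m)
    return (sum(m > 30.0 for m in misalign_m) / n,
            sum(m > 100.0 for m in misalign_m) / n)


def min_tiles_for_zero_failure_claim(p_max: float, confidence: float = 0.95) -> int:
    """Smallest n such that 0 exceedances in n tiles bounds the rate below p_max."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_max))
```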
@@ -0,0 +1,164 @@
 # Test Data Management
 ## Important Caveat — 60-image slice scope (per Phase 1 D2)
 The 60 nav-cam JPGs in `_docs/00_problem/input_data/AD000001.jpg … AD000060.jpg` were captured at **400 m AGL** with the **ADTi Surveyor Lite 26S v2 (26 MP, 6252 × 4168, 25 mm, 23.5 mm sensor)** — **not** the deployment camera (ADTi 20MP 20L V1, APS-C, ~5472 × 3648) and **not** the deployment altitude (≤1 km AGL). This corpus is therefore **pipeline-correctness only**:
 - It validates that the pipeline (cuVSLAM → VPR → matcher → Component 5 → MAVLink GPS_INPUT) produces the right **shape** of output, in the right **order**, with the right **categorical labels** and **MAVLink schema**.
 - It does **NOT** validate the deployment-binding accuracy budgets (AC-1.1 ≥80 %@50 m, AC-1.2 ≥50 %@20 m), the GSD-band assumptions, the matcher resolution sweeps, or the latency budget for the deployed 1 km AGL / 20 MP path.
 - Pass numbers from this slice on AC-1.1 / AC-1.2 / AC-2.1 / AC-2.2 / AC-NEW-8 are **functional, not deployment-binding**. The deployment-binding numbers come from the deferred-corpus tier (AerialVL S03, UAV-VisLoc, AerialExtreMatch, internal Mavic, first internal fixed-wing flight).
 ## Seed Data Sets
 | Data Set | Description | Used by Tests | How Loaded | Cleanup |
 |----------|-------------|---------------|-----------|---------|
 | `nav_cam_60_slice` | 60 JPGs `AD000001.jpg`…`AD000060.jpg`, 6252×4168, captured at 400 m AGL | T1 pipeline-correctness tests (FT-P-01..FT-P-08, FT-N-01..FT-N-04) | volume mount `fixtures-images:/fixtures/images:ro` | volume is read-only — no cleanup |
 | `nav_cam_60_slice_coordinates` | `coordinates.csv`: per-frame WGS84 ground truth | All T1 accuracy tests | mount path `/fixtures/images/coordinates.csv` | — |
 | `nav_cam_60_slice_imu` *(synthetic, fixture)* | `fixtures/imu_AD0000xx.csv`: 200 Hz IMU traces synthesised by SITL ArduPilot replay of `coordinates.csv` as ground-truth trajectory | T1 cuVSLAM tests; F-T1c IMU-sync-jitter measurement | mount path `/fixtures/imu/`; `ardupilot-sitl --imu-replay=...` | regenerated per test session |
 | `satellite_tiles_AD0000xx_z20` *(placeholder fixture)* | z=20 ortho-tiles for the bbox of `coordinates.csv`, fetched offline by `tile-cache-init` from public ortho service (Esri / Mapbox / Sentinel-2 fallback gated to ≥0.5 m/px) | T1 cross-view matcher / VPR tests | volume `tile-cache:/var/lib/gpsdenied/tiles` | volume rebuilt per test session |
 | `satellite_tile_descriptors_z20` | Pre-extracted SuperPoint keypoints + DINOv2-VLAD global descriptors for `satellite_tiles_AD0000xx_z20` | T1 VPR + matcher tests | same volume, sidecar `.descriptors.h5` files | same |
 | `aerialvl_s03` *(deferred-corpus)* | AerialVL S03: 70 km of fixed-wing flight at 1 km AGL with synced IMU + GPS truth + nav-cam stream | T2 AC-1.3, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9 | external download script (data team task — Decompose); mount when present | not removed (large, kept across sessions) |
 | `uav_visloc` *(deferred-corpus)* | UAV-VisLoc public dataset | T2 matcher / VPR seasonal-robustness regression | external download script | not removed |
 | `aerialextrematch` *(deferred-corpus)* | AerialExtreMatch open-review dataset | T2 matcher seasonal-robustness regression | external download script | not removed |
 | `2chadcnn_seasons` *(deferred-corpus)* | 2chADCNN season set (cross-season scene-change benchmark) | T2 NF-T*-season-robustness | external download script | not removed |
 | `tartanair_v2` *(deferred-corpus)* | TartanAir V2 synthetic scenes | T2 matcher distillation evaluation | external download script | not removed |
 | `internal_mavic` *(deferred-corpus)* | Internal Mavic 3 Pro Mini recorded flights (legacy attempt; no IMU per problem.md, used for visual-only checks) | T2 matcher visual-only regression | external `data team` mount | not removed |
 | `internal_fixed_wing_first_sortie` *(deferred-field)* | First internal fixed-wing flight with synced IMU + GPS truth | T5 FT-1 / FT-2 / FT-3, AC-1.3 lock | field-test mount | not removed |
 | `synthetic_8h_load` *(synthesisable)* | 8-hour synthetic 3 fps nav-frame replay sequence assembled from `nav_cam_60_slice` looped + jittered | NF-T3 thermal soak, NF-T5 FDR rollover (AC-NEW-3), AC-NEW-5 | generated at fixture build time by `fixtures/synth-8h-loader/` | regenerated per session |
 | `cold_soak_corpus` *(deferred-hil)* | A short replay loop run at −20 °C ambient | T4 NF-T3 cold-soak, AC-NEW-1 cold | bench HW only | — |
 | `hot_soak_corpus` *(deferred-hil)* | Same replay loop run at +50 °C ambient for 8 h | T4 NF-T3 hot-soak, AC-NEW-5 | bench HW only | — |
 | `spoofing_scenarios` | Scripted MAVLink GPS_RAW_INT injections: jam-onset, lat/lon offset, sat-count drop, hdop spike | T3 F-T9 / F-T12, AC-NEW-2 | `gps-spoof-injector` config files | regenerated per session |
 | `operator_hint_scenarios` | Scripted operator STATUSTEXT messages with approximate `(lat, lon, sigma_xy=500m)` | T3 F-T10, AC-3.4, AC-6.2, results_report row 22 | `qgc-mock` config | regenerated per session |
 | `stale_tile_scenarios` | Synthetic-age tiles (1, 5, 7, 11, 13, 18 months old; both active-conflict and stable-rear sectors) | T1 NF-T6, AC-8.2 / AC-NEW-6 | injected into `tile-cache` by `tile-cache-init --inject-stale` | volume rebuilt per session |
 | `cache_poisoning_scenarios` | Multi-flight Monte Carlo with synthetic over-confidence injection (EKF covariance deflated by 1.5×–3×) | T2 NF-T4b, AC-NEW-7 | generated by `fixtures/cache-poison-mc/` | regenerated per session |
 | `cold_start_replay_50` | 50× cold-boot replay: SUT process killed and restarted with simulated FC pose injection | T1+T4 F-T11, AC-NEW-1 | scripted in `e2e-runner` test | — |
 | `disconnected_segments_replay` | Synthetic ≥3 disconnected flight segments stitched from `nav_cam_60_slice` with gaps | T1 F-T8, AC-3.3 | generated at fixture build time | regenerated per session |
 | `tile_dedup_replay` | A flight where ground sectors are visited twice — used to verify deduplication (AC-8.4) | T1 F-T2 | generated at fixture build time | regenerated per session |
 | `mavlink2_signing_keys` | Test-only per-airframe MAVLink2 signing secret keys (32 bytes; SHA-256-based signatures) | T1 / T3 F-T9, S-T1, MAVLink2 signing assertions | env var `MAVLINK2_SIGNING_KEY=…`, shared by SUT, runner, and FC | rotated per session |
 | `tls_test_certs` | Self-signed CA + SUT cert + client cert (test-only) | T1 S-T1..S-T5 HTTPS auth tests | mount `tls-test-certs:/etc/gpsdenied/tls:ro` | regenerated per session |
 ## Data Isolation Strategy
 - **Container scope**: each test session starts with a clean `sut` container (no cache poisoning between sessions).
 - **Volume scope**: `tile-cache` and `fdr` volumes are **rebuilt per test session** (not per test). Within a session, tests that depend on cache state are either ordered or use namespaced subdirectories. `fixtures-images`, `fixtures-imu`, and `fixtures-expected` are read-only and therefore cannot be polluted.
 - **Cross-test contamination**: tests that mutate state (cache writes, FDR writes) declare `pytest.mark.mutates_state` and are run in a serial group. Read-only tests run in parallel within a tier.
 - **Identity isolation**: each session generates a fresh `mavlink2_signing_keys` set and JWT signing key, so signed MAVLink frames and JWTs from one session cannot be replayed in another.
 - **Resource isolation**: T4 deferred-hil tests do **not** share a Jetson with any other test; bench scheduler enforces single-tenant access.
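Per-session key rotation can be sketched in a few lines of Python. This is a minimal sketch, not the project's implementation. It assumes the keys reach the SUT, runner, and FC/SITL as hex strings in environment variables, and `JWT_SIGNING_KEY` is a hypothetical variable name. The 32-byte length for `MAVLINK2_SIGNING_KEY` follows the MAVLink 2 message-signing spec, which uses a 32-byte secret key.

```python
import os
import secrets
from typing import Dict, Mapping, Optional

# MAVLink 2 message signing uses a 32-byte secret key shared by every party on the link.
MAVLINK2_KEY_BYTES = 32
# RFC 7518 requires HS256 keys of at least 256 bits (the SHA-256 output size).
JWT_KEY_BYTES = 32


def new_session_secrets() -> Dict[str, str]:
    """Fresh hex-encoded key material for one test session (never reused)."""
    return {
        "MAVLINK2_SIGNING_KEY": secrets.token_hex(MAVLINK2_KEY_BYTES),
        "JWT_SIGNING_KEY": secrets.token_hex(JWT_KEY_BYTES),  # hypothetical variable name
    }


def session_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) with new session secrets merged in."""
    env = dict(os.environ if base is None else base)
    env.update(new_session_secrets())
    return env
```

Passing the merged environment to each container at session start, rather than baking keys into images, is what makes a frame signed in one session unverifiable in the next.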
 ## Input Data Mapping
 | Input Data File | Source Location | Description | Covers Scenarios |
 |-----------------|----------------|-------------|-----------------|
 | `AD000001.jpg`…`AD000060.jpg` | `_docs/00_problem/input_data/` | 60 nav-cam JPGs, 6252×4168, 400 m AGL, ADTi 26S v2 | FT-P-01..FT-P-08, FT-N-01..FT-N-04, NFT-RES-LIM-01..03 (T1) |
 | `coordinates.csv` | `_docs/00_problem/input_data/` | Frame index → WGS84 ground truth | results_report rows 1–4, FT-P-01, FT-P-02, NFT-PERF-01 |
 | `data_parameters.md` | `_docs/00_problem/input_data/` | Corpus-shoot params (400 m AGL, 26S v2, 25 mm, 23.5 mm sensor) | All T1 tests — context for pipeline-correctness scope |
 | `AD000001_gmaps.png`, `AD000002_gmaps.png` | `_docs/00_problem/input_data/` | Two satellite reference thumbnails (frames 1–2 only) | Smoke test only; the placeholder tile fixture, not these thumbnails, serves as the cross-view reference |
 | `expected_results/results_report.md` | `_docs/00_problem/input_data/` | 46-scenario expected results mapping | All T1 tests + most T2 tests; canonical pass/fail thresholds |
 | `expected_results/position_accuracy.csv` | `_docs/00_problem/input_data/` | Per-frame ground truth + thresholds | results_report rows 1–3, FT-P-01, FT-P-02 |
 ## Expected Results Mapping
 The canonical mapping is `_docs/00_problem/input_data/expected_results/results_report.md`. The traceability matrix references that file by row number. The summary table below lists the rows by the test scenario IDs that consume them.
 | Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
 |-----------------|------------|-----------------|-------------------|-----------|----------------------|
 | FT-P-01 | `coordinates.csv` (60 frames) + `nav_cam_60_slice` + `satellite_tiles_AD0000xx_z20` + `nav_cam_60_slice_imu` | ≥80 % within 50 m | `percentage` | ≥80 % | `results_report` row 1; `position_accuracy.csv` |
 | FT-P-02 | same | ≥50 % within 20 m | `percentage` | ≥50 % | `results_report` row 2; `position_accuracy.csv` |
 | FT-P-03 | same | each frame ≤100 m error | `numeric_tolerance` | ±100 m max per frame | `results_report` row 3 |
 | FT-P-04 | same | cumulative VO drift between satellite anchors ≤100 m mono / ≤50 m mono+IMU | `threshold_max` | mono: ≤100 m; mono+IMU: ≤50 m | `results_report` row 4 ; AC-1.3 / AC-NEW-8 |
 | FT-P-05 | single frame + IMU | `fix_type=3, horiz_accuracy ∈ [1,50] m, satellites_visible=10` | `exact` (fix_type, sat) + `range` (h_acc) | as stated | `results_report` row 5 |
 | FT-P-06 | sequence, no satellite >30 s | `fix_type=3, horiz_accuracy ∈ [20,100] m` | `exact` + `range` | as stated | `results_report` row 6 |
 | FT-P-07 | sequence, VO lost + no satellite | `fix_type=2, h_acc ≥ 50 m` (growing) | `exact` + `threshold_min` | as stated | `results_report` row 7 |
 | FT-P-08 | VO lost + 3 sat failures | `fix_type=0, h_acc=999.0` | `exact` | N/A | `results_report` row 8 |
 | FT-P-09 | tier transitions | tier ∈ {HIGH, MEDIUM, LOW, FAILED} per conditions | `exact` | N/A | `results_report` rows 10–13 |
 | FT-P-10 | 60 frames | registration rate ≥95 % (T1 functional only) | `percentage` | ≥95 % (functional) | `results_report` row 14 |
 | FT-P-11 | 60 frames | MRE < 1.0 px VO frame-to-frame; < 2.5 px cross-domain | `threshold_max` | <1.0 / <2.5 | `results_report` row 15 ; AC-2.2 |
 | FT-P-12 | frames 32–43 (turn area) | system continues producing position estimates through turn | `threshold_min` | ≥1 position output / frame | `results_report` row 16 |
 | FT-P-13 | 350 m gap synthetic | error ≤100 m after recovery | `threshold_max` | ≤100 m | `results_report` row 17 |
 | FT-P-14 | sharp-turn synthetic | satellite re-loc triggers; error ≤50 m within 3 frames | `threshold_max` | ≤50 m | `results_report` row 18 |
 | FT-P-15 | VO loss + sat success | `tracking_state == NORMAL` after recovery | `exact` | N/A | `results_report` row 19 |
 | FT-P-16 | startup with `GLOBAL_POSITION_INT` | first GPS_INPUT within 30 s of boot, p95 | `threshold_max` | ≤30 s p95 | `results_report` row 23 ; AC-NEW-1 |
 | FT-P-17 | startup + first satellite match | error ≤50 m after first match | `threshold_max` | ≤50 m | `results_report` row 24 |
 | FT-P-18 | reboot mid-flight | recovery time ≤30 s | `threshold_max` | ≤30 s | `results_report` row 25 ; AC-NEW-1 |
 | FT-P-19 | post-reboot first match | error ≤50 m | `threshold_max` | ≤50 m | `results_report` row 26 |
 | FT-P-20 | object localize valid request | response with lat/lon within `accuracy_m` of ground truth | `numeric_tolerance` | per response.accuracy_m | `results_report` row 27 |
 | FT-P-21 | round-trip GPS→NED→pixel→GPS | error ≤0.1 m | `threshold_max` | ≤0.1 m | `results_report` row 29 |
 | FT-P-22 | `GET /health` | 200 + JSON with `status`, `memory_mb`, `gpu_temp_c` | `exact` + `regex` | as stated | `results_report` row 30 |
 | FT-P-23 | `POST /sessions` | 200 or 201 + session id | `exact` | status ∈ {200,201} | `results_report` row 31 |
 | FT-P-24 | `GET /sessions/{id}/stream` | SSE events at ~1 Hz with schema fields | `regex` + rate | per SSE schema | `results_report` row 32 |
 | FT-P-25 | TRT engine load | ≤10 s total | `threshold_max` | ≤10 s | `results_report` row 39 |
 | FT-P-26 | mission area definition | 300–1000 MB tile storage | `range` | [300, 1000] MB | `results_report` row 40 |
 | FT-P-27 | EKF position ± 3σ | tile mosaic radius ≥500 m | `threshold_min` | ≥500 m | `results_report` row 41 |
 | FT-P-28 | tile dedup replay | ≤1 tile per ground sector visited ≥2× | `exact` | per-sector count == 1 | AC-8.4, F-T2 |
 | FT-P-29 | post-flight upload | tiles uploaded to candidate pool with `trust_level=candidate` | `exact` | as stated | AC-8.4, F-T3 |
 | FT-P-30 | telemetry | NAMED_VALUE_FLOAT at 1 Hz ± 0.2 Hz | `numeric_tolerance` | 1 Hz ± 0.2 Hz | `results_report` row 45 |
 | FT-N-01 | corrupted JPG | system continues without crashing; `tracking_state` stays `NORMAL` or transitions to `DEGRADED` | `exact` | tracking_state ∈ {DEGRADED, NORMAL} | derived from AC-3.x |
 | FT-N-02 | invalid object localize pixel | HTTP 422 | `exact` | status == 422 | `results_report` row 28 |
 | FT-N-03 | unauthenticated `POST /sessions` | HTTP 401 | `exact` | status == 401 | `results_report` row 33 |
 | FT-N-04 | tile older than freshness budget | tile rejected or down-confidence; never `satellite_anchored` | `exact` | as stated | AC-8.2, AC-NEW-6 |
 | FT-N-05 | tile in 30-day grace zone | confidence linearly decayed | `numeric_tolerance` | per spec curve | AC-NEW-6 |
 | FT-N-06 | sharp turn (<5 % overlap, <70°, <200 m) | satellite re-loc within 3 frames | `threshold_max` | ≤50 m within 3 frames | `results_report` row 18 ; AC-3.2 |
 | FT-N-07 | VO loss + 3 sat failures | `RELOC_REQ` regex pattern emitted via STATUSTEXT | `regex` | per pattern | `results_report` rows 20, 46 |
 | FT-N-08 | re-loc active | `fix_type=0`, IMU prediction continues, sat attempts continue | `exact` | as stated | `results_report` row 21 |
 | FT-N-09 | operator hint received | hint used as 500 m seed for VPR; ≤500 m initially, ≤50 m after match | `threshold_max` | as stated | `results_report` row 22 |
 | NFT-PERF-01 | single 6252×4168 frame on Orin Nano Super 25 W (T4) | end-to-end latency ≤400 ms p95 | `threshold_max` | ≤400 ms p95 | `results_report` row 34 ; AC-4.1 |
 | NFT-PERF-02 | cuVSLAM single frame | ≤20 ms / frame | `threshold_max` | ≤20 ms | `results_report` row 37 |
 | NFT-PERF-03 | matcher single pair on Orin Nano Super 25 W | inline ≤200 ms; re-loc fallback ≤2000 ms | `threshold_max` | as stated | `results_report` row 38 |
 | NFT-PERF-04 | Orthority per-frame on Orin Nano Super | ≤50 ms / frame | `threshold_max` | ≤50 ms / frame | F-T14, M-27 |
 | NFT-PERF-05 | spoof onset → SUT promotion | ≤3 s p95 | `threshold_max` | ≤3 s p95 | AC-NEW-2 ; F-T12 |
 | NFT-PERF-06 | per-frame end-to-end (frame-by-frame, not batched) | inter-frame interval matches camera rate | `numeric_tolerance` | per frame within ±50 ms of camera rate | AC-4.4 |
 | NFT-RES-01 | SUT process killed mid-flight | recovery ≤30 s, restart from FC pose | `threshold_max` | ≤30 s | `results_report` row 25 ; AC-5.3, AC-NEW-1 |
 | NFT-RES-02 | spoofing onset | promotion ≤3 s | `threshold_max` | ≤3 s | AC-NEW-2 |
 | NFT-RES-03 | network partition with FC | failsafe triggers after 3 s without a fix | `threshold_max` | ≤3 s | AC-5.2 |
 | NFT-RES-04 | EKF3 lane-switch / fix-loss event | source-promotion responds | `exact` | promotion within budget | AC-NEW-2 |
 | NFT-SEC-01 | unsigned MAVLink injection | FC rejects | `exact` | acceptance==false | F-T9, S-T1 |
 | NFT-SEC-02 | unauthenticated REST | 401 / 403 | `exact` | per endpoint | results_report row 33 |
 | NFT-SEC-03 | malformed JWT | 401 | `exact` | status==401 | derived |
 | NFT-SEC-04 | TLS downgrade attempt | rejected | `exact` | TLS ≥1.2 only | S-T2 |
 | NFT-SEC-05 | tile-cache write attempt by unauthorized API | 403 / no-op | `exact` | as stated | AC-8.5, AC-NEW-7 |
 | NFT-RES-LIM-01 | 30-min sustained load (T1+T4) | peak < 8192 MB; growth ≤50 MB / 30 min | `threshold_max` | as stated | results_report row 35 ; AC-4.2 |
 | NFT-RES-LIM-02 | 30-min sustained load | SoC junction ≤80 °C | `threshold_max` | ≤80 °C | results_report row 36 |
 | NFT-RES-LIM-03 | 8-h sustained 25 W @ +50 °C ambient (T4) | no thermal throttle | `exact` | throttle_event_count == 0 | AC-NEW-5, NF-T3 |
 | NFT-RES-LIM-04 | FDR 8-h synthetic load | FDR ≤64 GB; rollover logged; no payload class silently dropped | `threshold_max` + audit | as stated | AC-NEW-3, NF-T5 |
 | NFT-RES-LIM-05 | tile cache 400 km² | ≤10 GB persistent | `threshold_max` | ≤10 GB | restrictions §UAV |
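The comparison methods named in the table (`percentage`, `threshold_max`, and `numeric_tolerance` applied to a message rate) could be implemented along the following lines. This is a hedged sketch: the function names are hypothetical, and using the spherical haversine distance is an assumption. An ellipsoidal WGS84 distance would differ by only a fraction of a percent, which is immaterial at 20–50 m thresholds.

```python
import math
from typing import Sequence

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; adequate for metre-level test errors


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points (spherical approximation)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def percentage_within(errors_m: Sequence[float], limit_m: float) -> float:
    """`percentage` method: share of frames whose error is <= limit_m, in percent."""
    if not errors_m:
        raise ValueError("no frames to evaluate")
    return 100.0 * sum(e <= limit_m for e in errors_m) / len(errors_m)


def threshold_max(values: Sequence[float], limit: float) -> bool:
    """`threshold_max` method: every value must be <= limit."""
    return all(v <= limit for v in values)


def rate_within(timestamps_s: Sequence[float], target_hz: float, tol_hz: float) -> bool:
    """`numeric_tolerance` on a message rate, e.g. NAMED_VALUE_FLOAT at 1 Hz +/- 0.2 Hz."""
    if len(timestamps_s) < 2:
        return False
    span_s = timestamps_s[-1] - timestamps_s[0]
    if span_s <= 0:
        return False
    rate_hz = (len(timestamps_s) - 1) / span_s  # mean rate over the observed window
    return abs(rate_hz - target_hz) <= tol_hz
```

For FT-P-01, with per-frame errors computed between `coordinates.csv` and the emitted `GPS_INPUT` positions, the check becomes `percentage_within(errors, 50.0) >= 80.0`. FT-P-30 becomes `rate_within(arrival_times, 1.0, 0.2)`.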
 ## External Dependency Mocks
 | External Service | Mock/Stub | How Provided | Behavior |
 |-----------------|-----------|-------------|----------|
 | Azaion Suite Satellite Service (pre-flight cache sync) | `tile-cache-init` one-shot loader | Docker service that materialises MBTiles + sidecar before SUT starts | Returns the same fixture set every run; deterministic |
 | Azaion Suite Satellite Service (post-flight upload) | candidate-pool stub inside `qgc-mock` (or a dedicated `service-stub` container) | HTTP server with `POST /candidates` accepting tile uploads, recording to a file | Records what the SUT sends; never alters the cache used by the next test |
 | QGroundControl GCS | `qgc-mock` | Custom MAVLink-only mock | Records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY frames; can inject operator-hint STATUSTEXT |
 | ArduPilot autopilot | `ardupilot-sitl` (PR #30080-pinned) | Official ArduPilot SITL container | Replays IMU from fixture; runs EKF3; exposes `RAW_IMU`, `ATTITUDE`, `GLOBAL_POSITION_INT`, `EKF_STATUS_REPORT`, `GPS_RAW_INT` |
 | Spoofing GPS adversary | `gps-spoof-injector` | Custom MAVLink injector | Sends crafted `GPS_RAW_INT` with configurable lat/lon offset, sat count, hdop |
 | Identity provider (JWT) | in-runner key generator | Test-only HMAC-SHA256 key shared at SUT boot via env var | Mints valid + invalid + expired JWTs |
 | External satellite providers (Maxar, Airbus, Planet) | **NOT MOCKED**: out of scope per AC-8.1; the SUT does not call them at runtime | — | The SUT must never make outbound HTTP requests to these hosts; F-T2 / NFT-SEC-04 include a network-policy assertion |
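The in-runner identity provider only needs HMAC-SHA256 (HS256) compact JWTs, and the Python standard library can mint those without a JWT package. This is a minimal sketch. It assumes HS256 and the claim names `sub`, `iat`, and `exp`. The function names and the 300 s lifetime are illustrative and do not come from the spec.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict


def b64url(data: bytes) -> str:
    """Unpadded base64url, as JWS compact serialization requires (RFC 7515)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_hs256(claims: Dict[str, Any], key: bytes) -> str:
    """Mint a compact HS256 JWT: base64url(header).base64url(claims).base64url(signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def make_test_tokens(key: bytes, subject: str = "test-runner") -> Dict[str, str]:
    """Tokens for the auth tests: one valid; the others are expected to yield HTTP 401."""
    now = int(time.time())
    live = {"sub": subject, "iat": now, "exp": now + 300}
    return {
        "valid": mint_hs256(live, key),
        "expired": mint_hs256({"sub": subject, "iat": now - 600, "exp": now - 300}, key),
        "wrong_sig": mint_hs256(live, b"not-the-sut-key"),
        "missing_claims": mint_hs256({"iat": now}, key),  # no `sub`, no `exp`
    }
```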
 All mocks are deterministic (the same input always produces the same output) except the spoofing and operator-hint scenarios. Those scenarios explicitly schedule events on the wall clock so that the SUT's timing budgets (AC-NEW-1, AC-NEW-2) are exercised.
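A scripted scenario for `gps-spoof-injector` might be described as a list of timed steps. Each step is turned into `GPS_RAW_INT` field values in the message's wire units: `lat`/`lon` in degE7, `alt` in mm, `eph` as HDOP × 100, and 65535 for "unknown". The `SpoofStep` schema, the flat-earth offset approximation, and the function names are assumptions for illustration. Actually sending the message (for example via pymavlink) is left out.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

WGS84_A = 6_378_137.0   # WGS84 semi-major axis, metres
UINT16_UNKNOWN = 65535  # GPS_RAW_INT uses UINT16_MAX for "unknown"


@dataclass(frozen=True)
class SpoofStep:
    """One timed injection (hypothetical gps-spoof-injector config schema)."""
    t_s: float            # wall-clock seconds after scenario start
    north_m: float = 0.0  # offset added to the true position
    east_m: float = 0.0
    satellites: int = 12
    hdop: float = 0.8


# Example: honest fix, then a 500 m northward pull at t=30 s, then sat-count drop + HDOP spike.
OFFSET_THEN_DEGRADE: List[SpoofStep] = [
    SpoofStep(0.0),
    SpoofStep(30.0, north_m=500.0),
    SpoofStep(60.0, north_m=500.0, satellites=4, hdop=9.9),
]


def offset_latlon(lat: float, lon: float, north_m: float, east_m: float) -> Tuple[float, float]:
    """Flat-earth offset; adequate for offsets of a few km away from the poles."""
    dlat = math.degrees(north_m / WGS84_A)
    dlon = math.degrees(east_m / (WGS84_A * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon


def gps_raw_int_fields(true_lat: float, true_lon: float, alt_m: float,
                       step: SpoofStep, time_usec: int) -> Dict[str, Any]:
    """GPS_RAW_INT field values, in wire units, for one spoofing step."""
    lat, lon = offset_latlon(true_lat, true_lon, step.north_m, step.east_m)
    return {
        "time_usec": time_usec,
        "fix_type": 3,               # GPS_FIX_TYPE_3D_FIX: the spoofer claims a good fix
        "lat": round(lat * 1e7),     # degE7
        "lon": round(lon * 1e7),     # degE7
        "alt": round(alt_m * 1000),  # mm (MSL)
        "eph": min(round(step.hdop * 100), UINT16_UNKNOWN - 1),  # HDOP x 100
        "epv": UINT16_UNKNOWN,
        "vel": UINT16_UNKNOWN,
        "cog": UINT16_UNKNOWN,
        "satellites_visible": step.satellites,
    }
```

Keeping `t_s` in the step makes the wall-clock schedule explicit, which is what lets the AC-NEW-2 promotion latency be measured from a known onset time.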
 ## Data Validation Rules
 | Data Type | Validation | Invalid Examples | Expected System Behavior |
 |-----------|-----------|-----------------|------------------------|
 | Nav-cam frame | non-zero size; JPEG / PNG decodable; resolution within ±1 % of `data_parameters.md` | 0-byte file, truncated JPEG header, resolution outside the ±1 % tolerance | log error; `tracking_state` transitions to `DEGRADED` if more than 2 frames are lost; never crash |
 | IMU sample | rate 200 Hz ± 10 %; timestamps monotonic; covariance present | timestamp regression, rate < 50 Hz, NaN / Inf | drop sample with WARN log; if IMU data is lost for > 0.5 s, cuVSLAM degrades and the AC-5.2 path becomes eligible |
 | Satellite tile | MBTiles schema valid; descriptors present; `capture_date` within freshness budget for sector | corrupt MBTiles, missing sidecar, beyond-grace freshness | reject with WARN; AC-8.2 / AC-NEW-6 |
 | MAVLink GPS_RAW_INT (FC inputs) | well-formed; signing valid (when MAVLink2 signing on) | unsigned frame, malformed length, sysid spoofing | reject; F-T9 + S-T1 cover this |
 | HTTPS request body | JSON parse OK; required fields present; pixel coords ∈ frame bounds | missing fields, NaN, out-of-bounds pixel | HTTP 422 |
 | JWT | signature valid; not expired; subject is allowed | expired, wrong sig, missing claims | HTTP 401 |
 | Tile descriptor | dimension matches index; checksum match | wrong dims, mismatched hash | reject load; cache marks as corrupt; F-T2 |
 | Operator hint STATUSTEXT | parseable `RELOC_HINT: lat=… lon=… sigma=…`; numeric ranges sane | malformed, NaN, negative sigma, lat > 90 / lon > 180 | reject hint; emit STATUSTEXT WARN; do not seed VPR |
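The operator-hint rule can be checked with a small parser. This sketch assumes that the `RELOC_HINT: lat=… lon=… sigma=…` fields are space-separated `key=value` pairs in that order and that `sigma` is in metres. Both assumptions go beyond what the table states. Returning `None` stands for "reject hint, emit STATUSTEXT WARN, do not seed VPR".

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

# Assumed syntax: fixed prefix, then lat/lon/sigma as space-separated key=value pairs.
_HINT_RE = re.compile(
    r"^RELOC_HINT:\s*lat=(?P<lat>\S+)\s+lon=(?P<lon>\S+)\s+sigma=(?P<sigma>\S+)\s*$"
)


@dataclass(frozen=True)
class OperatorHint:
    lat: float
    lon: float
    sigma_m: float


def parse_reloc_hint(text: str) -> Optional[OperatorHint]:
    """Return a validated hint, or None if it must be rejected."""
    m = _HINT_RE.match(text.strip())
    if m is None:
        return None  # malformed message
    try:
        lat, lon, sigma = (float(m[k]) for k in ("lat", "lon", "sigma"))
    except ValueError:
        return None  # non-numeric field
    if not all(math.isfinite(v) for v in (lat, lon, sigma)):
        return None  # NaN / Inf
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None  # coordinates out of range
    if sigma <= 0.0:
        return None  # negative or zero sigma
    return OperatorHint(lat, lon, sigma)
```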
 ## Pending Data (Phase 1 D3 — placeholder fixtures)
 The following fixtures are **declared by name** in this spec but **not yet present** at the time of writing. Phase 3's HARD GATE will surface them as **`pending data`** rather than removing the tests that depend on them:
 | Fixture | Generator / source | Owner | Phase 3 treatment |
 |---------|-------------------|-------|-------------------|
 | `fixtures/satellite_tiles_AD0000xx_z20/` | `tile-cache-init` script: fetch z=20 ortho tiles for the bbox of `coordinates.csv` from a public ortho service (Esri / Mapbox / Sentinel-2 ≥ 0.5 m/px); pre-extract SuperPoint + DINOv2-VLAD descriptors | Decompose / impl. team task | `pending data` — not removed; `data_status: deferred-corpus` retained until generator script is committed |
 | `fixtures/imu_AD0000xx.csv` | SITL ArduPilot replay of `coordinates.csv` as ground-truth trajectory at 200 Hz | Decompose / impl. team task | `pending data` — not removed; `data_status: deferred-corpus` |
 | `aerialvl_s03`, `uav_visloc`, `aerialextrematch`, `2chadcnn_seasons`, `tartanair_v2`, `internal_mavic` | External downloads + curation | data team task (Decompose creates a "dataset acquisition" task) | `data_status: deferred-corpus` |
 | `internal_fixed_wing_first_sortie` | Field-test plan | operations team | `data_status: deferred-field` |
 | `cold_soak_corpus`, `hot_soak_corpus` | Bench HW + chamber | bench team | `data_status: deferred-hil` |
 | `synthetic_8h_load` | `fixtures/synth-8h-loader/` script | impl. team | regenerated per session — synthesisable, no external dependency |
 | `cache_poisoning_scenarios` | `fixtures/cache-poison-mc/` script | impl. team | regenerated per session |
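The first step of the `tile-cache-init` generator is working out which z=20 tiles cover the bbox of `coordinates.csv`, and that is standard Web Mercator arithmetic. The sketch below assumes the public ortho service uses the common XYZ ("slippy map") tiling and that latitudes stay within ±85.05°. The function names and the margin parameter are hypothetical. MBTiles stores `tile_row` in TMS order (y flipped), which is why the sketch ends with a conversion helper.

```python
import math
from typing import Iterable, Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """XYZ (slippy-map) Web Mercator tile indices; y grows southward."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_bbox(points: Iterable[Tuple[float, float]], zoom: int = 20,
                   margin_deg: float = 0.0) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Inclusive (x0, x1), (y0, y1) tile ranges covering the bbox of (lat, lon) points."""
    pts = list(points)
    lats = [p[0] for p in pts]
    lons = [p[1] for p in pts]
    # North-west corner gives the smallest x and y; south-east corner gives the largest.
    x0, y0 = lonlat_to_tile(min(lons) - margin_deg, max(lats) + margin_deg, zoom)
    x1, y1 = lonlat_to_tile(max(lons) + margin_deg, min(lats) - margin_deg, zoom)
    return (x0, x1), (y0, y1)


def xyz_to_tms_row(y: int, zoom: int) -> int:
    """MBTiles `tile_row` uses TMS order, i.e. y flipped."""
    return (2 ** zoom - 1) - y
```

At z=20 a tile spans about 38.2 m × cos(latitude) on the ground (roughly 25 m at 48.5° N), so any margin around the flight bbox adds tiles quickly. Sizing it from the ≥500 m mosaic radius of FT-P-27 is one reasonable choice.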
@@ -0,0 +1,138 @@
 # Traceability Matrix
 > **`data_status` legend** (Phase 1 decision D4):
 > - `present` — fixture / corpus is in `_docs/00_problem/input_data/` and ready.
 > - `deferred-corpus` — relies on an external dataset declared by name (AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, internal-fixed-wing first sortie, multi-flight Monte Carlo) — fixture path is reserved; data not yet downloaded / curated.
 > - `deferred-sitl` — requires SITL ArduPilot environment (PR #30080-pinned) to be provisioned.
 > - `deferred-hil` — requires real Jetson Orin Nano Super on bench + thermal chamber.
 > - `deferred-field` — requires a real field-test sortie.
 > - `pending data` — placeholder fixture declared by name (Phase 1 D3) but generator script not yet committed (`fixtures/satellite_tiles_AD0000xx_z20/`, `fixtures/imu_AD0000xx.csv`).
 >
 > Per Phase 1 D4: tests are specified for **all 38 ACs** + the documented restrictions, even where data is not yet present. Phase 3's HARD GATE will surface fixtures as **`pending data`** rather than removing tests.
 ## Acceptance Criteria Coverage
 | AC ID | Acceptance Criterion (one-line) | Test IDs | data_status | Coverage |
 |-------|-----------|----------|-------------|----------|
 | AC-1.1 | ≥80 % within 50 m on normal flight (functional pipeline + deployment-binding) | FT-P-01 (T1), FT-P-T2 (T2 binding), NFT-PERF-11 (bench-off) | T1 `present`; T2 `deferred-corpus` (AerialVL S03) | Covered |
 | AC-1.2 | ≥50 % within 20 m | FT-P-02 (T1), FT-P-T2 (T2 binding) | same | Covered |
 | AC-1.3 | VO drift <100 m mono / <50 m mono+IMU between satellite anchors | FT-P-04 (T1 functional + T2 binding via AerialVL) | T1 `pending data` (synthetic IMU + placeholder tiles); T2 `deferred-corpus` | Covered |
 | AC-1.4 | Quantitative confidence score (covariance + categorical label) | FT-P-05, FT-P-06, FT-P-07, FT-P-09, NFT-RES-08 | `present` (T1) | Covered |
 | AC-2.1 | Image registration rate >95 % under normal-flight definition | FT-P-10 (T1 functional + T2 binding) | T1 `present`; T2 `deferred-corpus` | Covered |
 | AC-2.2 | MRE <1.0 px VO frame-to-frame; <2.5 px cross-domain | FT-P-11 (T1 functional + T2 binding) | T1 `pending data` (placeholder tiles); T2 `deferred-corpus` | Covered |
 | AC-3.1 | Survives 350 m outliers from ±20° tilt | FT-P-13 | `present` (synthetic injection over 60-image slice) | Covered |
 | AC-3.2 | Sharp turn (<5 % overlap, <70°, <200 m drift) handled by satellite re-loc | FT-P-14, FT-N-06, NFT-RES-06 | `present` (synthetic injection) + `pending data` (placeholder tiles) | Covered |
 | AC-3.3 | ≥3 disconnected segments per flight via global retrieval + RANSAC pose-graph re-loc | FT-P-31, NFT-RES-07 | `present` (synthetic) + `pending data` (placeholder tiles) | Covered |
 | AC-3.4 | RELOC_REQ on ≥3 frames AND ≥2 s no-position; continues VO/IMU DR while waiting | FT-N-07, FT-N-08, FT-N-09, NFT-RES-04, NFT-RES-05 | `present` | Covered |
 | AC-4.1 | End-to-end latency <400 ms p95 on Orin Nano Super 25 W | NFT-PERF-01 (T4 binding), NFT-PERF-12 | T1 `present` (functional smoke); T4 `deferred-hil` (binding) | Covered |
 | AC-4.2 | Memory <8 GB shared on Jetson Orin Nano Super | NFT-RES-LIM-01, NFT-RES-LIM-07 | T1 `present` (functional); T4 `deferred-hil` (binding) | Covered |
 | AC-4.3 | Two parallel MAVLink channels; v1 ships GPS_INPUT only (ODOMETRY disabled) | FT-P-05, FT-N-11, FT-N-15, FT-N-16 | T1 `present`; T3 `deferred-sitl` for SITL matrix | Covered |
 | AC-4.4 | Frame-by-frame output, no batching | NFT-PERF-06, FT-P-12 | `present` | Covered |
 | AC-4.5 | Refinement / corrections to prior fixes | FT-P-32 | `present` | Covered |
 | AC-5.1 | Initialise from FC's last-known GPS + IMU-extrapolated position at GPS denial | FT-P-17 | `present` | Covered |
 | AC-5.2 | >3 s no-fix → IMU-only DR + log failure | NFT-RES-03, NFT-PERF-10, FT-N-13 | T3 `deferred-sitl` (binding); T1 `present` for SUT-side observable | Covered |
 | AC-5.3 | Re-init on companion reboot from FC's IMU-extrapolated position | FT-P-18, FT-P-19, NFT-RES-01 | `present` | Covered |
 | AC-6.1 | QGC telemetry; per-frame on local link, 1–2 Hz GCS | FT-P-22, FT-P-23, FT-P-24, FT-P-30 | `present` | Covered |
 | AC-6.2 | GCS commands (operator hint via STATUSTEXT / NAMED_VALUE_FLOAT / custom dialect) | FT-N-09, FT-N-10, NFT-RES-05, NFT-SEC-07 | `present` | Covered |
 | AC-6.3 | Output coordinates in WGS84 | FT-P-05, FT-P-21 | `present` | Covered |
 | AC-7.1 | Object loc accuracy = frame-center accuracy in level flight; error bound published during maneuvers | FT-P-20, FT-P-33, FT-N-21 | `present` | Covered |
 | AC-7.2 | Object loc trigonometric (gimbal angle + zoom + altitude + flat-terrain) | FT-P-20, FT-P-21 | `present` | Covered |
 | AC-8.1 | Cache interface ≥0.5 m/px ideal 0.3 m/px; no direct calls to Maxar/Airbus/Planet | FT-N-19, NFT-SEC-11 | `present` | Covered |
 | AC-8.2 | Tile freshness <6 mo active / <12 mo stable | FT-N-04, FT-N-05, NFT-RES-12 | `present` (synthetic-age tiles) | Covered |
 | AC-8.3 | Pre-loaded + pre-processed cache; pre-extracted descriptors | FT-P-26, FT-P-27, NFT-RES-09 | T1 `present` for cache-shape; deployment binding `pending data` (real Service-supplied corpus) | Covered |
 | AC-8.4 | Mid-flight tile generation, dedup, post-flight upload | FT-P-28, FT-P-29, FT-P-34, F-T2 (within FT-P-28) | `present` (dedup replay) + `pending data` (`service-stub` records) | Covered |
 | AC-8.5 | No raw nav-cam / AI-cam frame retention; tiles + ≤0.1 Hz failure thumbnail log only | FT-N-18, NFT-SEC-10, NFT-RES-LIM-05 | `present` | Covered |
 | AC-8.6 | VPR retrieval unit decoupled from storage tile; multi-scale; dynamic K; conditional invocation | NFT-PERF-08, NFT-PERF-09 | T1 `pending data` (placeholder tiles + descriptors); T2 binding `deferred-corpus` | Covered |
 | AC-NEW-1 | Cold-start TTFF <30 s p95 | FT-P-16 (T1 N=10), FT-P-T4 cold (T4 N=50), FT-P-25, NFT-RES-LIM-04 | T1 `present` (functional smoke); T4 `deferred-hil` for cold-soak binding | Covered |
 | AC-NEW-2 | Spoofing-promotion <3 s p95 | NFT-PERF-05, NFT-RES-02, FT-N-12 | T3 `deferred-sitl` | Covered |
 | AC-NEW-3 | Flight Data Recorder, 64 GB cap, no raw frames, all classes preserved | NFT-RES-14, NFT-RES-LIM-05, NFT-SEC-10, FT-N-18 | T1 `present` (volume accounting); T4 `deferred-hil` for 8-h soak binding | Covered |
 | AC-NEW-4 | False-position safety budget P(>500 m)<0.1 %, P(>1 km)<0.01 % | covered via Monte Carlo on AerialVL S03 + Mavic + AerialExtreMatch (statistical analysis bundled into FT-P-T2 + FT-P-35 + dedicated NF-T4 Monte Carlo run) | T2 `deferred-corpus` (Monte Carlo over ≥100 simulated flights) | Covered |
 | AC-NEW-5 | Operating temp −20 °C to +50 °C; 25 W sustained 8 h with no thermal throttle | NFT-RES-LIM-02, NFT-RES-LIM-03, NFT-RES-LIM-04 | T4 `deferred-hil` (chamber) | Covered |
 | AC-NEW-6 | Stale-tile rejection / decay across 30-day grace | FT-N-04, FT-N-05, NFT-RES-12 | `present` (synthetic-age tiles) | Covered |
 | AC-NEW-7 | Cache-poisoning safety budget P(>30 m)<1 %, P(>100 m)<0.1 %; voting layer | FT-P-34, FT-N-17, FT-P-35, NFT-RES-15, NFT-SEC-13 | T1 `present` (gate behaviour) + `pending data` (`service-stub` voting); T2 `deferred-corpus` (Monte Carlo binding) | Covered |
 | AC-NEW-8 | cuVSLAM mono+IMU drift ≤50 m / mono ≤100 m on AerialVL fixed-wing trajectories | FT-P-04 (binding split) | T2 `deferred-corpus` (AerialVL S03) | Covered |
 | AC-NEW-9 | Companion-side covariance calibration: empirical residuals lie within reported h_acc/v_acc with prob ≥95 % | FT-P-36, FT-P-37 | T2 `deferred-corpus` (AerialVL S03) | Covered |
 ## Restrictions Coverage
 | Restriction ID | Restriction (one-line) | Test IDs | data_status | Coverage |
 |----------------|------------------------|----------|-------------|----------|
 | RESTRICT-UAV-01 | Fixed-wing UAV only | FT-P-T2 (binding via AerialVL fixed-wing) | T2 `deferred-corpus` | Covered |
 | RESTRICT-UAV-02 | Nav cam fixed downward, not gimbal-stabilized | FT-P-01..FT-P-04 (assumed by replay shape) | `present` | Covered |
 | RESTRICT-UAV-03 | Operational area: east/south Ukraine | environmental envelope (AC-NEW-5 covers thermal); no separate test required | — | Implicit (envelope captured by AC-NEW-5 + AC-8.6 active-conflict sector handling) |
 | RESTRICT-UAV-04 | 8-h flights at ~60 km/h; sector + corridor up to 400 km² total | NFT-RES-LIM-06, NFT-RES-LIM-11, NFT-RES-14 | T4 `deferred-hil` for 8-h | Covered |
 | RESTRICT-UAV-05 | ≤1 km AGL; flat-terrain assumption | AC-7.1 / AC-7.2 tests (flat-terrain) + Component 1b ortho terrain-class check (F-T14 within NFT-PERF-04) | `pending data` (DEM tiles) | Covered |
 | RESTRICT-UAV-06 | Predominantly sunny daytime | bench-off seasonal-robustness (NFT-PERF-11 + NFT-RES-13) | T2 `deferred-corpus` | Covered |
 | RESTRICT-UAV-07 | Sharp turns are exception (<5 % overlap) | FT-P-14, FT-N-06, NFT-RES-06 | `present` | Covered |
 | RESTRICT-UAV-08 | No photo-count cap | FT-N-20 | `present` | Covered |
 | RESTRICT-CAM-01 | Nav cam: ADTi 20MP 20L V1 APS-C; GSD 10–20 cm/px @ 1 km AGL | FT-P-T2 binding (AerialVL S03 stand-in until first internal fixed-wing flight) | T5 `deferred-field` for the deployment camera proper | Covered (caveat: 60-image slice = 26 MP @ 400 m AGL, pipeline-correctness only — see test-data.md D2 caveat) |
 | RESTRICT-CAM-02 | AI cam pose info = gimbal angle + zoom only; airframe attitude not published | FT-P-33, FT-N-21 | `present` | Covered |
 | RESTRICT-CAM-03 | Cameras connect via USB / MIPI-CSI / GigE | not separately testable at black-box level | — | Hardware-integration concern; covered by FT-1 / FT-2 / FT-3 field tests at T5 |
 | RESTRICT-SAT-01 | Source = Azaion Suite Satellite Service; SUT consumes via offline cache | NFT-SEC-11 | `present` | Covered |
 | RESTRICT-SAT-02 | No in-flight Service calls (offline cache only) | NFT-SEC-11 | `present` | Covered |
 | RESTRICT-SAT-03 | Mid-flight tile generation + post-flight upload | FT-P-28, FT-P-29, NFT-RES-15 | `present` + `pending data` (`service-stub`) | Covered |
 | RESTRICT-SAT-04 | No raw photo storage | FT-N-18, NFT-SEC-10 | `present` | Covered |
 | RESTRICT-SAT-05 | Cache resolution ≥0.5 m/px | FT-N-19 | `present` | Covered |
 | RESTRICT-SAT-06 | Storage tile zoom z=20 | FT-P-26 + cache-shape audit | `present` | Covered |
 | RESTRICT-SAT-07 | Freshness gates: 6 mo active / 12 mo stable | FT-N-04, FT-N-05, NFT-RES-12 | `present` | Covered |
 | RESTRICT-SAT-08 | Free public Sentinel-2 not on runtime path | FT-N-19, NFT-SEC-11 | `present` | Covered |
 | RESTRICT-HW-01 | Jetson Orin Nano Super: 67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W TDP | NFT-PERF-01, NFT-RES-LIM-01, NFT-RES-LIM-07 | T4 `deferred-hil` (binding) | Covered |
 | RESTRICT-HW-02 | JetPack + CUDA + TensorRT | FT-P-25 + NFT-PERF-02..04 | T4 `deferred-hil` | Covered |
 | RESTRICT-HW-03 | Cooling sustains 25 W for 8 h at upper temp | NFT-RES-LIM-03 | T4 `deferred-hil` (chamber) | Covered |
 | RESTRICT-HW-04 | NVMe ≥ 10 GB cache + 64 GB FDR | NFT-RES-LIM-05, NFT-RES-LIM-06, NFT-RES-LIM-12 | T1 + T4 mix | Covered |
 | RESTRICT-INTEG-01 | IMU via MAVLink from FC | F-T1c within FT-P-04 (cuVSLAM mono vs mono+IMU) | T1 `pending data` (synthetic IMU); T2 `deferred-corpus` for AerialVL IMU | Covered |
 | RESTRICT-INTEG-02 | MAVLink comm: MAVSDK + pymavlink, distinct sysids via ArduPilot routing, no `mavlink-router` | FT-P-05, FT-N-11, NFT-SEC-06 (sysid) | T1 + T3 | Covered |
 | RESTRICT-INTEG-03 | ArduPilot only; no PX4 | F-T9 SITL matrix runs only against ArduPilot SITL (FT-N-15, FT-N-16, NFT-RES-10) | T3 `deferred-sitl` | Covered |
 | RESTRICT-INTEG-04 | WGS84 output | FT-P-05, FT-P-21 | `present` | Covered |
 | RESTRICT-INTEG-05 | QGroundControl GCS only; no Mission Planner | `qgc-mock` only; Mission Planner is not exercised | `present` | Covered |
 | RESTRICT-FAIL-01 | 3 s no-fix → IMU DR fallback | NFT-RES-03, NFT-PERF-10 | T3 `deferred-sitl` | Covered |
 | RESTRICT-FAIL-02 | False-position safety (AC-NEW-4) | identical coverage as AC-NEW-4 | T2 `deferred-corpus` | Covered |
 | RESTRICT-FAIL-03 | Cold-start TTFF + spoofing-promotion latency budgets | identical to AC-NEW-1 + AC-NEW-2 | T1+T3+T4 mix | Covered |
 ## Coverage Summary
 | Category | Total Items | Covered | Not Covered | Coverage % |
 |----------|-----------|---------|-------------|-----------|
 | Acceptance Criteria | 38 | 38 | 0 | 100 % |
 | Restrictions | 31 | 31 | 0 | 100 % |
 | **Total** | **69** | **69** | **0** | **100 %** |
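The totals above can be recomputed from the matrix instead of being maintained by hand. This is a sketch that assumes the row layout used in this file, with the item ID in the first cell and the coverage verdict in the last. It counts only the first row seen for each ID, so the later pipeline-vs-binding table does not double-count. Matching this document's convention, it treats any coverage cell other than "Not covered" (including "Implicit …") as covered.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matrix rows start with "| AC-..." or "| RESTRICT-..."; the last cell is the coverage verdict.
_ROW = re.compile(r"^\s*\|\s*(AC-[\w.-]+|RESTRICT-[\w-]+)\s*\|")


def coverage_summary(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count covered / not-covered items per category from traceability-matrix rows."""
    summary: Dict[str, Counter] = {"AC": Counter(), "RESTRICT": Counter()}
    seen = set()
    for line in lines:
        m = _ROW.match(line)
        if not m or m.group(1) in seen:
            continue  # not a matrix row, or a repeat in a later table
        item = m.group(1)
        seen.add(item)
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        category = "AC" if item.startswith("AC-") else "RESTRICT"
        covered = not cells[-1].lower().startswith("not covered")
        summary[category]["covered" if covered else "not_covered"] += 1
    return summary
```

Running it over `traceability-matrix.md` in CI and comparing against the 38 / 31 / 69 figures would catch drift between the matrix and this summary.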
 ### Coverage by `data_status`
 | `data_status` | Scope (where this status applies to ≥1 test) | Notes |
 |---------------|-----------|-------|
 | `present` | majority of T1 tests | Covers all 60-image-slice pipeline-correctness ACs/restrictions and all behavioural-shape tests. |
 | `pending data` | satellite tile + IMU placeholder fixtures | Covers AC-1.3, AC-2.2 cross-domain, AC-3.2 sat re-loc, AC-3.3 segments, AC-8.6 VPR descriptors, AC-NEW-7 voting, RESTRICT-UAV-05 DEM, RESTRICT-INTEG-01 IMU. Surfaced as Phase 3 HARD-GATE finding, not removed. |
 | `deferred-corpus` | AC-1.1, AC-1.2 deployment-binding; AC-1.3 binding; AC-2.1 binding; AC-2.2 binding; AC-NEW-4; AC-NEW-7 Monte Carlo; AC-NEW-8; AC-NEW-9; bench-off corpora | AerialVL S03, UAV-VisLoc, AerialExtreMatch, 2chADCNN, TartanAir V2, internal Mavic. Decompose creates a "dataset acquisition" task. |
 | `deferred-sitl` | AC-4.3 SITL matrix (FT-N-15, FT-N-16); AC-NEW-2; RESTRICT-INTEG-03; RESTRICT-FAIL-01 | ArduPilot SITL pinned to PR #30080-class build. |
 | `deferred-hil` | AC-4.1 binding; AC-4.2 binding; AC-NEW-1 cold corner; AC-NEW-3 8-h soak; AC-NEW-5 thermal envelope; RESTRICT-HW-01..03 | Real Jetson + thermal chamber. |
 | `deferred-field` | RESTRICT-CAM-01 deployment-camera binding (first internal fixed-wing flight) | Field-test plan. |
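The `data_status` markers can be validated mechanically, so that a misspelt status fails CI instead of silently dropping out of this table. This is a minimal sketch. It assumes statuses are always written as backticked tokens in the matrix cells, and it ignores other backticked names such as `service-stub`. The enum mirrors the `data_status` legend at the top of this file.

```python
import re
from enum import Enum
from typing import List


class DataStatus(str, Enum):
    PRESENT = "present"
    DEFERRED_CORPUS = "deferred-corpus"
    DEFERRED_SITL = "deferred-sitl"
    DEFERRED_HIL = "deferred-hil"
    DEFERRED_FIELD = "deferred-field"
    PENDING_DATA = "pending data"


_BACKTICKED = re.compile(r"`([^`]+)`")
# Tokens that look like a status; anything else in backticks (e.g. fixture names) is ignored.
_CANDIDATE = re.compile(r"^(?:present|pending .*|deferred-.*)$")


def statuses_in_cell(cell: str) -> List[DataStatus]:
    """Extract data_status markers from a matrix cell; ValueError on an unknown status."""
    found = []
    for token in _BACKTICKED.findall(cell):
        if _CANDIDATE.match(token):
            found.append(DataStatus(token))  # raises ValueError for e.g. "deferred-hill"
    return found
```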
 ## Uncovered Items Analysis
 | Item | Reason Not Covered | Risk | Mitigation |
 |------|-------------------|------|-----------|
 | (none) | — | — | — |
 All 38 ACs and 31 restrictions are covered by ≥1 test, per Phase 1 D4. **No uncovered items.** Coverage is 100 % at the spec level; data availability — not coverage — is the gating concern, surfaced via the `data_status` column.
 ## Pipeline-Correctness vs Deployment-Binding Boundary
 The 60-image slice (`present` data_status) is **pipeline-correctness only** for the accuracy ACs. Deployment-binding numbers come from the `deferred-corpus` and `deferred-hil` tiers. This split follows Phase 1 decision D2 and is documented in `test-data.md`. The table below states which tier supplies which evidence for each affected AC:
 | AC | Pipeline-correctness (T1, `present`) | Deployment-binding |
 |----|---------------------------------------|--------------------|
 | AC-1.1 | FT-P-01 (functional check) | FT-P-T2 (T2 `deferred-corpus` AerialVL S03) |
 | AC-1.2 | FT-P-02 | FT-P-T2 |
 | AC-1.3 | FT-P-04 (functional, with `pending data`) | FT-P-04 binding split (T2) |
 | AC-2.1 | FT-P-10 | FT-P-10 binding (T2) |
 | AC-2.2 | FT-P-11 | FT-P-11 binding (T2) |
 | AC-4.1 | NFT-PERF-01 functional smoke | NFT-PERF-01 binding (T4) |
 | AC-4.2 | NFT-RES-LIM-01 functional | NFT-RES-LIM-01 binding (T4) |
 | AC-NEW-1 | FT-P-16 (T1 N=10) | FT-P-T4 cold (T4 N=50) + NFT-RES-LIM-04 |
 | AC-NEW-3 | NFT-RES-LIM-05 functional | NFT-RES-14 + NFT-RES-LIM-05 binding (T4 8-h) |
 | AC-NEW-4 | (none — Monte Carlo only) | FT-P-35 (T2 binding) |
 | AC-NEW-5 | (none — chamber only) | NFT-RES-LIM-02..04 (T4 chamber) |
 | AC-NEW-7 | FT-P-34 + FT-N-17 functional | FT-P-35 + NFT-SEC-13 binding (T2) |
 | AC-NEW-8 | (none — fixed-wing only) | FT-P-04 binding (T2) |
 | AC-NEW-9 | (none — covariance evaluation requires ground-truth corpus) | FT-P-36 + FT-P-37 (T2) |
@@ -6,9 +6,9 @@ step: 3
 name: Plan
 status: in_progress
 sub_step:
-  phase: 2
+  phase: 5
-  name: test-scenarios
+  name: plan-step-1-complete
-  detail: "Plan Step 1 (test-spec) Phase 1 COMPLETE. User cleared the BLOCKING gate (all 4 questions = A): D1 fix stale docs inline; D2 60-image slice = pipeline-correctness corpus only; D3 spec with placeholder fixtures (satellite tiles + IMU); D4 spec all 46 ACs with data_status markers in traceability-matrix.md. Stale-doc fixes already applied to results_report.md rows 2/19/22/23/25/38 and AC-4.3 v1-scope clause added to acceptance_criteria.md. Findings + locked decisions saved to _docs/02_document/tests/_phase1_findings.md. NEXT on resume: Phase 2 (test-spec/phases/02-test-scenarios.md) — generate 8 artifacts under _docs/02_document/tests/ (environment.md, test-data.md, blackbox-tests.md, performance-tests.md, resilience-tests.md, security-tests.md, resource-limit-tests.md, traceability-matrix.md). Recommended fresh conversation due to context-budget caution zone."
+  detail: "Plan Step 1 (test-spec sub-skill) COMPLETE. Phase 1 (input-data analysis) DONE earlier; Phase 2 (test scenarios, 8 artifacts) DONE; Phase 3 HARD GATE PASSED (100% spec-level coverage, 0 truly-missing items, 0 removed tests, defer-don't-remove per Phase 1 D4); Hardware Assessment DONE — `## Test Execution` section appended to environment.md classifying project as hardware-dependent and recording the Mode-C (both: Docker for T1/T2/T3 + bench-local for T4 + field for T5) per-tier split decision; Phase 4 (runner-scripts) SKIPPED per skill rule (planning context — script creation deferred to Decompose as tasks). Plan Step 1 user-level BLOCKING gate (test coverage confirmation) was satisfied by the Phase 2 → Phase 3 confirmation earlier in this session. Next: Plan Step 2 (Solution Analysis), opening with BLOCKING Phase 2a.0 (Glossary + Architecture Vision)."
 retry_count: 0
 cycle: 1
 tracker: jira