Update autodev skill documentation and acceptance criteria

Enhanced the SKILL.md file to enforce conciseness rules for the state file, specifying acceptable content and file size limits. Updated the autodev state to reflect the transition to the planning phase, including changes to the current step and sub-step details. Revised acceptance criteria to clarify validation requirements and external dependencies, ensuring alignment with the latest research findings. Added a new overlay for Mode B revisions to track changes and decisions made during the assessment process.
Oleksandr Bezdieniezhnykh
2026-05-09 03:10:57 +03:00
parent 846670a5c5
commit c19c76481c
21 changed files with 2354 additions and 10 deletions
@@ -0,0 +1,595 @@
# Blackbox Tests
All tests run from the `e2e-runner` container against the SUT through public boundaries only (frame source, FC inbound stream, tile cache mount, FC outbound observed via SITL, GCS observed via mavproxy-listener, FDR via post-run filesystem read). Two FC adapters parameterize every test that touches the FC contract: `ardupilot` and `inav`. Two `VioStrategy` modes parameterize Tier-1 product correctness tests: `okvis2` (production-default) and `klt_ransac` (mandatory simple-baseline). `vins_mono` is parameterized only when the research build is under test.
## Positive Scenarios
### FT-P-01: Still-image satellite-anchor frame-center accuracy
**Summary**: Validates the canonical satellite-anchor frame-center geolocation pipeline against the 60-image GT set.
**Traces to**: AC-1.1, AC-1.2
**Category**: Position Accuracy
**Preconditions**:
- `tile-cache-fixture` mounted at `/var/azaion/tile-cache`.
- SUT cold-started with no prior state; configured for the FC adapter under test.
**Input data**: `still-image-set-60` (per `test-data.md`).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each image `AD0000NN.jpg` in order, write the frame to the SUT's frame-source path and wait up to 5 s for the corresponding outbound `GPS_INPUT` (AP) / `MSP2_SENSOR_GPS` (iNav) message at the SITL listener | One outbound message per input image; payload includes WGS84 lat/lon |
| 2 | Compute Vincenty geodesic distance between estimated lat/lon and `coordinates.csv` GT row for that image | Per-image error ≤ 50 m for ≥80% of images, ≤ 20 m for ≥50% |
| 3 | Capture per-image error to `e2e-results/run-${RUN_ID}/ft-p-01.csv` | CSV produced with one row per image |
**Expected outcome**: aggregate `pass_count(error≤50m) ≥ 48` AND `pass_count(error≤20m) ≥ 30` (matching the rule in `expected_results/results_report.md`).
**Max execution time**: 5 min (60 images × ~5 s including SITL round trip).
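A minimal sketch of the step 2 distance computation and the aggregate rule, assuming the harness holds estimated and GT coordinates as plain degree floats. The function names `vincenty_m` and `ft_p_01_passes` are hypothetical and not part of the SUT or harness; the formula is the standard Vincenty inverse solution on the WGS84 ellipsoid.

```python
import math

# WGS84 ellipsoid constants
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
WGS84_B = WGS84_A * (1 - WGS84_F)


def vincenty_m(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Vincenty inverse geodesic distance in metres (inputs in degrees)."""
    u1 = math.atan((1 - WGS84_F) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - WGS84_F) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)
    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # cos(2*sigma_m) is undefined on the equator; it is set to 0 there
        cos_2sm = cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha if cos2_alpha else 0.0
        c = WGS84_F / 16 * cos2_alpha * (4 + WGS84_F * (4 - 3 * cos2_alpha))
        lam_new = big_l + (1 - c) * WGS84_F * sin_alpha * (
            sigma + c * sin_sigma * (cos_2sm + c * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    else:
        raise ValueError("Vincenty failed to converge (near-antipodal points)")
    u_sq = cos2_alpha * (WGS84_A ** 2 - WGS84_B ** 2) / WGS84_B ** 2
    big_a = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return WGS84_B * big_a * (sigma - delta_sigma)


def ft_p_01_passes(errors_m):
    """Aggregate rule: at least 48 images within 50 m AND at least 30 within 20 m."""
    within_50 = sum(e <= 50.0 for e in errors_m)
    within_20 = sum(e <= 20.0 for e in errors_m)
    return within_50 >= 48 and within_20 >= 30
```

The thresholds 48 and 30 are 80% and 50% of the 60-image set, so the count-based rule and the percentage-based rule in step 2 agree.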
---
### FT-P-02: Derkachi VIO drift between satellite anchors
**Summary**: Validates cumulative drift between consecutive satellite-anchored fixes during the Derkachi flight replay.
**Traces to**: AC-1.3
**Category**: Position Accuracy
**Preconditions**:
- `tile-cache-fixture` mounted (covers Derkachi route).
- SUT cold-started; FC adapter under test connected via SITL; `data_imu.csv` replayed at 10 Hz into FC IMU stream.
**Input data**: `derkachi-fixture` video at 30 fps + IMU CSV at 10 Hz.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start synchronized video + IMU replay (3 video frames per IMU row) | SUT begins emitting estimates at the SUT's runtime cadence |
| 2 | At each frame whose outbound estimate carries `source_label = satellite_anchored`, record the propagated centre estimate of the prior visual-only segment AND the new anchor centre | Two values per anchor pair captured |
| 3 | Compute per-anchor-pair drift = ‖propagated_centre − next_anchor_centre‖. Bin by `last_satellite_anchor_age_ms`. | Bins populated; CSV emitted |
**Expected outcome**: Across all anchor pairs, at least 95% satisfy `drift < 100 m` (visual-only) AND `drift < 50 m` (when CombinedImuFactor IMU fusion is active in C5). Drift distribution monotonically grows with anchor age, with no anomalous spike.
**Max execution time**: 10 min (8 min replay + parsing).
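The drift and binning in step 3 could be sketched as follows. This assumes both centres have already been projected to a local east/north frame in metres; the projection, the 5 s bin width, and all function names are illustrative assumptions rather than harness API.

```python
import math
from collections import defaultdict


def anchor_pair_drift_m(propagated_en, anchor_en):
    """Euclidean distance between two (east, north) points in metres."""
    return math.hypot(propagated_en[0] - anchor_en[0],
                      propagated_en[1] - anchor_en[1])


def bin_by_anchor_age(pairs, bin_ms=5000):
    """Group (age_ms, drift_m) pairs into fixed-width age bins keyed by bin start."""
    bins = defaultdict(list)
    for age_ms, drift_m in pairs:
        bins[(age_ms // bin_ms) * bin_ms].append(drift_m)
    return dict(bins)


def fraction_below(drifts_m, limit_m):
    """Share of anchor pairs whose drift is strictly below the limit."""
    return sum(d < limit_m for d in drifts_m) / len(drifts_m)
```

With these helpers the expected outcome reduces to `fraction_below(drifts, 100.0) >= 0.95` for the visual-only case, with the limit tightened to `50.0` when IMU fusion is active.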
---
### FT-P-03: Estimate output schema and source-label semantics
**Summary**: Validates the SUT's outbound estimate carries every required field with correct types and the source label is one of the three allowed values.
**Traces to**: AC-1.4, AC-4.3
**Category**: Position Accuracy / FC Contract
**Preconditions**:
- One image from `still-image-set-60` already loaded into the cache fixture.
- SUT cold-started.
**Input data**: any single image (default `AD000001.jpg`).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Push the image to the frame source | SUT emits one outbound `GPS_INPUT` (AP) / `MSP2_SENSOR_GPS` (iNav) AND one out-of-band channel message (MAVLink `STATUSTEXT` or `NAMED_VALUE_FLOAT` per AC-4.3) carrying the source label |
| 2 | Read the SITL-side fields | Schema match: `lat`, `lon`, `cov_semi_major_m`, `last_satellite_anchor_age_ms` present and well-typed |
| 3 | Read the out-of-band label channel | Label ∈ `{satellite_anchored, visual_propagated, dead_reckoned}` |
**Expected outcome**: Schema check passes AND label is in the allowed set.
**Max execution time**: 30 s.
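A hedged sketch of the schema and label checks in steps 2 and 3, assuming the harness has decoded the outbound message into a plain dict. The accepted Python types per field are an assumption, since the spec names the fields but not their wire types; `schema_errors` is a hypothetical helper.

```python
ALLOWED_LABELS = {"satellite_anchored", "visual_propagated", "dead_reckoned"}

# Field names come from step 2; the accepted types are assumed for illustration.
REQUIRED_FIELDS = {
    "lat": (int, float),
    "lon": (int, float),
    "cov_semi_major_m": (int, float),
    "last_satellite_anchor_age_ms": (int,),
}


def schema_errors(estimate, label):
    """Return a list of human-readable violations; empty means the check passes."""
    errors = []
    for name, types in REQUIRED_FIELDS.items():
        if name not in estimate:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif isinstance(estimate[name], bool) or not isinstance(estimate[name], types):
            errors.append(f"bad type for {name}: {type(estimate[name]).__name__}")
    if label not in ALLOWED_LABELS:
        errors.append(f"unknown source label: {label!r}")
    return errors
```

Returning a list instead of a boolean lets the test report every violation in one run.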
---
### FT-P-04: Derkachi frame-to-frame registration success rate
**Summary**: Validates frame-to-frame registration succeeds for ≥95% of "normal" segments of the Derkachi flight.
**Traces to**: AC-2.1a
**Category**: Image Processing
**Preconditions**:
- SUT cold-started; FC adapter and VioStrategy both parameterized.
**Input data**: `derkachi-fixture` (full duration). "Normal" segments derived per AC-2.1a: nadir ±10° bank/pitch (estimated from `SCALED_IMU2`-derived attitude), ≥40% inferred prior-frame overlap (heuristic from frame-to-frame translation magnitude).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay the Derkachi fixture | SUT emits per-frame registration-success metric (exposed via `NAMED_VALUE_FLOAT` or in FDR per AC-NEW-3) |
| 2 | After replay, compute success-ratio over normal segments only | Success ratio ≥ 0.95 |
**Expected outcome**: ≥95% on normal segments. Sharp-turn segments (excluded from this denominator) are exercised separately by FT-N-02.
**Max execution time**: 12 min.
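The "normal segment" filter and the success ratio in step 2 might look like the sketch below. The per-frame dict keys (`bank_deg`, `pitch_deg`, `overlap`, `registered`) are hypothetical names for values the harness would derive from the attitude stream and the registration metric.

```python
def is_normal(frame):
    """AC-2.1a filter as described above: within 10 deg of nadir and at least 40% overlap."""
    return (abs(frame["bank_deg"]) <= 10.0
            and abs(frame["pitch_deg"]) <= 10.0
            and frame["overlap"] >= 0.40)


def normal_success_ratio(frames):
    """Registration success ratio over normal frames only."""
    normal = [f for f in frames if is_normal(f)]
    if not normal:
        raise ValueError("no normal frames in replay; ratio is undefined")
    return sum(1 for f in normal if f["registered"]) / len(normal)
```

Raising on an empty denominator keeps a replay with no normal frames from passing vacuously.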
---
### FT-P-05: Satellite-anchor cross-domain registration
**Summary**: Validates the satellite-anchor (UAV→satellite cross-domain) matcher succeeds with the cross-domain MRE budget.
**Traces to**: AC-2.1b, AC-2.2
**Category**: Image Processing
**Preconditions**:
- `tile-cache-fixture` includes the still-image footprints.
- SUT cold-started.
**Input data**: `still-image-set-60` plus `still-image-sat-refs-2` (for the 2 images with paired `_gmaps.png`).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each still image, push to frame source | One satellite-anchor result per image |
| 2 | Read per-frame MRE (via FDR or `NAMED_VALUE_FLOAT`) | MRE recorded |
| 3 | Aggregate per-image accuracy AND MRE distribution | All images: MRE < 2.5 px; ≥80% within 50 m of GT; ≥50% within 20 m of GT |
**Expected outcome**: AC-1.1, AC-1.2, AC-2.1b, AC-2.2 all satisfied.
**Max execution time**: 5 min.
---
### FT-P-06: Mean Reprojection Error budgets (frame-to-frame + cross-domain)
**Summary**: Validates the two MRE budgets are honored.
**Traces to**: AC-2.2
**Preconditions**: Same as FT-P-04 + FT-P-05.
**Input data**: `derkachi-fixture` (frame-to-frame MRE) + `still-image-set-60` (cross-domain MRE).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run FT-P-04 and FT-P-05 in sequence; collect per-frame MRE from both runs | MRE values captured |
| 2 | Aggregate by domain (frame-to-frame vs satellite-anchored) | Distribution per domain |
**Expected outcome**: Frame-to-frame MRE < 1.0 px (95th percentile); cross-domain MRE < 2.5 px (95th percentile).
**Max execution time**: piggybacks on FT-P-04 / FT-P-05.
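For the 95th-percentile budgets, a sketch using the nearest-rank definition is shown below. The spec does not say which percentile definition applies, so nearest-rank is an assumption, and both helper names are hypothetical.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    # Integer ceiling division avoids floating-point rounding in the rank
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def mre_budgets_ok(frame_to_frame_px, cross_domain_px):
    """Frame-to-frame p95 < 1.0 px and cross-domain p95 < 2.5 px."""
    return (percentile_nearest_rank(frame_to_frame_px, 95) < 1.0
            and percentile_nearest_rank(cross_domain_px, 95) < 2.5)
```

Nearest-rank always returns an observed sample, which makes a failing frame easy to trace back in the FDR.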
---
### FT-P-07: Sharp-turn recovery via satellite reference
**Summary**: Validates that frames during sharp turns may fail frame-to-frame but recover via satellite-reference re-localization.
**Traces to**: AC-3.2
**Preconditions**:
- Sharp-turn segment of the Derkachi flight identified by gyro_z spikes in `SCALED_IMU2`. (If Derkachi has no sharp turn meeting AC-3.2 thresholds, fall back to a synthetic gyro overlay; flag in FDR.)
**Input data**: `derkachi-fixture` filtered to sharp-turn segment(s).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay sharp-turn segment | SUT emits `source_label = visual_propagated` or `dead_reckoned` during turn |
| 2 | After turn, observe next satellite-anchor attempt | Recovery: `source_label = satellite_anchored` returns within 3 frames of turn end; drift ≤ 200 m, heading change handled |
**Expected outcome**: Recovery within 3 frames; <200 m drift; <70° heading change handled.
**Max execution time**: 5 min (per turn segment, multiple per replay).
---
### FT-P-08: Multi-segment satellite-reference re-localization
**Summary**: Validates ≥3 disconnected segments per flight handled via satellite-reference re-localization.
**Traces to**: AC-3.3
**Preconditions**:
- `multi-segment-derkachi` synthetic fixture generated with 3+ blackout windows.
**Input data**: `multi-segment-derkachi`.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay with injected blackout windows | SUT emits `dead_reckoned` during each blackout |
| 2 | At end of each blackout, observe re-localization | `source_label` returns to `satellite_anchored` within 3 frames; trajectory continuity preserved (no >100 m jump) |
**Expected outcome**: All 3+ segments re-localized successfully; no trajectory jump exceeds 100 m.
**Max execution time**: 12 min.
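The two step 2 assertions (re-anchor within 3 frames, no jump above 100 m) can be sketched as below. It assumes the harness has a per-frame label list and a trajectory already projected to local east/north metres; the helper names are hypothetical.

```python
import math


def frames_to_reanchor(labels, blackout_end_idx):
    """Frames elapsed after the blackout's last frame until satellite_anchored; None if never."""
    for offset, label in enumerate(labels[blackout_end_idx + 1:], start=1):
        if label == "satellite_anchored":
            return offset
    return None


def max_jump_m(track_en):
    """Largest distance between consecutive (east, north) estimates, in metres."""
    return max(math.hypot(b[0] - a[0], b[1] - a[1])
               for a, b in zip(track_en, track_en[1:]))
```

A segment then passes when `frames_to_reanchor(...)` is not `None` and at most 3, and `max_jump_m(...)` does not exceed 100.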
---
### FT-P-09-AP: ArduPilot Plane GPS_INPUT contract conformance + signing
**Summary**: Validates `GPS_INPUT` reaches AP SITL, AP EKF accepts it as primary GPS, and MAVLink 2.0 message signing handshake completes per D-C8-9.
**Traces to**: AC-4.3 (AP), D-C8-9, AC-NEW-2 (precondition)
**Category**: FC Contract / Security
**Preconditions**:
- `ardupilot-plane-sitl` running with `GPS_TYPE=14`.
- `mavlink-passkey` loaded as Docker secret into SUT.
**Input data**: `derkachi-fixture` (any 60 s segment).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start SUT with FC adapter `ardupilot` | Signing handshake completes within 5 s; signed channel established |
| 2 | Replay 60 s of Derkachi | SUT emits signed `GPS_INPUT` at the configured rate |
| 3 | Read AP `EK3_SRC1_POSXY` parameter via MAVPROXY | Value reads `3` (GPS source) |
| 4 | Read AP-side GPS health via `GPS_RAW_INT` | Fix type ≥ 3 (3D fix), HDOP within nominal |
**Expected outcome**: Signing handshake succeeds; AP EKF on GPS source-set; GPS_RAW_INT shows healthy fix.
**Max execution time**: 90 s.
---
### FT-P-09-iNav: iNav MSP2_SENSOR_GPS contract conformance
**Summary**: Validates `MSP2_SENSOR_GPS` reaches iNav SITL and iNav GPS provider accepts it as the sole source.
**Traces to**: AC-4.3 (iNav)
**Category**: FC Contract
**Preconditions**:
- `inav-sitl` running with GPS provider configured to MSP per `docs/SITL/SITL.md`.
**Input data**: `derkachi-fixture` (any 60 s segment).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start SUT with FC adapter `inav` | TCP connection to `inav-sitl:5760` established |
| 2 | Replay 60 s of Derkachi | SUT emits `MSP2_SENSOR_GPS` (ID 0x1F03) frames at 5 Hz |
| 3 | Read iNav GPS state via MSP query | `gpsSol.fixType` ≥ 3, `gpsSol.numSat` reflects emitted value, provider=MSP |
**Expected outcome**: iNav GPS state reflects emitted frames; no fallback to internal GPS.
**Max execution time**: 90 s.
---
### FT-P-10: GTSAM smoothing-loop look-back accuracy (IT-11)
**Summary**: Validates the smoothing-loop's past-keyframe pose estimates improve over raw single-shot estimates (Mode B Fact #107). NOT validated as FC-side retroactive correction.
**Traces to**: AC-4.5 (revised scope per Mode B), Mode B Fact #107
**Category**: Position Accuracy / Internal smoothing
**Preconditions**:
- SUT cold-started; FDR enabled.
**Input data**: `derkachi-fixture` full replay.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay Derkachi end-to-end | FDR contains per-keyframe (a) raw single-shot pose at first emission, (b) smoothed pose at iSAM2 convergence |
| 2 | After replay, parse FDR; for each past keyframe compute distance(raw, GT) and distance(smoothed, GT) | Per-keyframe pair extracted |
| 3 | Aggregate across keyframes | smoothed_error < raw_error for ≥80% of keyframes; mean improvement ≥ 5 m |
**Expected outcome**: Internal smoothing improves past-keyframe accuracy; FC-side retroactive correction NOT exercised (out of scope per Mode B revision A6).
**Max execution time**: 12 min.
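The step 3 aggregation can be expressed as a short check over per-keyframe error pairs. This is a sketch that assumes the FDR has already been parsed into `(raw_error_m, smoothed_error_m)` tuples; the function name is hypothetical.

```python
def smoothing_improves(pairs, min_fraction=0.80, min_mean_gain_m=5.0):
    """pairs: list of (raw_error_m, smoothed_error_m), one per past keyframe."""
    if not pairs:
        raise ValueError("no keyframes parsed from FDR")
    improved = sum(1 for raw, smoothed in pairs if smoothed < raw)
    mean_gain = sum(raw - smoothed for raw, smoothed in pairs) / len(pairs)
    return improved / len(pairs) >= min_fraction and mean_gain >= min_mean_gain_m
```

Both conditions are needed: the fraction alone would accept negligible improvements, and the mean alone could be carried by a few large ones.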
---
### FT-P-11: Cold-start initialization from FC EKF
**Summary**: Validates SUT initialization from FC EKF's last valid GPS + IMU-extrapolated position at GPS denial.
**Traces to**: AC-5.1
**Category**: Startup
**Preconditions**:
- `cold-boot-fixture` provides a frozen FC pose snapshot.
- SUT not running.
**Input data**: `cold-boot-fixture`.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start `ardupilot-plane-sitl` (or `inav-sitl`) with the frozen-pose snapshot loaded | SITL EKF reflects the snapshot pose |
| 2 | Start SUT | SUT queries FC EKF; reads pose; initializes |
| 3 | Push first nav-camera frame | First outbound estimate's lat/lon within ±50 m of the FC EKF snapshot pose |
**Expected outcome**: First emitted estimate uses FC EKF's pose as prior, within ±50 m tolerance.
**Max execution time**: 60 s.
---
### FT-P-12: GCS downsample at 1-2 Hz
**Summary**: Validates position estimates + confidence stream to the GCS (via `mavproxy-listener`) at 1-2 Hz.
**Traces to**: AC-6.1
**Category**: GCS / Telemetry
**Preconditions**:
- `mavproxy-listener` running and capturing to `.tlog`.
**Input data**: `derkachi-fixture` 60 s segment.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay | SUT emits to FC at runtime cadence (~3 Hz) AND to GCS at 1-2 Hz |
| 2 | After replay, parse `.tlog` for SUT-emitted GCS messages over the 60 s window | GCS rate within [1, 2] Hz inclusive |
**Expected outcome**: GCS-side rate observed in [1, 2] Hz over the window.
**Max execution time**: 90 s.
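A minimal sketch of the rate check in step 2, assuming the `.tlog` has already been parsed into a sorted list of message timestamps in seconds. The parsing itself is out of scope here and the helper names are hypothetical.

```python
def mean_rate_hz(timestamps_s):
    """Mean message rate over the observed span: intervals divided by elapsed time."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two messages to measure a rate")
    span = timestamps_s[-1] - timestamps_s[0]
    if span <= 0:
        raise ValueError("timestamps must be increasing")
    return (len(timestamps_s) - 1) / span


def gcs_rate_ok(timestamps_s, low_hz=1.0, high_hz=2.0):
    """True when the mean rate lies in the inclusive [low_hz, high_hz] band."""
    return low_hz <= mean_rate_hz(timestamps_s) <= high_hz
```

Counting intervals rather than messages keeps a perfectly periodic 2 Hz stream at exactly 2.0 Hz instead of slightly above the upper bound.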
---
### FT-P-13: GCS command path (operator re-loc hint)
**Summary**: Validates that GCS-originated commands (via standard MAVLink) can carry operator re-loc hints to the SUT.
**Traces to**: AC-6.2
**Category**: GCS / Telemetry
**Preconditions**:
- `mavproxy-listener` configured to send commands.
- SUT in `dead_reckoned` state (e.g. mid-blackout from FT-N-03 setup).
**Input data**: synthesized `STATUSTEXT` carrying re-loc hint from MAVPROXY.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | While SUT is in `dead_reckoned`, send re-loc-hint STATUSTEXT from MAVPROXY | SUT acknowledges the hint via FDR log entry |
| 2 | Push next nav-camera frame after hint | Next satellite-anchor attempt uses hint as a search prior |
**Expected outcome**: Hint received; next anchor attempt biases search; no rejection.
**Max execution time**: 60 s.
---
### FT-P-14: WGS84 output coordinate system
**Summary**: Validates output coordinates are in WGS84 (latitude/longitude in degrees, encoded as integers in units of 1e-7 degrees per the ArduPilot/iNav GPS convention).
**Traces to**: AC-6.3
**Preconditions**: any FT-P-01 / FT-P-02 run.
**Input data**: any.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Capture one outbound `GPS_INPUT` / `MSP2_SENSOR_GPS` from SITL | Lat/lon present; values in valid WGS84 range; scaled per protocol convention |
**Expected outcome**: Coordinates parse as WGS84 within Earth bounds.
**Max execution time**: 30 s.
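The range check could be sketched as follows, assuming the captured latitude and longitude arrive as integers in units of 1e-7 degrees. The helper names are hypothetical.

```python
def dege7_to_deg(value_e7):
    """Convert an integer in 1e-7 degree units to float degrees."""
    return value_e7 / 1e7


def is_valid_wgs84_e7(lat_e7, lon_e7):
    """True when both values are integers inside the WGS84 lat/lon bounds."""
    if not (isinstance(lat_e7, int) and isinstance(lon_e7, int)):
        return False
    return (-900_000_000 <= lat_e7 <= 900_000_000
            and -1_800_000_000 <= lon_e7 <= 1_800_000_000)
```

Comparing in the integer domain avoids float rounding at the exact ±90° and ±180° boundaries.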
---
### FT-P-15: Tile cache schema and resolution floor
**Summary**: Validates the tile cache manifest carries every required field and tiles meet the ≥0.5 m/px floor.
**Traces to**: AC-8.1, RESTRICT-SAT-2 (manifest schema)
**Preconditions**: `tile-cache-fixture` mounted.
**Input data**: tile cache.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | The SUT exposes a one-time cache-load self-check at startup; observe via FDR | Each tile manifest entry has CRS, tile matrix, dimension, lat-adjusted m/px, capture date, source, compression |
| 2 | Inspect m/px values | All ≥ 0.5 m/px; reject below floor |
**Expected outcome**: All loaded tiles pass schema check and resolution floor.
**Max execution time**: 30 s.
---
### FT-P-16: Pre-loaded cache (offline-only interface)
**Summary**: Validates the SUT loads tiles from the local cache only, with no in-flight Service calls.
**Traces to**: AC-8.3, RESTRICT-SAT-1
**Preconditions**: `tile-cache-fixture` mounted; `e2e-net` `internal: true` enforced (no internet egress).
**Input data**: `derkachi-fixture` 60 s segment.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay | SUT serves tiles from `/var/azaion/tile-cache` only |
| 2 | Observe network egress counter on `gps-denied-onboard` container | All egress to non-`e2e-net` destinations is 0 (paired with NFT-SEC-02) |
**Expected outcome**: 0 external egress; replay completes against local cache.
**Max execution time**: 90 s.
---
### FT-P-17: Mid-flight tile generation
**Summary**: Validates the SUT continuously orthorectifies nav-camera frames into basemap-projected tiles, deduplicates them, and stores them locally for landing-time upload.
**Traces to**: AC-8.4
**Preconditions**: empty `mid-flight-tile-output` directory in the FDR volume; mock-suite-sat-service running.
**Input data**: `derkachi-fixture` 5 min segment.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay | SUT generates and writes tiles to FDR's `mid-flight-tile-output/` |
| 2 | After replay, read tiles | ≥1 tile per ~3 s of high-quality nav frames; each tile carries quality metadata sufficient for the Service voting layer (per Mode B Fact #105) |
| 3 | Simulate landing event; SUT uploads to `mock-suite-sat-service` | Mock service receives all tiles with HTTP 202 |
**Expected outcome**: Tiles produced + deduplicated + uploaded with quality metadata.
**Max execution time**: 8 min.
---
### FT-P-18: No raw nav/AI-cam frame retention (storage policy)
**Summary**: Validates that no raw nav-camera or AI-camera frames are retained except the ≤0.1 Hz failed-tile-gen thumbnail log.
**Traces to**: AC-8.5
**Preconditions**: `derkachi-fixture` replay just completed.
**Input data**: post-replay state of FDR + tile cache.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Walk the FDR + tile cache for any file matching nav-camera raw-frame pattern (JPEG/RAW with original dimensions) | Only the failed-tile-gen thumbnail log files present (≤0.1 Hz cadence) |
| 2 | Verify thumbnail log is bounded by AC-NEW-3 FDR budget | Total thumbnail log < 1 GB over 8 h (NFT-LIM-02 cross-check) |
**Expected outcome**: 0 unauthorized raw frames retained.
**Max execution time**: 30 s (filesystem walk).
---
### FT-P-19: Satellite relocalization scale-ratio + scene-change
**Summary**: Validates UAV-frame ground footprint at deployment altitude is retrievable from cache regardless of internal tiling. Scene-change subset is reduced-confidence (PARTIAL — see traceability matrix).
**Traces to**: AC-8.6 (scale-ratio FULL; scene-change PARTIAL)
**Preconditions**: `tile-cache-fixture` mounted with multi-zoom-level coverage.
**Input data**: `still-image-set-60` (scale-ratio); the 2 paired `_gmaps.png` images (scene-change subset).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each still image, query cache top-K=10 retrieval | Top-K result includes a tile whose centre is within 100 m of the image's true centre (scale-ratio satisfied) |
| 2 | For the 2 paired images, run cross-domain matcher against the `_gmaps.png` reference | Scale-ratio match succeeds; scene-change behavior recorded (PARTIAL — full coverage requires a labeled change-pair dataset, deferred under D-PROJ-3) |
**Expected outcome**: Scale-ratio passes for 60/60; scene-change recorded as PARTIAL.
**Max execution time**: 5 min.
---
## Negative Scenarios
### FT-N-01: 350 m outlier injection tolerance
**Summary**: Validates the system tolerates up to 350 m outliers between two consecutive frames with airframe tilt up to ±20°.
**Traces to**: AC-3.1, RESTRICT-CAM-1 (nadir camera, tilt limits)
**Preconditions**: SUT running on `derkachi-fixture`; `outlier-injection-derkachi` injector primed in `medium` density.
**Input data**: `outlier-injection-derkachi` (medium).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay with injector active (every 10th frame replaced by far-away tile crop) | SUT detects outlier; rejects from anchor; estimate continues from prior valid state |
| 2 | Compare per-frame outbound estimate vs GT for non-outlier frames | Error_after_outlier ≤ error_before_outlier + 50 m; covariance grows monotonically across the outlier event |
**Expected outcome**: Outliers rejected; estimate degrades at most by 50 m drift; covariance monotonic.
**Max execution time**: 12 min.
---
### FT-N-02: Sharp-turn frame-to-frame failure expected
**Summary**: Negative twin of FT-P-07 — validates that during a sharp turn, frame-to-frame may LEGITIMATELY fail, and the system labels accordingly.
**Traces to**: AC-3.2 (negative path)
**Preconditions**: Same as FT-P-07.
**Input data**: sharp-turn segment of Derkachi (or synthetic gyro overlay).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay sharp-turn segment | During turn frames: `source_label` ∈ `{visual_propagated, dead_reckoned}`; covariance grows |
| 2 | After turn, observe label transition | Label returns to `satellite_anchored` once next anchor succeeds |
**Expected outcome**: Sharp-turn frames correctly mark themselves as not-satellite-anchored; recovery exercised in FT-P-07.
**Max execution time**: 5 min.
---
### FT-N-03: Extended outage triggers operator re-loc request
**Summary**: Validates that on ≥3 consecutive frames AND ≥2 s without estimate, the SUT requests operator re-loc via telemetry and continues dead-reckoned propagation.
**Traces to**: AC-3.4
**Preconditions**: `derkachi-fixture` + 3-frame outage injector primed.
**Input data**: synthetic outage on Derkachi.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Trigger 3-consecutive-frame failure (corrupt frames) | SUT fails to produce estimates for 3+ frames |
| 2 | Wait ≥2 s | STATUSTEXT containing `OPERATOR_RELOC_REQUEST` emitted to `mavproxy-listener` |
| 3 | During outage, observe FC outbound | Estimates labeled `dead_reckoned` continue; FC uses last-known + IMU extrapolation |
**Expected outcome**: Re-loc request emitted; dead-reckoned estimates continue.
**Max execution time**: 60 s.
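The trigger condition (at least 3 consecutive failed frames AND at least 2 s without an estimate) can be sketched as below, which helps the harness decide when the STATUSTEXT is expected. Measuring the 2 s from the first failed frame is an assumption; the spec could equally intend the last successful estimate. The function name is hypothetical.

```python
def reloc_request_due(frames, min_failed=3, min_gap_s=2.0):
    """frames: ordered (timestamp_s, has_estimate) tuples.

    Returns True once a run of consecutive failures reaches both thresholds.
    Assumption: the outage clock starts at the first failed frame of the run.
    """
    run_start = None
    run_len = 0
    for ts, has_estimate in frames:
        if has_estimate:
            run_start, run_len = None, 0  # any success resets the run
            continue
        if run_start is None:
            run_start = ts
        run_len += 1
        if run_len >= min_failed and ts - run_start >= min_gap_s:
            return True
    return False
```

Because both thresholds must hold, three failed frames at the ~3 Hz runtime cadence do not trigger the request on their own.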
---
### FT-N-04: Visual blackout + spoofed GPS combined failsafe
**Summary**: Validates the AC-3.5 + AC-NEW-8 combined failsafe: switch label, reject spoof, propagate from last trusted state, monotonic covariance, STATUSTEXT.
**Traces to**: AC-3.5, AC-NEW-8
**Preconditions**: `blackout-spoof-derkachi` injector primed for 5 s, 15 s, 35 s windows; FC inbound stream patched to inject spoofed GPS.
**Input data**: `blackout-spoof-derkachi` (each window run as a sub-case).
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Begin blackout window AND inject spoofed GPS in same temporal window | Within ≤1 frame OR ≤400 ms: `source_label = dead_reckoned`; spoofed GPS rejected from estimator input; covariance grows monotonically |
| 2 | Observe `horiz_accuracy` field in outbound `GPS_INPUT` (AP) | `horiz_accuracy` ≥ 95% covariance semi-major axis (no under-reporting) |
| 3 | Observe GCS stream | `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 1-2 Hz throughout blackout |
| 4 | For 35 s window only | Per AC-NEW-8: when 95% covariance crosses 100 m → fix-quality degraded; when crosses 500 m OR blackout exceeds 30 s → `horiz_accuracy=999.0` AND `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT |
| 5 | End blackout; restore FC GPS-health | Recovery only after FC GPS-health stable + non-spoofed for ≥10 s AND a visual/satellite consistency check succeeds |
**Expected outcome**: All five steps' assertions pass for each window (step 4 applies to the 35 s window only).
**Max execution time**: 5 min (3 windows × ~90 s each).
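The tiering in step 4 and the under-reporting check in step 2 can be sketched as a small oracle that the harness evaluates per frame. Treating "crosses" as greater-than-or-equal is an assumption, the tier names are illustrative, and none of these helpers exist in the SUT.

```python
FAILSAFE_HORIZ_ACCURACY = 999.0


def blackout_tier(cov95_m, blackout_s):
    """Expected tier from the 95% covariance semi-major axis and blackout duration."""
    if cov95_m >= 500.0 or blackout_s > 30.0:
        return "failsafe"
    if cov95_m >= 100.0:
        return "degraded"
    return "nominal"


def horiz_accuracy_ok(reported_m, cov95_m, blackout_s):
    """Failsafe must report 999.0; otherwise the report must not under-state covariance."""
    if blackout_tier(cov95_m, blackout_s) == "failsafe":
        return reported_m == FAILSAFE_HORIZ_ACCURACY
    return reported_m >= cov95_m


def is_monotonic_non_decreasing(values):
    """Covariance must never shrink during the blackout."""
    return all(b >= a for a, b in zip(values, values[1:]))
```

For the 5 s and 15 s windows only the nominal and degraded tiers should be reachable unless the covariance itself crosses 500 m.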
---
### FT-N-05: Stale-tile rejection on freshness violation
**Summary**: Validates that tiles violating AC-8.2 freshness window are rejected (or downgraded so they cannot produce a `satellite_anchored` label).
**Traces to**: AC-8.2, AC-NEW-6
**Preconditions**: `synth-age-tile-set` (`synth-age-7mo` for active-conflict, `synth-age-13mo` for rear) mounted instead of fresh fixture.
**Input data**: `still-image-set-60` against the aged cache.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay against `synth-age-7mo` (configure SUT for active-conflict sector) | SUT either rejects load OR loads but never emits `satellite_anchored` from these tiles |
| 2 | Replay against `synth-age-13mo` (configure SUT for rear sector) | Same: reject or non-`satellite_anchored` only |
**Expected outcome**: 0 frames emit `satellite_anchored` from aged tiles.
**Max execution time**: 5 min.
---
### FT-N-06: Mid-flight tile freshness (current-timestamped)
**Summary**: Validates that mid-flight-generated tiles are timestamped as current and treated as fresh per AC-NEW-6.
**Traces to**: AC-NEW-6 (positive sub-case)
**Preconditions**: empty `mid-flight-tile-output`.
**Input data**: `derkachi-fixture` 5 min segment.
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay | SUT generates mid-flight tiles |
| 2 | Inspect each generated tile's manifest entry | `capture_date` is within ±60 s of generation wall-clock; treated as fresh by the freshness gate |
**Expected outcome**: All mid-flight tiles current-timestamped and fresh.
**Max execution time**: 6 min.
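A sketch of the step 2 timestamp assertion, assuming the manifest `capture_date` and the generation wall-clock are both available as timezone-aware `datetime` values. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_current_timestamped(capture_date, generated_at, tolerance_s=60):
    """True when capture_date lies within +/- tolerance_s of the generation time."""
    return abs(capture_date - generated_at) <= timedelta(seconds=tolerance_s)
```

Using timezone-aware values on both sides avoids a silent offset between the SUT clock and the runner clock.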
@@ -0,0 +1,248 @@
# Test Environment
## Overview
**System under test (SUT)**: `gps-denied-onboard` companion-PC service that produces WGS84 position estimates from nav-camera frames + FC IMU/attitude and emits them to the FC over its native external-positioning interface. Public boundaries (the only surfaces tests interact with):
- **Inbound — nav-camera frames**: V4L2 / GStreamer source (production: USB / MIPI-CSI / GigE per `restrictions.md`; tests: file-backed source replaying `_docs/00_problem/input_data/AD0000NN.jpg` or `flight_derkachi/flight_derkachi.mp4`).
- **Inbound — FC telemetry**: MAVLink (ArduPilot) or MSP2 (iNav) inbound stream carrying `SCALED_IMU2`, `ATTITUDE`, `GLOBAL_POSITION_INT` (or MSP equivalents). Tests replay `flight_derkachi/data_imu.csv` through a thin replayer.
- **Inbound — satellite tile cache**: filesystem + on-disk index (FAISS HNSW + tile manifest). Tests load a fixture cache mounted as a Docker volume.
- **Outbound — FC external-positioning**: MAVLink `GPS_INPUT` (ArduPilot Plane) OR MSP2 `MSP2_SENSOR_GPS` (iNav). Tests observe these by spinning up the corresponding open-source SITL and reading what reaches the FC.
- **Outbound — GCS telemetry**: MAVLink to QGroundControl (1-2 Hz downsample of estimates + STATUSTEXT). Tests subscribe via a passive MAVLink listener.
- **Outbound — Flight Data Recorder**: NVM filesystem (per AC-NEW-3). Tests read the resulting FDR archive after the run.
**Consumer app purpose**: The e2e harness drives the SUT through these public boundaries — replaying frames + telemetry, mounting tile-cache fixtures, observing FC-side acceptance via SITL, and parsing FDR output. It NEVER imports SUT modules, NEVER queries SUT internal state, and NEVER touches the SUT's filesystem outside the FDR output directory.
## Two-tier execution profile
This project requires two distinct test environments because the production target is Jetson hardware and AC-4.1/AC-4.2/AC-NEW-5 cannot be honestly validated on a generic x86 dev workstation.
| Tier | Hardware | What it covers | What it skips |
|------|----------|----------------|---------------|
| **Tier-1 (workstation Docker)** | x86 dev workstation, optional NVIDIA dGPU for TensorRT validation | All `FT-*` correctness, schema, `NFT-RES-*` resilience scenarios, `NFT-SEC-*` security scenarios, `NFT-LIM-*` storage budgets | Any AC whose pass criterion is bound to Jetson Orin Nano Super wall-clock latency or thermal envelope: AC-4.1 / AC-4.2 / AC-NEW-1 / AC-NEW-5 |
| **Tier-2 (Jetson hardware loop)** | Jetson Orin Nano Super (pinned hardware per `restrictions.md`), thermal chamber for AC-NEW-5 | AC-4.1 latency p95, AC-4.2 memory, AC-NEW-1 cold-start TTFF, AC-NEW-5 thermal envelope (chamber-only) | Iteration speed (manual hardware time) |
CI runs Tier-1 on every PR. Tier-2 runs on hardware-attached runners on a nightly cadence and pre-release gate; results are imported into the same CSV report format as Tier-1.
## Docker Environment (Tier-1)
### Services
| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| `gps-denied-onboard` | local build (`docker/Dockerfile`) | The SUT. Production binary built with `BUILD_VINS_MONO=OFF` per locked sub-decision D-C1-1-SUB-A; research builds run a parallel job with `BUILD_VINS_MONO=ON` | 14550/udp (MAVLink to GCS), 5760/tcp (MSP2 to iNav SITL) |
| `ardupilot-plane-sitl` | `ardupilot/ardupilot-sitl:plane-stable` | ArduPilot Plane SITL. Receives `GPS_INPUT` from the SUT; we read its EKF source-set state to validate AC-4.3, AC-NEW-2, AC-5.x | 14550/udp (MAVLink) |
| `inav-sitl` | `inavflight/inav-sitl:9.0.0` | iNav SITL. Receives `MSP2_SENSOR_GPS` from the SUT; we read its GPS provider state | 5760/tcp (MSP2 over TCP per iNav SITL convention) |
| `mock-suite-sat-service` | local build (`tests/fixtures/mock-suite-sat`) | Stubs the parent-suite Satellite Service tile-publish API (read-only ingest contract for AC-NEW-7 voting layer). Returns deterministic fixture tiles | 8080/tcp |
| `e2e-runner` | local build (`tests/runner`) | Pytest-based harness. Drives all replays, reads FDR output, spins SITL scenarios | — |
| `mavproxy-listener` | `ardupilot/mavproxy:latest` | Passive MAVLink listener that captures the SUT → GCS stream into a per-run `.tlog` for assertions | 14551/udp |
### Networks
| Network | Services | Purpose |
|---------|----------|---------|
| `e2e-net` | all | Isolated test network. No host networking, no internet. Per RESTRICT-SAT-1, the SUT must NEVER reach an external satellite provider during a flight; a deny-all egress rule on `e2e-net` enforces this and is itself a security test (NFT-SEC-02). |
### Volumes
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| `tile-cache-fixture` | `gps-denied-onboard:/var/azaion/tile-cache:ro` | Pre-built FAISS HNSW index + tile filesystem. Built once per test run from `tests/fixtures/tile-cache-builder/` from the 60 still-image satellite references and the Derkachi route bbox. Read-only mount mirrors AC-8.3 pre-flight load behavior. |
| `fdr-output` | `gps-denied-onboard:/var/azaion/fdr` | Per-flight FDR write target (AC-NEW-3 64 GB cap enforced via Docker `--storage-opt size=64g` on this volume) |
| `input-data` | `e2e-runner:/test-data:ro` | Bind mount of `_docs/00_problem/input_data/` for replay |
| `expected-results` | `e2e-runner:/expected:ro` | Bind mount of `_docs/00_problem/input_data/expected_results/` for assertions |
### docker-compose structure
```yaml
services:
  gps-denied-onboard:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      args:
        BUILD_VINS_MONO: "OFF"
    networks: [e2e-net]
    volumes:
      - tile-cache-fixture:/var/azaion/tile-cache:ro
      - fdr-output:/var/azaion/fdr
    environment:
      ONBOARD_FC_ADAPTER: ${FC_ADAPTER}        # ardupilot | inav, set per scenario
      ONBOARD_VIO_STRATEGY: ${VIO_STRATEGY}    # okvis2 | klt_ransac (production); vins_mono only in research build
      MAVLINK_SIGNING_PASSKEY_FILE: /run/secrets/mavlink_passkey
    depends_on:
      - mock-suite-sat-service
  ardupilot-plane-sitl:
    image: ardupilot/ardupilot-sitl:plane-stable
    networks: [e2e-net]
    command: ["--vehicle=ArduPlane", "--gps-type=14"]  # GPS_TYPE=14 = MAV per ArduPilot SITL_simulation_parameters.html
  inav-sitl:
    image: inavflight/inav-sitl:9.0.0
    networks: [e2e-net]
    # iNav SITL exposes MSP on TCP 5760 (UART1) per docs/SITL/SITL.md
  mock-suite-sat-service:
    build: ../fixtures/mock-suite-sat
    networks: [e2e-net]
    # Egress restriction enforced at network level, not service level
  e2e-runner:
    build: ../runner
    networks: [e2e-net]
    volumes:
      # input-data and expected-results are bind mounts (see the Volumes table), not named volumes.
      # The relative paths assume this file sits two levels below the repo root, like the build context above.
      - ../../_docs/00_problem/input_data:/test-data:ro
      - ../../_docs/00_problem/input_data/expected_results:/expected:ro
      - fdr-output:/fdr:ro
    depends_on:
      - gps-denied-onboard
      - ardupilot-plane-sitl
      - inav-sitl
      - mavproxy-listener
  mavproxy-listener:
    image: ardupilot/mavproxy:latest
    networks: [e2e-net]
networks:
  e2e-net:
    driver: bridge
    internal: true  # NO external connectivity (enforces RESTRICT-SAT-1)
volumes:
  tile-cache-fixture: {}
  fdr-output: {}
```
## Consumer Application
**Tech stack**: Python 3.12, pytest 8.x, pymavlink (MAVLink ground side), `msp_gps_toy` (MSP2 ground side, Rust binary called via subprocess), OpenCV ≥4.12.0 (frame source replay), numpy + scipy (geodesic-distance assertions in WGS84).
**Entry point**: `pytest tests/e2e/` from inside `e2e-runner`. Each scenario is a parameterized pytest case keyed by FC adapter (`ardupilot` / `inav`).
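The geodesic-distance assertions mentioned in the tech stack can be illustrated with a minimal sketch. It uses the haversine formula on a sphere of mean Earth radius, which is an approximation of a true WGS84 ellipsoidal geodesic (the error is well below the tolerances used in these tests, but the real harness may use a different routine). The names `haversine_m` and `within_tolerance` are hypothetical and do not come from the harness.

```python
import math

# Mean Earth radius in metres. Spherical approximation, not the WGS84 ellipsoid.
EARTH_MEAN_RADIUS_M = 6_371_008.8


def haversine_m(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlam = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(a))


def within_tolerance(estimate, ground_truth, tolerance_m):
    """True when the estimated (lat, lon) lies within tolerance_m of the ground truth."""
    return haversine_m(*estimate, *ground_truth) <= tolerance_m
```

A scenario would call `within_tolerance((lat, lon), (gt_lat, gt_lon), tolerance_m)` with the tolerance taken from the AC under test.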
### Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| Frame source | V4L2 / GStreamer file source | UNIX domain socket / shared `/test-data` mount | none (local) |
| FC telemetry inbound | MAVLink (AP) or MSP2 (iNav) | `udp:gps-denied-onboard:14550` (AP) or `tcp:gps-denied-onboard:5760` (iNav) | MAVLink 2.0 message signing on AP per D-C8-9 (passkey via Docker secret); iNav unsigned per accepted residual risk |
| Tile cache | Filesystem read | `/var/azaion/tile-cache` (read-only mount) | filesystem perms |
| FC external-pos outbound observation | Read SITL EKF source-set + GLOBAL_POSITION_INT replay back from SITL | `udp:ardupilot-plane-sitl:14550` or `tcp:inav-sitl:5760` | passive listener |
| GCS telemetry observation | MAVLink listener | `udp:mavproxy-listener:14551` (forwarded from SUT 14550) | none |
| FDR output | Filesystem read post-run | `/fdr` (read-only mount) | filesystem perms |
| Suite Sat Service mock | HTTP/JSON | `http://mock-suite-sat-service:8080` | none (test) |
### What the consumer does NOT have access to
- No direct access to the SUT's internal state (GTSAM iSAM2 graph, FAISS index in-memory, OpenCV intermediate buffers, VioStrategy implementation pointer).
- No internal Python/C++ module imports from the SUT.
- No shared memory or filesystem with the SUT outside the four explicit mounts (`tile-cache-fixture` r/o, `fdr-output` r/o from runner side, `input-data` r/o, `expected-results` r/o).
- No bypass of the FC-side acceptance check — every AC-4.3 assertion goes through SITL.
## CI/CD Integration
**When to run**:
- Tier-1 (workstation Docker): on every PR to `dev` branch and nightly on `dev` HEAD.
- Tier-2 (Jetson hardware loop): nightly on `dev`, and as a hard gate before any release tag.
- AC-NEW-5 thermal envelope: quarterly on chamber-attached Jetson runner and before any release tag; failures block release tags only.
**Pipeline stage**:
- Tier-1 fits in the standard CI matrix as a single job (~30-45 min wall-clock for the full suite at first cut).
- Tier-2 is a separate workflow on `self-hosted-jetson-orin` runner.
**Gate behavior**: Tier-1 blocks PR merge on any test failure. Tier-2 blocks release tag on any test failure. Chamber tests are warning-only on PRs and blocking on release tags.
**Timeout**:
- Tier-1: 60 min per matrix entry.
- Tier-2: 4 hr per matrix entry (allows for full Derkachi 8 min replay × ~10 scenarios + cold-boot loops).
- Thermal chamber AC-NEW-5: 9 hr (8 h hot-soak + setup/teardown).
## Reporting
**Format**: CSV (one row per test).
**Columns**: `test_id, test_name, traces_to, fc_adapter, vio_strategy, tier, started_at_utc, execution_time_ms, result, error_message, evidence_paths`
- `traces_to`: comma-separated AC/RESTRICT IDs from the traceability matrix.
- `fc_adapter`: `ardupilot` | `inav` | `n/a`.
- `vio_strategy`: `okvis2` | `klt_ransac` | `vins_mono` | `n/a` (research-build only for `vins_mono`).
- `tier`: `tier1-docker` | `tier2-jetson` | `tier2-chamber`.
- `result`: `PASS` | `FAIL` | `SKIP` | `XFAIL` (XFAIL only allowed for AC explicitly marked NOT COVERED in the traceability matrix and not yet promoted to a real test).
- `evidence_paths`: comma-separated paths inside the run-output bundle (`.tlog` files, FDR archives, screenshots, profiler traces) supporting the verdict.
**Output path**: `e2e-results/run-${RUN_ID}/report.csv` plus a per-run bundle of evidence at `e2e-results/run-${RUN_ID}/evidence/`.
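As an illustration of the report format above, the following sketch writes rows with Python's `csv` module. The column names and the allowed `result` values come from this section; the function names `write_report` and `render_report` are hypothetical. Because `traces_to` and `evidence_paths` are themselves comma-separated, the sketch relies on the `csv` module to quote those fields rather than joining strings by hand.

```python
import csv
import io

REPORT_COLUMNS = [
    "test_id", "test_name", "traces_to", "fc_adapter", "vio_strategy", "tier",
    "started_at_utc", "execution_time_ms", "result", "error_message", "evidence_paths",
]
ALLOWED_RESULTS = {"PASS", "FAIL", "SKIP", "XFAIL"}


def write_report(rows, stream):
    """Write one CSV row per test, rejecting result values outside the documented set."""
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLUMNS)
    writer.writeheader()
    for row in rows:
        if row["result"] not in ALLOWED_RESULTS:
            raise ValueError(f"unexpected result value: {row['result']!r}")
        writer.writerow(row)


def render_report(rows):
    """Return the report as a string (convenient for tests and previews)."""
    buffer = io.StringIO()
    write_report(rows, buffer)
    return buffer.getvalue()
```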
## Test Execution
**Decision (2026-05-09)**: **both** — Tier-1 Docker + Tier-2 Jetson hardware loop. Confirmed at the Hardware-Dependency Assessment Step 4 gate.
### Hardware dependencies found (Phase 3 → Hardware Assessment scan)
| Category | Indicator | Source file |
|---|---|---|
| GPU / CUDA | TensorRT engines (`.engine`, SM 87, JetPack 6.2, TRT 10.3) | `_docs/01_solution/solution.md` PRE-FLIGHT block |
| GPU / CUDA | DISK+LightGlue FP16 inference | `_docs/01_solution/solution.md` RUNTIME block (C3) |
| GPU / CUDA pin | Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared LPDDR5, 25 W) | `_docs/00_problem/restrictions.md` § Onboard Hardware |
| Sensors / Cameras | ADTi 20MP 20L V1 nadir camera over USB / MIPI-CSI / GigE | `_docs/00_problem/restrictions.md` § Cameras |
| Sensors / Cameras | V4L2 / GStreamer frame source (production) | `_docs/02_document/tests/environment.md` § Overview |
| OS-specific services | High-rate IMU via UART/MAVLink to FC | `_docs/00_problem/restrictions.md` § Sensors & Integration |
| OS-specific services | Per-FC inbound (MAVLink GPS_INPUT for AP, MSP2 over UART for iNav) | `_docs/00_problem/restrictions.md` § Sensors & Integration |
| OS-specific services | tegrastats / jetson_stats for thermal telemetry | `_docs/02_document/tests/resource-limit-tests.md` NFT-LIM-04 |
| Thermal envelope | -20 °C to +50 °C operating envelope, 25 W TDP, 8 h duty cycle | `_docs/00_problem/restrictions.md` § Failsafe & Safety + AC-NEW-5 |
(Step 2 Code scan returned zero indicators because no source code exists yet — this is the planning phase. Decompose → Implement will produce `requirements.txt` / `pyproject.toml` / `Cargo.toml` entries that confirm: `tensorrt`, `pycuda`, `pymavlink`, `gtsam`, `faiss-gpu`, `opencv-python>=4.12.0`, `jetson-stats`.)
### Execution instructions — Tier-1 (Docker)
**Prerequisites**:
- Docker 24+ with Compose v2.
- NVIDIA Container Toolkit if the workstation has an NVIDIA dGPU (lets the SUT exercise the TensorRT path). TensorRT requires an NVIDIA GPU, so without one the SUT has to run its non-TensorRT CPU inference fallback.
- ≥16 GB host RAM, ≥80 GB free disk for `tile-cache-fixture` + `fdr-output` + image build cache.
**How to start**:
```bash
cd e2e/docker
export FC_ADAPTER=ardupilot # or: inav (parameterized per scenario in CI)
export VIO_STRATEGY=okvis2 # or: klt_ransac (production binary)
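# --exit-code-from makes the compose exit code follow the e2e-runner container's exit code
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit --exit-code-from e2e-runner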
```
The run reports to `./e2e-results/run-${RUN_ID}/report.csv` (see § Reporting). Exit code matches the test verdict.
**Environment variables**:
- `FC_ADAPTER` ∈ `{ardupilot, inav}` — selects which SITL the SUT talks to.
- `VIO_STRATEGY` ∈ `{okvis2, klt_ransac}` for the production binary; `vins_mono` only when the research binary (`BUILD_VINS_MONO=ON`) is the build.
- `MAVLINK_SIGNING_PASSKEY_FILE` — path to the Docker secret loaded with the test passkey for FT-P-09-AP / NFT-SEC-03.
**Skipped on Tier-1**: `NFT-PERF-01` (AC-4.1 latency p95 — Jetson-bound), `NFT-LIM-01` (AC-4.2 memory — Jetson-bound), `NFT-PERF-03` (AC-NEW-1 cold-start — Jetson-bound), `NFT-LIM-04` (AC-NEW-5 workstation thermal-day baseline — Jetson-bound), AC-NEW-5 chamber portion (chamber-bound).
### Execution instructions — Tier-2 (Jetson hardware loop)
**Prerequisites**:
- Jetson Orin Nano Super (per `restrictions.md` § Onboard Hardware).
- JetPack 6.2 + CUDA + TensorRT 10.3 + cuDNN per D-C7-9.
- Workstation thermal-day environment for NFT-LIM-04 baseline. Chamber-attached runner for AC-NEW-5 chamber portion (separate quarterly job; not run in standard CI).
- ArduPilot Plane SITL + iNav SITL run on the same Jetson, OR on a paired x86 host on the same network — both are supported.
- Real ADTi 20MP 20L V1 camera connected via USB/MIPI-CSI/GigE; OR file-replay source if camera unavailable (in which case all `AC-2.x` cross-validation is `XFAIL` for that run).
**How to start**:
```bash
cd e2e/jetson
sudo systemctl restart gps-denied-onboard.service
./run-tier2.sh --fc-adapter ardupilot --vio-strategy okvis2 --duration 8h
# or:
./run-tier2.sh --fc-adapter inav --vio-strategy klt_ransac --duration 5min
```
Outputs the same CSV format as Tier-1 (one report.csv per run).
**Environment variables**: same as Tier-1 plus:
- `TIER2_CHAMBER_AMBIENT_C` — ambient temperature for AC-NEW-5 chamber runs.
- `TIER2_CAMERA_DEVICE`: `/dev/video0` (production) or a file path for replay mode.
### CI runner mapping
- `ubuntu-24.04` (GitHub-hosted) → Tier-1 Docker, every PR + nightly. ~30-45 min per matrix entry.
- `self-hosted-jetson-orin` → Tier-2 Jetson, nightly on `dev` HEAD + pre-release gate. ~4 hr per matrix entry.
- `self-hosted-jetson-orin-chamber` → AC-NEW-5 hot-soak. Quarterly + before any release tag. ~9 hr.
**Matrix dimensions**: `FC_ADAPTER × VIO_STRATEGY × build_kind` where `build_kind ∈ {production, research}`. Production `vins_mono` is excluded (D-C1-1-SUB-A locked); research includes all three VioStrategy values.
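The matrix rule above can be sketched as a small enumeration. The value sets are taken from this document; the function name `matrix_entries` is hypothetical, and how CI actually materializes the matrix is not specified here.

```python
from itertools import product

FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
BUILD_KINDS = ("production", "research")


def matrix_entries():
    """All (fc_adapter, vio_strategy, build_kind) combinations, minus production vins_mono."""
    return [
        (fc, vio, build)
        for fc, vio, build in product(FC_ADAPTERS, VIO_STRATEGIES, BUILD_KINDS)
        # D-C1-1-SUB-A: vins_mono is never part of the production build.
        if not (build == "production" and vio == "vins_mono")
    ]
```

Under these assumptions the matrix has 10 entries: 4 production (2 FC adapters × 2 strategies) and 6 research (2 × 3). The list could be fed to `pytest.mark.parametrize` or to a CI matrix generator.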
@@ -0,0 +1,126 @@
# Performance Tests
All performance tests honor the per-tier execution profile from `environment.md`. Latency and memory tests bound to Jetson Orin Nano Super hardware run on Tier-2 only; metrics that don't depend on hardware (e.g. inter-emit interval correctness, GCS rate) run on both tiers.
### NFT-PERF-01: End-to-end latency p95 budget
**Summary**: Validates the AC-4.1 end-to-end latency budget (camera capture → GPS to FC) on the pinned hardware.
**Traces to**: AC-4.1, D-CROSS-LATENCY-1
**Metric**: Wall-clock latency from frame-capture timestamp to outbound `GPS_INPUT` (AP) / `MSP2_SENSOR_GPS` (iNav) reception at the SITL container.
**Preconditions**:
- Tier-2 only — Jetson Orin Nano Super, JetPack 6.2, TensorRT 10.3 per D-C7-9.
- `tile-cache-fixture` pre-loaded.
- SUT cold-started THEN warmed up for 30 s of replay before measurement window starts.
- Two configurations measured: (a) `K=3` baseline at +25 °C, (b) `K=2 + Jacobian-cov` hybrid auto-degrade at +50 °C ambient (NFT-9 in the solution draft).
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Run 30 s warm-up replay (excluded from measurement) | none |
| 2 | Run 5 min Derkachi replay at 3 Hz target cadence | per-frame latency: `t_emit_at_sitl - t_capture` |
| 3 | Record per-frame latency to CSV; compute p50, p95, p99 | distribution |
| 4 | Repeat at +50 °C ambient (chamber if available, else flagged) | distribution under thermal-throttle hybrid |
**Pass criteria**:
- (a) `K=3` baseline: p95 ≤ 400 ms (AC-4.1 hard bound).
- (b) `K=2 + Jacobian-cov` hybrid: p95 ≤ 400 ms still satisfied after auto-degrade (proves D-CROSS-LATENCY-1 effective).
- ≤10% frame drops under sustained load (AC-4.1 allowance).
- Per-stage latency partitioning (D-CROSS-LATENCY-1 table) recorded for all stages: C1 OKVIS2 / C2 UltraVPR / C2.5 / C3 / C3.5 / C4 / C4 cov / C5 / serialization / OS jitter — used in NFT-PERF-01 evidence bundle for budget-margin tracking.
**Duration**: 2 × 5.5 min replays (warm-up + measurement) per configuration; ~25 min total per FC adapter.
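A minimal sketch of how the p50/p95/p99 figures and the two numeric pass criteria could be evaluated from the per-frame latency CSV. It uses the nearest-rank percentile definition, which is an assumption: other definitions (for example the linear interpolation that `numpy.percentile` uses by default) give slightly different values on small samples. The function names are hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct % of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms, frames_sent, p95_budget_ms=400.0, max_drop_ratio=0.10):
    """Summarize per-frame latencies and evaluate the AC-4.1 p95 and frame-drop bounds."""
    drop_ratio = (frames_sent - len(latencies_ms)) / frames_sent
    p95 = percentile_nearest_rank(latencies_ms, 95)
    return {
        "p50": percentile_nearest_rank(latencies_ms, 50),
        "p95": p95,
        "p99": percentile_nearest_rank(latencies_ms, 99),
        "drop_ratio": drop_ratio,
        "pass": p95 <= p95_budget_ms and drop_ratio <= max_drop_ratio,
    }
```

Here `frames_sent` is the number of frames the runner pushed and `latencies_ms` holds one entry per frame that produced an emit, so the difference counts as dropped frames.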
---
### NFT-PERF-02: Frame-by-frame streaming (no batching)
**Summary**: Validates AC-4.4 — estimates streamed frame-by-frame with no batching/delay.
**Traces to**: AC-4.4
**Metric**: Inter-emit interval at SITL.
**Preconditions**:
- Tier-1 OR Tier-2.
- SUT warmed up for 30 s.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Replay Derkachi 5 min at 3 Hz | per-frame inter-emit interval at SITL |
| 2 | Compute distribution | p95 of inter-emit interval |
**Pass criteria**: p95 inter-emit interval ≤ inter-frame-interval × 1.05 (i.e. ≤ ~350 ms at 3 Hz target). No window of ≥3 missed-emit gaps.
**Duration**: 6 min.
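The pass criterion above can be sketched as follows. Two points are assumptions rather than statements from the AC: a "missed emit" is taken to be an inter-emit interval longer than 1.5× the target period, and the "window of ≥3" clause is read as three or more such intervals in a row. The function name `inter_emit_check` is hypothetical.

```python
import math


def inter_emit_check(emit_times_s, target_hz=3.0, slack=1.05, miss_factor=1.5, max_miss_run=2):
    """Check p95 of inter-emit intervals and the longest run of over-long intervals."""
    period = 1.0 / target_hz
    intervals = [b - a for a, b in zip(emit_times_s, emit_times_s[1:])]
    if not intervals:
        raise ValueError("need at least two emit timestamps")
    ordered = sorted(intervals)
    # Nearest-rank p95 of the inter-emit interval.
    p95 = ordered[max(1, math.ceil(95 * len(ordered) / 100)) - 1]
    longest_run = run = 0
    for interval in intervals:
        # Assumption: an interval above miss_factor * period counts as a missed emit.
        run = run + 1 if interval > miss_factor * period else 0
        longest_run = max(longest_run, run)
    return {
        "p95_s": p95,
        "longest_miss_run": longest_run,
        "pass": p95 <= slack * period and longest_run <= max_miss_run,
    }
```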
---
### NFT-PERF-03: Cold-start TTFF
**Summary**: Validates AC-NEW-1 cold-start time-to-first-fix from companion boot.
**Traces to**: AC-NEW-1
**Metric**: Wall-clock from SUT container-ready event (or `systemctl start` on Tier-2) to first valid outbound `GPS_INPUT` / `MSP2_SENSOR_GPS` arrival at SITL.
**Preconditions**:
- Tier-2 (Jetson) for the canonical run; Tier-1 acceptable for trend-tracking.
- `cold-boot-fixture` provides the FC EKF snapshot (loaded into SITL before the SUT cold boot).
- `tile-cache-fixture` already mounted (cache-load is part of the TTFF budget per AC-NEW-1 wording "from boot").
- 50 cold boots executed back-to-back to populate distribution.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Stop SUT; clear in-memory state | container down |
| 2 | Start SUT (record `t_start`) | timestamp |
| 3 | First outbound message arrives at SITL (record `t_first_emit`) | TTFF = `t_first_emit - t_start` |
| 4 | Repeat 50 times | distribution |
**Pass criteria**: p95 TTFF < 30 s.
**Duration**: ~30 min (50 × ~30 s + restart overhead).
---
### NFT-PERF-04: Spoofing-promotion latency
**Summary**: Validates AC-NEW-2 — when the FC signals GPS denial/spoof, the onboard estimate is promoted to the FC's primary position source with p95 latency < 3 s.
**Traces to**: AC-NEW-2
**Metric**: Latency from spoof-onset signal to FC-side EKF source-set switch (AP: `EK3_SRC1_POSXY` flips to companion-source value; iNav: GPS provider state reflects companion as primary).
**Preconditions**:
- Tier-1 acceptable (mostly software loops + SITL).
- `derkachi-fixture` running with SUT in `satellite_anchored` steady state.
- Spoof injector primed.
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Inject false GPS into FC SITL (record `t_spoof_onset`) | timestamp |
| 2 | Observe FC EKF source-set state via parameter read polling at 100 Hz (record `t_promotion`) | promotion latency = `t_promotion - t_spoof_onset` |
| 3 | Repeat 50 trials per FC (parameterized on `ardupilot` + `inav`) | distribution per FC |
**Pass criteria**: p95 < 3 s on both FCs.
**Duration**: ~25 min per FC (50 trials × ~30 s including pre-trial reset).
---
### Per-stage latency partition record (informational, not pass/fail)
NFT-PERF-01 captures per-stage latencies matching the D-CROSS-LATENCY-1 partition table from `solution.md`. The recorded targets are tracked for budget-margin trend (regression detector), not as independent pass/fail thresholds — only AC-4.1 p95 ≤ 400 ms is the hard gate.
| Stage | K=3 target p95 | K=2 hybrid target p95 |
|-------|---------------|----------------------|
| C1 OKVIS2 VIO | ≤ 60 ms | ≤ 60 ms |
| C2 UltraVPR query | ≤ 15 ms | ≤ 15 ms |
| C2.5 Top-N re-rank | ≤ 80 ms | ≤ 80 ms |
| C3 DISK+LightGlue × N | ≤ 200 ms (steady) | ≤ 140 ms (thermal) |
| C3.5 AdHoP (conditional, p99) | ≤ 100 ms when triggered | ≤ 60 ms when triggered |
| C4 solvePnPRansac | ≤ 25 ms | ≤ 25 ms |
| C4 covariance recovery | ≤ 100 ms (steady) | ≤ 25 ms (thermal) |
| C5 iSAM2 update | ≤ 15 ms | ≤ 15 ms |
| MAVLink/MSP2 + UART/USB | ≤ 30 ms | ≤ 30 ms |
| OS scheduling jitter (p99) | ≤ 50 ms | ≤ 50 ms |
@@ -0,0 +1,108 @@
# Resilience Tests
### NFT-RES-01: FC IMU-only fallback after >3 s without estimate
**Summary**: Validates AC-5.2 — on >3 s without an estimate, the FC falls back to IMU-only dead reckoning AND the SUT logs the failure.
**Traces to**: AC-5.2
**Preconditions**:
- SUT in `satellite_anchored` steady state on Derkachi replay.
- 4 s outage injector primed (replay paused for 4 s of wall-clock).
**Fault injection**:
- Pause frame source for 4 s of wall-clock while FC IMU stream continues.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Mid-replay, halt frame delivery for 4 s | SUT continues emitting `dead_reckoned` estimates from FC IMU/attitude propagation |
| 2 | After 3 s without an emit (i.e. SUT internally fails to update for >3 s), SUT logs `NO_ESTIMATE_TIMEOUT` | FDR contains the log entry |
| 3 | Observe FC EKF source-set transition | EKF source-set transitions to internal IMU-only on the FC side per the FC's own failsafe logic (AP `EKF_FAILSAFE` or equivalent on iNav) |
| 4 | Resume frame delivery | SUT recovers; FC EKF source-set returns to companion-GPS source |
**Pass criteria**:
- `NO_ESTIMATE_TIMEOUT` logged within 200 ms of the 3 s mark.
- FC EKF reflects the transition.
- Recovery on resume happens within 5 emit cycles.
---
### NFT-RES-02: Companion mid-flight reboot
**Summary**: Validates AC-5.3 — on companion reboot mid-flight, SUT re-initializes from FC's current IMU-extrapolated position.
**Traces to**: AC-5.3
**Preconditions**:
- SUT in steady state on Derkachi replay.
- FC SITL has been running long enough to have a stable IMU-extrapolated pose.
**Fault injection**:
- `docker compose restart gps-denied-onboard` mid-replay (or `systemctl restart` on Tier-2).
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | At t=120 s of replay, restart SUT container | SUT goes down and back up |
| 2 | Wait for first post-restart `GPS_INPUT` / `MSP2_SENSOR_GPS` arrival | First emit lat/lon within ±100 m of FC's IMU-extrapolated pose at boot-complete time |
| 3 | Observe TTFF post-reboot | Within AC-NEW-1 budget (<30 s p95) |
**Pass criteria**:
- First post-restart emit ±100 m of FC pose at boot-complete.
- Cold-restart TTFF < 30 s.
- No FC-side EKF divergence event during the gap.
---
### NFT-RES-03: False-position safety budget Monte Carlo
**Summary**: Validates AC-NEW-4 false-position safety budget (`P(error > 500 m) < 0.1%`, `P(error > 1 km) < 0.01%`) on the available data + synthesis. PARTIAL — multi-flight statistics constrained by single Derkachi flight + 60 stills (see traceability matrix flag).
**Traces to**: AC-NEW-4 (PARTIAL)
**Preconditions**:
- Tier-1 acceptable (statistical rather than hardware-bound).
- Pull together: 60 still-image runs (60 frames) + Derkachi replay (~14,700 frames at 30 fps OR resampled to ~1,470 frames at the 3 Hz target, i.e. ~490 s × 3 Hz). Total ≥1,530 frames per Monte Carlo iteration at 3 Hz.
- Run M=50 Monte Carlo iterations with synthetic perturbations (camera-pose noise, IMU bias drift, randomized tile sub-selection).
**Fault injection**:
- Add per-iteration synthetic perturbations to mimic a population of independent flights.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Run M iterations end-to-end | Per-iteration error distribution captured |
| 2 | Aggregate across all iterations × frames | Per-frame error CDF |
| 3 | Read off `P(error > 500 m)` and `P(error > 1 km)` from CDF | Both values |
**Pass criteria** (PARTIAL):
- `P(error > 500 m) < 0.1%`.
- `P(error > 1 km) < 0.01%`.
- Test FAILS-OPEN with explicit "PARTIAL" annotation in CSV report when iteration count is below the AC-NEW-4-implied ≥100 flights — noted as reduced confidence pending D-PROJ-3 (AerialVL S03 + own multi-flight data).
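Steps 2 and 3 and the pass criteria can be sketched as an empirical exceedance computation. The thresholds and the ≥100-flight figure come from the text above; the function name and the returned field names are hypothetical. Note that resolving a 0.01% tail empirically needs on the order of 10,000 aggregated samples or more.

```python
def exceedance_report(errors_m, n_flights, required_flights=100):
    """Empirical tail probabilities of the per-frame position error, with PARTIAL annotation."""
    if not errors_m:
        raise ValueError("no error samples")
    n = len(errors_m)
    p_over_500 = sum(1 for e in errors_m if e > 500.0) / n
    p_over_1000 = sum(1 for e in errors_m if e > 1000.0) / n
    return {
        "p_over_500m": p_over_500,
        "p_over_1km": p_over_1000,
        # AC-NEW-4 bounds: < 0.1 % beyond 500 m and < 0.01 % beyond 1 km.
        "pass": p_over_500 < 0.001 and p_over_1000 < 0.0001,
        # Reduced-confidence marker while fewer independent flights than required are available.
        "annotation": "PARTIAL" if n_flights < required_flights else "",
    }
```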
---
### NFT-RES-04: Visual blackout + spoof degraded-mode escalation
**Summary**: Validates the AC-NEW-8 escalation ladder (5 s, 15 s, 35 s blackouts paired with spoof) including the 100 m / 500 m covariance thresholds and the 10 s GPS-health gate before recovery.
**Traces to**: AC-NEW-8 (twin of FT-N-04 with extended duration window and covariance assertions)
**Preconditions**: Same as FT-N-04; Tier-1 acceptable.
**Fault injection**: `blackout-spoof-derkachi` 5 s / 15 s / 35 s windows + spoofed FC GPS for the same windows.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Begin 5 s window | Mode transition ≤ 400 ms; covariance grows monotonically; spoofed GPS rejected |
| 2 | At end of 5 s window, attempt recovery | Recovery only after FC GPS-health stable + non-spoofed for ≥10 s AND visual/satellite consistency check succeeds (gate enforced) |
| 3 | Begin 15 s window | Same as step 1 plus when 95% covariance crosses 100 m: outbound MAVLink fix-quality degraded to "2D fix or worse" |
| 4 | Begin 35 s window | Plus when 95% covariance crosses 500 m OR blackout exceeds 30 s: `horiz_accuracy=999.0` + `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT emitted |
**Pass criteria**:
- All four assertions fire at the right thresholds.
- Recovery gate is honored — early recovery attempts (FC GPS healthy for <10 s) MUST NOT promote spoofed GPS back into the estimator.
**Duration**: ~10 min total for three windows.
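The escalation ladder and recovery gate asserted above can be written down as a small reference model that the harness compares against observed behavior. This is a sketch under assumptions: "crosses" is read as strictly greater than the threshold, and the names `ExpectedOutput`, `expected_output`, and `recovery_allowed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedOutput:
    fix_degraded: bool  # outbound fix quality reported as "2D fix or worse"
    failsafe: bool      # horiz_accuracy=999.0 plus VISUAL_BLACKOUT_FAILSAFE STATUSTEXT


def expected_output(cov95_m, blackout_s):
    """Expected outbound state for a given 95 % covariance radius and blackout duration."""
    failsafe = cov95_m > 500.0 or blackout_s > 30.0
    fix_degraded = failsafe or cov95_m > 100.0
    return ExpectedOutput(fix_degraded=fix_degraded, failsafe=failsafe)


def recovery_allowed(gps_healthy_s, consistency_ok):
    """Recovery gate: FC GPS healthy and non-spoofed for at least 10 s AND consistency check passed."""
    return gps_healthy_s >= 10.0 and consistency_ok
```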
@@ -0,0 +1,100 @@
# Resource Limit Tests
### NFT-LIM-01: Jetson memory ≤ 8 GB throughout 8 h replay
**Summary**: Validates AC-4.2 — memory < 8 GB shared on Jetson Orin Nano Super for the full duty cycle.
**Traces to**: AC-4.2, RESTRICT-HW-1
**Preconditions**:
- Tier-2 only (Jetson hardware).
- `tile-cache-fixture` mounted.
- 8 h Derkachi replay loop (~60 loops of the 490 s fixture, OR a wrapped 8 h synthetic load that holds the same operating mix per AC-NEW-3 8 h synthetic-load definition).
**Monitoring**:
- `jetson_stats` (`jtop` API) RAM usage sampled at 1 Hz.
- Per-component memory annotation if SUT exposes it via `NAMED_VALUE_FLOAT` / FDR.
- Swap usage (must remain 0 — Jetson Orin Nano Super has no swap by default).
**Duration**: 8 h.
**Pass criteria**: Peak RSS ≤ 8 GB across the entire 8 h window; swap stays 0.
---
### NFT-LIM-02: FDR ≤ 64 GB / flight (8 h synthetic load)
**Summary**: Validates AC-NEW-3 — per-flight FDR ≤ 64 GB; oldest segment dropped on rollover; no payload class silently dropped without a logged rollover.
**Traces to**: AC-NEW-3
**Preconditions**:
- Tier-1 acceptable (storage budget is policy/rotation driven, not Jetson-specific).
- `fdr-output` Docker volume sized exactly 64 GB.
- 8 h Derkachi replay loop at 3 Hz nav frames (per AC-NEW-3 validation wording).
**Monitoring**:
- Total `fdr-output` volume size at 1-min sample rate.
- Per-payload-class size: per-frame estimates + IMU traces + emitted MAVLink + raw MAVLink (tlog) + system health + mid-flight tiles + ≤0.1 Hz failed-tile-gen thumbnails.
- Rollover-event log entries (count, timestamp, dropped-segment ID).
**Duration**: 8 h synthetic.
**Pass criteria**:
- Volume never exceeds 64 GB.
- Every drop event has a corresponding rollover log entry (no silent drops).
- All payload classes enumerated in AC-NEW-3 are present (no class missing entirely).
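One way to check the "no silent drops" criterion is to compare successive listings of FDR segment IDs against the rollover log. The sketch below assumes the runner can list segment IDs at each sample and that each rollover log entry carries a `dropped_segment_id` field; both the schema and the function names are hypothetical.

```python
def dropped_segments(previous_listing, current_listing):
    """Segment IDs present in the previous snapshot but missing from the current one."""
    return set(previous_listing) - set(current_listing)


def silent_drops(snapshots, rollover_log):
    """Segments that disappeared between snapshots without a matching rollover log entry."""
    logged = {entry["dropped_segment_id"] for entry in rollover_log}
    observed = set()
    for previous, current in zip(snapshots, snapshots[1:]):
        observed |= dropped_segments(previous, current)
    return sorted(observed - logged)


def volume_within_cap(size_samples_bytes, cap_bytes=64 * 10**9):
    """True when no sampled volume size exceeds the cap (decimal GB assumed)."""
    return max(size_samples_bytes) <= cap_bytes
```

The test passes the drop criterion when `silent_drops(...)` returns an empty list.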
---
### NFT-LIM-03: Tile cache ≤ 10 GB across operational area
**Summary**: Validates RESTRICT-SAT-2 — cache budget 10 GB persistent across the ~400 km² operational area, including manifests, overviews, and any precomputed indices.
**Traces to**: RESTRICT-SAT-2, AC-8.3
**Preconditions**:
- `tile-cache-fixture` covers the full operational-area footprint (still-image + Derkachi route bbox, target ~400 km² for parity).
**Monitoring**:
- Total tile-cache size on disk.
- Per-component breakdown: tile filesystem, tile manifest DB (PostgreSQL btree per `solution.md`), FAISS HNSW index, descriptor cache.
**Duration**: one-shot measurement after fixture build + after a 5 min replay (to catch any descriptor-on-demand growth).
**Pass criteria**: Total cache size ≤ 10 GB at both measurement points.
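The on-disk measurement could be taken with a plain directory walk, as in this sketch. It sums apparent file sizes and skips symlinks; whether the budget is meant in decimal GB or GiB is not stated in RESTRICT-SAT-2, so decimal GB is an assumption here, and the function names are hypothetical.

```python
import os


def dir_size_bytes(root):
    """Total apparent size in bytes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def cache_within_budget(root, budget_bytes=10 * 10**9):
    """True when the tile cache under root fits the budget (10 GB, decimal, by assumption)."""
    return dir_size_bytes(root) <= budget_bytes
```

Calling `dir_size_bytes` on each top-level subdirectory gives the per-component breakdown listed under Monitoring, provided the components live in separate directories.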
---
### NFT-LIM-04: No thermal throttling at 25 W TDP — workstation thermal-day baseline
**Summary**: Tier-2 baseline of AC-NEW-5 thermal-throttle behavior at workstation ambient temperature. Full chamber test at +50 °C is deferred to the AC-NEW-5 chamber gate (out-of-scope for data-acquisition per Phase 1 gate).
**Traces to**: AC-NEW-5 (PARTIAL), RESTRICT-HW-1
**Preconditions**:
- Tier-2 (Jetson) at workstation ambient (~25 °C).
- 8 h Derkachi replay loop sustaining 25 W TDP.
**Monitoring**:
- `tegrastats`: GPU/CPU clock, GR3D_FREQ, RAM, temperatures, power-rail draw, throttle events.
**Duration**: 8 h.
**Pass criteria**:
- 0 thermal throttle events at workstation ambient.
- Average power draw ≤ 25 W.
- Hot-soak chamber test at +50 °C is OUT OF SCOPE for data-acquisition; tracked as deferred AC-NEW-5 chamber gate. The test is expected to be exercised on a chamber-attached Jetson runner before any release tag.
---
### NFT-LIM-05: Disk storage budget (cache 10 GB + FDR 64 GB)
**Summary**: Validates the combined storage budget per `restrictions.md` § Onboard Hardware: available storage on the deployed Jetson must be at least the tile cache (~10 GB) plus the per-flight FDR (64 GB).
**Traces to**: RESTRICT-HW-1 (storage budget)
**Preconditions**:
- Tier-2 acceptance run on the deployed-image Jetson.
**Monitoring**:
- Available storage on the production storage device after a single fresh install of SUT + fixtures.
**Duration**: one-shot.
**Pass criteria**: Available storage ≥ 74 GB after install, leaving headroom for system + logs.
@@ -0,0 +1,97 @@
# Security Tests
These tests cover the security-relevant AC and the Mode B revisions that introduced explicit security gates: D-CROSS-CVE-1 (OpenCV CVE pin), D-C8-9 (MAVLink 2.0 message signing), AC-NEW-7 (cache poisoning), and RESTRICT-SAT-1 / AC-8.1 (no in-flight Service calls).
### NFT-SEC-01: Cache-poisoning safety budget
**Summary**: Validates AC-NEW-7 — across all onboard tiles written, `P(geo-misalign > 30 m) < 1%` and `P(> 100 m) < 0.1%`. Multi-flight statistics constrained — PARTIAL with current single-flight fixture (see traceability matrix).
**Traces to**: AC-NEW-7, Mode B Fact #105 (Service voting layer external dependency), D-PROJ-2
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Run 3 trial flights against `derkachi-fixture` with synthetic over-confidence injection (deflate covariance ×1.5, ×2, ×3) | Each flight produces mid-flight tiles uploaded to `mock-suite-sat-service` |
| 2 | After each flight, the mock service records each received tile's quality metadata + onboard-asserted geo-alignment vs the GT-derived geo-alignment | Per-tile mis-alignment captured |
| 3 | Across all uploaded tiles, compute `P(misalign > 30 m)` and `P(misalign > 100 m)` | Statistic computed |
| 4 | Independently observe Suite Sat Service voting-layer behavior (mock) — verify mock-side gate refuses `trusted basemap` promotion when ingest votes don't agree | Voting contract assertion (per D-PROJ-2) |
**Pass criteria** (PARTIAL):
- `P(misalign > 30 m) < 1%`, `P(misalign > 100 m) < 0.1%` across the available trial flights.
- PARTIAL annotation: AC text expects ≥100 flights — escalates D-PROJ-3 fixture acquisition + D-PROJ-2 contract verification.
---
### NFT-SEC-02: No in-flight Service calls (network egress isolation)
**Summary**: Validates RESTRICT-SAT-1 / AC-8.1 — the SUT MUST NOT reach an external satellite provider during a flight. All cache reads come from the local cache.
**Traces to**: RESTRICT-SAT-1, AC-8.1
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Start the SUT with `e2e-net` configured `internal: true` (no external connectivity at the network layer) | SUT comes up; tile cache reads succeed |
| 2 | Run 5 min of Derkachi replay | All tile lookups served from local cache |
| 3 | Read SUT egress counter (Docker network stats) | 0 packets out to non-`e2e-net` destinations |
| 4 | Inspect SUT log for any "external Service call attempted" event | 0 events (proving the SUT didn't even try) |
| 5 | Defense-in-depth: temporarily flip `internal: false` AND blackhole DNS, re-run | Same — 0 egress attempts; no failed-DNS errors |
**Pass criteria**: 0 packets to non-`e2e-net` destinations; no "Service call attempted" log entry.
---
### NFT-SEC-03: MAVLink 2.0 message signing on AP wired channel
**Summary**: Validates D-C8-9 — AP-side rejects unsigned MAVLink GPS_INPUT messages on the signed channel; SUT-emitted (signed) messages pass; SBOM dump confirms passkey configuration.
**Traces to**: D-C8-9 (Plan-phase decision), Mode B Fact #109 (CVE-2026-1579 mitigation)
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Start `ardupilot-plane-sitl` with signing enabled and the test passkey loaded | Signing enabled |
| 2 | Inject an UNSIGNED `GPS_INPUT` from `mavproxy-listener` (i.e. a non-SUT origin) | AP rejects the message; rejection logged in AP STATUSTEXT |
| 3 | Inject a SIGNED `GPS_INPUT` with the SUT's signing key | AP accepts |
| 4 | Inject a SIGNED `GPS_INPUT` with a DIFFERENT key | AP rejects |
| 5 | Run the SUT's SBOM-dump CI step | SBOM declares the MAVLink signing module + passkey configuration entry present |
**Pass criteria**: AP rejection of unsigned + wrong-key; AP acceptance of correct-signed; SBOM declares signing.
**Note**: iNav-side is NOT subject to this test — Mode B Fact #109 documents the asymmetry as accepted residual risk (no MAVLink signing in iNav firmware per Source #129).
---
### NFT-SEC-04: OpenCV CVE-2025-53644 mitigation (≥4.12.0 pin)
**Summary**: Validates D-CROSS-CVE-1 — the pinned OpenCV ≥4.12.0 either decodes the CVE-2025-53644 PoC JPEG safely or rejects it; no crash, no buffer overflow.
**Traces to**: D-CROSS-CVE-1, Mode B Fact #112
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Build the SUT image with AddressSanitizer (ASan) instrumentation enabled (separate CI build) | Instrumented binary |
| 2 | Push `cve-jpeg-fixture` to every code path that uses OpenCV imread/imdecode: nav-camera frame source (C1), satellite tile thumbnail re-load (C4), tile cache import (C6) | Each path either decodes cleanly OR returns a graceful error |
| 3 | Observe ASan output | 0 buffer-overflow / use-after-free / uninitialized-read reports |
| 4 | Observe SUT process exit code | Process does NOT crash; if rejection path taken, exit code is 0 + error logged |
| 5 | CI step: lint the lockfile / pyproject.toml / requirements.txt for the OpenCV version pin | Pin asserts `opencv-python >= 4.12.0` (or platform-equivalent) |
**Pass criteria**: ASan clean; no crash; pinned version ≥ 4.12.0 in dependency manifest.
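Step 5 (the version-pin lint) could look like the sketch below for a `requirements.txt`-style manifest. It only understands `>=` and `==` specifiers on the four OpenCV wheel names published on PyPI and treats anything else as "no valid pin"; a real lint would more likely use a proper requirement parser. The function names are hypothetical.

```python
import re

# Matches e.g. "opencv-python>=4.12.0" or "opencv-contrib-python-headless==4.12.0.88".
PIN_RE = re.compile(
    r"^\s*(opencv-(?:contrib-)?python(?:-headless)?)\s*(>=|==)\s*([0-9]+(?:\.[0-9]+)*)",
    re.IGNORECASE,
)


def parse_version(text):
    """Turn '4.12.0.88' into (4, 12, 0, 88)."""
    return tuple(int(part) for part in text.split("."))


def opencv_pin_ok(requirements_text, minimum=(4, 12, 0)):
    """True when at least one OpenCV pin exists and every pin is at or above the minimum."""
    found = False
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = PIN_RE.match(line)
        if not match:
            continue
        found = True
        version = parse_version(match.group(3))
        # Pad short versions such as "4.12" so they compare as (4, 12, 0).
        padded = version + (0,) * (len(minimum) - len(version))
        if padded < minimum:
            return False
    return found
```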
---
### NFT-SEC-05: Egress-blocked + DNS-blackholed defense-in-depth
**Summary**: Defense-in-depth complement to NFT-SEC-02 — verifies that even if the network policy were misconfigured, the SUT does not call out to public DNS / known satellite-provider hosts.
**Traces to**: RESTRICT-SAT-1 (defense-in-depth)
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Configure SUT container with iptables OUTPUT DROP except `e2e-net` AND DNS blackholed via `--dns 0.0.0.0` | SUT comes up |
| 2 | Run Derkachi replay | All operations succeed; 0 outbound DNS queries (verified via tcpdump on egress) |
| 3 | Inspect SUT for hardcoded provider hostnames (e.g. `*.googleapis.com`, `*.maxar.com`, `*.mapbox.com`, `*.azaion.com` for the runtime path) | grep finds zero references in compiled binary's strings table for runtime-path code |
**Pass criteria**: 0 DNS queries during replay; 0 provider hostname references in runtime path.
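
The step-3 inspection can be approximated without external tools. The sketch below assumes the binary is available as raw bytes and mimics the `strings` utility with a printable-ASCII scan; the function names are hypothetical, and the domain list simply mirrors the examples in the table.

```python
import re

# Provider domains from step 3; extend as the provider list evolves.
PROVIDER_SUFFIXES = (b"googleapis.com", b"maxar.com", b"mapbox.com", b"azaion.com")

# Runs of at least four printable ASCII bytes, similar in spirit to `strings`.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def find_provider_hosts(blob):
    """Return every printable string in `blob` that mentions a provider domain."""
    hits = []
    for run in _PRINTABLE_RUN.findall(blob):
        lowered = run.lower()
        if any(suffix in lowered for suffix in PROVIDER_SUFFIXES):
            hits.append(run.decode("ascii"))
    return hits
```

An empty result corresponds to the "0 provider hostname references" pass criterion. Hostnames assembled at runtime would not be caught by a static scan like this, which is why the tcpdump check in step 2 remains the primary evidence.
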
+129
@@ -0,0 +1,129 @@
# Test Data Management
## Seed Data Sets
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|----------|-------------|---------------|-----------|---------|
| `still-image-set-60` | 60 nadir aerial images `AD000001-60.jpg` from `_docs/00_problem/input_data/` with WGS84 frame-center GT in `coordinates.csv` and per-image accuracy table in `expected_results/position_accuracy.csv`. Captured at 400 m AGL with ADTi 20MP 20L V1 (per `data_parameters.md`). Slow cadence (~1 per 2-3 s), so suitable for satellite-anchor frame-center tests, NOT frame-to-frame VIO. | FT-P-01, FT-P-03, FT-P-05, FT-P-06, FT-P-15, FT-P-19, NFT-RES-03 (Monte Carlo), NFT-PERF-04 | Bind-mounted from `_docs/00_problem/input_data/` to `/test-data` in `e2e-runner` (read-only) | None — read-only fixture |
| `still-image-sat-refs-2` | Two paired Google Maps reference images `AD000001_gmaps.png`, `AD000002_gmaps.png`. Insufficient for full satellite-anchor coverage of the 60-image set; supplements the tile-cache fixture for AC-2.1b cross-validation only. | FT-P-05 (subset), FT-P-19 | Same as above | Same |
| `derkachi-fixture` | Cropped nadir flight footage `flight_derkachi/flight_derkachi.mp4` (H.264, 880×720, 30 fps, ~490.07 s = 14,700 frames) plus synchronized FC telemetry `flight_derkachi/data_imu.csv` (4,900 rows @ 10 Hz, columns `timestamp(ms)`, `Time`, `SCALED_IMU2.*`, `GLOBAL_POSITION_INT.*`). Three video frames per telemetry row. The `GLOBAL_POSITION_INT` columns are the trajectory ground truth. | FT-P-02, FT-P-04, FT-P-07, FT-P-10, FT-N-01 (synth on top), FT-N-02, FT-N-03 (synth), FT-N-04 (synth), NFT-PERF-01, NFT-PERF-02, NFT-RES-01, NFT-RES-02, NFT-RES-03 (Monte Carlo), NFT-RES-04, NFT-LIM-02 (8 h synth load loop) | Same bind mount as above | Same |
| `tile-cache-fixture` | Pre-built FAISS HNSW index + tile filesystem covering: (a) the 60 still-image footprints at 0.3-0.5 m/px, (b) the Derkachi route bbox at the same resolution. Built once per CI run by `tests/fixtures/tile-cache-builder/` from the `_gmaps.png` references and from a curated public-data subset (when D-PROJ-3 is resolved — until then, stub-tile content for footprints not paired with `_gmaps.png`). Tile manifest schema per `restrictions.md` § Satellite Imagery. | FT-P-01, FT-P-05, FT-P-15, FT-P-16, FT-P-17, FT-P-19, FT-N-05, FT-N-06, NFT-LIM-03, NFT-PERF-01, NFT-PERF-04, NFT-SEC-01 (poisoning test), NFT-SEC-02 (egress) | Built into named Docker volume `tile-cache-fixture`; mounted read-only into SUT at `/var/azaion/tile-cache` | Volume removed at teardown |
| `synth-age-tile-set` | Two clones of the tile-cache-fixture with manifest `capture_date` field synthetically aged: `synth-age-7mo` (>6 mo, exceeds AC-8.2 active-conflict threshold) and `synth-age-13mo` (>12 mo, exceeds rear threshold). Tile pixels unchanged; only manifest dates differ. | FT-N-05, FT-N-06 | Built from `tile-cache-fixture` by date-mutating script in `tests/fixtures/age-injector/` | Volume removed at teardown |
| `outlier-injection-derkachi` | Synthetic adversarial overlay on `derkachi-fixture`: every Nth frame replaced by a random crop from a far-away tile (>350 m offset, per AC-3.1) to inject a visual outlier. Three injection densities: `light` (1 in 100), `medium` (1 in 10), `heavy` (1 in 3). Generated at runtime by `tests/fixtures/injectors/outlier.py`. | FT-N-01 | Generated at scenario start, written to `tmpfs` in `e2e-runner`, mounted into SUT as a derived frame source | Auto-cleared at teardown (tmpfs) |
| `blackout-spoof-derkachi` | Synthetic overlay on `derkachi-fixture`: pure-black frames inserted in 5 s / 15 s / 35 s windows AND simultaneous spoofed-GPS injection on the FC inbound stream. Spoof pattern: a realistic-looking GPS fix that displaces the trajectory by 200-500 m in `north_east_random_direction`. The three windows produce three sub-scenarios per AC-NEW-8. Generated at runtime. | FT-N-04, NFT-RES-04 | Same | Same |
| `multi-segment-derkachi` | Synthetic overlay: 3+ blackout segments distributed across the Derkachi flight to exercise satellite-reference re-localization (AC-3.3) without spoofing. Generated at runtime. | FT-P-08 | Same | Same |
| `cold-boot-fixture` | The state needed to validate AC-NEW-1: a frozen FC pose (`GLOBAL_POSITION_INT` snapshot at flight-resume time) + the tile-cache-fixture + a blank FDR. Test cold-boots the SUT and measures TTFF. | NFT-PERF-03 (AC-NEW-1) | The frozen FC pose is a JSON fixture in `tests/fixtures/cold-boot/`; SUT is restarted (`docker compose restart gps-denied-onboard`) and TTFF is measured from container-ready event to first valid `GPS_INPUT` / `MSP2_SENSOR_GPS` arrival at SITL | Container restart only |
| `mavlink-passkey` | A test-only MAVLink 2.0 signing passkey (32-byte hex). Used for D-C8-9 ArduPilot-track signing channel. NEVER reused outside test environment; checked-in as `tests/fixtures/secrets/mavlink-test-passkey.txt` with explicit comment "TEST ONLY". | FT-P-09 (AP track), NFT-SEC-03 | Loaded via Docker secret into SUT environment | None — fixture file |
| `cve-jpeg-fixture` | Crafted JPEG that triggers CVE-2025-53644 (uninitialized stack pointer → heap buffer write) in OpenCV 4.10/4.11. The pinned ≥4.12.0 must process it without crash and either decode safely or reject. | NFT-SEC-04 | Local-data-only fixture file at `tests/fixtures/security/cve-2025-53644.jpg` (sourced from public PoC, license-checked) | None — fixture file |
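
The frame and telemetry counts for `derkachi-fixture` imply a fixed 3:1 alignment, and the outlier densities imply a fixed replacement period. The sketch below illustrates both, assuming the video and telemetry start at the same instant and that "every Nth frame" means the last frame of each group of N; both assumptions and all function names are illustrative, not taken from `tests/fixtures/injectors/outlier.py`.

```python
VIDEO_FPS = 30
TELEMETRY_HZ = 10
FRAMES_PER_ROW = VIDEO_FPS // TELEMETRY_HZ  # three video frames per telemetry row

# Replacement period per injection density (1 in 100, 1 in 10, 1 in 3).
OUTLIER_PERIOD = {"light": 100, "medium": 10, "heavy": 3}


def telemetry_row_for_frame(frame_index):
    """Zero-based row of data_imu.csv that covers the given zero-based frame."""
    return frame_index // FRAMES_PER_ROW


def outlier_frame_indices(total_frames, density):
    """Zero-based indices of the frames an injector would replace."""
    period = OUTLIER_PERIOD[density]
    return range(period - 1, total_frames, period)
```

With 14,700 frames, the last frame (index 14699) maps to row 4899, the final row of the 4,900-row telemetry file.
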
## Data Isolation Strategy
Each `pytest` test case runs against a fresh `gps-denied-onboard` container (`docker compose restart` between tests, OR `--forked` pytest mode that brings a clean compose stack per case for hermetic-critical tests). The `tile-cache-fixture` and `input-data` mounts are read-only so cross-contamination between tests is impossible at the SUT-input layer. The `fdr-output` volume is reset between tests (`docker volume rm` + recreate) so each test sees a blank FDR.
For Tier-2 (Jetson hardware), the same isolation discipline applies but at the systemd-service level: `systemctl restart gps-denied-onboard.service` between tests, `/var/azaion/fdr` is wiped between tests.
Synthetic-injection fixtures (`outlier-injection-derkachi`, `blackout-spoof-derkachi`, `multi-segment-derkachi`, `synth-age-tile-set`) are generated into per-test tmpfs and never written back to a persistent volume.
## Input Data Mapping
| Input Data File | Source Location | Description | Covers Scenarios |
|-----------------|----------------|-------------|-----------------|
| `AD000001.jpg` ... `AD000060.jpg` | `_docs/00_problem/input_data/` | 60 nadir still images, ADTi 20MP @ 400 m AGL | FT-P-01, FT-P-03, FT-P-05, FT-P-06, FT-P-15, FT-P-19, NFT-PERF-04, NFT-RES-03 |
| `coordinates.csv` | `_docs/00_problem/input_data/` | 60-row WGS84 frame-center GT (image, lat, lon) | Same as above |
| `AD000001_gmaps.png`, `AD000002_gmaps.png` | `_docs/00_problem/input_data/` | Google Maps satellite reference for images 1-2 | FT-P-05, FT-P-19 |
| `data_parameters.md` | `_docs/00_problem/input_data/` | AGL height (400 m) + camera model | All — global metadata |
| `flight_derkachi/flight_derkachi.mp4` | `_docs/00_problem/input_data/flight_derkachi/` | H.264 nadir video, 880×720 @ 30 fps, ~490 s | FT-P-02, FT-P-04, FT-P-07, FT-P-10, FT-N-01..04, NFT-PERF-01..04, NFT-RES-01..04, NFT-LIM-02 |
| `flight_derkachi/data_imu.csv` | `_docs/00_problem/input_data/flight_derkachi/` | 4,900 rows @ 10 Hz of `SCALED_IMU2` + `GLOBAL_POSITION_INT` | Same as above |
| `flight_derkachi/README.md` | `_docs/00_problem/input_data/flight_derkachi/` | Fixture metadata | Documentation only |
| `expected_results/results_report.md` | `_docs/00_problem/input_data/expected_results/` | Pass/fail rules + still-image and Derkachi mappings | All FT-P / FT-N scenarios that load this fixture |
| `expected_results/position_accuracy.csv` | `_docs/00_problem/input_data/expected_results/` | Per-image accuracy threshold flags | FT-P-01, NFT-RES-03 |
## Expected Results Mapping
This table closes the gap between each test scenario and the quantifiable expected result it asserts on. Comparison methods follow `.cursor/skills/test-spec/templates/expected-results.md`. The `Expected Result Source` column points at the canonical source of truth for the assertion.
### Position accuracy
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
| FT-P-01 | `still-image-set-60` + `tile-cache-fixture` | `pass_count(error≤50m) ≥ 48` (≥80% of 60) AND `pass_count(error≤20m) ≥ 30` (≥50% of 60) | `threshold_min` on aggregate counts; per-image error via `numeric_tolerance` against Vincenty geodesic distance to GT in `coordinates.csv` | ±50 m / ±20 m | `expected_results/results_report.md` § Pass/Fail Rules + `expected_results/position_accuracy.csv` |
| FT-P-02 | `derkachi-fixture` | At each anchor frame, `‖propagated_centre − next_anchor_centre‖ < 100 m` (visual-only) AND `< 50 m` (IMU-fused). Drift binned by `last_satellite_anchor_age_ms`. | `threshold_max` per anchor pair, then aggregate rule `≥95% of anchor pairs satisfy` | < 100 m / < 50 m | AC-1.3 + Derkachi `GLOBAL_POSITION_INT` GT |
| FT-P-03 | `still-image-set-60` (any 1 image) | Estimate output schema fields present: `lat:float`, `lon:float`, `cov_semi_major_m:float`, `source_label ∈ {satellite_anchored, visual_propagated, dead_reckoned}`, `last_satellite_anchor_age_ms:int` | `schema_match` (presence + type) AND `set_contains` (label) | N/A | AC-1.4 + AC-4.3 |
| FT-P-19 | `tile-cache-fixture` + `still-image-sat-refs-2` | Scale-ratio: any UAV-frame footprint at 400 m AGL retrievable from cache (FAISS top-K=10 includes a tile with center within 100 m of true position). Scene-change subset (PARTIAL — flag-marked, see traceability matrix). | `set_contains` (top-K result includes correct tile) | top-K hit | AC-8.6 |
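
The FT-P-01 aggregate rule can be illustrated as follows. Note that this sketch swaps in the spherical haversine formula as a simpler stand-in for the Vincenty geodesic named in the table; the two differ by well under a metre at these distances, but a real assertion should use the method the spec names. Function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ft_p_01_passes(errors_m):
    """Apply the >=80% within 50 m AND >=50% within 20 m rule to per-image errors."""
    n = len(errors_m)
    within_50 = sum(1 for e in errors_m if e <= 50.0)
    within_20 = sum(1 for e in errors_m if e <= 20.0)
    # Integer arithmetic avoids floating-point edge cases at the exact threshold.
    return within_50 * 100 >= 80 * n and within_20 * 100 >= 50 * n
```

For the 60-image set the two ratios reduce to the counts in the table: at least 48 images within 50 m and at least 30 within 20 m.
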
### Image processing quality
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
| FT-P-04 | `derkachi-fixture` | Frame-to-frame registration succeeds for `≥95%` of "normal" segments (defined per AC-2.1a: nadir ±10° bank/pitch from `data_imu.csv` `SCALED_IMU2` quaternion-derived attitude estimate, ≥40% inferred prior-frame overlap). Sharp-turn frames excluded from this denominator. | `threshold_min` on success ratio | ≥95% | AC-2.1a |
| FT-P-05 | `still-image-set-60` (with `_gmaps.png` subset for ground-truth match) | Satellite-anchor registration succeeds AND satisfies AC-1.1/1.2 accuracy AND MRE < 2.5 px | `threshold_max` MRE | < 2.5 px | AC-2.1b + AC-2.2 |
| FT-P-06 | `derkachi-fixture` (frame-to-frame) AND `still-image-set-60` (sat-anchor) | Mean Reprojection Error: `< 1.0 px` frame-to-frame, `< 2.5 px` satellite-anchored cross-domain | `threshold_max` per shape | < 1.0 / < 2.5 px | AC-2.2 |
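
As an illustration of the MRE thresholds above, the sketch below computes a mean reprojection error from matched pixel coordinates and compares it with the two AC-2.2 budgets. It assumes the SUT or the test harness can expose projected and observed point pairs; that interface and the function names are assumptions.

```python
import math

# Budgets from AC-2.2, in pixels (strict upper bounds).
MRE_BUDGET_PX = {"frame_to_frame": 1.0, "satellite_anchored": 2.5}


def mean_reprojection_error(projected_px, observed_px):
    """Mean Euclidean distance between matched (x, y) pixel pairs."""
    if len(projected_px) != len(observed_px) or not projected_px:
        raise ValueError("need equal-length, non-empty point lists")
    total = 0.0
    for (px, py), (ox, oy) in zip(projected_px, observed_px):
        total += math.hypot(px - ox, py - oy)
    return total / len(projected_px)


def mre_within_budget(mre_px, mode):
    """True if the error is strictly below the budget for the given mode."""
    return mre_px < MRE_BUDGET_PX[mode]
```
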
### Resilience
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
| FT-N-01 | `outlier-injection-derkachi` | Up to 350 m offset in a single frame is rejected as outlier; estimate continues from prior valid state with grown covariance; airframe tilt up to ±20° handled | Per-injected-outlier: `error_after_outlier ≤ error_before_outlier + 50 m` AND `covariance_growth_monotonic` | ±50 m drift budget | AC-3.1 |
| FT-N-02 | `derkachi-fixture` (sharp-turn segment, identified via `SCALED_IMU2` gyro_z spikes) | Sharp-turn frames may fail frame-to-frame registration; recovery via satellite-reference re-localization within next 3 frames | Boolean recovery within 3 frames | N/A | AC-3.2 |
| FT-P-08 | `multi-segment-derkachi` | ≥3 disconnected segments handled; satellite-reference re-localization succeeds at each gap; trajectory remains continuous (no >100 m jump) | `threshold_max` discontinuity | < 100 m | AC-3.3 |
| FT-N-03 | `derkachi-fixture` + synthetic 3-frame outage injector | After ≥3 consecutive frames AND ≥2 s without estimate: STATUSTEXT containing `OPERATOR_RELOC_REQUEST` emitted to GCS via `mavproxy-listener`; estimates labeled `dead_reckoned` continue | `regex` on STATUSTEXT + `set_contains` on labels | regex | AC-3.4 |
| FT-N-04 | `blackout-spoof-derkachi` (5 s / 15 s / 35 s windows) | Within ≤1 frame OR ≤400 ms: label switches to `dead_reckoned`; spoofed GPS rejected; covariance grows monotonically; `horiz_accuracy` not under-reported; `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 1-2 Hz | `threshold_max` switch latency + `regex` STATUSTEXT + monotonic check | ≤400 ms | AC-3.5 |
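
Two of the FT-N-04 checks, label-switch latency and monotonic covariance growth, reduce to simple sequence checks. The sketch below assumes estimates are available as time-ordered `(timestamp_ms, source_label)` tuples and that "monotonic" means non-decreasing; both the data layout and the function names are assumptions.

```python
def switch_latency_ms(blackout_start_ms, estimates):
    """Delay until the first `dead_reckoned` estimate at or after blackout start.

    `estimates` is a time-ordered iterable of (timestamp_ms, source_label).
    Returns None if the label never switches.
    """
    for timestamp_ms, label in estimates:
        if timestamp_ms >= blackout_start_ms and label == "dead_reckoned":
            return timestamp_ms - blackout_start_ms
    return None


def covariance_monotonic(cov_semi_major_m):
    """True if the covariance semi-major axis never shrinks during the window."""
    return all(b >= a for a, b in zip(cov_semi_major_m, cov_semi_major_m[1:]))
```

A test would then assert that the latency is not `None` and is at most 400 ms, and that `covariance_monotonic` holds across each blackout window.
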
### FC contract & startup
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
| FT-P-09-AP | `derkachi-fixture` + `mavlink-passkey` + `ardupilot-plane-sitl` | `GPS_INPUT` messages reach AP SITL; AP EKF accepts them as `EK3_SRC1_POSXY=3` (GPS); MAVLink 2.0 signing handshake completes (D-C8-9); messages without valid signature are rejected | `exact` (AP source-set state via param read) + `boolean` (signing handshake success) + `exact` (rejection of unsigned in NFT-SEC-03) | N/A | AC-4.3 + D-C8-9 |
| FT-P-09-iNav | `derkachi-fixture` + `inav-sitl` | `MSP2_SENSOR_GPS` (ID 0x1F03) messages reach iNav SITL via TCP 5760; iNav GPS provider state shows `provider=MSP` and fix is acquired | `exact` on iNav GPS provider state via MSP read | N/A | AC-4.3 + Source #4 |
| FT-P-10 | `derkachi-fixture` | Per Mode B Fact #107: GTSAM iSAM2 smoothed past-keyframe pose estimates differ from raw single-shot estimates AND smoothed estimates are closer to `GLOBAL_POSITION_INT` GT than raw (IT-11). NOT validated as FC-side retroactive correction (out of scope per Mode B revision). | `numeric_tolerance` improvement check | smoothed_error < raw_error | AC-4.5 (revised) + Mode B Fact #107 |
| FT-P-11 | `cold-boot-fixture` + `ardupilot-plane-sitl` | On boot, SUT initializes from FC EKF's last valid GPS + IMU-extrapolated position | `numeric_tolerance` initial-pose-vs-FC-pose | ±50 m | AC-5.1 |
| NFT-RES-01 | `derkachi-fixture` + 4 s outage injector | After >3 s without estimate, FC falls back to IMU-only dead reckoning; SUT emits a `NO_ESTIMATE_TIMEOUT` failure log | `boolean` on FC EKF source-set transition + `regex` on log | N/A | AC-5.2 |
| NFT-RES-02 | `derkachi-fixture` + container restart mid-replay | After companion reboot, SUT re-initializes from FC's current IMU-extrapolated position; first emitted `GPS_INPUT` / `MSP2_SENSOR_GPS` is within ±100 m of FC's IMU-extrapolated pose at boot-complete time | `numeric_tolerance` pose at first emit | ±100 m | AC-5.3 |
### Performance
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
| NFT-PERF-01 (Tier-2 only) | `derkachi-fixture` resampled to 3 Hz on Jetson Orin Nano Super | End-to-end latency (camera capture → GPS to FC) | `threshold_max` p95 | ≤ 400 ms | AC-4.1 + D-CROSS-LATENCY-1 |
| NFT-PERF-02 (Tier-1+2) | `derkachi-fixture` | Estimates emitted frame-by-frame (no batching > 1 frame); inter-emit interval p95 ≤ inter-frame interval × 1.05 | `threshold_max` p95 inter-emit | ≤ 350 ms (at 3 Hz target) | AC-4.4 |
| NFT-PERF-03 (Tier-2 only) | `cold-boot-fixture` | Cold-start TTFF: from container-ready to first valid `GPS_INPUT` / `MSP2_SENSOR_GPS` | `threshold_max` p95 over 50 cold boots | < 30 s | AC-NEW-1 |
| NFT-PERF-04 | `still-image-set-60` + spoofed FC GPS injection in `ardupilot-plane-sitl` | Spoofing-promotion latency: from FC GPS-denial / spoof signal to SUT estimate becoming AP primary position source | `threshold_max` p95 over 50 trials per FC | < 3 s | AC-NEW-2 |
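
The p95 thresholds in this table depend on how the percentile is computed. The sketch below uses the nearest-rank definition, which is an assumption since the spec does not name a percentile method, and applies it to the NFT-PERF-02 inter-emit budget (333.3 ms × 1.05 ≈ 350 ms at the 3 Hz target). Function names are hypothetical.

```python
INTER_EMIT_BUDGET_MS = 350.0  # from the NFT-PERF-02 tolerance column


def p95(values):
    """95th percentile by the nearest-rank method (non-empty input)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def inter_emit_intervals(timestamps_ms):
    """Gaps between consecutive emit timestamps, in milliseconds."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def nft_perf_02_passes(timestamps_ms):
    """True if the p95 inter-emit interval stays within the budget."""
    return p95(inter_emit_intervals(timestamps_ms)) <= INTER_EMIT_BUDGET_MS
```
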
### Resource limits
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
| NFT-LIM-01 (Tier-2) | `derkachi-fixture` 8 h replay loop | Memory `< 8 GB shared` on Jetson Orin Nano Super throughout | `threshold_max` peak RSS over duration | < 8 GB | AC-4.2 |
| NFT-LIM-02 (Tier-1) | 8 h Derkachi replay loop | FDR ≤ `64 GB`; no payload class silently dropped without a logged rollover | `threshold_max` total FDR size + `regex` on rollover-event presence | ≤ 64 GB | AC-NEW-3 |
| NFT-LIM-03 | `tile-cache-fixture` plus exercised manifests/overviews/indices | Cache budget `≤ 10 GB` for the ~400 km² operational area unless solution defines a separate descriptor budget | `threshold_max` total cache size | ≤ 10 GB | RESTRICT-SAT-2 + AC-8.3 |
| NFT-LIM-04 (Tier-2) | `derkachi-fixture` 8 h | CPU/GPU/temp/throttle telemetry recorded; no thermal throttling at 25 W TDP at the upper temp envelope (deferred to chamber for AC-NEW-5) | `threshold_max` throttle event count = 0 (workstation thermal-day) | 0 events | RESTRICT-HW-1 + AC-NEW-5 (Tier-2 partial) |
### Security
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|-----------------|------------|-----------------|-------------------|-----------|----------------------|
| NFT-SEC-01 | Synthetic over-confidence injection: deflate covariance ×1.5-3 in 3 trial flights, observe AC-NEW-7 cache-poisoning behavior at the `mock-suite-sat-service` ingest | Per flight: `P(geo-misalign > 30 m) < 1%`, `P(> 100 m) < 0.1%` of written tiles. PARTIAL — multi-flight Monte Carlo (≥100 flights per AC text) is reduced-confidence with current single Derkachi fixture; trace flag in matrix. | `threshold_max` on probability | < 1% / < 0.1% | AC-NEW-7 |
| NFT-SEC-02 | Network egress probe from SUT container | All non-`e2e-net` egress attempts blocked by Docker `internal: true`; per-attempt logged as security event in SUT log | `exact` (egress count = 0) + `regex` (security-event log emission) | N/A | RESTRICT-SAT-1 + AC-8.1 |
| NFT-SEC-03 | `ardupilot-plane-sitl` + un-signed MAVLink GPS_INPUT injection | AP SITL rejects unsigned messages on the signed channel; SUT-emitted (signed) messages pass; SBOM check confirms passkey configuration | `exact` (AP rejection of unsigned) + `boolean` (SBOM passkey present) | N/A | D-C8-9 + Mode B Fact #109 + AC-NEW-2 |
| NFT-SEC-04 | `cve-jpeg-fixture` fed to SUT image pipeline (C1 + C4 + C6 paths) | OpenCV ≥4.12.0 either decodes safely or rejects the file; no crash, no buffer overflow detected by AddressSanitizer | `boolean` on no-crash + ASan clean | N/A | D-CROSS-CVE-1 + Mode B Fact #112 |
## External Dependency Mocks
| External Service | Mock/Stub | How Provided | Behavior |
|-----------------|-----------|-------------|----------|
| Azaion Suite Satellite Service (ingest API for AC-NEW-7 voting layer) | `mock-suite-sat-service` Docker service | Local FastAPI stub returning canned tile-publish-acknowledgement responses with deterministic IDs; logs every received tile + per-tile quality metadata to a file the e2e-runner reads back | Returns 202 Accepted on every well-formed publish; returns 400 on malformed; never simulates real voting (the project's role is to publish, the Service's role is to vote per Mode B Fact #105 / D-PROJ-2) |
| ArduPilot Plane FC | `ardupilot-plane-sitl` Docker service | Open-source SITL build of ArduPilot Plane stable; configured with `GPS_TYPE=14` per Source #2 to accept MAVLink GPS_INPUT | Real ArduPilot EKF behavior; we observe but do not patch |
| iNav FC | `inav-sitl` Docker service | Open-source iNav SITL; GPS provider configured to MSP per `docs/SITL/SITL.md` | Real iNav GPS subsystem behavior; we observe but do not patch |
| QGroundControl GCS | `mavproxy-listener` Docker service | Passive MAVLink listener that forwards SUT → GCS stream into a `.tlog` file the e2e-runner parses | Captures all STATUSTEXT, NAMED_VALUE_FLOAT, downsampled position frames for assertions |
| AI camera (AC-7.x) | NOT MOCKED — out of scope per Phase 1 gate | N/A | NOT COVERED in current matrix — see traceability matrix |
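
The request-handling behavior described for `mock-suite-sat-service` can be sketched independently of the web framework. The snippet below is a framework-free stand-in, not the FastAPI stub itself; the payload keys `tile_id` and `quality` and the hash-derived acknowledgement ID are purely hypothetical, since the publish schema is not specified here.

```python
import hashlib
import json

# Hypothetical required keys; the real publish schema is defined elsewhere.
REQUIRED_KEYS = ("tile_id", "quality")


def handle_publish(body):
    """Return (http_status, response) for one publish request body (bytes)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return 400, {"error": "malformed JSON"}
    if not isinstance(payload, dict) or any(k not in payload for k in REQUIRED_KEYS):
        return 400, {"error": "missing required field"}
    # Deterministic ID: the same request body always yields the same ack.
    ack_id = hashlib.sha256(body).hexdigest()[:16]
    return 202, {"ack_id": ack_id}
```

Deriving the ID from the body keeps responses reproducible across runs, which is one way to satisfy the "deterministic IDs" requirement; a counter reset per test would be another.
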
## Data Validation Rules
| Data Type | Validation | Invalid Examples | Expected System Behavior |
|-----------|-----------|-----------------|------------------------|
| Nav-camera frame | Resolution within ADTi spec (~5472×3648 production, downscaled equivalents allowed in Tier-1 Docker) | 0×0 frame, corrupt JPEG (CVE fixture), wrong color depth | Reject frame, log invalid-input event, do NOT advance estimator state |
| FC IMU sample | `SCALED_IMU2` fields present; timestamp monotonic; non-zero accelerometer norm | Missing field, backwards timestamp, NaN | Reject sample, log invalid-input event, propagate estimator from prior valid state |
| Satellite tile manifest | Required fields per `restrictions.md`: CRS, tile matrix, dimension, lat-adjusted m/px, capture date, source, compression. Ground resolution no coarser than the 0.5 m/px floor of AC-8.1 (m/px value ≤ 0.5). `capture_date` within AC-8.2 freshness window. | Missing `capture_date`, m/px = 1.0 (coarser than the 0.5 m/px floor), `capture_date` older than freshness threshold | Reject tile load OR downgrade to non-`satellite_anchored` source label per AC-NEW-6 |
| Spoofed FC GPS | (FC-side input the SUT detects) | GPS jump >200 m between consecutive 5 Hz frames; FC GPS-health flag toggled to spoofed | SUT switches estimator label to `dead_reckoned`, stops promoting FC GPS, continues per AC-NEW-8 |
| MAVLink GPS_INPUT outbound | Honest covariance — `horiz_accuracy` ≥ estimator's 95% covariance semi-major axis | Under-reported covariance | This is a defect (AC-NEW-4) — fail NFT-PERF-04 if observed |
| MAVLink message signature | MAVLink 2.0 signed on AP wired channel per D-C8-9 | Unsigned message on signed channel | AP-side rejection (NFT-SEC-03 expected behavior) |
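
The FC IMU sample rule above could be enforced with a validator along these lines. This is a sketch assuming each sample arrives as a dict keyed by the standard MAVLink `SCALED_IMU2` field names, and that only a backwards timestamp (not a repeated one) counts as non-monotonic; the function name and return convention are assumptions.

```python
import math

# Subset of MAVLink SCALED_IMU2 fields the estimator is assumed to need.
REQUIRED_FIELDS = ("time_boot_ms", "xacc", "yacc", "zacc", "xgyro", "ygyro", "zgyro")


def validate_imu_sample(sample, prev_time_boot_ms=None):
    """Return None if the sample is valid, else a short rejection reason."""
    for field in REQUIRED_FIELDS:
        if field not in sample:
            return "missing field: " + field
    values = [float(sample[field]) for field in REQUIRED_FIELDS]
    if any(math.isnan(v) or math.isinf(v) for v in values):
        return "non-finite value"
    if prev_time_boot_ms is not None and sample["time_boot_ms"] < prev_time_boot_ms:
        return "backwards timestamp"
    accel_norm = math.sqrt(sample["xacc"] ** 2 + sample["yacc"] ** 2 + sample["zacc"] ** 2)
    if accel_norm == 0.0:
        return "zero accelerometer norm"
    return None
```

On a non-`None` result the caller would log an invalid-input event and propagate the estimator from the prior valid state, as the table requires.
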
@@ -0,0 +1,109 @@
# Traceability Matrix
This matrix is the canonical view of test coverage for the planning context. It traces every numbered AC and every restriction to the test scenario IDs that exercise it.
**Coverage discipline**: an AC counts as **Covered** when at least one test scenario has a quantifiable pass/fail criterion that exercises it. **PARTIAL** rows are exercised but with reduced confidence — the row's "Mitigation" column points to the action item (Plan-phase decision or D-PROJ gate) that, when resolved, lifts the row to Covered. **NOT COVERED** rows are deliberately deferred (out-of-scope for data acquisition per Phase 1 gate, or covered at a later workflow stage); each has a stated mitigation.
## Acceptance Criteria Coverage
| AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-1.1 | Frame-center GPS within 50 m for ≥80% of normal-flight photos | FT-P-01 | Covered |
| AC-1.2 | Frame-center GPS within 20 m for ≥50% of normal-flight photos | FT-P-01 | Covered |
| AC-1.3 | Cumulative drift between satellite-anchored fixes <100 m visual / <50 m IMU-fused | FT-P-02 | Covered |
| AC-1.4 | Estimate reports 95% covariance + source label | FT-P-03 | Covered |
| AC-2.1a | Frame-to-frame registration ≥95% on normal segments | FT-P-04 | Covered |
| AC-2.1b | Satellite-anchor registration meets AC-1.1/1.2/2.2/8.2/8.6 | FT-P-05, FT-P-19 | Covered |
| AC-2.2 | MRE <1 px frame-to-frame, <2.5 px cross-domain | FT-P-05, FT-P-06 | Covered |
| AC-3.1 | Tolerate up to 350 m outliers, tilt ±20° | FT-N-01 | Covered |
| AC-3.2 | Tolerate sharp turns; recovery via satellite re-loc | FT-P-07, FT-N-02 | Covered |
| AC-3.3 | Handle ≥3 disconnected segments via satellite re-loc | FT-P-08 | Covered |
| AC-3.4 | On ≥3 frames + ≥2 s outage, request operator re-loc; FC dead-reckons | FT-N-03 | Covered |
| AC-3.5 | Visual blackout + spoofed GPS failsafe | FT-N-04 | Covered |
| AC-4.1 | E2E latency <400 ms p95 | NFT-PERF-01 (Tier-2) | Covered |
| AC-4.2 | Memory <8 GB on Jetson | NFT-LIM-01 (Tier-2) | Covered |
| AC-4.3 | FC output contract: GPS_INPUT (AP) + MSP2_SENSOR_GPS (iNav) with honest covariance | FT-P-03, FT-P-09-AP, FT-P-09-iNav | Covered |
| AC-4.4 | Estimates streamed frame-by-frame | NFT-PERF-02 | Covered |
| AC-4.5 (revised) | Internal smoothing improves past-keyframe estimates (NOT FC retroactive correction per Mode B Fact #107) | FT-P-10 | Covered |
| AC-5.1 | Init from FC EKF's last valid GPS + IMU-extrapolated | FT-P-11 | Covered |
| AC-5.2 | On >3 s without estimate, FC IMU-only fallback; SUT logs | NFT-RES-01 | Covered |
| AC-5.3 | On reboot, re-init from FC IMU-extrapolated pose | NFT-RES-02 | Covered |
| AC-6.1 | GCS stream at 1-2 Hz | FT-P-12 | Covered |
| AC-6.2 | GCS may send commands via standard MAVLink | FT-P-13 | Covered |
| AC-6.3 | WGS84 output | FT-P-14 | Covered |
| AC-7.1 | AI-camera object localization, level-flight accuracy | — | NOT COVERED — out of scope for current data acquisition (no AI-camera fixture; AC-7.x scoped to a different sensor). Mitigation: defer to a follow-up cycle with AI-camera fixture; flag in `_docs/_process_leftovers/` as `2026-05-09_ai-camera-fixture-deferred.md` |
| AC-7.2 | AI-camera object coordinates from gimbal/zoom/altitude | — | NOT COVERED — same as AC-7.1 |
| AC-8.1 | Imagery via Suite Sat Service offline cache, ≥0.5 m/px | FT-P-15, FT-P-16, NFT-SEC-02 | Covered |
| AC-8.2 | Tile freshness <6 mo (active-conflict) / <12 mo (rear) | FT-N-05 | Covered |
| AC-8.3 | Imagery pre-loaded onto companion before flight | FT-P-15, FT-P-16 | Covered |
| AC-8.4 | Mid-flight tile generation with quality metadata | FT-P-17 | Covered |
| AC-8.5 | No raw nav/AI-cam frame retention except thumbnail log | FT-P-18 | Covered |
| AC-8.6 | Satellite relocalization scale-ratio + scene-change | FT-P-19 (scale FULL; scene-change PARTIAL) | PARTIAL — scene-change subset reduced confidence (only 2/60 stills have paired sat refs; no labeled change-pair dataset). Independent of the AC-NEW-4 / AC-NEW-7 multi-flight gap (those rows were resolved by AC-text relaxation 2026-05-09; AC-8.6 scene-change still requires a labeled change-pair dataset that synthetic perturbations cannot substitute for). Mitigation: deferred to a follow-up cycle when labeled change-pair data becomes available; surfaced in the Step 4 risk register |
| AC-NEW-1 | Cold-start TTFF <30 s p95 | NFT-PERF-03 (Tier-2) | Covered |
| AC-NEW-2 | Spoofing-promotion latency <3 s p95 | NFT-PERF-04 | Covered |
| AC-NEW-3 | FDR ≤64 GB / flight, no silent drops | NFT-LIM-02 | Covered |
| AC-NEW-4 | False-position safety: P(>500 m)<0.1%, P(>1 km)<0.01% | NFT-RES-03 | Covered — AC text relaxed 2026-05-09 to Monte-Carlo-over-current-data with stated 95% CI (Plan Phase 2a.0 outcome). Multi-flight statistical headroom is residual risk in the Step 4 risk register; D-PROJ-3 reopens validation when additional multi-flight data becomes available |
| AC-NEW-5 | Operating envelope -20 °C to +50 °C, 25 W TDP, 8 h, no throttle | NFT-LIM-04 (workstation baseline only) | PARTIAL — workstation thermal-day baseline only. Mitigation: chamber-attached Jetson runner + DO-160G shaker rig — out of scope for data-acquisition per Phase 1 gate; tracked as a release-tag-blocking gate |
| AC-NEW-6 | System rejects/downgrades stale tiles | FT-N-05, FT-N-06 | Covered |
| AC-NEW-7 | Cache poisoning: P(misalign>30 m)<1%, P(>100 m)<0.1% | NFT-SEC-01 | Covered (onboard-side) — AC text relaxed 2026-05-09 to Monte-Carlo-over-current-data with stated 95% CI for the onboard contribution. Cross-suite voting-layer contract verification (D-PROJ-2) is a parent-suite design task tracked outside this Plan cycle; multi-flight statistical headroom remains residual risk (D-PROJ-3) |
| AC-NEW-8 | Visual blackout + spoof degraded-mode escalation | FT-N-04, NFT-RES-04 | Covered |
## Restrictions Coverage
| Restriction ID | Restriction (one-line) | Test IDs | Coverage |
|---------------|-------------|----------|----------|
| RESTRICT-UAV-1 | Fixed-wing UAV, nav-camera fixed downward | FT-N-01 (tilt envelope) | Covered (envelope assertion) |
| RESTRICT-UAV-2 | Mission profile: 8 h flights, 60 km/h, ≤400 km² area | NFT-LIM-01, NFT-LIM-02 (8 h replay) | Covered |
| RESTRICT-UAV-3 | Sharp turns may share <5% overlap | FT-P-07, FT-N-02 | Covered |
| RESTRICT-UAV-4 | No raw-photo storage; tile cache + FDR only | FT-P-18, NFT-LIM-03 | Covered |
| RESTRICT-CAM-1 | Nav camera ADTi 20MP 20L V1 nadir-fixed | FT-N-01 (tilt envelope), test fixture validation | Covered |
| RESTRICT-CAM-2 | AI camera: gimbal+zoom only; level-flight scope | — | NOT COVERED — paired with AC-7.x deferral |
| RESTRICT-SAT-1 | Onboard cache offline-only; no in-flight Service calls | FT-P-16, NFT-SEC-02, NFT-SEC-05 | Covered |
| RESTRICT-SAT-2 | Cache budget 10 GB across operational area | NFT-LIM-03 | Covered |
| RESTRICT-SAT-3 | Tile freshness per AC-8.2 / AC-NEW-6 | FT-N-05, FT-N-06 | Covered |
| RESTRICT-SAT-4 | No Sentinel-2 / sub-0.5 m/px imagery | FT-P-15 (resolution floor) | Covered |
| RESTRICT-HW-1 | Jetson Orin Nano Super, 8 GB shared LPDDR5, 25 W | NFT-LIM-01, NFT-LIM-04, NFT-LIM-05 | Covered |
| RESTRICT-HW-2 | Cooling 25 W continuous, 8 h, upper temp envelope | NFT-LIM-04, deferred chamber test | PARTIAL — chamber portion deferred; same as AC-NEW-5 |
| RESTRICT-FC-1 | ArduPilot Plane + iNav supported; PX4 out of scope | FT-P-09-AP, FT-P-09-iNav, parameterized matrix | Covered |
| RESTRICT-FC-2 | iNav has no inbound MAVLink ext-positioning; MSP2 only | FT-P-09-iNav | Covered |
| RESTRICT-FC-3 | Output contract: WGS84 GPS via per-FC interface | FT-P-09-AP, FT-P-09-iNav, FT-P-14 | Covered |
| RESTRICT-COMM-1 | MAVLink for GCS link (QGroundControl) | FT-P-12, FT-P-13 | Covered |
| RESTRICT-COMM-2 | iNav has no MAVLink signing; accepted residual risk | NFT-SEC-03 (asymmetry note) | Covered (documented asymmetry) |
| RESTRICT-FAIL-1 | >3 s no estimate → FC IMU-only fallback | NFT-RES-01 | Covered |
| RESTRICT-FAIL-2 | False-position safety budget (AC-NEW-4) | NFT-RES-03 | Covered (via AC-NEW-4 relaxation 2026-05-09); multi-flight statistical headroom is residual risk in Step 4 |
| RESTRICT-FAIL-3 | Cold-start TTFF (AC-NEW-1), spoofing-promotion (AC-NEW-2) | NFT-PERF-03, NFT-PERF-04 | Covered |
## Coverage Summary
> Revised 2026-05-09 (Plan Phase 2a.0 outcomes): three rows moved PARTIAL → Covered (AC-NEW-4, AC-NEW-7, RESTRICT-FAIL-2) following AC-text relaxation per Q3=B. Restriction row count corrected from 19 to 20 (pre-existing arithmetic error).
| Category | Total Items | Covered | PARTIAL | Not Covered | Coverage % (Covered + PARTIAL counted half) |
|----------|-----------|---------|---------|-------------|--------------------------------------------|
| Acceptance Criteria | 39 | 35 | 2 | 2 | 92.3% |
| Restrictions | 20 | 18 | 1 | 1 | 92.5% |
| **Total** | **59** | **53** | **3** | **3** | **92.4%** |
Coverage clears the 75% gate with margin under both the inclusive reading (PARTIAL = covered) and the strict reading (PARTIAL not counted) — strict coverage is **(53 / 59) = 89.8%**. The remaining PARTIAL / Not Covered items are: AC-8.6 scene-change subset (needs labeled change-pair dataset, deferred), AC-NEW-5 hot-soak chamber (physical hardware, deferred), AC-7.1 / AC-7.2 (no AI-camera fixture, deferred), RESTRICT-CAM-2 (paired with AC-7.x), RESTRICT-HW-2 chamber portion (paired with AC-NEW-5).
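
The percentages in the summary table and the strict figure quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the function name and the three reading labels are not part of the spec.

```python
def coverage_percentages(covered, partial, not_covered):
    """Coverage under three readings of PARTIAL rows, as percentages."""
    total = covered + partial + not_covered
    return {
        "half": (covered + 0.5 * partial) / total * 100,  # PARTIAL counted half
        "inclusive": (covered + partial) / total * 100,   # PARTIAL = covered
        "strict": covered / total * 100,                  # PARTIAL not counted
    }


acceptance = coverage_percentages(35, 2, 2)
restrictions = coverage_percentages(18, 1, 1)
overall = coverage_percentages(53, 3, 3)
```

Rounded to one decimal place, the "half" reading gives 92.3%, 92.5%, and 92.4% for the three rows, and the overall strict and inclusive readings are 89.8% and 94.9%, all above the 75% gate.
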
## Uncovered Items Analysis
> Revised 2026-05-09 (Plan Phase 2a.0): AC-NEW-4 and AC-NEW-7 rows removed from this section after AC-text relaxation (Q3=B) flipped them to Covered with residual risk tracked in the Step 4 risk register.
| Item | Reason Not Covered | Risk | Mitigation |
|------|-------------------|------|-----------|
| AC-7.1 | No AI-camera fixture in `input_data/`; AC scoped to a different sensor than the nav camera; level-flight assumption + bank/pitch <5° is independent of the nav-cam pipeline | Object-localization accuracy untested; AI consumers may receive wrong coordinates if not flight-tested | Deferred to a follow-up Plan cycle scoped to AI-camera integration; to be recorded in `_docs/_process_leftovers/2026-05-09_ai-camera-fixture-deferred.md` (created in Phase 3 if confirmed). |
| AC-7.2 | Same as AC-7.1 | Same | Same |
| AC-8.6 (scene-change subset) | Only 2/60 stills paired with `_gmaps.png`; no labeled change-pair dataset bundled in `input_data/`. Independent of the AC-NEW-4 / AC-NEW-7 multi-flight gap (those were resolved by AC-text relaxation; AC-8.6 still needs labeled change-pair data) | Stale-tile match in active-conflict sectors may yield false `satellite_anchored`; AC-NEW-6 partially compensates but scene-change recall is unmeasured | Deferred to a follow-up cycle when labeled change-pair data becomes available (Maxar Open Data Ukraine + AerialVL change-pair subset). Scale-ratio half of AC-8.6 IS covered. |
| AC-NEW-5 | Workstation thermal-day baseline only. AC-NEW-5 hot-soak (25 W @ +50 °C, 8 h, no throttle) requires a thermal chamber — physical hardware, not data | Without chamber test, AC-4.1 latency budget at +50 °C is not validated; D-CROSS-LATENCY-1 hybrid auto-degrade unproven under real thermal stress | Chamber-attached Jetson runner gated as release-tag-blocker. NOT counted as data-acquisition deferral; counted as physical hardware deferral. |
| RESTRICT-CAM-2 | Paired with AC-7.x — no AI-camera fixture | Same as AC-7.x | Same as AC-7.x |
| RESTRICT-HW-2 (chamber portion) | Paired with AC-NEW-5 — physical chamber required | Same as AC-NEW-5 | Same as AC-NEW-5 |
## New Findings Forwarded into Plan (Steps 2 + 3 Inputs)
These insights from Phase 2 augment findings F1-F5 carried over from Phase 1; together they feed into Solution Analysis (Step 2) and Component Decomposition (Step 3):
1. **F6 — Two-tier execution profile is a first-class architectural concern.** The split between Tier-1 (workstation Docker) and Tier-2 (Jetson hardware) means several ACs have validation locations that must appear in both the deployment plan and the CI matrix design. Add a "Tier-2 hardware-runner availability" entry to the project's risk register (Step 4).
2. **F7 — `mock-suite-sat-service` is a real testing-time dependency that must be documented as a component boundary (not just a test fixture).** It encodes the publish side of D-PROJ-2 and feeds into both NFT-SEC-01 and FT-P-17. Component decomposition (Step 3) should treat the Service-publish contract as an explicit C8/C10 cross-cutting boundary rather than burying it inside C8.
3. **F8 — VioStrategy parameterization in CI requires both a production binary AND a research binary.** D-C1-1-SUB-A locked the `BUILD_VINS_MONO=ON/OFF` split; the test plan must produce both binaries on every PR for the comparative-study report (IT-12 in `solution.md`). Add to deployment plan (Step 2) and to epic/work-item planning (Step 6).
4. **F9 — D-PROJ-3 (fixture acquisition) is now a named deliverable** with a clear gate: it must be resolved before greenfield Step 5 re-runs the full test-spec with architecture context. Promote to risk register and to the architecture's open-items list.
5. **F10 — Defense-in-depth security layer (NFT-SEC-05 DNS blackholing, OPENCV ASan build, SBOM signing-passkey verification)** implies CI/build infrastructure features (multi-stage build for ASan instrumentation, SBOM generator, lockfile linter). Add to deployment plan (Step 2).
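
To make F8 concrete, the job matrix implied by the two binaries can be enumerated as below. This is a sketch under stated assumptions rather than the project's CI definition: `ci_matrix` and the job-dictionary keys are hypothetical, and it assumes the research binary re-runs `okvis2` and `klt_ransac` alongside `vins_mono`. The adapter and strategy names come from the introduction of this document. Tests that do not touch the FC contract or Tier-1 product correctness would use only a subset of these axes.

```python
from itertools import product

FC_ADAPTERS = ("ardupilot", "inav")
PRODUCTION_STRATEGIES = ("okvis2", "klt_ransac")
RESEARCH_ONLY_STRATEGIES = ("vins_mono",)  # only present when BUILD_VINS_MONO=ON


def ci_matrix() -> list[dict[str, str]]:
    """Enumerate one CI job per (binary, FC adapter, VioStrategy) combination."""
    jobs = []
    for build_vins_mono in (False, True):
        strategies = PRODUCTION_STRATEGIES
        if build_vins_mono:
            # Assumption: the research binary also exercises the production strategies.
            strategies = strategies + RESEARCH_ONLY_STRATEGIES
        for fc_adapter, vio_strategy in product(FC_ADAPTERS, strategies):
            jobs.append(
                {
                    "binary": "research" if build_vins_mono else "production",
                    "BUILD_VINS_MONO": "ON" if build_vins_mono else "OFF",
                    "fc_adapter": fc_adapter,
                    "vio_strategy": vio_strategy,
                }
            )
    return jobs


if __name__ == "__main__":
    for job in ci_matrix():
        print(job)
```

Under these assumptions the matrix has 10 jobs per PR (4 production, 6 research), which gives a first estimate of the runner capacity the deployment plan needs to budget for.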