mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 20:31:12 +00:00
dcde602f61
New e2e test runs gps-denied-replay --auto-trim against the real
derkachi.tlog + flight video + AZ-702 calibration, computes the
horizontal-error distribution (mean/p50/p95/p99 + 10/25/50/100 m
threshold-hit share), writes _docs/06_metrics/real_flight_
validation_{date}.md, and asserts honest PASS/FAIL with no @xfail
mask. AZ-404's 1-min test is untouched (sibling, not replacement).
Extends gps_compare.py with HorizontalErrorDistribution +
percentile_sorted (numpy-equivalent linear interpolation). New
test helper _report_writer.py renders the canonical Markdown
schema documented as FT-P-20 in blackbox-tests.md.
16 new unit tests pin distribution arithmetic, verdict gate,
failure-message templating (references calibration acquisition
method per AC-3), and report layout. 129 passed in focused
regression, 3 skipped (real video / Tier-2 prerequisites).
Zero new mypy --strict errors.
Co-authored-by: Cursor <cursoragent@cursor.com>
675 lines
30 KiB
Markdown
# Blackbox Tests

All tests run from the `e2e-runner` container against the SUT through public boundaries only (frame source, FC inbound stream, tile cache mount, FC outbound observed via SITL, GCS observed via mavproxy-listener, FDR via post-run filesystem read). Two FC adapters parameterize every test that touches the FC contract: `ardupilot` and `inav`. Two `VioStrategy` modes parameterize Tier-1 product correctness tests: `okvis2` (production-default) and `klt_ransac` (mandatory simple-baseline). `vins_mono` is parameterized only when the research build is under test.

## Positive Scenarios

### FT-P-01: Still-image satellite-anchor frame-center accuracy

**Summary**: Validates the canonical satellite-anchor frame-center geolocation pipeline against the 60-image GT set.
**Traces to**: AC-1.1, AC-1.2
**Category**: Position Accuracy

**Preconditions**:
- `tile-cache-fixture` mounted at `/var/azaion/tile-cache`.
- SUT cold-started with no prior state; configured for the FC adapter under test.

**Input data**: `still-image-set-60` (per `test-data.md`).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each image `AD0000NN.jpg` in order, write the frame to the SUT's frame-source path and wait up to 5 s for the corresponding outbound `GPS_INPUT` (AP) / `MSP2_SENSOR_GPS` (iNav) message at the SITL listener | One outbound message per input image; payload includes WGS84 lat/lon |
| 2 | Compute Vincenty geodesic distance between estimated lat/lon and `coordinates.csv` GT row for that image | Per-image error ≤ 50 m for ≥80% of images, ≤ 20 m for ≥50% |
| 3 | Capture per-image error to `e2e-results/run-${RUN_ID}/ft-p-01.csv` | CSV produced with one row per image |

**Expected outcome**: aggregate `pass_count(error≤50m) ≥ 48` AND `pass_count(error≤20m) ≥ 30` (matching the rule in `expected_results/results_report.md`).
**Max execution time**: 5 min (60 images × ~5 s including SITL round trip).
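
The aggregate rule above can be sketched as a small check over the per-image errors. This is a minimal illustration that assumes the per-image geodesic errors (in metres) have already been computed; the function name `ft_p_01_verdict` is hypothetical and not part of the test suite.

```python
def ft_p_01_verdict(errors_m):
    """Apply the FT-P-01 aggregate rule to a list of per-image errors in metres."""
    within_50 = sum(1 for e in errors_m if e <= 50.0)
    within_20 = sum(1 for e in errors_m if e <= 20.0)
    # 48 and 30 are 80 % and 50 % of the 60-image set, as in the expected outcome.
    return within_50 >= 48 and within_20 >= 30


if __name__ == "__main__":
    sample = [10.0] * 30 + [40.0] * 18 + [80.0] * 12
    print(ft_p_01_verdict(sample))
```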

---

### FT-P-02: Derkachi VIO drift between satellite anchors

**Summary**: Validates cumulative drift between consecutive satellite-anchored fixes during the Derkachi flight replay.
**Traces to**: AC-1.3
**Category**: Position Accuracy

**Preconditions**:
- `tile-cache-fixture` mounted (covers Derkachi route).
- SUT cold-started; FC adapter under test connected via SITL; `data_imu.csv` replayed at 10 Hz into FC IMU stream.

**Input data**: `derkachi-fixture` video at 30 fps + IMU CSV at 10 Hz.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start synchronized video + IMU replay (3 video frames per IMU row) | SUT begins emitting estimates at the SUT's runtime cadence |
| 2 | At each frame whose outbound estimate carries `source_label = satellite_anchored`, record the propagated centre estimate of the prior visual-only segment AND the new anchor centre | Two values per anchor pair captured |
| 3 | Compute per-anchor-pair drift = ‖propagated_centre − next_anchor_centre‖. Bin by `last_satellite_anchor_age_ms`. | Bins populated; CSV emitted |

**Expected outcome**: Across all anchor pairs, at least 95% satisfy `drift < 100 m` (visual-only) AND `drift < 50 m` (when CombinedImuFactor IMU fusion is active in C5). Drift distribution monotonically grows with anchor age, with no anomalous spike.
**Max execution time**: 10 min (8 min replay + parsing).
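
Step 3 and the 95% gate can be illustrated with a short sketch. It assumes each anchor pair has already been reduced to an `(anchor_age_ms, drift_m)` tuple; the 10 s bin width and both function names are assumptions made for illustration, since the spec does not fix them.

```python
from collections import defaultdict


def bin_drift_by_anchor_age(pairs, bin_ms=10_000):
    """Group drift values (metres) into anchor-age bins keyed by bin start (ms)."""
    bins = defaultdict(list)
    for age_ms, drift_m in pairs:
        bins[(age_ms // bin_ms) * bin_ms].append(drift_m)
    return dict(sorted(bins.items()))


def share_below(drifts_m, limit_m):
    """Fraction of drift values strictly below the limit."""
    return sum(1 for d in drifts_m if d < limit_m) / len(drifts_m)


if __name__ == "__main__":
    pairs = [(1_500, 5.0), (9_000, 12.0), (12_000, 30.0)]
    print(bin_drift_by_anchor_age(pairs))
    print(share_below([d for _, d in pairs], 100.0) >= 0.95)
```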

---

### FT-P-03: Estimate output schema and source-label semantics

**Summary**: Validates that the SUT's outbound estimate carries every required field with correct types and that the source label is one of the three allowed values.
**Traces to**: AC-1.4, AC-4.3
**Category**: Position Accuracy / FC Contract

**Preconditions**:
- One image from `still-image-set-60` already loaded into the cache fixture.
- SUT cold-started.

**Input data**: any single image (default `AD000001.jpg`).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Push the image to the frame source | SUT emits one outbound `GPS_INPUT` (AP) / `MSP2_SENSOR_GPS` (iNav) AND one out-of-band channel message (MAVLink `STATUSTEXT` or `NAMED_VALUE_FLOAT` per AC-4.3) carrying the source label |
| 2 | Read the SITL-side fields | Schema match: `lat`, `lon`, `cov_semi_major_m`, `last_satellite_anchor_age_ms` present and well-typed |
| 3 | Read the out-of-band label channel | Label ∈ `{satellite_anchored, visual_propagated, dead_reckoned}` |

**Expected outcome**: Schema check passes AND label is in the allowed set.
**Max execution time**: 30 s.
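
A minimal sketch of the schema and label checks in steps 2 and 3 follows. The field names and the label set come from the table above; the Python types assigned to each field and the helper name `schema_errors` are assumptions, because the spec only says "well-typed".

```python
ALLOWED_LABELS = {"satellite_anchored", "visual_propagated", "dead_reckoned"}

# Field names are from the spec; the expected Python types are an assumption.
REQUIRED_FIELDS = {
    "lat": float,
    "lon": float,
    "cov_semi_major_m": float,
    "last_satellite_anchor_age_ms": int,
}


def schema_errors(estimate, label):
    """Return a list of human-readable problems; an empty list means the check passes."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in estimate:
            errors.append(f"missing field: {name}")
        elif isinstance(estimate[name], bool) or not isinstance(estimate[name], expected):
            errors.append(f"bad type for {name}")
    if label not in ALLOWED_LABELS:
        errors.append(f"unknown label: {label}")
    return errors
```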

---

### FT-P-04: Derkachi frame-to-frame registration success rate

**Summary**: Validates frame-to-frame registration succeeds for ≥95% of "normal" segments of the Derkachi flight.
**Traces to**: AC-2.1a
**Category**: Image Processing

**Preconditions**:
- SUT cold-started; FC adapter and VioStrategy both parameterized.

**Input data**: `derkachi-fixture` (full duration). "Normal" segments derived per AC-2.1a: nadir ±10° bank/pitch (estimated from `SCALED_IMU2`-derived attitude), ≥40% inferred prior-frame overlap (heuristic from frame-to-frame translation magnitude).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay the Derkachi fixture | SUT emits per-frame registration-success metric (exposed via `NAMED_VALUE_FLOAT` or in FDR per AC-NEW-3) |
| 2 | After replay, compute success-ratio over normal segments only | Success ratio ≥ 0.95 |

**Expected outcome**: ≥95% on normal segments. Sharp-turn segments (excluded from this denominator) are exercised separately by FT-N-02.
**Max execution time**: 12 min.
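
The "normal segment" filter and the success ratio in step 2 could be computed along these lines. This is a sketch under assumptions: the per-frame record layout (`bank_deg`, `pitch_deg`, `overlap`, `registered`) and the function name are hypothetical; only the ±10° and 40% thresholds come from the text above.

```python
def normal_segment_success_ratio(frames, max_tilt_deg=10.0, min_overlap=0.40):
    """Registration success ratio over frames that meet the 'normal' criteria."""
    normal = [
        f for f in frames
        if abs(f["bank_deg"]) <= max_tilt_deg
        and abs(f["pitch_deg"]) <= max_tilt_deg
        and f["overlap"] >= min_overlap
    ]
    if not normal:
        raise ValueError("no normal frames in the replay")
    return sum(1 for f in normal if f["registered"]) / len(normal)
```

Frames that fail the filter (for example sharp-turn frames) never enter the denominator, which matches the exclusion noted in the expected outcome.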

---

### FT-P-05: Satellite-anchor cross-domain registration

**Summary**: Validates the satellite-anchor (UAV→satellite cross-domain) matcher succeeds within the cross-domain MRE budget.
**Traces to**: AC-2.1b, AC-2.2
**Category**: Image Processing

**Preconditions**:
- `tile-cache-fixture` includes the still-image footprints.
- SUT cold-started.

**Input data**: `still-image-set-60` plus `still-image-sat-refs-2` (for the 2 images with paired `_gmaps.png`).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each still image, push to frame source | One satellite-anchor result per image |
| 2 | Read per-frame MRE (via FDR or `NAMED_VALUE_FLOAT`) | MRE recorded |
| 3 | Aggregate per-image accuracy AND MRE distribution | All images: MRE < 2.5 px; ≥80% within 50 m of GT; ≥50% within 20 m of GT |

**Expected outcome**: AC-1.1, AC-1.2, AC-2.1b, AC-2.2 all satisfied.
**Max execution time**: 5 min.

---

### FT-P-06: Mean Reprojection Error budgets (frame-to-frame + cross-domain)

**Summary**: Validates the two MRE budgets are honored.
**Traces to**: AC-2.2

**Preconditions**: Same as FT-P-04 + FT-P-05.

**Input data**: `derkachi-fixture` (frame-to-frame MRE) + `still-image-set-60` (cross-domain MRE).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Run FT-P-04 and FT-P-05 in sequence; collect per-frame MRE from both runs | MRE values captured |
| 2 | Aggregate by domain (frame-to-frame vs satellite-anchored) | Distribution per domain |

**Expected outcome**: Frame-to-frame MRE < 1.0 px (95th percentile); cross-domain MRE < 2.5 px (95th percentile).
**Max execution time**: piggybacks on FT-P-04 / FT-P-05.
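
The two 95th-percentile budgets can be checked with the standard library. The sketch below uses `statistics.quantiles` with the inclusive method as a stand-in for whatever percentile routine the suite actually uses; the dictionary layout and function names are assumptions.

```python
from statistics import quantiles

# Budgets in pixels, taken from the expected outcome above.
MRE_BUDGET_PX = {"frame_to_frame": 1.0, "cross_domain": 2.5}


def p95(values):
    """95th percentile using inclusive linear interpolation (needs >= 2 samples)."""
    return quantiles(values, n=100, method="inclusive")[94]


def mre_budgets_ok(mre_by_domain):
    """True when every domain's 95th-percentile MRE is strictly under its budget."""
    return all(p95(mre_by_domain[d]) < MRE_BUDGET_PX[d] for d in MRE_BUDGET_PX)
```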

---

### FT-P-07: Sharp-turn recovery via satellite reference

**Summary**: Validates that frames during sharp turns may fail frame-to-frame but recover via satellite-reference re-localization.
**Traces to**: AC-3.2

**Preconditions**:
- Sharp-turn segment of the Derkachi flight identified by gyro_z spikes in `SCALED_IMU2`. (If Derkachi has no sharp turn meeting AC-3.2 thresholds, fall back to a synthetic gyro overlay; flag in FDR.)

**Input data**: `derkachi-fixture` filtered to sharp-turn segment(s).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay sharp-turn segment | SUT emits `source_label = visual_propagated` or `dead_reckoned` during turn |
| 2 | After turn, observe next satellite-anchor attempt | Recovery: `source_label = satellite_anchored` returns within 3 frames of turn end; drift ≤ 200 m, heading change handled |

**Expected outcome**: Recovery within 3 frames; <200 m drift; <70° heading change handled.
**Max execution time**: 5 min (per turn segment, multiple per replay).

---

### FT-P-08: Multi-segment satellite-reference re-localization

**Summary**: Validates ≥3 disconnected segments per flight handled via satellite-reference re-localization.
**Traces to**: AC-3.3

**Preconditions**:
- `multi-segment-derkachi` synthetic fixture generated with 3+ blackout windows.

**Input data**: `multi-segment-derkachi`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay with injected blackout windows | SUT emits `dead_reckoned` during each blackout |
| 2 | At end of each blackout, observe re-localization | `source_label` returns to `satellite_anchored` within 3 frames; trajectory continuity preserved (no >100 m jump) |

**Expected outcome**: All 3+ segments re-localized successfully; no trajectory jump exceeds 100 m.
**Max execution time**: 12 min.

---

### FT-P-09-AP: ArduPilot Plane GPS_INPUT contract conformance + signing

**Summary**: Validates `GPS_INPUT` reaches AP SITL, AP EKF accepts it as primary GPS, and MAVLink 2.0 message signing handshake completes per D-C8-9.
**Traces to**: AC-4.3 (AP), D-C8-9, AC-NEW-2 (precondition)
**Category**: FC Contract / Security

**Preconditions**:
- `ardupilot-plane-sitl` running with `GPS_TYPE=14`.
- `mavlink-passkey` loaded as Docker secret into SUT.

**Input data**: `derkachi-fixture` (any 60 s segment).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start SUT with FC adapter `ardupilot` | Signing handshake completes within 5 s; signed channel established |
| 2 | Replay 60 s of Derkachi | SUT emits signed `GPS_INPUT` at the configured rate |
| 3 | Read AP `EK3_SRC1_POSXY` parameter via MAVProxy | Value reads `3` (GPS source) |
| 4 | Read AP-side GPS health via `GPS_RAW_INT` | Fix type ≥ 3 (3D fix), HDOP within nominal |

**Expected outcome**: Signing handshake succeeds; AP EKF position source is set to GPS; `GPS_RAW_INT` shows healthy fix.
**Max execution time**: 90 s.

---

### FT-P-09-iNav: iNav MSP2_SENSOR_GPS contract conformance

**Summary**: Validates `MSP2_SENSOR_GPS` reaches iNav SITL and iNav GPS provider accepts it as the sole source.
**Traces to**: AC-4.3 (iNav)
**Category**: FC Contract

**Preconditions**:
- `inav-sitl` running with GPS provider configured to MSP per `docs/SITL/SITL.md`.

**Input data**: `derkachi-fixture` (any 60 s segment).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start SUT with FC adapter `inav` | TCP connection to `inav-sitl:5760` established |
| 2 | Replay 60 s of Derkachi | SUT emits `MSP2_SENSOR_GPS` (ID 0x1F03) frames at 5 Hz |
| 3 | Read iNav GPS state via MSP query | `gpsSol.fixType` ≥ 3, `gpsSol.numSat` reflects emitted value, provider=MSP |

**Expected outcome**: iNav GPS state reflects emitted frames; no fallback to internal GPS.
**Max execution time**: 90 s.

---

### FT-P-10: GTSAM smoothing-loop look-back accuracy (IT-11)

**Summary**: Validates the smoothing-loop's past-keyframe pose estimates improve over raw single-shot estimates (Mode B Fact #107). NOT validated as FC-side retroactive correction.
**Traces to**: AC-4.5 (revised scope per Mode B), Mode B Fact #107
**Category**: Position Accuracy / Internal smoothing

**Preconditions**:
- SUT cold-started; FDR enabled.

**Input data**: `derkachi-fixture` full replay.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay Derkachi end-to-end | FDR contains per-keyframe (a) raw single-shot pose at first emission, (b) smoothed pose at iSAM2 convergence |
| 2 | After replay, parse FDR; for each past keyframe compute distance(raw, GT) and distance(smoothed, GT) | Per-keyframe pair extracted |
| 3 | Aggregate across keyframes | smoothed_error < raw_error for ≥80% of keyframes; mean improvement ≥ 5 m |

**Expected outcome**: Internal smoothing improves past-keyframe accuracy; FC-side retroactive correction NOT exercised (out of scope per Mode B revision A6).
**Max execution time**: 12 min.
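
Step 3's aggregate can be expressed compactly. The sketch assumes each keyframe has been reduced to a `(raw_error_m, smoothed_error_m)` pair and reads "mean improvement" as the mean of `raw - smoothed` over all keyframes; both the name `smoothing_verdict` and that reading are assumptions.

```python
def smoothing_verdict(error_pairs):
    """error_pairs: iterable of (raw_error_m, smoothed_error_m), one per keyframe."""
    pairs = list(error_pairs)
    improved_share = sum(1 for raw, smooth in pairs if smooth < raw) / len(pairs)
    mean_improvement_m = sum(raw - smooth for raw, smooth in pairs) / len(pairs)
    return improved_share >= 0.80 and mean_improvement_m >= 5.0
```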

---

### FT-P-11: Cold-start initialization from FC EKF

**Summary**: Validates SUT initialization from FC EKF's last valid GPS + IMU-extrapolated position at GPS denial.
**Traces to**: AC-5.1
**Category**: Startup

**Preconditions**:
- `cold-boot-fixture` provides a frozen FC pose snapshot.
- SUT not running.

**Input data**: `cold-boot-fixture`.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start `ardupilot-plane-sitl` (or `inav-sitl`) with the frozen-pose snapshot loaded | SITL EKF reflects the snapshot pose |
| 2 | Start SUT | SUT queries FC EKF; reads pose; initializes |
| 3 | Push first nav-camera frame | First outbound estimate's lat/lon within ±50 m of the FC EKF snapshot pose |

**Expected outcome**: First emitted estimate uses FC EKF's pose as prior, within ±50 m tolerance.
**Max execution time**: 60 s.

---

### FT-P-12: GCS downsample at 1-2 Hz

**Summary**: Validates position estimates + confidence stream to the GCS (via `mavproxy-listener`) at 1-2 Hz.
**Traces to**: AC-6.1
**Category**: GCS / Telemetry

**Preconditions**:
- `mavproxy-listener` running and capturing to `.tlog`.

**Input data**: `derkachi-fixture` 60 s segment.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay | SUT emits to FC at runtime cadence (~3 Hz) AND to GCS at 1-2 Hz |
| 2 | After replay, parse `.tlog` for SUT-emitted GCS messages over the 60 s window | GCS rate within [1, 2] Hz inclusive |

**Expected outcome**: GCS-side rate observed in [1, 2] Hz over the window.
**Max execution time**: 90 s.
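
The rate check in step 2 amounts to counting messages over the window. A minimal sketch, assuming the `.tlog` has already been parsed into a list of message timestamps (seconds) that fall inside the 60 s window; the function names are hypothetical.

```python
def observed_rate_hz(timestamps_s, window_s=60.0):
    """Average message rate over the observation window."""
    return len(timestamps_s) / window_s


def gcs_rate_ok(timestamps_s, window_s=60.0):
    """True when the observed rate lies in the inclusive [1, 2] Hz band."""
    return 1.0 <= observed_rate_hz(timestamps_s, window_s) <= 2.0
```

An average over the whole window will not catch bursts followed by silence; a stricter variant could also bound the largest gap between consecutive timestamps.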

---

### FT-P-13: GCS command path (operator re-loc hint)

**Summary**: Validates that GCS-originated commands (via standard MAVLink) can carry operator re-loc hints to the SUT.
**Traces to**: AC-6.2
**Category**: GCS / Telemetry

**Preconditions**:
- `mavproxy-listener` configured to send commands.
- SUT in `dead_reckoned` state (e.g. mid-blackout from FT-N-03 setup).

**Input data**: synthesized `STATUSTEXT` carrying re-loc hint from MAVProxy.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | While SUT is in `dead_reckoned`, send re-loc-hint `STATUSTEXT` from MAVProxy | SUT acknowledges the hint via FDR log entry |
| 2 | Push next nav-camera frame after hint | Next satellite-anchor attempt uses hint as a search prior |

**Expected outcome**: Hint received; next anchor attempt biases search; no rejection.
**Max execution time**: 60 s.

---

### FT-P-14: WGS84 output coordinate system

**Summary**: Validates output coordinates are in WGS84 (latitude/longitude in degrees, encoded in units of 1e-7 degrees per the ArduPilot/iNav GPS convention).
**Traces to**: AC-6.3

**Preconditions**: any FT-P-01 / FT-P-02 run.

**Input data**: any.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Capture one outbound `GPS_INPUT` / `MSP2_SENSOR_GPS` from SITL | Lat/lon present; values in valid WGS84 range; scaled per protocol convention |

**Expected outcome**: Coordinates parse as WGS84 within Earth bounds.
**Max execution time**: 30 s.
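
The range check can be sketched as below, assuming the captured message exposes latitude and longitude as integers in units of 1e-7 degrees (as MAVLink `GPS_INPUT` does). The helper names are hypothetical.

```python
def decode_deg_e7(raw):
    """Convert an integer in 1e-7 degree units to floating-point degrees."""
    return raw / 1e7


def is_valid_wgs84(lat_e7, lon_e7):
    """True when the decoded pair lies inside WGS84 latitude/longitude bounds."""
    lat = decode_deg_e7(lat_e7)
    lon = decode_deg_e7(lon_e7)
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```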

---

### FT-P-15: Tile cache schema and resolution floor

**Summary**: Validates the tile cache manifest carries every required field and tiles meet the ≥0.5 m/px floor.
**Traces to**: AC-8.1, RESTRICT-SAT-2 (manifest schema)

**Preconditions**: `tile-cache-fixture` mounted.

**Input data**: tile cache.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | The SUT exposes a one-time cache-load self-check at startup; observe via FDR | Each tile manifest entry has CRS, tile matrix, dimension, lat-adjusted m/px, capture date, source, compression |
| 2 | Inspect m/px values | All ≥ 0.5 m/px; reject below floor |

**Expected outcome**: All loaded tiles pass schema check and resolution floor.
**Max execution time**: 30 s.
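
A sketch of the per-entry manifest check follows. The list of required fields mirrors step 1, but the concrete key names (`crs`, `tile_matrix`, `m_per_px`, and so on) are invented for illustration; the real manifest keys are defined elsewhere. The floor is applied exactly as written in step 2 (values below 0.5 m/px are rejected).

```python
# Hypothetical key names standing in for the fields listed in step 1.
REQUIRED_MANIFEST_FIELDS = (
    "crs", "tile_matrix", "dimension", "m_per_px",
    "capture_date", "source", "compression",
)
MIN_M_PER_PX = 0.5


def tile_rejection_reasons(entry):
    """Return the reasons a manifest entry fails; an empty list means it passes."""
    reasons = [f"missing field: {name}"
               for name in REQUIRED_MANIFEST_FIELDS if name not in entry]
    m_per_px = entry.get("m_per_px")
    if m_per_px is not None and m_per_px < MIN_M_PER_PX:
        reasons.append("below resolution floor")
    return reasons
```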

---

### FT-P-16: Pre-loaded cache (offline-only interface)

**Summary**: Validates the SUT loads tiles from the local cache only, with no in-flight Service calls.
**Traces to**: AC-8.3, RESTRICT-SAT-1

**Preconditions**: `tile-cache-fixture` mounted; `e2e-net` `internal: true` enforced (no internet egress).

**Input data**: `derkachi-fixture` 60 s segment.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay | SUT serves tiles from `/var/azaion/tile-cache` only |
| 2 | Observe network egress counter on `gps-denied-onboard` container | All egress to non-`e2e-net` destinations is 0 (paired with NFT-SEC-02) |

**Expected outcome**: 0 external egress; replay completes against local cache.
**Max execution time**: 90 s.

---

### FT-P-17: Mid-flight tile generation

**Summary**: Validates the SUT continuously orthorectifies nav-camera frames into basemap-projected tiles, deduplicates them, and stores them locally for landing-time upload.
**Traces to**: AC-8.4

**Preconditions**: empty `mid-flight-tile-output` directory in the FDR volume; `mock-suite-sat-service` running.

**Input data**: `derkachi-fixture` 5 min segment.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay | SUT generates and writes tiles to FDR's `mid-flight-tile-output/` |
| 2 | After replay, read tiles | ≥1 tile per ~3 s of high-quality nav frames; each tile carries quality metadata sufficient for the Service voting layer (per Mode B Fact #105) |
| 3 | Simulate landing event; SUT uploads to `mock-suite-sat-service` | Mock service receives all tiles with HTTP 202 |

**Expected outcome**: Tiles produced + deduplicated + uploaded with quality metadata.
**Max execution time**: 8 min.

---

### FT-P-18: No raw nav/AI-cam frame retention (storage policy)

**Summary**: Validates that no raw nav-camera or AI-camera frames are retained except the ≤0.1 Hz failed-tile-gen thumbnail log.
**Traces to**: AC-8.5

**Preconditions**: `derkachi-fixture` replay just completed.

**Input data**: post-replay state of FDR + tile cache.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Walk the FDR + tile cache for any file matching nav-camera raw-frame pattern (JPEG/RAW with original dimensions) | Only the failed-tile-gen thumbnail log files present (≤0.1 Hz cadence) |
| 2 | Verify thumbnail log is bounded by AC-NEW-3 FDR budget | Total thumbnail log < 1 GB over 8 h (NFT-LIM-02 cross-check) |

**Expected outcome**: 0 unauthorized raw frames retained.
**Max execution time**: 30 s (filesystem walk).

---

### FT-P-19: Satellite relocalization scale-ratio + scene-change

**Summary**: Validates UAV-frame ground footprint at deployment altitude is retrievable from cache regardless of internal tiling. Scene-change subset is reduced-confidence (PARTIAL; see traceability matrix).
**Traces to**: AC-8.6 (scale-ratio FULL; scene-change PARTIAL)

**Preconditions**: `tile-cache-fixture` mounted with multi-zoom-level coverage.

**Input data**: `still-image-set-60` (scale-ratio); the 2 paired `_gmaps.png` images (scene-change subset).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | For each still image, query cache top-K=10 retrieval | Top-K result includes a tile whose centre is within 100 m of the image's true centre (scale-ratio satisfied) |
| 2 | For the 2 paired images, run cross-domain matcher against the `_gmaps.png` reference | Scale-ratio match succeeds; scene-change behavior recorded (PARTIAL: full coverage requires a labeled change-pair dataset, deferred under D-PROJ-3) |

**Expected outcome**: Scale-ratio passes for 60/60; scene-change recorded as PARTIAL.
**Max execution time**: 5 min.

---

## Negative Scenarios

### FT-N-01: 350 m outlier injection tolerance

**Summary**: Validates the system tolerates up to 350 m outliers between two consecutive frames with airframe tilt up to ±20°.
**Traces to**: AC-3.1, RESTRICT-CAM-1 (nadir camera, tilt limits)

**Preconditions**: SUT running on `derkachi-fixture`; `outlier-injection-derkachi` injector primed in `medium` density.

**Input data**: `outlier-injection-derkachi` (medium).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay with injector active (every 10th frame replaced by far-away tile crop) | SUT detects the outlier; rejects it as an anchor; estimate continues from prior valid state |
| 2 | Compare per-frame outbound estimate vs GT for non-outlier frames | Error_after_outlier ≤ error_before_outlier + 50 m; covariance grows monotonically across the outlier event |

**Expected outcome**: Outliers rejected; estimate degrades by at most 50 m of drift; covariance monotonic.
**Max execution time**: 12 min.
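
The two assertions in step 2 can be sketched for a single outlier event. The inputs (errors before and after the event, plus the covariance semi-major samples spanning it) are assumed to be pre-extracted, and "grows monotonically" is read here as non-decreasing; both function names are hypothetical.

```python
def is_non_decreasing(values):
    """True when each sample is at least as large as the one before it."""
    return all(b >= a for a, b in zip(values, values[1:]))


def outlier_event_ok(error_before_m, error_after_m, cov_semi_major_m, max_growth_m=50.0):
    """Check the 50 m error bound and covariance monotonicity for one event."""
    bounded = error_after_m <= error_before_m + max_growth_m
    return bounded and is_non_decreasing(cov_semi_major_m)
```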

---

### FT-N-02: Sharp-turn frame-to-frame failure expected

**Summary**: Negative twin of FT-P-07: validates that during a sharp turn, frame-to-frame may LEGITIMATELY fail, and the system labels accordingly.
**Traces to**: AC-3.2 (negative path)

**Preconditions**: Same as FT-P-07.

**Input data**: sharp-turn segment of Derkachi (or synthetic gyro overlay).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay sharp-turn segment | During turn frames: `source_label` ∈ `{visual_propagated, dead_reckoned}`; covariance grows |
| 2 | After turn, observe label transition | Label returns to `satellite_anchored` once next anchor succeeds |

**Expected outcome**: Sharp-turn frames correctly mark themselves as not-satellite-anchored; recovery exercised in FT-P-07.
**Max execution time**: 5 min.

---

### FT-N-03: Extended outage triggers operator re-loc request

**Summary**: Validates that on ≥3 consecutive frames AND ≥2 s without estimate, the SUT requests operator re-loc via telemetry and continues dead-reckoned propagation.
**Traces to**: AC-3.4

**Preconditions**: `derkachi-fixture` + 3-frame outage injector primed.

**Input data**: synthetic outage on Derkachi.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Trigger 3-consecutive-frame failure (corrupt frames) | SUT fails to produce estimates for 3+ frames |
| 2 | Wait ≥2 s | `STATUSTEXT` containing `OPERATOR_RELOC_REQUEST` emitted to `mavproxy-listener` |
| 3 | During outage, observe FC outbound | Estimates labeled `dead_reckoned` continue; FC uses last-known + IMU extrapolation |

**Expected outcome**: Re-loc request emitted; dead-reckoned estimates continue.
**Max execution time**: 60 s.

---

### FT-N-04: Visual blackout + spoofed GPS combined failsafe

**Summary**: Validates the AC-3.5 + AC-NEW-8 combined failsafe: switch label, reject spoof, propagate from last trusted state, monotonic covariance, STATUSTEXT.
**Traces to**: AC-3.5, AC-NEW-8

**Preconditions**: `blackout-spoof-derkachi` injector primed for 5 s, 15 s, 35 s windows; FC inbound stream patched to inject spoofed GPS.

**Input data**: `blackout-spoof-derkachi` (each window run as a sub-case).

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Begin blackout window AND inject spoofed GPS in same temporal window | Within ≤1 frame OR ≤400 ms: `source_label = dead_reckoned`; spoofed GPS rejected from estimator input; covariance grows monotonically |
| 2 | Observe `horiz_accuracy` field in outbound `GPS_INPUT` (AP) | `horiz_accuracy` ≥ 95% covariance semi-major axis (no under-reporting) |
| 3 | Observe GCS stream | `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 1-2 Hz throughout blackout |
| 4 | For 35 s window only | Per AC-NEW-8: when 95% covariance crosses 100 m → fix-quality degraded; when crosses 500 m OR blackout exceeds 30 s → `horiz_accuracy=999.0` AND `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT |
| 5 | End blackout; restore FC GPS-health | Recovery only after FC GPS-health stable + non-spoofed for ≥10 s AND a visual/satellite consistency check succeeds |

**Expected outcome**: The assertions of every applicable step pass for each window (step 4 applies to the 35 s window only).
**Max execution time**: 5 min (3 windows × ~90 s each).
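
The escalation described in step 4 can be modelled as a small pure function, which is handy for computing the expected state at each sample before comparing against what the SUT emitted. This is only a sketch: the state names and the function are hypothetical, and "crosses" is interpreted as strictly greater than the threshold.

```python
def expected_failsafe_state(cov95_m, blackout_s):
    """Expected AC-NEW-8 state for a 95 % covariance (m) and blackout duration (s)."""
    if cov95_m > 500.0 or blackout_s > 30.0:
        # Spec: horiz_accuracy=999.0 AND VISUAL_BLACKOUT_FAILSAFE STATUSTEXT.
        return "failsafe"
    if cov95_m > 100.0:
        # Spec: fix-quality degraded.
        return "degraded"
    return "nominal"
```

For the 5 s and 15 s windows the duration clause never fires, so only the covariance thresholds can change the expected state.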

---

### FT-N-05: Stale-tile rejection on freshness violation

**Summary**: Validates that tiles violating AC-8.2 freshness window are rejected (or downgraded so they cannot produce a `satellite_anchored` label).
**Traces to**: AC-8.2, AC-NEW-6

**Preconditions**: `synth-age-tile-set` (`synth-age-7mo` for active-conflict, `synth-age-13mo` for rear) mounted instead of fresh fixture.

**Input data**: `still-image-set-60` against the aged cache.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Replay against `synth-age-7mo` (configure SUT for active-conflict sector) | SUT either rejects load OR loads but never emits `satellite_anchored` from these tiles |
| 2 | Replay against `synth-age-13mo` (configure SUT for rear sector) | Same: reject or non-`satellite_anchored` only |

**Expected outcome**: 0 frames emit `satellite_anchored` from aged tiles.
**Max execution time**: 5 min.

---

### FT-N-06: Mid-flight tile freshness (current-timestamped)

**Summary**: Validates that mid-flight-generated tiles are timestamped as current and treated as fresh per AC-NEW-6.
**Traces to**: AC-NEW-6 (positive sub-case)

**Preconditions**: empty `mid-flight-tile-output`.

**Input data**: `derkachi-fixture` 5 min segment.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Start replay | SUT generates mid-flight tiles |
| 2 | Inspect each generated tile's manifest entry | `capture_date` is within ±60 s of generation wall-clock; treated as fresh by the freshness gate |

**Expected outcome**: All mid-flight tiles current-timestamped and fresh.
**Max execution time**: 6 min.
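
The ±60 s check in step 2 is a plain timestamp comparison. A minimal sketch, assuming `capture_date` and the generation wall-clock are both available as timezone-aware `datetime` objects; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_current_timestamped(capture_date, generated_at, tolerance_s=60.0):
    """True when capture_date is within the tolerance of the generation time."""
    return abs((capture_date - generated_at).total_seconds()) <= tolerance_s


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_current_timestamped(now - timedelta(seconds=12), now))
```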

---

### FT-P-20: Real-flight validation runner (honest verdict + Markdown accuracy report)

**Summary**: Runs the full `gps-denied-replay` against the **real** Derkachi binary tlog + flight video + AZ-702 factory-sheet camera calibration, computes the per-emission horizontal-error distribution, and writes a structured Markdown accuracy report. Reports a real PASS/FAIL on AC-3 where the AZ-404 1-min test carries an `@xfail` mask.
**Traces to**: AZ-699 AC-1..AC-3 (epic AZ-696 AC-3, the 100 m / 80 % gate).
**Category**: Position Accuracy

**Preconditions**:
- `_docs/00_problem/input_data/flight_derkachi/derkachi.tlog` (real binary, multi-flight).
- `_docs/00_problem/input_data/flight_derkachi/flight_derkachi.mp4` (real recording, > 1 MB; the placeholder used by AZ-404 does not satisfy this gate).
- `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json` (AZ-702 calibration).
- `gps-denied-replay` console-script installed.
- `RUN_REPLAY_E2E=1` (matches the existing AZ-404 gate).

**Input data**: real `derkachi.tlog` covers up to three sorties; the AZ-698 segmenter + `--auto-trim` locates the matching flight automatically.

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Invoke `gps-denied-replay --auto-trim ...` with real fixtures | Subprocess exits 0 within the 15-min NFR budget |
| 2 | Parse JSONL emissions; pair each with the nearest-in-time ground-truth row (binary-tlog GPS via AZ-697) | Distribution computed: count, mean, p50, p95, p99, threshold-hit share at 10/25/50/100 m |
| 3 | Render the Markdown accuracy report and write `_docs/06_metrics/real_flight_validation_{YYYY-MM-DD}.md` | Report exists with header, run context, horizontal-error stats, threshold-hit table, and (when available) vertical-error stats |
| 4 | Evaluate the AC-3 gate: ≥ 80 % within 100 m | Verdict is PASS or honest FAIL; no `@xfail` mask |
| 5 | On FAIL, surface a failure message referencing the calibration acquisition method (factory-sheet / placeholder / unknown) and the residual budget | Operator can attribute the failure without re-reading the source |

**Expected outcome**: PASS when the estimator meets the epic AC-3 gate; honest FAIL otherwise. The Markdown report is the durable artefact (consumed by the cycle retrospective and downstream tuning work).

**Max execution time**: 15 min (matches AZ-699 NFR for a single Tier-2 Jetson run).
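
The distribution in step 2 and the gate in step 4 can be illustrated as follows. This is a sketch, not the code in `gps_compare.py`: the signatures are assumptions, the percentile uses linear interpolation between closest ranks (the same rule as NumPy's default), and "within" a threshold is read as less than or equal.

```python
def percentile_sorted(sorted_values, q):
    """q-th percentile (0..100) of an ascending list, with linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile of empty sequence")
    rank = (len(sorted_values) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def threshold_hit_share(errors_m, thresholds_m=(10, 25, 50, 100)):
    """Percentage of errors at or below each threshold, keyed by threshold."""
    n = len(errors_m)
    return {t: 100.0 * sum(1 for e in errors_m if e <= t) / n for t in thresholds_m}


def ac3_verdict(errors_m):
    """PASS when at least 80 % of emissions are within 100 m, else FAIL."""
    return "PASS" if threshold_hit_share(errors_m)[100] >= 80.0 else "FAIL"


if __name__ == "__main__":
    errors = sorted([5.0, 20.0, 40.0, 90.0, 150.0])
    print(percentile_sorted(errors, 50), threshold_hit_share(errors), ac3_verdict(errors))
```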

**Report artefact schema** (canonical, produced by `tests/e2e/replay/_report_writer.py`):

```markdown
# Real-flight validation — YYYY-MM-DD

**Verdict**: PASS | FAIL (AC-3 gate: ≥ 80 % within 100 m)

## Run context

- Tlog: `<path>`
- Video: `<path>`
- Calibration acquisition method: factory-sheet | placeholder | unknown
- Clip duration: <float> s
- Emissions consumed: <int>
- Ground-truth pairings: <int>

## Horizontal error (metres)

| Statistic | Value |
| --------- | ----- |
| Mean | <float> |
| p50 | <float> |
| p95 | <float> |
| p99 | <float> |

## Threshold-hit share

| Threshold (m) | Hit share (%) |
| ------------- | ------------- |
| 10 | <float> |
| 25 | <float> |
| 50 | <float> |
| 100 | <float> |

## Vertical error (metres)

| Statistic | Value |
| --------- | ----- |
| Mean | <float> |
| p50 | <float> |
| p95 | <float> |
| Samples | <int> |
```

The Vertical-error section is replaced by `_No emissions carried a comparable altitude — vertical stats skipped._` when none of the JSONL rows carry an `alt_m` field comparable to the ground-truth altitude.

**Skip semantics**: AZ-699 distinguishes between *missing-prerequisite skip* (cleanly skipped with the missing file's path) and *test-cannot-resolve mask* (`@xfail`, explicitly forbidden by AZ-699 AC-1). The AZ-404 1-min test's `@xfail` on AC-3 is unchanged (AZ-699 AC-4 is "add a new test, don't replace"); FT-P-20 is the honest sibling that runs alongside it, not a replacement for it.