gps-denied-onboard/_docs/02_document/tests/test-data.md

Test Data Management

Important Caveat — 60-image slice scope (per Phase 1 D2)

The 60 nav-cam JPGs in _docs/00_problem/input_data/AD000001.jpg … AD000060.jpg were captured at 400 m AGL with the ADTi Surveyor Lite 26S v2 (26 MP, 6252 × 4168, 25 mm, 23.5 mm sensor), not with the deployment camera (ADTi 20MP 20L V1, APS-C, ~5472 × 3648) and not at the deployment altitude (≤1 km AGL). This corpus is therefore pipeline-correctness only:

  • It validates that the pipeline (cuVSLAM → VPR → matcher → Component 5 → MAVLink GPS_INPUT) produces the right shape of output, in the right order, with the right categorical labels and MAVLink schema.
  • It does NOT validate the deployment-binding accuracy budgets (AC-1.1 ≥80 %@50 m, AC-1.2 ≥50 %@20 m), the GSD-band assumptions, the matcher resolution sweeps, or the latency budget for the deployed 1 km AGL / 20 MP path.
  • Pass numbers from this slice on AC-1.1 / AC-1.2 / AC-2.1 / AC-2.2 / AC-NEW-8 are functional, not deployment-binding. The deployment-binding numbers come from the deferred-corpus tier (AerialVL S03, UAV-VisLoc, AerialExtreMatch, internal Mavic, first internal fixed-wing flight).
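
To make the scope gap concrete, the slice's ground sampling distance (GSD) follows from the shoot parameters quoted above. The sketch below uses the standard nadir relation GSD = altitude × sensor width / (focal length × image width). The function name is illustrative, and it assumes the 23.5 mm figure is the sensor width along the 6252 px axis. The deployment camera's focal length is not stated in this document, so its GSD is deliberately not computed.

```python
def ground_sampling_distance_m(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Nadir GSD in metres per pixel (flat terrain, no lens distortion)."""
    return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)


# Slice shoot parameters: 400 m AGL, 23.5 mm sensor, 25 mm lens, 6252 px wide.
slice_gsd = ground_sampling_distance_m(400.0, 23.5, 25.0, 6252)

# Ground swath covered by the long image axis at that GSD.
footprint_width_m = slice_gsd * 6252

print(f"slice GSD ~ {slice_gsd * 100:.1f} cm/px, swath ~ {footprint_width_m:.0f} m")
```

Under these assumptions the slice resolves roughly 6 cm/px over a swath of about 376 m, which is why GSD-band and matcher-resolution results from it do not transfer to the 1 km AGL / 20 MP path.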

Seed Data Sets

| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|---|---|---|---|---|
| nav_cam_60_slice | 60 JPGs AD000001.jpg … AD000060.jpg, 6252×4168, captured at 400 m AGL | T1 pipeline-correctness tests (FT-P-01..FT-P-08, FT-N-01..FT-N-04) | volume mount fixtures-images:/fixtures/images:ro | volume is read-only; no cleanup |
| nav_cam_60_slice_coordinates | coordinates.csv: per-frame WGS84 ground truth | All T1 accuracy tests | mount path /fixtures/images/coordinates.csv | |
| nav_cam_60_slice_imu (synthetic, fixture) | fixtures/imu_AD0000xx.csv: 200 Hz IMU traces synthesised by SITL ArduPilot replay of coordinates.csv as ground-truth trajectory | T1 cuVSLAM tests; F-T1c IMU-sync-jitter measurement | mount path /fixtures/imu/ ; ardupilot-sitl --imu-replay=... | regenerated per test session |
| satellite_tiles_AD0000xx_z20 (placeholder fixture) | z=20 ortho-tiles for the bbox of coordinates.csv, fetched offline by tile-cache-init from public ortho service (Esri / Mapbox / Sentinel-2 fallback gated to ≥0.5 m/px) | T1 cross-view matcher / VPR tests | volume tile-cache:/var/lib/gpsdenied/tiles | volume rebuilt per test session |
| satellite_tile_descriptors_z20 | Pre-extracted SuperPoint keypoints + DINOv2-VLAD global descriptors for satellite_tiles_AD0000xx_z20 | T1 VPR + matcher tests | same volume, sidecar .descriptors.h5 files | same |
| aerialvl_s03 (deferred-corpus) | AerialVL S03: 70 km of fixed-wing flight at 1 km AGL with synced IMU + GPS truth + nav-cam stream | T2 AC-1.3, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9 | external download script (data team task, Decompose); mount when present | not removed (large, kept across sessions) |
| uav_visloc (deferred-corpus) | UAV-VisLoc public dataset | T2 matcher / VPR seasonal-robustness regression | external download script | not removed |
| aerialextrematch (deferred-corpus) | AerialExtreMatch open-review dataset | T2 matcher seasonal-robustness regression | external download script | not removed |
| 2chadcnn_seasons (deferred-corpus) | 2chADCNN season set (cross-season scene-change benchmark) | T2 NF-T*-season-robustness | external download script | not removed |
| tartanair_v2 (deferred-corpus) | TartanAir V2 synthetic scenes | T2 matcher distillation evaluation | external download script | not removed |
| internal_mavic (deferred-corpus) | Internal Mavic 3 Pro Mini recorded flights (legacy attempt; no IMU per problem.md, used for visual-only checks) | T2 matcher visual-only regression | external data team mount | not removed |
| internal_fixed_wing_first_sortie (deferred-field) | First internal fixed-wing flight with synced IMU + GPS truth | T5 FT-1 / FT-2 / FT-3, AC-1.3 lock | field-test mount | not removed |
| synthetic_8h_load (synthesisable) | 8-hour synthetic 3 fps nav-frame replay sequence assembled from nav_cam_60_slice looped + jittered | NF-T3 thermal soak, NF-T5 FDR rollover (AC-NEW-3), AC-NEW-5 | generated at fixture build time by fixtures/synth-8h-loader/ | regenerated per session |
| cold_soak_corpus (deferred-hil) | A short replay loop run at −20 °C ambient | T4 NF-T3 cold-soak, AC-NEW-1 cold | bench HW only | |
| hot_soak_corpus (deferred-hil) | Same replay loop run at +50 °C ambient for 8 h | T4 NF-T3 hot-soak, AC-NEW-5 | bench HW only | |
| spoofing_scenarios | Scripted MAVLink GPS_RAW_INT injections: jam-onset, lat/lon offset, sat-count drop, hdop spike | T3 F-T9 / F-T12, AC-NEW-2 | gps-spoof-injector config files | regenerated per session |
| operator_hint_scenarios | Scripted operator STATUSTEXT messages with approximate (lat, lon, sigma_xy=500m) | T3 F-T10, AC-3.4, AC-6.2, results_report row 22 | qgc-mock config | regenerated per session |
| stale_tile_scenarios | Synthetic-age tiles (1, 5, 7, 11, 13, 18 months old; both active-conflict and stable-rear sectors) | T1 NF-T6, AC-8.2 / AC-NEW-6 | injected into tile-cache by tile-cache-init --inject-stale | volume rebuilt per session |
| cache_poisoning_scenarios | Multi-flight Monte Carlo with synthetic over-confidence injection (EKF covariance deflated by 1.5×–3×) | T2 NF-T4b, AC-NEW-7 | generated by fixtures/cache-poison-mc/ | regenerated per session |
| cold_start_replay_50 | 50× cold-boot replay: SUT process killed and restarted with simulated FC pose injection | T1+T4 F-T11, AC-NEW-1 | scripted in e2e-runner test | |
| disconnected_segments_replay | Synthetic ≥3 disconnected flight segments stitched from nav_cam_60_slice with gaps | T1 F-T8, AC-3.3 | generated at fixture build time | regenerated per session |
| tile_dedup_replay | A flight where ground sectors are visited twice, used to verify deduplication (AC-8.4) | T1 F-T2 | generated at fixture build time | regenerated per session |
| mavlink2_signing_keys | Test-only per-airframe HMAC-SHA256 signing keys | T1 / T3 F-T9, S-T1, MAVLink2 signing assertions | env var MAVLINK2_SIGNING_KEY=… shared SUT + runner + FC | rotated per session |
| tls_test_certs | Self-signed CA + SUT cert + client cert (test-only) | T1 S-T1..S-T5 HTTPS auth tests | mount tls-test-certs:/etc/gpsdenied/tls:ro | regenerated per session |
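
The synthetic_8h_load entry implies a fixed frame budget: 8 h at 3 fps is 86,400 frames, or 1,440 passes over the 60-image slice. The sketch below shows one way such a looped sequence could be produced. The function name, the jitter model (a small uniform offset on each nominal frame time) and the seed are assumptions for illustration; they do not describe what fixtures/synth-8h-loader/ actually does.

```python
import itertools
import random


def synth_replay(frame_names, duration_s, fps, jitter_s=0.005, seed=0):
    """Yield (timestamp_s, frame_name) pairs by looping over frame_names.

    Timestamps sit on a nominal 1/fps grid plus a small uniform jitter
    (hypothetical model). A fixed seed keeps the sequence reproducible.
    """
    rng = random.Random(seed)
    total_frames = int(duration_s * fps)
    looped = itertools.cycle(frame_names)
    for index, name in zip(range(total_frames), looped):
        nominal = index / fps
        yield max(0.0, nominal + rng.uniform(-jitter_s, jitter_s)), name


frames = [f"AD{n:06d}.jpg" for n in range(1, 61)]
sequence = list(synth_replay(frames, duration_s=8 * 3600, fps=3))

print(len(sequence), sequence[0][1], sequence[59][1], sequence[60][1])
```

Because the jitter is far smaller than the 333 ms frame interval, timestamps stay strictly increasing, and the fixed seed means a regenerated fixture is identical from session to session.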

Data Isolation Strategy

  • Container scope: each test session starts with a clean sut container (no cache poisoning between sessions).
  • Volume scope: tile-cache and fdr volumes are rebuilt per test session (not per test) — within a session, tests that depend on cache state are ordered or use namespaced subdirectories. fixtures-images, fixtures-imu, fixtures-expected are read-only; cannot be polluted.
  • Cross-test contamination: tests that mutate state (cache writes, FDR writes) declare pytest.mark.mutates_state and are run in a serial group. Read-only tests run in parallel within a tier.
  • Identity isolation: each session generates a fresh mavlink2_signing_keys set and JWT signing key — replay across sessions is impossible.
  • Resource isolation: T4 deferred-hil tests do not share a Jetson with any other test; bench scheduler enforces single-tenant access.
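
The cross-test contamination rule above amounts to partitioning the collected tests by marker. A minimal, framework-free sketch of that partition follows; the Case type, the schedule function and the test names are hypothetical, and in a real suite the same split would more likely live in a pytest collection hook or a runner plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """A collected test and the marker names attached to it."""
    name: str
    markers: frozenset = frozenset()


def schedule(cases):
    """Split cases into a serial group (state-mutating) and a parallel group.

    Collection order is preserved inside each group so that ordered,
    cache-dependent tests keep their relative sequence.
    """
    serial = [c.name for c in cases if "mutates_state" in c.markers]
    parallel = [c.name for c in cases if "mutates_state" not in c.markers]
    return serial, parallel


cases = [
    Case("test_health_endpoint"),
    Case("test_tile_dedup", frozenset({"mutates_state"})),
    Case("test_fdr_rollover", frozenset({"mutates_state"})),
    Case("test_coordinate_round_trip"),
]
serial, parallel = schedule(cases)
print("serial:", serial)
print("parallel:", parallel)
```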

Input Data Mapping

| Input Data File | Source Location | Description | Covers Scenarios |
|---|---|---|---|
| AD000001.jpg … AD000060.jpg | _docs/00_problem/input_data/ | 60 nav-cam JPGs, 6252×4168, 400 m AGL, ADTi 26S v2 | FT-P-01..FT-P-08, FT-N-01..FT-N-04, NFT-RES-LIM-01..03 (T1) |
| coordinates.csv | _docs/00_problem/input_data/ | Frame index → WGS84 ground truth | results_report rows 1–4, FT-P-01, FT-P-02, NFT-PERF-01 |
| data_parameters.md | _docs/00_problem/input_data/ | Corpus-shoot params (400 m AGL, 26S v2, 25 mm, 23.5 mm sensor) | All T1 tests; context for pipeline-correctness scope |
| AD000001_gmaps.png, AD000002_gmaps.png | _docs/00_problem/input_data/ | Two satellite reference thumbnails (frames 1–2 only) | Smoke-test only; not used as the cross-view reference (placeholder fixture is) |
| expected_results/results_report.md | _docs/00_problem/input_data/ | 46-scenario expected results mapping | All T1 tests + most T2 tests; canonical pass/fail thresholds |
| expected_results/position_accuracy.csv | _docs/00_problem/input_data/ | Per-frame ground truth + thresholds | results_report rows 1–3, FT-P-01, FT-P-02 |
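
The accuracy scenarios that consume coordinates.csv (FT-P-01 and FT-P-02) reduce to "what share of frames lands within N metres of ground truth". The sketch below shows one way to score that with a great-circle distance. The function names are illustrative, the spherical-Earth radius is an approximation, and the coordinates are made-up values rather than corpus data; the actual column layout of coordinates.csv is not assumed here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean spherical radius; adequate at the 20-50 m scale


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (spherical model)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(truth, estimates, threshold_m):
    """Share of frames whose horizontal error is at or below threshold_m."""
    if not truth or len(truth) != len(estimates):
        raise ValueError("truth and estimates must be non-empty and equally long")
    errors = [haversine_m(*t, *e) for t, e in zip(truth, estimates)]
    return sum(err <= threshold_m for err in errors) / len(errors)


# Illustrative (lat, lon) pairs only: errors of about 0, 11, 33 and 67 m.
truth = [(50.0, 30.0)] * 4
estimates = [(50.0, 30.0), (50.0001, 30.0), (50.0003, 30.0), (50.0006, 30.0)]

share_50 = fraction_within(truth, estimates, 50.0)
share_20 = fraction_within(truth, estimates, 20.0)
print(f"within 50 m: {share_50:.0%}, within 20 m: {share_20:.0%}")
```

With these sample points, three of four frames fall within 50 m and two of four within 20 m, so the 80 % criterion would fail while the 50 % criterion would just pass.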

Expected Results Mapping

The canonical mapping is _docs/00_problem/input_data/expected_results/results_report.md. The traceability matrix references that file by row number. The summary table below lists the rows by the test scenario IDs that consume them.

| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|---|---|---|---|---|---|
| FT-P-01 | coordinates.csv (60 frames) + nav_cam_60_slice + satellite_tiles_AD0000xx_z20 + nav_cam_60_slice_imu | ≥80 % within 50 m | percentage | ≥80 % | results_report row 1; position_accuracy.csv |
| FT-P-02 | same | ≥50 % within 20 m | percentage | ≥50 % | results_report row 2; position_accuracy.csv |
| FT-P-03 | same | each frame ≤100 m error | numeric_tolerance | ±100 m max per frame | results_report row 3 |
| FT-P-04 | same | cumulative VO drift between satellite anchors ≤100 m mono / ≤50 m mono+IMU | threshold_max | mono: ≤100 m; mono+IMU: ≤50 m | results_report row 4; AC-1.3 / AC-NEW-8 |
| FT-P-05 | single frame + IMU | fix_type=3, horiz_accuracy ∈ [1,50] m, satellites_visible=10 | exact (fix_type, sat) + range (h_acc) | as stated | results_report row 5 |
| FT-P-06 | sequence, no satellite >30 s | fix_type=3, horiz_accuracy ∈ [20,100] | exact + range | as stated | results_report row 6 |
| FT-P-07 | sequence, VO lost + no satellite | fix_type=2, h_acc ≥ 50 m (growing) | exact + threshold_min | as stated | results_report row 7 |
| FT-P-08 | VO lost + 3 sat failures | fix_type=0, h_acc=999.0 | exact | N/A | results_report row 8 |
| FT-P-09 | tier transitions | tier ∈ {HIGH, MEDIUM, LOW, FAILED} per conditions | exact | N/A | results_report rows 10–13 |
| FT-P-10 | 60 frames | registration rate ≥95 % (T1 functional only) | percentage | ≥95 % (functional) | results_report row 14 |
| FT-P-11 | 60 frames | MRE < 1.0 px VO frame-to-frame; < 2.5 px cross-domain | threshold_max | <1.0 / <2.5 | results_report row 15; AC-2.2 |
| FT-P-12 | frames 32–43 (turn area) | system continues producing position estimates through turn | threshold_min | ≥1 position output / frame | results_report row 16 |
| FT-P-13 | 350 m gap synthetic | error ≤100 m after recovery | threshold_max | ≤100 m | results_report row 17 |
| FT-P-14 | sharp-turn synthetic | satellite re-loc triggers; error ≤50 m within 3 frames | threshold_max | ≤50 m | results_report row 18 |
| FT-P-15 | VO loss + sat success | tracking_state == NORMAL after recovery | exact | N/A | results_report row 19 |
| FT-P-16 | startup with GLOBAL_POSITION_INT | first GPS_INPUT within 30 s of boot, p95 | threshold_max | ≤30 s p95 | results_report row 23; AC-NEW-1 |
| FT-P-17 | startup + first satellite match | error ≤50 m after first match | threshold_max | ≤50 m | results_report row 24 |
| FT-P-18 | reboot mid-flight | recovery time ≤30 s | threshold_max | ≤30 s | results_report row 25; AC-NEW-1 |
| FT-P-19 | post-reboot first match | error ≤50 m | threshold_max | ≤50 m | results_report row 26 |
| FT-P-20 | object localize valid request | response with lat/lon within accuracy_m of ground truth | numeric_tolerance | per response.accuracy_m | results_report row 27 |
| FT-P-21 | round-trip GPS→NED→pixel→GPS | error ≤0.1 m | threshold_max | ≤0.1 m | results_report row 29 |
| FT-P-22 | GET /health | 200 + JSON with status, memory_mb, gpu_temp_c | exact + regex | as stated | results_report row 30 |
| FT-P-23 | POST /sessions | 200 or 201 + session id | exact | status ∈ {200,201} | results_report row 31 |
| FT-P-24 | GET /sessions/{id}/stream | SSE events at ~1 Hz with schema fields | regex + rate | per SSE schema | results_report row 32 |
| FT-P-25 | TRT engine load | ≤10 s total | threshold_max | ≤10 s | results_report row 39 |
| FT-P-26 | mission area definition | 300–1000 MB tile storage | range | [300, 1000] MB | results_report row 40 |
| FT-P-27 | EKF position ± 3σ | tile mosaic radius ≥500 m | threshold_min | ≥500 m | results_report row 41 |
| FT-P-28 | tile dedup replay | ≤1 tile per ground sector visited ≥2× | exact | per-sector count == 1 | AC-8.4, F-T2 |
| FT-P-29 | post-flight upload | tiles uploaded to candidate pool with trust_level=candidate | exact | as stated | AC-8.4, F-T3 |
| FT-P-30 | telemetry | NAMED_VALUE_FLOAT at 1 Hz ± 0.2 Hz | numeric_tolerance | 1 Hz ± 0.2 Hz | results_report row 45 |
| FT-N-01 | corrupted JPG | system continues with tracking_state == DEGRADED, no crash | exact | tracking_state ∈ {DEGRADED, NORMAL} | derived from AC-3.x |
| FT-N-02 | invalid object localize pixel | HTTP 422 | exact | status == 422 | results_report row 28 |
| FT-N-03 | unauthenticated POST /sessions | HTTP 401 | exact | status == 401 | results_report row 33 |
| FT-N-04 | tile older than freshness budget | tile rejected or down-confidence; never satellite_anchored | exact | as stated | AC-8.2, AC-NEW-6 |
| FT-N-05 | tile in 30-day grace zone | confidence linearly decayed | numeric_tolerance | per spec curve | AC-NEW-6 |
| FT-N-06 | sharp turn (no overlap, <70°, <200 m) | satellite re-loc within 3 frames | threshold_max | ≤50 m within 3 frames | results_report row 18; AC-3.2 |
| FT-N-07 | VO loss + 3 sat failures | RELOC_REQ regex pattern emitted via STATUSTEXT | regex | per pattern | results_report rows 20, 46 |
| FT-N-08 | re-loc active | fix_type=0, IMU prediction continues, sat attempts continue | exact | as stated | results_report row 21 |
| FT-N-09 | operator hint received | hint used as 500 m seed for VPR; ≤500 m initially, ≤50 m after match | threshold_max | as stated | results_report row 22 |
| NFT-PERF-01 | single 6252×4168 frame on Orin Nano Super 25 W (T4) | end-to-end latency ≤400 ms p95 | threshold_max | ≤400 ms p95 | results_report row 34; AC-4.1 |
| NFT-PERF-02 | cuVSLAM single frame | ≤20 ms / frame | threshold_max | ≤20 ms | results_report row 37 |
| NFT-PERF-03 | matcher single pair on Orin Nano Super 25 W | inline ≤200 ms; re-loc fallback ≤2000 ms | threshold_max | as stated | results_report row 38 |
| NFT-PERF-04 | Orthority per-frame on Orin Nano Super | ≤50 ms / frame | threshold_max | ≤50 ms / frame | F-T14, M-27 |
| NFT-PERF-05 | spoof onset → SUT promotion | ≤3 s p95 | threshold_max | ≤3 s p95 | AC-NEW-2; F-T12 |
| NFT-PERF-06 | per-frame end-to-end (frame-by-frame, not batched) | inter-frame interval matches camera rate | numeric_tolerance | per frame within ±50 ms of camera rate | AC-4.4 |
| NFT-RES-01 | SUT process killed mid-flight | recovery ≤30 s, restart from FC pose | threshold_max | ≤30 s | results_report row 25; AC-5.3, AC-NEW-1 |
| NFT-RES-02 | spoofing onset | promotion ≤3 s | threshold_max | ≤3 s | AC-NEW-2 |
| NFT-RES-03 | network partition with FC | failsafe at 3 s no fix | threshold_max | ≤3 s | AC-5.2 |
| NFT-RES-04 | EKF3 lane-switch / fix-loss event | source-promotion responds | exact | promotion within budget | AC-NEW-2 |
| NFT-SEC-01 | unsigned MAVLink injection | FC rejects | exact | acceptance==false | F-T9, S-T1 |
| NFT-SEC-02 | unauthenticated REST | 401 / 403 | exact | per endpoint | results_report row 33 |
| NFT-SEC-03 | malformed JWT | 401 | exact | status==401 | derived |
| NFT-SEC-04 | TLS downgrade attempt | rejected | exact | TLS ≥1.2 only | S-T2 |
| NFT-SEC-05 | tile-cache write attempt by unauthorized API | 403 / no-op | exact | as stated | AC-8.5, AC-NEW-7 |
| NFT-RES-LIM-01 | 30-min sustained load (T1+T4) | peak < 8192 MB; growth ≤50 MB / 30 min | threshold_max | as stated | results_report row 35; AC-4.2 |
| NFT-RES-LIM-02 | 30-min sustained load | SoC junction ≤80 °C | threshold_max | ≤80 °C | results_report row 36 |
| NFT-RES-LIM-03 | 8-h sustained 25 W @ +50 °C ambient (T4) | no thermal throttle | exact | throttle_event_count == 0 | AC-NEW-5, NF-T3 |
| NFT-RES-LIM-04 | FDR 8-h synthetic load | FDR ≤64 GB; rollover logged; no payload class silently dropped | threshold_max + audit | as stated | AC-NEW-3, NF-T5 |
| NFT-RES-LIM-05 | tile cache 400 km² | ≤10 GB persistent | threshold_max | ≤10 GB | restrictions §UAV |
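
The Comparison Method column uses a small vocabulary (exact, range, threshold_max, threshold_min, numeric_tolerance, percentage, regex). The sketch below shows how a runner might dispatch on those names. The compare function, its argument shapes and the sample values are assumptions for illustration, not the project's harness, and the RELOC_REQ pattern shown is a placeholder because the real pattern is not given here.

```python
import re


def compare(method, actual, expected):
    """Evaluate one expected-result row; returns True when the row passes.

    Assumed argument shapes:
      exact              expected is the value itself
      range              expected is (low, high), inclusive
      threshold_max/min  expected is the inclusive bound
      numeric_tolerance  expected is (target, tolerance)
      percentage         actual is a fraction in [0, 1], expected a minimum percent
      regex              expected is a pattern searched for in actual
    """
    if method == "exact":
        return actual == expected
    if method == "range":
        low, high = expected
        return low <= actual <= high
    if method == "threshold_max":
        return actual <= expected
    if method == "threshold_min":
        return actual >= expected
    if method == "numeric_tolerance":
        target, tolerance = expected
        return abs(actual - target) <= tolerance
    if method == "percentage":
        return actual * 100.0 >= expected
    if method == "regex":
        return re.search(expected, actual) is not None
    raise ValueError(f"unknown comparison method: {method}")


checks = [
    compare("percentage", 0.85, 80),               # FT-P-01 style
    compare("range", 512, (300, 1000)),            # FT-P-26 style, MB
    compare("threshold_max", 27.5, 30.0),          # FT-P-16 style, seconds
    compare("numeric_tolerance", 0.9, (1.0, 0.2)), # FT-P-30 style, Hz
    compare("regex", "RELOC_REQ: placeholder", r"^RELOC_REQ"),
]
print(checks)
```

Rows that combine methods (for example "exact + range") would call compare once per field, and rows with strict bounds such as FT-P-11 would need a strict variant of threshold_max, which this sketch leaves out.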

External Dependency Mocks

| External Service | Mock/Stub | How Provided | Behavior |
|---|---|---|---|
| Azaion Suite Satellite Service (pre-flight cache sync) | tile-cache-init one-shot loader | Docker service that materialises MBTiles + sidecar before SUT starts | Returns the same fixture set every run; deterministic |
| Azaion Suite Satellite Service (post-flight upload) | candidate-pool stub inside qgc-mock (or a dedicated service-stub container) | HTTP server with POST /candidates accepting tile uploads, recording to a file | Records what the SUT sends; never alters the cache used by the next test |
| QGroundControl GCS | qgc-mock | Custom MAVLink-only mock | Records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY frames; can inject operator-hint STATUSTEXT |
| ArduPilot autopilot | ardupilot-sitl (PR #30080-pinned) | Official ArduPilot SITL container | Replays IMU from fixture; runs EKF3; exposes RAW_IMU, ATTITUDE, GLOBAL_POSITION_INT, EKF_STATUS_REPORT, GPS_RAW_INT |
| Spoofing GPS adversary | gps-spoof-injector | Custom MAVLink injector | Sends crafted GPS_RAW_INT with configurable lat/lon offset, sat count, hdop |
| Identity provider (JWT) | in-runner key generator | Test-only HMAC-SHA256 key shared at SUT boot via env var | Mints valid + invalid + expired JWTs |
| External satellite providers (Maxar, Airbus, Planet) | NOT MOCKED: out of scope per AC-8.1; SUT does not call them at runtime | | The SUT must never make outbound HTTP to these hosts; F-T2 / NFT-SEC-04 includes a network-policy assertion |

All mocks are deterministic — same input always produces same output — except the spoof / operator-hint scenarios that explicitly schedule events on a wall-clock so the SUT's timing budgets (AC-NEW-1, AC-NEW-2) are exercised.
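
For the identity-provider mock, minting valid, invalid and expired tokens needs nothing beyond an HMAC-SHA256 key. The sketch below builds HS256 JWTs with the standard library only. The function names, the "sub" value and the 60 s lifetime are illustrative assumptions, and the verifier is deliberately minimal (it checks signature and exp only, not the header algorithm or other claims).

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by the JWT compact form."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_jwt(key: bytes, claims: dict) -> str:
    """Return a compact HS256 JWT signed with key."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, claims)]
    signing_input = ".".join(parts)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(key: bytes, token: str, now=None) -> bool:
    """Minimal check: three segments, matching signature, exp in the future."""
    try:
        head_b64, body_b64, sig_b64 = token.split(".")
    except ValueError:
        return False
    expected = _b64url(hmac.new(key, f"{head_b64}.{body_b64}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(expected.encode(), sig_b64.encode()):
        return False
    padded = body_b64 + "=" * (-len(body_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims.get("exp", 0) > (time.time() if now is None else now)


key = secrets.token_bytes(32)  # fresh per session, never reused
now = int(time.time())
valid = mint_jwt(key, {"sub": "e2e-runner", "exp": now + 60})
expired = mint_jwt(key, {"sub": "e2e-runner", "exp": now - 60})
forged = mint_jwt(secrets.token_bytes(32), {"sub": "e2e-runner", "exp": now + 60})

print(verify_jwt(key, valid), verify_jwt(key, expired), verify_jwt(key, forged))
```

Generating the key inside the runner at session start is what makes cross-session replay fail: a token minted under one session's key cannot verify under the next.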

Data Validation Rules

| Data Type | Validation | Invalid Examples | Expected System Behavior |
|---|---|---|---|
| Nav-cam frame | non-zero size; JPEG / PNG decodable; expected resolution within ±1 % of data_parameters.md | 0-byte file, truncated JPEG header, wildly wrong resolution | log error; tracking_state transitions to DEGRADED if loss >2 frames; never crash |
| IMU sample | rate 200 Hz ± 10 %; timestamps monotonic; covariance present | timestamp regression, rate < 50 Hz, NaN / Inf | drop sample with WARN log; if loss > 0.5 s → cuVSLAM degrade; AC-5.2 path eligible |
| Satellite tile | MBTiles schema valid; descriptors present; capture_date within freshness budget for sector | corrupt MBTiles, missing sidecar, beyond-grace freshness | reject with WARN; AC-8.2 / AC-NEW-6 |
| MAVLink GPS_RAW_INT (FC inputs) | well-formed; signing valid (when MAVLink2 signing on) | unsigned frame, malformed length, sysid spoofing | reject; F-T9 + S-T1 cover this |
| HTTPS request body | JSON parse OK; required fields present; pixel coords ∈ frame bounds | missing fields, NaN, out-of-bounds pixel | HTTP 422 |
| JWT | signature valid; not expired; subject is allowed | expired, wrong sig, missing claims | HTTP 401 |
| Tile descriptor | dimension matches index; checksum match | wrong dims, mismatched hash | reject load; cache marks as corrupt; F-T2 |
| Operator hint STATUSTEXT | parseable RELOC_HINT: lat=… lon=… sigma=…; numeric ranges sane | malformed, NaN, negative sigma, lat > 90 / lon > 180 | reject hint; emit STATUSTEXT WARN; do not seed VPR |
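
The operator-hint rule is the easiest of these to show end to end. The sketch below parses a RELOC_HINT line and rejects the invalid cases listed for it. The exact wire format (whitespace-separated key=value pairs with plain decimal numbers), the function name, the sample coordinates and the choice to reject a zero sigma as well as a negative one are assumptions; the spec only gives the `lat=… lon=… sigma=…` outline.

```python
import math
import re

# Assumed layout: "RELOC_HINT: lat=<num> lon=<num> sigma=<num>"
_HINT_RE = re.compile(
    r"^RELOC_HINT:\s*lat=(?P<lat>\S+)\s+lon=(?P<lon>\S+)\s+sigma=(?P<sigma>\S+)\s*$"
)


def parse_operator_hint(text):
    """Return (lat, lon, sigma_m), or None when the hint must be rejected."""
    match = _HINT_RE.match(text)
    if match is None:
        return None  # malformed message
    try:
        lat, lon, sigma = (float(match.group(k)) for k in ("lat", "lon", "sigma"))
    except ValueError:
        return None  # a field is not numeric
    if not all(math.isfinite(v) for v in (lat, lon, sigma)):
        return None  # NaN or infinity
    if abs(lat) > 90.0 or abs(lon) > 180.0 or sigma <= 0.0:
        return None  # out-of-range coordinate or non-positive sigma
    return lat, lon, sigma


samples = [
    "RELOC_HINT: lat=48.5 lon=35.0 sigma=500",
    "RELOC_HINT: lat=nan lon=35.0 sigma=500",
    "RELOC_HINT: lat=48.5 lon=35.0 sigma=-10",
    "RELOC_HINT: lat=91 lon=35.0 sigma=500",
    "RELOC_HINT lat=48.5",
]
for line in samples:
    print(line, "->", parse_operator_hint(line))
```

A caller would treat a None result as the "reject hint; emit STATUSTEXT WARN; do not seed VPR" branch and only pass a returned tuple on to the VPR seed.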

Pending Data (Phase 1 D3 — placeholder fixtures)

The following fixtures are declared by name in this spec but were not yet present at the time of writing. Phase 3's HARD GATE will surface them as pending data rather than mark them for removal:

| Fixture | Generator / source | Owner | Phase 3 treatment |
|---|---|---|---|
| fixtures/satellite_tiles_AD0000xx_z20/ | tile-cache-init script: fetch z=20 ortho tiles for the bbox of coordinates.csv from a public ortho service (Esri / Mapbox / Sentinel-2 ≥ 0.5 m/px); pre-extract SuperPoint + DINOv2-VLAD descriptors | Decompose / impl. team task | pending data, not removed; data_status: deferred-corpus retained until generator script is committed |
| fixtures/imu_AD0000xx.csv | SITL ArduPilot replay of coordinates.csv as ground-truth trajectory at 200 Hz | Decompose / impl. team task | pending data, not removed; data_status: deferred-corpus |
| aerialvl_s03, uav_visloc, aerialextrematch, 2chadcnn_seasons, tartanair_v2, internal_mavic | External downloads + curation | data team task (Decompose creates a "dataset acquisition" task) | data_status: deferred-corpus |
| internal_fixed_wing_first_sortie | Field-test plan | operations team | data_status: deferred-field |
| cold_soak_corpus, hot_soak_corpus | Bench HW + chamber | bench team | data_status: deferred-hil |
| synthetic_8h_load | fixtures/synth-8h-loader/ script | impl. team | regenerated per session; synthesisable, no external dependency |
| cache_poisoning_scenarios | fixtures/cache-poison-mc/ script | impl. team | regenerated per session |