Test Data Management
Important Caveat — 60-image slice scope (per Phase 1 D2)
The 60 nav-cam JPGs in _docs/00_problem/input_data/AD000001.jpg … AD000060.jpg were captured at 400 m AGL with the ADTi Surveyor Lite 26S v2 (26 MP, 6252 × 4168, 25 mm, 23.5 mm sensor) — not the deployment camera (ADTi 20MP 20L V1, APS-C, ~5472 × 3648) and not the deployment altitude (≤1 km AGL). This corpus is therefore pipeline-correctness only:
- It validates that the pipeline (cuVSLAM → VPR → matcher → Component 5 → MAVLink GPS_INPUT) produces the right shape of output, in the right order, with the right categorical labels and MAVLink schema.
- It does NOT validate the deployment-binding accuracy budgets (AC-1.1 ≥80 %@50 m, AC-1.2 ≥50 %@20 m), the GSD-band assumptions, the matcher resolution sweeps, or the latency budget for the deployed 1 km AGL / 20 MP path.
- Pass numbers from this slice on AC-1.1 / AC-1.2 / AC-2.1 / AC-2.2 / AC-NEW-8 are functional, not deployment-binding. The deployment-binding numbers come from the deferred-corpus tier (AerialVL S03, UAV-VisLoc, AerialExtreMatch, internal Mavic, first internal fixed-wing flight).
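To make the GSD mismatch concrete, the nadir ground sample distance of the 60-image slice can be worked out from the corpus-shoot parameters quoted above. This is a minimal sketch under stated assumptions: the 23.5 mm figure is taken to be the sensor width along the 6252 px axis, the view is nadir, and the terrain is flat. The function name is illustrative only. The deployment camera's focal length is not given in this section, so no deployment GSD is computed.

```python
def ground_sample_distance_m(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Nadir GSD in metres per pixel, from the pinhole-camera relation."""
    return altitude_m * sensor_width_mm / (focal_length_mm * image_width_px)


# Corpus-shoot parameters for the 60-image slice (ADTi Surveyor Lite 26S v2).
# Assumption: 23.5 mm is the sensor width along the 6252 px image axis.
slice_gsd = ground_sample_distance_m(
    altitude_m=400.0,
    sensor_width_mm=23.5,
    focal_length_mm=25.0,
    image_width_px=6252,
)
footprint_width_m = slice_gsd * 6252  # ground width covered by one frame

print(f"slice GSD: {slice_gsd * 100:.2f} cm/px")
print(f"frame footprint width: {footprint_width_m:.0f} m")
```

Under these assumptions the slice resolves roughly 6 cm/px over a footprint about 376 m wide. That is why matcher resolution sweeps and GSD-band checks run on this corpus say nothing about the 1 km AGL deployment path.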
Seed Data Sets
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|---|---|---|---|---|
| nav_cam_60_slice | 60 JPGs AD000001.jpg…AD000060.jpg, 6252×4168, captured at 400 m AGL | T1 pipeline-correctness tests (FT-P-01..FT-P-08, FT-N-01..FT-N-04) | volume mount fixtures-images:/fixtures/images:ro | volume is read-only; no cleanup |
| nav_cam_60_slice_coordinates | coordinates.csv: per-frame WGS84 ground truth | All T1 accuracy tests | mount path /fixtures/images/coordinates.csv | n/a |
| nav_cam_60_slice_imu (synthetic, fixture) | fixtures/imu_AD0000xx.csv: 200 Hz IMU traces synthesised by SITL ArduPilot replay of coordinates.csv as ground-truth trajectory | T1 cuVSLAM tests; F-T1c IMU-sync-jitter measurement | mount path /fixtures/imu/ ; ardupilot-sitl --imu-replay=... | regenerated per test session |
| satellite_tiles_AD0000xx_z20 (placeholder fixture) | z=20 ortho-tiles for the bbox of coordinates.csv, fetched offline by tile-cache-init from public ortho service (Esri / Mapbox / Sentinel-2 fallback gated to ≥0.5 m/px) | T1 cross-view matcher / VPR tests | volume tile-cache:/var/lib/gpsdenied/tiles | volume rebuilt per test session |
| satellite_tile_descriptors_z20 | Pre-extracted SuperPoint keypoints + DINOv2-VLAD global descriptors for satellite_tiles_AD0000xx_z20 | T1 VPR + matcher tests | same volume, sidecar .descriptors.h5 files | same |
| aerialvl_s03 (deferred-corpus) | AerialVL S03: 70 km of fixed-wing flight at 1 km AGL with synced IMU + GPS truth + nav-cam stream | T2 AC-1.3, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9 | external download script (data team task; Decompose); mount when present | not removed (large, kept across sessions) |
| uav_visloc (deferred-corpus) | UAV-VisLoc public dataset | T2 matcher / VPR seasonal-robustness regression | external download script | not removed |
| aerialextrematch (deferred-corpus) | AerialExtreMatch open-review dataset | T2 matcher seasonal-robustness regression | external download script | not removed |
| 2chadcnn_seasons (deferred-corpus) | 2chADCNN season set (cross-season scene-change benchmark) | T2 NF-T*-season-robustness | external download script | not removed |
| tartanair_v2 (deferred-corpus) | TartanAir V2 synthetic scenes | T2 matcher distillation evaluation | external download script | not removed |
| internal_mavic (deferred-corpus) | Internal Mavic 3 Pro Mini recorded flights (legacy attempt; no IMU per problem.md, used for visual-only checks) | T2 matcher visual-only regression | external data team mount | not removed |
| internal_fixed_wing_first_sortie (deferred-field) | First internal fixed-wing flight with synced IMU + GPS truth | T5 FT-1 / FT-2 / FT-3, AC-1.3 lock | field-test mount | not removed |
| synthetic_8h_load (synthesisable) | 8-hour synthetic 3 fps nav-frame replay sequence assembled from nav_cam_60_slice looped + jittered | NF-T3 thermal soak, NF-T5 FDR rollover (AC-NEW-3), AC-NEW-5 | generated at fixture build time by fixtures/synth-8h-loader/ | regenerated per session |
| cold_soak_corpus (deferred-hil) | A short replay loop run at −20 °C ambient | T4 NF-T3 cold-soak, AC-NEW-1 cold | bench HW only | n/a |
| hot_soak_corpus (deferred-hil) | Same replay loop run at +50 °C ambient for 8 h | T4 NF-T3 hot-soak, AC-NEW-5 | bench HW only | n/a |
| spoofing_scenarios | Scripted MAVLink GPS_RAW_INT injections: jam-onset, lat/lon offset, sat-count drop, hdop spike | T3 F-T9 / F-T12, AC-NEW-2 | gps-spoof-injector config files | regenerated per session |
| operator_hint_scenarios | Scripted operator STATUSTEXT messages with approximate (lat, lon, sigma_xy=500m) | T3 F-T10, AC-3.4, AC-6.2, results_report row 22 | qgc-mock config | regenerated per session |
| stale_tile_scenarios | Synthetic-age tiles (1, 5, 7, 11, 13, 18 months old; both active-conflict and stable-rear sectors) | T1 NF-T6, AC-8.2 / AC-NEW-6 | injected into tile-cache by tile-cache-init --inject-stale | volume rebuilt per session |
| cache_poisoning_scenarios | Multi-flight Monte Carlo with synthetic over-confidence injection (EKF covariance deflated by 1.5×–3×) | T2 NF-T4b, AC-NEW-7 | generated by fixtures/cache-poison-mc/ | regenerated per session |
| cold_start_replay_50 | 50× cold-boot replay: SUT process killed and restarted with simulated FC pose injection | T1+T4 F-T11, AC-NEW-1 | scripted in e2e-runner test | n/a |
| disconnected_segments_replay | Synthetic ≥3 disconnected flight segments stitched from nav_cam_60_slice with gaps | T1 F-T8, AC-3.3 | generated at fixture build time | regenerated per session |
| tile_dedup_replay | A flight in which ground sectors are visited twice, used to verify deduplication (AC-8.4) | T1 F-T2 | generated at fixture build time | regenerated per session |
| mavlink2_signing_keys | Test-only per-airframe HMAC-SHA256 signing keys | T1 / T3 F-T9, S-T1, MAVLink2 signing assertions | env var MAVLINK2_SIGNING_KEY=… shared SUT + runner + FC | rotated per session |
| tls_test_certs | Self-signed CA + SUT cert + client cert (test-only) | T1 S-T1..S-T5 HTTPS auth tests | mount tls-test-certs:/etc/gpsdenied/tls:ro | regenerated per session |
Data Isolation Strategy
- Container scope: each test session starts with a clean sut container (no cache poisoning between sessions).
- Volume scope: the tile-cache and fdr volumes are rebuilt per test session (not per test); within a session, tests that depend on cache state are ordered or use namespaced subdirectories. fixtures-images, fixtures-imu, and fixtures-expected are read-only and cannot be polluted.
- Cross-test contamination: tests that mutate state (cache writes, FDR writes) declare pytest.mark.mutates_state and are run in a serial group. Read-only tests run in parallel within a tier.
- Identity isolation: each session generates a fresh mavlink2_signing_keys set and JWT signing key, so credentials captured in one session cannot be replayed in another.
- Resource isolation: T4 deferred-hil tests do not share a Jetson with any other test; the bench scheduler enforces single-tenant access.
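The serial/parallel split described under cross-test contamination can be sketched as a plain scheduling function. This is an illustration only, not the runner's actual implementation: the `Case` record and `schedule` function are hypothetical, and the marker assignments on the two example tests are assumptions made for the example. Only the `mutates_state` marker name comes from this spec.

```python
from dataclasses import dataclass

MUTATES_STATE = "mutates_state"  # marker name taken from the spec


@dataclass(frozen=True)
class Case:
    """Hypothetical record for one collected test."""
    test_id: str
    tier: str
    markers: frozenset


def schedule(cases):
    """Split cases into one serial group and per-tier parallel groups.

    State-mutating cases keep their collection order in a single serial
    list; read-only cases are grouped by tier and may run concurrently.
    """
    serial = [c for c in cases if MUTATES_STATE in c.markers]
    parallel = {}
    for c in cases:
        if MUTATES_STATE not in c.markers:
            parallel.setdefault(c.tier, []).append(c)
    return serial, parallel


# Example assignment (assumed): the dedup replay writes to the tile cache,
# the health check only reads.
cases = [
    Case("FT-P-28", "T1", frozenset({MUTATES_STATE})),
    Case("FT-P-22", "T1", frozenset()),
]
serial, parallel = schedule(cases)
print([c.test_id for c in serial], {t: [c.test_id for c in g] for t, g in parallel.items()})
```

In a real pytest run the same partition would normally be derived from the collected items' markers; the sketch keeps to the standard library so the rule itself is easy to check.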
Input Data Mapping
| Input Data File | Source Location | Description | Covers Scenarios |
|---|---|---|---|
| AD000001.jpg…AD000060.jpg | _docs/00_problem/input_data/ | 60 nav-cam JPGs, 6252×4168, 400 m AGL, ADTi 26S v2 | FT-P-01..FT-P-08, FT-N-01..FT-N-04, NFT-RES-LIM-01..03 (T1) |
| coordinates.csv | _docs/00_problem/input_data/ | Frame index → WGS84 ground truth | results_report rows 1–4, FT-P-01, FT-P-02, NFT-PERF-01 |
| data_parameters.md | _docs/00_problem/input_data/ | Corpus-shoot params (400 m AGL, 26S v2, 25 mm, 23.5 mm sensor) | All T1 tests; context for pipeline-correctness scope |
| AD000001_gmaps.png, AD000002_gmaps.png | _docs/00_problem/input_data/ | Two satellite reference thumbnails (frames 1–2 only) | Smoke-test only; not used as the cross-view reference (placeholder fixture is) |
| expected_results/results_report.md | _docs/00_problem/input_data/ | 46-scenario expected results mapping | All T1 tests + most T2 tests; canonical pass/fail thresholds |
| expected_results/position_accuracy.csv | _docs/00_problem/input_data/ | Per-frame ground truth + thresholds | results_report rows 1–3, FT-P-01, FT-P-02 |
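The position tests compare each SUT estimate against the per-frame WGS84 ground truth, which comes down to a great-circle distance per frame. The following is a minimal sketch using the haversine formula on a spherical Earth. The column layout of coordinates.csv is not specified in this section, so the example takes plain dictionaries keyed by frame index; `haversine_m` and `per_frame_errors_m` are hypothetical helper names, and the coordinates are made-up values.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def per_frame_errors_m(truth, estimates):
    """Horizontal error per frame; both arguments map frame index -> (lat, lon)."""
    return {
        frame: haversine_m(*truth[frame], *estimates[frame])
        for frame in truth
        if frame in estimates
    }


# Illustrative values only, not taken from coordinates.csv.
truth = {1: (48.0, 35.0), 2: (48.001, 35.0)}
estimates = {1: (48.0, 35.0), 2: (48.0012, 35.0)}
errors = per_frame_errors_m(truth, estimates)
print({frame: round(err, 1) for frame, err in errors.items()})
```

At the scale of these tests (errors of tens of metres) the spherical approximation differs from an ellipsoidal geodesic by well under one percent; a tighter tolerance would call for a proper geodesic library.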
Expected Results Mapping
The canonical mapping is _docs/00_problem/input_data/expected_results/results_report.md. The traceability matrix references that file by row number. The summary table below lists the rows by the test scenario IDs that consume them.
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Expected Result Source |
|---|---|---|---|---|---|
| FT-P-01 | coordinates.csv (60 frames) + nav_cam_60_slice + satellite_tiles_AD0000xx_z20 + nav_cam_60_slice_imu | ≥80 % within 50 m | percentage | ≥80 % | results_report row 1; position_accuracy.csv |
| FT-P-02 | same | ≥50 % within 20 m | percentage | ≥50 % | results_report row 2; position_accuracy.csv |
| FT-P-03 | same | each frame ≤100 m error | numeric_tolerance | ±100 m max per frame | results_report row 3 |
| FT-P-04 | same | cumulative VO drift between satellite anchors ≤100 m mono / ≤50 m mono+IMU | threshold_max | mono: ≤100 m; mono+IMU: ≤50 m | results_report row 4; AC-1.3 / AC-NEW-8 |
| FT-P-05 | single frame + IMU | fix_type=3, horiz_accuracy ∈ [1,50] m, satellites_visible=10 | exact (fix_type, sat) + range (h_acc) | as stated | results_report row 5 |
| FT-P-06 | sequence, no satellite >30 s | fix_type=3, horiz_accuracy ∈ [20,100] m | exact + range | as stated | results_report row 6 |
| FT-P-07 | sequence, VO lost + no satellite | fix_type=2, h_acc ≥ 50 m (growing) | exact + threshold_min | as stated | results_report row 7 |
| FT-P-08 | VO lost + 3 sat failures | fix_type=0, h_acc=999.0 | exact | N/A | results_report row 8 |
| FT-P-09 | tier transitions | tier ∈ {HIGH, MEDIUM, LOW, FAILED} per conditions | exact | N/A | results_report rows 10–13 |
| FT-P-10 | 60 frames | registration rate ≥95 % (T1 functional only) | percentage | ≥95 % (functional) | results_report row 14 |
| FT-P-11 | 60 frames | MRE < 1.0 px VO frame-to-frame; < 2.5 px cross-domain | threshold_max | <1.0 px / <2.5 px | results_report row 15; AC-2.2 |
| FT-P-12 | frames 32–43 (turn area) | system continues producing position estimates through turn | threshold_min | ≥1 position output / frame | results_report row 16 |
| FT-P-13 | 350 m gap synthetic | error ≤100 m after recovery | threshold_max | ≤100 m | results_report row 17 |
| FT-P-14 | sharp-turn synthetic | satellite re-loc triggers; error ≤50 m within 3 frames | threshold_max | ≤50 m | results_report row 18 |
| FT-P-15 | VO loss + sat success | tracking_state == NORMAL after recovery | exact | N/A | results_report row 19 |
| FT-P-16 | startup with GLOBAL_POSITION_INT | first GPS_INPUT within 30 s of boot, p95 | threshold_max | ≤30 s p95 | results_report row 23; AC-NEW-1 |
| FT-P-17 | startup + first satellite match | error ≤50 m after first match | threshold_max | ≤50 m | results_report row 24 |
| FT-P-18 | reboot mid-flight | recovery time ≤30 s | threshold_max | ≤30 s | results_report row 25; AC-NEW-1 |
| FT-P-19 | post-reboot first match | error ≤50 m | threshold_max | ≤50 m | results_report row 26 |
| FT-P-20 | object localize valid request | response with lat/lon within accuracy_m of ground truth | numeric_tolerance | per response.accuracy_m | results_report row 27 |
| FT-P-21 | round-trip GPS→NED→pixel→GPS | error ≤0.1 m | threshold_max | ≤0.1 m | results_report row 29 |
| FT-P-22 | GET /health | 200 + JSON with status, memory_mb, gpu_temp_c | exact + regex | as stated | results_report row 30 |
| FT-P-23 | POST /sessions | 200 or 201 + session id | exact | status ∈ {200,201} | results_report row 31 |
| FT-P-24 | GET /sessions/{id}/stream | SSE events at ~1 Hz with schema fields | regex + rate | per SSE schema | results_report row 32 |
| FT-P-25 | TRT engine load | ≤10 s total | threshold_max | ≤10 s | results_report row 39 |
| FT-P-26 | mission area definition | 300–1000 MB tile storage | range | [300, 1000] MB | results_report row 40 |
| FT-P-27 | EKF position ± 3σ | tile mosaic radius ≥500 m | threshold_min | ≥500 m | results_report row 41 |
| FT-P-28 | tile dedup replay | ≤1 tile per ground sector visited ≥2× | exact | per-sector count == 1 | AC-8.4, F-T2 |
| FT-P-29 | post-flight upload | tiles uploaded to candidate pool with trust_level=candidate | exact | as stated | AC-8.4, F-T3 |
| FT-P-30 | telemetry | NAMED_VALUE_FLOAT at 1 Hz ± 0.2 Hz | numeric_tolerance | 1 Hz ± 0.2 Hz | results_report row 45 |
| FT-N-01 | corrupted JPG | system continues with tracking_state == DEGRADED, no crash | exact | tracking_state ∈ {DEGRADED, NORMAL} | derived from AC-3.x |
| FT-N-02 | invalid object localize pixel | HTTP 422 | exact | status == 422 | results_report row 28 |
| FT-N-03 | unauthenticated POST /sessions | HTTP 401 | exact | status == 401 | results_report row 33 |
| FT-N-04 | tile older than freshness budget | tile rejected or down-confidence; never satellite_anchored | exact | as stated | AC-8.2, AC-NEW-6 |
| FT-N-05 | tile in 30-day grace zone | confidence linearly decayed | numeric_tolerance | per spec curve | AC-NEW-6 |
| FT-N-06 | sharp turn (no overlap, <70°, <200 m) | satellite re-loc within 3 frames | threshold_max | ≤50 m within 3 frames | results_report row 18; AC-3.2 |
| FT-N-07 | VO loss + 3 sat failures | RELOC_REQ regex pattern emitted via STATUSTEXT | regex | per pattern | results_report rows 20, 46 |
| FT-N-08 | re-loc active | fix_type=0, IMU prediction continues, sat attempts continue | exact | as stated | results_report row 21 |
| FT-N-09 | operator hint received | hint used as 500 m seed for VPR; ≤500 m initially, ≤50 m after match | threshold_max | as stated | results_report row 22 |
| NFT-PERF-01 | single 6252×4168 frame on Orin Nano Super 25 W (T4) | end-to-end latency ≤400 ms p95 | threshold_max | ≤400 ms p95 | results_report row 34; AC-4.1 |
| NFT-PERF-02 | cuVSLAM single frame | ≤20 ms / frame | threshold_max | ≤20 ms | results_report row 37 |
| NFT-PERF-03 | matcher single pair on Orin Nano Super 25 W | inline ≤200 ms; re-loc fallback ≤2000 ms | threshold_max | as stated | results_report row 38 |
| NFT-PERF-04 | Orthority per-frame on Orin Nano Super | ≤50 ms / frame | threshold_max | ≤50 ms / frame | F-T14, M-27 |
| NFT-PERF-05 | spoof onset → SUT promotion | ≤3 s p95 | threshold_max | ≤3 s p95 | AC-NEW-2; F-T12 |
| NFT-PERF-06 | per-frame end-to-end (frame-by-frame, not batched) | inter-frame interval matches camera rate | numeric_tolerance | per frame within ±50 ms of camera rate | AC-4.4 |
| NFT-RES-01 | SUT process killed mid-flight | recovery ≤30 s, restart from FC pose | threshold_max | ≤30 s | results_report row 25; AC-5.3, AC-NEW-1 |
| NFT-RES-02 | spoofing onset | promotion ≤3 s | threshold_max | ≤3 s | AC-NEW-2 |
| NFT-RES-03 | network partition with FC | failsafe at 3 s no fix | threshold_max | ≤3 s | AC-5.2 |
| NFT-RES-04 | EKF3 lane-switch / fix-loss event | source-promotion responds | exact | promotion within budget | AC-NEW-2 |
| NFT-SEC-01 | unsigned MAVLink injection | FC rejects | exact | acceptance==false | F-T9, S-T1 |
| NFT-SEC-02 | unauthenticated REST | 401 / 403 | exact | per endpoint | results_report row 33 |
| NFT-SEC-03 | malformed JWT | 401 | exact | status==401 | derived |
| NFT-SEC-04 | TLS downgrade attempt | rejected | exact | TLS ≥1.2 only | S-T2 |
| NFT-SEC-05 | tile-cache write attempt by unauthorized API | 403 / no-op | exact | as stated | AC-8.5, AC-NEW-7 |
| NFT-RES-LIM-01 | 30-min sustained load (T1+T4) | peak < 8192 MB; growth ≤50 MB / 30 min | threshold_max | as stated | results_report row 35; AC-4.2 |
| NFT-RES-LIM-02 | 30-min sustained load | SoC junction ≤80 °C | threshold_max | ≤80 °C | results_report row 36 |
| NFT-RES-LIM-03 | 8-h sustained 25 W @ +50 °C ambient (T4) | no thermal throttle | exact | throttle_event_count == 0 | AC-NEW-5, NF-T3 |
| NFT-RES-LIM-04 | FDR 8-h synthetic load | FDR ≤64 GB; rollover logged; no payload class silently dropped | threshold_max + audit | as stated | AC-NEW-3, NF-T5 |
| NFT-RES-LIM-05 | tile cache 400 km² | ≤10 GB persistent | threshold_max | ≤10 GB | restrictions §UAV |
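The Comparison Method column uses a small vocabulary (exact, range, threshold_max, threshold_min, numeric_tolerance, percentage), with some rows qualified by p95. One way these could be evaluated is sketched below. The helper names are hypothetical, the thresholds are treated as inclusive (rows with a strict limit, such as the "< 1.0 px" MRE bound, would need a strict variant), and the nearest-rank percentile is an assumption, since the spec does not name a percentile method.

```python
import math


def fraction_within(errors_m, radius_m):
    """Share of per-frame errors at or below radius_m (0.0 to 1.0)."""
    return sum(e <= radius_m for e in errors_m) / len(errors_m)


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def check(method, observed, expected):
    """Evaluate one comparison; returns True on pass."""
    if method == "exact":
        return observed == expected
    if method in ("threshold_min", "percentage"):
        return observed >= expected          # expected is the minimum
    if method == "threshold_max":
        return observed <= expected          # expected is the maximum
    if method == "range":
        low, high = expected
        return low <= observed <= high
    if method == "numeric_tolerance":
        target, tolerance = expected
        return abs(observed - target) <= tolerance
    raise ValueError(f"unknown comparison method: {method}")


# Illustrative per-frame errors for a 60-frame run (made-up values).
errors_m = [5.0] * 30 + [30.0] * 20 + [80.0] * 10
ft_p_01 = check("percentage", fraction_within(errors_m, 50.0), 0.80)
ft_p_02 = check("percentage", fraction_within(errors_m, 20.0), 0.50)
ft_p_30 = check("numeric_tolerance", 1.15, (1.0, 0.2))      # telemetry rate in Hz
latency_ok = check("threshold_max", percentile([310, 350, 390, 420], 95), 400)
print(ft_p_01, ft_p_02, ft_p_30, latency_ok)
```

With only four latency samples the nearest-rank p95 is simply the maximum, so a single slow frame fails the check. Real runs would need enough samples for the percentile to be meaningful.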
External Dependency Mocks
| External Service | Mock/Stub | How Provided | Behavior |
|---|---|---|---|
| Azaion Suite Satellite Service (pre-flight cache sync) | tile-cache-init one-shot loader | Docker service that materialises MBTiles + sidecar before SUT starts | Returns the same fixture set every run; deterministic |
| Azaion Suite Satellite Service (post-flight upload) | candidate-pool stub inside qgc-mock (or a dedicated service-stub container) | HTTP server with POST /candidates accepting tile uploads, recording to a file | Records what the SUT sends; never alters the cache used by the next test |
| QGroundControl GCS | qgc-mock | Custom MAVLink-only mock | Records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY frames; can inject operator-hint STATUSTEXT |
| ArduPilot autopilot | ardupilot-sitl (PR #30080-pinned) | Official ArduPilot SITL container | Replays IMU from fixture; runs EKF3; exposes RAW_IMU, ATTITUDE, GLOBAL_POSITION_INT, EKF_STATUS_REPORT, GPS_RAW_INT |
| Spoofing GPS adversary | gps-spoof-injector | Custom MAVLink injector | Sends crafted GPS_RAW_INT with configurable lat/lon offset, sat count, hdop |
| Identity provider (JWT) | in-runner key generator | Test-only HMAC-SHA256 key shared at SUT boot via env var | Mints valid + invalid + expired JWTs |
| External satellite providers (Maxar, Airbus, Planet) | NOT MOCKED: out of scope per AC-8.1; SUT does not call them at runtime | n/a | The SUT must never make outbound HTTP to these hosts; F-T2 / NFT-SEC-04 includes a network-policy assertion |
All mocks are deterministic: the same input always produces the same output. The exceptions are the spoof and operator-hint scenarios, which explicitly schedule events on a wall clock so that the SUT's timing budgets (AC-NEW-1, AC-NEW-2) are exercised.
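For the JWT identity-provider mock, the in-runner key generator has to mint valid, invalid, and expired tokens from a per-session HMAC-SHA256 key. A minimal standard-library sketch of HS256 minting and verification follows. The function names, the claim set (sub, iat, exp), and the subject value "e2e-runner" are assumptions made for illustration; the SUT's actual claim requirements are not stated in this section.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def _b64url(data):
    """Base64url without padding, as used by JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(key, signing_input):
    return hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()


def mint_jwt(key, subject, ttl_s, now=None):
    """Mint an HS256 token; a negative ttl_s yields an already-expired token."""
    now = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "iat": now, "exp": now + ttl_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(key, signing_input))}"


def verify_jwt(key, token, allowed_subjects, now=None):
    """Return True only for a well-formed, correctly signed, unexpired token."""
    now = time.time() if now is None else now
    try:
        head_b64, claims_b64, sig_b64 = token.split(".")
        expected = _sign(key, f"{head_b64}.{claims_b64}")
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return False
        header = json.loads(_b64url_decode(head_b64))
        claims = json.loads(_b64url_decode(claims_b64))
        return (
            header["alg"] == "HS256"
            and claims["sub"] in allowed_subjects
            and claims["exp"] > now
        )
    except (ValueError, TypeError, KeyError):
        return False


session_key = secrets.token_bytes(32)        # fresh per test session
valid = mint_jwt(session_key, "e2e-runner", ttl_s=300)
expired = mint_jwt(session_key, "e2e-runner", ttl_s=-1)
forged = mint_jwt(secrets.token_bytes(32), "e2e-runner", ttl_s=300)
print([verify_jwt(session_key, t, {"e2e-runner"}) for t in (valid, expired, forged)])
```

Because the key is regenerated for every session, a token captured in one session fails signature verification in the next, which is the property the identity-isolation rule relies on.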
Data Validation Rules
| Data Type | Validation | Invalid Examples | Expected System Behavior |
|---|---|---|---|
| Nav-cam frame | non-zero size; JPEG / PNG decodable; expected resolution within ±1 % of data_parameters.md | 0-byte file, truncated JPEG header, wildly wrong resolution | log error; tracking_state transitions to DEGRADED if loss >2 frames; never crash |
| IMU sample | rate 200 Hz ± 10 %; timestamps monotonic; covariance present | timestamp regression, rate < 50 Hz, NaN / Inf | drop sample with WARN log; if loss > 0.5 s → cuVSLAM degrade; AC-5.2 path eligible |
| Satellite tile | MBTiles schema valid; descriptors present; capture_date within freshness budget for sector | corrupt MBTiles, missing sidecar, beyond-grace freshness | reject with WARN; AC-8.2 / AC-NEW-6 |
| MAVLink GPS_RAW_INT (FC inputs) | well-formed; signing valid (when MAVLink2 signing on) | unsigned frame, malformed length, sysid spoofing | reject; F-T9 + S-T1 cover this |
| HTTPS request body | JSON parse OK; required fields present; pixel coords ∈ frame bounds | missing fields, NaN, out-of-bounds pixel | HTTP 422 |
| JWT | signature valid; not expired; subject is allowed | expired, wrong sig, missing claims | HTTP 401 |
| Tile descriptor | dimension matches index; checksum match | wrong dims, mismatched hash | reject load; cache marks as corrupt; F-T2 |
| Operator hint STATUSTEXT | parseable RELOC_HINT: lat=… lon=… sigma=…; numeric ranges sane | malformed, NaN, negative sigma, lat > 90 / lon > 180 | reject hint; emit STATUSTEXT WARN; do not seed VPR |
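The operator-hint rule in the table above lends itself to a small validator. The sketch below assumes the three values are plain decimal numbers separated by single spaces, which goes beyond what the RELOC_HINT template itself states. It also rejects a zero sigma, although the table only names negative sigma as invalid. `parse_reloc_hint` is a hypothetical name, and the coordinates in the example are made up.

```python
import math
import re

# Assumed wire format: "RELOC_HINT: lat=<num> lon=<num> sigma=<num>"
_HINT_RE = re.compile(r"^RELOC_HINT: lat=(\S+) lon=(\S+) sigma=(\S+)$")


def parse_reloc_hint(text):
    """Return (lat, lon, sigma_m) for a sane hint, or None to reject it."""
    match = _HINT_RE.match(text.strip())
    if match is None:
        return None                      # malformed message
    try:
        lat, lon, sigma = (float(group) for group in match.groups())
    except ValueError:
        return None                      # non-numeric field
    if not all(math.isfinite(v) for v in (lat, lon, sigma)):
        return None                      # NaN or Inf
    if abs(lat) > 90.0 or abs(lon) > 180.0:
        return None                      # outside WGS84 bounds
    if sigma <= 0.0:
        return None                      # zero rejected by assumption
    return lat, lon, sigma


samples = [
    "RELOC_HINT: lat=48.5 lon=35.0 sigma=500",
    "RELOC_HINT: lat=91.0 lon=35.0 sigma=500",
    "RELOC_HINT: lat=48.5 lon=35.0 sigma=-10",
    "RELOC_HINT: lat=nan lon=35.0 sigma=500",
    "RELOC_HINT lat=48.5",
]
for message in samples:
    print(message, "->", parse_reloc_hint(message))
```

A None result corresponds to the "reject hint; emit STATUSTEXT WARN; do not seed VPR" branch; only the first sample would be allowed to seed VPR.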
Pending Data (Phase 1 D3 — placeholder fixtures)
The following fixtures are declared by name in this spec but were not yet present at the time of writing. Phase 3's HARD GATE will surface them as pending data, not as "remove":
| Fixture | Generator / source | Owner | Phase 3 treatment |
|---|---|---|---|
| fixtures/satellite_tiles_AD0000xx_z20/ | tile-cache-init script: fetch z=20 ortho tiles for the bbox of coordinates.csv from a public ortho service (Esri / Mapbox / Sentinel-2 ≥ 0.5 m/px); pre-extract SuperPoint + DINOv2-VLAD descriptors | Decompose / impl. team task | pending data, not removed; data_status: deferred-corpus retained until generator script is committed |
| fixtures/imu_AD0000xx.csv | SITL ArduPilot replay of coordinates.csv as ground-truth trajectory at 200 Hz | Decompose / impl. team task | pending data, not removed; data_status: deferred-corpus |
| aerialvl_s03, uav_visloc, aerialextrematch, 2chadcnn_seasons, tartanair_v2, internal_mavic | External downloads + curation | data team task (Decompose creates a "dataset acquisition" task) | data_status: deferred-corpus |
| internal_fixed_wing_first_sortie | Field-test plan | operations team | data_status: deferred-field |
| cold_soak_corpus, hot_soak_corpus | Bench HW + chamber | bench team | data_status: deferred-hil |
| synthetic_8h_load | fixtures/synth-8h-loader/ script | impl. team | regenerated per session; synthesisable, no external dependency |
| cache_poisoning_scenarios | fixtures/cache-poison-mc/ script | impl. team | regenerated per session |