Test Data Management

Important Caveat — 60-image slice scope (per Phase 1 D2)

The 60 nav-cam JPGs in _docs/00_problem/input_data/AD000001.jpg … AD000060.jpg were captured at 400 m AGL with the ADTi Surveyor Lite 26S v2 (26 MP, 6252 × 4168, 25 mm, 23.5 mm sensor) — not the deployment camera (ADTi 20MP 20L V1, APS-C, ~5472 × 3648) and not the deployment altitude (≤1 km AGL). This corpus is therefore pipeline-correctness only:

It validates that the pipeline (cuVSLAM → VPR → matcher → Component 5 → MAVLink GPS_INPUT) produces the right shape of output, in the right order, with the right categorical labels and MAVLink schema.
It does NOT validate the deployment-binding accuracy budgets (AC-1.1 ≥80 %@50 m, AC-1.2 ≥50 %@20 m), the GSD-band assumptions, the matcher resolution sweeps, or the latency budget for the deployed 1 km AGL / 20 MP path.
Pass rates from this slice on AC-1.1 / AC-1.2 / AC-2.1 / AC-2.2 / AC-NEW-8 are functional, not deployment-binding. The deployment-binding numbers come from the deferred-corpus and deferred-field tiers (AerialVL S03, UAV-VisLoc, AerialExtreMatch, internal Mavic, first internal fixed-wing flight).
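
To make the GSD gap concrete, the sketch below computes the nadir ground sampling distance (GSD) and frame footprint for the slice. It assumes a pinhole camera looking straight down at flat terrain with square pixels, and the helper names are illustrative. With the corpus-shoot parameters above it gives roughly 6.0 cm/px and a footprint of about 376 m by 251 m. The deployment camera's focal length is not stated in this section, so the deployment GSD is deliberately not computed.

```python
"""Nadir GSD and ground footprint for the 60-frame nav-cam slice (illustrative sketch)."""


def gsd_m_per_px(altitude_m: float, focal_mm: float,
                 sensor_width_mm: float, image_width_px: int) -> float:
    """Ground sampling distance in metres per pixel.

    Assumes a nadir pinhole camera over flat terrain with square pixels.
    """
    pixel_pitch_mm = sensor_width_mm / image_width_px
    # Similar triangles: ground size of one pixel = altitude * pitch / focal length.
    return altitude_m * pixel_pitch_mm / focal_mm


def footprint_m(altitude_m: float, focal_mm: float, sensor_width_mm: float,
                image_width_px: int, image_height_px: int) -> tuple:
    """Ground footprint (width, height) in metres of a single nadir frame."""
    gsd = gsd_m_per_px(altitude_m, focal_mm, sensor_width_mm, image_width_px)
    return gsd * image_width_px, gsd * image_height_px


if __name__ == "__main__":
    # Corpus-shoot parameters: 400 m AGL, 25 mm lens, 23.5 mm sensor width, 6252 x 4168 px.
    gsd = gsd_m_per_px(400.0, 25.0, 23.5, 6252)
    width, height = footprint_m(400.0, 25.0, 23.5, 6252, 4168)
    print(f"GSD ~ {gsd * 100:.1f} cm/px, footprint ~ {width:.0f} m x {height:.0f} m")
```

GSD scales linearly with altitude, so even this survey camera flown at 1 km would sit near 15 cm/px. That difference alone is why the GSD-band assumptions cannot be closed on this slice.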

Seed Data Sets

Data Set	Description	Used by Tests	How Loaded	Cleanup
`nav_cam_60_slice`	60 JPGs `AD000001.jpg`…`AD000060.jpg`, 6252×4168, captured at 400 m AGL	T1 pipeline-correctness tests (FT-P-01..FT-P-08, FT-N-01..FT-N-04)	volume mount `fixtures-images:/fixtures/images:ro`	volume is read-only — no cleanup
`nav_cam_60_slice_coordinates`	`coordinates.csv`: per-frame WGS84 ground truth	All T1 accuracy tests	mount path `/fixtures/images/coordinates.csv`	—
`nav_cam_60_slice_imu` (synthetic, fixture)	`fixtures/imu_AD0000xx.csv`: 200 Hz IMU traces synthesised by SITL ArduPilot replay of `coordinates.csv` as ground-truth trajectory	T1 cuVSLAM tests; F-T1c IMU-sync-jitter measurement	mount path `/fixtures/imu/` ; `ardupilot-sitl --imu-replay=...`	regenerated per test session
`satellite_tiles_AD0000xx_z20` (placeholder fixture)	z=20 ortho-tiles for the bbox of `coordinates.csv`, fetched offline by `tile-cache-init` from public ortho service (Esri / Mapbox / Sentinel-2 fallback gated to ≥0.5 m/px)	T1 cross-view matcher / VPR tests	volume `tile-cache:/var/lib/gpsdenied/tiles`	volume rebuilt per test session
`satellite_tile_descriptors_z20`	Pre-extracted SuperPoint keypoints + DINOv2-VLAD global descriptors for `satellite_tiles_AD0000xx_z20`	T1 VPR + matcher tests	same volume, sidecar `.descriptors.h5` files	same
`aerialvl_s03` (deferred-corpus)	AerialVL S03: 70 km of fixed-wing flight at 1 km AGL with synced IMU + GPS truth + nav-cam stream	T2 AC-1.3, AC-NEW-4, AC-NEW-7, AC-NEW-8, AC-NEW-9	external download script (data team task — Decompose); mount when present	not removed (large, kept across sessions)
`uav_visloc` (deferred-corpus)	UAV-VisLoc public dataset	T2 matcher / VPR seasonal-robustness regression	external download script	not removed
`aerialextrematch` (deferred-corpus)	AerialExtreMatch open-review dataset	T2 matcher seasonal-robustness regression	external download script	not removed
`2chadcnn_seasons` (deferred-corpus)	2chADCNN season set (cross-season scene-change benchmark)	T2 NF-T*-season-robustness	external download script	not removed
`tartanair_v2` (deferred-corpus)	TartanAir V2 synthetic scenes	T2 matcher distillation evaluation	external download script	not removed
`internal_mavic` (deferred-corpus)	Internal Mavic 3 Pro Mini recorded flights (legacy attempt; no IMU per `problem.md`; used for visual-only checks)	T2 matcher visual-only regression	external data-team mount	not removed
`internal_fixed_wing_first_sortie` (deferred-field)	First internal fixed-wing flight with synced IMU + GPS truth	T5 FT-1 / FT-2 / FT-3, AC-1.3 lock	field-test mount	not removed
`synthetic_8h_load` (synthesisable)	8-hour synthetic 3 fps nav-frame replay sequence assembled from `nav_cam_60_slice` looped + jittered	NF-T3 thermal soak, NF-T5 FDR rollover (AC-NEW-3), AC-NEW-5	generated at fixture build time by `fixtures/synth-8h-loader/`	regenerated per session
`cold_soak_corpus` (deferred-hil)	A short replay loop run at −20 °C ambient	T4 NF-T3 cold-soak, AC-NEW-1 cold	bench HW only	—
`hot_soak_corpus` (deferred-hil)	Same replay loop run at +50 °C ambient for 8 h	T4 NF-T3 hot-soak, AC-NEW-5	bench HW only	—
`spoofing_scenarios`	Scripted MAVLink GPS_RAW_INT injections: jam-onset, lat/lon offset, sat-count drop, hdop spike	T3 F-T9 / F-T12, AC-NEW-2	`gps-spoof-injector` config files	regenerated per session
`operator_hint_scenarios`	Scripted operator STATUSTEXT messages with approximate `(lat, lon, sigma_xy=500m)`	T3 F-T10, AC-3.4, AC-6.2, results_report row 22	`qgc-mock` config	regenerated per session
`stale_tile_scenarios`	Synthetic-age tiles (1, 5, 7, 11, 13, 18 months old; both active-conflict and stable-rear sectors)	T1 NF-T6, AC-8.2 / AC-NEW-6	injected into `tile-cache` by `tile-cache-init --inject-stale`	volume rebuilt per session
`cache_poisoning_scenarios`	Multi-flight Monte Carlo with synthetic over-confidence injection (EKF covariance deflated by 1.5×–3×)	T2 NF-T4b, AC-NEW-7	generated by `fixtures/cache-poison-mc/`	regenerated per session
`cold_start_replay_50`	50× cold-boot replay: SUT process killed and restarted with simulated FC pose injection	T1+T4 F-T11, AC-NEW-1	scripted in `e2e-runner` test	—
`disconnected_segments_replay`	Synthetic ≥3 disconnected flight segments stitched from `nav_cam_60_slice` with gaps	T1 F-T8, AC-3.3	generated at fixture build time	regenerated per session
`tile_dedup_replay`	A flight where ground sectors are visited twice — used to verify deduplication (AC-8.4)	T1 F-T2	generated at fixture build time	regenerated per session
`mavlink2_signing_keys`	Test-only per-airframe HMAC-SHA256 signing keys	T1 / T3 F-T9, S-T1, MAVLink2 signing assertions	env var `MAVLINK2_SIGNING_KEY=…` shared SUT + runner + FC	rotated per session
`tls_test_certs`	Self-signed CA + SUT cert + client cert (test-only)	T1 S-T1..S-T5 HTTPS auth tests	mount `tls-test-certs:/etc/gpsdenied/tls:ro`	regenerated per session
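
For the `satellite_tiles_AD0000xx_z20` fixture, the set of z=20 tiles covering the `coordinates.csv` bounding box can be derived with the standard OSM slippy-map (Web Mercator) tilename formula. The sketch below is an assumption about how such a range could be computed, not a description of `tile-cache-init`'s actual implementation; function names and the sample track are hypothetical.

```python
"""Slippy-map (Web Mercator) tile range and grid resolution for a lat/lon bbox (sketch)."""
import math
from typing import Iterable, Tuple

WGS84_A_M = 6378137.0   # Web Mercator sphere radius (WGS84 semi-major axis)
TILE_SIZE_PX = 256      # standard raster tile edge


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Standard OSM slippy-map tile index (x, y) containing a point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge (or beyond Mercator limits) stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_range(points: Iterable[Tuple[float, float]], zoom: int = 20,
               margin_deg: float = 0.0) -> Tuple[int, int, int, int]:
    """(x_min, x_max, y_min, y_max) covering all (lat, lon) points plus a margin."""
    pts = list(points)
    lats = [p[0] for p in pts]
    lons = [p[1] for p in pts]
    # Tile y grows southwards, so the north-west corner gives (x_min, y_min).
    x_min, y_min = lonlat_to_tile(min(lons) - margin_deg, max(lats) + margin_deg, zoom)
    x_max, y_max = lonlat_to_tile(max(lons) + margin_deg, min(lats) - margin_deg, zoom)
    return x_min, x_max, y_min, y_max


def ground_resolution_m_per_px(lat: float, zoom: int) -> float:
    """Tile-grid resolution at a latitude (not the imagery's native GSD)."""
    return 2 * math.pi * WGS84_A_M * math.cos(math.radians(lat)) / (TILE_SIZE_PX * 2 ** zoom)


if __name__ == "__main__":
    # Hypothetical (lat, lon) track points standing in for coordinates.csv.
    track = [(50.000, 30.000), (50.002, 30.006), (50.004, 30.010)]
    x0, x1, y0, y1 = tile_range(track, zoom=20, margin_deg=0.002)
    n_tiles = (x1 - x0 + 1) * (y1 - y0 + 1)
    print(f"z=20 tiles: x {x0}..{x1}, y {y0}..{y1} ({n_tiles} tiles)")
    print(f"grid resolution ~ {ground_resolution_m_per_px(50.0, 20):.3f} m/px")
```

At z=20 the tile grid is about 0.15 m/px at the equator and finer at higher latitudes, so it always passes a 0.5 m/px check on its own. The resolution gate therefore has to be applied to the provider's native imagery resolution, not to the tile grid it was resampled into.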

Data Isolation Strategy

Container scope: each test session starts with a clean sut container (no cache poisoning between sessions).
Volume scope: the tile-cache and fdr volumes are rebuilt per test session, not per test. Within a session, tests that depend on cache state are either ordered or use namespaced subdirectories. The fixtures-images, fixtures-imu and fixtures-expected volumes are mounted read-only and therefore cannot be polluted.
Cross-test contamination: tests that mutate state (cache writes, FDR writes) declare pytest.mark.mutates_state and are run in a serial group. Read-only tests run in parallel within a tier.
Identity isolation: each session generates a fresh mavlink2_signing_keys set and JWT signing key, so signed MAVLink frames and JWTs captured in one session cannot be replayed into another.
Resource isolation: T4 deferred-hil tests do not share a Jetson with any other test; bench scheduler enforces single-tenant access.
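
The session- and test-level isolation above can be sketched with the standard library alone. This is a hypothetical helper, not the runner's real code: the 32-byte key length follows the MAVLink 2 signing specification, while the hex encoding, the `JWT_SIGNING_KEY` variable name and the directory layout are assumptions.

```python
"""Per-session secrets and namespaced cache directories (illustrative sketch)."""
import re
import secrets
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict


@dataclass(frozen=True)
class SessionSecrets:
    session_id: str
    mavlink2_signing_key: bytes   # MAVLink 2 message signing uses a 32-byte secret key
    jwt_hmac_key: bytes           # HMAC-SHA256 key used to mint test JWTs


def new_session_secrets() -> SessionSecrets:
    """Fresh random material for every test session; nothing is reused."""
    return SessionSecrets(
        session_id=secrets.token_hex(8),
        mavlink2_signing_key=secrets.token_bytes(32),
        jwt_hmac_key=secrets.token_bytes(32),
    )


def session_env(s: SessionSecrets) -> Dict[str, str]:
    """Environment shared by SUT, runner and FC (hex encoding is an assumption)."""
    return {
        "MAVLINK2_SIGNING_KEY": s.mavlink2_signing_key.hex(),
        "JWT_SIGNING_KEY": s.jwt_hmac_key.hex(),  # hypothetical variable name
    }


def namespaced_cache_dir(cache_root: Path, session_id: str, test_id: str) -> Path:
    """Disjoint tile-cache subdirectory for one test inside one session."""
    # pytest node ids contain '/', '::' and '[...]'; reduce them to a safe directory name.
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", test_id)
    path = cache_root / session_id / safe
    path.mkdir(parents=True, exist_ok=False)  # fail loudly if two tests collide
    return path


if __name__ == "__main__":
    s1, s2 = new_session_secrets(), new_session_secrets()
    assert s1.mavlink2_signing_key != s2.mavlink2_signing_key
    with tempfile.TemporaryDirectory() as root:
        d = namespaced_cache_dir(Path(root), s1.session_id,
                                 "tests/t1/test_vpr.py::test_recall[z20]")
        print(d.relative_to(root))
```

Using `exist_ok=False` turns an accidental namespace collision into an immediate error instead of silent cross-test cache sharing.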

Input Data Mapping

Input Data File	Source Location	Description	Covers Scenarios
`AD000001.jpg`…`AD000060.jpg`	`_docs/00_problem/input_data/`	60 nav-cam JPGs, 6252×4168, 400 m AGL, ADTi 26S v2	FT-P-01..FT-P-08, FT-N-01..FT-N-04, NF-RES-LIM-01..03 (T1)
`coordinates.csv`	`_docs/00_problem/input_data/`	Frame index → WGS84 ground truth	results_report rows 1–4, FT-P-01, FT-P-02, NFT-PERF-01
`data_parameters.md`	`_docs/00_problem/input_data/`	Corpus-shoot params (400 m AGL, 26S v2, 25 mm, 23.5 mm sensor)	All T1 tests — context for pipeline-correctness scope
`AD000001_gmaps.png`, `AD000002_gmaps.png`	`_docs/00_problem/input_data/`	Two satellite reference thumbnails (frames 1–2 only)	Smoke test only; the cross-view reference is the `satellite_tiles_AD0000xx_z20` placeholder fixture, not these thumbnails
`expected_results/results_report.md`	`_docs/00_problem/input_data/`	46-scenario expected results mapping	All T1 tests + most T2 tests; canonical pass/fail thresholds
`expected_results/position_accuracy.csv`	`_docs/00_problem/input_data/`	Per-frame ground truth + thresholds	results_report rows 1–3, FT-P-01, FT-P-02
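
The per-frame error that feeds FT-P-01 / FT-P-02 can be sketched as a haversine distance between each estimate and the `coordinates.csv` ground truth. The column layout of `coordinates.csv` is not described here, so the header `image,lat,lon` is an assumption, as is the choice to count a frame with no estimate as an infinite error (a failure). The spherical Earth model introduces well under 1 % error, which is negligible against 20 m / 50 m thresholds.

```python
"""Per-frame horizontal error against coordinates.csv (illustrative sketch)."""
import csv
import io
import math
from typing import Dict, Mapping, TextIO, Tuple

EARTH_MEAN_RADIUS_M = 6371008.8  # spherical approximation
LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(h))


def load_ground_truth(stream: TextIO) -> Dict[str, LatLon]:
    """Read frame -> (lat, lon). Column names 'image', 'lat', 'lon' are assumptions."""
    return {row["image"]: (float(row["lat"]), float(row["lon"]))
            for row in csv.DictReader(stream)}


def per_frame_errors_m(truth: Mapping[str, LatLon],
                       estimates: Mapping[str, LatLon]) -> Dict[str, float]:
    """Error per ground-truth frame; a frame with no estimate counts as infinitely wrong."""
    return {frame: haversine_m(gt, estimates[frame]) if frame in estimates else math.inf
            for frame, gt in truth.items()}


if __name__ == "__main__":
    sample = io.StringIO("image,lat,lon\n"
                         "AD000001.jpg,50.0000,30.0000\n"
                         "AD000002.jpg,50.0010,30.0000\n")
    truth = load_ground_truth(sample)
    print(per_frame_errors_m(truth, {"AD000001.jpg": (50.0003, 30.0000)}))
```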

Expected Results Mapping

The canonical mapping is _docs/00_problem/input_data/expected_results/results_report.md. The traceability matrix references that file by row number. The summary table below lists the rows by the test scenario IDs that consume them.

Test Scenario ID	Input Data	Expected Result	Comparison Method	Tolerance	Expected Result Source
FT-P-01	`coordinates.csv` (60 frames) + `nav_cam_60_slice` + `satellite_tiles_AD0000xx_z20` + `nav_cam_60_slice_imu`	≥80 % within 50 m	`percentage`	≥80 %	`results_report` row 1; `position_accuracy.csv`
FT-P-02	same	≥50 % within 20 m	`percentage`	≥50 %	`results_report` row 2; `position_accuracy.csv`
FT-P-03	same	each frame ≤100 m error	`numeric_tolerance`	±100 m max per frame	`results_report` row 3
FT-P-04	same	cumulative VO drift between satellite anchors ≤100 m mono / ≤50 m mono+IMU	`threshold_max`	mono: ≤100 m; mono+IMU: ≤50 m	`results_report` row 4 ; AC-1.3 / AC-NEW-8
FT-P-05	single frame + IMU	`fix_type=3, horiz_accuracy ∈ [1,50] m, satellites_visible=10`	`exact` (fix_type, sat) + `range` (h_acc)	as stated	`results_report` row 5
FT-P-06	sequence, no satellite >30 s	`fix_type=3, horiz_accuracy ∈ [20,100] m`	`exact` + `range`	as stated	`results_report` row 6
FT-P-07	sequence, VO lost + no satellite	`fix_type=2, h_acc ≥ 50 m` (growing)	`exact` + `threshold_min`	as stated	`results_report` row 7
FT-P-08	VO lost + 3 sat failures	`fix_type=0, h_acc=999.0`	`exact`	N/A	`results_report` row 8
FT-P-09	tier transitions	tier ∈ {HIGH, MEDIUM, LOW, FAILED} per conditions	`exact`	N/A	`results_report` rows 10–13
FT-P-10	60 frames	registration rate ≥95 % (T1 functional only)	`percentage`	≥95 % (functional)	`results_report` row 14
FT-P-11	60 frames	MRE < 1.0 px VO frame-to-frame; < 2.5 px cross-domain	`threshold_max`	<1.0 / <2.5	`results_report` row 15 ; AC-2.2
FT-P-12	frames 32–43 (turn area)	system continues producing position estimates through turn	`threshold_min`	≥1 position output / frame	`results_report` row 16
FT-P-13	350 m gap synthetic	error ≤100 m after recovery	`threshold_max`	≤100 m	`results_report` row 17
FT-P-14	sharp-turn synthetic	satellite re-loc triggers; error ≤50 m within 3 frames	`threshold_max`	≤50 m	`results_report` row 18
FT-P-15	VO loss + sat success	`tracking_state == NORMAL` after recovery	`exact`	N/A	`results_report` row 19
FT-P-16	startup with `GLOBAL_POSITION_INT`	first GPS_INPUT within 30 s of boot, p95	`threshold_max`	≤30 s p95	`results_report` row 23 ; AC-NEW-1
FT-P-17	startup + first satellite match	error ≤50 m after first match	`threshold_max`	≤50 m	`results_report` row 24
FT-P-18	reboot mid-flight	recovery time ≤30 s	`threshold_max`	≤30 s	`results_report` row 25 ; AC-NEW-1
FT-P-19	post-reboot first match	error ≤50 m	`threshold_max`	≤50 m	`results_report` row 26
FT-P-20	object localize valid request	response with lat/lon within `accuracy_m` of ground truth	`numeric_tolerance`	per response.accuracy_m	`results_report` row 27
FT-P-21	round-trip GPS→NED→pixel→GPS	error ≤0.1 m	`threshold_max`	≤0.1 m	`results_report` row 29
FT-P-22	`GET /health`	200 + JSON with `status`, `memory_mb`, `gpu_temp_c`	`exact` + `regex`	as stated	`results_report` row 30
FT-P-23	`POST /sessions`	200 or 201 + session id	`exact`	status ∈ {200,201}	`results_report` row 31
FT-P-24	`GET /sessions/{id}/stream`	SSE events at ~1 Hz with schema fields	`regex` + rate	per SSE schema	`results_report` row 32
FT-P-25	TRT engine load	≤10 s total	`threshold_max`	≤10 s	`results_report` row 39
FT-P-26	mission area definition	300–1000 MB tile storage	`range`	[300, 1000] MB	`results_report` row 40
FT-P-27	EKF position ± 3σ	tile mosaic radius ≥500 m	`threshold_min`	≥500 m	`results_report` row 41
FT-P-28	tile dedup replay	≤1 tile per ground sector visited ≥2×	`exact`	per-sector count == 1	AC-8.4, F-T2
FT-P-29	post-flight upload	tiles uploaded to candidate pool with `trust_level=candidate`	`exact`	as stated	AC-8.4, F-T3
FT-P-30	telemetry	NAMED_VALUE_FLOAT at 1 Hz ± 0.2 Hz	`numeric_tolerance`	1 Hz ± 0.2 Hz	`results_report` row 45
FT-N-01	corrupted JPG	system continues without crashing; `tracking_state` is `DEGRADED` or `NORMAL` (`DEGRADED` only if more than 2 frames are lost)	`exact`	tracking_state ∈ {DEGRADED, NORMAL}	derived from AC-3.x
FT-N-02	invalid object localize pixel	HTTP 422	`exact`	status == 422	`results_report` row 28
FT-N-03	unauthenticated `POST /sessions`	HTTP 401	`exact`	status == 401	`results_report` row 33
FT-N-04	tile older than freshness budget	tile rejected or down-confidence; never `satellite_anchored`	`exact`	as stated	AC-8.2, AC-NEW-6
FT-N-05	tile in 30-day grace zone	confidence linearly decayed	`numeric_tolerance`	per spec curve	AC-NEW-6
FT-N-06	sharp turn (no overlap, <70°, <200 m)	satellite re-loc within 3 frames	`threshold_max`	≤50 m within 3 frames	`results_report` row 18 ; AC-3.2
FT-N-07	VO loss + 3 sat failures	`RELOC_REQ` regex pattern emitted via STATUSTEXT	`regex`	per pattern	`results_report` rows 20, 46
FT-N-08	re-loc active	`fix_type=0`, IMU prediction continues, sat attempts continue	`exact`	as stated	`results_report` row 21
FT-N-09	operator hint received	hint used as 500 m seed for VPR; ≤500 m initially, ≤50 m after match	`threshold_max`	as stated	`results_report` row 22
NFT-PERF-01	single 6252×4168 frame on Orin Nano Super 25 W (T4)	end-to-end latency ≤400 ms p95	`threshold_max`	≤400 ms p95	`results_report` row 34 ; AC-4.1
NFT-PERF-02	cuVSLAM single frame	≤20 ms / frame	`threshold_max`	≤20 ms	`results_report` row 37
NFT-PERF-03	matcher single pair on Orin Nano Super 25 W	inline ≤200 ms; re-loc fallback ≤2000 ms	`threshold_max`	as stated	`results_report` row 38
NFT-PERF-04	Orthority per-frame on Orin Nano Super	≤50 ms / frame	`threshold_max`	≤50 ms / frame	F-T14, M-27
NFT-PERF-05	spoof onset → SUT promotion	≤3 s p95	`threshold_max`	≤3 s p95	AC-NEW-2 ; F-T12
NFT-PERF-06	per-frame end-to-end (frame-by-frame, not batched)	inter-frame output interval matches the camera frame interval	`numeric_tolerance`	each interval within ±50 ms of the nominal camera frame interval	AC-4.4
NFT-RES-01	SUT process killed mid-flight	recovery ≤30 s, restart from FC pose	`threshold_max`	≤30 s	`results_report` row 25 ; AC-5.3, AC-NEW-1
NFT-RES-02	spoofing onset	promotion ≤3 s	`threshold_max`	≤3 s	AC-NEW-2
NFT-RES-03	network partition with FC	failsafe at 3 s no fix	`threshold_max`	≤3 s	AC-5.2
NFT-RES-04	EKF3 lane-switch / fix-loss event	source-promotion responds	`exact`	promotion within budget	AC-NEW-2
NFT-SEC-01	unsigned MAVLink injection	FC rejects	`exact`	acceptance==false	F-T9, S-T1
NFT-SEC-02	unauthenticated REST	401 / 403	`exact`	per endpoint	`results_report` row 33
NFT-SEC-03	malformed JWT	401	`exact`	status==401	derived
NFT-SEC-04	TLS downgrade attempt	rejected	`exact`	TLS ≥1.2 only	S-T2
NFT-SEC-05	tile-cache write attempt by unauthorized API	403 / no-op	`exact`	as stated	AC-8.5, AC-NEW-7
NFT-RES-LIM-01	30-min sustained load (T1+T4)	peak < 8192 MB; growth ≤50 MB / 30 min	`threshold_max`	as stated	`results_report` row 35 ; AC-4.2
NFT-RES-LIM-02	30-min sustained load	SoC junction ≤80 °C	`threshold_max`	≤80 °C	`results_report` row 36
NFT-RES-LIM-03	8-h sustained 25 W @ +50 °C ambient (T4)	no thermal throttle	`exact`	throttle_event_count == 0	AC-NEW-5, NF-T3
NFT-RES-LIM-04	FDR 8-h synthetic load	FDR ≤64 GB; rollover logged; no payload class silently dropped	`threshold_max` + audit	as stated	AC-NEW-3, NF-T5
NFT-RES-LIM-05	tile cache 400 km²	≤10 GB persistent	`threshold_max`	≤10 GB	restrictions §UAV
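
The comparison methods named in the table can be expressed as small predicates. This is a minimal sketch, not the runner's actual evaluator: it assumes "within N m" means error ≤ N m, and the spec does not name a percentile method, so p95 here uses the nearest-rank definition. All function names and sample values are hypothetical.

```python
"""Minimal evaluators for the comparison methods named in the table (illustrative sketch)."""
from typing import Sequence


def percentage_within(errors_m: Sequence[float], radius_m: float) -> float:
    """`percentage`: share of frames (in %) whose error is at most radius_m."""
    if not errors_m:
        raise ValueError("no frames to evaluate")
    return 100.0 * sum(1 for e in errors_m if e <= radius_m) / len(errors_m)


def threshold_max(value: float, limit: float) -> bool:
    return value <= limit


def threshold_min(value: float, limit: float) -> bool:
    return value >= limit


def in_range(value: float, low: float, high: float) -> bool:
    """`range`: closed interval, e.g. horiz_accuracy in [1, 50] m."""
    return low <= value <= high


def numeric_tolerance(value: float, target: float, tolerance: float) -> bool:
    """`numeric_tolerance`: |value - target| <= tolerance, e.g. 1 Hz +/- 0.2 Hz."""
    return abs(value - target) <= tolerance


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)  # integer ceil(0.95 * n), avoids float rounding
    return ordered[rank - 1]


if __name__ == "__main__":
    errors = [12.0, 35.5, 48.0, 61.0, 18.2]      # hypothetical per-frame errors (m)
    print("FT-P-01 pass:", percentage_within(errors, 50.0) >= 80.0)
    print("FT-P-02 pass:", percentage_within(errors, 20.0) >= 50.0)
    latencies_ms = [310.0, 295.0, 402.0, 330.0]   # hypothetical end-to-end latencies
    print("NFT-PERF-01 pass:", threshold_max(p95(latencies_ms), 400.0))
```

Nearest-rank never interpolates, so a reported p95 is always an observed sample; with small sample counts it is also slightly more conservative than interpolating methods.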

External Dependency Mocks

External Service	Mock/Stub	How Provided	Behavior
Azaion Suite Satellite Service (pre-flight cache sync)	`tile-cache-init` one-shot loader	Docker service that materialises MBTiles + sidecar before SUT starts	Returns the same fixture set every run; deterministic
Azaion Suite Satellite Service (post-flight upload)	candidate-pool stub inside `qgc-mock` (or a dedicated `service-stub` container)	HTTP server with `POST /candidates` accepting tile uploads, recording to a file	Records what the SUT sends; never alters the cache used by the next test
QGroundControl GCS	`qgc-mock`	Custom MAVLink-only mock	Records STATUSTEXT, NAMED_VALUE_FLOAT, GPS_INPUT, ODOMETRY frames; can inject operator-hint STATUSTEXT
ArduPilot autopilot	`ardupilot-sitl` (PR #30080-pinned)	Official ArduPilot SITL container	Replays IMU from fixture; runs EKF3; exposes `RAW_IMU`, `ATTITUDE`, `GLOBAL_POSITION_INT`, `EKF_STATUS_REPORT`, `GPS_RAW_INT`
Spoofing GPS adversary	`gps-spoof-injector`	Custom MAVLink injector	Sends crafted `GPS_RAW_INT` with configurable lat/lon offset, sat count, hdop
Identity provider (JWT)	in-runner key generator	Test-only HMAC-SHA256 key shared at SUT boot via env var	Mints valid + invalid + expired JWTs
External satellite providers (Maxar, Airbus, Planet)	NOT MOCKED: out of scope per AC-8.1; the SUT does not call them at runtime	—	The SUT must never make outbound HTTP requests to these hosts; F-T2 / NFT-SEC-04 include a network-policy assertion

All mocks are deterministic (the same input always produces the same output), except the spoof and operator-hint scenarios, which deliberately schedule events on a wall clock so that the SUT's timing budgets (AC-NEW-1, AC-NEW-2) are exercised.
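
A wall-clock spoof scenario for `gps-spoof-injector` can be modelled as a sorted list of timed events that determines which `GPS_RAW_INT` field values to send at time t. The field encodings follow the MAVLink common message set (lat/lon in degE7, `alt` in mm MSL, `eph` as HDOP × 100, 65535 for unknown `eph`/`epv`/`vel`/`cog`, `GPS_FIX_TYPE` 1 = no fix, 3 = 3D fix). The event kinds mirror the `spoofing_scenarios` row; the nominal satellite count and HDOP, the scenario-relative `time_usec`, and all names are hypothetical. Actually transmitting the message (for example with pymavlink) is out of scope for this sketch.

```python
"""Time-scheduled spoof scenario -> GPS_RAW_INT field values (illustrative sketch)."""
import math
from dataclasses import dataclass
from typing import Dict, Sequence

GPS_FIX_TYPE_NO_FIX = 1
GPS_FIX_TYPE_3D_FIX = 3
UINT16_UNKNOWN = 65535      # MAVLink "unknown" value for eph/epv/vel/cog
WGS84_A_M = 6378137.0
KINDS = {"jam_onset", "latlon_offset", "sat_drop", "hdop_spike"}


@dataclass(frozen=True)
class SpoofEvent:
    onset_s: float           # seconds after scenario start (wall clock)
    kind: str
    north_m: float = 0.0     # used by latlon_offset
    east_m: float = 0.0      # used by latlon_offset
    satellites: int = 0      # used by sat_drop
    hdop: float = 0.0        # used by hdop_spike

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown spoof kind: {self.kind}")


def _offset(lat: float, lon: float, north_m: float, east_m: float):
    """Small-offset local approximation on a sphere of radius WGS84_A_M."""
    dlat = math.degrees(north_m / WGS84_A_M)
    dlon = math.degrees(east_m / (WGS84_A_M * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon


def gps_raw_int_at(t_s: float, true_lat: float, true_lon: float, alt_msl_m: float,
                   events: Sequence[SpoofEvent], sats: int = 12,
                   hdop: float = 0.8) -> Dict[str, int]:
    """GPS_RAW_INT fields the injector would send at scenario time t_s."""
    lat, lon, fix, eph = true_lat, true_lon, GPS_FIX_TYPE_3D_FIX, round(hdop * 100)
    for ev in sorted(events, key=lambda e: e.onset_s):
        if t_s < ev.onset_s:
            break                                   # later events have not fired yet
        if ev.kind == "jam_onset":
            fix, sats, eph = GPS_FIX_TYPE_NO_FIX, 0, UINT16_UNKNOWN
        elif ev.kind == "latlon_offset":
            lat, lon = _offset(lat, lon, ev.north_m, ev.east_m)
        elif ev.kind == "sat_drop":
            sats = ev.satellites
        else:                                       # hdop_spike
            eph = round(ev.hdop * 100)              # eph is HDOP * 100 (unitless)
    return {
        "time_usec": int(t_s * 1e6),                # scenario-relative: an assumption
        "fix_type": fix,
        "lat": round(lat * 1e7),                    # degE7
        "lon": round(lon * 1e7),                    # degE7
        "alt": round(alt_msl_m * 1000),             # mm, MSL
        "eph": eph,
        "epv": UINT16_UNKNOWN,
        "vel": UINT16_UNKNOWN,
        "cog": UINT16_UNKNOWN,
        "satellites_visible": sats,
    }
```

Because the output depends only on the event list and t, the same scenario file always yields the same message stream; only the wall-clock pacing of the sends is live.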

Data Validation Rules

Data Type	Validation	Invalid Examples	Expected System Behavior
Nav-cam frame	non-zero size; JPEG / PNG decodable; expected resolution within ±1 % of `data_parameters.md`	0-byte file, truncated JPEG header, wildly wrong resolution	log error; `tracking_state` transitions to `DEGRADED` if loss >2 frames; never crash
IMU sample	rate 200 Hz ± 10 %; timestamps monotonic; covariance present	timestamp regression, rate < 50 Hz, NaN / Inf	drop sample with WARN log; if loss > 0.5 s → cuVSLAM degrade; AC-5.2 path eligible
Satellite tile	MBTiles schema valid; descriptors present; `capture_date` within freshness budget for sector	corrupt MBTiles, missing sidecar, beyond-grace freshness	reject with WARN; AC-8.2 / AC-NEW-6
MAVLink GPS_RAW_INT (FC inputs)	well-formed; signing valid (when MAVLink2 signing on)	unsigned frame, malformed length, sysid spoofing	reject; F-T9 + S-T1 cover this
HTTPS request body	JSON parse OK; required fields present; pixel coords ∈ frame bounds	missing fields, NaN, out-of-bounds pixel	HTTP 422
JWT	signature valid; not expired; subject is allowed	expired, wrong sig, missing claims	HTTP 401
Tile descriptor	dimension matches index; checksum match	wrong dims, mismatched hash	reject load; cache marks as corrupt; F-T2
Operator hint STATUSTEXT	parseable `RELOC_HINT: lat=… lon=… sigma=…`; numeric ranges sane	malformed, NaN, negative sigma, lat > 90 / lon > 180	reject hint; emit STATUSTEXT WARN; do not seed VPR
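
Two of the rules above, the operator-hint STATUSTEXT check and the IMU timestamp check, can be sketched as pure functions. Assumptions, labelled in the code as well: the hint's `sigma` may carry an optional trailing `m`; a zero sigma is rejected along with negative ones; "monotonic" means strictly increasing; and the rate rule is applied to the mean rate over the trace. Checking the IMU sample values for NaN / Inf would follow the same `math.isfinite` pattern used for the hint.

```python
"""Validation sketches for two rows above: operator hints and IMU timestamps."""
import math
import re
from typing import List, NamedTuple, Optional, Sequence, Tuple

# Assumed wire format: "RELOC_HINT: lat=<deg> lon=<deg> sigma=<m>", optional trailing "m".
_HINT_RE = re.compile(r"^RELOC_HINT:\s*lat=(\S+)\s+lon=(\S+)\s+sigma=(\S+?)m?\s*$")


class OperatorHint(NamedTuple):
    lat: float
    lon: float
    sigma_m: float


def parse_reloc_hint(text: str) -> Optional[OperatorHint]:
    """Return a hint, or None if it must be rejected (caller emits STATUSTEXT WARN)."""
    m = _HINT_RE.match(text.rstrip("\x00").strip())  # STATUSTEXT text may be NUL-padded
    if m is None:
        return None
    try:
        lat, lon, sigma = (float(g) for g in m.groups())
    except ValueError:
        return None
    if not all(math.isfinite(v) for v in (lat, lon, sigma)):
        return None                                   # NaN / Inf
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0) or sigma <= 0.0:
        return None                                   # out of range, or non-positive sigma
    return OperatorHint(lat, lon, sigma)


def check_imu_timestamps(t_s: Sequence[float], nominal_hz: float = 200.0,
                         rate_tol: float = 0.10,
                         max_gap_s: float = 0.5) -> List[Tuple[int, str]]:
    """Flag timestamp regressions, gaps > max_gap_s and an out-of-band mean rate."""
    issues: List[Tuple[int, str]] = []
    for i in range(1, len(t_s)):
        dt = t_s[i] - t_s[i - 1]
        if dt <= 0.0:
            issues.append((i, "non-monotonic timestamp"))
        elif dt > max_gap_s:
            issues.append((i, f"gap of {dt:.3f} s"))
    span = t_s[-1] - t_s[0] if len(t_s) >= 2 else 0.0
    if span > 0.0:
        rate_hz = (len(t_s) - 1) / span
        if abs(rate_hz - nominal_hz) > rate_tol * nominal_hz:
            issues.append((-1, f"mean rate {rate_hz:.1f} Hz outside +/-{rate_tol:.0%}"))
    return issues
```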

Pending Data (Phase 1 D3 — placeholder fixtures)

The following fixtures are declared by name in this spec but are not yet present at the time of writing. Phase 3's HARD GATE will surface them as pending data rather than removing them:

Fixture	Generator / source	Owner	Phase 3 treatment
`fixtures/satellite_tiles_AD0000xx_z20/`	`tile-cache-init` script: fetch z=20 ortho tiles for the bbox of `coordinates.csv` from a public ortho service (Esri / Mapbox / Sentinel-2 ≥ 0.5 m/px); pre-extract SuperPoint + DINOv2-VLAD descriptors	Decompose / impl. team task	`pending data` — not removed; `data_status: deferred-corpus` retained until generator script is committed
`fixtures/imu_AD0000xx.csv`	SITL ArduPilot replay of `coordinates.csv` as ground-truth trajectory at 200 Hz	Decompose / impl. team task	`pending data` — not removed; `data_status: deferred-corpus`
`aerialvl_s03`, `uav_visloc`, `aerialextrematch`, `2chadcnn_seasons`, `tartanair_v2`, `internal_mavic`	External downloads + curation	data team task (Decompose creates a "dataset acquisition" task)	`data_status: deferred-corpus`
`internal_fixed_wing_first_sortie`	Field-test plan	operations team	`data_status: deferred-field`
`cold_soak_corpus`, `hot_soak_corpus`	Bench HW + chamber	bench team	`data_status: deferred-hil`
`synthetic_8h_load`	`fixtures/synth-8h-loader/` script	impl. team	regenerated per session — synthesisable, no external dependency
`cache_poisoning_scenarios`	`fixtures/cache-poison-mc/` script	impl. team	regenerated per session
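
`synthetic_8h_load` is described as `nav_cam_60_slice` looped and jittered at 3 fps for 8 hours, which is 86,400 frames or 1,440 passes over the 60 images. The sketch below is a hypothetical schedule generator, not the interface of `fixtures/synth-8h-loader/`. It interprets "jittered" as bounded uniform timestamp jitter (±5 ms, an assumed value); any image-level perturbation would be a separate step. Seeding the RNG keeps each session's sequence reproducible.

```python
"""Looped, timestamp-jittered replay schedule in the spirit of synthetic_8h_load (sketch)."""
import random
from typing import Iterator, NamedTuple


class ReplayFrame(NamedTuple):
    seq: int      # position in the synthetic sequence
    image: str    # source JPG from nav_cam_60_slice
    t_s: float    # replay timestamp in seconds


def synth_replay(duration_s: float = 8 * 3600, fps: float = 3.0, n_images: int = 60,
                 jitter_s: float = 0.005, seed: int = 0) -> Iterator[ReplayFrame]:
    """Yield duration_s * fps frames, cycling AD000001..AD0000nn with bounded jitter."""
    rng = random.Random(seed)      # seeded: same seed -> identical schedule
    period = 1.0 / fps
    last_t = 0.0
    for seq in range(int(round(duration_s * fps))):
        image = f"AD{seq % n_images + 1:06d}.jpg"
        t = seq * period + rng.uniform(-jitter_s, jitter_s)
        # Keep timestamps non-negative and strictly increasing even with large jitter.
        t = max(t, 0.0) if seq == 0 else max(t, last_t + 1e-6)
        last_t = t
        yield ReplayFrame(seq, image, t)
```

The loop seam (`AD000060.jpg` back to `AD000001.jpg`) is a visual discontinuity. That is acceptable for thermal-soak and FDR-rollover tests, which care about sustained load rather than trajectory accuracy, but it makes this sequence unsuitable for accuracy scoring.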
