mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-04-27 12:06:37 +00:00

Oleksandr Bezdieniezhnykh f321268e1b Update autodev state documentation to reflect completion of Plan Step 1, including detailed progress on phases and next steps. Revised phase details to clarify user-level blocking gates and hardware assessment outcomes.

2026-04-27 06:23:53 +03:00

Blackbox Tests

Tier markers (per environment.md): pipeline (T1), deferred-corpus (T2), deferred-sitl (T3), deferred-hil (T4), deferred-field (T5). Every test pairs an input/observable with a quantifiable expected result from _docs/00_problem/input_data/expected_results/results_report.md or directly from an AC. All tests run through the public interfaces defined in environment.md. No SUT-internal access.

Positive Scenarios

FT-P-01: 60-frame sequential pipeline — ≥80 % within 50 m (pipeline-correctness only)

Summary: Sequentially feed the 60 nav-cam JPGs through the SUT and verify the position-error CDF on this corpus. Traces to: AC-1.1 (pipeline-correctness only — see test-data.md caveat), results_report row 1. Category: Position Accuracy. Tier: T1 (pipeline).

Preconditions:

nav_cam_60_slice mounted; nav_cam_60_slice_imu synthesised; satellite_tiles_AD0000xx_z20 placeholder fixture present.
SUT booted; cuVSLAM warmed; ArduPilot SITL loaded with the corresponding IMU replay; first valid GPS_INPUT received (i.e., AC-NEW-1 cleared).

Input data: nav_cam_60_slice + coordinates.csv + nav_cam_60_slice_imu.

Steps:

Step	Consumer Action	Expected System Response
1	Stream the 60 JPGs at 3 fps via the camera-input shim into the SUT	SUT publishes `GPS_INPUT` for each frame
2	Capture each `GPS_INPUT.lat / lon` at the qgc-mock sniffer	Every input frame yields a `GPS_INPUT` message within the test window
3	Compute haversine error vs `coordinates.csv` ground truth per frame	Per-frame errors collected into a CDF

Expected outcome: ≥80 % of frames have error < 50 m. Reported as pipeline-functional, not deployment-binding (per test-data.md caveat — deployment-binding number from FT-P-T2 / AerialVL). Max execution time: 60 s per run.
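The error computation in step 3 and the verdict above can be sketched as follows. This is a minimal illustration, assuming the captured fixes and the `coordinates.csv` ground truth are already decoded to WGS84 degrees; the function names and the mean Earth radius constant are choices made for this sketch, not part of the SUT or the test harness.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m, threshold_m):
    """Fraction of frames whose error is strictly below the threshold."""
    if not errors_m:
        raise ValueError("no frames captured")
    return sum(e < threshold_m for e in errors_m) / len(errors_m)


def ft_p_01_passes(errors_m):
    """FT-P-01 verdict: at least 80 % of frames have error < 50 m."""
    return fraction_within(errors_m, 50.0) >= 0.80
```

The same `fraction_within` helper covers FT-P-02 by calling it with a 20 m threshold and comparing against 0.50.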

FT-P-02: 60-frame sequential pipeline — ≥50 % within 20 m (pipeline-correctness only)

Summary: Same corpus as FT-P-01; tighter tolerance. Traces to: AC-1.2 (pipeline-correctness only), results_report row 2. Category: Position Accuracy. Tier: T1.

Preconditions / Input data: same as FT-P-01.

Steps:

Step	Consumer Action	Expected System Response
1	Replay the corpus end-to-end	per-frame `GPS_INPUT`
2	Compute haversine error CDF	—

Expected outcome: ≥50 % within 20 m on the 60-image slice (functional check). Deployment-binding number comes from AerialVL S03 in FT-P-T2. Max execution time: 60 s.

FT-P-T2: AC-1.1 / AC-1.2 deployment-binding accuracy on AerialVL S03

Summary: Re-run AC-1.1 / AC-1.2 on the deployment-binding corpus. Traces to: AC-1.1, AC-1.2. Tier: T2 (deferred-corpus). data_status: deferred-corpus.

Preconditions: aerialvl_s03 mounted with synced IMU + nav-cam stream + GPS truth.

Input data: AerialVL S03.

Steps:

Step	Consumer Action	Expected System Response
1	Replay AerialVL S03 (70 km of fixed-wing flight at 1 km AGL)	per-frame `GPS_INPUT`
2	Compute error CDF vs S03 GPS truth	—

Expected outcome: ≥80 % within 50 m AND ≥50 % within 20 m (deployment-binding). Max execution time: 90 min (replay + analysis).

FT-P-03: Per-frame error bound ≤100 m

Summary: No single frame exceeds 100 m error on the 60-image slice. Traces to: AC-1.1 (negative-tail bound), results_report row 3. Tier: T1.

Preconditions / Input: same as FT-P-01.

Steps:

Step	Consumer Action	Expected System Response
1	Replay 60 frames	per-frame GPS_INPUT
2	Compute max(haversine_err) over all frames	—

Expected outcome: max error ≤ 100 m. Pipeline-functional only. Max execution time: 60 s.

FT-P-04: VO drift bound between satellite anchors

Summary: VO drift between successive satellite-anchored fixes stays bounded. Traces to: AC-1.3, AC-NEW-8, results_report row 4, F-T1b. Tier: T1 functional + T2 binding.

Preconditions: cuVSLAM in mono+IMU mode (T1) AND mono-only mode (T2 split test). Input data: nav_cam_60_slice (T1) + AerialVL S03 (T2).

Steps:

Step	Consumer Action	Expected System Response
1	Identify successive `satellite_anchored` source-label transitions	series of anchor pairs
2	For each anchor pair, measure VO-extrapolated centre vs next-anchor centre	drift in metres
3	Compute 95th percentile across all pairs	—

Expected outcome:

mono+IMU: p95 drift ≤ 50 m (binding on T2 / AerialVL).
mono-only: p95 drift ≤ 100 m (binding on T2 / AerialVL).
T1 functional check: drift bounded (no monotonic growth) — exact numbers not deployment-binding.

Max execution time: 90 min (T2).
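Step 3 leaves the percentile method open. One possible reading is the nearest-rank definition, sketched below; the choice of nearest-rank (rather than an interpolating percentile) and the function names are assumptions of this sketch.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct % of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # pct * n / 100 keeps the arithmetic exact for integer pct and n.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def drift_within_limit(drifts_m, limit_m):
    """True when the p95 drift across anchor pairs does not exceed the limit."""
    return percentile_nearest_rank(drifts_m, 95) <= limit_m
```

With this helper, the mono+IMU check would be `drift_within_limit(drifts, 50.0)` and the mono-only check `drift_within_limit(drifts, 100.0)`.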

FT-P-05: GPS_INPUT shape under normal tracking

Summary: GPS_INPUT messages emitted while tracking is healthy carry the correct schema and value ranges. Traces to: AC-1.4, AC-4.3, AC-6.3, results_report row 5. Tier: T1.

Preconditions: SUT in steady-state tracking with recent satellite anchor (<30 s old). Input data: any single frame from nav_cam_60_slice.

Steps:

Step	Consumer Action	Expected System Response
1	Sniff GPS_INPUT at qgc-mock	one GPS_INPUT message per nav-cam frame
2	Decode fields: `fix_type`, `horiz_accuracy`, `satellites_visible`, `lat`, `lon`, `alt`, `vel_acc`	as per MAVLink GPS_INPUT spec
3	Inspect optional ODOMETRY: assert intentional absence in v1 (per AC-4.3 v1-scope clause)	no ODOMETRY frames present

Expected outcome: fix_type == 3, horiz_accuracy ∈ [1, 50] m, satellites_visible == 10, lat / lon non-null, WGS84. ODOMETRY count == 0 across the run. Max execution time: 30 s.
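The field checks above could be expressed as a small validator. The sketch assumes the sniffer hands over each message as a plain dict whose `lat` / `lon` have already been converted to degrees; that representation and the function name are hypothetical.

```python
def check_normal_tracking(msg):
    """Return a list of violations; an empty list means the message passes FT-P-05."""
    problems = []
    if msg.get("fix_type") != 3:
        problems.append("fix_type != 3")
    hacc = msg.get("horiz_accuracy")
    if hacc is None or not 1.0 <= hacc <= 50.0:
        problems.append("horiz_accuracy outside [1, 50] m")
    if msg.get("satellites_visible") != 10:
        problems.append("satellites_visible != 10")
    lat, lon = msg.get("lat"), msg.get("lon")
    if lat is None or not -90.0 <= lat <= 90.0:
        problems.append("lat missing or outside WGS84 range")
    if lon is None or not -180.0 <= lon <= 180.0:
        problems.append("lon missing or outside WGS84 range")
    return problems
```

Returning the full list of violations, rather than stopping at the first, makes a failing run easier to diagnose from the test log.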

FT-P-06: GPS_INPUT shape during VO-only fallback

Summary: Fields adapt when no satellite anchor is available for >30 s. Traces to: AC-1.4, AC-4.3, results_report row 6. Tier: T1.

Preconditions: Force satellite-match failure for >30 s (cache poisoned with stale tiles).

Input data: nav_cam_60_slice with stale_tile_scenarios injected.

Steps:

Step	Consumer Action	Expected System Response
1	After 30 s of failed matches, sniff GPS_INPUT	`fix_type == 3`, `horiz_accuracy ∈ [20, 100]` m, source-label `vo_extrapolated`

Expected outcome: as above; horiz_accuracy grows monotonically until next successful match. Max execution time: 60 s.

FT-P-07: GPS_INPUT shape during dead-reckoning

Summary: VO lost AND no satellite → IMU-only dead reckoning. Traces to: AC-1.4, AC-5.2, results_report row 7. Tier: T1.

Preconditions: Inject cuVSLAM tracking-loss + cache poisoned.

Steps:

Step	Consumer Action	Expected System Response
1	Sniff GPS_INPUT	`fix_type == 2`, `horiz_accuracy ≥ 50 m` and growing
2	Source label	`dead_reckoned`

Expected outcome: fix_type == 2, monotonically growing horiz_accuracy, source == dead_reckoned. Max execution time: 60 s.

FT-P-08: GPS_INPUT shape on total failure

Summary: 3+ consecutive failures — system signals total failure. Traces to: AC-3.4, results_report row 8. Tier: T1.

Preconditions: cache_poisoning_scenarios flavour that causes 3 sat failures + cuVSLAM lost.

Steps:

Step	Consumer Action	Expected System Response
1	Wait for 3 consecutive failures	GPS_INPUT continues at the configured rate
2	Inspect GPS_INPUT	`fix_type == 0`, `horiz_accuracy == 999.0`
3	Inspect STATUSTEXT	RELOC_REQ regex emitted

Expected outcome: as above. Max execution time: 60 s.

FT-P-09: Confidence tier transitions

Summary: Confidence tier label transitions match defined conditions. Traces to: AC-1.4, results_report rows 10–13. Tier: T1.

Preconditions: scripted scenario that walks (HIGH → MEDIUM → LOW → FAILED).

Steps:

Step	Consumer Action	Expected System Response
1	At each scripted state, read the SSE stream confidence field AND the source-label field	matches expected tier

Expected outcome:

Sat anchor <30 s + cov <400 m² → tier HIGH, source satellite_anchored.
cuVSLAM OK + no sat >30 s → tier MEDIUM, source vo_extrapolated.
cuVSLAM lost + IMU only → tier LOW, source dead_reckoned.
3+ consecutive failures → tier FAILED, fix_type 0.

Max execution time: 5 min.
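The four conditions above can serve as a test oracle for the scripted walk. The sketch below encodes them literally; the order of evaluation, the function name, and the decision to return `None` for combinations the list does not define (for example a recent anchor with covariance of 400 m² or more) are assumptions of this sketch.

```python
def expected_tier(anchor_age_s, cov_m2, vo_ok, consecutive_failures):
    """Map a scripted state to (tier, source_label), or None when the spec is silent."""
    if consecutive_failures >= 3:
        # The source label for FAILED is not stated; only fix_type 0 is.
        return ("FAILED", None)
    if anchor_age_s < 30 and cov_m2 < 400:
        return ("HIGH", "satellite_anchored")
    if anchor_age_s > 30 and vo_ok:
        return ("MEDIUM", "vo_extrapolated")
    if anchor_age_s > 30 and not vo_ok:
        return ("LOW", "dead_reckoned")
    return None  # combination not covered by the listed conditions
```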

FT-P-10: Image registration rate (functional)

Summary: Pipeline registers ≥95 % of normal-flight frames against the previous frame. Traces to: AC-2.1 (pipeline-functional only), results_report row 14. Tier: T1 functional + T2 binding.

Preconditions: SUT exposes registration outcome via STATUSTEXT or NAMED_VALUE_FLOAT (reg_pass_count, reg_total_count).

Steps:

Step	Consumer Action	Expected System Response
1	Replay `nav_cam_60_slice` (T1) or AerialVL S03 (T2)	registration metrics published
2	Compute `reg_pass_count / reg_total_count`	percentage

Expected outcome: T1 ≥95 % (functional); T2 ≥95 % (deployment-binding) under normal-flight definition (nadir, ±10° bank/pitch, ≥40 % overlap, daytime, season-matched tile). Max execution time: 60 s (T1) / 90 min (T2).

FT-P-11: Mean Reprojection Error (MRE)

Summary: VO and cross-domain MRE under thresholds. Traces to: AC-2.2, results_report row 15. Tier: T1 functional + T2 binding.

Preconditions: SUT publishes mre_vo (frame-to-frame) and mre_cross (cross-view) on the metrics endpoint.

Steps:

Step	Consumer Action	Expected System Response
1	Scrape MRE metrics over a replay	per-frame samples
2	Compute mean across the run	—

Expected outcome: mean(mre_vo) < 1.0 px; mean(mre_cross) < 2.5 px. T1 numbers functional only. Max execution time: 60 s (T1) / 90 min (T2).

FT-P-12: Continuous output through turn area (frames 32–43)

Summary: SUT keeps producing position estimates through the turn segment of coordinates.csv. Traces to: AC-3.2, AC-4.4, results_report row 16. Tier: T1.

Preconditions: standard pipeline replay.

Steps:

Step	Consumer Action	Expected System Response
1	Replay frames 32–43	per-frame GPS_INPUT
2	Count outputs vs frames	—

Expected outcome: ≥1 GPS_INPUT per nav-cam frame in the turn region. Max execution time: 30 s.

FT-P-13: 350 m outlier handled (AC-3.1)

Summary: Pipeline survives a synthetic 350 m gap between consecutive frames (caused by ±20° tilt outlier). Traces to: AC-3.1, results_report row 17. Tier: T1.

Input data: synthetic two-frame pair with 350 m gap injected into nav_cam_60_slice mid-replay.

Steps:

Step	Consumer Action	Expected System Response
1	Inject the outlier pair	SUT emits a `vo_extrapolated` or `dead_reckoned` frame, not corrupted output
2	Continue with next valid frame	error returns to ≤100 m on the next valid frame

Expected outcome: error ≤ 100 m on the next valid frame after the outlier. Max execution time: 60 s.

FT-P-14: Sharp-turn re-localization (AC-3.2)

Summary: Sharp turn (<5 % overlap, <70°, <200 m drift) — VO fails, satellite re-loc recovers. Traces to: AC-3.2, F-T7, results_report row 18. Tier: T1.

Input data: synthetic sharp-turn pair injected into nav_cam_60_slice.

Steps:

Step	Consumer Action	Expected System Response
1	Inject the sharp-turn pair	cuVSLAM tracking lost; VPR triggers; matcher re-localizes
2	Track error over next 3 frames	error ≤ 50 m within 3 frames

Expected outcome: error ≤ 50 m within 3 frames of the turn. Max execution time: 60 s.

FT-P-15: VO loss → satellite recovery → tracking_state == NORMAL

Summary: After cuVSLAM tracking loss + sat match success, tracking_state returns to NORMAL. Traces to: AC-3.2, AC-3.3, results_report row 19. Tier: T1.

Steps:

Step	Consumer Action	Expected System Response
1	Force cuVSLAM tracking-loss; deliver a fresh tile that matches	matcher emits absolute pose; calibrator emits satellite-anchored fix
2	Observe FC EKF3 reconvergence via `EKF_STATUS_REPORT`	EKF3 reconverges
3	Read SUT-published `tracking_state`	== `NORMAL`

Expected outcome: tracking_state == NORMAL within bounded time. Max execution time: 60 s.

FT-P-16: Cold-start TTFF ≤30 s p95

Summary: From companion-computer boot, first valid GPS_INPUT within 30 s. Traces to: AC-NEW-1, results_report row 23, F-T11. Tier: T1 statistical (≤10 boots) + T4 binding (50 boots on real HW).

Preconditions: SUT image cold (no warmed engines); FC providing GLOBAL_POSITION_INT simulating IMU-extrapolated pose.

Steps:

Step	Consumer Action	Expected System Response
1	Boot SUT container	container start logged
2	Time from container start to first valid `fix_type==3` GPS_INPUT	t_ttff
3	Repeat N times (N=10 T1 / N=50 T4)	distribution

Expected outcome: 95th percentile of t_ttff ≤ 30 s. Max execution time: 10 min (T1) / 30 min (T4).

FT-P-17: Validate initial position via first satellite match

Summary: First satellite match after startup pulls position to ≤50 m. Traces to: AC-5.1, AC-NEW-1, results_report row 24. Tier: T1.

Steps:

Step	Consumer Action	Expected System Response
1	Provide `GLOBAL_POSITION_INT` with a deliberate 200 m offset from truth	SUT seeds pipeline with 200 m uncertainty
2	Replay first frame with valid satellite tile	matcher succeeds; calibrator emits anchored fix
3	Read GPS_INPUT lat/lon	error ≤ 50 m

Expected outcome: position error ≤ 50 m after first match. Max execution time: 90 s.

FT-P-18: Mid-flight reboot recovery ≤30 s

Summary: Process kill mid-flight; SUT recovers within AC-NEW-1 budget. Traces to: AC-5.3, AC-NEW-1, results_report row 25. Tier: T1.

Preconditions: SUT in steady-state tracking; FC continues to fly.

Steps:

Step	Consumer Action	Expected System Response
1	Send SIGKILL to SUT container	SUT restarts
2	Time from restart to first `fix_type==3` GPS_INPUT	t_recovery

Expected outcome: t_recovery ≤ 30 s. Max execution time: 60 s.

FT-P-19: Post-reboot first-match accuracy

Summary: After reboot, first satellite match restores accuracy. Traces to: AC-5.3, results_report row 26. Tier: T1.

Steps: same as FT-P-17 but starting from a reboot.

Expected outcome: error ≤ 50 m after first match. Max execution time: 90 s.

FT-P-20: Object localization (level flight)

Summary: POST /objects/locate returns lat/lon for an object pixel given known UAV pose. Traces to: AC-7.1, AC-7.2, results_report row 27. Tier: T1.

Preconditions: SUT has a known anchored fix; AI camera gimbal pose injected via FC ATTITUDE. Input data: pixel coordinates + gimbal angle + zoom + altitude in request body.

Steps:

Step	Consumer Action	Expected System Response
1	`POST /objects/locate` with pixel_x, pixel_y, gimbal_pitch, gimbal_yaw, zoom, altitude	200 + JSON `{lat, lon, alt, accuracy_m, confidence}`
2	Compare to ground truth	error ≤ accuracy_m

Expected outcome: lat/lon within accuracy_m of ground truth; in level flight, accuracy_m consistent with frame-center accuracy of the GPS-Denied system. In maneuvering flight, response includes the altitude × |sin(unknown_bank_or_pitch)| bound (AC-7.1 second clause) when bank/pitch >5°. Max execution time: 5 s.

FT-P-21: Coordinate transform round-trip ≤0.1 m

Summary: GPS → NED → pixel → GPS round-trip preserves position. Traces to: AC-6.3, AC-7.2, results_report row 29. Tier: T1.

Steps:

Step	Consumer Action	Expected System Response
1	Submit a known WGS84 point through the round-trip via `/objects/locate` (or a debug endpoint if exposed)	round-trip lat/lon
2	Compare to original	≤ 0.1 m

Expected outcome: round-trip error ≤ 0.1 m. Max execution time: 1 s.

FT-P-22: `GET /health` schema and content

Summary: Health endpoint returns 200 with required fields. Traces to: AC-6.1 (telemetry), results_report row 30. Tier: T1.

Steps:

Step	Consumer Action	Expected System Response
1	`GET /health`	HTTP 200, JSON body
2	Validate schema	contains `status`, `memory_mb`, `gpu_temp_c`, `tracking_state`, `last_anchor_age_s`, `confidence_tier`

Expected outcome: as above; status ∈ {ok, degraded, failed}. Max execution time: 1 s.

FT-P-23: `POST /sessions` returns id

Traces to: AC-6.1, results_report row 31. Tier: T1.

Steps: POST /sessions (auth) → 200/201 with session id.

Expected outcome: status ∈ {200, 201}; body has session_id matching ^[a-f0-9-]{36}$. Max execution time: 1 s.
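A response check for this test might look like the sketch below. The regular expression is the one stated above; the function name and the assumption that the body arrives as a parsed JSON dict are illustrative.

```python
import re

SESSION_ID_RE = re.compile(r"^[a-f0-9-]{36}$")


def session_response_ok(status, body):
    """True when the status is 200 or 201 and session_id matches the required pattern."""
    session_id = body.get("session_id")
    return (
        status in (200, 201)
        and isinstance(session_id, str)
        # fullmatch also rejects a trailing newline, which a bare `$` would let through.
        and SESSION_ID_RE.fullmatch(session_id) is not None
    )
```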

FT-P-24: SSE stream emits per-second events

Traces to: AC-6.1, results_report row 32. Tier: T1.

Steps:

Step	Consumer Action	Expected System Response
1	`GET /sessions/{id}/stream`	SSE connection; events emitted at ~1 Hz
2	Sample 30 s of stream	each event matches schema: `type`, `timestamp`, `lat`, `lon`, `alt`, `accuracy_h`, `confidence`, `vo_status`

Expected outcome: rate 1 Hz ± 0.2 Hz; all events conform to schema. Max execution time: 35 s.
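The rate and schema checks could be sketched as below, assuming each SSE event has been parsed into a dict and that arrival times are recorded in seconds. Estimating the rate as intervals divided by elapsed time is one reasonable choice, not necessarily the one the harness uses; the names are hypothetical.

```python
REQUIRED_FIELDS = {
    "type", "timestamp", "lat", "lon", "alt", "accuracy_h", "confidence", "vo_status",
}


def missing_fields(event):
    """Return the required schema fields absent from one parsed SSE event."""
    return REQUIRED_FIELDS - event.keys()


def event_rate_hz(arrival_times_s):
    """Mean event rate over the sample: number of intervals divided by elapsed time."""
    if len(arrival_times_s) < 2:
        raise ValueError("need at least two events")
    span = arrival_times_s[-1] - arrival_times_s[0]
    if span <= 0:
        raise ValueError("timestamps must increase")
    return (len(arrival_times_s) - 1) / span


def rate_ok(arrival_times_s, nominal_hz=1.0, tolerance_hz=0.2):
    """True when the measured rate is within the tolerance of the nominal rate."""
    return abs(event_rate_hz(arrival_times_s) - nominal_hz) <= tolerance_hz
```

The same `rate_ok` check would apply to the NAMED_VALUE_FLOAT rates in FT-P-30, which share the 1 Hz ± 0.2 Hz criterion.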

FT-P-25: TRT engine load ≤10 s

Traces to: AC-NEW-1 (sub-budget), results_report row 39. Tier: T1 (synthetic timing) + T4 (real HW).

Steps:

Step	Consumer Action	Expected System Response
1	At SUT boot, time from container start to "all engines ready" STATUSTEXT	t_engines

Expected outcome: t_engines ≤ 10 s. Max execution time: 30 s.

FT-P-26: Tile storage size for the operational area

Traces to: AC-8.3, restrictions §UAV/Satellite, results_report row 40. Tier: T1.

Preconditions: a 200 km mission path × ±2 km buffer × z=18 + z=20 fixture loaded.

Steps: read total bytes under /probe/tiles/.

Expected outcome: 300 MB ≤ size ≤ 1000 MB. (Aligned with restriction's ~10 GB persistent cap for full 400 km².) Max execution time: 5 s.

FT-P-27: Tile mosaic coverage radius ≥500 m

Traces to: AC-8.3, results_report row 41. Tier: T1.

Preconditions: SUT given EKF position with σ_xy.

Steps: capture the assembled mosaic bbox via STATUSTEXT or a debug endpoint.

Expected outcome: mosaic radius ≥ 500 m around current position. Max execution time: 5 s.

FT-P-28: Tile dedup — ≤1 onboard tile per ground sector

Traces to: AC-8.4, F-T2. Tier: T1.

Preconditions: tile_dedup_replay (sectors visited ≥2×).

Steps:

Step	Consumer Action	Expected System Response
1	Replay the flight	onboard tiles written
2	Inspect MBTiles + sidecar in `/probe/tiles/`	per-sector tile count

Expected outcome: per-sector count ≤ 1; latest/highest-quality wins. Max execution time: 10 min.

FT-P-29: Post-flight upload to candidate pool

Traces to: AC-8.4, F-T3. Tier: T1.

Preconditions: service-stub running.

Steps: replay → on landing-event, SUT uploads tiles.

Expected outcome: service-stub records ≥1 tile with trust_level=candidate; promotion only after N≥2 voting flights (so a single flight does not promote). Max execution time: 5 min.

FT-P-30: NAMED_VALUE_FLOAT telemetry rate

Traces to: AC-6.1, results_report row 45. Tier: T1.

Steps: sniff gps_conf, gps_drift, gps_hacc NAMED_VALUE_FLOAT rates over 30 s.

Expected outcome: each at 1 Hz ± 0.2 Hz. Max execution time: 35 s.

FT-P-31: Disconnected segments — ≥3 connected via global retrieval

Traces to: AC-3.3, F-T8. Tier: T1.

Preconditions: disconnected_segments_replay with ≥3 segments.

Steps:

Step	Consumer Action	Expected System Response
1	Replay each segment with a synthetic gap	for each segment, VPR retrieves top-K candidates; matcher relocalizes
2	Verify segment-to-segment trajectory continuity	each segment connects to prior trajectory

Expected outcome: 3/3 segments connect within 10 frames of segment start; tracking_state == NORMAL after each. Max execution time: 5 min.

FT-P-32: Position refinement / corrections (AC-4.5)

Traces to: AC-4.5. Tier: T1.

Preconditions: SUT in steady state; ability to refine prior fixes.

Steps:

Step	Consumer Action	Expected System Response
1	Capture sequence of GPS_INPUT for a 10-s window	per-frame fixes
2	After delayed loop closure / late satellite match, observe whether SUT emits a corrected fix or signals correction via STATUSTEXT	a follow-up GPS_INPUT for an earlier `time_usec` OR a STATUSTEXT correction record

Expected outcome: at least one correction event in which the corrected fix supersedes the prior fix and reports a smaller h_acc (covariance shrinks). The system never silently rewrites past output without recording the correction. Max execution time: 60 s.

FT-P-33: Object-localize bank/pitch >5° publishes uncertainty bound

Traces to: AC-7.1 (second clause). Tier: T1.

Preconditions: FC ATTITUDE published with bank > 5°.

Steps:

Step	Consumer Action	Expected System Response
1	`POST /objects/locate` while bank > 5°	response includes bound = `altitude × abs(sin(bank_or_pitch))`

Expected outcome: response body includes bank_pitch_bound_m matching the formula within 1 m. Max execution time: 5 s.
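The formula and the 1 m tolerance translate directly into code. In this sketch the angle is taken in degrees and the function names are hypothetical; only the formula and the tolerance come from the test definition.

```python
import math


def expected_bound_m(altitude_m, angle_deg):
    """altitude × |sin(bank_or_pitch)|, with the angle given in degrees."""
    return altitude_m * abs(math.sin(math.radians(angle_deg)))


def bound_matches(reported_m, altitude_m, angle_deg, tolerance_m=1.0):
    """True when the reported bank_pitch_bound_m is within tolerance of the formula."""
    return abs(reported_m - expected_bound_m(altitude_m, angle_deg)) <= tolerance_m
```

The absolute value makes the bound symmetric for left and right bank, so a test can sweep positive angles only.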

FT-P-34: Mid-flight tile generation respects σ_xy ≤ 5 m hard gate

Traces to: AC-8.4, AC-NEW-7 hard gate. Tier: T1.

Preconditions: scripted scenarios with σ_xy ∈ {2, 4, 6, 8} m.

Steps:

Step	Consumer Action	Expected System Response
1	Replay σ_xy=2 m frames	tiles written
2	Replay σ_xy=8 m frames	NO tiles written
3	Inspect sidecar `trust_level` for σ_xy ∈ (3, 5] m	`trust_level == soft`
4	Inspect sidecar for σ_xy ≤ 3 m	`trust_level == candidate`

Expected outcome: as above. Max execution time: 5 min.
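The gate in steps 1 to 4 reduces to two thresholds. A minimal oracle, assuming σ_xy is supplied in metres and using a hypothetical function name, could look like this:

```python
def expected_trust_level(sigma_xy_m):
    """Expected sidecar trust_level for a frame, or None when no tile may be written."""
    if sigma_xy_m <= 3.0:
        return "candidate"
    if sigma_xy_m <= 5.0:
        return "soft"
    return None  # above the 5 m hard gate: no tile written
```

For the scripted values σ_xy ∈ {2, 4, 6, 8} m this yields `candidate`, `soft`, and no tile for the last two.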

FT-P-35: NF-T4b cache-poisoning safety budget (Monte Carlo)

Traces to: AC-NEW-7. Tier: T2 (deferred-corpus). data_status: deferred-corpus.

Preconditions: frames from ≥100 simulated flights drawn from AerialVL + Mavic + AerialExtreMatch, with synthetic over-confidence injection (1.5×–3×).

Steps:

Step	Consumer Action	Expected System Response
1	Replay all flights	per-tile geo-misalignment captured
2	Compute P(misalign > 30 m) and P(misalign > 100 m)	—

Expected outcome: P(>30 m) < 1 %, P(>100 m) < 0.1 %. Max execution time: 4 h.
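Step 2 is an empirical exceedance probability over the captured per-tile misalignments. The sketch below assumes those values are collected into a flat list in metres; the function names are illustrative.

```python
def exceedance_probability(misalign_m, threshold_m):
    """Empirical P(misalign > threshold) over all captured tiles."""
    if not misalign_m:
        raise ValueError("no tiles captured")
    return sum(m > threshold_m for m in misalign_m) / len(misalign_m)


def safety_budget_ok(misalign_m):
    """FT-P-35 verdict: P(>30 m) < 1 % and P(>100 m) < 0.1 %."""
    return (
        exceedance_probability(misalign_m, 30.0) < 0.01
        and exceedance_probability(misalign_m, 100.0) < 0.001
    )
```

Because the second limit is 0.1 %, the sample needs well over 1000 tiles before a single 100 m outlier can be tolerated.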

FT-P-36: AC-NEW-9 covariance calibration accuracy

Traces to: AC-NEW-9, F-T18. Tier: T2.

Preconditions: AerialVL S03 replay with ground truth.

Steps:

Step	Consumer Action	Expected System Response
1	For each emitted GPS_INPUT, capture (`h_acc`, ground-truth error)	series of pairs
2	Compute fraction of frames where `error ≤ h_acc * Mahalanobis-2D-95% factor`	fraction

Expected outcome: fraction ≥ 95 % (calibration neither over- nor under-claims). Max execution time: 90 min.
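The "Mahalanobis-2D-95% factor" in step 2 is not spelled out. One plausible interpretation, assumed here, is that `h_acc` is the per-axis 1σ of an isotropic 2D Gaussian; the radial error then satisfies P(r ≤ kσ) = 1 − exp(−k²/2), which gives k = √(−2 ln 0.05) ≈ 2.45 for 95 %. The sketch below uses that reading; the names are hypothetical.

```python
import math

# 95 % radius factor for an isotropic 2D Gaussian, under the assumption
# that h_acc is a per-axis 1-sigma value.
K95_2D = math.sqrt(-2.0 * math.log(1.0 - 0.95))


def coverage_fraction(pairs):
    """Fraction of (h_acc, error) pairs whose error falls inside the 95 % radius."""
    if not pairs:
        raise ValueError("no samples")
    return sum(err <= h_acc * K95_2D for h_acc, err in pairs) / len(pairs)
```

If the SUT defines `h_acc` differently (for example as a 95 % radius already), the factor would change and this sketch would not apply as written.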

FT-P-37: F-T18 calibrator regression (no state propagation)

Traces to: AC-NEW-9, F-T18. Tier: T2.

Preconditions: replay with logging hooks on Component 5 outputs (publicly exposed counters).

Steps:

Step	Consumer Action	Expected System Response
1	Run replay	calibrator counters emitted
2	Assert `state_propagation_invocations_total == 0`	no propagation
3	Assert `mahalanobis_gate_rejections_total > 0`	gate active

Expected outcome: as above. Max execution time: 90 min.

Negative Scenarios

FT-N-01: Corrupted nav-cam frame — no crash, degraded mode

Traces to: AC-3.x (resilience), restriction "fixed downward camera". Tier: T1.

Input data: a 60-frame replay with frame N replaced by a 10-byte random blob.

Steps:

Step	Consumer Action	Expected System Response
1	Stream the replay	SUT logs decode error; emits STATUSTEXT WARN
2	Inspect tracking_state	transitions to `DEGRADED` for 1 frame; recovers to `NORMAL` on next valid frame
3	SUT process	does NOT crash

Expected outcome: process alive; no GPS_INPUT spike with bad data; tracking_state returns to NORMAL within 1 frame of recovery. Max execution time: 30 s.

FT-N-02: Object-localize invalid pixel

Traces to: AC-7.1, results_report row 28. Tier: T1.

Steps:

Step	Consumer Action	Expected System Response
1	`POST /objects/locate` with `pixel_x = -10` (out of frame)	HTTP 422 + error body

Expected outcome: status == 422; body contains a structured error code. Max execution time: 1 s.

FT-N-03: Unauthenticated `POST /sessions`

Traces to: results_report row 33, security restrictions. Tier: T1.

Steps: POST /sessions without JWT → 401.

Expected outcome: status == 401. Max execution time: 1 s.

FT-N-04: Stale tile beyond grace — must NOT label `satellite_anchored`

Traces to: AC-8.2, AC-NEW-6. Tier: T1.

Preconditions: stale_tile_scenarios with 18-month-old active-conflict tile (well past 6 mo + 30-day grace).

Steps:

Step	Consumer Action	Expected System Response
1	Replay frames whose only candidate tile is the 18 mo stale one	matcher invocation skipped or scored 0
2	Inspect source label on emitted GPS_INPUT	NEVER `satellite_anchored`
3	Inspect WARN STATUSTEXT	tile rejected event recorded

Expected outcome: no satellite_anchored label across the run; rejection event recorded. Max execution time: 60 s.

FT-N-05: Stale tile in 30-day grace — confidence linearly decayed

Traces to: AC-NEW-6. Tier: T1.

Preconditions: tiles aged at +0, +15, +30 days past the 6-mo budget.

Steps:

Step	Consumer Action	Expected System Response
1	Replay each	confidence weight in sidecar metric: 1.0, 0.5, 0.0

Expected outcome: confidence weight decays linearly as specified. Max execution time: 60 s.
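The three sample points (1.0, 0.5, 0.0 at +0, +15, +30 days) are consistent with a straight line across the grace window. A sketch of that oracle, with a hypothetical function name and with clamping outside the window added as an assumption:

```python
GRACE_DAYS = 30.0


def expected_confidence_weight(days_past_budget):
    """Linear decay from 1.0 at the 6-month budget to 0.0 at the end of the 30-day grace."""
    weight = 1.0 - days_past_budget / GRACE_DAYS
    # Clamp: full weight inside the budget, zero beyond the grace period.
    return min(1.0, max(0.0, weight))
```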

FT-N-06: Sharp-turn negative — must trigger satellite re-loc, not silently fail

Traces to: AC-3.2 (negative case). Tier: T1.

Steps: same as FT-P-14 but assert that before re-loc the SUT emits a STATUSTEXT explaining VO loss; assert tracking_state transitions through DEGRADED.

Expected outcome: explicit STATUSTEXT log; recovery within 3 frames. Max execution time: 60 s.

FT-N-07: 3-consecutive-failure → RELOC_REQ on STATUSTEXT

Traces to: AC-3.4, results_report rows 20, 46. Tier: T1.

Steps: see FT-P-08; additionally verify the regex RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m.

Expected outcome: regex matches at least one STATUSTEXT after 3 failures; emitted within 2 s of the third failure (per AC-3.4 timing). Max execution time: 60 s.
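The regex and the 2 s timing condition could be combined as below. The sketch assumes the sniffer yields `(time_s, text)` tuples and that the time of the third failure is known to the runner; both the representation and the function name are assumptions.

```python
import re

RELOC_REQ_RE = re.compile(r"RELOC_REQ:.*last_lat=.*last_lon=.*uncertainty=.*m")


def reloc_req_in_time(statustexts, t_third_failure_s, budget_s=2.0):
    """True when a matching STATUSTEXT arrives within budget_s after the third failure."""
    deadline = t_third_failure_s + budget_s
    return any(
        t_third_failure_s <= t <= deadline and RELOC_REQ_RE.search(text) is not None
        for t, text in statustexts
    )
```

A hypothetical message such as `RELOC_REQ: last_lat=48.1 last_lon=37.2 uncertainty=350m` would match; the exact payload format is defined by the SUT, not by this sketch.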

FT-N-08: Re-loc waiting state behaviour

Traces to: AC-3.4, results_report row 21. Tier: T1.

Steps:

Step	Consumer Action	Expected System Response
1	After RELOC_REQ, observe SUT for 10 s	`fix_type == 0` GPS_INPUT continues; IMU-prediction-only label; satellite-match attempts continue (counter increments)

Expected outcome: as above; SUT does NOT stop emitting GPS_INPUT. Max execution time: 30 s.

FT-N-09: Operator hint — used as 500 m seed

Traces to: AC-6.2, AC-3.4, results_report row 22. Tier: T1.

Preconditions: SUT in re-loc-waiting; operator hint scenario active.

Steps:

Step	Consumer Action	Expected System Response
1	`qgc-mock` sends STATUSTEXT `RELOC_HINT: lat=… lon=… sigma=500m`	SUT consumes hint; uses as seed for VPR/cross-view
2	First fix after hint	error ≤ 500 m initially
3	After next satellite match	error ≤ 50 m

Expected outcome: as above. Max execution time: 60 s.

FT-N-10: Operator hint — malformed value rejected

Traces to: AC-6.2 (negative). Tier: T1.

Steps: send RELOC_HINT: lat=NaN lon=… sigma=-10.

Expected outcome: SUT emits STATUSTEXT WARN; hint NOT applied; pipeline state unchanged. Max execution time: 30 s.
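To make "malformed" concrete, the sketch below shows one way a hint could be parsed and validated. The wire format is only partially shown in FT-N-09, so the `key=value` layout, the optional trailing `m` on sigma, the validation rules, and the function name are all assumptions of this sketch rather than the SUT's actual parser.

```python
import math
import re

# Assumed layout: "RELOC_HINT: lat=<float> lon=<float> sigma=<float>[m]"
HINT_RE = re.compile(r"^RELOC_HINT: lat=(\S+) lon=(\S+) sigma=(\S+?)m?$")


def parse_hint(text):
    """Return (lat, lon, sigma_m) for a well-formed hint, or None if it must be rejected."""
    match = HINT_RE.match(text)
    if match is None:
        return None
    try:
        lat, lon, sigma = (float(g) for g in match.groups())
    except ValueError:
        return None
    if not all(math.isfinite(v) for v in (lat, lon, sigma)):
        return None  # rejects NaN and infinities
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    if sigma <= 0.0:
        return None  # a negative or zero sigma is not a usable uncertainty
    return (lat, lon, sigma)
```

Under these rules the FT-N-10 input is rejected twice over: `lat=NaN` fails the finiteness check and `sigma=-10` fails the positivity check.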

FT-N-11: AC-4.3 — ODOMETRY intentionally absent in v1

Traces to: AC-4.3 (v1-scope clause), F-T9 Option A. Tier: T1.

Preconditions: SUT configured for v1 (default).

Steps:

Step	Consumer Action	Expected System Response
1	Run any of FT-P-01 / FT-P-04 / FT-P-T2	GPS_INPUT emitted
2	At qgc-mock, count ODOMETRY frames over the run	== 0
3	Inspect `EK3_SRC1_*` configuration via FC parameter readback	`POSXY=GPS, VELXY=GPS, YAW=GPS+Compass`

Expected outcome: ODOMETRY count == 0; FC parameters as configured. Max execution time: 60 s.

FT-N-12: Spoofed GPS — SUT promotes within 3 s

Traces to: AC-NEW-2, F-T12. Tier: T3 (deferred-sitl). data_status: deferred-sitl.

Preconditions: SITL + gps-spoof-injector configured.

Steps:

Step	Consumer Action	Expected System Response
1	At t=0, inject a malicious `GPS_RAW_INT` with a 1 km offset	FC sees both spoof + SUT GPS_INPUT
2	Time from spoof onset to SUT promoting its `GPS_INPUT` to primary (raised `fix_type=3` AND STATUSTEXT promotion event)	t_promote
3	Repeat 50×	distribution

Expected outcome: 95th percentile of t_promote < 3 s. Max execution time: 30 min.

FT-N-13: Failsafe at 3 s no-fix (AC-5.2)

Traces to: AC-5.2. Tier: T1+T3.

Preconditions: scripted scenario where SUT cannot produce ANY estimate for 3.5 s.

Steps:

Step	Consumer Action	Expected System Response
1	Force pipeline blackout	SUT logs failure
2	Verify FC behaviour	ArduPilot SITL logs fall-back to IMU-only dead reckoning

Expected outcome: failsafe transition observable in EKF_STATUS_REPORT within 4 s of blackout. Max execution time: 60 s.

FT-N-14: Refusal of unsigned MAVLink (S-T1 boundary)

Traces to: restrictions §Sensors. Tier: T3.

Steps:

Step	Consumer Action	Expected System Response
1	Send a GPS_INPUT with invalid signing tag from the runner	FC rejects
2	Inspect FC log + STATUSTEXT to GCS	rejection event recorded

Expected outcome: rejected; FC continues to fly on prior valid source. Max execution time: 30 s.

FT-N-15: SITL F-T9 Option A regression — EKF3 fuses GPS_INPUT only, no double-fusion

Traces to: AC-4.3, F-T9 Option A. Tier: T3.

Preconditions: SITL with EK3_SRC1_*=GPS+Compass, EK3_SRC2_*=GPS.

Steps:

Step	Consumer Action	Expected System Response
1	Run a representative AerialVL replay	EKF3 fuses GPS_INPUT
2	Inspect EKF3 logs for double-fusion symptoms (issues #30076, #32506)	none
3	Trigger backup-GPS failover via SITL parameter	EKF3 switches to `EK3_SRC2_*` cleanly

Expected outcome: no double-fusion; clean failover. Max execution time: 30 min.

FT-N-16: SITL F-T9 Option B regression (v1.1 candidate)

Traces to: AC-4.3 Option B (v1.1+), F-T9. Tier: T3 (deferred-sitl). data_status: deferred-sitl.

Preconditions: SITL with PR #30080-class build; SUT switched to ODOMETRY-primary mode (build flag).

Steps: ODOMETRY primary; GPS_INPUT held in reserve; verify clean source-switching, no double-fusion.

Expected outcome: as above. Test runs but build flag is OFF for v1 release gate. Max execution time: 30 min.

FT-N-17: AC-NEW-7 single-flight tile NOT promoted to trusted basemap

Traces to: AC-NEW-7 (Service-side voting). Tier: T1 (service-stub).

Steps:

Step	Consumer Action	Expected System Response
1	Run a single-flight upload to candidate pool	tile recorded as `trust_level=candidate`
2	Query `service-stub` for promoted basemap content	tile NOT in promoted basemap

Expected outcome: as above; promotion only after N≥2 voting flights confirm. Max execution time: 5 min.

FT-N-18: AC-8.5 — raw frames are NOT retained in FDR

Traces to: AC-8.5, AC-NEW-3. Tier: T1.

Preconditions: replay 60-frame slice; nav_cam_60_slice written to camera input.

Steps:

Step	Consumer Action	Expected System Response
1	Run the replay	FDR populates
2	Inspect `/probe/fdr/` for raw nav-cam frames	no JPEGs / no AI-cam frames
3	Inspect for thumbnail log of failed-tile-generation frames	present, ≤0.1 Hz, within FDR cap

Expected outcome: no raw frames retained; only the failure thumbnail log within budget. Max execution time: 60 s.

FT-N-19: Free public Sentinel-2 tile rejected at cache boundary

Traces to: AC-8.1 (resolution floor), restrictions §Satellite. Tier: T1.

Steps:

Step	Consumer Action	Expected System Response
1	Inject a synthetic tile at 10 m/px into the cache	cache index marks as below-resolution
2	Replay frames over that area	matcher does NOT use the tile; never emits `satellite_anchored` from it

Expected outcome: as above. Max execution time: 60 s.

FT-N-20: Photo-count cap removed — system runs without arbitrary cap

Traces to: restrictions §UAV ("no photo-count cap"). Tier: T1 (smoke).

Steps:

Step	Consumer Action	Expected System Response
1	Replay `synthetic_8h_load` for 30 min	SUT continues operating
2	Inspect logs for any "photo count exceeded" condition	none

Expected outcome: no cap-related condition; pipeline degrades only against FDR cap (AC-NEW-3) and tile-cache cap. Max execution time: 35 min.

FT-N-21: AC-7.1 in maneuvering flight — uncertainty bound published, not a fixed number

Traces to: AC-7.1 second clause. Tier: T1.

Steps: like FT-P-33 but assert that for bank ∈ {0°, 5°, 15°, 30°}, the published bank_pitch_bound_m matches altitude × |sin(bank)| within 1 m.

Expected outcome: bound published correctly across the range. Max execution time: 30 s.

Coverage notes

Pipeline-correctness boundary: T1 tests on the 60-image slice are NOT deployment-binding. AC-1.1, AC-1.2, AC-2.1, AC-2.2, AC-1.3, AC-NEW-8 deployment numbers come from T2 (FT-P-T2, FT-P-04 binding split, FT-P-10 binding, FT-P-11 binding).
Behavioural-shape tests: FT-P-08, FT-P-15, FT-P-16, FT-P-18, FT-N-04, FT-N-11, FT-N-12, FT-N-13, FT-N-14, FT-N-17, FT-N-18, FT-N-19 use the behavioural shape (trigger + observable + quantifiable verdict), so they need no mapping from input data to expected output.
Untraced tests: none. Every test traces to ≥1 AC or restriction.
