Files
gps-denied-onboard/_docs/02_document/tests/resilience-tests.md
T
Oleksandr Bezdieniezhnykh bb9c408597 [autodev] Step 12 cycle-1 sync: tests/resilience+traceability
Backfill the uncommitted Step 12 (Test-Spec Sync) output for the
resilience-tests and traceability-matrix surfaces; these were
produced by the test-spec skill in cycle-update mode but never
landed as a git commit before the flow moved to Step 13.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 16:49:26 +03:00

9.5 KiB
Raw Blame History

Resilience Tests

NFT-RES-01: FC IMU-only fallback after >3 s without estimate

Summary: Validates AC-5.2 — on >3 s without an estimate, the FC falls back to IMU-only dead reckoning AND the SUT logs the failure. Traces to: AC-5.2

Preconditions:

  • SUT in satellite_anchored steady state on Derkachi replay.
  • 4 s outage injector primed (replay paused for 4 s of wall-clock).

Fault injection:

  • Pause frame source for 4 s of wall-clock while FC IMU stream continues.

Steps:

Step Action Expected Behavior
1 Mid-replay, halt frame delivery for 4 s SUT continues emitting dead_reckoned estimates from FC IMU/attitude propagation
2 After 3 s without an emit (i.e. SUT internally fails to update for >3 s), SUT logs NO_ESTIMATE_TIMEOUT FDR contains the log entry
3 Observe FC EKF source-set transition EKF source-set transitions to internal IMU-only on the FC side per the FC's own failsafe logic (AP EKF_FAILSAFE or equivalent on iNav)
4 Resume frame delivery SUT recovers; FC EKF source-set returns to companion-GPS source

Pass criteria:

  • NO_ESTIMATE_TIMEOUT logged within 200 ms of the 3 s mark.
  • FC EKF reflects the transition.
  • Recovery on resume happens within 5 emit cycles.

NFT-RES-02: Companion mid-flight reboot

Summary: Validates AC-5.3 — on companion reboot mid-flight, SUT re-initializes from FC's current IMU-extrapolated position. Traces to: AC-5.3

Preconditions:

  • SUT in steady state on Derkachi replay.
  • FC SITL has been running long enough to have a stable IMU-extrapolated pose.

Fault injection:

  • docker compose restart gps-denied-onboard mid-replay (or systemctl restart on Tier-2).

Steps:

Step Action Expected Behavior
1 At t=120 s of replay, restart SUT container SUT goes down and back up
2 Wait for first post-restart GPS_INPUT / MSP2_SENSOR_GPS arrival First emit lat/lon within ±100 m of FC's IMU-extrapolated pose at boot-complete time
3 Observe TTFF post-reboot Within AC-NEW-1 budget (<30 s p95)

Pass criteria:

  • First post-restart emit ±100 m of FC pose at boot-complete.
  • Cold-restart TTFF < 30 s.
  • No FC-side EKF divergence event during the gap.

NFT-RES-03: False-position safety budget Monte Carlo

Summary: Validates AC-NEW-4 false-position safety budget (P(error > 500 m) < 0.1%, P(error > 1 km) < 0.01%) on the available data + synthesis. PARTIAL — multi-flight statistics constrained by single Derkachi flight + 60 stills (see traceability matrix flag). Traces to: AC-NEW-4 (PARTIAL)

Preconditions:

  • Tier-1 acceptable (statistical rather than hardware-bound).
  • Pull together: 60 still-image runs (60 frames) + Derkachi replay (~14,700 frames at 30 fps OR resampled to ~870 frames at 3 Hz target). Total ≥930 frames per Monte Carlo iteration.
  • Run M=50 Monte Carlo iterations with synthetic perturbations (camera-pose noise, IMU bias drift, randomized tile sub-selection).

Fault injection:

  • Add per-iteration synthetic perturbations to mimic a population of independent flights.

Steps:

Step Action Expected Behavior
1 Run M iterations end-to-end Per-iteration error distribution captured
2 Aggregate across all iterations × frames Per-frame error CDF
3 Read off P(error > 500 m) and P(error > 1 km) from CDF Both values

Pass criteria (PARTIAL):

  • P(error > 500 m) < 0.1%.
  • P(error > 1 km) < 0.01%.
  • Test FAILS-OPEN with explicit "PARTIAL" annotation in CSV report when iteration count is below the AC-NEW-4-implied ≥100 flights — noted as reduced confidence pending D-PROJ-3 (AerialVL S03 + own multi-flight data).

NFT-RES-04: Visual blackout + spoof degraded-mode escalation

Summary: Validates the AC-NEW-8 escalation ladder (5 s, 15 s, 35 s blackouts paired with spoof) including the 100 m / 500 m covariance thresholds and the 10 s GPS-health gate before recovery. Traces to: AC-NEW-8 (twin of FT-N-04 with extended duration window and covariance assertions)

Preconditions: Same as FT-N-04; Tier-1 acceptable.

Fault injection: blackout-spoof-derkachi 5 s / 15 s / 35 s windows + spoofed FC GPS for the same windows.

Steps:

Step Action Expected Behavior
1 Begin 5 s window Mode transition ≤ 400 ms; covariance grows monotonically; spoofed GPS rejected
2 At end of 5 s window, attempt recovery Recovery only after FC GPS-health stable + non-spoofed for ≥10 s AND visual/satellite consistency check succeeds (gate enforced)
3 Begin 15 s window Same as step 1 plus when 95% covariance crosses 100 m: outbound MAVLink fix-quality degraded to "2D fix or worse"
4 Begin 35 s window Plus when 95% covariance crosses 500 m OR blackout exceeds 30 s: horiz_accuracy=999.0 + VISUAL_BLACKOUT_FAILSAFE STATUSTEXT emitted

Pass criteria:

  • All four assertions fire at the right thresholds.
  • Recovery gate is honored — early recovery attempts (FC GPS healthy for <10 s) MUST NOT promote spoofed GPS back into the estimator.

Duration: ~10 min total for three windows.


NFT-RES-05: Composition-root bootstrap contract (replay-mode minimal config, operator-error contract, Tier-2 log boundary)

Summary: Validates the airborne composition-root bootstrap contract introduced by AZ-591 / AZ-618 / AZ-687. Three sub-cases pin the cross-cycle plumbing that ALL Tier-1 and Tier-2 replay tests depend on: (a) replay-mode build_pre_constructed(config) succeeds without a c6_tile_cache config block (AZ-687 AC-687-1); (b) on a misconfigured airborne bootstrap, the SUT exits non-zero with stderr carrying the airborne_bootstrap: prefix, the consuming component slug, the missing infrastructure key or BUILD_* flag, and one actionable sentence (AZ-618 operator-facing-error-contract NFR + AC-4 non-success branch); (c) every successful Tier-2 replay run logs both replay.compose_root.ready and replay.input.frame_emitted to stdout (AZ-618 AC-5, AZ-687 AC-687-3). Traces to: AC-NEW-1 (precondition for cold-start TTFF), AC-4.1 (precondition for latency budget) — protects the assembly path every product-AC test relies on.

Preconditions:

  • Tier-1 acceptable for sub-cases (a) and (b); sub-case (c) is Tier-2-only.
  • For (a): replay CLI invocation with a synthesized Config that has mode == "replay" and components that does NOT include a c6_tile_cache block. Public surface: scripts/run_replay.sh (or equivalent CLI entrypoint) pointed at _docs/00_problem/input_data/flight_derkachi/.
  • For (b): minimal live-mode Config selecting a component strategy whose required BUILD_* flag is set to OFF in the environment (e.g. c2_vpr.strategy="net_vlad" with BUILD_PYTORCH_FP16_RUNTIME=OFF). No SUT internals are imported; the test reads stderr.
  • For (c): standard Tier-2 Jetson e2e replay invocation per tests/e2e/replay/test_derkachi_1min.py (AC-1, AC-2, AC-5, AC-6).

Fault injection (sub-case b only): set the gating BUILD_* env var to OFF for a strategy that the config selects, then start the SUT.

Steps:

Sub-case Step Consumer Action Expected Behavior
(a) replay minimal 1 Start SUT in replay mode against derkachi-fixture, with no c6_tile_cache block in the synthesized config SUT process reaches the replay coordinator without exiting; stdout contains replay.compose_root.ready within 60 s
(a) replay minimal 2 Observe at least one outbound frame emission stdout contains at least one replay.input.frame_emitted log line within the same run
(b) misconfig 1 Start SUT in live mode with a config that selects a strategy whose BUILD_* flag is OFF SUT exits with EXIT_GENERIC_FAILURE (1)
(b) misconfig 2 Read SUT stderr stderr contains airborne_bootstrap: prefix AND the consuming component slug (e.g. c2_vpr) AND either the missing infrastructure key (e.g. c7_inference) or the gating BUILD_* flag name AND one actionable sentence (regex: set \BUILD_[A-Z0-9_]+`ORensure `[a-z0-9_]+.[a-z0-9_]+` is`)
(c) Tier-2 log boundary 1 Run scripts/run-tests-jetson.sh tests/e2e/replay/test_derkachi_1min.py (or its compose-bypass equivalent) Each of AC-1 / AC-2 / AC-5 / AC-6 produces stdout that crosses both replay.compose_root.ready AND replay.input.frame_emitted log boundaries

Pass criteria:

  • (a) Process does not exit during bootstrap; both boundary log lines appear within 60 s of process start.
  • (b) Exit code is 1; stderr matches the four-field contract (prefix + component slug + missing key/flag + actionable sentence).
  • (c) Every AC sub-test in the Tier-2 invocation has both log lines in its captured stdout; batch report Tier-2 evidence: field references the terminal log path per _docs/02_document/tests/tier2-jetson-testing.md.

Duration: ~3 min (Tier-1 sub-cases a+b combined); sub-case (c) is observed inside the existing Tier-2 e2e replay run, no separate duration.

Rationale: This scenario formalizes the cross-cycle bootstrap invariant. The Tier-1 contract path bypasses compose_root's registry-driven assembly via replay_components_factory, so without this test the failure modes AZ-687 caught (replay-mode KeyError) and AZ-618 caught (empty pre_constructed) regress silently — both were missed by the Tier-1 suite that was 3343/0 green at the moment of the Jetson rerun. See _docs/02_document/tests/tier2-jetson-testing.md § Rationale.