Files
gps-denied-onboard/_docs/02_document/tests/resilience-tests.md
T
Oleksandr Bezdieniezhnykh bb9c408597 [autodev] Step 12 cycle-1 sync: tests/resilience+traceability
Backfill the uncommitted Step 12 (Test-Spec Sync) output for the
resilience-tests and traceability-matrix surfaces; these were
produced by the test-spec skill in cycle-update mode but never
landed as a git commit before the flow moved to Step 13.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 16:49:26 +03:00

146 lines
9.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Resilience Tests
### NFT-RES-01: FC IMU-only fallback after >3 s without estimate
**Summary**: Validates AC-5.2 — on >3 s without an estimate, the FC falls back to IMU-only dead reckoning AND the SUT logs the failure.
**Traces to**: AC-5.2
**Preconditions**:
- SUT in `satellite_anchored` steady state on Derkachi replay.
- 4 s outage injector primed (replay paused for 4 s of wall-clock).
**Fault injection**:
- Pause frame source for 4 s of wall-clock while FC IMU stream continues.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Mid-replay, halt frame delivery for 4 s | SUT continues emitting `dead_reckoned` estimates from FC IMU/attitude propagation |
| 2 | After 3 s without an emit (i.e. SUT internally fails to update for >3 s), SUT logs `NO_ESTIMATE_TIMEOUT` | FDR contains the log entry |
| 3 | Observe FC EKF source-set transition | EKF source-set transitions to internal IMU-only on the FC side per the FC's own failsafe logic (AP `EKF_FAILSAFE` or equivalent on iNav) |
| 4 | Resume frame delivery | SUT recovers; FC EKF source-set returns to companion-GPS source |
**Pass criteria**:
- `NO_ESTIMATE_TIMEOUT` logged within 200 ms of the 3 s mark.
- FC EKF reflects the transition.
- Recovery on resume happens within 5 emit cycles.
---
### NFT-RES-02: Companion mid-flight reboot
**Summary**: Validates AC-5.3 — on companion reboot mid-flight, SUT re-initializes from FC's current IMU-extrapolated position.
**Traces to**: AC-5.3
**Preconditions**:
- SUT in steady state on Derkachi replay.
- FC SITL has been running long enough to have a stable IMU-extrapolated pose.
**Fault injection**:
- `docker compose restart gps-denied-onboard` mid-replay (or `systemctl restart` on Tier-2).
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | At t=120 s of replay, restart SUT container | SUT goes down and back up |
| 2 | Wait for first post-restart `GPS_INPUT` / `MSP2_SENSOR_GPS` arrival | First emit lat/lon within ±100 m of FC's IMU-extrapolated pose at boot-complete time |
| 3 | Observe TTFF post-reboot | Within AC-NEW-1 budget (<30 s p95) |
**Pass criteria**:
- First post-restart emit ±100 m of FC pose at boot-complete.
- Cold-restart TTFF < 30 s.
- No FC-side EKF divergence event during the gap.
---
### NFT-RES-03: False-position safety budget Monte Carlo
**Summary**: Validates AC-NEW-4 false-position safety budget (`P(error > 500 m) < 0.1%`, `P(error > 1 km) < 0.01%`) on the available data + synthesis. PARTIAL — multi-flight statistics constrained by single Derkachi flight + 60 stills (see traceability matrix flag).
**Traces to**: AC-NEW-4 (PARTIAL)
**Preconditions**:
- Tier-1 acceptable (statistical rather than hardware-bound).
- Pull together: 60 still-image runs (60 frames) + Derkachi replay (~14,700 frames at 30 fps OR resampled to ~870 frames at 3 Hz target). Total ≥930 frames per Monte Carlo iteration.
- Run M=50 Monte Carlo iterations with synthetic perturbations (camera-pose noise, IMU bias drift, randomized tile sub-selection).
**Fault injection**:
- Add per-iteration synthetic perturbations to mimic a population of independent flights.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Run M iterations end-to-end | Per-iteration error distribution captured |
| 2 | Aggregate across all iterations × frames | Per-frame error CDF |
| 3 | Read off `P(error > 500 m)` and `P(error > 1 km)` from CDF | Both values |
**Pass criteria** (PARTIAL):
- `P(error > 500 m) < 0.1%`.
- `P(error > 1 km) < 0.01%`.
- Test FAILS-OPEN with explicit "PARTIAL" annotation in CSV report when iteration count is below the AC-NEW-4-implied ≥100 flights — noted as reduced confidence pending D-PROJ-3 (AerialVL S03 + own multi-flight data).
---
### NFT-RES-04: Visual blackout + spoof degraded-mode escalation
**Summary**: Validates the AC-NEW-8 escalation ladder (5 s, 15 s, 35 s blackouts paired with spoof) including the 100 m / 500 m covariance thresholds and the 10 s GPS-health gate before recovery.
**Traces to**: AC-NEW-8 (twin of FT-N-04 with extended duration window and covariance assertions)
**Preconditions**: Same as FT-N-04; Tier-1 acceptable.
**Fault injection**: `blackout-spoof-derkachi` 5 s / 15 s / 35 s windows + spoofed FC GPS for the same windows.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Begin 5 s window | Mode transition ≤ 400 ms; covariance grows monotonically; spoofed GPS rejected |
| 2 | At end of 5 s window, attempt recovery | Recovery only after FC GPS-health stable + non-spoofed for ≥10 s AND visual/satellite consistency check succeeds (gate enforced) |
| 3 | Begin 15 s window | Same as step 1 plus when 95% covariance crosses 100 m: outbound MAVLink fix-quality degraded to "2D fix or worse" |
| 4 | Begin 35 s window | Plus when 95% covariance crosses 500 m OR blackout exceeds 30 s: `horiz_accuracy=999.0` + `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT emitted |
**Pass criteria**:
- All four assertions fire at the right thresholds.
- Recovery gate is honored — early recovery attempts (FC GPS healthy for <10 s) MUST NOT promote spoofed GPS back into the estimator.
**Duration**: ~10 min total for three windows.
---
### NFT-RES-05: Composition-root bootstrap contract (replay-mode minimal config, operator-error contract, Tier-2 log boundary)
**Summary**: Validates the airborne composition-root bootstrap contract introduced by AZ-591 / AZ-618 / AZ-687. Three sub-cases pin the cross-cycle plumbing that ALL Tier-1 and Tier-2 replay tests depend on:
(a) replay-mode `build_pre_constructed(config)` succeeds without a `c6_tile_cache` config block (AZ-687 AC-687-1);
(b) on a misconfigured airborne bootstrap, the SUT exits non-zero with stderr carrying the `airborne_bootstrap:` prefix, the consuming component slug, the missing infrastructure key or `BUILD_*` flag, and one actionable sentence (AZ-618 operator-facing-error-contract NFR + AC-4 non-success branch);
(c) every successful Tier-2 replay run logs both `replay.compose_root.ready` and `replay.input.frame_emitted` to stdout (AZ-618 AC-5, AZ-687 AC-687-3).
**Traces to**: AC-NEW-1 (precondition for cold-start TTFF), AC-4.1 (precondition for latency budget) — protects the assembly path every product-AC test relies on.
**Preconditions**:
- Tier-1 acceptable for sub-cases (a) and (b); sub-case (c) is Tier-2-only.
- For (a): replay CLI invocation with a synthesized `Config` that has `mode == "replay"` and `components` that does NOT include a `c6_tile_cache` block. Public surface: `scripts/run_replay.sh` (or equivalent CLI entrypoint) pointed at `_docs/00_problem/input_data/flight_derkachi/`.
- For (b): minimal live-mode `Config` selecting a component strategy whose required `BUILD_*` flag is set to OFF in the environment (e.g. `c2_vpr.strategy="net_vlad"` with `BUILD_PYTORCH_FP16_RUNTIME=OFF`). No SUT internals are imported; the test reads stderr.
- For (c): standard Tier-2 Jetson e2e replay invocation per `tests/e2e/replay/test_derkachi_1min.py` (AC-1, AC-2, AC-5, AC-6).
**Fault injection** (sub-case b only): set the gating `BUILD_*` env var to `OFF` for a strategy that the config selects, then start the SUT.
**Steps**:
| Sub-case | Step | Consumer Action | Expected Behavior |
|----------|------|-----------------|-------------------|
| (a) replay minimal | 1 | Start SUT in replay mode against `derkachi-fixture`, with no `c6_tile_cache` block in the synthesized config | SUT process reaches the replay coordinator without exiting; stdout contains `replay.compose_root.ready` within 60 s |
| (a) replay minimal | 2 | Observe at least one outbound frame emission | stdout contains at least one `replay.input.frame_emitted` log line within the same run |
| (b) misconfig | 1 | Start SUT in live mode with a config that selects a strategy whose `BUILD_*` flag is OFF | SUT exits with `EXIT_GENERIC_FAILURE` (`1`) |
| (b) misconfig | 2 | Read SUT stderr | stderr contains `airborne_bootstrap:` prefix AND the consuming component slug (e.g. `c2_vpr`) AND either the missing infrastructure key (e.g. `c7_inference`) or the gating `BUILD_*` flag name AND one actionable sentence (regex: `set \`BUILD_[A-Z0-9_]+\`` OR `ensure \`[a-z0-9_]+\.[a-z0-9_]+\` is`) |
| (c) Tier-2 log boundary | 1 | Run `scripts/run-tests-jetson.sh tests/e2e/replay/test_derkachi_1min.py` (or its compose-bypass equivalent) | Each of AC-1 / AC-2 / AC-5 / AC-6 produces stdout that crosses both `replay.compose_root.ready` AND `replay.input.frame_emitted` log boundaries |
**Pass criteria**:
- (a) Process does not exit during bootstrap; both boundary log lines appear within 60 s of process start.
- (b) Exit code is `1`; stderr matches the four-field contract (prefix + component slug + missing key/flag + actionable sentence).
- (c) Every AC sub-test in the Tier-2 invocation has both log lines in its captured stdout; batch report `Tier-2 evidence:` field references the terminal log path per `_docs/02_document/tests/tier2-jetson-testing.md`.
**Duration**: ~3 min (Tier-1 sub-cases a+b combined); sub-case (c) is observed inside the existing Tier-2 e2e replay run, no separate duration.
**Rationale**: This scenario formalizes the cross-cycle bootstrap invariant. The Tier-1 contract path bypasses `compose_root`'s registry-driven assembly via `replay_components_factory`, so without this test the failure modes AZ-687 caught (replay-mode `KeyError`) and AZ-618 caught (empty `pre_constructed`) regress silently — both were missed by the Tier-1 suite that was 3343/0 green at the moment of the Jetson rerun. See `_docs/02_document/tests/tier2-jetson-testing.md` § Rationale.