mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 22:01:13 +00:00
bb9c408597
Backfill the uncommitted Step 12 (Test-Spec Sync) output for the resilience-tests and traceability-matrix surfaces; these were produced by the test-spec skill in cycle-update mode but never landed as a git commit before the flow moved to Step 13. Co-authored-by: Cursor <cursoragent@cursor.com>
146 lines
9.5 KiB
Markdown
# Resilience Tests

### NFT-RES-01: FC IMU-only fallback after >3 s without estimate

**Summary**: Validates AC-5.2: on >3 s without an estimate, the FC falls back to IMU-only dead reckoning AND the SUT logs the failure.
**Traces to**: AC-5.2

**Preconditions**:
- SUT in `satellite_anchored` steady state on Derkachi replay.
- 4 s outage injector primed (replay paused for 4 s of wall-clock).

**Fault injection**:
- Pause frame source for 4 s of wall-clock while FC IMU stream continues.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Mid-replay, halt frame delivery for 4 s | SUT continues emitting `dead_reckoned` estimates from FC IMU/attitude propagation |
| 2 | Let 3 s elapse without an emit (i.e. the SUT internally fails to update for >3 s) | SUT logs `NO_ESTIMATE_TIMEOUT`; FDR contains the log entry |
| 3 | Observe FC EKF source-set transition | EKF source-set transitions to internal IMU-only on the FC side per the FC's own failsafe logic (AP `EKF_FAILSAFE` or equivalent on iNav) |
| 4 | Resume frame delivery | SUT recovers; FC EKF source-set returns to companion-GPS source |

**Pass criteria**:
- `NO_ESTIMATE_TIMEOUT` logged within 200 ms of the 3 s mark.
- FC EKF source-set reflects the transition to internal IMU-only (step 3).
- Recovery on resume happens within 5 emit cycles.

---
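
The two timing criteria of NFT-RES-01 (the 200 ms logging window and the 5-cycle recovery bound) can be checked offline from captured records. The following is a minimal sketch, not the project's harness: `LogRecord` and both helper functions are hypothetical names, the real FDR schema is not shown in this spec, and "within 200 ms of the 3 s mark" is read here as "at the mark or up to 200 ms after it".

```python
from dataclasses import dataclass

TIMEOUT_S = 3.0          # AC-5.2: more than 3 s without an estimate
LOG_TOLERANCE_S = 0.200  # pass criterion: logged within 200 ms of the 3 s mark
MAX_RECOVERY_CYCLES = 5  # pass criterion: recovery within 5 emit cycles


@dataclass
class LogRecord:
    """Hypothetical FDR record shape, assumed for illustration only."""
    t: float    # seconds since replay start
    event: str  # event name, e.g. "NO_ESTIMATE_TIMEOUT"


def timeout_logged_on_time(records, last_estimate_t):
    """True if NO_ESTIMATE_TIMEOUT is logged in [3 s mark, 3 s mark + 200 ms]."""
    deadline = last_estimate_t + TIMEOUT_S
    return any(
        r.event == "NO_ESTIMATE_TIMEOUT"
        and deadline <= r.t <= deadline + LOG_TOLERANCE_S
        for r in records
    )


def recovered_in_time(emit_sources_after_resume):
    """True if a non-dead-reckoned emit occurs within the first 5 emit cycles."""
    window = emit_sources_after_resume[:MAX_RECOVERY_CYCLES]
    return any(source != "dead_reckoned" for source in window)
```

Here `last_estimate_t` is the replay time of the last estimate before the outage, and `emit_sources_after_resume` is the ordered list of source labels on emits after frame delivery resumes. The FC-side source-set criterion is not covered by this sketch.
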
### NFT-RES-02: Companion mid-flight reboot

**Summary**: Validates AC-5.3: on companion reboot mid-flight, the SUT re-initializes from the FC's current IMU-extrapolated position.
**Traces to**: AC-5.3

**Preconditions**:
- SUT in steady state on Derkachi replay.
- FC SITL has been running long enough to have a stable IMU-extrapolated pose.

**Fault injection**:
- `docker compose restart gps-denied-onboard` mid-replay (or `systemctl restart` on Tier-2).

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | At t=120 s of replay, restart SUT container | SUT goes down and back up |
| 2 | Wait for first post-restart `GPS_INPUT` / `MSP2_SENSOR_GPS` arrival | First emit lat/lon within ±100 m of FC's IMU-extrapolated pose at boot-complete time |
| 3 | Observe TTFF post-reboot | Within AC-NEW-1 budget (<30 s p95) |

**Pass criteria**:
- First post-restart emit is within ±100 m of the FC's IMU-extrapolated pose at boot-complete time.
- Cold-restart TTFF < 30 s (AC-NEW-1 budget, p95).
- No FC-side EKF divergence event during the gap.

---
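
The ±100 m and TTFF criteria of NFT-RES-02 reduce to a great-circle distance and a time difference. The sketch below illustrates that arithmetic under stated assumptions: the function names are hypothetical, TTFF is measured here from the restart instant to the first post-restart emit (the spec does not pin the start point), and a spherical Earth is assumed, which is adequate at a 100 m scale.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)
MAX_OFFSET_M = 100.0          # pass criterion: within ±100 m of the FC pose
MAX_TTFF_S = 30.0             # pass criterion: cold-restart TTFF < 30 s


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def reboot_passes(first_emit, fc_pose, restart_t, first_emit_t):
    """Check the offset and TTFF criteria; inputs are (lat, lon) tuples and seconds."""
    offset_ok = haversine_m(*first_emit, *fc_pose) <= MAX_OFFSET_M
    ttff_ok = (first_emit_t - restart_t) < MAX_TTFF_S
    return offset_ok and ttff_ok
```

The third criterion (no FC-side EKF divergence during the gap) depends on FC telemetry and is outside this sketch.
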
### NFT-RES-03: False-position safety budget Monte Carlo

**Summary**: Validates AC-NEW-4 false-position safety budget (`P(error > 500 m) < 0.1%`, `P(error > 1 km) < 0.01%`) on the available data + synthesis. PARTIAL: multi-flight statistics are constrained by the single Derkachi flight + 60 stills (see traceability matrix flag).
**Traces to**: AC-NEW-4 (PARTIAL)

**Preconditions**:
- Tier-1 acceptable (statistical rather than hardware-bound).
- Pull together: 60 still-image runs (60 frames) + Derkachi replay (~14,700 frames at 30 fps OR resampled to ~870 frames at 3 Hz target). Total ≥930 frames per Monte Carlo iteration.
- Run M=50 Monte Carlo iterations with synthetic perturbations (camera-pose noise, IMU bias drift, randomized tile sub-selection).

**Fault injection**:
- Add per-iteration synthetic perturbations to mimic a population of independent flights.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Run M iterations end-to-end | Per-iteration error distribution captured |
| 2 | Aggregate across all iterations × frames | Per-frame error CDF |
| 3 | Read off `P(error > 500 m)` and `P(error > 1 km)` from CDF | Both exceedance probabilities are reported |

**Pass criteria** (PARTIAL):
- `P(error > 500 m) < 0.1%`.
- `P(error > 1 km) < 0.01%`.
- Test FAILS-OPEN with an explicit "PARTIAL" annotation in the CSV report when the iteration count is below the ≥100 flights implied by AC-NEW-4; the result is recorded as reduced confidence pending D-PROJ-3 (AerialVL S03 + own multi-flight data).

---
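
Steps 2 and 3 of NFT-RES-03 amount to pooling per-frame errors across iterations and reading two exceedance probabilities off the empirical distribution. A minimal sketch of that aggregation follows; `exceedance` and `evaluate` are hypothetical helpers, and the result keys are illustrative rather than the CSV report's actual columns.

```python
P500_BUDGET = 0.001   # P(error > 500 m) < 0.1%
P1K_BUDGET = 0.0001   # P(error > 1 km) < 0.01%


def exceedance(errors_m, threshold_m):
    """Empirical P(error > threshold) over a list of per-frame errors in metres."""
    if not errors_m:
        raise ValueError("no samples to evaluate")
    return sum(e > threshold_m for e in errors_m) / len(errors_m)


def evaluate(per_iteration_errors, required_flights=100):
    """Pool errors across iterations and compare against the AC-NEW-4 budget."""
    pooled = [e for iteration in per_iteration_errors for e in iteration]
    p500 = exceedance(pooled, 500.0)
    p1k = exceedance(pooled, 1000.0)
    return {
        "p_gt_500m": p500,
        "p_gt_1km": p1k,
        "within_budget": p500 < P500_BUDGET and p1k < P1K_BUDGET,
        # Fewer iterations than the implied flight count: annotate as PARTIAL.
        "partial": len(per_iteration_errors) < required_flights,
    }
```

With M=50 iterations of 930 frames each, the pooled sample has 46,500 errors, so the budget tolerates at most 46 frames above 500 m and at most 4 frames above 1 km. Frames within one iteration are correlated, so the pooled sample carries less information than 46,500 independent draws would, which fits the PARTIAL annotation.
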
### NFT-RES-04: Visual blackout + spoof degraded-mode escalation

**Summary**: Validates the AC-NEW-8 escalation ladder (5 s, 15 s, 35 s blackouts paired with spoof) including the 100 m / 500 m covariance thresholds and the 10 s GPS-health gate before recovery.
**Traces to**: AC-NEW-8 (twin of FT-N-04 with extended duration window and covariance assertions)

**Preconditions**: Same as FT-N-04; Tier-1 acceptable.

**Fault injection**: `blackout-spoof-derkachi` 5 s / 15 s / 35 s windows + spoofed FC GPS for the same windows.

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Begin 5 s window | Mode transition ≤ 400 ms; covariance grows monotonically; spoofed GPS rejected |
| 2 | At end of 5 s window, attempt recovery | Recovery only after FC GPS-health stable + non-spoofed for ≥10 s AND visual/satellite consistency check succeeds (gate enforced) |
| 3 | Begin 15 s window | Same as step 1; additionally, when the 95% covariance crosses 100 m, outbound MAVLink fix-quality is degraded to "2D fix or worse" |
| 4 | Begin 35 s window | Same as step 3; additionally, when the 95% covariance crosses 500 m OR the blackout exceeds 30 s, `horiz_accuracy=999.0` + `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT are emitted |

**Pass criteria**:
- The assertions in all four steps fire at their stated thresholds.
- Recovery gate is honored: early recovery attempts (FC GPS healthy for <10 s) MUST NOT promote spoofed GPS back into the estimator.

**Duration**: ~10 min total for three windows.

---
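
The NFT-RES-04 thresholds can be read as a small decision ladder plus a recovery gate. The sketch below encodes the expected outcome for a given blackout duration and 95% covariance so a test can compare it against what the SUT emitted. The level labels (`"blackout"`, `"degraded_fix"`, `"failsafe"`) and both function names are hypothetical, and "crosses" is interpreted as strictly greater than the threshold.

```python
def expected_escalation(blackout_s, cov95_m):
    """Expected degradation level while a blackout is in progress."""
    if cov95_m > 500.0 or blackout_s > 30.0:
        # horiz_accuracy=999.0 + VISUAL_BLACKOUT_FAILSAFE STATUSTEXT expected
        return "failsafe"
    if cov95_m > 100.0:
        # outbound fix quality degraded to "2D fix or worse"
        return "degraded_fix"
    # mode transition done, covariance growing, spoofed GPS rejected
    return "blackout"


def recovery_allowed(gps_healthy_s, spoof_detected, consistency_ok):
    """Recovery gate: >= 10 s of healthy, non-spoofed FC GPS AND a consistency pass."""
    return gps_healthy_s >= 10.0 and not spoof_detected and consistency_ok
```

For example, `expected_escalation(35, 80)` returns `"failsafe"` through the 30 s rule even though the covariance is still below 100 m, which is why the 35 s window exercises the time-based branch independently of covariance growth.
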
### NFT-RES-05: Composition-root bootstrap contract (replay-mode minimal config, operator-error contract, Tier-2 log boundary)

**Summary**: Validates the airborne composition-root bootstrap contract introduced by AZ-591 / AZ-618 / AZ-687. Three sub-cases pin the cross-cycle plumbing that ALL Tier-1 and Tier-2 replay tests depend on:
- (a) replay-mode `build_pre_constructed(config)` succeeds without a `c6_tile_cache` config block (AZ-687 AC-687-1);
- (b) on a misconfigured airborne bootstrap, the SUT exits non-zero with stderr carrying the `airborne_bootstrap:` prefix, the consuming component slug, the missing infrastructure key or `BUILD_*` flag, and one actionable sentence (AZ-618 operator-facing-error-contract NFR + AC-4 non-success branch);
- (c) every successful Tier-2 replay run logs both `replay.compose_root.ready` and `replay.input.frame_emitted` to stdout (AZ-618 AC-5, AZ-687 AC-687-3).

**Traces to**: AC-NEW-1 (precondition for cold-start TTFF), AC-4.1 (precondition for latency budget); this test protects the assembly path every product-AC test relies on.

**Preconditions**:
- Tier-1 acceptable for sub-cases (a) and (b); sub-case (c) is Tier-2-only.
- For (a): replay CLI invocation with a synthesized `Config` that has `mode == "replay"` and `components` that does NOT include a `c6_tile_cache` block. Public surface: `scripts/run_replay.sh` (or equivalent CLI entrypoint) pointed at `_docs/00_problem/input_data/flight_derkachi/`.
- For (b): minimal live-mode `Config` selecting a component strategy whose required `BUILD_*` flag is set to OFF in the environment (e.g. `c2_vpr.strategy="net_vlad"` with `BUILD_PYTORCH_FP16_RUNTIME=OFF`). No SUT internals are imported; the test reads stderr.
- For (c): standard Tier-2 Jetson e2e replay invocation per `tests/e2e/replay/test_derkachi_1min.py` (AC-1, AC-2, AC-5, AC-6).

**Fault injection** (sub-case b only): set the gating `BUILD_*` env var to `OFF` for a strategy that the config selects, then start the SUT.

**Steps**:

| Sub-case | Step | Consumer Action | Expected Behavior |
|----------|------|-----------------|-------------------|
| (a) replay minimal | 1 | Start SUT in replay mode against `derkachi-fixture`, with no `c6_tile_cache` block in the synthesized config | SUT process reaches the replay coordinator without exiting; stdout contains `replay.compose_root.ready` within 60 s |
| (a) replay minimal | 2 | Observe at least one outbound frame emission | stdout contains at least one `replay.input.frame_emitted` log line within the same run |
| (b) misconfig | 1 | Start SUT in live mode with a config that selects a strategy whose `BUILD_*` flag is OFF | SUT exits with `EXIT_GENERIC_FAILURE` (`1`) |
| (b) misconfig | 2 | Read SUT stderr | stderr contains `airborne_bootstrap:` prefix AND the consuming component slug (e.g. `c2_vpr`) AND either the missing infrastructure key (e.g. `c7_inference`) or the gating `BUILD_*` flag name AND one actionable sentence (regex: `set \`BUILD_[A-Z0-9_]+\`` OR `ensure \`[a-z0-9_]+\.[a-z0-9_]+\` is`) |
| (c) Tier-2 log boundary | 1 | Run `scripts/run-tests-jetson.sh tests/e2e/replay/test_derkachi_1min.py` (or its compose-bypass equivalent) | Each of AC-1 / AC-2 / AC-5 / AC-6 produces stdout that crosses both `replay.compose_root.ready` AND `replay.input.frame_emitted` log boundaries |

**Pass criteria**:
- (a) Process does not exit during bootstrap; both boundary log lines appear within 60 s of process start.
- (b) Exit code is `1`; stderr matches the four-field contract (prefix + component slug + missing key/flag + actionable sentence).
- (c) Every AC sub-test in the Tier-2 invocation has both log lines in its captured stdout; batch report `Tier-2 evidence:` field references the terminal log path per `_docs/02_document/tests/tier2-jetson-testing.md`.

**Duration**: ~3 min (Tier-1 sub-cases a+b combined); sub-case (c) is observed inside the existing Tier-2 e2e replay run, no separate duration.

**Rationale**: This scenario formalizes the cross-cycle bootstrap invariant. The Tier-1 contract path bypasses `compose_root`'s registry-driven assembly via `replay_components_factory`, so without this test the failure modes caught by AZ-687 (replay-mode `KeyError`) and AZ-618 (empty `pre_constructed`) would regress silently. Both were missed by the Tier-1 suite, which was 3343/0 green at the time of the Jetson rerun. See `_docs/02_document/tests/tier2-jetson-testing.md` § Rationale.
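
The four-field stderr contract of sub-case (b) and the log-boundary check used by sub-cases (a) and (c) are both plain text assertions on captured output. A minimal sketch follows, assuming the escaped backticks in the step-table regex denote literal backticks in the message. The helper names are hypothetical, and no real SUT stderr is quoted in this spec, so any message passed in for illustration is invented.

```python
import re

PREFIX = "airborne_bootstrap:"
# Actionable-sentence patterns from the step table, with literal backticks.
ACTIONABLE = re.compile(
    r"set `BUILD_[A-Z0-9_]+`|ensure `[a-z0-9_]+\.[a-z0-9_]+` is"
)
BOUNDARIES = ("replay.compose_root.ready", "replay.input.frame_emitted")


def stderr_meets_contract(stderr, component_slug, key_or_flag):
    """Check prefix + component slug + missing key/flag + actionable sentence."""
    has_prefix = any(line.startswith(PREFIX) for line in stderr.splitlines())
    return (
        has_prefix
        and component_slug in stderr
        and key_or_flag in stderr
        and ACTIONABLE.search(stderr) is not None
    )


def crosses_log_boundaries(stdout):
    """True if both replay boundary log lines appear in the captured stdout."""
    return all(marker in stdout for marker in BOUNDARIES)
```

Because both checks read only process output, they stay on the public surface of the SUT, in line with the precondition that no SUT internals are imported.
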