[AZ-424] [AZ-425] [AZ-426] Implement negatives set (FT-N-01/03/04)

Adds three pure-logic evaluators + scenarios + unit tests covering the
project's failure-mode robustness ladder (AC-3.1, AC-3.4, AC-3.5,
AC-NEW-8):

* outlier_tolerance_evaluator (AZ-424 / FT-N-01): per-event 50 m drift
  bound + 3-frame covariance-monotonic window over the AZ-408 outlier
  injector's medium-density manifest.
* outage_request_evaluator (AZ-425 / FT-N-03): detects 3+ consecutive
  missing-frame windows; validates OPERATOR_RELOC_REQUEST STATUSTEXT
  arrives at 2 s ±500 ms, dead_reckoned label during outage, and no
  FC EKF divergence.
* blackout_spoof_evaluator (AZ-426 / FT-N-04): eight-AC ladder across
  the 5 s / 15 s / 35 s sub-windows — switch latency, spoof rejection,
  monotonic covariance, honest horiz_accuracy, STATUSTEXT 1-2 Hz,
  35 s escalation thresholds, and recovery gate.

Each scenario is skip-gated on the AZ-441 / AZ-407 / AZ-416 replay /
SITL / mavproxy helpers; unit tests (14 + 18 + 29 = 61) cover the
AC logic today. Full e2e unit-test suite: 527 passed (+67).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-17 08:26:16 +03:00
parent a644debdb7
commit 2d6d44af5d
16 changed files with 3343 additions and 1 deletions
@@ -0,0 +1,68 @@
# FT-N-01 — 350 m outlier injection tolerance
**Task**: AZ-424_ft_n_01_outlier_tolerance
**Name**: Tolerate up to 350 m outliers between consecutive frames; tilt up to ±20° (AC-3.1, RESTRICT-CAM-1)
**Description**: Implement FT-N-01 — Derkachi replay with `outlier-injection-derkachi` injector (medium density); SUT detects outlier; rejects from anchor; estimate continues from prior valid state; covariance grows monotonically; per-frame error_after_outlier ≤ error_before_outlier + 50 m.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-408
**Component**: Blackbox Tests / Negative (epic AZ-262)
**Tracker**: AZ-424
**Epic**: AZ-262 (E-BBT)
## Problem
Outlier tolerance (AC-3.1) is the project's primary failure-mode robustness measurement. Without this scenario the matcher's outlier-rejection is unmeasured.
## Outcome
- pytest scenario at `e2e/tests/negative/test_ft_n_01_outlier_tolerance.py`.
- Replays Derkachi with the `outlier.py --density medium` injector (every 10th frame replaced).
- For each non-outlier frame: computes `error_per_frame = vincenty(estimate, GT)`; tracks `error_before_outlier` and `error_after_outlier`.
- For each outlier event: asserts `error_after_outlier ≤ error_before_outlier + 50 m`.
- Tracks `cov_semi_major_m` across outlier events; asserts monotonic growth across the event.
## Scope
### Included
- `medium`-density injection.
- Per-frame error computation against GT.
- Per-event drift bound assertion.
- Per-event covariance-monotonic assertion.
### Excluded
- `light` and `heavy` densities — `medium` is the AC-3.1 canonical envelope.
- Tilt envelope assertion (camera ±20°) — derived directly from `RESTRICT-CAM-1` and verified at fixture-validation time, NOT exercised in this scenario.
## Acceptance Criteria
**AC-1: medium-density injection active**
Given the injector is configured `--density medium`
Then ≥10 outlier frames are injected over the Derkachi 8-min replay (replacing 1 frame in 10 of ≈1470 frames gives ≈147 injections).
**AC-2: drift bound per outlier**
Given each outlier event
Then `error_after_outlier ≤ error_before_outlier + 50 m`.
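
A minimal sketch of the AC-2 check, assuming each outlier event has already been reduced to a pair of errors in metres. The `OutlierEvent` record and the `drift_violations` helper are hypothetical names used for illustration, not the evaluator's actual API.

```python
from dataclasses import dataclass

DRIFT_BOUND_M = 50.0  # per-event drift bound from AC-2


@dataclass(frozen=True)
class OutlierEvent:
    """Hypothetical record: errors (metres) around one injected outlier."""
    frame_index: int
    error_before_m: float
    error_after_m: float


def drift_violations(events, bound_m=DRIFT_BOUND_M):
    """Return events whose post-outlier error exceeds pre-outlier error + bound."""
    return [e for e in events if e.error_after_m > e.error_before_m + bound_m]


events = [
    OutlierEvent(10, error_before_m=12.0, error_after_m=40.0),  # within bound
    OutlierEvent(20, error_before_m=15.0, error_after_m=80.0),  # 80 > 15 + 50
]
violations = drift_violations(events)
```

Returning the offending events, rather than a bare boolean, lets the scenario report which frame indices broke the bound.
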
**AC-3: covariance monotonic**
Given the per-frame `cov_semi_major_m` stream
Then for every outlier event, `cov_semi_major_m` is non-decreasing across the 3-frame window centred on the outlier.
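
The 3-frame window check can be sketched as below. This assumes the covariance stream is a plain list indexed by frame and that the window is clipped at the stream boundaries; `window_is_non_decreasing` is a hypothetical helper name.

```python
def window_is_non_decreasing(cov_semi_major_m, outlier_index, half_width=1):
    """Check that covariance never shrinks across the window centred on the outlier.

    half_width=1 gives the 3-frame window (previous, outlier, next).
    Assumption: the window is clipped when the outlier sits at a stream edge.
    """
    lo = max(0, outlier_index - half_width)
    hi = min(len(cov_semi_major_m) - 1, outlier_index + half_width)
    window = cov_semi_major_m[lo:hi + 1]
    return all(a <= b for a, b in zip(window, window[1:]))


cov = [3.0, 3.0, 4.5, 6.0, 5.0, 5.0]
ok_at_2 = window_is_non_decreasing(cov, 2)   # window is [3.0, 4.5, 6.0]
ok_at_3 = window_is_non_decreasing(cov, 3)   # window is [4.5, 6.0, 5.0]
```

Equal consecutive values pass, which matches the "non-decreasing" wording of the criterion.
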
**AC-4: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: outbound estimate stream, FDR per-frame records.
- **Forbidden**: importing C3 matcher state, stubbing the outlier-detection threshold.
## Constraints
- The injector's per-frame replacement is geo-located via the tile-cache GT; ≥99 % of replacements are >350 m from the original frame's GT centre (per AZ-408 AC-2).
## Document Dependencies
- `_docs/02_document/tests/blackbox-tests.md` § FT-N-01
- `_docs/02_document/tests/test-data.md` § Resilience (FT-N-01 row)
@@ -0,0 +1,71 @@
# FT-N-03 — Extended outage triggers operator re-loc request
**Task**: AZ-425_ft_n_03_outage_reloc
**Name**: ≥3 consecutive frames AND ≥2 s without estimate → STATUSTEXT `OPERATOR_RELOC_REQUEST` + `dead_reckoned` propagation (AC-3.4)
**Description**: Implement FT-N-03 — Derkachi replay with synthetic 3-frame outage injector; SUT fails to produce estimates for 3+ frames; after ≥2 s, STATUSTEXT containing `OPERATOR_RELOC_REQUEST` is emitted to mavproxy listener; estimates labeled `dead_reckoned` continue; FC uses last-known + IMU extrapolation.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-408
**Component**: Blackbox Tests / Negative / Resilience (epic AZ-262)
**Tracker**: AZ-425
**Epic**: AZ-262 (E-BBT)
## Problem
The operator must be alerted when the SUT enters a sustained no-estimate state — without a STATUSTEXT-based signal the mission is silently degraded. AC-3.4 is the operator-experience cornerstone of the failure-mode contract.
## Outcome
- pytest scenario at `e2e/tests/negative/test_ft_n_03_outage_reloc.py`.
- Replays Derkachi with a 3-consecutive-frame failure injector (corrupt frames force C3 matcher to fail).
- After 2 s of no SUT estimate: assert STATUSTEXT containing `OPERATOR_RELOC_REQUEST` is captured in mavproxy `.tlog`.
- During outage: assert outbound estimates carry `source_label = dead_reckoned` (FC IMU-extrapolated propagation continues).
## Scope
### Included
- 3-frame failure injector (a thin extension of `injectors/outlier.py` that emits all-zero frames instead of crops).
- mavproxy listener regex match on `OPERATOR_RELOC_REQUEST`.
- `dead_reckoned`-label observation during outage.
### Excluded
- 5/15/35 s blackout-with-spoof — owned by FT-N-04 (AZ-426).
- Multi-segment re-loc — owned by FT-P-08 (AZ-415).
## Acceptance Criteria
**AC-1: outage onset**
Given the injector emits 3 consecutive corrupt frames
Then the SUT fails to produce estimates for ≥3 frames (observable via gap in outbound stream).
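
One way to observe the gap is to scan a per-frame "estimate present" flag for runs of missing frames. The sketch below assumes such a boolean list can be derived from the outbound stream; `find_outage_windows` is a hypothetical helper, not the evaluator's real interface.

```python
def find_outage_windows(has_estimate, min_run=3):
    """Return inclusive (start, end) frame-index pairs for every run of
    missing estimates that is at least min_run frames long."""
    windows = []
    run_start = None
    for i, present in enumerate(has_estimate):
        if not present:
            if run_start is None:
                run_start = i  # a new run of missing frames begins
        else:
            if run_start is not None and i - run_start >= min_run:
                windows.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the stream.
    if run_start is not None and len(has_estimate) - run_start >= min_run:
        windows.append((run_start, len(has_estimate) - 1))
    return windows


flags = [True, False, False, True, False, False, False, True,
         False, False, False, False]
windows = find_outage_windows(flags)
```

The two-frame gap at indices 1 and 2 is ignored because it is shorter than the 3-frame minimum.
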
**AC-2: STATUSTEXT emission**
Given ≥2 s elapses without SUT estimate
Then mavproxy `.tlog` contains a STATUSTEXT with payload matching regex `OPERATOR_RELOC_REQUEST` within 500 ms of the 2 s mark.
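
A minimal sketch of the timing check, assuming the `.tlog` has already been decoded into `(timestamp_s, text)` pairs and that the tolerance is applied symmetrically around the 2 s mark. The helper name and the input shape are assumptions made for illustration.

```python
import re

RELOC_PATTERN = re.compile(r"OPERATOR_RELOC_REQUEST")  # case-sensitive substring
THRESHOLD_S = 2.0   # AC-3.4 threshold
TOLERANCE_S = 0.5   # test tolerance around the threshold


def reloc_request_on_time(outage_start_s, statustexts,
                          threshold_s=THRESHOLD_S, tolerance_s=TOLERANCE_S):
    """Return True if the first matching STATUSTEXT after outage start
    arrives within threshold_s +/- tolerance_s of the outage start."""
    for ts, text in sorted(statustexts):
        if ts >= outage_start_s and RELOC_PATTERN.search(text):
            delay = ts - outage_start_s
            return abs(delay - threshold_s) <= tolerance_s
    return False  # no matching message at all


on_time = reloc_request_on_time(
    10.0, [(11.0, "VISUAL_BLACKOUT_IMU_ONLY"),
           (12.3, "OPERATOR_RELOC_REQUEST: no fix")])
too_late = reloc_request_on_time(10.0, [(13.0, "OPERATOR_RELOC_REQUEST")])
wrong_case = reloc_request_on_time(10.0, [(12.0, "operator_reloc_request")])
```

Only the first matching message is judged, so a late repeat cannot mask an early or missing first emission.
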
**AC-3: dead_reckoned during outage**
Given the outage window
Then outbound emissions during the window carry `source_label = dead_reckoned` (estimates continue, just IMU-extrapolated).
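
The label check reduces to filtering emissions that fall inside the outage window. The sketch assumes emissions are available as `(timestamp_s, source_label)` pairs; `mislabeled_emissions` is a hypothetical helper.

```python
def mislabeled_emissions(emissions, window_start_s, window_end_s,
                         expected_label="dead_reckoned"):
    """Return (timestamp, label) pairs inside the outage window whose
    label differs from the expected one."""
    return [
        (ts, label)
        for ts, label in emissions
        if window_start_s <= ts <= window_end_s and label != expected_label
    ]


emissions = [
    (9.5, "satellite_anchored"),   # before the outage, ignored
    (10.2, "dead_reckoned"),
    (11.0, "satellite_anchored"),  # inside the window with the wrong label
    (12.4, "dead_reckoned"),
]
bad = mislabeled_emissions(emissions, window_start_s=10.0, window_end_s=13.0)
```
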
**AC-4: FC IMU-only continues**
Given the `dead_reckoned` emissions reach the FC
Then the FC's EKF uses last-known + IMU extrapolation (observable via SITL state read; no EKF divergence event).
**AC-5: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: mavproxy listener `.tlog`, outbound message stream, SITL state read.
- **Forbidden**: monkeypatching the `OPERATOR_RELOC_REQUEST` emitter, stubbing the no-estimate detector.
## Constraints
- The STATUSTEXT regex is `OPERATOR_RELOC_REQUEST` (exact substring match within the STATUSTEXT payload; case-sensitive).
- The 2 s threshold is per AC-3.4; the test's tolerance is ±500 ms around that threshold.
## Document Dependencies
- `_docs/02_document/tests/blackbox-tests.md` § FT-N-03
- `_docs/02_document/tests/test-data.md` § Resilience (FT-N-03 row)
@@ -0,0 +1,94 @@
# FT-N-04 — Visual blackout + spoofed GPS combined failsafe
**Task**: AZ-426_ft_n_04_blackout_spoof
**Name**: AC-3.5 + AC-NEW-8 combined failsafe — switch label, reject spoof, propagate, monotonic covariance, STATUSTEXT (AC-3.5, AC-NEW-8)
**Description**: Implement FT-N-04 — three sub-cases at 5 s / 15 s / 35 s blackout windows paired with FC GPS spoof; assert mode transition ≤400 ms or ≤1 frame; spoofed GPS rejected; covariance monotonic; honest `horiz_accuracy`; `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 1-2 Hz; 35 s window adds covariance-threshold escalations.
**Complexity**: 5 points
**Dependencies**: AZ-406, AZ-407, AZ-408
**Component**: Blackbox Tests / Negative / Security (epic AZ-262)
**Tracker**: AZ-426
**Epic**: AZ-262 (E-BBT)
## Problem
The combined "visual-blackout + spoofed-GPS" failsafe is the project's most security-critical degradation mode. AC-3.5 + AC-NEW-8 prescribe a multi-step ladder (label switch, spoof rejection, monotonic covariance, honest reporting, escalation thresholds) that is genuinely hard to validate end-to-end. This must be exercised with synthetic spoof injection.
## Outcome
- pytest scenario at `e2e/tests/negative/test_ft_n_04_blackout_spoof.py` with three sub-tests (5 s, 15 s, 35 s windows) using `blackout_spoof.py`.
- For every sub-case, asserts:
  - Within ≤1 frame OR ≤400 ms of blackout-onset: `source_label = dead_reckoned`; spoofed GPS rejected; covariance grows monotonically.
  - `horiz_accuracy ≥ 0.95 × cov_semi_major_m` (no under-reporting).
  - GCS receives `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 1-2 Hz throughout the blackout.
- For the 35 s window only:
  - When 95 % covariance crosses 100 m → fix-quality degraded ("2D fix or worse" in MAVLink fix_type).
  - When 95 % covariance crosses 500 m OR blackout exceeds 30 s → `horiz_accuracy=999.0` AND `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT.
- After blackout end: recovery only after FC GPS-health stable + non-spoofed for ≥10 s AND a visual/satellite consistency check succeeds.
## Scope
### Included
- All three sub-cases (5 s, 15 s, 35 s).
- Multi-step ladder assertions per sub-case.
- 35 s window covariance-threshold escalations.
- Recovery-gate assertion (≥10 s of stable non-spoofed FC GPS before re-promotion).
### Excluded
- Pure-blackout (no spoof) — owned by NFT-RES-01 / FT-N-03.
- Cache-poisoning safety — owned by NFT-SEC-01.
## Acceptance Criteria
**AC-1: switch latency**
Given any blackout-onset event
Then within ≤1 frame OR ≤400 ms (whichever is shorter for the run's frame rate), `source_label = dead_reckoned`.
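
The "whichever is shorter" rule can be sketched as a deadline that depends on the run's frame rate. Both helper names are hypothetical, and the frame period is assumed to be `1 / frame_rate_hz`.

```python
MAX_SWITCH_LATENCY_S = 0.4  # the 400 ms bound from AC-1


def switch_deadline_s(frame_rate_hz, max_latency_s=MAX_SWITCH_LATENCY_S):
    """Deadline is the shorter of one frame period and the fixed latency bound."""
    return min(1.0 / frame_rate_hz, max_latency_s)


def switched_in_time(onset_s, first_dead_reckoned_s, frame_rate_hz):
    """True if the first dead_reckoned emission lands inside the deadline."""
    latency = first_dead_reckoned_s - onset_s
    return 0.0 <= latency <= switch_deadline_s(frame_rate_hz)


deadline_slow = switch_deadline_s(2.0)   # frame period 0.5 s, so 400 ms wins
deadline_fast = switch_deadline_s(4.0)   # frame period 0.25 s wins
```

At frame rates below 2.5 Hz the 400 ms bound is the binding one; above that, the single-frame bound is tighter.
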
**AC-2: spoof rejection**
Given the FC inbound proxy injects spoofed GPS during the window
Then the SUT does NOT consume the spoofed GPS into the estimator (verifiable via FDR `spoof-rejected` events).
**AC-3: monotonic covariance**
Given the per-frame `cov_semi_major_m` stream within the blackout
Then `cov_semi_major_m` is non-decreasing across consecutive emissions.
**AC-4: honest horiz_accuracy**
Given the outbound `GPS_INPUT.horiz_accuracy` (AP) field
Then `horiz_accuracy ≥ 0.95 × cov_semi_major_m` for every emission within the blackout.
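
A minimal sketch of the under-reporting check, assuming each emission has been paired with the covariance value for the same frame. `under_reported` is a hypothetical helper.

```python
HONESTY_RATIO = 0.95  # from AC-4


def under_reported(emissions, ratio=HONESTY_RATIO):
    """emissions: iterable of (horiz_accuracy_m, cov_semi_major_m) pairs.

    Return the indices where the reported accuracy is smaller than
    ratio * covariance, i.e. where the SUT claims to be better than it is.
    """
    return [i for i, (h, c) in enumerate(emissions) if h < ratio * c]


samples = [
    (12.0, 10.0),    # over-reports uncertainty, acceptable
    (9.0, 10.0),     # 9.0 < 9.5, under-reported
    (999.0, 600.0),  # failsafe value, acceptable
]
offenders = under_reported(samples)
```
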
**AC-5: STATUSTEXT 1-2 Hz**
Given the GCS-side mavproxy capture
Then `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT messages are emitted at a rate in `[1, 2]` Hz throughout the blackout.
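
The criterion does not say how the rate is measured. The sketch below assumes a mean rate computed from the first and last message timestamps in the blackout; a stricter per-gap check would be an equally valid reading. Helper names are hypothetical.

```python
def mean_rate_hz(timestamps_s):
    """Mean message rate: number of intervals divided by the covered span."""
    ts = sorted(timestamps_s)
    if len(ts) < 2:
        return 0.0  # a rate cannot be estimated from fewer than two messages
    span = ts[-1] - ts[0]
    return (len(ts) - 1) / span if span > 0 else float("inf")


def rate_in_band(timestamps_s, lo_hz=1.0, hi_hz=2.0):
    """True if the mean rate lies inside the closed band [lo_hz, hi_hz]."""
    return lo_hz <= mean_rate_hz(timestamps_s) <= hi_hz


steady = [0.0, 0.5, 1.0, 1.5, 2.0]   # 4 intervals over 2 s
sparse = [0.0, 2.0, 4.0]             # 2 intervals over 4 s
chatty = [0.0, 0.25, 0.5]            # 2 intervals over 0.5 s
```
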
**AC-6 (35 s only): 100 m covariance escalation**
Given the 35 s window
When 95 % covariance crosses 100 m
Then outbound MAVLink reports fix-quality degraded (2D fix or worse).
**AC-7 (35 s only): 500 m / 30 s escalation**
When 95 % covariance crosses 500 m OR blackout exceeds 30 s
Then `horiz_accuracy=999.0` AND `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT emitted within ≤500 ms of the threshold crossing.
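
The two 35 s escalation rules (AC-6 and AC-7) can be sketched as a single function that returns the expected escalation level for a given state. "Crosses" is read here as strictly exceeding the threshold, which is an assumption; the `Escalation` enum and the function name are hypothetical.

```python
from enum import Enum


class Escalation(Enum):
    NONE = "none"
    DEGRADED = "degraded"   # AC-6: fix-quality reported as 2D fix or worse
    FAILSAFE = "failsafe"   # AC-7: horiz_accuracy=999.0 + VISUAL_BLACKOUT_FAILSAFE


def expected_escalation(cov95_m, blackout_elapsed_s):
    """Map the 95 % covariance and blackout duration to the expected level.

    The failsafe rule is checked first because it supersedes the degraded state.
    """
    if cov95_m > 500.0 or blackout_elapsed_s > 30.0:
        return Escalation.FAILSAFE
    if cov95_m > 100.0:
        return Escalation.DEGRADED
    return Escalation.NONE


early = expected_escalation(cov95_m=50.0, blackout_elapsed_s=5.0)
mid_window = expected_escalation(cov95_m=150.0, blackout_elapsed_s=10.0)
timed_out = expected_escalation(cov95_m=150.0, blackout_elapsed_s=31.0)
```
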
**AC-8: recovery gate**
Given blackout ends and FC GPS-health is restored
Then recovery to `satellite_anchored` only after: (a) FC GPS-health stable + non-spoofed for ≥10 s AND (b) a visual/satellite consistency check succeeds.
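
The recovery gate is a conjunction of the two conditions above. In this sketch, `gps_healthy_since_s` is assumed to be the wall-clock time at which FC GPS became healthy and non-spoofed (or `None` if it has not), and `recovery_allowed` is a hypothetical helper.

```python
MIN_STABLE_S = 10.0  # required stable, non-spoofed GPS duration from AC-8


def recovery_allowed(gps_healthy_since_s, now_s, consistency_ok,
                     min_stable_s=MIN_STABLE_S):
    """True only if GPS has been stable long enough AND the
    visual/satellite consistency check has succeeded."""
    if gps_healthy_since_s is None:
        return False  # GPS is not currently healthy and non-spoofed
    stable_for_s = now_s - gps_healthy_since_s
    return stable_for_s >= min_stable_s and consistency_ok


too_early = recovery_allowed(100.0, 105.0, consistency_ok=True)
inconsistent = recovery_allowed(100.0, 112.0, consistency_ok=False)
allowed = recovery_allowed(100.0, 112.0, consistency_ok=True)
```

A promotion to `satellite_anchored` observed while this function returns `False` would be a gate violation.
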
**AC-9: parameterization**
Given conftest parameterization
Then all sub-cases run per `(fc_adapter, vio_strategy)`. For AP, signing is in play; `fc_adapter=inav` exercises the same ladder minus the AP-specific signing context.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: blackout_spoof injector (a public-input fault), FDR `spoof-rejected` events, mavproxy STATUSTEXT capture, outbound `horiz_accuracy` field.
- **Forbidden**: stubbing the spoof detector, monkeypatching the source-label state machine.
## Constraints
- All AC numerical thresholds match the AC-3.5 / AC-NEW-8 text exactly; deviation is a real defect signal.
- The recovery-gate ≥10 s is wall-clock measured by the runner.
## Document Dependencies
- `_docs/02_document/tests/blackbox-tests.md` § FT-N-04
- `_docs/02_document/tests/test-data.md` § Resilience (FT-N-04 row)