[autodev] archive batch 86 tasks, advance to batch 87

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-17 17:09:37 +03:00
parent 330893be5c
commit de19e716d8
5 changed files with 3 additions and 3 deletions
@@ -0,0 +1,68 @@
# NFT-RES-01 — IMU-only fallback drift bound
**Task**: AZ-432_nft_res_01_imu_only_fallback
**Name**: 30 s vision blackout drift bound: 100 m without good IMU, 50 m with CombinedImuFactor (AC-3.5, AC-NEW-7)
**Description**: Implement NFT-RES-01: Derkachi replay; inject a 30 s pure-black-frame window (no spoof); measure drift as the distance between the estimate and GT at blackout end.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-408 (blackout_spoof with `--no-spoof` flag)
**Component**: Blackbox Tests / Resilience (epic AZ-262)
**Tracker**: AZ-432
**Epic**: AZ-262 (E-BBT)
## Problem
The 30 s pure-vision-blackout drift bound is the project's IMU-fusion-quality cornerstone. AC-3.5 / AC-NEW-7 prescribe two budgets (100 m without a good IMU, 50 m with CombinedImuFactor), and only by exercising both paths can the project verify the value of IMU fusion.
## Outcome
- pytest scenario at `e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py`. Tier-1 OR Tier-2.
- Two sub-cases:
  - (a) "No good IMU" = SUT runs without IMU input or with IMU disabled in C5.
  - (b) "Good IMU + CombinedImuFactor" = SUT's default config.
- Each sub-case: 30 s blackout (no spoof); measure drift at blackout end vs GT.
- Assert `drift ≤ 100 m` (no IMU) AND `drift ≤ 50 m` (CombinedImuFactor).
## Scope
### Included
- Both sub-cases.
- Per-sub-case drift measurement.
### Excluded
- Spoof-paired blackout — owned by FT-N-04.
- Cumulative-drift over multiple blackouts — owned by FT-P-08.
## Acceptance Criteria
**AC-1: 30 s window injected**
Given the no-spoof blackout injector
Then a 30 s pure-black-frame window is active during the measurement.
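
A minimal sketch of what a 30 s pure-black-frame window could look like on the harness side. The frame representation (a `(timestamp_s, pixels)` tuple with a raw `bytes` buffer) and the name `inject_blackout` are assumptions for illustration; the real injector is the `blackout_spoof` fixture with `--no-spoof`.

```python
def inject_blackout(frames, onset_s, duration_s=30.0):
    """Return a copy of frames with pixels zeroed inside [onset_s, onset_s + duration_s).

    frames: iterable of (timestamp_s, pixels) where pixels is a bytes buffer
    (hypothetical layout, chosen only to keep the sketch self-contained).
    """
    end_s = onset_s + duration_s
    out = []
    for timestamp_s, pixels in frames:
        if onset_s <= timestamp_s < end_s:
            # An all-zero buffer of the same size stands in for a pure black frame.
            pixels = bytes(len(pixels))
        out.append((timestamp_s, pixels))
    return out
```

Frames outside the window pass through untouched, so the replay before onset and after the window stays identical to the unmodified Derkachi input.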
**AC-2: drift bound (no IMU)**
Given sub-case (a)
Then `drift_at_t_end ≤ 100 m`.
**AC-3: drift bound (CombinedImuFactor)**
Given sub-case (b)
Then `drift_at_t_end ≤ 50 m`.
**AC-4: parameterization**
Given conftest parameterization
Then both sub-cases run per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: no-spoof blackout injector, outbound estimate stream, GT from `data_imu.csv`.
- **Forbidden**: stubbing the C5 IMU-fusion path. Sub-case (a) must instead use the SUT's documented "disable IMU" config flag if one exists, or an empty IMU stream from the FC inbound proxy.
## Constraints
- Drift is `vincenty(estimate_at_t_end, GT_at_t_end)`.
- Sub-case (a) MUST exercise a real "no IMU" config path; if the SUT cannot be configured this way, the test fails AC-2.
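
The drift constraint names `vincenty` without naming a library. As a sketch only, here is a pure-Python version of Vincenty's inverse formula on the WGS-84 ellipsoid; `vincenty_m` and `drift_m` are hypothetical helper names, and the project may well use an existing geodesy package instead.

```python
import math

# WGS-84 ellipsoid constants.
WGS84_A = 6378137.0                  # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563        # flattening
WGS84_B = WGS84_A * (1.0 - WGS84_F)  # semi-minor axis, metres


def vincenty_m(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Geodesic distance in metres between two WGS-84 points given in degrees."""
    if (lat1, lon1) == (lat2, lon2):
        return 0.0
    f = WGS84_F
    u1 = math.atan((1.0 - f) * math.tan(math.radians(lat1)))
    u2 = math.atan((1.0 - f) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1.0 - sin_alpha ** 2
        # cos2_alpha is zero only for equatorial lines, where this term vanishes.
        cos_2sm = (cos_sigma - 2.0 * sin_u1 * sin_u2 / cos2_alpha) if cos2_alpha else 0.0
        c = f / 16.0 * cos2_alpha * (4.0 + f * (4.0 - 3.0 * cos2_alpha))
        lam_prev, lam = lam, big_l + (1.0 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (cos_2sm + c * cos_sigma * (2.0 * cos_2sm ** 2 - 1.0)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge")

    usq = cos2_alpha * (WGS84_A ** 2 - WGS84_B ** 2) / WGS84_B ** 2
    a_coef = 1.0 + usq / 16384.0 * (4096.0 + usq * (-768.0 + usq * (320.0 - 175.0 * usq)))
    b_coef = usq / 1024.0 * (256.0 + usq * (-128.0 + usq * (74.0 - 47.0 * usq)))
    d_sigma = b_coef * sin_sigma * (cos_2sm + b_coef / 4.0 * (
        cos_sigma * (2.0 * cos_2sm ** 2 - 1.0)
        - b_coef / 6.0 * cos_2sm * (4.0 * sin_sigma ** 2 - 3.0) * (4.0 * cos_2sm ** 2 - 3.0)))
    return WGS84_B * a_coef * (sigma - d_sigma)


def drift_m(estimate, ground_truth):
    """Drift at t_end: distance between (lat, lon) of the estimate and of GT."""
    return vincenty_m(estimate[0], estimate[1], ground_truth[0], ground_truth[1])
```

A scenario would then compare `drift_m(estimate_at_t_end, gt_at_t_end)` against 100 m or 50 m depending on the sub-case. The iteration can fail to converge for nearly antipodal points, which is not a concern at drift scales of tens of metres.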
## Document Dependencies
- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-01
- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-01 row)
@@ -0,0 +1,64 @@
# NFT-RES-02 — Companion mid-flight reboot recovery
**Task**: AZ-433_nft_res_02_companion_reboot
**Name**: SUT recovers from companion mid-flight reboot via FC pose snapshot (AC-5.2, AC-5.3)
**Description**: Implement NFT-RES-02 — Tier-1 OR Tier-2; mid-flight Derkachi replay; trigger companion-process restart (`docker compose restart gps-denied-onboard` on Tier-1, `systemctl restart gps-denied-onboard` on Tier-2); measure resume time + first-emission accuracy.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Resilience (epic AZ-262)
**Tracker**: AZ-433
**Epic**: AZ-262 (E-BBT)
## Problem
Companion-process reboots are an inevitable operational event. AC-5.2 / AC-5.3 prescribe a recovery contract — without measurement the project has no honest claim of recoverability.
## Outcome
- pytest scenario at `e2e/tests/resilience/test_nft_res_02_companion_reboot.py`. Tier-1 OR Tier-2.
- Mid-flight: trigger `restart` of the SUT process; measure (a) `resume_time` (wall-clock from restart command to first post-restart emission); (b) `first_emission_accuracy` (vincenty distance from first post-restart estimate to GT at that timestamp).
- Assert `resume_time ≤ 30 s` AND `first_emission_accuracy ≤ 100 m`.
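
The `resume_time` measurement above can be sketched as a pure function over captured emission timestamps. The function names are hypothetical, and the sketch assumes all timestamps come from the same host clock and that the SUT emits nothing between the restart command and its actual shutdown; a real harness may need to detect the emission gap explicitly.

```python
from bisect import bisect_right


def first_post_restart_emission(emission_times_s, restart_cmd_time_s):
    """Timestamp of the first emission strictly after the restart command, or None."""
    ordered = sorted(emission_times_s)
    idx = bisect_right(ordered, restart_cmd_time_s)
    return ordered[idx] if idx < len(ordered) else None


def resume_time_s(emission_times_s, restart_cmd_time_s):
    """Seconds from the restart command to the first post-restart emission."""
    first = first_post_restart_emission(emission_times_s, restart_cmd_time_s)
    if first is None:
        raise AssertionError("no emission observed after the restart command")
    return first - restart_cmd_time_s
```

The scenario would assert `resume_time_s(...) <= 30.0` and then evaluate accuracy on the estimate carried by that same first emission.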
## Scope
### Included
- Mid-flight restart trigger.
- Resume-time + accuracy measurement.
### Excluded
- Cold-start TTFF — owned by NFT-PERF-03.
## Acceptance Criteria
**AC-1: restart trigger**
Given a Derkachi replay in progress past the 60 s mark
When the test issues the restart command
Then the SUT process restarts within ≤5 s (Docker / systemd response).
**AC-2: resume time ≤ 30 s**
Given the restart event
Then the first post-restart outbound emission occurs within ≤30 s of the restart command.
**AC-3: first-emission accuracy ≤ 100 m**
Given the first post-restart emission
Then `vincenty(estimate, GT) ≤ 100 m`.
**AC-4: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: Docker / systemd restart commands (public process-management interfaces); SITL outbound capture; GT from `data_imu.csv`.
- **Forbidden**: pre-snapshotting SUT internal state to short-circuit the cold-start.
## Constraints
- The restart command differs by tier; conftest passes the appropriate command per the runner's tier.
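
One way conftest could select the restart command per tier is sketched below. The two commands are the ones named in the task description; the tier keys (`"tier1"`, `"tier2"`), the function names, and the injectable `run` parameter are assumptions for illustration.

```python
import subprocess
import time

# Commands taken from the task description; the dictionary keys are hypothetical.
RESTART_COMMANDS = {
    "tier1": ["docker", "compose", "restart", "gps-denied-onboard"],
    "tier2": ["systemctl", "restart", "gps-denied-onboard"],
}


def restart_command(tier):
    """Return the argv list for the given tier."""
    if tier not in RESTART_COMMANDS:
        raise ValueError(f"unknown tier: {tier!r}")
    return list(RESTART_COMMANDS[tier])


def restart_sut(tier, timeout_s=5.0, run=subprocess.run):
    """Issue the restart and return the time at which the command was issued.

    The 5 s timeout mirrors AC-1. time.time() is used on the assumption that
    outbound emissions are stamped with the same host clock.
    """
    issued_at = time.time()
    run(restart_command(tier), check=True, timeout=timeout_s)
    return issued_at
```

Passing `run` as a parameter keeps the helper testable without Docker or systemd on the machine.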
## Document Dependencies
- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-02
- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-02 row)
@@ -0,0 +1,66 @@
# NFT-RES-03 — Monte Carlo statistical envelope
**Task**: AZ-434_nft_res_03_monte_carlo
**Name**: 100-iteration Monte Carlo with seeded perturbations; 95th-percentile error within 1.96 × cov_semi_major_m (AC-NEW-4)
**Description**: Implement NFT-RES-03: Tier-1 OR Tier-2; 100 Derkachi replays with seeded perturbations (gain noise, IMU bias, frame-drop, outlier injection); for each iteration, record per-frame error and cov_semi_major_m; assert the statistical envelope over the aggregate.
**Complexity**: 5 points
**Dependencies**: AZ-406, AZ-407, AZ-408
**Component**: Blackbox Tests / Resilience (epic AZ-262)
**Tracker**: AZ-434
**Epic**: AZ-262 (E-BBT)
## Problem
A single deterministic run does not validate the SUT's statistical envelope. AC-NEW-4 requires a 100-iteration Monte Carlo with multiple perturbation sources to verify that the reported covariance is *honest* — i.e., the actual error distribution stays within the cov-derived envelope at the 95th percentile.
## Outcome
- pytest scenario at `e2e/tests/resilience/test_nft_res_03_monte_carlo.py`. Tier-1 OR Tier-2.
- 100 Derkachi replays, each with its own perturbations (gain noise, IMU bias, frame-drop pattern, and outlier-injection schedule are all seeded from a single master seed).
- Per iteration: per-frame `error_m` + `cov_semi_major_m`; aggregated globally across 100 × N_frames.
- Assertion: `count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95` (95th-percentile envelope).
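
The envelope assertion above reduces to a coverage fraction over all samples. A minimal sketch, assuming samples arrive as `(error_m, cov_semi_major_m)` pairs; the function name is hypothetical.

```python
def envelope_coverage(samples, k=1.96):
    """Fraction of samples whose error lies within k × cov_semi_major_m.

    samples: iterable of (error_m, cov_semi_major_m) pairs pooled across
    all iterations and frames.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("no samples to evaluate")
    inside = sum(1 for error_m, cov_m in samples if error_m <= k * cov_m)
    return inside / len(samples)
```

The scenario would then assert `envelope_coverage(all_samples) >= 0.95`. Pooling before dividing matches the "aggregated globally" wording, as opposed to averaging per-iteration fractions.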
## Scope
### Included
- 100 iterations driven by the runner.
- Seeded perturbation generation.
- Aggregate envelope assertion.
### Excluded
- Per-component unit-level Monte Carlo — out of scope (unit tests).
## Acceptance Criteria
**AC-1: 100 iterations**
Given the test runs
Then 100 Derkachi replay iterations complete; partial completion is a hard failure (no early-return).
**AC-2: seeded perturbations**
Given the master seed
Then re-running with the same master seed produces bit-identical iteration outcomes (deterministic).
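
A sketch of how every perturbation source could be seeded from one master seed so that reruns reproduce the same perturbations. The hashing scheme, the source labels, and the function names are assumptions; this covers only the perturbation side, and bit-identical outcomes additionally require the SUT itself to behave deterministically.

```python
import hashlib
import random

# Labels for the four perturbation sources listed in the task (names are hypothetical).
PERTURBATION_SOURCES = ("gain_noise", "imu_bias", "frame_drop", "outlier_injection")


def iteration_seed(master_seed, iteration, source):
    """Derive a stable 64-bit seed from (master seed, iteration index, source)."""
    digest = hashlib.sha256(f"{master_seed}:{iteration}:{source}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def perturbation_rngs(master_seed, iteration):
    """One independent random.Random per perturbation source for this iteration."""
    return {
        source: random.Random(iteration_seed(master_seed, iteration, source))
        for source in PERTURBATION_SOURCES
    }
```

Deriving seeds by hashing rather than by drawing them sequentially from one generator means iteration 57 gets the same perturbations whether it runs alone or as part of the full batch.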
**AC-3: 95th-percentile envelope**
Given the global aggregate of 100 × N_frames samples
Then `count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95`.
**AC-4: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`. Note: 100 iterations × 4 parameterizations is expensive — the runner SHOULD run only one canonical parameterization per CI invocation by default and surface "full-matrix" mode behind an env flag.
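
The default-versus-full-matrix behaviour in the note above could be sketched as follows. The environment variable name `NFT_RES_03_FULL_MATRIX` is invented for illustration, and treating the first entry of the matrix as the canonical parameterization is likewise an assumption.

```python
import os

FULL_MATRIX_ENV = "NFT_RES_03_FULL_MATRIX"  # hypothetical flag name


def select_parameterizations(matrix, environ=None):
    """Return the whole matrix in full-matrix mode, else only the canonical entry.

    matrix: sequence of (fc_adapter, vio_strategy) pairs; the first entry is
    treated as canonical (an assumption of this sketch).
    """
    environ = os.environ if environ is None else environ
    matrix = list(matrix)
    if not matrix:
        raise ValueError("parameterization matrix is empty")
    if environ.get(FULL_MATRIX_ENV, "").strip().lower() in {"1", "true", "yes"}:
        return matrix
    return matrix[:1]
```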
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: per-iteration runner orchestration (Docker / systemd lifecycles); per-frame outbound stream + cov_semi_major_m.
- **Forbidden**: mocking C1-C5 to control the per-iteration error directly.
## Constraints
- 100 iterations × 8 min per Derkachi replay is ≥800 min when run sequentially on Tier-1; when the CI host has the capacity, the runner uses 4 parallel Docker compose stacks of 25 iterations each (about 200 min wall-clock). Tier-2 (Jetson) iterations run sequentially.
- Master seed is recorded in the evidence bundle for reproducibility.
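
The wall-clock arithmetic in the first constraint can be checked with a small sharding sketch. The function names are hypothetical, and the estimate ignores per-iteration startup and teardown overhead.

```python
def shard_iterations(n_iterations, n_stacks):
    """Split iteration indices into n_stacks contiguous, near-equal ranges."""
    if n_stacks < 1:
        raise ValueError("need at least one stack")
    base, extra = divmod(n_iterations, n_stacks)
    shards, start = [], 0
    for i in range(n_stacks):
        size = base + (1 if i < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards


def estimated_wall_clock_min(n_iterations, n_stacks, replay_min=8.0):
    """Lower bound on wall-clock minutes: the longest shard times the replay length."""
    return max(len(s) for s in shard_iterations(n_iterations, n_stacks)) * replay_min
```

With 100 iterations this gives 800 min on a single stack and 200 min on four, matching the figures in the constraint.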
## Document Dependencies
- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-03
- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-03 row)
@@ -0,0 +1,59 @@
# NFT-RES-04 — 35 s blackout-with-spoof full escalation ladder
**Task**: AZ-435_nft_res_04_blackout_escalation
**Name**: 35 s blackout + spoof exercises full failsafe escalation ladder (AC-NEW-8 escalation thresholds)
**Description**: Implement NFT-RES-04 — Tier-1 OR Tier-2; 35 s blackout + spoof injection; assert full escalation ladder fires (100 m → 2D fix degrade; 500 m or 30 s → 999.0 horiz_accuracy + VISUAL_BLACKOUT_FAILSAFE).
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-408, AZ-426 (uses same fixture, separate scenario)
**Component**: Blackbox Tests / Resilience / Security (epic AZ-262)
**Tracker**: AZ-435
**Epic**: AZ-262 (E-BBT)
## Problem
While FT-N-04 covers the 35 s window's escalation thresholds, NFT-RES-04 is the resilience-tier scenario specifically tasked with verifying the *full* escalation ladder fires in observable order — covariance crossings → fix-quality degrade → 999.0 horiz_accuracy → VISUAL_BLACKOUT_FAILSAFE STATUSTEXT — under realistic timing.
## Outcome
- pytest scenario at `e2e/tests/resilience/test_nft_res_04_blackout_escalation.py`.
- 35 s blackout + spoof from `blackout_spoof.py --window 35s`.
- Assertions:
  - 100 m covariance threshold → fix-quality degrade (2D fix or worse) within ≤500 ms of crossing.
  - 500 m covariance OR 30 s elapsed → `horiz_accuracy=999.0` AND `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT within ≤500 ms.
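
The second assertion has an either/or trigger, so the reference time for the 500 ms budget is whichever condition occurs first. A sketch under the assumption that the test can timestamp the 500 m covariance crossing and the blackout start on the same clock as the captured outbound messages; the function names are hypothetical.

```python
def failsafe_trigger_time_s(blackout_start_s, cov_500m_cross_s=None, timeout_s=30.0):
    """Earliest of: 500 m covariance crossing, or 30 s elapsed since blackout start."""
    deadline_s = blackout_start_s + timeout_s
    if cov_500m_cross_s is None:  # covariance never reached 500 m
        return deadline_s
    return min(cov_500m_cross_s, deadline_s)


def reacted_in_time(trigger_s, reaction_s, max_latency_s=0.5):
    """True if the reaction was observed no later than max_latency_s after the trigger.

    A missing reaction, or one observed before the trigger, counts as a miss
    in this sketch.
    """
    if reaction_s is None:
        return False
    return 0.0 <= reaction_s - trigger_s <= max_latency_s
```

The same `reacted_in_time` check applies to the 100 m fix-quality degrade, with the 100 m crossing time as the trigger.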
## Scope
### Included
- 35 s window with spoof.
- Both escalation-threshold assertions.
### Excluded
- The non-35-s sub-cases of FT-N-04 — owned by AZ-426.
## Acceptance Criteria
**AC-1: 100 m threshold → fix-quality degrade**
Given the 35 s window
When 95 % covariance crosses 100 m
Then within ≤500 ms, outbound MAVLink reports fix-quality degraded (`GPS_INPUT.fix_type ≤ 2` for AP, equivalent for iNav).
**AC-2: 500 m / 30 s threshold → 999.0 + STATUSTEXT**
Given the 35 s window
When 95 % covariance crosses 500 m OR blackout exceeds 30 s
Then within ≤500 ms: outbound `horiz_accuracy=999.0` AND mavproxy captures STATUSTEXT containing `VISUAL_BLACKOUT_FAILSAFE`.
**AC-3: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
Same boundary as FT-N-04.
## Constraints
- The escalation ladder must fire in the observable order — earlier escalations (100 m fix-degrade) must precede later ones (999.0 + FAILSAFE) in the captured streams.
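
The ordering constraint can be expressed over a merged, timestamped event list built from the captured streams. The event labels (`"fix_degrade"`, `"horiz_accuracy_999"`, `"failsafe_statustext"`) and function names are assumptions for illustration; how the captures are parsed into events is left to the harness.

```python
def first_time(events, kind):
    """Earliest timestamp of the given event kind, or None if it never occurred."""
    times = [t for t, k in events if k == kind]
    return min(times) if times else None


def ladder_in_order(events):
    """True if the fix-quality degrade precedes both late-stage escalations.

    events: iterable of (timestamp_s, kind) tuples from the captured streams.
    """
    events = list(events)
    degrade = first_time(events, "fix_degrade")
    accuracy = first_time(events, "horiz_accuracy_999")
    statustext = first_time(events, "failsafe_statustext")
    if any(t is None for t in (degrade, accuracy, statustext)):
        return False  # a rung of the ladder never fired
    return degrade < accuracy and degrade < statustext
```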
## Document Dependencies
- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-04
- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-04 row)