[autodev] archive batch 86 tasks, advance to batch 87

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 12:51:12 +00:00 · 2026-05-17 17:09:37 +03:00
parent 330893be5c
commit de19e716d8
5 changed files with 3 additions and 3 deletions
@@ -0,0 +1,68 @@
+# NFT-RES-01 — IMU-only fallback drift bound
+
+**Task**: AZ-432_nft_res_01_imu_only_fallback
+**Name**: 30 s vision blackout drift bound: 100 m without good IMU, 50 m with CombinedImuFactor (AC-3.5, AC-NEW-7)
+**Description**: Implement NFT-RES-01: Derkachi replay; inject a 30 s pure-black-frame window (no spoof); measure drift as the distance between the SUT estimate and GT at blackout end.
+**Complexity**: 3 points
+**Dependencies**: AZ-406, AZ-407, AZ-408 (blackout_spoof with `--no-spoof` flag)
+**Component**: Blackbox Tests / Resilience (epic AZ-262)
+**Tracker**: AZ-432
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+The 30 s pure-vision-blackout drift bound is the project's IMU-fusion-quality cornerstone. AC-3.5 / AC-NEW-7 prescribe two budgets (100 m without a good IMU, 50 m with CombinedImuFactor), and only by exercising both paths can the project verify what the IMU fusion contributes.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py`. Tier-1 OR Tier-2.
+- Two sub-cases:
+  - (a) "No good IMU" = SUT runs without IMU input or with IMU disabled in C5.
+  - (b) "Good IMU + CombinedImuFactor" = SUT's default config.
+- Each sub-case: 30 s blackout (no spoof); measure drift at blackout end vs GT.
+- Assert `drift ≤ 100 m` (no IMU) AND `drift ≤ 50 m` (CombinedImuFactor).
+
+## Scope
+
+### Included
+- Both sub-cases.
+- Per-sub-case drift measurement.
+
+### Excluded
+- Spoof-paired blackout — owned by FT-N-04.
+- Cumulative-drift over multiple blackouts — owned by FT-P-08.
+
+## Acceptance Criteria
+
+**AC-1: 30 s window injected**
+Given the no-spoof blackout injector
+Then a 30 s pure-black-frame window is active during the measurement.
+
+**AC-2: drift bound (no IMU)**
+Given sub-case (a)
+Then `drift_at_t_end ≤ 100 m`.
+
+**AC-3: drift bound (CombinedImuFactor)**
+Given sub-case (b)
+Then `drift_at_t_end ≤ 50 m`.
+
+**AC-4: parameterization**
+Given conftest parameterization
+Then both sub-cases run per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+End-to-end through public boundaries.
+
+- **Allowed**: no-spoof blackout injector, outbound estimate stream, GT from `data_imu.csv`.
+- **Forbidden**: stubbing the C5 IMU-fusion path. Sub-case (a) must instead use the SUT's documented "disable IMU" config flag if one exists, or an empty IMU stream from the FC inbound proxy.
+
+## Constraints
+
+- Drift is `vincenty(estimate_at_t_end, GT_at_t_end)`.
+- Sub-case (a) MUST exercise a real "no IMU" config path; if the SUT cannot be configured this way, the test fails AC-2.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-01
+- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-01 row)
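
The NFT-RES-01 constraint `drift = vincenty(estimate_at_t_end, GT_at_t_end)` and the two budgets from AC-2 / AC-3 can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes estimates and GT are available as `(lat_deg, lon_deg)` pairs on the WGS-84 ellipsoid, and the names `vincenty_m`, `drift_within_budget`, and the sub-case keys are hypothetical.

```python
import math

# WGS-84 ellipsoid constants.
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563
WGS84_B = (1 - WGS84_F) * WGS84_A

# Budgets from AC-2 / AC-3; the dictionary keys are hypothetical sub-case ids.
DRIFT_BUDGET_M = {"no_imu": 100.0, "combined_imu_factor": 50.0}


def vincenty_m(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Geodesic distance in metres (Vincenty inverse formula, WGS-84)."""
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    f, a, b = WGS84_F, WGS84_A, WGS84_B
    u1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # On an equatorial line cos2_alpha is 0 and the term is defined as 0.
        cos_2sm = (cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha) if cos2_alpha else 0.0
        c = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_new = big_l + (1 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (cos_2sm + c * cos_sigma * (2 * cos_2sm ** 2 - 1)))
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    else:
        raise ValueError("Vincenty iteration did not converge")

    usq = cos2_alpha * (a * a - b * b) / (b * b)
    big_a = 1 + usq / 16384 * (4096 + usq * (-768 + usq * (320 - 175 * usq)))
    big_b = usq / 1024 * (256 + usq * (-128 + usq * (74 - 47 * usq)))
    d_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (2 * cos_2sm ** 2 - 1)
        - big_b / 6 * cos_2sm * (4 * sin_sigma ** 2 - 3) * (4 * cos_2sm ** 2 - 3)))
    return b * big_a * (sigma - d_sigma)


def drift_within_budget(sub_case, estimate, gt):
    """Return (drift_m, passed) for one sub-case at blackout end."""
    drift = vincenty_m(estimate[0], estimate[1], gt[0], gt[1])
    return drift, drift <= DRIFT_BUDGET_M[sub_case]
```

A scenario would call `drift_within_budget` once per sub-case with the last estimate inside the blackout window and the GT row at the same timestamp; how those two samples are time-aligned is left to the harness.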
@@ -0,0 +1,64 @@
+# NFT-RES-02 — Companion mid-flight reboot recovery
+
+**Task**: AZ-433_nft_res_02_companion_reboot
+**Name**: SUT recovers from companion mid-flight reboot via FC pose snapshot (AC-5.2, AC-5.3)
+**Description**: Implement NFT-RES-02 — Tier-1 OR Tier-2; mid-flight Derkachi replay; trigger companion-process restart (`docker compose restart gps-denied-onboard` on Tier-1, `systemctl restart gps-denied-onboard` on Tier-2); measure resume time + first-emission accuracy.
+**Complexity**: 3 points
+**Dependencies**: AZ-406, AZ-407
+**Component**: Blackbox Tests / Resilience (epic AZ-262)
+**Tracker**: AZ-433
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Companion-process reboots are an inevitable operational event. AC-5.2 / AC-5.3 prescribe a recovery contract — without measurement the project has no honest claim of recoverability.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/resilience/test_nft_res_02_companion_reboot.py`. Tier-1 OR Tier-2.
+- Mid-flight: trigger `restart` of the SUT process; measure (a) `resume_time` (wall-clock from restart command to first post-restart emission); (b) `first_emission_accuracy` (vincenty distance from first post-restart estimate to GT at that timestamp).
+- Assert `resume_time ≤ 30 s` AND `first_emission_accuracy ≤ 100 m`.
+
+## Scope
+
+### Included
+- Mid-flight restart trigger.
+- Resume-time + accuracy measurement.
+
+### Excluded
+- Cold-start TTFF — owned by NFT-PERF-03.
+
+## Acceptance Criteria
+
+**AC-1: restart trigger**
+Given a Derkachi replay in progress past the 60 s mark
+When the test issues the restart command
+Then the SUT process restarts within ≤5 s (Docker / systemd response).
+
+**AC-2: resume time ≤ 30 s**
+Given the restart event
+Then the first post-restart outbound emission occurs within ≤30 s of the restart command.
+
+**AC-3: first-emission accuracy ≤ 100 m**
+Given the first post-restart emission
+Then `vincenty(estimate, GT) ≤ 100 m`.
+
+**AC-4: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+End-to-end through public boundaries.
+
+- **Allowed**: Docker / systemd restart commands (public process-management interfaces); SITL outbound capture; GT from `data_imu.csv`.
+- **Forbidden**: pre-snapshotting SUT internal state to short-circuit the cold-start.
+
+## Constraints
+
+- The restart command differs by tier; conftest passes the appropriate command per the runner's tier.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-02
+- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-02 row)
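
The NFT-RES-02 measurement of `resume_time` (wall-clock from restart command to first post-restart emission) with a tier-specific restart command could look roughly like this. The two command lines are quoted from the task description; everything else, including the tier keys, `measure_resume_time`, and the `first_emission_after` callback, is a hypothetical shape for the harness rather than an existing API.

```python
import subprocess
import time

# Restart commands as given in the task description; the tier keys are hypothetical.
RESTART_COMMANDS = {
    "tier1": ["docker", "compose", "restart", "gps-denied-onboard"],
    "tier2": ["systemctl", "restart", "gps-denied-onboard"],
}
RESUME_BUDGET_S = 30.0  # AC-2


def restart_via_subprocess(tier):
    """Issue the tier's restart command; a non-zero exit raises CalledProcessError."""
    subprocess.run(RESTART_COMMANDS[tier], check=True, timeout=60)


def measure_resume_time(issue_restart, first_emission_after, clock=time.monotonic):
    """Seconds from issuing the restart to the first post-restart emission.

    `issue_restart` is a zero-argument callable that triggers the restart.
    `first_emission_after(t_cmd, timeout_s)` is assumed to return the timestamp
    (on the same clock) of the first outbound emission produced by the
    restarted process, or to raise TimeoutError.
    """
    t_cmd = clock()  # the interval starts when the command is issued
    issue_restart()
    t_emit = first_emission_after(t_cmd, timeout_s=RESUME_BUDGET_S)
    return t_emit - t_cmd
```

In a scenario this would be used as `resume = measure_resume_time(lambda: restart_via_subprocess(tier), capture.first_emission_after)` followed by `assert resume <= RESUME_BUDGET_S`, where `capture` stands for whatever object wraps the SITL outbound capture. Telling a post-restart emission apart from one sent just before the process stopped is the capture helper's job and is not shown here.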
@@ -0,0 +1,66 @@
+# NFT-RES-03 — Monte Carlo statistical envelope
+
+**Task**: AZ-434_nft_res_03_monte_carlo
+**Name**: 100-iteration Monte Carlo with seeded perturbations; 95th-percentile error within 1.96 × cov_semi_major_m (AC-NEW-4)
+**Description**: Implement NFT-RES-03 — Tier-1 OR Tier-2; 100 Derkachi replays with seeded perturbations (gain noise, IMU bias, frame-drop, outlier injection); per-iteration record per-frame error + cov_semi_major_m; statistical envelope assertion.
+**Complexity**: 5 points
+**Dependencies**: AZ-406, AZ-407, AZ-408
+**Component**: Blackbox Tests / Resilience (epic AZ-262)
+**Tracker**: AZ-434
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+A single deterministic run does not validate the SUT's statistical envelope. AC-NEW-4 requires a 100-iteration Monte Carlo with multiple perturbation sources to verify that the reported covariance is *honest* — i.e., the actual error distribution stays within the cov-derived envelope at the 95th percentile.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/resilience/test_nft_res_03_monte_carlo.py`. Tier-1 OR Tier-2.
+- 100 Derkachi replays, each with perturbations (gain noise, IMU bias, frame-drop pattern, outlier-injection schedule) seeded from a single master seed.
+- Per iteration: per-frame `error_m` + `cov_semi_major_m`; aggregated globally across 100 × N_frames.
+- Assertion: `count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95` (95th-percentile envelope).
+
+## Scope
+
+### Included
+- 100 iterations driven by the runner.
+- Seeded perturbation generation.
+- Aggregate envelope assertion.
+
+### Excluded
+- Per-component unit-level Monte Carlo — out of scope (unit tests).
+
+## Acceptance Criteria
+
+**AC-1: 100 iterations**
+Given the test runs
+Then 100 Derkachi replay iterations complete; partial completion is a hard failure (no early-return).
+
+**AC-2: seeded perturbations**
+Given the master seed
+Then re-running with the same master seed produces bit-identical iteration outcomes (deterministic).
+
+**AC-3: 95th-percentile envelope**
+Given the global aggregate of 100 × N_frames samples
+Then `count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95`.
+
+**AC-4: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`. Note: 100 iterations × 4 parameterizations is expensive — the runner SHOULD run only one canonical parameterization per CI invocation by default and surface "full-matrix" mode behind an env flag.
+
+## System Under Test Boundary
+
+End-to-end through public boundaries.
+
+- **Allowed**: per-iteration runner orchestration (Docker / systemd lifecycles); per-frame outbound stream + cov_semi_major_m.
+- **Forbidden**: mocking C1-C5 to control the per-iteration error directly.
+
+## Constraints
+
+- 100 iterations × 8 min Derkachi replay means at least 800 min when run sequentially on Tier-1; the runner uses 4 parallel Docker compose stacks (25 iterations each, about 200 min wall-clock) when the CI host has the capacity. Tier-2 (Jetson) iterations run sequentially.
+- Master seed is recorded in the evidence bundle for reproducibility.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-03
+- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-03 row)
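
For NFT-RES-03, the master-seed requirement (AC-2) and the envelope assertion (AC-3) can be sketched in a few lines. This is an illustration under assumptions: the hash-based seed derivation, the function names, and the example `sigma` are not taken from the spec, and only the perturbation inputs are shown to be reproducible here, not the SUT's outputs.

```python
import hashlib
import random


def iteration_seed(master_seed, iteration, source):
    """Derive a stable 64-bit seed for one perturbation source in one iteration."""
    key = f"{master_seed}:{iteration}:{source}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def imu_bias_for(master_seed, iteration, sigma=0.01):
    """Example perturbation: a 3-axis bias drawn from a per-iteration RNG.

    `sigma` is an illustrative value, not one prescribed by the spec.
    """
    rng = random.Random(iteration_seed(master_seed, iteration, "imu_bias"))
    return [rng.gauss(0.0, sigma) for _ in range(3)]


def envelope_coverage(samples, k=1.96):
    """Fraction of (error_m, cov_semi_major_m) samples with error_m <= k * cov."""
    samples = list(samples)
    if not samples:
        # An empty aggregate must not pass silently (see AC-1).
        raise ValueError("no samples collected")
    inside = sum(1 for error_m, cov_m in samples if error_m <= k * cov_m)
    return inside / len(samples)
```

The final check over the global aggregate of 100 × N_frames samples would then read `assert envelope_coverage(all_samples) >= 0.95`. Deriving each source's seed from `(master_seed, iteration, source)` rather than drawing seeds sequentially keeps an iteration's perturbations the same regardless of which parallel stack runs it or in what order.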
@@ -0,0 +1,59 @@
+# NFT-RES-04 — 35 s blackout-with-spoof full escalation ladder
+
+**Task**: AZ-435_nft_res_04_blackout_escalation
+**Name**: 35 s blackout + spoof exercises full failsafe escalation ladder (AC-NEW-8 escalation thresholds)
+**Description**: Implement NFT-RES-04 — Tier-1 OR Tier-2; 35 s blackout + spoof injection; assert full escalation ladder fires (100 m → 2D fix degrade; 500 m or 30 s → 999.0 horiz_accuracy + VISUAL_BLACKOUT_FAILSAFE).
+**Complexity**: 3 points
+**Dependencies**: AZ-406, AZ-407, AZ-408, AZ-426 (uses same fixture, separate scenario)
+**Component**: Blackbox Tests / Resilience / Security (epic AZ-262)
+**Tracker**: AZ-435
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+While FT-N-04 covers the 35 s window's escalation thresholds, NFT-RES-04 is the resilience-tier scenario specifically tasked with verifying the *full* escalation ladder fires in observable order — covariance crossings → fix-quality degrade → 999.0 horiz_accuracy → VISUAL_BLACKOUT_FAILSAFE STATUSTEXT — under realistic timing.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/resilience/test_nft_res_04_blackout_escalation.py`.
+- 35 s blackout + spoof from `blackout_spoof.py --window 35s`.
+- Assertions:
+  - 100 m covariance threshold → fix-quality degrade (2D fix or worse) within ≤500 ms of crossing.
+  - 500 m covariance OR 30 s elapsed → `horiz_accuracy=999.0` AND `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT within ≤500 ms.
+
+## Scope
+
+### Included
+- 35 s window with spoof.
+- Both escalation-threshold assertions.
+
+### Excluded
+- The non-35-s sub-cases of FT-N-04 — owned by AZ-426.
+
+## Acceptance Criteria
+
+**AC-1: 100 m threshold → fix-quality degrade**
+Given the 35 s window
+When 95 % covariance crosses 100 m
+Then within ≤500 ms, outbound MAVLink reports fix-quality degraded (`GPS_INPUT.fix_type ≤ 2` for AP, equivalent for iNav).
+
+**AC-2: 500 m / 30 s threshold → 999.0 + STATUSTEXT**
+Given the 35 s window
+When 95 % covariance crosses 500 m OR blackout exceeds 30 s
+Then within ≤500 ms: outbound `horiz_accuracy=999.0` AND mavproxy captures STATUSTEXT containing `VISUAL_BLACKOUT_FAILSAFE`.
+
+**AC-3: parameterization**
+Given conftest parameterization
+Then the scenario runs per `(fc_adapter, vio_strategy)`.
+
+## System Under Test Boundary
+
+Same boundary as FT-N-04.
+
+## Constraints
+
+- The escalation ladder must fire in the observable order — earlier escalations (100 m fix-degrade) must precede later ones (999.0 + FAILSAFE) in the captured streams.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-04
+- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-04 row)
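
The NFT-RES-04 ladder checks (≤500 ms latency per rung, plus the ordering constraint) can be expressed over a list of timestamped capture events. The sketch below is one possible reading of the spec: the event labels, `Event`, and `check_ladder` are hypothetical, and it assumes the 100 m crossing is always observed and that a reaction logged before its trigger counts as a violation.

```python
from dataclasses import dataclass

LATENCY_BUDGET_S = 0.5      # "within <=500 ms" in AC-1 / AC-2
BLACKOUT_TIMEOUT_S = 30.0   # time-based trigger in AC-2


@dataclass(frozen=True)
class Event:
    t: float   # capture timestamp in seconds
    kind: str  # hypothetical label assigned by the capture parser


def first(events, kind):
    """Timestamp of the earliest event of the given kind, or None."""
    times = [e.t for e in events if e.kind == kind]
    return min(times) if times else None


def check_ladder(events, blackout_start):
    """Return a list of violations; an empty list means the ladder fired as specified."""
    problems = []
    cov100 = first(events, "cov_cross_100m")
    degrade = first(events, "fix_degraded")
    cov500 = first(events, "cov_cross_500m")
    reactions = {
        "horiz_accuracy_999": first(events, "horiz_accuracy_999"),
        "statustext_failsafe": first(events, "statustext_failsafe"),
    }

    # Rung 1: fix-quality degrade within the budget of the 100 m crossing.
    if cov100 is None or degrade is None:
        problems.append("rung 1: crossing or degrade missing")
    elif not 0.0 <= degrade - cov100 <= LATENCY_BUDGET_S:
        problems.append("rung 1: degrade outside latency budget")

    # Rung 2: triggered by the 500 m crossing or 30 s elapsed, whichever is first.
    trigger = blackout_start + BLACKOUT_TIMEOUT_S
    if cov500 is not None:
        trigger = min(trigger, cov500)
    for name, t in reactions.items():
        if t is None:
            problems.append(f"rung 2: {name} missing")
        elif not 0.0 <= t - trigger <= LATENCY_BUDGET_S:
            problems.append(f"rung 2: {name} outside latency budget")

    # Ordering: the fix degrade must precede both rung-2 reactions.
    seen = [t for t in reactions.values() if t is not None]
    if degrade is not None and seen and degrade > min(seen):
        problems.append("order: rung 2 fired before rung 1")
    return problems
```

A scenario would build `events` from the outbound MAVLink capture and the mavproxy STATUSTEXT log, then assert `check_ladder(events, blackout_start) == []` so that a failure message lists every violated rung at once.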