[autodev] archive batch 86 tasks, advance to batch 87

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 14:11:13 +00:00 · 2026-05-17 17:09:37 +03:00
parent 330893be5c
commit de19e716d8
5 changed files with 3 additions and 3 deletions
@@ -1,68 +0,0 @@
-# NFT-RES-01 — IMU-only fallback drift bound
-
-**Task**: AZ-432_nft_res_01_imu_only_fallback
-**Name**: 30 s vision blackout drift bound: 100 m without good IMU, 50 m with CombinedImuFactor (AC-3.5, AC-NEW-7)
-**Description**: Implement NFT-RES-01: Derkachi replay; inject a 30 s pure-black-frame window (no spoof); measure the drift of the estimate at blackout end against GT at the same timestamp.
-**Complexity**: 3 points
-**Dependencies**: AZ-406, AZ-407, AZ-408 (blackout_spoof with `--no-spoof` flag)
-**Component**: Blackbox Tests / Resilience (epic AZ-262)
-**Tracker**: AZ-432
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-The 30 s pure-vision-blackout drift bound is the cornerstone of the project's IMU-fusion quality claims. AC-3.5 / AC-NEW-7 prescribe two budgets (100 m without a good IMU, 50 m with CombinedImuFactor), and the project can verify the value of IMU fusion only by exercising both paths.
-
-## Outcome
-
- pytest scenario at `e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py`. Tier-1 OR Tier-2.
- Two sub-cases:
-  - (a) "No good IMU" = SUT runs without IMU input or with IMU disabled in C5.
-  - (b) "Good IMU + CombinedImuFactor" = SUT's default config.
- Each sub-case: 30 s blackout (no spoof); measure drift at blackout end vs GT.
- Assert `drift ≤ 100 m` (no IMU) AND `drift ≤ 50 m` (CombinedImuFactor).
-
-## Scope
-
-### Included
- Both sub-cases.
- Per-sub-case drift measurement.
-
-### Excluded
- Spoof-paired blackout — owned by FT-N-04.
- Cumulative-drift over multiple blackouts — owned by FT-P-08.
-
-## Acceptance Criteria
-
-**AC-1: 30 s window injected**
-Given the no-spoof blackout injector
-Then a 30 s pure-black-frame window is active during the measurement.
-
-**AC-2: drift bound (no IMU)**
-Given sub-case (a)
-Then `drift_at_t_end ≤ 100 m`.
-
-**AC-3: drift bound (CombinedImuFactor)**
-Given sub-case (b)
-Then `drift_at_t_end ≤ 50 m`.
-
-**AC-4: parameterization**
-Given conftest parameterization
-Then both sub-cases run per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-End-to-end through public boundaries.
-
- **Allowed**: no-spoof blackout injector, outbound estimate stream, GT from `data_imu.csv`.
- **Forbidden**: stubbing the C5 IMU-fusion path. Sub-case (a) must instead use the SUT's documented "disable IMU" config flag if one exists, or an empty IMU stream from the FC inbound proxy.
-
-## Constraints
-
- Drift is `vincenty(estimate_at_t_end, GT_at_t_end)`.
- Sub-case (a) MUST exercise a real "no IMU" config path; if the SUT cannot be configured this way, the test fails AC-2.
-
-## Document Dependencies
-
- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-01
- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-01 row)
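
Note on the archived NFT-RES-01 spec: its Constraints define drift as `vincenty(estimate_at_t_end, GT_at_t_end)` and its acceptance criteria set budgets of 100 m (no IMU) and 50 m (CombinedImuFactor). The following is a minimal sketch of that check, assuming WGS-84 coordinates in decimal degrees. The names `vincenty_m`, `DRIFT_BUDGET_M`, and `drift_within_budget` are hypothetical and do not come from the repository; the real test may use a library implementation instead.

```python
import math

# WGS-84 ellipsoid constants.
A_M = 6378137.0
F = 1 / 298.257223563
B_M = A_M * (1 - F)

# Budgets from AC-2 / AC-3; the dict keys are illustrative labels only.
DRIFT_BUDGET_M = {"no_imu": 100.0, "combined_imu_factor": 50.0}


def vincenty_m(lat1, lon1, lat2, lon2, tol=1e-12, max_iter=200):
    """Vincenty inverse formula: ellipsoidal distance in metres (inputs in degrees)."""
    if (lat1, lon1) == (lat2, lon2):
        return 0.0
    u1 = math.atan((1 - F) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - F) * math.tan(math.radians(lat2)))
    dlon = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)
    lam = dlon
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1.0 - sin_alpha ** 2
        # cos(2 * sigma_m); taken as 0 on equatorial lines where cos2_alpha == 0.
        cos_2sm = cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha if cos2_alpha else 0.0
        c = F / 16 * cos2_alpha * (4 + F * (4 - 3 * cos2_alpha))
        lam_next = dlon + (1 - c) * F * sin_alpha * (
            sigma + c * sin_sigma * (cos_2sm + c * cos_sigma * (2 * cos_2sm ** 2 - 1)))
        converged = abs(lam_next - lam) < tol
        lam = lam_next
        if converged:
            break
    else:
        raise ValueError("Vincenty iteration did not converge")
    k2 = cos2_alpha * (A_M ** 2 - B_M ** 2) / B_M ** 2
    big_a = 1 + k2 / 16384 * (4096 + k2 * (-768 + k2 * (320 - 175 * k2)))
    big_b = k2 / 1024 * (256 + k2 * (-128 + k2 * (74 - 47 * k2)))
    d_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (2 * cos_2sm ** 2 - 1)
        - big_b / 6 * cos_2sm * (4 * sin_sigma ** 2 - 3) * (4 * cos_2sm ** 2 - 3)))
    return B_M * big_a * (sigma - d_sigma)


def drift_within_budget(estimate, ground_truth, sub_case):
    """Return (passed, drift_m) for (lat, lon) pairs sampled at blackout end."""
    drift = vincenty_m(*estimate, *ground_truth)
    return drift <= DRIFT_BUDGET_M[sub_case], drift
```

Keeping the budget lookup separate from the distance function lets the same helper serve both sub-cases under pytest parameterization.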
@@ -1,64 +0,0 @@
-# NFT-RES-02 — Companion mid-flight reboot recovery
-
-**Task**: AZ-433_nft_res_02_companion_reboot
-**Name**: SUT recovers from companion mid-flight reboot via FC pose snapshot (AC-5.2, AC-5.3)
-**Description**: Implement NFT-RES-02 — Tier-1 OR Tier-2; mid-flight Derkachi replay; trigger companion-process restart (`docker compose restart gps-denied-onboard` on Tier-1, `systemctl restart gps-denied-onboard` on Tier-2); measure resume time + first-emission accuracy.
-**Complexity**: 3 points
-**Dependencies**: AZ-406, AZ-407
-**Component**: Blackbox Tests / Resilience (epic AZ-262)
-**Tracker**: AZ-433
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-Companion-process reboots are an inevitable operational event. AC-5.2 / AC-5.3 prescribe a recovery contract — without measurement the project has no honest claim of recoverability.
-
-## Outcome
-
- pytest scenario at `e2e/tests/resilience/test_nft_res_02_companion_reboot.py`. Tier-1 OR Tier-2.
- Mid-flight: trigger `restart` of the SUT process; measure (a) `resume_time` (wall-clock from restart command to first post-restart emission); (b) `first_emission_accuracy` (vincenty distance from first post-restart estimate to GT at that timestamp).
- Assert `resume_time ≤ 30 s` AND `first_emission_accuracy ≤ 100 m`.
-
-## Scope
-
-### Included
- Mid-flight restart trigger.
- Resume-time + accuracy measurement.
-
-### Excluded
- Cold-start TTFF — owned by NFT-PERF-03.
-
-## Acceptance Criteria
-
-**AC-1: restart trigger**
-Given a Derkachi replay in progress past the 60 s mark
-When the test issues the restart command
-Then the SUT process restarts within 5 s (Docker / systemd response).
-
-**AC-2: resume time ≤ 30 s**
-Given the restart event
-Then the first post-restart outbound emission occurs within 30 s of the restart command.
-
-**AC-3: first-emission accuracy ≤ 100 m**
-Given the first post-restart emission
-Then `vincenty(estimate, GT) ≤ 100 m`.
-
-**AC-4: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-End-to-end through public boundaries.
-
- **Allowed**: Docker / systemd restart commands (public process-management interfaces); SITL outbound capture; GT from `data_imu.csv`.
- **Forbidden**: pre-snapshotting SUT internal state to short-circuit the cold-start.
-
-## Constraints
-
- The restart command differs by tier; conftest passes the appropriate command per the runner's tier.
-
-## Document Dependencies
-
- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-02
- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-02 row)
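
Note on the archived NFT-RES-02 spec: it measures `resume_time` from the restart command to the first post-restart emission and asserts `resume_time ≤ 30 s` and `first_emission_accuracy ≤ 100 m`. The sketch below shows one way to compute this from captured timestamps. It assumes the harness can observe when the restarted process came back up (`process_up_ts`), so that emissions flushed just before shutdown are not mistaken for post-restart ones. That parameter, the tier labels, and all function names are hypothetical; only the two restart commands are taken from the task description.

```python
from typing import Optional, Sequence

# Restart commands quoted in the task description, keyed by hypothetical tier labels.
RESTART_COMMANDS = {
    "tier1": ["docker", "compose", "restart", "gps-denied-onboard"],
    "tier2": ["systemctl", "restart", "gps-denied-onboard"],
}

RESUME_BUDGET_S = 30.0      # AC-2
ACCURACY_BUDGET_M = 100.0   # AC-3


def first_post_restart_emission(process_up_ts: float,
                                emission_ts: Sequence[float]) -> Optional[float]:
    """Earliest emission at or after the moment the restarted process came up."""
    later = [t for t in emission_ts if t >= process_up_ts]
    return min(later) if later else None


def resume_time_s(restart_cmd_ts: float,
                  process_up_ts: float,
                  emission_ts: Sequence[float]) -> Optional[float]:
    """Wall-clock seconds from the restart command to the first post-restart emission."""
    first = first_post_restart_emission(process_up_ts, emission_ts)
    return None if first is None else first - restart_cmd_ts


def recovery_ok(resume_s: Optional[float], first_emission_error_m: float) -> bool:
    """Combined AC-2 / AC-3 check; a missing emission counts as a failure."""
    return (resume_s is not None
            and resume_s <= RESUME_BUDGET_S
            and first_emission_error_m <= ACCURACY_BUDGET_M)
```

A conftest fixture could pass `RESTART_COMMANDS[tier]` to `subprocess.run` and record the wall-clock time just before the call as `restart_cmd_ts`.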
@@ -1,66 +0,0 @@
-# NFT-RES-03 — Monte Carlo statistical envelope
-
-**Task**: AZ-434_nft_res_03_monte_carlo
-**Name**: 100-iteration Monte Carlo with seeded perturbations; 95th-percentile error within 1.96 × cov_semi_major_m (AC-NEW-4)
-**Description**: Implement NFT-RES-03: Tier-1 OR Tier-2; 100 Derkachi replays with seeded perturbations (gain noise, IMU bias, frame-drop, outlier injection); each iteration records per-frame error and cov_semi_major_m; the statistical envelope is asserted over the aggregate.
-**Complexity**: 5 points
-**Dependencies**: AZ-406, AZ-407, AZ-408
-**Component**: Blackbox Tests / Resilience (epic AZ-262)
-**Tracker**: AZ-434
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-A single deterministic run does not validate the SUT's statistical envelope. AC-NEW-4 requires a 100-iteration Monte Carlo with multiple perturbation sources to verify that the reported covariance is *honest* — i.e., the actual error distribution stays within the cov-derived envelope at the 95th percentile.
-
-## Outcome
-
- pytest scenario at `e2e/tests/resilience/test_nft_res_03_monte_carlo.py`. Tier-1 OR Tier-2.
- 100 Derkachi replays, each with perturbations (gain noise, IMU bias, frame-drop pattern, outlier-injection schedule) seeded deterministically from a single master seed.
- Per iteration: per-frame `error_m` + `cov_semi_major_m`; aggregated globally across 100 × N_frames.
- Assertion: `count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95` (95th-percentile envelope).
-
-## Scope
-
-### Included
- 100 iterations driven by the runner.
- Seeded perturbation generation.
- Aggregate envelope assertion.
-
-### Excluded
- Per-component unit-level Monte Carlo — out of scope (unit tests).
-
-## Acceptance Criteria
-
-**AC-1: 100 iterations**
-Given the test runs
-Then 100 Derkachi replay iterations complete; partial completion is a hard failure (no early-return).
-
-**AC-2: seeded perturbations**
-Given the master seed
-Then re-running with the same master seed produces bit-identical iteration outcomes (deterministic).
-
-**AC-3: 95th-percentile envelope**
-Given the global aggregate of 100 × N_frames samples
-Then `count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95`.
-
-**AC-4: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`. Note: 100 iterations × 4 parameterizations is expensive — the runner SHOULD run only one canonical parameterization per CI invocation by default and surface "full-matrix" mode behind an env flag.
-
-## System Under Test Boundary
-
-End-to-end through public boundaries.
-
- **Allowed**: per-iteration runner orchestration (Docker / systemd lifecycles); per-frame outbound stream + cov_semi_major_m.
- **Forbidden**: mocking C1-C5 to control the per-iteration error directly.
-
-## Constraints
-
- 100 iterations × 8 min Derkachi replay is at least 800 min when run sequentially on Tier-1; the runner uses 4 parallel Docker compose stacks (25 iterations each, about 200 min wall-clock) when the CI host has the capacity. Tier-2 (Jetson) iterations run sequentially.
- Master seed is recorded in the evidence bundle for reproducibility.
-
-## Document Dependencies
-
- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-03
- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-03 row)
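
Note on the archived NFT-RES-03 spec: AC-2 requires that all perturbations derive from one master seed, and AC-3 asserts `count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95` over the global aggregate. The sketch below illustrates both pieces under stated assumptions. The SHA-256 seed derivation, the perturbation parameter names, and their value ranges are placeholders chosen for illustration and are not taken from the repository.

```python
import hashlib
import random
from typing import Dict, Iterable, Tuple

N_ITERATIONS = 100          # AC-1
ENVELOPE_K = 1.96           # multiplier from AC-3
MIN_INSIDE_FRACTION = 0.95  # AC-3


def iteration_seed(master_seed: int, iteration: int) -> int:
    """Derive a stable per-iteration seed (independent of PYTHONHASHSEED)."""
    digest = hashlib.sha256(f"{master_seed}:{iteration}".encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def draw_perturbations(seed: int) -> Dict[str, float]:
    """Draw one perturbation set; parameter names and ranges are placeholders."""
    rng = random.Random(seed)
    return {
        "gain_noise_sigma": rng.uniform(0.0, 0.05),
        "imu_bias": rng.gauss(0.0, 0.01),
        "frame_drop_prob": rng.uniform(0.0, 0.1),
        "outlier_prob": rng.uniform(0.0, 0.02),
    }


def inside_fraction(samples: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (error_m, cov_semi_major_m) samples inside the 1.96 x envelope."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples: partial or empty runs are a hard failure")
    inside = sum(1 for error_m, cov_m in samples if error_m <= ENVELOPE_K * cov_m)
    return inside / len(samples)


def envelope_ok(samples: Iterable[Tuple[float, float]]) -> bool:
    return inside_fraction(samples) >= MIN_INSIDE_FRACTION
```

Deriving each seed from `(master_seed, iteration)` rather than from a shared generator means iterations can run in any order, or in parallel stacks, without changing the perturbations they receive.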
@@ -1,59 +0,0 @@
-# NFT-RES-04 — 35 s blackout-with-spoof full escalation ladder
-
-**Task**: AZ-435_nft_res_04_blackout_escalation
-**Name**: 35 s blackout + spoof exercises full failsafe escalation ladder (AC-NEW-8 escalation thresholds)
-**Description**: Implement NFT-RES-04 — Tier-1 OR Tier-2; 35 s blackout + spoof injection; assert full escalation ladder fires (100 m → 2D fix degrade; 500 m or 30 s → 999.0 horiz_accuracy + VISUAL_BLACKOUT_FAILSAFE).
-**Complexity**: 3 points
-**Dependencies**: AZ-406, AZ-407, AZ-408, AZ-426 (uses same fixture, separate scenario)
-**Component**: Blackbox Tests / Resilience / Security (epic AZ-262)
-**Tracker**: AZ-435
-**Epic**: AZ-262 (E-BBT)
-
-## Problem
-
-While FT-N-04 covers the 35 s window's escalation thresholds, NFT-RES-04 is the resilience-tier scenario specifically tasked with verifying the *full* escalation ladder fires in observable order — covariance crossings → fix-quality degrade → 999.0 horiz_accuracy → VISUAL_BLACKOUT_FAILSAFE STATUSTEXT — under realistic timing.
-
-## Outcome
-
- pytest scenario at `e2e/tests/resilience/test_nft_res_04_blackout_escalation.py`.
- 35 s blackout + spoof from `blackout_spoof.py --window 35s`.
- Assertions:
-  - 100 m covariance threshold → fix-quality degrade (2D fix or worse) within 500 ms of crossing.
-  - 500 m covariance OR 30 s elapsed → `horiz_accuracy=999.0` AND `VISUAL_BLACKOUT_FAILSAFE` STATUSTEXT within 500 ms.
-
-## Scope
-
-### Included
- 35 s window with spoof.
- Both escalation-threshold assertions.
-
-### Excluded
- The non-35-s sub-cases of FT-N-04 — owned by AZ-426.
-
-## Acceptance Criteria
-
-**AC-1: 100 m threshold → fix-quality degrade**
-Given the 35 s window
-When 95 % covariance crosses 100 m
-Then within 500 ms, outbound MAVLink reports degraded fix quality (`GPS_INPUT.fix_type ≤ 2` for AP, equivalent for iNav).
-
-**AC-2: 500 m / 30 s threshold → 999.0 + STATUSTEXT**
-Given the 35 s window
-When 95 % covariance crosses 500 m OR the blackout exceeds 30 s
-Then within 500 ms: outbound `horiz_accuracy=999.0` AND mavproxy captures STATUSTEXT containing `VISUAL_BLACKOUT_FAILSAFE`.
-
-**AC-3: parameterization**
-Given conftest parameterization
-Then the scenario runs per `(fc_adapter, vio_strategy)`.
-
-## System Under Test Boundary
-
-Same boundary as FT-N-04.
-
-## Constraints
-
- The escalation ladder must fire in order: earlier escalations (100 m fix-degrade) must precede later ones (999.0 + FAILSAFE) in the captured streams.
-
-## Document Dependencies
-
- `_docs/02_document/tests/resilience-tests.md` § NFT-RES-04
- `_docs/02_document/tests/test-data.md` § Resilience (NFT-RES-04 row)
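
Note on the archived NFT-RES-04 spec: the ladder has two rungs, each with a 500 ms reaction bound, and the Constraints require the first rung to precede the second in the captured streams. The sketch below checks this on already-extracted event timestamps. The `LadderEvents` container and its field names are hypothetical; extracting the timestamps from the MAVLink capture is outside the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_REACTION_S = 0.5     # 500 ms bound from AC-1 / AC-2
BLACKOUT_LIMIT_S = 30.0  # elapsed-time trigger from AC-2


@dataclass
class LadderEvents:
    """Timestamps in seconds on one common clock (hypothetical container)."""
    blackout_start: float
    cov_cross_100m: float
    fix_degraded: float
    cov_cross_500m: Optional[float]  # None if covariance never reached 500 m
    accuracy_999: float
    failsafe_statustext: float


def check_ladder(ev: LadderEvents) -> List[str]:
    """Return a list of violations; an empty list means the ladder fired correctly."""
    problems = []
    # Rung 1: fix-quality degrade must follow the 100 m crossing within the bound.
    if not 0.0 <= ev.fix_degraded - ev.cov_cross_100m <= MAX_REACTION_S:
        problems.append("fix-quality degrade outside 500 ms of 100 m crossing")
    # Rung 2 triggers on whichever comes first: 500 m crossing or 30 s elapsed.
    trigger = ev.blackout_start + BLACKOUT_LIMIT_S
    if ev.cov_cross_500m is not None:
        trigger = min(trigger, ev.cov_cross_500m)
    for name, ts in (("horiz_accuracy=999.0", ev.accuracy_999),
                     ("VISUAL_BLACKOUT_FAILSAFE", ev.failsafe_statustext)):
        if not 0.0 <= ts - trigger <= MAX_REACTION_S:
            problems.append(f"{name} outside 500 ms of failsafe trigger")
    # Ordering constraint: rung 1 must precede both rung 2 signals.
    if ev.fix_degraded > min(ev.accuracy_999, ev.failsafe_statustext):
        problems.append("fix-quality degrade observed after failsafe signals")
    return problems
```

Returning a list of violations instead of asserting on the first one keeps every ladder failure in a single test report.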