[AZ-414] [AZ-415] [AZ-418] Test batch 71: sharp turn + multi-segment + smoothing

- AZ-414 (FT-P-07 + FT-N-02): sharp_turn_detector helper covering
  AC-1 (gyro_z run detection + synthetic-overlay fallback),
  AC-2/AC-3 (FT-N-02 during-turn label + monotonic covariance),
  AC-4/AC-5/AC-6 (FT-P-07 recovery lag/drift/heading); twin scenario
  files under positive/ and negative/.
- AZ-415 (FT-P-08): multi_segment_evaluator helper + scenario.
- AZ-418 (FT-P-10): smoothing_evaluator helper covering AC-1 (raw +
  smoothed pose pairing), AC-2 (improvement rate >= 0.80), AC-3
  (mean improvement >= 5 m); scenario file.
- All scenarios skip-gated on upstream frame_source_replay /
  imu_replay / fdr_reader stubs (auto-activate when AZ-441 + AZ-407
  leftovers land).
- +68 unit tests; full e2e unit suite: 393 passed.

See _docs/03_implementation/batch_71_report.md and
_docs/03_implementation/reviews/batch_71_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-17 07:12:24 +03:00
parent 29ac16cfcb
commit c6e6cba237
17 changed files with 3195 additions and 1 deletions
@@ -1,81 +0,0 @@
# FT-P-07 + FT-N-02 — Sharp-turn recovery (positive + negative twin)
**Task**: AZ-414_ft_p_07_ftn_02_sharp_turn
**Name**: Sharp-turn recovery via satellite reference + legitimate frame-to-frame failure expected (AC-3.2)
**Description**: Implement FT-P-07 (recovery within 3 frames of turn end; drift ≤ 200 m, heading change handled) AND FT-N-02 (during turn, source_label is `visual_propagated` or `dead_reckoned`; covariance grows; recovery exercised in FT-P-07). Both scenarios share the sharp-turn segment fixture.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Positive + Negative (epic AZ-262)
**Tracker**: AZ-414
**Epic**: AZ-262 (E-BBT)
## Problem
Sharp turns are a documented degradation case (AC-3.2). The system must label the turn frames correctly (FT-N-02) AND recover when the turn ends (FT-P-07) — both halves must be measured to validate the failure-path correctness.
## Outcome
- pytest scenario at `e2e/tests/positive/test_ft_p_07_sharp_turn_recovery.py` (FT-P-07) and `e2e/tests/negative/test_ft_n_02_sharp_turn_failure.py` (FT-N-02).
- Both replay the sharp-turn segment of Derkachi (identified by gyro_z spikes in `SCALED_IMU2`).
- FT-P-07 asserts: source_label returns to `satellite_anchored` within 3 frames of turn end; drift since pre-turn anchor ≤ 200 m; heading change up to 70° handled.
- FT-N-02 asserts: during turn, source_label ∈ `{visual_propagated, dead_reckoned}`; covariance grows monotonically; transitions to satellite_anchored after turn (handed off to FT-P-07 for the recovery assertion).
- Synthetic-gyro-overlay fallback: if the natural Derkachi flight has no sharp turn meeting AC-3.2 thresholds, both scenarios fall back to a synthetic gyro overlay; this fact is flagged in the FDR record + CSV `evidence_paths`.
## Scope
### Included
- Sharp-turn segment identification via gyro_z spikes in `data_imu.csv`.
- Synthetic-gyro overlay fallback path (use the same approach as outlier injector for determinism).
- FT-P-07 recovery assertions (label transition, drift, heading-change tolerance).
- FT-N-02 during-turn assertions (label, monotonic covariance).
### Excluded
- Multi-segment satellite re-localization — owned by FT-P-08 (AZ-415).
- Outlier-injection tolerance — owned by FT-N-01 (AZ-424).
## Acceptance Criteria
**AC-1: turn-segment identification**
Given Derkachi `data_imu.csv`
Then the test computes gyro_z magnitude per IMU row and identifies segments where ≥3 consecutive rows have `|gyro_z| > AC-3.2 threshold`. If no segment meets the threshold, the synthetic-overlay fallback fires and the FDR + CSV mark this as `synthetic-overlay`.
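A minimal sketch of the run detection described in AC-1, assuming the `gyro_z` column has already been loaded into a plain sequence of floats. The function names, the `min_run` parameter, and the `"natural"` label are illustrative and not taken from the project's helper; only `synthetic-overlay` comes from the spec.

```python
from typing import List, Optional, Sequence, Tuple


def find_turn_segments(
    gyro_z: Sequence[float], threshold: float, min_run: int = 3
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) row indices of every run of at least
    `min_run` consecutive rows whose |gyro_z| is strictly above `threshold`."""
    segments: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, value in enumerate(gyro_z):
        if abs(value) > threshold:
            if run_start is None:
                run_start = i  # a new candidate run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                segments.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last row.
    if run_start is not None and len(gyro_z) - run_start >= min_run:
        segments.append((run_start, len(gyro_z) - 1))
    return segments


def choose_turn_source(segments: List[Tuple[int, int]]) -> str:
    """Hypothetical labelling: 'natural' if the fixture has a qualifying
    turn, otherwise the 'synthetic-overlay' fallback named in the spec."""
    return "natural" if segments else "synthetic-overlay"
```

The threshold is passed in rather than hardcoded, in line with the constraint that it is read from the test-spec environment.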
**AC-2: FT-N-02 during-turn label**
Given a turn segment
Then for every frame inside the segment, source_label ∈ `{visual_propagated, dead_reckoned}` (no `satellite_anchored` during the turn).
**AC-3: FT-N-02 monotonic covariance**
Given the during-turn frames
Then `cov_semi_major_m` is non-decreasing across consecutive frames within the turn segment.
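The two during-turn checks (AC-2 and AC-3) could be evaluated together along these lines. This is a sketch that assumes each outbound frame has been decoded into a dict carrying `source_label` and `cov_semi_major_m`; the dict shape and the function name are assumptions, not the project's message format.

```python
from typing import Dict, List, Sequence

# Labels permitted while the turn is in progress (from AC-2).
ALLOWED_TURN_LABELS = {"visual_propagated", "dead_reckoned"}


def check_during_turn(frames: Sequence[Dict]) -> List[str]:
    """Return human-readable violations for the frames inside one turn
    segment; an empty list means both checks pass."""
    violations: List[str] = []
    prev_cov = None
    for idx, frame in enumerate(frames):
        label = frame["source_label"]
        if label not in ALLOWED_TURN_LABELS:
            violations.append(f"frame {idx}: unexpected label {label!r}")
        cov = frame["cov_semi_major_m"]
        # Non-decreasing: equal consecutive values are accepted.
        if prev_cov is not None and cov < prev_cov:
            violations.append(
                f"frame {idx}: covariance dropped {prev_cov} -> {cov}"
            )
        prev_cov = cov
    return violations
```

Returning the list of violations instead of asserting on the first one keeps every offending frame visible in the test evidence.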
**AC-4: FT-P-07 recovery within 3 frames**
Given a turn-end timestamp
Then the next satellite_anchored emission occurs within ≤3 frames after that timestamp.
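One way to measure the recovery lag in AC-4 is sketched below. It assumes emissions are available as `(timestamp, source_label)` pairs sorted by timestamp, and it interprets "within 3 frames" as: the first `satellite_anchored` emission is at most the third frame emitted strictly after the turn-end timestamp. Both the data shape and that interpretation are assumptions.

```python
from typing import Optional, Sequence, Tuple


def recovery_lag_frames(
    emissions: Sequence[Tuple[float, str]], turn_end_ts: float
) -> Optional[int]:
    """Return the 1-based count of frames after `turn_end_ts` up to and
    including the first 'satellite_anchored' one, or None if the stream
    never recovers."""
    lag = 0
    for ts, label in emissions:
        if ts <= turn_end_ts:
            continue  # still inside or before the turn
        lag += 1
        if label == "satellite_anchored":
            return lag
    return None
```

A scenario would then assert that the result is not `None` and is at most 3.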
**AC-5: FT-P-07 drift bound**
Given the recovery anchor
Then `‖propagated_centre_at_turn_end - recovery_anchor_centre‖ ≤ 200 m`.
**AC-6: FT-P-07 heading-change envelope**
Given the heading delta from pre-turn to post-turn anchor
Then heading changes up to 70° are handled (the recovery still occurs within the 3-frame budget at heading deltas in [0°, 70°]).
**AC-7: parameterization**
Given conftest parameterization
Then both scenarios run per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: outbound message stream (`source_label`, `cov_semi_major_m`); FDR for the synthetic-overlay flag.
- **Forbidden**: stubbing C1 VIO failure mode, monkeypatching the source-label state machine.
## Constraints
- Synthetic-gyro overlay fallback is determined per-run by checking the natural fixture; the choice is logged into FDR and CSV `evidence_paths`.
- The sharp-turn threshold per AC-3.2 is the project's authoritative value (gyro_z magnitude + duration); this test reads that threshold from the test-spec environment, not from a hardcoded constant.
## Document Dependencies
- `_docs/02_document/tests/blackbox-tests.md` § FT-P-07, § FT-N-02
- `_docs/02_document/tests/test-data.md` § Image processing quality / Resilience
@@ -1,70 +0,0 @@
# FT-P-08 — Multi-segment satellite-reference re-localization
**Task**: AZ-415_ft_p_08_multi_segment_reloc
**Name**: ≥3 disconnected segments handled via satellite-reference re-localization (AC-3.3)
**Description**: Implement FT-P-08: replay the `multi-segment-derkachi` synthetic fixture with 3+ blackout windows; assert that the SUT emits `dead_reckoned` during each blackout, returns to `satellite_anchored` within 3 frames of each blackout end, and preserves trajectory continuity (no jump >100 m).
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-408 (multi_segment injector)
**Component**: Blackbox Tests / Positive (epic AZ-262)
**Tracker**: AZ-415
**Epic**: AZ-262 (E-BBT)
## Problem
The system claims to handle ≥3 disconnected segments per flight via satellite-reference re-localization (AC-3.3). Without this scenario the multi-blackout recovery path is unmeasured.
## Outcome
- pytest scenario at `e2e/tests/positive/test_ft_p_08_multi_segment_reloc.py`.
- Replays the `multi-segment-derkachi` fixture (3+ blackout windows distributed across the flight, no spoof injection).
- For each blackout: asserts `source_label = dead_reckoned` during the blackout; asserts `source_label` returns to `satellite_anchored` within 3 frames of blackout end; asserts no trajectory jump >100 m at the recovery transition.
## Scope
### Included
- Replay-driven test method against `multi-segment-derkachi`.
- Per-blackout assertion (label during, label transition, jump size).
- Aggregate pass: all 3+ blackouts must satisfy all three sub-assertions.
### Excluded
- Spoof-paired blackouts — owned by FT-N-04 (AZ-426) and NFT-RES-04 (AZ-435).
- Single-blackout outage with operator-reloc request — owned by FT-N-03 (AZ-425).
## Acceptance Criteria
**AC-1: blackout-window detection**
Given the fixture
Then the test identifies all ≥3 blackout windows from the fixture's manifest (the injector emits a window-list JSON alongside the modified video).
**AC-2: dead_reckoned during blackout**
Given a blackout window `[t_start, t_end]`
Then for every outbound emission with timestamp in `[t_start, t_end]`, `source_label = dead_reckoned`.
**AC-3: recovery within 3 frames**
Given each blackout's `t_end`
Then the next `satellite_anchored` emission occurs within ≤3 frames after `t_end`.
**AC-4: trajectory-continuity bound**
Given the recovery anchor following each blackout
Then `‖estimate_at_t_end - recovery_anchor‖ ≤ 100 m`.
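The three per-blackout sub-assertions (AC-2 to AC-4) might be combined as in the sketch below. It assumes outbound estimates have been projected into a local metric frame (`east_m`, `north_m`), and it takes the last emission at or before `t_end` as the estimate at blackout end. The `Emission` type and the function name are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Emission:
    ts: float
    source_label: str
    east_m: float   # assumed local metric coordinates
    north_m: float


def evaluate_blackout(
    emissions: Sequence[Emission],
    t_start: float,
    t_end: float,
    max_lag: int = 3,
    max_jump_m: float = 100.0,
) -> Dict:
    """Evaluate one blackout window against the label, lag, and jump bounds."""
    during = [e for e in emissions if t_start <= e.ts <= t_end]
    after = [e for e in emissions if e.ts > t_end]
    labels_ok = bool(during) and all(
        e.source_label == "dead_reckoned" for e in during
    )
    # 1-based index of the first re-anchored frame after the blackout.
    lag = next(
        (i for i, e in enumerate(after, start=1)
         if e.source_label == "satellite_anchored"),
        None,
    )
    jump_m = None
    if during and lag is not None:
        last, anchor = during[-1], after[lag - 1]
        # L2 distance between the SUT's own estimates, not ground truth.
        jump_m = math.hypot(anchor.east_m - last.east_m,
                            anchor.north_m - last.north_m)
    passed = (
        labels_ok
        and lag is not None and lag <= max_lag
        and jump_m is not None and jump_m <= max_jump_m
    )
    return {"labels_ok": labels_ok, "recovery_lag_frames": lag,
            "jump_m": jump_m, "passed": passed}
```

The aggregate pass is then `all(evaluate_blackout(...)["passed"] for each window)` over the windows listed in the injector's manifest.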
**AC-5: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`.
## System Under Test Boundary
End-to-end through public boundaries.
- **Allowed**: outbound message stream; FDR; the injector's window-list JSON (a public test artifact).
- **Forbidden**: querying SUT internal anchor cache, stubbing C2 retrieval.
## Constraints
- The injector ensures ≥3 windows AND ≥30 s of normal flight between them (AC-5 of AZ-408).
- "Trajectory jump" is the L2 distance at the moment of recovery; pre/post measurements are the SUT's outbound estimates, not GT.
## Document Dependencies
- `_docs/02_document/tests/blackbox-tests.md` § FT-P-08
- `_docs/02_document/tests/test-data.md` § Resilience (FT-P-08 row)
@@ -1,69 +0,0 @@
# FT-P-10 — GTSAM smoothing-loop look-back accuracy
**Task**: AZ-418_ft_p_10_smoothing_lookback
**Name**: Internal smoothing improves past-keyframe estimates (AC-4.5 revised, Mode B Fact #107)
**Description**: Implement FT-P-10 — full Derkachi replay; FDR contains per-keyframe (a) raw single-shot pose at first emission, (b) smoothed pose at iSAM2 convergence; assert `smoothed_error < raw_error` for ≥80 % of keyframes; `mean_improvement ≥ 5 m`. NOT validated as FC-side retroactive correction (out of scope per Mode B revision).
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Positive / Internal smoothing (epic AZ-262)
**Tracker**: AZ-418
**Epic**: AZ-262 (E-BBT)
## Problem
The iSAM2 fixed-lag smoother is the project's IMU-fusion mechanism; if it doesn't actually improve past-keyframe estimates over raw single-shot, the entire C5 design loses its rationale. AC-4.5 (revised per Mode B) measures this as an internal-improvement metric, NOT FC-side retroactive correction.
## Outcome
- pytest scenario at `e2e/tests/positive/test_ft_p_10_smoothing_lookback.py`.
- Replays Derkachi end-to-end; reads FDR archive after replay.
- For each past keyframe: extracts (a) `raw_pose` (first single-shot emission for that keyframe) and (b) `smoothed_pose` (iSAM2-converged pose at smoother window end).
- Computes `distance(raw, GT)` and `distance(smoothed, GT)` against Derkachi `GLOBAL_POSITION_INT`.
- Aggregates: `improvement_rate = count(smoothed_error < raw_error) / total`; `mean_improvement = mean(raw_error - smoothed_error)`.
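The two aggregates above can be written out as follows. This is a sketch that assumes per-keyframe errors (in metres, against GT) have already been computed and paired by keyframe; the function name is illustrative.

```python
from statistics import mean
from typing import Sequence, Tuple


def smoothing_aggregates(
    raw_errors: Sequence[float], smoothed_errors: Sequence[float]
) -> Tuple[float, float]:
    """Return (improvement_rate, mean_improvement) over paired keyframes."""
    if not raw_errors or len(raw_errors) != len(smoothed_errors):
        raise ValueError("need a non-empty, equally sized pair of error lists")
    pairs = list(zip(raw_errors, smoothed_errors))
    # Fraction of keyframes where smoothing strictly reduced the error.
    improvement_rate = sum(1 for r, s in pairs if s < r) / len(pairs)
    # Average reduction in metres; negative entries count against it.
    mean_improvement = mean(r - s for r, s in pairs)
    return improvement_rate, mean_improvement
```

For example, with raw errors `[10, 12, 8, 20, 9]` and smoothed errors `[4, 5, 9, 10, 3]`, four of five keyframes improve (rate 0.8) and the mean improvement is 5.6 m, so both thresholds would be met.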
## Scope
### Included
- FDR archive reader for past-keyframe records (per AC-NEW-3 schema; raw + smoothed entries).
- Per-keyframe error computation against `GLOBAL_POSITION_INT` GT.
- Aggregate assertions.
### Excluded
- FC-side retroactive correction — explicitly OUT OF SCOPE per Mode B revision.
- Inter-keyframe interpolation accuracy — out of scope.
- iSAM2 timing — owned by NFT-PERF-01 (AZ-428).
## Acceptance Criteria
**AC-1: FDR contains raw + smoothed pose pairs**
Given a full Derkachi replay
Then the FDR archive contains, for each past keyframe k: (a) `raw_pose_k` recorded at the keyframe's first emission timestamp; (b) `smoothed_pose_k` recorded when k exits the iSAM2 window.
**AC-2: improvement rate**
Given the per-keyframe pairs
Then `count(smoothed_error_k < raw_error_k) / total_keyframes ≥ 0.80`.
**AC-3: mean improvement**
Given the per-keyframe pairs
Then `mean(raw_error_k - smoothed_error_k) ≥ 5 m` over all keyframes.
**AC-4: parameterization**
Given conftest parameterization
Then the scenario runs per `(fc_adapter, vio_strategy)`. Note: this AC is sensitive to VIO strategy quality; expected `vins_mono` (research) ≥ `okvis2` ≥ `klt_ransac` improvement rates; the test reports per-strategy rates as evidence even if all pass the threshold.
## System Under Test Boundary
End-to-end through public boundaries; FDR archive read post-flight.
- **Allowed**: FDR-archive read (a public on-disk artifact per AC-NEW-3).
- **Forbidden**: querying live iSAM2 graph state; importing SUT C5 module.
## Constraints
- The FDR record schema (AC-NEW-3) MUST distinguish raw vs smoothed past-keyframe entries; if only one is present, the test fails.
- "Past keyframe" in this scenario excludes the most recent K=10..20 keyframes (still inside the smoother window) — those have not yet converged.
## Document Dependencies
- `_docs/02_document/tests/blackbox-tests.md` § FT-P-10
- `_docs/02_document/tests/test-data.md` § FC contract & startup (FT-P-10 row)