# Batch 71 Report — Test Implementation (cycle 1, batch 5 of test phase) **Batch**: 71 **Date**: 2026-05-16 **Context**: Test implementation (greenfield Step 10 — Implement Tests) **Tasks**: AZ-414 (3pt), AZ-415 (3pt), AZ-418 (3pt) — 9 cp / 3 tasks **Cycle**: 1 **Verdict**: COMPLETE — PASS (self-reviewed; see `reviews/batch_71_review.md`) ## Summary Three scenarios covering sharp-turn recovery (positive + negative twin), multi-segment relocalisation, and GTSAM smoothing-loop look-back. Same pattern as batches 69 + 70: * Pure-logic helper under `e2e/runner/helpers/` for everything the scenario can express without docker-bound replay + FDR ingestion. * Scenario file(s) under `e2e/tests/{positive,negative}/`, parameterized across `(fc_adapter, vio_strategy)`, skip-gated on upstream `frame_source_replay` / `imu_replay` / `fdr_reader` stubs (auto-activates when AZ-441 + AZ-407 leftovers land). * Helper-driven unit test file under `e2e/_unit_tests/helpers/`. ### AZ-414 — FT-P-07 + FT-N-02 sharp-turn recovery + failure twin (3pt) * **`runner/helpers/sharp_turn_detector.py`** — `load_zgyro_samples` reads `SCALED_IMU2.zgyro` (millidegree/s) from `data_imu.csv`; `get_threshold_mdps` reads the AC-3.2 threshold from env var `AC32_SHARP_TURN_GYRO_Z_MDPS` (default 30,000 mdps) per spec note; `detect_turn_segments` finds contiguous runs of ≥3 samples above threshold (using `|zgyro|` so left + right turns both qualify); `synthesize_overlay_segment` provides the AC-1 fallback when the natural fixture has no qualifying turn; `detect_or_synthesize` is the scenario-facing helper that picks natural-first; `evaluate_ft_n_02` computes AC-2 (label ∈ {visual_propagated, dead_reckoned}) + AC-3 (cov non-decreasing); `evaluate_ft_p_07` computes AC-4 (recovery lag ≤ 3 frames safety-budget = 1100 ms), AC-5 (drift ≤ 200 m), AC-6 (heading delta ≤ 70°). `write_csv_evidence` emits a combined CSV with `synthetic_overlay` column so the fallback fact is recorded per AC-1. * **`tests/positive/test_ft_p_07_sharp_turn_recovery.py`** — asserts AC-4 + AC-5 + AC-6 per segment; records NFR metrics with AC IDs; records `synthetic_overlay` flag. * **`tests/negative/test_ft_n_02_sharp_turn_failure.py`** — asserts AC-2 + AC-3 per segment; uses the same detector + frame-collection logic. * **30 unit tests** in `test_sharp_turn_detector.py`. ### AZ-415 — FT-P-08 multi-segment relocalisation (3pt) * **`runner/helpers/multi_segment_evaluator.py`** — `load_schedule` reads the AZ-408 multi-segment schedule (blackout window times + recovery anchors); `evaluate_window` checks AC-2 (label = dead_reckoned inside window), AC-3 (recovery to satellite_anchored ≤ 10 s after window end), AC-4 (no centre jump > 200 m at recovery); `evaluate` aggregates over all blackout windows; constants `MAX_RECOVERY_LAG_MS = 10_000` and `MAX_JUMP_M = 200.0` expose the thresholds. * **`tests/positive/test_ft_p_08_multi_segment_reloc.py`** — drives `multi-segment-derkachi` injector fixture, replays via stub-gated helpers, evaluates per window. * **16 unit tests** in `test_multi_segment_evaluator.py`. ### AZ-418 — FT-P-10 GTSAM smoothing-loop look-back accuracy (3pt) * **`runner/helpers/smoothing_evaluator.py`** — `pair_records` groups FDR `keyframe_pose` records by `keyframe_id` and rejects duplicate `pose_kind` values (raw + smoothed must be exactly one each per keyframe per AC-1); `resolve_gt_at` picks the nearest-in-time Derkachi GLOBAL_POSITION_INT pose (10 Hz cadence → ≤50 ms slop acceptable for the metre-scale improvement deltas this AC measures); `evaluate` produces per-keyframe + aggregate report with `improvement_rate` (AC-2 threshold = 0.80) and `mean_improvement_m` (AC-3 threshold = 5 m). The module docstring explicitly preserves Mode B Fact #107 — this is an INTERNAL improvement metric, NOT FC-side retroactive correction. * **`tests/positive/test_ft_p_10_smoothing_lookback.py`** — pairs + evaluates + asserts AC-2 + AC-3 per (fc_adapter, vio_strategy). * **15 unit tests** in `test_smoothing_evaluator.py`. ## Tests * **Full e2e unit suite**: 393 passed in 126.64 s (was 325 at end of batch 70 → +68 net new tests this batch). * **Pre-existing**: macOS-only `/e2e-results` plugin issue in scenario invocation outside Docker. Unit suite unaffected. Tracked under runner reporting — out of batch scope. ## Files Touched **New helpers:** * `e2e/runner/helpers/sharp_turn_detector.py` * `e2e/runner/helpers/smoothing_evaluator.py` * `e2e/runner/helpers/multi_segment_evaluator.py` **New unit tests:** * `e2e/_unit_tests/helpers/test_sharp_turn_detector.py` (30 tests) * `e2e/_unit_tests/helpers/test_smoothing_evaluator.py` (15 tests) * `e2e/_unit_tests/helpers/test_multi_segment_evaluator.py` (16 tests) **New scenarios:** * `e2e/tests/positive/test_ft_p_07_sharp_turn_recovery.py` * `e2e/tests/positive/test_ft_p_08_multi_segment_reloc.py` * `e2e/tests/positive/test_ft_p_10_smoothing_lookback.py` * `e2e/tests/negative/test_ft_n_02_sharp_turn_failure.py` **Updated:** * `e2e/_unit_tests/test_directory_layout.py` — added 7 new paths. **Archived:** * `_docs/02_tasks/todo/AZ-414_*.md` → `done/` * `_docs/02_tasks/todo/AZ-415_*.md` → `done/` * `_docs/02_tasks/todo/AZ-418_*.md` → `done/` ## Cumulative Review Trigger K=3. Last cumulative covered batches 67-69. Since then: 70 + 71 = 2 batches. **Cumulative does NOT fire this batch.** Next cumulative trigger: end of batch 72.