gps-denied-onboard/_docs/03_implementation/batch_71_report.md
Oleksandr Bezdieniezhnykh c6e6cba237 [AZ-414] [AZ-415] [AZ-418] Test batch 71: sharp turn + multi-segment + smoothing
- AZ-414 (FT-P-07 + FT-N-02): sharp_turn_detector helper covering
  AC-1 (gyro_z run detection + synthetic-overlay fallback),
  AC-2/AC-3 (FT-N-02 during-turn label + monotonic covariance),
  AC-4/AC-5/AC-6 (FT-P-07 recovery lag/drift/heading); twin scenario
  files under positive/ and negative/.
- AZ-415 (FT-P-08): multi_segment_evaluator helper + scenario.
- AZ-418 (FT-P-10): smoothing_evaluator helper covering AC-1 (raw +
  smoothed pose pairing), AC-2 (improvement rate >= 0.80), AC-3
  (mean improvement >= 5 m); scenario file.
- All scenarios skip-gated on upstream frame_source_replay /
  imu_replay / fdr_reader stubs (auto-activate when AZ-441 + AZ-407
  leftovers land).
- +68 unit tests; full e2e unit suite: 393 passed.

See _docs/03_implementation/batch_71_report.md and
_docs/03_implementation/reviews/batch_71_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 07:12:24 +03:00

Batch 71 Report — Test Implementation (cycle 1, batch 5 of test phase)

Batch: 71
Date: 2026-05-16
Context: Test implementation (greenfield Step 10: Implement Tests)
Tasks: AZ-414 (3pt), AZ-415 (3pt), AZ-418 (3pt); 9 cp / 3 tasks
Cycle: 1
Verdict: COMPLETE - PASS (self-reviewed; see reviews/batch_71_review.md)

Summary

This batch adds three scenarios covering sharp-turn recovery (positive + negative twin), multi-segment relocalisation, and GTSAM smoothing-loop look-back. Each follows the same pattern as batches 69 + 70:

  • Pure-logic helper under e2e/runner/helpers/ for everything the scenario can express without docker-bound replay + FDR ingestion.
  • Scenario file(s) under e2e/tests/{positive,negative}/, parameterized across (fc_adapter, vio_strategy), skip-gated on upstream frame_source_replay / imu_replay / fdr_reader stubs (auto-activates when AZ-441 + AZ-407 leftovers land).
  • Helper-driven unit test file under e2e/_unit_tests/helpers/.

AZ-414 — FT-P-07 + FT-N-02 sharp-turn recovery + failure twin (3pt)

  • runner/helpers/sharp_turn_detector.py: load_zgyro_samples reads SCALED_IMU2.zgyro (millidegree/s) from data_imu.csv; get_threshold_mdps reads the AC-3.2 threshold from env var AC32_SHARP_TURN_GYRO_Z_MDPS (default 30,000 mdps) per spec note; detect_turn_segments finds contiguous runs of ≥3 samples above threshold (using |zgyro| so left + right turns both qualify); synthesize_overlay_segment provides the AC-1 fallback when the natural fixture has no qualifying turn; detect_or_synthesize is the scenario-facing helper that picks natural-first; evaluate_ft_n_02 computes AC-2 (label ∈ {visual_propagated, dead_reckoned}) + AC-3 (cov non-decreasing); evaluate_ft_p_07 computes AC-4 (recovery lag ≤ 3 frames, safety budget = 1100 ms), AC-5 (drift ≤ 200 m), AC-6 (heading delta ≤ 70°); write_csv_evidence emits a combined CSV with a synthetic_overlay column so the fallback fact is recorded per AC-1.
  • tests/positive/test_ft_p_07_sharp_turn_recovery.py — asserts AC-4 + AC-5 + AC-6 per segment; records NFR metrics with AC IDs; records synthetic_overlay flag.
  • tests/negative/test_ft_n_02_sharp_turn_failure.py — asserts AC-2 + AC-3 per segment; uses the same detector + frame-collection logic.
  • 30 unit tests in test_sharp_turn_detector.py.
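
The run-detection rule described above can be sketched in a few lines. This is a minimal illustration, not the contents of sharp_turn_detector.py: the function names, the env var, the default, and the ≥3-sample rule come from the report, while the signatures, the strict ">" comparison, the inclusive (start, end) index pairs, and the is_non_decreasing helper are assumptions made for the example.

```python
import os

DEFAULT_THRESHOLD_MDPS = 30_000  # default stated in the report
MIN_RUN_SAMPLES = 3              # a qualifying turn needs >= 3 consecutive samples


def get_threshold_mdps(env=None):
    """Read the threshold from AC32_SHARP_TURN_GYRO_Z_MDPS, else use the default."""
    env = os.environ if env is None else env
    return float(env.get("AC32_SHARP_TURN_GYRO_Z_MDPS", DEFAULT_THRESHOLD_MDPS))


def detect_turn_segments(zgyro_mdps, threshold_mdps, min_run=MIN_RUN_SAMPLES):
    """Return inclusive (start, end) index pairs of runs where |zgyro| > threshold.

    Assumption: "above threshold" is treated as strictly greater than.
    """
    segments = []
    run_start = None
    for i, value in enumerate(zgyro_mdps):
        if abs(value) > threshold_mdps:
            if run_start is None:
                run_start = i  # a new run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                segments.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last sample.
    if run_start is not None and len(zgyro_mdps) - run_start >= min_run:
        segments.append((run_start, len(zgyro_mdps) - 1))
    return segments


def is_non_decreasing(values):
    """Hypothetical helper for an AC-3 style check: no value drops below its predecessor."""
    return all(b >= a for a, b in zip(values, values[1:]))


samples = [0, 31000, 32000, 35000, 0, -40000, -40000, 0, -31000, -31000, -31000]
segments = detect_turn_segments(samples, get_threshold_mdps({}))
```

With these samples the two-sample run at indices 5 and 6 is rejected as too short, while the right-hand run (indices 1 to 3) and the left-hand run (indices 8 to 10) both qualify because the comparison uses the absolute value.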

AZ-415 — FT-P-08 multi-segment relocalisation (3pt)

  • runner/helpers/multi_segment_evaluator.py: load_schedule reads the AZ-408 multi-segment schedule (blackout window times + recovery anchors); evaluate_window checks AC-2 (label = dead_reckoned inside window), AC-3 (recovery to satellite_anchored ≤ 10 s after window end), AC-4 (no centre jump > 200 m at recovery); evaluate aggregates over all blackout windows; constants MAX_RECOVERY_LAG_MS = 10_000 and MAX_JUMP_M = 200.0 expose the thresholds.
  • tests/positive/test_ft_p_08_multi_segment_reloc.py — drives multi-segment-derkachi injector fixture, replays via stub-gated helpers, evaluates per window.
  • 16 unit tests in test_multi_segment_evaluator.py.
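
A per-window check along these lines might look as follows. The two constants and the label names are taken from the report; the Frame record, its field names, the result dictionary, and the choice to measure the AC-4 jump between the recovery frame and the frame immediately before it are assumptions for illustration only.

```python
import math
from dataclasses import dataclass

MAX_RECOVERY_LAG_MS = 10_000  # AC-3 threshold from the report
MAX_JUMP_M = 200.0            # AC-4 threshold from the report


@dataclass
class Frame:
    """Hypothetical per-frame record; frames are assumed sorted by t_ms."""
    t_ms: int
    label: str
    east_m: float
    north_m: float


def evaluate_window(frames, start_ms, end_ms):
    """Check AC-2, AC-3 and AC-4 for one blackout window [start_ms, end_ms]."""
    inside = [f for f in frames if start_ms <= f.t_ms <= end_ms]
    # AC-2: every frame inside the window is labelled dead_reckoned.
    ac2_ok = bool(inside) and all(f.label == "dead_reckoned" for f in inside)

    # First satellite_anchored frame after the window ends, if any.
    recovery_idx = next(
        (i for i, f in enumerate(frames)
         if f.t_ms > end_ms and f.label == "satellite_anchored"),
        None,
    )
    if recovery_idx is None:
        return {"ac2_ok": ac2_ok, "ac3_ok": False, "ac4_ok": False,
                "recovery_lag_ms": None, "jump_m": None}

    recovered = frames[recovery_idx]
    lag_ms = recovered.t_ms - end_ms
    if recovery_idx > 0:
        prev = frames[recovery_idx - 1]
        jump_m = math.hypot(recovered.east_m - prev.east_m,
                            recovered.north_m - prev.north_m)
    else:
        jump_m = 0.0  # no earlier frame to compare against
    return {"ac2_ok": ac2_ok,
            "ac3_ok": lag_ms <= MAX_RECOVERY_LAG_MS,
            "ac4_ok": jump_m <= MAX_JUMP_M,
            "recovery_lag_ms": lag_ms,
            "jump_m": jump_m}


frames = [
    Frame(0, "satellite_anchored", 0.0, 0.0),
    Frame(1000, "dead_reckoned", 10.0, 0.0),
    Frame(2000, "dead_reckoned", 20.0, 0.0),
    Frame(3000, "visual_propagated", 30.0, 0.0),
    Frame(4000, "satellite_anchored", 60.0, 40.0),
]
result = evaluate_window(frames, start_ms=1000, end_ms=2000)
```

In this example the window recovers 2000 ms after it ends with a 50 m jump, so all three checks pass. An aggregate evaluate would then loop over every window in the schedule and combine the per-window results.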

AZ-418 — FT-P-10 GTSAM smoothing-loop look-back accuracy (3pt)

  • runner/helpers/smoothing_evaluator.py: pair_records groups FDR keyframe_pose records by keyframe_id and rejects duplicate pose_kind values (raw + smoothed must be exactly one each per keyframe per AC-1); resolve_gt_at picks the nearest-in-time Derkachi GLOBAL_POSITION_INT pose (10 Hz cadence → ≤50 ms slop acceptable for the metre-scale improvement deltas this AC measures); evaluate produces per-keyframe + aggregate report with improvement_rate (AC-2 threshold = 0.80) and mean_improvement_m (AC-3 threshold = 5 m). The module docstring explicitly preserves Mode B Fact #107: this is an INTERNAL improvement metric, NOT FC-side retroactive correction.
  • tests/positive/test_ft_p_10_smoothing_lookback.py — pairs + evaluates + asserts AC-2 + AC-3 per (fc_adapter, vio_strategy).
  • 15 unit tests in test_smoothing_evaluator.py.
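
The pairing and scoring flow can be illustrated with a small sketch. The function names and the two thresholds are from the report; the dictionary record layout, the local east/north coordinates, and the metric definitions (improvement = raw error minus smoothed error, improvement_rate = share of keyframes with positive improvement) are assumptions, since the report does not spell them out.

```python
import math
from collections import defaultdict

IMPROVEMENT_RATE_MIN = 0.80   # AC-2 threshold from the report
MEAN_IMPROVEMENT_MIN_M = 5.0  # AC-3 threshold from the report


def pair_records(records):
    """Group records by keyframe_id; require exactly one raw and one smoothed each."""
    by_keyframe = defaultdict(dict)
    for rec in records:
        kinds = by_keyframe[rec["keyframe_id"]]
        if rec["pose_kind"] in kinds:
            raise ValueError(f"duplicate {rec['pose_kind']} for keyframe {rec['keyframe_id']}")
        kinds[rec["pose_kind"]] = rec
    pairs = {}
    for keyframe_id, kinds in by_keyframe.items():
        if set(kinds) != {"raw", "smoothed"}:
            raise ValueError(f"keyframe {keyframe_id} lacks a raw + smoothed pair")
        pairs[keyframe_id] = (kinds["raw"], kinds["smoothed"])
    return pairs


def resolve_gt_at(ground_truth, t_ms):
    """Pick the ground-truth sample nearest in time to t_ms."""
    return min(ground_truth, key=lambda g: abs(g["t_ms"] - t_ms))


def _error_m(pose, truth):
    return math.hypot(pose["east_m"] - truth["east_m"],
                      pose["north_m"] - truth["north_m"])


def evaluate(records, ground_truth):
    """Aggregate report; metric definitions here are assumptions (see lead-in)."""
    improvements = []
    for raw, smoothed in pair_records(records).values():
        truth = resolve_gt_at(ground_truth, raw["t_ms"])
        improvements.append(_error_m(raw, truth) - _error_m(smoothed, truth))
    if not improvements:
        raise ValueError("no keyframe pairs to evaluate")
    rate = sum(1 for d in improvements if d > 0) / len(improvements)
    mean_m = sum(improvements) / len(improvements)
    return {"improvement_rate": rate,
            "mean_improvement_m": mean_m,
            "ac2_ok": rate >= IMPROVEMENT_RATE_MIN,
            "ac3_ok": mean_m >= MEAN_IMPROVEMENT_MIN_M}


gt = [{"t_ms": 0, "east_m": 0.0, "north_m": 0.0},
      {"t_ms": 100, "east_m": 10.0, "north_m": 0.0}]
records = [
    {"keyframe_id": 1, "pose_kind": "raw", "t_ms": 10, "east_m": 12.0, "north_m": 0.0},
    {"keyframe_id": 1, "pose_kind": "smoothed", "t_ms": 10, "east_m": 2.0, "north_m": 0.0},
    {"keyframe_id": 2, "pose_kind": "raw", "t_ms": 90, "east_m": 18.0, "north_m": 0.0},
    {"keyframe_id": 2, "pose_kind": "smoothed", "t_ms": 90, "east_m": 11.0, "north_m": 0.0},
]
report = evaluate(records, gt)
```

Here keyframe 1 improves by 10 m and keyframe 2 by 7 m, giving an improvement rate of 1.0 and a mean improvement of 8.5 m. Raising on duplicates, rather than silently keeping the last record, mirrors the report's statement that AC-1 requires exactly one raw and one smoothed pose per keyframe.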

Tests

  • Full e2e unit suite: 393 passed in 126.64 s (was 325 at end of batch 70 → +68 net new tests this batch).
  • Pre-existing: macOS-only /e2e-results plugin issue in scenario invocation outside Docker. Unit suite unaffected. Tracked under runner reporting — out of batch scope.

Files Touched

New helpers:

  • e2e/runner/helpers/sharp_turn_detector.py
  • e2e/runner/helpers/smoothing_evaluator.py
  • e2e/runner/helpers/multi_segment_evaluator.py

New unit tests:

  • e2e/_unit_tests/helpers/test_sharp_turn_detector.py (30 tests)
  • e2e/_unit_tests/helpers/test_smoothing_evaluator.py (15 tests)
  • e2e/_unit_tests/helpers/test_multi_segment_evaluator.py (16 tests)

New scenarios:

  • e2e/tests/positive/test_ft_p_07_sharp_turn_recovery.py
  • e2e/tests/positive/test_ft_p_08_multi_segment_reloc.py
  • e2e/tests/positive/test_ft_p_10_smoothing_lookback.py
  • e2e/tests/negative/test_ft_n_02_sharp_turn_failure.py

Updated:

  • e2e/_unit_tests/test_directory_layout.py — added 7 new paths.

Archived:

  • _docs/02_tasks/todo/AZ-414_*.md → done/
  • _docs/02_tasks/todo/AZ-415_*.md → done/
  • _docs/02_tasks/todo/AZ-418_*.md → done/

Cumulative Review Trigger

K=3. Last cumulative covered batches 67-69. Since then: 70 + 71 = 2 batches. Cumulative does NOT fire this batch. Next cumulative trigger: end of batch 72.
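
The trigger arithmetic above can be written out as a short check. Both function names are hypothetical and exist only for this illustration; the only facts taken from the report are K=3, the last covered batch (69), and the current batch (71).

```python
def cumulative_fires(current_batch, last_covered_batch, k=3):
    """True when at least k batches have completed since the last cumulative review."""
    return current_batch - last_covered_batch >= k


def next_cumulative_trigger(last_covered_batch, k=3):
    """Batch number at whose end the next cumulative review is due."""
    return last_covered_batch + k


fires_now = cumulative_fires(current_batch=71, last_covered_batch=69)
next_batch = next_cumulative_trigger(last_covered_batch=69)
```

With the values from this report, fires_now is False (only two batches since the last review) and next_batch is 72.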