Files
Oleksandr Bezdieniezhnykh c6e6cba237 [AZ-414] [AZ-415] [AZ-418] Test batch 71: sharp turn + multi-segment + smoothing
- AZ-414 (FT-P-07 + FT-N-02): sharp_turn_detector helper covering
  AC-1 (gyro_z run detection + synthetic-overlay fallback),
  AC-2/AC-3 (FT-N-02 during-turn label + monotonic covariance),
  AC-4/AC-5/AC-6 (FT-P-07 recovery lag/drift/heading); twin scenario
  files under positive/ and negative/.
- AZ-415 (FT-P-08): multi_segment_evaluator helper + scenario.
- AZ-418 (FT-P-10): smoothing_evaluator helper covering AC-1 (raw +
  smoothed pose pairing), AC-2 (improvement rate >= 0.80), AC-3
  (mean improvement >= 5 m); scenario file.
- All scenarios skip-gated on upstream frame_source_replay /
  imu_replay / fdr_reader stubs (auto-activate when AZ-441 + AZ-407
  leftovers land).
- +68 unit tests; full e2e unit suite: 393 passed.

See _docs/03_implementation/batch_71_report.md and
_docs/03_implementation/reviews/batch_71_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 07:12:24 +03:00

8.8 KiB

Code Review Report

Batch: 71 — AZ-414, AZ-415, AZ-418 Date: 2026-05-16 Verdict: PASS

Findings

(none)

Findings Sweep

Phase 1 — Context Loading

Loaded specs AZ-414_ft_p_07_ftn_02_sharp_turn.md, AZ-415_ft_p_08_multi_segment_reloc.md, AZ-418_ft_p_10_smoothing_lookback.md. Reused the existing geo.py, sharp_turn_detector.py (new this batch), multi_segment_evaluator.py (new this batch), smoothing_evaluator.py (new this batch), and the fdr_reader / frame_source_replay / imu_replay stub gates used by batches 69 and 70. Re-read _docs/00_problem/input_data/flight_derkachi/data_imu.csv header layout to confirm SCALED_IMU2.zgyro column and GLOBAL_POSITION_INT.lat/lon units (decimal degrees, not 1e-7 int32).

Phase 2 — Spec Compliance

AZ-414 (FT-P-07 + FT-N-02)

AC Coverage Status
AC-1 (turn segment ID via |gyro_z| ≥ threshold for ≥3 rows, with synthetic-overlay fallback marked in evidence CSV) test_detect_simple_turn, test_detect_short_run_pruned, test_detect_negative_yaw_uses_abs_value, test_detect_or_synthesize_falls_back_to_overlay, scenario synthetic_overlay column in write_csv_evidence Covered
AC-2 (during-turn label ∈ {visual_propagated, dead_reckoned}) test_ft_n_02_passes_with_only_propagated_labels, test_ft_n_02_fails_on_satellite_anchored_during_turn, FT-N-02 scenario assertion Covered
AC-3 (cov non-decreasing during turn) test_ft_n_02_fails_on_decreasing_covariance, FT-N-02 scenario assertion Covered
AC-4 (recovery ≤ 3 frames after turn end) test_ft_p_07_passes_recovery_within_budget, test_ft_p_07_fails_when_recovery_takes_too_long, FT-P-07 scenario assertion via MAX_RECOVERY_FRAMES_SAFETY_MS Covered
AC-5 (drift ≤ 200 m) test_ft_p_07_fails_when_drift_exceeds_budget, FT-P-07 scenario assertion Covered
AC-6 (heading delta ≤ 70°) test_ft_p_07_heading_envelope_with_pre_anchor, test_ft_p_07_heading_outside_envelope_fails, FT-P-07 scenario assertion Covered
AC-7 (parameterized per (fc_adapter, vio_strategy)) Both scenarios use fc_adapter + vio_strategy fixtures from runner/conftest.pypytest --collect-only shows 6 variants each Covered

Note on AC-3.2 threshold: helper reads AC32_SHARP_TURN_GYRO_Z_MDPS env var (default 30,000 millidegree/s) per spec note ("reads from test-spec environment, not from a hardcoded constant"). Default + env override + validation covered by test_default_threshold_when_env_unset, test_threshold_env_override_applies, test_threshold_env_rejects_non_int, test_threshold_env_rejects_non_positive.

AZ-415 (FT-P-08)

AC Coverage Status
AC-1 (multi-segment-derkachi fixture with three 5-15 s blackouts) Fixture parameter, evidence CSV gap_s column Covered (gated on frame_source_replay/fdr_reader)
AC-2 (source_label = dead_reckoned inside each blackout) test_evaluate_window_label_violation, scenario assertion Covered
AC-3 (recovery to satellite_anchored ≤ 10 s after window end) test_evaluate_window_recovery_within_budget, test_evaluate_window_recovery_misses_budget, scenario assertion Covered
AC-4 (no centre jump > 200 m at recovery) test_evaluate_window_jump_within_budget, test_evaluate_window_jump_exceeds_budget, scenario assertion Covered

AZ-418 (FT-P-10)

AC Coverage Status
AC-1 (raw + smoothed per past keyframe in FDR) test_pair_records_groups_by_keyframe, test_pair_records_keeps_orphans_partial, test_pair_records_rejects_duplicate_pose_kind, test_evaluate_excludes_unpaired_keyframes, scenario record_type == "keyframe_pose" filter Covered
AC-2 (improvement rate ≥ 0.80) test_evaluate_all_smoothed_wins_passes, test_evaluate_at_80_pct_improvement_rate_passes, test_evaluate_below_80_pct_fails_overall, scenario assertion Covered
AC-3 (mean improvement ≥ 5 m) test_evaluate_at_80_pct_improvement_rate_passes, test_evaluate_mean_improvement_below_5m_fails, scenario assertion Covered

Mode B Fact #107 ("INTERNAL improvement metric; NOT FC-side retroactive correction") explicitly preserved in module docstring.

Phase 3 — Code Quality

  • Single responsibility: each helper is one concern. sharp_turn_detector owns detection + per-segment evaluation for both FT-P-07 (recovery) and FT-N-02 (during turn) because both halves consume the same segment fixture and the same outbound-estimate stream — splitting them would duplicate TurnFrameSample collection in two scenarios. The split is at the assertion layer (positive vs negative scenario file), not at the detector. smoothing_evaluator owns pose-pair logic, GT resolution, and budget evaluation. multi_segment_evaluator already reviewed in batch 71 partial.
  • No suppressed errors: every helper raises on invalid input (get_threshold_mdps env validation, pair_records dupe/unknown pose_kind, load_zgyro_samples missing column, synthesize_overlay_segment argument validation, evaluate empty GT track).
  • AAA comment discipline: every test uses `# Arrange / # Act /

    Assert; sections omitted when not needed (single-line Assert` for

    constant tests).
  • No narration comments: docstrings explain non-obvious intent (AC mapping, why orphans are excluded, why nearest-neighbour GT is acceptable, why the env override exists).

Phase 4 — Security

  • No SUT imports: confirmed by passing test_no_sut_imports.py in the full suite. None of the new modules import from src.gps_denied_onboard.
  • No PII/credentials: helpers handle synthetic + Derkachi-public GT only.
  • No SQL/shell injection surface: helpers consume CSV via csv.DictReader and pathlib. No subprocess calls.

Phase 5 — Performance

  • All helpers are O(N) over samples. pair_records is one dict pass; evaluate is O(P) over paired keyframes (P typically ≤200 for the Derkachi window); detect_turn_segments is one scan; evaluate_ft_p_07 uses a linear scan for pre-anchor + recovery (acceptable for ≤1000 samples; if scenario data grows we can switch to bisect).
  • No nested CSV reads or repeated geodesic recomputations.

Phase 6 — Cross-Task Consistency

  • Pattern parity with batches 69 + 70:
    • Skip gate (_harness_helpers_implemented fixture) for missing upstream replay helpers — same pattern as test_ft_p_02_*, test_ft_p_04_*, test_ft_p_05_*.
    • _NullSink / _NullImuEmitter probes — same pattern as test_ft_p_04_*.
    • Evidence CSV via write_csv_evidence(out, …) returning the path — same pattern as accuracy_evaluator, mre_evaluator, multi_segment_evaluator.
    • NFR metrics via nfr_recorder.record_metric(name, value, ac_id=…) — same pattern as test_ft_p_01_*, test_ft_p_04_*.
    • Helper modules importable from runner.helpers.* with module-level constants in UPPER_SNAKE — matches multi_segment_evaluator and mre_evaluator.
  • No drift: FT-P-07 scenario reuses the MAX_RECOVERY_FRAMES_SAFETY_MS budget constant from the helper (no magic numbers); FT-N-02 reuses ALLOWED_DURING_TURN_LABELS set so the scenario assertion message prints the spec-accurate alphabetised set.

Phase 7 — Architecture Compliance

  • Public-boundary discipline: confirmed by test_no_sut_imports.py (passing). Helpers consume only the FDR record schema (record_type
    • payload dict) defined in runner.helpers.fdr_reader.
  • Directory layout: new files added to test_directory_layout.py parametrize list (runner/helpers/{smoothing_evaluator,sharp_turn_detector,multi_segment_evaluator}.py, tests/positive/test_ft_p_{07,08,10}_*.py, tests/negative/test_ft_n_02_*.py). All 75 parametrized variants pass.
  • Determinism: all helpers are deterministic — no time.time(), no random number generation; random not imported in any new module.

Phase 8 — Test Suite Health

  • Total: 393 passed in 126.64s (was 325 at end of batch 70).
  • New tests this batch: +68 (sharp_turn_detector: 30; smoothing_evaluator: 15; multi_segment_evaluator: 16; directory_layout new entries: 7).
  • Pre-existing macOS-only /e2e-results plugin issue still present — affects scenario test invocation outside Docker only; unit suite unaffected. Tracked under runner reporting (out of batch scope).

Cross-Task Consistency Verdict

PASS — no cross-task drift, no duplicated logic across the three new helpers, no shared mutable state, evidence CSV schemas distinct per scenario but follow the same write pattern.

Architecture Compliance Verdict

PASS — public-boundary blackbox stance preserved; no SUT imports; FDR schema honoured.

Final Verdict

PASS — Batch 71 (AZ-414 + AZ-415 + AZ-418) ready for commit.