- AZ-414 (FT-P-07 + FT-N-02): sharp_turn_detector helper covering AC-1 (gyro_z run detection + synthetic-overlay fallback), AC-2/AC-3 (FT-N-02 during-turn label + monotonic covariance), AC-4/AC-5/AC-6 (FT-P-07 recovery lag/drift/heading); twin scenario files under positive/ and negative/. - AZ-415 (FT-P-08): multi_segment_evaluator helper + scenario. - AZ-418 (FT-P-10): smoothing_evaluator helper covering AC-1 (raw + smoothed pose pairing), AC-2 (improvement rate >= 0.80), AC-3 (mean improvement >= 5 m); scenario file. - All scenarios skip-gated on upstream frame_source_replay / imu_replay / fdr_reader stubs (auto-activate when AZ-441 + AZ-407 leftovers land). - +68 unit tests; full e2e unit suite: 393 passed. See _docs/03_implementation/batch_71_report.md and _docs/03_implementation/reviews/batch_71_review.md. Co-authored-by: Cursor <cursoragent@cursor.com>
8.8 KiB
Code Review Report
Batch: 71 — AZ-414, AZ-415, AZ-418 Date: 2026-05-16 Verdict: PASS
Findings
(none)
Findings Sweep
Phase 1 — Context Loading
Loaded specs AZ-414_ft_p_07_ftn_02_sharp_turn.md,
AZ-415_ft_p_08_multi_segment_reloc.md,
AZ-418_ft_p_10_smoothing_lookback.md. Reused the existing geo.py,
sharp_turn_detector.py (new this batch), multi_segment_evaluator.py
(new this batch), smoothing_evaluator.py (new this batch), and the
fdr_reader / frame_source_replay / imu_replay stub gates used by
batches 69 and 70. Re-read _docs/00_problem/input_data/flight_derkachi/data_imu.csv
header layout to confirm SCALED_IMU2.zgyro column and GLOBAL_POSITION_INT.lat/lon
units (decimal degrees, not 1e-7 int32).
Phase 2 — Spec Compliance
AZ-414 (FT-P-07 + FT-N-02)
| AC | Coverage | Status |
|---|---|---|
AC-1 (turn segment ID via |gyro_z| ≥ threshold for ≥3 rows, with synthetic-overlay fallback marked in evidence CSV) |
test_detect_simple_turn, test_detect_short_run_pruned, test_detect_negative_yaw_uses_abs_value, test_detect_or_synthesize_falls_back_to_overlay, scenario synthetic_overlay column in write_csv_evidence |
Covered |
| AC-2 (during-turn label ∈ {visual_propagated, dead_reckoned}) | test_ft_n_02_passes_with_only_propagated_labels, test_ft_n_02_fails_on_satellite_anchored_during_turn, FT-N-02 scenario assertion |
Covered |
| AC-3 (cov non-decreasing during turn) | test_ft_n_02_fails_on_decreasing_covariance, FT-N-02 scenario assertion |
Covered |
| AC-4 (recovery ≤ 3 frames after turn end) | test_ft_p_07_passes_recovery_within_budget, test_ft_p_07_fails_when_recovery_takes_too_long, FT-P-07 scenario assertion via MAX_RECOVERY_FRAMES_SAFETY_MS |
Covered |
| AC-5 (drift ≤ 200 m) | test_ft_p_07_fails_when_drift_exceeds_budget, FT-P-07 scenario assertion |
Covered |
| AC-6 (heading delta ≤ 70°) | test_ft_p_07_heading_envelope_with_pre_anchor, test_ft_p_07_heading_outside_envelope_fails, FT-P-07 scenario assertion |
Covered |
AC-7 (parameterized per (fc_adapter, vio_strategy)) |
Both scenarios use fc_adapter + vio_strategy fixtures from runner/conftest.py — pytest --collect-only shows 6 variants each |
Covered |
Note on AC-3.2 threshold: helper reads AC32_SHARP_TURN_GYRO_Z_MDPS
env var (default 30,000 millidegree/s) per spec note ("reads from
test-spec environment, not from a hardcoded constant"). Default + env
override + validation covered by test_default_threshold_when_env_unset,
test_threshold_env_override_applies, test_threshold_env_rejects_non_int,
test_threshold_env_rejects_non_positive.
AZ-415 (FT-P-08)
| AC | Coverage | Status |
|---|---|---|
| AC-1 (multi-segment-derkachi fixture with three 5-15 s blackouts) | Fixture parameter, evidence CSV gap_s column |
Covered (gated on frame_source_replay/fdr_reader) |
| AC-2 (source_label = dead_reckoned inside each blackout) | test_evaluate_window_label_violation, scenario assertion |
Covered |
| AC-3 (recovery to satellite_anchored ≤ 10 s after window end) | test_evaluate_window_recovery_within_budget, test_evaluate_window_recovery_misses_budget, scenario assertion |
Covered |
| AC-4 (no centre jump > 200 m at recovery) | test_evaluate_window_jump_within_budget, test_evaluate_window_jump_exceeds_budget, scenario assertion |
Covered |
AZ-418 (FT-P-10)
| AC | Coverage | Status |
|---|---|---|
| AC-1 (raw + smoothed per past keyframe in FDR) | test_pair_records_groups_by_keyframe, test_pair_records_keeps_orphans_partial, test_pair_records_rejects_duplicate_pose_kind, test_evaluate_excludes_unpaired_keyframes, scenario record_type == "keyframe_pose" filter |
Covered |
| AC-2 (improvement rate ≥ 0.80) | test_evaluate_all_smoothed_wins_passes, test_evaluate_at_80_pct_improvement_rate_passes, test_evaluate_below_80_pct_fails_overall, scenario assertion |
Covered |
| AC-3 (mean improvement ≥ 5 m) | test_evaluate_at_80_pct_improvement_rate_passes, test_evaluate_mean_improvement_below_5m_fails, scenario assertion |
Covered |
Mode B Fact #107 ("INTERNAL improvement metric; NOT FC-side retroactive correction") explicitly preserved in module docstring.
Phase 3 — Code Quality
- Single responsibility: each helper is one concern.
sharp_turn_detectorowns detection + per-segment evaluation for both FT-P-07 (recovery) and FT-N-02 (during turn) because both halves consume the same segment fixture and the same outbound-estimate stream — splitting them would duplicateTurnFrameSamplecollection in two scenarios. The split is at the assertion layer (positive vs negative scenario file), not at the detector.smoothing_evaluatorowns pose-pair logic, GT resolution, and budget evaluation.multi_segment_evaluatoralready reviewed in batch 71 partial. - No suppressed errors: every helper raises on invalid input
(
get_threshold_mdpsenv validation,pair_recordsdupe/unknown pose_kind,load_zgyro_samplesmissing column,synthesize_overlay_segmentargument validation,evaluateempty GT track). - AAA comment discipline: every test uses `# Arrange / # Act /
Assert
constant tests).; sections omitted when not needed (single-lineAssert` for - No narration comments: docstrings explain non-obvious intent (AC mapping, why orphans are excluded, why nearest-neighbour GT is acceptable, why the env override exists).
Phase 4 — Security
- No SUT imports: confirmed by passing
test_no_sut_imports.pyin the full suite. None of the new modules import fromsrc.gps_denied_onboard. - No PII/credentials: helpers handle synthetic + Derkachi-public GT only.
- No SQL/shell injection surface: helpers consume CSV via
csv.DictReaderandpathlib. No subprocess calls.
Phase 5 — Performance
- All helpers are O(N) over samples.
pair_recordsis one dict pass;evaluateis O(P) over paired keyframes (P typically ≤200 for the Derkachi window);detect_turn_segmentsis one scan;evaluate_ft_p_07uses a linear scan for pre-anchor + recovery (acceptable for ≤1000 samples; if scenario data grows we can switch to bisect). - No nested CSV reads or repeated geodesic recomputations.
Phase 6 — Cross-Task Consistency
- Pattern parity with batches 69 + 70:
- Skip gate (
_harness_helpers_implementedfixture) for missing upstream replay helpers — same pattern astest_ft_p_02_*,test_ft_p_04_*,test_ft_p_05_*. _NullSink/_NullImuEmitterprobes — same pattern astest_ft_p_04_*.- Evidence CSV via
write_csv_evidence(out, …)returning the path — same pattern asaccuracy_evaluator,mre_evaluator,multi_segment_evaluator. - NFR metrics via
nfr_recorder.record_metric(name, value, ac_id=…)— same pattern astest_ft_p_01_*,test_ft_p_04_*. - Helper modules importable from
runner.helpers.*with module-level constants inUPPER_SNAKE— matchesmulti_segment_evaluatorandmre_evaluator.
- Skip gate (
- No drift: FT-P-07 scenario reuses the
MAX_RECOVERY_FRAMES_SAFETY_MSbudget constant from the helper (no magic numbers); FT-N-02 reusesALLOWED_DURING_TURN_LABELSset so the scenario assertion message prints the spec-accurate alphabetised set.
Phase 7 — Architecture Compliance
- Public-boundary discipline: confirmed by
test_no_sut_imports.py(passing). Helpers consume only the FDR record schema (record_type- payload dict) defined in
runner.helpers.fdr_reader.
- payload dict) defined in
- Directory layout: new files added to
test_directory_layout.pyparametrize list (runner/helpers/{smoothing_evaluator,sharp_turn_detector,multi_segment_evaluator}.py,tests/positive/test_ft_p_{07,08,10}_*.py,tests/negative/test_ft_n_02_*.py). All 75 parametrized variants pass. - Determinism: all helpers are deterministic — no
time.time(), no random number generation;randomnot imported in any new module.
Phase 8 — Test Suite Health
- Total: 393 passed in 126.64s (was 325 at end of batch 70).
- New tests this batch: +68 (sharp_turn_detector: 30; smoothing_evaluator: 15; multi_segment_evaluator: 16; directory_layout new entries: 7).
- Pre-existing macOS-only
/e2e-resultsplugin issue still present — affects scenario test invocation outside Docker only; unit suite unaffected. Tracked under runner reporting (out of batch scope).
Cross-Task Consistency Verdict
PASS — no cross-task drift, no duplicated logic across the three new helpers, no shared mutable state, evidence CSV schemas distinct per scenario but follow the same write pattern.
Architecture Compliance Verdict
PASS — public-boundary blackbox stance preserved; no SUT imports; FDR schema honoured.
Final Verdict
PASS — Batch 71 (AZ-414 + AZ-415 + AZ-418) ready for commit.