mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 21:21:13 +00:00

Files

T

Oleksandr Bezdieniezhnykh c6e6cba237 [AZ-414] [AZ-415] [AZ-418] Test batch 71: sharp turn + multi-segment + smoothing

- AZ-414 (FT-P-07 + FT-N-02): sharp_turn_detector helper covering
  AC-1 (gyro_z run detection + synthetic-overlay fallback),
  AC-2/AC-3 (FT-N-02 during-turn label + monotonic covariance),
  AC-4/AC-5/AC-6 (FT-P-07 recovery lag/drift/heading); twin scenario
  files under positive/ and negative/.
- AZ-415 (FT-P-08): multi_segment_evaluator helper + scenario.
- AZ-418 (FT-P-10): smoothing_evaluator helper covering AC-1 (raw +
  smoothed pose pairing), AC-2 (improvement rate >= 0.80), AC-3
  (mean improvement >= 5 m); scenario file.
- All scenarios skip-gated on upstream frame_source_replay /
  imu_replay / fdr_reader stubs (auto-activate when AZ-441 + AZ-407
  leftovers land).
- +68 unit tests; full e2e unit suite: 393 passed.

See _docs/03_implementation/batch_71_report.md and
_docs/03_implementation/reviews/batch_71_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-17 07:12:24 +03:00

8.8 KiB

Raw Blame History

Code Review Report

Batch: 71 — AZ-414, AZ-415, AZ-418 Date: 2026-05-16 Verdict: PASS

Findings

(none)

Findings Sweep

Phase 1 — Context Loading

Loaded specs AZ-414_ft_p_07_ftn_02_sharp_turn.md, AZ-415_ft_p_08_multi_segment_reloc.md, AZ-418_ft_p_10_smoothing_lookback.md. Reused the existing geo.py, sharp_turn_detector.py (new this batch), multi_segment_evaluator.py (new this batch), smoothing_evaluator.py (new this batch), and the fdr_reader / frame_source_replay / imu_replay stub gates used by batches 69 and 70. Re-read _docs/00_problem/input_data/flight_derkachi/data_imu.csv header layout to confirm SCALED_IMU2.zgyro column and GLOBAL_POSITION_INT.lat/lon units (decimal degrees, not 1e-7 int32).

Phase 2 — Spec Compliance

AZ-414 (FT-P-07 + FT-N-02)

AC	Coverage	Status
AC-1 (turn segment ID via `\|gyro_z\| ≥ threshold` for ≥3 rows, with synthetic-overlay fallback marked in evidence CSV)	`test_detect_simple_turn`, `test_detect_short_run_pruned`, `test_detect_negative_yaw_uses_abs_value`, `test_detect_or_synthesize_falls_back_to_overlay`, scenario `synthetic_overlay` column in `write_csv_evidence`	Covered
AC-2 (during-turn label ∈ {visual_propagated, dead_reckoned})	`test_ft_n_02_passes_with_only_propagated_labels`, `test_ft_n_02_fails_on_satellite_anchored_during_turn`, FT-N-02 scenario assertion	Covered
AC-3 (cov non-decreasing during turn)	`test_ft_n_02_fails_on_decreasing_covariance`, FT-N-02 scenario assertion	Covered
AC-4 (recovery ≤ 3 frames after turn end)	`test_ft_p_07_passes_recovery_within_budget`, `test_ft_p_07_fails_when_recovery_takes_too_long`, FT-P-07 scenario assertion via `MAX_RECOVERY_FRAMES_SAFETY_MS`	Covered
AC-5 (drift ≤ 200 m)	`test_ft_p_07_fails_when_drift_exceeds_budget`, FT-P-07 scenario assertion	Covered
AC-6 (heading delta ≤ 70°)	`test_ft_p_07_heading_envelope_with_pre_anchor`, `test_ft_p_07_heading_outside_envelope_fails`, FT-P-07 scenario assertion	Covered
AC-7 (parameterized per `(fc_adapter, vio_strategy)`)	Both scenarios use `fc_adapter` + `vio_strategy` fixtures from `runner/conftest.py` — `pytest --collect-only` shows 6 variants each	Covered

Note on AC-3.2 threshold: helper reads AC32_SHARP_TURN_GYRO_Z_MDPS env var (default 30,000 millidegree/s) per spec note ("reads from test-spec environment, not from a hardcoded constant"). Default + env override + validation covered by test_default_threshold_when_env_unset, test_threshold_env_override_applies, test_threshold_env_rejects_non_int, test_threshold_env_rejects_non_positive.

AZ-415 (FT-P-08)

AC	Coverage	Status
AC-1 (multi-segment-derkachi fixture with three 5-15 s blackouts)	Fixture parameter, evidence CSV `gap_s` column	Covered (gated on `frame_source_replay`/`fdr_reader`)
AC-2 (source_label = dead_reckoned inside each blackout)	`test_evaluate_window_label_violation`, scenario assertion	Covered
AC-3 (recovery to satellite_anchored ≤ 10 s after window end)	`test_evaluate_window_recovery_within_budget`, `test_evaluate_window_recovery_misses_budget`, scenario assertion	Covered
AC-4 (no centre jump > 200 m at recovery)	`test_evaluate_window_jump_within_budget`, `test_evaluate_window_jump_exceeds_budget`, scenario assertion	Covered

AZ-418 (FT-P-10)

AC	Coverage	Status
AC-1 (raw + smoothed per past keyframe in FDR)	`test_pair_records_groups_by_keyframe`, `test_pair_records_keeps_orphans_partial`, `test_pair_records_rejects_duplicate_pose_kind`, `test_evaluate_excludes_unpaired_keyframes`, scenario `record_type == "keyframe_pose"` filter	Covered
AC-2 (improvement rate ≥ 0.80)	`test_evaluate_all_smoothed_wins_passes`, `test_evaluate_at_80_pct_improvement_rate_passes`, `test_evaluate_below_80_pct_fails_overall`, scenario assertion	Covered
AC-3 (mean improvement ≥ 5 m)	`test_evaluate_at_80_pct_improvement_rate_passes`, `test_evaluate_mean_improvement_below_5m_fails`, scenario assertion	Covered

Mode B Fact #107 ("INTERNAL improvement metric; NOT FC-side retroactive correction") explicitly preserved in module docstring.

Phase 3 — Code Quality

Single responsibility: each helper is one concern. sharp_turn_detector owns detection + per-segment evaluation for both FT-P-07 (recovery) and FT-N-02 (during turn) because both halves consume the same segment fixture and the same outbound-estimate stream — splitting them would duplicate TurnFrameSample collection in two scenarios. The split is at the assertion layer (positive vs negative scenario file), not at the detector. smoothing_evaluator owns pose-pair logic, GT resolution, and budget evaluation. multi_segment_evaluator already reviewed in batch 71 partial.
No suppressed errors: every helper raises on invalid input (get_threshold_mdps env validation, pair_records dupe/unknown pose_kind, load_zgyro_samples missing column, synthesize_overlay_segment argument validation, evaluate empty GT track).
AAA comment discipline: every test uses `# Arrange / # Act /
Assert; sections omitted when not needed (single-line Assert` for
constant tests).
No narration comments: docstrings explain non-obvious intent (AC mapping, why orphans are excluded, why nearest-neighbour GT is acceptable, why the env override exists).

Phase 4 — Security

No SUT imports: confirmed by passing test_no_sut_imports.py in the full suite. None of the new modules import from src.gps_denied_onboard.
No PII/credentials: helpers handle synthetic + Derkachi-public GT only.
No SQL/shell injection surface: helpers consume CSV via csv.DictReader and pathlib. No subprocess calls.

Phase 5 — Performance

All helpers are O(N) over samples. pair_records is one dict pass; evaluate is O(P) over paired keyframes (P typically ≤200 for the Derkachi window); detect_turn_segments is one scan; evaluate_ft_p_07 uses a linear scan for pre-anchor + recovery (acceptable for ≤1000 samples; if scenario data grows we can switch to bisect).
No nested CSV reads or repeated geodesic recomputations.

Phase 6 — Cross-Task Consistency

Pattern parity with batches 69 + 70:
- Skip gate (_harness_helpers_implemented fixture) for missing upstream replay helpers — same pattern as test_ft_p_02_*, test_ft_p_04_*, test_ft_p_05_*.
- _NullSink / _NullImuEmitter probes — same pattern as test_ft_p_04_*.
- Evidence CSV via write_csv_evidence(out, …) returning the path — same pattern as accuracy_evaluator, mre_evaluator, multi_segment_evaluator.
- NFR metrics via nfr_recorder.record_metric(name, value, ac_id=…) — same pattern as test_ft_p_01_*, test_ft_p_04_*.
- Helper modules importable from runner.helpers.* with module-level constants in UPPER_SNAKE — matches multi_segment_evaluator and mre_evaluator.
No drift: FT-P-07 scenario reuses the MAX_RECOVERY_FRAMES_SAFETY_MS budget constant from the helper (no magic numbers); FT-N-02 reuses ALLOWED_DURING_TURN_LABELS set so the scenario assertion message prints the spec-accurate alphabetised set.

Phase 7 — Architecture Compliance

Public-boundary discipline: confirmed by test_no_sut_imports.py (passing). Helpers consume only the FDR record schema (record_type
- payload dict) defined in runner.helpers.fdr_reader.
Directory layout: new files added to test_directory_layout.py parametrize list (runner/helpers/{smoothing_evaluator,sharp_turn_detector,multi_segment_evaluator}.py, tests/positive/test_ft_p_{07,08,10}_*.py, tests/negative/test_ft_n_02_*.py). All 75 parametrized variants pass.
Determinism: all helpers are deterministic — no time.time(), no random number generation; random not imported in any new module.

Phase 8 — Test Suite Health

Total: 393 passed in 126.64s (was 325 at end of batch 70).
New tests this batch: +68 (sharp_turn_detector: 30; smoothing_evaluator: 15; multi_segment_evaluator: 16; directory_layout new entries: 7).
Pre-existing macOS-only /e2e-results plugin issue still present — affects scenario test invocation outside Docker only; unit suite unaffected. Tracked under runner reporting (out of batch scope).

Cross-Task Consistency Verdict

PASS — no cross-task drift, no duplicated logic across the three new helpers, no shared mutable state, evidence CSV schemas distinct per scenario but follow the same write pattern.

Architecture Compliance Verdict

PASS — public-boundary blackbox stance preserved; no SUT imports; FDR schema honoured.

Final Verdict

PASS — Batch 71 (AZ-414 + AZ-415 + AZ-418) ready for commit.

8.8 KiB Raw Blame History