mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 02:21:13 +00:00
[AZ-414] [AZ-415] [AZ-418] Test batch 71: sharp turn + multi-segment + smoothing
- AZ-414 (FT-P-07 + FT-N-02): sharp_turn_detector helper covering AC-1 (gyro_z run detection + synthetic-overlay fallback), AC-2/AC-3 (FT-N-02 during-turn label + monotonic covariance), AC-4/AC-5/AC-6 (FT-P-07 recovery lag/drift/heading); twin scenario files under positive/ and negative/.
- AZ-415 (FT-P-08): multi_segment_evaluator helper + scenario.
- AZ-418 (FT-P-10): smoothing_evaluator helper covering AC-1 (raw + smoothed pose pairing), AC-2 (improvement rate >= 0.80), AC-3 (mean improvement >= 5 m); scenario file.
- All scenarios skip-gated on upstream frame_source_replay / imu_replay / fdr_reader stubs (auto-activate when AZ-441 + AZ-407 leftovers land).
- +68 unit tests; full e2e unit suite: 393 passed.

See _docs/03_implementation/batch_71_report.md and _docs/03_implementation/reviews/batch_71_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,119 @@
# Batch 71 Report — Test Implementation (cycle 1, batch 5 of test phase)

**Batch**: 71
**Date**: 2026-05-16
**Context**: Test implementation (greenfield Step 10 — Implement Tests)
**Tasks**: AZ-414 (3pt), AZ-415 (3pt), AZ-418 (3pt) — 9 cp / 3 tasks
**Cycle**: 1
**Verdict**: COMPLETE — PASS (self-reviewed; see `reviews/batch_71_review.md`)

## Summary

Three scenarios covering sharp-turn recovery (positive + negative twin),
multi-segment relocalisation, and GTSAM smoothing-loop look-back. Same
pattern as batches 69 + 70:

* Pure-logic helper under `e2e/runner/helpers/` for everything the
  scenario can express without docker-bound replay + FDR ingestion.
* Scenario file(s) under `e2e/tests/{positive,negative}/`,
  parameterized across `(fc_adapter, vio_strategy)`, skip-gated on
  upstream `frame_source_replay` / `imu_replay` / `fdr_reader` stubs
  (auto-activates when AZ-441 + AZ-407 leftovers land).
* Helper-driven unit test file under `e2e/_unit_tests/helpers/`.

### AZ-414 — FT-P-07 + FT-N-02 sharp-turn recovery + failure twin (3pt)

* **`runner/helpers/sharp_turn_detector.py`** — `load_zgyro_samples`
  reads `SCALED_IMU2.zgyro` (millidegree/s) from `data_imu.csv`;
  `get_threshold_mdps` reads the AC-3.2 threshold from env var
  `AC32_SHARP_TURN_GYRO_Z_MDPS` (default 30,000 mdps) per spec note;
  `detect_turn_segments` finds contiguous runs of ≥3 samples at or above
  threshold (using `|zgyro|` so left + right turns both qualify);
  `synthesize_overlay_segment` provides the AC-1 fallback when the
  natural fixture has no qualifying turn; `detect_or_synthesize` is the
  scenario-facing helper that prefers a natural turn and falls back to
  the synthetic overlay; `evaluate_ft_n_02`
  computes AC-2 (label ∈ {visual_propagated, dead_reckoned}) + AC-3
  (cov non-decreasing); `evaluate_ft_p_07` computes AC-4 (recovery
  lag ≤ 3 frames, enforced as a 1100 ms safety budget), AC-5 (drift ≤ 200 m), AC-6
  (heading delta ≤ 70°). `write_csv_evidence` emits a combined CSV
  with a `synthetic_overlay` column so the use of the fallback is recorded per
  AC-1.
* **`tests/positive/test_ft_p_07_sharp_turn_recovery.py`** — asserts
  AC-4 + AC-5 + AC-6 per segment; records NFR metrics with AC IDs;
  records `synthetic_overlay` flag.
* **`tests/negative/test_ft_n_02_sharp_turn_failure.py`** — asserts
  AC-2 + AC-3 per segment; uses the same detector + frame-collection
  logic.
* **30 unit tests** in `test_sharp_turn_detector.py`.
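
The detection and during-turn checks described above can be sketched as follows. This is a minimal illustration and not the helper's actual code: the `TurnSegment` container, the function signatures, and the use of a single scalar per frame to summarise covariance are assumptions made for the example. Only the ≥3-sample run rule, the use of `|zgyro|`, and the two allowed labels come from the report.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumption: minimum run length taken from the "≥3 samples" rule in the report.
MIN_RUN_SAMPLES = 3
# Assumption: the AC-2 label set as named in the report.
ALLOWED_DURING_TURN_LABELS = frozenset({"dead_reckoned", "visual_propagated"})


@dataclass(frozen=True)
class TurnSegment:
    """Hypothetical container: inclusive sample indices of one turn."""
    start_idx: int
    end_idx: int


def detect_turn_segments(zgyro_mdps: Sequence[float], threshold_mdps: float,
                         min_run: int = MIN_RUN_SAMPLES) -> List[TurnSegment]:
    """Return contiguous runs of >= min_run samples with |zgyro| >= threshold."""
    segments: List[TurnSegment] = []
    run_start = None
    for i, value in enumerate(zgyro_mdps):
        if abs(value) >= threshold_mdps:  # abs() so left and right turns qualify
            if run_start is None:
                run_start = i
            continue
        # The run (if any) ended at i - 1; keep it only if it is long enough.
        if run_start is not None and i - run_start >= min_run:
            segments.append(TurnSegment(run_start, i - 1))
        run_start = None
    # Close a run that reaches the end of the data.
    if run_start is not None and len(zgyro_mdps) - run_start >= min_run:
        segments.append(TurnSegment(run_start, len(zgyro_mdps) - 1))
    return segments


def evaluate_ft_n_02(frames: Sequence[Tuple[str, float]]) -> Tuple[bool, bool]:
    """Check (label, covariance scalar) frames collected during one turn.

    Returns (labels_ok, cov_non_decreasing) for AC-2 and AC-3.
    """
    labels_ok = all(label in ALLOWED_DURING_TURN_LABELS for label, _ in frames)
    covs = [cov for _, cov in frames]
    cov_ok = all(later >= earlier for earlier, later in zip(covs, covs[1:]))
    return labels_ok, cov_ok


samples = [0, 31_000, 32_000, 33_000, 0, -40_000, -41_000, 0,
           -35_000, -36_000, -37_000]
segments = detect_turn_segments(samples, threshold_mdps=30_000)
```

In this example the two-sample run at indices 5 to 6 is pruned, while the negative-yaw run at the end of the data is kept because the comparison uses the absolute value.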

### AZ-415 — FT-P-08 multi-segment relocalisation (3pt)

* **`runner/helpers/multi_segment_evaluator.py`** — `load_schedule`
  reads the AZ-408 multi-segment schedule (blackout window times +
  recovery anchors); `evaluate_window` checks AC-2 (label =
  dead_reckoned inside window), AC-3 (recovery to satellite_anchored
  ≤ 10 s after window end), AC-4 (no centre jump > 200 m at recovery);
  `evaluate` aggregates over all blackout windows; constants
  `MAX_RECOVERY_LAG_MS = 10_000` and `MAX_JUMP_M = 200.0` expose the
  thresholds.
* **`tests/positive/test_ft_p_08_multi_segment_reloc.py`** — drives
  `multi-segment-derkachi` injector fixture, replays via stub-gated
  helpers, evaluates per window.
* **16 unit tests** in `test_multi_segment_evaluator.py`.
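
A per-window check along these lines might look like the sketch below. The `Estimate` and `WindowResult` types, the haversine distance, and the reading of "centre jump" as the distance between the recovery estimate and the estimate immediately before it are all assumptions made for illustration; only the two constants and the three criteria are taken from the report.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

MAX_RECOVERY_LAG_MS = 10_000   # AC-3 budget named in the report
MAX_JUMP_M = 200.0             # AC-4 budget named in the report
EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius used by the haversine formula


@dataclass(frozen=True)
class Estimate:
    """Hypothetical outbound estimate: time, source label, centre position."""
    t_ms: int
    label: str
    lat_deg: float
    lon_deg: float


@dataclass(frozen=True)
class WindowResult:
    labels_ok: bool
    recovery_lag_ms: Optional[int]
    jump_m: Optional[float]

    @property
    def passed(self) -> bool:
        return (self.labels_ok
                and self.recovery_lag_ms is not None
                and self.recovery_lag_ms <= MAX_RECOVERY_LAG_MS
                and self.jump_m is not None
                and self.jump_m <= MAX_JUMP_M)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def evaluate_window(estimates: Sequence[Estimate], start_ms: int, end_ms: int) -> WindowResult:
    ordered = sorted(estimates, key=lambda e: e.t_ms)
    inside = [e for e in ordered if start_ms <= e.t_ms <= end_ms]
    # AC-2: every estimate inside the blackout must be dead_reckoned.
    labels_ok = bool(inside) and all(e.label == "dead_reckoned" for e in inside)
    # AC-3: first satellite_anchored estimate after the window end.
    rec_idx = next((i for i, e in enumerate(ordered)
                    if e.t_ms > end_ms and e.label == "satellite_anchored"), None)
    if rec_idx is None:
        return WindowResult(labels_ok, None, None)
    rec = ordered[rec_idx]
    # AC-4 (assumed definition): distance from the previous estimate.
    jump_m = None
    if rec_idx > 0:
        prev = ordered[rec_idx - 1]
        jump_m = haversine_m(prev.lat_deg, prev.lon_deg, rec.lat_deg, rec.lon_deg)
    return WindowResult(labels_ok, rec.t_ms - end_ms, jump_m)


track = [
    Estimate(0, "satellite_anchored", 50.0000, 36.0),
    Estimate(1_000, "dead_reckoned", 50.0002, 36.0),
    Estimate(5_000, "dead_reckoned", 50.0006, 36.0),
    Estimate(7_000, "visual_propagated", 50.0008, 36.0),
    Estimate(9_000, "satellite_anchored", 50.0010, 36.0),
]
result = evaluate_window(track, start_ms=1_000, end_ms=6_000)
```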

### AZ-418 — FT-P-10 GTSAM smoothing-loop look-back accuracy (3pt)

* **`runner/helpers/smoothing_evaluator.py`** — `pair_records` groups
  FDR `keyframe_pose` records by `keyframe_id` and rejects duplicate
  `pose_kind` values (raw + smoothed must be exactly one each per
  keyframe per AC-1); `resolve_gt_at` picks the nearest-in-time
  Derkachi GLOBAL_POSITION_INT pose (10 Hz cadence → ≤50 ms slop
  acceptable for the metre-scale improvement deltas this AC
  measures); `evaluate` produces per-keyframe + aggregate report with
  `improvement_rate` (AC-2 threshold = 0.80) and `mean_improvement_m`
  (AC-3 threshold = 5 m). The module docstring explicitly preserves
  Mode B Fact #107 — this is an INTERNAL improvement metric, NOT
  FC-side retroactive correction.
* **`tests/positive/test_ft_p_10_smoothing_lookback.py`** — pairs +
  evaluates + asserts AC-2 + AC-3 per (fc_adapter, vio_strategy).
* **15 unit tests** in `test_smoothing_evaluator.py`.
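
To make the pairing and the two aggregate metrics concrete, here is a small sketch. It assumes each record already carries a hypothetical `error_m` field (distance to ground truth), and it assumes that a keyframe counts as improved when the smoothed error is strictly smaller than the raw error and that the mean is taken over all paired keyframes. The real helper resolves ground truth itself and may define these metrics differently.

```python
from typing import Dict, Iterable

IMPROVEMENT_RATE_MIN = 0.80     # AC-2 threshold from the report
MEAN_IMPROVEMENT_MIN_M = 5.0    # AC-3 threshold from the report
POSE_KINDS = ("raw", "smoothed")


def pair_records(records: Iterable[dict]) -> Dict[int, Dict[str, float]]:
    """Group records by keyframe_id; reject unknown or duplicate pose kinds."""
    pairs: Dict[int, Dict[str, float]] = {}
    for rec in records:
        kind = rec["pose_kind"]
        if kind not in POSE_KINDS:
            raise ValueError(f"unknown pose_kind: {kind!r}")
        slot = pairs.setdefault(rec["keyframe_id"], {})
        if kind in slot:
            raise ValueError(f"duplicate {kind!r} pose for keyframe {rec['keyframe_id']}")
        slot[kind] = rec["error_m"]  # hypothetical precomputed error to ground truth
    return pairs


def evaluate(pairs: Dict[int, Dict[str, float]]) -> dict:
    """Aggregate over keyframes that have both a raw and a smoothed pose."""
    improvements = [p["raw"] - p["smoothed"]
                    for p in pairs.values() if len(p) == len(POSE_KINDS)]
    if not improvements:
        raise ValueError("no complete raw + smoothed pairs")
    rate = sum(1 for delta in improvements if delta > 0) / len(improvements)
    mean_m = sum(improvements) / len(improvements)
    return {
        "paired_keyframes": len(improvements),
        "improvement_rate": rate,
        "mean_improvement_m": mean_m,
        "passed": rate >= IMPROVEMENT_RATE_MIN and mean_m >= MEAN_IMPROVEMENT_MIN_M,
    }


# Four keyframes improve by 10 m, one gets 5 m worse, one orphan is ignored.
errors = [(20.0, 10.0)] * 4 + [(10.0, 15.0)]
records = []
for keyframe_id, (raw_err, smoothed_err) in enumerate(errors):
    records.append({"keyframe_id": keyframe_id, "pose_kind": "raw", "error_m": raw_err})
    records.append({"keyframe_id": keyframe_id, "pose_kind": "smoothed", "error_m": smoothed_err})
records.append({"keyframe_id": 99, "pose_kind": "raw", "error_m": 12.0})

report = evaluate(pair_records(records))
```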

## Tests

* **Full e2e unit suite**: 393 passed in 126.64 s (was 325 at end of
  batch 70 → +68 net new tests this batch).
* **Pre-existing**: macOS-only `/e2e-results` plugin issue in scenario
  invocation outside Docker. Unit suite unaffected. Tracked under
  runner reporting — out of batch scope.

## Files Touched

**New helpers:**
* `e2e/runner/helpers/sharp_turn_detector.py`
* `e2e/runner/helpers/smoothing_evaluator.py`
* `e2e/runner/helpers/multi_segment_evaluator.py`

**New unit tests:**
* `e2e/_unit_tests/helpers/test_sharp_turn_detector.py` (30 tests)
* `e2e/_unit_tests/helpers/test_smoothing_evaluator.py` (15 tests)
* `e2e/_unit_tests/helpers/test_multi_segment_evaluator.py` (16 tests)

**New scenarios:**
* `e2e/tests/positive/test_ft_p_07_sharp_turn_recovery.py`
* `e2e/tests/positive/test_ft_p_08_multi_segment_reloc.py`
* `e2e/tests/positive/test_ft_p_10_smoothing_lookback.py`
* `e2e/tests/negative/test_ft_n_02_sharp_turn_failure.py`

**Updated:**
* `e2e/_unit_tests/test_directory_layout.py` — added 7 new paths.

**Archived:**
* `_docs/02_tasks/todo/AZ-414_*.md` → `done/`
* `_docs/02_tasks/todo/AZ-415_*.md` → `done/`
* `_docs/02_tasks/todo/AZ-418_*.md` → `done/`

## Cumulative Review Trigger

K=3. Last cumulative covered batches 67-69. Since then: 70 + 71 = 2
batches. **Cumulative does NOT fire this batch.** Next cumulative
trigger: end of batch 72.
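
The trigger arithmetic above reduces to a single comparison. The function name and signature below are hypothetical; only K=3 and the batch numbers come from the report.

```python
def cumulative_review_due(last_completed_batch: int,
                          last_cumulative_end: int,
                          k: int = 3) -> bool:
    """True once k batches have completed since the last cumulative review."""
    return last_completed_batch - last_cumulative_end >= k


due_after_71 = cumulative_review_due(71, last_cumulative_end=69)  # 2 batches since
due_after_72 = cumulative_review_due(72, last_cumulative_end=69)  # 3 batches since
```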
@@ -0,0 +1,162 @@
# Code Review Report

**Batch**: 71 — AZ-414, AZ-415, AZ-418
**Date**: 2026-05-16
**Verdict**: PASS

## Findings

(none)

## Findings Sweep

### Phase 1 — Context Loading

Loaded specs `AZ-414_ft_p_07_ftn_02_sharp_turn.md`,
`AZ-415_ft_p_08_multi_segment_reloc.md`,
`AZ-418_ft_p_10_smoothing_lookback.md`. Reused the existing `geo.py`,
`sharp_turn_detector.py` (new this batch), `multi_segment_evaluator.py`
(new this batch), `smoothing_evaluator.py` (new this batch), and the
`fdr_reader` / `frame_source_replay` / `imu_replay` stub gates used by
batches 69 and 70. Re-read `_docs/00_problem/input_data/flight_derkachi/data_imu.csv`
header layout to confirm `SCALED_IMU2.zgyro` column and `GLOBAL_POSITION_INT.lat/lon`
units (decimal degrees, not 1e-7 int32).

### Phase 2 — Spec Compliance

**AZ-414 (FT-P-07 + FT-N-02)**

| AC | Coverage | Status |
|----|----------|--------|
| AC-1 (turn segment ID via `\|gyro_z\| ≥ threshold` for ≥3 rows, with synthetic-overlay fallback marked in evidence CSV) | `test_detect_simple_turn`, `test_detect_short_run_pruned`, `test_detect_negative_yaw_uses_abs_value`, `test_detect_or_synthesize_falls_back_to_overlay`, scenario `synthetic_overlay` column in `write_csv_evidence` | Covered |
| AC-2 (during-turn label ∈ {visual_propagated, dead_reckoned}) | `test_ft_n_02_passes_with_only_propagated_labels`, `test_ft_n_02_fails_on_satellite_anchored_during_turn`, FT-N-02 scenario assertion | Covered |
| AC-3 (cov non-decreasing during turn) | `test_ft_n_02_fails_on_decreasing_covariance`, FT-N-02 scenario assertion | Covered |
| AC-4 (recovery ≤ 3 frames after turn end) | `test_ft_p_07_passes_recovery_within_budget`, `test_ft_p_07_fails_when_recovery_takes_too_long`, FT-P-07 scenario assertion via `MAX_RECOVERY_FRAMES_SAFETY_MS` | Covered |
| AC-5 (drift ≤ 200 m) | `test_ft_p_07_fails_when_drift_exceeds_budget`, FT-P-07 scenario assertion | Covered |
| AC-6 (heading delta ≤ 70°) | `test_ft_p_07_heading_envelope_with_pre_anchor`, `test_ft_p_07_heading_outside_envelope_fails`, FT-P-07 scenario assertion | Covered |
| AC-7 (parameterized per `(fc_adapter, vio_strategy)`) | Both scenarios use `fc_adapter` + `vio_strategy` fixtures from `runner/conftest.py` — `pytest --collect-only` shows 6 variants each | Covered |

Note on AC-3.2 threshold: helper reads `AC32_SHARP_TURN_GYRO_Z_MDPS`
env var (default 30,000 millidegree/s) per spec note ("reads from
test-spec environment, not from a hardcoded constant"). Default + env
override + validation covered by `test_default_threshold_when_env_unset`,
`test_threshold_env_override_applies`, `test_threshold_env_rejects_non_int`,
`test_threshold_env_rejects_non_positive`.
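
A minimal sketch of the default, override, and validation behaviour those four tests describe is shown below. The optional `environ` parameter is an assumption added so the example can run without touching the real process environment, and the error messages are illustrative. The env var name and the 30,000 mdps default come from the note above.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AC32_SHARP_TURN_GYRO_Z_MDPS"
DEFAULT_THRESHOLD_MDPS = 30_000


def get_threshold_mdps(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the sharp-turn threshold in millidegree/s.

    Falls back to the default when the variable is unset, and raises
    ValueError for values that are not positive integers.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR)
    if raw is None:
        return DEFAULT_THRESHOLD_MDPS
    try:
        value = int(raw)
    except ValueError as exc:
        raise ValueError(f"{ENV_VAR} must be an integer, got {raw!r}") from exc
    if value <= 0:
        raise ValueError(f"{ENV_VAR} must be positive, got {value}")
    return value


default_value = get_threshold_mdps({})
override_value = get_threshold_mdps({ENV_VAR: "45000"})
```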

**AZ-415 (FT-P-08)**

| AC | Coverage | Status |
|----|----------|--------|
| AC-1 (multi-segment-derkachi fixture with three 5-15 s blackouts) | Fixture parameter, evidence CSV `gap_s` column | Covered (gated on `frame_source_replay`/`fdr_reader`) |
| AC-2 (source_label = dead_reckoned inside each blackout) | `test_evaluate_window_label_violation`, scenario assertion | Covered |
| AC-3 (recovery to satellite_anchored ≤ 10 s after window end) | `test_evaluate_window_recovery_within_budget`, `test_evaluate_window_recovery_misses_budget`, scenario assertion | Covered |
| AC-4 (no centre jump > 200 m at recovery) | `test_evaluate_window_jump_within_budget`, `test_evaluate_window_jump_exceeds_budget`, scenario assertion | Covered |

**AZ-418 (FT-P-10)**

| AC | Coverage | Status |
|----|----------|--------|
| AC-1 (raw + smoothed per past keyframe in FDR) | `test_pair_records_groups_by_keyframe`, `test_pair_records_keeps_orphans_partial`, `test_pair_records_rejects_duplicate_pose_kind`, `test_evaluate_excludes_unpaired_keyframes`, scenario `record_type == "keyframe_pose"` filter | Covered |
| AC-2 (improvement rate ≥ 0.80) | `test_evaluate_all_smoothed_wins_passes`, `test_evaluate_at_80_pct_improvement_rate_passes`, `test_evaluate_below_80_pct_fails_overall`, scenario assertion | Covered |
| AC-3 (mean improvement ≥ 5 m) | `test_evaluate_at_80_pct_improvement_rate_passes`, `test_evaluate_mean_improvement_below_5m_fails`, scenario assertion | Covered |

Mode B Fact #107 ("INTERNAL improvement metric; NOT FC-side retroactive
correction") explicitly preserved in module docstring.

### Phase 3 — Code Quality

* **Single responsibility**: each helper is one concern. `sharp_turn_detector`
  owns detection + per-segment evaluation for both FT-P-07 (recovery) and
  FT-N-02 (during turn) because both halves consume the same segment
  fixture and the same outbound-estimate stream — splitting them would
  duplicate `TurnFrameSample` collection in two scenarios. The split
  is at the *assertion* layer (positive vs negative scenario file), not
  at the detector. `smoothing_evaluator` owns pose-pair logic, GT
  resolution, and budget evaluation. `multi_segment_evaluator` was already
  reviewed in the batch 71 partial review.
* **No suppressed errors**: every helper raises on invalid input
  (`get_threshold_mdps` env validation, `pair_records` dupe/unknown
  pose_kind, `load_zgyro_samples` missing column, `synthesize_overlay_segment`
  argument validation, `evaluate` empty GT track).
* **AAA comment discipline**: every test uses `# Arrange / # Act / # Assert`;
  sections omitted when not needed (single-line `Assert` for
  constant tests).
* **No narration comments**: docstrings explain non-obvious intent
  (AC mapping, why orphans are excluded, why nearest-neighbour GT is
  acceptable, why the env override exists).

### Phase 4 — Security

* **No SUT imports**: confirmed by passing `test_no_sut_imports.py` in
  the full suite. None of the new modules import from
  `src.gps_denied_onboard`.
* **No PII/credentials**: helpers handle synthetic + Derkachi-public
  GT only.
* **No SQL/shell injection surface**: helpers consume CSV via
  `csv.DictReader` and `pathlib`. No subprocess calls.

### Phase 5 — Performance

* All helpers are O(N) over samples. `pair_records` is one dict pass;
  `evaluate` is O(P) over paired keyframes (P typically ≤200 for the
  Derkachi window); `detect_turn_segments` is one scan;
  `evaluate_ft_p_07` uses a linear scan for pre-anchor + recovery
  (acceptable for ≤1000 samples; if scenario data grows we can switch
  to bisect).
* No nested CSV reads or repeated geodesic recomputations.
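
If the linear scans ever need replacing, a bisect-based nearest-in-time lookup could look like the sketch below. This is an illustration using the standard-library `bisect` module, not code from the helpers; the function name and the tie-breaking rule (prefer the earlier sample) are assumptions.

```python
import bisect
from typing import Sequence


def nearest_index(sorted_times_ms: Sequence[int], t_ms: int) -> int:
    """Index of the timestamp closest to t_ms in an ascending sequence.

    Runs in O(log N) per lookup; ties resolve to the earlier sample.
    """
    if not sorted_times_ms:
        raise ValueError("empty timestamp track")
    pos = bisect.bisect_left(sorted_times_ms, t_ms)
    if pos == 0:
        return 0
    if pos == len(sorted_times_ms):
        return len(sorted_times_ms) - 1
    before, after = sorted_times_ms[pos - 1], sorted_times_ms[pos]
    return pos if (after - t_ms) < (t_ms - before) else pos - 1


# 10 Hz track: samples every 100 ms, so the nearest sample is at most 50 ms away.
gt_times_ms = [0, 100, 200, 300]
idx = nearest_index(gt_times_ms, 151)
```

With a 10 Hz ground-truth track, the same lookup also bounds the time slop at 50 ms, which matches the tolerance quoted for `resolve_gt_at` in the batch report.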

### Phase 6 — Cross-Task Consistency

* **Pattern parity with batches 69 + 70**:
  - Skip gate (`_harness_helpers_implemented` fixture) for missing
    upstream replay helpers — same pattern as `test_ft_p_02_*`,
    `test_ft_p_04_*`, `test_ft_p_05_*`.
  - `_NullSink` / `_NullImuEmitter` probes — same pattern as `test_ft_p_04_*`.
  - Evidence CSV via `write_csv_evidence(out, …)` returning the path —
    same pattern as `accuracy_evaluator`, `mre_evaluator`,
    `multi_segment_evaluator`.
  - NFR metrics via `nfr_recorder.record_metric(name, value, ac_id=…)` —
    same pattern as `test_ft_p_01_*`, `test_ft_p_04_*`.
  - Helper modules importable from `runner.helpers.*` with module-level
    constants in `UPPER_SNAKE` — matches `multi_segment_evaluator` and
    `mre_evaluator`.
* **No drift**: FT-P-07 scenario reuses the `MAX_RECOVERY_FRAMES_SAFETY_MS`
  budget constant from the helper (no magic numbers); FT-N-02 reuses
  `ALLOWED_DURING_TURN_LABELS` set so the scenario assertion message
  prints the spec-accurate alphabetised set.

### Phase 7 — Architecture Compliance

* **Public-boundary discipline**: confirmed by `test_no_sut_imports.py`
  (passing). Helpers consume only the FDR record schema (record_type + payload dict)
  defined in `runner.helpers.fdr_reader`.
* **Directory layout**: new files added to `test_directory_layout.py`
  parametrize list (`runner/helpers/{smoothing_evaluator,sharp_turn_detector,multi_segment_evaluator}.py`,
  `tests/positive/test_ft_p_{07,08,10}_*.py`,
  `tests/negative/test_ft_n_02_*.py`). All 75 parametrized variants
  pass.
* **Determinism**: all helpers are deterministic — no `time.time()`, no
  random number generation; `random` not imported in any new module.

### Phase 8 — Test Suite Health

* Total: **393 passed in 126.64s** (was 325 at end of batch 70).
* New tests this batch: **+68** (sharp_turn_detector: 30; smoothing_evaluator: 15; multi_segment_evaluator: 16; directory_layout new entries: 7).
* Pre-existing macOS-only `/e2e-results` plugin issue still present —
  affects scenario test invocation outside Docker only; unit suite
  unaffected. Tracked under runner reporting (out of batch scope).

## Cross-Task Consistency Verdict

PASS — no cross-task drift, no duplicated logic across the three new
helpers, no shared mutable state, evidence CSV schemas distinct per
scenario but follow the same write pattern.

## Architecture Compliance Verdict

PASS — public-boundary blackbox stance preserved; no SUT imports; FDR
schema honoured.

## Final Verdict

**PASS** — Batch 71 (AZ-414 + AZ-415 + AZ-418) ready for commit.
@@ -12,7 +12,7 @@ sub_step:
 retry_count: 0
 cycle: 1
 tracker: jira
-last_completed_batch: 70
+last_completed_batch: 71
 last_cumulative_review: batches_67-69
 last_step_outcomes:
   step_8: "Code is testable — no changes needed (testability_assessment.md committed; no list-of-changes, no source edits)"