mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 17:31:14 +00:00
[AZ-408] [AZ-410] [AZ-411] Batch 69: synth injectors + FT-P-02/03/14
AZ-408 (3pt) — Replace AZ-406 injector scaffolds with concrete generators:
- outlier.py: deterministic stride + far-away tile replacement; AC-2 ≥350m offset
- blackout_spoof.py: paired video blackout + FC GPS spoof with ≤40ms alignment; AC-4 realistic fix_type/hdop; AC-NEW-8 200-500m inter-spoof deltas
- multi_segment.py: ≥3 disjoint windows, ≥30s gaps, ≤25% coverage
- fc_proxy.py: timed-splice runtime proxy with pre-activate RuntimeError guard
- _common.py: derive_rng + tile-manifest reader + tmpfs helpers
- injector_fixtures.py: pytest fixtures wired via runner conftest

AZ-410 (3pt) — FT-P-02 cumulative drift between satellite anchors:
- anchor_pair_detector.py: AC-1 detection, AC-2/3 pass-fraction, AC-4 monotonicity check, CSV evidence
- test_ft_p_02_derkachi_drift.py: scenario gated on upstream helper NotImplementedError (frame_source_replay / fdr_reader / imu_replay)

AZ-411 (2pt) — FT-P-03 + FT-P-14 schema + WGS84:
- estimate_schema.py: AC-1 schema completeness, AC-2 source-label set containment, AC-3 WGS84 range + int32 1e-7 decode
- test_ft_p_03_14_schema_wgs84.py: shared single-image-push scenario

Tests: 248 unit tests pass (+91 vs batch 68).
Reports: batch_69_report.md, batch_69_review.md (PASS), cumulative_review_batches_67-69_cycle1_report.md (PASS).

Co-authored-by: Cursor <cursoragent@cursor.com>
# Batch 69 Report — Test Implementation (cycle 1, batch 3 of test phase)

**Batch**: 69
**Date**: 2026-05-16
**Context**: Test implementation (greenfield Step 10 — Implement Tests)
**Tasks**: AZ-408 (3pt), AZ-410 (3pt), AZ-411 (2pt) — 8 cp / 3 tasks
**Cycle**: 1
**Verdict**: COMPLETE — PASS (self-reviewed; see
`reviews/batch_69_review.md` and
`cumulative_review_batches_67-69_cycle1_report.md`)

## Summary

Three blackbox-harness tasks, all dependent only on AZ-406 + AZ-407:

### AZ-408 — Runtime synthetic injectors (3pt)

Replaced the four AZ-406 scaffold modules under
`e2e/fixtures/injectors/` with concrete generators, plus a shared
`_common.py` (deterministic seed, tile-cache manifest reader, tmpfs
helpers) and a coordinated `fc_proxy.py` (the runtime companion to
`blackout_spoof.py`).

* **outlier.py** — overlays Derkachi frames with far-away tile crops at
  three density flags (light = 1/100, medium = 1/10, heavy = 1/3).
  Frame selection is deterministic-stride; replacement-tile picks are
  drawn from a SHA-256-seeded `np.random.default_rng` so identical
  inputs reproduce identical outputs. The per-replacement geodesic offset
  is enforced to be ≥350 m (AC-2 of FT-N-01 / AC-NEW-8 envelope).
* **blackout_spoof.py** — writes a `schedule.json` with paired
  `(window_start_ms, window_end_ms, blackout_frame_indices, spoof_gps)`
  artefacts. The schedule's spoofed-GPS track satisfies AC-NEW-8 (200–500 m
  consecutive deltas), AC-4 (fix_type ∈ {3, 4}, hdop ∈ [0.5, 2.5], no
  sentinels), and AC-3 (a maximum alignment error of 40 ms is recorded in
  the schedule and enforced by the runtime proxy). Black frames are
  pinned-PIL all-zero 256×256 JPEGs.
* **multi_segment.py** — produces ≥3 disjoint blackout windows
  uniformly anchored at fractions of the source duration, with
  enforced ≥30 s inter-segment gaps and ≤25 % total coverage. No spoof
  injection (FT-P-08 positive path).
* **fc_proxy.py** — stateless pass-through proxy with timed splice;
  `activate(now_ms_provider, first_blackout_ms)` aligns the proxy
  clock to the video-overlay's first black frame so AC-3 (≤40 ms) holds
  end-to-end. Calling `process_inbound_message()` before `activate()`
  raises a `RuntimeError` (programming-error guard, not silent passthrough).
* **`_common.py`** — `derive_rng(domain, *components)` is the
  domain-tagged seed primitive; `read_tile_manifest` parses the
  AZ-407 manifest.csv (with derived lat/lon centres via the slippy XYZ
  inverse) so injectors can pick "far-away" replacement tiles without
  importing the tile-cache-builder package; `haversine_m` /
  `far_away_indices` are a deliberate light-weight duplicate of
  `geo.distance_m` (pyproj) so injectors run in minimal Docker images
  without the heavier geo extras.
* **pytest fixtures**: `runner/helpers/injector_fixtures.py` exposes
  `outlier_injection_derkachi`, `blackout_spoof_derkachi`,
  `multi_segment_derkachi` plus the shared `derkachi_source_frames`,
  `tile_cache_fixture` lookups. Registered via the runner conftest's
  `pytest_plugins`.
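
The "far-away" tile selection described for `_common.py` can be illustrated with a minimal, standard-library-only sketch. It assumes the usual slippy-map (XYZ, Web Mercator) tile scheme and a spherical Earth; the function signatures, the `tile_centre` name and the `EARTH_RADIUS_M` constant are illustrative assumptions, not the repository's actual code.

```python
import math

# Assumption: mean Earth radius for a spherical approximation.
EARTH_RADIUS_M = 6_371_008.8


def tile_centre(z: int, x: int, y: int) -> tuple[float, float]:
    """Return (lat, lon) in degrees for the centre of slippy tile z/x/y."""
    n = 2 ** z
    lon = (x + 0.5) / n * 360.0 - 180.0
    # Inverse Web Mercator for the tile's vertical midpoint.
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * (y + 0.5) / n))))
    return lat, lon


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def far_away_indices(origin, centres, min_offset_m=350.0):
    """Indices of tile centres at least min_offset_m away from origin."""
    return [
        i
        for i, (lat, lon) in enumerate(centres)
        if haversine_m(origin[0], origin[1], lat, lon) >= min_offset_m
    ]
```

With the 350 m threshold, a candidate roughly 111 m north of the frame is rejected while one roughly 1.1 km north is kept, which is the behaviour the AC-2 offset check relies on.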

### AZ-410 — FT-P-02 cumulative drift between satellite anchors (3pt)

* **`runner/helpers/anchor_pair_detector.py`** — pure-Python helper
  with the AC-1 detection (segment-then-anchor pair construction),
  AC-2/AC-3 pass-fraction computation, AC-4 bin-median monotonicity
  check, plus a Vincenty-WGS84 drift computation via
  `runner.helpers.geo.distance_m`. Default age bins follow the spec's
  `{<1 s, 1-3 s, 3-10 s, 10-30 s, >30 s}` buckets. `aggregate(stream)`
  is the one-call entry-point the scenario uses; `write_csv_evidence`
  emits the FT-P-02 evidence CSV.
* **`tests/positive/test_ft_p_02_derkachi_drift.py`** — pytest scenario
  parameterized across `(fc_adapter, vio_strategy)`; the docker-bound
  runtime path is gated by `_harness_helpers_implemented`, which
  probes `runner.helpers.frame_source_replay` / `fdr_reader` /
  `imu_replay` for `NotImplementedError`. When the upstream helpers
  land, the scenario activates with zero further changes.
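
The pass-fraction and bin-median monotonicity checks can be sketched as follows. This is a rough illustration only: the function names, the `(age_s, drift_m)` sample shape and the choice to put boundary ages into the upper bin are assumptions, and the 2× jump rule exercised by the unit tests is not modelled here.

```python
from bisect import bisect_right
from statistics import median

# Bin edges in seconds for the buckets <1, 1-3, 3-10, 10-30, >30.
AGE_BIN_EDGES_S = (1.0, 3.0, 10.0, 30.0)


def pass_fraction(drifts_m, limit_m):
    """Fraction of drift samples strictly below limit_m."""
    if not drifts_m:
        raise ValueError("no drift samples")
    return sum(d < limit_m for d in drifts_m) / len(drifts_m)


def bin_medians(samples, edges=AGE_BIN_EDGES_S):
    """Median drift per age bin; None for bins without samples.

    samples is an iterable of (age_s, drift_m) pairs. An age equal to an
    edge falls into the upper bin (an assumption of this sketch).
    """
    bins = [[] for _ in range(len(edges) + 1)]
    for age_s, drift_m in samples:
        bins[bisect_right(edges, age_s)].append(drift_m)
    return [median(b) if b else None for b in bins]


def is_monotonic(medians):
    """True when the non-empty bin medians never decrease with age."""
    present = [m for m in medians if m is not None]
    return all(a <= b for a, b in zip(present, present[1:]))
```

A scenario would then compare `pass_fraction(drifts, 100.0)` against the 95 % threshold for visual-only pairs, and `pass_fraction(drifts, 50.0)` for IMU-fused pairs.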

### AZ-411 — FT-P-03 + FT-P-14 schema + WGS84 (2pt)

* **`runner/helpers/estimate_schema.py`** — three pure validators:
  `validate_estimate_schema` (AC-1: `lat:float`, `lon:float`,
  `cov_semi_major_m:float`, `last_satellite_anchor_age_ms:int` present
  & well-typed; bool-leaks-as-int explicitly rejected),
  `validate_source_label` (AC-2: set ⊆ {`satellite_anchored`,
  `visual_propagated`, `dead_reckoned`}), `validate_wgs84_range` (AC-3:
  lat ∈ [-90, 90], lon ∈ [-180, 180], NaN rejected). Plus
  `decode_lat_lon_int32` for the AP/iNav 1e-7 int32 wire format.
* **`tests/positive/test_ft_p_03_14_schema_wgs84.py`** — two test
  methods (`test_schema_and_source_label` for FT-P-03,
  `test_wgs84_coordinate_range` for FT-P-14) sharing the
  single-image-push fixture. Same `_harness_helpers_implemented` gate
  as AZ-410.
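
A minimal sketch of the three kinds of check described above, assuming estimate records arrive as plain dictionaries. The field names and allowed range come from this report; the function names `schema_errors` and `wgs84_in_range`, the return shapes and the signature given to `decode_lat_lon_int32` are assumptions for illustration and may differ from `estimate_schema.py`.

```python
import math

REQUIRED_FIELDS = {
    "lat": float,
    "lon": float,
    "cov_semi_major_m": float,
    "last_satellite_anchor_age_ms": int,
}
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def schema_errors(record: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the record passes."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so it must be rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return errors


def wgs84_in_range(lat: float, lon: float) -> bool:
    """True when lat/lon are real numbers inside the WGS84 coordinate range."""
    if math.isnan(lat) or math.isnan(lon):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def decode_lat_lon_int32(lat_e7: int, lon_e7: int) -> tuple[float, float]:
    """Decode 1e-7-degree int32 wire values into degrees."""
    for raw in (lat_e7, lon_e7):
        if not INT32_MIN <= raw <= INT32_MAX:
            raise ValueError(f"value {raw} does not fit in int32")
    return lat_e7 / 1e7, lon_e7 / 1e7
```

The explicit `bool` check matters because `isinstance(True, int)` is true in Python, so a flag leaking into `last_satellite_anchor_age_ms` would otherwise pass as an integer.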

## Files added / modified

### Added (13)

AZ-408:
* `e2e/fixtures/injectors/_common.py`
* `e2e/fixtures/injectors/fc_proxy.py`
* `e2e/runner/helpers/injector_fixtures.py`

AZ-410:
* `e2e/runner/helpers/anchor_pair_detector.py`
* `e2e/tests/positive/test_ft_p_02_derkachi_drift.py`

AZ-411:
* `e2e/runner/helpers/estimate_schema.py`
* `e2e/tests/positive/test_ft_p_03_14_schema_wgs84.py`

Unit tests (AZ-408 + AZ-410 + AZ-411):
* `e2e/_unit_tests/fixtures/test_outlier.py`
* `e2e/_unit_tests/fixtures/test_blackout_spoof.py`
* `e2e/_unit_tests/fixtures/test_multi_segment.py`
* `e2e/_unit_tests/fixtures/test_fc_proxy.py`
* `e2e/_unit_tests/helpers/test_anchor_pair_detector.py`
* `e2e/_unit_tests/helpers/test_estimate_schema.py`

### Modified (8)

AZ-408 — replaced AZ-406 stub modules with real implementations:
* `e2e/fixtures/injectors/outlier.py` — full implementation (was
  ~20-line scaffold raising `NotImplementedError`).
* `e2e/fixtures/injectors/blackout_spoof.py` — full implementation.
* `e2e/fixtures/injectors/multi_segment.py` — full implementation.
* `e2e/fixtures/injectors/__init__.py` — updated docstring; added
  `_common` + `fc_proxy` to the index.

Harness wiring:
* `e2e/runner/conftest.py` — added `runner.helpers.injector_fixtures`
  to `pytest_plugins`.

Tests:
* `e2e/_unit_tests/fixtures/test_injectors_contract.py` — updated to
  the new AZ-408 dataclass shapes (the old `target_segment_seconds` /
  `n_outliers` / `BlackoutSpoofPlan(blackout_seconds=…)` legacy
  contract from AZ-406 was retired together with the scaffold modules).
* `e2e/_unit_tests/test_directory_layout.py` — added the 7 new
  paths (`_common.py`, `fc_proxy.py`, `injector_fixtures.py`,
  `anchor_pair_detector.py`, `estimate_schema.py`,
  `test_ft_p_02_derkachi_drift.py`,
  `test_ft_p_03_14_schema_wgs84.py`).
* `e2e/_unit_tests/fixtures/test_blackout_spoof.py` — bumped synthetic
  frames count from 900 → 3000 so the 25 s / 35 s window probes fit
  inside the source (the spec's NFT-RES-04 35 s window family is the
  driver).
* `e2e/fixtures/injectors/fc_proxy.py` — added the explicit
  pre-activate `RuntimeError` per the unit test feedback (was a silent
  passthrough in the first draft).
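
The pre-activate guard on the proxy can be pictured with the hypothetical sketch below. Only `activate(now_ms_provider, first_blackout_ms)`, `process_inbound_message()` and the `RuntimeError` behaviour are taken from this report; the class name, the window tuple layout and the offset-based clock alignment are assumptions made for illustration.

```python
from typing import Callable, Optional


class SpliceProxySketch:
    """Hypothetical timed-splice proxy: passthrough outside windows, spoof inside."""

    def __init__(self, windows):
        # windows: list of (start_ms, end_ms, spoof_message) on the video timeline.
        self._windows = list(windows)
        self._now_ms: Optional[Callable[[], int]] = None
        self._offset_ms = 0

    def activate(self, now_ms_provider: Callable[[], int], first_blackout_ms: int) -> None:
        # Assumption: activate() is called at the instant the first black frame
        # is shown, so the offset maps the local clock onto the video timeline.
        self._now_ms = now_ms_provider
        self._offset_ms = first_blackout_ms - now_ms_provider()

    def process_inbound_message(self, message):
        if self._now_ms is None:
            # Fail loudly instead of silently passing traffic through.
            raise RuntimeError("process_inbound_message() called before activate()")
        t_ms = self._now_ms() + self._offset_ms
        for start_ms, end_ms, spoof in self._windows:
            if start_ms <= t_ms < end_ms:
                return spoof
        return message
```

Raising here turns a forgotten `activate()` call into an immediate test failure rather than a scenario that runs without any spoofing.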

## Spec / module-layout drift notes

* **AZ-408 spec uses `tests/fixtures/injectors/*` paths**, but the
  `blackbox_tests` cross-cutting entry in `module-layout.md` places
  the e2e harness under `e2e/fixtures/injectors/`. Implementation
  followed the module-layout entry (consistent with batch 68's AZ-407
  resolution). The AZ-408 archived spec retains the `tests/fixtures`
  wording for audit; the actual file ownership is `e2e/fixtures/`.
* **AZ-410 spec mentions `tests/fixtures/...` in the AC-NEW table**
  (single mention of `tests/integration/fdr_reader.py`). Same
  resolution — module-layout authoritative.
* **AZ-408 AZ-406-scaffold-dataclass divergence**: the AZ-406 scaffold
  declared `OutlierInjectionPlan(target_segment_seconds, max_offset_m,
  n_outliers)`; AZ-408 needs `(source_frames_dir, tile_cache_dir,
  density, seed, min_offset_m)`. The contract test was updated together
  with the scaffold replacement (no other callers of the old shape
  existed; verified by `rg`). This is the expected scaffold-to-real
  evolution per the AZ-406 injector docstrings ("Concrete generator
  is owned by AZ-408").
* **AZ-410 / AZ-411 runtime-path skip**: both scenario files probe
  `NotImplementedError` from `frame_source_replay` / `imu_replay` /
  `fdr_reader` / `sitl_observer` / `mavproxy_tlog_reader` rather than
  hard-coding a "deferred until AZ-X" marker. When those helpers
  land, both scenarios activate automatically.
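
The runtime-path skip can be illustrated with a small probe helper. This is a sketch under assumptions: the name `helpers_implemented`, the zero-argument probe callables and the decision to let non-`NotImplementedError` exceptions propagate are illustrative choices, not the repository's `_harness_helpers_implemented`.

```python
from typing import Callable, Iterable


def helpers_implemented(probes: Iterable[Callable[[], object]]) -> bool:
    """Return False as soon as any probe raises NotImplementedError.

    Each probe is a zero-argument callable that touches one upstream helper.
    Other exceptions are deliberately not caught, so a genuinely broken
    helper surfaces as an error instead of a quiet skip.
    """
    for probe in probes:
        try:
            probe()
        except NotImplementedError:
            return False
    return True
```

A scenario module could feed the result into `pytest.mark.skipif(not helpers_implemented(probes), reason=...)`, so the test starts running as soon as the stubs are replaced, with no edit to the scenario file.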

## Test Results

### Focused tests (Step 6.4)

`pytest e2e/_unit_tests/` — **248 passed in 141.08s** (was 157 at end
of batch 68; +91 new tests across this batch).

Breakdown of new tests:

* AZ-408 fixtures (60 cases across 5 files):
  - `test_outlier.py` — 20 cases (determinism, AC-2 offset, AC-6
    cleanup, density-stride mapping, error-path FileNotFoundError,
    summary.json round-trip, replacement-density target);
  - `test_blackout_spoof.py` — 10 cases (window length, AC-1
    determinism, AC-4 realism, AC-NEW-8 inter-spoof deltas, AC-3
    schedule, black-frame pixel sample, passthrough outside window,
    schedule.json shape, overwrite, validation);
  - `test_multi_segment.py` — 9 cases (≥3 disjoint, ≥30 s gap,
    ≤25 % coverage, infeasibility validation, error paths);
  - `test_fc_proxy.py` — 10 cases (passthrough / spoof-replace,
    alignment-err scenarios, exhaustion behaviour, schedule.json
    round-trip, pre-activate RuntimeError);
  - `test_injectors_contract.py` — 10 cases (dataclass shape, frozen,
    Literal density round-trip, report types).
* AZ-410 anchor-pair detector (15 cases):
  AC-1 detection variants (visual / dead_reckoned / IMU-fused / first-anchor-skip /
  multi-pair); AC-2/3 pass-fraction; AC-4 monotonic / 2× jump /
  regression; aggregate round-trip; CSV evidence round-trip.
* AZ-411 estimate schema (18 cases):
  AC-1 schema completeness (missing / wrong-type / bool guard / spec
  drift guard); AC-2 source-label containment (each allowed +
  rejection); AC-3 WGS84 range (in-range, lat>90, lon<-180, NaN);
  int32 1e-7 decode round-trip + range check; aggregate.

No regressions in the 157 inherited AZ-406 / AZ-407 / AZ-444 / AZ-445 tests.

No per-batch full-suite run was performed, per the implement skill's
Test-Run Cadence (Step 16 owns the only full-suite invocation).

## AC Test Coverage

### AZ-408

| AC | Test | Status |
|----|------|--------|
| AC-1 (outlier seed-deterministic) | `test_build_is_seed_deterministic`, `test_different_seeds_produce_different_replacements`, `test_density_ratio_maps_to_correct_stride[*]` | Covered |
| AC-2 (outlier offsets ≥350 m) | `test_every_replacement_exceeds_min_offset`, `test_far_away_indices_filters_by_distance` | Covered |
| AC-3 (blackout+spoof ≤40 ms alignment) | `test_alignment_err_below_40ms_when_clock_matches_first_blackout`, `test_alignment_err_within_budget_under_normal_clock_skew`, `test_proxy_spoofs_inside_window`, `test_schedule_has_max_alignment_err_per_ac3` | Covered |
| AC-4 (spoof pattern realistic + AC-NEW-8 deltas) | `test_spoof_fields_are_realistic`, `test_spoof_track_inter_position_delta_in_range` | Covered |
| AC-5 (multi_segment ≥3 disjoint / ≥30 s gaps / ≤25 % coverage) | `test_produces_three_disjoint_segments`, `test_segments_are_at_least_30_seconds_apart`, `test_total_blackout_below_25_percent`, `test_rejects_overlapping_gap` | Covered |
| AC-6 (tmpfs auto-cleared) | `test_build_writes_only_under_out_root`, `test_build_overwrites_existing_out_root`, `test_cleanup_tmpfs_removes_scratch`, `test_cleanup_tmpfs_is_silent_for_missing_path` | Covered |
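
The AC-5 constraints (at least three disjoint windows, gaps of at least 30 s, total coverage of at most 25 %) can be expressed as a small planner. This is a hypothetical sketch: the function name, its parameters and the placement of window centres at equal fractions of the duration are assumptions, not the `multi_segment.py` implementation.

```python
def plan_segments(duration_s, n_segments=3, segment_s=10.0,
                  min_gap_s=30.0, max_coverage=0.25):
    """Place n_segments blackout windows at uniform fractions of duration_s.

    Returns a list of (start_s, end_s) tuples or raises ValueError when the
    constraints cannot be met.
    """
    if n_segments < 3:
        raise ValueError("at least 3 segments are required")
    if n_segments * segment_s > max_coverage * duration_s:
        raise ValueError("total blackout exceeds the coverage limit")
    windows = []
    for i in range(n_segments):
        # Centres sit at 1/(n+1), 2/(n+1), ... of the source duration.
        centre = duration_s * (i + 1) / (n_segments + 1)
        start = centre - segment_s / 2
        windows.append((start, start + segment_s))
    for (_, prev_end), (next_start, _) in zip(windows, windows[1:]):
        if next_start - prev_end < min_gap_s:
            raise ValueError("inter-segment gap is below the minimum")
    return windows
```

For a 200 s source with three 10 s windows, this yields windows at 45–55 s, 95–105 s and 145–155 s: 40 s gaps and 15 % coverage. Shorter sources are rejected rather than silently violating a constraint.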

### AZ-410

| AC | Test | Status |
|----|------|--------|
| AC-1 (anchor-pair detection) | `test_first_anchor_is_not_a_pair`, `test_simple_visual_only_pair`, `test_imu_fused_segment_classifies_pair`, `test_dead_reckoned_in_segment_still_pair`, `test_multiple_pairs_in_one_flight` | Covered |
| AC-2 (visual-only drift <100 m, ≥95 %) | `test_pass_fraction_all_pass`, `test_pass_fraction_partial`, `test_aggregate_round_trip` | Covered |
| AC-3 (IMU-fused drift <50 m, ≥95 %) | `test_aggregate_round_trip` (covers IMU-fused vs visual-only segregation; pass-fraction helper tested with both bounds) | Covered |
| AC-4 (bin-median monotonic with age) | `test_bin_drifts_default_edges`, `test_check_monotonic_passes_for_increasing_medians`, `test_check_monotonic_flags_regression`, `test_check_monotonic_flags_2x_jump` | Covered |
| AC-5 (parameterized over `(fc_adapter, vio_strategy)`) | Verified via `pytest --collect-only` — 6 variants per scenario method | Covered |
| AC-1.3 runtime (full Derkachi replay end-to-end) | requires `runner.helpers.{frame_source_replay,fdr_reader,imu_replay}` — currently stubs; scenario auto-activates when those land | NOT COVERED (harness-loop) |

### AZ-411

| AC | Test | Status |
|----|------|--------|
| AC-1 (schema completeness) | `test_valid_record_passes_schema`, `test_missing_field_caught`, `test_int_typed_field_rejected_when_wrong_type`, `test_bool_does_not_silently_satisfy_int`, `test_required_fields_table_is_what_the_spec_says` | Covered |
| AC-2 (source-label set containment) | `test_each_allowed_label_passes[*]`, `test_unknown_label_rejected`, `test_non_string_label_rejected` | Covered |
| AC-3 (WGS84 lat/lon range + 1e-7 int32 decode) | `test_valid_wgs84_inside_range`, `test_lat_above_90_rejected`, `test_lon_below_minus_180_rejected`, `test_nan_rejected`, `test_decode_lat_lon_int32_round_trip`, `test_decode_lat_lon_int32_rejects_out_of_int32_range` | Covered |
| AC-4 (parameterized over `(fc_adapter, vio_strategy)`) | Verified via `pytest --collect-only` — 6 variants per scenario method, 12 total | Covered |
| Single-image push runtime end-to-end | requires the same upstream helpers as AZ-410 | NOT COVERED (harness-loop) |

The runtime / harness-loop ACs are documented in the same way as
batch 68's AZ-444 hardware-loop ACs: the helper logic is fully
unit-tested; the docker-bound runtime path activates automatically when the
upstream `frame_source_replay` / `fdr_reader` / `imu_replay` /
`sitl_observer` / `mavproxy_tlog_reader` helpers stop raising
`NotImplementedError`.

## Code Review Verdict

Self-reviewed — PASS. See `reviews/batch_69_review.md` for the per-phase
sweep (no Critical / High / Medium / Low findings) and
`cumulative_review_batches_67-69_cycle1_report.md` for the K=3
cumulative review (same verdict; no cross-batch drift).

Notable points:

* **Determinism primitive**: `_common.derive_rng(domain, *components)`
  hashes the domain + components into a 64-bit seed, so two unrelated
  injectors with the same numeric seed receive independent streams.
  This is the basis for the AC-1 determinism guarantee across all
  three injectors.
* **`_common.haversine_m` vs `geo.distance_m`**: deliberate
  dependency-isolation duplicate. The injectors must work in minimal
  Docker images without pyproj; the docstring explains the trade-off.
  Negligible numerical drift between haversine and Vincenty at the
  ~km scales the AC-2 check operates on.
* **Pre-activate `RuntimeError` in `fc_proxy`**: introduced after the
  unit test caught a silent-passthrough behaviour; programming-error
  guard so a forgotten `activate()` cannot quietly degrade into
  no-op passthrough during a real scenario run.
* **Scenario-file skip pattern**: AZ-410's scenario probes upstream
  helpers' `NotImplementedError` rather than hard-coding a "deferred
  until X" marker. AZ-411 reuses the same pattern. When the helpers
  land, both scenarios activate without any source change.
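
The domain-tagged seeding idea can be sketched with the standard library alone. The report states that the real primitive feeds a SHA-256-derived seed into `np.random.default_rng`; the sketch below substitutes `random.Random` as a stand-in so it runs without NumPy, and the `derive_seed` helper, the `repr`-based encoding and the length prefix are assumptions rather than the actual `_common.py` code.

```python
import hashlib
import random


def derive_seed(domain: str, *components) -> int:
    """Hash a domain tag plus components into a 64-bit integer seed."""
    h = hashlib.sha256()
    for part in (domain, *components):
        data = repr(part).encode("utf-8")
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    # Keep the first 8 bytes of the digest as the 64-bit seed.
    return int.from_bytes(h.digest()[:8], "big")


def derive_rng(domain: str, *components) -> random.Random:
    """Stand-in generator; the real helper is reported to use NumPy's default_rng."""
    return random.Random(derive_seed(domain, *components))
```

Because the domain string is part of the hash input, `derive_rng("outlier", 42)` and `derive_rng("blackout_spoof", 42)` start from unrelated seeds even though the numeric seed is the same, while repeated calls with identical arguments reproduce the same stream.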

## Auto-Fix Attempts

0 attempts. No code-review failures occurred, so the auto-fix gate was not entered.

## Stuck Agents

None.

## Deferred follow-ups

* `runner.helpers.frame_source_replay.FrameSourceReplayer.replay_video`
  / `.replay_image_directory` — currently `NotImplementedError`;
  unblocking AZ-410 / AZ-411 runtime paths.
* `runner.helpers.fdr_reader.iter_records` — owned by AZ-441; blocks
  AZ-410 runtime path.
* `runner.helpers.imu_replay.ImuReplayer.replay` — owned by AZ-407
  per scaffold docstring (the AZ-407 batch did not touch it); blocks
  AZ-410 runtime path.
* `runner.helpers.sitl_observer.get_observer` — owned by AZ-416 /
  AZ-417; blocks AZ-411 runtime path.
* `runner.helpers.mavproxy_tlog_reader.iter_messages` — owned by
  AZ-416; blocks AZ-411 runtime path.

These are existing scaffolds with explicit ownership tags — no new
debt introduced by this batch.

## Next Batch

Batch 70 candidate set (all unblocked after this batch lands):

* AZ-409 (FT-P-01 — frame-center GPS accuracy — 5pt) — first
  concrete positive scenario exercising the SUT through the full
  Docker-bound runner. Same harness-loop gate as AZ-410.
* AZ-412 (FT-P-04 — frame-to-frame registration — 3pt)
* AZ-413 (FT-P-05/06 — sat anchor MRE — 5pt)

Total: 13 cp across 3 tasks. AZ-409 is the headline; AZ-412 / AZ-413
fill out the positive-path family.