AZ-408 (3pt) - Replace AZ-406 injector scaffolds with concrete generators:
- outlier.py: deterministic stride + far-away tile replacement; AC-2 ≥350m offset
- blackout_spoof.py: paired video blackout + FC GPS spoof with ≤40ms alignment; AC-4 realistic fix_type/hdop; AC-NEW-8 200-500m inter-spoof deltas
- multi_segment.py: ≥3 disjoint windows, ≥30s gaps, ≤25% coverage
- fc_proxy.py: timed-splice runtime proxy with pre-activate RuntimeError guard
- _common.py: derive_rng + tile-manifest reader + tmpfs helpers
- injector_fixtures.py: pytest fixtures wired via runner conftest

AZ-410 (3pt) - FT-P-02 cumulative drift between satellite anchors:
- anchor_pair_detector.py: AC-1 detection, AC-2/3 pass-fraction, AC-4 monotonicity check, CSV evidence
- test_ft_p_02_derkachi_drift.py: scenario gated on upstream helper NotImplementedError (frame_source_replay / fdr_reader / imu_replay)

AZ-411 (2pt) - FT-P-03 + FT-P-14 schema + WGS84:
- estimate_schema.py: AC-1 schema completeness, AC-2 source-label set containment, AC-3 WGS84 range + int32 1e-7 decode
- test_ft_p_03_14_schema_wgs84.py: shared single-image-push scenario

Tests: 248 unit tests pass (+91 vs batch 68).
Reports: batch_69_report.md, batch_69_review.md (PASS), cumulative_review_batches_67-69_cycle1_report.md (PASS).

Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 69 Report — Test Implementation (cycle 1, batch 3 of test phase)
Batch: 69
Date: 2026-05-16
Context: Test implementation (greenfield Step 10 — Implement Tests)
Tasks: AZ-408 (3pt), AZ-410 (3pt), AZ-411 (2pt) — 8 cp / 3 tasks
Cycle: 1
Verdict: COMPLETE — PASS (self-reviewed; see
reviews/batch_69_review.md and
cumulative_review_batches_67-69_cycle1_report.md)
Summary
Three blackbox-harness tasks, all dependent only on AZ-406 + AZ-407:
AZ-408 — Runtime synthetic injectors (3pt)
Replaced the four AZ-406 scaffold modules under
e2e/fixtures/injectors/ with concrete generators, plus a shared
_common.py (deterministic seed, tile-cache manifest reader, tmpfs
helpers) and a coordinated fc_proxy.py (the runtime companion to
blackout_spoof.py).
- outlier.py: overlays Derkachi frames with far-away tile crops at three density flags (light = 1/100, medium = 1/10, heavy = 1/3). Frame selection is deterministic-stride; replacement-tile picks are drawn from a SHA-256-seeded np.random.default_rng so identical inputs reproduce identical outputs. Per-replacement geodesic offset enforced to ≥350 m (AC-2 of FT-N-01 / AC-NEW-8 envelope).
- blackout_spoof.py: writes a schedule.json with paired (window_start_ms, window_end_ms, blackout_frame_indices, spoof_gps) artefacts. The schedule's spoofed-GPS track satisfies AC-NEW-8 (200–500 m consecutive deltas), AC-4 (fix_type ∈ {3, 4}, hdop ∈ [0.5, 2.5], no sentinels), and AC-3 (max alignment err 40 ms recorded; enforced by the runtime proxy). Black frames are pinned-PIL all-zero 256×256 JPEGs.
- multi_segment.py: produces ≥3 disjoint blackout windows uniformly anchored at fractions of the source duration, with enforced ≥30 s inter-segment gaps and ≤25 % total coverage. No spoof injection (FT-P-08 positive path).
- fc_proxy.py: stateless pass-through proxy with timed splice; activate(now_ms_provider, first_blackout_ms) aligns the proxy clock to the video-overlay's first black frame so AC-3 (≤40 ms) holds end-to-end. Pre-activate process_inbound_message() is a RuntimeError (programming-error guard, not silent passthrough).
- _common.py: derive_rng(domain, *components) is the domain-tagged seed primitive; read_tile_manifest parses the AZ-407 manifest.csv (with derived lat/lon centres via the slippy XYZ inverse) so injectors can pick "far-away" replacement tiles without importing the tile-cache-builder package; haversine_m / far_away_indices are a deliberate light-weight duplicate of geo.distance_m (pyproj) so injectors run in minimal Docker images without the heavier geo extras.
- pytest fixtures: runner/helpers/injector_fixtures.py exposes outlier_injection_derkachi, blackout_spoof_derkachi, multi_segment_derkachi plus the shared derkachi_source_frames, tile_cache_fixture lookups. Registered via the runner conftest's pytest_plugins.
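
The determinism and far-away selection described for outlier.py and _common.py can be sketched with the standard library alone. This is an illustrative sketch, not the code in the repository: the hashing layout, all signatures, the spherical Earth radius, the plan_replacements helper, and the sample coordinates are assumptions, and random.Random stands in for the np.random.default_rng that the real module seeds.

```python
import hashlib
import math
import random

# Stride per density flag, from the ratios quoted above (1/100, 1/10, 1/3).
DENSITY_STRIDE = {"light": 100, "medium": 10, "heavy": 3}


def derive_seed(domain, *components):
    """Hash a domain tag plus components into a 64-bit integer seed.

    Assumed layout: every part is length-prefixed before hashing so that
    ("ab", "c") and ("a", "bc") cannot collide by plain concatenation.
    """
    digest = hashlib.sha256()
    for part in (domain, *components):
        data = str(part).encode("utf-8")
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return int.from_bytes(digest.digest()[:8], "big")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of mean Earth radius."""
    radius_m = 6_371_008.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def far_away_indices(origin, centres, min_offset_m=350.0):
    """Indices of tile centres at least min_offset_m away from origin."""
    return [
        i for i, (lat, lon) in enumerate(centres)
        if haversine_m(origin[0], origin[1], lat, lon) >= min_offset_m
    ]


def plan_replacements(n_frames, density, centres, origin, seed):
    """Map each stride-selected frame index to a far-away tile index."""
    rng = random.Random(derive_seed("outlier", seed, density))
    candidates = far_away_indices(origin, centres)
    if not candidates:
        raise ValueError("no tile centre is far enough from the origin")
    frames = range(0, n_frames, DENSITY_STRIDE[density])
    return {frame: rng.choice(candidates) for frame in frames}


# Made-up tile centres; only the last two are more than 350 m from the origin.
centres = [(50.000, 36.000), (50.001, 36.000), (50.010, 36.000), (50.000, 36.020)]
plan = plan_replacements(30, "medium", centres, origin=(50.0, 36.0), seed=7)
```

Because the domain tag is part of the hash input, an injector seeded with 7 under "outlier" and another seeded with 7 under a different tag draw from unrelated streams, while repeating the same call reproduces the same plan.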
AZ-410 — FT-P-02 cumulative drift between satellite anchors (3pt)
- runner/helpers/anchor_pair_detector.py: pure-Python helper with the AC-1 detection (segment-then-anchor pair construction), AC-2/AC-3 pass-fraction computation, AC-4 bin-median monotonicity check, plus a Vincenty-WGS84 drift computation via runner.helpers.geo.distance_m. Default age bins follow the spec's {<1 s, 1-3 s, 3-10 s, 10-30 s, >30 s} buckets. aggregate(stream) is the one-call entry-point the scenario uses; write_csv_evidence emits the FT-P-02 evidence CSV.
- tests/positive/test_ft_p_02_derkachi_drift.py: pytest scenario parameterized across (fc_adapter, vio_strategy); the docker-bound runtime path is gated by _harness_helpers_implemented, which probes runner.helpers.frame_source_replay / fdr_reader / imu_replay for NotImplementedError. When the upstream helpers land the scenario activates with zero further changes.
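
To make the AC-2/AC-3 pass-fraction and the AC-4 bin-median check concrete, here is a small sketch. The function names, the (age_s, drift_m) pair shape, and the rule that a boundary age falls into the older bin are assumptions; the 2× jump rule that the unit tests mention is not modelled because the report does not define it.

```python
from statistics import median

# Upper edges in seconds for the buckets {<1, 1-3, 3-10, 10-30, >30}.
DEFAULT_EDGES_S = (1.0, 3.0, 10.0, 30.0)


def pass_fraction(drifts_m, limit_m):
    """Fraction of anchor pairs whose drift is strictly below limit_m."""
    if not drifts_m:
        raise ValueError("no anchor pairs to evaluate")
    return sum(d < limit_m for d in drifts_m) / len(drifts_m)


def bin_index(age_s, edges=DEFAULT_EDGES_S):
    """Index of the age bucket; an age equal to an edge goes to the older bin."""
    for i, edge in enumerate(edges):
        if age_s < edge:
            return i
    return len(edges)


def bin_medians(pairs, edges=DEFAULT_EDGES_S):
    """Median drift per non-empty age bucket, keyed by bucket index."""
    buckets = {}
    for age_s, drift_m in pairs:
        buckets.setdefault(bin_index(age_s, edges), []).append(drift_m)
    return {i: median(values) for i, values in sorted(buckets.items())}


def is_monotonic(medians):
    """True when the medians never decrease as anchor age grows."""
    ordered = [medians[i] for i in sorted(medians)]
    return all(later >= earlier for earlier, later in zip(ordered, ordered[1:]))


# Illustrative (age_s, drift_m) pairs, not measured data.
pairs = [(0.5, 2.0), (0.7, 4.0), (2.0, 10.0), (5.0, 20.0), (40.0, 60.0)]
medians = bin_medians(pairs)
visual_only_ok = pass_fraction([d for _, d in pairs], limit_m=100.0) >= 0.95
```

Empty buckets are skipped rather than treated as zero, so a flight with no 10-30 s pairs does not register as a regression.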
AZ-411 — FT-P-03 + FT-P-14 schema + WGS84 (2pt)
- runner/helpers/estimate_schema.py: three pure validators: validate_estimate_schema (AC-1: lat:float, lon:float, cov_semi_major_m:float, last_satellite_anchor_age_ms:int present & well-typed; bool-leaks-as-int explicitly rejected), validate_source_label (AC-2: set ⊆ {satellite_anchored, visual_propagated, dead_reckoned}), validate_wgs84_range (AC-3: lat ∈ [-90, 90], lon ∈ [-180, 180], NaN rejected). Plus decode_lat_lon_int32 for the AP/iNav 1e-7 int32 wire format.
- tests/positive/test_ft_p_03_14_schema_wgs84.py: two test methods (test_schema_and_source_label for FT-P-03, test_wgs84_coordinate_range for FT-P-14) sharing the single-image-push fixture. Same _harness_helpers_implemented gate as AZ-410.
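
The three checks and the 1e-7 decode are simple enough to sketch in a few lines. The names schema_errors, label_ok, wgs84_in_range, and decode_e7 are hypothetical stand-ins, and the return shapes (error lists and booleans) are assumptions; the report does not show the real signatures.

```python
import math

# Required fields and types as listed for AC-1.
REQUIRED_FIELDS = {
    "lat": float,
    "lon": float,
    "cov_semi_major_m": float,
    "last_satellite_anchor_age_ms": int,
}
ALLOWED_SOURCE_LABELS = frozenset(
    {"satellite_anchored", "visual_propagated", "dead_reckoned"}
)


def schema_errors(record):
    """Return a list of human-readable schema violations (empty when valid)."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so it must be rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {name}: {type(value).__name__}")
    return errors


def label_ok(label):
    """AC-2: the source label must be a string from the allowed set."""
    return isinstance(label, str) and label in ALLOWED_SOURCE_LABELS


def wgs84_in_range(lat, lon):
    """AC-3: reject NaN and anything outside the WGS84 lat/lon bounds."""
    if math.isnan(lat) or math.isnan(lon):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def decode_e7(lat_e7, lon_e7):
    """Decode int32 degrees-times-1e7 values into float degrees."""
    for raw in (lat_e7, lon_e7):
        if not -(2 ** 31) <= raw <= 2 ** 31 - 1:
            raise ValueError(f"value {raw} does not fit in int32")
    return lat_e7 / 1e7, lon_e7 / 1e7


record = {
    "lat": 50.0,
    "lon": 36.0,
    "cov_semi_major_m": 4.2,
    "last_satellite_anchor_age_ms": 1200,
}
```

The explicit bool guard matters because isinstance(True, int) is True in Python, so a flag leaking into last_satellite_anchor_age_ms would otherwise pass the type check.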
Files added / modified
Added (13)
AZ-408:
- e2e/fixtures/injectors/_common.py
- e2e/fixtures/injectors/fc_proxy.py
- e2e/runner/helpers/injector_fixtures.py

AZ-410:
- e2e/runner/helpers/anchor_pair_detector.py
- e2e/tests/positive/test_ft_p_02_derkachi_drift.py

AZ-411:
- e2e/runner/helpers/estimate_schema.py
- e2e/tests/positive/test_ft_p_03_14_schema_wgs84.py

Unit tests (AZ-408 + AZ-410 + AZ-411):
- e2e/_unit_tests/fixtures/test_outlier.py
- e2e/_unit_tests/fixtures/test_blackout_spoof.py
- e2e/_unit_tests/fixtures/test_multi_segment.py
- e2e/_unit_tests/fixtures/test_fc_proxy.py
- e2e/_unit_tests/helpers/test_anchor_pair_detector.py
- e2e/_unit_tests/helpers/test_estimate_schema.py
Modified (8)
AZ-408 (replaced AZ-406 stub modules with real implementations):
- e2e/fixtures/injectors/outlier.py: full implementation (was ~20-line scaffold raising NotImplementedError).
- e2e/fixtures/injectors/blackout_spoof.py: full implementation.
- e2e/fixtures/injectors/multi_segment.py: full implementation.
- e2e/fixtures/injectors/__init__.py: updated docstring; added _common + fc_proxy to the index.

Harness wiring:
- e2e/runner/conftest.py: added runner.helpers.injector_fixtures to pytest_plugins.

Tests:
- e2e/_unit_tests/fixtures/test_injectors_contract.py: updated to the new AZ-408 dataclass shapes (the old target_segment_seconds / n_outliers / BlackoutSpoofPlan(blackout_seconds=…) legacy contract from AZ-406 was retired together with the scaffold modules).
- e2e/_unit_tests/test_directory_layout.py: added the 7 new paths (_common.py, fc_proxy.py, injector_fixtures.py, anchor_pair_detector.py, estimate_schema.py, test_ft_p_02_derkachi_drift.py, test_ft_p_03_14_schema_wgs84.py).
- e2e/_unit_tests/fixtures/test_blackout_spoof.py: bumped synthetic frames count from 900 → 3000 so the 25 s / 35 s window probes fit inside the source (the spec's NFT-RES-04 35 s window family is the driver).
- e2e/fixtures/injectors/fc_proxy.py: added the explicit pre-activate RuntimeError per the unit test feedback (was a silent passthrough in the first draft).
Spec / module-layout drift notes
- AZ-408 spec uses tests/fixtures/injectors/* paths, but the blackbox_tests cross-cutting entry in module-layout.md places the e2e harness under e2e/fixtures/injectors/. Implementation followed the module-layout entry (consistent with batch 68's AZ-407 resolution). The AZ-408 archived spec retains the tests/fixtures wording for audit; the actual file ownership is e2e/fixtures/.
- AZ-410 spec mentions tests/fixtures/... in the AC-NEW table (single mention of tests/integration/fdr_reader.py). Same resolution: module-layout is authoritative.
- AZ-408 AZ-406-scaffold-dataclass divergence: the AZ-406 scaffold declared OutlierInjectionPlan(target_segment_seconds, max_offset_m, n_outliers); AZ-408 needs (source_frames_dir, tile_cache_dir, density, seed, min_offset_m). The contract test was updated together with the scaffold replacement (no other callers of the old shape existed; verified by rg). This is the expected scaffold-to-real evolution per the AZ-406 injector docstrings ("Concrete generator is owned by AZ-408").
- AZ-410 / AZ-411 runtime-path skip: both scenario files probe NotImplementedError from frame_source_replay / imu_replay / fdr_reader / sitl_observer / mavproxy_tlog_reader rather than hard-coding a "deferred until AZ-X" marker. When those helpers land, both scenarios activate automatically.
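
The runtime-path skip can be illustrated with a small probe function. This is a sketch of the pattern only: helpers_implemented, stub_replay, and real_reader are hypothetical names, and how the real _harness_helpers_implemented invokes the upstream helpers is not shown in this report.

```python
def helpers_implemented(probes):
    """Return False as soon as any probe raises NotImplementedError.

    Each probe is a zero-argument callable that exercises one upstream
    helper. Any other exception propagates so real failures stay visible.
    """
    for probe in probes:
        try:
            probe()
        except NotImplementedError:
            return False
    return True


def stub_replay():
    # Stands in for a scaffold helper that has not been implemented yet.
    raise NotImplementedError("owned by an upstream task")


def real_reader():
    # Stands in for a helper that has already landed.
    return iter(())


gate_open = helpers_implemented([real_reader])
gate_closed = helpers_implemented([real_reader, stub_replay])
```

In a scenario file the result would typically feed pytest.mark.skipif, so the test is reported as skipped while a stub remains and starts running once every probe returns normally.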
Test Results
Focused tests (Step 6.4)
pytest e2e/_unit_tests/ — 248 passed in 141.08s (was 157 at end
of batch 68; +91 new tests across this batch).
Breakdown of new tests:
- AZ-408 fixtures (60 cases across 5 files):
  - test_outlier.py: 20 cases (determinism, AC-2 offset, AC-6 cleanup, density-stride mapping, error-path FileNotFoundError, summary.json round-trip, replacement-density target);
  - test_blackout_spoof.py: 10 cases (window length, AC-1 determinism, AC-4 realism, AC-NEW-8 inter-spoof deltas, AC-3 schedule, black-frame pixel sample, passthrough outside window, schedule.json shape, overwrite, validation);
  - test_multi_segment.py: 9 cases (≥3 disjoint, ≥30 s gap, ≤25 % coverage, infeasibility validation, error paths);
  - test_fc_proxy.py: 10 cases (passthrough / spoof-replace, alignment-err scenarios, exhaustion behaviour, schedule.json round-trip, pre-activate RuntimeError);
  - test_injectors_contract.py: 10 cases (dataclass shape, frozen, Literal density round-trip, report types).
- AZ-410 anchor-pair detector (15 cases): AC-1 detection variants (visual / dead_reckoned / IMU-fused / first-anchor-skip / multi-pair); AC-2/3 pass-fraction; AC-4 monotonic / 2× jump / regression; aggregate round-trip; CSV evidence round-trip.
- AZ-411 estimate schema (18 cases): AC-1 schema completeness (missing / wrong-type / bool guard / spec drift guard); AC-2 source-label containment (each allowed + rejection); AC-3 WGS84 range (in-range, lat>90, lon<-180, NaN); int32 1e-7 decode round-trip + range check; aggregate.
No regressions in the 157 inherited AZ-406 / AZ-407 / AZ-444 / AZ-445 tests.
No per-batch full-suite run per the implement skill's Test-Run Cadence (Step 16 owns the only full-suite invocation).
AC Test Coverage
AZ-408
| AC | Test | Status |
|---|---|---|
| AC-1 (outlier seed-deterministic) | test_build_is_seed_deterministic, test_different_seeds_produce_different_replacements, test_density_ratio_maps_to_correct_stride[*] | Covered |
| AC-2 (outlier offsets >350 m) | test_every_replacement_exceeds_min_offset, test_far_away_indices_filters_by_distance | Covered |
| AC-3 (blackout+spoof ≤40 ms alignment) | test_alignment_err_below_40ms_when_clock_matches_first_blackout, test_alignment_err_within_budget_under_normal_clock_skew, test_proxy_spoofs_inside_window, test_schedule_has_max_alignment_err_per_ac3 | Covered |
| AC-4 (spoof pattern realistic + AC-NEW-8 deltas) | test_spoof_fields_are_realistic, test_spoof_track_inter_position_delta_in_range | Covered |
| AC-5 (multi_segment ≥3 disjoint / ≥30 s gaps / ≤25 % coverage) | test_produces_three_disjoint_segments, test_segments_are_at_least_30_seconds_apart, test_total_blackout_below_25_percent, test_rejects_overlapping_gap | Covered |
| AC-6 (tmpfs auto-cleared) | test_build_writes_only_under_out_root, test_build_overwrites_existing_out_root, test_cleanup_tmpfs_removes_scratch, test_cleanup_tmpfs_is_silent_for_missing_path | Covered |
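
The three AC-5 constraints on multi_segment windows (≥3 disjoint segments, ≥30 s gaps, ≤25 % coverage) can be expressed as one check. This is a hedged sketch: check_segments and its (start_ms, end_ms) window shape are hypothetical, and the numbers in the example are illustrative rather than taken from the Derkachi source.

```python
def check_segments(windows_ms, duration_ms, min_segments=3,
                   min_gap_ms=30_000, max_coverage=0.25):
    """Return a list of violated constraints for blackout windows.

    windows_ms is a list of (start_ms, end_ms) pairs on the source timeline.
    An empty list means every constraint holds.
    """
    problems = []
    ordered = sorted(windows_ms)
    if len(ordered) < min_segments:
        problems.append(f"only {len(ordered)} segments, need {min_segments}")
    for start, end in ordered:
        if not 0 <= start < end <= duration_ms:
            problems.append(f"window ({start}, {end}) is outside the source")
    # A gap below the minimum also covers overlapping windows (negative gap).
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if gap < min_gap_ms:
            problems.append(f"gap of {gap} ms is below {min_gap_ms} ms")
    covered = sum(end - start for start, end in ordered)
    if covered > max_coverage * duration_ms:
        problems.append(f"coverage {covered / duration_ms:.0%} exceeds limit")
    return problems


# A 300 s source with three 20 s windows: 20 % coverage, 70 s gaps.
good = check_segments(
    [(30_000, 50_000), (120_000, 140_000), (210_000, 230_000)], 300_000
)
# Windows that are too long and too close together.
bad = check_segments(
    [(0, 40_000), (60_000, 100_000), (130_000, 170_000)], 300_000
)
```

Returning all violations at once, instead of raising on the first, makes an infeasible request easier to diagnose, since tight gaps and excess coverage often appear together.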
AZ-410
| AC | Test | Status |
|---|---|---|
| AC-1 (anchor-pair detection) | test_first_anchor_is_not_a_pair, test_simple_visual_only_pair, test_imu_fused_segment_classifies_pair, test_dead_reckoned_in_segment_still_pair, test_multiple_pairs_in_one_flight | Covered |
| AC-2 (visual-only drift <100 m, ≥95 %) | test_pass_fraction_all_pass, test_pass_fraction_partial, test_aggregate_round_trip | Covered |
| AC-3 (IMU-fused drift <50 m, ≥95 %) | test_aggregate_round_trip (covers IMU-fused vs visual-only segregation; pass-fraction helper tested with both bounds) | Covered |
| AC-4 (bin-median monotonic with age) | test_bin_drifts_default_edges, test_check_monotonic_passes_for_increasing_medians, test_check_monotonic_flags_regression, test_check_monotonic_flags_2x_jump | Covered |
| AC-5 (parameterized over (fc_adapter, vio_strategy)) | Verified via pytest --collect-only: 6 variants per scenario method | Covered |
| AC-1.3 runtime (full Derkachi replay end-to-end) | requires runner.helpers.{frame_source_replay,fdr_reader,imu_replay}, currently stubs; scenario auto-activates when those land | NOT COVERED (harness-loop) |
AZ-411
| AC | Test | Status |
|---|---|---|
| AC-1 (schema completeness) | test_valid_record_passes_schema, test_missing_field_caught, test_int_typed_field_rejected_when_wrong_type, test_bool_does_not_silently_satisfy_int, test_required_fields_table_is_what_the_spec_says | Covered |
| AC-2 (source-label set containment) | test_each_allowed_label_passes[*], test_unknown_label_rejected, test_non_string_label_rejected | Covered |
| AC-3 (WGS84 lat/lon range + 1e-7 int32 decode) | test_valid_wgs84_inside_range, test_lat_above_90_rejected, test_lon_below_minus_180_rejected, test_nan_rejected, test_decode_lat_lon_int32_round_trip, test_decode_lat_lon_int32_rejects_out_of_int32_range | Covered |
| AC-4 (parameterized over (fc_adapter, vio_strategy)) | Verified via pytest --collect-only: 6 variants per scenario method, 12 total | Covered |
| Single-image push runtime end-to-end | requires the same upstream helpers as AZ-410 | NOT COVERED (harness-loop) |
The runtime / harness-loop ACs are documented in the same way as
batch 68's AZ-444 hardware-loop ACs: the helper logic is fully
unit-tested; the docker-bound runtime path activates automatically when the
upstream frame_source_replay / fdr_reader / imu_replay /
sitl_observer / mavproxy_tlog_reader helpers stop raising
NotImplementedError.
Code Review Verdict
Self-reviewed — PASS. See reviews/batch_69_review.md for the per-phase
sweep (no Critical / High / Medium / Low findings) and
cumulative_review_batches_67-69_cycle1_report.md for the K=3
cumulative review (same verdict; no cross-batch drift).
Notable points:
- Determinism primitive: _common.derive_rng(domain, *components) hashes the domain + components into a 64-bit seed, so two unrelated injectors with the same numeric seed receive independent streams. This is the basis for the AC-1 determinism guarantee across all three injectors.
- _common.haversine_m vs geo.distance_m: deliberate dependency-isolation duplicate. The injectors must work in minimal Docker images without pyproj; the docstring explains the trade-off. Negligible numerical drift between haversine and Vincenty at the ~km scales the AC-2 check operates on.
- Pre-activate RuntimeError in fc_proxy: introduced after the unit test caught a silent-passthrough behaviour; programming-error guard so a forgotten activate() cannot quietly degrade into no-op passthrough during a real scenario run.
- Scenario-file skip pattern: AZ-410's scenario probes upstream helpers' NotImplementedError rather than hard-coding a "deferred until X" marker. AZ-411 reuses the same pattern. When the helpers land, both scenarios activate without any source change.
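
The pre-activate guard is easiest to see in a toy proxy. The class below is a hypothetical stand-in, not the fc_proxy.py implementation: the constructor, the window tuple shape, and the assumption that activate() is called at the moment the first black frame is shown are all illustrative. Only the activate(now_ms_provider, first_blackout_ms) and process_inbound_message() names come from the report.

```python
class SpliceProxy:
    """Toy timed-splice proxy with a pre-activate guard (illustrative only)."""

    def __init__(self, windows):
        # windows: (start_ms, end_ms, spoof_message) on the schedule timeline.
        self._windows = list(windows)
        self._now_ms = None
        self._offset_ms = None

    def activate(self, now_ms_provider, first_blackout_ms):
        # Assumption: this is called when the first black frame is shown, so
        # the current clock reading corresponds to first_blackout_ms.
        self._now_ms = now_ms_provider
        self._offset_ms = first_blackout_ms - now_ms_provider()

    def process_inbound_message(self, message):
        if self._now_ms is None:
            # Fail loudly: a forgotten activate() must not become passthrough.
            raise RuntimeError("process_inbound_message() called before activate()")
        schedule_ms = self._now_ms() + self._offset_ms
        for start_ms, end_ms, spoof in self._windows:
            if start_ms <= schedule_ms < end_ms:
                return spoof
        return message


clock = [1_000]  # mutable fake clock in milliseconds
proxy = SpliceProxy([(5_000, 6_000, "SPOOF")])
proxy.activate(lambda: clock[0], first_blackout_ms=5_000)
inside = proxy.process_inbound_message("real")   # clock maps to 5000 ms
clock[0] = 2_500
outside = proxy.process_inbound_message("real")  # clock maps to 6500 ms
```

Raising instead of passing messages through means a scenario that forgets activate() fails immediately, rather than silently running without any spoofed GPS.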
Auto-Fix Attempts
- No code-review failures — auto-fix gate was not entered.
Stuck Agents
None.
Deferred follow-ups
- runner.helpers.frame_source_replay.FrameSourceReplayer.replay_video / .replay_image_directory: currently raise NotImplementedError; block the AZ-410 / AZ-411 runtime paths.
- runner.helpers.fdr_reader.iter_records: owned by AZ-441; blocks AZ-410 runtime path.
- runner.helpers.imu_replay.ImuReplayer.replay: owned by AZ-407 per scaffold docstring (the AZ-407 batch did not touch it); blocks AZ-410 runtime path.
- runner.helpers.sitl_observer.get_observer: owned by AZ-416 / AZ-417; blocks AZ-411 runtime path.
- runner.helpers.mavproxy_tlog_reader.iter_messages: owned by AZ-416; blocks AZ-411 runtime path.
These are existing scaffolds with explicit ownership tags — no new debt introduced by this batch.
Next Batch
Batch 70 candidate set (all unblocked after this batch lands):
- AZ-409 (FT-P-01 — frame-center GPS accuracy — 5pt) — first concrete positive scenario exercising the SUT through the full Docker-bound runner. Same harness-loop gate as AZ-410.
- AZ-412 (FT-P-04 — frame-to-frame registration — 3pt)
- AZ-413 (FT-P-05/06 — sat anchor MRE — 5pt)
Total: 13 cp across 3 tasks. AZ-409 is the headline; AZ-412 / AZ-413 fill out the positive-path family.