[AZ-408] [AZ-410] [AZ-411] Batch 69: synth injectors + FT-P-02/03/14

AZ-408 (3pt) - Replace AZ-406 injector scaffolds with concrete generators:
- outlier.py: deterministic stride + far-away tile replacement; AC-2 ≥350m offset
- blackout_spoof.py: paired video blackout + FC GPS spoof with ≤40ms alignment; AC-4 realistic fix_type/hdop; AC-NEW-8 200-500m inter-spoof deltas
- multi_segment.py: ≥3 disjoint windows, ≥30s gaps, ≤25% coverage
- fc_proxy.py: timed-splice runtime proxy with pre-activate RuntimeError guard
- _common.py: derive_rng + tile-manifest reader + tmpfs helpers
- injector_fixtures.py: pytest fixtures wired via runner conftest

AZ-410 (3pt) - FT-P-02 cumulative drift between satellite anchors:
- anchor_pair_detector.py: AC-1 detection, AC-2/3 pass-fraction, AC-4 monotonicity check, CSV evidence
- test_ft_p_02_derkachi_drift.py: scenario gated on upstream helper NotImplementedError (frame_source_replay / fdr_reader / imu_replay)

AZ-411 (2pt) - FT-P-03 + FT-P-14 schema + WGS84:
- estimate_schema.py: AC-1 schema completeness, AC-2 source-label set containment, AC-3 WGS84 range + int32 1e-7 decode
- test_ft_p_03_14_schema_wgs84.py: shared single-image-push scenario

Tests: 248 unit tests pass (+91 vs batch 68).
Reports: batch_69_report.md, batch_69_review.md (PASS), cumulative_review_batches_67-69_cycle1_report.md (PASS).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 00:41:13 +00:00 · 2026-05-16 17:54:00 +03:00
parent ff1b00200c
commit 702a0c0ff3
27 changed files with 4619 additions and 58 deletions
@@ -0,0 +1,82 @@
+# Fixture Builders — Synthetic Injectors (outlier, blackout-spoof, multi-segment)
+
+**Task**: AZ-408_fixture_builders_synth_injectors
+**Name**: Runtime synthetic-injection fixture builders
+**Description**: Implement runtime-generated synthetic fixtures: `outlier-injection-derkachi` (light/medium/heavy densities), `blackout-spoof-derkachi` (5 s / 15 s / 35 s windows + paired FC GPS spoof), `multi-segment-derkachi` (3+ blackout windows without spoof).
+**Complexity**: 3 points
+**Dependencies**: AZ-406, AZ-407 (tile-cache-fixture for Derkachi route bbox)
+**Component**: Blackbox Tests / Fixture builders (epic AZ-262 / E-BBT)
+**Tracker**: AZ-408
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+The negative-path scenarios FT-N-01, FT-N-04, FT-P-08, NFT-RES-04 all rely on programmatically derived overlays of the Derkachi fixture (visual outliers, blackout windows, simultaneous FC GPS spoof). These overlays must be deterministic, runtime-generated into per-test tmpfs (per `test-data.md` § Data Isolation), and — for the spoof case — coordinate with FC inbound stream patching.
+
+## Outcome
+
+- `tests/fixtures/injectors/outlier.py` overlays `derkachi-fixture` with random crops from far-away tiles (>350 m offset per AC-3.1) at three densities: `light` (1 in 100 frames), `medium` (1 in 10), `heavy` (1 in 3). The density is selected with the `--density` CLI flag; the same `--seed` produces identical overlays.
+- `tests/fixtures/injectors/blackout_spoof.py` produces three sub-scenarios: 5 s, 15 s, and 35 s pure-black-frame windows on the video stream AND simultaneous spoofed-GPS injection on the FC inbound stream. Spoof pattern: realistic-looking GPS that jumps 200-500 m in a `north_east_random_direction`. The injector is a coordinated pair (video overlay + FC inbound proxy patch) so that both fire within ≤40 ms of each other on the wall clock (see AC-3).
+- `tests/fixtures/injectors/multi_segment.py` generates 3+ blackout segments distributed across the Derkachi flight (positions configurable; default = at 25 %, 50 %, 75 % of replay) WITHOUT spoof injection. Used to exercise satellite-reference re-localization without a security failsafe path.
+- All injectors emit to per-test tmpfs (`/tmp/<run-id>/<scenario>/`) and are auto-cleared at teardown.
+
+## Scope
+
+### Included
+- The three injector scripts + a shared library for deterministic random-seed handling.
+- A small FC-inbound proxy patch for blackout_spoof. It sits between the IMU/GPS replay and the SUT's FC inbound port and passes every frame through unchanged, except that during the configured window(s) it splices in the spoofed-GPS bursts.
+- pytest fixtures wrapping each injector for use in the per-scenario test files.
+
+### Excluded
+- The static fixtures (tile-cache, age-injector, cold-boot, mavlink-passkey, cve-jpeg) — owned by AZ-407.
+- Per-scenario test logic (FT-N-01, FT-N-04, FT-P-08, NFT-RES-04) — those tasks consume the injector outputs.
+- Persistent storage of generated fixtures — explicitly per-test tmpfs; never written to a persistent volume.
+
+## Acceptance Criteria
+
+**AC-1: outlier injector is seed-deterministic**
+Given the same `--seed` value
+When `outlier.py --density medium` runs twice
+Then both runs produce overlays with identical frame indices replaced and identical replacement crop selection.
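
One way to obtain the determinism AC-1 asks for is to derive every random choice from the seed alone. The sketch below is illustrative: `derive_rng` is named in the commit message but its signature is not shown here, so this version, the `DENSITY_STRIDE` table, and `pick_outlier_frames` are assumptions rather than the committed code.

```python
import hashlib
import random

# Hypothetical density table: one replaced frame per N source frames.
DENSITY_STRIDE = {"light": 100, "medium": 10, "heavy": 3}


def derive_rng(seed: int, scope: str) -> random.Random:
    """Derive a reproducible RNG that is independent per injector scope."""
    digest = hashlib.sha256(f"{seed}:{scope}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def pick_outlier_frames(total_frames: int, density: str, seed: int) -> list[int]:
    """Pick one frame index inside each stride-sized block of the source."""
    stride = DENSITY_STRIDE[density]
    rng = derive_rng(seed, f"outlier:{density}")
    picks = []
    for start in range(0, total_frames, stride):
        end = min(start + stride, total_frames)
        picks.append(rng.randrange(start, end))
    return picks
```

Hashing the seed together with a scope string keeps the frame selection stable even if another injector later draws from its own RNG in a different order.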
+
+**AC-2: outlier offsets exceed 350 m (AC-3.1 envelope)**
+Given an `outlier-injection-derkachi` `medium`-density overlay
+When the per-frame replacement crop is geo-located via the tile-cache GT
+Then ≥99 % of replacement crops are >350 m from the original frame's GT centre.
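
The AC-2 check reduces to a great-circle distance per replaced frame plus a pass fraction. The sketch below uses the haversine formula on a spherical Earth as a stand-in; the function names and the `(original, replacement)` pair layout are assumptions, and the real test may use a different geodesic.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def far_fraction(pairs, min_offset_m=350.0) -> float:
    """pairs: iterable of ((lat, lon) original centre, (lat, lon) replacement centre)."""
    pairs = list(pairs)
    far = sum(1 for orig, repl in pairs if haversine_m(*orig, *repl) > min_offset_m)
    return far / len(pairs)
```

Under this sketch, AC-2 would hold when `far_fraction(pairs) >= 0.99`.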
+
+**AC-3: blackout_spoof produces synchronized video + FC events**
+Given `blackout_spoof.py --window 15s`
+When the test runs
+Then within ≤40 ms wall-clock of the video stream's first all-black frame, the FC inbound proxy starts emitting spoofed GPS frames; both stop within ≤40 ms at the window's end.
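
The alignment rule in AC-3 can be expressed as a comparison of the four window edges. This is a minimal sketch; the function name and the convention that all timestamps are wall-clock seconds are assumptions.

```python
MAX_SKEW_S = 0.040  # AC-3 budget: 40 ms at each edge of the window


def edges_aligned(video_start, video_end, spoof_start, spoof_end,
                  max_skew_s=MAX_SKEW_S) -> bool:
    """True when both the start and the end edges agree within the skew budget."""
    return (abs(video_start - spoof_start) <= max_skew_s
            and abs(video_end - spoof_end) <= max_skew_s)
```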
+
+**AC-4: blackout_spoof spoof pattern is realistic-looking**
+Given the spoof injector emits GPS during the blackout
+Then the spoofed `lat`/`lon`/`alt`/`fix_type`/`hdop` fields are within typical-flight ranges (no NaN, no obvious sentinel values, fix_type in {3, 4}, hdop in [0.5, 2.5]); the position deltas between consecutive spoofed frames are in [200 m, 500 m] per AC-NEW-8.
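
The field ranges and inter-frame deltas in AC-4 lend themselves to a small validator. In the sketch below, `GpsFix` is a hypothetical record (not a MAVLink type), and the distance uses an equirectangular approximation, which is an assumption that is adequate only for sub-kilometre deltas.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GpsFix:  # hypothetical record for illustration
    lat: float
    lon: float
    alt: float
    fix_type: int
    hdop: float


def _distance_m(a: GpsFix, b: GpsFix) -> float:
    """Equirectangular approximation of the ground distance in metres."""
    mean_lat = math.radians((a.lat + b.lat) / 2)
    dx = math.radians(b.lon - a.lon) * math.cos(mean_lat)
    dy = math.radians(b.lat - a.lat)
    return 6_371_008.8 * math.hypot(dx, dy)


def spoof_violations(fixes: list[GpsFix]) -> list[str]:
    """Return one message per AC-4 rule broken; an empty list means pass."""
    problems = []
    for i, f in enumerate(fixes):
        if any(math.isnan(v) for v in (f.lat, f.lon, f.alt, f.hdop)):
            problems.append(f"frame {i}: NaN field")
        if f.fix_type not in (3, 4):
            problems.append(f"frame {i}: fix_type {f.fix_type}")
        if not 0.5 <= f.hdop <= 2.5:
            problems.append(f"frame {i}: hdop {f.hdop}")
    for i in range(1, len(fixes)):
        d = _distance_m(fixes[i - 1], fixes[i])
        if not 200.0 <= d <= 500.0:
            problems.append(f"frames {i - 1}->{i}: delta {d:.0f} m")
    return problems
```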
+
+**AC-5: multi_segment produces ≥3 disjoint blackout windows**
+Given `multi_segment.py`
+Then the output contains ≥3 blackout windows; consecutive windows are separated by ≥30 s of normal frames; total blackout coverage is ≤25 % of the source duration (so the rest of the flight still exercises satellite-anchor recovery).
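
The three AC-5 conditions can be checked directly on the list of windows. The sketch below assumes windows are `(start_s, end_s)` tuples in seconds; the function name is hypothetical.

```python
def check_segments(windows, source_duration_s, min_gap_s=30.0, max_coverage=0.25) -> bool:
    """windows: list of (start_s, end_s) blackout intervals."""
    ordered = sorted(windows)
    if len(ordered) < 3:
        return False
    if any(end <= start for start, end in ordered):
        return False  # empty or inverted window
    # A gap of at least min_gap_s between neighbours also implies disjointness.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end < min_gap_s:
            return False
    covered = sum(end - start for start, end in ordered)
    return covered / source_duration_s <= max_coverage
```

For example, three 15 s windows at 25 %, 50 %, and 75 % of a 480 s source cover about 9 % of it and leave 105 s gaps, so they satisfy all three conditions.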
+
+**AC-6: tmpfs auto-cleared at teardown**
+Given a test using any injector completes (PASS or FAIL)
+Then the injector's tmpfs scratch directory is removed within ≤2 s of teardown; subsequent tests start with empty tmpfs.
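
Cleanup on both PASS and FAIL is typically obtained by putting the removal in a `finally` clause. The sketch below uses a plain context manager so that it runs without pytest; `scratch_dir` and its parameters are hypothetical, and a pytest fixture could wrap it with a `yield`.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path
from typing import Optional


@contextlib.contextmanager
def scratch_dir(run_id: str, scenario: str, base: Optional[str] = None):
    """Create <base>/<run_id>/<scenario>/ and remove it on exit, even on error."""
    root = Path(base or tempfile.gettempdir()) / run_id / scenario
    root.mkdir(parents=True, exist_ok=False)  # fail loudly if a stale dir exists
    try:
        yield root
    finally:
        shutil.rmtree(root, ignore_errors=True)
```

Creating the directory with `exist_ok=False` makes a leftover directory from an earlier test surface as an error instead of silently leaking state into the next one.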
+
+## System Under Test Boundary
+
+The injectors only produce input data; they never replace, stub, or fake any SUT module. The blackout_spoof FC-inbound proxy is a pure pass-through with timed splice — it does not implement any SUT logic.
+
+- No internal SUT module is imported by any injector.
+- The FC-inbound proxy operates at the protocol level (MAVLink frame routing); it does not interpret SUT output.
+- Geo-location of replacement crops uses the tile-cache fixture's manifest only (a public artifact); it does not query the SUT's tile lookup.
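
The "pure pass-through with timed splice" behaviour, together with the pre-activate `RuntimeError` guard mentioned in the commit message, could look roughly like the sketch below. The class name, the `route` signature, and the `is_gps` flag are assumptions; the real `fc_proxy.py` works on MAVLink frames and is not shown here.

```python
class TimedSpliceProxy:
    """Pass-through that swaps in spoofed GPS frames inside [start_s, end_s)."""

    def __init__(self, start_s: float, end_s: float, spoof_source):
        self._start_s = start_s
        self._end_s = end_s
        self._spoof_source = spoof_source  # callable: elapsed seconds -> spoofed frame
        self._t0 = None

    def activate(self, t0: float) -> None:
        """Record the replay start time; routing is refused before this call."""
        self._t0 = t0

    def route(self, now: float, frame, is_gps: bool):
        if self._t0 is None:
            raise RuntimeError("proxy used before activate()")
        elapsed = now - self._t0
        if is_gps and self._start_s <= elapsed < self._end_s:
            return self._spoof_source(elapsed)
        return frame  # everything else is forwarded untouched
```

The proxy only decides which frame to forward; it never inspects SUT output, which keeps it on the input side of the boundary described above.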
+
+## Constraints
+
+- Determinism: same `--seed` → identical overlay across runs (this is required for the regression detector in 40_csv_reporter_refinements).
+- Tmpfs-only: injectors never write to persistent volumes.
+- Coordinated timing for blackout_spoof: video-overlay and FC-inbound spoof must fire within ≤40 ms of each other (AC-NEW-8 / FT-N-04 pass criterion is "within ≤1 frame OR ≤40 ms").
+
+## Document Dependencies
+
+- `_docs/02_document/tests/test-data.md` § Seed Data Sets (synthetic-injection rows)
+- `_docs/02_document/tests/blackbox-tests.md` (FT-N-01, FT-N-04, FT-P-08)
+- `_docs/02_document/tests/resilience-tests.md` (NFT-RES-04 reuses blackout_spoof)
@@ -0,0 +1,84 @@
+# FT-P-02 — Derkachi VIO drift between satellite anchors
+
+**Task**: AZ-410_ft_p_02_derkachi_drift
+**Name**: FT-P-02 cumulative drift between consecutive satellite-anchored fixes (AC-1.3)
+**Description**: Implement the FT-P-02 scenario — full Derkachi replay; at each anchor frame compute drift between the propagated visual-only centre and the new satellite anchor centre; bin by `last_satellite_anchor_age_ms`; assert ≥95 % of anchor pairs satisfy drift bounds.
+**Complexity**: 3 points
+**Dependencies**: AZ-406, AZ-407
+**Component**: Blackbox Tests / Positive (epic AZ-262)
+**Tracker**: AZ-410
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Cumulative drift between consecutive satellite anchors is the most direct measure of the project's onboard-VIO + IMU-fusion behavior in flight. AC-1.3 must be measured on the Derkachi fixture (the only available real flight) — without this scenario the project has no closed-loop validation of the drift budget.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/positive/test_ft_p_02_derkachi_drift.py`.
+- Replays the full Derkachi fixture (video at 30 fps + IMU CSV at 10 Hz, 3 video frames per IMU row).
+- For each frame whose outbound estimate carries `source_label = satellite_anchored`: records (a) the propagated centre estimate of the prior visual-only segment, (b) the new anchor centre.
+- Per-anchor-pair drift = `‖propagated_centre − next_anchor_centre‖`; binned by `last_satellite_anchor_age_ms`.
+- CSV evidence: `e2e-results/run-${RUN_ID}/ft-p-02.csv` (one row per anchor pair).
+- Aggregate pass criteria (per AC-1.3): ≥95 % of anchor pairs satisfy `drift < 100 m` (visual-only) AND `drift < 50 m` when CombinedImuFactor IMU fusion is active in C5.
+
+## Scope
+
+### Included
+- Full-replay test method (~8 min replay + parsing).
+- Anchor-pair detection from the outbound estimate stream (`source_label` transitions).
+- Drift binning + aggregate assertion.
+- CSV evidence emission.
+
+### Excluded
+- Synthetic outage / spoof injection — owned by FT-N-01..04.
+- Sharp-turn-segment-specific assertions — owned by FT-P-07 / FT-N-02.
+- Frame-by-frame inter-emit latency — owned by NFT-PERF-02.
+
+## Acceptance Criteria
+
+**AC-1: anchor-pair detection**
+Given a full Derkachi replay
+Then the test identifies every transition from `visual_propagated` (or `dead_reckoned`) → `satellite_anchored` and records the pair.
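
Transition detection over the per-frame label stream is a single pass. The sketch below assumes the labels arrive as an ordered list of strings; `find_anchor_pairs` is a hypothetical name, not the API of `anchor_pair_detector.py`.

```python
ANCHORED = "satellite_anchored"
UNANCHORED = {"visual_propagated", "dead_reckoned"}


def find_anchor_pairs(labels):
    """Return (prev_index, anchor_index) for every unanchored -> anchored step."""
    pairs = []
    for i in range(1, len(labels)):
        if labels[i] == ANCHORED and labels[i - 1] in UNANCHORED:
            pairs.append((i - 1, i))
    return pairs
```

Consecutive `satellite_anchored` frames produce no pair under this reading, because no propagated segment lies between them.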
+
+**AC-2: drift bound (visual-only)**
+Given anchor pairs whose preceding segment was visual-only (no IMU fusion active in C5)
+Then ≥95 % of those pairs satisfy `drift < 100 m`.
+
+**AC-3: drift bound (IMU-fused)**
+Given anchor pairs whose preceding segment had CombinedImuFactor IMU fusion active in C5
+Then ≥95 % of those pairs satisfy `drift < 50 m`.
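
Both drift bounds share the same pass-fraction logic, with the required count rounded up. The sketch below uses integer arithmetic to avoid floating-point rounding at the 95 % boundary; the function names are hypothetical, and treating an empty sample as a failure is an assumption rather than something the spec states.

```python
def required_passes(n_pairs: int, min_percent: int = 95) -> int:
    """Smallest pass count that meets the budget, rounded up (integer math)."""
    return -(-min_percent * n_pairs // 100)


def drift_budget_met(drifts_m, bound_m: float, min_percent: int = 95) -> bool:
    drifts = list(drifts_m)
    if not drifts:
        return False  # assumption: no measured pairs is not a vacuous pass
    passing = sum(1 for d in drifts if d < bound_m)
    return passing >= required_passes(len(drifts), min_percent)
```

With 20 anchor pairs, 19 must pass; with 10 pairs, all 10 must pass, because 9.5 rounds up.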
+
+**AC-4: drift distribution monotonic with anchor age**
+Given drift bins by `last_satellite_anchor_age_ms` (e.g. {<1 s, 1-3 s, 3-10 s, 10-30 s, >30 s})
+Then the bin medians grow monotonically with age; no anomalous spike (>2× median jump) between adjacent bins.
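
The binning and the monotonicity check can be sketched as follows. The bin edges follow the example bins in AC-4; reading "no anomalous spike" as "no adjacent median more than 2× the previous one" is one interpretation, and the function names are hypothetical.

```python
import bisect
import statistics

# Upper bin edges in ms for {<1 s, 1-3 s, 3-10 s, 10-30 s, >30 s}.
BIN_EDGES_MS = [1_000, 3_000, 10_000, 30_000]


def bin_medians(samples):
    """samples: iterable of (anchor_age_ms, drift_m). Empty bins are skipped."""
    bins = [[] for _ in range(len(BIN_EDGES_MS) + 1)]
    for age_ms, drift_m in samples:
        bins[bisect.bisect_right(BIN_EDGES_MS, age_ms)].append(drift_m)
    return [statistics.median(b) for b in bins if b]


def monotonic_without_spikes(medians, max_ratio=2.0) -> bool:
    """Medians must not decrease, and must not jump by more than max_ratio."""
    for prev, cur in zip(medians, medians[1:]):
        if cur < prev:
            return False
        if prev > 0 and cur > max_ratio * prev:
            return False
    return True
```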
+
+**AC-5: parameterization**
+Given the conftest's `(fc_adapter, vio_strategy)` parameterization
+Then the scenario runs once per parameterization and emits one row per parameterization in `report.csv`.
+
+## System Under Test Boundary
+
+End-to-end scenario through public boundaries.
+
+- **Allowed inputs**: frame-source replay, IMU CSV replay through FC inbound proxy (passive).
+- **Allowed observation**: SITL-side outbound message stream; FDR-side `source_label` transitions (FDR is a public artifact post-flight); `GLOBAL_POSITION_INT` GT from `data_imu.csv`.
+- **Forbidden**: querying SUT internal C5 graph state, internal `source_label` state machine, or stubbing C5 / C8.
+- If C8 outbound emission is not implemented, the scenario MUST fail (no fallback to a stubbed emit).
+
+## Constraints
+
+- Replay synchrony: 3 video frames per IMU row (per `test-data.md`).
+- Drift is computed in metres in WGS84 via Vincenty; `propagated_centre` is the SUT's last-emitted position immediately before the anchor frame.
+- IMU-fused-vs-visual-only classification: derived from FDR `source_label` history of the segment preceding each anchor (visual-only = entire segment was `visual_propagated` since the prior anchor; IMU-fused = at least one frame was `satellite_anchored` with active CombinedImuFactor — readable from FDR per AC-NEW-3 schema).
+
+## Risks & Mitigation
+
+**Risk: Derkachi fixture has too few satellite-anchor opportunities for statistical power**
+- *Mitigation*: ≥95 % is the AC-1.3 budget; for a fixture with N anchors, the required passing count rounds up. The test reports the actual N, the pass count, and the percentage in the CSV, and flags the result as having low statistical power when N < 20.
+
+## Document Dependencies
+
+- `_docs/02_document/tests/blackbox-tests.md` § FT-P-02
+- `_docs/02_document/tests/test-data.md` § Position accuracy (FT-P-02 row)
+- `_docs/00_problem/input_data/flight_derkachi/data_imu.csv` (GT columns `GLOBAL_POSITION_INT.*`)
@@ -0,0 +1,66 @@
+# FT-P-03 + FT-P-14 — Estimate schema + WGS84 output coordinate system
+
+**Task**: AZ-411_ft_p_03_14_schema_wgs84
+**Name**: Estimate output schema + source-label semantics + WGS84 coordinate validation (AC-1.4, AC-4.3, AC-6.3)
+**Description**: Combined coverage for FT-P-03 (output schema + source-label) and FT-P-14 (WGS84 coordinate validation). Both are small format/contract checks on the same outbound message.
+**Complexity**: 2 points
+**Dependencies**: AZ-406, AZ-407
+**Component**: Blackbox Tests / Positive (epic AZ-262)
+**Tracker**: AZ-411
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Two thin contract checks on the SUT's outbound message — schema completeness (AC-1.4 / AC-4.3) and WGS84 coordinate-range validity (AC-6.3) — must be exercised, but each scenario alone is too small for an independent task. They share the fixture and only differ in their assertion set.
+
+## Outcome
+
+- pytest scenario at `e2e/tests/positive/test_ft_p_03_14_schema_wgs84.py` with two test methods: `test_schema_and_source_label` (FT-P-03) and `test_wgs84_coordinate_range` (FT-P-14).
+- Both methods push a single image (default `AD000001.jpg`) and read the resulting outbound message + the out-of-band source-label channel (`STATUSTEXT` or `NAMED_VALUE_FLOAT` per AC-4.3).
+- AC-2 implements the `set_contains` rule on the source label; AC-1 implements the `schema_match` rule on field presence + types.
+
+## Scope
+
+### Included
+- Schema match: `lat:float`, `lon:float`, `cov_semi_major_m:float`, `last_satellite_anchor_age_ms:int` present and well-typed in the outbound message; per AC-4.3 these fields ride either inside the `GPS_INPUT` / `MSP2_SENSOR_GPS` payload OR on a paired side-channel.
+- Source-label set containment: the out-of-band label channel emits one of `{satellite_anchored, visual_propagated, dead_reckoned}`.
+- WGS84 range check: `lat ∈ [-90, 90]`, `lon ∈ [-180, 180]`, scaled per protocol convention (AP `GPS_INPUT.lat/lon` are 1e-7 scaled int32, iNav `MSP2_SENSOR_GPS.lat/lon` likewise — the test parses correctly and checks the decoded float is in WGS84 bounds).
+
+### Excluded
+- Per-image accuracy — owned by FT-P-01 (AZ-409).
+- Honest-covariance-vs-95%-confidence cross-check — owned by FT-N-04 (AZ-426) and AC-NEW-4 in NFT-RES-03.
+- Signing handshake — owned by FT-P-09-AP (AZ-416).
+
+## Acceptance Criteria
+
+**AC-1: schema completeness**
+Given any single outbound message with paired source-label channel
+Then all of `lat:float`, `lon:float`, `cov_semi_major_m:float`, `last_satellite_anchor_age_ms:int` are present and parse to the documented types.
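
A presence-and-type check over the four documented fields might look like the sketch below. It assumes the outbound message has already been decoded into a plain `dict`; `schema_errors` is a hypothetical helper, not the API of `estimate_schema.py`.

```python
REQUIRED_FIELDS = {
    "lat": float,
    "lon": float,
    "cov_semi_major_m": float,
    "last_satellite_anchor_age_ms": int,
}


def schema_errors(message: dict) -> list[str]:
    """Return one message per missing or mistyped field; empty list means pass."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in message:
            errors.append(f"missing {name}")
            continue
        value = message[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors
```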
+
+**AC-2: source-label set containment**
+Given the source-label channel emission
+Then the label is exactly one of `{satellite_anchored, visual_propagated, dead_reckoned}`.
+
+**AC-3: WGS84 coordinate range**
+Given the decoded lat/lon
+Then `lat ∈ [-90, 90]` AND `lon ∈ [-180, 180]` AND the scaling factor matches the protocol convention (AP: 1e-7 scaled int32; iNav: per `MSP2_SENSOR_GPS` schema in `docs/SITL/SITL.md`).
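
Decoding a 1e-7 scaled int32 and range-checking the result can be sketched as below. The little-endian byte order is an assumption about how the raw field reaches the test, and the helper names are hypothetical.

```python
import struct

SCALE = 1e-7  # degrees per LSB for 1e-7 scaled int32 lat/lon


def decode_deg_e7(raw: bytes) -> float:
    """Decode a little-endian signed 32-bit value scaled by 1e-7 degrees."""
    (value,) = struct.unpack("<i", raw)
    return value * SCALE


def in_wgs84_range(lat_deg: float, lon_deg: float) -> bool:
    return -90.0 <= lat_deg <= 90.0 and -180.0 <= lon_deg <= 180.0
```

The range check is not redundant with the decode: an int32 can represent values up to about ±214.7 degrees at this scale, so an out-of-range encoding is possible.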
+
+**AC-4: parameterization**
+Given the conftest's `(fc_adapter, vio_strategy)` parameterization
+Then both test methods run for each parameterization.
+
+## System Under Test Boundary
+
+End-to-end through public boundaries; SITL-observed.
+
+- **Allowed**: SITL receipt, mavproxy listener for STATUSTEXT/NAMED_VALUE_FLOAT.
+- **Forbidden**: parsing SUT internal logs for the schema fields; the schema must be visible at the outbound boundary (or it's a real defect).
+
+## Constraints
+
+- The source-label side-channel mechanism is defined by AC-4.3 (MAVLink `STATUSTEXT` or `NAMED_VALUE_FLOAT`). Both encodings are accepted by this test as long as the label arrives within ≤500 ms of the paired `GPS_INPUT` / `MSP2_SENSOR_GPS`.
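
The pairing rule combines the label-set check with the 500 ms window. The sketch below treats "within ≤500 ms" as an absolute time difference, which is an assumption (the label might only ever arrive after the position message); the function name is hypothetical.

```python
ALLOWED_LABELS = frozenset({"satellite_anchored", "visual_propagated", "dead_reckoned"})
MAX_LABEL_LAG_MS = 500


def label_pairs_with_fix(fix_time_ms: int, label_time_ms: int, label: str) -> bool:
    """True when the label is valid and close enough in time to the position message."""
    return (label in ALLOWED_LABELS
            and abs(label_time_ms - fix_time_ms) <= MAX_LABEL_LAG_MS)
```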
+
+## Document Dependencies
+
+- `_docs/02_document/tests/blackbox-tests.md` § FT-P-03, § FT-P-14
+- `_docs/02_document/tests/test-data.md` § Position accuracy (FT-P-03 row)