gps-denied-onboard/_docs/03_implementation/batch_84_report.md
Oleksandr Bezdieniezhnykh f25cae4a82 [AZ-423] [AZ-427] Add FT-P-19 + FT-N-05 blackbox tests
Implement the AC-8.6 (top-K=10 retrieval scale-ratio + scene-change
PARTIAL) and AC-8.2 / AC-NEW-6 (stale aged-tile rejection) blackbox
scenarios.

AZ-423 (FT-P-19, 3pt) helpers + scenario:
- retrieval_evaluator.py — top-K within-distance evaluator (60 stills
  vs 100 m budget), scene-change PARTIAL recorder (always emits
  PARTIAL on the 2 _gmaps.png pairs), FDR record projectors, CSV
  writers.
- tests/positive/test_ft_p_19_sat_reloc_scale.py (6 parametrised
  variants).

AZ-427 (FT-N-05, 2pt) helpers + scenario:
- aged_tile_rejection_evaluator.py — Signal A (stale rejection at
  load) + Signal B (per-frame downgrade) decision matrix, reuses
  ALLOWED_SOURCE_LABELS from estimate_schema.
- tests/negative/test_ft_n_05_stale_tile_rejection.py (12 parametrised
  variants: FC × VIO × {7mo/active-conflict, 13mo/rear}).

48 new unit tests cover every helper branch. Both scenarios skip
when sitl_replay_ready is false and fail loudly when fixture records
are missing.

Per-batch review: PASS_WITH_WARNINGS (2 Low — production-dependency
surface, FDR-kind constant duplication).
Cumulative review 82-84: PASS (2 Low carry-over / hygiene candidate).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 15:43:06 +03:00


Batch 84 — AZ-423 + AZ-427 (FT-P-19 sat-relocalization + FT-N-05 stale-tile rejection)

Tracker: AZ-423, AZ-427
Tasks: 2 tasks / 5 complexity points (3 + 2)
Date: 2026-05-17
Verdict: PASS_WITH_WARNINGS
Review: _docs/03_implementation/reviews/batch_84_review.md

Scope

  • AZ-423 / FT-P-19 (AC-8.6): for each of the 60 stills, the top-K=10 retrieval result includes at least one tile centre within 100 m of the image's true centre; the scene-change subset (2 paired _gmaps.png images) carries a PARTIAL annotation.
  • AZ-427 / FT-N-05 (AC-8.2, AC-NEW-6): aged-tile fixtures (synth-age-7mo in active-conflict sector, synth-age-13mo in rear sector) produce zero satellite_anchored emissions — either by load-time stale rejection (Signal A) or per-frame downgrade (Signal B).
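The AZ-423 check above amounts to a distance test over the first K retrieval candidates. A minimal sketch of that idea follows, assuming a haversine great-circle distance and a spherical Earth radius; the names `Candidate`, `haversine_m` and `top_k_hit` are hypothetical and are not the signatures of the real `evaluate_top_k_within_distance` helper, which this report does not show.

```python
import math
from dataclasses import dataclass
from typing import Sequence

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean spherical Earth radius


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of one retrieval candidate (fields named as in the FDR payload)."""
    tile_id: str
    centre_lat_deg: float
    centre_lon_deg: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def top_k_hit(true_lat: float, true_lon: float,
              candidates: Sequence[Candidate],
              k: int = 10, budget_m: float = 100.0) -> bool:
    """True if any of the first k candidates lies within budget_m of the true centre."""
    return any(
        haversine_m(true_lat, true_lon, c.centre_lat_deg, c.centre_lon_deg) <= budget_m
        for c in candidates[:k]
    )
```

Under this reading, a still passes only if the hit occurs within the first K ranked candidates; a close tile at rank K+1 does not count.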

Files

Created

  • e2e/runner/helpers/retrieval_evaluator.py — pure-logic evaluators for AC-1 (top-K within distance) + AC-2 (scene-change PARTIAL); FDR payload projectors for retrieval-topk + scene-change-match records; CSV emitters for evidence.
  • e2e/runner/helpers/aged_tile_rejection_evaluator.py — pure-logic evaluator with explicit Signal-A / Signal-B decision matrix; FDR projector for tile-load-rejected: stale records; sector-binding constants (synth-age-7mo → active_conflict, synth-age-13mo → rear).
  • e2e/tests/positive/test_ft_p_19_sat_reloc_scale.py — FT-P-19 scenario.
  • e2e/tests/negative/test_ft_n_05_stale_tile_rejection.py — FT-N-05 scenario (parametrised across 2 sub-cases via AGED_FIXTURE_SECTOR_BINDINGS).
  • e2e/_unit_tests/helpers/test_retrieval_evaluator.py — 29 unit tests.
  • e2e/_unit_tests/helpers/test_aged_tile_rejection_evaluator.py — 19 unit tests.
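As an illustration of the "CSV emitters for evidence" mentioned for retrieval_evaluator.py, the sketch below writes per-image rows with the standard library `csv` module. The column names and the `write_evidence_csv` function are assumptions made for this example; the report does not specify the real evidence schema.

```python
import csv
from typing import Iterable, Mapping, TextIO

# Hypothetical column set; the real evidence schema is not shown in this report.
FIELDNAMES = ("image_id", "hit", "min_distance_m")


def write_evidence_csv(rows: Iterable[Mapping[str, object]], out: TextIO) -> int:
    """Write a header plus one line per row to `out`; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        # Select only the declared columns so extra keys never leak into the file.
        writer.writerow({name: row[name] for name in FIELDNAMES})
        count += 1
    return count
```

Writing to any text stream (rather than opening a path inside the function) keeps the emitter pure enough to unit-test with `io.StringIO`.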

Modified

  • e2e/_unit_tests/test_directory_layout.py — registered 4 new paths.

Test Results

$ pytest _unit_tests/helpers/test_retrieval_evaluator.py \
         _unit_tests/helpers/test_aged_tile_rejection_evaluator.py \
         _unit_tests/test_directory_layout.py
============================= 157 passed in 0.28s ==============================

Scenario collection:

$ pytest tests/positive/test_ft_p_19_sat_reloc_scale.py \
         tests/negative/test_ft_n_05_stale_tile_rejection.py --collect-only
collected 18 items
  test_ft_p_19_sat_reloc_scale: 6 cases (ardupilot/inav × {okvis2, klt_ransac, vins_mono})
  test_ft_n_05_stale_tile_rejection: 12 cases (FC × VIO × 2 aged-tile sub-cases)
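The collected counts follow from the Cartesian product of the parametrisation axes. The sketch below reproduces the arithmetic with `itertools.product`; it assumes FT-N-05 uses the same FC and VIO values listed for FT-P-19, and the constant names are illustrative rather than taken from conftest.

```python
from itertools import product

# Axis values as listed in the collection summary above.
FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
# Fixture names from the Scope section; assumed to be the 2 FT-N-05 sub-cases.
AGED_SUB_CASES = ("synth-age-7mo", "synth-age-13mo")

# FT-P-19: FC x VIO
ft_p_19_variants = list(product(FC_ADAPTERS, VIO_STRATEGIES))
# FT-N-05: FC x VIO x aged-tile sub-case
ft_n_05_variants = list(product(FC_ADAPTERS, VIO_STRATEGIES, AGED_SUB_CASES))

total_collected = len(ft_p_19_variants) + len(ft_n_05_variants)
```

This gives 2 × 3 = 6 variants for FT-P-19 and 2 × 3 × 2 = 12 for FT-N-05, matching the 18 collected items.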

AC Verification

AZ-423 / FT-P-19

| AC | Coverage |
|----|----------|
| AC-1 top-K=10 within 100 m for all 60 images | evaluate_top_k_within_distance + scenario assertion + 9 unit tests |
| AC-2 scene-change subset PARTIAL | evaluate_scene_change_subset + scenario assertion + 5 unit tests |
| AC-3 parameterisation | 6 collected variants via conftest fc_adapter / vio_strategy fixtures |

AZ-427 / FT-N-05

| AC | Coverage |
|----|----------|
| AC-1 7mo aged tiles → 0 satellite_anchored emissions | parametrised sub-case + evaluate_aged_tile_rejection + 11 unit tests |
| AC-2 13mo aged tiles → 0 satellite_anchored emissions | same helper + sub-case binding synth-age-13mo → rear |
| AC-3 parameterisation | 12 collected variants (FC × VIO × 2 sub-cases) |
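The Signal A / Signal B logic behind AC-1 and AC-2 can be pictured as a small decision function. This is a sketch under stated assumptions, not the real `evaluate_aged_tile_rejection`: the `Verdict` enum, the function signature, and the choice to fail when there is no evidence at all are hypothetical. Only the label names come from this report.

```python
from enum import Enum
from typing import Sequence

FORBIDDEN_LABEL = "satellite_anchored"
# Labels that count as a per-frame downgrade (Signal B), as listed under Production Dependencies.
DOWNGRADED_LABELS = frozenset({"visual_propagated", "dead_reckoned"})


class Verdict(Enum):
    PASS_SIGNAL_A = "pass_signal_a"  # stale rejection observed at load time
    PASS_SIGNAL_B = "pass_signal_b"  # every emission was downgraded
    FAIL = "fail"


def evaluate(stale_rejection_count: int, source_labels: Sequence[str]) -> Verdict:
    """Classify one run from its stale-rejection count and per-frame source labels."""
    # Any satellite_anchored emission violates the AC regardless of other signals.
    if FORBIDDEN_LABEL in source_labels:
        return Verdict.FAIL
    if stale_rejection_count > 0:
        return Verdict.PASS_SIGNAL_A
    # Assumption: with no rejection event, at least one downgraded emission is required.
    if source_labels and all(label in DOWNGRADED_LABELS for label in source_labels):
        return Verdict.PASS_SIGNAL_B
    return Verdict.FAIL
```

Checking the forbidden label first means a run that logs a stale rejection but still emits `satellite_anchored` is reported as a failure rather than a Signal A pass.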

traces_to markers:

  • FT-P-19: AC-8.6,AC-1,AC-2,AC-3
  • FT-N-05: AC-8.2,AC-NEW-6,AC-1,AC-2,AC-3

Code Review

Verdict: PASS_WITH_WARNINGS — 0 Critical, 0 High, 2 Low.

  • F1 (production-dependency surface): both scenarios depend on upstream SUT features (retrieval-topk / scene-change-match FDR record kinds for AZ-423; tile-load-rejected: stale events OR per-frame downgrades for AZ-427) and on fixture-builder support (synth-age-* mounts + E2E_FT_N_05_FIXTURE env var). Tests skip cleanly when fixtures are missing and fail loudly when fixtures exist but the expected records are missing.
  • F2 (maintainability — intra-batch duplication): TILE_LOAD_REJECTED_FDR_KIND and its stale-reason constant now exist in both aged_tile_rejection_evaluator.py and mid_flight_tile_evaluator.py. Future hygiene PBI candidate.
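The skip-versus-fail behaviour described in F1 can be summarised as a three-way gate. The sketch below is a pure-logic illustration with hypothetical names (`Gate`, `scenario_gate`); it does not reproduce the actual scenario code.

```python
from enum import Enum


class Gate(Enum):
    SKIP = "skip"  # prerequisites absent: nothing to assert against
    FAIL = "fail"  # prerequisites present but the SUT emitted no expected records
    RUN = "run"    # prerequisites and records present: evaluate the ACs


def scenario_gate(sitl_replay_ready: bool, fixture_present: bool, record_count: int) -> Gate:
    """Decide whether a blackbox scenario should skip, fail loudly, or run its assertions."""
    if not sitl_replay_ready or not fixture_present:
        return Gate.SKIP
    if record_count == 0:
        return Gate.FAIL
    return Gate.RUN
```

In a pytest scenario, `Gate.SKIP` would typically map to `pytest.skip(...)` and `Gate.FAIL` to `pytest.fail(...)`, so a missing upstream feature shows up as a red test rather than a silent pass.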

Full review: _docs/03_implementation/reviews/batch_84_review.md.

Production Dependencies

Surfaced for the cumulative review window (82-84) + traceability matrix:

  1. AZ-423 SUT-side: emit FDR retrieval-topk record kind per pushed image, carrying image_id + candidates list of {tile_id, centre_lat_deg, centre_lon_deg}.
  2. AZ-423 SUT-side: emit FDR scene-change-match record kind for each paired _gmaps.png image, carrying image_id + matched (bool) + optional inlier_count (int).
  3. AZ-427 SUT-side: wire the C6 freshness gate's sector-aware date comparison (active-conflict vs rear thresholds per AC-8.2) and emit FDR tile-load-rejected: stale events at startup OR ensure every emission downgrades to {visual_propagated, dead_reckoned}.
  4. Fixture-builder-side: snapshot/mount of the synth-age-7mo + synth-age-13mo tile sets, plus per-sub-case run orchestration that publishes the E2E_FT_N_05_FIXTURE env var declaring which fixture is currently active.
  5. Already exists: still-image-set-60 + coordinates.csv (AZ-407 deliverable) + still-image-sat-refs-2 (AZ-407).
  6. Already exists: sitl_replay_ready fixture, fc_adapter / vio_strategy parameterisation, evidence_dir + nfr_recorder + run_id (AZ-406).
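The payload fields named in items 1 and 2 suggest typed projections along the following lines. This is a sketch only: the record envelope (a mapping with `kind` and `payload` keys) and all class and function names are assumptions; only the record kinds and payload field names come from the list above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple

RETRIEVAL_TOPK_KIND = "retrieval-topk"
SCENE_CHANGE_MATCH_KIND = "scene-change-match"


@dataclass(frozen=True)
class TileCandidate:
    tile_id: str
    centre_lat_deg: float
    centre_lon_deg: float


@dataclass(frozen=True)
class RetrievalTopK:
    image_id: str
    candidates: Tuple[TileCandidate, ...]


@dataclass(frozen=True)
class SceneChangeMatch:
    image_id: str
    matched: bool
    inlier_count: Optional[int] = None


def project_retrieval_topk(record: Mapping[str, Any]) -> Optional[RetrievalTopK]:
    """Return a typed view of a retrieval-topk record, or None for other kinds."""
    if record.get("kind") != RETRIEVAL_TOPK_KIND:  # envelope key is an assumption
        return None
    payload = record["payload"]
    candidates = tuple(
        TileCandidate(str(c["tile_id"]), float(c["centre_lat_deg"]), float(c["centre_lon_deg"]))
        for c in payload["candidates"]
    )
    return RetrievalTopK(str(payload["image_id"]), candidates)


def project_scene_change_match(record: Mapping[str, Any]) -> Optional[SceneChangeMatch]:
    """Return a typed view of a scene-change-match record, or None for other kinds."""
    if record.get("kind") != SCENE_CHANGE_MATCH_KIND:
        return None
    payload = record["payload"]
    matched = payload["matched"]
    if not isinstance(matched, bool):
        raise ValueError("matched must be a bool")
    inlier_count = payload.get("inlier_count")  # optional field
    if inlier_count is not None and not isinstance(inlier_count, int):
        raise ValueError("inlier_count must be an int when present")
    return SceneChangeMatch(str(payload["image_id"]), matched, inlier_count)
```

Returning `None` for non-matching kinds lets a caller filter a mixed FDR stream with a single pass, while malformed payloads of the right kind raise instead of being silently dropped.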

Architecture Compliance

  • All new files under e2e/, owned by the Blackbox Tests component per _docs/02_document/module-layout.md.
  • No imports from src/gps_denied_onboard (explicit public-boundary discipline notes in both helpers).
  • No new cyclic dependencies. Both helpers depend on .geo and (for aged_tile_rejection_evaluator) .estimate_schema only.
  • No new infrastructure libraries.

Sub-step Trace

Phases executed per implement/SKILL.md:

  • phase 5 (load-spec) → AZ-423 + AZ-427 specs read
  • phase 6 (implement-tasks-sequentially) → helpers + scenarios + unit tests
  • phase 7 (verify-ac-coverage) → ACs traced above
  • phase 8 (code-review) → batch_84_review.md (PASS_WITH_WARNINGS)
  • phase 8.5 (cumulative-review) → batches_82-84 (next: K=3 cumulative trigger)
  • phase 11 (commit-batch) → after cumulative review.