# Batch 84 — AZ-423 + AZ-427 (FT-P-19 sat-relocalization + FT-N-05 stale-tile rejection)

**Tracker**: AZ-423, AZ-427
**Tasks**: 2 tasks / 5 complexity points (3 + 2)
**Date**: 2026-05-17
**Verdict**: PASS_WITH_WARNINGS
**Review**: `_docs/03_implementation/reviews/batch_84_review.md`

## Scope

- **AZ-423 / FT-P-19 (AC-8.6)**: per-image top-K=10 retrieval includes a tile centre within 100 m of the image's true centre across all 60 stills; scene-change subset (2 paired `_gmaps.png` images) carries PARTIAL annotation.
- **AZ-427 / FT-N-05 (AC-8.2, AC-NEW-6)**: aged-tile fixtures (`synth-age-7mo` in active-conflict sector, `synth-age-13mo` in rear sector) produce zero `satellite_anchored` emissions — either by load-time stale rejection (Signal A) or per-frame downgrade (Signal B).

## Files

### Created

- `e2e/runner/helpers/retrieval_evaluator.py` — pure-logic evaluators for AC-1 (top-K within distance) + AC-2 (scene-change PARTIAL); FDR payload projectors for `retrieval-topk` + `scene-change-match` records; CSV emitters for evidence.
- `e2e/runner/helpers/aged_tile_rejection_evaluator.py` — pure-logic evaluator with explicit Signal-A / Signal-B decision matrix; FDR projector for `tile-load-rejected: stale` records; sector-binding constants (`synth-age-7mo → active_conflict`, `synth-age-13mo → rear`).
- `e2e/tests/positive/test_ft_p_19_sat_reloc_scale.py` — FT-P-19 scenario.
- `e2e/tests/negative/test_ft_n_05_stale_tile_rejection.py` — FT-N-05 scenario (parametrised across 2 sub-cases via `AGED_FIXTURE_SECTOR_BINDINGS`).
- `e2e/_unit_tests/helpers/test_retrieval_evaluator.py` — 29 unit tests.
- `e2e/_unit_tests/helpers/test_aged_tile_rejection_evaluator.py` — 19 unit tests.

### Modified

- `e2e/_unit_tests/test_directory_layout.py` — registered 4 new paths.
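
For orientation, the AZ-423 AC-1 check listed under Scope (top-K=10 retrieval containing a tile centre within 100 m of the true image centre) might be sketched as below. This is a minimal illustration and not the code in `retrieval_evaluator.py`: the function names, the `Candidate` dataclass, the input shapes, and the choice of the haversine formula with a mean Earth radius are all assumptions. Only the candidate field names (`tile_id`, `centre_lat_deg`, `centre_lon_deg`) come from this report.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: mean Earth radius in metres; the real helper may use a different geodesy model.
EARTH_RADIUS_M = 6_371_008.8


@dataclass(frozen=True)
class Candidate:
    """One retrieved tile, mirroring the candidate fields named in this report."""
    tile_id: str
    centre_lat_deg: float
    centre_lon_deg: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def images_missing_near_tile(
    truth: Dict[str, Tuple[float, float]],
    retrieved: Dict[str, List[Candidate]],
    k: int = 10,
    max_distance_m: float = 100.0,
) -> List[str]:
    """Return image ids whose top-k candidates contain no tile centre within range.

    Hypothetical helper: an image with no retrieval record at all counts as a failure.
    """
    failures = []
    for image_id, (lat, lon) in truth.items():
        top_k = retrieved.get(image_id, [])[:k]
        hit = any(
            haversine_m(lat, lon, c.centre_lat_deg, c.centre_lon_deg) <= max_distance_m
            for c in top_k
        )
        if not hit:
            failures.append(image_id)
    return failures
```

Under these assumptions, AC-1 holds for a run when `images_missing_near_tile(truth, retrieved)` returns an empty list for all 60 stills. Returning the failing ids rather than a bare boolean keeps the assertion message useful when a scenario fails.
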
## Test Results

```
$ pytest _unit_tests/helpers/test_retrieval_evaluator.py \
    _unit_tests/helpers/test_aged_tile_rejection_evaluator.py \
    _unit_tests/test_directory_layout.py
============================= 157 passed in 0.28s ==============================
```

Scenario collection:

```
$ pytest tests/positive/test_ft_p_19_sat_reloc_scale.py \
    tests/negative/test_ft_n_05_stale_tile_rejection.py --collect-only
collected 18 items
  test_ft_p_19_sat_reloc_scale: 6 cases (ardupilot/inav × {okvis2, klt_ransac, vins_mono})
  test_ft_n_05_stale_tile_rejection: 12 cases (FC × VIO × 2 aged-tile sub-cases)
```

## AC Verification

### AZ-423 / FT-P-19

| AC | Coverage |
|----|----------|
| AC-1 top-K=10 within 100 m for all 60 images | `evaluate_top_k_within_distance` + scenario assertion + 9 unit tests |
| AC-2 scene-change subset PARTIAL | `evaluate_scene_change_subset` + scenario assertion + 5 unit tests |
| AC-3 parameterisation | 6 collected variants via conftest `fc_adapter` / `vio_strategy` fixtures |

### AZ-427 / FT-N-05

| AC | Coverage |
|----|----------|
| AC-1 7mo aged tiles → 0 `satellite_anchored` emissions | parametrised sub-case + `evaluate_aged_tile_rejection` + 11 unit tests |
| AC-2 13mo aged tiles → 0 `satellite_anchored` emissions | same helper + sub-case binding `synth-age-13mo → rear` |
| AC-3 parameterisation | 12 collected variants (FC × VIO × 2 sub-cases) |

`traces_to` markers:

- FT-P-19: `AC-8.6,AC-1,AC-2,AC-3`
- FT-N-05: `AC-8.2,AC-NEW-6,AC-1,AC-2,AC-3`

## Code Review

**Verdict**: PASS_WITH_WARNINGS — 0 Critical, 0 High, 2 Low.

- **F1 (production-dependency surface)**: both scenarios depend on upstream SUT features (`retrieval-topk` / `scene-change-match` FDR record kinds for AZ-423; `tile-load-rejected: stale` events OR per-frame downgrades for AZ-427) and fixture-builder support (`synth-age-*` mounts + `E2E_FT_N_05_FIXTURE` env var). Tests skip cleanly when fixtures are missing and fail loudly when fixtures exist but records are missing.
- **F2 (maintainability — intra-batch duplication)**: `TILE_LOAD_REJECTED_FDR_KIND` and its stale-reason constant now exist in both `aged_tile_rejection_evaluator.py` and `mid_flight_tile_evaluator.py`. Future hygiene PBI candidate.

Full review: `_docs/03_implementation/reviews/batch_84_review.md`.

## Production Dependencies

Surfaced for the cumulative review window (82-84) + traceability matrix:

1. **AZ-423 SUT-side**: emit FDR `retrieval-topk` record kind per pushed image, carrying `image_id` + `candidates` list of `{tile_id, centre_lat_deg, centre_lon_deg}`.
2. **AZ-423 SUT-side**: emit FDR `scene-change-match` record kind for each paired `_gmaps.png` image, carrying `image_id` + `matched` (bool) + optional `inlier_count` (int).
3. **AZ-427 SUT-side**: wire the C6 freshness gate's sector-aware date comparison (active-conflict vs rear thresholds per AC-8.2) and emit FDR `tile-load-rejected: stale` events at startup OR ensure every emission downgrades to `{visual_propagated, dead_reckoned}`.
4. **Fixture-builder-side**: snapshot/mount of `synth-age-7mo` + `synth-age-13mo` tile sets, run-per-sub-case orchestration that publishes `E2E_FT_N_05_FIXTURE` env var declaring which fixture is currently active.
5. **Already exists**: `still-image-set-60` + `coordinates.csv` (AZ-407 deliverable) + `still-image-sat-refs-2` (AZ-407).
6. **Already exists**: `sitl_replay_ready` fixture, `fc_adapter` / `vio_strategy` parameterisation, `evidence_dir` + `nfr_recorder` + `run_id` (AZ-406).

## Architecture Compliance

- All new files under `e2e/`, owned by the Blackbox Tests component per `_docs/02_document/module-layout.md`.
- No imports from `src/gps_denied_onboard` (explicit public-boundary discipline notes in both helpers).
- No new cyclic dependencies. Both helpers depend on `.geo` and (for `aged_tile_rejection_evaluator`) `.estimate_schema` only.
- No new infrastructure libraries.
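
Production dependency 3 accepts either of two signals for AZ-427: stale rejection at load time (Signal A) or a per-frame downgrade of every emission (Signal B). A decision matrix over those signals could look roughly like the sketch below. It is not the code in `aged_tile_rejection_evaluator.py`: the `AgedTileVerdict` type, the function name and signature, the precedence of Signal A when both signals are present, and the treatment of a run with no rejections and no emissions as a failure are all assumptions made for illustration. The label strings come from this report.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Labels named in this report; treating them as plain strings is an assumption.
ANCHORED_LABEL = "satellite_anchored"
DOWNGRADED_LABELS = frozenset({"visual_propagated", "dead_reckoned"})


@dataclass(frozen=True)
class AgedTileVerdict:
    """Hypothetical result type: pass/fail plus which signal justified a pass."""
    passed: bool
    signal: Optional[str]  # "A", "B", or None on failure
    reason: str


def decide_aged_tile_rejection(
    stale_rejection_count: int,
    emission_labels: Iterable[str],
) -> AgedTileVerdict:
    """Apply a Signal-A / Signal-B decision matrix to one aged-tile run."""
    labels = list(emission_labels)
    anchored = sum(1 for label in labels if label == ANCHORED_LABEL)

    # Any satellite_anchored emission fails the run regardless of other signals.
    if anchored:
        return AgedTileVerdict(False, None, f"{anchored} {ANCHORED_LABEL} emission(s)")

    # Signal A: the SUT rejected stale tiles at load time.
    if stale_rejection_count > 0:
        return AgedTileVerdict(True, "A", "stale tiles rejected at load time")

    # Signal B: every observed emission was downgraded.
    if labels and all(label in DOWNGRADED_LABELS for label in labels):
        return AgedTileVerdict(True, "B", "all emissions downgraded per frame")

    # Assumption: neither signal observed is a failure rather than a vacuous pass.
    return AgedTileVerdict(False, None, "neither Signal A nor Signal B observed")
```

Failing when neither signal is observed matches the behaviour noted under F1, where tests fail loudly if fixtures exist but the expected records are missing.
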
## Sub-step Trace

Phases executed per `implement/SKILL.md`:

- phase 5 (load-spec) → AZ-423 + AZ-427 specs read
- phase 6 (implement-tasks-sequentially) → helpers + scenarios + unit tests
- phase 7 (verify-ac-coverage) → ACs traced above
- phase 8 (code-review) → batch_84_review.md (PASS_WITH_WARNINGS)
- phase 8.5 (cumulative-review) → batches_82-84 (next: K=3 cumulative trigger)
- phase 11 (commit-batch) → after cumulative review.