mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 16:11:13 +00:00
f25cae4a82
Implement the AC-8.6 (top-K=10 retrieval scale-ratio + scene-change
PARTIAL) and AC-8.2 / AC-NEW-6 (stale aged-tile rejection) blackbox
scenarios.
AZ-423 (FT-P-19, 3pt) helpers + scenario:
- retrieval_evaluator.py — top-K within-distance evaluator (60 stills
vs 100 m budget), scene-change PARTIAL recorder (always emits
PARTIAL on the 2 _gmaps.png pairs), FDR record projectors, CSV
writers.
- tests/positive/test_ft_p_19_sat_reloc_scale.py (6 parametrised
variants).
AZ-427 (FT-N-05, 2pt) helpers + scenario:
- aged_tile_rejection_evaluator.py — Signal A (stale rejection at
load) + Signal B (per-frame downgrade) decision matrix, reuses
ALLOWED_SOURCE_LABELS from estimate_schema.
- tests/negative/test_ft_n_05_stale_tile_rejection.py (12 parametrised
variants: FC × VIO × {7mo/active-conflict, 13mo/rear}).
48 new unit tests cover every helper branch. Both scenarios skip
when sitl_replay_ready is false and fail loudly when fixture records
are missing.
Per-batch review: PASS_WITH_WARNINGS (2 Low — production-dependency
surface, FDR-kind constant duplication).
Cumulative review 82-84: PASS (2 Low carry-over / hygiene candidate).
Co-authored-by: Cursor <cursoragent@cursor.com>
6.1 KiB
Batch 84 — AZ-423 + AZ-427 (FT-P-19 sat-relocalization + FT-N-05 stale-tile rejection)
Tracker: AZ-423, AZ-427
Tasks: 2 tasks / 5 complexity points (3 + 2)
Date: 2026-05-17
Verdict: PASS_WITH_WARNINGS
Review: _docs/03_implementation/reviews/batch_84_review.md
Scope
- AZ-423 / FT-P-19 (AC-8.6): per-image top-K=10 retrieval includes a tile centre within 100 m of the image's true centre across all 60 stills; scene-change subset (2 paired `_gmaps.png` images) carries PARTIAL annotation.
- AZ-427 / FT-N-05 (AC-8.2, AC-NEW-6): aged-tile fixtures (`synth-age-7mo` in active-conflict sector, `synth-age-13mo` in rear sector) produce zero `satellite_anchored` emissions, either by load-time stale rejection (Signal A) or per-frame downgrade (Signal B).
Files
Created
- `e2e/runner/helpers/retrieval_evaluator.py`: pure-logic evaluators for AC-1 (top-K within distance) + AC-2 (scene-change PARTIAL); FDR payload projectors for `retrieval-topk` + `scene-change-match` records; CSV emitters for evidence.
- `e2e/runner/helpers/aged_tile_rejection_evaluator.py`: pure-logic evaluator with explicit Signal-A / Signal-B decision matrix; FDR projector for `tile-load-rejected: stale` records; sector-binding constants (`synth-age-7mo → active_conflict`, `synth-age-13mo → rear`).
- `e2e/tests/positive/test_ft_p_19_sat_reloc_scale.py`: FT-P-19 scenario.
- `e2e/tests/negative/test_ft_n_05_stale_tile_rejection.py`: FT-N-05 scenario (parametrised across 2 sub-cases via `AGED_FIXTURE_SECTOR_BINDINGS`).
- `e2e/_unit_tests/helpers/test_retrieval_evaluator.py`: 29 unit tests.
- `e2e/_unit_tests/helpers/test_aged_tile_rejection_evaluator.py`: 19 unit tests.
Modified
- `e2e/_unit_tests/test_directory_layout.py`: registered 4 new paths.
Test Results
$ pytest _unit_tests/helpers/test_retrieval_evaluator.py \
_unit_tests/helpers/test_aged_tile_rejection_evaluator.py \
_unit_tests/test_directory_layout.py
============================= 157 passed in 0.28s ==============================
Scenario collection:
$ pytest tests/positive/test_ft_p_19_sat_reloc_scale.py \
tests/negative/test_ft_n_05_stale_tile_rejection.py --collect-only
collected 18 items
test_ft_p_19_sat_reloc_scale: 6 cases (ardupilot/inav × {okvis2, klt_ransac, vins_mono})
test_ft_n_05_stale_tile_rejection: 12 cases (FC × VIO × 2 aged-tile sub-cases)
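The collected counts follow from a plain cross product of the parametrisation axes. A minimal sketch of that arithmetic, assuming the adapter and strategy names from the collection summary; the tuple and function names are illustrative and not taken from the repository's conftest:

```python
from itertools import product

# Axis values as listed in the collection summary; constant names are hypothetical.
FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
AGED_SUB_CASES = ("synth-age-7mo", "synth-age-13mo")


def ft_p_19_cases():
    """FC x VIO matrix for the positive scenario."""
    return list(product(FC_ADAPTERS, VIO_STRATEGIES))


def ft_n_05_cases():
    """FC x VIO x aged-tile sub-case matrix for the negative scenario."""
    return list(product(FC_ADAPTERS, VIO_STRATEGIES, AGED_SUB_CASES))


if __name__ == "__main__":
    print(len(ft_p_19_cases()), len(ft_n_05_cases()))
```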
AC Verification
AZ-423 / FT-P-19
| AC | Coverage |
|---|---|
| AC-1 top-K=10 within 100 m for all 60 images | evaluate_top_k_within_distance + scenario assertion + 9 unit tests |
| AC-2 scene-change subset PARTIAL | evaluate_scene_change_subset + scenario assertion + 5 unit tests |
| AC-3 parameterisation | 6 collected variants via conftest fc_adapter / vio_strategy fixtures |
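The AC-1 check reduces to a great-circle distance test over the first K candidates. The sketch below is illustrative only: `Candidate`, `haversine_m` and `top_k_hit` are hypothetical names, the haversine formula and the mean Earth radius are assumptions of this sketch, and the real `evaluate_top_k_within_distance` signature is not shown in this report.

```python
import math
from dataclasses import dataclass
from typing import Sequence

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


@dataclass(frozen=True)
class Candidate:
    tile_id: str
    centre_lat_deg: float
    centre_lon_deg: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS-84 lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def top_k_hit(true_lat: float, true_lon: float, candidates: Sequence[Candidate],
              k: int = 10, budget_m: float = 100.0) -> bool:
    """True if any of the first k candidates lies within budget_m of the true centre."""
    return any(
        haversine_m(true_lat, true_lon, c.centre_lat_deg, c.centre_lon_deg) <= budget_m
        for c in candidates[:k]
    )
```

Under this reading, the scenario passes AC-1 only when `top_k_hit` holds for every one of the 60 stills.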
AZ-427 / FT-N-05
| AC | Coverage |
|---|---|
| AC-1 7mo aged tiles → 0 satellite_anchored emissions | parametrised sub-case + evaluate_aged_tile_rejection + 11 unit tests |
| AC-2 13mo aged tiles → 0 satellite_anchored emissions | same helper + sub-case binding synth-age-13mo → rear |
| AC-3 parameterisation | 12 collected variants (FC × VIO × 2 sub-cases) |
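One way to read the Signal-A / Signal-B decision is sketched below. It is not the helper's actual code: `AgedTileVerdict` and `evaluate_signals` are hypothetical names, and the precedence rules are assumptions of this sketch (any `satellite_anchored` emission fails outright, Signal A is checked before Signal B, and a run with no evidence at all fails).

```python
from dataclasses import dataclass
from typing import Iterable

ANCHORED = "satellite_anchored"
# Labels that count as a per-frame downgrade (from the Production Dependencies list).
DOWNGRADED_LABELS = frozenset({"visual_propagated", "dead_reckoned"})


@dataclass(frozen=True)
class AgedTileVerdict:
    passed: bool
    signal: str  # "A", "B", or "none"
    anchored_count: int


def evaluate_signals(stale_rejections: int, source_labels: Iterable[str]) -> AgedTileVerdict:
    """Classify a run from its stale-rejection count and per-frame source labels."""
    labels = list(source_labels)
    anchored = sum(1 for label in labels if label == ANCHORED)
    if anchored > 0:
        # Any anchored emission violates the zero-emission requirement.
        return AgedTileVerdict(False, "none", anchored)
    if stale_rejections > 0:
        # Signal A: tiles were rejected as stale at load time.
        return AgedTileVerdict(True, "A", 0)
    if labels and all(label in DOWNGRADED_LABELS for label in labels):
        # Signal B: every emitted frame was downgraded.
        return AgedTileVerdict(True, "B", 0)
    # Assumption: no rejection records and no usable emissions is not a pass.
    return AgedTileVerdict(False, "none", 0)
```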
traces_to markers:
- FT-P-19: `AC-8.6`, `AC-1`, `AC-2`, `AC-3`
- FT-N-05: `AC-8.2`, `AC-NEW-6`, `AC-1`, `AC-2`, `AC-3`
Code Review
Verdict: PASS_WITH_WARNINGS — 0 Critical, 0 High, 2 Low.
- F1 (production-dependency surface): both scenarios depend on upstream SUT features (`retrieval-topk` / `scene-change-match` FDR record kinds for AZ-423; `tile-load-rejected: stale` events OR per-frame downgrades for AZ-427) and fixture-builder support (`synth-age-*` mounts + `E2E_FT_N_05_FIXTURE` env var). Tests skip cleanly when fixtures are missing and fail loudly when fixtures exist but records are missing.
- F2 (maintainability, intra-batch duplication): `TILE_LOAD_REJECTED_FDR_KIND` and its stale-reason constant now exist in both `aged_tile_rejection_evaluator.py` and `mid_flight_tile_evaluator.py`. Future hygiene PBI candidate.
Full review: _docs/03_implementation/reviews/batch_84_review.md.
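The skip-versus-fail behaviour noted in F1 can be pictured as a three-way gate. This is a sketch under stated assumptions: the `Gate` enum and `scenario_gate` function are hypothetical, and the scenarios' real control flow is not shown in this report.

```python
from enum import Enum


class Gate(Enum):
    SKIP = "skip"  # environment not ready: nothing meaningful to assert
    FAIL = "fail"  # fixture is there but the SUT emitted no records
    RUN = "run"    # proceed to the evaluator


def scenario_gate(sitl_replay_ready: bool, fixture_present: bool, record_count: int) -> Gate:
    """Decide whether a scenario should skip, fail loudly, or run its assertions."""
    if not sitl_replay_ready or not fixture_present:
        return Gate.SKIP
    if record_count == 0:
        return Gate.FAIL
    return Gate.RUN
```

In a pytest scenario, `Gate.SKIP` would typically map to `pytest.skip(...)` and `Gate.FAIL` to `pytest.fail(...)`, so a missing environment never masks a missing SUT feature.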
Production Dependencies
Surfaced for the cumulative review window (82-84) + traceability matrix:
- AZ-423 SUT-side: emit FDR `retrieval-topk` record kind per pushed image, carrying `image_id` + `candidates` list of `{tile_id, centre_lat_deg, centre_lon_deg}`.
- AZ-423 SUT-side: emit FDR `scene-change-match` record kind for each paired `_gmaps.png` image, carrying `image_id` + `matched` (bool) + optional `inlier_count` (int).
- AZ-427 SUT-side: wire the C6 freshness gate's sector-aware date comparison (active-conflict vs rear thresholds per AC-8.2) and emit FDR `tile-load-rejected: stale` events at startup OR ensure every emission downgrades to `{visual_propagated, dead_reckoned}`.
- Fixture-builder-side: snapshot/mount of `synth-age-7mo` + `synth-age-13mo` tile sets, run-per-sub-case orchestration that publishes `E2E_FT_N_05_FIXTURE` env var declaring which fixture is currently active.
- Already exists: `still-image-set-60` + `coordinates.csv` (AZ-407 deliverable) + `still-image-sat-refs-2` (AZ-407).
- Already exists: `sitl_replay_ready` fixture, `fc_adapter` / `vio_strategy` parameterisation, `evidence_dir` + `nfr_recorder` + `run_id` (AZ-406).
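The two AZ-423 record payloads listed above could be projected into typed records along these lines. The field names come from the list; the dataclass and function names are hypothetical, and the strictness of the type checks is an assumption of this sketch rather than the helper's documented behaviour.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class TileCandidate:
    tile_id: str
    centre_lat_deg: float
    centre_lon_deg: float


@dataclass(frozen=True)
class RetrievalTopK:
    image_id: str
    candidates: Tuple[TileCandidate, ...]


@dataclass(frozen=True)
class SceneChangeMatch:
    image_id: str
    matched: bool
    inlier_count: Optional[int] = None


def project_retrieval_topk(payload: Mapping[str, Any]) -> RetrievalTopK:
    """Project a raw retrieval-topk payload; a missing key raises KeyError."""
    candidates = tuple(
        TileCandidate(str(c["tile_id"]), float(c["centre_lat_deg"]), float(c["centre_lon_deg"]))
        for c in payload["candidates"]
    )
    return RetrievalTopK(str(payload["image_id"]), candidates)


def project_scene_change_match(payload: Mapping[str, Any]) -> SceneChangeMatch:
    """Project a raw scene-change-match payload; inlier_count is optional."""
    matched = payload["matched"]
    if not isinstance(matched, bool):
        raise TypeError("matched must be a bool")
    inlier_count = payload.get("inlier_count")
    # bool is a subclass of int in Python, so reject it explicitly.
    if inlier_count is not None and (isinstance(inlier_count, bool) or not isinstance(inlier_count, int)):
        raise TypeError("inlier_count must be an int when present")
    return SceneChangeMatch(str(payload["image_id"]), matched, inlier_count)
```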
Architecture Compliance
- All new files under `e2e/`, owned by the Blackbox Tests component per `_docs/02_document/module-layout.md`.
- No imports from `src/gps_denied_onboard` (explicit public-boundary discipline notes in both helpers).
- No new cyclic dependencies. Both helpers depend on `.geo` and (for `aged_tile_rejection_evaluator`) `.estimate_schema` only.
- No new infrastructure libraries.
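The import-boundary rule can be spot-checked mechanically. The following is a hedged sketch using the standard-library `ast` module; `forbidden_imports` and `FORBIDDEN_ROOTS` are hypothetical names, and this report does not say such a check exists in the repository.

```python
import ast
from typing import List

# Module roots treated as production code; an assumption based on the path src/gps_denied_onboard.
FORBIDDEN_ROOTS = ("gps_denied_onboard", "src.gps_denied_onboard")


def forbidden_imports(source: str) -> List[str]:
    """Return the imported module names in `source` that cross the boundary."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for bare relative imports such as `from . import x`.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if any(name == root or name.startswith(root + ".") for root in FORBIDDEN_ROOTS):
                found.append(name)
    return found
```

Relative imports such as `from .geo import ...` pass this check, since their module name does not start with a forbidden root.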
Sub-step Trace
Phases executed per implement/SKILL.md:
- phase 5 (load-spec) → AZ-423 + AZ-427 specs read
- phase 6 (implement-tasks-sequentially) → helpers + scenarios + unit tests
- phase 7 (verify-ac-coverage) → ACs traced above
- phase 8 (code-review) → batch_84_review.md (PASS_WITH_WARNINGS)
- phase 8.5 (cumulative-review) → batches_82-84 (next: K=3 cumulative trigger)
- phase 11 (commit-batch) → after cumulative review.