mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 03:21:14 +00:00
[AZ-423] [AZ-427] Add FT-P-19 + FT-N-05 blackbox tests

Implement the AC-8.6 (top-K=10 retrieval scale-ratio + scene-change
PARTIAL) and AC-8.2 / AC-NEW-6 (stale aged-tile rejection) blackbox
scenarios.

AZ-423 (FT-P-19, 3pt) helpers + scenario:
- retrieval_evaluator.py: top-K within-distance evaluator (60 stills
  vs 100 m budget), scene-change PARTIAL recorder (always emits
  PARTIAL on the 2 _gmaps.png pairs), FDR record projectors, CSV
  writers.
- tests/positive/test_ft_p_19_sat_reloc_scale.py (6 parametrised
  variants).

AZ-427 (FT-N-05, 2pt) helpers + scenario:
- aged_tile_rejection_evaluator.py: Signal A (stale rejection at
  load) + Signal B (per-frame downgrade) decision matrix, reuses
  ALLOWED_SOURCE_LABELS from estimate_schema.
- tests/negative/test_ft_n_05_stale_tile_rejection.py (12 parametrised
  variants: FC × VIO × {7mo/active-conflict, 13mo/rear}).

48 new unit tests cover every helper branch. Both scenarios skip
when sitl_replay_ready is false and fail loudly when fixture records
are missing.

Per-batch review: PASS_WITH_WARNINGS (2 Low: production-dependency
surface, FDR-kind constant duplication).
Cumulative review 82-84: PASS (2 Low carry-over / hygiene candidate).

Co-authored-by: Cursor <cursoragent@cursor.com>

@@ -0,0 +1,105 @@
# Batch 84: AZ-423 + AZ-427 (FT-P-19 sat-relocalization + FT-N-05 stale-tile rejection)

**Tracker**: AZ-423, AZ-427
**Tasks**: 2 tasks / 5 complexity points (3 + 2)
**Date**: 2026-05-17
**Verdict**: PASS_WITH_WARNINGS
**Review**: `_docs/03_implementation/reviews/batch_84_review.md`

## Scope

- **AZ-423 / FT-P-19 (AC-8.6)**: for each of the 60 stills, the top-K=10 retrieval must include a tile centre within 100 m of the image's true centre; the scene-change subset (2 paired `_gmaps.png` images) carries a PARTIAL annotation.
- **AZ-427 / FT-N-05 (AC-8.2, AC-NEW-6)**: aged-tile fixtures (`synth-age-7mo` in the active-conflict sector, `synth-age-13mo` in the rear sector) produce zero `satellite_anchored` emissions, either by load-time stale rejection (Signal A) or by per-frame downgrade (Signal B).

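The AC-8.6 check in the first Scope bullet can be sketched roughly as follows. This is an illustration only, not the code in `retrieval_evaluator.py`: the names `haversine_m`, `image_has_tile_within` and `failing_images`, the `(lat, lon)` tuple layout, and the spherical-Earth radius are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean spherical radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def image_has_tile_within(true_centre, candidates, budget_m=100.0, top_k=10):
    """True if any of the first top_k candidate tile centres lies within budget_m."""
    lat, lon = true_centre
    return any(
        haversine_m(lat, lon, c_lat, c_lon) <= budget_m
        for c_lat, c_lon in candidates[:top_k]
    )


def failing_images(truth, retrieved, budget_m=100.0, top_k=10):
    """Return sorted ids of images whose top-K list has no tile centre inside the budget."""
    return sorted(
        image_id
        for image_id, centre in truth.items()
        if not image_has_tile_within(centre, retrieved.get(image_id, []), budget_m, top_k)
    )
```

Under this sketch the scenario would pass only when `failing_images` returns an empty list for all 60 stills; an image with no retrieval record at all is treated as failing rather than silently ignored.
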
## Files

### Created

- `e2e/runner/helpers/retrieval_evaluator.py`: pure-logic evaluators for AC-1 (top-K within distance) + AC-2 (scene-change PARTIAL); FDR payload projectors for `retrieval-topk` + `scene-change-match` records; CSV emitters for evidence.
- `e2e/runner/helpers/aged_tile_rejection_evaluator.py`: pure-logic evaluator with explicit Signal-A / Signal-B decision matrix; FDR projector for `tile-load-rejected: stale` records; sector-binding constants (`synth-age-7mo → active_conflict`, `synth-age-13mo → rear`).
- `e2e/tests/positive/test_ft_p_19_sat_reloc_scale.py`: FT-P-19 scenario.
- `e2e/tests/negative/test_ft_n_05_stale_tile_rejection.py`: FT-N-05 scenario (parametrised across 2 sub-cases via `AGED_FIXTURE_SECTOR_BINDINGS`).
- `e2e/_unit_tests/helpers/test_retrieval_evaluator.py`: 29 unit tests.
- `e2e/_unit_tests/helpers/test_aged_tile_rejection_evaluator.py`: 19 unit tests.

### Modified

- `e2e/_unit_tests/test_directory_layout.py`: registered 4 new paths.

## Test Results

```
$ pytest _unit_tests/helpers/test_retrieval_evaluator.py \
    _unit_tests/helpers/test_aged_tile_rejection_evaluator.py \
    _unit_tests/test_directory_layout.py
============================= 157 passed in 0.28s ==============================
```

Scenario collection:

```
$ pytest tests/positive/test_ft_p_19_sat_reloc_scale.py \
    tests/negative/test_ft_n_05_stale_tile_rejection.py --collect-only
collected 18 items
test_ft_p_19_sat_reloc_scale: 6 cases (ardupilot/inav × {okvis2, klt_ransac, vins_mono})
test_ft_n_05_stale_tile_rejection: 12 cases (FC × VIO × 2 aged-tile sub-cases)
```

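The collected counts follow from a Cartesian product over the parametrisation axes. The sketch below only reproduces the arithmetic; the adapter, strategy and fixture names are taken from this report, while the variable names are illustrative and say nothing about how the conftest fixtures are actually wired.

```python
from itertools import product

FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
AGED_FIXTURES = ("synth-age-7mo", "synth-age-13mo")

# FT-P-19 varies flight controller and VIO strategy only.
ft_p_19_cases = list(product(FC_ADAPTERS, VIO_STRATEGIES))

# FT-N-05 adds the aged-tile sub-case as a third axis.
ft_n_05_cases = list(product(FC_ADAPTERS, VIO_STRATEGIES, AGED_FIXTURES))

print(len(ft_p_19_cases), len(ft_n_05_cases), len(ft_p_19_cases) + len(ft_n_05_cases))
# prints: 6 12 18
```
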
## AC Verification

### AZ-423 / FT-P-19

| AC | Coverage |
|----|----------|
| AC-1 top-K=10 within 100 m for all 60 images | `evaluate_top_k_within_distance` + scenario assertion + 9 unit tests |
| AC-2 scene-change subset PARTIAL | `evaluate_scene_change_subset` + scenario assertion + 5 unit tests |
| AC-3 parameterisation | 6 collected variants via conftest `fc_adapter` / `vio_strategy` fixtures |

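For AC-2, a projector plus PARTIAL recorder might look like the sketch below. The payload fields (`image_id`, `matched`, optional `inlier_count`) come from the Production Dependencies section of this report; the outer record envelope (`kind` / `payload` keys), the dataclass, and the function names are hypothetical and are not the API of `retrieval_evaluator.py`.

```python
from dataclasses import dataclass
from typing import Optional

SCENE_CHANGE_FDR_KIND = "scene-change-match"


@dataclass(frozen=True)
class SceneChangeObservation:
    image_id: str
    matched: bool
    inlier_count: Optional[int]
    status: str  # always "PARTIAL" for the scene-change subset


def project_scene_change(record):
    """Turn one raw FDR record (a dict) into an observation, or None for other kinds."""
    if record.get("kind") != SCENE_CHANGE_FDR_KIND:
        return None
    payload = record["payload"]
    return SceneChangeObservation(
        image_id=payload["image_id"],
        matched=bool(payload["matched"]),
        inlier_count=payload.get("inlier_count"),  # optional field
        status="PARTIAL",  # recorded regardless of the match outcome
    )


def evaluate_subset(records, expected_image_ids):
    """Collect observations and list expected images that produced no record."""
    observations = [o for o in map(project_scene_change, records) if o is not None]
    seen = {o.image_id for o in observations}
    missing = sorted(set(expected_image_ids) - seen)
    return observations, missing
```

Keeping `matched` and `inlier_count` next to the fixed PARTIAL status preserves the raw evidence even though the verdict for this subset does not depend on it.
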
### AZ-427 / FT-N-05

| AC | Coverage |
|----|----------|
| AC-1 7mo aged tiles → 0 `satellite_anchored` emissions | parametrised sub-case + `evaluate_aged_tile_rejection` + 11 unit tests |
| AC-2 13mo aged tiles → 0 `satellite_anchored` emissions | same helper + sub-case binding `synth-age-13mo → rear` |
| AC-3 parameterisation | 12 collected variants (FC × VIO × 2 sub-cases) |

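One plausible reading of the Signal A / Signal B decision matrix is sketched below. The report does not spell the matrix out, so the ordering of the checks, the `Verdict` enum, and the choice to fail when neither signal is observed are assumptions of this example rather than the behaviour of `evaluate_aged_tile_rejection`.

```python
from enum import Enum

# Labels an emission may carry once it has been downgraded (from this report).
DOWNGRADED_LABELS = frozenset({"visual_propagated", "dead_reckoned"})


class Verdict(Enum):
    PASS_SIGNAL_A = "pass: stale tiles rejected at load"
    PASS_SIGNAL_B = "pass: every frame downgraded"
    FAIL_ANCHORED = "fail: satellite_anchored emitted on aged tiles"
    FAIL_NO_SIGNAL = "fail: neither signal observed"


def decide(stale_rejection_count, source_labels):
    """Combine load-time rejections (A) and per-frame source labels (B) into a verdict."""
    labels = list(source_labels)
    # A single anchored emission violates the AC no matter what else happened.
    if "satellite_anchored" in labels:
        return Verdict.FAIL_ANCHORED
    if stale_rejection_count > 0:
        return Verdict.PASS_SIGNAL_A
    if labels and all(label in DOWNGRADED_LABELS for label in labels):
        return Verdict.PASS_SIGNAL_B
    # No rejections and no usable emissions: there is no evidence either way.
    return Verdict.FAIL_NO_SIGNAL
```
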
`traces_to` markers:
- FT-P-19: `AC-8.6,AC-1,AC-2,AC-3`
- FT-N-05: `AC-8.2,AC-NEW-6,AC-1,AC-2,AC-3`

## Code Review

**Verdict**: PASS_WITH_WARNINGS (0 Critical, 0 High, 2 Low).

- **F1 (production-dependency surface)**: both scenarios depend on upstream SUT features (`retrieval-topk` / `scene-change-match` FDR record kinds for AZ-423; `tile-load-rejected: stale` events OR per-frame downgrades for AZ-427) and on fixture-builder support (`synth-age-*` mounts + `E2E_FT_N_05_FIXTURE` env var). Tests skip cleanly when fixtures are missing and fail loudly when fixtures exist but records are missing.
- **F2 (maintainability, intra-batch duplication)**: `TILE_LOAD_REJECTED_FDR_KIND` and its stale-reason constant now exist in both `aged_tile_rejection_evaluator.py` and `mid_flight_tile_evaluator.py`. Future hygiene PBI candidate.

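The skip-versus-fail behaviour described in F1 can be illustrated with a small gate. This is a sketch under assumptions: `ScenarioSkipped` stands in for a call to `pytest.skip(...)` in a real scenario, and `require_records`, `MissingRecordsError`, and the `kind` key on records are hypothetical names, not the helpers shipped in this batch.

```python
class ScenarioSkipped(Exception):
    """Stand-in for pytest.skip(): the run cannot exercise the scenario at all."""


class MissingRecordsError(AssertionError):
    """The fixture was available but the SUT emitted none of the expected records."""


def require_records(sitl_replay_ready, fixture_present, records, kind):
    """Return records of the given kind, skipping or failing loudly as appropriate."""
    if not sitl_replay_ready:
        raise ScenarioSkipped("sitl_replay_ready is false")
    if not fixture_present:
        raise ScenarioSkipped("fixture not mounted")
    matching = [r for r in records if r.get("kind") == kind]
    if not matching:
        # Environment is ready, so silence from the SUT is a real failure.
        raise MissingRecordsError(f"no FDR records of kind {kind!r} in replay output")
    return matching
```

Separating the two exception types keeps an unprepared environment from showing up as a red test, while a prepared environment with a silent SUT cannot pass by accident.
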
Full review: `_docs/03_implementation/reviews/batch_84_review.md`.

## Production Dependencies

Surfaced for the cumulative review window (82-84) + traceability matrix:

1. **AZ-423 SUT-side**: emit FDR `retrieval-topk` record kind per pushed image, carrying `image_id` + `candidates` list of `{tile_id, centre_lat_deg, centre_lon_deg}`.
2. **AZ-423 SUT-side**: emit FDR `scene-change-match` record kind for each paired `_gmaps.png` image, carrying `image_id` + `matched` (bool) + optional `inlier_count` (int).
3. **AZ-427 SUT-side**: wire the C6 freshness gate's sector-aware date comparison (active-conflict vs rear thresholds per AC-8.2) and emit FDR `tile-load-rejected: stale` events at startup OR ensure every emission downgrades to `{visual_propagated, dead_reckoned}`.
4. **Fixture-builder-side**: snapshot/mount of `synth-age-7mo` + `synth-age-13mo` tile sets, run-per-sub-case orchestration that publishes `E2E_FT_N_05_FIXTURE` env var declaring which fixture is currently active.
5. **Already exists**: `still-image-set-60` + `coordinates.csv` (AZ-407 deliverable) + `still-image-sat-refs-2` (AZ-407).
6. **Already exists**: `sitl_replay_ready` fixture, `fc_adapter` / `vio_strategy` parameterisation, `evidence_dir` + `nfr_recorder` + `run_id` (AZ-406).

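The sector-aware date comparison in item 3 could take roughly the following shape. The thresholds here are HYPOTHETICAL: this report does not state the AC-8.2 values, and 6 and 12 months were picked only because they make the 7-month and 13-month fixtures stale in their bound sectors. The function names are likewise illustrative and do not describe the C6 gate's real interface.

```python
from datetime import date

# HYPOTHETICAL thresholds, chosen for illustration only (not taken from AC-8.2).
MAX_AGE_MONTHS = {"active_conflict": 6, "rear": 12}

# Sector bindings as listed for the aged-tile fixtures in this report.
FIXTURE_SECTORS = {"synth-age-7mo": "active_conflict", "synth-age-13mo": "rear"}


def age_in_months(captured, today):
    """Whole calendar months elapsed between the capture date and today."""
    months = (today.year - captured.year) * 12 + (today.month - captured.month)
    if today.day < captured.day:
        months -= 1  # the current month has not completed yet
    return months


def is_stale(captured, sector, today):
    """True when a tile is older than the sector's allowed age."""
    return age_in_months(captured, today) > MAX_AGE_MONTHS[sector]
```

With these assumed values a 7-month-old tile is stale in the active-conflict sector but would still be fresh in the rear sector, which is why each fixture is bound to a specific sector.
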
## Architecture Compliance

- All new files under `e2e/`, owned by the Blackbox Tests component per `_docs/02_document/module-layout.md`.
- No imports from `src/gps_denied_onboard` (explicit public-boundary discipline notes in both helpers).
- No new cyclic dependencies. Both helpers depend on `.geo` and (for `aged_tile_rejection_evaluator`) `.estimate_schema` only.
- No new infrastructure libraries.

## Sub-step Trace

Phases executed per `implement/SKILL.md`:
- phase 5 (load-spec) → AZ-423 + AZ-427 specs read
- phase 6 (implement-tasks-sequentially) → helpers + scenarios + unit tests
- phase 7 (verify-ac-coverage) → ACs traced above
- phase 8 (code-review) → batch_84_review.md (PASS_WITH_WARNINGS)
- phase 8.5 (cumulative-review) → batches_82-84 (next: K=3 cumulative trigger)
- phase 11 (commit-batch) → after cumulative review.