Mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-06-22 22:31:12 +00:00
[AZ-409] [AZ-412] [AZ-413] Batch 70: FT-P-01/04/05/06 scenarios

AZ-409 (3pt) — FT-P-01 still-image frame-center accuracy:
- accuracy_evaluator.py: GT loader + Vincenty error + AC-2/AC-3 pass-counts
- test_ft_p_01_still_image_accuracy.py: scenario gated on frame_source_replay + sitl_observer NotImplementedError; AC-4 timeout discipline

AZ-412 (3pt) — FT-P-04 Derkachi f2f registration >=95% on normal segments:
- registration_classifier.py: accel-derived attitude + overlap heuristic + success ratio with AC-3 sharp-turn exclusion
- test_ft_p_04_derkachi_f2f_registration.py: scenario gated on frame_source_replay + imu_replay + fdr_reader

AZ-413 (3pt) — FT-P-05 + FT-P-06 cross-domain MRE budgets:
- mre_evaluator.py: per-image budget (strict <2.5px) + 95th-percentile via numpy linear interp + combined report
- test_ft_p_05_sat_anchor.py: cross-domain scenario, reuses accuracy_evaluator for geodesic join
- test_ft_p_06_mre_budgets.py: pure piggyback on FT-P-04 + FT-P-05 CSV evidence; skips when either upstream CSV missing

Tests: 325 unit tests pass (+77 vs batch 69).
Reports: batch_70_report.md, batch_70_review.md (PASS).

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,88 @@
# FT-P-01 — Still-image satellite-anchor frame-center accuracy

**Task**: AZ-409_ft_p_01_still_image_accuracy
**Name**: FT-P-01 still-image set-60 frame-center accuracy (AC-1.1, AC-1.2)
**Description**: Implement the FT-P-01 blackbox scenario — push 60 still images one at a time, observe outbound `GPS_INPUT` (AP) / `MSP2_SENSOR_GPS` (iNav) at SITL, compute Vincenty geodesic distance to GT, assert pass-count rules from `expected_results/results_report.md`.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Positive (epic AZ-262)
**Tracker**: AZ-409
**Epic**: AZ-262 (E-BBT)

## Problem

The canonical "frame-center geolocation" pipeline accuracy must be validated against the 60-image GT set under the AC-1.1 / AC-1.2 budgets — without this scenario the project has no honest measurement of its satellite-anchor accuracy on still images.

## Outcome

- A pytest scenario at `e2e/tests/positive/test_ft_p_01_still_image_accuracy.py` parameterized by `(fc_adapter, vio_strategy)`.
- Test pushes each `AD0000NN.jpg` to the SUT's frame source, waits up to 5 s for the corresponding outbound message at the SITL listener, computes Vincenty distance to the `coordinates.csv` GT row.
- Test emits `e2e-results/run-${RUN_ID}/ft-p-01.csv` (one row per image: `image_id, gt_lat, gt_lon, est_lat, est_lon, error_m, pass_50m, pass_20m`).
- Aggregate pass criteria: `pass_count(error≤50m) ≥ 48` AND `pass_count(error≤20m) ≥ 30`.
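
The per-image evidence row described above could be assembled roughly as follows. This is a minimal sketch, not the repository's `accuracy_evaluator.py`: the helper names `make_row` and `write_rows` are hypothetical, the coordinates and the 13.2 m error are made-up illustration values, and only the column names come from the Outcome list.

```python
import csv
import io
import math

# Column order copied from the Outcome list; everything else is illustrative.
FIELDNAMES = ["image_id", "gt_lat", "gt_lon", "est_lat", "est_lon",
              "error_m", "pass_50m", "pass_20m"]

def make_row(image_id, gt, est, error_m):
    """Build one evidence row; `est` is None when no message arrived in time."""
    est_lat, est_lon = est if est is not None else ("", "")
    return {
        "image_id": image_id,
        "gt_lat": gt[0], "gt_lon": gt[1],
        "est_lat": est_lat, "est_lon": est_lon,
        "error_m": error_m,
        "pass_50m": error_m <= 50.0,
        "pass_20m": error_m <= 20.0,
    }

def write_rows(rows, stream):
    """Write a header plus one line per image to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

# Made-up sample: one image with an estimate, one that timed out.
rows = [
    make_row("AD000001", (50.0, 36.0), (50.0001, 36.0001), 13.2),
    make_row("AD000002", (50.0, 36.0), None, math.inf),
]
buf = io.StringIO()
write_rows(rows, buf)
```

Writing to a stream rather than a path keeps the sketch testable; a real scenario would open `e2e-results/run-${RUN_ID}/ft-p-01.csv` and pass the file handle.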

## Scope

### Included
- pytest test method per `(fc_adapter, vio_strategy)` parameterization.
- Frame-source-replay helper invocation (one image at a time).
- SITL observer for AP `GPS_INPUT` reception AND iNav `MSP2_SENSOR_GPS` reception.
- Vincenty geodesic computation (use `geopy` or `scipy` per the runner's pinned deps).
- Per-image CSV emission for evidence; aggregate assertion.
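
For readers unfamiliar with the Vincenty inverse problem named in the list above, here is a self-contained sketch on the WGS-84 ellipsoid. It uses only the standard library so the arithmetic is visible; the spec itself defers to the runner's pinned dependencies, so this is an illustration under that assumption and not the project's implementation. The function name `vincenty_m` is hypothetical.

```python
import math

WGS84_A = 6378137.0              # semi-major axis, metres
WGS84_F = 1 / 298.257223563      # flattening
WGS84_B = (1 - WGS84_F) * WGS84_A

def vincenty_m(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Geodesic distance in metres between two lat/lon points (degrees)."""
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    f = WGS84_F
    # Reduced latitudes on the auxiliary sphere.
    u1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)
    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1.0 - sin_alpha ** 2
        # cos2_alpha is zero only for equatorial lines.
        cos_2sm = (cos_sigma - 2.0 * sin_u1 * sin_u2 / cos2_alpha
                   if cos2_alpha != 0.0 else 0.0)
        c = f / 16.0 * cos2_alpha * (4.0 + f * (4.0 - 3.0 * cos2_alpha))
        lam_prev = lam
        lam = big_l + (1.0 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (
                cos_2sm + c * cos_sigma * (-1.0 + 2.0 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge")
    usq = cos2_alpha * (WGS84_A ** 2 - WGS84_B ** 2) / WGS84_B ** 2
    big_a = 1 + usq / 16384 * (4096 + usq * (-768 + usq * (320 - 175 * usq)))
    big_b = usq / 1024 * (256 + usq * (-128 + usq * (74 - 47 * usq)))
    d_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2)
        * (-3 + 4 * cos_2sm ** 2)))
    return WGS84_B * big_a * (sigma - d_sigma)
```

The iteration can fail to converge for nearly antipodal points, which is why the sketch raises instead of returning a guess; that case should not arise for frame-center errors measured in metres.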

### Excluded
- The 60 images themselves — bind-mounted from `_docs/00_problem/input_data/`.
- The tile-cache-fixture build — owned by AZ-407.
- Per-image MRE (cross-domain matcher quality) — owned by FT-P-05 (AZ-413).
- Schema validation of the outbound message — owned by FT-P-03 (AZ-411).

## Acceptance Criteria

**AC-1: per-image distance computed**
Given the SUT cold-started with `tile-cache-fixture` mounted
When the test pushes each `AD0000NN.jpg` and reads the next outbound message at SITL within 5 s
Then a per-image error row is recorded in `ft-p-01.csv` for all 60 images.

**AC-2: pass-count rule (50 m budget)**
Given all 60 per-image distances
Then `pass_count(error_m ≤ 50) ≥ 48` (i.e. ≥80 % of 60 images per AC-1.1).

**AC-3: pass-count rule (20 m budget)**
Given all 60 per-image distances
Then `pass_count(error_m ≤ 20) ≥ 30` (i.e. ≥50 % of 60 images per AC-1.2).

**AC-4: timeout discipline**
Given an image is pushed
When no outbound message arrives at SITL within 5 s
Then the test records `error_m=∞`, `pass_50m=False`, `pass_20m=False` for that image; the run continues. Final aggregate may still PASS if the total pass counts meet the thresholds.
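
The interaction between AC-2, AC-3 and AC-4 can be shown with a small worked example: a timed-out image carries `error_m = inf`, so it fails both budgets without any special-casing, and the run can still pass if enough other images are within budget. The function name `aggregate` and the sample error values are illustrative assumptions.

```python
import math

def aggregate(errors_m, min_pass_50=48, min_pass_20=30):
    """Count per-budget passes; inf (timeout) never satisfies either budget."""
    pass_50 = sum(1 for e in errors_m if e <= 50.0)
    pass_20 = sum(1 for e in errors_m if e <= 20.0)
    return {
        "pass_50": pass_50,
        "pass_20": pass_20,
        "timeouts": sum(1 for e in errors_m if math.isinf(e)),
        "passed": pass_50 >= min_pass_50 and pass_20 >= min_pass_20,
    }

# Made-up run: 30 images at 10 m, 18 at 40 m, 12 timeouts (60 in total).
errors = [10.0] * 30 + [40.0] * 18 + [math.inf] * 12
result = aggregate(errors)
```

Here 48 images are within 50 m and 30 within 20 m, so the aggregate passes despite 12 timeouts; a thirteenth timeout in place of any in-budget image would fail the 50 m rule.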

**AC-5: parameterization**
Given the conftest's `(fc_adapter, vio_strategy)` parameterization
Then the scenario runs four times by default — `(ardupilot, okvis2)`, `(ardupilot, klt_ransac)`, `(inav, okvis2)`, `(inav, klt_ransac)` — and emits one CSV row per parameterization in `report.csv`.
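
The four default combinations are the Cartesian product of two adapters and two VIO strategies. The sketch below builds that matrix with the standard library; the constant names are hypothetical, and how the project's conftest actually wires the parameters is not shown in this task description.

```python
from itertools import product

# Values taken from AC-5; the constant names are illustrative.
FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac")

# Order matches the listing in AC-5: adapter varies slowest.
PARAMS = list(product(FC_ADAPTERS, VIO_STRATEGIES))
IDS = [f"{fc}-{vio}" for fc, vio in PARAMS]

# In a pytest module these lists could feed
# pytest.mark.parametrize("fc_adapter,vio_strategy", PARAMS, ids=IDS).
```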

## System Under Test Boundary

This is an end-to-end scenario. The test ONLY interacts with the SUT through public boundaries.

- **Allowed inputs**: frame-source push, FC inbound IMU stream replay (passive — no modification by the test).
- **Allowed observation**: SITL-side `GPS_INPUT` / `MSP2_SENSOR_GPS` receipt; FDR archive post-run if needed for diagnostic evidence.
- **Forbidden**: importing any SUT module, monkeypatching internal SUT state, stubbing C1-C5 components or C6 tile cache or C7 inference runtime, reading SUT internal Python/C++ buffers.
- **External-system stubs allowed**: `ardupilot-plane-sitl`, `inav-sitl` (real SITLs — observe only), the `tile-cache-fixture` mount (a public on-disk schema), the FDR filesystem.
- If a SUT module isn't yet implemented, the scenario MUST fail/block as missing product implementation; it must NOT pass by replacing that module with a test stub.

## Constraints

- Reference data: `_docs/00_problem/input_data/expected_results/results_report.md` (the rule) + `expected_results/position_accuracy.csv` (per-image budget flags).
- Geodesic comparison: Vincenty (NOT haversine) — required by `expected-results.md` `numeric_tolerance` semantics.
- Per-image CSV must include the image_id used in `coordinates.csv` (1-based `AD000001` style).

## Risks & Mitigation

**Risk: SITL round-trip latency variance can push a passing image past the 5 s timeout**
- *Mitigation*: timeout is 5 s per image (matches scenario's `Max execution time` budget of 5 min for 60 images). If timeout-driven failures dominate, NFT-PERF-01 is the place to investigate latency; this scenario records timeouts as failures and continues.
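
The budget arithmetic in the mitigation (60 images at 5 s each is 300 s, i.e. 5 min in the worst case) and the record-and-continue behaviour can be sketched with a blocking queue. The `inbox` queue is a hypothetical stand-in for whatever the SITL observer exposes; the real observer interface is not described in this document.

```python
import queue

TIMEOUT_S = 5.0     # per-image wait, from the scenario
IMAGE_COUNT = 60

# Worst case: every image times out.
worst_case_s = IMAGE_COUNT * TIMEOUT_S

def wait_for_message(inbox, timeout_s=TIMEOUT_S):
    """Return the next outbound message, or None if the deadline passes."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None  # caller records error_m = inf and moves on
```

Returning `None` rather than raising keeps the per-image loop linear: the caller records the timeout row and pushes the next image.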

## Document Dependencies

- `_docs/02_document/tests/blackbox-tests.md` § FT-P-01
- `_docs/00_problem/input_data/expected_results/results_report.md` (Pass/Fail Rules)
- `_docs/00_problem/input_data/expected_results/position_accuracy.csv` (per-image flags)
- `_docs/02_document/tests/test-data.md` § Expected Results Mapping § Position accuracy (FT-P-01 row)
@@ -0,0 +1,69 @@
# FT-P-04 — Derkachi frame-to-frame registration success rate

**Task**: AZ-412_ft_p_04_derkachi_f2f_registration
**Name**: FT-P-04 frame-to-frame registration ≥95 % on normal Derkachi segments (AC-2.1a)
**Description**: Implement FT-P-04 — full Derkachi replay; SUT exposes per-frame registration-success metric (via `NAMED_VALUE_FLOAT` or FDR per AC-NEW-3); compute success ratio over normal segments only (nadir ±10° bank/pitch, ≥40 % inferred prior-frame overlap).
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407
**Component**: Blackbox Tests / Positive (epic AZ-262)
**Tracker**: AZ-412
**Epic**: AZ-262 (E-BBT)

## Problem

The frame-to-frame registration success rate on "normal" segments is a direct measurement of the C1 VIO + C3 matcher quality in nominal conditions. AC-2.1a requires ≥95 % — without this scenario the project has no honest measurement.

## Outcome

- pytest scenario at `e2e/tests/positive/test_ft_p_04_derkachi_f2f_registration.py`.
- Replays the full Derkachi fixture; reads per-frame registration-success metric (boolean per frame).
- "Normal" segment classification: nadir bank/pitch within ±10° (estimated from `SCALED_IMU2`-derived attitude in `data_imu.csv`); ≥40 % inferred prior-frame overlap (heuristic from frame-to-frame translation magnitude).
- Computes success ratio over normal segments only.
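
One way the "normal" classification could work is sketched below. It rests on several assumptions that the task description does not confirm: MAVLink body axes (x forward, y right, z down), flight steady enough that the accelerometer mostly measures gravity, and an overlap heuristic of the form `1 - translation / footprint`, where `footprint_m` is a hypothetical along-track ground footprint. The function names are illustrative and this is not the repository's `registration_classifier.py`.

```python
import math

MAX_TILT_DEG = 10.0   # bank/pitch limit from the spec
MIN_OVERLAP = 0.40    # minimum inferred prior-frame overlap from the spec

def accel_attitude_deg(xacc, yacc, zacc):
    """Roll and pitch (degrees) from one accelerometer sample.

    Assumes a level, unaccelerated vehicle reads (0, 0, -g); units cancel,
    so raw SCALED_IMU2 values can be passed directly.
    """
    roll = math.degrees(math.atan2(-yacc, -zacc))
    pitch = math.degrees(math.atan2(xacc, math.hypot(yacc, zacc)))
    return roll, pitch

def inferred_overlap(translation_m, footprint_m):
    """Assumed heuristic: fraction of the previous footprint still in view."""
    return max(0.0, 1.0 - abs(translation_m) / footprint_m)

def is_normal(roll_deg, pitch_deg, overlap):
    """True when the frame is near-nadir and overlaps its predecessor enough."""
    return (abs(roll_deg) <= MAX_TILT_DEG
            and abs(pitch_deg) <= MAX_TILT_DEG
            and overlap >= MIN_OVERLAP)
```

Because the classification is a pure function of the IMU sample and the translation estimate, two runs over the same `data_imu.csv` yield the same set of normal frames.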

## Scope

### Included
- Full-replay test method.
- Normal-segment derivation from `data_imu.csv` (the test computes attitude from `SCALED_IMU2` per AC-2.1a).
- Per-frame registration-success metric ingestion (via `NAMED_VALUE_FLOAT` listener OR post-run FDR read).
- Success-ratio computation + assertion.

### Excluded
- Sharp-turn segments — exercised separately by FT-N-02 (AZ-414) and explicitly excluded from this denominator.
- MRE budgets — owned by FT-P-05 / FT-P-06 (AZ-413).
- Cross-domain (UAV → satellite) success — owned by FT-P-05.

## Acceptance Criteria

**AC-1: normal-segment classification reproducibility**
Given the same Derkachi `data_imu.csv`
Then the same set of frames is classified "normal" across two runs of the test.

**AC-2: success ratio meets AC-2.1a budget**
Given the SUT exposes per-frame registration-success
Then `success_ratio_over_normal_segments ≥ 0.95`.

**AC-3: success-ratio computation excludes sharp-turn frames**
Given the per-frame attitude exceeds ±10° bank or pitch (sharp-turn region)
Then those frames are excluded from the denominator; the test reports `excluded_frame_count` separately for diagnostic clarity.
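
The denominator rule in AC-3 can be illustrated with a short sketch: non-normal frames are dropped before the ratio is computed and counted separately. The function name and the sample frame list are illustrative assumptions.

```python
def success_ratio(frames):
    """frames: list of (is_normal, registered) boolean pairs.

    Returns (ratio over normal frames, excluded_frame_count).
    """
    frames = list(frames)
    normal = [registered for is_normal, registered in frames if is_normal]
    excluded_frame_count = len(frames) - len(normal)
    if not normal:
        raise ValueError("no normal frames: ratio is undefined")
    return sum(normal) / len(normal), excluded_frame_count

# Made-up replay: 19 registered + 1 failed normal frame, 5 sharp-turn frames.
frames = [(True, True)] * 19 + [(True, False)] + [(False, False)] * 5
ratio, excluded = success_ratio(frames)
```

With these values the ratio is 19/20 = 0.95, which just meets the budget; had the five sharp-turn failures stayed in the denominator it would be 19/25 = 0.76.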

**AC-4: parameterization**
Given the conftest's `(fc_adapter, vio_strategy)` parameterization
Then the scenario runs once per parameterization.

## System Under Test Boundary

End-to-end through public boundaries.

- **Allowed**: per-frame metric exposure via `NAMED_VALUE_FLOAT` (a public MAVLink message) OR via post-run FDR archive read.
- **Forbidden**: importing C1 VIO internal state, monkeypatching C3 matcher pass/fail return, stubbing C2 retrieval to force successes.

## Constraints

- The per-frame registration-success metric is part of the SUT's documented public output (per AC-NEW-3 FDR schema). If it isn't there yet, the scenario fails — it does NOT compensate by inferring from another signal.
- Normal-segment derivation uses ONLY `data_imu.csv`, not internal SUT state.

## Document Dependencies

- `_docs/02_document/tests/blackbox-tests.md` § FT-P-04
- `_docs/02_document/tests/test-data.md` § Image processing quality (FT-P-04 row)
@@ -0,0 +1,72 @@
# FT-P-05 + FT-P-06 — Satellite-anchor cross-domain registration + MRE budgets

**Task**: AZ-413_ft_p_05_06_sat_anchor_mre
**Name**: Cross-domain matcher registration + Mean Reprojection Error budgets (AC-2.1b, AC-2.2)
**Description**: Implement FT-P-05 (satellite-anchor cross-domain registration with MRE < 2.5 px and accuracy budget) and FT-P-06 (frame-to-frame MRE < 1.0 px and cross-domain MRE < 2.5 px at 95th percentile). FT-P-06 piggybacks on FT-P-04 + FT-P-05 runs and only adds the MRE 95th-percentile assertion.
**Complexity**: 3 points
**Dependencies**: AZ-406, AZ-407, AZ-412 (FT-P-06 reuses FT-P-04 MRE collection)
**Component**: Blackbox Tests / Positive (epic AZ-262)
**Tracker**: AZ-413
**Epic**: AZ-262 (E-BBT)

## Problem

Two ACs bind the cross-domain matcher (UAV → satellite tile) quality: AC-2.1b (registration succeeds with MRE in budget) and AC-2.2 (95th-percentile MRE < 1.0 px frame-to-frame, < 2.5 px cross-domain). Both are direct measurements of C3 / C3.5 quality and are required to validate the matcher choice.

## Outcome

- pytest scenarios at `e2e/tests/positive/test_ft_p_05_sat_anchor.py` (FT-P-05) and a small piggyback assertion in `e2e/tests/positive/test_ft_p_06_mre_budgets.py` (FT-P-06).
- FT-P-05: pushes each `still-image-set-60` image; reads per-frame MRE (via `NAMED_VALUE_FLOAT` or FDR); aggregates per-image accuracy AND MRE distribution; asserts MRE < 2.5 px for all images, ≥80 % within 50 m of GT, ≥50 % within 20 m of GT.
- FT-P-06: depends on the runs of FT-P-04 (frame-to-frame MRE) and FT-P-05 (cross-domain MRE); aggregates by domain; asserts 95th-percentile MRE < 1.0 px frame-to-frame, < 2.5 px cross-domain.
- CSV evidence: `e2e-results/run-${RUN_ID}/ft-p-05.csv` (one row per image: `image_id, est_lat, est_lon, error_m, mre_px, pass_50m, pass_20m, pass_mre`).

## Scope

### Included
- FT-P-05 test method.
- FT-P-06 piggyback method that reads FT-P-04 + FT-P-05 evidence and adds the 95th-percentile assertion.
- Per-image MRE retrieval via `NAMED_VALUE_FLOAT` or post-run FDR archive read.
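
The piggyback idea for FT-P-06 (read both upstream evidence files, and skip rather than fail when either is absent, as the commit message describes) might look like the sketch below. Only `ft-p-05.csv` and its `mre_px` column are named in this document; the `ft-p-04.csv` file name, its `mre_px` column, and the helper names are assumptions made for illustration.

```python
import csv
import os
import tempfile

def load_mre_px(csv_path, column="mre_px"):
    """Return the MRE column as floats, or None when the CSV is missing."""
    if not os.path.exists(csv_path):
        return None
    with open(csv_path, newline="") as fh:
        return [float(row[column]) for row in csv.DictReader(fh)]

def collect_evidence(run_dir):
    """Gather MRE samples by domain; None signals the caller to skip."""
    f2f = load_mre_px(os.path.join(run_dir, "ft-p-04.csv"))    # assumed name
    cross = load_mre_px(os.path.join(run_dir, "ft-p-05.csv"))
    if f2f is None or cross is None:
        return None
    return {"frame_to_frame": f2f, "cross_domain": cross}

# Demonstration with made-up values in a throwaway directory.
with tempfile.TemporaryDirectory() as run_dir:
    with open(os.path.join(run_dir, "ft-p-05.csv"), "w", newline="") as fh:
        fh.write("image_id,mre_px\nAD000001,1.8\n")
    only_one = collect_evidence(run_dir)   # ft-p-04.csv is still absent
    with open(os.path.join(run_dir, "ft-p-04.csv"), "w", newline="") as fh:
        fh.write("frame_id,mre_px\n0,0.6\n")
    both = collect_evidence(run_dir)
```

In a pytest module the `None` result would typically translate into a skip, so that a missing upstream run is reported as "not evaluated" and not as a budget failure.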

### Excluded
- The 60-image accuracy-only assertion (AC-1.1, AC-1.2) — owned by FT-P-01 (AZ-409).
- The Derkachi frame-to-frame success ratio (AC-2.1a) — owned by FT-P-04.
- C3.5 conditional-refiner-specific assertions — out of scope; AC-2.1b is on the matcher pipeline as a whole.

## Acceptance Criteria

**AC-1: per-image MRE captured**
Given each `still-image-set-60` image
Then the test records the per-frame MRE in `ft-p-05.csv`.

**AC-2: cross-domain MRE budget (per image)**
Given each per-image MRE
Then `mre_px < 2.5` for all 60 images (per AC-2.1b "all images").

**AC-3: accuracy budget alongside MRE**
Given the same 60 images
Then ≥80 % satisfy `error_m ≤ 50` AND ≥50 % satisfy `error_m ≤ 20` (matches FT-P-01 thresholds; this assertion may be loosened to "implied by FT-P-01" if FT-P-01 already passes in the same run).

**AC-4: 95th-percentile MRE budget (FT-P-06)**
Given FT-P-04 + FT-P-05 evidence
Then `MRE_95th_percentile_frame_to_frame < 1.0 px` AND `MRE_95th_percentile_cross_domain < 2.5 px`.

**AC-5: parameterization**
Given the conftest's `(fc_adapter, vio_strategy)` parameterization
Then both test files run per parameterization.

## System Under Test Boundary

End-to-end through public boundaries.

- **Allowed**: `NAMED_VALUE_FLOAT` MRE exposure OR post-run FDR archive read.
- **Forbidden**: importing C3 / C3.5 matcher state, stubbing the matcher to force a specific MRE.

## Constraints

- The per-frame MRE must be exposed as a documented public artifact (NAMED_VALUE_FLOAT key or FDR field) — if absent, the test fails.
- The 95th percentile is computed with linear interpolation between the two adjacent ranks (numpy's default percentile method) and compared against the budget with a strict `<`.
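
The linear-interpolation rule can be written out in a few lines. The sketch below reimplements it with the standard library so the rank arithmetic is explicit; it follows the same definition as numpy's default `linear` percentile method (fractional rank `(n - 1) * q / 100`), but the function names and sample values are illustrative and this is not the project's `mre_evaluator.py`.

```python
import math

def percentile_linear(values, q):
    """q-th percentile with linear interpolation between adjacent ranks."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0   # fractional 0-based rank
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def within_budget(values, budget_px, q=95.0):
    """Strict comparison: a percentile equal to the budget does not pass."""
    return percentile_linear(values, q) < budget_px

# Made-up frame-to-frame MRE sample, in pixels.
sample = [0.2, 0.4, 0.6, 0.8, 1.0]
p95 = percentile_linear(sample, 95.0)
```

For the five-value sample the fractional rank is 3.8, so the result lies 80 % of the way from 0.8 to 1.0, i.e. about 0.96 px, which is under the 1.0 px frame-to-frame budget even though the largest single value is not.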

## Document Dependencies

- `_docs/02_document/tests/blackbox-tests.md` § FT-P-05, § FT-P-06
- `_docs/02_document/tests/test-data.md` § Image processing quality