- Introduced a new document detailing the current state of the autodev process, including steps, status, and findings.

- Revised acceptance criteria in the acceptance_criteria.md file to clarify metrics and expectations, including updates to GPS accuracy and image processing quality.
- Enhanced restrictions documentation to reflect operational parameters and constraints for UAV flights, including camera specifications and satellite imagery usage.
- Added new research documents for acceptance criteria assessment and question decomposition to support ongoing project evaluation and decision-making.
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-04-26 14:28:10 +03:00
parent 2178737b36
commit 9eba1689b3
17 changed files with 2965 additions and 69 deletions
@@ -0,0 +1,76 @@
# Test-Spec Phase 1 Findings (intermediate, not a final artifact)
> Generated 2026-04-26 during Plan Step 1 (test-spec/SKILL.md, Phase 1).
> This is a working note — Phase 2 reads it to carry forward findings + the user's locked-in decisions.
> Phase 2 produces the 8 final artifacts under `_docs/02_document/tests/`.
## Inputs surveyed
- `_docs/00_problem/problem.md` — short problem statement; flags missing IMU.
- `_docs/00_problem/acceptance_criteria.md` — 46 ACs (37 numbered + 9 NEW).
- `_docs/00_problem/restrictions.md` — UAV/flight/camera/satellite/HW/sensors/failsafe.
- `_docs/01_solution/solution.md` (renamed from solution_draft03 by Plan Prereq 2) — 11 components + testing strategy F-T1…F-T19, NF-T1…NF-T6, S-T1…S-T5, FT-1…FT-3.
- `_docs/00_problem/input_data/`:
  - 60 nav-cam JPGs `AD000001.jpg`–`AD000060.jpg`
  - `coordinates.csv` (frame → GPS truth)
  - 2 `_gmaps.png` thumbnails (frames 1–2 only)
  - `data_parameters.md` (corpus-shoot params)
  - `expected_results/results_report.md` (46 mapped scenarios) + `position_accuracy.csv`
## Quantifiability check
All 46 mapped scenarios in `results_report.md` have quantifiable expected results (no vague "works correctly" entries). Comparison methods used: `percentage`, `numeric_tolerance`, `threshold_max`, `threshold_min`, `exact`, `regex`, `range`, `file_reference`. Acceptable.
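One way to read the comparison methods named above is as a small dispatch function. The sketch below is illustrative only: the semantics attached to each method name (for example, that `numeric_tolerance` means an absolute difference within a tolerance) are assumptions, not definitions taken from `results_report.md`, and `compare` is a hypothetical helper. `percentage` and `file_reference` are left out because their meaning cannot be inferred from the name alone.

```python
import re


def compare(method, actual, expected, tolerance=None):
    """Return True if `actual` satisfies `expected` under `method`.

    All semantics here are assumed for illustration; results_report.md
    remains the authority on what each comparison method means.
    """
    if method == "exact":
        return actual == expected
    if method == "numeric_tolerance":
        # Assumed: absolute difference must not exceed `tolerance`.
        return abs(actual - expected) <= tolerance
    if method == "threshold_max":
        # Assumed: `expected` is an upper bound (e.g. a latency budget).
        return actual <= expected
    if method == "threshold_min":
        # Assumed: `expected` is a lower bound (e.g. a minimum pass rate).
        return actual >= expected
    if method == "range":
        low, high = expected
        return low <= actual <= high
    if method == "regex":
        return re.fullmatch(expected, actual) is not None
    raise ValueError(f"unsupported comparison method: {method}")


# Hypothetical usage: a 180 ms measurement against a 200 ms upper bound.
print(compare("threshold_max", 180, 200))
```

Keeping every method behind one function makes it easy to reject an unknown method name early instead of silently passing a scenario.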
## Coverage of the 46 ACs by `results_report.md`
- **Fully covered (~18 / 46 ≈ 39%)**: AC-1.1, AC-1.3 (mono only), AC-2.1, AC-2.2 (VO half), AC-3.1, AC-3.2, AC-3.4, AC-4.1, AC-4.2, AC-5.1, AC-5.3, AC-6.2, AC-7.1, AC-7.2, AC-NEW-5 (junction temp slice), AC-NEW-8 (mono only), API ACs (rows 30–33), TRT validation rows 42–44.
- **Partially covered (~10)**: AC-1.2, AC-2.2 cross-view <2.5 px, AC-5.2 timing, AC-6.1, AC-NEW-1, AC-NEW-5 hot/cold soak.
- **Not covered (~18)**: AC-1.4, AC-3.3, AC-4.3 ODOMETRY (intentionally, per the v1-scope clause in `acceptance_criteria.md`), AC-4.4, AC-4.5, AC-6.3, AC-8.1–AC-8.6, AC-NEW-2, AC-NEW-3, AC-NEW-4, AC-NEW-6, AC-NEW-7, AC-NEW-9.
Headline ≈ 39% direct + 22% partial = **~61%** against the 46-AC denominator. Below the 75% threshold *only* when the 60-image slice is treated as the sole corpus. The solution's testing strategy explicitly delegates the missing slice to bench-off corpora named in solution.md (AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN, TartanAir V2, internal Mavic, first internal fixed-wing). Per user decision #4 below, Phase 2 will spec tests for all 46 ACs and mark unfulfilled-data ACs with `data_status: deferred-corpus` in `traceability-matrix.md`.
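The headline arithmetic can be reproduced in a few lines. This is a sketch that takes the approximate bucket counts (18 / 10 / 18) from the lists above at face value; the variable names are illustrative.

```python
TOTAL_ACS = 46
THRESHOLD_PCT = 75

# Approximate bucket sizes as stated in the coverage lists above.
counts = {"full": 18, "partial": 10, "none": 18}
assert sum(counts.values()) == TOTAL_ACS  # the three buckets partition the 46 ACs


def pct(n, total=TOTAL_ACS):
    """Percentage of `total`, rounded to the nearest integer."""
    return round(100 * n / total)


full_pct = pct(counts["full"])
partial_pct = pct(counts["partial"])
headline_pct = full_pct + partial_pct

print(full_pct, partial_pct, headline_pct, headline_pct >= THRESHOLD_PCT)
# prints: 39 22 61 False
```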
## Stale-doc fixes already applied (per user decision #1, option A)
Edits made during Phase 1 — Phase 2 reads these as baseline truth:
| File | Row / AC | Change |
|------|----------|--------|
| `results_report.md` | row 2 | "≥60% within 20m" → "≥50% within 20m" (aligns with AC-1.2). |
| `results_report.md` | row 19 | "ESKF position corrected" → "Component 5 calibrator emits a satellite-anchored fix, FC EKF3 reconverges". |
| `results_report.md` | row 22 | "uses hint as ESKF measurement" → "uses hint as a high-covariance (~500m) seed for VPR/cross-view re-localization (consumed by Component 5 calibrator)". |
| `results_report.md` | row 23 | "GPS_INPUT output begins within 60s of boot" → "within 30s of boot (95th percentile)" (aligns with AC-NEW-1). |
| `results_report.md` | row 25 | "inits ESKF with high uncertainty" → "re-initialises Component 5 calibrator state with high uncertainty"; recovery time "≤70s" → "≤30s". |
| `results_report.md` | row 38 | "LiteSAM/XFeat ≤330ms inline" → "SP+LG (TRT FP16/INT8) inline ≤200ms; LiteSAM re-loc fallback ≤2000ms". |
| `acceptance_criteria.md` | AC-4.3 | added v1-scope clause: ODOMETRY emission disabled in v1 (per solution_draft03 finding M-30, EKF3 issues #30076/#32506); `EK3_SRC1_*=GPS+Compass`; tests assert ODOMETRY is intentionally absent on the wire in v1; ODOMETRY re-enabled in v1.1 once F-T9 SITL passes. |
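The corrected row 2 criterion ("≥50% within 20m") amounts to a per-frame distance check against GPS truth. The following is a minimal sketch, assuming estimated and ground-truth positions are available as `(lat, lon)` pairs in decimal degrees. The function names and sample coordinates are hypothetical, and the haversine formula on a spherical Earth is a swapped-in approximation, not necessarily the distance method the project uses.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(estimates, truths, radius_m=20.0):
    """Fraction of frames whose position error is at most `radius_m`."""
    if not estimates or len(estimates) != len(truths):
        raise ValueError("need equally sized, non-empty position lists")
    errors = [haversine_m(*est, *truth) for est, truth in zip(estimates, truths)]
    return sum(err <= radius_m for err in errors) / len(errors)


# Hypothetical sample: one frame about 11 m off, one about 111 m off.
truths = [(48.0, 37.0), (48.0, 37.001)]
estimates = [(48.0001, 37.0), (48.001, 37.001)]
fraction = fraction_within(estimates, truths)
print(fraction, fraction >= 0.50)
```

At 20 m radii the spherical approximation error is negligible compared with the tolerance, which is why a simpler formula is acceptable for a sketch like this.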
## Locked-in user decisions (carry into Phase 2)
| ID | Decision | Phase-2 implication |
|----|----------|---------------------|
| D1 | Apply 4 stale-doc fixes inline (done above). | Phase 2 reads `results_report.md` v2 (post-edit) and the new AC-4.3 clause as authoritative. |
| D2 | Camera/altitude mismatch: 60-image slice is **pipeline-correctness corpus only** — does NOT validate GSD-band assumptions, latency budgets, or matcher resolution sweeps for the deployed 1km AGL / 20MP path. | `tests/test-data.md` MUST state: corpus shot at 400m AGL with ADTi 26S v2 (26MP, 6252×4168, 25mm, 23.5mm sensor). Tests scoped to "pipeline correctness" only. AC-1.1/AC-1.2/AC-2.1/AC-2.2/AC-NEW-8 acceptance numbers from this slice are pipeline-functional, not deployment-binding. Deployment-binding tests reference AerialVL S03 (1km AGL fixed-wing). |
| D3 | Missing satellite tiles + IMU: spec tests with **placeholder fixtures** referenced by name even though files don't yet exist. | `tests/test-data.md` declares: (a) `fixtures/satellite_tiles_AD0000xx_z20/` — z=20 ortho tiles for the bbox of `coordinates.csv`, fetched by an implementer-written script (Esri / public ortho); (b) `fixtures/imu_AD0000xx.csv` — IMU traces from SITL ArduPilot replay of `coordinates.csv` as ground-truth trajectory at 200 Hz; for AC-1.3 / AC-NEW-8 fixed-wing tests use **AerialVL S03 IMU** as the fixed-wing reference. Phase 3 hard gate will surface these as "pending data", not "remove". |
| D4 | AC-coverage gap: Phase 2 specs tests for **all 46 ACs**; deferred-data ACs get `data_status: deferred-corpus` in `traceability-matrix.md` listing the named external corpus. | `traceability-matrix.md` columns: AC-id, Test-id, Test-file, data_status (∈ {present, deferred-corpus, deferred-sitl, deferred-hil}). Rows pointing at AerialVL S03, UAV-VisLoc, AerialExtreMatch, TartanAir V2, internal Mavic, first internal fixed-wing flight, hot/cold soak chamber, multi-flight Monte Carlo, and SITL ArduPilot are emitted with the appropriate `deferred-*` token. |
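The matrix shape described in D4 can be checked mechanically. The sketch below assumes the rows of `traceability-matrix.md` have already been parsed into dictionaries keyed by the four column names; the parsing step, the `validate_rows` helper, and the sample Test-ids and file names are hypothetical.

```python
COLUMNS = ["AC-id", "Test-id", "Test-file", "data_status"]
DATA_STATUS = {"present", "deferred-corpus", "deferred-sitl", "deferred-hil"}


def validate_rows(rows):
    """Return a list of problems; an empty list means every row is well-formed."""
    problems = []
    for i, row in enumerate(rows, start=1):
        missing = [col for col in COLUMNS if not row.get(col)]
        if missing:
            problems.append(f"row {i}: missing {missing}")
        elif row["data_status"] not in DATA_STATUS:
            problems.append(f"row {i}: unknown data_status {row['data_status']!r}")
    return problems


# Hypothetical rows; the Test-id and Test-file values are placeholders.
rows = [
    {"AC-id": "AC-1.1", "Test-id": "T-001", "Test-file": "tests/example.md", "data_status": "present"},
    {"AC-id": "AC-NEW-5", "Test-id": "T-002", "Test-file": "tests/example.md", "data_status": "deferred-hil"},
    {"AC-id": "AC-8.1", "Test-id": "T-003", "Test-file": "tests/example.md", "data_status": "pending"},
]
problems = validate_rows(rows)
print(problems)
# prints: ["row 3: unknown data_status 'pending'"]
```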
## Open contradictions still standing (NOT auto-fixed)
None for Phase 2 entry. AC-4.3 dual-channel framing was the only remaining one and it was resolved by the v1-scope clause (D1) — AC text intact, v1 implementation scoped to GPS_INPUT only.
## Known data dependencies for Phase 2 to spec around
| Dependency | Status | Phase-2 treatment |
|------------|--------|-------------------|
| z=20 satellite tiles for the `coordinates.csv` bbox | Missing | Fixture name declared in `test-data.md`, script TODO (implementer task in Plan Step 3 / Decompose). |
| IMU traces synced to `coordinates.csv` frames | Missing | SITL replay declared as fixture; AerialVL S03 used for fixed-wing AC-1.3 / AC-NEW-8. |
| AerialVL S03 / UAV-VisLoc / AerialExtreMatch / 2chADCNN / TartanAir V2 | External, not yet downloaded | `data_status: deferred-corpus` in matrix; Decompose creates a "dataset acquisition" task. |
| First internal fixed-wing flight footage | Pending field-test plan | `data_status: deferred-corpus`. |
| SITL ArduPilot environment (PR #30080 pinned version) | Not yet provisioned | `data_status: deferred-sitl`. |
| Hot/cold soak chamber (AC-NEW-5) | Bench equipment | `data_status: deferred-hil`. |
| 8-h synthetic load fixture (AC-NEW-3 FDR) | Synthesizable | Declared as fixture, generated at impl time. |
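Because the satellite-tile and IMU fixtures are declared by name before the files exist, a presence check can report them as "pending data" rather than failing. This is a minimal sketch under that assumption: the fixture paths are copied as written in D3 (including the `xx` placeholder), while `fixture_status` and the choice of root directory are hypothetical.

```python
from pathlib import Path

# Fixture names as declared in D3; the "xx" placeholder is kept verbatim.
FIXTURES = {
    "satellite tiles": "fixtures/satellite_tiles_AD0000xx_z20",
    "IMU traces": "fixtures/imu_AD0000xx.csv",
}


def fixture_status(root, fixtures=FIXTURES):
    """Map each fixture name to 'present' or 'pending data' under `root`."""
    root = Path(root)
    return {
        name: "present" if (root / rel).exists() else "pending data"
        for name, rel in fixtures.items()
    }


# Hypothetical usage: check relative to the current working directory.
for name, status in fixture_status(".").items():
    print(f"{name}: {status}")
```

Reporting a status string instead of raising keeps missing fixtures visible as work to do without implying that the dependent tests should be removed.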
## Phase 2 entry checklist (READY)
- [x] Phase 1 BLOCKING gate cleared (user confirmed coverage decisions).
- [x] Stale-doc fixes applied (D1).
- [x] Findings preserved here for resume in a fresh conversation.
- [ ] Phase 2 will read this file first, then read solution.md / AC / restrictions / results_report.md as needed for each artifact.