mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 08:31:13 +00:00
[autodev] step-11 path-3: calibration fix + harness drift report

Attempted Path-3 (Full SITL with community images) for the SUT Reality
Gate. Discovered sitl_observer is offline-fixture replay, not a live
SITL client -- compose-file SITL services in environment.md are
aspirational. The real Path-3 needs the fixture builders + SUT CLI
end-to-end, which surfaced 5 additional integration drifts (H-10..H-14)
on top of the prior 9.

Fixes:
- tests/fixtures/calibration/adti26.json: body_to_camera_se3 was a
  {rotation_xyzw, translation_xyz_m} dict; runtime_root/_replay_branch.py
  loader strictly expects a 4x4 SE3. Identity quaternion + zero
  translation = identity 4x4, semantically equivalent.

New files:
- tests/fixtures/replay_config_minimal.yaml: minimal replay-mode config
  for harness reproduction (mode=replay, ardupilot_plane defaults).
- .gitignore: e2e/fixtures/sitl_replay/ (generated by build_p0X_fixtures).

Documentation:
- Step 11 report: appended Path-3 attempt section.
- Leftover doc: H-10..H-14 ticket payloads added.
- Autodev state: reflects Path-3 outcome.

Step 11 stays blocked; H-13 (auto-sync AC-8 hard-fails on stationary
fixtures) requires a SUT design decision and cannot be unilaterally
fixed mid-session.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -43,6 +43,7 @@ tests/fixtures/flight_derkachi/*.h264
tests/fixtures/flight_derkachi/*.tlog
tests/fixtures/tiles_corpus/*.jpg
tests/fixtures/tiles_corpus/*.png
e2e/fixtures/sitl_replay/

# Editor / OS noise
.idea/
@@ -172,9 +172,46 @@ Out of scope for this report; documented in `environment.md` § Execution instru
3. **Add the preventive meta-rule** about transcript-verified test claims, if approved.
4. **Resume Step 11 after Track 1 completes** — at minimum get one real Reality Gate signal from `tests/e2e/replay/`. Track 2 can run in parallel as its own work stream and feed back into Step 11 cycle 2.

## Path 3 attempt — Full SITL with community images (2026-05-17, post-blocker)

Per user direction, attempted the "Full path" rehab: switch ArduPilot SITL to `sparlane/ardupilot-sitl:Plane-latest` (verified pullable), build iNav SITL from source, write MAVProxy Dockerfile, then run FT-P-01 / FT-P-02 against the real fixture builders.

**Key reframe discovered during attempt**: `e2e/runner/helpers/sitl_observer.py` is **pure offline fixture replay**, not a live SITL client (see file docstring + `_FdrReplayObserver` class). Setting `E2E_SITL_REPLAY_DIR=...` switches the observer to read pre-built JSON fixtures (`observer_<fc_kind>_<host>.json`). No live SITL container needed for the existing blackbox FT-P-* and NFT-* tests. The compose-file SITL services in `environment.md` are aspirational future state.

So the realistic Full Path is:

1. Install SUT locally (`pip install -e .`) — DONE.
2. Run `e2e.fixtures.sitl_replay_builder.build_p01_fixtures` to produce `e2e/fixtures/sitl_replay/p01/` — BLOCKED (see below).
3. Run pytest on `e2e/tests/positive/test_ft_p_01_still_image_accuracy.py` with `E2E_SITL_REPLAY_DIR=e2e/fixtures/sitl_replay/p01` — BLOCKED on step 2.

Trying step 2 surfaced **5 new integration drifts** (H-10..H-14), on top of H-1..H-9 from the prior section:

| ID | Severity | Description | Status |
|----|----------|-------------|--------|
| H-10 | blocker | Fixture builder calls `gps-denied-replay --fdr-out PATH`. The CLI's actual arg name is `--output`. | not fixed |
| H-11 | blocker | Fixture builder doesn't pass the CLI's required `--camera-calibration`, `--config`, `--mavlink-signing-key` args. Need to add fields to `FixtureBuilderConfig` and update `build_p01_fixtures.py` / `build_p02_fixtures.py`. | not fixed |
| H-12 | medium | `tests/fixtures/calibration/adti26.json` declared `body_to_camera_se3` as `{rotation_xyzw, translation_xyz_m}` dict; loader at `runtime_root/_replay_branch.py:308` strictly expects a 4×4 matrix via `np.asarray(..., dtype=np.float64)`. The dict form was never parseable. | **fixed** — converted to 4×4 identity (`tests/fixtures/calibration/adti26.json`). Equivalent rotation/translation, no behavior change. |
| H-13 | blocker | Auto-sync AC-8 validation hard-fails on still-image + stationary fixtures even when `--time-offset-ms 0` is supplied. Validator computes a "frame-window match %" (default 95% threshold) that requires real video motion + IMU takeoff signal. The FT-P-01 fixture (60 stills + stationary IMU) has neither by design. No `--skip-auto-sync` or `--accept-low-confidence-offset` escape hatch exists. | not fixed |
| H-14 | env-conditional | CLI requires env vars including `BUILD_REPLAY_SINK_JSONL=ON` to use `NoopMavlinkTransport`. This is documented in code comments but not in `.env.example`. | needs doc update |

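The H-12 claim that the old dict form and the new 4×4 form encode the same transform can be checked by hand. The sketch below is not the project's loader; `se3_from_quat_translation` is a hypothetical helper written for illustration. It assumes the quaternion is stored in `(x, y, z, w)` order, as the `rotation_xyzw` key name suggests, and that the matrix is row-major with the translation in the last column.

```python
from typing import List, Sequence


def se3_from_quat_translation(
    rotation_xyzw: Sequence[float], translation_xyz_m: Sequence[float]
) -> List[List[float]]:
    """Build a row-major 4x4 homogeneous transform (hypothetical helper).

    Assumes (x, y, z, w) quaternion order and a translation in metres.
    """
    x, y, z, w = rotation_xyzw
    norm = (x * x + y * y + z * z + w * w) ** 0.5
    if norm == 0.0:
        raise ValueError("zero-norm quaternion cannot represent a rotation")
    # Normalise so slightly off-unit inputs still give a proper rotation.
    x, y, z, w = x / norm, y / norm, z / norm, w / norm
    tx, ty, tz = translation_xyz_m
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), tx],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), ty],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Values taken from the old dict form of adti26.json.
old_form = {
    "rotation_xyzw": [0.0, 0.0, 0.0, 1.0],
    "translation_xyz_m": [0.0, 0.0, 0.0],
}
converted = se3_from_quat_translation(
    old_form["rotation_xyzw"], old_form["translation_xyz_m"]
)
```

For the fixture's values, `converted` equals the 4×4 identity that the fix writes into the JSON, which is why the change is shape-only. A non-identity calibration would need the same conversion before being stored in the matrix form.
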
Total live harness drift count: **14 distinct items** (3 fixed, 11 deferred). Each H-10..H-13 individually takes 30-60 min to fix with the right design decisions; together they exceed the safe single-session budget given the surface-area uncertainty.

**Pattern**: The fixture builders (AZ-598/599/600), the CLI signature (AZ-401/402), the calibration JSON schema, and the replay protocol auto-sync (AZ-405) were each implemented well in isolation but never integrated end-to-end. This is exactly what the SUT Reality Gate is designed to surface.

### Path 3 verdict

**Cannot reach the SUT Reality Gate in this session.** Even after fixing H-12, the next gate (H-13: auto-sync hard-fail on stationary fixtures) requires a design decision: either expand the auto-sync escape hatch in the SUT, or change the fixture builder to inject a single-frame motion event, or relax AC-8 validation thresholds for stationary scenarios. Each is a non-trivial design call that warrants a Jira ticket and review, not a unilateral mid-session fix.

### Updated recommendation

Track 2 ("Full blackbox harness") from the previous section needs to expand to include H-10..H-14 as additional sub-stories. Realistic effort: **+1-2 days** on top of the prior estimate. Path 3 is achievable but requires 3-5 days of focused harness rehab, not a single session.

## Artifacts

- Commit `eb6dc17` — csv_reporter / pytest-csv fix
- Commit `6ce3158` — e2e/docker harness drift fixes (H-1, H-2, H-3)
- Local fix (uncommitted, ready to commit): `tests/fixtures/calibration/adti26.json` — H-12 4×4 SE3 fix
- Local fix (uncommitted, ready to commit): `tests/fixtures/replay_config_minimal.yaml` — minimal config for path-3 reproduction
- This report: `_docs/03_implementation/run_tests_step11_report.md`
- Leftover for pytest-csv ticket: `_docs/_process_leftovers/2026-05-17_csv_reporter_pytest_csv_conflict.md`
- Leftover for harness epic: `_docs/_process_leftovers/2026-05-17_e2e_harness_rehabilitation.md`
@@ -20,4 +20,4 @@ last_step_outcomes:
   step_8: "Code is testable — no changes needed (testability_assessment.md committed; no list-of-changes, no source edits)"
   step_9: "41 blackbox test tasks (AZ-406..AZ-446) under epic AZ-262 in _docs/02_tasks/todo/ pre-existing; AZ-406 test-infra bootstrap pre-existing. Folder fallback satisfied. No Step-9 work executed in cycle 1."
   step_10: "41 of 41 blackbox-test tasks done (AZ-406..AZ-446). Final report at _docs/03_implementation/implementation_report_tests.md. Full-suite gate handed off to test-run skill per implement Step 16."
-  step_11: "Local Tier-1 pytest: 3343 pass / 88 skip / 0 fail (after csv_reporter fix in eb6dc17). SUT Reality Gate UNMET — both docker harnesses blocked by pre-existing drift (3 SITL images missing + 5 other bugs). Full report: _docs/03_implementation/run_tests_step11_report.md. Tickets deferred to leftovers."
+  step_11: "Local Tier-1 pytest: 3343 pass / 88 skip / 0 fail (after csv_reporter fix in eb6dc17). SUT Reality Gate UNMET — both docker harnesses blocked by pre-existing drift (now 14 distinct items: 3 fixed, 11 deferred). Full report: _docs/03_implementation/run_tests_step11_report.md. Path-3 attempt on 2026-05-17 21:30 surfaced H-10..H-14 (fixture-builder/CLI/calibration/auto-sync integration drifts); discovered sitl_observer is offline-fixture replay, NOT live SITL — compose-file SITL services are aspirational. Tickets deferred to leftovers."
@@ -86,6 +86,83 @@ description: |
story_points: 2
```

### Story: H-10 — Fixture builder uses wrong CLI flag
```yaml
type: Story
summary: "[Bug] sitl_replay_builder uses --fdr-out; CLI requires --output"
description: |
  e2e/fixtures/sitl_replay_builder/builder.py:79 passes `--fdr-out` to
  `gps-denied-replay`. The CLI's actual flag (src/gps_denied_onboard/cli/replay.py:90)
  is `--output`. Also need to add the CLI's other required args
  (--camera-calibration, --config, --mavlink-signing-key) — see H-11.
  Bundle H-10 + H-11 in one PR. Unit tests in
  e2e/_unit_tests/fixtures/test_sitl_replay_builder_builder.py assert on
  `--fdr-out` and need to be updated.
story_points: 2
```

### Story: H-11 — Fixture builder missing required CLI args
```yaml
type: Story
summary: "[Bug] sitl_replay_builder doesn't pass camera-calibration/config/signing-key"
description: |
  gps-denied-replay requires --camera-calibration PATH, --config PATH,
  --mavlink-signing-key PATH. Fixture builder omits all three. Add
  fields to FixtureBuilderConfig with defaults pointing at
  tests/fixtures/calibration/adti26.json, a new
  tests/fixtures/replay_config_minimal.yaml, and
  tests/fixtures/mavlink_signing/dev_key. Also set
  BUILD_REPLAY_SINK_JSONL=ON in the subprocess env.
story_points: 2
```

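A minimal sketch of what the bundled H-10 + H-11 fix might look like on the builder side. It only assembles the flags and the environment variable named in the two stories above; `ReplayInvocation` and `build_replay_command` are hypothetical names, not the existing `FixtureBuilderConfig` API, and the real CLI may require further arguments (input video, telemetry, and so on) that these tickets do not list.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ReplayInvocation:
    """Hypothetical stand-in for the fields H-11 proposes adding."""

    output: Path
    # Defaults mirror the paths proposed in the H-11 story.
    camera_calibration: Path = Path("tests/fixtures/calibration/adti26.json")
    config: Path = Path("tests/fixtures/replay_config_minimal.yaml")
    mavlink_signing_key: Path = Path("tests/fixtures/mavlink_signing/dev_key")


def build_replay_command(
    inv: ReplayInvocation, base_env: Optional[Mapping[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a gps-denied-replay subprocess call."""
    argv = [
        "gps-denied-replay",
        "--output", str(inv.output),  # H-10: was --fdr-out
        "--camera-calibration", str(inv.camera_calibration),
        "--config", str(inv.config),
        "--mavlink-signing-key", str(inv.mavlink_signing_key),
    ]
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["BUILD_REPLAY_SINK_JSONL"] = "ON"  # H-14: required in replay mode
    return argv, env


argv, env = build_replay_command(
    ReplayInvocation(output=Path("e2e/fixtures/sitl_replay/p01/fdr.jsonl")),
    base_env={"PATH": "/usr/bin"},
)
```

The pair can be handed to `subprocess.run(argv, env=env, check=True)`. Returning the command instead of running it keeps the unit tests mentioned in H-10 free of subprocess mocking: they can assert on `argv` directly. The output file name in the usage line is illustrative only.
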
### Bug: H-12 — Calibration JSON shape drift (FIXED)
```yaml
type: Bug
summary: "[Bug] adti26.json body_to_camera_se3 used dict form; loader expects 4x4"
description: |
  tests/fixtures/calibration/adti26.json declared body_to_camera_se3 as
  {rotation_xyzw, translation_xyz_m}. _replay_branch.py:308 does
  np.asarray(..., dtype=np.float64) which can't decode the dict. Fixed
  by converting to the equivalent 4x4 identity matrix. Both forms encode
  the same SE3 (identity) so no behavior change.
story_points: 1
status_after_create: "Done"
```

### Story: H-13 — Auto-sync hard-fails on stationary fixtures
```yaml
type: Story
summary: "[Bug] AC-8 auto-sync validation rejects stationary FT-P-01 fixture"
description: |
  Auto-sync (src/gps_denied_onboard/replay_input/...) hard-fails when
  --time-offset-ms 0 is supplied for a fixture with stationary IMU + no
  video motion (FT-P-01 still-image scenario). Threshold:
  frame_window_match_pct_threshold=95% in ReplayAutoSyncConfig defaults.
  Three possible fixes (design decision needed):
  a) Add --skip-auto-sync CLI flag that bypasses AC-8 validation entirely
     when time_offset_ms is explicitly supplied
  b) Lower or expose match_threshold_pct via config (already configurable
     but not surfaced in fixture builder)
  c) Change fixture builder to inject a single motion event so auto-sync
     can find SOMETHING to align on
  Recommend (a): aligns with replay protocol intent ("manual offset
  bypasses auto-sync entirely" per ReplayConfig docstring).
story_points: 3
```

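To make option (a) concrete for the design review, here is a sketch of the decision logic it implies. Nothing in it exists in the SUT today: `resolve_time_offset`, `SyncDecision`, and `AutoSyncError` are hypothetical names, the `--skip-auto-sync` flag is the one proposed above, and the 95% default is taken from the story text. The actual AC-8 validator may compute and apply the match percentage differently.

```python
from dataclasses import dataclass
from typing import Optional


class AutoSyncError(RuntimeError):
    """Raised when the offset cannot be accepted (hypothetical)."""


@dataclass(frozen=True)
class SyncDecision:
    time_offset_ms: float
    validated: bool  # False when AC-8 validation was bypassed


def resolve_time_offset(
    manual_offset_ms: Optional[float],
    skip_auto_sync: bool,
    frame_window_match_pct: float,
    threshold_pct: float = 95.0,
) -> SyncDecision:
    """Pick the replay time offset under proposed option (a)."""
    if skip_auto_sync:
        # The bypass is only meaningful with an explicit operator offset.
        if manual_offset_ms is None:
            raise AutoSyncError("--skip-auto-sync requires --time-offset-ms")
        return SyncDecision(manual_offset_ms, validated=False)
    if manual_offset_ms is None:
        raise AutoSyncError("offset estimation is out of scope for this sketch")
    # Current behaviour as described in H-13: validate even a manual offset.
    if frame_window_match_pct < threshold_pct:
        raise AutoSyncError(
            f"frame-window match {frame_window_match_pct:.1f}% "
            f"is below the {threshold_pct:.1f}% threshold"
        )
    return SyncDecision(manual_offset_ms, validated=True)


# Stationary FT-P-01 case: no motion, so the match percentage is low.
decision = resolve_time_offset(0.0, skip_auto_sync=True, frame_window_match_pct=0.0)
```

Carrying `validated=False` in the result lets downstream reports distinguish a bypassed offset from a verified one, which may matter when deciding whether a stationary-fixture run counts toward AC-8.
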
### Story: H-14 — Document BUILD_REPLAY_SINK_JSONL in .env.example
```yaml
type: Story
summary: "[Doc] add BUILD_REPLAY_SINK_JSONL=ON to .env.example for replay mode"
description: |
  src/gps_denied_onboard/components/c8_fc_adapter/noop_mavlink_transport.py
  requires BUILD_REPLAY_SINK_JSONL=ON env var to construct. Not in
  .env.example. Add with comment explaining it's a replay-mode requirement
  per replay protocol Invariant 9.
story_points: 1
```

### Story: H-1..H-3 — fixes already committed
```yaml
type: Story
@@ -6,10 +6,12 @@
     [0.0, 0.0, 1.0]
   ],
   "distortion": [0.0, 0.0, 0.0, 0.0, 0.0],
-  "body_to_camera_se3": {
-    "rotation_xyzw": [0.0, 0.0, 0.0, 1.0],
-    "translation_xyz_m": [0.0, 0.0, 0.0]
-  },
+  "body_to_camera_se3": [
+    [1.0, 0.0, 0.0, 0.0],
+    [0.0, 1.0, 0.0, 0.0],
+    [0.0, 0.0, 1.0, 0.0],
+    [0.0, 0.0, 0.0, 1.0]
+  ],
   "acquisition_method": "calibration_target_aligned",
   "metadata": {
     "note": "Test fixture; replaced in production by adti20.json."
@@ -0,0 +1,13 @@
# Minimal replay-mode config for blackbox/e2e fixture builds (path-3 harness rehab).
# Defaults from src/gps_denied_onboard/config/schema.py apply for everything else.
__top__:
  mode: replay

runtime:
  fc_profile: ardupilot_plane
  tier: 1
  inference_backend: pytorch_fp16

replay:
  pace: asap
  target_fc_dialect: ardupilot_plane