[AZ-962] [AZ-963] Cycle-4 Tier-2 e2e validation NOT GREEN; file 2 follow-up tickets

Ran the Tier-2 Jetson e2e harness on Jetson AGX Orin:
  JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh
Result: 4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed in 90.59s.

Two distinct cycle-4 blockers surfaced:

Blocker 1 (AZ-962, 3 SP):
The AZ-840 orchestrator test that should prove the full 7-step
cycle-4 pipeline works was SKIPPED, not PASSed.
docker-compose.test.jetson.yml does not export
GPS_DENIED_OPERATOR_CONFIG_PATH and the operator_replay.yaml the
README references does not exist anywhere in the repo. Epic AZ-835's
'Done' status across its children was validated by doc-content
presence only, never by actual test execution.

Blocker 2 (AZ-963, 3 SP):
4 tests in test_derkachi_1min.py (AZ-265/AZ-404 60s smoke) regress
with EstimatorFatalError('eskf filter divergence: mahalanobis²=212.31
> 100.0') at frame 233. AZ-895 made the CSV-driven path primary; the
CSV path runs open-loop on the Derkachi fixture (no reference C6 tile
cache -> no satellite anchoring -> ESKF integrates open-loop ->
diverges in ~10s). Before AZ-895 the tlog path was primary and
exited cleanly. test_ac3_within_100m_80pct_of_ticks also XPASSed -
third silent-failure surface flagged for investigation in AZ-963.

Filed both as separate Jira Tasks (see local specs in
_docs/02_tasks/todo/AZ-962_*.md and AZ-963_*.md for full payload,
ACs, options, run-log evidence).

OKVIS2 chain (AZ-943/951/952) stays deferred per user 2026-05-29
directive — Derkachi e2e is not green, directive unchanged.

AZ-842 caveat: the AZ-840/AZ-842 'Done' tracker state I set earlier
today (commits 10c2a1e / 2cc992d) is contingent on whether
convention (A) 'In Testing = shipped' or (B) 'Done = shipped+tested'
applies; user-skipped convention question, the leftover at
_docs/_process_leftovers/2026-05-29_jira_status_drift_audit.md holds
the walk-back payload if (A).

No production code changes.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-29 13:43:52 +03:00
parent 2cc992da4a
commit 92ba7997a9
4 changed files with 213 additions and 2 deletions
File diff suppressed because one or more lines are too long
@@ -0,0 +1,100 @@
# AZ-962 — Wire `GPS_DENIED_OPERATOR_CONFIG_PATH` + `operator_replay.yaml` into Tier-2 Jetson harness
**Status**: To Do (Jira) / `todo/` (local)
**Issue type**: Task
**Complexity**: 3 SP
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-962
**Filed**: 2026-05-29 during cycle-4 Tier-2 validation run
## Why
Discovered 2026-05-29 during cycle-4 e2e validation run on Tier-2 Jetson AGX Orin. The AZ-840 orchestrator test (`tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration`) — the test that's supposed to prove the full 7-step pipeline works end-to-end — was SKIPPED with:
```
AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH pointing at a YAML
that registers c6_tile_cache + c7_inference + c10_provisioning + c11_tile_manager blocks
(Jetson e2e harness sets this; dev macOS does not)
```
Two gaps:
1. `docker-compose.test.jetson.yml` does NOT export `GPS_DENIED_OPERATOR_CONFIG_PATH` despite the comment claiming the Jetson harness sets it. Grep confirms the env var is absent from the compose file.
2. The YAML the README's Tier-2 invocation references (`/workspace/configs/operator_replay.yaml`) does NOT exist anywhere in the repo. No `configs/` directory, no `**/operator*.yaml` match.
Net effect: the cycle-4 closure narrative (Epic AZ-835 + children AZ-836/AZ-838/AZ-839/AZ-840/AZ-842 all marked Done) was based on AC verification by **doc-content presence**, not by the orchestrator test actually running. The test has never been demonstrated to PASS end-to-end on the Jetson harness automatically. This is the exact failure mode `meta-rule.mdc` warns against ("Tests that pass by skipping the component they are supposed to exercise create false confidence").
## Goal
Make the AZ-840 orchestrator test actually runnable on `bash scripts/run-tests-jetson.sh` (no out-of-band manual env-var setup). The test must either PASS, or fail with a NEW, real, attributable error that lands in a follow-up ticket — not skip silently.
## Scope
1. **Author `configs/operator_replay.yaml`** (final location TBD — `configs/` at repo root, or `tests/fixtures/operator_replay.yaml`, or another location consistent with the project's config conventions).
* Must register at minimum: `c6_tile_cache`, `c7_inference`, `c10_provisioning`, `c11_tile_manager` (the four blocks `conftest.py:322-326` and `_build_operator_pre_flight_cache` consume).
* Schema must match what `load_config` parses (see `gps_denied_onboard/config/loader.py`).
* Component types must match what the runtime factories build (see `tests/e2e/replay/conftest.py:430-462` for the `c6_tile_cache.root_dir` override pattern).
* Imagery / FAISS settings sized for Derkachi fixture: route-driven seeding (AZ-836 / AZ-838), HNSW32 FAISS index, NetVLAD descriptors.
2. **Wire the env var into `docker-compose.test.jetson.yml`**:
* Add `GPS_DENIED_OPERATOR_CONFIG_PATH: /opt/configs/operator_replay.yaml` to the `e2e-runner.environment` block.
* Add a read-only bind mount for the configs dir: `./configs:/opt/configs:ro`.
* Verify the README's "Tier-2 invocation" example matches what the compose does automatically — no manual `export GPS_DENIED_OPERATOR_CONFIG_PATH=...` step required.
3. **Re-run Tier-2 and capture the verdict**:
* `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh`
* Confirm the AZ-840 test no longer skips with the env-var or config-file gate.
* Capture the verdict-report (`_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`) if PASS, or capture the new failure mode for follow-up ticket if FAIL.
4. **Update README** if the wiring story now differs from the documented one.
## Acceptance Criteria
* **AC-1**: `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH` pointing at a YAML that is bind-mounted into the e2e-runner container.
* **AC-2**: `configs/operator_replay.yaml` (or equivalent final path) exists in the repo, registers all 4 required component blocks (`c6_tile_cache` + `c7_inference` + `c10_provisioning` + `c11_tile_manager`), and is consumable by `load_config(os.environ, paths=[config_path])` without `KeyError`.
* **AC-3**: `bash scripts/run-tests-jetson.sh` no longer reports `SKIPPED [127]: AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH ...` for `test_az840_e2e_real_flight_orchestration`.
* **AC-4**: The orchestrator test either PASSes (and the verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` is captured), or fails with a NEW error that is filed as a separate follow-up ticket (don't paper over the failure — failing test + new ticket is the honest outcome).
* **AC-5**: README's `### AZ-835 orchestrator test` section accurately describes what `scripts/run-tests-jetson.sh` does (no "set this env var manually" step required when running via the script).
## Out of scope
* The 4 regression failures in `test_derkachi_1min.py` (separate AZ-963 ticket).
* AZ-895 deprecation rollback.
* Adding a reference C6 tile cache for the Derkachi fixture (large separate work).
* Updating cycle-4 closure narrative / re-opening AZ-840/AZ-842 status decisions — those are tracker-state questions the user owns.
## Dependencies
* **AZ-835** (parent Epic, currently To Do in Jira but tracker-drift suspected) — this ticket closes a real validation gap in that Epic's deliverable.
* **AZ-839** (C3 fixture, Done locally / In Testing in Jira) — this ticket provides the missing input the fixture's skip-gate complains about.
* **AZ-840** (C4 orchestrator test, Done locally / In Testing in Jira) — this ticket makes that test actually run.
## Estimate
3 SP. Multi-step (YAML + compose wiring + verification re-run), moderate complexity (YAML schema must match runtime factories' expectations), moderate risk (might need iterative tuning on the first re-run).
## Run-log evidence (2026-05-29 Tier-2)
```
JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh
...
e2e-runner-1 | collected 57 items
e2e-runner-1 | tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration SKIPPED [ 1%]
...
e2e-runner-1 | = 4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed, 1 warning in 90.59s (0:01:30) =
e2e-runner-1 | SKIPPED [1] tests/e2e/replay/test_az835_e2e_real_flight.py:127:
AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH pointing at a YAML
that registers c6_tile_cache + c7_inference + c10_provisioning + c11_tile_manager blocks
(Jetson e2e harness sets this; dev macOS does not)
```
## References
* Compose: `docker-compose.test.jetson.yml`
* Test: `tests/e2e/replay/test_az835_e2e_real_flight.py:127`
* Skip-gate definition: `tests/e2e/replay/conftest.py:343-388`
* README: `tests/e2e/replay/README.md` § `AZ-835 orchestrator test`
* Sibling ticket (parallel work): AZ-963 — 60s smoke regression
@@ -0,0 +1,111 @@
# AZ-963 — Fix Derkachi 60s smoke regressions: ESKF divergence on CSV-only path with no satellite anchoring (AZ-895 fallout)
**Status**: To Do (Jira) / `todo/` (local)
**Issue type**: Task
**Complexity**: 3 SP (may bump to 5 SP after triage if option B is chosen)
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-963
**Filed**: 2026-05-29 during cycle-4 Tier-2 validation run
## Why
Discovered 2026-05-29 during cycle-4 e2e validation run on Tier-2 Jetson AGX Orin. Four tests in `tests/e2e/replay/test_derkachi_1min.py` regressed to FAIL after the AZ-895 deprecation made the CSV-driven replay path primary:
* `test_ac1_exits_0_jsonl_count_match` — expects exit 0, got exit 1
* `test_ac5_determinism_two_runs_diff` — expects two PASSing runs to diff cleanly, both exit 1
* `test_ac6_pace_realtime_60s_within_5pct` — expects realtime pace within 5%, exits 1 before timing measurement is meaningful
* `test_ac6_pace_asap_under_30s` — expects asap under 30s, exits 1 in ~13s with fatal error
All four fail with the same root cause:
```
ERROR c5.state.eskf_filter_divergence kv={"source":"vio","mahalanobis_sq":212.31,"threshold_sq":100.0}
ERROR replay_loop.state_add_vio_fatal frame=233
EstimatorFatalError('eskf filter divergence on vio: mahalanobis²=212.311 > 100.0')
```
The CSV-driven path (now primary since AZ-895 deprecation) runs **open-loop** — the Derkachi fixture has no reference C6 tile cache so C2 VPR / C3 matcher / C4 pose-anchor stages are not wired:
```
WARN replay_loop.satellite_anchoring_not_wired: frame=0 — C2 VPR / C4 pose-anchor stages are not wired
in this run (Derkachi has no reference tile cache); estimator runs open-loop on VIO + IMU. Expect
monotonically growing position error.
```
After ~10s of open-loop integration, ESKF Mahalanobis distance exceeds the 100.0 threshold at frame 233 and the runner crashes with a non-zero exit code. The 4 tests don't care about accuracy but they require a clean exit — which they can't get on the CSV-only path.
**Why this matters now**: before AZ-895, the tlog path was the primary replay surface and presumably exited cleanly (with some warning about divergence) without raising `EstimatorFatalError`. The AZ-895 deprecation didn't account for the runtime-semantic difference between the two paths in test fixtures that depended on "runner exits 0 even without satellite anchoring".
## Related XPASS finding (in scope to investigate, may split into sub-ticket)
`test_ac3_within_100m_80pct_of_ticks` showed up as XPASS in the same run. It was marked xfail because "AC-3 requires the C1+C2+C3+C4+C5 satellite-re-anchoring pipeline. Blocked by AZ-777...". XPASS means "marked xfail but unexpectedly passed" — which is impossible per the documented physics (open-loop ESKF can't meet ≤80% within 100m). Either the test is silently no-oping into a pass, or the xfail mark is stale, or the new semantics changed something that fixed it. Worth investigating because it could be a third silent-failure surface.
## Goal
The 4 currently-failing tests must either PASS, or have an explicit gating decision (xfail with a tracked reason, or skip with the right mark) that doesn't silently hide AC coverage. The AC matrix in the README must accurately reflect what's measured vs what's deferred.
This ticket does NOT mandate a specific fix — the right answer requires triage. Options on the table:
* **A**: Loosen the ESKF divergence threshold in the test harness path (changes production code; risky — the threshold exists for a real safety reason)
* **B**: Add a reference C6 tile cache for Derkachi so satellite anchoring works (AZ-777 follow-up scope; large; the fixture has no anchorable imagery yet)
* **C**: Gate the 4 tests behind a "satellite anchoring required" mark and skip them on the open-loop path (preserves the tests as documentation; doesn't restore AC coverage)
* **D**: Mark the divergence-driven failures as expected (xfail with rationale: "open-loop ESKF diverges on this fixture")
* **E**: Investigate why AC-3 XPASSes and whether that finding changes AD
* **F**: Some combination after triage
## Acceptance Criteria
* **AC-1**: All 4 currently-failing tests (`test_ac1_exits_0_jsonl_count_match`, `test_ac5_determinism_two_runs_diff`, `test_ac6_pace_realtime_60s_within_5pct`, `test_ac6_pace_asap_under_30s`) are either PASSing or have an explicit gating decision with a tracked Jira reference — NOT silently disabled.
* **AC-2**: The `test_ac3_within_100m_80pct_of_ticks` XPASS is investigated and either becomes a real PASS (xfail mark removed with rationale) or stays xfail with an updated rationale (one of the two; not both, not silent).
* **AC-3**: No regression to the documented AC matrix in `tests/e2e/replay/README.md` § `AC matrix` — every AC row is still being measured in some form (PASS / honest xfail / honest skip with reason), and the README accurately reflects the current state.
* **AC-4**: The fix does not bring back the AZ-895-deprecated auto-sync surface (`--time-offset-ms`, `--skip-auto-sync-validation` CLI flags must remain deprecated).
* **AC-5**: A short triage memo lives at `_docs/03_implementation/batch_*_az963_triage.md` (or equivalent batch report) explaining which of options AF was chosen and why, with the run-log evidence cited.
## Out of scope
* AZ-840 orchestrator test (separate AZ-962 ticket).
* Reverting AZ-895 to restore the tlog path as primary.
* Building a reference C6 tile cache for Derkachi (separate large work).
* Tracker-state cleanup for AZ-840 / AZ-842 (separate user decision).
## Dependencies
* **AZ-895** (Done locally / In Testing in Jira) — this ticket addresses fallout from that deprecation.
* **AZ-265 / AZ-404** (60s suite epic) — the regressed tests are deliverables of that epic.
* **AZ-777** (Phase 3 superseded) — referenced in the existing xfail rationale; understanding why it's superseded informs the triage.
* **AZ-962** (sibling) — the AZ-840 orchestrator test is blocked by a different gap; both are cycle-4 e2e closure work but they're independent and can be worked in parallel.
## Estimate
3 SP. Investigation + triage + implementation. May bump to 5 SP if option B (build reference tile cache) is chosen — in that case split into sub-tickets per the user's complexity-budget rule (≤5 SP per ticket).
## Run-log evidence (2026-05-29 Tier-2)
```
e2e-runner-1 | = 4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed, 1 warning in 90.59s (0:01:30) =
e2e-runner-1 | FAILED tests/e2e/replay/test_derkachi_1min.py::test_ac1_exits_0_jsonl_count_match
e2e-runner-1 | FAILED tests/e2e/replay/test_derkachi_1min.py::test_ac5_determinism_two_runs_diff
e2e-runner-1 | FAILED tests/e2e/replay/test_derkachi_1min.py::test_ac6_pace_realtime_60s_within_5pct
e2e-runner-1 | FAILED tests/e2e/replay/test_derkachi_1min.py::test_ac6_pace_asap_under_30s
e2e-runner-1 | XPASS tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks
```
Excerpt from the stdout of the first failure (representative of all 4):
```
{"ts":"2026-05-29T10:34:50.397901Z","level":"ERROR","component":"c5_state.eskf_baseline",
"kind":"c5.state.eskf_filter_divergence",
"kv":{"source":"vio","mahalanobis_sq":212.31115250586484,"threshold_sq":100.0}}
{"ts":"2026-05-29T10:34:50.398356Z","level":"ERROR","component":"runtime_root.replay_loop",
"kind":"replay_loop.state_add_vio_fatal",
"msg":"replay_loop.state_add_vio_fatal: frame=233 EstimatorFatalError('eskf filter divergence on vio: mahalanobis²=212.311 > 100.0')"}
```
## References
* Failing tests: `tests/e2e/replay/test_derkachi_1min.py:82, 387, 417, 433`
* XPASS: `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks`
* ESKF threshold: `c5_state.eskf_baseline` (Mahalanobis² 100.0 threshold)
* Satellite-anchoring-not-wired warning: `runtime_root.replay_loop:replay_loop.satellite_anchoring_not_wired`
* README AC matrix: `tests/e2e/replay/README.md` § `AC matrix`
* Sibling ticket (parallel work): AZ-962 — orchestrator config wiring
+1 -1
View File
@@ -8,7 +8,7 @@ status: in_progress
sub_step:
phase: 6
name: implement-tasks
detail: "batch 8 = tracker-only fix for AZ-842 (To Do → Done, read-back verified) + wider Jira drift audit recorded as `_docs/_process_leftovers/2026-05-29_jira_status_drift_audit.md`. 10 cycle-3/4 tickets (AZ-836/838/839/840/894/895/896/899/900/901) shipped to `done/` locally but stuck in 'In Testing' in Jira; Epic AZ-835 in `todo/` with all 5 children done. User skipped A/B/C/D convention question — leftover holds the bulk-transition payload for whichever convention they pick. **Corrected cycle-4 todo/ remainder**: nothing actionable. Earlier narratives that listed AZ-899/900/901 as 'cycle-4 todo/ remainder for next batches' were fiction — those specs have been in done/ the whole time. OKVIS2 chain (AZ-943/951/952) sits in todo/ but is deferred per user 2026-05-29 directive until after Derkachi e2e flight test passes. Cycle-4 product work is effectively complete pending Derkachi e2e green + AZ-897 UI in ../ui."
detail: "batch 9 = Tier-2 Jetson e2e validation run NOT GREEN. Ran `JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`; result = 4 failed / 48 passed / 3 skipped / 1 xfailed / 1 xpassed in 90.59s. Two distinct blockers: (1) AZ-840 orchestrator test SKIPPED because `GPS_DENIED_OPERATOR_CONFIG_PATH` not exported by `docker-compose.test.jetson.yml` AND `operator_replay.yaml` missing from repo — Epic AZ-835's 'Done' status was validated by doc-content only, never by actual orchestrator test execution; (2) AZ-895 fallout — 4 tests in `test_derkachi_1min.py` regress with `EstimatorFatalError('eskf filter divergence: mahalanobis²=212.311 > 100.0')` at frame 233 because the CSV-driven path (now primary) runs open-loop on the Derkachi fixture (no reference C6 tile cache → no satellite anchoring). Filed AZ-962 (3 SP, operator config + compose wiring) and AZ-963 (3 SP, ESKF regression triage). OKVIS2 chain stays deferred per user 2026-05-29 directive ('after Derkachi e2e green' — directive unchanged; e2e not green). AZ-842 caveat: the AZ-840/AZ-842 'Done' tracker state set earlier today is contingent on whether convention (A) 'In Testing = shipped' or (B) 'Done = shipped+tested' applies; user-skipped convention question, leftover holds the walk-back payload if needed. Cycle-4 not green. Earlier same-day batch 8 = tracker-only fix for AZ-842 (To Do → Done, read-back verified) + wider Jira drift audit recorded as `_docs/_process_leftovers/2026-05-29_jira_status_drift_audit.md`. 10 cycle-3/4 tickets (AZ-836/838/839/840/894/895/896/899/900/901) shipped to `done/` locally but stuck in 'In Testing' in Jira; Epic AZ-835 in `todo/` with all 5 children done. User skipped A/B/C/D convention question — leftover holds the bulk-transition payload for whichever convention they pick. **Corrected cycle-4 todo/ remainder**: nothing actionable. Earlier narratives that listed AZ-899/900/901 as 'cycle-4 todo/ remainder for next batches' were fiction — those specs have been in done/ the whole time. OKVIS2 chain (AZ-943/951/952) sits in todo/ but is deferred per user 2026-05-29 directive until after Derkachi e2e flight test passes. Cycle-4 product work is effectively complete pending Derkachi e2e green + AZ-897 UI in ../ui."
retry_count: 0
cycle: 4
tracker: jira