[AZ-962] [AZ-964] [AZ-965] operator_replay.yaml + Tier-2 wiring

AZ-962 SHIPPED — Tier-2 Jetson AZ-840 orchestrator test no longer
SKIPs at the env-var gate. configs/operator_replay.yaml registers
c6/c7/c10/c11 with sane defaults (backbones intentionally empty,
see AZ-965); docker-compose.test.jetson.yml exports
GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml
and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains
SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url
and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key
so secrets flow from .env.test and never sit in YAML. README drops
the manual export step. 97/97 c11 + config unit tests stay green.
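
A minimal sketch of how the two new ENV_KEY_MAP entries could overlay
environment values onto the loaded YAML. The mapping keys and dotted
paths are taken from this message; `apply_env_overrides` and its
signature are hypothetical, and the real loader may be shaped
differently.

```python
from copy import deepcopy
from typing import Any, Dict, Mapping

# Mirror of the two entries named in this commit message. The real
# ENV_KEY_MAP lives in the project's config loader and may hold more keys.
ENV_KEY_MAP: Dict[str, str] = {
    "SATELLITE_PROVIDER_URL": "c11_tile_manager.satellite_provider_url",
    "SATELLITE_PROVIDER_API_KEY": "c11_tile_manager.service_api_key",
}


def apply_env_overrides(
    config: Mapping[str, Any],
    environ: Mapping[str, str],
    key_map: Mapping[str, str] = ENV_KEY_MAP,
) -> Dict[str, Any]:
    """Return a copy of `config` with mapped environment values applied.

    Hypothetical helper: unset variables leave the YAML value untouched,
    and missing intermediate sections are created on demand.
    """
    merged = deepcopy(dict(config))
    for env_name, dotted_path in key_map.items():
        value = environ.get(env_name)
        if value is None:
            continue  # variable not exported; keep whatever YAML provided
        *parents, leaf = dotted_path.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged
```

Called as `apply_env_overrides(yaml_config, os.environ)`, this keeps the
API key out of operator_replay.yaml: the YAML carries only non-secret
defaults and the secret arrives from the environment at load time.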

Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed /
1 xpassed / 2 errors in 84.99s vs baseline 3 skipped — i.e. -2
skipped, +2 errors): AZ-840 orchestrator test moves from SKIP to
ERROR with a deeper, real gate — IndexUnavailableError on
FaissDescriptorIndex against a fresh c6_tile_cache.root_dir.

AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839
C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for
NetVLAD ONNX backbone provisioning — the next gate the orchestrator
test will hit once FAISS clears.
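
A minimal sketch of the bootstrap AZ-964 describes, assuming the fix is
to seed an index file inside the per-test root_dir before the index is
built. `ensure_descriptor_index` and `write_seed` are hypothetical
names; the `descriptor.index` filename is the one reported in the
IndexUnavailableError.

```python
from pathlib import Path
from typing import Callable


def ensure_descriptor_index(
    root_dir: Path,
    write_seed: Callable[[Path], None],
    filename: str = "descriptor.index",
) -> Path:
    """Create a seed index under `root_dir` if none exists yet.

    Hypothetical helper. `write_seed` is injected so the sketch stays
    independent of FAISS; with FAISS installed it could be, for example,
    lambda p: faiss.write_index(faiss.IndexFlatL2(dim), str(p))
    where `dim` is an assumed descriptor dimension.
    """
    index_path = Path(root_dir) / filename
    if not index_path.exists():
        # A fresh tmp root_dir may not exist on disk yet.
        index_path.parent.mkdir(parents=True, exist_ok=True)
        write_seed(index_path)
    return index_path
```

Running this in the fixture before the index is built would write the
seed into the overridden tmp dir instead of a fixed system path, and
the existence check leaves an already-populated cache untouched.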

Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 →
AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral
directive (2026-05-29) unchanged — still gated behind Derkachi
e2e green, still NOT MET.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-29 16:42:55 +03:00
parent 92ba7997a9
commit 763d8b21ad
9 changed files with 272 additions and 6 deletions
+1 -1
@@ -8,7 +8,7 @@ status: in_progress
sub_step:
phase: 6
name: implement-tasks
detail: "batch 9 = Tier-2 Jetson e2e validation run NOT GREEN. Ran `JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`; result = 4 failed / 48 passed / 3 skipped / 1 xfailed / 1 xpassed in 90.59s. Two distinct blockers: (1) AZ-840 orchestrator test SKIPPED because `GPS_DENIED_OPERATOR_CONFIG_PATH` not exported by `docker-compose.test.jetson.yml` AND `operator_replay.yaml` missing from repo — Epic AZ-835's 'Done' status was validated by doc-content only, never by actual orchestrator test execution; (2) AZ-895 fallout — 4 tests in `test_derkachi_1min.py` regress with `EstimatorFatalError('eskf filter divergence: mahalanobis²=212.311 > 100.0')` at frame 233 because the CSV-driven path (now primary) runs open-loop on the Derkachi fixture (no reference C6 tile cache → no satellite anchoring). Filed AZ-962 (3 SP, operator config + compose wiring) and AZ-963 (3 SP, ESKF regression triage). OKVIS2 chain stays deferred per user 2026-05-29 directive ('after Derkachi e2e green' — directive unchanged; e2e not green). AZ-842 caveat: the AZ-840/AZ-842 'Done' tracker state set earlier today is contingent on whether convention (A) 'In Testing = shipped' or (B) 'Done = shipped+tested' applies; user-skipped convention question, leftover holds the walk-back payload if needed. Cycle-4 not green. Earlier same-day batch 8 = tracker-only fix for AZ-842 (To Do → Done, read-back verified) + wider Jira drift audit recorded as `_docs/_process_leftovers/2026-05-29_jira_status_drift_audit.md`. 10 cycle-3/4 tickets (AZ-836/838/839/840/894/895/896/899/900/901) shipped to `done/` locally but stuck in 'In Testing' in Jira; Epic AZ-835 in `todo/` with all 5 children done. User skipped A/B/C/D convention question — leftover holds the bulk-transition payload for whichever convention they pick. **Corrected cycle-4 todo/ remainder**: nothing actionable. 
Earlier narratives that listed AZ-899/900/901 as 'cycle-4 todo/ remainder for next batches' were fiction — those specs have been in done/ the whole time. OKVIS2 chain (AZ-943/951/952) sits in todo/ but is deferred per user 2026-05-29 directive until after Derkachi e2e flight test passes. Cycle-4 product work is effectively complete pending Derkachi e2e green + AZ-897 UI in ../ui."
detail: "batch 10 = AZ-962 SHIPPED end-to-end + 2 new tickets filed. Implemented `configs/operator_replay.yaml` (registers c6/c7/c10/c11 with defaults; `backbones: []` intentionally — see AZ-965), `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml` + bind-mounts `./configs:/opt/configs:ro`, `ENV_KEY_MAP` (`src/gps_denied_onboard/config/loader.py`) gained two entries (`SATELLITE_PROVIDER_URL` → `c11.satellite_provider_url`, `SATELLITE_PROVIDER_API_KEY` → `c11.service_api_key`) so secrets flow from `.env.test` and never land in YAML, and README dropped the manual export step. 97/97 c11+config unit tests stay green. Tier-2 re-run on Jetson AGX Orin (`JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`): 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s — i.e. -2 skipped, +2 errors vs the prior baseline. AZ-962 AC-3 + AC-4 satisfied: AZ-840 orchestrator no longer SKIPs at env-var; it now ERRORs at a deeper, real gate during fixture setup with `IndexUnavailableError: FaissDescriptorIndex: .index file missing at /tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index`. Same error in `test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache` confirms fixture-wide bug, not a single-test issue. Root cause: `conftest.py:487` calls `build_descriptor_index(config)` against a fresh empty `c6_tile_cache.root_dir` (tmp dir per AZ-839 invariant) — FAISS factory needs an existing `.index`. `tile-init` compose service exists but writes its seed to `/var/lib/gps-denied/tiles`, not the tmp dir the fixture overrides into. Filed AZ-964 (3 SP, To Do, FAISS index bootstrap; preferred fix = invoke `mk_test_faiss_fixture.py` inline against override `root_dir`) and AZ-965 (3 SP, To Do, blocked by AZ-964, NetVLAD ONNX backbone provisioning — the next gate after FAISS clears). AZ-962 transitioned To Do → In Progress → Done in Jira (read-back verified). 
AZ-962 spec moved todo/ → done/. **Cycle-4 e2e gate still NOT GREEN**: AZ-840 chain is now AZ-964 → AZ-965 → orchestrator PASS; 60s smoke is AZ-963 → 4 derkachi_1min tests PASS. OKVIS2 deferral directive still in force (not yet met). Earlier same-day batch 9 = Tier-2 Jetson e2e validation run NOT GREEN. Ran `JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`; result = 4 failed / 48 passed / 3 skipped / 1 xfailed / 1 xpassed in 90.59s. Two distinct blockers: (1) AZ-840 orchestrator test SKIPPED because `GPS_DENIED_OPERATOR_CONFIG_PATH` not exported by `docker-compose.test.jetson.yml` AND `operator_replay.yaml` missing from repo — Epic AZ-835's 'Done' status was validated by doc-content only, never by actual orchestrator test execution; (2) AZ-895 fallout — 4 tests in `test_derkachi_1min.py` regress with `EstimatorFatalError('eskf filter divergence: mahalanobis²=212.311 > 100.0')` at frame 233 because the CSV-driven path (now primary) runs open-loop on the Derkachi fixture (no reference C6 tile cache → no satellite anchoring). Filed AZ-962 (3 SP, operator config + compose wiring) and AZ-963 (3 SP, ESKF regression triage). OKVIS2 chain stays deferred per user 2026-05-29 directive ('after Derkachi e2e green' — directive unchanged; e2e not green). AZ-842 caveat: the AZ-840/AZ-842 'Done' tracker state set earlier today is contingent on whether convention (A) 'In Testing = shipped' or (B) 'Done = shipped+tested' applies; user-skipped convention question, leftover holds the walk-back payload if needed. Cycle-4 not green. Earlier same-day batch 8 = tracker-only fix for AZ-842 (To Do → Done, read-back verified) + wider Jira drift audit recorded as `_docs/_process_leftovers/2026-05-29_jira_status_drift_audit.md`. 10 cycle-3/4 tickets (AZ-836/838/839/840/894/895/896/899/900/901) shipped to `done/` locally but stuck in 'In Testing' in Jira; Epic AZ-835 in `todo/` with all 5 children done. 
User skipped A/B/C/D convention question — leftover holds the bulk-transition payload for whichever convention they pick. **Corrected cycle-4 todo/ remainder**: nothing actionable. Earlier narratives that listed AZ-899/900/901 as 'cycle-4 todo/ remainder for next batches' were fiction — those specs have been in done/ the whole time. OKVIS2 chain (AZ-943/951/952) sits in todo/ but is deferred per user 2026-05-29 directive until after Derkachi e2e flight test passes. Cycle-4 product work is effectively complete pending Derkachi e2e green + AZ-897 UI in ../ui."
retry_count: 0
cycle: 4
tracker: jira