[AZ-962] [AZ-964] [AZ-965] operator_replay.yaml + Tier-2 wiring

AZ-962 SHIPPED — Tier-2 Jetson AZ-840 orchestrator test no longer
SKIPs at the env-var gate. configs/operator_replay.yaml registers
c6/c7/c10/c11 with sane defaults (backbones intentionally empty,
see AZ-965); docker-compose.test.jetson.yml exports
GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml
and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains
SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url
and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key
so secrets flow from .env.test and never sit in YAML. README drops
the manual export step. 97/97 c11 + config unit tests stay green.

Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed /
1 xpassed / 2 errors in 84.99s vs baseline 3 skipped — i.e. -2
skipped, +2 errors): AZ-840 orchestrator test moves from SKIP to
ERROR with a deeper, real gate — IndexUnavailableError on
FaissDescriptorIndex against a fresh c6_tile_cache.root_dir.

AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839
C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for
NetVLAD ONNX backbone provisioning — the next gate the orchestrator
test will hit once FAISS clears.

Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 →
AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral
directive (2026-05-29) unchanged — still gated behind Derkachi
e2e green, still NOT MET.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-29 16:42:55 +03:00
parent 92ba7997a9
commit 763d8b21ad
9 changed files with 272 additions and 6 deletions
@@ -0,0 +1,109 @@
# AZ-962 — Wire `GPS_DENIED_OPERATOR_CONFIG_PATH` + `operator_replay.yaml` into Tier-2 Jetson harness
**Status**: Done (Jira) / `done/` (local)
**Issue type**: Task
**Complexity**: 3 SP
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-962
**Filed**: 2026-05-29 during cycle-4 Tier-2 validation run
**Shipped**: 2026-05-29 (same day)
## Closure note (2026-05-29)
Shipped: `configs/operator_replay.yaml` authored (registers all 4 blocks c6/c7/c10/c11), `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml` and bind-mounts `./configs:/opt/configs:ro`, and `ENV_KEY_MAP` (`src/gps_denied_onboard/config/loader.py`) gained two entries, `SATELLITE_PROVIDER_URL` → `c11_tile_manager.satellite_provider_url` and `SATELLITE_PROVIDER_API_KEY` → `c11_tile_manager.service_api_key`, so secrets stay out of the YAML and flow in from `.env.test`. The README (`tests/e2e/replay/README.md`) was updated to drop the manual `export GPS_DENIED_OPERATOR_CONFIG_PATH=...` step.
Tier-2 re-run on Jetson AGX Orin (`JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`): 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s. AC-3 satisfied — `test_az840_e2e_real_flight_orchestration` no longer SKIPs at the env-var gate. AC-4 satisfied — it now ERRORs at a deeper, real gate (`IndexUnavailableError: FaissDescriptorIndex: .index file missing at /tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index`) which is captured in a NEW follow-up ticket **AZ-964**. The empty-backbones gate that this spec originally flagged (c10 backbones) becomes the gate AFTER AZ-964 clears — filed as **AZ-965**.
Net cycle-4 status remains NOT GREEN (orchestrator test still doesn't PASS, blocked by AZ-964 + AZ-965; ESKF divergence regression still blocked by AZ-963). AZ-962 itself is complete.
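The env-var-to-config mapping described in this closure note can be illustrated with a minimal sketch. The two map entries are the ones named above; everything else (the `apply_env_overrides` helper, the `cache_ttl_s` key, the example URL and key values) is hypothetical, and the real `ENV_KEY_MAP` in `loader.py` may be shaped and applied differently.

```python
from typing import Any, Mapping

# Env var -> dotted config path. The two entries mirror the ones named in this
# ticket; the real ENV_KEY_MAP in loader.py may be structured differently.
ENV_KEY_MAP = {
    "SATELLITE_PROVIDER_URL": "c11_tile_manager.satellite_provider_url",
    "SATELLITE_PROVIDER_API_KEY": "c11_tile_manager.service_api_key",
}


def apply_env_overrides(
    config: dict[str, dict[str, Any]], environ: Mapping[str, str]
) -> dict[str, dict[str, Any]]:
    """Return a copy of config with mapped env vars written at their dotted paths.

    Hypothetical helper: assumes every top-level block is a flat mapping.
    """
    merged = {block: dict(values) for block, values in config.items()}
    for env_name, dotted_path in ENV_KEY_MAP.items():
        if env_name not in environ:
            continue  # leave whatever the YAML provided (or omitted) untouched
        block, key = dotted_path.split(".", 1)
        merged.setdefault(block, {})[key] = environ[env_name]
    return merged


# Hypothetical YAML content: no secrets in the file itself.
yaml_config = {"c11_tile_manager": {"cache_ttl_s": 3600}}
# Hypothetical values, standing in for what .env.test would supply.
env = {
    "SATELLITE_PROVIDER_URL": "https://tiles.example.invalid",
    "SATELLITE_PROVIDER_API_KEY": "dummy-key",
}
effective = apply_env_overrides(yaml_config, env)
```

The point of the pattern is that the committed YAML never carries the URL or API key; they only appear in the effective configuration at load time.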
## Why
Discovered 2026-05-29 during cycle-4 e2e validation run on Tier-2 Jetson AGX Orin. The AZ-840 orchestrator test (`tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration`) — the test that's supposed to prove the full 7-step pipeline works end-to-end — was SKIPPED with:
```
AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH pointing at a YAML
that registers c6_tile_cache + c7_inference + c10_provisioning + c11_tile_manager blocks
(Jetson e2e harness sets this; dev macOS does not)
```
Two gaps:
1. `docker-compose.test.jetson.yml` does NOT export `GPS_DENIED_OPERATOR_CONFIG_PATH` despite the comment claiming the Jetson harness sets it. Grep confirms the env var is absent from the compose file.
2. The YAML the README's Tier-2 invocation references (`/workspace/configs/operator_replay.yaml`) does NOT exist anywhere in the repo. No `configs/` directory, no `**/operator*.yaml` match.
Net effect: the cycle-4 closure narrative (Epic AZ-835 + children AZ-836/AZ-838/AZ-839/AZ-840/AZ-842 all marked Done) was based on AC verification by **doc-content presence**, not by the orchestrator test actually running. The test has never been demonstrated to PASS end-to-end on the Jetson harness automatically. This is the exact failure mode `meta-rule.mdc` warns against ("Tests that pass by skipping the component they are supposed to exercise create false confidence").
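To make the failure mode concrete, here is a rough sketch of an env-var gate of the kind the skip message implies. It is not the code in `conftest.py`; the `skip_reason` function and its messages are hypothetical. In a pytest fixture, a non-`None` reason would typically be handed to `pytest.skip(reason)`, which is how a missing export turns into a silent SKIP rather than a failure.

```python
import os
import tempfile
from typing import Mapping, Optional

ENV_VAR = "GPS_DENIED_OPERATOR_CONFIG_PATH"


def skip_reason(environ: Mapping[str, str]) -> Optional[str]:
    """Return why the orchestrator test cannot run, or None when the gate is open.

    Hypothetical stand-in for the real skip gate; it checks only the env var
    and the file's existence, not the YAML content.
    """
    config_path = environ.get(ENV_VAR)
    if not config_path:
        return f"{ENV_VAR} is not set"
    if not os.path.isfile(config_path):
        return f"{ENV_VAR} points at a missing file: {config_path}"
    return None


# Gate closed: this is the situation the compose file was in before this ticket.
reason_when_unset = skip_reason({})

# Gate open: the variable points at a file that exists.
with tempfile.NamedTemporaryFile(suffix=".yaml") as handle:
    reason_when_present = skip_reason({ENV_VAR: handle.name})
```

With the variable never exported by the harness, a gate like this returns a reason on every run, so the test it guards is never exercised.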
## Goal
Make the AZ-840 orchestrator test actually runnable on `bash scripts/run-tests-jetson.sh` (no out-of-band manual env-var setup). The test must either PASS, or fail with a NEW, real, attributable error that lands in a follow-up ticket — not skip silently.
## Scope
1. **Author `configs/operator_replay.yaml`** (final location TBD — `configs/` at repo root, or `tests/fixtures/operator_replay.yaml`, or another location consistent with the project's config conventions).
* Must register at minimum: `c6_tile_cache`, `c7_inference`, `c10_provisioning`, `c11_tile_manager` (the four blocks `conftest.py:322-326` and `_build_operator_pre_flight_cache` consume).
* Schema must match what `load_config` parses (see `gps_denied_onboard/config/loader.py`).
* Component types must match what the runtime factories build (see `tests/e2e/replay/conftest.py:430-462` for the `c6_tile_cache.root_dir` override pattern).
* Imagery / FAISS settings sized for Derkachi fixture: route-driven seeding (AZ-836 / AZ-838), HNSW32 FAISS index, NetVLAD descriptors.
2. **Wire the env var into `docker-compose.test.jetson.yml`**:
* Add `GPS_DENIED_OPERATOR_CONFIG_PATH: /opt/configs/operator_replay.yaml` to the `e2e-runner.environment` block.
* Add a read-only bind mount for the configs dir: `./configs:/opt/configs:ro`.
* Verify the README's "Tier-2 invocation" example matches what the compose does automatically — no manual `export GPS_DENIED_OPERATOR_CONFIG_PATH=...` step required.
3. **Re-run Tier-2 and capture the verdict**:
* `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh`
* Confirm the AZ-840 test no longer skips with the env-var or config-file gate.
* Capture the verdict report (`_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`) on PASS, or capture the new failure mode for a follow-up ticket on FAIL.
4. **Update README** if the wiring story now differs from the documented one.
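The compose wiring in step 2 can be sanity-checked with a small sketch like the one below. It assumes the service definition has already been parsed into a plain dict, that `environment` uses the mapping form, and that volumes use the short `source:target[:mode]` string syntax; the `wiring_problems` helper is hypothetical and not part of the repo.

```python
ENV_VAR = "GPS_DENIED_OPERATOR_CONFIG_PATH"


def wiring_problems(service: dict) -> list[str]:
    """List reasons the env var would not resolve to a read-only mounted file.

    Hypothetical check; it does not distinguish bind mounts from named volumes.
    """
    config_path = service.get("environment", {}).get(ENV_VAR)
    if config_path is None:
        return [f"{ENV_VAR} is not exported"]
    for volume in service.get("volumes", []):
        parts = volume.split(":")
        if len(parts) < 2:
            continue  # anonymous volume with no source, cannot supply the file
        target = parts[1].rstrip("/")
        mode = parts[2] if len(parts) > 2 else "rw"
        if config_path.startswith(target + "/"):
            # The config path lives under this mount target.
            return [] if mode == "ro" else [f"{target} is mounted {mode}, expected ro"]
    return [f"no volume mount covers {config_path}"]


# The e2e-runner wiring as described in the scope above.
e2e_runner = {
    "environment": {ENV_VAR: "/opt/configs/operator_replay.yaml"},
    "volumes": ["./configs:/opt/configs:ro"],
}
problems = wiring_problems(e2e_runner)
```

An empty `problems` list means the exported path falls under a read-only mount target, which is the property AC-1 asks for.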
## Acceptance Criteria
* **AC-1**: `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH` pointing at a YAML that is bind-mounted into the e2e-runner container.
* **AC-2**: `configs/operator_replay.yaml` (or equivalent final path) exists in the repo, registers all 4 required component blocks (`c6_tile_cache` + `c7_inference` + `c10_provisioning` + `c11_tile_manager`), and is consumable by `load_config(os.environ, paths=[config_path])` without `KeyError`.
* **AC-3**: `bash scripts/run-tests-jetson.sh` no longer reports `SKIPPED [1] tests/e2e/replay/test_az835_e2e_real_flight.py:127: AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH ...` for `test_az840_e2e_real_flight_orchestration`.
* **AC-4**: The orchestrator test either PASSes (and the verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` is captured), or fails with a NEW error that is filed as a separate follow-up ticket (don't paper over the failure — failing test + new ticket is the honest outcome).
* **AC-5**: README's `### AZ-835 orchestrator test` section accurately describes what `scripts/run-tests-jetson.sh` does (no "set this env var manually" step required when running via the script).
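The block-registration part of AC-2 amounts to a presence check on the parsed config. The sketch below runs against a plain mapping rather than the real `load_config` result, whose return type is not shown here; `missing_blocks` and the `backbones` key shape are hypothetical.

```python
from typing import Mapping

# The four blocks the acceptance criteria name.
REQUIRED_BLOCKS = (
    "c6_tile_cache",
    "c7_inference",
    "c10_provisioning",
    "c11_tile_manager",
)


def missing_blocks(config: Mapping[str, object]) -> list[str]:
    """Return the required block names that the config does not register."""
    return [name for name in REQUIRED_BLOCKS if name not in config]


# Hypothetical parsed YAML: every block is registered, and c10 carries an
# intentionally empty backbones list (key name assumed for illustration).
parsed = {
    "c6_tile_cache": {},
    "c7_inference": {},
    "c10_provisioning": {"backbones": []},
    "c11_tile_manager": {},
}
absent = missing_blocks(parsed)
```

A block that is present but sparsely populated still counts as registered under this check, so the check passing says nothing about whether the runtime factories can build a working component from it.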
## Out of scope
* The 4 regression failures in `test_derkachi_1min.py` (separate AZ-963 ticket).
* AZ-895 deprecation rollback.
* Adding a reference C6 tile cache for the Derkachi fixture (large separate work).
* Updating cycle-4 closure narrative / re-opening AZ-840/AZ-842 status decisions — those are tracker-state questions the user owns.
## Dependencies
* **AZ-835** (parent Epic, currently To Do in Jira but tracker-drift suspected) — this ticket closes a real validation gap in that Epic's deliverable.
* **AZ-839** (C3 fixture, Done locally / In Testing in Jira) — this ticket provides the missing input the fixture's skip-gate complains about.
* **AZ-840** (C4 orchestrator test, Done locally / In Testing in Jira) — this ticket makes that test actually run.
## Estimate
3 SP. Multi-step (YAML + compose wiring + verification re-run), moderate complexity (YAML schema must match runtime factories' expectations), moderate risk (might need iterative tuning on the first re-run).
## Run-log evidence (2026-05-29 Tier-2)
```
JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh
...
e2e-runner-1 | collected 57 items
e2e-runner-1 | tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration SKIPPED [ 1%]
...
e2e-runner-1 | = 4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed, 1 warning in 90.59s (0:01:30) =
e2e-runner-1 | SKIPPED [1] tests/e2e/replay/test_az835_e2e_real_flight.py:127:
AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH pointing at a YAML
that registers c6_tile_cache + c7_inference + c10_provisioning + c11_tile_manager blocks
(Jetson e2e harness sets this; dev macOS does not)
```
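The before/after comparison between this baseline log and the re-run counts quoted in the closure note can be worked through mechanically. The sketch below parses the two count strings as they appear in this ticket; `parse_counts` and `TEST_OUTCOMES` are illustrative names, and the regular expression is only meant to handle these two strings, not pytest summaries in general.

```python
import re

# Outcomes that correspond to test items (warnings are not test items).
TEST_OUTCOMES = ("failed", "passed", "skipped", "xfailed", "xpassed", "errors")


def parse_counts(summary: str) -> dict[str, int]:
    """Extract '<number> <outcome>' pairs from a summary string."""
    return {
        outcome: int(number)
        for number, outcome in re.findall(r"\b(\d+) ([a-z]+)\b", summary)
    }


# Baseline: the summary line from the run log above.
baseline = parse_counts(
    "4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed, 1 warning in 90.59s (0:01:30)"
)
# Re-run: the counts as quoted in the closure note.
rerun = parse_counts(
    "4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s"
)

# Keep only the outcomes whose count changed between the two runs.
delta = {
    outcome: rerun.get(outcome, 0) - baseline.get(outcome, 0)
    for outcome in TEST_OUTCOMES
    if rerun.get(outcome, 0) != baseline.get(outcome, 0)
}
baseline_total = sum(baseline.get(outcome, 0) for outcome in TEST_OUTCOMES)
rerun_total = sum(rerun.get(outcome, 0) for outcome in TEST_OUTCOMES)
```

The only movement is two fewer skips and two more errors, and both runs sum to 57, the same figure as the `collected 57 items` line in the baseline log.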
## References
* Compose: `docker-compose.test.jetson.yml`
* Test: `tests/e2e/replay/test_az835_e2e_real_flight.py:127`
* Skip-gate definition: `tests/e2e/replay/conftest.py:343-388`
* README: `tests/e2e/replay/README.md` § `AZ-835 orchestrator test`
* Sibling ticket (parallel work): AZ-963 — 60s smoke regression