[AZ-962] [AZ-964] [AZ-965] operator_replay.yaml + Tier-2 wiring

AZ-962 SHIPPED — Tier-2 Jetson AZ-840 orchestrator test no longer
SKIPs at the env-var gate. configs/operator_replay.yaml registers
c6/c7/c10/c11 with sane defaults (backbones intentionally empty,
see AZ-965); docker-compose.test.jetson.yml exports
GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml
and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains
SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url
and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key
so secrets flow from .env.test and never sit in YAML. README drops
the manual export step. 97/97 c11 + config unit tests stay green.

Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed /
1 xpassed / 2 errors in 84.99s vs baseline 3 skipped — i.e. -2
skipped, +2 errors): AZ-840 orchestrator test moves from SKIP to
ERROR with a deeper, real gate — IndexUnavailableError on
FaissDescriptorIndex against a fresh c6_tile_cache.root_dir.

AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839
C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for
NetVLAD ONNX backbone provisioning — the next gate the orchestrator
test will hit once FAISS clears.

Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 →
AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral
directive (2026-05-29) unchanged — still gated behind Derkachi
e2e green, still NOT MET.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-29 16:42:55 +03:00
parent 92ba7997a9
commit 763d8b21ad
9 changed files with 272 additions and 6 deletions
@@ -1,100 +0,0 @@
# AZ-962 — Wire `GPS_DENIED_OPERATOR_CONFIG_PATH` + `operator_replay.yaml` into Tier-2 Jetson harness
**Status**: To Do (Jira) / `todo/` (local)
**Issue type**: Task
**Complexity**: 3 SP
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-962
**Filed**: 2026-05-29 during cycle-4 Tier-2 validation run
## Why
Discovered 2026-05-29 during cycle-4 e2e validation run on Tier-2 Jetson AGX Orin. The AZ-840 orchestrator test (`tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration`) — the test that's supposed to prove the full 7-step pipeline works end-to-end — was SKIPPED with:
```
AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH pointing at a YAML
that registers c6_tile_cache + c7_inference + c10_provisioning + c11_tile_manager blocks
(Jetson e2e harness sets this; dev macOS does not)
```
Two gaps:
1. `docker-compose.test.jetson.yml` does NOT export `GPS_DENIED_OPERATOR_CONFIG_PATH` despite the comment claiming the Jetson harness sets it. Grep confirms the env var is absent from the compose file.
2. The YAML the README's Tier-2 invocation references (`/workspace/configs/operator_replay.yaml`) does NOT exist anywhere in the repo. No `configs/` directory, no `**/operator*.yaml` match.
Net effect: the cycle-4 closure narrative (Epic AZ-835 + children AZ-836/AZ-838/AZ-839/AZ-840/AZ-842 all marked Done) was based on AC verification by **doc-content presence**, not by the orchestrator test actually running. The test has never been demonstrated to PASS end-to-end on the Jetson harness automatically. This is the exact failure mode `meta-rule.mdc` warns against ("Tests that pass by skipping the component they are supposed to exercise create false confidence").
## Goal
Make the AZ-840 orchestrator test actually runnable on `bash scripts/run-tests-jetson.sh` (no out-of-band manual env-var setup). The test must either PASS, or fail with a NEW, real, attributable error that lands in a follow-up ticket — not skip silently.
## Scope
1. **Author `configs/operator_replay.yaml`** (final location TBD — `configs/` at repo root, or `tests/fixtures/operator_replay.yaml`, or another location consistent with the project's config conventions).
* Must register at minimum: `c6_tile_cache`, `c7_inference`, `c10_provisioning`, `c11_tile_manager` (the four blocks `conftest.py:322-326` and `_build_operator_pre_flight_cache` consume).
* Schema must match what `load_config` parses (see `gps_denied_onboard/config/loader.py`).
* Component types must match what the runtime factories build (see `tests/e2e/replay/conftest.py:430-462` for the `c6_tile_cache.root_dir` override pattern).
* Imagery / FAISS settings sized for Derkachi fixture: route-driven seeding (AZ-836 / AZ-838), HNSW32 FAISS index, NetVLAD descriptors.
2. **Wire the env var into `docker-compose.test.jetson.yml`**:
* Add `GPS_DENIED_OPERATOR_CONFIG_PATH: /opt/configs/operator_replay.yaml` to the `e2e-runner.environment` block.
* Add a read-only bind mount for the configs dir: `./configs:/opt/configs:ro`.
* Verify the README's "Tier-2 invocation" example matches what the compose does automatically — no manual `export GPS_DENIED_OPERATOR_CONFIG_PATH=...` step required.
3. **Re-run Tier-2 and capture the verdict**:
* `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh`
* Confirm the AZ-840 test no longer skips with the env-var or config-file gate.
* Capture the verdict-report (`_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`) if PASS, or capture the new failure mode for follow-up ticket if FAIL.
4. **Update README** if the wiring story now differs from the documented one.
## Acceptance Criteria
* **AC-1**: `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH` pointing at a YAML that is bind-mounted into the e2e-runner container.
* **AC-2**: `configs/operator_replay.yaml` (or equivalent final path) exists in the repo, registers all 4 required component blocks (`c6_tile_cache` + `c7_inference` + `c10_provisioning` + `c11_tile_manager`), and is consumable by `load_config(os.environ, paths=[config_path])` without `KeyError`.
* **AC-3**: `bash scripts/run-tests-jetson.sh` no longer reports `SKIPPED [1] tests/e2e/replay/test_az835_e2e_real_flight.py:127: AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH ...` for `test_az840_e2e_real_flight_orchestration`.
* **AC-4**: The orchestrator test either PASSes (and the verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` is captured), or fails with a NEW error that is filed as a separate follow-up ticket (don't paper over the failure — failing test + new ticket is the honest outcome).
* **AC-5**: README's `### AZ-835 orchestrator test` section accurately describes what `scripts/run-tests-jetson.sh` does (no "set this env var manually" step required when running via the script).
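As a rough illustration of the AC-2 check, the sketch below reports which of the four required blocks are absent from a parsed config. It assumes the parsed YAML is available as a mapping keyed by block name; the real schema is whatever `load_config` parses, and `missing_component_blocks` is a hypothetical helper, not an existing project function.

```python
from collections.abc import Mapping

# The four blocks the AZ-839 skip-gate message names.
REQUIRED_BLOCKS = (
    "c6_tile_cache",
    "c7_inference",
    "c10_provisioning",
    "c11_tile_manager",
)


def missing_component_blocks(blocks: Mapping[str, object]) -> list[str]:
    """Return the required block names that are absent from `blocks`."""
    return [name for name in REQUIRED_BLOCKS if name not in blocks]


# Example: a config that registers only two of the four blocks.
partial = {"c6_tile_cache": {}, "c7_inference": {}}
still_missing = missing_component_blocks(partial)
```

An empty result means all four blocks are registered; it says nothing about whether each block's contents match what the runtime factories expect.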
## Out of scope
* The 4 regression failures in `test_derkachi_1min.py` (separate AZ-963 ticket).
* AZ-895 deprecation rollback.
* Adding a reference C6 tile cache for the Derkachi fixture (large separate work).
* Updating cycle-4 closure narrative / re-opening AZ-840/AZ-842 status decisions — those are tracker-state questions the user owns.
## Dependencies
* **AZ-835** (parent Epic, currently To Do in Jira but tracker-drift suspected) — this ticket closes a real validation gap in that Epic's deliverable.
* **AZ-839** (C3 fixture, Done locally / In Testing in Jira) — this ticket provides the missing input the fixture's skip-gate complains about.
* **AZ-840** (C4 orchestrator test, Done locally / In Testing in Jira) — this ticket makes that test actually run.
## Estimate
3 SP. Multi-step (YAML + compose wiring + verification re-run), moderate complexity (YAML schema must match runtime factories' expectations), moderate risk (might need iterative tuning on the first re-run).
## Run-log evidence (2026-05-29 Tier-2)
```
JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh
...
e2e-runner-1 | collected 57 items
e2e-runner-1 | tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration SKIPPED [ 1%]
...
e2e-runner-1 | = 4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed, 1 warning in 90.59s (0:01:30) =
e2e-runner-1 | SKIPPED [1] tests/e2e/replay/test_az835_e2e_real_flight.py:127:
AZ-839 operator_pre_flight_setup requires GPS_DENIED_OPERATOR_CONFIG_PATH pointing at a YAML
that registers c6_tile_cache + c7_inference + c10_provisioning + c11_tile_manager blocks
(Jetson e2e harness sets this; dev macOS does not)
```
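The summary line above can be cross-checked against the `collected 57 items` line. The sketch below is a minimal parser for that one line, assuming the `<count> <label>` layout shown in the log; `parse_summary` is a hypothetical helper written for this note.

```python
import re

SUMMARY = (
    "= 4 failed, 48 passed, 3 skipped, 1 xfailed, 1 xpassed, "
    "1 warning in 90.59s (0:01:30) ="
)


def parse_summary(line: str) -> dict[str, int]:
    """Map each '<count> <label>' pair in a pytest summary line to its count."""
    return {label: int(count) for count, label in re.findall(r"(\d+) ([a-z]+)", line)}


counts = parse_summary(SUMMARY)
# Warnings are not test outcomes, so they are left out of the total.
total = sum(n for label, n in counts.items() if not label.startswith("warning"))
```

With the line from this run, the outcome counts add up to the 57 collected items, so no test is unaccounted for.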
## References
* Compose: `docker-compose.test.jetson.yml`
* Test: `tests/e2e/replay/test_az835_e2e_real_flight.py:127`
* Skip-gate definition: `tests/e2e/replay/conftest.py:343-388`
* README: `tests/e2e/replay/README.md` § `AZ-835 orchestrator test`
* Sibling ticket (parallel work): AZ-963 — 60s smoke regression
@@ -0,0 +1,80 @@
# AZ-964 — Bootstrap FAISS descriptor index for AZ-839 C3 fixture (`operator_pre_flight_cache`)
**Status**: To Do (Jira) / `todo/` (local)
**Issue type**: Task
**Complexity**: 3 SP
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-964
**Filed**: 2026-05-29 (surfaced by AZ-962 Tier-2 re-run)
## Why
Discovered 2026-05-29 during the AZ-962 Tier-2 re-run on Jetson AGX Orin. With `GPS_DENIED_OPERATOR_CONFIG_PATH` + `operator_replay.yaml` now correctly wired (AZ-962 shipped), the AZ-840 orchestrator test (`tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration`) no longer SKIPs. It now ERRORs at a deeper, real gate during fixture setup:
```
gps_denied_onboard.components.c6_tile_cache.errors.IndexUnavailableError:
FaissDescriptorIndex: .index file missing at
/tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index
```
The same error also breaks `test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache`, confirming this is a fixture-wide problem, not specific to one test.
## Root cause (read from code)
`tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache` (line 487):
1. Overrides `c6_tile_cache.root_dir` to a fresh `/tmp/pytest-of-root/.../operator_pre_flight_cache0/` (per AC of AZ-839, the fixture creates a *new* cache each test).
2. Calls `build_descriptor_index(config)`, which constructs the index via `FaissDescriptorIndex.from_config(config)`.
3. `FaissDescriptorIndex.__init__` calls `_load()` which **raises** `IndexUnavailableError` when no `.index` file exists at `c6_tile_cache.root_dir/descriptor.index`.
4. The fixture never gets to call `populate_c6_from_route` (which presumably creates the index downstream).
The compose `tile-init` setup service exists and runs `scripts/mk_test_faiss_fixture.py` — but it writes a seed index to `/var/lib/gps-denied/tiles` (the `tile-data` volume), **not** to the tmp dir the fixture overrides into. So the fixture's override path always starts empty.
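The path mismatch can be reproduced in miniature. The sketch below uses stand-ins only: `load_index` and the local `IndexUnavailableError` are hypothetical simplifications of the behavior described above, not the project's classes, and two temporary directories play the roles of the `tile-data` volume and the fixture's override `root_dir`.

```python
import tempfile
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Stand-in for the project's error of the same name."""


def load_index(root_dir: Path) -> bytes:
    """Hypothetical loader: fail fast when descriptor.index is absent."""
    index_path = root_dir / "descriptor.index"
    if not index_path.exists():
        raise IndexUnavailableError(f".index file missing at {index_path}")
    return index_path.read_bytes()


def seeded_elsewhere_fails() -> bool:
    """Seed one directory, load from another, and report whether loading fails."""
    with tempfile.TemporaryDirectory() as volume, tempfile.TemporaryDirectory() as override:
        # tile-init style seeding targets the volume directory...
        (Path(volume) / "descriptor.index").write_bytes(b"seed")
        # ...but the fixture points root_dir at a different, empty directory.
        try:
            load_index(Path(override))
        except IndexUnavailableError:
            return True
        return False


mismatch_reproduced = seeded_elsewhere_fails()
```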
## Goal
Make `_build_operator_pre_flight_cache` succeed past the `build_descriptor_index(config)` call so the AZ-840 orchestrator test can actually exercise the 7-step pipeline (or fail at the next real gate — c10 backbones, AZ-965).
## Scope
One of (in preference order; pick during implementation):
1. **Fixture seeds the index inline**: before calling `build_descriptor_index`, invoke `scripts/mk_test_faiss_fixture.py` programmatically (or in-process equivalent) against the override `root_dir`. Pure test-infra change.
2. **`populate_c6_from_route` creates the index if missing**: production code change. The descriptor-index factory would have to tolerate a fresh `root_dir` so that `populate_c6_from_route` is reached and can create the index. Larger blast radius because it touches a shared factory.
3. **`FaissDescriptorIndex` supports an explicit `bootstrap=True` mode**: factory signal that this run intends to create a fresh index. Requires API design.
Option (1) is the smallest, lowest-risk path and the natural extension of the `tile-init` pattern already in compose. **Recommended.**
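A minimal sketch of what option (1) could look like, assuming the fixture can call a helper before `build_descriptor_index`. `ensure_seed_index` and `placeholder_seed` are hypothetical names; the placeholder writes dummy bytes, whereas a real seeder would have to produce a valid FAISS index (for example by reusing the logic in `scripts/mk_test_faiss_fixture.py`).

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_seed_index(root_dir: Path, write_seed: Callable[[Path], None]) -> Path:
    """Create descriptor.index under root_dir if it is not there yet."""
    index_path = root_dir / "descriptor.index"
    if not index_path.exists():
        root_dir.mkdir(parents=True, exist_ok=True)
        write_seed(index_path)
    return index_path


def placeholder_seed(path: Path) -> None:
    # Placeholder only: a real seeder would write a valid FAISS index here.
    path.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    fresh_root = Path(tmp) / "operator_pre_flight_cache0"
    first = ensure_seed_index(fresh_root, placeholder_seed)
    seeded = first.exists()
    # A second call finds the file and leaves it untouched.
    second = ensure_seed_index(fresh_root, placeholder_seed)
    same_path = first == second
```

Passing the seeder as a callable keeps the helper independent of how the index bytes are produced, so the fixture could swap in the real seed routine without changing the existence check.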
## Acceptance Criteria
* **AC-1**: `_build_operator_pre_flight_cache` no longer ERRORs at `build_descriptor_index` when started against a fresh empty `c6_tile_cache.root_dir`.
* **AC-2**: `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh` no longer reports the `IndexUnavailableError` for `test_az840_e2e_real_flight_orchestration` **or** for `test_operator_pre_flight_setup_produces_populated_cache`.
* **AC-3**: If the AZ-840 orchestrator test now reaches the c10-backbone gate (`AZ-839 operator_pre_flight_setup: config has no c10_provisioning.backbones entries`), that's the expected next gate — AZ-965 handles it; AZ-964 is done.
* **AC-4**: `tests/unit` + `tests/e2e/replay/test_operator_pre_flight_*` continue to pass on Tier-1 (Colima).
## Out of scope
* c10 backbone provisioning (separate ticket — AZ-965).
* The 4 ESKF-divergence regression failures in `test_derkachi_1min.py` (separate ticket — AZ-963).
* Adding a reference C6 tile cache for the Derkachi fixture (large separate work).
* Re-opening AZ-840 / AZ-842 tracker state.
## Dependencies
* **Blocks**: AZ-840 (orchestrator test cannot run end-to-end until this clears).
* **Surfaced by**: AZ-962 (env-var + YAML wiring exposed the next gate).
* **Related**: AZ-839 (C3 fixture — this is its bug to own).
## Estimate
3 SP. Multi-step (locate the seed-index script, invoke it from the fixture before `build_descriptor_index`, verify on Tier-2), moderate risk (the seed script's assumptions might not match the fixture's override path layout).
## References
* Run log: 2026-05-29 Tier-2 Jetson AGX Orin (AZ-962 re-run), 84.99s, 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors
* Test: `tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration` (ERROR)
* Test: `tests/e2e/replay/test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache` (ERROR)
* Fixture: `tests/e2e/replay/conftest.py:487`
* Faulting factory: `src/gps_denied_onboard/runtime_root/storage_factory.py:176`
* Faulting class: `src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py:107,430`
* Existing seed script: `scripts/mk_test_faiss_fixture.py` (invoked by `tile-init` compose service)
* AZ-962 spec: `_docs/02_tasks/done/AZ-962_operator_config_jetson_wiring.md`
@@ -0,0 +1,83 @@
# AZ-965 — Provision NetVLAD ONNX backbone for AZ-839 `c10_provisioning` corpus
**Status**: To Do (Jira) / `todo/` (local)
**Issue type**: Task
**Complexity**: 3 SP (5 SP if export/training required)
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-965
**Filed**: 2026-05-29 (anticipated during AZ-962)
## Why
Anticipated during AZ-962. The AZ-839 C3 fixture's `_build_replay_backbone_embedder` (`conftest.py:594-601`) calls `build_backbone_specs(config)`, which reads `config.components['c10_provisioning'].backbones` (a tuple of `BackboneSpec`). When that tuple is empty (the current state: no `.onnx` files ship in the repo), the fixture calls `pytest.skip` with:
```
AZ-839 operator_pre_flight_setup: config has no c10_provisioning.backbones
entries — the e2e harness config must declare at least one backbone
(typically DINOv2-VPR or NetVLAD per AZ-321).
```
The AZ-962 YAML (`configs/operator_replay.yaml`) explicitly leaves the `backbones:` list empty with a TODO note pointing at this ticket. Right now (post-AZ-962) the AZ-840 orchestrator test ERRORs at the FAISS-index gate (AZ-964) **before** reaching the backbones gate — but once AZ-964 ships, this is the next blocker.
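The gate ordering described here can be written down as a small model. This is an illustrative sketch only: `first_blocking_gate` is a hypothetical function that mirrors the order reported in the tickets (env-var gate, then FAISS index, then backbones) and does not call any project code.

```python
from typing import Optional


def first_blocking_gate(
    env_var_set: bool, index_file_exists: bool, backbone_count: int
) -> Optional[str]:
    """Return the first gate that stops fixture setup, or None if all clear."""
    if not env_var_set:
        return "SKIP: GPS_DENIED_OPERATOR_CONFIG_PATH not set (AZ-962)"
    if not index_file_exists:
        return "ERROR: IndexUnavailableError (AZ-964)"
    if backbone_count == 0:
        return "SKIP: no c10_provisioning.backbones entries (AZ-965)"
    return None


# Post-AZ-962 state: env var wired, no index file, no backbones.
current_gate = first_blocking_gate(True, False, 0)
# Expected state once AZ-964 ships but before AZ-965.
next_gate = first_blocking_gate(True, True, 0)
```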
## Goal
Provision a NetVLAD `.onnx` model (per AZ-321's pinned backbone choice) and matching `BackboneSpec` entry in `configs/operator_replay.yaml` so `c10_provisioning.compile_engines_for_corpus` can compile at least one engine in the AZ-839 fixture.
## Scope
1. **Source a NetVLAD `.onnx`**: AZ-321 specifies NetVLAD as the C2 baseline. Either:
- Export from an existing PyTorch checkpoint our team owns;
- Pull a vetted public weights file (with license/provenance recorded in `_docs/03_ip_attribution/`);
- Train from scratch (out of scope for this ticket — file a follow-up if neither of the above works).
2. **Place the `.onnx` in the repo**: under a path that's bind-mounted into the Jetson container (e.g. `models/netvlad/netvlad.onnx`). Add to `.gitattributes` for git-lfs if >50 MiB. Verify size against existing checked-in models.
3. **Verify TensorRT compile**: run `c7_inference.PyTorchFp16Runtime.compile_engine` (or the relevant production code path) against the new `.onnx` on Jetson AGX Orin to confirm a `.engine` file is produced with a sensible descriptor dim (typically 4096 per AZ-321).
4. **Populate `configs/operator_replay.yaml`**:
```yaml
c10_provisioning:
  workspace_mb: 4096
  backbones:
    - model_name: netvlad
      onnx_path: /opt/models/netvlad/netvlad.onnx
      input_name: image
      input_shape_chw: [3, 224, 224]
      descriptor_dim: 4096
```
(Exact field names per `BackboneSpec` dataclass — verify in `src/gps_denied_onboard/components/c10_provisioning/`.)
5. **Wire `./models` bind-mount** into `docker-compose.test.jetson.yml`.
6. **Update the `c2_vpr` block** in the YAML: `_resolve_replay_descriptor_dim` requires `c2_vpr.strategy='net_vlad'` (see `conftest.py:658-666`).
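Before wiring the step-4 entry into the YAML, a quick sanity check on its values can catch obvious mistakes. The sketch below assumes the field names shown in step 4, which the ticket itself flags as unverified; `BackboneEntry` and `entry_problems` are hypothetical and the real `BackboneSpec` dataclass may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneEntry:
    """Hypothetical mirror of the YAML entry; the real BackboneSpec may differ."""

    model_name: str
    onnx_path: str
    input_name: str
    input_shape_chw: tuple[int, ...]
    descriptor_dim: int


def entry_problems(entry: BackboneEntry) -> list[str]:
    """Collect human-readable problems with a backbone entry."""
    problems = []
    if not entry.onnx_path.endswith(".onnx"):
        problems.append("onnx_path should point at an .onnx file")
    if len(entry.input_shape_chw) != 3 or any(d <= 0 for d in entry.input_shape_chw):
        problems.append("input_shape_chw should be three positive integers")
    if entry.descriptor_dim <= 0:
        problems.append("descriptor_dim should be positive")
    return problems


netvlad = BackboneEntry(
    model_name="netvlad",
    onnx_path="/opt/models/netvlad/netvlad.onnx",
    input_name="image",
    input_shape_chw=(3, 224, 224),
    descriptor_dim=4096,
)
netvlad_problems = entry_problems(netvlad)
```

This only checks the shape of the values; whether the `.onnx` file exists at that path inside the container and compiles on the Jetson is covered by AC-1 and AC-2.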
## Acceptance Criteria
* **AC-1**: `models/netvlad/netvlad.onnx` (or equivalent path) exists in the repo with documented provenance + license.
* **AC-2**: `c7_inference` can compile this `.onnx` to a TensorRT `.engine` on Jetson AGX Orin (Tier-2) without errors.
* **AC-3**: `configs/operator_replay.yaml` declares the `netvlad` backbone in `c10_provisioning.backbones`.
* **AC-4**: `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh` no longer SKIPs `test_az840_e2e_real_flight_orchestration` with the empty-backbones message.
* **AC-5**: The AZ-840 orchestrator test either PASSes (and the AZ-699 verdict report lands at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`) or fails with a NEW error filed as a separate follow-up ticket.
* **AC-6**: License/provenance recorded in `_docs/03_ip_attribution/` per project convention.
## Out of scope
* DINOv2-VPR or other alternative backbones (NetVLAD is AZ-321's pinned baseline).
* MegaLoc / MixVPR / UltraVPR (these require a descriptor-dim resolver change — out of conftest scope).
* The 4 ESKF-divergence regression failures (AZ-963).
* Reference C6 tile cache for the Derkachi fixture (large separate work).
## Dependencies
* **Blocked by**: AZ-964 (FAISS index bootstrap — the orchestrator test ERRORs there before reaching this gate; clearing AZ-964 first surfaces the empty-backbones gate cleanly).
* **Blocks**: AZ-840 (orchestrator test cannot PASS end-to-end without a real backbone).
* **Related**: AZ-321 (defines NetVLAD as the C2 baseline), AZ-839 (C3 fixture).
## Estimate
3 SP if a usable `.onnx` already exists in the team's drive; 5 SP if export/training is needed. If 5+ SP, consider splitting model acquisition and YAML wiring into two sub-tickets.
## References
* Fixture skip-gate: `tests/e2e/replay/conftest.py:594-601`
* Backbone factory: `src/gps_denied_onboard/runtime_root/c10_factory.py::build_backbone_specs`
* Backbone spec dataclass: `src/gps_denied_onboard/components/c10_provisioning/config.py`
* AZ-321 (NetVLAD baseline choice)
* AZ-962 spec: `_docs/02_tasks/done/AZ-962_operator_config_jetson_wiring.md`