mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 07:01:14 +00:00
[AZ-962] [AZ-964] [AZ-965] operator_replay.yaml + Tier-2 wiring
AZ-962 SHIPPED: the Tier-2 Jetson AZ-840 orchestrator test no longer SKIPs at the env-var gate. configs/operator_replay.yaml registers c6/c7/c10/c11 with sane defaults (backbones intentionally empty, see AZ-965); docker-compose.test.jetson.yml exports GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key so secrets flow from .env.test and never sit in YAML. README drops the manual export step. 97/97 c11 + config unit tests stay green.

Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s vs baseline 3 skipped, i.e. -2 skipped, +2 errors): the AZ-840 orchestrator test moves from SKIP to ERROR at a deeper, real gate, an IndexUnavailableError on FaissDescriptorIndex against a fresh c6_tile_cache.root_dir.

AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839 C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for NetVLAD ONNX backbone provisioning, the next gate the orchestrator test will hit once FAISS clears.

Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 → AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral directive (2026-05-29) unchanged: still gated behind Derkachi e2e green, still NOT MET.

Co-authored-by: Cursor <cursoragent@cursor.com>
+10
-1
@@ -1,11 +1,20 @@
|
||||
# AZ-962 — Wire `GPS_DENIED_OPERATOR_CONFIG_PATH` + `operator_replay.yaml` into Tier-2 Jetson harness
|
||||
|
||||
**Status**: To Do (Jira) / `todo/` (local)
|
||||
**Status**: Done (Jira) / `done/` (local)
|
||||
**Issue type**: Task
|
||||
**Complexity**: 3 SP
|
||||
**Cycle**: cycle-4 e2e closure follow-up
|
||||
**Jira**: https://denyspopov.atlassian.net/browse/AZ-962
|
||||
**Filed**: 2026-05-29 during cycle-4 Tier-2 validation run
|
||||
**Shipped**: 2026-05-29 (same day)
|
||||
|
||||
## Closure note (2026-05-29)
|
||||
|
||||
Shipped: `configs/operator_replay.yaml` authored (registers all 4 blocks c6/c7/c10/c11), `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml` and bind-mounts `./configs:/opt/configs:ro`, and `ENV_KEY_MAP` (`src/gps_denied_onboard/config/loader.py`) gained two entries for `SATELLITE_PROVIDER_URL` / `SATELLITE_PROVIDER_API_KEY` → `c11_tile_manager` so secrets stay out of the YAML and flow in from `.env.test`. README `tests/e2e/replay/README.md` updated to drop the manual `export GPS_DENIED_OPERATOR_CONFIG_PATH=...` step.
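
The env-to-config mapping described above can be illustrated with a minimal sketch. The two `ENV_KEY_MAP` entries are the ones this commit adds; the `overlay_env` helper, the plain-dict config shape, and the example URL and key are hypothetical stand-ins and are not the loader's actual API.

```python
from typing import Final

# The two entries added by AZ-962: env var -> (config block, field).
ENV_KEY_MAP: Final[dict[str, tuple[str, str]]] = {
    "SATELLITE_PROVIDER_URL": ("c11_tile_manager", "satellite_provider_url"),
    "SATELLITE_PROVIDER_API_KEY": ("c11_tile_manager", "service_api_key"),
}


def overlay_env(config: dict[str, dict], env: dict[str, str]) -> dict[str, dict]:
    """Hypothetical helper: copy `config` and apply mapped, non-empty env values."""
    merged = {block: dict(fields) for block, fields in config.items()}
    for env_key, (block, field) in ENV_KEY_MAP.items():
        value = env.get(env_key, "")
        if value:
            merged.setdefault(block, {})[field] = value
    return merged


# The YAML side carries only non-secret settings.
yaml_config = {"c11_tile_manager": {"upload_batch_size": 25}}
# The env side (for example from `.env.test`) carries the URL and the key.
env = {
    "SATELLITE_PROVIDER_URL": "https://tiles.example.invalid",
    "SATELLITE_PROVIDER_API_KEY": "test-key",
}
merged = overlay_env(yaml_config, env)
```

The point of the design is that the YAML input is never mutated and never needs to contain the key; the secret exists only in the merged, in-memory config.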
|
||||
|
||||
Tier-2 re-run on Jetson AGX Orin (`JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`): 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s. AC-3 satisfied — `test_az840_e2e_real_flight_orchestration` no longer SKIPs at the env-var gate. AC-4 satisfied — it now ERRORs at a deeper, real gate (`IndexUnavailableError: FaissDescriptorIndex: .index file missing at /tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index`) which is captured in a NEW follow-up ticket **AZ-964**. The empty-backbones gate that this spec originally flagged (c10 backbones) becomes the gate AFTER AZ-964 clears — filed as **AZ-965**.
|
||||
|
||||
Net cycle-4 status remains NOT GREEN (orchestrator test still doesn't PASS, blocked by AZ-964 + AZ-965; ESKF divergence regression still blocked by AZ-963). AZ-962 itself is complete.
|
||||
|
||||
## Why
|
||||
|
||||
@@ -0,0 +1,80 @@
|
||||
# AZ-964 — Bootstrap FAISS descriptor index for AZ-839 C3 fixture (`operator_pre_flight_cache`)
|
||||
|
||||
**Status**: To Do (Jira) / `todo/` (local)
|
||||
**Issue type**: Task
|
||||
**Complexity**: 3 SP
|
||||
**Cycle**: cycle-4 e2e closure follow-up
|
||||
**Jira**: https://denyspopov.atlassian.net/browse/AZ-964
|
||||
**Filed**: 2026-05-29 (surfaced by AZ-962 Tier-2 re-run)
|
||||
|
||||
## Why
|
||||
|
||||
Discovered 2026-05-29 during the AZ-962 Tier-2 re-run on Jetson AGX Orin. With `GPS_DENIED_OPERATOR_CONFIG_PATH` + `operator_replay.yaml` now correctly wired (AZ-962 shipped), the AZ-840 orchestrator test (`tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration`) moved from SKIPped to ERRORed at a deeper, real gate during fixture setup:
|
||||
|
||||
```
gps_denied_onboard.components.c6_tile_cache.errors.IndexUnavailableError:
FaissDescriptorIndex: .index file missing at
/tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index
```
|
||||
|
||||
The same error also breaks `test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache`, confirming this is a fixture-wide problem, not specific to one test.
|
||||
|
||||
## Root cause (read from code)
|
||||
|
||||
`tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache` (line 487):
|
||||
|
||||
1. Overrides `c6_tile_cache.root_dir` to a fresh `/tmp/pytest-of-root/.../operator_pre_flight_cache0/` (per AC of AZ-839, the fixture creates a *new* cache each test).
|
||||
2. Calls `build_descriptor_index(config)` — which constructs `FaissDescriptorIndex.from_config(config)`.
|
||||
3. `FaissDescriptorIndex.__init__` calls `_load()` which **raises** `IndexUnavailableError` when no `.index` file exists at `c6_tile_cache.root_dir/descriptor.index`.
|
||||
4. The fixture never gets to call `populate_c6_from_route` (which presumably creates the index downstream).
|
||||
|
||||
The compose `tile-init` setup service exists and runs `scripts/mk_test_faiss_fixture.py` — but it writes a seed index to `/var/lib/gps-denied/tiles` (the `tile-data` volume), **not** to the tmp dir the fixture overrides into. So the fixture's override path always starts empty.
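
The path mismatch can be reproduced in miniature. This is a sketch under stated assumptions: `load_descriptor_index` and the local `IndexUnavailableError` class are stand-ins for the real factory and error type, and two temporary directories play the roles of the `tile-data` volume and the fixture's override `root_dir`.

```python
import tempfile
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Stand-in for the c6_tile_cache error of the same name."""


def load_descriptor_index(root_dir: Path) -> bytes:
    """Hypothetical loader: refuses to start without an existing index file."""
    index_path = root_dir / "descriptor.index"
    if not index_path.is_file():
        raise IndexUnavailableError(f".index file missing at {index_path}")
    return index_path.read_bytes()


with tempfile.TemporaryDirectory() as seeded, tempfile.TemporaryDirectory() as override:
    # tile-init analogue: the seed index lands in one directory...
    (Path(seeded) / "descriptor.index").write_bytes(b"seed")
    # ...but the fixture points the loader at a different, empty directory.
    try:
        load_descriptor_index(Path(override))
        outcome = "loaded"
    except IndexUnavailableError:
        outcome = "index unavailable"
    # Loading from the seeded directory works, which isolates the cause
    # to the path mismatch rather than to the seed itself.
    seeded_ok = load_descriptor_index(Path(seeded)) == b"seed"
```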
|
||||
|
||||
## Goal
|
||||
|
||||
Make `_build_operator_pre_flight_cache` succeed past the `build_descriptor_index(config)` call so the AZ-840 orchestrator test can actually exercise the 7-step pipeline (or fail at the next real gate — c10 backbones, AZ-965).
|
||||
|
||||
## Scope
|
||||
|
||||
One of (in preference order; pick during implementation):
|
||||
|
||||
1. **Fixture seeds the index inline**: before calling `build_descriptor_index`, invoke `scripts/mk_test_faiss_fixture.py` programmatically (or in-process equivalent) against the override `root_dir`. Pure test-infra change.
|
||||
2. **`populate_c6_from_route` creates the index if missing**: production code change so the descriptor-index factory tolerates a fresh `root_dir` and the index is created on first population. Larger blast radius because it touches a shared factory.
|
||||
3. **`FaissDescriptorIndex` supports an explicit `bootstrap=True` mode**: factory signal that this run intends to create a fresh index. Requires API design.
|
||||
|
||||
Option (1) is the smallest, lowest-risk path and the natural extension of the `tile-init` pattern already in compose. **Recommended.**
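
A minimal sketch of option (1), assuming the seed step and the index builder can both be called with the override directory. `build_cache`, `fake_seed`, and `fake_build` are hypothetical names for illustration; the real fixture would call the seed script (or an in-process equivalent) and `build_descriptor_index(config)` instead.

```python
import tempfile
from pathlib import Path
from typing import Callable


def build_cache(
    root_dir: Path,
    seed_index: Callable[[Path], None],
    build_index: Callable[[Path], bytes],
) -> bytes:
    """Seed `root_dir` only when no index exists yet, then build against it."""
    root_dir.mkdir(parents=True, exist_ok=True)
    if not (root_dir / "descriptor.index").is_file():
        seed_index(root_dir)
    return build_index(root_dir)


def fake_seed(root_dir: Path) -> None:
    """Stand-in for the seed script: writes a placeholder index file."""
    (root_dir / "descriptor.index").write_bytes(b"seed")


def fake_build(root_dir: Path) -> bytes:
    """Stand-in for the index factory: fails when the file is absent."""
    path = root_dir / "descriptor.index"
    if not path.is_file():
        raise FileNotFoundError(path)
    return path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    # A fresh, per-test directory, as the AZ-839 fixture requires.
    result = build_cache(Path(tmp) / "operator_pre_flight_cache0", fake_seed, fake_build)
```

Seeding only when the file is missing keeps the change confined to test infrastructure and leaves a pre-populated cache untouched.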
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
* **AC-1**: `_build_operator_pre_flight_cache` no longer ERRORs at `build_descriptor_index` when started against a fresh empty `c6_tile_cache.root_dir`.
|
||||
* **AC-2**: `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh` no longer reports the `IndexUnavailableError` for `test_az840_e2e_real_flight_orchestration` **or** for `test_operator_pre_flight_setup_produces_populated_cache`.
|
||||
* **AC-3**: If the AZ-840 orchestrator test now reaches the c10-backbone gate (`AZ-839 operator_pre_flight_setup: config has no c10_provisioning.backbones entries`), that's the expected next gate — AZ-965 handles it; AZ-964 is done.
|
||||
* **AC-4**: `tests/unit` + `tests/e2e/replay/test_operator_pre_flight_*` continue to pass on Tier-1 (Colima).
|
||||
|
||||
## Out of scope
|
||||
|
||||
* c10 backbone provisioning (separate ticket — AZ-965).
|
||||
* The 4 ESKF-divergence regression failures in `test_derkachi_1min.py` (separate ticket — AZ-963).
|
||||
* Adding a reference C6 tile cache for the Derkachi fixture (large separate work).
|
||||
* Re-opening AZ-840 / AZ-842 tracker state.
|
||||
|
||||
## Dependencies
|
||||
|
||||
* **Blocks**: AZ-840 (orchestrator test cannot run end-to-end until this clears).
|
||||
* **Surfaced by**: AZ-962 (env-var + YAML wiring exposed the next gate).
|
||||
* **Related**: AZ-839 (C3 fixture — this is its bug to own).
|
||||
|
||||
## Estimate
|
||||
|
||||
3 SP. Multi-step (locate the seed-index script, invoke it from the fixture before `build_descriptor_index`, verify on Tier-2), moderate risk (the seed script's assumptions might not match the fixture's override path layout).
|
||||
|
||||
## References
|
||||
|
||||
* Run log: 2026-05-29 Tier-2 Jetson AGX Orin (AZ-962 re-run), 84.99s, 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors
|
||||
* Test: `tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration` (ERROR)
|
||||
* Test: `tests/e2e/replay/test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache` (ERROR)
|
||||
* Fixture: `tests/e2e/replay/conftest.py:487`
|
||||
* Faulting factory: `src/gps_denied_onboard/runtime_root/storage_factory.py:176`
|
||||
* Faulting class: `src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py:107,430`
|
||||
* Existing seed script: `scripts/mk_test_faiss_fixture.py` (invoked by `tile-init` compose service)
|
||||
* AZ-962 spec: `_docs/02_tasks/done/AZ-962_operator_config_jetson_wiring.md`
|
||||
@@ -0,0 +1,83 @@
|
||||
# AZ-965 — Provision NetVLAD ONNX backbone for AZ-839 `c10_provisioning` corpus
|
||||
|
||||
**Status**: To Do (Jira) / `todo/` (local)
|
||||
**Issue type**: Task
|
||||
**Complexity**: 3 SP (5 SP if export/training required)
|
||||
**Cycle**: cycle-4 e2e closure follow-up
|
||||
**Jira**: https://denyspopov.atlassian.net/browse/AZ-965
|
||||
**Filed**: 2026-05-29 (forward-looked during AZ-962)
|
||||
|
||||
## Why
|
||||
|
||||
Identified ahead of time during AZ-962. The AZ-839 C3 fixture's `_build_replay_backbone_embedder` (`conftest.py:594-601`) calls `build_backbone_specs(config)`, which reads `config.components['c10_provisioning'].backbones` (a tuple of `BackboneSpec`). When that tuple is empty (the current state, since no `.onnx` files ship in the repo), the fixture calls `pytest.skip` with:
|
||||
|
||||
```
AZ-839 operator_pre_flight_setup: config has no c10_provisioning.backbones
entries — the e2e harness config must declare at least one backbone
(typically DINOv2-VPR or NetVLAD per AZ-321).
```
|
||||
|
||||
The AZ-962 YAML (`configs/operator_replay.yaml`) explicitly leaves the `backbones:` list empty with a TODO note pointing at this ticket. Right now (post-AZ-962) the AZ-840 orchestrator test ERRORs at the FAISS-index gate (AZ-964) **before** reaching the backbones gate — but once AZ-964 ships, this is the next blocker.
|
||||
|
||||
## Goal
|
||||
|
||||
Provision a NetVLAD `.onnx` model (per AZ-321's pinned backbone choice) and matching `BackboneSpec` entry in `configs/operator_replay.yaml` so `c10_provisioning.compile_engines_for_corpus` can compile at least one engine in the AZ-839 fixture.
|
||||
|
||||
## Scope
|
||||
|
||||
1. **Source a NetVLAD `.onnx`**: AZ-321 specifies NetVLAD as the C2 baseline. Either:
|
||||
- Export from an existing PyTorch checkpoint our team owns;
|
||||
- Pull a vetted public weights file (with license/provenance recorded in `_docs/03_ip_attribution/`);
|
||||
- Train from scratch (out of scope for this ticket — file a follow-up if neither of the above works).
|
||||
2. **Place the `.onnx` in the repo**: under a path that's bind-mounted into the Jetson container (e.g. `models/netvlad/netvlad.onnx`). Add to `.gitattributes` for git-lfs if >50 MiB. Verify size against existing checked-in models.
|
||||
3. **Verify TensorRT compile**: run `c7_inference.PyTorchFp16Runtime.compile_engine` (or the relevant production code path) against the new `.onnx` on Jetson AGX Orin to confirm a `.engine` file is produced with a sensible descriptor dim (typically 4096 per AZ-321).
|
||||
4. **Populate `configs/operator_replay.yaml`**:
|
||||
|
||||
```yaml
c10_provisioning:
  workspace_mb: 4096
  backbones:
    - model_name: netvlad
      onnx_path: /opt/models/netvlad/netvlad.onnx
      input_name: image
      input_shape_chw: [3, 224, 224]
      descriptor_dim: 4096
```
|
||||
|
||||
(Exact field names per `BackboneSpec` dataclass — verify in `src/gps_denied_onboard/components/c10_provisioning/`.)
|
||||
5. **Wire `./models` bind-mount** into `docker-compose.test.jetson.yml`.
|
||||
6. **Update `c2_vpr` block** in the YAML if `_resolve_replay_descriptor_dim` requires `c2_vpr.strategy='net_vlad'` (it does — see `conftest.py:658-666`).
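
Before running the Tier-2 harness, the populated `backbones` list could be sanity-checked with something like the sketch below. The field names mirror the YAML example in step 4, which the ticket itself says must be verified against the real `BackboneSpec` dataclass; `BackboneSpecSketch` and `parse_backbones` are hypothetical and are not part of the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneSpecSketch:
    """Assumed shape of one backbone entry; not the real BackboneSpec."""

    model_name: str
    onnx_path: str
    input_name: str
    input_shape_chw: tuple[int, int, int]
    descriptor_dim: int


def parse_backbones(block: dict) -> tuple[BackboneSpecSketch, ...]:
    """Validate a `c10_provisioning` mapping and return its backbone entries."""
    entries = block.get("backbones") or []
    if not entries:
        raise ValueError("c10_provisioning.backbones is empty")
    specs = []
    for entry in entries:
        shape = tuple(entry["input_shape_chw"])
        if len(shape) != 3 or any(dim <= 0 for dim in shape):
            raise ValueError(f"bad input_shape_chw for {entry['model_name']}: {shape}")
        if entry["descriptor_dim"] <= 0:
            raise ValueError(f"bad descriptor_dim for {entry['model_name']}")
        specs.append(
            BackboneSpecSketch(
                model_name=entry["model_name"],
                onnx_path=entry["onnx_path"],
                input_name=entry["input_name"],
                input_shape_chw=shape,
                descriptor_dim=entry["descriptor_dim"],
            )
        )
    return tuple(specs)


# The mapping a YAML parser would produce for the example in step 4.
block = {
    "workspace_mb": 4096,
    "backbones": [
        {
            "model_name": "netvlad",
            "onnx_path": "/opt/models/netvlad/netvlad.onnx",
            "input_name": "image",
            "input_shape_chw": [3, 224, 224],
            "descriptor_dim": 4096,
        }
    ],
}
specs = parse_backbones(block)

# The current, unpopulated state is rejected up front.
try:
    parse_backbones({"workspace_mb": 4096})
    empty_rejected = False
except ValueError:
    empty_rejected = True
```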
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
* **AC-1**: `models/netvlad/netvlad.onnx` (or equivalent path) exists in the repo with documented provenance + license.
|
||||
* **AC-2**: `c7_inference` can compile this `.onnx` to a TensorRT `.engine` on Jetson AGX Orin (Tier-2) without errors.
|
||||
* **AC-3**: `configs/operator_replay.yaml` declares the `netvlad` backbone in `c10_provisioning.backbones`.
|
||||
* **AC-4**: `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh` no longer SKIPs `test_az840_e2e_real_flight_orchestration` with the empty-backbones message.
|
||||
* **AC-5**: The AZ-840 orchestrator test either PASSes (and the AZ-699 verdict report lands at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`) or fails with a NEW error filed as a separate follow-up ticket.
|
||||
* **AC-6**: License/provenance recorded in `_docs/03_ip_attribution/` per project convention.
|
||||
|
||||
## Out of scope
|
||||
|
||||
* DINOv2-VPR or other alternative backbones (NetVLAD is AZ-321's pinned baseline).
|
||||
* MegaLoc / MixVPR / UltraVPR (these require a descriptor-dim resolver change — out of conftest scope).
|
||||
* The 4 ESKF-divergence regression failures (AZ-963).
|
||||
* Reference C6 tile cache for the Derkachi fixture (large separate work).
|
||||
|
||||
## Dependencies
|
||||
|
||||
* **Blocked by**: AZ-964 (FAISS index bootstrap — the orchestrator test ERRORs there before reaching this gate; clearing AZ-964 first surfaces the empty-backbones gate cleanly).
|
||||
* **Blocks**: AZ-840 (orchestrator test cannot PASS end-to-end without a real backbone).
|
||||
* **Related**: AZ-321 (defines NetVLAD as the C2 baseline), AZ-839 (C3 fixture).
|
||||
|
||||
## Estimate
|
||||
|
||||
3 SP if a usable `.onnx` already exists in the team's drive; 5 SP if export/training is needed. If 5+ SP, consider splitting model-acquisition from yaml-wiring into two sub-tickets.
|
||||
|
||||
## References
|
||||
|
||||
* Fixture skip-gate: `tests/e2e/replay/conftest.py:594-601`
|
||||
* Backbone factory: `src/gps_denied_onboard/runtime_root/c10_factory.py::build_backbone_specs`
|
||||
* Backbone spec dataclass: `src/gps_denied_onboard/components/c10_provisioning/config.py`
|
||||
* AZ-321 (NetVLAD baseline choice)
|
||||
* AZ-962 spec: `_docs/02_tasks/done/AZ-962_operator_config_jetson_wiring.md`
|
||||
@@ -8,7 +8,7 @@ status: in_progress
|
||||
sub_step:
|
||||
phase: 6
|
||||
name: implement-tasks
|
||||
detail: "batch 9 = Tier-2 Jetson e2e validation run NOT GREEN. Ran `JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`; result = 4 failed / 48 passed / 3 skipped / 1 xfailed / 1 xpassed in 90.59s. Two distinct blockers: (1) AZ-840 orchestrator test SKIPPED because `GPS_DENIED_OPERATOR_CONFIG_PATH` not exported by `docker-compose.test.jetson.yml` AND `operator_replay.yaml` missing from repo — Epic AZ-835's 'Done' status was validated by doc-content only, never by actual orchestrator test execution; (2) AZ-895 fallout — 4 tests in `test_derkachi_1min.py` regress with `EstimatorFatalError('eskf filter divergence: mahalanobis²=212.311 > 100.0')` at frame 233 because the CSV-driven path (now primary) runs open-loop on the Derkachi fixture (no reference C6 tile cache → no satellite anchoring). Filed AZ-962 (3 SP, operator config + compose wiring) and AZ-963 (3 SP, ESKF regression triage). OKVIS2 chain stays deferred per user 2026-05-29 directive ('after Derkachi e2e green' — directive unchanged; e2e not green). AZ-842 caveat: the AZ-840/AZ-842 'Done' tracker state set earlier today is contingent on whether convention (A) 'In Testing = shipped' or (B) 'Done = shipped+tested' applies; user-skipped convention question, leftover holds the walk-back payload if needed. Cycle-4 not green. Earlier same-day batch 8 = tracker-only fix for AZ-842 (To Do → Done, read-back verified) + wider Jira drift audit recorded as `_docs/_process_leftovers/2026-05-29_jira_status_drift_audit.md`. 10 cycle-3/4 tickets (AZ-836/838/839/840/894/895/896/899/900/901) shipped to `done/` locally but stuck in 'In Testing' in Jira; Epic AZ-835 in `todo/` with all 5 children done. User skipped A/B/C/D convention question — leftover holds the bulk-transition payload for whichever convention they pick. **Corrected cycle-4 todo/ remainder**: nothing actionable. 
Earlier narratives that listed AZ-899/900/901 as 'cycle-4 todo/ remainder for next batches' were fiction — those specs have been in done/ the whole time. OKVIS2 chain (AZ-943/951/952) sits in todo/ but is deferred per user 2026-05-29 directive until after Derkachi e2e flight test passes. Cycle-4 product work is effectively complete pending Derkachi e2e green + AZ-897 UI in ../ui."
|
||||
detail: "batch 10 = AZ-962 SHIPPED end-to-end + 2 new tickets filed. Implemented `configs/operator_replay.yaml` (registers c6/c7/c10/c11 with defaults; `backbones: []` intentionally — see AZ-965), `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml` + bind-mounts `./configs:/opt/configs:ro`, `ENV_KEY_MAP` (`src/gps_denied_onboard/config/loader.py`) gained two entries (`SATELLITE_PROVIDER_URL` → `c11.satellite_provider_url`, `SATELLITE_PROVIDER_API_KEY` → `c11.service_api_key`) so secrets flow from `.env.test` and never land in YAML, and README dropped the manual export step. 97/97 c11+config unit tests stay green. Tier-2 re-run on Jetson AGX Orin (`JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`): 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s — i.e. -2 skipped, +2 errors vs the prior baseline. AZ-962 AC-3 + AC-4 satisfied: AZ-840 orchestrator no longer SKIPs at env-var; it now ERRORs at a deeper, real gate during fixture setup with `IndexUnavailableError: FaissDescriptorIndex: .index file missing at /tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index`. Same error in `test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache` confirms fixture-wide bug, not a single-test issue. Root cause: `conftest.py:487` calls `build_descriptor_index(config)` against a fresh empty `c6_tile_cache.root_dir` (tmp dir per AZ-839 invariant) — FAISS factory needs an existing `.index`. `tile-init` compose service exists but writes its seed to `/var/lib/gps-denied/tiles`, not the tmp dir the fixture overrides into. Filed AZ-964 (3 SP, To Do, FAISS index bootstrap; preferred fix = invoke `mk_test_faiss_fixture.py` inline against override `root_dir`) and AZ-965 (3 SP, To Do, blocked by AZ-964, NetVLAD ONNX backbone provisioning — the next gate after FAISS clears). 
AZ-962 transitioned To Do → In Progress → Done in Jira (read-back verified). AZ-962 spec moved todo/ → done/. **Cycle-4 e2e gate still NOT GREEN**: AZ-840 chain is now AZ-964 → AZ-965 → orchestrator PASS; 60s smoke is AZ-963 → 4 derkachi_1min tests PASS. OKVIS2 deferral directive still in force (not yet met). Earlier same-day batch 9 = Tier-2 Jetson e2e validation run NOT GREEN. Ran `JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`; result = 4 failed / 48 passed / 3 skipped / 1 xfailed / 1 xpassed in 90.59s. Two distinct blockers: (1) AZ-840 orchestrator test SKIPPED because `GPS_DENIED_OPERATOR_CONFIG_PATH` not exported by `docker-compose.test.jetson.yml` AND `operator_replay.yaml` missing from repo — Epic AZ-835's 'Done' status was validated by doc-content only, never by actual orchestrator test execution; (2) AZ-895 fallout — 4 tests in `test_derkachi_1min.py` regress with `EstimatorFatalError('eskf filter divergence: mahalanobis²=212.311 > 100.0')` at frame 233 because the CSV-driven path (now primary) runs open-loop on the Derkachi fixture (no reference C6 tile cache → no satellite anchoring). Filed AZ-962 (3 SP, operator config + compose wiring) and AZ-963 (3 SP, ESKF regression triage). OKVIS2 chain stays deferred per user 2026-05-29 directive ('after Derkachi e2e green' — directive unchanged; e2e not green). AZ-842 caveat: the AZ-840/AZ-842 'Done' tracker state set earlier today is contingent on whether convention (A) 'In Testing = shipped' or (B) 'Done = shipped+tested' applies; user-skipped convention question, leftover holds the walk-back payload if needed. Cycle-4 not green. Earlier same-day batch 8 = tracker-only fix for AZ-842 (To Do → Done, read-back verified) + wider Jira drift audit recorded as `_docs/_process_leftovers/2026-05-29_jira_status_drift_audit.md`. 10 cycle-3/4 tickets (AZ-836/838/839/840/894/895/896/899/900/901) shipped to `done/` locally but stuck in 'In Testing' in Jira; Epic AZ-835 in `todo/` with all 5 children done. 
User skipped A/B/C/D convention question — leftover holds the bulk-transition payload for whichever convention they pick. **Corrected cycle-4 todo/ remainder**: nothing actionable. Earlier narratives that listed AZ-899/900/901 as 'cycle-4 todo/ remainder for next batches' were fiction — those specs have been in done/ the whole time. OKVIS2 chain (AZ-943/951/952) sits in todo/ but is deferred per user 2026-05-29 directive until after Derkachi e2e flight test passes. Cycle-4 product work is effectively complete pending Derkachi e2e green + AZ-897 UI in ../ui."
|
||||
retry_count: 0
|
||||
cycle: 4
|
||||
tracker: jira
|
||||
|
||||
@@ -0,0 +1,66 @@
|
||||
# AZ-962 — Operator pre-flight + replay-mode config for Tier-2 Jetson e2e harness.
|
||||
#
|
||||
# Consumed by `tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache`
|
||||
# (the AZ-839 C3 fixture) which `load_config(env, paths=[this])` then drives the
|
||||
# AZ-840 7-step orchestrator (`test_az835_e2e_real_flight.py`).
|
||||
#
|
||||
# Most fields stay at their dataclass defaults (see
|
||||
# `src/gps_denied_onboard/components/{c6_tile_cache,c7_inference,c10_provisioning,c11_tile_manager}/config.py`).
|
||||
# The blocks are declared here primarily so the four-component contract the
|
||||
# fixture skip-gate cites is satisfied by inspection of this file. The env
|
||||
# vars below are filled by docker-compose.test.jetson.yml / `.env.test`:
|
||||
#
|
||||
# * `GPS_DENIED_FC_PROFILE`, `GPS_DENIED_TIER`, `DB_URL` → runtime
|
||||
# * `INFERENCE_BACKEND`, `TILE_CACHE_PATH`, `CAMERA_CALIBRATION_PATH` → runtime
|
||||
# * `LOG_LEVEL`, `LOG_SINK` → log
|
||||
# * `FDR_PATH` → fdr
|
||||
# * `SATELLITE_PROVIDER_URL` → c11_tile_manager.satellite_provider_url
|
||||
# * `SATELLITE_PROVIDER_API_KEY` → c11_tile_manager.service_api_key
|
||||
#
|
||||
# AZ-965 (follow-up, blocked by AZ-964): the orchestrator test SKIPs at the
# next gate because `c10_provisioning.backbones` is empty; no NetVLAD /
# DINOv2 .onnx file ships with this repo. Populating the backbones list
# here (and provisioning the matching .onnx + verifying it compiles on
# Tegra) is AZ-965's scope, not AZ-962's.

__top__:
  mode: replay

runtime:
  fc_profile: ardupilot_plane
  tier: 2

replay:
  pace: asap
  target_fc_dialect: ardupilot_plane

c6_tile_cache:
  store_runtime: postgres_filesystem
  metadata_runtime: postgres_filesystem
  descriptor_index_runtime: faiss_hnsw
  postgres_pool_size: 4
  lru_eviction_threshold_bytes: 10737418240  # 10 GiB

c7_inference:
  runtime: pytorch_fp16
  thermal_poll_hz: 1.0
  engine_cache_dir: /var/lib/gps-denied/engines
  gpu_memory_budget_bytes: 4294967296  # 4 GiB
  trtexec_timeout_s: 600
  ort_trt_cache_dir: /var/lib/gps-denied/engines/ort_trt_cache

c10_provisioning:
  workspace_mb: 4096
  # backbones intentionally empty; see AZ-965 for the follow-up.
  # The AZ-839 fixture skip-gate (conftest.py:594-601) fires here
  # with a clear message until backbone provisioning lands.

c11_tile_manager:
  # satellite_provider_url + service_api_key flow in from env vars
  # (SATELLITE_PROVIDER_URL / SATELLITE_PROVIDER_API_KEY) via the
  # loader's ENV_KEY_MAP additions in AZ-962.
  upload_batch_size: 25
  upload_http_timeout_s: 30.0
  download_http_timeout_s: 30.0
  download_max_5xx_retries: 4
  download_resolution_floor_m_per_px: 0.5
|
||||
@@ -169,9 +169,16 @@ services:
|
||||
# `replay_runner` fixture trips that gate without this line.
|
||||
BUILD_CSV_REPLAY_ADAPTER: "ON"
|
||||
BUILD_FAISS_INDEX: "ON"
|
||||
# AZ-962: the AZ-839 C3 fixture (operator_pre_flight_setup) skips
|
||||
# the AZ-840 orchestrator test when this var is missing. The YAML
|
||||
# bind-mounted at /opt/configs/operator_replay.yaml declares the
|
||||
# four blocks the fixture consumes (c6/c7/c10/c11). c10.backbones
|
||||
# is intentionally empty; AZ-965 ships the .onnx + populates it.
|
||||
GPS_DENIED_OPERATOR_CONFIG_PATH: /opt/configs/operator_replay.yaml
|
||||
volumes:
|
||||
- ./tests:/opt/tests:ro
|
||||
- ./_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro
|
||||
- ./configs:/opt/configs:ro
|
||||
- fdr-data:/var/lib/gps-denied/fdr
|
||||
- tile-data:/var/lib/gps-denied/tiles
|
||||
|
||||
|
||||
@@ -74,6 +74,14 @@ ENV_KEY_MAP: Final[dict[str, tuple[str, str]]] = {
|
||||
"REPLAY_PACE": ("replay", "pace"),
|
||||
"REPLAY_TIME_OFFSET_MS": ("replay", "time_offset_ms"),
|
||||
"REPLAY_TARGET_FC_DIALECT": ("replay", "target_fc_dialect"),
|
||||
# C11 tile-manager URL + bearer (AZ-962) — the Jetson harness +
|
||||
# operator-orchestrator deploys inject these via env so the YAML
|
||||
# never carries a real secret. `build_tile_downloader` raises if
|
||||
# `service_api_key` is empty; mapping it through ENV_KEY_MAP lets
|
||||
# the e2e harness fill the field from `.env.test` without a YAML
|
||||
# override step.
|
||||
"SATELLITE_PROVIDER_URL": ("c11_tile_manager", "satellite_provider_url"),
|
||||
"SATELLITE_PROVIDER_API_KEY": ("c11_tile_manager", "service_api_key"),
|
||||
}
|
||||
|
||||
# Env vars that MUST resolve to a non-empty value before `load_config`
|
||||
@@ -122,6 +130,9 @@ _FIELD_COERCIONS: Final[dict[str, type]] = {
|
||||
"output_path": str,
|
||||
"pace": str,
|
||||
"target_fc_dialect": str,
|
||||
# C11 (AZ-962)
|
||||
"satellite_provider_url": str,
|
||||
"service_api_key": str,
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -51,10 +51,20 @@ matching nadir video + camera calibration, the orchestrator runs the
|
||||
ssh jetson-e2e
|
||||
cd /workspace/gps-denied-onboard
|
||||
export RUN_REPLAY_E2E=1
|
||||
export GPS_DENIED_OPERATOR_CONFIG_PATH=/workspace/configs/operator_replay.yaml
|
||||
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
|
||||
```
|
||||
|
||||
AZ-962: `docker-compose.test.jetson.yml` exports
|
||||
`GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml`
|
||||
automatically and bind-mounts `./configs:/opt/configs:ro`, so no
|
||||
manual env-var export is required when running through
|
||||
`scripts/run-tests-jetson.sh`. The YAML at `configs/operator_replay.yaml`
|
||||
declares the four blocks the fixture requires (c6 / c7 / c10 / c11);
|
||||
secrets (`SATELLITE_PROVIDER_API_KEY`) flow in from `.env.test` via
|
||||
the loader's `ENV_KEY_MAP`. `c10_provisioning.backbones` is
|
||||
intentionally empty pending AZ-965 (once AZ-964 clears, the orchestrator
test will SKIP at the "no backbones" gate until AZ-965 lands).
|
||||
|
||||
The bundled local-development entry point is `scripts/run-tests-jetson.sh`,
|
||||
which handles the SSH alias + rsync + remote pytest invocation. See
|
||||
`_docs/02_document/tests/tier2-jetson-testing.md` for the harness contract.
|
||||
|
||||