[AZ-962] [AZ-964] [AZ-965] operator_replay.yaml + Tier-2 wiring

AZ-962 SHIPPED — Tier-2 Jetson AZ-840 orchestrator test no longer
SKIPs at the env-var gate. configs/operator_replay.yaml registers
c6/c7/c10/c11 with sane defaults (backbones intentionally empty,
see AZ-965); docker-compose.test.jetson.yml exports
GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml
and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains
SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url
and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key
so secrets flow from .env.test and never sit in YAML. README drops
the manual export step. 97/97 c11 + config unit tests stay green.

Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed /
1 xpassed / 2 errors in 84.99s vs baseline 3 skipped — i.e. -2
skipped, +2 errors): AZ-840 orchestrator test moves from SKIP to
ERROR with a deeper, real gate — IndexUnavailableError on
FaissDescriptorIndex against a fresh c6_tile_cache.root_dir.

AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839
C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for
NetVLAD ONNX backbone provisioning — the next gate the orchestrator
test will hit once FAISS clears.

Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 →
AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral
directive (2026-05-29) unchanged — still gated behind Derkachi
e2e green, still NOT MET.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-29 16:42:55 +03:00
parent 92ba7997a9
commit 763d8b21ad
9 changed files with 272 additions and 6 deletions
File diff suppressed because one or more lines are too long
@@ -1,11 +1,20 @@
# AZ-962 — Wire `GPS_DENIED_OPERATOR_CONFIG_PATH` + `operator_replay.yaml` into Tier-2 Jetson harness
**Status**: Done (Jira) / `done/` (local) (previously: To Do (Jira) / `todo/` (local))
**Issue type**: Task
**Complexity**: 3 SP
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-962
**Filed**: 2026-05-29 during cycle-4 Tier-2 validation run
**Shipped**: 2026-05-29 (same day)
## Closure note (2026-05-29)
Shipped: `configs/operator_replay.yaml` authored (registers all 4 blocks c6/c7/c10/c11), `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml` and bind-mounts `./configs:/opt/configs:ro`, and `ENV_KEY_MAP` (`src/gps_denied_onboard/config/loader.py`) gained two entries for `SATELLITE_PROVIDER_URL` / `SATELLITE_PROVIDER_API_KEY` → `c11_tile_manager` so secrets stay out of the YAML and flow in from `.env.test`. README `tests/e2e/replay/README.md` updated to drop the manual `export GPS_DENIED_OPERATOR_CONFIG_PATH=...` step.
Tier-2 re-run on Jetson AGX Orin (`JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`): 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s. AC-3 satisfied — `test_az840_e2e_real_flight_orchestration` no longer SKIPs at the env-var gate. AC-4 satisfied — it now ERRORs at a deeper, real gate (`IndexUnavailableError: FaissDescriptorIndex: .index file missing at /tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index`) which is captured in a NEW follow-up ticket **AZ-964**. The empty-backbones gate that this spec originally flagged (c10 backbones) becomes the gate AFTER AZ-964 clears — filed as **AZ-965**.
Net cycle-4 status remains NOT GREEN (orchestrator test still doesn't PASS, blocked by AZ-964 + AZ-965; ESKF divergence regression still blocked by AZ-963). AZ-962 itself is complete.
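To make the secret-handling path in the closure note concrete, here is a minimal sketch of how an env-to-config key map can overlay environment values onto a parsed YAML config. Only the two `ENV_KEY_MAP` entries come from this change. The `overlay_env` helper, the sample URL, and the dummy key are hypothetical, and the real `load_config` in `loader.py` may apply the map differently.

```python
from typing import Mapping

# The two entries added by AZ-962; each tuple is (config block, field name).
ENV_KEY_MAP: dict[str, tuple[str, str]] = {
    "SATELLITE_PROVIDER_URL": ("c11_tile_manager", "satellite_provider_url"),
    "SATELLITE_PROVIDER_API_KEY": ("c11_tile_manager", "service_api_key"),
}


def overlay_env(config: dict[str, dict], env: Mapping[str, str]) -> dict[str, dict]:
    """Hypothetical helper: return a copy of `config` with mapped env values applied."""
    merged = {block: dict(fields) for block, fields in config.items()}
    for env_name, (block, field) in ENV_KEY_MAP.items():
        value = env.get(env_name)
        if value:  # unset or empty variables leave the YAML value untouched
            merged.setdefault(block, {})[field] = str(value)
    return merged


# Illustrative inputs only: the YAML carries no secret, the environment does.
yaml_config = {"c11_tile_manager": {"upload_batch_size": 25}}
fake_env = {
    "SATELLITE_PROVIDER_URL": "https://tiles.example.invalid",
    "SATELLITE_PROVIDER_API_KEY": "dummy-key",
}
resolved = overlay_env(yaml_config, fake_env)
```

The overlay works on a copy, so the parsed YAML never holds the key; that is the property the closure note relies on.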
## Why
@@ -0,0 +1,80 @@
# AZ-964 — Bootstrap FAISS descriptor index for AZ-839 C3 fixture (`operator_pre_flight_cache`)
**Status**: To Do (Jira) / `todo/` (local)
**Issue type**: Task
**Complexity**: 3 SP
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-964
**Filed**: 2026-05-29 (surfaced by AZ-962 Tier-2 re-run)
## Why
Discovered 2026-05-29 during the AZ-962 Tier-2 re-run on Jetson AGX Orin. With `GPS_DENIED_OPERATOR_CONFIG_PATH` + `operator_replay.yaml` now correctly wired (AZ-962 shipped), the AZ-840 orchestrator test (`tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration`) moved from SKIPped to ERRORed at a deeper, real gate during fixture setup:
```
gps_denied_onboard.components.c6_tile_cache.errors.IndexUnavailableError:
FaissDescriptorIndex: .index file missing at
/tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index
```
The same error also breaks `test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache`, confirming this is a fixture-wide problem, not specific to one test.
## Root cause (read from code)
`tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache` (line 487):
1. Overrides `c6_tile_cache.root_dir` to a fresh `/tmp/pytest-of-root/.../operator_pre_flight_cache0/` (per AC of AZ-839, the fixture creates a *new* cache each test).
2. Calls `build_descriptor_index(config)` — which constructs `FaissDescriptorIndex.from_config(config)`.
3. `FaissDescriptorIndex.__init__` calls `_load()` which **raises** `IndexUnavailableError` when no `.index` file exists at `c6_tile_cache.root_dir/descriptor.index`.
4. The fixture never gets to call `populate_c6_from_route` (which presumably creates the index downstream).
The compose `tile-init` setup service exists and runs `scripts/mk_test_faiss_fixture.py` — but it writes a seed index to `/var/lib/gps-denied/tiles` (the `tile-data` volume), **not** to the tmp dir the fixture overrides into. So the fixture's override path always starts empty.
## Goal
Make `_build_operator_pre_flight_cache` succeed past the `build_descriptor_index(config)` call so the AZ-840 orchestrator test can actually exercise the 7-step pipeline (or fail at the next real gate — c10 backbones, AZ-965).
## Scope
One of (in preference order; pick during implementation):
1. **Fixture seeds the index inline**: before calling `build_descriptor_index`, invoke `scripts/mk_test_faiss_fixture.py` programmatically (or in-process equivalent) against the override `root_dir`. Pure test-infra change.
2. **`populate_c6_from_route` creates the index if missing**: production code change so the descriptor-index factory tolerates a fresh `root_dir`. Larger blast radius — touches a shared factory.
3. **`FaissDescriptorIndex` supports an explicit `bootstrap=True` mode**: factory signal that this run intends to create a fresh index. Requires API design.
Option (1) is the smallest, lowest-risk path and the natural extension of the `tile-init` pattern already in compose. **Recommended.**
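Option (1) can be sketched as follows. Everything here is a stand-in: `StubDescriptorIndex`, `seed_index`, `build_cache`, and the placeholder file contents are hypothetical. The real fixture would invoke `scripts/mk_test_faiss_fixture.py` (or an in-process equivalent) and then `build_descriptor_index(config)`. The sketch only shows the ordering: seed the override `root_dir` first, then construct the index.

```python
import tempfile
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Stand-in for the c6_tile_cache error of the same name."""


class StubDescriptorIndex:
    """Hypothetical stand-in that refuses to load when no index file exists."""

    def __init__(self, root_dir: Path) -> None:
        self.path = root_dir / "descriptor.index"
        if not self.path.exists():
            raise IndexUnavailableError(f".index file missing at {self.path}")


def seed_index(root_dir: Path) -> Path:
    """Hypothetical seeding step; the real one would write a valid FAISS index."""
    root_dir.mkdir(parents=True, exist_ok=True)
    path = root_dir / "descriptor.index"
    path.write_bytes(b"placeholder")
    return path


def build_cache(root_dir: Path, seed_first: bool) -> StubDescriptorIndex:
    """Construct the index, optionally seeding the fresh directory beforehand."""
    if seed_first:
        seed_index(root_dir)
    return StubDescriptorIndex(root_dir)


with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp) / "operator_pre_flight_cache0"
    try:
        build_cache(fresh, seed_first=False)
        unseeded_failed = False
    except IndexUnavailableError:
        unseeded_failed = True
    seeded_ok = build_cache(fresh, seed_first=True).path.name == "descriptor.index"
```

Keeping the seeding inside the fixture leaves production code untouched, which is the reason option (1) is rated lowest-risk.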
## Acceptance Criteria
* **AC-1**: `_build_operator_pre_flight_cache` no longer ERRORs at `build_descriptor_index` when started against a fresh empty `c6_tile_cache.root_dir`.
* **AC-2**: `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh` no longer reports the `IndexUnavailableError` for `test_az840_e2e_real_flight_orchestration` **or** for `test_operator_pre_flight_setup_produces_populated_cache`.
* **AC-3**: If the AZ-840 orchestrator test now reaches the c10-backbone gate (`AZ-839 operator_pre_flight_setup: config has no c10_provisioning.backbones entries`), that's the expected next gate — AZ-965 handles it; AZ-964 is done.
* **AC-4**: `tests/unit` + `tests/e2e/replay/test_operator_pre_flight_*` continue to pass on Tier-1 (Colima).
## Out of scope
* c10 backbone provisioning (separate ticket — AZ-965).
* The 4 ESKF-divergence regression failures in `test_derkachi_1min.py` (separate ticket — AZ-963).
* Adding a reference C6 tile cache for the Derkachi fixture (large separate work).
* Re-opening AZ-840 / AZ-842 tracker state.
## Dependencies
* **Blocks**: AZ-840 (orchestrator test cannot run end-to-end until this clears).
* **Surfaced by**: AZ-962 (env-var + YAML wiring exposed the next gate).
* **Related**: AZ-839 (C3 fixture — this is its bug to own).
## Estimate
3 SP. Multi-step (locate the seed-index script, invoke it from the fixture before `build_descriptor_index`, verify on Tier-2), moderate risk (the seed script's assumptions might not match the fixture's override path layout).
## References
* Run log: 2026-05-29 Tier-2 Jetson AGX Orin (AZ-962 re-run), 84.99s, 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors
* Test: `tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration` (ERROR)
* Test: `tests/e2e/replay/test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache` (ERROR)
* Fixture: `tests/e2e/replay/conftest.py:487`
* Faulting factory: `src/gps_denied_onboard/runtime_root/storage_factory.py:176`
* Faulting class: `src/gps_denied_onboard/components/c6_tile_cache/faiss_descriptor_index.py:107,430`
* Existing seed script: `scripts/mk_test_faiss_fixture.py` (invoked by `tile-init` compose service)
* AZ-962 spec: `_docs/02_tasks/done/AZ-962_operator_config_jetson_wiring.md`
@@ -0,0 +1,83 @@
# AZ-965 — Provision NetVLAD ONNX backbone for AZ-839 `c10_provisioning` corpus
**Status**: To Do (Jira) / `todo/` (local)
**Issue type**: Task
**Complexity**: 3 SP (5 SP if export/training required)
**Cycle**: cycle-4 e2e closure follow-up
**Jira**: https://denyspopov.atlassian.net/browse/AZ-965
**Filed**: 2026-05-29 (forward-looked during AZ-962)
## Why
Forward-looked during AZ-962. The AZ-839 C3 fixture's `_build_replay_backbone_embedder` (`conftest.py:594-601`) calls `build_backbone_specs(config)` which reads `config.components['c10_provisioning'].backbones` (a tuple of `BackboneSpec`). When empty (the current state — no `.onnx` files ship in the repo), the fixture `pytest.skip`s with:
```
AZ-839 operator_pre_flight_setup: config has no c10_provisioning.backbones
entries — the e2e harness config must declare at least one backbone
(typically DINOv2-VPR or NetVLAD per AZ-321).
```
The AZ-962 YAML (`configs/operator_replay.yaml`) explicitly leaves the `backbones:` list empty with a TODO note pointing at this ticket. Right now (post-AZ-962) the AZ-840 orchestrator test ERRORs at the FAISS-index gate (AZ-964) **before** reaching the backbones gate — but once AZ-964 ships, this is the next blocker.
## Goal
Provision a NetVLAD `.onnx` model (per AZ-321's pinned backbone choice) and matching `BackboneSpec` entry in `configs/operator_replay.yaml` so `c10_provisioning.compile_engines_for_corpus` can compile at least one engine in the AZ-839 fixture.
## Scope
1. **Source a NetVLAD `.onnx`**: AZ-321 specifies NetVLAD as the C2 baseline. Either:
- Export from an existing PyTorch checkpoint our team owns;
- Pull a vetted public weights file (with license/provenance recorded in `_docs/03_ip_attribution/`);
- Train from scratch (out of scope for this ticket — file a follow-up if neither of the above works).
2. **Place the `.onnx` in the repo**: under a path that's bind-mounted into the Jetson container (e.g. `models/netvlad/netvlad.onnx`). Add to `.gitattributes` for git-lfs if >50 MiB. Verify size against existing checked-in models.
3. **Verify TensorRT compile**: run `c7_inference.PyTorchFp16Runtime.compile_engine` (or the relevant production code path) against the new `.onnx` on Jetson AGX Orin to confirm a `.engine` file is produced with a sensible descriptor dim (typically 4096 per AZ-321).
4. **Populate `configs/operator_replay.yaml`**:
```yaml
c10_provisioning:
  workspace_mb: 4096
  backbones:
    - model_name: netvlad
      onnx_path: /opt/models/netvlad/netvlad.onnx
      input_name: image
      input_shape_chw: [3, 224, 224]
      descriptor_dim: 4096
```
(Exact field names per `BackboneSpec` dataclass — verify in `src/gps_denied_onboard/components/c10_provisioning/`.)
5. **Wire `./models` bind-mount** into `docker-compose.test.jetson.yml`.
6. **Update `c2_vpr` block** in the YAML if `_resolve_replay_descriptor_dim` requires `c2_vpr.strategy='net_vlad'` (it does — see `conftest.py:658-666`).
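Before wiring the entry into the fixture, a small pre-check along these lines could catch an inconsistent backbone entry early. The `BackboneEntry` dataclass and `parse_backbones` helper below are hypothetical mirrors of the YAML snippet in step 4, not the project's `BackboneSpec` or `build_backbone_specs`, whose exact fields still need to be verified as step 4 notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneEntry:
    """Hypothetical mirror of the YAML fields in step 4 (not the real BackboneSpec)."""

    model_name: str
    onnx_path: str
    input_name: str
    input_shape_chw: tuple[int, int, int]
    descriptor_dim: int


def parse_backbones(block: dict) -> tuple[BackboneEntry, ...]:
    """Turn the `backbones` list of a c10_provisioning block into typed entries."""
    entries = []
    for raw in block.get("backbones") or []:
        name = str(raw["model_name"])
        shape = tuple(int(v) for v in raw["input_shape_chw"])
        if len(shape) != 3 or min(shape) <= 0:
            raise ValueError(f"{name}: input_shape_chw must be three positive ints")
        if not str(raw["onnx_path"]).endswith(".onnx"):
            raise ValueError(f"{name}: onnx_path must end in .onnx")
        dim = int(raw["descriptor_dim"])
        if dim <= 0:
            raise ValueError(f"{name}: descriptor_dim must be positive")
        entries.append(
            BackboneEntry(name, str(raw["onnx_path"]), str(raw["input_name"]), shape, dim)
        )
    return tuple(entries)


# Dict form of the YAML snippet from step 4.
block = {
    "workspace_mb": 4096,
    "backbones": [
        {
            "model_name": "netvlad",
            "onnx_path": "/opt/models/netvlad/netvlad.onnx",
            "input_name": "image",
            "input_shape_chw": [3, 224, 224],
            "descriptor_dim": 4096,
        }
    ],
}
specs = parse_backbones(block)
empty = parse_backbones({"workspace_mb": 4096})  # the current post-AZ-962 state
```

An empty result corresponds to the state that triggers the fixture's skip-gate; a non-empty one is what AC-3 asks for.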
## Acceptance Criteria
* **AC-1**: `models/netvlad/netvlad.onnx` (or equivalent path) exists in the repo with documented provenance + license.
* **AC-2**: `c7_inference` can compile this `.onnx` to a TensorRT `.engine` on Jetson AGX Orin (Tier-2) without errors.
* **AC-3**: `configs/operator_replay.yaml` declares the `netvlad` backbone in `c10_provisioning.backbones`.
* **AC-4**: `JETSON_SSH_ALIAS=<alias> bash scripts/run-tests-jetson.sh` no longer SKIPs `test_az840_e2e_real_flight_orchestration` with the empty-backbones message.
* **AC-5**: The AZ-840 orchestrator test either PASSes (and the AZ-699 verdict report lands at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`) or fails with a NEW error filed as a separate follow-up ticket.
* **AC-6**: License/provenance recorded in `_docs/03_ip_attribution/` per project convention.
## Out of scope
* DINOv2-VPR or other alternative backbones (NetVLAD is AZ-321's pinned baseline).
* MegaLoc / MixVPR / UltraVPR (these require a descriptor-dim resolver change — out of conftest scope).
* The 4 ESKF-divergence regression failures (AZ-963).
* Reference C6 tile cache for the Derkachi fixture (large separate work).
## Dependencies
* **Blocked by**: AZ-964 (FAISS index bootstrap — the orchestrator test ERRORs there before reaching this gate; clearing AZ-964 first surfaces the empty-backbones gate cleanly).
* **Blocks**: AZ-840 (orchestrator test cannot PASS end-to-end without a real backbone).
* **Related**: AZ-321 (defines NetVLAD as the C2 baseline), AZ-839 (C3 fixture).
## Estimate
3 SP if a usable `.onnx` already exists in the team's drive; 5 SP if export/training is needed. If 5+ SP, consider splitting model-acquisition from yaml-wiring into two sub-tickets.
## References
* Fixture skip-gate: `tests/e2e/replay/conftest.py:594-601`
* Backbone factory: `src/gps_denied_onboard/runtime_root/c10_factory.py::build_backbone_specs`
* Backbone spec dataclass: `src/gps_denied_onboard/components/c10_provisioning/config.py`
* AZ-321 (NetVLAD baseline choice)
* AZ-962 spec: `_docs/02_tasks/done/AZ-962_operator_config_jetson_wiring.md`
+1 -1
@@ -8,7 +8,7 @@ status: in_progress
sub_step:
phase: 6
name: implement-tasks
detail: "batch 10 = AZ-962 SHIPPED end-to-end + 2 new tickets filed. Implemented `configs/operator_replay.yaml` (registers c6/c7/c10/c11 with defaults; `backbones: []` intentionally — see AZ-965), `docker-compose.test.jetson.yml` exports `GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml` + bind-mounts `./configs:/opt/configs:ro`, `ENV_KEY_MAP` (`src/gps_denied_onboard/config/loader.py`) gained two entries (`SATELLITE_PROVIDER_URL` → `c11.satellite_provider_url`, `SATELLITE_PROVIDER_API_KEY` → `c11.service_api_key`) so secrets flow from `.env.test` and never land in YAML, and README dropped the manual export step. 97/97 c11+config unit tests stay green. Tier-2 re-run on Jetson AGX Orin (`JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`): 4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s — i.e. -2 skipped, +2 errors vs the prior baseline. AZ-962 AC-3 + AC-4 satisfied: AZ-840 orchestrator no longer SKIPs at env-var; it now ERRORs at a deeper, real gate during fixture setup with `IndexUnavailableError: FaissDescriptorIndex: .index file missing at /tmp/pytest-of-root/pytest-0/operator_pre_flight_cache0/descriptor.index`. Same error in `test_operator_pre_flight_integration.py::test_operator_pre_flight_setup_produces_populated_cache` confirms fixture-wide bug, not a single-test issue. Root cause: `conftest.py:487` calls `build_descriptor_index(config)` against a fresh empty `c6_tile_cache.root_dir` (tmp dir per AZ-839 invariant) — FAISS factory needs an existing `.index`.
`tile-init` compose service exists but writes its seed to `/var/lib/gps-denied/tiles`, not the tmp dir the fixture overrides into. Filed AZ-964 (3 SP, To Do, FAISS index bootstrap; preferred fix = invoke `mk_test_faiss_fixture.py` inline against override `root_dir`) and AZ-965 (3 SP, To Do, blocked by AZ-964, NetVLAD ONNX backbone provisioning — the next gate after FAISS clears). AZ-962 transitioned To Do → In Progress → Done in Jira (read-back verified). AZ-962 spec moved todo/ → done/. **Cycle-4 e2e gate still NOT GREEN**: AZ-840 chain is now AZ-964 → AZ-965 → orchestrator PASS; 60s smoke is AZ-963 → 4 derkachi_1min tests PASS. OKVIS2 deferral directive still in force (not yet met). Earlier same-day batch 9 = Tier-2 Jetson e2e validation run NOT GREEN. Ran `JETSON_SSH_ALIAS=jetson bash scripts/run-tests-jetson.sh`; result = 4 failed / 48 passed / 3 skipped / 1 xfailed / 1 xpassed in 90.59s. Two distinct blockers: (1) AZ-840 orchestrator test SKIPPED because `GPS_DENIED_OPERATOR_CONFIG_PATH` not exported by `docker-compose.test.jetson.yml` AND `operator_replay.yaml` missing from repo — Epic AZ-835's 'Done' status was validated by doc-content only, never by actual orchestrator test execution; (2) AZ-895 fallout — 4 tests in `test_derkachi_1min.py` regress with `EstimatorFatalError('eskf filter divergence: mahalanobis²=212.311 > 100.0')` at frame 233 because the CSV-driven path (now primary) runs open-loop on the Derkachi fixture (no reference C6 tile cache → no satellite anchoring). Filed AZ-962 (3 SP, operator config + compose wiring) and AZ-963 (3 SP, ESKF regression triage). OKVIS2 chain stays deferred per user 2026-05-29 directive ('after Derkachi e2e green' — directive unchanged; e2e not green). AZ-842 caveat: the AZ-840/AZ-842 'Done' tracker state set earlier today is contingent on whether convention (A) 'In Testing = shipped' or (B) 'Done = shipped+tested' applies; user-skipped convention question, leftover holds the walk-back payload if needed. 
Cycle-4 not green. Earlier same-day batch 8 = tracker-only fix for AZ-842 (To Do → Done, read-back verified) + wider Jira drift audit recorded as `_docs/_process_leftovers/2026-05-29_jira_status_drift_audit.md`. 10 cycle-3/4 tickets (AZ-836/838/839/840/894/895/896/899/900/901) shipped to `done/` locally but stuck in 'In Testing' in Jira; Epic AZ-835 in `todo/` with all 5 children done. User skipped A/B/C/D convention question — leftover holds the bulk-transition payload for whichever convention they pick. **Corrected cycle-4 todo/ remainder**: nothing actionable. Earlier narratives that listed AZ-899/900/901 as 'cycle-4 todo/ remainder for next batches' were fiction — those specs have been in done/ the whole time. OKVIS2 chain (AZ-943/951/952) sits in todo/ but is deferred per user 2026-05-29 directive until after Derkachi e2e flight test passes. Cycle-4 product work is effectively complete pending Derkachi e2e green + AZ-897 UI in ../ui."
retry_count: 0
cycle: 4
tracker: jira
+66
@@ -0,0 +1,66 @@
# AZ-962 — Operator pre-flight + replay-mode config for Tier-2 Jetson e2e harness.
#
# Consumed by `tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache`
# (the AZ-839 C3 fixture), which calls `load_config(env, paths=[this])` and then
# drives the AZ-840 7-step orchestrator (`test_az835_e2e_real_flight.py`).
#
# Most fields stay at their dataclass defaults (see
# `src/gps_denied_onboard/components/{c6_tile_cache,c7_inference,c10_provisioning,c11_tile_manager}/config.py`).
# The blocks are declared here primarily so the four-component contract the
# fixture skip-gate cites is satisfied by inspection of this file. The env
# vars below are filled by docker-compose.test.jetson.yml / `.env.test`:
#
# * `GPS_DENIED_FC_PROFILE`, `GPS_DENIED_TIER`, `DB_URL` → runtime
# * `INFERENCE_BACKEND`, `TILE_CACHE_PATH`, `CAMERA_CALIBRATION_PATH` → runtime
# * `LOG_LEVEL`, `LOG_SINK` → log
# * `FDR_PATH` → fdr
# * `SATELLITE_PROVIDER_URL` → c11_tile_manager.satellite_provider_url
# * `SATELLITE_PROVIDER_API_KEY` → c11_tile_manager.service_api_key
#
# AZ-965 (follow-up, blocked by AZ-964): once the FAISS index gate
# clears, the orchestrator test SKIPs at the next gate because
# `c10_provisioning.backbones` is empty; no NetVLAD / DINOv2 .onnx file
# ships with this repo. Populating the backbones list here (and
# provisioning the matching .onnx + verifying it compiles on Tegra) is
# AZ-965's scope, not AZ-962's.
__top__:
  mode: replay
runtime:
  fc_profile: ardupilot_plane
  tier: 2
replay:
  pace: asap
  target_fc_dialect: ardupilot_plane
c6_tile_cache:
  store_runtime: postgres_filesystem
  metadata_runtime: postgres_filesystem
  descriptor_index_runtime: faiss_hnsw
  postgres_pool_size: 4
  lru_eviction_threshold_bytes: 10737418240 # 10 GiB
c7_inference:
  runtime: pytorch_fp16
  thermal_poll_hz: 1.0
  engine_cache_dir: /var/lib/gps-denied/engines
  gpu_memory_budget_bytes: 4294967296 # 4 GiB
  trtexec_timeout_s: 600
  ort_trt_cache_dir: /var/lib/gps-denied/engines/ort_trt_cache
c10_provisioning:
  workspace_mb: 4096
  # backbones intentionally empty; see AZ-965 for the follow-up.
  # The AZ-839 fixture skip-gate (conftest.py:594-601) fires here
  # with a clear message until backbone provisioning lands.
c11_tile_manager:
  # satellite_provider_url + service_api_key flow in from env vars
  # (SATELLITE_PROVIDER_URL / SATELLITE_PROVIDER_API_KEY) via the
  # loader's ENV_KEY_MAP additions in AZ-962.
  upload_batch_size: 25
  upload_http_timeout_s: 30.0
  download_http_timeout_s: 30.0
  download_max_5xx_retries: 4
  download_resolution_floor_m_per_px: 0.5
+7
@@ -169,9 +169,16 @@ services:
# `replay_runner` fixture trips that gate without this line.
BUILD_CSV_REPLAY_ADAPTER: "ON"
BUILD_FAISS_INDEX: "ON"
# AZ-962: the AZ-839 C3 fixture (operator_pre_flight_setup) skips
# the AZ-840 orchestrator test when this var is missing. The YAML
# bind-mounted at /opt/configs/operator_replay.yaml declares the
# four blocks the fixture consumes (c6/c7/c10/c11). c10.backbones
# is intentionally empty; AZ-965 ships the .onnx + populates it.
GPS_DENIED_OPERATOR_CONFIG_PATH: /opt/configs/operator_replay.yaml
volumes:
- ./tests:/opt/tests:ro
- ./_docs/00_problem/input_data:/opt/_docs/00_problem/input_data:ro
- ./configs:/opt/configs:ro
- fdr-data:/var/lib/gps-denied/fdr
- tile-data:/var/lib/gps-denied/tiles
+11
@@ -74,6 +74,14 @@ ENV_KEY_MAP: Final[dict[str, tuple[str, str]]] = {
"REPLAY_PACE": ("replay", "pace"),
"REPLAY_TIME_OFFSET_MS": ("replay", "time_offset_ms"),
"REPLAY_TARGET_FC_DIALECT": ("replay", "target_fc_dialect"),
# C11 tile-manager URL + bearer (AZ-962) — the Jetson harness +
# operator-orchestrator deploys inject these via env so the YAML
# never carries a real secret. `build_tile_downloader` raises if
# `service_api_key` is empty; mapping it through ENV_KEY_MAP lets
# the e2e harness fill the field from `.env.test` without a YAML
# override step.
"SATELLITE_PROVIDER_URL": ("c11_tile_manager", "satellite_provider_url"),
"SATELLITE_PROVIDER_API_KEY": ("c11_tile_manager", "service_api_key"),
}
# Env vars that MUST resolve to a non-empty value before `load_config`
@@ -122,6 +130,9 @@ _FIELD_COERCIONS: Final[dict[str, type]] = {
"output_path": str,
"pace": str,
"target_fc_dialect": str,
# C11 (AZ-962)
"satellite_provider_url": str,
"service_api_key": str,
}
+11 -1
@@ -51,10 +51,20 @@ matching nadir video + camera calibration, the orchestrator runs the
ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
export GPS_DENIED_OPERATOR_CONFIG_PATH=/workspace/configs/operator_replay.yaml
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
``` ```
AZ-962: `docker-compose.test.jetson.yml` exports
`GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml`
automatically and bind-mounts `./configs:/opt/configs:ro`, so no
manual env-var export is required when running through
`scripts/run-tests-jetson.sh`. The YAML at `configs/operator_replay.yaml`
declares the four blocks the fixture requires (c6 / c7 / c10 / c11);
secrets (`SATELLITE_PROVIDER_API_KEY`) flow in from `.env.test` via
the loader's `ENV_KEY_MAP`. `c10_provisioning.backbones` is
intentionally empty pending AZ-965 (once AZ-964 clears the FAISS index
gate, the orchestrator test will SKIP at the "no backbones" gate until
AZ-965 lands).
The bundled local-development entry point is `scripts/run-tests-jetson.sh`,
which handles the SSH alias + rsync + remote pytest invocation. See
`_docs/02_document/tests/tier2-jetson-testing.md` for the harness contract.
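For runs outside the compose harness, a pre-check like the sketch below can confirm that `GPS_DENIED_OPERATOR_CONFIG_PATH` points at a readable file before pytest is invoked. The helper name and its messages are hypothetical; the fixture's own skip-gate remains the authority on what is actually checked.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "GPS_DENIED_OPERATOR_CONFIG_PATH"


def operator_config_problem(env: Mapping[str, str]) -> Optional[str]:
    """Return a human-readable problem, or None when the config path looks usable."""
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return f"{ENV_VAR} is not set"
    if not Path(raw).is_file():
        return f"{ENV_VAR} points at a missing file: {raw}"
    return None


# Exercise the three cases against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "operator_replay.yaml"
    missing = operator_config_problem({ENV_VAR: str(config_path)})
    config_path.write_text("__top__:\n  mode: replay\n")
    present = operator_config_problem({ENV_VAR: str(config_path)})
unset = operator_config_problem({})
```

In a shell session the same helper could be called with `os.environ` as its argument.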