mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 17:51:14 +00:00
[AZ-777] Rewrite spec: real satellite-provider + production C10/C11
Original spec called for direct OSM/CARTO downloads, contradicting the architecture (C11 owns tile network I/O against the parent-suite satellite-provider .NET 8 service; C10 batches descriptors over the populated C6 and never touches the upstream). The rewritten spec drives the production C10/C11 pipeline against the real satellite-provider running in docker-compose.test.yml, replacing the mock-suite-sat-service GET stub.

Complexity 5 -> 8 pts (single-ticket override). Decision log: _docs/_process_leftovers/2026-05-21_az777_complexity_override.md.

Jira AZ-777 description + summary synced. Autodev state pauses so the next session can pick up Phase 1 (satellite-provider stand-up + smoke test).

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -1,193 +1,196 @@
# Derkachi C6 reference tile cache + descriptor index (OSM/CARTO basemap)
# Derkachi e2e: wire real satellite-provider + production C10/C11 pipeline into the operator pre-flight fixture

**Task**: AZ-777_derkachi_c6_reference_fixture
**Name**: Build the C6 reference tile cache + FAISS descriptor index for the Derkachi flight bbox so the full-protocol C1+C2+C3+C4+C5 pipeline can produce satellite anchors during e2e replay
**Description**: Add a reproducible build script that downloads OSM/CARTO basemap tiles for the Derkachi flight bbox (approx 50.05–50.15 lat, 36.05–36.15 lon), pre-computes feature descriptors via the same C7 backbone the airborne binary uses (DINOv2 or the configured VPR backbone), populates the C6 tile store + FAISS HNSW index, and integrates them into the e2e replay harness. Unblocks the two remaining `@xfail`-masked Derkachi tests on Jetson (`test_ac3_within_100m_80pct_of_ticks` and `test_az699_real_flight_validation_emits_verdict_and_report`) and produces the first honest AZ-699 accuracy verdict.
**Complexity**: 5 points
**Dependencies**: AZ-776_eskf_open_loop_composition_profile
**Component**: c6_tile_cache / e2e fixtures / input_data
**Name**: Drive the production C10/C11 pre-flight pipeline against a real parent-suite `satellite-provider` service in the e2e harness so the Derkachi clip produces a real FAISS-anchored C4/C5 satellite-fix loop end-to-end
**Description**: Replace the e2e harness's `mock-suite-sat-service` `/healthz`-only stub on the GET tile path with the real `satellite-provider` .NET 8 service (sibling repo at `../satellite-provider`). Seed satellite-provider's Derkachi-bbox tile catalog from a CC-BY-licensed basemap source. Replace the `operator_pre_flight_setup` placeholder fixture in `tests/e2e/replay/conftest.py` with a real fixture that drives the production C11 `HttpTileDownloader` + C10 `DescriptorBatcher` pipeline against the running service, builds C6 (Postgres metadata + filesystem tile store + FAISS HNSW descriptor index), and mounts the populated cache into the e2e-runner container. Un-xfail the Derkachi AC-3 + AZ-699 verdict tests on Tier-2 Jetson; produce the first honest AZ-699 horizontal-error verdict report.
**Complexity**: 8 points (explicit override of the standard 5-pt PBI cap — see decision log entry 2026-05-21 under `_docs/_process_leftovers/2026-05-21_az777_complexity_override.md`; single-ticket containment is preferred over decomposition because the four sub-deliverables only deliver demo-confidence value when shipped together)
**Dependencies**: AZ-776_eskf_open_loop_composition_profile (done — AZ-776 unblocks compose; this task closes the satellite-anchoring loop)
**Component**: e2e fixtures / c6_tile_cache / c10_provisioning / c11_tile_manager / docker compose
**Tracker**: AZ-777
**Epic**: AZ-602

## Problem

The Derkachi e2e fixture
(`_docs/00_problem/input_data/flight_derkachi/`) ships the real
flight inputs (video, tlog, IMU, camera calibration) but DOES NOT
ship the C6 tile-cache artifacts that the replay protocol requires
the operator's pre-flight C10 stage to produce:
The Derkachi e2e fixture (`_docs/00_problem/input_data/flight_derkachi/`) ships real flight inputs (video, tlog, IMU, camera calibration) but DOES NOT ship the populated C6 tile cache + FAISS descriptor index the replay protocol requires (`replay_protocol.md` Invariant 12: "Real C6 cache in replay: the airborne binary in replay mode reads the same pre-built C6 tile cache the operator built via the normal pre-flight C10/C11/C12 flow"). Two architectural gaps stop the full-protocol C1+C2+C3+C4+C5 pipeline from running against Derkachi today:

- `c6_tile_store` — persistent JPEG tiles covering the flight area at the chosen zoom levels
- `c6_descriptor_index` — FAISS index of VPR-backbone descriptors over those tiles
1. **`mock-suite-sat-service` is `/healthz`-only.** The stub at `tests/fixtures/mock-suite-sat-service/main.py` exposes only `GET /healthz` and does NOT implement the `/api/satellite/tiles` contract that C11 `HttpTileDownloader` (production code at `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py`) queries against. Any e2e test that wants to exercise the production tile-download path against the stub gets HTTP 404 the moment C11 calls `_LIST_PATH = "/api/satellite/tiles"`.
2. **`operator_pre_flight_setup` is a placeholder.** The fixture at `tests/e2e/replay/conftest.py` (lines 293-310) `mkdir`s an empty `operator_cache` directory and yields. It does NOT drive C11 download or C10 descriptor-batcher; it does NOT populate C6. The fixture's docstring explicitly calls itself "a stub" pending this ticket.

Without these artifacts:
The production architecture says (per `architecture.md` Principle #5 + the C10/C11 component descriptions):

- C2 VPR has no haystack to look up against — `c2_vpr.lookup` returns empty.
- C3 matcher has nothing to match against (depends on C2 candidates).
- C4 pose has no anchors — cannot estimate satellite-frame pose.
- C5 state has no anchors to fuse — runs open-loop on VIO only.
- C10 does NOT touch satellite-provider — tile network I/O lives in C11.
- C11 `HttpTileDownloader` is the production path: authenticated GETs against the parent-suite `satellite-provider` .NET 8 REST service (sibling repo at `../satellite-provider/`, real implementation with `SatelliteProvider.Api`, region-onboarding flows, integration tests).
- `satellite-provider` owns the OSM/CARTO tile network I/O + license attribution + multi-flight voting layer — the onboard companion is read-only against it (via C11) during pre-flight and read-only against C6 during flight.
- `mock-suite-sat-service` exists specifically for the D-PROJ-2 ingest (POST upload) endpoint that the parent-suite has not yet shipped — NOT for the GET tile-fetch path.

When `c5_state.strategy = gtsam_isam2` (the default that AZ-699's e2e
exercises), the composition reaches the per-frame loop but
`iSAM2.update` crashes at frame 1 with:

```
EstimatorFatalError: compute_marginals failed: Attempting to at the
key 'x2', which does not exist in the Values.
```

— because no C4 anchor was ever inserted (C2/C3/C4 have nothing to
match against).

AZ-776 (sibling, prerequisite) makes the open-loop C1+C5(ESKF)
composition runnable, but that path skips C2–C4 entirely and accepts
unbounded drift. To validate the FULL protocol-compliant pipeline
against Derkachi — i.e. AC-3 (`≤100 m for 80 % of ticks`) and the
AZ-699 horizontal-error verdict — we need real C6 fixtures.

The replay protocol (`replay_protocol.md` line 214) explicitly states
"`BUILD_FAISS_INDEX` is ON in the airborne binary (live and replay
alike). C2 in replay queries the **real** C6 `FaissDescriptorIndex`,
populated by the pre-flight C10 build. This is the architectural
change vs. v1.0.0 of this contract." We have no such build for
Derkachi.
The current AZ-777 spec ("write a script under `scripts/build_derkachi_c6_fixture.py` that downloads OSM/CARTO basemap tiles directly") was inconsistent with this architecture: it asked the onboard companion to do network I/O against an external imagery source instead of going through C11→satellite-provider. The corrected scope (this revision) drives the production pipeline end-to-end.

## Outcome

- A reproducible build script under `scripts/` produces the C6 artifacts (`tile_store` + `descriptor_index`) given the Derkachi bbox + zoom levels + camera calibration, deterministically on a clean checkout, in under 30 minutes on a developer workstation.
- Reference imagery source is OSM-tile-server-distributed basemap (CARTO Voyager or equivalent CC-BY-licensed source). Each tile carries the source URL + license attribution in its metadata sidecar.
- The Derkachi fixture directory documents the build invocation; tiles + index are EITHER committed to the repo (if total size ≤ 100 MB) OR built on-demand from the script (if larger) — decision recorded in the fixture README.
- `tests/e2e/replay/conftest.py`'s `operator_pre_flight_setup` fixture is replaced (or extended) to mount the prebuilt artifacts into the e2e-runner container. The mock-suite-sat-service stub is retired for the C6-served paths (it remains for the C12 operator-workflow AC-8).
- After this task ships (with AZ-776), un-xfail `test_ac3_within_100m_80pct_of_ticks` (`test_derkachi_1min.py` line 174) AND `test_az699_real_flight_validation_emits_verdict_and_report` (`test_derkachi_real_tlog.py` line 174); both pass on the Jetson harness.
- The first honest AZ-699 verdict lands at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the full horizontal-error distribution. Whether the verdict is PASS or FAIL is the honest finding — this task's success is that the verdict is *produced* against the real pipeline, not that it is necessarily green.
- The e2e harness `docker-compose.test.yml` runs the real `satellite-provider` .NET 8 service (built from `../satellite-provider/SatelliteProvider.Api/Dockerfile`) alongside the existing `mock-sat` (which is retained only for the D-PROJ-2 POST/upload contract until the parent suite ships it).
- `satellite-provider`'s tile catalog is seeded with the Derkachi bbox (≈50.05–50.15 lat, 36.05–36.15 lon) at the camera-AGL-appropriate zoom levels (15–18) via the service's existing region-onboarding flow (CC-BY-licensed basemap source; license + attribution baked into the seeded catalog's metadata).
- `tests/e2e/replay/conftest.py::operator_pre_flight_setup` is replaced by a real fixture that:
  1. Resolves the Derkachi bbox + camera-derived zoom range from the existing flight fixture.
  2. Invokes C11 `HttpTileDownloader` against the running `satellite-provider` to populate C6 (Postgres metadata + filesystem tile store).
  3. Invokes C10 `DescriptorBatcher` against the populated C6 to build the FAISS HNSW descriptor index via the production NetVLAD backbone (C2 default per `c2_vpr/config.py:67`).
  4. Verifies all three sidecar files (`.index`, `.sha256`, `.meta.json`) per the FAISS sidecar coherence invariant (AZ-306).
  5. Yields the populated cache directory + Postgres connection string for the e2e-runner to mount.
- The populated C6 is mounted into the `e2e-runner` container via named volumes that survive across pytest sessions (so repeated test runs reuse the cache).
- AC-3 (`test_ac3_within_100m_80pct_of_ticks` in `tests/e2e/replay/test_derkachi_1min.py`) un-xfails and passes on Tier-2 Jetson with ≥ 80 % of ticks within 100 m of ground truth.
- AZ-699 verdict test (`test_az699_real_flight_validation_emits_verdict_and_report` in `tests/e2e/replay/test_derkachi_real_tlog.py`) un-xfails and produces the first honest horizontal-error distribution report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
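
The bbox, zoom range and `{zoom}/{x}/{y}.jpg` layout above determine how many tiles the pre-flight pipeline has to move. The sketch below works through that arithmetic. It assumes standard OSM slippy-map (Web Mercator) indexing with 256-px tiles; the tile size satellite-provider actually serves is not stated in this spec. `tile_range`, `metres_per_pixel` and `tile_path` are illustrative helpers, not project APIs.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

EARTH_EQUATOR_M = 40_075_016.686  # WGS84 equatorial circumference in metres
TILE_PX = 256                     # assumption: 256-px tiles


@dataclass(frozen=True)
class TileRange:
    zoom: int
    x_min: int
    x_max: int
    y_min: int
    y_max: int

    @property
    def count(self) -> int:
        return (self.x_max - self.x_min + 1) * (self.y_max - self.y_min + 1)


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """OSM slippy-map tile indices (x grows east, y grows south)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tile_range(lat_min: float, lat_max: float, lon_min: float, lon_max: float, zoom: int) -> TileRange:
    # The north-west corner gives the smallest (x, y); the south-east corner the largest.
    x_min, y_min = lonlat_to_tile(lon_min, lat_max, zoom)
    x_max, y_max = lonlat_to_tile(lon_max, lat_min, zoom)
    return TileRange(zoom, x_min, x_max, y_min, y_max)


def metres_per_pixel(lat: float, zoom: int) -> float:
    """Ground resolution of one Web Mercator tile pixel at latitude `lat`."""
    return EARTH_EQUATOR_M * math.cos(math.radians(lat)) / (TILE_PX * 2 ** zoom)


def tile_path(zoom: int, x: int, y: int) -> str:
    return f"{zoom}/{x}/{y}.jpg"


if __name__ == "__main__":
    # Derkachi bbox from this spec: 50.05–50.15 lat, 36.05–36.15 lon, zoom 15–18.
    for z in range(15, 19):
        r = tile_range(50.05, 50.15, 36.05, 36.15, z)
        print(f"z{z}: {r.count} tiles, ~{metres_per_pixel(50.10, z):.2f} m/px, first {tile_path(z, r.x_min, r.y_min)}")
```

At zoom 15, the 0.1° longitude span covers tile columns x = 19665 to 19674. Each extra zoom level roughly quadruples the tile count and halves the metres per pixel, so the upper zoom bound dominates both the download volume and the descriptor-index size.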

## Scope

### Included

- `scripts/build_derkachi_c6_fixture.py` (or equivalent module under `e2e/fixtures/derkachi_c6/`): reproducible build pipeline that:
  - Reads the Derkachi bbox + zoom levels from a small YAML config (`tests/fixtures/derkachi_c6/bbox.yaml`).
  - Downloads OSM/CARTO basemap tiles into `<output>/tiles/{zoom}/{x}/{y}.jpg` mirroring `satellite-provider`'s on-disk layout (per architecture principle #5).
  - Computes per-tile descriptors via the same C7 backbone the airborne binary uses (configurable; defaults to whatever `config.components.c2_vpr.strategy`'s feature dimension is — e.g. UltraVPR or NetVLAD).
  - Builds a FAISS HNSW index over the descriptors, writes via `faiss.write_index` + atomicwrites + SHA-256 content-hash gate (per D-C10-3).
  - Emits a manifest JSON recording tile count, bbox, zoom levels, backbone, descriptor dimension, FAISS index parameters, source URL template, license, and the SHA-256 of every artifact.
- `tests/fixtures/derkachi_c6/bbox.yaml`: the bbox + zoom + backbone config consumed by the build script. Committed.
- `tests/fixtures/derkachi_c6/README.md`: how to rebuild + license attribution + estimated artifact size.
- Build the artifacts once, decide commit vs on-demand:
  - If total size ≤ 100 MB → commit to `_docs/00_problem/input_data/flight_derkachi/c6_cache/` (under LFS).
  - If > 100 MB → keep build-on-demand only, document the build invocation in the fixture README, and add a `scripts/run-tests-jetson.sh` pre-step that builds if absent.
- `tests/e2e/replay/conftest.py`: replace `operator_pre_flight_setup`'s mock with a real fixture that mounts the prebuilt artifacts into the e2e-runner container at the expected paths (`/opt/tiles/`, `/opt/descriptor_index.index`).
- `docker-compose.test.yml` + `docker-compose.test.jetson.yml`: mount the artifacts into the `e2e-runner` service (bind mount or named volume), set `c6_tile_store.path` + `c6_descriptor_index.path` env vars.
- `tests/e2e/replay/test_derkachi_1min.py`: remove the `@pytest.mark.xfail` decorator on AC-3 (line 174).
- `tests/e2e/replay/test_derkachi_real_tlog.py`: remove the `@pytest.mark.xfail` decorator on AZ-699 (line 174).
- `_docs/00_problem/input_data/flight_derkachi/README.md`: document the new C6 artifacts + build invocation + license attribution.
- `_docs/02_document/contracts/c6_tile_cache/`: if a contract file exists for the descriptor-index format, append a Consumer entry naming this fixture; if not, no new contract needed.
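
The old scope's "`faiss.write_index` + atomicwrites + SHA-256 content-hash gate" and the per-artifact manifest can be sketched with the standard library alone. This sketch replaces the `atomicwrites` package with `tempfile.mkstemp` + `os.replace`. It also writes opaque bytes instead of calling FAISS; with FAISS installed, the bytes could come from `faiss.serialize_index(index).tobytes()`. The manifest keys are illustrative and do not follow the D-C10-3 format.

```python
from __future__ import annotations

import hashlib
import json
import os
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large indexes are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a temp file in the target directory, fsync it, then rename over `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def write_manifest(out_dir: Path, artifacts: list[Path], **fields: object) -> Path:
    """Write manifest.json with caller-supplied fields plus a SHA-256 per artifact."""
    manifest = dict(fields)
    manifest["sha256"] = {p.relative_to(out_dir).as_posix(): sha256_file(p) for p in artifacts}
    target = out_dir / "manifest.json"
    atomic_write_bytes(target, json.dumps(manifest, indent=2, sort_keys=True).encode("utf-8"))
    return target


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    index = out / "descriptor_index.index"
    atomic_write_bytes(index, b"placeholder")  # e.g. faiss.serialize_index(idx).tobytes()
    manifest = write_manifest(out, [index], backbone="net_vlad", license="CC-BY-3.0")
    print(manifest.read_text())
```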

**Phase 1 — satellite-provider stand-up in the e2e harness**

- `docker-compose.test.yml`: add a `satellite-provider` service that builds from `../satellite-provider/SatelliteProvider.Api/Dockerfile`. Service depends on a `satellite-provider-db` Postgres instance (separate from the existing `db` service for c6 metadata to avoid cross-tenant table collisions). Service exposes port 5101 (`satellite-provider` standard) inside the compose network.
- `e2e-runner` env: replace `SATELLITE_PROVIDER_URL: http://mock-sat:5100` with `SATELLITE_PROVIDER_URL: http://satellite-provider:5101` for the C11 download path. Keep `MOCK_SAT_UPLOAD_URL: http://mock-sat:5100` for the D-PROJ-2 POST stub (until D-PROJ-2 ships).
- `docker-compose.test.jetson.yml`: mirror the same satellite-provider service for Tier-2 (build context unchanged; Jetson uses cross-compiled image once the parent-suite .NET service builds for arm64 — verify in this task whether the existing Dockerfile produces an arm64-capable image, otherwise file a follow-up).
- Smoke test in `tests/e2e/satellite_provider/test_smoke.py`: brings up the docker-compose stack, GETs `/healthz` against the real service, runs a single C11 `HttpTileDownloader.download_for_bbox` call against a 1-tile bbox, asserts the tile arrives in C6 + the metadata row is inserted. Gated by `RUN_REPLAY_E2E=1`.
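
The unit-test row "YAML lints; service has correct build context + port" and the `e2e-runner` env swap described above can both be checked on the parsed compose file. The sketch below takes the already-parsed mapping (for example the result of `yaml.safe_load`), so it needs only the standard library. It assumes short-form `expose`/`ports` entries and a Dockerfile path given relative to the build context. `check_compose` is a hypothetical helper.

```python
from __future__ import annotations

import posixpath
from typing import Any

EXPECTED_DOCKERFILE = "../satellite-provider/SatelliteProvider.Api/Dockerfile"


def check_compose(compose: dict[str, Any]) -> list[str]:
    """Return a list of problems with the compose services; empty means all checks passed."""
    services = compose.get("services", {})
    sp = services.get("satellite-provider")
    if sp is None:
        return ["missing service: satellite-provider"]
    problems: list[str] = []

    build = sp.get("build", {})
    if isinstance(build, str):  # short form `build: <context>`
        build = {"context": build}
    dockerfile = posixpath.normpath(
        posixpath.join(build.get("context", "."), build.get("dockerfile", "Dockerfile"))
    )
    if dockerfile != EXPECTED_DOCKERFILE:
        problems.append(f"unexpected Dockerfile path: {dockerfile}")

    # Short-form entries only: "5101" or "host:container".
    exposed = {str(p).split(":")[-1] for p in [*sp.get("expose", []), *sp.get("ports", [])]}
    if "5101" not in exposed:
        problems.append("satellite-provider does not expose port 5101")

    if "satellite-provider-db" not in sp.get("depends_on", []):  # list or mapping form
        problems.append("satellite-provider must depend on satellite-provider-db")

    env = services.get("e2e-runner", {}).get("environment", {})
    if isinstance(env, list):  # list form ["KEY=value", ...]
        env = dict(item.split("=", 1) for item in env)
    if env.get("SATELLITE_PROVIDER_URL") != "http://satellite-provider:5101":
        problems.append("e2e-runner SATELLITE_PROVIDER_URL must target the real service")
    if env.get("MOCK_SAT_UPLOAD_URL") != "http://mock-sat:5100":
        problems.append("e2e-runner MOCK_SAT_UPLOAD_URL must keep the D-PROJ-2 stub")
    return problems
```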

**Phase 2 — Derkachi tile catalog seeding**

- `tests/fixtures/derkachi_c6/seed_region.py` (new): a Python helper that calls the real `satellite-provider` region-onboarding endpoint (`/api/regions` or whatever the contract is — verify against the .NET source at `../satellite-provider/SatelliteProvider.Api`) to register the Derkachi bbox + zoom range. The seed run uses CARTO Voyager Basemap as the upstream imagery source (CC-BY-3.0; satellite-provider owns the actual tile download from CARTO and applies the freshness gate).
- `tests/fixtures/derkachi_c6/bbox.yaml`: Derkachi bbox + zoom levels + imagery source + license attribution metadata. The values match the seed script's payload.
- `tests/fixtures/derkachi_c6/README.md`: how to re-seed if the satellite-provider DB is wiped; license attribution operators must propagate.
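
Before `seed_region.py` calls the onboarding endpoint, it can validate the `bbox.yaml` values and serialize them in one place, so the committed config and the request payload cannot drift apart. The sketch below uses hypothetical field names; the real request shape has to be read from `../satellite-provider/SatelliteProvider.Api`. The zoom ceiling of 22 is an arbitrary sanity bound, and the attribution string is a placeholder.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass

WEB_MERCATOR_MAX_LAT = 85.05112878


@dataclass(frozen=True)
class RegionSeed:
    # Hypothetical field names: align them with the satellite-provider onboarding contract.
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    zoom_min: int
    zoom_max: int
    source: str
    license: str
    attribution: str

    def __post_init__(self) -> None:
        if not -WEB_MERCATOR_MAX_LAT <= self.lat_min < self.lat_max <= WEB_MERCATOR_MAX_LAT:
            raise ValueError("latitude bounds must be ordered and inside Web Mercator limits")
        if not -180.0 <= self.lon_min < self.lon_max <= 180.0:
            raise ValueError("longitude bounds must be ordered and inside [-180, 180]")
        if not 0 <= self.zoom_min <= self.zoom_max <= 22:  # 22: arbitrary sanity ceiling
            raise ValueError("zoom range must satisfy 0 <= zoom_min <= zoom_max <= 22")
        if not (self.license and self.attribution):
            raise ValueError("CC-BY sources need both a license and an attribution string")


def seed_payload(cfg: dict) -> bytes:
    """Validate a bbox.yaml-style mapping and serialise it as the onboarding request body."""
    return json.dumps(asdict(RegionSeed(**cfg)), sort_keys=True).encode("utf-8")


# Values from this spec; the attribution text is a placeholder, not the real CARTO wording.
DERKACHI = {
    "name": "derkachi",
    "lat_min": 50.05, "lat_max": 50.15,
    "lon_min": 36.05, "lon_max": 36.15,
    "zoom_min": 15, "zoom_max": 18,
    "source": "CARTO Voyager",
    "license": "CC-BY-3.0",
    "attribution": "<attribution string required by the source's terms>",
}
```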

**Phase 3 — replace `operator_pre_flight_setup` with a real fixture**

- `tests/e2e/replay/conftest.py::operator_pre_flight_setup`: replace the placeholder. The new fixture:
  - Reads the Derkachi bbox from `tests/fixtures/derkachi_c6/bbox.yaml`.
  - Invokes C11 `HttpTileDownloader` against the running satellite-provider service.
  - Invokes C10 `DescriptorBatcher` against the populated C6 (NetVLAD backbone per c2_vpr default).
  - Verifies sidecar coherence (`.index` + `.sha256` + `.meta.json` triple-consistency check per AZ-306).
  - Yields a `PopulatedC6Cache` dataclass that the test bodies consume.
- The fixture's outputs are mounted into the e2e-runner container via named volumes that survive across pytest sessions, so subsequent test runs reuse the populated cache (re-seeding takes minutes; re-downloading takes longer).
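
The AZ-306 triple-consistency check on `.index` + `.sha256` + `.meta.json` could look like the sketch below. It assumes three things: the sidecars share the index file's stem, the `.sha256` file holds the hex digest (bare or in `sha256sum` output format), and `.meta.json` stores the same digest under a `sha256` key. `IndexUnavailableError` is defined locally as a stand-in for the project's exception of that name.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Local stand-in for the project's IndexUnavailableError."""


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_sidecars(index_path: Path) -> tuple[Path, Path]:
    """Check that <stem>.index, <stem>.sha256 and <stem>.meta.json agree on one digest."""
    sha_path = index_path.with_name(index_path.stem + ".sha256")
    meta_path = index_path.with_name(index_path.stem + ".meta.json")
    missing = [p.name for p in (index_path, sha_path, meta_path) if not p.is_file()]
    if missing:
        raise IndexUnavailableError(f"missing sidecar files: {missing}")

    actual = _sha256(index_path)
    tokens = sha_path.read_text().split()  # bare digest or "<digest>  <filename>"
    recorded = tokens[0].lower() if tokens else ""
    try:
        meta_digest = str(json.loads(meta_path.read_text()).get("sha256", "")).lower()
    except json.JSONDecodeError as exc:
        raise IndexUnavailableError(f"unreadable {meta_path.name}: {exc}") from exc
    if not actual == recorded == meta_digest:
        raise IndexUnavailableError(
            f"sidecar mismatch: file={actual} .sha256={recorded} .meta.json={meta_digest}"
        )
    return sha_path, meta_path


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    index = root / "descriptor_index.index"
    index.write_bytes(b"index bytes")
    digest = _sha256(index)
    (root / "descriptor_index.sha256").write_text(f"{digest}  {index.name}\n")
    (root / "descriptor_index.meta.json").write_text(json.dumps({"sha256": digest}))
    print(verify_sidecars(index))
```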

**Phase 4 — un-xfail the Tier-2 tests**

- `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks`: remove `@pytest.mark.xfail` (still gated by `RUN_REPLAY_E2E=1` env + `tier2` marker — only runs on Tier-2 harness).
- `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`: remove `@pytest.mark.xfail`. The test body MUST emit the verdict report to `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` regardless of PASS/FAIL — the success criterion is that the report exists with the honest distribution, not that the verdict is necessarily green.
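
The AZ-699 test must write its report whatever the verdict, so the gate computation and the report writer can run before any assertion. The sketch below assumes per-tick estimate and ground-truth positions in WGS84 degrees. It uses the haversine (spherical-Earth) distance as a stand-in for whatever geodesic the project's metric uses, and the report layout is illustrative.

```python
from __future__ import annotations

import math
import tempfile
from datetime import date
from pathlib import Path
from statistics import median

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; haversine treats the Earth as a sphere


def horizontal_error_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between an estimate and ground truth, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def verdict(errors_m: list[float], threshold_m: float = 100.0, required: float = 0.80) -> tuple[bool, float]:
    """PASS if at least `required` of the ticks are within `threshold_m`."""
    if not errors_m:
        return False, 0.0  # no ticks is a FAIL, never a silent pass
    within = sum(e <= threshold_m for e in errors_m) / len(errors_m)
    return within >= required, within


def write_report(errors_m: list[float], out_dir: Path, day: date | None = None) -> Path:
    """Write the distribution report unconditionally, then let the caller assert."""
    passed, within = verdict(errors_m)
    ordered = sorted(errors_m)
    med = median(ordered) if ordered else float("nan")
    # Nearest-rank 95th percentile.
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1] if ordered else float("nan")
    lines = [
        "# Real-flight validation (AZ-699)",
        "",
        f"- Verdict: {'PASS' if passed else 'FAIL'} (gate: >= 80 % of ticks within 100 m)",
        f"- Ticks: {len(errors_m)}",
        f"- Within 100 m: {within:.1%}",
        f"- Median horizontal error: {med:.1f} m",
        f"- P95 horizontal error: {p95:.1f} m",
    ]
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"real_flight_validation_{(day or date.today()).isoformat()}.md"
    path.write_text("\n".join(lines) + "\n")
    return path
```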

**Phase 5 — documentation**

- `_docs/02_document/contracts/replay/replay_protocol.md`: Invariant 12 already states "Real C6 cache in replay" — append the AZ-777 / e2e-runner integration detail in a new sub-section under **Composition root extension** describing the operator_pre_flight_setup fixture's behaviour.
- `_docs/00_problem/input_data/flight_derkachi/README.md`: add a Derkachi C6 section pointing at the seed script + bbox config.
- `_docs/02_document/architecture.md`: append a new sub-section to the existing satellite-provider entry (line ~28) noting that the e2e harness now stands up the real service via `docker-compose.test.yml`; `mock-suite-sat-service` is retained only for the unshipped D-PROJ-2 POST contract.

### Excluded

- Multi-flight fixtures — just Derkachi. (Other flights would each need their own C6 build invocation.)
- Online tile download at test time — the e2e harness MUST remain offline (per replay protocol Invariant 5 / RESTRICT-SAT-1 / NFT-SEC-02; the docker compose `internal: true` network). The build script downloads tiles AT BUILD TIME from the developer workstation; the e2e harness only sees the prebuilt artifacts.
- Replacing the mock-suite-sat-service stub for the C12 operator-workflow `test_ac8_operator_workflow` test — that test exercises the D-PROJ-2 ingest contract which is parent-suite work, not in scope here.
- Building tiles for any backbone other than the airborne-default. If the operator wants a different backbone, they re-run the script with a different `--backbone` flag; this task only commits the default-backbone artifacts.
- Switching the airborne C6 backend from Postgres-mirroring to anything else — the build script writes the same on-disk layout the production C6 expects.
- AZ-776 (sibling): this task does NOT introduce the `c4_pose.enabled` flag or the open-loop composition profile. AZ-776 must land first to unblock the open-loop xfails (AC-1, AC-2, AC-5, AC-6); this task targets the full-GTSAM xfails (AC-3, AZ-699).
- The D-PROJ-2 POST/upload contract — still gated on the parent-suite design landing. `mock-suite-sat-service` continues to handle the POST stub.
- Multi-flight fixtures — Derkachi only. Other flights each need their own bbox seed and re-run.
- Switching C2 default backbone away from `net_vlad` — out of scope; if the operator wants UltraVPR or DINOv2, they re-run C10 with a different backbone configuration.
- Cross-compilation of satellite-provider for Jetson arm64 if the existing Dockerfile does not produce arm64 — file a follow-up ticket if needed; this task does NOT attempt to land arm64 support in the .NET service.
- Modifying any file under `../satellite-provider/` (sibling repo) — this task is purely additive on the gps-denied-onboard side + docker-compose orchestration. If the .NET service is missing an endpoint the C11 client requires, file a parent-suite ticket and STOP.
- Persisting the populated C6 to git/LFS — the named-volume approach above keeps the cache out of the repo. If repo-committed artifacts become a requirement later, file a follow-up to evaluate LFS size.
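
The compose `internal: true` network is where the offline rule is enforced. An in-process guard can additionally turn an accidental outbound lookup inside the e2e-runner into an immediate, readable failure. The sketch below patches `socket.getaddrinfo` for the duration of a `with` block and resolves only allowlisted names and loopback addresses. The patch is process-global, and it does not catch code that passes a numeric address straight to `socket.connect`, so it complements the network isolation rather than replacing it.

```python
from __future__ import annotations

import contextlib
import ipaddress
import socket
from typing import Iterator

_real_getaddrinfo = socket.getaddrinfo


class OutboundNetworkBlocked(RuntimeError):
    """Raised when code under test tries to resolve a host outside the allowlist."""


def _is_loopback(host: object) -> bool:
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


@contextlib.contextmanager
def offline_only(allowed_hosts: set[str]) -> Iterator[None]:
    """Patch socket.getaddrinfo so only allowlisted names and loopback addresses resolve."""

    def guarded(host, *args, **kwargs):
        name = host.decode() if isinstance(host, bytes) else host
        if name in allowed_hosts or _is_loopback(name):
            return _real_getaddrinfo(host, *args, **kwargs)
        raise OutboundNetworkBlocked(f"blocked name resolution for {name!r}")

    socket.getaddrinfo = guarded
    try:
        yield
    finally:
        socket.getaddrinfo = _real_getaddrinfo
```

In an e2e test, wrapping the replay run in `with offline_only({"satellite-provider", "satellite-provider-db"}):` would make a stray external lookup fail loudly. That allowlist is an assumption based on the compose service names in this spec.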

## Acceptance Criteria

**AC-1: Reproducible build**
Given a clean checkout
When `python scripts/build_derkachi_c6_fixture.py --output tests/fixtures/derkachi_c6/out --bbox tests/fixtures/derkachi_c6/bbox.yaml` runs
Then it produces a `tiles/` directory in the documented `{zoom}/{x}/{y}.jpg` layout, a FAISS `.index` file with a SHA-256-verified content hash, and a `manifest.json` recording tile count, bbox, backbone, descriptor dimension, FAISS parameters, source URL template, license, and per-artifact SHA-256, in under 30 minutes on a developer workstation
**AC-1: Real satellite-provider runs in the e2e harness**
Given `docker-compose.test.yml` with the new `satellite-provider` service
When `docker compose -f docker-compose.test.yml up satellite-provider` is invoked
Then the service builds from `../satellite-provider/SatelliteProvider.Api/Dockerfile`, comes up healthy on port 5101, and `GET /healthz` returns 200
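
The `/healthz` readiness check in AC-1, together with the requirement that the smoke test fail loudly on an unreachable service, maps onto a bounded polling loop. The standard-library sketch below uses an arbitrary default timeout and interval. `http://satellite-provider:5101/healthz` is the in-network address implied by this spec, not a verified endpoint.

```python
from __future__ import annotations

import http.client
import time
import urllib.request


def wait_healthy(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses; never raises on network errors."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # refused, DNS not ready, timeout, or 4xx/5xx (HTTPError is an OSError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Healthz(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            self.send_response(200 if self.path == "/healthz" else 404)
            self.end_headers()

        def log_message(self, *args) -> None:  # keep the demo quiet
            pass

    server = HTTPServer(("127.0.0.1", 0), Healthz)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/healthz"
    # A smoke test would assert on the result instead, so an unhealthy service fails loudly.
    print(wait_healthy(url, timeout_s=5.0))
    server.shutdown()
```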

**AC-2: License attribution**
Given the produced artifacts
When the manifest is inspected
Then it records the tile source URL template, the license name (CC-BY-3.0 or CC-BY-4.0 as applicable), and the attribution string the operator must surface in any derived publication
**AC-2: C11 downloads against real satellite-provider succeed**
Given the running satellite-provider service + a seeded Derkachi-bbox tile catalog
When `tests/e2e/satellite_provider/test_smoke.py` runs C11 `HttpTileDownloader.download_for_bbox` for a single tile
Then the tile arrives in the C6 filesystem store, the metadata row is inserted into C6's Postgres, and the freshness label is `fresh` (per the C6 freshness gate)

**AC-3: Offline e2e harness**
Given the prebuilt C6 artifacts mounted into the e2e-runner container
When `scripts/run-tests-jetson.sh` runs on Jetson with `RUN_REPLAY_E2E=1 GPS_DENIED_TIER=2` and the Docker compose network is `internal: true`
Then the test harness never reaches out to any external host; all C6 queries are served from the mounted artifacts
**AC-3: operator_pre_flight_setup drives the production pipeline**
Given the running satellite-provider with Derkachi tiles seeded
When `tests/e2e/replay/conftest.py::operator_pre_flight_setup` runs
Then C11 `HttpTileDownloader` downloads the Derkachi-bbox tiles into C6, C10 `DescriptorBatcher` builds the FAISS HNSW index over them using the NetVLAD backbone, the three sidecar files (`.index` + `.sha256` + `.meta.json`) pass the AZ-306 triple-consistency check, and the fixture yields a `PopulatedC6Cache` with all three artifact paths populated

**AC-4: Full-protocol e2e passes**
Given AZ-776 has landed AND the C6 artifacts are mounted AND the YAML config selects `c5_state.strategy = gtsam_isam2` with `c4_pose.enabled = True`
When `gps-denied-replay` runs the Derkachi 1-min fixture on Jetson
Then it exits with code 0, emits one EstimatorOutput per video frame, `test_ac3_within_100m_80pct_of_ticks` un-xfails and passes (≥80 % of ticks within 100 m of ground truth), and the per-frame loop emits `replay.satellite_anchor_inserted` log lines (not the existing `satellite_anchoring_not_wired` warning)
**AC-4: AC-3 Derkachi test un-xfails on Tier-2**
Given AZ-776 landed + the populated C6 from AC-3 mounted into the e2e-runner + the airborne binary configured with `c5_state.strategy = gtsam_isam2` + `c4_pose.enabled = True`
When `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks` runs on Tier-2 Jetson
Then it un-xfails, the test passes (≥ 80 % of ticks within 100 m of ground truth), and the per-frame loop emits `replay.satellite_anchor_inserted` log lines (not the existing `satellite_anchoring_not_wired` warning)

**AC-5: AZ-699 produces an honest verdict**
Given AZ-776 has landed AND the C6 artifacts are mounted AND the real flight video + factory calibration are present (already are)
When `test_az699_real_flight_validation_emits_verdict_and_report` runs on Jetson
Then it un-xfails, the test runs to completion within the 15-min NFR budget, and `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` records the horizontal-error distribution with the honest PASS/FAIL verdict against the ≥80 % within 100 m gate
**AC-5: AZ-699 verdict report is produced**
Given AZ-776 landed + the populated C6 from AC-3 + the real flight video + factory calibration
When `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report` runs on Tier-2 Jetson
Then it un-xfails, the test runs to completion within the 15-min NFR budget, and `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` records the horizontal-error distribution with the honest PASS/FAIL verdict against the ≥ 80 % within 100 m gate (PASS not required for the AC; HONEST report required)

**AC-6: Fixture README documents rebuild**
Given the updated `_docs/00_problem/input_data/flight_derkachi/README.md`
When a new contributor reads it
Then it documents (i) what C6 artifacts now exist, (ii) the exact `python scripts/build_derkachi_c6_fixture.py …` invocation to rebuild, (iii) the license attribution operators must propagate, (iv) the size-on-disk decision (committed vs. build-on-demand)
**AC-6: Documentation captures the new architecture seam**
Given the rewritten replay protocol doc + the Derkachi fixture README + the architecture sub-section
When a new contributor reads them
Then they understand (i) why the real satellite-provider runs in the e2e harness, (ii) how to re-seed the Derkachi catalog, (iii) which path goes through `mock-sat` vs. real satellite-provider (POST vs. GET), and (iv) what license attribution operators must propagate

## Non-Functional Requirements

**Performance**
- Build script completes in ≤ 30 minutes on a developer workstation (Apple Silicon or x86 Linux, no GPU required for OSM tile download + descriptor pre-compute via the CPU-fallback path of the backbone).
- Built artifacts do not regress the airborne C2 lookup latency budget — the FAISS HNSW parameters MUST match what production C6 expects (M, efConstruction, efSearch); the index is built once and never rebuilt at runtime.
- `operator_pre_flight_setup` completes in ≤ 5 minutes on first invocation (cold cache), ≤ 30 seconds on subsequent invocations within the same docker-compose session (warm cache via named volume).
- Built C6 artifacts (tile store + descriptor index) stay within the airborne C2 lookup latency budget: FAISS HNSW parameters MUST match what production C6 expects (M, efConstruction, efSearch); the index is built once per session, never rebuilt mid-test.

**Compatibility**
- Tile on-disk layout `{zoom}/{x}/{y}.jpg` MUST be byte-equivalent to `satellite-provider`'s layout (architecture principle #5) so a future post-landing upload would be byte-identical.
- FAISS index format MUST be loadable by the airborne `c6_descriptor_index.FaissDescriptorIndex` impl without code changes.
- Descriptor dimension MUST match the configured C7 backbone's output dimension — the build script asserts this at start.
- Tile on-disk layout `{zoom}/{x}/{y}.jpg` MUST be byte-equivalent to satellite-provider's layout (architecture principle #5) — this is automatic because C11 writes via the C6 production code path.
- FAISS index format MUST be loadable by the airborne `c6_descriptor_index.FaissDescriptorIndex.from_config` impl without code changes — this is automatic because C10 writes via the C6 production code path.
- The .NET satellite-provider service's `/api/satellite/tiles` contract version MUST be compatible with the C11 `HttpTileDownloader._LIST_PATH` / `_GET_PATH` constants (`/api/satellite/tiles`). Mismatch is a parent-suite bug; this task does not patch C11 around it.

**Reliability**
- Build script MUST fail loud on partial downloads (network error, HTTP 429/500, malformed tile) rather than silently producing an incomplete tile store. Resume-from-partial is allowed, but each resumed run re-verifies the SHA-256 of every committed tile.
- The SHA-256 content-hash gate on the FAISS index (per D-C10-3) MUST be enforced, so the operator can verify that a downloaded fixture matches what was built.
- The smoke test (AC-2) MUST fail loud if the satellite-provider service is unreachable, returns malformed responses, or rate-limits — no silent skip.
- The `operator_pre_flight_setup` fixture MUST clean up partial cache state on failure (no half-built FAISS index left around).
- The SHA-256 content-hash gate on the FAISS index (per D-C10-3) MUST be verified at every fixture yield — mismatch raises `IndexUnavailableError`.
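
One way to satisfy "no half-built FAISS index left around" together with the per-yield SHA-256 check is to build into a staging directory and publish it with a single rename. The sketch below is written as a `contextlib` generator; a pytest `yield` fixture has the same shape. `build`, `describe` and `verify` are hypothetical hooks. They stand in for the C11 download + C10 batching, for locating the artifacts, and for the AZ-306 / D-C10-3 checks, respectively. The `PopulatedC6Cache` fields are assumed.

```python
from __future__ import annotations

import contextlib
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterator


@dataclass(frozen=True)
class PopulatedC6Cache:
    # Assumed fields: the spec only says the fixture yields the artifact paths + DSN.
    root: Path
    tile_store: Path
    index_path: Path
    postgres_dsn: str


@contextlib.contextmanager
def populated_c6_cache(
    final_root: Path,
    build: Callable[[Path], None],
    describe: Callable[[Path], PopulatedC6Cache],
    verify: Callable[[PopulatedC6Cache], None],
) -> Iterator[PopulatedC6Cache]:
    """Cold path: build into a staging dir and publish it with one rename.
    Warm path: reuse `final_root`. Either way, `verify` runs before every yield."""
    if not final_root.exists():
        final_root.parent.mkdir(parents=True, exist_ok=True)
        staging = Path(tempfile.mkdtemp(prefix=".c6-staging-", dir=final_root.parent))
        try:
            build(staging)               # stand-in for C11 download + C10 batching
            verify(describe(staging))    # reject a bad build before publishing it
            staging.rename(final_root)   # same filesystem, so the rename is atomic on POSIX
        except BaseException:
            shutil.rmtree(staging, ignore_errors=True)  # no half-built index left behind
            raise
    cache = describe(final_root)
    verify(cache)
    yield cache
```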

**Security**
- Reference imagery URLs MUST be HTTPS. Tile metadata MUST record the exact source URL so license auditors can verify attribution.
- No API keys committed to the repo — if the chosen tile source requires registration, the build script reads the key from an env var and documents the env var name in the fixture README.
- Reference imagery source URLs MUST be HTTPS. License attribution is recorded in the seeded catalog's metadata so operators can verify it before any derived publication.
- No JWT secrets committed — the satellite-provider service in docker-compose reads `JWT_SECRET` from a `.env.test` file that's `.gitignore`'d; the test environment uses a development-only key.
- C11 download MUST go through the production auth path (Bearer token from satellite-provider's `/api/auth`) — no auth bypass for tests.
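
C11's authenticated path can be illustrated with `urllib.request.Request` objects that are built but not sent. The `/api/auth` and `/api/satellite/tiles` paths come from this spec. The JSON credential body and the query parameter names are hypothetical and must be checked against `SatelliteProvider.Api`. Refusing to build a tile request without a token is one way to make an accidental auth bypass fail immediately.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

AUTH_PATH = "/api/auth"             # named in this spec
LIST_PATH = "/api/satellite/tiles"  # C11 `_LIST_PATH` per this spec


def auth_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the token request. The JSON body shape is hypothetical."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        urllib.parse.urljoin(base_url, AUTH_PATH),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tiles_request(base_url: str, token: str, **query: str) -> urllib.request.Request:
    """Build an authenticated tile-list request; an empty token is a hard error, not a bypass."""
    if not token:
        raise ValueError("refusing to build an unauthenticated tile request")
    url = urllib.parse.urljoin(base_url, LIST_PATH)
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```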
## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Build script produces `tiles/`, `descriptor_index.index`, `manifest.json` on a small mock bbox | All three artifacts exist, manifest fields populated |
| AC-1 | SHA-256 of `descriptor_index.index` recorded in manifest matches actual file hash | Hashes match |
| AC-2 | Manifest records source URL template + license + attribution | All three fields non-empty |
| AC-2 | License field matches the source's documented license | Round-trips against an enum |
| AC-6 | Fixture README documents the build invocation | Invocation string greps cleanly |
| AC-1 | `docker-compose.test.yml` defines a valid `satellite-provider` service | YAML lints; service has correct build context + port |
| AC-2 | C11 `HttpTileDownloader.download_for_bbox` against a stubbed real satellite-provider response | Returns expected `DownloadBatchReport` with `outcome=SUCCESS` |
| AC-3 | `operator_pre_flight_setup` fixture yields a `PopulatedC6Cache` with non-empty tile store + FAISS index | All three sidecar files exist + sha256 triple-consistency holds |
| AC-3 | Sidecar SHA-256 coherence check inside the fixture | `IndexUnavailableError` raised when one of the three files is tampered with |
| AC-6 | Fixture README documents the seed invocation | Invocation string + license attribution greps cleanly |
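
The two AC-2 manifest rows can be exercised with a small validator. It checks for non-empty provenance fields and round-trips the license through an enum. The sketch assumes hypothetical manifest keys `source_url_template`, `license`, and `attribution`. The enum members are SPDX-style identifiers chosen for illustration; the license that actually applies to the chosen basemap is for the fixture README to record.

```python
from enum import Enum


class ImageryLicense(str, Enum):
    # Illustrative SPDX-style identifiers, not a statement of the basemap's actual license.
    CC_BY_3_0 = "CC-BY-3.0"
    CC_BY_4_0 = "CC-BY-4.0"


REQUIRED_FIELDS = ("source_url_template", "license", "attribution")


def validate_manifest_provenance(manifest: dict) -> ImageryLicense:
    """Check the AC-2 provenance fields and return the parsed license."""
    for field in REQUIRED_FIELDS:
        value = manifest.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"manifest field {field!r} is missing or empty")
    # Lookup by value is exact, so any string outside the enum raises ValueError;
    # a successful lookup therefore round-trips back to the stored string.
    return ImageryLicense(manifest["license"])
```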
## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|--------------|-------------------|----------------|
| AC-3 | Prebuilt C6 artifacts + e2e-runner with `internal: true` network | Run `scripts/run-tests-jetson.sh` end-to-end | No outbound network calls observed by Docker network logs; all C6 queries return from local index | Security, Reliability |
| AC-4 | AZ-776 landed + C6 artifacts mounted + full-GTSAM YAML | `test_ac3_within_100m_80pct_of_ticks` un-xfailed | Test passes (≥80 % of ticks within 100 m); `satellite_anchor_inserted` log lines visible | Perf, Compat |
| AC-5 | AZ-776 landed + C6 artifacts mounted + real flight video + factory calibration | `test_az699_real_flight_validation_emits_verdict_and_report` un-xfailed | Test runs to completion ≤ 15 min, verdict report written to `_docs/06_metrics/` | Perf |
| AC-1 | docker-compose.test.yml + satellite-provider service definition | `docker compose up satellite-provider` | Service comes up healthy in ≤ 60 s | Perf |
| AC-2 | Real satellite-provider running + 1-tile-bbox query | C11 HttpTileDownloader against the live service | Tile arrives in C6 + metadata row inserted + freshness=fresh | Reliability |
| AC-3 | Seeded Derkachi catalog + e2e-runner | `operator_pre_flight_setup` cold + warm invocation | Cold ≤ 5 min, warm ≤ 30 s, all three sidecar files coherent | Perf |
| AC-4 | AZ-776 landed + populated C6 mounted + full-GTSAM YAML | `test_ac3_within_100m_80pct_of_ticks` un-xfailed on Tier-2 Jetson | Test passes (≥ 80 % within 100 m); `satellite_anchor_inserted` log lines visible | Perf, Compat |
| AC-5 | AZ-776 landed + populated C6 mounted + real flight video + factory calibration | `test_az699_real_flight_validation_emits_verdict_and_report` un-xfailed | Test completes ≤ 15 min, verdict report written to `_docs/06_metrics/` | Perf |
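
The AC-4 pass criterion (≥ 80 % of ticks within 100 m) and the p50 figure the verdict report would surface can be computed from paired per-tick positions. This sketch assumes WGS-84 latitude/longitude pairs for the estimate and the ground truth of each tick. It uses the haversine great-circle distance on a spherical Earth, which is adequate at the 100 m scale. The real verdict code may use a different geodesic or a local metric frame.

```python
import math
from statistics import median
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def horizontal_error_gate(
    estimates: Sequence[LatLon],
    truth: Sequence[LatLon],
    threshold_m: float = 100.0,
    required_fraction: float = 0.80,
) -> Tuple[bool, float, float]:
    """Return (passed, fraction of ticks within threshold, median error in metres)."""
    if len(estimates) != len(truth) or not estimates:
        raise ValueError("need equal-length, non-empty estimate and truth sequences")
    errors = [haversine_m(e, t) for e, t in zip(estimates, truth)]
    within = sum(err <= threshold_m for err in errors) / len(errors)
    return within >= required_fraction, within, median(errors)
```

A 0.001° latitude offset near Derkachi is roughly 111 m, so it counts as a miss against the 100 m threshold.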
## Constraints

- Reference imagery source MUST be OSM/CARTO basemap (CC-BY-licensed). Operator chose this during AZ-777 scoping (cycle-3 Step 9, 2026-05-21) over Maxar Open Data (license uncertainty for in-repo redistribution) and video-self-orthorectification (self-referential, makes AC-3 a smoke test rather than a real accuracy gate). The trade-off — lower-resolution reference imagery may produce a higher residual on the AC-3 horizontal-error metric than satellite imagery would — is an HONEST finding the AZ-699 verdict will surface.
- The build script MUST NOT depend on `satellite-provider` running. The script's only network dependency is the chosen OSM/CARTO tile server (HTTPS, public, no auth).
- The committed artifact size budget (if AC-6 chooses commit-to-repo) is 100 MB total across `tiles/` + `descriptor_index.index`. Over budget → switch to build-on-demand, document in README.
- The `mock-suite-sat-service` stub stays in place for `test_ac8_operator_workflow` — that test exercises the D-PROJ-2 contract which this task does not address.
- Per replay protocol Invariant 5: ZERO outbound network from the e2e-runner. The build script runs on the developer workstation; the harness only sees prebuilt artifacts.
- ZERO modifications to files under `../satellite-provider/` (sibling repo). If a parent-suite API gap is discovered (e.g., `/api/satellite/tiles` returns 404 because the endpoint isn't wired), STOP and file a parent-suite ticket; do not work around it on the onboard side.
- Per replay protocol Invariant 5: ZERO outbound network from the e2e-runner once the cache is populated. The cache-population phase needs network (satellite-provider downloads from CARTO upstream), but once the docker-compose `e2e-runner` service is `internal: true`-networked for the airborne replay run, no external host is reachable. Verify with Docker network inspection during AC-4.
- Imagery source MUST be CC-BY-licensed (CARTO Voyager Basemap or equivalent). The seeded catalog records the license + attribution string operators must propagate in any derived publication.
- The seeded Derkachi catalog size budget is 100 MB on the satellite-provider DB side. Over budget → reduce zoom-level coverage; document the trade-off in `bbox.yaml` and `tests/fixtures/derkachi_c6/README.md`.
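
Docker network inspection remains the authority for Invariant 5. An in-process guard can complement it by making any accidental outbound connection from Python test code fail loud instead of hanging on an `internal: true` network. The sketch below is hypothetical and not part of the harness. It patches `socket.socket.connect` to allow only loopback addresses and AF_UNIX paths. It does not cover `connect_ex`, UDP `sendto`, subprocesses, or native code that bypasses Python's socket module.

```python
import contextlib
import ipaddress
import socket
from typing import Iterator


class OutboundNetworkError(RuntimeError):
    """Hypothetical error raised when code under test dials a non-loopback address."""


@contextlib.contextmanager
def deny_outbound_network() -> Iterator[None]:
    """Temporarily allow socket.connect only to loopback addresses or AF_UNIX paths."""
    had_own_attr = "connect" in socket.socket.__dict__
    original_connect = socket.socket.connect
    unix_family = getattr(socket, "AF_UNIX", None)  # absent on some platforms

    def guarded_connect(self, address):
        if unix_family is not None and self.family == unix_family:
            return original_connect(self, address)  # local IPC, not network traffic
        host = address[0]
        try:
            allowed = ipaddress.ip_address(host).is_loopback
        except ValueError:
            # Hostnames are not resolved here; only the literal "localhost" is allowed.
            allowed = host == "localhost"
        if not allowed:
            raise OutboundNetworkError(f"outbound connection blocked: {address!r}")
        return original_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        if had_own_attr:
            socket.socket.connect = original_connect
        else:
            del socket.socket.connect  # fall back to the inherited C implementation
```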
## Risks & Mitigation

**Risk 1: OSM basemap residual is too coarse for the AC-3 threshold**

- *Risk*: AC-3's `≤100 m for 80 %` gate may be physically unmeetable when the reference imagery is OSM rasterized basemap (street-level features, not satellite features) — the visual descriptors may not lock against the aerial nav-camera frames at all.
- *Mitigation*: This is an honest discovery. If AC-3 still fails after this task lands, the failure mode shifts from "no anchors at all" (current) to "anchors exist but VPR similarity is too low to produce ≥80 % within 100 m". The AZ-699 verdict report will surface the actual horizontal-error distribution; if it lands at e.g. p50 = 250 m, that becomes evidence for a follow-up ticket to switch to satellite imagery. The xfail is removed in either case because the test now exercises the real pipeline — the verdict, not the xfail, becomes the honest signal.

**Risk 1: satellite-provider's `/api/satellite/tiles` contract drifts from what C11 expects**

- *Risk*: C11 `HttpTileDownloader` was implemented against an older satellite-provider contract; recent satellite-provider changes may have moved or renamed the endpoint.
- *Mitigation*: The AC-2 smoke test fires the C11 call against the real service before any test depends on it. Any 404/400/contract mismatch surfaces immediately; the failure points at a parent-suite ticket, not an onboard bug. The onboard code path is the standard production code; this task does not modify it.

**Risk 2: Tile source rate-limits or goes offline mid-build**

- *Risk*: Public OSM/CARTO tile servers may rate-limit or temporarily go down, breaking reproducibility on a re-build.
- *Mitigation*: Build script implements exponential backoff + resume-from-partial. Document the chosen tile-server URL in the fixture README so an operator can swap to a mirror if needed. If commit-to-repo is chosen for the artifacts, future re-builds are unnecessary — the committed artifacts are the source of truth.
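
The backoff half of this mitigation can be sketched as a retry wrapper around a single tile fetch. The fetch callable is injected, so the sketch makes no network calls of its own. `TileHTTPError`, the retryable status set, and the delay defaults are illustrative assumptions. The jitter scheme is "full jitter": a uniform random wait up to the capped exponential delay. Non-retryable statuses and exhausted retries propagate unchanged, which keeps the build failing loud as the Reliability section requires.

```python
import random
import time
from typing import Callable, Optional


class TileHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status returned for one tile request."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def fetch_with_backoff(
    fetch: Callable[[], bytes],
    *,
    max_attempts: int = 6,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bytes:
    """Call `fetch` until it succeeds, retrying only on rate-limit / server errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TileHTTPError as exc:
            # Fail loud: non-retryable statuses and the final attempt propagate unchanged.
            if exc.status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a uniform random time up to the capped exponential delay.
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(rng.uniform(0.0, ceiling))
    raise AssertionError("unreachable")
```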

**Risk 2: CARTO Voyager basemap residual is too coarse for AC-4**

- *Risk*: CC-BY basemap is OSM-derived (street-level features, not satellite features). NetVLAD descriptors may not lock against nadir camera frames well enough for ≥ 80 % within 100 m.
- *Mitigation*: This is an honest discovery surface. AC-4 may fail on accuracy after this task lands — the failure mode shifts from "no anchors at all" (current) to "anchors exist but VPR similarity is too low". The AZ-699 verdict report (AC-5) surfaces the actual horizontal-error distribution; if it lands at e.g. p50 = 250 m, that becomes evidence for a follow-up ticket to seed a satellite-imagery source (Maxar Open Data, Sentinel-2, etc.). The xfail is removed in either case because the test now exercises the real pipeline — the verdict, not the xfail, is the honest signal.

**Risk 3: Repo size pressure if artifacts are committed**

- *Risk*: Tile store + FAISS index could exceed 100 MB depending on bbox + zoom levels; committing them under LFS still costs LFS storage and bandwidth.
- *Mitigation*: First build run measures the size. If under 100 MB → commit. If over → build-on-demand documented in README + `scripts/run-tests-jetson.sh` pre-step. Either choice is acceptable per AC-6.
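
The size measurement and the commit vs. build-on-demand decision can be sketched as below. The sketch reads "100 MB" as decimal megabytes (10^8 bytes), which is an assumption; the spec does not say MB or MiB. It counts only `tiles/` and `descriptor_index.index`, matching the budget stated in Constraints. An artifact set exactly at the budget is treated as within it.

```python
from pathlib import Path

BUDGET_BYTES = 100 * 1000 * 1000  # "100 MB" read as decimal megabytes (assumption)


def artifact_bytes(fixture_dir: Path) -> int:
    """Total on-disk size of tiles/ plus descriptor_index.index."""
    total = 0
    tiles = fixture_dir / "tiles"
    if tiles.is_dir():
        total += sum(p.stat().st_size for p in tiles.rglob("*") if p.is_file())
    index = fixture_dir / "descriptor_index.index"
    if index.is_file():
        total += index.stat().st_size
    return total


def delivery_mode(fixture_dir: Path, budget: int = BUDGET_BYTES) -> str:
    # Within budget -> commit the artifacts; over budget -> document build-on-demand.
    return "commit" if artifact_bytes(fixture_dir) <= budget else "build-on-demand"
```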

**Risk 3: satellite-provider doesn't build on arm64 (Jetson)**

- *Risk*: The existing `SatelliteProvider.Api/Dockerfile` uses `mcr.microsoft.com/dotnet/aspnet:10.0` which is amd64-default. Tier-2 Jetson is arm64.
- *Mitigation*: First check whether the multi-arch manifest exists for the dotnet/aspnet image at the pinned version. If yes → no action needed. If no → file a follow-up ticket to multi-arch the satellite-provider Dockerfile; AC-4 + AC-5 stay BLOCKED on Tier-2 until that ticket lands, but Phases 1–3 + AC-1/2/3/6 still complete on Tier-1 in this ticket's scope.
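
One way to run the multi-arch check is to capture `docker manifest inspect mcr.microsoft.com/dotnet/aspnet:10.0` and look for a `linux/arm64` entry. For a multi-platform tag, that command prints a manifest list (OCI image index) whose `manifests[].platform` objects carry `os` and `architecture`. The parser below is a sketch of that check. A single-platform manifest has no `manifests` list, so it cannot confirm arm64 support and is reported as unsupported.

```python
import json


def supports_platform(manifest_json: str, os_name: str = "linux", arch: str = "arm64") -> bool:
    """True if a manifest list / OCI index advertises the requested platform."""
    doc = json.loads(manifest_json)
    # Single-platform image manifests have no "manifests" list, so nothing can be confirmed.
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return True
    return False
```

In practice the input could come from `subprocess.run(["docker", "manifest", "inspect", image], capture_output=True, text=True, check=True).stdout`.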

**Risk 4: Backbone descriptor dimension mismatch**

- *Risk*: If the operator changes the airborne C2 backbone (UltraVPR → NetVLAD, etc.) without rebuilding the index, the FAISS load will fail at runtime with a dimension mismatch.
- *Mitigation*: Manifest records the descriptor dimension. C6 loader asserts the manifest's dimension matches the configured backbone's output dimension at compose time; mismatch surfaces as an `AirborneBootstrapError` naming both numbers + the rebuild invocation.
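
The compose-time dimension check could look like the sketch below. It assumes a hypothetical `descriptor_dim` manifest key. The rebuild invocation is passed in by the caller because the spec does not spell it out. `AirborneBootstrapError` is a local stand-in for the project's exception. Once a FAISS index is loaded, the same comparison could also be made against the index's `d` attribute.

```python
class AirborneBootstrapError(RuntimeError):
    """Local stand-in for the project's AirborneBootstrapError."""


def check_descriptor_dim(manifest: dict, backbone_dim: int, rebuild_cmd: str) -> int:
    """Return the recorded dimension, or raise naming both numbers and the rebuild command."""
    recorded = manifest.get("descriptor_dim")  # hypothetical manifest key
    if not isinstance(recorded, int) or recorded <= 0:
        raise AirborneBootstrapError(
            f"manifest has no valid descriptor_dim (got {recorded!r}); rebuild with: {rebuild_cmd}"
        )
    if recorded != backbone_dim:
        raise AirborneBootstrapError(
            f"descriptor dimension mismatch: index built with {recorded}, "
            f"configured backbone emits {backbone_dim}; rebuild with: {rebuild_cmd}"
        )
    return recorded
```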

**Risk 4: docker-compose stand-up flakiness slows down the test suite**

- *Risk*: Cold-starting satellite-provider, its Postgres, the gps-denied-onboard companion, and the e2e-runner across CI pipelines adds wall-clock time.
- *Mitigation*: Named volumes for both the satellite-provider DB and the populated C6 mean only the first run in a CI session pays the cost. Subsequent runs are warm. Document the named volumes in the docker-compose comments + the fixture README so an operator knows to `docker volume prune` if they want to force a re-seed.

**Risk 5: Single-ticket 8-pt complexity exceeds the standard PBI cap**

- *Risk*: The task is intentionally above the 5-pt cap stated in the project's PBI complexity rule; this can mask the failure mode where a sub-phase blocks and the whole ticket grinds.
- *Mitigation*: The five phases above are explicit handoff points. If Phase 1 (satellite-provider stand-up) fails for reasons outside this ticket's scope (e.g., parent-suite contract drift, arm64 issue), the implementer STOPS at the phase boundary, reports the blocker, and proposes a split into smaller follow-up tickets. The "single ticket" property is preserved as long as the work proceeds linearly; if it grinds at any phase boundary, decomposition is the escape hatch.

### ADR Impact

> Affects ADR-001 (composition root is single registration site): unchanged — C6 is built outside the composition root by the operator-side build script; the airborne binary still just loads what's on disk.

> Implements architecture principle #4 (no in-air network I/O) and principle #5 (all persistent imagery in `satellite-provider` on-disk layout) — this is the FIRST executable artifact that demonstrates both principles end-to-end against a real flight.

> Affects ADR-002 (build-time exclusion): unchanged — C11 is already operator-side-only via process-level isolation (architecture Principle #4 + ADR-004); this task just exercises that path against the real upstream.

> Affects ADR-011 (replay is a configuration): unchanged — the per-frame loop is mode-agnostic; this task closes the gap between the live and replay paths' upstream tile source (both now go through the real satellite-provider).

> Implements architecture principle #5 (satellite-provider on-disk layout) end-to-end against a real flight for the first time.

> No new ADR — the architectural decision is "wire the production C10/C11 pipeline into the e2e harness", which is execution of existing decisions, not a new one.