[AZ-777] Rewrite spec: real satellite-provider + production C10/C11

Original spec called for direct OSM/CARTO downloads, contradicting
architecture (C11 owns tile network I/O against parent-suite
satellite-provider .NET 8 service; C10 batches descriptors over the
populated C6, never touches the upstream). Rewritten spec drives the
production C10/C11 pipeline against the real satellite-provider
running in docker-compose.test.yml, replacing the mock-suite-sat-
service GET stub. Complexity 5 -> 8 pts (single-ticket override).
Decision log: _docs/_process_leftovers/2026-05-21_az777_complexity_
override.md. Jira AZ-777 description + summary synced. Autodev state
pauses for next session to pick up Phase 1 (satellite-provider
stand-up + smoke test).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-21 13:57:01 +03:00
parent 2b53168142
commit 1198890b74
3 changed files with 198 additions and 133 deletions
@@ -1,193 +1,196 @@
# Derkachi C6 reference tile cache + descriptor index (OSM/CARTO basemap) # Derkachi e2e: wire real satellite-provider + production C10/C11 pipeline into the operator pre-flight fixture
**Task**: AZ-777_derkachi_c6_reference_fixture **Task**: AZ-777_derkachi_c6_reference_fixture
**Name**: Build the C6 reference tile cache + FAISS descriptor index for the Derkachi flight bbox so the full-protocol C1+C2+C3+C4+C5 pipeline can produce satellite anchors during e2e replay **Name**: Drive the production C10/C11 pre-flight pipeline against a real parent-suite `satellite-provider` service in the e2e harness so the Derkachi clip produces a real FAISS-anchored C4/C5 satellite-fix loop end-to-end
**Description**: Add a reproducible build script that downloads OSM/CARTO basemap tiles for the Derkachi flight bbox (approx 50.05–50.15 lat, 36.05–36.15 lon), pre-computes feature descriptors via the same C7 backbone the airborne binary uses (DINOv2 or the configured VPR backbone), populates the C6 tile store + FAISS HNSW index, and integrates them into the e2e replay harness. Unblocks the two remaining `@xfail`-masked Derkachi tests on Jetson (`test_ac3_within_100m_80pct_of_ticks` and `test_az699_real_flight_validation_emits_verdict_and_report`) and produces the first honest AZ-699 accuracy verdict. **Description**: Replace the e2e harness's `mock-suite-sat-service` `/healthz`-only stub on the GET tile path with the real `satellite-provider` .NET 8 service (sibling repo at `../satellite-provider`). Seed satellite-provider's Derkachi-bbox tile catalog from a CC-BY-licensed basemap source. Replace the `operator_pre_flight_setup` placeholder fixture in `tests/e2e/replay/conftest.py` with a real fixture that drives the production C11 `HttpTileDownloader` + C10 `DescriptorBatcher` pipeline against the running service, builds C6 (Postgres metadata + filesystem tile store + FAISS HNSW descriptor index), and mounts the populated cache into the e2e-runner container. Un-xfail the Derkachi AC-3 + AZ-699 verdict tests on Tier-2 Jetson; produce the first honest AZ-699 horizontal-error verdict report.
**Complexity**: 5 points **Complexity**: 8 points (explicit override of the standard 5-pt PBI cap — see decision log entry 2026-05-21 under `_docs/_process_leftovers/2026-05-21_az777_complexity_override.md`; single-ticket containment is preferred over decomposition because the four sub-deliverables only deliver demo-confidence value when shipped together)
**Dependencies**: AZ-776_eskf_open_loop_composition_profile **Dependencies**: AZ-776_eskf_open_loop_composition_profile (done — AZ-776 unblocks compose; this task closes the satellite-anchoring loop)
**Component**: c6_tile_cache / e2e fixtures / input_data **Component**: e2e fixtures / c6_tile_cache / c10_provisioning / c11_tile_manager / docker compose
**Tracker**: AZ-777 **Tracker**: AZ-777
**Epic**: AZ-602 **Epic**: AZ-602
## Problem ## Problem
The Derkachi e2e fixture The Derkachi e2e fixture (`_docs/00_problem/input_data/flight_derkachi/`) ships real flight inputs (video, tlog, IMU, camera calibration) but DOES NOT ship the populated C6 tile cache + FAISS descriptor index the replay protocol requires (`replay_protocol.md` Invariant 12: "Real C6 cache in replay: the airborne binary in replay mode reads the same pre-built C6 tile cache the operator built via the normal pre-flight C10/C11/C12 flow"). Two architectural gaps stop the full-protocol C1+C2+C3+C4+C5 pipeline from running against Derkachi today:
(`_docs/00_problem/input_data/flight_derkachi/`) ships the real
flight inputs (video, tlog, IMU, camera calibration) but DOES NOT
ship the C6 tile-cache artifacts that the replay protocol requires
the operator's pre-flight C10 stage to produce:
- `c6_tile_store` — persistent JPEG tiles covering the flight area at the chosen zoom levels 1. **`mock-suite-sat-service` is `/healthz`-only.** The stub at `tests/fixtures/mock-suite-sat-service/main.py` exposes only `GET /healthz` and does NOT implement the `/api/satellite/tiles` contract that C11 `HttpTileDownloader` (production code at `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py`) queries against. Any e2e test that wants to exercise the production tile-download path against the stub gets HTTP 404 the moment C11 calls `_LIST_PATH = "/api/satellite/tiles"`.
- `c6_descriptor_index` — FAISS index of VPR-backbone descriptors over those tiles 2. **`operator_pre_flight_setup` is a placeholder.** The fixture at `tests/e2e/replay/conftest.py` (lines 293-310) `mkdir`s an empty `operator_cache` directory and yields. It does NOT drive C11 download or C10 descriptor-batcher; it does NOT populate C6. The fixture's docstring explicitly calls itself "a stub" pending this ticket.
Without these artifacts: The production architecture says (per `architecture.md` Principle #5 + the C10/C11 component descriptions):
- C2 VPR has no haystack to look up against — `c2_vpr.lookup` returns empty. - C10 does NOT touch satellite-provider — tile network I/O lives in C11.
- C3 matcher has nothing to match against (depends on C2 candidates). - C11 `HttpTileDownloader` is the production path: authenticated GETs against the parent-suite `satellite-provider` .NET 8 REST service (sibling repo at `../satellite-provider/`, real implementation with `SatelliteProvider.Api`, region-onboarding flows, integration tests).
- C4 pose has no anchors — cannot estimate satellite-frame pose. - `satellite-provider` owns the OSM/CARTO tile network I/O + license attribution + multi-flight voting layer — the onboard companion is read-only against it (via C11) during pre-flight and read-only against C6 during flight.
- C5 state has no anchors to fuse — runs open-loop on VIO only. - `mock-suite-sat-service` exists specifically for the D-PROJ-2 ingest (POST upload) endpoint that the parent-suite has not yet shipped — NOT for the GET tile-fetch path.
When `c5_state.strategy = gtsam_isam2` (the default that AZ-699's e2e The current AZ-777 spec ("write a script under `scripts/build_derkachi_c6_fixture.py` that downloads OSM/CARTO basemap tiles directly") was inconsistent with this architecture: it asked the onboard companion to do network I/O against an external imagery source instead of going through C11→satellite-provider. The corrected scope (this revision) drives the production pipeline end-to-end.
exercises), the composition reaches the per-frame loop but
`iSAM2.update` crashes at frame 1 with:
```
EstimatorFatalError: compute_marginals failed: Attempting to at the
key 'x2', which does not exist in the Values.
```
— because no C4 anchor was ever inserted (C2/C3/C4 have nothing to
match against).
AZ-776 (sibling, prerequisite) makes the open-loop C1+C5(ESKF)
composition runnable, but that path skips C2–C4 entirely and accepts
unbounded drift. To validate the FULL protocol-compliant pipeline
against Derkachi — i.e. AC-3 (`≤100 m for 80 % of ticks`) and the
AZ-699 horizontal-error verdict — we need real C6 fixtures.
The replay protocol (`replay_protocol.md` line 214) explicitly states
"`BUILD_FAISS_INDEX` is ON in the airborne binary (live and replay
alike). C2 in replay queries the **real** C6 `FaissDescriptorIndex`,
populated by the pre-flight C10 build. This is the architectural
change vs. v1.0.0 of this contract." We have no such build for
Derkachi.
## Outcome ## Outcome
- A reproducible build script under `scripts/` produces the C6 artifacts (`tile_store` + `descriptor_index`) given the Derkachi bbox + zoom levels + camera calibration, deterministically on a clean checkout, in under 30 minutes on a developer workstation. - The e2e harness `docker-compose.test.yml` runs the real `satellite-provider` .NET 8 service (built from `../satellite-provider/SatelliteProvider.Api/Dockerfile`) alongside the existing `mock-sat` (which is retained only for the D-PROJ-2 POST/upload contract until the parent suite ships it).
- Reference imagery source is OSM-tile-server-distributed basemap (CARTO Voyager or equivalent CC-BY-licensed source). Each tile carries the source URL + license attribution in its metadata sidecar. - `satellite-provider`'s tile catalog is seeded with the Derkachi bbox (≈50.05–50.15 lat, 36.05–36.15 lon) at the camera-AGL-appropriate zoom levels (15–18) via the service's existing region-onboarding flow (CC-BY-licensed basemap source; license + attribution baked into the seeded catalog's metadata).
- The Derkachi fixture directory documents the build invocation; tiles + index are EITHER committed to the repo (if total size ≤ 100 MB) OR built on-demand from the script (if larger) — decision recorded in the fixture README. - `tests/e2e/replay/conftest.py::operator_pre_flight_setup` is replaced by a real fixture that:
- `tests/e2e/replay/conftest.py`'s `operator_pre_flight_setup` fixture is replaced (or extended) to mount the prebuilt artifacts into the e2e-runner container. The mock-suite-sat-service stub is retired for the C6-served paths (it remains for the C12 operator-workflow AC-8). 1. Resolves the Derkachi bbox + camera-derived zoom range from the existing flight fixture.
- After this task ships (with AZ-776), un-xfail `test_ac3_within_100m_80pct_of_ticks` (`test_derkachi_1min.py` line 174) AND `test_az699_real_flight_validation_emits_verdict_and_report` (`test_derkachi_real_tlog.py` line 174); both pass on the Jetson harness. 2. Invokes C11 `HttpTileDownloader` against the running `satellite-provider` to populate C6 (Postgres metadata + filesystem tile store).
- The first honest AZ-699 verdict lands at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the full horizontal-error distribution. Whether the verdict is PASS or FAIL is the honest finding — this task's success is that the verdict is *produced* against the real pipeline, not that it is necessarily green. 3. Invokes C10 `DescriptorBatcher` against the populated C6 to build the FAISS HNSW descriptor index via the production NetVLAD backbone (C2 default per `c2_vpr/config.py:67`).
4. Verifies all three sidecar files (`.index`, `.sha256`, `.meta.json`) per the FAISS sidecar coherence invariant (AZ-306).
5. Yields the populated cache directory + Postgres connection string for the e2e-runner to mount.
- The populated C6 is mounted into the `e2e-runner` container via named volumes that survive across pytest sessions (so repeated test runs reuse the cache).
- AC-3 (`test_ac3_within_100m_80pct_of_ticks` in `tests/e2e/replay/test_derkachi_1min.py`) un-xfails and passes on Tier-2 Jetson with ≥ 80 % of ticks within 100 m of ground truth.
- AZ-699 verdict test (`test_az699_real_flight_validation_emits_verdict_and_report` in `tests/e2e/replay/test_derkachi_real_tlog.py`) un-xfails and produces the first honest horizontal-error distribution report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
## Scope ## Scope
### Included ### Included
- `scripts/build_derkachi_c6_fixture.py` (or equivalent module under `e2e/fixtures/derkachi_c6/`): reproducible build pipeline that: **Phase 1 — satellite-provider stand-up in the e2e harness**
- Reads the Derkachi bbox + zoom levels from a small YAML config (`tests/fixtures/derkachi_c6/bbox.yaml`).
- Downloads OSM/CARTO basemap tiles into `<output>/tiles/{zoom}/{x}/{y}.jpg` mirroring `satellite-provider`'s on-disk layout (per architecture principle #5). - `docker-compose.test.yml`: add a `satellite-provider` service that builds from `../satellite-provider/SatelliteProvider.Api/Dockerfile`. Service depends on a `satellite-provider-db` Postgres instance (separate from the existing `db` service for c6 metadata to avoid cross-tenant table collisions). Service exposes port 5101 (`satellite-provider` standard) inside the compose network.
- Computes per-tile descriptors via the same C7 backbone the airborne binary uses (configurable; defaults to whatever `config.components.c2_vpr.strategy`'s feature dimension is — e.g. UltraVPR or NetVLAD). - `e2e-runner` env: replace `SATELLITE_PROVIDER_URL: http://mock-sat:5100` with `SATELLITE_PROVIDER_URL: http://satellite-provider:5101` for the C11 download path. Keep `MOCK_SAT_UPLOAD_URL: http://mock-sat:5100` for the D-PROJ-2 POST stub (until D-PROJ-2 ships).
- Builds a FAISS HNSW index over the descriptors, writes via `faiss.write_index` + atomicwrites + SHA-256 content-hash gate (per D-C10-3). - `docker-compose.test.jetson.yml`: mirror the same satellite-provider service for Tier-2 (build context unchanged; Jetson uses cross-compiled image once the parent-suite .NET service builds for arm64 — verify in this task whether the existing Dockerfile produces an arm64-capable image, otherwise file a follow-up).
- Emits a manifest JSON recording tile count, bbox, zoom levels, backbone, descriptor dimension, FAISS index parameters, source URL template, license, and the SHA-256 of every artifact. - Smoke test in `tests/e2e/satellite_provider/test_smoke.py`: brings up the docker-compose stack, GETs `/healthz` against the real service, runs a single C11 `HttpTileDownloader.download_for_bbox` call against a 1-tile bbox, asserts the tile arrives in C6 + the metadata row is inserted. Gated by `RUN_REPLAY_E2E=1`.
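The "comes up healthy" gate for the smoke test could be sketched as a small polling helper. This is a minimal sketch, not the harness's actual code: `http_probe` and `wait_until_healthy` are hypothetical names, and the 60 s default only mirrors the blackbox budget stated for AC-1.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """Return True only when the endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, 4xx/5xx: all count as "not healthy yet".
        return False


def wait_until_healthy(
    url: str,
    deadline_s: float = 60.0,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `url` until the probe succeeds and return the number of attempts.

    Raises TimeoutError once the deadline has passed, so an unreachable
    service fails loud instead of being silently skipped.
    """
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        if probe(url):
            return attempts
        if time.monotonic() - start >= deadline_s:
            raise TimeoutError(f"{url} not healthy after {attempts} attempts")
        sleep(interval_s)
```

Inside the compose network the call would look like `wait_until_healthy("http://satellite-provider:5101/healthz")`. The probe and sleep functions are injectable so that the helper can be unit-tested without a running service.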
- `tests/fixtures/derkachi_c6/bbox.yaml`: the bbox + zoom + backbone config consumed by the build script. Committed.
- `tests/fixtures/derkachi_c6/README.md`: how to rebuild + license attribution + estimated artifact size. **Phase 2 — Derkachi tile catalog seeding**
- Build the artifacts once, decide commit vs on-demand:
- If total size ≤ 100 MB → commit to `_docs/00_problem/input_data/flight_derkachi/c6_cache/` (under LFS). - `tests/fixtures/derkachi_c6/seed_region.py` (new): a Python helper that calls the real `satellite-provider` region-onboarding endpoint (`/api/regions` or whatever the contract is — verify against the .NET source at `../satellite-provider/SatelliteProvider.Api`) to register the Derkachi bbox + zoom range. The seed run uses CARTO Voyager Basemap as the upstream imagery source (CC-BY-3.0; satellite-provider owns the actual tile download from CARTO and applies the freshness gate).
- If > 100 MB → keep build-on-demand only, document the build invocation in the fixture README, and add a `scripts/run-tests-jetson.sh` pre-step that builds if absent. - `tests/fixtures/derkachi_c6/bbox.yaml`: Derkachi bbox + zoom levels + imagery source + license attribution metadata. The values match the seed script's payload.
- `tests/e2e/replay/conftest.py`: replace `operator_pre_flight_setup`'s mock with a real fixture that mounts the prebuilt artifacts into the e2e-runner container at the expected paths (`/opt/tiles/`, `/opt/descriptor_index.index`). - `tests/fixtures/derkachi_c6/README.md`: how to re-seed if the satellite-provider DB is wiped; license attribution operators must propagate.
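To size the seed run, the number of tiles covering the Derkachi bbox at each zoom level can be estimated up front. The sketch below assumes the `{zoom}/{x}/{y}` layout follows the standard Web Mercator XYZ ("slippy map") scheme; the spec does not confirm this, and the function names are illustrative only.

```python
import math


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple:
    """Map WGS84 coordinates to XYZ tile indices (Web Mercator, assumed scheme)."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so coordinates on the map edge still land on a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_count(south: float, west: float, north: float, east: float, zoom: int) -> int:
    """Count the tiles that intersect the bbox at one zoom level."""
    # Tile y grows southwards, so the north-west corner gives the minimum indices.
    x_min, y_min = latlon_to_tile(north, west, zoom)
    x_max, y_max = latlon_to_tile(south, east, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)


# Approximate Derkachi bbox and zoom range taken from this spec.
DERKACHI_BBOX = dict(south=50.05, west=36.05, north=50.15, east=36.15)

if __name__ == "__main__":
    for zoom in range(15, 19):
        print(zoom, tile_count(zoom=zoom, **DERKACHI_BBOX))
```

The tile count grows roughly fourfold per zoom level, so the highest zoom in the range dominates both the seed time and the C11 download time.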
- `docker-compose.test.yml` + `docker-compose.test.jetson.yml`: mount the artifacts into the `e2e-runner` service (bind mount or named volume), set `c6_tile_store.path` + `c6_descriptor_index.path` env vars.
- `tests/e2e/replay/test_derkachi_1min.py`: remove the `@pytest.mark.xfail` decorator on AC-3 (line 174). **Phase 3 — replace `operator_pre_flight_setup` with a real fixture**
- `tests/e2e/replay/test_derkachi_real_tlog.py`: remove the `@pytest.mark.xfail` decorator on AZ-699 (line 174).
- `_docs/00_problem/input_data/flight_derkachi/README.md`: document the new C6 artifacts + build invocation + license attribution. - `tests/e2e/replay/conftest.py::operator_pre_flight_setup`: replace the placeholder. The new fixture:
- `_docs/02_document/contracts/c6_tile_cache/`: if a contract file exists for the descriptor-index format, append a Consumer entry naming this fixture; if not, no new contract needed. - Reads the Derkachi bbox from `tests/fixtures/derkachi_c6/bbox.yaml`.
- Invokes C11 `HttpTileDownloader` against the running satellite-provider service.
- Invokes C10 `DescriptorBatcher` against the populated C6 (NetVLAD backbone per c2_vpr default).
- Verifies sidecar coherence (`.index` + `.sha256` + `.meta.json` triple-consistency check per AZ-306).
- Yields a `PopulatedC6Cache` dataclass that the test bodies consume.
- The fixture's outputs are mounted into the e2e-runner container via named volumes that survive across pytest sessions (so the second test run in the same session reuses the populated cache — re-seeding takes minutes, re-downloading takes longer).
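The cold/warm behaviour and the cleanup-on-failure requirement of the new fixture could be sketched as a context manager with the download and index-build steps injected. Everything here is an assumption made for illustration: the `PopulatedC6Cache` field names, the sidecar file names, and the `download_tiles` / `build_index` callables stand in for the production C11 and C10 entry points, whose real signatures are not shown in this spec.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterator


@dataclass(frozen=True)
class PopulatedC6Cache:
    # Hypothetical fields; the real dataclass may differ.
    tile_store: Path
    index_path: Path
    sha256_path: Path
    meta_path: Path


@contextmanager
def populated_c6_cache(
    cache_root: Path,
    download_tiles: Callable[[Path], None],
    build_index: Callable[[Path, Path], None],
) -> Iterator[PopulatedC6Cache]:
    """Yield a populated cache, building it only when the sidecars are missing."""
    cache = PopulatedC6Cache(
        tile_store=cache_root / "tiles",
        index_path=cache_root / "descriptors.index",
        sha256_path=cache_root / "descriptors.sha256",
        meta_path=cache_root / "descriptors.meta.json",
    )
    sidecars = (cache.index_path, cache.sha256_path, cache.meta_path)
    if not all(p.exists() for p in sidecars):  # cold cache
        try:
            cache.tile_store.mkdir(parents=True, exist_ok=True)
            download_tiles(cache.tile_store)            # stand-in for C11
            build_index(cache.tile_store, cache_root)   # stand-in for C10
            missing = [p.name for p in sidecars if not p.exists()]
            if missing:
                raise RuntimeError(f"index build left sidecars missing: {missing}")
        except Exception:
            # Never leave a half-built index behind for the next session.
            for p in sidecars:
                p.unlink(missing_ok=True)
            raise
    yield cache
```

A pytest fixture would wrap this in a `with` block and `yield` the cache to the test body. On a warm named volume neither callable runs, which is what makes the ≤ 30 s warm budget plausible.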
**Phase 4 — un-xfail the Tier-2 tests**
- `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks`: remove `@pytest.mark.xfail` (still gated by `RUN_REPLAY_E2E=1` env + `tier2` marker — only runs on Tier-2 harness).
- `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`: remove `@pytest.mark.xfail`. The test body MUST emit the verdict report to `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` regardless of PASS/FAIL — the success criterion is that the report exists with the honest distribution, not that the verdict is necessarily green.
**Phase 5 — documentation**
- `_docs/02_document/contracts/replay/replay_protocol.md`: Invariant 12 already states "Real C6 cache in replay" — append the AZ-777 / e2e-runner integration detail in a new sub-section under **Composition root extension** describing the operator_pre_flight_setup fixture's behaviour.
- `_docs/00_problem/input_data/flight_derkachi/README.md`: add a Derkachi C6 section pointing at the seed script + bbox config.
- `_docs/02_document/architecture.md`: append a new sub-section to the existing satellite-provider entry (line ~28) noting that the e2e harness now stands up the real service via `docker-compose.test.yml`; `mock-suite-sat-service` is retained only for the unshipped D-PROJ-2 POST contract.
### Excluded ### Excluded
- Multi-flight fixtures — just Derkachi. (Other flights would each need their own C6 build invocation.) - The D-PROJ-2 POST/upload contract — still gated on the parent-suite design landing. `mock-suite-sat-service` continues to handle the POST stub.
- Online tile download at test time — the e2e harness MUST remain offline (per replay protocol Invariant 5 / RESTRICT-SAT-1 / NFT-SEC-02; the docker compose `internal: true` network). The build script downloads tiles AT BUILD TIME from the developer workstation; the e2e harness only sees the prebuilt artifacts. - Multi-flight fixtures — Derkachi only. Other flights each need their own bbox seed and re-run.
- Replacing the mock-suite-sat-service stub for the C12 operator-workflow `test_ac8_operator_workflow` test — that test exercises the D-PROJ-2 ingest contract which is parent-suite work, not in scope here. - Switching C2 default backbone away from `net_vlad` — out of scope; if the operator wants UltraVPR or DINOv2, they re-run C10 with a different backbone configuration.
- Building tiles for any backbone other than the airborne-default. If the operator wants a different backbone, they re-run the script with a different `--backbone` flag; this task only commits the default-backbone artifacts. - Cross-compilation of satellite-provider for Jetson arm64 if the existing Dockerfile does not produce arm64 — file a follow-up ticket if needed; this task does NOT attempt to land arm64 support in the .NET service.
- Switching the airborne C6 backend from Postgres-mirroring to anything else — the build script writes the same on-disk layout the production C6 expects. - Modifying any file under `../satellite-provider/` (sibling repo) — this task is purely additive on the gps-denied-onboard side + docker-compose orchestration. If the .NET service is missing an endpoint the C11 client requires, file a parent-suite ticket and STOP.
- AZ-776 (sibling): this task does NOT introduce the `c4_pose.enabled` flag or the open-loop composition profile. AZ-776 must land first to unblock the open-loop xfails (AC-1, AC-2, AC-5, AC-6); this task targets the full-GTSAM xfails (AC-3, AZ-699). - Persisting the populated C6 to git/LFS — the named-volume approach above keeps the cache out of the repo. If repo-committed artifacts become a requirement later, file a follow-up to evaluate LFS size.
## Acceptance Criteria ## Acceptance Criteria
**AC-1: Reproducible build** **AC-1: Real satellite-provider runs in the e2e harness**
Given a clean checkout Given `docker-compose.test.yml` with the new `satellite-provider` service
When `python scripts/build_derkachi_c6_fixture.py --output tests/fixtures/derkachi_c6/out --bbox tests/fixtures/derkachi_c6/bbox.yaml` runs When `docker compose -f docker-compose.test.yml up satellite-provider` is invoked
Then it produces a `tiles/` directory in the documented `{zoom}/{x}/{y}.jpg` layout, a FAISS `.index` file with a SHA-256-verified content hash, and a `manifest.json` recording tile count, bbox, backbone, descriptor dimension, FAISS parameters, source URL template, license, and per-artifact SHA-256, in under 30 minutes on a developer workstation Then the service builds from `../satellite-provider/SatelliteProvider.Api/Dockerfile`, comes up healthy on port 5101, and `GET /healthz` returns 200
**AC-2: License attribution** **AC-2: C11 downloads against real satellite-provider succeed**
Given the produced artifacts Given the running satellite-provider service + a seeded Derkachi-bbox tile catalog
When the manifest is inspected When `tests/e2e/satellite_provider/test_smoke.py` runs C11 `HttpTileDownloader.download_for_bbox` for a single tile
Then it records the tile source URL template, the license name (CC-BY-3.0 or CC-BY-4.0 as applicable), and the attribution string the operator must surface in any derived publication Then the tile arrives in the C6 filesystem store, the metadata row is inserted into C6's Postgres, and the freshness label is `fresh` (per the C6 freshness gate)
**AC-3: Offline e2e harness** **AC-3: operator_pre_flight_setup drives the production pipeline**
Given the prebuilt C6 artifacts mounted into the e2e-runner container Given the running satellite-provider with Derkachi tiles seeded
When `scripts/run-tests-jetson.sh` runs on Jetson with `RUN_REPLAY_E2E=1 GPS_DENIED_TIER=2` and the Docker compose network is `internal: true` When `tests/e2e/replay/conftest.py::operator_pre_flight_setup` runs
Then the test harness never reaches out to any external host; all C6 queries are served from the mounted artifacts Then C11 `HttpTileDownloader` downloads the Derkachi-bbox tiles into C6, C10 `DescriptorBatcher` builds the FAISS HNSW index over them using the NetVLAD backbone, the three sidecar files (`.index` + `.sha256` + `.meta.json`) pass the AZ-306 triple-consistency check, and the fixture yields a `PopulatedC6Cache` with all three artifact paths populated
**AC-4: Full-protocol e2e passes** **AC-4: AC-3 Derkachi test un-xfails on Tier-2**
Given AZ-776 has landed AND the C6 artifacts are mounted AND the YAML config selects `c5_state.strategy = gtsam_isam2` with `c4_pose.enabled = True` Given AZ-776 landed + the populated C6 from AC-3 mounted into the e2e-runner + the airborne binary configured with `c5_state.strategy = gtsam_isam2` + `c4_pose.enabled = True`
When `gps-denied-replay` runs the Derkachi 1-min fixture on Jetson When `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks` runs on Tier-2 Jetson
Then it exits with code 0, emits one EstimatorOutput per video frame, `test_ac3_within_100m_80pct_of_ticks` un-xfails and passes (≥80 % of ticks within 100 m of ground truth), and the per-frame loop emits `replay.satellite_anchor_inserted` log lines (not the existing `satellite_anchoring_not_wired` warning) Then it un-xfails, the test passes (≥ 80 % of ticks within 100 m of ground truth), and the per-frame loop emits `replay.satellite_anchor_inserted` log lines (not the existing `satellite_anchoring_not_wired` warning)
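The "≥ 80 % of ticks within 100 m" gate can be illustrated with a short horizontal-error computation. This sketch assumes that per-tick estimates and ground truth are available as (lat, lon) pairs in degrees and that great-circle distance is an acceptable horizontal-error metric; the metric the real test uses is not shown in this spec.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def fraction_within(
    estimates: Iterable[Tuple[float, float]],
    truths: Iterable[Tuple[float, float]],
    threshold_m: float = 100.0,
) -> float:
    """Fraction of ticks whose horizontal error is at most `threshold_m`."""
    pairs = list(zip(estimates, truths))
    if not pairs:
        raise ValueError("no ticks to evaluate")
    hits = sum(1 for est, truth in pairs if haversine_m(*est, *truth) <= threshold_m)
    return hits / len(pairs)
```

Under these assumptions the AC-3 gate reduces to `fraction_within(estimates, truths) >= 0.8`, and the same per-tick distances feed the error distribution in the AZ-699 verdict report.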
**AC-5: AZ-699 produces an honest verdict** **AC-5: AZ-699 verdict report is produced**
Given AZ-776 has landed AND the C6 artifacts are mounted AND the real flight video + factory calibration are present (already are) Given AZ-776 landed + the populated C6 from AC-3 + the real flight video + factory calibration
When `test_az699_real_flight_validation_emits_verdict_and_report` runs on Jetson When `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report` runs on Tier-2 Jetson
Then it un-xfails, the test runs to completion within the 15-min NFR budget, and `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` records the horizontal-error distribution with the honest PASS/FAIL verdict against the ≥80 % within 100 m gate Then it un-xfails, the test runs to completion within the 15-min NFR budget, and `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` records the horizontal-error distribution with the honest PASS/FAIL verdict against the ≥ 80 % within 100 m gate (PASS not required for the AC; HONEST report required)
**AC-6: Fixture README documents rebuild** **AC-6: Documentation captures the new architecture seam**
Given the updated `_docs/00_problem/input_data/flight_derkachi/README.md` Given the rewritten replay protocol doc + the Derkachi fixture README + the architecture sub-section
When a new contributor reads it When a new contributor reads them
Then it documents (i) what C6 artifacts now exist, (ii) the exact `python scripts/build_derkachi_c6_fixture.py …` invocation to rebuild, (iii) the license attribution operators must propagate, (iv) the size-on-disk decision (committed vs. build-on-demand) Then they understand (i) why the real satellite-provider runs in the e2e harness, (ii) how to re-seed the Derkachi catalog, (iii) which path goes through `mock-sat` vs. real satellite-provider (POST vs. GET), and (iv) what license attribution operators must propagate
## Non-Functional Requirements ## Non-Functional Requirements
**Performance** **Performance**
- Build script completes in ≤ 30 minutes on a developer workstation (Apple Silicon or x86 Linux, no GPU required for OSM tile download + descriptor pre-compute via the CPU-fallback path of the backbone). - `operator_pre_flight_setup` completes in ≤ 5 minutes on first invocation (cold cache), ≤ 30 seconds on subsequent invocations within the same docker-compose session (warm cache via named volume).
- Built artifacts do not regress the airborne C2 lookup latency budget — the FAISS HNSW parameters MUST match what production C6 expects (M, efConstruction, efSearch); the index is built once and never rebuilt at runtime. - Built C6 artifacts (tile store + descriptor index) match the airborne C2 lookup latency budget — FAISS HNSW parameters MUST match what production C6 expects (M, efConstruction, efSearch); the index is built once per session, never rebuilt mid-test.
**Compatibility** **Compatibility**
- Tile on-disk layout `{zoom}/{x}/{y}.jpg` MUST be byte-equivalent to `satellite-provider`'s layout (architecture principle #5) so a future post-landing upload would be byte-identical. - Tile on-disk layout `{zoom}/{x}/{y}.jpg` MUST be byte-equivalent to satellite-provider's layout (architecture principle #5) — this is automatic because C11 writes via the C6 production code path.
- FAISS index format MUST be loadable by the airborne `c6_descriptor_index.FaissDescriptorIndex` impl without code changes. - FAISS index format MUST be loadable by the airborne `c6_descriptor_index.FaissDescriptorIndex.from_config` impl without code changes — this is automatic because C10 writes via the C6 production code path.
- Descriptor dimension MUST match the configured C7 backbone's output dimension — the build script asserts this at start. - The .NET satellite-provider service's `/api/satellite/tiles` contract version MUST be compatible with the C11 `HttpTileDownloader._LIST_PATH` / `_GET_PATH` constants (`/api/satellite/tiles`). Mismatch is a parent-suite bug; this task does not patch C11 around it.
**Reliability** **Reliability**
- Build script MUST fail loud on partial downloads (network error, HTTP 429/500, malformed tile) rather than silently producing an incomplete tile store. Resume-from-partial is allowed but each resumed run re-verifies SHA-256 of every committed tile. - The smoke test (AC-2) MUST fail loud if the satellite-provider service is unreachable, returns malformed responses, or rate-limits — no silent skip.
- The SHA-256 content-hash gate on the FAISS index (per D-C10-3) MUST be enforced — operator can verify a downloaded fixture matches what was built. - The `operator_pre_flight_setup` fixture MUST clean up partial cache state on failure (no half-built FAISS index left around).
- The SHA-256 content-hash gate on the FAISS index (per D-C10-3) MUST be verified at every fixture yield — mismatch raises `IndexUnavailableError`.
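The SHA-256 gate over the three sidecar files could look roughly like the sketch below. The file formats are assumptions: the `.sha256` file is taken to hold the hex digest as its first whitespace-separated token, and `.meta.json` is taken to carry the same digest under a hypothetical `"sha256"` key. `IndexUnavailableError` is redefined locally as a stand-in for the project's own exception type.

```python
import hashlib
import json
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Local stand-in for the project's error type."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large indexes are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecars(index_path: Path, sha256_path: Path, meta_path: Path) -> str:
    """Return the index digest if all three files agree, else raise."""
    try:
        actual = sha256_of(index_path)
        parts = sha256_path.read_text().split()
        recorded = parts[0].lower() if parts else ""
        meta = json.loads(meta_path.read_text())
    except (OSError, json.JSONDecodeError) as exc:
        raise IndexUnavailableError(f"sidecar unreadable: {exc}") from exc
    # "sha256" is an assumed key name, not the documented AZ-306 layout.
    meta_hash = str(meta.get("sha256", "")).lower() if isinstance(meta, dict) else ""
    if not (actual == recorded == meta_hash):
        raise IndexUnavailableError("index, .sha256 and .meta.json disagree")
    return actual
```

Running this check at every fixture yield, rather than only after the build, also catches a named volume that was modified between pytest sessions.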
**Security** **Security**
- Reference imagery URLs MUST be HTTPS. Tile metadata MUST record the exact source URL so license auditors can verify attribution. - Reference imagery source URLs MUST be HTTPS. License attribution recorded in the seeded catalog's metadata so operators can verify before any derived publication.
- No API keys committed to the repo; if the chosen tile source requires registration, the build script reads the key from an env var and documents the env var name in the fixture README. - No JWT secrets committed: the satellite-provider service in docker-compose reads `JWT_SECRET` from a `.env.test` file that's `.gitignore`'d; the test environment uses a development-only key.
- C11 download MUST go through the production auth path (Bearer token from satellite-provider's `/api/auth`) — no auth bypass for tests.
## Unit Tests ## Unit Tests
| AC Ref | What to Test | Required Outcome | | AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------| |--------|--------------|------------------|
| AC-1 | Build script produces `tiles/`, `descriptor_index.index`, `manifest.json` on a small mock bbox | All three artifacts exist, manifest fields populated | | AC-1 | docker-compose.test.yml validates `satellite-provider` service definition | YAML lints; service has correct build context + port |
| AC-1 | SHA-256 of `descriptor_index.index` recorded in manifest matches actual file hash | Hashes match | | AC-2 | C11 `HttpTileDownloader.download_for_bbox` against a stubbed real satellite-provider response | Returns expected `DownloadBatchReport` with `outcome=SUCCESS` |
| AC-2 | Manifest records source URL template + license + attribution | All three fields non-empty | | AC-3 | `operator_pre_flight_setup` fixture yields a `PopulatedC6Cache` with non-empty tile store + FAISS index | All three sidecar files exist + sha256 triple-consistency holds |
| AC-2 | License field matches the source's documented license | Round-trips against an enum | | AC-3 | Sidecar SHA-256 coherence check inside the fixture | `IndexUnavailableError` raised when one of the three files is tampered |
| AC-6 | Fixture README documents the build invocation | Invocation string greps cleanly | | AC-6 | Fixture README documents the seed invocation | Invocation string + license attribution greps cleanly |
## Blackbox Tests ## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References | | AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|--------------|-------------------|----------------| |--------|------------------------|--------------|-------------------|----------------|
| AC-3 | Prebuilt C6 artifacts + e2e-runner with `internal: true` network | Run `scripts/run-tests-jetson.sh` end-to-end | No outbound network calls observed by Docker network logs; all C6 queries return from local index | Security, Reliability | | AC-1 | docker-compose.test.yml + satellite-provider service definition | `docker compose up satellite-provider` | Service comes up healthy in ≤ 60 s | Perf |
| AC-4 | AZ-776 landed + C6 artifacts mounted + full-GTSAM YAML | `test_ac3_within_100m_80pct_of_ticks` un-xfailed | Test passes (≥80 % of ticks within 100 m); `satellite_anchor_inserted` log lines visible | Perf, Compat | | AC-2 | Real satellite-provider running + 1-tile-bbox query | C11 HttpTileDownloader against the live service | Tile arrives in C6 + metadata row inserted + freshness=fresh | Reliability |
| AC-5 | AZ-776 landed + C6 artifacts mounted + real flight video + factory calibration | `test_az699_real_flight_validation_emits_verdict_and_report` un-xfailed | Test runs to completion in ≤ 15 min, verdict report written to `_docs/06_metrics/` | Perf | | AC-3 | Seeded Derkachi catalog + e2e-runner | `operator_pre_flight_setup` cold + warm invocation | Cold ≤ 5 min, warm ≤ 30 s, all three sidecar files coherent | Perf |
| AC-4 | AZ-776 landed + populated C6 mounted + full-GTSAM YAML | `test_ac3_within_100m_80pct_of_ticks` un-xfailed on Tier-2 Jetson | Test passes (≥ 80 % within 100 m); `satellite_anchor_inserted` log lines visible | Perf, Compat |
| AC-5 | AZ-776 landed + populated C6 mounted + real flight video + factory calibration | `test_az699_real_flight_validation_emits_verdict_and_report` un-xfailed | Test completes ≤ 15 min, verdict report written to `_docs/06_metrics/` | Perf |
## Constraints ## Constraints
- Reference imagery source MUST be OSM/CARTO basemap (CC-BY-licensed). Operator chose this during AZ-777 scoping (cycle-3 Step 9, 2026-05-21) over Maxar Open Data (license uncertainty for in-repo redistribution) and video-self-orthorectification (self-referential, makes AC-3 a smoke test rather than a real accuracy gate). The trade-off — lower-resolution reference imagery may produce a higher residual on the AC-3 horizontal-error metric than satellite imagery would — is an HONEST finding the AZ-699 verdict will surface. - ZERO modifications to files under `../satellite-provider/` (sibling repo). If a parent-suite API gap is discovered (e.g., `/api/satellite/tiles` returns 404 because the endpoint isn't wired), STOP and file a parent-suite ticket; do not work around it on the onboard side.
- The build script MUST NOT depend on `satellite-provider` running. The script's only network dependency is the chosen OSM/CARTO tile server (HTTPS, public, no auth). - Per replay protocol Invariant 5: ZERO outbound network from the e2e-runner once the cache is populated. The cache-population phase needs network (satellite-provider downloads from CARTO upstream), but once the docker-compose `e2e-runner` service is `internal: true`-networked for the airborne replay run, no external host is reachable. Verify with Docker network inspection during AC-4.
- The committed artifact size budget (if AC-6 chooses commit-to-repo) is 100 MB total across `tiles/` + `descriptor_index.index`. Over budget → switch to build-on-demand, document in README. - Imagery source MUST be CC-BY-licensed (CARTO Voyager Basemap or equivalent). The seeded catalog records the license + attribution string operators must propagate in any derived publication.
- The `mock-suite-sat-service` stub stays in place for `test_ac8_operator_workflow` — that test exercises the D-PROJ-2 contract which this task does not address. - The seeded Derkachi catalog size budget is 100 MB on the satellite-provider DB side. Over budget → reduce zoom-level coverage; document the trade-off in `bbox.yaml` and `tests/fixtures/derkachi_c6/README.md`.
- Per replay protocol Invariant 5: ZERO outbound network from the e2e-runner. The build script runs on the developer workstation; the harness only sees prebuilt artifacts.
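The Invariant 5 constraint asks for Docker network inspection to confirm the e2e-runner has no outbound route. One way to automate that check is to parse the JSON that `docker network inspect` prints and look at each network's `Internal` flag. The sketch below assumes the JSON has already been captured as a string; the function name and the network name in the usage note are hypothetical.

```python
import json


def network_is_internal(inspect_output: str) -> bool:
    """Return True only if every inspected network reports Internal == true.

    `inspect_output` is the raw JSON array printed by
    `docker network inspect <network>...`. An empty array is treated as a
    failure, since it means nothing was actually inspected.
    """
    networks = json.loads(inspect_output)
    if not networks:
        return False
    return all(net.get("Internal") is True for net in networks)
```

In a test this might be fed from `subprocess.run(["docker", "network", "inspect", "e2e-net"], capture_output=True, text=True).stdout`, where `e2e-net` stands in for whatever network name the compose file defines.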
## Risks & Mitigation ## Risks & Mitigation
**Risk 1: OSM basemap residual is too coarse for the AC-3 threshold** **Risk 1: satellite-provider's `/api/satellite/tiles` contract drifts from what C11 expects**
- *Risk*: AC-3's `≤100 m for 80 %` gate may be physically unmeetable when the reference imagery is OSM rasterized basemap (street-level features, not satellite features) — the visual descriptors may not lock against the aerial nav-camera frames at all. - *Risk*: C11 `HttpTileDownloader` was implemented against an older satellite-provider contract; recent satellite-provider changes may have moved or renamed the endpoint.
- *Mitigation*: This is an honest discovery. If AC-3 still fails after this task lands, the failure mode shifts from "no anchors at all" (current) to "anchors exist but VPR similarity is too low to produce ≥80 % within 100 m". The AZ-699 verdict report will surface the actual horizontal-error distribution; if it lands at e.g. p50 = 250 m, that becomes evidence for a follow-up ticket to switch to satellite imagery. The xfail is removed in either case because the test now exercises the real pipeline — the verdict, not the xfail, becomes the honest signal. - *Mitigation*: AC-1 smoke test fires the C11 call against the real service before any test depends on it. Any 404/400/contract mismatch surfaces immediately; the failure points at a parent-suite ticket, not an onboard bug. The onboard code path is the standard production code; this task does not modify it.
**Risk 2: Tile source rate-limits or goes offline mid-build** **Risk 2: CARTO Voyager basemap residual is too coarse for AC-4**
- *Risk*: Public OSM/CARTO tile servers may rate-limit or temporarily go down, breaking reproducibility on a re-build. - *Risk*: CC-BY basemap is OSM-derived (street-level features, not satellite features). NetVLAD descriptors may not lock against nadir camera frames well enough for ≥ 80 % within 100 m.
- *Mitigation*: Build script implements exponential backoff + resume-from-partial. Document the chosen tile-server URL in the fixture README so an operator can swap to a mirror if needed. If commit-to-repo is chosen for the artifacts, future re-builds are unnecessary — the committed artifacts are the source of truth. - *Mitigation*: This is an honest discovery surface. AC-4 may fail on accuracy after this task lands — the failure mode shifts from "no anchors at all" (current) to "anchors exist but VPR similarity is too low". The AZ-699 verdict report (AC-5) surfaces the actual horizontal-error distribution; if it lands at e.g. p50 = 250 m, that becomes evidence for a follow-up ticket to seed a satellite-imagery source (Maxar Open Data, Sentinel-2, etc.). The xfail is removed in either case because the test now exercises the real pipeline — the verdict, not the xfail, is the honest signal.
**Risk 3: Repo size pressure if artifacts are committed** **Risk 3: satellite-provider doesn't build on arm64 (Jetson)**
- *Risk*: Tile store + FAISS index could exceed 100 MB depending on bbox + zoom levels; committing them under LFS still costs LFS storage and bandwidth. - *Risk*: The existing `SatelliteProvider.Api/Dockerfile` uses `mcr.microsoft.com/dotnet/aspnet:10.0` which is amd64-default. Tier-2 Jetson is arm64.
- *Mitigation*: First build run measures the size. If under 100 MB → commit. If over → build-on-demand documented in README + `scripts/run-tests-jetson.sh` pre-step. Either choice is acceptable per AC-6. - *Mitigation*: First check whether the multi-arch manifest exists for the dotnet/aspnet image at the pinned version. If yes → no action needed. If no → file a follow-up ticket to multi-arch the satellite-provider Dockerfile; AC-4 + AC-5 stay BLOCKED on Tier-2 until that ticket lands, but Phases 1–3 + AC-1/2/3/6 still complete on Tier-1 in this ticket's scope.
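The multi-arch check in the arm64 mitigation can be scripted against the JSON that `docker manifest inspect <image>` emits. The sketch below is a minimal, hedged example: it assumes the output is a manifest list whose `manifests` entries carry a `platform` object with `architecture` and `os` fields, and it treats a single-architecture manifest (no `manifests` key) as "not supported". The function name is hypothetical.

```python
import json


def supports_platform(
    manifest_json: str, arch: str = "arm64", os_name: str = "linux"
) -> bool:
    """Return True if the manifest list advertises the given os/architecture.

    A plain single-image manifest has no "manifests" key, so the loop body
    never runs and the function returns False.
    """
    doc = json.loads(manifest_json)
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("architecture") == arch and platform.get("os") == os_name:
            return True
    return False
```

Running this in Phase 1 against the base image pinned in the satellite-provider Dockerfile gives an early yes/no answer before any Tier-2 run is attempted.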
**Risk 4: Backbone descriptor dimension mismatch** **Risk 4: docker-compose stand-up flakiness slows down the test suite**
- *Risk*: If the operator changes the airborne C2 backbone (UltraVPR → NetVLAD, etc.) without rebuilding the index, the FAISS load will fail at runtime with a dimension mismatch. - *Risk*: Cold-bringing up satellite-provider + its Postgres + the gps-denied-onboard companion + e2e-runner across CI pipelines adds wall-clock time.
- *Mitigation*: Manifest records the descriptor dimension. C6 loader asserts the manifest's dimension matches the configured backbone's output dimension at compose time; mismatch surfaces as an `AirborneBootstrapError` naming both numbers + the rebuild invocation. - *Mitigation*: Named volumes for both the satellite-provider DB and the populated C6 mean only the first run in a CI session pays the cost. Subsequent runs are warm. Document the named volumes in the docker-compose comments + the fixture README so an operator knows to `docker volume prune` if they want to force a re-seed.
**Risk 5: Single-ticket 8-pt complexity exceeds the standard PBI cap**
- *Risk*: The task is intentionally above the 5-pt cap stated in the project's PBI complexity rule; this can mask the failure mode where a sub-phase blocks and the whole ticket grinds.
- *Mitigation*: The five phases above are explicit handoff points. If Phase 1 (satellite-provider stand-up) fails for reasons outside this ticket's scope (e.g., parent-suite contract drift, arm64 issue), the implementer STOPS at the phase boundary, reports the blocker, and proposes a split into smaller follow-up tickets. The "single ticket" property is preserved as long as the work proceeds linearly; if it grinds at any phase boundary, decomposition is the escape hatch.
### ADR Impact ### ADR Impact
> Affects ADR-001 (composition root is single registration site): unchanged — C6 is built outside the composition root by the operator-side build script; the airborne binary still just loads what's on disk. > Affects ADR-002 (build-time exclusion): unchanged — C11 is already operator-side-only via process-level isolation (architecture Principle #4 + ADR-004); this task just exercises that path against the real upstream.
> Implements architecture principle #4 (no in-air network I/O) and principle #5 (all persistent imagery in `satellite-provider` on-disk layout) — this is the FIRST executable artifact that demonstrates both principles end-to-end against a real flight. > Affects ADR-011 (replay is a configuration): unchanged — the per-frame loop is mode-agnostic; this task closes the gap between the live and replay paths' upstream tile source (both now go through the real satellite-provider).
> Implements architecture principle #5 (satellite-provider on-disk layout) end-to-end against a real flight for the first time.
> No new ADR — the architectural decision is "wire the production C10/C11 pipeline into the e2e harness", which is execution of existing decisions, not a new one.
+18 -2
@@ -4,12 +4,28 @@
flow: existing-code flow: existing-code
step: 10 step: 10
name: Implement name: Implement
status: in_progress status: paused
sub_step: sub_step:
phase: 7 phase: 7
name: batch-loop name: batch-loop
detail: "batch 103 cycle3: AZ-776 committed + transitioned to In Testing; AZ-777 next" detail: "batch 104 cycle3: AZ-777 spec rewritten (architecture-aligned, 8 pts, single ticket) + Jira synced; In Progress in Jira; Phase 1 (satellite-provider stand-up in docker-compose.test.yml) ready for next /autodev session"
retry_count: 0 retry_count: 0
cycle: 3 cycle: 3
tracker: jira tracker: jira
last_completed_batch: 103 last_completed_batch: 103
session_handoff:
current_task: AZ-777
jira_status: in_progress
canonical_spec: _docs/02_tasks/todo/AZ-777_derkachi_c6_reference_fixture.md
decision_log: _docs/_process_leftovers/2026-05-21_az777_complexity_override.md
next_session_phase: "Phase 1 — satellite-provider stand-up in docker-compose.test.yml + smoke test at tests/e2e/satellite_provider/test_smoke.py"
parent_suite_paths:
satellite_provider_repo: ../satellite-provider/
api_dockerfile: ../satellite-provider/SatelliteProvider.Api/Dockerfile
api_port_default: 8080
integration_test_compose: ../satellite-provider/docker-compose.tests.yml
notes:
- "C2 default backbone is net_vlad (c2_vpr/config.py:67) — Phase 3 fixture uses it."
- "STOP gates apply between phases — see canonical spec Risk 5 + Phase headers."
- "If satellite-provider 's /api/satellite/tiles contract drifts from C11 expectations, STOP and file parent-suite ticket; do not patch C11."
- "Tier-2 arm64 of satellite-provider not yet validated; check multi-arch manifest in Phase 1 or file follow-up."
@@ -0,0 +1,46 @@
# AZ-777 — Complexity override (8 pts, single ticket)
**Timestamp**: 2026-05-21T13:30:00+03:00
**Type**: Decision log (not a blocked tracker write)
**Decision-maker**: user (explicit choice via /autodev questionnaire 2026-05-21)
## Context
The standard PBI complexity rule in `user_rules` says:
> Create PBI with 2 or 3 points of complexity, could be 5. Do not create very complex PBIs with more than 5 points.
AZ-777 was originally a 5-pt task ("write a script that downloads OSM/CARTO basemap tiles directly"). During cycle-3 Step 10 implementation, the agent surfaced that the task spec contradicted the architecture (C10 does not touch satellite-provider; C11 owns that path against the real parent-suite .NET service). The user was asked to choose among:
- A) Decompose AZ-777 into 4 sub-tickets (AZ-777-a/b/c/d), cancel original
- B) Rewrite AZ-777 in place, expand to 8 pts, keep single ticket, multi-session implementation
- C) Implement original spec as-written (ignore architecture mismatch)
- D) Close cycle, pick up later
User chose B.
## Override rationale
The four sub-deliverables (satellite-provider stand-up, Derkachi catalog seeding, operator_pre_flight_setup rewrite, Tier-2 AC-4/AC-5 validation) only deliver demo-confidence value when shipped together. Splitting them into four PBIs would create a half-shipped state where:
- AZ-777-a alone leaves the e2e harness with a satellite-provider service that nothing consumes.
- AZ-777-b alone seeds a tile catalog that nothing queries.
- AZ-777-c alone tries to drive a fixture without the upstream service in place.
The user's preference is single-ticket containment with explicit phase boundaries documented in the task spec (Phases 1–5 + STOP gates per phase). This is the "single ticket but staged execution" pattern, not the "decompose into sub-tickets" pattern.
## STOP-gate enforcement
The rewritten AZ-777 spec includes explicit STOP gates between phases:
1. **Phase 1 → Phase 2**: If satellite-provider stand-up fails for parent-suite reasons (contract drift, arm64 issue), STOP and file a parent-suite ticket. Do not work around on the onboard side.
2. **Phase 2 → Phase 3**: If satellite-provider's region-onboarding endpoint shape differs from what the seed script expects, STOP and file a parent-suite ticket.
3. **Any phase → next**: If the implementation runs into work that materially exceeds the remaining phase's budget, STOP and propose decomposition (escape hatch into the 4-ticket split that was option A above).
The "single ticket" property is preserved as long as work proceeds linearly. If it grinds at any phase boundary, decomposition becomes the escape hatch. The user has been informed of this escape via the task spec's Risk 5.
## Replay obligation
This is NOT a tracker write blocker — Jira is reachable and the AZ-777 description + story points update is being made in the same /autodev turn that this decision log is being written. This file is the AUDIT TRAIL for the override, not a deferred-write record.
No replay action required on subsequent /autodev invocations. The file can be deleted once AZ-777 is moved to `done/`, but it's small enough that keeping it as historical documentation of the decision is fine.