mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 12:41:13 +00:00
[AZ-835] [AZ-777] Decompose Epic into C3-C6 + close AZ-777
AZ-839 (C3, 5pt) operator_pre_flight_setup real fixture: wire C1+C2+C11+C10, supersedes AZ-777 Phase 3 (route-driven, not bbox). AZ-840 (C4, 3pt) E2E orchestrator test ingesting raw (tlog, video, calibration), runs steps 1-7 end-to-end on Jetson. AZ-841 (C5, 1pt) Un-xfail AZ-777 AC-4 + AC-5 once C3 + C4 land. AZ-842 (C6, 2pt) Docs: replay_protocol Invariant 12 + architecture + orchestrator-test README. AZ-777 transitioned to Done in Jira (Phases 1+2 shipped batches 104-106; Phases 3-5 superseded per 2026-05-22 route-driven directive). Closure comment 11177 added with phase-by-phase status. Local spec moved todo/ -> done/ with a status banner at the top. Dependencies table preamble bumped to 173 tasks / 557 SP and a 2026-05-23 entry prepended. Autodev state sub_step.detail set to "batch 108 next; AZ-839 C3". Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -1,209 +0,0 @@
|
||||
# Derkachi e2e: wire EXISTING parent-suite satellite-provider into the operator pre-flight fixture
|
||||
|
||||
**Task**: AZ-777_derkachi_c6_reference_fixture
|
||||
**Name**: Drive the production C10/C11 pre-flight pipeline against the parent-suite `satellite-provider` .NET service ALREADY running in the Jetson e2e harness so the Derkachi clip produces a real FAISS-anchored C4/C5 satellite-fix loop end-to-end
|
||||
**Description**: The Jetson e2e harness already runs the real `satellite-provider` .NET 8 service (lineage AZ-688 / AZ-691 / AZ-692, services `satellite-provider` + `satellite-provider-postgres` in `docker-compose.test.jetson.yml`), but the e2e-runner still points its `SATELLITE_PROVIDER_URL` at the legacy `mock-sat` fixture and the placeholder `operator_pre_flight_setup` fixture never drives the C10/C11 pipeline. Compounding this, C11's `HttpTileDownloader` path constants (`_LIST_PATH=/api/satellite/tiles`, `_GET_PATH=/api/satellite/tiles/{tile_id}`) do not match the real satellite-provider API surface (`POST /api/satellite/tiles/inventory` for LIST, `GET /tiles/{z}/{x}/{y}` for tile fetch). This task wires the existing service into the e2e-runner, adapts C11 to the real contract, seeds the Derkachi-bbox tile catalog via `POST /api/satellite/request`, replaces the placeholder fixture with a real C10+C11 driver, and un-xfails the Tier-2 Derkachi + AZ-699 verdict tests.
|
||||
**Complexity**: 8 points (explicit override of the standard 5-pt PBI cap — see decision log entry 2026-05-21 + spec refresh note at `_docs/_process_leftovers/2026-05-21_az777_complexity_override.md`; scope reconciled with reality 2026-05-21 during cycle-3 batch 104. Single-ticket containment preserved — the four sub-deliverables only deliver demo-confidence value when shipped together.)
|
||||
**Dependencies**: AZ-776 done (eskf open-loop composition profile unblocks the replay graph for Derkachi); relies on prior compose-side work AZ-688 / AZ-691 / AZ-692 (closed in Jira without local task spec files — the `satellite-provider` + `satellite-provider-postgres` services + `.env.test.example` are already present)
|
||||
**Component**: e2e fixtures / c6_tile_cache / c10_provisioning / c11_tile_manager / docker compose
|
||||
**Tracker**: AZ-777
|
||||
**Epic**: AZ-602
|
||||
|
||||
## Problem
|
||||
|
||||
The Derkachi e2e fixture (`_docs/00_problem/input_data/flight_derkachi/`) ships real flight inputs but DOES NOT ship the populated C6 tile cache + FAISS descriptor index the replay protocol requires (`replay_protocol.md` Invariant 12). Three architectural gaps stop the full C1+C2+C3+C4+C5 pipeline from running against Derkachi today:
|
||||
|
||||
1. **`e2e-runner` still points at `mock-sat`.** In `docker-compose.test.jetson.yml` the `e2e-runner` env block has `SATELLITE_PROVIDER_URL: http://mock-sat:5100` even though `mock-sat` is no longer defined in that file and the real `satellite-provider` service (https://satellite-provider:8080) IS defined right below.
|
||||
2. **C11 contract drift.** `c11_tile_manager/tile_downloader.py:61-62` defines `_LIST_PATH = /api/satellite/tiles` and `_GET_PATH = /api/satellite/tiles`. The real satellite-provider exposes `POST /api/satellite/tiles/inventory` (bulk lookup by z/x/y or `locationHashes`) and `GET /tiles/{z:int}/{x:int}/{y:int}` (slippy-map tile fetch) — different paths, different methods, different schemas (`Program.cs:187-209`).
|
||||
3. **`operator_pre_flight_setup` is a placeholder.** The fixture at `tests/e2e/replay/conftest.py` (lines 293-310) `mkdir`s an empty `operator_cache` directory and yields. It does NOT drive C11 download or C10 descriptor-batcher; it does NOT populate C6. The fixture's docstring explicitly calls itself "a stub" pending this ticket.
|
||||
|
||||
Production architecture (per `architecture.md` Principle #5 + the C10/C11 descriptions) requires:
|
||||
|
||||
- C10 does NOT touch satellite-provider — tile network I/O lives in C11.
|
||||
- C11 `HttpTileDownloader` is the production path: authenticated GETs against the parent-suite `satellite-provider`.
|
||||
- `satellite-provider` owns OSM/CARTO tile network I/O + license attribution + multi-flight voting layer — the onboard companion is read-only against it (via C11) during pre-flight and read-only against C6 during flight.
|
||||
- `mock-sat` is fully obsolete on Jetson (D-PROJ-2 / `POST /api/satellite/upload` shipped — verified at `Program.cs:211`). Tier-1 (`docker-compose.test.yml`) is deprecated per `_docs/02_document/tests/environment.md` 2026-05-20 active policy and is OUT OF SCOPE.
|
||||
|
||||
## Outcome
|
||||
|
||||
- The e2e-runner in `docker-compose.test.jetson.yml` consumes the existing real `satellite-provider` service over `https://satellite-provider:8080` with a self-signed dev cert and a static Bearer `service_api_key` token. `mock-sat` references removed.
|
||||
- C11 `HttpTileDownloader._LIST_PATH` / `_GET_PATH` adapted to the real satellite-provider API surface (`POST /api/satellite/tiles/inventory` for LIST; `GET /tiles/{z}/{x}/{y}` for tile fetch), with the consumer code in `_do_enumerate` + `_download_one_tile` updated to match. All existing C11 unit tests in `tests/unit/c11_tile_manager/` re-greened against the new contract.
|
||||
- `satellite-provider`'s tile catalog is seeded with the Derkachi bbox (≈50.05–50.15 lat, 36.05–36.15 lon, zoom 15–18) via `POST /api/satellite/request`. Imagery source: **Google Maps satellite layer** (`mt0..mt3.google.com/vt/lyrs=s`) — verified via 2026-05-22 black-box probe of the running satellite-provider. NOTE: this was originally specced as CARTO Voyager Basemap (CC-BY-3.0); the spec was amended 2026-05-22 after the probe revealed the actual upstream is Google Maps governed by Google Maps Platform Terms of Service. Dev/research use only; production deployment requires Google Maps Platform licensing review OR migration to a true CC-BY source on the satellite-provider side (parent-suite ticket TBD).
|
||||
- `tests/e2e/replay/conftest.py::operator_pre_flight_setup` replaced by a real fixture that drives adapted C11 + C10 against the seeded catalog and yields a `PopulatedC6Cache` dataclass mounted via named volumes that survive across pytest sessions.
|
||||
- AC-3 (`test_ac3_within_100m_80pct_of_ticks` in `tests/e2e/replay/test_derkachi_1min.py`) un-xfails on Tier-2 Jetson with ≥ 80 % of ticks within 100 m of ground truth.
|
||||
- AZ-699 verdict test (`test_az699_real_flight_validation_emits_verdict_and_report`) un-xfails and produces the first honest horizontal-error distribution report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
**Phase 1 — wire e2e-runner against existing satellite-provider + C11 contract adaptation**
|
||||
|
||||
- `docker-compose.test.jetson.yml` (only the `e2e-runner` service block changes; the existing `satellite-provider` + `satellite-provider-postgres` blocks are unchanged):
|
||||
- Switch e2e-runner `SATELLITE_PROVIDER_URL: http://mock-sat:5100` → `SATELLITE_PROVIDER_URL: https://satellite-provider:8080`.
|
||||
- Add `SATELLITE_PROVIDER_TLS_INSECURE: "1"` env var (development-only) so requests accepts the self-signed dev cert. Loud warning + documentation per Risk 2.
|
||||
- Add `SATELLITE_PROVIDER_API_KEY: ${SATELLITE_PROVIDER_API_KEY}` env sourced from `.env.test` (matches existing `JWT_SECRET` pattern; `.env.test.example` already covers JWT_*, this one extends it with one new variable).
|
||||
- Add `e2e-runner.depends_on.satellite-provider: { condition: service_healthy }`.
|
||||
- Remove any residual `mock-sat` reference from the `e2e-runner` env block (the service itself is already gone from the file).
|
||||
- **C11 contract adaptation** (in `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py`):
|
||||
- Change `_LIST_PATH = "/api/satellite/tiles"` → `_LIST_PATH = "/api/satellite/tiles/inventory"` and switch `_do_enumerate` from GET-with-query-params to POST-with-JSON-body per AZ-505 / `tile-inventory.md` v1.0.0 (body: `{tiles: [{tileZoom, tileX, tileY}, ...]}` OR `{locationHashes: [...]}`; response order matches request order with `present: true|false`).
|
||||
- Change `_GET_PATH = "/api/satellite/tiles"` → `_GET_PATH = "/tiles"` and adjust `_download_one_tile` to build `/tiles/{z}/{x}/{y}` from the inventory hit's coordinates instead of `tile_id`.
|
||||
- Map the response field renames in `TileSummary` construction (existing fields like `tile_id`, `produced_at`, `resolution_m_per_px`, `estimated_bytes` map to whatever the real inventory response uses — verify against `Program.cs` + `tile-inventory.md` and document any per-field adaptation needed).
|
||||
- Update `tests/unit/c11_tile_manager/test_tile_downloader.py` (and any other unit tests touching the LIST/GET paths) to use the new POST contract + slippy-map GET — these are stubbed-response tests, no live service needed.
|
||||
- **Smoke test** at `tests/e2e/satellite_provider/test_smoke.py` (new):
|
||||
- Gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`.
|
||||
- Brings up the docker-compose stack (`satellite-provider` + `satellite-provider-postgres` + dependencies).
|
||||
- TCP-probe `satellite-provider:8080` until healthy.
|
||||
- Issues one Bearer-authenticated `POST /api/satellite/tiles/inventory` for a 1-tile query (a tile in the Derkachi bbox); asserts a 200 response with the documented schema.
|
||||
- For an inventory-present tile, fetches via `GET /tiles/{z}/{x}/{y}`; asserts non-empty JPEG bytes return.
|
||||
- Asserts the C11-adapted code path (`HttpTileDownloader.download_for_bbox` for a 1-tile bbox) successfully writes to C6's tile store + Postgres metadata table.
|
||||
- `docker-compose.test.yml` (Tier-1) is **NOT** modified. Tier-1 e2e is deprecated per `_docs/02_document/tests/environment.md` 2026-05-20 active policy.
|
||||
- `.env.test.example` extended with `SATELLITE_PROVIDER_API_KEY=DEV-ONLY-REPLACE-...`.
|
||||
|
||||
**Phase 2 — Derkachi tile catalog seeding via the real satellite-provider region API**
|
||||
|
||||
- `tests/fixtures/derkachi_c6/seed_region.py` (new): a Python helper that calls `POST /api/satellite/request` against the running satellite-provider to register the Derkachi bbox + zoom range. Body schema verified against the actual `RequestRegionRequest` DTO (`{id, latitude, longitude, sizeMeters, zoomLevel, stitchTiles}`) — body shape probe-confirmed 2026-05-22. Imagery source: **Google Maps satellite layer** (`lyrs=s`); satellite-provider owns the actual tile download from Google Maps and applies the freshness gate. Note: see AZ-812 for the planned `latitude/longitude` → `lat/lon` rename on this DTO.
|
||||
- `tests/fixtures/derkachi_c6/bbox.yaml`: Derkachi bbox + zoom levels + actual imagery source (Google Maps satellite, not CARTO as originally specced) + license attribution metadata (Google Maps Platform Terms of Service + "Imagery © Google" attribution string).
|
||||
- `tests/fixtures/derkachi_c6/README.md`: how to re-seed if the satellite-provider DB is wiped; license attribution operators must propagate ("Imagery © Google"); the dev-only caveat for Google Maps ToS; pointer to the parent-suite ticket (TBD) for migrating to a true CC-BY source for production.
|
||||
|
||||
**Phase 3 — replace `operator_pre_flight_setup` with a real fixture**
|
||||
|
||||
- `tests/e2e/replay/conftest.py::operator_pre_flight_setup`: replace the placeholder. The new fixture:
|
||||
- Reads the Derkachi bbox from `tests/fixtures/derkachi_c6/bbox.yaml`.
|
||||
- Invokes the adapted C11 `HttpTileDownloader` against the running satellite-provider service.
|
||||
- Invokes C10 `DescriptorBatcher` against the populated C6 (NetVLAD backbone per `c2_vpr/config.py:67` default).
|
||||
- Verifies sidecar coherence (`.index` + `.sha256` + `.meta.json` triple-consistency check per AZ-306).
|
||||
- Yields a `PopulatedC6Cache` dataclass that the test bodies consume.
|
||||
- Outputs mounted into the e2e-runner container via named volumes that survive across pytest sessions.
|
||||
|
||||
**Phase 4 — un-xfail the Tier-2 tests**
|
||||
|
||||
- `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks`: remove `@pytest.mark.xfail` (still gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`).
|
||||
- `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`: remove `@pytest.mark.xfail`. The test body MUST emit the verdict report regardless of PASS/FAIL — the success criterion is that the report exists with the honest distribution.
|
||||
|
||||
**Phase 5 — documentation**
|
||||
|
||||
- `_docs/02_document/contracts/replay/replay_protocol.md`: extend Invariant 12 with an AZ-777 sub-section describing the operator_pre_flight_setup behaviour against the real satellite-provider.
|
||||
- `_docs/00_problem/input_data/flight_derkachi/README.md`: add a Derkachi C6 section pointing at the seed script + bbox config.
|
||||
- `_docs/02_document/architecture.md`: append a sub-section to the existing satellite-provider entry noting that the Jetson e2e harness consumes the real .NET service (AZ-688 / AZ-691 / AZ-692 prior art; AZ-777 closes the C11 contract gap and wires the e2e-runner client). Tier-1 status updated to "deprecated 2026-05-20".
|
||||
|
||||
### Excluded
|
||||
|
||||
- ZERO modifications to `../satellite-provider/`. If a parent-suite gap surfaces beyond C11 adapting to existing endpoints (e.g., inventory response missing fields C11 needs, region-onboarding endpoint rejects the Derkachi payload shape), STOP and file a parent-suite ticket.
|
||||
- `docker-compose.test.yml` (Tier-1) — OUT OF SCOPE (deprecated 2026-05-20).
|
||||
- Cross-compile / arm64 follow-up — **CLOSED**: `mcr.microsoft.com/dotnet/aspnet:10.0` has an arm64 manifest (verified 2026-05-21 via `docker manifest inspect`). No follow-up ticket needed.
|
||||
- `mock-sat` retention — **CLOSED**: already retired from Jetson compose; D-PROJ-2 / `POST /api/satellite/upload` has shipped on the real satellite-provider (`Program.cs:211`).
|
||||
- Switching C2 default backbone away from `net_vlad` — out of scope.
|
||||
- Persisting populated C6 to git/LFS — named-volume approach unchanged.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: satellite-provider healthy in Jetson compose**
|
||||
Given the existing `satellite-provider` + `satellite-provider-postgres` services in `docker-compose.test.jetson.yml`
|
||||
When `docker compose -f docker-compose.test.jetson.yml up satellite-provider` is invoked
|
||||
Then both services build, the satellite-provider becomes healthy via TCP probe on port 8080 (per existing healthcheck), and is reachable from any compose-network service via DNS `satellite-provider:8080`
|
||||
|
||||
**AC-2: C11 contract aligns with satellite-provider's actual API**
|
||||
Given the adapted C11 `_LIST_PATH=/api/satellite/tiles/inventory` (POST) and `_GET_PATH=/tiles/{z}/{x}/{y}` (GET) against the running satellite-provider
|
||||
When `tests/e2e/satellite_provider/test_smoke.py` runs `HttpTileDownloader.download_for_bbox` for a 1-tile bbox in the Derkachi region (seeded)
|
||||
Then the inventory POST returns 200 with the documented schema, the tile fetch returns non-empty JPEG bytes, and C6's tile store + Postgres metadata both reflect the tile (freshness label `fresh`)
|
||||
|
||||
**AC-3: operator_pre_flight_setup drives the production pipeline**
|
||||
Given the running satellite-provider with Derkachi tiles seeded
|
||||
When `tests/e2e/replay/conftest.py::operator_pre_flight_setup` runs
|
||||
Then adapted C11 downloads the Derkachi-bbox tiles into C6, C10 `DescriptorBatcher` builds the FAISS HNSW index using the NetVLAD backbone, the three sidecar files (`.index` + `.sha256` + `.meta.json`) pass the AZ-306 triple-consistency check, and the fixture yields a `PopulatedC6Cache` with all three artifact paths populated
|
||||
|
||||
**AC-4: Derkachi AC-3 test un-xfails on Tier-2**
|
||||
Given AZ-776 landed + the populated C6 from AC-3 mounted into the e2e-runner + `c5_state.strategy = gtsam_isam2` + `c4_pose.enabled = True`
|
||||
When `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks` runs on Tier-2 Jetson
|
||||
Then it un-xfails, the test passes (≥ 80 % of ticks within 100 m of ground truth), and the per-frame loop emits `replay.satellite_anchor_inserted` log lines (not `satellite_anchoring_not_wired`)
|
||||
|
||||
**AC-5: AZ-699 verdict report is produced**
|
||||
Given AZ-776 landed + the populated C6 from AC-3 + the real flight video + factory calibration
|
||||
When `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report` runs on Tier-2 Jetson
|
||||
Then it un-xfails, the test runs to completion within the 15-min NFR budget, and `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` records the horizontal-error distribution with the honest PASS/FAIL verdict against the ≥ 80 % within 100 m gate (PASS not required for the AC; HONEST report required)
|
||||
|
||||
**AC-6: Documentation captures the new architecture seam**
|
||||
Given the updated replay protocol doc + Derkachi fixture README + architecture sub-section
|
||||
When a new contributor reads them
|
||||
Then they understand (i) why the real satellite-provider runs in the Jetson e2e harness, (ii) the C11 contract used against satellite-provider (inventory + slippy-map), (iii) how to re-seed the Derkachi catalog, (iv) what license attribution operators must propagate, and (v) why Tier-1 is deprecated
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- `operator_pre_flight_setup` completes in ≤ 5 minutes on first invocation (cold cache), ≤ 30 seconds on subsequent invocations within the same docker-compose session (warm cache via named volume).
|
||||
- C11 inventory POST + per-tile GET round-trips MUST stay within the existing C11 retry/backoff schedule (`_DEFAULT_BACKOFF_SCHEDULE_S = (1, 2, 4, 8)`). No new retry budget.
|
||||
|
||||
**Compatibility**
|
||||
- Tile on-disk layout `{zoom}/{x}/{y}.jpg` MUST be byte-equivalent to satellite-provider's layout (architecture principle #5) — automatic via C6 write path.
|
||||
- FAISS index format MUST be loadable by the airborne `c6_descriptor_index.FaissDescriptorIndex.from_config` impl without code changes — automatic via C6 write path.
|
||||
- C11 inventory POST schema MUST match `tile-inventory.md` v1.0.0 (AZ-505). Schema mismatch is a parent-suite bug; this task adapts C11 to the documented v1.0.0 contract, no further patches.
|
||||
|
||||
**Reliability**
|
||||
- The smoke test (AC-2) MUST fail loud if satellite-provider is unreachable, returns malformed responses, rate-limits, or returns 401/403 (auth failure) — no silent skip.
|
||||
- `operator_pre_flight_setup` MUST clean up partial cache state on failure (no half-built FAISS index left).
|
||||
- SHA-256 content-hash gate on the FAISS index (per D-C10-3) verified at every fixture yield — mismatch raises `IndexUnavailableError`.
|
||||
|
||||
**Security**
|
||||
- `SATELLITE_PROVIDER_TLS_INSECURE=1` is a **development-only** override. Documented in `.env.test.example` + the smoke test + the architecture sub-section. Production deploys MUST validate against a real CA-issued cert.
|
||||
- `SATELLITE_PROVIDER_API_KEY` sourced from `.env.test`; never committed; same `.gitignore` pattern as `JWT_SECRET`.
|
||||
- C11 download goes through the production Bearer-token auth path (`Authorization: Bearer ${SATELLITE_PROVIDER_API_KEY}`) — no auth bypass.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|--------------|------------------|
|
||||
| AC-1 | `docker-compose.test.jetson.yml` lints; e2e-runner depends_on satellite-provider | `docker compose -f docker-compose.test.jetson.yml config` exits 0 |
|
||||
| AC-2 | C11 `_do_enumerate` against a stubbed POST `/api/satellite/tiles/inventory` response | Returns `list[TileSummary]` with correct field mapping |
|
||||
| AC-2 | C11 `_download_one_tile` against a stubbed GET `/tiles/{z}/{x}/{y}` response | Writes tile bytes + sha256 to C6 adapter |
|
||||
| AC-3 | `operator_pre_flight_setup` fixture yields a `PopulatedC6Cache` with non-empty tile store + FAISS index | All three sidecar files exist + sha256 triple-consistency holds |
|
||||
| AC-3 | Sidecar SHA-256 coherence check inside the fixture | `IndexUnavailableError` raised when one of the three files is tampered |
|
||||
| AC-6 | Fixture README documents the seed invocation | Invocation string + license attribution greps cleanly |
|
||||
|
||||
## Blackbox Tests
|
||||
|
||||
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|
||||
|--------|------------------------|--------------|-------------------|----------------|
|
||||
| AC-1 | Jetson compose | `docker compose up satellite-provider` | Both services come up healthy in ≤ 60 s | Perf |
|
||||
| AC-2 | Real satellite-provider running + 1-tile-bbox query | C11 adapted HttpTileDownloader against the live service | Tile arrives in C6 + metadata row inserted + freshness=fresh | Reliability |
|
||||
| AC-3 | Seeded Derkachi catalog + e2e-runner | `operator_pre_flight_setup` cold + warm invocation | Cold ≤ 5 min, warm ≤ 30 s, all three sidecar files coherent | Perf |
|
||||
| AC-4 | AZ-776 landed + populated C6 mounted + full-GTSAM YAML | `test_ac3_within_100m_80pct_of_ticks` un-xfailed on Tier-2 Jetson | Test passes (≥ 80 % within 100 m); `satellite_anchor_inserted` log lines visible | Perf, Compat |
|
||||
| AC-5 | AZ-776 landed + populated C6 mounted + real flight video + factory calibration | `test_az699_real_flight_validation_emits_verdict_and_report` un-xfailed | Test completes ≤ 15 min, verdict report written to `_docs/06_metrics/` | Perf |
|
||||
|
||||
## Constraints
|
||||
|
||||
- ZERO modifications to files under `../satellite-provider/` (sibling repo). If a parent-suite gap is discovered, STOP and file a parent-suite ticket.
|
||||
- Per replay protocol Invariant 5: ZERO outbound network from the e2e-runner once the cache is populated. The cache-population phase needs network (satellite-provider downloads from CARTO upstream); the airborne replay run is internal-network-only.
|
||||
- Imagery source: **Google Maps satellite layer** (`lyrs=s`), governed by Google Maps Platform Terms of Service. Originally specced as CC-BY-licensed (CARTO Voyager); amended 2026-05-22 after probe revealed Google Maps is the actual upstream. License attribution string ("Imagery © Google") recorded in the seeded catalog's metadata. Dev/research use only; production deploy requires (a) Google Maps Platform licensing review for offline-cache use, OR (b) parent-suite ticket to add a true CC-BY satellite imagery provider to satellite-provider (Esri World Imagery, Mapbox satellite, Sentinel-2, etc.).
|
||||
- The seeded Derkachi catalog size budget is 100 MB on the satellite-provider DB side. Over budget → reduce zoom-level coverage; document in `bbox.yaml`.
|
||||
- Tier-1 (`docker-compose.test.yml`) is deprecated and MUST NOT be modified by this task.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: C11 inventory response field names drift further from `tile-inventory.md` v1.0.0**
|
||||
- *Risk*: Even after fixing `_LIST_PATH` + `_GET_PATH`, the response object fields (`tile_id`, `produced_at`, `resolution_m_per_px`, `estimated_bytes`, etc.) may not match the inventory response's actual field names; or the inventory response may not include all the fields C11's `TileSummary` requires.
|
||||
- *Mitigation*: Phase 1 verifies field mapping against `tile-inventory.md` v1.0.0 + `Program.cs::GetTilesInventory` source. Per-field renames are a gps-denied-onboard side concern (C11 adapter); only fields entirely missing from the inventory response warrant a parent-suite ticket.
|
||||
|
||||
**Risk 2: Self-signed cert CN/SAN doesn't include `satellite-provider` hostname**
|
||||
- *Risk*: The dev cert at `../satellite-provider/certs/api.pfx` may be issued for `localhost` only; via compose DNS `satellite-provider:8080` it would fail SSL verification.
|
||||
- *Mitigation*: Phase 1 introduces `SATELLITE_PROVIDER_TLS_INSECURE=1` env knob — accepted as a **development-only** workaround with prominent warnings in `.env.test.example`, the smoke test, and the architecture doc. Production deploys MUST set this to `0` (default) and use a real cert. Regenerating the dev cert with the right SAN is the cleaner long-term fix but lives on the parent-suite side; file a follow-up ticket if the workaround feels brittle.
|
||||
|
||||
**Risk 3: ~~satellite-provider doesn't build on arm64~~ — CLOSED 2026-05-21**
|
||||
- `mcr.microsoft.com/dotnet/aspnet:10.0` multi-arch manifest verified via `docker manifest inspect`: arm64, amd64, arm/v7 all present. No follow-up needed.
|
||||
|
||||
**Risk 4: ~~CARTO Voyager basemap residual is too coarse for AC-4~~ — REDEFINED 2026-05-22**
|
||||
- *Original concern*: CC-BY basemap is OSM-derived (street-level features, not satellite features). NetVLAD descriptors may not lock against nadir camera frames well enough for ≥ 80 % within 100 m.
|
||||
- *Probe-verified reality (2026-05-22)*: The actual upstream is **Google Maps satellite layer** (`lyrs=s`), which IS high-resolution overhead imagery from genuine satellite/aerial sources. NetVLAD descriptor lock should be strong against nadir camera frames. The original CARTO-coarseness risk is mitigated by the reality.
|
||||
- *New risk (replacing it)*: **Google Maps Platform Terms of Service may restrict offline-tile storage** for the C6-style use case (long-lived cache of stored tiles serving as a VPR reference dataset). Acceptable for dev/research; production deployment requires licensing review or a CC-BY-source migration on the satellite-provider side. Surfaced explicitly in `bbox.yaml`, `README.md`, and the architecture doc sub-section.
|
||||
- *Mitigation*: AC-5 (AZ-699 verdict report) still serves as the honest signal regardless of imagery quality. If VPR locks well, AC-4 passes; if it doesn't, the verdict report records the actual horizontal-error distribution and points to a follow-up (e.g., higher-zoom seeding, different descriptor backbone, or migrating to a CC-BY satellite source for both licensing AND quality reasons).
|
||||
|
||||
**Risk 5: Single-ticket 8-pt complexity exceeds the standard PBI cap**
|
||||
- *Risk*: Above the 5-pt cap stated in the project's PBI complexity rule.
|
||||
- *Mitigation*: The five phases are explicit STOP-gates. If Phase 1 (wiring + C11 adaptation) fails for reasons outside this ticket's scope (e.g., parent-suite contract drift beyond field renames, cert hostname issue requiring parent-suite regen), the implementer STOPS at the phase boundary, files the parent-suite ticket, and proposes a split into smaller follow-up tickets. The "single ticket" property holds as long as work proceeds linearly; if any phase grinds, decomposition is the escape hatch.
|
||||
|
||||
### ADR Impact
|
||||
|
||||
> Affects ADR-002 (build-time exclusion): unchanged — C11 is already operator-side-only via process-level isolation (architecture Principle #4 + ADR-004); this task adapts C11's contract but does not change its build-time isolation.
|
||||
> Affects ADR-011 (replay is a configuration): unchanged — the per-frame loop is mode-agnostic; this task closes the gap between the live and replay paths' upstream tile source.
|
||||
> Implements architecture principle #5 (satellite-provider on-disk layout) end-to-end against a real flight for the first time.
|
||||
> No new ADR — the architectural decision is "adapt C11 to the existing satellite-provider contract and wire the e2e harness against the real service", which is execution of existing decisions, not a new one.
|
||||
@@ -0,0 +1,85 @@
|
||||
# operator_pre_flight_setup real fixture (AZ-835 C3)
|
||||
|
||||
**Task**: AZ-839_operator_pre_flight_setup_real_fixture
|
||||
**Name**: operator_pre_flight_setup fixture: wire C1+C2+C11+C10 into real fixture, supersede AZ-777 Phase 3 (AZ-835 C3)
|
||||
**Description**: Third building block of Epic AZ-835. Replace the placeholder `operator_pre_flight_setup` fixture (currently a `mkdir` stub at `tests/e2e/replay/conftest.py` lines 293-310) with a real driver that wires C1 (AZ-836) + C2 (AZ-838) + C11 (AZ-777 Phase 1) + C10 to populate C6 from a tlog-derived route. Supersedes AZ-777 Phase 3 (the bbox-seeded placeholder-replacement design) per the 2026-05-22 user directive — route-driven seeding is ~100x more tile-efficient and pre-commits to where the operator did fly per the tlog.
|
||||
**Complexity**: 5 SP
|
||||
**Dependencies**: AZ-836 (C1, RouteSpec + extractor — In Testing); AZ-838 (C2, SatelliteProviderRouteClient + seed_route.py CLI — In Testing); AZ-777 Phase 1 (e2e-runner ↔ satellite-provider wire + C11 contract adaptation — done, batch 104); AZ-322 (C10 DescriptorBatcher — done); AZ-316+AZ-777 Phase 1 (C11 HttpTileDownloader.download_for_bbox — done); AZ-306 (FAISS sidecar triple-consistency — done); AZ-835 (parent Epic)
|
||||
**Component**: `tests/e2e/replay/conftest.py` (`operator_pre_flight_setup` fixture rewrite + new `PopulatedC6Cache` dataclass)
|
||||
**Tracker**: AZ-839 (https://denyspopov.atlassian.net/browse/AZ-839)
|
||||
**Parent Epic**: AZ-835
|
||||
|
||||
Jira AZ-839 is the authoritative spec; this file is the in-workspace mirror.
|
||||
|
||||
## Public surface
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
from gps_denied_onboard.replay_input.tlog_route import RouteSpec # AZ-836
|
||||
|
||||
@dataclass(frozen=True, slots=True)
|
||||
class PopulatedC6Cache:
|
||||
cache_root: Path # named-volume mount inside the e2e-runner container
|
||||
tile_store_path: Path # postgres + filesystem store root
|
||||
faiss_index_path: Path # .index file
|
||||
faiss_sidecar_sha256_path: Path # .sha256 file
|
||||
faiss_sidecar_meta_path: Path # .meta.json file
|
||||
route_spec: RouteSpec # provenance — which tlog/route produced this cache
|
||||
tile_count: int # how many tiles ended up in C6
|
||||
elapsed_seconds: float # wall time, for the AC-1/AC-2 perf budget
|
||||
```
|
||||
|
||||
The fixture remains a pytest fixture at `tests/e2e/replay/conftest.py::operator_pre_flight_setup`, same `session` scope as today. Input contract unchanged (same args the placeholder takes) plus a new dependency on `RouteSpec` — either fixture-injected or extracted from the test's tlog parameter via `extract_route_from_tlog`.
|
||||
|
||||
## Behaviour
|
||||
|
||||
1. Read the route spec (fixture-injected or extracted from test tlog via `extract_route_from_tlog`).
|
||||
2. Instantiate `SatelliteProviderRouteClient` from env (`SATELLITE_PROVIDER_URL`, `SATELLITE_PROVIDER_API_KEY`, `SATELLITE_PROVIDER_TLS_INSECURE`).
|
||||
3. Call `seed_route(route_spec)`. On `RouteValidationError` / `RouteTerminalFailureError` → re-raise with original cause. On `RouteTransientError` → retry up to 3 attempts using C11's `_DEFAULT_BACKOFF_SCHEDULE_S = (1, 2, 4, 8)`.
|
||||
4. Enumerate tile coverage locally (mirror `route_client._enumerate_route_tile_coords` from AZ-838); call C11 `HttpTileDownloader.download_for_bbox` to pull every tile into C6.
|
||||
5. Invoke C10 `DescriptorBatcher` against the populated C6 to build the FAISS HNSW index using the NetVLAD backbone (per `c2_vpr/config.py:67` default).
|
||||
6. Verify sidecar coherence (`.index` + `.sha256` + `.meta.json` triple-consistency per AZ-306). Mismatch → `IndexUnavailableError`.
|
||||
7. Yield `PopulatedC6Cache(...)`. On any failure path, clean up partial cache state (no half-built FAISS index left behind).
|
||||
|
||||
**Mount strategy**: write into a named docker volume that survives across pytest sessions. Cold first invocation populates; subsequent invocations within the same compose session reuse (warm cache). Same pattern AZ-777 Phase 3 originally specced; only the cache **source** changes (route, not bbox).
|
||||
|
||||
## Acceptance criteria
|
||||
|
||||
| # | Criterion |
|
||||
|---|-----------|
|
||||
| AC-1 | Cold first invocation on the Derkachi tlog completes in ≤ 5 min on Tier-2 Jetson (includes satellite-provider Google Maps round-trips). |
|
||||
| AC-2 | Warm invocation within the same compose session completes in ≤ 30 s (named-volume reuse). |
|
||||
| AC-3 | Yielded `PopulatedC6Cache` has all paths populated; `tile_count > 0`; FAISS sidecar triple-consistency passes (AZ-306). |
|
||||
| AC-4 | `RouteValidationError` / `RouteTerminalFailureError` from `seed_route` is re-raised with original cause; no silent swallow. |
|
||||
| AC-5 | `RouteTransientError` is retried up to 3 attempts using C11's existing backoff schedule; final attempt's exception is propagated. |
|
||||
| AC-6 | Tamper test — corrupt one of the three sidecar files between fixture runs; next invocation raises `IndexUnavailableError`. |
|
||||
| AC-7 | On any failure path inside the fixture, partial state is cleaned up (no half-built FAISS index, no orphaned postgres rows). |
|
||||
| AC-8 | Unit tests (stubbed `SatelliteProviderRouteClient` + stubbed C11 + stubbed C10) cover: happy path, transient-retry, terminal-failure, validation-error, tamper-detection, cleanup-on-failure. |
|
||||
| AC-9 | Integration test gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2` against the Jetson harness produces a real `PopulatedC6Cache` from the Derkachi tlog. |
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Driving the airborne replay pipeline against the populated cache (AZ-840 / C4).
|
||||
- Un-xfailing the existing AZ-777 AC-4 / AC-5 tests (AZ-841 / C5).
|
||||
- Updating `replay_protocol.md` (AZ-842 / C6).
|
||||
- Switching the C2 default backbone away from NetVLAD.
|
||||
- Multi-tlog aggregate caches (one route per fixture invocation).
|
||||
|
||||
## Risks
|
||||
|
||||
**Risk 1 — Docker named-volume lifecycle across pytest sessions.** First invocation may leave half-populated volume on crash; the cleanup-on-failure path in step 7 must be robust. Mitigation: AC-7 covers explicitly + a `try/finally` around the four wiring steps.
|
||||
|
||||
**Risk 2 — Cold-start budget (AC-1, 5 min) tight on first Jetson run.** Google Maps round-trips for ~50-100 tiles may exceed budget on slow networks. Mitigation: instrument elapsed_seconds on every step and surface in the verdict report; if AC-1 fails, file a perf-tuning ticket rather than skipping the AC.
|
||||
|
||||
## References
|
||||
|
||||
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
|
||||
- Existing placeholder: `tests/e2e/replay/conftest.py` lines 293-310
|
||||
- C1: AZ-836 (`extract_route_from_tlog`) — https://denyspopov.atlassian.net/browse/AZ-836
|
||||
- C2: AZ-838 (`SatelliteProviderRouteClient`) — https://denyspopov.atlassian.net/browse/AZ-838
|
||||
- AZ-777 (Phase 3+ superseded): `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md`
|
||||
- C10 DescriptorBatcher: `src/gps_denied_onboard/components/c10_provisioning/descriptor_batcher.py`
|
||||
- C11 HttpTileDownloader: `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py`
|
||||
- AZ-306 FAISS sidecar triple-consistency reference
|
||||
@@ -0,0 +1,75 @@
|
||||
# E2E orchestrator test (AZ-835 C4)
|
||||
|
||||
**Task**: AZ-840_e2e_orchestrator_test
|
||||
**Name**: E2E orchestrator test ingesting raw (tlog, video, calibration) and running steps 1-7 (AZ-835 C4)
|
||||
**Description**: Fourth building block of Epic AZ-835. A single pytest test that takes only `(tlog, video, calibration)` and runs the full 7-step pipeline end-to-end on the Jetson harness — without any operator hand-curation between steps. Extends or wraps the existing AZ-699 verdict test (`tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`) so the verdict-report-writing path is preserved. This is the test that closes the Epic's narrative: "give it a tlog + video, and the system does everything else."
|
||||
**Complexity**: 3 SP
|
||||
**Dependencies**: AZ-839 (C3, `operator_pre_flight_setup` real fixture — HARD); AZ-836 (C1, RouteSpec — In Testing); AZ-838 (C2, SatelliteProviderRouteClient — In Testing); AZ-699 (real flight validation runner — done); AZ-405 (tlog/video auto-sync — done); AZ-702 (camera factory-sheet calibration — done); AZ-696 (≥ 80 % within 100 m threshold — done); AZ-835 (parent Epic)
|
||||
**Component**: `tests/e2e/replay/test_az835_e2e_real_flight.py` (new) OR extend `test_derkachi_real_tlog.py`
|
||||
**Tracker**: AZ-840 (https://denyspopov.atlassian.net/browse/AZ-840)
|
||||
**Parent Epic**: AZ-835
|
||||
|
||||
Jira AZ-840 is the authoritative spec; this file is the in-workspace mirror.
|
||||
|
||||
## Inputs (test parameters)
|
||||
|
||||
- `tlog_path: Path` — raw ArduPilot tlog binary (Derkachi as the reference fixture; parametrize for future tlogs).
|
||||
- `video_path: Path` — raw flight video.
|
||||
- `calibration_path: Path` — camera factory-sheet calibration (AZ-702).
|
||||
|
||||
## Pipeline orchestration
|
||||
|
||||
The 7 steps from the Epic:
|
||||
|
||||
1. **Active flight cut + tlog/video sync** — call AZ-405's `tlog_video_adapter`. If active-segment detection needs a small extension, file as an in-scope sub-fix; if it needs a meaningful new feature, STOP and propose a sibling ticket.
|
||||
2. **On-fly frame + IMU extraction** — `VideoFileFrameSource` + `TlogReplayFcAdapter`. No change.
|
||||
3. **Auto-create route** — call AZ-836's `extract_route_from_tlog(tlog, max_waypoints=10)`. Assert the returned `RouteSpec` materially follows the tlog trajectory.
|
||||
4. **POST route to satellite-provider** — delegate to AZ-839 (C3) fixture `operator_pre_flight_setup` (which itself calls AZ-838's `SatelliteProviderRouteClient.seed_route`). The fixture's `PopulatedC6Cache` is the dependency boundary.
|
||||
5. **Build FAISS index** — driven by C3 fixture as part of populating the cache.
|
||||
6. **Run gps-denied airborne pipeline** — invoke the `gps-denied-replay` console-script or equivalent direct-call entry point against the populated cache + tlog/video/calibration. Reuse the airborne composition root path AZ-699 exercises today.
|
||||
7. **Get GPS fixes, check vs tlog GPS** — call `helpers/accuracy_report.py` + `helpers/gps_compare.py` to compute the horizontal-error distribution and emit the verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
|
||||
|
||||
## Test gating
|
||||
|
||||
- `@pytest.mark.tier2`.
|
||||
- Skip-unless-env(`RUN_REPLAY_E2E=1`) with an explicit skip reason that names the missing env var — no silent skip.
|
||||
|
||||
## Verdict report
|
||||
|
||||
Emit ALWAYS, even on FAIL. The success criterion for AC-1 is that the report exists and the distribution is honest — NOT that the verdict is PASS.
|
||||
|
||||
## Acceptance criteria
|
||||
|
||||
| # | Criterion |
|
||||
|---|-----------|
|
||||
| AC-1 | Test takes only `(tlog, video, calibration)` and runs steps 1-7 end-to-end on Tier-2 Jetson. No operator hand-curation between steps. |
|
||||
| AC-2 | Test produces the AZ-699 verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the honest horizontal-error distribution, REGARDLESS of PASS/FAIL on the AZ-696 AC-3 threshold (≥ 80 % within 100 m). |
|
||||
| AC-3 | Test reuses the C3 fixture's `operator_pre_flight_setup` for steps 3-5; no duplicate seeding/downloading logic. |
|
||||
| AC-4 | Test runs to completion within 15 min wall time on the Derkachi clip (soft target for first delivery; hard NFR set after first measurement is recorded in the report). |
|
||||
| AC-5 | Mid-pipeline failure (e.g. step 4 satellite-provider rejection, step 5 FAISS sidecar mismatch) fails LOUD with a clear error pointing at the failing step. No silent skip past a failing step. |
|
||||
| AC-6 | Test is gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`; explicit skip reason names the missing env var. |
|
||||
| AC-7 | The existing AZ-699 verdict test continues to pass (this test does not break or supersede it; either it lives alongside, or AZ-699 is folded into this test with the verdict-writing path preserved). |
|
||||
| AC-8 | Unit tests cover the orchestration helper layer (parameter validation, error propagation between steps). The end-to-end happy path is the Jetson integration test. |
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Un-xfailing the AZ-777 AC-4 / AC-5 tests (AZ-841 / C5).
|
||||
- Documentation updates beyond the test file's own docstring (AZ-842 / C6).
|
||||
- Real-time tlog ingestion (one finished `.tlog` per test invocation).
|
||||
- Multi-flight aggregate validation.
|
||||
- Performance optimization beyond the AC-4 soft target.
|
||||
- Modifying the airborne composition root.
|
||||
|
||||
## Risks
|
||||
|
||||
**Risk 1 — Integration glue between AZ-405 tlog/video sync and the airborne pipeline's frame-source contract.** The auto-sync adapter and the airborne composition root were authored in different cycles; small impedance mismatches are likely. Mitigation: if the glue exceeds the 3 SP budget, STOP and propose a sub-ticket rather than expanding scope.
|
||||
|
||||
**Risk 2 — Step 1 active-segment detection may need extension.** AZ-405 covered tlog↔video sync; take-off/landing boundary detection may not be implemented. Mitigation: file an in-scope sub-fix if small; STOP and propose a sibling ticket if not.
|
||||
|
||||
## References
|
||||
|
||||
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
|
||||
- Hard dep (C3 fixture): AZ-839 — https://denyspopov.atlassian.net/browse/AZ-839
|
||||
- Existing verdict test: `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`
|
||||
- Tlog/video adapter: `src/gps_denied_onboard/replay_input/tlog_video_adapter.py` (AZ-405)
|
||||
- Helpers: `src/gps_denied_onboard/helpers/accuracy_report.py`, `src/gps_denied_onboard/helpers/gps_compare.py`
|
||||
@@ -0,0 +1,57 @@
|
||||
# Un-xfail AZ-777 AC-4 + AC-5 Tier-2 tests (AZ-835 C5)
|
||||
|
||||
**Task**: AZ-841_unxfail_az777_tier2_tests
|
||||
**Name**: Un-xfail AZ-777 AC-4 + AC-5 Tier-2 tests once C3 fixture + C4 orchestrator land (AZ-835 C5)
|
||||
**Description**: Fifth building block of Epic AZ-835. Once C3 (AZ-839, `operator_pre_flight_setup` real fixture) and C4 (AZ-840, e2e orchestrator test) land, remove the `@pytest.mark.xfail` markers from the AZ-777 Tier-2 tests. The verdict — PASS or FAIL — becomes the honest signal. Both tests remain gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`.
|
||||
**Complexity**: 1 SP
|
||||
**Dependencies**: AZ-839 (C3, `operator_pre_flight_setup` real fixture — HARD); AZ-840 (C4, e2e orchestrator test — HARD); AZ-777 (being closed/superseded by this Epic; tests live in same file tree); AZ-835 (parent Epic)
|
||||
**Component**: `tests/e2e/replay/test_derkachi_1min.py` (xfail removal) + `tests/e2e/replay/test_derkachi_real_tlog.py` (xfail removal)
|
||||
**Tracker**: AZ-841 (https://denyspopov.atlassian.net/browse/AZ-841)
|
||||
**Parent Epic**: AZ-835
|
||||
|
||||
Jira AZ-841 is the authoritative spec; this file is the in-workspace mirror.
|
||||
|
||||
## Targets
|
||||
|
||||
1. `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks` (AZ-777 AC-4) — remove `@pytest.mark.xfail`; verify `@pytest.mark.tier2` + `RUN_REPLAY_E2E` gating stays in place.
|
||||
2. `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report` (AZ-777 AC-5) — remove `@pytest.mark.xfail`; verify gating stays in place.
|
||||
|
||||
## Verification
|
||||
|
||||
**On Tier-2 Jetson** (`RUN_REPLAY_E2E=1`):
|
||||
- `test_ac3_within_100m_80pct_of_ticks` PASSES (≥ 80 % of ticks within 100 m of ground truth, log lines `replay.satellite_anchor_inserted` visible).
|
||||
- `test_az699_real_flight_validation_emits_verdict_and_report` runs to completion within 15 min and emits `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the honest distribution. PASS preferred but NOT required for AC-4 — emitting the honest report IS the success criterion.
|
||||
|
||||
**Locally** (no env):
|
||||
- Both tests skip explicitly with a reason naming `RUN_REPLAY_E2E` — they MUST NOT pass as a side effect of being skipped.
|
||||
|
||||
## Acceptance criteria
|
||||
|
||||
| # | Criterion |
|
||||
|---|-----------|
|
||||
| AC-1 | `@pytest.mark.xfail` removed from both AZ-777 tests. |
|
||||
| AC-2 | Both tests still gated by `@pytest.mark.tier2` + skip-unless-env(`RUN_REPLAY_E2E=1`). Skip reason names the missing env. |
|
||||
| AC-3 | On Jetson with `RUN_REPLAY_E2E=1`, `test_ac3_within_100m_80pct_of_ticks` PASSES (≥ 80 % within 100 m). |
|
||||
| AC-4 | On Jetson with `RUN_REPLAY_E2E=1`, `test_az699_real_flight_validation_emits_verdict_and_report` completes within 15 min and emits the verdict report. PASS preferred but not required for AC-4. |
|
||||
| AC-5 | If either test FAILS on the metric (e.g. only 60 % within 100 m), the test reports FAIL honestly — no fallback to xfail or skip. Failure mode is a feature, not a bug. |
|
||||
| AC-6 | Locally on a machine without `RUN_REPLAY_E2E`, both tests skip with an explicit reason. |
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Modifying the airborne pipeline to improve metric performance (separate optimization tickets if AC-3 fails).
|
||||
- Adding new test cases (this ticket only removes xfail; new cases belong to other tickets).
|
||||
- Documentation updates (AZ-842 / C6).
|
||||
- Modifying the verdict thresholds (AZ-696).
|
||||
|
||||
## Risks
|
||||
|
||||
**Risk 1 — Un-xfailed tests may FAIL on the metric.** If horizontal-error distribution comes in worse than the 80 % @ 100 m gate, this test reports FAIL. That outcome is in-scope for AC-5 (report honestly) and out-of-scope for this ticket's fix (file a separate optimization ticket).
|
||||
|
||||
## References
|
||||
|
||||
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
|
||||
- Hard deps: AZ-839 (C3), AZ-840 (C4)
|
||||
- Tests: `tests/e2e/replay/test_derkachi_1min.py`, `tests/e2e/replay/test_derkachi_real_tlog.py`
|
||||
- AZ-777 spec: `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` (post-closure)
|
||||
- Threshold spec: AZ-696 (≥ 80 % within 100 m)
|
||||
- Verdict writer: `src/gps_denied_onboard/helpers/accuracy_report.py`
|
||||
@@ -0,0 +1,65 @@
|
||||
# Docs: replay_protocol.md + architecture.md + orchestrator-test README (AZ-835 C6)
|
||||
|
||||
**Task**: AZ-842_replay_protocol_and_orchestrator_docs
|
||||
**Name**: Docs: replay_protocol.md Invariant 12 + AZ-777 Phase 3+ superseded note + orchestrator-test README (AZ-835 C6)
|
||||
**Description**: Sixth and final building block of Epic AZ-835. Capture the route-driven flow in the authoritative documents so future implementers, operators, and reviewers understand what changed and why.
|
||||
**Complexity**: 2 SP
|
||||
**Dependencies**: AZ-841 (C5, un-xfail — SOFT; README describes test outcomes assuming C5 has landed); AZ-777 (being closed/superseded by this Epic — AZ-777 spec is updated during the AZ-777 closure step, verified by AC-6); AZ-835 (parent Epic)
|
||||
**Component**: `_docs/02_document/contracts/replay/replay_protocol.md` + `_docs/02_document/architecture.md` + `tests/e2e/replay/README*.md`
|
||||
**Tracker**: AZ-842 (https://denyspopov.atlassian.net/browse/AZ-842)
|
||||
**Parent Epic**: AZ-835
|
||||
|
||||
Jira AZ-842 is the authoritative spec; this file is the in-workspace mirror.
|
||||
|
||||
## Modified files
|
||||
|
||||
### 1. `_docs/02_document/contracts/replay/replay_protocol.md` — Invariant 12 extension
|
||||
|
||||
Extend **Invariant 12** with an AZ-835 sub-section describing:
|
||||
|
||||
- The route-driven `operator_pre_flight_setup` fixture (AZ-839 / C3) flow: tlog → `RouteSpec` → `POST /api/satellite/route` → tile download → FAISS build → yield `PopulatedC6Cache`.
|
||||
- Why route-driven supersedes the AZ-777 bbox approach (efficiency: ~100× fewer tiles; honesty: pre-commits to where the operator did fly).
|
||||
- The C3 fixture's failure-handling contract (validation/terminal → re-raise; transient → retry up to 3 attempts using C11's existing backoff schedule).
|
||||
|
||||
### 2. `_docs/02_document/architecture.md` — satellite-provider entry extension
|
||||
|
||||
Append a sub-section to the existing satellite-provider entry noting that Epic AZ-835 + its C1-C5 children landed the full e2e real-flight validation path on top of AZ-777 Phase 1's wire + C11 contract adaptation. Mark AZ-777 Phase 3+ as superseded by Epic AZ-835 (pointer-only — the AZ-777 spec itself is updated in C5's wake during the AZ-777 closure step).
|
||||
|
||||
### 3. `tests/e2e/replay/README*.md` — orchestrator-test README
|
||||
|
||||
Either extend `tests/e2e/replay/README.md` or create a dedicated `tests/e2e/replay/README_AZ835.md` (prefer dedicated file if the existing README is already long). Short operator-facing content:
|
||||
|
||||
- How to run the new orchestrator test locally (env vars, Jetson SSH alias, expected runtime).
|
||||
- What `(tlog, video, calibration)` triple to provide and where the reference Derkachi fixture lives.
|
||||
- Where the verdict report is written and how to interpret it (PASS/FAIL on AZ-696 AC-3 threshold).
|
||||
- Imagery-source caveat: Google Maps satellite (dev/research use only; production needs CC-BY migration on the satellite-provider side).
|
||||
|
||||
## Acceptance criteria
|
||||
|
||||
| # | Criterion |
|
||||
|---|-----------|
|
||||
| AC-1 | `replay_protocol.md` Invariant 12 has a new AZ-835 sub-section covering the route-driven flow, the bbox-supersedure rationale, and the failure-handling contract. |
|
||||
| AC-2 | `architecture.md` satellite-provider entry has a sub-section noting Epic AZ-835's contribution and pointing at AZ-777 Phase 3+ as superseded. |
|
||||
| AC-3 | `tests/e2e/replay/README*.md` exists and a new contributor can run the orchestrator test on Jetson using only the README's instructions (no out-of-band knowledge required). |
|
||||
| AC-4 | All three docs link to the Epic (AZ-835) and to the relevant child tickets (AZ-836 / AZ-838 / AZ-839 / AZ-840 / AZ-841). |
|
||||
| AC-5 | License attribution string ("Imagery © Google") and the dev-only caveat are present in the test README. |
|
||||
| AC-6 | Cross-references in `_docs/02_tasks/_dependencies_table.md` and `_docs/02_tasks/done/AZ-777*.md` (once moved) point at this Epic / its children. |
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Updating consumer-facing API/contract docs in `../satellite-provider/` (parent-suite owns those).
|
||||
- Migrating imagery source to a CC-BY provider (parent-suite, out of scope for this Epic).
|
||||
- Writing additional tutorials beyond the orchestrator-test README.
|
||||
- ADR creation — no new architectural decision; this Epic implements existing decisions.
|
||||
|
||||
## Risks
|
||||
|
||||
**Risk 1 — Scope creep into reformatting unrelated doc sections.** Resist; this ticket only adds what AC-1..AC-5 require.
|
||||
|
||||
## References
|
||||
|
||||
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
|
||||
- Replay protocol: `_docs/02_document/contracts/replay/replay_protocol.md` Invariant 12
|
||||
- Architecture: `_docs/02_document/architecture.md` (satellite-provider section)
|
||||
- Tests directory: `tests/e2e/replay/`
|
||||
- AZ-777 spec (being superseded): `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` (post-closure)
|
||||
Reference in New Issue
Block a user