mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 20:11:14 +00:00
[AZ-316] Implement C11 HttpTileDownloader (batch 40)
Lands the operator-side pre-flight download path: authenticated httpx GETs against satellite-provider, RESTRICT-SAT-4 (>= 0.5 m/px) enforcement at the C11 boundary, c6 writes via consumer-side cuts (_TileWriterLike, _BudgetEnforcerLike), per-(flight_id, request_hash) journal under cache_root/.c11/journal/ for idempotent re-runs (AC-8, AC-12), 429 Retry-After + 5xx exponential backoff handling, fail-fast on TLS / 401 / 403, and a redacted-bearer auth-header policy. Architecture: - AZ-507 cross-component rule held: tile_downloader.py imports zero c6 symbols; the composition-root _C6DownloadAdapter in runtime_root/c11_factory.py absorbs c6's TileMetadata / TileSource / FreshnessLabel / VotingStatus enum assembly. - Sleep-callable injection (not full Clock) per Batch 39 precedent; default routes through WallClock.sleep_until_ns to keep the AZ-398 invariant intact. - No FDR records on the download path; spec mandates structured logs only (8 log kinds wired: session.start/end, resolution_rejected, freshness_rejected_summary, freshness_downgraded, batch.retry, provider.failed, budget.exceeded, idempotent_no_op). Tests: 14 new downloader unit tests covering AC-1..AC-9, AC-11, AC-12 plus throughput NFR + 429 HTTP-date + 429 budget exhaustion; 2 new TileDownloader Protocol conformance tests (AC-10). Full unit suite: 1420 passed, 80 skipped (env-gated), 0 failed. Code review: PASS_WITH_WARNINGS (5 Low findings, all documentation or downstream-blocked). See _docs/03_implementation/reviews/ batch_40_review.md and batch_40_cycle1_report.md. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -1,229 +0,0 @@
|
||||
# C11 TileDownloader — GET + Resolution Gate + C6 Write
|
||||
|
||||
**Task**: AZ-316_c11_tile_downloader
|
||||
**Name**: C11 TileDownloader
|
||||
**Description**: Implement the `TileDownloader` Protocol — C11's operator-side download path. `download_tiles_for_area` issues authenticated `httpx` GETs against `satellite-provider`, enforces RESTRICT-SAT-4 (≥ 0.5 m/px) at the C11 boundary, writes accepted tiles via the AZ-303 `TileStore` + `TileMetadataStore` Protocols (which run AZ-307's freshness gate at write), pre-checks cache headroom against AZ-308's budget enforcer, and journals download progress for idempotent re-runs. `enumerate_remote_coverage` is the read-only enumeration helper used by C12 for pre-flight area sizing. Honours `Retry-After` on 429s, fails fast on TLS / auth errors, retries with backoff on 5xx. Surfaces freshness, resolution, downgrade, and outcome counts in `DownloadBatchReport`.
|
||||
**Complexity**: 5 points
|
||||
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-303_c6_storage_interfaces, AZ-305_c6_postgres_filesystem_store, AZ-307_c6_freshness_gate, AZ-308_c6_cache_budget_eviction
|
||||
**Component**: c11_tilemanager (epic AZ-251 / E-C11)
|
||||
**Tracker**: AZ-316
|
||||
**Epic**: AZ-251 (E-C11)
|
||||
|
||||
### Document Dependencies
|
||||
|
||||
- `_docs/02_document/contracts/c11_tilemanager/tile_downloader.md` — produced by this task (frozen Protocol + DTO shape, invariants, test cases).
|
||||
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md` — consumed: `write_tile`, `tile_exists` are the write-side endpoints used by this task.
|
||||
- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md` — consumed: `insert_metadata`, `query_by_bbox` for idempotence.
|
||||
- `_docs/02_document/contracts/shared_logging/log_record_schema.md` — INFO/WARN/ERROR log shapes for download events.
|
||||
- `_docs/02_document/components/12_c11_tilemanager/description.md` — § 3.1 download API, § 5 error handling, § 7 caveats (R02 enforcement).
|
||||
|
||||
## Problem
|
||||
|
||||
Without a real `TileDownloader`:
|
||||
|
||||
- F1 pre-flight cache build cannot start (C10 has no source; C12 has no operation to invoke).
|
||||
- AC-8.1 (imagery via Suite Sat Service offline cache) and AC-8.3 (pre-loaded onto companion before flight) collapse — both require C11 populating C6 first.
|
||||
- RESTRICT-SAT-4 (≥ 0.5 m/px) has no enforcement point — sub-spec tiles would silently land in C6 and downstream pose estimation would degrade.
|
||||
- AC-NEW-6 (system rejects/downgrades stale tiles) is unenforced for the download path; C6's gate (AZ-307) catches it but the operator has no visibility into how many were rejected per batch.
|
||||
- A re-run of an already-completed download would re-issue every GET against `satellite-provider`, wasting bandwidth and fighting the operator workflow which expects idempotence (D-C10-1 manifest hash check downstream relies on stable downloads).
|
||||
- Network failures (TLS, auth, 5xx, 429) without a structured handler would either silently drop tiles or crash the operator-tooling CLI mid-batch.
|
||||
|
||||
This task delivers the production downloader. It writes no C6 storage logic (AZ-303/305 own that), no freshness rule (AZ-307 owns), and no eviction (AZ-308 owns) — it composes them.
|
||||
|
||||
## Outcome
|
||||
|
||||
- A `TileDownloader` Protocol + concrete `HttpTileDownloader` class at `src/gps_denied_onboard/components/c11_tilemanager/`:
|
||||
- `interface.py` exposes `TileDownloader` Protocol (`runtime_checkable`).
|
||||
- `tile_downloader.py` houses `HttpTileDownloader`.
|
||||
- `_types.py` houses `DownloadRequest`, `DownloadBatchReport`, `TileSummary`, `DownloadOutcome` (all `@dataclass(frozen=True)` for the data DTOs; `DownloadOutcome` is a `StrEnum`).
|
||||
- `errors.py` houses `SatelliteProviderError`, `RateLimitedError`, `ResolutionRejectionError`, `CacheBudgetExceededError`, `TileManagerError` (family parent).
|
||||
- Constructor signature:
|
||||
`__init__(self, *, http_client: httpx.Client, tile_store: TileStore, tile_metadata_store: TileMetadataStore, budget_enforcer: CacheBudgetEnforcer, logger: Logger, clock: Clock)`. Injected dependencies — no module-level singletons.
|
||||
- `download_tiles_for_area(request)` flow:
|
||||
1. Validates the `DownloadRequest` (bbox ordering, zoom levels in `[0, 21]`, `cache_root` is writable).
|
||||
2. Looks up `cache_root/.c11/journal/<flight_id>__<request_hash>.json`. If present and complete → `outcome = idempotent_no_op`; return early with zero counts.
|
||||
3. Calls `enumerate_remote_coverage(bbox, zoom_levels)` to size the batch.
|
||||
4. Calls `budget_enforcer.reserve_headroom(estimated_bytes)`. If insufficient → `CacheBudgetExceededError`; `outcome = failure`.
|
||||
5. For each tile in the batch, fetches the `httpx` GET response (per-tile streaming to bound memory):
|
||||
- Auth: TLS + `Authorization: Bearer <service_api_key>`.
|
||||
- On 429 → honour `Retry-After` (sleep then retry once; second 429 → `RateLimitedError`).
|
||||
- On 5xx → exponential backoff with cap (1s, 2s, 4s; 4 retries max); persistent 5xx → `SatelliteProviderError`.
|
||||
- On TLS / 401 / 403 → fail fast → `SatelliteProviderError`.
|
||||
6. Resolution gate: read `resolution_m_per_px` from response headers / metadata; if `< 0.5` → increment `tiles_rejected_resolution`; do NOT write to C6.
|
||||
7. Calls `tile_store.write_tile(...)` and `tile_metadata_store.insert_metadata(...)`. C6's freshness gate (AZ-307) may raise `FreshnessRejectionError` here — catch, increment `tiles_rejected_freshness`, continue.
|
||||
8. On `FreshnessLabel.DOWNGRADED` (returned by C6 when stable_rear-stale) → increment `tiles_downgraded`.
|
||||
9. After every successful tile write, append the tile id to the journal (`fsync` per atomicwrites pattern from description.md).
|
||||
10. On batch completion: write the journal's terminal record, return `DownloadBatchReport(outcome=success)`.
|
||||
- `enumerate_remote_coverage(bbox, zoom_levels)` issues a `GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true` (matches the existing parent-suite GET surface; the response carries the per-tile `produced_at` + `resolution_m_per_px` + `estimated_bytes`); returns `list[TileSummary]`.
|
||||
- INFO log: `kind="c11.download.session.start"` / `kind="c11.download.session.end"` with batch counts.
|
||||
- WARN log: per retry, per freshness rejection batch, per resolution rejection.
|
||||
- ERROR log: persistent `SatelliteProviderError`, `CacheBudgetExceededError`, TLS/auth failures.
|
||||
- The composition root constructs `HttpTileDownloader` via `build_tile_downloader(config) -> TileDownloader` at `src/gps_denied_onboard/runtime_root/c11_factory.py`.
|
||||
- Configuration extension to AZ-269 loader: `config.c11.satellite_provider_url`, `config.c11.service_api_key`, `config.c11.cache_root`, `config.c11.http_timeout_s`, `config.c11.max_5xx_retries`.
|
||||
- Type-only conformance test verifies `isinstance(HttpTileDownloader(...), TileDownloader)` via `runtime_checkable`.
|
||||
- The contract file at `_docs/02_document/contracts/c11_tilemanager/tile_downloader.md` is frozen as part of this task; consumers (C12) read it, not this spec.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `TileDownloader` Protocol declaration + `HttpTileDownloader` concrete class.
|
||||
- All four DTOs (`DownloadRequest`, `DownloadBatchReport`, `TileSummary`) and the `DownloadOutcome` + `FreshnessLabel`-consumption enums.
|
||||
- The download-progress journal under `cache_root/.c11/journal/` (atomicwrites + fsync; one file per `(flight_id, request_hash)` pair).
|
||||
- The resolution gate at C11 boundary (RESTRICT-SAT-4).
|
||||
- The cache-headroom pre-check via AZ-308's `CacheBudgetEnforcer.reserve_headroom`.
|
||||
- HTTP retry / backoff / `Retry-After` handling.
|
||||
- Composition-root factory `build_tile_downloader`.
|
||||
- Config schema extension for the C11 download fields.
|
||||
- Conformance test at `tests/unit/c11_tilemanager/test_protocol_conformance.py`.
|
||||
|
||||
### Excluded
|
||||
|
||||
- The `TileUploader` Protocol and concrete impl — separate task in this epic.
|
||||
- The per-flight signing key — separate task; downloads use a static service-internal API key, NOT the per-flight key.
|
||||
- The flight-state gate — download has no `flight_state` precondition (it runs operator-side, never airborne). The gate task applies to upload only.
|
||||
- Idempotent-retry-on-partial-success — that's the upload concern; download idempotence is handled here via the journal.
|
||||
- The `mock-suite-sat-service` fixture — the e2e-test fixture in `tests/fixtures/` is for the upload path. Download tests run against the REAL `satellite-provider` GET surface (or its existing test Docker fixture).
|
||||
- C12 CLI plumbing — owned by E-C12.
|
||||
- Sector boundary CRUD (C12) — the `DownloadRequest.sector_class` is supplied per request; C11 does NOT look up sector boundaries.
|
||||
- The R02 ADR-004 build-time exclusion of `c11_tilemanager` from the airborne image — owned by E-BOOT (build-system task).
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: Downloader writes accepted tiles to C6**
|
||||
Given a `DownloadRequest` for a Derkachi-bbox with 100 tiles all fresh and ≥ 0.5 m/px
|
||||
When `download_tiles_for_area` is called
|
||||
Then `DownloadBatchReport.tiles_downloaded == 100`; every tile is present in `TileStore` (verifiable via `tile_exists`); every tile has a `TileMetadata` row; `outcome = success`
|
||||
|
||||
**AC-2: Resolution-gate rejects sub-spec tiles at C11 boundary**
|
||||
Given a batch of 50 tiles where 10 have `resolution_m_per_px = 0.3`
|
||||
When `download_tiles_for_area` is called
|
||||
Then `tiles_rejected_resolution == 10`; the 10 are NOT in `TileStore`; no `tile_store.write_tile` was attempted for them (verifiable via spy); ONE `c11.download.resolution_rejected` WARN log per rejected tile
|
||||
|
||||
**AC-3: Freshness rejections from C6 are counted, not propagated**
|
||||
Given C6's AZ-307 raises `FreshnessRejectionError` for 5 tiles in an active_conflict batch
|
||||
When `download_tiles_for_area` is called
|
||||
Then `tiles_rejected_freshness == 5`; the run continues for the remaining tiles; no exception escapes to the caller; ONE summary WARN log at session end
|
||||
|
||||
**AC-4: Stable_rear stale tiles are surfaced as downgraded**
|
||||
Given C6 returns `FreshnessLabel.DOWNGRADED` for 3 tiles (stable_rear stale)
|
||||
When `download_tiles_for_area` is called
|
||||
Then `tiles_downgraded == 3`; those tiles ARE present in `TileStore` with the `DOWNGRADED` label; no exception is raised
|
||||
|
||||
**AC-5: 429 honours Retry-After**
|
||||
Given `satellite-provider` returns 429 with `Retry-After: 30`
|
||||
When the downloader hits the 429
|
||||
Then the downloader sleeps ≥ 30s before the retry (verifiable via injected `Clock.sleep` capture); on success the run proceeds normally; no `RateLimitedError` is raised
|
||||
|
||||
**AC-6: Persistent 5xx aborts with structured error**
|
||||
Given `satellite-provider` returns 503 for 5 consecutive attempts (exceeds the retry budget)
|
||||
When the downloader exhausts retries
|
||||
Then `SatelliteProviderError` is raised with the response body, the URL, and the attempt count; `outcome = failure`; partial writes are journaled (the run is resumable on next call)
|
||||
|
||||
**AC-7: TLS / 401 / 403 fail fast (no retry)**
|
||||
Given the first GET returns 401
|
||||
When the downloader processes the response
|
||||
Then `SatelliteProviderError` is raised on the FIRST attempt (zero retries); no plaintext fallback is attempted; the API key is NOT logged (security)
|
||||
|
||||
**AC-8: Idempotent re-run after success**
|
||||
Given a successful prior `download_tiles_for_area(R)` whose journal recorded all 100 tiles
|
||||
When `download_tiles_for_area(R)` is called again
|
||||
Then `outcome = idempotent_no_op`; zero GETs observed; zero `tile_store.write_tile` calls; the report's counts are zero except `tiles_downloaded` which equals the journaled count (transparency for the caller)
|
||||
|
||||
**AC-9: Cache-budget pre-check aborts before any write**
|
||||
Given `budget_enforcer.reserve_headroom(estimated_bytes)` returns `EvictionResult.insufficient`
|
||||
When `download_tiles_for_area` is called
|
||||
Then `CacheBudgetExceededError` is raised with the budget delta; zero GETs are issued; zero writes attempted; ONE ERROR log
|
||||
|
||||
**AC-10: Conformance — concrete impl satisfies Protocol**
|
||||
Given a `HttpTileDownloader` instance
|
||||
When `isinstance(impl, TileDownloader)` is checked under `runtime_checkable`
|
||||
Then the result is `True`; a fake omitting `enumerate_remote_coverage` returns `False`
|
||||
|
||||
**AC-11: Service API key never logged**
|
||||
Given any log path (INFO, WARN, ERROR)
|
||||
When the downloader logs the request URL or headers
|
||||
Then the `service_api_key` value is redacted (`Bearer ***`); no test log capture observes the raw key
|
||||
|
||||
**AC-12: Journal survives mid-batch crash**
|
||||
Given the process is killed after 30 of 100 tiles have been written and journaled
|
||||
When `download_tiles_for_area(R)` is called on restart
|
||||
Then the journal is read; the 30 already-journaled tiles are skipped; only the remaining 70 are GET-fetched; final report shows `tiles_downloaded = 100` (30 prior + 70 new); `outcome = success`
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- Download throughput ≥ 50 MB/s on a 1 Gbps link (C11-PT-01); the bottleneck is the network, not the writer.
|
||||
- Per-tile resolution-gate check ≤ 50 µs (header parse + comparison).
|
||||
- Journal append ≤ 5 ms per tile (atomicwrites + fsync; bounded by disk).
|
||||
|
||||
**Compatibility**
|
||||
- `httpx` per project pin (verify before adding); no per-task version bump unless absolutely required.
|
||||
- `atomicwrites` for journal — already used elsewhere (per description.md § 5).
|
||||
|
||||
**Reliability**
|
||||
- The downloader MUST tolerate process kill at any point and recover via journal on restart (AC-12).
|
||||
- The downloader MUST NOT corrupt C6 — the AZ-303 Protocol guarantees atomic writes; this task does not add new fs operations.
|
||||
- The injected `Clock` MUST be the same singleton used by AZ-308's enforcer and AZ-307's gate.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|-------------|-----------------|
|
||||
| AC-1 | 100-tile happy path against fake `httpx.Client` | All in C6; `tiles_downloaded=100`; `outcome=success` |
|
||||
| AC-2 | Mixed batch, 10 sub-spec tiles | `tiles_rejected_resolution=10`; spy on `write_tile` shows 40 calls |
|
||||
| AC-3 | C6 raises `FreshnessRejectionError` 5x | `tiles_rejected_freshness=5`; run completes |
|
||||
| AC-4 | C6 returns DOWNGRADED for 3 | `tiles_downgraded=3`; tiles present with DOWNGRADED label |
|
||||
| AC-5 | 429 + `Retry-After: 30` | `Clock.sleep` captured ≥ 30s; success path resumes |
|
||||
| AC-6 | 5x 503 | `SatelliteProviderError` with attempt count |
|
||||
| AC-7 | 401 first attempt | `SatelliteProviderError` on first call; zero retries |
|
||||
| AC-8 | Re-run identical request post-success | `outcome=idempotent_no_op`; zero GETs |
|
||||
| AC-9 | `reserve_headroom` returns insufficient | `CacheBudgetExceededError`; zero GETs |
|
||||
| AC-10 | `isinstance` check on impl + on partial fake | True / False |
|
||||
| AC-11 | Log capture during full session | Header values redacted; no raw key in any record |
|
||||
| AC-12 | Kill mid-batch + restart | 30 prior journaled + 70 new = 100 final |
|
||||
| NFR-perf-throughput | 10 GB synthetic batch over loopback | ≥ 50 MB/s |
|
||||
|
||||
## Constraints
|
||||
|
||||
- The `service_api_key` is logged ONLY redacted (`Bearer ***`); the raw value MUST never appear in INFO/WARN/ERROR logs (security finding High in code-review otherwise).
|
||||
- The downloader runs operator-side ONLY; the airborne build target excludes the entire `c11_tilemanager/` source tree (E-BOOT enforces; this task does NOT add the exclusion but its tests assert via R02 that the module's import surface is honest).
|
||||
- The journal location is `cache_root/.c11/journal/<flight_id>__<request_hash>.json` — `request_hash` is `sha256(bbox|zoom_levels|sector_class|service_api_key_hash).hex()[:16]`.
|
||||
- The hot path uses `httpx.Client` (sync), not `httpx.AsyncClient` — the operator workflow is offline minutes; async adds complexity without a measurable win.
|
||||
- Per-tile streaming is mandatory; loading an entire batch into memory is rejected (a 10 GB Derkachi area would OOM on operator workstations).
|
||||
- This task introduces no new runtime dependencies beyond `httpx` and `atomicwrites` (verify both are in the project's `requirements.txt`).
|
||||
- The R02 ADR-004 exclusion is enforced at build time by E-BOOT; this task adds no runtime self-check (security tasks C11-ST-01/02/03 cover that in Step 9).
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: `Retry-After` header format ambiguity (HTTP-date vs. seconds)**
|
||||
- *Risk*: Some servers send `Retry-After: Wed, 21 Oct 2026 07:28:00 GMT`; naïve integer parsing crashes.
|
||||
- *Mitigation*: Parse both forms (`int` first, `email.utils.parsedate_to_datetime` fallback); cap the wait at `config.c11.max_retry_after_s` (default 300s) to prevent server-pinned hangs.
|
||||
|
||||
**Risk 2: API key leakage via exception messages**
|
||||
- *Risk*: An unhandled exception's repr includes the `httpx.Request` whose headers carry the raw `service_api_key`.
|
||||
- *Mitigation*: Wrap every `httpx` call in try/except → re-raise `SatelliteProviderError` with sanitised message; never include `httpx.Request.headers` in the error payload.
|
||||
|
||||
**Risk 3: Journal corruption on power-loss**
|
||||
- *Risk*: Process killed mid-`fsync` leaves a torn JSON line; restart fails to parse.
|
||||
- *Mitigation*: `atomicwrites` provides write-then-rename semantics; one journal record per file segment with a checksum; corrupt records are skipped on read with a WARN log and the affected tile is re-fetched. Documented behaviour.
|
||||
|
||||
**Risk 4: `satellite-provider`'s GET surface returns tiles in a format C11 doesn't expect**
|
||||
- *Risk*: The parent suite changes the response schema (new field, renamed metadata key); C11 silently misparses.
|
||||
- *Mitigation*: Strict pydantic-free DTO parsing — extra fields ignored, required fields → `SatelliteProviderError("response schema mismatch", field=...)`. Versioning for the GET API is owned by `satellite-provider`'s contract; this task pins to the current shape.
|
||||
|
||||
**Risk 5: Concurrent C11 invocations corrupt the journal**
|
||||
- *Risk*: Operator launches two `download_tiles_for_area` runs against the same `cache_root` simultaneously; both write to the same journal.
|
||||
- *Mitigation*: Per `description.md` § 7, C12 owns a filesystem lockfile that gates concurrent invocations. This task asserts the lockfile exists at construction (`cache_root/.c11/lock`) and refuses to start otherwise. The lockfile creation is C12's job.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: operator-side download from `satellite-provider` GET surface, RESTRICT-SAT-4 enforcement, AC-NEW-6 freshness counting, AC-8.1/8.3 cache provisioning input.
|
||||
- **Production code that must exist**: real `httpx.Client`-backed `HttpTileDownloader`, real journal write/read with atomicwrites, real resolution gate, real composition-root factory wiring AZ-303/305/307/308 dependencies, real config schema extension.
|
||||
- **Allowed external stubs**: tests MAY use a fake `httpx.Client` (transport-level) to script responses, fake `Clock` for deterministic retries, fake `TileStore`/`TileMetadataStore` (already provided by AZ-303's conformance fakes); production wiring uses real httpx + real Postgres-backed C6.
|
||||
- **Unacceptable substitutes**: a hardcoded "every tile passes" resolution gate (defeats RESTRICT-SAT-4); skipping the journal and re-issuing every GET (defeats I-2 idempotence); silently catching `FreshnessRejectionError` without counting (loses operator visibility for AC-NEW-6); using `requests` instead of `httpx` (project pinned to httpx; switching mid-component is an `Architecture` finding).
|
||||
|
||||
## Contract
|
||||
|
||||
This task produces/implements the contract at `_docs/02_document/contracts/c11_tilemanager/tile_downloader.md`.
|
||||
Consumers MUST read that file — not this task spec — to discover the interface.
|
||||
Reference in New Issue
Block a user