Files
gps-denied-onboard/_docs/02_tasks/todo/AZ-316_c11_tile_downloader.md
T
Oleksandr Bezdieniezhnykh 880eabcb3f Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 00:39:48 +03:00

19 KiB

C11 TileDownloader — GET + Resolution Gate + C6 Write

Task: AZ-316_c11_tile_downloader Name: C11 TileDownloader Description: Implement the TileDownloader Protocol — C11's operator-side download path. download_tiles_for_area issues authenticated httpx GETs against satellite-provider, enforces RESTRICT-SAT-4 (≥ 0.5 m/px) at the C11 boundary, writes accepted tiles via the AZ-303 TileStore + TileMetadataStore Protocols (which run AZ-307's freshness gate at write), pre-checks cache headroom against AZ-308's budget enforcer, and journals download progress for idempotent re-runs. enumerate_remote_coverage is the read-only enumeration helper used by C12 for pre-flight area sizing. Honours Retry-After on 429s, fails fast on TLS / auth errors, retries with backoff on 5xx. Surfaces freshness, resolution, downgrade, and outcome counts in DownloadBatchReport. Complexity: 5 points Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-303_c6_storage_interfaces, AZ-305_c6_postgres_filesystem_store, AZ-307_c6_freshness_gate, AZ-308_c6_cache_budget_eviction Component: c11_tilemanager (epic AZ-251 / E-C11) Tracker: AZ-316 Epic: AZ-251 (E-C11)

Document Dependencies

  • _docs/02_document/contracts/c11_tilemanager/tile_downloader.md — produced by this task (frozen Protocol + DTO shape, invariants, test cases).
  • _docs/02_document/contracts/c6_tile_cache/tile_store.md — consumed: write_tile, tile_exists are the write-side endpoints used by this task.
  • _docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md — consumed: insert_metadata, query_by_bbox for idempotence.
  • _docs/02_document/contracts/shared_logging/log_record_schema.md — INFO/WARN/ERROR log shapes for download events.
  • _docs/02_document/components/12_c11_tilemanager/description.md — § 3.1 download API, § 5 error handling, § 7 caveats (R02 enforcement).

Problem

Without a real TileDownloader:

  • F1 pre-flight cache build cannot start (C10 has no source; C12 has no operation to invoke).
  • AC-8.1 (imagery via Suite Sat Service offline cache) and AC-8.3 (pre-loaded onto companion before flight) collapse — both require C11 populating C6 first.
  • RESTRICT-SAT-4 (≥ 0.5 m/px) has no enforcement point — sub-spec tiles would silently land in C6 and downstream pose estimation would degrade.
  • AC-NEW-6 (system rejects/downgrades stale tiles) is unenforced for the download path; C6's gate (AZ-307) catches it but the operator has no visibility into how many were rejected per batch.
  • A re-run of an already-completed download would re-issue every GET against satellite-provider, wasting bandwidth and fighting the operator workflow which expects idempotence (D-C10-1 manifest hash check downstream relies on stable downloads).
  • Network failures (TLS, auth, 5xx, 429) without a structured handler would either silently drop tiles or crash the operator-tooling CLI mid-batch.

This task delivers the production downloader. It writes no C6 storage logic (AZ-303/305 own that), no freshness rule (AZ-307 owns), and no eviction (AZ-308 owns) — it composes them.

Outcome

  • A TileDownloader Protocol + concrete HttpTileDownloader class at src/gps_denied_onboard/components/c11_tilemanager/:
    • interface.py exposes TileDownloader Protocol (runtime_checkable).
    • tile_downloader.py houses HttpTileDownloader.
    • _types.py houses DownloadRequest, DownloadBatchReport, TileSummary, DownloadOutcome (all @dataclass(frozen=True) for the data DTOs; DownloadOutcome is a StrEnum).
    • errors.py houses SatelliteProviderError, RateLimitedError, ResolutionRejectionError, CacheBudgetExceededError, TileManagerError (family parent).
  • Constructor signature: __init__(self, *, http_client: httpx.Client, tile_store: TileStore, tile_metadata_store: TileMetadataStore, budget_enforcer: CacheBudgetEnforcer, logger: Logger, clock: Clock). Injected dependencies — no module-level singletons.
  • download_tiles_for_area(request) flow:
    1. Validates the DownloadRequest (bbox ordering, zoom levels in [0, 21], cache_root is writable).
    2. Looks up cache_root/.c11/journal/<flight_id>__<request_hash>.json. If present and complete → outcome = idempotent_no_op; return early with zero counts.
    3. Calls enumerate_remote_coverage(bbox, zoom_levels) to size the batch.
    4. Calls budget_enforcer.reserve_headroom(estimated_bytes). If insufficient → CacheBudgetExceededError; outcome = failure.
    5. For each tile in the batch, fetches the httpx GET response (per-tile streaming to bound memory):
      • Auth: TLS + Authorization: Bearer <service_api_key>.
      • On 429 → honour Retry-After (sleep then retry once; second 429 → RateLimitedError).
      • On 5xx → exponential backoff with cap (1s, 2s, 4s; 4 retries max); persistent 5xx → SatelliteProviderError.
      • On TLS / 401 / 403 → fail fast → SatelliteProviderError.
    6. Resolution gate: read resolution_m_per_px from response headers / metadata; if < 0.5 → increment tiles_rejected_resolution; do NOT write to C6.
    7. Calls tile_store.write_tile(...) and tile_metadata_store.insert_metadata(...). C6's freshness gate (AZ-307) may raise FreshnessRejectionError here — catch, increment tiles_rejected_freshness, continue.
    8. On FreshnessLabel.DOWNGRADED (returned by C6 when stable_rear-stale) → increment tiles_downgraded.
    9. After every successful tile write, append the tile id to the journal (fsync per atomicwrites pattern from description.md).
    10. On batch completion: write the journal's terminal record, return DownloadBatchReport(outcome=success).
  • enumerate_remote_coverage(bbox, zoom_levels) issues a GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true (matches the existing parent-suite GET surface; the response carries the per-tile produced_at + resolution_m_per_px + estimated_bytes); returns list[TileSummary].
  • INFO log: kind="c11.download.session.start" / kind="c11.download.session.end" with batch counts.
  • WARN log: per retry, per freshness rejection batch, per resolution rejection.
  • ERROR log: persistent SatelliteProviderError, CacheBudgetExceededError, TLS/auth failures.
  • The composition root constructs HttpTileDownloader via build_tile_downloader(config) -> TileDownloader at src/gps_denied_onboard/runtime_root/c11_factory.py.
  • Configuration extension to AZ-269 loader: config.c11.satellite_provider_url, config.c11.service_api_key, config.c11.cache_root, config.c11.http_timeout_s, config.c11.max_5xx_retries.
  • Type-only conformance test verifies isinstance(HttpTileDownloader(...), TileDownloader) via runtime_checkable.
  • The contract file at _docs/02_document/contracts/c11_tilemanager/tile_downloader.md is frozen as part of this task; consumers (C12) read it, not this spec.

Scope

Included

  • TileDownloader Protocol declaration + HttpTileDownloader concrete class.
  • All four DTOs (DownloadRequest, DownloadBatchReport, TileSummary) and the DownloadOutcome + FreshnessLabel-consumption enums.
  • The download-progress journal under cache_root/.c11/journal/ (atomicwrites + fsync; one file per (flight_id, request_hash) pair).
  • The resolution gate at C11 boundary (RESTRICT-SAT-4).
  • The cache-headroom pre-check via AZ-308's CacheBudgetEnforcer.reserve_headroom.
  • HTTP retry / backoff / Retry-After handling.
  • Composition-root factory build_tile_downloader.
  • Config schema extension for the C11 download fields.
  • Conformance test at tests/unit/c11_tilemanager/test_protocol_conformance.py.

Excluded

  • The TileUploader Protocol and concrete impl — separate task in this epic.
  • The per-flight signing key — separate task; downloads use a static service-internal API key, NOT the per-flight key.
  • The flight-state gate — download has no flight_state precondition (it runs operator-side, never airborne). The gate task applies to upload only.
  • Idempotent-retry-on-partial-success — that's the upload concern; download idempotence is handled here via the journal.
  • The mock-suite-sat-service fixture — the e2e-test fixture in tests/fixtures/ is for the upload path. Download tests run against the REAL satellite-provider GET surface (or its existing test Docker fixture).
  • C12 CLI plumbing — owned by E-C12.
  • Sector boundary CRUD (C12) — the DownloadRequest.sector_class is supplied per request; C11 does NOT look up sector boundaries.
  • The R02 ADR-004 build-time exclusion of c11_tilemanager from the airborne image — owned by E-BOOT (build-system task).

Acceptance Criteria

AC-1: Downloader writes accepted tiles to C6 Given a DownloadRequest for a Derkachi-bbox with 100 tiles all fresh and ≥ 0.5 m/px When download_tiles_for_area is called Then DownloadBatchReport.tiles_downloaded == 100; every tile is present in TileStore (verifiable via tile_exists); every tile has a TileMetadata row; outcome = success

AC-2: Resolution-gate rejects sub-spec tiles at C11 boundary Given a batch of 50 tiles where 10 have resolution_m_per_px = 0.3 When download_tiles_for_area is called Then tiles_rejected_resolution == 10; the 10 are NOT in TileStore; no tile_store.write_tile was attempted for them (verifiable via spy); ONE c11.download.resolution_rejected WARN log per rejected tile

AC-3: Freshness rejections from C6 are counted, not propagated Given C6's AZ-307 raises FreshnessRejectionError for 5 tiles in an active_conflict batch When download_tiles_for_area is called Then tiles_rejected_freshness == 5; the run continues for the remaining tiles; no exception escapes to the caller; ONE summary WARN log at session end

AC-4: Stable_rear stale tiles are surfaced as downgraded Given C6 returns FreshnessLabel.DOWNGRADED for 3 tiles (stable_rear stale) When download_tiles_for_area is called Then tiles_downgraded == 3; those tiles ARE present in TileStore with the DOWNGRADED label; no exception is raised

AC-5: 429 honours Retry-After Given satellite-provider returns 429 with Retry-After: 30 When the downloader hits the 429 Then the downloader sleeps ≥ 30s before the retry (verifiable via injected Clock.sleep capture); on success the run proceeds normally; no RateLimitedError is raised

AC-6: Persistent 5xx aborts with structured error Given satellite-provider returns 503 for 5 consecutive attempts (exceeds the retry budget) When the downloader exhausts retries Then SatelliteProviderError is raised with the response body, the URL, and the attempt count; outcome = failure; partial writes are journaled (the run is resumable on next call)

AC-7: TLS / 401 / 403 fail fast (no retry) Given the first GET returns 401 When the downloader processes the response Then SatelliteProviderError is raised on the FIRST attempt (zero retries); no plaintext fallback is attempted; the API key is NOT logged (security)

AC-8: Idempotent re-run after success Given a successful prior download_tiles_for_area(R) whose journal recorded all 100 tiles When download_tiles_for_area(R) is called again Then outcome = idempotent_no_op; zero GETs observed; zero tile_store.write_tile calls; the report's counts are zero except tiles_downloaded which equals the journaled count (transparency for the caller)

AC-9: Cache-budget pre-check aborts before any write Given budget_enforcer.reserve_headroom(estimated_bytes) returns EvictionResult.insufficient When download_tiles_for_area is called Then CacheBudgetExceededError is raised with the budget delta; zero GETs are issued; zero writes attempted; ONE ERROR log

AC-10: Conformance — concrete impl satisfies Protocol Given a HttpTileDownloader instance When isinstance(impl, TileDownloader) is checked under runtime_checkable Then the result is True; a fake omitting enumerate_remote_coverage returns False

AC-11: Service API key never logged Given any log path (INFO, WARN, ERROR) When the downloader logs the request URL or headers Then the service_api_key value is redacted (Bearer ***); no test log capture observes the raw key

AC-12: Journal survives mid-batch crash Given the process is killed after 30 of 100 tiles have been written and journaled When download_tiles_for_area(R) is called on restart Then the journal is read; the 30 already-journaled tiles are skipped; only the remaining 70 are GET-fetched; final report shows tiles_downloaded = 100 (30 prior + 70 new); outcome = success

Non-Functional Requirements

Performance

  • Download throughput ≥ 50 MB/s on a 1 Gbps link (C11-PT-01); the bottleneck is the network, not the writer.
  • Per-tile resolution-gate check ≤ 50 µs (header parse + comparison).
  • Journal append ≤ 5 ms per tile (atomicwrites + fsync; bounded by disk).

Compatibility

  • httpx per project pin (verify before adding); no per-task version bump unless absolutely required.
  • atomicwrites for journal — already used elsewhere (per description.md § 5).

Reliability

  • The downloader MUST tolerate process kill at any point and recover via journal on restart (AC-12).
  • The downloader MUST NOT corrupt C6 — the AZ-303 Protocol guarantees atomic writes; this task does not add new fs operations.
  • The injected Clock MUST be the same singleton used by AZ-308's enforcer and AZ-307's gate.

Unit Tests

AC Ref What to Test Required Outcome
AC-1 100-tile happy path against fake httpx.Client All in C6; tiles_downloaded=100; outcome=success
AC-2 Mixed batch, 10 sub-spec tiles tiles_rejected_resolution=10; spy on write_tile shows 40 calls
AC-3 C6 raises FreshnessRejectionError 5x tiles_rejected_freshness=5; run completes
AC-4 C6 returns DOWNGRADED for 3 tiles_downgraded=3; tiles present with DOWNGRADED label
AC-5 429 + Retry-After: 30 Clock.sleep captured ≥ 30s; success path resumes
AC-6 5x 503 SatelliteProviderError with attempt count
AC-7 401 first attempt SatelliteProviderError on first call; zero retries
AC-8 Re-run identical request post-success outcome=idempotent_no_op; zero GETs
AC-9 reserve_headroom returns insufficient CacheBudgetExceededError; zero GETs
AC-10 isinstance check on impl + on partial fake True / False
AC-11 Log capture during full session Header values redacted; no raw key in any record
AC-12 Kill mid-batch + restart 30 prior journaled + 70 new = 100 final
NFR-perf-throughput 10 GB synthetic batch over loopback ≥ 50 MB/s

Constraints

  • The service_api_key is logged ONLY redacted (Bearer ***); the raw value MUST never appear in INFO/WARN/ERROR logs (security finding High in code-review otherwise).
  • The downloader runs operator-side ONLY; the airborne build target excludes the entire c11_tilemanager/ source tree (E-BOOT enforces; this task does NOT add the exclusion but its tests assert via R02 that the module's import surface is honest).
  • The journal location is cache_root/.c11/journal/<flight_id>__<request_hash>.jsonrequest_hash is sha256(bbox|zoom_levels|sector_class|service_api_key_hash).hex()[:16].
  • The hot path uses httpx.Client (sync), not httpx.AsyncClient — the operator workflow is offline minutes; async adds complexity without a measurable win.
  • Per-tile streaming is mandatory; loading an entire batch into memory is rejected (a 10 GB Derkachi area would OOM on operator workstations).
  • This task introduces no new runtime dependencies beyond httpx and atomicwrites (verify both are in the project's requirements.txt).
  • The R02 ADR-004 exclusion is enforced at build time by E-BOOT; this task adds no runtime self-check (security tasks C11-ST-01/02/03 cover that in Step 9).

Risks & Mitigation

Risk 1: Retry-After header format ambiguity (HTTP-date vs. seconds)

  • Risk: Some servers send Retry-After: Wed, 21 Oct 2026 07:28:00 GMT; naïve integer parsing crashes.
  • Mitigation: Parse both forms (int first, email.utils.parsedate_to_datetime fallback); cap the wait at config.c11.max_retry_after_s (default 300s) to prevent server-pinned hangs.

Risk 2: API key leakage via exception messages

  • Risk: An unhandled exception's repr includes the httpx.Request whose headers carry the raw service_api_key.
  • Mitigation: Wrap every httpx call in try/except → re-raise SatelliteProviderError with sanitised message; never include httpx.Request.headers in the error payload.

Risk 3: Journal corruption on power-loss

  • Risk: Process killed mid-fsync leaves a torn JSON line; restart fails to parse.
  • Mitigation: atomicwrites provides write-then-rename semantics; one journal record per file segment with a checksum; corrupt records are skipped on read with a WARN log and the affected tile is re-fetched. Documented behaviour.

Risk 4: satellite-provider's GET surface returns tiles in a format C11 doesn't expect

  • Risk: The parent suite changes the response schema (new field, renamed metadata key); C11 silently misparses.
  • Mitigation: Strict pydantic-free DTO parsing — extra fields ignored, required fields → SatelliteProviderError("response schema mismatch", field=...). Versioning for the GET API is owned by satellite-provider's contract; this task pins to the current shape.

Risk 5: Concurrent C11 invocations corrupt the journal

  • Risk: Operator launches two download_tiles_for_area runs against the same cache_root simultaneously; both write to the same journal.
  • Mitigation: Per description.md § 7, C12 owns a filesystem lockfile that gates concurrent invocations. This task asserts the lockfile exists at construction (cache_root/.c11/lock) and refuses to start otherwise. The lockfile creation is C12's job.

Runtime Completeness

  • Named capability: operator-side download from satellite-provider GET surface, RESTRICT-SAT-4 enforcement, AC-NEW-6 freshness counting, AC-8.1/8.3 cache provisioning input.
  • Production code that must exist: real httpx.Client-backed HttpTileDownloader, real journal write/read with atomicwrites, real resolution gate, real composition-root factory wiring AZ-303/305/307/308 dependencies, real config schema extension.
  • Allowed external stubs: tests MAY use a fake httpx.Client (transport-level) to script responses, fake Clock for deterministic retries, fake TileStore/TileMetadataStore (already provided by AZ-303's conformance fakes); production wiring uses real httpx + real Postgres-backed C6.
  • Unacceptable substitutes: a hardcoded "every tile passes" resolution gate (defeats RESTRICT-SAT-4); skipping the journal and re-issuing every GET (defeats I-2 idempotence); silently catching FreshnessRejectionError without counting (loses operator visibility for AC-NEW-6); using requests instead of httpx (project pinned to httpx; switching mid-component is an Architecture finding).

Contract

This task produces/implements the contract at _docs/02_document/contracts/c11_tilemanager/tile_downloader.md. Consumers MUST read that file — not this task spec — to discover the interface.