Closes out greenfield Step 6 (Decompose) for all 14 components (C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446 plus the _dependencies_table.md and component contract documents. State file updated to greenfield Step 7 (Implement), not_started. Co-authored-by: Cursor <cursoragent@cursor.com>
19 KiB
C11 TileDownloader — GET + Resolution Gate + C6 Write
Task: AZ-316_c11_tile_downloader
Name: C11 TileDownloader
Description: Implement the TileDownloader Protocol — C11's operator-side download path. download_tiles_for_area issues authenticated httpx GETs against satellite-provider, enforces RESTRICT-SAT-4 (≥ 0.5 m/px) at the C11 boundary, writes accepted tiles via the AZ-303 TileStore + TileMetadataStore Protocols (which run AZ-307's freshness gate at write), pre-checks cache headroom against AZ-308's budget enforcer, and journals download progress for idempotent re-runs. enumerate_remote_coverage is the read-only enumeration helper used by C12 for pre-flight area sizing. Honours Retry-After on 429s, fails fast on TLS / auth errors, retries with backoff on 5xx. Surfaces freshness, resolution, downgrade, and outcome counts in DownloadBatchReport.
Complexity: 5 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-303_c6_storage_interfaces, AZ-305_c6_postgres_filesystem_store, AZ-307_c6_freshness_gate, AZ-308_c6_cache_budget_eviction
Component: c11_tilemanager (epic AZ-251 / E-C11)
Tracker: AZ-316
Epic: AZ-251 (E-C11)
Document Dependencies
_docs/02_document/contracts/c11_tilemanager/tile_downloader.md— produced by this task (frozen Protocol + DTO shape, invariants, test cases)._docs/02_document/contracts/c6_tile_cache/tile_store.md— consumed:write_tile,tile_existsare the write-side endpoints used by this task._docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md— consumed:insert_metadata,query_by_bboxfor idempotence._docs/02_document/contracts/shared_logging/log_record_schema.md— INFO/WARN/ERROR log shapes for download events._docs/02_document/components/12_c11_tilemanager/description.md— § 3.1 download API, § 5 error handling, § 7 caveats (R02 enforcement).
Problem
Without a real TileDownloader:
- F1 pre-flight cache build cannot start (C10 has no source; C12 has no operation to invoke).
- AC-8.1 (imagery via Suite Sat Service offline cache) and AC-8.3 (pre-loaded onto companion before flight) collapse — both require C11 populating C6 first.
- RESTRICT-SAT-4 (≥ 0.5 m/px) has no enforcement point — sub-spec tiles would silently land in C6 and downstream pose estimation would degrade.
- AC-NEW-6 (system rejects/downgrades stale tiles) is unenforced for the download path; C6's gate (AZ-307) catches it but the operator has no visibility into how many were rejected per batch.
- A re-run of an already-completed download would re-issue every GET against
satellite-provider, wasting bandwidth and fighting the operator workflow which expects idempotence (D-C10-1 manifest hash check downstream relies on stable downloads). - Network failures (TLS, auth, 5xx, 429) without a structured handler would either silently drop tiles or crash the operator-tooling CLI mid-batch.
This task delivers the production downloader. It writes no C6 storage logic (AZ-303/305 own that), no freshness rule (AZ-307 owns), and no eviction (AZ-308 owns) — it composes them.
Outcome
- A
TileDownloaderProtocol + concreteHttpTileDownloaderclass atsrc/gps_denied_onboard/components/c11_tilemanager/:interface.pyexposesTileDownloaderProtocol (runtime_checkable).tile_downloader.pyhousesHttpTileDownloader._types.pyhousesDownloadRequest,DownloadBatchReport,TileSummary,DownloadOutcome(all@dataclass(frozen=True)for the data DTOs;DownloadOutcomeis aStrEnum).errors.pyhousesSatelliteProviderError,RateLimitedError,ResolutionRejectionError,CacheBudgetExceededError,TileManagerError(family parent).
- Constructor signature:
__init__(self, *, http_client: httpx.Client, tile_store: TileStore, tile_metadata_store: TileMetadataStore, budget_enforcer: CacheBudgetEnforcer, logger: Logger, clock: Clock). Injected dependencies — no module-level singletons. download_tiles_for_area(request)flow:- Validates the
DownloadRequest(bbox ordering, zoom levels in[0, 21],cache_rootis writable). - Looks up
cache_root/.c11/journal/<flight_id>__<request_hash>.json. If present and complete →outcome = idempotent_no_op; return early with zero counts. - Calls
enumerate_remote_coverage(bbox, zoom_levels)to size the batch. - Calls
budget_enforcer.reserve_headroom(estimated_bytes). If insufficient →CacheBudgetExceededError;outcome = failure. - For each tile in the batch, fetches the
httpxGET response (per-tile streaming to bound memory):- Auth: TLS +
Authorization: Bearer <service_api_key>. - On 429 → honour
Retry-After(sleep then retry once; second 429 →RateLimitedError). - On 5xx → exponential backoff with cap (1s, 2s, 4s; 4 retries max); persistent 5xx →
SatelliteProviderError. - On TLS / 401 / 403 → fail fast →
SatelliteProviderError.
- Auth: TLS +
- Resolution gate: read
resolution_m_per_pxfrom response headers / metadata; if< 0.5→ incrementtiles_rejected_resolution; do NOT write to C6. - Calls
tile_store.write_tile(...)andtile_metadata_store.insert_metadata(...). C6's freshness gate (AZ-307) may raiseFreshnessRejectionErrorhere — catch, incrementtiles_rejected_freshness, continue. - On
FreshnessLabel.DOWNGRADED(returned by C6 when stable_rear-stale) → incrementtiles_downgraded. - After every successful tile write, append the tile id to the journal (
fsyncper atomicwrites pattern from description.md). - On batch completion: write the journal's terminal record, return
DownloadBatchReport(outcome=success).
- Validates the
enumerate_remote_coverage(bbox, zoom_levels)issues aGET /api/satellite/tiles?bbox=...&zoom=...&list-only=true(matches the existing parent-suite GET surface; the response carries the per-tileproduced_at+resolution_m_per_px+estimated_bytes); returnslist[TileSummary].- INFO log:
kind="c11.download.session.start"/kind="c11.download.session.end"with batch counts. - WARN log: per retry, per freshness rejection batch, per resolution rejection.
- ERROR log: persistent
SatelliteProviderError,CacheBudgetExceededError, TLS/auth failures. - The composition root constructs
HttpTileDownloaderviabuild_tile_downloader(config) -> TileDownloaderatsrc/gps_denied_onboard/runtime_root/c11_factory.py. - Configuration extension to AZ-269 loader:
config.c11.satellite_provider_url,config.c11.service_api_key,config.c11.cache_root,config.c11.http_timeout_s,config.c11.max_5xx_retries. - Type-only conformance test verifies
isinstance(HttpTileDownloader(...), TileDownloader)viaruntime_checkable. - The contract file at
_docs/02_document/contracts/c11_tilemanager/tile_downloader.mdis frozen as part of this task; consumers (C12) read it, not this spec.
Scope
Included
TileDownloaderProtocol declaration +HttpTileDownloaderconcrete class.- All four DTOs (
DownloadRequest,DownloadBatchReport,TileSummary) and theDownloadOutcome+FreshnessLabel-consumption enums. - The download-progress journal under
cache_root/.c11/journal/(atomicwrites + fsync; one file per(flight_id, request_hash)pair). - The resolution gate at C11 boundary (RESTRICT-SAT-4).
- The cache-headroom pre-check via AZ-308's
CacheBudgetEnforcer.reserve_headroom. - HTTP retry / backoff /
Retry-Afterhandling. - Composition-root factory
build_tile_downloader. - Config schema extension for the C11 download fields.
- Conformance test at
tests/unit/c11_tilemanager/test_protocol_conformance.py.
Excluded
- The
TileUploaderProtocol and concrete impl — separate task in this epic. - The per-flight signing key — separate task; downloads use a static service-internal API key, NOT the per-flight key.
- The flight-state gate — download has no
flight_stateprecondition (it runs operator-side, never airborne). The gate task applies to upload only. - Idempotent-retry-on-partial-success — that's the upload concern; download idempotence is handled here via the journal.
- The
mock-suite-sat-servicefixture — the e2e-test fixture intests/fixtures/is for the upload path. Download tests run against the REALsatellite-providerGET surface (or its existing test Docker fixture). - C12 CLI plumbing — owned by E-C12.
- Sector boundary CRUD (C12) — the
DownloadRequest.sector_classis supplied per request; C11 does NOT look up sector boundaries. - The R02 ADR-004 build-time exclusion of
c11_tilemanagerfrom the airborne image — owned by E-BOOT (build-system task).
Acceptance Criteria
AC-1: Downloader writes accepted tiles to C6
Given a DownloadRequest for a Derkachi-bbox with 100 tiles all fresh and ≥ 0.5 m/px
When download_tiles_for_area is called
Then DownloadBatchReport.tiles_downloaded == 100; every tile is present in TileStore (verifiable via tile_exists); every tile has a TileMetadata row; outcome = success
AC-2: Resolution-gate rejects sub-spec tiles at C11 boundary
Given a batch of 50 tiles where 10 have resolution_m_per_px = 0.3
When download_tiles_for_area is called
Then tiles_rejected_resolution == 10; the 10 are NOT in TileStore; no tile_store.write_tile was attempted for them (verifiable via spy); ONE c11.download.resolution_rejected WARN log per rejected tile
AC-3: Freshness rejections from C6 are counted, not propagated
Given C6's AZ-307 raises FreshnessRejectionError for 5 tiles in an active_conflict batch
When download_tiles_for_area is called
Then tiles_rejected_freshness == 5; the run continues for the remaining tiles; no exception escapes to the caller; ONE summary WARN log at session end
AC-4: Stable_rear stale tiles are surfaced as downgraded
Given C6 returns FreshnessLabel.DOWNGRADED for 3 tiles (stable_rear stale)
When download_tiles_for_area is called
Then tiles_downgraded == 3; those tiles ARE present in TileStore with the DOWNGRADED label; no exception is raised
AC-5: 429 honours Retry-After
Given satellite-provider returns 429 with Retry-After: 30
When the downloader hits the 429
Then the downloader sleeps ≥ 30s before the retry (verifiable via injected Clock.sleep capture); on success the run proceeds normally; no RateLimitedError is raised
AC-6: Persistent 5xx aborts with structured error
Given satellite-provider returns 503 for 5 consecutive attempts (exceeds the retry budget)
When the downloader exhausts retries
Then SatelliteProviderError is raised with the response body, the URL, and the attempt count; outcome = failure; partial writes are journaled (the run is resumable on next call)
AC-7: TLS / 401 / 403 fail fast (no retry)
Given the first GET returns 401
When the downloader processes the response
Then SatelliteProviderError is raised on the FIRST attempt (zero retries); no plaintext fallback is attempted; the API key is NOT logged (security)
AC-8: Idempotent re-run after success
Given a successful prior download_tiles_for_area(R) whose journal recorded all 100 tiles
When download_tiles_for_area(R) is called again
Then outcome = idempotent_no_op; zero GETs observed; zero tile_store.write_tile calls; the report's counts are zero except tiles_downloaded which equals the journaled count (transparency for the caller)
AC-9: Cache-budget pre-check aborts before any write
Given budget_enforcer.reserve_headroom(estimated_bytes) returns EvictionResult.insufficient
When download_tiles_for_area is called
Then CacheBudgetExceededError is raised with the budget delta; zero GETs are issued; zero writes attempted; ONE ERROR log
AC-10: Conformance — concrete impl satisfies Protocol
Given a HttpTileDownloader instance
When isinstance(impl, TileDownloader) is checked under runtime_checkable
Then the result is True; a fake omitting enumerate_remote_coverage returns False
AC-11: Service API key never logged
Given any log path (INFO, WARN, ERROR)
When the downloader logs the request URL or headers
Then the service_api_key value is redacted (Bearer ***); no test log capture observes the raw key
AC-12: Journal survives mid-batch crash
Given the process is killed after 30 of 100 tiles have been written and journaled
When download_tiles_for_area(R) is called on restart
Then the journal is read; the 30 already-journaled tiles are skipped; only the remaining 70 are GET-fetched; final report shows tiles_downloaded = 100 (30 prior + 70 new); outcome = success
Non-Functional Requirements
Performance
- Download throughput ≥ 50 MB/s on a 1 Gbps link (C11-PT-01); the bottleneck is the network, not the writer.
- Per-tile resolution-gate check ≤ 50 µs (header parse + comparison).
- Journal append ≤ 5 ms per tile (atomicwrites + fsync; bounded by disk).
Compatibility
httpxper project pin (verify before adding); no per-task version bump unless absolutely required.atomicwritesfor journal — already used elsewhere (per description.md § 5).
Reliability
- The downloader MUST tolerate process kill at any point and recover via journal on restart (AC-12).
- The downloader MUST NOT corrupt C6 — the AZ-303 Protocol guarantees atomic writes; this task does not add new fs operations.
- The injected
ClockMUST be the same singleton used by AZ-308's enforcer and AZ-307's gate.
Unit Tests
| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | 100-tile happy path against fake httpx.Client |
All in C6; tiles_downloaded=100; outcome=success |
| AC-2 | Mixed batch, 10 sub-spec tiles | tiles_rejected_resolution=10; spy on write_tile shows 40 calls |
| AC-3 | C6 raises FreshnessRejectionError 5x |
tiles_rejected_freshness=5; run completes |
| AC-4 | C6 returns DOWNGRADED for 3 | tiles_downgraded=3; tiles present with DOWNGRADED label |
| AC-5 | 429 + Retry-After: 30 |
Clock.sleep captured ≥ 30s; success path resumes |
| AC-6 | 5x 503 | SatelliteProviderError with attempt count |
| AC-7 | 401 first attempt | SatelliteProviderError on first call; zero retries |
| AC-8 | Re-run identical request post-success | outcome=idempotent_no_op; zero GETs |
| AC-9 | reserve_headroom returns insufficient |
CacheBudgetExceededError; zero GETs |
| AC-10 | isinstance check on impl + on partial fake |
True / False |
| AC-11 | Log capture during full session | Header values redacted; no raw key in any record |
| AC-12 | Kill mid-batch + restart | 30 prior journaled + 70 new = 100 final |
| NFR-perf-throughput | 10 GB synthetic batch over loopback | ≥ 50 MB/s |
Constraints
- The
service_api_keyis logged ONLY redacted (Bearer ***); the raw value MUST never appear in INFO/WARN/ERROR logs (security finding High in code-review otherwise). - The downloader runs operator-side ONLY; the airborne build target excludes the entire
c11_tilemanager/source tree (E-BOOT enforces; this task does NOT add the exclusion but its tests assert via R02 that the module's import surface is honest). - The journal location is
cache_root/.c11/journal/<flight_id>__<request_hash>.json—request_hashissha256(bbox|zoom_levels|sector_class|service_api_key_hash).hex()[:16]. - The hot path uses
httpx.Client(sync), nothttpx.AsyncClient— the operator workflow is offline minutes; async adds complexity without a measurable win. - Per-tile streaming is mandatory; loading an entire batch into memory is rejected (a 10 GB Derkachi area would OOM on operator workstations).
- This task introduces no new runtime dependencies beyond
httpxandatomicwrites(verify both are in the project'srequirements.txt). - The R02 ADR-004 exclusion is enforced at build time by E-BOOT; this task adds no runtime self-check (security tasks C11-ST-01/02/03 cover that in Step 9).
Risks & Mitigation
Risk 1: Retry-After header format ambiguity (HTTP-date vs. seconds)
- Risk: Some servers send
Retry-After: Wed, 21 Oct 2026 07:28:00 GMT; naïve integer parsing crashes. - Mitigation: Parse both forms (
intfirst,email.utils.parsedate_to_datetimefallback); cap the wait atconfig.c11.max_retry_after_s(default 300s) to prevent server-pinned hangs.
Risk 2: API key leakage via exception messages
- Risk: An unhandled exception's repr includes the
httpx.Requestwhose headers carry the rawservice_api_key. - Mitigation: Wrap every
httpxcall in try/except → re-raiseSatelliteProviderErrorwith sanitised message; never includehttpx.Request.headersin the error payload.
Risk 3: Journal corruption on power-loss
- Risk: Process killed mid-
fsyncleaves a torn JSON line; restart fails to parse. - Mitigation:
atomicwritesprovides write-then-rename semantics; one journal record per file segment with a checksum; corrupt records are skipped on read with a WARN log and the affected tile is re-fetched. Documented behaviour.
Risk 4: satellite-provider's GET surface returns tiles in a format C11 doesn't expect
- Risk: The parent suite changes the response schema (new field, renamed metadata key); C11 silently misparses.
- Mitigation: Strict pydantic-free DTO parsing — extra fields ignored, required fields →
SatelliteProviderError("response schema mismatch", field=...). Versioning for the GET API is owned bysatellite-provider's contract; this task pins to the current shape.
Risk 5: Concurrent C11 invocations corrupt the journal
- Risk: Operator launches two
download_tiles_for_arearuns against the samecache_rootsimultaneously; both write to the same journal. - Mitigation: Per
description.md§ 7, C12 owns a filesystem lockfile that gates concurrent invocations. This task asserts the lockfile exists at construction (cache_root/.c11/lock) and refuses to start otherwise. The lockfile creation is C12's job.
Runtime Completeness
- Named capability: operator-side download from
satellite-providerGET surface, RESTRICT-SAT-4 enforcement, AC-NEW-6 freshness counting, AC-8.1/8.3 cache provisioning input. - Production code that must exist: real
httpx.Client-backedHttpTileDownloader, real journal write/read with atomicwrites, real resolution gate, real composition-root factory wiring AZ-303/305/307/308 dependencies, real config schema extension. - Allowed external stubs: tests MAY use a fake
httpx.Client(transport-level) to script responses, fakeClockfor deterministic retries, fakeTileStore/TileMetadataStore(already provided by AZ-303's conformance fakes); production wiring uses real httpx + real Postgres-backed C6. - Unacceptable substitutes: a hardcoded "every tile passes" resolution gate (defeats RESTRICT-SAT-4); skipping the journal and re-issuing every GET (defeats I-2 idempotence); silently catching
FreshnessRejectionErrorwithout counting (loses operator visibility for AC-NEW-6); usingrequestsinstead ofhttpx(project pinned to httpx; switching mid-component is anArchitecturefinding).
Contract
This task produces/implements the contract at _docs/02_document/contracts/c11_tilemanager/tile_downloader.md.
Consumers MUST read that file — not this task spec — to discover the interface.