gps-denied-onboard/_docs/02_tasks/todo/AZ-319_c11_tile_uploader.md
Oleksandr Bezdieniezhnykh 880eabcb3f Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 00:39:48 +03:00

C11 TileUploader — Read Pending + Sign + POST + Mark Uploaded

Task: AZ-319_c11_tile_uploader
Name: C11 TileUploader
Description: Implement the TileUploader Protocol, C11's operator-side post-landing upload path. upload_pending_tiles calls AZ-317's FlightStateGate.confirm_on_ground() first, starts an AZ-318 signing session, reads pending mid-flight tiles from C6 (source = onboard_ingest, voting_status = pending) via the AZ-303 metadata store, packages each tile per the D-PROJ-2 multipart contract sketch (tile_blob, geo metadata, capture_timestamp, flight_id, companion_id, quality_metadata, signature), signs each payload, POSTs to /api/satellite/tiles/ingest, parses the per-tile response, and marks acknowledged tiles uploaded in C6. Honours Retry-After on 429s; fails fast on TLS / auth; surfaces signature_rejected per tile via FDR. The signing key is zeroised in a try/finally guarantee. Idempotent-retry across partial-success batches is a separate decorator task in this epic.
Complexity: 5 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-273_fdr_client_ringbuf, AZ-303_c6_storage_interfaces, AZ-305_c6_postgres_filesystem_store, AZ-317_c11_flight_state_gate, AZ-318_c11_signing_key
Component: c11_tilemanager (epic AZ-251 / E-C11)
Tracker: AZ-319
Epic: AZ-251 (E-C11)

Document Dependencies

  • _docs/02_document/contracts/c11_tilemanager/tile_uploader.md — produced by this task (frozen Protocol + DTO shape, invariants, test cases).
  • _docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md — consumed: pending_uploads, mark_uploaded, get_by_id.
  • _docs/02_document/contracts/c6_tile_cache/tile_store.md — consumed: read_tile_pixels for the multipart blob.
  • _docs/02_document/contracts/shared_logging/log_record_schema.md — INFO/WARN/ERROR log shapes for upload events.
  • _docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md: kind="c11.upload.tile.queued" / kind="c11.upload.tile.rejected" / kind="c11.upload.batch.complete" envelopes.
  • _docs/02_document/components/12_c11_tilemanager/description.md — § 3.2 D-PROJ-2 contract sketch, § 5 error handling.
  • _docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md — D-PROJ-2 design task #1 ingest endpoint shape.

Problem

Without a real TileUploader:

  • AC-8.4 (post-landing upload of mid-flight tiles to the parent suite) collapses — the pending-upload journal grows unboundedly across flights.
  • D-PROJ-2's safety-officer correlation cannot work — the public-key + tile-id linkage exists only at upload time.
  • The AC-NEW-7 voting / trust layer (parent-suite side) has no inputs — without uploads, no flights ever vote.
  • Mid-flight tile generation (E-C13 mid-flight tile snapshot, AZ-294) becomes a leaf system: tiles land in C6 with voting_status = pending and stay there forever.
  • SignatureRejectedError from the parent suite has no detection path; a key compromise would not surface to the safety officer until manual log inspection.
  • Operators have no observable post-landing operation; the F10 functional flow has no implementation.

This task delivers the production uploader. It composes AZ-317 (gate) + AZ-318 (signing) + AZ-303/305 (C6) + httpx; it adds no new responsibilities beyond orchestration, so the surface area is tight.

Outcome

  • A TileUploader Protocol + concrete HttpTileUploader class at src/gps_denied_onboard/components/c11_tilemanager/:
    • interface.py exposes TileUploader Protocol (runtime_checkable).
    • tile_uploader.py houses HttpTileUploader.
    • _types.py adds UploadRequest, UploadBatchReport, PerTileStatus, UploadOutcome (StrEnum), IngestStatus (StrEnum) — all @dataclass(frozen=True) for the data DTOs.
    • errors.py adds SignatureRejectedError (subclasses TileManagerError); FlightStateNotOnGroundError and the rest are already declared in AZ-317/AZ-318/AZ-316.
  • Constructor signature: __init__(self, *, http_client: httpx.Client, tile_store: TileStore, tile_metadata_store: TileMetadataStore, flight_state_gate: FlightStateGate, key_manager: PerFlightKeyManager, fdr_client: FdrClient, logger: Logger, clock: Clock, config: C11Config). Injected dependencies — no module-level singletons.
  • upload_pending_tiles(request) flow:
    1. Calls flight_state_gate.confirm_on_ground() (raises if not ON_GROUND; ZERO state-mutation prior to this).
    2. Calls key_manager.start_session(flight_id_for_session), where flight_id_for_session is request.flight_id if provided, else uuid.uuid4() (the "session id" for the multi-flight case).
    3. In a try block:
      • Calls tile_metadata_store.pending_uploads(flight_id=request.flight_id) to enumerate pending tiles.
      • If empty → returns UploadBatchReport(outcome=success, per_tile_status=(), batch_uuid=uuid4()).
      • Splits the pending list into batches of request.batch_size.
      • For each batch:
        • Reads each tile's pixel bytes via tile_store.read_tile_pixels(tile_id).
        • Builds the multipart payload per tile: tile_blob, zoomLevel, latitude, longitude, tile_size_meters, tile_size_pixels, capture_timestamp, flight_id, companion_id, quality_metadata (JSON), signature (key_manager.sign(canonical_payload_bytes)).
        • Canonical payload bytes for signing: SHA-256 of tile_blob || zoomLevel || latitude || longitude || capture_timestamp || flight_id || companion_id || quality_metadata_json (deterministic byte concatenation; documented).
        • POSTs the multipart to {config.satellite_provider_url}/api/satellite/tiles/ingest.
        • On 202: parses batch_uuid + per_tile_status[] from the response body. For each queued | duplicate | superseded tile, calls tile_metadata_store.mark_uploaded(tile_id, batch_uuid). For each rejected tile, calls key_manager.record_signature_rejection(flight_id, tile_id) if the rejection reason mentions signature; emits FDR kind="c11.upload.tile.rejected" with the reason regardless.
        • On 429: honours Retry-After; on persistent 429 → RateLimitedError.
        • On 5xx: exponential backoff with delays doubling from 1s (1s, 2s, 4s, 8s; 4 retries max, i.e. 5 attempts in total as in AC-9); persistent → SatelliteProviderError.
        • On TLS / 401 / 403: fail fast → SatelliteProviderError.
      • Aggregates UploadBatchReport:
        • outcome = success if ALL tiles are queued | duplicate | superseded.
        • outcome = partial if any rejected OR any unparseable response with otherwise-acked tiles.
        • outcome = failure if the gate blocked, the API key was invalid, or zero tiles could be POSTed.
        • public_key_fingerprint = the AZ-318 fingerprint from start_session.
        • batch_uuid = the LAST successful batch's UUID (or uuid4() if none succeeded; documented).
    4. In a finally block:
      • Calls key_manager.end_session() — guaranteed zeroisation regardless of success / failure / exception.
      • Emits FDR kind="c11.upload.batch.complete" with {flight_id_for_session, public_key_fingerprint, total_attempted, total_queued, total_rejected, outcome, observed_at_iso}.
  • enumerate_pending_tiles(flight_id) returns tile_metadata_store.pending_uploads(flight_id) directly (read-only enumeration).
  • confirm_flight_state() returns flight_state_gate.confirm_on_ground() (passes through; raises on non-ON_GROUND).
  • INFO log on session start/end with batch counts; WARN log per retry; ERROR log on SatelliteProviderError, FlightStateNotOnGroundError (caught and re-raised after log).
  • Composition root constructs HttpTileUploader via build_tile_uploader(config) -> TileUploader at src/gps_denied_onboard/runtime_root/c11_factory.py.
  • Configuration extension to AZ-269 loader: config.c11.satellite_provider_ingest_url, config.c11.upload_batch_size, config.c11.upload_http_timeout_s, config.c11.companion_id.
  • Type-only conformance test verifies isinstance(HttpTileUploader(...), TileUploader).
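The ordering in the upload_pending_tiles flow above (gate first, then session start, then the work inside try, then zeroisation in finally) can be sketched as follows. This is a minimal illustration under stated assumptions, not the production HttpTileUploader: `run_upload`, `do_batches`, `FakeGate`, and `FakeKeyManager` are hypothetical names, and the error class is a local stand-in for the AZ-317 type. Only `confirm_on_ground`, `start_session`, and `end_session` come from the spec.

```python
class FlightStateNotOnGroundError(Exception):
    """Local stand-in for the AZ-317 error type (assumption)."""


class FakeGate:
    """Hypothetical gate fake: raises unless the aircraft is on the ground."""

    def __init__(self, on_ground):
        self.on_ground = on_ground

    def confirm_on_ground(self):
        if not self.on_ground:
            raise FlightStateNotOnGroundError("IN_FLIGHT")


class FakeKeyManager:
    """Hypothetical key-manager fake that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def start_session(self, flight_id):
        self.calls.append("start_session")
        return "fingerprint-placeholder"

    def end_session(self):
        self.calls.append("end_session")


def run_upload(gate, key_manager, flight_id, do_batches):
    # Step 1: the gate runs before any state mutation. If it raises,
    # no session exists yet, so there is nothing to zeroise.
    gate.confirm_on_ground()
    # Step 2: the session starts outside the try block.
    fingerprint = key_manager.start_session(flight_id)
    try:
        # Step 3: enumerate, sign, POST, mark uploaded (elided here).
        return do_batches(fingerprint)
    finally:
        # Step 4: runs on success, on failure, and on any exception.
        key_manager.end_session()
```

With `FakeGate(on_ground=False)` the call raises before `start_session` is reached, which is the behaviour AC-2 asks for; with a `do_batches` callback that raises, `end_session` is still recorded.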

Scope

Included

  • TileUploader Protocol declaration + HttpTileUploader concrete class.
  • UploadRequest, UploadBatchReport, PerTileStatus, UploadOutcome, IngestStatus DTOs.
  • SignatureRejectedError definition (subclass of TileManagerError).
  • The orchestration: gate → start_session → enumerate → batch loop → mark_uploaded / FDR alert → end_session.
  • Multipart payload construction + canonical bytes for signing.
  • HTTP retry / backoff / Retry-After handling for the upload path.
  • Composition-root factory build_tile_uploader.
  • Config schema extension for the C11 upload fields.
  • Conformance test at tests/unit/c11_tilemanager/test_protocol_conformance.py.

Excluded

  • The TileDownloader Protocol and concrete impl — separate task (AZ-316).
  • FlightStateGate impl — owned by AZ-317.
  • PerFlightKeyManager impl — owned by AZ-318.
  • Idempotent-retry-on-partial-success batch decorator — separate task in this epic (AZ-320_c11_idempotent_retry).
  • The R02 ADR-004 build-time exclusion — owned by E-BOOT.
  • The pre-flight key enrolment workflow at C12 — owned by E-C12.
  • The mock-suite-sat-service fixture under tests/fixtures/ — owned by E-BBT (test infrastructure).
  • Voting / trust promotion — owned by D-PROJ-2 / satellite-provider.
  • E-C8's FlightStateSource impl — owned by E-C8 (AZ-261).

Acceptance Criteria

AC-1: Happy path uploads all pending tiles
Given 50 pending tiles in C6, ON_GROUND, parent suite returns 202 with all queued
When upload_pending_tiles(request) is called
Then all 50 tiles are POSTed (one POST per tile, or fewer POSTs if batched per batch_size); all 50 marked uploaded in C6 (verifiable via mark_uploaded spy); UploadBatchReport.outcome = success; one FDR kind="c11.upload.batch.complete" with total_attempted=50, total_queued=50

AC-2: Flight-state gate blocks before any read or POST
Given FlightStateGate.confirm_on_ground() raises FlightStateNotOnGroundError(IN_FLIGHT)
When upload_pending_tiles(request) is called
Then FlightStateNotOnGroundError is raised; ZERO calls to pending_uploads (verifiable via spy); ZERO HTTP POSTs; ZERO calls to key_manager.start_session (key generation is also gated); key_manager.end_session() is NOT called (no session was started)

AC-3: Signature rejection per tile is FDR'd and not marked uploaded
Given parent suite returns rejected for 1 tile with reason "invalid signature"
When the response is parsed
Then key_manager.record_signature_rejection(flight_id, tile_id) is called once; tile_metadata_store.mark_uploaded is NOT called for that tile; the tile remains voting_status = pending; FDR kind="c11.upload.tile.rejected" is emitted with the reason; report's outcome = partial

AC-4: duplicate and superseded are treated as success
Given parent suite returns duplicate for 5 tiles and superseded for 3 tiles
When the response is parsed
Then all 8 are mark_uploaded'd in C6 with the batch_uuid; report's per_tile_status reflects the original status; outcome = success if no rejected

AC-5: Signing key is zeroised on success
Given a successful upload
When upload_pending_tiles returns
Then key_manager.end_session() was called once (verifiable via spy); the AZ-318 manager's _private_key is None

AC-6: Signing key is zeroised on failure
Given the FIRST POST raises a connection-reset error
When upload_pending_tiles raises SatelliteProviderError
Then key_manager.end_session() was called (try/finally executed); the manager's _private_key is None; the partial state in C6 is consistent (no half-marked tiles)

AC-7: Public-key FDR record precedes any tile FDR
Given a session with at least one tile
When the FDR stream is captured
Then kind="c11.upload.session.key.public" is observed BEFORE any kind="c11.upload.tile.*" record

AC-8: 429 honours Retry-After
Given parent suite returns 429 with Retry-After: 60 on the first POST
When the uploader processes the response
Then Clock.sleep is called with ≥ 60s; on success the run proceeds; the report includes retry_count >= 1

AC-9: Persistent 5xx aborts with structured error
Given parent suite returns 503 for 5 consecutive attempts
When the uploader exhausts retries
Then SatelliteProviderError is raised; the report is NOT returned (the exception propagates); key_manager.end_session() was called via finally
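
The retry behaviour exercised by AC-9 and AC-10 could look roughly like the sketch below. It assumes a delay that doubles from 1 s and 4 retries (5 attempts in total); `post_with_backoff`, `send`, and `sleep` are hypothetical names, `send` stands in for the real httpx POST and returns only a status code, and the error class is a local stand-in. 429 handling via Retry-After is deliberately left out.

```python
class SatelliteProviderError(Exception):
    """Local stand-in for the project's error type (assumption)."""


def post_with_backoff(send, sleep, max_retries=4, base_delay_s=1.0):
    """Call send() until it returns a non-5xx status or retries run out.

    send:  zero-argument callable returning an HTTP status code (hypothetical).
    sleep: injectable delay function, e.g. a fake Clock.sleep in tests.
    """
    for attempt in range(max_retries + 1):
        status = send()
        if status in (401, 403):
            # Auth failures are not retried.
            raise SatelliteProviderError(f"auth failure: HTTP {status}")
        if status < 500:
            # 2xx and other non-5xx codes are returned to the caller,
            # which handles 429 separately.
            return status
        if attempt < max_retries:
            # Delays of 1s, 2s, 4s, 8s for the default arguments.
            sleep(base_delay_s * 2 ** attempt)
    raise SatelliteProviderError(
        f"persistent HTTP {status} after {max_retries + 1} attempts"
    )
```

Injecting `sleep` keeps the unit test instant: a fake that appends to a list records the requested delays without waiting.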

AC-10: TLS / 401 / 403 fail fast
Given the first POST returns 401
When the uploader processes the response
Then SatelliteProviderError is raised on the first attempt; zero retries; the public key is NOT logged; the API key (if an auth header carries one) is NOT logged

AC-11: Empty pending set is success with no POSTs
Given zero pending tiles in C6
When upload_pending_tiles(request) is called
Then outcome = success; per_tile_status is empty; key_manager.start_session was called (signature still required by D-PROJ-2 for the empty-batch ack record per § 3.2; documented); end_session was called; ONE FDR c11.upload.batch.complete with total_attempted=0

AC-12: Conformance (concrete impl satisfies Protocol)
Given an HttpTileUploader instance
When isinstance(impl, TileUploader) is checked under runtime_checkable
Then the result is True; a fake omitting confirm_flight_state returns False
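
A minimal sketch of the AC-12 check, using only the three method names listed in this spec. The argument lists and the two fake classes are illustrative assumptions; the frozen signatures live in the contract document.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TileUploader(Protocol):
    # Method names come from the spec; parameter names are assumptions.
    def upload_pending_tiles(self, request): ...
    def enumerate_pending_tiles(self, flight_id): ...
    def confirm_flight_state(self): ...


class CompleteFake:
    """Hypothetical fake that provides all three Protocol methods."""

    def upload_pending_tiles(self, request):
        return None

    def enumerate_pending_tiles(self, flight_id):
        return ()

    def confirm_flight_state(self):
        return None


class FakeMissingConfirm:
    """Hypothetical fake that omits confirm_flight_state."""

    def upload_pending_tiles(self, request):
        return None

    def enumerate_pending_tiles(self, flight_id):
        return ()


conforms = isinstance(CompleteFake(), TileUploader)
partial_conforms = isinstance(FakeMissingConfirm(), TileUploader)
```

Note that isinstance against a runtime_checkable Protocol only checks that the members exist; it does not compare signatures or return types, so a static type checker is still needed for the full contract.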

AC-13: Canonical signing bytes are deterministic
Given the same tile metadata + tile bytes
When _canonical_payload_bytes(tile) is computed twice
Then the two byte strings are bitwise identical (no map ordering, no JSON whitespace drift); the SHA-256 over them matches; this is asserted via property test with N random tiles

AC-14: Partial-success batches return without raising
Given a 10-tile batch where 7 are queued, 3 are rejected
When upload_pending_tiles returns
Then NO exception is raised; outcome = partial; per_tile_status has all 10 entries with their respective statuses; the 7 acked tiles are marked uploaded in C6; the 3 rejected stay pending
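
The per-tile aggregation behind AC-4, AC-11 and AC-14 can be sketched as below. The DTO field names (`tile_id`, `status`) and the `aggregate` helper are assumptions for illustration. The spec calls for StrEnum; `str, Enum` is used here only so the sketch also runs on Python versions before 3.11. The failure outcome is not derived from per-tile statuses, so it is out of scope for this helper.

```python
from dataclasses import dataclass
from enum import Enum


class IngestStatus(str, Enum):
    QUEUED = "queued"
    DUPLICATE = "duplicate"
    SUPERSEDED = "superseded"
    REJECTED = "rejected"


class UploadOutcome(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"


# Statuses that count as acknowledged and lead to mark_uploaded.
ACKED = frozenset(
    {IngestStatus.QUEUED, IngestStatus.DUPLICATE, IngestStatus.SUPERSEDED}
)


@dataclass(frozen=True)
class PerTileStatus:
    tile_id: str          # field name is an assumption
    status: IngestStatus  # field name is an assumption


def aggregate(per_tile):
    """Return (outcome, ids of tiles to mark uploaded) for parsed statuses."""
    acked_ids = tuple(t.tile_id for t in per_tile if t.status in ACKED)
    # An empty batch has nothing rejected, so it counts as success (AC-11).
    all_acked = len(acked_ids) == len(per_tile)
    outcome = UploadOutcome.SUCCESS if all_acked else UploadOutcome.PARTIAL
    return outcome, acked_ids
```

For the AC-14 batch (7 queued, 3 rejected) this yields the partial outcome together with the 7 tile ids that should be passed to mark_uploaded; the 3 rejected ids are simply absent from the tuple and stay pending.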

Non-Functional Requirements

Performance

  • Upload throughput ≥ 20 tile/s with signing (C11-PT-02); the bottleneck is the network plus signing per tile.
  • Per-tile signing ≤ 200 µs (Ed25519 from AZ-318); per-tile multipart construction ≤ 1 ms.

Compatibility

  • httpx per project pin; cryptography per project pin.
  • Multipart form encoding per httpx's files= parameter — no manual boundary construction.

Reliability

  • Try/finally ensures key_manager.end_session() runs in EVERY exit path including unexpected exceptions and KeyboardInterrupt.
  • The uploader writes to C6 ONLY via the AZ-303 Protocol (mark_uploaded); it does NOT touch the metadata table directly.
  • Concurrent invocations against the same cache_root are gated by C12's filesystem lockfile (same lock as TileDownloader); the uploader asserts the lock at construction.

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | 50-tile happy path | All mark_uploaded'd; outcome=success; FDR batch.complete present |
| AC-2 | Gate raises before any work | Zero spies fire on pending_uploads, POST, start_session |
| AC-3 | One signature rejection in a 5-tile batch | record_signature_rejection called once; rejected tile NOT marked uploaded; outcome=partial |
| AC-4 | Mix of duplicate and superseded responses | All marked uploaded; outcome=success |
| AC-5 | Successful upload | end_session called; _private_key is None |
| AC-6 | Mid-batch failure | end_session called; key zeroised |
| AC-7 | FDR stream order | key.public before any tile.* |
| AC-8 | 429 + Retry-After: 60 | Clock.sleep ≥ 60s; retry succeeds |
| AC-9 | 5x 503 | SatelliteProviderError; finally still ran |
| AC-10 | 401 first attempt | Fail-fast; no API-key in any log |
| AC-11 | Empty pending set | outcome=success; zero POSTs; key still session-started/ended |
| AC-12 | isinstance check on impl + partial fake | True / False |
| AC-13 | Property test: deterministic canonical bytes | Bitwise equal for N samples |
| AC-14 | Partial-success batch | No exception; outcome=partial; per-tile statuses correct |
| NFR-perf-throughput | 1000 tiles via fake httpx | ≥ 20 tile/s including signing |

Constraints

  • The signing canonical-bytes scheme is sha256(tile_blob || zoomLevel || latitude || longitude || capture_timestamp || flight_id || companion_id || quality_metadata_json); the parent suite's D-PROJ-2 ingest endpoint MUST agree on this scheme (the leftover file documents the contract sketch). Any divergence at the parent-suite side surfaces as signature_rejected and gets FDR-alerted.
  • The uploader does NOT modify the multipart payload's tile_blob — bytes go from C6 directly into the POST body.
  • The order of operations is gate → start_session → enumerate → batch loop → finally end_session. Reordering is a Reliability finding (High).
  • Concurrent C11 invocations are blocked by C12's lockfile; this task asserts the lock exists at construction.
  • This task introduces no new third-party dependencies beyond httpx and cryptography (already used in AZ-316 and AZ-318).
  • The companion_id field comes from config.c11.companion_id — not auto-detected, not derived from hostname; documented because the parent suite's voting layer relies on stable per-companion identifiers.
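
One way the canonical-bytes scheme from the first constraint might be realised is sketched below. The field order follows the spec; everything about how each field becomes bytes (decimal ASCII for the zoom level, `repr` of a float for coordinates, an ISO-8601 string for the timestamp, compact sorted-key JSON for the quality metadata) is an assumption, as is the function name. The real encoding has to be whatever the D-PROJ-2 contract pins down.

```python
import hashlib
import json


def canonical_payload_digest(tile_blob, zoom_level, latitude, longitude,
                             capture_timestamp, flight_id, companion_id,
                             quality_metadata):
    """SHA-256 over the concatenated fields, in the order given by the spec."""
    # Sorted keys and fixed separators remove dict-ordering and
    # whitespace drift from the JSON part (encoding is an assumption).
    quality_json = json.dumps(
        quality_metadata, sort_keys=True, separators=(",", ":")
    )
    parts = [
        tile_blob,                               # raw bytes, unmodified
        str(int(zoom_level)).encode("ascii"),    # assumed: decimal ASCII
        repr(float(latitude)).encode("ascii"),   # assumed: Python float repr
        repr(float(longitude)).encode("ascii"),  # assumed: Python float repr
        capture_timestamp.encode("utf-8"),       # assumed: ISO-8601 string
        flight_id.encode("utf-8"),
        companion_id.encode("utf-8"),
        quality_json.encode("utf-8"),
    ]
    return hashlib.sha256(b"".join(parts)).digest()
```

Plain concatenation carries no field boundaries, so two different field splits could in principle produce the same byte string; whether the contract adds delimiters or length prefixes is a question for the parent-suite side and is not decided here.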

Risks & Mitigation

Risk 1: Parent-suite ingest endpoint not yet implemented (D-PROJ-2)

  • Risk: Until satellite-provider ships the POST endpoint, every upload fails with 404.
  • Mitigation: The e2e-test mock-suite-sat-service fixture (under tests/fixtures/, owned by E-BBT) implements the planned POST contract. The C11 unit tests run against a fake httpx.Client; integration tests run against the mock fixture. Production switches to the real endpoint when it ships, with no code change in C11.

Risk 2: Signature canonical-bytes drift between C11 and parent suite

  • Risk: A subtle JSON-ordering or float-formatting drift produces signatures that don't verify on the parent side.
  • Mitigation: AC-13 property test asserts bitwise determinism on the C11 side; the leftover file documents the canonical scheme; the parent-suite team's Plan cycle will reuse the same scheme. If they diverge, signature_rejected surfaces immediately and the safety officer is alerted.

Risk 3: Retry-After parsing for HTTP-date format

  • Risk: The parent suite returns Retry-After: <date> not <seconds>; naïve parsing crashes.
  • Mitigation: Same as AZ-316 (TileDownloader Risk 1) — parse both forms; cap wait at config.c11.max_retry_after_s.
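
Parsing both Retry-After forms with a cap might look like the sketch below, using only the standard library. `parse_retry_after`, its parameters, and the fallback delay for unparseable values are assumptions; `max_wait_s` plays the role of config.c11.max_retry_after_s.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now, max_wait_s, default_s=1.0):
    """Return the number of seconds to wait, clamped to [0, max_wait_s].

    value: raw Retry-After header, either delta-seconds or an HTTP-date.
    now:   timezone-aware current time (injected so tests stay deterministic).
    """
    text = value.strip()
    if text.isascii() and text.isdigit():
        wait = float(text)
    else:
        try:
            when = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            # Unparseable header: fall back to a default instead of crashing.
            return min(default_s, max_wait_s)
        if when.tzinfo is None:
            # Treat a naive result as UTC (assumption).
            when = when.replace(tzinfo=timezone.utc)
        wait = (when - now).total_seconds()
    # A date in the past yields a negative delta; never sleep below zero.
    return max(0.0, min(wait, max_wait_s))
```

For example, with `now` set to 07:26:00 UTC on 21 Oct 2015, the header `Wed, 21 Oct 2015 07:28:00 GMT` yields 120.0 seconds, while `3600` with a 300-second cap yields 300.0.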

Risk 4: try/finally violation (key not zeroised on KeyboardInterrupt)

  • Risk: A KeyboardInterrupt during the batch loop bypasses the finally if poorly written.
  • Mitigation: The finally is unconditional (Python's try/finally runs for KeyboardInterrupt); a unit test injects KeyboardInterrupt mid-batch and asserts end_session ran.
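
The unit test mentioned in the mitigation could be shaped like this sketch. All function names are hypothetical; the point it illustrates is that KeyboardInterrupt derives from BaseException, so an `except Exception` clause would miss it, whereas a `finally` clause still runs.

```python
def upload_with_guaranteed_cleanup(end_session, batch_loop):
    """Run batch_loop and always call end_session afterwards."""
    try:
        batch_loop()
    finally:
        # Executes for normal return, Exception, and BaseException alike.
        end_session()


def interrupted_batch_loop():
    # Simulates Ctrl-C arriving in the middle of the batch loop.
    raise KeyboardInterrupt


def check_cleanup_on_interrupt():
    events = []
    try:
        upload_with_guaranteed_cleanup(
            lambda: events.append("end_session"), interrupted_batch_loop
        )
    except KeyboardInterrupt:
        # The interrupt still propagates to the caller after cleanup.
        events.append("interrupt propagated")
    return events
```

The recorded order shows that cleanup happens before the interrupt reaches the caller, which is the property the zeroisation guarantee depends on.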

Risk 5: Partial-success state inconsistency

  • Risk: A tile is marked uploaded in C6 but the parent suite later disputes (race between mark_uploaded and the safety officer's audit).
  • Mitigation: mark_uploaded records the batch_uuid (per AZ-303 contract); audits cross-reference batch_uuid + tile_id against the parent suite's ingest log. The race window is ≤ 1 s (the mark happens immediately after the per-tile response is parsed). The race is documented but not addressed in this task.

Runtime Completeness

  • Named capability: post-landing tile upload to D-PROJ-2 ingest endpoint, AC-8.4 enforcement, F10 functional flow, R09 mitigation via per-flight key (composed from AZ-318), parent-suite voting-layer enabler.
  • Production code that must exist: real HttpTileUploader orchestrating real httpx POSTs, real C6 mark_uploaded calls, real try/finally zeroisation, real composition-root factory, real config schema extension, real canonical-byte scheme.
  • Allowed external stubs: tests MAY use a fake httpx.Client, fake Clock, fake C6 stores (already provided by AZ-303's conformance fakes), fake FlightStateGate and PerFlightKeyManager (so this task's tests don't drag in AZ-317/AZ-318 internals); production wiring uses real all the way down.
  • Unacceptable substitutes: skipping the gate (defeats AC-8.4 defence-in-depth); silently retrying signature rejections without FDR (loses safety officer surface); reusing a static signing key (reintroduces R09); marking a tile uploaded before the parent suite acks (data integrity violation); manually building the multipart boundary (httpx's files= is the right interface).

Contract

This task produces/implements the contract at _docs/02_document/contracts/c11_tilemanager/tile_uploader.md. Consumers MUST read that file — not this task spec — to discover the interface.