[AZ-319] C11 HttpTileUploader (post-landing upload path)

Lands the production HttpTileUploader composing AZ-317's gate, AZ-318's
per-flight signing, and consumer-side cuts over c6 storage. Implements
the full upload flow: gate ON_GROUND -> start_session -> enumerate
pending -> per-batch multipart POST with Ed25519 signing -> mark_uploaded
on ack -> end_session in finally. Honours Retry-After (RFC 7231 int +
HTTP-date), exponential backoff on 5xx, fail-fast on TLS/401/403.

Adds C11Config block, three FDR kinds (tile.queued, tile.rejected,
batch.complete), and the build_tile_uploader composition-root factory.
Cross-component access to c6 stays Protocol-cut (AZ-507 / AZ-270).

Tests: 17 new unit tests covering AC-1..AC-14 plus throughput NFR; AZ-272
schema fixtures for the three new FDR kinds. Full unit suite: 1404 passed.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-13 06:13:36 +03:00
parent cde237e236
commit 610e8a743c
15 changed files with 2461 additions and 24 deletions
@@ -0,0 +1,250 @@
# C11 TileUploader — Read Pending + Sign + POST + Mark Uploaded
**Task**: AZ-319_c11_tile_uploader
**Name**: C11 TileUploader
**Description**: Implement the `TileUploader` Protocol — C11's operator-side post-landing upload path. `upload_pending_tiles` calls AZ-317's `FlightStateGate.confirm_on_ground()` first, starts an AZ-318 signing session, reads pending mid-flight tiles from C6 (`source = onboard_ingest`, `voting_status = pending`) via the AZ-303 metadata store, packages each tile per the D-PROJ-2 multipart contract sketch (tile_blob, geo metadata, capture_timestamp, flight_id, companion_id, quality_metadata, signature), signs each payload, POSTs to `/api/satellite/tiles/ingest`, parses the per-tile response, and marks acknowledged tiles uploaded in C6. Honours `Retry-After` on 429s; fails fast on TLS / auth; surfaces `signature_rejected` per tile via FDR. The signing key is zeroised in a try/finally guarantee. Idempotent-retry across partial-success batches is a separate decorator task in this epic.
**Complexity**: 5 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-273_fdr_client_ringbuf, AZ-303_c6_storage_interfaces, AZ-305_c6_postgres_filesystem_store, AZ-317_c11_flight_state_gate, AZ-318_c11_signing_key
**Component**: c11_tilemanager (epic AZ-251 / E-C11)
**Tracker**: AZ-319
**Epic**: AZ-251 (E-C11)
### Document Dependencies
- `_docs/02_document/contracts/c11_tilemanager/tile_uploader.md` — produced by this task (frozen Protocol + DTO shape, invariants, test cases).
- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md` — consumed: `pending_uploads`, `mark_uploaded`, `get_by_id`.
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md` — consumed: `read_tile_pixels` for the multipart blob.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md` — INFO/WARN/ERROR log shapes for upload events.
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md` — `kind="c11.upload.tile.queued"` / `kind="c11.upload.tile.rejected"` / `kind="c11.upload.batch.complete"` envelopes.
- `_docs/02_document/components/12_c11_tilemanager/description.md` — § 3.2 D-PROJ-2 contract sketch, § 5 error handling.
- `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md` — D-PROJ-2 design task #1 ingest endpoint shape.
## Problem
Without a real `TileUploader`:
- AC-8.4 (post-landing upload of mid-flight tiles to the parent suite) collapses — the pending-upload journal grows unboundedly across flights.
- D-PROJ-2's safety-officer correlation cannot work — the public-key + tile-id linkage exists only at upload time.
- The AC-NEW-7 voting / trust layer (parent-suite side) has no inputs — without uploads, no flights ever vote.
- Mid-flight tile generation (E-C13 mid-flight tile snapshot, AZ-294) becomes a leaf system: tiles land in C6 with `voting_status = pending` and stay there forever.
- `SignatureRejectedError` from the parent suite has no detection path; a key compromise would not surface to the safety officer until manual log inspection.
- Operators have no observable post-landing operation; the F10 functional flow has no implementation.
This task delivers the production uploader. It composes AZ-317 (gate) + AZ-318 (signing) + AZ-303/305 (C6) + httpx; it adds no new responsibilities beyond orchestration, so the surface area is tight.
## Outcome
- A `TileUploader` Protocol + concrete `HttpTileUploader` class at `src/gps_denied_onboard/components/c11_tilemanager/`:
  - `interface.py` exposes `TileUploader` Protocol (`runtime_checkable`).
  - `tile_uploader.py` houses `HttpTileUploader`.
  - `_types.py` adds `UploadRequest`, `UploadBatchReport`, `PerTileStatus`, `UploadOutcome` (StrEnum), `IngestStatus` (StrEnum) — all `@dataclass(frozen=True)` for the data DTOs.
  - `errors.py` adds `SignatureRejectedError` (subclasses `TileManagerError`); `FlightStateNotOnGroundError` and the rest are already declared in AZ-317/AZ-318/AZ-316.
- Constructor signature:
  `__init__(self, *, http_client: httpx.Client, tile_store: TileStore, tile_metadata_store: TileMetadataStore, flight_state_gate: FlightStateGate, key_manager: PerFlightKeyManager, fdr_client: FdrClient, logger: Logger, clock: Clock, config: C11Config)`. Injected dependencies — no module-level singletons.
- `upload_pending_tiles(request)` flow:
  1. Calls `flight_state_gate.confirm_on_ground()` (raises if not ON_GROUND; ZERO state-mutation prior to this).
  2. Calls `key_manager.start_session(flight_id_for_session)`, where `flight_id_for_session` is `request.flight_id` if provided, else `uuid.uuid4()` ("session id" for the multi-flight case).
  3. In a `try` block:
     - Calls `tile_metadata_store.pending_uploads(flight_id=request.flight_id)` to enumerate pending tiles.
     - If empty → returns `UploadBatchReport(outcome=success, per_tile_status=(), batch_uuid=uuid4())`.
     - Splits the pending list into batches of `request.batch_size`.
     - For each batch:
       - Reads each tile's pixel bytes via `tile_store.read_tile_pixels(tile_id)`.
       - Builds the multipart payload per tile: `tile_blob`, `zoomLevel`, `latitude`, `longitude`, `tile_size_meters`, `tile_size_pixels`, `capture_timestamp`, `flight_id`, `companion_id`, `quality_metadata` (JSON), `signature` (`key_manager.sign(canonical_payload_bytes)`).
       - Canonical payload bytes for signing: SHA-256 of `tile_blob || zoomLevel || latitude || longitude || capture_timestamp || flight_id || companion_id || quality_metadata_json` (deterministic byte concatenation; documented).
       - POSTs the multipart to `{config.satellite_provider_url}/api/satellite/tiles/ingest`.
       - On 202: parses `batch_uuid` + `per_tile_status[]` from the response body. For each `queued | duplicate | superseded` tile, calls `tile_metadata_store.mark_uploaded(tile_id, batch_uuid)`. For each `rejected` tile, calls `key_manager.record_signature_rejection(flight_id, tile_id)` if the rejection reason mentions signature; emits FDR `kind="c11.upload.tile.rejected"` with the reason regardless.
       - On 429: honours `Retry-After`; on persistent 429 → `RateLimitedError`.
       - On 5xx: exponential backoff doubling from 1s (1s, 2s, 4s, 8s; 4 retries max, i.e. 5 attempts in total); persistent → `SatelliteProviderError`.
       - On TLS / 401 / 403: fail fast → `SatelliteProviderError`.
     - Aggregates `UploadBatchReport`:
       - `outcome = success` if ALL tiles are `queued | duplicate | superseded`.
       - `outcome = partial` if any `rejected` OR any unparseable response with otherwise-acked tiles.
       - `outcome = failure` if the gate blocked, the API key was invalid, or zero tiles could be POSTed.
       - `public_key_fingerprint` = the AZ-318 fingerprint from `start_session`.
       - `batch_uuid` = the LAST successful batch's UUID (or `uuid4()` if none succeeded; documented).
  4. In a `finally` block:
     - Calls `key_manager.end_session()` — guaranteed zeroisation regardless of success / failure / exception.
     - Emits FDR `kind="c11.upload.batch.complete"` with `{flight_id_for_session, public_key_fingerprint, total_attempted, total_queued, total_rejected, outcome, observed_at_iso}`.
- `enumerate_pending_tiles(flight_id)` returns `tile_metadata_store.pending_uploads(flight_id)` directly (read-only enumeration).
- `confirm_flight_state()` returns `flight_state_gate.confirm_on_ground()` (passes through; raises on non-ON_GROUND).
- INFO log on session start/end with batch counts; WARN log per retry; ERROR log on `SatelliteProviderError`, `FlightStateNotOnGroundError` (caught and re-raised after log).
- Composition root constructs `HttpTileUploader` via `build_tile_uploader(config) -> TileUploader` at `src/gps_denied_onboard/runtime_root/c11_factory.py`.
- Configuration extension to AZ-269 loader: `config.c11.satellite_provider_ingest_url`, `config.c11.upload_batch_size`, `config.c11.upload_http_timeout_s`, `config.c11.companion_id`.
- Type-only conformance test verifies `isinstance(HttpTileUploader(...), TileUploader)`.
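
The ordering of the `upload_pending_tiles` flow (gate, then `start_session`, then the `try` body, then `end_session` in `finally`) can be illustrated with a minimal sketch. `FakeGate`, `FakeKeyManager` and the `run_batches` callable are hypothetical stand-ins invented for this example; the real collaborators come from AZ-317 and AZ-318, and the real method takes an `UploadRequest`.

```python
from typing import Callable, List


class FlightStateNotOnGroundError(Exception):
    """Stand-in for the AZ-317 error of the same name."""


class FakeGate:
    """Hypothetical gate: raises unless constructed with on_ground=True."""

    def __init__(self, on_ground: bool) -> None:
        self.on_ground = on_ground

    def confirm_on_ground(self) -> None:
        if not self.on_ground:
            raise FlightStateNotOnGroundError("IN_FLIGHT")


class FakeKeyManager:
    """Hypothetical key manager that only records which calls it received."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def start_session(self, flight_id: str) -> None:
        self.calls.append("start_session")

    def end_session(self) -> None:
        self.calls.append("end_session")


def upload_pending_tiles(gate, key_manager, flight_id: str,
                         run_batches: Callable[[], str]) -> str:
    # Step 1: the gate runs first; if it raises, nothing below executes.
    gate.confirm_on_ground()
    # Step 2: the signing session starts only after the gate has passed.
    key_manager.start_session(flight_id)
    try:
        # Step 3: enumerate + batch loop, collapsed into one callable here.
        return run_batches()
    finally:
        # Step 4: the session is always ended, whatever step 3 did.
        key_manager.end_session()
```

Because `start_session` sits outside the `try`, a blocked gate leaves the key manager untouched, which is the behaviour AC-2 asks for.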
## Scope
### Included
- `TileUploader` Protocol declaration + `HttpTileUploader` concrete class.
- `UploadRequest`, `UploadBatchReport`, `PerTileStatus`, `UploadOutcome`, `IngestStatus` DTOs.
- `SignatureRejectedError` definition (subclass of `TileManagerError`).
- The orchestration: gate → start_session → enumerate → batch loop → mark_uploaded / FDR alert → end_session.
- Multipart payload construction + canonical bytes for signing.
- HTTP retry / backoff / `Retry-After` handling for the upload path.
- Composition-root factory `build_tile_uploader`.
- Config schema extension for the C11 upload fields.
- Conformance test at `tests/unit/c11_tilemanager/test_protocol_conformance.py`.
### Excluded
- The `TileDownloader` Protocol and concrete impl — separate task (AZ-316).
- `FlightStateGate` impl — owned by AZ-317.
- `PerFlightKeyManager` impl — owned by AZ-318.
- Idempotent-retry-on-partial-success batch decorator — separate task in this epic (AZ-320_c11_idempotent_retry).
- The R02 ADR-004 build-time exclusion — owned by E-BOOT.
- The pre-flight key enrolment workflow at C12 — owned by E-C12.
- The `mock-suite-sat-service` fixture under `tests/fixtures/` — owned by E-BBT (test infrastructure).
- Voting / trust promotion — owned by D-PROJ-2 / `satellite-provider`.
- E-C8's `FlightStateSource` impl — owned by E-C8 (AZ-261).
## Acceptance Criteria
**AC-1: Happy path uploads all pending tiles**
Given 50 pending tiles in C6, ON_GROUND, parent suite returns 202 with all `queued`
When `upload_pending_tiles(request)` is called
Then all 50 tiles are POSTed (one POST per tile, or batched per `batch_size`); all 50 are marked `uploaded` in C6 (verifiable via `mark_uploaded` spy); `UploadBatchReport.outcome = success`; one FDR `kind="c11.upload.batch.complete"` is emitted with `total_attempted=50, total_queued=50`
**AC-2: Flight-state gate blocks before any read or POST**
Given `FlightStateGate.confirm_on_ground()` raises `FlightStateNotOnGroundError(IN_FLIGHT)`
When `upload_pending_tiles(request)` is called
Then `FlightStateNotOnGroundError` is raised; ZERO calls to `pending_uploads` (verifiable via spy); ZERO HTTP POSTs; ZERO calls to `key_manager.start_session` (key generation is also gated); `key_manager.end_session()` is NOT called (no session was started)
**AC-3: Signature rejection per tile is FDR'd and not marked uploaded**
Given parent suite returns `rejected` for 1 tile with reason `"invalid signature"`
When the response is parsed
Then `key_manager.record_signature_rejection(flight_id, tile_id)` is called once; `tile_metadata_store.mark_uploaded` is NOT called for that tile; the tile remains `voting_status = pending`; FDR `kind="c11.upload.tile.rejected"` is emitted with the reason; report's `outcome = partial`
**AC-4: `duplicate` and `superseded` are treated as success**
Given parent suite returns `duplicate` for 5 tiles and `superseded` for 3 tiles
When the response is parsed
Then all 8 are `mark_uploaded`'d in C6 with the batch_uuid; report's per_tile_status reflects the original status; `outcome = success` if no `rejected`
**AC-5: Signing key is zeroised on success**
Given a successful upload
When `upload_pending_tiles` returns
Then `key_manager.end_session()` was called once (verifiable via spy); the AZ-318 manager's `_private_key is None`
**AC-6: Signing key is zeroised on failure**
Given the FIRST POST raises a connection-reset error
When `upload_pending_tiles` raises `SatelliteProviderError`
Then `key_manager.end_session()` was called (try/finally executed); the manager's `_private_key is None`; the partial state in C6 is consistent (no half-marked tiles)
**AC-7: Public-key FDR record precedes any tile FDR**
Given a session with at least one tile
When the FDR stream is captured
Then `kind="c11.upload.session.key.public"` is observed BEFORE any `kind="c11.upload.tile.*"` record
**AC-8: 429 honours Retry-After**
Given parent suite returns 429 with `Retry-After: 60` on the first POST
When the uploader processes the response
Then `Clock.sleep` is called with ≥ 60s; on success the run proceeds; the report includes `retry_count >= 1`
**AC-9: Persistent 5xx aborts with structured error**
Given parent suite returns 503 for 5 consecutive attempts
When the uploader exhausts retries
Then `SatelliteProviderError` is raised; the report is NOT returned (the exception propagates); `key_manager.end_session()` was called via finally
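
A minimal sketch of the 5xx retry loop behind AC-9, assuming the delay doubles from 1 s on each retry. With four retries that yields waits of 1 s, 2 s, 4 s and 8 s; the last value is an extrapolation, since the spec names only the first three. `post_with_backoff`, `send` and `sleep` are hypothetical names; in the real uploader the sleep would go through the injected `Clock`.

```python
from typing import Callable


class SatelliteProviderError(Exception):
    """Stand-in for the C11 error of the same name."""


def post_with_backoff(
    send: Callable[[], int],
    sleep: Callable[[float], None],
    max_retries: int = 4,
    base_delay_s: float = 1.0,
) -> int:
    """Call send() until it returns a non-5xx status or retries run out."""
    for attempt in range(max_retries + 1):
        status = send()
        if not 500 <= status < 600:
            # Anything other than 5xx is handed back to the caller.
            return status
        if attempt < max_retries:
            # Assumed schedule: the delay doubles on every retry.
            sleep(base_delay_s * (2 ** attempt))
    raise SatelliteProviderError(f"still 5xx after {max_retries + 1} attempts")
```

Passing `sleep` in as a callable lets a test substitute a recording fake so that no real time elapses.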
**AC-10: TLS / 401 / 403 fail fast**
Given the first POST returns 401
When the uploader processes the response
Then `SatelliteProviderError` is raised on the first attempt; zero retries; the public key is NOT logged; the API key (if one is sent in an auth header) is NOT logged
**AC-11: Empty pending set is success with no POSTs**
Given zero pending tiles in C6
When `upload_pending_tiles(request)` is called
Then `outcome = success`; `per_tile_status` is empty; `key_manager.start_session` was called (signature still required by D-PROJ-2 for the empty-batch ack record per § 3.2; documented); `end_session` was called; ONE FDR `c11.upload.batch.complete` with `total_attempted=0`
**AC-12: Conformance — concrete impl satisfies Protocol**
Given an `HttpTileUploader` instance
When `isinstance(impl, TileUploader)` is checked under `runtime_checkable`
Then the result is `True`; a fake omitting `confirm_flight_state` returns `False`
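
The conformance check in AC-12 can be sketched as follows. The three method names come from this spec, but the parameter lists are assumptions; the frozen signatures live in the contract file. `CompleteFake` and `PartialFake` are hypothetical test doubles.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TileUploader(Protocol):
    def upload_pending_tiles(self, request): ...
    def enumerate_pending_tiles(self, flight_id): ...
    def confirm_flight_state(self): ...


class CompleteFake:
    def upload_pending_tiles(self, request):
        return None

    def enumerate_pending_tiles(self, flight_id):
        return ()

    def confirm_flight_state(self):
        return None


class PartialFake:
    # confirm_flight_state is deliberately missing.
    def upload_pending_tiles(self, request):
        return None

    def enumerate_pending_tiles(self, flight_id):
        return ()


complete_ok = isinstance(CompleteFake(), TileUploader)
partial_ok = isinstance(PartialFake(), TileUploader)
```

Note that `runtime_checkable` only verifies that the named methods exist; it does not compare signatures or return types, so this test guards against missing methods rather than against signature drift.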
**AC-13: Canonical signing bytes are deterministic**
Given the same tile metadata + tile bytes
When `_canonical_payload_bytes(tile)` is computed twice
Then the two byte strings are bitwise identical (no map ordering, no JSON whitespace drift); the SHA-256 over them matches; this is asserted via property test with N random tiles
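
One way the determinism in AC-13 could be achieved is sketched below. The field order follows the scheme stated in this spec; the per-field encodings (decimal ASCII for the zoom level, `repr` for floats, UTF-8 for strings, compact sorted-key JSON) are assumptions made for illustration and would have to match whatever the D-PROJ-2 contract pins down.

```python
import hashlib
import json


def canonical_payload_bytes(
    tile_blob: bytes,
    zoom_level: int,
    latitude: float,
    longitude: float,
    capture_timestamp: str,
    flight_id: str,
    companion_id: str,
    quality_metadata: dict,
) -> bytes:
    # Sorted keys and fixed separators remove dict-ordering and whitespace drift.
    quality_json = json.dumps(quality_metadata, sort_keys=True, separators=(",", ":"))
    parts = [
        tile_blob,
        str(zoom_level).encode("ascii"),
        repr(latitude).encode("ascii"),   # assumed float encoding
        repr(longitude).encode("ascii"),  # assumed float encoding
        capture_timestamp.encode("utf-8"),
        flight_id.encode("utf-8"),
        companion_id.encode("utf-8"),
        quality_json.encode("utf-8"),
    ]
    # SHA-256 over the plain concatenation, as the scheme describes.
    return hashlib.sha256(b"".join(parts)).digest()
```

Two calls that differ only in the insertion order of `quality_metadata` keys produce the same 32-byte digest, which is the property the test asserts.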
**AC-14: Partial-success batches return without raising**
Given a 10-tile batch where 7 are `queued`, 3 are `rejected`
When `upload_pending_tiles` returns
Then NO exception is raised; `outcome = partial`; `per_tile_status` has all 10 entries with their respective statuses; the 7 acked tiles are marked uploaded in C6; the 3 rejected stay pending
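
The success/partial rule used by AC-4 and AC-14 can be sketched as a small pure function. The enums are stand-ins for the `IngestStatus` and `UploadOutcome` StrEnums named in this spec (written as `str, Enum` so the sketch also runs on older Python versions), and `tiles_to_mark_uploaded` is a hypothetical helper. The `failure` outcome is not modelled here.

```python
from enum import Enum
from typing import Dict, Iterable, List


class IngestStatus(str, Enum):
    QUEUED = "queued"
    DUPLICATE = "duplicate"
    SUPERSEDED = "superseded"
    REJECTED = "rejected"


class UploadOutcome(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"


# Statuses the parent suite has acknowledged; these count as uploaded.
ACKED = frozenset(
    {IngestStatus.QUEUED, IngestStatus.DUPLICATE, IngestStatus.SUPERSEDED}
)


def aggregate_outcome(statuses: Iterable[IngestStatus]) -> UploadOutcome:
    # An empty input is a success, matching the empty-pending-set case.
    if all(status in ACKED for status in statuses):
        return UploadOutcome.SUCCESS
    return UploadOutcome.PARTIAL


def tiles_to_mark_uploaded(per_tile: Dict[str, IngestStatus]) -> List[str]:
    # Only acknowledged tiles are marked; rejected tiles stay pending.
    return [tile_id for tile_id, status in per_tile.items() if status in ACKED]
```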
## Non-Functional Requirements
**Performance**
- Upload throughput ≥ 20 tiles/s with signing (C11-PT-02); the bottleneck is the network plus signing per tile.
- Per-tile signing ≤ 200 µs (Ed25519 from AZ-318); per-tile multipart construction ≤ 1 ms.
**Compatibility**
- `httpx` per project pin; `cryptography` per project pin.
- Multipart form encoding per `httpx`'s `files=` parameter — no manual boundary construction.
**Reliability**
- Try/finally ensures `key_manager.end_session()` runs in EVERY exit path including unexpected exceptions and KeyboardInterrupt.
- The uploader writes to C6 ONLY via the AZ-303 Protocol (`mark_uploaded`); it does NOT touch the metadata table directly.
- Concurrent invocations against the same `cache_root` are gated by C12's filesystem lockfile (same lock as TileDownloader); the uploader asserts the lock at construction.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | 50-tile happy path | All `mark_uploaded`'d; `outcome=success`; FDR batch.complete present |
| AC-2 | Gate raises before any work | Zero spies fire on `pending_uploads`, POST, `start_session` |
| AC-3 | One signature rejection in a 5-tile batch | `record_signature_rejection` called once; rejected tile NOT marked uploaded; outcome=partial |
| AC-4 | Mix of `duplicate` and `superseded` responses | All marked uploaded; outcome=success |
| AC-5 | Successful upload | `end_session` called; `_private_key is None` |
| AC-6 | Mid-batch failure | `end_session` called; key zeroised |
| AC-7 | FDR stream order | `key.public` before any `tile.*` |
| AC-8 | 429 + Retry-After: 60 | `Clock.sleep` ≥ 60s; retry succeeds |
| AC-9 | 5x 503 | `SatelliteProviderError`; finally still ran |
| AC-10 | 401 first attempt | Fail-fast; no API-key in any log |
| AC-11 | Empty pending set | outcome=success; zero POSTs; key still session-started/ended |
| AC-12 | `isinstance` check on impl + partial fake | True / False |
| AC-13 | Property test: deterministic canonical bytes | Bitwise equal for N samples |
| AC-14 | Partial-success batch | No exception; outcome=partial; per-tile statuses correct |
| NFR-perf-throughput | 1000 tiles via fake httpx | ≥ 20 tiles/s including signing |
## Constraints
- The signing canonical-bytes scheme is `sha256(tile_blob || zoomLevel || latitude || longitude || capture_timestamp || flight_id || companion_id || quality_metadata_json)`; the parent suite's D-PROJ-2 ingest endpoint MUST agree on this scheme (the leftover file documents the contract sketch). Any divergence at the parent-suite side surfaces as `signature_rejected` and gets FDR-alerted.
- The uploader does NOT modify the multipart payload's tile_blob — bytes go from C6 directly into the POST body.
- The order of operations is gate → start_session → enumerate → batch loop → finally end_session. Reordering is a Reliability finding (High).
- Concurrent C11 invocations are blocked by C12's lockfile; this task asserts the lock exists at construction.
- This task introduces no new third-party dependencies beyond `httpx` and `cryptography` (already used in AZ-316 and AZ-318).
- The `companion_id` field comes from `config.c11.companion_id` — not auto-detected, not derived from hostname; documented because the parent suite's voting layer relies on stable per-companion identifiers.
## Risks & Mitigation
**Risk 1: Parent-suite ingest endpoint not yet implemented (D-PROJ-2)**
- *Risk*: Until `satellite-provider` ships the POST endpoint, every upload fails with 404.
- *Mitigation*: The e2e-test `mock-suite-sat-service` fixture (under `tests/fixtures/`, owned by E-BBT) implements the planned POST contract. The C11 unit tests run against a fake `httpx.Client`; integration tests run against the mock fixture. Production switches to the real endpoint when it ships; no code change in C11.
**Risk 2: Signature canonical-bytes drift between C11 and parent suite**
- *Risk*: A subtle JSON-ordering or float-formatting drift produces signatures that don't verify on the parent side.
- *Mitigation*: AC-13 property test asserts bitwise determinism on the C11 side; the leftover file documents the canonical scheme; the parent-suite team's Plan cycle will reuse the same scheme. If they diverge, `signature_rejected` surfaces immediately and the safety officer is alerted.
**Risk 3: `Retry-After` parsing for HTTP-date format**
- *Risk*: The parent suite returns `Retry-After: <date>` not `<seconds>`; naïve parsing crashes.
- *Mitigation*: Same as AZ-316 (TileDownloader Risk 1) — parse both forms; cap wait at `config.c11.max_retry_after_s`.
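
A minimal sketch of parsing both `Retry-After` forms with the standard library, assuming the cap is passed in as `max_wait_s` (standing in for `config.c11.max_retry_after_s`). The function name and the choice to return `None` for an unparseable header are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    header_value: str, now: datetime, max_wait_s: float
) -> Optional[float]:
    """Return the wait in seconds, capped, or None if the header is unparseable."""
    value = header_value.strip()
    if value.isascii() and value.isdigit():
        # Delta-seconds form, e.g. "60".
        wait = float(value)
    else:
        try:
            # HTTP-date form, e.g. "Wed, 13 May 2026 03:14:36 GMT".
            target = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None
        if target.tzinfo is None:
            # Treat a naive result as UTC so the subtraction is well defined.
            target = target.replace(tzinfo=timezone.utc)
        wait = (target - now).total_seconds()
    # Never wait a negative time, never wait longer than the cap.
    return max(0.0, min(wait, max_wait_s))
```

Taking `now` as a parameter keeps the function pure, so a test can pin the reference time instead of depending on the wall clock.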
**Risk 4: try/finally violation (key not zeroised on `KeyboardInterrupt`)**
- *Risk*: A `KeyboardInterrupt` during the batch loop bypasses the finally if poorly written.
- *Mitigation*: The finally is unconditional (Python's `try/finally` runs for `KeyboardInterrupt`); a unit test injects `KeyboardInterrupt` mid-batch and asserts `end_session` ran.
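
The unit test described in this mitigation could look roughly like the sketch below. `SpyKeyManager` and `run_with_interrupt` are hypothetical names, and the batch loop is reduced to a single `raise`.

```python
class SpyKeyManager:
    """Hypothetical spy that records whether end_session ran."""

    def __init__(self) -> None:
        self.ended = False

    def end_session(self) -> None:
        self.ended = True


def run_with_interrupt(key_manager: SpyKeyManager) -> None:
    try:
        # Simulates Ctrl-C arriving in the middle of the batch loop.
        raise KeyboardInterrupt
    finally:
        key_manager.end_session()


def interrupt_still_ends_session() -> bool:
    spy = SpyKeyManager()
    try:
        run_with_interrupt(spy)
    except KeyboardInterrupt:
        # The interrupt still propagates; only the cleanup is under test.
        pass
    return spy.ended
```

This covers exceptions raised inside the interpreter. A process that is killed outright (for example by `SIGKILL`) or exits via `os._exit` never reaches the `finally` block, so that case is outside what this test can show.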
**Risk 5: Partial-success state inconsistency**
- *Risk*: A tile is marked `uploaded` in C6 but the parent suite later disputes it (race between `mark_uploaded` and the safety officer's audit).
- *Mitigation*: `mark_uploaded` records the `batch_uuid` (per AZ-303 contract); audits cross-reference `batch_uuid` + `tile_id` against the parent suite's ingest log. The race window is ≤ 1 s (the mark happens immediately after the per-tile response is parsed). Documented; not addressed in this task.
## Runtime Completeness
- **Named capability**: post-landing tile upload to D-PROJ-2 ingest endpoint, AC-8.4 enforcement, F10 functional flow, R09 mitigation via per-flight key (composed from AZ-318), parent-suite voting-layer enabler.
- **Production code that must exist**: real `HttpTileUploader` orchestrating real `httpx` POSTs, real C6 `mark_uploaded` calls, real `try/finally` zeroisation, real composition-root factory, real config schema extension, real canonical-byte scheme.
- **Allowed external stubs**: tests MAY use a fake `httpx.Client`, fake `Clock`, fake C6 stores (already provided by AZ-303's conformance fakes), fake `FlightStateGate` and `PerFlightKeyManager` (so this task's tests don't drag in AZ-317/AZ-318 internals); production wiring uses real implementations all the way down.
- **Unacceptable substitutes**: skipping the gate (defeats AC-8.4 defence-in-depth); silently retrying signature rejections without FDR (loses safety officer surface); reusing a static signing key (reintroduces R09); marking a tile uploaded before the parent suite acks (data integrity violation); manually building the multipart boundary (`httpx`'s `files=` is the right interface).
## Contract
This task produces/implements the contract at `_docs/02_document/contracts/c11_tilemanager/tile_uploader.md`.
Consumers MUST read that file — not this task spec — to discover the interface.