[AZ-319] C11 HttpTileUploader (post-landing upload path)

Lands the production HttpTileUploader composing AZ-317's gate, AZ-318's
per-flight signing, and consumer-side cuts over c6 storage. Implements
the full upload flow: gate ON_GROUND -> start_session -> enumerate
pending -> per-batch multipart POST with Ed25519 signing -> mark_uploaded
on ack -> end_session in finally. Honours Retry-After (RFC 7231 int +
HTTP-date), exponential backoff on 5xx, fail-fast on TLS/401/403.

Adds C11Config block, three FDR kinds (tile.queued, tile.rejected,
batch.complete), and the build_tile_uploader composition-root factory.
Cross-component access to c6 stays Protocol-cut (AZ-507 / AZ-270).

Tests: 17 new unit tests covering AC-1..AC-14 plus throughput NFR; AZ-272
schema fixtures for the three new FDR kinds. Full unit suite: 1404 passed.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-13 06:13:36 +03:00
parent cde237e236
commit 610e8a743c
15 changed files with 2461 additions and 24 deletions
@@ -0,0 +1,150 @@
# Batch 39 — Cycle 1 Report
**Date**: 2026-05-13
**Batch**: 39 (single-task batch — C11 upload orchestrator)
**Tasks**:
- AZ-319 (C11 TileUploader, 5pt)
**Total complexity**: 5pt
**Status**: complete; pending transition to "In Testing".
## Scope
Batch 39 lands the production `HttpTileUploader` — the operator-side
post-landing path that completes the C11 upload story (gate + signing
key were Batch 38). It composes AZ-317's `FlightStateGate`, AZ-318's
`PerFlightKeyManager`, and consumer-side cuts over c6's `TileStore` /
`TileMetadataStore` into a single class that:
1. Gates on `ON_GROUND` BEFORE any C6 read or HTTP egress
2. Starts an AZ-318 signing session (deterministic per-flight Ed25519)
3. Enumerates pending tiles from c6 (`source = onboard_ingest`,
`voting_status = pending`), batched to `request.batch_size`
4. Per batch: reads pixel bytes, computes the canonical signing
payload (SHA-256 over `tile_blob || zoom || lat || lon ||
capture_ts_ns || flight_id || companion_id || quality_json`),
signs it, packages a multipart POST per the D-PROJ-2 contract
sketch, and submits to `/api/satellite/tiles/ingest`
5. Honours `Retry-After` on 429s (RFC 7231 integer-seconds AND
HTTP-date forms), backs off exponentially on 5xx (1s/2s/4s/8s),
fails fast on TLS / 401 / 403
6. Marks acked tiles `uploaded` in c6 (one stamp per acknowledged
tile, with the per-batch `batch_uuid` as the audit correlation key)
7. Surfaces per-tile signature rejections through
`key_manager.record_signature_rejection` AND a dedicated
`c11.upload.tile.rejected` FDR record
8. Always calls `key_manager.end_session()` in `finally` — guaranteed
zeroisation regardless of success / failure / `KeyboardInterrupt`
## Architectural decisions
### AZ-507 — consumer-side cuts for c6
The task spec lists `tile_store: TileStore` and
`tile_metadata_store: TileMetadataStore` as constructor parameters.
A direct `from gps_denied_onboard.components.c6_tile_cache import …`
would violate AZ-507 (cross-component imports forbidden) and trip
the AZ-270 lint. Instead, `tile_uploader.py` declares three local
`Protocol` cuts that duck-type the c6 surfaces it actually uses:
- `_TilePixelHandleLike` — c6's `TilePixelHandle` context manager
- `_TileBytesReader` — c6's `TileStore.read_tile_pixels(tile_id)`
- `_PendingMetadataReader` — c6's `TileMetadataStore.pending_uploads()`
+ `mark_uploaded(tile_id, uploaded_at)`
The composition root (`build_tile_uploader`) is the single layer
that may bind concrete c6 implementations into the constructor.
This pattern is documented in
`_docs/02_document/module-layout.md` Rule 9 and was already used
for `FlightStateSource` in AZ-317.
### Sleep injection vs. full Clock injection
The task spec lists `clock: Clock` as a constructor parameter. The
uploader only ever needs a sleep primitive (for 429 / 5xx backoff),
never `monotonic_ns` or `time_ns`. Threading the full `Clock`
Protocol through would carry payload the class never reads.
Implementation accepts a `sleep: Callable[[float], None]` defaulting
to a `WallClock`-routed helper, which preserves the AZ-398 invariant
that `components/` never calls `time.sleep` directly. Documented in
the batch review as F2 (Low).
### FDR key naming
The three new `KNOWN_PAYLOAD_KEYS` entries
(`c11.upload.tile.queued`, `c11.upload.tile.rejected`,
`c11.upload.batch.complete`) carry consistent correlation keys
(`flight_id`, `fingerprint`, `batch_uuid`, `observed_at_iso`)
across all three records, so an auditor can join per-tile events
to the batch summary and back to the `c11.upload.session.key.public`
record from Batch 38. Per-tile records also carry the `IngestStatus`
enum value as `status` for fast filtering.
### Failure paths raise vs. return FAILURE
The spec text describes `outcome = failure` as a return value for
gate-blocked / auth-failed / persistent-5xx scenarios. The
implementation raises (`FlightStateNotOnGroundError`,
`SatelliteProviderError`, `RateLimitedError`) instead and the
`finally` emitter writes `outcome = failure` into the FDR
`c11.upload.batch.complete` record. AC-2, AC-9, AC-10 all assert
the raise behaviour, so the spec text drift is documented in the
batch review (F1, Low) without code change.
## Files touched
Production:
- `src/gps_denied_onboard/components/c11_tile_manager/_types.py`
(added `IngestStatus`, `UploadOutcome`, `UploadRequest`,
`PerTileStatus`, `UploadBatchReport`)
- `src/gps_denied_onboard/components/c11_tile_manager/errors.py`
(added `SatelliteProviderError`, `RateLimitedError`)
- `src/gps_denied_onboard/components/c11_tile_manager/config.py` (new)
- `src/gps_denied_onboard/components/c11_tile_manager/interface.py`
(`TileUploader` Protocol now has the real signature)
- `src/gps_denied_onboard/components/c11_tile_manager/tile_uploader.py` (new)
- `src/gps_denied_onboard/components/c11_tile_manager/__init__.py`
(re-exports + `register_component_block`)
- `src/gps_denied_onboard/runtime_root/c11_factory.py`
(added `build_tile_uploader`)
- `src/gps_denied_onboard/fdr_client/records.py`
(3 new `KNOWN_PAYLOAD_KEYS` entries)
Tests:
- `tests/unit/c11_tile_manager/test_tile_uploader.py` (new — 15 tests)
- `tests/unit/c11_tile_manager/test_protocol_conformance.py` (new — 2 tests)
- `tests/unit/test_az272_fdr_record_schema.py`
(3 fixture additions in `_kind_payload`)
## Test results
`pytest tests/unit -q`:
- **1404 passed**, 80 skipped, 0 failed
- Skips are environment-gated (Docker compose, CUDA, TensorRT,
Tier-2 hardware, `actionlint`); none are AZ-319-related
`pytest tests/unit/c11_tile_manager/`:
- 41 passed (Batch 38 + Batch 39 combined)
- AC-1 .. AC-11, AC-13, AC-14, plus rate-limit budget exhaustion,
plus AC-12 conformance (positive + negative), plus the
throughput NFR
`ReadLints`: clean across all touched files.
## Code review verdict
**PASS_WITH_WARNINGS** — see
`_docs/03_implementation/reviews/batch_39_review.md`. Four Low
findings, all documentation-level (spec text drift, constructor
signature deviation, test-double honesty caveat, documented Risk-5
race window).
## Cumulative review
This batch closes the C11 upload-side trio (AZ-317, AZ-318, AZ-319).
The next cumulative review window covers batches 37-39; that
report will land before Batch 41 starts.
@@ -0,0 +1,81 @@
# Batch 39 — Code Review
**Tasks**: AZ-319 (C11 TileUploader)
**Cycle**: 1
**Reviewer**: autodev
**Verdict**: **PASS_WITH_WARNINGS**
## Scope reviewed
Production code:
- `src/gps_denied_onboard/components/c11_tile_manager/_types.py` (additions)
- `src/gps_denied_onboard/components/c11_tile_manager/errors.py` (additions)
- `src/gps_denied_onboard/components/c11_tile_manager/config.py` (new)
- `src/gps_denied_onboard/components/c11_tile_manager/interface.py` (TileUploader signature)
- `src/gps_denied_onboard/components/c11_tile_manager/tile_uploader.py` (new — `HttpTileUploader`)
- `src/gps_denied_onboard/components/c11_tile_manager/__init__.py` (exports + `register_component_block`)
- `src/gps_denied_onboard/runtime_root/c11_factory.py` (`build_tile_uploader`)
- `src/gps_denied_onboard/fdr_client/records.py` (3 new `KNOWN_PAYLOAD_KEYS` entries)
Tests:
- `tests/unit/c11_tile_manager/test_tile_uploader.py` (15 tests — AC-1..AC-11, AC-13, AC-14, rate-limit budget, NFR)
- `tests/unit/c11_tile_manager/test_protocol_conformance.py` (2 tests — AC-12)
- `tests/unit/test_az272_fdr_record_schema.py` (3 fixture additions)
## Phase 1 — Architecture
### AZ-507 cross-component rule
`tile_uploader.py` does NOT import from any other `components.*` module. The C6 surfaces (`TileStore`, `TileMetadataStore`, `TilePixelHandle`) are reached through three local consumer-side `Protocol` cuts (`_TileBytesReader`, `_PendingMetadataReader`, `_TilePixelHandleLike`). Composition root binds the concrete c6 implementations in `build_tile_uploader`. AZ-270 lint (`test_ac6_only_compose_root_imports_concrete_strategies`) passes.
### Composition root
`build_tile_uploader` reads the `C11Config` block from `config.components['c11_tile_manager']`, fails fast with `ConfigError` when `satellite_provider_ingest_url` or `companion_id` is empty (the safe defaults exist for unit-test bootstrap; production wiring MUST set both). Registers the FDR producer via `make_fdr_client`.
### FDR / log envelopes
Three new `KNOWN_PAYLOAD_KEYS` entries added; per-tile records carry `flight_id`, `tile_id`, `fingerprint`, `batch_uuid`; the `c11.upload.batch.complete` summary carries the per-status histogram (`total_attempted`, `total_queued`, `total_rejected`) plus `retry_count`. AZ-272 schema test (`tests/unit/test_az272_fdr_record_schema.py`) covers all three new kinds. Structured logs use the `kv` envelope with no secrets.
## Phase 2 — Behaviour vs. spec
| Spec requirement | Status |
|------------------|--------|
| Gate first; zero side effects on failure | PASS |
| `start_session` after gate, `end_session` in `finally` | PASS |
| `mark_uploaded` only on `queued / duplicate / superseded` | PASS |
| `record_signature_rejection` when `rejection_reason` mentions "signature" | PASS |
| Multipart via `httpx`'s `files=`, no manual boundary | PASS |
| Canonical bytes order frozen; SHA-256 over deterministic concatenation | PASS |
| 429 honours `Retry-After` (int seconds + HTTP-date), capped via config | PASS |
| 5xx exponential backoff (1s/2s/4s/8s) → `SatelliteProviderError` after 4 | PASS |
| 401/403 fail-fast → `SatelliteProviderError` | PASS |
| `outcome = success | partial`; failure paths raise (do NOT return) | PASS (see F1) |
## Findings
**F1 — Low (Spec wording vs. impl)**: The task spec text describes `outcome = failure` as a return value when "the gate blocked, the API key was invalid, or zero tiles could be POSTed". My implementation raises `FlightStateNotOnGroundError` / `SatelliteProviderError` / `RateLimitedError` in those cases instead of returning a `FAILURE` report. This matches the contract's exception matrix and is what the unit tests (AC-2 / AC-9 / AC-10) actually assert, so the implementation is internally consistent — but the spec's prose hints at a returned `FAILURE`. The `UploadOutcome.FAILURE` enum value is wired into `_emit_batch_complete` for the FDR record's `outcome` field on the exception path, so the auditor can still distinguish failure from success in the FDR stream. Action: documented here; no code change.
**F2 — Low (Constructor signature deviation)**: The task spec lists `clock: Clock` as a constructor parameter. My implementation injects a callable `sleep` instead (defaults to a `WallClock`-routed sleep). Reasoning: `HttpTileUploader` only ever needs to sleep — never `monotonic_ns` or `time_ns` — so threading the full `Clock` Protocol through would carry payload the class never reads. The default-sleep helper still routes through `WallClock.sleep_until_ns`, so the AZ-398 invariant (no direct `time.sleep` in `components/`) holds. Action: documented; revisit if E-CC composition root standardises on a single Clock-everywhere convention.
**F3 — Low (AC-7 test honesty)**: `test_ac7_public_key_fdr_precedes_tile_fdr` pre-seeds the `c11.upload.session.key.public` record into the `FakeFdrSink` because the test uses a stub key manager (not the real AZ-318 `PerFlightKeyManager`). In production wiring, both producers share the same FDR client and ordering is naturally guaranteed by the call sequence. The test docstring calls this out explicitly. Action: documented; the integration test in E-BBT will exercise the real AZ-318 manager.
**F4 — Low (Race window on partial-success batches)**: The uploader marks a tile uploaded immediately after the parent suite acknowledges it inside the per-batch loop. If the safety officer disputes the same tile within the audit window (≤ 1s), the C6 row is already `uploaded`. Spec Risk-5 documents this and defers mitigation to a separate audit task. Action: no code change in this batch.
## Phase 3 — Tests
15 unit tests pass for `HttpTileUploader`; 2 for the Protocol conformance check; 3 fixture additions for the AZ-272 schema test. Full unit suite: **1404 passed, 80 skipped, 0 failed** (skips are environment-gated: Docker, CUDA, TensorRT, Tier-2 hardware).
NFR-perf-throughput: 1000 tiles run under 5 s with the in-process MockTransport — well above the 20 tile/s contract floor (the mock removes the network bottleneck, so this verifies uploader bookkeeping has no O(n²) regression rather than certifying real throughput).
## Phase 4 — Quality gates
- `ReadLints` clean across `c11_tile_manager/`, `runtime_root/c11_factory.py`, `fdr_client/records.py`, and the new test files
- No `time.sleep` in components (routes via `WallClock.sleep_until_ns`)
- No secrets in logs (AC-10 test asserts no `BEGIN PUBLIC KEY` / `Authorization` substring in any captured log record)
- No new third-party dependencies (uses existing `httpx` and `cryptography` pins)
## Verdict
**PASS_WITH_WARNINGS** — All four findings are Low severity (documentation drift between spec text and implementation, test-double honesty caveat, and a documented Risk-5 race window). No code change required for batch close-out.