[AZ-320] Add C11 IdempotentRetryTileUploader decorator

Wraps HttpTileUploader (AZ-319) with two bounded retry budgets:

- In-call (per-batch) — re-invokes inner on PARTIAL outcome up to
  `max_in_call_retries` times with capped exponential backoff
  (`min(base ** attempt_number, cap)`). On exhaustion: surfaces an
  operator hint via `next_retry_at_s = now + backoff_cap_s`.
- Per-tile (cross-call) — atomically increments c6's
  `tiles.upload_attempts` counter for every rejection; once a tile
  hits `max_per_tile_attempts` it is forward-only transitioned to
  `voting_status = upload_giveup` (excluded from `pending_uploads`).
  Each transition emits FDR `kind="c11.upload.giveup"` plus an
  ERROR log.

C6 contract changes (AZ-303 v1.3.0):
- VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED).
- TileMetadataStore.increment_upload_attempts(tile_id) -> int added
  with NotImplementedError default for backwards-compat.
- Migration 0003_c11_upload_attempts: additive column +
  widened ck_tiles_voting_status (preserves IS NULL clause).

C11 wiring:
- C11RetryConfig + disable_retry_decorator on C11Config.
- build_tile_uploader wraps in decorator by default; bypass flag
  returns the bare HttpTileUploader. New `clock` keyword.

Cross-component isolation honoured (AZ-507): the decorator declares
`_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore
and references `UPLOAD_GIVEUP` via a local string constant — no c6
imports.

Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum
update + alembic head bump + AZ-272 schema fixture. 238 passed across
c11/c6/fdr suites; pre-existing perf microbenches unrelated.

Code review: PASS_WITH_WARNINGS (5 Low/Informational findings,
docs-level or downstream-CI-blocked). See
_docs/03_implementation/reviews/batch_41_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-13 08:48:53 +03:00
parent 90f4ac78f4
commit a06b107fc3
19 changed files with 1788 additions and 21 deletions
@@ -7,9 +7,9 @@
- AZ-TBD-c6-freshness-gate (insert hook + sector classification reader)
- AZ-TBD-c6-cache-budget-eviction (LRU candidate enumeration + delete coordination)
- TBD at decompose time: E-C10 (AZ-252 — manifest + provisioning), E-C11 (AZ-251 — both `TileDownloader` insert and `TileUploader` reader queries), E-C12 (AZ-253 — operator pre-flight tooling)
**Version**: 1.2.0
**Version**: 1.3.0
**Status**: draft
**Last Updated**: 2026-05-12
**Last Updated**: 2026-05-13
## Purpose
@@ -32,6 +32,7 @@ Defines the typed boundary to the Postgres-backed spatial index over `TileMetada
| `lru_candidates` | `(*, max_count: int) -> list[TileMetadata]` | `TileMetadataError` | sync (oldest-`accessed_at`-first; bounded result set) |
| `total_disk_bytes` | `() -> int` | `TileMetadataError` | sync (sum of `disk_bytes` column; ≤ 100 ms even at 100k rows) |
| `get_by_id` | `(tile_id: TileId) -> Optional[TileMetadata]` | `TileMetadataError` | sync; returns `None` if absent (NOT `TileNotFoundError`) |
| `increment_upload_attempts` | `(tile_id: TileId) -> int` | `TileMetadataError`, `TileNotFoundError` | sync; atomic ``UPDATE … RETURNING`` (per-row lock); added in v1.3.0 |
### DTOs
@@ -99,7 +100,7 @@ class SectorBoundary:
- **I-5 (disk-budget invariant):** `total_disk_bytes` MUST equal `SUM(disk_bytes)` over all rows where `voting_status != REJECTED`. Rejected rows are tombstones — they keep the on-disk file deleted but retain the row for the manifest's content-hash check (D-C10-3).
- **I-6 (frozen DTOs):** `Bbox`, `SectorBoundary`, `TileMetadataPersistent` are `@dataclass(frozen=True)`.
- **I-7 (transactional writes):** `insert_metadata` is a single transaction over the `tiles` table; the freshness check + the row insert MUST be atomic (a parallel sector-boundary update MUST NOT race the gate).
- **I-8 (no silent voting-status downgrade):** `update_voting_status` accepts only forward transitions (`PENDING → TRUSTED`, `PENDING → REJECTED`); a backward transition raises `TileMetadataError`. `TRUSTED → REJECTED` is allowed (covers the cache-poisoning recall path).
- **I-8 (no silent voting-status downgrade):** `update_voting_status` accepts only forward transitions (`PENDING → TRUSTED`, `PENDING → REJECTED`, `TRUSTED → REJECTED`, `PENDING → UPLOAD_GIVEUP`, `TRUSTED → UPLOAD_GIVEUP`); a backward transition raises `TileMetadataError`. `TRUSTED → REJECTED` covers the cache-poisoning recall path; the two `UPLOAD_GIVEUP` transitions (added in v1.3.0 by AZ-320) cover the C11 retry decorator's per-tile budget exhaustion. `UPLOAD_GIVEUP → anything` is forbidden — recovery is an out-of-band SQL UPDATE by the operator.
- **I-9 (`pending_uploads` is the single source for C11 TileUploader):** the uploader MUST NOT scan the filesystem for pending tiles; it MUST drive its loop off `pending_uploads()`. The metadata store is the bookkeeping.
## Non-Goals
@@ -144,3 +145,4 @@ Same rules as `tile_store.md` § Versioning Rules.
| 1.0.0 | 2026-05-10 | Initial contract — 9-method Protocol + LRU/disk-budget extensions + freshness gate semantics + composite-key uniqueness invariant. | autodev (decompose Step 2 of AZ-250 / E-C6) |
| 1.1.0 | 2026-05-12 | Non-breaking refinement of Invariant I-1: natural key switched from `(zoom_level, lat, lon, source)` (float-based) to `(zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, zero_uuid))` (integer + per-flight separated). Protocol surface unchanged; consumers gain the ability to observe multiple ONBOARD_INGEST rows for the same cell from different flights (required by D-PROJ-2 voting). Driven by `_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md` and the cross-workspace satellite-provider task `AZ-TBD_tile_identity_uuidv5_bulk_list`. | autodev (AZ-304 batch 27 of cycle 1) |
| 1.2.0 | 2026-05-12 | Non-breaking addition of `TileMetadata.location_hash: UUID \| None = None` (cross-source/cross-flight cell-bag identifier; UUIDv5 over `(zoom, tile_x, tile_y)`). Corrected stale references: sector table name (`sector_boundaries` → `sector_classifications`) and Alembic env path (`c6_tile_cache/_alembic/` → `db/migrations/versions/`). Protocol surface unchanged; existing constructors continue to work because the field defaults to `None`. Shipped by AZ-304 alongside the additive `0002_c6_tile_identity_and_lru` migration. | autodev (AZ-304 batch 27 of cycle 1) |
| 1.3.0 | 2026-05-13 | Non-breaking addition of (a) `VotingStatus.UPLOAD_GIVEUP` terminal state, (b) two new forward transitions (`PENDING → UPLOAD_GIVEUP`, `TRUSTED → UPLOAD_GIVEUP`) under Invariant I-8, (c) the `increment_upload_attempts(tile_id) -> int` Protocol method (atomic per-row UPDATE … RETURNING), and (d) the `tiles.upload_attempts INTEGER NOT NULL DEFAULT 0` column. The Protocol method body raises `NotImplementedError` so legacy duck-typed impls keep their conformance — production wiring uses `PostgresFilesystemStore` which ships the SQL. `pending_uploads()` now also excludes `voting_status = upload_giveup`. Shipped by AZ-320 (C11 retry decorator) alongside the additive `0003_c11_upload_attempts` migration. | autodev (AZ-320 batch 41 of cycle 1) |