[AZ-320] Add C11 IdempotentRetryTileUploader decorator

Wraps HttpTileUploader (AZ-319) with two bounded retry budgets:

- In-call (per-batch) — re-invokes inner on PARTIAL outcome up to
  `max_in_call_retries` times with capped exponential backoff
  (`min(base ** attempt_number, cap)`). On exhaustion: surfaces an
  operator hint via `next_retry_at_s = now + backoff_cap_s`.
- Per-tile (cross-call) — atomically increments c6's
  `tiles.upload_attempts` counter for every rejection; once a tile
  hits `max_per_tile_attempts` it is forward-only transitioned to
  `voting_status = upload_giveup` (excluded from `pending_uploads`).
  Each transition emits FDR `kind="c11.upload.giveup"` plus an
  ERROR log.
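
For illustration only, the in-call backoff formula above can be sketched as follows. The helper name `backoff_delay_s` and the assumption that attempts are numbered from 1 are hypothetical; the defaults (base 2.0 s, cap 60.0 s, 3 retries) are the ones this commit adds to `C11RetryConfig`.

```python
def backoff_delay_s(attempt_number: int, base_s: float = 2.0, cap_s: float = 60.0) -> float:
    """Return min(base_s ** attempt_number, cap_s), the capped exponential delay."""
    if attempt_number < 0:
        raise ValueError("attempt_number must be >= 0")
    return min(base_s ** attempt_number, cap_s)


# Delays before in-call retries 1..3 with the default config values.
default_delays = [backoff_delay_s(n) for n in range(1, 4)]

# A later attempt would be clamped by the cap (2.0 ** 6 = 64.0 > 60.0).
capped_delay = backoff_delay_s(6)
```

With the defaults the cap is never reached inside a single call; it matters only when `max_in_call_retries` or `backoff_base_s` is raised.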

C6 contract changes (AZ-303 v1.3.0):
- VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED).
- TileMetadataStore.increment_upload_attempts(tile_id) -> int added
  with NotImplementedError default for backwards-compat.
- Migration 0003_c11_upload_attempts: additive column +
  widened ck_tiles_voting_status (preserves IS NULL clause).
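
A minimal sketch of how the per-tile counter and the forward-only give-up transition could fit together, assuming an in-memory store. `InMemoryTileStore`, `mark_giveup`, `record_rejection`, and the `"pending"` / `"trusted"` string values are hypothetical; only `increment_upload_attempts(tile_id) -> int` and the `upload_giveup` status come from this commit. The real counter is incremented atomically in c6's store, which this sketch does not attempt to model.

```python
from __future__ import annotations

from dataclasses import dataclass, field

UPLOAD_GIVEUP = "upload_giveup"                 # value named in the commit message
_GIVEUP_ALLOWED_FROM = {"pending", "trusted"}   # assumed string values


@dataclass
class InMemoryTileStore:
    """Hypothetical stand-in for c6's TileMetadataStore."""

    attempts: dict[str, int] = field(default_factory=dict)
    status: dict[str, str] = field(default_factory=dict)

    def increment_upload_attempts(self, tile_id: str) -> int:
        # Bump the counter and return the new value.
        self.attempts[tile_id] = self.attempts.get(tile_id, 0) + 1
        return self.attempts[tile_id]

    def mark_giveup(self, tile_id: str) -> bool:
        # Forward-only: transition only from PENDING/TRUSTED, never back.
        if self.status.get(tile_id, "pending") in _GIVEUP_ALLOWED_FROM:
            self.status[tile_id] = UPLOAD_GIVEUP
            return True
        return False


def record_rejection(store: InMemoryTileStore, tile_id: str,
                     max_per_tile_attempts: int = 5) -> bool:
    """Count one rejection; return True if the tile was moved to give-up."""
    attempts = store.increment_upload_attempts(tile_id)
    if attempts >= max_per_tile_attempts:
        return store.mark_giveup(tile_id)
    return False
```

In this sketch a sixth rejection still bumps the counter but does not transition again, since the tile is already in `upload_giveup`.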

C11 wiring:
- C11RetryConfig + disable_retry_decorator on C11Config.
- build_tile_uploader wraps in decorator by default; bypass flag
  returns the bare HttpTileUploader. New `clock` keyword.
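
The wiring described above might look roughly like the sketch below. Everything here is illustrative: `RetryingUploaderSketch`, `build_uploader_sketch`, the `"ok"` / `"partial"` outcome strings, and the injected `sleep` callable are assumptions, not the signatures of `build_tile_uploader` or of the real decorator (which takes a `clock` keyword whose shape is not shown in this commit).

```python
import time
from typing import Callable, Sequence

Uploader = Callable[[Sequence[str]], str]  # returns "ok" or "partial" (assumed values)


class RetryingUploaderSketch:
    """Hypothetical stand-in for the decorator's in-call retry loop."""

    def __init__(self, inner: Uploader, max_in_call_retries: int = 3,
                 base_s: float = 2.0, cap_s: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._inner = inner
        self._max_in_call_retries = max_in_call_retries
        self._base_s = base_s
        self._cap_s = cap_s
        self._sleep = sleep  # injectable so tests do not really wait

    def __call__(self, batch: Sequence[str]) -> str:
        outcome = self._inner(batch)
        retries_used = 0
        # Re-invoke the inner uploader only while the outcome is partial.
        while outcome == "partial" and retries_used < self._max_in_call_retries:
            retries_used += 1
            self._sleep(min(self._base_s ** retries_used, self._cap_s))
            outcome = self._inner(batch)
        return outcome


def build_uploader_sketch(inner: Uploader, disable_retry_decorator: bool = False,
                          **retry_kwargs) -> Uploader:
    """Wrap by default; return the bare uploader when the bypass flag is set."""
    if disable_retry_decorator:
        return inner
    return RetryingUploaderSketch(inner, **retry_kwargs)
```

Injecting the sleep function keeps the loop testable without real delays, which is presumably the same motivation as the new `clock` keyword.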

Cross-component isolation honoured (AZ-507): the decorator declares a
`_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore
and references `UPLOAD_GIVEUP` via a local string constant, so it has
no c6 imports.

Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum
update + alembic head bump + AZ-272 schema fixture. 238 passed across
c11/c6/fdr suites; pre-existing perf microbenches unrelated.

Code review: PASS_WITH_WARNINGS (5 Low/Informational findings,
docs-level or downstream-CI-blocked). See
_docs/03_implementation/reviews/batch_41_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-13 08:48:53 +03:00
parent 90f4ac78f4
commit a06b107fc3
19 changed files with 1788 additions and 21 deletions
@@ -1,11 +1,13 @@
-"""C11 TileManager config block (AZ-316, AZ-319).
+"""C11 TileManager config block (AZ-316, AZ-319, AZ-320).
 Registered into ``config.components['c11_tile_manager']`` by the
-package ``__init__.py``. Two composition-root factories read this
+package ``__init__.py``. Three composition-root factories read this
 block:
 * :func:`gps_denied_onboard.runtime_root.c11_factory.build_tile_uploader`
-  reads the ``upload_*`` fields and ``companion_id`` to drive AZ-319.
+  reads the ``upload_*`` fields, ``companion_id``, and the AZ-320
+  ``retry`` block (``disable_retry_decorator`` + the per-tile / per-call
+  retry knobs) to drive AZ-319 + the optional AZ-320 decorator.
 * :func:`gps_denied_onboard.runtime_root.c11_factory.build_tile_downloader`
   reads the ``satellite_provider_url``, ``service_api_key``, and
   ``download_*`` fields to drive AZ-316.
@@ -19,11 +21,11 @@ wiring.
 from __future__ import annotations
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from gps_denied_onboard.config.schema import ConfigError
-__all__ = ["C11Config"]
+__all__ = ["C11Config", "C11RetryConfig"]
 _DEFAULT_BATCH_SIZE: int = 25
@@ -34,6 +36,55 @@ _DEFAULT_DOWNLOAD_RESOLUTION_FLOOR: float = 0.5
 _DEFAULT_DOWNLOAD_MAX_5XX_RETRIES: int = 4
 _MIN_DOWNLOAD_RETRIES: int = 1
 _MAX_DOWNLOAD_RETRIES: int = 16
+_DEFAULT_MAX_IN_CALL_RETRIES: int = 3
+_DEFAULT_MAX_PER_TILE_ATTEMPTS: int = 5
+_DEFAULT_RETRY_BACKOFF_BASE_S: float = 2.0
+_DEFAULT_RETRY_BACKOFF_CAP_S: float = 60.0
+@dataclass(frozen=True)
+class C11RetryConfig:
+    """C11 ``IdempotentRetryTileUploader`` knobs (AZ-320).
+    * ``max_in_call_retries`` — bounded loop count for partial-success
+      re-invocations of the wrapped uploader within a single call.
+    * ``max_per_tile_attempts`` — terminal threshold per tile across
+      ALL calls; exceeding the threshold moves the tile to
+      :class:`VotingStatus.UPLOAD_GIVEUP` (a human-decision boundary —
+      automated promotion back to ``PENDING`` is forbidden).
+    * ``backoff_base_s`` — base of the exponential backoff used between
+      in-call retries (``base ** retries_used``).
+    * ``backoff_cap_s`` — upper bound on each individual backoff sleep;
+      also used as the operator hint for ``next_retry_at_s`` when the
+      in-call budget is exhausted.
+    """
+    max_in_call_retries: int = _DEFAULT_MAX_IN_CALL_RETRIES
+    max_per_tile_attempts: int = _DEFAULT_MAX_PER_TILE_ATTEMPTS
+    backoff_base_s: float = _DEFAULT_RETRY_BACKOFF_BASE_S
+    backoff_cap_s: float = _DEFAULT_RETRY_BACKOFF_CAP_S
+    def __post_init__(self) -> None:
+        if self.max_in_call_retries < 0:
+            raise ConfigError(
+                "C11RetryConfig.max_in_call_retries must be >= 0; "
+                f"got {self.max_in_call_retries}"
+            )
+        if self.max_per_tile_attempts <= 0:
+            raise ConfigError(
+                "C11RetryConfig.max_per_tile_attempts must be > 0; "
+                f"got {self.max_per_tile_attempts}"
+            )
+        if self.backoff_base_s <= 0:
+            raise ConfigError(
+                "C11RetryConfig.backoff_base_s must be > 0; "
+                f"got {self.backoff_base_s}"
+            )
+        if self.backoff_cap_s <= 0:
+            raise ConfigError(
+                "C11RetryConfig.backoff_cap_s must be > 0; "
+                f"got {self.backoff_cap_s}"
+            )
 @dataclass(frozen=True)
@@ -81,6 +132,9 @@ class C11Config:
     download_max_retry_after_s: int = _DEFAULT_MAX_RETRY_AFTER_S
     download_resolution_floor_m_per_px: float = _DEFAULT_DOWNLOAD_RESOLUTION_FLOOR
+    disable_retry_decorator: bool = False
+    retry: C11RetryConfig = field(default_factory=C11RetryConfig)
     def __post_init__(self) -> None:
         if not 1 <= self.upload_batch_size <= _MAX_BATCH_SIZE:
             raise ConfigError(
@@ -118,3 +172,8 @@ class C11Config:
                 "C11Config.download_resolution_floor_m_per_px must be > 0; "
                 f"got {self.download_resolution_floor_m_per_px}"
             )
+        if not isinstance(self.retry, C11RetryConfig):
+            raise ConfigError(
+                "C11Config.retry must be a C11RetryConfig; got "
+                f"{type(self.retry).__name__}"
+            )
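
To try the validation pattern from this diff outside the repository, a self-contained sketch follows. `ConfigError` is redefined locally as a `ValueError` subclass (an assumption; the real class lives in `gps_denied_onboard.config.schema`), the class names carry a `Sketch` suffix to mark them as hypothetical, and only two of the four range checks are reproduced.

```python
from dataclasses import dataclass, field


class ConfigError(ValueError):
    """Local stand-in for gps_denied_onboard.config.schema.ConfigError."""


@dataclass(frozen=True)
class RetryConfigSketch:
    max_in_call_retries: int = 3
    backoff_cap_s: float = 60.0

    def __post_init__(self) -> None:
        # Zero retries is allowed (retrying disabled); negative counts are not.
        if self.max_in_call_retries < 0:
            raise ConfigError(
                f"max_in_call_retries must be >= 0; got {self.max_in_call_retries}"
            )
        if self.backoff_cap_s <= 0:
            raise ConfigError(f"backoff_cap_s must be > 0; got {self.backoff_cap_s}")


@dataclass(frozen=True)
class ParentConfigSketch:
    disable_retry_decorator: bool = False
    # default_factory builds the nested default when the parent is constructed.
    retry: RetryConfigSketch = field(default_factory=RetryConfigSketch)

    def __post_init__(self) -> None:
        # Guard against a raw dict or other type slipping in from config loading.
        if not isinstance(self.retry, RetryConfigSketch):
            raise ConfigError(
                f"retry must be a RetryConfigSketch; got {type(self.retry).__name__}"
            )


def try_build(**kwargs) -> str:
    """Return "ok" or the validation message, for quick experimentation."""
    try:
        RetryConfigSketch(**kwargs)
        return "ok"
    except ConfigError as exc:
        return str(exc)
```

Running the checks in `__post_init__` means an invalid block fails at construction time rather than on the first upload.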