mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 10:11:13 +00:00
[AZ-320] Add C11 IdempotentRetryTileUploader decorator
Wraps HttpTileUploader (AZ-319) with two bounded retry budgets:

- In-call (per-batch): re-invokes inner on PARTIAL outcome up to
  `max_in_call_retries` times with capped exponential backoff
  (`min(base ** attempt_number, cap)`). On exhaustion: surfaces an
  operator hint via `next_retry_at_s = now + backoff_cap_s`.
- Per-tile (cross-call): atomically increments c6's
  `tiles.upload_attempts` counter for every rejection; once a tile hits
  `max_per_tile_attempts` it is forward-only transitioned to
  `voting_status = upload_giveup` (excluded from `pending_uploads`).
  Each transition emits FDR `kind="c11.upload.giveup"` plus an ERROR log.

C6 contract changes (AZ-303 v1.3.0):

- VotingStatus.UPLOAD_GIVEUP added (forward-only from PENDING/TRUSTED).
- TileMetadataStore.increment_upload_attempts(tile_id) -> int added with
  NotImplementedError default for backwards-compat.
- Migration 0003_c11_upload_attempts: additive column + widened
  ck_tiles_voting_status (preserves IS NULL clause).

C11 wiring:

- C11RetryConfig + disable_retry_decorator on C11Config.
- build_tile_uploader wraps in decorator by default; bypass flag returns
  the bare HttpTileUploader. New `clock` keyword.

Cross-component isolation honoured (AZ-507): the decorator declares
`_RetryMetadataStoreLike` Protocol cut over c6's TileMetadataStore and
references `UPLOAD_GIVEUP` via a local string constant, with no c6 imports.

Tests: 13 decorator + 1 conformance + 2 factory bypass + AC-6 enum
update + alembic head bump + AZ-272 schema fixture. 238 passed across
c11/c6/fdr suites; pre-existing perf microbenches unrelated.

Code review: PASS_WITH_WARNINGS (5 Low/Informational findings,
docs-level or downstream-CI-blocked). See
_docs/03_implementation/reviews/batch_41_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,244 @@
# Batch 41: Cycle 1 Report

**Date**: 2026-05-13
**Batch**: 41 (single-task batch: C11 idempotent retry decorator)
**Tasks**:
- AZ-320 (C11 IdempotentRetryTileUploader, 3pt)

**Total complexity**: 3pt
**Status**: complete; pending transition to "In Testing".

## Scope

Batch 41 lands the AZ-320 retry decorator that wraps the AZ-319
`HttpTileUploader` and gives the operator-side upload path two bounded
retry budgets:

1. **In-call (per-batch) budget**: re-invokes the inner uploader at
   most `config.c11.retry.max_in_call_retries` times when the inner
   returns `outcome=PARTIAL`. Backoff between rounds is
   `min(base ** attempt_number, cap)`; the spec's worked example
   (`max=3, base=2.0` → sleeps `2.0, 4.0, 8.0`) drove the
   "attempt-number is 1-indexed" off-by-one fix in the loop body.
2. **Per-tile (cross-call) budget**: for every rejection the inner
   surfaces, the decorator atomically increments c6's
   `tiles.upload_attempts` counter; once the counter hits
   `config.c11.retry.max_per_tile_attempts` the tile is forward-only
   transitioned to `voting_status = upload_giveup`. The c6
   `pending_uploads` SQL excludes that status so subsequent operator
   re-runs naturally skip those tiles. Recovery is documented as an
   out-of-band SQL UPDATE (per the spec's "human decision boundary"
   constraint).

Each `UPLOAD_GIVEUP` transition emits one FDR record
(`kind="c11.upload.giveup"`) plus an ERROR log; budget exhaustion on
the in-call side emits a WARN log and surfaces an operator hint via
the existing `UploadBatchReport.next_retry_at_s` field
(`now + backoff_cap_s`). Pass-through methods
(`enumerate_pending_tiles`, `confirm_flight_state`) delegate to the
inner unchanged so the decorator is a true `TileUploader` Protocol
drop-in.
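
The interaction of the two budgets can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the code in `idempotent_retry.py`: `Report`, `FakeStore`, `FakeClock`, `upload_with_retry`, the string outcomes, and the default limits are hypothetical stand-ins, and FDR emission and logging are omitted.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Report:
    """Hypothetical stand-in for UploadBatchReport."""
    outcome: str                          # "SUCCESS" or "PARTIAL"
    rejected: Tuple[str, ...] = ()        # tile ids the inner call rejected
    retry_count: int = 0
    next_retry_at_s: Optional[float] = None


class FakeStore:
    """In-memory stand-in for the two c6 write surfaces."""
    def __init__(self) -> None:
        self.attempts: Dict[str, int] = {}
        self.status: Dict[str, str] = {}

    def increment_upload_attempts(self, tile_id: str) -> int:
        self.attempts[tile_id] = self.attempts.get(tile_id, 0) + 1
        return self.attempts[tile_id]

    def update_voting_status(self, tile_id: str, status: str) -> None:
        self.status[tile_id] = status


class FakeClock:
    """Records sleeps instead of blocking, so the sketch runs instantly."""
    def __init__(self, now_s: float = 1000.0) -> None:
        self.now_s = now_s
        self.sleeps: List[float] = []

    def sleep(self, seconds: float) -> None:
        self.sleeps.append(seconds)
        self.now_s += seconds


def upload_with_retry(inner, store, clock, *, max_in_call_retries=3,
                      max_per_tile_attempts=5, base=2.0, cap=10.0) -> Report:
    retries_used = 0
    while True:
        report = inner()
        if report.outcome != "PARTIAL":
            return replace(report, retry_count=retries_used)
        # Per-tile budget: count every rejection, give up at the threshold.
        for tile_id in report.rejected:
            if store.increment_upload_attempts(tile_id) >= max_per_tile_attempts:
                store.update_voting_status(tile_id, "upload_giveup")
        # In-call budget: on exhaustion, return PARTIAL with an operator hint.
        if retries_used >= max_in_call_retries:
            return replace(report, retry_count=retries_used,
                           next_retry_at_s=clock.now_s + cap)
        retries_used += 1                 # increment BEFORE computing backoff
        clock.sleep(min(base ** retries_used, cap))


results = iter([Report("PARTIAL", ("a", "b", "c")), Report("SUCCESS")])
store, clock = FakeStore(), FakeClock()
final = upload_with_retry(lambda: next(results), store, clock)
print(final.retry_count, clock.sleeps, store.attempts)
```

In the usage above, one PARTIAL round followed by SUCCESS costs a single sleep and one increment per rejected tile, which mirrors the shape of the AC-2 scenario described in the review.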

## Architectural decisions

### AZ-507: consumer-side cuts for c6 (no enum imports either)

The decorator only needs two write surfaces on c6's
`TileMetadataStore`: `increment_upload_attempts` and
`update_voting_status`. A direct `from
gps_denied_onboard.components.c6_tile_cache import …` would violate
AZ-507 / trip the AZ-270 lint, so `idempotent_retry.py` declares a
local `_RetryMetadataStoreLike` `Protocol` cut over those two methods
and binds the concrete `PostgresFilesystemStore` only at the
composition root.

The c6 `VotingStatus.UPLOAD_GIVEUP` enum value is reached via a
locally-scoped `_VOTING_STATUS_UPLOAD_GIVEUP = "upload_giveup"`
string constant. The `update_voting_status` impl coerces either a
c6 enum or the bare string via `VotingStatus(status)`, so the
decorator never imports c6's enum. This matches the same pattern
`HttpTileDownloader` uses for the freshness-label string surface
(Batch 40, `_TileWriterLike`).
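
A minimal sketch of the consumer-side cut follows. The method signatures, the `str` tile id, the enum values other than `"upload_giveup"`, and the `InMemoryStore` and `give_up` helpers are assumptions made for illustration; only the Protocol name, the constant, and the `VotingStatus(status)` coercion come from the text above.

```python
from enum import Enum
from typing import Any, Dict, Protocol


# --- c6 side (stand-in; the decorator module never imports these) ---
class VotingStatus(Enum):
    PENDING = "pending"              # value assumed
    TRUSTED = "trusted"              # value assumed
    REJECTED = "rejected"            # value assumed
    UPLOAD_GIVEUP = "upload_giveup"


class InMemoryStore:
    """Hypothetical store that coerces like the c6 implementation is described to."""
    def __init__(self) -> None:
        self.status: Dict[str, VotingStatus] = {}
        self.attempts: Dict[str, int] = {}

    def increment_upload_attempts(self, tile_id: str) -> int:
        self.attempts[tile_id] = self.attempts.get(tile_id, 0) + 1
        return self.attempts[tile_id]

    def update_voting_status(self, tile_id: str, status: Any) -> None:
        # Enum(value) accepts either the raw string or an existing member.
        self.status[tile_id] = VotingStatus(status)


# --- c11 side (no c6 import needed) ---
class _RetryMetadataStoreLike(Protocol):
    def increment_upload_attempts(self, tile_id: str) -> int: ...
    def update_voting_status(self, tile_id: str, status: Any) -> None: ...


_VOTING_STATUS_UPLOAD_GIVEUP = "upload_giveup"


def give_up(store: _RetryMetadataStoreLike, tile_id: str) -> None:
    store.update_voting_status(tile_id, _VOTING_STATUS_UPLOAD_GIVEUP)


store = InMemoryStore()
give_up(store, "tile-1")
print(store.status["tile-1"])
```

The `Any`-typed `status` parameter is what lets the c11 module stay free of the c6 enum while the store still ends up holding a real enum member.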

### Forward-only voting transitions: list update

`VotingStatus.UPLOAD_GIVEUP` is added as a fourth enum value;
`PostgresFilesystemStore._ALLOWED_VOTING_TRANSITIONS` was extended
with `(PENDING → UPLOAD_GIVEUP)` and `(TRUSTED → UPLOAD_GIVEUP)`.
The contract file's Invariant I-8 was updated in lockstep (v1.3.0
Change Log entry). `REJECTED → UPLOAD_GIVEUP` is intentionally
NOT permitted: once the parent suite has rejected a tile, the
local retry budget is irrelevant.
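
The transition check can be pictured as a lookup in a set of `(from, to)` pairs. In the sketch below only the two `UPLOAD_GIVEUP` edges and the forbidden `REJECTED → UPLOAD_GIVEUP` edge come from this report; the `PENDING → TRUSTED` and `PENDING → REJECTED` edges, the enum values, and `check_transition` are assumptions for illustration.

```python
from enum import Enum


class VotingStatus(Enum):
    PENDING = "pending"
    TRUSTED = "trusted"
    REJECTED = "rejected"
    UPLOAD_GIVEUP = "upload_giveup"


_ALLOWED_VOTING_TRANSITIONS = frozenset({
    # Assumed pre-existing edges (not stated in this report):
    (VotingStatus.PENDING, VotingStatus.TRUSTED),
    (VotingStatus.PENDING, VotingStatus.REJECTED),
    # Edges added by AZ-320:
    (VotingStatus.PENDING, VotingStatus.UPLOAD_GIVEUP),
    (VotingStatus.TRUSTED, VotingStatus.UPLOAD_GIVEUP),
})


def check_transition(current: VotingStatus, new: VotingStatus) -> None:
    """Raise if the move is not a permitted forward-only transition."""
    if (current, new) not in _ALLOWED_VOTING_TRANSITIONS:
        raise ValueError(f"forbidden transition {current.value} -> {new.value}")


def is_allowed(current: VotingStatus, new: VotingStatus) -> bool:
    try:
        check_transition(current, new)
    except ValueError:
        return False
    return True


print(is_allowed(VotingStatus.TRUSTED, VotingStatus.UPLOAD_GIVEUP))
print(is_allowed(VotingStatus.REJECTED, VotingStatus.UPLOAD_GIVEUP))
```

Because no edge leaves `UPLOAD_GIVEUP`, a set like this cannot express recovery, which is consistent with recovery being an out-of-band SQL UPDATE.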

### Migration is append-only

Per `coderule.mdc` (migrations are append-only) and the spec's
"Unacceptable substitutes" clause ("modifying AZ-304's 0001
migration in place"), the new `0003_c11_upload_attempts.py` is a
fresh additive migration:

- Adds `tiles.upload_attempts INTEGER NOT NULL DEFAULT 0`.
- Widens the `ck_tiles_voting_status` CHECK constraint to admit
  `'upload_giveup'`. The widened predicate explicitly preserves
  `voting_status IS NULL` (the original 0001 migration permits
  NULL); without this, legacy rows would fail the CHECK on
  re-creation.
- Reversible: rollback drops both the column and the widened
  constraint, restoring the AZ-304 head exactly.

The Alembic head-revision assertion in `tests/unit/test_ac5_alembic.py`
was updated from `0002_c6_tile_identity_and_lru` to
`0003_c11_upload_attempts` in lockstep (the test docstring already
calls out "Future migrations update this assertion in lockstep").
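
The post-migration shape of the `tiles` table can be approximated with an in-memory SQLite database. This is only a sketch: the project targets Postgres through Alembic, the exact CHECK predicate is not shown in this report, and the status strings other than `'upload_giveup'` plus the `try_insert` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tiles (
        tile_id         TEXT PRIMARY KEY,
        voting_status   TEXT,
        upload_attempts INTEGER NOT NULL DEFAULT 0,
        CONSTRAINT ck_tiles_voting_status CHECK (
            voting_status IS NULL
            OR voting_status IN ('pending', 'trusted', 'rejected', 'upload_giveup')
        )
    )
    """
)

# A legacy row that never set a voting status, plus a pending tile.
conn.executemany(
    "INSERT INTO tiles (tile_id, voting_status) VALUES (?, ?)",
    [("legacy", None), ("t1", "pending")],
)


def try_insert(tile_id, status):
    """Return True if the row satisfies the CHECK constraint."""
    try:
        conn.execute(
            "INSERT INTO tiles (tile_id, voting_status) VALUES (?, ?)",
            (tile_id, status),
        )
        return True
    except sqlite3.IntegrityError:
        return False


legacy_attempts = conn.execute(
    "SELECT upload_attempts FROM tiles WHERE tile_id = 'legacy'"
).fetchone()[0]
print(legacy_attempts, try_insert("t2", "upload_giveup"), try_insert("t3", "bogus"))
```

Existing rows pick up `upload_attempts = 0` from the column default, so no data migration is needed, and the new status value is accepted while unknown strings are still rejected.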

### `Clock` injection (full Protocol, not just `sleep`)

This is the third batch in a row to touch the "Clock vs. sleep
injection" deviation flagged in cumulative review batches 37-39
(F2). For AZ-320 the decorator needs BOTH `monotonic_ns` (backoff
arithmetic) AND `time_ns` (the operator-facing `next_retry_at_s`
hint), so it accepts the full `Clock` Protocol, matching the
pattern AZ-307 / AZ-308 already use. This is the first C11 batch
to honour the no-deviation path; documented in the batch review
as F1 (informational, no action).
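
A sketch of what full-`Clock` injection buys over a bare sleep callable is given below. The `monotonic_ns` and `time_ns` names come from the text; the `sleep` method, `FakeClock`, and the free function `next_retry_at_s` are assumptions about the shape of the project's `Clock` Protocol.

```python
import time
from typing import List, Protocol


class Clock(Protocol):
    def monotonic_ns(self) -> int: ...
    def time_ns(self) -> int: ...
    def sleep(self, seconds: float) -> None: ...   # assumed member


class WallClock:
    """Production clock backed by the standard library."""
    def monotonic_ns(self) -> int:
        return time.monotonic_ns()

    def time_ns(self) -> int:
        return time.time_ns()

    def sleep(self, seconds: float) -> None:
        time.sleep(seconds)


class FakeClock:
    """Deterministic test clock: sleeping advances time without blocking."""
    def __init__(self, wall_ns: int = 0, mono_ns: int = 0) -> None:
        self._wall_ns = wall_ns
        self._mono_ns = mono_ns
        self.sleeps: List[float] = []

    def monotonic_ns(self) -> int:
        return self._mono_ns

    def time_ns(self) -> int:
        return self._wall_ns

    def sleep(self, seconds: float) -> None:
        self.sleeps.append(seconds)
        delta = int(seconds * 1_000_000_000)
        self._wall_ns += delta
        self._mono_ns += delta


def next_retry_at_s(clock: Clock, backoff_cap_s: float) -> float:
    """Operator hint: wall-clock 'now' plus the backoff cap."""
    return clock.time_ns() / 1e9 + backoff_cap_s


fake = FakeClock(wall_ns=5 * 10**9)
fake.sleep(2.0)
print(fake.monotonic_ns(), next_retry_at_s(fake, 10.0))
```

With a single injected object, a test can assert on both the recorded sleeps and the wall-clock hint, which a bare `Callable[[float], None]` cannot provide.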

### FDR `ts` derivation: datetime.now, not Clock

The decorator emits `c11.upload.giveup` records with a
`ts=datetime.now(timezone.utc).strftime(...)` ISO string, matching
the existing pattern in `tile_uploader.py` (`_iso_now`). Switching
to `Clock.time_ns()` for ts derivation would break consistency
across the C11 component and would require a project-wide audit
of every `_iso_now()` call site. Documented as F2 (Low) for the
follow-up sweep PBI.

### Off-by-one in the backoff exponent: fix during the test pass

Initial implementation used `base ** retries_used` with
`retries_used` starting at 0, yielding sleeps of `1.0, 2.0, 4.0`
for `max=3, base=2.0`. The spec's worked example (AC-4) requires
`2.0, 4.0, 8.0`. Fixed by incrementing `retries_used` BEFORE
computing the backoff, and renamed the helper parameter to
`attempt_number` (1-indexed) with a clarifying docstring. Caught
by the test pass, which re-confirms the value of writing the AC-4
fixture verbatim from the spec rather than from the implementation.
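
The arithmetic behind the off-by-one is easy to check in isolation. The helper below is a sketch of `min(base ** attempt_number, cap)` with a 1-indexed `attempt_number`; the function name `backoff_for` and the 10-second cap used here are illustrative assumptions.

```python
def backoff_for(attempt_number: int, base: float, cap: float) -> float:
    """Capped exponential backoff; attempt_number is 1-indexed."""
    if attempt_number < 1:
        raise ValueError("attempt_number is 1-indexed")
    return min(base ** attempt_number, cap)


# 1-indexed exponent, as in the spec's worked example (max=3, base=2.0).
fixed = [backoff_for(n, 2.0, 10.0) for n in (1, 2, 3)]

# The original bug: the exponent started at 0.
buggy = [min(2.0 ** n, 10.0) for n in (0, 1, 2)]

# At a high attempt number the cap wins over 2.0 ** 6 = 64.0.
capped = backoff_for(6, 2.0, 10.0)

print(fixed, buggy, capped)
```

Raising on `attempt_number < 1` makes a regression to the 0-indexed form fail loudly rather than silently shortening the first sleep.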

## Files touched

Production:

- `src/gps_denied_onboard/components/c6_tile_cache/_types.py`
  (added `VotingStatus.UPLOAD_GIVEUP` + updated forward-transition
  docstring)
- `src/gps_denied_onboard/components/c6_tile_cache/interface.py`
  (added `TileMetadataStore.increment_upload_attempts(tile_id) -> int`
  with a `NotImplementedError` default impl per the spec's
  Compatibility NFR)
- `src/gps_denied_onboard/components/c6_tile_cache/postgres_filesystem_store.py`
  (added `increment_upload_attempts` SQL + extended
  `_ALLOWED_VOTING_TRANSITIONS` + tightened `pending_uploads` SQL
  to exclude `voting_status='upload_giveup'`)
- `db/migrations/versions/0003_c11_upload_attempts.py`
  (new: additive column + widened CHECK constraint)
- `src/gps_denied_onboard/components/c11_tile_manager/config.py`
  (added `C11RetryConfig` frozen dataclass, `disable_retry_decorator`
  bypass flag, nested `retry: C11RetryConfig` field on `C11Config`)
- `src/gps_denied_onboard/components/c11_tile_manager/idempotent_retry.py`
  (new: `IdempotentRetryTileUploader`, `_RetryMetadataStoreLike`,
  `_iso_now`)
- `src/gps_denied_onboard/components/c11_tile_manager/__init__.py`
  (re-exports for `C11RetryConfig`, `IdempotentRetryTileUploader`)
- `src/gps_denied_onboard/runtime_root/c11_factory.py`
  (`build_tile_uploader` now wraps `HttpTileUploader` in the
  decorator by default; `disable_retry_decorator=true` returns
  the bare uploader; new `clock` keyword parameter with WallClock
  default for production wiring; return type widened to the
  `TileUploader` Protocol)
- `src/gps_denied_onboard/fdr_client/records.py`
  (registered `c11.upload.giveup` in `KNOWN_PAYLOAD_KEYS`)

Contracts:

- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md`
  (v1.3.0: added `increment_upload_attempts` to method table,
  updated Invariant I-8 forward-transition list)

Tests:

- `tests/unit/c11_tile_manager/test_idempotent_retry.py` (new;
  13 tests: AC-1, AC-2, AC-3, AC-4, AC-5, AC-10 ×2, AC-11 ×2,
  AC-12 ×2, FAILURE pass-through, NFR overhead microbench)
- `tests/unit/c11_tile_manager/test_protocol_conformance.py`
  (added AC-9: `isinstance(IdempotentRetryTileUploader,
  TileUploader)`)
- `tests/unit/c6_tile_cache/test_protocol_conformance.py`
  (extended AC-10 enum-surface test for `UPLOAD_GIVEUP`; updated
  the two metadata-store fakes to include
  `increment_upload_attempts`)
- `tests/unit/test_ac5_alembic.py` (updated head-revision
  assertion to `0003_c11_upload_attempts`)
- `tests/unit/test_az272_fdr_record_schema.py` (added mock
  payload for `c11.upload.giveup`)

## Test results

`pytest tests/unit -q --deselect tests/unit/c11_tile_manager/test_signing_key.py::test_nfr_perf_sign_microbench_p99_under_one_ms`:

- **1429 passed**, 80 skipped, 1 deselected, 3 failed (all 3
  failures are pre-existing perf microbenches unrelated to
  AZ-320: C10 batcher overhead, C8 covariance projector latency,
  and the C11 signing-key sign-p99 microbench that is the same
  flaky test the deselect targets). The deselected one is the
  signing-key bench; the C10 and C8 perf benches were also
  flaky on Batch 40's sweep (same dev-host noise).
- +9 net tests vs. Batch 40's sweep (the 13 decorator tests +
  1 conformance test + 2 factory bypass tests, minus the 7
  fakes that were already counted under c6 conformance and
  now include the new `increment_upload_attempts` method).

`pytest tests/unit/c11_tile_manager tests/unit/c6_tile_cache tests/unit/test_az272_fdr_record_schema.py`:

- **238 passed**, 57 skipped (Postgres+Docker gates),
  1 deselected. Zero failures across all in-scope unit suites.

`ReadLints`: clean across every touched file.

## Code review verdict

**PASS_WITH_WARNINGS**: see
`_docs/03_implementation/reviews/batch_41_review.md`. Findings:

- F1 (Informational): Clock injection deviation from prior
  batches is now CLOSED for C11 (decorator uses the full Clock
  Protocol). No action.
- F2 (Low): `_iso_now()` still pulls wall-clock directly via
  `datetime.now`; aligns with existing `tile_uploader._iso_now`
  but the project-wide hygiene PBI to derive ts from `Clock`
  remains open.
- F3 (Low): Spec says "If `retries_used < max_in_call_retries`
  AND there are still tiles with `voting_status == pending`"; the
  decorator only checks the budget. Equivalent in practice (the
  inner's next call queries `pending_uploads` and returns
  `SUCCESS` immediately if empty), but worth a one-line comment.
- F4 (Low): AC-7 (concurrent SQL increment) and AC-8 (migration
  applied to live DB) are gated behind Docker-compose and were
  not exercised in this dev sweep. The SQL implementation
  follows the spec verbatim (`UPDATE … RETURNING …`); Docker
  CI run will validate.
- F5 (Low): Postgres tests under
  `c6_tile_cache/test_postgres_schema.py` and
  `c6_tile_cache/fixtures/c6_postgres_schema_v2.sql` still
  reference the AZ-304 head and will need a follow-up tweak when
  the Docker-gated suite is run against 0003. No code change in
  this batch since those tests are skipped on the dev host.

No blocking findings; no code change required for batch close-out.

## Cumulative review

Batch 41 is single-task; the next cumulative review window covers
batches 40-42 and will land before Batch 43 starts. The recurring
Clock-vs-sleep deviation flagged in cumulative reports for batches
37-39 is now CLOSED for C11 (this batch landed the full Clock
injection); the project-wide audit-PBI for `_iso_now` /
`datetime.now` callers remains open.
@@ -0,0 +1,288 @@
# Batch 41: Code Review

**Tasks**: AZ-320 (C11 IdempotentRetryTileUploader)
**Cycle**: 1
**Reviewer**: autodev
**Verdict**: **PASS_WITH_WARNINGS**

## Scope reviewed

Production code:

- `src/gps_denied_onboard/components/c6_tile_cache/_types.py`
  (added `VotingStatus.UPLOAD_GIVEUP`)
- `src/gps_denied_onboard/components/c6_tile_cache/interface.py`
  (added `TileMetadataStore.increment_upload_attempts(tile_id) -> int`
  with a `NotImplementedError` default impl)
- `src/gps_denied_onboard/components/c6_tile_cache/postgres_filesystem_store.py`
  (added `increment_upload_attempts` SQL impl, extended
  `_ALLOWED_VOTING_TRANSITIONS`, tightened `pending_uploads` SQL)
- `db/migrations/versions/0003_c11_upload_attempts.py`
  (additive: column + widened CHECK constraint, reversible)
- `src/gps_denied_onboard/components/c11_tile_manager/config.py`
  (added `C11RetryConfig` frozen dataclass + `disable_retry_decorator`
  + nested `retry: C11RetryConfig` field)
- `src/gps_denied_onboard/components/c11_tile_manager/idempotent_retry.py`
  (new: `IdempotentRetryTileUploader`, `_RetryMetadataStoreLike`)
- `src/gps_denied_onboard/components/c11_tile_manager/__init__.py`
  (re-exports)
- `src/gps_denied_onboard/runtime_root/c11_factory.py`
  (`build_tile_uploader` wraps in decorator by default; bypass via
  `disable_retry_decorator=true`; new `clock` keyword parameter)
- `src/gps_denied_onboard/fdr_client/records.py`
  (registered `c11.upload.giveup`)

Contracts:

- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md`
  (v1.3.0: added `increment_upload_attempts`, updated I-8)

Tests:

- `tests/unit/c11_tile_manager/test_idempotent_retry.py` (new; 13 tests)
- `tests/unit/c11_tile_manager/test_protocol_conformance.py`
  (added AC-9)
- `tests/unit/c6_tile_cache/test_protocol_conformance.py`
  (extended AC-10 enum surface; added `increment_upload_attempts`
  to two metadata-store fakes)
- `tests/unit/test_ac5_alembic.py` (updated head-revision assertion)
- `tests/unit/test_az272_fdr_record_schema.py` (added mock payload)

## Phase 1: Architecture

### AZ-507 cross-component rule

`idempotent_retry.py` does NOT import from `components.c6_tile_cache`.
The two c6 write surfaces it consumes (`increment_upload_attempts`,
`update_voting_status`) are reached via the locally-declared
`_RetryMetadataStoreLike` `Protocol` cut. The `UPLOAD_GIVEUP` enum
value is reached via a locally-scoped string constant
(`_VOTING_STATUS_UPLOAD_GIVEUP = "upload_giveup"`); c6's
`update_voting_status` impl coerces the string back into the enum
via `VotingStatus(status)`, so the decorator never imports the
enum. This is the same pattern Batch 40's `HttpTileDownloader` uses
for the freshness-label string surface.

The AZ-270 lint passes: the composition root remains the only layer
that may bind concrete c6 implementations.

### Composition root wiring

`build_tile_uploader` now returns a `TileUploader` Protocol (was
the concrete `HttpTileUploader`). Default path wraps the inner in
`IdempotentRetryTileUploader`; the `disable_retry_decorator=true`
config flag returns the bare inner with a single INFO log
(`kind="c11.upload.retry.decorator.bypassed"`). The `clock` keyword
defaults to a fresh `WallClock()` for production wiring; tests
inject a fake clock to keep timing deterministic.

The new factory tests (test_idempotent_retry.py, `test_ac10_*`)
exercise both branches by toggling the config flag.
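
The two factory branches can be sketched as follows. Everything here is a simplified stand-in: the real `build_tile_uploader` takes more dependencies than shown, the stub classes carry no behaviour, and the `backoff_base` field name, all default values, and the use of `logging`'s `extra` for the `kind` key are assumptions.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Optional

log = logging.getLogger("c11_factory_sketch")


@dataclass(frozen=True)
class C11RetryConfig:
    max_in_call_retries: int = 3       # default assumed
    max_per_tile_attempts: int = 5     # default assumed
    backoff_base: float = 2.0          # field name and default assumed
    backoff_cap_s: float = 10.0        # default assumed


@dataclass(frozen=True)
class C11Config:
    disable_retry_decorator: bool = False
    retry: C11RetryConfig = field(default_factory=C11RetryConfig)


class WallClock:
    def time_ns(self) -> int:
        return time.time_ns()


class HttpTileUploader:
    """Stub for the AZ-319 uploader."""


class IdempotentRetryTileUploader:
    """Stub for the AZ-320 decorator: holds the inner, config and clock."""
    def __init__(self, inner, retry: C11RetryConfig, clock) -> None:
        self.inner = inner
        self.retry = retry
        self.clock = clock


def build_tile_uploader(config: C11Config, *, clock: Optional[object] = None):
    inner = HttpTileUploader()
    if config.disable_retry_decorator:
        # Bypass branch: one INFO log, bare uploader returned.
        log.info("retry decorator bypassed",
                 extra={"kind": "c11.upload.retry.decorator.bypassed"})
        return inner
    return IdempotentRetryTileUploader(inner, config.retry, clock or WallClock())


decorated = build_tile_uploader(C11Config())
bare = build_tile_uploader(C11Config(disable_retry_decorator=True))
print(type(decorated).__name__, type(bare).__name__)
```

Keeping `clock` keyword-only with a production default means existing call sites keep working while tests can pass a fake.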

### Backwards-compat for `TileMetadataStore` Protocol

The new `increment_upload_attempts` method is added with a
`NotImplementedError` default impl on the Protocol surface, so
existing AZ-303 implementations that explicitly subclass the
Protocol inherit the method and continue to satisfy `isinstance`
checks (the `runtime_checkable` mechanism only checks for attribute
presence, and an inherited default impl IS an attribute). Purely
duck-typed fakes do not inherit the default, so the two c6
conformance fakes that didn't have the method have been updated
to declare it (raising `NotImplementedError`) so that
`runtime_checkable` continues to behave consistently.
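
The difference between inheriting a Protocol default and duck typing can be checked with a few lines of standard-library Python. `StoreLike` and the three store classes are hypothetical names for this sketch; the point is only how `runtime_checkable` treats each case.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class StoreLike(Protocol):
    def update_voting_status(self, tile_id: str, status: str) -> None: ...

    def increment_upload_attempts(self, tile_id: str) -> int:
        # Default body: only classes that explicitly subclass the
        # Protocol inherit this attribute.
        raise NotImplementedError


class ExplicitStore(StoreLike):
    """Subclasses the Protocol, so it inherits the default method."""
    def update_voting_status(self, tile_id: str, status: str) -> None:
        return None


class DuckStore:
    """Structural only, and missing the new method."""
    def update_voting_status(self, tile_id: str, status: str) -> None:
        return None


class UpdatedDuckStore:
    """Structural fake updated to declare the new method."""
    def update_voting_status(self, tile_id: str, status: str) -> None:
        return None

    def increment_upload_attempts(self, tile_id: str) -> int:
        raise NotImplementedError


print(isinstance(ExplicitStore(), StoreLike))
print(isinstance(DuckStore(), StoreLike))
print(isinstance(UpdatedDuckStore(), StoreLike))
```

Only the duck-typed class without the method fails the check, which is why fakes that do not inherit from the Protocol need the method declared explicitly.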

### Migration discipline

`0003_c11_upload_attempts.py` is purely additive on top of
`0002_c6_tile_identity_and_lru`:

- `tiles.upload_attempts INTEGER NOT NULL DEFAULT 0` (existing
  rows get 0 by default; no data migration required)
- `ck_tiles_voting_status` widened to admit `'upload_giveup'`
  while preserving the `voting_status IS NULL` clause from 0001
  (legacy rows on dev DBs that never set a voting status would
  otherwise fail the new CHECK)
- `downgrade()` is symmetric: it drops the column and restores the
  AZ-304 CHECK predicate exactly

Per the spec's "Unacceptable substitutes" clause and
`coderule.mdc`, AZ-304's 0001 + 0002 migrations are unchanged.

## Phase 2: Code quality

### Single Responsibility

`IdempotentRetryTileUploader` has a single responsibility: bound
the retry behaviour around an inner `TileUploader`. Internal
helpers are split cleanly:

- `_handle_rejected_tiles`: fan-out increment + threshold check
- `_mark_giveup`: single-tile transition + FDR + ERROR log
- `_backoff_for`: pure exponent computation
- `_sleep`: Clock-routed sleep (honours AZ-398)
- `_next_retry_at_s`: operator hint derivation
- `_with_retry_count`: pure dataclass.replace wrapper

No method handles two distinct concerns; every method name
matches the work it does.

### Error suppression

The decorator does NOT swallow exceptions. `_handle_rejected_tiles`
explicitly re-raises any failure from `increment_upload_attempts`
(the alternative would be silently retrying without budget
enforcement, exactly the failure mode the spec calls out as
"unbounded behaviour"). FDR `enqueue` failures inside `_mark_giveup`
will propagate; this is consistent with `tile_uploader.py`'s
treatment of the same call.

### Comments

Production comments are limited to non-obvious intent:

- `_VOTING_STATUS_UPLOAD_GIVEUP`: explains why the constant is
  declared locally (AZ-507 boundary) rather than imported.
- The `retries_used += 1` move-up: explains the off-by-one fix
  and references the spec's worked example.
- `_RetryMetadataStoreLike` docstring: documents the
  `Any`-typed status parameter rationale.

No narration comments; the AAA test pattern is honoured throughout
the test file (with `# Arrange / # Act / # Assert` headers, omitting
sections that are empty).

## Phase 3: Test coverage vs. spec

| AC | Test | Coverage |
|----|------|----------|
| AC-1 | `test_ac1_success_on_first_attempt_zero_side_effects` | Pass-through; zero retries |
| AC-2 | `test_ac2_partial_then_success_increments_attempts_and_sleeps_once` | Sleep[2.0]; retry_count=1; 3 increments |
| AC-3 | `test_ac3_per_tile_budget_exhausted_moves_to_giveup` | Threshold trips; FDR + ERROR log + transition |
| AC-4 | `test_ac4_in_call_budget_exhausted_yields_partial_with_hint` | Sleeps[2.0,4.0,8.0]; retry_count=3; next_retry_at_s set |
| AC-5 | `test_ac5_backoff_cap_honoured_at_high_attempt_number` | Cap at 10s, never 64s |
| AC-6 | `test_ac10_voting_status_has_documented_states_only` (in c6 conformance) | `UPLOAD_GIVEUP` present |
| AC-7 | (deferred: Postgres+Docker gated) | SQL impl reviewed against spec verbatim |
| AC-8 | (deferred: Postgres+Docker gated) | Migration reviewed; head assertion updated |
| AC-9 | `test_ac9_idempotent_retry_decorator_satisfies_uploader_protocol` | `isinstance(decorator, TileUploader)` |
| AC-10 | `test_ac10_factory_returns_decorated_uploader_by_default` + `test_ac10_factory_bypasses_decorator_when_flag_set` | Both branches |
| AC-11 | `test_ac11_enumerate_pending_passes_through` + `test_ac11_confirm_flight_state_passes_through` | Both methods delegate |
| AC-12 | `test_ac12_flight_state_not_on_ground_propagates_without_retry` + `test_ac12_satellite_provider_error_propagates_without_retry` | Re-raised; zero sleep |
| AC-13 | (implicit: relies on c6 SQL `pending_uploads` filter, exercised in AZ-319 batch suite) | Tile filter SQL reviewed |
| NFR-perf-overhead | `test_nfr_overhead_under_5ms_on_success_first_attempt` | Generous bound (50ms dev-host) catches O(n²) regressions |

The deferred tests (AC-7, AC-8) require Postgres + Alembic apply
and are only exercised in CI's Docker-compose phase. The schema
test under `tests/unit/c6_tile_cache/test_postgres_schema.py` is
docker-gated (skipped on this dev host); a follow-up tweak there
to assert the new column will land when the Docker suite is
re-run against 0003 (see Findings F4 + F5).

## Phase 4: Lints

`ReadLints` clean across all touched files.

## Phase 5: Test results

`pytest tests/unit/c11_tile_manager tests/unit/c6_tile_cache tests/unit/test_az272_fdr_record_schema.py -q --deselect tests/unit/c11_tile_manager/test_signing_key.py::test_nfr_perf_sign_microbench_p99_under_one_ms`:

- **238 passed, 57 skipped, 1 deselected**
- Skips are Docker-gated (Postgres); none are AZ-320 related

`pytest tests/unit -q --deselect tests/unit/c11_tile_manager/test_signing_key.py::test_nfr_perf_sign_microbench_p99_under_one_ms`:

- **1429 passed, 80 skipped, 3 failed**; all 3 failures are
  pre-existing perf microbenches unrelated to AZ-320 scope:
  - `tests/unit/c10_provisioning/test_descriptor_batcher.py::test_nfr_perf_overhead_below_5_percent`
  - `tests/unit/c8_fc_adapter/test_az392_covariance_projector.py::test_nfr_perf_projector_under_100us_per_call`
  - (signing-key NFR was deselected; same failure mode flagged
    in Batch 40's sweep)

## Findings

### F1: Recurring Clock-injection deviation: CLOSED for C11 (Informational)

**Severity**: Informational
**Status**: closed by this batch

The cumulative review reports for batches 37-39 flagged that C11's
sub-skills consistently injected a bare `Callable[[float], None]`
sleep instead of the full `Clock` Protocol. AZ-320 needed both
`monotonic_ns` (backoff arithmetic) AND `time_ns` (operator hint),
so the decorator accepts the full `Clock` Protocol, matching
AZ-307 / AZ-308 / project hygiene. No action; documented for the
cumulative-batch readers.

### F2: `_iso_now()` derives ts from datetime.now (Low)

**Severity**: Low
**Recommendation**: track in the existing project-wide
`_iso_now` audit PBI

The decorator's FDR records use `_iso_now()` (which calls
`datetime.now(timezone.utc).strftime(...)`) for the `ts` field
rather than deriving it from the injected `Clock.time_ns()`. This
matches the existing pattern in `tile_uploader.py` (which does the
same), so introducing a deviation in this batch alone would cause
inconsistency across C11. The right fix is a project-wide sweep
(also covers `signing_key.py` etc.) that the cumulative review
window can carry.
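
For the follow-up sweep, a Clock-derived timestamp could look roughly like the sketch below. It is not what this batch ships: `iso_from_clock` and `FixedClock` are hypothetical names, and `isoformat()` is used only because the project's actual `strftime` format string is not shown in this review.

```python
from datetime import datetime, timezone


class FixedClock:
    """Test clock returning a fixed wall-clock time in nanoseconds."""
    def __init__(self, wall_ns: int) -> None:
        self._wall_ns = wall_ns

    def time_ns(self) -> int:
        return self._wall_ns


def iso_from_clock(clock) -> str:
    """Derive a UTC ISO-8601 string from an injected clock."""
    seconds, nanos = divmod(clock.time_ns(), 1_000_000_000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Integer arithmetic keeps sub-second precision exact down to microseconds.
    return stamp.replace(microsecond=nanos // 1000).isoformat()


print(iso_from_clock(FixedClock(0)))
print(iso_from_clock(FixedClock(1_500_000_000)))
```

Because the timestamp comes from the injected clock, FDR records produced in tests become fully deterministic, which is the main motivation for the audit PBI.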

### F3: Spec wording: `pending` tile check vs. budget check (Low)

**Severity**: Low
**Recommendation**: add a one-line comment OR update the spec

The spec § Outcome bullet 3 says: "If `retries_used <
config.max_in_call_retries` AND there are still tiles with
`voting_status == pending`, sleep + recurse". The implementation
only checks the budget. The two are equivalent in practice: if
no tiles are still `pending`, the inner uploader's next call queries
`pending_uploads`, finds nothing, and returns `outcome=SUCCESS`.
The implementation does NOT, however, explicitly query c6 to
short-circuit before the sleep. Such a short-circuit would save at
most one `time.sleep` and would add a c6 round-trip per retry, so
the implementation choice is reasonable; a one-line code comment
referencing the spec equivalence would close the gap.

### F4: AC-7 + AC-8 not exercised on dev host (Low)

**Severity**: Low
**Recommendation**: defer to next Docker-compose CI run

AC-7 (concurrent `increment_upload_attempts`) and AC-8 (migration
applied to live DB) require Postgres + Alembic apply. The
implementation follows the spec verbatim:

- `increment_upload_attempts`: `UPDATE tiles SET upload_attempts =
  upload_attempts + 1 WHERE tile_id = $1 RETURNING upload_attempts`
- Migration: `op.add_column(...)` + `CREATE … CHECK …`, with a
  symmetric `downgrade()`

Both will be exercised by CI's Docker-compose phase. Code review
of the SQL + migration is complete; runtime validation is the
gating step.

### F5: Postgres schema test still references AZ-304 head (Low)

**Severity**: Low
**Recommendation**: update in the same PR that lands the
Docker-compose CI run for 0003

`tests/unit/c6_tile_cache/test_postgres_schema.py` and
`tests/unit/c6_tile_cache/fixtures/c6_postgres_schema_v2.sql` still
reference AZ-304's head `0002_c6_tile_identity_and_lru`. These
tests are docker-gated and skipped on this dev host, so they did
NOT fail in the Batch 41 sweep. They will fail when the Docker
CI runs against 0003, at which point the fixture file should be
extended (or replaced with `c6_postgres_schema_v3.sql`) to
include the new column + widened constraint, and `_AZ304_REV`
constants should be supplemented with `_AZ320_REV`.

This is intentionally NOT done in this batch (the dev sweep can't
verify the fix), but documented here as a known follow-up.

## Verdict

**PASS_WITH_WARNINGS**. Five Low/Informational findings, none
blocking; F3 is the only one that implies a code change (a
single comment), and it is genuinely optional.