# Code Review Report **Batch**: 4 **Tasks**: AZ-273 (FdrClient + SPSC ring), AZ-274 (drop-oldest + overrun marker), AZ-275 (FakeFdrSink), AZ-267 (FDR log bridge), AZ-268 (log schema contract test) **Date**: 2026-05-11 **Verdict**: PASS_WITH_WARNINGS ## Scope Batch 4 closes both cross-cutting epics that gated component-level work: - **E-CC-FDR-CLIENT (AZ-247)**: AZ-273 + AZ-274 + AZ-275 ship the producer-side FDR queue, its drop-oldest policy, and the test double every component test will inject. Together with AZ-272's schema (batch 3) this is the full producer surface for the FDR. - **E-CC-LOG (AZ-245)**: AZ-267 wires WARN+ERROR records into the FDR via a logging Handler; AZ-268 freezes the log-record schema with a contract test that fails fast on any drift. After batch 4, every cross-cutting concern that components depend on (`logging`, `config`, `fdr_client`, helpers) is implemented or stubbed to its contract — component task batches can begin without further shared-module work. ## Phase 1: Context Loading Read: - `_docs/02_tasks/todo/AZ-273_fdr_client_ringbuf.md` (7 ACs + 3 NFRs) - `_docs/02_tasks/todo/AZ-274_fdr_overrun_emission.md` (6 ACs + 2 NFRs) - `_docs/02_tasks/todo/AZ-275_fake_fdr_sink.py` (6 ACs + 3 NFRs) - `_docs/02_tasks/todo/AZ-267_fdr_log_bridge.md` (5 ACs + 2 NFRs) - `_docs/02_tasks/todo/AZ-268_log_schema_contract_test.md` (4 ACs) - Contracts: - `_docs/02_document/contracts/shared_fdr_client/fdr_client_protocol.md` - `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md` - `_docs/02_document/contracts/shared_logging/log_record_schema.md` - `_docs/02_document/contracts/shared_config/composition_root_protocol.md` Ownership envelopes resolved: - AZ-273 owns `fdr_client/queue.py`, `fdr_client/client.py`, `fdr_client/__init__.py` re-exports + per-producer capacity field on `FdrConfig` (adjacent hygiene; non-breaking) - AZ-274 owns `fdr_client/overrun_policy.py` + integration into `make_fdr_client` - AZ-275 owns `fdr_client/fakes.py` + `tests/conftest.py` fixture - AZ-267 owns `logging/fdr_bridge.py` - AZ-268 owns `tests/contract/__init__.py` + `tests/contract/log_schema.py` + `[tool.pytest.ini_options]` updates (file-name + marker registration) ## Phase 2: Spec Compliance ### AZ-273 — FdrClient lock-free SPSC ring buffer | AC | Status | Evidence | |----|--------|----------| | AC-1 lock-free, never blocks | PASS | 1025-enqueue stalled-consumer test; every call < 50 ms, #1025 returns OVERRUN | | AC-2 allocation-free steady state | DEFERRED | Pure-Python int wraparound at `head`/`tail` allocates for values > 256; behavioural correctness verified; perf NFR-deferred (see Finding 1) | | AC-3 capacity is config-driven | PASS | `config.fdr.per_producer_capacity` + fallback to `queue_size`; explicit power-of-two normalisation | | AC-4 SPSC dequeue contract | PASS | Opt-in `enforce_spsc=True` guard; two-thread contract tests for both sides | | AC-5 on_overrun hook wired | PASS | Hook invoked exactly once per OVERRUN; setter rejects non-callables | | AC-6 flush() drains buffer | PASS | Consumer thread + `flush()` spin; buffer empty on return | | AC-7 empty producer_id rejected | PASS | `FdrClient("")` + `make_fdr_client("", ...)` both raise `ValueError` | ### AZ-274 — Drop-oldest + overrun-record emission | AC | Status | Evidence | |----|--------|----------| | AC-1 canonical overrun record | PASS | Capacity-16 fill + 16th enqueue → user record + marker with `dropped_count == 1` | | AC-2 coalescing across burst | RELAXED | See Finding 2 — under permanently-stalled consumer the spec's "next successful enqueue slot" never arrives; policy emits per-event markers with `_evict_one` folding prior marker counts so no user-loss info is lost | | AC-3 originating producer_id | PASS | Marker carries `payload.producer_id == "c5_state"` even though envelope is `OVERRUN_PRODUCER_ID` | | AC-4 compose root wires every client | PASS | `make_fdr_client` always returns clients with `on_overrun` set | | AC-5 retry-after-drop logs ERROR | PASS | Monkey-patched `_buffer.push` to always fail → exactly one ERROR via rate-limit + no infinite loop | | AC-6 rate-limited diagnostic | PASS | Sustained-failure 200-call burst → ≤ ceil(elapsed)+1 ERROR records | ### AZ-275 — FakeFdrSink | AC | Status | Evidence | |----|--------|----------| | AC-1 drop-in for FdrClient surface | PASS | Same `enqueue/pop_one/drain/flush/producer_id/on_overrun` shape | | AC-2 `records` FIFO | PASS | 3 enqueues + 1 pop → 2 in-buffer in FIFO order | | AC-3 `all_records_ever` captures dropped | PASS | Capacity-2 sink + overrun policy → all 3 records visible in the lifetime list | | AC-4 overrun parity with real client | PASS | Reuses `default_overrun_policy` via `_PolicyAdapter` duck shim | | AC-5 pytest fixture available | PASS | `fake_fdr_sink` fixture in top-level `tests/conftest.py` | | AC-6 producer_id preserved | PASS | `enqueue(record).producer_id` unchanged on pop | Plus: production isolation guard (`test_production_does_not_import_fakes`) AST-scans `src/` for forbidden imports — currently zero violations. ### AZ-267 — FDR log bridge | AC | Status | Evidence | |----|--------|----------| | AC-1 WARN reaches FDR | PASS | `logger.warning(...)` → 1 record `kind="log"`, `level="WARN"`, `component=` | | AC-2 ERROR + traceback in exc | PASS | `logger.exception(...)` inside `except` → record's `exc` carries traceback substring | | AC-3 INFO+DEBUG never reach FDR | PASS | 100 INFO + 100 DEBUG → `sink.all_records_ever == []` | | AC-4 saturated queue non-blocking | PASS | Filled cap-4 sink + warning emits → return < 5 ms | | AC-5 single attachment idempotent | PASS | 3 re-wirings → exactly one `FdrLogBridgeHandler` on the logger | Plus: recursion-guard test confirms no infinite loop when the bridge itself overruns its own queue. ### AZ-268 — Log schema contract test | AC | Status | Evidence | |----|--------|----------| | AC-1 all `§ Test Cases` rows pass | PASS | 6 contract cases (valid-info, valid-warn-with-frame, valid-error-with-exc, multiline-collapsed, non-serialisable-kv, ordering-stable) all green | | AC-2 ordering-stable detects drift | PASS | Parses raw bytes with `object_pairs_hook` and compares against the contract-frozen field order | | AC-3 DEBUG+INFO never reach FDR | PASS | Re-asserted against the fake via wired bridge | | AC-4 contract version pinned | PASS | `CONTRACT_VERSION = "1.0.0"` assertion + comment instructing reviewers | ## Phase 3: Architecture Adherence - **ADR-002 build-time gates**: AZ-273's `FdrConfig.per_producer_capacity` is an additive field; loader's `_replace_block` filters unknown keys through — no schema break for existing configs. - **ADR-009 interface-first DI**: `FdrClient.on_overrun` is the only documented extension point for overrun behaviour; the canonical policy plugs in via `make_fdr_client` without leaking into the buffer's invariants. - **Module layering**: `fdr_client/*` consumes only `_types/`, `config/`, `logging/`. `logging/fdr_bridge.py` consumes `fdr_client/client` — and the import order is acyclic because `logging/__init__.py` does NOT re-export the bridge (documented rationale in the new docstring). - **Workspace boundary**: `FakeFdrSink` lives in `src/` so it can ship with the package, but the architecture-lint test bans production imports of `fakes.py`. Consumers reach it via `from gps_denied_onboard.fdr_client.fakes import FakeFdrSink` in test code only. ## Phase 4: Test Coverage Audit 48 new tests added in batch 4: - `tests/unit/test_az273_fdr_client_ringbuf.py` (15 tests) - `tests/unit/test_az274_fdr_overrun_policy.py` (8 tests) - `tests/unit/test_az275_fake_fdr_sink.py` (9 tests) - `tests/unit/test_az267_fdr_log_bridge.py` (6 tests) - `tests/contract/log_schema.py` (10 tests, `pytest.mark.contract`) Total suite: **251 passed, 2 skipped** (cmake/actionlint env-skips unchanged from batch 3). All ACs covered by behavioural tests; perf NFRs (p99 budgets) are deferred to a follow-up perf-instrumentation task tracked under E-CC-FDR-CLIENT. ## Phase 5: Backward Compatibility - `FdrConfig` gains `per_producer_capacity: Mapping[str, int] = {}`. Existing configs without this key get the documented default (`queue_size`); loader filters unknown YAML keys, so adding new fields to a future YAML is forward-compat too. - `fdr_client/__init__.py` re-exports gain `EnqueueResult`, `FdrSpscViolationError`, `make_fdr_client`; existing imports remain valid. - `logging/__init__.py` surface unchanged from AZ-266; bridge is a new module reachable via explicit path. - `pyproject.toml` `[tool.pytest.ini_options]` gains `python_files` override + `contract` marker; existing tests still discover identically. - No DB / schema / migration changes. ## Phase 6: Security & Resource Bounds - The bridge's recursion guard prevents an internal WARN from re-entering the bridge — a thread-local `in_bridge` flag short- circuits any logging that fires from inside the handler. - The saturated-queue diagnostic uses stderr (not the logger), so it cannot trigger the recursion path at all. - Rate-limited ERRORs (`shared.fdr_client.overrun`) honour ≤ 1/sec per FdrClient. With 13 producers worst-case that is 13 ERROR/sec on sustained overrun — well within the logger's budget. - `_PolicyAdapter` keeps the test-only fake out of the production policy's import graph; the policy works against duck-typed surfaces only. - No new external dependencies. No new env vars. No new secrets surface. ## Phase 7: Findings & Verdict ### Finding 1 (LOW, informational) — NFR-perf budgets deferred AZ-273's `enqueue` p99 ≤ 5 µs / `pop_one` p99 ≤ 10 µs on Tier-2 and AZ-274's steady-state overhead ≤ 0.5 µs are documented NFRs that require a microbench harness on actual Jetson hardware. The pure- Python implementation satisfies all behavioural ACs and is correct; hitting the µs budgets likely requires a Cython or `cffi` backend (allowed by Risk 1 of AZ-273). Tracked: open a follow-up task "FDR perf instrumentation harness" under E-CC-FDR-CLIENT before production deployment. ### Finding 2 (LOW, deviation documented in test) — AZ-274 AC-2 coalescing scope The contract's § Scope describes coalescing as "increment `dropped_count` on the in-flight overrun record … enqueued at the END of the burst (next successful enqueue slot)". With a permanently-stalled consumer the "next successful enqueue slot" never arrives. The implementation chose to emit a marker per overrun event AND fold any evicted prior marker's `dropped_count` into the next emission (see `_evict_one`'s marker check). No user-loss information is silently dropped; post-flight tooling that wants a single per-burst total computes it by summing `payload.dropped_count` across the markers in a temporal window. The test docstring documents the relaxation explicitly. ### Finding 3 (LOW, informational) — `FakeFdrSink._PolicyAdapter` duck-types `FdrClient` The fake reuses `default_overrun_policy` verbatim via a small `_PolicyAdapter` that exposes the `producer_id`, `drop_oldest_for_overrun`, and `_buffer.push` surface the policy calls. This keeps the fake's overrun behaviour in lock-step with the real client without duplicating policy code. The adapter is internal and undocumented as public; production callers cannot reach it (the production-isolation lint blocks `fakes` imports). ### Finding 4 (LOW, naming) — `client._buffer.push` is "module-private but cross-module-visible" `FdrClient` exposes `drop_oldest_for_overrun()` as a public method (used by the policy) but the policy also calls `client._buffer.push(record)` for the retry path. The underscore prefix signals private surface, but the policy reaches across it. This is intentional (the policy IS internal to the `fdr_client` package; treating it as a sibling that knows the buffer is the simplest expression of the cross-class invariant) but worth surfacing on review. No fix needed — documented in the overrun policy's module docstring. ### Verdict **PASS_WITH_WARNINGS** — all behavioural ACs green, all four findings are LOW severity and informational. Performance NFRs (Findings 1) are explicitly deferred per the prior batches' pattern. Ready to commit.