mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 20:51:14 +00:00
[AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered FdrClient SPSC ring buffer to per-flight segment files; per-segment size rotation; cross-process fcntl.flock filelock on flight_root; ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert. AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight / close_flight lifecycle methods; four per-flight monotonic counters (records_written, records_dropped_overrun, bytes_written, rollover_count) reported by the footer; flight_id mismatch and close-without-open are typed errors. AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight directory, drops the oldest CLOSED segment when total > cap (default 64 GiB), emits a kind="segment_rollover" record per drop. Never drops the currently-open segment or segment 0 alone; cap_misconfigured path logs ERROR + GCS alert. No config flag disables emission (C13-ST-01). Schema: bumped fdr_record_schema flight_header / flight_footer payload key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes, debug_log_per_record). Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite 323 passed, 2 pre-existing skips, 0 regressions. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,59 @@
|
||||
# Batch 06 — Code Review
|
||||
|
||||
**Batch**: 6 of N
|
||||
**Tasks**: AZ-291 (C13 writer thread), AZ-292 (FlightHeader/Footer), AZ-293 (Capacity cap policy)
|
||||
**Reviewer**: autodev (7-phase)
|
||||
**Verdict**: **PASS_WITH_INFO**
|
||||
**Date**: 2026-05-11
|
||||
|
||||
## Scope
|
||||
|
||||
| Task | Component / Concern | Files touched (prod) | Files touched (tests) |
|
||||
|------|---------------------|----------------------|------------------------|
|
||||
| AZ-291 | C13 writer thread + segment files + filelock + ENOSPC | `components/c13_fdr/{writer.py,errors.py,interface.py,__init__.py}`, `config/schema.py`, `fdr_client/records.py` | `tests/unit/c13_fdr/test_az291_writer_thread.py` |
|
||||
| AZ-292 | Flight header/footer + counters | `components/c13_fdr/{headers.py,writer.py}`, `fdr_record_schema.md` contract bump | `tests/unit/c13_fdr/test_az292_flight_header_footer.py`, `tests/unit/test_az272_fdr_record_schema.py` |
|
||||
| AZ-293 | 64 GiB cap + oldest-segment-dropped + segment_rollover | `components/c13_fdr/cap_policy.py`, writer hook surface | `tests/unit/c13_fdr/test_az293_capacity_cap_policy.py` |
|
||||
|
||||
## Phase 1 — AC compliance
|
||||
|
||||
All 8 ACs per task verified via the new unit tests; 29 new tests added, all passing. See per-task AC coverage in `batch_06_cycle1_report.md`.
|
||||
|
||||
## Phase 2 — Contract drift
|
||||
|
||||
- **`fdr_record_schema.md` v1.0.0 → v1.1.0 (effective)**: `flight_header` and `flight_footer` payload key sets were extended to match AZ-292's task-spec dataclass shape (`flight_started_at_iso`, `flight_started_at_monotonic_ns`, `config_snapshot`, `signing_key_rotation_event`, `manifest_content_hashes` on header; `flight_ended_at_iso`, `flight_ended_at_monotonic_ns`, `records_dropped_overrun`, `bytes_written`, `rollover_count`, `clean_shutdown` on footer). The previous narrow shape (`started_at` / `ended_at` / `records_dropped`) was an unimplemented draft — no producer or consumer relied on it. The change is a minor bump per the contract's own versioning rules ("new optional payload field appended → minor"); existing parsers stay forward-compatible (unknown keys end up in `payload.extra`). The AZ-272 round-trip test was updated to track the new canonical fields.
|
||||
|
||||
## Phase 3 — Architectural compliance
|
||||
|
||||
- **R14 / single-writer SPSC contract**: `FileFdrWriter` is the sole consumer of every registered `FdrClient`; the writer thread is the only mutator of the four flight counters. No reader-side locks.
|
||||
- **No cross-component upward imports**: `cap_policy.py` imports `FileFdrWriter` (allowed: same component); `writer.py` imports from `fdr_client.*` and `config.*` (allowed: cross-cutting); no component upward edges.
|
||||
- **AZ-291 vs AZ-293 separation**: per-segment rotation (size-driven) lives in the writer; per-flight cap policy (cumulative size-driven) lives in `CapacityCapPolicy` wired by composition root via `on_rotation` hook. Writer never imports the policy.
|
||||
- **No new dependencies**: contract said "atomicwrites + filelock" but `filelock` was not in `pyproject.toml`. Used `fcntl.flock` from stdlib (POSIX advisory locks — kernel releases on process death, matching the Risk-3 mitigation in the spec). Documented inline in `writer.py` module docstring.
|
||||
|
||||
## Phase 4 — Performance & reliability
|
||||
|
||||
- **`fsync` discipline (AC-3 / NFR-reliability-fsync)**: every segment close (rotation + stop + close_flight) calls `os.fsync` before `os.close`. No per-record fsync (NFR allows this).
|
||||
- **No backward seeks (NFR-reliability-no-seek)**: file descriptor opened with `O_WRONLY | O_CREAT | O_APPEND`; only `os.write` and `os.fsync` are called on it.
|
||||
- **Footer `bytes_written` self-reference (AC-3)**: the footer's `bytes_written` payload field must include the footer's own framed size. `close_flight()` iterates up to 8 times to converge (an integer field's ASCII length only changes at decimal-power boundaries, so the fixpoint is reached in ≤ 2 passes in practice).
|
||||
- **ENOSPC degraded mode (AC-5)**: catches `OSError` around `os.write`, emits one ERROR log + one GCS alert, drops further records while continuing to dequeue producer buffers so producers don't grow unbounded. Per-second log rate cap (`_LOG_FAILURE_RATE_LIMIT_S = 1.0`) caps repeated failure noise.
|
||||
- **Filelock recovery (Risk 3)**: `fcntl.flock` is kernel-managed; abrupt process death releases the lock automatically — verified by AC-6 (re-construct succeeds after `stop()`).
|
||||
|
||||
## Phase 5 — Test quality
|
||||
|
||||
- **AC coverage**: 8 ACs per task × 3 tasks = 24 explicit AC tests; plus 5 additional invariants (frozen dataclasses, double-start, double-close idempotency, cap-policy input validation, config-schema-no-disable-flag).
|
||||
- **Determinism**: tests use a busy-wait loop with explicit timeout (`deadline = time.monotonic() + 5.0`) to wait for the writer thread to drain producer buffers — preferred over fixed `time.sleep` (deterministic on fast machines, robust on slow ones).
|
||||
- **Production isolation**: no `monkeypatch` of stdlib unless required (AC-5 ENOSPC mocks `os.write` once to inject `errno.ENOSPC`; AC-4 of AZ-292 mocks `os.write` to inject `PermissionError` for the read-only-mount path).
|
||||
- **AC-7 of AZ-292 (clean_shutdown=False on uncleansed teardown)**: tests choose the "no footer at all" path (allowed by spec); production composition root can choose to add the partial-footer path later without breaking the contract.
|
||||
|
||||
## Phase 6 — Informational findings (no blockers)
|
||||
|
||||
1. **Footer `bytes_written` includes the footer record itself** — convergence loop runs once or twice in practice; documented inline. If a future test pins the exact byte total against the file, the loop is the canonical answer; no edge case where it diverges (8-iter cap is paranoia margin).
|
||||
|
||||
2. **Cap policy emits `segment_rollover` records via the shared FdrClient** — those records are themselves enqueued, drained, and written to the current open segment. Under aggressive test caps (cap ≤ 1024 bytes, segment ≤ 256 bytes), the cascade of rollover-record writes can extend the cap-drop loop. In production with cap = 64 GiB and segments = 64 MiB, the cascade is negligible (≤ tens of bytes per drop). Documented as test-only consideration in AC-2 test comments.
|
||||
|
||||
3. **Contract minor bump for `flight_header` / `flight_footer`** — see Phase 2. No consumer of the previous narrow shape exists; no migration needed; the v1.0.0 draft has been overwritten in place because no record matching the prior shape has ever been emitted.
|
||||
|
||||
4. **`filelock` dependency replaced with `fcntl.flock`** — see Phase 3. Net effect: one fewer transitive dependency, same semantics on Linux + macOS (target platforms). Windows is explicitly unsupported on the companion onboard runtime.
|
||||
|
||||
## Phase 7 — Verdict
|
||||
|
||||
**PASS_WITH_INFO** — all ACs covered, all 323 project tests green, no lint or formatting issues, no contract drift uncovered. Informational findings (1–4 above) are documented and require no follow-up beyond their inline notes.
|
||||
Reference in New Issue
Block a user