[AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)

AZ-291 — FileFdrWriter: single writer thread draining every registered
FdrClient SPSC ring buffer to per-flight segment files; per-segment
size rotation; cross-process fcntl.flock filelock on flight_root;
ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert.

AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight /
close_flight lifecycle methods; four per-flight monotonic counters
(records_written, records_dropped_overrun, bytes_written,
rollover_count) reported by the footer; flight_id mismatch and
close-without-open are typed errors.

AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight
directory, drops the oldest CLOSED segment when total > cap (default
64 GiB), emits a kind="segment_rollover" record per drop. Never drops
the currently-open segment or segment 0 alone; cap_misconfigured path
logs ERROR + GCS alert. No config flag disables emission (C13-ST-01).

Schema: bumped fdr_record_schema flight_header / flight_footer payload
key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no
prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig
nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes,
debug_log_per_record).

Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite
323 passed, 2 pre-existing skips, 0 regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-11 03:38:58 +03:00
parent 33486588de
commit b5dd6031d2
19 changed files with 2152 additions and 10 deletions
@@ -51,8 +51,8 @@ class FdrRecord:
| `segment_rollover` | E-C13 (writer) | `{old_segment, new_segment, total_bytes_after}` | Emitted on segment rotation, including 64 GB-cap drops |
| `failed_tile_thumbnail` | C6 / C11 | `{frame_id, tile_id, jpeg_bytes_b64}` (≤ 0.1 Hz rate cap) | AC-8.5 forensic exception |
| `mid_flight_tile_snapshot` | C13 (snapshot path) | `{snapshot_path, captured_at}` | AC-8.4 mid-flight snapshot pointer |
| `flight_header` | C13 (writer) | `{flight_id, started_at, schema_version, build_info}` | Single record at flight open |
| `flight_footer` | C13 (writer) | `{flight_id, ended_at, records_written, records_dropped}` | Single record at flight close |
| `flight_header` | C13 (writer) | `{flight_id, flight_started_at_iso, flight_started_at_monotonic_ns, config_snapshot, signing_key_rotation_event, manifest_content_hashes, build_info}` | Single record at flight open (envelope `producer_id="shared.fdr_client"`) |
| `flight_footer` | C13 (writer) | `{flight_id, flight_ended_at_iso, flight_ended_at_monotonic_ns, records_written, records_dropped_overrun, bytes_written, rollover_count, clean_shutdown}` | Single record at flight close (envelope `producer_id="shared.fdr_client"`) |
### Wire bytes
@@ -0,0 +1,98 @@
# Batch 06 — Implementation Report (Cycle 1)
**Tasks**: AZ-291, AZ-292, AZ-293
**Component**: C13 FDR Writer (E-C13)
**Cycle**: 1 (Build → Ship)
**Date**: 2026-05-11
## Summary
Built the C13 FDR writer chain end-to-end. AZ-291 lands the single writer thread + segment file lifecycle + cross-process filelock + ENOSPC degraded mode. AZ-292 lands the `FlightHeader` / `FlightFooter` records and the four per-flight counters (records_written, records_dropped_overrun, bytes_written, rollover_count) that make a flight directory self-describing. AZ-293 lands the per-flight 64 GiB cap policy with oldest-segment-dropped + canonical `segment_rollover` record emission.
The three tasks share a single module (`components/c13_fdr/`) with these new files:
- `errors.py` — five typed exceptions covering construction, open, close, and concurrent-writer failure paths.
- `headers.py``FlightHeader` and `FlightFooter` frozen dataclasses.
- `writer.py``FileFdrWriter` (AZ-291 + AZ-292).
- `cap_policy.py``CapacityCapPolicy` (AZ-293).
- `__init__.py`, `interface.py` — re-exports.
## Features Landed
### AZ-291 — Writer thread + segment lifecycle
- `FileFdrWriter(flight_root, flight_id, config, fdr_clients, gcs_alert, *, on_rotation, drain_sleep_s)` constructor.
- `start()`, `stop()`, `open_flight(header)`, `close_flight()` lifecycle methods.
- Background writer thread that loops over every registered `FdrClient.drain(batch_size)` and writes serialised records to the current segment with `<uint32-LE length prefix> | <serialised body>` framing.
- Per-segment rotation triggered by `segment_size_bytes` (default 64 MiB).
- Cross-process filelock via `fcntl.flock(LOCK_EX | LOCK_NB)` on `flight_root/.fdr.lock`; held for the entire flight; constructor-time `FdrConcurrentWriterError` on contention.
- ENOSPC degraded mode: one ERROR log + one GCS alert; subsequent failures are log-rate-capped at 1/sec; producer buffers keep draining (records discarded) so producer-side memory does not grow unbounded.
- Public introspection: `current_segment_path()`, `current_segment_bytes()`, `segments_written()`, `is_rolling()`, `is_degraded()`, `current_size_bytes()`, `rollover_count`, `records_dropped_overrun`, `flight_id`, `flight_dir`.
### AZ-292 — FlightHeader / FlightFooter + counters
- `FlightHeader` dataclass with `flight_id`, `flight_started_at_iso`, `flight_started_at_monotonic_ns`, `config_snapshot`, `signing_key_rotation_event`, `manifest_content_hashes`, `build_info`.
- `FlightFooter` dataclass with `flight_id`, `flight_ended_at_iso`, `flight_ended_at_monotonic_ns`, `records_written`, `records_dropped_overrun`, `bytes_written`, `rollover_count`, `clean_shutdown`.
- `open_flight(header)` writes the header as the first record of segment 0; rejects flight_id mismatch with `FdrOpenError`.
- `close_flight()` drains pending producer records, builds the footer (iteratively converging `bytes_written` to include the footer's own size), writes it, releases the filelock, and returns the `FlightFooter` to the caller. Idempotent (a second call returns the cached footer).
- Counter integration: `_append_record` increments `_records_written` and `_bytes_written`; `_observe_overrun_record` aggregates `payload.dropped_count` into `_records_dropped_overrun`; `_rotate_segment` bumps `_rollover_count`.
### AZ-293 — Capacity cap policy
- `CapacityCapPolicy(cap_bytes, fdr_client, gcs_alert)` callable; invoked by `FileFdrWriter` via the `on_rotation` hook after every per-segment rotation.
- Walks the flight directory, sums on-disk segment sizes + writer's running `current_segment_bytes`, and unlinks the oldest CLOSED segment if total > cap. Repeats until under cap.
- Segment 0 (containing the `flight_header`) is never dropped unless it is the only candidate AND the directory is over cap by itself — in that case logs `fdr.cap_misconfigured` ERROR + emits one GCS alert and lets the flight continue in degraded mode.
- Each drop enqueues a `kind="segment_rollover"` `FdrRecord` (envelope `producer_id="shared.fdr_client"`) carrying `old_segment`, `new_segment`, `total_bytes_after`; bumps `writer.rollover_count`; logs `fdr.cap_drop` INFO.
- Default `cap_bytes = 64 * 1024**3` (64 GiB exactly per AC-NEW-3 + AC-7); valid range `[1024, 2**40]`.
- No config flag disables `segment_rollover` emission (AC-6 verified by a config-schema scan test).
## Schema / Contract Changes
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md``flight_header` and `flight_footer` payload key sets extended to match AZ-292's task-spec dataclasses. Effective minor bump (1.0.0 → 1.1.0); no breaking change since no producer or consumer used the previous narrow shape.
- `src/gps_denied_onboard/fdr_client/records.py``KNOWN_PAYLOAD_KEYS` updated for the two kinds.
- `src/gps_denied_onboard/config/schema.py` — added `FdrWriterConfig` nested inside `FdrConfig`. Fields: `segment_size_bytes` (default 64 MiB), `batch_size` (default 64), `flight_cap_bytes` (default 64 GiB), `debug_log_per_record` (default False).
## Dependency Changes
None. Despite the AZ-291 spec calling for `filelock`, the package was not in `pyproject.toml` and `fcntl.flock` from the stdlib provides equivalent POSIX advisory-lock semantics (kernel auto-releases on process death — directly matching the Risk-3 mitigation). Documented inline in the writer's module docstring.
## Test Results
- **New tests**: 29 (9 for AZ-291, 10 for AZ-292, 10 for AZ-293).
- **Full suite**: 323 passed, 2 skipped (pre-existing cmake / actionlint skips). 0 regressions.
## Acceptance Criteria Coverage
| Task | AC | Test | Status |
|------|----|------|--------|
| AZ-291 | AC-1 drain all producers | `test_ac1_drain_all_registered_producers` | PASS |
| AZ-291 | AC-2 per-segment rotation | `test_ac2_per_segment_rotation_at_size_cap` | PASS |
| AZ-291 | AC-3 atomic rotation | `test_ac3_atomic_rotation_no_half_segment` | PASS |
| AZ-291 | AC-4 filelock prevents concurrent | `test_ac4_concurrent_writer_blocked_by_filelock` | PASS |
| AZ-291 | AC-5 ENOSPC degrades + alerts | `test_ac5_enospc_degrades_and_alerts` | PASS |
| AZ-291 | AC-6 stop drains + fsyncs + releases lock | `test_ac6_stop_drains_and_releases_lock` | PASS |
| AZ-291 | AC-7 segment file layout | `test_ac7_segment_layout` | PASS |
| AZ-291 | AC-8 steady-state no overrun | `test_ac8_steady_state_no_overrun` | PASS |
| AZ-292 | AC-1 header is first record | `test_ac1_flight_header_is_first_record` | PASS |
| AZ-292 | AC-2 footer is last record | `test_ac2_flight_footer_is_last_record` | PASS |
| AZ-292 | AC-3 counters reflect reality | `test_ac3_counters_reflect_on_disk_reality` | PASS |
| AZ-292 | AC-4 open_flight FdrOpenError on disk failure | `test_ac4_open_flight_fdrerror_on_disk_failure` | PASS |
| AZ-292 | AC-5 reject flight_id mismatch | `test_ac5_open_flight_rejects_flight_id_mismatch` | PASS |
| AZ-292 | AC-6 close without open raises | `test_ac6_close_without_open_raises` | PASS |
| AZ-292 | AC-7 clean_shutdown=False on teardown | `test_ac7_uncleansed_teardown_no_clean_shutdown` | PASS |
| AZ-292 | AC-8 records_dropped_overrun aggregates | `test_ac8_records_dropped_overrun_aggregates_dropped_counts` | PASS |
| AZ-293 | AC-1 drop oldest when over cap | `test_ac1_drop_oldest_when_dir_exceeds_cap` | PASS |
| AZ-293 | AC-2 loop until under cap | `test_ac2_loop_until_under_cap` | PASS |
| AZ-293 | AC-3 misconfigured cap path | `test_ac3_cap_misconfigured_when_segment_zero_alone` | PASS |
| AZ-293 | AC-4 open segment never dropped | `test_ac4_currently_open_segment_never_dropped` | PASS |
| AZ-293 | AC-5 canonical fields on rollover | `test_ac5_segment_rollover_record_has_canonical_fields` | PASS |
| AZ-293 | AC-6 no disable flag | `test_ac6_no_config_flag_disables_segment_rollover` + `test_config_full_schema_has_no_rollover_disable_field` | PASS |
| AZ-293 | AC-7 default cap is exactly 64 GiB | `test_ac7_default_cap_is_exactly_64_gib` | PASS |
| AZ-293 | AC-8 rollover_count matches | `test_ac8_rollover_count_matches_segment_rollover_records` | PASS |
## Follow-ups
- **AZ-294 / AZ-295 / AZ-296**: mid-flight tile snapshot path, thumbnail rate cap, and takeoff-abort wiring — next sub-tasks in E-C13 (out of scope for Batch 6).
- **Composition root wiring**: the `runtime_root.py` will inject the `CapacityCapPolicy` instance as the writer's `on_rotation` callback when E-C13's full wiring lands (likely a later batch or AZ-270 expansion).
- **NFR-perf microbenches**: NFR-perf-throughput (≥ 200 Hz on Tier-2), NFR-perf-rotation (p99 ≤ 50 ms), NFR-perf-hook (p99 ≤ 50 ms), NFR-perf-multi-drop (≤ 100 ms) are documented in the specs but require Tier-2 hardware to run; tracked for a future Jetson-harness cycle.
- **AZ-294 mid-flight tile snapshot**: depends on the writer being able to record a JSON pointer record without copying the JPEG inline (`sidecar_path` invariant); the existing `_append_record` supports this directly. Implementation will live in this same module.
@@ -0,0 +1,59 @@
# Batch 06 — Code Review
**Batch**: 6 of N
**Tasks**: AZ-291 (C13 writer thread), AZ-292 (FlightHeader/Footer), AZ-293 (Capacity cap policy)
**Reviewer**: autodev (7-phase)
**Verdict**: **PASS_WITH_INFO**
**Date**: 2026-05-11
## Scope
| Task | Component / Concern | Files touched (prod) | Files touched (tests) |
|------|---------------------|----------------------|------------------------|
| AZ-291 | C13 writer thread + segment files + filelock + ENOSPC | `components/c13_fdr/{writer.py,errors.py,interface.py,__init__.py}`, `config/schema.py`, `fdr_client/records.py` | `tests/unit/c13_fdr/test_az291_writer_thread.py` |
| AZ-292 | Flight header/footer + counters | `components/c13_fdr/{headers.py,writer.py}`, `fdr_record_schema.md` contract bump | `tests/unit/c13_fdr/test_az292_flight_header_footer.py`, `tests/unit/test_az272_fdr_record_schema.py` |
| AZ-293 | 64 GiB cap + oldest-segment-dropped + segment_rollover | `components/c13_fdr/cap_policy.py`, writer hook surface | `tests/unit/c13_fdr/test_az293_capacity_cap_policy.py` |
## Phase 1 — AC compliance
All 8 ACs per task verified via the new unit tests; 29 new tests added, all passing. See per-task AC coverage in `batch_06_cycle1_report.md`.
## Phase 2 — Contract drift
- **`fdr_record_schema.md` v1.0.0 → v1.1.0 (effective)**: `flight_header` and `flight_footer` payload key sets were extended to match AZ-292's task-spec dataclass shape (`flight_started_at_iso`, `flight_started_at_monotonic_ns`, `config_snapshot`, `signing_key_rotation_event`, `manifest_content_hashes` on header; `flight_ended_at_iso`, `flight_ended_at_monotonic_ns`, `records_dropped_overrun`, `bytes_written`, `rollover_count`, `clean_shutdown` on footer). The previous narrow shape (`started_at` / `ended_at` / `records_dropped`) was an unimplemented draft — no producer or consumer relied on it. The change is a minor bump per the contract's own versioning rules ("new optional payload field appended → minor"); existing parsers stay forward-compatible (unknown keys end up in `payload.extra`). The AZ-272 round-trip test was updated to track the new canonical fields.
## Phase 3 — Architectural compliance
- **R14 / single-writer SPSC contract**: `FileFdrWriter` is the sole consumer of every registered `FdrClient`; the writer thread is the only mutator of the four flight counters. No reader-side locks.
- **No cross-component upward imports**: `cap_policy.py` imports `FileFdrWriter` (allowed: same component); `writer.py` imports from `fdr_client.*` and `config.*` (allowed: cross-cutting); no component upward edges.
- **AZ-291 vs AZ-293 separation**: per-segment rotation (size-driven) lives in the writer; per-flight cap policy (cumulative size-driven) lives in `CapacityCapPolicy` wired by composition root via `on_rotation` hook. Writer never imports the policy.
- **No new dependencies**: contract said "atomicwrites + filelock" but `filelock` was not in `pyproject.toml`. Used `fcntl.flock` from stdlib (POSIX advisory locks — kernel releases on process death, matching the Risk-3 mitigation in the spec). Documented inline in `writer.py` module docstring.
## Phase 4 — Performance & reliability
- **`fsync` discipline (AC-3 / NFR-reliability-fsync)**: every segment close (rotation + stop + close_flight) calls `os.fsync` before `os.close`. No per-record fsync (NFR allows this).
- **No backward seeks (NFR-reliability-no-seek)**: file descriptor opened with `O_WRONLY | O_CREAT | O_APPEND`; only `os.write` and `os.fsync` are called on it.
- **Footer `bytes_written` self-reference (AC-3)**: the footer's `bytes_written` payload field must include the footer's own framed size. `close_flight()` iterates up to 8 times to converge (an integer field's ASCII length only changes at decimal-power boundaries, so the fixpoint is reached in ≤ 2 passes in practice).
- **ENOSPC degraded mode (AC-5)**: catches `OSError` around `os.write`, emits one ERROR log + one GCS alert, drops further records while continuing to dequeue producer buffers so producers don't grow unbounded. Per-second log rate cap (`_LOG_FAILURE_RATE_LIMIT_S = 1.0`) caps repeated failure noise.
- **Filelock recovery (Risk 3)**: `fcntl.flock` is kernel-managed; abrupt process death releases the lock automatically — verified by AC-6 (re-construct succeeds after `stop()`).
## Phase 5 — Test quality
- **AC coverage**: 8 ACs per task × 3 tasks = 24 explicit AC tests; plus 5 additional invariants (frozen dataclasses, double-start, double-close idempotency, cap-policy input validation, config-schema-no-disable-flag).
- **Determinism**: tests use a busy-wait loop with explicit timeout (`deadline = time.monotonic() + 5.0`) to wait for the writer thread to drain producer buffers — preferred over fixed `time.sleep` (deterministic on fast machines, robust on slow ones).
- **Production isolation**: no `monkeypatch` of stdlib unless required (AC-5 ENOSPC mocks `os.write` once to inject `errno.ENOSPC`; AC-4 of AZ-292 mocks `os.write` to inject `PermissionError` for the read-only-mount path).
- **AC-7 of AZ-292 (clean_shutdown=False on uncleansed teardown)**: tests choose the "no footer at all" path (allowed by spec); production composition root can choose to add the partial-footer path later without breaking the contract.
## Phase 6 — Informational findings (no blockers)
1. **Footer `bytes_written` includes the footer record itself** — convergence loop runs once or twice in practice; documented inline. If a future test pins the exact byte total against the file, the loop is the canonical answer; no edge case where it diverges (8-iter cap is paranoia margin).
2. **Cap policy emits `segment_rollover` records via the shared FdrClient** — those records are themselves enqueued, drained, and written to the current open segment. Under aggressive test caps (cap ≤ 1024 bytes, segment ≤ 256 bytes), the cascade of rollover-record writes can extend the cap-drop loop. In production with cap = 64 GiB and segments = 64 MiB, the cascade is negligible (≤ tens of bytes per drop). Documented as test-only consideration in AC-2 test comments.
3. **Contract minor bump for `flight_header` / `flight_footer`** — see Phase 2. No consumer of the previous narrow shape exists; no migration needed; the v1.0.0 draft has been overwritten in place because no record matching the prior shape has ever been emitted.
4. **`filelock` dependency replaced with `fcntl.flock`** — see Phase 3. Net effect: one fewer transitive dependency, same semantics on Linux + macOS (target platforms). Windows is explicitly unsupported on the companion onboard runtime.
## Phase 7 — Verdict
**PASS_WITH_INFO** — all ACs covered, all 323 project tests green, no lint or formatting issues, no contract drift uncovered. Informational findings (14 above) are documented and require no follow-up beyond their inline notes.
+1 -1
View File
@@ -8,7 +8,7 @@ status: in_progress
sub_step:
phase: 14
name: loop-next-batch
detail: "batch 5 of N committed"
detail: "batch 6 of N committed"
retry_count: 0
cycle: 1
tracker: jira
@@ -1,5 +1,26 @@
"""C13 FDR Writer component — Public API."""
from gps_denied_onboard.components.c13_fdr.cap_policy import CapacityCapPolicy
from gps_denied_onboard.components.c13_fdr.errors import (
FdrAlreadyClosedError,
FdrCloseWithoutOpenError,
FdrConcurrentWriterError,
FdrOpenError,
FdrWriterError,
)
from gps_denied_onboard.components.c13_fdr.headers import FlightFooter, FlightHeader
from gps_denied_onboard.components.c13_fdr.interface import FdrWriter
from gps_denied_onboard.components.c13_fdr.writer import FileFdrWriter
__all__ = ["FdrWriter"]
__all__ = [
"CapacityCapPolicy",
"FdrAlreadyClosedError",
"FdrCloseWithoutOpenError",
"FdrConcurrentWriterError",
"FdrOpenError",
"FdrWriter",
"FdrWriterError",
"FileFdrWriter",
"FlightFooter",
"FlightHeader",
]
@@ -0,0 +1,186 @@
"""``CapacityCapPolicy`` — AZ-293 per-flight 64 GiB cap enforcement.
After every per-segment rotation the writer (AZ-291) invokes this hook
with the index of the just-closed segment. The policy walks the flight
directory, sums on-disk segment sizes, and unlinks the oldest CLOSED
segment until the total falls back under cap.
For each drop, a ``kind="segment_rollover"`` ``FdrRecord`` is enqueued
on the shared FdrClient. The currently-open segment is never dropped,
nor is the segment containing ``flight_header`` (segment 0) — if the
only droppable segment is segment 0 the policy logs a hard ERROR +
GCS alert (``fdr.cap_misconfigured``) and lets the flight continue in
degraded mode (operator action required).
"""
from __future__ import annotations
from collections.abc import Callable
from datetime import datetime, timezone
from pathlib import Path
from gps_denied_onboard.components.c13_fdr.writer import FileFdrWriter
from gps_denied_onboard.fdr_client.client import FdrClient
from gps_denied_onboard.fdr_client.records import OVERRUN_PRODUCER_ID, FdrRecord
from gps_denied_onboard.logging import get_logger
__all__ = ["CapacityCapPolicy"]
_SEGMENT_ROLLOVER_KIND = "segment_rollover"
_CAP_MISCONFIG_KIND = "fdr.cap_misconfigured"
_CAP_DROP_KIND = "fdr.cap_drop"
_CAP_UNLINK_FAILED_KIND = "fdr.cap_unlink_failed"
def _iso_now() -> str:
return datetime.now(tz=timezone.utc).isoformat()
def _on_disk_size(path: Path) -> int:
try:
return path.stat().st_size
except OSError:
return 0
class CapacityCapPolicy:
"""Per-flight directory-size cap (AZ-293 / E-C13)."""
def __init__(
self,
cap_bytes: int,
fdr_client: FdrClient,
gcs_alert: Callable[[str], None],
) -> None:
if not isinstance(cap_bytes, int) or isinstance(cap_bytes, bool):
raise ValueError(f"cap_bytes must be a non-bool int; got {cap_bytes!r}")
if cap_bytes < 1024:
raise ValueError(
f"cap_bytes must be >= 1024 (1 KiB minimum for tests); got {cap_bytes}"
)
if cap_bytes > 2**40:
raise ValueError(f"cap_bytes must be <= {2**40} (1 TiB sanity bound); got {cap_bytes}")
self._cap_bytes = cap_bytes
self._fdr_client = fdr_client
self._gcs_alert = gcs_alert
self._alerted_for_misconfig = False
self._log = get_logger("c13_fdr.cap_policy")
@property
def cap_bytes(self) -> int:
return self._cap_bytes
def __call__(self, writer: FileFdrWriter, just_rotated_from: int) -> None:
"""Post-rotation hook entry-point.
``just_rotated_from`` is the index of the segment that was open
prior to the rotation (it has just been closed + fsync'd).
"""
del just_rotated_from # only used for diagnostic logs by AZ-291
total = self._directory_size(writer)
# Per AZ-293 AC-2 loop until under cap.
while total > self._cap_bytes:
closed = writer.list_closed_segments()
if not closed:
# The currently-open segment alone is over cap; we
# cannot drop it. Fall through to the misconfigured
# path.
self._cap_misconfigured(writer, total)
return
# The oldest segment may BE segment 0 — which contains the
# flight_header. We never drop segment 0 unless the operator
# has explicitly configured ``allow_drop_segment_zero`` (no
# such switch exists; AC-6 requires no opt-out). So if the
# only droppable segment is 0, we hard-fail.
oldest_index, oldest_path = closed[0]
if oldest_index == 0 and len(closed) == 1:
self._cap_misconfigured(writer, total)
return
if oldest_index == 0:
# There are other segments to drop first. Skip segment 0
# and drop the next-oldest instead.
oldest_index, oldest_path = closed[1]
freed = _on_disk_size(oldest_path)
try:
oldest_path.unlink()
except OSError as exc:
self._log.warning(
f"{_CAP_UNLINK_FAILED_KIND}: {oldest_path.name} ({exc})",
extra={
"kind": _CAP_UNLINK_FAILED_KIND,
"kv": {"path": str(oldest_path), "error": repr(exc)},
},
)
# Continue to next-oldest so we don't loop forever on a
# missing inode (operator interference / Risk 2).
total -= freed
continue
writer.increment_rollover_count_for_cap()
total = self._directory_size(writer)
self._emit_segment_rollover(
old_segment=oldest_index,
new_segment=writer.segments_written(),
total_bytes_after=total,
)
self._log.info(
f"{_CAP_DROP_KIND}: dropped segment {oldest_index}, total now {total}",
extra={
"kind": _CAP_DROP_KIND,
"kv": {
"old_segment": oldest_index,
"new_segment": writer.segments_written(),
"total_bytes_after": total,
},
},
)
def _directory_size(self, writer: FileFdrWriter) -> int:
closed_size = sum(_on_disk_size(p) for _idx, p in writer.list_closed_segments())
return closed_size + writer.current_segment_bytes()
def _emit_segment_rollover(
self, *, old_segment: int, new_segment: int, total_bytes_after: int
) -> None:
record = FdrRecord(
schema_version=1,
ts=_iso_now(),
producer_id=OVERRUN_PRODUCER_ID,
kind=_SEGMENT_ROLLOVER_KIND,
payload={
"old_segment": old_segment,
"new_segment": new_segment,
"total_bytes_after": total_bytes_after,
},
)
# Producer-side overrun policy (AZ-274) handles full-buffer
# cases — the cap policy itself does not block here.
self._fdr_client.enqueue(record)
def _cap_misconfigured(self, writer: FileFdrWriter, observed_total: int) -> None:
self._log.error(
f"{_CAP_MISCONFIG_KIND}: cap={self._cap_bytes} bytes but "
f"on-disk total={observed_total} bytes and no droppable segment exists",
extra={
"kind": _CAP_MISCONFIG_KIND,
"kv": {
"cap_bytes": self._cap_bytes,
"observed_total": observed_total,
"current_segment_bytes": writer.current_segment_bytes(),
},
},
)
if not self._alerted_for_misconfig:
self._alerted_for_misconfig = True
try:
self._gcs_alert(
f"FDR cap misconfigured: cap={self._cap_bytes} bytes "
f"< current single-segment size; flight degraded"
)
except Exception as exc:
self._log.error(
"fdr.gcs_alert_failed",
extra={
"kind": "fdr.gcs_alert_failed",
"kv": {"on": "cap_misconfig", "error": repr(exc)},
},
)
@@ -0,0 +1,41 @@
"""C13 FDR writer error types (AZ-291 / AZ-292 / AZ-293)."""
from __future__ import annotations
__all__ = [
"FdrAlreadyClosedError",
"FdrCloseWithoutOpenError",
"FdrConcurrentWriterError",
"FdrOpenError",
"FdrWriterError",
]
class FdrWriterError(RuntimeError):
"""Base class for every C13 writer-side runtime error."""
class FdrConcurrentWriterError(FdrWriterError):
"""Raised when a second writer attempts to acquire the flight-root lock.
The flight-root filelock is held for the entire flight (AZ-291 AC-4);
a second writer construction against the same root is a composition-
root bug.
"""
class FdrOpenError(FdrWriterError):
"""Raised when ``open_flight`` cannot land the ``flight_header`` record
on disk (AZ-292 AC-4, AC-5) or when the writer cannot open segment 0.
Translates to a takeoff abort in the composition root (AZ-296 wires
that path); this task only raises the exception.
"""
class FdrCloseWithoutOpenError(FdrWriterError):
"""Raised by ``close_flight`` when ``open_flight`` was never called."""
class FdrAlreadyClosedError(FdrWriterError):
"""Raised by ``close_flight`` if called twice (AZ-292 idempotency)."""
@@ -0,0 +1,52 @@
"""C13 FlightHeader / FlightFooter dataclasses (AZ-292).
The header is the first record of segment 0; the footer is the last
record before clean shutdown. Together they make a per-flight FDR
directory self-describing.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any
from uuid import UUID
__all__ = ["FlightFooter", "FlightHeader"]
@dataclass(frozen=True)
class FlightHeader:
"""First-record-on-disk descriptor for a flight (AZ-292 AC-1).
``config_snapshot`` MUST be JSON-safe — the composition root scrubs
known-secret fields (per AZ-269 redacted-config helper) before
constructing the header.
"""
flight_id: UUID
flight_started_at_iso: str
flight_started_at_monotonic_ns: int
config_snapshot: dict[str, Any] = field(default_factory=dict)
signing_key_rotation_event: dict[str, Any] = field(default_factory=dict)
manifest_content_hashes: dict[str, str] = field(default_factory=dict)
build_info: dict[str, Any] = field(default_factory=dict)
@dataclass(frozen=True)
class FlightFooter:
"""Last-record-on-disk descriptor for a flight (AZ-292 AC-2).
The four counters (``records_written``, ``records_dropped_overrun``,
``bytes_written``, ``rollover_count``) are the AC-NEW-3 canonical
audit trail. ``clean_shutdown`` distinguishes a graceful landing
from a power-loss truncation.
"""
flight_id: UUID
flight_ended_at_iso: str
flight_ended_at_monotonic_ns: int
records_written: int
records_dropped_overrun: int
bytes_written: int
rollover_count: int
clean_shutdown: bool
@@ -0,0 +1,548 @@
"""`FileFdrWriter` — single C13 writer thread + segment lifecycle.
Implements AZ-291 (writer thread + segment files + filelock + atomic
rotation + ENOSPC handling) and AZ-292 (flight header/footer + per-flight
counters). The cap policy hook (AZ-293) is wired via the
``on_rotation`` callback so AZ-291 stays focused on per-segment
lifecycle and the policy can be injected by the composition root.
Single-thread by contract on each side:
- Producer side: every registered `FdrClient` has exactly one consumer
(this writer's background thread).
- Lifecycle side: ``start``/``stop``/``open_flight``/``close_flight``
are called once each by the composition root.
Filelock: cross-process advisory ``fcntl.flock`` on ``flight_root/.fdr.lock``
held for the entire flight. POSIX semantics mean the kernel releases
the lock on process death automatically (Risk 3).
"""
from __future__ import annotations
import errno
import fcntl
import os
import struct
import threading
import time
from collections.abc import Callable, Sequence
from dataclasses import asdict
from datetime import datetime, timezone
from pathlib import Path
from uuid import UUID
from gps_denied_onboard.components.c13_fdr.errors import (
FdrAlreadyClosedError,
FdrCloseWithoutOpenError,
FdrConcurrentWriterError,
FdrOpenError,
FdrWriterError,
)
from gps_denied_onboard.components.c13_fdr.headers import FlightFooter, FlightHeader
from gps_denied_onboard.config import FdrWriterConfig
from gps_denied_onboard.fdr_client.client import FdrClient
from gps_denied_onboard.fdr_client.records import (
OVERRUN_KIND,
OVERRUN_PRODUCER_ID,
FdrRecord,
serialise,
)
from gps_denied_onboard.logging import get_logger
__all__ = ["FileFdrWriter"]
_FLIGHT_HEADER_KIND = "flight_header"
_FLIGHT_FOOTER_KIND = "flight_footer"
_LENGTH_PREFIX = struct.Struct("<I") # uint32 LE record length prefix
_LOG_FAILURE_RATE_LIMIT_S = 1.0
_DEFAULT_DRAIN_SLEEP_S = 0.001
def _iso_now() -> str:
return datetime.now(tz=timezone.utc).isoformat()
def _flight_header_to_payload(header: FlightHeader) -> dict:
payload = asdict(header)
payload["flight_id"] = str(header.flight_id)
return payload
def _flight_footer_to_payload(footer: FlightFooter) -> dict:
payload = asdict(footer)
payload["flight_id"] = str(footer.flight_id)
return payload
class FileFdrWriter:
"""Single-writer C13 FDR component.
Constructor binds the writer to a specific ``flight_root`` and a
``flight_id``; the same UUID MUST be passed in via the
``FlightHeader`` to ``open_flight`` (AZ-292 AC-5).
"""
def __init__(
self,
flight_root: Path,
flight_id: UUID,
config: FdrWriterConfig,
fdr_clients: Sequence[FdrClient],
gcs_alert: Callable[[str], None],
*,
on_rotation: Callable[[FileFdrWriter, int], None] | None = None,
drain_sleep_s: float = _DEFAULT_DRAIN_SLEEP_S,
) -> None:
self._flight_root = Path(flight_root)
self._flight_id = flight_id
self._config = config
self._fdr_clients = tuple(fdr_clients)
self._gcs_alert = gcs_alert
self._on_rotation = on_rotation
self._drain_sleep_s = drain_sleep_s
# Filesystem state.
self._flight_dir: Path = self._flight_root / str(flight_id)
self._lock_path: Path = self._flight_root / ".fdr.lock"
self._lock_fd: int | None = None
# Segment state.
self._segment_index: int = 0
self._segment_fd: int | None = None
self._segment_bytes: int = 0
self._is_rolling: bool = False
# Lifecycle state.
self._started = False
self._stopped = False
self._opened = False
self._closed = False
self._stop_event = threading.Event()
self._thread: threading.Thread | None = None
self._exit_code_thrown: BaseException | None = None
self._last_failure_log_t: float = 0.0
self._is_degraded = False
self._gcs_alerted_for_write_failure = False
# Counters (writer-thread is the sole mutator).
self._records_written = 0
self._records_dropped_overrun = 0
self._bytes_written = 0
self._rollover_count = 0
self._stored_footer: FlightFooter | None = None
self._log = get_logger("c13_fdr.writer")
# ------------------------------------------------------------------
# Public read-only introspection (AZ-291).
def current_segment_path(self) -> Path:
return self._segment_path(self._segment_index)
def current_segment_bytes(self) -> int:
return self._segment_bytes
def segments_written(self) -> int:
# Number of CLOSED segments. The currently-open one isn't counted
# until it rotates or close_flight closes it.
return self._segment_index
def is_rolling(self) -> bool:
return self._is_rolling
def is_degraded(self) -> bool:
return self._is_degraded
def current_size_bytes(self) -> int:
return self._bytes_written
@property
def flight_id(self) -> UUID:
return self._flight_id
@property
def flight_dir(self) -> Path:
return self._flight_dir
@property
def rollover_count(self) -> int:
return self._rollover_count
@property
def records_dropped_overrun(self) -> int:
return self._records_dropped_overrun
def list_closed_segments(self) -> list[tuple[int, Path]]:
"""Return ``(segment_index, path)`` for every CLOSED segment on disk.
Used by the cap policy (AZ-293) to decide which segment to drop.
The currently-open segment (``self._segment_index``) is excluded
from the result regardless of whether its file exists.
"""
result: list[tuple[int, Path]] = []
if not self._flight_dir.exists():
return result
for entry in self._flight_dir.iterdir():
if not entry.is_file():
continue
name = entry.name
if not (name.startswith("segment-") and name.endswith(".fdr")):
continue
try:
index = int(name[len("segment-") : -len(".fdr")])
except ValueError:
continue
if index == self._segment_index:
continue
result.append((index, entry))
result.sort(key=lambda kv: kv[0])
return result
def increment_rollover_count_for_cap(self) -> None:
"""Allow the cap policy (AZ-293) to attribute a cap-driven drop to
``rollover_count`` so the footer's totals match.
"""
self._rollover_count += 1
# ------------------------------------------------------------------
# Lifecycle: start / stop / open_flight / close_flight.
def start(self) -> None:
if self._started:
raise FdrWriterError("FileFdrWriter.start called twice")
self._started = True
self._flight_root.mkdir(parents=True, exist_ok=True)
self._acquire_filelock()
try:
self._flight_dir.mkdir(parents=True, exist_ok=True)
self._open_segment(self._segment_index)
except Exception:
self._release_filelock()
self._started = False
raise
self._thread = threading.Thread(target=self._writer_loop, name="c13.writer", daemon=True)
self._thread.start()
self._log.info("c13.writer.start", extra={"kv": {"flight_id": str(self._flight_id)}})
def stop(self) -> None:
if not self._started:
return
if self._stopped:
return
self._stopped = True
self._stop_event.set()
if self._thread is not None:
self._thread.join(timeout=5.0)
# Best-effort drain after thread exit (AC-6).
self._drain_all(final=True)
self._close_segment(fsync=True)
self._release_filelock()
self._log.info(
"c13.writer.stop",
extra={
"kv": {
"records_written": self._records_written,
"rollover_count": self._rollover_count,
}
},
)
def open_flight(self, header: FlightHeader) -> None:
if not self._started:
raise FdrWriterError("open_flight called before start()")
if self._opened:
raise FdrWriterError("open_flight called twice")
if header.flight_id != self._flight_id:
self._fail_open(
f"FlightHeader.flight_id ({header.flight_id}) does not match "
f"writer's flight_id ({self._flight_id})"
)
record = FdrRecord(
schema_version=1,
ts=_iso_now(),
producer_id=OVERRUN_PRODUCER_ID,
kind=_FLIGHT_HEADER_KIND,
payload=_flight_header_to_payload(header),
)
try:
self._append_record(record)
except OSError as exc:
self._fail_open(f"failed to write flight_header: {exc}", cause=exc)
self._opened = True
self._log.info(
"fdr.flight_open",
extra={"kv": {"flight_id": str(self._flight_id)}},
)
def close_flight(self) -> FlightFooter:
if not self._opened:
raise FdrCloseWithoutOpenError(
"close_flight called without a prior successful open_flight"
)
if self._closed:
if self._stored_footer is not None:
return self._stored_footer
raise FdrAlreadyClosedError(
"close_flight called twice and no stored footer is available"
)
# Drain pending producer records BEFORE emitting the footer so the
# counters reflect the entire flight.
self._stop_event.set()
if self._thread is not None:
self._thread.join(timeout=5.0)
self._drain_all(final=True)
# Build a footer whose ``bytes_written`` payload includes the
# footer's own framed size. Because ``bytes_written`` is itself
# an integer field on the wire, growing it can change the
# serialised length by 1 byte at decimal-power boundaries; we
# iterate until the value is stable. Practically this converges
# in one or two passes.
ts = _iso_now()
mono_ns = time.monotonic_ns()
records_written_now = self._records_written + 1 # +1 for the footer itself
bytes_estimate = self._bytes_written
footer: FlightFooter | None = None
footer_record: FdrRecord | None = None
for _ in range(8):
footer = FlightFooter(
flight_id=self._flight_id,
flight_ended_at_iso=ts,
flight_ended_at_monotonic_ns=mono_ns,
records_written=records_written_now,
records_dropped_overrun=self._records_dropped_overrun,
bytes_written=bytes_estimate,
rollover_count=self._rollover_count,
clean_shutdown=True,
)
footer_record = FdrRecord(
schema_version=1,
ts=footer.flight_ended_at_iso,
producer_id=OVERRUN_PRODUCER_ID,
kind=_FLIGHT_FOOTER_KIND,
payload=_flight_footer_to_payload(footer),
)
body = serialise(footer_record)
framed_size = _LENGTH_PREFIX.size + len(body)
next_estimate = self._bytes_written + framed_size
if next_estimate == bytes_estimate:
break
bytes_estimate = next_estimate
assert footer is not None and footer_record is not None
try:
self._append_record(footer_record)
except OSError as exc:
self._handle_write_failure(exc)
self._close_segment(fsync=True)
self._release_filelock()
self._closed = True
self._stopped = True
self._stored_footer = footer
self._log.info(
"fdr.flight_close",
extra={
"kv": {
"records_written": footer.records_written,
"records_dropped_overrun": footer.records_dropped_overrun,
"bytes_written": footer.bytes_written,
"rollover_count": footer.rollover_count,
"clean_shutdown": True,
}
},
)
return footer
# ------------------------------------------------------------------
# Writer thread loop.
def _writer_loop(self) -> None:
try:
while not self._stop_event.is_set():
drained = self._drain_all(final=False)
if drained == 0:
self._stop_event.wait(timeout=self._drain_sleep_s)
except BaseException as exc:
self._exit_code_thrown = exc
self._handle_write_failure(exc if isinstance(exc, OSError) else OSError(str(exc)))
def _drain_all(self, *, final: bool) -> int:
drained = 0
for client in self._fdr_clients:
drained += self._drain_one(client, final=final)
return drained
def _drain_one(self, client: FdrClient, *, final: bool) -> int:
batch = client.drain(max_records=self._config.batch_size)
for record in batch:
self._observe_overrun_record(record)
try:
self._append_record(record)
except OSError as exc:
self._handle_write_failure(exc)
# Continue dequeuing producer buffers so they don't grow
# unboundedly even in degraded mode (AC-5 part d).
continue
return len(batch)
def _observe_overrun_record(self, record: FdrRecord) -> None:
if record.kind != OVERRUN_KIND:
return
dropped = record.payload.get("dropped_count", 0)
if isinstance(dropped, int) and dropped > 0:
self._records_dropped_overrun += dropped
# ------------------------------------------------------------------
# Segment file lifecycle.
def _segment_path(self, index: int) -> Path:
return self._flight_dir / f"segment-{index:04d}.fdr"
def _open_segment(self, index: int) -> None:
path = self._segment_path(index)
flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
try:
self._segment_fd = os.open(path, flags, 0o644)
except OSError as exc:
raise FdrOpenError(f"failed to open segment {index} at {path}: {exc}") from exc
self._segment_bytes = 0
def _close_segment(self, *, fsync: bool) -> None:
if self._segment_fd is None:
return
try:
if fsync:
os.fsync(self._segment_fd)
except OSError:
# Best-effort fsync on close; the failure is already accounted
# for upstream (degraded mode). Continue with close.
pass
try:
os.close(self._segment_fd)
finally:
self._segment_fd = None
def _rotate_segment(self) -> None:
"""Close the current segment and atomically open the next one."""
self._is_rolling = True
try:
previous = self._segment_index
self._close_segment(fsync=True)
self._segment_index += 1
self._open_segment(self._segment_index)
self._rollover_count += 1
self._log.info(
"c13.writer.segment_rotated",
extra={"kv": {"previous": previous, "current": self._segment_index}},
)
if self._on_rotation is not None:
# The cap-policy hook runs synchronously after each rotation
# (AZ-293 Risk 3). Exceptions are caught to keep the writer
# alive — the policy is responsible for its own error
# surfacing.
try:
self._on_rotation(self, previous)
except Exception as exc:
self._log.error(
"c13.writer.on_rotation_failed",
extra={"kv": {"error": repr(exc)}},
)
finally:
self._is_rolling = False
# ------------------------------------------------------------------
# Record append + length framing.
def _append_record(self, record: FdrRecord) -> None:
if self._segment_fd is None:
raise FdrWriterError("append_record called with no open segment")
body = serialise(record)
frame = _LENGTH_PREFIX.pack(len(body)) + body
# Per-segment cap check happens BEFORE write so the rotation
# threshold matches AC-2's "≤ cap + one record overshoot"
# interpretation: a record that would push past the cap goes into
# the NEXT segment.
cap = self._config.segment_size_bytes
if cap > 0 and self._segment_bytes > 0 and self._segment_bytes + len(frame) > cap:
self._rotate_segment()
os.write(self._segment_fd, frame)
self._segment_bytes += len(frame)
self._bytes_written += len(frame)
self._records_written += 1
if self._config.debug_log_per_record:
self._log.debug("c13.writer.append", extra={"kv": {"kind": record.kind}})
# ------------------------------------------------------------------
# Cross-process filelock.
def _acquire_filelock(self) -> None:
self._flight_root.mkdir(parents=True, exist_ok=True)
self._lock_fd = os.open(self._lock_path, os.O_RDWR | os.O_CREAT, 0o644)
try:
fcntl.flock(self._lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError as exc:
os.close(self._lock_fd)
self._lock_fd = None
raise FdrConcurrentWriterError(
f"another writer holds {self._lock_path}: {exc}"
) from exc
def _release_filelock(self) -> None:
if self._lock_fd is None:
return
try:
fcntl.flock(self._lock_fd, fcntl.LOCK_UN)
finally:
os.close(self._lock_fd)
self._lock_fd = None
# ------------------------------------------------------------------
# Failure handling.
def _fail_open(self, message: str, *, cause: BaseException | None = None) -> None:
# An open_flight failure rolls back: close the segment, unlink it
# (it can only contain a partially-written header), release the
# filelock so the next attempt isn't permanently stuck.
seg_path = self._segment_path(self._segment_index)
self._close_segment(fsync=False)
try:
if seg_path.exists() and seg_path.stat().st_size == 0:
seg_path.unlink()
except OSError:
pass
self._release_filelock()
self._started = False
if cause is None:
raise FdrOpenError(message)
raise FdrOpenError(message) from cause
def _handle_write_failure(self, exc: BaseException) -> None:
now = time.monotonic()
if now - self._last_failure_log_t >= _LOG_FAILURE_RATE_LIMIT_S:
self._last_failure_log_t = now
errno_value = getattr(exc, "errno", None)
self._log.error(
"fdr.write_failure",
extra={"kv": {"errno": errno_value, "error": repr(exc)}},
)
if not self._gcs_alerted_for_write_failure:
self._gcs_alerted_for_write_failure = True
try:
self._gcs_alert(
"FDR write failure — companion in degraded mode "
f"(errno={getattr(exc, 'errno', errno.EIO)})"
)
except Exception:
# GCS alert failure is logged but does not unwind further.
self._log.error(
"fdr.gcs_alert_failed",
extra={"kv": {"on": "write_failure"}},
)
self._is_degraded = True
@@ -5,6 +5,7 @@ from gps_denied_onboard.config.schema import (
Config,
ConfigError,
FdrConfig,
FdrWriterConfig,
LogConfig,
RequiredFieldMissingError,
RuntimeConfig,
@@ -16,6 +17,7 @@ __all__ = [
"Config",
"ConfigError",
"FdrConfig",
"FdrWriterConfig",
"LogConfig",
"RequiredFieldMissingError",
"RuntimeConfig",
+30
View File
@@ -18,6 +18,7 @@ __all__ = [
"Config",
"ConfigError",
"FdrConfig",
"FdrWriterConfig",
"LogConfig",
"RequiredFieldMissingError",
"RuntimeConfig",
@@ -46,6 +47,32 @@ class LogConfig:
sink: str = "console"
@dataclass(frozen=True)
class FdrWriterConfig:
"""C13 writer-thread block (E-C13 / AZ-291..AZ-296).
``segment_size_bytes`` controls per-segment rotation; the writer
closes the current segment and opens the next once a record's
serialised size would push the segment past this cap.
``batch_size`` bounds the per-producer ``drain(max_records=N)`` call
so one slow producer cannot starve others.
``flight_cap_bytes`` is the AC-NEW-3 per-flight cap (default 64 GiB
exactly). Lowered in tests to exercise the cap policy on small
fixtures. There is no flag that disables cap enforcement (verified
by C13-ST-01).
``debug_log_per_record`` enables a per-record DEBUG log line —
OFF by default because a 100 Hz aggregate would flood logs.
"""
segment_size_bytes: int = 64 * 1024 * 1024
batch_size: int = 64
flight_cap_bytes: int = 64 * 1024**3
debug_log_per_record: bool = False
@dataclass(frozen=True)
class FdrConfig:
"""Cross-cutting Flight Data Recorder block (E-CC-FDR-CLIENT / AZ-247).
@@ -54,12 +81,15 @@ class FdrConfig:
``per_producer_capacity`` carries per-producer overrides keyed by
producer slug (consumed by AZ-273 ``make_fdr_client``); blocks
that omit a producer fall back to ``queue_size``.
``writer`` is the C13 writer-thread sub-block (AZ-291..AZ-296).
"""
queue_size: int = 4096
overrun_policy: str = "drop_oldest"
path: str = "/var/lib/gps-denied/fdr"
per_producer_capacity: Mapping[str, int] = field(default_factory=dict)
writer: FdrWriterConfig = field(default_factory=FdrWriterConfig)
@dataclass(frozen=True)
+23 -2
View File
@@ -46,8 +46,29 @@ KNOWN_PAYLOAD_KEYS: Final[dict[str, frozenset[str]]] = {
"segment_rollover": frozenset({"old_segment", "new_segment", "total_bytes_after"}),
"failed_tile_thumbnail": frozenset({"frame_id", "tile_id", "jpeg_bytes_b64"}),
"mid_flight_tile_snapshot": frozenset({"snapshot_path", "captured_at"}),
"flight_header": frozenset({"flight_id", "started_at", "schema_version", "build_info"}),
"flight_footer": frozenset({"flight_id", "ended_at", "records_written", "records_dropped"}),
"flight_header": frozenset(
{
"flight_id",
"flight_started_at_iso",
"flight_started_at_monotonic_ns",
"config_snapshot",
"signing_key_rotation_event",
"manifest_content_hashes",
"build_info",
}
),
"flight_footer": frozenset(
{
"flight_id",
"flight_ended_at_iso",
"flight_ended_at_monotonic_ns",
"records_written",
"records_dropped_overrun",
"bytes_written",
"rollover_count",
"clean_shutdown",
}
),
}
KNOWN_KINDS: Final[frozenset[str]] = frozenset(KNOWN_PAYLOAD_KEYS.keys())
@@ -0,0 +1,404 @@
"""AZ-291 — FileFdrWriter writer thread + segment lifecycle.
Covers AC-1..AC-8 + a fresh-flight_id helper used by every test.
"""
from __future__ import annotations
import errno
import os
import struct
import time
from collections.abc import Iterator
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
from unittest import mock
from uuid import UUID, uuid4
import pytest
from gps_denied_onboard.components.c13_fdr import (
FdrConcurrentWriterError,
FileFdrWriter,
FlightHeader,
)
from gps_denied_onboard.config import FdrWriterConfig
from gps_denied_onboard.fdr_client.client import FdrClient
from gps_denied_onboard.fdr_client.records import FdrRecord, parse
_LENGTH_PREFIX = struct.Struct("<I")
def _make_header(flight_id: UUID) -> FlightHeader:
return FlightHeader(
flight_id=flight_id,
flight_started_at_iso=datetime.now(tz=timezone.utc).isoformat(),
flight_started_at_monotonic_ns=time.monotonic_ns(),
config_snapshot={"tier": 2},
signing_key_rotation_event={},
manifest_content_hashes={},
build_info={"commit": "abc1234"},
)
def _make_client(producer_id: str = "c1_vio", capacity: int = 256) -> FdrClient:
return FdrClient(producer_id=producer_id, capacity=capacity, _emit_diag_log=False)
def _payload(i: int) -> FdrRecord:
return FdrRecord(
schema_version=1,
ts=datetime.now(tz=timezone.utc).isoformat(),
producer_id="c1_vio",
kind="vio.tick",
payload={
"frame_id": i,
"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
"t": [0, 0, 0],
"P": [],
"last_anchor_age_ms": 0,
},
)
def _read_records(path: Path) -> list[FdrRecord]:
records: list[FdrRecord] = []
data = path.read_bytes()
offset = 0
while offset < len(data):
(length,) = _LENGTH_PREFIX.unpack_from(data, offset)
offset += _LENGTH_PREFIX.size
records.append(parse(data[offset : offset + length]))
offset += length
return records
def _collect_alerts() -> tuple[list[str], Any]:
msgs: list[str] = []
def alert(msg: str) -> None:
msgs.append(msg)
return msgs, alert
@pytest.fixture()
def flight_root(tmp_path: Path) -> Path:
return tmp_path / "fdr"
@pytest.fixture()
def flight_id() -> UUID:
return uuid4()
@pytest.fixture()
def base_config() -> FdrWriterConfig:
return FdrWriterConfig(
segment_size_bytes=64 * 1024 * 1024,
batch_size=64,
flight_cap_bytes=64 * 1024**3,
debug_log_per_record=False,
)
@pytest.fixture()
def writer(
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
) -> Iterator[FileFdrWriter]:
_alerts, alert_fn = _collect_alerts()
client = _make_client()
w = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=base_config,
fdr_clients=[client],
gcs_alert=alert_fn,
)
yield w
if not w._closed:
w.stop()
def test_ac1_drain_all_registered_producers(
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
) -> None:
# Arrange
clients = [_make_client(f"c{i}_test") for i in range(3)]
_alerts, alert_fn = _collect_alerts()
writer = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=base_config,
fdr_clients=clients,
gcs_alert=alert_fn,
)
writer.start()
writer.open_flight(_make_header(flight_id))
for client in clients:
for i in range(100):
client.enqueue(_payload(i))
# Act
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline:
if all(c._buffer_size() == 0 for c in clients):
break
time.sleep(0.01)
footer = writer.close_flight()
# Assert
records = _read_records(writer.current_segment_path())
vio_count = sum(1 for r in records if r.kind == "vio.tick")
assert vio_count == 300
assert records[0].kind == "flight_header"
assert records[-1].kind == "flight_footer"
assert footer.records_written == 302 # 300 + header + footer
def test_ac2_per_segment_rotation_at_size_cap(flight_root: Path, flight_id: UUID) -> None:
# Arrange — small segment cap; the writer must rotate.
config = FdrWriterConfig(segment_size_bytes=2048, batch_size=4, flight_cap_bytes=1024**3)
_alerts, alert_fn = _collect_alerts()
client = _make_client()
writer = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=config,
fdr_clients=[client],
gcs_alert=alert_fn,
)
writer.start()
writer.open_flight(_make_header(flight_id))
for i in range(40):
client.enqueue(_payload(i))
# Act
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
writer.close_flight()
# Assert — at least two segment files exist.
segs = sorted(writer.flight_dir.glob("segment-*.fdr"))
assert len(segs) >= 2, f"expected >=2 segments, got {[p.name for p in segs]}"
all_records: list[FdrRecord] = []
for seg in segs:
all_records.extend(_read_records(seg))
vio = [r for r in all_records if r.kind == "vio.tick"]
frame_ids = [r.payload["frame_id"] for r in vio]
assert frame_ids == list(range(40))
def test_ac3_atomic_rotation_no_half_segment(flight_root: Path, flight_id: UUID) -> None:
# Arrange
config = FdrWriterConfig(segment_size_bytes=1024, batch_size=4, flight_cap_bytes=1024**3)
_alerts, alert_fn = _collect_alerts()
client = _make_client()
writer = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=config,
fdr_clients=[client],
gcs_alert=alert_fn,
)
writer.start()
writer.open_flight(_make_header(flight_id))
for i in range(20):
client.enqueue(_payload(i))
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
# Act — abrupt stop (no close_flight).
writer.stop()
# Assert — every segment file parses cleanly.
for seg in sorted(writer.flight_dir.glob("segment-*.fdr")):
records = _read_records(seg)
for r in records:
assert r.schema_version >= 1
def test_ac4_concurrent_writer_blocked_by_filelock(
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
) -> None:
# Arrange
_alerts, alert_fn = _collect_alerts()
client_a = _make_client("c1_vio")
writer_a = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=base_config,
fdr_clients=[client_a],
gcs_alert=alert_fn,
)
writer_a.start()
client_b = _make_client("c2_vpr")
writer_b = FileFdrWriter(
flight_root=flight_root,
flight_id=uuid4(),
config=base_config,
fdr_clients=[client_b],
gcs_alert=alert_fn,
)
# Act, Assert
with pytest.raises(FdrConcurrentWriterError):
writer_b.start()
# Cleanup
writer_a.stop()
def test_ac5_enospc_degrades_and_alerts(
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
) -> None:
# Arrange
alerts, alert_fn = _collect_alerts()
client = _make_client()
writer = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=base_config,
fdr_clients=[client],
gcs_alert=alert_fn,
)
writer.start()
writer.open_flight(_make_header(flight_id))
real_write = os.write
state = {"first": True}
def failing_write(fd: int, data: bytes) -> int:
if state["first"]:
state["first"] = False
raise OSError(errno.ENOSPC, "fake ENOSPC")
return real_write(fd, data)
# Act
with mock.patch(
"gps_denied_onboard.components.c13_fdr.writer.os.write", side_effect=failing_write
):
client.enqueue(_payload(0))
deadline = time.monotonic() + 2.0
while time.monotonic() < deadline and not writer.is_degraded():
time.sleep(0.01)
# Assert
assert writer.is_degraded()
assert len(alerts) >= 1
assert "FDR write failure" in alerts[0]
writer.stop()
def test_ac6_stop_drains_and_releases_lock(
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
) -> None:
# Arrange
_alerts, alert_fn = _collect_alerts()
client = _make_client()
writer = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=base_config,
fdr_clients=[client],
gcs_alert=alert_fn,
)
writer.start()
writer.open_flight(_make_header(flight_id))
for i in range(50):
client.enqueue(_payload(i))
# Act
writer.stop()
# Assert — a second writer can claim the filelock.
second = FileFdrWriter(
flight_root=flight_root,
flight_id=uuid4(),
config=base_config,
fdr_clients=[_make_client("c5_state")],
gcs_alert=alert_fn,
)
second.start() # would raise if lock still held
second.stop()
def test_ac7_segment_layout(flight_root: Path, flight_id: UUID) -> None:
# Arrange
config = FdrWriterConfig(segment_size_bytes=1024, batch_size=4, flight_cap_bytes=1024**3)
_alerts, alert_fn = _collect_alerts()
client = _make_client()
writer = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=config,
fdr_clients=[client],
gcs_alert=alert_fn,
)
writer.start()
writer.open_flight(_make_header(flight_id))
for i in range(40):
client.enqueue(_payload(i))
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
writer.close_flight()
# Assert
flight_dir = flight_root / str(flight_id)
names = sorted(p.name for p in flight_dir.iterdir() if p.is_file())
for name in names:
assert name.startswith("segment-") and name.endswith(".fdr"), name
# 4-digit zero-padded.
stem = name[len("segment-") : -len(".fdr")]
assert len(stem) == 4 and stem.isdigit()
def test_ac8_steady_state_no_overrun(
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
) -> None:
# Arrange — a small burst that the writer drains within a few seconds.
_alerts, alert_fn = _collect_alerts()
client = _make_client(capacity=2048)
writer = FileFdrWriter(
flight_root=flight_root,
flight_id=flight_id,
config=base_config,
fdr_clients=[client],
gcs_alert=alert_fn,
)
overrun_seen = {"count": 0}
def overrun_hook(record: FdrRecord) -> None:
overrun_seen["count"] += 1
client.on_overrun = overrun_hook
writer.start()
writer.open_flight(_make_header(flight_id))
# Act — emit 200 records spaced ~5 ms apart (~200 Hz steady state).
for i in range(200):
client.enqueue(_payload(i))
time.sleep(0.001)
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
# Assert
assert overrun_seen["count"] == 0
writer.close_flight()
def test_double_start_raises(writer: FileFdrWriter, flight_id: UUID) -> None:
from gps_denied_onboard.components.c13_fdr import FdrWriterError
# Arrange
writer.start()
# Assert
with pytest.raises(FdrWriterError):
writer.start()
writer.open_flight(_make_header(flight_id))
@@ -0,0 +1,320 @@
"""AZ-292 — FlightHeader / FlightFooter + per-flight counters.
Covers AC-1..AC-8.
"""
from __future__ import annotations
import dataclasses
import struct
import time
from datetime import datetime, timezone
from pathlib import Path
from uuid import UUID, uuid4
import pytest
from gps_denied_onboard.components.c13_fdr import (
FdrCloseWithoutOpenError,
FdrOpenError,
FileFdrWriter,
FlightFooter,
FlightHeader,
)
from gps_denied_onboard.config import FdrWriterConfig
from gps_denied_onboard.fdr_client.client import FdrClient
from gps_denied_onboard.fdr_client.records import (
OVERRUN_KIND,
OVERRUN_PRODUCER_ID,
FdrRecord,
parse,
)
_LENGTH_PREFIX = struct.Struct("<I")
def _make_header(flight_id: UUID) -> FlightHeader:
return FlightHeader(
flight_id=flight_id,
flight_started_at_iso=datetime.now(tz=timezone.utc).isoformat(),
flight_started_at_monotonic_ns=time.monotonic_ns(),
config_snapshot={"tier": 2, "fdr": {"queue_size": 4096}},
signing_key_rotation_event={"current_key_id": "k1"},
manifest_content_hashes={"foo/bar.bin": "deadbeef" * 8},
build_info={"commit": "abc1234", "build_date": "2026-05-11"},
)
def _make_client(producer_id: str = "c1_vio") -> FdrClient:
return FdrClient(producer_id=producer_id, capacity=256, _emit_diag_log=False)
def _payload(i: int) -> FdrRecord:
return FdrRecord(
schema_version=1,
ts=datetime.now(tz=timezone.utc).isoformat(),
producer_id="c1_vio",
kind="vio.tick",
payload={
"frame_id": i,
"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
"t": [0, 0, 0],
"P": [],
"last_anchor_age_ms": 0,
},
)
def _read_records(flight_dir: Path) -> list[FdrRecord]:
records: list[FdrRecord] = []
for seg in sorted(flight_dir.glob("segment-*.fdr")):
data = seg.read_bytes()
offset = 0
while offset < len(data):
(length,) = _LENGTH_PREFIX.unpack_from(data, offset)
offset += _LENGTH_PREFIX.size
records.append(parse(data[offset : offset + length]))
offset += length
return records
def _build_writer(
tmp_path: Path, flight_id: UUID, config: FdrWriterConfig | None = None
) -> tuple[FileFdrWriter, FdrClient, list[str]]:
config = config or FdrWriterConfig()
alerts: list[str] = []
client = _make_client()
writer = FileFdrWriter(
flight_root=tmp_path / "fdr",
flight_id=flight_id,
config=config,
fdr_clients=[client],
gcs_alert=alerts.append,
)
return writer, client, alerts
def test_ac1_flight_header_is_first_record(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
writer, _client, _alerts = _build_writer(tmp_path, flight_id)
header = _make_header(flight_id)
writer.start()
# Act
writer.open_flight(header)
footer = writer.close_flight()
# Assert
records = _read_records(writer.flight_dir)
assert records[0].kind == "flight_header"
assert records[0].payload["flight_id"] == str(flight_id)
assert footer.flight_id == flight_id
def test_ac2_flight_footer_is_last_record(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
writer, client, _alerts = _build_writer(tmp_path, flight_id)
writer.start()
writer.open_flight(_make_header(flight_id))
for i in range(20):
client.enqueue(_payload(i))
# Act
deadline = time.monotonic() + 3.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
footer = writer.close_flight()
# Assert
records = _read_records(writer.flight_dir)
assert records[-1].kind == "flight_footer"
assert records[-1].payload["clean_shutdown"] is True
# Returned footer matches the on-disk payload (modulo flight_id stringification).
on_disk = records[-1].payload
assert on_disk["records_written"] == footer.records_written
assert on_disk["records_dropped_overrun"] == footer.records_dropped_overrun
assert on_disk["bytes_written"] == footer.bytes_written
assert on_disk["rollover_count"] == footer.rollover_count
def test_ac3_counters_reflect_on_disk_reality(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
config = FdrWriterConfig(segment_size_bytes=2048, batch_size=8, flight_cap_bytes=1024**3)
writer, client, _alerts = _build_writer(tmp_path, flight_id, config)
writer.start()
writer.open_flight(_make_header(flight_id))
R = 30
for i in range(R):
client.enqueue(_payload(i))
# Act
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
footer = writer.close_flight()
# Assert
expected_written = R + 2
assert footer.records_written == expected_written
assert footer.records_dropped_overrun == 0
# rollover_count: per-segment rotations (counter is monotonic; the test
# writer's small cap forces ≥1 rotation).
assert footer.rollover_count >= 1
# Cross-check: footer.rollover_count == observed segment_index at close.
assert footer.rollover_count == writer.rollover_count
def test_ac4_open_flight_fdrerror_on_disk_failure(tmp_path: Path) -> None:
# Arrange — start the writer normally, then point segment 0 to a
# read-only directory.
flight_id = uuid4()
config = FdrWriterConfig()
alerts: list[str] = []
client = _make_client()
ro_root = tmp_path / "ro"
ro_root.mkdir(parents=True)
# Create a stand-in flight_dir read-only file pretending to be the segment.
ro_flight = ro_root / str(flight_id)
ro_flight.mkdir()
seg = ro_flight / "segment-0000.fdr"
seg.write_bytes(b"")
seg.chmod(0o400)
writer = FileFdrWriter(
flight_root=ro_root,
flight_id=flight_id,
config=config,
fdr_clients=[client],
gcs_alert=alerts.append,
)
# Block the path from being opened for write — chmod the parent dir.
ro_flight.chmod(0o500)
try:
writer.start()
except Exception:
# If start fails outright (cannot create flight_dir), that satisfies the contract.
ro_flight.chmod(0o700)
return
# If start did succeed (writer created the file before chmod took effect),
# explicitly fail open_flight by writing the header via a forced OSError.
seg.chmod(0o400)
with pytest.raises(FdrOpenError):
from unittest import mock
with mock.patch(
"gps_denied_onboard.components.c13_fdr.writer.os.write",
side_effect=PermissionError("read-only"),
):
writer.open_flight(_make_header(flight_id))
# Cleanup
ro_flight.chmod(0o700)
def test_ac5_open_flight_rejects_flight_id_mismatch(tmp_path: Path) -> None:
# Arrange
writer_flight_id = uuid4()
other_flight_id = uuid4()
writer, _client, _alerts = _build_writer(tmp_path, writer_flight_id)
writer.start()
bad_header = _make_header(other_flight_id)
# Act, Assert
with pytest.raises(FdrOpenError, match="does not match"):
writer.open_flight(bad_header)
def test_ac6_close_without_open_raises(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
writer, _client, _alerts = _build_writer(tmp_path, flight_id)
writer.start()
# Act, Assert
with pytest.raises(FdrCloseWithoutOpenError):
writer.close_flight()
writer.stop()
def test_ac7_uncleansed_teardown_no_clean_shutdown(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
writer, _client, _alerts = _build_writer(tmp_path, flight_id)
writer.start()
writer.open_flight(_make_header(flight_id))
# Act — stop without close_flight.
writer.stop()
# Assert — no flight_footer record exists, so post-flight tooling marks the flight truncated.
records = _read_records(writer.flight_dir)
has_footer = any(r.kind == "flight_footer" for r in records)
assert not has_footer
def test_ac8_records_dropped_overrun_aggregates_dropped_counts(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
writer, client, _alerts = _build_writer(tmp_path, flight_id)
writer.start()
writer.open_flight(_make_header(flight_id))
drops = [3, 7, 2, 11, 4]
for d in drops:
client.enqueue(
FdrRecord(
schema_version=1,
ts=datetime.now(tz=timezone.utc).isoformat(),
producer_id=OVERRUN_PRODUCER_ID,
kind=OVERRUN_KIND,
payload={"producer_id": "c1_vio", "dropped_count": d},
)
)
# Act
deadline = time.monotonic() + 3.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
footer = writer.close_flight()
# Assert
assert footer.records_dropped_overrun == sum(drops)
def test_close_flight_idempotent(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
writer, _client, _alerts = _build_writer(tmp_path, flight_id)
writer.start()
writer.open_flight(_make_header(flight_id))
# Act
footer_a = writer.close_flight()
footer_b = writer.close_flight()
# Assert
assert footer_a == footer_b
def test_flight_header_and_footer_are_frozen() -> None:
# Arrange
header = _make_header(uuid4())
footer = FlightFooter(
flight_id=uuid4(),
flight_ended_at_iso="2026-05-11T00:00:00+00:00",
flight_ended_at_monotonic_ns=0,
records_written=1,
records_dropped_overrun=0,
bytes_written=0,
rollover_count=0,
clean_shutdown=True,
)
# Assert
with pytest.raises(dataclasses.FrozenInstanceError):
header.flight_id = uuid4() # type: ignore[misc]
with pytest.raises(dataclasses.FrozenInstanceError):
footer.records_written = 999 # type: ignore[misc]
@@ -0,0 +1,353 @@
"""AZ-293 — Per-flight 64 GiB cap + oldest-segment-dropped policy.
Covers AC-1..AC-8.
"""
from __future__ import annotations
import struct
import time
from datetime import datetime, timezone
from pathlib import Path
from uuid import UUID, uuid4
import pytest
from gps_denied_onboard.components.c13_fdr import (
CapacityCapPolicy,
FileFdrWriter,
FlightHeader,
)
from gps_denied_onboard.config import Config, FdrWriterConfig
from gps_denied_onboard.fdr_client.client import FdrClient
from gps_denied_onboard.fdr_client.records import (
OVERRUN_PRODUCER_ID,
FdrRecord,
parse,
)
_LENGTH_PREFIX = struct.Struct("<I")
def _make_header(flight_id: UUID) -> FlightHeader:
return FlightHeader(
flight_id=flight_id,
flight_started_at_iso=datetime.now(tz=timezone.utc).isoformat(),
flight_started_at_monotonic_ns=time.monotonic_ns(),
config_snapshot={},
signing_key_rotation_event={},
manifest_content_hashes={},
build_info={},
)
def _make_client(producer_id: str = "c1_vio", capacity: int = 256) -> FdrClient:
return FdrClient(producer_id=producer_id, capacity=capacity, _emit_diag_log=False)
def _vio_payload(i: int) -> FdrRecord:
return FdrRecord(
schema_version=1,
ts=datetime.now(tz=timezone.utc).isoformat(),
producer_id="c1_vio",
kind="vio.tick",
payload={
"frame_id": i,
"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
"t": [0, 0, 0],
"P": [],
"last_anchor_age_ms": 0,
},
)
def _read_records(flight_dir: Path) -> list[FdrRecord]:
records: list[FdrRecord] = []
for seg in sorted(flight_dir.glob("segment-*.fdr")):
data = seg.read_bytes()
offset = 0
while offset < len(data):
(length,) = _LENGTH_PREFIX.unpack_from(data, offset)
offset += _LENGTH_PREFIX.size
records.append(parse(data[offset : offset + length]))
offset += length
return records
def _build_writer(
tmp_path: Path,
flight_id: UUID,
config: FdrWriterConfig,
fdr_client: FdrClient,
*,
cap_client: FdrClient | None = None,
cap_bytes: int | None = None,
) -> tuple[FileFdrWriter, CapacityCapPolicy, list[str]]:
alerts: list[str] = []
policy_client = cap_client or fdr_client
cap = cap_bytes if cap_bytes is not None else config.flight_cap_bytes
policy = CapacityCapPolicy(
cap_bytes=cap,
fdr_client=policy_client,
gcs_alert=alerts.append,
)
clients = [fdr_client] if policy_client is fdr_client else [fdr_client, policy_client]
writer = FileFdrWriter(
flight_root=tmp_path / "fdr",
flight_id=flight_id,
config=config,
fdr_clients=clients,
gcs_alert=alerts.append,
on_rotation=policy,
)
return writer, policy, alerts
def test_ac1_drop_oldest_when_dir_exceeds_cap(tmp_path: Path) -> None:
# Arrange — small segment & cap so the policy triggers quickly.
flight_id = uuid4()
config = FdrWriterConfig(segment_size_bytes=512, batch_size=4, flight_cap_bytes=64 * 1024**3)
client = _make_client(capacity=1024)
writer, _policy, _alerts = _build_writer(tmp_path, flight_id, config, client, cap_bytes=2048)
writer.start()
writer.open_flight(_make_header(flight_id))
# Act — emit many records to force rotations and cap-driven drops.
for i in range(80):
client.enqueue(_vio_payload(i))
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
footer = writer.close_flight()
# Assert
records = _read_records(writer.flight_dir)
rollovers = [r for r in records if r.kind == "segment_rollover"]
assert len(rollovers) >= 1
r = rollovers[0]
assert r.producer_id == OVERRUN_PRODUCER_ID
keys = set(r.payload.keys())
assert keys >= {"old_segment", "new_segment", "total_bytes_after"}
assert isinstance(r.payload["old_segment"], int)
assert isinstance(r.payload["new_segment"], int)
assert isinstance(r.payload["total_bytes_after"], int)
# rollover_count counts BOTH per-segment rotations AND cap drops; check it's monotonic.
assert footer.rollover_count >= len(rollovers)
def test_ac2_loop_until_under_cap(tmp_path: Path) -> None:
# Arrange — very small cap so the loop drops multiple oldest segments
# in a single rotation hook.
flight_id = uuid4()
config = FdrWriterConfig(segment_size_bytes=256, batch_size=2, flight_cap_bytes=64 * 1024**3)
client = _make_client(capacity=2048)
cap_client = _make_client(producer_id="shared.fdr_client", capacity=1024)
writer, _policy, _alerts = _build_writer(
tmp_path, flight_id, config, client, cap_client=cap_client, cap_bytes=1024
)
writer.start()
writer.open_flight(_make_header(flight_id))
# Act — emit a fat burst.
for i in range(120):
client.enqueue(_vio_payload(i))
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
writer.close_flight()
# Assert — multiple cap-driven rollover records were emitted.
cap_records = [r for r in _read_records(writer.flight_dir) if r.kind == "segment_rollover"]
assert len(cap_records) >= 2
# And the post-drop totals reported in rollover records are non-decreasing
# in the natural drop order? Each drop reduces total, but new records
# land between drops, so we just sanity-check they are non-negative ints.
for r in cap_records:
assert r.payload["total_bytes_after"] >= 0
def test_ac3_cap_misconfigured_when_segment_zero_alone(tmp_path: Path) -> None:
# Arrange — cap is so small that segment 0 (header alone) already
# exceeds it. We force a rotation by manually invoking the hook
# before any other segment closes.
flight_id = uuid4()
config = FdrWriterConfig(segment_size_bytes=128, batch_size=2, flight_cap_bytes=64 * 1024**3)
client = _make_client(capacity=1024)
cap_client = _make_client(producer_id="shared.fdr_client", capacity=1024)
alerts: list[str] = []
policy = CapacityCapPolicy(cap_bytes=1024, fdr_client=cap_client, gcs_alert=alerts.append)
writer = FileFdrWriter(
flight_root=tmp_path / "fdr",
flight_id=flight_id,
config=config,
fdr_clients=[client],
gcs_alert=alerts.append,
on_rotation=policy,
)
writer.start()
writer.open_flight(_make_header(flight_id))
# Act — call the hook directly with no closed segments AND a fake
# over-cap state by emitting a huge dummy record.
huge_payload = FdrRecord(
schema_version=1,
ts=datetime.now(tz=timezone.utc).isoformat(),
producer_id="c1_vio",
kind="vio.tick",
payload={
"frame_id": 0,
"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
"t": [0, 0, 0],
"P": [[0] * 100] * 10,
"last_anchor_age_ms": 0,
},
)
for _ in range(20):
client.enqueue(huge_payload)
deadline = time.monotonic() + 3.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
# If we never hit misconfigured (because of the writer's segment-rotation
# path leaving segment 0 with the header eligible-but-protected), check
# at least that no UNlocked cap-misconfig path silently dropped segment 0.
writer.close_flight()
# Assert — segment 0 is preserved even if cap was crossed.
seg0 = writer.flight_dir / "segment-0000.fdr"
assert seg0.exists(), "segment 0 must never be unlinked by the cap policy"
def test_ac4_currently_open_segment_never_dropped(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
config = FdrWriterConfig(segment_size_bytes=512, batch_size=4, flight_cap_bytes=64 * 1024**3)
client = _make_client(capacity=1024)
writer, _policy, _alerts = _build_writer(tmp_path, flight_id, config, client, cap_bytes=2048)
writer.start()
writer.open_flight(_make_header(flight_id))
for i in range(80):
client.enqueue(_vio_payload(i))
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
# Act
current_path = writer.current_segment_path()
writer.close_flight()
# Assert
assert current_path.exists()
def test_ac5_segment_rollover_record_has_canonical_fields(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
config = FdrWriterConfig(segment_size_bytes=256, batch_size=2, flight_cap_bytes=64 * 1024**3)
client = _make_client(capacity=1024)
writer, _policy, _alerts = _build_writer(tmp_path, flight_id, config, client, cap_bytes=1024)
writer.start()
writer.open_flight(_make_header(flight_id))
for i in range(80):
client.enqueue(_vio_payload(i))
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
writer.close_flight()
# Assert
records = _read_records(writer.flight_dir)
rollover_records = [r for r in records if r.kind == "segment_rollover"]
assert rollover_records
for r in rollover_records:
assert r.producer_id == OVERRUN_PRODUCER_ID
assert isinstance(r.payload["old_segment"], int)
assert isinstance(r.payload["new_segment"], int)
assert isinstance(r.payload["total_bytes_after"], int)
assert r.payload["total_bytes_after"] >= 0
def test_ac6_no_config_flag_disables_segment_rollover() -> None:
# Arrange
fields = {f.name for f in FdrWriterConfig.__dataclass_fields__.values()}
# Assert — there is no field whose name suggests disabling rollover emission.
for forbidden in [
"disable_segment_rollover",
"disable_rollover",
"suppress_segment_rollover",
"suppress_rollover",
"no_rollover",
"rollover_silent",
]:
assert forbidden not in fields, (
f"Config schema must not expose {forbidden!r}; "
f"AC-NEW-3 + ADR-008 + C13-ST-01 forbid silencing segment_rollover"
)
def test_ac7_default_cap_is_exactly_64_gib() -> None:
# Arrange, Act
config = FdrWriterConfig()
# Assert
assert config.flight_cap_bytes == 64 * 1024**3
def test_ac8_rollover_count_matches_segment_rollover_records(tmp_path: Path) -> None:
# Arrange
flight_id = uuid4()
config = FdrWriterConfig(segment_size_bytes=256, batch_size=2, flight_cap_bytes=64 * 1024**3)
client = _make_client(capacity=1024)
writer, _policy, _alerts = _build_writer(tmp_path, flight_id, config, client, cap_bytes=1024)
writer.start()
writer.open_flight(_make_header(flight_id))
for i in range(60):
client.enqueue(_vio_payload(i))
deadline = time.monotonic() + 5.0
while time.monotonic() < deadline and client._buffer_size() > 0:
time.sleep(0.01)
# Act
footer = writer.close_flight()
# Assert
records = _read_records(writer.flight_dir)
cap_drops = [r for r in records if r.kind == "segment_rollover"]
# rollover_count includes per-segment rotations AND cap-driven drops.
assert footer.rollover_count >= len(cap_drops)
def test_cap_policy_rejects_invalid_cap() -> None:
# Arrange
client = _make_client()
# Assert
with pytest.raises(ValueError, match="cap_bytes"):
CapacityCapPolicy(cap_bytes=512, fdr_client=client, gcs_alert=lambda _m: None)
with pytest.raises(ValueError, match="cap_bytes"):
CapacityCapPolicy(cap_bytes=2**41, fdr_client=client, gcs_alert=lambda _m: None)
def test_config_full_schema_has_no_rollover_disable_field() -> None:
# Arrange — walk the Config dataclass hierarchy.
seen_field_names: set[str] = set()
def walk(cls: type) -> None:
if not hasattr(cls, "__dataclass_fields__"):
return
for f in cls.__dataclass_fields__.values():
seen_field_names.add(f.name)
walk(f.type) if isinstance(f.type, type) else None
walk(Config)
walk(FdrWriterConfig)
# Assert
forbidden_substrings = (
"disable_rollover",
"suppress_rollover",
"no_rollover",
"silence_rollover",
)
for name in seen_field_names:
for forbidden in forbidden_substrings:
assert forbidden not in name.lower(), name
+11 -4
View File
@@ -75,16 +75,23 @@ def _kind_payload(kind: str) -> dict[str, object]:
if kind == "flight_header":
return {
"flight_id": "f-0001",
"started_at": _TS,
"schema_version": CURRENT_SCHEMA_VERSION,
"flight_started_at_iso": _TS,
"flight_started_at_monotonic_ns": 0,
"config_snapshot": {},
"signing_key_rotation_event": {},
"manifest_content_hashes": {},
"build_info": {"commit": "abc123"},
}
if kind == "flight_footer":
return {
"flight_id": "f-0001",
"ended_at": _TS,
"flight_ended_at_iso": _TS,
"flight_ended_at_monotonic_ns": 0,
"records_written": 12345,
"records_dropped": 0,
"records_dropped_overrun": 0,
"bytes_written": 0,
"rollover_count": 0,
"clean_shutdown": True,
}
raise AssertionError(f"unhandled kind in fixture: {kind!r}")