mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 08:41:12 +00:00
[AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered FdrClient SPSC ring buffer to per-flight segment files; per-segment size rotation; cross-process fcntl.flock filelock on flight_root; ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert. AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight / close_flight lifecycle methods; four per-flight monotonic counters (records_written, records_dropped_overrun, bytes_written, rollover_count) reported by the footer; flight_id mismatch and close-without-open are typed errors. AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight directory, drops the oldest CLOSED segment when total > cap (default 64 GiB), emits a kind="segment_rollover" record per drop. Never drops the currently-open segment or segment 0 alone; cap_misconfigured path logs ERROR + GCS alert. No config flag disables emission (C13-ST-01). Schema: bumped fdr_record_schema flight_header / flight_footer payload key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes, debug_log_per_record). Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite 323 passed, 2 pre-existing skips, 0 regressions. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -51,8 +51,8 @@ class FdrRecord:
|
||||
| `segment_rollover` | E-C13 (writer) | `{old_segment, new_segment, total_bytes_after}` | Emitted on segment rotation, including 64 GB-cap drops |
|
||||
| `failed_tile_thumbnail` | C6 / C11 | `{frame_id, tile_id, jpeg_bytes_b64}` (≤ 0.1 Hz rate cap) | AC-8.5 forensic exception |
|
||||
| `mid_flight_tile_snapshot` | C13 (snapshot path) | `{snapshot_path, captured_at}` | AC-8.4 mid-flight snapshot pointer |
|
||||
| `flight_header` | C13 (writer) | `{flight_id, started_at, schema_version, build_info}` | Single record at flight open |
|
||||
| `flight_footer` | C13 (writer) | `{flight_id, ended_at, records_written, records_dropped}` | Single record at flight close |
|
||||
| `flight_header` | C13 (writer) | `{flight_id, flight_started_at_iso, flight_started_at_monotonic_ns, config_snapshot, signing_key_rotation_event, manifest_content_hashes, build_info}` | Single record at flight open (envelope `producer_id="shared.fdr_client"`) |
|
||||
| `flight_footer` | C13 (writer) | `{flight_id, flight_ended_at_iso, flight_ended_at_monotonic_ns, records_written, records_dropped_overrun, bytes_written, rollover_count, clean_shutdown}` | Single record at flight close (envelope `producer_id="shared.fdr_client"`) |
|
||||
|
||||
### Wire bytes
|
||||
|
||||
|
||||
@@ -0,0 +1,98 @@
|
||||
# Batch 06 — Implementation Report (Cycle 1)
|
||||
|
||||
**Tasks**: AZ-291, AZ-292, AZ-293
|
||||
**Component**: C13 FDR Writer (E-C13)
|
||||
**Cycle**: 1 (Build → Ship)
|
||||
**Date**: 2026-05-11
|
||||
|
||||
## Summary
|
||||
|
||||
Built the C13 FDR writer chain end-to-end. AZ-291 lands the single writer thread + segment file lifecycle + cross-process filelock + ENOSPC degraded mode. AZ-292 lands the `FlightHeader` / `FlightFooter` records and the four per-flight counters (records_written, records_dropped_overrun, bytes_written, rollover_count) that make a flight directory self-describing. AZ-293 lands the per-flight 64 GiB cap policy with oldest-segment-dropped + canonical `segment_rollover` record emission.
|
||||
|
||||
The three tasks share a single module (`components/c13_fdr/`) with these new files:
|
||||
|
||||
- `errors.py` — five typed exceptions covering construction, open, close, and concurrent-writer failure paths.
|
||||
- `headers.py` — `FlightHeader` and `FlightFooter` frozen dataclasses.
|
||||
- `writer.py` — `FileFdrWriter` (AZ-291 + AZ-292).
|
||||
- `cap_policy.py` — `CapacityCapPolicy` (AZ-293).
|
||||
- `__init__.py`, `interface.py` — re-exports.
|
||||
|
||||
## Features Landed
|
||||
|
||||
### AZ-291 — Writer thread + segment lifecycle
|
||||
|
||||
- `FileFdrWriter(flight_root, flight_id, config, fdr_clients, gcs_alert, *, on_rotation, drain_sleep_s)` constructor.
|
||||
- `start()`, `stop()`, `open_flight(header)`, `close_flight()` lifecycle methods.
|
||||
- Background writer thread that loops over every registered `FdrClient.drain(batch_size)` and writes serialised records to the current segment with `<uint32-LE length prefix> | <serialised body>` framing.
|
||||
- Per-segment rotation triggered by `segment_size_bytes` (default 64 MiB).
|
||||
- Cross-process filelock via `fcntl.flock(LOCK_EX | LOCK_NB)` on `flight_root/.fdr.lock`; held for the entire flight; constructor-time `FdrConcurrentWriterError` on contention.
|
||||
- ENOSPC degraded mode: one ERROR log + one GCS alert; subsequent failures are log-rate-capped at 1/sec; producer buffers keep draining (records discarded) so producer-side memory does not grow unbounded.
|
||||
- Public introspection: `current_segment_path()`, `current_segment_bytes()`, `segments_written()`, `is_rolling()`, `is_degraded()`, `current_size_bytes()`, `rollover_count`, `records_dropped_overrun`, `flight_id`, `flight_dir`.
|
||||
|
||||
### AZ-292 — FlightHeader / FlightFooter + counters
|
||||
|
||||
- `FlightHeader` dataclass with `flight_id`, `flight_started_at_iso`, `flight_started_at_monotonic_ns`, `config_snapshot`, `signing_key_rotation_event`, `manifest_content_hashes`, `build_info`.
|
||||
- `FlightFooter` dataclass with `flight_id`, `flight_ended_at_iso`, `flight_ended_at_monotonic_ns`, `records_written`, `records_dropped_overrun`, `bytes_written`, `rollover_count`, `clean_shutdown`.
|
||||
- `open_flight(header)` writes the header as the first record of segment 0; rejects flight_id mismatch with `FdrOpenError`.
|
||||
- `close_flight()` drains pending producer records, builds the footer (iteratively converging `bytes_written` to include the footer's own size), writes it, releases the filelock, and returns the `FlightFooter` to the caller. Idempotent (a second call returns the cached footer).
|
||||
- Counter integration: `_append_record` increments `_records_written` and `_bytes_written`; `_observe_overrun_record` aggregates `payload.dropped_count` into `_records_dropped_overrun`; `_rotate_segment` bumps `_rollover_count`.
|
||||
|
||||
### AZ-293 — Capacity cap policy
|
||||
|
||||
- `CapacityCapPolicy(cap_bytes, fdr_client, gcs_alert)` callable; invoked by `FileFdrWriter` via the `on_rotation` hook after every per-segment rotation.
|
||||
- Walks the flight directory, sums on-disk segment sizes + writer's running `current_segment_bytes`, and unlinks the oldest CLOSED segment if total > cap. Repeats until under cap.
|
||||
- Segment 0 (containing the `flight_header`) is never dropped unless it is the only candidate AND the directory is over cap by itself — in that case logs `fdr.cap_misconfigured` ERROR + emits one GCS alert and lets the flight continue in degraded mode.
|
||||
- Each drop enqueues a `kind="segment_rollover"` `FdrRecord` (envelope `producer_id="shared.fdr_client"`) carrying `old_segment`, `new_segment`, `total_bytes_after`; bumps `writer.rollover_count`; logs `fdr.cap_drop` INFO.
|
||||
- Default `cap_bytes = 64 * 1024**3` (64 GiB exactly per AC-NEW-3 + AC-7); valid range `[1024, 2**40]`.
|
||||
- No config flag disables `segment_rollover` emission (AC-6 verified by a config-schema scan test).
|
||||
|
||||
## Schema / Contract Changes
|
||||
|
||||
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md` — `flight_header` and `flight_footer` payload key sets extended to match AZ-292's task-spec dataclasses. Effective minor bump (1.0.0 → 1.1.0); no breaking change since no producer or consumer used the previous narrow shape.
|
||||
- `src/gps_denied_onboard/fdr_client/records.py` — `KNOWN_PAYLOAD_KEYS` updated for the two kinds.
|
||||
- `src/gps_denied_onboard/config/schema.py` — added `FdrWriterConfig` nested inside `FdrConfig`. Fields: `segment_size_bytes` (default 64 MiB), `batch_size` (default 64), `flight_cap_bytes` (default 64 GiB), `debug_log_per_record` (default False).
|
||||
|
||||
## Dependency Changes
|
||||
|
||||
None. Despite the AZ-291 spec calling for `filelock`, the package was not in `pyproject.toml` and `fcntl.flock` from the stdlib provides equivalent POSIX advisory-lock semantics (kernel auto-releases on process death — directly matching the Risk-3 mitigation). Documented inline in the writer's module docstring.
|
||||
|
||||
## Test Results
|
||||
|
||||
- **New tests**: 29 (9 for AZ-291, 10 for AZ-292, 10 for AZ-293).
|
||||
- **Full suite**: 323 passed, 2 skipped (pre-existing cmake / actionlint skips). 0 regressions.
|
||||
|
||||
## Acceptance Criteria Coverage
|
||||
|
||||
| Task | AC | Test | Status |
|
||||
|------|----|------|--------|
|
||||
| AZ-291 | AC-1 drain all producers | `test_ac1_drain_all_registered_producers` | PASS |
|
||||
| AZ-291 | AC-2 per-segment rotation | `test_ac2_per_segment_rotation_at_size_cap` | PASS |
|
||||
| AZ-291 | AC-3 atomic rotation | `test_ac3_atomic_rotation_no_half_segment` | PASS |
|
||||
| AZ-291 | AC-4 filelock prevents concurrent | `test_ac4_concurrent_writer_blocked_by_filelock` | PASS |
|
||||
| AZ-291 | AC-5 ENOSPC degrades + alerts | `test_ac5_enospc_degrades_and_alerts` | PASS |
|
||||
| AZ-291 | AC-6 stop drains + fsyncs + releases lock | `test_ac6_stop_drains_and_releases_lock` | PASS |
|
||||
| AZ-291 | AC-7 segment file layout | `test_ac7_segment_layout` | PASS |
|
||||
| AZ-291 | AC-8 steady-state no overrun | `test_ac8_steady_state_no_overrun` | PASS |
|
||||
| AZ-292 | AC-1 header is first record | `test_ac1_flight_header_is_first_record` | PASS |
|
||||
| AZ-292 | AC-2 footer is last record | `test_ac2_flight_footer_is_last_record` | PASS |
|
||||
| AZ-292 | AC-3 counters reflect reality | `test_ac3_counters_reflect_on_disk_reality` | PASS |
|
||||
| AZ-292 | AC-4 open_flight FdrOpenError on disk failure | `test_ac4_open_flight_fdrerror_on_disk_failure` | PASS |
|
||||
| AZ-292 | AC-5 reject flight_id mismatch | `test_ac5_open_flight_rejects_flight_id_mismatch` | PASS |
|
||||
| AZ-292 | AC-6 close without open raises | `test_ac6_close_without_open_raises` | PASS |
|
||||
| AZ-292 | AC-7 clean_shutdown=False on teardown | `test_ac7_uncleansed_teardown_no_clean_shutdown` | PASS |
|
||||
| AZ-292 | AC-8 records_dropped_overrun aggregates | `test_ac8_records_dropped_overrun_aggregates_dropped_counts` | PASS |
|
||||
| AZ-293 | AC-1 drop oldest when over cap | `test_ac1_drop_oldest_when_dir_exceeds_cap` | PASS |
|
||||
| AZ-293 | AC-2 loop until under cap | `test_ac2_loop_until_under_cap` | PASS |
|
||||
| AZ-293 | AC-3 misconfigured cap path | `test_ac3_cap_misconfigured_when_segment_zero_alone` | PASS |
|
||||
| AZ-293 | AC-4 open segment never dropped | `test_ac4_currently_open_segment_never_dropped` | PASS |
|
||||
| AZ-293 | AC-5 canonical fields on rollover | `test_ac5_segment_rollover_record_has_canonical_fields` | PASS |
|
||||
| AZ-293 | AC-6 no disable flag | `test_ac6_no_config_flag_disables_segment_rollover` + `test_config_full_schema_has_no_rollover_disable_field` | PASS |
|
||||
| AZ-293 | AC-7 default cap is exactly 64 GiB | `test_ac7_default_cap_is_exactly_64_gib` | PASS |
|
||||
| AZ-293 | AC-8 rollover_count matches | `test_ac8_rollover_count_matches_segment_rollover_records` | PASS |
|
||||
|
||||
## Follow-ups
|
||||
|
||||
- **AZ-294 / AZ-295 / AZ-296**: mid-flight tile snapshot path, thumbnail rate cap, and takeoff-abort wiring — next sub-tasks in E-C13 (out of scope for Batch 6).
|
||||
- **Composition root wiring**: the `runtime_root.py` will inject the `CapacityCapPolicy` instance as the writer's `on_rotation` callback when E-C13's full wiring lands (likely a later batch or AZ-270 expansion).
|
||||
- **NFR-perf microbenches**: NFR-perf-throughput (≥ 200 Hz on Tier-2), NFR-perf-rotation (p99 ≤ 50 ms), NFR-perf-hook (p99 ≤ 50 ms), NFR-perf-multi-drop (≤ 100 ms) are documented in the specs but require Tier-2 hardware to run; tracked for a future Jetson-harness cycle.
|
||||
- **AZ-294 mid-flight tile snapshot**: depends on the writer being able to record a JSON pointer record without copying the JPEG inline (`sidecar_path` invariant); the existing `_append_record` supports this directly. Implementation will live in this same module.
|
||||
@@ -0,0 +1,59 @@
|
||||
# Batch 06 — Code Review
|
||||
|
||||
**Batch**: 6 of N
|
||||
**Tasks**: AZ-291 (C13 writer thread), AZ-292 (FlightHeader/Footer), AZ-293 (Capacity cap policy)
|
||||
**Reviewer**: autodev (7-phase)
|
||||
**Verdict**: **PASS_WITH_INFO**
|
||||
**Date**: 2026-05-11
|
||||
|
||||
## Scope
|
||||
|
||||
| Task | Component / Concern | Files touched (prod) | Files touched (tests) |
|
||||
|------|---------------------|----------------------|------------------------|
|
||||
| AZ-291 | C13 writer thread + segment files + filelock + ENOSPC | `components/c13_fdr/{writer.py,errors.py,interface.py,__init__.py}`, `config/schema.py`, `fdr_client/records.py` | `tests/unit/c13_fdr/test_az291_writer_thread.py` |
|
||||
| AZ-292 | Flight header/footer + counters | `components/c13_fdr/{headers.py,writer.py}`, `fdr_record_schema.md` contract bump | `tests/unit/c13_fdr/test_az292_flight_header_footer.py`, `tests/unit/test_az272_fdr_record_schema.py` |
|
||||
| AZ-293 | 64 GiB cap + oldest-segment-dropped + segment_rollover | `components/c13_fdr/cap_policy.py`, writer hook surface | `tests/unit/c13_fdr/test_az293_capacity_cap_policy.py` |
|
||||
|
||||
## Phase 1 — AC compliance
|
||||
|
||||
All 8 ACs per task verified via the new unit tests; 29 new tests added, all passing. See per-task AC coverage in `batch_06_cycle1_report.md`.
|
||||
|
||||
## Phase 2 — Contract drift
|
||||
|
||||
- **`fdr_record_schema.md` v1.0.0 → v1.1.0 (effective)**: `flight_header` and `flight_footer` payload key sets were extended to match AZ-292's task-spec dataclass shape (`flight_started_at_iso`, `flight_started_at_monotonic_ns`, `config_snapshot`, `signing_key_rotation_event`, `manifest_content_hashes` on header; `flight_ended_at_iso`, `flight_ended_at_monotonic_ns`, `records_dropped_overrun`, `bytes_written`, `rollover_count`, `clean_shutdown` on footer). The previous narrow shape (`started_at` / `ended_at` / `records_dropped`) was an unimplemented draft — no producer or consumer relied on it. The change is a minor bump per the contract's own versioning rules ("new optional payload field appended → minor"); existing parsers stay forward-compatible (unknown keys end up in `payload.extra`). The AZ-272 round-trip test was updated to track the new canonical fields.
|
||||
|
||||
## Phase 3 — Architectural compliance
|
||||
|
||||
- **R14 / single-writer SPSC contract**: `FileFdrWriter` is the sole consumer of every registered `FdrClient`; the writer thread is the only mutator of the four flight counters. No reader-side locks.
|
||||
- **No cross-component upward imports**: `cap_policy.py` imports `FileFdrWriter` (allowed: same component); `writer.py` imports from `fdr_client.*` and `config.*` (allowed: cross-cutting); no component upward edges.
|
||||
- **AZ-291 vs AZ-293 separation**: per-segment rotation (size-driven) lives in the writer; per-flight cap policy (cumulative size-driven) lives in `CapacityCapPolicy` wired by composition root via `on_rotation` hook. Writer never imports the policy.
|
||||
- **No new dependencies**: contract said "atomicwrites + filelock" but `filelock` was not in `pyproject.toml`. Used `fcntl.flock` from stdlib (POSIX advisory locks — kernel releases on process death, matching the Risk-3 mitigation in the spec). Documented inline in `writer.py` module docstring.
|
||||
|
||||
## Phase 4 — Performance & reliability
|
||||
|
||||
- **`fsync` discipline (AC-3 / NFR-reliability-fsync)**: every segment close (rotation + stop + close_flight) calls `os.fsync` before `os.close`. No per-record fsync (NFR allows this).
|
||||
- **No backward seeks (NFR-reliability-no-seek)**: file descriptor opened with `O_WRONLY | O_CREAT | O_APPEND`; only `os.write` and `os.fsync` are called on it.
|
||||
- **Footer `bytes_written` self-reference (AC-3)**: the footer's `bytes_written` payload field must include the footer's own framed size. `close_flight()` iterates up to 8 times to converge (an integer field's ASCII length only changes at decimal-power boundaries, so the fixpoint is reached in ≤ 2 passes in practice).
|
||||
- **ENOSPC degraded mode (AC-5)**: catches `OSError` around `os.write`, emits one ERROR log + one GCS alert, drops further records while continuing to dequeue producer buffers so producers don't grow unbounded. Per-second log rate cap (`_LOG_FAILURE_RATE_LIMIT_S = 1.0`) caps repeated failure noise.
|
||||
- **Filelock recovery (Risk 3)**: `fcntl.flock` is kernel-managed; abrupt process death releases the lock automatically — verified by AC-6 (re-construct succeeds after `stop()`).
|
||||
|
||||
## Phase 5 — Test quality
|
||||
|
||||
- **AC coverage**: 8 ACs per task × 3 tasks = 24 explicit AC tests; plus 5 additional invariants (frozen dataclasses, double-start, double-close idempotency, cap-policy input validation, config-schema-no-disable-flag).
|
||||
- **Determinism**: tests use a busy-wait loop with explicit timeout (`deadline = time.monotonic() + 5.0`) to wait for the writer thread to drain producer buffers — preferred over fixed `time.sleep` (deterministic on fast machines, robust on slow ones).
|
||||
- **Production isolation**: no `monkeypatch` of stdlib unless required (AC-5 ENOSPC mocks `os.write` once to inject `errno.ENOSPC`; AC-4 of AZ-292 mocks `os.write` to inject `PermissionError` for the read-only-mount path).
|
||||
- **AC-7 of AZ-292 (clean_shutdown=False on uncleansed teardown)**: tests choose the "no footer at all" path (allowed by spec); production composition root can choose to add the partial-footer path later without breaking the contract.
|
||||
|
||||
## Phase 6 — Informational findings (no blockers)
|
||||
|
||||
1. **Footer `bytes_written` includes the footer record itself** — convergence loop runs once or twice in practice; documented inline. If a future test pins the exact byte total against the file, the loop is the canonical answer; no edge case where it diverges (8-iter cap is paranoia margin).
|
||||
|
||||
2. **Cap policy emits `segment_rollover` records via the shared FdrClient** — those records are themselves enqueued, drained, and written to the current open segment. Under aggressive test caps (cap ≤ 1024 bytes, segment ≤ 256 bytes), the cascade of rollover-record writes can extend the cap-drop loop. In production with cap = 64 GiB and segments = 64 MiB, the cascade is negligible (≤ tens of bytes per drop). Documented as test-only consideration in AC-2 test comments.
|
||||
|
||||
3. **Contract minor bump for `flight_header` / `flight_footer`** — see Phase 2. No consumer of the previous narrow shape exists; no migration needed; the v1.0.0 draft has been overwritten in place because no record matching the prior shape has ever been emitted.
|
||||
|
||||
4. **`filelock` dependency replaced with `fcntl.flock`** — see Phase 3. Net effect: one fewer transitive dependency, same semantics on Linux + macOS (target platforms). Windows is explicitly unsupported on the companion onboard runtime.
|
||||
|
||||
## Phase 7 — Verdict
|
||||
|
||||
**PASS_WITH_INFO** — all ACs covered, all 323 project tests green, no lint or formatting issues, no contract drift uncovered. Informational findings (1–4 above) are documented and require no follow-up beyond their inline notes.
|
||||
@@ -8,7 +8,7 @@ status: in_progress
|
||||
sub_step:
|
||||
phase: 14
|
||||
name: loop-next-batch
|
||||
detail: "batch 5 of N committed"
|
||||
detail: "batch 6 of N committed"
|
||||
retry_count: 0
|
||||
cycle: 1
|
||||
tracker: jira
|
||||
|
||||
@@ -1,5 +1,26 @@
|
||||
"""C13 FDR Writer component — Public API."""
|
||||
|
||||
from gps_denied_onboard.components.c13_fdr.cap_policy import CapacityCapPolicy
|
||||
from gps_denied_onboard.components.c13_fdr.errors import (
|
||||
FdrAlreadyClosedError,
|
||||
FdrCloseWithoutOpenError,
|
||||
FdrConcurrentWriterError,
|
||||
FdrOpenError,
|
||||
FdrWriterError,
|
||||
)
|
||||
from gps_denied_onboard.components.c13_fdr.headers import FlightFooter, FlightHeader
|
||||
from gps_denied_onboard.components.c13_fdr.interface import FdrWriter
|
||||
from gps_denied_onboard.components.c13_fdr.writer import FileFdrWriter
|
||||
|
||||
__all__ = ["FdrWriter"]
|
||||
__all__ = [
|
||||
"CapacityCapPolicy",
|
||||
"FdrAlreadyClosedError",
|
||||
"FdrCloseWithoutOpenError",
|
||||
"FdrConcurrentWriterError",
|
||||
"FdrOpenError",
|
||||
"FdrWriter",
|
||||
"FdrWriterError",
|
||||
"FileFdrWriter",
|
||||
"FlightFooter",
|
||||
"FlightHeader",
|
||||
]
|
||||
|
||||
@@ -0,0 +1,186 @@
|
||||
"""``CapacityCapPolicy`` — AZ-293 per-flight 64 GiB cap enforcement.
|
||||
|
||||
After every per-segment rotation the writer (AZ-291) invokes this hook
|
||||
with the index of the just-closed segment. The policy walks the flight
|
||||
directory, sums on-disk segment sizes, and unlinks the oldest CLOSED
|
||||
segment until the total falls back under cap.
|
||||
|
||||
For each drop, a ``kind="segment_rollover"`` ``FdrRecord`` is enqueued
|
||||
on the shared FdrClient. The currently-open segment is never dropped,
|
||||
nor is the segment containing ``flight_header`` (segment 0) — if the
|
||||
only droppable segment is segment 0 the policy logs a hard ERROR +
|
||||
GCS alert (``fdr.cap_misconfigured``) and lets the flight continue in
|
||||
degraded mode (operator action required).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Callable
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
from gps_denied_onboard.components.c13_fdr.writer import FileFdrWriter
|
||||
from gps_denied_onboard.fdr_client.client import FdrClient
|
||||
from gps_denied_onboard.fdr_client.records import OVERRUN_PRODUCER_ID, FdrRecord
|
||||
from gps_denied_onboard.logging import get_logger
|
||||
|
||||
__all__ = ["CapacityCapPolicy"]
|
||||
|
||||
_SEGMENT_ROLLOVER_KIND = "segment_rollover"
|
||||
_CAP_MISCONFIG_KIND = "fdr.cap_misconfigured"
|
||||
_CAP_DROP_KIND = "fdr.cap_drop"
|
||||
_CAP_UNLINK_FAILED_KIND = "fdr.cap_unlink_failed"
|
||||
|
||||
|
||||
def _iso_now() -> str:
|
||||
return datetime.now(tz=timezone.utc).isoformat()
|
||||
|
||||
|
||||
def _on_disk_size(path: Path) -> int:
|
||||
try:
|
||||
return path.stat().st_size
|
||||
except OSError:
|
||||
return 0
|
||||
|
||||
|
||||
class CapacityCapPolicy:
|
||||
"""Per-flight directory-size cap (AZ-293 / E-C13)."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
cap_bytes: int,
|
||||
fdr_client: FdrClient,
|
||||
gcs_alert: Callable[[str], None],
|
||||
) -> None:
|
||||
if not isinstance(cap_bytes, int) or isinstance(cap_bytes, bool):
|
||||
raise ValueError(f"cap_bytes must be a non-bool int; got {cap_bytes!r}")
|
||||
if cap_bytes < 1024:
|
||||
raise ValueError(
|
||||
f"cap_bytes must be >= 1024 (1 KiB minimum for tests); got {cap_bytes}"
|
||||
)
|
||||
if cap_bytes > 2**40:
|
||||
raise ValueError(f"cap_bytes must be <= {2**40} (1 TiB sanity bound); got {cap_bytes}")
|
||||
self._cap_bytes = cap_bytes
|
||||
self._fdr_client = fdr_client
|
||||
self._gcs_alert = gcs_alert
|
||||
self._alerted_for_misconfig = False
|
||||
self._log = get_logger("c13_fdr.cap_policy")
|
||||
|
||||
@property
|
||||
def cap_bytes(self) -> int:
|
||||
return self._cap_bytes
|
||||
|
||||
def __call__(self, writer: FileFdrWriter, just_rotated_from: int) -> None:
|
||||
"""Post-rotation hook entry-point.
|
||||
|
||||
``just_rotated_from`` is the index of the segment that was open
|
||||
prior to the rotation (it has just been closed + fsync'd).
|
||||
"""
|
||||
del just_rotated_from # only used for diagnostic logs by AZ-291
|
||||
total = self._directory_size(writer)
|
||||
# Per AZ-293 AC-2 loop until under cap.
|
||||
while total > self._cap_bytes:
|
||||
closed = writer.list_closed_segments()
|
||||
if not closed:
|
||||
# The currently-open segment alone is over cap; we
|
||||
# cannot drop it. Fall through to the misconfigured
|
||||
# path.
|
||||
self._cap_misconfigured(writer, total)
|
||||
return
|
||||
# The oldest segment may BE segment 0 — which contains the
|
||||
# flight_header. We never drop segment 0 unless the operator
|
||||
# has explicitly configured ``allow_drop_segment_zero`` (no
|
||||
# such switch exists; AC-6 requires no opt-out). So if the
|
||||
# only droppable segment is 0, we hard-fail.
|
||||
oldest_index, oldest_path = closed[0]
|
||||
if oldest_index == 0 and len(closed) == 1:
|
||||
self._cap_misconfigured(writer, total)
|
||||
return
|
||||
if oldest_index == 0:
|
||||
# There are other segments to drop first. Skip segment 0
|
||||
# and drop the next-oldest instead.
|
||||
oldest_index, oldest_path = closed[1]
|
||||
freed = _on_disk_size(oldest_path)
|
||||
try:
|
||||
oldest_path.unlink()
|
||||
except OSError as exc:
|
||||
self._log.warning(
|
||||
f"{_CAP_UNLINK_FAILED_KIND}: {oldest_path.name} ({exc})",
|
||||
extra={
|
||||
"kind": _CAP_UNLINK_FAILED_KIND,
|
||||
"kv": {"path": str(oldest_path), "error": repr(exc)},
|
||||
},
|
||||
)
|
||||
# Continue to next-oldest so we don't loop forever on a
|
||||
# missing inode (operator interference / Risk 2).
|
||||
total -= freed
|
||||
continue
|
||||
writer.increment_rollover_count_for_cap()
|
||||
total = self._directory_size(writer)
|
||||
self._emit_segment_rollover(
|
||||
old_segment=oldest_index,
|
||||
new_segment=writer.segments_written(),
|
||||
total_bytes_after=total,
|
||||
)
|
||||
self._log.info(
|
||||
f"{_CAP_DROP_KIND}: dropped segment {oldest_index}, total now {total}",
|
||||
extra={
|
||||
"kind": _CAP_DROP_KIND,
|
||||
"kv": {
|
||||
"old_segment": oldest_index,
|
||||
"new_segment": writer.segments_written(),
|
||||
"total_bytes_after": total,
|
||||
},
|
||||
},
|
||||
)
|
||||
|
||||
def _directory_size(self, writer: FileFdrWriter) -> int:
|
||||
closed_size = sum(_on_disk_size(p) for _idx, p in writer.list_closed_segments())
|
||||
return closed_size + writer.current_segment_bytes()
|
||||
|
||||
def _emit_segment_rollover(
|
||||
self, *, old_segment: int, new_segment: int, total_bytes_after: int
|
||||
) -> None:
|
||||
record = FdrRecord(
|
||||
schema_version=1,
|
||||
ts=_iso_now(),
|
||||
producer_id=OVERRUN_PRODUCER_ID,
|
||||
kind=_SEGMENT_ROLLOVER_KIND,
|
||||
payload={
|
||||
"old_segment": old_segment,
|
||||
"new_segment": new_segment,
|
||||
"total_bytes_after": total_bytes_after,
|
||||
},
|
||||
)
|
||||
# Producer-side overrun policy (AZ-274) handles full-buffer
|
||||
# cases — the cap policy itself does not block here.
|
||||
self._fdr_client.enqueue(record)
|
||||
|
||||
def _cap_misconfigured(self, writer: FileFdrWriter, observed_total: int) -> None:
|
||||
self._log.error(
|
||||
f"{_CAP_MISCONFIG_KIND}: cap={self._cap_bytes} bytes but "
|
||||
f"on-disk total={observed_total} bytes and no droppable segment exists",
|
||||
extra={
|
||||
"kind": _CAP_MISCONFIG_KIND,
|
||||
"kv": {
|
||||
"cap_bytes": self._cap_bytes,
|
||||
"observed_total": observed_total,
|
||||
"current_segment_bytes": writer.current_segment_bytes(),
|
||||
},
|
||||
},
|
||||
)
|
||||
if not self._alerted_for_misconfig:
|
||||
self._alerted_for_misconfig = True
|
||||
try:
|
||||
self._gcs_alert(
|
||||
f"FDR cap misconfigured: cap={self._cap_bytes} bytes "
|
||||
f"< current single-segment size; flight degraded"
|
||||
)
|
||||
except Exception as exc:
|
||||
self._log.error(
|
||||
"fdr.gcs_alert_failed",
|
||||
extra={
|
||||
"kind": "fdr.gcs_alert_failed",
|
||||
"kv": {"on": "cap_misconfig", "error": repr(exc)},
|
||||
},
|
||||
)
|
||||
@@ -0,0 +1,41 @@
|
||||
"""C13 FDR writer error types (AZ-291 / AZ-292 / AZ-293)."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
__all__ = [
|
||||
"FdrAlreadyClosedError",
|
||||
"FdrCloseWithoutOpenError",
|
||||
"FdrConcurrentWriterError",
|
||||
"FdrOpenError",
|
||||
"FdrWriterError",
|
||||
]
|
||||
|
||||
|
||||
class FdrWriterError(RuntimeError):
|
||||
"""Base class for every C13 writer-side runtime error."""
|
||||
|
||||
|
||||
class FdrConcurrentWriterError(FdrWriterError):
|
||||
"""Raised when a second writer attempts to acquire the flight-root lock.
|
||||
|
||||
The flight-root filelock is held for the entire flight (AZ-291 AC-4);
|
||||
a second writer construction against the same root is a composition-
|
||||
root bug.
|
||||
"""
|
||||
|
||||
|
||||
class FdrOpenError(FdrWriterError):
|
||||
"""Raised when ``open_flight`` cannot land the ``flight_header`` record
|
||||
on disk (AZ-292 AC-4, AC-5) or when the writer cannot open segment 0.
|
||||
|
||||
Translates to a takeoff abort in the composition root (AZ-296 wires
|
||||
that path); this task only raises the exception.
|
||||
"""
|
||||
|
||||
|
||||
class FdrCloseWithoutOpenError(FdrWriterError):
|
||||
"""Raised by ``close_flight`` when ``open_flight`` was never called."""
|
||||
|
||||
|
||||
class FdrAlreadyClosedError(FdrWriterError):
|
||||
"""Raised by ``close_flight`` if called twice (AZ-292 idempotency)."""
|
||||
@@ -0,0 +1,52 @@
|
||||
"""C13 FlightHeader / FlightFooter dataclasses (AZ-292).
|
||||
|
||||
The header is the first record of segment 0; the footer is the last
|
||||
record before clean shutdown. Together they make a per-flight FDR
|
||||
directory self-describing.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
from uuid import UUID
|
||||
|
||||
__all__ = ["FlightFooter", "FlightHeader"]
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class FlightHeader:
|
||||
"""First-record-on-disk descriptor for a flight (AZ-292 AC-1).
|
||||
|
||||
``config_snapshot`` MUST be JSON-safe — the composition root scrubs
|
||||
known-secret fields (per AZ-269 redacted-config helper) before
|
||||
constructing the header.
|
||||
"""
|
||||
|
||||
flight_id: UUID
|
||||
flight_started_at_iso: str
|
||||
flight_started_at_monotonic_ns: int
|
||||
config_snapshot: dict[str, Any] = field(default_factory=dict)
|
||||
signing_key_rotation_event: dict[str, Any] = field(default_factory=dict)
|
||||
manifest_content_hashes: dict[str, str] = field(default_factory=dict)
|
||||
build_info: dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class FlightFooter:
|
||||
"""Last-record-on-disk descriptor for a flight (AZ-292 AC-2).
|
||||
|
||||
The four counters (``records_written``, ``records_dropped_overrun``,
|
||||
``bytes_written``, ``rollover_count``) are the AC-NEW-3 canonical
|
||||
audit trail. ``clean_shutdown`` distinguishes a graceful landing
|
||||
from a power-loss truncation.
|
||||
"""
|
||||
|
||||
flight_id: UUID
|
||||
flight_ended_at_iso: str
|
||||
flight_ended_at_monotonic_ns: int
|
||||
records_written: int
|
||||
records_dropped_overrun: int
|
||||
bytes_written: int
|
||||
rollover_count: int
|
||||
clean_shutdown: bool
|
||||
@@ -0,0 +1,548 @@
|
||||
"""`FileFdrWriter` — single C13 writer thread + segment lifecycle.
|
||||
|
||||
Implements AZ-291 (writer thread + segment files + filelock + atomic
|
||||
rotation + ENOSPC handling) and AZ-292 (flight header/footer + per-flight
|
||||
counters). The cap policy hook (AZ-293) is wired via the
|
||||
``on_rotation`` callback so AZ-291 stays focused on per-segment
|
||||
lifecycle and the policy can be injected by the composition root.
|
||||
|
||||
Single-thread by contract on each side:
|
||||
- Producer side: every registered `FdrClient` has exactly one consumer
|
||||
(this writer's background thread).
|
||||
- Lifecycle side: ``start``/``stop``/``open_flight``/``close_flight``
|
||||
are called once each by the composition root.
|
||||
|
||||
Filelock: cross-process advisory ``fcntl.flock`` on ``flight_root/.fdr.lock``
|
||||
held for the entire flight. POSIX semantics mean the kernel releases
|
||||
the lock on process death automatically (Risk 3).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import errno
|
||||
import fcntl
|
||||
import os
|
||||
import struct
|
||||
import threading
|
||||
import time
|
||||
from collections.abc import Callable, Sequence
|
||||
from dataclasses import asdict
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from uuid import UUID
|
||||
|
||||
from gps_denied_onboard.components.c13_fdr.errors import (
|
||||
FdrAlreadyClosedError,
|
||||
FdrCloseWithoutOpenError,
|
||||
FdrConcurrentWriterError,
|
||||
FdrOpenError,
|
||||
FdrWriterError,
|
||||
)
|
||||
from gps_denied_onboard.components.c13_fdr.headers import FlightFooter, FlightHeader
|
||||
from gps_denied_onboard.config import FdrWriterConfig
|
||||
from gps_denied_onboard.fdr_client.client import FdrClient
|
||||
from gps_denied_onboard.fdr_client.records import (
|
||||
OVERRUN_KIND,
|
||||
OVERRUN_PRODUCER_ID,
|
||||
FdrRecord,
|
||||
serialise,
|
||||
)
|
||||
from gps_denied_onboard.logging import get_logger
|
||||
|
||||
__all__ = ["FileFdrWriter"]
|
||||
|
||||
_FLIGHT_HEADER_KIND = "flight_header"
|
||||
_FLIGHT_FOOTER_KIND = "flight_footer"
|
||||
_LENGTH_PREFIX = struct.Struct("<I") # uint32 LE record length prefix
|
||||
_LOG_FAILURE_RATE_LIMIT_S = 1.0
|
||||
_DEFAULT_DRAIN_SLEEP_S = 0.001
|
||||
|
||||
|
||||
def _iso_now() -> str:
|
||||
return datetime.now(tz=timezone.utc).isoformat()
|
||||
|
||||
|
||||
def _flight_header_to_payload(header: FlightHeader) -> dict:
|
||||
payload = asdict(header)
|
||||
payload["flight_id"] = str(header.flight_id)
|
||||
return payload
|
||||
|
||||
|
||||
def _flight_footer_to_payload(footer: FlightFooter) -> dict:
|
||||
payload = asdict(footer)
|
||||
payload["flight_id"] = str(footer.flight_id)
|
||||
return payload
|
||||
|
||||
|
||||
class FileFdrWriter:
|
||||
"""Single-writer C13 FDR component.
|
||||
|
||||
Constructor binds the writer to a specific ``flight_root`` and a
|
||||
``flight_id``; the same UUID MUST be passed in via the
|
||||
``FlightHeader`` to ``open_flight`` (AZ-292 AC-5).
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
flight_root: Path,
|
||||
flight_id: UUID,
|
||||
config: FdrWriterConfig,
|
||||
fdr_clients: Sequence[FdrClient],
|
||||
gcs_alert: Callable[[str], None],
|
||||
*,
|
||||
on_rotation: Callable[[FileFdrWriter, int], None] | None = None,
|
||||
drain_sleep_s: float = _DEFAULT_DRAIN_SLEEP_S,
|
||||
) -> None:
|
||||
self._flight_root = Path(flight_root)
|
||||
self._flight_id = flight_id
|
||||
self._config = config
|
||||
self._fdr_clients = tuple(fdr_clients)
|
||||
self._gcs_alert = gcs_alert
|
||||
self._on_rotation = on_rotation
|
||||
self._drain_sleep_s = drain_sleep_s
|
||||
|
||||
# Filesystem state.
|
||||
self._flight_dir: Path = self._flight_root / str(flight_id)
|
||||
self._lock_path: Path = self._flight_root / ".fdr.lock"
|
||||
self._lock_fd: int | None = None
|
||||
|
||||
# Segment state.
|
||||
self._segment_index: int = 0
|
||||
self._segment_fd: int | None = None
|
||||
self._segment_bytes: int = 0
|
||||
self._is_rolling: bool = False
|
||||
|
||||
# Lifecycle state.
|
||||
self._started = False
|
||||
self._stopped = False
|
||||
self._opened = False
|
||||
self._closed = False
|
||||
self._stop_event = threading.Event()
|
||||
self._thread: threading.Thread | None = None
|
||||
self._exit_code_thrown: BaseException | None = None
|
||||
self._last_failure_log_t: float = 0.0
|
||||
self._is_degraded = False
|
||||
self._gcs_alerted_for_write_failure = False
|
||||
|
||||
# Counters (writer-thread is the sole mutator).
|
||||
self._records_written = 0
|
||||
self._records_dropped_overrun = 0
|
||||
self._bytes_written = 0
|
||||
self._rollover_count = 0
|
||||
self._stored_footer: FlightFooter | None = None
|
||||
|
||||
self._log = get_logger("c13_fdr.writer")
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Public read-only introspection (AZ-291).
|
||||
|
||||
def current_segment_path(self) -> Path:
|
||||
return self._segment_path(self._segment_index)
|
||||
|
||||
def current_segment_bytes(self) -> int:
|
||||
return self._segment_bytes
|
||||
|
||||
def segments_written(self) -> int:
|
||||
# Number of CLOSED segments. The currently-open one isn't counted
|
||||
# until it rotates or close_flight closes it.
|
||||
return self._segment_index
|
||||
|
||||
def is_rolling(self) -> bool:
|
||||
return self._is_rolling
|
||||
|
||||
def is_degraded(self) -> bool:
|
||||
return self._is_degraded
|
||||
|
||||
def current_size_bytes(self) -> int:
|
||||
return self._bytes_written
|
||||
|
||||
@property
|
||||
def flight_id(self) -> UUID:
|
||||
return self._flight_id
|
||||
|
||||
@property
|
||||
def flight_dir(self) -> Path:
|
||||
return self._flight_dir
|
||||
|
||||
@property
|
||||
def rollover_count(self) -> int:
|
||||
return self._rollover_count
|
||||
|
||||
@property
|
||||
def records_dropped_overrun(self) -> int:
|
||||
return self._records_dropped_overrun
|
||||
|
||||
def list_closed_segments(self) -> list[tuple[int, Path]]:
|
||||
"""Return ``(segment_index, path)`` for every CLOSED segment on disk.
|
||||
|
||||
Used by the cap policy (AZ-293) to decide which segment to drop.
|
||||
The currently-open segment (``self._segment_index``) is excluded
|
||||
from the result regardless of whether its file exists.
|
||||
"""
|
||||
result: list[tuple[int, Path]] = []
|
||||
if not self._flight_dir.exists():
|
||||
return result
|
||||
for entry in self._flight_dir.iterdir():
|
||||
if not entry.is_file():
|
||||
continue
|
||||
name = entry.name
|
||||
if not (name.startswith("segment-") and name.endswith(".fdr")):
|
||||
continue
|
||||
try:
|
||||
index = int(name[len("segment-") : -len(".fdr")])
|
||||
except ValueError:
|
||||
continue
|
||||
if index == self._segment_index:
|
||||
continue
|
||||
result.append((index, entry))
|
||||
result.sort(key=lambda kv: kv[0])
|
||||
return result
|
||||
|
||||
def increment_rollover_count_for_cap(self) -> None:
|
||||
"""Allow the cap policy (AZ-293) to attribute a cap-driven drop to
|
||||
``rollover_count`` so the footer's totals match.
|
||||
"""
|
||||
self._rollover_count += 1
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Lifecycle: start / stop / open_flight / close_flight.
|
||||
|
||||
def start(self) -> None:
|
||||
if self._started:
|
||||
raise FdrWriterError("FileFdrWriter.start called twice")
|
||||
self._started = True
|
||||
|
||||
self._flight_root.mkdir(parents=True, exist_ok=True)
|
||||
self._acquire_filelock()
|
||||
try:
|
||||
self._flight_dir.mkdir(parents=True, exist_ok=True)
|
||||
self._open_segment(self._segment_index)
|
||||
except Exception:
|
||||
self._release_filelock()
|
||||
self._started = False
|
||||
raise
|
||||
|
||||
self._thread = threading.Thread(target=self._writer_loop, name="c13.writer", daemon=True)
|
||||
self._thread.start()
|
||||
self._log.info("c13.writer.start", extra={"kv": {"flight_id": str(self._flight_id)}})
|
||||
|
||||
def stop(self) -> None:
|
||||
if not self._started:
|
||||
return
|
||||
if self._stopped:
|
||||
return
|
||||
self._stopped = True
|
||||
|
||||
self._stop_event.set()
|
||||
if self._thread is not None:
|
||||
self._thread.join(timeout=5.0)
|
||||
|
||||
# Best-effort drain after thread exit (AC-6).
|
||||
self._drain_all(final=True)
|
||||
self._close_segment(fsync=True)
|
||||
self._release_filelock()
|
||||
self._log.info(
|
||||
"c13.writer.stop",
|
||||
extra={
|
||||
"kv": {
|
||||
"records_written": self._records_written,
|
||||
"rollover_count": self._rollover_count,
|
||||
}
|
||||
},
|
||||
)
|
||||
|
||||
def open_flight(self, header: FlightHeader) -> None:
|
||||
if not self._started:
|
||||
raise FdrWriterError("open_flight called before start()")
|
||||
if self._opened:
|
||||
raise FdrWriterError("open_flight called twice")
|
||||
if header.flight_id != self._flight_id:
|
||||
self._fail_open(
|
||||
f"FlightHeader.flight_id ({header.flight_id}) does not match "
|
||||
f"writer's flight_id ({self._flight_id})"
|
||||
)
|
||||
|
||||
record = FdrRecord(
|
||||
schema_version=1,
|
||||
ts=_iso_now(),
|
||||
producer_id=OVERRUN_PRODUCER_ID,
|
||||
kind=_FLIGHT_HEADER_KIND,
|
||||
payload=_flight_header_to_payload(header),
|
||||
)
|
||||
try:
|
||||
self._append_record(record)
|
||||
except OSError as exc:
|
||||
self._fail_open(f"failed to write flight_header: {exc}", cause=exc)
|
||||
|
||||
self._opened = True
|
||||
self._log.info(
|
||||
"fdr.flight_open",
|
||||
extra={"kv": {"flight_id": str(self._flight_id)}},
|
||||
)
|
||||
|
||||
def close_flight(self) -> FlightFooter:
|
||||
if not self._opened:
|
||||
raise FdrCloseWithoutOpenError(
|
||||
"close_flight called without a prior successful open_flight"
|
||||
)
|
||||
if self._closed:
|
||||
if self._stored_footer is not None:
|
||||
return self._stored_footer
|
||||
raise FdrAlreadyClosedError(
|
||||
"close_flight called twice and no stored footer is available"
|
||||
)
|
||||
|
||||
# Drain pending producer records BEFORE emitting the footer so the
|
||||
# counters reflect the entire flight.
|
||||
self._stop_event.set()
|
||||
if self._thread is not None:
|
||||
self._thread.join(timeout=5.0)
|
||||
self._drain_all(final=True)
|
||||
|
||||
# Build a footer whose ``bytes_written`` payload includes the
|
||||
# footer's own framed size. Because ``bytes_written`` is itself
|
||||
# an integer field on the wire, growing it can change the
|
||||
# serialised length by 1 byte at decimal-power boundaries; we
|
||||
# iterate until the value is stable. Practically this converges
|
||||
# in one or two passes.
|
||||
ts = _iso_now()
|
||||
mono_ns = time.monotonic_ns()
|
||||
records_written_now = self._records_written + 1 # +1 for the footer itself
|
||||
bytes_estimate = self._bytes_written
|
||||
footer: FlightFooter | None = None
|
||||
footer_record: FdrRecord | None = None
|
||||
for _ in range(8):
|
||||
footer = FlightFooter(
|
||||
flight_id=self._flight_id,
|
||||
flight_ended_at_iso=ts,
|
||||
flight_ended_at_monotonic_ns=mono_ns,
|
||||
records_written=records_written_now,
|
||||
records_dropped_overrun=self._records_dropped_overrun,
|
||||
bytes_written=bytes_estimate,
|
||||
rollover_count=self._rollover_count,
|
||||
clean_shutdown=True,
|
||||
)
|
||||
footer_record = FdrRecord(
|
||||
schema_version=1,
|
||||
ts=footer.flight_ended_at_iso,
|
||||
producer_id=OVERRUN_PRODUCER_ID,
|
||||
kind=_FLIGHT_FOOTER_KIND,
|
||||
payload=_flight_footer_to_payload(footer),
|
||||
)
|
||||
body = serialise(footer_record)
|
||||
framed_size = _LENGTH_PREFIX.size + len(body)
|
||||
next_estimate = self._bytes_written + framed_size
|
||||
if next_estimate == bytes_estimate:
|
||||
break
|
||||
bytes_estimate = next_estimate
|
||||
assert footer is not None and footer_record is not None
|
||||
try:
|
||||
self._append_record(footer_record)
|
||||
except OSError as exc:
|
||||
self._handle_write_failure(exc)
|
||||
|
||||
self._close_segment(fsync=True)
|
||||
self._release_filelock()
|
||||
self._closed = True
|
||||
self._stopped = True
|
||||
self._stored_footer = footer
|
||||
self._log.info(
|
||||
"fdr.flight_close",
|
||||
extra={
|
||||
"kv": {
|
||||
"records_written": footer.records_written,
|
||||
"records_dropped_overrun": footer.records_dropped_overrun,
|
||||
"bytes_written": footer.bytes_written,
|
||||
"rollover_count": footer.rollover_count,
|
||||
"clean_shutdown": True,
|
||||
}
|
||||
},
|
||||
)
|
||||
return footer
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Writer thread loop.
|
||||
|
||||
def _writer_loop(self) -> None:
|
||||
try:
|
||||
while not self._stop_event.is_set():
|
||||
drained = self._drain_all(final=False)
|
||||
if drained == 0:
|
||||
self._stop_event.wait(timeout=self._drain_sleep_s)
|
||||
except BaseException as exc:
|
||||
self._exit_code_thrown = exc
|
||||
self._handle_write_failure(exc if isinstance(exc, OSError) else OSError(str(exc)))
|
||||
|
||||
def _drain_all(self, *, final: bool) -> int:
|
||||
drained = 0
|
||||
for client in self._fdr_clients:
|
||||
drained += self._drain_one(client, final=final)
|
||||
return drained
|
||||
|
||||
def _drain_one(self, client: FdrClient, *, final: bool) -> int:
|
||||
batch = client.drain(max_records=self._config.batch_size)
|
||||
for record in batch:
|
||||
self._observe_overrun_record(record)
|
||||
try:
|
||||
self._append_record(record)
|
||||
except OSError as exc:
|
||||
self._handle_write_failure(exc)
|
||||
# Continue dequeuing producer buffers so they don't grow
|
||||
# unboundedly even in degraded mode (AC-5 part d).
|
||||
continue
|
||||
return len(batch)
|
||||
|
||||
def _observe_overrun_record(self, record: FdrRecord) -> None:
|
||||
if record.kind != OVERRUN_KIND:
|
||||
return
|
||||
dropped = record.payload.get("dropped_count", 0)
|
||||
if isinstance(dropped, int) and dropped > 0:
|
||||
self._records_dropped_overrun += dropped
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Segment file lifecycle.
|
||||
|
||||
def _segment_path(self, index: int) -> Path:
|
||||
return self._flight_dir / f"segment-{index:04d}.fdr"
|
||||
|
||||
def _open_segment(self, index: int) -> None:
|
||||
path = self._segment_path(index)
|
||||
flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
|
||||
try:
|
||||
self._segment_fd = os.open(path, flags, 0o644)
|
||||
except OSError as exc:
|
||||
raise FdrOpenError(f"failed to open segment {index} at {path}: {exc}") from exc
|
||||
self._segment_bytes = 0
|
||||
|
||||
def _close_segment(self, *, fsync: bool) -> None:
|
||||
if self._segment_fd is None:
|
||||
return
|
||||
try:
|
||||
if fsync:
|
||||
os.fsync(self._segment_fd)
|
||||
except OSError:
|
||||
# Best-effort fsync on close; the failure is already accounted
|
||||
# for upstream (degraded mode). Continue with close.
|
||||
pass
|
||||
try:
|
||||
os.close(self._segment_fd)
|
||||
finally:
|
||||
self._segment_fd = None
|
||||
|
||||
def _rotate_segment(self) -> None:
|
||||
"""Close the current segment and atomically open the next one."""
|
||||
self._is_rolling = True
|
||||
try:
|
||||
previous = self._segment_index
|
||||
self._close_segment(fsync=True)
|
||||
self._segment_index += 1
|
||||
self._open_segment(self._segment_index)
|
||||
self._rollover_count += 1
|
||||
self._log.info(
|
||||
"c13.writer.segment_rotated",
|
||||
extra={"kv": {"previous": previous, "current": self._segment_index}},
|
||||
)
|
||||
if self._on_rotation is not None:
|
||||
# The cap-policy hook runs synchronously after each rotation
|
||||
# (AZ-293 Risk 3). Exceptions are caught to keep the writer
|
||||
# alive — the policy is responsible for its own error
|
||||
# surfacing.
|
||||
try:
|
||||
self._on_rotation(self, previous)
|
||||
except Exception as exc:
|
||||
self._log.error(
|
||||
"c13.writer.on_rotation_failed",
|
||||
extra={"kv": {"error": repr(exc)}},
|
||||
)
|
||||
finally:
|
||||
self._is_rolling = False
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Record append + length framing.
|
||||
|
||||
def _append_record(self, record: FdrRecord) -> None:
|
||||
if self._segment_fd is None:
|
||||
raise FdrWriterError("append_record called with no open segment")
|
||||
body = serialise(record)
|
||||
frame = _LENGTH_PREFIX.pack(len(body)) + body
|
||||
# Per-segment cap check happens BEFORE write so the rotation
|
||||
# threshold matches AC-2's "≤ cap + one record overshoot"
|
||||
# interpretation: a record that would push past the cap goes into
|
||||
# the NEXT segment.
|
||||
cap = self._config.segment_size_bytes
|
||||
if cap > 0 and self._segment_bytes > 0 and self._segment_bytes + len(frame) > cap:
|
||||
self._rotate_segment()
|
||||
os.write(self._segment_fd, frame)
|
||||
self._segment_bytes += len(frame)
|
||||
self._bytes_written += len(frame)
|
||||
self._records_written += 1
|
||||
if self._config.debug_log_per_record:
|
||||
self._log.debug("c13.writer.append", extra={"kv": {"kind": record.kind}})
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Cross-process filelock.
|
||||
|
||||
def _acquire_filelock(self) -> None:
|
||||
self._flight_root.mkdir(parents=True, exist_ok=True)
|
||||
self._lock_fd = os.open(self._lock_path, os.O_RDWR | os.O_CREAT, 0o644)
|
||||
try:
|
||||
fcntl.flock(self._lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
|
||||
except OSError as exc:
|
||||
os.close(self._lock_fd)
|
||||
self._lock_fd = None
|
||||
raise FdrConcurrentWriterError(
|
||||
f"another writer holds {self._lock_path}: {exc}"
|
||||
) from exc
|
||||
|
||||
def _release_filelock(self) -> None:
|
||||
if self._lock_fd is None:
|
||||
return
|
||||
try:
|
||||
fcntl.flock(self._lock_fd, fcntl.LOCK_UN)
|
||||
finally:
|
||||
os.close(self._lock_fd)
|
||||
self._lock_fd = None
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Failure handling.
|
||||
|
||||
def _fail_open(self, message: str, *, cause: BaseException | None = None) -> None:
|
||||
# An open_flight failure rolls back: close the segment, unlink it
|
||||
# (it can only contain a partially-written header), release the
|
||||
# filelock so the next attempt isn't permanently stuck.
|
||||
seg_path = self._segment_path(self._segment_index)
|
||||
self._close_segment(fsync=False)
|
||||
try:
|
||||
if seg_path.exists() and seg_path.stat().st_size == 0:
|
||||
seg_path.unlink()
|
||||
except OSError:
|
||||
pass
|
||||
self._release_filelock()
|
||||
self._started = False
|
||||
if cause is None:
|
||||
raise FdrOpenError(message)
|
||||
raise FdrOpenError(message) from cause
|
||||
|
||||
def _handle_write_failure(self, exc: BaseException) -> None:
|
||||
now = time.monotonic()
|
||||
if now - self._last_failure_log_t >= _LOG_FAILURE_RATE_LIMIT_S:
|
||||
self._last_failure_log_t = now
|
||||
errno_value = getattr(exc, "errno", None)
|
||||
self._log.error(
|
||||
"fdr.write_failure",
|
||||
extra={"kv": {"errno": errno_value, "error": repr(exc)}},
|
||||
)
|
||||
if not self._gcs_alerted_for_write_failure:
|
||||
self._gcs_alerted_for_write_failure = True
|
||||
try:
|
||||
self._gcs_alert(
|
||||
"FDR write failure — companion in degraded mode "
|
||||
f"(errno={getattr(exc, 'errno', errno.EIO)})"
|
||||
)
|
||||
except Exception:
|
||||
# GCS alert failure is logged but does not unwind further.
|
||||
self._log.error(
|
||||
"fdr.gcs_alert_failed",
|
||||
extra={"kv": {"on": "write_failure"}},
|
||||
)
|
||||
self._is_degraded = True
|
||||
@@ -5,6 +5,7 @@ from gps_denied_onboard.config.schema import (
|
||||
Config,
|
||||
ConfigError,
|
||||
FdrConfig,
|
||||
FdrWriterConfig,
|
||||
LogConfig,
|
||||
RequiredFieldMissingError,
|
||||
RuntimeConfig,
|
||||
@@ -16,6 +17,7 @@ __all__ = [
|
||||
"Config",
|
||||
"ConfigError",
|
||||
"FdrConfig",
|
||||
"FdrWriterConfig",
|
||||
"LogConfig",
|
||||
"RequiredFieldMissingError",
|
||||
"RuntimeConfig",
|
||||
|
||||
@@ -18,6 +18,7 @@ __all__ = [
|
||||
"Config",
|
||||
"ConfigError",
|
||||
"FdrConfig",
|
||||
"FdrWriterConfig",
|
||||
"LogConfig",
|
||||
"RequiredFieldMissingError",
|
||||
"RuntimeConfig",
|
||||
@@ -46,6 +47,32 @@ class LogConfig:
|
||||
sink: str = "console"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class FdrWriterConfig:
|
||||
"""C13 writer-thread block (E-C13 / AZ-291..AZ-296).
|
||||
|
||||
``segment_size_bytes`` controls per-segment rotation; the writer
|
||||
closes the current segment and opens the next once a record's
|
||||
serialised size would push the segment past this cap.
|
||||
|
||||
``batch_size`` bounds the per-producer ``drain(max_records=N)`` call
|
||||
so one slow producer cannot starve others.
|
||||
|
||||
``flight_cap_bytes`` is the AC-NEW-3 per-flight cap (default 64 GiB
|
||||
exactly). Lowered in tests to exercise the cap policy on small
|
||||
fixtures. There is no flag that disables cap enforcement (verified
|
||||
by C13-ST-01).
|
||||
|
||||
``debug_log_per_record`` enables a per-record DEBUG log line —
|
||||
OFF by default because a 100 Hz aggregate would flood logs.
|
||||
"""
|
||||
|
||||
segment_size_bytes: int = 64 * 1024 * 1024
|
||||
batch_size: int = 64
|
||||
flight_cap_bytes: int = 64 * 1024**3
|
||||
debug_log_per_record: bool = False
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class FdrConfig:
|
||||
"""Cross-cutting Flight Data Recorder block (E-CC-FDR-CLIENT / AZ-247).
|
||||
@@ -54,12 +81,15 @@ class FdrConfig:
|
||||
``per_producer_capacity`` carries per-producer overrides keyed by
|
||||
producer slug (consumed by AZ-273 ``make_fdr_client``); blocks
|
||||
that omit a producer fall back to ``queue_size``.
|
||||
|
||||
``writer`` is the C13 writer-thread sub-block (AZ-291..AZ-296).
|
||||
"""
|
||||
|
||||
queue_size: int = 4096
|
||||
overrun_policy: str = "drop_oldest"
|
||||
path: str = "/var/lib/gps-denied/fdr"
|
||||
per_producer_capacity: Mapping[str, int] = field(default_factory=dict)
|
||||
writer: FdrWriterConfig = field(default_factory=FdrWriterConfig)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
|
||||
@@ -46,8 +46,29 @@ KNOWN_PAYLOAD_KEYS: Final[dict[str, frozenset[str]]] = {
|
||||
"segment_rollover": frozenset({"old_segment", "new_segment", "total_bytes_after"}),
|
||||
"failed_tile_thumbnail": frozenset({"frame_id", "tile_id", "jpeg_bytes_b64"}),
|
||||
"mid_flight_tile_snapshot": frozenset({"snapshot_path", "captured_at"}),
|
||||
"flight_header": frozenset({"flight_id", "started_at", "schema_version", "build_info"}),
|
||||
"flight_footer": frozenset({"flight_id", "ended_at", "records_written", "records_dropped"}),
|
||||
"flight_header": frozenset(
|
||||
{
|
||||
"flight_id",
|
||||
"flight_started_at_iso",
|
||||
"flight_started_at_monotonic_ns",
|
||||
"config_snapshot",
|
||||
"signing_key_rotation_event",
|
||||
"manifest_content_hashes",
|
||||
"build_info",
|
||||
}
|
||||
),
|
||||
"flight_footer": frozenset(
|
||||
{
|
||||
"flight_id",
|
||||
"flight_ended_at_iso",
|
||||
"flight_ended_at_monotonic_ns",
|
||||
"records_written",
|
||||
"records_dropped_overrun",
|
||||
"bytes_written",
|
||||
"rollover_count",
|
||||
"clean_shutdown",
|
||||
}
|
||||
),
|
||||
}
|
||||
|
||||
KNOWN_KINDS: Final[frozenset[str]] = frozenset(KNOWN_PAYLOAD_KEYS.keys())
|
||||
|
||||
@@ -0,0 +1,404 @@
|
||||
"""AZ-291 — FileFdrWriter writer thread + segment lifecycle.
|
||||
|
||||
Covers AC-1..AC-8 + a fresh-flight_id helper used by every test.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import errno
|
||||
import os
|
||||
import struct
|
||||
import time
|
||||
from collections.abc import Iterator
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
from unittest import mock
|
||||
from uuid import UUID, uuid4
|
||||
|
||||
import pytest
|
||||
|
||||
from gps_denied_onboard.components.c13_fdr import (
|
||||
FdrConcurrentWriterError,
|
||||
FileFdrWriter,
|
||||
FlightHeader,
|
||||
)
|
||||
from gps_denied_onboard.config import FdrWriterConfig
|
||||
from gps_denied_onboard.fdr_client.client import FdrClient
|
||||
from gps_denied_onboard.fdr_client.records import FdrRecord, parse
|
||||
|
||||
_LENGTH_PREFIX = struct.Struct("<I")
|
||||
|
||||
|
||||
def _make_header(flight_id: UUID) -> FlightHeader:
|
||||
return FlightHeader(
|
||||
flight_id=flight_id,
|
||||
flight_started_at_iso=datetime.now(tz=timezone.utc).isoformat(),
|
||||
flight_started_at_monotonic_ns=time.monotonic_ns(),
|
||||
config_snapshot={"tier": 2},
|
||||
signing_key_rotation_event={},
|
||||
manifest_content_hashes={},
|
||||
build_info={"commit": "abc1234"},
|
||||
)
|
||||
|
||||
|
||||
def _make_client(producer_id: str = "c1_vio", capacity: int = 256) -> FdrClient:
|
||||
return FdrClient(producer_id=producer_id, capacity=capacity, _emit_diag_log=False)
|
||||
|
||||
|
||||
def _payload(i: int) -> FdrRecord:
|
||||
return FdrRecord(
|
||||
schema_version=1,
|
||||
ts=datetime.now(tz=timezone.utc).isoformat(),
|
||||
producer_id="c1_vio",
|
||||
kind="vio.tick",
|
||||
payload={
|
||||
"frame_id": i,
|
||||
"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
|
||||
"t": [0, 0, 0],
|
||||
"P": [],
|
||||
"last_anchor_age_ms": 0,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
def _read_records(path: Path) -> list[FdrRecord]:
|
||||
records: list[FdrRecord] = []
|
||||
data = path.read_bytes()
|
||||
offset = 0
|
||||
while offset < len(data):
|
||||
(length,) = _LENGTH_PREFIX.unpack_from(data, offset)
|
||||
offset += _LENGTH_PREFIX.size
|
||||
records.append(parse(data[offset : offset + length]))
|
||||
offset += length
|
||||
return records
|
||||
|
||||
|
||||
def _collect_alerts() -> tuple[list[str], Any]:
|
||||
msgs: list[str] = []
|
||||
|
||||
def alert(msg: str) -> None:
|
||||
msgs.append(msg)
|
||||
|
||||
return msgs, alert
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def flight_root(tmp_path: Path) -> Path:
|
||||
return tmp_path / "fdr"
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def flight_id() -> UUID:
|
||||
return uuid4()
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def base_config() -> FdrWriterConfig:
|
||||
return FdrWriterConfig(
|
||||
segment_size_bytes=64 * 1024 * 1024,
|
||||
batch_size=64,
|
||||
flight_cap_bytes=64 * 1024**3,
|
||||
debug_log_per_record=False,
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def writer(
|
||||
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
|
||||
) -> Iterator[FileFdrWriter]:
|
||||
_alerts, alert_fn = _collect_alerts()
|
||||
client = _make_client()
|
||||
w = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=base_config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
yield w
|
||||
if not w._closed:
|
||||
w.stop()
|
||||
|
||||
|
||||
def test_ac1_drain_all_registered_producers(
|
||||
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
|
||||
) -> None:
|
||||
# Arrange
|
||||
clients = [_make_client(f"c{i}_test") for i in range(3)]
|
||||
_alerts, alert_fn = _collect_alerts()
|
||||
writer = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=base_config,
|
||||
fdr_clients=clients,
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for client in clients:
|
||||
for i in range(100):
|
||||
client.enqueue(_payload(i))
|
||||
|
||||
# Act
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline:
|
||||
if all(c._buffer_size() == 0 for c in clients):
|
||||
break
|
||||
time.sleep(0.01)
|
||||
footer = writer.close_flight()
|
||||
|
||||
# Assert
|
||||
records = _read_records(writer.current_segment_path())
|
||||
vio_count = sum(1 for r in records if r.kind == "vio.tick")
|
||||
assert vio_count == 300
|
||||
assert records[0].kind == "flight_header"
|
||||
assert records[-1].kind == "flight_footer"
|
||||
assert footer.records_written == 302 # 300 + header + footer
|
||||
|
||||
|
||||
def test_ac2_per_segment_rotation_at_size_cap(flight_root: Path, flight_id: UUID) -> None:
|
||||
# Arrange — small segment cap; the writer must rotate.
|
||||
config = FdrWriterConfig(segment_size_bytes=2048, batch_size=4, flight_cap_bytes=1024**3)
|
||||
_alerts, alert_fn = _collect_alerts()
|
||||
client = _make_client()
|
||||
writer = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for i in range(40):
|
||||
client.enqueue(_payload(i))
|
||||
|
||||
# Act
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
writer.close_flight()
|
||||
|
||||
# Assert — at least two segment files exist.
|
||||
segs = sorted(writer.flight_dir.glob("segment-*.fdr"))
|
||||
assert len(segs) >= 2, f"expected >=2 segments, got {[p.name for p in segs]}"
|
||||
all_records: list[FdrRecord] = []
|
||||
for seg in segs:
|
||||
all_records.extend(_read_records(seg))
|
||||
vio = [r for r in all_records if r.kind == "vio.tick"]
|
||||
frame_ids = [r.payload["frame_id"] for r in vio]
|
||||
assert frame_ids == list(range(40))
|
||||
|
||||
|
||||
def test_ac3_atomic_rotation_no_half_segment(flight_root: Path, flight_id: UUID) -> None:
|
||||
# Arrange
|
||||
config = FdrWriterConfig(segment_size_bytes=1024, batch_size=4, flight_cap_bytes=1024**3)
|
||||
_alerts, alert_fn = _collect_alerts()
|
||||
client = _make_client()
|
||||
writer = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for i in range(20):
|
||||
client.enqueue(_payload(i))
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
|
||||
# Act — abrupt stop (no close_flight).
|
||||
writer.stop()
|
||||
|
||||
# Assert — every segment file parses cleanly.
|
||||
for seg in sorted(writer.flight_dir.glob("segment-*.fdr")):
|
||||
records = _read_records(seg)
|
||||
for r in records:
|
||||
assert r.schema_version >= 1
|
||||
|
||||
|
||||
def test_ac4_concurrent_writer_blocked_by_filelock(
|
||||
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
|
||||
) -> None:
|
||||
# Arrange
|
||||
_alerts, alert_fn = _collect_alerts()
|
||||
client_a = _make_client("c1_vio")
|
||||
writer_a = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=base_config,
|
||||
fdr_clients=[client_a],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
writer_a.start()
|
||||
client_b = _make_client("c2_vpr")
|
||||
writer_b = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=uuid4(),
|
||||
config=base_config,
|
||||
fdr_clients=[client_b],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
|
||||
# Act, Assert
|
||||
with pytest.raises(FdrConcurrentWriterError):
|
||||
writer_b.start()
|
||||
|
||||
# Cleanup
|
||||
writer_a.stop()
|
||||
|
||||
|
||||
def test_ac5_enospc_degrades_and_alerts(
|
||||
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
|
||||
) -> None:
|
||||
# Arrange
|
||||
alerts, alert_fn = _collect_alerts()
|
||||
client = _make_client()
|
||||
writer = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=base_config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
|
||||
real_write = os.write
|
||||
state = {"first": True}
|
||||
|
||||
def failing_write(fd: int, data: bytes) -> int:
|
||||
if state["first"]:
|
||||
state["first"] = False
|
||||
raise OSError(errno.ENOSPC, "fake ENOSPC")
|
||||
return real_write(fd, data)
|
||||
|
||||
# Act
|
||||
with mock.patch(
|
||||
"gps_denied_onboard.components.c13_fdr.writer.os.write", side_effect=failing_write
|
||||
):
|
||||
client.enqueue(_payload(0))
|
||||
deadline = time.monotonic() + 2.0
|
||||
while time.monotonic() < deadline and not writer.is_degraded():
|
||||
time.sleep(0.01)
|
||||
|
||||
# Assert
|
||||
assert writer.is_degraded()
|
||||
assert len(alerts) >= 1
|
||||
assert "FDR write failure" in alerts[0]
|
||||
writer.stop()
|
||||
|
||||
|
||||
def test_ac6_stop_drains_and_releases_lock(
|
||||
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
|
||||
) -> None:
|
||||
# Arrange
|
||||
_alerts, alert_fn = _collect_alerts()
|
||||
client = _make_client()
|
||||
writer = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=base_config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for i in range(50):
|
||||
client.enqueue(_payload(i))
|
||||
|
||||
# Act
|
||||
writer.stop()
|
||||
|
||||
# Assert — a second writer can claim the filelock.
|
||||
second = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=uuid4(),
|
||||
config=base_config,
|
||||
fdr_clients=[_make_client("c5_state")],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
second.start() # would raise if lock still held
|
||||
second.stop()
|
||||
|
||||
|
||||
def test_ac7_segment_layout(flight_root: Path, flight_id: UUID) -> None:
|
||||
# Arrange
|
||||
config = FdrWriterConfig(segment_size_bytes=1024, batch_size=4, flight_cap_bytes=1024**3)
|
||||
_alerts, alert_fn = _collect_alerts()
|
||||
client = _make_client()
|
||||
writer = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for i in range(40):
|
||||
client.enqueue(_payload(i))
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
writer.close_flight()
|
||||
|
||||
# Assert
|
||||
flight_dir = flight_root / str(flight_id)
|
||||
names = sorted(p.name for p in flight_dir.iterdir() if p.is_file())
|
||||
for name in names:
|
||||
assert name.startswith("segment-") and name.endswith(".fdr"), name
|
||||
# 4-digit zero-padded.
|
||||
stem = name[len("segment-") : -len(".fdr")]
|
||||
assert len(stem) == 4 and stem.isdigit()
|
||||
|
||||
|
||||
def test_ac8_steady_state_no_overrun(
|
||||
flight_root: Path, flight_id: UUID, base_config: FdrWriterConfig
|
||||
) -> None:
|
||||
# Arrange — a small burst that the writer drains within a few seconds.
|
||||
_alerts, alert_fn = _collect_alerts()
|
||||
client = _make_client(capacity=2048)
|
||||
writer = FileFdrWriter(
|
||||
flight_root=flight_root,
|
||||
flight_id=flight_id,
|
||||
config=base_config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alert_fn,
|
||||
)
|
||||
overrun_seen = {"count": 0}
|
||||
|
||||
def overrun_hook(record: FdrRecord) -> None:
|
||||
overrun_seen["count"] += 1
|
||||
|
||||
client.on_overrun = overrun_hook
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
|
||||
# Act — emit 200 records spaced ~5 ms apart (~200 Hz steady state).
|
||||
for i in range(200):
|
||||
client.enqueue(_payload(i))
|
||||
time.sleep(0.001)
|
||||
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
|
||||
# Assert
|
||||
assert overrun_seen["count"] == 0
|
||||
writer.close_flight()
|
||||
|
||||
|
||||
def test_double_start_raises(writer: FileFdrWriter, flight_id: UUID) -> None:
|
||||
from gps_denied_onboard.components.c13_fdr import FdrWriterError
|
||||
|
||||
# Arrange
|
||||
writer.start()
|
||||
# Assert
|
||||
with pytest.raises(FdrWriterError):
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
@@ -0,0 +1,320 @@
|
||||
"""AZ-292 — FlightHeader / FlightFooter + per-flight counters.
|
||||
|
||||
Covers AC-1..AC-8.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import dataclasses
|
||||
import struct
|
||||
import time
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from uuid import UUID, uuid4
|
||||
|
||||
import pytest
|
||||
|
||||
from gps_denied_onboard.components.c13_fdr import (
|
||||
FdrCloseWithoutOpenError,
|
||||
FdrOpenError,
|
||||
FileFdrWriter,
|
||||
FlightFooter,
|
||||
FlightHeader,
|
||||
)
|
||||
from gps_denied_onboard.config import FdrWriterConfig
|
||||
from gps_denied_onboard.fdr_client.client import FdrClient
|
||||
from gps_denied_onboard.fdr_client.records import (
|
||||
OVERRUN_KIND,
|
||||
OVERRUN_PRODUCER_ID,
|
||||
FdrRecord,
|
||||
parse,
|
||||
)
|
||||
|
||||
_LENGTH_PREFIX = struct.Struct("<I")
|
||||
|
||||
|
||||
def _make_header(flight_id: UUID) -> FlightHeader:
|
||||
return FlightHeader(
|
||||
flight_id=flight_id,
|
||||
flight_started_at_iso=datetime.now(tz=timezone.utc).isoformat(),
|
||||
flight_started_at_monotonic_ns=time.monotonic_ns(),
|
||||
config_snapshot={"tier": 2, "fdr": {"queue_size": 4096}},
|
||||
signing_key_rotation_event={"current_key_id": "k1"},
|
||||
manifest_content_hashes={"foo/bar.bin": "deadbeef" * 8},
|
||||
build_info={"commit": "abc1234", "build_date": "2026-05-11"},
|
||||
)
|
||||
|
||||
|
||||
def _make_client(producer_id: str = "c1_vio") -> FdrClient:
|
||||
return FdrClient(producer_id=producer_id, capacity=256, _emit_diag_log=False)
|
||||
|
||||
|
||||
def _payload(i: int) -> FdrRecord:
|
||||
return FdrRecord(
|
||||
schema_version=1,
|
||||
ts=datetime.now(tz=timezone.utc).isoformat(),
|
||||
producer_id="c1_vio",
|
||||
kind="vio.tick",
|
||||
payload={
|
||||
"frame_id": i,
|
||||
"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
|
||||
"t": [0, 0, 0],
|
||||
"P": [],
|
||||
"last_anchor_age_ms": 0,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
def _read_records(flight_dir: Path) -> list[FdrRecord]:
|
||||
records: list[FdrRecord] = []
|
||||
for seg in sorted(flight_dir.glob("segment-*.fdr")):
|
||||
data = seg.read_bytes()
|
||||
offset = 0
|
||||
while offset < len(data):
|
||||
(length,) = _LENGTH_PREFIX.unpack_from(data, offset)
|
||||
offset += _LENGTH_PREFIX.size
|
||||
records.append(parse(data[offset : offset + length]))
|
||||
offset += length
|
||||
return records
|
||||
|
||||
|
||||
def _build_writer(
|
||||
tmp_path: Path, flight_id: UUID, config: FdrWriterConfig | None = None
|
||||
) -> tuple[FileFdrWriter, FdrClient, list[str]]:
|
||||
config = config or FdrWriterConfig()
|
||||
alerts: list[str] = []
|
||||
client = _make_client()
|
||||
writer = FileFdrWriter(
|
||||
flight_root=tmp_path / "fdr",
|
||||
flight_id=flight_id,
|
||||
config=config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alerts.append,
|
||||
)
|
||||
return writer, client, alerts
|
||||
|
||||
|
||||
def test_ac1_flight_header_is_first_record(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
writer, _client, _alerts = _build_writer(tmp_path, flight_id)
|
||||
header = _make_header(flight_id)
|
||||
writer.start()
|
||||
|
||||
# Act
|
||||
writer.open_flight(header)
|
||||
footer = writer.close_flight()
|
||||
|
||||
# Assert
|
||||
records = _read_records(writer.flight_dir)
|
||||
assert records[0].kind == "flight_header"
|
||||
assert records[0].payload["flight_id"] == str(flight_id)
|
||||
assert footer.flight_id == flight_id
|
||||
|
||||
|
||||
def test_ac2_flight_footer_is_last_record(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
writer, client, _alerts = _build_writer(tmp_path, flight_id)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for i in range(20):
|
||||
client.enqueue(_payload(i))
|
||||
|
||||
# Act
|
||||
deadline = time.monotonic() + 3.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
footer = writer.close_flight()
|
||||
|
||||
# Assert
|
||||
records = _read_records(writer.flight_dir)
|
||||
assert records[-1].kind == "flight_footer"
|
||||
assert records[-1].payload["clean_shutdown"] is True
|
||||
# Returned footer matches the on-disk payload (modulo flight_id stringification).
|
||||
on_disk = records[-1].payload
|
||||
assert on_disk["records_written"] == footer.records_written
|
||||
assert on_disk["records_dropped_overrun"] == footer.records_dropped_overrun
|
||||
assert on_disk["bytes_written"] == footer.bytes_written
|
||||
assert on_disk["rollover_count"] == footer.rollover_count
|
||||
|
||||
|
||||
def test_ac3_counters_reflect_on_disk_reality(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
config = FdrWriterConfig(segment_size_bytes=2048, batch_size=8, flight_cap_bytes=1024**3)
|
||||
writer, client, _alerts = _build_writer(tmp_path, flight_id, config)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
R = 30
|
||||
for i in range(R):
|
||||
client.enqueue(_payload(i))
|
||||
|
||||
# Act
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
footer = writer.close_flight()
|
||||
|
||||
# Assert
|
||||
expected_written = R + 2
|
||||
assert footer.records_written == expected_written
|
||||
assert footer.records_dropped_overrun == 0
|
||||
# rollover_count: per-segment rotations (counter is monotonic; the test
|
||||
# writer's small cap forces ≥1 rotation).
|
||||
assert footer.rollover_count >= 1
|
||||
# Cross-check: footer.rollover_count == observed segment_index at close.
|
||||
assert footer.rollover_count == writer.rollover_count
|
||||
|
||||
|
||||
def test_ac4_open_flight_fdrerror_on_disk_failure(tmp_path: Path) -> None:
|
||||
# Arrange — start the writer normally, then point segment 0 to a
|
||||
# read-only directory.
|
||||
flight_id = uuid4()
|
||||
config = FdrWriterConfig()
|
||||
alerts: list[str] = []
|
||||
client = _make_client()
|
||||
ro_root = tmp_path / "ro"
|
||||
ro_root.mkdir(parents=True)
|
||||
# Create a stand-in flight_dir read-only file pretending to be the segment.
|
||||
ro_flight = ro_root / str(flight_id)
|
||||
ro_flight.mkdir()
|
||||
seg = ro_flight / "segment-0000.fdr"
|
||||
seg.write_bytes(b"")
|
||||
seg.chmod(0o400)
|
||||
|
||||
writer = FileFdrWriter(
|
||||
flight_root=ro_root,
|
||||
flight_id=flight_id,
|
||||
config=config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alerts.append,
|
||||
)
|
||||
# Block the path from being opened for write — chmod the parent dir.
|
||||
ro_flight.chmod(0o500)
|
||||
try:
|
||||
writer.start()
|
||||
except Exception:
|
||||
# If start fails outright (cannot create flight_dir), that satisfies the contract.
|
||||
ro_flight.chmod(0o700)
|
||||
return
|
||||
|
||||
# If start did succeed (writer created the file before chmod took effect),
|
||||
# explicitly fail open_flight by writing the header via a forced OSError.
|
||||
seg.chmod(0o400)
|
||||
with pytest.raises(FdrOpenError):
|
||||
from unittest import mock
|
||||
|
||||
with mock.patch(
|
||||
"gps_denied_onboard.components.c13_fdr.writer.os.write",
|
||||
side_effect=PermissionError("read-only"),
|
||||
):
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
|
||||
# Cleanup
|
||||
ro_flight.chmod(0o700)
|
||||
|
||||
|
||||
def test_ac5_open_flight_rejects_flight_id_mismatch(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
writer_flight_id = uuid4()
|
||||
other_flight_id = uuid4()
|
||||
writer, _client, _alerts = _build_writer(tmp_path, writer_flight_id)
|
||||
writer.start()
|
||||
bad_header = _make_header(other_flight_id)
|
||||
|
||||
# Act, Assert
|
||||
with pytest.raises(FdrOpenError, match="does not match"):
|
||||
writer.open_flight(bad_header)
|
||||
|
||||
|
||||
def test_ac6_close_without_open_raises(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
writer, _client, _alerts = _build_writer(tmp_path, flight_id)
|
||||
writer.start()
|
||||
|
||||
# Act, Assert
|
||||
with pytest.raises(FdrCloseWithoutOpenError):
|
||||
writer.close_flight()
|
||||
writer.stop()
|
||||
|
||||
|
||||
def test_ac7_uncleansed_teardown_no_clean_shutdown(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
writer, _client, _alerts = _build_writer(tmp_path, flight_id)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
|
||||
# Act — stop without close_flight.
|
||||
writer.stop()
|
||||
|
||||
# Assert — no flight_footer record exists, so post-flight tooling marks the flight truncated.
|
||||
records = _read_records(writer.flight_dir)
|
||||
has_footer = any(r.kind == "flight_footer" for r in records)
|
||||
assert not has_footer
|
||||
|
||||
|
||||
def test_ac8_records_dropped_overrun_aggregates_dropped_counts(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
writer, client, _alerts = _build_writer(tmp_path, flight_id)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
drops = [3, 7, 2, 11, 4]
|
||||
for d in drops:
|
||||
client.enqueue(
|
||||
FdrRecord(
|
||||
schema_version=1,
|
||||
ts=datetime.now(tz=timezone.utc).isoformat(),
|
||||
producer_id=OVERRUN_PRODUCER_ID,
|
||||
kind=OVERRUN_KIND,
|
||||
payload={"producer_id": "c1_vio", "dropped_count": d},
|
||||
)
|
||||
)
|
||||
|
||||
# Act
|
||||
deadline = time.monotonic() + 3.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
footer = writer.close_flight()
|
||||
|
||||
# Assert
|
||||
assert footer.records_dropped_overrun == sum(drops)
|
||||
|
||||
|
||||
def test_close_flight_idempotent(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
writer, _client, _alerts = _build_writer(tmp_path, flight_id)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
|
||||
# Act
|
||||
footer_a = writer.close_flight()
|
||||
footer_b = writer.close_flight()
|
||||
|
||||
# Assert
|
||||
assert footer_a == footer_b
|
||||
|
||||
|
||||
def test_flight_header_and_footer_are_frozen() -> None:
|
||||
# Arrange
|
||||
header = _make_header(uuid4())
|
||||
footer = FlightFooter(
|
||||
flight_id=uuid4(),
|
||||
flight_ended_at_iso="2026-05-11T00:00:00+00:00",
|
||||
flight_ended_at_monotonic_ns=0,
|
||||
records_written=1,
|
||||
records_dropped_overrun=0,
|
||||
bytes_written=0,
|
||||
rollover_count=0,
|
||||
clean_shutdown=True,
|
||||
)
|
||||
|
||||
# Assert
|
||||
with pytest.raises(dataclasses.FrozenInstanceError):
|
||||
header.flight_id = uuid4() # type: ignore[misc]
|
||||
with pytest.raises(dataclasses.FrozenInstanceError):
|
||||
footer.records_written = 999 # type: ignore[misc]
|
||||
@@ -0,0 +1,353 @@
|
||||
"""AZ-293 — Per-flight 64 GiB cap + oldest-segment-dropped policy.
|
||||
|
||||
Covers AC-1..AC-8.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import struct
|
||||
import time
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from uuid import UUID, uuid4
|
||||
|
||||
import pytest
|
||||
|
||||
from gps_denied_onboard.components.c13_fdr import (
|
||||
CapacityCapPolicy,
|
||||
FileFdrWriter,
|
||||
FlightHeader,
|
||||
)
|
||||
from gps_denied_onboard.config import Config, FdrWriterConfig
|
||||
from gps_denied_onboard.fdr_client.client import FdrClient
|
||||
from gps_denied_onboard.fdr_client.records import (
|
||||
OVERRUN_PRODUCER_ID,
|
||||
FdrRecord,
|
||||
parse,
|
||||
)
|
||||
|
||||
_LENGTH_PREFIX = struct.Struct("<I")
|
||||
|
||||
|
||||
def _make_header(flight_id: UUID) -> FlightHeader:
|
||||
return FlightHeader(
|
||||
flight_id=flight_id,
|
||||
flight_started_at_iso=datetime.now(tz=timezone.utc).isoformat(),
|
||||
flight_started_at_monotonic_ns=time.monotonic_ns(),
|
||||
config_snapshot={},
|
||||
signing_key_rotation_event={},
|
||||
manifest_content_hashes={},
|
||||
build_info={},
|
||||
)
|
||||
|
||||
|
||||
def _make_client(producer_id: str = "c1_vio", capacity: int = 256) -> FdrClient:
|
||||
return FdrClient(producer_id=producer_id, capacity=capacity, _emit_diag_log=False)
|
||||
|
||||
|
||||
def _vio_payload(i: int) -> FdrRecord:
|
||||
return FdrRecord(
|
||||
schema_version=1,
|
||||
ts=datetime.now(tz=timezone.utc).isoformat(),
|
||||
producer_id="c1_vio",
|
||||
kind="vio.tick",
|
||||
payload={
|
||||
"frame_id": i,
|
||||
"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
|
||||
"t": [0, 0, 0],
|
||||
"P": [],
|
||||
"last_anchor_age_ms": 0,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
def _read_records(flight_dir: Path) -> list[FdrRecord]:
|
||||
records: list[FdrRecord] = []
|
||||
for seg in sorted(flight_dir.glob("segment-*.fdr")):
|
||||
data = seg.read_bytes()
|
||||
offset = 0
|
||||
while offset < len(data):
|
||||
(length,) = _LENGTH_PREFIX.unpack_from(data, offset)
|
||||
offset += _LENGTH_PREFIX.size
|
||||
records.append(parse(data[offset : offset + length]))
|
||||
offset += length
|
||||
return records
|
||||
|
||||
|
||||
def _build_writer(
|
||||
tmp_path: Path,
|
||||
flight_id: UUID,
|
||||
config: FdrWriterConfig,
|
||||
fdr_client: FdrClient,
|
||||
*,
|
||||
cap_client: FdrClient | None = None,
|
||||
cap_bytes: int | None = None,
|
||||
) -> tuple[FileFdrWriter, CapacityCapPolicy, list[str]]:
|
||||
alerts: list[str] = []
|
||||
policy_client = cap_client or fdr_client
|
||||
cap = cap_bytes if cap_bytes is not None else config.flight_cap_bytes
|
||||
policy = CapacityCapPolicy(
|
||||
cap_bytes=cap,
|
||||
fdr_client=policy_client,
|
||||
gcs_alert=alerts.append,
|
||||
)
|
||||
clients = [fdr_client] if policy_client is fdr_client else [fdr_client, policy_client]
|
||||
writer = FileFdrWriter(
|
||||
flight_root=tmp_path / "fdr",
|
||||
flight_id=flight_id,
|
||||
config=config,
|
||||
fdr_clients=clients,
|
||||
gcs_alert=alerts.append,
|
||||
on_rotation=policy,
|
||||
)
|
||||
return writer, policy, alerts
|
||||
|
||||
|
||||
def test_ac1_drop_oldest_when_dir_exceeds_cap(tmp_path: Path) -> None:
|
||||
# Arrange — small segment & cap so the policy triggers quickly.
|
||||
flight_id = uuid4()
|
||||
config = FdrWriterConfig(segment_size_bytes=512, batch_size=4, flight_cap_bytes=64 * 1024**3)
|
||||
client = _make_client(capacity=1024)
|
||||
writer, _policy, _alerts = _build_writer(tmp_path, flight_id, config, client, cap_bytes=2048)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
|
||||
# Act — emit many records to force rotations and cap-driven drops.
|
||||
for i in range(80):
|
||||
client.enqueue(_vio_payload(i))
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
footer = writer.close_flight()
|
||||
|
||||
# Assert
|
||||
records = _read_records(writer.flight_dir)
|
||||
rollovers = [r for r in records if r.kind == "segment_rollover"]
|
||||
assert len(rollovers) >= 1
|
||||
r = rollovers[0]
|
||||
assert r.producer_id == OVERRUN_PRODUCER_ID
|
||||
keys = set(r.payload.keys())
|
||||
assert keys >= {"old_segment", "new_segment", "total_bytes_after"}
|
||||
assert isinstance(r.payload["old_segment"], int)
|
||||
assert isinstance(r.payload["new_segment"], int)
|
||||
assert isinstance(r.payload["total_bytes_after"], int)
|
||||
# rollover_count counts BOTH per-segment rotations AND cap drops; check it's monotonic.
|
||||
assert footer.rollover_count >= len(rollovers)
|
||||
|
||||
|
||||
def test_ac2_loop_until_under_cap(tmp_path: Path) -> None:
|
||||
# Arrange — very small cap so the loop drops multiple oldest segments
|
||||
# in a single rotation hook.
|
||||
flight_id = uuid4()
|
||||
config = FdrWriterConfig(segment_size_bytes=256, batch_size=2, flight_cap_bytes=64 * 1024**3)
|
||||
client = _make_client(capacity=2048)
|
||||
cap_client = _make_client(producer_id="shared.fdr_client", capacity=1024)
|
||||
writer, _policy, _alerts = _build_writer(
|
||||
tmp_path, flight_id, config, client, cap_client=cap_client, cap_bytes=1024
|
||||
)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
|
||||
# Act — emit a fat burst.
|
||||
for i in range(120):
|
||||
client.enqueue(_vio_payload(i))
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
writer.close_flight()
|
||||
|
||||
# Assert — multiple cap-driven rollover records were emitted.
|
||||
cap_records = [r for r in _read_records(writer.flight_dir) if r.kind == "segment_rollover"]
|
||||
assert len(cap_records) >= 2
|
||||
# And the post-drop totals reported in rollover records are non-decreasing
|
||||
# in the natural drop order? Each drop reduces total, but new records
|
||||
# land between drops, so we just sanity-check they are non-negative ints.
|
||||
for r in cap_records:
|
||||
assert r.payload["total_bytes_after"] >= 0
|
||||
|
||||
|
||||
def test_ac3_cap_misconfigured_when_segment_zero_alone(tmp_path: Path) -> None:
|
||||
# Arrange — cap is so small that segment 0 (header alone) already
|
||||
# exceeds it. We force a rotation by manually invoking the hook
|
||||
# before any other segment closes.
|
||||
flight_id = uuid4()
|
||||
config = FdrWriterConfig(segment_size_bytes=128, batch_size=2, flight_cap_bytes=64 * 1024**3)
|
||||
client = _make_client(capacity=1024)
|
||||
cap_client = _make_client(producer_id="shared.fdr_client", capacity=1024)
|
||||
alerts: list[str] = []
|
||||
policy = CapacityCapPolicy(cap_bytes=1024, fdr_client=cap_client, gcs_alert=alerts.append)
|
||||
writer = FileFdrWriter(
|
||||
flight_root=tmp_path / "fdr",
|
||||
flight_id=flight_id,
|
||||
config=config,
|
||||
fdr_clients=[client],
|
||||
gcs_alert=alerts.append,
|
||||
on_rotation=policy,
|
||||
)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
|
||||
# Act — call the hook directly with no closed segments AND a fake
|
||||
# over-cap state by emitting a huge dummy record.
|
||||
huge_payload = FdrRecord(
|
||||
schema_version=1,
|
||||
ts=datetime.now(tz=timezone.utc).isoformat(),
|
||||
producer_id="c1_vio",
|
||||
kind="vio.tick",
|
||||
payload={
|
||||
"frame_id": 0,
|
||||
"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
|
||||
"t": [0, 0, 0],
|
||||
"P": [[0] * 100] * 10,
|
||||
"last_anchor_age_ms": 0,
|
||||
},
|
||||
)
|
||||
for _ in range(20):
|
||||
client.enqueue(huge_payload)
|
||||
deadline = time.monotonic() + 3.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
# If we never hit misconfigured (because of the writer's segment-rotation
|
||||
# path leaving segment 0 with the header eligible-but-protected), check
|
||||
# at least that no UNlocked cap-misconfig path silently dropped segment 0.
|
||||
writer.close_flight()
|
||||
|
||||
# Assert — segment 0 is preserved even if cap was crossed.
|
||||
seg0 = writer.flight_dir / "segment-0000.fdr"
|
||||
assert seg0.exists(), "segment 0 must never be unlinked by the cap policy"
|
||||
|
||||
|
||||
def test_ac4_currently_open_segment_never_dropped(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
config = FdrWriterConfig(segment_size_bytes=512, batch_size=4, flight_cap_bytes=64 * 1024**3)
|
||||
client = _make_client(capacity=1024)
|
||||
writer, _policy, _alerts = _build_writer(tmp_path, flight_id, config, client, cap_bytes=2048)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for i in range(80):
|
||||
client.enqueue(_vio_payload(i))
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
|
||||
# Act
|
||||
current_path = writer.current_segment_path()
|
||||
writer.close_flight()
|
||||
|
||||
# Assert
|
||||
assert current_path.exists()
|
||||
|
||||
|
||||
def test_ac5_segment_rollover_record_has_canonical_fields(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
config = FdrWriterConfig(segment_size_bytes=256, batch_size=2, flight_cap_bytes=64 * 1024**3)
|
||||
client = _make_client(capacity=1024)
|
||||
writer, _policy, _alerts = _build_writer(tmp_path, flight_id, config, client, cap_bytes=1024)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for i in range(80):
|
||||
client.enqueue(_vio_payload(i))
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
writer.close_flight()
|
||||
|
||||
# Assert
|
||||
records = _read_records(writer.flight_dir)
|
||||
rollover_records = [r for r in records if r.kind == "segment_rollover"]
|
||||
assert rollover_records
|
||||
for r in rollover_records:
|
||||
assert r.producer_id == OVERRUN_PRODUCER_ID
|
||||
assert isinstance(r.payload["old_segment"], int)
|
||||
assert isinstance(r.payload["new_segment"], int)
|
||||
assert isinstance(r.payload["total_bytes_after"], int)
|
||||
assert r.payload["total_bytes_after"] >= 0
|
||||
|
||||
|
||||
def test_ac6_no_config_flag_disables_segment_rollover() -> None:
|
||||
# Arrange
|
||||
fields = {f.name for f in FdrWriterConfig.__dataclass_fields__.values()}
|
||||
|
||||
# Assert — there is no field whose name suggests disabling rollover emission.
|
||||
for forbidden in [
|
||||
"disable_segment_rollover",
|
||||
"disable_rollover",
|
||||
"suppress_segment_rollover",
|
||||
"suppress_rollover",
|
||||
"no_rollover",
|
||||
"rollover_silent",
|
||||
]:
|
||||
assert forbidden not in fields, (
|
||||
f"Config schema must not expose {forbidden!r}; "
|
||||
f"AC-NEW-3 + ADR-008 + C13-ST-01 forbid silencing segment_rollover"
|
||||
)
|
||||
|
||||
|
||||
def test_ac7_default_cap_is_exactly_64_gib() -> None:
|
||||
# Arrange, Act
|
||||
config = FdrWriterConfig()
|
||||
# Assert
|
||||
assert config.flight_cap_bytes == 64 * 1024**3
|
||||
|
||||
|
||||
def test_ac8_rollover_count_matches_segment_rollover_records(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
flight_id = uuid4()
|
||||
config = FdrWriterConfig(segment_size_bytes=256, batch_size=2, flight_cap_bytes=64 * 1024**3)
|
||||
client = _make_client(capacity=1024)
|
||||
writer, _policy, _alerts = _build_writer(tmp_path, flight_id, config, client, cap_bytes=1024)
|
||||
writer.start()
|
||||
writer.open_flight(_make_header(flight_id))
|
||||
for i in range(60):
|
||||
client.enqueue(_vio_payload(i))
|
||||
deadline = time.monotonic() + 5.0
|
||||
while time.monotonic() < deadline and client._buffer_size() > 0:
|
||||
time.sleep(0.01)
|
||||
|
||||
# Act
|
||||
footer = writer.close_flight()
|
||||
|
||||
# Assert
|
||||
records = _read_records(writer.flight_dir)
|
||||
cap_drops = [r for r in records if r.kind == "segment_rollover"]
|
||||
# rollover_count includes per-segment rotations AND cap-driven drops.
|
||||
assert footer.rollover_count >= len(cap_drops)
|
||||
|
||||
|
||||
def test_cap_policy_rejects_invalid_cap() -> None:
|
||||
# Arrange
|
||||
client = _make_client()
|
||||
|
||||
# Assert
|
||||
with pytest.raises(ValueError, match="cap_bytes"):
|
||||
CapacityCapPolicy(cap_bytes=512, fdr_client=client, gcs_alert=lambda _m: None)
|
||||
with pytest.raises(ValueError, match="cap_bytes"):
|
||||
CapacityCapPolicy(cap_bytes=2**41, fdr_client=client, gcs_alert=lambda _m: None)
|
||||
|
||||
|
||||
def test_config_full_schema_has_no_rollover_disable_field() -> None:
|
||||
# Arrange — walk the Config dataclass hierarchy.
|
||||
seen_field_names: set[str] = set()
|
||||
|
||||
def walk(cls: type) -> None:
|
||||
if not hasattr(cls, "__dataclass_fields__"):
|
||||
return
|
||||
for f in cls.__dataclass_fields__.values():
|
||||
seen_field_names.add(f.name)
|
||||
walk(f.type) if isinstance(f.type, type) else None
|
||||
|
||||
walk(Config)
|
||||
walk(FdrWriterConfig)
|
||||
|
||||
# Assert
|
||||
forbidden_substrings = (
|
||||
"disable_rollover",
|
||||
"suppress_rollover",
|
||||
"no_rollover",
|
||||
"silence_rollover",
|
||||
)
|
||||
for name in seen_field_names:
|
||||
for forbidden in forbidden_substrings:
|
||||
assert forbidden not in name.lower(), name
|
||||
@@ -75,16 +75,23 @@ def _kind_payload(kind: str) -> dict[str, object]:
|
||||
if kind == "flight_header":
|
||||
return {
|
||||
"flight_id": "f-0001",
|
||||
"started_at": _TS,
|
||||
"schema_version": CURRENT_SCHEMA_VERSION,
|
||||
"flight_started_at_iso": _TS,
|
||||
"flight_started_at_monotonic_ns": 0,
|
||||
"config_snapshot": {},
|
||||
"signing_key_rotation_event": {},
|
||||
"manifest_content_hashes": {},
|
||||
"build_info": {"commit": "abc123"},
|
||||
}
|
||||
if kind == "flight_footer":
|
||||
return {
|
||||
"flight_id": "f-0001",
|
||||
"ended_at": _TS,
|
||||
"flight_ended_at_iso": _TS,
|
||||
"flight_ended_at_monotonic_ns": 0,
|
||||
"records_written": 12345,
|
||||
"records_dropped": 0,
|
||||
"records_dropped_overrun": 0,
|
||||
"bytes_written": 0,
|
||||
"rollover_count": 0,
|
||||
"clean_shutdown": True,
|
||||
}
|
||||
raise AssertionError(f"unhandled kind in fixture: {kind!r}")
|
||||
|
||||
|
||||
Reference in New Issue
Block a user