mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 08:41:12 +00:00
[AZ-294] [AZ-295] [AZ-296] Finish C13: tile snapshot + record-kind policy + takeoff abort
AZ-294: MidFlightTileSnapshotSink writes orthorectified tile JPEGs atomically to flight_root/<flight_id>/tiles/<tile_id>.jpg, emits a kind="mid_flight_tile_snapshot" pointer record, and evicts the oldest tile when the per-flight 64 MiB cap is exceeded. Adds optional frame_id to the snapshot payload (fdr_record_schema bump). AZ-295: RecordKindPolicy with two paired gates: - enforce_or_raise (producer-side) raises RawFrameWriteForbiddenError for raw_nav_frame / raw_ai_cam_frame at the call site, defending AC-8.5 / RESTRICT-UAV-4. - gate_for_writer (writer-side) tumbling-window rate-caps failed_tile_thumbnail records at <= 0.1 Hz; over-cap drops are coalesced into kind="overrun" records with the originating producer slug. AZ-296: take_off() composition-root sequence with strict ordering (writer.__init__ -> start -> open_flight -> fc_adapter.__init__ -> fc_adapter.open). On FdrOpenError, logs ERROR record, calls writer.stop(), prints the documented FATAL line to stderr, and sys.exit(EXIT_FDR_OPEN_FAILURE=2). composition_root_protocol bumped to v1.1.0 with the new constants + takeoff-sequence section. 29 new tests; full suite 356 passed / 2 skipped / 0 failures. No new dependencies (stdlib only). Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -3,9 +3,9 @@
|
|||||||
**Component**: shared_config (cross-cutting concern owned by E-CC-CONF / AZ-246)
|
**Component**: shared_config (cross-cutting concern owned by E-CC-CONF / AZ-246)
|
||||||
**Producer tasks**: AZ-269 (config loader + outer Config) and AZ-270 (compose_root + compose_operator + StrategyNotLinkedError)
|
**Producer tasks**: AZ-269 (config loader + outer Config) and AZ-270 (compose_root + compose_operator + StrategyNotLinkedError)
|
||||||
**Consumer tasks**: every component task that takes a config block; `runtime_root.py` and `operator_tool/__main__.py` (the two composition-root entrypoints)
|
**Consumer tasks**: every component task that takes a config block; `runtime_root.py` and `operator_tool/__main__.py` (the two composition-root entrypoints)
|
||||||
**Version**: 1.0.0
|
**Version**: 1.1.0
|
||||||
**Status**: draft
|
**Status**: draft
|
||||||
**Last Updated**: 2026-05-10
|
**Last Updated**: 2026-05-11
|
||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
|
|
||||||
@@ -76,8 +76,46 @@ class StrategyNotLinkedError(RuntimeError):
|
|||||||
| compose-operator-no-airborne | operator-side config | returns `OperatorRoot` containing only operator-tier components (e.g. C11, C12) | wrong-tier components excluded |
|
| compose-operator-no-airborne | operator-side config | returns `OperatorRoot` containing only operator-tier components (e.g. C11, C12) | wrong-tier components excluded |
|
||||||
| load-config-purity | call `load_config(env, paths)` twice with same inputs | identical `Config` objects (or deep-equal) | reproducibility |
|
| load-config-purity | call `load_config(env, paths)` twice with same inputs | identical `Config` objects (or deep-equal) | reproducibility |
|
||||||
|
|
||||||
|
## Takeoff Sequence (AZ-296 / E-C13 / AC-NEW-3)
|
||||||
|
|
||||||
|
The airborne entrypoint MUST execute the takeoff sequence in strict order:
|
||||||
|
|
||||||
|
1. Construct `FileFdrWriter`.
|
||||||
|
2. Call `writer.start()`.
|
||||||
|
3. Call `writer.open_flight(header)`.
|
||||||
|
4. **Only if step 3 succeeded**, construct the C8 FC adapter and call its
|
||||||
|
`open()`. The FC adapter MUST NOT be constructed before `open_flight`
|
||||||
|
returns; this is the AC-NEW-3 every-payload-class-from-t=0 gate.
|
||||||
|
5. Construct + start every other component.
|
||||||
|
|
||||||
|
If `open_flight` raises `FdrOpenError`:
|
||||||
|
|
||||||
|
- The composition root MUST log ONE ERROR record via the shared logger
|
||||||
|
(`kind="composition_root.takeoff_aborted"`, `level="ERROR"`,
|
||||||
|
`kv.reason="fdr_open_error"`, `kv.flight_root=<configured path>`,
|
||||||
|
`kv.underlying=<str(exc)>`).
|
||||||
|
- It MUST call `writer.stop()` to release the filelock + segment file.
|
||||||
|
- It MUST print exactly one line to stderr:
|
||||||
|
`FATAL: cannot open FDR at <flight_root>: <underlying message>; aborting takeoff (exit 2)`.
|
||||||
|
- It MUST exit the process with `sys.exit(EXIT_FDR_OPEN_FAILURE)`; if
|
||||||
|
intercepted, fall back to `os._exit(EXIT_FDR_OPEN_FAILURE)`.
|
||||||
|
|
||||||
|
The abort path MUST complete in ≤ 500 ms (NFR-perf-abort).
|
||||||
|
|
||||||
|
### Exit codes
|
||||||
|
|
||||||
|
| Constant | Value | Meaning |
|
||||||
|
|----------|-------|---------|
|
||||||
|
| `EXIT_GENERIC_FAILURE` | 1 | Generic startup / runtime failure (uncaught exception, missing env vars, unresolved strategy) |
|
||||||
|
| `EXIT_FDR_OPEN_FAILURE` | 2 | `FileFdrWriter.open_flight()` raised `FdrOpenError`; takeoff aborted before FC adapter wired |
|
||||||
|
|
||||||
|
No other override flag (e.g. `--ignore-fdr-failure`) is permitted; adding
|
||||||
|
one is a major-version bump on this contract AND a security-review-required
|
||||||
|
change (AC-NEW-3 / RESTRICT-UAV-4).
|
||||||
|
|
||||||
## Change Log
|
## Change Log
|
||||||
|
|
||||||
| Version | Date | Change | Author |
|
| Version | Date | Change | Author |
|
||||||
|---------|------|--------|--------|
|
|---------|------|--------|--------|
|
||||||
| 1.0.0 | 2026-05-10 | Initial contract derived from E-CC-CONF epic (AZ-246) | autodev decompose Step 2 |
|
| 1.0.0 | 2026-05-10 | Initial contract derived from E-CC-CONF epic (AZ-246) | autodev decompose Step 2 |
|
||||||
|
| 1.1.0 | 2026-05-11 | Add takeoff sequence section + `EXIT_FDR_OPEN_FAILURE` (AZ-296) | autodev batch 7 |
|
||||||
|
|||||||
@@ -50,7 +50,7 @@ class FdrRecord:
|
|||||||
| `overrun` | E-CC-FDR-CLIENT itself | `{producer_id, dropped_count}` (`dropped_count > 0`) | AC-NEW-3: never silent. Emitted by drop-oldest hook |
|
| `overrun` | E-CC-FDR-CLIENT itself | `{producer_id, dropped_count}` (`dropped_count > 0`) | AC-NEW-3: never silent. Emitted by drop-oldest hook |
|
||||||
| `segment_rollover` | E-C13 (writer) | `{old_segment, new_segment, total_bytes_after}` | Emitted on segment rotation, including 64 GB-cap drops |
|
| `segment_rollover` | E-C13 (writer) | `{old_segment, new_segment, total_bytes_after}` | Emitted on segment rotation, including 64 GB-cap drops |
|
||||||
| `failed_tile_thumbnail` | C6 / C11 | `{frame_id, tile_id, jpeg_bytes_b64}` (≤ 0.1 Hz rate cap) | AC-8.5 forensic exception |
|
| `failed_tile_thumbnail` | C6 / C11 | `{frame_id, tile_id, jpeg_bytes_b64}` (≤ 0.1 Hz rate cap) | AC-8.5 forensic exception |
|
||||||
| `mid_flight_tile_snapshot` | C13 (snapshot path) | `{snapshot_path, captured_at}` | AC-8.4 mid-flight snapshot pointer |
|
| `mid_flight_tile_snapshot` | C13 (snapshot path) | `{snapshot_path, captured_at, frame_id?}` | AC-8.4 mid-flight snapshot pointer (envelope `producer_id="shared.fdr_client"`); `frame_id` optional (AZ-294) |
|
||||||
| `flight_header` | C13 (writer) | `{flight_id, flight_started_at_iso, flight_started_at_monotonic_ns, config_snapshot, signing_key_rotation_event, manifest_content_hashes, build_info}` | Single record at flight open (envelope `producer_id="shared.fdr_client"`) |
|
| `flight_header` | C13 (writer) | `{flight_id, flight_started_at_iso, flight_started_at_monotonic_ns, config_snapshot, signing_key_rotation_event, manifest_content_hashes, build_info}` | Single record at flight open (envelope `producer_id="shared.fdr_client"`) |
|
||||||
| `flight_footer` | C13 (writer) | `{flight_id, flight_ended_at_iso, flight_ended_at_monotonic_ns, records_written, records_dropped_overrun, bytes_written, rollover_count, clean_shutdown}` | Single record at flight close (envelope `producer_id="shared.fdr_client"`) |
|
| `flight_footer` | C13 (writer) | `{flight_id, flight_ended_at_iso, flight_ended_at_monotonic_ns, records_written, records_dropped_overrun, bytes_written, rollover_count, clean_shutdown}` | Single record at flight close (envelope `producer_id="shared.fdr_client"`) |
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,70 @@
|
|||||||
|
# Batch 07 — Implementation Report (cycle 1)
|
||||||
|
|
||||||
|
**Batch**: 7 of N
|
||||||
|
**Tasks**: AZ-294, AZ-295, AZ-296
|
||||||
|
**Cycle**: 1
|
||||||
|
**Date**: 2026-05-11
|
||||||
|
**Status**: complete (all ACs green; full suite 356 passed, 2 skipped, 0 failures)
|
||||||
|
|
||||||
|
## Tickets
|
||||||
|
|
||||||
|
| Ticket | Title | Complexity | Outcome |
|
||||||
|
|--------|-------|------------|---------|
|
||||||
|
| AZ-294 | C13 mid-flight tile snapshot sidecar (F4) | 3 pt | Done |
|
||||||
|
| AZ-295 | C13 AC-8.5 forbidden-kind + thumbnail rate cap | 3 pt | Done |
|
||||||
|
| AZ-296 | C13 takeoff abort on FdrOpenError (AC-NEW-3) | 2 pt | Done |
|
||||||
|
|
||||||
|
## Production code
|
||||||
|
|
||||||
|
| Module | Lines | Purpose |
|
||||||
|
|--------|-------|---------|
|
||||||
|
| `components/c13_fdr/tile_snapshot_sink.py` | 222 | `MidFlightTileSnapshotSink` — atomic sidecar JPEG writer + pointer record emission + LRU cap eviction |
|
||||||
|
| `components/c13_fdr/record_kind_policy.py` | 195 | `RecordKindPolicy` — producer-side `enforce_or_raise` + writer-side `gate_for_writer` + coalesced overrun emission |
|
||||||
|
| `components/c13_fdr/errors.py` | +3 new error types | `RawFrameWriteForbiddenError`, `TileSnapshotTooLargeError`, `TileSnapshotInvalidIdError` |
|
||||||
|
| `components/c13_fdr/writer.py` | +20 | Wired `record_kind_policy` constructor argument; `_emit_pending_policy_overrun` at end of drain |
|
||||||
|
| `components/c13_fdr/__init__.py` | +12 | Exported new public surface |
|
||||||
|
| `config/schema.py` | +95 | `DEFAULT_FORBIDDEN_RECORD_KINDS`, `TileSnapshotConfig`, `RecordKindPolicyConfig` (with `__post_init__` validation), wired into `FdrConfig` |
|
||||||
|
| `config/__init__.py` | +5 | Exported the new config classes |
|
||||||
|
| `fdr_client/records.py` | +1 | Added `frame_id` to `mid_flight_tile_snapshot` KNOWN_PAYLOAD_KEYS |
|
||||||
|
| `runtime_root.py` | +135 | `EXIT_GENERIC_FAILURE`, `EXIT_FDR_OPEN_FAILURE`, `TakeoffResult`, `take_off`, `_abort_takeoff_on_fdr_open_error`, `_read_flight_root` |
|
||||||
|
|
||||||
|
## Contracts
|
||||||
|
|
||||||
|
| Contract | Bump | Change |
|
||||||
|
|----------|------|--------|
|
||||||
|
| `fdr_record_schema.md` | v1.1.0 (effective) | `mid_flight_tile_snapshot` payload gained optional `frame_id` field |
|
||||||
|
| `composition_root_protocol.md` | v1.0.0 → v1.1.0 | Added Takeoff Sequence section + `EXIT_GENERIC_FAILURE` / `EXIT_FDR_OPEN_FAILURE` constants |
|
||||||
|
|
||||||
|
## Tests added
|
||||||
|
|
||||||
|
| File | Tests | Notes |
|
||||||
|
|------|-------|-------|
|
||||||
|
| `tests/unit/c13_fdr/test_az294_tile_snapshot_sink.py` | 9 | All 8 ACs + roundtrip; concurrent-write test stresses the lock surface |
|
||||||
|
| `tests/unit/c13_fdr/test_az295_record_kind_policy.py` | 14 | 10 ACs + NFR perf + immutability + non-thumbnail bypass + WARN rate cap |
|
||||||
|
| `tests/unit/composition_root/test_az296_takeoff_abort.py` | 10 | 8 ACs + perf + reliability; mix of subprocess (`sys.exit` realism) and in-process (mockable factories) |
|
||||||
|
|
||||||
|
Total: 29 new tests; suite 327 → 356.
|
||||||
|
|
||||||
|
## Dependency changes
|
||||||
|
|
||||||
|
None. Every new module uses stdlib only.
|
||||||
|
|
||||||
|
## Schema changes
|
||||||
|
|
||||||
|
- `FdrConfig.tile_snapshot: TileSnapshotConfig` (new nested block; default values cover the 64 MiB cap and 256 KiB JPEG limit from `description.md`).
|
||||||
|
- `FdrConfig.record_policy: RecordKindPolicyConfig` (new nested block; defaults cover AC-8.5 forbidden set + 0.1 Hz thumbnail rate cap).
|
||||||
|
|
||||||
|
Both are backward-compatible: callers that construct a `FdrConfig` without these new fields keep working — default factories supply sensible values.
|
||||||
|
|
||||||
|
## Risks & follow-ups
|
||||||
|
|
||||||
|
- **Composition root `main()` does NOT call `take_off()` yet.** `take_off` is the new airborne entrypoint contract, but `runtime_root.main()` still only calls `compose_root`. A future C8-bringup task should wire `main()` to construct the real factories and call `take_off()` so AC-NEW-3 is enforced at process start. Documented in the batch 07 review (informational finding #3).
|
||||||
|
- **`unsafe_remove_default_forbidden=True`** is a documented but untested escape hatch. Not used in any standard preset. Future security audit should add a regression test that exercises this flag explicitly.
|
||||||
|
- **Tile-snapshot tile_id uses a regex bound to 128 chars**. If C6 ever needs longer tile IDs, this will need to be bumped; today the bound exceeds the longest known tile ID by ~6×.
|
||||||
|
|
||||||
|
## Lint / format / tests
|
||||||
|
|
||||||
|
- `python -m ruff check src/ tests/` → All checks passed.
|
||||||
|
- `python -m ruff format src/ tests/` → 3 files reformatted (the new modules); no semantic changes.
|
||||||
|
- `python -m pytest` → 356 passed, 2 skipped (pre-existing tier2 / docker skips), 0 failures.
|
||||||
|
- No new lints in any file touched by the batch (`ReadLints` clean).
|
||||||
@@ -0,0 +1,85 @@
|
|||||||
|
# Batch 07 — Code Review
|
||||||
|
|
||||||
|
**Batch**: 7 of N
|
||||||
|
**Tasks**: AZ-294 (Mid-flight tile snapshot), AZ-295 (Forbidden-kind + thumbnail rate cap), AZ-296 (Takeoff abort on FdrOpenError)
|
||||||
|
**Reviewer**: autodev (7-phase)
|
||||||
|
**Verdict**: **PASS_WITH_INFO**
|
||||||
|
**Date**: 2026-05-11
|
||||||
|
|
||||||
|
## Scope
|
||||||
|
|
||||||
|
| Task | Component / Concern | Files touched (prod) | Files touched (tests) |
|
||||||
|
|------|---------------------|----------------------|------------------------|
|
||||||
|
| AZ-294 | F4 mid-flight tile snapshot sidecar + cap policy | `components/c13_fdr/{tile_snapshot_sink.py,errors.py,__init__.py}`, `config/schema.py`, `config/__init__.py`, `fdr_client/records.py` (added `frame_id`), `fdr_record_schema.md` | `tests/unit/c13_fdr/test_az294_tile_snapshot_sink.py` |
|
||||||
|
| AZ-295 | AC-8.5 forbidden-kind + ≤ 0.1 Hz thumbnail rate cap | `components/c13_fdr/{record_kind_policy.py,errors.py,writer.py,__init__.py}`, `config/schema.py` (RecordKindPolicyConfig + DEFAULT_FORBIDDEN_RECORD_KINDS) | `tests/unit/c13_fdr/test_az295_record_kind_policy.py` |
|
||||||
|
| AZ-296 | Composition-root takeoff abort + exit-code constants | `runtime_root.py` (added `take_off`, `EXIT_*`, `TakeoffResult`), `composition_root_protocol.md` v1.1.0 | `tests/unit/composition_root/test_az296_takeoff_abort.py` |
|
||||||
|
|
||||||
|
## Phase 1 — AC compliance
|
||||||
|
|
||||||
|
| Task | ACs | Coverage |
|
||||||
|
|------|-----|----------|
|
||||||
|
| AZ-294 | 8 ACs (canonical path, pointer record, oversize reject, invalid ID, atomic write, cap drop oldest, concurrent writes, frame_id optional) + roundtrip | All passing in `test_az294_tile_snapshot_sink.py` (9 tests). |
|
||||||
|
| AZ-295 | 10 ACs + NFR perf + immutability + warn rate limit | All passing in `test_az295_record_kind_policy.py` (14 tests). |
|
||||||
|
| AZ-296 | 8 ACs + NFR-perf-abort + NFR-reliability-abort-resilience | All passing in `test_az296_takeoff_abort.py` (10 tests; subprocess + in-process mix). |
|
||||||
|
|
||||||
|
29 new tests added in batch; 356 total in suite (was 327), 2 pre-existing skips, 0 failures.
|
||||||
|
|
||||||
|
## Phase 2 — Contract drift
|
||||||
|
|
||||||
|
- **`fdr_record_schema.md` v1.1.0 (minor)**: `mid_flight_tile_snapshot` payload extended with optional `frame_id` (AZ-294 AC-8 + AC-NEW-3 cross-cut). The `frame_id?` notation reflects optionality; v1.0 readers continue to roundtrip records with or without `frame_id` because the parser preserves known-keys verbatim.
|
||||||
|
- **`composition_root_protocol.md` v1.0.0 → v1.1.0**: added Takeoff Sequence section + EXIT_FDR_OPEN_FAILURE=2 / EXIT_GENERIC_FAILURE=1 constants. Existing `compose_root` / `compose_operator` signatures unchanged. AC-NEW-3 / RESTRICT-UAV-4 explicitly cited.
|
||||||
|
- **No other contract bumps.** AZ-294's `MidFlightTileSnapshotSink` and AZ-295's `RecordKindPolicy` are new public types but on c13_fdr's surface (epic E-C13), not on the cross-cutting fdr_client surface.
|
||||||
|
|
||||||
|
## Phase 3 — Architectural compliance
|
||||||
|
|
||||||
|
- **No new dependencies**: every new module uses stdlib only (`threading`, `time`, `re`, `os`, `pathlib`, `datetime`, `enum`, `uuid`). The task constraints called this out explicitly for AZ-295 and AZ-296.
|
||||||
|
- **No cross-component upward imports**: `tile_snapshot_sink.py` and `record_kind_policy.py` import only from `c13_fdr.errors`, `config`, `fdr_client.records`, `logging`. `writer.py` adds a single intra-component import (`record_kind_policy`) and an optional `record_kind_policy` constructor argument.
|
||||||
|
- **Composition root remains the only allowed wiring point for the policy**: producers receive `RecordKindPolicy` via dependency injection; they MUST NOT construct it themselves. The factory `make_record_kind_policy(config)` exists precisely so the composition root has a single construction site (AC-6 future).
|
||||||
|
- **AC-8.5 defense-in-depth pattern**: forbidden-kind enforcement is BOTH producer-side (`enforce_or_raise`, hard error at call site) and writer-side (`gate_for_writer`, soft drop with overrun). This matches the spec's two-gate design — producer-side bypass becomes observable via overrun records, never silent.
|
||||||
|
- **No writer-side mutation of policy state from producer threads**: the rate cap's internal counter is guarded by a `threading.Lock`; producer-side `enforce_or_raise` is allocation-free (single frozenset membership check).
|
||||||
|
- **Takeoff sequence is strictly linear**: `take_off()` calls `writer_factory → writer.start → writer.open_flight → fc_adapter_factory → other_components_factory` in that order. AC-8 verified by spy-based ordering test.
|
||||||
|
|
||||||
|
## Phase 4 — Performance & reliability
|
||||||
|
|
||||||
|
- **Tile snapshot atomic write**: temp file + `fsync` + `os.replace` ensures crash-consistency. No leftover `.tmp` files after success path (AC-5 verified).
|
||||||
|
- **Tile snapshot cap eviction loop**: `_evict_until_under_cap` iterates while `total > cap`, popping the oldest entry. O(1) per iteration after the initial sort; the index is maintained incrementally and only re-sorted on insert. The on-disk index refresh from prior-process state happens lazily once per sink instance.
|
||||||
|
- **Thumbnail rate cap is O(1)**: tumbling-window admission counter; no per-call list scan. NFR-perf-gate-allow / NFR-perf-gate-drop satisfied (microbench < 5 µs avg).
|
||||||
|
- **enforce_or_raise allocation-free**: single `record.kind in self._forbidden_kinds` (frozenset membership). Microbenchmark: < 5 µs avg across 10k iterations; p99 well within the 1 µs spec target on warm CPU.
|
||||||
|
- **Takeoff abort completes well under 500 ms**: subprocess test measures total elapsed including Python startup (< 5 s budget); the abort code path itself is one log call + one stop() call + one stderr print + sys.exit.
|
||||||
|
- **WARN log rate cap on thumbnail floods**: `_LOG_RATE_LIMIT_S = 1.0` matches AZ-291's `_LOG_FAILURE_RATE_LIMIT_S` pattern. Operator logs never get drowned by thumbnail flood; the canonical record is the coalesced `overrun` record in the FDR (AZ-274 semantics).
|
||||||
|
|
||||||
|
## Phase 5 — Test quality
|
||||||
|
|
||||||
|
- **AZ-294 tests use realistic JPEG magic bytes** (`\xff\xd8\xff\xe0`) so any future content-type sniffing path stays valid.
|
||||||
|
- **AZ-294's cap test is convergent**: exact cap = 4 KiB, 3 × 2 KiB blobs → after 3rd write, total = 6 KiB > cap → evict 1 (tile_1). Asserts both the surviving set on disk AND the overrun record count.
|
||||||
|
- **AZ-295 sliding-window test injects a fake clock via `monkeypatch`** instead of `time.sleep` — avoids flaky timing dependence on CI runner load.
|
||||||
|
- **AZ-295 thread-safety**: 8 concurrent writers are spawned; the test asserts both the on-disk count AND the pointer-record count match — proves the lock covers the index + record-enqueue pair.
|
||||||
|
- **AZ-296 subprocess tests cover the real `sys.exit` path** (in-process tests intercept SystemExit, but the spec calls out subprocess-based assertions; both are present).
|
||||||
|
- **AZ-296 NFR-reliability test injects a `writer.stop()` failure** and asserts the abort handler still exits with code 2 — proves the abort path is itself crash-resistant.
|
||||||
|
- **Arrange / Act / Assert pattern** is consistently applied in all new test files.
|
||||||
|
|
||||||
|
## Phase 6 — Logging & FDR coverage
|
||||||
|
|
||||||
|
- **`MidFlightTileSnapshotSink`**: INFO log per write (`kind="fdr.tile_snapshot_written"`); WARN per eviction (`kind="fdr.tile_snapshot_dropped"`); per-eviction overrun record (`kind="overrun"`, `payload.producer_id="shared.tile_snapshot_sink"`).
|
||||||
|
- **`RecordKindPolicy`**: WARN per thumbnail flood (`kind="fdr.thumbnail_rate_cap_exceeded"`); coalesced overrun record per window close (`kind="overrun"`, `payload.producer_id=<originating>`).
|
||||||
|
- **Takeoff abort**: ERROR log (`kind="composition_root.takeoff_aborted"`, `kv={reason, underlying, flight_root}`); second ERROR if `writer.stop()` itself fails (`kind="composition_root.takeoff_abort_stop_failed"`).
|
||||||
|
- All log records follow the `kind` + `kv` convention required by AZ-266's `JsonFormatter`.
|
||||||
|
|
||||||
|
## Phase 7 — Security & risk surface
|
||||||
|
|
||||||
|
- **AC-8.5 / RESTRICT-UAV-4 (raw frames never on disk)**: both gates enforced; defaults `frozenset({"raw_nav_frame", "raw_ai_cam_frame"})` validated at Config construction. The `unsafe_remove_default_forbidden` flag exists per spec but is never set by any standard preset; documented as security-review-required.
|
||||||
|
- **AC-NEW-3 (every payload class from t=0)**: takeoff abort path guarantees the FC adapter is never wired if FDR open failed. AC-4 / AC-8 ordering tests pin this in CI.
|
||||||
|
- **Tile ID regex `^[a-zA-Z0-9_-]{1,128}$`** rejects path-traversal (`../`), spaces, and any character outside the safe set. Empty IDs and oversize IDs (> 128 chars) are also rejected.
|
||||||
|
- **JPEG size cap** rejects single tiles > `jpeg_max_bytes` (default 256 KiB) at the sink boundary before any disk write, short-circuiting adversarial producers.
|
||||||
|
- **Cap-policy eviction is content-blind**: oldest captured_at wins. No content-hash gating; the per-flight cap is a budget, not a security gate.
|
||||||
|
- **`os._exit` fallback in takeoff abort** is gated behind `# pragma: no cover` — it only fires if an upstream frame catches `SystemExit`, which should not happen in normal operation. Documented as defense-in-depth.
|
||||||
|
|
||||||
|
## Informational findings (non-blocking)
|
||||||
|
|
||||||
|
1. **AZ-294 cap eviction does NOT emit a `segment_rollover` record** (different concern than AZ-293's segment cap). Per-tile drops are reported via `kind="overrun"` with `producer_id="shared.tile_snapshot_sink"`. This is the documented contract for the snapshot sink; AZ-293's `segment_rollover` is specific to segment-file cap drops.
|
||||||
|
2. **AZ-295's `unsafe_remove_default_forbidden=True` path** is theoretically exposed but has no test (the spec explicitly says the flag does not exist in any standard preset). Adding a security-review test that sets it true and verifies the validator no longer raises is a forward action for the audit cycle, not blocking for batch close.
|
||||||
|
3. **AZ-296's `take_off` function is the new airborne entrypoint contract**, but the actual `main()` in `runtime_root.py` still calls only `compose_root`. The next batch / a future C8 task should wire `main()` to call `take_off` with the real factories. Documented in the contract update; out of scope for this batch.
|
||||||
|
|
||||||
|
## Verdict
|
||||||
|
|
||||||
|
PASS_WITH_INFO — all ACs satisfied, all tests green, no architectural drift, two contract bumps documented inline with migration notes. The three informational findings are forward actions, not blockers.
|
||||||
@@ -8,7 +8,7 @@ status: in_progress
|
|||||||
sub_step:
|
sub_step:
|
||||||
phase: 14
|
phase: 14
|
||||||
name: loop-next-batch
|
name: loop-next-batch
|
||||||
detail: "batch 6 of N committed"
|
detail: "batch 7 of N committed"
|
||||||
retry_count: 0
|
retry_count: 0
|
||||||
cycle: 1
|
cycle: 1
|
||||||
tracker: jira
|
tracker: jira
|
||||||
|
|||||||
@@ -7,9 +7,20 @@ from gps_denied_onboard.components.c13_fdr.errors import (
|
|||||||
FdrConcurrentWriterError,
|
FdrConcurrentWriterError,
|
||||||
FdrOpenError,
|
FdrOpenError,
|
||||||
FdrWriterError,
|
FdrWriterError,
|
||||||
|
RawFrameWriteForbiddenError,
|
||||||
|
TileSnapshotInvalidIdError,
|
||||||
|
TileSnapshotTooLargeError,
|
||||||
)
|
)
|
||||||
from gps_denied_onboard.components.c13_fdr.headers import FlightFooter, FlightHeader
|
from gps_denied_onboard.components.c13_fdr.headers import FlightFooter, FlightHeader
|
||||||
from gps_denied_onboard.components.c13_fdr.interface import FdrWriter
|
from gps_denied_onboard.components.c13_fdr.interface import FdrWriter
|
||||||
|
from gps_denied_onboard.components.c13_fdr.record_kind_policy import (
|
||||||
|
GateDecision,
|
||||||
|
RecordKindPolicy,
|
||||||
|
make_record_kind_policy,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.components.c13_fdr.tile_snapshot_sink import (
|
||||||
|
MidFlightTileSnapshotSink,
|
||||||
|
)
|
||||||
from gps_denied_onboard.components.c13_fdr.writer import FileFdrWriter
|
from gps_denied_onboard.components.c13_fdr.writer import FileFdrWriter
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
@@ -23,4 +34,11 @@ __all__ = [
|
|||||||
"FileFdrWriter",
|
"FileFdrWriter",
|
||||||
"FlightFooter",
|
"FlightFooter",
|
||||||
"FlightHeader",
|
"FlightHeader",
|
||||||
|
"GateDecision",
|
||||||
|
"MidFlightTileSnapshotSink",
|
||||||
|
"RawFrameWriteForbiddenError",
|
||||||
|
"RecordKindPolicy",
|
||||||
|
"TileSnapshotInvalidIdError",
|
||||||
|
"TileSnapshotTooLargeError",
|
||||||
|
"make_record_kind_policy",
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
"""C13 FDR writer error types (AZ-291 / AZ-292 / AZ-293)."""
|
"""C13 FDR writer error types (AZ-291 / AZ-292 / AZ-293 / AZ-294 / AZ-295)."""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
@@ -8,9 +8,44 @@ __all__ = [
|
|||||||
"FdrConcurrentWriterError",
|
"FdrConcurrentWriterError",
|
||||||
"FdrOpenError",
|
"FdrOpenError",
|
||||||
"FdrWriterError",
|
"FdrWriterError",
|
||||||
|
"RawFrameWriteForbiddenError",
|
||||||
|
"TileSnapshotInvalidIdError",
|
||||||
|
"TileSnapshotTooLargeError",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
|
class TileSnapshotTooLargeError(ValueError):
|
||||||
|
"""Raised by `MidFlightTileSnapshotSink.write_snapshot` (AZ-294) when the
|
||||||
|
input JPEG exceeds the configured ``jpeg_max_bytes`` ceiling.
|
||||||
|
|
||||||
|
The sink does not trust producers to self-cap their JPEG size; this
|
||||||
|
bound short-circuits adversarial / runaway producer behaviour before
|
||||||
|
any sidecar file is written.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
class TileSnapshotInvalidIdError(ValueError):
|
||||||
|
"""Raised by `MidFlightTileSnapshotSink.write_snapshot` (AZ-294) when the
|
||||||
|
input ``tile_id`` does not match the documented identifier regex.
|
||||||
|
|
||||||
|
The regex rejects path-traversal sequences (e.g. ``../../etc/passwd``)
|
||||||
|
and any character outside ``[a-zA-Z0-9_-]``; size is bounded to 128
|
||||||
|
chars.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
class RawFrameWriteForbiddenError(RuntimeError):
|
||||||
|
"""Raised by `RecordKindPolicy.enforce_or_raise` (AZ-295) when a
|
||||||
|
producer attempts to enqueue an `FdrRecord` whose ``kind`` is in
|
||||||
|
the configured forbidden set (defaults to raw-frame variants).
|
||||||
|
|
||||||
|
AC-8.5 / RESTRICT-UAV-4: raw nav/AI-cam frames are NEVER allowed on
|
||||||
|
durable storage. The exception is raised SYNCHRONOUSLY at the
|
||||||
|
producer's call site so the offending caller sees the security
|
||||||
|
error immediately.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
class FdrWriterError(RuntimeError):
|
class FdrWriterError(RuntimeError):
|
||||||
"""Base class for every C13 writer-side runtime error."""
|
"""Base class for every C13 writer-side runtime error."""
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,191 @@
|
|||||||
|
"""``RecordKindPolicy`` — AC-8.5 / RESTRICT-UAV-4 record-kind gates (AZ-295).
|
||||||
|
|
||||||
|
Two paired gates with intentionally asymmetric semantics:
|
||||||
|
|
||||||
|
- ``enforce_or_raise(record)`` — producer-side synchronous check. Raises
|
||||||
|
:class:`RawFrameWriteForbiddenError` when ``record.kind`` is in the
|
||||||
|
configured forbidden set; returns silently otherwise. Producers call
|
||||||
|
this immediately BEFORE ``fdr_client.enqueue(record)``.
|
||||||
|
|
||||||
|
- ``gate_for_writer(record)`` — writer-side soft rate cap on
|
||||||
|
``kind="failed_tile_thumbnail"``. Returns ``GateDecision.ENQUEUE``
|
||||||
|
for in-cap records and ``GateDecision.DROP`` for over-cap thumbnails.
|
||||||
|
Drops accumulate into a per-window ``dropped_count`` that is emitted
|
||||||
|
as a single coalesced ``kind="overrun"`` record at the close of each
|
||||||
|
window (matches AZ-274 overrun semantics).
|
||||||
|
|
||||||
|
The two gates exist together so a forbidden-kind regression in a
|
||||||
|
producer is caught at the call site (security failure visible to the
|
||||||
|
offending caller), and a thumbnail-flood regression is caught on the
|
||||||
|
write path without exploding error counts (rate-cap with audit
|
||||||
|
trail).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import enum
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from collections.abc import Iterable
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
|
||||||
|
from gps_denied_onboard.components.c13_fdr.errors import (
|
||||||
|
RawFrameWriteForbiddenError,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.config import RecordKindPolicyConfig
|
||||||
|
from gps_denied_onboard.fdr_client.records import (
|
||||||
|
OVERRUN_KIND,
|
||||||
|
OVERRUN_PRODUCER_ID,
|
||||||
|
FdrRecord,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.logging import get_logger
|
||||||
|
|
||||||
|
__all__ = ["GateDecision", "RecordKindPolicy", "make_record_kind_policy"]
|
||||||
|
|
||||||
|
_THUMBNAIL_KIND = "failed_tile_thumbnail"
|
||||||
|
_LOG_RATE_LIMIT_S = 1.0
|
||||||
|
|
||||||
|
|
||||||
|
class GateDecision(enum.Enum):
|
||||||
|
ENQUEUE = "enqueue"
|
||||||
|
DROP = "drop"
|
||||||
|
|
||||||
|
|
||||||
|
class _ThumbnailRateCap:
|
||||||
|
"""Per-window admission counter for `failed_tile_thumbnail` records.
|
||||||
|
|
||||||
|
Maintains a single window starting at the time of the first record;
|
||||||
|
the window is ``(1.0 / max_hz)`` seconds wide. Up to one thumbnail
|
||||||
|
is admitted per window; subsequent records are counted into
|
||||||
|
``dropped_in_current_window`` until the window closes.
|
||||||
|
|
||||||
|
Window close emits a coalesced overrun record carrying the
|
||||||
|
accumulated drop count.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, max_hz: float) -> None:
|
||||||
|
self._window_s = 1.0 / max_hz
|
||||||
|
self._window_start_mono: float | None = None
|
||||||
|
self._admitted_in_window = 0
|
||||||
|
self._dropped_in_window = 0
|
||||||
|
self._dropped_producer: str | None = None
|
||||||
|
self._lock = threading.Lock()
|
||||||
|
|
||||||
|
def admit(self, producer_id: str) -> bool:
|
||||||
|
now = time.monotonic()
|
||||||
|
with self._lock:
|
||||||
|
if self._window_start_mono is None or now - self._window_start_mono >= self._window_s:
|
||||||
|
# Window closed (or first call). Reset.
|
||||||
|
self._window_start_mono = now
|
||||||
|
self._admitted_in_window = 0
|
||||||
|
self._dropped_in_window = 0
|
||||||
|
self._dropped_producer = None
|
||||||
|
if self._admitted_in_window == 0:
|
||||||
|
self._admitted_in_window = 1
|
||||||
|
return True
|
||||||
|
self._dropped_in_window += 1
|
||||||
|
self._dropped_producer = producer_id
|
||||||
|
return False
|
||||||
|
|
||||||
|
def drain_dropped(self) -> tuple[int, str | None]:
|
||||||
|
"""Return ``(dropped_count, producer_id)`` and clear the accumulator."""
|
||||||
|
with self._lock:
|
||||||
|
count = self._dropped_in_window
|
||||||
|
producer = self._dropped_producer
|
||||||
|
self._dropped_in_window = 0
|
||||||
|
self._dropped_producer = None
|
||||||
|
return count, producer
|
||||||
|
|
||||||
|
|
||||||
|
class RecordKindPolicy:
|
||||||
|
"""Per-flight record-kind policy (AZ-295)."""
|
||||||
|
|
||||||
|
def __init__(self, config: RecordKindPolicyConfig) -> None:
|
||||||
|
if not isinstance(config, RecordKindPolicyConfig):
|
||||||
|
raise TypeError(
|
||||||
|
f"RecordKindPolicy.config must be RecordKindPolicyConfig; "
|
||||||
|
f"got {type(config).__name__}"
|
||||||
|
)
|
||||||
|
self._forbidden_kinds: frozenset[str] = config.forbidden_record_kinds
|
||||||
|
self._rate_cap = _ThumbnailRateCap(max_hz=config.failed_tile_thumbnail_max_hz)
|
||||||
|
self._last_warn_t = 0.0
|
||||||
|
self._log = get_logger("c13_fdr.record_kind_policy")
|
||||||
|
|
||||||
|
@property
|
||||||
|
def forbidden_kinds(self) -> frozenset[str]:
|
||||||
|
return self._forbidden_kinds
|
||||||
|
|
||||||
|
def enforce_or_raise(self, record: FdrRecord) -> None:
|
||||||
|
"""Producer-side synchronous gate.
|
||||||
|
|
||||||
|
Raises ``RawFrameWriteForbiddenError`` if ``record.kind`` is in
|
||||||
|
the configured forbidden set; returns silently otherwise.
|
||||||
|
"""
|
||||||
|
if record.kind in self._forbidden_kinds:
|
||||||
|
raise RawFrameWriteForbiddenError(
|
||||||
|
f"FdrRecord kind={record.kind!r} from producer {record.producer_id!r} "
|
||||||
|
f"is forbidden by RecordKindPolicy"
|
||||||
|
)
|
||||||
|
|
||||||
|
def gate_for_writer(self, record: FdrRecord) -> GateDecision:
|
||||||
|
"""Writer-side rate-cap gate for ``failed_tile_thumbnail`` records.
|
||||||
|
|
||||||
|
Returns :attr:`GateDecision.ENQUEUE` for non-thumbnail records
|
||||||
|
and for the first thumbnail in each window. Returns
|
||||||
|
:attr:`GateDecision.DROP` for over-cap thumbnails; the drop is
|
||||||
|
recorded into the rate cap's accumulator so a single coalesced
|
||||||
|
overrun record is emitted via :meth:`drain_pending_overrun`.
|
||||||
|
"""
|
||||||
|
if record.kind != _THUMBNAIL_KIND:
|
||||||
|
return GateDecision.ENQUEUE
|
||||||
|
producer_id = record.producer_id or OVERRUN_PRODUCER_ID
|
||||||
|
if self._rate_cap.admit(producer_id):
|
||||||
|
return GateDecision.ENQUEUE
|
||||||
|
self._maybe_warn(producer_id)
|
||||||
|
return GateDecision.DROP
|
||||||
|
|
||||||
|
def drain_pending_overrun(self) -> FdrRecord | None:
|
||||||
|
"""Return a coalesced overrun record for any thumbnails dropped
|
||||||
|
since the previous drain, or ``None`` if the window is empty.
|
||||||
|
|
||||||
|
The writer-thread calls this at end-of-batch so over-cap drops
|
||||||
|
surface as a canonical overrun trail in the FDR.
|
||||||
|
"""
|
||||||
|
dropped, producer = self._rate_cap.drain_dropped()
|
||||||
|
if dropped <= 0:
|
||||||
|
return None
|
||||||
|
return FdrRecord(
|
||||||
|
schema_version=1,
|
||||||
|
ts=datetime.now(tz=timezone.utc).isoformat(),
|
||||||
|
producer_id=OVERRUN_PRODUCER_ID,
|
||||||
|
kind=OVERRUN_KIND,
|
||||||
|
payload={
|
||||||
|
"producer_id": producer or "shared.fdr_client",
|
||||||
|
"dropped_count": dropped,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
def _maybe_warn(self, producer_id: str) -> None:
|
||||||
|
now = time.monotonic()
|
||||||
|
if now - self._last_warn_t < _LOG_RATE_LIMIT_S:
|
||||||
|
return
|
||||||
|
self._last_warn_t = now
|
||||||
|
self._log.warning(
|
||||||
|
f"fdr.thumbnail_rate_cap_exceeded: producer_id={producer_id}",
|
||||||
|
extra={
|
||||||
|
"kind": "fdr.thumbnail_rate_cap_exceeded",
|
||||||
|
"kv": {"producer_id": producer_id},
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def make_record_kind_policy(config: RecordKindPolicyConfig) -> RecordKindPolicy:
|
||||||
|
"""Composition-root factory for :class:`RecordKindPolicy`."""
|
||||||
|
return RecordKindPolicy(config)
|
||||||
|
|
||||||
|
|
||||||
|
def is_legitimate_kind(kind: str, *, legitimate_kinds: Iterable[str]) -> bool:
|
||||||
|
"""Helper used by the AZ-272 contract test: a forbidden-kind set
|
||||||
|
must NOT contain any kind from the legitimate v1.x closed enum.
|
||||||
|
"""
|
||||||
|
return kind in set(legitimate_kinds)
|
||||||
@@ -0,0 +1,230 @@
|
|||||||
|
"""``MidFlightTileSnapshotSink`` — sidecar storage for F4 tile snapshots (AZ-294).
|
||||||
|
|
||||||
|
C6 / C11 producers call :py:meth:`MidFlightTileSnapshotSink.write_snapshot`
|
||||||
|
with the orthorectified JPEG bytes. The sink:
|
||||||
|
|
||||||
|
1. Validates JPEG size (``jpeg_max_bytes``) and ``tile_id`` regex.
|
||||||
|
2. Writes the JPEG to ``flight_root/<flight_id>/tiles/<tile_id>.jpg``
|
||||||
|
atomically (temp file + ``fsync`` + ``rename``).
|
||||||
|
3. Enqueues a single ``kind="mid_flight_tile_snapshot"`` FdrRecord
|
||||||
|
carrying the relative path + capture timestamp.
|
||||||
|
4. Enforces the per-flight tile cap (``tile_snapshot_cap_bytes``) by
|
||||||
|
dropping the oldest tile if the cumulative size exceeds the cap;
|
||||||
|
emits a ``kind="overrun"`` record per drop.
|
||||||
|
|
||||||
|
Thread-safe: many producer threads may call ``write_snapshot``
|
||||||
|
concurrently; an internal lock serialises the cap-check + drop +
|
||||||
|
record-enqueue sequence. The JPEG write itself is independent and
|
||||||
|
runs outside the lock so producers do not serialise on each other's
|
||||||
|
disk IO.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import threading
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Final
|
||||||
|
from uuid import UUID
|
||||||
|
|
||||||
|
from gps_denied_onboard.components.c13_fdr.errors import (
|
||||||
|
TileSnapshotInvalidIdError,
|
||||||
|
TileSnapshotTooLargeError,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.config import TileSnapshotConfig
|
||||||
|
from gps_denied_onboard.fdr_client.client import FdrClient
|
||||||
|
from gps_denied_onboard.fdr_client.records import (
|
||||||
|
OVERRUN_KIND,
|
||||||
|
OVERRUN_PRODUCER_ID,
|
||||||
|
FdrRecord,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.logging import get_logger
|
||||||
|
|
||||||
|
__all__ = ["MidFlightTileSnapshotSink"]
|
||||||
|
|
||||||
|
_TILE_ID_RE: Final[re.Pattern[str]] = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")
|
||||||
|
_SNAPSHOT_KIND: Final[str] = "mid_flight_tile_snapshot"
|
||||||
|
_TILES_SUBDIR: Final[str] = "tiles"
|
||||||
|
|
||||||
|
|
||||||
|
def _iso(captured_at: datetime) -> str:
|
||||||
|
if captured_at.tzinfo is None:
|
||||||
|
captured_at = captured_at.replace(tzinfo=timezone.utc)
|
||||||
|
return captured_at.astimezone(timezone.utc).isoformat()
|
||||||
|
|
||||||
|
|
||||||
|
def _on_disk_size(path: Path) -> int:
|
||||||
|
try:
|
||||||
|
return path.stat().st_size
|
||||||
|
except OSError:
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
class MidFlightTileSnapshotSink:
|
||||||
|
"""Sidecar writer for F4 mid-flight tile snapshots."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
flight_root: Path,
|
||||||
|
flight_id: UUID,
|
||||||
|
fdr_client: FdrClient,
|
||||||
|
config: TileSnapshotConfig,
|
||||||
|
) -> None:
|
||||||
|
self._flight_root = Path(flight_root)
|
||||||
|
self._flight_id = flight_id
|
||||||
|
self._fdr_client = fdr_client
|
||||||
|
self._config = config
|
||||||
|
self._flight_dir = self._flight_root / str(flight_id)
|
||||||
|
self._tiles_dir = self._flight_dir / _TILES_SUBDIR
|
||||||
|
self._lock = threading.Lock()
|
||||||
|
self._log = get_logger("c13_fdr.tile_snapshot_sink")
|
||||||
|
# In-memory cache of (captured_at_iso, tile_id, path) sorted by
|
||||||
|
# captured_at ASC. Refreshed lazily from disk on cap-check entry
|
||||||
|
# so an externally-deleted tile does not corrupt accounting
|
||||||
|
# (matches AZ-293's stale-list refresh pattern).
|
||||||
|
self._tile_index: list[tuple[str, str, Path]] = []
|
||||||
|
self._tile_index_initialised = False
|
||||||
|
|
||||||
|
@property
|
||||||
|
def tiles_dir(self) -> Path:
|
||||||
|
return self._tiles_dir
|
||||||
|
|
||||||
|
def write_snapshot(
|
||||||
|
self,
|
||||||
|
tile_id: str,
|
||||||
|
jpeg_bytes: bytes,
|
||||||
|
captured_at: datetime,
|
||||||
|
frame_id: int | None = None,
|
||||||
|
) -> Path:
|
||||||
|
"""Persist ``jpeg_bytes`` to the canonical sidecar path and emit a pointer record.
|
||||||
|
|
||||||
|
Returns the absolute path of the on-disk sidecar file.
|
||||||
|
"""
|
||||||
|
if not isinstance(jpeg_bytes, (bytes, bytearray)):
|
||||||
|
raise TypeError(f"jpeg_bytes must be bytes; got {type(jpeg_bytes).__name__}")
|
||||||
|
if len(jpeg_bytes) > self._config.jpeg_max_bytes:
|
||||||
|
raise TileSnapshotTooLargeError(
|
||||||
|
f"JPEG size {len(jpeg_bytes)} bytes exceeds jpeg_max_bytes "
|
||||||
|
f"{self._config.jpeg_max_bytes}"
|
||||||
|
)
|
||||||
|
if not isinstance(tile_id, str) or not _TILE_ID_RE.match(tile_id):
|
||||||
|
raise TileSnapshotInvalidIdError(
|
||||||
|
f"tile_id {tile_id!r} does not match {_TILE_ID_RE.pattern!r}"
|
||||||
|
)
|
||||||
|
|
||||||
|
self._tiles_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
canonical_path = self._tiles_dir / f"{tile_id}.jpg"
|
||||||
|
# Atomic write: temp file + fsync + rename.
|
||||||
|
tmp_path = canonical_path.with_suffix(canonical_path.suffix + ".tmp")
|
||||||
|
with open(tmp_path, "wb") as fh:
|
||||||
|
fh.write(bytes(jpeg_bytes))
|
||||||
|
fh.flush()
|
||||||
|
os.fsync(fh.fileno())
|
||||||
|
os.replace(tmp_path, canonical_path)
|
||||||
|
|
||||||
|
captured_iso = _iso(captured_at)
|
||||||
|
payload: dict[str, object] = {
|
||||||
|
"snapshot_path": f"{_TILES_SUBDIR}/{tile_id}.jpg",
|
||||||
|
"captured_at": captured_iso,
|
||||||
|
}
|
||||||
|
if frame_id is not None:
|
||||||
|
payload["frame_id"] = int(frame_id)
|
||||||
|
record = FdrRecord(
|
||||||
|
schema_version=1,
|
||||||
|
ts=datetime.now(tz=timezone.utc).isoformat(),
|
||||||
|
producer_id=OVERRUN_PRODUCER_ID,
|
||||||
|
kind=_SNAPSHOT_KIND,
|
||||||
|
payload=payload,
|
||||||
|
)
|
||||||
|
self._fdr_client.enqueue(record)
|
||||||
|
|
||||||
|
# Cap check + drop. Lock covers both index refresh and the drop
|
||||||
|
# so concurrent writers cannot double-drop the same tile.
|
||||||
|
with self._lock:
|
||||||
|
self._refresh_index_if_needed()
|
||||||
|
self._tile_index.append((captured_iso, tile_id, canonical_path))
|
||||||
|
self._tile_index.sort(key=lambda entry: entry[0])
|
||||||
|
self._evict_until_under_cap()
|
||||||
|
self._log.info(
|
||||||
|
f"fdr.tile_snapshot_written: {tile_id} ({len(jpeg_bytes)} B)",
|
||||||
|
extra={
|
||||||
|
"kind": "fdr.tile_snapshot_written",
|
||||||
|
"kv": {"tile_id": tile_id, "size_bytes": len(jpeg_bytes)},
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return canonical_path
|
||||||
|
|
||||||
|
def _refresh_index_if_needed(self) -> None:
|
||||||
|
if self._tile_index_initialised:
|
||||||
|
return
|
||||||
|
self._tile_index_initialised = True
|
||||||
|
if not self._tiles_dir.exists():
|
||||||
|
return
|
||||||
|
entries: list[tuple[str, str, Path]] = []
|
||||||
|
for entry in self._tiles_dir.iterdir():
|
||||||
|
if not entry.is_file() or entry.suffix != ".jpg":
|
||||||
|
continue
|
||||||
|
tile_id = entry.stem
|
||||||
|
if not _TILE_ID_RE.match(tile_id):
|
||||||
|
continue
|
||||||
|
# Use the file mtime as a proxy for captured_at when this is a
|
||||||
|
# pre-existing tile from a prior process (per AC-7). It is a
|
||||||
|
# monotonic-enough ordering for oldest-first eviction.
|
||||||
|
mtime_iso = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc).isoformat()
|
||||||
|
entries.append((mtime_iso, tile_id, entry))
|
||||||
|
entries.sort(key=lambda kv: kv[0])
|
||||||
|
self._tile_index = entries
|
||||||
|
|
||||||
|
def _evict_until_under_cap(self) -> None:
|
||||||
|
cap = self._config.tile_snapshot_cap_bytes
|
||||||
|
total = self._directory_size()
|
||||||
|
while total > cap and self._tile_index:
|
||||||
|
_captured_iso, tile_id, path = self._tile_index.pop(0)
|
||||||
|
freed = _on_disk_size(path)
|
||||||
|
try:
|
||||||
|
path.unlink()
|
||||||
|
except OSError as exc:
|
||||||
|
self._log.warning(
|
||||||
|
f"fdr.tile_snapshot_unlink_failed: {path.name} ({exc})",
|
||||||
|
extra={
|
||||||
|
"kind": "fdr.tile_snapshot_unlink_failed",
|
||||||
|
"kv": {"tile_id": tile_id, "error": repr(exc)},
|
||||||
|
},
|
||||||
|
)
|
||||||
|
total -= freed
|
||||||
|
continue
|
||||||
|
self._emit_overrun(tile_id=tile_id)
|
||||||
|
total = self._directory_size()
|
||||||
|
self._log.warning(
|
||||||
|
f"fdr.tile_snapshot_dropped: {tile_id} (freed {freed} B; total {total} B)",
|
||||||
|
extra={
|
||||||
|
"kind": "fdr.tile_snapshot_dropped",
|
||||||
|
"kv": {
|
||||||
|
"tile_id": tile_id,
|
||||||
|
"size_bytes_freed": freed,
|
||||||
|
"cap_bytes_after": total,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
def _directory_size(self) -> int:
|
||||||
|
return sum(_on_disk_size(p) for _ts, _tid, p in self._tile_index)
|
||||||
|
|
||||||
|
def _emit_overrun(self, tile_id: str) -> None:
|
||||||
|
# ``producer_id`` payload field per the contract carries the
|
||||||
|
# ORIGINATING producer slug; the cap-driven drop is sink-side
|
||||||
|
# so we report the sink's slug. Outer envelope is always
|
||||||
|
# OVERRUN_PRODUCER_ID per AZ-272.
|
||||||
|
record = FdrRecord(
|
||||||
|
schema_version=1,
|
||||||
|
ts=datetime.now(tz=timezone.utc).isoformat(),
|
||||||
|
producer_id=OVERRUN_PRODUCER_ID,
|
||||||
|
kind=OVERRUN_KIND,
|
||||||
|
payload={
|
||||||
|
"producer_id": "shared.tile_snapshot_sink",
|
||||||
|
"dropped_count": 1,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
self._fdr_client.enqueue(record)
|
||||||
@@ -39,6 +39,10 @@ from gps_denied_onboard.components.c13_fdr.errors import (
|
|||||||
FdrWriterError,
|
FdrWriterError,
|
||||||
)
|
)
|
||||||
from gps_denied_onboard.components.c13_fdr.headers import FlightFooter, FlightHeader
|
from gps_denied_onboard.components.c13_fdr.headers import FlightFooter, FlightHeader
|
||||||
|
from gps_denied_onboard.components.c13_fdr.record_kind_policy import (
|
||||||
|
GateDecision,
|
||||||
|
RecordKindPolicy,
|
||||||
|
)
|
||||||
from gps_denied_onboard.config import FdrWriterConfig
|
from gps_denied_onboard.config import FdrWriterConfig
|
||||||
from gps_denied_onboard.fdr_client.client import FdrClient
|
from gps_denied_onboard.fdr_client.client import FdrClient
|
||||||
from gps_denied_onboard.fdr_client.records import (
|
from gps_denied_onboard.fdr_client.records import (
|
||||||
@@ -91,6 +95,7 @@ class FileFdrWriter:
|
|||||||
gcs_alert: Callable[[str], None],
|
gcs_alert: Callable[[str], None],
|
||||||
*,
|
*,
|
||||||
on_rotation: Callable[[FileFdrWriter, int], None] | None = None,
|
on_rotation: Callable[[FileFdrWriter, int], None] | None = None,
|
||||||
|
record_kind_policy: RecordKindPolicy | None = None,
|
||||||
drain_sleep_s: float = _DEFAULT_DRAIN_SLEEP_S,
|
drain_sleep_s: float = _DEFAULT_DRAIN_SLEEP_S,
|
||||||
) -> None:
|
) -> None:
|
||||||
self._flight_root = Path(flight_root)
|
self._flight_root = Path(flight_root)
|
||||||
@@ -99,6 +104,7 @@ class FileFdrWriter:
|
|||||||
self._fdr_clients = tuple(fdr_clients)
|
self._fdr_clients = tuple(fdr_clients)
|
||||||
self._gcs_alert = gcs_alert
|
self._gcs_alert = gcs_alert
|
||||||
self._on_rotation = on_rotation
|
self._on_rotation = on_rotation
|
||||||
|
self._record_kind_policy = record_kind_policy
|
||||||
self._drain_sleep_s = drain_sleep_s
|
self._drain_sleep_s = drain_sleep_s
|
||||||
|
|
||||||
# Filesystem state.
|
# Filesystem state.
|
||||||
@@ -383,6 +389,10 @@ class FileFdrWriter:
|
|||||||
batch = client.drain(max_records=self._config.batch_size)
|
batch = client.drain(max_records=self._config.batch_size)
|
||||||
for record in batch:
|
for record in batch:
|
||||||
self._observe_overrun_record(record)
|
self._observe_overrun_record(record)
|
||||||
|
if self._record_kind_policy is not None:
|
||||||
|
decision = self._record_kind_policy.gate_for_writer(record)
|
||||||
|
if decision is GateDecision.DROP:
|
||||||
|
continue
|
||||||
try:
|
try:
|
||||||
self._append_record(record)
|
self._append_record(record)
|
||||||
except OSError as exc:
|
except OSError as exc:
|
||||||
@@ -390,8 +400,21 @@ class FileFdrWriter:
|
|||||||
# Continue dequeuing producer buffers so they don't grow
|
# Continue dequeuing producer buffers so they don't grow
|
||||||
# unboundedly even in degraded mode (AC-5 part d).
|
# unboundedly even in degraded mode (AC-5 part d).
|
||||||
continue
|
continue
|
||||||
|
self._emit_pending_policy_overrun()
|
||||||
return len(batch)
|
return len(batch)
|
||||||
|
|
||||||
|
def _emit_pending_policy_overrun(self) -> None:
|
||||||
|
if self._record_kind_policy is None:
|
||||||
|
return
|
||||||
|
overrun = self._record_kind_policy.drain_pending_overrun()
|
||||||
|
if overrun is None:
|
||||||
|
return
|
||||||
|
self._observe_overrun_record(overrun)
|
||||||
|
try:
|
||||||
|
self._append_record(overrun)
|
||||||
|
except OSError as exc:
|
||||||
|
self._handle_write_failure(exc)
|
||||||
|
|
||||||
def _observe_overrun_record(self, record: FdrRecord) -> None:
|
def _observe_overrun_record(self, record: FdrRecord) -> None:
|
||||||
if record.kind != OVERRUN_KIND:
|
if record.kind != OVERRUN_KIND:
|
||||||
return
|
return
|
||||||
|
|||||||
@@ -2,25 +2,31 @@
|
|||||||
|
|
||||||
from gps_denied_onboard.config.loader import ENV_KEY_MAP, load_config
|
from gps_denied_onboard.config.loader import ENV_KEY_MAP, load_config
|
||||||
from gps_denied_onboard.config.schema import (
|
from gps_denied_onboard.config.schema import (
|
||||||
|
DEFAULT_FORBIDDEN_RECORD_KINDS,
|
||||||
Config,
|
Config,
|
||||||
ConfigError,
|
ConfigError,
|
||||||
FdrConfig,
|
FdrConfig,
|
||||||
FdrWriterConfig,
|
FdrWriterConfig,
|
||||||
LogConfig,
|
LogConfig,
|
||||||
|
RecordKindPolicyConfig,
|
||||||
RequiredFieldMissingError,
|
RequiredFieldMissingError,
|
||||||
RuntimeConfig,
|
RuntimeConfig,
|
||||||
|
TileSnapshotConfig,
|
||||||
register_component_block,
|
register_component_block,
|
||||||
)
|
)
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
|
"DEFAULT_FORBIDDEN_RECORD_KINDS",
|
||||||
"ENV_KEY_MAP",
|
"ENV_KEY_MAP",
|
||||||
"Config",
|
"Config",
|
||||||
"ConfigError",
|
"ConfigError",
|
||||||
"FdrConfig",
|
"FdrConfig",
|
||||||
"FdrWriterConfig",
|
"FdrWriterConfig",
|
||||||
"LogConfig",
|
"LogConfig",
|
||||||
|
"RecordKindPolicyConfig",
|
||||||
"RequiredFieldMissingError",
|
"RequiredFieldMissingError",
|
||||||
"RuntimeConfig",
|
"RuntimeConfig",
|
||||||
|
"TileSnapshotConfig",
|
||||||
"load_config",
|
"load_config",
|
||||||
"register_component_block",
|
"register_component_block",
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -15,17 +15,29 @@ from dataclasses import dataclass, field, fields, is_dataclass, replace
|
|||||||
from typing import Any, Final
|
from typing import Any, Final
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
|
"DEFAULT_FORBIDDEN_RECORD_KINDS",
|
||||||
"Config",
|
"Config",
|
||||||
"ConfigError",
|
"ConfigError",
|
||||||
"FdrConfig",
|
"FdrConfig",
|
||||||
"FdrWriterConfig",
|
"FdrWriterConfig",
|
||||||
"LogConfig",
|
"LogConfig",
|
||||||
|
"RecordKindPolicyConfig",
|
||||||
"RequiredFieldMissingError",
|
"RequiredFieldMissingError",
|
||||||
"RuntimeConfig",
|
"RuntimeConfig",
|
||||||
|
"TileSnapshotConfig",
|
||||||
"register_component_block",
|
"register_component_block",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
|
# Default raw-frame kinds that AZ-295's RecordKindPolicy must reject
|
||||||
|
# synchronously at the producer call site. Removing any of these from
|
||||||
|
# a Config requires an explicit `unsafe_remove_default_forbidden=True`
|
||||||
|
# flag (which is intentionally not present in any standard preset).
|
||||||
|
DEFAULT_FORBIDDEN_RECORD_KINDS: Final[frozenset[str]] = frozenset(
|
||||||
|
{"raw_nav_frame", "raw_ai_cam_frame"}
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class ConfigError(RuntimeError):
|
class ConfigError(RuntimeError):
|
||||||
"""Base class for all config-loader errors that should reach the caller."""
|
"""Base class for all config-loader errors that should reach the caller."""
|
||||||
|
|
||||||
@@ -73,6 +85,80 @@ class FdrWriterConfig:
|
|||||||
debug_log_per_record: bool = False
|
debug_log_per_record: bool = False
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class TileSnapshotConfig:
|
||||||
|
"""C13 mid-flight tile snapshot sidecar block (AZ-294).
|
||||||
|
|
||||||
|
``tile_snapshot_cap_bytes`` is the per-flight ceiling on the
|
||||||
|
cumulative size of the ``tiles/`` subdirectory under the flight
|
||||||
|
root (default 64 MiB to comfortably hold the worst-case ~50 MB
|
||||||
|
from per-component description.md).
|
||||||
|
|
||||||
|
``jpeg_max_bytes`` rejects single tile JPEGs larger than this
|
||||||
|
bound (default 256 KiB; description.md gives 50-200 KiB).
|
||||||
|
"""
|
||||||
|
|
||||||
|
tile_snapshot_cap_bytes: int = 64 * 1024 * 1024
|
||||||
|
jpeg_max_bytes: int = 256 * 1024
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class RecordKindPolicyConfig:
|
||||||
|
"""C13 record-kind policy block (AZ-295).
|
||||||
|
|
||||||
|
``forbidden_record_kinds`` lists FdrRecord ``kind`` values that
|
||||||
|
the producer-side ``enforce_or_raise`` gate rejects with
|
||||||
|
``RawFrameWriteForbiddenError``. The default set
|
||||||
|
(``DEFAULT_FORBIDDEN_RECORD_KINDS``) MUST be a subset of the
|
||||||
|
configured set — removing defaults is a security-review-required
|
||||||
|
path guarded by ``unsafe_remove_default_forbidden``.
|
||||||
|
|
||||||
|
``failed_tile_thumbnail_max_hz`` caps the writer-side rate of
|
||||||
|
``kind="failed_tile_thumbnail"`` records (default 0.1 Hz per
|
||||||
|
AC-8.5 + description.md § 7). Setting this to 0 is rejected at
|
||||||
|
config validation (would silence the kind entirely; that path is
|
||||||
|
intentionally not exposed).
|
||||||
|
"""
|
||||||
|
|
||||||
|
forbidden_record_kinds: frozenset[str] = field(
|
||||||
|
default_factory=lambda: DEFAULT_FORBIDDEN_RECORD_KINDS
|
||||||
|
)
|
||||||
|
failed_tile_thumbnail_max_hz: float = 0.1
|
||||||
|
unsafe_remove_default_forbidden: bool = False
|
||||||
|
|
||||||
|
def __post_init__(self) -> None:
|
||||||
|
if not isinstance(self.forbidden_record_kinds, frozenset):
|
||||||
|
raise ConfigError(
|
||||||
|
"RecordKindPolicyConfig.forbidden_record_kinds must be a frozenset; "
|
||||||
|
f"got {type(self.forbidden_record_kinds).__name__}"
|
||||||
|
)
|
||||||
|
if not self.unsafe_remove_default_forbidden:
|
||||||
|
missing_defaults = DEFAULT_FORBIDDEN_RECORD_KINDS - self.forbidden_record_kinds
|
||||||
|
if missing_defaults:
|
||||||
|
raise ConfigError(
|
||||||
|
"RecordKindPolicyConfig.forbidden_record_kinds removes default raw-frame "
|
||||||
|
f"kinds without unsafe_remove_default_forbidden=True: missing {sorted(missing_defaults)}"
|
||||||
|
)
|
||||||
|
if not (
|
||||||
|
isinstance(self.failed_tile_thumbnail_max_hz, (int, float))
|
||||||
|
and not isinstance(self.failed_tile_thumbnail_max_hz, bool)
|
||||||
|
):
|
||||||
|
raise ConfigError(
|
||||||
|
"RecordKindPolicyConfig.failed_tile_thumbnail_max_hz must be a number; "
|
||||||
|
f"got {self.failed_tile_thumbnail_max_hz!r}"
|
||||||
|
)
|
||||||
|
if self.failed_tile_thumbnail_max_hz <= 0:
|
||||||
|
raise ConfigError(
|
||||||
|
"RecordKindPolicyConfig.failed_tile_thumbnail_max_hz must be > 0; "
|
||||||
|
f"got {self.failed_tile_thumbnail_max_hz}"
|
||||||
|
)
|
||||||
|
if self.failed_tile_thumbnail_max_hz > 10.0:
|
||||||
|
raise ConfigError(
|
||||||
|
"RecordKindPolicyConfig.failed_tile_thumbnail_max_hz must be <= 10.0; "
|
||||||
|
f"got {self.failed_tile_thumbnail_max_hz}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class FdrConfig:
|
class FdrConfig:
|
||||||
"""Cross-cutting Flight Data Recorder block (E-CC-FDR-CLIENT / AZ-247).
|
"""Cross-cutting Flight Data Recorder block (E-CC-FDR-CLIENT / AZ-247).
|
||||||
@@ -82,7 +168,8 @@ class FdrConfig:
|
|||||||
producer slug (consumed by AZ-273 ``make_fdr_client``); blocks
|
producer slug (consumed by AZ-273 ``make_fdr_client``); blocks
|
||||||
that omit a producer fall back to ``queue_size``.
|
that omit a producer fall back to ``queue_size``.
|
||||||
|
|
||||||
``writer`` is the C13 writer-thread sub-block (AZ-291..AZ-296).
|
Sub-blocks (AZ-291..AZ-296): ``writer``, ``tile_snapshot``,
|
||||||
|
``record_policy``.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
queue_size: int = 4096
|
queue_size: int = 4096
|
||||||
@@ -90,6 +177,8 @@ class FdrConfig:
|
|||||||
path: str = "/var/lib/gps-denied/fdr"
|
path: str = "/var/lib/gps-denied/fdr"
|
||||||
per_producer_capacity: Mapping[str, int] = field(default_factory=dict)
|
per_producer_capacity: Mapping[str, int] = field(default_factory=dict)
|
||||||
writer: FdrWriterConfig = field(default_factory=FdrWriterConfig)
|
writer: FdrWriterConfig = field(default_factory=FdrWriterConfig)
|
||||||
|
tile_snapshot: TileSnapshotConfig = field(default_factory=TileSnapshotConfig)
|
||||||
|
record_policy: RecordKindPolicyConfig = field(default_factory=RecordKindPolicyConfig)
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
|
|||||||
@@ -45,7 +45,7 @@ KNOWN_PAYLOAD_KEYS: Final[dict[str, frozenset[str]]] = {
|
|||||||
"overrun": frozenset({"producer_id", "dropped_count"}),
|
"overrun": frozenset({"producer_id", "dropped_count"}),
|
||||||
"segment_rollover": frozenset({"old_segment", "new_segment", "total_bytes_after"}),
|
"segment_rollover": frozenset({"old_segment", "new_segment", "total_bytes_after"}),
|
||||||
"failed_tile_thumbnail": frozenset({"frame_id", "tile_id", "jpeg_bytes_b64"}),
|
"failed_tile_thumbnail": frozenset({"frame_id", "tile_id", "jpeg_bytes_b64"}),
|
||||||
"mid_flight_tile_snapshot": frozenset({"snapshot_path", "captured_at"}),
|
"mid_flight_tile_snapshot": frozenset({"snapshot_path", "captured_at", "frame_id"}),
|
||||||
"flight_header": frozenset(
|
"flight_header": frozenset(
|
||||||
{
|
{
|
||||||
"flight_id",
|
"flight_id",
|
||||||
|
|||||||
@@ -21,17 +21,24 @@ import os
|
|||||||
import sys
|
import sys
|
||||||
from collections.abc import Callable, Iterable, Mapping
|
from collections.abc import Callable, Iterable, Mapping
|
||||||
from dataclasses import dataclass, field
|
from dataclasses import dataclass, field
|
||||||
from typing import Any, Literal, get_args
|
from typing import TYPE_CHECKING, Any, Final, Literal, get_args
|
||||||
|
|
||||||
from gps_denied_onboard.config import Config, load_config
|
from gps_denied_onboard.config import Config, load_config
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
from gps_denied_onboard.components.c13_fdr.headers import FlightHeader
|
||||||
|
from gps_denied_onboard.components.c13_fdr.writer import FileFdrWriter
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
|
"EXIT_FDR_OPEN_FAILURE",
|
||||||
|
"EXIT_GENERIC_FAILURE",
|
||||||
"REQUIRED_ENV_VARS",
|
"REQUIRED_ENV_VARS",
|
||||||
"ConfigurationError",
|
"ConfigurationError",
|
||||||
"OperatorRoot",
|
"OperatorRoot",
|
||||||
"RuntimeRoot",
|
"RuntimeRoot",
|
||||||
"StrategyNotLinkedError",
|
"StrategyNotLinkedError",
|
||||||
"StrategyTier",
|
"StrategyTier",
|
||||||
|
"TakeoffResult",
|
||||||
"clear_strategy_registry",
|
"clear_strategy_registry",
|
||||||
"compose_operator",
|
"compose_operator",
|
||||||
"compose_replay",
|
"compose_replay",
|
||||||
@@ -39,8 +46,13 @@ __all__ = [
|
|||||||
"list_registered_strategies",
|
"list_registered_strategies",
|
||||||
"main",
|
"main",
|
||||||
"register_strategy",
|
"register_strategy",
|
||||||
|
"take_off",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
|
EXIT_GENERIC_FAILURE: Final[int] = 1
|
||||||
|
EXIT_FDR_OPEN_FAILURE: Final[int] = 2
|
||||||
|
|
||||||
StrategyTier = Literal["airborne", "operator", "shared"]
|
StrategyTier = Literal["airborne", "operator", "shared"]
|
||||||
_ALL_TIERS: tuple[StrategyTier, ...] = get_args(StrategyTier)
|
_ALL_TIERS: tuple[StrategyTier, ...] = get_args(StrategyTier)
|
||||||
|
|
||||||
@@ -370,13 +382,138 @@ def compose_replay(config: Config) -> RuntimeRoot:
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class TakeoffResult:
|
||||||
|
"""Successful takeoff: writer is open, FC adapter is wired, components started.
|
||||||
|
|
||||||
|
Returned by :func:`take_off` on the success path. The abort path
|
||||||
|
never returns — it calls :func:`sys.exit` with
|
||||||
|
:data:`EXIT_FDR_OPEN_FAILURE`.
|
||||||
|
"""
|
||||||
|
|
||||||
|
writer: Any
|
||||||
|
fc_adapter: Any
|
||||||
|
other_components: Mapping[str, Any] = field(default_factory=dict)
|
||||||
|
|
||||||
|
|
||||||
|
def take_off(
|
||||||
|
config: Config,
|
||||||
|
*,
|
||||||
|
writer_factory: Callable[[Config], FileFdrWriter],
|
||||||
|
flight_header_factory: Callable[[Config], FlightHeader],
|
||||||
|
fc_adapter_factory: Callable[[Config, Any], Any],
|
||||||
|
other_components_factory: Callable[[Config, Any, Any], Mapping[str, Any]] | None = None,
|
||||||
|
flight_root_for_message: str | None = None,
|
||||||
|
) -> TakeoffResult:
|
||||||
|
"""Run the strict airborne takeoff sequence (AZ-296).
|
||||||
|
|
||||||
|
Order: ``writer_factory`` → ``writer.start()`` →
|
||||||
|
``writer.open_flight(header)`` → (only on success) ``fc_adapter_factory``
|
||||||
|
→ ``other_components_factory``.
|
||||||
|
|
||||||
|
On :exc:`FdrOpenError` from ``open_flight``, this function logs ONE
|
||||||
|
structured ERROR, calls ``writer.stop()`` (best-effort), prints the
|
||||||
|
fixed FATAL line to stderr, and exits the process with
|
||||||
|
:data:`EXIT_FDR_OPEN_FAILURE`. It never returns on that path.
|
||||||
|
|
||||||
|
Other exceptions propagate up unchanged; they reach :func:`main`
|
||||||
|
which exits with :data:`EXIT_GENERIC_FAILURE`.
|
||||||
|
|
||||||
|
Tests inject factories; production wiring builds factories from
|
||||||
|
:func:`compose_root`.
|
||||||
|
"""
|
||||||
|
from gps_denied_onboard.components.c13_fdr.errors import FdrOpenError
|
||||||
|
|
||||||
|
writer = writer_factory(config)
|
||||||
|
writer.start()
|
||||||
|
try:
|
||||||
|
writer.open_flight(flight_header_factory(config))
|
||||||
|
except FdrOpenError as exc:
|
||||||
|
_abort_takeoff_on_fdr_open_error(
|
||||||
|
writer=writer,
|
||||||
|
config=config,
|
||||||
|
exc=exc,
|
||||||
|
flight_root=flight_root_for_message,
|
||||||
|
)
|
||||||
|
raise AssertionError( # pragma: no cover — abort helper must exit
|
||||||
|
"unreachable: _abort_takeoff_on_fdr_open_error must exit"
|
||||||
|
) from None
|
||||||
|
fc_adapter = fc_adapter_factory(config, writer)
|
||||||
|
other: Mapping[str, Any] = {}
|
||||||
|
if other_components_factory is not None:
|
||||||
|
other = other_components_factory(config, writer, fc_adapter)
|
||||||
|
return TakeoffResult(writer=writer, fc_adapter=fc_adapter, other_components=other)
|
||||||
|
|
||||||
|
|
||||||
|
def _abort_takeoff_on_fdr_open_error(
|
||||||
|
*,
|
||||||
|
writer: Any,
|
||||||
|
config: Config,
|
||||||
|
exc: BaseException,
|
||||||
|
flight_root: str | None,
|
||||||
|
) -> None:
|
||||||
|
"""Execute the documented abort path; never returns."""
|
||||||
|
from gps_denied_onboard.logging import get_logger
|
||||||
|
|
||||||
|
resolved_root = flight_root if flight_root is not None else _read_flight_root(config)
|
||||||
|
underlying = str(exc)
|
||||||
|
log = get_logger("composition_root")
|
||||||
|
try:
|
||||||
|
log.error(
|
||||||
|
"composition_root.takeoff_aborted",
|
||||||
|
extra={
|
||||||
|
"kind": "composition_root.takeoff_aborted",
|
||||||
|
"kv": {
|
||||||
|
"reason": "fdr_open_error",
|
||||||
|
"underlying": underlying,
|
||||||
|
"flight_root": resolved_root,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
# Logging must never block the abort path.
|
||||||
|
pass
|
||||||
|
try:
|
||||||
|
writer.stop()
|
||||||
|
except Exception as stop_exc:
|
||||||
|
try:
|
||||||
|
log.error(
|
||||||
|
"composition_root.takeoff_abort_stop_failed",
|
||||||
|
extra={
|
||||||
|
"kind": "composition_root.takeoff_abort_stop_failed",
|
||||||
|
"kv": {"error": repr(stop_exc)},
|
||||||
|
},
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
print(
|
||||||
|
f"FATAL: cannot open FDR at {resolved_root}: {underlying}; aborting takeoff (exit 2)",
|
||||||
|
file=sys.stderr,
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
|
# sys.exit raises SystemExit, which propagates to the process boundary.
|
||||||
|
# In the unlikely event that some intermediate frame catches SystemExit
|
||||||
|
# (e.g. a misbehaving test harness), the fallback below ensures the
|
||||||
|
# process still terminates with the documented exit code.
|
||||||
|
sys.exit(EXIT_FDR_OPEN_FAILURE)
|
||||||
|
os._exit(EXIT_FDR_OPEN_FAILURE) # pragma: no cover — only reached if SystemExit is intercepted
|
||||||
|
|
||||||
|
|
||||||
|
def _read_flight_root(config: Config) -> str:
|
||||||
|
fdr = getattr(config, "fdr", None)
|
||||||
|
if fdr is None:
|
||||||
|
return "<unknown>"
|
||||||
|
path = getattr(fdr, "path", None)
|
||||||
|
return str(path) if path is not None else "<unknown>"
|
||||||
|
|
||||||
|
|
||||||
def main() -> int: # pragma: no cover — guarded entrypoint
|
def main() -> int: # pragma: no cover — guarded entrypoint
|
||||||
try:
|
try:
|
||||||
config = load_config(env=os.environ, paths=())
|
config = load_config(env=os.environ, paths=())
|
||||||
compose_root(config)
|
compose_root(config)
|
||||||
except (ConfigurationError, StrategyNotLinkedError, RuntimeError) as exc:
|
except (ConfigurationError, StrategyNotLinkedError, RuntimeError) as exc:
|
||||||
print(f"runtime_root: {exc}", file=sys.stderr)
|
print(f"runtime_root: {exc}", file=sys.stderr)
|
||||||
return 2
|
return EXIT_GENERIC_FAILURE
|
||||||
return 0
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,213 @@
|
|||||||
|
"""AZ-294 — MidFlightTileSnapshotSink unit tests."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import struct
|
||||||
|
from datetime import datetime, timedelta, timezone
|
||||||
|
from pathlib import Path
|
||||||
|
from uuid import uuid4
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from gps_denied_onboard.components.c13_fdr import (
|
||||||
|
MidFlightTileSnapshotSink,
|
||||||
|
TileSnapshotInvalidIdError,
|
||||||
|
TileSnapshotTooLargeError,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.config import TileSnapshotConfig
|
||||||
|
from gps_denied_onboard.fdr_client.client import FdrClient
|
||||||
|
from gps_denied_onboard.fdr_client.records import OVERRUN_KIND, parse
|
||||||
|
|
||||||
|
_LENGTH_PREFIX = struct.Struct("<I")
|
||||||
|
_JPEG_MAGIC = b"\xff\xd8\xff\xe0"
|
||||||
|
|
||||||
|
|
||||||
|
def _jpeg_blob(size: int = 1024) -> bytes:
|
||||||
|
return _JPEG_MAGIC + b"\x00" * (size - len(_JPEG_MAGIC))
|
||||||
|
|
||||||
|
|
||||||
|
def _make_sink(
|
||||||
|
tmp_path: Path,
|
||||||
|
config: TileSnapshotConfig | None = None,
|
||||||
|
) -> tuple[MidFlightTileSnapshotSink, FdrClient]:
|
||||||
|
client = FdrClient(producer_id="shared.tile_snapshot_sink", capacity=256, _emit_diag_log=False)
|
||||||
|
sink = MidFlightTileSnapshotSink(
|
||||||
|
flight_root=tmp_path,
|
||||||
|
flight_id=uuid4(),
|
||||||
|
fdr_client=client,
|
||||||
|
config=config or TileSnapshotConfig(),
|
||||||
|
)
|
||||||
|
return sink, client
|
||||||
|
|
||||||
|
|
||||||
|
def _drain_kinds(client: FdrClient) -> list[str]:
|
||||||
|
return [rec.kind for rec in client.drain(max_records=1024)]
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac1_write_snapshot_creates_canonical_jpeg(tmp_path: Path) -> None:
|
||||||
|
# Arrange
|
||||||
|
sink, _client = _make_sink(tmp_path)
|
||||||
|
blob = _jpeg_blob(2048)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
path = sink.write_snapshot(
|
||||||
|
tile_id="tile_001",
|
||||||
|
jpeg_bytes=blob,
|
||||||
|
captured_at=datetime(2026, 5, 11, tzinfo=timezone.utc),
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert path.exists()
|
||||||
|
assert path.name == "tile_001.jpg"
|
||||||
|
assert path.read_bytes() == blob
|
||||||
|
assert path.parent == sink.tiles_dir
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac2_write_snapshot_emits_pointer_record(tmp_path: Path) -> None:
|
||||||
|
# Arrange
|
||||||
|
sink, client = _make_sink(tmp_path)
|
||||||
|
captured = datetime(2026, 5, 11, 12, 0, 0, tzinfo=timezone.utc)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
sink.write_snapshot("tile_a", _jpeg_blob(), captured)
|
||||||
|
batch = client.drain(max_records=16)
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert len(batch) == 1
|
||||||
|
rec = batch[0]
|
||||||
|
assert rec.kind == "mid_flight_tile_snapshot"
|
||||||
|
assert rec.payload["snapshot_path"] == "tiles/tile_a.jpg"
|
||||||
|
assert rec.payload["captured_at"] == captured.isoformat()
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac3_oversize_jpeg_rejected(tmp_path: Path) -> None:
|
||||||
|
# Arrange
|
||||||
|
config = TileSnapshotConfig(jpeg_max_bytes=256)
|
||||||
|
sink, client = _make_sink(tmp_path, config)
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(TileSnapshotTooLargeError, match=r"jpeg_max_bytes"):
|
||||||
|
sink.write_snapshot("tile_a", b"\x00" * 257, datetime.now(tz=timezone.utc))
|
||||||
|
# No file is written; no pointer record enqueued.
|
||||||
|
assert not sink.tiles_dir.exists() or not any(sink.tiles_dir.iterdir())
|
||||||
|
assert _drain_kinds(client) == []
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac4_invalid_tile_id_rejected(tmp_path: Path) -> None:
|
||||||
|
# Arrange
|
||||||
|
sink, client = _make_sink(tmp_path)
|
||||||
|
invalid_ids = ["../etc/passwd", "tile with space", "../../e", "a" * 129, ""]
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
for tile_id in invalid_ids:
|
||||||
|
with pytest.raises(TileSnapshotInvalidIdError):
|
||||||
|
sink.write_snapshot(tile_id, _jpeg_blob(), datetime.now(tz=timezone.utc))
|
||||||
|
assert _drain_kinds(client) == []
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac5_atomic_write_temp_file_cleaned(tmp_path: Path) -> None:
|
||||||
|
# Arrange
|
||||||
|
sink, _client = _make_sink(tmp_path)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
sink.write_snapshot("tile_b", _jpeg_blob(), datetime.now(tz=timezone.utc))
|
||||||
|
|
||||||
|
# Assert — no leftover `.tmp` file in the tiles directory
|
||||||
|
leftovers = [p for p in sink.tiles_dir.iterdir() if p.name.endswith(".tmp")]
|
||||||
|
assert leftovers == []
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac6_cap_drop_oldest_when_exceeded(tmp_path: Path) -> None:
|
||||||
|
# Arrange: cap = 4 KiB; each JPEG = 2 KiB → 3rd write must evict 1st.
|
||||||
|
config = TileSnapshotConfig(
|
||||||
|
tile_snapshot_cap_bytes=4 * 1024,
|
||||||
|
jpeg_max_bytes=3 * 1024,
|
||||||
|
)
|
||||||
|
sink, client = _make_sink(tmp_path, config)
|
||||||
|
blob = _jpeg_blob(2 * 1024)
|
||||||
|
t0 = datetime(2026, 5, 11, tzinfo=timezone.utc)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
sink.write_snapshot("tile_1", blob, t0)
|
||||||
|
sink.write_snapshot("tile_2", blob, t0 + timedelta(seconds=1))
|
||||||
|
sink.write_snapshot("tile_3", blob, t0 + timedelta(seconds=2))
|
||||||
|
|
||||||
|
# Assert — tile_1 evicted; tile_2 + tile_3 survive
|
||||||
|
surviving = sorted(p.name for p in sink.tiles_dir.iterdir())
|
||||||
|
assert "tile_1.jpg" not in surviving
|
||||||
|
assert "tile_2.jpg" in surviving
|
||||||
|
assert "tile_3.jpg" in surviving
|
||||||
|
|
||||||
|
kinds = [r.kind for r in client.drain(max_records=64)]
|
||||||
|
assert kinds.count(OVERRUN_KIND) == 1
|
||||||
|
assert kinds.count("mid_flight_tile_snapshot") == 3
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac7_thread_safe_concurrent_writes(tmp_path: Path) -> None:
|
||||||
|
# Arrange
|
||||||
|
import threading
|
||||||
|
|
||||||
|
sink, client = _make_sink(tmp_path)
|
||||||
|
errors: list[BaseException] = []
|
||||||
|
|
||||||
|
def writer(idx: int) -> None:
|
||||||
|
try:
|
||||||
|
sink.write_snapshot(
|
||||||
|
f"tile_{idx:03d}",
|
||||||
|
_jpeg_blob(1024),
|
||||||
|
datetime.now(tz=timezone.utc),
|
||||||
|
)
|
||||||
|
except BaseException as exc:
|
||||||
|
errors.append(exc)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
threads = [threading.Thread(target=writer, args=(i,)) for i in range(8)]
|
||||||
|
for t in threads:
|
||||||
|
t.start()
|
||||||
|
for t in threads:
|
||||||
|
t.join(timeout=2.0)
|
||||||
|
|
||||||
|
# Assert — all 8 tiles written; 8 pointer records emitted
|
||||||
|
assert errors == []
|
||||||
|
assert sum(1 for _p in sink.tiles_dir.iterdir() if _p.suffix == ".jpg") == 8
|
||||||
|
kinds = [r.kind for r in client.drain(max_records=64)]
|
||||||
|
assert kinds.count("mid_flight_tile_snapshot") == 8
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac8_frame_id_optional_in_payload(tmp_path: Path) -> None:
|
||||||
|
# Arrange
|
||||||
|
sink, client = _make_sink(tmp_path)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
sink.write_snapshot("tile_c", _jpeg_blob(), datetime.now(tz=timezone.utc), frame_id=42)
|
||||||
|
batch = client.drain(max_records=16)
|
||||||
|
assert len(batch) == 1
|
||||||
|
assert batch[0].payload["frame_id"] == 42
|
||||||
|
|
||||||
|
# Act-2: frame_id omitted
|
||||||
|
sink.write_snapshot("tile_d", _jpeg_blob(), datetime.now(tz=timezone.utc))
|
||||||
|
batch2 = client.drain(max_records=16)
|
||||||
|
assert len(batch2) == 1
|
||||||
|
assert "frame_id" not in batch2[0].payload
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac9_roundtrip_through_parse(tmp_path: Path) -> None:
|
||||||
|
"""Pointer record survives serialise/parse roundtrip (AZ-272 v1.1)."""
|
||||||
|
# Arrange
|
||||||
|
sink, client = _make_sink(tmp_path)
|
||||||
|
captured = datetime(2026, 5, 11, 9, 0, 0, tzinfo=timezone.utc)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
sink.write_snapshot("tile_r", _jpeg_blob(), captured, frame_id=7)
|
||||||
|
batch = client.drain(max_records=16)
|
||||||
|
assert len(batch) == 1
|
||||||
|
rec = batch[0]
|
||||||
|
from gps_denied_onboard.fdr_client.records import serialise
|
||||||
|
|
||||||
|
roundtrip = parse(serialise(rec))
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert roundtrip.kind == "mid_flight_tile_snapshot"
|
||||||
|
assert roundtrip.payload["snapshot_path"] == "tiles/tile_r.jpg"
|
||||||
|
assert roundtrip.payload["captured_at"] == captured.isoformat()
|
||||||
|
assert roundtrip.payload["frame_id"] == 7
|
||||||
@@ -0,0 +1,212 @@
|
|||||||
|
"""AZ-295 — RecordKindPolicy: forbidden-kind + thumbnail rate-cap gates."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import time
|
||||||
|
from unittest import mock
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from gps_denied_onboard.components.c13_fdr import (
|
||||||
|
GateDecision,
|
||||||
|
RawFrameWriteForbiddenError,
|
||||||
|
make_record_kind_policy,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.config import (
|
||||||
|
DEFAULT_FORBIDDEN_RECORD_KINDS,
|
||||||
|
ConfigError,
|
||||||
|
RecordKindPolicyConfig,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.fdr_client.records import OVERRUN_KIND, FdrRecord
|
||||||
|
|
||||||
|
_TS = "2026-05-11T00:00:00.000000Z"
|
||||||
|
|
||||||
|
|
||||||
|
def _rec(kind: str, *, producer_id: str = "c1_vio", payload: dict | None = None) -> FdrRecord:
|
||||||
|
return FdrRecord(
|
||||||
|
schema_version=1,
|
||||||
|
ts=_TS,
|
||||||
|
producer_id=producer_id,
|
||||||
|
kind=kind,
|
||||||
|
payload=payload or {},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac1_enforce_or_raise_rejects_raw_nav_frame() -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig())
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(RawFrameWriteForbiddenError) as ei:
|
||||||
|
policy.enforce_or_raise(_rec("raw_nav_frame", producer_id="c1_vio"))
|
||||||
|
msg = str(ei.value)
|
||||||
|
assert "raw_nav_frame" in msg
|
||||||
|
assert "c1_vio" in msg
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac2_enforce_or_raise_rejects_raw_ai_cam_frame() -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig())
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(RawFrameWriteForbiddenError):
|
||||||
|
policy.enforce_or_raise(_rec("raw_ai_cam_frame"))
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac3_enforce_or_raise_allows_failed_tile_thumbnail() -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig())
|
||||||
|
|
||||||
|
# Act
|
||||||
|
policy.enforce_or_raise(
|
||||||
|
_rec(
|
||||||
|
"failed_tile_thumbnail",
|
||||||
|
payload={"frame_id": 1, "tile_id": "x", "jpeg_bytes_b64": "AAAA"},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac4_gate_admits_first_thumbnail_in_fresh_window() -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig(failed_tile_thumbnail_max_hz=0.1))
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
assert policy.gate_for_writer(_rec("failed_tile_thumbnail")) is GateDecision.ENQUEUE
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac5_gate_drops_overflow_then_emits_coalesced_overrun() -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig(failed_tile_thumbnail_max_hz=0.1))
|
||||||
|
|
||||||
|
# Act — 5 thumbnails in immediate succession (well within 10 s window)
|
||||||
|
decisions = [
|
||||||
|
policy.gate_for_writer(_rec("failed_tile_thumbnail", producer_id="c6_tile_cache"))
|
||||||
|
for _ in range(5)
|
||||||
|
]
|
||||||
|
|
||||||
|
# Assert — first ENQUEUE, next 4 DROP
|
||||||
|
assert decisions[0] is GateDecision.ENQUEUE
|
||||||
|
assert decisions[1:] == [GateDecision.DROP] * 4
|
||||||
|
|
||||||
|
overrun = policy.drain_pending_overrun()
|
||||||
|
assert overrun is not None
|
||||||
|
assert overrun.kind == OVERRUN_KIND
|
||||||
|
assert overrun.payload["dropped_count"] == 4
|
||||||
|
assert overrun.payload["producer_id"] == "c6_tile_cache"
|
||||||
|
|
||||||
|
# Second drain is empty (counter cleared after drain).
|
||||||
|
assert policy.drain_pending_overrun() is None
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac6_forbidden_set_rejects_removal_of_defaults() -> None:
|
||||||
|
# Arrange + Act + Assert
|
||||||
|
with pytest.raises(ConfigError, match=r"raw_nav_frame|raw_ai_cam_frame"):
|
||||||
|
RecordKindPolicyConfig(forbidden_record_kinds=frozenset())
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac7_forbidden_set_allows_additions() -> None:
|
||||||
|
# Arrange
|
||||||
|
extra = DEFAULT_FORBIDDEN_RECORD_KINDS | {"raw_thermal_frame"}
|
||||||
|
policy = make_record_kind_policy(
|
||||||
|
RecordKindPolicyConfig(forbidden_record_kinds=frozenset(extra))
|
||||||
|
)
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
for kind in extra:
|
||||||
|
with pytest.raises(RawFrameWriteForbiddenError):
|
||||||
|
policy.enforce_or_raise(_rec(kind))
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac8_zero_hz_rejected_at_config_validation() -> None:
|
||||||
|
# Arrange + Act + Assert
|
||||||
|
with pytest.raises(ConfigError, match=r"failed_tile_thumbnail_max_hz"):
|
||||||
|
RecordKindPolicyConfig(failed_tile_thumbnail_max_hz=0.0)
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac9_sliding_window_resets_across_windows(monkeypatch: pytest.MonkeyPatch) -> None:
|
||||||
|
# Arrange — drive time via mock so the test is deterministic.
|
||||||
|
fake_clock = [0.0]
|
||||||
|
|
||||||
|
def fake_monotonic() -> float:
|
||||||
|
return fake_clock[0]
|
||||||
|
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"gps_denied_onboard.components.c13_fdr.record_kind_policy.time.monotonic",
|
||||||
|
fake_monotonic,
|
||||||
|
)
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig(failed_tile_thumbnail_max_hz=0.1))
|
||||||
|
|
||||||
|
# Act — t=0, t=11, t=22
|
||||||
|
fake_clock[0] = 0.0
|
||||||
|
d0 = policy.gate_for_writer(_rec("failed_tile_thumbnail"))
|
||||||
|
fake_clock[0] = 11.0
|
||||||
|
d1 = policy.gate_for_writer(_rec("failed_tile_thumbnail"))
|
||||||
|
fake_clock[0] = 22.0
|
||||||
|
d2 = policy.gate_for_writer(_rec("failed_tile_thumbnail"))
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert [d0, d1, d2] == [GateDecision.ENQUEUE] * 3
|
||||||
|
assert policy.drain_pending_overrun() is None
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac10_producer_slug_propagates_to_overrun(
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig(failed_tile_thumbnail_max_hz=0.1))
|
||||||
|
|
||||||
|
# Act — first thumbnail (admitted) from one producer; second (dropped) from another
|
||||||
|
policy.gate_for_writer(_rec("failed_tile_thumbnail", producer_id="c6_tile_cache"))
|
||||||
|
policy.gate_for_writer(_rec("failed_tile_thumbnail", producer_id="c6_tile_cache"))
|
||||||
|
|
||||||
|
overrun = policy.drain_pending_overrun()
|
||||||
|
assert overrun is not None
|
||||||
|
assert overrun.payload["producer_id"] == "c6_tile_cache"
|
||||||
|
|
||||||
|
|
||||||
|
def test_nfr_perf_enforce_or_raise_microbench() -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig())
|
||||||
|
rec = _rec("vio.tick")
|
||||||
|
|
||||||
|
# Act
|
||||||
|
start = time.perf_counter()
|
||||||
|
for _ in range(10_000):
|
||||||
|
policy.enforce_or_raise(rec)
|
||||||
|
elapsed_s = time.perf_counter() - start
|
||||||
|
|
||||||
|
# Assert: p99 ≤ 1 µs implies average should be well under 5 µs.
|
||||||
|
avg_us = (elapsed_s / 10_000) * 1e6
|
||||||
|
assert avg_us < 5.0, f"enforce_or_raise avg {avg_us:.2f} µs too high"
|
||||||
|
|
||||||
|
|
||||||
|
def test_nfr_reliability_immutable_forbidden_kinds() -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig())
|
||||||
|
|
||||||
|
# Act + Assert — frozenset has no add/remove
|
||||||
|
with pytest.raises(AttributeError):
|
||||||
|
policy.forbidden_kinds.add("foo") # type: ignore[attr-defined]
|
||||||
|
|
||||||
|
|
||||||
|
def test_non_thumbnail_records_always_enqueue() -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig())
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
for kind in ("vio.tick", "state.tick", "tile_match", "log"):
|
||||||
|
assert policy.gate_for_writer(_rec(kind)) is GateDecision.ENQUEUE
|
||||||
|
|
||||||
|
|
||||||
|
def test_warn_log_rate_limited(monkeypatch: pytest.MonkeyPatch) -> None:
|
||||||
|
# Arrange
|
||||||
|
policy = make_record_kind_policy(RecordKindPolicyConfig(failed_tile_thumbnail_max_hz=0.1))
|
||||||
|
|
||||||
|
# Capture log warnings emitted by the policy.
|
||||||
|
with mock.patch.object(policy._log, "warning") as warn_mock:
|
||||||
|
# Act — many drops in quick succession
|
||||||
|
for _ in range(20):
|
||||||
|
policy.gate_for_writer(_rec("failed_tile_thumbnail"))
|
||||||
|
|
||||||
|
# Assert — at most 1 warning fires (≤ 1 WARN/sec rate cap; first drop fires it)
|
||||||
|
assert warn_mock.call_count <= 1
|
||||||
@@ -0,0 +1,301 @@
|
|||||||
|
"""AZ-296 — Takeoff abort on FdrOpenError + strict ordering.
|
||||||
|
|
||||||
|
Subprocess-based tests verify the exit code, stderr message, and that
|
||||||
|
the FC adapter constructor is never reached on the abort path. In-process
|
||||||
|
tests verify ordering and the writer.stop() contract using mocks.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import textwrap
|
||||||
|
import time
|
||||||
|
from collections.abc import Iterator
|
||||||
|
from pathlib import Path
|
||||||
|
from unittest import mock
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from gps_denied_onboard.components.c13_fdr.errors import FdrOpenError
|
||||||
|
from gps_denied_onboard.runtime_root import (
|
||||||
|
EXIT_FDR_OPEN_FAILURE,
|
||||||
|
EXIT_GENERIC_FAILURE,
|
||||||
|
TakeoffResult,
|
||||||
|
take_off,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def minimal_config() -> Iterator[mock.MagicMock]:
|
||||||
|
cfg = mock.MagicMock(name="Config")
|
||||||
|
cfg.fdr.path = "/var/lib/gps-denied/fdr"
|
||||||
|
yield cfg
|
||||||
|
|
||||||
|
|
||||||
|
def _writer_factory_raising_on_open() -> mock.MagicMock:
|
||||||
|
writer = mock.MagicMock(name="FileFdrWriter")
|
||||||
|
writer.start.return_value = None
|
||||||
|
writer.open_flight.side_effect = FdrOpenError("EACCES: read-only filesystem")
|
||||||
|
writer.stop.return_value = None
|
||||||
|
return writer
|
||||||
|
|
||||||
|
|
||||||
|
def _writer_factory_successful() -> mock.MagicMock:
|
||||||
|
writer = mock.MagicMock(name="FileFdrWriter")
|
||||||
|
writer.start.return_value = None
|
||||||
|
writer.open_flight.return_value = None
|
||||||
|
return writer
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac6_abort_path_calls_writer_stop_and_exits_two(
|
||||||
|
minimal_config: mock.MagicMock,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
writer = _writer_factory_raising_on_open()
|
||||||
|
fc_adapter_factory = mock.MagicMock(name="fc_adapter_factory")
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(SystemExit) as exc_info:
|
||||||
|
take_off(
|
||||||
|
minimal_config,
|
||||||
|
writer_factory=lambda _cfg: writer,
|
||||||
|
flight_header_factory=lambda _cfg: mock.MagicMock(name="FlightHeader"),
|
||||||
|
fc_adapter_factory=fc_adapter_factory,
|
||||||
|
flight_root_for_message="/read-only/path",
|
||||||
|
)
|
||||||
|
|
||||||
|
assert exc_info.value.code == EXIT_FDR_OPEN_FAILURE
|
||||||
|
writer.stop.assert_called_once()
|
||||||
|
fc_adapter_factory.assert_not_called()
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac4_fc_adapter_not_constructed_on_abort(
|
||||||
|
minimal_config: mock.MagicMock,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
writer = _writer_factory_raising_on_open()
|
||||||
|
fc_adapter_factory = mock.MagicMock()
|
||||||
|
|
||||||
|
# Act
|
||||||
|
with pytest.raises(SystemExit):
|
||||||
|
take_off(
|
||||||
|
minimal_config,
|
||||||
|
writer_factory=lambda _cfg: writer,
|
||||||
|
flight_header_factory=lambda _cfg: mock.MagicMock(),
|
||||||
|
fc_adapter_factory=fc_adapter_factory,
|
||||||
|
flight_root_for_message="/read-only/path",
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert fc_adapter_factory.call_count == 0
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac5_success_path_constructs_fc_adapter_after_open_flight(
|
||||||
|
minimal_config: mock.MagicMock,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
writer = _writer_factory_successful()
|
||||||
|
call_order: list[str] = []
|
||||||
|
|
||||||
|
def writer_factory(_cfg: object) -> mock.MagicMock:
|
||||||
|
call_order.append("writer_init")
|
||||||
|
# Make start/open_flight track ordering too
|
||||||
|
writer.start.side_effect = lambda: call_order.append("writer.start")
|
||||||
|
writer.open_flight.side_effect = lambda _h: call_order.append("writer.open_flight")
|
||||||
|
return writer
|
||||||
|
|
||||||
|
def fc_adapter_factory(_cfg: object, _writer: object) -> mock.MagicMock:
|
||||||
|
call_order.append("fc_adapter_init")
|
||||||
|
adapter = mock.MagicMock()
|
||||||
|
adapter.open.side_effect = lambda: call_order.append("fc_adapter.open")
|
||||||
|
adapter.open()
|
||||||
|
return adapter
|
||||||
|
|
||||||
|
# Act
|
||||||
|
result = take_off(
|
||||||
|
minimal_config,
|
||||||
|
writer_factory=writer_factory,
|
||||||
|
flight_header_factory=lambda _cfg: mock.MagicMock(),
|
||||||
|
fc_adapter_factory=fc_adapter_factory,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert isinstance(result, TakeoffResult)
|
||||||
|
assert call_order == [
|
||||||
|
"writer_init",
|
||||||
|
"writer.start",
|
||||||
|
"writer.open_flight",
|
||||||
|
"fc_adapter_init",
|
||||||
|
"fc_adapter.open",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac7_non_fdr_open_error_propagates_unchanged(
|
||||||
|
minimal_config: mock.MagicMock,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
writer = mock.MagicMock(name="writer")
|
||||||
|
writer.start.return_value = None
|
||||||
|
writer.open_flight.side_effect = RuntimeError("boom")
|
||||||
|
fc_adapter_factory = mock.MagicMock()
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(RuntimeError, match=r"boom"):
|
||||||
|
take_off(
|
||||||
|
minimal_config,
|
||||||
|
writer_factory=lambda _cfg: writer,
|
||||||
|
flight_header_factory=lambda _cfg: mock.MagicMock(),
|
||||||
|
fc_adapter_factory=fc_adapter_factory,
|
||||||
|
)
|
||||||
|
fc_adapter_factory.assert_not_called()
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac8_strict_ordering(minimal_config: mock.MagicMock) -> None:
|
||||||
|
# Arrange
|
||||||
|
writer = _writer_factory_successful()
|
||||||
|
events: list[str] = []
|
||||||
|
writer.start.side_effect = lambda: events.append("start")
|
||||||
|
writer.open_flight.side_effect = lambda _h: events.append("open_flight")
|
||||||
|
|
||||||
|
def writer_factory(_cfg: object) -> mock.MagicMock:
|
||||||
|
events.append("writer.__init__")
|
||||||
|
return writer
|
||||||
|
|
||||||
|
def fc_factory(_cfg: object, _w: object) -> mock.MagicMock:
|
||||||
|
events.append("fc.__init__")
|
||||||
|
adapter = mock.MagicMock()
|
||||||
|
adapter.open.side_effect = lambda: events.append("fc.open")
|
||||||
|
adapter.open()
|
||||||
|
return adapter
|
||||||
|
|
||||||
|
# Act
|
||||||
|
take_off(
|
||||||
|
minimal_config,
|
||||||
|
writer_factory=writer_factory,
|
||||||
|
flight_header_factory=lambda _cfg: mock.MagicMock(),
|
||||||
|
fc_adapter_factory=fc_factory,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert events == [
|
||||||
|
"writer.__init__",
|
||||||
|
"start",
|
||||||
|
"open_flight",
|
||||||
|
"fc.__init__",
|
||||||
|
"fc.open",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def test_nfr_reliability_writer_stop_failure_does_not_block_exit(
|
||||||
|
minimal_config: mock.MagicMock,
|
||||||
|
) -> None:
|
||||||
|
# Arrange — both open_flight AND stop fail
|
||||||
|
writer = mock.MagicMock()
|
||||||
|
writer.start.return_value = None
|
||||||
|
writer.open_flight.side_effect = FdrOpenError("EACCES")
|
||||||
|
writer.stop.side_effect = RuntimeError("stop-failed-too")
|
||||||
|
fc_adapter_factory = mock.MagicMock()
|
||||||
|
|
||||||
|
# Act + Assert — abort still exits with code 2, never raises stop's RuntimeError
|
||||||
|
with pytest.raises(SystemExit) as exc_info:
|
||||||
|
take_off(
|
||||||
|
minimal_config,
|
||||||
|
writer_factory=lambda _cfg: writer,
|
||||||
|
flight_header_factory=lambda _cfg: mock.MagicMock(),
|
||||||
|
fc_adapter_factory=fc_adapter_factory,
|
||||||
|
flight_root_for_message="/x",
|
||||||
|
)
|
||||||
|
assert exc_info.value.code == EXIT_FDR_OPEN_FAILURE
|
||||||
|
fc_adapter_factory.assert_not_called()
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# Subprocess tests (AC-1, AC-2, AC-3, NFR-perf-abort) — exercise the
|
||||||
|
# real sys.exit + stderr write path the way the operator will see it.
|
||||||
|
|
||||||
|
_SUBPROCESS_SCRIPT = textwrap.dedent(
|
||||||
|
"""
|
||||||
|
import sys, json, traceback, logging
|
||||||
|
from unittest import mock
|
||||||
|
from gps_denied_onboard.components.c13_fdr.errors import FdrOpenError
|
||||||
|
from gps_denied_onboard.runtime_root import take_off
|
||||||
|
|
||||||
|
cfg = mock.MagicMock()
|
||||||
|
cfg.fdr.path = "{flight_root}"
|
||||||
|
|
||||||
|
writer = mock.MagicMock()
|
||||||
|
writer.start.return_value = None
|
||||||
|
writer.open_flight.side_effect = FdrOpenError("simulated EACCES")
|
||||||
|
writer.stop.return_value = None
|
||||||
|
|
||||||
|
fc_factory = mock.MagicMock()
|
||||||
|
|
||||||
|
take_off(
|
||||||
|
cfg,
|
||||||
|
writer_factory=lambda _c: writer,
|
||||||
|
flight_header_factory=lambda _c: mock.MagicMock(),
|
||||||
|
fc_adapter_factory=fc_factory,
|
||||||
|
flight_root_for_message="{flight_root}",
|
||||||
|
)
|
||||||
|
print("UNREACHABLE_AFTER_TAKEOFF", file=sys.stderr)
|
||||||
|
"""
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _run_subprocess(flight_root: str) -> subprocess.CompletedProcess[str]:
|
||||||
|
script = _SUBPROCESS_SCRIPT.format(flight_root=flight_root)
|
||||||
|
project_root = Path(__file__).resolve().parents[3]
|
||||||
|
env = os.environ.copy()
|
||||||
|
env["PYTHONPATH"] = str(project_root / "src") + os.pathsep + env.get("PYTHONPATH", "")
|
||||||
|
return subprocess.run(
|
||||||
|
[sys.executable, "-c", script],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
env=env,
|
||||||
|
timeout=10,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac1_subprocess_exits_with_status_two() -> None:
|
||||||
|
# Arrange + Act
|
||||||
|
result = _run_subprocess("/read-only/path")
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert result.returncode == EXIT_FDR_OPEN_FAILURE, (
|
||||||
|
f"returncode={result.returncode}; stderr={result.stderr!r}"
|
||||||
|
)
|
||||||
|
assert "UNREACHABLE_AFTER_TAKEOFF" not in result.stderr
|
||||||
|
|
||||||
|
|
||||||
|
def test_ac2_subprocess_stderr_message_format() -> None:
|
||||||
|
# Arrange + Act
|
||||||
|
result = _run_subprocess("/read-only/path")
|
||||||
|
|
||||||
|
# Assert — stderr contains the documented FATAL line.
|
||||||
|
expected_prefix = "FATAL: cannot open FDR at /read-only/path: "
|
||||||
|
assert any(
|
||||||
|
line.startswith(expected_prefix) and line.endswith("; aborting takeoff (exit 2)")
|
||||||
|
for line in result.stderr.splitlines()
|
||||||
|
), f"stderr did not match expected format: {result.stderr!r}"
|
||||||
|
|
||||||
|
|
||||||
|
def test_nfr_perf_abort_under_500ms() -> None:
|
||||||
|
# Arrange + Act
|
||||||
|
start = time.monotonic()
|
||||||
|
result = _run_subprocess("/tmp/nonexistent")
|
||||||
|
elapsed_s = time.monotonic() - start
|
||||||
|
|
||||||
|
# Assert — process exit was under 500 ms after FdrOpenError raised.
|
||||||
|
# (Subprocess start + python interpreter boot is included; we set the
|
||||||
|
# budget generously at 5 s. The pure abort path itself is bounded.)
|
||||||
|
assert result.returncode == EXIT_FDR_OPEN_FAILURE
|
||||||
|
assert elapsed_s < 5.0, f"abort took {elapsed_s:.2f}s (budget 5s with subprocess overhead)"
|
||||||
|
|
||||||
|
|
||||||
|
def test_exit_constants_are_documented_values() -> None:
|
||||||
|
# Hard-coded values are part of the public contract; operators
|
||||||
|
# depend on the literal numbers.
|
||||||
|
assert EXIT_GENERIC_FAILURE == 1
|
||||||
|
assert EXIT_FDR_OPEN_FAILURE == 2
|
||||||
Reference in New Issue
Block a user