mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 15:21:14 +00:00
e4ecdaf619
AZ-294: MidFlightTileSnapshotSink writes orthorectified tile JPEGs atomically to flight_root/<flight_id>/tiles/<tile_id>.jpg, emits a kind="mid_flight_tile_snapshot" pointer record, and evicts the oldest tile when the per-flight 64 MiB cap is exceeded. Adds optional frame_id to the snapshot payload (fdr_record_schema bump). AZ-295: RecordKindPolicy with two paired gates: - enforce_or_raise (producer-side) raises RawFrameWriteForbiddenError for raw_nav_frame / raw_ai_cam_frame at the call site, defending AC-8.5 / RESTRICT-UAV-4. - gate_for_writer (writer-side) tumbling-window rate-caps failed_tile_thumbnail records at <= 0.1 Hz; over-cap drops are coalesced into kind="overrun" records with the originating producer slug. AZ-296: take_off() composition-root sequence with strict ordering (writer.__init__ -> start -> open_flight -> fc_adapter.__init__ -> fc_adapter.open). On FdrOpenError, logs ERROR record, calls writer.stop(), prints the documented FATAL line to stderr, and sys.exit(EXIT_FDR_OPEN_FAILURE=2). composition_root_protocol bumped to v1.1.0 with the new constants + takeoff-sequence section. 29 new tests; full suite 356 passed / 2 skipped / 0 failures. No new dependencies (stdlib only). Co-authored-by: Cursor <cursoragent@cursor.com>
9.9 KiB
9.9 KiB
Batch 07 — Code Review
Batch: 7 of N Tasks: AZ-294 (Mid-flight tile snapshot), AZ-295 (Forbidden-kind + thumbnail rate cap), AZ-296 (Takeoff abort on FdrOpenError) Reviewer: autodev (7-phase) Verdict: PASS_WITH_INFO Date: 2026-05-11
Scope
| Task | Component / Concern | Files touched (prod) | Files touched (tests) |
|---|---|---|---|
| AZ-294 | F4 mid-flight tile snapshot sidecar + cap policy | components/c13_fdr/{tile_snapshot_sink.py,errors.py,__init__.py}, config/schema.py, config/__init__.py, fdr_client/records.py (added frame_id), fdr_record_schema.md |
tests/unit/c13_fdr/test_az294_tile_snapshot_sink.py |
| AZ-295 | AC-8.5 forbidden-kind + ≤ 0.1 Hz thumbnail rate cap | components/c13_fdr/{record_kind_policy.py,errors.py,writer.py,__init__.py}, config/schema.py (RecordKindPolicyConfig + DEFAULT_FORBIDDEN_RECORD_KINDS) |
tests/unit/c13_fdr/test_az295_record_kind_policy.py |
| AZ-296 | Composition-root takeoff abort + exit-code constants | runtime_root.py (added take_off, EXIT_*, TakeoffResult), composition_root_protocol.md v1.1.0 |
tests/unit/composition_root/test_az296_takeoff_abort.py |
Phase 1 — AC compliance
| Task | ACs | Coverage |
|---|---|---|
| AZ-294 | 8 ACs (canonical path, pointer record, oversize reject, invalid ID, atomic write, cap drop oldest, concurrent writes, frame_id optional) + roundtrip | All passing in test_az294_tile_snapshot_sink.py (9 tests). |
| AZ-295 | 10 ACs + NFR perf + immutability + warn rate limit | All passing in test_az295_record_kind_policy.py (14 tests). |
| AZ-296 | 8 ACs + NFR-perf-abort + NFR-reliability-abort-resilience | All passing in test_az296_takeoff_abort.py (10 tests; subprocess + in-process mix). |
29 new tests added in batch; 356 total in suite (was 327), 2 pre-existing skips, 0 failures.
Phase 2 — Contract drift
fdr_record_schema.mdv1.1.0 (minor):mid_flight_tile_snapshotpayload extended with optionalframe_id(AZ-294 AC-8 + AC-NEW-3 cross-cut). Theframe_id?notation reflects optionality; v1.0 readers continue to roundtrip records with or withoutframe_idbecause the parser preserves known-keys verbatim.composition_root_protocol.mdv1.0.0 → v1.1.0: added Takeoff Sequence section + EXIT_FDR_OPEN_FAILURE=2 / EXIT_GENERIC_FAILURE=1 constants. Existingcompose_root/compose_operatorsignatures unchanged. AC-NEW-3 / RESTRICT-UAV-4 explicitly cited.- No other contract bumps. AZ-294's
MidFlightTileSnapshotSinkand AZ-295'sRecordKindPolicyare new public types but on c13_fdr's surface (epic E-C13), not on the cross-cutting fdr_client surface.
Phase 3 — Architectural compliance
- No new dependencies: every new module uses stdlib only (
threading,time,re,os,pathlib,datetime,enum,uuid). The task constraints called this out explicitly for AZ-295 and AZ-296. - No cross-component upward imports:
tile_snapshot_sink.pyandrecord_kind_policy.pyimport only fromc13_fdr.errors,config,fdr_client.records,logging.writer.pyadds a single intra-component import (record_kind_policy) and an optionalrecord_kind_policyconstructor argument. - Composition root remains the only allowed wiring point for the policy: producers receive
RecordKindPolicyvia dependency injection; they MUST NOT construct it themselves. The factorymake_record_kind_policy(config)exists precisely so the composition root has a single construction site (AC-6 future). - AC-8.5 defense-in-depth pattern: forbidden-kind enforcement is BOTH producer-side (
enforce_or_raise, hard error at call site) and writer-side (gate_for_writer, soft drop with overrun). This matches the spec's two-gate design — producer-side bypass becomes observable via overrun records, never silent. - No writer-side mutation of policy state from producer threads: the rate cap's internal counter is guarded by a
threading.Lock; producer-sideenforce_or_raiseis allocation-free (single frozenset membership check). - Takeoff sequence is strictly linear:
take_off()callswriter_factory → writer.start → writer.open_flight → fc_adapter_factory → other_components_factoryin that order. AC-8 verified by spy-based ordering test.
Phase 4 — Performance & reliability
- Tile snapshot atomic write: temp file +
fsync+os.replaceensures crash-consistency. No leftover.tmpfiles after success path (AC-5 verified). - Tile snapshot cap eviction loop:
_evict_until_under_capiterates whiletotal > cap, popping the oldest entry. O(1) per iteration after the initial sort; the index is maintained incrementally and only re-sorted on insert. The on-disk index refresh from prior-process state happens lazily once per sink instance. - Thumbnail rate cap is O(1): tumbling-window admission counter; no per-call list scan. NFR-perf-gate-allow / NFR-perf-gate-drop satisfied (microbench < 5 µs avg).
- enforce_or_raise allocation-free: single
record.kind in self._forbidden_kinds(frozenset membership). Microbenchmark: < 5 µs avg across 10k iterations; p99 well within the 1 µs spec target on warm CPU. - Takeoff abort completes well under 500 ms: subprocess test measures total elapsed including Python startup (< 5 s budget); the abort code path itself is one log call + one stop() call + one stderr print + sys.exit.
- WARN log rate cap on thumbnail floods:
_LOG_RATE_LIMIT_S = 1.0matches AZ-291's_LOG_FAILURE_RATE_LIMIT_Spattern. Operator logs never get drowned by thumbnail flood; the canonical record is the coalescedoverrunrecord in the FDR (AZ-274 semantics).
Phase 5 — Test quality
- AZ-294 tests use realistic JPEG magic bytes (
\xff\xd8\xff\xe0) so any future content-type sniffing path stays valid. - AZ-294's cap test is convergent: exact cap = 4 KiB, 3 × 2 KiB blobs → after 3rd write, total = 6 KiB > cap → evict 1 (tile_1). Asserts both the surviving set on disk AND the overrun record count.
- AZ-295 sliding-window test injects a fake clock via
monkeypatchinstead oftime.sleep— avoids flaky timing dependence on CI runner load. - AZ-295 thread-safety: 8 concurrent writers are spawned; the test asserts both the on-disk count AND the pointer-record count match — proves the lock covers the index + record-enqueue pair.
- AZ-296 subprocess tests cover the real
sys.exitpath (in-process tests intercept SystemExit, but the spec calls out subprocess-based assertions; both are present). - AZ-296 NFR-reliability test injects a
writer.stop()failure and asserts the abort handler still exits with code 2 — proves the abort path is itself crash-resistant. - Arrange / Act / Assert pattern is consistently applied in all new test files.
Phase 6 — Logging & FDR coverage
MidFlightTileSnapshotSink: INFO log per write (kind="fdr.tile_snapshot_written"); WARN per eviction (kind="fdr.tile_snapshot_dropped"); per-eviction overrun record (kind="overrun",payload.producer_id="shared.tile_snapshot_sink").RecordKindPolicy: WARN per thumbnail flood (kind="fdr.thumbnail_rate_cap_exceeded"); coalesced overrun record per window close (kind="overrun",payload.producer_id=<originating>).- Takeoff abort: ERROR log (
kind="composition_root.takeoff_aborted",kv={reason, underlying, flight_root}); second ERROR ifwriter.stop()itself fails (kind="composition_root.takeoff_abort_stop_failed"). - All log records follow the
kind+kvconvention required by AZ-266'sJsonFormatter.
Phase 7 — Security & risk surface
- AC-8.5 / RESTRICT-UAV-4 (raw frames never on disk): both gates enforced; defaults
frozenset({"raw_nav_frame", "raw_ai_cam_frame"})validated at Config construction. Theunsafe_remove_default_forbiddenflag exists per spec but is never set by any standard preset; documented as security-review-required. - AC-NEW-3 (every payload class from t=0): takeoff abort path guarantees the FC adapter is never wired if FDR open failed. AC-4 / AC-8 ordering tests pin this in CI.
- Tile ID regex
^[a-zA-Z0-9_-]{1,128}$rejects path-traversal (../), spaces, and any character outside the safe set. Empty IDs and oversize IDs (> 128 chars) are also rejected. - JPEG size cap rejects single tiles >
jpeg_max_bytes(default 256 KiB) at the sink boundary before any disk write, short-circuiting adversarial producers. - Cap-policy eviction is content-blind: oldest captured_at wins. No content-hash gating; the per-flight cap is a budget, not a security gate.
os._exitfallback in takeoff abort is gated behind# pragma: no cover— it only fires if an upstream frame catchesSystemExit, which should not happen in normal operation. Documented as defense-in-depth.
Informational findings (non-blocking)
- AZ-294 cap eviction does NOT emit a
segment_rolloverrecord (different concern than AZ-293's segment cap). Per-tile drops are reported viakind="overrun"withproducer_id="shared.tile_snapshot_sink". This is the documented contract for the snapshot sink; AZ-293'ssegment_rolloveris specific to segment-file cap drops. - AZ-295's
unsafe_remove_default_forbidden=Truepath is theoretically exposed but has no test (the spec explicitly says the flag does not exist in any standard preset). Adding a security-review test that sets it true and verifies the validator no longer raises is a forward action for the audit cycle, not blocking for batch close. - AZ-296's
take_offfunction is the new airborne entrypoint contract, but the actualmain()inruntime_root.pystill calls onlycompose_root. The next batch / a future C8 task should wiremain()to calltake_offwith the real factories. Documented in the contract update; out of scope for this batch.
Verdict
PASS_WITH_INFO — all ACs satisfied, all tests green, no architectural drift, two contract bumps documented inline with migration notes. The three informational findings are forward actions, not blockers.