gps-denied-onboard/_docs/03_implementation/batch_38_cycle1_report.md
Oleksandr Bezdieniezhnykh cde237e236 [AZ-317] [AZ-318] C11 upload-side: flight-state gate + per-flight key
Batch 38 (cycle 1) lands the two upload-side prerequisites the
upcoming AZ-319 TileUploader needs to authenticate per-flight
sessions against the parent suite's D-PROJ-2 ingest contract.

AZ-317 FlightStateGate:
- confirm_on_ground() defence-in-depth gate atop ADR-004 process
  isolation; fail-closed for UNKNOWN, IN_FLIGHT, TAKING_OFF,
  LANDING, and source-failure (mapped to UNKNOWN with original
  exception preserved on __cause__).
- ERROR log on refusal, INFO log on pass, single source call per
  invocation (no polling, no retry).

AZ-318 PerFlightKeyManager:
- Per-flight ephemeral Ed25519 keypair via the project-pinned
  cryptography library; sign(payload) -> 64-byte Ed25519 signature.
- Best-effort zeroisation of a project-controlled bytearray mirror
  on end_session; OpenSSL-side buffer freed via dropped reference.
- __del__ safety net with WARN log if end_session was missed.
- start_session emits FDR kind=c11.upload.session.key.public so the
  safety officer can correlate flights with key fingerprints.
- record_signature_rejection emits FDR + ERROR log on parent-suite
  ingest rejection (security-critical, never silently dropped).

Shared C11 plumbing:
- TileManagerError parent + 3 subclasses (FlightStateNotOnGroundError,
  SessionNotActiveError, SignatureRejectedError envelope).
- FlightStateSignal (str, Enum) and PublicKeyFingerprint DTOs.
- FlightStateSource Protocol on c11_tile_manager.interface.
- runtime_root.c11_factory factories for both new services.
- Two new FDR kinds registered in fdr_client.records central
  KNOWN_PAYLOAD_KEYS; AZ-272 schema-roundtrip fixtures added in
  lockstep so the central test stays green.

Tests: 26 new + 2 fixture additions; full suite 1384 passed, 80
skipped (documented Docker / Tier-2 / CUDA gates).

Code review: PASS_WITH_WARNINGS — 2 Low findings documented in
_docs/03_implementation/reviews/batch_38_review.md (dev-host vs
operator-workstation perf bound; spec text named StrEnum but
project pins Python 3.10).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 05:48:52 +03:00


Batch 38 — Cycle 1 Report

Date: 2026-05-13
Batch: 38 (two-task batch: the first two C11 upload-side prerequisites)
Tasks:

  • AZ-317 (C11 Flight-State Gate, 2pt)
  • AZ-318 (C11 Per-Flight Signing Key, 3pt)

Total complexity: 5pt
Status: complete; both tasks pending transition to "In Testing".

Scope

Batch 38 lands the two foundational pieces the upcoming AZ-319 TileUploader will need before it can authenticate a per-flight upload session against the parent suite's D-PROJ-2 ingest contract:

  • AZ-317: FlightStateGate.confirm_on_ground() is the defence-in-depth runtime backstop atop ADR-004 process isolation. It refuses the upload entry point when the flight controller is not on the ground. It fails closed for UNKNOWN, IN_FLIGHT, and the two transition states (TAKING_OFF, LANDING), and also when the source itself raises: the gate then raises with observed = UNKNOWN and preserves the source error on __cause__.

  • AZ-318: PerFlightKeyManager owns the per-flight ephemeral Ed25519 keypair lifecycle: generate the keypair at start_session, sign each tile via sign(payload), zero the project-controlled secret buffer on end_session (with a __del__ safety net), and surface parent-suite ingest rejections (SignatureRejectedError) via record_signature_rejection, which emits an FDR record and an ERROR log.

Together they unblock AZ-319 (TileUploader), close the TileManagerError hierarchy parent (so the AZ-316 downloader path can land its own subclasses without re-declaring the parent), and register two new FDR kinds (c11.upload.session.key.public, c11.upload.signature_rejected) in the central KNOWN_PAYLOAD_KEYS registry.
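The fail-closed behaviour of confirm_on_ground() can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: UNKNOWN, IN_FLIGHT, TAKING_OFF, and LANDING come from this report, while the ON_GROUND member name, the read_state() method on the source, and the constructor shapes are hypothetical.

```python
import logging
from enum import Enum
from typing import Protocol

logger = logging.getLogger("c11.flight_state_gate")


class FlightStateSignal(str, Enum):
    ON_GROUND = "ON_GROUND"      # assumed member name
    TAKING_OFF = "TAKING_OFF"
    IN_FLIGHT = "IN_FLIGHT"
    LANDING = "LANDING"
    UNKNOWN = "UNKNOWN"


class TileManagerError(Exception):
    """Shared C11 error parent."""


class FlightStateNotOnGroundError(TileManagerError):
    def __init__(self, observed: FlightStateSignal) -> None:
        super().__init__(f"upload refused: flight state is {observed.value}")
        self.observed = observed


class FlightStateSource(Protocol):
    # Hypothetical method name; the report only names the Protocol itself.
    def read_state(self) -> FlightStateSignal: ...


class FlightStateGate:
    def __init__(self, source: FlightStateSource) -> None:
        self._source = source

    def confirm_on_ground(self) -> None:
        """Return only for ON_GROUND; every other outcome raises."""
        try:
            observed = self._source.read_state()  # one call: no polling, no retry
        except Exception as exc:
            logger.error("flight-state source failed; refusing upload")
            # Source failure maps to UNKNOWN; the original error rides on __cause__.
            raise FlightStateNotOnGroundError(FlightStateSignal.UNKNOWN) from exc
        if observed is not FlightStateSignal.ON_GROUND:
            logger.error("upload refused: flight state is %s", observed.value)
            raise FlightStateNotOnGroundError(observed)
        logger.info("flight state ON_GROUND; upload permitted")
```

The identity check against a single allowed member is what makes the gate fail closed: any state the sketch does not explicitly recognise as ON_GROUND is refused, including values a future source might add.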

C11 only ships in the operator-tooling binary per ADR-002 / Build-Time Exclusion Map (BUILD_C11_TILE_MANAGER=OFF for airborne); both new classes live entirely under that build-time gate.

Architectural Decisions

1. TileManagerError parent declared in this batch

AZ-317 and AZ-318 both need typed errors. The natural home for the shared TileManagerError base is the C11 errors module, but some earlier plans had AZ-316 (downloader) shipping before this batch. To avoid a forward dependency on AZ-316, the TileManagerError parent is declared here in errors.py together with three subclasses: FlightStateNotOnGroundError, SessionNotActiveError, and SignatureRejectedError (the last as a typed envelope for AZ-319's ingest-rejection path). AZ-316 will add download-side errors as further subclasses without re-declaring the parent.
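A sketch of the resulting hierarchy is shown below. The four class names are taken from this report; the docstrings are a reading of those names rather than the actual module text, and TileDownloadError is a hypothetical stand-in for whatever AZ-316 eventually adds.

```python
class TileManagerError(Exception):
    """Parent for every C11 tile-manager error (declared once, in errors.py)."""


class FlightStateNotOnGroundError(TileManagerError):
    """Raised by the flight-state gate when the upload is refused."""


class SessionNotActiveError(TileManagerError):
    """Assumed meaning: an operation needed an active signing session."""


class SignatureRejectedError(TileManagerError):
    """Typed envelope for a parent-suite ingest rejection."""


# Hypothetical download-side error: a later batch subclasses the existing
# parent instead of declaring a second one.
class TileDownloadError(TileManagerError):
    """Illustrative name only; not taken from the report."""


def classify(exc: Exception) -> str:
    """Callers can handle the whole C11 family through the shared parent."""
    return "c11" if isinstance(exc, TileManagerError) else "other"
```

Declaring the parent once means upload-side and download-side callers can share a single except TileManagerError clause.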

2. FlightStateSignal uses (str, Enum) not StrEnum

The AZ-317 spec named enum.StrEnum (3.11+). The project pins Python 3.10 (pyproject.toml requires-python = ">=3.10,<3.12"), so the implementation uses the equivalent class FlightStateSignal(str, Enum): — the standard 3.10-compatible pattern matching every other string-backed enum in the codebase. Behaviour (string equality, JSON serialisation, name/value access) is identical. Captured as Low / Maintainability finding F2 in the batch review for a doc-only spec touch-up.
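The equivalence can be checked with a short sketch. The member list here is abbreviated and the ON_GROUND name is an assumption; only the (str, Enum) pattern itself is taken from this report.

```python
import json
from enum import Enum


class FlightStateSignal(str, Enum):
    ON_GROUND = "ON_GROUND"   # member name assumed for illustration
    IN_FLIGHT = "IN_FLIGHT"
    UNKNOWN = "UNKNOWN"


state = FlightStateSignal.IN_FLIGHT

equals_plain_string = state == "IN_FLIGHT"            # string equality
as_json = json.dumps({"state": state})                # serialises as the value
round_tripped = FlightStateSignal(json.loads(as_json)["state"])
```

One difference outside the behaviours listed above: on Python 3.10, str() of a (str, Enum) member returns the qualified name (FlightStateSignal.IN_FLIGHT), so code that needs the bare string should read .value.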

3. PerFlightKeyManager keeps a project-controlled bytearray mirror for testable zeroisation

cryptography.Ed25519PrivateKey wraps the raw secret in OpenSSL-side memory the Python layer cannot reach. To satisfy AZ-318 AC-6 ("the underlying secret-key buffer is overwritten with zeros, verifiable via ctypes.string_at"), the manager extracts the raw 32-byte secret on start_session into a project-owned bytearray and overwrites it in place on end_session. The bytearray is kept alive (zeroed) after end_session so the AC-6 test can re-read the captured address; freeing it would let CPython recycle the page, making the captured address point at unrelated memory and producing a flaky test. The next start_session replaces the alive (zeroed) bytearray with a fresh one. The OpenSSL-side buffer is freed when self._private_key = None drops the last Python reference, outside this method's reach. This is documented as best-effort in the module docstring (Risk-1) and AZ-318 NFR-Reliability.
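The mirror technique can be illustrated with the standard library alone. This is a sketch under stated assumptions: SecretMirror and its method names are hypothetical, and os.urandom(32) stands in for the raw 32-byte secret that the real manager extracts from the cryptography key object.

```python
import ctypes
import os


class SecretMirror:
    """Holds a project-owned copy of a secret that can be zeroed in place."""

    def __init__(self, raw_secret: bytes) -> None:
        self._buf = bytearray(raw_secret)

    def address(self) -> int:
        # from_buffer exposes the bytearray's storage without copying it.
        view = (ctypes.c_char * len(self._buf)).from_buffer(self._buf)
        return ctypes.addressof(view)

    def zeroise(self) -> None:
        # Overwrite byte by byte. The bytearray is never resized or replaced,
        # so the address captured earlier still points at its storage.
        for i in range(len(self._buf)):
            self._buf[i] = 0


secret = os.urandom(32)            # stand-in for the extracted Ed25519 secret
mirror = SecretMirror(secret)
addr = mirror.address()
before = ctypes.string_at(addr, 32)
mirror.zeroise()
after = ctypes.string_at(addr, 32)  # mirror is still alive, so the read is valid
```

The sketch also shows why the approach is only best-effort: the immutable bytes object used to seed the mirror (secret here) is a second copy that Python code cannot overwrite, just as the OpenSSL-side buffer is out of reach.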

4. sign p99 NFR test bound is dev-host portable (1 ms), not the strict 200 µs spec budget

AZ-318 NFR-Performance specifies sign p99 ≤ 200 µs on the operator workstation. On this dev host (macOS dev laptop, CPython 3.10.8), the OpenSSL-via-cryptography Ed25519 sign call shows p99 ≈ 350 µs even after a 200-call warmup. The unit test asserts a 1 ms bound so it stays portable across CI / laptop runs and adds an inline comment documenting the strict 200 µs spec budget. Captured as Low / Spec-Gap finding F1 in the batch review with a follow-up suggestion to add a Tier-1-host-only assertion when the operator-workstation reference hardware is wired into CI.
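The measurement pattern described here (warm up, time each call, take the 99th percentile, compare against the portable bound) might look like the sketch below. The helper names and the iteration count are assumptions, and a SHA-256 digest stands in for the Ed25519 sign call so the example needs only the standard library.

```python
import hashlib
import time
from typing import Callable, List


def p99(samples: List[int]) -> int:
    """Nearest-rank 99th percentile, using integer arithmetic."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100   # ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def measure_p99_ns(fn: Callable[[bytes], object], payload: bytes,
                   warmup: int = 200, iterations: int = 2000) -> int:
    for _ in range(warmup):                  # discard cold-start calls
        fn(payload)
    timings = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn(payload)
        timings.append(time.perf_counter_ns() - start)
    return p99(timings)


STRICT_BUDGET_NS = 200_000      # 200 µs spec budget (operator workstation)
PORTABLE_BOUND_NS = 1_000_000   # 1 ms bound asserted on dev hosts and CI

# Stand-in workload; the real test would time the key manager's sign(payload).
observed_ns = measure_p99_ns(lambda p: hashlib.sha256(p).digest(), b"x" * 4096)
within_portable_bound = observed_ns <= PORTABLE_BOUND_NS
```

Keeping both constants in the test file makes the gap between the asserted bound and the spec budget visible, which is the purpose of the inline comment mentioned above.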

5. Composition root keeps the c11 import boundary

runtime_root/c11_factory.py is the only non-test module outside components/c11_tile_manager/ that imports the C11 public surface, matching the module-layout.md rule that only runtime_root.py (and its delegated factories) may import a component's concrete impl. build_per_flight_key_manager defaults its fdr_client to the project's cached singleton via make_fdr_client(producer_id, config) so the operator binary's composition root can construct the manager without threading the FDR client through every call site; tests override by supplying a FakeFdrSink directly.
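The defaulting behaviour of the factory can be sketched as follows. Only the names build_per_flight_key_manager, make_fdr_client, and FakeFdrSink appear in this report; the emit() method, the class bodies, the "c11" producer id, and the omission of the config argument are simplifying assumptions.

```python
from functools import lru_cache
from typing import Any, Dict, List, Optional, Protocol, Tuple


class FdrSink(Protocol):
    # Hypothetical interface; the report does not show the real method names.
    def emit(self, kind: str, payload: Dict[str, Any]) -> None: ...


class _RecorderClient:
    """Stand-in for the real FDR client."""

    def __init__(self, producer_id: str) -> None:
        self.producer_id = producer_id

    def emit(self, kind: str, payload: Dict[str, Any]) -> None:
        pass  # the real client would write to the flight data recorder


@lru_cache(maxsize=None)
def make_fdr_client(producer_id: str) -> FdrSink:
    # Cached per producer id, so repeated calls share one instance.
    return _RecorderClient(producer_id)


class PerFlightKeyManager:
    def __init__(self, fdr_client: FdrSink) -> None:
        self.fdr_client = fdr_client


def build_per_flight_key_manager(
    fdr_client: Optional[FdrSink] = None, producer_id: str = "c11"
) -> PerFlightKeyManager:
    if fdr_client is None:
        fdr_client = make_fdr_client(producer_id)   # production default
    return PerFlightKeyManager(fdr_client)


class FakeFdrSink:
    """Test double that records what would have been emitted."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, Dict[str, Any]]] = []

    def emit(self, kind: str, payload: Dict[str, Any]) -> None:
        self.records.append((kind, payload))
```

In a test, build_per_flight_key_manager(fdr_client=FakeFdrSink()) bypasses the cached singleton entirely, so assertions can inspect the recorded kinds without touching global state.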

6. New FDR kinds registered in the central registry

fdr_client/records.py got two new entries in KNOWN_PAYLOAD_KEYS (c11.upload.session.key.public, c11.upload.signature_rejected). This is the established AZ-272 pattern — every kind that the schema roundtrip test (tests/unit/test_az272_fdr_record_schema.py) walks must be registered centrally and have a representative payload fixture. Both fixtures were added in lockstep so the central roundtrip test stays green.
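The registry-plus-fixture pattern can be sketched like this. The two kind strings are taken from this report; the payload key names, the fixture values, and the roundtrip_problems helper are hypothetical and only illustrate the shape of the check.

```python
import json
from typing import Any, Dict, FrozenSet, List

# Kind names are from the report; the payload keys are illustrative guesses.
KNOWN_PAYLOAD_KEYS: Dict[str, FrozenSet[str]] = {
    "c11.upload.session.key.public": frozenset({"flight_id", "fingerprint"}),
    "c11.upload.signature_rejected": frozenset({"flight_id", "reason"}),
}

# One representative payload per registered kind (made-up values).
FIXTURES: Dict[str, Dict[str, Any]] = {
    "c11.upload.session.key.public": {"flight_id": "f-001", "fingerprint": "ab12"},
    "c11.upload.signature_rejected": {"flight_id": "f-001", "reason": "bad signature"},
}


def roundtrip_problems(
    registry: Dict[str, FrozenSet[str]], fixtures: Dict[str, Dict[str, Any]]
) -> List[str]:
    """Walk every registered kind and report what would fail the central test."""
    problems = []
    for kind, keys in registry.items():
        payload = fixtures.get(kind)
        if payload is None:
            problems.append(f"{kind}: no fixture")
            continue
        if set(payload) != keys:
            problems.append(f"{kind}: payload keys differ from registry")
        if json.loads(json.dumps(payload)) != payload:
            problems.append(f"{kind}: payload does not survive a JSON roundtrip")
    return problems
```

Registering a kind without its fixture yields a "no fixture" problem in this sketch, which mirrors why the registry entries and fixtures had to land in lockstep.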

Test Results

| Task | Files modified | Tests added | Tests pass | AC coverage |
|------|----------------|-------------|------------|-------------|
| AZ-317 | 3 prod + 1 test | 13 (8 AC + 1 NFR-perf + 4 NFR-rel) | 13/13 | 8/8 ACs + 2 NFRs |
| AZ-318 | 3 prod + 1 test | 13 (10 AC + 1 NFR-perf + 1 NFR-rel + 1 defensive) | 13/13 | 10/10 ACs + 2 NFRs |

Cross-cutting:

  • tests/unit/test_az272_fdr_record_schema.py — added 2 fixtures for the new C11 kinds; full 36-test schema suite green.
  • Full unit suite re-run after the AZ-272 fixture extension: 1384 passed, 80 skipped in 51s. Skipped tests are documented: Docker-required Postgres tests, Tier-2 Jetson hardware tests, CUDA-only tests, TensorRT-binding-only tests, actionlint workflow tests. None of the skips are caused by this batch.

Lints clean across all modified files.

Code Review Verdict

PASS_WITH_WARNINGS — see _docs/03_implementation/reviews/batch_38_review.md.

Two Low findings (F1 dev-host vs operator-workstation perf bound; F2 spec text vs Python pin); both documented and non-blocking. Zero Critical, High, or Medium findings.

Auto-Fix Attempts

0 — neither finding is auto-fix eligible per the implement skill's matrix.

Next Batch

Batch 38 archives AZ-317 + AZ-318 to _docs/02_tasks/done/. The next batch (39) will be computed against the dependency table. Likely candidates include AZ-319 (TileUploader, 5pt; depends on AZ-317 + AZ-318) or AZ-316 (HttpTileDownloader) if its dependencies are now satisfied.

Cumulative Review Cadence

Last cumulative review: cumulative_review_batches_34-36_cycle1_report.md. This is batch 38 — 2 batches in (37, 38). The K=3 cumulative review will trigger after batch 39.