[AZ-317] [AZ-318] C11 upload-side: flight-state gate + per-flight key

Batch 38 (cycle 1) lands the two upload-side prerequisites the
upcoming AZ-319 TileUploader needs to authenticate per-flight
sessions against the parent suite's D-PROJ-2 ingest contract.

AZ-317 FlightStateGate:
- confirm_on_ground() defence-in-depth gate atop ADR-004 process
  isolation; fail-closed for UNKNOWN, IN_FLIGHT, TAKING_OFF,
  LANDING, and source-failure (mapped to UNKNOWN with original
  exception preserved on __cause__).
- ERROR log on refusal, INFO log on pass, single source call per
  invocation (no polling, no retry).

AZ-318 PerFlightKeyManager:
- Per-flight ephemeral Ed25519 keypair via the project-pinned
  cryptography library; sign(payload) -> 64-byte Ed25519 signature.
- Best-effort zeroisation of a project-controlled bytearray mirror
  on end_session; OpenSSL-side buffer freed via dropped reference.
- __del__ safety net with WARN log if end_session was missed.
- start_session emits FDR kind=c11.upload.session.key.public so the
  safety officer can correlate flights with key fingerprints.
- record_signature_rejection emits FDR + ERROR log on parent-suite
  ingest rejection (security-critical, never silently dropped).

Shared C11 plumbing:
- TileManagerError parent + 3 subclasses (FlightStateNotOnGroundError,
  SessionNotActiveError, SignatureRejectedError envelope).
- FlightStateSignal (str, Enum) and PublicKeyFingerprint DTOs.
- FlightStateSource Protocol on c11_tile_manager.interface.
- runtime_root.c11_factory factories for both new services.
- Two new FDR kinds registered in fdr_client.records central
  KNOWN_PAYLOAD_KEYS; AZ-272 schema-roundtrip fixtures added in
  lockstep so the central test stays green.

Tests: 26 new + 2 fixture additions; full suite 1384 passed, 80
skipped (documented Docker / Tier-2 / CUDA gates).

Code review: PASS_WITH_WARNINGS — 2 Low findings documented in
_docs/03_implementation/reviews/batch_38_review.md (dev-host vs
operator-workstation perf bound; spec text named StrEnum but
project pins Python 3.10).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-13 05:48:52 +03:00
parent ca0430a44d
commit cde237e236
16 changed files with 1936 additions and 8 deletions
@@ -1,171 +0,0 @@
# C11 Flight-State Gate — ON_GROUND Defence-in-Depth for Upload
**Task**: AZ-317_c11_flight_state_gate
**Name**: C11 Flight-State Gate
**Description**: Implement the `flight_state == ON_GROUND` precondition check that `TileUploader.upload_pending_tiles` calls before any network egress. Defines a thin C11-internal `FlightStateSource` Protocol with one method `current_flight_state() -> FlightStateSignal`; the concrete impl is supplied by E-C8 later (subscribes to the FC adapter's flight-state stream). The gate raises `FlightStateNotOnGroundError` if the current state is anything other than `ON_GROUND` (`IN_FLIGHT`, `UNKNOWN`, `TAKING_OFF`, `LANDING` all block). Logs an ERROR with the observed state and refuses to proceed; this is defence-in-depth atop ADR-004's process-level isolation, NOT the primary control.
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module
**Component**: c11_tilemanager (epic AZ-251 / E-C11)
**Tracker**: AZ-317
**Epic**: AZ-251 (E-C11)
### Document Dependencies
- `_docs/02_document/components/12_c11_tilemanager/description.md` — § 2 `confirm_flight_state` method, § 5 `FlightStateNotOnGroundError`, § 7 ADR-004 process isolation as the primary control.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md` — ERROR log shape on refusal.
## Problem
Without an ON_GROUND gate at the upload entry point:
- AC-8.4 collapses partially: ADR-004 process isolation alone protects the airborne process from importing C11, but if the operator workstation accidentally triggers `upload_pending_tiles` while the FC reports `IN_FLIGHT` (e.g. operator started the upload during a pre-landing approach window), the upload would proceed — which is the exact scenario the safety case forbids.
- `RESTRICT-SAT-1` (no in-flight Service calls) loses one of its enforcement points; the operator workflow assumes uploads only happen when wheels are on the ground.
- `TileUploader` has no place to reach for "what is the FC saying right now?" without coupling tightly to E-C8's full FC adapter surface.
- The Risk-7 mitigation in description.md ("The FC believes it's airborne") becomes a documentation-only claim with no test surface.
This task delivers the gate as a thin pre-call hook. It does NOT implement the FC subscription itself (that's E-C8's job); it consumes whatever C8 ships via the `FlightStateSource` Protocol declared here.
## Outcome
- A `FlightStateSource` Protocol at `src/gps_denied_onboard/components/c11_tilemanager/interface.py` (re-exported from `__init__.py`):
```python
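from typing import Protocol, runtime_checkable
from ._types import FlightStateSignal  # enum defined in the sibling _types.py

@runtime_checkable
class FlightStateSource(Protocol):
    def current_flight_state(self) -> FlightStateSignal: ...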
```
- `FlightStateSignal` enum at `src/gps_denied_onboard/components/c11_tilemanager/_types.py`:
```python
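# StrEnum needs Python 3.11+; on a Python 3.10 pin, subclass (str, Enum) instead.
from enum import StrEnum


class FlightStateSignal(StrEnum):
    ON_GROUND = "on_ground"
    TAKING_OFF = "taking_off"
    IN_FLIGHT = "in_flight"
    LANDING = "landing"
    UNKNOWN = "unknown"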
```
- A `FlightStateGate` class at `src/gps_denied_onboard/components/c11_tilemanager/flight_state_gate.py`:
  - Constructor: `__init__(self, *, source: FlightStateSource, logger: Logger)`.
  - One public method: `confirm_on_ground() -> FlightStateSignal`. Returns `FlightStateSignal.ON_GROUND` on pass; raises `FlightStateNotOnGroundError(observed: FlightStateSignal, observed_at: datetime)` on fail.
  - Emits an ERROR log on every refusal with `kind="c11.upload.refused.flight_state"` carrying `{observed, observed_at_iso}`.
  - Emits an INFO log on pass with `kind="c11.upload.flight_state_confirmed"`.
- `FlightStateNotOnGroundError` defined at `src/gps_denied_onboard/components/c11_tilemanager/errors.py`. Subclasses `TileManagerError` (the C11 error family parent declared in AZ-316).
- The gate is integrated by the TileUploader task (separate task; called once per `upload_pending_tiles` invocation BEFORE any C6 read or network setup).
- Composition root constructs `FlightStateGate` with a `FlightStateSource` impl supplied by the C8 adapter wiring (when E-C8 ships). For now, a fake-source pattern is documented in this task's tests; the production wiring is a one-line factory swap.
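- A `Clock` injection is NOT needed here: the gate reads "now" via `datetime.utcnow()` at the call site for the error's `observed_at` timestamp, which is purely diagnostic and not a control surface. Note that `datetime.utcnow()` returns a naive datetime and is deprecated since Python 3.12; `datetime.now(timezone.utc)` yields the timezone-aware UTC value that AC-7 describes.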
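
The outcome above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: the stdlib `logging.Logger` stands in for the project logger, `kind` is passed through `extra`, `TileManagerError` is a local stand-in for the AZ-316 parent, and `FakeSource` is a hypothetical test double. The enum uses `(str, Enum)` so the sketch also runs on Python 3.10.

```python
import logging
from datetime import datetime, timezone
from enum import Enum
from typing import Protocol, runtime_checkable


class FlightStateSignal(str, Enum):  # (str, Enum) keeps the sketch 3.10-compatible
    ON_GROUND = "on_ground"
    TAKING_OFF = "taking_off"
    IN_FLIGHT = "in_flight"
    LANDING = "landing"
    UNKNOWN = "unknown"


@runtime_checkable
class FlightStateSource(Protocol):
    def current_flight_state(self) -> FlightStateSignal: ...


class TileManagerError(Exception):
    """Local stand-in for the C11 error family parent."""


class FlightStateNotOnGroundError(TileManagerError):
    def __init__(self, observed, observed_at):
        super().__init__(f"Upload refused: flight state is {observed.name}")
        self.observed = observed
        self.observed_at = observed_at


class FlightStateGate:
    def __init__(self, *, source, logger):
        self._source = source
        self._logger = logger

    def confirm_on_ground(self):
        cause = None
        try:
            observed = self._source.current_flight_state()  # the only source call
        except Exception as exc:
            # Source failure is mapped to UNKNOWN; the original error is kept as the cause.
            observed, cause = FlightStateSignal.UNKNOWN, exc
        if observed is FlightStateSignal.ON_GROUND:
            self._logger.info(
                "flight state confirmed",
                extra={"kind": "c11.upload.flight_state_confirmed"},
            )
            return observed
        observed_at = datetime.now(timezone.utc).replace(microsecond=0)
        detail = f" ({cause})" if cause is not None else ""
        self._logger.error(
            "upload refused: flight state is %s%s", observed.name, detail,
            extra={"kind": "c11.upload.refused.flight_state"},
        )
        raise FlightStateNotOnGroundError(observed, observed_at) from cause


class FakeSource:
    """Hypothetical test double returning a fixed state."""

    def __init__(self, state):
        self._state = state

    def current_flight_state(self):
        return self._state


gate = FlightStateGate(
    source=FakeSource(FlightStateSignal.ON_GROUND),
    logger=logging.getLogger("c11.sketch"),
)
confirmed = gate.confirm_on_ground()
```

Comparing with `is` against the enum member means anything other than the exact `ON_GROUND` member falls through to the refusal path, which matches the fail-closed intent.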
## Scope
### Included
- `FlightStateSource` Protocol (single method `current_flight_state`).
- `FlightStateSignal` enum (5 states).
- `FlightStateGate` class with `confirm_on_ground()` method.
- `FlightStateNotOnGroundError` definition.
- ERROR log on refusal; INFO log on pass.
- Composition-root entry for the gate (factory `build_flight_state_gate(source) -> FlightStateGate`).
- Conformance test for `FlightStateSource` Protocol against a fake.
### Excluded
- The actual FC subscription (subscribing to MAVLink heartbeat or equivalent) — owned by E-C8.
- The TileUploader integration of this gate — owned by the TileUploader task in this epic.
- ADR-004 build-time exclusion enforcement — owned by E-BOOT.
- Sector boundary or any geographic awareness — gate is state-only.
- Mid-upload re-checks (the gate fires once at start; in-progress uploads are NOT torn down if the FC transitions mid-upload, per the operator workflow which expects atomic batches).
## Acceptance Criteria
**AC-1: ON_GROUND passes**
Given a `FlightStateSource` returning `ON_GROUND`
When `confirm_on_ground()` is called
Then the call returns `FlightStateSignal.ON_GROUND`; no exception is raised; ONE INFO log `kind="c11.upload.flight_state_confirmed"` is emitted
**AC-2: IN_FLIGHT raises**
Given a `FlightStateSource` returning `IN_FLIGHT`
When `confirm_on_ground()` is called
Then `FlightStateNotOnGroundError` is raised with `observed = IN_FLIGHT`; ONE ERROR log `kind="c11.upload.refused.flight_state"` is emitted; the exception message names the observed state
**AC-3: UNKNOWN raises (fail-closed)**
Given a `FlightStateSource` returning `UNKNOWN`
When `confirm_on_ground()` is called
Then `FlightStateNotOnGroundError` is raised; the gate is fail-closed by design (UNKNOWN is treated as "not safe to upload")
**AC-4: TAKING_OFF and LANDING raise**
Given a `FlightStateSource` returning `TAKING_OFF` or `LANDING`
When `confirm_on_ground()` is called
Then `FlightStateNotOnGroundError` is raised in both cases; transition states are NOT treated as ON_GROUND
**AC-5: Source exception propagates with context**
Given a `FlightStateSource` whose `current_flight_state()` raises `RuntimeError("FC disconnected")`
When `confirm_on_ground()` is called
Then `FlightStateNotOnGroundError` is raised with `observed = UNKNOWN` (the gate maps source failure to UNKNOWN, not to the raw exception); the original `RuntimeError` is set as `__cause__` on the new exception; ONE ERROR log carries the original exception's message
**AC-6: FlightStateSource Protocol is conformance-checkable**
Given a class implementing `current_flight_state` returning `FlightStateSignal`
When `isinstance(impl, FlightStateSource)` is evaluated under `runtime_checkable`
Then the result is `True`; for a class missing the method, the result is `False`
**AC-7: Error carries diagnostic fields**
Given a refusal
When the test inspects the raised `FlightStateNotOnGroundError`
Then `exc.observed`, `exc.observed_at` (datetime, UTC, second-precision) are populated; the message starts with `"Upload refused: flight state is "` followed by the observed state name
**AC-8: Gate does not retry**
Given the source returns `IN_FLIGHT` then `ON_GROUND` on a hypothetical second call
When `confirm_on_ground()` is called once
Then `current_flight_state()` is called EXACTLY once (verifiable via spy); the gate does NOT poll-and-retry
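
AC-6 and AC-8 could be exercised with test doubles along these lines. This is a sketch only: `SpySource` and `NotASource` are hypothetical names, and the Protocol is redeclared with a plain `str` return type to keep the block self-contained. Note that `runtime_checkable` only checks that the method exists; it does not verify the signature or the return type.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class FlightStateSource(Protocol):
    def current_flight_state(self) -> str: ...


class SpySource:
    """Hypothetical test double: replays scripted states and counts calls."""

    def __init__(self, *states):
        self._states = list(states)
        self.calls = 0

    def current_flight_state(self):
        self.calls += 1
        return self._states.pop(0)


class NotASource:
    """Lacks current_flight_state, so it must fail the conformance check."""

    def latest_heartbeat(self):
        return "on_ground"


spy = SpySource("in_flight", "on_ground")
conforms = isinstance(spy, FlightStateSource)          # method present
rejects = isinstance(NotASource(), FlightStateSource)  # method missing

# A single gate invocation would make exactly one call and see the first state.
first_observed = spy.current_flight_state()
```

A test for AC-8 would then assert `spy.calls == 1` after one `confirm_on_ground()`; the second scripted state exists only to prove it is never consumed.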
## Non-Functional Requirements
**Performance**
- `confirm_on_ground` p99 ≤ 1 ms when the source returns synchronously (the gate is a thin wrapper; its cost is dominated by the source's own implementation, which this task does not constrain).
**Compatibility**
- `FlightStateSignal` is a stdlib `StrEnum` (Python 3.11+); no `pydantic` or `attrs`.
- No new third-party dependencies.
**Reliability**
- The gate is fail-closed: UNKNOWN, transition states, and source-failures all block the upload. The cost of a false-block (skipped upload) is small; the cost of a false-pass (upload during flight) is unbounded per the safety case.
- The gate does NOT cache state across calls; each `confirm_on_ground()` invocation re-queries the source.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Fake source returns ON_GROUND | `confirm_on_ground` returns `ON_GROUND`; INFO log emitted |
| AC-2 | Fake source returns IN_FLIGHT | `FlightStateNotOnGroundError`; ERROR log; observed=IN_FLIGHT |
| AC-3 | Fake source returns UNKNOWN | `FlightStateNotOnGroundError` |
| AC-4 | Fake source returns TAKING_OFF; then LANDING | Both raise |
| AC-5 | Fake source raises `RuntimeError` | `FlightStateNotOnGroundError` with `observed=UNKNOWN`; `__cause__` set; ERROR log carries original message |
| AC-6 | `isinstance` check on conforming + non-conforming fakes | True / False |
| AC-7 | Inspect raised exception fields | `observed`, `observed_at` populated; message format correct |
| AC-8 | Spy on `current_flight_state` call count | Exactly 1 call per `confirm_on_ground` |
| NFR-perf | Microbench gate × 100k with synchronous fake | p99 ≤ 1 ms |
| NFR-reliability-fail-closed | Each non-ON_GROUND state | All raise; coverage matrix complete |
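
The NFR-perf row could be measured with a helper like the one below. It is a hedged sketch: `nearest_rank_percentile` and `measure_p99_ns` are hypothetical helper names, and the nearest-rank definition of p99 is an assumption, since the task does not say which percentile method to use.

```python
import time


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100) in integer arithmetic
    return ordered[rank - 1]


def measure_p99_ns(fn, iterations=100_000):
    """Time each call to fn with a monotonic clock and return the p99 latency in ns."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return nearest_rank_percentile(samples, 99)
```

A benchmark test would pass a bound `confirm_on_ground` of a gate wired to a synchronous fake and compare the result against `1_000_000` ns. The measured value depends on the host, so the bound is only meaningful on the hardware the requirement targets.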
## Constraints
- The gate is fail-closed for UNKNOWN and any source exception. Documented; opening this default would require a Choose A/B/C/D coordination with the safety reviewer.
- Transition states (`TAKING_OFF`, `LANDING`) are NOT treated as ON_GROUND — operators must wait until the FC reports `ON_GROUND`. This is intentional and documented; the operator workflow's typical pause between landing and upload-trigger covers it.
- The gate calls `current_flight_state()` exactly once per `confirm_on_ground` — no polling, no retries. Documented behaviour; the upper-layer TileUploader handles retries at the upload-batch level if the operator wants to retry after a fail.
- This task introduces no new third-party dependencies.
## Risks & Mitigation
**Risk 1: `FlightStateSource` Protocol surface diverges from C8's eventual impl**
- *Risk*: When E-C8 (AZ-261) ships its FC adapter, the natural public method might not be `current_flight_state` — could be `latest_heartbeat()` or `state_stream`.
- *Mitigation*: Document the Protocol as a thin C11-facing adapter; if C8's natural surface differs, an adapter class wraps it (`FlightStateSourceAdapter(c8_fc_adapter)` — owned by E-C8's wiring task). The Protocol's narrow surface (one method, one return type) makes adapting trivial.
**Risk 2: UNKNOWN state during FC link recovery is too aggressive**
- *Risk*: A transient FC connection blip causes UNKNOWN; operator sees a refusal during a perfectly-on-ground state.
- *Mitigation*: Documented as fail-closed; the operator workflow tolerates re-triggering the upload after the FC recovers (the upload journal preserves pending tiles between attempts). E-C8's FC adapter is responsible for state debouncing if the false-UNKNOWN rate is operationally too high; not C11's concern.
**Risk 3: Fail-closed during a real on-ground emergency upload**
- *Risk*: A safety officer urgently needs to trigger an upload but the FC is reporting UNKNOWN.
- *Mitigation*: Per architecture, no operational scenario exists where an upload MUST succeed during FC-disconnect. The pending-upload journal preserves data; the upload runs after the FC reconnects. No override flag is provided — adding one would weaken the safety case and require Choose A/B/C/D approval.
## Runtime Completeness
- **Named capability**: defence-in-depth ON_GROUND check at upload entry (description.md § 5; ADR-004; AC-8.4).
- **Production code that must exist**: real `FlightStateSource` Protocol declaration, real `FlightStateSignal` enum, real `FlightStateGate` class with logging, real `FlightStateNotOnGroundError` in the C11 error family.
- **Allowed external stubs**: tests MAY use a fake `FlightStateSource` impl (synchronous return value); production wiring uses the real C8 FC-adapter source via the composition root (when E-C8 ships).
- **Unacceptable substitutes**: a hardcoded "always ON_GROUND" source (defeats the entire point); polling the source N times to "average" the state (introduces TOCTOU windows where the FC transitions mid-poll); silently mapping `UNKNOWN` to `ON_GROUND` (defeats fail-closed); reading FC state from a static config file (the FC's actual telemetry IS the source of truth).
@@ -1,205 +0,0 @@
# C11 Per-Flight Signing Key — Generation + Sign + Zeroise
**Task**: AZ-318_c11_signing_key
**Name**: C11 Per-Flight Signing Key
**Description**: Implement the per-flight ephemeral signing key used by `TileUploader` to authenticate each uploaded tile against the parent suite's D-PROJ-2 ingest contract. `PerFlightKeyManager` generates one fresh Ed25519 keypair per flight at upload-session start, signs the multipart payload per tile, and zeroises the secret-key buffer in memory after the session completes (success OR failure). The public key is recorded in the FDR (`kind="c11.upload.session.key.public"`) so the safety officer can later correlate which key signed which tiles. On `SignatureRejectedError` from `satellite-provider`, the manager emits an FDR alert (`kind="c11.upload.signature_rejected"`) — security-critical event, never silently dropped. Uses the project-pinned `cryptography` library; no custom crypto.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-273_fdr_client_ringbuf
**Component**: c11_tilemanager (epic AZ-251 / E-C11)
**Tracker**: AZ-318
**Epic**: AZ-251 (E-C11)
### Document Dependencies
- `_docs/02_document/components/12_c11_tilemanager/description.md` — § 3.2 D-PROJ-2 contract sketch (signature requirement), § 5 `SignatureRejectedError`, § 7 R09 key-compromise mitigation.
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`: `kind="c11.upload.session.key.public"` and `kind="c11.upload.signature_rejected"` envelopes.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md` — INFO/ERROR log shapes for key lifecycle events.
- `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md` — D-PROJ-2 design task #1 (parent-suite ingest contract), specifically the `signature` field requirement.
## Problem
Without a per-flight ephemeral signing key:
- D-PROJ-2 contract sketch demands every uploaded tile carry a `signature` field; without it, `satellite-provider`'s ingest endpoint will reject every payload.
- The R09 risk (key compromise) is unmitigated — a single static API key would compromise every flight's uploads on first leak; per-flight keys bound the blast radius to one flight.
- The "ingest-side voting layer" (D-PROJ-2 design task #2) cannot trust uploaded tiles without a way to associate each tile with its source flight; the public key is the binding.
- AC-NEW-7 (cache-poisoning safety budget) loses one of its layers — the voting layer relies on per-flight keys to detect collusion (multiple compromised companions colluding becomes detectable when their key fingerprints differ from the safety officer's pre-flight enrolment record).
- Per `description.md` § 5: `SignatureRejectedError` is a security-critical event; without a structured handler, it would either crash the upload run or be silently caught.
- The C11-ST-03 security test (key zeroised after upload) has no implementation to verify against — without zeroisation, the secret-key bytes remain in heap memory long after the upload completes, increasing exfil window.
This task delivers the key lifecycle. It does NOT plumb the key into the upload payload (TileUploader task does that); it provides `sign(payload)` as the boundary.
## Outcome
- A `PerFlightKeyManager` class at `src/gps_denied_onboard/components/c11_tilemanager/signing_key.py`:
  - Constructor: `__init__(self, *, fdr_client: FdrClient, logger: Logger)`. No state at construction time.
  - `start_session(flight_id: uuid.UUID) -> PublicKeyFingerprint`:
    1. Generates a fresh Ed25519 keypair via `cryptography.hazmat.primitives.asymmetric.ed25519.Ed25519PrivateKey.generate()`.
    2. Stores the private key in `self._private_key` (instance state, not module-level).
    3. Computes `public_key_pem = private_key.public_key().public_bytes(...)`.
    4. Computes `fingerprint = sha256(public_key_pem).hexdigest()[:16]`.
    5. Emits FDR `kind="c11.upload.session.key.public"` with `{flight_id, public_key_pem, fingerprint, generated_at_iso}`.
    6. Emits INFO log `kind="c11.upload.session.key.generated"` with `{flight_id, fingerprint}` (NEVER the private key).
    7. Returns `PublicKeyFingerprint(flight_id, public_key_pem, fingerprint, generated_at)`.
  - `sign(payload: bytes) -> bytes`:
    1. Raises `SessionNotActiveError` if `self._private_key is None`.
    2. Returns `self._private_key.sign(payload)` (Ed25519 signature is 64 bytes).
    3. No log emission per call (would flood at upload throughput).
  - `end_session() -> None`:
    1. If `self._private_key is None`, no-op.
    2. Calls `self._zeroise_private_key()` (overwrites the secret-key bytes with zeros via `cryptography`'s key-deletion guidance, then sets `self._private_key = None`).
    3. Emits INFO log `kind="c11.upload.session.key.zeroised"`.
  - `record_signature_rejection(flight_id, tile_id) -> None`:
    1. Emits FDR `kind="c11.upload.signature_rejected"` with `{flight_id, tile_id, fingerprint, observed_at_iso}`.
    2. Emits ERROR log with the same payload.
- `PublicKeyFingerprint` DTO at `src/gps_denied_onboard/components/c11_tilemanager/_types.py`: a `@dataclass(frozen=True)` with the four fields above.
- `SessionNotActiveError` defined in `src/gps_denied_onboard/components/c11_tilemanager/errors.py` — subclasses `TileManagerError`. (`SignatureRejectedError` is also defined here, but raised by `TileUploader` after parsing the ingest response, NOT by this task.)
- The TileUploader task (separate) calls:
  - `start_session(flight_id)` once per upload run.
  - `sign(payload)` once per tile.
  - `record_signature_rejection(...)` on each per-tile rejection from the ingest response.
  - `end_session()` in a `finally` block guaranteeing zeroisation on success or failure.
- The composition root constructs `PerFlightKeyManager` and injects it into `TileUploader`. Factory: `build_per_flight_key_manager(fdr_client, logger) -> PerFlightKeyManager`.
- A `__del__` safety net calls `end_session()` if it was never explicitly called, with a WARN log noting the leak. This is a belt-and-braces guarantee, not the primary control.
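
The session lifecycle described above might look roughly like this. It is a sketch under assumptions, not the delivered class: `PerFlightKeyManagerSketch` and the `emit_fdr` callable are hypothetical stand-ins for the real manager and `FdrClient`, the PEM encoding choice (`SubjectPublicKeyInfo`) is an assumption because the spec elides the `public_bytes(...)` arguments, and logging, zeroisation, `record_signature_rejection`, and the `__del__` safety net are omitted. It requires the `cryptography` package.

```python
import hashlib
from datetime import datetime, timezone

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


class SessionNotActiveError(Exception):
    """Stand-in; the real class subclasses TileManagerError."""


class PerFlightKeyManagerSketch:
    def __init__(self, *, emit_fdr):
        self._emit_fdr = emit_fdr  # hypothetical callable(kind, payload)
        self._private_key = None
        self._fingerprint = None

    def start_session(self, flight_id):
        self._private_key = Ed25519PrivateKey.generate()
        public_key_pem = self._private_key.public_key().public_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PublicFormat.SubjectPublicKeyInfo,  # assumed format
        )
        self._fingerprint = hashlib.sha256(public_key_pem).hexdigest()[:16]
        self._emit_fdr("c11.upload.session.key.public", {
            "flight_id": str(flight_id),
            "public_key_pem": public_key_pem.decode("ascii"),
            "fingerprint": self._fingerprint,
            "generated_at_iso": datetime.now(timezone.utc).isoformat(),
        })
        return public_key_pem, self._fingerprint

    def sign(self, payload):
        if self._private_key is None:
            raise SessionNotActiveError("no active signing session")
        return self._private_key.sign(payload)  # 64-byte Ed25519 signature

    def end_session(self):
        if self._private_key is None:
            return  # idempotent: a second call is a no-op
        # Only drops the reference; the best-effort zeroisation step is omitted here.
        self._private_key = None
        self._fingerprint = None


records = []
manager = PerFlightKeyManagerSketch(
    emit_fdr=lambda kind, payload: records.append((kind, payload))
)
public_key_pem, fingerprint = manager.start_session("flight-001")
try:
    signature = manager.sign(b"tile-bytes")
finally:
    manager.end_session()  # runs on success or failure
```

The `try`/`finally` at the bottom mirrors the call pattern expected from TileUploader: the session is always closed, so a failed upload cannot leave the key alive.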
## Scope
### Included
- `PerFlightKeyManager` class (4 public methods + `__del__` safety net).
- `PublicKeyFingerprint` DTO.
- `SessionNotActiveError` definition.
- Ed25519 keypair generation using the project-pinned `cryptography` library.
- Best-effort zeroisation of the secret-key buffer (via `cryptography` library's recommended deletion path; documented as "best-effort" because Python heap zeroisation cannot be guaranteed without ctypes-level pinning).
- FDR emission on session start (public key) and on signature rejection.
- INFO log on session lifecycle events; ERROR log on signature rejection.
- Composition-root factory.
### Excluded
- The TileUploader integration (signing into multipart payloads) — owned by the TileUploader task.
- Pre-flight key enrolment workflow (the safety officer's record of expected per-flight public keys) — owned by C12 operator tooling.
- HSM / TPM-backed key storage — out of scope this cycle; the assumption is that the operator workstation's process is trusted enough for ephemeral in-memory keys, with zeroisation as the residual hygiene.
- Mid-session key rotation — one key per session; rotation requires `end_session` + `start_session`.
- Key persistence between processes — the key is in-memory ONLY; an upload session must complete in one process lifetime.
- Raising `SignatureRejectedError`: the class itself is defined here, but it is raised by TileUploader.
## Acceptance Criteria
**AC-1: `start_session` generates a fresh keypair and emits FDR**
Given a fresh `PerFlightKeyManager`
When `start_session(flight_id)` is called
Then the manager holds a non-None `_private_key`; `PublicKeyFingerprint` is returned with a 16-char hex fingerprint; ONE FDR `kind="c11.upload.session.key.public"` is emitted with the public-key PEM; ONE INFO log without the private key
**AC-2: Two consecutive sessions produce different keys**
Given `start_session(F1)` followed by `end_session()` followed by `start_session(F2)`
When fingerprints are compared
Then `fingerprint_F1 != fingerprint_F2` (cryptographically distinct keys); two FDR records are emitted, one per session
**AC-3: `sign` returns 64-byte Ed25519 signature**
Given an active session
When `sign(b"hello world")` is called
Then a 64-byte signature is returned; the signature verifies against the session's public key (verifiable via `Ed25519PublicKey.verify`)
**AC-4: `sign` before `start_session` raises**
Given a fresh `PerFlightKeyManager`
When `sign(b"...")` is called without prior `start_session`
Then `SessionNotActiveError` is raised; no signature is computed
**AC-5: `sign` after `end_session` raises**
Given `start_session(F)` then `end_session()`
When `sign(b"...")` is called
Then `SessionNotActiveError` is raised
**AC-6: `end_session` zeroises and emits log**
Given an active session
When `end_session()` is called
Then `self._private_key is None`; the underlying secret-key buffer is overwritten with zeros (verifiable via `ctypes.string_at` against the buffer address captured pre-zeroise); ONE INFO log `kind="c11.upload.session.key.zeroised"`
**AC-7: `__del__` safety net zeroises if `end_session` was missed**
Given an active session whose owner is garbage-collected without calling `end_session`
When the GC runs `__del__`
Then `end_session()` runs implicitly; ONE WARN log `kind="c11.upload.session.key.zeroised_via_finalizer"`; the buffer is zeroised
**AC-8: `record_signature_rejection` emits FDR + ERROR log**
Given an active session and a tile_id
When `record_signature_rejection(flight_id, tile_id)` is called
Then ONE FDR `kind="c11.upload.signature_rejected"` is emitted with `{flight_id, tile_id, fingerprint, observed_at_iso}`; ONE ERROR log with the same payload
**AC-9: Private key never logged anywhere**
Given the full session lifecycle
When all log records and all FDR records are captured
Then the private-key PEM does NOT appear in ANY record (verifiable via byte search across the captured stream)
**AC-10: `end_session` is idempotent**
Given an active session
When `end_session()` is called twice in a row
Then the second call is a no-op; no exception is raised; no second INFO log is emitted
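
The AC-9 byte search could be approached as sketched below, using only the standard library. `ListHandler` and `secret_appears` are hypothetical helpers, the PEM is a placeholder rather than a real key, and the FDR payloads are assumed to be available to the test as raw bytes.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log lines in memory so a test can scan them."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def secret_appears(secret: bytes, log_lines, fdr_payloads) -> bool:
    """Return True if the secret bytes occur in any captured log line or FDR payload."""
    haystack = "\n".join(log_lines).encode("utf-8") + b"\n" + b"\n".join(fdr_payloads)
    return secret in haystack


logger = logging.getLogger("c11.sketch.ac9")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Placeholder only; a real test would serialise the session's actual private key.
private_pem = b"-----BEGIN PRIVATE KEY-----\nPLACEHOLDER\n-----END PRIVATE KEY-----\n"

logger.info("session key generated fingerprint=%s", "0123456789abcdef")
fdr_payloads = [b'{"kind": "c11.upload.session.key.public", "fingerprint": "0123456789abcdef"}']

leaked = secret_appears(private_pem, handler.lines, fdr_payloads)
```

A PEM search only catches that one encoding. A stricter test might also search for the raw 32-byte seed and its hex and base64 forms.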
## Non-Functional Requirements
**Performance**
- `sign` p99 ≤ 200 µs on the operator workstation (Ed25519 is fast; the bottleneck is the upload network, not signing).
- `start_session` ≤ 5 ms (Ed25519 keygen is sub-millisecond; FDR emission + log emission dominate).
**Compatibility**
- `cryptography` library at the project-pinned version. Verify before adding; do NOT bump unilaterally.
- Ed25519 is available in `cryptography.hazmat.primitives.asymmetric.ed25519` since 2.6 — the project pin must be ≥ 2.6.
**Reliability**
- The manager guarantees zeroisation on `end_session` AND on `__del__` — both paths converge through the same `_zeroise_private_key` helper.
- The Python heap layer cannot guarantee bit-perfect zeroisation (immutable `bytes` copies of the key material cannot be overwritten in place and may linger until their memory is reused); this is documented. The mitigation is: keep the key buffer's lifetime as short as possible (one upload session) and rely on the OS-level memory protections (no swap on the operator workstation per RESTRICT-OPS-1).
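
The in-place overwrite and the AC-6 style check can be illustrated on a mutable buffer. This is a sketch under assumptions: the `bytearray` stands in for a project-controlled mirror of the 32-byte seed, `zeroise` is a hypothetical helper, and nothing here reaches buffers held inside the `cryptography` library or earlier immutable copies.

```python
import ctypes


def zeroise(buffer: bytearray) -> None:
    """Overwrite a mutable buffer in place; its length and allocation are unchanged."""
    for index in range(len(buffer)):
        buffer[index] = 0


secret = bytearray(b"\x11" * 32)  # stand-in for a 32-byte Ed25519 seed mirror

# Capture the buffer's address before zeroising, as AC-6 describes.
view = (ctypes.c_char * len(secret)).from_buffer(secret)
address = ctypes.addressof(view)
before = ctypes.string_at(address, len(secret))

zeroise(secret)
after = ctypes.string_at(address, len(secret))

del view  # release the exported buffer so the bytearray can be freed normally
```

Reading the same address before and after shows that the original bytes were overwritten rather than a fresh zero-filled object being bound to the name.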
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `start_session` then capture FDR + log | Public PEM in FDR; fingerprint 16 hex chars; private key not in log |
| AC-2 | Two sessions back-to-back | Different fingerprints |
| AC-3 | Sign + verify roundtrip | 64-byte signature; verifies against public key |
| AC-4 | `sign` without `start_session` | `SessionNotActiveError` |
| AC-5 | `sign` after `end_session` | `SessionNotActiveError` |
| AC-6 | `end_session` and inspect zeroised buffer | Buffer is all zeros; log emitted |
| AC-7 | Drop reference + force GC | `__del__` runs `end_session`; WARN log |
| AC-8 | `record_signature_rejection` | FDR + ERROR log with all fields |
| AC-9 | Capture all logs/FDR for a full session; byte-search private PEM | Not present |
| AC-10 | `end_session` twice | Second call is no-op; no second log |
| NFR-perf-sign | Microbench `sign` × 100k | p99 ≤ 200 µs |
| NFR-reliability-fingerprint-uniqueness | 1000 sessions with unique flight_ids | All 1000 fingerprints distinct (collision-resistant) |
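
For context on the uniqueness row: a 16-hex-character fingerprint carries 64 bits, so an accidental collision across 1000 sessions is extremely unlikely. The arithmetic can be sketched with the standard birthday-bound approximation; `collision_probability` is a hypothetical helper, and the estimate assumes fingerprints behave like uniformly random 64-bit values.

```python
def collision_probability(sessions: int, fingerprint_bits: int) -> float:
    """Birthday-bound approximation: pairs / 2**bits, valid while the result is small."""
    pairs = sessions * (sessions - 1) // 2
    return pairs / 2 ** fingerprint_bits


bits = 16 * 4  # 16 hex characters, 4 bits each
p_1000 = collision_probability(1000, bits)  # roughly 2.7e-14
```

The test therefore checks that distinct keys yield distinct fingerprints in practice. It does not probe the collision resistance of the truncated hash in any adversarial sense.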
## Constraints
- The signing algorithm is Ed25519; no per-task choice (the parent suite's D-PROJ-2 contract requires Ed25519 per the leftover file's design).
- The secret-key never leaves the manager — `sign(payload) -> bytes` is the only method that uses it; consumers do NOT touch the private key.
- The public key is logged AND FDR'd (it is public by definition); the private key is NOT logged anywhere — code-review treats any private-key reference outside `signing_key.py` as a `Security` finding (Critical).
- This task pins to the project's existing `cryptography` version. If the version doesn't support `Ed25519PrivateKey.generate()`, ASK the user before bumping (per `coderule.mdc` "verify the API actually exists in the pinned version").
- `__del__` is a safety net, NOT the primary contract — consumers MUST call `end_session()` explicitly. Code-review treats reliance on `__del__` as a `Reliability` finding.
## Risks & Mitigation
**Risk 1: Python heap zeroisation is not bit-perfect**
- *Risk*: The `cryptography` library returns the private key as a Python object wrapping an OpenSSL-side buffer; dropping the object does not guarantee that the key bytes, or any immutable `bytes` copies made of them, are overwritten before the memory is reused.
- *Mitigation*: Documented as "best-effort"; the operator workstation runs no-swap (RESTRICT-OPS-1); the key lifetime is bounded to one upload session (typically minutes); the residual exfil window is minimised. A future task could add ctypes-level pinning if the threat model tightens.
**Risk 2: `__del__` doesn't run when the process is killed (`SIGKILL`)**
- *Risk*: A SIGKILL during an active session leaves the key buffer in heap memory until the OS reclaims the process pages.
- *Mitigation*: Documented; the OS-level mitigation is process termination → memory pages reclaimed; on Linux with no swap, the bytes never hit disk. No software mitigation is feasible inside the killed process.
**Risk 3: FDR ringbuffer overrun loses the public-key record**
- *Risk*: Under FDR backpressure (AZ-274 overrun), the `kind="c11.upload.session.key.public"` record might be dropped — the safety officer cannot correlate the upload with a key fingerprint later.
- *Mitigation*: AZ-273's ringbuffer is sized per `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`; this task adds NO new pressure but is documented as critical-priority. Mid-flight FDR loss is already an AC-NEW-1 concern; this task surfaces the dependency.
**Risk 4: `cryptography` library API drift across pins**
- *Risk*: A minor `cryptography` bump renames `Ed25519PrivateKey.generate()` or changes its signature.
- *Mitigation*: The task verifies the API against the pinned version (per `coderule.mdc`); the pin is recorded in `requirements.txt`; a wrapper isolates the library to this single class.
**Risk 5: Replay attack — captured signed payloads re-uploaded by an attacker**
- *Risk*: An MITM captures a valid `(payload, signature)` pair and re-uploads to `satellite-provider`'s ingest endpoint.
- *Mitigation*: Out of scope for this task — the parent suite's ingest endpoint owns nonce / timestamp validation per the D-PROJ-2 design. C11 includes `capture_timestamp` in the signed payload (per the leftover file's contract sketch); the parent suite rejects timestamps outside its acceptance window. This task does NOT add a separate nonce.
## Runtime Completeness
- **Named capability**: per-flight ephemeral signing key per D-PROJ-2 contract, R09 mitigation, AC-NEW-7 voting-layer enabler (description.md § 7, leftover file design task #1).
- **Production code that must exist**: real `PerFlightKeyManager` with real Ed25519 keypair generation via `cryptography`, real `sign`, real best-effort zeroisation, real FDR emission for public-key + signature-rejection events, real `__del__` safety net.
- **Allowed external stubs**: tests MAY use a fake `FdrClient` (already provided by AZ-275 fake_fdr_sink) and a fake `Logger`; production wiring uses the real AZ-273 ringbuffer + AZ-266 logger.
- **Unacceptable substitutes**: a hardcoded shared key reused across flights (defeats R09 mitigation); a pseudo-random "key" generated from `random.getrandbits` instead of `cryptography`'s CSPRNG (rolling our own crypto is rejected per `coderule.mdc`); skipping `end_session` zeroisation (loses C11-ST-03 test surface); logging the private key for "debugging" (Critical Security finding).