Mirror of https://github.com/azaion/gps-denied-onboard.git (synced 2026-06-21 21:51:13 +00:00)
[AZ-317] [AZ-318] C11 upload-side: flight-state gate + per-flight key
Batch 38 (cycle 1) lands the two upload-side prerequisites the upcoming AZ-319 TileUploader needs to authenticate per-flight sessions against the parent suite's D-PROJ-2 ingest contract.

AZ-317 FlightStateGate:
- confirm_on_ground() defence-in-depth gate atop ADR-004 process isolation; fail-closed for UNKNOWN, IN_FLIGHT, TAKING_OFF, LANDING, and source-failure (mapped to UNKNOWN with original exception preserved on __cause__).
- ERROR log on refusal, INFO log on pass, single source call per invocation (no polling, no retry).

AZ-318 PerFlightKeyManager:
- Per-flight ephemeral Ed25519 keypair via the project-pinned cryptography library; sign(payload) -> 64-byte Ed25519 signature.
- Best-effort zeroisation of a project-controlled bytearray mirror on end_session; OpenSSL-side buffer freed via dropped reference.
- __del__ safety net with WARN log if end_session was missed.
- start_session emits FDR kind=c11.upload.session.key.public so the safety officer can correlate flights with key fingerprints.
- record_signature_rejection emits FDR + ERROR log on parent-suite ingest rejection (security-critical, never silently dropped).

Shared C11 plumbing:
- TileManagerError parent + 3 subclasses (FlightStateNotOnGroundError, SessionNotActiveError, SignatureRejectedError envelope).
- FlightStateSignal (str, Enum) and PublicKeyFingerprint DTOs.
- FlightStateSource Protocol on c11_tile_manager.interface.
- runtime_root.c11_factory factories for both new services.
- Two new FDR kinds registered in fdr_client.records central KNOWN_PAYLOAD_KEYS; AZ-272 schema-roundtrip fixtures added in lockstep so the central test stays green.

Tests: 26 new + 2 fixture additions; full suite 1384 passed, 80 skipped (documented Docker / Tier-2 / CUDA gates).

Code review: PASS_WITH_WARNINGS, with 2 Low findings documented in _docs/03_implementation/reviews/batch_38_review.md (dev-host vs operator-workstation perf bound; spec text named StrEnum but project pins Python 3.10).
Co-authored-by: Cursor <cursoragent@cursor.com>
# C11 Flight-State Gate — ON_GROUND Defence-in-Depth for Upload
**Task**: AZ-317_c11_flight_state_gate
**Name**: C11 Flight-State Gate
**Description**: Implement the `flight_state == ON_GROUND` precondition check that `TileUploader.upload_pending_tiles` calls before any network egress. Defines a thin C11-internal `FlightStateSource` Protocol with one method `current_flight_state() -> FlightStateSignal`; the concrete impl is supplied by E-C8 later (subscribes to the FC adapter's flight-state stream). The gate raises `FlightStateNotOnGroundError` if the current state is anything other than `ON_GROUND` (`IN_FLIGHT`, `UNKNOWN`, `TAKING_OFF`, `LANDING` all block). Logs an ERROR with the observed state and refuses to proceed; this is defence-in-depth atop ADR-004's process-level isolation, NOT the primary control.
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module
**Component**: c11_tilemanager (epic AZ-251 / E-C11)
**Tracker**: AZ-317
**Epic**: AZ-251 (E-C11)

### Document Dependencies

- `_docs/02_document/components/12_c11_tilemanager/description.md`: § 2 `confirm_flight_state` method, § 5 `FlightStateNotOnGroundError`, § 7 ADR-004 process isolation as the primary control.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md`: ERROR log shape on refusal.
## Problem
Without an ON_GROUND gate at the upload entry point:

- AC-8.4 is only partially enforced: ADR-004 process isolation alone prevents the airborne process from importing C11, but if the operator workstation accidentally triggers `upload_pending_tiles` while the FC reports `IN_FLIGHT` (e.g. the operator started the upload during a pre-landing approach window), the upload would proceed. That is exactly the scenario the safety case forbids.
- `RESTRICT-SAT-1` (no in-flight Service calls) loses one of its enforcement points; the operator workflow assumes uploads only happen when wheels are on the ground.
- `TileUploader` has no way to ask "what is the FC saying right now?" without coupling tightly to E-C8's full FC adapter surface.
- The Risk-7 mitigation in description.md ("The FC believes it's airborne") becomes a documentation-only claim with no test surface.

This task delivers the gate as a thin pre-call hook. It does NOT implement the FC subscription itself (that is E-C8's job); it consumes whatever C8 ships via the `FlightStateSource` Protocol declared here.
## Outcome
- A `FlightStateSource` Protocol at `src/gps_denied_onboard/components/c11_tilemanager/interface.py` (re-exported from `__init__.py`):
```python
from typing import Protocol, runtime_checkable
from ._types import FlightStateSignal


@runtime_checkable
class FlightStateSource(Protocol):
    def current_flight_state(self) -> FlightStateSignal: ...
```
- `FlightStateSignal` enum at `src/gps_denied_onboard/components/c11_tilemanager/_types.py`, written as a `(str, Enum)` mixin because the project pins Python 3.10 and `StrEnum` only exists from Python 3.11:
```python
from enum import Enum


class FlightStateSignal(str, Enum):
    ON_GROUND = "on_ground"
    TAKING_OFF = "taking_off"
    IN_FLIGHT = "in_flight"
    LANDING = "landing"
    UNKNOWN = "unknown"
```
- A `FlightStateGate` class at `src/gps_denied_onboard/components/c11_tilemanager/flight_state_gate.py`:
  - Constructor: `__init__(self, *, source: FlightStateSource, logger: Logger)`.
  - One public method: `confirm_on_ground() -> FlightStateSignal`. Returns `FlightStateSignal.ON_GROUND` on pass; raises `FlightStateNotOnGroundError(observed: FlightStateSignal, observed_at: datetime)` on fail.
  - Emits an ERROR log on every refusal with `kind="c11.upload.refused.flight_state"` carrying `{observed, observed_at_iso}`.
  - Emits an INFO log on pass with `kind="c11.upload.flight_state_confirmed"`.
- `FlightStateNotOnGroundError` defined at `src/gps_denied_onboard/components/c11_tilemanager/errors.py`. Subclasses `TileManagerError` (the C11 error family parent declared in AZ-316).
- The gate is integrated by the TileUploader task (a separate task), which calls it once per `upload_pending_tiles` invocation, BEFORE any C6 read or network setup.
- The composition root constructs `FlightStateGate` with a `FlightStateSource` impl supplied by the C8 adapter wiring (when E-C8 ships). Until then, this task's tests document a fake-source pattern; the production wiring is a one-line factory swap.
- A `Clock` injection is NOT needed here: the gate reads "now" via `datetime.now(timezone.utc)` at the call site for the error's `observed_at` timestamp, which is purely diagnostic and not a control surface. (`datetime.utcnow()` would return a naive datetime and so would not satisfy AC-7's UTC requirement.)
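The gate contract above fits in a few dozen lines. Below is a minimal sketch, not the shipped implementation: it assumes the stdlib `logging.Logger` in place of the project's AZ-266 `Logger`, passes the structured fields through `extra=`, and uses hypothetical log message strings and a hypothetical `source_error` field for AC-5. The enum and error classes are redefined inline only so the block runs on its own.

```python
import logging
from datetime import datetime, timezone
from enum import Enum
from typing import Optional, Protocol, runtime_checkable


class FlightStateSignal(str, Enum):
    ON_GROUND = "on_ground"
    TAKING_OFF = "taking_off"
    IN_FLIGHT = "in_flight"
    LANDING = "landing"
    UNKNOWN = "unknown"


@runtime_checkable
class FlightStateSource(Protocol):
    def current_flight_state(self) -> FlightStateSignal: ...


class TileManagerError(Exception):
    """C11 error-family parent (redefined here only to keep the sketch self-contained)."""


class FlightStateNotOnGroundError(TileManagerError):
    def __init__(self, observed: FlightStateSignal, observed_at: datetime) -> None:
        self.observed = observed
        self.observed_at = observed_at
        super().__init__(f"Upload refused: flight state is {observed.name}")


class FlightStateGate:
    def __init__(self, *, source: FlightStateSource, logger: logging.Logger) -> None:
        self._source = source
        self._logger = logger

    def confirm_on_ground(self) -> FlightStateSignal:
        cause: Optional[Exception] = None
        try:
            # Exactly one call per invocation: no polling, no retry, no caching.
            observed = self._source.current_flight_state()
        except Exception as exc:
            observed, cause = FlightStateSignal.UNKNOWN, exc  # source failure fails closed
        if not isinstance(observed, FlightStateSignal):
            observed = FlightStateSignal.UNKNOWN  # unexpected value also fails closed

        if observed is FlightStateSignal.ON_GROUND:
            self._logger.info("flight state confirmed ON_GROUND",
                              extra={"kind": "c11.upload.flight_state_confirmed"})
            return observed

        observed_at = datetime.now(timezone.utc).replace(microsecond=0)  # UTC, second precision
        self._logger.error(
            "upload refused: flight state is %s", observed.name,
            extra={
                "kind": "c11.upload.refused.flight_state",
                "observed": observed.value,
                "observed_at_iso": observed_at.isoformat(),
                "source_error": None if cause is None else str(cause),
            },
        )
        # `from cause` sets __cause__ to the source's exception (or suppresses context if None).
        raise FlightStateNotOnGroundError(observed, observed_at) from cause


# Usage with synchronous fakes.
class FakeSource:
    def __init__(self, state: FlightStateSignal) -> None:
        self.state = state

    def current_flight_state(self) -> FlightStateSignal:
        return self.state


class BrokenSource:
    def current_flight_state(self) -> FlightStateSignal:
        raise RuntimeError("FC disconnected")


gate_logger = logging.getLogger("c11.flight_state_gate.sketch")
gate_logger.addHandler(logging.NullHandler())  # keep the demo quiet
```

Mapping a source exception to `UNKNOWN` and chaining it with `raise ... from cause` keeps the caller's contract to a single exception type while still exposing the root cause for diagnostics, which is what AC-5 asks for.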
## Scope

### Included

- `FlightStateSource` Protocol (single method `current_flight_state`).
- `FlightStateSignal` enum (5 states).
- `FlightStateGate` class with `confirm_on_ground()` method.
- `FlightStateNotOnGroundError` definition.
- ERROR log on refusal; INFO log on pass.
- Composition-root entry for the gate (factory `build_flight_state_gate(source) -> FlightStateGate`).
- Conformance test for `FlightStateSource` Protocol against a fake.

### Excluded

- The actual FC subscription (subscribing to MAVLink heartbeat or equivalent): owned by E-C8.
- The TileUploader integration of this gate: owned by the TileUploader task in this epic.
- ADR-004 build-time exclusion enforcement: owned by E-BOOT.
- Sector boundary or any geographic awareness: the gate is state-only.
- Mid-upload re-checks (the gate fires once at start; in-progress uploads are NOT torn down if the FC transitions mid-upload, per the operator workflow, which expects atomic batches).
## Acceptance Criteria
**AC-1: ON_GROUND passes**
Given a `FlightStateSource` returning `ON_GROUND`
When `confirm_on_ground()` is called
Then the call returns `FlightStateSignal.ON_GROUND`; no exception is raised; ONE INFO log `kind="c11.upload.flight_state_confirmed"` is emitted

**AC-2: IN_FLIGHT raises**
Given a `FlightStateSource` returning `IN_FLIGHT`
When `confirm_on_ground()` is called
Then `FlightStateNotOnGroundError` is raised with `observed = IN_FLIGHT`; ONE ERROR log `kind="c11.upload.refused.flight_state"` is emitted; the exception message names the observed state

**AC-3: UNKNOWN raises (fail-closed)**
Given a `FlightStateSource` returning `UNKNOWN`
When `confirm_on_ground()` is called
Then `FlightStateNotOnGroundError` is raised; the gate is fail-closed by design (UNKNOWN is treated as "not safe to upload")

**AC-4: TAKING_OFF and LANDING raise**
Given a `FlightStateSource` returning `TAKING_OFF` or `LANDING`
When `confirm_on_ground()` is called
Then `FlightStateNotOnGroundError` is raised in both cases; transition states are NOT treated as ON_GROUND

**AC-5: Source exception propagates with context**
Given a `FlightStateSource` whose `current_flight_state()` raises `RuntimeError("FC disconnected")`
When `confirm_on_ground()` is called
Then `FlightStateNotOnGroundError` is raised with `observed = UNKNOWN` (the gate maps a source failure to UNKNOWN rather than re-raising the raw exception); the original `RuntimeError` is set as `__cause__` on the new exception; ONE ERROR log carries the original exception's message

**AC-6: FlightStateSource Protocol is conformance-checkable**
Given a class implementing `current_flight_state` returning `FlightStateSignal`
When `isinstance(impl, FlightStateSource)` is evaluated under `runtime_checkable`
Then the result is `True`; for a class missing the method, the result is `False`

**AC-7: Error carries diagnostic fields**
Given a refusal
When the test inspects the raised `FlightStateNotOnGroundError`
Then `exc.observed` and `exc.observed_at` (datetime, UTC, second-precision) are populated; the message starts with `"Upload refused: flight state is "` followed by the observed state name

**AC-8: Gate does not retry**
Given a source that would return `IN_FLIGHT` on the first call and `ON_GROUND` on a second call
When `confirm_on_ground()` is called once
Then `current_flight_state()` is called EXACTLY once (verifiable via spy) and `FlightStateNotOnGroundError` is raised; the gate does NOT poll-and-retry
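AC-6 and AC-8 can be checked with plain fakes and a `unittest.mock.Mock` spy. This sketch includes a hypothetical `StandInGate` only so the helper has something to exercise; in the real suite `make_gate` would build the actual `FlightStateGate`, and the helper would catch `FlightStateNotOnGroundError` specifically (for example with `pytest.raises`) instead of any `Exception`.

```python
from enum import Enum
from typing import Any, Callable, Protocol, runtime_checkable
from unittest.mock import Mock


class FlightStateSignal(str, Enum):
    ON_GROUND = "on_ground"
    TAKING_OFF = "taking_off"
    IN_FLIGHT = "in_flight"
    LANDING = "landing"
    UNKNOWN = "unknown"


@runtime_checkable
class FlightStateSource(Protocol):
    def current_flight_state(self) -> FlightStateSignal: ...


# AC-6 fakes: one with the method, one without.
class ConformingFake:
    def current_flight_state(self) -> FlightStateSignal:
        return FlightStateSignal.ON_GROUND


class NonConformingFake:
    def latest_heartbeat(self) -> None:  # wrong method name
        return None


def assert_single_source_call(make_gate: Callable[..., Any]) -> None:
    """AC-8: the spy's second answer would pass, so a retrying gate gets caught."""
    spy = Mock()
    spy.current_flight_state.side_effect = [FlightStateSignal.IN_FLIGHT, FlightStateSignal.ON_GROUND]
    try:
        make_gate(spy).confirm_on_ground()
    except Exception:
        pass  # refusal expected: the only state the gate should see is IN_FLIGHT
    else:
        raise AssertionError("gate passed although the first observed state was IN_FLIGHT")
    assert spy.current_flight_state.call_count == 1, "gate must not poll or retry"


class StandInGate:
    """Hypothetical minimal gate, only to exercise the helper (not the real FlightStateGate)."""

    def __init__(self, source: Any) -> None:
        self._source = source

    def confirm_on_ground(self) -> FlightStateSignal:
        state = self._source.current_flight_state()
        if state is not FlightStateSignal.ON_GROUND:
            raise RuntimeError(f"refused: {state.name}")
        return state
```

`runtime_checkable` makes `isinstance` verify only that `current_flight_state` exists; the return annotation is not enforced at runtime, which is why the other ACs still assert on the returned values.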
## Non-Functional Requirements

**Performance**
- `confirm_on_ground` p99 ≤ 1 ms when the source returns synchronously (the gate is a thin wrapper; its cost is dominated by the source's own implementation, which this task does not constrain).

**Compatibility**
- `FlightStateSignal` is a stdlib `(str, Enum)` mixin, because the project pins Python 3.10 and `StrEnum` only exists from Python 3.11; no `pydantic` or `attrs`.
- No new third-party dependencies.

**Reliability**
- The gate is fail-closed: UNKNOWN, transition states, and source failures all block the upload. The cost of a false block (a skipped upload) is small; the cost of a false pass (an upload during flight) is unbounded per the safety case.
- The gate does NOT cache state across calls; each `confirm_on_ground()` invocation re-queries the source.
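The 1 ms p99 target above can be checked with a standard-library microbenchmark. A sketch, assuming per-call timing with `time.perf_counter_ns`; `p99_ns` is a hypothetical helper name. Because timer overhead is included in every sample, the measured p99 slightly overstates the gate's own cost, which errs on the strict side.

```python
import statistics
import time
from typing import Callable


def p99_ns(fn: Callable[[], object], iterations: int = 100_000) -> float:
    """Time `fn` once per iteration and return the 99th-percentile latency in nanoseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]
```

For the gate this would be used as `assert p99_ns(gate.confirm_on_ground) <= 1_000_000` with a synchronous fake source; as the batch review notes, a bound measured on a dev host may not transfer directly to the operator workstation.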
## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Fake source returns ON_GROUND | `confirm_on_ground` returns `ON_GROUND`; INFO log emitted |
| AC-2 | Fake source returns IN_FLIGHT | `FlightStateNotOnGroundError`; ERROR log; observed=IN_FLIGHT |
| AC-3 | Fake source returns UNKNOWN | `FlightStateNotOnGroundError` |
| AC-4 | Fake source returns TAKING_OFF; then LANDING | Both raise |
| AC-5 | Fake source raises `RuntimeError` | `FlightStateNotOnGroundError` with `observed=UNKNOWN`; `__cause__` set; ERROR log carries original message |
| AC-6 | `isinstance` check on conforming + non-conforming fakes | True / False |
| AC-7 | Inspect raised exception fields | `observed`, `observed_at` populated; message format correct |
| AC-8 | Spy on `current_flight_state` call count | Exactly 1 call per `confirm_on_ground` |
| NFR-perf | Microbench gate × 100k with synchronous fake | p99 ≤ 1 ms |
| NFR-reliability-fail-closed | Each non-ON_GROUND state | All raise; coverage matrix complete |
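The NFR-reliability-fail-closed row asks for a complete coverage matrix. One way to keep it complete is to iterate over the enum itself rather than listing states by hand, so a state added later is tested automatically. A sketch with a hypothetical `StandInGate` and fake sources; in the real suite `make_gate` would construct `FlightStateGate`.

```python
from enum import Enum
from typing import Any, Callable, Dict, Type


class FlightStateSignal(str, Enum):
    ON_GROUND = "on_ground"
    TAKING_OFF = "taking_off"
    IN_FLIGHT = "in_flight"
    LANDING = "landing"
    UNKNOWN = "unknown"


class FlightStateNotOnGroundError(Exception):
    """Redefined here only to keep the sketch self-contained."""


class FixedSource:
    def __init__(self, state: FlightStateSignal) -> None:
        self._state = state

    def current_flight_state(self) -> FlightStateSignal:
        return self._state


class FailingSource:
    def current_flight_state(self) -> FlightStateSignal:
        raise RuntimeError("FC disconnected")


def check_fail_closed(make_gate: Callable[[Any], Any],
                      refusal: Type[Exception]) -> Dict[FlightStateSignal, str]:
    """Run the gate against every enum member plus a failing source."""
    outcomes: Dict[FlightStateSignal, str] = {}
    for state in FlightStateSignal:  # iterating the enum keeps the matrix complete
        try:
            make_gate(FixedSource(state)).confirm_on_ground()
            outcomes[state] = "pass"
        except refusal:
            outcomes[state] = "refused"
    expected = {s: "pass" if s is FlightStateSignal.ON_GROUND else "refused" for s in FlightStateSignal}
    assert outcomes == expected, outcomes
    try:
        make_gate(FailingSource()).confirm_on_ground()
    except refusal as exc:
        assert isinstance(exc.__cause__, RuntimeError)  # root cause preserved (AC-5)
    else:
        raise AssertionError("a failing source must block the upload")
    return outcomes


class StandInGate:
    """Hypothetical minimal gate used only to exercise the checker."""

    def __init__(self, source: Any) -> None:
        self._source = source

    def confirm_on_ground(self) -> FlightStateSignal:
        try:
            state = self._source.current_flight_state()
        except Exception as exc:
            raise FlightStateNotOnGroundError("UNKNOWN") from exc
        if state is not FlightStateSignal.ON_GROUND:
            raise FlightStateNotOnGroundError(state.name)
        return state
```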
## Constraints

- The gate is fail-closed for UNKNOWN and any source exception. This is documented; relaxing this default would require a Choose A/B/C/D decision coordinated with the safety reviewer.
- Transition states (`TAKING_OFF`, `LANDING`) are NOT treated as ON_GROUND: operators must wait until the FC reports `ON_GROUND`. This is intentional and documented; the operator workflow's typical pause between landing and upload trigger covers it.
- The gate calls `current_flight_state()` exactly once per `confirm_on_ground` call: no polling, no retries. This is documented behaviour; the upper-layer TileUploader handles retries at the upload-batch level if the operator wants to retry after a failure.
- This task introduces no new third-party dependencies.
## Risks & Mitigation

**Risk 1: `FlightStateSource` Protocol surface diverges from C8's eventual impl**
- *Risk*: When E-C8 (AZ-261) ships its FC adapter, its natural public method might not be `current_flight_state`; it could be `latest_heartbeat()` or `state_stream`.
- *Mitigation*: Document the Protocol as a thin C11-facing adapter; if C8's natural surface differs, an adapter class wraps it (`FlightStateSourceAdapter(c8_fc_adapter)`, owned by E-C8's wiring task). The Protocol's narrow surface (one method, one return type) makes adapting trivial.

**Risk 2: UNKNOWN state during FC link recovery is too aggressive**
- *Risk*: A transient FC connection blip causes UNKNOWN; the operator sees a refusal while the aircraft is genuinely on the ground.
- *Mitigation*: Documented as fail-closed; the operator workflow tolerates re-triggering the upload after the FC recovers (the upload journal preserves pending tiles between attempts). E-C8's FC adapter is responsible for state debouncing if the false-UNKNOWN rate is operationally too high; that is not C11's concern.

**Risk 3: Fail-closed during a real on-ground emergency upload**
- *Risk*: A safety officer urgently needs to trigger an upload but the FC is reporting UNKNOWN.
- *Mitigation*: Per the architecture, no operational scenario exists where an upload MUST succeed during an FC disconnect. The pending-upload journal preserves data; the upload runs after the FC reconnects. No override flag is provided: adding one would weaken the safety case and require Choose A/B/C/D approval.
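The `FlightStateSourceAdapter` mitigation from Risk 1 could look like the sketch below. C8's surface is not defined yet, so this assumes a hypothetical `latest_landed_state()` method returning the raw MAVLink `MAV_LANDED_STATE` value (as carried in the `EXTENDED_SYS_STATE` message's `landed_state` field), or `None` if no message has arrived; both the method and the `C8FcAdapter` name are placeholders. Missing or unmapped values map to `UNKNOWN`, so the adapter stays fail-closed, and C8 exceptions propagate to the gate, which already maps them to `UNKNOWN`.

```python
from enum import Enum
from typing import Optional, Protocol


class FlightStateSignal(str, Enum):
    ON_GROUND = "on_ground"
    TAKING_OFF = "taking_off"
    IN_FLIGHT = "in_flight"
    LANDING = "landing"
    UNKNOWN = "unknown"


class C8FcAdapter(Protocol):
    """Hypothetical C8 surface: latest MAV_LANDED_STATE value, or None if not yet seen."""

    def latest_landed_state(self) -> Optional[int]: ...


# MAVLink MAV_LANDED_STATE values mapped onto the C11 signal.
_LANDED_STATE_TO_SIGNAL = {
    0: FlightStateSignal.UNKNOWN,     # MAV_LANDED_STATE_UNDEFINED
    1: FlightStateSignal.ON_GROUND,   # MAV_LANDED_STATE_ON_GROUND
    2: FlightStateSignal.IN_FLIGHT,   # MAV_LANDED_STATE_IN_AIR
    3: FlightStateSignal.TAKING_OFF,  # MAV_LANDED_STATE_TAKEOFF
    4: FlightStateSignal.LANDING,     # MAV_LANDED_STATE_LANDING
}


class FlightStateSourceAdapter:
    """Satisfies FlightStateSource by translating the C8 adapter's landed state."""

    def __init__(self, c8_fc_adapter: C8FcAdapter) -> None:
        self._c8 = c8_fc_adapter

    def current_flight_state(self) -> FlightStateSignal:
        raw = self._c8.latest_landed_state()  # exceptions propagate to the gate
        # None or any value outside the table fails closed.
        return _LANDED_STATE_TO_SIGNAL.get(raw, FlightStateSignal.UNKNOWN)
```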
## Runtime Completeness

- **Named capability**: defence-in-depth ON_GROUND check at upload entry (description.md § 5; ADR-004; AC-8.4).
- **Production code that must exist**: real `FlightStateSource` Protocol declaration, real `FlightStateSignal` enum, real `FlightStateGate` class with logging, real `FlightStateNotOnGroundError` in the C11 error family.
- **Allowed external stubs**: tests MAY use a fake `FlightStateSource` impl (synchronous return value); production wiring uses the real C8 FC-adapter source via the composition root (when E-C8 ships).
- **Unacceptable substitutes**: a hardcoded "always ON_GROUND" source (defeats the entire point); polling the source N times to "average" the state (introduces TOCTOU windows where the FC transitions mid-poll); silently mapping `UNKNOWN` to `ON_GROUND` (defeats fail-closed); reading FC state from a static config file (the FC's actual telemetry IS the source of truth).
# C11 Per-Flight Signing Key — Generation + Sign + Zeroise
**Task**: AZ-318_c11_signing_key
**Name**: C11 Per-Flight Signing Key
**Description**: Implement the per-flight ephemeral signing key used by `TileUploader` to authenticate each uploaded tile against the parent suite's D-PROJ-2 ingest contract. `PerFlightKeyManager` generates one fresh Ed25519 keypair per flight at upload-session start, signs the multipart payload per tile, and zeroises the secret-key buffer in memory after the session completes (success OR failure). The public key is recorded in the FDR (`kind="c11.upload.session.key.public"`) so the safety officer can later correlate which key signed which tiles. On `SignatureRejectedError` from `satellite-provider`, the manager emits an FDR alert (`kind="c11.upload.signature_rejected"`); this is a security-critical event and is never silently dropped. Uses the project-pinned `cryptography` library; no custom crypto.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-273_fdr_client_ringbuf
**Component**: c11_tilemanager (epic AZ-251 / E-C11)
**Tracker**: AZ-318
**Epic**: AZ-251 (E-C11)

### Document Dependencies

- `_docs/02_document/components/12_c11_tilemanager/description.md`: § 3.2 D-PROJ-2 contract sketch (signature requirement), § 5 `SignatureRejectedError`, § 7 R09 key-compromise mitigation.
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`: `kind="c11.upload.session.key.public"` and `kind="c11.upload.signature_rejected"` envelopes.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md`: INFO/ERROR log shapes for key lifecycle events.
- `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`: D-PROJ-2 design task #1 (parent-suite ingest contract), specifically the `signature` field requirement.
## Problem
Without a per-flight ephemeral signing key:

- The D-PROJ-2 contract sketch requires every uploaded tile to carry a `signature` field; without it, `satellite-provider`'s ingest endpoint will reject every payload.
- The R09 risk (key compromise) is unmitigated: a single static API key would compromise every flight's uploads on first leak, whereas per-flight keys bound the blast radius to one flight.
- The "ingest-side voting layer" (D-PROJ-2 design task #2) cannot trust uploaded tiles without a way to associate each tile with its source flight; the public key is the binding.
- AC-NEW-7 (cache-poisoning safety budget) loses one of its layers: the voting layer relies on per-flight keys to detect collusion (multiple compromised companions colluding becomes detectable when their key fingerprints differ from the safety officer's pre-flight enrolment record).
- Per `description.md` § 5, `SignatureRejectedError` is a security-critical event; without a structured handler, it would either crash the upload run or be silently caught.
- The C11-ST-03 security test (key zeroised after upload) has no implementation to verify against: without zeroisation, the secret-key bytes remain in heap memory long after the upload completes, widening the exfiltration window.

This task delivers the key lifecycle. It does NOT plumb the key into the upload payload (the TileUploader task does that); it provides `sign(payload)` as the boundary.
## Outcome
- A `PerFlightKeyManager` class at `src/gps_denied_onboard/components/c11_tilemanager/signing_key.py`:
  - Constructor: `__init__(self, *, fdr_client: FdrClient, logger: Logger)`. No key material exists at construction time.
  - `start_session(flight_id: uuid.UUID) -> PublicKeyFingerprint`:
    1. Generates a fresh Ed25519 keypair via `cryptography.hazmat.primitives.asymmetric.ed25519.Ed25519PrivateKey.generate()`.
    2. Stores the private key in `self._private_key` (instance state, not module-level).
    3. Computes `public_key_pem = private_key.public_key().public_bytes(...)`.
    4. Computes `fingerprint = hashlib.sha256(public_key_pem).hexdigest()[:16]`.
    5. Emits FDR `kind="c11.upload.session.key.public"` with `{flight_id, public_key_pem, fingerprint, generated_at_iso}`.
    6. Emits INFO log `kind="c11.upload.session.key.generated"` with `{flight_id, fingerprint}` (NEVER the private key).
    7. Returns `PublicKeyFingerprint(flight_id, public_key_pem, fingerprint, generated_at)`.
  - `sign(payload: bytes) -> bytes`:
    1. Raises `SessionNotActiveError` if `self._private_key is None`.
    2. Returns `self._private_key.sign(payload)` (an Ed25519 signature is 64 bytes).
    3. Emits no log per call (logging would flood at upload throughput).
  - `end_session() -> None`:
    1. If `self._private_key is None`, no-op.
    2. Calls `self._zeroise_private_key()`, which overwrites a project-controlled `bytearray` copy of the raw secret-key bytes with zeros and then sets `self._private_key = None`, so the library-side (OpenSSL) key is freed when the last reference drops. That library-side buffer is released, not wiped under project control, which is why zeroisation is documented as best-effort.
    3. Emits INFO log `kind="c11.upload.session.key.zeroised"`.
  - `record_signature_rejection(flight_id, tile_id) -> None`:
    1. Emits FDR `kind="c11.upload.signature_rejected"` with `{flight_id, tile_id, fingerprint, observed_at_iso}`.
    2. Emits an ERROR log with the same payload.
- `PublicKeyFingerprint` DTO at `src/gps_denied_onboard/components/c11_tilemanager/_types.py`: `@dataclass(frozen=True)` with the four fields above.
- `SessionNotActiveError` defined in `src/gps_denied_onboard/components/c11_tilemanager/errors.py`; subclasses `TileManagerError`. (`SignatureRejectedError` is also defined here, but raised by `TileUploader` after parsing the ingest response, NOT by this task.)
- The TileUploader task (separate) calls:
  - `start_session(flight_id)` once per upload run.
  - `sign(payload)` once per tile.
  - `record_signature_rejection(...)` on each per-tile rejection from the ingest response.
  - `end_session()` in a `finally` block, guaranteeing zeroisation on success or failure.
- The composition root constructs `PerFlightKeyManager` and injects it into `TileUploader`. Factory: `build_per_flight_key_manager(fdr_client, logger) -> PerFlightKeyManager`.
- A `__del__` safety net calls `end_session()` if it was never explicitly called, with a WARN log noting the leak. This is a belt-and-braces guarantee, not the primary control.
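The `start_session` and `sign` steps map onto the `cryptography` Ed25519 API roughly as follows. This is a sketch, assuming the elided `public_bytes(...)` arguments are PEM encoding with the `SubjectPublicKeyInfo` format; the task text does not pin them, and the fingerprint value depends on that choice (raw 32-byte encoding would give a different fingerprint).

```python
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# start_session steps 1-4: fresh keypair, public-key PEM, 16-hex-char fingerprint.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()
public_key_pem = public_key.public_bytes(
    encoding=serialization.Encoding.PEM,                   # assumed encoding
    format=serialization.PublicFormat.SubjectPublicKeyInfo,  # assumed format
)
fingerprint = hashlib.sha256(public_key_pem).hexdigest()[:16]

# sign(): an Ed25519 signature is always 64 bytes.
payload = b"hello world"
signature = private_key.sign(payload)

# AC-3 round trip: verify() returns None on success and raises InvalidSignature otherwise.
public_key.verify(signature, payload)
try:
    public_key.verify(signature, payload + b"!")
    tampered_rejected = False
except InvalidSignature:
    tampered_rejected = True
```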
## Scope

### Included

- `PerFlightKeyManager` class (4 public methods + `__del__` safety net).
- `PublicKeyFingerprint` DTO.
- `SessionNotActiveError` definition.
- Ed25519 keypair generation using the project-pinned `cryptography` library.
- Best-effort zeroisation of the secret-key material: a project-controlled `bytearray` copy is overwritten with zeros and the library's key object is released. This is documented as "best-effort" because Python heap zeroisation cannot be guaranteed without ctypes-level pinning.
- FDR emission on session start (public key) and on signature rejection.
- INFO log on session lifecycle events; ERROR log on signature rejection.
- Composition-root factory.

### Excluded

- The TileUploader integration (signing into multipart payloads): owned by the TileUploader task.
- Pre-flight key enrolment workflow (the safety officer's record of expected per-flight public keys): owned by C12 operator tooling.
- HSM / TPM-backed key storage: out of scope this cycle; the assumption is that the operator workstation's process is trusted enough for ephemeral in-memory keys, with zeroisation as the residual hygiene.
- Mid-session key rotation: one key per session; rotation requires `end_session` + `start_session`.
- Key persistence between processes: the key is in-memory ONLY; an upload session must complete within one process lifetime.
- Raising `SignatureRejectedError`: the class itself is defined in this task, but TileUploader raises it.
## Acceptance Criteria
**AC-1: `start_session` generates a fresh keypair and emits FDR**
Given a fresh `PerFlightKeyManager`
When `start_session(flight_id)` is called
Then the manager holds a non-None `_private_key`; `PublicKeyFingerprint` is returned with a 16-char hex fingerprint; ONE FDR `kind="c11.upload.session.key.public"` is emitted with the public-key PEM; ONE INFO log without the private key

**AC-2: Two consecutive sessions produce different keys**
Given `start_session(F1)` followed by `end_session()` followed by `start_session(F2)`
When fingerprints are compared
Then `fingerprint_F1 != fingerprint_F2` (cryptographically distinct keys); two FDR records are emitted, one per session

**AC-3: `sign` returns 64-byte Ed25519 signature**
Given an active session
When `sign(b"hello world")` is called
Then a 64-byte signature is returned; the signature verifies against the session's public key (verifiable via `Ed25519PublicKey.verify`)

**AC-4: `sign` before `start_session` raises**
Given a fresh `PerFlightKeyManager`
When `sign(b"...")` is called without prior `start_session`
Then `SessionNotActiveError` is raised; no signature is computed

**AC-5: `sign` after `end_session` raises**
Given `start_session(F)` then `end_session()`
When `sign(b"...")` is called
Then `SessionNotActiveError` is raised

**AC-6: `end_session` zeroises and emits log**
Given an active session
When `end_session()` is called
Then `self._private_key is None`; the underlying secret-key buffer is overwritten with zeros (verifiable via `ctypes.string_at` against the buffer address captured pre-zeroise); ONE INFO log `kind="c11.upload.session.key.zeroised"`

**AC-7: `__del__` safety net zeroises if `end_session` was missed**
Given an active session whose owner is garbage-collected without calling `end_session`
When the GC runs `__del__`
Then `end_session()` runs implicitly; ONE WARN log `kind="c11.upload.session.key.zeroised_via_finalizer"`; the buffer is zeroised

**AC-8: `record_signature_rejection` emits FDR + ERROR log**
Given an active session and a tile_id
When `record_signature_rejection(flight_id, tile_id)` is called
Then ONE FDR `kind="c11.upload.signature_rejected"` is emitted with `{flight_id, tile_id, fingerprint, observed_at_iso}`; ONE ERROR log with the same payload

**AC-9: Private key never logged anywhere**
Given the full session lifecycle
When all log records and all FDR records are captured
Then the private-key PEM does NOT appear in ANY record (verifiable via byte search across the captured stream)

**AC-10: `end_session` is idempotent**
Given an active session
When `end_session()` is called twice in a row
Then the second call is a no-op; no exception is raised; no second INFO log is emitted
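AC-8 and AC-9 can be exercised together: capture every FDR record and log record produced by a signature rejection, then byte-search the captured stream for the secret. A standard-library sketch, assuming a hypothetical `FakeFdrClient.emit(kind, payload)` shape (the real AZ-275 sink's API is not described here), the stdlib `logging` module in place of the AZ-266 logger, and random bytes standing in for the private-key material. The search also covers hex and base64 forms because a PEM body is base64 text.

```python
import base64
import json
import logging
import secrets
import uuid
from datetime import datetime, timezone
from typing import Dict, List, Tuple


class FakeFdrClient:
    """Hypothetical stand-in for the AZ-275 fake FDR sink: records (kind, payload) pairs."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, Dict[str, str]]] = []

    def emit(self, kind: str, payload: Dict[str, str]) -> None:
        self.records.append((kind, payload))


class CaptureHandler(logging.Handler):
    """Serialises every LogRecord attribute, including extra= fields, so nothing escapes the search."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(json.dumps(vars(record), default=str))


def record_signature_rejection(fdr: FakeFdrClient, logger: logging.Logger, *,
                               flight_id: uuid.UUID, tile_id: str, fingerprint: str) -> None:
    """Stand-in for PerFlightKeyManager.record_signature_rejection: one payload, two sinks."""
    payload = {
        "flight_id": str(flight_id),
        "tile_id": tile_id,
        "fingerprint": fingerprint,
        "observed_at_iso": datetime.now(timezone.utc).isoformat(),
    }
    fdr.emit("c11.upload.signature_rejected", payload)
    logger.error("signature rejected by ingest",
                 extra={"kind": "c11.upload.signature_rejected", **payload})


def secret_leaked(secret: bytes, fdr: FakeFdrClient, log_lines: List[str]) -> bool:
    """AC-9 byte search over the raw, hex and base64 forms of the secret."""
    haystack = json.dumps(fdr.records).encode() + "\n".join(log_lines).encode()
    needles = (secret, secret.hex().encode(), base64.b64encode(secret))
    return any(needle in haystack for needle in needles)


# Usage: one signature rejection, captured end to end.
log = logging.getLogger("c11.signing_key.capture_sketch")
log.propagate = False
capture = CaptureHandler()
log.addHandler(capture)

fdr = FakeFdrClient()
secret = secrets.token_bytes(32)  # stand-in for the session's private-key material
record_signature_rejection(fdr, log, flight_id=uuid.uuid4(), tile_id="tile-0001",
                           fingerprint="0123456789abcdef")
```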
## Non-Functional Requirements

**Performance**
- `sign` p99 ≤ 200 µs on the operator workstation (Ed25519 is fast; the bottleneck is the upload network, not signing).
- `start_session` ≤ 5 ms (Ed25519 keygen is sub-millisecond; FDR emission + log emission dominate).

**Compatibility**
- `cryptography` library at the project-pinned version. Verify before adding; do NOT bump unilaterally.
- Ed25519 is available in `cryptography.hazmat.primitives.asymmetric.ed25519` since 2.6, so the project pin must be ≥ 2.6.

**Reliability**
- The manager runs the best-effort zeroisation on both `end_session` AND `__del__`; both paths converge through the same `_zeroise_private_key` helper.
- The Python heap layer cannot guarantee bit-perfect zeroisation: immutable `bytes` copies of the secret and freed allocations may still hold key material, and Python offers no way to wipe them. This is documented. The mitigation is to keep the key buffer's lifetime as short as possible (one upload session) and rely on OS-level memory protections (no swap on the operator workstation per RESTRICT-OPS-1).
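The Reliability bullets above (one `_zeroise_private_key` helper shared by `end_session` and `__del__`, an idempotent `end_session`, a WARN on the finalizer path) can be sketched with the standard library. This is hypothetical: random bytes stand in for the raw Ed25519 secret, the key object itself is omitted, and stdlib `logging` replaces the AZ-266 logger. It also shows the documented caveat: `secrets.token_bytes` returns an immutable `bytes` object that is copied into the `bytearray` and cannot itself be wiped.

```python
import logging
import secrets
from typing List, Optional
```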
## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `start_session` then capture FDR + log | Public PEM in FDR; fingerprint 16 hex chars; private key not in log |
| AC-2 | Two sessions back-to-back | Different fingerprints |
| AC-3 | Sign + verify roundtrip | 64-byte signature; verifies against public key |
| AC-4 | `sign` without `start_session` | `SessionNotActiveError` |
| AC-5 | `sign` after `end_session` | `SessionNotActiveError` |
| AC-6 | `end_session` and inspect zeroised buffer | Buffer is all zeros; log emitted |
| AC-7 | Drop reference + force GC | `__del__` runs `end_session`; WARN log |
| AC-8 | `record_signature_rejection` | FDR + ERROR log with all fields |
| AC-9 | Capture all logs/FDR for a full session; byte-search private PEM | Not present |
| AC-10 | `end_session` twice | Second call is no-op; no second log |
| NFR-perf-sign | Microbench `sign` × 100k | p99 ≤ 200 µs |
| NFR-reliability-fingerprint-uniqueness | 1000 sessions with unique flight_ids | All 1000 fingerprints distinct (collision-resistant) |
## Constraints

- The signing algorithm is Ed25519; there is no per-task choice (the parent suite's D-PROJ-2 contract requires Ed25519 per the leftover file's design).
- The secret key never leaves the manager: `sign(payload) -> bytes` is the only method that uses it, and consumers do NOT touch the private key.
- The public key is logged AND FDR'd (it is public by definition); the private key is NOT logged anywhere. Code review treats any private-key reference outside `signing_key.py` as a `Security` finding (Critical).
- This task pins to the project's existing `cryptography` version. If that version doesn't support `Ed25519PrivateKey.generate()`, ASK the user before bumping (per `coderule.mdc`: "verify the API actually exists in the pinned version").
- `__del__` is a safety net, NOT the primary contract: consumers MUST call `end_session()` explicitly. Code review treats reliance on `__del__` as a `Reliability` finding.
## Risks & Mitigation

**Risk 1: Python heap zeroisation is not bit-perfect**
- *Risk*: The `cryptography` library returns the private key as a Python object; releasing that object does not guarantee its memory is zeroised, and immutable `bytes` copies of the secret cannot be overwritten from Python.
- *Mitigation*: Documented as "best-effort"; the operator workstation runs no-swap (RESTRICT-OPS-1); the key lifetime is bounded to one upload session (typically minutes), which minimises the residual exfiltration window. A future task could add ctypes-level pinning if the threat model tightens.

**Risk 2: `__del__` doesn't run when the process is killed (`SIGKILL`)**
- *Risk*: A SIGKILL during an active session leaves the key buffer in heap memory until the OS reclaims the process pages.
- *Mitigation*: Documented; the OS-level mitigation is process termination → memory pages reclaimed; on Linux with no swap, the bytes never hit disk. No software mitigation is feasible inside the killed process.

**Risk 3: FDR ringbuffer overrun loses the public-key record**
- *Risk*: Under FDR backpressure (AZ-274 overrun), the `kind="c11.upload.session.key.public"` record might be dropped, and the safety officer could not later correlate the upload with a key fingerprint.
- *Mitigation*: AZ-273's ringbuffer is sized per `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`; this task adds NO new pressure, but the record is documented as critical-priority. Mid-flight FDR loss is already an AC-NEW-1 concern; this task surfaces the dependency.

**Risk 4: `cryptography` library API drift across pins**
- *Risk*: A minor `cryptography` bump renames `Ed25519PrivateKey.generate()` or changes its signature.
- *Mitigation*: The task verifies the API against the pinned version (per `coderule.mdc`); the pin is recorded in `requirements.txt`; a wrapper isolates the library to this single class.

**Risk 5: Replay attack (captured signed payloads re-uploaded by an attacker)**
- *Risk*: An MITM captures a valid `(payload, signature)` pair and re-uploads it to `satellite-provider`'s ingest endpoint.
- *Mitigation*: Out of scope for this task: the parent suite's ingest endpoint owns nonce / timestamp validation per the D-PROJ-2 design. C11 includes `capture_timestamp` in the signed payload (per the leftover file's contract sketch); the parent suite rejects timestamps outside its acceptance window. This task does NOT add a separate nonce.
## Runtime Completeness
- **Named capability**: per-flight ephemeral signing key per D-PROJ-2 contract, R09 mitigation, AC-NEW-7 voting-layer enabler (description.md § 7, leftover file design task #1).
- **Production code that must exist**: real `PerFlightKeyManager` with real Ed25519 keypair generation via `cryptography`, real `sign`, real best-effort zeroisation, real FDR emission for public-key + signature-rejection events, real `__del__` safety net.
- **Allowed external stubs**: tests MAY use a fake `FdrClient` (already provided by AZ-275 fake_fdr_sink) and a fake `Logger`; production wiring uses the real AZ-273 ringbuffer + AZ-266 logger.
- **Unacceptable substitutes**: a hardcoded shared key reused across flights (defeats R09 mitigation); a pseudo-random "key" generated from `random.getrandbits` instead of `cryptography`'s CSPRNG (rolling our own crypto is rejected per `coderule.mdc`); skipping `end_session` zeroisation (loses C11-ST-03 test surface); logging the private key for "debugging" (Critical Security finding).