gps-denied-onboard/_docs/02_tasks/todo/AZ-318_c11_signing_key.md
Oleksandr Bezdieniezhnykh 880eabcb3f Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 00:39:48 +03:00


C11 Per-Flight Signing Key — Generation + Sign + Zeroise

Task: AZ-318_c11_signing_key
Name: C11 Per-Flight Signing Key
Description: Implement the per-flight ephemeral signing key used by TileUploader to authenticate each uploaded tile against the parent suite's D-PROJ-2 ingest contract. PerFlightKeyManager generates one fresh Ed25519 keypair per flight at upload-session start, signs the multipart payload per tile, and zeroises the secret-key buffer in memory after the session completes (success OR failure). The public key is recorded in the FDR (kind="c11.upload.session.key.public") so the safety officer can later correlate which key signed which tiles. On SignatureRejectedError from satellite-provider, the manager emits an FDR alert (kind="c11.upload.signature_rejected"): a security-critical event, never silently dropped. Uses the project-pinned cryptography library; no custom crypto.
Complexity: 3 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-273_fdr_client_ringbuf
Component: c11_tilemanager (epic AZ-251 / E-C11)
Tracker: AZ-318
Epic: AZ-251 (E-C11)

Document Dependencies

  • _docs/02_document/components/12_c11_tilemanager/description.md — § 3.2 D-PROJ-2 contract sketch (signature requirement), § 5 SignatureRejectedError, § 7 R09 key-compromise mitigation.
  • _docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md: kind="c11.upload.session.key.public" and kind="c11.upload.signature_rejected" envelopes.
  • _docs/02_document/contracts/shared_logging/log_record_schema.md — INFO/ERROR log shapes for key lifecycle events.
  • _docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md — D-PROJ-2 design task #1 (parent-suite ingest contract), specifically the signature field requirement.

Problem

Without a per-flight ephemeral signing key:

  • D-PROJ-2 contract sketch demands every uploaded tile carry a signature field; without it, satellite-provider's ingest endpoint will reject every payload.
  • The R09 risk (key compromise) is unmitigated — a single static API key would compromise every flight's uploads on first leak; per-flight keys bound the blast radius to one flight.
  • The "ingest-side voting layer" (D-PROJ-2 design task #2) cannot trust uploaded tiles without a way to associate each tile with its source flight; the public key is the binding.
  • AC-NEW-7 (cache-poisoning safety budget) loses one of its layers — the voting layer relies on per-flight keys to detect collusion (multiple compromised companions colluding becomes detectable when their key fingerprints differ from the safety officer's pre-flight enrolment record).
  • Per description.md § 5: SignatureRejectedError is a security-critical event; without a structured handler, it would either crash the upload run or be silently caught.
  • The C11-ST-03 security test (key zeroised after upload) has no implementation to verify against. Without zeroisation, the secret-key bytes remain in heap memory long after the upload completes, which widens the exfiltration window.

This task delivers the key lifecycle. It does NOT plumb the key into the upload payload (TileUploader task does that); it provides sign(payload) as the boundary.

Outcome

  • A PerFlightKeyManager class at src/gps_denied_onboard/components/c11_tilemanager/signing_key.py:
    • Constructor: __init__(self, *, fdr_client: FdrClient, logger: Logger). No state at construction time.
    • start_session(flight_id: uuid.UUID) -> PublicKeyFingerprint:
      1. Generates a fresh Ed25519 keypair via cryptography.hazmat.primitives.asymmetric.ed25519.Ed25519PrivateKey.generate().
      2. Stores the private key in self._private_key (instance state, not module-level).
      3. Computes public_key_pem = private_key.public_key().public_bytes(...).
      4. Computes fingerprint = hashlib.sha256(public_key_pem).hexdigest()[:16].
      5. Emits FDR kind="c11.upload.session.key.public" with {flight_id, public_key_pem, fingerprint, generated_at_iso}.
      6. Emits INFO log kind="c11.upload.session.key.generated" with {flight_id, fingerprint} (NEVER the private key).
      7. Returns PublicKeyFingerprint(flight_id, public_key_pem, fingerprint, generated_at).
    • sign(payload: bytes) -> bytes:
      1. Raises SessionNotActiveError if self._private_key is None.
      2. Returns self._private_key.sign(payload) (Ed25519 signature is 64 bytes).
      3. No log emission per call (would flood at upload throughput).
    • end_session() -> None:
      1. If self._private_key is None, no-op.
      2. Calls self._zeroise_private_key() (overwrites the secret-key bytes with zeros via cryptography's key-deletion guidance, then sets self._private_key = None).
      3. Emits INFO log kind="c11.upload.session.key.zeroised".
    • record_signature_rejection(flight_id, tile_id) -> None:
      1. Emits FDR kind="c11.upload.signature_rejected" with {flight_id, tile_id, fingerprint, observed_at_iso}.
      2. Emits ERROR log with the same payload.
  • PublicKeyFingerprint DTO at src/gps_denied_onboard/components/c11_tilemanager/_types.py: a @dataclass(frozen=True) with the four fields above.
  • SessionNotActiveError defined in src/gps_denied_onboard/components/c11_tilemanager/errors.py — subclasses TileManagerError. (SignatureRejectedError is also defined here, but raised by TileUploader after parsing the ingest response, NOT by this task.)
  • The TileUploader task (separate) calls:
    • start_session(flight_id) once per upload run.
    • sign(payload) once per tile.
    • record_signature_rejection(...) on each per-tile rejection from the ingest response.
    • end_session() in a finally block guaranteeing zeroisation on success or failure.
  • The composition root constructs PerFlightKeyManager and injects it into TileUploader. Factory: build_per_flight_key_manager(fdr_client, logger) -> PerFlightKeyManager.
  • A __del__ safety net calls end_session() if it was never explicitly called, with a WARN log noting the leak. This is a belt-and-braces guarantee, not the primary control.
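
The lifecycle listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `SketchKeyManager` is a hypothetical name, a plain list (`events`) stands in for both FdrClient and Logger, the PEM / SubjectPublicKeyInfo encoding is an assumption (the spec elides the `public_bytes(...)` arguments), and `end_session` here only drops the key reference rather than overwriting any bytes.

```python
import hashlib
import uuid
from datetime import datetime, timezone

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


class SessionNotActiveError(Exception):
    """Raised when sign() is called outside an active session."""


class SketchKeyManager:
    """Illustrative stand-in for PerFlightKeyManager (hypothetical name)."""

    def __init__(self, events: list) -> None:
        self._events = events      # stand-in for FdrClient + Logger
        self._private_key = None   # no key material at construction time
        self.fingerprint = None

    def start_session(self, flight_id: uuid.UUID) -> str:
        self._private_key = Ed25519PrivateKey.generate()
        # Assumed encoding: PEM-wrapped SubjectPublicKeyInfo.
        pem = self._private_key.public_key().public_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PublicFormat.SubjectPublicKeyInfo,
        )
        self.fingerprint = hashlib.sha256(pem).hexdigest()[:16]
        self._events.append({
            "kind": "c11.upload.session.key.public",
            "flight_id": str(flight_id),
            "public_key_pem": pem.decode("ascii"),
            "fingerprint": self.fingerprint,
            "generated_at_iso": datetime.now(timezone.utc).isoformat(),
        })
        return self.fingerprint

    def sign(self, payload: bytes) -> bytes:
        if self._private_key is None:
            raise SessionNotActiveError("no active upload session")
        return self._private_key.sign(payload)  # 64-byte Ed25519 signature

    def end_session(self) -> None:
        if self._private_key is None:
            return  # idempotent: a second call is a no-op
        self._private_key = None  # reference drop only; overwrite not shown
        self._events.append({"kind": "c11.upload.session.key.zeroised"})


# Caller pattern: end_session() sits in a finally block.
events = []
manager = SketchKeyManager(events)
manager.start_session(uuid.uuid4())
try:
    signature = manager.sign(b"tile bytes")
finally:
    manager.end_session()  # runs on success or failure
```

The `finally` block is what makes the explicit path the primary control; a finalizer would only cover callers that forget it.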

Scope

Included

  • PerFlightKeyManager class (4 public methods + __del__ safety net).
  • PublicKeyFingerprint DTO.
  • SessionNotActiveError definition.
  • Ed25519 keypair generation using the project-pinned cryptography library.
  • Best-effort zeroisation of the secret-key buffer (via cryptography library's recommended deletion path; documented as "best-effort" because Python heap zeroisation cannot be guaranteed without ctypes-level pinning).
  • FDR emission on session start (public key) and on signature rejection.
  • INFO log on session lifecycle events; ERROR log on signature rejection.
  • Composition-root factory.
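
As a rough sketch of the DTO and error type in this list: the field names follow the four fields named in the Outcome section, but the field types, the `TileManagerError` base shown here, and the helper name `fingerprint_of` are assumptions made for illustration. The PEM value is a placeholder, not a real key.

```python
import hashlib
import uuid
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone


class TileManagerError(Exception):
    """Assumed base class; the real one lives in the component's errors.py."""


class SessionNotActiveError(TileManagerError):
    """sign() was called with no active session."""


@dataclass(frozen=True)
class PublicKeyFingerprint:
    flight_id: uuid.UUID
    public_key_pem: bytes
    fingerprint: str
    generated_at: datetime


def fingerprint_of(public_key_pem: bytes) -> str:
    """First 16 hex characters (64 bits) of SHA-256 over the PEM bytes."""
    return hashlib.sha256(public_key_pem).hexdigest()[:16]


# Placeholder bytes standing in for a PEM-encoded public key.
pem = b"-----BEGIN PUBLIC KEY-----\n(placeholder)\n-----END PUBLIC KEY-----\n"
record = PublicKeyFingerprint(
    flight_id=uuid.uuid4(),
    public_key_pem=pem,
    fingerprint=fingerprint_of(pem),
    generated_at=datetime.now(timezone.utc),
)
```

Because the dataclass is frozen, assigning to any field after construction raises `FrozenInstanceError`, so a recorded fingerprint cannot be altered in place by a consumer.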

Excluded

  • The TileUploader integration (signing into multipart payloads) — owned by the TileUploader task.
  • Pre-flight key enrolment workflow (the safety officer's record of expected per-flight public keys) — owned by C12 operator tooling.
  • HSM / TPM-backed key storage — out of scope this cycle; the assumption is that the operator workstation's process is trusted enough for ephemeral in-memory keys, with zeroisation as the residual hygiene.
  • Mid-session key rotation — one key per session; rotation requires end_session + start_session.
  • Key persistence between processes — the key is in-memory ONLY; an upload session must complete in one process lifetime.
  • Raising SignatureRejectedError: the class is defined in this component's errors.py, but it is raised by TileUploader after parsing the ingest response, not by this task.

Acceptance Criteria

AC-1: start_session generates a fresh keypair and emits FDR
Given a fresh PerFlightKeyManager
When start_session(flight_id) is called
Then the manager holds a non-None _private_key; PublicKeyFingerprint is returned with a 16-char hex fingerprint; ONE FDR kind="c11.upload.session.key.public" is emitted with the public-key PEM; ONE INFO log without the private key

AC-2: Two consecutive sessions produce different keys
Given start_session(F1) followed by end_session() followed by start_session(F2)
When fingerprints are compared
Then fingerprint_F1 != fingerprint_F2 (cryptographically distinct keys); two FDR records are emitted, one per session

AC-3: sign returns 64-byte Ed25519 signature
Given an active session
When sign(b"hello world") is called
Then a 64-byte signature is returned; the signature verifies against the session's public key (verifiable via Ed25519PublicKey.verify)

AC-4: sign before start_session raises
Given a fresh PerFlightKeyManager
When sign(b"...") is called without prior start_session
Then SessionNotActiveError is raised; no signature is computed

AC-5: sign after end_session raises
Given start_session(F) then end_session()
When sign(b"...") is called
Then SessionNotActiveError is raised

AC-6: end_session zeroises and emits log
Given an active session
When end_session() is called
Then self._private_key is None; the underlying secret-key buffer is overwritten with zeros (verifiable via ctypes.string_at against the buffer address captured pre-zeroise); ONE INFO log kind="c11.upload.session.key.zeroised"
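
The address-based check in AC-6 might look like the sketch below. It assumes the secret bytes live in a mutable `bytearray`, which the spec does not state; random bytes stand in for real key material. Note that the `before` snapshot is itself a copy of the secret, so a real test would use throwaway test keys only.

```python
import ctypes
import os

KEY_LEN = 32  # Ed25519 private seed length in bytes

# Stand-in for secret key material held in a mutable buffer (assumption).
secret = bytearray(os.urandom(KEY_LEN))

# Export the bytearray's storage to ctypes to learn its address.
view = (ctypes.c_char * KEY_LEN).from_buffer(secret)
address = ctypes.addressof(view)

# Snapshot the memory at that address before zeroising.
before = ctypes.string_at(address, KEY_LEN)

# Overwrite in place, byte by byte; the buffer is not reallocated.
for i in range(KEY_LEN):
    secret[i] = 0

# Read the same address again; it should now hold only zeros.
after = ctypes.string_at(address, KEY_LEN)
```

Reading through the captured address, rather than through the `secret` name, is what distinguishes an in-place overwrite from rebinding the name to a fresh zero-filled object.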

AC-7: __del__ safety net zeroises if end_session was missed
Given an active session whose owner is garbage-collected without calling end_session
When the GC runs __del__
Then end_session() runs implicitly; ONE WARN log kind="c11.upload.session.key.zeroised_via_finalizer"; the buffer is zeroised
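
A minimal sketch of the finalizer fallback in AC-7, assuming a list-based log and placeholder key bytes. `FinalizerSketch` is a hypothetical class, and whether the finalizer path also emits the regular INFO record is a choice made here, not something the spec settles.

```python
import gc


class FinalizerSketch:
    """Illustrative only: shows the __del__ fallback, not real key handling."""

    def __init__(self, log: list) -> None:
        self._log = log
        self._secret = None

    def start_session(self) -> None:
        self._secret = bytearray(b"\x01" * 32)  # placeholder key material

    def end_session(self) -> None:
        if self._secret is None:
            return
        for i in range(len(self._secret)):
            self._secret[i] = 0  # overwrite in place before dropping
        self._secret = None
        self._log.append(("INFO", "c11.upload.session.key.zeroised"))

    def __del__(self) -> None:
        # getattr guards against a partially constructed instance.
        if getattr(self, "_secret", None) is not None:
            self._log.append(
                ("WARN", "c11.upload.session.key.zeroised_via_finalizer")
            )
            self.end_session()


log = []
manager = FinalizerSketch(log)
manager.start_session()
del manager   # the owner forgets to call end_session()
gc.collect()  # CPython usually finalizes on `del`; collect() covers cycles
```

Finalizer timing is implementation-dependent, which is one reason the spec treats `__del__` as a safety net rather than the contract.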

AC-8: record_signature_rejection emits FDR + ERROR log
Given an active session and a tile_id
When record_signature_rejection(flight_id, tile_id) is called
Then ONE FDR kind="c11.upload.signature_rejected" is emitted with {flight_id, tile_id, fingerprint, observed_at_iso}; ONE ERROR log with the same payload

AC-9: Private key never logged anywhere
Given the full session lifecycle
When all log records and all FDR records are captured
Then the private-key PEM does NOT appear in ANY record (verifiable via byte search across the captured stream)
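
One way to approach the byte search in AC-9 is sketched below. The record shape (nested dicts, lists, strings, bytes) and the helper names `iter_blobs` and `secret_leaked` are assumptions; the search covers the raw bytes plus hex and base64 renderings, which is a heuristic and not an exhaustive list of encodings.

```python
import base64


def iter_blobs(value):
    """Yield every string or bytes value inside a nested record, as bytes."""
    if isinstance(value, (bytes, bytearray)):
        yield bytes(value)
    elif isinstance(value, str):
        yield value.encode("utf-8")
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_blobs(key)
            yield from iter_blobs(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_blobs(item)
    else:
        yield repr(value).encode("utf-8")


def secret_leaked(records: list, secret: bytes) -> bool:
    """Return True if `secret` appears in any record, raw, hex or base64."""
    needles = (
        secret,
        secret.hex().encode("ascii"),
        base64.b64encode(secret),
    )
    return any(
        needle in blob
        for record in records
        for blob in iter_blobs(record)
        for needle in needles
    )


# Placeholder secret and captured records for illustration.
secret = b"\x13\x37" * 16
clean = [{"kind": "c11.upload.session.key.generated", "fingerprint": "ab12"}]
dirty = clean + [{"kind": "debug", "detail": {"dump": secret.hex()}}]
```

Walking the record values directly avoids a pitfall of searching serialized JSON, where newlines inside a PEM string are escaped and a raw byte search would miss them.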

AC-10: end_session is idempotent
Given an active session
When end_session() is called twice in a row
Then the second call is a no-op; no exception is raised; no second INFO log is emitted

Non-Functional Requirements

Performance

  • sign p99 ≤ 200 µs on the operator workstation (Ed25519 is fast; the bottleneck is the upload network, not signing).
  • start_session ≤ 5 ms (Ed25519 keygen is sub-millisecond; FDR emission + log emission dominate).
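
A p99 measurement for the sign budget could be taken along these lines. The helper name `p99_microseconds` is hypothetical, and a SHA-256 call stands in for `sign` so the sketch runs without a key manager; the measured value depends on the machine, so no expected figure is given.

```python
import hashlib
import statistics
import time


def p99_microseconds(fn, payload: bytes, iterations: int = 10_000) -> float:
    """Time fn(payload) `iterations` times and return the p99 in microseconds."""
    samples_ns = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn(payload)
        samples_ns.append(time.perf_counter_ns() - start)
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
    p99_ns = statistics.quantiles(samples_ns, n=100)[98]
    return p99_ns / 1_000


# Stand-in workload; a real benchmark would pass the manager's sign method.
result_us = p99_microseconds(
    lambda data: hashlib.sha256(data).digest(), b"x" * 1024, 2_000
)
```

Timing each call individually, instead of dividing one total by the iteration count, is what makes a tail percentile such as p99 observable at all.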

Compatibility

  • cryptography library at the project-pinned version. Verify before adding; do NOT bump unilaterally.
  • Ed25519 is available in cryptography.hazmat.primitives.asymmetric.ed25519 since 2.6 — the project pin must be ≥ 2.6.
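
The version floor above can be checked mechanically before relying on the API. The sketch below compares a version string against the 2.6 minimum; `meets_minimum` is a hypothetical helper, and obtaining the installed version (for example via `importlib.metadata.version("cryptography")`) is left to the caller.

```python
import re


def meets_minimum(version: str, minimum: tuple = (2, 6)) -> bool:
    """True if the leading numeric release of `version` is >= `minimum`."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so that "3" compares as (3, 0) against (2, 6).
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= minimum
```

A failed check is a signal to ask before bumping the pin, in line with the rule stated in this section, not a trigger for an automatic upgrade.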

Reliability

  • The manager guarantees zeroisation on end_session AND on __del__ — both paths converge through the same _zeroise_private_key helper.
  • The Python heap layer cannot guarantee bit-perfect zeroisation (freed memory is not wiped and copies of key material may linger; a runtime with a moving collector may also relocate objects); this is documented. The mitigation is: keep the key buffer's lifetime as short as possible (one upload session) and rely on the OS-level memory protections (no swap on the operator workstation per RESTRICT-OPS-1).

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | start_session then capture FDR + log | Public PEM in FDR; fingerprint 16 hex chars; private key not in log |
| AC-2 | Two sessions back-to-back | Different fingerprints |
| AC-3 | Sign + verify roundtrip | 64-byte signature; verifies against public key |
| AC-4 | sign without start_session | SessionNotActiveError |
| AC-5 | sign after end_session | SessionNotActiveError |
| AC-6 | end_session and inspect zeroised buffer | Buffer is all zeros; log emitted |
| AC-7 | Drop reference + force GC | __del__ runs end_session; WARN log |
| AC-8 | record_signature_rejection | FDR + ERROR log with all fields |
| AC-9 | Capture all logs/FDR for a full session; byte-search private PEM | Not present |
| AC-10 | end_session twice | Second call is no-op; no second log |
| NFR-perf-sign | Microbench sign × 100k | p99 ≤ 200 µs |
| NFR-reliability-fingerprint-uniqueness | 1000 sessions with unique flight_ids | All 1000 fingerprints distinct (collision-resistant) |

Constraints

  • The signing algorithm is Ed25519; no per-task choice (the parent suite's D-PROJ-2 contract requires Ed25519 per the leftover file's design).
  • The secret-key never leaves the manager — sign(payload) -> bytes is the only method that uses it; consumers do NOT touch the private key.
  • The public key is logged AND FDR'd (it is public by definition); the private key is NOT logged anywhere — code-review treats any private-key reference outside signing_key.py as a Security finding (Critical).
  • This task pins to the project's existing cryptography version. If the version doesn't support Ed25519PrivateKey.generate(), ASK the user before bumping (per coderule.mdc "verify the API actually exists in the pinned version").
  • __del__ is a safety net, NOT the primary contract — consumers MUST call end_session() explicitly. Code-review treats reliance on __del__ as a Reliability finding.

Risks & Mitigation

Risk 1: Python heap zeroisation is not bit-perfect

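  • Risk: The cryptography library returns the private key as a Python object; freeing the object's memory does not guarantee zeroisation (freed memory is not wiped, copies of the key material may exist outside the manager's reach, and a runtime with a moving collector may relocate objects).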
  • Mitigation: Documented as "best-effort"; the operator workstation runs no-swap (RESTRICT-OPS-1); the key lifetime is bounded to one upload session (typically minutes); the residual exfil window is minimised. A future task could add ctypes-level pinning if the threat model tightens.

Risk 2: __del__ doesn't run when the process is killed (SIGKILL)

  • Risk: A SIGKILL during an active session leaves the key buffer in heap memory until the OS reclaims the process pages.
  • Mitigation: Documented; the OS-level mitigation is process termination → memory pages reclaimed; on Linux with no swap, the bytes never hit disk. No software mitigation is feasible inside the killed process.

Risk 3: FDR ringbuffer overrun loses the public-key record

  • Risk: Under FDR backpressure (AZ-274 overrun), the kind="c11.upload.session.key.public" record might be dropped — the safety officer cannot correlate the upload with a key fingerprint later.
  • Mitigation: AZ-273's ringbuffer is sized per _docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md; this task adds NO new pressure but is documented as critical-priority. Mid-flight FDR loss is already an AC-NEW-1 concern; this task surfaces the dependency.

Risk 4: cryptography library API drift across pins

  • Risk: A minor cryptography bump renames Ed25519PrivateKey.generate() or changes its signature.
  • Mitigation: The task verifies the API against the pinned version (per coderule.mdc); the pin is recorded in requirements.txt; a wrapper isolates the library to this single class.

Risk 5: Replay attack — captured signed payloads re-uploaded by an attacker

  • Risk: An MITM captures a valid (payload, signature) pair and re-uploads to satellite-provider's ingest endpoint.
  • Mitigation: Out of scope for this task — the parent suite's ingest endpoint owns nonce / timestamp validation per the D-PROJ-2 design. C11 includes capture_timestamp in the signed payload (per the leftover file's contract sketch); the parent suite rejects timestamps outside its acceptance window. This task does NOT add a separate nonce.
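
For orientation only, a timestamp acceptance check of the kind attributed to the parent suite might resemble the sketch below. Everything in it is an assumption: the function name, the symmetric window, and the window length are hypothetical, and the actual rule belongs to the D-PROJ-2 ingest design, not to this task.

```python
from datetime import datetime, timedelta, timezone


def within_acceptance_window(
    capture_timestamp: datetime,
    now: datetime,
    window: timedelta,
) -> bool:
    """True if capture_timestamp lies within +/- window of now.

    Both datetimes must be timezone-aware; the symmetric window is assumed.
    """
    return abs(now - capture_timestamp) <= window


# Hypothetical values: a 10-minute window around a fixed reference time.
now = datetime(2026, 5, 11, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(minutes=3)
replayed = now - timedelta(hours=6)
window = timedelta(minutes=10)
```

Because capture_timestamp is part of the signed payload, an attacker who replays a captured pair cannot refresh the timestamp without invalidating the signature.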

Runtime Completeness

  • Named capability: per-flight ephemeral signing key per D-PROJ-2 contract, R09 mitigation, AC-NEW-7 voting-layer enabler (description.md § 7, leftover file design task #1).
  • Production code that must exist: real PerFlightKeyManager with real Ed25519 keypair generation via cryptography, real sign, real best-effort zeroisation, real FDR emission for public-key + signature-rejection events, real __del__ safety net.
  • Allowed external stubs: tests MAY use a fake FdrClient (already provided by AZ-275 fake_fdr_sink) and a fake Logger; production wiring uses the real AZ-273 ringbuffer + AZ-266 logger.
  • Unacceptable substitutes: a hardcoded shared key reused across flights (defeats R09 mitigation); a pseudo-random "key" generated from random.getrandbits instead of cryptography's CSPRNG (rolling our own crypto is rejected per coderule.mdc); skipping end_session zeroisation (loses C11-ST-03 test surface); logging the private key for "debugging" (Critical Security finding).