gps-denied-onboard/_docs/02_tasks/todo/AZ-323_c10_manifest_builder.md
Oleksandr Bezdieniezhnykh 880eabcb3f Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 00:39:48 +03:00


C10 Manifest Builder — Content-Hash Table + Operator-Key Ed25519 Signing

Task: AZ-323_c10_manifest_builder
Name: C10 Manifest Builder
Description: Implement ManifestBuilder, the C10-internal phase that produces the signed cache Manifest covering EVERY shipped artifact (engines, FAISS index, calibration JSON, all tile hashes from C6) plus the build-identity tuple (model_ids, calibration_sha256, tiles_coverage_sha256 over the sorted tile hashes, sector_class, bbox, zoom_levels) whose canonical hash is manifest_hash, the D-C10-1 idempotence key. Serializes the Manifest as canonical JSON (sorted keys, fixed 2-space indentation) at cache_root/Manifest.json, computes its own SHA-256 sidecar via AZ-280, and writes a detached Ed25519 signature at cache_root/Manifest.json.sig using the operator's signing key from key_path. Refuses to sign with a non-operator key when config.c10.signing_mode = "operator" (C10-ST-01). Emits the signing_public_key_fingerprint into the Manifest itself so verifiers can pin the trust root.
Complexity: 3 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema, AZ-303_c6_storage_interfaces
Component: c10_provisioning (epic AZ-252 / E-C10)
Tracker: AZ-323
Epic: AZ-252 (E-C10)

Document Dependencies

  • _docs/02_document/contracts/shared_helpers/sha256_sidecar.md — atomic write + sidecar pattern (AZ-280).
  • _docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md — query_by_bbox returning per-tile sha256 set by AZ-316.
  • _docs/02_document/components/11_c10_provisioning/description.md — § 1 idempotence, § 5 ManifestWriteError, § 7 D-C10-3 sidecar coverage.

Problem

Without a real Manifest builder:

  • D-C10-1 (idempotent re-run via manifest hash) cannot be implemented — T5's "did anything change?" check has no canonical hash to compare.
  • D-C10-3 (SHA-256 content-hash gate over every shipped artifact) is unobservable — the takeoff verifier (T4) has nothing to verify against.
  • AC-NEW-1 ("no engine deserialization at takeoff before manifest verify") collapses without a signed Manifest at takeoff.
  • C10-ST-01 (build refuses dev-key signing in operator mode) cannot be enforced without a signing key check.
  • The signing_public_key_fingerprint field is the trust anchor for the airborne ManifestVerifier; without it, the verifier cannot decide which key is allowed to vouch for a Manifest.
  • The Manifest must remain human-inspectable JSON even for a large corpus (100k tile hashes × 80 bytes = 8 MB of hashed tile data), and its bytes must be reproducible: without canonical JSON ordering, two builds of the same input produce different bytes and break idempotence.

This task delivers the Manifest serialization + signing. It does NOT compile engines (AZ-321), embed tiles (AZ-322), or run the takeoff verify (T4).

Outcome

  • A ManifestBuilder class at src/gps_denied_onboard/components/c10_provisioning/manifest_builder.py:
    • Constructor: __init__(self, *, sidecar: Sha256Sidecar, signer: ManifestSigner, tile_metadata_store: TileMetadataStore, logger: Logger, clock: Clock, config: C10ManifestConfig).
    • C10ManifestConfig (@dataclass(frozen=True)): signing_mode: enum {operator, dev}, allowed_operator_fingerprints: tuple[str, ...], schema_version: str = "1.0".
    • Public method: build_manifest(input: ManifestBuildInput) -> ManifestArtifact.
    • ManifestBuildInput (@dataclass(frozen=True)): cache_root: Path, bbox: Bbox, zoom_levels: tuple[int, ...], sector_class: SectorClassification, engine_entries: tuple[EngineCacheEntry, ...], descriptor_index_path: Path, calibration_path: Path, key_path: Path.
    • ManifestArtifact (@dataclass(frozen=True)): manifest_path: Path, signature_path: Path, manifest_hash: str, signing_public_key_fingerprint: str, total_artifacts_listed: int.
  • A ManifestSigner Protocol at src/gps_denied_onboard/components/c10_provisioning/interface.py:
    @runtime_checkable
    class ManifestSigner(Protocol):
        def load_signing_key(self, key_path: Path) -> SigningKeyHandle: ...
        def sign(self, key: SigningKeyHandle, payload_bytes: bytes) -> bytes: ...
        def public_key_fingerprint(self, key: SigningKeyHandle) -> str: ...
    
    Default impl Ed25519ManifestSigner uses the cryptography library (already pinned via AZ-318 for per-flight keys).
  • Method flow:
    1. Load operator signing key: signer.load_signing_key(input.key_path) → SigningKeyHandle.
    2. Compute signing_public_key_fingerprint = signer.public_key_fingerprint(key) (sha256 of the raw 32-byte ed25519 public key, hex).
    3. Operator-mode gate (C10-ST-01): if config.signing_mode == "operator" AND fingerprint not in config.allowed_operator_fingerprints → raise ManifestWriteError("signing key fingerprint not in allowed_operator_fingerprints"); ERROR log with the offending fingerprint. If config.signing_mode == "dev" AND fingerprint matches an allowed operator fingerprint → emit WARN c10.manifest.dev_mode_with_operator_key (operator key being used in dev mode is suspicious but allowed).
    4. Compute per-artifact hashes:
      • For each engine entry: read entry.engine_sha256_hex (already computed by AZ-321; do NOT re-hash).
      • For descriptor index: call sidecar.read_sidecar(input.descriptor_index_path) → expect a 64-char hex digest.
      • For calibration JSON: sha256_hex(input.calibration_path.read_bytes()). The calibration file is small (KB), so reading it whole is fine.
      • For tiles: call tile_metadata_store.query_by_bbox(bbox, zoom_levels, sector_class) → list of TileMetadata with sha256_hex field (set by AZ-316). Sort by (zoom, lat, lon, source) for determinism. Compute tiles_coverage_sha256 = sha256(b"\n".join(f"{t.tile_id}:{t.sha256_hex}".encode() for t in sorted_tiles)).
    5. Build the canonical Manifest dict:
      {
        "schema_version": "1.0",
        "build": {
          "bbox": {...},
          "zoom_levels": [16, 17, 18],
          "sector_class": "stable_rear",
          "built_at": "2026-05-10T12:00:00Z",
          "manifest_hash": "<sha256-hex>"
        },
        "artifacts": {
          "engines": [{"path": "engines/dinov2_vpr_sm87_jp62_trt103_fp16.engine", "sha256": "<hex>"}, ...],
          "descriptor_index": {"path": "descriptors/corpus.index", "sha256": "<hex>"},
          "calibration": {"path": "calibration/int8_calibration.json", "sha256": "<hex>"},
          "tiles_coverage": {"sha256": "<hex>", "tile_count": <int>}
        },
        "signing_public_key_fingerprint": "<hex>"
      }
      
    6. Compute manifest_hash as sha256(canonical_json(build_identity)), where build_identity is the mapping with keys {model_ids, calibration_sha256, tiles_coverage_sha256, sector_class, bbox, zoom_levels} serialized with sorted keys. built_at is NOT part of build_identity. This is the D-C10-1 idempotence key. Insert it into the Manifest dict at build.manifest_hash AFTER computation.
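    7. Serialize the Manifest dict as canonical JSON: canonical_json_bytes = orjson.dumps(manifest, option=orjson.OPT_SORT_KEYS | orjson.OPT_INDENT_2 | orjson.OPT_APPEND_NEWLINE). orjson.dumps returns bytes, and OPT_APPEND_NEWLINE adds the trailing newline; keep the result as bytes, because steps 8 and 9 write and sign exactly these bytes.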
    8. Atomic-write the JSON via sidecar.write_with_sidecar(cache_root / "Manifest.json", canonical_json_bytes) — produces Manifest.json + Manifest.json.sha256 (the latter is the Manifest's OWN sha256, used by T4).
    9. Sign the canonical JSON bytes: signature_bytes = signer.sign(key, canonical_json_bytes) (raw Ed25519 signature, 64 bytes).
    10. Atomic-write the signature: sidecar.atomic_write(cache_root / "Manifest.json.sig", signature_bytes) (no .sha256 sidecar for the signature itself — signature integrity is verified by Ed25519 over the Manifest bytes).
    11. Return ManifestArtifact(manifest_path, signature_path, manifest_hash, signing_public_key_fingerprint, total_artifacts_listed).
  • INFO log on successful build (c10.manifest.build.success with manifest_hash + total_artifacts_listed); ERROR on ManifestWriteError; WARN on dev-mode-with-operator-key.
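Step 6 of the method flow can be sketched with the standard library alone. The sketch below is illustrative only: the helper name compute_manifest_hash, the bbox field names, and the example model ids are hypothetical, json.dumps with sort_keys=True stands in for the project's orjson serializer, and sorting model_ids and zoom_levels before hashing is an assumption about which lists are order-insensitive.

```python
import hashlib
import json


def compute_manifest_hash(
    model_ids: list[str],
    calibration_sha256: str,
    tiles_coverage_sha256: str,
    sector_class: str,
    bbox: dict[str, float],
    zoom_levels: list[int],
) -> str:
    """Hash the build-identity fields. built_at is deliberately not an input."""
    identity = {
        "model_ids": sorted(model_ids),      # assumption: order-insensitive
        "calibration_sha256": calibration_sha256,
        "tiles_coverage_sha256": tiles_coverage_sha256,
        "sector_class": sector_class,
        "bbox": bbox,
        "zoom_levels": sorted(zoom_levels),  # assumption: order-insensitive
    }
    # Stand-in for the project's canonical JSON: sorted keys, fixed separators.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical inputs; only the structure matters here.
common = dict(
    calibration_sha256="ab" * 32,
    tiles_coverage_sha256="cd" * 32,
    bbox={"min_lat": 48.0, "min_lon": 35.0, "max_lat": 48.5, "max_lon": 35.5},
)
first = compute_manifest_hash(
    model_ids=["model_a", "model_b"], sector_class="stable_rear",
    zoom_levels=[16, 17, 18], **common,
)
second = compute_manifest_hash(
    model_ids=["model_b", "model_a"], sector_class="stable_rear",
    zoom_levels=[18, 17, 16], **common,
)
changed = compute_manifest_hash(
    model_ids=["model_a", "model_b"], sector_class="other_class",
    zoom_levels=[16, 17, 18], **common,
)
```

Because built_at never enters the function, two builds on different days with the same inputs yield the same manifest_hash, which is what T5's idempotence check relies on.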

Scope

Included

  • ManifestBuilder class with the single public method.
  • ManifestSigner Protocol + Ed25519ManifestSigner default impl.
  • Canonical JSON serialization (sorted keys, sorted lists where order is content-defining).
  • Operator-key gate per signing_mode config.
  • Per-artifact hash computation (engines, descriptor index, calibration, tiles aggregate).
  • Atomic writes via AZ-280 for both Manifest.json and Manifest.json.sig.
  • Composition-root factory build_manifest_builder.
  • Conformance test for ManifestSigner Protocol.

Excluded

  • The orchestration of when to build (T5 owns).
  • Engine compilation / descriptor generation (AZ-321 / AZ-322).
  • Manifest verification (T4 owns).
  • Idempotence "should we skip the build?" decision (T5 owns; this task always rebuilds when called).
  • ManifestCoverageError (T5 owns; this task lists what it's told, doesn't enumerate cache_root).
  • Key generation — operator's long-lived key is provisioned out-of-band; this task only loads + uses.
  • Multi-key signing (M-of-N quorum) — single-key per build.
  • Compressed Manifest format — JSON for human inspection.

Acceptance Criteria

AC-1: Happy path produces Manifest + sig + sidecars Given a valid input with 3 engines, 1 descriptor index, 1 calibration JSON, 100 tiles When build_manifest(input) is called Then Manifest.json, Manifest.json.sha256, Manifest.json.sig are all present at cache_root/; the Manifest contains 3 engine entries, 1 descriptor_index entry, 1 calibration entry, 1 tiles_coverage entry; manifest_hash is a 64-char lowercase hex string; the returned ManifestArtifact.total_artifacts_listed == 6 (3 engines + index + calibration + tiles_coverage as one logical artifact; per AC-12 the Manifest itself and its signature are not counted)

AC-2: Determinism — same input produces byte-identical Manifest Given the same ManifestBuildInput run twice on different days (different built_at) When the canonical JSON is compared with built_at redacted Then both runs produce byte-identical bytes — proves canonical JSON ordering works; same manifest_hash. (This is the foundation for T5's idempotence check.)
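The AC-2 comparison can be sketched as follows. This is a minimal illustration, not the project's test: json.dumps with sort_keys=True and indent=2 stands in for the orjson call, and the helper names canonical_bytes and redact_built_at are hypothetical.

```python
import copy
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Stand-in for the orjson call: sorted keys, 2-space indent, trailing newline."""
    return (json.dumps(manifest, sort_keys=True, indent=2) + "\n").encode("utf-8")


def redact_built_at(manifest: dict) -> dict:
    """Return a copy with build.built_at replaced by a fixed placeholder."""
    redacted = copy.deepcopy(manifest)
    redacted["build"]["built_at"] = "<redacted>"
    return redacted


# Same content, different key insertion order and different build day.
run_a = {
    "schema_version": "1.0",
    "build": {
        "built_at": "2026-05-10T12:00:00Z",
        "sector_class": "stable_rear",
        "zoom_levels": [16, 17, 18],
    },
}
run_b = {
    "build": {
        "zoom_levels": [16, 17, 18],
        "sector_class": "stable_rear",
        "built_at": "2026-05-11T09:30:00Z",
    },
    "schema_version": "1.0",
}

identical_when_redacted = (
    canonical_bytes(redact_built_at(run_a)) == canonical_bytes(redact_built_at(run_b))
)
identical_raw = canonical_bytes(run_a) == canonical_bytes(run_b)
```

Sorting keys removes the dependence on dict insertion order; the only remaining difference between the two runs is built_at, which is why the raw bytes differ while the redacted bytes match.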

AC-3: Signature verifies against the public key Given the signature file + the operator's public key When cryptography.hazmat.primitives.asymmetric.ed25519.Ed25519PublicKey.verify(signature, manifest_bytes) is called Then no exception is raised — proves the signing produced a valid Ed25519 signature
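A minimal sketch of the sign, fingerprint, and verify round trip that AC-3 describes, using the cryptography library's Ed25519 API. The key is generated in-process purely for illustration (the real operator key is provisioned out-of-band), and the payload bytes are a placeholder rather than a real Manifest.

```python
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Throwaway key for the demo only.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

manifest_bytes = b'{\n  "schema_version": "1.0"\n}\n'  # placeholder payload

# Detached signature over the exact serialized bytes (raw, 64 bytes).
signature = private_key.sign(manifest_bytes)

# Fingerprint as described in step 2: sha256 over the raw 32-byte public key.
raw_public = public_key.public_bytes(
    encoding=serialization.Encoding.Raw,
    format=serialization.PublicFormat.Raw,
)
fingerprint = hashlib.sha256(raw_public).hexdigest()

# verify() returns None on success and raises InvalidSignature otherwise.
public_key.verify(signature, manifest_bytes)

try:
    public_key.verify(signature, manifest_bytes + b" ")  # one extra byte
    tamper_detected = False
except InvalidSignature:
    tamper_detected = True
```

The signature covers the serialized bytes, not the parsed JSON, so a verifier can check it before parsing anything.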

AC-4: Operator-mode rejects unknown fingerprint Given config.signing_mode = "operator" and config.allowed_operator_fingerprints = ("known_fp",) and a key file whose fingerprint is "unknown_fp" When build_manifest is called Then ManifestWriteError is raised with a message naming both fingerprints (the offered one + the allowlist); ZERO files are written; ONE ERROR log

AC-5: Operator-mode accepts known fingerprint Given config.signing_mode = "operator" and the key file's fingerprint IS in the allowlist When build_manifest is called Then the build succeeds; ZERO WARN logs about dev-mode

AC-6: Dev-mode with non-operator key emits no warning Given config.signing_mode = "dev" and a random dev key (not in allowlist) When build_manifest is called Then build succeeds; signing_public_key_fingerprint is the dev key's; ZERO warnings about operator key in dev mode

AC-7: Dev-mode with operator key emits warning Given config.signing_mode = "dev" and a key whose fingerprint IS in allowed_operator_fingerprints When build_manifest is called Then build succeeds; ONE WARN log c10.manifest.dev_mode_with_operator_key with the fingerprint
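The four gate outcomes in AC-4 through AC-7 can be sketched as one small function. Everything here is a stand-in under stated assumptions: GateConfig is a hypothetical subset of C10ManifestConfig, ManifestWriteError is redefined locally, the ERROR log text is invented, and the list-collecting handler exists only so the demo can count log records.

```python
import logging
from dataclasses import dataclass


class ManifestWriteError(Exception):
    """Local stand-in for the project's ManifestWriteError."""


@dataclass(frozen=True)
class GateConfig:
    """Hypothetical subset of C10ManifestConfig needed by the gate."""
    signing_mode: str  # "operator" or "dev"
    allowed_operator_fingerprints: tuple[str, ...]


def check_signing_gate(config: GateConfig, fingerprint: str, logger: logging.Logger) -> None:
    """Raise in operator mode for an unknown key; warn in dev mode for an operator key."""
    is_allowed = fingerprint in config.allowed_operator_fingerprints
    if config.signing_mode == "operator" and not is_allowed:
        logger.error("signing key fingerprint rejected: %s", fingerprint)
        raise ManifestWriteError(
            f"signing key fingerprint {fingerprint} not in "
            f"allowed_operator_fingerprints {config.allowed_operator_fingerprints}"
        )
    if config.signing_mode == "dev" and is_allowed:
        logger.warning("c10.manifest.dev_mode_with_operator_key fingerprint=%s", fingerprint)


class _ListHandler(logging.Handler):
    """Collects log records so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = _ListHandler()
log = logging.getLogger("c10.manifest.gate_demo")
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False

operator_cfg = GateConfig("operator", ("known_fp",))
dev_cfg = GateConfig("dev", ("known_fp",))

check_signing_gate(operator_cfg, "known_fp", log)  # AC-5: passes, no log
check_signing_gate(dev_cfg, "dev_fp", log)         # AC-6: passes, no log
check_signing_gate(dev_cfg, "known_fp", log)       # AC-7: passes, one WARN
try:
    check_signing_gate(operator_cfg, "unknown_fp", log)  # AC-4: rejected
    rejected = False
except ManifestWriteError as err:
    rejected = True
    rejection_message = str(err)

levels = [record.levelname for record in handler.records]
```

Running the gate before any hashing or writing keeps the "ZERO files are written" guarantee in AC-4 trivially true.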

AC-8: Tile coverage hash is sort-order-deterministic Given the same 100 tiles loaded in two different SQL row orders (e.g., insertion order vs index scan) When tiles_coverage_sha256 is computed Then both runs produce the same hash — proves the (zoom, lat, lon, source) sort is canonical
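A sketch of the aggregate tile hash and the order-independence property AC-8 asks for. TileRow is a hypothetical stand-in for TileMetadata, and the tile_id format and coordinates are made up for the demo; only the sort key and the "tile_id:sha256_hex" line format come from the method flow.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TileRow:
    """Hypothetical stand-in for TileMetadata with only the fields used here."""
    tile_id: str
    zoom: int
    lat: float
    lon: float
    source: str
    sha256_hex: str


def tiles_coverage_sha256(tiles: list[TileRow]) -> str:
    """Sort by (zoom, lat, lon, source), then hash the joined 'tile_id:sha256' lines."""
    ordered = sorted(tiles, key=lambda t: (t.zoom, t.lat, t.lon, t.source))
    payload = b"\n".join(f"{t.tile_id}:{t.sha256_hex}".encode("utf-8") for t in ordered)
    return hashlib.sha256(payload).hexdigest()


# 100 synthetic tiles with distinct sort keys.
tiles = [
    TileRow(
        tile_id=f"{16 + i % 3}/{i}",
        zoom=16 + i % 3,
        lat=48.0 + i * 0.001,
        lon=35.0 + i * 0.001,
        source="src_a",
        sha256_hex=hashlib.sha256(str(i).encode("utf-8")).hexdigest(),
    )
    for i in range(100)
]

# Simulate a different SQL row order with a seeded shuffle.
shuffled = tiles[:]
random.Random(7).shuffle(shuffled)

hash_in_order = tiles_coverage_sha256(tiles)
hash_shuffled = tiles_coverage_sha256(shuffled)
```

The result is order-independent only if the sort key is unique per tile. Python's sort is stable, so two tiles sharing the same (zoom, lat, lon, source) would keep their input order and could change the hash.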

AC-9: ManifestWriteError on key load failure Given a key_path that does not exist OR contains malformed PEM When signer.load_signing_key(key_path) raises Then ManifestWriteError("operator signing key load failed: <reason>") is raised; ZERO files are written; the original cryptography exception is chained as __cause__ for diagnosis
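One way to get the error chaining AC-9 requires is to wrap the load in a single try block and re-raise with "from". The sketch below assumes unencrypted PEM-encoded PKCS8 keys, redefines ManifestWriteError locally, and returns the cryptography key object directly instead of a SigningKeyHandle; the header check is an illustrative way to give a format hint, not a confirmed part of the design.

```python
import tempfile
from pathlib import Path

from cryptography.exceptions import UnsupportedAlgorithm
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


class ManifestWriteError(Exception):
    """Local stand-in for the project's ManifestWriteError."""


PKCS8_PEM_HEADER = b"-----BEGIN PRIVATE KEY-----"


def load_signing_key(key_path: Path) -> Ed25519PrivateKey:
    """Load an unencrypted PEM PKCS8 Ed25519 key, chaining the original error."""
    try:
        pem = key_path.read_bytes()
        if not pem.lstrip().startswith(PKCS8_PEM_HEADER):
            raise ValueError("expected PEM-encoded PKCS8 ('BEGIN PRIVATE KEY')")
        key = serialization.load_pem_private_key(pem, password=None)
    except (OSError, ValueError, TypeError, UnsupportedAlgorithm) as exc:
        raise ManifestWriteError(f"operator signing key load failed: {exc}") from exc
    if not isinstance(key, Ed25519PrivateKey):
        raise ManifestWriteError("operator signing key load failed: not an Ed25519 key")
    return key


with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway key in the accepted format, then load it back.
    key_file = Path(tmp) / "operator_key.pem"
    key_file.write_bytes(
        Ed25519PrivateKey.generate().private_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PrivateFormat.PKCS8,
            encryption_algorithm=serialization.NoEncryption(),
        )
    )
    loaded = load_signing_key(key_file)

    # Missing file: the OSError subclass is preserved as __cause__.
    try:
        load_signing_key(Path(tmp) / "missing.pem")
        missing_cause = None
    except ManifestWriteError as err:
        missing_cause = err.__cause__

    # Wrong format: rejected with a format hint in the message.
    bad_file = Path(tmp) / "bad.pem"
    bad_file.write_bytes(b"-----BEGIN OPENSSH PRIVATE KEY-----\nAAAA\n")
    try:
        load_signing_key(bad_file)
        bad_message = ""
    except ManifestWriteError as err:
        bad_message = str(err)
```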

AC-10: Atomic write — partial Manifest impossible Given the Manifest is being written and the process is killed mid-write When restarted Then either the previous-good Manifest OR the new Manifest is at the path; never a half-written JSON. (AZ-280's atomic-write contract.)
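The guarantee in AC-10 is commonly obtained by writing to a temporary file in the target directory and renaming it over the destination. The sketch below shows that general pattern; it is an assumption about how AZ-280 might work internally, not a description of its actual implementation, and the atomic_write function here is a local illustration.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Write data to a temp file beside path, fsync it, then rename over path."""
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        # os.replace is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Never leave a stray temp file behind on failure.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "Manifest.json"
    atomic_write(target, b'{"v": 1}\n')
    atomic_write(target, b'{"v": 2}\n')  # replaces the previous file in one step
    final_bytes = target.read_bytes()
    leftovers = sorted(p.name for p in Path(tmp_dir).iterdir())
```

A reader at the destination path sees either the old bytes or the new bytes. A kill before the rename leaves the old file untouched plus, at worst, an orphaned temp file.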

AC-11: Manifest's own sidecar is consistent Given a freshly-written Manifest.json When sha256_hex(open("Manifest.json", "rb").read()) is computed and compared to Manifest.json.sha256 Then the values match — T4's verifier walks all sidecars and this is the entry point
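The AC-11 check can be sketched in a few lines. The sidecar file format is an assumption here: the sketch takes the first whitespace-separated token of the .sha256 file as the hex digest, which covers both a bare digest and a sha256sum-style line. The helper name sidecar_matches is hypothetical, and the demo writes files directly only to set up the fixture.

```python
import hashlib
import tempfile
from pathlib import Path


def sidecar_matches(artifact: Path) -> bool:
    """Compare the artifact's SHA-256 with the digest stored in '<name>.sha256'."""
    sidecar = artifact.with_name(artifact.name + ".sha256")
    # Assumption: the digest is the first whitespace-separated token.
    expected = sidecar.read_text(encoding="ascii").split()[0].lower()
    actual = hashlib.sha256(artifact.read_bytes()).hexdigest()
    return actual == expected


with tempfile.TemporaryDirectory() as tmp_dir:
    manifest = Path(tmp_dir) / "Manifest.json"
    manifest.write_bytes(b'{"schema_version": "1.0"}\n')
    digest = hashlib.sha256(manifest.read_bytes()).hexdigest()
    (Path(tmp_dir) / "Manifest.json.sha256").write_text(digest + "\n", encoding="ascii")

    ok_fresh = sidecar_matches(manifest)

    # Any change to the Manifest bytes without updating the sidecar must be caught.
    manifest.write_bytes(b'{"schema_version": "1.1"}\n')
    ok_after_drift = sidecar_matches(manifest)
```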

AC-12: total_artifacts_listed equals dict-counted artifacts Given an input with N engines + 1 index + 1 calibration + tiles_coverage When ManifestArtifact.total_artifacts_listed is inspected Then it equals N + 3 (engines + index + calibration + tiles_coverage); does NOT count the Manifest itself or the signature

Non-Functional Requirements

Performance

  • Build wall-clock ≤ 5 s for a 100k-tile corpus on Tier-1 dev workstation: sorting 100k tile hashes and computing one SHA-256 over the concatenated "tile_id:sha256_hex" lines is roughly 8 MB of input (100k × ~80 bytes), which should hash in well under 100 ms; the serialized JSON carries only the aggregate hash and an integer tile_count, so its size does not grow with the corpus; engine and index hashes are already computed upstream and the calibration file is small. The 5 s budget leaves ample headroom.
  • Operator-mode fingerprint check is a single string comparison.
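A quick way to sanity-check the hashing cost for a 100k-tile corpus is to build a synthetic payload and time a single SHA-256 over it. The tile_id format below is invented, so the payload size is only indicative; with 16-character ids each line is 81 bytes and the payload comes to about 8 MB. The timing is measured, not asserted, because it depends on the machine.

```python
import hashlib
import time

TILE_COUNT = 100_000

# Synthetic "tile_id:sha256_hex" lines with a made-up 16-character tile_id.
lines = [
    f"18/{i:06d}/{i:06d}:{hashlib.sha256(str(i).encode('utf-8')).hexdigest()}".encode("utf-8")
    for i in range(TILE_COUNT)
]
payload = b"\n".join(lines)

start = time.perf_counter()
digest = hashlib.sha256(payload).hexdigest()
elapsed_s = time.perf_counter() - start

payload_mb = len(payload) / 1e6
print(f"payload: {payload_mb:.1f} MB, sha256 took {elapsed_s * 1000:.1f} ms")
```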

Compatibility

  • Uses orjson (already pinned via AZ-272 for FDR), cryptography (already pinned via AZ-318 for per-flight keys), hashlib (stdlib).
  • No new third-party dependencies.

Reliability

  • Operator-key gate is fail-closed: unknown fingerprint → no Manifest written.
  • Atomic writes prevent half-written Manifests on process kill.
  • Canonical JSON ensures bit-identical Manifests for identical inputs (foundation for D-C10-1 idempotence in T5).

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | Build with 3 engines + index + calibration + 100 tiles | All files present; counts match |
| AC-2 | Build twice, redact built_at, compare bytes | Identical |
| AC-3 | Verify signature with public key | No raise |
| AC-4 | Operator mode + unknown fingerprint | ManifestWriteError; no files |
| AC-5 | Operator mode + known fingerprint | Success; no warnings |
| AC-6 | Dev mode + dev key | Success; no warnings |
| AC-7 | Dev mode + operator-allowlisted key | Success; ONE warning |
| AC-8 | Tile rows in different orders | Same tiles_coverage_sha256 |
| AC-9 | Missing or malformed key file | ManifestWriteError; chained cause |
| AC-10 | Kill mid-write | No half-Manifest |
| AC-11 | Verify Manifest's own sidecar | Hashes match |
| AC-12 | Inspect total_artifacts_listed | Counts engines+index+calibration+tiles_coverage |
| NFR-perf | 100k-tile bench | ≤ 5 s wall clock |
| NFR-reliability-fail-closed | Operator mode + unknown fp | Fail-closed; nothing written |

Constraints

  • Canonical JSON via orjson with OPT_SORT_KEYS; this task does NOT use a different JSON library.
  • Atomic writes via AZ-280 for BOTH Manifest.json and Manifest.json.sig; no naked Path.write_bytes().
  • manifest_hash excludes built_at (it's a build-identity hash, not a Manifest-bytes hash).
  • The Manifest's own SHA-256 sidecar (Manifest.json.sha256) IS the Manifest-bytes hash and is used by T4 at takeoff.
  • Tile coverage hashing is via aggregate tiles_coverage_sha256, NOT per-tile entries in the Manifest (keeps Manifest bounded).
  • Signature is detached (separate .sig file); embedded signatures are NOT permitted (would require parsing before verifying).
  • Ed25519 only; this task does NOT add other algorithms.
  • Operator-key fingerprint allowlist is config-driven; no hardcoded keys.

Risks & Mitigation

Risk 1: built_at makes Manifests non-deterministic for the same input

  • Risk: Idempotence check in T5 compares manifest_hash only, but if T5 reads the Manifest bytes directly elsewhere it could see different bytes for "same" build.
  • Mitigation: The manifest_hash computation excludes built_at (see Constraints), and AC-2 compares Manifest bytes only with built_at redacted. T5 compares hashes, not bytes. Documented in the Manifest schema.

Risk 2: tiles_coverage as aggregate hides which tile changed

  • Risk: When verify fails at takeoff (T4), the operator only learns "tiles_coverage hash mismatch", not WHICH tile drifted.
  • Mitigation: T4's failure path can re-walk per-tile hashes against C6 to identify the offender. The Manifest stays small; debugging detail is computed on-demand. Documented in T4's scope.

Risk 3: cryptography API breaks between minor versions

  • Risk: Ed25519 API changes (unlikely but cryptography does ship breaking changes occasionally).
  • Mitigation: Pin to the same version used by AZ-318. The Ed25519ManifestSigner is the only place using the API; a one-place adapter swap on upgrade.

Risk 4: Operator key file format ambiguity

  • Risk: Operators might supply a key in PKCS8, OpenSSH, or raw 32-byte format.
  • Mitigation: Ed25519ManifestSigner.load_signing_key accepts PEM-encoded PKCS8 only (matches AZ-318's convention); other formats raise ManifestWriteError with explicit format hint.

Risk 5: Dev key accidentally signs an operator-mode build

  • Risk: Operator runs build with signing_mode = "operator" but supplies a dev key by mistake.
  • Mitigation: AC-4 covers; the gate is fail-closed and logs the offending fingerprint so the operator can correct.

Runtime Completeness

  • Named capability: signed Manifest production with content-hash table covering every shipped artifact, D-C10-1 idempotence key (manifest_hash), C10-ST-01 operator-mode gate (epic § Acceptance C10-IT-01, C10-IT-02, C10-ST-01).
  • Production code that must exist: real ManifestBuilder orchestrating real Ed25519ManifestSigner (cryptography library) + real AZ-280 atomic writes + real C6 query_by_bbox to gather tile hashes; real config-driven fingerprint allowlist.
  • Allowed external stubs: tests MAY use a fake ManifestSigner with a known keypair generated in-test + a fake tile_metadata_store (AZ-303 conformance fakes); production wiring uses cryptography.hazmat.
  • Unacceptable substitutes: HMAC instead of Ed25519 (different trust model: symmetric vs asymmetric); embedding the signature in the JSON (reintroduces the parse-before-verify problem at takeoff); Python-only pickle of the Manifest (not human-inspectable, not canonical-byte stable); skipping the operator-fingerprint allowlist when signing_mode = "operator" (defeats C10-ST-01); serializing without sorted keys, e.g. orjson.dumps without OPT_SORT_KEYS (breaks AC-2 determinism and breaks T5's idempotence).