gps-denied-onboard/_docs/02_tasks/done/AZ-323_c10_manifest_builder.md
Oleksandr Bezdieniezhnykh e2bebefdfc [AZ-507] [AZ-323] [AZ-324] C10 Manifest build + verify + AZ-270 hygiene
AZ-507: codify cross-component import rule. Added
_types/inference_errors.py shim re-exporting EngineBuildError +
CalibrationCacheError from c7_inference; narrowed C10
EngineCompiler's except Exception to the two typed errors so unknown
exceptions propagate (AC-3). Rewrote module-layout.md "Imports from"
sections for 9 components + added Rule 9; appended an
architecture.md ADR-009 note explaining why components must go
through _types/*.

AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via
orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha
sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate
(C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest
AND manifest_hash so re-planned routes change the cache identity
(AC-15/AC-16). 20 unit tests cover all 16 ACs.

AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json
sidecar self-hash, Ed25519 trust-key set, schema parse with
absolute/.. path rejection + takeoff_origin in-bbox check, stream
SHA-256 per artifact with multi-failure accumulation. Operator mode
re-derives tiles_coverage_sha256 from C6; airborne mode trusts the
signed aggregate. 19 unit tests cover all 17 ACs.

Composition root: c10_factory.build_manifest_builder +
build_manifest_verifier + c6_tile_metadata_store_to_tiles_query
adapter (the one place that legitimately imports both C6 and C10
without violating the AZ-270 lint).

Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml.

Tests: 1300 passed, 80 skipped (env-only), ruff clean for all
AZ-323/324 files.

AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++
pybind11 toolchain not present in this environment.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 02:37:14 +03:00


C10 Manifest Builder — Content-Hash Table + Operator-Key Ed25519 Signing

Task: AZ-323_c10_manifest_builder
Name: C10 Manifest Builder
Description: Implement ManifestBuilder, the C10-internal phase that produces the signed cache Manifest covering EVERY shipped artifact (engines, FAISS index, calibration JSON, all tile hashes from C6) plus the build-identity tuple (model_ids, calibration_sha256, sorted_tile_hashes, sector_class, bbox, zoom_levels, takeoff_origin, flight_id) whose canonical hash is manifest_hash, the D-C10-1 idempotence key. The takeoff_origin (LatLonAlt) and flight_id (UUID) are supplied by C12 from Flight.waypoints[0] via the FlightsApiClient (ADR-010, AZ-489); both are baked into the Manifest body and included in manifest_hash so re-planning the flight produces a new cache identity. Serializes the Manifest as canonical JSON (sorted keys, 2-space indent, trailing newline) at cache_root/Manifest.json, computes its own SHA-256 sidecar via AZ-280, and writes a detached Ed25519 signature at cache_root/Manifest.json.sig using the operator's signing key from key_path. Refuses to sign with a non-operator key when config.c10.signing_mode = "operator" (C10-ST-01). Emits the signing_public_key_fingerprint into the Manifest itself so verifiers can pin the trust root.
Complexity: 3 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema, AZ-303_c6_storage_interfaces
Component: c10_provisioning (epic AZ-252 / E-C10)
Tracker: AZ-323
Epic: AZ-252 (E-C10)

Document Dependencies

  • _docs/02_document/contracts/shared_helpers/sha256_sidecar.md — atomic write + sidecar pattern (AZ-280).
  • _docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md: query_by_bbox returning per-tile sha256 set by AZ-316.
  • _docs/02_document/components/11_c10_provisioning/description.md — § 1 idempotence, § 5 ManifestWriteError, § 7 D-C10-3 sidecar coverage.

Problem

Without a real Manifest builder:

  • D-C10-1 (idempotent re-run via manifest hash) cannot be implemented — T5's "did anything change?" check has no canonical hash to compare.
  • D-C10-3 (SHA-256 content-hash gate over every shipped artifact) is unobservable — the takeoff verifier (T4) has nothing to verify against.
  • AC-NEW-1 ("no engine deserialization at takeoff before manifest verify") collapses without a signed Manifest at takeoff.
  • C10-ST-01 (build refuses dev-key signing in operator mode) cannot be enforced without a signing key check.
  • The signing_public_key_fingerprint field is the trust anchor for the airborne ManifestVerifier; without it, the verifier cannot decide which key is allowed to vouch for a Manifest.
  • A Manifest that is huge (100k tile hashes × 80 bytes = 8 MB) but human-inspectable is operator-friendly; without canonical JSON ordering, two builds of the same input produce different bytes and break idempotence.

This task delivers the Manifest serialization + signing. It does NOT compile engines (AZ-321), embed tiles (AZ-322), or run the takeoff verify (T4).

Outcome

  • A ManifestBuilder class at src/gps_denied_onboard/components/c10_provisioning/manifest_builder.py:
    • Constructor: __init__(self, *, sidecar: Sha256Sidecar, signer: ManifestSigner, tile_metadata_store: TileMetadataStore, logger: Logger, clock: Clock, config: C10ManifestConfig).
    • C10ManifestConfig (@dataclass(frozen=True)): signing_mode: enum {operator, dev}, allowed_operator_fingerprints: tuple[str, ...], schema_version: str = "1.0".
    • Public method: build_manifest(input: ManifestBuildInput) -> ManifestArtifact.
    • ManifestBuildInput (@dataclass(frozen=True)): cache_root: Path, bbox: Bbox, zoom_levels: tuple[int, ...], sector_class: SectorClassification, engine_entries: tuple[EngineCacheEntry, ...], descriptor_index_path: Path, calibration_path: Path, key_path: Path, takeoff_origin: LatLonAlt | None = None (ADR-010 / AZ-489 — when set, baked into Manifest + hash), flight_id: UUID | None = None (ADR-010 — pass-through provenance).
    • ManifestArtifact (@dataclass(frozen=True)): manifest_path: Path, signature_path: Path, manifest_hash: str, signing_public_key_fingerprint: str, total_artifacts_listed: int.
  • A ManifestSigner Protocol at src/gps_denied_onboard/components/c10_provisioning/interface.py:
    @runtime_checkable
    class ManifestSigner(Protocol):
        def load_signing_key(self, key_path: Path) -> SigningKeyHandle: ...
        def sign(self, key: SigningKeyHandle, payload_bytes: bytes) -> bytes: ...
        def public_key_fingerprint(self, key: SigningKeyHandle) -> str: ...
    
    Default impl Ed25519ManifestSigner uses the cryptography library (already pinned via AZ-318 for per-flight keys).
  • Method flow:
    1. Load operator signing key: signer.load_signing_key(input.key_path) → SigningKeyHandle.
    2. Compute signing_public_key_fingerprint = signer.public_key_fingerprint(key) (sha256 of the raw 32-byte ed25519 public key, hex).
    3. Operator-mode gate (C10-ST-01): if config.signing_mode == "operator" AND fingerprint not in config.allowed_operator_fingerprints → raise ManifestWriteError("signing key fingerprint not in allowed_operator_fingerprints"); ERROR log with the offending fingerprint. If config.signing_mode == "dev" AND fingerprint matches an allowed operator fingerprint → emit WARN c10.manifest.dev_mode_with_operator_key (operator key being used in dev mode is suspicious but allowed).
    4. Compute per-artifact hashes:
      • For each engine entry: read entry.engine_sha256_hex (already computed by AZ-321; do NOT re-hash).
      • For descriptor index: call sidecar.read_sidecar(input.descriptor_index_path) → expect a 64-char hex digest.
      • For calibration JSON: sha256_hex(open(calibration_path, 'rb').read()) — calibration is small (KB).
      • For tiles: call tile_metadata_store.query_by_bbox(bbox, zoom_levels, sector_class) → list of TileMetadata with sha256_hex field (set by AZ-316). Sort by (zoom, lat, lon, source) for determinism. Compute tiles_coverage_sha256 = sha256(b"\n".join(f"{t.tile_id}:{t.sha256_hex}".encode() for t in sorted_tiles)).
    5. Build the canonical Manifest dict (ADR-010 adds flight.takeoff_origin + flight.flight_id blocks when supplied):
      {
        "schema_version": "1.1",
        "build": {
          "bbox": {...},
          "zoom_levels": [16, 17, 18],
          "sector_class": "stable_rear",
          "built_at": "2026-05-10T12:00:00Z",
          "manifest_hash": "<sha256-hex>"
        },
        "flight": {
          "flight_id": "<uuid>",                       // null when ManifestBuildInput.flight_id is None
          "takeoff_origin": {                          // omitted when ManifestBuildInput.takeoff_origin is None
            "lat_deg": <float>,
            "lon_deg": <float>,
            "alt_m": <float>
          }
        },
        "artifacts": {
          "engines": [{"path": "engines/dinov2_vpr_sm87_jp62_trt103_fp16.engine", "sha256": "<hex>"}, ...],
          "descriptor_index": {"path": "descriptors/corpus.index", "sha256": "<hex>"},
          "calibration": {"path": "calibration/int8_calibration.json", "sha256": "<hex>"},
          "tiles_coverage": {"sha256": "<hex>", "tile_count": <int>}
        },
        "signing_public_key_fingerprint": "<hex>"
      }
      
    6. Compute manifest_hash as sha256(canonical_json(build_identity_tuple)) where build_identity_tuple = sorted({model_ids, calibration_sha256, tiles_coverage_sha256, sector_class, bbox, zoom_levels, takeoff_origin_tuple_or_none, flight_id_or_none}). The takeoff origin is serialised as (lat_deg, lon_deg, alt_m) rounded to 9 decimal places (sub-millimetre, deterministic). This is the D-C10-1 idempotence key. Insert into the Manifest dict at build.manifest_hash AFTER computation. Two builds with identical inputs but different takeoff_origin produce different manifest_hash values; this is the contract that lets ManifestVerifier reject a re-planned route at boot (AZ-324, MV-INV-8).
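    7. Serialize the Manifest dict as canonical JSON bytes: canonical_json_bytes = orjson.dumps(manifest, option=orjson.OPT_SORT_KEYS | orjson.OPT_INDENT_2) + b"\n". orjson.dumps already returns bytes, so no decode step is needed; the trailing newline is part of the bytes that are written and signed.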
    8. Atomic-write the JSON via sidecar.write_with_sidecar(cache_root / "Manifest.json", canonical_json_bytes) — produces Manifest.json + Manifest.json.sha256 (the latter is the Manifest's OWN sha256, used by T4).
    9. Sign the canonical JSON bytes: signature_bytes = signer.sign(key, canonical_json_bytes) (raw Ed25519 signature, 64 bytes).
    10. Atomic-write the signature: sidecar.atomic_write(cache_root / "Manifest.json.sig", signature_bytes) (no .sha256 sidecar for the signature itself — signature integrity is verified by Ed25519 over the Manifest bytes).
    11. Return ManifestArtifact(manifest_path, signature_path, manifest_hash, signing_public_key_fingerprint, total_artifacts_listed).
  • INFO log on successful build (c10.manifest.build.success with manifest_hash + total_artifacts_listed); ERROR on ManifestWriteError; WARN on dev-mode-with-operator-key.
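Step 6 of the method flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name build_identity_hash and its parameter shapes are hypothetical, the identity is modelled as a dict rather than the sorted set shown in step 6, and stdlib json.dumps(sort_keys=True) stands in for orjson so the sketch runs without third-party packages.

```python
import hashlib
import json
from typing import Optional, Sequence, Tuple
from uuid import UUID


def build_identity_hash(
    model_ids: Sequence[str],
    calibration_sha256: str,
    tiles_coverage_sha256: str,
    sector_class: str,
    bbox: Tuple[float, float, float, float],
    zoom_levels: Sequence[int],
    takeoff_origin: Optional[Tuple[float, float, float]] = None,
    flight_id: Optional[UUID] = None,
) -> str:
    """Hash the build-identity fields; built_at is deliberately not an input."""
    identity = {
        "model_ids": sorted(model_ids),  # assumption: order of model ids is not significant
        "calibration_sha256": calibration_sha256,
        "tiles_coverage_sha256": tiles_coverage_sha256,
        "sector_class": sector_class,
        "bbox": list(bbox),
        "zoom_levels": sorted(zoom_levels),
        # (lat_deg, lon_deg, alt_m) rounded to 9 decimal places, or None when unset.
        "takeoff_origin": (
            None if takeoff_origin is None else [round(v, 9) for v in takeoff_origin]
        ),
        "flight_id": None if flight_id is None else str(flight_id),
    }
    # Stand-in for orjson OPT_SORT_KEYS: sorted keys give a stable byte sequence.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()
```

Because built_at never enters the function, two builds of the same input on different days hash identically, while a changed takeoff_origin or flight_id changes the digest (AC-15, AC-16).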

Scope

Included

  • ManifestBuilder class with the single public method.
  • ManifestSigner Protocol + Ed25519ManifestSigner default impl.
  • Canonical JSON serialization (sorted keys, sorted lists where order is content-defining).
  • Operator-key gate per signing_mode config.
  • Per-artifact hash computation (engines, descriptor index, calibration, tiles aggregate).
  • Atomic writes via AZ-280 for both Manifest.json and Manifest.json.sig.
  • Composition-root factory build_manifest_builder.
  • Conformance test for ManifestSigner Protocol.

Excluded

  • The orchestration of when to build (T5 owns).
  • Engine compilation / descriptor generation (AZ-321 / AZ-322).
  • Manifest verification (T4 owns).
  • Idempotence "should we skip the build?" decision (T5 owns; this task always rebuilds when called).
  • ManifestCoverageError (T5 owns; this task lists what it's told, doesn't enumerate cache_root).
  • Key generation — operator's long-lived key is provisioned out-of-band; this task only loads + uses.
  • Multi-key signing (M-of-N quorum) — single-key per build.
  • Compressed Manifest format — JSON for human inspection.

Acceptance Criteria

AC-1: Happy path produces Manifest + sig + sidecars Given a valid input with 3 engines, 1 descriptor index, 1 calibration JSON, 100 tiles When build_manifest(input) is called Then Manifest.json, Manifest.json.sha256, Manifest.json.sig are all present at cache_root/; the Manifest contains 3 engine entries, 1 descriptor_index entry, 1 calibration entry, 1 tiles_coverage entry; manifest_hash is a 64-char lowercase hex string; the returned ManifestArtifact.total_artifacts_listed == 6 (3 engines + index + calibration + tiles_coverage as one logical artifact; the Manifest itself and its signature are not counted, per AC-12)

AC-2: Determinism — same input produces byte-identical Manifest Given the same ManifestBuildInput run twice on different days (different built_at) When the canonical JSON is compared with built_at redacted Then both runs produce byte-identical bytes — proves canonical JSON ordering works; same manifest_hash. (This is the foundation for T5's idempotence check.)
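One way the AC-2 comparison might be written is sketched below. The helper names canonical_bytes and redact_built_at are hypothetical, and stdlib json with sort_keys=True and indent=2 is used as a stand-in for orjson's OPT_SORT_KEYS | OPT_INDENT_2 so the sketch has no third-party dependency; its bytes are deterministic but not guaranteed to match orjson's output byte for byte.

```python
import copy
import json


def canonical_bytes(manifest: dict) -> bytes:
    """Sorted keys, 2-space indent, trailing newline (stdlib stand-in for orjson)."""
    return (json.dumps(manifest, sort_keys=True, indent=2) + "\n").encode()


def redact_built_at(manifest: dict) -> dict:
    """Return a deep copy with build.built_at blanked out for comparison."""
    redacted = copy.deepcopy(manifest)
    redacted["build"]["built_at"] = "<redacted>"
    return redacted


# Two runs of the same input on different days, with keys inserted in different order.
run_a = {
    "schema_version": "1.1",
    "build": {"built_at": "2026-05-10T12:00:00Z", "manifest_hash": "ab" * 32},
}
run_b = {
    "build": {"manifest_hash": "ab" * 32, "built_at": "2026-05-11T09:30:00Z"},
    "schema_version": "1.1",
}

# The raw bytes differ only because built_at differs.
assert canonical_bytes(run_a) != canonical_bytes(run_b)
# With built_at redacted, key sorting makes the bytes identical.
assert canonical_bytes(redact_built_at(run_a)) == canonical_bytes(redact_built_at(run_b))
```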

AC-3: Signature verifies against the public key Given the signature file + the operator's public key When cryptography.hazmat.primitives.asymmetric.ed25519.Ed25519PublicKey.verify(signature, manifest_bytes) is called Then no exception is raised — proves the signing produced a valid Ed25519 signature

AC-4: Operator-mode rejects unknown fingerprint Given config.signing_mode = "operator" and config.allowed_operator_fingerprints = ("known_fp",) and a key file whose fingerprint is "unknown_fp" When build_manifest is called Then ManifestWriteError is raised with a message naming both fingerprints (the offered one + the allowlist); ZERO files are written; ONE ERROR log

AC-5: Operator-mode accepts known fingerprint Given config.signing_mode = "operator" and the key file's fingerprint IS in the allowlist When build_manifest is called Then the build succeeds; ZERO WARN logs about dev-mode

AC-6: Dev-mode with non-operator key emits no warning Given config.signing_mode = "dev" and a random dev key (not in allowlist) When build_manifest is called Then build succeeds; signing_public_key_fingerprint is the dev key's; ZERO warnings about operator key in dev mode

AC-7: Dev-mode with operator key emits warning Given config.signing_mode = "dev" and a key whose fingerprint IS in allowed_operator_fingerprints When build_manifest is called Then build succeeds; ONE WARN log c10.manifest.dev_mode_with_operator_key with the fingerprint
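The four gate cases in AC-4 through AC-7 can be illustrated with a small sketch. The function name check_signing_gate, the local ManifestWriteError class, and the logger name are stand-ins invented for this example; only the fingerprint definition (SHA-256 hex of the raw 32-byte public key) and the log event name come from the method flow.

```python
import hashlib
import logging
from typing import Tuple

logger = logging.getLogger("c10.manifest")  # hypothetical logger name


class ManifestWriteError(Exception):
    """Stand-in for the project's ManifestWriteError."""


def public_key_fingerprint(raw_public_key: bytes) -> str:
    """SHA-256 hex of the raw 32-byte Ed25519 public key."""
    if len(raw_public_key) != 32:
        raise ValueError("expected a raw 32-byte Ed25519 public key")
    return hashlib.sha256(raw_public_key).hexdigest()


def check_signing_gate(signing_mode: str, fingerprint: str, allowed: Tuple[str, ...]) -> None:
    """Raise in operator mode for an unlisted key; warn in dev mode for a listed one."""
    listed = fingerprint in allowed
    if signing_mode == "operator" and not listed:
        logger.error("signing key fingerprint %s not in allowlist %s", fingerprint, allowed)
        raise ManifestWriteError(
            f"signing key fingerprint {fingerprint} not in "
            f"allowed_operator_fingerprints {allowed}"
        )
    if signing_mode == "dev" and listed:
        logger.warning("c10.manifest.dev_mode_with_operator_key fingerprint=%s", fingerprint)
```

Running the gate before any hashing or writing is what makes the "ZERO files are written" part of AC-4 straightforward to satisfy.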

AC-8: Tile coverage hash is sort-order-deterministic Given the same 100 tiles loaded in two different SQL row orders (e.g., insertion order vs index scan) When tiles_coverage_sha256 is computed Then both runs produce the same hash — proves the (zoom, lat, lon, source) sort is canonical
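The aggregate in AC-8 can be sketched as below. The TileMetadata dataclass here is a hypothetical subset of the C6 record, with field names assumed from the sort key (zoom, lat, lon, source) named in the method flow; the sketch also assumes that this key is unique per tile, since ties would leave the order dependent on input order.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TileMetadata:
    """Hypothetical subset of the C6 tile record used for the aggregate hash."""
    tile_id: str
    zoom: int
    lat: float
    lon: float
    source: str
    sha256_hex: str


def tiles_coverage_sha256(tiles: Iterable[TileMetadata]) -> str:
    """Sort by (zoom, lat, lon, source), then hash the joined 'tile_id:sha256' lines."""
    ordered = sorted(tiles, key=lambda t: (t.zoom, t.lat, t.lon, t.source))
    payload = b"\n".join(f"{t.tile_id}:{t.sha256_hex}".encode() for t in ordered)
    return hashlib.sha256(payload).hexdigest()


# 100 synthetic tiles, then the same tiles in a shuffled "row order".
tiles = [
    TileMetadata(f"z17_{i}", 17, 50.0 + i * 1e-3, 36.2, "src_a",
                 hashlib.sha256(str(i).encode()).hexdigest())
    for i in range(100)
]
shuffled = tiles[:]
random.Random(0).shuffle(shuffled)
assert tiles_coverage_sha256(tiles) == tiles_coverage_sha256(shuffled)
```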

AC-9: ManifestWriteError on key load failure Given a key_path that does not exist OR contains malformed PEM When signer.load_signing_key(key_path) raises Then ManifestWriteError("operator signing key load failed: <reason>") is raised; ZERO files are written; the original cryptography exception is chained as __cause__ for diagnosis

AC-10: Atomic write — partial Manifest impossible Given the Manifest is being written and the process is killed mid-write When restarted Then either the previous-good Manifest OR the new Manifest is at the path; never a half-written JSON. (AZ-280's atomic-write contract.)

AC-11: Manifest's own sidecar is consistent Given a freshly-written Manifest.json When sha256_hex(open("Manifest.json", "rb").read()) is computed and compared to Manifest.json.sha256 Then the values match — T4's verifier walks all sidecars and this is the entry point
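AC-10 and AC-11 rely on the AZ-280 helper, whose implementation is not shown in this task. A common way to obtain the same guarantees is a temp-file-plus-rename pattern, sketched here as free functions; the function bodies, the temp-file naming, and the sidecar content format (bare hex digest) are assumptions for illustration, not the AZ-280 contract.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The rename is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def write_with_sidecar(path: Path, data: bytes) -> str:
    """Write the payload plus a '<name>.sha256' sidecar holding the hex digest."""
    digest = hashlib.sha256(data).hexdigest()
    atomic_write(path, data)
    atomic_write(path.with_name(path.name + ".sha256"), digest.encode())
    return digest
```

In this sketch a crash between the two renames would leave the payload and sidecar out of step, which the AC-11 comparison would detect; how AZ-280 orders or couples the two writes is not specified here.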

AC-12: total_artifacts_listed equals dict-counted artifacts Given an input with N engines + 1 index + 1 calibration + tiles_coverage When ManifestArtifact.total_artifacts_listed is inspected Then it equals N + 3 (engines + index + calibration + tiles_coverage); does NOT count the Manifest itself or the signature

AC-13: takeoff_origin baked into Manifest body when supplied (ADR-010 / AZ-489) Given a ManifestBuildInput with takeoff_origin = LatLonAlt(50.0, 36.2, 200.0) and flight_id = some_uuid When build_manifest is called Then the Manifest body contains a flight block with flight_id and takeoff_origin (lat_deg=50.0, lon_deg=36.2, alt_m=200.0); ZERO built_at-style timestamp inside takeoff_origin

AC-14: takeoff_origin absent from Manifest body when not supplied Given a ManifestBuildInput with takeoff_origin = None and flight_id = None When build_manifest is called Then the Manifest body has the flight block with flight_id: null and NO takeoff_origin key (use absence, not null, so AZ-324 can detect "field never set" vs "field invalid")
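The present-versus-absent distinction in AC-13 and AC-14 might be built as follows. The helper name build_flight_block and the tuple representation of LatLonAlt are hypothetical; the key names come from the Manifest schema in the method flow.

```python
from typing import Optional, Tuple
from uuid import UUID


def build_flight_block(
    flight_id: Optional[UUID],
    takeoff_origin: Optional[Tuple[float, float, float]],
) -> dict:
    """flight_id is always present (null when unset); takeoff_origin is omitted when unset."""
    block: dict = {"flight_id": None if flight_id is None else str(flight_id)}
    if takeoff_origin is not None:
        lat_deg, lon_deg, alt_m = takeoff_origin
        block["takeoff_origin"] = {"lat_deg": lat_deg, "lon_deg": lon_deg, "alt_m": alt_m}
    return block
```

With both inputs unset the result is {"flight_id": None}, which serializes to a flight block containing flight_id: null and no takeoff_origin key.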

AC-15: manifest_hash changes when only takeoff_origin differs Given two ManifestBuildInputs identical except takeoff_origin = A vs takeoff_origin = B (B != A by ≥ 1 mm) When build_manifest is called twice Then the two manifest_hash values differ — D-C10-1 idempotence treats re-planned route as a new build

AC-16: manifest_hash changes when only flight_id differs and takeoff_origin is the same Given two ManifestBuildInputs identical except flight_id When build_manifest is called twice Then the two manifest_hash values differ; flight_id is provenance and is part of the build identity (operator may re-plan with the same takeoff position but a different mission; the cache identity must track that)

Non-Functional Requirements

Performance

  • Build wall-clock ≤ 5 s for a 100k-tile corpus on Tier-1 dev workstation: sorting 100k tile hashes + computing one SHA-256 over the concatenated string is ~8 MB of input (100k lines × ~80 bytes, the same estimate used under Problem) → ~100 ms at most; serializing JSON with 100k tile_count is fast (single integer); engine + index + calibration hashes are already computed upstream. Total ≤ 5 s leaves headroom.
  • Operator-mode fingerprint check is a single string comparison.

Compatibility

  • Uses orjson (already pinned via AZ-272 for FDR), cryptography (already pinned via AZ-318 for per-flight keys), hashlib (stdlib).
  • No new third-party dependencies.

Reliability

  • Operator-key gate is fail-closed: unknown fingerprint → no Manifest written.
  • Atomic writes prevent half-written Manifests on process kill.
  • Canonical JSON ensures bit-identical Manifests for identical inputs (foundation for D-C10-1 idempotence in T5).

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | Build with 3 engines + index + calibration + 100 tiles | All files present; counts match |
| AC-2 | Build twice, redact built_at, compare bytes | Identical |
| AC-3 | Verify signature with public key | No raise |
| AC-4 | Operator mode + unknown fingerprint | ManifestWriteError; no files |
| AC-5 | Operator mode + known fingerprint | Success; no warnings |
| AC-6 | Dev mode + dev key | Success; no warnings |
| AC-7 | Dev mode + operator-allowlisted key | Success; ONE warning |
| AC-8 | Tile rows in different orders | Same tiles_coverage_sha256 |
| AC-9 | Missing or malformed key file | ManifestWriteError; chained cause |
| AC-10 | Kill mid-write | No half-Manifest |
| AC-11 | Verify Manifest's own sidecar | Hashes match |
| AC-12 | Inspect total_artifacts_listed | Counts engines+index+calibration+tiles_coverage |
| AC-13 | Build with takeoff_origin set | flight.takeoff_origin present in JSON; lat/lon/alt match |
| AC-14 | Build with takeoff_origin=None | flight.takeoff_origin key absent from JSON |
| AC-15 | Two builds, takeoff_origin differs | manifest_hash differs |
| AC-16 | Two builds, only flight_id differs | manifest_hash differs |
| NFR-perf | 100k-tile bench | ≤ 5 s wall clock |
| NFR-reliability-fail-closed | Operator mode + unknown fp | Fail-closed; nothing written |

Constraints

  • Canonical JSON via orjson with OPT_SORT_KEYS; this task does NOT use a different JSON library.
  • Atomic writes via AZ-280 for BOTH Manifest.json and Manifest.json.sig; no naked Path.write_bytes().
  • manifest_hash excludes built_at (it's a build-identity hash, not a Manifest-bytes hash).
  • The Manifest's own SHA-256 sidecar (Manifest.json.sha256) IS the Manifest-bytes hash and is used by T4 at takeoff.
  • Tile coverage hashing is via aggregate tiles_coverage_sha256, NOT per-tile entries in the Manifest (keeps Manifest bounded).
  • Signature is detached (separate .sig file); embedded signatures are NOT permitted (would require parsing before verifying).
  • Ed25519 only; this task does NOT add other algorithms.
  • Operator-key fingerprint allowlist is config-driven; no hardcoded keys.

Risks & Mitigation

Risk 1: built_at makes Manifests non-deterministic for the same input

  • Risk: Idempotence check in T5 compares manifest_hash only, but if T5 reads the Manifest bytes directly elsewhere it could see different bytes for "same" build.
  • Mitigation: AC-2 explicitly excludes built_at from the manifest_hash computation. T5 compares hashes, not bytes. Documented in the Manifest schema.

Risk 2: tiles_coverage as aggregate hides which tile changed

  • Risk: When verify fails at takeoff (T4), the operator only learns "tiles_coverage hash mismatch", not WHICH tile drifted.
  • Mitigation: T4's failure path can re-walk per-tile hashes against C6 to identify the offender. The Manifest stays small; debugging detail is computed on-demand. Documented in T4's scope.
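The on-demand re-walk described in the mitigation could look roughly like this. It belongs to T4's scope rather than this task, and everything here is hypothetical: the function names, the idea that recorded per-tile hashes arrive as a tile_id-to-hex mapping from C6, and the tile_id-to-path mapping.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large tiles are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_drifted_tiles(recorded: Dict[str, str], tile_paths: Dict[str, Path]) -> List[str]:
    """Return sorted tile_ids whose file is missing or whose hash differs from the record."""
    drifted = []
    for tile_id, expected in recorded.items():
        path = tile_paths.get(tile_id)
        if path is None or not path.is_file() or sha256_file(path) != expected:
            drifted.append(tile_id)
    return sorted(drifted)
```

The Manifest itself still carries only the aggregate hash; a walk like this runs only after the aggregate has already failed to match.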

Risk 3: cryptography API breaks between minor versions

  • Risk: Ed25519 API changes (unlikely but cryptography does ship breaking changes occasionally).
  • Mitigation: Pin to the same version used by AZ-318. The Ed25519ManifestSigner is the only place using the API; a one-place adapter swap on upgrade.

Risk 4: Operator key file format ambiguity

  • Risk: Operators might supply a key in PKCS8, OpenSSH, or raw 32-byte format.
  • Mitigation: Ed25519ManifestSigner.load_signing_key accepts PEM-encoded PKCS8 only (matches AZ-318's convention); other formats raise ManifestWriteError with explicit format hint.
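The "explicit format hint" in the mitigation could be produced by a cheap header check before the key is handed to the cryptography library. The sketch below is an assumption about how that might be done: sniff_key_format, require_pkcs8_pem, and the local ManifestWriteError class are invented for illustration, and the check only inspects the PEM header line; it does not validate the key itself.

```python
class ManifestWriteError(Exception):
    """Stand-in for the project's ManifestWriteError."""


PKCS8_PEM_HEADER = b"-----BEGIN PRIVATE KEY-----"
OPENSSH_PEM_HEADER = b"-----BEGIN OPENSSH PRIVATE KEY-----"


def sniff_key_format(key_bytes: bytes) -> str:
    """Guess the container format from the first bytes of the key file."""
    head = key_bytes.lstrip()
    if head.startswith(PKCS8_PEM_HEADER):
        return "pkcs8-pem"
    if head.startswith(OPENSSH_PEM_HEADER):
        return "openssh"
    if len(key_bytes) == 32:
        return "raw-32-byte"
    return "unknown"


def require_pkcs8_pem(key_bytes: bytes) -> None:
    """Raise with a format hint unless the file looks like PEM-encoded PKCS8."""
    detected = sniff_key_format(key_bytes)
    if detected != "pkcs8-pem":
        raise ManifestWriteError(
            "operator signing key load failed: expected PEM-encoded PKCS8, "
            f"got {detected} format"
        )
```

A file that passes this check can still fail to parse, so the AC-9 path (chaining the original exception as __cause__) remains necessary.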

Risk 5: Dev key accidentally signs an operator-mode build

  • Risk: Operator runs build with signing_mode = "operator" but supplies a dev key by mistake.
  • Mitigation: AC-4 covers; the gate is fail-closed and logs the offending fingerprint so the operator can correct.

Runtime Completeness

  • Named capability: signed Manifest production with content-hash table covering every shipped artifact, D-C10-1 idempotence key (manifest_hash), C10-ST-01 operator-mode gate (epic § Acceptance C10-IT-01, C10-IT-02, C10-ST-01).
  • Production code that must exist: real ManifestBuilder orchestrating real Ed25519ManifestSigner (cryptography library) + real AZ-280 atomic writes + real C6 query_by_bbox to gather tile hashes; real config-driven fingerprint allowlist.
  • Allowed external stubs: tests MAY use a fake ManifestSigner with a known keypair generated in-test + a fake tile_metadata_store (AZ-303 conformance fakes); production wiring uses cryptography.hazmat.
  • Unacceptable substitutes: HMAC instead of Ed25519 (different trust model: symmetric vs asymmetric); embedding the signature in the JSON (reintroduces the parse-before-verify problem at takeoff); Python-only pickle of the Manifest (not human-inspectable, not canonical-byte stable); skipping the operator-fingerprint allowlist when signing_mode = "operator" (defeats C10-ST-01); serializing without sorted keys, e.g. orjson.dumps without OPT_SORT_KEYS or stdlib json.dumps without sort_keys=True (breaks AC-2 determinism and T5's idempotence).