gps-denied-onboard/_docs/02_tasks/done/AZ-324_c10_manifest_verifier.md
Oleksandr Bezdieniezhnykh e2bebefdfc [AZ-507] [AZ-323] [AZ-324] C10 Manifest build + verify + AZ-270 hygiene
AZ-507: codify cross-component import rule. Added
_types/inference_errors.py shim re-exporting EngineBuildError +
CalibrationCacheError from c7_inference; narrowed C10
EngineCompiler's except Exception to the two typed errors so unknown
exceptions propagate (AC-3). Rewrote module-layout.md "Imports from"
sections for 9 components + added Rule 9; appended an
architecture.md ADR-009 note explaining why components must go
through _types/*.

AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via
orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha
sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate
(C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest
AND manifest_hash so re-planned routes change the cache identity
(AC-15/AC-16). 20 unit tests cover all 16 ACs.

AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json
sidecar self-hash, Ed25519 trust-key set, schema parse with
absolute/.. path rejection + takeoff_origin in-bbox check, stream
SHA-256 per artifact with multi-failure accumulation. Operator mode
re-derives tiles_coverage_sha256 from C6; airborne mode trusts the
signed aggregate. 19 unit tests cover all 17 ACs.

Composition root: c10_factory.build_manifest_builder +
build_manifest_verifier + c6_tile_metadata_store_to_tiles_query
adapter (the one place that legitimately imports both C6 and C10
without violating the AZ-270 lint).

Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml.

Tests: 1300 passed, 80 skipped (env-only), ruff clean for all
AZ-323/324 files.

AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++
pybind11 toolchain not present in this environment.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 02:37:14 +03:00


C10 ManifestVerifier — Takeoff Content-Hash Gate + Trusted-Key Pinning

Task: AZ-324_c10_manifest_verifier
Name: C10 ManifestVerifier
Description: Implement ManifestVerifier (per the contract _docs/02_document/contracts/c10_provisioning/manifest_verifier.md v1.1.0), the read-only validator that AC-NEW-1 places between F2 takeoff and any engine deserialization. It loads Manifest.json, verifies that its sidecar SHA-256 matches the Manifest bytes, parses the Ed25519 detached signature at Manifest.json.sig, verifies it against the caller-supplied trusted_public_keys tuple, parses the Manifest schema (rejecting absolute paths and schema violations), validates the optional flight.takeoff_origin block (well-formed LatLonAlt + inside build.bbox per ADR-010 + AZ-490), and walks every per-artifact entry, re-hashing it via AZ-280's sidecar pattern. It returns a VerificationResult with outcome ∈ {PASS, FAIL}, the union of all VerifyFailReason values that fired, the populated per_artifact_checks list, the pass-through takeoff_origin + flight_id (or None when absent from the Manifest body), and elapsed_ms. Fail-closed: any deviation in signature, schema, key trust, hashes, or origin validity yields FAIL with detailed reasons. The verifier never raises on a verify failure; it raises only on environment errors. A missing Manifest.json is not an environment error: it yields FAIL with MANIFEST_NOT_FOUND.
Complexity: 3 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema
Component: c10_provisioning (epic AZ-252 / E-C10)
Tracker: AZ-324
Epic: AZ-252 (E-C10)

Document Dependencies

  • _docs/02_document/contracts/c10_provisioning/manifest_verifier.md — produced by this task (frozen Protocol + DTO shape, invariants, test cases).
  • _docs/02_document/contracts/shared_helpers/sha256_sidecar.md — sidecar verify pattern (AZ-280).
  • _docs/02_document/components/11_c10_provisioning/description.md — § 5 ContentHashMismatchError handling, § 7 D-C10-3 sidecar coverage.

Problem

Without a real verifier:

  • AC-NEW-1 ("no engine deserialization at takeoff before manifest verify") collapses — F2 has nothing to gate on.
  • D-C10-3 (SHA-256 content-hash gate over every shipped artifact) is unobservable at takeoff.
  • C10-IT-02 (rejects tampered or wrong-key Manifests) cannot be implemented.
  • A built but unverified Manifest is no better than no Manifest — operators cannot trust it without an actual check.
  • Without a contract, C5 takeoff arming and C12 operator tooling cannot couple to C10 — every consumer would re-implement an ad-hoc check.
  • The "fail-closed" property is a hard requirement; partial verifies that report PASS on first match would compromise the entire trust chain.

This task delivers the verifier + its frozen contract. It does NOT compile engines (AZ-321), build the Manifest (AZ-323), or own the takeoff-arming policy (E-C5).

Outcome

  • A ManifestVerifier class implementation at src/gps_denied_onboard/components/c10_provisioning/manifest_verifier.py matching the Protocol in the contract.
  • Constructor: __init__(self, *, sidecar: Sha256Sidecar, logger: Logger, clock: Clock, tile_metadata_store: TileMetadataStore | None = None).
    • When tile_metadata_store is None, the verifier operates in airborne mode: trusts the recorded tiles_coverage_sha256 after the signature passes (per MV-INV-5).
    • When tile_metadata_store is not None, the verifier operates in operator mode: re-derives tiles_coverage_sha256 from C6 and reports TILES_COVERAGE_MISMATCH on drift.
  • The frozen contract at _docs/02_document/contracts/c10_provisioning/manifest_verifier.md (already written; this task brings the implementation up to it).
  • Method verify_manifest(manifest_path, trusted_public_keys) -> VerificationResult flow:
    1. Start time.monotonic() for elapsed_ms.
    2. Initialize empty fail_reasons: list[VerifyFailReason], fail_details: list[str], per_artifact_checks: list[ArtifactCheck].
    3. Step A — Manifest exists & sidecar matches:
      • If manifest_path does not exist: append MANIFEST_NOT_FOUND; return FAIL (no further work; per MV-INV-1).
      • Read Manifest.json bytes.
      • If manifest_path.with_suffix(".json.sha256") does not exist: append SCHEMA_VIOLATION ("missing manifest sidecar"); return FAIL.
      • If sha256(manifest_bytes) != sidecar_value: append MANIFEST_SELF_HASH_MISMATCH; return FAIL (do NOT consult signature per MV-INV-3).
    4. Step B — Signature verifies against a trusted key:
      • If signature_path = manifest_path.with_suffix(".json.sig") does not exist: append SIGNATURE_NOT_FOUND; signing_public_key_fingerprint = None; return FAIL.
      • Parse Ed25519 signature bytes (must be exactly 64 bytes; otherwise SIGNATURE_INVALID).
      • Try each public key in trusted_public_keys:
        • Compute fingerprint = sha256(pub.public_bytes_raw()).hex().
        • Try pub.verify(signature_bytes, manifest_bytes).
        • On success: signature is valid; signing_public_key_fingerprint = fingerprint; break.
      • If no trusted key verified:
        • If at least one key raised InvalidSignature (the signature does not verify under that key), the Manifest may still have been signed by a key outside the trusted set. Parse the Manifest's signing_public_key_fingerprint field (if the schema parses) and report the more diagnostic reason: UNTRUSTED_PUBLIC_KEY if the Manifest names a fingerprint that is not in the trusted set, SIGNATURE_INVALID otherwise.
        • Append the reason; return FAIL (do NOT proceed to per-artifact hashing per MV-INV-2).
      • If trusted_public_keys is empty: append UNTRUSTED_PUBLIC_KEY; return FAIL.
    5. Step C — Schema parse:
      • orjson.loads(manifest_bytes) → dict.
      • Validate required keys: schema_version, build (with sub-keys bbox, zoom_levels, sector_class, built_at, manifest_hash), artifacts (with engines, descriptor_index, calibration, tiles_coverage), signing_public_key_fingerprint. flight block is OPTIONAL (added in schema v1.1, ADR-010).
      • Validate types: engines is list of {path: str, sha256: str}; descriptor_index, calibration are {path: str, sha256: str}; tiles_coverage is {sha256: str, tile_count: int}.
      • Validate path-relative-only: every path value must be relative (no leading /, no .. segments). Append SCHEMA_VIOLATION per offending field; if any, return FAIL.
      • Flight block (ADR-010 / AZ-490):
        • If flight key absent → takeoff_origin = None, flight_id = None; continue.
        • If flight present → parse flight_id (UUID or None) and takeoff_origin (optional block).
        • If flight.takeoff_origin present → validate lat_deg ∈ [-90, 90], lon_deg ∈ [-180, 180], alt_m finite (no NaN/Inf). Append TAKEOFF_ORIGIN_INVALID to fail_reasons and the offending field name to fail_details if any check fails.
        • If flight.takeoff_origin is well-formed → check it falls inside build.bbox (bbox.lat_min ≤ lat ≤ bbox.lat_max, bbox.lon_min ≤ lon ≤ bbox.lon_max). Append TAKEOFF_ORIGIN_OUT_OF_BBOX if not.
        • The takeoff_origin is populated on VerificationResult whenever the block parsed (even on FAIL), per MV-INV-9, so operators see what was attempted.
    6. Step D — Per-artifact hash walk (only reached if Steps A-C all passed):
      • For each engine, descriptor_index, calibration entry:
        • Compute actual_path = manifest_path.parent / entry.path.
        • If file missing: append ArtifactCheck(entry.path, entry.sha256, None, matched=False); append ARTIFACT_MISSING to fail_reasons once if not already there.
        • Else: stream-read the file, compute SHA-256 (use AZ-280's helper that takes a path).
        • If hash matches: matched=True.
        • Else: matched=False; append ARTIFACT_HASH_MISMATCH once.
      • For tiles_coverage:
        • If tile_metadata_store is None (airborne mode): trust the recorded tiles_coverage.sha256 since the Manifest signature already binds it. Append ArtifactCheck("tiles_coverage", recorded_sha256, recorded_sha256, matched=True) for completeness.
        • Else (operator mode): re-derive tiles_coverage_sha256 by tile_metadata_store.query_by_bbox(...) over the build.bbox + zoom_levels + sector_class, sort by (zoom, lat, lon, source), hash. If mismatch → TILES_COVERAGE_MISMATCH.
      • Walk ALL entries even on first failure (per MV-TC-9).
    7. Set outcome = PASS iff fail_reasons is empty; else FAIL.
    8. Set elapsed_ms = int((time.monotonic() - start) * 1000).
    9. Return VerificationResult(...).
  • INFO log on PASS (c10.manifest.verify.pass with elapsed_ms + fingerprint); WARN on FAIL with fail_reasons + counts of mismatched artifacts.
  • Composition root factory build_manifest_verifier(config, *, with_tile_store: bool) -> ManifestVerifier: with_tile_store=True for operator mode, False for airborne C5.
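
The trusted-key loop in Step B can be sketched as follows. This is a minimal illustration, not the contract's implementation: PublicKeyLike, FakeKey, the local InvalidSignature class and find_trusted_signer are hypothetical names, and FakeKey is a hash-based test double rather than a real signature scheme. In production the keys would be Ed25519PublicKey instances from cryptography and the exception would be cryptography.exceptions.InvalidSignature. The sketch also omits the UNTRUSTED_PUBLIC_KEY versus SIGNATURE_INVALID disambiguation and always reports SIGNATURE_INVALID when no trusted key verifies.

```python
import hashlib
from typing import Optional, Protocol, Sequence, Tuple


class InvalidSignature(Exception):
    """Local stand-in for cryptography.exceptions.InvalidSignature (assumption)."""


class PublicKeyLike(Protocol):
    """The two Ed25519PublicKey methods that Step B relies on."""

    def public_bytes_raw(self) -> bytes: ...

    def verify(self, signature: bytes, data: bytes) -> None: ...


class FakeKey:
    """Test double only: NOT a signature scheme, anyone holding raw can 'sign'."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def public_bytes_raw(self) -> bytes:
        return self._raw

    def sign(self, data: bytes) -> bytes:
        # Two SHA-256 digests give the 64-byte length of an Ed25519 signature.
        first = hashlib.sha256(self._raw + data).digest()
        second = hashlib.sha256(data + self._raw).digest()
        return first + second

    def verify(self, signature: bytes, data: bytes) -> None:
        if signature != self.sign(data):
            raise InvalidSignature()


def find_trusted_signer(
    signature: bytes,
    manifest_bytes: bytes,
    trusted_public_keys: Sequence[PublicKeyLike],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (fingerprint, None) on success, or (None, reason) on failure."""
    if not trusted_public_keys:
        return None, "UNTRUSTED_PUBLIC_KEY"  # empty key set fails closed
    if len(signature) != 64:
        return None, "SIGNATURE_INVALID"
    for pub in trusted_public_keys:
        fingerprint = hashlib.sha256(pub.public_bytes_raw()).hexdigest()
        try:
            pub.verify(signature, manifest_bytes)
        except InvalidSignature:
            continue  # try the next trusted key
        return fingerprint, None  # first key that verifies wins
    return None, "SIGNATURE_INVALID"


if __name__ == "__main__":
    signer = FakeKey(b"b" * 32)
    body = b'{"schema_version": "1.1"}'
    print(find_trusted_signer(signer.sign(body), body, (FakeKey(b"a" * 32), signer)))
```

Checking for an empty key tuple before anything else keeps the fail-closed behaviour independent of whether the signature bytes happen to be well-formed.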

Scope

Included

  • ManifestVerifier class implementing the Protocol from the contract.
  • The contract document (frozen at v1.1.0).
  • Schema validation against the v1.0 shape produced by AZ-323, plus the optional v1.1 flight block (ADR-010).
  • Signature verification against a tuple of trusted public keys.
  • Per-artifact stream-hash walk with multiple-failure accumulation.
  • Airborne vs operator mode for tiles_coverage handling.
  • Composition-root factory.
  • Conformance test for the contract Protocol.
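
A conformance test of this kind could look roughly like the sketch below. The Protocol body here is a simplified stand-in for the one frozen in the contract (the parameter and return types are placeholders), and NotAVerifier is a hypothetical negative case. Note that isinstance against a runtime_checkable Protocol only checks that the named methods exist; it does not check their signatures.

```python
from pathlib import Path
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class ManifestVerifier(Protocol):
    """Simplified stand-in for the contract's Protocol (types are placeholders)."""

    def verify_manifest(
        self, manifest_path: Path, trusted_public_keys: Sequence[object]
    ) -> object: ...


class ManifestVerifierImpl:
    """Placeholder implementation: only the method's presence matters here."""

    def verify_manifest(
        self, manifest_path: Path, trusted_public_keys: Sequence[object]
    ) -> object:
        raise NotImplementedError("sketch only")


class NotAVerifier:
    """Hypothetical negative case: lacks verify_manifest entirely."""


if __name__ == "__main__":
    print(isinstance(ManifestVerifierImpl(), ManifestVerifier))
    print(isinstance(NotAVerifier(), ManifestVerifier))
```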

Excluded

  • Manifest building / signing (AZ-323 owns).
  • Trusted-key distribution / loading from disk — caller passes Ed25519PublicKey instances.
  • Cache repair on FAIL — caller (E-C5 takeoff arming, E-C12 operator) decides next action.
  • Coverage check for orphan files in cache_root (AZ-325 owns ManifestCoverageError).
  • Logging Manifest contents (Manifests are not secret but verbose; only fingerprints + counts are logged).
  • C13 FDR emission — caller's responsibility (per MV-INV-6).
  • Non-Ed25519 signatures.

Acceptance Criteria

AC-1: PASS on a valid Manifest with all artifacts present and matching
Given a freshly-built Manifest + sig + sidecar from AZ-323 and trusted_public_keys = (signing_pub,)
When verify_manifest(manifest_path, trusted_public_keys) is called
Then outcome=PASS, fail_reasons is empty, per_artifact_checks has every entry matched=True, signing_public_key_fingerprint is the signing key's fingerprint, elapsed_ms >= 0 (integer milliseconds, so a very fast run may report 0; see AC-12)

AC-2: FAIL on missing Manifest with no further work
Given manifest_path does not exist
When verify runs
Then outcome=FAIL, fail_reasons=(MANIFEST_NOT_FOUND,), per_artifact_checks is empty (no work performed), signing_public_key_fingerprint=None

AC-3: FAIL on missing signature with diagnostic
Given Manifest.json exists + sidecar matches but Manifest.json.sig is absent
When verify runs
Then fail_reasons=(SIGNATURE_NOT_FOUND,), per_artifact_checks is empty, no per-artifact disk reads happen (defence-in-depth)

AC-4: FAIL on tampered Manifest body
Given Manifest.json is mutated by 1 byte after signing
When verify runs
Then either MANIFEST_SELF_HASH_MISMATCH (sidecar caught it first) OR SIGNATURE_INVALID (if sidecar was also re-computed by attacker); per-artifact walk does NOT happen

AC-5: FAIL on untrusted public key
Given the Manifest is signed with a key NOT in trusted_public_keys
When verify runs
Then fail_reasons=(UNTRUSTED_PUBLIC_KEY,), signing_public_key_fingerprint is populated (so operators see WHICH untrusted key signed it), per-artifact walk does NOT happen

AC-6: FAIL on schema violation lists offending field
Given a Manifest missing the signing_public_key_fingerprint key
When verify runs
Then fail_reasons=(SCHEMA_VIOLATION,), fail_details contains a string naming signing_public_key_fingerprint

AC-7: FAIL on absolute path in artifact entry
Given an engine entry has path: "/etc/passwd"
When verify runs
Then fail_reasons=(SCHEMA_VIOLATION,), fail_details names the offending field; per-artifact walk does NOT consult /etc/passwd

AC-8: FAIL with multiple reasons accumulated
Given one engine is missing on disk AND one engine's bytes drifted AND a third engine matches
When verify runs
Then fail_reasons contains BOTH ARTIFACT_MISSING and ARTIFACT_HASH_MISMATCH (in deterministic order: traversal order); per_artifact_checks has all 3 entries with correct matched values; the third entry has matched=True

AC-9: Operator mode re-derives tiles_coverage
Given tile_metadata_store is supplied AND C6's tiles for the build's bbox/zoom now have a different aggregate hash (e.g., a tile was re-downloaded)
When verify runs
Then fail_reasons=(TILES_COVERAGE_MISMATCH,); the recorded vs computed hashes are in fail_details

AC-10: Airborne mode trusts tiles_coverage post-signature
Given tile_metadata_store=None
When verify runs
Then tiles_coverage ArtifactCheck shows matched=True (recorded == "actual" because we don't re-derive); the airborne F2 path is fast (≤ 100 ms per NFR)

AC-11: Conformance (isinstance returns True)
Given the implementation
When isinstance(impl, ManifestVerifier) is checked under runtime_checkable
Then True

AC-12: elapsed_ms recorded on every outcome
Given any of the above ACs
When inspecting the result
Then elapsed_ms >= 0 and is reasonable (smaller for early-exit failures, larger for full per-artifact walks)

AC-13: Empty trusted_public_keys always fails closed
Given trusted_public_keys = ()
When verify runs
Then fail_reasons=(UNTRUSTED_PUBLIC_KEY,) regardless of Manifest validity; per-artifact walk does NOT happen

AC-14: Manifest with no flight block parses cleanly (back-compat)
Given a v1.0 Manifest (no flight block) that is otherwise valid + signed
When verify runs
Then outcome=PASS; VerificationResult.takeoff_origin is None; VerificationResult.flight_id is None

AC-15: Well-formed in-bbox takeoff_origin passes through
Given a v1.1 Manifest with flight.takeoff_origin = (50.0, 36.2, 200.0) inside the recorded bbox
When verify runs
Then outcome=PASS; VerificationResult.takeoff_origin == LatLonAlt(50.0, 36.2, 200.0)

AC-16: Malformed takeoff_origin (lat=200) fails closed
Given a Manifest with flight.takeoff_origin.lat_deg = 200
When verify runs
Then outcome=FAIL; fail_reasons contains TAKEOFF_ORIGIN_INVALID; fail_details names lat_deg; the takeoff_origin field on VerificationResult is still populated for diagnostics

AC-17: Out-of-bbox takeoff_origin fails closed
Given a Manifest whose flight.takeoff_origin = (10.0, 10.0, 0) while build.bbox covers (49.5..50.5, 35.5..36.5)
When verify runs
Then outcome=FAIL; fail_reasons contains TAKEOFF_ORIGIN_OUT_OF_BBOX
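
The origin checks behind AC-15 to AC-17 can be sketched as below. This is an illustration under assumptions: BBox and check_takeoff_origin are hypothetical names, reasons are plain strings rather than VerifyFailReason members, and whether an out-of-bbox origin also adds an entry to fail_details is not specified by this task, so the sketch leaves details empty in that case.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BBox:
    """Hypothetical stand-in for the Manifest's build.bbox block."""

    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def check_takeoff_origin(
    lat_deg: float, lon_deg: float, alt_m: float, bbox: BBox
) -> Tuple[List[str], List[str]]:
    """Return (fail_reasons, fail_details) for one takeoff_origin block."""
    reasons: List[str] = []
    details: List[str] = []

    # Well-formedness first: range checks plus NaN/Inf rejection.
    if not (math.isfinite(lat_deg) and -90.0 <= lat_deg <= 90.0):
        details.append("lat_deg")
    if not (math.isfinite(lon_deg) and -180.0 <= lon_deg <= 180.0):
        details.append("lon_deg")
    if not math.isfinite(alt_m):
        details.append("alt_m")
    if details:
        reasons.append("TAKEOFF_ORIGIN_INVALID")
        return reasons, details  # the bbox test only applies to a well-formed origin

    inside = (
        bbox.lat_min <= lat_deg <= bbox.lat_max
        and bbox.lon_min <= lon_deg <= bbox.lon_max
    )
    if not inside:
        reasons.append("TAKEOFF_ORIGIN_OUT_OF_BBOX")
    return reasons, details


if __name__ == "__main__":
    box = BBox(49.5, 50.5, 35.5, 36.5)
    print(check_takeoff_origin(50.0, 36.2, 200.0, box))
    print(check_takeoff_origin(200.0, 36.2, 200.0, box))
    print(check_takeoff_origin(10.0, 10.0, 0.0, box))
```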

Non-Functional Requirements

Performance

  • Airborne F2 verify (no per-tile re-derivation, ~5 artifact entries): wall-clock ≤ 100 ms on Jetson Orin (signature verify + 5 stream-SHA-256s of bounded files).
  • Operator-mode verify with 100k tiles re-derivation: ≤ 5 s (matches AZ-323's NFR).
  • Stream-hash files via 64 KB chunks; do NOT load engine binaries (~200 MB) entirely into memory.
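
For reference, chunked hashing with a bounded memory footprint can look like the sketch below. The function name stream_sha256 and the CHUNK_SIZE constant are illustrative; the real implementation is expected to call AZ-280's path-based helper, whose exact signature is not shown in this task.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # 64 KB blocks, per the performance NFR


def stream_sha256(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file incrementally so only one chunk is held in memory at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"x" * (3 * CHUNK_SIZE + 17))
        print(stream_sha256(sample))
```

Peak extra memory stays close to one chunk regardless of file size, which is the property the NFR-reliability-stream-hash test is meant to confirm.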

Compatibility

  • cryptography (already pinned via AZ-318), orjson (already pinned), hashlib (stdlib).
  • No new third-party dependencies.

Reliability

  • Fail-closed: empty trusted keys → FAIL; missing files → FAIL; any drift → FAIL.
  • No partial PASS; the outcome=PASS branch is taken only when fail_reasons is empty.
  • Defensive against directory traversal: relative paths only (AC-7).
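
One way to make the "no partial PASS" rule hard to violate is to derive the outcome from fail_reasons instead of storing it separately. The sketch below shows that idea only; ResultSketch is a hypothetical, trimmed-down type, and the contract's VerificationResult carries more fields and may store outcome as a plain field.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Outcome(Enum):
    PASS = "PASS"
    FAIL = "FAIL"


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical reduced result: outcome is computed, never set by hand."""

    fail_reasons: Tuple[str, ...]
    elapsed_ms: int

    @property
    def outcome(self) -> Outcome:
        # PASS if and only if nothing failed; any recorded reason forces FAIL.
        return Outcome.FAIL if self.fail_reasons else Outcome.PASS


if __name__ == "__main__":
    print(ResultSketch(fail_reasons=(), elapsed_ms=3).outcome)
    print(ResultSketch(fail_reasons=("ARTIFACT_MISSING",), elapsed_ms=3).outcome)
```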

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | Built Manifest from AZ-323 fixture | PASS; all matched |
| AC-2 | Missing Manifest.json | FAIL; MANIFEST_NOT_FOUND only |
| AC-3 | Missing signature | FAIL; SIGNATURE_NOT_FOUND; no disk reads |
| AC-4 | Mutated Manifest body | FAIL; either MANIFEST_SELF_HASH_MISMATCH or SIGNATURE_INVALID |
| AC-5 | Wrong-key signing | FAIL; UNTRUSTED_PUBLIC_KEY; fingerprint populated |
| AC-6 | Missing required field | FAIL; SCHEMA_VIOLATION + field name |
| AC-7 | Absolute path in artifact | FAIL; SCHEMA_VIOLATION; no path traversal |
| AC-8 | 1 missing + 1 drifted + 1 OK | Two failure reasons; per_artifact_checks complete |
| AC-9 | Operator mode + drifted tile | TILES_COVERAGE_MISMATCH |
| AC-10 | Airborne mode | tiles_coverage matched=True |
| AC-11 | Conformance check | True |
| AC-14 | v1.0 Manifest (no flight block) | PASS; takeoff_origin=None; flight_id=None |
| AC-15 | v1.1 Manifest, valid in-bbox origin | PASS; takeoff_origin populated |
| AC-16 | Malformed origin (lat=200) | FAIL; TAKEOFF_ORIGIN_INVALID; field name in details |
| AC-17 | Out-of-bbox origin | FAIL; TAKEOFF_ORIGIN_OUT_OF_BBOX |
| AC-12 | Inspect elapsed_ms | All non-negative; ordered as expected |
| AC-13 | Empty trusted keys | FAIL; UNTRUSTED_PUBLIC_KEY |
| NFR-perf-airborne | 5 artifact bench, no tile re-walk | p99 ≤ 100 ms |
| NFR-perf-operator | 100k-tile re-walk | ≤ 5 s |
| NFR-reliability-stream-hash | 200 MB engine + memory profile | Peak < 10 MB extra |

Constraints

  • Stream SHA-256 over files via hashlib.sha256().update(chunk) in 64 KB blocks; do NOT Path.read_bytes() on engines (memory blowup per NFR).
  • Path interpretation is relative-only; absolute paths are SCHEMA_VIOLATION (AC-7).
  • The verifier is read-only (per MV-INV-6); no disk writes, no network, no FDR.
  • fail_reasons is a tuple (immutable, ordered, deterministic).
  • Signature checks happen before per-artifact walks (per MV-INV-2).
  • Manifest sidecar check happens before signature (per MV-INV-3).
  • Multiple failures accumulate; do not short-circuit on first per-artifact failure (per MV-TC-9 / AC-8).
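
The accumulation rule (walk every entry, record each reason once, keep traversal order) can be sketched like this. The ArtifactCheck field names and the walk_artifacts function are assumptions for illustration; the contract's DTO may name its fields differently, and reasons are plain strings here rather than VerifyFailReason members.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ArtifactCheck:
    """Field names are assumed; the task only shows positional construction."""

    path: str
    expected_sha256: str
    actual_sha256: Optional[str]
    matched: bool


def walk_artifacts(
    root: Path, entries: Iterable[Tuple[str, str]]
) -> Tuple[Tuple[str, ...], List[ArtifactCheck]]:
    """Check every (relative_path, expected_sha256) entry without short-circuiting."""
    fail_reasons: List[str] = []
    checks: List[ArtifactCheck] = []

    def add_once(reason: str) -> None:
        # Each reason appears at most once, in first-seen (traversal) order.
        if reason not in fail_reasons:
            fail_reasons.append(reason)

    for rel_path, expected in entries:
        target = root / rel_path
        if not target.is_file():
            checks.append(ArtifactCheck(rel_path, expected, None, False))
            add_once("ARTIFACT_MISSING")
            continue  # keep walking: later entries still get checked
        digest = hashlib.sha256()
        with target.open("rb") as fh:
            for chunk in iter(lambda: fh.read(64 * 1024), b""):
                digest.update(chunk)
        actual = digest.hexdigest()
        matched = actual == expected
        checks.append(ArtifactCheck(rel_path, expected, actual, matched))
        if not matched:
            add_once("ARTIFACT_HASH_MISMATCH")

    # Returned as a tuple so the result is immutable and deterministically ordered.
    return tuple(fail_reasons), checks
```

Called with the AC-8 scenario (one missing, one drifted, one matching engine), this returns both reasons and three ArtifactCheck entries, the last of which has matched=True.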

Risks & Mitigation

Risk 1: Trusted-key list accidentally empty in production wiring

  • Risk: Composition root mis-configures; airborne C5 ends up with an empty key list and arming silently fails forever.
  • Mitigation: AC-13 + ERROR log on UNTRUSTED_PUBLIC_KEY with key-list-length=0 makes the misconfiguration loud at first arm attempt.

Risk 2: Per-artifact walk dominates airborne arm latency

  • Risk: 5 engines × 200 MB stream-hash on slow microSD → 30 s arm latency.
  • Mitigation: NFR-perf-airborne benchmark documents the envelope; if the Jetson microSD I/O is the bottleneck, a follow-up task adds an "incremental verify" path that trusts unchanged artifacts since last reboot. Out of scope this cycle.

Risk 3: Tampered sidecar matches tampered body (attacker replaces both sidecar + body)

  • Risk: AC-4's first failure case (sidecar mismatch) is bypassed by an attacker who recomputes the sidecar.
  • Mitigation: Signature check (Step B) catches this — the signature is over the Manifest body; recomputing the sidecar does NOT also recompute the signature. The Ed25519 secret key is operator-only.

Risk 4: Path traversal via relative .. segments

  • Risk: A relative path like ../../etc/passwd passes the "no leading /" check but escapes cache_root.
  • Mitigation: AC-7 + .. segment rejection covers it. The explicit check is: if ".." in Path(entry.path).parts, append SCHEMA_VIOLATION.
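
A path check along these lines would cover both the absolute-path case from AC-7 and the .. case above. It is a sketch under assumptions: is_safe_relative_path is a hypothetical helper, it uses PurePosixPath instead of Path so the result does not depend on the host platform, and rejecting an empty string is an extra choice that this task does not specify.

```python
from pathlib import PurePosixPath


def is_safe_relative_path(raw: str) -> bool:
    """True only for non-empty relative paths with no '..' segment."""
    if not raw:
        return False  # assumption: an empty path is treated as a violation
    candidate = PurePosixPath(raw)
    if candidate.is_absolute():
        return False  # e.g. "/etc/passwd"
    if ".." in candidate.parts:
        return False  # e.g. "../../etc/passwd" or "engines/../x"
    return True


if __name__ == "__main__":
    for sample in ("engines/a.engine", "/etc/passwd", "../../etc/passwd"):
        print(sample, is_safe_relative_path(sample))
```

Matching on whole path segments means a file name that merely contains two dots, such as a..b, is still accepted.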

Risk 5: Operator-mode tile re-walk on Jetson is too slow

  • Risk: An airborne-mode verifier mistakenly gets a tile_metadata_store (composition root mistake) and re-walks 100k tiles, blowing the arm latency budget.
  • Mitigation: The composition root factory build_manifest_verifier(config, *, with_tile_store: bool) is the explicit toggle; airborne wiring passes with_tile_store=False. AC-10 tests airborne mode latency.

Runtime Completeness

  • Named capability: takeoff content-hash gate per AC-NEW-1 + D-C10-3 + C10-IT-02 (epic § Acceptance C10-IT-01..02; description.md § 5 ContentHashMismatchError).
  • Production code that must exist: real ManifestVerifier orchestrating real cryptography Ed25519 verify + real hashlib stream-SHA-256 + real orjson schema parse; real tile_metadata_store re-derivation in operator mode.
  • Allowed external stubs: tests MAY use a fake key generated in-test, fake Manifest fixtures from AZ-323's test fixtures; production wiring uses real keys from operator key store.
  • Unacceptable substitutes: skipping Step A's sidecar check (loses bit-rot detection); skipping Step B before walking artifacts (defeats MV-INV-2 defence-in-depth); short-circuiting on first per-artifact failure (operators need full diagnostic per MV-TC-9); HMAC instead of Ed25519 (different trust model); accepting absolute paths in entries (path traversal vulnerability per AC-7); raising on missing files instead of outcome=FAIL (breaks the contract's read-only / never-raise-on-verify-failure invariant).