[AZ-507] [AZ-323] [AZ-324] C10 Manifest build + verify + AZ-270 hygiene

AZ-507: codify cross-component import rule. Added
_types/inference_errors.py shim re-exporting EngineBuildError +
CalibrationCacheError from c7_inference; narrowed C10
EngineCompiler's except Exception to the two typed errors so unknown
exceptions propagate (AC-3). Rewrote module-layout.md "Imports from"
sections for 9 components + added Rule 9; appended an
architecture.md ADR-009 note explaining why components must go
through _types/*.

AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via
orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha
sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate
(C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest
AND manifest_hash so re-planned routes change the cache identity
(AC-15/AC-16). 20 unit tests cover all 16 ACs.

AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json
sidecar self-hash, Ed25519 trust-key set, schema parse with
absolute/.. path rejection + takeoff_origin in-bbox check, stream
SHA-256 per artifact with multi-failure accumulation. Operator mode
re-derives tiles_coverage_sha256 from C6; airborne mode trusts the
signed aggregate. 19 unit tests cover all 17 ACs.

Composition root: c10_factory.build_manifest_builder +
build_manifest_verifier + c6_tile_metadata_store_to_tiles_query
adapter (the one place that legitimately imports both C6 and C10
without violating the AZ-270 lint).

Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml.

Tests: 1300 passed, 80 skipped (env-only), ruff clean for all
AZ-323/324 files.

AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++
pybind11 toolchain not present in this environment.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-13 02:37:14 +03:00
parent 6ca8d78190
commit e2bebefdfc
20 changed files with 3406 additions and 26 deletions
@@ -0,0 +1,273 @@
# C10 ManifestVerifier — Takeoff Content-Hash Gate + Trusted-Key Pinning
**Task**: AZ-324_c10_manifest_verifier
**Name**: C10 ManifestVerifier
**Description**: Implement `ManifestVerifier` (per the contract `_docs/02_document/contracts/c10_provisioning/manifest_verifier.md` v1.1.0), the read-only validator that AC-NEW-1 places between F2 takeoff and any engine deserialization. The verifier loads `Manifest.json`, verifies that its sidecar SHA-256 matches the Manifest bytes, parses the Ed25519 detached signature at `Manifest.json.sig`, verifies it against the caller-supplied `trusted_public_keys` tuple, parses the Manifest schema (rejecting absolute paths, `..` segments, and any other schema violation), validates the optional `flight.takeoff_origin` block (well-formed `LatLonAlt` + inside `build.bbox` per ADR-010 + AZ-490), and walks every per-artifact entry, re-hashing it via AZ-280's sidecar pattern. It returns a `VerificationResult` with `outcome ∈ {PASS, FAIL}`, the union of all `VerifyFailReason` values that fired, the populated `per_artifact_checks` list, the pass-through `takeoff_origin` + `flight_id` (or `None` when absent from the Manifest body), and `elapsed_ms`. Fail-closed: any deviation in signature, schema, key trust, hashes, or origin validity yields `FAIL` with detailed reasons. The verifier never raises on a verify failure, only on environment errors; a missing `Manifest.json` is not an environment error and yields `FAIL` with `MANIFEST_NOT_FOUND` rather than an exception.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema
**Component**: c10_provisioning (epic AZ-252 / E-C10)
**Tracker**: AZ-324
**Epic**: AZ-252 (E-C10)
### Document Dependencies
- `_docs/02_document/contracts/c10_provisioning/manifest_verifier.md` — produced by this task (frozen Protocol + DTO shape, invariants, test cases).
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` — sidecar verify pattern (AZ-280).
- `_docs/02_document/components/11_c10_provisioning/description.md` — § 5 `ContentHashMismatchError` handling, § 7 D-C10-3 sidecar coverage.
## Problem
Without a real verifier:
- AC-NEW-1 ("no engine deserialization at takeoff before manifest verify") collapses — F2 has nothing to gate on.
- D-C10-3 (SHA-256 content-hash gate over every shipped artifact) is unobservable at takeoff.
- C10-IT-02 (rejects tampered or wrong-key Manifests) cannot be implemented.
- A built but unverified Manifest is no better than no Manifest — operators cannot trust it without an actual check.
- Without a contract, C5 takeoff arming and C12 operator tooling cannot couple to C10 — every consumer would re-implement an ad-hoc check.
- The "fail-closed" property is a hard requirement; partial verifies that report PASS on first match would compromise the entire trust chain.
This task delivers the verifier + its frozen contract. It does NOT compile engines (AZ-321), build the Manifest (AZ-323), or own the takeoff-arming policy (E-C5).
## Outcome
- A `ManifestVerifier` class implementation at `src/gps_denied_onboard/components/c10_provisioning/manifest_verifier.py` matching the Protocol in the contract.
- Constructor: `__init__(self, *, sidecar: Sha256Sidecar, logger: Logger, clock: Clock, tile_metadata_store: TileMetadataStore | None = None)`.
- When `tile_metadata_store is None`, the verifier operates in airborne mode: trusts the recorded `tiles_coverage_sha256` after the signature passes (per MV-INV-5).
- When `tile_metadata_store is not None`, the verifier operates in operator mode: re-derives `tiles_coverage_sha256` from C6 and reports `TILES_COVERAGE_MISMATCH` on drift.
- The frozen contract at `_docs/02_document/contracts/c10_provisioning/manifest_verifier.md` (already written; this task brings the implementation up to it).
- Method `verify_manifest(manifest_path, trusted_public_keys) -> VerificationResult` flow:
1. Start `time.monotonic()` for `elapsed_ms`.
2. Initialize empty `fail_reasons: list[VerifyFailReason]`, `fail_details: list[str]`, `per_artifact_checks: list[ArtifactCheck]`.
3. **Step A — Manifest exists & sidecar matches**:
- If `manifest_path` does not exist: append `MANIFEST_NOT_FOUND`; return `FAIL` (no further work; per MV-INV-1).
- Read `Manifest.json` bytes.
- If `manifest_path.with_suffix(".json.sha256")` does not exist: append `SCHEMA_VIOLATION` ("missing manifest sidecar"); return `FAIL`.
- If `sha256(manifest_bytes) != sidecar_value`: append `MANIFEST_SELF_HASH_MISMATCH`; return `FAIL` (do NOT consult signature per MV-INV-3).
4. **Step B — Signature verifies against a trusted key**:
- If `signature_path = manifest_path.with_suffix(".json.sig")` does not exist: append `SIGNATURE_NOT_FOUND`; `signing_public_key_fingerprint = None`; return `FAIL`.
- Parse Ed25519 signature bytes (must be exactly 64 bytes; otherwise `SIGNATURE_INVALID`).
- Try each public key in `trusted_public_keys`:
- Compute `fingerprint = sha256(pub.public_bytes_raw()).hex()`.
- Try `pub.verify(signature_bytes, manifest_bytes)`.
- On success: signature is valid; `signing_public_key_fingerprint = fingerprint`; break.
- If no trusted key verified:
- If at least one key raised `InvalidSignature` (signature doesn't match this key's bytes): the signature could still match an untrusted key. Try parsing the Manifest's `signing_public_key_fingerprint` field (if schema parses) and report whichever is more diagnostic — `UNTRUSTED_PUBLIC_KEY` if the Manifest names a known-but-untrusted key, `SIGNATURE_INVALID` otherwise.
- Append the reason; return `FAIL` (do NOT proceed to per-artifact hashing per MV-INV-2).
- If `trusted_public_keys` is empty: append `UNTRUSTED_PUBLIC_KEY`; return `FAIL`.
5. **Step C — Schema parse**:
- `orjson.loads(manifest_bytes)` → dict.
- Validate required keys: `schema_version`, `build` (with sub-keys `bbox`, `zoom_levels`, `sector_class`, `built_at`, `manifest_hash`), `artifacts` (with `engines`, `descriptor_index`, `calibration`, `tiles_coverage`), `signing_public_key_fingerprint`. `flight` block is OPTIONAL (added in schema v1.1, ADR-010).
- Validate types: `engines` is list of `{path: str, sha256: str}`; `descriptor_index`, `calibration` are `{path: str, sha256: str}`; `tiles_coverage` is `{sha256: str, tile_count: int}`.
- Validate path-relative-only: every `path` value must be relative (no leading `/`, no `..` segments). Append `SCHEMA_VIOLATION` per offending field; if any, return `FAIL`.
- **Flight block (ADR-010 / AZ-490)**:
- If `flight` key absent → `takeoff_origin = None`, `flight_id = None`; continue.
- If `flight` present → parse `flight_id` (`UUID` or `None`) and `takeoff_origin` (optional block).
- If `flight.takeoff_origin` present → validate `lat_deg ∈ [-90, 90]`, `lon_deg ∈ [-180, 180]`, `alt_m` finite (no NaN/Inf). Append `TAKEOFF_ORIGIN_INVALID` to `fail_reasons` and the offending field name to `fail_details` if any check fails.
- If `flight.takeoff_origin` is well-formed → check it falls inside `build.bbox` (`bbox.lat_min ≤ lat ≤ bbox.lat_max`, `bbox.lon_min ≤ lon ≤ bbox.lon_max`). Append `TAKEOFF_ORIGIN_OUT_OF_BBOX` if not.
- The `takeoff_origin` is populated on `VerificationResult` whenever the block parsed (even on FAIL), per MV-INV-9, so operators see what was attempted.
6. **Step D — Per-artifact hash walk** (only reached if Steps A-C all passed):
- For each engine, descriptor_index, calibration entry:
- Compute `actual_path = manifest_path.parent / entry.path`.
- If file missing: append `ArtifactCheck(entry.path, entry.sha256, None, matched=False)`; append `ARTIFACT_MISSING` to `fail_reasons` once if not already there.
- Else: stream-read the file, compute SHA-256 (use AZ-280's helper that takes a path).
- If hash matches: `matched=True`.
- Else: `matched=False`; append `ARTIFACT_HASH_MISMATCH` once.
- For tiles_coverage:
- If `tile_metadata_store is None` (airborne mode): trust the recorded `tiles_coverage.sha256` since the Manifest signature already binds it. Append `ArtifactCheck("tiles_coverage", recorded_sha256, recorded_sha256, matched=True)` for completeness.
- Else (operator mode): re-derive `tiles_coverage_sha256` by `tile_metadata_store.query_by_bbox(...)` over the `build.bbox` + `zoom_levels` + `sector_class`, sort by `(zoom, lat, lon, source)`, hash. If mismatch → `TILES_COVERAGE_MISMATCH`.
- Walk ALL entries even on first failure (per MV-TC-9).
7. Set `outcome = PASS` iff `fail_reasons` is empty; else `FAIL`.
8. Set `elapsed_ms = int((time.monotonic() - start) * 1000)`.
9. Return `VerificationResult(...)`.
- INFO log on PASS (`c10.manifest.verify.pass` with elapsed_ms + fingerprint); WARN on FAIL with `fail_reasons` + counts of mismatched artifacts.
- Composition root factory `build_manifest_verifier(config, *, with_tile_store: bool) -> ManifestVerifier`: `with_tile_store=True` for operator mode, `False` for airborne C5.
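
The result shape and steps 7-9 of the flow above can be sketched as follows. This is a minimal illustration, not the frozen contract: the `ArtifactCheck` field names (`path`, `expected_sha256`, `actual_sha256`), the `Outcome` enum, and the `finalize` helper are assumptions introduced here; the contract document remains authoritative for the real DTOs.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple
from uuid import UUID


class Outcome(Enum):
    PASS = "PASS"
    FAIL = "FAIL"


class VerifyFailReason(Enum):
    # Reason names are the ones listed in this task; values are assumed.
    MANIFEST_NOT_FOUND = "MANIFEST_NOT_FOUND"
    MANIFEST_SELF_HASH_MISMATCH = "MANIFEST_SELF_HASH_MISMATCH"
    SIGNATURE_NOT_FOUND = "SIGNATURE_NOT_FOUND"
    SIGNATURE_INVALID = "SIGNATURE_INVALID"
    UNTRUSTED_PUBLIC_KEY = "UNTRUSTED_PUBLIC_KEY"
    SCHEMA_VIOLATION = "SCHEMA_VIOLATION"
    TAKEOFF_ORIGIN_INVALID = "TAKEOFF_ORIGIN_INVALID"
    TAKEOFF_ORIGIN_OUT_OF_BBOX = "TAKEOFF_ORIGIN_OUT_OF_BBOX"
    ARTIFACT_MISSING = "ARTIFACT_MISSING"
    ARTIFACT_HASH_MISMATCH = "ARTIFACT_HASH_MISMATCH"
    TILES_COVERAGE_MISMATCH = "TILES_COVERAGE_MISMATCH"


@dataclass(frozen=True)
class LatLonAlt:
    lat_deg: float
    lon_deg: float
    alt_m: float


@dataclass(frozen=True)
class ArtifactCheck:
    path: str
    expected_sha256: str           # assumed field name
    actual_sha256: Optional[str]   # None when the file is missing
    matched: bool


@dataclass(frozen=True)
class VerificationResult:
    outcome: Outcome
    fail_reasons: Tuple[VerifyFailReason, ...]
    fail_details: Tuple[str, ...]
    per_artifact_checks: Tuple[ArtifactCheck, ...]
    signing_public_key_fingerprint: Optional[str]
    takeoff_origin: Optional[LatLonAlt]
    flight_id: Optional[UUID]
    elapsed_ms: int


def finalize(
    *,
    start: float,
    fail_reasons: List[VerifyFailReason],
    fail_details: List[str],
    checks: List[ArtifactCheck],
    fingerprint: Optional[str] = None,
    takeoff_origin: Optional[LatLonAlt] = None,
    flight_id: Optional[UUID] = None,
) -> VerificationResult:
    """Steps 7-9: PASS iff no reason fired, stamp elapsed_ms, freeze lists."""
    return VerificationResult(
        outcome=Outcome.PASS if not fail_reasons else Outcome.FAIL,
        fail_reasons=tuple(fail_reasons),
        fail_details=tuple(fail_details),
        per_artifact_checks=tuple(checks),
        signing_public_key_fingerprint=fingerprint,
        takeoff_origin=takeoff_origin,
        flight_id=flight_id,
        elapsed_ms=int((time.monotonic() - start) * 1000),
    )
```

Every early `return FAIL` in Steps A-C would go through the same helper, so `elapsed_ms` is recorded on every outcome and `PASS` cannot be produced while any reason is present.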
## Scope
### Included
- `ManifestVerifier` class implementing the Protocol from the contract.
- The contract document (frozen at v1.1.0).
- Schema validation against the v1.0 and v1.1 shapes produced by AZ-323 (v1.1 adds the optional `flight` block).
- Signature verification against a tuple of trusted public keys.
- Per-artifact stream-hash walk with multiple-failure accumulation.
- Airborne vs operator mode for tiles_coverage handling.
- Composition-root factory.
- Conformance test for the contract Protocol.
### Excluded
- Manifest building / signing (AZ-323 owns).
- Trusted-key distribution / loading from disk — caller passes `Ed25519PublicKey` instances.
- Cache repair on FAIL — caller (E-C5 takeoff arming, E-C12 operator) decides next action.
- Coverage check for orphan files in `cache_root` (AZ-325 owns `ManifestCoverageError`).
- Logging Manifest contents (Manifests are not secret but verbose; only fingerprints + counts are logged).
- C13 FDR emission — caller's responsibility (per MV-INV-6).
- Non-Ed25519 signatures.
## Acceptance Criteria
**AC-1: PASS on a valid Manifest with all artifacts present and matching**
Given a freshly-built Manifest + sig + sidecar from AZ-323 and `trusted_public_keys = (signing_pub,)`
When `verify_manifest(manifest_path, trusted_public_keys)` is called
Then `outcome=PASS`, `fail_reasons` is empty, `per_artifact_checks` has every entry `matched=True`, `signing_public_key_fingerprint` is the signing key's fingerprint, `elapsed_ms >= 0`
**AC-2: FAIL on missing Manifest with no further work**
Given `manifest_path` does not exist
When verify runs
Then `outcome=FAIL`, `fail_reasons=(MANIFEST_NOT_FOUND,)`, `per_artifact_checks` is empty (no work performed), `signing_public_key_fingerprint=None`
**AC-3: FAIL on missing signature with diagnostic**
Given Manifest.json exists + sidecar matches but Manifest.json.sig is absent
When verify runs
Then `fail_reasons=(SIGNATURE_NOT_FOUND,)`, `per_artifact_checks` is empty, no per-artifact disk reads happen (defence-in-depth)
**AC-4: FAIL on tampered Manifest body**
Given Manifest.json is mutated by 1 byte after signing
When verify runs
Then either `MANIFEST_SELF_HASH_MISMATCH` (sidecar caught it first) OR `SIGNATURE_INVALID` (if sidecar was also re-computed by attacker); per-artifact walk does NOT happen
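
The sidecar branch of AC-4 corresponds to Step A of the verify flow. A minimal sketch of that step, using only the standard library, is shown here. The sidecar file format belongs to AZ-280 and is not specified in this task, so the sketch assumes the first whitespace-separated token of the sidecar is the hex digest; `check_manifest_self_hash` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path
from typing import Optional


def check_manifest_self_hash(manifest_path: Path) -> Optional[str]:
    """Return the fail-reason name for Step A, or None when Step A passes."""
    if not manifest_path.exists():
        return "MANIFEST_NOT_FOUND"

    # The Manifest itself is small, so reading it whole is acceptable here;
    # engine binaries are the files that must be streamed.
    manifest_bytes = manifest_path.read_bytes()

    # Manifest.json -> Manifest.json.sha256
    sidecar_path = manifest_path.with_suffix(".json.sha256")
    if not sidecar_path.exists():
        return "SCHEMA_VIOLATION"  # "missing manifest sidecar"

    # ASSUMPTION: the first token of the sidecar is the hex digest.
    tokens = sidecar_path.read_text(encoding="ascii", errors="replace").split()
    recorded = tokens[0].lower() if tokens else ""

    if hashlib.sha256(manifest_bytes).hexdigest() != recorded:
        # The signature is deliberately not consulted after this point.
        return "MANIFEST_SELF_HASH_MISMATCH"
    return None
```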
**AC-5: FAIL on untrusted public key**
Given the Manifest is signed with a key NOT in `trusted_public_keys`
When verify runs
Then `fail_reasons=(UNTRUSTED_PUBLIC_KEY,)`, `signing_public_key_fingerprint` is populated (so operators see WHICH untrusted key signed it), per-artifact walk does NOT happen
**AC-6: FAIL on schema violation lists offending field**
Given a Manifest missing the `signing_public_key_fingerprint` key
When verify runs
Then `fail_reasons=(SCHEMA_VIOLATION,)`, `fail_details` contains a string naming `signing_public_key_fingerprint`
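
One way to produce the field names that AC-6 expects in `fail_details` is a table-driven required-key check over the parsed dict. The key lists below come from Step C of this task; the helper name `offending_required_fields` and the dotted-name format are assumptions made for illustration.

```python
from typing import Any, Dict, List

REQUIRED_TOP = (
    "schema_version",
    "build",
    "artifacts",
    "signing_public_key_fingerprint",
)
REQUIRED_NESTED = {
    "build": ("bbox", "zoom_levels", "sector_class", "built_at", "manifest_hash"),
    "artifacts": ("engines", "descriptor_index", "calibration", "tiles_coverage"),
}


def offending_required_fields(manifest: Dict[str, Any]) -> List[str]:
    """Return names of missing (or non-object) required fields, in a stable order."""
    offending = [key for key in REQUIRED_TOP if key not in manifest]
    for parent, children in REQUIRED_NESTED.items():
        block = manifest.get(parent)
        if isinstance(block, dict):
            offending.extend(
                f"{parent}.{child}" for child in children if child not in block
            )
        elif parent in manifest:
            # Present but not a JSON object: report the parent itself.
            offending.append(parent)
    return offending
```

A non-empty return value would map to a single `SCHEMA_VIOLATION` reason with one `fail_details` string per name. The optional `flight` block is intentionally absent from the tables.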
**AC-7: FAIL on absolute path in artifact entry**
Given an engine entry has `path: "/etc/passwd"`
When verify runs
Then `fail_reasons=(SCHEMA_VIOLATION,)`, `fail_details` names the offending field; per-artifact walk does NOT consult `/etc/passwd`
**AC-8: FAIL with multiple reasons accumulated**
Given one engine is missing on disk AND one engine's bytes drifted AND a third engine matches
When verify runs
Then `fail_reasons` contains BOTH `ARTIFACT_MISSING` and `ARTIFACT_HASH_MISMATCH` (in deterministic order: traversal order); `per_artifact_checks` has all 3 entries with correct `matched` values; the third entry has `matched=True`
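
The accumulation behaviour in AC-8 can be sketched as a walk that never short-circuits and records each reason once, in first-seen order. The hashing helper is injected because the real one belongs to AZ-280; `walk_artifacts`, `small_file_sha256`, and the `ArtifactCheck` field names are hypothetical names used only for this illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class ArtifactCheck(NamedTuple):
    path: str
    expected_sha256: str
    actual_sha256: Optional[str]
    matched: bool


def walk_artifacts(
    cache_root: Path,
    entries: List[Dict[str, str]],
    hash_file: Callable[[Path], str],
) -> Tuple[Tuple[str, ...], List[ArtifactCheck]]:
    """Check every entry; collect each distinct reason once, in traversal order."""
    reasons: List[str] = []
    checks: List[ArtifactCheck] = []

    def note(reason: str) -> None:
        if reason not in reasons:
            reasons.append(reason)

    for entry in entries:
        actual_path = cache_root / entry["path"]
        if not actual_path.is_file():
            checks.append(ArtifactCheck(entry["path"], entry["sha256"], None, False))
            note("ARTIFACT_MISSING")
            continue  # keep walking: later entries still get checked
        actual = hash_file(actual_path)
        matched = actual == entry["sha256"]
        checks.append(ArtifactCheck(entry["path"], entry["sha256"], actual, matched))
        if not matched:
            note("ARTIFACT_HASH_MISMATCH")
    return tuple(reasons), checks


def small_file_sha256(path: Path) -> str:
    """Demo-only hasher for tiny fixtures; real engines must be streamed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```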
**AC-9: Operator mode re-derives tiles_coverage**
Given `tile_metadata_store` is supplied AND C6's tiles for the build's bbox/zoom now have a different aggregate hash (e.g., a tile was re-downloaded)
When verify runs
Then `fail_reasons=(TILES_COVERAGE_MISMATCH,)`; the recorded vs computed hashes are in `fail_details`
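
The operator-mode re-derivation in AC-9 relies on a hash that is independent of query order. The sketch below shows the sort-then-hash idea only. The byte layout fed to SHA-256 is NOT specified in this task, so the `TileRecord` fields and the `zoom|lat|lon|source|sha256` line format are placeholders; the real aggregate must use whatever AZ-323 wrote at build time, otherwise every operator-mode verify would report a mismatch.

```python
import hashlib
from typing import Iterable, NamedTuple


class TileRecord(NamedTuple):
    zoom: int
    lat: float
    lon: float
    source: str
    sha256: str  # per-tile content hash (assumed field)


def tiles_coverage_sha256(tiles: Iterable[TileRecord]) -> str:
    """Aggregate hash over tiles sorted by (zoom, lat, lon, source)."""
    digest = hashlib.sha256()
    ordered = sorted(tiles, key=lambda t: (t.zoom, t.lat, t.lon, t.source))
    for t in ordered:
        # ASSUMPTION: one text line per tile; the real serialization may differ.
        line = f"{t.zoom}|{t.lat!r}|{t.lon!r}|{t.source}|{t.sha256}\n"
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()
```

Because the records are sorted before hashing, the order in which `query_by_bbox` returns rows does not affect the result, while a re-downloaded tile with different content changes it.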
**AC-10: Airborne mode trusts tiles_coverage post-signature**
Given `tile_metadata_store=None`
When verify runs
Then `tiles_coverage` `ArtifactCheck` shows `matched=True` (recorded == "actual" because we don't re-derive); the airborne F2 path is fast (≤ 100 ms per NFR)
**AC-11: Conformance — `isinstance` returns True**
Given the implementation
When `isinstance(impl, ManifestVerifier)` is checked under runtime_checkable
Then `True`
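
A minimal sketch of the AC-11 conformance check follows. The Protocol body here is a reduced stand-in for the one in the contract (the return type is left as `object`), and `NotAVerifier` exists only to show the negative case. Note that `isinstance` against a `runtime_checkable` Protocol checks only that the method exists, not its signature or return type, so this test complements the behavioural ACs rather than replacing them.

```python
from pathlib import Path
from typing import Protocol, Tuple, runtime_checkable


@runtime_checkable
class ManifestVerifier(Protocol):
    def verify_manifest(
        self, manifest_path: Path, trusted_public_keys: Tuple[object, ...]
    ) -> object:
        ...


class ManifestVerifierImpl:
    """Structural match: no inheritance from the Protocol is required."""

    def verify_manifest(self, manifest_path, trusted_public_keys):
        raise NotImplementedError("sketch only")


class NotAVerifier:
    """Lacks verify_manifest, so the isinstance check is False."""


def conforms(candidate: object) -> bool:
    return isinstance(candidate, ManifestVerifier)
```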
**AC-12: `elapsed_ms` recorded on every outcome**
Given any of the above ACs
When inspecting the result
Then `elapsed_ms >= 0` and is reasonable (smaller for early-exit failures, larger for full per-artifact walks)
**AC-13: Empty `trusted_public_keys` always fails closed**
Given `trusted_public_keys = ()`
When verify runs
Then `fail_reasons=(UNTRUSTED_PUBLIC_KEY,)` regardless of Manifest validity; per-artifact walk does NOT happen
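
The key-matching logic behind AC-5 and AC-13 can be sketched without real cryptography by using a stand-in key object. `StubKey` and the local `InvalidSignature` are illustration-only substitutes for `Ed25519PublicKey` and `cryptography.exceptions.InvalidSignature`; they expose the same two methods the loop uses (`verify`, `public_bytes_raw`) but perform no signature math. The rule for choosing between `UNTRUSTED_PUBLIC_KEY` and `SIGNATURE_INVALID` is one reading of Step B, taken here as an assumption: if the Manifest names a fingerprint outside the trusted set, report the key as untrusted.

```python
import hashlib
from typing import Optional, Sequence, Tuple


class InvalidSignature(Exception):
    """Stand-in for cryptography.exceptions.InvalidSignature."""


class StubKey:
    """Illustration only: 'accepts' exactly one signature value, no crypto."""

    def __init__(self, raw: bytes, accepted_signature: bytes) -> None:
        self._raw = raw
        self._accepted = accepted_signature

    def public_bytes_raw(self) -> bytes:
        return self._raw

    def verify(self, signature: bytes, data: bytes) -> None:
        if signature != self._accepted:
            raise InvalidSignature()


def fingerprint_of(pub) -> str:
    return hashlib.sha256(pub.public_bytes_raw()).hexdigest()


def match_trusted_key(
    signature: bytes,
    manifest_bytes: bytes,
    trusted_keys: Sequence,
    claimed_fingerprint: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (fingerprint, fail_reason); fail_reason is None on success."""
    if not trusted_keys:
        # Checked first so an empty key set fails closed regardless of input.
        return None, "UNTRUSTED_PUBLIC_KEY"
    if len(signature) != 64:
        return None, "SIGNATURE_INVALID"

    trusted_fingerprints = []
    for pub in trusted_keys:
        fingerprint = fingerprint_of(pub)
        trusted_fingerprints.append(fingerprint)
        try:
            pub.verify(signature, manifest_bytes)
        except InvalidSignature:
            continue  # try the next trusted key
        return fingerprint, None

    # ASSUMED rule: a named fingerprint outside the trusted set is "untrusted".
    if claimed_fingerprint is not None and claimed_fingerprint not in trusted_fingerprints:
        return claimed_fingerprint, "UNTRUSTED_PUBLIC_KEY"
    return None, "SIGNATURE_INVALID"
```

In production wiring the loop would receive real `Ed25519PublicKey` instances and catch the library's own `InvalidSignature`.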
**AC-14: Manifest with no `flight` block parses cleanly (back-compat)**
Given a v1.0 Manifest (no `flight` block) that is otherwise valid + signed
When verify runs
Then `outcome=PASS`; `VerificationResult.takeoff_origin is None`; `VerificationResult.flight_id is None`
**AC-15: Well-formed in-bbox `takeoff_origin` passes through**
Given a v1.1 Manifest with `flight.takeoff_origin = (50.0, 36.2, 200.0)` inside the recorded bbox
When verify runs
Then `outcome=PASS`; `VerificationResult.takeoff_origin == LatLonAlt(50.0, 36.2, 200.0)`
**AC-16: Malformed `takeoff_origin` (lat=200) fails closed**
Given a Manifest with `flight.takeoff_origin.lat_deg = 200`
When verify runs
Then `outcome=FAIL`; `fail_reasons` contains `TAKEOFF_ORIGIN_INVALID`; `fail_details` names `lat_deg`; the `takeoff_origin` field on `VerificationResult` is still populated for diagnostics
**AC-17: Out-of-bbox `takeoff_origin` fails closed**
Given a Manifest whose `flight.takeoff_origin = (10.0, 10.0, 0)` while `build.bbox` covers `(49.5..50.5, 35.5..36.5)`
When verify runs
Then `outcome=FAIL`; `fail_reasons` contains `TAKEOFF_ORIGIN_OUT_OF_BBOX`
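
AC-15 through AC-17 together describe a two-stage check: well-formedness first, then bbox containment. A minimal sketch over plain dicts is shown below. The key names (`lat_deg`, `lon_deg`, `alt_m`, `lat_min`, `lat_max`, `lon_min`, `lon_max`) are the ones used in this task; the function name and the return shape are assumptions.

```python
import math
from typing import Any, Dict, List, Tuple


def _finite_number(value: Any) -> bool:
    """True for real, finite ints/floats; rejects bool, str, None, NaN, Inf."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    try:
        return math.isfinite(value)
    except OverflowError:  # absurdly large ints cannot convert to float
        return False


def check_takeoff_origin(
    origin: Dict[str, Any], bbox: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (fail_reasons, fail_details) for the flight.takeoff_origin block."""
    lat = origin.get("lat_deg")
    lon = origin.get("lon_deg")
    alt = origin.get("alt_m")

    bad_fields: List[str] = []
    if not (_finite_number(lat) and -90.0 <= lat <= 90.0):
        bad_fields.append("lat_deg")
    if not (_finite_number(lon) and -180.0 <= lon <= 180.0):
        bad_fields.append("lon_deg")
    if not _finite_number(alt):
        bad_fields.append("alt_m")
    if bad_fields:
        return ["TAKEOFF_ORIGIN_INVALID"], bad_fields

    # The bbox check only makes sense once the origin is well-formed.
    inside = (
        bbox["lat_min"] <= lat <= bbox["lat_max"]
        and bbox["lon_min"] <= lon <= bbox["lon_max"]
    )
    if not inside:
        return ["TAKEOFF_ORIGIN_OUT_OF_BBOX"], ["flight.takeoff_origin"]
    return [], []
```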
## Non-Functional Requirements
**Performance**
- Airborne F2 verify (no per-tile re-derivation, ~5 artifact entries): wall-clock ≤ 100 ms on Jetson Orin (signature verify + 5 stream-SHA-256s of bounded files).
- Operator-mode verify with 100k tiles re-derivation: ≤ 5 s (matches AZ-323's NFR).
- Stream-hash files via 64 KB chunks; do NOT load engine binaries (~200 MB) entirely into memory.
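
The streaming requirement can be met with a plain read loop. This is a minimal sketch, assuming AZ-280's real helper has a comparable path-in, hex-digest-out shape; `sha256_stream` and `CHUNK_SIZE` are names chosen for this example.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # 64 KB, per the NFR


def sha256_stream(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file without holding more than one chunk in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # read() returns b"" at EOF, which ends the loop.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Peak additional memory is bounded by `chunk_size` regardless of file size, which is what the 200 MB engine memory-profile test is meant to confirm.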
**Compatibility**
- `cryptography` (already pinned via AZ-318), `orjson` (already pinned), `hashlib` (stdlib).
- No new third-party dependencies.
**Reliability**
- Fail-closed: empty trusted keys → FAIL; missing files → FAIL; any drift → FAIL.
- No partial PASS; the `outcome=PASS` branch is taken only when `fail_reasons` is empty.
- Defensive against directory traversal: relative paths only (AC-7).
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Built Manifest from AZ-323 fixture | PASS; all matched |
| AC-2 | Missing Manifest.json | FAIL; MANIFEST_NOT_FOUND only |
| AC-3 | Missing signature | FAIL; SIGNATURE_NOT_FOUND; no disk reads |
| AC-4 | Mutated Manifest body | FAIL; either MANIFEST_SELF_HASH_MISMATCH or SIGNATURE_INVALID |
| AC-5 | Wrong-key signing | FAIL; UNTRUSTED_PUBLIC_KEY; fingerprint populated |
| AC-6 | Missing required field | FAIL; SCHEMA_VIOLATION + field name |
| AC-7 | Absolute path in artifact | FAIL; SCHEMA_VIOLATION; no path traversal |
| AC-8 | 1 missing + 1 drifted + 1 OK | Two failure reasons; per_artifact_checks complete |
| AC-9 | Operator mode + drifted tile | TILES_COVERAGE_MISMATCH |
| AC-10 | Airborne mode | tiles_coverage matched=True |
| AC-11 | Conformance check | True |
| AC-12 | Inspect elapsed_ms | All non-negative; ordered as expected |
| AC-13 | Empty trusted keys | FAIL; UNTRUSTED_PUBLIC_KEY |
| AC-14 | v1.0 Manifest (no flight block) | PASS; takeoff_origin=None; flight_id=None |
| AC-15 | v1.1 Manifest, valid in-bbox origin | PASS; takeoff_origin populated |
| AC-16 | Malformed origin (lat=200) | FAIL; TAKEOFF_ORIGIN_INVALID; field name in details |
| AC-17 | Out-of-bbox origin | FAIL; TAKEOFF_ORIGIN_OUT_OF_BBOX |
| NFR-perf-airborne | 5 artifact bench, no tile re-walk | p99 ≤ 100 ms |
| NFR-perf-operator | 100k-tile re-walk | ≤ 5 s |
| NFR-reliability-stream-hash | 200 MB engine + memory profile | Peak < 10 MB extra |
## Constraints
- Stream SHA-256 over files via `hashlib.sha256().update(chunk)` in 64 KB blocks; do NOT `Path.read_bytes()` on engines (memory blowup per NFR).
- Path interpretation is relative-only; absolute paths are SCHEMA_VIOLATION (AC-7).
- The verifier is read-only (per MV-INV-6); no disk writes, no network, no FDR.
- `fail_reasons` is a tuple (immutable, ordered, deterministic).
- Signature checks happen before per-artifact walks (per MV-INV-2).
- Manifest sidecar check happens before signature (per MV-INV-3).
- Multiple failures accumulate; do not short-circuit on first per-artifact failure (per MV-TC-9 / AC-8).
## Risks & Mitigation
**Risk 1: Trusted-key list accidentally empty in production wiring**
- *Risk*: Composition root mis-configures; airborne C5 ends up with an empty key list and arming silently fails forever.
- *Mitigation*: AC-13 + ERROR log on `UNTRUSTED_PUBLIC_KEY` with key-list-length=0 makes the misconfiguration loud at first arm attempt.
**Risk 2: Per-artifact walk dominates airborne arm latency**
- *Risk*: 5 engines × 200 MB stream-hash on slow microSD → 30 s arm latency.
- *Mitigation*: NFR-perf-airborne benchmark documents the envelope; if the Jetson microSD I/O is the bottleneck, a follow-up task adds an "incremental verify" path that trusts unchanged artifacts since last reboot. Out of scope this cycle.
**Risk 3: Tampered sidecar matches tampered body (attacker replaces both sidecar + body)**
- *Risk*: AC-4's first failure case (sidecar mismatch) is bypassed by an attacker who recomputes the sidecar.
- *Mitigation*: Signature check (Step B) catches this — the signature is over the Manifest body; recomputing the sidecar does NOT also recompute the signature. The Ed25519 secret key is operator-only.
**Risk 4: Path traversal via relative `..` segments**
- *Risk*: A relative path like `../../etc/passwd` passes the "no leading /" check but escapes cache_root.
- *Mitigation*: AC-7 + `..` segment rejection covers it; explicit check `if ".." in Path(entry.path).parts: SCHEMA_VIOLATION`.
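
The relative-only rule from AC-7 and the `..` check above can be sketched as one predicate. It assumes POSIX-style paths in the Manifest (hence `PurePosixPath`); `is_safe_relative_path` is a hypothetical helper name. Checking `parts` rather than doing a substring search keeps a legitimate filename such as `model..v2.engine` valid while still rejecting `..` as a whole segment.

```python
from pathlib import PurePosixPath


def is_safe_relative_path(raw: str) -> bool:
    """True only for non-empty relative paths with no '..' segment."""
    if not raw:
        return False
    path = PurePosixPath(raw)
    if path.is_absolute():       # leading "/" such as "/etc/passwd"
        return False
    if ".." in path.parts:       # "../../etc/passwd", "engines/../x"
        return False
    return True
```

Any entry failing the predicate would be reported as `SCHEMA_VIOLATION` in Step C, before the per-artifact walk ever joins it onto `manifest_path.parent`.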
**Risk 5: Operator-mode tile re-walk on Jetson is too slow**
- *Risk*: An airborne-mode verifier mistakenly gets a `tile_metadata_store` (composition root mistake) and re-walks 100k tiles, blowing the arm latency budget.
- *Mitigation*: The composition root factory `build_manifest_verifier(config, *, with_tile_store: bool)` is the explicit toggle; airborne wiring passes `with_tile_store=False`. AC-10 tests airborne mode latency.
## Runtime Completeness
- **Named capability**: takeoff content-hash gate per AC-NEW-1 + D-C10-3 + C10-IT-02 (epic § Acceptance C10-IT-01..02; description.md § 5 `ContentHashMismatchError`).
- **Production code that must exist**: real `ManifestVerifier` orchestrating real `cryptography` Ed25519 verify + real `hashlib` stream-SHA-256 + real `orjson` schema parse; real `tile_metadata_store` re-derivation in operator mode.
- **Allowed external stubs**: tests MAY use a fake key generated in-test, fake Manifest fixtures from AZ-323's test fixtures; production wiring uses real keys from operator key store.
- **Unacceptable substitutes**: skipping Step A's sidecar check (loses bit-rot detection); skipping Step B before walking artifacts (defeats MV-INV-2 defence-in-depth); short-circuiting on first per-artifact failure (operators need full diagnostic per MV-TC-9); HMAC instead of Ed25519 (different trust model); accepting absolute paths in entries (path traversal vulnerability per AC-7); raising on missing files instead of `outcome=FAIL` (breaks the contract's read-only / never-raise-on-verify-failure invariant).