mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 09:11:13 +00:00
[AZ-507] [AZ-323] [AZ-324] C10 Manifest build + verify + AZ-270 hygiene
AZ-507: codify cross-component import rule. Added _types/inference_errors.py shim re-exporting EngineBuildError + CalibrationCacheError from c7_inference; narrowed C10 EngineCompiler's except Exception to the two typed errors so unknown exceptions propagate (AC-3). Rewrote module-layout.md "Imports from" sections for 9 components + added Rule 9; appended an architecture.md ADR-009 note explaining why components must go through _types/*. AZ-323: ManifestBuilder + Ed25519ManifestSigner. Canonical JSON via orjson OPT_SORT_KEYS+OPT_INDENT_2, atomic-write Manifest.json + sha sidecar + .sig via AZ-280, operator-key fingerprint allowlist gate (C10-ST-01), ADR-010 takeoff_origin + flight_id baked into Manifest AND manifest_hash so re-planned routes change the cache identity (AC-15/AC-16). 20 unit tests cover all 16 ACs. AZ-324: ManifestVerifierImpl. Fail-closed Steps A-D: Manifest.json sidecar self-hash, Ed25519 trust-key set, schema parse with absolute/.. path rejection + takeoff_origin in-bbox check, stream SHA-256 per artifact with multi-failure accumulation. Operator mode re-derives tiles_coverage_sha256 from C6; airborne mode trusts the signed aggregate. 19 unit tests cover all 17 ACs. Composition root: c10_factory.build_manifest_builder + build_manifest_verifier + c6_tile_metadata_store_to_tiles_query adapter (the one place that legitimately imports both C6 and C10 without violating the AZ-270 lint). Dependency: pinned cryptography>=43.0,<46.0 in pyproject.toml. Tests: 1300 passed, 80 skipped (env-only), ruff clean for all AZ-323/324 files. AZ-306 (FAISS) intentionally deferred to batch 35 — needs C++ pybind11 toolchain not present in this environment. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,275 @@
|
||||
# C10 Manifest Builder — Content-Hash Table + Operator-Key Ed25519 Signing
|
||||
|
||||
**Task**: AZ-323_c10_manifest_builder
|
||||
**Name**: C10 Manifest Builder
|
||||
**Description**: Implement `ManifestBuilder`, the C10-internal phase that produces the signed cache Manifest covering EVERY shipped artifact (engines, FAISS index, calibration JSON, all tile hashes from C6) plus the build-identity tuple `(model_ids, calibration_sha256, sorted_tile_hashes, sector_class, bbox, zoom_levels, takeoff_origin, flight_id)` whose canonical hash is `manifest_hash` — the D-C10-1 idempotence key. The `takeoff_origin` (`LatLonAlt`) and `flight_id` (`UUID`) are supplied by C12 from `Flight.waypoints[0]` via the `FlightsApiClient` (ADR-010, AZ-489); both are baked into the Manifest body **and** included in the manifest-hash so re-planning the flight produces a new cache identity. Serializes the Manifest as canonical JSON (sorted keys, no whitespace) at `cache_root/Manifest.json`, computes its own SHA-256 sidecar via AZ-280, and writes a detached Ed25519 signature at `cache_root/Manifest.json.sig` using the operator's signing key from `key_path`. Refuses to sign with a non-operator key when `config.c10.signing_mode = "operator"` (C10-ST-01). Emits the `signing_public_key_fingerprint` into the Manifest itself so verifiers can pin the trust root.
|
||||
**Complexity**: 3 points
|
||||
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema, AZ-303_c6_storage_interfaces
|
||||
**Component**: c10_provisioning (epic AZ-252 / E-C10)
|
||||
**Tracker**: AZ-323
|
||||
**Epic**: AZ-252 (E-C10)
|
||||
|
||||
### Document Dependencies
|
||||
|
||||
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` — atomic write + sidecar pattern (AZ-280).
|
||||
- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md` — `query_by_bbox` returning per-tile sha256 set by AZ-316.
|
||||
- `_docs/02_document/components/11_c10_provisioning/description.md` — § 1 idempotence, § 5 `ManifestWriteError`, § 7 D-C10-3 sidecar coverage.
|
||||
|
||||
## Problem
|
||||
|
||||
Without a real Manifest builder:
|
||||
|
||||
- D-C10-1 (idempotent re-run via manifest hash) cannot be implemented — T5's "did anything change?" check has no canonical hash to compare.
|
||||
- D-C10-3 (SHA-256 content-hash gate over every shipped artifact) is unobservable — the takeoff verifier (T4) has nothing to verify against.
|
||||
- AC-NEW-1 ("no engine deserialization at takeoff before manifest verify") collapses without a signed Manifest at takeoff.
|
||||
- C10-ST-01 (build refuses dev-key signing in operator mode) cannot be enforced without a signing key check.
|
||||
- The `signing_public_key_fingerprint` field is the trust anchor for the airborne `ManifestVerifier`; without it, the verifier cannot decide which key is allowed to vouch for a Manifest.
|
||||
- A Manifest that is huge (100k tile hashes × 80 bytes = 8 MB) but human-inspectable is operator-friendly; without canonical JSON ordering, two builds of the same input produce different bytes and break idempotence.
|
||||
|
||||
This task delivers the Manifest serialization + signing. It does NOT compile engines (AZ-321), embed tiles (AZ-322), or run the takeoff verify (T4).
|
||||
|
||||
## Outcome
|
||||
|
||||
- A `ManifestBuilder` class at `src/gps_denied_onboard/components/c10_provisioning/manifest_builder.py`:
|
||||
- Constructor: `__init__(self, *, sidecar: Sha256Sidecar, signer: ManifestSigner, tile_metadata_store: TileMetadataStore, logger: Logger, clock: Clock, config: C10ManifestConfig)`.
|
||||
- `C10ManifestConfig` (`@dataclass(frozen=True)`): `signing_mode: enum {operator, dev}`, `allowed_operator_fingerprints: tuple[str, ...]`, `schema_version: str = "1.0"`.
|
||||
- Public method: `build_manifest(input: ManifestBuildInput) -> ManifestArtifact`.
|
||||
- `ManifestBuildInput` (`@dataclass(frozen=True)`): `cache_root: Path`, `bbox: Bbox`, `zoom_levels: tuple[int, ...]`, `sector_class: SectorClassification`, `engine_entries: tuple[EngineCacheEntry, ...]`, `descriptor_index_path: Path`, `calibration_path: Path`, `key_path: Path`, `takeoff_origin: LatLonAlt | None = None` (ADR-010 / AZ-489 — when set, baked into Manifest + hash), `flight_id: UUID | None = None` (ADR-010 — pass-through provenance).
|
||||
- `ManifestArtifact` (`@dataclass(frozen=True)`): `manifest_path: Path`, `signature_path: Path`, `manifest_hash: str`, `signing_public_key_fingerprint: str`, `total_artifacts_listed: int`.
|
||||
- A `ManifestSigner` Protocol at `src/gps_denied_onboard/components/c10_provisioning/interface.py`:
|
||||
```python
|
||||
@runtime_checkable
|
||||
class ManifestSigner(Protocol):
|
||||
def load_signing_key(self, key_path: Path) -> SigningKeyHandle: ...
|
||||
def sign(self, key: SigningKeyHandle, payload_bytes: bytes) -> bytes: ...
|
||||
def public_key_fingerprint(self, key: SigningKeyHandle) -> str: ...
|
||||
```
|
||||
Default impl `Ed25519ManifestSigner` uses the `cryptography` library (already pinned via AZ-318 for per-flight keys).
|
||||
- Method flow:
|
||||
1. Load operator signing key: `signer.load_signing_key(input.key_path)` → `SigningKeyHandle`.
|
||||
2. Compute `signing_public_key_fingerprint = signer.public_key_fingerprint(key)` (sha256 of the raw 32-byte ed25519 public key, hex).
|
||||
3. **Operator-mode gate (C10-ST-01)**: if `config.signing_mode == "operator"` AND `fingerprint not in config.allowed_operator_fingerprints` → raise `ManifestWriteError("signing key fingerprint not in allowed_operator_fingerprints")`; ERROR log with the offending fingerprint. If `config.signing_mode == "dev"` AND fingerprint matches an allowed operator fingerprint → emit WARN `c10.manifest.dev_mode_with_operator_key` (operator key being used in dev mode is suspicious but allowed).
|
||||
4. Compute per-artifact hashes:
|
||||
- For each engine entry: read `entry.engine_sha256_hex` (already computed by AZ-321; do NOT re-hash).
|
||||
- For descriptor index: call `sidecar.read_sidecar(input.descriptor_index_path)` → expect a 64-char hex digest.
|
||||
- For calibration JSON: `sha256_hex(open(calibration_path, 'rb').read())` — calibration is small (KB).
|
||||
- For tiles: call `tile_metadata_store.query_by_bbox(bbox, zoom_levels, sector_class)` → list of `TileMetadata` with `sha256_hex` field (set by AZ-316). Sort by `(zoom, lat, lon, source)` for determinism. Compute `tiles_coverage_sha256 = sha256(b"\n".join(f"{t.tile_id}:{t.sha256_hex}".encode() for t in sorted_tiles))`.
|
||||
5. Build the canonical Manifest dict (ADR-010 adds `flight.takeoff_origin` + `flight.flight_id` blocks when supplied):
|
||||
```
|
||||
{
|
||||
"schema_version": "1.1",
|
||||
"build": {
|
||||
"bbox": {...},
|
||||
"zoom_levels": [16, 17, 18],
|
||||
"sector_class": "stable_rear",
|
||||
"built_at": "2026-05-10T12:00:00Z",
|
||||
"manifest_hash": "<sha256-hex>"
|
||||
},
|
||||
"flight": {
|
||||
"flight_id": "<uuid>", // null when ManifestBuildInput.flight_id is None
|
||||
"takeoff_origin": { // omitted when ManifestBuildInput.takeoff_origin is None
|
||||
"lat_deg": <float>,
|
||||
"lon_deg": <float>,
|
||||
"alt_m": <float>
|
||||
}
|
||||
},
|
||||
"artifacts": {
|
||||
"engines": [{"path": "engines/dinov2_vpr_sm87_jp62_trt103_fp16.engine", "sha256": "<hex>"}, ...],
|
||||
"descriptor_index": {"path": "descriptors/corpus.index", "sha256": "<hex>"},
|
||||
"calibration": {"path": "calibration/int8_calibration.json", "sha256": "<hex>"},
|
||||
"tiles_coverage": {"sha256": "<hex>", "tile_count": <int>}
|
||||
},
|
||||
"signing_public_key_fingerprint": "<hex>"
|
||||
}
|
||||
```
|
||||
6. Compute `manifest_hash` as `sha256(canonical_json(build_identity_tuple))` where `build_identity_tuple = sorted({model_ids, calibration_sha256, tiles_coverage_sha256, sector_class, bbox, zoom_levels, takeoff_origin_tuple_or_none, flight_id_or_none})`. The takeoff origin is serialised as `(lat_deg, lon_deg, alt_m)` rounded to 9 decimal places (sub-millimetre, deterministic). This is the D-C10-1 idempotence key. Insert into the Manifest dict at `build.manifest_hash` AFTER computation. **Two builds with identical inputs but different `takeoff_origin` produce different `manifest_hash` values; this is the contract that lets `ManifestVerifier` reject a re-planned route at boot (AZ-324, MV-INV-8).**
|
||||
7. Serialize the Manifest dict as canonical JSON: `orjson.dumps(manifest, option=orjson.OPT_SORT_KEYS | orjson.OPT_INDENT_2).decode()`. Append a trailing newline.
|
||||
8. Atomic-write the JSON via `sidecar.write_with_sidecar(cache_root / "Manifest.json", canonical_json_bytes)` — produces `Manifest.json` + `Manifest.json.sha256` (the latter is the Manifest's OWN sha256, used by T4).
|
||||
9. Sign the canonical JSON bytes: `signature_bytes = signer.sign(key, canonical_json_bytes)` (raw Ed25519 signature, 64 bytes).
|
||||
10. Atomic-write the signature: `sidecar.atomic_write(cache_root / "Manifest.json.sig", signature_bytes)` (no .sha256 sidecar for the signature itself — signature integrity is verified by Ed25519 over the Manifest bytes).
|
||||
11. Return `ManifestArtifact(manifest_path, signature_path, manifest_hash, signing_public_key_fingerprint, total_artifacts_listed)`.
|
||||
- INFO log on successful build (`c10.manifest.build.success` with `manifest_hash` + `total_artifacts_listed`); ERROR on `ManifestWriteError`; WARN on dev-mode-with-operator-key.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `ManifestBuilder` class with the single public method.
|
||||
- `ManifestSigner` Protocol + `Ed25519ManifestSigner` default impl.
|
||||
- Canonical JSON serialization (sorted keys, sorted lists where order is content-defining).
|
||||
- Operator-key gate per `signing_mode` config.
|
||||
- Per-artifact hash computation (engines, descriptor index, calibration, tiles aggregate).
|
||||
- Atomic writes via AZ-280 for both `Manifest.json` and `Manifest.json.sig`.
|
||||
- Composition-root factory `build_manifest_builder`.
|
||||
- Conformance test for `ManifestSigner` Protocol.
|
||||
|
||||
### Excluded
|
||||
|
||||
- The orchestration of when to build (T5 owns).
|
||||
- Engine compilation / descriptor generation (AZ-321 / AZ-322).
|
||||
- Manifest verification (T4 owns).
|
||||
- Idempotence "should we skip the build?" decision (T5 owns; this task always rebuilds when called).
|
||||
- ManifestCoverageError (T5 owns; this task lists what it's told, doesn't enumerate cache_root).
|
||||
- Key generation — operator's long-lived key is provisioned out-of-band; this task only loads + uses.
|
||||
- Multi-key signing (M-of-N quorum) — single-key per build.
|
||||
- Compressed Manifest format — JSON for human inspection.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: Happy path produces Manifest + sig + sidecars**
|
||||
Given a valid input with 3 engines, 1 descriptor index, 1 calibration JSON, 100 tiles
|
||||
When `build_manifest(input)` is called
|
||||
Then `Manifest.json`, `Manifest.json.sha256`, `Manifest.json.sig` are all present at `cache_root/`; the Manifest contains 3 engine entries, 1 descriptor_index entry, 1 calibration entry, 1 tiles_coverage entry; `manifest_hash` is a 64-char lowercase hex string; the returned `ManifestArtifact.total_artifacts_listed == 5` (engines + index + calibration + tiles_coverage as one logical artifact + the Manifest itself counts separately if at all)
|
||||
|
||||
**AC-2: Determinism — same input produces byte-identical Manifest**
|
||||
Given the same `ManifestBuildInput` run twice on different days (different `built_at`)
|
||||
When the canonical JSON is compared with `built_at` redacted
|
||||
Then both runs produce byte-identical bytes — proves canonical JSON ordering works; same `manifest_hash`. (This is the foundation for T5's idempotence check.)
|
||||
|
||||
**AC-3: Signature verifies against the public key**
|
||||
Given the signature file + the operator's public key
|
||||
When `cryptography.hazmat.primitives.asymmetric.ed25519.Ed25519PublicKey.verify(signature, manifest_bytes)` is called
|
||||
Then no exception is raised — proves the signing produced a valid Ed25519 signature
|
||||
|
||||
**AC-4: Operator-mode rejects unknown fingerprint**
|
||||
Given `config.signing_mode = "operator"` and `config.allowed_operator_fingerprints = ("known_fp",)` and a key file whose fingerprint is `"unknown_fp"`
|
||||
When `build_manifest` is called
|
||||
Then `ManifestWriteError` is raised with a message naming both fingerprints (the offered one + the allowlist); ZERO files are written; ONE ERROR log
|
||||
|
||||
**AC-5: Operator-mode accepts known fingerprint**
|
||||
Given `config.signing_mode = "operator"` and the key file's fingerprint IS in the allowlist
|
||||
When `build_manifest` is called
|
||||
Then the build succeeds; ZERO WARN logs about dev-mode
|
||||
|
||||
**AC-6: Dev-mode with non-operator key emits no warning**
|
||||
Given `config.signing_mode = "dev"` and a random dev key (not in allowlist)
|
||||
When `build_manifest` is called
|
||||
Then build succeeds; `signing_public_key_fingerprint` is the dev key's; ZERO warnings about operator key in dev mode
|
||||
|
||||
**AC-7: Dev-mode with operator key emits warning**
|
||||
Given `config.signing_mode = "dev"` and a key whose fingerprint IS in `allowed_operator_fingerprints`
|
||||
When `build_manifest` is called
|
||||
Then build succeeds; ONE WARN log `c10.manifest.dev_mode_with_operator_key` with the fingerprint
|
||||
|
||||
**AC-8: Tile coverage hash is sort-order-deterministic**
|
||||
Given the same 100 tiles loaded in two different SQL row orders (e.g., insertion order vs index scan)
|
||||
When `tiles_coverage_sha256` is computed
|
||||
Then both runs produce the same hash — proves the `(zoom, lat, lon, source)` sort is canonical
|
||||
|
||||
**AC-9: ManifestWriteError on key load failure**
|
||||
Given a `key_path` that does not exist OR contains malformed PEM
|
||||
When `signer.load_signing_key(key_path)` raises
|
||||
Then `ManifestWriteError("operator signing key load failed: <reason>")` is raised; ZERO files are written; the original `cryptography` exception is chained as `__cause__` for diagnosis
|
||||
|
||||
**AC-10: Atomic write — partial Manifest impossible**
|
||||
Given the Manifest is being written and the process is killed mid-write
|
||||
When restarted
|
||||
Then either the previous-good Manifest OR the new Manifest is at the path; never a half-written JSON. (AZ-280's atomic-write contract.)
|
||||
|
||||
**AC-11: Manifest's own sidecar is consistent**
|
||||
Given a freshly-written `Manifest.json`
|
||||
When `sha256_hex(open("Manifest.json", "rb").read())` is computed and compared to `Manifest.json.sha256`
|
||||
Then the values match — T4's verifier walks all sidecars and this is the entry point
|
||||
|
||||
**AC-12: `total_artifacts_listed` equals dict-counted artifacts**
|
||||
Given an input with N engines + 1 index + 1 calibration + tiles_coverage
|
||||
When `ManifestArtifact.total_artifacts_listed` is inspected
|
||||
Then it equals `N + 3` (engines + index + calibration + tiles_coverage); does NOT count the Manifest itself or the signature
|
||||
|
||||
**AC-13: `takeoff_origin` baked into Manifest body when supplied (ADR-010 / AZ-489)**
|
||||
Given a `ManifestBuildInput` with `takeoff_origin = LatLonAlt(50.0, 36.2, 200.0)` and `flight_id = some_uuid`
|
||||
When `build_manifest` is called
|
||||
Then the Manifest body contains a `flight` block with `flight_id` and `takeoff_origin` (`lat_deg=50.0`, `lon_deg=36.2`, `alt_m=200.0`); ZERO `built_at`-style timestamp inside `takeoff_origin`
|
||||
|
||||
**AC-14: `takeoff_origin` absent from Manifest body when not supplied**
|
||||
Given a `ManifestBuildInput` with `takeoff_origin = None` and `flight_id = None`
|
||||
When `build_manifest` is called
|
||||
Then the Manifest body has the `flight` block with `flight_id: null` and NO `takeoff_origin` key (use absence, not `null`, so AZ-324 can detect "field never set" vs "field invalid")
|
||||
|
||||
**AC-15: `manifest_hash` changes when only `takeoff_origin` differs**
|
||||
Given two `ManifestBuildInput`s identical except `takeoff_origin = A` vs `takeoff_origin = B` (B != A by ≥ 1 mm)
|
||||
When `build_manifest` is called twice
|
||||
Then the two `manifest_hash` values differ — D-C10-1 idempotence treats re-planned route as a new build
|
||||
|
||||
**AC-16: `manifest_hash` stable when only `flight_id` differs but `takeoff_origin` is the same**
|
||||
Given two `ManifestBuildInput`s identical except `flight_id`
|
||||
When `build_manifest` is called twice
|
||||
Then the two `manifest_hash` values **differ** — `flight_id` is provenance and is part of the build identity (operator may re-plan with the same takeoff position but a different mission; the cache identity must track that)
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- Build wall-clock ≤ 5 s for a 100k-tile corpus on Tier-1 dev workstation: sorting 100k tile hashes + computing one SHA-256 over the concatenated string is ~50 MB of input → ~100 ms; serializing JSON with 100k tile_count is fast (single integer); engine + index + calibration hashes are already computed upstream. Total ≤ 5 s leaves headroom.
|
||||
- Operator-mode fingerprint check is a single string comparison.
|
||||
|
||||
**Compatibility**
|
||||
- Uses `orjson` (already pinned via AZ-272 for FDR), `cryptography` (already pinned via AZ-318 for per-flight keys), `hashlib` (stdlib).
|
||||
- No new third-party dependencies.
|
||||
|
||||
**Reliability**
|
||||
- Operator-key gate is fail-closed: unknown fingerprint → no Manifest written.
|
||||
- Atomic writes prevent half-written Manifests on process kill.
|
||||
- Canonical JSON ensures bit-identical Manifests for identical inputs (foundation for D-C10-1 idempotence in T5).
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|-------------|-----------------|
|
||||
| AC-1 | Build with 3 engines + index + calibration + 100 tiles | All files present; counts match |
|
||||
| AC-2 | Build twice, redact built_at, compare bytes | Identical |
|
||||
| AC-3 | Verify signature with public key | No raise |
|
||||
| AC-4 | Operator mode + unknown fingerprint | ManifestWriteError; no files |
|
||||
| AC-5 | Operator mode + known fingerprint | Success; no warnings |
|
||||
| AC-6 | Dev mode + dev key | Success; no warnings |
|
||||
| AC-7 | Dev mode + operator-allowlisted key | Success; ONE warning |
|
||||
| AC-8 | Tile rows in different orders | Same `tiles_coverage_sha256` |
|
||||
| AC-9 | Missing or malformed key file | ManifestWriteError; chained cause |
|
||||
| AC-10 | Kill mid-write | No half-Manifest |
|
||||
| AC-11 | Verify Manifest's own sidecar | Hashes match |
|
||||
| AC-12 | Inspect total_artifacts_listed | Counts engines+index+calibration+tiles_coverage |
|
||||
| AC-13 | Build with takeoff_origin set | `flight.takeoff_origin` present in JSON; lat/lon/alt match |
|
||||
| AC-14 | Build with takeoff_origin=None | `flight.takeoff_origin` key absent from JSON |
|
||||
| AC-15 | Two builds, takeoff_origin differs | manifest_hash differs |
|
||||
| AC-16 | Two builds, only flight_id differs | manifest_hash differs |
|
||||
| NFR-perf | 100k-tile bench | ≤ 5 s wall clock |
|
||||
| NFR-reliability-fail-closed | Operator mode + unknown fp | Fail-closed; nothing written |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Canonical JSON via `orjson` with `OPT_SORT_KEYS`; this task does NOT use a different JSON library.
|
||||
- Atomic writes via AZ-280 for BOTH `Manifest.json` and `Manifest.json.sig`; no naked `Path.write_bytes()`.
|
||||
- `manifest_hash` excludes `built_at` (it's a build-identity hash, not a Manifest-bytes hash).
|
||||
- The Manifest's own SHA-256 sidecar (Manifest.json.sha256) IS the Manifest-bytes hash and is used by T4 at takeoff.
|
||||
- Tile coverage hashing is via aggregate `tiles_coverage_sha256`, NOT per-tile entries in the Manifest (keeps Manifest bounded).
|
||||
- Signature is detached (separate `.sig` file); embedded signatures are NOT permitted (would require parsing before verifying).
|
||||
- Ed25519 only; this task does NOT add other algorithms.
|
||||
- Operator-key fingerprint allowlist is config-driven; no hardcoded keys.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: `built_at` makes Manifests non-deterministic for the same input**
|
||||
- *Risk*: Idempotence check in T5 compares `manifest_hash` only, but if T5 reads the Manifest bytes directly elsewhere it could see different bytes for "same" build.
|
||||
- *Mitigation*: AC-2 explicitly excludes `built_at` from the `manifest_hash` computation. T5 compares hashes, not bytes. Documented in the Manifest schema.
|
||||
|
||||
**Risk 2: tiles_coverage as aggregate hides which tile changed**
|
||||
- *Risk*: When verify fails at takeoff (T4), the operator only learns "tiles_coverage hash mismatch", not WHICH tile drifted.
|
||||
- *Mitigation*: T4's failure path can re-walk per-tile hashes against C6 to identify the offender. The Manifest stays small; debugging detail is computed on-demand. Documented in T4's scope.
|
||||
|
||||
**Risk 3: `cryptography` API breaks between minor versions**
|
||||
- *Risk*: Ed25519 API changes (unlikely but `cryptography` does ship breaking changes occasionally).
|
||||
- *Mitigation*: Pin to the same version used by AZ-318. The `Ed25519ManifestSigner` is the only place using the API; a one-place adapter swap on upgrade.
|
||||
|
||||
**Risk 4: Operator key file format ambiguity**
|
||||
- *Risk*: Operators might supply a key in PKCS8, OpenSSH, or raw 32-byte format.
|
||||
- *Mitigation*: `Ed25519ManifestSigner.load_signing_key` accepts PEM-encoded PKCS8 only (matches AZ-318's convention); other formats raise `ManifestWriteError` with explicit format hint.
|
||||
|
||||
**Risk 5: Dev key accidentally signs an operator-mode build**
|
||||
- *Risk*: Operator runs build with `signing_mode = "operator"` but supplies a dev key by mistake.
|
||||
- *Mitigation*: AC-4 covers; the gate is fail-closed and logs the offending fingerprint so the operator can correct.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: signed Manifest production with content-hash table covering every shipped artifact, D-C10-1 idempotence key (`manifest_hash`), C10-ST-01 operator-mode gate (epic § Acceptance C10-IT-01, C10-IT-02, C10-ST-01).
|
||||
- **Production code that must exist**: real `ManifestBuilder` orchestrating real `Ed25519ManifestSigner` (cryptography library) + real AZ-280 atomic writes + real C6 `query_by_bbox` to gather tile hashes; real config-driven fingerprint allowlist.
|
||||
- **Allowed external stubs**: tests MAY use a fake `ManifestSigner` with a known keypair generated in-test + a fake `tile_metadata_store` (AZ-303 conformance fakes); production wiring uses `cryptography.hazmat`.
|
||||
- **Unacceptable substitutes**: HMAC instead of Ed25519 (different trust model — symmetric vs asymmetric); embedding the signature in the JSON (defeats the parse-before-verify problem at takeoff); Python-only `pickle` of the Manifest (not human-inspectable, not canonical-byte stable); skipping the operator-fingerprint allowlist when `signing_mode = "operator"` (defeats C10-ST-01); using `json.dumps` without `OPT_SORT_KEYS` (breaks AC-2 determinism and breaks T5's idempotence).
|
||||
@@ -0,0 +1,273 @@
|
||||
# C10 ManifestVerifier — Takeoff Content-Hash Gate + Trusted-Key Pinning
|
||||
|
||||
**Task**: AZ-324_c10_manifest_verifier
|
||||
**Name**: C10 ManifestVerifier
|
||||
**Description**: Implement `ManifestVerifier` (per the contract `_docs/02_document/contracts/c10_provisioning/manifest_verifier.md` v1.1.0), the read-only validator that AC-NEW-1 places between F2 takeoff and any engine deserialization. Loads `Manifest.json`, verifies its sidecar SHA-256 matches the Manifest bytes, parses the Ed25519 detached signature at `Manifest.json.sig`, verifies it against the caller-supplied `trusted_public_keys` tuple, parses the Manifest schema (rejecting absolute paths and schema violations), validates the optional `flight.takeoff_origin` block (well-formed `LatLonAlt` + inside `build.bbox` per ADR-010 + AZ-490), and walks every per-artifact entry re-hashing it via AZ-280's sidecar pattern. Returns a `VerificationResult` with `outcome ∈ {PASS, FAIL}`, the union of all `VerifyFailReason` values that fired, the populated `per_artifact_checks` list, the pass-through `takeoff_origin` + `flight_id` (or `None` when absent from the Manifest body), and `elapsed_ms`. Fail-closed: any deviation in signature, schema, key trust, hashes, or origin validity yields `FAIL` with detailed reasons. Never raises on a verify failure — only on environment errors (Manifest.json missing → `MANIFEST_NOT_FOUND` is still `FAIL`, not raise).
|
||||
**Complexity**: 3 points
|
||||
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema
|
||||
**Component**: c10_provisioning (epic AZ-252 / E-C10)
|
||||
**Tracker**: AZ-324
|
||||
**Epic**: AZ-252 (E-C10)
|
||||
|
||||
### Document Dependencies
|
||||
|
||||
- `_docs/02_document/contracts/c10_provisioning/manifest_verifier.md` — produced by this task (frozen Protocol + DTO shape, invariants, test cases).
|
||||
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` — sidecar verify pattern (AZ-280).
|
||||
- `_docs/02_document/components/11_c10_provisioning/description.md` — § 5 `ContentHashMismatchError` handling, § 7 D-C10-3 sidecar coverage.
|
||||
|
||||
## Problem
|
||||
|
||||
Without a real verifier:
|
||||
|
||||
- AC-NEW-1 ("no engine deserialization at takeoff before manifest verify") collapses — F2 has nothing to gate on.
|
||||
- D-C10-3 (SHA-256 content-hash gate over every shipped artifact) is unobservable at takeoff.
|
||||
- C10-IT-02 (rejects tampered or wrong-key Manifests) cannot be implemented.
|
||||
- A built but unverified Manifest is no better than no Manifest — operators cannot trust it without an actual check.
|
||||
- Without a contract, C5 takeoff arming and C12 operator tooling cannot couple to C10 — every consumer would re-implement an ad-hoc check.
|
||||
- The "fail-closed" property is a hard requirement; partial verifies that report PASS on first match would compromise the entire trust chain.
|
||||
|
||||
This task delivers the verifier + its frozen contract. It does NOT compile engines (AZ-321), build the Manifest (AZ-323), or own the takeoff-arming policy (E-C5).
|
||||
|
||||
## Outcome
|
||||
|
||||
- A `ManifestVerifier` class implementation at `src/gps_denied_onboard/components/c10_provisioning/manifest_verifier.py` matching the Protocol in the contract.
|
||||
- Constructor: `__init__(self, *, sidecar: Sha256Sidecar, logger: Logger, clock: Clock, tile_metadata_store: TileMetadataStore | None = None)`.
|
||||
- When `tile_metadata_store is None`, the verifier operates in airborne mode: trusts the recorded `tiles_coverage_sha256` after the signature passes (per MV-INV-5).
|
||||
- When `tile_metadata_store is not None`, the verifier operates in operator mode: re-derives `tiles_coverage_sha256` from C6 and reports `TILES_COVERAGE_MISMATCH` on drift.
|
||||
- The frozen contract at `_docs/02_document/contracts/c10_provisioning/manifest_verifier.md` (already written; this task brings the implementation up to it).
|
||||
- Method `verify_manifest(manifest_path, trusted_public_keys) -> VerificationResult` flow:
|
||||
1. Start `time.monotonic()` for `elapsed_ms`.
|
||||
2. Initialize empty `fail_reasons: list[VerifyFailReason]`, `fail_details: list[str]`, `per_artifact_checks: list[ArtifactCheck]`.
|
||||
3. **Step A — Manifest exists & sidecar matches**:
|
||||
- If `manifest_path` does not exist: append `MANIFEST_NOT_FOUND`; return `FAIL` (no further work; per MV-INV-1).
|
||||
- Read `Manifest.json` bytes.
|
||||
- If `manifest_path.with_suffix(".json.sha256")` does not exist: append `SCHEMA_VIOLATION` ("missing manifest sidecar"); return `FAIL`.
|
||||
- If `sha256(manifest_bytes) != sidecar_value`: append `MANIFEST_SELF_HASH_MISMATCH`; return `FAIL` (do NOT consult signature per MV-INV-3).
|
||||
4. **Step B — Signature verifies against a trusted key**:
|
||||
- If `signature_path = manifest_path.with_suffix(".json.sig")` does not exist: append `SIGNATURE_NOT_FOUND`; `signing_public_key_fingerprint = None`; return `FAIL`.
|
||||
- Parse Ed25519 signature bytes (must be exactly 64 bytes; otherwise `SIGNATURE_INVALID`).
|
||||
- Try each public key in `trusted_public_keys`:
|
||||
- Compute `fingerprint = sha256(pub.public_bytes_raw()).hex()`.
|
||||
- Try `pub.verify(signature_bytes, manifest_bytes)`.
|
||||
- On success: signature is valid; `signing_public_key_fingerprint = fingerprint`; break.
|
||||
- If no trusted key verified:
|
||||
- If at least one key raised `InvalidSignature` (signature doesn't match this key's bytes): the signature could still match an untrusted key. Try parsing the Manifest's `signing_public_key_fingerprint` field (if schema parses) and report whichever is more diagnostic — `UNTRUSTED_PUBLIC_KEY` if the Manifest names a known-but-untrusted key, `SIGNATURE_INVALID` otherwise.
|
||||
- Append the reason; return `FAIL` (do NOT proceed to per-artifact hashing per MV-INV-2).
|
||||
- If `trusted_public_keys` is empty: append `UNTRUSTED_PUBLIC_KEY`; return `FAIL`.
|
||||
5. **Step C — Schema parse**:
|
||||
- `orjson.loads(manifest_bytes)` → dict.
|
||||
- Validate required keys: `schema_version`, `build` (with sub-keys `bbox`, `zoom_levels`, `sector_class`, `built_at`, `manifest_hash`), `artifacts` (with `engines`, `descriptor_index`, `calibration`, `tiles_coverage`), `signing_public_key_fingerprint`. `flight` block is OPTIONAL (added in schema v1.1, ADR-010).
|
||||
- Validate types: `engines` is list of `{path: str, sha256: str}`; `descriptor_index`, `calibration` are `{path: str, sha256: str}`; `tiles_coverage` is `{sha256: str, tile_count: int}`.
|
||||
- Validate path-relative-only: every `path` value must be relative (no leading `/`, no `..` segments). Append `SCHEMA_VIOLATION` per offending field; if any, return `FAIL`.
|
||||
- **Flight block (ADR-010 / AZ-490)**:
|
||||
- If `flight` key absent → `takeoff_origin = None`, `flight_id = None`; continue.
|
||||
- If `flight` present → parse `flight_id` (`UUID` or `None`) and `takeoff_origin` (optional block).
|
||||
- If `flight.takeoff_origin` present → validate `lat_deg ∈ [-90, 90]`, `lon_deg ∈ [-180, 180]`, `alt_m` finite (no NaN/Inf). Append `TAKEOFF_ORIGIN_INVALID` to `fail_reasons` and the offending field name to `fail_details` if any check fails.
|
||||
- If `flight.takeoff_origin` is well-formed → check it falls inside `build.bbox` (`bbox.lat_min ≤ lat ≤ bbox.lat_max`, `bbox.lon_min ≤ lon ≤ bbox.lon_max`). Append `TAKEOFF_ORIGIN_OUT_OF_BBOX` if not.
|
||||
- The `takeoff_origin` is populated on `VerificationResult` whenever the block parsed (even on FAIL), per MV-INV-9, so operators see what was attempted.
|
||||
6. **Step D — Per-artifact hash walk** (only reached if Steps A–C all passed):
|
||||
- For each engine, descriptor_index, calibration entry:
|
||||
- Compute `actual_path = manifest_path.parent / entry.path`.
|
||||
- If file missing: append `ArtifactCheck(entry.path, entry.sha256, None, matched=False)`; append `ARTIFACT_MISSING` to `fail_reasons` once if not already there.
|
||||
- Else: stream-read the file, compute SHA-256 (use AZ-280's helper that takes a path).
|
||||
- If hash matches: `matched=True`.
|
||||
- Else: `matched=False`; append `ARTIFACT_HASH_MISMATCH` once.
|
||||
- For tiles_coverage:
|
||||
- If `tile_metadata_store is None` (airborne mode): trust the recorded `tiles_coverage.sha256` since the Manifest signature already binds it. Append `ArtifactCheck("tiles_coverage", recorded_sha256, recorded_sha256, matched=True)` for completeness.
|
||||
- Else (operator mode): re-derive `tiles_coverage_sha256` by `tile_metadata_store.query_by_bbox(...)` over the `build.bbox` + `zoom_levels` + `sector_class`, sort by `(zoom, lat, lon, source)`, hash. If mismatch → `TILES_COVERAGE_MISMATCH`.
|
||||
- Walk ALL entries even on first failure (per MV-TC-9).
|
||||
7. Set `outcome = PASS` iff `fail_reasons` is empty; else `FAIL`.
|
||||
8. Set `elapsed_ms = int((time.monotonic() - start) * 1000)`.
|
||||
9. Return `VerificationResult(...)`.
|
||||
- INFO log on PASS (`c10.manifest.verify.pass` with elapsed_ms + fingerprint); WARN on FAIL with `fail_reasons` + counts of mismatched artifacts.
|
||||
- Composition root factory `build_manifest_verifier(config, *, with_tile_store: bool) -> ManifestVerifier` — `with_tile_store=True` for operator mode, `False` for airborne C5.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `ManifestVerifier` class implementing the Protocol from the contract.
|
||||
- The contract document (frozen at v1.0.0).
|
||||
- Schema validation against the v1.0 shape produced by AZ-323.
|
||||
- Signature verification against a tuple of trusted public keys.
|
||||
- Per-artifact stream-hash walk with multiple-failure accumulation.
|
||||
- Airborne vs operator mode for tiles_coverage handling.
|
||||
- Composition-root factory.
|
||||
- Conformance test for the contract Protocol.
|
||||
|
||||
### Excluded
|
||||
|
||||
- Manifest building / signing (AZ-323 owns).
|
||||
- Trusted-key distribution / loading from disk — caller passes `Ed25519PublicKey` instances.
|
||||
- Cache repair on FAIL — caller (E-C5 takeoff arming, E-C12 operator) decides next action.
|
||||
- Coverage check for orphan files in `cache_root` (AZ-325 owns `ManifestCoverageError`).
|
||||
- Logging Manifest contents (Manifests are not secret but verbose; only fingerprints + counts are logged).
|
||||
- C13 FDR emission — caller's responsibility (per MV-INV-6).
|
||||
- Non-Ed25519 signatures.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: PASS on a valid Manifest with all artifacts present and matching**
|
||||
Given a freshly-built Manifest + sig + sidecar from AZ-323 and `trusted_public_keys = (signing_pub,)`
|
||||
When `verify_manifest(manifest_path, trusted_public_keys)` is called
|
||||
Then `outcome=PASS`, `fail_reasons` is empty, `per_artifact_checks` has every entry `matched=True`, `signing_public_key_fingerprint` is the signing key's fingerprint, `elapsed_ms > 0`
|
||||
|
||||
**AC-2: FAIL on missing Manifest with no further work**
|
||||
Given `manifest_path` does not exist
|
||||
When verify runs
|
||||
Then `outcome=FAIL`, `fail_reasons=(MANIFEST_NOT_FOUND,)`, `per_artifact_checks` is empty (no work performed), `signing_public_key_fingerprint=None`
|
||||
|
||||
**AC-3: FAIL on missing signature with diagnostic**
|
||||
Given Manifest.json exists + sidecar matches but Manifest.json.sig is absent
|
||||
When verify runs
|
||||
Then `fail_reasons=(SIGNATURE_NOT_FOUND,)`, `per_artifact_checks` is empty, no per-artifact disk reads happen (defence-in-depth)
|
||||
|
||||
**AC-4: FAIL on tampered Manifest body**
|
||||
Given Manifest.json is mutated by 1 byte after signing
|
||||
When verify runs
|
||||
Then either `MANIFEST_SELF_HASH_MISMATCH` (sidecar caught it first) OR `SIGNATURE_INVALID` (if sidecar was also re-computed by attacker); per-artifact walk does NOT happen
|
||||
|
||||
**AC-5: FAIL on untrusted public key**
|
||||
Given the Manifest is signed with a key NOT in `trusted_public_keys`
|
||||
When verify runs
|
||||
Then `fail_reasons=(UNTRUSTED_PUBLIC_KEY,)`, `signing_public_key_fingerprint` is populated (so operators see WHICH untrusted key signed it), per-artifact walk does NOT happen
|
||||
|
||||
**AC-6: FAIL on schema violation lists offending field**
|
||||
Given a Manifest missing the `signing_public_key_fingerprint` key
|
||||
When verify runs
|
||||
Then `fail_reasons=(SCHEMA_VIOLATION,)`, `fail_details` contains a string naming `signing_public_key_fingerprint`
|
||||
|
||||
**AC-7: FAIL on absolute path in artifact entry**
|
||||
Given an engine entry has `path: "/etc/passwd"`
|
||||
When verify runs
|
||||
Then `fail_reasons=(SCHEMA_VIOLATION,)`, `fail_details` names the offending field; per-artifact walk does NOT consult `/etc/passwd`
|
||||
|
||||
**AC-8: FAIL with multiple reasons accumulated**
|
||||
Given one engine is missing on disk AND one engine's bytes drifted AND a third engine matches
|
||||
When verify runs
|
||||
Then `fail_reasons` contains BOTH `ARTIFACT_MISSING` and `ARTIFACT_HASH_MISMATCH` (in deterministic order: traversal order); `per_artifact_checks` has all 3 entries with correct `matched` values; the third entry has `matched=True`
|
||||
|
||||
**AC-9: Operator mode re-derives tiles_coverage**
|
||||
Given `tile_metadata_store` is supplied AND C6's tiles for the build's bbox/zoom now have a different aggregate hash (e.g., a tile was re-downloaded)
|
||||
When verify runs
|
||||
Then `fail_reasons=(TILES_COVERAGE_MISMATCH,)`; the recorded vs computed hashes are in `fail_details`
|
||||
|
||||
**AC-10: Airborne mode trusts tiles_coverage post-signature**
|
||||
Given `tile_metadata_store=None`
|
||||
When verify runs
|
||||
Then `tiles_coverage` `ArtifactCheck` shows `matched=True` (recorded == "actual" because we don't re-derive); the airborne F2 path is fast (≤ 100 ms per NFR)
|
||||
|
||||
**AC-11: Conformance — `isinstance` returns True**
|
||||
Given the implementation
|
||||
When `isinstance(impl, ManifestVerifier)` is checked under runtime_checkable
|
||||
Then `True`
|
||||
|
||||
**AC-12: `elapsed_ms` recorded on every outcome**
|
||||
Given any of the above ACs
|
||||
When inspecting the result
|
||||
Then `elapsed_ms >= 0` and is reasonable (smaller for early-exit failures, larger for full per-artifact walks)
|
||||
|
||||
**AC-13: Empty `trusted_public_keys` always fails closed**
|
||||
Given `trusted_public_keys = ()`
|
||||
When verify runs
|
||||
Then `fail_reasons=(UNTRUSTED_PUBLIC_KEY,)` regardless of Manifest validity; per-artifact walk does NOT happen
|
||||
|
||||
**AC-14: Manifest with no `flight` block parses cleanly (back-compat)**
|
||||
Given a v1.0 Manifest (no `flight` block) that is otherwise valid + signed
|
||||
When verify runs
|
||||
Then `outcome=PASS`; `VerificationResult.takeoff_origin is None`; `VerificationResult.flight_id is None`
|
||||
|
||||
**AC-15: Well-formed in-bbox `takeoff_origin` passes through**
|
||||
Given a v1.1 Manifest with `flight.takeoff_origin = (50.0, 36.2, 200.0)` inside the recorded bbox
|
||||
When verify runs
|
||||
Then `outcome=PASS`; `VerificationResult.takeoff_origin == LatLonAlt(50.0, 36.2, 200.0)`
|
||||
|
||||
**AC-16: Malformed `takeoff_origin` (lat=200) fails closed**
|
||||
Given a Manifest with `flight.takeoff_origin.lat_deg = 200`
|
||||
When verify runs
|
||||
Then `outcome=FAIL`; `fail_reasons` contains `TAKEOFF_ORIGIN_INVALID`; `fail_details` names `lat_deg`; the `takeoff_origin` field on `VerificationResult` is still populated for diagnostics
|
||||
|
||||
**AC-17: Out-of-bbox `takeoff_origin` fails closed**
|
||||
Given a Manifest whose `flight.takeoff_origin = (10.0, 10.0, 0)` while `build.bbox` covers `(49.5..50.5, 35.5..36.5)`
|
||||
When verify runs
|
||||
Then `outcome=FAIL`; `fail_reasons` contains `TAKEOFF_ORIGIN_OUT_OF_BBOX`
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- Airborne F2 verify (no per-tile re-derivation, ~5 artifact entries): wall-clock ≤ 100 ms on Jetson Orin (signature verify + 5 stream-SHA-256s of bounded files).
|
||||
- Operator-mode verify with 100k tiles re-derivation: ≤ 5 s (matches AZ-323's NFR).
|
||||
- Stream-hash files via 64 KB chunks; do NOT load engine binaries (~200 MB) entirely into memory.
|
||||
|
||||
**Compatibility**
|
||||
- `cryptography` (already pinned via AZ-318), `orjson` (already pinned), `hashlib` (stdlib).
|
||||
- No new third-party dependencies.
|
||||
|
||||
**Reliability**
|
||||
- Fail-closed: empty trusted keys → FAIL; missing files → FAIL; any drift → FAIL.
|
||||
- No partial PASS; the `outcome=PASS` branch is taken only when `fail_reasons` is empty.
|
||||
- Defensive against directory traversal: relative paths only (AC-7).
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|-------------|-----------------|
|
||||
| AC-1 | Built Manifest from AZ-323 fixture | PASS; all matched |
|
||||
| AC-2 | Missing Manifest.json | FAIL; MANIFEST_NOT_FOUND only |
|
||||
| AC-3 | Missing signature | FAIL; SIGNATURE_NOT_FOUND; no disk reads |
|
||||
| AC-4 | Mutated Manifest body | FAIL; either MANIFEST_SELF_HASH_MISMATCH or SIGNATURE_INVALID |
|
||||
| AC-5 | Wrong-key signing | FAIL; UNTRUSTED_PUBLIC_KEY; fingerprint populated |
|
||||
| AC-6 | Missing required field | FAIL; SCHEMA_VIOLATION + field name |
|
||||
| AC-7 | Absolute path in artifact | FAIL; SCHEMA_VIOLATION; no path traversal |
|
||||
| AC-8 | 1 missing + 1 drifted + 1 OK | Two failure reasons; per_artifact_checks complete |
|
||||
| AC-9 | Operator mode + drifted tile | TILES_COVERAGE_MISMATCH |
|
||||
| AC-10 | Airborne mode | tiles_coverage matched=True |
|
||||
| AC-11 | Conformance check | True |
|
||||
| AC-14 | v1.0 Manifest (no flight block) | PASS; takeoff_origin=None; flight_id=None |
|
||||
| AC-15 | v1.1 Manifest, valid in-bbox origin | PASS; takeoff_origin populated |
|
||||
| AC-16 | Malformed origin (lat=200) | FAIL; TAKEOFF_ORIGIN_INVALID; field name in details |
|
||||
| AC-17 | Out-of-bbox origin | FAIL; TAKEOFF_ORIGIN_OUT_OF_BBOX |
|
||||
| AC-12 | Inspect elapsed_ms | All non-negative; ordered as expected |
|
||||
| AC-13 | Empty trusted keys | FAIL; UNTRUSTED |
|
||||
| NFR-perf-airborne | 5 artifact bench, no tile re-walk | p99 ≤ 100 ms |
|
||||
| NFR-perf-operator | 100k-tile re-walk | ≤ 5 s |
|
||||
| NFR-reliability-stream-hash | 200 MB engine + memory profile | Peak < 10 MB extra |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Stream SHA-256 over files via `hashlib.sha256().update(chunk)` in 64 KB blocks; do NOT `Path.read_bytes()` on engines (memory blowup per NFR).
|
||||
- Path interpretation is relative-only; absolute paths are SCHEMA_VIOLATION (AC-7).
|
||||
- The verifier is read-only (per MV-INV-6); no disk writes, no network, no FDR.
|
||||
- `fail_reasons` is a tuple (immutable, ordered, deterministic).
|
||||
- Signature checks happen before per-artifact walks (per MV-INV-2).
|
||||
- Manifest sidecar check happens before signature (per MV-INV-3).
|
||||
- Multiple failures accumulate; do not short-circuit on first per-artifact failure (per MV-TC-9 / AC-8).
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Trusted-key list accidentally empty in production wiring**
|
||||
- *Risk*: Composition root mis-configures; airborne C5 ends up with an empty key list and arming silently fails forever.
|
||||
- *Mitigation*: AC-13 + ERROR log on `UNTRUSTED_PUBLIC_KEY` with key-list-length=0 makes the misconfiguration loud at first arm attempt.
|
||||
|
||||
**Risk 2: Per-artifact walk dominates airborne arm latency**
|
||||
- *Risk*: 5 engines × 200 MB stream-hash on slow microSD → 30 s arm latency.
|
||||
- *Mitigation*: NFR-perf-airborne benchmark documents the envelope; if the Jetson microSD I/O is the bottleneck, a follow-up task adds an "incremental verify" path that trusts unchanged artifacts since last reboot. Out of scope this cycle.
|
||||
|
||||
**Risk 3: Tampered sidecar matches tampered body (attacker drops both sidecar + body)**
|
||||
- *Risk*: AC-4's first failure case (sidecar mismatch) is bypassed by an attacker who recomputes the sidecar.
|
||||
- *Mitigation*: Signature check (Step B) catches this — the signature is over the Manifest body; recomputing the sidecar does NOT also recompute the signature. The Ed25519 secret key is operator-only.
|
||||
|
||||
**Risk 4: Path traversal via relative `..` segments**
|
||||
- *Risk*: A relative path like `../../etc/passwd` passes the "no leading /" check but escapes cache_root.
|
||||
- *Mitigation*: AC-7 + `..` segment rejection covers it; explicit check `if ".." in Path(entry.path).parts: SCHEMA_VIOLATION`.
|
||||
|
||||
**Risk 5: Operator-mode tile re-walk on Jetson is too slow**
|
||||
- *Risk*: An airborne-mode verifier mistakenly gets a `tile_metadata_store` (composition root mistake) and re-walks 100k tiles, blowing the arm latency budget.
|
||||
- *Mitigation*: The composition root factory `build_manifest_verifier(config, *, with_tile_store: bool)` is the explicit toggle; airborne wiring passes `with_tile_store=False`. AC-10 tests airborne mode latency.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: takeoff content-hash gate per AC-NEW-1 + D-C10-3 + C10-IT-02 (epic § Acceptance C10-IT-01..02; description.md § 5 `ContentHashMismatchError`).
|
||||
- **Production code that must exist**: real `ManifestVerifier` orchestrating real `cryptography` Ed25519 verify + real `hashlib` stream-SHA-256 + real `orjson` schema parse; real `tile_metadata_store` re-derivation in operator mode.
|
||||
- **Allowed external stubs**: tests MAY use a fake key generated in-test, fake Manifest fixtures from AZ-323's test fixtures; production wiring uses real keys from operator key store.
|
||||
- **Unacceptable substitutes**: skipping Step A's sidecar check (loses bit-rot detection); skipping Step B before walking artifacts (defeats MV-INV-2 defence-in-depth); short-circuiting on first per-artifact failure (operators need full diagnostic per MV-TC-9); HMAC instead of Ed25519 (different trust model); accepting absolute paths in entries (path traversal vulnerability per AC-7); raising on missing files instead of `outcome=FAIL` (breaks the contract's read-only / never-raise-on-verify-failure invariant).
|
||||
@@ -0,0 +1,127 @@
|
||||
# Hygiene — Align module-layout.md cross-component import rules with AZ-270 lint
|
||||
|
||||
**Task**: AZ-507_hygiene_module_layout_az270_alignment
|
||||
**Name**: Module-layout ↔ AZ-270 lint alignment
|
||||
**Description**: Resolve the architecture contradiction surfaced by cumulative review batches 31–33 (Finding F1, Medium / Architecture): `_docs/02_document/module-layout.md` allows c10 → c7 imports via `(Public API)` while the AZ-270 lint (`test_az270_compose_root.py::test_ac6_only_compose_root_imports_concrete_strategies`) forbids ANY cross-component import. Apply option (a) from the review: tighten the doc to match the lint. Rewrite every `module-layout.md` *"Imports from"* line that names `components.X (Public API)` to instead point at `_types`. Hoist `EngineBuildError` + `CalibrationCacheError` to `_types/inference_errors.py` so consumers catch a typed envelope instead of widening to `except Exception`.
|
||||
**Complexity**: 2 points
|
||||
**Dependencies**: AZ-263_initial_structure, AZ-270 (Composition Root + AZ-270 lint)
|
||||
**Component**: cross-cutting (architecture compliance — Configuration & Composition Root epic E-CC-CONF)
|
||||
**Tracker**: AZ-507
|
||||
**Epic**: AZ-246 (E-CC-CONF)
|
||||
|
||||
### Document Dependencies
|
||||
|
||||
- `_docs/02_document/module-layout.md` — the document being edited. Every line tagged `(Public API)` for a cross-component import must be replaced or removed.
|
||||
- `_docs/03_implementation/cumulative_review_batches_31-33_cycle1_report.md` § F1 — the finding being closed.
|
||||
- `_docs/02_document/architecture.md` — confirm the `_types`-anchored DTO pattern is the canonical cross-component contract surface.
|
||||
|
||||
## Problem
|
||||
|
||||
After AZ-321 (batch 33), `_docs/02_document/module-layout.md` and `tests/unit/test_az270_compose_root.py::test_ac6` describe contradictory rules:
|
||||
|
||||
- The doc tells implementers c10/c11/c12/c2/etc. MAY import from `components.X (Public API)`.
|
||||
- The lint walks every `.py` under `src/gps_denied_onboard/components/` and FAILS the build on any `from gps_denied_onboard.components.X import …` where X is not the importer's own component — regardless of `Public API` status.
|
||||
|
||||
The lint is strictly stricter than the doc. The first task that took the doc at its word (AZ-321 `engine_compiler.py`) failed the lint immediately and had to pivot to a consumer-side `CompileEngineCallable` Protocol cut + broad `except Exception`. The pivot was correct, but the next c10/c11/c12 task to need C7 Public API symbols will hit the same wall. The contradiction must be resolved at the source.
|
||||
|
||||
## Outcome
|
||||
|
||||
- `_docs/02_document/module-layout.md` no longer references `components.X (Public API)` in any "Imports from" line. Every such reference is replaced by `_types` (DTOs already live there) or removed if the import is dead.
|
||||
- A new shim module `src/gps_denied_onboard/_types/inference_errors.py` re-exports `EngineBuildError` and `CalibrationCacheError` from `c7_inference` internals (the canonical definitions stay in c7_inference; the shim is a Layer-0 typed envelope).
|
||||
- `c10_provisioning/engine_compiler.py::_compile_one` narrows its `except Exception` catch to the union of `EngineBuildError | CalibrationCacheError` imported from `_types/inference_errors.py`. All other catch-all behavior is preserved (unknown exceptions still bubble up).
|
||||
- The AZ-270 lint test (`tests/unit/test_az270_compose_root.py::test_ac6_only_compose_root_imports_concrete_strategies`) continues to pass without modification.
|
||||
- The existing c7_inference + c10_provisioning unit test suites continue to pass without modification.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- Edits to `_docs/02_document/module-layout.md` for c10_provisioning + any other component whose "Imports from" line names `components.X (Public API)`.
|
||||
- Create `src/gps_denied_onboard/_types/inference_errors.py` with `EngineBuildError` and `CalibrationCacheError` re-exports.
|
||||
- Narrow `c10_provisioning/engine_compiler.py::_compile_one`'s `except Exception` catch to the typed envelope.
|
||||
- Add a doc-level note in `_docs/02_document/architecture.md` (one paragraph under the existing cross-component contract section) codifying the rule: cross-component imports go through `_types`, never through `components.X (Public API)`.
|
||||
|
||||
### Excluded
|
||||
|
||||
- Option (b) from the cumulative review (loosen the lint to read a Public API allowlist). Rejected: adds parser fragility to the lint.
|
||||
- Refactoring any concrete behavior. This is documentation + a typed-error shim + one narrowed catch.
|
||||
- Renaming public symbols (`EngineBuildError` and `CalibrationCacheError` keep their c7_inference canonical names; the shim re-exports under the same names).
|
||||
- The F3 finding ("Consumer-Side Protocol Cuts" architecture-doc section). That is a separate hygiene PBI if/when promoted.
|
||||
- Touching any consumer other than `engine_compiler.py`. There is currently only one consumer of the c7 error types (c10_provisioning).
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: module-layout.md has no `components.X (Public API)` imports**
|
||||
Given the AZ-321 module-layout edits documenting c10's cross-component imports
|
||||
When the doc is re-read after this task
|
||||
Then no "Imports from" line names `components.X (Public API)`; every former such reference points at `_types` or has been removed as dead
|
||||
|
||||
**AC-2: `_types/inference_errors.py` re-exports the c7 typed error envelope**
|
||||
Given the new shim `src/gps_denied_onboard/_types/inference_errors.py`
|
||||
When a consumer imports `from gps_denied_onboard._types.inference_errors import EngineBuildError, CalibrationCacheError`
|
||||
Then both symbols resolve and are identical (same class objects) to the canonical c7_inference definitions
|
||||
|
||||
**AC-3: engine_compiler narrows its catch**
|
||||
Given `c10_provisioning/engine_compiler.py::_compile_one` previously had `except Exception`
|
||||
When the task lands
|
||||
Then the catch is `except (EngineBuildError, CalibrationCacheError)` (or equivalent typed union); unknown exceptions propagate; all existing `engine_compiler` unit tests pass without test modification
|
||||
|
||||
**AC-4: AZ-270 lint still passes**
|
||||
Given the rewritten doc + new shim + narrowed catch
|
||||
When `pytest tests/unit/test_az270_compose_root.py::test_ac6_only_compose_root_imports_concrete_strategies` runs
|
||||
Then the test passes (no new cross-component imports introduced)
|
||||
|
||||
**AC-5: Architecture doc codifies the rule**
|
||||
Given `_docs/02_document/architecture.md` has the existing cross-component contract section
|
||||
When this task lands
|
||||
Then a new one-paragraph note under that section says: cross-component imports go through `_types/*.py` (DTOs + typed error envelopes), never through `components.X (Public API)`; the only exception is `runtime_root/*` (composition root)
|
||||
|
||||
**AC-6: No behavior change**
|
||||
Given the c7_inference + c10_provisioning unit test suites
|
||||
When the suites run after this task
|
||||
Then every previously passing test still passes; every previously skipped test is still skipped for the same reason; no new failures or warnings
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Compatibility**
|
||||
- No new third-party dependencies.
|
||||
- No bump to numpy / opencv / pytest / mypy pins.
|
||||
|
||||
**Reliability**
|
||||
- The `_types/inference_errors.py` shim is import-only; it does not introduce module-load side effects.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|-------------|------------------|
|
||||
| AC-2 | `import` from `_types.inference_errors` resolves both symbols | Identity check `is` matches c7 canonical classes |
|
||||
| AC-3 | `engine_compiler._compile_one` propagates non-typed exceptions | `RuntimeError("test")` bubbles up; `EngineBuildError("test")` is caught |
|
||||
| AC-4 | Run AZ-270 lint test | Pass |
|
||||
| AC-6 | Full c7_inference + c10_provisioning suite | All previously-passing tests still pass |
|
||||
|
||||
## Constraints
|
||||
|
||||
- The canonical class definitions for `EngineBuildError` and `CalibrationCacheError` STAY in c7_inference. The shim is purely a re-export at the `_types` Layer-0 surface — moving the canonical definitions would change the c7 surface and risk breaking the c7 internal call sites.
|
||||
- `_types/*.py` must not import from `gps_denied_onboard.components.*`. The shim achieves this via a local re-export idiom: `from gps_denied_onboard.components.c7_inference._errors import EngineBuildError, CalibrationCacheError`. The `_types` surface itself remains Layer 0; the *import direction* points downward into a component, which the AZ-270 lint already allows for `_types`-internal modules (verified — the lint only walks `components/`, not `_types/`).
|
||||
- No public API surface changes. All current consumers that already catch `Exception` continue to work.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: A consumer relies on the broad `Exception` catch to swallow non-EngineBuildError errors**
|
||||
- *Risk*: Tightening the catch in `engine_compiler._compile_one` exposes a previously-suppressed error.
|
||||
- *Mitigation*: Audit the c7_inference internals before the change. Currently only `EngineBuildError` and `CalibrationCacheError` are documented in c7's public surface; everything else is a programmer error. AC-3's unit test covers the propagation contract.
|
||||
|
||||
**Risk 2: `_types/inference_errors.py` re-export introduces an import cycle**
|
||||
- *Risk*: c7_inference somehow imports back from `_types/inference_errors`, creating `c7_inference → _types/inference_errors → c7_inference._errors → c7_inference`.
|
||||
- *Mitigation*: c7_inference defines its errors in `_errors.py` (or similar); it doesn't need to import from `_types`. Verify no such backward import exists before landing. Add an explicit AZ-270-style test if a back-edge appears.
|
||||
|
||||
**Risk 3: Architecture doc note becomes stale if `_types` is reorganized**
|
||||
- *Risk*: A future refactor moves DTOs out of `_types`; the note still says "go through `_types`".
|
||||
- *Mitigation*: Keep the note short and rule-shaped ("cross-component contract surface lives in the project's Layer-0 typed envelope"). The exact path is incidental to the rule.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: architecture-compliance documentation that matches the AZ-270 enforcement gate; small typed-error envelope shim; one narrowed catch.
|
||||
- **Production code that must exist**: real `_types/inference_errors.py` shim + real narrowed `except (EngineBuildError, CalibrationCacheError)` in `engine_compiler._compile_one`.
|
||||
- **Allowed external stubs**: none. This task is pure refactor + docs.
|
||||
- **Unacceptable substitutes**: a `# type: ignore` over the broad catch (defeats the typed envelope intent); deleting the c7 error classes and moving them wholesale into `_types` (changes c7's surface and risks regression in tests that import directly from c7); leaving the broad `except Exception` (defeats the entire reason F1 was opened).
|
||||
Reference in New Issue
Block a user