Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,243 @@
# C10 Manifest Builder — Content-Hash Table + Operator-Key Ed25519 Signing
**Task**: AZ-323_c10_manifest_builder
**Name**: C10 Manifest Builder
**Description**: Implement `ManifestBuilder`, the C10-internal phase that produces the signed cache Manifest covering EVERY shipped artifact (engines, FAISS index, calibration JSON, all tile hashes from C6) plus the build-identity tuple `(model_ids, calibration_sha256, sorted_tile_hashes, sector_class, bbox, zoom_levels)` whose canonical hash is `manifest_hash` — the D-C10-1 idempotence key. Serializes the Manifest as canonical JSON (sorted keys, fixed 2-space indentation, trailing newline) at `cache_root/Manifest.json`, computes its own SHA-256 sidecar via AZ-280, and writes a detached Ed25519 signature at `cache_root/Manifest.json.sig` using the operator's signing key from `key_path`. Refuses to sign with a non-operator key when `config.c10.signing_mode = "operator"` (C10-ST-01). Emits the `signing_public_key_fingerprint` into the Manifest itself so verifiers can pin the trust root.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema, AZ-303_c6_storage_interfaces
**Component**: c10_provisioning (epic AZ-252 / E-C10)
**Tracker**: AZ-323
**Epic**: AZ-252 (E-C10)
### Document Dependencies
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` — atomic write + sidecar pattern (AZ-280).
- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md` — `query_by_bbox` returning per-tile sha256 set by AZ-316.
- `_docs/02_document/components/11_c10_provisioning/description.md` — § 1 idempotence, § 5 `ManifestWriteError`, § 7 D-C10-3 sidecar coverage.
## Problem
Without a real Manifest builder:
- D-C10-1 (idempotent re-run via manifest hash) cannot be implemented — T5's "did anything change?" check has no canonical hash to compare.
- D-C10-3 (SHA-256 content-hash gate over every shipped artifact) is unobservable — the takeoff verifier (T4) has nothing to verify against.
- AC-NEW-1 ("no engine deserialization at takeoff before manifest verify") collapses without a signed Manifest at takeoff.
- C10-ST-01 (build refuses dev-key signing in operator mode) cannot be enforced without a signing key check.
- The `signing_public_key_fingerprint` field is the trust anchor for the airborne `ManifestVerifier`; without it, the verifier cannot decide which key is allowed to vouch for a Manifest.
- A large Manifest (100k tile hashes × 80 bytes = 8 MB) is still operator-friendly as long as it stays human-inspectable, but without canonical JSON ordering two builds of the same input produce different bytes and break idempotence.
This task delivers the Manifest serialization + signing. It does NOT compile engines (AZ-321), embed tiles (AZ-322), or run the takeoff verify (T4).
## Outcome
- A `ManifestBuilder` class at `src/gps_denied_onboard/components/c10_provisioning/manifest_builder.py`:
- Constructor: `__init__(self, *, sidecar: Sha256Sidecar, signer: ManifestSigner, tile_metadata_store: TileMetadataStore, logger: Logger, clock: Clock, config: C10ManifestConfig)`.
- `C10ManifestConfig` (`@dataclass(frozen=True)`): `signing_mode: enum {operator, dev}`, `allowed_operator_fingerprints: tuple[str, ...]`, `schema_version: str = "1.0"`.
- Public method: `build_manifest(input: ManifestBuildInput) -> ManifestArtifact`.
- `ManifestBuildInput` (`@dataclass(frozen=True)`): `cache_root: Path`, `bbox: Bbox`, `zoom_levels: tuple[int, ...]`, `sector_class: SectorClassification`, `engine_entries: tuple[EngineCacheEntry, ...]`, `descriptor_index_path: Path`, `calibration_path: Path`, `key_path: Path`.
- `ManifestArtifact` (`@dataclass(frozen=True)`): `manifest_path: Path`, `signature_path: Path`, `manifest_hash: str`, `signing_public_key_fingerprint: str`, `total_artifacts_listed: int`.
- A `ManifestSigner` Protocol at `src/gps_denied_onboard/components/c10_provisioning/interface.py`:
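```python
from __future__ import annotations  # lets SigningKeyHandle be referenced before it is defined
from pathlib import Path
from typing import Protocol, runtime_checkable

@runtime_checkable
class ManifestSigner(Protocol):
    # Structural interface: any object providing these three methods conforms.
    def load_signing_key(self, key_path: Path) -> SigningKeyHandle: ...
    def sign(self, key: SigningKeyHandle, payload_bytes: bytes) -> bytes: ...
    def public_key_fingerprint(self, key: SigningKeyHandle) -> str: ...
```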
Default impl `Ed25519ManifestSigner` uses the `cryptography` library (already pinned via AZ-318 for per-flight keys).
- Method flow:
1. Load operator signing key: `signer.load_signing_key(input.key_path)` → `SigningKeyHandle`.
2. Compute `signing_public_key_fingerprint = signer.public_key_fingerprint(key)` (sha256 of the raw 32-byte ed25519 public key, hex).
3. **Operator-mode gate (C10-ST-01)**: if `config.signing_mode == "operator"` AND `fingerprint not in config.allowed_operator_fingerprints` → raise `ManifestWriteError("signing key fingerprint not in allowed_operator_fingerprints")`; ERROR log with the offending fingerprint. If `config.signing_mode == "dev"` AND fingerprint matches an allowed operator fingerprint → emit WARN `c10.manifest.dev_mode_with_operator_key` (operator key being used in dev mode is suspicious but allowed).
4. Compute per-artifact hashes:
- For each engine entry: read `entry.engine_sha256_hex` (already computed by AZ-321; do NOT re-hash).
- For descriptor index: call `sidecar.read_sidecar(input.descriptor_index_path)` → expect a 64-char hex digest.
- For calibration JSON: `sha256_hex(open(calibration_path, 'rb').read())` — calibration is small (KB).
- For tiles: call `tile_metadata_store.query_by_bbox(bbox, zoom_levels, sector_class)` → list of `TileMetadata` with `sha256_hex` field (set by AZ-316). Sort by `(zoom, lat, lon, source)` for determinism. Compute `tiles_coverage_sha256 = sha256(b"\n".join(f"{t.tile_id}:{t.sha256_hex}".encode() for t in sorted_tiles))`.
5. Build the canonical Manifest dict:
```
{
  "schema_version": "1.0",
  "build": {
    "bbox": {...},
    "zoom_levels": [16, 17, 18],
    "sector_class": "stable_rear",
    "built_at": "2026-05-10T12:00:00Z",
    "manifest_hash": "<sha256-hex>"
  },
  "artifacts": {
    "engines": [{"path": "engines/dinov2_vpr_sm87_jp62_trt103_fp16.engine", "sha256": "<hex>"}, ...],
    "descriptor_index": {"path": "descriptors/corpus.index", "sha256": "<hex>"},
    "calibration": {"path": "calibration/int8_calibration.json", "sha256": "<hex>"},
    "tiles_coverage": {"sha256": "<hex>", "tile_count": <int>}
  },
  "signing_public_key_fingerprint": "<hex>"
}
```
6. Compute `manifest_hash` as `sha256(canonical_json(build_identity_tuple))`, where `build_identity_tuple` contains exactly `model_ids`, `calibration_sha256`, `tiles_coverage_sha256`, `sector_class`, `bbox`, and `zoom_levels`, serialized with sorted keys. This is the D-C10-1 idempotence key. Insert into the Manifest dict at `build.manifest_hash` AFTER computation.
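7. Serialize the Manifest dict as canonical JSON bytes: `canonical_json_bytes = orjson.dumps(manifest, option=orjson.OPT_SORT_KEYS | orjson.OPT_INDENT_2)`. `orjson.dumps` already returns `bytes`, which is what steps 8 and 9 consume, so no `.decode()` is needed. Append a trailing newline (for example via `orjson.OPT_APPEND_NEWLINE`).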
8. Atomic-write the JSON via `sidecar.write_with_sidecar(cache_root / "Manifest.json", canonical_json_bytes)` — produces `Manifest.json` + `Manifest.json.sha256` (the latter is the Manifest's OWN sha256, used by T4).
9. Sign the canonical JSON bytes: `signature_bytes = signer.sign(key, canonical_json_bytes)` (raw Ed25519 signature, 64 bytes).
10. Atomic-write the signature: `sidecar.atomic_write(cache_root / "Manifest.json.sig", signature_bytes)` (no .sha256 sidecar for the signature itself — signature integrity is verified by Ed25519 over the Manifest bytes).
11. Return `ManifestArtifact(manifest_path, signature_path, manifest_hash, signing_public_key_fingerprint, total_artifacts_listed)`.
- INFO log on successful build (`c10.manifest.build.success` with `manifest_hash` + `total_artifacts_listed`); ERROR on `ManifestWriteError`; WARN on dev-mode-with-operator-key.
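
The two fully specified dataclasses and the step 2 fingerprint rule can be sketched as follows. This is a minimal illustration, not the task's implementation: `SigningMode` as a `str`-based `Enum` is an assumption (chosen so the `config.signing_mode == "operator"` comparison in step 3 holds), and `fingerprint_of` is a hypothetical helper name.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class SigningMode(str, Enum):
    # Assumption: the str mixin keeps `signing_mode == "operator"` true.
    OPERATOR = "operator"
    DEV = "dev"


@dataclass(frozen=True)
class C10ManifestConfig:
    signing_mode: SigningMode
    allowed_operator_fingerprints: tuple[str, ...]
    schema_version: str = "1.0"


@dataclass(frozen=True)
class ManifestArtifact:
    manifest_path: Path
    signature_path: Path
    manifest_hash: str
    signing_public_key_fingerprint: str
    total_artifacts_listed: int


def fingerprint_of(raw_public_key: bytes) -> str:
    """Step 2: lowercase hex SHA-256 of the raw 32-byte Ed25519 public key."""
    if len(raw_public_key) != 32:
        raise ValueError("expected a raw 32-byte Ed25519 public key")
    return hashlib.sha256(raw_public_key).hexdigest()
```

Because both dataclasses are frozen, assigning to a field after construction raises, which keeps the config and the returned artifact safe to share between phases.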
## Scope
### Included
- `ManifestBuilder` class with the single public method.
- `ManifestSigner` Protocol + `Ed25519ManifestSigner` default impl.
- Canonical JSON serialization (sorted keys, sorted lists where order is content-defining).
- Operator-key gate per `signing_mode` config.
- Per-artifact hash computation (engines, descriptor index, calibration, tiles aggregate).
- Atomic writes via AZ-280 for both `Manifest.json` and `Manifest.json.sig`.
- Composition-root factory `build_manifest_builder`.
- Conformance test for `ManifestSigner` Protocol.
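
A conformance check for the `ManifestSigner` Protocol might look like the sketch below. `SigningKeyHandle`'s fields, `FakeManifestSigner`, `MissingSign`, and `conforms` are all hypothetical names for illustration, and the fake deliberately uses plain digests instead of Ed25519, so it is only suitable as a structural test double.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class SigningKeyHandle:
    """Hypothetical opaque handle; the real type is not shown in this spec."""
    public_key: bytes


@runtime_checkable
class ManifestSigner(Protocol):
    def load_signing_key(self, key_path: Path) -> SigningKeyHandle: ...
    def sign(self, key: SigningKeyHandle, payload_bytes: bytes) -> bytes: ...
    def public_key_fingerprint(self, key: SigningKeyHandle) -> str: ...


class FakeManifestSigner:
    """Test double only. It does NOT produce Ed25519 signatures."""

    def load_signing_key(self, key_path: Path) -> SigningKeyHandle:
        # Derive a stable 32-byte stand-in "public key" from the path string.
        return SigningKeyHandle(hashlib.sha256(str(key_path).encode()).digest())

    def sign(self, key: SigningKeyHandle, payload_bytes: bytes) -> bytes:
        # 64 bytes, the length of a raw Ed25519 signature, but only a digest.
        return hashlib.sha512(key.public_key + payload_bytes).digest()

    def public_key_fingerprint(self, key: SigningKeyHandle) -> str:
        return hashlib.sha256(key.public_key).hexdigest()


class MissingSign:
    """Lacks `sign`, so it must fail the structural check."""

    def load_signing_key(self, key_path: Path) -> SigningKeyHandle:
        raise NotImplementedError

    def public_key_fingerprint(self, key: SigningKeyHandle) -> str:
        raise NotImplementedError


def conforms(candidate: object) -> bool:
    # isinstance() on a runtime_checkable Protocol checks that the methods
    # exist; it does not validate their signatures or return types.
    return isinstance(candidate, ManifestSigner)
```

Since `isinstance` only checks method presence, a real conformance test would also exercise each method, for example by asserting the signature length and fingerprint format.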
### Excluded
- The orchestration of when to build (T5 owns).
- Engine compilation / descriptor generation (AZ-321 / AZ-322).
- Manifest verification (T4 owns).
- Idempotence "should we skip the build?" decision (T5 owns; this task always rebuilds when called).
- ManifestCoverageError (T5 owns; this task lists what it's told, doesn't enumerate cache_root).
- Key generation — operator's long-lived key is provisioned out-of-band; this task only loads + uses.
- Multi-key signing (M-of-N quorum) — single-key per build.
- Compressed Manifest format — JSON for human inspection.
## Acceptance Criteria
**AC-1: Happy path produces Manifest + sig + sidecars**
Given a valid input with 3 engines, 1 descriptor index, 1 calibration JSON, 100 tiles
When `build_manifest(input)` is called
Then `Manifest.json`, `Manifest.json.sha256`, `Manifest.json.sig` are all present at `cache_root/`; the Manifest contains 3 engine entries, 1 descriptor_index entry, 1 calibration entry, 1 tiles_coverage entry; `manifest_hash` is a 64-char lowercase hex string; the returned `ManifestArtifact.total_artifacts_listed == 6` (3 engines + index + calibration + tiles_coverage as one logical artifact; the Manifest itself and its signature are not counted, per AC-12)
**AC-2: Determinism — same input produces byte-identical Manifest**
Given the same `ManifestBuildInput` run twice on different days (different `built_at`)
When the canonical JSON is compared with `built_at` redacted
Then both runs produce byte-identical bytes — proves canonical JSON ordering works; same `manifest_hash`. (This is the foundation for T5's idempotence check.)
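
The determinism property in AC-2 can be illustrated with a small sketch. It uses the stdlib `json` module with `sort_keys=True` purely so the example runs without dependencies; the task itself mandates `orjson` with `OPT_SORT_KEYS`, and the byte layout may differ. The identity values and the `bbox` field names are invented for the example.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    # Stdlib stand-in for orjson.dumps(obj, option=orjson.OPT_SORT_KEYS);
    # what matters here is that key order no longer depends on insertion.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def manifest_hash(identity: dict) -> str:
    """D-C10-1 idempotence key: hash of the build identity only."""
    return hashlib.sha256(canonical_json(identity)).hexdigest()


def build_section(identity: dict, built_at: str) -> dict:
    # built_at is recorded next to the hash but never fed into it.
    return {
        "bbox": identity["bbox"],
        "zoom_levels": identity["zoom_levels"],
        "sector_class": identity["sector_class"],
        "built_at": built_at,
        "manifest_hash": manifest_hash(identity),
    }


def redact(build: dict) -> bytes:
    return canonical_json({**build, "built_at": "<redacted>"})


# Hypothetical identity values; the bbox field names are assumptions.
identity = {
    "model_ids": ["dinov2_vpr"],
    "calibration_sha256": "ab" * 32,
    "tiles_coverage_sha256": "cd" * 32,
    "sector_class": "stable_rear",
    "bbox": {"min_lat": 48.0, "min_lon": 30.0, "max_lat": 48.1, "max_lon": 30.1},
    "zoom_levels": [16, 17, 18],
}
# Same content, opposite insertion order.
reordered = dict(reversed(list(identity.items())))

monday = build_section(identity, "2026-05-10T12:00:00Z")
tuesday = build_section(reordered, "2026-05-11T12:00:00Z")
```

Here `monday` and `tuesday` carry the same `manifest_hash` and become byte-identical once `built_at` is redacted, while their unredacted bytes differ.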
**AC-3: Signature verifies against the public key**
Given the signature file + the operator's public key
When `cryptography.hazmat.primitives.asymmetric.ed25519.Ed25519PublicKey.verify(signature, manifest_bytes)` is called
Then no exception is raised — proves the signing produced a valid Ed25519 signature
**AC-4: Operator-mode rejects unknown fingerprint**
Given `config.signing_mode = "operator"` and `config.allowed_operator_fingerprints = ("known_fp",)` and a key file whose fingerprint is `"unknown_fp"`
When `build_manifest` is called
Then `ManifestWriteError` is raised with a message naming both fingerprints (the offered one + the allowlist); ZERO files are written; ONE ERROR log
**AC-5: Operator-mode accepts known fingerprint**
Given `config.signing_mode = "operator"` and the key file's fingerprint IS in the allowlist
When `build_manifest` is called
Then the build succeeds; ZERO WARN logs about dev-mode
**AC-6: Dev-mode with non-operator key emits no warning**
Given `config.signing_mode = "dev"` and a random dev key (not in allowlist)
When `build_manifest` is called
Then build succeeds; `signing_public_key_fingerprint` is the dev key's; ZERO warnings about operator key in dev mode
**AC-7: Dev-mode with operator key emits warning**
Given `config.signing_mode = "dev"` and a key whose fingerprint IS in `allowed_operator_fingerprints`
When `build_manifest` is called
Then build succeeds; ONE WARN log `c10.manifest.dev_mode_with_operator_key` with the fingerprint
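
AC-4 through AC-7 together describe a four-way truth table for the signing gate. A minimal sketch of that gate, assuming a standalone function `check_signing_gate` (a hypothetical name) and a local stand-in for `ManifestWriteError`; the ERROR message text is also an assumption, since the spec only names the WARN event:

```python
import logging


class ManifestWriteError(Exception):
    """Stand-in for the component's error type."""


def check_signing_gate(signing_mode, fingerprint, allowed, logger) -> None:
    if signing_mode == "operator" and fingerprint not in allowed:
        # Fail closed: log the offending fingerprint, then refuse to continue.
        logger.error("signing key fingerprint rejected: %s", fingerprint)
        raise ManifestWriteError(
            f"signing key fingerprint {fingerprint} not in "
            f"allowed_operator_fingerprints {list(allowed)}"
        )
    if signing_mode == "dev" and fingerprint in allowed:
        # Suspicious but allowed: an operator key used in dev mode.
        logger.warning(
            "c10.manifest.dev_mode_with_operator_key fingerprint=%s", fingerprint
        )


class ListHandler(logging.Handler):
    """Collects log records in memory so a test can count them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def run_case(signing_mode, fingerprint, allowed=("known_fp",)):
    """Return (raised, levels_logged) for one gate invocation."""
    logger = logging.getLogger(f"c10.gate.{signing_mode}.{fingerprint}")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.DEBUG)
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        check_signing_gate(signing_mode, fingerprint, allowed, logger)
        raised = False
    except ManifestWriteError:
        raised = True
    return raised, [record.levelname for record in handler.records]
```

Running the gate before any artifact hashing or file I/O is what makes the "ZERO files are written" clause of AC-4 straightforward to satisfy.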
**AC-8: Tile coverage hash is sort-order-deterministic**
Given the same 100 tiles loaded in two different SQL row orders (e.g., insertion order vs index scan)
When `tiles_coverage_sha256` is computed
Then both runs produce the same hash — proves the `(zoom, lat, lon, source)` sort is canonical
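
The sort-then-hash rule behind AC-8 can be sketched as below. The `TileMetadata` class here is a hypothetical subset of the real C6 type, with only the fields the formula in the method flow refers to, and the sample tiles are synthetic.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TileMetadata:
    """Hypothetical subset of the C6 record; field names follow the spec text."""
    tile_id: str
    zoom: int
    lat: float
    lon: float
    source: str
    sha256_hex: str


def tiles_coverage_sha256(tiles) -> str:
    # Canonical order first, so the row order returned by SQL cannot matter.
    sorted_tiles = sorted(tiles, key=lambda t: (t.zoom, t.lat, t.lon, t.source))
    payload = b"\n".join(
        f"{t.tile_id}:{t.sha256_hex}".encode() for t in sorted_tiles
    )
    return hashlib.sha256(payload).hexdigest()


# 100 synthetic tiles with made-up ids and content hashes.
tiles = [
    TileMetadata(
        tile_id=f"18/{i}/{i}",
        zoom=18,
        lat=48.0 + i * 0.001,
        lon=30.0 + i * 0.001,
        source="synthetic",
        sha256_hex=hashlib.sha256(str(i).encode()).hexdigest(),
    )
    for i in range(100)
]
shuffled = tiles[:]
random.Random(0).shuffle(shuffled)
```

`tiles_coverage_sha256(tiles)` and `tiles_coverage_sha256(shuffled)` agree, while changing any single tile's `sha256_hex` changes the aggregate.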
**AC-9: ManifestWriteError on key load failure**
Given a `key_path` that does not exist OR contains malformed PEM
When `signer.load_signing_key(key_path)` raises
Then `ManifestWriteError("operator signing key load failed: <reason>")` is raised; ZERO files are written; the original `cryptography` exception is chained as `__cause__` for diagnosis
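
The chaining requirement in AC-9 maps onto Python's `raise ... from exc`. The sketch below uses a hypothetical `read_operator_key_pem` that only checks the PEM header, as a dependency-free stand-in for the real `signer.load_signing_key`; the actual loader would parse the key with `cryptography`.

```python
import tempfile
from pathlib import Path

# Header of an unencrypted PEM-encoded PKCS8 private key.
PKCS8_PEM_HEADER = b"-----BEGIN PRIVATE KEY-----"


class ManifestWriteError(Exception):
    """Stand-in for the component's error type."""


def read_operator_key_pem(key_path: Path) -> bytes:
    """Hypothetical loader: reads the file and checks only the PEM header."""
    try:
        pem = key_path.read_bytes()
        if not pem.lstrip().startswith(PKCS8_PEM_HEADER):
            raise ValueError("expected PEM-encoded PKCS8 ('BEGIN PRIVATE KEY')")
        return pem
    except (OSError, ValueError) as exc:
        # `from exc` stores the original error on __cause__ for diagnosis.
        raise ManifestWriteError(
            f"operator signing key load failed: {exc}"
        ) from exc


def load_failure_cause(key_path: Path):
    """Return the chained cause of a failed load, or None on success."""
    try:
        read_operator_key_pem(key_path)
    except ManifestWriteError as err:
        return err.__cause__
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    missing_cause = load_failure_cause(Path(tmp_dir) / "missing.pem")
    bad_key = Path(tmp_dir) / "bad.pem"
    bad_key.write_bytes(b"ssh-ed25519 AAAA")  # not PKCS8 PEM
    malformed_cause = load_failure_cause(bad_key)
```

In this run `missing_cause` is a `FileNotFoundError` and `malformed_cause` is a `ValueError`, so the operator sees the high-level message and the underlying reason.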
**AC-10: Atomic write — partial Manifest impossible**
Given the Manifest is being written and the process is killed mid-write
When restarted
Then either the previous-good Manifest OR the new Manifest is at the path; never a half-written JSON. (AZ-280's atomic-write contract.)
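
AZ-280's implementation is not shown in this spec, so the following is only a sketch of one common way to get the "old or new, never partial" guarantee: write to a temporary file in the same directory, flush it to disk, then rename it over the target. The function name `atomic_write` matches the call in the method flow, but its body here is an assumption.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem, where os.replace is atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # readers see the old file or the new one
    except BaseException:
        # Leave no stray temp file behind if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

A process killed before `os.replace` leaves the previous `Manifest.json` untouched, plus at most a leftover `.tmp` file that a later run can ignore or clean up.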
**AC-11: Manifest's own sidecar is consistent**
Given a freshly-written `Manifest.json`
When `sha256_hex(open("Manifest.json", "rb").read())` is computed and compared to `Manifest.json.sha256`
Then the values match — T4's verifier walks all sidecars and this is the entry point
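
The AC-11 check reduces to recomputing the digest and comparing it with the sidecar's contents. In this sketch the sidecar is assumed to hold just the 64-character hex digest (optionally followed by whitespace); `sidecar_path`, `sidecar_matches`, and `demo` are hypothetical helper names, and the plain writes in `demo` are test fixtures, not the AZ-280 write path.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sidecar_path(path: Path) -> Path:
    # Manifest.json -> Manifest.json.sha256
    return path.with_name(path.name + ".sha256")


def sidecar_matches(path: Path) -> bool:
    expected = sidecar_path(path).read_text(encoding="ascii").strip()
    return sha256_hex(path.read_bytes()) == expected


def demo() -> tuple[bool, bool]:
    """Return (fresh_ok, tampered_ok) for a throwaway Manifest."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        manifest = Path(tmp_dir) / "Manifest.json"
        # Fixture writes only; production code goes through AZ-280.
        manifest.write_bytes(b'{"schema_version":"1.0"}\n')
        sidecar_path(manifest).write_text(
            sha256_hex(manifest.read_bytes()), encoding="ascii"
        )
        fresh_ok = sidecar_matches(manifest)
        manifest.write_bytes(b'{"schema_version":"1.1"}\n')  # simulate drift
        tampered_ok = sidecar_matches(manifest)
    return fresh_ok, tampered_ok
```

`demo()` returns `(True, False)`: the freshly written pair is consistent, and any later change to the Manifest bytes is detected.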
**AC-12: `total_artifacts_listed` equals dict-counted artifacts**
Given an input with N engines + 1 index + 1 calibration + tiles_coverage
When `ManifestArtifact.total_artifacts_listed` is inspected
Then it equals `N + 3` (engines + index + calibration + tiles_coverage); does NOT count the Manifest itself or the signature
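
The counting rule in AC-12 can be written down directly against the `artifacts` section of the Manifest dict. This is a sketch only: `count_listed_artifacts` is a hypothetical helper name, and the engine paths and `<hex>` values are placeholders.

```python
def count_listed_artifacts(artifacts: dict) -> int:
    """Engines count individually; the other three entries count once each."""
    single_entries = ("descriptor_index", "calibration", "tiles_coverage")
    return len(artifacts["engines"]) + sum(
        1 for name in single_entries if name in artifacts
    )


# Placeholder content shaped like the "artifacts" section of the Manifest.
artifacts = {
    "engines": [
        {"path": "engines/a.engine", "sha256": "<hex>"},
        {"path": "engines/b.engine", "sha256": "<hex>"},
        {"path": "engines/c.engine", "sha256": "<hex>"},
    ],
    "descriptor_index": {"path": "descriptors/corpus.index", "sha256": "<hex>"},
    "calibration": {"path": "calibration/int8_calibration.json", "sha256": "<hex>"},
    "tiles_coverage": {"sha256": "<hex>", "tile_count": 100},
}
total = count_listed_artifacts(artifacts)
```

With three engines the total is `3 + 3 = 6`; neither `Manifest.json` nor `Manifest.json.sig` appears in the dict, so neither is counted.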
## Non-Functional Requirements
**Performance**
- Build wall-clock ≤ 5 s for a 100k-tile corpus on Tier-1 dev workstation: sorting 100k tile hashes + computing one SHA-256 over the concatenated string is ~8 MB of input (100k lines × ~80 bytes) → ~100 ms; serializing JSON with 100k tile_count is fast (single integer); engine + index hashes are already computed upstream, and the calibration file is only KB-sized. Total ≤ 5 s leaves headroom.
- Operator-mode fingerprint check is a single string comparison.
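
A rough way to sanity-check the size and timing estimate is to hash a synthetic 100k-line payload. Everything here is an assumption for illustration: the fixed-width tile id format is invented, the lines are sorted as raw bytes instead of by `(zoom, lat, lon, source)`, and the measured time depends on the machine.

```python
import hashlib
import time


def synthetic_lines(count: int) -> list[bytes]:
    # Hypothetical fixed-width tile id (16 chars) + ":" + 64-char hex = 81 bytes.
    return [
        f"18/{i:06d}/{i:06d}:{hashlib.sha256(str(i).encode()).hexdigest()}".encode()
        for i in range(count)
    ]


def timed_coverage_hash(lines: list[bytes]) -> tuple[str, int, float]:
    """Sort, join, and hash; return (digest, payload size in bytes, seconds)."""
    start = time.perf_counter()
    payload = b"\n".join(sorted(lines))
    digest = hashlib.sha256(payload).hexdigest()
    elapsed = time.perf_counter() - start
    return digest, len(payload), elapsed


digest, payload_size, elapsed_s = timed_coverage_hash(synthetic_lines(100_000))
```

The payload comes to 100,000 × 81 bytes plus 99,999 newline separators, about 8.2 MB, which matches the 100k × 80 bytes estimate in the Problem section; `elapsed_s` can then be compared against the 5 s budget on the target workstation.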
**Compatibility**
- Uses `orjson` (already pinned via AZ-272 for FDR), `cryptography` (already pinned via AZ-318 for per-flight keys), `hashlib` (stdlib).
- No new third-party dependencies.
**Reliability**
- Operator-key gate is fail-closed: unknown fingerprint → no Manifest written.
- Atomic writes prevent half-written Manifests on process kill.
- Canonical JSON ensures bit-identical Manifests for identical inputs, apart from the `built_at` timestamp (foundation for D-C10-1 idempotence in T5).
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Build with 3 engines + index + calibration + 100 tiles | All files present; counts match |
| AC-2 | Build twice, redact built_at, compare bytes | Identical |
| AC-3 | Verify signature with public key | No raise |
| AC-4 | Operator mode + unknown fingerprint | ManifestWriteError; no files |
| AC-5 | Operator mode + known fingerprint | Success; no warnings |
| AC-6 | Dev mode + dev key | Success; no warnings |
| AC-7 | Dev mode + operator-allowlisted key | Success; ONE warning |
| AC-8 | Tile rows in different orders | Same `tiles_coverage_sha256` |
| AC-9 | Missing or malformed key file | ManifestWriteError; chained cause |
| AC-10 | Kill mid-write | No half-Manifest |
| AC-11 | Verify Manifest's own sidecar | Hashes match |
| AC-12 | Inspect total_artifacts_listed | Counts engines+index+calibration+tiles_coverage |
| NFR-perf | 100k-tile bench | ≤ 5 s wall clock |
| NFR-reliability-fail-closed | Operator mode + unknown fp | Fail-closed; nothing written |
## Constraints
- Canonical JSON via `orjson` with `OPT_SORT_KEYS`; this task does NOT use a different JSON library.
- Atomic writes via AZ-280 for BOTH `Manifest.json` and `Manifest.json.sig`; no naked `Path.write_bytes()`.
- `manifest_hash` excludes `built_at` (it's a build-identity hash, not a Manifest-bytes hash).
- The Manifest's own SHA-256 sidecar (Manifest.json.sha256) IS the Manifest-bytes hash and is used by T4 at takeoff.
- Tile coverage hashing is via aggregate `tiles_coverage_sha256`, NOT per-tile entries in the Manifest (keeps Manifest bounded).
- Signature is detached (separate `.sig` file); embedded signatures are NOT permitted (would require parsing before verifying).
- Ed25519 only; this task does NOT add other algorithms.
- Operator-key fingerprint allowlist is config-driven; no hardcoded keys.
## Risks & Mitigation
**Risk 1: `built_at` makes Manifests non-deterministic for the same input**
- *Risk*: Idempotence check in T5 compares `manifest_hash` only, but if T5 reads the Manifest bytes directly elsewhere it could see different bytes for "same" build.
- *Mitigation*: The Constraints exclude `built_at` from the `manifest_hash` computation, and AC-2 compares Manifest bytes only with `built_at` redacted. T5 compares hashes, not bytes. Documented in the Manifest schema.
**Risk 2: tiles_coverage as aggregate hides which tile changed**
- *Risk*: When verify fails at takeoff (T4), the operator only learns "tiles_coverage hash mismatch", not WHICH tile drifted.
- *Mitigation*: T4's failure path can re-walk per-tile hashes against C6 to identify the offender. The Manifest stays small; debugging detail is computed on-demand. Documented in T4's scope.
**Risk 3: `cryptography` API breaks between minor versions**
- *Risk*: Ed25519 API changes (unlikely but `cryptography` does ship breaking changes occasionally).
- *Mitigation*: Pin to the same version used by AZ-318. The `Ed25519ManifestSigner` is the only place using the API; a one-place adapter swap on upgrade.
**Risk 4: Operator key file format ambiguity**
- *Risk*: Operators might supply a key in PKCS8, OpenSSH, or raw 32-byte format.
- *Mitigation*: `Ed25519ManifestSigner.load_signing_key` accepts PEM-encoded PKCS8 only (matches AZ-318's convention); other formats raise `ManifestWriteError` with explicit format hint.
**Risk 5: Dev key accidentally signs an operator-mode build**
- *Risk*: Operator runs build with `signing_mode = "operator"` but supplies a dev key by mistake.
- *Mitigation*: AC-4 covers; the gate is fail-closed and logs the offending fingerprint so the operator can correct.
## Runtime Completeness
- **Named capability**: signed Manifest production with content-hash table covering every shipped artifact, D-C10-1 idempotence key (`manifest_hash`), C10-ST-01 operator-mode gate (epic § Acceptance C10-IT-01, C10-IT-02, C10-ST-01).
- **Production code that must exist**: real `ManifestBuilder` orchestrating real `Ed25519ManifestSigner` (cryptography library) + real AZ-280 atomic writes + real C6 `query_by_bbox` to gather tile hashes; real config-driven fingerprint allowlist.
- **Allowed external stubs**: tests MAY use a fake `ManifestSigner` with a known keypair generated in-test + a fake `tile_metadata_store` (AZ-303 conformance fakes); production wiring uses `cryptography.hazmat`.
- **Unacceptable substitutes**: HMAC instead of Ed25519 (different trust model — symmetric vs asymmetric); embedding the signature in the JSON (reintroduces the parse-before-verify problem at takeoff); Python-only `pickle` of the Manifest (not human-inspectable, not canonical-byte stable); skipping the operator-fingerprint allowlist when `signing_mode = "operator"` (defeats C10-ST-01); serializing without sorted keys, e.g. `json.dumps` with its default `sort_keys=False` or `orjson.dumps` without `OPT_SORT_KEYS` (breaks AC-2 determinism and breaks T5's idempotence).