# Common Helper: `Sha256Sidecar`

## Purpose

Atomic-write + SHA-256 content-hash sidecar pattern (D-C10-3). Every persistent artifact that takeoff-load (F2) verifies must be written atomically AND have a `.sha256` sidecar that the verifier can independently recompute. Centralising the pattern avoids two slightly-different implementations across C6 (FAISS index, tile metadata) and C7 (engine + calibration cache) and C10 (Manifest itself).

## Used By

- C6: Tile Cache + Spatial Index (FAISS `.index`, descriptor sidecar; tile pixels do NOT use sidecars individually because there are too many; the Manifest covers the tile-tree hash collectively).
- C7: Inference Runtime (engine cache files + INT8 calibration cache; D-C10-6 calibration-cache trust depends on this).
- C10: Pre-flight Cache Provisioning (Manifest itself; aggregate hash of the cache root).

## Interface (sketch)

```
class Sha256Sidecar:
    @staticmethod
    def write_atomic(path: Path, payload: bytes) -> sha256

    @staticmethod
    def write_atomic_and_sidecar(path: Path, payload: bytes) -> sha256

    @staticmethod
    def verify(path: Path) -> bool  # checks payload hash against sidecar

    @staticmethod
    def aggregate_hash(paths: list[Path]) -> sha256  # for Manifest covering many files
```

## Implementation Notes

- Backed by the `atomicwrites` package for atomic rename and Python's `hashlib.sha256` for digesting.
- Sidecars are written as `.sha256` containing the hex digest (no JSON wrapper, which keeps verification trivial).
- `aggregate_hash` is order-deterministic (sorts paths first) so two runs that read the same files yield the same aggregate.

## Caveats

- The atomic rename is filesystem-level: it works on POSIX local filesystems, not on NFS / SMB / overlayfs. For production deployments the cache root MUST live on a local filesystem.
- The sidecar is NOT cryptographically signed; it protects against accidental corruption + file-replacement-after-staging, NOT against an attacker with write access to the cache root.
  Threat model treats the operator workstation as trusted; the companion's write access is restricted to F4 (mid-flight tile gen) which has its own per-flight signing key path.

## Cycle-1 operational reality

The shipped surface in `src/gps_denied_onboard/helpers/sha256_sidecar.py` (AZ-280) is static-only by design. Atomicity comes from `atomicwrites.atomic_write` (temp-file → `os.replace`). All four entry points wrap `OSError` and `ValueError` into a single exception hierarchy.

- **`Sha256SidecarError`**: single public exception type (subclasses `RuntimeError`). Raised on: `write_atomic` OS failure; `write_atomic_and_sidecar` sidecar OS failure; `verify` finds the sidecar missing for an existing payload; sidecar text not exactly 64 lowercase hex chars; `aggregate_hash` finds a missing or unreadable path.
- **`SIDECAR_SUFFIX = ".sha256"`**: public module-level constant for callers (e.g. takeoff-load verifier listing) that need to spell the sidecar suffix without hard-coding it.
- **Sidecar file format**: pure hex digest, no JSON wrapper, exactly 64 chars, all lowercase. The validator rejects uppercase or wrong-length sidecars hard (catches "user edited the sidecar by hand and broke it"). Keeps verification trivial.
- **Sidecar path appends `.sha256` verbatim**: `Path.with_suffix` would re-interpret an existing extension; we explicitly use `Path(str(payload_path) + ".sha256")`. So `manifest` → `manifest.sha256` AND `engine.engine` → `engine.engine.sha256`. This is the AC-NEW-CACHE-3 / D-C10-3 invariant.
- **Streaming digests**: `verify` and `aggregate_hash` stream the file in 1 MiB chunks (`_digest_file`) so an 8 GB engine file does not require 8 GB of RAM. `write_atomic` is the only entry point that operates on in-memory `bytes`.
- **`verify` semantics**: returns `False` (does not raise) when the payload path is missing entirely ("not verifiable" rather than "verification error"); raises `Sha256SidecarError` when the payload exists but the sidecar is missing, unreadable, or malformed. Callers can branch on `path.exists()` first if they need to distinguish missing-payload from corrupt-sidecar.
- **`aggregate_hash` is byte-deterministic**: the input list is sorted lexicographically by `str(path)` before hashing. The digest is computed over the concatenation of per-file lines, each `\0`-separated and `\n`-terminated (a line carries the basename only, NOT the full path, so the same physical file at a different mount point still produces the same aggregate). Missing paths in the input list raise instead of being silently skipped.

### Cycle-1 task lineage

- AZ-280: initial helper, contract producer.
- No cycle-1 follow-up tasks touched this helper. The C10 / C6 / C7 task batch that consumes it (AZ-301 C7 engine gate, AZ-303 C6 storage interfaces, AZ-305 C6 postgres+filesystem store, AZ-321 C10 engine compiler, AZ-322 C10 descriptor batcher, AZ-323 C10 manifest builder, AZ-324 C10 manifest verifier, AZ-325 C10 cache provisioner) uses the four `Sha256Sidecar` static methods without extending them.
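
## Illustrative sketch (non-normative)

The behaviours described under "Cycle-1 operational reality" can be sketched in a few dozen lines. This is a minimal illustration, not the shipped code in `sha256_sidecar.py`, and it makes several assumptions: it uses free functions rather than static methods, it replaces `atomicwrites` with a stdlib temp-file + `os.replace` stand-in so the block runs without third-party packages, the helper names `sidecar_path` and `digest_file` are hypothetical, the return type is taken to be a hex string, and the per-file aggregate record is assumed to be basename, NUL, hex digest, newline.

```python
import hashlib
import os
import tempfile
from pathlib import Path

SIDECAR_SUFFIX = ".sha256"
CHUNK_SIZE = 1024 * 1024  # 1 MiB chunks, as in the streaming-digest note


class Sha256SidecarError(RuntimeError):
    """Single public exception type for every failure mode in this sketch."""


def sidecar_path(payload: Path) -> Path:
    # Append the suffix verbatim: engine.engine -> engine.engine.sha256.
    return Path(str(payload) + SIDECAR_SUFFIX)


def digest_file(path: Path) -> str:
    # Stream the file so memory use stays at one chunk regardless of file size.
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def write_atomic(path: Path, payload: bytes) -> str:
    # ASSUMPTION: stdlib stand-in for atomicwrites (temp file in the same
    # directory, fsync, then os.replace onto the final name).
    tmp = None
    try:
        fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    except OSError as exc:
        if tmp is not None and os.path.exists(tmp):
            os.unlink(tmp)
        raise Sha256SidecarError(f"atomic write failed: {path}") from exc
    return hashlib.sha256(payload).hexdigest()


def write_atomic_and_sidecar(path: Path, payload: bytes) -> str:
    digest = write_atomic(path, payload)
    write_atomic(sidecar_path(path), digest.encode("ascii"))
    return digest


def verify(path: Path) -> bool:
    if not path.exists():
        return False  # missing payload means "not verifiable", not an error
    try:
        recorded = sidecar_path(path).read_text(encoding="ascii")
    except (OSError, ValueError) as exc:
        raise Sha256SidecarError(f"sidecar missing or unreadable: {path}") from exc
    # Exactly 64 lowercase hex characters, nothing else.
    if len(recorded) != 64 or any(c not in "0123456789abcdef" for c in recorded):
        raise Sha256SidecarError(f"malformed sidecar: {path}")
    return digest_file(path) == recorded


def aggregate_hash(paths: list[Path]) -> str:
    h = hashlib.sha256()
    for p in sorted(paths, key=str):  # lexicographic by str(path)
        try:
            file_digest = digest_file(p)
        except OSError as exc:
            raise Sha256SidecarError(f"missing or unreadable: {p}") from exc
        # ASSUMPTION: record layout is basename, NUL, hex digest, newline.
        h.update(p.name.encode("utf-8") + b"\0" + file_digest.encode("ascii") + b"\n")
    return h.hexdigest()
```

A caller following this sketch would write with `write_atomic_and_sidecar(path, payload)`, later check `verify(path)`, and treat a `False` result on an existing payload as a content mismatch. Because only basenames enter the aggregate, two copies of the same file set under different directories yield the same `aggregate_hash` value.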