# Sha256Sidecar Helper Module

**Task**: AZ-280_sha256_sidecar
**Name**: Sha256Sidecar Helper
**Description**: Implement the shared `Sha256Sidecar` helper that owns the atomic-write + SHA-256 content-hash sidecar pattern (D-C10-3). Every persistent artifact that takeoff-load (F2) must verify gets written atomically AND has a `.sha256` sidecar that the verifier can independently recompute. Used by C6 (FAISS index, descriptor sidecar), C7 (engine cache + INT8 calibration cache), C10 (Manifest), and C11 (tile artifact verification). Stateless static-only design.
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.helpers.sha256_sidecar (cross-cutting; epic AZ-264 / E-CC-HELPERS)
**Tracker**: AZ-280
**Epic**: AZ-264 (E-CC-HELPERS)

### Document Dependencies

- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`: frozen public interface this task produces.
- `_docs/02_document/common-helpers/05_helper_sha256_sidecar.md`: design rationale and consumer mapping (D-C10-3).

## Problem

The takeoff-load gate (F2) verifies four classes of persistent artifact: FAISS index + descriptor sidecar (C6), TensorRT engine cache + INT8 calibration cache (C7), Manifest (C10), and tile artifacts (C11). Each artifact must be written atomically (no partial files) AND must have a hash sidecar the verifier can independently recompute.

Without a shared helper:

- C6 / C7 / C10 / C11 each grow their own atomic-write + hash implementation; subtle differences in temp-file naming, rename ordering, or sidecar format break the cross-component verifier the moment one drifts.
- The Manifest aggregate hash (which covers many files) goes through path-ordering logic that is implemented in only one place; if that ordering ever differs across a writer and a verifier, the entire cache root looks corrupt.
- An attacker (or accidental `rsync`) replaces `engine.engine` after `engine.engine.sha256` was written; without independent verification, takeoff-load accepts the swapped file.

## Outcome

- A single `helpers.sha256_sidecar` module is the only path through which any onboard process writes hash-verified artifacts.
- Atomic write is a hard contract: the temp-file → rename pattern guarantees no partial file ever appears at the target path. A fault between the bytes-flushed point and the rename leaves either the previous version or no file at all, never a half-written one.
- `verify(path)` recomputes the digest from the file's bytes; it does NOT trust the sidecar's value alone. A swapped artifact with a stale sidecar is detected.
- `aggregate_hash` is order-deterministic (sorts paths first), so the Manifest aggregate is reproducible across writer and verifier.
- The sidecar format is intentionally trivial (lowercase hex digest, no JSON wrapper, no trailing newline) so any small script can verify a single artifact without pulling in the helper.

## Scope

### Included

- `Sha256Sidecar` static methods: `write_atomic`, `write_atomic_and_sidecar`, `verify`, `aggregate_hash`.
- `Sha256SidecarError` exception type wrapping underlying `OSError` and capturing missing/malformed sidecar conditions.
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`.

### Excluded

- Cryptographic signing: this helper is corruption + accidental-replacement defense only; signing is out of scope (mid-flight tile gen has its own per-flight signing key path elsewhere).
- Streaming hashing for payloads larger than RAM: out of scope; the helper's API is `payload: bytes`.
- Compression / on-disk encoding: payloads are written verbatim.
- Sidecar versioning: there is no version byte.
- Filesystem-type detection (warning when run on NFS / overlayfs): documented in contract Caveats; not enforced at runtime.
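The write and verify behaviour listed under Outcome and Scope can be sketched roughly as follows. This is a minimal illustration, not the frozen interface: the method signatures are assumptions, the sidecar path is assumed to be the artifact path with `.sha256` appended, and the stdlib `tempfile` + `os.replace` pair is swapped in for the `atomicwrites` backend so the sketch runs without third-party packages. The contract file remains authoritative.

```python
import contextlib
import hashlib
import os
import re
import tempfile
from pathlib import Path

_HEX64 = re.compile(rb"[0-9a-f]{64}")  # exactly 64 lowercase hex chars


class Sha256SidecarError(Exception):
    """Single error type for filesystem and sidecar failures."""


class Sha256Sidecar:
    """Illustrative static-only helper (signatures are assumptions)."""

    @staticmethod
    def write_atomic(path: Path, payload: bytes) -> None:
        path = Path(path)
        tmp_name = None
        try:
            # Temp file lives next to the target so the rename stays on one filesystem.
            fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(payload)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_name, path)  # stand-in for the atomicwrites backend
        except OSError as exc:
            # Remove the temp file so nothing partial is left behind.
            if tmp_name is not None:
                with contextlib.suppress(OSError):
                    os.unlink(tmp_name)
            raise Sha256SidecarError(f"atomic write failed for {path}: {exc}") from exc

    @staticmethod
    def write_atomic_and_sidecar(path: Path, payload: bytes) -> None:
        digest = hashlib.sha256(payload).hexdigest()
        Sha256Sidecar.write_atomic(path, payload)
        # Sidecar holds the bare digest: no newline, no JSON wrapper.
        Sha256Sidecar.write_atomic(Path(str(path) + ".sha256"), digest.encode("ascii"))

    @staticmethod
    def verify(path: Path) -> bool:
        sidecar = Path(str(path) + ".sha256")
        try:
            recorded = sidecar.read_bytes()
        except FileNotFoundError as exc:
            raise Sha256SidecarError(f"missing sidecar: {sidecar}") from exc
        except OSError as exc:
            raise Sha256SidecarError(f"unreadable sidecar: {sidecar}") from exc
        if not _HEX64.fullmatch(recorded):
            raise Sha256SidecarError(f"malformed sidecar content: {sidecar}")
        try:
            # Recompute from the artifact bytes; never trust the sidecar alone.
            actual = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        except OSError as exc:
            raise Sha256SidecarError(f"cannot read artifact: {path}") from exc
        return actual.encode("ascii") == recorded
```

In this sketch a mismatch returns `False`, while a missing or malformed sidecar raises, which keeps "corrupt artifact" and "broken sidecar" distinguishable for the caller.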

## Acceptance Criteria

**AC-1: Round-trip write + verify**
Given a 1 MiB random payload
When `write_atomic_and_sidecar(path, payload)` runs followed by `verify(path)`
Then `verify` returns True AND the sidecar at `path.sha256` contains a 64-char lowercase hex digest matching `hashlib.sha256(payload).hexdigest()`

**AC-2: Atomicity (no partial file on fault)**
Given a fault is injected between the temp-file flush and the rename (e.g., monkey-patch `os.replace` to raise `OSError`)
When `write_atomic(path, payload)` runs and raises
Then `path` does NOT exist (or, if it pre-existed, its bytes are unchanged); no `*.tmp` or partial file remains at the target name

**AC-3: Independent verification rejects swapped payloads**
Given an artifact is written via `write_atomic_and_sidecar`, then the file at `path` is overwritten out-of-band with different bytes
When `verify(path)` runs
Then it returns False (NOT True; it must NOT trust the sidecar value alone)

**AC-4: Missing sidecar is an error, not False**
Given an artifact exists at `path` but `path.sha256` was deleted
When `verify(path)` runs
Then `Sha256SidecarError` is raised with a message naming the missing sidecar (the helper does NOT silently return False; that would conflate "corrupt artifact" with "missing sidecar")

**AC-5: Malformed sidecar is rejected**
Given a sidecar containing `not a hex digest` or a digest of wrong length
When `verify(path)` runs
Then `Sha256SidecarError` is raised mentioning malformed sidecar content

**AC-6: Aggregate hash is order-deterministic**
Given three files `a`, `b`, `c` and their hashes
When `aggregate_hash([a, b, c])` and `aggregate_hash([c, a, b])` run
Then both calls return the same hex digest (the implementation sorts paths internally)

**AC-7: Aggregate hash rejects missing files**
Given a list including a non-existent path
When `aggregate_hash` runs
Then `Sha256SidecarError` is raised mentioning the missing path

**AC-8: Sidecar format strictness**
Given the sidecar written by `write_atomic_and_sidecar`
When the file's bytes are read
Then the bytes are EXACTLY the 64-char lowercase hex digest: no JSON wrapper, no trailing newline, no whitespace

**AC-9: No upward imports (Layer 1 invariant)**
Given the helper module
When a static-import check runs
Then it imports ONLY from `_types`, `atomicwrites`, `hashlib`, `pathlib`, and stdlib; no `gps_denied_onboard.components.*` imports anywhere

## Non-Functional Requirements

**Performance**
- No specific latency budget per `_docs/02_document/common-helpers/05_helper_sha256_sidecar.md` (consumers are pre-flight / post-landing). Sanity bound: `write_atomic_and_sidecar` of a 1 MiB payload ≤ 50 ms on Tier-2.

**Reliability**
- `Sha256SidecarError` is the ONLY exception type the public surface raises on filesystem / sidecar errors. `OSError` MUST be wrapped so callers do not have to handle two error hierarchies.
- Purely deterministic: the same payload always produces the same digest.

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Round-trip write + verify on 1 MiB random payload | sidecar matches `hashlib.sha256(payload).hexdigest()`; `verify` True |
| AC-2 | Inject `OSError` between flush and rename | no partial file remains at target name |
| AC-3 | Overwrite payload after sidecar is written | `verify` returns False |
| AC-4 | Delete sidecar; call `verify` | `Sha256SidecarError`; mentions missing sidecar |
| AC-5 | Malformed sidecar content | `Sha256SidecarError`; mentions malformed sidecar |
| AC-6 | `aggregate_hash` with two different orderings | byte-equal digests |
| AC-7 | `aggregate_hash` with a missing path | `Sha256SidecarError`; mentions missing path |
| AC-8 | Read sidecar bytes after `write_atomic_and_sidecar` | exactly 64 hex chars; no newline / whitespace / JSON |
| AC-9 | importlinter / grep gate | no `components.*` imports |
| NFR-perf | Microbench `write_atomic_and_sidecar` of 1 MiB payload | ≤ 50 ms on Tier-2 |

## Constraints

- Public surface frozen by `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` v1.0.0.
- Layer 1 Foundation only.
- `atomicwrites` is the single atomic-rename backend; pinned in `pyproject.toml` at AZ-263 / E-BOOT.
- Static-only design satisfies `coderule.mdc`.
- No new dependency beyond what AZ-263 / E-BOOT pinned.
- Production cache root MUST live on a local POSIX filesystem (NFS / SMB / overlayfs are unsupported per the helper's atomic-rename invariant). Documented in deployment artifacts; not enforced at runtime.

## Risks & Mitigation

**Risk 1: A future helper change relaxes atomicity to "best-effort"**
- *Risk*: Someone replaces the temp-file → rename pattern with a direct write under the rationale "rename is slow on certain filesystems"; takeoff-load occasionally sees partial files.
- *Mitigation*: AC-2 makes atomicity a hard test. Any regression that loses the rename is caught immediately.

**Risk 2: `aggregate_hash` ordering drifts between writer and verifier**
- *Risk*: A future change adds case-insensitive sorting or strips path prefixes; writer and verifier disagree; cache root looks corrupt.
- *Mitigation*: AC-6 pins the deterministic-ordering invariant; the contract spells out the exact format (`\0\n` lines, lexicographically sorted by full path).

**Risk 3: Sidecar format ambiguity (someone wraps the digest in JSON)**
- *Risk*: A future contributor "improves" the sidecar to be JSON for "extensibility"; verification scripts that expect raw hex break.
- *Mitigation*: AC-8 pins the exact byte-level format. Versioning rules force a major bump for any format change.

## Runtime Completeness

- **Named capability**: atomic-write + SHA-256 content-hash sidecar (D-C10-3 / `05_helper_sha256_sidecar.md`).
- **Production code that must exist**: real `atomicwrites`-backed atomic rename; real `hashlib.sha256` digesting; real independent verify.
- **Allowed external stubs**: none; `hashlib` is stdlib and `atomicwrites` is a pinned production dependency, so neither needs a stub.
- **Unacceptable substitutes**: direct write (loses atomicity); trusting the sidecar value without recomputing the file's hash; JSON-wrapped sidecar; case-insensitive aggregate ordering.

## Contract

This task produces the contract at `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`. Consumers MUST read that file, not this task spec, to discover the interface.
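
For orientation only, the order-determinism invariant pinned by AC-6 and AC-7 might look like the sketch below. The function is a hypothetical standalone version of `aggregate_hash`. The per-file line layout (path, NUL byte, hex digest, newline) is an assumption made for illustration, as is UTF-8 encoding of the path. The contract file defines the real format and stays authoritative.

```python
import hashlib
from pathlib import Path
from typing import Iterable


class Sha256SidecarError(Exception):
    """Raised when an input path is missing or unreadable."""


def aggregate_hash(paths: Iterable[Path]) -> str:
    """Hash many files into one digest, independent of caller ordering."""
    outer = hashlib.sha256()
    # Case-sensitive lexicographic sort on the full path string.
    for name in sorted(str(p) for p in paths):
        try:
            file_digest = hashlib.sha256(Path(name).read_bytes()).hexdigest()
        except OSError as exc:
            raise Sha256SidecarError(
                f"cannot hash missing or unreadable path: {name}"
            ) from exc
        # ASSUMED line layout: path, NUL, hex digest, LF.
        outer.update(name.encode("utf-8") + b"\0" + file_digest.encode("ascii") + b"\n")
    return outer.hexdigest()
```

Because the sort happens inside the function, a writer and a verifier that enumerate the cache root in different orders still arrive at the same digest.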