gps-denied-onboard/_docs/02_tasks/done/AZ-280_sha256_sidecar.md
Oleksandr Bezdieniezhnykh 8e71f6c002 [AZ-266] [AZ-269] [AZ-277] [AZ-280] Cross-cutting log/config + SE3/SHA256 helpers
AZ-266: schema-compliant JSON logging entrypoint, level normalisation,
handler-topology guard, format-error fallback (log_record_schema v1.0.0).
AZ-269: env > YAML > defaults config loader, frozen Config dataclass,
missing-var fail-fast with pointer to .env.example, component-block registry.
AZ-277: GTSAM-backed SE3Utils (matrix<->SE3 + exp/log/adjoint) with strict
orthogonality, dtype, and bottom-row contract enforcement.
AZ-280: atomicwrites-backed write_atomic + independent verify +
order-deterministic aggregate_hash; sidecar format strictness.
pyproject.toml pins gtsam>=4.2,<5.0 and atomicwrites>=1.4,<2.0
(named-backend deps per the AZ-277 / AZ-280 contracts).
139 unit tests pass (44 new). Review verdict: PASS_WITH_WARNINGS;
findings are perf-NFR + journald deferrals, no blocking issues.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 01:33:42 +03:00

Sha256Sidecar Helper Module

Task: AZ-280_sha256_sidecar
Name: Sha256Sidecar Helper
Description: Implement the shared Sha256Sidecar helper that owns the atomic-write + SHA-256 content-hash sidecar pattern (D-C10-3). Every persistent artifact that takeoff-load (F2) must verify gets written atomically AND has a .sha256 sidecar that the verifier can independently recompute. Used by C6 (FAISS index, descriptor sidecar), C7 (engine cache + INT8 calibration cache), C10 (Manifest), and C11 (tile artifact verification). Stateless static-only design.
Complexity: 2 points
Dependencies: AZ-263_initial_structure
Component: shared.helpers.sha256_sidecar (cross-cutting; epic AZ-264 / E-CC-HELPERS)
Tracker: AZ-280
Epic: AZ-264 (E-CC-HELPERS)

Document Dependencies

  • _docs/02_document/contracts/shared_helpers/sha256_sidecar.md — frozen public interface this task produces.
  • _docs/02_document/common-helpers/05_helper_sha256_sidecar.md — design rationale and consumer mapping (D-C10-3).

Problem

The takeoff-load gate (F2) verifies four classes of persistent artifact: FAISS index + descriptor sidecar (C6), TensorRT engine cache + INT8 calibration cache (C7), Manifest (C10), and tile artifacts (C11). Each artifact must be written atomically (no partial files) AND must have a hash sidecar the verifier can independently recompute.

Without a shared helper:

  • C6 / C7 / C10 / C11 each grow their own atomic-write + hash implementation; subtle differences in temp-file naming, rename ordering, or sidecar format break the cross-component verifier the moment one drifts.
  • The Manifest aggregate hash (which covers many files) depends on path-ordering logic that must be implemented in only one place; if that ordering ever differs between a writer and a verifier, the entire cache root looks corrupt.
  • An attacker (or accidental rsync) replaces engine.engine after engine.engine.sha256 was written; without independent verification, takeoff-load accepts the swapped file.

Outcome

  • A single helpers.sha256_sidecar module is the only path through which any onboard process writes hash-verified artifacts.
  • Atomic write is a hard contract: the temp-file → rename pattern guarantees no partial file ever appears at the target path. A fault between the bytes-flushed point and the rename leaves either the previous version or no file at all — never a half-written one.
  • verify(path) recomputes the digest from the file's bytes; it does NOT trust the sidecar's value alone. A swapped artifact with a stale sidecar is detected.
  • aggregate_hash is order-deterministic (sorts paths first), so the Manifest aggregate is reproducible across writer and verifier.
  • The sidecar format is intentionally trivial (lowercase hex digest, no JSON wrapper, no trailing newline) so any small script can verify a single artifact without pulling in the helper.
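The write path listed above can be sketched with the standard library alone. This is an illustrative stand-in, not the helper itself: the task mandates atomicwrites as the rename backend, while this sketch swaps in tempfile.mkstemp plus os.replace so it runs without extra dependencies. The temp-file naming and the choice to write the artifact before its sidecar are assumptions of the sketch.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, payload: bytes) -> None:
    """Write payload to path via temp file + rename (stdlib stand-in)."""
    # The temp file is created in the target directory so the rename
    # never crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # bytes reach the disk before the rename
        os.replace(tmp_name, path)  # atomic on a local POSIX filesystem
    except BaseException:
        # Any failure before the rename completes: drop the temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def write_atomic_and_sidecar(path: Path, payload: bytes) -> str:
    """Write the artifact and its <name>.sha256 sidecar; return the digest."""
    digest = hashlib.sha256(payload).hexdigest()  # 64 lowercase hex chars
    write_atomic(path, payload)
    sidecar = path.with_name(path.name + ".sha256")
    # Raw hex only: no JSON wrapper, no trailing newline.
    write_atomic(sidecar, digest.encode("ascii"))
    return digest
```

Because the sidecar holds nothing but the hex digest, a consumer can compare it against hashlib.sha256(path.read_bytes()).hexdigest() without importing the helper.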

Scope

Included

  • Sha256Sidecar static methods: write_atomic, write_atomic_and_sidecar, verify, aggregate_hash.
  • Sha256SidecarError exception type wrapping underlying OSError and capturing missing/malformed sidecar conditions.
  • Public interface contract published at _docs/02_document/contracts/shared_helpers/sha256_sidecar.md.

Excluded

  • Cryptographic signing — this helper is corruption + accidental-replacement defense only; signing is out of scope (mid-flight tile gen has its own per-flight signing key path elsewhere).
  • Streaming hashing for payloads larger than RAM — out of scope; the helper's API is payload: bytes.
  • Compression / on-disk encoding — payloads are written verbatim.
  • Sidecar versioning — there is no version byte.
  • Filesystem-type detection (warning when run on NFS / overlayfs) — documented in contract Caveats; not enforced at runtime.

Acceptance Criteria

AC-1: Round-trip write + verify
Given a 1 MiB random payload
When write_atomic_and_sidecar(path, payload) runs followed by verify(path)
Then verify returns True AND the sidecar at path.sha256 contains a 64-char lowercase hex digest matching hashlib.sha256(payload).hexdigest()

AC-2: Atomicity (no partial file on fault)
Given a fault is injected between the temp-file flush and the rename (e.g., monkey-patch os.replace to raise OSError)
When write_atomic(path, payload) runs and raises
Then path does NOT exist (or, if it pre-existed, its bytes are unchanged); no *.tmp or partial file remains at the target name
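A fault-injection check along the lines of AC-2 might look like the sketch below. The write_atomic shown here is a hypothetical stdlib stand-in (the real helper is atomicwrites-backed and would raise Sha256SidecarError rather than a bare OSError), and check_no_partial_file_on_fault is an illustrative name, not part of the contract.

```python
import os
import tempfile
from pathlib import Path
from unittest import mock


def write_atomic(path: Path, payload: bytes) -> None:
    """Hypothetical stand-in: temp file in the target dir, then os.replace."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # leave nothing partial behind
        raise


def check_no_partial_file_on_fault(workdir: Path) -> None:
    target = workdir / "artifact.bin"
    target.write_bytes(b"previous version")

    # Inject the fault at the rename step, after the bytes were flushed.
    with mock.patch("os.replace", side_effect=OSError("injected fault")):
        try:
            write_atomic(target, b"new version")
        except OSError:
            pass
        else:
            raise AssertionError("write_atomic should have raised")

    # The pre-existing bytes are unchanged and no temp file survives.
    assert target.read_bytes() == b"previous version"
    assert [p.name for p in workdir.iterdir()] == ["artifact.bin"]
```

Patching os.replace works here because the stand-in looks the function up on the os module at call time; a backend that binds the rename function differently would need the patch applied at its own import site.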

AC-3: Independent verification rejects swapped payloads
Given an artifact is written via write_atomic_and_sidecar, then the file at path is overwritten out-of-band with different bytes
When verify(path) runs
Then it returns False (NOT True; it must NOT trust the sidecar value alone)
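The independence requirement in AC-3 comes down to hashing the artifact's current bytes and comparing against the recorded value. A minimal sketch, assuming the sidecar sits next to the artifact as <name>.sha256 and already holds a well-formed digest (the missing and malformed cases of AC-4 and AC-5 are deliberately left out):

```python
import hashlib
from pathlib import Path


def verify(path: Path) -> bool:
    """Return True only if the recomputed digest equals the sidecar's value."""
    sidecar = path.with_name(path.name + ".sha256")
    recorded = sidecar.read_bytes().decode("ascii")
    # Recompute from the artifact itself; the sidecar is only the reference.
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    return actual == recorded
```

With this shape, an engine.engine file replaced after engine.engine.sha256 was written yields a different recomputed digest, so the call returns False.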

AC-4: Missing sidecar is an error, not False
Given an artifact exists at path but path.sha256 was deleted
When verify(path) runs
Then Sha256SidecarError is raised with a message naming the missing sidecar (the helper does NOT silently return False, because that would conflate "corrupt artifact" with "missing sidecar")

AC-5: Malformed sidecar is rejected
Given a sidecar whose content is not a hex digest, or is a digest of the wrong length
When verify(path) runs
Then Sha256SidecarError is raised mentioning malformed sidecar content
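AC-4 and AC-5 both concern the step that reads the sidecar before any comparison happens. One way to keep the two failure modes distinct is sketched below. read_sidecar_digest is a hypothetical internal function name, the Sha256SidecarError class is a local stand-in, and treating uppercase hex or a trailing newline as malformed is an assumption carried over from the byte-level format in AC-8.

```python
import re
from pathlib import Path

# Exactly 64 lowercase hex characters and nothing else.
_HEX64 = re.compile(rb"[0-9a-f]{64}")


class Sha256SidecarError(Exception):
    """Local stand-in for the helper's exception type."""


def read_sidecar_digest(path: Path) -> str:
    """Return the digest recorded for path, or raise on a bad sidecar."""
    sidecar = path.with_name(path.name + ".sha256")
    try:
        raw = sidecar.read_bytes()
    except FileNotFoundError as exc:
        raise Sha256SidecarError(f"missing sidecar: {sidecar}") from exc
    # fullmatch rejects wrong length, non-hex bytes, and trailing newlines.
    if _HEX64.fullmatch(raw) is None:
        raise Sha256SidecarError(f"malformed sidecar content in {sidecar}")
    return raw.decode("ascii")
```

Raising in both cases, rather than returning False, keeps "the artifact is corrupt" separate from "the reference value is unusable" for the caller.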

AC-6: Aggregate hash is order-deterministic
Given three files a, b, c and their hashes
When aggregate_hash([a, b, c]) and aggregate_hash([c, a, b]) run
Then both calls return the same hex digest (the implementation sorts paths internally)

AC-7: Aggregate hash rejects missing files
Given a list including a non-existent path
When aggregate_hash runs
Then Sha256SidecarError is raised mentioning the missing path
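The behaviour pinned by AC-6 and AC-7 can be sketched as follows. The per-file line format follows the one quoted under Risk 2 (a name, a NUL byte, the file's hex digest, a newline, sorted by full path); feeding those lines into an outer SHA-256 and using the full path string as the name are assumptions of this sketch, and the authoritative format is whatever the contract file specifies.

```python
import hashlib
from pathlib import Path
from typing import Iterable


class Sha256SidecarError(Exception):
    """Local stand-in for the helper's exception type."""


def aggregate_hash(paths: Iterable[Path]) -> str:
    """Digest over per-file digests, independent of the caller's ordering."""
    outer = hashlib.sha256()
    # Sort by the full path string so input order never matters.
    for path in sorted(paths, key=str):
        try:
            file_digest = hashlib.sha256(path.read_bytes()).hexdigest()
        except FileNotFoundError as exc:
            raise Sha256SidecarError(f"missing path: {path}") from exc
        # One "<name>\0<hex-digest>\n" line per file (name = full path here).
        line = str(path).encode("utf-8") + b"\0" + file_digest.encode("ascii") + b"\n"
        outer.update(line)
    return outer.hexdigest()
```

Sorting inside the function, instead of asking callers to pre-sort, is what lets a writer and a verifier pass the same set of paths in any order and still agree.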

AC-8: Sidecar format strictness
Given the sidecar written by write_atomic_and_sidecar
When the file's bytes are read
Then the bytes are EXACTLY the 64-char lowercase hex digest: no JSON wrapper, no trailing newline, no whitespace

AC-9: No upward imports (Layer 1 invariant)
Given the helper module
When a static-import check runs
Then it imports ONLY from _types, atomicwrites, hashlib, pathlib, and stdlib; no gps_denied_onboard.components.* imports anywhere
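The static-import check in AC-9 could be approximated with the ast module when a dedicated linter is not available. The sketch below is an assumption about how such a gate might be written, not the project's actual gate; forbidden_imports is a hypothetical name, and only absolute imports are inspected (relative imports are skipped).

```python
import ast

FORBIDDEN_PREFIX = "gps_denied_onboard.components"


def _is_forbidden(name: str) -> bool:
    return name == FORBIDDEN_PREFIX or name.startswith(FORBIDDEN_PREFIX + ".")


def forbidden_imports(source: str) -> list[str]:
    """Return every absolute import target that reaches into components."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Joining module and name also catches "from pkg import components".
            candidates = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        found.extend(c for c in candidates if _is_forbidden(c))
    return found
```

A gate built on this would read the helper module's source text, call forbidden_imports on it, and fail when the returned list is non-empty.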

Non-Functional Requirements

Performance

  • No specific latency budget per _docs/02_document/common-helpers/05_helper_sha256_sidecar.md (consumers are pre-flight / post-landing). Sanity bound: write_atomic_and_sidecar of a 1 MiB payload ≤ 50 ms on Tier-2.

Reliability

  • Sha256SidecarError is the ONLY exception type the public surface raises on filesystem / sidecar errors. OSError MUST be wrapped so callers do not have to handle two error hierarchies.
  • Purely deterministic: the same payload always produces the same digest.
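The single-exception rule above can be enforced in one place with a small context manager that every filesystem call goes through. This is a sketch of the pattern only: wrapping_os_errors is a hypothetical name, the message format is illustrative, and the exception class here is a local stand-in for the helper's type.

```python
from contextlib import contextmanager
from typing import Iterator


class Sha256SidecarError(Exception):
    """Local stand-in for the helper's single public exception type."""


@contextmanager
def wrapping_os_errors(action: str) -> Iterator[None]:
    """Re-raise any OSError from the with-body as Sha256SidecarError."""
    try:
        yield
    except OSError as exc:
        # "from exc" keeps the original errno and traceback on __cause__.
        raise Sha256SidecarError(f"{action} failed: {exc}") from exc
```

Callers then catch only Sha256SidecarError, while the underlying OSError stays reachable through the exception's __cause__ attribute for diagnostics.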

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | Round-trip write + verify on 1 MiB random payload | sidecar matches hashlib.sha256(payload).hexdigest(); verify True |
| AC-2 | Inject OSError between flush and rename | no partial file remains at target name |
| AC-3 | Overwrite payload after sidecar is written | verify returns False |
| AC-4 | Delete sidecar; call verify | Sha256SidecarError; mentions missing sidecar |
| AC-5 | Malformed sidecar content | Sha256SidecarError; mentions malformed sidecar |
| AC-6 | aggregate_hash with two different orderings | byte-equal digests |
| AC-7 | aggregate_hash with a missing path | Sha256SidecarError; mentions missing path |
| AC-8 | Read sidecar bytes after write_atomic_and_sidecar | exactly 64 hex chars; no newline / whitespace / JSON |
| AC-9 | importlinter / grep gate | no components.* imports |
| NFR-perf | Microbench write_atomic_and_sidecar of 1 MiB payload | ≤ 50 ms on Tier-2 |

Constraints

  • Public surface frozen by _docs/02_document/contracts/shared_helpers/sha256_sidecar.md v1.0.0.
  • Layer 1 Foundation only.
  • atomicwrites is the single atomic-rename backend; pinned in pyproject.toml at AZ-263 / E-BOOT.
  • Static-only design satisfies coderule.mdc.
  • No new dependency beyond what AZ-263 / E-BOOT pinned.
  • Production cache root MUST live on a local POSIX filesystem (NFS / SMB / overlayfs are unsupported per the helper's atomic-rename invariant). Documented in deployment artifacts; not enforced at runtime.

Risks & Mitigation

Risk 1: A future helper change relaxes atomicity to "best-effort"

  • Risk: Someone replaces the temp-file → rename pattern with a direct write under the rationale "rename is slow on certain filesystems"; takeoff-load occasionally sees partial files.
  • Mitigation: AC-2 makes atomicity a hard test. Any regression that loses the rename is caught immediately.

Risk 2: aggregate_hash ordering drifts between writer and verifier

  • Risk: A future change adds case-insensitive sorting or strips path prefixes; writer and verifier disagree; cache root looks corrupt.
  • Mitigation: AC-6 pins the deterministic-ordering invariant; the contract spells out the exact format (<filename>\0<file-hex-digest>\n lines, lexicographically sorted by full path).

Risk 3: Sidecar format ambiguity (someone wraps the digest in JSON)

  • Risk: A future contributor "improves" the sidecar to be JSON for "extensibility"; verification scripts that expect raw hex break.
  • Mitigation: AC-8 pins the exact byte-level format. Versioning rules force a major bump for any format change.

Runtime Completeness

  • Named capability: atomic-write + SHA-256 content-hash sidecar (D-C10-3 / 05_helper_sha256_sidecar.md).
  • Production code that must exist: real atomicwrites-backed atomic rename; real hashlib.sha256 digesting; real independent verify.
  • Allowed external stubs: none. hashlib is part of the stdlib and atomicwrites is a pinned production dependency, so neither needs a stub.
  • Unacceptable substitutes: direct write (loses atomicity); trusting the sidecar value without recomputing the file's hash; JSON-wrapped sidecar; case-insensitive aggregate ordering.

Contract

This task produces the contract at _docs/02_document/contracts/shared_helpers/sha256_sidecar.md. Consumers MUST read that file — not this task spec — to discover the interface.