gps-denied-onboard/_docs/02_document/common-helpers/05_helper_sha256_sidecar.md
Oleksandr Bezdieniezhnykh ab92946833 [autodev] Step 13 partial: helpers 5-8 cycle-1 doc sync
Batch 5b completes the helpers sweep for cycle-1 Step 13.
For each of the four remaining helpers (sha256_sidecar,
engine_filename_schema, ransac_filter,
descriptor_normaliser):

- Append "Cycle-1 operational reality" section to the
  existing common-helpers/<NN>_*.md, documenting the
  shipped interface, exception types, public constants,
  determinism / validation invariants, and AZ-task
  lineage.

Specific cycle-1 facts captured per helper:

- sha256_sidecar (AZ-280): single Sha256SidecarError
  hierarchy, SIDECAR_SUFFIX public constant, sidecar
  format is pure lowercase 64-char hex (no JSON),
  verbatim ".sha256" suffix append, streaming digests
  in 1 MiB chunks, verify-returns-False semantics for
  missing payload vs. raise for missing sidecar,
  byte-deterministic aggregate_hash with sorted-by-str
  basenames.
- engine_filename_schema (AZ-281):
  EngineFilenameSchemaError, ENGINE_SUFFIX and
  ALLOWED_PRECISIONS public constants, strict model
  validation ([a-z0-9_]+ ≤64 chars no __), dotted
  version regex, non-bool sm validation, matches_host
  ignores precision by design.
- ransac_filter (AZ-282 / AZ-623): RansacFilterError,
  frozen RansacResult dataclass, cv2.setRNGSeed(0)
  determinism, median-not-mean residual, NaN for empty
  inliers, min_inliers is informational only,
  filter_correspondences uses perspectiveTransform vs.
  compute_reprojection_residual uses projectPoints, OK
  to import se3_utils (both Layer 1).
- descriptor_normaliser (AZ-283 / AZ-338):
  DescriptorNormaliserError, ALLOWED_DTYPES =
  (float16, float32), float32 norm computation with
  dtype-preserving cast-back, new
  intra_cluster_normalise method for NetVLAD per-cluster
  L2 (AZ-338), descriptor_metric returns
  "inner_product" string.

Two contract files (descriptor_normaliser.md and
ransac_filter.md) need minor follow-up revisions to
match the shipped surface; queued for the
contracts-folder sweep.

Bumps _docs/_autodev_state.md sub_step to
tests-doc-updates phase 9.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 17:36:47 +03:00


Common Helper — Sha256Sidecar

Purpose

Atomic-write + SHA-256 content-hash sidecar pattern (D-C10-3). Every persistent artifact that takeoff-load (F2) verifies must be written atomically AND have a .sha256 sidecar whose digest the verifier can independently recompute. Centralising the pattern avoids divergent, slightly different implementations across C6 (FAISS index, tile metadata), C7 (engine + calibration cache) and C10 (the Manifest itself).

Used By

  • C6 — Tile Cache + Spatial Index (FAISS .index, descriptor sidecar; tile pixels do NOT use sidecars individually — there are too many; the Manifest covers the tile-tree hash collectively).
  • C7 — Inference Runtime (engine cache files + INT8 calibration cache; D-C10-6 calibration-cache trust depends on this).
  • C10 — Pre-flight Cache Provisioning (Manifest itself; aggregate hash of the cache root).

Interface (sketch)

from pathlib import Path

class Sha256Sidecar:
    @staticmethod
    def write_atomic(path: Path, payload: bytes) -> str: ...               # SHA-256 digest of payload (hex str assumed)
    @staticmethod
    def write_atomic_and_sidecar(path: Path, payload: bytes) -> str: ...   # also writes <path>.sha256
    @staticmethod
    def verify(path: Path) -> bool: ...                                    # checks payload hash against sidecar
    @staticmethod
    def aggregate_hash(paths: list[Path]) -> str: ...                      # for Manifest covering many files

Implementation Notes

  • Backed by the atomicwrites package for atomic rename and Python's hashlib.sha256 for digesting.
  • Sidecars are written as <path>.sha256 containing the hex digest (no JSON wrapper — keeps verification trivial).
  • aggregate_hash is order-deterministic (sorts paths first) so two runs that read the same files yield the same aggregate.
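A minimal sketch of the sidecar convention from these notes, using only hashlib and pathlib. It is not the shipped helper. The names `sidecar_path`, `write_with_sidecar` and `verify_sidecar` are hypothetical. The payload write is deliberately non-atomic for brevity, whereas the real helper goes through atomicwrites.

```python
import hashlib
import tempfile
from pathlib import Path


def sidecar_path(path: Path) -> Path:
    # <path>.sha256: the suffix is appended to the full filename.
    return Path(str(path) + ".sha256")


def write_with_sidecar(path: Path, payload: bytes) -> str:
    # Hypothetical helper. NOT atomic: the real helper writes via atomicwrites.
    digest = hashlib.sha256(payload).hexdigest()
    path.write_bytes(payload)
    # The sidecar holds only the hex digest, with no JSON wrapper to parse.
    sidecar_path(path).write_text(digest)
    return digest


def verify_sidecar(path: Path) -> bool:
    # Verification reduces to comparing two hex strings.
    expected = sidecar_path(path).read_text()
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected


with tempfile.TemporaryDirectory() as d:
    index = Path(d) / "tiles.index"
    write_with_sidecar(index, b"faiss index bytes")
    print(verify_sidecar(index))  # True
```

Because the sidecar is a bare digest, a verifier in any language can check it without a parser. `sha256sum` output is similar but not identical, since it appends the filename.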

Caveats

  • The atomic rename is filesystem-level — works on POSIX local filesystems, not on NFS / SMB / overlayfs. For production deployments the cache root MUST live on a local filesystem.
  • The sidecar is NOT cryptographically signed; it protects against accidental corruption + file-replacement-after-staging, NOT against an attacker with write access to the cache root. Threat model treats the operator workstation as trusted; the companion's write access is restricted to F4 (mid-flight tile gen) which has its own per-flight signing key path.
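To make the second caveat concrete: a plain digest sidecar only catches changes made by something that does not also rewrite the sidecar. The stdlib sketch below uses hypothetical names `write_pair` and `verify`. It shows that a payload-only corruption is detected, while a consistent rewrite of payload and sidecar still verifies.

```python
import hashlib
import tempfile
from pathlib import Path


def write_pair(path: Path, payload: bytes) -> None:
    # Payload plus a bare hex-digest sidecar at <path>.sha256 (no key, no signature).
    path.write_bytes(payload)
    Path(str(path) + ".sha256").write_text(hashlib.sha256(payload).hexdigest())


def verify(path: Path) -> bool:
    expected = Path(str(path) + ".sha256").read_text()
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected


with tempfile.TemporaryDirectory() as d:
    engine = Path(d) / "model.engine"
    write_pair(engine, b"original engine bytes")

    # Accidental corruption: payload changes, sidecar does not, so it is detected.
    engine.write_bytes(b"corrupted engine bytes")
    corruption_detected = not verify(engine)

    # A writer with access to the cache root rewrites BOTH files, so it still verifies.
    write_pair(engine, b"replacement engine bytes")
    replacement_verifies = verify(engine)

print(corruption_detected, replacement_verifies)  # True True
```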

Cycle-1 operational reality

The shipped surface in src/gps_denied_onboard/helpers/sha256_sidecar.py (AZ-280) is static-only by design. Atomicity comes from atomicwrites.atomic_write (temp-file → os.replace). All four entry points wrap OSError and ValueError into a single exception hierarchy.
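The temp-file then os.replace step can be sketched with the standard library alone. This is a stand-in for atomicwrites.atomic_write, not the code in sha256_sidecar.py. `write_atomic_sketch` is a hypothetical name. `Sha256SidecarError` is redeclared here only so the block is self-contained; it subclasses RuntimeError as the exception bullet describes.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class Sha256SidecarError(RuntimeError):
    """Single public exception type (mirrors the cycle-1 surface)."""


def write_atomic_sketch(path: Path, payload: bytes) -> str:
    """Atomically write payload to path and return its SHA-256 hex digest.

    Temp file in the destination directory, fsync, then os.replace, so a
    reader sees either the complete old file or the complete new one.
    """
    digest = hashlib.sha256(payload).hexdigest()
    tmp_name = None
    try:
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # data on disk before the rename publishes it
        os.replace(tmp_name, path)  # atomic rename on a local POSIX filesystem
        tmp_name = None  # renamed away; nothing left to clean up
    except OSError as exc:
        raise Sha256SidecarError(f"atomic write failed for {path}: {exc}") from exc
    finally:
        # On failure, remove the orphaned temp file if it was created.
        if tmp_name is not None and os.path.exists(tmp_name):
            os.unlink(tmp_name)
    return digest
```

The temp file is created next to the target because os.replace may fail when source and destination are on different filesystems.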

  • Sha256SidecarError — single public exception type (subclasses RuntimeError). Raised when: write_atomic hits an OS failure; write_atomic_and_sidecar fails to write the sidecar; verify finds the sidecar missing for an existing payload; the sidecar text is not exactly 64 lowercase hex chars; aggregate_hash finds a missing or unreadable path.
  • SIDECAR_SUFFIX = ".sha256" — public module-level constant for callers (e.g. takeoff-load verifier listing) that need to spell the sidecar suffix without hard-coding it.
  • Sidecar file format — pure hex digest, no JSON wrapper, exactly 64 chars, all lowercase. The validator rejects uppercase or wrong-length sidecars outright (catching the "user edited the sidecar by hand and broke it" case). This keeps verification trivial.
  • Sidecar path appends .sha256 verbatim: Path.with_suffix would replace an existing extension, so we explicitly use Path(str(payload_path) + ".sha256"). So manifest → manifest.sha256 AND engine.engine → engine.engine.sha256 (with_suffix would have produced engine.sha256). This is the AC-NEW-CACHE-3 / D-C10-3 invariant.
  • Streaming digests: verify and aggregate_hash stream the file in 1 MiB chunks (_digest_file), so an 8 GB engine file does not require 8 GB of RAM. write_atomic is the only entry point that operates on in-memory bytes.
  • verify semantics — returns False (not raise) when the payload path is missing entirely ("not verifiable" rather than "verification error"); raises Sha256SidecarError when the payload exists but the sidecar is missing, unreadable, or malformed. Callers can branch on path.exists() first if they need to distinguish missing-payload from corrupt-sidecar.
  • aggregate_hash is byte-deterministic — input list is sorted lexicographically by str(path) before hashing. The digest is computed over the concatenation of <basename>\0<hex-digest>\n lines (basename only, NOT full path, so the same physical file at a different mount point still produces the same aggregate). Missing paths in the input list raise instead of being silently skipped.
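A minimal stdlib sketch of the verification rules listed above. It covers the verbatim suffix, the strict 64-char lowercase-hex sidecar, 1 MiB streaming digests, and False for a missing payload versus an exception for a missing or malformed sidecar. It also covers the sorted `<basename>\0<hex-digest>\n` aggregate. The names verify, aggregate_hash, SIDECAR_SUFFIX and _digest_file follow the text. `sidecar_for` is hypothetical, and the function bodies are assumptions rather than the shipped module. Encoding aggregate lines as UTF-8 and returning the aggregate as hex are also assumptions.

```python
import hashlib
import re
from pathlib import Path

SIDECAR_SUFFIX = ".sha256"
_CHUNK = 1024 * 1024  # 1 MiB streaming chunks
_HEX64 = re.compile(r"[0-9a-f]{64}")  # exactly 64 lowercase hex chars


class Sha256SidecarError(RuntimeError):
    pass


def sidecar_for(payload_path: Path) -> Path:
    # Verbatim append: engine.engine -> engine.engine.sha256
    # (Path.with_suffix would give engine.sha256 instead).
    return Path(str(payload_path) + SIDECAR_SUFFIX)


def _digest_file(path: Path) -> str:
    # Stream the file so memory use stays at one chunk regardless of file size.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(_CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path) -> bool:
    if not path.exists():
        return False  # "not verifiable" rather than "verification error"
    sidecar = sidecar_for(path)
    try:
        expected = sidecar.read_text(encoding="ascii")
    except (OSError, ValueError) as exc:  # missing, unreadable or non-ASCII sidecar
        raise Sha256SidecarError(f"cannot read sidecar {sidecar}") from exc
    if not _HEX64.fullmatch(expected):  # uppercase, wrong length, trailing newline...
        raise Sha256SidecarError(f"malformed sidecar {sidecar}")
    try:
        return _digest_file(path) == expected
    except OSError as exc:
        raise Sha256SidecarError(f"cannot read payload {path}") from exc


def aggregate_hash(paths: list[Path]) -> str:
    h = hashlib.sha256()
    for p in sorted(paths, key=str):  # byte-deterministic input order
        try:
            digest = _digest_file(p)
        except OSError as exc:  # missing paths raise, never silently skipped
            raise Sha256SidecarError(f"cannot hash {p}") from exc
        # Basename only, so a different mount point yields the same aggregate.
        h.update(f"{p.name}\0{digest}\n".encode("utf-8"))
    return h.hexdigest()
```

Note that hashing basenames means two inputs with the same basename in different directories contribute indistinguishable lines. The sketch keeps that property rather than guessing how the shipped helper treats it.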

Cycle-1 task lineage

  • AZ-280 — initial helper, contract producer.
  • No cycle-1 follow-up tasks touched this helper. The C10 / C6 / C7 task batch that consumes it (AZ-301 C7 engine gate, AZ-303 C6 storage interfaces, AZ-305 C6 postgres+filesystem store, AZ-321 C10 engine compiler, AZ-322 C10 descriptor batcher, AZ-323 C10 manifest builder, AZ-324 C10 manifest verifier, AZ-325 C10 cache provisioner) calls the four Sha256Sidecar static methods without extending them.