[autodev] Step 13 partial: helpers 5-8 cycle-1 doc sync

Batch 5b completes the helpers sweep for cycle-1 Step 13.
For each of the four remaining helpers (sha256_sidecar,
engine_filename_schema, ransac_filter,
descriptor_normaliser):

- Append "Cycle-1 operational reality" section to the
  existing common-helpers/<NN>_*.md, documenting the
  shipped interface, exception types, public constants,
  determinism / validation invariants, and AZ-task
  lineage.

Specific cycle-1 facts captured per helper:

- sha256_sidecar (AZ-280): single Sha256SidecarError
  hierarchy, SIDECAR_SUFFIX public constant, sidecar
  format is pure lowercase 64-char hex (no JSON),
  verbatim ".sha256" suffix append, streaming digests
  in 1 MiB chunks, verify-returns-False semantics for
  missing payload vs. raise for missing sidecar,
  byte-deterministic aggregate_hash with sorted-by-str
  basenames.
- engine_filename_schema (AZ-281):
  EngineFilenameSchemaError, ENGINE_SUFFIX and
  ALLOWED_PRECISIONS public constants, strict model
  validation ([a-z0-9_]+ ≤64 chars no __), dotted
  version regex, non-bool sm validation, matches_host
  ignores precision by design.
- ransac_filter (AZ-282 / AZ-623): RansacFilterError,
  frozen RansacResult dataclass, cv2.setRNGSeed(0)
  determinism, median-not-mean residual, NaN for empty
  inliers, min_inliers is informational only,
  filter_correspondences uses perspectiveTransform vs.
  compute_reprojection_residual uses projectPoints, OK
  to import se3_utils (both Layer 1).
- descriptor_normaliser (AZ-283 / AZ-338):
  DescriptorNormaliserError, ALLOWED_DTYPES =
  (float16, float32), float32 norm computation with
  dtype-preserving cast-back, new
  intra_cluster_normalise method for NetVLAD per-cluster
  L2 (AZ-338), descriptor_metric returns
  "inner_product" string.

Two contract files (descriptor_normaliser.md and
ransac_filter.md mention follow-up) need follow-up
minor revisions to match shipped surface; queued for
the contracts-folder sweep.

Bumps _docs/_autodev_state.md sub_step to
tests-doc-updates phase 9.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-19 17:36:47 +03:00
parent 4fdf1968af
commit ab92946833
5 changed files with 71 additions and 3 deletions
@@ -34,3 +34,20 @@ class Sha256Sidecar:
- The atomic rename is filesystem-level — works on POSIX local filesystems, not on NFS / SMB / overlayfs. For production deployments the cache root MUST live on a local filesystem.
- The sidecar is NOT cryptographically signed; it protects against accidental corruption + file-replacement-after-staging, NOT against an attacker with write access to the cache root. Threat model treats the operator workstation as trusted; the companion's write access is restricted to F4 (mid-flight tile gen) which has its own per-flight signing key path.
## Cycle-1 operational reality
The shipped surface in `src/gps_denied_onboard/helpers/sha256_sidecar.py` (AZ-280) is static-only by design. Atomicity comes from `atomicwrites.atomic_write` (temp-file → `os.replace`). All four entry points wrap `OSError` and `ValueError` into a single exception hierarchy.
- **`Sha256SidecarError`** — single public exception type (subclasses `RuntimeError`). Raised on: `write_atomic` OS failure; `write_atomic_and_sidecar` sidecar OS failure; `verify` finds the sidecar missing for an existing payload; sidecar text not exactly 64 lowercase hex chars; `aggregate_hash` finds a missing or unreadable path.
- **`SIDECAR_SUFFIX = ".sha256"`** — public module-level constant for callers (e.g. takeoff-load verifier listing) that need to spell the sidecar suffix without hard-coding it.
- **Sidecar file format** — pure hex digest, no JSON wrapper, exactly 64 chars, all lowercase. The validator rejects uppercase or wrong-length sidecars hard (catches "user edited the sidecar by hand and broke it"). Keeps verification trivial.
- **Sidecar path appends `.sha256` verbatim** — `Path.with_suffix` would re-interpret an existing extension; we explicitly use `Path(str(payload_path) + ".sha256")`. So `manifest``manifest.sha256` AND `engine.engine``engine.engine.sha256`. This is the AC-NEW-CACHE-3 / D-C10-3 invariant.
- **Streaming digests** — `verify` and `aggregate_hash` stream the file in 1 MiB chunks (`_digest_file`) so an 8 GB engine file does not require 8 GB of RAM. `write_atomic` is the only entry point that operates on in-memory `bytes`.
- **`verify` semantics** — returns `False` (not raise) when the payload path is missing entirely ("not verifiable" rather than "verification error"); raises `Sha256SidecarError` when the payload exists but the sidecar is missing, unreadable, or malformed. Callers can branch on `path.exists()` first if they need to distinguish missing-payload from corrupt-sidecar.
- **`aggregate_hash` is byte-deterministic** — input list is sorted lexicographically by `str(path)` before hashing. The digest is computed over the concatenation of `<basename>\0<hex-digest>\n` lines (basename only, NOT full path, so the same physical file at a different mount point still produces the same aggregate). Missing paths in the input list raise instead of being silently skipped.
### Cycle-1 task lineage
- AZ-280 — initial helper, contract producer.
- No cycle-1 follow-up tasks touched this helper. The C10 / C6 / C7 task batch that consumes it (AZ-301 C7 engine gate, AZ-303 C6 storage interfaces, AZ-305 C6 postgres+filesystem store, AZ-321 C10 engine compiler, AZ-322 C10 descriptor batcher, AZ-323 C10 manifest builder, AZ-324 C10 manifest verifier, AZ-325 C10 cache provisioner) cycles through the four `Sha256Sidecar` static methods without extending them.