[autodev] Step 13 partial: helpers 5-8 cycle-1 doc sync

Batch 5b completes the helpers sweep for cycle-1 Step 13.
For each of the four remaining helpers (sha256_sidecar,
engine_filename_schema, ransac_filter,
descriptor_normaliser):

- Append a "Cycle-1 operational reality" section to the
  existing common-helpers/<NN>_*.md, documenting the
  shipped interface, exception types, public constants,
  determinism / validation invariants, and AZ-task
  lineage.

Specific cycle-1 facts captured per helper:

- sha256_sidecar (AZ-280): single Sha256SidecarError
  hierarchy, SIDECAR_SUFFIX public constant, sidecar
  format is pure lowercase 64-char hex (no JSON),
  verbatim ".sha256" suffix append, streaming digests
  in 1 MiB chunks, verify-returns-False semantics for
  missing payload vs. raise for missing sidecar,
  byte-deterministic aggregate_hash with sorted-by-str
  basenames.
- engine_filename_schema (AZ-281):
  EngineFilenameSchemaError, ENGINE_SUFFIX and
  ALLOWED_PRECISIONS public constants, strict model
  validation ([a-z0-9_]+ ≤64 chars no __), dotted
  version regex, non-bool sm validation, matches_host
  ignores precision by design.
- ransac_filter (AZ-282 / AZ-623): RansacFilterError,
  frozen RansacResult dataclass, cv2.setRNGSeed(0)
  determinism, median-not-mean residual, NaN for empty
  inliers, min_inliers is informational only,
  filter_correspondences uses perspectiveTransform vs.
  compute_reprojection_residual uses projectPoints, OK
  to import se3_utils (both Layer 1).
- descriptor_normaliser (AZ-283 / AZ-338):
  DescriptorNormaliserError, ALLOWED_DTYPES =
  (float16, float32), float32 norm computation with
  dtype-preserving cast-back, new
  intra_cluster_normalise method for NetVLAD per-cluster
  L2 (AZ-338), descriptor_metric returns
  "inner_product" string.

Two contract files (descriptor_normaliser.md and
ransac_filter.md) need follow-up minor revisions to
match the shipped surface; both are queued for the
contracts-folder sweep.

Bumps _docs/_autodev_state.md sub_step to
tests-doc-updates phase 9.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-19 17:36:47 +03:00
parent 4fdf1968af
commit ab92946833
5 changed files with 71 additions and 3 deletions
@@ -34,3 +34,20 @@ class Sha256Sidecar:
- The atomic rename is filesystem-level — works on POSIX local filesystems, not on NFS / SMB / overlayfs. For production deployments the cache root MUST live on a local filesystem.
- The sidecar is NOT cryptographically signed; it protects against accidental corruption + file-replacement-after-staging, NOT against an attacker with write access to the cache root. Threat model treats the operator workstation as trusted; the companion's write access is restricted to F4 (mid-flight tile gen) which has its own per-flight signing key path.
## Cycle-1 operational reality
The shipped surface in `src/gps_denied_onboard/helpers/sha256_sidecar.py` (AZ-280) is static-only by design. Atomicity comes from `atomicwrites.atomic_write` (temp-file → `os.replace`). All four entry points wrap `OSError` and `ValueError` into a single exception hierarchy.
- **`Sha256SidecarError`** — single public exception type (subclasses `RuntimeError`). Raised on: `write_atomic` OS failure; `write_atomic_and_sidecar` sidecar OS failure; `verify` finds the sidecar missing for an existing payload; sidecar text not exactly 64 lowercase hex chars; `aggregate_hash` finds a missing or unreadable path.
- **`SIDECAR_SUFFIX = ".sha256"`** — public module-level constant for callers (e.g. takeoff-load verifier listing) that need to spell the sidecar suffix without hard-coding it.
- **Sidecar file format** — pure hex digest, no JSON wrapper, exactly 64 chars, all lowercase. The validator rejects uppercase or wrong-length sidecars hard (catches "user edited the sidecar by hand and broke it"). Keeps verification trivial.
- **Sidecar path appends `.sha256` verbatim** — `Path.with_suffix` would re-interpret an existing extension; we explicitly use `Path(str(payload_path) + ".sha256")`. So `manifest` → `manifest.sha256` AND `engine.engine` → `engine.engine.sha256`. This is the AC-NEW-CACHE-3 / D-C10-3 invariant.
- **Streaming digests** — `verify` and `aggregate_hash` stream the file in 1 MiB chunks (`_digest_file`) so an 8 GB engine file does not require 8 GB of RAM. `write_atomic` is the only entry point that operates on in-memory `bytes`.
- **`verify` semantics** — returns `False` (not raise) when the payload path is missing entirely ("not verifiable" rather than "verification error"); raises `Sha256SidecarError` when the payload exists but the sidecar is missing, unreadable, or malformed. Callers can branch on `path.exists()` first if they need to distinguish missing-payload from corrupt-sidecar.
- **`aggregate_hash` is byte-deterministic** — input list is sorted lexicographically by `str(path)` before hashing. The digest is computed over the concatenation of `<basename>\0<hex-digest>\n` lines (basename only, NOT full path, so the same physical file at a different mount point still produces the same aggregate). Missing paths in the input list raise instead of being silently skipped.
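
The behaviours listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the shipped module: the function bodies are re-implementations written for this example, `sidecar_path_for`, `digest_file` and `is_valid_sidecar_text` are hypothetical names, and the ASCII read of the sidecar and the UTF-8 encoding of the aggregate lines are assumptions.

```python
import hashlib
from pathlib import Path

SIDECAR_SUFFIX = ".sha256"
_CHUNK_BYTES = 1024 * 1024  # 1 MiB chunks, as described above


class Sha256SidecarError(RuntimeError):
    """Stand-in for the helper's single public exception type."""


def sidecar_path_for(payload_path: Path) -> Path:
    # Append verbatim; Path.with_suffix would replace an existing extension.
    return Path(str(payload_path) + SIDECAR_SUFFIX)


def digest_file(path: Path) -> str:
    # Stream the file so a multi-GB payload never sits fully in memory.
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(_CHUNK_BYTES), b""):
            h.update(chunk)
    return h.hexdigest()


def is_valid_sidecar_text(text: str) -> bool:
    # Exactly 64 lowercase hex characters, nothing else.
    return len(text) == 64 and all(c in "0123456789abcdef" for c in text)


def verify(payload_path: Path) -> bool:
    if not payload_path.exists():
        return False  # missing payload: "not verifiable", no exception
    try:
        # Assumption: the sidecar is read as ASCII text without stripping.
        expected = sidecar_path_for(payload_path).read_text(encoding="ascii")
    except (OSError, ValueError) as exc:
        raise Sha256SidecarError(f"sidecar unreadable: {exc}") from exc
    if not is_valid_sidecar_text(expected):
        raise Sha256SidecarError("sidecar is not 64 lowercase hex chars")
    return digest_file(payload_path) == expected


def aggregate_hash(paths: list[Path]) -> str:
    h = hashlib.sha256()
    # Sort by str(path) for a stable order; hash basenames, not full paths.
    for p in sorted(paths, key=str):
        try:
            line = f"{p.name}\0{digest_file(p)}\n"
        except OSError as exc:
            raise Sha256SidecarError(f"cannot hash {p}: {exc}") from exc
        h.update(line.encode("utf-8"))  # assumption: UTF-8 line encoding
    return h.hexdigest()
```

In this sketch a caller that needs to tell a missing payload from a corrupt sidecar checks `payload_path.exists()` before calling `verify`, matching the branching advice above.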
### Cycle-1 task lineage
- AZ-280 — initial helper, contract producer.
- No cycle-1 follow-up tasks touched this helper. The C10 / C6 / C7 task batch that consumes it (AZ-301 C7 engine gate, AZ-303 C6 storage interfaces, AZ-305 C6 postgres+filesystem store, AZ-321 C10 engine compiler, AZ-322 C10 descriptor batcher, AZ-323 C10 manifest builder, AZ-324 C10 manifest verifier, AZ-325 C10 cache provisioner) calls the four `Sha256Sidecar` static methods without extending them.
@@ -48,3 +48,20 @@ HostCapabilities:
- The dotted-version format must round-trip cleanly through filesystems (no `/` or `\` in dotted versions; safe).
- Adding a new tuple dimension (e.g., a per-binary `BUILD_*` flag combination) requires extending the schema AND every existing `.engine` filename. Versioning the schema itself is a Plan-phase carryforward if/when needed.
## Cycle-1 operational reality
The shipped surface in `src/gps_denied_onboard/helpers/engine_filename_schema.py` (AZ-281) is stateless and static-only, with a single compiled regex governing `parse` and `build`. The host-match predicate compares `(sm, jetpack, trt)` exactly; **precision is NOT part of the host match** (a `fp16` engine and an `int8` engine for the same SM/JetPack/TRT both "match the host" — the takeoff-load verifier picks the one it wants by precision separately).
- **`EngineFilenameSchemaError`** — single public exception type (subclasses `ValueError`). Raised on: non-`str` inputs to `build` / `parse`; missing `.engine` suffix; regex non-match; reserved `__` separator inside `model_name`; `model_name` outside `[a-z0-9_]+` or longer than 64 chars; `sm` not a non-bool positive int; version not matching `\d+\.\d+`; precision not in `ALLOWED_PRECISIONS`.
- **`ENGINE_SUFFIX = ".engine"`** — public module-level constant.
- **`ALLOWED_PRECISIONS = frozenset({"fp16", "int8", "mixed"})`** — public module-level constant; exposed so C7's takeoff-load decision tree and C10's engine-build orchestration can validate operator-supplied precision without hard-coding the enum.
- **Strict model-name validation** — `[a-z0-9_]+`, non-empty, ≤64 chars, no embedded `__` (reserved as the model/SM separator). Catches "operator typed a model name with a hyphen" before any filesystem operation runs.
- **Strict version validation** — both `jetpack` and `trt` must match dotted `<major>.<minor>` (e.g. `"6.2"`, `"10.3"`). Patch components are deliberately NOT supported in the filename — the engine ABI is stable within `<major>.<minor>` per the JetPack/TRT release notes.
- **`sm` validation** — must be a non-bool `int > 0`. Python's `bool ⊆ int` quirk would otherwise let `True` slip through as `sm=1`.
- **`matches_host` ignores precision by design** — the filename's `precision` segment is informational for the host-match check. C7's `deserialize_engine` uses `matches_host` to filter "engines this host can run at all" before applying its own precision policy.
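
The field-level rules above could look roughly like the following. This is a hedged sketch: the `validate_*` function names are hypothetical, the use of `re.ASCII` is a choice made for this example, and the full filename layout is deliberately not reproduced because the text above does not spell it out.

```python
import re

ALLOWED_PRECISIONS = frozenset({"fp16", "int8", "mixed"})
_MODEL_RE = re.compile(r"[a-z0-9_]+")
# re.ASCII keeps \d to 0-9 only (assumption made for this sketch).
_VERSION_RE = re.compile(r"\d+\.\d+", re.ASCII)


class EngineFilenameSchemaError(ValueError):
    """Stand-in for the helper's single public exception type."""


def validate_model_name(name):
    if not isinstance(name, str):
        raise EngineFilenameSchemaError("model_name must be a str")
    # fullmatch rejects the empty string, hyphens, and uppercase.
    if not _MODEL_RE.fullmatch(name) or len(name) > 64 or "__" in name:
        raise EngineFilenameSchemaError(f"invalid model_name: {name!r}")
    return name


def validate_sm(sm):
    # bool is a subclass of int, so True would otherwise pass as sm=1.
    if isinstance(sm, bool) or not isinstance(sm, int) or sm <= 0:
        raise EngineFilenameSchemaError(f"invalid sm: {sm!r}")
    return sm


def validate_version(version):
    # Only <major>.<minor>; patch components are rejected.
    if not isinstance(version, str) or not _VERSION_RE.fullmatch(version):
        raise EngineFilenameSchemaError(f"invalid version: {version!r}")
    return version


def validate_precision(precision):
    if precision not in ALLOWED_PRECISIONS:
        raise EngineFilenameSchemaError(f"invalid precision: {precision!r}")
    return precision
```

Checking `isinstance(sm, bool)` before `isinstance(sm, int)` is what closes the `True`-as-`1` gap; a plain `isinstance(sm, int)` test would accept both.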
### Cycle-1 task lineage
- AZ-281 — initial helper, contract producer.
- No cycle-1 follow-up tasks touched this helper. C7's `deserialize_engine` (AZ-301) and C10's engine compiler (AZ-321) consume it without extension.
@@ -47,3 +47,21 @@ RansacResult:
- The RANSAC threshold is a tunable; defaults are documented per-component (C3, C3.5, C4) in their specs.
- For 2D-3D RANSAC inside C4's `solvePnPRansac`, OpenCV does it internally — this helper is for the standalone reprojection-residual computation that lives outside the PnP call.
## Cycle-1 operational reality
The shipped surface in `src/gps_denied_onboard/helpers/ransac_filter.py` (AZ-282, extended via composition in AZ-623) is static-only and deterministic — `cv2.setRNGSeed(0)` is called immediately before every `cv2.findHomography(..., RANSAC)` so the same correspondences always produce the same inlier mask (AC-3 byte-equal determinism).
- **`RansacFilterError`** — single public exception type (subclasses `ValueError`). Raised on: non-`ndarray` correspondences; wrong-shape correspondences (anything other than `(N, 4)`); non-positive `ransac_threshold_px`; negative `min_inliers`; fewer than 4 correspondences for `filter_correspondences` (homography needs ≥4); non-`(3, 3)` `K`; distortion not shape `(5,)` or `(8,)`. OpenCV exceptions are wrapped (`cv2.error` → `RansacFilterError`).
- **`RansacResult` is a frozen `@dataclass`** — `inlier_correspondences: np.ndarray`, `inlier_count: int`, `outlier_count: int`, `median_residual_px: float`. The numpy array is NOT copied; consumers MUST treat it as read-only.
- **Median, not mean** — both `filter_correspondences` (homography residual) and `compute_reprojection_residual` (post-pose residual) pin **median** as the residual statistic. This matches the contract for C3.5 (post-AdHoP residual gate) and C4 (per-frame FDR residual). Mean is more sensitive to remaining outliers and would defeat the gate.
- **NaN residual for empty inliers** — both methods return `float("nan")` when the inlier set is empty. Consumers must NOT propagate `nan` as a numeric residual; treat it as "no residual computable" and fall back to the C3.5/C4 "matcher returned nothing useful" branch.
- **`min_inliers` is INFORMATIONAL only** — passed to `filter_correspondences`, validated for non-negativity, but does NOT gate the result. The returned `RansacResult` always reflects the actual RANSAC outcome; callers decide whether `result.inlier_count >= min_inliers` is acceptable. This is the contract's "Min-inliers semantics" invariant — encoding the gate in the helper would conflate three separate component thresholds (C3 / C3.5 / C4).
- **`filter_correspondences` residual** uses `cv2.perspectiveTransform` (the homography fit's own residual). `compute_reprojection_residual` uses `cv2.projectPoints` with the supplied pose, back-projecting image-a pixels through `K^{-1}` to `z=1` in camera-a frame. The two residuals are NOT interchangeable — one measures homography fit quality, the other measures pose fit quality.
- **Imports `se3_utils`** — `from gps_denied_onboard.helpers.se3_utils import SE3, se3_to_matrix`. Layer 1 helper-on-helper import is allowed (both are Layer 1). The `SE3 = gtsam.Pose3` alias is the runtime pose type; `se3_to_matrix(pose)` extracts the 4×4 transform.
- **OpenCV pin** — uses `cv2.findHomography`, `cv2.perspectiveTransform`, `cv2.projectPoints`, `cv2.Rodrigues`, `cv2.setRNGSeed`. All exist in `opencv-python>=4.5`, so the cycle-1 pin relaxation to `>=4.11.0.86,<4.12` (D-CROSS-CVE-1 leftover) does not affect this helper.
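
The median-residual and NaN conventions above can be illustrated without OpenCV. The sketch below is an assumption-laden stand-in: it applies the homography in plain NumPy instead of calling `cv2.perspectiveTransform`, it assumes the `(N, 4)` rows are laid out as `(x_a, y_a, x_b, y_b)`, it assumes the residual is the forward transfer error, and `apply_homography`, `median_transfer_residual_px` and `passes_gate` are hypothetical names.

```python
import math

import numpy as np


def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    # Lift (N, 2) points to homogeneous coordinates, map, then dehomogenise.
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def median_transfer_residual_px(
    H: np.ndarray, correspondences: np.ndarray, inlier_mask: np.ndarray
) -> float:
    inliers = correspondences[inlier_mask]
    if inliers.shape[0] == 0:
        return float("nan")  # no residual computable
    predicted = apply_homography(H, inliers[:, :2])
    errors = np.linalg.norm(predicted - inliers[:, 2:4], axis=1)
    # Median, not mean: less sensitive to outliers that survived RANSAC.
    return float(np.median(errors))


def passes_gate(
    inlier_count: int,
    residual_px: float,
    min_inliers: int,
    max_residual_px: float,
) -> bool:
    # Caller-side gate: the helper reports, the component decides.
    if math.isnan(residual_px):
        return False
    return inlier_count >= min_inliers and residual_px <= max_residual_px
```

Keeping `passes_gate` on the caller side mirrors the "informational only" rule: each component can apply its own `min_inliers` and residual threshold to the same reported result.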
### Cycle-1 task lineage
- AZ-282 — initial helper, contract producer.
- AZ-623 (`pre_constructed_phase_e_ransac_c5_helpers`) — composition-root sweep that wires C5 to consume the same static helper as C3/C3.5/C4; no signature changes, no new public surface added by this task.
@@ -31,3 +31,19 @@ class DescriptorNormaliser:
- Zero-norm vectors are returned as the zero vector (no division-by-zero); callers must filter or accept that such descriptors will match nothing.
- The choice of "inner product on L2-normalised" rather than "cosine" is FAISS-idiomatic — FAISS does not have a built-in cosine metric; cosine is achieved by L2-normalising and using inner product.
## Cycle-1 operational reality
The shipped surface in `src/gps_denied_onboard/helpers/descriptor_normaliser.py` (AZ-283, extended by AZ-338 NetVLAD per-cluster method) is static-only, stateless, and dtype-preserving. Norms are computed in `float32` to stabilise `float16` inputs against under/overflow, then cast back to the input dtype — the helper NEVER silently up-casts the returned descriptor.
- **`DescriptorNormaliserError`** — single public exception type (subclasses `ValueError`). Raised on: non-`ndarray` input; wrong dimensionality (`l2_normalise` requires 1-D, `l2_normalise_batch` requires 2-D, `intra_cluster_normalise` requires 1-D); zero-length axis; dtype not in `ALLOWED_DTYPES`; `num_clusters` not a non-bool positive int that divides `descriptor.shape[0]`.
- **`ALLOWED_DTYPES = (np.float16, np.float32)`** — public module-level constant. Anything else is rejected hard; this keeps the FAISS index and the runtime query path on the same precision (catches "C10 built the index in float32 but C2 fed a float64 query" regressions).
- **`intra_cluster_normalise(descriptor, num_clusters)` — NEW METHOD (AZ-338)**. Per-cluster L2 normalisation for VLAD-aggregated descriptors. NetVLAD's published preprocessing chain L2-normalises each per-cluster sub-vector BEFORE the global L2 step (`l2_normalise`). The input is a flat 1-D VLAD descriptor of shape `(num_clusters * cluster_dim,)`; the method reshapes to `(num_clusters, cluster_dim)`, normalises row-wise (zero-norm rows stay zero), and flattens back. `num_clusters` MUST divide `descriptor.shape[0]` — otherwise `DescriptorNormaliserError`.
- **`descriptor_metric()` returns the literal string `"inner_product"`** — the source of truth for FAISS HNSW index construction. C6's `DescriptorIndex.search_topk` and C10's index-build code both consult this; do NOT hard-code the metric string anywhere else.
- **Zero-norm vectors return zeros** — `l2_normalise`, `l2_normalise_batch`, and `intra_cluster_normalise` all guard the divisor. Callers that want to reject zero-norm descriptors must do so explicitly; the helper never raises on zero norm (it would be the wrong layer to decide the policy).
- **`l2_normalise_batch` vectorised** — uses `np.where(norms == 0.0, ...)` to apply the zero-guard row-wise so a batch of N descriptors with K zeros costs the same as a batch of N non-zero descriptors plus K boolean comparisons (no per-row branch).
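
The dtype-preserving normalisation and the per-cluster step described above can be sketched as follows. This is an illustrative re-implementation, not the shipped code: input validation is reduced to the `num_clusters` check, and the exact arithmetic order inside the real helper is an assumption.

```python
import numpy as np

ALLOWED_DTYPES = (np.float16, np.float32)


class DescriptorNormaliserError(ValueError):
    """Stand-in for the helper's single public exception type."""


def l2_normalise_batch(batch: np.ndarray) -> np.ndarray:
    # Compute norms in float32 so float16 inputs do not under/overflow.
    as_f32 = batch.astype(np.float32)
    norms = np.linalg.norm(as_f32, axis=1, keepdims=True)
    # Zero-norm rows divide by 1 and therefore stay all-zero.
    safe = np.where(norms == 0.0, np.float32(1.0), norms)
    # Cast back so the caller gets the dtype it passed in.
    return (as_f32 / safe).astype(batch.dtype)


def intra_cluster_normalise(descriptor: np.ndarray, num_clusters) -> np.ndarray:
    if (
        isinstance(num_clusters, bool)
        or not isinstance(num_clusters, int)
        or num_clusters <= 0
        or descriptor.shape[0] % num_clusters != 0
    ):
        raise DescriptorNormaliserError(
            "num_clusters must be a positive int dividing the descriptor length"
        )
    # Reshape to (num_clusters, cluster_dim), normalise rows, flatten back.
    rows = descriptor.reshape(num_clusters, -1)
    return l2_normalise_batch(rows).reshape(-1)
```

For a NetVLAD-style descriptor the per-cluster call would come first and the global L2 step second, following the preprocessing order described above.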
### Cycle-1 task lineage
- AZ-283 — initial helper, contract producer (`l2_normalise`, `l2_normalise_batch`, `descriptor_metric`).
- AZ-338 — `intra_cluster_normalise` addition for the C2 VPR NetVLAD preprocessing path (`ultra_vpr` AZ-337 consumer). Contract minor revision (v1.0.0 → v1.1.0) is queued for the next contracts-folder sweep.