[AZ-266] [AZ-269] [AZ-277] [AZ-280] Cross-cutting log/config + SE3/SHA256 helpers

AZ-266: schema-compliant JSON logging entrypoint, level normalisation,
handler-topology guard, format-error fallback (log_record_schema v1.0.0).
AZ-269: env > YAML > defaults config loader, frozen Config dataclass,
missing-var fail-fast with pointer to .env.example, component-block registry.
AZ-277: GTSAM-backed SE3Utils (matrix<->SE3 + exp/log/adjoint) with strict
orthogonality, dtype, and bottom-row contract enforcement.
AZ-280: atomicwrites-backed write_atomic + independent verify +
order-deterministic aggregate_hash; sidecar format strictness.
pyproject.toml pins gtsam>=4.2,<5.0 and atomicwrites>=1.4,<2.0
(named-backend deps per the AZ-277 / AZ-280 contracts).
139 unit tests pass (44 new). Review verdict: PASS_WITH_WARNINGS;
findings are perf-NFR + journald deferrals, no blocking issues.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 01:33:42 +03:00
parent b12db61444
commit 8e71f6c002
21 changed files with 2134 additions and 133 deletions
@@ -0,0 +1,106 @@
# Shared Structured Logging Module
**Task**: AZ-266_log_module
**Name**: Shared Logging Module
**Description**: Provide the `get_logger(component_id)` entrypoint, a stable JSON formatter that emits records matching the log_record_schema contract, and the stdout / journald handlers used by Tier-1 and Tier-2 deployments.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.logging (cross-cutting; epic AZ-245 / E-CC-LOG)
**Tracker**: AZ-266
**Epic**: AZ-245 (E-CC-LOG)
## Problem
Every onboard component must emit structured JSON logs at DEBUG / INFO / WARN / ERROR with a stable, machine-parseable shape so post-flight analysis (FDR tooling, blackbox scenario checks, traceability matrix verification) can correlate events across components. Without one shared logger, format drift is guaranteed within a few weeks of parallel component development.
## Outcome
- A single `get_logger(component_id)` call is the only logging entrypoint any onboard module ever uses.
- Every emitted record is a single-line JSON object whose key set, key order, and value types match the `log_record_schema` contract version 1.0.0.
- Tier-1 deployments capture logs via Docker stdout; Tier-2 deployments capture logs via journald — switched by config, not by code.
## Scope
### Included
- `get_logger(component_id: str) -> Logger` factory backed by Python stdlib `logging`.
- A JSON formatter that emits the schema's 8 fields in the contract-mandated order, regardless of construction order. The implementation may use `python-json-logger` or an `orjson`-backed formatter, whichever is already pinned in the project's lockfile from AZ-263.
- A stdout handler for Tier-1 (Docker) and a journald handler for Tier-2 (Jetson). Selection is config-driven via the structured-logging entry of the cross-cutting config epic (AZ-246 / E-CC-CONF).
- Per-frame structured-logging helpers for the documented per-component shapes referenced in epic AZ-245 (`vio.tick`, `vpr.query`, etc.) so component code can emit one-liner logs without rebuilding the kv dict.
- Public interface contract published at `_docs/02_document/contracts/shared_logging/log_record_schema.md`.
### Excluded
- The FDR bridge that forwards ERROR + WARN records into the Flight Data Recorder — owned by the next task (`03_fdr_log_bridge`, parented to the same epic).
- Per-component log call sites (each component epic owns its own logging call sites).
- Log schema versioning beyond 1.0.0 — handled by future change-log entries on the contract file.
## Acceptance Criteria
**AC-1: Single logger entrypoint**
Given any onboard Python module that imports the shared logging package
When the module calls `get_logger("c2_vpr")`
Then it receives a `Logger` whose every record passes the schema contract test (no other logger configuration is required by the caller)
**AC-2: Field order is stable**
Given a logger configured with the JSON formatter
When a component calls `logger.info(msg, extra={"frame_id": 42, "kind": "vpr.query", "kv": {...}})`
Then the emitted bytes parse as a single-line JSON object whose keys appear in the order `ts, level, component, frame_id, kind, msg, kv, exc`, regardless of the order the caller passed the fields
**AC-3: Level normalisation**
Given a logger receiving a record at level `WARNING` (Python stdlib name)
When the formatter emits the JSON record
Then the `level` field reads `WARN` (per contract), not `WARNING`
**AC-4: Handler topology selection**
Given the structured-logging config block selects `tier=1` (or `tier=2`)
When `runtime_root.py` initialises logging
Then exactly one stdout handler (or journald handler) is attached, with no duplicate handlers and no handler from the wrong tier
**AC-5: Non-frame records emit a null frame_id**
Given a startup or shutdown log call that does not pass a `frame_id`
When the record is emitted
Then `frame_id` appears as JSON `null` (never as a synthesised value, never absent from the key list)
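AC-2, AC-3, and AC-5 can be illustrated with a minimal stdlib-only sketch. The class name `ContractJsonFormatter`, the ISO-8601 `ts` encoding, and the `null` default for `kind` are assumptions made for this example; the contract file, not this sketch, defines the real value types.

```python
import json
import logging
from datetime import datetime, timezone

# Key order taken from AC-2; everything else here is illustrative.
FIELD_ORDER = ("ts", "level", "component", "frame_id", "kind", "msg", "kv", "exc")
LEVEL_NAMES = {"WARNING": "WARN"}  # AC-3: stdlib name -> contract name


class ContractJsonFormatter(logging.Formatter):
    """Hypothetical formatter: fixed key order, WARN normalisation, null frame_id."""

    def format(self, record: logging.LogRecord) -> str:
        fields = {
            # Caller-supplied `extra` keys arrive as attributes on the record.
            "kv": getattr(record, "kv", {}),
            "frame_id": getattr(record, "frame_id", None),  # AC-5: null, never absent
            "kind": getattr(record, "kind", None),  # assumption: null when not given
            "msg": record.getMessage(),
            "component": record.name,
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
            # Assumption: UTC ISO-8601 timestamp.
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "exc": self.formatException(record.exc_info) if record.exc_info else None,
        }
        # Re-key in contract order so construction order never leaks into the output.
        ordered = {key: fields[key] for key in FIELD_ORDER}
        return json.dumps(ordered, separators=(",", ":"))


demo = logging.LogRecord("c2_vpr", logging.WARNING, "demo.py", 0, "slow query", None, None)
demo.kind = "vpr.query"
demo.kv = {"top_k": 5}
line = ContractJsonFormatter().format(demo)
```

The `fields` dict is deliberately built out of order to show that only the final re-keying step determines the emitted key order.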
## Non-Functional Requirements
**Performance**
- Per-record formatter latency p99 ≤ 0.2 ms on Tier-2 (Jetson Orin Nano Super) for a record with `len(kv) ≤ 8` scalar entries. Validated by a microbenchmark in unit tests.
- DEBUG records on the steady-state hot path allocate at most one new string (the formatted JSON line); no transient dict copies of `kv` are permitted.
**Reliability**
- Formatter never raises into the caller. A serialisation failure logs an internal `WARN` with `kind="log.format_error"` and drops the offending record's `kv` payload (replaces with `{"_format_error": "<reason>"}`); the rest of the record is still emitted.
- No global mutable state outside the standard `logging` module's own logger registry; multiple `get_logger("c2_vpr")` calls return the same cached `Logger` instance.
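The never-raise rule can be sketched as a guarded encode step. The function name `encode_record_safely` is hypothetical, and the sketch assumes `kv` is the only field that can hold a non-serialisable value; the internal `WARN` record with `kind="log.format_error"` is omitted for brevity.

```python
import json


def encode_record_safely(fields: dict) -> str:
    """Serialise a record; on failure, keep the record but replace its kv payload."""
    try:
        return json.dumps(fields, separators=(",", ":"))
    except (TypeError, ValueError) as err:
        # TypeError: non-serialisable object; ValueError: e.g. circular reference.
        fallback = dict(fields)  # copy happens only on the failure path
        fallback["kv"] = {"_format_error": f"{type(err).__name__}: {err}"}
        return json.dumps(fallback, separators=(",", ":"))


class Opaque:
    """A class instance that stdlib json cannot serialise."""


safe_line = encode_record_safely({"msg": "tick", "kv": {"obj": Opaque()}})
```

Copying the dict only inside the `except` branch keeps the success path free of transient copies, which is in the spirit of the hot-path allocation rule above.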
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `get_logger("c2_vpr")` returns a Logger with the JSON formatter attached | Logger instance present; formatter produces valid contract record |
| AC-2 | Emit a record with kwargs in shuffled order | Parsed JSON keys appear in the contract's mandated order |
| AC-3 | Log at `logging.WARNING` level | Emitted JSON `level` field equals `"WARN"` |
| AC-4 | Initialise logging twice with the same tier-1 config | Exactly one stdout handler attached; no duplicates |
| AC-5 | Log a startup INFO without `frame_id` | Emitted JSON contains `"frame_id": null` |
| NFR-perf | Microbenchmark formatter on a record with 8 scalar kv entries | p99 ≤ 0.2 ms over 10k iterations |
| NFR-reliability | Pass a non-JSON-serialisable object in `kv` (e.g. a class instance) | Formatter emits the record with `kv={"_format_error": "..."}`; caller does not see an exception |
## Constraints
- Public interface frozen by `_docs/02_document/contracts/shared_logging/log_record_schema.md` v1.0.0 — any change requires a contract version bump.
- Stdlib `logging` is the only allowed underlying logging mechanism (per epic AZ-245 architecture note: "no third-party log aggregator").
- No new dependency beyond what AZ-263 / E-BOOT already pinned in `pyproject.toml`.
## Risks & Mitigation
**Risk 1: Formatter performance regression**
- *Risk*: Naïve `json.dumps` on each record exceeds the 0.2 ms p99 budget on Jetson.
- *Mitigation*: Bench against `orjson`-backed formatter as a fallback if stdlib `json` misses budget; choice is reversible because the contract is the public surface, not the formatter implementation.
**Risk 2: Handler duplication on hot-reload**
- *Risk*: Re-initialising logging during integration tests stacks duplicate handlers, multiplying every emitted record.
- *Mitigation*: `get_logger` checks for existing handlers on the named logger before adding new ones; integration test fixture asserts handler count after teardown.
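The Risk 2 mitigation amounts to an idempotent attach. The following is a minimal sketch under stated assumptions: the marker attribute `_shared_logging_handler` and the function name are hypothetical, and only the Tier-1 stdout case is shown.

```python
import logging
import sys

_HANDLER_TAG = "_shared_logging_handler"  # hypothetical marker attribute


def attach_stdout_handler_once(logger: logging.Logger) -> logging.Logger:
    """Attach the stdout handler unless a previous call already did so."""
    for handler in logger.handlers:
        if getattr(handler, _HANDLER_TAG, False):
            return logger  # already configured: re-initialisation is a no-op
    handler = logging.StreamHandler(sys.stdout)
    setattr(handler, _HANDLER_TAG, True)
    logger.addHandler(handler)
    return logger


first = attach_stdout_handler_once(logging.getLogger("c2_vpr_demo"))
second = attach_stdout_handler_once(logging.getLogger("c2_vpr_demo"))
```

Because `logging.getLogger` returns the same instance for the same name, the second call sees the tagged handler and adds nothing.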
## Contract
This task produces the contract at `_docs/02_document/contracts/shared_logging/log_record_schema.md`.
Consumers MUST read that file — not this task spec — to discover the interface.
@@ -0,0 +1,104 @@
# Config Loader + Outer Config Container
**Task**: AZ-269_config_loader
**Name**: Config Loader
**Description**: Implement `load_config(env, paths) -> Config` and the outer frozen `Config` dataclass. Merges env vars + one or more YAML files + documented defaults with strict precedence (env > YAML > defaults), returning an immutable container that holds one nested dataclass field per component slug.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.config (cross-cutting; epic AZ-246 / E-CC-CONF)
**Tracker**: AZ-269
**Epic**: AZ-246 (E-CC-CONF)
## Problem
ADR-001 (runtime selection by config) and ADR-009 (composition root) both require a single source of truth for configuration. Without a shared loader with explicit precedence rules, components silently fall back to defaults, the composition root grows local config-parsing logic, and operators cannot reliably override settings via env in CI or by YAML in the field.
## Outcome
- `load_config(env, paths)` is the only function any onboard process uses to materialise its `Config` at startup.
- Precedence is deterministic and observable: env > YAML > defaults; later YAML files win over earlier ones; missing keys fall to defaults.
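- The returned `Config` is frozen end-to-end (every nested component block is also frozen), so accidental mutation by component code raises immediately (`dataclasses.FrozenInstanceError` for a frozen dataclass) instead of silently changing shared state.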
## Scope
### Included
- `load_config(env: Mapping[str, str], paths: Sequence[Path]) -> Config` per the composition_root_protocol contract.
- Outer frozen `Config` dataclass with one nested field per component slug. The OUTER container is owned by this task; the per-component nested dataclasses are owned by each component's epic and registered into the outer Config via a documented extension mechanism (a registry function called from `runtime_root.py`).
- Documented default values for cross-cutting blocks only (logging level, FDR queue size, etc.). Per-component defaults live in their own component epics.
- Friendly error messages when a required env var is missing (per AZ-263 AC-8): the error names the offending variable and points to `.env.example`.
### Excluded
- `compose_root` and `compose_operator` — owned by the next PBI in this epic.
- Per-component config blocks — owned by each component epic.
- The runtime self-check that strategies are linked — owned by the next PBI (StrategyNotLinkedError).
## Acceptance Criteria
**AC-1: Precedence env > YAML > defaults**
Given env sets `LOG_LEVEL=DEBUG` and YAML sets `log.level=INFO`
When `load_config(env, [yaml_path])` runs
Then `config.log.level == "DEBUG"`
**AC-2: YAML > defaults when env is silent**
Given env has no `LOG_LEVEL` and YAML sets `log.level=INFO`
When `load_config(env, [yaml_path])` runs
Then `config.log.level == "INFO"`
**AC-3: Defaults fill gaps**
Given env has no `LOG_LEVEL` and YAML omits `log.level`
When `load_config(env, [yaml_path])` runs
Then `config.log.level` equals the documented default
**AC-4: Multi-file YAML merge order**
Given two YAML paths where the second sets `fdr.queue_size=8192` and the first sets it to `4096`
When `load_config(env, [first, second])` runs
Then `config.fdr.queue_size == 8192` (later file wins)
**AC-5: Frozen end-to-end**
Given a loaded `Config`
When component code attempts `config.log.level = "DEBUG"`
Then a `TypeError` (or `FrozenInstanceError`) is raised
**AC-6: Required-var missing fails fast with pointer**
Given a required env var is unset and no YAML override or default exists
When `load_config(env, paths)` runs
Then it raises an error whose message names the missing var and points to `.env.example`
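The criteria above can be sketched without touching the filesystem by merging already-parsed YAML documents. All names here (`merge_layers`, `require`, `MissingConfigError`, `ENV_KEYS`) and the default values are hypothetical; the real signature is `load_config(env, paths)` as fixed by the contract.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Mapping, Sequence

# Illustrative defaults and env-to-key mapping; not the documented values.
DEFAULTS = {"log": {"level": "WARN"}, "fdr": {"queue_size": 1024}}
ENV_KEYS = {"LOG_LEVEL": ("log", "level")}


class MissingConfigError(RuntimeError):
    """Hypothetical error type for AC-6."""


@dataclass(frozen=True)
class LogConfig:
    level: str


@dataclass(frozen=True)
class FdrConfig:
    queue_size: int


@dataclass(frozen=True)
class Config:
    log: LogConfig
    fdr: FdrConfig


def merge_layers(env: Mapping[str, str], yaml_docs: Sequence[Mapping[str, Any]]) -> Config:
    merged = {block: dict(values) for block, values in DEFAULTS.items()}
    for doc in yaml_docs:  # later documents overwrite earlier ones (AC-4)
        for block, values in doc.items():
            merged.setdefault(block, {}).update(values)
    for var, (block, key) in ENV_KEYS.items():  # env is applied last, so it wins (AC-1)
        if var in env:
            merged[block][key] = env[var]
    return Config(log=LogConfig(**merged["log"]), fdr=FdrConfig(**merged["fdr"]))


def require(merged: Mapping[str, Mapping[str, Any]], block: str, key: str, env_var: str) -> Any:
    """Fail fast when no layer supplied a required value (AC-6)."""
    try:
        return merged[block][key]
    except KeyError:
        raise MissingConfigError(
            f"{env_var} is not set and has no YAML value or default; see .env.example"
        ) from None


cfg = merge_layers(
    {"LOG_LEVEL": "DEBUG"},
    [{"log": {"level": "INFO"}, "fdr": {"queue_size": 4096}}, {"fdr": {"queue_size": 8192}}],
)
```

Applying the layers in the order defaults, YAML, env means precedence falls out of plain overwrite semantics, with no per-key priority bookkeeping.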
## Non-Functional Requirements
**Performance**
- Cold-start `load_config` ≤ 250 ms on Tier-2, which leaves the remainder of the 1 s `compose_root` startup budget for the rest of composition.
**Reliability**
- Loader is pure: same env + same file contents always yields a deep-equal `Config`. Verified by the NFR-reliability unit test.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | env vs. YAML for `log.level` | env value wins |
| AC-2 | YAML vs. default | YAML value wins |
| AC-3 | All-default for `log.level` | documented default returned |
| AC-4 | Two YAML files, conflicting key | later file wins |
| AC-5 | Mutation attempt on loaded Config | TypeError / FrozenInstanceError |
| AC-6 | Missing required env var | error message names the var + points to `.env.example` |
| NFR-perf | Microbenchmark `load_config` over a representative config | p99 ≤ 250 ms on Tier-2 |
| NFR-reliability | Call `load_config` twice with same args | deep-equal `Config` instances |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_config/composition_root_protocol.md` v1.0.0.
- No new dependency beyond what AZ-263 / E-BOOT pinned (stdlib + the YAML library already in `pyproject.toml`).
## Risks & Mitigation
**Risk 1: Per-component defaults drift across components**
- *Risk*: Without a documented registration mechanism, two components may both claim a `log.level` default and conflict.
- *Mitigation*: Defaults registry is keyed by component slug + key; collisions raise at registration time, not at load time.
## Contract
This task produces (jointly with AZ-NN compose_root) the contract at `_docs/02_document/contracts/shared_config/composition_root_protocol.md`.
Consumers MUST read that file — not this task spec — to discover the interface.
@@ -0,0 +1,152 @@
# SE3Utils Helper Module
**Task**: AZ-277_se3_utils
**Name**: SE3Utils Helper
**Description**: Implement the shared `SE3Utils` helper for SE(3) ↔ 4×4-matrix conversion and Lie-algebra exp/log/adjoint, backed by GTSAM `Pose3` primitives. Used wherever a consumer needs a 6-vector twist, a Jacobian over an SE(3) operation, or a deterministic conversion between matrix and pose forms — i.e., C1, C2.5, C3, C3.5, C4, C5, C8. Stateless; pure functions; strict caller-orthogonalisation contract.
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.helpers.se3_utils (cross-cutting; epic AZ-264 / E-CC-HELPERS)
**Tracker**: AZ-277
**Epic**: AZ-264 (E-CC-HELPERS)
### Document Dependencies
- `_docs/02_document/contracts/shared_helpers/se3_utils.md` — frozen public interface this task produces.
- `_docs/02_document/common-helpers/02_helper_se3_utils.md` — design rationale and consumer mapping.
## Problem
Seven components (C1, C2.5, C3, C3.5, C4, C5, C8) need to cross the matrix-vs-pose boundary:
- C4 assembles the `solvePnPRansac` output (rotation vector + translation vector) into a 4×4 matrix; C5's iSAM2 graph wants a GTSAM `Pose3`.
- C1's relative-pose update needs `log_map` for covariance recovery.
- C8 encodes pose as a 6-vector for FC adapter emission.
Without a shared helper:
- Each component re-implements the conversion, drifting on rotation conventions, sign conventions, or near-identity edge cases.
- Subtle differences in `det(R)` validation (some silently re-orthogonalise, others reject) break the "same pose in, same pose out" invariant across components.
- Any future change (e.g., switching from GTSAM `Pose3` to `manifpy`) becomes a 7-place coordinated edit.
## Outcome
- A single `helpers.se3_utils` module is the only place that constructs a `Pose3` from a matrix or vice-versa across the codebase. Component imports go through the helper.
- All conversions are pure functions: same input → byte-equal numpy / GTSAM output.
- Strict orthogonal-rotation contract: `matrix_to_se3` rejects non-orthogonal or negative-determinant rotations with `Se3InvalidMatrixError` instead of silently fixing them. Callers are responsible for orthogonalisation; the rejection forces the bug back to the source.
- Near-identity Lie-algebra inputs (twist norm < 1e-10) are stable — `exp_map` falls back to the small-angle Taylor expansion documented in GTSAM rather than NaN-ing on `sin(θ)/θ`.
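To show why the small-angle branch matters, here is a stdlib-only sketch of the two Rodrigues coefficients that appear in the SO(3) exponential. This is not GTSAM's code: the function name and the `SMALL_ANGLE` switch-over value are assumptions chosen for illustration.

```python
import math
from typing import Tuple

# Assumption: below this angle the truncated series is accurate to double precision.
SMALL_ANGLE = 1e-4


def rodrigues_coefficients(theta: float) -> Tuple[float, float]:
    """Return (sin(t)/t, (1 - cos(t))/t^2), switching to Taylor series near zero."""
    if abs(theta) < SMALL_ANGLE:
        t2 = theta * theta
        # sin(t)/t = 1 - t^2/6 + ...   and   (1 - cos(t))/t^2 = 1/2 - t^2/24 + ...
        return 1.0 - t2 / 6.0, 0.5 - t2 / 24.0
    return math.sin(theta) / theta, (1.0 - math.cos(theta)) / (theta * theta)


at_zero = rodrigues_coefficients(0.0)
near_zero = rodrigues_coefficients(1e-12)
```

Without the series branch, `theta = 0.0` divides by zero, and a tiny nonzero `theta` makes `1 - cos(theta)` cancel to exactly zero, so the second coefficient would come out as 0 instead of 0.5.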
## Scope
### Included
- `matrix_to_se3(T_4x4) -> SE3`, `se3_to_matrix(SE3) -> np.ndarray`.
- `exp_map(xi) -> SE3`, `log_map(SE3) -> np.ndarray`, `adjoint(SE3) -> np.ndarray`.
- `is_valid_rotation(R_3x3, *, atol)` predicate for callers to check before calling `matrix_to_se3`.
- `Se3InvalidMatrixError` exception type.
- Re-export of GTSAM `Pose3` as `SE3` so consumers do not import GTSAM directly.
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/se3_utils.md`.
### Excluded
- Quaternion conversions — consumers convert via numpy / GTSAM directly.
- SE(2) helpers — out of scope.
- Pose interpolation / Slerp — out of scope.
- Higher-order manifold ops (parallel transport, composition Jacobians) — out of scope.
## Acceptance Criteria
**AC-1: 4×4 ↔ SE3 round-trip**
Given a randomly-sampled valid `T_4x4` (orthogonal rotation, positive determinant, identity bottom row)
When `matrix_to_se3` then `se3_to_matrix` runs
Then the recovered matrix matches the input via `np.allclose(..., atol=1e-9)`
**AC-2: Lie-algebra round-trip**
Given a random twist `xi` of shape `(6,)` and norm ≈ 1.0
When `exp_map(xi)` then `log_map(...)` runs
Then the recovered twist matches `xi` via `np.allclose(..., atol=1e-9)`
**AC-3: Near-identity Lie stability**
Given `xi = [1e-12, 1e-12, 1e-12, 1e-12, 1e-12, 1e-12]`
When `exp_map(xi)` runs
Then the result is the identity pose within `atol=1e-9`; no exception, no NaN
**AC-4: Strict orthogonality rejection**
Given `T_4x4` whose `R` has `||R^T R - I||_F = 1e-3`
When `matrix_to_se3(T)` runs
Then `Se3InvalidMatrixError` is raised AND the helper does NOT silently re-orthogonalise (the message names the deviation magnitude)
**AC-5: Mirror rejection**
Given `T_4x4` with `det(R) ≈ -1`
When `matrix_to_se3(T)` runs
Then `Se3InvalidMatrixError` is raised mentioning the negative determinant
**AC-6: Block-layout guard**
Given `T_4x4` with bottom row `[0, 0, 0, 2]` (or any deviation from `[0, 0, 0, 1]`)
When `matrix_to_se3(T)` runs
Then `Se3InvalidMatrixError` is raised mentioning the bottom row
**AC-7: dtype contract**
Given `T_4x4` with `dtype=float32`
When `matrix_to_se3(T)` runs
Then `Se3InvalidMatrixError` is raised mentioning dtype (helpers operate strictly on `float64`)
**AC-8: Determinism**
Given the same `T_4x4` (or `xi`)
When converted twice through any helper function
Then both outputs are byte-equal
**AC-9: No upward imports (Layer 1 invariant)**
Given the helper module
When a static-import check runs
Then it imports ONLY from `_types`, GTSAM, numpy, and stdlib — no `gps_denied_onboard.components.*` imports anywhere
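The rejection rules in AC-4 through AC-7 can be sketched with numpy alone. The function `check_transform`, the default `atol`, the exact bottom-row comparison, and the `ValueError` base class are assumptions; the GTSAM conversion that would follow a successful check is left out.

```python
import numpy as np


class Se3InvalidMatrixError(ValueError):
    """Stand-in for the helper's exception type; the base class is an assumption."""


def check_transform(T: np.ndarray, *, atol: float = 1e-9) -> None:
    """Raise Se3InvalidMatrixError unless T is a strict float64 rigid transform."""
    if not isinstance(T, np.ndarray) or T.shape != (4, 4):
        raise Se3InvalidMatrixError("expected a 4x4 numpy array")
    if T.dtype != np.float64:
        raise Se3InvalidMatrixError(f"dtype must be float64, got {T.dtype}")
    if not np.array_equal(T[3], [0.0, 0.0, 0.0, 1.0]):
        raise Se3InvalidMatrixError(f"bottom row must be [0, 0, 0, 1], got {T[3].tolist()}")
    R = T[:3, :3]
    deviation = float(np.linalg.norm(R.T @ R - np.eye(3), ord="fro"))
    if deviation > atol:
        # Report the magnitude and reject; never repair the input.
        raise Se3InvalidMatrixError(
            f"rotation is not orthogonal: ||R^T R - I||_F = {deviation:.3e}"
        )
    det = float(np.linalg.det(R))
    if det <= 0.0:
        raise Se3InvalidMatrixError(f"rotation has negative determinant: det(R) = {det:.3f}")


def rejection_reason(T: np.ndarray) -> str:
    """Return the error message for T, or an empty string when T is accepted."""
    try:
        check_transform(T)
    except Se3InvalidMatrixError as err:
        return str(err)
    return ""
```

Checking orthogonality before the determinant means a mirrored but otherwise orthogonal rotation is reported specifically as a determinant problem.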
## Non-Functional Requirements
**Performance**
- Each helper function p99 ≤ 50 µs on Tier-2 — overhead vs. inline GTSAM ≤ 5 % (per E-CC-HELPERS hot-path NFR).
**Reliability**
- Pure deterministic; same input → byte-equal output.
- `Se3InvalidMatrixError` is the ONLY exception type the public surface raises on shape / orthogonality / dtype violations.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `np.allclose(se3_to_matrix(matrix_to_se3(T)), T)` for 100 random valid `T` | all pass within `atol=1e-9` |
| AC-2 | `np.allclose(log_map(exp_map(xi)), xi)` for 100 random `xi` (norm ≈ 1.0) | all pass within `atol=1e-9` |
| AC-3 | `exp_map([1e-12]*6)` | identity pose within `atol=1e-9`; no NaN |
| AC-4 | non-orthogonal `T` | `Se3InvalidMatrixError`; message names deviation |
| AC-5 | `det(R) = -1` `T` | `Se3InvalidMatrixError`; mentions determinant |
| AC-6 | bottom row `[0, 0, 0, 2]` | `Se3InvalidMatrixError`; mentions bottom row |
| AC-7 | `float32` dtype | `Se3InvalidMatrixError`; mentions dtype |
| AC-8 | call any helper twice with same input | byte-equal outputs |
| AC-9 | static import scan | only `_types`, GTSAM, numpy, stdlib |
| NFR-perf | microbench each helper (10k iterations on Tier-2 fixture) | p99 ≤ 50 µs each |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_helpers/se3_utils.md` v1.0.0.
- Layer 1 Foundation only.
- GTSAM is the single math backend; numpy fallback only when GTSAM does not expose the primitive.
- No new dependency beyond what AZ-263 / E-BOOT pinned.
## Risks & Mitigation
**Risk 1: Silent re-orthogonalisation hides upstream rotation drift**
- *Risk*: A future change "softens" `matrix_to_se3` to silently re-orthogonalise inputs; consumers no longer learn that their rotation source is producing non-orthogonal matrices.
- *Mitigation*: AC-4 makes strict rejection part of the contract. The contract test enforces that `Se3InvalidMatrixError` is raised, not absorbed.
**Risk 2: GTSAM API drift between minor versions**
- *Risk*: `Pose3.expmap` signature changes; this helper breaks on a GTSAM upgrade.
- *Mitigation*: GTSAM is pinned in `pyproject.toml` at AZ-263 / E-BOOT; this helper's tests are the canary that detects drift before consumers do.
## Runtime Completeness
- **Named capability**: SE(3) ↔ matrix conversion + Lie-algebra exp/log/adjoint via GTSAM `Pose3` primitives (architecture / E-CC-HELPERS / `02_helper_se3_utils.md`).
- **Production code that must exist**: real GTSAM-backed conversions; real strict-orthogonality guard; real small-angle Taylor fallback for near-identity exp.
- **Allowed external stubs**: numpy fallback only where GTSAM does not expose the primitive (e.g., adjoint matrix construction).
- **Unacceptable substitutes**: silent re-orthogonalisation; "for now we just call `scipy.linalg.logm`" (numerically inferior, no Jacobian); skipping near-identity small-angle handling (NaN risk).
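For the numpy fallback case named above (adjoint matrix construction), a minimal sketch follows. It assumes the rotation-first twist ordering `[ω, v]`, which is believed to match GTSAM's `Pose3` convention but should be confirmed against the contract; the function names are hypothetical.

```python
import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """Return the 3x3 cross-product matrix [v]x."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]], dtype=np.float64)


def adjoint_matrix(T: np.ndarray) -> np.ndarray:
    """6x6 adjoint of a 4x4 rigid transform, for twists ordered [omega, v]."""
    R = T[:3, :3]
    t = T[:3, 3]
    Ad = np.zeros((6, 6), dtype=np.float64)
    Ad[:3, :3] = R
    Ad[3:, :3] = skew(t) @ R  # coupling block: translation crossed with rotation
    Ad[3:, 3:] = R
    return Ad
```

A quick sanity property for any implementation is the homomorphism `Ad(T1 @ T2) == Ad(T1) @ Ad(T2)`, which holds for this block layout.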
## Contract
This task produces the contract at `_docs/02_document/contracts/shared_helpers/se3_utils.md`.
Consumers MUST read that file — not this task spec — to discover the interface.
@@ -0,0 +1,154 @@
# Sha256Sidecar Helper Module
**Task**: AZ-280_sha256_sidecar
**Name**: Sha256Sidecar Helper
**Description**: Implement the shared `Sha256Sidecar` helper that owns the atomic-write + SHA-256 content-hash sidecar pattern (D-C10-3). Every persistent artifact that takeoff-load (F2) must verify gets written atomically AND has a `.sha256` sidecar that the verifier can independently recompute. Used by C6 (FAISS index, descriptor sidecar), C7 (engine cache + INT8 calibration cache), C10 (Manifest), and C11 (tile artifact verification). Stateless static-only design.
**Complexity**: 2 points
**Dependencies**: AZ-263_initial_structure
**Component**: shared.helpers.sha256_sidecar (cross-cutting; epic AZ-264 / E-CC-HELPERS)
**Tracker**: AZ-280
**Epic**: AZ-264 (E-CC-HELPERS)
### Document Dependencies
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` — frozen public interface this task produces.
- `_docs/02_document/common-helpers/05_helper_sha256_sidecar.md` — design rationale and consumer mapping (D-C10-3).
## Problem
The takeoff-load gate (F2) verifies four classes of persistent artifact: FAISS index + descriptor sidecar (C6), TensorRT engine cache + INT8 calibration cache (C7), Manifest (C10), and tile artifacts (C11). Each artifact must be written atomically (no partial files) AND must have a hash sidecar the verifier can independently recompute.
Without a shared helper:
- C6 / C7 / C10 / C11 each grow their own atomic-write + hash implementation; subtle differences in temp-file naming, rename ordering, or sidecar format break the cross-component verifier the moment one drifts.
- The Manifest aggregate hash (which covers many files) depends on path-ordering logic that must be implemented in exactly one place; if that ordering ever differs between a writer and a verifier, the entire cache root looks corrupt.
- An attacker (or accidental `rsync`) replaces `engine.engine` after `engine.engine.sha256` was written; without independent verification, takeoff-load accepts the swapped file.
## Outcome
- A single `helpers.sha256_sidecar` module is the only path through which any onboard process writes hash-verified artifacts.
- Atomic write is a hard contract: the temp-file → rename pattern guarantees no partial file ever appears at the target path. A fault between the bytes-flushed point and the rename leaves either the previous version or no file at all — never a half-written one.
- `verify(path)` recomputes the digest from the file's bytes; it does NOT trust the sidecar's value alone. A swapped artifact with a stale sidecar is detected.
- `aggregate_hash` is order-deterministic (sorts paths first), so the Manifest aggregate is reproducible across writer and verifier.
- The sidecar format is intentionally trivial (lowercase hex digest, no JSON wrapper, no trailing newline) so any small script can verify a single artifact without pulling in the helper.
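The write and verify behaviour described above can be sketched with the stdlib only. Note the swap: this example uses `tempfile.mkstemp` plus `os.replace` as a stand-in for the `atomicwrites` backend the task mandates, and writing the payload before the sidecar is an assumed ordering. Function names mirror the spec's static methods but are plain functions here.

```python
import contextlib
import hashlib
import os
import re
import tempfile
from pathlib import Path


class Sha256SidecarError(Exception):
    """Stand-in for the helper's exception type."""


_HEX_DIGEST = re.compile(r"[0-9a-f]{64}")


def sidecar_path(path: Path) -> Path:
    return path.with_name(path.name + ".sha256")


def write_atomic(path: Path, payload: bytes) -> None:
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic on a local POSIX filesystem
    except OSError as err:
        with contextlib.suppress(OSError):
            os.unlink(tmp_name)  # never leave a partial temp file behind
        raise Sha256SidecarError(f"atomic write to {path} failed: {err}") from err


def write_atomic_and_sidecar(path: Path, payload: bytes) -> None:
    write_atomic(path, payload)
    digest = hashlib.sha256(payload).hexdigest()
    write_atomic(sidecar_path(path), digest.encode("ascii"))  # raw hex, no newline


def verify(path: Path) -> bool:
    side = sidecar_path(path)
    if not side.exists():
        raise Sha256SidecarError(f"missing sidecar: {side}")
    recorded = side.read_bytes().decode("ascii", errors="replace")
    if _HEX_DIGEST.fullmatch(recorded) is None:
        raise Sha256SidecarError(f"malformed sidecar: {side}")
    # Recompute from the artifact's bytes; the sidecar is only the expected value.
    return hashlib.sha256(path.read_bytes()).hexdigest() == recorded


with tempfile.TemporaryDirectory() as root:
    artifact = Path(root) / "engine.engine"
    write_atomic_and_sidecar(artifact, b"payload")
    sidecar_bytes = sidecar_path(artifact).read_bytes()
    ok_before_swap = verify(artifact)
    artifact.write_bytes(b"swapped")  # out-of-band replacement
    ok_after_swap = verify(artifact)
    leftovers = sorted(p.name for p in Path(root).iterdir())
```

In this sketch `verify` leaves read errors on the artifact itself unwrapped; a complete helper would wrap those in `Sha256SidecarError` as well.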
## Scope
### Included
- `Sha256Sidecar` static methods: `write_atomic`, `write_atomic_and_sidecar`, `verify`, `aggregate_hash`.
- `Sha256SidecarError` exception type wrapping underlying `OSError` and capturing missing/malformed sidecar conditions.
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`.
### Excluded
- Cryptographic signing — this helper is corruption + accidental-replacement defense only; signing is out of scope (mid-flight tile gen has its own per-flight signing key path elsewhere).
- Streaming hashing for payloads larger than RAM — out of scope; the helper's API is `payload: bytes`.
- Compression / on-disk encoding — payloads are written verbatim.
- Sidecar versioning — there is no version byte.
- Filesystem-type detection (warning when run on NFS / overlayfs) — documented in contract Caveats; not enforced at runtime.
## Acceptance Criteria
**AC-1: Round-trip write + verify**
Given a 1 MiB random payload
When `write_atomic_and_sidecar(path, payload)` runs followed by `verify(path)`
Then `verify` returns True AND the sidecar at `path.sha256` contains a 64-char lowercase hex digest matching `hashlib.sha256(payload).hexdigest()`
**AC-2: Atomicity — no partial file on fault**
Given a fault is injected between the temp-file flush and the rename (e.g., monkey-patch `os.replace` to raise `OSError`)
When `write_atomic(path, payload)` runs and raises
Then `path` does NOT exist (or, if it pre-existed, its bytes are unchanged); no `*.tmp` or partial file remains at the target name
**AC-3: Independent verification rejects swapped payloads**
Given an artifact is written via `write_atomic_and_sidecar`, then the file at `path` is overwritten out-of-band with different bytes
When `verify(path)` runs
Then it returns False (NOT True; it must NOT trust the sidecar value alone)
**AC-4: Missing sidecar is an error, not False**
Given an artifact exists at `path` but `path.sha256` was deleted
When `verify(path)` runs
Then `Sha256SidecarError` is raised with a message naming the missing sidecar (the helper does NOT silently return False — that would conflate "corrupt artifact" with "missing sidecar")
**AC-5: Malformed sidecar is rejected**
Given a sidecar containing `not a hex digest` or a digest of wrong length
When `verify(path)` runs
Then `Sha256SidecarError` is raised mentioning malformed sidecar content
**AC-6: Aggregate hash is order-deterministic**
Given three files `a`, `b`, `c` and their hashes
When `aggregate_hash([a, b, c])` and `aggregate_hash([c, a, b])` run
Then both calls return the same hex digest (the implementation sorts paths internally)
**AC-7: Aggregate hash rejects missing files**
Given a list including a non-existent path
When `aggregate_hash` runs
Then `Sha256SidecarError` is raised mentioning the missing path
**AC-8: Sidecar format strictness**
Given the sidecar written by `write_atomic_and_sidecar`
When the file's bytes are read
Then the bytes are EXACTLY the 64-char lowercase hex digest — no JSON wrapper, no trailing newline, no whitespace
**AC-9: No upward imports (Layer 1 invariant)**
Given the helper module
When a static-import check runs
Then it imports ONLY from `_types`, `atomicwrites`, and the stdlib (including `hashlib` and `pathlib`); no `gps_denied_onboard.components.*` imports anywhere
## Non-Functional Requirements
**Performance**
- No specific latency budget per `_docs/02_document/common-helpers/05_helper_sha256_sidecar.md` (consumers are pre-flight / post-landing). Sanity bound: `write_atomic_and_sidecar` of a 1 MiB payload ≤ 50 ms on Tier-2.
**Reliability**
- `Sha256SidecarError` is the ONLY exception type the public surface raises on filesystem / sidecar errors. `OSError` MUST be wrapped so callers do not have to handle two error hierarchies.
- Pure deterministic: same payload always produces the same digest.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Round-trip write + verify on 1 MiB random payload | sidecar matches `hashlib.sha256(payload).hexdigest()`; `verify` True |
| AC-2 | Inject `OSError` between flush and rename | no partial file remains at target name |
| AC-3 | Overwrite payload after sidecar is written | `verify` returns False |
| AC-4 | Delete sidecar; call `verify` | `Sha256SidecarError`; mentions missing sidecar |
| AC-5 | Malformed sidecar content | `Sha256SidecarError`; mentions malformed sidecar |
| AC-6 | `aggregate_hash` with two different orderings | byte-equal digests |
| AC-7 | `aggregate_hash` with a missing path | `Sha256SidecarError`; mentions missing path |
| AC-8 | Read sidecar bytes after `write_atomic_and_sidecar` | exactly 64 hex chars; no newline / whitespace / JSON |
| AC-9 | importlinter / grep gate | no `components.*` imports |
| NFR-perf | Microbench `write_atomic_and_sidecar` of 1 MiB payload | ≤ 50 ms on Tier-2 |
## Constraints
- Public surface frozen by `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` v1.0.0.
- Layer 1 Foundation only.
- `atomicwrites` is the single atomic-rename backend; pinned in `pyproject.toml` at AZ-263 / E-BOOT.
- Static-only design satisfies `coderule.mdc`.
- No new dependency beyond what AZ-263 / E-BOOT pinned.
- Production cache root MUST live on a local POSIX filesystem (NFS / SMB / overlayfs are unsupported per the helper's atomic-rename invariant). Documented in deployment artifacts; not enforced at runtime.
## Risks & Mitigation
**Risk 1: A future helper change relaxes atomicity to "best-effort"**
- *Risk*: Someone replaces the temp-file → rename pattern with a direct write under the rationale "rename is slow on certain filesystems"; takeoff-load occasionally sees partial files.
- *Mitigation*: AC-2 makes atomicity a hard test. Any regression that loses the rename is caught immediately.
**Risk 2: `aggregate_hash` ordering drifts between writer and verifier**
- *Risk*: A future change adds case-insensitive sorting or strips path prefixes; writer and verifier disagree; cache root looks corrupt.
- *Mitigation*: AC-6 pins the deterministic-ordering invariant; the contract spells out the exact format (`<filename>\0<file-hex-digest>\n` lines, lexicographically sorted by full path).
**Risk 3: Sidecar format ambiguity (someone wraps the digest in JSON)**
- *Risk*: A future contributor "improves" the sidecar to be JSON for "extensibility"; verification scripts that expect raw hex break.
- *Mitigation*: AC-8 pins the exact byte-level format. Versioning rules force a major bump for any format change.
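The ordering invariant from Risk 2 can be sketched as follows. This is one reading of the line format quoted above: the bare file name is used for `<filename>`, sorting is by the full path string, and the outer digest is SHA-256 over the concatenated lines. All three choices are assumptions until checked against the contract.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


class Sha256SidecarError(Exception):
    """Stand-in for the helper's exception type."""


def aggregate_hash(paths: Iterable[Path]) -> str:
    outer = hashlib.sha256()
    # Case-sensitive lexicographic sort on the full path, independent of caller order.
    for path in sorted((Path(p) for p in paths), key=lambda p: p.as_posix()):
        try:
            file_digest = hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError as err:
            raise Sha256SidecarError(f"cannot hash missing or unreadable path: {path}") from err
        outer.update(path.name.encode("utf-8") + b"\0" + file_digest.encode("ascii") + b"\n")
    return outer.hexdigest()


with tempfile.TemporaryDirectory() as root:
    a, b, c = (Path(root) / name for name in ("a", "b", "c"))
    for index, path in enumerate((a, b, c)):
        path.write_bytes(bytes([index]))
    digest_abc = aggregate_hash([a, b, c])
    digest_cab = aggregate_hash([c, a, b])
    try:
        aggregate_hash([a, Path(root) / "missing"])
        missing_error = ""
    except Sha256SidecarError as err:
        missing_error = str(err)
```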
## Runtime Completeness
- **Named capability**: atomic-write + SHA-256 content-hash sidecar (D-C10-3 / `05_helper_sha256_sidecar.md`).
- **Production code that must exist**: real `atomicwrites`-backed atomic rename; real `hashlib.sha256` digesting; real independent verify.
- **Allowed external stubs**: none. `hashlib` is stdlib and `atomicwrites` is a pinned production dependency, so neither needs a stub.
- **Unacceptable substitutes**: direct write (loses atomicity); trusting the sidecar value without recomputing the file's hash; JSON-wrapped sidecar; case-insensitive aggregate ordering.
## Contract
This task produces the contract at `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`.
Consumers MUST read that file — not this task spec — to discover the interface.