mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 17:31:13 +00:00
[AZ-266] [AZ-269] [AZ-277] [AZ-280] Cross-cutting log/config + SE3/SHA256 helpers
AZ-266: schema-compliant JSON logging entrypoint, level normalisation, handler-topology guard, format-error fallback (log_record_schema v1.0.0).

AZ-269: env > YAML > defaults config loader, frozen Config dataclass, missing-var fail-fast with pointer to .env.example, component-block registry.

AZ-277: GTSAM-backed SE3Utils (matrix<->SE3 + exp/log/adjoint) with strict orthogonality, dtype, and bottom-row contract enforcement.

AZ-280: atomicwrites-backed write_atomic + independent verify + order-deterministic aggregate_hash; sidecar format strictness.

pyproject.toml pins gtsam>=4.2,<5.0 and atomicwrites>=1.4,<2.0 (named-backend deps per the AZ-277 / AZ-280 contracts).

139 unit tests pass (44 new). Review verdict: PASS_WITH_WARNINGS; findings are perf-NFR + journald deferrals, no blocking issues.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,106 @@
|
||||
# Shared Structured Logging Module
|
||||
|
||||
**Task**: AZ-266_log_module
|
||||
**Name**: Shared Logging Module
|
||||
**Description**: Provide the `get_logger(component_id)` entrypoint, a stable JSON formatter that emits records matching the log_record_schema contract, and the stdout / journald handlers used by Tier-1 and Tier-2 deployments.
|
||||
**Complexity**: 3 points
|
||||
**Dependencies**: AZ-263_initial_structure
|
||||
**Component**: shared.logging (cross-cutting; epic AZ-245 / E-CC-LOG)
|
||||
**Tracker**: AZ-266
|
||||
**Epic**: AZ-245 (E-CC-LOG)
|
||||
|
||||
## Problem
|
||||
|
||||
Every onboard component must emit structured JSON logs at DEBUG / INFO / WARN / ERROR with a stable, machine-parseable shape so post-flight analysis (FDR tooling, blackbox scenario checks, traceability matrix verification) can correlate events across components. Without one shared logger, format drift is guaranteed within a few weeks of parallel component development.
|
||||
|
||||
## Outcome
|
||||
|
||||
- A single `get_logger(component_id)` call is the only logging entrypoint any onboard module ever uses.
|
||||
- Every emitted record is a single-line JSON object whose key set, key order, and value types match the `log_record_schema` contract version 1.0.0.
|
||||
- Tier-1 deployments capture logs via Docker stdout; Tier-2 deployments capture logs via journald — switched by config, not by code.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `get_logger(component_id: str) -> Logger` factory backed by Python stdlib `logging`.
|
||||
- A JSON formatter that emits the schema's 8 fields in the contract-mandated order, regardless of construction order. The implementation may use `python-json-logger` or an `orjson`-backed formatter, whichever is already pinned in the project's lockfile from AZ-263.
|
||||
- A stdout handler for Tier-1 (Docker) and a journald handler for Tier-2 (Jetson). Selection is config-driven via the structured-logging entry of the cross-cutting config epic (AZ-246 / E-CC-CONF).
|
||||
- Per-frame structured-logging helpers for the documented per-component shapes referenced in epic AZ-245 (`vio.tick`, `vpr.query`, etc.) so component code can emit one-liner logs without rebuilding the kv dict.
|
||||
- Public interface contract published at `_docs/02_document/contracts/shared_logging/log_record_schema.md`.
|
||||
|
||||
### Excluded
|
||||
|
||||
- The FDR bridge that forwards ERROR + WARN records into the Flight Data Recorder — owned by the next task (`03_fdr_log_bridge`, parented to the same epic).
|
||||
- Per-component log call sites (each component epic owns its own logging call sites).
|
||||
- Log schema versioning beyond 1.0.0 — handled by future change-log entries on the contract file.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: Single logger entrypoint**
|
||||
Given any onboard Python module that imports the shared logging package
|
||||
When the module calls `get_logger("c2_vpr")`
|
||||
Then it receives a `Logger` whose every record passes the schema contract test (no other logger configuration is required by the caller)
|
||||
|
||||
**AC-2: Field order is stable**
|
||||
Given a logger configured with the JSON formatter
|
||||
When a component calls `logger.info(msg, extra={"frame_id": 42, "kind": "vpr.query", "kv": {...}})`
|
||||
Then the emitted bytes parse as a single-line JSON object whose keys appear in the order `ts, level, component, frame_id, kind, msg, kv, exc`, regardless of the order the caller passed the fields
|
||||
|
||||
**AC-3: Level normalisation**
|
||||
Given a logger receiving a record at level `WARNING` (Python stdlib name)
|
||||
When the formatter emits the JSON record
|
||||
Then the `level` field reads `WARN` (per contract), not `WARNING`
|
||||
|
||||
**AC-4: Handler topology selection**
|
||||
Given the structured-logging config block selects `tier=1` (or `tier=2`)
|
||||
When `runtime_root.py` initialises logging
|
||||
Then exactly one stdout handler (or journald handler) is attached, with no duplicate handlers and no handler from the wrong tier
|
||||
|
||||
**AC-5: Non-frame records emit a null frame_id**
|
||||
Given a startup or shutdown log call that does not pass a `frame_id`
|
||||
When the record is emitted
|
||||
Then `frame_id` appears as JSON `null` (never as a synthesised value, never absent from the key list)
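
A minimal sketch of how AC-2, AC-3, and AC-5 could be satisfied together with stdlib `logging` and `json`. The class name `OrderedJsonFormatter`, the ISO-8601 UTC `ts` encoding, the `null` default for `kind`, and the use of the logger name as `component` are assumptions made for illustration; the contract file remains the authority on field types.

```python
import json
import logging
from datetime import datetime, timezone

# Key order mandated by AC-2.
FIELD_ORDER = ("ts", "level", "component", "frame_id", "kind", "msg", "kv", "exc")
# stdlib level name -> contract level name (AC-3).
LEVEL_ALIASES = {"WARNING": "WARN"}


class OrderedJsonFormatter(logging.Formatter):
    """Hypothetical formatter name; not taken from the task spec."""

    def format(self, record: logging.LogRecord) -> str:
        # Assumption: ts is an ISO-8601 UTC timestamp.
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat()
        fields = {
            "kv": getattr(record, "kv", {}),
            "msg": record.getMessage(),
            "kind": getattr(record, "kind", None),
            # AC-5: a missing frame_id becomes JSON null, never a made-up value.
            "frame_id": getattr(record, "frame_id", None),
            "component": record.name,
            "level": LEVEL_ALIASES.get(record.levelname, record.levelname),
            "ts": ts,
            "exc": self.formatException(record.exc_info) if record.exc_info else None,
        }
        # Rebuild in contract order; json.dumps preserves dict insertion order,
        # so the order in which `fields` was populated does not matter.
        ordered = {key: fields[key] for key in FIELD_ORDER}
        return json.dumps(ordered, separators=(",", ":"))
```

Because the output dict is rebuilt from `FIELD_ORDER`, the caller's `extra={...}` ordering cannot leak into the emitted line.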
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- Per-record formatter latency p99 ≤ 0.2 ms on Tier-2 (Jetson Orin Nano Super) for a record with `len(kv) ≤ 8` scalar entries. Validated by a microbenchmark in unit tests.
|
||||
- DEBUG records on the steady-state hot path allocate at most one new string (the formatted JSON line); no transient dict copies of `kv` are permitted.
|
||||
|
||||
**Reliability**
|
||||
- Formatter never raises into the caller. A serialisation failure logs an internal `WARN` with `kind="log.format_error"` and drops the offending record's `kv` payload (replaces with `{"_format_error": "<reason>"}`); the rest of the record is still emitted.
|
||||
- No global mutable state outside the standard `logging` module's own logger registry; multiple `get_logger("c2_vpr")` calls return the same cached `Logger` instance.
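
The format-error fallback above can be sketched as a small guard around `kv` serialisation. The function name `serialise_kv` is hypothetical, and the sketch leaves out the internal `WARN` record with `kind="log.format_error"` that the requirement also calls for.

```python
import json
from typing import Any, Mapping


def serialise_kv(kv: Mapping[str, Any]) -> str:
    """Return kv as JSON text, or the error placeholder if it cannot be encoded."""
    try:
        return json.dumps(kv, separators=(",", ":"))
    except (TypeError, ValueError) as err:
        # json.dumps raises TypeError for unsupported objects and ValueError
        # for circular references; neither may propagate to the caller.
        return json.dumps({"_format_error": str(err)}, separators=(",", ":"))
```

The rest of the record (timestamp, level, component, message) is unaffected, so a bad payload costs only its own `kv` block.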
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `get_logger("c2_vpr")` returns a Logger with the JSON formatter attached | Logger instance present; formatter produces valid contract record |
| AC-2 | Emit a record with kwargs in shuffled order | Parsed JSON keys appear in the contract's mandated order |
| AC-3 | Log at `logging.WARNING` level | Emitted JSON `level` field equals `"WARN"` |
| AC-4 | Initialise logging twice with the same tier-1 config | Exactly one stdout handler attached; no duplicates |
| AC-5 | Log a startup INFO without `frame_id` | Emitted JSON contains `"frame_id": null` |
| NFR-perf | Microbenchmark formatter on a record with 8 scalar kv entries | p99 ≤ 0.2 ms over 10k iterations |
| NFR-reliability | Pass a non-JSON-serialisable object in `kv` (e.g. a class instance) | Formatter emits the record with `kv={"_format_error": "..."}`; caller does not see an exception |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Public interface frozen by `_docs/02_document/contracts/shared_logging/log_record_schema.md` v1.0.0 — any change requires a contract version bump.
|
||||
- Stdlib `logging` is the only allowed underlying logging mechanism (per epic AZ-245 architecture note: "no third-party log aggregator").
|
||||
- No new dependency beyond what AZ-263 / E-BOOT already pinned in `pyproject.toml`.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Formatter performance regression**
|
||||
- *Risk*: Naïve `json.dumps` on each record exceeds the 0.2 ms p99 budget on Jetson.
|
||||
- *Mitigation*: Bench against `orjson`-backed formatter as a fallback if stdlib `json` misses budget; choice is reversible because the contract is the public surface, not the formatter implementation.
|
||||
|
||||
**Risk 2: Handler duplication on hot-reload**
|
||||
- *Risk*: Re-initialising logging during integration tests stacks duplicate handlers, multiplying every emitted record.
|
||||
- *Mitigation*: `get_logger` checks for existing handlers on the named logger before adding new ones; integration test fixture asserts handler count after teardown.
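
The Risk 2 mitigation amounts to an idempotent handler attach. A minimal sketch, assuming a plain stdout `StreamHandler` stands in for the tier-selected handler and that propagation to the root logger is disabled (both are illustration choices, not stated in the spec):

```python
import logging
import sys


def get_logger(component_id: str) -> logging.Logger:
    # logging.getLogger caches by name, so repeated calls return the same object.
    logger = logging.getLogger(component_id)
    if not logger.handlers:
        # Attach only when nothing is attached yet; re-initialisation is a no-op.
        logger.addHandler(logging.StreamHandler(sys.stdout))
        # Assumption: avoid double emission through the root logger.
        logger.propagate = False
    return logger
```

A test fixture can then assert `len(get_logger("c2_vpr").handlers) == 1` after any number of initialisations.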
|
||||
|
||||
## Contract
|
||||
|
||||
This task produces the contract at `_docs/02_document/contracts/shared_logging/log_record_schema.md`.
|
||||
Consumers MUST read that file — not this task spec — to discover the interface.
|
||||
@@ -0,0 +1,104 @@
|
||||
# Config Loader + Outer Config Container
|
||||
|
||||
**Task**: AZ-269_config_loader
|
||||
**Name**: Config Loader
|
||||
**Description**: Implement `load_config(env, paths) -> Config` and the outer frozen `Config` dataclass. Merges env vars + one or more YAML files + documented defaults with strict precedence (env > YAML > defaults), returning an immutable container that holds one nested dataclass field per component slug.
|
||||
**Complexity**: 3 points
|
||||
**Dependencies**: AZ-263_initial_structure
|
||||
**Component**: shared.config (cross-cutting; epic AZ-246 / E-CC-CONF)
|
||||
**Tracker**: AZ-269
|
||||
**Epic**: AZ-246 (E-CC-CONF)
|
||||
|
||||
## Problem
|
||||
|
||||
ADR-001 (runtime selection by config) and ADR-009 (composition root) both require a single source of truth for configuration. Without a shared loader with explicit precedence rules, components silently fall back to defaults, the composition root grows local config-parsing logic, and operators cannot reliably override settings via env in CI or by YAML in the field.
|
||||
|
||||
## Outcome
|
||||
|
||||
- `load_config(env, paths)` is the only function any onboard process uses to materialise its `Config` at startup.
|
||||
- Precedence is deterministic and observable: env > YAML > defaults; later YAML files win over earlier ones; missing keys fall to defaults.
|
||||
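- The returned `Config` is frozen end-to-end (every nested component block is also frozen), so accidental mutation by component code raises an exception (`dataclasses.FrozenInstanceError` for frozen dataclasses, or `TypeError`) instead of silently succeeding.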
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `load_config(env: Mapping[str, str], paths: Sequence[Path]) -> Config` per the composition_root_protocol contract.
|
||||
- Outer frozen `Config` dataclass with one nested field per component slug. The OUTER container is owned by this task; the per-component nested dataclasses are owned by each component's epic and registered into the outer Config via a documented extension mechanism (a registry function called from `runtime_root.py`).
|
||||
- Documented default values for cross-cutting blocks only (logging level, FDR queue size, etc.). Per-component defaults live in their own component epics.
|
||||
- Friendly error messages when a required env var is missing (per AZ-263 AC-8): the error names the offending variable and points to `.env.example`.
|
||||
|
||||
### Excluded
|
||||
|
||||
- `compose_root` and `compose_operator` — owned by the next PBI in this epic.
|
||||
- Per-component config blocks — owned by each component epic.
|
||||
- The runtime self-check that strategies are linked — owned by the next PBI (StrategyNotLinkedError).
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: Precedence env > YAML > defaults**
|
||||
Given env sets `LOG_LEVEL=DEBUG` and YAML sets `log.level=INFO`
|
||||
When `load_config(env, [yaml_path])` runs
|
||||
Then `config.log.level == "DEBUG"`
|
||||
|
||||
**AC-2: YAML > defaults when env is silent**
|
||||
Given env has no `LOG_LEVEL` and YAML sets `log.level=INFO`
|
||||
When `load_config(env, [yaml_path])` runs
|
||||
Then `config.log.level == "INFO"`
|
||||
|
||||
**AC-3: Defaults fill gaps**
|
||||
Given env has no `LOG_LEVEL` and YAML omits `log.level`
|
||||
When `load_config(env, [yaml_path])` runs
|
||||
Then `config.log.level` equals the documented default
|
||||
|
||||
**AC-4: Multi-file YAML merge order**
|
||||
Given two YAML paths where the second sets `fdr.queue_size=8192` and the first sets it to `4096`
|
||||
When `load_config(env, [first, second])` runs
|
||||
Then `config.fdr.queue_size == 8192` (later file wins)
|
||||
|
||||
**AC-5: Frozen end-to-end**
|
||||
Given a loaded `Config`
|
||||
When component code attempts `config.log.level = "DEBUG"`
|
||||
Then a `TypeError` (or `FrozenInstanceError`) is raised
|
||||
|
||||
**AC-6: Required-var missing fails fast with pointer**
|
||||
Given a required env var is unset and no YAML override or default exists
|
||||
When `load_config(env, paths)` runs
|
||||
Then it raises an error whose message names the missing var and points to `.env.example`
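
The precedence rules in AC-1 through AC-6 can be sketched as below. To stay dependency-free, the sketch takes already-parsed YAML documents (plain dicts) rather than file paths, so the function is named `load_config_from_dicts` instead of `load_config`. The default values, the `ENV_MAP` table, `ConfigError`, and `require_env` are all hypothetical names and values chosen for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Mapping, Sequence

DEFAULTS = {"log": {"level": "INFO"}, "fdr": {"queue_size": 4096}}  # assumed defaults
ENV_MAP = {"LOG_LEVEL": ("log", "level")}  # hypothetical env var -> (block, key)


class ConfigError(Exception):
    """Hypothetical error type for loader failures."""


@dataclass(frozen=True)
class LogConfig:
    level: str


@dataclass(frozen=True)
class FdrConfig:
    queue_size: int


@dataclass(frozen=True)
class Config:
    log: LogConfig
    fdr: FdrConfig


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict:
    """Return a new dict in which override wins, recursing into nested mappings."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def require_env(env: Mapping[str, str], name: str) -> str:
    """Fail fast, naming the variable and pointing at .env.example (AC-6)."""
    if name not in env:
        raise ConfigError(f"required env var {name} is not set; see .env.example")
    return env[name]


def load_config_from_dicts(
    env: Mapping[str, str], yaml_docs: Sequence[Mapping[str, Any]]
) -> Config:
    merged: Mapping[str, Any] = DEFAULTS
    for doc in yaml_docs:  # later documents win over earlier ones (AC-4)
        merged = deep_merge(merged, doc)
    for var, (block, key) in ENV_MAP.items():  # env is applied last, so it wins (AC-1)
        if var in env:
            merged = deep_merge(merged, {block: {key: env[var]}})
    return Config(log=LogConfig(**merged["log"]), fdr=FdrConfig(**merged["fdr"]))
```

Applying the layers in the order defaults, YAML files, env makes the precedence a property of the merge order rather than of per-key special cases. Env values arrive as strings, so a real loader would also need per-field type coercion, which this sketch omits.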
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- Cold-start `load_config` ≤ 250 ms on Tier-2 (this leaves the remainder of the 1 s startup budget for the rest of compose_root).
|
||||
|
||||
**Reliability**
|
||||
- Loader is pure: the same env and the same file contents always yield a deep-equal `Config`. Verified by the NFR-reliability unit test.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | env vs. YAML for `log.level` | env value wins |
| AC-2 | YAML vs. default | YAML value wins |
| AC-3 | All-default for `log.level` | documented default returned |
| AC-4 | Two YAML files, conflicting key | later file wins |
| AC-5 | Mutation attempt on loaded Config | TypeError / FrozenInstanceError |
| AC-6 | Missing required env var | error message names the var + points to `.env.example` |
| NFR-perf | Microbenchmark `load_config` over a representative config | p99 ≤ 250 ms on Tier-2 |
| NFR-reliability | Call `load_config` twice with same args | deep-equal `Config` instances |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Public surface frozen by `_docs/02_document/contracts/shared_config/composition_root_protocol.md` v1.0.0.
|
||||
- No new dependency beyond what AZ-263 / E-BOOT pinned (stdlib + the YAML library already in `pyproject.toml`).
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Per-component defaults drift across components**
|
||||
- *Risk*: Without a documented registration mechanism, two components may both claim a `log.level` default and conflict.
|
||||
- *Mitigation*: Defaults registry is keyed by component slug + key; collisions raise at registration time, not at load time.
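
A sketch of the registration-time collision check described in the mitigation. `DefaultsRegistry`, `DefaultsCollisionError`, and the method names are hypothetical; the spec only states that the registry is keyed by component slug plus key and that collisions raise at registration.

```python
from typing import Any, Dict, Tuple


class DefaultsCollisionError(ValueError):
    """Hypothetical error raised when a default is registered twice."""


class DefaultsRegistry:
    def __init__(self) -> None:
        self._defaults: Dict[Tuple[str, str], Any] = {}

    def register(self, component_slug: str, key: str, value: Any) -> None:
        slot = (component_slug, key)
        if slot in self._defaults:
            # Fail at registration time so the conflict surfaces at startup
            # wiring, not later during load_config.
            raise DefaultsCollisionError(
                f"default for {component_slug}.{key} is already registered"
            )
        self._defaults[slot] = value

    def get(self, component_slug: str, key: str) -> Any:
        return self._defaults[(component_slug, key)]
```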
|
||||
|
||||
## Contract
|
||||
|
||||
This task produces (jointly with AZ-NN compose_root) the contract at `_docs/02_document/contracts/shared_config/composition_root_protocol.md`.
|
||||
Consumers MUST read that file — not this task spec — to discover the interface.
|
||||
@@ -0,0 +1,152 @@
|
||||
# SE3Utils Helper Module
|
||||
|
||||
**Task**: AZ-277_se3_utils
|
||||
**Name**: SE3Utils Helper
|
||||
**Description**: Implement the shared `SE3Utils` helper for SE(3) ↔ 4×4-matrix conversion and Lie-algebra exp/log/adjoint, backed by GTSAM `Pose3` primitives. Used wherever a consumer needs a 6-vector twist, a Jacobian over an SE(3) operation, or a deterministic conversion between matrix and pose forms — i.e., C1, C2.5, C3, C3.5, C4, C5, C8. Stateless; pure functions; strict caller-orthogonalisation contract.
|
||||
**Complexity**: 2 points
|
||||
**Dependencies**: AZ-263_initial_structure
|
||||
**Component**: shared.helpers.se3_utils (cross-cutting; epic AZ-264 / E-CC-HELPERS)
|
||||
**Tracker**: AZ-277
|
||||
**Epic**: AZ-264 (E-CC-HELPERS)
|
||||
|
||||
### Document Dependencies
|
||||
|
||||
- `_docs/02_document/contracts/shared_helpers/se3_utils.md` — frozen public interface this task produces.
|
||||
- `_docs/02_document/common-helpers/02_helper_se3_utils.md` — design rationale and consumer mapping.
|
||||
|
||||
## Problem
|
||||
|
||||
Seven components (C1, C2.5, C3, C3.5, C4, C5, C8) need to cross the matrix-vs-pose boundary:
|
||||
- C4's `solvePnPRansac` returns a 4×4 matrix; C5's iSAM2 graph wants a GTSAM `Pose3`.
|
||||
- C1's relative-pose update needs `log_map` for covariance recovery.
|
||||
- C8 encodes pose as a 6-vector for FC adapter emission.
|
||||
|
||||
Without a shared helper:
|
||||
- Each component re-implements the conversion, drifting on rotation conventions, sign conventions, or near-identity edge cases.
|
||||
- Subtle differences in `det(R)` validation (some silently re-orthogonalise, others reject) break the "same pose in, same pose out" invariant across components.
|
||||
- Any future change (e.g., switching from GTSAM `Pose3` to `manifpy`) becomes a 7-place coordinated edit.
|
||||
|
||||
## Outcome
|
||||
|
||||
- A single `helpers.se3_utils` module is the only place that constructs a `Pose3` from a matrix or vice-versa across the codebase. Component imports go through the helper.
|
||||
- All conversions are pure functions: same input → byte-equal numpy / GTSAM output.
|
||||
- Strict orthogonal-rotation contract: `matrix_to_se3` rejects non-orthogonal or negative-determinant rotations with `Se3InvalidMatrixError` instead of silently fixing them. Callers are responsible for orthogonalisation; the rejection forces the bug back to the source.
|
||||
- Near-identity Lie-algebra inputs (twist norm < 1e-10) are stable — `exp_map` falls back to the small-angle Taylor expansion documented in GTSAM rather than NaN-ing on `sin(θ)/θ`.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `matrix_to_se3(T_4x4) -> SE3`, `se3_to_matrix(SE3) -> np.ndarray`.
|
||||
- `exp_map(xi) -> SE3`, `log_map(SE3) -> np.ndarray`, `adjoint(SE3) -> np.ndarray`.
|
||||
- `is_valid_rotation(R_3x3, *, atol)` predicate for callers to check before calling `matrix_to_se3`.
|
||||
- `Se3InvalidMatrixError` exception type.
|
||||
- Re-export of GTSAM `Pose3` as `SE3` so consumers do not import GTSAM directly.
|
||||
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/se3_utils.md`.
|
||||
|
||||
### Excluded
|
||||
|
||||
- Quaternion conversions — consumers convert via numpy / GTSAM directly.
|
||||
- SE(2) helpers — out of scope.
|
||||
- Pose interpolation / Slerp — out of scope.
|
||||
- Higher-order manifold ops (parallel transport, composition Jacobians) — out of scope.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: 4×4 ↔ SE3 round-trip**
|
||||
Given a randomly-sampled valid `T_4x4` (orthogonal rotation, positive determinant, identity bottom row)
|
||||
When `matrix_to_se3` then `se3_to_matrix` runs
|
||||
Then the recovered matrix matches the input via `np.allclose(..., atol=1e-9)`
|
||||
|
||||
**AC-2: Lie-algebra round-trip**
|
||||
Given a random twist `xi` of shape `(6,)` and norm ≈ 1.0
|
||||
When `exp_map(xi)` then `log_map(...)` runs
|
||||
Then the recovered twist matches `xi` via `np.allclose(..., atol=1e-9)`
|
||||
|
||||
**AC-3: Near-identity Lie stability**
|
||||
Given `xi = [1e-12, 1e-12, 1e-12, 1e-12, 1e-12, 1e-12]`
|
||||
When `exp_map(xi)` runs
|
||||
Then the result is the identity pose within `atol=1e-9`; no exception, no NaN
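
To illustrate why AC-3 exists, the sketch below shows the rotation part of the exponential map in plain numpy with a Taylor fallback for small angles. It is not the helper's implementation (the spec requires GTSAM `Pose3` primitives); the function names are hypothetical, and the threshold is applied here to the rotation-vector norm rather than the full twist norm.

```python
import numpy as np

SMALL_ANGLE = 1e-10  # threshold quoted in the Outcome section


def skew(w: np.ndarray) -> np.ndarray:
    """3x3 skew-symmetric matrix such that skew(w) @ v == np.cross(w, v)."""
    return np.array(
        [[0.0, -w[2], w[1]],
         [w[2], 0.0, -w[0]],
         [-w[1], w[0], 0.0]]
    )


def so3_exp(omega) -> np.ndarray:
    """Rodrigues formula R = I + a*K + b*K^2 with a = sin(t)/t, b = (1-cos(t))/t^2."""
    omega = np.asarray(omega, dtype=np.float64)
    theta = float(np.linalg.norm(omega))
    if theta < SMALL_ANGLE:
        # Second-order Taylor expansions avoid the 0/0 in sin(t)/t at t = 0.
        a = 1.0 - theta**2 / 6.0
        b = 0.5 - theta**2 / 24.0
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
    K = skew(omega)
    return np.eye(3) + a * K + b * (K @ K)
```

Without the branch, `theta == 0.0` would evaluate `0.0 / 0.0` and produce NaN entries, which is exactly the failure AC-3 rules out.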
|
||||
|
||||
**AC-4: Strict orthogonality rejection**
|
||||
Given `T_4x4` whose `R` has `||R^T R - I||_F = 1e-3`
|
||||
When `matrix_to_se3(T)` runs
|
||||
Then `Se3InvalidMatrixError` is raised AND the helper does NOT silently re-orthogonalise (the message names the deviation magnitude)
|
||||
|
||||
**AC-5: Mirror rejection**
|
||||
Given `T_4x4` with `det(R) ≈ -1`
|
||||
When `matrix_to_se3(T)` runs
|
||||
Then `Se3InvalidMatrixError` is raised mentioning the negative determinant
|
||||
|
||||
**AC-6: Block-layout guard**
|
||||
Given `T_4x4` with bottom row `[0, 0, 0, 2]` (or any deviation from `[0, 0, 0, 1]`)
|
||||
When `matrix_to_se3(T)` runs
|
||||
Then `Se3InvalidMatrixError` is raised mentioning the bottom row
|
||||
|
||||
**AC-7: dtype contract**
|
||||
Given `T_4x4` with `dtype=float32`
|
||||
When `matrix_to_se3(T)` runs
|
||||
Then `Se3InvalidMatrixError` is raised mentioning dtype (helpers operate strictly on `float64`)
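
The rejection rules in AC-4 through AC-7 can be sketched as a numpy-only pre-check that would run before the matrix is handed to GTSAM. The function name `check_se3_matrix`, the `ValueError` base class, the default `atol`, and the exact-equality test on the bottom row are assumptions; the contract file defines the real tolerances.

```python
import numpy as np


class Se3InvalidMatrixError(ValueError):
    """Base class is an assumption; the spec only names the exception."""


def check_se3_matrix(T: np.ndarray, *, atol: float = 1e-9) -> None:
    """Raise Se3InvalidMatrixError unless T is a valid float64 SE(3) matrix."""
    if not isinstance(T, np.ndarray) or T.dtype != np.float64:
        got = getattr(T, "dtype", type(T).__name__)
        raise Se3InvalidMatrixError(f"dtype must be float64, got {got}")
    if T.shape != (4, 4):
        raise Se3InvalidMatrixError(f"shape must be (4, 4), got {T.shape}")
    if not np.array_equal(T[3], [0.0, 0.0, 0.0, 1.0]):
        raise Se3InvalidMatrixError(f"bottom row must be [0, 0, 0, 1], got {T[3]}")
    R = T[:3, :3]
    deviation = np.linalg.norm(R.T @ R - np.eye(3), ord="fro")
    if deviation > atol:
        # Report the magnitude and reject; never repair the rotation here.
        raise Se3InvalidMatrixError(
            f"rotation is not orthogonal: ||R^T R - I||_F = {deviation:.3e}"
        )
    if np.linalg.det(R) < 0:
        raise Se3InvalidMatrixError("rotation has negative determinant (mirror)")
```

The orthogonality check runs before the determinant check because a mirror matrix is still orthogonal; only the sign of `det(R)` distinguishes it from a proper rotation.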
|
||||
|
||||
**AC-8: Determinism**
|
||||
Given the same `T_4x4` (or `xi`)
|
||||
When converted twice through any helper function
|
||||
Then both outputs are byte-equal
|
||||
|
||||
**AC-9: No upward imports (Layer 1 invariant)**
|
||||
Given the helper module
|
||||
When a static-import check runs
|
||||
Then it imports ONLY from `_types`, GTSAM, numpy, and stdlib — no `gps_denied_onboard.components.*` imports anywhere
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- Each helper function p99 ≤ 50 µs on Tier-2 — overhead vs. inline GTSAM ≤ 5 % (per E-CC-HELPERS hot-path NFR).
|
||||
|
||||
**Reliability**
|
||||
- Pure deterministic; same input → byte-equal output.
|
||||
- `Se3InvalidMatrixError` is the ONLY exception type the public surface raises on shape / orthogonality / dtype violations.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `np.allclose(se3_to_matrix(matrix_to_se3(T)), T)` for 100 random valid `T` | all pass within `atol=1e-9` |
| AC-2 | `np.allclose(log_map(exp_map(xi)), xi)` for 100 random `xi` (norm ≈ 1.0) | all pass within `atol=1e-9` |
| AC-3 | `exp_map([1e-12]*6)` | identity pose within `atol=1e-9`; no NaN |
| AC-4 | non-orthogonal `T` | `Se3InvalidMatrixError`; message names deviation |
| AC-5 | `det(R) = -1` `T` | `Se3InvalidMatrixError`; mentions determinant |
| AC-6 | bottom row `[0, 0, 0, 2]` | `Se3InvalidMatrixError`; mentions bottom row |
| AC-7 | `float32` dtype | `Se3InvalidMatrixError`; mentions dtype |
| AC-8 | call any helper twice with same input | byte-equal outputs |
| AC-9 | static import scan | only `_types`, GTSAM, numpy, stdlib |
| NFR-perf | microbench each helper (10k iterations on Tier-2 fixture) | p99 ≤ 50 µs each |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Public surface frozen by `_docs/02_document/contracts/shared_helpers/se3_utils.md` v1.0.0.
|
||||
- Layer 1 Foundation only.
|
||||
- GTSAM is the single math backend; numpy fallback only when GTSAM does not expose the primitive.
|
||||
- No new dependency beyond what AZ-263 / E-BOOT pinned.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Silent re-orthogonalisation hides upstream rotation drift**
|
||||
- *Risk*: A future change "softens" `matrix_to_se3` to silently re-orthogonalise inputs; consumers no longer learn that their rotation source is producing non-orthogonal matrices.
|
||||
- *Mitigation*: AC-4 makes strict rejection part of the contract. The contract test enforces that `Se3InvalidMatrixError` is raised, not absorbed.
|
||||
|
||||
**Risk 2: GTSAM API drift between minor versions**
|
||||
- *Risk*: `Pose3.expmap` signature changes; this helper breaks on a GTSAM upgrade.
|
||||
- *Mitigation*: GTSAM is pinned in `pyproject.toml` at AZ-263 / E-BOOT; this helper's tests are the canary that detects drift before consumers do.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: SE(3) ↔ matrix conversion + Lie-algebra exp/log/adjoint via GTSAM `Pose3` primitives (architecture / E-CC-HELPERS / `02_helper_se3_utils.md`).
|
||||
- **Production code that must exist**: real GTSAM-backed conversions; real strict-orthogonality guard; real small-angle Taylor fallback for near-identity exp.
|
||||
- **Allowed external stubs**: numpy fallback only where GTSAM does not expose the primitive (e.g., adjoint matrix construction).
|
||||
- **Unacceptable substitutes**: silent re-orthogonalisation; "for now we just call `np.linalg.logm`" (numerically inferior, no Jacobian); skipping near-identity small-angle handling (NaN risk).
|
||||
|
||||
## Contract
|
||||
|
||||
This task produces the contract at `_docs/02_document/contracts/shared_helpers/se3_utils.md`.
|
||||
Consumers MUST read that file — not this task spec — to discover the interface.
|
||||
@@ -0,0 +1,154 @@
|
||||
# Sha256Sidecar Helper Module
|
||||
|
||||
**Task**: AZ-280_sha256_sidecar
|
||||
**Name**: Sha256Sidecar Helper
|
||||
**Description**: Implement the shared `Sha256Sidecar` helper that owns the atomic-write + SHA-256 content-hash sidecar pattern (D-C10-3). Every persistent artifact that takeoff-load (F2) must verify gets written atomically AND has a `.sha256` sidecar that the verifier can independently recompute. Used by C6 (FAISS index, descriptor sidecar), C7 (engine cache + INT8 calibration cache), C10 (Manifest), and C11 (tile artifact verification). Stateless static-only design.
|
||||
**Complexity**: 2 points
|
||||
**Dependencies**: AZ-263_initial_structure
|
||||
**Component**: shared.helpers.sha256_sidecar (cross-cutting; epic AZ-264 / E-CC-HELPERS)
|
||||
**Tracker**: AZ-280
|
||||
**Epic**: AZ-264 (E-CC-HELPERS)
|
||||
|
||||
### Document Dependencies
|
||||
|
||||
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` — frozen public interface this task produces.
|
||||
- `_docs/02_document/common-helpers/05_helper_sha256_sidecar.md` — design rationale and consumer mapping (D-C10-3).
|
||||
|
||||
## Problem
|
||||
|
||||
The takeoff-load gate (F2) verifies four classes of persistent artifact: FAISS index + descriptor sidecar (C6), TensorRT engine cache + INT8 calibration cache (C7), Manifest (C10), and tile artifacts (C11). Each artifact must be written atomically (no partial files) AND must have a hash sidecar the verifier can independently recompute.
|
||||
|
||||
Without a shared helper:
|
||||
- C6 / C7 / C10 / C11 each grow their own atomic-write + hash implementation; subtle differences in temp-file naming, rename ordering, or sidecar format break the cross-component verifier the moment one drifts.
|
||||
- The Manifest aggregate hash (which covers many files) depends on path-ordering logic that must be implemented in exactly one place; if the ordering ever differs between a writer and a verifier, the entire cache root looks corrupt.
|
||||
- An attacker (or accidental `rsync`) replaces `engine.engine` after `engine.engine.sha256` was written; without independent verification, takeoff-load accepts the swapped file.
|
||||
|
||||
## Outcome
|
||||
|
||||
- A single `helpers.sha256_sidecar` module is the only path through which any onboard process writes hash-verified artifacts.
|
||||
- Atomic write is a hard contract: the temp-file → rename pattern guarantees no partial file ever appears at the target path. A fault between the bytes-flushed point and the rename leaves either the previous version or no file at all — never a half-written one.
|
||||
- `verify(path)` recomputes the digest from the file's bytes; it does NOT trust the sidecar's value alone. A swapped artifact with a stale sidecar is detected.
|
||||
- `aggregate_hash` is order-deterministic (sorts paths first), so the Manifest aggregate is reproducible across writer and verifier.
|
||||
- The sidecar format is intentionally trivial (lowercase hex digest, no JSON wrapper, no trailing newline) so any small script can verify a single artifact without pulling in the helper.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `Sha256Sidecar` static methods: `write_atomic`, `write_atomic_and_sidecar`, `verify`, `aggregate_hash`.
|
||||
- `Sha256SidecarError` exception type wrapping underlying `OSError` and capturing missing/malformed sidecar conditions.
|
||||
- Public interface contract published at `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`.
|
||||
|
||||
### Excluded
|
||||
|
||||
- Cryptographic signing — this helper is corruption + accidental-replacement defense only; signing is out of scope (mid-flight tile gen has its own per-flight signing key path elsewhere).
|
||||
- Streaming hashing for payloads larger than RAM — out of scope; the helper's API is `payload: bytes`.
|
||||
- Compression / on-disk encoding — payloads are written verbatim.
|
||||
- Sidecar versioning — there is no version byte.
|
||||
- Filesystem-type detection (warning when run on NFS / overlayfs) — documented in contract Caveats; not enforced at runtime.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: Round-trip write + verify**
|
||||
Given a 1 MiB random payload
|
||||
When `write_atomic_and_sidecar(path, payload)` runs followed by `verify(path)`
|
||||
Then `verify` returns True AND the sidecar at `path.sha256` contains a 64-char lowercase hex digest matching `hashlib.sha256(payload).hexdigest()`
|
||||
|
||||
**AC-2: Atomicity — no partial file on fault**
|
||||
Given a fault is injected between the temp-file flush and the rename (e.g., monkey-patch `os.replace` to raise `OSError`)
|
||||
When `write_atomic(path, payload)` runs and raises
|
||||
Then `path` does NOT exist (or, if it pre-existed, its bytes are unchanged); no `*.tmp` or partial file remains at the target name
|
||||
|
||||
**AC-3: Independent verification rejects swapped payloads**
|
||||
Given an artifact is written via `write_atomic_and_sidecar`, then the file at `path` is overwritten out-of-band with different bytes
|
||||
When `verify(path)` runs
|
||||
Then it returns False (NOT True; it must NOT trust the sidecar value alone)
|
||||
|
||||
**AC-4: Missing sidecar is an error, not False**
|
||||
Given an artifact exists at `path` but `path.sha256` was deleted
|
||||
When `verify(path)` runs
|
||||
Then `Sha256SidecarError` is raised with a message naming the missing sidecar (the helper does NOT silently return False — that would conflate "corrupt artifact" with "missing sidecar")
|
||||
|
||||
**AC-5: Malformed sidecar is rejected**
|
||||
Given a sidecar containing `not a hex digest` or a digest of wrong length
|
||||
When `verify(path)` runs
|
||||
Then `Sha256SidecarError` is raised mentioning malformed sidecar content
|
||||
|
||||
**AC-6: Aggregate hash is order-deterministic**
|
||||
Given three files `a`, `b`, `c` and their hashes
|
||||
When `aggregate_hash([a, b, c])` and `aggregate_hash([c, a, b])` run
|
||||
Then both calls return the same hex digest (the implementation sorts paths internally)
|
||||
|
||||
**AC-7: Aggregate hash rejects missing files**
|
||||
Given a list including a non-existent path
|
||||
When `aggregate_hash` runs
|
||||
Then `Sha256SidecarError` is raised mentioning the missing path
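
AC-6 and AC-7 can be sketched with `hashlib` alone. The line format follows the one the Risks section attributes to the contract (`<filename>\0<file-hex-digest>\n`, sorted by full path); reading `<filename>` as `path.name` and sorting on the case-sensitive string form of each path are assumptions of this sketch.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Union


class Sha256SidecarError(Exception):
    """Stand-in for the helper's exception type."""


def aggregate_hash(paths: Iterable[Union[str, Path]]) -> str:
    outer = hashlib.sha256()
    # Sort on the full path string so the caller's ordering cannot matter.
    for path in sorted((Path(p) for p in paths), key=str):
        try:
            file_hex = hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError as err:
            # Covers missing files as well as unreadable ones.
            raise Sha256SidecarError(f"cannot hash missing or unreadable path: {path}") from err
        outer.update(f"{path.name}\0{file_hex}\n".encode("utf-8"))
    return outer.hexdigest()
```

Sorting inside the function, rather than trusting callers to pre-sort, is what lets a writer and a verifier agree without coordinating.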
|
||||
|
||||
**AC-8: Sidecar format strictness**
|
||||
Given the sidecar written by `write_atomic_and_sidecar`
|
||||
When the file's bytes are read
|
||||
Then the bytes are EXACTLY the 64-char lowercase hex digest — no JSON wrapper, no trailing newline, no whitespace
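
The write, sidecar, and verify behaviour in AC-1 through AC-5 and AC-8 can be sketched as follows. Note the swap: the spec mandates `atomicwrites` as the rename backend, while this sketch uses stdlib `tempfile` + `os.replace` to stay self-contained. It uses module-level functions instead of static methods, and the `sidecar_path` helper and the temp-file naming are assumptions.

```python
import contextlib
import hashlib
import os
import re
import tempfile
from pathlib import Path

_HEX64 = re.compile(r"[0-9a-f]{64}")


class Sha256SidecarError(Exception):
    """Stand-in for the helper's exception type."""


def sidecar_path(path: Path) -> Path:
    return path.with_name(path.name + ".sha256")  # engine.engine -> engine.engine.sha256


def write_atomic(path: Path, payload: bytes) -> None:
    path = Path(path)
    try:
        # The temp file lives in the target directory so the rename stays on one filesystem.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    except OSError as err:
        raise Sha256SidecarError(f"cannot create temp file for {path}: {err}") from err
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic on a local POSIX filesystem
    except OSError as err:
        with contextlib.suppress(OSError):
            os.unlink(tmp_name)  # never leave a partial file behind
        raise Sha256SidecarError(f"atomic write to {path} failed: {err}") from err


def write_atomic_and_sidecar(path: Path, payload: bytes) -> None:
    path = Path(path)
    write_atomic(path, payload)
    # Raw lowercase hex only: no newline, no wrapper (AC-8).
    write_atomic(sidecar_path(path), hashlib.sha256(payload).hexdigest().encode("ascii"))


def verify(path: Path) -> bool:
    path = Path(path)
    side = sidecar_path(path)
    try:
        recorded = side.read_bytes().decode("ascii", errors="replace")
    except FileNotFoundError as err:
        raise Sha256SidecarError(f"missing sidecar: {side}") from err
    except OSError as err:
        raise Sha256SidecarError(f"cannot read sidecar {side}: {err}") from err
    if not _HEX64.fullmatch(recorded):
        raise Sha256SidecarError(f"malformed sidecar content in {side}")
    try:
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
    except OSError as err:
        raise Sha256SidecarError(f"cannot read artifact {path}: {err}") from err
    return actual == recorded  # recomputed from the file, not trusted from the sidecar
```

`verify` distinguishes three outcomes on purpose: `True`, `False` (artifact bytes do not match), and an exception (the sidecar itself is missing or unusable).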
|
||||
|
||||
**AC-9: No upward imports (Layer 1 invariant)**
|
||||
Given the helper module
|
||||
When a static-import check runs
|
||||
Then it imports ONLY from `_types`, `atomicwrites`, and the stdlib (including `hashlib` and `pathlib`); no `gps_denied_onboard.components.*` imports anywhere
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Performance**
|
||||
- No specific latency budget per `_docs/02_document/common-helpers/05_helper_sha256_sidecar.md` (consumers are pre-flight / post-landing). Sanity bound: `write_atomic_and_sidecar` of a 1 MiB payload ≤ 50 ms on Tier-2.
|
||||
|
||||
**Reliability**
|
||||
- `Sha256SidecarError` is the ONLY exception type the public surface raises on filesystem / sidecar errors. `OSError` MUST be wrapped so callers do not have to handle two error hierarchies.
|
||||
- Pure deterministic: same payload always produces the same digest.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Round-trip write + verify on 1 MiB random payload | sidecar matches `hashlib.sha256(payload).hexdigest()`; `verify` True |
| AC-2 | Inject `OSError` between flush and rename | no partial file remains at target name |
| AC-3 | Overwrite payload after sidecar is written | `verify` returns False |
| AC-4 | Delete sidecar; call `verify` | `Sha256SidecarError`; mentions missing sidecar |
| AC-5 | Malformed sidecar content | `Sha256SidecarError`; mentions malformed sidecar |
| AC-6 | `aggregate_hash` with two different orderings | byte-equal digests |
| AC-7 | `aggregate_hash` with a missing path | `Sha256SidecarError`; mentions missing path |
| AC-8 | Read sidecar bytes after `write_atomic_and_sidecar` | exactly 64 hex chars; no newline / whitespace / JSON |
| AC-9 | importlinter / grep gate | no `components.*` imports |
| NFR-perf | Microbench `write_atomic_and_sidecar` of 1 MiB payload | ≤ 50 ms on Tier-2 |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Public surface frozen by `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` v1.0.0.
|
||||
- Layer 1 Foundation only.
|
||||
- `atomicwrites` is the single atomic-rename backend; pinned in `pyproject.toml` at AZ-263 / E-BOOT.
|
||||
- Static-only design satisfies `coderule.mdc`.
|
||||
- No new dependency beyond what AZ-263 / E-BOOT pinned.
|
||||
- Production cache root MUST live on a local POSIX filesystem (NFS / SMB / overlayfs are unsupported per the helper's atomic-rename invariant). Documented in deployment artifacts; not enforced at runtime.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: A future helper change relaxes atomicity to "best-effort"**
|
||||
- *Risk*: Someone replaces the temp-file → rename pattern with a direct write under the rationale "rename is slow on certain filesystems"; takeoff-load occasionally sees partial files.
|
||||
- *Mitigation*: AC-2 makes atomicity a hard test. Any regression that loses the rename is caught immediately.
|
||||
|
||||
**Risk 2: `aggregate_hash` ordering drifts between writer and verifier**
|
||||
- *Risk*: A future change adds case-insensitive sorting or strips path prefixes; writer and verifier disagree; cache root looks corrupt.
|
||||
- *Mitigation*: AC-6 pins the deterministic-ordering invariant; the contract spells out the exact format (`<filename>\0<file-hex-digest>\n` lines, lexicographically sorted by full path).
|
||||
|
||||
**Risk 3: Sidecar format ambiguity (someone wraps the digest in JSON)**
|
||||
- *Risk*: A future contributor "improves" the sidecar to be JSON for "extensibility"; verification scripts that expect raw hex break.
|
||||
- *Mitigation*: AC-8 pins the exact byte-level format. Versioning rules force a major bump for any format change.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: atomic-write + SHA-256 content-hash sidecar (D-C10-3 / `05_helper_sha256_sidecar.md`).
|
||||
- **Production code that must exist**: real `atomicwrites`-backed atomic rename; real `hashlib.sha256` digesting; real independent verify.
|
||||
- **Allowed external stubs**: none. `hashlib` is part of the stdlib and `atomicwrites` is a pinned production dependency, so neither needs a stub.
|
||||
- **Unacceptable substitutes**: direct write (loses atomicity); trusting the sidecar value without recomputing the file's hash; JSON-wrapped sidecar; case-insensitive aggregate ordering.
|
||||
|
||||
## Contract
|
||||
|
||||
This task produces the contract at `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`.
|
||||
Consumers MUST read that file — not this task spec — to discover the interface.
|
||||