# Batch 24 / Cycle 1 — Implementation Report

**Date**: 2026-05-12
**Tasks**: AZ-300 (C7 PytorchFp16Runtime — mandatory simple-baseline)
**Story points landed**: 2
**Status**: complete (AZ-300 → In Testing)

## Scope summary

Single-task batch by design — narrowed from the initial post-AZ-332 plan (`{AZ-300, AZ-301, AZ-302}`) to keep the post-OKVIS2 turn at a reviewable size. AZ-301 (EngineGate, 3pt) and AZ-302 (ThermalStatePublisher, 3pt) move to batch 25.

## Files added / modified

### New

- `src/gps_denied_onboard/components/c7_inference/pytorch_fp16_runtime.py` — `PytorchFp16Runtime` + `PytorchEngineHandle` + `_to_numpy_dict` output-shape adapter.
- `src/gps_denied_onboard/components/c7_inference/architecture_registry.py` — torch-free `register_architecture` / `default_registry` / `ArchitectureFactory`. Risk-1 mitigation (no L2→L3 back-edge from C7 into per-backbone code).
- `tests/unit/c7_inference/test_pytorch_fp16_runtime.py` — 17 tests covering AC-1..AC-8 + NFRs; CPU-runnable subset green on macOS.

### Modified

- `src/gps_denied_onboard/components/c7_inference/__init__.py` — re-exports `ArchitectureFactory`, `default_registry`, `register_architecture`. Still does NOT import the concrete strategy module (Invariant I-5 / Risk-2).
- `src/gps_denied_onboard/components/c7_inference/config.py` — added `per_frame_debug_log: bool = False` to `C7InferenceConfig` (gates the DEBUG per-frame latency log per spec § Scope).
- `tests/unit/c7_inference/test_protocol_conformance.py` — narrowed `test_ac5_build_inference_runtime_flag_on_but_module_missing` parametrisation to exclude `pytorch_fp16` (now-built); TRT / ORT remain covered (AZ-298 / AZ-299 still pending).
- `_docs/02_tasks/todo/AZ-300_c7_pytorch_baseline.md` → moved to `_docs/02_tasks/done/`; added an `## Implementation Notes (2026-05-12, batch 24)` section documenting the three task-spec → as-built deltas.

## Design decisions (resolved spec contradictions)

1. **Constructor shape** — `__init__(config: Config, *, thermal_publisher=None, architecture_registry=None, clock=None)`. AZ-297 factory passes `config` only; thermal-publisher injection waits for AZ-302 to update the factory. Same pattern as AZ-332 vs. AZ-331 (user-approved option A from the prior batch).
2. **Architecture registry key** — `EngineCacheEntry.extras["model_name"]`, populated from the checkpoint's file stem inside `compile_engine`. Avoids touching the frozen `BuildConfig` / `EngineCacheEntry` DTOs.
3. **Warm-up forward** — deferred to AZ-300 tier-2 follow-up. The registry has no input-shape metadata; a real warm-up needs per-backbone shape info owned by each backbone's composition wiring.

## AC coverage

| AC | Status | Notes |
|----|--------|-------|
| AC-1 protocol conformance | covered | `test_ac1_protocol_conformance` |
| AC-2 compile_engine no-op | covered | `test_ac2_compile_engine_is_noop` |
| AC-3 deserialize half-cast/GPU/eval | covered (CUDA-skip on Tier-1) | `test_ac3_deserialize_loads_half_casts_gpu_moves_eval` |
| AC-4 infer numerical FP32 reference | covered (CUDA-skip on Tier-1) | `test_ac4_infer_numerical_close_to_fp32`; atol=5e-3, rtol=5e-3 for FP16 tiny linear |
| AC-5 release frees GPU memory | covered (CUDA-skip on Tier-1) | `test_ac5_release_frees_gpu_memory` + I-7 idempotent assertion |
| AC-6 missing checkpoint | covered | `test_ac6_missing_checkpoint_raises` |
| AC-7 mismatched state_dict | covered | `test_ac7_incompatible_state_dict_raises_with_cause` (validates `__cause__` chain) |
| AC-8 CUDA OOM rewrap | covered (CUDA-skip on Tier-1) | `test_ac8_cuda_oom_during_infer_rewrapped` (synthetic OOM via stub model) |
| NFR-perf-deserialize | tier2 | Jetson-only validation |
| NFR-reliability-eval-mode | covered (CUDA-skip on Tier-1) | `test_nfr_reliability_eval_mode_unconditional` |

Additional coverage beyond ACs:

- `test_thermal_state_default_safe_when_no_publisher` — Invariant I-6 fallback when AZ-302 publisher absent.
- `test_thermal_state_delegates_to_publisher` — duck-typed `.read()` delegation, forward-compat with AZ-302.
- `test_deserialize_missing_architecture_registration` — registry lookup miss path.
- `test_infer_rejects_foreign_handle` / `test_infer_rejects_released_handle` — handle-lifecycle guards (consumers MUST pass back the same runtime's handle).
- `test_register_architecture_rejects_collision` / `test_register_architecture_same_factory_is_idempotent` — composition-time registry safety.

## Test run

```
.venv/bin/pytest tests/unit/c7_inference/   → 63 passed, 6 skipped
.venv/bin/pytest                            → 1120 passed, 10 skipped
```

The 6 c7_inference skips are CUDA-gated. The 10 full-suite skips are all environment-gated (CUDA + Tier-2 + cmake/actionlint not on PATH). No pre-existing tests regressed.

## Self-review verdict

**Pass.** Followed AZ-297 contract (Protocol surface + factory shape + error envelope + Invariant I-1/2/4/5/6/7/8). The single test-protocol-conformance edit is narrowly scoped (parametrisation filter, not behaviour change). No churn outside `c7_inference`.

## Known gaps for the Product Implementation Completeness Gate

- **Warm-up forward**: deferred to AZ-300 tier-2 (Jetson). Real first `infer` call does the implicit warm-up; AC-3 still passes because it only checks dtype/device/training-mode, not warm-up artifacts.
- **Thermal publisher wiring**: returns default-safe state until AZ-302 ships. Invariant I-6 holds; consumers see `is_telemetry_available=False` and `thermal_throttle_active=False`.
- **CUDA-gated NFR-perf**: Tier-1 CI cannot validate p95 ≤ 10 s on deserialize; Tier-2 Jetson CI is the gate.
- **Architecture registry population**: this task ships the *mechanism*; per-backbone modules (E-C2 / E-C2.5 / E-C3 / E-C3.5) own actually *populating* the registry from their composition wiring. Tracked by those component epics.

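
For reviewers of the component epics that will populate the registry, the mechanism described above (torch-free registration keyed by the checkpoint's file stem, collision rejection, idempotent re-registration of the same factory, and a lookup-miss error) could look roughly like the sketch below. This is an illustration under assumptions, not the as-built `architecture_registry.py`: the `ArchitectureRegistry` class, its `register` / `lookup` methods, the exception types, the `registry=` keyword, `default_registry` being a function, and the `model_name_from_checkpoint` helper are all hypothetical.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional

# Assumed shape: a zero-argument callable that builds an uninitialised model.
# Keeping it a plain callable means this module never imports torch.
ArchitectureFactory = Callable[[], Any]


class ArchitectureRegistry:
    """Hypothetical registry mapping a model name to its architecture factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, ArchitectureFactory] = {}

    def register(self, model_name: str, factory: ArchitectureFactory) -> None:
        existing = self._factories.get(model_name)
        if existing is not None and existing is not factory:
            # A different factory under the same key is treated as a wiring bug.
            raise ValueError(f"architecture {model_name!r} is already registered")
        # First registration, or the same factory again (idempotent).
        self._factories[model_name] = factory

    def lookup(self, model_name: str) -> ArchitectureFactory:
        try:
            return self._factories[model_name]
        except KeyError:
            raise LookupError(
                f"no architecture registered for {model_name!r}"
            ) from None


_DEFAULT_REGISTRY = ArchitectureRegistry()


def default_registry() -> ArchitectureRegistry:
    """Return the shared process-wide registry (assumed to be a singleton)."""
    return _DEFAULT_REGISTRY


def register_architecture(
    model_name: str,
    factory: ArchitectureFactory,
    registry: Optional[ArchitectureRegistry] = None,
) -> None:
    """Register into the given registry, or the default one when omitted."""
    target = default_registry() if registry is None else registry
    target.register(model_name, factory)


def model_name_from_checkpoint(checkpoint_path: str) -> str:
    """Derive the registry key from the checkpoint's file stem."""
    return Path(checkpoint_path).stem
```

In this sketch a backbone's composition wiring would call `register_architecture("tiny_linear", make_tiny_linear)` once at start-up, and the runtime would resolve the factory with `default_registry().lookup(model_name_from_checkpoint(path))`. Comparing factories by identity keeps repeated composition idempotent while still surfacing two modules that claim the same key.
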
## Next batch

**Batch 25 candidates** (18 tasks total ready in the queue):

- AZ-301 (C7 EngineGate, 3pt) — no `torch` dependency; uses C7 error types only.
- AZ-302 (C7 ThermalStatePublisher, 3pt) — `jtop` / `pynvml` deps (Tier-2 only; Tier-1 tests stub the source).
- AZ-304 (C6 Postgres schema, 2pt) — no native deps; pure SQL + alembic migration if pattern allows.

Recommended batch 25 size: 2–3 tasks (AZ-301 + AZ-302, plus AZ-304 if turn budget allows).
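
As context for the AZ-302 hand-off, the default-safe thermal fallback (Invariant I-6) and the duck-typed `.read()` delegation that this batch already tests might be sketched as follows. Only the two field names `is_telemetry_available` and `thermal_throttle_active` and the `.read()` method come from this report; the `ThermalState` dataclass shape, `read_thermal_state`, and `StubThermalPublisher` are hypothetical names for illustration, and the real DTO may carry more fields.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ThermalState:
    """Hypothetical DTO; only these two fields are named in the report."""

    is_telemetry_available: bool = False
    thermal_throttle_active: bool = False


# What consumers see while no publisher is wired in.
DEFAULT_SAFE_STATE = ThermalState()


def read_thermal_state(publisher: Optional[Any]) -> ThermalState:
    """Return the publisher's reading, or the default-safe state if absent."""
    if publisher is None:
        return DEFAULT_SAFE_STATE
    # Duck-typed: anything exposing .read() is accepted as a publisher.
    return publisher.read()


class StubThermalPublisher:
    """Tier-1 style stub standing in for a jtop / pynvml backed source."""

    def __init__(self, state: ThermalState) -> None:
        self._state = state

    def read(self) -> ThermalState:
        return self._state
```

With no publisher, `read_thermal_state(None)` reports telemetry as unavailable and throttling as inactive; once a publisher exists, its `.read()` result is passed through unchanged, so the runtime would not need to change when AZ-302 lands.
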