mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 10:01:12 +00:00

Oleksandr Bezdieniezhnykh 65ad2168ed [AZ-300] Implement PytorchFp16Runtime — C7 simple-baseline strategy

AZ-300 mandatory simple-baseline InferenceRuntime (eager FP16 PyTorch).
Implements the AZ-297 Protocol; current_runtime_label returns
"pytorch_fp16". Numerical reference every fancier C7 strategy (AZ-298
TRT, AZ-299 ORT) is measured against, and the only viable runtime for
Tier-1 workstation Docker where TRT is non-trivial to install.

Production code (new):
 - components/c7_inference/pytorch_fp16_runtime.py — runtime +
   PytorchEngineHandle + output-shape adapter
 - components/c7_inference/architecture_registry.py — torch-free
   register_architecture / default_registry / ArchitectureFactory
   (Risk-1 mitigation: no L2->L3 back-edge from C7 into per-backbone
   code)
 - components/c7_inference/__init__.py — re-exports the registry
   mechanism. Still does NOT import the concrete strategy module
   (Invariant I-5)
 - components/c7_inference/config.py — adds per_frame_debug_log bool
   field (gates the DEBUG per-frame latency log)

Tests (new): tests/unit/c7_inference/test_pytorch_fp16_runtime.py
covers AC-1..AC-8 + NFRs. AC-1/2/6/7 + thermal/release/registry
guards run unconditionally (17 tests); AC-3/4/5/8 +
NFR-perf-deserialize + NFR-reliability-eval-mode require CUDA and
skip on Tier-1 CI / macOS dev.

Tests (modified):
 - test_protocol_conformance.py — narrowed
   test_ac5_build_inference_runtime_flag_on_but_module_missing
   parametrisation to exclude pytorch_fp16 (now-built); TRT / ORT
   still covered until AZ-298 / AZ-299 ship.

CI: .github/workflows/ci.yml lint + unit jobs now install
'-e .[dev,inference]' because mypy + pytest need torch + torchvision +
onnxruntime on the runner.

Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-300_c7_pytorch_baseline.md Implementation Notes:
 1. Constructor conforms to AZ-297 factory shape (config positional;
    thermal_publisher + registry + clock keyword-only optionals).
    AZ-302 will update the factory to thread thermal_publisher.
 2. Architecture registry uses extras["model_name"] as lookup key
    (avoids touching the frozen BuildConfig / EngineCacheEntry DTOs).
 3. Warm-up forward deferred to AZ-300 tier-2 follow-up — the zero-arg
    registry has no per-backbone input-shape metadata.

Suite: 1120 passed / 10 skipped (CUDA + Tier-2 + cmake / actionlint
environment gates). No regressions in non-c7_inference areas.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-12 10:13:21 +03:00

6.3 KiB

Batch 24 / Cycle 1 — Implementation Report

Date: 2026-05-12
Tasks: AZ-300 (C7 PytorchFp16Runtime, mandatory simple-baseline)
Story points landed: 2
Status: complete (AZ-300 → In Testing)

Scope summary

Single-task batch by design — narrowed from the initial post-AZ-332 plan ({AZ-300, AZ-301, AZ-302}) to keep the post-OKVIS2 turn at a reviewable size. AZ-301 (EngineGate, 3pt) and AZ-302 (ThermalStatePublisher, 3pt) move to batch 25.

Files added / modified

New

src/gps_denied_onboard/components/c7_inference/pytorch_fp16_runtime.py — PytorchFp16Runtime + PytorchEngineHandle + _to_numpy_dict output-shape adapter.
src/gps_denied_onboard/components/c7_inference/architecture_registry.py — torch-free register_architecture / default_registry / ArchitectureFactory. Risk-1 mitigation (no L2→L3 back-edge from C7 into per-backbone code).
tests/unit/c7_inference/test_pytorch_fp16_runtime.py — 17 tests covering AC-1..AC-8 + NFRs; CPU-runnable subset green on macOS.
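
A minimal sketch of what a torch-free registry of this kind could look like. Only the names register_architecture, default_registry and ArchitectureFactory come from this report; the ArchitectureRegistry class, its method names and the exception types are assumptions made for illustration, not the as-built code. Collision rejection and same-factory idempotence follow the behaviour named by the registry tests in this report.

```python
from typing import Any, Callable, Dict, Optional

# Assumption: a factory is a zero-argument callable that builds a model
# object. Nothing in this module imports torch.
ArchitectureFactory = Callable[[], Any]


class ArchitectureRegistry:
    """Hypothetical name-to-factory map (illustrative, not the real class)."""

    def __init__(self) -> None:
        self._factories: Dict[str, ArchitectureFactory] = {}

    def register(self, model_name: str, factory: ArchitectureFactory) -> None:
        existing = self._factories.get(model_name)
        if existing is factory:
            return  # registering the same factory again is a no-op
        if existing is not None:
            raise ValueError(f"architecture {model_name!r} is already registered")
        self._factories[model_name] = factory

    def create(self, model_name: str) -> Any:
        try:
            factory = self._factories[model_name]
        except KeyError as exc:
            raise LookupError(f"no architecture registered for {model_name!r}") from exc
        return factory()


_DEFAULT_REGISTRY = ArchitectureRegistry()


def default_registry() -> ArchitectureRegistry:
    """Return the process-wide registry used when none is injected."""
    return _DEFAULT_REGISTRY


def register_architecture(
    model_name: str,
    factory: ArchitectureFactory,
    registry: Optional[ArchitectureRegistry] = None,
) -> None:
    target = registry if registry is not None else default_registry()
    target.register(model_name, factory)
```

Because the registry stores only callables, per-backbone modules can register themselves from their composition wiring without C7 importing them.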

Modified

src/gps_denied_onboard/components/c7_inference/__init__.py — re-exports ArchitectureFactory, default_registry, register_architecture. Still does NOT import the concrete strategy module (Invariant I-5 / Risk-2).
src/gps_denied_onboard/components/c7_inference/config.py — added per_frame_debug_log: bool = False to C7InferenceConfig (gates the DEBUG per-frame latency log per spec § Scope).
tests/unit/c7_inference/test_protocol_conformance.py — narrowed test_ac5_build_inference_runtime_flag_on_but_module_missing parametrisation to exclude pytorch_fp16 (now-built); TRT / ORT remain covered (AZ-298 / AZ-299 still pending).
_docs/02_tasks/todo/AZ-300_c7_pytorch_baseline.md → moved to _docs/02_tasks/done/; added an ## Implementation Notes (2026-05-12, batch 24) section documenting the three task-spec → as-built deltas.
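
A sketch of how a boolean such as per_frame_debug_log might gate the per-frame latency log. The field name and default come from this report; the logger name, the timed_infer helper and the message format are hypothetical, and the real C7InferenceConfig has other fields that are not shown.

```python
import logging
import time
from dataclasses import dataclass
from typing import Any, Callable

logger = logging.getLogger("c7_inference.sketch")  # hypothetical logger name


@dataclass
class C7InferenceConfig:
    # Only the field added in this batch is shown here.
    per_frame_debug_log: bool = False


def timed_infer(config: C7InferenceConfig, run_frame: Callable[[Any], Any], frame: Any) -> Any:
    """Run one frame and emit a DEBUG latency line only when the flag is on."""
    start = time.perf_counter()
    result = run_frame(frame)
    # Two gates: the config flag and the logger level. With the flag off,
    # no log record is built on the hot path.
    if config.per_frame_debug_log and logger.isEnabledFor(logging.DEBUG):
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        logger.debug("infer latency_ms=%.3f", elapsed_ms)
    return result
```

Keeping the default at False means existing deployments see no change in log volume unless they opt in.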

Design decisions (resolved spec contradictions)

Constructor shape — __init__(config: Config, *, thermal_publisher=None, architecture_registry=None, clock=None). AZ-297 factory passes config only; thermal-publisher injection waits for AZ-302 to update the factory. Same pattern as AZ-332 vs. AZ-331 (user-approved option A from the prior batch).
Architecture registry key — EngineCacheEntry.extras["model_name"], populated from the checkpoint's file stem inside compile_engine. Avoids touching the frozen BuildConfig / EngineCacheEntry DTOs.
Warm-up forward — deferred to AZ-300 tier-2 follow-up. The registry has no input-shape metadata; a real warm-up needs per-backbone shape info owned by each backbone's composition wiring.
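
The first two decisions can be sketched as follows. This is an illustration under stated assumptions, not the shipped class: the keyword-only parameter names match the signature quoted above, but the class name, the plain dict standing in for EngineCacheEntry, the dict-based registry and the checkpoint path are all hypothetical.

```python
import time
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class PytorchFp16RuntimeSketch:
    """Illustrative stand-in showing the constructor shape only."""

    def __init__(
        self,
        config: Any,
        *,  # everything after this marker must be passed by keyword
        thermal_publisher: Optional[Any] = None,
        architecture_registry: Optional[Dict[str, Callable[[], Any]]] = None,
        clock: Optional[Callable[[], float]] = None,
    ) -> None:
        self._config = config
        self._thermal_publisher = thermal_publisher
        self._registry = architecture_registry if architecture_registry is not None else {}
        self._clock = clock if clock is not None else time.monotonic

    def compile_engine(self, checkpoint_path: str) -> Dict[str, Any]:
        # No compilation happens for the eager baseline. The sketch only
        # records the registry lookup key, taken from the file stem.
        # Assumption: a dict stands in for the frozen EngineCacheEntry DTO.
        return {"extras": {"model_name": Path(checkpoint_path).stem}}
```

A factory that passes config alone keeps working, and later wiring can add thermal_publisher=... without changing the positional contract.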

AC coverage

AC	Status	Notes
AC-1 protocol conformance	covered	`test_ac1_protocol_conformance`
AC-2 compile_engine no-op	covered	`test_ac2_compile_engine_is_noop`
AC-3 deserialize half-cast/GPU/eval	covered (CUDA-skip on Tier-1)	`test_ac3_deserialize_loads_half_casts_gpu_moves_eval`
AC-4 infer numerical FP32 reference	covered (CUDA-skip on Tier-1)	`test_ac4_infer_numerical_close_to_fp32`; atol=5e-3, rtol=5e-3 for FP16 tiny linear
AC-5 release frees GPU memory	covered (CUDA-skip on Tier-1)	`test_ac5_release_frees_gpu_memory` + I-7 idempotent assertion
AC-6 missing checkpoint	covered	`test_ac6_missing_checkpoint_raises`
AC-7 mismatched state_dict	covered	`test_ac7_incompatible_state_dict_raises_with_cause` (validates `__cause__` chain)
AC-8 CUDA OOM rewrap	covered (CUDA-skip on Tier-1)	`test_ac8_cuda_oom_during_infer_rewrapped` (synthetic OOM via stub model)
NFR-perf-deserialize	tier2	Jetson-only validation
NFR-reliability-eval-mode	covered (CUDA-skip on Tier-1)	`test_nfr_reliability_eval_mode_unconditional`
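
To illustrate what the AC-4 tolerances mean, the sketch below emulates FP16 rounding with the standard library and applies the combined check |actual - expected| <= atol + rtol * |expected|, the same form used by numpy.allclose and torch.allclose. It is an assumption that the real test uses a check of this form; the weights and inputs are made up, and Python floats stand in for the FP32 reference.

```python
import struct
from typing import Sequence


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def is_close(actual: float, expected: float, atol: float = 5e-3, rtol: float = 5e-3) -> bool:
    """Combined absolute and relative tolerance check."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def linear(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """A one-output linear layer without bias: a plain dot product."""
    return sum(w * x for w, x in zip(weights, inputs))


# Made-up values for a tiny linear layer.
weights = [0.1, -0.25, 0.5, 0.75]
inputs = [1.0, 2.0, -1.5, 0.3]

reference = linear(weights, inputs)
# Simplified emulation: round the operands and the final result to FP16.
# A real FP16 kernel also rounds (or accumulates differently) in between.
half = to_fp16(linear([to_fp16(w) for w in weights], [to_fp16(x) for x in inputs]))
```

With these values the half-precision result differs from the reference by well under the tolerance, so is_close(half, reference) holds, while an error of 0.1 would not pass.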

Additional coverage beyond ACs:

test_thermal_state_default_safe_when_no_publisher — Invariant I-6 fallback when AZ-302 publisher absent.
test_thermal_state_delegates_to_publisher — duck-typed .read() delegation, forward-compat with AZ-302.
test_deserialize_missing_architecture_registration — registry lookup miss path.
test_infer_rejects_foreign_handle / test_infer_rejects_released_handle — handle-lifecycle guards (consumers MUST pass back the same runtime's handle).
test_register_architecture_rejects_collision / test_register_architecture_same_factory_is_idempotent — composition-time registry safety.
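
The handle-lifecycle guards could take roughly this shape. The class names, exception types and messages are hypothetical; the report states only that foreign and released handles are rejected and that release is idempotent (I-7).

```python
from typing import Any


class EngineHandleSketch:
    """Hypothetical handle that remembers which runtime issued it."""

    def __init__(self, owner: "RuntimeSketch") -> None:
        self.owner = owner
        self.released = False


class RuntimeSketch:
    def deserialize(self) -> EngineHandleSketch:
        return EngineHandleSketch(owner=self)

    def infer(self, handle: Any, frame: Any) -> Any:
        # Guard 1: the handle must have been issued by this runtime instance.
        if not isinstance(handle, EngineHandleSketch) or handle.owner is not self:
            raise ValueError("handle was not issued by this runtime")
        # Guard 2: a released handle must not be reused.
        if handle.released:
            raise RuntimeError("handle has already been released")
        return frame  # placeholder for the real forward pass

    def release(self, handle: EngineHandleSketch) -> None:
        # Idempotent: releasing twice leaves the handle in the same state.
        handle.released = True
```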

Test run

.venv/bin/pytest tests/unit/c7_inference/ → 63 passed, 6 skipped
.venv/bin/pytest                          → 1120 passed, 10 skipped

The 6 c7_inference skips are CUDA-gated. The 10 full-suite skips are all environment-gated (CUDA + Tier-2 + cmake/actionlint not on PATH). No pre-existing tests regressed.
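
For readers unfamiliar with environment-gated tests, here is a sketch of CUDA gating using the standard-library unittest module as a stand-in. The project itself runs pytest, where the usual equivalent is a pytest.mark.skipif marker; the cuda_available helper and the test names are hypothetical.

```python
import io
import unittest


def cuda_available() -> bool:
    """Return True only if torch imports cleanly and reports a CUDA device."""
    try:
        import torch  # optional dependency; absent on a torch-free machine
    except (ImportError, OSError):
        return False
    return bool(torch.cuda.is_available())


class GatedTests(unittest.TestCase):
    def test_cpu_path_always_runs(self) -> None:
        self.assertEqual(sum([1, 2, 3]), 6)

    @unittest.skipUnless(cuda_available(), "requires CUDA")
    def test_gpu_path_needs_cuda(self) -> None:
        import torch

        self.assertTrue(torch.zeros(1, device="cuda").is_cuda)


def run_gated_suite() -> unittest.TestResult:
    """Run both tests quietly and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GatedTests)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

On a machine without CUDA the GPU test is reported as skipped rather than failed, which is how a run can be green with a non-zero skip count.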

Self-review verdict

Pass. The implementation follows the AZ-297 contract (Protocol surface, factory shape, error envelope, and Invariants I-1/2/4/5/6/7/8). The single edit to test_protocol_conformance.py is narrowly scoped (a parametrisation filter, not a behaviour change). No churn outside c7_inference.
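
As an illustration of the error-envelope idea checked by AC-7 (a wrapped error that keeps its `__cause__` chain), a minimal sketch follows. The exception class names and helper functions are hypothetical; the stand-in loader raises RuntimeError because that is what torch's load_state_dict raises in strict mode on a key mismatch.

```python
from typing import Dict, Iterable


class InferenceRuntimeError(Exception):
    """Hypothetical base class for errors crossing the runtime boundary."""


class EngineDeserializeError(InferenceRuntimeError):
    """Hypothetical error for a checkpoint that cannot be loaded."""


def load_weights(model_keys: Iterable[str], state_dict: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for a strict state_dict load: every model key must be present."""
    keys = list(model_keys)
    missing = sorted(set(keys) - set(state_dict))
    if missing:
        raise RuntimeError(f"missing keys in state_dict: {missing}")
    return {key: state_dict[key] for key in keys}


def deserialize(model_keys: Iterable[str], state_dict: Dict[str, float]) -> Dict[str, float]:
    try:
        return load_weights(model_keys, state_dict)
    except RuntimeError as exc:
        # "raise ... from exc" stores the original error on __cause__, so
        # callers get one envelope type without losing the root cause.
        raise EngineDeserializeError(
            "checkpoint does not match the registered architecture"
        ) from exc
```

The same pattern would apply to rewrapping a CUDA out-of-memory error raised during infer.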

Known gaps for the Product Implementation Completeness Gate

Warm-up forward: deferred to the AZ-300 tier-2 (Jetson) follow-up. Until then, the first real infer call acts as the implicit warm-up; AC-3 still passes because it checks only dtype, device, and training mode, not warm-up artifacts.
Thermal publisher wiring: returns default-safe state until AZ-302 ships. Invariant I-6 holds; consumers see is_telemetry_available=False and thermal_throttle_active=False.
CUDA-gated NFR-perf: Tier-1 CI cannot validate p95 ≤ 10 s on deserialize; Tier-2 Jetson CI is the gate.
Architecture registry population: this task ships the mechanism; per-backbone modules (E-C2 / E-C2.5 / E-C3 / E-C3.5) own actually populating the registry from their composition wiring. Tracked by those component epics.
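
The default-safe thermal behaviour described above might be sketched like this. The two field names and their default values come from this report; the ThermalState dataclass, the thermal_state function and the stub publisher are assumptions, and the publisher is duck-typed on a .read() method as the delegation test name suggests.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ThermalState:
    # Default-safe values: no telemetry, and no throttling reported.
    is_telemetry_available: bool = False
    thermal_throttle_active: bool = False


def thermal_state(publisher: Optional[Any] = None) -> ThermalState:
    """Delegate to publisher.read() when present, else return safe defaults."""
    if publisher is None:
        return ThermalState()
    return publisher.read()


class StubPublisher:
    """Hypothetical stand-in for the publisher that AZ-302 will provide."""

    def read(self) -> ThermalState:
        return ThermalState(is_telemetry_available=True, thermal_throttle_active=True)
```

Consumers can therefore branch on is_telemetry_available without caring whether a publisher has been wired in yet.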

Next batch

Batch 25 candidates (18 tasks total ready in the queue):

AZ-301 (C7 EngineGate, 3pt) — no torch dependency; uses C7 error types only.
AZ-302 (C7 ThermalStatePublisher, 3pt) — jtop / pynvml deps (Tier-2 only; Tier-1 tests stub the source).
AZ-304 (C6 Postgres schema, 2pt) — no native deps; pure SQL + alembic migration if pattern allows.

Recommended batch 25 size: 2–3 tasks (AZ-301 + AZ-302, plus AZ-304 if turn budget allows).
