# Batch 26 / Cycle 1 — Implementation Report

**Date**: 2026-05-12
**Tasks**: AZ-302 (C7 ThermalStatePublisher — jtop/NVML 1 Hz background)
**Story points landed**: 3
**Status**: complete (AZ-302 → In Testing)

## Scope summary

Single-task batch, continuing the 1-task cadence the user reaffirmed when picking option A after batches 24 and 25. AZ-302 was the deferred 3-pointer surfaced by the batch-25 narrow-down (background thread, two telemetry backends, FDR record kind, log rate-limiting). AZ-304 (C6 Postgres schema, 2pt) remains queued for batch 27.

## Files added / modified

### New

- `src/gps_denied_onboard/components/c7_inference/thermal_publisher.py` — `ThermalStatePublisher` (start/stop/read/is_running, `_poll_once` test seam, atomic snapshot, source selection), `ThermalSource` runtime-checkable Protocol, `ThermalReading` intermediate DTO, `_JtopSource` (jtop binding, Tier-2 production path), `_PynvmlSource` (NVML fallback), `_default_safe_snapshot` (Invariant I-6 enforcement).
- `tests/unit/c7_inference/test_thermal_publisher.py` — 14 deterministic tests (AC-1..AC-6, AC-8, NFR-default-safe, Protocol structural, Invariant I-6) plus one real-thread integration test that proves the background poller actually polls and emits transitions.

### Modified

- `src/gps_denied_onboard/components/c7_inference/__init__.py` — re-exports `ThermalStatePublisher`, `ThermalSource`, `ThermalReading`.
- `src/gps_denied_onboard/fdr_client/records.py` — `c7.thermal_transition` registered in `KNOWN_PAYLOAD_KEYS` with six documented payload keys (`previous_state`, `new_state`, `gpu_temp_c`, `cpu_temp_c`, `measured_clock_mhz`, `measured_at_ns`).
- `pyproject.toml` — new `[telemetry]` optional dep group (`jetson-stats>=4.2`, `pynvml>=11.5`). Tier-1 (`.[dev]`) install stays slim; Tier-2 Jetson install adds `.[telemetry]`.
- `tests/unit/test_az272_fdr_record_schema.py` — `_kind_payload` fixture extended with a sample `c7.thermal_transition` payload so the every-known-kind roundtrip test passes.
- `_docs/02_tasks/todo/AZ-302_c7_thermal_publisher.md` → moved to `_docs/02_tasks/done/`; appended `## Implementation Notes (2026-05-12, batch 26)` documenting the four task-spec → as-built deltas.

## Design decisions

1. **`ThermalReading` intermediate DTO instead of source-stamped `ThermalState`**. The spec implied source `read()` returns a fully stamped `ThermalState`. As-built, sources return a `ThermalReading` (raw measurement only) and the publisher stamps `measured_at_ns` + `is_telemetry_available` from its injected `Clock`. This keeps the `_JtopSource` / `_PynvmlSource` modules from calling `time.monotonic_ns()` directly, which `tests/_meta/test_no_direct_time_in_components.py` (Invariant 2) forbids in any `src/gps_denied_onboard/components/**` file. Documented in the spec's as-built notes section.
2. **`_poll_once` is the deterministic test seam — not `start()`**. The AC-1..AC-4 / AC-8 tests script the source and drive a `_ManualClock` forward; the production `start()` would spawn a background thread that races those deterministic calls (observed in the first test iteration). Tests bypass `start()` via a `_prime_without_thread()` helper that exercises the source-selection + initial-poll path without the thread. `start()` / `stop()` remain covered by AC-5 (cold-start fallback), AC-6 (idempotency), and a real-thread integration test at 200 Hz that proves the background loop polls the source and emits transition records.
3. **Telemetry backends moved to a `[telemetry]` optional dep group**. The spec claimed both `jtop` and `pynvml` were already pinned by description.md, but neither was in `pyproject.toml`. They are now a `[telemetry]` optional extra so Tier-1 installs (`.[dev]`) stay slim and Tier-2 Jetson installs (`.[dev,inference,telemetry]`) pull them in.
   Both backends remain runtime-optional: the publisher discovers them at `start()` time and only raises `TelemetryUnavailableError` when both are absent.
4. **WARN-on-unavailable rate-limiting uses the injected `Clock`**, not `time.monotonic_ns()`. The 1-second rate-limit window is enforced by `now_ns - last_warn_ns >= 1_000_000_000` against `self._clock.monotonic_ns()`, so replay-deterministic tests can simulate rate-limit windows by advancing the manual clock.

## AC coverage

| AC | Test name | Status |
|----|-----------|--------|
| AC-1 | `test_ac1_read_returns_latest_snapshot` + `test_ac1_read_is_wait_free_object_identity` | passing |
| AC-2 | `test_ac2_throttle_entry_emits_fdr_and_warn` | passing |
| AC-3 | `test_ac3_throttle_exit_emits_fdr_and_info` | passing |
| AC-4 | `test_ac4_telemetry_unavailable_defaults_safe_and_rate_limits` | passing |
| AC-5 | `test_ac5_start_raises_when_no_source_available` + `test_ac5_jtop_falls_back_to_pynvml` | passing |
| AC-6 | `test_ac6_double_start_is_noop` + `test_ac6_double_stop_is_noop` | passing |
| AC-7 | F3 hot-path non-interference perf | Tier-2 deferred |
| AC-8 | `test_ac8_fdr_record_round_trips_through_wire_format` | passing |
| NFR-perf-poll | poll microbench, p99 ≤ 100 ms | Tier-2 deferred |
| NFR-perf-read | wait-free read microbench, p99 ≤ 1 µs | Tier-2 deferred |
| NFR-default-safe | `test_nfr_default_safe_on_first_poll_failure` | passing |
| Structural | `test_scripted_source_satisfies_protocol` | passing |
| Invariant I-6 | `test_default_safe_snapshot_invariant_i6` | passing |

AC-7 and the two Tier-2 microbenches need real Jetson hardware to be meaningful; they roll into the Tier-2 validation task with AZ-301's AC-8 (`read_host_tuple` on real NVML/L4T).

## Test run

`./.venv/bin/python -m pytest tests/unit tests/_meta -q` → **1140 passed, 11 skipped (Tier-2 / CUDA / cmake / actionlint)**, no failures.

Mypy strict on the three production-source files we touched: clean. Ruff on the same set: clean.
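
The deterministic tests in this run lean on the Clock-based rate limit from design decision 4. The following is a minimal sketch of that check, not the as-built code: `ManualClock`, `RateLimitedWarner`, and the `emitted` list are hypothetical names introduced for illustration, and only the `now_ns - last_warn_ns >= 1_000_000_000` comparison and the `monotonic_ns()` call are taken from the report.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol

WARN_WINDOW_NS = 1_000_000_000  # 1-second window, as stated in design decision 4


class Clock(Protocol):
    """Assumed shape of the injected clock: only monotonic_ns() is needed here."""

    def monotonic_ns(self) -> int: ...


@dataclass
class ManualClock:
    """Hypothetical test double; time moves only when advance() is called."""

    now_ns: int = 0

    def monotonic_ns(self) -> int:
        return self.now_ns

    def advance(self, delta_ns: int) -> None:
        self.now_ns += delta_ns


@dataclass
class RateLimitedWarner:
    """Hypothetical helper: emits at most one warning per WARN_WINDOW_NS."""

    clock: Clock
    last_warn_ns: Optional[int] = None
    emitted: list = field(default_factory=list)  # stands in for the real logger

    def warn(self, message: str) -> bool:
        now_ns = self.clock.monotonic_ns()
        # Suppress while still inside the window opened by the last emitted WARN.
        if self.last_warn_ns is not None and now_ns - self.last_warn_ns < WARN_WINDOW_NS:
            return False
        self.last_warn_ns = now_ns
        self.emitted.append(message)
        return True


clock = ManualClock()
warner = RateLimitedWarner(clock)
first = warner.warn("thermal telemetry unavailable")   # first call always emits
clock.advance(500_000_000)
second = warner.warn("thermal telemetry unavailable")  # 0.5 s later: suppressed
clock.advance(500_000_000)
third = warner.warn("thermal telemetry unavailable")   # 1.0 s after first: emits
```

Because the window is measured against the injected clock, a test can cross it by calling `advance()` instead of sleeping, which is what makes the AC-4 style of test replay-deterministic.
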
## Self-review

- Production code (`thermal_publisher.py`, `__init__.py`, `records.py`, `pyproject.toml`): no use of `time.*` in component modules (Invariant 2 meta-test passes); FDR kind documented in the schema-keys table; telemetry backends are optional; the publisher raises `TelemetryUnavailableError` on cold-start with no source per AC-5; idempotent start/stop per AC-6; rate-limited WARN per AC-4.
- Tests: every AC that can run on Tier-1 has at least one named test; the AC-1 wait-free read microbench (NFR-perf-read) and the AC-7 hot-path bench are Tier-2 deferred (documented in this report and in the spec's as-built notes).
- Lint / type: ruff + mypy strict clean on the modified set.
- Docs: spec moved to `done/` with the as-built delta notes.

## Known gaps

- AZ-302 AC-7 + AC-1 microbench + NFR-perf-poll → require a real Jetson thermal source under load; rolled into the Tier-2 validation task.
- AZ-304 (C6 Postgres schema) deferred to batch 27 — testcontainers setup + Alembic baseline are independently meaningful.
- `_JtopSource.read()` catches a bare `Exception` because jtop's internal exception types are not stable across jtop versions; the exception is always rewrapped as `TelemetryUnavailableError` (never swallowed silently), so this does not violate the "no silent error suppression" coding rule.
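
The catch-and-rewrap pattern in the last gap can be sketched as follows. This is an illustration under stated assumptions rather than the as-built `_JtopSource`: `RewrappingSource`, the `backend_read` callable, the raw `"gpu"` / `"cpu"` keys, and the two `ThermalReading` fields are hypothetical, and the error class here is a local stand-in for the project's `TelemetryUnavailableError`.

```python
from dataclasses import dataclass
from typing import Callable


class TelemetryUnavailableError(RuntimeError):
    """Local stand-in for the project's error type (assumed to be an Exception subclass)."""


@dataclass(frozen=True)
class ThermalReading:
    # Field names are illustrative assumptions, not the real DTO layout.
    gpu_temp_c: float
    cpu_temp_c: float


class RewrappingSource:
    """Hypothetical source wrapping a backend whose exception types are unstable."""

    def __init__(self, backend_read: Callable[[], dict]) -> None:
        self._backend_read = backend_read

    def read(self) -> ThermalReading:
        try:
            raw = self._backend_read()
            return ThermalReading(
                gpu_temp_c=float(raw["gpu"]),
                cpu_temp_c=float(raw["cpu"]),
            )
        except Exception as exc:  # broad on purpose: backend error types vary
            # Re-raise with the original chained as __cause__, so nothing is swallowed.
            raise TelemetryUnavailableError(f"thermal backend read failed: {exc!r}") from exc


def broken_backend() -> dict:
    raise OSError("backend service not reachable")


cause = None
try:
    RewrappingSource(broken_backend).read()
except TelemetryUnavailableError as err:
    cause = err.__cause__  # the original OSError is preserved for diagnostics

healthy = RewrappingSource(lambda: {"gpu": 55, "cpu": 48.5}).read()
```

Chaining with `from exc` keeps the backend's original exception reachable through `__cause__`, so the broad `except` narrows what callers must handle without hiding the root failure.
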