[AZ-301] Implement EngineGate — D-C10-3 + D-C10-7 takeoff validator

AZ-301 takeoff-side validator every InferenceRuntime strategy calls
before deserialize_engine. Five-step deterministic refusal pipeline,
in order:

  1. filename schema parse  -> EngineSchemaMismatchError(reason=...)
  2. schema tuple match     -> EngineSchemaMismatchError(expected,got)
  3. sidecar present        -> EngineSidecarMissingError
  4. sidecar trust          -> EngineHashMismatchError(stage=sidecar)
  5. manifest match         -> EngineHashMismatchError(stage=manifest)

Refusal order is part of the public contract (AC-7 verifies a
fixture that is BOTH schema-mismatched AND missing-sidecar refuses
at step 1).

Production code (new):
 - components/c7_inference/engine_gate.py  -- EngineGate, HostTuple,
   read_host_tuple (Jetson: pynvml + /etc/nv_tegra_release +
   tensorrt.__version__; raises RuntimeError on Tier-1)
 - components/c7_inference/manifest.py     -- DeploymentManifest,
   ManifestReader, ManifestReaderProtocol. Risk-2 enforced at the
   type level: __getitem__ raises EngineHashMismatchError on
   missing key, NEVER KeyError, so the gate cannot silently pass
 - components/c7_inference/__init__.py     -- re-exports the new
   public surface

Tests (new): tests/unit/c7_inference/test_engine_gate.py covers
AC-1..AC-7 + NFR-reliability-no-write + manifest reader + refusal
log emission. 14 tests unconditional + AC-8 Tier-2 skip (needs
real NVML + L4T release file + tensorrt binding).

Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-301_c7_engine_gate.md Implementation Notes:
 1. HostTuple lives in engine_gate.py (the only consumer);
    re-exported from package __init__.py.
 2. read_host_tuple takes precision as a keyword argument — three
    of four fields come from the host, precision is engine-build
    metadata supplied by the caller.
 3. AC-8 is Tier-2-only; AC-1..AC-7 + NFR-reliability + extras
    run on every CI host.

Risk-2 (manifest reader silently treats missing entry as pass):
DeploymentManifest.__getitem__ raises EngineHashMismatchError with
"missing manifest entry for {path}" — covered by
test_manifest_missing_entry_raises_hash_mismatch.

NFR-perf-validate (p99 <= 50 ms): tier-2 only — a real 500 MB
engine streaming sha256 cannot be benchmarked on Tier-1 fixtures.

AZ-302 (ThermalStatePublisher) + AZ-304 (C6 Postgres schema)
deferred to batches 26 / 27 to keep the 1-task batch cadence and
isolate their respective env / testcontainer surface areas.

Suite: 1134 passed / 11 skipped. No regressions outside the new
files.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-12 10:20:21 +03:00
parent 65ad2168ed
commit 59f56c032f
7 changed files with 941 additions and 1 deletions
+1 -1
View File
@@ -8,7 +8,7 @@ status: in_progress
sub_step:
phase: 13
name: archive-and-loop
detail: "batch 24/cycle1 complete: AZ-300 → In Testing, archived to done/. Installed [inference] extras (torch 2.11.0 + torchvision 0.26.0 + onnxruntime 1.23.2) into the dev venv. 17 ACs + NFRs covered (6 CUDA-skipped on macOS). Suite: 1120 passed / 10 skipped. Next: recompute batch 25 — candidates AZ-301 (EngineGate, 3pt) + AZ-302 (ThermalStatePublisher, 3pt) + AZ-304 (C6 Postgres schema, 2pt). 17 tasks total ready overall (AZ-300 removed; AZ-345 still gated)."
detail: "batch 25/cycle1 complete: AZ-301 → In Testing, archived to done/. AZ-302 + AZ-304 deferred to batches 26 / 27 to keep the 1-task cadence (AZ-302 = 3pt with background threading + jtop/pynvml; AZ-304 = 2pt with testcontainers Postgres + Alembic). 14 unconditional AC tests + 1 Tier-2 AC-8 skip. Suite: 1134 passed / 11 skipped. 17 tasks total ready overall (AZ-300 + AZ-301 removed)."
retry_count: 0
cycle: 1
tracker: jira