Commit Graph

4 Commits

Author SHA1 Message Date
Oleksandr Bezdieniezhnykh 59f56c032f [AZ-301] Implement EngineGate — D-C10-3 + D-C10-7 takeoff validator
AZ-301 takeoff-side validator every InferenceRuntime strategy calls
before deserialize_engine. Five-step deterministic refusal pipeline,
in order:

  1. filename schema parse  -> EngineSchemaMismatchError(reason=...)
  2. schema tuple match     -> EngineSchemaMismatchError(expected,got)
  3. sidecar present        -> EngineSidecarMissingError
  4. sidecar trust          -> EngineHashMismatchError(stage=sidecar)
  5. manifest match         -> EngineHashMismatchError(stage=manifest)

Refusal order is part of the public contract (AC-7 verifies a
fixture that is BOTH schema-mismatched AND missing-sidecar refuses
at step 1).

Production code (new):
 - components/c7_inference/engine_gate.py  -- EngineGate, HostTuple,
   read_host_tuple (Jetson: pynvml + /etc/nv_tegra_release +
   tensorrt.__version__; raises RuntimeError on Tier-1)
 - components/c7_inference/manifest.py     -- DeploymentManifest,
   ManifestReader, ManifestReaderProtocol. Risk-2 enforced at the
   type level: __getitem__ raises EngineHashMismatchError on
   missing key, NEVER KeyError, so the gate cannot silently pass
 - components/c7_inference/__init__.py     -- re-exports the new
   public surface

Tests (new): tests/unit/c7_inference/test_engine_gate.py covers
AC-1..AC-7 + NFR-reliability-no-write + manifest reader + refusal
log emission. 14 tests unconditional + AC-8 Tier-2 skip (needs
real NVML + L4T release file + tensorrt binding).

Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-301_c7_engine_gate.md Implementation Notes:
 1. HostTuple lives in engine_gate.py (the only consumer);
    re-exported from package __init__.py.
 2. read_host_tuple takes precision as a keyword argument — three
    of four fields come from the host, precision is engine-build
    metadata supplied by the caller.
 3. AC-8 is Tier-2-only; AC-1..AC-7 + NFR-reliability + extras
    run on every CI host.

Risk-2 (manifest reader silently treats missing entry as pass):
DeploymentManifest.__getitem__ raises EngineHashMismatchError with
"missing manifest entry for {path}" — covered by
test_manifest_missing_entry_raises_hash_mismatch.

NFR-perf-validate (p99 <= 50 ms): tier-2 only — a real 500 MB
engine streaming sha256 cannot be benchmarked on Tier-1 fixtures.

AZ-302 (ThermalStatePublisher) + AZ-304 (C6 Postgres schema)
deferred to batches 26 / 27 to keep the 1-task batch cadence and
isolate their respective env / testcontainer surface areas.

Suite: 1134 passed / 11 skipped. No regressions outside the new
files.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 10:20:21 +03:00
Oleksandr Bezdieniezhnykh 65ad2168ed [AZ-300] Implement PytorchFp16Runtime — C7 simple-baseline strategy
AZ-300 mandatory simple-baseline InferenceRuntime (eager FP16 PyTorch).
Implements the AZ-297 Protocol; current_runtime_label returns
"pytorch_fp16". Numerical reference every fancier C7 strategy (AZ-298
TRT, AZ-299 ORT) is measured against, and the only viable runtime for
Tier-1 workstation Docker where TRT is non-trivial to install.

Production code (new):
 - components/c7_inference/pytorch_fp16_runtime.py — runtime +
   PytorchEngineHandle + output-shape adapter
 - components/c7_inference/architecture_registry.py — torch-free
   register_architecture / default_registry / ArchitectureFactory
   (Risk-1 mitigation: no L2->L3 back-edge from C7 into per-backbone
   code)
 - components/c7_inference/__init__.py — re-exports the registry
   mechanism. Still does NOT import the concrete strategy module
   (Invariant I-5)
 - components/c7_inference/config.py — adds per_frame_debug_log bool
   field (gates the DEBUG per-frame latency log)

Tests (new): tests/unit/c7_inference/test_pytorch_fp16_runtime.py
covers AC-1..AC-8 + NFRs. AC-1/2/6/7 + thermal/release/registry
guards run unconditionally (17 tests); AC-3/4/5/8 +
NFR-perf-deserialize + NFR-reliability-eval-mode require CUDA and
skip on Tier-1 CI / macOS dev.

Tests (modified):
 - test_protocol_conformance.py — narrowed
   test_ac5_build_inference_runtime_flag_on_but_module_missing
   parametrisation to exclude pytorch_fp16 (now-built); TRT / ORT
   still covered until AZ-298 / AZ-299 ship.

CI: .github/workflows/ci.yml lint + unit jobs now install
'-e .[dev,inference]' because mypy + pytest need torch + torchvision +
onnxruntime on the runner.

Three task-spec -> as-built deltas documented in
_docs/02_tasks/done/AZ-300_c7_pytorch_baseline.md Implementation Notes:
 1. Constructor conforms to AZ-297 factory shape (config positional;
    thermal_publisher + registry + clock keyword-only optionals).
    AZ-302 will update the factory to thread thermal_publisher.
 2. Architecture registry uses extras["model_name"] as lookup key
    (avoids touching the frozen BuildConfig / EngineCacheEntry DTOs).
 3. Warm-up forward deferred to AZ-300 tier-2 follow-up — the zero-arg
    registry has no per-backbone input-shape metadata.

Suite: 1120 passed / 10 skipped (CUDA + Tier-2 + cmake / actionlint
environment gates). No regressions in non-c7_inference areas.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 10:13:21 +03:00
Oleksandr Bezdieniezhnykh daff5d4d1c [AZ-297] C7 InferenceRuntime: Protocol + DTOs + factory
Freezes the c7_inference Public API per
_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md
v1.0.0:

- InferenceRuntime Protocol (6 methods: compile_engine,
  deserialize_engine, infer, release_engine, thermal_state,
  current_runtime_label) in components/c7_inference/interface.py.
- DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig,
  EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py
  — placed at the L1 types layer so C10 can re-export EngineCacheEntry
  without crossing the components.* boundary (AZ-270 AC-6).
- ThermalState DTO expanded in _types/thermal.py from the AZ-355
  forward-declared stub to the AZ-297 contract shape (cpu/gpu temp,
  thermal_throttle_active, measured_clock_mhz, measured_at_ns,
  is_telemetry_available). Invariant I-6: when telemetry is
  unavailable, throttle is False.
- Error family rooted at c7_inference.errors.RuntimeError (9 subtypes:
  EngineBuildError, EngineDeserializeError, EngineHashMismatchError,
  EngineSchemaMismatchError, EngineSidecarMissingError,
  CalibrationCacheError, InferenceError, OutOfMemoryError,
  TelemetryUnavailableError). RuntimeNotAvailableError stays in
  runtime_root/errors.py — composition-time, outside the family.
- C7InferenceConfig per-component config block (runtime label,
  thermal_poll_hz, engine_cache_dir) with constructor-time validation
  rejecting unknown runtime labels.
- Composition-root factory build_inference_runtime in
  runtime_root/inference_factory.py with three BUILD_* gates
  (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME,
  BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported
  lazily via __import__ AFTER the flag check, so a Tier-0 build with
  the flag OFF MUST NOT load the strategy module (AC-5 / I-5;
  verifiable via sys.modules).
- 37 conformance tests cover all 8 ACs + NFR-perf-factory
  (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family.
  AC-8 introspects the contract file's Shape table and asserts method
  parity against the runtime Protocol; also asserts all 9 error
  subtypes are documented.

Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py
(replaced by the AZ-297 canonical shape in _types/inference.py); updated
the LightGlue-flavoured EngineHandle Protocol docstring in
_types/manifests.py to rationalize its intentional dual existence
with the C7 opaque EngineHandle (same name, different consumer-side
cut, mirroring the C4/C5 ISam2GraphHandle pattern).

Stale ThermalState.throttle docstring references in c4_pose/config.py,
c4_pose/interface.py, and _types/pose.py updated to
thermal_throttle_active.

Full unit-test sweep: 843 passed, 2 pre-existing environment skips
(cmake, actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 04:30:14 +03:00
Oleksandr Bezdieniezhnykh b12db61444 [AZ-263] Bootstrap: repo skeleton + Docker + CI + Alembic + Tier-1 tests
Implements the AZ-263 / E-BOOT initial structure task:

- Python src/-layout package `gps_denied_onboard/` with per-component
  interface stubs (14 components), type-only DTOs under `_types/`,
  shared helpers under `helpers/` (R14 LightGlue ownership), structured
  JSON logging, runtime composition root with env-var fail-fast gate,
  healthcheck module shared by Docker and CI smoke.
- CMake top-level + `cmake/{build_options,dependencies,strategies}.cmake`
  with the BUILD_* per-binary flags (ADR-002) and pinned external git
  refs for OKVIS2 / VINS-Mono / GTSAM / FAISS / OpenCV >=4.12.0.
- Three Dockerfiles (companion-tier1, operator-tooling,
  mock-suite-sat-service) + two compose files (dev + Tier-1 test).
- Four GitHub Actions workflows: ci.yml (lint/unit/integration/dual
  binary build/SBOM diff/security), ci-tier2.yml (self-hosted Jetson
  AC-bound NFTs), release.yml, cve-rescan.yml.
- Two CI gate scripts: `ci/sbom_diff.py` (deployment SBOM subset +
  R02 exclusion), `ci/opencv_pin_gate.py` (>=4.12.0 enforcement,
  D-CROSS-CVE-1).
- Alembic-driven Postgres 16 initial migration `0001_initial.py`
  mirroring satellite-provider tiles + flights + sector_classifications
  + manifests + engine_cache_entries (data_model.md s 2).
- Tier-1 test scaffolding: 95 passing unit tests covering every AC,
  per-component smoke tests, structured logging JSON output check,
  env-var gate check, healthcheck import check. Two CI-gated tests
  (cmake configure, actionlint) skip locally with explicit reasons.
- Batch report + code review report under `_docs/03_implementation/`.

Verdict: PASS_WITH_WARNINGS (two Low findings, both informational).
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 01:00:28 +03:00