mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 11:21:13 +00:00
[AZ-299] C7 OnnxTrtEpRuntime: ORT + TRT EP fallback strategy
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05:
when the TRT-direct path (AZ-298) cannot deserialise a cached engine or
when the operator explicitly selects ORT, the system stays in the air at
degraded latency rather than dropping the request. Conforms to the
AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep".

Production
- onnx_trt_ep_runtime.py: compile_engine is a no-op returning an
  EngineCacheEntry pointing at the source .onnx; deserialize_engine is
  gate-first for .engine entries and gate-skip for .onnx, builds an ORT
  InferenceSession with the provider list [TensorrtExecutionProvider,
  CUDAExecutionProvider, CPUExecutionProvider], stages cached engines
  into the ORT TRT EP cache directory via symlink-or-copy, warms up with
  one session.run after construction, and honours
  config.inference.ort_disallow_cpu_fallback by raising
  EngineDeserializeError when the active provider resolves to CPU; infer
  emits a one-shot c7.fallback_to_onnx_trt_ep WARN log plus gcs_alert
  callback on first call when is_fallback=True; release_engine is
  idempotent. _build_provider_args is the single point that pins TRT EP
  option-key names (Risk-3) and caps trt_max_workspace_size at
  gpu_memory_budget_bytes // 4 (AC-8).
- config.py: adds ort_trt_cache_dir (validated non-empty) and
  ort_disallow_cpu_fallback to C7InferenceConfig.
- fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and
  c7.cpu_fallback FDR record kinds.

Tests
- test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback
  alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1
  via fake ORT session; Tier-2 placeholders skip on macOS dev for
  numerical FP16 comparison and session-creation perf NFR.
- test_protocol_conformance.py: drops onnx_trt_ep from the
  missing-module parametrize now that the module ships.
- test_az272_fdr_record_schema.py: extends per-kind fixture builder to
  cover the two new C7 FDR kinds in the roundtrip / schema-version AC
  tests.

Docs
- module-layout.md: replaces the pending onnx_trt_runtime row with the
  shipped onnx_trt_ep_runtime row + capabilities list.
- batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch +
  self-review (PASS_WITH_WARNINGS, 4 Low findings accepted).

Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit
suite (excluding pending components) 529 passing, 19 env-skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -175,7 +175,7 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec
- `engine_gate.py` (AZ-301; D-C10-3 + D-C10-7 takeoff validator)
- `errors.py` (component error family)
- `manifest.py` (AZ-301; `DeploymentManifest` + `ManifestReader` for engine sidecar manifests)
- `onnx_trt_runtime.py` (ONNX Runtime + TensorRT EP, pending)
- `onnx_trt_ep_runtime.py` (AZ-299; ONNX Runtime + TensorRT EP fallback strategy + per-flight ORT TRT subgraph cache + one-shot fallback WARN/FDR/GCS alert + CPU-fallback gate)
- `pytorch_fp16_runtime.py` (AZ-300; research-only / simple-baseline strategy)
- `tensorrt_runtime.py` (AZ-298; production-default TensorRT 10.3 strategy + INT8 calibration cache trust + GPU memory budget enforcement + `python -m ...tensorrt_runtime compile ...` CLI)
- `thermal_publisher.py` (AZ-302; 1 Hz background poller, jtop/NVML fallback)

@@ -0,0 +1,239 @@
# Batch 32 / Cycle 1 — Implementation Report

**Date**: 2026-05-12
**Tasks**: AZ-299 (C7 OnnxTrtEpRuntime — ONNX Runtime + TensorRT EP
fallback strategy + per-flight ORT TRT subgraph cache + one-shot
fallback alert + CPU-fallback gate)
**Story points landed**: 3
**Status**: complete (AZ-299 → In Testing)

## Scope summary

Single-task batch landing the fallback `InferenceRuntime` strategy
for C7. `OnnxTrtEpRuntime` owns the ONNX Runtime + TensorRT EP path
that satisfies C7-IT-05: when the TRT-direct strategy (AZ-298) cannot
deserialise the cached engine for a given model, or when the operator
explicitly selects ORT, the system stays in the air at degraded
latency rather than dropping the request. The runtime conforms to the
same AZ-297 Protocol as `TensorrtRuntime` and `PytorchFp16Runtime`,
so the composition root can wire it as either the primary strategy
or as the fallback target.

The fallback semantics required by AC-5 and Risk-2 are captured by
two new FDR record kinds (extending AZ-272):

- `c7.fallback_to_onnx_trt_ep` — fired once per session when a
  runtime constructed with `is_fallback=True` serves its first
  `infer`. Carries `{model_name, reason, active_provider}`.
- `c7.cpu_fallback` — fired at deserialise time when ORT's provider
  fallback chain settled on `CPUExecutionProvider` (TRT EP refused
  AND CUDA EP refused or unavailable). Carries `{model_name,
  requested_providers, active_provider}`. The composition root can
  install a hard-refusal hook by setting
  `config.inference.ort_disallow_cpu_fallback = True`; default
  remains "warn but allow" so a misconfigured Jetson serves results
  (slowly) rather than hard-failing the flight.

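
The CPU-fallback gate described above can be sketched as follows. This is a minimal illustration, not the shipped code: `check_cpu_fallback`, the `emit_fdr` callback signature, and the stand-in `EngineDeserializeError` class are hypothetical names introduced here, and the sketch assumes the active provider is the first entry of a provider list such as the one ORT's `InferenceSession.get_providers()` returns.

```python
from typing import Callable, Sequence


class EngineDeserializeError(RuntimeError):
    """Stand-in for the component's error type (assumed to live in errors.py)."""


# Provider order requested from ORT, as listed in the report.
REQUESTED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def check_cpu_fallback(
    model_name: str,
    active_providers: Sequence[str],
    disallow_cpu_fallback: bool,
    emit_fdr: Callable[[str, dict], None],
) -> str:
    """Return the active provider; warn or refuse when ORT settled on CPU."""
    active = active_providers[0]  # assumption: highest-priority provider first
    if active != "CPUExecutionProvider":
        return active
    if disallow_cpu_fallback:
        # Hard-refusal mode: the operator prefers failing over slow inference.
        raise EngineDeserializeError(
            f"{model_name}: ORT fell back to CPU and CPU fallback is disallowed"
        )
    # Default "warn but allow": record the event and keep serving.
    emit_fdr(
        "c7.cpu_fallback",
        {
            "model_name": model_name,
            "requested_providers": list(REQUESTED_PROVIDERS),
            "active_provider": active,
        },
    )
    return active
```

In this sketch the refusal path raises without emitting a record, which matches the "instead raises" wording of the Risk-2 test description; whether the shipped runtime also logs before raising is not stated in the report.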
ORT, NumPy, and `pycuda` (used only for `release_engine` cleanup hints)
are **lazy-imported** inside the methods that need them; the module
loads cleanly on Tier-0 / macOS dev hosts so the package's
protocol-conformance tests stay importable without GPU. ORT version is
pinned at the project default; this task does NOT introduce any new
third-party dependency. The TRT EP cache directory comes from
`config.inference.ort_trt_cache_dir` (new field, defaults to
`/var/lib/gps-denied/engines/ort_trt_cache`) and is intentionally a
sibling of the TRT-direct `engine_cache_dir` so C12 operator tooling
can clean both on flight end via a single sweep.

## Files added / modified

### New (production)

- `src/gps_denied_onboard/components/c7_inference/onnx_trt_ep_runtime.py`:
  - `OnnxTrtEpEngineHandle` (opaque, slots, owns the ORT session +
    cached output names + `model_name` + `_released` flag).
  - Local `_iso_ts_now` helper for FDR timestamps (kept component-local
    rather than reaching across layering — see Findings #1).
  - `_ort_dtype_to_numpy` (single point that maps the ORT type strings
    back to NumPy dtypes, isolating the version-fragile mapping for
    Risk-3).
  - `_build_provider_args` (single place that constructs the TRT EP
    option dict — pins the option-key names for Risk-3 unit test).
  - `_stage_engine_for_ort` (symlink-or-copy a cached `.engine` into
    the ORT TRT EP cache directory at the path ORT expects).
  - `OnnxTrtEpRuntime` class with:
    - `compile_engine`: no-op returning an `EngineCacheEntry` pointing
      at the source `.onnx`.
    - `deserialize_engine`: gate-first when the entry is a `.engine`,
      skip-gate when `.onnx`; provider list
      `[TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider]`;
      staging the cached engine for the EP; warm-up `session.run`
      after construction; one-shot `c7.cpu_fallback` alert when the
      active provider is CPU; honours `ort_disallow_cpu_fallback` by
      raising `EngineDeserializeError` before any work happens on the
      CPU path.
    - `infer`: sync `session.run` with named inputs / outputs; first
      call on `is_fallback=True` runtimes fires exactly one
      `c7.fallback_to_onnx_trt_ep` WARN log + `gcs_alert` callback;
      ORT-internal exceptions rewrapped to `InferenceError` with
      `__cause__` preserved.
    - `release_engine`: idempotent (drops the session reference and
      marks the handle released; second call is a silent no-op).
    - `thermal_state`: delegation to the injected
      `ThermalStatePublisher`.
    - `current_runtime_label() -> "onnx_trt_ep"`.

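
The one-shot fallback alert attributed to `infer` above can be illustrated with a small latch. This is a sketch under stated assumptions: `OneShotFallbackNotifier`, the logger name, and the `gcs_alert(kind, payload)` callback signature are hypothetical; only the record kind, its three fields, and the "exactly once, only when `is_fallback=True`" behaviour come from the report.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("c7_inference.sketch")  # hypothetical logger name

FALLBACK_KIND = "c7.fallback_to_onnx_trt_ep"


class OneShotFallbackNotifier:
    """Fires the fallback WARN log and GCS alert at most once per runtime."""

    def __init__(
        self,
        is_fallback: bool,
        gcs_alert: Optional[Callable[[str, dict], None]] = None,
    ) -> None:
        # Armed only for runtimes wired as the fallback target.
        self._armed = is_fallback
        self._gcs_alert = gcs_alert

    def notify_first_infer(
        self, model_name: str, reason: str, active_provider: str
    ) -> bool:
        """Return True if the alert fired on this call, False otherwise."""
        if not self._armed:
            return False
        self._armed = False  # disarm before emitting so a retry stays silent
        payload = {
            "model_name": model_name,
            "reason": reason,
            "active_provider": active_provider,
        }
        logger.warning("%s %s", FALLBACK_KIND, payload)
        if self._gcs_alert is not None:
            self._gcs_alert(FALLBACK_KIND, payload)
        return True
```

The latch above is not thread-safe; if `infer` can be entered concurrently, a lock around the armed check would be needed. The report does not say how the shipped runtime handles that case.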
### New (tests)

- `tests/unit/c7_inference/test_onnx_trt_ep_runtime.py` — **NEW**
  suite covering every AC + the two risks:
  - **AC-1** protocol conformance + label string match.
  - **AC-2** deserialise from `.onnx` does NOT call `EngineGate.validate`;
    session is built with the TRT EP at the head of the provider
    list; warm-up `session.run` runs exactly once.
  - **AC-3** deserialise from `.engine` whose filename schema
    mismatches the host: `EngineGate.validate` raises before any
    ORT session creation — verified by monkey-patching `_load_ort`
    to raise `AssertionError` on any call.
  - **AC-4** `infer` round-trips through the fake ORT session with
    named inputs / outputs; the returned dict matches the Protocol
    shape. (The "numerical comparison against TRT-direct within
    FP16 tolerance" half of AC-4 lives in the Tier-2 microbench
    harness — placeholder skip in the same file.)
  - **AC-5** first `infer` with `is_fallback=True` emits exactly
    one `c7.fallback_to_onnx_trt_ep` WARN log AND invokes the
    `gcs_alert` callback once; second `infer` is silent on both
    channels; `is_fallback=False` never emits.
  - **AC-6** forcing TRT EP to refuse (the fake ORT reports only
    `CUDAExecutionProvider` and `CPUExecutionProvider` as
    successfully loaded) creates the session with CUDA EP as the
    active provider; an INFO log records the actual provider in
    use; `current_runtime_label()` still returns `"onnx_trt_ep"`.
  - **AC-7** `release_engine` called twice — first drops the
    session reference and marks released; second is a silent no-op;
    foreign handle types silently ignored (defensive shim consistent
    with `TensorrtRuntime`).
  - **AC-8** `_build_provider_args` sets `trt_max_workspace_size`
    to `gpu_memory_budget_bytes // 4`; the provider option dict
    contains exactly the keys
    `{trt_engine_cache_enable, trt_engine_cache_path, trt_max_workspace_size, trt_fp16_enable}`
    (Risk-3 pin).
  - **Risk-2** CPU fallback emits exactly one `c7.cpu_fallback`
    FDR record at deserialise time; with
    `ort_disallow_cpu_fallback=True` the runtime instead raises
    `EngineDeserializeError` before any session work.
  - **NFR-reliability** ORT-internal `RuntimeError` raised inside
    `session.run` is rewrapped as `InferenceError` with `__cause__`
    preserved; foreign handle types and released handles passed to
    `infer` are rejected too.
  - **Tier-2 placeholders**: numerical FP16 comparison against
    TRT-direct (AC-4 tail), session-creation perf NFR
    (≤ 30 s p95 first / ≤ 5 s p95 with EP cache hot), and real-EP
    CPU-fallback under TRT-version-mismatch — all marked
    `@pytest.mark.tier2` and skipped on Tier-1 / macOS dev.

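
The AC-8 / Risk-3 pin listed above can be sketched as a single function that builds the provider list. The function name `build_provider_args` and its parameters are assumptions made for illustration; the four option keys, the provider order, and the `gpu_memory_budget_bytes // 4` cap are taken from the report. Whether the shipped code hard-codes `trt_fp16_enable` is not stated, so it is a parameter here.

```python
from typing import Any, Dict, List, Tuple, Union

# ORT accepts each provider either as a name or as a (name, options) tuple.
ProviderSpec = Union[str, Tuple[str, Dict[str, Any]]]


def build_provider_args(
    cache_dir: str,
    gpu_memory_budget_bytes: int,
    fp16: bool = True,  # assumption: the report does not give the real value
) -> List[ProviderSpec]:
    """Build the provider list with TRT EP first and its options pinned."""
    trt_options: Dict[str, Any] = {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_dir,
        # AC-8: the TRT workspace gets a quarter of the GPU memory budget.
        "trt_max_workspace_size": gpu_memory_budget_bytes // 4,
        "trt_fp16_enable": fp16,
    }
    return [
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]
```

A list in this shape is what `onnxruntime.InferenceSession(..., providers=...)` takes. Keeping the key names in one function means a unit test can compare `set(trt_options)` against the expected four keys and fail loudly if a key is renamed.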
### Modified (production)

- `src/gps_denied_onboard/components/c7_inference/config.py` —
  adds `ort_trt_cache_dir: str =
  "/var/lib/gps-denied/engines/ort_trt_cache"` (validated non-empty
  in `__post_init__`) and `ort_disallow_cpu_fallback: bool = False`
  to `C7InferenceConfig`. The CPU-fallback gate intentionally
  defaults to "warn but allow" to honour the architecture's
  "keep flying" principle; the operator opts INTO hard-refusal
  when latency budgets matter more than service continuity.
- `src/gps_denied_onboard/fdr_client/records.py` — adds two new
  `FdrRecord` kinds (`c7.fallback_to_onnx_trt_ep` and
  `c7.cpu_fallback`) with their required field sets, following the
  existing pattern for `c6.write_failed` / `c6.freshness.*`.

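
The two new config fields can be pictured with a reduced dataclass. This is only a sketch: the class name `C7InferenceConfigSketch` is hypothetical, the real `C7InferenceConfig` has other fields not shown in this report, and raising `ValueError` on an empty path is an assumption about how the `__post_init__` validation reports failure.

```python
from dataclasses import dataclass


@dataclass
class C7InferenceConfigSketch:
    """Reduced stand-in holding only the two fields added by AZ-299."""

    ort_trt_cache_dir: str = "/var/lib/gps-denied/engines/ort_trt_cache"
    # Default is "warn but allow": CPU fallback is signalled, not refused.
    ort_disallow_cpu_fallback: bool = False

    def __post_init__(self) -> None:
        # Reject empty or whitespace-only paths at construction time.
        if not self.ort_trt_cache_dir.strip():
            raise ValueError("ort_trt_cache_dir must be a non-empty path")
```

Validating in `__post_init__` means a bad cache path fails when the config is built rather than at the first deserialise call in flight.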
### Modified (tests)

- `tests/unit/c7_inference/test_protocol_conformance.py` — the
  `test_ac5_build_inference_runtime_flag_on_but_module_missing`
  parametrization previously excluded only `{"pytorch_fp16",
  "tensorrt"}`; now that `onnx_trt_ep_runtime.py` exists the set is
  `{"pytorch_fp16", "tensorrt", "onnx_trt_ep"}`. The test body and
  parametrize structure are kept intact so the factory's
  missing-module branch stays under test for any future strategy whose
  `BUILD_*` flag is wired in `inference_factory._RUNTIME_TO_MODULE`
  ahead of its module landing.
- `tests/unit/test_az272_fdr_record_schema.py` — extends the
  per-kind fixture builder with deterministic payloads for
  `c7.fallback_to_onnx_trt_ep` and `c7.cpu_fallback` so the AZ-272
  roundtrip / schema-version / unknown-kind tests cover the new
  kinds the same way they cover the C6 kinds.

### Modified (docs)

- `_docs/02_document/module-layout.md` — the
  `onnx_trt_runtime.py (ONNX Runtime + TensorRT EP, pending)` row
  in the c7_inference per-component table now reads
  `onnx_trt_ep_runtime.py (AZ-299; ONNX Runtime + TensorRT EP
  fallback strategy + per-flight ORT TRT subgraph cache + one-shot
  fallback WARN/FDR/GCS alert + CPU-fallback gate)`. The filename
  shift from `onnx_trt_runtime.py` (task spec body) to
  `onnx_trt_ep_runtime.py` (shipped) follows
  `inference_factory._RUNTIME_TO_MODULE` which is the authoritative
  factory wiring — the task spec's "Outcome" body had a typo that
  contradicted its own "label" wording (`"onnx_trt_ep"`). The
  factory wins.

## Acceptance criteria coverage

| AC | Test | Status |
|----|------|--------|
| AC-1 Protocol conformance + label | `test_ac1_protocol_conformance` | passing |
| AC-2 Deserialise from `.onnx` skips the gate | `test_ac2_deserialize_from_onnx_skips_gate` | passing |
| AC-3 Deserialise from `.engine` invokes the gate | `test_ac3_deserialize_from_engine_invokes_gate_and_skips_session_on_refusal` | passing |
| AC-4 `infer` round-trips through ORT (named outputs) | `test_ac4_infer_round_trips_named_outputs` (Tier-1) + Tier-2 numerical FP16 comparison placeholder | passing / Tier-2 skipped |
| AC-5 Fallback WARN log fires once on first infer | `test_ac5_first_infer_with_is_fallback_emits_warn_and_alert_once` + `test_ac5_not_fallback_never_emits` | passing |
| AC-6 Provider fallback chain respects ORT order | `test_ac6_trt_ep_refused_falls_through_to_cuda_ep` | passing |
| AC-7 `release_engine` idempotent | `test_ac7_release_is_idempotent` + `test_release_engine_ignores_foreign_handle_type` | passing |
| AC-8 Workspace budget respected | `test_ac8_provider_options_pin_keys_and_budget_quarter` | passing |
| Risk-2 CPU fallback signalled | `test_risk2_cpu_fallback_emits_fdr_kind` + `test_risk2_cpu_fallback_with_disallow_raises` | passing |
| Risk-3 TRT EP option-key pin | `test_ac8_provider_options_pin_keys_and_budget_quarter` (shared) | passing |
| NFR-perf-session-create p95 ≤ 30 s / ≤ 5 s cache hot | `test_nfr_perf_session_create_first_under_30s_cache_hot_under_5s` (Tier-2 microbench) | Tier-2 skipped |
| NFR-reliability-error-rewrap | `test_nfr_reliability_infer_rewraps_runtime_error` + `test_infer_rejects_foreign_handle` + `test_infer_rejects_released_handle` | passing |

## AC Test Coverage: 8 of 8 covered (+ 2 risks + 2 NFRs)
## Code Review Verdict: PASS_WITH_WARNINGS (4 Low accepted; see Findings)
## Auto-Fix Attempts: 0
## Stuck Agents: None

## Findings (self-review)

| # | Severity | Category | Location | Note | Resolution |
|---|----------|----------|----------|------|------------|
| 1 | Low | Maintainability | `onnx_trt_ep_runtime.py::_iso_ts_now` | Duplicated from the equivalent helper in `tensorrt_runtime.py` / `fdr_client`. Consolidating into a shared helper would either inflate `fdr_client/records.py` (which is the lowest-layer module the c7 strategies depend on) or carve out a new shared utility module just for one one-liner. Kept component-local; a later hygiene pass can extract the helper alongside the existing shared `_types/` move when more components grow ISO-timestamp call sites. | Open (Low) — accepted; the c7 layering rule wins. |
| 2 | Low | Test-quality | `test_ac4_infer_round_trips_named_outputs` | Uses a `_FakeOrtSession` whose `run(...)` returns canned arrays in the declared output order. The named-output mapping assertion is verified at the Protocol layer; the *numerical* FP16 comparison against TRT-direct lives in the Tier-2 microbench harness. | Open (Low) — Tier-2 placeholder owns the numerical half. |
| 3 | Low | Architecture | `onnx_trt_ep_runtime.py::_stage_engine_for_ort` | Attempts symlink first, falls back to copy on `OSError` (e.g., crossing a filesystem boundary, or running on a host that disallows symlinks for the running user). The copy path leaves a stale binary in the EP cache directory if the staging fails partway; C12's per-flight cache cleanup handles this — a torn copy on disk is no worse than a stale subgraph. | Open (Low) — accepted as documented; C12 owns cleanup. |
| 4 | Low | Test-coverage | AC-3 schema-mismatch path | The test patches `EngineGate.validate` to raise `EngineSchemaMismatchError`; the real gate's filename-schema parser is exercised by AZ-281 / AZ-301 tests. Wiring this runtime to a real (live) gate would duplicate that coverage at the wrong layer. | Open (Low) — accepted; AZ-301 owns the parser. |

## Tracker

- AZ-299 transitioned to **In Progress** at session start; will move
  to **In Testing** post-commit per `protocols.md`.

## Test suite

- `tests/unit/c7_inference/test_onnx_trt_ep_runtime.py` — all
  active tests passing, Tier-2 placeholders skipped on macOS dev
  (no ORT/CUDA binding).
- `tests/unit/c7_inference/` (full c7 suite) — 139 passing, 17
  skipped (CUDA / TensorRT / ORT unavailable on Tier-1 / macOS).
- `tests/unit/test_az272_fdr_record_schema.py` — 34 passing (the
  two new C7 kinds now covered by every roundtrip / schema-version
  test).
- Combined unit suite excluding pending components (c1, c2, c2.5,
  c3, c3.5, c4, c5, c8, c10, c11, c12) and the c6 collection
  blocker on this host (missing `psycopg_pool` is a known
  dev-machine env issue, pre-existing) — 529 passing, 19
  environment-skipped, 1 warning (pre-existing `pynvml`
  FutureWarning unrelated to AZ-299).

## Next batch

Cycle 1 advances per the greenfield queue: autodev re-detects the
next AZ ticket in the Step 7 batch loop. C7's three concrete
strategies have now landed (AZ-298 / AZ-299 / AZ-300), and the
remaining C7 tasks, `AZ-301 c7_engine_gate` and
`AZ-302 c7_thermal_publisher`, are already in `done/`. The next
ticket in dependency order is the first item in the queue that
doesn't depend on a pending earlier task; autodev will compute
that during the next sub-step.

@@ -0,0 +1,54 @@
# Code Review Report — Batch 32 / Cycle 1

**Batch**: 32
**Tasks**: AZ-299 (C7 OnnxTrtEpRuntime)
**Date**: 2026-05-12
**Verdict**: PASS_WITH_WARNINGS

## Findings

| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| 1 | Low | Maintainability | `src/gps_denied_onboard/components/c7_inference/onnx_trt_ep_runtime.py::_iso_ts_now` | Duplicates the equivalent helper in `tensorrt_runtime.py` / `fdr_client` |
| 2 | Low | Test-quality | `tests/unit/c7_inference/test_onnx_trt_ep_runtime.py::test_ac4_infer_round_trips_named_outputs` | Round-trip verified via fake ORT session; numerical FP16 comparison lives in Tier-2 microbench |
| 3 | Low | Architecture | `onnx_trt_ep_runtime.py::_stage_engine_for_ort` | Symlink-first with copy fallback can leave a torn copy on disk if interrupted mid-copy |
| 4 | Low | Test-coverage | AC-3 schema-mismatch path | Real gate filename-schema parser exercised in AZ-281 / AZ-301; this test stubs `EngineGate.validate` |

### Finding Details

**F1: `_iso_ts_now` duplicated component-locally** (Low / Maintainability)
- Location: `src/gps_denied_onboard/components/c7_inference/onnx_trt_ep_runtime.py` — module-level helper
- Description: A one-liner ISO-8601 timestamp helper, also present in `tensorrt_runtime.py` and `fdr_client`. Consolidating would either inflate `fdr_client/records.py` (the lowest-layer module the C7 strategies depend on, currently free of utility functions) or carve out a shared utility module for a single one-liner.
- Suggestion: Extract alongside other shared ISO-timestamp call sites in a future hygiene pass (likely when `_types/` grows enough to justify a shared `_utils/` neighbour). For now the C7 layering rule wins.
- Task: AZ-299
- Resolution: Open (Low) — accepted as documented.

**F2: Round-trip via fake ORT session** (Low / Test-quality)
- Location: `tests/unit/c7_inference/test_onnx_trt_ep_runtime.py::test_ac4_infer_round_trips_named_outputs`
- Description: Uses `_FakeOrtSession.run(...)` returning canned arrays in the declared output order. The named-output mapping is verified at the Protocol layer; the *numerical* FP16 comparison against TRT-direct (the second half of AC-4) is a Tier-2 placeholder skip that runs on the Jetson microbench harness.
- Suggestion: None — the Tier-1 / macOS dev environment lacks ORT + CUDA. The Tier-2 placeholder owns the numerical half explicitly.
- Task: AZ-299
- Resolution: Open (Low) — accepted as documented.

**F3: `_stage_engine_for_ort` symlink-then-copy** (Low / Architecture)
- Location: `onnx_trt_ep_runtime.py::_stage_engine_for_ort`
- Description: The helper tries `os.symlink(...)` first and falls back to `shutil.copy2(...)` on `OSError`. If the copy is interrupted partway, the EP cache directory ends up with a torn file. The runtime does NOT validate the staged file post-copy.
- Suggestion: A torn copy left in the EP cache is no worse than a stale subgraph (ORT rebuilds on hash mismatch). C12's per-flight cache cleanup wipes the directory between flights, so the failure window is bounded to a single flight's deserialise attempts.
- Task: AZ-299
- Resolution: Open (Low) — accepted as documented; C12 owns cleanup.

**F4: AC-3 stubs `EngineGate.validate`** (Low / Test-coverage)
- Location: `tests/unit/c7_inference/test_onnx_trt_ep_runtime.py::test_ac3_deserialize_from_engine_invokes_gate_and_skips_session_on_refusal`
- Description: Patches `EngineGate.validate` to raise `EngineSchemaMismatchError`; the real filename-schema parser lives behind AZ-281 + AZ-301 and is exercised by their respective unit tests.
- Suggestion: Wiring this runtime to a live gate would duplicate AZ-301's coverage at the wrong layer. Keep the stub here; the runtime's contract with the gate is "if it raises, propagate without touching ORT" — verified by the `_load_ort` monkey-patch that asserts no ORT import on the refusal path.
- Task: AZ-299
- Resolution: Open (Low) — accepted as documented.

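
For F3, the symlink-first staging with copy fallback can be sketched as below. The function name `stage_engine_for_ort` and its two-path signature are illustrative; how the shipped helper derives "the path ORT expects" is not described in this review, so the destination is passed in explicitly, and removing a pre-existing entry at the destination is an assumption.

```python
import os
import shutil
from pathlib import Path


def stage_engine_for_ort(engine_path: Path, staged_path: Path) -> str:
    """Expose a cached engine inside the EP cache dir; return the method used."""
    staged_path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: a previous staging result may exist and should be replaced.
    if staged_path.is_symlink() or staged_path.exists():
        staged_path.unlink()
    try:
        os.symlink(engine_path, staged_path)
        return "symlink"
    except OSError:
        # Symlinks can be refused (filesystem or permission limits),
        # so fall back to copying the file with its metadata.
        shutil.copy2(engine_path, staged_path)
        return "copy"
```

As the finding notes, an interrupted `shutil.copy2` can leave a partial file behind. Copying to a temporary name and finishing with `os.replace` would narrow that window, but the review accepts the simpler form because C12 wipes the directory between flights.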
## Verdict Logic

- 0 Critical
- 0 High
- 0 Medium
- 4 Low

→ **PASS_WITH_WARNINGS**: only Low findings; all accepted as documented.

@@ -12,3 +12,4 @@ sub_step:
retry_count: 0
cycle: 1
tracker: jira
last_completed_batch: 32