mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 11:31:13 +00:00
[AZ-299] C7 OnnxTrtEpRuntime: ORT + TRT EP fallback strategy
Land the fallback InferenceRuntime strategy that satisfies C7-IT-05: when the TRT-direct path (AZ-298) cannot deserialise a cached engine or when the operator explicitly selects ORT, the system stays in the air at degraded latency rather than dropping the request. Conforms to the AZ-297 Protocol; current_runtime_label() == "onnx_trt_ep".

Production
- onnx_trt_ep_runtime.py: compile_engine is a no-op returning an EngineCacheEntry pointing at the source .onnx; deserialize_engine is gate-first for .engine entries and gate-skip for .onnx, builds an ORT InferenceSession with the provider list [TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider], stages cached engines into the ORT TRT EP cache directory via symlink-or-copy, warms up with one session.run after construction, and honours config.inference.ort_disallow_cpu_fallback by raising EngineDeserializeError when the active provider resolves to CPU; infer emits a one-shot c7.fallback_to_onnx_trt_ep WARN log plus gcs_alert callback on first call when is_fallback=True; release_engine is idempotent. _build_provider_args is the single point that pins TRT EP option-key names (Risk-3) and caps trt_max_workspace_size at gpu_memory_budget_bytes // 4 (AC-8).
- config.py: adds ort_trt_cache_dir (validated non-empty) and ort_disallow_cpu_fallback to C7InferenceConfig.
- fdr_client/records.py: adds c7.fallback_to_onnx_trt_ep and c7.cpu_fallback FDR record kinds.

Tests
- test_onnx_trt_ep_runtime.py: covers AC-1..AC-8 + Risk-2 CPU-fallback alert + Risk-3 option-key pin + NFR-reliability error rewrap; Tier-1 via fake ORT session; Tier-2 placeholders skip on macOS dev for numerical FP16 comparison and session-creation perf NFR.
- test_protocol_conformance.py: drops onnx_trt_ep from the missing-module parametrize now that the module ships.
- test_az272_fdr_record_schema.py: extends per-kind fixture builder to cover the two new C7 FDR kinds in the roundtrip / schema-version AC tests.

Docs
- module-layout.md: replaces the pending onnx_trt_runtime row with the shipped onnx_trt_ep_runtime row + capabilities list.
- batch_32_cycle1_report.md + reviews/batch_32_review.md: full batch + self-review (PASS_WITH_WARNINGS, 4 Low findings accepted).

Tests run: c7_inference 139 passing + 17 Tier-2 skips; combined unit suite (excluding pending components) 529 passing, 19 env-skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
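
The provider ordering, the workspace cap (AC-8), and the CPU-fallback guard described in the commit message can be illustrated with a minimal sketch. This is not the code that shipped in onnx_trt_ep_runtime.py: the function names build_provider_args and resolve_active_provider, the requested_workspace_bytes parameter, and the local EngineDeserializeError class are hypothetical stand-ins. The option keys trt_max_workspace_size, trt_engine_cache_enable, and trt_engine_cache_path are existing ONNX Runtime TensorRT EP options; which keys the project actually pins is an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple

# Provider priority order as listed in the commit message.
PROVIDER_ORDER = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


class EngineDeserializeError(RuntimeError):
    """Local stand-in for the project's error type of the same name."""


def build_provider_args(
    gpu_memory_budget_bytes: int,
    cache_dir: str,
    requested_workspace_bytes: Optional[int] = None,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Build a (name, options) provider list with a capped TRT workspace.

    Hypothetical analogue of _build_provider_args: the workspace size never
    exceeds a quarter of the GPU memory budget.
    """
    if gpu_memory_budget_bytes <= 0:
        raise ValueError("gpu_memory_budget_bytes must be positive")
    if not cache_dir:
        raise ValueError("cache_dir must be non-empty")

    cap = gpu_memory_budget_bytes // 4
    if requested_workspace_bytes is None:
        workspace = cap
    else:
        workspace = min(requested_workspace_bytes, cap)

    trt_options: Dict[str, Any] = {
        "trt_max_workspace_size": workspace,
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_dir,
    }
    return [
        (PROVIDER_ORDER[0], trt_options),
        (PROVIDER_ORDER[1], {}),
        (PROVIDER_ORDER[2], {}),
    ]


def resolve_active_provider(
    session_providers: List[str], disallow_cpu_fallback: bool
) -> str:
    """Return the highest-priority provider the session ended up with.

    session_providers is assumed to be the list returned by
    InferenceSession.get_providers(), highest priority first.
    """
    if not session_providers:
        raise EngineDeserializeError("session reports no execution providers")
    active = session_providers[0]
    if disallow_cpu_fallback and active == "CPUExecutionProvider":
        raise EngineDeserializeError(
            "active provider resolved to CPU while CPU fallback is disallowed"
        )
    return active
```

With an ORT build that includes the TensorRT EP, the returned list could be passed as the providers argument of onnxruntime.InferenceSession, and the guard applied to session.get_providers() after construction. Keeping the option keys in one function means a renamed key only has to be fixed in one place, which is the intent the commit message attributes to Risk-3.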
@@ -302,7 +302,7 @@ def test_ac5_build_inference_runtime_flag_off_no_import(
     sorted(
         rt
         for rt in _STRATEGY_MODULES
-        if rt not in {"pytorch_fp16", "tensorrt"}
+        if rt not in {"pytorch_fp16", "tensorrt", "onnx_trt_ep"}
     ),
 )
 def test_ac5_build_inference_runtime_flag_on_but_module_missing(
@@ -310,12 +310,12 @@ def test_ac5_build_inference_runtime_flag_on_but_module_missing(
 ) -> None:
     """``BUILD_*=ON`` but the strategy module hasn't been written yet.
 
-    ``pytorch_fp16`` (AZ-300) and ``tensorrt`` (AZ-298) are excluded —
-    both shipped their concrete modules and are covered by
-    ``test_pytorch_fp16_runtime.test_ac1_protocol_conformance`` and
-    ``test_tensorrt_runtime.test_ac1_protocol_conformance``. Only
-    ``onnx_trt_ep`` (AZ-299) remains pending; this test still guards
-    its factory path.
+    AZ-298 (TensorrtRuntime), AZ-299 (OnnxTrtEpRuntime), and AZ-300
+    (PytorchFp16Runtime) have all shipped their concrete modules and
+    are excluded; their protocol conformance is covered in the
+    per-strategy test files. This parameterisation guards the factory
+    path for any future strategy whose `BUILD_*` flag is wired in
+    `inference_factory._RUNTIME_TO_MODULE` ahead of its module landing.
     """
     _, _, flag = _STRATEGY_MODULES[runtime]
     monkeypatch.setenv(flag, "ON")