From d066a23cb1022842ad2b4547f811fed90fe2ae53 Mon Sep 17 00:00:00 2001
From: Oleksandr Bezdieniezhnykh
Date: Tue, 19 May 2026 06:06:47 +0300
Subject: [PATCH] [autodev] Add Tier-2 Jetson testing strategy doc

Codifies that Tier-1 (local pytest + Docker) is necessary but NOT
sufficient: Tier-2 (Jetson Orin Nano via run-tests-jetson.sh) is the
product-completeness gate for runtime_root, c7_inference, c3_matcher,
c2_5_rerank, replay_input, and the replay CLI.

Documents the mandatory-Tier-2 scope, what Tier-1-only stubs cannot
prove, the operating procedure, and what batch reports must capture
for in-scope changes.

Surfaced by the Step-11 cycle-1 finding that AZ-618 was only caught
because Tier-2 was actually run.

Co-authored-by: Cursor
---
 .../02_document/tests/tier2-jetson-testing.md | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 _docs/02_document/tests/tier2-jetson-testing.md

diff --git a/_docs/02_document/tests/tier2-jetson-testing.md b/_docs/02_document/tests/tier2-jetson-testing.md
new file mode 100644
index 0000000..e6cefd3
--- /dev/null
+++ b/_docs/02_document/tests/tier2-jetson-testing.md
@@ -0,0 +1,64 @@
+# Tier-2 Jetson Testing
+
+This project ships to a Jetson Orin Nano (JetPack 6.2.2+b24). The dev host (macOS / Linux x86) cannot exercise the production GPU path. **Tier-1 (local pytest + Tier-1 Docker) is the first gate; Tier-2 (Jetson) is the product gate.**
+
+A feature, bug fix, or refactor is NOT done until both tiers have passed when Tier-2 is in scope (see "Mandatory-Tier-2 Scope" below).
+
+## Tiers
+
+| Tier | Where | Runs | Authority |
+|------|-------|------|-----------|
+| Tier-1 (local pytest) | Mac/Linux x86 dev host | unit + integration + non-GPU e2e | required |
+| Tier-1 (Docker) | dev-host docker-compose | blackbox over MAVLink/HTTP/FS | required |
+| Tier-2 (Jetson e2e) | operator's Jetson Orin Nano via `scripts/run-tests-jetson.sh` | `tests/e2e/replay/test_derkachi_1min.py` AC-1..AC-6 | **product completeness** |
+
+## Mandatory-Tier-2 Scope
+
+A task / batch / refactor MUST be exercised on Tier-2, AND the run MUST cross both `replay.compose_root.ready` and `replay.input.frame_emitted` log lines (per AZ-618 AC-5), when ANY of the following changed:
+
+- `src/gps_denied_onboard/runtime_root/**` (composition root, airborne_bootstrap, factories)
+- `src/gps_denied_onboard/components/c7_inference/**` (TensorRT / PyTorch FP16 GPU runtime)
+- `src/gps_denied_onboard/components/c3_matcher/**` (DISK / ALIKED + LightGlue)
+- `src/gps_denied_onboard/components/c2_5_rerank/**` (LightGlue inlier reranker)
+- `src/gps_denied_onboard/replay_input/**` (replay coordinator + auto-sync)
+- `src/gps_denied_onboard/cli/replay.py` (replay CLI wrapper)
+- Any task whose Acceptance Criteria reference AC-5 of AZ-618 or an `nft_*` Tier-2 scenario
+- `Dockerfile.jetson`, `scripts/run-tests-jetson.sh`, or any e2e harness file under `e2e/jetson/`
+
+For changes confined to Tier-1 surfaces (configs, helpers, non-GPU components, unit-test refactors), Tier-1 alone is sufficient — but the batch report MUST state explicitly that Tier-2 was not exercised and why.
+
+## What "Tier-1 only" tests CANNOT prove
+
+Tier-1 stubs the inference engine, fakes the GPU runtime, and bypasses `compose_root`'s registry-driven path via `replay_components_factory`. A green Tier-1 run does NOT prove:
+
+- TensorRT engine loads on the Jetson kernel
+- PyTorch FP16 path runs on the device
+- Real LightGlue matching produces non-degenerate correspondences
+- Composition root assembles `pre_constructed` correctly under JetPack
+- End-to-end latency / thermal / determinism budgets hold
+
+These are Tier-2-only signals.
+
+## Operating procedure
+
+1. Land code on `dev` with Tier-1 green.
+2. Run `scripts/run-tests-jetson.sh` from a host with SSH access to the Jetson (see `_docs/03_implementation/jetson_harness_setup.md`). The script rsyncs the working tree, builds engines on-device when needed, and runs `tests/e2e/replay/test_derkachi_1min.py`.
+3. Capture the Jetson terminal log path in the batch report's `Evidence:` field (e.g., `terminals/.txt` or a saved transcript).
+4. A failing Tier-2 run sends the autodev flow back to Step 7 (Implement) per `flows/greenfield.md` Step 11 — missing internal product implementation must NOT be papered over by Tier-1 stubs.
+
+## What batch reports MUST include for in-scope changes
+
+- `Tier-2 evidence:` — path to Jetson terminal log, host name, JetPack version, run timestamp.
+- `Tier-2 verdict:` — PASS / FAIL / BLOCKED-on-hardware (only when the Jetson is physically unavailable AND the operator cannot rsync; never as a quiet skip).
+- If `BLOCKED-on-hardware`, an open leftover entry under `_docs/_process_leftovers/` with a target replay date.
+
+## Rationale
+
+Step 11 cycle-1 found AZ-618 ONLY because Tier-2 was actually run. Tier-1 was 3343/0 green at the same moment. Skipping Tier-2 hides the production gap by definition — the Tier-1 contract path explicitly bypasses the missing wire.
+
+## Related documents
+
+- `_docs/03_implementation/jetson_harness_setup.md` — physical setup, SSH config, JetPack provisioning
+- `_docs/02_document/tests/traceability-matrix.md` — AC → scenario mapping (Tier-1 vs. Tier-2 markers)
+- `_docs/02_document/tests/blackbox-tests.md`, `performance-tests.md`, `resilience-tests.md`, `security-tests.md`, `resource-limit-tests.md` — per-category specs; Tier-2-only scenarios are tagged in each
+- `_docs/02_tasks/done/AZ-444*.md` — Tier-2 Jetson harness wrapper (run-tier2.sh, ssh provisioning, systemd, ASan-fuzz)
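
Note on the "Mandatory-Tier-2 Scope" section: the path list and the two required log lines could be checked mechanically before a batch report is written. The following is a minimal sketch only, not part of this patch. The function names `tier2_in_scope` and `missing_log_markers` are hypothetical, the directory and file entries are copied from the scope list, and the Acceptance-Criteria rule (AC-5 of AZ-618 or an `nft_*` scenario) is not covered because it cannot be derived from changed paths.

```python
from pathlib import PurePosixPath

# Directories from the scope list; any file beneath them puts Tier-2 in scope.
TIER2_DIRS = tuple(
    PurePosixPath(d)
    for d in (
        "src/gps_denied_onboard/runtime_root",
        "src/gps_denied_onboard/components/c7_inference",
        "src/gps_denied_onboard/components/c3_matcher",
        "src/gps_denied_onboard/components/c2_5_rerank",
        "src/gps_denied_onboard/replay_input",
        "e2e/jetson",
    )
)

# Individual files from the scope list.
TIER2_FILES = frozenset(
    PurePosixPath(f)
    for f in (
        "src/gps_denied_onboard/cli/replay.py",
        "Dockerfile.jetson",
        "scripts/run-tests-jetson.sh",
    )
)

# Log lines a Tier-2 run must cross (per AZ-618 AC-5).
REQUIRED_LOG_MARKERS = (
    "replay.compose_root.ready",
    "replay.input.frame_emitted",
)


def tier2_in_scope(changed_paths):
    """Return True if any changed path falls under the path-based scope rules.

    Hypothetical helper: the AC-based rule must still be checked by hand.
    """
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path in TIER2_FILES:
            return True
        # path.parents lists every ancestor directory of the changed file.
        if any(scope_dir in path.parents for scope_dir in TIER2_DIRS):
            return True
    return False


def missing_log_markers(log_text):
    """Return the required log markers that do not appear in the run log."""
    return [m for m in REQUIRED_LOG_MARKERS if m not in log_text]
```

A caller might feed `tier2_in_scope` the output of `git diff --name-only` and, when it returns True, refuse a PASS verdict unless `missing_log_markers` returns an empty list for the captured Jetson log.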