Oleksandr Bezdieniezhnykh d066a23cb1 [autodev] Add Tier-2 Jetson testing strategy doc
Codifies that Tier-1 (local pytest + Docker) is necessary but NOT
sufficient: Tier-2 (Jetson Orin Nano via run-tests-jetson.sh) is the
product-completeness gate for runtime_root, c7_inference, c3_matcher,
c2_5_rerank, replay_input, and the replay CLI. Documents the
mandatory-Tier-2 scope, what Tier-1-only stubs cannot prove, the
operating procedure, and what batch reports must capture for in-scope
changes. Surfaced by the Step-11 cycle-1 finding that AZ-618 was only
caught because Tier-2 was actually run.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 06:06:47 +03:00


Tier-2 Jetson Testing

This project ships to a Jetson Orin Nano (JetPack 6.2.2+b24). The dev host (macOS / Linux x86) cannot exercise the production GPU path. Tier-1 (local pytest + Tier-1 Docker) is the first gate; Tier-2 (Jetson) is the product gate.

When Tier-2 is in scope (see "Mandatory-Tier-2 Scope" below), a feature, bug fix, or refactor is NOT done until both tiers have passed.

Tiers

| Tier | Where | Runs | Authority |
|---|---|---|---|
| Tier-1 (local pytest) | Mac/Linux x86 dev host | unit + integration + non-GPU e2e | required |
| Tier-1 (Docker) | dev-host docker-compose | blackbox over MAVLink/HTTP/FS | required |
| Tier-2 (Jetson e2e) | operator's Jetson Orin Nano via scripts/run-tests-jetson.sh | tests/e2e/replay/test_derkachi_1min.py AC-1..AC-6 | product completeness |

Mandatory-Tier-2 Scope

A task / batch / refactor MUST be exercised on Tier-2, AND the run MUST reach and emit both the replay.compose_root.ready and replay.input.frame_emitted log lines (per AZ-618 AC-5), when ANY of the following changed:

  • src/gps_denied_onboard/runtime_root/** (composition root, airborne_bootstrap, factories)
  • src/gps_denied_onboard/components/c7_inference/** (TensorRT / PyTorch FP16 GPU runtime)
  • src/gps_denied_onboard/components/c3_matcher/** (DISK / ALIKED + LightGlue)
  • src/gps_denied_onboard/components/c2_5_rerank/** (LightGlue inlier reranker)
  • src/gps_denied_onboard/replay_input/** (replay coordinator + auto-sync)
  • src/gps_denied_onboard/cli/replay.py (replay CLI wrapper)
  • Any task whose Acceptance Criteria reference AC-5 of AZ-618 or an nft_* Tier-2 scenario
  • Dockerfile.jetson, scripts/run-tests-jetson.sh, or any e2e harness file under e2e/jetson/

For changes confined to Tier-1 surfaces (configs, helpers, non-GPU components, unit-test refactors), Tier-1 alone is sufficient — but the batch report MUST state explicitly that Tier-2 was not exercised and why.
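
The path-based part of the scope list above can be checked mechanically. The following is a minimal sketch, not an existing project tool: the function name `tier2_required` and the constants are hypothetical, and it only covers the path bullets. The Acceptance-Criteria bullet (AC-5 of AZ-618 or an nft_* scenario) still needs a human or task-metadata check.

```python
from pathlib import PurePosixPath

# Directory prefixes and exact files copied from the scope list above.
TIER2_DIR_PREFIXES = (
    "src/gps_denied_onboard/runtime_root/",
    "src/gps_denied_onboard/components/c7_inference/",
    "src/gps_denied_onboard/components/c3_matcher/",
    "src/gps_denied_onboard/components/c2_5_rerank/",
    "src/gps_denied_onboard/replay_input/",
    "e2e/jetson/",
)
TIER2_EXACT_FILES = frozenset({
    "src/gps_denied_onboard/cli/replay.py",
    "Dockerfile.jetson",
    "scripts/run-tests-jetson.sh",
})


def tier2_required(changed_paths):
    """Return the sorted changed paths that put a change in Tier-2 scope.

    An empty result means the path-based rules do not require Tier-2.
    """
    hits = []
    for raw in changed_paths:
        # Normalize "./src/..." to "src/..." so prefixes compare cleanly.
        path = PurePosixPath(raw).as_posix()
        if path in TIER2_EXACT_FILES or path.startswith(TIER2_DIR_PREFIXES):
            hits.append(path)
    return sorted(hits)
```

Feeding it the repo-relative paths of a change set (for example, the output of `git diff --name-only`) gives the list of files to cite in the batch report as the reason Tier-2 was run; an empty list is the case where the report must state that Tier-2 was not exercised and why.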

What "Tier-1 only" tests CANNOT prove

Tier-1 stubs the inference engine, fakes the GPU runtime, and bypasses compose_root's registry-driven path via replay_components_factory. A green Tier-1 run does NOT prove:

  • TensorRT engine loads on the Jetson kernel
  • PyTorch FP16 path runs on the device
  • Real LightGlue matching produces non-degenerate correspondences
  • Composition root assembles pre_constructed correctly under JetPack
  • End-to-end latency / thermal / determinism budgets hold

These are Tier-2-only signals.

Operating procedure

  1. Land code on dev with Tier-1 green.
  2. Run scripts/run-tests-jetson.sh from a host with SSH access to the Jetson (see _docs/03_implementation/jetson_harness_setup.md). The script rsyncs the working tree, builds engines on-device when needed, and runs tests/e2e/replay/test_derkachi_1min.py.
  3. Capture the Jetson terminal log path in the batch report's Evidence: field (e.g., terminals/<id>.txt or a saved transcript).
  4. A failing Tier-2 run sends the autodev flow back to Step 7 (Implement) per flows/greenfield.md Step 11 — missing internal product implementation must NOT be papered over by Tier-1 stubs.
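
Once the Jetson transcript from step 3 is saved, the AZ-618 AC-5 requirement (both replay.compose_root.ready and replay.input.frame_emitted appear in the run) can be verified with a few lines. This is a sketch under one assumption: that the two marker names appear verbatim somewhere in the captured log text. The actual log format is not described here, and the helper names are hypothetical.

```python
from pathlib import Path

# Marker names taken from the Mandatory-Tier-2 Scope section (AZ-618 AC-5).
REQUIRED_MARKERS = ("replay.compose_root.ready", "replay.input.frame_emitted")


def missing_markers(log_text, required=REQUIRED_MARKERS):
    """Return the required markers that never appear in the log text."""
    return [marker for marker in required if marker not in log_text]


def missing_markers_in_file(log_path):
    """Same check against a saved transcript; undecodable bytes are replaced."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    return missing_markers(text)
```

An empty list means both markers were seen. A non-empty list names the stage the run never reached, which is worth quoting in the batch report next to the FAIL verdict.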

What batch reports MUST include for in-scope changes

  • Tier-2 evidence: path to Jetson terminal log, host name, JetPack version, run timestamp.
  • Tier-2 verdict: PASS / FAIL / BLOCKED-on-hardware (only when the Jetson is physically unavailable AND the operator cannot rsync; never as a quiet skip).
  • If BLOCKED-on-hardware, an open leftover entry under _docs/_process_leftovers/ with a target replay date.
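
The required fields above lend themselves to a small completeness check. The sketch below is illustrative only: the `Tier2Report` class and its field names are hypothetical, not an existing report schema. It also assumes that the evidence fields are only demanded for PASS and FAIL, since a BLOCKED-on-hardware report has no run to point at, and it does not check the leftover entry's target date.

```python
from dataclasses import dataclass
from typing import Optional

VERDICTS = ("PASS", "FAIL", "BLOCKED-on-hardware")
LEFTOVERS_DIR = "_docs/_process_leftovers/"
EVIDENCE_FIELDS = ("evidence_path", "host_name", "jetpack_version", "run_timestamp")


@dataclass
class Tier2Report:
    # Hypothetical field names mirroring the bullets above.
    evidence_path: str
    host_name: str
    jetpack_version: str
    run_timestamp: str
    verdict: str
    leftover_entry: Optional[str] = None


def report_problems(report):
    """Return a list of human-readable problems; empty means complete."""
    problems = []
    if report.verdict not in VERDICTS:
        problems.append(f"unknown verdict: {report.verdict!r}")
    elif report.verdict == "BLOCKED-on-hardware":
        # A blocked run must be tracked as an open leftover.
        entry = report.leftover_entry or ""
        if not entry.startswith(LEFTOVERS_DIR):
            problems.append("BLOCKED-on-hardware needs a leftover entry under " + LEFTOVERS_DIR)
    else:
        # PASS / FAIL: every evidence field must be filled in.
        for name in EVIDENCE_FIELDS:
            if not getattr(report, name).strip():
                problems.append(f"missing {name}")
    return problems
```

Returning a list rather than raising on the first gap lets a reviewer see every missing field in one pass.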

Rationale

Step 11 cycle-1 found AZ-618 ONLY because Tier-2 was actually run. Tier-1 was fully green (3343/0) at the same moment. Skipping Tier-2 hides the production gap by definition, because the Tier-1 contract path explicitly bypasses the missing wire.

Related documents

  • _docs/03_implementation/jetson_harness_setup.md: physical setup, SSH config, JetPack provisioning
  • _docs/02_document/tests/traceability-matrix.md: AC → scenario mapping (Tier-1 vs. Tier-2 markers)
  • _docs/02_document/tests/blackbox-tests.md, performance-tests.md, resilience-tests.md, security-tests.md, resource-limit-tests.md: per-category specs; Tier-2-only scenarios are tagged in each
  • _docs/02_tasks/done/AZ-444*.md: Tier-2 Jetson harness wrapper (run-tier2.sh, ssh provisioning, systemd, ASan-fuzz)