Resource Limit Tests
Authored by /test-spec Phase 2 (2026-05-19). Resource-limit tests assert that the SUT stays within a quantified resource ceiling for the configured duration. Short bursts do not satisfy these tests — every scenario has an explicit sustained-monitoring window.
NFT-RES-LIM-Re1: Combined onboard RSS ≤ 6 GB sustained
Summary: Combined process RSS on the deployed compute device for everything autopilot owns onboard (excluding Tier 1) MUST stay ≤ 6 GB throughout a 5-minute steady-state window with the full onboard workload active.
Traces to: AC Resources & Data — Combined RSS on the deployed compute device, for everything autopilot owns onboard (excluding Tier 1), MUST stay within ≤ 6 GB / Re1, RESTRICT Hardware — Compute device: Jetson Orin Nano Super, 8 GB shared LPDDR5; Tier 1 consumes ~2 GB, leaving ~6 GB for autopilot.
Tier: HW (representative Jetson Orin Nano Super). Pure-x86 runs are reported for information only and do NOT satisfy the project-level Acceptance Gate.
Preconditions:
- Full onboard workload active: frame ingest from rtsp-loopback, Tier-2 + Tier-3 (when enabled) inferring at the documented steady-state load, gimbal commands flowing, MAVLink stream consumed at 10 Hz, operator-stream connected, MapObjects store hydrated for a 30×30 km region.
- Warm-up: 60 s before measurement starts (any first-load model warm-up complete).
- Tier-1 process is RUNNING in parallel but its RSS is EXCLUDED from the measurement (the AC scope is autopilot-owned RSS, excluding Tier 1).
Monitoring:
- Cgroup-level RSS for every process the SUT owns (the SUT binary plus any child processes it spawns — e.g., the VLM IPC peer if it lives in autopilot's cgroup), sampled at 1 Hz.
- Cgroup-level RSS for Tier 1 sampled at the same cadence (for the Re2 cross-reference).
- Per-process RSS captured to reports/<run-id>/rss-trace.csv for forensic review on failure.
Duration: 5 minutes of measurement after warm-up.
Pass criteria:
- threshold_max: per 1 s sample, sum(autopilot_owned_RSS) ≤ 6 GB.
- No single 1 s sample exceeds the ceiling.
- (Reporting only — not pass/fail): peak RSS, mean RSS, P95 RSS recorded in the CSV report.
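The Re1 pass criterion and its reporting fields can be sketched as a small evaluator over the 1 Hz trace. This is a minimal illustration, not the harness itself: the function name `evaluate_rss`, the sample layout (one dict per second mapping a process name to RSS in bytes) and the reading of "6 GB" as 6 GiB are all assumptions made for the example.

```python
from statistics import mean

GIB = 1024 ** 3
# Assumption: the "6 GB" ceiling is interpreted as 6 GiB; change if the AC means 6e9 bytes.
RSS_CEILING_BYTES = 6 * GIB


def p95(values):
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_rss(samples):
    """samples: list of per-second dicts {process_name: rss_bytes} (hypothetical layout).

    Tier-1 RSS must already be excluded from each dict by the caller.
    """
    if not samples:
        raise ValueError("no RSS samples captured")
    totals = [sum(sample.values()) for sample in samples]
    # A single 1 s sample above the ceiling fails the scenario.
    breaches = [i for i, total in enumerate(totals) if total > RSS_CEILING_BYTES]
    return {
        "passed": not breaches,
        "breach_indices": breaches,
        # Reporting only, not pass/fail:
        "peak": max(totals),
        "mean": mean(totals),
        "p95": p95(totals),
    }
```

Returning the breach indices alongside the verdict keeps the failing seconds easy to locate in the per-process trace during forensic review.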
Test status: DEFERRED — <DEFERRED: long-running scenario harness exercising the full onboard workload for 5 min; inline-authorable but requires that the SUT be operational end-to-end first>.
NFT-RES-LIM-Re2: Tier-1 non-degradation under autopilot workload
Summary: When autopilot's full onboard workload runs concurrently with Tier 1 on the same Jetson, Tier-1 per-frame latency MUST NOT degrade by more than ± 5 ms versus the Tier-1-alone baseline (recorded by NFT-PERF-L1).
Traces to: AC Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms when autopilot's own onboard workload is running concurrently / Re2, RESTRICT Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant.
Tier: HW (the only meaningful environment for this assertion — GPU contention behaviour does not reproduce on x86).
Preconditions:
- NFT-PERF-L1 has been run on the same HW configuration in the SAME session and a baseline tier1_baseline_p95_ms recorded.
- Full onboard workload active (same as Re1).
Monitoring:
- Tier-1 per-frame latency sampled per frame for the duration of the test.
- The same metric source as NFT-PERF-L1 — for direct delta comparison.
Duration: 5 minutes of measurement after warm-up (matches Re1 window so both can run in the same session).
Pass criteria:
- numeric_tolerance: |p95(tier1_with_autopilot) - tier1_baseline_p95_ms| ≤ 5 ms.
- (Reporting only): mean, P95, max delta over the window.
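As an illustration of the Re2 tolerance check, the sketch below compares the P95 of the per-frame latencies measured with autopilot running against the recorded baseline. The function name `evaluate_tier1_delta` and the nearest-rank percentile method are assumptions; the real harness should use whatever percentile definition NFT-PERF-L1 uses so the two numbers are comparable.

```python
TOLERANCE_MS = 5.0  # from the Re2 pass criterion


def p95(values):
    """Nearest-rank 95th percentile (assumed method; match NFT-PERF-L1 in practice)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_tier1_delta(latencies_ms, tier1_baseline_p95_ms):
    """latencies_ms: Tier-1 per-frame latencies with the autopilot workload active."""
    if not latencies_ms:
        raise ValueError("no Tier-1 latency samples captured")
    observed = p95(latencies_ms)
    delta = observed - tier1_baseline_p95_ms
    return {
        "p95_ms": observed,
        "delta_ms": delta,
        # The criterion is symmetric: an unexpected speed-up beyond 5 ms also fails,
        # which usually indicates that the two runs are not comparable.
        "passed": abs(delta) <= TOLERANCE_MS,
    }
```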
Test status: DEFERRED — same fixture dependency as Re1; requires SUT operational + Tier 1 colocated on HW.
NFT-RES-LIM-Storage: On-device persistent store stays under 95 % for in-flight operation
Summary: During a steady-state mission run (no abnormal load), the on-device persistent store MUST NOT exceed 95 % full. This protects the takeoff gate (R3) from being silently violated mid-mission and protects the post-flight push (Mp4) from running out of room to persist a failed diff.
Traces to: AC Reliability & Safety — On-device storage MUST be bounded (via R3 BIT gate), RESTRICT On-device storage MUST be bounded.
Tier: B + HW.
Preconditions:
- SUT mid-flight; persistent store at typical post-takeoff utilisation (e.g. 30 %).
- Normal-operation event volume: telemetry persistence, ignored-item appends, pending map-diff buffer (empty in this scenario).
Monitoring:
- Volume utilisation sampled at 10 Hz throughout the duration.
Duration: 60 minutes (representative mission duration per Mp3).
Pass criteria:
- threshold_max: volume_used / volume_total ≤ 0.95 at every sample point.
- On approach to 85 %: structured-log INFO storage_pressure with current utilisation.
- On approach to 90 %: structured-log WARN with current utilisation; health.storage transitions to yellow.
- On 95 %: health.storage transitions to red; the SUT begins its documented eviction policy (this scenario does NOT test the policy semantics — that belongs to its own scenario; this scenario only asserts the policy IS triggered).
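The threshold ladder above can be summarised as a mapping from a utilisation sample to the signals the test expects to observe. This is a sketch under assumptions: `classify_storage` and `StorageSignal` are hypothetical names, "on approach to X %" is read as "at or above X %", and health.storage is assumed to be green below 90 % (the spec only names yellow and red).

```python
from typing import NamedTuple, Optional


class StorageSignal(NamedTuple):
    utilisation: float
    log_level: Optional[str]   # highest structured-log level expected so far
    health: str                # expected health.storage state
    eviction_expected: bool    # eviction policy must have been triggered
    within_ceiling: bool       # threshold_max: utilisation <= 0.95


def classify_storage(volume_used, volume_total):
    """Map one utilisation sample to the signals the scenario asserts on."""
    u = volume_used / volume_total
    if u >= 0.95:
        level, health, evict = "WARN", "red", True
    elif u >= 0.90:
        level, health, evict = "WARN", "yellow", False
    elif u >= 0.85:
        level, health, evict = "INFO", "green", False  # assumption: still green
    else:
        level, health, evict = None, "green", False
    return StorageSignal(u, level, health, evict, u <= 0.95)
```

For example, `classify_storage(92, 100)` yields a WARN log expectation with health yellow and no eviction, while `classify_storage(96, 100)` reports red, eviction expected and a ceiling breach.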
Test status: READY (no external fixture beyond the SUT itself; the persistent-store seed file controls starting utilisation).
NFT-RES-LIM-CPU: CPU headroom for the Tier-1 colocation guarantee
Summary: Combined CPU utilisation of every autopilot-owned process MUST leave enough Jetson CPU headroom for Tier 1 to keep its NFT-PERF-L1 budget. Concretely: per-second sustained CPU usage by autopilot-owned processes MUST stay ≤ the configured budget (default 60 % of total CPU cycles measured at the cgroup level) for the duration of the run.
Traces to: AC Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms / Re2 (CPU-side mechanism backing Re2), RESTRICT Hardware — Jetson Orin Nano Super.
Tier: HW (CPU contention does not reproduce on x86).
Preconditions:
- Same workload as Re1 + Re2.
Monitoring:
- Cgroup CPU usage at 1 Hz.
Duration: 5 minutes after warm-up.
Pass criteria:
- threshold_max: per 1 s sample, sum(autopilot_cpu_usage) ≤ 60 % of total CPU.
- Reporting: mean, P95, max.
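One way to turn cgroup counters into the per-second percentage used above is to difference a cumulative CPU-time counter sampled at 1 Hz. The sketch assumes a cgroup v2 host where `cpu.stat` exposes a cumulative `usage_usec` value, and it assumes "60 % of total CPU" means 60 % of the capacity of all cores combined; `evaluate_cpu` and its parameters are hypothetical names.

```python
def cpu_percent_series(usage_usec, n_cpus, interval_s=1.0):
    """Convert cumulative usage_usec readings into percent of total CPU capacity.

    usage_usec: consecutive readings of the cgroup's cumulative CPU time (microseconds).
    n_cpus: number of online cores on the device (supplied by the caller).
    """
    capacity_usec = interval_s * 1_000_000 * n_cpus
    return [
        100.0 * (later - earlier) / capacity_usec
        for earlier, later in zip(usage_usec, usage_usec[1:])
    ]


def evaluate_cpu(usage_usec, n_cpus, budget_percent=60.0):
    """Apply threshold_max per sample and collect the reporting fields."""
    series = cpu_percent_series(usage_usec, n_cpus)
    if not series:
        raise ValueError("need at least two counter readings")
    ordered = sorted(series)
    p95 = ordered[(95 * len(ordered) + 99) // 100 - 1]  # nearest-rank percentile
    return {
        "passed": all(value <= budget_percent for value in series),
        "mean": sum(series) / len(series),
        "p95": p95,
        "max": max(series),
    }
```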
Test status: DEFERRED — same dependency as Re1/Re2.
NFT-RES-LIM-GPU: GPU mutual exclusion contract (Tier 1 vs local large model)
Summary: Per RESTRICT (Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant), the SUT MUST NOT issue a GPU compute call (e.g. Tier-3 VLM inference) while Tier 1 is executing on the GPU. The serialisation MUST be observable: a single GPU is busy at one instant.
Traces to: RESTRICT Tier 1 and any local large model … only one of them may execute at any wall-clock instant.
Tier: HW.
Preconditions:
- Tier 1 active; SUT in a ZoomedIn hold with deep-analysis enabled (Tier-3 will fire).
Monitoring:
- GPU-instance occupancy via tegrastats / equivalent at the highest available sampling rate.
- The SUT's own internal "compute-class" telemetry exposed on the health endpoint as gpu_owner_current ∈ { "tier1", "tier3", "idle" }.
Duration: 60 s containing ≥ 5 Tier-3 hold cycles.
Pass criteria:
- exact: at every sample point, gpu_owner_current ∈ { "tier1", "tier3", "idle" }; never simultaneously both.
- tegrastats peak GPU occupancy attributable to autopilot processes never overlaps Tier 1's known activity window for the same wall-clock instant.
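Both GPU pass criteria reduce to simple checks once the traces are collected. The sketch below assumes the telemetry has been captured as `(timestamp, owner)` pairs and that GPU activity has been reduced to half-open `(start, end)` windows in seconds; these data shapes and the function names are illustrative assumptions, since the health-endpoint field itself is not yet locked.

```python
VALID_OWNERS = {"tier1", "tier3", "idle"}


def invalid_owner_samples(samples):
    """Return every (timestamp, owner) sample whose owner is outside the allowed set."""
    return [(ts, owner) for ts, owner in samples if owner not in VALID_OWNERS]


def overlapping_windows(tier1_windows, autopilot_windows):
    """Return the intersections between Tier-1 and autopilot GPU activity windows.

    Windows are half-open [start, end), so back-to-back windows do not overlap.
    An empty result means the mutual-exclusion contract held for the whole run.
    """
    overlaps = []
    for a_start, a_end in tier1_windows:
        for b_start, b_end in autopilot_windows:
            if a_start < b_end and b_start < a_end:
                overlaps.append((max(a_start, b_start), min(a_end, b_end)))
    return overlaps
```

The pairwise comparison is quadratic in the number of windows, which is acceptable for a 60 s run; a sorted sweep would be the natural replacement for longer traces.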
Test status: DEFERRED — depends on the SUT being operational end-to-end + Tier-3 enabled; also depends on the SUT exposing gpu_owner_current (which is an architectural choice not yet locked).
NFT-RES-LIM-FileHandles: File-descriptor and socket bound
Summary: Sustained operation MUST NOT leak file descriptors or sockets. The count MUST stay within a documented headroom of the initial-post-warmup baseline for the duration of the run.
Traces to: RESTRICT On-device storage MUST be bounded (general bounded-resource principle), security principle No silent error swallowing for security-relevant failures (FD exhaustion would silently break the operator-stream).
Tier: B + HW.
Preconditions:
- Warm-up: 60 s.
- Workload: full onboard workload at steady state.
Monitoring:
- /proc/<pid>/fd count per autopilot process at 1 Hz.
Duration: 60 minutes.
Pass criteria:
- threshold_max: at every sample point, fd_count ≤ fd_baseline_post_warmup + 50 (50 = documented churn headroom for intermittent operator reconnects).
- A monotonically rising trend (slope > 0 over the run) is a TEST FAILURE even if the absolute ceiling is not breached.
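The two file-descriptor criteria can be sketched as a ceiling check plus a least-squares slope over the sample index. The names `count_fds`, `trend_slope` and `evaluate_fds` are hypothetical, and the slope test is implemented literally as "slope > 0", which is an assumption about how the trend rule is meant to be applied; a noisy trace may need a documented tolerance.

```python
import os

FD_HEADROOM = 50  # documented churn headroom from the pass criterion


def count_fds(pid):
    """Count open descriptors for one process by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def trend_slope(values):
    """Ordinary least-squares slope of values against their sample index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def evaluate_fds(fd_counts, fd_baseline_post_warmup):
    """fd_counts: 1 Hz descriptor counts for one process over the run (>= 2 samples)."""
    if len(fd_counts) < 2:
        raise ValueError("need at least two samples to estimate a trend")
    ceiling = fd_baseline_post_warmup + FD_HEADROOM
    slope = trend_slope(fd_counts)
    return {
        "ceiling_ok": all(count <= ceiling for count in fd_counts),
        "slope": slope,
        # A rising trend fails even when the absolute ceiling is respected.
        "passed": all(count <= ceiling for count in fd_counts) and slope <= 0,
    }
```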
Test status: READY for a Tier-B run; gains its real value once HW + sustained-workload land.
Common assertions for every resource-limit scenario
- Sustained-monitoring is non-negotiable. Each scenario specifies a duration ≥ 60 s; short bursts that pass do not satisfy the test. The CSV report records the full sample trace path under artifacts_path.
- No silent eviction. Where a ceiling is approached, the SUT MUST surface the pressure (structured-log INFO at 85 %, WARN at 90 %, transition to yellow/red on health) BEFORE reaching the ceiling. A pass with no observable pressure signal at thresholds is a TEST FAILURE.
- HW reporting vs gating. Pure-x86 runs report informational deltas only; they do NOT satisfy the project-level Acceptance Gate. Every CSV row records its tier so this distinction stays auditable.
- Re1 + Re2 are paired. Re1 establishes the autopilot RSS ceiling; Re2 establishes that respecting Re1 does not cost Tier 1 latency. They MUST be run in the same session to make the Re2 baseline meaningful.