# Resource Limit Tests

Authored by `/test-spec` Phase 2 (2026-05-19). Resource-limit tests assert that the SUT stays within a quantified resource ceiling for the configured duration. Short bursts do not satisfy these tests — every scenario has an explicit sustained-monitoring window.

---

### NFT-RES-LIM-Re1: Combined onboard RSS ≤ 6 GB sustained
**Summary**: Combined process RSS on the deployed compute device for everything autopilot owns onboard (excluding Tier 1) MUST stay ≤ 6 GB throughout a 5-minute steady-state window with the full onboard workload active.
**Traces to**: AC `Resources & Data — Combined RSS on the deployed compute device, for everything autopilot owns onboard (excluding Tier 1), MUST stay within ≤ 6 GB / Re1`, RESTRICT `Hardware — Compute device: Jetson Orin Nano Super, 8 GB shared LPDDR5; Tier 1 consumes ~2 GB, leaving ~6 GB for autopilot`.

**Tier**: HW (representative Jetson Orin Nano Super) — pure-x86 runs are informational only and do NOT satisfy the project-level Acceptance Gate.

**Preconditions**:
- Full onboard workload active: frame ingest from `rtsp-loopback`, Tier-2 + Tier-3 (when enabled) inferring at the documented steady-state load, gimbal commands flowing, MAVLink stream consumed at 10 Hz, operator-stream connected, MapObjects store hydrated for a 30×30 km region.
- Warm-up: 60 s before measurement starts (any first-load model warm-up complete).
- Tier-1 process is RUNNING in parallel but its RSS is EXCLUDED from the measurement (the AC scope is autopilot-owned RSS, excluding Tier 1).

**Monitoring**:
- Cgroup-level RSS for every process the SUT owns (the SUT binary plus any child processes it spawns — e.g., the VLM IPC peer if it lives in autopilot's cgroup), sampled at 1 Hz.
- Cgroup-level RSS for Tier 1 sampled at the same cadence (for the Re2 cross-reference).
- Per-process RSS captured to `reports/<run-id>/rss-trace.csv` for forensic review on failure.

**Duration**: 5 minutes of measurement after warm-up.

**Pass criteria**:
- `threshold_max`: per 1 s sample, `sum(autopilot_owned_RSS) ≤ 6 GB`.
- No single 1 s sample exceeds the ceiling.
- (Reporting only — not pass/fail): peak RSS, mean RSS, P95 RSS recorded in the CSV report.
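The Re1 criteria reduce to a per-sample ceiling check plus three reporting statistics. The following is a minimal harness-side sketch in Python, assuming the 1 Hz trace has already been reduced to one summed autopilot-owned RSS value (in bytes) per sample. The names `evaluate_rss_trace` and `nearest_rank_percentile` are hypothetical, the nearest-rank P95 definition is an assumption, and so is reading "6 GB" as 6 GiB, since the spec does not state which unit convention applies.

```python
import math
from statistics import mean

GIB = 1024 ** 3  # assumption: "6 GB" is read as 6 GiB


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_rss_trace(samples_bytes, ceiling_bytes=6 * GIB):
    """Return (passed, report) for a list of per-second summed RSS samples."""
    if not samples_bytes:
        raise ValueError("empty trace: a sustained window is required")
    # Every sample is checked; one excursion above the ceiling fails the run.
    violations = [i for i, s in enumerate(samples_bytes) if s > ceiling_bytes]
    report = {
        "peak": max(samples_bytes),
        "mean": mean(samples_bytes),
        "p95": nearest_rank_percentile(samples_bytes, 95),
        "violations": violations,  # sample indices, useful for forensic review
    }
    return (not violations, report)
```

Because the check is per sample, a single 1 s excursion above the ceiling fails the run even when the mean and P95 sit well below it.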

**Test status**: DEFERRED — `<DEFERRED: long-running scenario harness exercising the full onboard workload for 5 min; inline-authorable but requires that the SUT be operational end-to-end first>`.

---

### NFT-RES-LIM-Re2: Tier-1 non-degradation under autopilot workload
**Summary**: When autopilot's full onboard workload runs concurrently with Tier 1 on the same Jetson, Tier-1 per-frame latency MUST NOT degrade by more than ± 5 ms versus the Tier-1-alone baseline (recorded by NFT-PERF-L1).
**Traces to**: AC `Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms when autopilot's own onboard workload is running concurrently / Re2`, RESTRICT `Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant`.

**Tier**: HW (the only meaningful environment for this assertion — GPU contention behaviour does not reproduce on x86).

**Preconditions**:
- NFT-PERF-L1 has been run on the same HW configuration in the SAME session and a baseline `tier1_baseline_p95_ms` recorded.
- Full onboard workload active (same as Re1).

**Monitoring**:
- Tier-1 per-frame latency sampled per frame for the duration of the test.
- The same metric source as NFT-PERF-L1 — for direct delta comparison.

**Duration**: 5 minutes of measurement after warm-up (matches Re1 window so both can run in the same session).

**Pass criteria**:
- `numeric_tolerance`: `|p95(tier1_with_autopilot) - tier1_baseline_p95_ms| ≤ 5 ms`.
- (Reporting only): mean, P95, max delta over the window.
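The Re2 comparison can be sketched as a single delta against the recorded baseline. This assumes per-frame Tier-1 latencies in milliseconds and the `tier1_baseline_p95_ms` value recorded by NFT-PERF-L1 in the same session. The function names and the nearest-rank P95 definition are illustrative assumptions, not the harness's actual API.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile (assumed definition; the spec does not fix one)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def tier1_delta_within_tolerance(latencies_ms, tier1_baseline_p95_ms, tolerance_ms=5.0):
    """Return (passed, delta_ms) for |p95(with autopilot) - baseline| <= tolerance."""
    if not latencies_ms:
        raise ValueError("no latency samples collected")
    delta_ms = p95(latencies_ms) - tier1_baseline_p95_ms
    return abs(delta_ms) <= tolerance_ms, delta_ms
```

Since the criterion takes an absolute value, this check as written flags a P95 that moves by more than 5 ms in either direction.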

**Test status**: DEFERRED — same fixture dependency as Re1; requires SUT operational + Tier 1 colocated on HW.

---

### NFT-RES-LIM-Storage: On-device persistent store stays at or below 95 % for in-flight operation
**Summary**: During a steady-state mission run (no abnormal load), the on-device persistent store MUST NOT exceed 95 % full. This protects the takeoff gate (R3) from being silently violated mid-mission and protects the post-flight push (Mp4) from running out of room to persist a failed diff.
**Traces to**: AC `Reliability & Safety — On-device storage MUST be bounded` (via R3 BIT gate), RESTRICT `On-device storage MUST be bounded`.

**Tier**: B + HW.

**Preconditions**:
- SUT mid-flight; persistent store at typical post-takeoff utilisation (e.g. 30 %).
- Normal-operation event volume: telemetry persistence, ignored-item appends, pending map-diff buffer (empty in this scenario).

**Monitoring**:
- Volume utilisation sampled at 10 Hz throughout the duration.

**Duration**: 60 minutes (representative mission duration per Mp3).

**Pass criteria**:
- `threshold_max`: `volume_used / volume_total ≤ 0.95` at every sample point.
- On approach to 85 %: structured-log INFO `storage_pressure` with current utilisation.
- On approach to 90 %: structured-log WARN with current utilisation; health.storage transitions to yellow.
- On 95 %: health.storage transitions to red; the SUT begins its documented eviction policy (this scenario does NOT test the policy semantics — that belongs to its own scenario; this scenario only asserts the policy IS triggered).
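One way to picture the threshold ladder in these pass criteria is a helper that maps a utilisation fraction to the signal the harness should expect at that level. This is a sketch under stated assumptions: the level names `nominal` and `info`, the function names, and the use of `shutil.disk_usage` as the sampling source are all hypothetical, and "on approach to" is read here as "at or above" the threshold.

```python
import shutil


def sample_utilisation(path):
    """Fraction of the volume holding `path` that is in use (used / total)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def expected_signal(used_fraction):
    """Map utilisation to the pressure signal expected at that level."""
    if used_fraction >= 0.95:
        return "red"      # health.storage red, eviction policy triggered
    if used_fraction >= 0.90:
        return "yellow"   # structured-log WARN, health.storage yellow
    if used_fraction >= 0.85:
        return "info"     # structured-log INFO storage_pressure
    return "nominal"


def within_ceiling(used_fraction, ceiling=0.95):
    """The threshold_max criterion: utilisation must not exceed the ceiling."""
    return used_fraction <= ceiling
```

Note that a sample at exactly 95 % still satisfies `threshold_max` while also requiring the red transition, so the two checks are evaluated independently.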

**Test status**: READY (no external fixture beyond the SUT itself; the persistent-store seed file controls starting utilisation).

---

### NFT-RES-LIM-CPU: CPU headroom for the Tier-1 colocation guarantee
**Summary**: Combined CPU utilisation of every autopilot-owned process MUST leave enough Jetson CPU headroom for Tier 1 to keep its NFT-PERF-L1 budget. Concretely: per-second sustained CPU usage by autopilot-owned processes MUST stay ≤ the configured budget (default 60 % of total CPU cycles measured at the cgroup level) for the duration of the run.
**Traces to**: AC `Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms / Re2` (CPU-side mechanism backing Re2), RESTRICT `Hardware — Jetson Orin Nano Super`.

**Tier**: HW (CPU contention does not reproduce on x86).

**Preconditions**:
- Same workload as Re1 + Re2.

**Monitoring**:
- Cgroup CPU usage at 1 Hz.

**Duration**: 5 minutes after warm-up.

**Pass criteria**:
- `threshold_max`: per 1 s sample, `sum(autopilot_cpu_usage) ≤ 60 %` of total CPU.
- Reporting: mean, P95, max.
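A sketch of how the per-second CPU figure could be derived, assuming a cgroup v2 hierarchy in which the autopilot cgroup's `cpu.stat` file exposes a cumulative `usage_usec` counter. The helper names are hypothetical, and the core count is passed in rather than assumed. Whether the harness reads cgroup v2 or another source is not stated in this spec.

```python
def parse_usage_usec(cpu_stat_text):
    """Extract the cumulative usage_usec counter from the text of a cgroup v2 cpu.stat file."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat text")


def cpu_percent_of_total(prev_usec, curr_usec, interval_s, n_cpus):
    """CPU time consumed over the interval as a percentage of all cores' capacity."""
    capacity_usec = interval_s * 1_000_000 * n_cpus
    return (curr_usec - prev_usec) / capacity_usec * 100.0


def within_cpu_budget(percent_samples, budget_percent=60.0):
    """threshold_max: every per-second sample must stay at or below the budget."""
    return all(p <= budget_percent for p in percent_samples)
```

Normalising by `n_cpus` matters: a cgroup that keeps one core fully busy on a multi-core device uses far less than 100 % of total CPU, which is the quantity the budget is defined against.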

**Test status**: DEFERRED — same dependency as Re1/Re2.

---

### NFT-RES-LIM-GPU: GPU mutual exclusion contract (Tier 1 vs local large model)
**Summary**: Per RESTRICT (`Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant`), the SUT MUST NOT issue a GPU compute call (e.g. Tier-3 VLM inference) while Tier 1 is executing on the GPU. The serialisation MUST be observable: at most one GPU owner is active at any instant.
**Traces to**: RESTRICT `Tier 1 and any local large model … only one of them may execute at any wall-clock instant`.

**Tier**: HW.

**Preconditions**:
- Tier 1 active; SUT in a ZoomedIn hold with deep-analysis enabled (Tier-3 will fire).

**Monitoring**:
- GPU-instance occupancy via `tegrastats` / equivalent at the highest available sampling rate.
- The SUT's own internal "compute-class" telemetry exposed on the health endpoint as `gpu_owner_current` ∈ { "tier1", "tier3", "idle" }.

**Duration**: 60 s containing ≥ 5 Tier-3 hold cycles.

**Pass criteria**:
- `exact`: at every sample point, `gpu_owner_current ∈ { "tier1", "tier3", "idle" }`; never simultaneously both.
- `tegrastats` peak GPU occupancy attributable to autopilot processes never overlaps Tier 1's known activity window for the same wall-clock instant.
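Both GPU criteria can be expressed as simple checks over collected samples. The sketch below assumes the harness has already turned its raw data into two inputs: a list of `gpu_owner_current` values, and two lists of `(start_s, end_s)` activity windows. How those windows would be derived from `tegrastats` output is not specified here, so the window representation and both function names are hypothetical.

```python
VALID_OWNERS = {"tier1", "tier3", "idle"}


def invalid_owner_samples(samples):
    """Indices of samples whose gpu_owner_current value is outside the allowed set."""
    return [i for i, owner in enumerate(samples) if owner not in VALID_OWNERS]


def overlapping_windows(tier1_windows, autopilot_windows):
    """Pairs of (tier1_window, autopilot_window) that share any instant.

    Windows are half-open intervals (start_s, end_s), so two windows that
    merely touch at an endpoint do not count as overlapping.
    """
    overlaps = []
    for t_start, t_end in tier1_windows:
        for a_start, a_end in autopilot_windows:
            if a_start < t_end and t_start < a_end:
                overlaps.append(((t_start, t_end), (a_start, a_end)))
    return overlaps
```

A run would pass this sketch only when both functions return an empty list. The nested loop is quadratic, which is acceptable for a 60 s window but would need a sweep-line approach for much longer traces.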

**Test status**: DEFERRED — depends on the SUT being operational end-to-end + Tier-3 enabled; also depends on the SUT exposing `gpu_owner_current` (which is an architectural choice not yet locked).

---

### NFT-RES-LIM-FileHandles: File-descriptor and socket bound
**Summary**: Sustained operation MUST NOT leak file descriptors or sockets. The count MUST stay within a documented headroom of the initial-post-warmup baseline for the duration of the run.
**Traces to**: RESTRICT `On-device storage MUST be bounded` (general bounded-resource principle), security principle `No silent error swallowing for security-relevant failures` (FD exhaustion would silently break the operator-stream).

**Tier**: B + HW.

**Preconditions**:
- Warm-up: 60 s.
- Workload: full onboard workload at steady state.

**Monitoring**:
- `/proc/<pid>/fd` count per autopilot process at 1 Hz.

**Duration**: 60 minutes.

**Pass criteria**:
- `threshold_max`: at every sample point, `fd_count ≤ fd_baseline_post_warmup + 50` (50 = documented churn headroom for intermittent operator reconnects).
- A monotonically rising trend (slope > 0 over the run) is a TEST FAILURE even if the absolute ceiling is not breached.
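The two FD criteria combine an absolute ceiling with a trend test. A minimal sketch, assuming Linux `/proc`, evenly spaced 1 Hz samples, and an ordinary least-squares fit as the definition of "slope"; the spec does not say how the slope is computed, and all function names are hypothetical.

```python
import os


def count_open_fds(pid):
    """Number of entries under /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def least_squares_slope(values):
    """Slope of the best-fit line through evenly spaced samples (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def evaluate_fd_trace(fd_counts, baseline, headroom=50):
    """Return (passed, slope): ceiling respected at every sample and no rising trend."""
    ceiling_ok = all(c <= baseline + headroom for c in fd_counts)
    slope = least_squares_slope(fd_counts)
    return ceiling_ok and slope <= 0, slope
```

A strict `slope > 0` test is sensitive to noise: one late operator reconnect can tilt the fit slightly upward. A harness built on this sketch may want to agree on a small tolerance, which would be an addition to the criterion as written.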

**Test status**: READY for a Tier-B run; gains its real value once HW + sustained-workload land.

---

## Common assertions for every resource-limit scenario

- **Sustained-monitoring is non-negotiable.** Each scenario specifies a duration ≥ 60 s; short bursts that pass do not satisfy the test. The CSV report records the full sample trace path under `artifacts_path`.
- **No silent eviction.** Where a ceiling is approached, the SUT MUST surface the pressure (structured-log INFO at 85 %, WARN at 90 %, transition to yellow/red on health) BEFORE reaching the ceiling. A pass with no observable pressure signal at thresholds is a TEST FAILURE.
- **HW reporting vs gating.** Pure-x86 runs report informational deltas only; they do NOT satisfy the project-level Acceptance Gate. Every CSV row records its tier so this distinction stays auditable.
- **Re1 + Re2 are paired.** Re1 establishes the autopilot RSS ceiling; Re2 establishes that respecting Re1 does not cost Tier 1 latency. They MUST be run in the same session to make the Re2 baseline meaningful.