mirror of https://github.com/azaion/autopilot.git synced 2026-06-21 17:51:09 +00:00

Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout

Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-19 11:02:01 +03:00

Resource Limit Tests

Authored by /test-spec Phase 2 (2026-05-19). Resource-limit tests assert that the SUT stays within a quantified resource ceiling for the configured duration. Short bursts do not satisfy these tests — every scenario has an explicit sustained-monitoring window.

NFT-RES-LIM-Re1: Combined onboard RSS ≤ 6 GB sustained

Summary: Combined process RSS on the deployed compute device for everything autopilot owns onboard (excluding Tier 1) MUST stay ≤ 6 GB throughout a 5-minute steady-state window with the full onboard workload active.

Traces to: AC Resources & Data — Combined RSS on the deployed compute device, for everything autopilot owns onboard (excluding Tier 1), MUST stay within ≤ 6 GB / Re1, RESTRICT Hardware — Compute device: Jetson Orin Nano Super, 8 GB shared LPDDR5; Tier 1 consumes ~2 GB, leaving ~6 GB for autopilot.

Tier: HW (representative Jetson Orin Nano Super). Pure-x86 runs are informational only and do NOT satisfy the project-level Acceptance Gate.

Preconditions:

Full onboard workload active: frame ingest from rtsp-loopback, Tier-2 + Tier-3 (when enabled) inferring at the documented steady-state load, gimbal commands flowing, MAVLink stream consumed at 10 Hz, operator-stream connected, MapObjects store hydrated for a 30×30 km region.
Warm-up: 60 s before measurement starts (any first-load model warm-up complete).
Tier-1 process is RUNNING in parallel but its RSS is EXCLUDED from the measurement (the AC scope is autopilot-owned RSS, excluding Tier 1).

Monitoring:

Cgroup-level RSS for every process the SUT owns (the SUT binary plus any child processes it spawns — e.g., the VLM IPC peer if it lives in autopilot's cgroup), sampled at 1 Hz.
Cgroup-level RSS for Tier 1 sampled at the same cadence (for the Re2 cross-reference).
Per-process RSS captured to reports/<run-id>/rss-trace.csv for forensic review on failure.
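
The per-process capture above could be sketched as follows. This is a minimal illustration, not the harness itself: it assumes RSS is read from the `VmRSS` line of `/proc/<pid>/status`, and the CSV column order (timestamp, pid, owner, RSS bytes) and the function names are hypothetical, since the spec only fixes the output path.

```python
import csv
import io
import re

# The kernel reports VmRSS in "kB", which means units of 1024 bytes.
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_bytes(status_text):
    """Extract resident set size in bytes from /proc/<pid>/status text."""
    match = _VMRSS.search(status_text)
    if match is None:
        return 0  # e.g. kernel threads expose no VmRSS line
    return int(match.group(1)) * 1024


def append_rss_row(writer, timestamp_s, pid, owner, rss_bytes):
    """Write one sample row; the column layout is an assumption."""
    writer.writerow([timestamp_s, pid, owner, rss_bytes])


# Usage with an in-memory buffer standing in for rss-trace.csv.
sample_status = "Name:\tautopilot\nVmRSS:\t    2048 kB\nThreads:\t4\n"
buffer = io.StringIO()
append_rss_row(csv.writer(buffer), 12, 4242, "autopilot",
               parse_vmrss_bytes(sample_status))
```

A real sampler would read the status file of every autopilot-owned PID once per second and tag Tier-1 rows separately, so that the Re2 cross-reference can use the same trace.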

Duration: 5 minutes of measurement after warm-up.

Pass criteria:

threshold_max: per 1 s sample, sum(autopilot_owned_RSS) ≤ 6 GB.
No single 1 s sample exceeds the ceiling.
(Reporting only — not pass/fail): peak RSS, mean RSS, P95 RSS recorded in the CSV report.
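
As an illustration of the pass criteria above, the check over a recorded trace might look like the sketch below. It assumes "6 GB" is read as 6 GiB and uses a nearest-rank P95; both are assumptions, as are the function names and the shape of the input (one list of per-process RSS byte counts per 1 s sample).

```python
from statistics import mean

GIB = 1024 ** 3
RSS_CEILING_BYTES = 6 * GIB  # assumption: "6 GB" interpreted as 6 GiB


def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_rss_trace(samples, ceiling=RSS_CEILING_BYTES):
    """Sum autopilot-owned RSS per sample and compare with the ceiling."""
    totals = [sum(tick) for tick in samples]
    breaches = [i for i, total in enumerate(totals) if total > ceiling]
    return {
        "passed": not breaches,       # a single breaching sample fails the run
        "breach_indices": breaches,
        "peak": max(totals),          # reporting only
        "mean": mean(totals),         # reporting only
        "p95": p95(totals),           # reporting only
    }


# Usage: the third sample sums to 7 GiB and fails the run.
trace = [[2 * GIB, 3 * GIB], [2 * GIB, 4 * GIB], [3 * GIB, 4 * GIB]]
result = evaluate_rss_trace(trace)
```

Tier-1 RSS is deliberately absent from `samples`, matching the AC scope.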

Test status: DEFERRED. Needs a long-running scenario harness exercising the full onboard workload for 5 min; inline-authorable, but requires the SUT to be operational end-to-end first.

NFT-RES-LIM-Re2: Tier-1 non-degradation under autopilot workload

Summary: When autopilot's full onboard workload runs concurrently with Tier 1 on the same Jetson, Tier-1 per-frame P95 latency MUST stay within ± 5 ms of the Tier-1-alone baseline (tier1_baseline_p95_ms, recorded by NFT-PERF-L1).

Traces to: AC Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms when autopilot's own onboard workload is running concurrently / Re2, RESTRICT Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant.

Tier: HW (the only meaningful environment for this assertion — GPU contention behaviour does not reproduce on x86).

Preconditions:

NFT-PERF-L1 has been run on the same HW configuration in the SAME session and a baseline tier1_baseline_p95_ms recorded.
Full onboard workload active (same as Re1).

Monitoring:

Tier-1 per-frame latency sampled per frame for the duration of the test.
The same metric source as NFT-PERF-L1 — for direct delta comparison.

Duration: 5 minutes of measurement after warm-up (matches Re1 window so both can run in the same session).

Pass criteria:

numeric_tolerance: |p95(tier1_with_autopilot) - tier1_baseline_p95_ms| ≤ 5 ms.
(Reporting only): mean, P95, max delta over the window.
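
The numeric_tolerance criterion above can be sketched as a small comparison function. The nearest-rank percentile and the function names are assumptions for illustration; the spec only requires that the metric source match NFT-PERF-L1.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n)
    return ordered[rank - 1]


def tier1_delta(with_autopilot_ms, tier1_baseline_p95_ms, tolerance_ms=5.0):
    """Return (passed, delta_ms) for the Re2 P95 comparison."""
    delta = p95(with_autopilot_ms) - tier1_baseline_p95_ms
    return abs(delta) <= tolerance_ms, delta


# Usage: per-frame latencies in milliseconds over the measurement window.
steady = [10.0] * 95 + [20.0] * 5      # P95 = 10.0 ms
degraded = [10.0] * 90 + [30.0] * 10   # P95 = 30.0 ms
steady_ok, steady_delta = tier1_delta(steady, 12.0)
degraded_ok, degraded_delta = tier1_delta(degraded, 12.0)
```

Because the criterion uses an absolute value, a run that is more than 5 ms faster than the baseline also fails, which usually points at a mismatched baseline rather than a real improvement.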

Test status: DEFERRED — same fixture dependency as Re1; requires SUT operational + Tier 1 colocated on HW.

NFT-RES-LIM-Storage: On-device persistent store stays under 95 % for in-flight operation

Summary: During a steady-state mission run (no abnormal load), the on-device persistent store MUST NOT exceed 95 % full. This protects the takeoff gate (R3) from being silently violated mid-mission and protects the post-flight push (Mp4) from running out of room to persist a failed diff.

Traces to: AC Reliability & Safety — On-device storage MUST be bounded (via R3 BIT gate), RESTRICT On-device storage MUST be bounded.

Tier: B + HW.

Preconditions:

SUT mid-flight; persistent store at typical post-takeoff utilisation (e.g. 30 %).
Normal-operation event volume: telemetry persistence, ignored-item appends, pending map-diff buffer (empty in this scenario).

Monitoring:

Volume utilisation sampled at 10 Hz throughout the duration.

Duration: 60 minutes (representative mission duration per Mp3).

Pass criteria:

threshold_max: volume_used / volume_total ≤ 0.95 at every sample point.
On approach to 85 %: structured-log INFO storage_pressure with current utilisation.
On approach to 90 %: structured-log WARN with current utilisation; health.storage transitions to yellow.
On 95 %: health.storage transitions to red; the SUT begins its documented eviction policy (this scenario does NOT test the policy semantics — that belongs to its own scenario; this scenario only asserts the policy IS triggered).
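
The threshold ladder above maps onto a simple classifier, sketched below. Several details are assumptions rather than spec: "on approach to" is treated as "at or above", the health value below 90 % is called "green", and the log level at 95 % is carried over from the 90 % step.

```python
def storage_pressure(volume_used, volume_total):
    """Classify one utilisation sample against the 85/90/95 % thresholds."""
    ratio = volume_used / volume_total
    if ratio >= 0.95:
        health, log_level, eviction = "red", "WARN", True
    elif ratio >= 0.90:
        health, log_level, eviction = "yellow", "WARN", False
    elif ratio >= 0.85:
        health, log_level, eviction = "green", "INFO", False
    else:
        health, log_level, eviction = "green", None, False
    return {
        "ratio": ratio,
        "health": health,                  # expected health.storage value
        "log_level": log_level,            # expected structured-log level
        "eviction_expected": eviction,     # policy must be triggered
        "within_ceiling": ratio <= 0.95,   # the threshold_max criterion
    }


# Usage: byte counts, or any unit shared by both arguments.
typical = storage_pressure(30, 100)
at_ceiling = storage_pressure(95, 100)
over = storage_pressure(96, 100)
```

Note that a sample at exactly 95 % still satisfies threshold_max while already requiring the red transition and the eviction trigger.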

Test status: READY (no external fixture beyond the SUT itself; the persistent-store seed file controls starting utilisation).

NFT-RES-LIM-CPU: CPU headroom for the Tier-1 colocation guarantee

Summary: Combined CPU utilisation of every autopilot-owned process MUST leave enough Jetson CPU headroom for Tier 1 to keep its NFT-PERF-L1 budget. Concretely: per-second sustained CPU usage by autopilot-owned processes MUST stay ≤ the configured budget (default 60 % of total CPU cycles measured at the cgroup level) for the duration of the run.

Traces to: AC Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms / Re2 (CPU-side mechanism backing Re2), RESTRICT Hardware — Jetson Orin Nano Super.

Tier: HW (CPU contention does not reproduce on x86).

Preconditions:

Same workload as Re1 + Re2.

Monitoring:

Cgroup CPU usage at 1 Hz.

Duration: 5 minutes after warm-up.

Pass criteria:

threshold_max: per 1 s sample, sum(autopilot_cpu_usage) ≤ 60 % of total CPU.
Reporting: mean, P95, max.
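
One way to derive the per-second figure is from the cumulative `usage_usec` counter in a cgroup v2 `cpu.stat` file, as sketched below. The spec does not say which cgroup version or file the harness reads, so this choice, the function names, and the core count used in the example are assumptions.

```python
CPU_BUDGET = 0.60  # default budget from the scenario: 60 % of total CPU


def parse_usage_usec(cpu_stat_text):
    """Read the cumulative usage_usec counter from cgroup v2 cpu.stat text."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat")


def cpu_share(prev_usec, curr_usec, interval_s, n_cpus):
    """Fraction of total CPU capacity consumed between two samples."""
    return (curr_usec - prev_usec) / (interval_s * 1_000_000 * n_cpus)


def within_budget(share, budget=CPU_BUDGET):
    return share <= budget


# Usage: two samples taken 1 s apart on a machine with 6 logical CPUs.
before = "usage_usec 1000000\nuser_usec 800000\nsystem_usec 200000\n"
after = "usage_usec 4000000\nuser_usec 3200000\nsystem_usec 800000\n"
share = cpu_share(parse_usage_usec(before), parse_usage_usec(after), 1.0, 6)
```

Dividing by the number of CPUs expresses the budget against total capacity, which is how the pass criterion is worded; a per-core reading would need a different denominator.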

Test status: DEFERRED — same dependency as Re1/Re2.

NFT-RES-LIM-GPU: GPU mutual exclusion contract (Tier 1 vs local large model)

Summary: Per RESTRICT (Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant), the SUT MUST NOT issue a GPU compute call (e.g. Tier-3 VLM inference) while Tier 1 is executing on the GPU. The serialisation MUST be observable: at any instant, at most one of the two workloads occupies the GPU.

Traces to: RESTRICT Tier 1 and any local large model … only one of them may execute at any wall-clock instant.

Tier: HW.

Preconditions:

Tier 1 active; SUT in a ZoomedIn hold with deep-analysis enabled (Tier-3 will fire).

Monitoring:

GPU-instance occupancy via tegrastats / equivalent at the highest available sampling rate.
The SUT's own internal "compute-class" telemetry exposed on the health endpoint as gpu_owner_current ∈ { "tier1", "tier3", "idle" }.

Duration: 60 s containing ≥ 5 Tier-3 hold cycles.

Pass criteria:

exact: at every sample point, gpu_owner_current holds exactly one value from { "tier1", "tier3", "idle" }; Tier 1 and Tier 3 are never reported as GPU owners for the same instant.
tegrastats peak GPU occupancy attributable to autopilot processes never overlaps Tier 1's known activity window for the same wall-clock instant.
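
Both criteria above reduce to checks over recorded data, sketched here under stated assumptions: the owner samples are plain strings scraped from the health endpoint, and the activity windows are half-open `(start, end)` pairs in seconds reconstructed from the monitoring traces. How those windows are derived from tegrastats output is outside this sketch, and the function names are hypothetical.

```python
ALLOWED_OWNERS = {"tier1", "tier3", "idle"}


def invalid_owner_samples(samples):
    """Return every gpu_owner_current sample outside the allowed set."""
    return [s for s in samples if s not in ALLOWED_OWNERS]


def overlapping_windows(tier1_windows, autopilot_windows):
    """Return (tier1, autopilot) window pairs that intersect in time."""
    overlaps = []
    for a_start, a_end in autopilot_windows:
        for t_start, t_end in tier1_windows:
            # Half-open intervals intersect when each starts before
            # the other ends; touching endpoints do not count.
            if a_start < t_end and t_start < a_end:
                overlaps.append(((t_start, t_end), (a_start, a_end)))
    return overlaps


# Usage: Tier 3 runs strictly between two Tier-1 windows.
tier1 = [(0.0, 1.0), (2.0, 3.0)]
serialised = overlapping_windows(tier1, [(1.0, 2.0)])
contended = overlapping_windows(tier1, [(0.5, 1.5)])
```

An empty result from both functions corresponds to a pass; any returned pair identifies the wall-clock instant to inspect.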

Test status: DEFERRED — depends on the SUT being operational end-to-end + Tier-3 enabled; also depends on the SUT exposing gpu_owner_current (which is an architectural choice not yet locked).

NFT-RES-LIM-FileHandles: File-descriptor and socket bound

Summary: Sustained operation MUST NOT leak file descriptors or sockets. The count MUST stay within a documented headroom of the initial-post-warmup baseline for the duration of the run.

Traces to: RESTRICT On-device storage MUST be bounded (general bounded-resource principle), security principle No silent error swallowing for security-relevant failures (FD exhaustion would silently break the operator-stream).

Tier: B + HW.

Preconditions:

Warm-up: 60 s.
Workload: full onboard workload at steady state.

Monitoring:

/proc/<pid>/fd count per autopilot process at 1 Hz.

Duration: 60 minutes.

Pass criteria:

threshold_max: at every sample point, fd_count ≤ fd_baseline_post_warmup + 50 (50 = documented churn headroom for intermittent operator reconnects).
A monotonically rising trend (slope > 0 over the run) is a TEST FAILURE even if the absolute ceiling is not breached.
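
A minimal sketch of the two criteria above follows. It assumes "slope" means the least-squares slope of the count against the sample index, which is one reasonable reading of the spec rather than a confirmed definition; the function names are hypothetical.

```python
import os


def fd_count(pid):
    """Count open descriptors by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def slope(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def evaluate_fd_trace(counts, fd_baseline_post_warmup, headroom=50):
    """Apply the ceiling and the rising-trend rule to a 1 Hz trace."""
    ceiling_ok = all(c <= fd_baseline_post_warmup + headroom for c in counts)
    trend = slope(counts)
    return {"passed": ceiling_ok and trend <= 0, "slope": trend}


# Usage: a flat trace passes; a slow leak fails despite the headroom.
flat = evaluate_fd_trace([120, 121, 120, 119, 120], 120)
leak = evaluate_fd_trace([120, 121, 122, 123, 124], 120)
```

With a strict `slope > 0` rule, ordinary reconnect churn can tip a flat trace into failure, so a harness may want a small documented tolerance on the slope; the spec as written does not define one.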

Test status: READY for a Tier-B run; gains its real value once HW + sustained-workload land.

Common assertions for every resource-limit scenario

Sustained-monitoring is non-negotiable. Each scenario specifies a duration ≥ 60 s; short bursts that pass do not satisfy the test. The CSV report records the full sample trace path under artifacts_path.
No silent eviction. Where a ceiling is approached, the SUT MUST surface the pressure (structured-log INFO at 85 %, WARN at 90 %, transition to yellow/red on health) BEFORE reaching the ceiling. A pass with no observable pressure signal at thresholds is a TEST FAILURE.
HW reporting vs gating. Pure-x86 runs report informational deltas only; they do NOT satisfy the project-level Acceptance Gate. Every CSV row records its tier so this distinction stays auditable.
Re1 + Re2 are paired. Re1 establishes the autopilot RSS ceiling; Re2 establishes that respecting Re1 does not cost Tier 1 latency. They MUST be run in the same session to make the Re2 baseline meaningful.
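
The gating and duration rules above could be applied to report rows as in the sketch below. The row fields (`tier`, `duration_s`, `passed`) and the tier label used for x86 runs are hypothetical; the spec states only that every CSV row records its tier.

```python
MIN_DURATION_S = 60  # every scenario specifies a duration of at least 60 s


def counts_toward_gate(row):
    """True only for passing, sustained runs on representative hardware."""
    return bool(
        row["tier"] == "HW"
        and row["duration_s"] >= MIN_DURATION_S
        and row["passed"]
    )


# Usage: only the HW row is eligible for the Acceptance Gate.
rows = [
    {"scenario": "NFT-RES-LIM-Re1", "tier": "HW", "duration_s": 300, "passed": True},
    {"scenario": "NFT-RES-LIM-Re1", "tier": "x86", "duration_s": 300, "passed": True},
    {"scenario": "NFT-RES-LIM-Re1", "tier": "HW", "duration_s": 30, "passed": True},
]
eligible = [row for row in rows if counts_toward_gate(row)]
```

Rows that fail this filter are still worth keeping in the report as informational deltas.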
