Files
Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout
Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 11:02:01 +03:00

157 lines
9.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Resource Limit Tests
Authored by `/test-spec` Phase 2 (2026-05-19). Resource-limit tests assert that the SUT stays within a quantified resource ceiling for the configured duration. Short bursts do not satisfy these tests — every scenario has an explicit sustained-monitoring window.
---
### NFT-RES-LIM-Re1: Combined onboard RSS ≤ 6 GB sustained
**Summary**: Combined process RSS on the deployed compute device for everything autopilot owns onboard (excluding Tier 1) MUST stay ≤ 6 GB throughout a 5-minute steady-state window with the full onboard workload active.
**Traces to**: AC `Resources & Data — Combined RSS on the deployed compute device, for everything autopilot owns onboard (excluding Tier 1), MUST stay within ≤ 6 GB / Re1`, RESTRICT `Hardware — Compute device: Jetson Orin Nano Super, 8 GB shared LPDDR5; Tier 1 consumes ~2 GB, leaving ~6 GB for autopilot`.
**Tier**: HW (representative Jetson Orin Nano Super) — pure-x86 reports informational only and does NOT satisfy the project-level Acceptance Gate.
**Preconditions**:
- Full onboard workload active: frame ingest from `rtsp-loopback`, Tier-2 + Tier-3 (when enabled) inferring at the documented steady-state load, gimbal commands flowing, MAVLink stream consumed at 10 Hz, operator-stream connected, MapObjects store hydrated for a 30×30 km region.
- Warm-up: 60 s before measurement starts (any first-load model warm-up complete).
- Tier-1 process is RUNNING in parallel but its RSS is EXCLUDED from the measurement (the AC scope is autopilot-owned RSS, excluding Tier 1).
**Monitoring**:
- Cgroup-level RSS for every process the SUT owns (the SUT binary plus any child processes it spawns — e.g., the VLM IPC peer if it lives in autopilot's cgroup), sampled at 1 Hz.
- Cgroup-level RSS for Tier 1 sampled at the same cadence (for the Re2 cross-reference).
- Per-process RSS captured to `reports/<run-id>/rss-trace.csv` for forensic review on failure.
**Duration**: 5 minutes of measurement after warm-up.
**Pass criteria**:
- `threshold_max`: per 1 s sample, `sum(autopilot_owned_RSS) ≤ 6 GB`.
- No single 1 s sample exceeds the ceiling.
- (Reporting only — not pass/fail): peak RSS, mean RSS, P95 RSS recorded in the CSV report.
**Test status**: DEFERRED — `<DEFERRED: long-running scenario harness exercising the full onboard workload for 5 min; inline-authorable but requires that the SUT be operational end-to-end first>`.
---
### NFT-RES-LIM-Re2: Tier-1 non-degradation under autopilot workload
**Summary**: When autopilot's full onboard workload runs concurrently with Tier 1 on the same Jetson, Tier-1 per-frame latency MUST NOT degrade by more than ± 5 ms versus the Tier-1-alone baseline (recorded by NFT-PERF-L1).
**Traces to**: AC `Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms when autopilot's own onboard workload is running concurrently / Re2`, RESTRICT `Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant`.
**Tier**: HW (the only meaningful environment for this assertion — GPU contention behaviour does not reproduce on x86).
**Preconditions**:
- NFT-PERF-L1 has been run on the same HW configuration in the SAME session and a baseline `tier1_baseline_p95_ms` recorded.
- Full onboard workload active (same as Re1).
**Monitoring**:
- Tier-1 per-frame latency sampled per frame for the duration of the test.
- The same metric source as NFT-PERF-L1 — for direct delta comparison.
**Duration**: 5 minutes of measurement after warm-up (matches Re1 window so both can run in the same session).
**Pass criteria**:
- `numeric_tolerance`: `|p95(tier1_with_autopilot) - tier1_baseline_p95_ms| ≤ 5 ms`.
- (Reporting only): mean, P95, max delta over the window.
**Test status**: DEFERRED — same fixture dependency as Re1; requires SUT operational + Tier 1 colocated on HW.
---
### NFT-RES-LIM-Storage: On-device persistent store stays under 95 % for in-flight operation
**Summary**: During a steady-state mission run (no abnormal load), the on-device persistent store MUST NOT exceed 95 % full. This protects the takeoff gate (R3) from being silently violated mid-mission and protects the post-flight push (Mp4) from running out of room to persist a failed diff.
**Traces to**: AC `Reliability & Safety — On-device storage MUST be bounded` (via R3 BIT gate), RESTRICT `On-device storage MUST be bounded`.
**Tier**: B + HW.
**Preconditions**:
- SUT mid-flight; persistent store at typical post-takeoff utilisation (e.g. 30 %).
- Normal-operation event volume: telemetry persistence, ignored-item appends, pending map-diff buffer (empty in this scenario).
**Monitoring**:
- Volume utilisation sampled at 10 Hz throughout the duration.
**Duration**: 60 minutes (representative mission duration per Mp3).
**Pass criteria**:
- `threshold_max`: `volume_used / volume_total ≤ 0.95` at every sample point.
- On approach to 85 %: structured-log INFO `storage_pressure` with current utilisation.
- On approach to 90 %: structured-log WARN with current utilisation; health.storage transitions to yellow.
- On 95 %: health.storage transitions to red; the SUT begins its documented eviction policy (this scenario does NOT test the policy semantics — that belongs to its own scenario; this scenario only asserts the policy IS triggered).
**Test status**: READY (no external fixture beyond the SUT itself; the persistent-store seed file controls starting utilisation).
---
### NFT-RES-LIM-CPU: CPU headroom for the Tier-1 colocation guarantee
**Summary**: Combined CPU utilisation of every autopilot-owned process MUST leave enough Jetson CPU headroom for Tier 1 to keep its NFT-PERF-L1 budget. Concretely: per-second sustained CPU usage by autopilot-owned processes MUST stay ≤ the configured budget (default 60 % of total CPU cycles measured at the cgroup level) for the duration of the run.
**Traces to**: AC `Resources & Data — Tier 1 per-frame latency MUST NOT degrade by more than ± 5 ms / Re2` (CPU-side mechanism backing Re2), RESTRICT `Hardware — Jetson Orin Nano Super`.
**Tier**: HW (CPU contention does not reproduce on x86).
**Preconditions**:
- Same workload as Re1 + Re2.
**Monitoring**:
- Cgroup CPU usage at 1 Hz.
**Duration**: 5 minutes after warm-up.
**Pass criteria**:
- `threshold_max`: per 1 s sample, `sum(autopilot_cpu_usage) ≤ 60 %` of total CPU.
- Reporting: mean, P95, max.
**Test status**: DEFERRED — same dependency as Re1/Re2.
---
### NFT-RES-LIM-GPU: GPU mutual exclusion contract (Tier 1 vs local large model)
**Summary**: Per RESTRICT (`Tier 1 (YOLO) and any local large model with GPU memory pressure share the Jetson GPU — only one of them may execute at any wall-clock instant`), the SUT MUST NOT issue a GPU compute call (e.g. Tier-3 VLM inference) while Tier 1 is executing on the GPU. The serialisation MUST be observable: a single GPU is busy at one instant.
**Traces to**: RESTRICT `Tier 1 and any local large model … only one of them may execute at any wall-clock instant`.
**Tier**: HW.
**Preconditions**:
- Tier 1 active; SUT in a ZoomedIn hold with deep-analysis enabled (Tier-3 will fire).
**Monitoring**:
- GPU-instance occupancy via `tegrastats` / equivalent at the highest available sampling rate.
- The SUT's own internal "compute-class" telemetry exposed on the health endpoint as `gpu_owner_current` ∈ { "tier1", "tier3", "idle" }.
**Duration**: 60 s containing ≥ 5 Tier-3 hold cycles.
**Pass criteria**:
- `exact`: at every sample point, `gpu_owner_current ∈ { "tier1", "tier3", "idle" }`; never simultaneously both.
- `tegrastats` peak GPU occupancy attributable to autopilot processes never overlaps Tier 1's known activity window for the same wall-clock instant.
**Test status**: DEFERRED — depends on the SUT being operational end-to-end + Tier-3 enabled; also depends on the SUT exposing `gpu_owner_current` (which is an architectural choice not yet locked).
---
### NFT-RES-LIM-FileHandles: File-descriptor and socket bound
**Summary**: Sustained operation MUST NOT leak file descriptors or sockets. The count MUST stay within a documented headroom of the initial-post-warmup baseline for the duration of the run.
**Traces to**: RESTRICT `On-device storage MUST be bounded` (general bounded-resource principle), security principle `No silent error swallowing for security-relevant failures` (FD exhaustion would silently break the operator-stream).
**Tier**: B + HW.
**Preconditions**:
- Warm-up: 60 s.
- Workload: full onboard workload at steady state.
**Monitoring**:
- `/proc/<pid>/fd` count per autopilot process at 1 Hz.
**Duration**: 60 minutes.
**Pass criteria**:
- `threshold_max`: at every sample point, `fd_count ≤ fd_baseline_post_warmup + 50` (50 = documented churn headroom for intermittent operator reconnects).
- A monotonically rising trend (slope > 0 over the run) is a TEST FAILURE even if the absolute ceiling is not breached.
**Test status**: READY for a Tier-B run; gains its real value once HW + sustained-workload land.
---
## Common assertions for every resource-limit scenario
- **Sustained-monitoring is non-negotiable.** Each scenario specifies a duration ≥ 60 s; short bursts that pass do not satisfy the test. The CSV report records the full sample trace path under `artifacts_path`.
- **No silent eviction.** Where a ceiling is approached, the SUT MUST surface the pressure (structured-log INFO at 85 %, WARN at 90 %, transition to yellow/red on health) BEFORE reaching the ceiling. A pass with no observable pressure signal at thresholds is a TEST FAILURE.
- **HW reporting vs gating.** Pure-x86 runs report informational deltas only; they do NOT satisfy the project-level Acceptance Gate. Every CSV row records its tier so this distinction stays auditable.
- **Re1 + Re2 are paired.** Re1 establishes the autopilot RSS ceiling; Re2 establishes that respecting Re1 does not cost Tier 1 latency. They MUST be run in the same session to make the Re2 baseline meaningful.