mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-27 19:16:41 +00:00
Update autodev state documentation to reflect completion of Plan Step 1, including detailed progress on phases and next steps. Revised phase details to clarify user-level blocking gates and hardware assessment outcomes.
@@ -0,0 +1,177 @@
# Resource Limit Tests

> All tests measure resources via the `prom` (Prometheus) and `nvidia-smi-exporter` services defined in `environment.md`. None of these tests touch SUT internals.

---

### NFT-RES-LIM-01: Memory ≤8 GB shared (AC-4.2)

**Summary**: Peak resident memory + GPU memory remains under the 8 GB shared LPDDR5 cap.
**Traces to**: AC-4.2, results_report row 35, NF-T2. Tier: T1 (Docker mem accounting) + T4 (`tegrastats`).

**Preconditions**: 30-min sustained replay on Orin Nano Super 25 W (T4) or 30-min replay on x86+CUDA emulation (T1 functional only).

**Monitoring**:
- `prom` scrapes the SUT's `/metrics` endpoint for `process_resident_memory_bytes`.
- `nvidia-smi-exporter` (T4) scrapes Jetson `tegrastats` for shared-LPDDR5 usage.

**Duration**: 30 min replay.

**Pass criteria**:
- T4 binding: peak shared LPDDR5 usage < 8192 MB throughout; growth ≤ 50 MB over the 30-min window (no leak).
- T1 functional: peak resident memory < 8192 MB; growth ≤ 50 MB.

---

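The peak and growth criteria of NFT-RES-LIM-01 could be evaluated offline from a series of scraped memory samples. The following is a minimal sketch, not the project's harness. It assumes that "8192 MB" means MiB, and that "growth" is the difference between the mean of the last and the first `window` samples; both are assumptions, since the spec does not define them. The function name `check_memory` is hypothetical.

```python
from statistics import mean

MIB = 1024 * 1024
CAP_MIB = 8192        # shared LPDDR5 cap from the pass criteria (assumed MiB)
MAX_GROWTH_MIB = 50   # leak threshold from the pass criteria


def check_memory(samples_bytes, window=60):
    """Evaluate peak and growth for memory samples given in bytes.

    Growth is mean(last `window` samples) - mean(first `window` samples).
    This definition is an assumption, not part of the test spec.
    """
    if len(samples_bytes) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    peak_mib = max(samples_bytes) / MIB
    growth_mib = (mean(samples_bytes[-window:]) - mean(samples_bytes[:window])) / MIB
    return {
        "peak_mib": peak_mib,
        "growth_mib": growth_mib,
        "passed": peak_mib < CAP_MIB and growth_mib <= MAX_GROWTH_MIB,
    }
```

Averaging over a window rather than comparing single first and last samples keeps one noisy scrape from deciding the leak verdict.

---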
### NFT-RES-LIM-02: Thermal — junction temperature ≤80 °C, no throttle (results_report row 36)

**Summary**: SoC junction temperature stays below 80 °C; no thermal throttle event.
**Traces to**: results_report row 36, AC-NEW-5 (sub-budget). Tier: T4.

**Preconditions**: T4 only; +25 °C ambient.

**Monitoring**: `nvidia-smi-exporter` reads junction temp every 1 s.

**Duration**: 30 min replay.

**Pass criteria**: max(junction_temp_c) ≤ 80 °C; throttle_event_count == 0 (per `tegrastats throttle` indicator).

---

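The NFT-RES-LIM-02 pass criteria reduce to a maximum and an event count over the 1 s samples. The sketch below assumes the samples have already been parsed into `(junction_temp_c, throttled)` pairs and counts a throttle event on each rising edge of the throttle flag. The sample format, the edge-counting rule, and the name `check_thermal` are illustrative assumptions.

```python
def check_thermal(samples, max_temp_c=80.0):
    """Evaluate (junction_temp_c, throttled) samples taken once per second.

    A throttle event is counted each time the flag goes from False to True
    (an assumption about how throttle_event_count is derived).
    """
    max_temp = None
    throttle_events = 0
    prev_throttled = False
    for temp_c, throttled in samples:
        max_temp = temp_c if max_temp is None else max(max_temp, temp_c)
        if throttled and not prev_throttled:
            throttle_events += 1
        prev_throttled = throttled
    if max_temp is None:
        raise ValueError("no samples")
    return {
        "max_junction_temp_c": max_temp,
        "throttle_event_count": throttle_events,
        "passed": max_temp <= max_temp_c and throttle_events == 0,
    }
```

---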
### NFT-RES-LIM-03: AC-NEW-5 thermal envelope — 8 h @ 25 W @ +50 °C ambient

**Summary**: Cooling solution sustains 25 W for 8 h at +50 °C ambient without thermal throttling.
**Traces to**: AC-NEW-5, NF-T3, restriction §Onboard Hardware. Tier: T4 (`deferred-hil`) — requires hot-soak chamber.

**Preconditions**: hot-soak chamber, +50 °C ambient stabilized; SUT in 25 W mode running `synthetic_8h_load`.

**Monitoring**: junction temp + throttle indicator via `tegrastats`; ambient temp probe; FDR thermal log (AC-NEW-3 includes thermal traces).

**Duration**: 8 h.

**Pass criteria**: throttle_event_count == 0 over 8 h. If a throttle event does occur, the SUT must automatically emit a STATUSTEXT to the GCS; verify this behaviour with a deliberate throttle injection in a separate run.

---

### NFT-RES-LIM-04: AC-NEW-5 cold-soak cold-start

**Summary**: Cold-start TTFF at −20 °C ambient meets AC-NEW-1 budget.
**Traces to**: AC-NEW-5 cold corner, AC-NEW-1, NF-T3 cold-soak. Tier: T4 (`deferred-hil`) — requires cold chamber.

**Preconditions**: chamber stabilized at −20 °C with SUT powered off; nav-cam + IMU sources cold-replay-ready.

**Monitoring**: TTFF timer (per FT-P-16 / FT-P-T4 cold).

**Duration**: 50 cold boots within the cold chamber.

**Pass criteria**: 95th percentile TTFF ≤ 30 s.

---

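With only 50 cold boots in NFT-RES-LIM-04, the choice of percentile method affects the result. The sketch below uses the nearest-rank definition, under which the 95th percentile of 50 samples is the 48th smallest value, so at most two boots may exceed 30 s. The spec does not say which percentile definition applies, so nearest-rank is an assumption here, and the function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the nearest-rank percentile of `values` for an integer `pct`."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def check_ttff(ttff_seconds, budget_s=30.0):
    """Apply the 95th-percentile TTFF criterion to a list of boot times."""
    p95 = percentile_nearest_rank(ttff_seconds, 95)
    return {"p95_s": p95, "passed": p95 <= budget_s}
```

---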
### NFT-RES-LIM-05: FDR — 8-h cap + rollover (AC-NEW-3, NF-T5)

**Summary**: After 8 h replay, the FDR is ≤ 64 GB and no payload class is silently dropped.
**Traces to**: AC-NEW-3, AC-8.5, NF-T5. Tier: T1 (volume-size accounting) + T4 (real disk).

**Preconditions**: clean `fdr` volume at start; `synthetic_8h_load` replay.

**Monitoring**: filesystem accounting per directory class; FDR rollover log (must record every dropped segment).

**Duration**: 8 h.

**Pass criteria**:
- Total FDR ≤ 64 GB.
- All payload classes present in the latest segment: per-frame positions with covariance + source-label, FC IMU full-rate, GPS_INPUT frames, MAVLink raw stream (tlog), system health (CPU / GPU / temp / throttle), mid-flight tiles, ≤0.1 Hz failure-thumbnail log.
- For each rollover, a STATUSTEXT or rollover log entry exists; no silent drop.
- Raw nav-cam / AI-cam frames are NOT present (AC-8.5 cross-check).

---

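The size cap and payload-class checks of NFT-RES-LIM-05 can be approximated with plain filesystem accounting. The sketch below assumes one subdirectory per payload class inside each segment; every directory name in it is hypothetical, as is the choice of decimal gigabytes for the 64 GB cap. It does not cover the rollover-log criterion.

```python
import os

FDR_CAP_BYTES = 64 * 1000**3  # assumes decimal GB; use 64 * 1024**3 for GiB

# Hypothetical directory names, one per payload class in the pass criteria.
REQUIRED_CLASSES = {
    "positions", "fc_imu", "gps_input", "tlog",
    "health", "tiles", "failure_thumbs",
}
# Hypothetical names for raw camera frames, which must be absent (AC-8.5).
FORBIDDEN_CLASSES = {"raw_navcam", "raw_aicam"}


def dir_size_bytes(path):
    """Sum the sizes of all files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def check_fdr(fdr_root, latest_segment, cap_bytes=FDR_CAP_BYTES):
    """Check the size cap and payload-class presence for one FDR volume."""
    non_empty = {
        entry.name
        for entry in os.scandir(latest_segment)
        if entry.is_dir() and os.listdir(entry.path)
    }
    total = dir_size_bytes(fdr_root)
    missing = REQUIRED_CLASSES - non_empty
    forbidden = FORBIDDEN_CLASSES & non_empty
    return {
        "total_bytes": total,
        "missing_classes": missing,
        "forbidden_present": forbidden,
        "passed": total <= cap_bytes and not missing and not forbidden,
    }
```

---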
### NFT-RES-LIM-06: Tile cache ≤ 10 GB persistent (restrictions §UAV)

**Summary**: Persistent satellite-tile cache for the 400 km² operational area + onboard-generated tiles fits in 10 GB.
**Traces to**: restrictions §UAV ("~10 GB" tile-cache budget). Tier: T1.

**Preconditions**: simulate 400 km² operational area (satellite tiles + DEM tiles + VPR chunk index) loaded; run a flight that generates onboard tiles; let cache settle.

**Monitoring**: filesystem size of `/probe/tiles/`.

**Duration**: 30 min replay (enough to populate onboard tiles).

**Pass criteria**: total cache size ≤ 10 GB after the flight; deduplication keeps onboard tiles per sector ≤ 1.

---

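The deduplication criterion of NFT-RES-LIM-06 (at most one onboard tile per sector) can be checked by counting tiles per sector. The sketch below assumes the cache index can be listed as `(sector_id, source)` pairs with `source == "onboard"` marking onboard-generated tiles; that record shape and the function name are illustrative assumptions, not the cache's actual schema.

```python
from collections import Counter


def onboard_duplicates(tiles):
    """Return {sector_id: count} for sectors holding more than one onboard tile.

    `tiles` is an iterable of (sector_id, source) pairs; the pair layout is
    a hypothetical stand-in for the real cache index.
    """
    counts = Counter(sector for sector, source in tiles if source == "onboard")
    return {sector: n for sector, n in counts.items() if n > 1}
```

An empty result means the deduplication criterion holds; any entry names a sector that violates it.

---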
### NFT-RES-LIM-07: GPU memory peak

**Summary**: TensorRT engines (cuVSLAM + matcher + VPR) collectively fit within Orin Nano Super shared LPDDR5 with headroom for the rest of the system.
**Traces to**: AC-4.2, NF-T2 (extended for ROS 2 image growth). Tier: T4.

**Preconditions**: all TRT engines loaded.

**Monitoring**: `tegrastats` GPU memory line.

**Duration**: steady-state 5 min after warm-up.

**Pass criteria**: GPU memory ≤ 4 GB (leaves ≥ 4 GB for ROS 2 nodes + working set + OS); engine reservation ≥ 1 GB for matcher + VPR (per NF-T2 extended).

---

### NFT-RES-LIM-08: Per-frame GPU latency budget breakdown

**Summary**: Sum of (cuVSLAM + matcher + VPR + Component 5 calibrator + Component 1b ortho) ≤ 400 ms p95 per AC-4.1.
**Traces to**: AC-4.1, NFT-PERF-01..04. Tier: T4.

**Monitoring**: per-stage timers exposed via `/metrics`.

**Duration**: 30 min replay.

**Pass criteria**: Σ p95(per-stage) ≤ 400 ms; each component within its sub-budget (in ms: cuVSLAM ≤ 20, matcher inline ≤ 200, ortho ≤ 50, VPR conditional ≤ 200 only on triggers, calibrator ≤ 5).

---

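The two-level criterion of NFT-RES-LIM-08 (a total budget plus per-stage sub-budgets) can be evaluated from per-stage timer samples. The sketch below is illustrative: the stage keys are hypothetical names, the p95 uses the nearest-rank definition, and VPR samples are assumed to come only from frames where VPR was triggered.

```python
import math

TOTAL_BUDGET_MS = 400.0
# Sub-budgets from the pass criteria; the dictionary keys are hypothetical.
SUB_BUDGETS_MS = {
    "cuvslam": 20.0,
    "matcher": 200.0,
    "ortho": 50.0,
    "vpr": 200.0,       # assumed to hold samples from triggered frames only
    "calibrator": 5.0,
}


def p95(values):
    """Nearest-rank 95th percentile (an assumed percentile definition)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    return ordered[math.ceil(95 * len(ordered) / 100) - 1]


def check_latency(stage_samples_ms):
    """Check each stage's p95 against its sub-budget and the sum against 400 ms."""
    p95s = {stage: p95(samples) for stage, samples in stage_samples_ms.items()}
    over = {s: v for s, v in p95s.items() if v > SUB_BUDGETS_MS[s]}
    total = sum(p95s.values())
    return {
        "p95_ms": p95s,
        "sum_p95_ms": total,
        "over_sub_budget": over,
        "passed": not over and total <= TOTAL_BUDGET_MS,
    }
```

Because the listed sub-budgets add up to 475 ms, a run can keep every stage within its own limit and still fail the 400 ms total, so both checks are needed.

---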
### NFT-RES-LIM-09: ROS 2 + Isaac ROS image footprint

**Summary**: Deployment image fits the documented ~200 MB growth budget over the DIY-Python baseline.
**Traces to**: M-29 cost / benefit, NF-T2 extended. Tier: T1 (image inspection).

**Steps**: build the deployment image; compare against a baseline DIY-Python image manifest; assert delta ≤ 200 MB.

**Pass criteria**: delta ≤ 200 MB; matcher + VPR engine reservation ≥ 1 GB available at runtime.

---

### NFT-RES-LIM-10: CPU usage — DDS overhead bound

**Summary**: ROS 2 DDS + topic serialisation overhead stays within the documented 2–5 % CPU.
**Traces to**: M-29 (Q6 → A cost / benefit). Tier: T4.

**Monitoring**: per-process CPU via `prom`; DDS process / `rmw_*` thread CPU specifically.

**Duration**: 30 min replay.

**Pass criteria**: DDS CPU mean ≤ 5 %; total SUT CPU ≤ 80 % to leave headroom for spikes.

---

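The NFT-RES-LIM-10 criteria mix a mean (DDS overhead) with an upper bound (total SUT CPU). The sketch below assumes the total-CPU bound applies to the peak sample, which the spec does not state explicitly; the function name and the percent-valued sample lists are likewise assumptions.

```python
from statistics import mean


def check_cpu(dds_cpu_pct, total_cpu_pct, dds_mean_limit=5.0, total_limit=80.0):
    """Check mean DDS CPU and peak total SUT CPU, both given in percent.

    Applying the 80 % bound to the peak sample is an assumption; the spec
    could also be read as a bound on the mean.
    """
    if not dds_cpu_pct or not total_cpu_pct:
        raise ValueError("no samples")
    dds_mean = mean(dds_cpu_pct)
    total_peak = max(total_cpu_pct)
    return {
        "dds_mean_pct": dds_mean,
        "total_peak_pct": total_peak,
        "passed": dds_mean <= dds_mean_limit and total_peak <= total_limit,
    }
```

---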
### NFT-RES-LIM-11: Operational area ≤ 400 km² and 8-h flight cap

**Summary**: SUT correctly handles the documented operational ceiling (sector 150 km² + corridor 50 km² ≈ 200 km² typical, up to 400 km² total).
**Traces to**: restrictions §UAV. Tier: T1 (smoke + audit).

**Steps**: configure SUT with a 400 km² operational area; verify boot-time pre-allocation respects budget; run a synthetic flight at 60 km/h cruise for 30 min (a scaled-down proxy for the 8 h flight cap).

**Pass criteria**: SUT loads tile descriptors + VPR index without OOM; 30 min replay sustained at expected fps; resource budgets (NFT-RES-LIM-01..10) all green at this scale.

---

### NFT-RES-LIM-12: Disk I/O — FDR write rate sustainable

**Summary**: FDR write rate sustained over 8 h does not back up the writer or interfere with the inline pipeline.
**Traces to**: AC-NEW-3, AC-4.1 (no interference). Tier: T4.

**Monitoring**: NVMe write throughput (MB/s) via Prometheus + I/O wait via `vmstat`.

**Duration**: 8 h.

**Pass criteria**: write rate ≤ 70 % of the NVMe's sustained write throughput (i.e. 30 % headroom); I/O wait does not contribute to AC-4.1 latency violations (NFT-PERF-01 still passes during the 8-h window).