[AZ-238] [AZ-239] Add resource restart tests

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-05 06:26:15 +03:00
parent 5acd14b792
commit 2ba44a33c5
8 changed files with 274 additions and 3 deletions
@@ -1,95 +0,0 @@
# Cold Start Restart Tests
**Task**: AZ-238_cold_start_restart_tests
**Name**: Cold Start Restart Tests
**Description**: Implement tests for cold start, companion restart, sharp-turn/disconnected relocalization, and first-fix resource spikes.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-238
**Epic**: AZ-218
## Problem
The test suite must prove that the runtime recovers from disconnected visual segments and companion restarts without hiding missing prerequisites or unsafe degraded behavior.
## Outcome
- Sharp-turn/disconnected-segment scenarios trigger relocalization or explicit degraded output.
- Companion restart scenarios measure first valid output timing and FDR evidence.
- Cold-start trials record first-fix latency and resource spikes.
## Scope
### Included
- NFT-RES-02 Sharp Turn And Disconnected Segment Relocalization.
- NFT-RES-03 Companion Computer Restart Mid-Flight.
- NFT-PERF-04 Cold Boot Time To First Fix.
- NFT-RES-LIM-05 Cold Start Resource Spike.
### Excluded
- Long thermal endurance.
- FDR 8-hour rollover load.
- Cache poisoning and no-fetch security tests.
## Acceptance Criteria
**AC-1: Disconnected segments trigger relocalization**
Given a sharp-turn or disconnected segment fixture
When replay reaches the low-overlap transition
Then relocalization is requested and the system either reconnects via verified anchor or reports degraded status.
**AC-2: Companion restart recovery is measured**
Given a replay/SITL mission in progress
When the GPS-denied service is restarted
Then first valid output timing, FC-state handoff behavior, and FDR restart evidence are recorded.
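The timing part of AC-2 could be derived from an ordered event trace. The following is a minimal sketch, assuming a hypothetical trace of `(timestamp_s, kind)` tuples in which the event kinds `"restart"` and `"valid_output"` are illustrative names, not the project's actual FDR schema.

```python
from typing import Iterable, Optional, Tuple


def restart_recovery_time(events: Iterable[Tuple[float, str]]) -> Optional[float]:
    """Seconds from the most recent restart to the first valid output after it.

    Returns None when no valid output follows a restart, so the caller can
    report missing evidence instead of a misleading zero.
    """
    restart_at: Optional[float] = None
    for timestamp, kind in sorted(events):
        if kind == "restart":
            restart_at = timestamp  # a later restart supersedes an earlier one
        elif kind == "valid_output" and restart_at is not None:
            return timestamp - restart_at
    return None
```

Outputs recorded before the restart are ignored, which keeps pre-restart fixes from being counted as recovery evidence.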
**AC-3: Cold-start trials report first-fix timing**
Given cold-start conditions and local cache/index prerequisites
When the 50-trial run completes or is blocked
Then the p95 time-to-first-fix result or exact blocked prerequisite is reported.
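One way the AC-3 aggregation might look is sketched below. It uses the nearest-rank definition of p95, which is an assumption (the spec does not name a percentile method), and the function name, field names, and the `"partial"` status are hypothetical.

```python
from typing import Optional, Sequence


def summarize_first_fix(
    latencies_s: Sequence[float],
    required_trials: int = 50,
    blocked_reason: Optional[str] = None,
) -> dict:
    """Return a p95 summary, a blocked record, or a partial record."""
    trials = len(latencies_s)
    if blocked_reason is not None:
        # Name the exact prerequisite instead of reporting a pass.
        return {"status": "blocked", "blocked_reason": blocked_reason, "trials": trials}
    if trials < required_trials:
        # Too few trials for a percentile: report the count, not a p95.
        return {"status": "partial", "trials": trials}
    ordered = sorted(latencies_s)
    rank = (95 * trials + 99) // 100  # integer ceil(0.95 * n), 1-based rank
    return {"status": "measured", "trials": trials, "p95_s": ordered[rank - 1]}
```

Integer arithmetic is used for the rank so the result does not depend on floating-point rounding of `0.95 * n`.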
**AC-4: Cold-start resource spikes are captured**
Given initialization begins
When engines/indexes/cache are loaded
Then peak memory and initialization-stage timing are recorded where measurable.
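Per-stage timing for AC-4 could be captured with a small context manager. This is a sketch only: `tracemalloc` sees Python-level allocations, not process RSS or GPU memory, so a real harness would need a platform-specific sampler for the peak-memory figure. The helper name and record fields are hypothetical.

```python
import time
import tracemalloc
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def record_stage(name: str, sink: List[Dict[str, object]]) -> Iterator[None]:
    """Append the stage duration and peak Python-heap usage to `sink`."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        yield
    finally:
        duration_s = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        sink.append(
            {"stage": name, "duration_s": duration_s, "peak_python_bytes": peak_bytes}
        )
```

Usage would wrap each initialization step, for example `with record_stage("load_index", stages): ...`, so a stage that raises still leaves a record behind.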
## Non-Functional Requirements
**Reliability**
- Missing calibration, public datasets, or hardware prerequisites must not be treated as passing.
**Performance**
- First-fix timing and peak memory are reported with percentile summaries when the required number of trials completes.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Relocalization trigger assertion | Exceeding the missing-position threshold triggers a relocalization request |
| AC-2 | Restart report parser | Restart and first-output events are present |
| AC-3 | Trial aggregation | p95 first-fix summary or blocked reason is emitted |
| AC-4 | Resource metric parser | Peak memory and stage timings are captured |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Sharp-turn/disconnected replay | NFT-RES-02 | Verified relocalization or degraded evidence | Reliability |
| AC-2 | Mission restart trace | NFT-RES-03 | First valid output and FDR restart evidence | Reliability |
| AC-3 | Cold-start harness | NFT-PERF-04 | p95 first fix <30 s or blocked prerequisite | Performance |
| AC-4 | Cold-start resource monitoring | NFT-RES-LIM-05 | Peak memory <8 GB or blocked/failure evidence | Performance |
## Constraints
- Restart tests must preserve fixture read-only guarantees.
- Trial loops must be bounded and report partial results if interrupted.
- Hardware-only assertions must be clearly marked when not runnable locally.
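The bounded-loop constraint above can be sketched as follows, assuming a hypothetical `run_trial` callable that returns one latency per trial and assuming interruption arrives as `KeyboardInterrupt`.

```python
from typing import Callable, List


def run_bounded_trials(run_trial: Callable[[int], float], max_trials: int) -> dict:
    """Run at most `max_trials` trials and keep completed results if interrupted."""
    results: List[float] = []
    interrupted = False
    try:
        for index in range(max_trials):  # hard upper bound on iterations
            results.append(run_trial(index))
    except KeyboardInterrupt:
        interrupted = True  # keep what finished; do not discard the run
    return {
        "requested": max_trials,
        "completed": len(results),
        "interrupted": interrupted,
        "results": results,
    }
```

Reporting `requested` next to `completed` lets the report label a run as partial rather than presenting it as full evidence.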
## Risks & Mitigation
**Risk 1: Long cold-start trials are expensive**
- *Risk*: Full 50-run evidence may not be practical on every PR.
- *Mitigation*: Support smoke mode for PRs and full mode for release gates, with clear report labels.
@@ -1,94 +0,0 @@
# Jetson Resource Endurance Tests
**Task**: AZ-239_jetson_resource_endurance_tests
**Name**: Jetson Resource Endurance Tests
**Description**: Implement release-gate resource and endurance tests for Jetson memory, thermal/power behavior, and FDR rollover.
**Complexity**: 5 points
**Dependencies**: AZ-233_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-239
**Epic**: AZ-218
## Problem
Release readiness requires hardware/resource evidence that cannot be proven by ordinary unit tests or short local replay runs.
## Outcome
- Jetson memory and thermal/power metrics are captured where hardware is available.
- FDR 8-hour synthetic load verifies rollover, storage cap, and retained payload classes.
- Hardware-only prerequisites are reported as blocked when not available.
## Scope
### Included
- NFT-RES-LIM-01 Jetson Memory Budget.
- NFT-RES-LIM-02 Thermal And Power Envelope.
- NFT-RES-LIM-04 Flight Data Recorder Rollover.
### Excluded
- Still-image replay accuracy.
- Satellite anchor/cache security tests.
- Cold-start first-fix trials.
## Acceptance Criteria
**AC-1: Jetson memory budget is measured**
Given Jetson hardware or an equivalent production target is available
When a sustained replay and trigger-path workload runs
Then CPU/GPU shared memory, process RSS, CUDA allocations, and OOM/throttle status are recorded.
**AC-2: Thermal and power endurance is validated or blocked**
Given thermal test prerequisites are available
When the sustained 25 W workload runs
Then throttle flags, temperatures, clocks, and latency are recorded for the required duration; otherwise the run reports blocked prerequisites.
**AC-3: FDR rollover is validated**
Given an 8-hour synthetic mission load
When FDR output reaches rollover conditions
Then storage remains within the cap, rollover is logged, and no payload class is silently dropped.
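The three AC-3 conditions could be checked from a rollover report roughly as below. This is a sketch under stated assumptions: the report shape (storage samples, a rollover event count, per-class retained counts) and the payload class names used in any call are hypothetical.

```python
from typing import Dict, Iterable, List


def check_fdr_rollover(
    storage_samples_bytes: Iterable[int],
    cap_bytes: int,
    rollover_events: int,
    expected_classes: Iterable[str],
    retained_counts: Dict[str, int],
) -> List[str]:
    """Return a list of violations; an empty list means all checks passed."""
    violations: List[str] = []
    peak = max(storage_samples_bytes, default=0)
    if peak > cap_bytes:
        violations.append(f"storage peak {peak} B exceeds cap {cap_bytes} B")
    if rollover_events < 1:
        violations.append("no rollover was logged")
    for name in expected_classes:
        if retained_counts.get(name, 0) == 0:
            violations.append(f"payload class dropped: {name}")
    return violations
```

Returning every violation, rather than stopping at the first, keeps a single 8-hour run from having to be repeated to discover a second failure.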
**AC-4: Evidence artifacts are complete**
Given resource/endurance scenarios complete or block
When reporting finishes
Then metrics, duration, environment, status, and artifact paths are written.
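A manifest writer for AC-4 might validate the required fields before writing anything. The sketch below assumes a JSON manifest named `manifest.json` and the field names shown; both are illustrative choices, not the project's confirmed format.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("metrics", "duration_s", "environment", "status", "artifact_paths")
ALLOWED_STATUSES = ("passed", "failed", "blocked")


def write_manifest(run_dir: Path, manifest: dict) -> Path:
    """Validate the manifest, then write it under the run directory."""
    missing = [field for field in REQUIRED_FIELDS if field not in manifest]
    if missing:
        raise ValueError(f"manifest is missing fields: {missing}")
    if manifest["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {manifest['status']!r}")
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Validation runs before the directory is created, so a rejected manifest leaves no partial artifacts behind.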
## Non-Functional Requirements
**Performance**
- Resource evidence must include duration and sampling interval.
**Reliability**
- Hardware-unavailable results are `blocked`, not `passed`.
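The reliability rule can be made hard to violate by deciding the status in one place. A minimal sketch, with a hypothetical `classify` helper and status enum:

```python
from enum import Enum
from typing import Optional, Sequence


class Status(str, Enum):
    PASSED = "passed"
    FAILED = "failed"
    BLOCKED = "blocked"


def classify(missing_prerequisites: Sequence[str], assertions_ok: Optional[bool]) -> Status:
    """Map prerequisite and assertion state to a report status."""
    if missing_prerequisites:
        return Status.BLOCKED  # unavailable hardware never counts as a pass
    if assertions_ok is None:
        return Status.BLOCKED  # nothing was measured, so nothing was proven
    return Status.PASSED if assertions_ok else Status.FAILED
```

Checking prerequisites first means a run with missing hardware stays `blocked` even if the assertions that did execute happened to succeed.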
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | Resource metric parser | Memory and throttle fields are present |
| AC-2 | Blocked prerequisite reporter | Missing hardware/thermal setup records blocked status |
| AC-3 | FDR rollover report parser | Storage, rollover, and payload-class fields are validated |
| AC-4 | Evidence manifest writer | Artifact paths and run metadata are present |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Jetson/prod-equivalent hardware | NFT-RES-LIM-01 | Peak memory <8 GB or explicit failure | Performance |
| AC-2 | Thermal/power test setup | NFT-RES-LIM-02 | No throttle over required duration or blocked/failure | Performance |
| AC-3 | Synthetic 8-hour mission load | NFT-RES-LIM-04 | FDR cap and rollover behavior are evidenced | Reliability |
| AC-4 | Resource/endurance reports | All included scenarios | Complete artifact manifest and status | Reliability |
## Constraints
- These tests are release-gate oriented and may be skipped or blocked in ordinary PR mode.
- Raw frames must not be retained during FDR load tests.
- Resource tests must not write outside run-scoped output directories.
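The output-directory constraint above could be enforced by resolving every output path against the run directory before writing. This is a sketch; the helper name is hypothetical and the directory layout is assumed.

```python
from pathlib import Path


def resolve_in_run_dir(run_dir: Path, relative: str) -> Path:
    """Resolve `relative` under `run_dir`, rejecting paths that escape it."""
    root = run_dir.resolve()
    target = (root / relative).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"{relative!r} resolves outside the run directory")
    return target
```

Because the check runs on resolved paths, both `..` segments and absolute paths that point elsewhere are rejected.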
## Risks & Mitigation
**Risk 1: Hardware gates are unavailable during local development**
- *Risk*: Developers cannot run full evidence locally.
- *Mitigation*: Support blocked status and separate PR smoke mode from release-gate execution.