[AZ-238] [AZ-239] Add resource restart tests

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 13:11:13 +00:00 · 2026-05-05 06:26:15 +03:00
parent 5acd14b792
commit 2ba44a33c5
8 changed files with 274 additions and 3 deletions
@@ -1,95 +0,0 @@
-# Cold Start Restart Tests
-
-**Task**: AZ-238_cold_start_restart_tests
-**Name**: Cold Start Restart Tests
-**Description**: Implement tests for cold start, companion restart, sharp-turn/disconnected relocalization, and first-fix resource spikes.
-**Complexity**: 5 points
-**Dependencies**: AZ-233_test_infrastructure
-**Component**: Blackbox Tests
-**Tracker**: AZ-238
-**Epic**: AZ-218
-
-## Problem
-
-The test suite must prove that the runtime recovers from disconnected visual segments and companion restarts without hiding missing prerequisites or unsafe degraded behavior.
-
-## Outcome
-
- Sharp-turn/disconnected-segment scenarios trigger relocalization or explicit degraded output.
- Companion restart scenarios measure first valid output timing and FDR evidence.
- Cold-start trials record first-fix latency and resource spikes.
-
-## Scope
-
-### Included
-
- NFT-RES-02 Sharp Turn And Disconnected Segment Relocalization.
- NFT-RES-03 Companion Computer Restart Mid-Flight.
- NFT-PERF-04 Cold Boot Time To First Fix.
- NFT-RES-LIM-05 Cold Start Resource Spike.
-
-### Excluded
-
- Long thermal endurance.
- FDR 8-hour rollover load.
- Cache poisoning and no-fetch security tests.
-
-## Acceptance Criteria
-
-**AC-1: Disconnected segments trigger relocalization**
-Given a sharp-turn or disconnected segment fixture
-When replay reaches the low-overlap transition
-Then relocalization is requested and the system either reconnects via verified anchor or reports degraded status.
-
-**AC-2: Companion restart recovery is measured**
-Given a replay/SITL mission in progress
-When the GPS-denied service is restarted
-Then first valid output timing, FC-state handoff behavior, and FDR restart evidence are recorded.
-
-**AC-3: Cold-start trials report first-fix timing**
-Given cold-start conditions and local cache/index prerequisites
-When 50 trials run or are blocked
-Then the p95 time-to-first-fix result or exact blocked prerequisite is reported.
-
-**AC-4: Cold-start resource spikes are captured**
-Given initialization begins
-When engines/indexes/cache are loaded
-Then peak memory and initialization-stage timing are recorded where measurable.
-
-## Non-Functional Requirements
-
-**Reliability**
- Missing calibration, public datasets, or hardware prerequisites must not be treated as passing.
-
-**Performance**
- First-fix timing and peak memory are reported with percentile summaries where enough trials run.
-
-## Unit Tests
-
-| AC Ref | What to Test | Required Outcome |
-|--------|--------------|------------------|
-| AC-1 | Relocalization trigger assertion | Missing-position thresholds trigger request checks |
-| AC-2 | Restart report parser | Restart and first-output events are present |
-| AC-3 | Trial aggregation | p95 first-fix summary or blocked reason is emitted |
-| AC-4 | Resource metric parser | Peak memory and stage timings are captured |
-
-## Blackbox Tests
-
-| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
-|--------|-------------------------|--------------|-------------------|----------------|
-| AC-1 | Sharp-turn/disconnected replay | NFT-RES-02 | Verified relocalization or degraded evidence | Reliability |
-| AC-2 | Mission restart trace | NFT-RES-03 | First valid output and FDR restart evidence | Reliability |
-| AC-3 | Cold-start harness | NFT-PERF-04 | p95 first fix <30 s or blocked prerequisite | Performance |
-| AC-4 | Cold-start resource monitoring | NFT-RES-LIM-05 | Peak memory <8 GB or blocked/failure evidence | Performance |
-
-## Constraints
-
- Restart tests must preserve fixture read-only guarantees.
- Trial loops must be bounded and report partial results if interrupted.
- Hardware-only assertions must be clearly marked when not runnable locally.
-
-## Risks & Mitigation
-
-**Risk 1: Long cold-start trials are expensive**
- *Risk*: Full 50-run evidence may not be practical on every PR.
- *Mitigation*: Support smoke mode for PRs and full mode for release gates, with clear report labels.
@@ -1,94 +0,0 @@
-# Jetson Resource Endurance Tests
-
-**Task**: AZ-239_jetson_resource_endurance_tests
-**Name**: Jetson Resource Endurance Tests
-**Description**: Implement release-gate resource and endurance tests for Jetson memory, thermal/power behavior, and FDR rollover.
-**Complexity**: 5 points
-**Dependencies**: AZ-233_test_infrastructure
-**Component**: Blackbox Tests
-**Tracker**: AZ-239
-**Epic**: AZ-218
-
-## Problem
-
-Release readiness requires hardware/resource evidence that cannot be proven by ordinary unit tests or short local replay runs.
-
-## Outcome
-
- Jetson memory and thermal/power metrics are captured where hardware is available.
- FDR 8-hour synthetic load verifies rollover, storage cap, and retained payload classes.
- Hardware-only prerequisites are reported as blocked when not available.
-
-## Scope
-
-### Included
-
- NFT-RES-LIM-01 Jetson Memory Budget.
- NFT-RES-LIM-02 Thermal And Power Envelope.
- NFT-RES-LIM-04 Flight Data Recorder Rollover.
-
-### Excluded
-
- Still-image replay accuracy.
- Satellite anchor/cache security tests.
- Cold-start first-fix trials.
-
-## Acceptance Criteria
-
-**AC-1: Jetson memory budget is measured**
-Given Jetson hardware or equivalent production target is available
-When sustained replay and trigger-path workload runs
-Then CPU/GPU shared memory, process RSS, CUDA allocations, and OOM/throttle status are recorded.
-
-**AC-2: Thermal and power endurance is validated or blocked**
-Given thermal test prerequisites are available
-When the sustained 25 W workload runs
-Then throttle flags, temperatures, clocks, and latency are recorded for the required duration; otherwise the run reports blocked prerequisites.
-
-**AC-3: FDR rollover is validated**
-Given an 8-hour synthetic mission load
-When FDR output reaches rollover conditions
-Then storage remains within the cap, rollover is logged, and no payload class is silently dropped.
-
-**AC-4: Evidence artifacts are complete**
-Given resource/endurance scenarios complete or block
-When reporting finishes
-Then metrics, duration, environment, status, and artifact paths are written.
-
-## Non-Functional Requirements
-
-**Performance**
- Resource evidence must include duration and sampling interval.
-
-**Reliability**
- Hardware-unavailable results are `blocked`, not `passed`.
-
-## Unit Tests
-
-| AC Ref | What to Test | Required Outcome |
-|--------|--------------|------------------|
-| AC-1 | Resource metric parser | Memory and throttle fields are present |
-| AC-2 | Blocked prerequisite reporter | Missing hardware/thermal setup records blocked status |
-| AC-3 | FDR rollover report parser | Storage, rollover, and payload-class fields are validated |
-| AC-4 | Evidence manifest writer | Artifact paths and run metadata are present |
-
-## Blackbox Tests
-
-| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
-|--------|-------------------------|--------------|-------------------|----------------|
-| AC-1 | Jetson/prod-equivalent hardware | NFT-RES-LIM-01 | Peak memory <8 GB or explicit failure | Performance |
-| AC-2 | Thermal/power test setup | NFT-RES-LIM-02 | No throttle over required duration or blocked/failure | Performance |
-| AC-3 | Synthetic 8-hour mission load | NFT-RES-LIM-04 | FDR cap and rollover behavior are evidenced | Reliability |
-| AC-4 | Resource/endurance reports | All included scenarios | Complete artifact manifest and status | Reliability |
-
-## Constraints
-
- These tests are release-gate oriented and may be skipped or blocked in ordinary PR mode.
- Raw frames must not be retained during FDR load tests.
- Resource tests must not write outside run-scoped output directories.
-
-## Risks & Mitigation
-
-**Risk 1: Hardware gates are unavailable during local development**
- *Risk*: Developers cannot run full evidence locally.
- *Mitigation*: Support blocked status and separate PR smoke mode from release-gate execution.
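
Note on the removed specs: both require that a missing prerequisite is reported as `blocked` rather than `passed`, and AZ-238 AC-3 asks for a p95 time-to-first-fix over 50 trials or the exact blocked prerequisite. The sketch below shows one way a test helper could aggregate trials under those rules. It is illustrative only and not taken from this commit: the names `TrialSummary`, `p95_nearest_rank`, and `summarize_first_fix` are hypothetical, the nearest-rank percentile method is an assumed choice, and treating an incomplete run as `blocked` while still reporting a partial p95 is an assumption based on the "report partial results if interrupted" constraint.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Thresholds quoted from the removed AZ-238 spec (AC-3 and the NFT-PERF-04 row).
FIRST_FIX_P95_LIMIT_S = 30.0
REQUIRED_TRIALS = 50


@dataclass
class TrialSummary:
    """Hypothetical report record; field names are assumptions."""
    status: str                    # "passed", "failed", or "blocked"
    p95_s: Optional[float]         # None when no samples were collected
    trials: int
    reason: Optional[str] = None   # set only for blocked results


def p95_nearest_rank(samples: Sequence[float]) -> float:
    """Return the 95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("p95 is undefined for an empty sample set")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize_first_fix(
    latencies_s: Sequence[float],
    blocked_reason: Optional[str] = None,
) -> TrialSummary:
    """Aggregate cold-start trials; never report 'passed' without full evidence."""
    n = len(latencies_s)
    partial_p95 = p95_nearest_rank(latencies_s) if n else None
    if blocked_reason is not None:
        return TrialSummary("blocked", partial_p95, n, blocked_reason)
    if n < REQUIRED_TRIALS:
        # Assumption: an interrupted run is blocked but keeps its partial p95.
        reason = f"only {n} of {REQUIRED_TRIALS} trials completed"
        return TrialSummary("blocked", partial_p95, n, reason)
    status = "passed" if partial_p95 < FIRST_FIX_P95_LIMIT_S else "failed"
    return TrialSummary(status, partial_p95, n)
```

A caller would pass the measured latencies in seconds, plus a `blocked_reason` string when a prerequisite such as the local cache or index is missing. A smoke mode for PRs would likely need a smaller required trial count and a distinct report label, which this sketch does not model.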