[AZ-238] [AZ-239] Add resource restart tests

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 11:31:13 +00:00 · 2026-05-05 06:26:15 +03:00
parent 5acd14b792
commit 2ba44a33c5
8 changed files with 274 additions and 3 deletions
@@ -0,0 +1,95 @@
+# Cold Start Restart Tests
+
+**Task**: AZ-238_cold_start_restart_tests
+**Name**: Cold Start Restart Tests
+**Description**: Implement tests for cold-start time to first fix, companion computer restart, sharp-turn/disconnected-segment relocalization, and cold-start resource spikes.
+**Complexity**: 5 points
+**Dependencies**: AZ-233_test_infrastructure
+**Component**: Blackbox Tests
+**Tracker**: AZ-238
+**Epic**: AZ-218
+
+## Problem
+
+The test suite must prove that the runtime recovers from disconnected visual segments and companion restarts without hiding missing prerequisites or unsafe degraded behavior.
+
+## Outcome
+
+- Sharp-turn/disconnected-segment scenarios trigger relocalization or explicit degraded output.
+- Companion restart scenarios measure time to first valid output and record FDR restart evidence.
+- Cold-start trials record first-fix latency and resource spikes.
+
+## Scope
+
+### Included
+
+- NFT-RES-02 Sharp Turn And Disconnected Segment Relocalization.
+- NFT-RES-03 Companion Computer Restart Mid-Flight.
+- NFT-PERF-04 Cold Boot Time To First Fix.
+- NFT-RES-LIM-05 Cold Start Resource Spike.
+
+### Excluded
+
+- Long thermal endurance.
+- FDR 8-hour rollover load.
+- Cache poisoning and no-fetch security tests.
+
+## Acceptance Criteria
+
+**AC-1: Disconnected segments trigger relocalization**
+Given a sharp-turn or disconnected segment fixture
+When replay reaches the low-overlap transition
+Then relocalization is requested and the system either reconnects via a verified anchor or reports degraded status.
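+
+A minimal sketch of how the AC-1 assertion could be checked against a replay event trace. Python is an assumption, since the harness language is not fixed by this task. The event names and the `check_relocalization` helper are hypothetical. A fixture that never reaches the missing-position threshold raises an error instead of passing.
+
+```python
+from typing import Sequence
+
+# Hypothetical event names; the runtime's real event vocabulary is not defined in this task.
+MISSING, FIX = "position_missing", "position_fix"
+REQUESTED = "relocalization_requested"
+OUTCOMES = {"anchor_verified": "reconnected", "degraded_status": "degraded"}
+
+
+def check_relocalization(events: Sequence[str], max_missing: int) -> str:
+    """Return "reconnected" or "degraded"; raise AssertionError otherwise."""
+    missing_run, run_start = 0, 0
+    for i, name in enumerate(events):
+        if name == MISSING:
+            if missing_run == 0:
+                run_start = i  # first frame of the current gap
+            missing_run += 1
+            if missing_run >= max_missing:
+                break
+        elif name == FIX:
+            missing_run = 0
+    else:
+        # A fixture that never crosses the threshold proves nothing; do not pass it.
+        raise AssertionError("fixture never reached the missing-position threshold")
+
+    # The request may occur anywhere from the start of the gap onward.
+    window = list(events[run_start:])
+    if REQUESTED not in window:
+        raise AssertionError("threshold crossed but no relocalization was requested")
+    # The request must resolve to a verified anchor or an explicit degraded status.
+    for name in window[window.index(REQUESTED) + 1:]:
+        if name in OUTCOMES:
+            return OUTCOMES[name]
+    raise AssertionError("relocalization neither reconnected nor reported degraded status")
+```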
+
+**AC-2: Companion restart recovery is measured**
+Given a replay/SITL mission in progress
+When the GPS-denied service is restarted
+Then first valid output timing, FC-state handoff behavior, and FDR restart evidence are recorded.
+
+**AC-3: Cold-start trials report first-fix timing**
+Given cold-start conditions and local cache/index prerequisites
+When 50 trials are run, or the run is blocked by a missing prerequisite
+Then the p95 time to first fix, or the exact blocking prerequisite, is reported.
+
+**AC-4: Cold-start resource spikes are captured**
+Given initialization begins
+When engines/indexes/cache are loaded
+Then peak memory and initialization-stage timing are recorded where measurable.
+
+## Non-Functional Requirements
+
+**Reliability**
+- Missing calibration, public datasets, or hardware prerequisites must not be treated as passing.
+
+**Performance**
+- First-fix timing and peak memory are reported with percentile summaries when enough trials have completed.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Relocalization trigger assertion | Crossing the missing-position threshold is checked to trigger a relocalization request |
+| AC-2 | Restart report parser | Restart and first-output events are present |
+| AC-3 | Trial aggregation | p95 first-fix summary or blocked reason is emitted |
+| AC-4 | Resource metric parser | Peak memory and stage timings are captured |
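+
+The trial-aggregation unit test could target something like the sketch below. It assumes Python, a hypothetical `FirstFixSummary` record, and the nearest-rank definition of p95. Other percentile definitions give slightly different values on small samples. The 30 s default mirrors the NFT-PERF-04 target. A blocked prerequisite or an empty trial set never yields `passed`.
+
+```python
+import math
+from dataclasses import dataclass
+from typing import Optional, Sequence
+
+
+# Hypothetical summary record; field names are illustrative only.
+@dataclass
+class FirstFixSummary:
+    status: str                          # "passed", "failed", or "blocked"
+    completed_trials: int
+    p95_s: Optional[float] = None        # nearest-rank p95 time to first fix, seconds
+    blocked_reason: Optional[str] = None
+
+
+def nearest_rank_p95(samples: Sequence[float]) -> float:
+    """Smallest sample such that at least 95% of samples are <= it."""
+    ordered = sorted(samples)
+    rank = math.ceil(95 * len(ordered) / 100)  # 1-based nearest rank
+    return ordered[rank - 1]
+
+
+def summarize_first_fix(samples_s: Sequence[float],
+                        blocked_reason: Optional[str] = None,
+                        threshold_s: float = 30.0) -> FirstFixSummary:
+    # A missing prerequisite is reported as blocked, never as passed.
+    if blocked_reason is not None:
+        return FirstFixSummary("blocked", len(samples_s), blocked_reason=blocked_reason)
+    # Assumption: zero completed trials cannot support a pass either.
+    if not samples_s:
+        return FirstFixSummary("blocked", 0, blocked_reason="no completed trials")
+    p95 = nearest_rank_p95(samples_s)
+    status = "passed" if p95 < threshold_s else "failed"
+    return FirstFixSummary(status, len(samples_s), p95_s=p95)
+```
+
+With 20 trials, the nearest-rank p95 is the 19th-fastest sample. One slow outlier therefore leaves the result unchanged, but two outliers move it.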
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | Sharp-turn/disconnected replay | NFT-RES-02 | Verified relocalization or degraded evidence | Reliability |
+| AC-2 | Mission restart trace | NFT-RES-03 | First valid output and FDR restart evidence | Reliability |
+| AC-3 | Cold-start harness | NFT-PERF-04 | p95 first fix <30 s or blocked prerequisite | Performance |
+| AC-4 | Cold-start resource monitoring | NFT-RES-LIM-05 | Peak memory <8 GB or blocked/failure evidence | Performance |
+
+## Constraints
+
+- Restart tests must preserve fixture read-only guarantees.
+- Trial loops must be bounded and report partial results if interrupted.
+- Hardware-only assertions must be clearly marked when not runnable locally.
+
+## Risks & Mitigation
+
+**Risk 1: Long cold-start trials are expensive**
+- *Risk*: Full 50-trial evidence may not be practical on every PR.
+- *Mitigation*: Support smoke mode for PRs and full mode for release gates, with clear report labels.
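+
+A minimal sketch of a bounded trial loop with mode labels. It assumes Python and a hypothetical `run_trial` callable that returns one time to first fix in seconds. The full-mode count of 50 comes from AC-3. The smoke count and the default one-hour wall-clock budget are placeholder assumptions. The loop stops at whichever bound it hits first, and on interruption it returns labeled partial results, as the Constraints section requires.
+
+```python
+import time
+from typing import Callable, Dict, List
+
+# Trial counts per mode: 50 is the AC-3 release-gate count; 3 is a placeholder smoke count.
+TRIALS_BY_MODE: Dict[str, int] = {"smoke": 3, "full": 50}
+
+
+def run_bounded_trials(run_trial: Callable[[], float], mode: str,
+                       budget_s: float = 3600.0) -> dict:
+    """Run at most TRIALS_BY_MODE[mode] trials within budget_s of wall-clock time."""
+    planned = TRIALS_BY_MODE[mode]
+    results: List[float] = []
+    stop_reason = "completed"
+    deadline = time.monotonic() + budget_s
+    try:
+        for _ in range(planned):
+            if time.monotonic() >= deadline:  # checked between trials only
+                stop_reason = "time_budget_exceeded"
+                break
+            results.append(run_trial())
+    except KeyboardInterrupt:
+        # Keep whatever finished so the report shows partial evidence.
+        stop_reason = "interrupted"
+    return {
+        "mode": mode,  # report label: smoke vs full
+        "planned_trials": planned,
+        "completed_trials": len(results),
+        "partial": len(results) < planned,
+        "stop_reason": stop_reason,
+        "first_fix_s": results,
+    }
+```
+
+The budget is checked only between trials, so one long trial can still overrun it. A per-trial timeout would need to live inside `run_trial`.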
@@ -0,0 +1,94 @@
+# Jetson Resource Endurance Tests
+
+**Task**: AZ-239_jetson_resource_endurance_tests
+**Name**: Jetson Resource Endurance Tests
+**Description**: Implement release-gate resource and endurance tests for Jetson memory, thermal/power behavior, and FDR rollover.
+**Complexity**: 5 points
+**Dependencies**: AZ-233_test_infrastructure
+**Component**: Blackbox Tests
+**Tracker**: AZ-239
+**Epic**: AZ-218
+
+## Problem
+
+Release readiness requires hardware/resource evidence that cannot be proven by ordinary unit tests or short local replay runs.
+
+## Outcome
+
+- Jetson memory and thermal/power metrics are captured where hardware is available.
+- FDR 8-hour synthetic load verifies rollover, storage cap, and retained payload classes.
+- Hardware-only prerequisites are reported as blocked when not available.
+
+## Scope
+
+### Included
+
+- NFT-RES-LIM-01 Jetson Memory Budget.
+- NFT-RES-LIM-02 Thermal And Power Envelope.
+- NFT-RES-LIM-04 Flight Data Recorder Rollover.
+
+### Excluded
+
+- Still-image replay accuracy.
+- Satellite anchor/cache security tests.
+- Cold-start first-fix trials.
+
+## Acceptance Criteria
+
+**AC-1: Jetson memory budget is measured**
+Given Jetson hardware or an equivalent production target is available
+When a sustained replay and trigger-path workload runs
+Then CPU/GPU shared memory, process RSS, CUDA allocations, and OOM/throttle status are recorded.
+
+**AC-2: Thermal and power endurance is validated or blocked**
+Given thermal test prerequisites are available
+When the sustained 25 W workload runs
+Then throttle flags, temperatures, clocks, and latency are recorded for the required duration; otherwise the run reports blocked prerequisites.
+
+**AC-3: FDR rollover is validated**
+Given an 8-hour synthetic mission load
+When FDR output reaches rollover conditions
+Then storage remains within the cap, rollover is logged, and no payload class is silently dropped.
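+
+One way to validate the AC-3 evidence is a checker over the rollover report, sketched here in Python. The report keys (`peak_storage_bytes`, `rollover_events`, `retained_classes`, `rolled_over_classes`) and the `raw_frame` class name are hypothetical. The check treats a payload class as silently dropped when it is neither retained nor covered by a logged rollover. It also enforces the no-raw-frames constraint.
+
+```python
+from typing import Dict, Iterable, List
+
+
+def check_fdr_rollover(report: Dict, cap_bytes: int,
+                       expected_classes: Iterable[str]) -> List[str]:
+    """Return a list of problems; an empty list means the AC-3 evidence holds."""
+    problems: List[str] = []
+    if report["peak_storage_bytes"] > cap_bytes:
+        problems.append(f"storage {report['peak_storage_bytes']} B exceeds cap {cap_bytes} B")
+    if report["rollover_events"] == 0:
+        problems.append("no rollover was logged")
+    # A class is accounted for if it is still retained or its rollover was logged.
+    accounted = set(report["retained_classes"]) | set(report["rolled_over_classes"])
+    for cls in sorted(set(expected_classes) - accounted):
+        problems.append(f"payload class {cls!r} was silently dropped")
+    # Constraint: raw frames must not be retained during FDR load tests.
+    if "raw_frame" in report["retained_classes"]:
+        problems.append("raw frames were retained")
+    return problems
+```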
+
+**AC-4: Evidence artifacts are complete**
+Given resource/endurance scenarios have completed or been blocked
+When reporting finishes
+Then metrics, duration, environment, status, and artifact paths are written.
+
+## Non-Functional Requirements
+
+**Performance**
+- Resource evidence must include duration and sampling interval.
+
+**Reliability**
+- Hardware-unavailable results are `blocked`, not `passed`.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|--------------|------------------|
+| AC-1 | Resource metric parser | Memory and throttle fields are present |
+| AC-2 | Blocked prerequisite reporter | Missing hardware/thermal setup records blocked status |
+| AC-3 | FDR rollover report parser | Storage, rollover, and payload-class fields are validated |
+| AC-4 | Evidence manifest writer | Artifact paths and run metadata are present |
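+
+A minimal sketch of the evidence manifest writer and the blocked-status rule, assuming Python and a JSON manifest. The `EvidenceManifest` fields, the file naming, and the helper names are hypothetical. The sketch records the duration and sampling interval that the Performance NFR requires, maps any missing prerequisite to `blocked`, and refuses to write outside the run-scoped directory.
+
+```python
+import json
+import tempfile
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+from typing import Dict, List
+
+
+@dataclass
+class EvidenceManifest:
+    # Field names are illustrative; AC-4 only requires these kinds of data.
+    scenario: str                  # e.g. "NFT-RES-LIM-01"
+    status: str                    # "passed", "failed", or "blocked"
+    duration_s: float
+    sampling_interval_s: float
+    environment: Dict[str, str]
+    metrics: Dict[str, float] = field(default_factory=dict)
+    artifact_paths: List[str] = field(default_factory=list)
+    blocked_prerequisites: List[str] = field(default_factory=list)
+
+
+def resolve_status(blocked_prerequisites: List[str], failures: List[str]) -> str:
+    # Missing hardware or setup yields "blocked"; it never collapses into "passed".
+    if blocked_prerequisites:
+        return "blocked"
+    return "failed" if failures else "passed"
+
+
+def write_manifest(manifest: EvidenceManifest, run_dir: Path) -> Path:
+    run_dir.mkdir(parents=True, exist_ok=True)
+    out = run_dir / f"{manifest.scenario}.manifest.json"
+    # Refuse to write outside the run-scoped output directory.
+    if out.resolve().parent != run_dir.resolve():
+        raise ValueError(f"manifest path escapes run directory: {out}")
+    out.write_text(json.dumps(asdict(manifest), indent=2, sort_keys=True))
+    return out
+
+
+if __name__ == "__main__":
+    missing = ["Jetson target not detected"]
+    demo = EvidenceManifest(
+        scenario="NFT-RES-LIM-01",
+        status=resolve_status(missing, failures=[]),
+        duration_s=0.0,
+        sampling_interval_s=1.0,
+        environment={"host": "dev-laptop"},
+        blocked_prerequisites=missing,
+    )
+    with tempfile.TemporaryDirectory() as tmp:
+        print(write_manifest(demo, Path(tmp)).read_text())
+```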
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|-------------------------|--------------|-------------------|----------------|
+| AC-1 | Jetson/prod-equivalent hardware | NFT-RES-LIM-01 | Peak memory <8 GB or explicit failure | Performance |
+| AC-2 | Thermal/power test setup | NFT-RES-LIM-02 | No throttling over the required duration, or blocked/failure evidence | Performance |
+| AC-3 | Synthetic 8-hour mission load | NFT-RES-LIM-04 | FDR cap and rollover behavior are evidenced | Reliability |
+| AC-4 | Resource/endurance reports | All included scenarios | Complete artifact manifest and status | Reliability |
+
+## Constraints
+
+- These tests are release-gate oriented and may be skipped or blocked in ordinary PR mode.
+- Raw frames must not be retained during FDR load tests.
+- Resource tests must not write outside run-scoped output directories.
+
+## Risks & Mitigation
+
+**Risk 1: Hardware gates are unavailable during local development**
+- *Risk*: Developers cannot run full evidence locally.
+- *Mitigation*: Support blocked status and separate PR smoke mode from release-gate execution.