# Test Infrastructure

**Task**: AZ-233_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the blackbox and e2e test project: runner, deterministic fixtures, isolated replay/SITL environment, reporting, and external dependency stubs.
**Complexity**: 5 points
**Dependencies**: AZ-240_native_vio_backend_integration, AZ-241_real_satellite_vpr_descriptor_retrieval, AZ-242_real_anchor_feature_matching_ransac
**Component**: Blackbox Tests
**Tracker**: AZ-233
**Epic**: AZ-218

## Test Project Folder Layout

```text
e2e/
├── replay/
│   ├── run_replay.py
│   ├── scenarios/
│   └── reports/
├── fixtures/
│   ├── cache/
│   ├── mavlink/
│   ├── telemetry/
│   └── expected/
├── tests/
│   ├── test_still_image_replay.py
│   ├── test_vio_replay.py
│   ├── test_satellite_anchor.py
│   ├── test_blackout_spoofing.py
│   ├── test_resource_limits.py
│   └── test_security_gates.py
├── mocks/
│   ├── satellite_cache_stub/
│   ├── ardupilot_sitl/
│   └── qgc_observer/
└── reports/
```

### Layout Rationale

The test project keeps blackbox/e2e runner code outside product runtime internals. Scenario definitions, fixtures, mocks, and reports are separated so tests can reset state between runs and produce release evidence without importing private component modules.

Test implementation starts only after remediation tasks AZ-240, AZ-241, and AZ-242 close the native VIO, real satellite VPR, and real anchor matching gaps found during autodev verification.

## Mock Services

| Mock Service | Replaces | Interfaces | Behavior |
|-------------|----------|------------|----------|
| `satellite_cache_stub` | Offline Azaion Suite Satellite Service cache package | Local COG/manifest/descriptor fixture volume | Serves preloaded valid, stale, unsigned, hash-mismatched, and low-resolution cache fixtures; never performs network fetches during flight-mode tests. |
| `ardupilot_sitl` | ArduPilot Plane flight controller | MAVLink telemetry and `GPS_INPUT` receiving path | Emits generated IMU, attitude, GPS health, spoofing, and failsafe traces; records injected `GPS_INPUT` for assertions. |
| `qgc_observer` | QGroundControl status consumer | MAVLink/tlog parser | Records downsampled `STATUSTEXT`, status, and failsafe messages for rate and content assertions. |

### Mock Control API

Each mock or runner fixture must expose deterministic scenario controls for normal replay, stale cache, missing cache, spoofed GPS, blackout, restart, and resource-load modes. Recorded interactions must be queryable after each test run for assertions.

## Docker Test Environment

### `docker-compose.test.yml` Structure

| Service | Image / Build | Purpose | Depends On |
|---------|---------------|---------|------------|
| `gps-denied-service` | Project runtime image or local package mount | System under test | `satellite-cache-stub` |
| `replay-consumer` | Python replay/test harness | Feeds frames, telemetry, cache data, and faults | `gps-denied-service`, mock services |
| `satellite-cache-stub` | Fixture volume/service | Provides offline cache manifests, sidecars, descriptors, and generated invalid variants | none |
| `ardupilot-plane-sitl` | SITL container or local process wrapper | Validates `GPS_INPUT`, spoofing, and failsafe behavior | `gps-denied-service` |
| `qgc-observer` | MAVLink log parser | Verifies GCS-visible status output | `ardupilot-plane-sitl` |

### Networks and Volumes

- `replay-net`: connects the runtime, replay consumer, and satellite-cache stub.
- `sitl-net`: connects the runtime, ArduPilot Plane SITL, and QGC observer.
- `input-data`: read-only mount for `_docs/00_problem/input_data/`.
- `expected-results`: read-only mount for expected coordinate and report fixtures.
- `derkachi-replay`: read-only mount for `flight_derkachi.mp4` and `data_imu.csv`.
- `satellite-cache`: fixture cache volume with valid and invalid manifests.
- `fdr-output`: fresh per-run output volume for FDR and report artifacts.

## Test Runner Configuration

**Framework**: Python pytest-style replay harness.
**Entry point**: `run-blackbox-replay` or equivalent pytest command that executes scenario groups and writes reports.
**Reports**: CSV summary plus FDR validation Markdown.

### Fixture Strategy

| Fixture | Scope | Purpose |
|---------|-------|---------|
| `project_60_still_images` | session | Provides 60 nadir images and expected WGS84 centers. |
| `derkachi_video_telemetry` | session | Provides synchronized video, IMU, and `GLOBAL_POSITION_INT` replay data. |
| `cache_integrity_fixtures` | function | Provides valid, stale, unsigned, hash-mismatched, and low-resolution cache variants. |
| `sitl_spoofing_scenarios` | function | Provides generated GPS loss/spoofing and blackout traces. |
| `public_nadir_vio_candidates` | optional/session | Provides public or representative synchronized datasets when available. |

## Test Data Fixtures

| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| `project_60_still_images` | `_docs/00_problem/input_data/` | JPG + metadata | Still-image accuracy, confidence, latency smoke |
| `expected_frame_centers` | `_docs/00_problem/input_data/coordinates.csv` and expected-results report | CSV/Markdown | Geolocation assertions |
| `derkachi_video_telemetry` | `_docs/00_problem/input_data/flight_derkachi/` | MP4 + CSV | VIO replay, latency, resilience |
| `cache_integrity_fixtures` | generated fixture volume | COG/manifest/sidecar/index fixtures | Cache freshness, poisoning, no-fetch tests |
| `sitl_spoofing_scenarios` | generated by SITL harness | MAVLink/tlog traces | Spoofing, blackout, failsafe, GCS status |
| `public_nadir_vio_candidates` | pinned external fixtures | dataset-specific | Final VIO and satellite-anchor validation |

### Data Isolation

Every run uses read-only input fixtures and fresh run-scoped output directories.
FDR, generated tiles, tlogs, and reports are written only to per-run output volumes. Mock state and generated fixtures are reset before each scenario group.

## Test Reporting

**Format**: CSV summary and Markdown evidence report.
**Output paths**: `test-results/blackbox-report.csv` and `test-results/fdr-validation-summary.md`.
**Required columns**: Test ID, test name, input dataset, execution time, result, error distance, source label, covariance 95% semi-major, `GPS_INPUT.fix_type`, and error message.

## Acceptance Criteria

**AC-1: Test environment starts**
Given the Docker/replay test environment
When the test stack starts
Then the runtime, replay consumer, cache fixture, SITL, and observer services are reachable or report a clear blocked prerequisite.

**AC-2: External dependency stubs are deterministic**
Given a scenario config for cache, MAVLink, QGC, or fixture behavior
When the replay consumer executes it
Then mocks produce repeatable responses and expose recorded interactions for assertions.

**AC-3: Test runner executes scenario groups**
Given valid fixtures and a running test environment
When the test runner starts
Then it discovers and executes blackbox, performance, resilience, security, and resource-limit scenario groups.

**AC-4: Reports are generated**
Given a completed or blocked test run
When reporting finishes
Then CSV and Markdown evidence files are written with the required columns, metrics, artifact paths, and blocked-prerequisite reasons.

## Non-Functional Requirements

**Reliability**
- Missing hardware, public datasets, calibration, or SITL prerequisites are reported as `blocked`, not `passed`.

**Security**
- Fixture stubs must not access external satellite-provider or Suite service networks during in-flight test scenarios.

**Data Isolation**
- No test may mutate source fixtures or write FDR/generated-tile artifacts outside run-scoped output paths.
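
The reporting and reliability requirements above could be sketched roughly as follows. This is a minimal illustration, not the project's harness: `ScenarioResult`, `classify`, `write_blackbox_report`, the snake_case column names, and the `passed`/`failed`/`blocked` result vocabulary are hypothetical names chosen to mirror the required columns and the rule that a missing prerequisite is never reported as `passed`.

```python
import csv
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical column names mirroring the "Required columns" list;
# the real report may spell or order them differently.
REPORT_COLUMNS = [
    "test_id", "test_name", "input_dataset", "execution_time", "result",
    "error_distance", "source_label", "covariance_95_semi_major",
    "gps_input_fix_type", "error_message",
]

# Assumed result vocabulary; only `blocked` and `passed` are named in the spec.
ALLOWED_RESULTS = {"passed", "failed", "blocked"}


@dataclass
class ScenarioResult:
    """One report row; field names match REPORT_COLUMNS."""
    test_id: str
    test_name: str
    input_dataset: str
    execution_time: float
    result: str
    error_distance: Optional[float] = None
    source_label: str = ""
    covariance_95_semi_major: Optional[float] = None
    gps_input_fix_type: Optional[int] = None
    error_message: str = ""


def classify(prerequisites_met: bool, assertions_passed: bool) -> str:
    """A missing prerequisite always yields 'blocked', never 'passed'."""
    if not prerequisites_met:
        return "blocked"
    return "passed" if assertions_passed else "failed"


def write_blackbox_report(results: Iterable[ScenarioResult], path: Path) -> Path:
    """Write all rows to a CSV file, rejecting unknown result values."""
    rows = list(results)
    for row in rows:
        if row.result not in ALLOWED_RESULTS:
            raise ValueError(f"unknown result {row.result!r} for {row.test_id}")
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=REPORT_COLUMNS)
        writer.writeheader()
        for row in rows:
            # None values are written as empty cells by the csv module.
            writer.writerow(asdict(row))
    return path
```

A scenario whose SITL prerequisite is unavailable would then be recorded with `classify(False, ...)` and a reason in `error_message`, so the CSV carries the blocked-prerequisite evidence that AC-4 asks for.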
## Constraints

- The test suite must use public runtime boundaries only: navigation frames, telemetry, offline cache, MAVLink output, QGC status, and FDR outputs.
- The suite must not import private estimator, BASALT, wrapper, or tile-manager internals.
- Hardware-specific Jetson gates remain release-gate tests and may be skipped or blocked in ordinary local replay.

## Risks & Mitigation

**Risk 1: Environment prerequisites hide real failures**
- *Risk*: Missing hardware, calibration, or datasets could be treated as success.
- *Mitigation*: Report unavailable prerequisites as `blocked` with explicit artifact evidence.

**Risk 2: Fixture mutation contaminates later runs**
- *Risk*: Generated FDR, cache, or SITL output changes expected input fixtures.
- *Mitigation*: Use read-only fixture mounts and fresh run-scoped output volumes for every execution.
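
As a complement to read-only mounts, the Risk 2 mitigation could also be checked from inside the harness. The sketch below is one possible approach under stated assumptions: `snapshot_fixtures`, `new_run_output_dir`, and `changed_fixtures` are hypothetical helper names, and hashing fixture files with SHA-256 before and after a run is an illustrative technique rather than something the task prescribes.

```python
import hashlib
import uuid
from pathlib import Path
from typing import Dict, List


def snapshot_fixtures(fixture_root: Path) -> Dict[str, str]:
    """Map each fixture file's relative path to its SHA-256 digest."""
    digests = {}
    for path in sorted(fixture_root.rglob("*")):
        if path.is_file():
            key = path.relative_to(fixture_root).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def new_run_output_dir(output_root: Path) -> Path:
    """Create a fresh, uniquely named output directory for one run."""
    run_dir = output_root / f"run-{uuid.uuid4().hex}"
    # exist_ok=False makes accidental reuse of a run directory an error.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def changed_fixtures(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return relative paths that were added, removed, or modified."""
    return sorted(
        key for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    )
```

A runner could take one snapshot before a scenario group and another afterwards, and fail the group if `changed_fixtures` returns anything. Hashing large video fixtures on every run may be too slow in practice, in which case the read-only mount alone would carry the guarantee.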