Test Infrastructure
- Task: AZ-233_test_infrastructure
- Name: Test Infrastructure
- Description: Scaffold the blackbox and e2e test project: runner, deterministic fixtures, isolated replay/SITL environment, reporting, and external dependency stubs.
- Complexity: 5 points
- Dependencies: AZ-240_native_vio_backend_integration, AZ-241_real_satellite_vpr_descriptor_retrieval, AZ-242_real_anchor_feature_matching_ransac
- Component: Blackbox Tests
- Tracker: AZ-233
- Epic: AZ-218
Test Project Folder Layout
```
e2e/
├── replay/
│   ├── run_replay.py
│   ├── scenarios/
│   └── reports/
├── fixtures/
│   ├── cache/
│   ├── mavlink/
│   ├── telemetry/
│   └── expected/
├── tests/
│   ├── test_still_image_replay.py
│   ├── test_vio_replay.py
│   ├── test_satellite_anchor.py
│   ├── test_blackout_spoofing.py
│   ├── test_resource_limits.py
│   └── test_security_gates.py
├── mocks/
│   ├── satellite_cache_stub/
│   ├── ardupilot_sitl/
│   └── qgc_observer/
└── reports/
```
Layout Rationale
The test project keeps blackbox/e2e runner code outside product runtime internals. Scenario definitions, fixtures, mocks, and reports are separated so tests can reset state between runs and produce release evidence without importing private component modules.
Test implementation starts only after remediation tasks AZ-240, AZ-241, and AZ-242 close the native VIO, real satellite VPR, and real anchor matching gaps found during autodev verification.
Mock Services
| Mock Service | Replaces | Interfaces | Behavior |
|---|---|---|---|
| satellite_cache_stub | Offline Azaion Suite Satellite Service cache package | Local COG/manifest/descriptor fixture volume | Serves preloaded valid, stale, unsigned, hash-mismatched, and low-resolution cache fixtures; never performs network fetches during flight-mode tests. |
| ardupilot_sitl | ArduPilot Plane flight controller | MAVLink telemetry and GPS_INPUT receiving path | Emits generated IMU, attitude, GPS health, spoofing, and failsafe traces; records injected GPS_INPUT for assertions. |
| qgc_observer | QGroundControl status consumer | MAVLink/tlog parser | Records downsampled STATUSTEXT, status, and failsafe messages for rate and content assertions. |
Mock Control API
Each mock or runner fixture must expose deterministic scenario controls for normal replay, stale cache, missing cache, spoofed GPS, blackout, restart, and resource-load modes. Recorded interactions must be queryable after each test run for assertions.
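The following minimal Python sketch shows one way a mock could satisfy these controls. It returns a fixed response per scenario mode and keeps an interaction log that tests can query after the run. `RecordingMock`, `ScenarioMode`, and the response payloads are hypothetical names, not existing project code. A real stub would wrap its cache, MAVLink, or tlog transport around the same idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class ScenarioMode(Enum):
    # Scenario controls named in the Mock Control API requirement.
    NORMAL_REPLAY = "normal_replay"
    STALE_CACHE = "stale_cache"
    MISSING_CACHE = "missing_cache"
    SPOOFED_GPS = "spoofed_gps"
    BLACKOUT = "blackout"
    RESTART = "restart"
    RESOURCE_LOAD = "resource_load"


@dataclass(frozen=True)
class Interaction:
    mode: ScenarioMode
    request: str
    response: Any


@dataclass
class RecordingMock:
    """Hypothetical mock base: one fixed response per mode plus an interaction log."""

    responses: dict[ScenarioMode, Any]
    mode: ScenarioMode = ScenarioMode.NORMAL_REPLAY
    log: list[Interaction] = field(default_factory=list)

    def set_mode(self, mode: ScenarioMode) -> None:
        self.mode = mode

    def reset(self) -> None:
        # Run before each scenario group so no state leaks between runs.
        self.mode = ScenarioMode.NORMAL_REPLAY
        self.log.clear()

    def handle(self, request: str) -> Any:
        # No randomness or clock reads: the same mode and request always
        # produce the same response, which keeps reruns repeatable.
        response = self.responses.get(
            self.mode, {"error": f"mode {self.mode.value} not configured"}
        )
        self.log.append(Interaction(self.mode, request, response))
        return response

    def interactions(self, mode: Optional[ScenarioMode] = None) -> list[Interaction]:
        # Query recorded calls after the test, optionally filtered by mode.
        return [i for i in self.log if mode is None or i.mode == mode]


# Usage: a cache stub that answers differently in the stale-cache scenario.
cache_stub = RecordingMock(responses={
    ScenarioMode.NORMAL_REPLAY: {"manifest": "valid"},
    ScenarioMode.STALE_CACHE: {"manifest": "stale"},
})
cache_stub.set_mode(ScenarioMode.STALE_CACHE)
cache_stub.handle("GET manifest")
```

Keeping responses as plain data, not callbacks, makes the repeatability required by AC-2 easy to audit.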
Docker Test Environment
docker-compose.test.yml Structure
| Service | Image / Build | Purpose | Depends On |
|---|---|---|---|
| gps-denied-service | Project runtime image or local package mount | System under test | satellite-cache-stub |
| replay-consumer | Python replay/test harness | Feeds frames, telemetry, cache data, and faults | gps-denied-service, mock services |
| satellite-cache-stub | Fixture volume/service | Provides offline cache manifests, sidecars, descriptors, and generated invalid variants | none |
| ardupilot-plane-sitl | SITL container or local process wrapper | Validates GPS_INPUT, spoofing, and failsafe behavior | gps-denied-service |
| qgc-observer | MAVLink log parser | Verifies GCS-visible status output | ardupilot-plane-sitl |
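The `depends_on` edges must not form a cycle, so a topological sort can check the startup order implied by the table. This sketch assumes that "mock services" in the `replay-consumer` row means all three mocks. `DEPENDS_ON`, `startup_order`, and `undeclared_dependencies` are illustrative helpers and do not come from any existing compose file.

```python
from graphlib import CycleError, TopologicalSorter

# "Depends On" column of the table above. Assumption: "mock services" in the
# replay-consumer row means the cache stub, the SITL wrapper, and the observer.
DEPENDS_ON: dict[str, set[str]] = {
    "gps-denied-service": {"satellite-cache-stub"},
    "replay-consumer": {
        "gps-denied-service",
        "satellite-cache-stub",
        "ardupilot-plane-sitl",
        "qgc-observer",
    },
    "satellite-cache-stub": set(),
    "ardupilot-plane-sitl": {"gps-denied-service"},
    "qgc-observer": {"ardupilot-plane-sitl"},
}


def undeclared_dependencies(graph: dict[str, set[str]]) -> set[str]:
    # Dependencies that name a service missing from the compose file.
    return {dep for deps in graph.values() for dep in deps} - set(graph)


def startup_order(graph: dict[str, set[str]]) -> list[str]:
    # Dependencies come before dependents; iterating raises CycleError
    # if the depends_on edges are circular.
    return list(TopologicalSorter(graph).static_order())
```

With the table as written, the order is a single chain: `satellite-cache-stub`, `gps-denied-service`, `ardupilot-plane-sitl`, `qgc-observer`, `replay-consumer`.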
Networks and Volumes
- `replay-net`: connects the runtime, replay consumer, and satellite-cache stub.
- `sitl-net`: connects the runtime, ArduPilot Plane SITL, and QGC observer.
- `input-data`: read-only mount for `_docs/00_problem/input_data/`.
- `expected-results`: read-only mount for expected coordinate and report fixtures.
- `derkachi-replay`: read-only mount for `flight_derkachi.mp4` and `data_imu.csv`.
- `satellite-cache`: fixture cache volume with valid and invalid manifests.
- `fdr-output`: fresh per-run output volume for FDR and report artifacts.
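One way to keep the network split explicit is to model it as data and assert on it. The sketch below copies the two network memberships listed above and checks which services can reach each other directly. `NETWORKS`, `can_talk`, and `bridges` are hypothetical names. Real reachability would still need to be confirmed against the running compose stack.

```python
# Network memberships as listed for replay-net and sitl-net.
NETWORKS: dict[str, set[str]] = {
    "replay-net": {"gps-denied-service", "replay-consumer", "satellite-cache-stub"},
    "sitl-net": {"gps-denied-service", "ardupilot-plane-sitl", "qgc-observer"},
}


def can_talk(a: str, b: str, networks: dict[str, set[str]] = NETWORKS) -> bool:
    # Two containers reach each other directly only if they share a network.
    return any(a in members and b in members for members in networks.values())


def bridges(networks: dict[str, set[str]] = NETWORKS) -> set[str]:
    # Services attached to more than one network join otherwise separate segments.
    counts: dict[str, int] = {}
    for members in networks.values():
        for service in members:
            counts[service] = counts.get(service, 0) + 1
    return {service for service, n in counts.items() if n > 1}
```

As listed, the runtime is the only service on both networks. The replay consumer and the satellite-cache stub therefore reach SITL and the QGC observer only through it.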
Test Runner Configuration
- Framework: Python pytest-style replay harness.
- Entry point: `run-blackbox-replay` or an equivalent pytest command that executes scenario groups and writes reports.
- Reports: CSV summary plus FDR validation Markdown.
Fixture Strategy
| Fixture | Scope | Purpose |
|---|---|---|
| project_60_still_images | session | Provides 60 nadir images and expected WGS84 centers. |
| derkachi_video_telemetry | session | Provides synchronized video, IMU, and GLOBAL_POSITION_INT replay data. |
| cache_integrity_fixtures | function | Provides valid, stale, unsigned, hash-mismatched, and low-resolution cache variants. |
| sitl_spoofing_scenarios | function | Provides generated GPS loss/spoofing and blackout traces. |
| public_nadir_vio_candidates | optional/session | Provides public or representative synchronized datasets when available. |
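The `sitl_spoofing_scenarios` fixture is function-scoped, so each test regenerates its trace. Seeding the generator is what makes that repeatable. This is a hedged sketch with invented parameters: the time windows, sample rate, placeholder origin, and offset are all assumptions. `fix_type` uses MAVLink `GPS_FIX_TYPE` values (1 = no fix, 3 = 3D fix). Output from `random.Random` for a given seed should be treated as stable only within one pinned Python version.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

GPS_FIX_TYPE_NO_FIX = 1  # MAVLink GPS_FIX_TYPE: receiver connected, no position
GPS_FIX_TYPE_3D_FIX = 3


@dataclass(frozen=True)
class GpsSample:
    t_s: float
    lat_deg: Optional[float]  # None while the receiver has no fix
    lon_deg: Optional[float]
    fix_type: int
    spoofed: bool  # ground-truth label for assertions, not sent to the autopilot


def spoofing_trace(
    seed: int,
    duration_s: float = 60.0,
    rate_hz: float = 5.0,
    blackout_s: Tuple[float, float] = (20.0, 30.0),
    spoof_s: Tuple[float, float] = (40.0, 50.0),
    origin_deg: Tuple[float, float] = (0.0, 0.0),  # placeholder origin
    spoof_offset_deg: float = 0.01,
) -> List[GpsSample]:
    """Hypothetical generator for a GPS trace with a blackout and a spoofing window."""
    rng = random.Random(seed)  # private generator: no shared global state
    samples = []
    for i in range(int(duration_s * rate_hz)):
        t = i / rate_hz
        # Draw noise for every sample, even in blackout, so the random stream
        # (and therefore every later sample) does not depend on window placement.
        noise_lat, noise_lon = rng.gauss(0.0, 1e-5), rng.gauss(0.0, 1e-5)
        lat = origin_deg[0] + 1e-5 * t + noise_lat  # slow northward drift
        lon = origin_deg[1] + noise_lon
        if blackout_s[0] <= t < blackout_s[1]:
            samples.append(GpsSample(t, None, None, GPS_FIX_TYPE_NO_FIX, False))
        elif spoof_s[0] <= t < spoof_s[1]:
            # Spoofed fix: a healthy-looking 3D fix displaced from the true track.
            samples.append(GpsSample(t, lat + spoof_offset_deg, lon, GPS_FIX_TYPE_3D_FIX, True))
        else:
            samples.append(GpsSample(t, lat, lon, GPS_FIX_TYPE_3D_FIX, False))
    return samples
```

Carrying the `spoofed` label alongside each sample lets assertions compare runtime detections with ground truth without re-deriving the windows.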
Test Data Fixtures
| Data Set | Source | Format | Used By |
|---|---|---|---|
| project_60_still_images | `_docs/00_problem/input_data/` | JPG + metadata | Still-image accuracy, confidence, latency smoke |
| expected_frame_centers | `_docs/00_problem/input_data/coordinates.csv` and expected-results report | CSV/Markdown | Geolocation assertions |
| derkachi_video_telemetry | `_docs/00_problem/input_data/flight_derkachi/` | MP4 + CSV | VIO replay, latency, resilience |
| cache_integrity_fixtures | generated fixture volume | COG/manifest/sidecar/index fixtures | Cache freshness, poisoning, no-fetch tests |
| sitl_spoofing_scenarios | generated by SITL harness | MAVLink/tlog traces | Spoofing, blackout, failsafe, GCS status |
| public_nadir_vio_candidates | pinned external fixtures | dataset-specific | Final VIO and satellite-anchor validation |
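The `cache_integrity_fixtures` data set is generated, not stored, so a generator and a matching classifier can live side by side. The sketch below derives the five variants from one valid manifest. Several details are assumptions for illustration:

- The manifest fields (`tile_sha256`, `signature`, `captured_at`, `resolution_m`).
- The 30-day freshness window.
- The 1 m/px resolution limit.
- The signature, which is a placeholder string rather than real cryptography.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Dict

MAX_AGE = timedelta(days=30)  # assumed freshness window
MAX_RESOLUTION_M = 1.0  # assumed coarsest acceptable ground sample distance


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_variants(tile: bytes, now: datetime) -> Dict[str, dict]:
    """Derive the five cache fixture variants from one valid manifest."""
    valid = {
        "tile_sha256": sha256_hex(tile),
        "signature": "fixture-signature",  # placeholder, not a real signature
        "captured_at": (now - timedelta(days=1)).isoformat(),
        "resolution_m": 0.5,
    }
    return {
        "valid": valid,
        "stale": {**valid, "captured_at": (now - MAX_AGE - timedelta(days=1)).isoformat()},
        "unsigned": {k: v for k, v in valid.items() if k != "signature"},
        "hash_mismatched": {**valid, "tile_sha256": sha256_hex(tile + b"tampered")},
        "low_resolution": {**valid, "resolution_m": 5.0},
    }


def classify(manifest: dict, tile: bytes, now: datetime) -> str:
    """Return the first integrity problem found, or "valid"."""
    # Authenticity checks run first: a tampered manifest's timestamp and
    # resolution fields cannot be trusted.
    if "signature" not in manifest:
        return "unsigned"
    if manifest["tile_sha256"] != sha256_hex(tile):
        return "hash_mismatched"
    if now - datetime.fromisoformat(manifest["captured_at"]) > MAX_AGE:
        return "stale"
    if manifest["resolution_m"] > MAX_RESOLUTION_M:
        return "low_resolution"
    return "valid"


# Usage: regenerate the variants for each function-scoped test.
fixtures = make_variants(b"placeholder tile bytes", datetime.now(timezone.utc))
```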
Data Isolation
Every run uses read-only input fixtures and fresh run-scoped output directories. FDR, generated tiles, tlogs, and reports are written only to per-run output volumes. Mock state and generated fixtures are reset before each scenario group.
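A runner could also enforce these isolation rules mechanically. It can hash the fixture tree before and after a run, and it can refuse to reuse an output directory. `tree_digest` and `isolated_run` below are hypothetical helpers that use only the standard library. Read-only Docker mounts remain the primary guard; this check only catches mutations that slip through.

```python
import hashlib
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def tree_digest(root: Path) -> str:
    """Hash every file's relative path and contents under root, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        h.update(path.relative_to(root).as_posix().encode())
        h.update(b"\0")  # paths cannot contain NUL, so this separator is unambiguous
        h.update(len(data).to_bytes(8, "big"))  # length prefix marks the content boundary
        h.update(data)
    return h.hexdigest()


@contextmanager
def isolated_run(fixtures: Path, output_base: Path, run_id: str) -> Iterator[Path]:
    before = tree_digest(fixtures)
    run_dir = output_base / run_id
    # exist_ok=False: never write into an earlier run's output directory.
    run_dir.mkdir(parents=True, exist_ok=False)
    yield run_dir
    # Checked on normal exit; a failing test surfaces its own error instead.
    if tree_digest(fixtures) != before:
        raise RuntimeError(f"fixtures under {fixtures} changed during run {run_id}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fixtures = Path(tmp, "fixtures")
        fixtures.mkdir()
        (fixtures / "coordinates.csv").write_text("frame,lat,lon\n")
        with isolated_run(fixtures, Path(tmp, "runs"), "run-001") as out:
            (out / "fdr.log").write_text("ok\n")
        print(sorted(p.name for p in Path(tmp, "runs").iterdir()))
```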
Test Reporting
- Format: CSV summary and Markdown evidence report.
- Output paths: `test-results/blackbox-report.csv` and `test-results/fdr-validation-summary.md`.
- Required columns: Test ID, test name, input dataset, execution time, result, error distance, source label, covariance 95% semi-major, `GPS_INPUT.fix_type`, and error message.
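Below is a sketch of the CSV summary writer built on the standard `csv.DictWriter`, with the required columns as the header. The header strings follow the list above. `write_report`, `write_report_file`, and `report_text` are hypothetical names. Blocked rows may leave metric cells empty, but every column must be present.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, Mapping, TextIO

# Header names follow the required-columns list; units are not fixed here.
REQUIRED_COLUMNS = [
    "Test ID", "Test name", "Input dataset", "Execution time", "Result",
    "Error distance", "Source label", "Covariance 95% semi-major",
    "GPS_INPUT.fix_type", "Error message",
]


def write_report(rows: Iterable[Mapping[str, object]], stream: TextIO) -> None:
    # DictWriter rejects unknown keys (extrasaction="raise"); missing keys are
    # checked explicitly so a row cannot silently drop a required column.
    writer = csv.DictWriter(stream, fieldnames=REQUIRED_COLUMNS, extrasaction="raise")
    writer.writeheader()
    for row in rows:
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {row.get('Test ID', '?')} is missing {missing}")
        writer.writerow(row)


def write_report_file(rows, path: Path = Path("test-results/blackbox-report.csv")) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:  # newline="" per csv docs
        write_report(rows, f)


def report_text(rows) -> str:
    # In-memory variant, handy for unit-testing the report format.
    buf = io.StringIO()
    write_report(rows, buf)
    return buf.getvalue()
```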
Acceptance Criteria
- AC-1: Test environment starts. Given the Docker/replay test environment, when the test stack starts, then the runtime, replay consumer, cache fixture, SITL, and observer services are reachable or report a clear blocked prerequisite.
- AC-2: External dependency stubs are deterministic. Given a scenario config for cache, MAVLink, QGC, or fixture behavior, when the replay consumer executes it, then mocks produce repeatable responses and expose recorded interactions for assertions.
- AC-3: Test runner executes scenario groups. Given valid fixtures and a running test environment, when the test runner starts, then it discovers and executes blackbox, performance, resilience, security, and resource-limit scenario groups.
- AC-4: Reports are generated. Given a completed or blocked test run, when reporting finishes, then CSV and Markdown evidence files are written with the required columns, metrics, artifact paths, and blocked-prerequisite reasons.
Non-Functional Requirements
Reliability
- Missing hardware, public datasets, calibration, or SITL prerequisites are reported as `blocked`, not `passed`.
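The blocked/passed distinction can be enforced at the point where a scenario starts. In this minimal sketch, a scenario whose prerequisites are missing is never executed, so it cannot be reported as passed by accident. The `Outcome`, `ScenarioResult`, and `run_scenario` names and the file-existence check are assumptions, not the project's runner API.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable, Dict


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class ScenarioResult:
    test_id: str
    outcome: Outcome
    reason: str = ""


def run_scenario(
    test_id: str, prerequisites: Dict[str, Path], body: Callable[[], None]
) -> ScenarioResult:
    """Run body only if every prerequisite path exists; otherwise report BLOCKED."""
    missing = sorted(name for name, path in prerequisites.items() if not path.exists())
    if missing:
        # The body never runs, so a missing dataset or device cannot yield PASSED.
        return ScenarioResult(
            test_id, Outcome.BLOCKED, "missing prerequisites: " + ", ".join(missing)
        )
    try:
        body()
    except Exception as exc:  # assertion failures and harness errors both count as FAILED
        return ScenarioResult(test_id, Outcome.FAILED, f"{type(exc).__name__}: {exc}")
    return ScenarioResult(test_id, Outcome.PASSED)
```

The `reason` string can feed the blocked-prerequisite column of the evidence report directly.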
Security
- Fixture stubs must not access external satellite-provider or Suite service networks during in-flight test scenarios.
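As an extra in-process guard, a test can refuse outbound connections to anything outside an allowlist. This sketch patches `socket.socket.connect` with `unittest.mock.patch.object`. Its scope is limited:

- The default allowlist covers loopback only, so the compose network subnet would have to be passed in explicitly.
- It does not cover `connect_ex`, `sendto`, or non-Python processes.

Container network isolation remains the primary control. `no_external_network` and `ExternalNetworkBlocked` are hypothetical names.

```python
import ipaddress
import socket
from contextlib import contextmanager
from typing import Iterator, Sequence
from unittest import mock


class ExternalNetworkBlocked(RuntimeError):
    """Raised when code under test connects outside the allowed subnets."""


@contextmanager
def no_external_network(
    allowed_cidrs: Sequence[str] = ("127.0.0.0/8", "::1/128"),
) -> Iterator[None]:
    allowed = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    original_connect = socket.socket.connect

    def guarded_connect(sock: socket.socket, address):
        if not isinstance(address, tuple):
            return original_connect(sock, address)  # e.g. AF_UNIX paths are host-local
        try:
            ip = ipaddress.ip_address(address[0])
        except ValueError:
            # Hostnames are refused outright: resolving them would itself need DNS.
            raise ExternalNetworkBlocked(f"hostname {address[0]!r} is not allowed") from None
        if not any(ip in net for net in allowed):
            raise ExternalNetworkBlocked(f"blocked connection to {address!r}")
        return original_connect(sock, address)

    # Only socket.connect is guarded; connect_ex and sendto would need the same patch.
    with mock.patch.object(socket.socket, "connect", guarded_connect):
        yield
```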
Data Isolation
- No test may mutate source fixtures or write FDR/generated-tile artifacts outside run-scoped output paths.
Constraints
- The test suite must use public runtime boundaries only: navigation frames, telemetry, offline cache, MAVLink output, QGC status, and FDR outputs.
- The suite must not import private estimator, BASALT, wrapper, or tile-manager internals.
- Hardware-specific Jetson gates remain release-gate tests and may be skipped or blocked in ordinary local replay.
Risks & Mitigation
Risk 1: Environment prerequisites hide real failures
- Risk: Missing hardware, calibration, or datasets could be treated as success.
- Mitigation: Report unavailable prerequisites as `blocked` with explicit artifact evidence.
Risk 2: Fixture mutation contaminates later runs
- Risk: Generated FDR, cache, or SITL output changes expected input fixtures.
- Mitigation: Use read-only fixture mounts and fresh run-scoped output volumes for every execution.