Initial commit 8e2ecf50fd, Oleksandr Bezdieniezhnykh, 2026-03-26 00:20:30 +02:00 (144 files changed, 19781 insertions)
# E2E Test Environment
## Overview
**System under test**: Semantic Detection Service — a Cython + TensorRT module running within the existing FastAPI detections service on Jetson Orin Nano Super. Entry points: FastAPI REST API (image/video input), UART serial port (gimbal commands), Unix socket (VLM IPC).
**Consumer app purpose**: Standalone Python test runner that exercises the semantic detection pipeline through its public interfaces: submitting frames, injecting mock YOLO detections, capturing detection results, and monitoring gimbal command output. No access to internals.
## Docker Environment
### Services
| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| semantic-detection | build: ./Dockerfile.test | Main semantic detection pipeline (Tier 1 + 2 + scan controller + gimbal driver + recorder) | 8080 (API) |
| mock-yolo | build: ./tests/mock_yolo/ | Provides deterministic YOLO detection output for test frames | 8081 (API) |
| mock-gimbal | build: ./tests/mock_gimbal/ | Simulates ViewPro A40 serial interface via TCP socket (replaces UART for testing) | 9090 (TCP) |
| vlm-stub | build: ./tests/vlm_stub/ | Deterministic VLM response stub via Unix socket | — (Unix socket) |
| e2e-consumer | build: ./tests/e2e/ | Black-box test runner (pytest) | — |
### Networks
| Network | Services | Purpose |
|---------|----------|---------|
| e2e-net | all | Isolated test network |
### Volumes
| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| test-frames | semantic-detection:/data/frames, e2e-consumer:/data/frames | Shared test images (semantic01-04.png + synthetic frames) |
| test-output | semantic-detection:/data/output, e2e-consumer:/data/output | Detection logs, recorded frames, gimbal command log |
### docker-compose structure
```yaml
services:
  semantic-detection:
    build:
      context: .
      dockerfile: Dockerfile.test
    environment:
      - ENV=test
      - GIMBAL_HOST=mock-gimbal
      - GIMBAL_PORT=9090
      - VLM_SOCKET=/tmp/vlm.sock
      - YOLO_API=http://mock-yolo:8081
      - RECORD_PATH=/data/output/frames
      - LOG_PATH=/data/output/detections.jsonl
    volumes:
      - test-frames:/data/frames
      - test-output:/data/output
    depends_on:
      - mock-yolo
      - mock-gimbal
      - vlm-stub
  mock-yolo:
    build: ./tests/mock_yolo
  mock-gimbal:
    build: ./tests/mock_gimbal
  vlm-stub:
    build: ./tests/vlm_stub
  e2e-consumer:
    build: ./tests/e2e
    volumes:
      - test-frames:/data/frames
      - test-output:/data/output
    depends_on:
      - semantic-detection
```
## Consumer Application
**Tech stack**: Python 3.11, pytest, requests, struct (for gimbal protocol parsing)
**Entry point**: `pytest tests/e2e/ --junitxml=e2e-results/report.xml`
### Communication with system under test
| Interface | Protocol | Endpoint / Topic | Authentication |
|-----------|----------|-----------------|----------------|
| Frame submission | HTTP POST | http://semantic-detection:8080/api/v1/detect | None (internal network) |
| Detection results | HTTP GET | http://semantic-detection:8080/api/v1/results | None |
| Gimbal command log | File read | /data/output/gimbal_commands.log | None (shared volume) |
| Detection log | File read | /data/output/detections.jsonl | None (shared volume) |
| Recorded frames | File read | /data/output/frames/ | None (shared volume) |
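The submit-then-poll pattern used by most scenarios below can be sketched as follows. The endpoint paths come from the table above; the multipart field name, the helper names, and the injectable `fetch` callable are illustrative, not part of the service contract:

```python
import time
from typing import Callable, Optional

API = "http://semantic-detection:8080/api/v1"  # internal test network, no auth

def poll_until(fetch: Callable[[], list], timeout_s: float = 2.0,
               interval_s: float = 0.05) -> Optional[list]:
    """Poll a result fetcher until it returns a non-empty list or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        results = fetch()
        if results:
            return results
        time.sleep(interval_s)
    return None

def submit_and_wait(session, image_path: str, timeout_s: float = 2.0):
    """POST a frame, then poll GET /results; `session` is a requests.Session."""
    with open(image_path, "rb") as f:
        session.post(f"{API}/detect", files={"image": f}).raise_for_status()
    return poll_until(lambda: session.get(f"{API}/results").json(), timeout_s)
```

Keeping the fetcher injectable lets the polling logic be unit-tested without a live service.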
### What the consumer does NOT have access to
- No direct access to TensorRT engine internals
- No access to YOLOE model weights or inference state
- No access to VLM process memory or internal prompts
- No direct UART/serial access (reads gimbal command log only)
- No access to scan controller state machine internals
## CI/CD Integration
**When to run**: On every PR to `dev` branch; nightly on `dev`
**Pipeline stage**: After unit tests pass, before merge approval
**Gate behavior**: Block merge on any FAIL
**Timeout**: 10 minutes for the full suite (most tests < 1s each; VLM tests up to 30s)
## Reporting
**Format**: JUnit XML + CSV summary
**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
**Output path**: `./e2e-results/report.xml`, `./e2e-results/summary.csv`
## Hardware-in-the-Loop Test Track
Tests requiring actual Jetson Orin Nano Super hardware are marked with `[HIL]` in test IDs. These tests:
- Run on physical Jetson with real TensorRT engines
- Use real ViewPro A40 gimbal (or ViewPro simulator if available)
- Measure actual latency, memory, thermal, power
- Run separately from Docker-based E2E suite
- Triggered manually or on hardware CI runner (if available)
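One way to keep `[HIL]` tests out of the Docker-based suite is a collection hook in `conftest.py`. The `hil` marker name and the `HIL_RUNNER` environment variable here are assumptions, not existing project conventions:

```python
# conftest.py sketch: gate [HIL] tests behind an explicit opt-in.
import os

def hil_enabled(env) -> bool:
    """HIL tests run only when the runner explicitly opts in (hypothetical env var)."""
    return env.get("HIL_RUNNER", "0") == "1"

def pytest_collection_modifyitems(config, items):
    import pytest  # imported lazily so the helper above stays dependency-free
    if hil_enabled(os.environ):
        return
    skip_hil = pytest.mark.skip(reason="requires Jetson hardware; set HIL_RUNNER=1")
    for item in items:
        if "hil" in item.keywords:  # tests tagged with @pytest.mark.hil
            item.add_marker(skip_hil)
```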
@@ -0,0 +1,323 @@
# E2E Functional Tests
## Positive Scenarios
### FT-P-01: Tier 1 detects footpath from aerial image
**Summary**: Submit a winter aerial image containing a visible footpath; verify Tier 1 (YOLOE) returns a detection with class "footpath" and a segmentation mask.
**Traces to**: AC-YOLO-NEW-CLASSES, AC-SEMANTIC-PIPELINE
**Category**: YOLO Object Detection — New Classes
**Preconditions**:
- Semantic detection service is running
- Mock YOLO service returns pre-computed detections for semantic01.png including footpath class
**Input data**: semantic01.png + mock-yolo-detections (footpath detected)
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST semantic01.png to /api/v1/detect | 200 OK, processing started |
| 2 | GET /api/v1/results after 200ms | Detection result array containing at least 1 detection with class="footpath", confidence > 0.5 |
| 3 | Verify detection bbox covers the known footpath region in semantic01.png | bbox overlaps with annotated ground truth footpath region (IoU > 0.3) |
**Expected outcome**: At least 1 footpath detection returned with confidence > 0.5
**Max execution time**: 2s
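The IoU check in step 3 can be computed directly from the normalized center-format boxes the service emits (see FT-P-03). A minimal sketch:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (cx, cy, w, h), all normalized 0-1."""
    def to_corners(box):
        cx, cy, w, h = box
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width, clamped at 0
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height, clamped at 0
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0
```

The test then asserts `iou(detected_bbox, ground_truth_bbox) > 0.3` against the annotated footpath region.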
---
### FT-P-02: Tier 2 traces footpath to endpoint and flags concealed position
**Summary**: Given a frame with detected footpath, verify Tier 2 performs path tracing (skeletonization → endpoint detection) and identifies a dark mass at the endpoint as a potential concealed position.
**Traces to**: AC-SEMANTIC-DETECTION, AC-SEMANTIC-PIPELINE
**Category**: Semantic Detection Performance
**Preconditions**:
- Tier 1 has detected a footpath in the input frame
- Mock YOLO provides footpath segmentation mask for semantic01.png
**Input data**: semantic01.png + mock-yolo-detections (footpath with mask)
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST semantic01.png to /api/v1/detect | Processing started |
| 2 | Wait for Tier 2 processing (up to 500ms) | — |
| 3 | GET /api/v1/results | Detection result includes tier2_result="concealed_position" with tier2_confidence > 0 |
| 4 | Read detections.jsonl from output volume | Log entry exists with tier=2, class matches "concealed_position" or "branch_pile_endpoint" |
**Expected outcome**: Tier 2 produces at least 1 endpoint detection flagged as potential concealed position
**Max execution time**: 3s
---
### FT-P-03: Detection output format matches existing YOLO output schema
**Summary**: Verify semantic detection output uses the same bounding box format as existing YOLO pipeline (centerX, centerY, width, height, classNum, label, confidence — all normalized).
**Traces to**: AC-INTEGRATION
**Category**: Integration
**Preconditions**:
- At least 1 detection produced from semantic pipeline
**Input data**: semantic03.png + mock-yolo-detections
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST semantic03.png to /api/v1/detect | Processing started |
| 2 | GET /api/v1/results | Detection JSON array |
| 3 | Validate each detection has fields: centerX (0-1), centerY (0-1), width (0-1), height (0-1), classNum (int), label (string), confidence (0-1) | All fields present, all values within valid ranges |
**Expected outcome**: All output detections conform to existing YOLO output schema
**Max execution time**: 2s
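Step 3's field-and-range validation might look like this; the `REQUIRED` mapping mirrors the schema fields listed above:

```python
REQUIRED = {
    "centerX": float, "centerY": float, "width": float, "height": float,
    "classNum": int, "label": str, "confidence": float,
}

def schema_errors(det: dict) -> list:
    """Return a list of schema violations for one detection dict (empty = valid)."""
    errors = []
    for field, typ in REQUIRED.items():
        if field not in det:
            errors.append(f"missing {field}")
        elif not isinstance(det[field], typ):
            errors.append(f"{field} has type {type(det[field]).__name__}")
    # Normalized fields must lie in [0, 1]; only range-check values that parsed as floats.
    for field in ("centerX", "centerY", "width", "height", "confidence"):
        value = det.get(field)
        if isinstance(value, float) and not 0.0 <= value <= 1.0:
            errors.append(f"{field} out of range: {value}")
    return errors
```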
---
### FT-P-04: Tier 3 VLM analysis triggered for ambiguous Tier 2 result
**Summary**: When Tier 2 confidence is below threshold (e.g., 0.3-0.6), verify Tier 3 VLM is invoked for deeper analysis and returns a structured response.
**Traces to**: AC-LATENCY-TIER3, AC-SEMANTIC-PIPELINE
**Category**: Semantic Analysis Pipeline
**Preconditions**:
- VLM stub is running and responds to IPC
- Mock YOLO returns detections with ambiguous endpoint (moderate confidence)
**Input data**: semantic02.png + mock-yolo-detections (footpath with ambiguous endpoint) + vlm-stub-responses
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST semantic02.png to /api/v1/detect | Processing started |
| 2 | Wait for Tier 3 processing (up to 6s) | — |
| 3 | GET /api/v1/results | Detection result includes tier3_used=true |
| 4 | Read detections.jsonl | Log entry with tier=3 and VLM analysis text present |
**Expected outcome**: VLM was invoked, response is recorded in detection log, total latency ≤ 6s
**Max execution time**: 8s
---
### FT-P-05: Frame quality gate rejects blurry frame
**Summary**: Submit a blurred frame; verify the system rejects it via the frame quality gate and does not produce detections from it.
**Traces to**: AC-SCAN-ALGORITHM
**Category**: Scan Algorithm
**Preconditions**:
- Blurry test frames available in test data
**Input data**: blurry-frames (Gaussian blur applied to semantic01.png)
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST blurry_semantic01.png to /api/v1/detect | 200 OK |
| 2 | GET /api/v1/results | Empty detection array or response indicating frame rejected (quality below threshold) |
**Expected outcome**: No detections produced from blurry frame; frame quality metric logged
**Max execution time**: 1s
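The system's quality gate is internal, but the consumer can sanity-check its own blurry fixtures with the same kind of metric. A variance-of-Laplacian sketch in pure Python (the gate threshold here is illustrative, not the system's actual value):

```python
def laplacian_variance(img):
    """Sharpness proxy: variance of the 4-neighbour Laplacian. Low values = blurry.
    `img` is a 2-D list of grayscale values."""
    h, w = len(img), len(img[0])
    responses = [
        4 * img[y][x] - img[y - 1][x] - img[y + 1][x] - img[y][x - 1] - img[y][x + 1]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def passes_quality_gate(img, threshold=100.0):
    """Hypothetical gate: reject frames whose sharpness falls below threshold."""
    return laplacian_variance(img) >= threshold
```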
---
### FT-P-06: Scan controller transitions from Level 1 to Level 2
**Summary**: When Tier 1 detects a POI, verify the scan controller issues zoom-in gimbal commands and transitions to Level 2 state.
**Traces to**: AC-SCAN-L1-TO-L2, AC-CAMERA-ZOOM
**Category**: Scan Algorithm, Camera Control
**Preconditions**:
- Mock gimbal service is running and accepting commands
- Scan controller starts in Level 1 mode
**Input data**: synthetic-video-sequence (simulating Level 1 sweep) + mock-yolo-detections (POI detected mid-sequence)
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST first 10 frames (Level 1 sweep, no POI) | Gimbal commands show pan sweep pattern |
| 2 | POST frame 11 with mock YOLO returning a footpath detection | Scan controller queues POI |
| 3 | POST frames 12-15 | Gimbal command log shows zoom-in command issued |
| 4 | Read gimbal command log | Transition from sweep commands to zoom + hold commands within 2s of POI detection |
**Expected outcome**: Gimbal transitions from Level 1 sweep to Level 2 zoom within 2 seconds
**Max execution time**: 5s
---
### FT-P-07: Detection logging writes complete JSON-lines entries
**Summary**: After processing multiple frames, verify the detection log contains properly formatted JSON-lines entries with all required fields.
**Traces to**: AC-INTEGRATION
**Category**: Recording, Logging & Telemetry
**Preconditions**:
- Multiple frames processed with detections
**Input data**: semantic01.png, semantic02.png + mock-yolo-detections
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST semantic01.png, then semantic02.png | Detections produced |
| 2 | Read /data/output/detections.jsonl | File exists, contains ≥1 JSON line |
| 3 | Parse each line as JSON | Valid JSON with fields: ts, frame_id, tier, class, confidence, bbox |
| 4 | Verify timestamps are ISO 8601, bbox values 0-1, confidence 0-1 | All values within valid ranges |
**Expected outcome**: All detection log entries are valid JSON with all required fields
**Max execution time**: 3s
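Steps 3-4 reduce to a per-line validator. A sketch assuming Python 3.11's `datetime.fromisoformat` and the field names listed above:

```python
import json
from datetime import datetime

REQUIRED_FIELDS = ("ts", "frame_id", "tier", "class", "confidence", "bbox")

def validate_log_line(line: str) -> dict:
    """Parse one detections.jsonl line and enforce the fields required by FT-P-07."""
    entry = json.loads(line)
    for field in REQUIRED_FIELDS:
        assert field in entry, f"missing field: {field}"
    datetime.fromisoformat(entry["ts"])  # raises ValueError if not ISO 8601
    assert 0.0 <= entry["confidence"] <= 1.0
    assert all(0.0 <= v <= 1.0 for v in entry["bbox"])
    return entry
```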
---
### FT-P-08: Freshness metadata attached to footpath detections
**Summary**: Verify that footpath detections include freshness metadata (contrast ratio) as "high_contrast" or "low_contrast" tag.
**Traces to**: AC-SEMANTIC-PIPELINE
**Category**: Semantic Analysis Pipeline
**Preconditions**:
- Footpath detected in Tier 1
**Input data**: semantic01.png + mock-yolo-detections (footpath)
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST semantic01.png | Detections produced |
| 2 | GET /api/v1/results | Footpath detection includes freshness field |
| 3 | Verify freshness is one of: "high_contrast", "low_contrast" | Valid freshness tag present |
**Expected outcome**: Freshness metadata present on all footpath detections
**Max execution time**: 2s
---
## Negative Scenarios
### FT-N-01: No detections from empty scene
**Summary**: Submit a frame where YOLO returns zero detections; verify semantic pipeline returns empty results without errors.
**Traces to**: AC-SEMANTIC-PIPELINE (negative case)
**Category**: Semantic Analysis Pipeline
**Preconditions**:
- Mock YOLO returns empty detection array
**Input data**: semantic01.png + mock-yolo-empty
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST semantic01.png with mock YOLO returning zero detections | 200 OK |
| 2 | GET /api/v1/results | Empty detection array, no errors |
**Expected outcome**: System returns empty results gracefully
**Max execution time**: 1s
---
### FT-N-02: System handles high-volume false positive YOLO input
**Summary**: Submit a frame where YOLO returns 50+ random false positive bounding boxes; verify system processes without crash and Tier 2 filters most.
**Traces to**: AC-SEMANTIC-DETECTION, RESTRICT-RESOURCE
**Category**: Semantic Detection Performance
**Preconditions**:
- Mock YOLO returns 50 random detections
**Input data**: semantic01.png + mock-yolo-noise (50 random bboxes)
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST semantic01.png with noisy YOLO output | 200 OK, processing started |
| 2 | Wait 2s, GET /api/v1/results | Results returned without crash |
| 3 | Verify result count < 50 | Tier 2 filtering reduces the candidate count below the 50 injected boxes |
**Expected outcome**: System handles noisy input without crash; processes within time budget
**Max execution time**: 5s
---
### FT-N-03: Invalid image format rejected
**Summary**: Submit a 0-byte file and a truncated image file; verify the system rejects each with an appropriate error.
**Traces to**: RESTRICT-SOFTWARE
**Category**: Software
**Preconditions**:
- Service is running
**Input data**: 0-byte file, truncated image (first 100 bytes of semantic01.png)
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST 0-byte file to /api/v1/detect | 400 Bad Request or skip with warning |
| 2 | POST truncated image | 400 Bad Request or skip with warning |
**Expected outcome**: System rejects invalid input without crash
**Max execution time**: 1s
---
### FT-N-04: Gimbal communication failure triggers graceful degradation
**Summary**: When mock gimbal stops responding, verify system degrades to Level 3 (no gimbal) and continues YOLO-only detection.
**Traces to**: AC-SCAN-ALGORITHM, RESTRICT-HARDWARE
**Category**: Scan Algorithm, Resilience
**Preconditions**:
- Mock gimbal is initially running, then stopped mid-test
**Input data**: semantic01.png + mock-yolo-detections
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | POST frame, verify gimbal commands are sent | Gimbal commands in log |
| 2 | Stop mock-gimbal service | — |
| 3 | POST next frame | System detects gimbal timeout |
| 4 | POST 3 more frames | System enters degradation Level 3 (no gimbal), continues producing YOLO-only detections |
| 5 | GET /api/v1/results | Detections still returned (from existing YOLO pipeline) |
**Expected outcome**: System degrades gracefully to Level 3, continues detecting without gimbal
**Max execution time**: 15s
---
### FT-N-05: VLM process crash triggers Tier 3 unavailability
**Summary**: When VLM stub crashes, verify Tier 3 is marked unavailable and Tier 1+2 continue operating.
**Traces to**: AC-SEMANTIC-PIPELINE, RESTRICT-SOFTWARE
**Category**: Resilience
**Preconditions**:
- VLM stub initially running, then killed
**Input data**: semantic02.png + mock-yolo-detections (ambiguous endpoint that would trigger VLM)
**Steps**:
| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Kill vlm-stub process | — |
| 2 | POST semantic02.png with ambiguous detection | Processing starts |
| 3 | GET /api/v1/results after 3s | Detection result with tier3_used=false (VLM unavailable), Tier 1+2 results still present |
| 4 | Read detection log | Log entry shows tier3 skipped with reason "vlm_unavailable" |
**Expected outcome**: Tier 1+2 results are returned; Tier 3 is gracefully skipped
**Max execution time**: 5s
# E2E Non-Functional Tests
## Performance Tests
### NFT-PERF-01: Tier 1 inference latency ≤100ms [HIL]
**Summary**: Measure Tier 1 (YOLOE TRT FP16) inference latency on Jetson Orin Nano Super with real TensorRT engine.
**Traces to**: AC-LATENCY-TIER1
**Metric**: p95 inference latency per frame (ms)
**Preconditions**:
- Jetson Orin Nano Super with JetPack 6.2
- YOLOE TRT FP16 engine loaded
- Active cooling enabled, T_junction < 70°C
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Submit 100 frames (semantic01-04.png cycled) with 100ms interval | Record per-frame inference time from API response header |
| 2 | Compute p50, p95, p99 latency | — |
**Pass criteria**: p95 latency < 100ms
**Duration**: 15 seconds
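Step 2's percentile computation, here using the nearest-rank method (the method choice is ours; any consistent definition works against the p95 pass criterion):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def latency_report(latencies_ms):
    """p50/p95/p99 summary as required by the measurement steps."""
    return {q: percentile(latencies_ms, q) for q in (50, 95, 99)}
```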
---
### NFT-PERF-02: Tier 2 heuristic latency ≤50ms
**Summary**: Measure V1 heuristic endpoint analysis (skeletonization + endpoint + darkness check) latency.
**Traces to**: AC-LATENCY-TIER2
**Metric**: p95 processing latency per ROI (ms)
**Preconditions**:
- Tier 1 has produced footpath segmentation masks
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Submit 50 frames with mock YOLO footpath masks | Record Tier 2 processing time from detection log |
| 2 | Compute p50, p95 latency | — |
**Pass criteria**: p95 latency < 50ms (V1 heuristic), < 200ms (V2 CNN)
**Duration**: 10 seconds
---
### NFT-PERF-03: Tier 3 VLM latency ≤5s
**Summary**: Measure VLM inference latency including image encoding, prompt processing, and response generation.
**Traces to**: AC-LATENCY-TIER3
**Metric**: End-to-end VLM analysis time per ROI (ms)
**Preconditions**:
- NanoLLM with VILA1.5-3B loaded (or vlm-stub for Docker-based test)
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Trigger 10 Tier 3 analyses on different ROIs | Record time from VLM request to response via detection log |
| 2 | Compute p50, p95 latency | — |
**Pass criteria**: p95 latency < 5000ms
**Duration**: 60 seconds
---
### NFT-PERF-04: Full pipeline throughput under continuous frame input
**Summary**: Submit frames at 10 FPS for 60 seconds; measure detection throughput and queue depth.
**Traces to**: AC-LATENCY-TIER1, AC-SCAN-ALGORITHM
**Metric**: Frames processed per second, max queue depth
**Preconditions**:
- All tiers active, mock services responding
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Submit 600 frames at 10 FPS (60s) | Count processed frames from detection log |
| 2 | Record queue depth if available from API status endpoint | — |
**Pass criteria**: ≥8 FPS sustained processing rate; no frames silently dropped (all either processed or explicitly skipped with quality gate reason)
**Duration**: 75 seconds
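The 10 FPS submission loop should sleep to absolute target timestamps rather than fixed intervals, so per-frame overhead does not accumulate into drift over 600 frames. A sketch (`send` is any callable that submits frame `i`; the helper names are ours):

```python
import time

def frame_schedule(n_frames: int, fps: float, start: float = 0.0):
    """Target send offsets for a fixed-rate loop (e.g. 600 frames at 10 FPS)."""
    return [start + i / fps for i in range(n_frames)]

def run_paced(send, n_frames: int, fps: float) -> int:
    """Call send(i) on schedule; sleeping to absolute timestamps avoids drift."""
    t0 = time.monotonic()
    for i, offset in enumerate(frame_schedule(n_frames, fps)):
        delay = t0 + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        send(i)
    return n_frames
```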
---
## Resilience Tests
### NFT-RES-01: Semantic process crash and recovery
**Summary**: Kill the semantic detection process; verify watchdog restarts it within 10 seconds and processing resumes.
**Traces to**: AC-SCAN-ALGORITHM (degradation)
**Preconditions**:
- Semantic detection running and processing frames
**Fault injection**:
- Kill semantic process via signal (SIGKILL)
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Submit 5 frames successfully | Detections returned |
| 2 | Kill semantic process | Frame processing stops |
| 3 | Wait up to 10 seconds | Watchdog detects crash, restarts process |
| 4 | Submit 5 more frames | Detections returned again |
**Pass criteria**: Recovery within 10 seconds; no data corruption in detection log; frames submitted during downtime are either queued or rejected (not silently dropped)
---
### NFT-RES-02: VLM load/unload cycle stability
**Summary**: Load and unload VLM 10 times; verify no memory leak and successful inference after each reload.
**Traces to**: AC-RESOURCE-CONSTRAINTS
**Preconditions**:
- VLM process manageable via API/signal
**Fault injection**:
- Alternating VLM load/unload commands
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Load VLM, run 1 inference | Success, record memory |
| 2 | Unload VLM, record memory | Memory decreases |
| 3 | Repeat 10 times | — |
| 4 | Compare memory at cycle 1 vs cycle 10 | Delta < 100MB |
**Pass criteria**: No memory leak (delta < 100MB over 10 cycles); all inferences succeed
---
### NFT-RES-03: Gimbal CRC failure handling
**Summary**: Inject corrupted gimbal command responses; verify CRC layer detects corruption and retries.
**Traces to**: AC-CAMERA-CONTROL
**Preconditions**:
- Mock gimbal configured to return corrupted responses for first 2 attempts, valid on 3rd
**Fault injection**:
- Mock gimbal flips random bits in response CRC
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Issue pan command | First 2 responses rejected (bad CRC) |
| 2 | Automatic retry | 3rd attempt succeeds |
| 3 | Read gimbal command log | Log shows 2 CRC failures + 1 success |
**Pass criteria**: Command succeeds after retries; CRC failures logged; no crash
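The consumer's CRC bookkeeping can follow the same shape. CRC-16/CCITT-FALSE is used here as a stand-in; the actual ViewLink polynomial, byte order, and frame layout are assumptions to confirm against the protocol spec:

```python
import struct

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF) — a stand-in checksum."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def frame_ok(frame: bytes) -> bool:
    """Check the trailing big-endian CRC-16 of a response frame (assumed layout)."""
    if len(frame) < 3:
        return False
    payload, (expected,) = frame[:-2], struct.unpack(">H", frame[-2:])
    return crc16_ccitt(payload) == expected

def send_with_retry(transport, command: bytes, retries: int = 3):
    """Resend until a response passes the CRC check, up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        response = transport(command)
        if frame_ok(response):
            return response, attempt
    raise IOError(f"gimbal command failed after {retries} CRC failures")
```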
---
## Security Tests
### NFT-SEC-01: No external network access from semantic detection
**Summary**: Verify the semantic detection service makes no outbound network connections outside the Docker network.
**Traces to**: RESTRICT-SOFTWARE (local-only inference)
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Run semantic detection pipeline on test frames | Detections produced |
| 2 | Monitor network traffic from semantic-detection container (via tcpdump on e2e-net) | No packets to external IPs |
**Pass criteria**: Zero outbound connections to external networks
---
### NFT-SEC-02: Model files are not accessible via API
**Summary**: Verify TRT engine files and VLM model weights cannot be downloaded through the API.
**Traces to**: RESTRICT-SOFTWARE
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Attempt directory traversal via API: GET /api/v1/../models/ | 404 or 400 |
| 2 | Attempt known model path: GET /api/v1/detect?path=/models/yoloe.engine | No model content returned |
**Pass criteria**: Model files inaccessible via any API endpoint
---
## Resource Limit Tests
### NFT-RES-LIM-01: Memory stays within 6GB budget [HIL]
**Summary**: Run full pipeline (Tier 1+2+3 + recording + logging) for 30 minutes; verify peak memory stays below 6GB (semantic module allocation).
**Traces to**: AC-RESOURCE-CONSTRAINTS, RESTRICT-HARDWARE
**Metric**: Peak RSS memory of semantic detection + VLM processes
**Preconditions**:
- Jetson Orin Nano Super, 15W mode, active cooling
- All components loaded
**Monitoring**:
- `tegrastats` logging at 1-second intervals: GPU memory, CPU memory, swap
**Duration**: 30 minutes
**Pass criteria**: Peak (semantic + VLM) memory < 6GB; no OOM kills; no swap usage above 100MB
---
### NFT-RES-LIM-02: Thermal stability under sustained load [HIL]
**Summary**: Run continuous inference for 60 minutes; verify T_junction stays below 75°C with active cooling.
**Traces to**: RESTRICT-HARDWARE
**Metric**: T_junction max, T_junction average
**Preconditions**:
- Jetson Orin Nano Super, 15W mode, active cooling fan running
- Ambient temperature 20-25°C
**Monitoring**:
- Temperature sensors via `tegrastats` at 1-second intervals
**Duration**: 60 minutes
**Pass criteria**: T_junction max < 75°C; no thermal throttling events
---
### NFT-RES-LIM-03: NVMe recording endurance [HIL]
**Summary**: Record frames to NVMe at Level 2 rate (30 FPS, 1080p JPEG) for 2 hours; verify no write errors.
**Traces to**: AC-SCAN-ALGORITHM (recording)
**Metric**: Frames written, write errors, NVMe health
**Preconditions**:
- NVMe SSD ≥256GB, ≥30% free space
**Monitoring**:
- Write errors via dmesg
- NVMe SMART data before and after
**Duration**: 2 hours
**Pass criteria**: Zero write errors; SMART indicators nominal; storage usage matches expected (~120GB for 2h at 30FPS)
---
### NFT-RES-LIM-04: Cold start time ≤60 seconds [HIL]
**Summary**: Power on Jetson, measure time from boot to first successful detection.
**Traces to**: RESTRICT-OPERATIONAL
**Metric**: Time from power-on to first detection result (seconds)
**Preconditions**:
- JetPack 6.2 on NVMe, all models pre-exported as TRT engines
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Power on Jetson | Start timer |
| 2 | Poll /api/v1/health every 1s | — |
| 3 | When health returns 200, submit test frame | Record time to first detection |
**Pass criteria**: First detection within 60 seconds of power-on
**Duration**: 90 seconds max
# E2E Test Data Management
## Seed Data Sets
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|----------|-------------|---------------|-----------|---------|
| winter-footpath-images | semantic01-04.png — real aerial images with footpaths and concealed positions (winter) | FT-P-01 to FT-P-08, FT-N-01 to FT-N-05, NFT-PERF-01 to NFT-PERF-03 | Volume mount from test-frames | Persistent, read-only |
| mock-yolo-detections | Pre-computed YOLO detection JSONs for each test image (footpaths, roads, branch piles, entrances, trees) | FT-P-01 to FT-P-08, FT-N-04, FT-N-05 | Loaded by mock-yolo service from fixture files | Persistent, read-only |
| mock-yolo-empty | YOLO detection JSON with zero detections | FT-N-01 | Loaded by mock-yolo service | Persistent, read-only |
| mock-yolo-noise | YOLO detection JSON with high-confidence false positives (random bounding boxes) | FT-N-02 | Loaded by mock-yolo service | Persistent, read-only |
| blurry-frames | 5 synthetically blurred versions of semantic01.png (Gaussian blur, motion blur) | FT-P-05 | Volume mount | Persistent, read-only |
| synthetic-video-sequence | 30 frames panning across semantic01.png to simulate gimbal movement | FT-P-06, FT-P-07 | Volume mount | Persistent, read-only |
| vlm-stub-responses | Deterministic VLM text responses for each test image ROI | FT-P-04 | Loaded by vlm-stub service | Persistent, read-only |
| gimbal-protocol-fixtures | ViewLink protocol command/response byte sequences for known operations | FT-P-06, FT-N-04 | Loaded by mock-gimbal service | Persistent, read-only |
## Data Isolation Strategy
Each test run starts with a clean output directory. The semantic-detection service restarts between test groups (via Docker restart). Input data (images, mock detections) is read-only and shared across tests. Output data (detection logs, recorded frames, gimbal commands) is written to a fresh directory per test run.
## Input Data Mapping
| Input Data File | Source Location | Description | Covers Scenarios |
|-----------------|----------------|-------------|-----------------|
| semantic01.png | `_docs/00_problem/input_data/semantic01.png` | Footpath with arrows, leading to branch pile hideout | FT-P-01, FT-P-02, FT-P-07, FT-P-08, FT-N-01, FT-N-02, FT-N-04 |
| semantic02.png | `_docs/00_problem/input_data/semantic02.png` | Footpath to open space from forest, FPV pilot trail | FT-P-04, FT-P-07, FT-N-05 |
| semantic03.png | `_docs/00_problem/input_data/semantic03.png` | Footpath with squared hideout | FT-P-01, FT-P-03 |
| semantic04.png | `_docs/00_problem/input_data/semantic04.png` | Footpath ending at tree branches | FT-P-01, FT-P-02 |
| data_parameters.md | `_docs/00_problem/input_data/data_parameters.md` | Training data spec (not used in E2E tests directly) | — |
## External Dependency Mocks
| External Service | Mock/Stub | How Provided | Behavior |
|-----------------|-----------|-------------|----------|
| YOLO Detection Pipeline | mock-yolo Docker service | HTTP API returning deterministic JSON detection results per image hash | Returns pre-computed detection arrays matching expected YOLO output format (centerX, centerY, width, height, classNum, label, confidence) |
| ViewPro A40 Gimbal | mock-gimbal Docker service | TCP socket emulating UART serial interface | Accepts ViewLink protocol commands, responds with gimbal feedback (pan/tilt angles, status). Logs all received commands to file. Supports simulated delays (1-2s zoom transition). |
| VLM (NanoLLM/VILA) | vlm-stub Docker service | Unix socket responding to IPC messages | Returns deterministic text analysis per image ROI hash. Simulates ~2s latency. Returns configurable responses for positive/negative/ambiguous cases. |
| GPS-Denied System | Not mocked | Not needed — coordinates are passed as metadata input | System under test accepts coordinates as input parameters, does not compute them |
## Data Validation Rules
| Data Type | Validation | Invalid Examples | Expected System Behavior |
|-----------|-----------|-----------------|------------------------|
| Input frame | JPEG/PNG, 1920x1080, 3-channel RGB | 0-byte file, truncated JPEG, 640x480, grayscale | Reject with error, skip frame, continue processing |
| YOLO detection JSON | Array of objects with required fields (centerX, centerY, width, height, classNum, label, confidence) | Missing fields, confidence > 1.0, negative coordinates | Ignore malformed detections, process valid ones |
| Gimbal command | Valid ViewLink protocol packet with CRC-16 | Truncated packet, invalid CRC, unknown command code | Retry up to 3 times, log error, continue without gimbal |
| VLM IPC message | JSON with image_path and prompt fields | Missing image_path, empty prompt, non-existent file | Return error response, Tier 3 marked as failed for this ROI |
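The first validation row can be pre-checked cheaply from raw file bytes before POSTing: PNG files carry width and height big-endian at bytes 16-23, inside the IHDR chunk. A sketch (JPEG dimension parsing is omitted; full decode checks would still live in the service):

```python
import struct

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"

def validate_frame(data: bytes, size=(1920, 1080)) -> str:
    """Return 'ok' or a rejection reason; mirrors the frame-validation row above."""
    if not data:
        return "reject:empty"
    if data.startswith(PNG_MAGIC):
        if len(data) < 24:
            return "reject:truncated"
        # PNG stores width/height big-endian at bytes 16-23 of the IHDR chunk.
        width, height = struct.unpack(">II", data[16:24])
        return "ok" if (width, height) == size else f"reject:size_{width}x{height}"
    if data.startswith(JPEG_MAGIC):
        return "ok"  # JPEG dimensions need a marker scan; omitted in this sketch
    return "reject:unknown_format"
```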
# E2E Traceability Matrix
## Acceptance Criteria Coverage
| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-LATENCY-TIER1 | Tier 1 ≤100ms per frame | NFT-PERF-01, NFT-PERF-04 | Covered |
| AC-LATENCY-TIER2 | Tier 2 ≤200ms per ROI | NFT-PERF-02 | Covered |
| AC-LATENCY-TIER3 | Tier 3 ≤5s per ROI | NFT-PERF-03, FT-P-04 | Covered |
| AC-YOLO-NEW-CLASSES | New YOLO classes P≥80% R≥80% | FT-P-01 | Partially covered — functional flow tested; statistical P/R requires annotated validation set (component-level test) |
| AC-SEMANTIC-DETECTION-R | Concealed position recall ≥60% | FT-P-02, FT-P-03 | Partially covered — functional detection tested; statistical recall requires larger dataset (component-level test) |
| AC-SEMANTIC-DETECTION-P | Concealed position precision ≥20% | FT-P-02, FT-N-02 | Partially covered — same as above |
| AC-FOOTPATH-RECALL | Footpath detection recall ≥70% | FT-P-01 | Partially covered — functional detection tested; statistical recall at component level |
| AC-SCAN-L1 | Level 1 covers route with sweep | FT-P-06 | Covered |
| AC-SCAN-L1-TO-L2 | L1→L2 transition within 2s | FT-P-06 | Covered |
| AC-SCAN-L2-LOCK | L2 maintains camera lock on POI | — | NOT COVERED — requires real gimbal + moving platform; covered in [HIL] test track |
| AC-SCAN-PATH-FOLLOW | Path-following keeps path in center 50% | — | NOT COVERED — requires real camera + gimbal; covered in [HIL] track |
| AC-SCAN-ENDPOINT-HOLD | Endpoint hold for VLM analysis | FT-P-04 | Partially covered — VLM trigger tested; physical hold requires [HIL] |
| AC-SCAN-RETURN | Return to L1 after analysis/timeout | FT-P-06 | Covered (within mock gimbal command sequence) |
| AC-CAMERA-LATENCY | Gimbal command ≤500ms | NFT-RES-03 | Covered (mock; [HIL] for real latency) |
| AC-CAMERA-ZOOM | Zoom M→H within 2s | FT-P-06 | Covered (mock acknowledges zoom; [HIL] for physical timing) |
| AC-CAMERA-PATH-ACCURACY | Footpath stays in center 50% during pan | — | NOT COVERED — requires real gimbal; [HIL] |
| AC-CAMERA-SMOOTH | Smooth gimbal transitions | — | NOT COVERED — requires real gimbal; [HIL] |
| AC-CAMERA-QUEUE | POI queue prioritized by confidence/proximity | FT-P-06 | Partially covered — queue existence tested; priority ordering at component level |
| AC-SEMANTIC-PIPELINE | Consumes YOLO input, traces paths, maintains detection freshness | FT-P-01, FT-P-02, FT-P-08 | Covered |
| AC-RESOURCE-CONSTRAINTS | ≤6GB RAM total | NFT-RES-LIM-01 | Covered [HIL] |
| AC-COEXIST-YOLO | Must not degrade existing YOLO | NFT-PERF-04 | Partially covered — throughput measured; real coexistence at [HIL] |
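One way to keep a matrix like this machine-checkable (a hedged sketch — the decorator name, registry, and test names below are assumptions, not the repo's actual convention) is to have each test register the AC IDs it covers, and derive the AC → test-ID mapping from the registry instead of maintaining it by hand:

```python
# Hypothetical traceability registry: each test function declares the
# acceptance criteria it covers, and the matrix is generated from
# AC_COVERAGE rather than edited manually.
from collections import defaultdict

AC_COVERAGE: dict[str, list[str]] = defaultdict(list)

def covers(*ac_ids: str):
    """Decorator: record that this test covers the given AC IDs."""
    def wrap(fn):
        for ac in ac_ids:
            AC_COVERAGE[ac].append(fn.__name__)
        return fn
    return wrap

@covers("AC-LATENCY-TIER1")
def test_nft_perf_01(): ...

@covers("AC-LATENCY-TIER1", "AC-COEXIST-YOLO")
def test_nft_perf_04(): ...
```

The same idea maps onto pytest custom markers if the suite already uses them; either way, uncovered ACs become a simple set difference against the registry keys.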
## Restrictions Coverage
| Restriction ID | Restriction | Test IDs | Coverage |
|---------------|-------------|----------|----------|
| RESTRICT-HW-JETSON | Jetson Orin Nano Super, 67 TOPS, 8GB | NFT-RES-LIM-01, NFT-RES-LIM-02 | Covered [HIL] |
| RESTRICT-HW-RAM | ~6GB available for semantic + VLM | NFT-RES-LIM-01 | Covered [HIL] |
| RESTRICT-CAM-VIEWPRO | ViewPro A40 1080p 40x zoom | FT-P-06, NFT-RES-03 | Covered (mock) |
| RESTRICT-CAM-ZOOM-TIME | Zoom transition 1-2s physical | FT-P-06 | Covered (mock with simulated delay) |
| RESTRICT-OP-ALTITUDE | 600-1000m altitude | — | NOT COVERED — operational parameter, not testable at E2E; affects GSD calculation tested at component level |
| RESTRICT-OP-SEASONS | All seasons, phased starting winter | FT-P-01 to FT-P-08 (winter images) | Partially covered — winter only; other seasons deferred to Phase 4 |
| RESTRICT-SW-CYTHON-TRT | Extend Cython + TRT codebase | — | NOT COVERED — architectural constraint verified by code review, not E2E test |
| RESTRICT-SW-TRT | TensorRT inference engine | NFT-PERF-01 | Covered [HIL] |
| RESTRICT-SW-VLM-LOCAL | VLM runs locally, no cloud | NFT-SEC-01 | Covered |
| RESTRICT-SW-VLM-SEPARATE | VLM as separate process with IPC | FT-P-04, FT-N-05 | Covered |
| RESTRICT-SW-SEQUENTIAL-GPU | YOLO and VLM scheduled sequentially | NFT-PERF-04, NFT-RES-LIM-01 | Covered (memory monitoring shows no concurrent GPU allocation) |
| RESTRICT-INT-FASTAPI | Existing FastAPI + Cython + Docker | FT-P-03 | Covered (output format) |
| RESTRICT-INT-YOLO-OUTPUT | Consume YOLO bounding box output | FT-P-01, FT-P-02 | Covered |
| RESTRICT-INT-OUTPUT-FORMAT | Output same bbox format | FT-P-03 | Covered |
| RESTRICT-SCOPE-ANNOTATION | Annotation tooling out of scope | — | N/A |
| RESTRICT-SCOPE-GPS | GPS-denied out of scope | — | N/A |
## Coverage Summary
| Category | Total Items | Covered | Partially Covered | Not Covered | N/A | Coverage % |
|----------|-------------|---------|-------------------|-------------|-----|-----------|
| Acceptance Criteria | 21 | 10 | 7 | 4 | 0 | 64% |
| Restrictions | 16 | 11 | 1 | 2 | 2 | 82% |
| **Total** | **37** | **21** | **8** | **6** | **2** | **71%** |

Coverage % = (Covered + 0.5 × Partially Covered) / (Total − N/A). Items marked "Covered [HIL]" in the detail tables are counted as covered, since they are exercised in the HIL test track.
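The coverage percentages follow the weighting stated in the table notes (partial counts as 0.5, N/A items excluded from the denominator). A minimal helper makes the arithmetic explicit (a sketch; the function name is an assumption):

```python
def coverage_pct(covered: int, partial: int, not_covered: int,
                 n_a: int = 0) -> float:
    """Weighted coverage: partial counts as 0.5; N/A items are
    excluded from the denominator entirely."""
    denom = covered + partial + not_covered  # total minus N/A
    return round(100 * (covered + 0.5 * partial) / denom, 1)
```

For example, 10 covered, 7 partial, and 4 not covered yields (10 + 3.5) / 21 ≈ 64%.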
## Uncovered Items Analysis
| Item | Reason Not Covered | Risk | Mitigation |
|------|-------------------|------|-----------|
| AC-SCAN-L2-LOCK | Requires real gimbal + moving UAV platform | Camera drifts off target during flight | [HIL] test with real hardware; PID tuning on bench first |
| AC-SCAN-PATH-FOLLOW | Requires real gimbal + camera | Path leaves frame during pan | [HIL] test; component-level PID unit tests with simulated feedback |
| AC-CAMERA-PATH-ACCURACY | Requires real gimbal | Path not centered | [HIL] test |
| AC-CAMERA-SMOOTH | Requires real gimbal | Jerky movement blurs frames | [HIL] test; PID tuning |
| RESTRICT-OP-ALTITUDE | Operational parameter, not testable | GSD calculation wrong | Component-level GSD unit test with known altitude |
| RESTRICT-SW-CYTHON-TRT | Architectural constraint | Wrong tech stack used | Code review gate in PR process |
| RESTRICT-OP-SEASONS (non-winter) | Only winter images available now | System fails on summer/spring terrain | Phase 4 seasonal expansion; deferred by design |
| RESTRICT-HW-JETSON (real perf) | Requires physical hardware | Docker perf doesn't match Jetson | [HIL] test track runs on real Jetson |