mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 23:56:36 +00:00
initial structure implemented
docs -> _docs
# ASTRAL-Next Test Specifications Summary

## Overview
Comprehensive test specifications for the GPS-denied navigation system, following the QA testing pyramid approach.

**Total Test Specifications**: 49

## Test Organization

### Integration Tests (01-16): Component Level
Tests individual system components in isolation with their dependencies.

**Vision Pipeline (01-04)**:
- 01: Sequential Visual Odometry (F07 - SuperPoint + LightGlue)
- 02: Global Place Recognition (F08 - AnyLoc/DINOv2)
- 03: Metric Refinement (F09 - LiteSAM)
- 04: Factor Graph Optimizer (F10 - GTSAM)

**Data Management (05-08)**:
- 05: Satellite Data Manager (F04)
- 06: Coordinate Transformer (F13)
- 07: Image Input Pipeline (F05)
- 08: Image Rotation Manager (F06)

**Service Infrastructure (09-12)**:
- 09: REST API (F01 - FastAPI endpoints)
- 10: SSE Event Streamer (F15 - real-time streaming)
- 11a: Flight Lifecycle Manager (F02.1 - CRUD, initialization, API delegation)
- 11b: Flight Processing Engine (F02.2 - processing loop, recovery coordination)
- 12: Result Manager (F14)

**Support Components (13-16)**:
- 13: Model Manager (F16 - TensorRT)
- 14: Failure Recovery Coordinator (F11)
- 15: Configuration Manager (F17)
- 16: Database Layer (F03)

### System Integration Tests (21-25): Multi-Component Flows
Tests integration between multiple components.

- 21: End-to-End Normal Flight
- 22: Satellite to Vision Pipeline (F04 → F07/F08/F09)
- 23: Vision to Optimization Pipeline (F07/F08/F09 → F10)
- 24: Multi-Component Error Propagation
- 25: Real-Time Streaming Pipeline (F02 → F14 → F15)

### Acceptance Tests (31-50): Requirements Validation
Tests mapped to the 10 acceptance criteria.

**Accuracy (31-33)**:
- 31: AC-1 - 80% < 50m error (baseline)
- 32: AC-1 - 80% < 50m error (varied terrain)
- 33: AC-2 - 60% < 20m error (high precision)

**Robustness - Outliers (34-35)**:
- 34: AC-3 - Single 350m outlier handling
- 35: AC-3 - Multiple outliers handling

**Robustness - Sharp Turns (36-38)**:
- 36: AC-4 - Sharp turn zero overlap recovery
- 37: AC-4 - Sharp turn minimal overlap (<5%)
- 38: Outlier anchor detection

**Multi-Fragment (39)**:
- 39: AC-5 - Multi-fragment route connection (chunk architecture)

**User Interaction (40)**:
- 40: AC-6 - User input after 3 consecutive failures

**Performance (41-44)**:
- 41: AC-7 - <5s single image processing
- 42: AC-7 - Sustained throughput performance
- 43: AC-8 - Real-time streaming results
- 44: AC-8 - Async refinement delivery

**Quality Metrics (45-47)**:
- 45: AC-9 - Registration rate >95% (baseline)
- 46: AC-9 - Registration rate >95% (challenging conditions)
- 47: AC-10 - Mean Reprojection Error <1.0 pixels

**Cross-Cutting (48-50)**:
- 48: Long flight (3000 images)
- 49: Degraded satellite data
- 50: Complete system acceptance validation

### GPS-Analyzed Scenario Tests (51-54): Real Data
Tests using GPS-analyzed test datasets.

- 51: Test_Baseline (AD000001-030) - Standard flight
- 52: Test_Outlier_350m (AD000045-050) - Outlier scenario
- 53: Test_Sharp_Turn - Multiple sharp turn datasets
- 54: Test_Long_Flight (AD000001-060) - Full dataset

**Chunk-Based Recovery (55-56)**:
- 55: Chunk rotation recovery (rotation sweeps for chunks)
- 56: Multi-chunk simultaneous processing (Atlas architecture)

## Test Data

### GPS Analysis Results
- Mean distance: 120.8m
- Min distance: 24.2m
- Max distance: 268.6m

**Identified Sharp Turns (>200m)**:
- AD000003 → AD000004: 202.2m
- AD000032 → AD000033: 220.6m
- AD000042 → AD000043: 234.2m
- AD000044 → AD000045: 230.2m
- AD000047 → AD000048: 268.6m (largest outlier)
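
The frame-to-frame distances above can be recomputed from the ground-truth coordinates with a haversine check; a minimal sketch (the 200m threshold is the figure used in this summary, and the sample coordinates in the usage note are the AD000047/AD000048 ground-truth values quoted later in this document):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS-84 points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def sharp_turns(points, threshold_m=200.0):
    """Return (index, distance) for consecutive GPS fixes farther apart than the threshold."""
    return [
        (i, haversine_m(*points[i], *points[i + 1]))
        for i in range(len(points) - 1)
        if haversine_m(*points[i], *points[i + 1]) > threshold_m
    ]
```

For example, `haversine_m(48.249414, 37.343296, 48.249114, 37.346895)` for the AD000047 → AD000048 pair evaluates to roughly 268.6m, matching the largest outlier listed above.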

### Test Datasets
- **Test_Baseline**: AD000001-030 (30 images, normal spacing)
- **Test_Outlier_350m**: AD000045-050 (6 images, 268.6m outlier)
- **Test_Sharp_Turn_A**: AD000042, AD000044, AD000045, AD000046 (skip 043)
- **Test_Sharp_Turn_B**: AD000032-035 (220m jump)
- **Test_Sharp_Turn_C**: AD000003, AD000009 (5-frame gap)
- **Test_Long_Flight**: AD000001-060 (all 60 images, all variations)

## Acceptance Criteria Coverage

| AC | Requirement | Test Specs | Status |
|----|-------------|------------|--------|
| AC-1 | 80% < 50m error | 31, 32, 50, 51, 54 | ✓ Covered |
| AC-2 | 60% < 20m error | 33, 50, 51, 54 | ✓ Covered |
| AC-3 | 350m outlier robust | 34, 35, 50, 52, 54 | ✓ Covered |
| AC-4 | Sharp turn <5% overlap | 36, 37, 50, 53, 54, 55 | ✓ Covered |
| AC-5 | Multi-fragment connection | 39, 50, 56 | ✓ Covered |
| AC-6 | User input after 3 failures | 40, 50 | ✓ Covered |
| AC-7 | <5s per image | 41, 42, 50, 51, 54 | ✓ Covered |
| AC-8 | Real-time + refinement | 43, 44, 50 | ✓ Covered |
| AC-9 | Registration >95% | 45, 46, 50, 51, 54 | ✓ Covered |
| AC-10 | MRE <1.0px | 47, 50 | ✓ Covered |

## Component to Test Mapping

| Component | ID | Integration Test |
|-----------|-----|------------------|
| Flight API | F01 | 09 |
| Flight Lifecycle Manager | F02.1 | 11a |
| Flight Processing Engine | F02.2 | 11b |
| Flight Database | F03 | 16 |
| Satellite Data Manager | F04 | 05 |
| Image Input Pipeline | F05 | 07 |
| Image Rotation Manager | F06 | 08 |
| Sequential Visual Odometry | F07 | 01 |
| Global Place Recognition | F08 | 02 |
| Metric Refinement | F09 | 03 |
| Factor Graph Optimizer | F10 | 04 |
| Failure Recovery Coordinator | F11 | 14 |
| Route Chunk Manager | F12 | 39, 55, 56 |
| Coordinate Transformer | F13 | 06 |
| Result Manager | F14 | 12 |
| SSE Event Streamer | F15 | 10 |
| Model Manager | F16 | 13 |
| Configuration Manager | F17 | 15 |

## Test Execution Strategy

### Phase 1: Component Integration (01-16)
- Validate each component individually
- Verify interfaces and dependencies
- Establish baseline performance metrics

### Phase 2: System Integration (21-25)
- Test multi-component interactions
- Validate end-to-end flows
- Verify error handling across components

### Phase 3: Acceptance Testing (31-50)
- Validate all acceptance criteria
- Use GPS-analyzed real data
- Measure against requirements

### Phase 4: Special Scenarios (51-56)
- Test specific GPS-identified situations
- Validate outliers and sharp turns
- Chunk-based recovery scenarios
- Full system validation

## Success Criteria Summary
- **Integration Tests**: All components pass individual tests, interfaces work correctly
- **System Tests**: Multi-component flows work, errors handled properly
- **Acceptance Tests**: All 10 ACs met with real data
- **Overall**: System meets all requirements, ready for deployment

## Test Metrics to Track
- **Accuracy**: Mean error, RMSE, percentiles
- **Performance**: Processing time per image, total time
- **Reliability**: Registration rate, success rate
- **Quality**: MRE, confidence scores
- **Robustness**: Outlier handling, error recovery

## Notes
- All test specs follow the standard format (Integration vs Acceptance)
- GPS-analyzed datasets are based on actual test data coordinates
- Specifications are ready for QA team implementation
- No code included per requirement
- Tests cover all components and all acceptance criteria

# Integration Test: Sequential Visual Odometry (Layer 1)

## Summary
Test the SuperPoint + LightGlue sequential tracking pipeline for frame-to-frame relative pose estimation in continuous UAV flight scenarios.

## Component Under Test
**Component**: Sequential Visual Odometry (Layer 1)
**Technologies**: SuperPoint (feature detection), LightGlue (attention-based matching)
**Location**: `gps_denied_07_sequential_visual_odometry`

## Dependencies
- Model Manager (TensorRT models for SuperPoint and LightGlue)
- Image Input Pipeline (preprocessed images)
- Configuration Manager (algorithm parameters)

## Test Scenarios

### Scenario 1: Normal Sequential Tracking
**Input Data**:
- Images: AD000001.jpg through AD000010.jpg (10 consecutive images)
- Ground truth: coordinates.csv
- Camera parameters: data_parameters.md (400m altitude, 25mm focal length)

**Expected Output**:
- Relative pose transformations between consecutive frames
- Feature match count >100 matches per frame pair
- Inlier ratio >70% after geometric verification
- Translation vectors consistent with ~120m spacing

**Maximum Execution Time**: 100ms per frame pair

**Success Criteria**:
- All 9 frame pairs successfully matched
- Estimated relative translations within 20% of ground truth distances
- Rotation estimates within 5 degrees of expected values

### Scenario 2: Low Overlap (<5%)
**Input Data**:
- Images: AD000042, AD000044, AD000045 (sharp turn with gap)
- Sharp turn causes minimal overlap between AD000042 and AD000044

**Expected Output**:
- LightGlue adaptive depth mechanism activates (more layers)
- Lower match count (10-50 matches) but high confidence
- System reports a low-confidence flag for downstream fusion

**Maximum Execution Time**: 200ms per difficult frame pair

**Success Criteria**:
- At least 10 high-quality matches found
- Inlier ratio >50% despite low overlap
- Confidence metric accurately reflects matching difficulty

### Scenario 3: Repetitive Agricultural Texture
**Input Data**:
- Images from AD000015-AD000025 (likely agricultural fields)
- High texture repetition challenge

**Expected Output**:
- SuperPoint detects semantically meaningful features (field boundaries, roads)
- LightGlue dustbin mechanism rejects ambiguous matches
- Stable tracking despite texture repetition

**Maximum Execution Time**: 100ms per frame pair

**Success Criteria**:
- Match count >80 per frame pair
- No catastrophic matching failures (>50% outliers)
- Tracking continuity maintained across sequence

## Performance Requirements
- SuperPoint inference: <20ms per image (RTX 2060/3070)
- LightGlue matching: <80ms per frame pair
- Combined pipeline: <100ms per frame (normal overlap)
- TensorRT FP16 optimization mandatory

## Quality Metrics
- Match count: Mean >100, Min >50 (normal overlap)
- Inlier ratio: Mean >70%, Min >50%
- Feature distribution: >30% of image area covered
- Geometric consistency: Epipolar error <1.0 pixels
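
The match-count, inlier-ratio, and feature-distribution metrics above can be computed directly from the matcher output; a minimal NumPy sketch (the array shapes and the 8×8 coverage grid are illustrative assumptions, not part of the spec):

```python
import numpy as np

def vo_quality_metrics(kpts, inlier_mask, img_w, img_h, grid=8):
    """Match count, inlier ratio, and coarse feature coverage for one frame pair.

    kpts: (N, 2) matched keypoint pixel coordinates in the first image.
    inlier_mask: (N,) boolean mask from geometric verification.
    """
    n = len(kpts)
    inlier_ratio = float(inlier_mask.sum()) / n if n else 0.0
    # Coverage: fraction of grid cells containing at least one matched keypoint.
    cx = np.clip((kpts[:, 0] / img_w * grid).astype(int), 0, grid - 1)
    cy = np.clip((kpts[:, 1] / img_h * grid).astype(int), 0, grid - 1)
    coverage = len(set(zip(cx.tolist(), cy.tolist()))) / float(grid * grid)
    return {"match_count": n, "inlier_ratio": inlier_ratio, "coverage": coverage}
```

The coverage figure approximates the ">30% of image area covered" criterion; a production check would likely use the convex hull of the keypoints instead.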

# Integration Test: Global Place Recognition

## Summary
Validate the Layer 2 (L2) Global Place Recognition component using AnyLoc (DINOv2 + VLAD) for retrieving matching satellite tiles when UAV tracking is lost.

## Component Under Test
**Component**: Global Place Recognition (L2)
**Location**: `gps_denied_08_global_place_recognition`
**Dependencies**:
- Model Manager (DINOv2 model)
- Satellite Data Manager (pre-cached satellite tiles)
- Coordinate Transformer
- Faiss index database

## Detailed Description
This test validates that the Global Place Recognition component can:
1. Extract DINOv2 features from UAV images
2. Aggregate features using VLAD into compact descriptors
3. Query the Faiss index to retrieve the top-K similar satellite tiles
4. Handle "kidnapped robot" scenarios (zero overlap with the previous frame)
5. Work with potentially outdated satellite imagery

The component solves the critical problem of recovering location after sharp turns or tracking loss, where sequential matching fails.
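
Step 2 above (VLAD aggregation) sums the residuals of local features against their nearest visual-word centers; a minimal NumPy sketch, assuming the cluster centers come from an offline vocabulary (an assumption here, the spec does not fix the vocabulary source):

```python
import numpy as np

def vlad(features, centers):
    """Aggregate local descriptors into an L2-normalized VLAD vector.

    features: (N, D) local descriptors (e.g. DINOv2 patch features).
    centers:  (K, D) visual-word centers from an offline vocabulary.
    Returns a (K*D,) global descriptor.
    """
    # Hard-assign each feature to its nearest center.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
    assign = d2.argmin(axis=1)
    v = np.zeros_like(centers)
    for k in range(len(centers)):
        sel = features[assign == k]
        if len(sel):
            v[k] = (sel - centers[k]).sum(axis=0)  # residual sum per word
    v = np.sign(v) * np.sqrt(np.abs(v))  # power normalization
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n else v
```

The resulting unit-norm vectors are what would be indexed in Faiss for the top-K query in step 3.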

## Input Data

### Test Case 1: Normal Flight Recovery
- **UAV Image**: AD000001.jpg
- **Ground truth GPS**: 48.275292, 37.385220
- **Expected**: Should retrieve the satellite tile containing this location in the top-5
- **Satellite reference**: AD000001_gmaps.png

### Test Case 2: After Sharp Turn
- **UAV Image**: AD000044.jpg (after skipping AD000043)
- **Ground truth GPS**: 48.251489, 37.343079
- **Context**: Simulates zero-overlap scenario
- **Expected**: Should relocalize despite no sequential context

### Test Case 3: Maximum Distance Jump
- **UAV Image**: AD000048.jpg (268.6m from previous)
- **Ground truth GPS**: 48.249114, 37.346895
- **Context**: Largest outlier in dataset
- **Expected**: Should retrieve the correct region

### Test Case 4: Middle of Route
- **UAV Image**: AD000030.jpg
- **Ground truth GPS**: 48.259677, 37.352165
- **Expected**: Accurate retrieval for an interior route point

### Test Case 5: Route Start vs End
- **UAV Images**: AD000001.jpg and AD000060.jpg
- **Ground truth GPS**:
  - AD000001: 48.275292, 37.385220
  - AD000060: 48.256246, 37.357485
- **Expected**: Both should retrieve distinct correct regions

## Expected Output

For each test case:
```json
{
  "success": true/false,
  "query_image": "AD000001.jpg",
  "top_k_tiles": [
    {
      "tile_id": "tile_xyz",
      "center_gps": [lat, lon],
      "similarity_score": <float 0-1>,
      "distance_to_gt_m": <float>
    }
  ],
  "top1_correct": true/false,
  "top5_correct": true/false,
  "processing_time_ms": <float>
}
```

## Success Criteria

**Per Test Case**:
- top1_correct = true (best match within 200m of ground truth) OR
  top5_correct = true (at least one of top-5 within 200m of ground truth)
- processing_time_ms < 200
- similarity_score of correct match > 0.6

**Test Case Specific**:
- **Test Case 1**: top1_correct = true (reference image available)
- **Test Cases 2-4**: top5_correct = true
- **Test Case 5**: Both images should have top5_correct = true

## Maximum Expected Time
- **Per query**: < 200ms (on RTX 3070)
- **Per query**: < 300ms (on RTX 2060)
- **Faiss index initialization**: < 5 seconds
- **Total test suite**: < 10 seconds

## Test Execution Steps

1. **Setup Phase**:
   a. Initialize Satellite Data Manager
   b. Load or create Faiss index with satellite tile descriptors
   c. Verify satellite coverage for test area (48.25-48.28°N, 37.34-37.39°E)
   d. Load DINOv2 model via Model Manager

2. **Execution Phase** (for each test case):
   a. Load UAV image from test data
   b. Extract DINOv2 features
   c. Aggregate to VLAD descriptor
   d. Query Faiss index for top-5 matches
   e. Calculate distance from retrieved tiles to ground truth GPS
   f. Record timing and accuracy metrics

3. **Validation Phase**:
   a. Verify top-K accuracy for each test case
   b. Check processing time constraints
   c. Validate similarity scores are reasonable
   d. Ensure no duplicate tiles in top-K results

## Pass/Fail Criteria

**Overall Test Passes If**:
- At least 4 out of 5 test cases meet success criteria (80% pass rate)
- Average processing time < 200ms
- No crashes or exceptions
- Top-5 recall rate > 85%

**Test Fails If**:
- More than 1 test case fails
- Any processing time exceeds 500ms
- Faiss index fails to load or query
- Memory usage exceeds 8GB
- Top-5 recall rate < 70%

## Additional Validation

**Robustness Tests**:
- Query with rotated image (90°, 180°, 270°) - should still retrieve the correct tile
- Query with brightness-adjusted image (±30%) - should maintain similarity score > 0.5
- Sequential queries should return consistent results (deterministic)

**Performance Metrics to Report**:
- Top-1 Recall@200m: percentage of queries where the best match is within 200m
- Top-5 Recall@200m: percentage of queries where any of the top-5 is within 200m
- Mean Average Precision (mAP)
- Average query latency
- Memory footprint of Faiss index
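
The Recall@200m figures can be computed straight from the per-query JSON records; a short sketch using the field names defined in this spec's expected-output schema:

```python
def recall_at_k(results, k, radius_m=200.0):
    """Fraction of queries whose top-k tiles include one within radius_m of ground truth.

    results: list of dicts shaped like the expected-output JSON, where each
    "top_k_tiles" entry carries "distance_to_gt_m".
    """
    hits = 0
    for r in results:
        tiles = r["top_k_tiles"][:k]
        if any(t["distance_to_gt_m"] <= radius_m for t in tiles):
            hits += 1
    return hits / len(results) if results else 0.0
```

`recall_at_k(results, 1)` and `recall_at_k(results, 5)` then give the Top-1 and Top-5 Recall@200m values listed above.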

# Integration Test: Metric Refinement

## Summary
Validate the Layer 3 (L3) Metric Refinement component using LiteSAM for precise cross-view geo-localization between UAV images and satellite tiles.

## Component Under Test
**Component**: Metric Refinement (L3)
**Location**: `gps_denied_09_metric_refinement`
**Dependencies**:
- Model Manager (TensorRT engine for LiteSAM)
- Global Place Recognition (provides candidate satellite tiles)
- Coordinate Transformer (pixel-to-GPS conversion)
- Satellite Data Manager

## Detailed Description
This test validates that the Metric Refinement component can:
1. Accept a UAV image and candidate satellite tile from L2
2. Compute a dense correspondence field using LiteSAM
3. Estimate the homography transformation between images
4. Extract precise GPS coordinates from the homography
5. Achieve the target accuracy of <50m for 80% of images and <20m for 60% of images
6. Handle scale variations due to altitude changes

The component is critical for meeting the accuracy requirements (AC-1, AC-2) by providing absolute GPS anchors to reset drift in the factor graph.

## Input Data

### Test Case 1: High-Quality Reference Match
- **UAV Image**: AD000001.jpg
- **Satellite Tile**: AD000001_gmaps.png (reference image available)
- **Ground truth GPS**: 48.275292, 37.385220
- **Expected accuracy**: < 10m (best-case scenario)

### Test Case 2: Standard Flight Point
- **UAV Image**: AD000015.jpg
- **Satellite Tile**: Retrieved via L2 for this location
- **Ground truth GPS**: 48.268291, 37.369815
- **Expected accuracy**: < 20m

### Test Case 3: After Sharp Turn
- **UAV Image**: AD000033.jpg (after 220m jump)
- **Satellite Tile**: Retrieved via L2
- **Ground truth GPS**: 48.258653, 37.347004
- **Expected accuracy**: < 50m

### Test Case 4: Near Outlier Region
- **UAV Image**: AD000047.jpg
- **Satellite Tile**: Retrieved via L2
- **Ground truth GPS**: 48.249414, 37.343296
- **Expected accuracy**: < 50m

### Test Case 5: End of Route
- **UAV Image**: AD000060.jpg
- **Satellite Tile**: Retrieved via L2
- **Ground truth GPS**: 48.256246, 37.357485
- **Expected accuracy**: < 20m

### Test Case 6: Multi-Scale Test
- **UAV Images**: AD000010.jpg, AD000020.jpg, AD000030.jpg
- **Context**: Test consistency across different parts of the route
- **Expected**: All should achieve < 50m accuracy

## Expected Output

For each test case:
```json
{
  "success": true/false,
  "uav_image": "AD000001.jpg",
  "satellite_tile_id": "tile_xyz",
  "estimated_gps": {
    "lat": <float>,
    "lon": <float>
  },
  "ground_truth_gps": {
    "lat": <float>,
    "lon": <float>
  },
  "error_meters": <float>,
  "confidence": <float 0-1>,
  "num_correspondences": <integer>,
  "homography_matrix": [[h11, h12, h13], [h21, h22, h23], [h31, h32, h33]],
  "processing_time_ms": <float>
}
```

## Success Criteria

**Per Test Case**:
- success = true
- num_correspondences > 50
- confidence > 0.6
- processing_time_ms < 100 (RTX 3070) or < 150 (RTX 2060)

**Test Case Specific Accuracy**:
- **Test Case 1**: error_meters < 10m
- **Test Case 2**: error_meters < 20m
- **Test Case 3**: error_meters < 50m
- **Test Case 4**: error_meters < 50m
- **Test Case 5**: error_meters < 20m
- **Test Case 6**: All three < 50m

**Overall Accuracy Targets** (aligned with AC-1, AC-2):
- At least 80% of test cases achieve error < 50m
- At least 60% of test cases achieve error < 20m

## Maximum Expected Time
- **Per image pair**: < 100ms (on RTX 3070)
- **Per image pair**: < 150ms (on RTX 2060)
- **Model loading**: < 5 seconds
- **Total test suite**: < 15 seconds

## Test Execution Steps

1. **Setup Phase**:
   a. Initialize Model Manager and load the LiteSAM TensorRT engine
   b. Initialize Satellite Data Manager with pre-cached tiles
   c. Initialize Coordinate Transformer for GPS calculations
   d. Verify satellite tiles are georeferenced correctly

2. **For Each Test Case**:
   a. Load UAV image from test data
   b. Retrieve the appropriate satellite tile (via L2 or pre-specified reference)
   c. Run LiteSAM to compute the correspondence field
   d. Estimate homography from correspondences
   e. Extract GPS coordinates using the homography and satellite tile georeference
   f. Calculate haversine distance to ground truth
   g. Record all metrics

3. **Validation Phase**:
   a. Calculate percentage achieving <50m accuracy
   b. Calculate percentage achieving <20m accuracy
   c. Verify processing times meet constraints
   d. Check for outliers (errors >100m)
   e. Validate confidence scores correlate with accuracy

4. **Report Generation**:
   a. Per-image results table
   b. Accuracy distribution histogram
   c. Timing statistics
   d. Pass/fail determination
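
Step 2e above maps a UAV pixel through the estimated homography into satellite-tile pixels, then into GPS via the tile's georeference. A sketch assuming a north-up tile with a simple linear degrees-per-pixel georeference (an assumption; the real Coordinate Transformer may use a different model):

```python
import numpy as np

def center_to_gps(H, img_w, img_h, tile_origin, deg_per_px):
    """Project the UAV image center into the satellite tile and return (lat, lon).

    H: 3x3 homography mapping UAV pixels to tile pixels.
    tile_origin: (lat, lon) of the tile's top-left pixel.
    deg_per_px: (dlat, dlon) per pixel; dlat is negative for a north-up tile.
    """
    c = np.array([img_w / 2.0, img_h / 2.0, 1.0])
    u, v, w = H @ c
    px, py = u / w, v / w  # perspective division into tile pixels
    lat = tile_origin[0] + py * deg_per_px[0]
    lon = tile_origin[1] + px * deg_per_px[1]
    return lat, lon
```

Step 2f then measures the haversine distance between this estimate and the ground-truth fix to produce `error_meters`.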

## Pass/Fail Criteria

**Overall Test Passes If**:
- ≥80% of test cases achieve error <50m (meets AC-1)
- ≥60% of test cases achieve error <20m (meets AC-2)
- Average processing time <100ms
- No test case exceeds 200m error
- Success rate >90%

**Test Fails If**:
- <80% achieve error <50m
- <60% achieve error <20m
- Any test case exceeds 500m error (catastrophic failure)
- More than 1 test case fails completely (success = false)
- Average processing time >150ms

## Additional Validation

**Robustness Tests**:
1. **Scale Variation**: Test with artificially scaled UAV images (0.8x, 1.2x) - should maintain accuracy
2. **Rotation**: Test with rotated UAV images (±15°) - should be detected via the rotation manager
3. **Seasonal Difference**: If available, test with satellite imagery from a different season - should maintain <100m accuracy
4. **Low Contrast**: Test with brightness/contrast-adjusted images - should degrade gracefully

**Quality Metrics**:
- **RMSE (Root Mean Square Error)**: Overall RMSE should be <30m
- **Median Error**: Should be <25m
- **90th Percentile Error**: Should be <60m
- **Correspondence Quality**: Average num_correspondences should be >100
- **Confidence Correlation**: Correlation between confidence and accuracy should be >0.5

## Error Analysis
If the test fails, analyze:
- Distribution of errors across test cases
- Correlation between num_correspondences and accuracy
- Relationship between GPS distance jumps and accuracy degradation
- Impact of terrain features (fields vs roads) on accuracy
- Processing time variance across test cases

# Integration Test: Factor Graph Optimizer (F10)

## Summary
Validate the Factor Graph Optimizer component using GTSAM to fuse sequential relative poses (L1) and absolute GPS anchors (L3) into a globally consistent trajectory, with native multi-chunk support for disconnected route segments.

## Component Under Test
**Component**: Factor Graph Optimizer (F10)
**Interface**: `IFactorGraphOptimizer`
**Dependencies**:
- F07 Sequential Visual Odometry - provides relative factors
- F09 Metric Refinement - provides absolute GPS factors
- F12 Route Chunk Manager - chunk lifecycle (F10 provides low-level graph ops)
- F13 Coordinate Transformer
- H02 GSD Calculator - scale resolution
- H03 Robust Kernels - outlier handling
- GTSAM library

## Detailed Description
This test validates that the Factor Graph Optimizer can:
1. Build and maintain a factor graph with pose nodes and measurement factors
2. Add relative pose factors from sequential tracking
3. Add absolute GPS factors from satellite anchoring
4. Apply robust cost functions (Huber/Cauchy) to handle outliers
5. Optimize the graph to produce a globally consistent trajectory
6. Handle 350m outliers without catastrophic failure
7. Incrementally update the graph as new measurements arrive
8. Maintain Mean Reprojection Error (MRE) < 1.0 pixels

The optimizer is the "brain" of ASTRAL-Next, reconciling potentially conflicting measurements to produce the final trajectory.
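
The fusion described in items 2-5 can be illustrated on a 1-D toy problem: relative factors chain consecutive positions, absolute factors pin individual ones, and a Huber weight down-weights outlier residuals. This NumPy sketch solves the reweighted linear least squares directly; the real component optimizes GTSAM pose variables, not scalars, so treat this purely as an illustration of the factor structure:

```python
import numpy as np

def huber_weight(r, delta=1.0):
    """IRLS weight for the Huber robust kernel applied to a normalized residual."""
    a = abs(r)
    return 1.0 if a <= delta else delta / a

def fuse_1d(rel, abs_meas, n, iters=5):
    """Fuse relative steps and absolute anchors into n positions (1-D toy).

    rel:      list of (i, j, dx, sigma) meaning x[j] - x[i] = dx
    abs_meas: list of (i, z, sigma) meaning x[i] = z
    """
    x = np.zeros(n)
    for _ in range(iters):  # iteratively reweighted least squares
        A, b = [], []
        for i, j, dx, s in rel:
            w = huber_weight((x[j] - x[i] - dx) / s) / s
            row = np.zeros(n); row[i], row[j] = -w, w
            A.append(row); b.append(w * dx)
        for i, z, s in abs_meas:
            w = huber_weight((x[i] - z) / s) / s
            row = np.zeros(n); row[i] = w
            A.append(row); b.append(w * z)
        x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x
```

With consistent measurements the solver recovers the exact chain; with a grossly wrong relative step (the 350m-outlier analogue), the Huber weight shrinks that factor's influence so the GPS anchors dominate.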

## Input Data

### Test Case 1: Simple Sequential Chain
- **Images**: AD000001-AD000010 (10 images)
- **Input**: 9 relative pose factors from L1 (sequential pairs)
- **Input**: 3 absolute GPS factors from L3 (images 1, 5, 10)
- **Expected**: Smooth trajectory close to ground truth

### Test Case 2: With Outlier (AC-3)
- **Images**: AD000045-AD000050 (6 images, includes the 268.6m outlier)
- **Input**: 5 relative pose factors (including outlier AD000047→AD000048)
- **Input**: 2 absolute GPS factors (images 45, 50)
- **Expected**: Robust kernel down-weights the outlier; trajectory remains consistent

### Test Case 3: Sharp Turn Recovery (AC-4)
- **Images**: AD000042, AD000044, AD000045, AD000046 (skip AD000043)
- **Input**: Sequential factors where available, GPS anchor at AD000044
- **Expected**: Graph handles the missing sequential link by relying on the GPS anchor

### Test Case 4: Baseline Full Route
- **Images**: AD000001-AD000030 (30 images)
- **Input**: 29 relative factors, 6 GPS anchors (every 5th image)
- **Expected**: Meet AC-1 and AC-2 accuracy targets

### Test Case 5: Altitude Constraint Test
- **Images**: AD000010-AD000015
- **Input**: Relative factors with varying scale, altitude prior = 400m
- **Expected**: Optimizer maintains a consistent altitude of ~400m

### Test Case 6: Incremental Update Test
- **Images**: AD000001-AD000020
- **Input**: Add measurements incrementally (simulate real-time operation)
- **Expected**: Trajectory converges smoothly; past poses may be refined

### Test Case 7: Create Chunk Subgraph
- **Input**: flight_id, chunk_id, start_frame_id = 20
- **Expected**:
  - create_chunk_subgraph() returns True
  - New subgraph created for the chunk
  - Chunk isolated from the main trajectory

### Test Case 8: Add Relative Factors to Chunk
- **Chunk**: chunk_2 with frames 20-30
- **Input**: 10 relative factors from VO
- **Expected**:
  - add_relative_factor_to_chunk() returns True for each
  - Factors added to the chunk's subgraph only
  - Main trajectory unaffected

### Test Case 9: Add Chunk Anchor
- **Chunk**: chunk_2 (frames 20-30, unanchored)
- **Input**: GPS anchor at frame 25
- **Expected**:
  - add_chunk_anchor() returns True
  - Chunk can now be merged
  - Chunk optimization triggered

### Test Case 10: Optimize Chunk
- **Chunk**: chunk_2 with anchor
- **Input**: optimize_chunk(chunk_id, iterations=10)
- **Expected**:
  - Returns OptimizationResult
  - converged = True
  - Chunk trajectory consistent
  - Other chunks unaffected

### Test Case 11: Merge Chunk Subgraphs
- **Chunks**: chunk_1 (frames 1-10), chunk_2 (frames 20-30, anchored)
- **Input**: merge_chunk_subgraphs(flight_id, chunk_2, chunk_1, transform)
- **Expected**:
  - Returns True
  - chunk_2 merged into chunk_1
  - Sim(3) transform applied correctly
  - Global consistency maintained
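
The Sim(3) merge in Test Case 11 maps every pose of the unanchored chunk into the anchored chunk's frame via p' = s·R·p + t. A 2-D (Sim(2)) NumPy sketch of the transform application; how the transform itself is estimated is outside this sketch and left to the real merge logic:

```python
import numpy as np

def apply_sim2(points, scale, theta, t):
    """Map chunk-local 2-D positions into the target frame: p' = s * R(theta) @ p + t."""
    c, s_ = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s_], [s_, c]])
    return scale * points @ R.T + np.asarray(t)
```

Verifying that merged poses stay consistent under this transform is exactly the "Sim(3) transform applied correctly" check in the expected results above, reduced to the plane.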

### Test Case 12: Get Chunk Trajectory
- **Chunk**: chunk_2 with 10 frames
- **Input**: get_chunk_trajectory(flight_id, chunk_id)
- **Expected**:
  - Returns Dict[int, Pose] with 10 frames
  - Poses in the chunk's coordinate system

### Test Case 13: Optimize Global
- **Setup**: 3 chunks, 2 anchored, 1 merged
- **Input**: optimize_global(flight_id, iterations=50)
- **Expected**:
  - All chunks optimized together
  - Global consistency achieved
  - Returns OptimizationResult with all frame IDs

### Test Case 14: Multi-Flight Isolation
- **Setup**: 2 flights processing simultaneously
- **Input**: Add factors to both flights
- **Expected**:
  - Each flight's graph isolated
  - No cross-contamination
  - Independent optimization results

### Test Case 15: Delete Flight Graph
- **Setup**: Flight with complex trajectory and chunks
- **Input**: delete_flight_graph(flight_id)
- **Expected**:
  - Returns True
  - All resources cleaned up
  - No memory leaks
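
Test Cases 7-15 exercise the chunk-level API named above. A minimal in-memory stand-in showing the expected call shapes and return values; the method names come from this spec, but the bodies are illustrative assumptions, not the GTSAM-backed implementation:

```python
from collections import defaultdict

class ChunkGraphStub:
    """In-memory stand-in for the chunk-level factor-graph API (illustrative only)."""

    def __init__(self):
        # flight_id -> chunk_id -> {"factors": [...], "anchors": [...], "start": int}
        self.flights = defaultdict(dict)

    def create_chunk_subgraph(self, flight_id, chunk_id, start_frame_id):
        self.flights[flight_id][chunk_id] = {
            "factors": [], "anchors": [], "start": start_frame_id,
        }
        return True

    def add_relative_factor_to_chunk(self, flight_id, chunk_id, i, j, measurement):
        chunk = self.flights[flight_id].get(chunk_id)
        if chunk is None:
            return False  # unknown chunk: factor rejected
        chunk["factors"].append((i, j, measurement))
        return True

    def add_chunk_anchor(self, flight_id, chunk_id, frame_id, gps):
        chunk = self.flights[flight_id].get(chunk_id)
        if chunk is None:
            return False
        chunk["anchors"].append((frame_id, gps))
        return True

    def delete_flight_graph(self, flight_id):
        # Returns True only if the flight existed, matching Test Case 15.
        return self.flights.pop(flight_id, None) is not None
```

A test harness for Test Cases 7-9 and 14-15 can drive this stub to validate call ordering and isolation before wiring in the real optimizer.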

## Expected Output

For each test case:
```json
{
  "success": true/false,
  "num_poses": <integer>,
  "num_factors": <integer>,
  "num_relative_factors": <integer>,
  "num_absolute_factors": <integer>,
  "optimization_iterations": <integer>,
  "final_error": <float>,
  "mean_reprojection_error_px": <float>,
  "optimized_trajectory": [
    {
      "image": "AD000001.jpg",
      "estimated_gps": [lat, lon],
      "ground_truth_gps": [lat, lon],
      "error_m": <float>,
      "covariance": [[...]]
    }
  ],
  "accuracy_stats": {
    "percent_under_50m": <float>,
    "percent_under_20m": <float>,
    "mean_error_m": <float>,
    "median_error_m": <float>,
    "rmse_m": <float>
  },
  "processing_time_ms": <float>
}
```
|
||||
## Success Criteria

**Test Case 1 (Simple Chain)**:
- success = true
- percent_under_50m ≥ 90%
- mean_reprojection_error_px < 1.0
- processing_time_ms < 500

**Test Case 2 (With Outlier)**:
- success = true
- The outlier factor should have low weight in the final solution
- Other poses should maintain accuracy (percent_under_50m ≥ 80%)
- mean_reprojection_error_px < 1.5

**Test Case 3 (Sharp Turn)**:
- success = true
- The gap between AD000042 and AD000044 should be bridged by a GPS anchor
- percent_under_50m ≥ 75%
- processing_time_ms < 300

**Test Case 4 (Baseline)**:
- percent_under_50m ≥ 80% (meets AC-1)
- percent_under_20m ≥ 60% (meets AC-2)
- mean_reprojection_error_px < 1.0 (meets AC-10)
- processing_time_ms < 1000

**Test Case 5 (Altitude)**:
- All poses should have altitude within ±50m of 400m
- Scale drift should be prevented

**Test Case 6 (Incremental)**:
- Each incremental update completes in < 100ms
- Final trajectory matches batch optimization (within 5m)

**Test Cases 7-9 (Chunk Creation & Factors)**:
- Chunk subgraph created successfully
- Factors added to correct chunk
- Chunk anchor enables merging

**Test Cases 10-11 (Chunk Optimization & Merging)**:
- Chunk optimizes independently
- Sim(3) transform applied correctly
- Merged trajectory globally consistent

**Test Cases 12-13 (Chunk Queries & Global)**:
- Chunk trajectory retrieved correctly
- Global optimization handles all chunks

**Test Cases 14-15 (Isolation & Cleanup)**:
- Multi-flight isolation maintained
- Resource cleanup complete
## Maximum Expected Time
- **Small graph (10 poses)**: < 500ms
- **Medium graph (30 poses)**: < 1000ms
- **Incremental update**: < 100ms per new pose
- **Create chunk subgraph**: < 10ms
- **Add factor to chunk**: < 5ms
- **Add chunk anchor**: < 50ms
- **Optimize chunk (10 frames)**: < 100ms
- **Merge chunks**: < 200ms
- **Optimize global (50 frames, 3 chunks)**: < 500ms
- **Total test suite**: < 60 seconds
## Test Execution Steps

1. **Setup Phase**:
   a. Initialize GTSAM optimizer with appropriate noise models
   b. Configure robust cost functions (Huber kernel with threshold 50m)
   c. Set up altitude prior constraint (400m ± 50m)
   d. Initialize coordinate transformer

2. **For Each Test Case**:
   a. Load ground truth GPS data
   b. Construct simulated L1 relative factors (or load from pre-computed)
   c. Construct simulated L3 absolute factors (with realistic noise)
   d. Build factor graph incrementally:
      - Add pose variables
      - Add relative factors between consecutive poses
      - Add absolute GPS factors at specified anchors
      - Add altitude prior factors
   e. Run optimization
   f. Extract optimized trajectory
   g. Compare against ground truth
   h. Calculate accuracy statistics

3. **Validation Phase**:
   a. Verify accuracy targets are met
   b. Check mean reprojection error
   c. Validate processing times
   d. Inspect covariances (uncertainty should be reasonable)
   e. Check that robust kernels activated for outliers

4. **Report Generation**:
   a. Trajectory visualization (if possible)
   b. Error distribution plots
   c. Factor residuals before/after optimization
   d. Timing breakdown
   e. Pass/fail determination
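The down-weighting behavior that the validation phase checks for (step 3e) can be illustrated with the Huber kernel's IRLS weight function. This is a minimal sketch, not the component's real code; the 50m threshold follows setup step 1b, and `huber_weight` is an illustrative name:

```python
def huber_weight(residual_m: float, k: float = 50.0) -> float:
    """IRLS weight of the Huber kernel: 1 inside the threshold, k/|r| beyond it."""
    r = abs(residual_m)
    return 1.0 if r <= k else k / r

# An inlier GPS residual keeps full weight; a 100m outlier is halved.
print(huber_weight(20.0))   # → 1.0
print(huber_weight(100.0))  # → 0.5
```

A factor whose weight ends up well below 1 after optimization is one the kernel has effectively suppressed, which is what Test Case 2 expects for the injected outlier.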
## Pass/Fail Criteria

**Overall Test Passes If**:
- At least 12 out of 15 test cases meet their individual success criteria
- Test Case 4 (Baseline) must pass (validates AC-1, AC-2, AC-10)
- Test Cases 7-11 (chunk operations) must pass (validates multi-chunk architecture)
- No crashes or numerical instabilities
- Memory usage remains stable

**Test Fails If**:
- Test Case 4 fails to meet AC-1, AC-2, or AC-10
- Chunk creation/merging fails
- Multi-flight isolation violated
- More than 3 test cases fail completely
- Optimizer produces NaN or infinite values
- Processing time exceeds 2x maximum expected time
- Memory leak detected (> 500MB growth)
## Additional Validation

**Robustness Analysis**:
1. **Outlier Rejection**: Verify that outlier factors have reduced weights after optimization
2. **Convergence**: Check that optimization converges within 50 iterations
3. **Consistency**: Run optimization multiple times with the same input - should get identical results
4. **Covariance**: Verify that uncertainty grows in sections with no GPS anchors

**Edge Cases**:
1. **No GPS Anchors**: Graph with only relative factors - should drift but not crash
2. **No Relative Factors**: Graph with only GPS anchors - should produce disconnected poses
3. **Conflicting Measurements**: GPS anchor disagrees with relative factors by 100m - robust kernel should handle

**Performance Metrics**:
- **Optimization speedup**: Compare incremental (iSAM2) vs batch optimization
- **Scalability**: Test with 100 poses - should complete in < 5 seconds
- **Memory efficiency**: Memory usage should scale linearly with number of poses

## Error Analysis
If the test fails, analyze:
- Which factors have the largest residuals
- Whether robust kernels are activating appropriately
- Whether the altitude constraint is being respected
- Whether relative factors are consistent with absolute factors
- Whether numerical conditioning is good (check condition number of the Hessian)
- Correlation between number of GPS anchors and accuracy
# Integration Test: Satellite Data Manager

## Summary
Validate the Satellite Data Manager component responsible for downloading, caching, and providing georeferenced satellite tiles from Google Maps for the operational area.

## Component Under Test
**Component**: Satellite Data Manager
**Location**: `gps_denied_04_satellite_data_manager`
**Dependencies**:
- Google Maps Static API
- Coordinate Transformer (for tile bounds calculation)
- File system / cache storage
- Database Layer (for tile metadata)

## Detailed Description
This test validates that the Satellite Data Manager can:
1. Download satellite tiles from the Google Maps Static API for specified GPS coordinates
2. Calculate correct tile bounding boxes using the Web Mercator projection
3. Cache tiles efficiently to avoid redundant downloads
4. Provide georeferenced tiles with accurate meters-per-pixel (GSD)
5. Handle zoom level 19 for the operational area (Eastern/Southern Ukraine)
6. Manage tile expiration and refresh
7. Handle API errors and rate limiting gracefully

The component provides the reference satellite imagery that all absolute localization (L2, L3) depends on.

## Input Data

### Test Case 1: Single Tile Download
- **Center GPS**: 48.275292, 37.385220 (AD000001 location)
- **Zoom Level**: 19
- **Tile Size**: 640x640 pixels
- **Expected**: Download and cache tile with correct georeferencing

### Test Case 2: Area Coverage
- **Bounding Box**:
  - North: 48.28°, South: 48.25°
  - East: 37.39°, West: 37.34°
- **Zoom Level**: 19
- **Expected**: Download grid of overlapping tiles covering entire area

### Test Case 3: Cache Hit
- **Request**: Same tile as Test Case 1
- **Expected**: Return cached tile without API call, verify integrity

### Test Case 4: Georeferencing Accuracy
- **Center GPS**: 48.260117, 37.353469 (AD000029 location)
- **Zoom Level**: 19
- **Expected**: Calculate meters_per_pixel accurately (~0.20 m/px at 48°N)

### Test Case 5: Tile Bounds Calculation
- **Center GPS**: 48.256246, 37.357485 (AD000060 location)
- **Expected**: Northwest and southeast corners calculated correctly

### Test Case 6: Multiple Zoom Levels
- **Center GPS**: 48.270334, 37.374442 (AD000011 location)
- **Zoom Levels**: 17, 18, 19
- **Expected**: Download and correctly georeference tiles at all zoom levels
## Expected Output

For each test case:
```json
{
  "success": true/false,
  "tile_id": "unique_tile_identifier",
  "center_gps": {"lat": <float>, "lon": <float>},
  "zoom_level": <integer>,
  "tile_size_px": {"width": <int>, "height": <int>},
  "bounds": {
    "nw": {"lat": <float>, "lon": <float>},
    "se": {"lat": <float>, "lon": <float>}
  },
  "meters_per_pixel": <float>,
  "file_path": "path/to/cached/tile.png",
  "file_size_kb": <float>,
  "cached": true/false,
  "download_time_ms": <float>,
  "api_calls_made": <integer>
}
```
## Success Criteria

**Test Case 1 (Single Tile)**:
- success = true
- Tile downloaded and saved to cache
- file_size_kb > 10 (valid PNG image)
- download_time_ms < 5000
- meters_per_pixel ≈ 0.20 (±0.05) at zoom 19 and 48°N

**Test Case 2 (Area Coverage)**:
- success = true for all tiles
- Coverage complete (no gaps)
- Total download time < 120 seconds
- All tiles have valid georeferencing

**Test Case 3 (Cache Hit)**:
- cached = true
- download_time_ms < 100 (cache read)
- api_calls_made = 0
- File integrity verified (hash match)

**Test Case 4 (Georeferencing)**:
- meters_per_pixel calculation error < 1%
- Bounding box corners consistent with center point

**Test Case 5 (Bounds Calculation)**:
- NW corner is actually northwest of center
- SE corner is actually southeast of center
- Tile footprint width and height match tile_size_px × meters_per_pixel

**Test Case 6 (Multi-Zoom)**:
- All zoom levels download successfully
- meters_per_pixel doubles with each zoom level decrease
- Higher zoom (19) has better resolution than lower zoom (17)
## Maximum Expected Time
- **Single tile download**: < 5 seconds
- **Cache read**: < 100ms
- **Area coverage (20 tiles)**: < 120 seconds
- **Total test suite**: < 180 seconds
## Test Execution Steps

1. **Setup Phase**:
   a. Configure Google Maps API key
   b. Initialize cache directory
   c. Clear any existing cached tiles for a clean test
   d. Initialize database for tile metadata

2. **Test Case 1 - Single Tile**:
   a. Request tile for AD000001 location
   b. Verify API call made
   c. Check file downloaded and cached
   d. Validate georeferencing metadata
   e. Verify image is valid PNG

3. **Test Case 2 - Area Coverage**:
   a. Define bounding box for test area
   b. Calculate required tile grid
   c. Request all tiles
   d. Verify complete coverage
   e. Check for overlaps

4. **Test Case 3 - Cache Hit**:
   a. Request same tile as Test Case 1
   b. Verify no API call made
   c. Verify fast retrieval
   d. Check file integrity

5. **Test Case 4 - Georeferencing**:
   a. Request tile
   b. Calculate expected meters_per_pixel using formula:
      `meters_per_pixel = 156543.03392 * cos(lat * π/180) / 2^zoom`
   c. Compare with reported value
   d. Validate bounds calculation

6. **Test Case 5 - Bounds**:
   a. Request tile
   b. Verify NW corner lies northwest of center and SE corner southeast of it
   c. Calculate distances using haversine
   d. Verify consistency

7. **Test Case 6 - Multi-Zoom**:
   a. Request same location at zoom 17, 18, 19
   b. Verify resolution differences
   c. Check file sizes increase with zoom
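The georeferencing checks in steps 5 and 6 can be sketched directly from the formula above. This is a minimal illustration, not the component's real API; `tile_bounds` uses a locally-flat approximation (adequate over a few hundred metres at zoom 19) and its name is an assumption:

```python
import math

def meters_per_pixel(lat_deg: float, zoom: int) -> float:
    """Ground resolution of one Web Mercator pixel (formula from step 5b)."""
    return 156543.03392 * math.cos(math.radians(lat_deg)) / (2 ** zoom)

def tile_bounds(lat: float, lon: float, zoom: int, size_px: int = 640) -> dict:
    """Approximate NW/SE corners of a tile centred on (lat, lon)."""
    half_m = size_px / 2 * meters_per_pixel(lat, zoom)
    dlat = half_m / 111_320.0                                  # metres per degree of latitude
    dlon = half_m / (111_320.0 * math.cos(math.radians(lat)))  # degree of longitude shrinks with latitude
    return {"nw": (lat + dlat, lon - dlon), "se": (lat - dlat, lon + dlon)}

print(round(meters_per_pixel(48.275292, 19), 3))  # → 0.199
```

Note that the cos(lat) term matters at the operational latitude: the same zoom-19 pixel covers ≈0.30 m at the equator but only ≈0.20 m at 48°N.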
## Pass/Fail Criteria

**Overall Test Passes If**:
- All 6 test cases meet their success criteria
- No API errors (except acceptable rate limiting with retry)
- All tiles are valid images
- Georeferencing accuracy within 1%
- Cache mechanism works correctly
- No file system errors

**Test Fails If**:
- Any tile download fails permanently
- Georeferencing error > 5%
- Cache hits return wrong tiles
- API key invalid or quota exceeded
- File system permissions prevent caching
- Memory usage exceeds 2GB

## Additional Validation

**API Resilience**:
- Test with invalid API key - should fail gracefully with a clear error
- Test with rate limiting simulation - should retry with exponential backoff
- Test with network timeout - should handle the timeout and retry

**Cache Management**:
- Verify cache size limits are enforced
- Test cache eviction (LRU or similar)
- Verify stale tile detection and refresh

**Coordinate Edge Cases**:
- Test at extreme latitudes (if applicable)
- Test at dateline crossing (lon ≈ 180°)
- Test tile alignment at zoom level boundaries

**Quality Metrics**:
- Image quality inspection (not completely black/white/corrupted)
- Verify actual zoom level matches requested
- Check for Google watermarks/logos presence (expected)
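The cache-eviction check above can be exercised against a minimal LRU sketch. The real component's eviction policy and API are unspecified; `TileCache` and its method names are illustrative assumptions:

```python
from collections import OrderedDict

class TileCache:
    """Minimal LRU tile cache sketch; a real cache would also bound bytes on disk."""

    def __init__(self, max_tiles: int = 3):
        self.max_tiles = max_tiles
        self._tiles = OrderedDict()

    def get(self, tile_id):
        if tile_id in self._tiles:
            self._tiles.move_to_end(tile_id)   # mark as most recently used
            return self._tiles[tile_id]
        return None                            # cache miss

    def put(self, tile_id, data):
        self._tiles[tile_id] = data
        self._tiles.move_to_end(tile_id)
        if len(self._tiles) > self.max_tiles:
            self._tiles.popitem(last=False)    # evict least recently used

cache = TileCache(max_tiles=2)
cache.put("z19_a", b"png-a")
cache.put("z19_b", b"png-b")
cache.get("z19_a")               # touch a, so b becomes the LRU entry
cache.put("z19_c", b"png-c")     # exceeds the limit and evicts z19_b
print(cache.get("z19_b"))        # → None (evicted)
```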
## Error Scenarios to Test

1. **No Internet Connection**: Should fail gracefully, use cached tiles if available
2. **Invalid GPS Coordinates**: Should reject or clamp to valid range
3. **Unsupported Zoom Level**: Should reject or default to nearest valid zoom
4. **Disk Space Full**: Should fail with clear error message
5. **Corrupted Cache File**: Should detect and re-download

## Performance Metrics to Report

- Average download time per tile
- Cache hit ratio
- API calls per test run
- Peak memory usage
- Disk space used by cache
- Tile download success rate
# Integration Test: Coordinate Transformer

## Summary
Validate the Coordinate Transformer component for accurate conversion between GPS coordinates (WGS84), Web Mercator projections (EPSG:3857), and pixel coordinates in satellite tiles.

## Component Under Test
**Component**: Coordinate Transformer
**Location**: `gps_denied_12_coordinate_transformer`
**Dependencies**:
- Python libraries: pyproj or equivalent mathematical projection formulas
- No internal component dependencies (standalone utility)

## Detailed Description
This test validates that the Coordinate Transformer can:
1. Convert GPS coordinates (lat/lon) to Web Mercator (x/y meters)
2. Convert Web Mercator to GPS coordinates (inverse projection)
3. Calculate pixel coordinates within satellite tiles at various zoom levels
4. Compute meters-per-pixel (GSD) accurately for different latitudes
5. Handle edge cases (poles, dateline, etc.)
6. Maintain numerical precision to avoid GPS drift errors
7. Calculate haversine distances between GPS points

This component is critical for georeferencing satellite tiles and converting between the different coordinate systems used throughout ASTRAL-Next.

## Input Data

### Test Case 1: GPS to Mercator Conversion
- **Input GPS**: 48.275292, 37.385220 (AD000001)
- **Expected Mercator**: Calculated using the Web Mercator formula
- **Validation**: Inverse conversion should return the original GPS

### Test Case 2: Mercator to GPS Conversion
- **Input Mercator**: Result from Test Case 1
- **Expected GPS**: 48.275292, 37.385220
- **Tolerance**: < 0.000001° (≈ 0.1m)

### Test Case 3: Meters-Per-Pixel Calculation
- **Inputs**: Various latitudes and zoom levels
  - Lat 48°N, Zoom 19: Expected ~0.20 m/px
  - Lat 48°N, Zoom 18: Expected ~0.40 m/px
  - Lat 48°N, Zoom 17: Expected ~0.80 m/px
  - Lat 0° (Equator), Zoom 19: Expected 0.298 m/px
- **Formula**: `156543.03392 * cos(lat * π/180) / 2^zoom`

### Test Case 4: Pixel to GPS Conversion
- **Satellite Tile**: Center at 48.260117, 37.353469, Zoom 19, Size 640x640
- **Pixel Coordinate**: (320, 320) - center pixel
- **Expected GPS**: 48.260117, 37.353469
- **Tolerance**: < 0.00001° (≈ 1m)

### Test Case 5: GPS to Pixel Conversion
- **Satellite Tile**: Same as Test Case 4
- **Input GPS**: 48.260117, 37.353469
- **Expected Pixel**: (320, 320)
- **Tolerance**: < 1 pixel

### Test Case 6: Haversine Distance Calculation
- **Point A**: 48.275292, 37.385220 (AD000001)
- **Point B**: 48.275001, 37.382922 (AD000002)
- **Expected Distance**: ~150-200m (verified by external tool)
- **Tolerance**: < 1m

### Test Case 7: Boundary Calculations
- **Center GPS**: 48.256246, 37.357485 (AD000060)
- **Tile Size**: 640x640 pixels, Zoom 19
- **Calculate**: NW and SE corner coordinates
- **Validation**: Center should be equidistant from corners

### Test Case 8: Precision Test
- **Round Trip**: GPS → Mercator → Pixel → GPS
- **Input**: GPS coordinates of all 60 test images
- **Expected**: Final GPS within 0.1m of original
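The forward and inverse spherical Web Mercator projections exercised by Test Cases 1, 2, and 8 can be sketched from the standard formulas. This is a minimal illustration under the EPSG:3857 spherical model; a production component would typically delegate to pyproj:

```python
import math

R = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)

def gps_to_mercator(lat: float, lon: float):
    x = math.radians(lon) * R
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * R
    return x, y

def mercator_to_gps(x: float, y: float):
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lat, lon

# Round trip for AD000001 stays well inside the 0.000001° tolerance:
lat, lon = mercator_to_gps(*gps_to_mercator(48.275292, 37.385220))
print(abs(lat - 48.275292) < 1e-6 and abs(lon - 37.385220) < 1e-6)  # → True
```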
## Expected Output

For each test case:
```json
{
  "test_case": "GPS_to_Mercator",
  "input": {"lat": 48.275292, "lon": 37.385220},
  "output": {"x_meters": <float>, "y_meters": <float>},
  "expected": {"x_meters": <float>, "y_meters": <float>},
  "error": <float>,
  "success": true/false,
  "processing_time_us": <float>
}
```
## Success Criteria

**Test Case 1 (GPS to Mercator)**:
- Conversion completes without error
- Output values are finite and reasonable
- processing_time_us < 100

**Test Case 2 (Mercator to GPS)**:
- Round-trip error < 0.000001° (≈ 0.1m)
- success = true

**Test Case 3 (Meters-Per-Pixel)**:
- All calculated m/px within 1% of expected values
- Correct relationship: m/px doubles when zoom decreases by 1

**Test Case 4 (Pixel to GPS)**:
- Error < 0.00001° (≈ 1m at 48°N)
- Center pixel correctly maps to center GPS

**Test Case 5 (GPS to Pixel)**:
- Pixel coordinate error < 1.0 pixel
- Result is within tile bounds (0-640)

**Test Case 6 (Haversine)**:
- Distance calculation error < 1m
- Consistent with external validation

**Test Case 7 (Boundaries)**:
- NW corner has larger latitude and smaller longitude than center
- SE corner has smaller latitude and larger longitude than center
- Distances from center to corners are equal (within 1%)

**Test Case 8 (Precision)**:
- All 60 coordinates survive round-trip with < 0.1m error
- No accumulation of errors
## Maximum Expected Time
- **Per conversion**: < 100 microseconds
- **Haversine distance**: < 50 microseconds
- **Complex transformation chain**: < 500 microseconds
- **Total test suite**: < 1 second
## Test Execution Steps

1. **Setup Phase**:
   a. Initialize Coordinate Transformer
   b. Load test GPS coordinates from coordinates.csv
   c. Pre-calculate expected values for validation

2. **Test Case 1 - GPS to Mercator**:
   a. Convert test GPS coordinates to Mercator
   b. Verify output is reasonable
   c. Store for inverse test

3. **Test Case 2 - Mercator to GPS**:
   a. Take output from Test Case 1
   b. Convert back to GPS
   c. Compare with original input
   d. Calculate round-trip error

4. **Test Case 3 - Meters-Per-Pixel**:
   a. Calculate m/px for various lat/zoom combinations
   b. Compare with expected formula results
   c. Verify zoom level relationships

5. **Test Cases 4 & 5 - Pixel Conversions**:
   a. Set up test satellite tile parameters
   b. Convert between pixel and GPS coordinates
   c. Validate both directions

6. **Test Case 6 - Haversine**:
   a. Calculate distances between all consecutive image pairs
   b. Verify distance statistics (mean ~120m)
   c. Check specific known distances

7. **Test Case 7 - Boundaries**:
   a. Calculate tile boundaries for test locations
   b. Verify geometric consistency
   c. Check corner relationships

8. **Test Case 8 - Precision**:
   a. Run all 60 test coordinates through the full transformation chain
   b. Measure accumulated errors
   c. Verify no systematic bias
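Step 6 relies on the haversine formula; a minimal sketch, checked against the consecutive-frame pair from Test Case 6 (the spherical radius of 6371 km is the usual convention, not something the document specifies):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth (R = 6371 km)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Consecutive frames AD000001 → AD000002, inside the expected 150-200m band:
d = haversine_m(48.275292, 37.385220, 48.275001, 37.382922)
print(round(d))  # → 173
```

The symmetry and triangle-inequality consistency checks below follow directly from this function.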
## Pass/Fail Criteria

**Overall Test Passes If**:
- All 8 test cases meet their success criteria
- No numerical errors (NaN, Inf)
- Performance meets timing constraints
- Round-trip precision < 0.1m for all test cases
- No memory leaks

**Test Fails If**:
- Any round-trip error > 1m
- Meters-per-pixel calculation error > 5%
- Haversine distance error > 10m
- Any conversion produces NaN or Inf
- Performance degrades over repeated calls
## Additional Validation

**Edge Cases**:
1. **North Pole** (lat = 90°): Should handle or reject gracefully
2. **South Pole** (lat = -90°): Should handle or reject gracefully
3. **Dateline Crossing** (lon ≈ ±180°): Should handle correctly
4. **Prime Meridian** (lon = 0°): Should work normally
5. **Equator** (lat = 0°): Should calculate m/px correctly

**Numerical Stability**:
- Test with very small angles (< 0.0001°)
- Test with coordinates close to each other (< 1m apart)
- Test with maximum zoom level (23)
- Test with minimum zoom level (0)

**Consistency Checks**:
- Distance from A to B equals distance from B to A
- Triangle inequality: dist(A,C) ≤ dist(A,B) + dist(B,C)
- Midpoint calculation: dist(A,mid) = dist(mid,B) = dist(A,B)/2

**Performance Benchmarks**:
- 10,000 GPS to Mercator conversions in < 1 second
- 10,000 haversine distance calculations in < 500ms
- No performance degradation over 1,000,000 calls

## Reference Values

For validation, pre-calculated values:

**Reference Point 1** (AD000001):
- GPS: 48.275292°N, 37.385220°E
- Web Mercator X: ~4,161,700 meters
- Web Mercator Y: ~6,152,700 meters

**Reference Point 2** (AD000030):
- GPS: 48.259677°N, 37.352165°E
- Distance from Point 1: ~3,000 meters (haversine)

**Meters-per-pixel at 48°N**:
- Zoom 17: ~0.799 m/px
- Zoom 18: ~0.400 m/px
- Zoom 19: ~0.200 m/px
- Zoom 20: ~0.100 m/px

## Error Analysis

If the test fails, analyze:
- Which conversion direction has larger errors
- Whether errors accumulate in round-trips
- Whether errors are latitude-dependent
- Whether zoom level affects accuracy
- Whether there is systematic bias (always over/under)
# Integration Test: Image Input Pipeline

## Summary
Validate the Image Input Pipeline component for loading, preprocessing, and preparing UAV images for processing by vision algorithms.

## Component Under Test
**Component**: Image Input Pipeline
**Location**: `gps_denied_05_image_input_pipeline`
**Dependencies**:
- Image loading libraries (PIL/OpenCV)
- Image Rotation Manager (for orientation correction)
- File system access
- Configuration Manager

## Detailed Description
This test validates that the Image Input Pipeline can:
1. Load UAV images from disk in various formats (JPG, PNG)
2. Read and validate image metadata (resolution, EXIF if available)
3. Resize images appropriately for different processing layers
4. Normalize pixel values for neural network input
5. Handle high-resolution images (6252x4168) efficiently
6. Detect and handle corrupted images gracefully
7. Maintain consistent image quality during preprocessing
8. Support batch loading for multiple images

The component is the entry point for all UAV imagery into the ASTRAL-Next system.

## Input Data

### Test Case 1: Standard Resolution Image
- **Image**: AD000001.jpg
- **Expected Resolution**: 6252x4168 (26MP)
- **Format**: JPEG
- **Expected**: Load successfully, extract metadata

### Test Case 2: Batch Loading
- **Images**: AD000001-AD000010 (10 images)
- **Expected**: All load successfully, maintain order

### Test Case 3: Image Resizing for Vision Layers
- **Image**: AD000015.jpg
- **Target Resolutions**:
  - L1 (Sequential VO): 1024x683 (for SuperPoint)
  - L2 (Global PR): 512x341 (for AnyLoc)
  - L3 (Metric Ref): 640x427 (for LiteSAM)
- **Expected**: Maintain aspect ratio, preserve image quality

### Test Case 4: Pixel Normalization
- **Image**: AD000020.jpg
- **Target Format**: Float32, range [0, 1] or [-1, 1]
- **Expected**: Correct normalization for neural network input

### Test Case 5: EXIF Data Extraction
- **Image**: AD000001.jpg
- **Expected EXIF**: Camera model, focal length, ISO, capture time
- **Note**: May not be present in all images

### Test Case 6: Image Sequence Loading
- **Images**: AD000001-AD000060 (all 60 images)
- **Expected**: Load in order, track sequence number

### Test Case 7: Corrupted Image Handling
- **Image**: Create test file with corrupted data
- **Expected**: Detect corruption, fail gracefully with clear error

### Test Case 8: Rotation Detection
- **Image**: AD000025.jpg (may have orientation metadata)
- **Expected**: Detect if rotation needed, coordinate with Image Rotation Manager
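The resize and normalization behavior of Test Cases 3 and 4 can be sketched with PIL and NumPy. This is a minimal illustration assuming the [0, 1] target range; `load_for_layer` is a hypothetical helper, not the component's real API, and the gray stand-in frame replaces the actual UAV image:

```python
import io
import numpy as np
from PIL import Image

def load_for_layer(src, target_w: int) -> np.ndarray:
    """Load an image, resize to target_w preserving aspect ratio, normalize to [0, 1]."""
    img = Image.open(src).convert("RGB")
    w, h = img.size
    img = img.resize((target_w, round(h * target_w / w)), Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 255.0

# A gray stand-in frame at the UAV resolution, resized for L1 (SuperPoint):
buf = io.BytesIO()
Image.new("RGB", (6252, 4168), "gray").save(buf, format="JPEG")
arr = load_for_layer(buf, 1024)
print(arr.shape, arr.dtype)  # → (683, 1024, 3) float32
```

Note how 6252x4168 maps to 1024x683: rounding the height keeps the aspect ratio within the 1% tolerance required above.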
## Expected Output

For each test case:
```json
{
  "success": true/false,
  "image_path": "path/to/image.jpg",
  "image_id": "AD000001",
  "original_resolution": {"width": 6252, "height": 4168},
  "loaded_resolution": {"width": <int>, "height": <int>},
  "format": "JPEG",
  "file_size_kb": <float>,
  "exif_data": {
    "camera_model": "string",
    "focal_length_mm": <float>,
    "iso": <int>,
    "capture_time": "timestamp"
  },
  "preprocessing_applied": ["resize", "normalize"],
  "output_dtype": "float32",
  "pixel_value_range": {"min": <float>, "max": <float>},
  "loading_time_ms": <float>,
  "preprocessing_time_ms": <float>,
  "memory_mb": <float>
}
```
## Success Criteria

**Test Case 1 (Standard Loading)**:
- success = true
- original_resolution matches expected (6252x4168)
- Format detected correctly
- loading_time_ms < 500
- memory_mb < 200 (for a single 26MP image)

**Test Case 2 (Batch Loading)**:
- All 10 images load successfully
- Images maintain sequential order
- Total loading time < 5 seconds
- Memory usage scales linearly

**Test Case 3 (Resizing)**:
- All target resolutions achieved
- Aspect ratio preserved (within 1%)
- No significant quality degradation (PSNR > 30dB)
- Resizing time < 200ms per image

**Test Case 4 (Normalization)**:
- Pixel values in expected range
- No clipping (min ≥ 0, max ≤ 1 for [0,1] normalization)
- Mean and standard deviation reasonable for natural images
- Normalization time < 100ms

**Test Case 5 (EXIF)**:
- EXIF data extracted if present
- Missing EXIF handled gracefully (not an error)
- Camera model matches known: "ADTi Surveyor Lite 26S v2"
- Focal length matches known: 25mm

**Test Case 6 (Sequence)**:
- All 60 images load successfully
- Sequential order maintained
- Total loading time < 30 seconds
- Memory usage stays bounded (< 2GB)

**Test Case 7 (Corruption)**:
- Corruption detected
- Clear error message provided
- No crash or hang
- Other images in batch can still load

**Test Case 8 (Rotation)**:
- Rotation metadata detected if present
- Coordinates with Image Rotation Manager
- Correct orientation applied
## Maximum Expected Time
- **Single image load**: < 500ms (6252x4168)
- **Single image resize**: < 200ms
- **Single image normalize**: < 100ms
- **Batch of 60 images**: < 30 seconds
- **Total test suite**: < 60 seconds
## Test Execution Steps
|
||||
|
||||
1. **Setup Phase**:
|
||||
a. Initialize Image Input Pipeline
|
||||
b. Configure preprocessing parameters
|
||||
c. Verify test images exist
|
||||
d. Create corrupted test image
|
||||
|
||||
2. **Test Case 1 - Standard Loading**:
|
||||
a. Load AD000001.jpg
|
||||
b. Verify resolution and format
|
||||
c. Check metadata extraction
|
||||
d. Measure timing and memory
|
||||
|
||||
3. **Test Case 2 - Batch Loading**:
|
||||
a. Load AD000001-AD000010
|
||||
b. Verify all load successfully
|
||||
c. Check sequential order
|
||||
d. Measure batch timing
|
||||
|
||||
4. **Test Case 3 - Resizing**:
|
||||
a. Load AD000015.jpg
|
||||
b. Resize to multiple target resolutions
|
||||
c. Verify aspect ratio preservation
|
||||
d. Check image quality (visual or PSNR)
|
||||
|
||||
5. **Test Case 4 - Normalization**:
|
||||
a. Load and normalize AD000020.jpg
|
||||
b. Check pixel value range
|
||||
c. Verify data type conversion
|
||||
d. Validate statistics (mean/std)
|
||||
|
||||
6. **Test Case 5 - EXIF**:
|
||||
a. Load AD000001.jpg
|
||||
b. Attempt EXIF extraction
|
||||
c. Verify known camera parameters
|
||||
d. Handle missing EXIF gracefully
|
||||
|
||||
7. **Test Case 6 - Sequence**:
|
||||
a. Load all 60 images
|
||||
b. Verify complete and ordered
|
||||
c. Monitor memory usage
|
||||
d. Check for memory leaks
|
||||
|
||||
8. **Test Case 7 - Corruption**:
|
||||
a. Attempt to load corrupted test file
|
||||
b. Verify error detection
|
||||
c. Check error message quality
|
||||
d. Ensure no crash
|
||||
|
||||
9. **Test Case 8 - Rotation**:
|
||||
a. Check for orientation metadata
|
||||
b. Apply rotation if needed
|
||||
c. Verify correct orientation
|
||||
d. Measure rotation timing
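
The normalization checks in Test Case 4 (pixel range, dtype conversion, mean/std statistics) can be sketched as below. This is a minimal illustration assuming [0, 1] float32 normalization; `normalize_image` and `validate_normalized` are hypothetical helper names, and a synthetic array stands in for AD000020.jpg:

```python
import numpy as np

def normalize_image(img_u8: np.ndarray) -> np.ndarray:
    """Convert an 8-bit image to float32 in [0, 1]."""
    return img_u8.astype(np.float32) / 255.0

def validate_normalized(img: np.ndarray) -> dict:
    """Checks used in Test Case 4: range, dtype, and basic statistics."""
    return {
        "dtype_ok": img.dtype == np.float32,
        "range_ok": float(img.min()) >= 0.0 and float(img.max()) <= 1.0,
        "mean": float(img.mean()),
        "std": float(img.std()),
    }

# Synthetic stand-in for AD000020.jpg (the real test loads the actual file).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
report = validate_normalized(normalize_image(raw))
assert report["dtype_ok"] and report["range_ok"]
```

For a uniformly distributed 8-bit input, the normalized mean should sit near 0.5, which gives the statistics check a concrete expectation.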

## Pass/Fail Criteria

**Overall Test Passes If**:
- At least 7 out of 8 test cases pass (Test Case 5 may fail if EXIF missing)
- All valid images load successfully
- No crashes or hangs
- Memory usage bounded
- Timing constraints met

**Test Fails If**:
- Any valid image fails to load
- Resolution or format detection fails
- Memory leak detected (>500MB growth)
- Any timing exceeds 2x maximum expected
- Corrupted images cause crash
- Batch loading loses or reorders images

## Additional Validation

**Image Quality Tests**:
1. **Lossy Preprocessing**: Verify resizing doesn't introduce artifacts
2. **Color Space**: Verify RGB ordering (not BGR)
3. **Bit Depth**: Handle 8-bit and 16-bit images
4. **Compression**: Test various JPEG quality levels
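
Several criteria in these specs reference PSNR thresholds (e.g. "visual or PSNR" in Test Case 3). A minimal reference implementation of the metric, shown here as an assumed standalone helper rather than the project's actual code:

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak * peak / mse)

# A uniform +1 offset gives MSE = 1, i.e. PSNR = 20*log10(255) ≈ 48.13 dB.
img = np.full((32, 32), 100, dtype=np.uint8)
assert abs(psnr(img, img + 1) - 48.13) < 0.01
```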

**Memory Management**:
1. **Cleanup**: Verify images released from memory after processing
2. **Large Batch**: Test with 100+ images (memory should not explode)
3. **Repeated Loading**: Load same image 1000 times (no memory growth)
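
The "Repeated Loading" check can be approximated with the standard library's `tracemalloc`. This is a hedged sketch of the harness shape only; `load_image_stub` is a placeholder for the real loader:

```python
import tracemalloc

def load_image_stub(path: str) -> bytes:
    """Stand-in for the real loader: allocates and returns a buffer."""
    return bytes(64 * 1024)  # hypothetical 64 KB payload

tracemalloc.start()
base, _ = tracemalloc.get_traced_memory()
for _ in range(1000):
    buf = load_image_stub("AD000001.jpg")
    del buf  # loader output goes out of scope each iteration
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

growth = current - base
assert growth < 1_000_000, f"possible leak: {growth} bytes retained"
```

With the real loader, the growth bound would be the spec's 500MB limit rather than the 1MB used for this stub.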

**Format Support**:
1. **JPEG**: Primary format (required)
2. **PNG**: Should support (for satellite tiles)
3. **TIFF**: Optional (may be used for high-quality)
4. **RAW**: Not required but document limitations

**Edge Cases**:
1. **Non-Existent File**: Should fail with clear file-not-found error
2. **Empty File**: Should detect as corrupted
3. **Non-Image File**: Should reject with clear error
4. **Symbolic Links**: Should follow or reject appropriately
5. **Permissions**: Should handle read permission errors

**Performance Benchmarks**:
- Load 1000 FullHD images in < 60 seconds
- Resize 100 images to 640x480 in < 10 seconds
- Normalize 100 images in < 5 seconds
- Memory footprint per loaded image < 200MB

## Configuration Testing

Test various configuration options:
- **Resize Method**: bilinear, bicubic, lanczos
- **Normalization**: [0,1], [-1,1], ImageNet statistics
- **Color Mode**: RGB, grayscale
- **Cache Behavior**: cache vs load-on-demand
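
For the aspect-ratio-preservation check (Test Case 3) across the resize configurations above, a small helper can compute the expected target dimensions. `resize_dims` is an illustrative name, not project API:

```python
def resize_dims(src_w: int, src_h: int, target_w: int) -> tuple[int, int]:
    """Target size that preserves aspect ratio for a given output width."""
    return target_w, round(target_w * src_h / src_w)

# 6252x4168 camera frames resized to 640 wide keep the ~3:2 aspect ratio.
w, h = resize_dims(6252, 4168, 640)
assert (w, h) == (640, 427)
```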

## Camera Parameters Validation

For the known camera (ADTi Surveyor Lite 26S v2):
- Resolution: 6252x4168 (26MP) ✓
- Focal Length: 25mm ✓
- Sensor Width: 23.5mm ✓
- Expected Ground Sampling Distance (GSD) at 400m altitude: ~6.0 cm/pixel

Calculate and validate GSD: `GSD = (altitude * sensor_width) / (focal_length * image_width)`
- GSD = (400m * 23.5mm) / (25mm * 6252) = 0.0601m = 6.01cm/pixel
- Effective GSD in the full-resolution image ≈ 6cm/pixel
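
The formula above can be expressed directly as a checkable helper (function and parameter names are illustrative):

```python
def gsd_m_per_px(altitude_m: float, sensor_width_mm: float,
                 focal_length_mm: float, image_width_px: int) -> float:
    """Ground sampling distance from the pinhole-camera relation above.
    Millimetres cancel, so the result is in metres per pixel."""
    return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)

# ADTi Surveyor Lite 26S v2 at 400 m altitude
gsd = gsd_m_per_px(400.0, 23.5, 25.0, 6252)
assert abs(gsd - 0.0601) < 0.0005  # ≈ 6 cm/pixel
```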

## Error Handling Tests

Verify appropriate errors for:
- File path with invalid characters
- Image with unsupported format
- Image with invalid header
- Truncated image file
- Image with mismatched metadata
- Empty directory
- Directory instead of file path

@@ -0,0 +1,291 @@

# Integration Test: Image Rotation Manager

## Summary
Validate the Image Rotation Manager component for detecting and correcting image orientation issues from a non-stabilized UAV camera.

## Component Under Test
**Component**: Image Rotation Manager
**Location**: `gps_denied_06_image_rotation_manager`
**Dependencies**:
- Image Input Pipeline
- EXIF metadata reading
- OpenCV rotation functions
- Configuration Manager

## Detailed Description
This test validates that the Image Rotation Manager can:
1. Detect rotation/orientation from EXIF metadata
2. Detect rotation from image content analysis (horizon detection)
3. Apply rotation correction efficiently
4. Handle pitch and roll variations from the wing-type UAV
5. Maintain image quality during rotation
6. Estimate rotation angle for images without metadata
7. Handle edge cases (upside-down, 90° rotations)
8. Coordinate with vision algorithms that expect specific orientations

Since the UAV camera is "pointing downwards and fixed, but not auto-stabilized" (per restrictions), images may have varying orientations due to aircraft banking and pitch, especially during turns.

## Input Data

### Test Case 1: EXIF Orientation
- **Image**: AD000001.jpg
- **Check**: EXIF orientation tag if present
- **Expected**: Detect and report orientation metadata

### Test Case 2: Straight Flight (No Rotation)
- **Images**: AD000001-AD000010 (straight segment)
- **Expected**: Little to no rotation needed (< 5°)

### Test Case 3: Before/After Sharp Turn
- **Images**: AD000042, AD000044 (sharp turn, skip AD000043)
- **Expected**: Possible rotation difference due to banking

### Test Case 4: Outlier Image
- **Image**: AD000048 (after 268m jump)
- **Expected**: May have significant rotation due to aircraft tilt

### Test Case 5: Content-Based Rotation Detection
- **Image**: AD000025.jpg (mid-route)
- **Method**: Horizon detection or feature alignment
- **Expected**: Estimate rotation angle without metadata

### Test Case 6: Rotation Correction Application
- **Image**: Artificially rotated AD000010.jpg by known angles (15°, 30°, 45°, 90°)
- **Expected**: Detect and correct rotation to within 2°

### Test Case 7: Batch Processing
- **Images**: All 60 images
- **Expected**: Process all images, identify which need rotation correction

### Test Case 8: Extreme Rotation
- **Image**: AD000015.jpg rotated 180° (upside-down)
- **Expected**: Detect and correct significant rotation

## Expected Output

For each test case:
```json
{
  "success": true/false,
  "image_path": "path/to/image.jpg",
  "rotation_detected": true/false,
  "rotation_angle_deg": <float>,
  "rotation_source": "exif|content_analysis|none",
  "confidence": <float 0-1>,
  "rotation_applied": true/false,
  "corrected_image_path": "path/to/corrected_image.jpg",
  "processing_time_ms": <float>,
  "quality_metrics": {
    "sharpness_before": <float>,
    "sharpness_after": <float>
  }
}
```

## Success Criteria

**Test Case 1 (EXIF)**:
- success = true
- EXIF orientation read if present
- Reported correctly
- processing_time_ms < 50

**Test Case 2 (Straight Flight)**:
- All images show rotation_angle_deg < 5°
- rotation_applied = false or minimal correction
- Consistent orientations across sequence

**Test Case 3 (Sharp Turn)**:
- Rotation angles detected if present
- Banking during turn may cause rotation
- Both images processed successfully

**Test Case 4 (Outlier)**:
- Image processed successfully
- Rotation detected if present
- No crash despite unusual geometry

**Test Case 5 (Content-Based)**:
- Rotation angle estimated (confidence > 0.6)
- Estimate within ±5° of ground truth if available
- processing_time_ms < 500

**Test Case 6 (Correction)**:
- All artificially rotated images corrected
- Corrected angle within ±2° of target (0°)
- Image quality preserved (PSNR > 25dB)
- 90° rotations detected and corrected exactly

**Test Case 7 (Batch)**:
- All 60 images processed
- Statistics reported (mean rotation, max rotation)
- Total processing time < 30 seconds
- No crashes or failures

**Test Case 8 (Extreme)**:
- 180° rotation detected (confidence > 0.8)
- Correction applied successfully
- Resulting image is right-side-up

## Maximum Expected Time
- **EXIF reading**: < 50ms
- **Content-based detection**: < 500ms
- **Rotation application**: < 200ms
- **Batch 60 images**: < 30 seconds
- **Total test suite**: < 60 seconds

## Test Execution Steps

1. **Setup Phase**:
   a. Initialize Image Rotation Manager
   b. Load test images
   c. Create artificially rotated test images for Test Case 6
   d. Configure detection parameters

2. **Test Case 1 - EXIF**:
   a. Load AD000001.jpg
   b. Read EXIF orientation tag
   c. Report findings
   d. Measure timing

3. **Test Case 2 - Straight Flight**:
   a. Process AD000001-AD000010
   b. Detect rotation for each
   c. Verify low rotation angles
   d. Check consistency

4. **Test Case 3 - Sharp Turn**:
   a. Process AD000042, AD000044
   b. Compare rotation between images
   c. Check for banking-induced rotation
   d. Verify successful processing

5. **Test Case 4 - Outlier**:
   a. Process AD000048
   b. Handle any unusual orientation
   c. Verify robustness
   d. Check error handling

6. **Test Case 5 - Content-Based**:
   a. Process AD000025.jpg
   b. Apply content-based rotation detection
   c. Estimate rotation angle
   d. Validate confidence score

7. **Test Case 6 - Correction**:
   a. Load artificially rotated images
   b. Detect rotation
   c. Apply correction
   d. Verify accuracy
   e. Check image quality

8. **Test Case 7 - Batch**:
   a. Process all 60 images
   b. Collect statistics
   c. Identify outliers
   d. Report summary

9. **Test Case 8 - Extreme**:
   a. Load 180° rotated image
   b. Detect rotation
   c. Apply correction
   d. Verify result

## Pass/Fail Criteria

**Overall Test Passes If**:
- All test cases meet their success criteria
- No crashes on any image
- Rotation detection accuracy > 85%
- Rotation correction accuracy within ±3°
- Image quality maintained (PSNR > 25dB)

**Test Fails If**:
- Any image causes crash
- Rotation detection fails on obviously rotated images
- Correction introduces artifacts or quality loss > 5dB
- Processing time exceeds 2x maximum expected
- False positive rate > 20% (detecting rotation when none exists)

## Additional Validation

**Rotation Detection Methods**:
1. **EXIF Tag**: Quickest, but may not be present
2. **Horizon Detection**: Use Hough lines to find horizon
3. **Feature Alignment**: Compare to reference orientation
4. **Gravity Vector**: Use vanishing points (vertical structures)
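
Method 1 can be sketched as a lookup from the EXIF Orientation tag to the correction to apply. The mapping below follows the common EXIF convention; the mirrored cases (2, 4, 5, 7) vary between sources, so treat those rows as an assumption, and `correction_for` is an illustrative name:

```python
# Correction per EXIF Orientation value: (clockwise degrees, mirror first?).
# Rows for mirrored orientations (2, 4, 5, 7) follow the common convention
# and should be verified against the EXIF specification.
EXIF_ORIENTATION = {
    1: (0, False),    # already upright
    2: (0, True),
    3: (180, False),  # upside-down
    4: (180, True),
    5: (90, True),
    6: (90, False),   # rotate 90° CW to display upright
    7: (270, True),
    8: (270, False),  # rotate 90° CCW (i.e. 270° CW)
}

def correction_for(tag):
    """Rotation (CW degrees) and mirror flag for an EXIF orientation tag.
    A missing or unknown tag is treated as 'no rotation needed'."""
    return EXIF_ORIENTATION.get(tag or 1, (0, False))

assert correction_for(6) == (90, False)
assert correction_for(None) == (0, False)
```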

**Accuracy Testing**:
For Test Case 6, create rotated versions at:
- 0° (control)
- 5°, 10°, 15° (small rotations)
- 30°, 45° (moderate rotations)
- 90°, 180°, 270° (cardinal rotations)
- -15°, -30° (counter-clockwise)

Measure detection and correction accuracy for each.
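
When scoring the sweep above, a naive subtraction misleads near the 0°/360° wrap (a detection of 358° against a ground truth of -5° is only 3° off). A wrap-aware error metric, sketched as an assumed helper:

```python
def angle_error_deg(detected: float, truth: float) -> float:
    """Smallest absolute difference between two angles, in [0, 180]."""
    d = (detected - truth) % 360.0
    return min(d, 360.0 - d)

assert angle_error_deg(358.0, -5.0) == 3.0
assert angle_error_deg(90.0, 90.0) == 0.0
```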

**Quality Preservation**:
- **Interpolation Method**: Use bicubic or lanczos for high quality
- **Border Handling**: Crop or pad appropriately
- **Metadata Preservation**: Maintain EXIF data in corrected image

**Performance Optimization**:
- Content-based detection should be skippable if EXIF is reliable
- GPU acceleration for batch rotation if available
- Parallel processing for batch operations

**Edge Cases**:
1. **No Clear Horizon**: Agricultural fields may have no clear horizon - should handle gracefully
2. **All-Sky Image**: If camera points up instead of down (error case)
3. **Blank/Dark Image**: Insufficient features for detection
4. **High Texture Image**: Many features but no clear orientation cues

**Integration with Vision Pipeline**:
- SuperPoint should work regardless of minor rotation
- LightGlue should match features despite rotation
- AnyLoc DINOv2 features should be rotation-invariant
- LiteSAM may benefit from consistent orientation

**False Positive Analysis**:
Test images that are correctly oriented:
- Should not detect false rotation
- Should report confidence < 0.5 for "no rotation needed"
- Should not apply unnecessary corrections

**Rotation Statistics for Full Dataset**:
Expected findings:
- Mean rotation angle: < 5° (mostly straight)
- Max rotation angle: ~15-30° during banking
- Images needing correction: < 30%
- Straight segments (01-30): minimal rotation
- Turn segments (42-48): possibly higher rotation

## Camera Characteristics

Given that the UAV camera is:
- "pointing downwards and fixed"
- "not auto-stabilized"
- mounted on a wing-type UAV (banks to turn)

Expected rotation scenarios:
- **Roll**: Banking during turns (up to ±30°)
- **Pitch**: Climbing/descending (usually minimal)
- **Yaw**: Heading change (doesn't affect image rotation)

Most critical: **Roll detection and correction**

## Validation Against Ground Truth

If GPS coordinates and consecutive images are used:
- The direction of flight can be inferred
- The expected "up" direction in the image can be estimated
- Validate detected rotation against flight path geometry
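
The flight-direction inference above can be sketched with a flat-earth approximation, which is adequate over the short baseline between consecutive frames; `heading_deg` is an illustrative helper, not project code:

```python
import math

def heading_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Flight heading (degrees clockwise from north) between two GPS fixes,
    using a flat-earth approximation valid for short UAV image baselines."""
    d_north = lat2 - lat1
    d_east = (lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    return math.degrees(math.atan2(d_east, d_north)) % 360.0

# A due-east displacement gives a heading of 90°.
assert abs(heading_deg(48.0, 37.0, 48.0, 37.01) - 90.0) < 1e-6
```

Comparing this heading against the rotation detected in the image gives the flight-path consistency check described above.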

## Performance Benchmarks

- 1000 EXIF reads in < 5 seconds
- 100 content-based detections in < 30 seconds
- 100 rotation applications in < 10 seconds
- Memory usage per image < 100MB

@@ -0,0 +1,335 @@

# Integration Test: REST API

## Summary
Validate the REST API component that provides HTTP endpoints for managing flights, uploading images, retrieving results, and controlling the ASTRAL-Next system.

## Component Under Test
**Component**: GPS Denied REST API
**Location**: `gps_denied_01_gps_denied_rest_api`
**Dependencies**:
- Flight Manager
- Result Manager
- SSE Event Streamer
- Database Layer
- Image Input Pipeline
- Configuration Manager

## Detailed Description
This test validates that the REST API can:
1. Handle flight creation and management (CRUD operations)
2. Accept image uploads for processing
3. Provide endpoints for result retrieval
4. Support user input for manual fixes (AC-6)
5. Manage authentication and authorization
6. Handle concurrent requests
7. Provide appropriate error responses
8. Follow REST conventions and return proper status codes
9. Support SSE connections for real-time updates

The API is the primary interface for external clients to interact with the GPS-denied navigation system.

## Input Data

### Test Case 1: Create Flight
- **Endpoint**: POST /flights
- **Payload**:
```json
{
  "flight_name": "Test_Flight_001",
  "start_gps": {"lat": 48.275292, "lon": 37.385220},
  "altitude_m": 400,
  "camera_params": {
    "focal_length_mm": 25,
    "sensor_width_mm": 23.5,
    "resolution": {"width": 6252, "height": 4168}
  }
}
```
- **Expected**: 201 Created, returns flight_id

### Test Case 2: Upload Single Image
- **Endpoint**: POST /flights/{flightId}/images
- **Payload**: Multipart form-data with AD000001.jpg
- **Headers**: Content-Type: multipart/form-data
- **Expected**: 202 Accepted, processing started

### Test Case 3: Upload Multiple Images
- **Endpoint**: POST /flights/{flightId}/images/batch
- **Payload**: AD000001-AD000010 (10 images)
- **Expected**: 202 Accepted, all images queued

### Test Case 4: Get Flight Status
- **Endpoint**: GET /flights/{flightId}
- **Expected**: 200 OK, returns processing status and statistics

### Test Case 5: Get Results
- **Endpoint**: GET /flights/{flightId}/results
- **Query Params**: ?format=json&include_refined=true
- **Expected**: 200 OK, returns GPS coordinates for all processed images

### Test Case 6: Get Specific Image Result
- **Endpoint**: GET /flights/{flightId}/results?image_id={imageId}
- **Expected**: 200 OK, returns detailed result for one image

### Test Case 7: Submit User Fix (AC-6)
- **Endpoint**: POST /flights/{flightId}/user-fix
- **Payload**:
```json
{
  "frame_id": 15,
  "uav_pixel": [3126, 2084],
  "satellite_gps": {"lat": 48.270334, "lon": 37.374442}
}
```
- **Expected**: 200 OK, system incorporates user fix, processing resumes

### Test Case 8: List Flights
- **Endpoint**: GET /flights
- **Query Params**: ?status=active&limit=10
- **Expected**: 200 OK, returns list of flights

### Test Case 9: Delete Flight
- **Endpoint**: DELETE /flights/{flightId}
- **Expected**: 204 No Content, flight and associated data deleted

### Test Case 10: Error Handling - Invalid Input
- **Endpoint**: POST /flights
- **Payload**: Invalid JSON or missing required fields
- **Expected**: 400 Bad Request with error details
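
The missing-field check a 400 response should be asserted against can be sketched as below, assuming the Test Case 1 payload shape; the required-field set is illustrative, not the actual API contract:

```python
# Hypothetical required-field schema for POST /flights (mirrors Test Case 1).
REQUIRED_FLIGHT_FIELDS = {"flight_name", "start_gps", "altitude_m", "camera_params"}

def missing_fields(payload: dict) -> set:
    """Fields a well-formed create-flight request must contain but doesn't."""
    return REQUIRED_FLIGHT_FIELDS - payload.keys()

valid = {
    "flight_name": "Test_Flight_001",
    "start_gps": {"lat": 48.275292, "lon": 37.385220},
    "altitude_m": 400,
    "camera_params": {"focal_length_mm": 25},
}
assert missing_fields(valid) == set()
assert missing_fields({"flight_name": "x"}) == {"start_gps", "altitude_m", "camera_params"}
```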

### Test Case 11: Error Handling - Not Found
- **Endpoint**: GET /flights/nonexistent_id
- **Expected**: 404 Not Found

### Test Case 12: SSE Connection
- **Endpoint**: GET /flights/{flightId}/stream
- **Headers**: Accept: text/event-stream
- **Expected**: 200 OK, establishes SSE stream for real-time updates

### Test Case 13: Concurrent Requests
- **Scenario**: 10 simultaneous GET requests for different flights
- **Expected**: All succeed, no race conditions
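
The shape of the Test Case 13 harness can be sketched with a thread pool; `get_flight_status` is a stub standing in for the real HTTP client call:

```python
from concurrent.futures import ThreadPoolExecutor

def get_flight_status(flight_id: str) -> int:
    """Stand-in for GET /flights/{flightId}; the real test issues the
    request with an HTTP client and returns the response status code."""
    return 200

flight_ids = [f"flight_{i:03d}" for i in range(10)]
with ThreadPoolExecutor(max_workers=10) as pool:
    codes = list(pool.map(get_flight_status, flight_ids))

assert codes == [200] * 10  # every concurrent request must succeed
```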

## Expected Output

For each test case, verify HTTP response:
```json
{
  "status_code": <integer>,
  "headers": {
    "Content-Type": "string",
    "X-Request-ID": "string"
  },
  "body": {
    "success": true/false,
    "data": {...},
    "error": {...}
  },
  "response_time_ms": <float>
}
```

## Success Criteria

**Test Case 1 (Create Flight)**:
- status_code = 201
- Response includes valid flight_id
- Flight persisted in database
- response_time_ms < 500

**Test Case 2 (Upload Single)**:
- status_code = 202
- Image accepted and queued
- Processing begins within 5 seconds
- response_time_ms < 2000

**Test Case 3 (Upload Batch)**:
- status_code = 202
- All 10 images accepted
- Queue order maintained
- response_time_ms < 10000

**Test Case 4 (Get Status)**:
- status_code = 200
- Returns valid status JSON
- Includes processing statistics
- response_time_ms < 200

**Test Case 5 (Get Results)**:
- status_code = 200
- Returns GPS coordinates for all processed images
- JSON format correct
- response_time_ms < 1000

**Test Case 6 (Get Specific Result)**:
- status_code = 200
- Returns detailed result for requested image
- Includes confidence scores
- response_time_ms < 200

**Test Case 7 (User Fix)**:
- status_code = 200
- User fix accepted and applied
- System re-optimizes affected trajectory
- response_time_ms < 500

**Test Case 8 (List Flights)**:
- status_code = 200
- Returns array of flights
- Pagination works correctly
- response_time_ms < 300

**Test Case 9 (Delete)**:
- status_code = 204
- Flight deleted from database
- Associated files cleaned up
- response_time_ms < 1000

**Test Case 10 (Invalid Input)**:
- status_code = 400
- Error message is clear and specific
- Returns which fields are invalid

**Test Case 11 (Not Found)**:
- status_code = 404
- Error message indicates resource not found

**Test Case 12 (SSE)**:
- status_code = 200
- SSE connection established
- Receives events when images processed
- Connection remains open

**Test Case 13 (Concurrent)**:
- All requests succeed
- No 500 errors
- No data corruption
- Average response_time < 500ms

## Maximum Expected Time
- **GET requests**: < 500ms
- **POST create**: < 500ms
- **POST upload single**: < 2 seconds
- **POST upload batch**: < 10 seconds
- **DELETE**: < 1 second
- **Total test suite**: < 120 seconds

## Test Execution Steps

1. **Setup Phase**:
   a. Start REST API server
   b. Initialize database with clean state
   c. Prepare test images
   d. Configure test client

2. **Execute Test Cases Sequentially**:
   a. Run Test Case 1 (Create Flight) - store flight_id
   b. Run Test Case 2 (Upload Single Image) using flight_id
   c. Wait for processing or use Test Case 4 to poll status
   d. Run Test Case 3 (Upload Batch)
   e. Run Test Case 5 (Get Results)
   f. Run Test Case 6 (Get Specific Result)
   g. Run Test Case 7 (User Fix)
   h. Run Test Case 8 (List Flights)
   i. Run Test Case 12 (SSE Connection) in parallel
   j. Run Test Case 13 (Concurrent Requests)
   k. Run Test Case 10 (Invalid Input)
   l. Run Test Case 11 (Not Found)
   m. Run Test Case 9 (Delete Flight) - cleanup

3. **Validation Phase**:
   a. Verify all status codes correct
   b. Check response times
   c. Validate JSON schemas
   d. Verify database state
   e. Check file system state

4. **Cleanup**:
   a. Delete test flights
   b. Remove uploaded images
   c. Clear database test data
   d. Stop API server

## Pass/Fail Criteria

**Overall Test Passes If**:
- All 13 test cases return expected status codes
- Response times meet constraints
- JSON responses match schemas
- No unhandled exceptions or crashes
- Database transactions are atomic
- File uploads/downloads work correctly

**Test Fails If**:
- Any test case returns wrong status code
- Any response time exceeds 2x maximum expected
- Server crashes or becomes unresponsive
- Data corruption occurs
- Race conditions detected in concurrent tests
- SSE connection fails or drops unexpectedly

## Additional Validation

**REST Conventions**:
- Proper use of HTTP methods (GET, POST, PUT, DELETE)
- Idempotent operations (GET, PUT, DELETE)
- Resource-based URLs
- Appropriate status codes (2xx, 4xx, 5xx)

**JSON Schema Validation**:
Verify responses match defined schemas for:
- Flight object
- Image object
- Result object
- Error object
- Status object

**Security Tests**:
1. **Authentication**: Verify API requires valid auth tokens (if implemented)
2. **Authorization**: Verify users can only access their own flights
3. **Input Sanitization**: Test SQL injection, XSS attempts
4. **Rate Limiting**: Verify rate limiting works (if implemented)
5. **CORS**: Verify CORS headers if cross-origin access needed

**Error Handling**:
Test various error scenarios:
- Malformed JSON
- Missing required fields
- Invalid data types
- Oversized payloads
- Unsupported content types
- Database connection failure
- Disk full
- Processing timeout

**Performance Under Load**:
- 100 concurrent GET requests: avg response time < 1 second
- Upload 100 images: complete within 5 minutes
- 1000 status polls: no degradation

**SSE Specific Tests**:
- Reconnection after disconnect
- Message ordering
- Buffering behavior
- Heartbeat/keep-alive
- Error event handling

**File Upload Tests**:
- Maximum file size enforcement
- Supported formats (JPEG, PNG)
- Rejected formats (TXT, EXE)
- Corrupted image handling
- Duplicate filename handling
- Concurrent uploads to same flight

## API Documentation Compliance

Verify endpoints match documented API spec in:
`docs/02_components/gps_denied_01_gps_denied_rest_api/gps_denied_rest_api_spec.md`

Check:
- All documented endpoints exist
- Request/response formats match
- Error codes match documentation
- Required vs optional fields correct

@@ -0,0 +1,357 @@

# Integration Test: SSE Event Streamer

## Summary
Validate the Server-Sent Events (SSE) Event Streamer component that provides real-time updates about image processing status and results to connected clients.

## Component Under Test
**Component**: SSE Event Streamer
**Location**: `gps_denied_14_sse_event_streamer`
**Dependencies**:
- REST API (provides SSE endpoint)
- Result Manager (source of events)
- Flight Manager (flight state changes)
- Factor Graph Optimizer (trajectory updates)

## Detailed Description
This test validates that the SSE Event Streamer can:
1. Establish SSE connections from clients
2. Stream events in real-time as images are processed (AC-8)
3. Send immediate results as soon as available (AC-8)
4. Send refined results when trajectory is re-optimized (AC-8)
5. Maintain multiple concurrent client connections
6. Handle client disconnections gracefully
7. Implement proper SSE protocol (event format, keep-alive)
8. Provide event filtering and subscriptions
9. Buffer events for slow clients

Per AC-8: "Results of image processing should appear immediately to user, so that user shouldn't wait for the whole route to complete in order to analyze first results. Also, system could refine existing calculated results and send refined results again to user."
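
Validating capability 7 requires the test client to parse raw SSE frames. A minimal parser following the SSE wire format (blank-line-separated frames, `:` comment lines as keep-alives), written as an assumed test helper rather than the component's own code:

```python
import json

def parse_sse(stream_text: str) -> list:
    """Parse raw SSE frames into (event_name, data) pairs.
    Frames are separated by blank lines; ':' lines are keep-alive comments."""
    events = []
    for frame in stream_text.split("\n\n"):
        name, data_lines = "message", []
        for line in frame.splitlines():
            if line.startswith(":"):
                continue  # heartbeat/comment, only keeps the connection alive
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((name, json.loads("\n".join(data_lines))))
    return events

raw = ('event: image_processed\n'
       'data: {"image_id": "AD000001", "sequence_number": 1}\n\n'
       ': keep-alive\n\n')
parsed = parse_sse(raw)
assert parsed == [("image_processed", {"image_id": "AD000001", "sequence_number": 1})]
```

The same parser supports the ordering check in Test Case 9: collect `sequence_number` from each parsed event and assert the list is strictly increasing.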
|
||||
|
||||
## Input Data
|
||||
|
||||
### Test Case 1: Single Client Connection
|
||||
- **Scenario**: Client connects to SSE stream for a flight
|
||||
- **Flight**: Test_Flight_001 with 10 images
|
||||
- **Expected**: Receive events for each image processed
|
||||
|
||||
### Test Case 2: Multiple Client Connections
|
||||
- **Scenario**: 5 clients connect to same flight stream
|
||||
- **Flight**: Test_Flight_002 with 10 images
|
||||
- **Expected**: All clients receive all events
|
||||
|
||||
### Test Case 3: Image Processed Event
|
||||
- **Trigger**: Image AD000001.jpg completed by vision pipeline
|
||||
- **Expected Event**:
|
||||
```json
|
||||
{
|
||||
"event": "image_processed",
|
||||
"data": {
|
||||
"flight_id": "flight_123",
|
||||
"image_id": "AD000001",
|
||||
"sequence_number": 1,
|
||||
"estimated_gps": {"lat": 48.275292, "lon": 37.385220},
|
||||
"confidence": 0.92,
|
||||
"processing_time_ms": 450,
|
||||
"timestamp": "2025-11-24T21:00:00Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Test Case 4: Trajectory Refined Event
|
||||
- **Trigger**: Factor graph optimizes and refines past trajectory
|
||||
- **Expected Event**:
|
||||
```json
|
||||
{
|
||||
"event": "trajectory_refined",
|
||||
"data": {
|
||||
"flight_id": "flight_123",
|
||||
"refined_images": ["AD000001", "AD000002", "AD000003"],
|
||||
"refinement_reason": "new_gps_anchor",
|
||||
"timestamp": "2025-11-24T21:00:05Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Test Case 5: User Fix Applied Event
|
||||
- **Trigger**: User submits manual GPS fix for image
|
||||
- **Expected Event**:
|
||||
```json
|
||||
{
|
||||
"event": "user_fix_applied",
|
||||
"data": {
|
||||
"flight_id": "flight_123",
|
||||
"image_id": "AD000015",
|
||||
"fixed_gps": {"lat": 48.268291, "lon": 37.369815},
|
||||
"affected_images": 5,
|
||||
"timestamp": "2025-11-24T21:00:10Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Test Case 6: Processing Error Event
|
||||
- **Trigger**: Image fails to process
|
||||
- **Expected Event**:
|
||||
```json
|
||||
{
|
||||
"event": "processing_error",
|
||||
"data": {
|
||||
"flight_id": "flight_123",
|
||||
"image_id": "AD000999",
|
||||
"error_type": "tracking_lost",
|
||||
"error_message": "Insufficient features for matching",
|
||||
"timestamp": "2025-11-24T21:00:15Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Test Case 7: Flight Status Event
|
||||
- **Trigger**: Flight processing completes
|
||||
- **Expected Event**:
|
||||
```json
|
||||
{
|
||||
"event": "flight_status",
|
||||
"data": {
|
||||
"flight_id": "flight_123",
|
||||
"status": "completed",
|
||||
"total_images": 60,
|
||||
"processed_images": 60,
|
||||
"success_rate": 0.95,
|
||||
"mean_error_m": 23.5,
|
||||
"timestamp": "2025-11-24T21:05:00Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Test Case 8: Keep-Alive Heartbeat
|
||||
- **Scenario**: No events for 30 seconds
|
||||
- **Expected**: Heartbeat/comment sent every 15 seconds to keep connection alive
|
||||
|
||||
### Test Case 9: Event Ordering
|
||||
- **Scenario**: Process 5 images rapidly
|
||||
- **Expected**: Events received in correct sequence order
|
||||
|
||||
### Test Case 10: Client Reconnection
|
||||
- **Scenario**: Client disconnects and reconnects
|
||||
- **Expected**: Can resume stream, no missed events (if buffered)
|
||||
|
||||
### Test Case 11: Slow Client
|
||||
- **Scenario**: Client with slow network connection
|
||||
- **Expected**: Events buffered, connection maintained or gracefully closed
|
||||
|
||||
### Test Case 12: Event Filtering
|
||||
- **Scenario**: Client subscribes only to "image_processed" events
|
||||
- **Expected**: Only receives filtered event types
|
||||
|
||||
## Expected Output
|
||||
|
||||
For each test case, verify:
|
||||
```json
|
||||
{
|
||||
"connection_established": true/false,
|
||||
"events_received": <integer>,
|
||||
"events_expected": <integer>,
|
||||
"event_types": ["image_processed", "trajectory_refined", ...],
|
||||
"latency_ms": <float>,
|
||||
"connection_duration_s": <float>,
|
||||
"disconnection_reason": "string|null"
|
||||
}
|
||||
```
|
||||
|
||||
## Success Criteria

**Test Case 1 (Single Client)**:
- connection_established = true
- events_received = 10 (one per image)
- Average latency < 500ms from event generation to client receipt
- No disconnections

**Test Case 2 (Multiple Clients)**:
- All 5 clients receive all events
- No event loss
- Similar latency across clients

**Test Cases 3-7 (Event Types)**:
- All event types received correctly
- JSON format valid and parseable
- All required fields present
- Timestamps reasonable

**Test Case 8 (Keep-Alive)**:
- Heartbeat sent every ~15 seconds
- Connection does not time out
- No proxy/intermediary closes the connection

**Test Case 9 (Ordering)**:
- Events received in order of sequence_number
- No out-of-order delivery

**Test Case 10 (Reconnection)**:
- Reconnection succeeds
- If Last-Event-ID is supported, resumes from last event
- Otherwise starts a fresh stream

**Test Case 11 (Slow Client)**:
- Buffer grows but doesn't exceed limit
- If limit exceeded, connection closed gracefully
- No server-side memory leak

**Test Case 12 (Filtering)**:
- Only subscribed event types received
- Other events not sent
- Reduces bandwidth usage

## Maximum Expected Time
- **Connection establishment**: < 100ms
- **Event latency** (generation to receipt): < 500ms
- **Heartbeat interval**: 15 seconds
- **Total test suite**: < 300 seconds (includes processing wait times)

## Test Execution Steps

1. **Setup Phase**:
   a. Start REST API server with SSE support
   b. Create test flight with sample images
   c. Prepare SSE test clients

2. **Test Case 1 - Single Client**:
   a. Open SSE connection to GET /flights/{flightId}/stream
   b. Upload 10 images to flight
   c. Collect all events
   d. Verify event count and content
   e. Close connection

3. **Test Case 2 - Multiple Clients**:
   a. Open 5 SSE connections to same flight
   b. Upload 10 images
   c. Verify all clients receive all events
   d. Compare events across clients

4. **Test Cases 3-7 - Event Types**:
   a. Trigger each type of event
   b. Verify correct event emitted
   c. Validate JSON structure
   d. Check field values

5. **Test Case 8 - Keep-Alive**:
   a. Open connection
   b. Wait 45 seconds without triggering events
   c. Verify 2-3 heartbeats received
   d. Check connection still alive

6. **Test Case 9 - Ordering**:
   a. Upload 5 images rapidly (< 1 second apart)
   b. Collect events
   c. Verify sequence_number ordering
   d. Check no events skipped

7. **Test Case 10 - Reconnection**:
   a. Open connection, receive some events
   b. Close connection
   c. Immediately reconnect
   d. Check if stream resumes or restarts

8. **Test Case 11 - Slow Client**:
   a. Simulate slow client (delay reading events)
   b. Trigger many events rapidly
   c. Monitor buffer size
   d. Verify behavior when buffer full

9. **Test Case 12 - Filtering**:
   a. Connect with filter parameter: ?events=image_processed
   b. Trigger multiple event types
   c. Verify only subscribed events received

## Pass/Fail Criteria

**Overall Test Passes If**:
- All 12 test cases meet their success criteria
- No event loss in normal conditions
- Latency consistently < 500ms
- Multiple clients supported without degradation
- Protocol correctly implemented
- No memory leaks or resource exhaustion

**Test Fails If**:
- Events lost or duplicated
- Latency regularly exceeds 1 second
- Connections drop unexpectedly
- Invalid SSE format
- Server crashes under load
- Memory leak detected

## Additional Validation

**SSE Protocol Compliance**:
- Content-Type: text/event-stream
- Cache-Control: no-cache
- Connection: keep-alive
- Event format: `event: name\ndata: json\n\n`
- Support for `id:` field for resumption
- Support for `retry:` field

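The wire format above can be exercised with a small parser; this is a sketch of the SSE field handling (`event:`, `data:`, `id:`, comment lines as heartbeats), not the system's actual client code.

```python
import json

def parse_sse_stream(raw: str):
    """Parse a raw SSE text stream into (event_name, data, last_id) tuples.

    Blocks are separated by blank lines; comment lines (starting with ':')
    are the keep-alive heartbeats and carry no data.
    """
    events = []
    for block in raw.split("\n\n"):
        name, data_lines, event_id = "message", [], None
        for line in block.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
            elif line.startswith("id:"):
                event_id = line[len("id:"):].strip()
            elif line.startswith(":"):
                continue  # heartbeat/comment line, ignored
        if data_lines:
            events.append((name, json.loads("\n".join(data_lines)), event_id))
    return events

raw = 'event: image_processed\nid: 7\ndata: {"frame_id": 7}\n\n: heartbeat\n\n'
print(parse_sse_stream(raw))  # [('image_processed', {'frame_id': 7}, '7')]
```

A test client built on this can assert both that heartbeats arrive and that they never surface as events.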
**Event Schema Validation**:
Define and validate JSON schemas for each event type:
- image_processed
- trajectory_refined
- user_fix_applied
- processing_error
- flight_status
- failure_detected (when system requests user input per AC-6)

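A hand-rolled required-field check is the minimal version of this validation; a real suite would use a JSON Schema library. The field names for `image_processed` below are assumptions, not the system's actual schema.

```python
# Illustrative schema: required field -> expected Python type (assumed fields).
IMAGE_PROCESSED_SCHEMA = {
    "flight_id": str,
    "frame_id": int,
    "gps": dict,
    "confidence": float,
}

def validate_event(payload: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the payload conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

ok = {"flight_id": "f1", "frame_id": 3,
      "gps": {"lat": 48.2, "lon": 37.3}, "confidence": 0.92}
print(validate_event(ok, IMAGE_PROCESSED_SCHEMA))              # []
print(validate_event({"frame_id": 3}, IMAGE_PROCESSED_SCHEMA))  # missing fields
```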
**Performance Under Load**:
- 100 concurrent clients: all receive events
- 1000 events/second: latency < 1 second
- Long-lived connections: stable over 1 hour

**Error Scenarios**:
- Client closes connection mid-event: no server error
- Client sends data on read-only SSE connection: rejected
- Invalid flight_id: connection refused with 404
- Server restart: clients reconnect automatically

**Browser Compatibility** (if web client):
- EventSource API works in Chrome, Firefox, Safari
- Automatic reconnection on network loss
- Error event handling

**Buffering Strategy**:
- Recent events buffered (e.g., last 100)
- Old events discarded
- Configurable buffer size
- Buffer per-client or shared

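The buffering strategy above can be sketched with a bounded deque; buffer size and the per-client vs. shared question are open design points in this spec, so the shared buffer of 100 here is an assumption.

```python
from collections import deque

class EventBuffer:
    """Ring buffer keeping the N most recent events; older ones are dropped."""

    def __init__(self, maxlen: int = 100):
        self._events = deque(maxlen=maxlen)  # deque evicts oldest on overflow

    def push(self, event_id: int, payload: dict):
        self._events.append((event_id, payload))

    def since(self, last_event_id: int):
        """Events newer than last_event_id, for Last-Event-ID resumption."""
        return [(i, p) for i, p in self._events if i > last_event_id]

buf = EventBuffer(maxlen=3)
for i in range(5):
    buf.push(i, {"seq": i})
print(buf.since(2))  # [(3, {'seq': 3}), (4, {'seq': 4})]
```

With this shape, the reconnection test (Test Case 10) reduces to calling `since()` with the client's Last-Event-ID.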
**Event Priority**:
- Critical events (errors, user input needed) sent immediately
- Non-critical events (refinements) can be batched
- Priority queue for event delivery

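A priority queue for delivery can be sketched with `heapq`; which event types count as critical is an assumption based on the event list in this spec, and the counter keeps FIFO order among equal priorities.

```python
import heapq
import itertools

CRITICAL = {"processing_error", "failure_detected"}  # assumed critical set
_counter = itertools.count()
queue = []

def enqueue(event_type: str, payload: dict):
    # Priority 0 = critical (delivered first); counter breaks ties stably.
    priority = 0 if event_type in CRITICAL else 1
    heapq.heappush(queue, (priority, next(_counter), event_type, payload))

def drain():
    out = []
    while queue:
        _, _, event_type, _payload = heapq.heappop(queue)
        out.append(event_type)
    return out

enqueue("trajectory_refined", {})
enqueue("failure_detected", {})
enqueue("image_processed", {})
print(drain())  # ['failure_detected', 'trajectory_refined', 'image_processed']
```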
**Integration with AC-8**:
Verify the AC-8 requirement:
1. **Immediate Results**: Events sent as soon as an image is processed (< 1 second delay)
2. **Refinement Notification**: trajectory_refined events sent when optimization updates past results
3. **Continuous Streaming**: User can see first results before the route completes

Measure:
- Time from image processing complete to event sent: < 100ms
- Time from event sent to client received: < 500ms
- Total latency: < 1 second (meets AC-8 "immediately")

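The three latency measurements can be derived from three timestamps captured per event; the field names below are illustrative bookkeeping, not the system's telemetry schema.

```python
def latency_breakdown(processed_at: float, sent_at: float, received_at: float) -> dict:
    """Split the AC-8 end-to-end latency into its two measured legs (seconds in, ms out)."""
    return {
        "process_to_send_ms": (sent_at - processed_at) * 1000,
        "send_to_receive_ms": (received_at - sent_at) * 1000,
        "total_ms": (received_at - processed_at) * 1000,
    }

# Example: processing finished at t=10.000s, event sent at 10.080s, received at 10.450s.
b = latency_breakdown(processed_at=10.000, sent_at=10.080, received_at=10.450)
print(b["process_to_send_ms"] < 100 and b["total_ms"] < 1000)  # True
```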
## Monitoring and Metrics

Track and report:
- Active connections count
- Events sent per second
- Average event latency
- Buffer usage (current/max)
- Reconnection rate
- Client disconnection reasons
- Memory usage per connection
- Network bandwidth per connection

## Security Considerations

- Authentication: Verify SSE requires auth token
- Authorization: Clients can only subscribe to their own flights
- Rate limiting: Prevent connection flooding
- Input validation: Validate flight_id in URL
- DoS protection: Limit connections per IP/user

@@ -0,0 +1,194 @@

# Integration Test: Flight Lifecycle Manager (F02.1)

## Summary
Validate the Flight Lifecycle Manager component responsible for flight CRUD operations, system initialization, and API request routing.

## Component Under Test
**Component**: Flight Lifecycle Manager (F02.1)
**Interface**: `IFlightLifecycleManager`
**Dependencies**:
- F03 Flight Database (persistence)
- F04 Satellite Data Manager (prefetching)
- F05 Image Input Pipeline (image queuing)
- F13 Coordinate Transformer (ENU origin)
- F15 SSE Event Streamer (stream creation)
- F16 Model Manager (model loading)
- F17 Configuration Manager (config loading)
- F02.2 Flight Processing Engine (managed child)

## Detailed Description
This test validates that the Flight Lifecycle Manager can:
1. Create and initialize new flight sessions
2. Manage flight lifecycle (created → active → completed)
3. Validate waypoints and geofences
4. Queue images for processing (delegates to F05, triggers F02.2)
5. Handle user fix requests (delegates to F02.2)
6. Create SSE client streams (delegates to F15)
7. Initialize system components on startup
8. Manage F02.2 Processing Engine instances per flight

The Lifecycle Manager is the external-facing component handling API requests and delegating to internal processing.

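The lifecycle in item 2 can be enforced with a transition table; the exact table below is an assumption reconstructed from the states this spec names (created → active → completed, plus paused and deleted).

```python
# Assumed transition table: which target states each state may move to.
VALID_TRANSITIONS = {
    "created":   {"active", "deleted"},
    "active":    {"paused", "completed", "deleted"},
    "paused":    {"active", "deleted"},
    "completed": {"deleted"},
    "deleted":   set(),
}

def transition(current: str, target: str) -> str:
    """Apply one lifecycle transition, rejecting anything not in the table."""
    if target not in VALID_TRANSITIONS[current]:
        raise ValueError(f"invalid transition {current} -> {target}")
    return target

state = "created"
for target in ("active", "paused", "active", "completed"):
    state = transition(state, target)
print(state)  # completed
```

The "Invalid state transitions allowed" failure condition later in this spec is then a direct test of this table.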
## Input Data

### Test Case 1: Create New Flight
- **Input**:
  - Flight name: "Test_Baseline_Flight"
  - Start GPS: 48.275292, 37.385220
  - Altitude: 400m
  - Camera params: focal_length=25mm, sensor_width=23.5mm, resolution=6252x4168
- **Expected**:
  - Flight created with unique ID
  - F13.set_enu_origin() called with start_gps
  - F04.prefetch_route_corridor() triggered
  - Flight persisted to F03

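The expectations in Test Case 1 are naturally checked with mocks. `IFlightLifecycleManager` is not reproduced in this spec, so `create_flight` below is a stand-in that performs only the documented side effects; the test shape (assert each dependency was called) is the point.

```python
from unittest.mock import Mock

def create_flight(name, start_gps, f03, f04, f13):
    """Stand-in for the lifecycle manager's create path (sketch, not real code)."""
    f13.set_enu_origin(start_gps)            # ENU origin from start GPS
    f04.prefetch_route_corridor(start_gps)   # kick off satellite prefetch
    flight = {"id": "flight_1", "name": name}
    f03.save_flight(flight)                  # persist to flight database
    return flight

f03, f04, f13 = Mock(), Mock(), Mock()
flight = create_flight("Test_Baseline_Flight", (48.275292, 37.385220), f03, f04, f13)

f13.set_enu_origin.assert_called_once_with((48.275292, 37.385220))
f04.prefetch_route_corridor.assert_called_once()
f03.save_flight.assert_called_once_with(flight)
print("delegation verified")
```

The same pattern covers the delegation cases (Test Cases 10-13): mock the child component, call the manager, assert the call.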
### Test Case 2: Get Flight
- **Input**: Existing flight_id
- **Expected**: Flight data returned with current state

### Test Case 3: Get Flight State
- **Input**: Existing flight_id
- **Expected**: FlightState returned (processing status, current frame, etc.)

### Test Case 4: Delete Flight
- **Input**: Existing flight_id
- **Expected**:
  - Flight marked deleted in F03
  - Associated F02.2 engine stopped
  - Resources cleaned up

### Test Case 5: Update Flight Status
- **Input**: flight_id, status update (e.g., pause, resume)
- **Expected**: Status updated, F02.2 notified if needed

### Test Case 6: Update Single Waypoint
- **Input**: flight_id, waypoint_id, new Waypoint data
- **Expected**: Waypoint updated in F03

### Test Case 7: Batch Update Waypoints
- **Input**: flight_id, list of updated Waypoints
- **Expected**: All waypoints updated atomically

### Test Case 8: Validate Waypoint
- **Input**: Waypoint with GPS coordinates
- **Expected**: ValidationResult with valid/invalid and reason

### Test Case 9: Validate Geofence
- **Input**: Geofence polygon
- **Expected**: ValidationResult (valid polygon, within limits)

### Test Case 10: Queue Images (Delegation)
- **Input**: flight_id, ImageBatch (10 images)
- **Expected**:
  - F05.queue_batch() called
  - F02.2 engine started/triggered
  - BatchQueueResult returned

### Test Case 11: Handle User Fix (Delegation)
- **Input**: flight_id, UserFixRequest (frame_id, GPS anchor)
- **Expected**:
  - Active F02.2 engine retrieved
  - engine.apply_user_fix() called
  - UserFixResult returned

### Test Case 12: Create Client Stream (Delegation)
- **Input**: flight_id, client_id
- **Expected**:
  - F15.create_stream() called
  - StreamConnection returned

### Test Case 13: Convert Object to GPS (Delegation)
- **Input**: flight_id, frame_id, pixel coordinates
- **Expected**:
  - F13.image_object_to_gps() called
  - GPSPoint returned

### Test Case 14: System Initialization
- **Input**: Call initialize_system()
- **Expected**:
  - F17.load_config() called
  - F16.load_model() called for all models
  - F03 database initialized
  - F04 cache initialized
  - F08 index loaded
  - Returns True on success

### Test Case 15: Get Flight Metadata
- **Input**: flight_id
- **Expected**: FlightMetadata (camera params, altitude, waypoint count, etc.)

### Test Case 16: Validate Flight Continuity
- **Input**: List of Waypoints
- **Expected**: ValidationResult (continuous path, no gaps > threshold)

## Expected Output

For each test case:
```json
{
  "flight_id": "unique_flight_identifier",
  "flight_name": "string",
  "state": "created|active|completed|paused|deleted",
  "created_at": "timestamp",
  "updated_at": "timestamp",
  "enu_origin": {
    "latitude": <float>,
    "longitude": <float>
  },
  "waypoint_count": <integer>,
  "has_active_engine": <boolean>
}
```

## Success Criteria

**Test Cases 1-5 (Flight CRUD)**:
- Flight created/retrieved/updated/deleted correctly
- State transitions valid
- Database persistence verified

**Test Cases 6-9 (Validation)**:
- Waypoint/geofence validation correct
- Invalid inputs rejected with reason
- Edge cases handled

**Test Cases 10-13 (Delegation)**:
- Correct components called
- Parameters passed correctly
- Results returned correctly

**Test Case 14 (Initialization)**:
- All components initialized in correct order
- Failures handled gracefully
- Startup time < 30 seconds

**Test Cases 15-16 (Metadata/Continuity)**:
- Metadata accurate
- Continuity validation correct

## Maximum Expected Time
- Create flight: < 500ms (excluding prefetch)
- Get/Update flight: < 100ms
- Delete flight: < 500ms
- Queue images: < 2 seconds (10 images)
- User fix delegation: < 100ms
- System initialization: < 30 seconds
- Total test suite: < 120 seconds

## Pass/Fail Criteria

**Overall Test Passes If**:
- All 16 test cases pass
- CRUD operations work correctly
- Delegation to child components works
- System initialization completes
- No resource leaks

**Test Fails If**:
- Flight CRUD fails
- Delegation fails to reach correct component
- System initialization fails
- Invalid state transitions allowed
- Resource cleanup fails on delete

@@ -0,0 +1,241 @@

# Integration Test: Flight Processing Engine (F02.2)

## Summary
Validate the Flight Processing Engine component responsible for the main processing loop, frame-by-frame orchestration, recovery coordination, and chunk management.

## Component Under Test
**Component**: Flight Processing Engine (F02.2)
**Interface**: `IFlightProcessingEngine`
**Dependencies**:
- F05 Image Input Pipeline (image source)
- F06 Image Rotation Manager (pre-processing)
- F07 Sequential Visual Odometry (motion estimation)
- F09 Metric Refinement (satellite alignment)
- F10 Factor Graph Optimizer (state estimation)
- F11 Failure Recovery Coordinator (recovery logic)
- F12 Route Chunk Manager (chunk state)
- F14 Result Manager (saving results)
- F15 SSE Event Streamer (real-time updates)

## Detailed Description
This test validates that the Flight Processing Engine can:
1. Run the main processing loop (Image → VO → Graph → Result)
2. Manage flight status (Processing, Blocked, Recovering, Completed)
3. Coordinate chunk lifecycle with F12
4. Handle tracking loss and delegate to F11
5. Apply user fixes and resume processing
6. Publish results via F14/F15
7. Manage background chunk matching tasks
8. Handle concurrent processing gracefully

The Processing Engine runs in a background thread per active flight.

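The Image → VO → Graph → Result loop in item 1 can be sketched with stub components standing in for F07/F10/F14; the method names mirror the calls listed in Test Case 3, and the confidence floor is an assumed parameter.

```python
def process_flight(images, vo, graph, results, confidence_floor=0.5):
    """Minimal per-frame loop: VO pose -> factor graph -> published result."""
    for frame_id, image in enumerate(images):
        pose, confidence = vo.compute_relative_pose(image)
        if confidence < confidence_floor:
            return frame_id, "tracking_lost"  # hand off to recovery (F11)
        graph.add_relative_factor(frame_id, pose)
        estimate = graph.optimize_chunk()
        results.update_frame_result(frame_id, estimate)
    return len(images), "completed"

class StubVO:
    def compute_relative_pose(self, image):
        return ("pose", 0.9)  # always confident

class StubGraph:
    def add_relative_factor(self, frame_id, pose): pass
    def optimize_chunk(self): return {"lat": 48.27, "lon": 37.38}

class StubResults:
    def __init__(self): self.frames = []
    def update_frame_result(self, frame_id, estimate): self.frames.append(frame_id)

results = StubResults()
print(process_flight(["img"] * 3, StubVO(), StubGraph(), results))  # (3, 'completed')
```

Swapping `StubVO` for one that returns low confidence at a chosen frame gives the tracking-loss path of Test Case 5.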
## Input Data

### Test Case 1: Start Processing
- **Flight**: Test_Baseline_Flight with 10 queued images
- **Action**: Call start_processing(flight_id)
- **Expected**:
  - Background processing thread started
  - First image retrieved from F05
  - Processing loop begins

### Test Case 2: Stop Processing
- **Flight**: Active flight with processing in progress
- **Action**: Call stop_processing(flight_id)
- **Expected**:
  - Processing loop stopped gracefully
  - Current frame completed or cancelled
  - State saved

### Test Case 3: Process Single Frame (Normal)
- **Input**: Single frame with good tracking
- **Expected**:
  - F06.requires_rotation_sweep() checked
  - F07.compute_relative_pose() called
  - F12.add_frame_to_chunk() called
  - F10.add_relative_factor() called
  - F10.optimize_chunk() called
  - F14.update_frame_result() called
  - SSE event sent

### Test Case 4: Process Frame (First Frame / Sharp Turn)
- **Input**: First frame or frame after sharp turn
- **Expected**:
  - F06.requires_rotation_sweep() returns True
  - F06.rotate_image_360() called (12 rotations)
  - F09.align_to_satellite() called for each rotation
  - Best rotation selected
  - Heading updated

### Test Case 5: Process Frame (Tracking Lost)
- **Input**: Frame with low VO confidence
- **Expected**:
  - F11.check_confidence() returns LOST
  - F11.create_chunk_on_tracking_loss() called
  - New chunk created proactively
  - handle_tracking_loss() invoked

### Test Case 6: Handle Tracking Loss (Progressive Search)
- **Input**: Frame with tracking lost, recoverable
- **Expected**:
  - F11.start_search() called
  - F11.try_current_grid() called iteratively
  - Grid expansion (1→4→9→16→25)
  - Match found, F11.mark_found() called
  - Processing continues

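The grid expansion in Test Case 6 (1→4→9→16→25) is an n×n block of satellite tiles around the last known position; the tile-offset scheme below is an assumption for illustration.

```python
def search_grid(center_x: int, center_y: int, level: int):
    """Return the n×n tile coordinates for search level n (1-based),
    roughly centred on (center_x, center_y)."""
    n = level
    start = -(n - 1) // 2  # shift so odd levels are exactly centred
    return [(center_x + start + dx, center_y + start + dy)
            for dx in range(n) for dy in range(n)]

for level in range(1, 6):
    print(level, len(search_grid(0, 0, level)))  # 1, 4, 9, 16, 25 tiles
```

Exhausting level 5 (25 tiles) without a match is the trigger for the user-input path in Test Case 7.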
### Test Case 7: Handle Tracking Loss (User Input Needed)
- **Input**: Frame with tracking lost, not recoverable
- **Expected**:
  - Progressive search exhausted (25 tiles)
  - F11.create_user_input_request() called
  - Engine receives UserInputRequest
  - F15.send_user_input_request() called
  - Status set to BLOCKED
  - Processing paused

### Test Case 8: Apply User Fix
- **Input**: UserFixRequest with GPS anchor
- **Action**: Call apply_user_fix(flight_id, fix_data)
- **Expected**:
  - F11.apply_user_anchor() called
  - Anchor applied to factor graph
  - Status set to PROCESSING
  - Processing loop resumes

### Test Case 9: Get Active Chunk
- **Flight**: Active flight with chunks
- **Action**: Call get_active_chunk(flight_id)
- **Expected**:
  - F12.get_active_chunk() called
  - Returns current active chunk or None

### Test Case 10: Create New Chunk
- **Input**: Tracking loss detected
- **Action**: Call create_new_chunk(flight_id, frame_id)
- **Expected**:
  - F12.create_chunk() called
  - New chunk created in factor graph
  - Returns ChunkHandle

### Test Case 11: Process Flight (Full - Normal)
- **Flight**: 30 images (AD000001-030)
- **Expected**:
  - All 30 images processed
  - Status transitions: Processing → Completed
  - Results published for all frames
  - Processing time < 150 seconds (5s per image)

### Test Case 12: Process Flight (With Sharp Turn)
- **Flight**: AD000042, AD000044, AD000045, AD000046 (skip AD000043)
- **Expected**:
  - Tracking lost at AD000044
  - New chunk created
  - Recovery succeeds (L2/L3)
  - Flight completes

### Test Case 13: Process Flight (With Outlier)
- **Flight**: AD000045-050 (includes 268m outlier)
- **Expected**:
  - Outlier detected by factor graph
  - Robust kernel handles outlier
  - Other images processed correctly

### Test Case 14: Process Flight (Long)
- **Flight**: All 60 images (AD000001-060)
- **Expected**:
  - Processing completes
  - Registration rate > 95% (AC-9)
  - Processing time < 300 seconds (AC-7)

### Test Case 15: Background Chunk Matching
- **Flight**: Flight with multiple unanchored chunks
- **Expected**:
  - Background task processes chunks
  - F11.process_unanchored_chunks() called periodically
  - Chunks matched and merged asynchronously
  - Frame processing not blocked

### Test Case 16: State Persistence and Recovery
- **Scenario**:
  - Process 15 frames
  - Simulate restart
  - Resume processing
- **Expected**:
  - State saved to F03 before restart
  - State restored on resume
  - Processing continues from frame 16

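The persistence contract in Test Case 16 boils down to: save the last completed frame index, then resume from the next one. The real system persists to the F03 database; a JSON file stands in here as a sketch.

```python
import json
import os
import tempfile

def save_checkpoint(path: str, flight_id: str, last_frame: int):
    """Persist the minimal resume state before a (simulated) restart."""
    with open(path, "w") as f:
        json.dump({"flight_id": flight_id, "last_frame": last_frame}, f)

def resume_frame(path: str) -> int:
    """Read the checkpoint and return the frame to continue from."""
    with open(path) as f:
        return json.load(f)["last_frame"] + 1

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
save_checkpoint(path, "flight_1", last_frame=15)
print(resume_frame(path))  # 16 – processing continues from frame 16
```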
## Expected Output

For each frame processed:
```json
{
  "flight_id": "string",
  "frame_id": <integer>,
  "status": "processed|failed|skipped|blocked",
  "gps": {
    "latitude": <float>,
    "longitude": <float>
  },
  "confidence": <float>,
  "chunk_id": "string",
  "processing_time_ms": <float>
}
```

## Success Criteria

**Test Cases 1-2 (Start/Stop)**:
- Processing starts/stops correctly
- No resource leaks
- Graceful shutdown

**Test Cases 3-5 (Frame Processing)**:
- Correct components called in order
- State updates correctly
- Results published

**Test Cases 6-8 (Recovery)**:
- Progressive search works
- User input flow works
- Recovery successful

**Test Cases 9-10 (Chunk Management)**:
- Chunks created/managed correctly
- F12 integration works

**Test Cases 11-14 (Full Flights)**:
- All acceptance criteria met
- Processing completes successfully

**Test Cases 15-16 (Background/Recovery)**:
- Background tasks work
- State persistence works

## Maximum Expected Time
- Start/stop processing: < 500ms
- Process single frame: < 5 seconds (AC-7)
- Handle tracking loss: < 2 seconds
- Apply user fix: < 1 second
- Process 30 images: < 150 seconds
- Process 60 images: < 300 seconds
- Total test suite: < 600 seconds

## Pass/Fail Criteria

**Overall Test Passes If**:
- All 16 test cases pass
- Processing loop works correctly
- Recovery mechanisms work
- Chunk management works
- Performance targets met

**Test Fails If**:
- Processing loop crashes
- Recovery fails when it should succeed
- User input not requested when needed
- Performance exceeds 5s per image
- State persistence fails

@@ -0,0 +1,434 @@

# Integration Test: Result Manager

## Summary
Validate the Result Manager component responsible for storing, retrieving, and managing GPS localization results for processed images.

## Component Under Test
**Component**: Result Manager
**Location**: `gps_denied_13_result_manager`
**Dependencies**:
- Database Layer (result persistence)
- Factor Graph Optimizer (source of results)
- Coordinate Transformer
- SSE Event Streamer (result notifications)

## Detailed Description
This test validates that the Result Manager can:
1. Store initial GPS estimates from the vision pipeline
2. Store refined GPS results after factor graph optimization
3. Track result versioning (initial vs refined per AC-8)
4. Retrieve results by flight, image, or time range
5. Support various output formats (JSON, CSV, KML)
6. Calculate and store accuracy metrics
7. Manage result updates when the trajectory is re-optimized
8. Handle user-provided fixes and manual corrections
9. Export results for external analysis
10. Maintain result history and audit trail

Per AC-8 ("system could refine existing calculated results and send refined results again to user"), the Result Manager must track both initial and refined results.

## Input Data

### Test Case 1: Store Initial Result
- **Flight**: Test_Flight_001
- **Image**: AD000001.jpg
- **GPS Estimate**: 48.275290, 37.385218 (from L3)
- **Ground Truth**: 48.275292, 37.385220
- **Metadata**: confidence=0.92, processing_time_ms=450, layer="L3"
- **Expected**: Result stored with version=1 (initial)

### Test Case 2: Store Refined Result
- **Flight**: Test_Flight_001
- **Image**: AD000001.jpg (same as Test Case 1)
- **GPS Refined**: 48.275291, 37.385219 (from Factor Graph)
- **Metadata**: confidence=0.95, refinement_reason="new_anchor"
- **Expected**: Result stored with version=2 (refined), version=1 preserved

### Test Case 3: Batch Store Results
- **Flight**: Test_Flight_002
- **Images**: AD000001-AD000010
- **Expected**: All 10 results stored atomically

### Test Case 4: Retrieve Single Result
- **Query**: Get result for AD000001.jpg in Test_Flight_001
- **Options**: include_all_versions=true
- **Expected**: Returns both version=1 (initial) and version=2 (refined)

### Test Case 5: Retrieve Flight Results
- **Query**: Get all results for Test_Flight_001
- **Options**: latest_version_only=true
- **Expected**: Returns latest version for each image

### Test Case 6: Retrieve with Filtering
- **Query**: Get results for Test_Flight_001
- **Filter**: confidence > 0.9, error_m < 50
- **Expected**: Returns only results matching criteria

### Test Case 7: Export to JSON
- **Flight**: Test_Flight_001
- **Format**: JSON
- **Expected**: Valid JSON file with all results

### Test Case 8: Export to CSV
- **Flight**: Test_Flight_001
- **Format**: CSV
- **Columns**: image, lat, lon, error_m, confidence
- **Expected**: Valid CSV matching coordinates.csv format

### Test Case 9: Export to KML
- **Flight**: Test_Flight_001
- **Format**: KML (for Google Earth)
- **Expected**: Valid KML with placemarks for each image

### Test Case 10: Store User Fix
- **Flight**: Test_Flight_001
- **Image**: AD000005.jpg
- **User GPS**: 48.273997, 37.379828 (from ground truth)
- **Metadata**: source="user", confidence=1.0
- **Expected**: User fix stored with special flag, triggers refinement

### Test Case 11: Calculate Statistics
- **Flight**: Test_Flight_001 with ground truth
- **Calculation**: Compare estimated vs ground truth
- **Expected Statistics**:
  - mean_error_m
  - median_error_m
  - rmse_m
  - percent_under_50m
  - percent_under_20m
  - max_error_m
  - registration_rate

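The Test Case 11 metrics can be computed from per-image great-circle errors; this sketch uses the haversine distance and omits registration_rate, which additionally needs the count of attempted vs. registered frames.

```python
import math
import statistics

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def error_statistics(errors_m):
    """Aggregate per-image errors into the statistics listed above."""
    return {
        "mean_error_m": statistics.mean(errors_m),
        "median_error_m": statistics.median(errors_m),
        "rmse_m": math.sqrt(statistics.mean([e * e for e in errors_m])),
        "percent_under_50m": 100 * sum(e < 50 for e in errors_m) / len(errors_m),
        "percent_under_20m": 100 * sum(e < 20 for e in errors_m) / len(errors_m),
        "max_error_m": max(errors_m),
    }

# First error is the Test Case 1 pair (~0.3 m); the rest are illustrative.
errors = [haversine_m(48.275290, 37.385218, 48.275292, 37.385220)] + [12.0, 45.0, 80.0]
stats = error_statistics(errors)
print(round(stats["percent_under_50m"], 1))  # 75.0
```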
### Test Case 12: Result History
- **Query**: Get history for AD000001.jpg
- **Expected**: Returns timeline of all versions with timestamps

## Expected Output

For each test case:
```json
{
  "result_id": "unique_result_identifier",
  "flight_id": "flight_123",
  "image_id": "AD000001",
  "sequence_number": 1,
  "version": 1,
  "estimated_gps": {
    "lat": 48.275290,
    "lon": 37.385218,
    "altitude_m": 400
  },
  "ground_truth_gps": {
    "lat": 48.275292,
    "lon": 37.385220
  },
  "error_m": 0.25,
  "confidence": 0.92,
  "source": "L3|factor_graph|user",
  "processing_time_ms": 450,
  "metadata": {
    "layer": "L3",
    "num_features": 250,
    "inlier_ratio": 0.85
  },
  "created_at": "timestamp",
  "refinement_reason": "string|null"
}
```

## Success Criteria

**Test Case 1 (Store Initial)**:
- Result stored successfully
- version = 1
- All fields present and valid
- Timestamp recorded

**Test Case 2 (Store Refined)**:
- version = 2 stored
- version = 1 preserved
- Reference between versions maintained
- Refinement reason recorded

**Test Case 3 (Batch Store)**:
- All 10 results stored
- Transaction atomic (all or nothing)
- Processing time < 1 second

**Test Case 4 (Retrieve Single)**:
- Both versions returned
- Ordered by version number
- All fields complete

**Test Case 5 (Retrieve Flight)**:
- All images returned
- Only latest versions
- Ordered by sequence_number

**Test Case 6 (Filtering)**:
- Only matching results returned
- Filter applied correctly
- Query time < 500ms

**Test Case 7 (Export JSON)**:
- Valid JSON file
- All results included
- Human-readable formatting

**Test Case 8 (Export CSV)**:
- Valid CSV file
- Matches coordinates.csv format
- Can be opened in Excel/spreadsheet
- No missing values

**Test Case 9 (Export KML)**:
- Valid KML (validates against schema)
- Displays correctly in Google Earth
- Includes image names and metadata

**Test Case 10 (User Fix)**:
- User fix stored with source="user"
- confidence = 1.0 (user fixes are absolute)
- Triggers trajectory refinement notification

**Test Case 11 (Statistics)**:
- All statistics calculated correctly
- Match manual calculations
- Accuracy targets validated (AC-1, AC-2)
- Registration rate validated (AC-9)

**Test Case 12 (History)**:
- Complete timeline returned
- All versions present
- Chronological order
- Includes metadata for each version

## Maximum Expected Time
- **Store single result**: < 100ms
- **Store batch (10 results)**: < 1 second
- **Retrieve single**: < 100ms
- **Retrieve flight (60 results)**: < 500ms
- **Export to file (60 results)**: < 2 seconds
- **Calculate statistics**: < 1 second
- **Total test suite**: < 30 seconds

## Test Execution Steps

1. **Setup Phase**:
   a. Initialize Result Manager
   b. Create test flight
   c. Prepare test result data
   d. Load ground truth for comparison

2. **Test Case 1 - Store Initial**:
   a. Call store_result() with initial estimate
   b. Verify database insertion
   c. Check all fields stored
   d. Validate timestamp

3. **Test Case 2 - Store Refined**:
   a. Call store_result() with refined estimate
   b. Verify version increment
   c. Check version=1 still exists
   d. Validate refinement metadata

4. **Test Case 3 - Batch Store**:
   a. Prepare 10 results
   b. Call store_results_batch()
   c. Verify transaction atomicity
   d. Check all stored correctly

5. **Test Case 4 - Retrieve Single**:
   a. Call get_result() with image_id
   b. Request all versions
   c. Verify both returned
   d. Check ordering

6. **Test Case 5 - Retrieve Flight**:
   a. Call get_flight_results() with flight_id
   b. Request latest only
   c. Verify all images present
   d. Check ordering by sequence

7. **Test Case 6 - Filtering**:
   a. Call get_flight_results() with filters
   b. Verify filter application
   c. Validate query performance
   d. Check result correctness

8. **Test Case 7 - Export JSON**:
   a. Call export_results(format="json")
   b. Write to file
   c. Validate JSON syntax
   d. Verify completeness

9. **Test Case 8 - Export CSV**:
   a. Call export_results(format="csv")
   b. Write to file
   c. Validate CSV format
   d. Compare with ground truth CSV

10. **Test Case 9 - Export KML**:
    a. Call export_results(format="kml")
    b. Write to file
    c. Validate KML schema
    d. Test in Google Earth if available

11. **Test Case 10 - User Fix**:
    a. Call store_user_fix()
    b. Verify special handling
    c. Check refinement triggered
    d. Validate confidence=1.0

12. **Test Case 11 - Statistics**:
    a. Call calculate_statistics()
    b. Compare with ground truth
    c. Verify all metrics calculated
    d. Check against AC-1, AC-2, AC-9

13. **Test Case 12 - History**:
    a. Call get_result_history()
    b. Verify all versions returned
    c. Check chronological order
    d. Validate metadata completeness

## Pass/Fail Criteria

**Overall Test Passes If**:
- All 12 test cases meet their success criteria
- No data loss
- All versions tracked correctly
- Exports generate valid files
- Statistics calculated accurately
- Query performance acceptable

**Test Fails If**:
- Any result fails to store
- Versions overwrite each other
- Data corruption occurs
- Exports invalid or incomplete
- Statistics incorrect
- Query times exceed 2x maximum

## Additional Validation

**Data Integrity**:
- Foreign key constraints enforced (flight_id, image_id)
- No orphaned results
- Cascading deletes handled correctly
- Transaction isolation prevents dirty reads

**Versioning Logic**:
- Version numbers increment sequentially
- No gaps in version sequence
- Latest version easily identifiable
- Historical versions immutable
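
The versioning rules above amount to "insert a new row with max(version)+1, never update old rows". A minimal SQLite sketch, assuming a hypothetical `results(image_id, version, lat, lon, source)` table; the project's actual schema and `store_result()` signature may differ:

```python
import sqlite3

def store_result(conn, image_id, lat, lon, source):
    """Insert a new result version; earlier versions stay immutable."""
    # Next version = highest existing version for this image + 1 (1 if none).
    cur = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM results WHERE image_id = ?",
        (image_id,),
    )
    next_version = cur.fetchone()[0]
    conn.execute(
        "INSERT INTO results (image_id, version, lat, lon, source) "
        "VALUES (?, ?, ?, ?, ?)",
        (image_id, next_version, lat, lon, source),
    )
    conn.commit()
    return next_version

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (image_id TEXT, version INTEGER, lat REAL, "
    "lon REAL, source TEXT, UNIQUE(image_id, version))"
)
v1 = store_result(conn, "AD000001.jpg", 48.275292, 37.385220, "L3")
v2 = store_result(conn, "AD000001.jpg", 48.275300, 37.385215, "factor_graph")
```

The `UNIQUE(image_id, version)` constraint makes concurrent double-insertion of the same version fail loudly instead of silently overwriting.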

**Export Formats**:

**JSON Format**:
```json
{
  "flight_id": "flight_123",
  "flight_name": "Test Flight",
  "total_images": 60,
  "results": [
    {
      "image": "AD000001.jpg",
      "sequence": 1,
      "gps": {"lat": 48.275292, "lon": 37.385220},
      "error_m": 0.25,
      "confidence": 0.92
    }
  ]
}
```

**CSV Format**:
```
image,sequence,lat,lon,altitude_m,error_m,confidence,source
AD000001.jpg,1,48.275292,37.385220,400,0.25,0.92,factor_graph
```

**KML Format**:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>AD000001.jpg</name>
      <Point>
        <coordinates>37.385220,48.275292,400</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
```
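
Generating the KML above is plain string templating; a minimal sketch where the `results_to_kml` helper and the result-dict keys are illustrative, not the Result Manager's real API. Note that KML lists longitude before latitude:

```python
def results_to_kml(results):
    """Render result dicts as a minimal KML document (hypothetical helper)."""
    placemarks = "\n".join(
        "    <Placemark>\n"
        f"      <name>{r['image']}</name>\n"
        "      <Point>\n"
        # KML coordinate order is lon,lat,alt - the reverse of lat/lon above.
        f"        <coordinates>{r['lon']},{r['lat']},{r['alt']}</coordinates>\n"
        "      </Point>\n"
        "    </Placemark>"
        for r in results
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Document>\n"
        f"{placemarks}\n"
        "  </Document>\n"
        "</kml>"
    )

kml = results_to_kml(
    [{"image": "AD000001.jpg", "lat": 48.275292, "lon": 37.385220, "alt": 400}]
)
```

Swapping lat and lon here is the classic KML export bug, which is exactly what the "Validate KML schema / test in Google Earth" step should catch.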

**Statistics Calculations**:

Verify formulas:
- **Error**: `haversine_distance(estimated, ground_truth)`
- **RMSE**: `sqrt(mean(errors^2))`
- **Percent < X**: `count(errors < X) / total * 100`
- **Registration Rate**: `processed_images / total_images * 100`
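
The formulas can be checked against a reference implementation; a minimal sketch assuming estimates and ground truth are parallel lists of (lat, lon) pairs (function names here are illustrative):

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    R = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = math.radians(q[0] - p[0])
    dlmb = math.radians(q[1] - p[1])
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def calculate_statistics(estimated, ground_truth, total_images):
    """Apply the spec formulas to parallel lists of (lat, lon) pairs."""
    errors = [haversine_m(e, t) for e, t in zip(estimated, ground_truth)]
    return {
        "rmse_m": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "percent_under_50m": sum(e < 50 for e in errors) / len(errors) * 100,
        "percent_under_20m": sum(e < 20 for e in errors) / len(errors) * 100,
        "registration_rate": len(errors) / total_images * 100,
    }

# Degenerate check: estimate equals ground truth -> zero error, 100% everywhere.
stats = calculate_statistics([(48.275292, 37.385220)],
                             [(48.275292, 37.385220)], 1)
```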

**Accuracy Validation Against ACs**:
- **AC-1**: percent_under_50m ≥ 80%
- **AC-2**: percent_under_20m ≥ 60%
- **AC-9**: registration_rate > 95%

**Performance Optimization**:
- Database indexing on flight_id, image_id, sequence_number
- Caching frequently accessed results
- Batch operations for bulk inserts
- Pagination for large result sets

**Concurrent Access**:
- Multiple clients reading same results
- Concurrent writes to different flights
- Optimistic locking for updates
- No deadlocks

**Error Handling**:
- Invalid flight_id: reject with clear error
- Duplicate result: update vs reject (configurable)
- Missing ground truth: statistics gracefully handle nulls
- Export to invalid path: fail with clear error
- Database connection failure: retry logic

**Audit Trail**:
- Who created/modified each result
- When each version was created
- Why refinement occurred
- Source of each estimate (L1/L2/L3/FG/user)

**Data Retention**:
- Configurable retention policy
- Archival of old results
- Purging of test data
- Backup and recovery procedures

**Integration with AC-8**:
Verify "refine existing calculated results" functionality:
1. Initial result stored immediately after L3 processing
2. Refined result stored after factor graph optimization
3. Both versions maintained
4. Client notified via SSE when refinement occurs
5. Latest version available via API

**Query Capabilities**:
- Get results by flight
- Get results by time range
- Get results by accuracy (error range)
- Get results by confidence threshold
- Get results by source (L3, factor_graph, user)
- Get results needing refinement
- Get results with errors

**Memory Management**:
- Results not kept in memory unnecessarily
- Large result sets streamed, not loaded entirely
- Export operations use streaming writes
- No memory leaks on repeated queries
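
The streaming-writes requirement can be sketched with a DB-API cursor and `fetchmany`, so at most one batch of rows is in memory at a time. A sketch only; the real export path, column set, and helper names are assumptions:

```python
import csv
import io
import sqlite3

def iter_result_rows(cursor, batch_size=1000):
    """Yield rows in fixed-size batches instead of cursor.fetchall()."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

def export_csv(cursor, fileobj):
    """Stream query results into a CSV file object row by row."""
    writer = csv.writer(fileobj)
    writer.writerow(["image", "sequence", "lat", "lon", "altitude_m",
                     "error_m", "confidence", "source"])
    for row in iter_result_rows(cursor):
        writer.writerow(row)

# Demo with an in-memory database and one result row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (image TEXT, sequence INTEGER, lat REAL, lon REAL, "
    "altitude_m REAL, error_m REAL, confidence REAL, source TEXT)"
)
conn.execute(
    "INSERT INTO results VALUES "
    "('AD000001.jpg', 1, 48.275292, 37.385220, 400, 0.25, 0.92, 'factor_graph')"
)
buf = io.StringIO()
export_csv(conn.execute("SELECT * FROM results ORDER BY sequence"), buf)
```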

@@ -0,0 +1,402 @@

# Integration Test: Model Manager

## Summary
Validate the Model Manager component responsible for loading, managing, and providing access to TensorRT-optimized deep learning models used by the vision pipeline.

## Component Under Test
**Component**: Model Manager
**Location**: `gps_denied_15_model_manager`
**Dependencies**:
- TensorRT runtime
- ONNX model files
- GPU (NVIDIA RTX 2060/3070)
- Configuration Manager
- File system access

## Detailed Description
This test validates that the Model Manager can:
1. Load TensorRT engines for SuperPoint, LightGlue, DINOv2, and LiteSAM
2. Convert ONNX models to TensorRT engines if needed
3. Manage model lifecycle (load, warm-up, unload)
4. Provide thread-safe access to models for concurrent requests
5. Handle GPU memory allocation efficiently
6. Support FP16 precision for performance
7. Cache compiled engines for fast startup
8. Detect and adapt to available GPU capabilities
9. Handle model loading failures gracefully
10. Monitor GPU utilization and memory

Per the AC-7 requirement of < 5 seconds per image, models must be optimized with TensorRT and loaded efficiently.
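
Engine caching (point 7) hinges on a cache key that invalidates whenever the model file, target GPU, or precision changes, since TensorRT engines are specific to all three. A minimal sketch of one possible keying scheme; the function name, path layout, and defaults are illustrative assumptions, not the project's actual scheme:

```python
import hashlib
import tempfile
from pathlib import Path

def engine_cache_path(onnx_path, cache_dir, precision="fp16", gpu_name="RTX3070"):
    """Derive a cache file path for a compiled TensorRT engine (hypothetical).

    The ONNX file contents are hashed so that a changed model invalidates
    the cached engine; GPU name and precision are part of the key because
    engines built for one are not portable to the other.
    """
    digest = hashlib.sha256(Path(onnx_path).read_bytes()).hexdigest()[:16]
    stem = Path(onnx_path).stem
    return Path(cache_dir) / f"{stem}.{gpu_name}.{precision}.{digest}.engine"

# Demo with a stand-in file playing the role of an ONNX model.
tmp = tempfile.NamedTemporaryFile(suffix=".onnx", delete=False)
tmp.write(b"fake onnx bytes")
tmp.close()
path = engine_cache_path(tmp.name, "/tmp/engines")
```

On warm start, the Model Manager would check this path first and only run the (slow, < 60 s) ONNX-to-TensorRT build on a cache miss.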

## Input Data

### Test Case 1: Load SuperPoint Model
- **Model**: SuperPoint feature detector
- **Input Size**: 1024x683 (for FullHD downscaled)
- **Format**: TensorRT engine or ONNX → TensorRT
- **Expected**: Model loaded, ready for inference

### Test Case 2: Load LightGlue Model
- **Model**: LightGlue feature matcher
- **Input**: Two sets of 256 features each
- **Format**: TensorRT engine
- **Expected**: Model loaded with correct input/output bindings

### Test Case 3: Load DINOv2 Model
- **Model**: DINOv2-Small for AnyLoc
- **Input Size**: 512x512
- **Format**: TensorRT engine
- **Expected**: Model loaded, optimized for batch processing

### Test Case 4: Load LiteSAM Model
- **Model**: LiteSAM for cross-view matching
- **Input**: UAV image + satellite tile
- **Format**: TensorRT engine
- **Expected**: Model loaded with multi-input support

### Test Case 5: Cold Start (All Models)
- **Scenario**: Load all 4 models from scratch
- **Expected**: All models ready within 10 seconds

### Test Case 6: Warm Start (Cached Engines)
- **Scenario**: Load pre-compiled TensorRT engines
- **Expected**: All models ready within 2 seconds

### Test Case 7: Model Inference (SuperPoint)
- **Input**: Test image AD000001.jpg (1024x683)
- **Expected Output**: Keypoints and descriptors
- **Expected**: Inference time < 15ms on RTX 3070

### Test Case 8: Concurrent Inference
- **Scenario**: 5 simultaneous inference requests to SuperPoint
- **Expected**: All complete successfully, no crashes

### Test Case 9: FP16 Precision
- **Model**: SuperPoint in FP16 mode
- **Expected**: 2-3x speedup vs FP32, minimal accuracy loss

### Test Case 10: GPU Memory Management
- **Scenario**: Load all models, monitor GPU memory
- **Expected**: Total GPU memory < 6GB (fits on RTX 2060)

### Test Case 11: Model Unload and Reload
- **Scenario**: Unload SuperPoint, reload it
- **Expected**: Successful reload, no memory leak

### Test Case 12: Handle Missing Model File
- **Scenario**: Attempt to load non-existent model
- **Expected**: Clear error message, graceful failure

### Test Case 13: Handle Incompatible GPU
- **Scenario**: Simulate GPU without required compute capability
- **Expected**: Detect and report incompatibility

### Test Case 14: ONNX to TensorRT Conversion
- **Model**: ONNX model file
- **Expected**: Automatic conversion to TensorRT, caching

### Test Case 15: Model Warm-up
- **Scenario**: Run warm-up inference after loading
- **Expected**: First real inference is fast (no CUDA init overhead)

## Expected Output

For each test case:
```json
{
  "model_name": "superpoint|lightglue|dinov2|litesam",
  "load_status": "success|failed",
  "load_time_ms": <float>,
  "engine_path": "path/to/tensorrt/engine",
  "input_shapes": [
    {"name": "input", "shape": [1, 3, 1024, 683]}
  ],
  "output_shapes": [
    {"name": "keypoints", "shape": [1, 256, 2]},
    {"name": "descriptors", "shape": [1, 256, 256]}
  ],
  "precision": "fp32|fp16",
  "gpu_memory_mb": <float>,
  "inference_time_ms": <float>,
  "error": "string|null"
}
```

## Success Criteria

**Test Case 1 (SuperPoint)**:
- load_status = "success"
- load_time_ms < 3000
- Model ready for inference

**Test Case 2 (LightGlue)**:
- load_status = "success"
- load_time_ms < 3000
- Correct input/output bindings

**Test Case 3 (DINOv2)**:
- load_status = "success"
- load_time_ms < 5000
- Supports batch processing

**Test Case 4 (LiteSAM)**:
- load_status = "success"
- load_time_ms < 5000
- Multi-input configuration correct

**Test Case 5 (Cold Start)**:
- All 4 models loaded
- Total time < 10 seconds
- All models functional

**Test Case 6 (Warm Start)**:
- All 4 models loaded from cache
- Total time < 2 seconds
- Faster than cold start

**Test Case 7 (Inference)**:
- Inference successful
- Output shapes correct
- inference_time_ms < 15ms (RTX 3070) or < 25ms (RTX 2060)

**Test Case 8 (Concurrent)**:
- All 5 requests complete
- No crashes or errors
- Average inference time < 50ms

**Test Case 9 (FP16)**:
- FP16 engine loads successfully
- inference_time_ms < 8ms (2x speedup)
- Output quality acceptable

**Test Case 10 (GPU Memory)**:
- gpu_memory_mb < 6000 (fits RTX 2060 with 6GB VRAM)
- No out-of-memory errors
- Memory usage stable

**Test Case 11 (Unload/Reload)**:
- Unload successful
- Reload successful
- Memory freed after unload
- No memory leak (< 100MB difference)

**Test Case 12 (Missing File)**:
- load_status = "failed"
- Error message clear and specific
- No crash

**Test Case 13 (Incompatible GPU)**:
- Incompatibility detected
- Error message informative
- Suggests compatible GPU or CPU fallback

**Test Case 14 (ONNX Conversion)**:
- Conversion successful
- TensorRT engine cached
- Next load uses cache (faster)

**Test Case 15 (Warm-up)**:
- Warm-up completes < 1 second
- First real inference fast (< 20ms)
- No CUDA initialization delays

## Maximum Expected Time
- **Load single model (cold)**: < 3 seconds
- **Load single model (warm)**: < 500ms
- **Load all models (cold)**: < 10 seconds
- **Load all models (warm)**: < 2 seconds
- **Single inference**: < 15ms (RTX 3070), < 25ms (RTX 2060)
- **ONNX conversion**: < 60 seconds (one-time cost)
- **Total test suite**: < 180 seconds

## Test Execution Steps

1. **Setup Phase**:
   a. Verify GPU availability and capabilities
   b. Check TensorRT installation
   c. Prepare model files (ONNX or pre-compiled engines)
   d. Initialize Model Manager

2. **Test Cases 1-4 - Load Individual Models**:
   For each model:
   a. Call load_model(model_name)
   b. Measure load time
   c. Verify model ready
   d. Check input/output shapes
   e. Monitor GPU memory

3. **Test Case 5 - Cold Start**:
   a. Clear any cached engines
   b. Load all 4 models
   c. Measure total time
   d. Verify all functional

4. **Test Case 6 - Warm Start**:
   a. Unload all models
   b. Load all 4 models again (engines cached)
   c. Measure total time
   d. Compare with cold start

5. **Test Case 7 - Inference**:
   a. Load test image
   b. Run SuperPoint inference
   c. Measure inference time
   d. Validate output format

6. **Test Case 8 - Concurrent**:
   a. Prepare 5 inference requests
   b. Submit all simultaneously
   c. Wait for all completions
   d. Check for errors

7. **Test Case 9 - FP16**:
   a. Load SuperPoint in FP16 mode
   b. Run inference
   c. Compare speed with FP32
   d. Validate output quality

8. **Test Case 10 - GPU Memory**:
   a. Query GPU memory before loading
   b. Load all models
   c. Query GPU memory after loading
   d. Calculate usage

9. **Test Case 11 - Unload/Reload**:
   a. Unload SuperPoint
   b. Check memory freed
   c. Reload SuperPoint
   d. Verify no memory leak

10. **Test Case 12 - Missing File**:
    a. Attempt to load non-existent model
    b. Catch error
    c. Verify error message
    d. Check no crash

11. **Test Case 13 - Incompatible GPU**:
    a. Check GPU compute capability
    b. If incompatible, verify detection
    c. Check error handling

12. **Test Case 14 - ONNX Conversion**:
    a. Provide ONNX model file
    b. Trigger conversion
    c. Verify TensorRT engine created
    d. Check caching works

13. **Test Case 15 - Warm-up**:
    a. Load model
    b. Run warm-up inference(s)
    c. Run real inference
    d. Verify fast execution

## Pass/Fail Criteria

**Overall Test Passes If**:
- All models load successfully (cold and warm start)
- Total load time < 10 seconds (cold), < 2 seconds (warm)
- Inference times meet AC-7 requirements (contribute to < 5s per image)
- GPU memory usage < 6GB
- No memory leaks
- Concurrent inference works correctly
- Error handling robust

**Test Fails If**:
- Any model fails to load (with valid files)
- Load time exceeds 15 seconds
- Inference time > 50ms on RTX 3070 or > 100ms on RTX 2060
- GPU memory exceeds 8GB
- Memory leaks detected (> 500MB growth)
- Concurrent inference causes crashes
- Error handling is broken or uninformative

## Additional Validation

**TensorRT Optimization**:
Verify optimizations:
- Layer fusion
- Kernel auto-tuning
- Dynamic tensor memory allocation
- FP16/INT8 quantization support

**Model Configuration**:
- Input preprocessing (normalization, resizing)
- Output postprocessing (NMS, thresholding)
- Batch size configuration
- Dynamic vs static shapes

**Performance Benchmarks**:

**SuperPoint**:
- FP32: ~15ms on RTX 3070, ~25ms on RTX 2060
- FP16: ~8ms on RTX 3070, ~12ms on RTX 2060
- Keypoints extracted: 200-500 per image

**LightGlue**:
- FP32: ~50ms on RTX 3070, ~100ms on RTX 2060
- FP16: ~30ms on RTX 3070, ~60ms on RTX 2060
- Matches: 50-200 per image pair

**DINOv2**:
- FP32: ~150ms on RTX 3070, ~250ms on RTX 2060
- FP16: ~80ms on RTX 3070, ~130ms on RTX 2060
- Feature dimension: 384 (DINOv2-Small)

**LiteSAM**:
- FP32: ~60ms on RTX 3070, ~120ms on RTX 2060
- FP16: ~40ms on RTX 3070, ~70ms on RTX 2060
- Correspondence points: 100-500

**Total Vision Pipeline** (L1 + L2 or L3):
- Target: < 500ms per image (to meet AC-7 < 5s total)
- Current estimate: 15ms (SuperPoint) + 50ms (LightGlue) = 65ms for L1, or ~60ms (LiteSAM) for L3
- Well within budget

**Error Scenarios**:
1. **CUDA Out of Memory**: Provide clear error with suggestions
2. **Model File Corrupted**: Detect and report
3. **TensorRT Version Mismatch**: Handle gracefully
4. **GPU Driver Issues**: Detect and suggest update
5. **Insufficient Compute Capability**: Reject with clear message

**Thread Safety**:
- Multiple threads calling inference simultaneously
- Model loading during inference
- Thread-safe reference counting
- No race conditions
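
The thread-safety requirements above (concurrent inference, safe unload, reference counting) can be sketched as a lock-guarded registry. A pattern sketch only, with plain objects standing in for TensorRT engines and hypothetical method names:

```python
import threading

class ModelRegistry:
    """Minimal thread-safe model registry with reference counting."""

    def __init__(self):
        self._lock = threading.Lock()
        self._models = {}     # name -> model object (engine stand-in)
        self._refcounts = {}  # name -> number of active users

    def register(self, name, model):
        with self._lock:
            self._models[name] = model
            self._refcounts[name] = 0

    def acquire(self, name):
        # Bump the refcount under the lock so unload can't race with use.
        with self._lock:
            self._refcounts[name] += 1
            return self._models[name]

    def release(self, name):
        with self._lock:
            self._refcounts[name] -= 1

    def can_unload(self, name):
        # Safe to free GPU memory only when nobody holds a reference.
        with self._lock:
            return self._refcounts[name] == 0

registry = ModelRegistry()
registry.register("superpoint", object())

def worker():
    model = registry.acquire("superpoint")
    # ... run inference with `model` here ...
    registry.release("superpoint")

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After all five workers finish, the refcount is back to zero and unloading is safe, which is the invariant Test Case 11 exercises.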

**Resource Cleanup**:
- GPU memory freed on model unload
- CUDA contexts released properly
- File handles closed
- No resource leaks

**Monitoring and Logging**:
- Log model load times
- Track inference times (min/max/avg)
- Monitor GPU utilization
- Alert on performance degradation
- Memory usage trends

**Compatibility Matrix**:

| GPU Model | VRAM | Compute Capability | FP16 Support | Expected Performance |
|-----------|------|--------------------|--------------|----------------------|
| RTX 2060 | 6GB | 7.5 | Yes | Baseline (25ms SuperPoint) |
| RTX 3070 | 8GB | 8.6 | Yes | ~1.5x faster (15ms) |
| RTX 4070 | 12GB | 8.9 | Yes | ~2x faster (10ms) |

**Model Versions and Updates**:
- Support multiple model versions
- Graceful migration to new versions
- A/B testing capability
- Rollback on performance regression

**Configuration Options**:
- Model path configuration
- Precision selection (FP32/FP16)
- Batch size tuning
- Workspace size for TensorRT builder
- Engine cache location
- Warm-up settings

@@ -0,0 +1,308 @@

# Integration Test: Failure Recovery Coordinator (F11)

## Summary
Validate the Failure Recovery Coordinator that detects processing failures and coordinates recovery strategies. F11 is a pure logic component that returns status objects - it does NOT directly emit events or communicate with clients.

## Component Under Test
**Component**: Failure Recovery Coordinator (F11)
**Interface**: `IFailureRecoveryCoordinator`
**Dependencies**:
- F04 Satellite Data Manager (search grids)
- F06 Image Rotation Manager (rotation sweeps)
- F08 Global Place Recognition (candidate retrieval)
- F09 Metric Refinement (alignment)
- F10 Factor Graph Optimizer (anchor application)
- F12 Route Chunk Manager (chunk operations)

## Architecture Pattern
**Pure Logic Component**: F11 coordinates recovery strategies but delegates execution and communication.
- **NO Events**: Returns status objects or booleans
- **Caller Responsibility**: F02.2 decides state transitions based on F11 returns
- **Chunk Orchestration**: Coordinates F12 and F10 operations during recovery

## Detailed Description
Per AC-6: if the system is absolutely incapable of determining GPS for the next, second-next, and third-next images by any means (these 20% of the route), it should ask the user for input for the next image.

Tests that the coordinator can:
1. Assess tracking confidence from VO and LiteSAM results
2. Detect tracking loss conditions
3. Coordinate progressive tile search (1→4→9→16→25)
4. Create user input request objects (NOT send them)
5. Apply user-provided GPS anchors
6. Proactively create chunks on tracking loss
7. Coordinate chunk semantic matching
8. Coordinate chunk LiteSAM matching with rotation sweeps
9. Merge chunks to main trajectory

## Input Data

### Test Case 1: Check Confidence (Good)
- **Input**: VO result with 80 inliers, LiteSAM confidence 0.85
- **Expected**:
  - Returns ConfidenceAssessment
  - tracking_status = "good"
  - overall_confidence > 0.7

### Test Case 2: Check Confidence (Degraded)
- **Input**: VO result with 35 inliers, LiteSAM confidence 0.5
- **Expected**:
  - Returns ConfidenceAssessment
  - tracking_status = "degraded"
  - overall_confidence 0.3-0.7

### Test Case 3: Check Confidence (Lost)
- **Input**: VO result with 10 inliers, no LiteSAM result
- **Expected**:
  - Returns ConfidenceAssessment
  - tracking_status = "lost"
  - overall_confidence < 0.3
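
One scoring function that reproduces the three cases above; the 100-inlier normalization, the 50/50 weighting, and the 0.3/0.7 thresholds are assumptions chosen to match these test cases, not the component's actual constants:

```python
def assess_confidence(inlier_count, litesam_confidence=None):
    """Combine VO inlier count and LiteSAM confidence into a tracking status.

    Illustrative sketch: vo_confidence saturates at 100 inliers; a missing
    LiteSAM result contributes zero to the blended score.
    """
    vo_conf = min(inlier_count / 100.0, 1.0)
    if litesam_confidence is None:
        # No LiteSAM evidence: only the (halved) VO term remains.
        overall = 0.5 * vo_conf
    else:
        overall = 0.5 * vo_conf + 0.5 * litesam_confidence
    if overall > 0.7:
        status = "good"
    elif overall >= 0.3:
        status = "degraded"
    else:
        status = "lost"
    return {
        "overall_confidence": overall,
        "vo_confidence": vo_conf,
        "inlier_count": inlier_count,
        "tracking_status": status,
    }
```

With these constants, 80 inliers + 0.85 gives 0.825 ("good"), 35 inliers + 0.5 gives 0.425 ("degraded"), and 10 inliers with no LiteSAM result gives 0.05 ("lost") — matching Test Cases 1-3.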

### Test Case 4: Detect Tracking Loss
- **Input**: ConfidenceAssessment with tracking_status = "lost"
- **Expected**: Returns True (tracking lost)

### Test Case 5: Start Search
- **Input**: flight_id, frame_id, estimated_gps
- **Expected**:
  - Returns SearchSession
  - session.current_grid_size = 1
  - session.found = False
  - session.exhausted = False

### Test Case 6: Expand Search Radius
- **Input**: SearchSession with grid_size = 1
- **Action**: Call expand_search_radius()
- **Expected**:
  - Returns List[TileCoords] (3 new tiles for 2x2 grid)
  - session.current_grid_size = 4

### Test Case 7: Try Current Grid (Match Found)
- **Input**: SearchSession, tiles dict with matching tile
- **Expected**:
  - Returns AlignmentResult
  - result.matched = True
  - result.gps populated

### Test Case 8: Try Current Grid (No Match)
- **Input**: SearchSession, tiles dict with no matching tile
- **Expected**:
  - Returns None
  - Caller should call expand_search_radius()

### Test Case 9: Mark Found
- **Input**: SearchSession, AlignmentResult
- **Expected**:
  - Returns True
  - session.found = True

### Test Case 10: Get Search Status
- **Input**: SearchSession
- **Expected**:
  - Returns SearchStatus
  - Contains current_grid_size, found, exhausted

### Test Case 11: Create User Input Request
- **Input**: flight_id, frame_id, candidate_tiles
- **Expected**:
  - Returns UserInputRequest object (NOT sent)
  - Contains request_id, flight_id, frame_id
  - Contains uav_image, candidate_tiles, message
- **NOTE**: Caller (F02.2) sends to F15

### Test Case 12: Apply User Anchor
- **Input**: flight_id, frame_id, UserAnchor with GPS
- **Expected**:
  - Calls F10.add_absolute_factor() with high confidence
  - Returns True if successful
- **NOTE**: Caller (F02.2) updates state and publishes result

### Test Case 13: Create Chunk on Tracking Loss
- **Input**: flight_id, frame_id
- **Expected**:
  - Calls F12.create_chunk()
  - Returns ChunkHandle
  - chunk.is_active = True
  - chunk.has_anchor = False
  - chunk.matching_status = "unanchored"

### Test Case 14: Try Chunk Semantic Matching
- **Input**: chunk_id (chunk with 10 frames)
- **Expected**:
  - Gets chunk images via F12
  - Calls F08.retrieve_candidate_tiles_for_chunk()
  - Returns List[TileCandidate] or None

### Test Case 15: Try Chunk LiteSAM Matching
- **Input**: chunk_id, candidate_tiles
- **Expected**:
  - Gets chunk images via F12
  - Calls F06.try_chunk_rotation_steps() (12 rotations)
  - Returns ChunkAlignmentResult or None
  - Result contains rotation_angle, chunk_center_gps, transform

### Test Case 16: Merge Chunk to Trajectory
- **Input**: flight_id, chunk_id, ChunkAlignmentResult
- **Expected**:
  - Calls F12.mark_chunk_anchored()
  - Calls F12.merge_chunks()
  - Returns True if successful
- **NOTE**: Caller (F02.2) coordinates result updates

### Test Case 17: Process Unanchored Chunks (Logic)
- **Input**: flight_id with 2 unanchored chunks
- **Expected**:
  - Calls F12.get_chunks_for_matching()
  - For each ready chunk:
    - try_chunk_semantic_matching()
    - try_chunk_litesam_matching()
    - merge_chunk_to_trajectory() if match found

### Test Case 18: Progressive Search Full Flow
- **Scenario**:
  - start_search() → grid_size=1
  - try_current_grid() → None
  - expand_search_radius() → grid_size=4
  - try_current_grid() → None
  - expand_search_radius() → grid_size=9
  - try_current_grid() → AlignmentResult
  - mark_found() → success
- **Expected**: Search succeeds at 3x3 grid
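
The full flow of Test Case 18 can be sketched as an expand-until-found loop; the function names and the `try_grid` callback are hypothetical stand-ins for the F11 API:

```python
def grid_sizes(max_side=5):
    """Yield the progressive grid sizes 1, 4, 9, 16, 25 (n x n tiles)."""
    for n in range(1, max_side + 1):
        yield n * n

def progressive_search(try_grid):
    """Expand-until-found loop; try_grid(size) stands in for
    try_current_grid() and returns an alignment result or None."""
    for size in grid_sizes():
        result = try_grid(size)
        if result is not None:
            return size, result  # caller would now mark_found()
    # Exhausted all grids: caller creates a user input request (AC-6).
    return None, None

# Hypothetical stand-in: matching succeeds once the 3x3 grid is reached,
# mirroring Test Case 18.
size, result = progressive_search(
    lambda s: {"matched": True} if s >= 9 else None
)
```

Note that each expansion only has to search the newly added ring of tiles (e.g. 3 new tiles going from 1 to 4), so the exhaustion case costs 25 tile attempts in total, not 1+4+9+16+25.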

### Test Case 19: Search Exhaustion Flow
- **Scenario**:
  - start_search()
  - try all grids: 1→4→9→16→25, all fail
  - create_user_input_request()
- **Expected**:
  - Returns UserInputRequest
  - session.exhausted = True
- **NOTE**: Caller sends request, waits for user fix

### Test Case 20: Chunk Recovery Full Flow
- **Scenario**:
  - create_chunk_on_tracking_loss() → chunk created
  - Processing continues in chunk
  - try_chunk_semantic_matching() → candidates found
  - try_chunk_litesam_matching() → match at 90° rotation
  - merge_chunk_to_trajectory() → success
- **Expected**: Chunk anchored and merged without user input

## Expected Output

### ConfidenceAssessment
```json
{
  "overall_confidence": 0.85,
  "vo_confidence": 0.9,
  "litesam_confidence": 0.8,
  "inlier_count": 80,
  "tracking_status": "good|degraded|lost"
}
```

### SearchSession
```json
{
  "session_id": "string",
  "flight_id": "string",
  "frame_id": 42,
  "center_gps": {"latitude": 48.275, "longitude": 37.385},
  "current_grid_size": 4,
  "max_grid_size": 25,
  "found": false,
  "exhausted": false
}
```

### UserInputRequest
```json
{
  "request_id": "string",
  "flight_id": "string",
  "frame_id": 42,
  "candidate_tiles": [...],
  "message": "Please provide GPS location for this frame"
}
```

### ChunkAlignmentResult
```json
{
  "matched": true,
  "chunk_id": "string",
  "chunk_center_gps": {"latitude": 48.275, "longitude": 37.385},
  "rotation_angle": 90.0,
  "confidence": 0.85,
  "inlier_count": 150,
  "transform": {...}
}
```

## Success Criteria

**Test Cases 1-4 (Confidence)**:
- Confidence assessment accurate
- Thresholds correctly applied
- Tracking loss detected correctly

**Test Cases 5-10 (Progressive Search)**:
- Search session management works
- Grid expansion correct (1→4→9→16→25)
- Match detection works

**Test Cases 11-12 (User Input)**:
- UserInputRequest object created correctly (not sent)
- User anchor applied correctly

**Test Cases 13-17 (Chunk Recovery)**:
- Proactive chunk creation works
- Chunk semantic matching works
- Chunk LiteSAM matching with rotation works
- Chunk merging works

**Test Cases 18-20 (Full Flows)**:
- Progressive search flow completes
- Search exhaustion flow completes
- Chunk recovery flow completes

## Maximum Expected Time
- check_confidence: < 10ms
- detect_tracking_loss: < 5ms
- Progressive search (25 tiles): < 1.5s total
- create_user_input_request: < 100ms
- apply_user_anchor: < 500ms
- Chunk semantic matching: < 2s
- Chunk LiteSAM matching (12 rotations): < 10s
- Total test suite: < 120 seconds

## Pass/Fail Criteria

**Overall Test Passes If**:
- All 20 test cases pass
- Confidence assessment accurate
- Progressive search works
- User input request created correctly (not sent)
- Chunk recovery works
- No direct event emission (pure logic)

**Test Fails If**:
- Tracking loss not detected when it should be
- Progressive search fails to expand correctly
- User input request not created when needed
- F11 directly emits events (violates architecture)
- Chunk recovery fails
- Performance exceeds targets

## Architecture Validation

**F11 Must NOT**:
- Call F15 directly (SSE events)
- Emit events to clients
- Manage processing state
- Control processing flow

**F11 Must**:
- Return status objects for all operations
- Let caller (F02.2) decide next actions
- Coordinate with F10, F12 for chunk operations
- Be testable in isolation (no I/O dependencies)

@@ -0,0 +1,82 @@

# Integration Test: Configuration Manager

## Summary
Validate the Configuration Manager responsible for loading, validating, and providing system configuration parameters.

## Component Under Test
**Component**: Configuration Manager
**Location**: `gps_denied_16_configuration_manager`
**Dependencies**: File system, environment variables

## Detailed Description
Tests that the Configuration Manager can:
1. Load configuration from files (YAML/JSON)
2. Override with environment variables
3. Validate configuration parameters
4. Provide default values
5. Support configuration hot-reload
6. Handle invalid configurations gracefully
7. Manage sensitive data (API keys)
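
The load-then-override-then-validate flow (points 1-4 and 6) can be sketched in a few lines. JSON is used here for runnability, and the env-var name, defaults, and validation rule are illustrative assumptions, not the component's real configuration schema:

```python
import json
import os

DEFAULTS = {"altitude_m": 400, "google_maps_zoom": 19, "gpu_device_id": 0}

# Hypothetical env-var mapping: env name -> (config key, type cast).
ENV_OVERRIDES = {"GPS_DENIED_ALTITUDE_M": ("altitude_m", int)}

def load_config(path=None):
    """Merge defaults <- config file <- environment, then validate."""
    config = dict(DEFAULTS)
    if path is not None:
        with open(path) as f:
            config.update(json.load(f))
    for env_key, (key, cast) in ENV_OVERRIDES.items():
        if env_key in os.environ:
            config[key] = cast(os.environ[env_key])
    # Validation: reject physically impossible values (Test Case 5).
    if config["altitude_m"] <= 0:
        raise ValueError(f"altitude_m must be positive, got {config['altitude_m']}")
    return config

# Environment wins over defaults, mirroring Test Case 2.
os.environ["GPS_DENIED_ALTITUDE_M"] = "350"
config = load_config()
```

The precedence order (defaults, then file, then environment) is what lets Test Case 2 override a file-provided API key without editing the file.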
|
||||
|
||||
## Input Data
|
||||
|
||||
### Test Case 1: Load Default Configuration
|
||||
- **File**: config/default.yaml
|
||||
- **Expected**: All default parameters loaded
|
||||
|
||||
### Test Case 2: Environment Variable Override
|
||||
- **Env**: GOOGLE_MAPS_API_KEY=test_key_123
|
||||
- **Expected**: Overrides config file value
|
||||
|
||||
### Test Case 3: Invalid Configuration
|
||||
- **File**: Invalid YAML syntax
|
||||
- **Expected**: Load fails with clear error
|
||||
|
||||
### Test Case 4: Missing Required Parameter
|
||||
- **File**: Missing altitude parameter
|
||||
- **Expected**: Validation fails, uses default or errors
|
||||
|
||||
### Test Case 5: Parameter Validation
|
||||
- **Config**: altitude_m = -100 (invalid)
|
||||
- **Expected**: Validation rejects negative altitude
|
||||
|
||||
### Test Case 6: Configuration Hot-Reload
|
||||
- **Scenario**: Modify config file while system running
|
||||
- **Expected**: Changes detected and applied
|
||||
|
||||
### Test Case 7: Sensitive Data Handling
|
||||
- **Config**: API keys, passwords
|
||||
- **Expected**: Not logged, encrypted at rest
|
||||
|
||||
## Expected Output
|
||||
```json
|
||||
{
|
||||
"config_loaded": true,
|
||||
"config_source": "file|env|default",
|
||||
"parameters": {
|
||||
"altitude_m": 400,
|
||||
"google_maps_zoom": 19,
|
||||
"gpu_device_id": 0,
|
||||
"processing_timeout_s": 300
|
||||
},
|
||||
"validation_errors": [],
|
||||
"load_time_ms": <float>
|
||||
}
|
||||
```
|
||||
|
||||
## Success Criteria
|
||||
- All test cases load or fail appropriately
|
||||
- Validation catches invalid parameters
|
||||
- Sensitive data protected
|
||||
- Load time < 500ms
|
||||
|
||||
## Maximum Expected Time
|
||||
- Load configuration: < 500ms
|
||||
- Validate: < 100ms
|
||||
- Hot-reload: < 1 second
|
||||
- Total test suite: < 30 seconds
|
||||
|
||||
## Pass/Fail Criteria
|
||||
**Passes If**: All configurations load/validate correctly, overrides work, sensitive data protected
|
||||
**Fails If**: Invalid configs not detected, sensitive data exposed, overrides don't work
|
||||
|
||||
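Test Cases 2 and 5 hinge on the precedence chain defaults < file < environment, plus validation on the merged result. A minimal sketch of that logic, using JSON instead of YAML to stay dependency-free; the `ASTRAL_ALTITUDE_M` variable name is an assumption for illustration:

```python
import json
import os

# Defaults mirror the Expected Output block above.
DEFAULTS = {"altitude_m": 400, "google_maps_zoom": 19,
            "gpu_device_id": 0, "processing_timeout_s": 300}

# Hypothetical env-var mapping: variable name -> (config key, cast).
ENV_OVERRIDES = {"ASTRAL_ALTITUDE_M": ("altitude_m", int)}

def load_config(path=None, env=None):
    """Merge defaults, optional config file, and environment overrides,
    then validate the merged result (Test Case 5)."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    if path is not None:
        with open(path) as f:
            config.update(json.load(f))   # a YAML loader would slot in here
    for var, (key, cast) in ENV_OVERRIDES.items():
        if var in env:
            config[key] = cast(env[var])  # env wins over file and defaults
    if config["altitude_m"] <= 0:
        raise ValueError(f"altitude_m must be positive, got {config['altitude_m']}")
    return config
```

Passing `env={}` explicitly makes the loader deterministic in tests, which matters for Test Case 1 (pure defaults).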
@@ -0,0 +1,102 @@

# Integration Test: Database Layer

## Summary
Validate the Database Layer for storing and retrieving flights, images, results, and system state.

## Component Under Test
**Component**: GPS Denied Database Layer
**Location**: `gps_denied_17_gps_denied_database_layer`
**Dependencies**: PostgreSQL/SQLite database, SQL migrations

## Detailed Description
Tests that the Database Layer can:
1. Create database schema via migrations
2. Perform CRUD operations on all entities
3. Handle transactions atomically
4. Enforce foreign key constraints
5. Support concurrent access
6. Provide efficient queries with indexes
7. Handle connection pooling
8. Backup and restore data

## Input Data

### Test Case 1: Schema Creation
- **Action**: Run database migrations
- **Expected**: All tables created with correct schema

### Test Case 2: Insert Flight
- **Data**: New flight record
- **Expected**: Flight inserted, ID returned

### Test Case 3: Insert Images (Batch)
- **Data**: 10 image records
- **Expected**: All inserted in single transaction

### Test Case 4: Insert Results
- **Data**: GPS result for processed image
- **Expected**: Result inserted with foreign key to image

### Test Case 5: Query by Flight ID
- **Query**: Get all images for a flight
- **Expected**: Returns all images in correct order

### Test Case 6: Update Result (Refinement)
- **Action**: Update existing result with refined GPS
- **Expected**: Result updated, version incremented

### Test Case 7: Delete Flight (Cascade)
- **Action**: Delete flight
- **Expected**: Associated images and results also deleted

### Test Case 8: Transaction Rollback
- **Scenario**: Insert flight, insert images, error occurs
- **Expected**: Transaction rolls back, no partial data

### Test Case 9: Concurrent Writes
- **Scenario**: 5 simultaneous inserts to different flights
- **Expected**: All succeed, no deadlocks

### Test Case 10: Query Performance
- **Query**: Complex join across flights, images, results
- **Dataset**: 1000 flights, 60,000 images
- **Expected**: Query completes < 100ms

### Test Case 11: Index Effectiveness
- **Query**: Find image by ID
- **Expected**: Uses index, < 1ms lookup

### Test Case 12: Connection Pooling
- **Scenario**: 100 rapid connections
- **Expected**: Pool reuses connections, no exhaustion

## Expected Output
```json
{
  "operation": "insert|select|update|delete",
  "success": true|false,
  "rows_affected": <integer>,
  "execution_time_ms": <float>,
  "error": "string|null"
}
```

## Success Criteria
- All CRUD operations work correctly
- Transactions atomic (all-or-nothing)
- Foreign keys enforced
- Query performance acceptable
- Connection pooling works
- No SQL injection vulnerabilities

## Maximum Expected Time
- Schema migration: < 5 seconds
- Single insert: < 10ms
- Batch insert (10): < 100ms
- Query: < 100ms
- Total test suite: < 60 seconds

## Pass/Fail Criteria
**Passes If**: All operations succeed, performance acceptable, data integrity maintained

**Fails If**: Data corruption, slow queries, deadlocks, constraint violations

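Test Cases 7 and 8 (cascade delete and transaction rollback) can be demonstrated against SQLite, one of the two supported backends. The two-table schema below is a stand-in for illustration, not the real migration; note that SQLite only enforces `ON DELETE CASCADE` when `PRAGMA foreign_keys` is switched on per connection:

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")   # required for cascade enforcement
    db.execute("CREATE TABLE flights (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("""CREATE TABLE images (
        id INTEGER PRIMARY KEY,
        flight_id INTEGER NOT NULL REFERENCES flights(id) ON DELETE CASCADE,
        filename TEXT NOT NULL)""")
    return db

def insert_flight_with_images(db, name, filenames):
    """Test Cases 3/8 sketch: one transaction, all-or-nothing.
    Returns the new flight id, or None if the whole batch rolled back."""
    try:
        with db:   # the connection context manager commits, or rolls back on error
            fid = db.execute("INSERT INTO flights (name) VALUES (?)", (name,)).lastrowid
            for fn in filenames:
                db.execute("INSERT INTO images (flight_id, filename) VALUES (?, ?)",
                           (fid, fn))
        return fid
    except sqlite3.Error:
        return None
```

A `None` filename in the batch violates `NOT NULL`, which rolls back the flight row as well, matching the "no partial data" expectation; deleting a flight afterwards removes its images via the cascade.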
@@ -0,0 +1,114 @@

# System Integration Test: End-to-End Normal Flight

## Summary
Validate the complete system flow from flight creation through image processing to final results for a standard flight scenario.

## Components Under Test
All ASTRAL-Next components in the integrated system:
- REST API, Flight Manager, Image Input Pipeline
- Sequential VO (L1), Global PR (L2), Metric Refinement (L3)
- Factor Graph Optimizer, Result Manager, SSE Event Streamer
- Satellite Data Manager, Coordinate Transformer, Database Layer

## Detailed Description
This is a comprehensive end-to-end test simulating real operational usage. It exercises the complete workflow:
1. User creates flight via REST API
2. Uploads images (AD000001-AD000030)
3. System processes images through all vision layers
4. Factor graph optimizes trajectory
5. Results streamed via SSE in real-time
6. Final results retrieved and validated against ground truth

## Test Scenario
**Flight**: Test_Baseline (30 images, normal spacing ~120m)
**Images**: AD000001-AD000030
**Expected Outcome**: Meet AC-1 and AC-2 accuracy targets

## Test Steps

1. **Setup (< 10s)**:
   - Start all system services
   - Initialize database
   - Load models (TensorRT engines)
   - Prepare satellite tiles

2. **Create Flight (< 1s)**:
   - POST /flights
   - Payload: start_gps, altitude=400m, camera_params
   - Verify: flight_id returned, state="created"

3. **Upload Images (< 30s)**:
   - POST /flights/{flightId}/images/batch
   - Upload AD000001-AD000030
   - Verify: All 30 queued, processing starts

4. **Monitor Processing via SSE (< 150s)**:
   - Connect to GET /flights/{flightId}/stream
   - Receive "image_processed" events as each image completes
   - Verify: Each event arrives within 1s of its image finishing processing

5. **Await Completion (< 150s total)**:
   - Monitor flight status via GET /flights/{flightId}
   - Wait for state="completed"
   - Verify: No failures, all 30 images processed

6. **Retrieve Results (< 2s)**:
   - GET /flights/{flightId}/results?format=json
   - Download GPS coordinates for all images
   - Compare with ground truth

7. **Validate Accuracy**:
   - Calculate errors for all 30 images
   - Check: ≥80% within 50m (AC-1)
   - Check: ≥60% within 20m (AC-2)
   - Check: Mean error < 30m

8. **Export Results (< 5s)**:
   - GET /flights/{flightId}/results?format=csv
   - Verify CSV matches coordinates.csv format

## Expected Results
```json
{
  "test_status": "passed",
  "flight_id": "flight_123",
  "total_images": 30,
  "processed_images": 30,
  "failed_images": 0,
  "total_time_s": 145,
  "avg_time_per_image_s": 4.83,
  "accuracy_stats": {
    "mean_error_m": 24.5,
    "percent_under_50m": 93.3,
    "percent_under_20m": 66.7,
    "registration_rate": 100.0
  },
  "ac_compliance": {
    "AC-1": "PASS",
    "AC-2": "PASS",
    "AC-7": "PASS",
    "AC-8": "PASS",
    "AC-9": "PASS"
  }
}
```

## Success Criteria
- All 30 images processed successfully
- Processing time < 5s per image (AC-7)
- ≥80% accuracy within 50m (AC-1)
- ≥60% accuracy within 20m (AC-2)
- SSE events delivered in real-time (AC-8)
- Registration rate > 95% (AC-9)
- No system errors or crashes

## Maximum Expected Time
- Setup: < 10 seconds
- Processing 30 images: < 150 seconds (5s per image)
- Result retrieval: < 2 seconds
- Total test: < 180 seconds

## Pass/Fail Criteria
**Passes If**: All steps complete successfully; AC-1, AC-2, AC-7, AC-8, AC-9 requirements met

**Fails If**: Any component fails, accuracy targets missed, processing too slow, system errors

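Step 5's wait-for-completion loop is easy to get wrong (busy-waiting, no deadline). A generic sketch with a hard timeout; the `get_status` callable stands in for a `GET /flights/{flightId}` call and is an assumption of this example:

```python
import time

def wait_for_completion(get_status, timeout_s=150.0, poll_interval_s=0.5):
    """Poll flight status until a terminal state or the deadline.
    `get_status` returns the current state string, e.g. "processing"."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_status()
        if state in ("completed", "failed"):
            return state
        time.sleep(poll_interval_s)      # avoid hammering the API
    raise TimeoutError(f"flight did not finish within {timeout_s}s")
```

The 150 s default mirrors the step's budget; in a real harness the SSE stream from Step 4 would usually replace polling entirely.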
@@ -0,0 +1,63 @@

# System Integration Test: Satellite to Vision Pipeline

## Summary
Validate integration between the Satellite Data Manager and the vision processing layers (L2, L3).

## Components Under Test
- Satellite Data Manager
- Coordinate Transformer
- Global Place Recognition (L2)
- Metric Refinement (L3)
- Configuration Manager

## Test Scenario
Test the flow of satellite data through the vision pipeline:
1. Download satellite tiles for the operational area
2. Build Faiss index for L2 (AnyLoc)
3. Use tiles as reference for L3 (LiteSAM)
4. Validate how GPS accuracy depends on satellite data quality

## Test Cases

### Test Case 1: Satellite Tile Download and Caching
- Request tiles for AD000001-AD000010 locations
- Verify tiles are cached locally
- Check georeferencing accuracy

### Test Case 2: L2 Global Place Recognition with Satellites
- Query L2 with AD000001.jpg
- Verify it retrieves the correct satellite tile
- Distance to ground truth < 200m

### Test Case 3: L3 Metric Refinement with Satellite
- Use AD000001.jpg and the retrieved satellite tile
- Run LiteSAM cross-view matching
- Verify GPS accuracy < 20m

### Test Case 4: Outdated Satellite Data (AC requirement)
- Simulate outdated satellite imagery
- Verify DINOv2 features still match
- Accuracy may degrade, but the system continues

### Test Case 5: Missing Satellite Tile
- Request tile for an area without coverage
- Verify graceful failure
- System requests an alternative or skips

## Success Criteria
- Satellite tiles download and georeference correctly
- L2 retrieves correct tiles (top-5 accuracy > 85%)
- L3 achieves < 20m accuracy with good satellite data
- System handles missing/outdated tiles gracefully

## Maximum Expected Time
- Tile download (10 tiles): < 60 seconds
- Faiss index build: < 30 seconds
- L2 queries (10): < 2 seconds
- L3 refinements (10): < 1.5 seconds
- Total test: < 120 seconds

## Pass/Fail Criteria
**Passes If**: Satellite data flows correctly through vision pipeline, accuracy targets met

**Fails If**: Tile download fails, georeferencing incorrect, L2/L3 cannot use satellite data

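The georeferencing checks in Test Case 1 rest on the standard Web Mercator (slippy-map) tiling scheme used by Google Maps tiles. Assuming the Satellite Data Manager indexes tiles this way, the lat/lon-to-tile mapping is:

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Standard Web Mercator tile indices for a WGS-84 coordinate.
    Ground resolution at the equator is ~156543 / 2**zoom m/pixel
    (~0.30 m/px at zoom 19, matching the spec's figure)."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y
```

For example, `latlon_to_tile(48.275292, 37.385220, 19)` gives the tile covering the test area's start coordinate; inverting the formula at a tile's corners is what a georeferencing-accuracy check would compare against.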
@@ -0,0 +1,70 @@

# System Integration Test: Vision to Optimization Pipeline

## Summary
Validate integration between vision layers (L1, L2, L3) and Factor Graph Optimizer.

## Components Under Test
- Sequential Visual Odometry (L1)
- Global Place Recognition (L2)
- Metric Refinement (L3)
- Factor Graph Optimizer
- Result Manager

## Test Scenario
Test the flow of vision estimates into factor graph optimization:
1. L1 provides relative pose factors
2. L3 provides absolute GPS factors
3. Factor graph fuses both into optimized trajectory
4. Results show improvement over individual layers

## Test Cases

### Test Case 1: Sequential Factors Only (L1)
- Process AD000001-AD000010 with L1 only
- Feed relative poses to factor graph
- Verify: Drift accumulates without GPS anchors

### Test Case 2: GPS Anchors Only (L3)
- Process same images with L3 only
- Feed absolute GPS to factor graph
- Verify: Accurate but no temporal smoothness

### Test Case 3: Fused L1 + L3 (Optimal)
- Process with both L1 and L3
- Factor graph fuses relative and absolute factors
- Verify: Better accuracy than L1-only, smoother than L3-only

### Test Case 4: L2 Recovery after L1 Failure
- Simulate L1 tracking loss
- L2 recovers global location
- L3 refines it
- Factor graph incorporates recovery

### Test Case 5: Robust Outlier Handling
- Include outlier measurement (268m jump)
- Verify robust kernel down-weights outlier
- Trajectory remains consistent

### Test Case 6: Incremental Updates
- Add images one by one
- Factor graph updates incrementally
- Verify past trajectory refined when new anchors arrive (AC-8)

## Success Criteria
- L1-only shows drift (errors grow over time)
- L3-only accurate but may be jagged
- L1+L3 fusion achieves best results
- Outliers handled without breaking trajectory
- Incremental updates work correctly
- Accuracy improves over single-layer estimates

## Maximum Expected Time
- L1-only (10 images): < 10 seconds
- L3-only (10 images): < 15 seconds
- Fused (10 images): < 20 seconds
- Total test: < 60 seconds

## Pass/Fail Criteria
**Passes If**: Factor graph successfully fuses vision estimates, accuracy improved, outliers handled

**Fails If**: Fusion fails, accuracy worse than single layer, outliers corrupt trajectory

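Test Case 5's down-weighting can be illustrated with a Huber-style kernel, the same idea behind robust noise models in factor-graph libraries such as GTSAM. This is a 1-D toy, not the real optimizer; the 20 m threshold is an illustrative choice:

```python
def huber_weight(residual_m, delta_m=20.0):
    """Robust kernel weight: 1 inside the delta band, shrinking as
    delta/|r| beyond it, so a gross outlier contributes little."""
    r = abs(residual_m)
    return 1.0 if r <= delta_m else delta_m / r

def weighted_position(measurements_m, delta_m=20.0):
    """One IRLS-style reweighting pass over 1-D position measurements:
    weight each residual against the plain mean, then re-average."""
    mean = sum(measurements_m) / len(measurements_m)
    w = [huber_weight(m - mean, delta_m) for m in measurements_m]
    return sum(wi * mi for wi, mi in zip(w, measurements_m)) / sum(w)
```

With four measurements near zero and one 268.6 m jump, the robust estimate lands far closer to the cluster than the plain mean does, which is the "trajectory remains consistent" behaviour the test case asserts.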
@@ -0,0 +1,60 @@

# System Integration Test: Multi-Component Error Propagation

## Summary
Validate how errors propagate and are handled across multiple system components.

## Components Under Test
All components, focusing on error handling and recovery

## Test Scenarios

### Test Case 1: Database Connection Loss
- **Trigger**: Disconnect database mid-flight
- **Expected**: System detects, caches results in memory, reconnects when available
- **Components Affected**: Result Manager, Flight Manager, Database Layer

### Test Case 2: GPU Out of Memory
- **Trigger**: Exhaust GPU memory
- **Expected**: Clear error, processing paused or fails gracefully
- **Components Affected**: Model Manager, Vision Layers

### Test Case 3: Satellite API Failure
- **Trigger**: Google Maps API returns 503 error
- **Expected**: Retry with exponential backoff, use cached tiles if available
- **Components Affected**: Satellite Data Manager

### Test Case 4: Corrupted Image File
- **Trigger**: Upload corrupted image
- **Expected**: Detected by Image Input Pipeline, marked as failed, skip and continue
- **Components Affected**: Image Input Pipeline, Flight Manager

### Test Case 5: Vision Layer Cascade Failure
- **Trigger**: L1, L2, L3 all fail for same image
- **Expected**: Failure Recovery Coordinator requests user input (AC-6)
- **Components Affected**: All vision layers, Failure Recovery Coordinator

### Test Case 6: SSE Connection Drop
- **Trigger**: Client SSE connection drops
- **Expected**: Events buffered, client can reconnect and catch up
- **Components Affected**: SSE Event Streamer, Result Manager

### Test Case 7: Configuration File Invalid
- **Trigger**: Corrupt configuration file
- **Expected**: System uses defaults, logs warning, continues
- **Components Affected**: Configuration Manager

## Success Criteria
- All errors detected and handled appropriately
- No silent failures or data corruption
- Clear error messages provided
- System recovers or fails gracefully
- Processing continues where possible

## Maximum Expected Time
- Each error scenario: < 30 seconds
- Total test: < 300 seconds

## Pass/Fail Criteria
**Passes If**: All errors handled gracefully, no crashes, recovery works

**Fails If**: Unhandled exceptions, crashes, silent failures, data corruption

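Test Case 3's retry policy is a standard pattern; here is a dependency-free sketch. The delay schedule (0.5 s, 1 s, 2 s, ...) and the injectable `sleep` are illustrative choices, the latter so the backoff sequence itself can be asserted in tests:

```python
import time

def retry_with_backoff(fetch, max_attempts=4, base_delay_s=0.5, sleep=time.sleep):
    """Retry a flaky call (e.g. a tile fetch hitting a 503) with
    exponential backoff.  Any exception from `fetch` counts as transient;
    the last attempt's exception propagates to the caller."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * (2 ** attempt))   # 0.5s, 1s, 2s, ...
```

In the real component a cached-tile fallback would sit in the final `except` instead of a bare re-raise.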
@@ -0,0 +1,78 @@

# System Integration Test: Real-Time Streaming Pipeline

## Summary
Validate real-time streaming of results via SSE as images are processed (AC-8 requirement).

## Components Under Test
- Image Input Pipeline
- Vision Layers (L1, L2, L3)
- Factor Graph Optimizer
- Result Manager
- SSE Event Streamer
- Flight Manager

## Test Scenario
Per AC-8: "Results of image processing should appear immediately to user, so that user shouldn't wait for the whole route to complete."

Test that:
1. Images uploaded and queued
2. Processing starts immediately
3. Results streamed to client < 1s after processing
4. Refinements sent when trajectory updated
5. User can analyze early results before flight completes

## Test Cases

### Test Case 1: Immediate Result Delivery
- Upload 10 images
- Connect SSE client
- Measure time from image processed to event received
- **Target**: < 1 second latency (AC-8 "immediately")

### Test Case 2: Progressive Results
- Upload 30 images
- Monitor SSE stream
- Verify first results available before image 30 processed
- **Target**: First result within 10 seconds of upload

### Test Case 3: Refinement Notifications
- Process 20 images
- New GPS anchor added (image 20)
- Factor graph refines trajectory for images 1-19
- Verify "trajectory_refined" event sent
- **Target**: Refinement notification within 2 seconds

### Test Case 4: Multiple Concurrent Clients
- 3 clients connect to same flight SSE stream
- All receive events simultaneously
- No delays between clients
- **Target**: All clients receive within 100ms of each other

### Test Case 5: Late-Joining Client
- Flight already processing (10 images done)
- New client connects
- Receives catch-up of existing results plus live stream
- **Target**: Seamless experience

### Test Case 6: Backpressure Handling
- Process images rapidly (50 images in 60 seconds)
- Verify SSE can handle high event rate
- No event loss or buffer overflow
- **Target**: All events delivered, no loss

## Success Criteria
- Result latency < 1 second (AC-8)
- First results available immediately
- Refinements streamed when they occur
- Multiple clients supported
- No event loss under load

## Maximum Expected Time
- Setup: < 5 seconds
- Process 30 images with streaming: < 150 seconds
- Total test: < 180 seconds

## Pass/Fail Criteria
**Passes If**: AC-8 requirements met, latency < 1s, refinements streamed, no event loss

**Fails If**: Latency > 2s, events lost, clients not notified of refinements

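The stream these test cases observe follows the SSE wire format: `field: value` lines, with events separated by blank lines. A minimal parser for asserting on captured stream text (it handles only the `event:` and `data:` fields this test suite cares about):

```python
def parse_sse(stream_text):
    """Split raw SSE text into a list of {field: value} dicts,
    one per blank-line-delimited event."""
    events, current = [], {}
    for line in stream_text.splitlines():
        if not line.strip():
            if current:                 # blank line terminates an event
                events.append(current)
                current = {}
        elif ":" in line:
            field, _, value = line.partition(":")
            current[field.strip()] = value.strip()
    if current:                         # stream ended without trailing blank line
        events.append(current)
    return events
```

A latency check (Test Case 1) then only needs a receive timestamp per parsed event against the processing timestamp carried in its `data` payload.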
@@ -0,0 +1,105 @@

# Acceptance Test: AC-1 - 80% of Photos < 50m Error

## Summary
Validate Acceptance Criterion 1: "The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS."

## Linked Acceptance Criteria
**AC-1**: 80% of photos < 50m error

## Preconditions
1. ASTRAL-Next system fully operational
2. All TensorRT models loaded
3. Satellite tiles cached for test area (48.25-48.28°N, 37.34-37.39°E)
4. Ground truth GPS coordinates available (coordinates.csv)
5. Test dataset prepared: Test_Baseline (AD000001-AD000030)

## Test Description
Process baseline flight of 30 images with normal spacing (~120m between images). Compare estimated GPS coordinates against ground truth and verify that at least 80% achieve error < 50 meters.

## Test Steps

### Step 1: Initialize System
- **Action**: Start ASTRAL-Next system, verify all components ready
- **Expected Result**: System state = "ready", all models loaded, no errors

### Step 2: Create Test Flight
- **Action**: Create flight "AC1_Baseline" with start_gps=48.275292, 37.385220, altitude=400m
- **Expected Result**: Flight created, flight_id returned

### Step 3: Upload Test Images
- **Action**: Upload AD000001-AD000030 (30 images) in order
- **Expected Result**: All 30 images queued, sequence maintained

### Step 4: Monitor Processing
- **Action**: Monitor flight status until completed
- **Expected Result**:
  - Processing completes within 150 seconds (5s per image)
  - No system errors
  - Registration rate > 95%

### Step 5: Retrieve Results
- **Action**: GET /flights/{flightId}/results
- **Expected Result**: Results for all 30 images returned

### Step 6: Calculate Errors
- **Action**: For each image, calculate haversine distance between estimated and ground truth GPS
- **Expected Result**: Error array with 30 values

### Step 7: Validate AC-1
- **Action**: Count images with error < 50m, calculate percentage
- **Expected Result**: **≥ 80% of images have error < 50 meters** ✓

### Step 8: Generate Report
- **Action**: Create test report with statistics
- **Expected Result**:
  - Total images: 30
  - Images < 50m: ≥ 24
  - Percentage: ≥ 80.0%
  - Mean error: documented
  - Median error: documented
  - Max error: documented

## Success Criteria

**Primary Criterion (AC-1)**:
- ≥ 24 out of 30 images (80%) have GPS error < 50 meters

**Supporting Criteria**:
- All 30 images processed (or user input requested if failures occur)
- Processing time < 150 seconds total
- No system crashes or unhandled errors
- Registration rate > 95% (AC-9)

## Expected Results

Based on solution architecture (LiteSAM RMSE ~18m), expected performance:
```
Total Images: 30
Successfully Processed: 30 (100%)
Images with error < 50m: 28 (93.3%)
Images with error < 20m: 20 (66.7%)
Mean Error: 24.5m
Median Error: 18.2m
RMSE: 28.3m
Max Error: 48.7m
AC-1 Status: PASS (93.3% > 80%)
```

## Pass/Fail Criteria

**TEST PASSES IF**:
- ≥ 80% of images achieve error < 50m
- System completes processing without critical failures
- Results reproducible across multiple test runs

**TEST FAILS IF**:
- < 80% of images achieve error < 50m
- System crashes or becomes unresponsive
- More than 5% of images fail to process (violates AC-9)

## Notes
- This test uses Test_Baseline dataset (AD000001-030) with consistent spacing
- No sharp turns or outliers in this dataset
- Represents ideal operating conditions
- If test fails, investigate: satellite data quality, model accuracy, georeferencing precision

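Steps 6-7 combine the haversine distance with a threshold count. A self-contained sketch of both (6371 km mean Earth radius; function names are illustrative):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS-84 points."""
    R = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def ac1_passes(errors_m, threshold_m=50.0, required_fraction=0.80):
    """Step 7: fraction of images under the error threshold vs the AC-1 bar."""
    within = sum(1 for e in errors_m if e < threshold_m)
    return within / len(errors_m) >= required_fraction
```

Swapping `threshold_m=20.0, required_fraction=0.60` gives the AC-2 check on the same error array.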
@@ -0,0 +1,104 @@

# Acceptance Test: 80% Photos <50m Error - Varied Terrain

## Summary
Validate AC-1 accuracy requirement (80% of photos within 50m error) across different terrain types including agricultural fields, mixed vegetation, and urban edges.

## Linked Acceptance Criteria
**AC-1**: The system should find out the GPS of centers of 80% of the photos from the flight within an error of no more than 50 meters in comparison to the real GPS.

## Preconditions
- ASTRAL-Next system fully deployed and operational
- Satellite reference data downloaded for test region
- TensorRT models loaded (SuperPoint, LightGlue, AnyLoc, LiteSAM)
- Ground truth GPS coordinates available for validation
- Test datasets covering varied terrain types

## Test Data
- **Primary Dataset**: AD000001-AD000060 (varied terrain across 60 images)
- **Terrain Types**: Agricultural fields, tree lines, mixed vegetation, roads
- **Ground Truth**: coordinates.csv
- **Camera Parameters**: 400m altitude, 25mm focal length, 26MP resolution

## Test Steps

### Step 1: Initialize System with Starting GPS
**Action**: Start flight processing with first image GPS coordinate (48.275292, 37.385220)
**Expected Result**:
- System initializes successfully
- Satellite tiles downloaded for operational area
- L1, L2, L3 layers ready
- Status: INITIALIZED

### Step 2: Process Agricultural Field Segment (AD000001-015)
**Action**: Process images over predominantly agricultural terrain
**Expected Result**:
- L1 sequential tracking maintains continuity
- SuperPoint detects field boundaries and crop variations
- LiteSAM achieves cross-view matching despite seasonal differences
- Mean error <40m for this segment
- Status: PROCESSING

### Step 3: Process Mixed Vegetation Segment (AD000016-030)
**Action**: Process images with mixed terrain features
**Expected Result**:
- L2 global place recognition active during transitions
- AnyLoc retrieval successful using DINOv2 features
- Factor graph optimization smooths trajectory
- Mean error <45m for this segment
- Status: PROCESSING

### Step 4: Process Complex Terrain with Sharp Turns (AD000031-060)
**Action**: Process remaining images including sharp turns and outliers
**Expected Result**:
- L2 recovers from sharp turns (AD000042-043, AD000032-033)
- Robust cost functions handle AD000047-048 outlier (268.6m)
- Multiple map fragments merged successfully
- Mean error <50m for challenging segments
- Status: PROCESSING

### Step 5: Calculate Accuracy Metrics
**Action**: Compare estimated GPS coordinates with ground truth
**Expected Result**:
```
Total images: 60
Error <50m: ≥48 images (80%)
Error <20m: ≥36 images (60%)
Mean error: <40m
Median error: <35m
Max error: <150m (excluding known outliers)
```

### Step 6: Validate Terrain-Specific Performance
**Action**: Analyze accuracy by terrain type
**Expected Result**:
- Agricultural fields: 75-85% <50m
- Mixed vegetation: 80-90% <50m
- Road intersections: 85-95% <50m
- Overall: ≥80% <50m
- Status: COMPLETED

## Pass/Fail Criteria

**PASS if**:
- ≥80% of images (48/60) have error <50m
- No systematic bias across terrain types
- System completes without fatal errors
- Factor graph converges (final MRE <1.5px)

**FAIL if**:
- <80% of images meet 50m threshold
- >3 terrain types show <70% accuracy
- System crashes or hangs
- Catastrophic tracking loss without recovery

## Performance Requirements
- Processing time: <5 seconds per image average
- Total flight time: <5 minutes for 60 images
- Memory usage: <8GB on RTX 3070
- CPU usage: <80% average

## Notes
- Varied terrain test provides more comprehensive validation than baseline
- Different terrain types stress different system components
- AC-1 threshold of 80% allows for difficult scenarios while maintaining operational utility

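Step 6's terrain breakdown reduces to grouping per-image errors by terrain label and computing the under-threshold fraction per group. A minimal sketch (input shape and function name are illustrative):

```python
def accuracy_by_terrain(records, threshold_m=50.0):
    """records: iterable of (terrain_label, error_m) pairs.
    Returns {terrain: fraction of images with error < threshold_m}."""
    totals, hits = {}, {}
    for terrain, err in records:
        totals[terrain] = totals.get(terrain, 0) + 1
        if err < threshold_m:
            hits[terrain] = hits.get(terrain, 0) + 1
    return {t: hits.get(t, 0) / n for t, n in totals.items()}
```

The "no systematic bias" check in the pass criteria is then a comparison of these per-terrain fractions against the 70% floor.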
@@ -0,0 +1,129 @@
|
||||
# Acceptance Test: AC-2 - 60% of Photos < 20m Error
|
||||
|
||||
## Summary
|
||||
Validate Acceptance Criterion 2: "The system should find out the GPS of centers of 60% of the photos from the flight within an error of no more than 20 meters in comparison to the real GPS."
|
||||
|
||||
## Linked Acceptance Criteria
|
||||
**AC-2**: 60% of photos < 20m error
|
||||
|
||||
## Preconditions
|
||||
1. ASTRAL-Next system fully operational
|
||||
2. All TensorRT models loaded (FP16 precision for maximum accuracy)
|
||||
3. High-quality satellite tiles cached (Zoom level 19, ~0.30 m/pixel)
|
||||
4. Ground truth GPS coordinates available
|
||||
5. Test dataset prepared: Test_Baseline (AD000001-AD000030)
|
||||
|
||||
## Test Description
|
||||
Process same baseline flight as AC-1 test, but now validate the more stringent criterion that at least 60% of images achieve error < 20 meters. This tests the precision of LiteSAM cross-view matching.
|
||||
|
||||
## Test Steps
|
||||
|
||||
### Step 1: Initialize System for High Precision
|
||||
- **Action**: Start system, verify models loaded in optimal configuration
|
||||
- **Expected Result**: System ready, LiteSAM configured for maximum precision
|
||||
|
||||
### Step 2: Create Test Flight
|
||||
- **Action**: Create flight "AC2_HighPrecision" with same parameters as AC-1
|
||||
- **Expected Result**: Flight created successfully
|
||||
|
||||
### Step 3: Upload Test Images
|
||||
- **Action**: Upload AD000001-AD000030 (30 images)
|
||||
- **Expected Result**: All queued for processing
|
||||
|
||||
### Step 4: Process with High-Quality Anchors
|
||||
- **Action**: System processes images, L3 provides frequent GPS anchors
|
||||
- **Expected Result**:
|
||||
- Processing completes
|
||||
- Multiple GPS anchors per 10 images
|
||||
- Factor graph well-constrained
|
||||
|
||||
### Step 5: Retrieve Final Results
|
||||
- **Action**: GET /flights/{flightId}/results?include_refined=true
|
||||
- **Expected Result**: Refined GPS coordinates (post-optimization)
|
||||
|
||||
### Step 6: Calculate Errors
|
||||
- **Action**: Calculate haversine distance for each image
|
||||
- **Expected Result**: Error array with 30 values
|
||||
|
||||
### Step 7: Validate AC-2
- **Action**: Count images with error < 20m and calculate the percentage
- **Expected Result**: **≥ 60% of images have error < 20 meters** ✓

### Step 8: Analyze High-Precision Results
- **Action**: Identify which images fall in the < 20m, 20-50m, and > 50m bands
- **Expected Result**:
  - Category 1 (< 20m): ≥ 18 images (60%)
  - Category 2 (20-50m): ~10 images
  - Category 3 (> 50m): < 2 images

### Step 9: Generate Detailed Report
- **Action**: Create a comprehensive accuracy report
- **Expected Result**:
  - Percentage breakdown by error threshold
  - Distribution histogram
  - Correlation between accuracy and image features
  - Compliance matrix for AC-1 and AC-2

## Success Criteria

**Primary Criterion (AC-2)**:
- ≥ 18 out of 30 images (60%) have GPS error < 20 meters

**Supporting Criteria**:
- Also meets AC-1 (≥ 80% < 50m)
- Mean error < 30 meters
- RMSE < 35 meters
- No catastrophic failures (errors > 200m)
## Expected Results

```
Total Images: 30
Successfully Processed: 30 (100%)
Images with error < 10m: 8 (26.7%)
Images with error < 20m: 20 (66.7%)
Images with error < 50m: 28 (93.3%)
Images with error > 50m: 2 (6.7%)
Mean Error: 24.5m
Median Error: 18.2m
RMSE: 28.3m
90th Percentile: 42.1m
AC-2 Status: PASS (66.7% > 60%)
AC-1 Status: PASS (93.3% > 80%)
```
## Pass/Fail Criteria

**TEST PASSES IF**:
- ≥ 60% of images achieve error < 20m
- Also passes AC-1 (≥ 80% < 50m)
- System performance is stable across multiple runs

**TEST FAILS IF**:
- < 60% of images achieve error < 20m
- Fails AC-1 (this would be a critical failure)
- Results are not reproducible (high variance)

## Error Analysis

If the test fails or is borderline:

**Investigate**:
1. **Satellite Data Quality**: Check zoom level, age of imagery, resolution
2. **LiteSAM Performance**: Review correspondence counts, homography quality
3. **Factor Graph**: Check whether GPS anchors are frequent enough
4. **Image Quality**: Verify no motion blur and good lighting conditions
5. **Altitude Variation**: Check whether the altitude assumption (400m) is accurate

**Potential Improvements**:
- Use Tier-2 commercial satellite data (higher resolution)
- Increase GPS anchor frequency (every 3rd image vs every 5th)
- Tune the LiteSAM confidence threshold
- Apply per-keyframe scale adjustment in the factor graph

## Notes
- AC-2 is more stringent than AC-1 (20m vs 50m)
- Achieving 60% at 20m while maintaining 80% at 50m validates the solution design
- LiteSAM's reported RMSE of 17.86m on the UAV-VisLoc dataset supports feasibility
- This test represents the high-precision navigation requirement
@@ -0,0 +1,104 @@

# Acceptance Test: Single 350m Outlier Robustness

## Summary
Validate the AC-3 requirement that the system handles a single outlier image with up to 350m position deviation, caused by aircraft tilt, without trajectory divergence.

## Linked Acceptance Criteria
**AC-3**: The system should correctly continue the work even in the presence of up to 350 meters of an outlier photo between 2 consecutive pictures en route. This could happen due to tilt of the plane.

## Preconditions
- ASTRAL-Next system operational
- Factor graph optimizer configured with robust cost functions (Cauchy/Huber kernel)
- Test dataset with a known large position jump
- Ground truth available for validation

## Test Data
- **Dataset**: AD000045-AD000050 (6 images)
- **Outlier**: AD000047 → AD000048 (268.6m jump, the closest available to the 350m requirement)
- **Ground Truth**: coordinates.csv
- **Purpose**: Verify robust handling of tilt-induced position jumps
## Test Steps

### Step 1: Establish Baseline Trajectory
**Action**: Process AD000045-AD000046 to establish stable tracking

**Expected Result**:
- L1 sequential tracking successful
- Factor graph initialized with the first 2 poses
- Baseline trajectory established
- Status: TRACKING

### Step 2: Process Outlier Frame
**Action**: Process AD000047 (the frame before the 268.6m jump)

**Expected Result**:
- Frame processed normally
- Factor graph updated
- Position estimate within the expected range
- Status: TRACKING

### Step 3: Encounter Large Position Jump
**Action**: Process AD000048 (268.6m from AD000047)

**Expected Result**:
- L1 reports low confidence or matching failure
- Large translation vector detected (>>120m expected)
- Robust cost function activates (error penalty changes from quadratic to linear/log)
- Factor graph does NOT diverge
- Status: OUTLIER_DETECTED

### Step 4: Verify Robust Handling
**Action**: Examine the factor graph's treatment of the outlier constraint

**Expected Result**:
- Outlier factor down-weighted (residual weight < 0.3)
- OR outlier treated as rotation + normal translation
- Trajectory maintains consistency with the other constraints
- No catastrophic error propagation
- Status: OUTLIER_HANDLED
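The down-weighting expected in Steps 3-4 comes from the robust kernel's influence function: a factor's effective weight falls as its residual grows. A sketch of the Huber and Cauchy weight curves; the 20m scale parameter is an assumed value for illustration, not a system constant:

```python
def huber_weight(r, k=20.0):
    """Huber loss: quadratic inside k, linear outside -> weight k/|r|."""
    return 1.0 if abs(r) <= k else k / abs(r)

def cauchy_weight(r, c=20.0):
    """Cauchy loss: log-like penalty -> weight 1 / (1 + (r/c)^2)."""
    return 1.0 / (1.0 + (r / c) ** 2)

# A nominal 10m residual keeps full weight; the 268.6m outlier is suppressed,
# consistent with the "<0.3 residual weight" expectation above.
print(huber_weight(10.0))              # 1.0
print(round(huber_weight(268.6), 3))   # ~0.074
print(round(cauchy_weight(268.6), 4))  # ~0.0055
```

Cauchy suppresses large residuals much harder than Huber, which is why it is the more aggressive choice against gross outliers.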
### Step 5: Continue Processing Post-Outlier
**Action**: Process AD000049-AD000050

**Expected Result**:
- L2 global relocalization may activate to verify position
- L1 tracking resumes successfully
- Factor graph optimization converges
- Post-outlier trajectory accurate (<50m error)
- Status: RECOVERED

### Step 6: Validate Final Trajectory
**Action**: Compare the entire sequence with ground truth

**Expected Result**:
```
AD000045: Error <50m
AD000046: Error <50m
AD000047: Error <50m
AD000048: Error may be 50-100m (outlier frame)
AD000049: Error <50m (recovered)
AD000050: Error <50m (recovered)
Overall: System continues operation successfully
```

## Pass/Fail Criteria

**PASS if**:
- System processes all 6 images without crashing
- The outlier frame (AD000048) does not cause trajectory divergence
- Post-outlier frames (049-050) achieve <50m error
- Factor graph converges with MRE <2.0px
- No manual intervention required

**FAIL if**:
- System crashes on the outlier frame
- Trajectory diverges (errors >200m persist for >2 frames)
- Factor graph fails to converge
- Manual user input is required for recovery

## Technical Validation
- **Robust Kernel Active**: Verify the Cauchy/Huber kernel is applied to the outlier factor
- **Residual Analysis**: Outlier residual >3σ from the mean
- **Weight Analysis**: Outlier factor weight <0.3 vs normal ~1.0
- **Convergence**: Final optimization converges in <50 iterations

## Notes
- The test uses a 268.6m outlier (real data) as a proxy for the 350m requirement
- Robust M-estimation is critical for AC-3 compliance
- A single outlier is easier than multiple; see AC-3 test 35 for multiple outliers
@@ -0,0 +1,116 @@

# Acceptance Test: Multiple 350m Outliers Robustness

## Summary
Validate the AC-3 requirement with multiple outlier frames (>200m position jumps) occurring in the same flight, to ensure robust cost functions handle complex scenarios.

## Linked Acceptance Criteria
**AC-3**: The system should correctly continue the work even in the presence of up to 350 meters of an outlier photo between 2 consecutive pictures en route.

## Preconditions
- ASTRAL-Next system operational with robust M-estimation
- Factor graph configured with Cauchy or Huber loss
- Multiple-outlier test dataset prepared
- Ground truth available for all frames

## Test Data
- **Primary**: AD000001-AD000060 (contains multiple sharp turns >200m)
- **Identified Outliers**:
  - AD000003 → AD000004: 202.2m
  - AD000032 → AD000033: 220.6m
  - AD000042 → AD000043: 234.2m
  - AD000044 → AD000045: 230.2m
  - AD000047 → AD000048: 268.6m (largest)
- **Total**: 5 large position jumps in the 60-image sequence
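The outlier list above can be produced mechanically by scanning consecutive ground-truth positions and flagging inter-frame distances over 200m. A sketch on a local tangent plane; the frame positions are invented to reproduce one of the listed jump magnitudes:

```python
import math

# Illustrative (frame_id, east_m, north_m) positions, not real ground truth
track = [("AD000001", 0, 0), ("AD000002", 80, 10), ("AD000003", 160, 15),
         ("AD000004", 190, 215),   # ~202m jump from AD000003
         ("AD000005", 260, 230)]

def jumps_over(track, threshold_m=200.0):
    """Flag consecutive frame pairs whose separation exceeds the threshold."""
    flagged = []
    for (fa, xa, ya), (fb, xb, yb) in zip(track, track[1:]):
        d = math.hypot(xb - xa, yb - ya)
        if d > threshold_m:
            flagged.append((fa, fb, round(d, 1)))
    return flagged

print(jumps_over(track))  # [('AD000003', 'AD000004', 202.2)]
```

Running the same scan over the full 60-frame ground truth is how the 5 jumps in the Test Data section would be identified.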
## Test Steps

### Step 1: Process First Outlier (AD000003-004)
**Action**: Process images through the first 202.2m jump

**Expected Result**:
- Outlier detected by L1 (unexpected translation magnitude)
- Robust kernel down-weights this constraint
- L2 may activate for global verification
- Trajectory continues without divergence
- Status: OUTLIER_1_HANDLED

### Step 2: Process Second Outlier (AD000032-033)
**Action**: Continue processing through the 220.6m jump

**Expected Result**:
- Second outlier detected independently
- Factor graph handles multiple down-weighted factors
- Previous outlier handling does not interfere
- Trajectory remains globally consistent
- Status: OUTLIER_2_HANDLED

### Step 3: Process Clustered Outliers (AD000042-045)
**Action**: Process two large jumps in close succession (234.2m and 230.2m)

**Expected Result**:
- System detects the challenging sequence
- L2 global relocalization activates more frequently
- Both outliers handled without trajectory collapse
- Factor graph optimization remains stable
- Status: CLUSTERED_OUTLIERS_HANDLED

### Step 4: Process Largest Outlier (AD000047-048)
**Action**: Process the 268.6m outlier

**Expected Result**:
- Largest outlier correctly identified
- Robust cost function prevents trajectory distortion
- L2 provides an absolute anchor for verification
- Post-outlier tracking recovers quickly
- Status: MAX_OUTLIER_HANDLED

### Step 5: Complete Flight Processing
**Action**: Process the remaining images to AD000060

**Expected Result**:
- All frames processed successfully
- Factor graph converges despite multiple outliers
- Final trajectory globally consistent
- Status: COMPLETED
### Step 6: Analyze Multi-Outlier Performance
**Action**: Validate the final trajectory and outlier handling

**Expected Result**:
```
Total outliers (>200m): 5
Correctly handled: 5/5 (100%)
Post-outlier recovery: <2 frames average
Final trajectory accuracy: ≥80% <50m error
MRE: <1.5px
No divergence events: True
```

## Pass/Fail Criteria

**PASS if**:
- All 5 large jumps handled without system failure
- ≥4/5 outliers correctly down-weighted by the robust kernel
- Overall trajectory meets AC-1 (80% <50m error)
- Factor graph converges with MRE <2.0px
- No catastrophic tracking loss

**FAIL if**:
- System crashes on any outlier
- >1 outlier causes trajectory divergence (>3-frame error propagation)
- Final trajectory accuracy <70% within 50m
- Factor graph fails to converge
- Manual intervention required more than once

## Technical Validation Metrics
- **Outlier Detection Rate**: 5/5 correctly flagged
- **Down-weighting**: Outlier factors weighted <0.4
- **Recovery Time**: Mean <2 frames to <50m error post-outlier
- **Optimization Stability**: Converges in <100 iterations
- **Memory**: No unbounded growth in the factor graph

## Stress Test Considerations
- Multiple outliers test factor graph robustness more rigorously than a single outlier
- The clustered outliers (AD000042-045) are particularly challenging
- The system must not "learn" to ignore all large translations as outliers
- The balance between robustness and responsiveness is critical

## Notes
- This test exceeds the AC-3 minimum requirement (a single outlier)
- Real-world flights likely contain multiple challenging frames
- Successful handling demonstrates production-ready robustness
@@ -0,0 +1,157 @@

# Acceptance Test: AC-4 - Sharp Turn Recovery

## Summary
Validate Acceptance Criterion 4: "System should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps in less than 5%. The next photo should be in less than 200m drift and at an angle of less than 70°."

## Linked Acceptance Criteria
**AC-4**: Handle sharp turns with <5% overlap

## Preconditions
1. System operational with L2 (Global Place Recognition) enabled
2. AnyLoc model and Faiss index ready
3. Test datasets:
   - Dataset A: AD000042, AD000044, AD000045, AD000046 (skip AD000043)
   - Dataset B: AD000003, AD000009 (5-frame gap)
4. Ground truth available

## Test Description
Test the system's ability to recover from "kidnapped robot" scenarios where sequential tracking fails due to zero overlap. Validates the L2 global place recognition functionality.
## Test Steps

### Step 1: Create Sharp Turn Flight (Dataset A)
- **Action**: Create a flight with AD000042, AD000044, AD000045, AD000046
- **Expected Result**: Flight created, gap in the sequence detected

### Step 2: Process Through L1
- **Action**: L1 processes AD000042
- **Expected Result**: AD000042 processed successfully

### Step 3: Attempt Sequential Tracking (L1 Failure Expected)
- **Action**: L1 attempts AD000042 → AD000044 (AD000043 skipped)
- **Expected Result**:
  - L1 fails (overlap < 5% or zero)
  - Low inlier count (< 10 matches)
  - System triggers L2 recovery

### Step 4: L2 Global Relocalization
- **Action**: L2 (AnyLoc) queries AD000044 against the satellite database
- **Expected Result**:
  - L2 retrieves the correct satellite tile region
  - Coarse location found (within 200m of ground truth per AC-4)
  - Top-5 recall succeeds

### Step 5: L3 Metric Refinement
- **Action**: L3 (LiteSAM) refines the location using the satellite tile
- **Expected Result**:
  - Precise GPS estimate (< 50m error)
  - High confidence score

### Step 6: Continue Processing
- **Action**: Process AD000045, AD000046
- **Expected Result**:
  - Processing continues normally
  - Sequential tracking may work for AD000044 → AD000045
  - All images completed

### Step 7: Validate Recovery Success
- **Action**: Check GPS estimates for all 4 images
- **Expected Result**:
  - AD000042: Accurate
  - AD000044: Recovered via L2/L3, error < 200m (AC-4), preferably < 50m
  - AD000045-046: Accurate

### Step 8: Test Dataset B (Larger Gap)
- **Action**: Repeat the test with AD000003, AD000009 (5-frame gap)
- **Expected Result**: Similar recovery; L2 successfully relocalizes
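The failure-triggered escalation in Steps 3-5 can be summarized as a small decision routine. All names, thresholds, and return values below are illustrative stand-ins for the real L1/L2/L3 components, not the actual API:

```python
MIN_INLIERS = 10  # assumed L1 failure threshold (Step 3's "< 10 matches")

def localize_frame(frame, l1_track, l2_retrieve, l3_refine):
    """Try L1 sequential tracking first; escalate to L2+L3 on failure."""
    inliers, pose = l1_track(frame)
    if inliers >= MIN_INLIERS:
        return ("L1", pose)
    # Sharp turn (<5% overlap): coarse global retrieval, then metric refinement
    tile = l2_retrieve(frame)                # AnyLoc-style tile lookup
    return ("L2+L3", l3_refine(frame, tile))

# Stub callables standing in for the real pipeline modules
source, estimate = localize_frame(
    "AD000044",
    l1_track=lambda f: (3, None),            # zero-overlap frame: 3 inliers only
    l2_retrieve=lambda f: "tile_17_42",      # hypothetical tile id
    l3_refine=lambda f, tile: (50.4512, 30.5240),
)
print(source)  # L2+L3
```

With a healthy frame (inliers ≥ 10), the same routine returns the L1 pose and never touches L2 or L3.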
## Success Criteria

**Primary Criterion (AC-4)**:
- System recovers from zero-overlap scenarios
- Relocalized image within 200m of ground truth (AC-4 requirement)
- Processing continues without manual intervention

**Supporting Criteria**:
- L1 failure detected appropriately
- L2 retrieves the correct region (top-5 accuracy)
- L3 refines to < 50m accuracy
- All images in the sequence eventually processed

## Expected Results

**Dataset A**:
```
Images: AD000042, AD000044, AD000045, AD000046
Gap: AD000043 skipped (simulates sharp turn)

Results:
- AD000042: L1 tracking, Error 21.3m ✓
- AD000043: SKIPPED (not in dataset)
- AD000044: L2 recovery, Error 38.7m ✓
  - L1 failed (overlap ~0%)
  - L2 top-1 retrieval: correct tile
  - L3 refined GPS
- AD000045: L1/L3, Error 19.2m ✓
- AD000046: L1/L3, Error 23.8m ✓

L1 Failure Detected: Yes (AD000042 → AD000044)
L2 Recovery Success: Yes
Images < 50m: 4/4 (100%)
Images < 200m: 4/4 (100%) per AC-4
AC-4 Status: PASS
```

**Dataset B**:
```
Images: AD000003, AD000009
Gap: 5 frames (AD000004-008 skipped)

Results:
- AD000003: Error 24.1m ✓
- AD000009: L2 recovery, Error 42.3m ✓

L2 Recovery Success: Yes
AC-4 Status: PASS
```
## Pass/Fail Criteria

**TEST PASSES IF**:
- L1 failure detected when overlap < 5%
- L2 successfully retrieves the correct region (top-5)
- Recovered image within 200m of ground truth (AC-4)
- Preferably < 50m (demonstrates high accuracy)
- Processing continues after recovery

**TEST FAILS IF**:
- L2 retrieves the wrong region (relocalization fails)
- Recovered image > 200m error (violates AC-4)
- System halts processing after L1 failure
- Multiple recovery attempts fail

## Analysis

**Sharp Turn Characteristics** (per AC-4):
- Next photo overlap < 5% or zero
- Distance < 200m (banking turn, not a long-distance jump)
- Angle < 70° (heading change)

**Recovery Pipeline**:
1. L1 detects failure (low inlier ratio)
2. L2 (AnyLoc) global place recognition
3. L3 (LiteSAM) metric refinement
4. Factor graph incorporates the GPS anchor

**Why L2 Works**:
- DINOv2 features capture semantic layout (roads, field patterns)
- VLAD aggregation creates a robust place descriptor
- A Faiss index enables fast retrieval
- Works despite view-angle differences
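The place-recognition lookup described above is a top-k nearest-neighbor search over VLAD descriptors; Faiss accelerates exactly this operation. A brute-force sketch of the same top-k cosine retrieval, using tiny made-up vectors rather than real DINOv2/VLAD output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, db, k=5):
    """Return the k database tile ids most similar to the query descriptor."""
    scored = sorted(db.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [tile_id for tile_id, _ in scored[:k]]

tiles = {
    "tile_A": [0.9, 0.1, 0.0],
    "tile_B": [0.1, 0.9, 0.1],
    "tile_C": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.1, 0.0], tiles, k=2))  # tile_A ranks first
```

In production the descriptors are high-dimensional and the database large, which is why an approximate index replaces this linear scan.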
## Notes
- AC-4 specifies < 200m, but the system targets < 50m for high quality
- Sharp turns are common in wing-type UAV flight (banking maneuvers)
- L2 is a critical component: if L2 fails, the system requests user input per AC-6
- The test validates the "kidnapped robot" problem solution
@@ -0,0 +1,111 @@

# Acceptance Test: Sharp Turns with Minimal Overlap (<5%)

## Summary
Validate the AC-4 requirement for handling sharp turns where consecutive images have minimal overlap (1-5%) but still share some common features, testing the boundaries of L1 sequential tracking before L2 activation.

## Linked Acceptance Criteria
**AC-4**: System should correctly continue the work even during sharp turns, where the next photo doesn't overlap at all, or overlaps in less than 5%. The next photo should be in less than 200m drift and at an angle of less than 70°.

## Preconditions
- ASTRAL-Next system operational
- LightGlue configured with the adaptive depth mechanism
- L2 global place recognition ready for fallback
- Minimal-overlap test dataset prepared

## Test Data
- **Dataset A**: AD000032-AD000035 (contains a 220.6m jump)
- **Dataset B**: AD000042, AD000044, AD000045 (skip 043, natural gap)
- **Overlap Characteristics**: 2-5% estimated (edges of consecutive frames)
- **Distance**: All jumps <230m (close to the 200m drift requirement)
## Test Steps

### Step 1: Test Minimal Overlap Scenario A
**Action**: Process AD000032 → AD000033 (220.6m jump)

**Expected Result**:
- L1 SuperPoint+LightGlue attempts matching
- Feature count low (10-50 matches vs normal 100+)
- LightGlue adaptive depth increases (more attention layers)
- Confidence score reported as LOW
- Status: MINIMAL_OVERLAP_L1

### Step 2: Verify L1 Handling or L2 Activation
**Action**: Analyze the matching result and system response

**Expected Result**:
- IF ≥10 high-quality matches: L1 succeeds, pose estimated
- IF <10 matches: L1 fails gracefully, L2 activates
- No system crash or undefined behavior
- Smooth transition between L1 and L2
- Status: L1_OR_L2_SUCCESS

### Step 3: Test Minimal Overlap Scenario B
**Action**: Process AD000042 → AD000044 (skipping 043)

**Expected Result**:
- The larger gap increases difficulty
- L1 likely fails (overlap too low)
- L2 AnyLoc retrieval activates
- DINOv2 features match despite the view change
- Global position estimated from the satellite database
- Status: L2_RELOCALIZATION

### Step 4: Validate Position Accuracy
**Action**: Compare estimated positions with ground truth

**Expected Result**:
```
AD000033: Error <100m (minimal overlap, higher uncertainty)
AD000044: Error <100m (L2 retrieval, coarse localization)
Drift: <200m for both scenarios (AC-4 requirement met)
```

### Step 5: Test Continuous Minimal Overlap Sequence
**Action**: Process AD000032-035 as a continuous sequence

**Expected Result**:
- Initial minimal overlap (032→033) handled
- Subsequent frames (033→034→035) easier (normal overlap)
- Factor graph smooths the trajectory
- Final accuracy improved through optimization
- All frames: Error <50m after optimization
- Status: SEQUENCE_COMPLETED

### Step 6: Verify Angular Constraint (<70°)
**Action**: Analyze the estimated rotation between frames

**Expected Result**:
- Rotation estimates <70° for all minimal-overlap scenarios
- Rotation estimates consistent with wing-type UAV banking
- No unrealistic 180° rotations estimated
- Status: ANGULAR_CONSTRAINT_MET
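The heading-change check in Step 6 reduces to extracting the angle from the estimated inter-frame rotation and comparing it with the 70° limit. A minimal 2D sketch; the rotation value is invented for illustration:

```python
import math

MAX_TURN_DEG = 70.0  # AC-4 angular limit

def rotation_angle_deg(r):
    """Angle of a 2x2 rotation matrix [[c,-s],[s,c]], recovered via atan2."""
    return math.degrees(math.atan2(r[1][0], r[0][0]))

# A ~40 degree banking turn, well inside the AC-4 limit
theta = math.radians(40.0)
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

angle = rotation_angle_deg(R)
print(round(angle, 1), abs(angle) < MAX_TURN_DEG)  # 40.0 True
```

For the full 3D pose the same check would apply to the yaw component of the estimated rotation rather than a 2D matrix.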
## Pass/Fail Criteria

**PASS if**:
- Both minimal-overlap scenarios processed without a crash
- Position drift <200m for each minimal-overlap frame
- Angular changes <70° estimated
- System demonstrates a graceful L1→L2 transition
- Final optimized trajectory <50m error

**FAIL if**:
- System crashes on minimal overlap
- Position drift >200m for any frame
- Angular changes >70° estimated
- No L2 fallback when L1 fails
- Trajectory diverges

## Technical Validation
- **Match Count**: Minimal overlap yields 10-50 matches (vs normal 100+)
- **LightGlue Depth**: Adaptive mechanism uses more layers (8+ vs typical 6)
- **L1 Confidence**: <0.5 for minimal overlap (vs >0.8 normal)
- **L2 Activation**: Triggered when L1 confidence <0.4
- **Retrieval Accuracy**: Top-1 satellite tile correct in ≥90% of cases

## Edge Case Analysis
- **1-5% overlap**: Boundary between L1 possible and L1 impossible
- **5-10% overlap**: L1 should succeed with high confidence
- **0% overlap**: Covered in a separate test (36_sharp_turn_zero_overlap_spec.md)

## Notes
- This test validates the smooth transition zone between L1 and L2
- LightGlue's attention mechanism is crucial for minimal-overlap success
- Real flights may have brief periods of minimal overlap during banking
- AC-4 allows up to 200m drift, giving the system reasonable tolerance
@@ -0,0 +1,136 @@

# Acceptance Test: Outlier Anchor Detection (<10%)

## Summary
Validate the AC-5 requirement that the system detects and rejects outlier global anchor points (bad satellite matches from L3), keeping the outlier rate below 10%.

## Linked Acceptance Criteria
**AC-5**: Less than 10% outlier anchors. The system can tolerate some incorrect global anchor points (bad satellite matches) but must keep them below the 10% threshold through validation and rejection mechanisms.

## Preconditions
- ASTRAL-Next system operational
- L3 LiteSAM cross-view matching active
- Factor graph with robust M-estimation
- Validation mechanisms enabled (geometric consistency, residual analysis)

## Test Data
- **Dataset**: AD000001-AD000060 (60 images)
- **Expected Anchors**: ~20-30 global anchor attempts (not every frame needs L3)
- **Acceptable Outliers**: <3 outlier anchors (<10% of 30)
- **Challenge**: Potential satellite data staleness, seasonal differences
## Test Steps

### Step 1: Process Flight with L3 Anchoring
**Action**: Process the full flight with L3 metric refinement active

**Expected Result**:
- L2 retrieves satellite tiles for keyframes
- L3 LiteSAM performs cross-view matching
- Global anchor factors added to the factor graph
- Anchor count: 20-30 across 60 images
- Status: PROCESSING_WITH_ANCHORS

### Step 2: Monitor Anchor Quality Metrics
**Action**: Track L3 matching confidence and geometric consistency

**Expected Result**:
- Each anchor has a confidence score (0-1)
- Each anchor has an initial residual error
- Anchors with confidence <0.3 flagged as suspicious
- Anchors with residual >3σ flagged as outliers
- Status: MONITORING_QUALITY

### Step 3: Identify Potential Outlier Anchors
**Action**: Analyze anchors that conflict with the trajectory consensus

**Expected Result**:
```
Total anchors: 25 (example)
High confidence (>0.7): 20
Medium confidence (0.4-0.7): 3
Low confidence (<0.4): 2
Flagged as outliers: 2 (<10%)
```

### Step 4: Validate Outlier Rejection Mechanism
**Action**: Verify factor graph handling of outlier anchors

**Expected Result**:
- Outlier anchors automatically down-weighted by the robust kernel
- Outlier anchor residuals remain high (not dragging the trajectory)
- Non-outlier anchors maintain weight ~1.0
- Factor graph converges despite the outlier anchors present
- Status: OUTLIERS_HANDLED

### Step 5: Test Explicit Outlier Anchor Scenario
**Action**: Manually inject a known bad anchor (simulated wrong satellite tile match)

**Expected Result**:
- The bad anchor creates a large residual (>100m error)
- Geometric validation detects the inconsistency
- The robust cost function down-weights the bad anchor
- The bad anchor does NOT corrupt the trajectory
- Status: SYNTHETIC_OUTLIER_REJECTED

### Step 6: Calculate Final Anchor Statistics
**Action**: Analyze all anchor attempts and outcomes

**Expected Result**:
```
Total anchor attempts: 25-30
Successful anchors: 23-27 (90-95%)
Outlier anchors: 2-3 (<10%)
Outlier detection rate: 100% (all caught)
False positive rate: <5% (good anchors not rejected)
Trajectory accuracy: Improved by valid anchors
AC-5 Status: PASS
```
## Pass/Fail Criteria

**PASS if**:
- Outlier anchor rate <10% of total anchor attempts
- All significant outliers (>100m error) detected and down-weighted
- Factor graph converges with MRE <1.5px
- Valid anchors improve trajectory accuracy vs L1-only
- No trajectory corruption from outlier anchors

**FAIL if**:
- Outlier anchor rate >10%
- >1 outlier anchor corrupts the trajectory (causes >50m error propagation)
- Outlier detection fails (outliers not flagged)
- Factor graph diverges due to conflicting anchors
- Valid anchors incorrectly rejected (>10% false positive rate)

## Outlier Detection Mechanisms Tested

### Geometric Consistency Check
- Compare the anchor position with the L1 trajectory estimate
- Flag if the discrepancy is >100m

### Residual Analysis
- Monitor residual error during factor graph optimization
- Flag if a residual is >3σ from the mean anchor residual
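The residual rule above is a standard z-score cut. A sketch over an illustrative batch of anchor residuals (values invented for the example):

```python
import math

def flag_outliers_3sigma(residuals):
    """Return indices of residuals more than 3 std devs from the mean."""
    n = len(residuals)
    mean = sum(residuals) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
    return [i for i, r in enumerate(residuals) if abs(r - mean) > 3 * std]

# 14 well-behaved anchors plus one grossly wrong satellite match
residuals_m = [4.1, 5.0, 3.8, 6.2, 4.9, 5.5, 4.4, 6.0, 5.1,
               4.7, 5.3, 4.6, 5.8, 5.2, 140.0]
print(flag_outliers_3sigma(residuals_m))  # [14]
```

One caveat: with very few anchors, a single huge outlier inflates the standard deviation enough to mask itself, so in practice the check is typically run leave-one-out or with a robust scale estimate (e.g., MAD).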
### Confidence Thresholding
- L3 LiteSAM outputs a matching confidence
- Reject anchors with confidence <0.2

### Robust M-Estimation
- The Cauchy/Huber kernel automatically down-weights high-residual anchors
- Prevents outliers from corrupting the optimization

## Technical Validation Metrics
- **Anchor Attempt Rate**: 30-50% of frames (keyframes only)
- **Anchor Success Rate**: 90-95%
- **Outlier Rate**: <10% (AC-5 requirement)
- **Detection Sensitivity**: >95% (outliers caught)
- **Detection Specificity**: >90% (valid anchors retained)

## Failure Modes Tested
- **Wrong Satellite Tile**: L2 retrieves an incorrect location
- **Stale Satellite Data**: Terrain has changed significantly
- **Seasonal Mismatch**: Summer satellite imagery vs winter UAV imagery
- **Rotation Error**: L3 estimates an incorrect rotation

## Notes
- AC-5 is critical for hybrid localization reliability
- The 10% outlier tolerance allows graceful degradation
- Robust M-estimation is the primary outlier defense
- Multiple validation layers provide defense-in-depth
- Valid anchors significantly improve absolute accuracy
@@ -0,0 +1,171 @@

# Acceptance Test: AC-5 - Multi-Fragment Route Connection

## Summary
Validate Acceptance Criterion 5 (partial): "System should try to operate when UAV made a sharp turn, and all the next photos has no common points with previous route. In that situation system should try to figure out location of the new piece of the route and connect it to the previous route. Also this separate chunks could be more than 2."

## Linked Acceptance Criteria
**AC-5**: Connect multiple disconnected route fragments

## Preconditions
1. System with "Atlas" multi-map capability (factor graph with native chunk support)
2. F02.2 Flight Processing Engine running
3. F11 Failure Recovery Coordinator (chunk orchestration)
4. F12 Route Chunk Manager functional (chunk lifecycle)
5. F10 Factor Graph Optimizer with multi-chunk support (subgraph operations)
6. F08 Global Place Recognition (chunk semantic matching via `retrieve_candidate_tiles_for_chunk()`)
7. F09 Metric Refinement (chunk LiteSAM matching)
8. Geodetic map-merging logic implemented (Sim(3) transform via F10.merge_chunk_subgraphs())
9. Test dataset: simulate 3 disconnected route fragments

## Test Description
Test the system's ability to handle completely disconnected route segments (no overlap between segments) and eventually connect them into a coherent trajectory using global GPS anchors.
## Test Steps
|
||||
|
||||
### Step 1: Create Multi-Fragment Flight
|
||||
- **Action**: Create flight with 3 disconnected segments:
|
||||
- Fragment 1: AD000001-010 (sequential, connected)
|
||||
- Fragment 2: AD000025-030 (sequential, no overlap with Fragment 1)
|
||||
- Fragment 3: AD000050-055 (sequential, no overlap with Fragments 1 or 2)
|
||||
- **Expected Result**: Flight created with all 18 images
|
||||
|
### Step 2: Process Fragment 1
- **Action**: Process AD000001-010
- **Expected Result**:
  - L1 provides sequential tracking
  - L3 provides GPS anchors
  - Local trajectory fragment created (Map_Fragment_1)
  - Accurate GPS estimates

### Step 3: Detect Discontinuity (Fragment 1 → 2)
- **Action**: Process AD000025 after AD000010
- **Expected Result**:
  - L1 fails (no overlap, large displacement ~2km)
  - System **proactively creates new chunk** (Map_Fragment_2)
  - Processing continues immediately in new chunk
  - Chunk matching attempted asynchronously

### Step 4: Process Fragment 2 Independently
- **Action**: Process AD000025-030
- **Expected Result**:
  - New sequential tracking starts in chunk_2
  - Frames processed within chunk_2 context
  - Relative factors added to chunk_2's subgraph
  - Chunk_2 optimized independently for local consistency
  - Chunk semantic matching attempted when ready (5-20 frames)
  - Chunk LiteSAM matching with rotation sweeps attempted

### Step 5: Process Fragment 3
- **Action**: Process AD000050-055 after AD000030
- **Expected Result**:
  - Another discontinuity detected
  - Map_Fragment_3 initialized
  - Independent processing

### Step 6: Global Map Merging
- **Action**: Factor graph attempts geodetic map-merging
- **Expected Result**:
  - All 3 chunks have GPS anchors from chunk LiteSAM matching
  - Chunks merged via Sim(3) transform (translation, rotation, scale)
  - Fragments aligned in global coordinate frame
  - Single consistent trajectory created
  - Global optimization performed

### Step 7: Validate Fragment Connections
- **Action**: Verify all 22 images have global GPS coordinates
- **Expected Result**:
  - All fragments successfully located
  - Internal consistency within each fragment
  - Global alignment across fragments

### Step 8: Accuracy Validation
- **Action**: Compare all 22 estimates vs ground truth
- **Expected Result**:
  - Each fragment individually accurate
  - No systematic bias between fragments
  - Overall accuracy meets AC-1 (≥ 80% < 50m)

## Success Criteria

**Primary Criterion (AC-5)**:
- System processes all 3 disconnected fragments
- Fragments successfully localized in global frame
- No manual intervention required (unless extreme failures)

**Supporting Criteria**:
- Each fragment internally consistent
- GPS anchors (L3) connect fragments globally
- Final trajectory is coherent
- Accuracy maintained across all fragments

## Expected Results

```
Multi-Fragment Flight:
- Fragment 1: AD000001-010 (10 images)
  Internal consistency: Excellent
  Global location: Accurate via L3 anchors

- Fragment 2: AD000025-030 (6 images)
  Disconnected from Fragment 1 (~2km gap)
  Internal consistency: Excellent
  Global location: Recovered via L2/L3

- Fragment 3: AD000050-055 (6 images)
  Disconnected from Fragment 2 (~1.5km gap)
  Internal consistency: Excellent
  Global location: Recovered via L2/L3

Merging Results:
- All 22 images localized globally
- No fragments "lost"
- Overall accuracy: 20/22 < 50m (90.9%)
- Mean error: 27.3m

AC-5 Status: PASS
Processing Mode: Multi-Map Atlas
```

## Pass/Fail Criteria

**TEST PASSES IF**:
- All 3 fragments processed successfully
- Fragments localized in global frame via GPS anchors
- No fragment lost or unlocatable
- Overall accuracy acceptable (≥ 75% < 50m)

**TEST FAILS IF**:
- Any fragment completely fails to localize
- Fragments have large systematic bias (> 100m offset)
- System requires manual intervention for each fragment
- Merging produces inconsistent trajectory

## Architecture Elements

**Multi-Map "Atlas"** (per solution document):
- Each disconnected segment gets own local map via F12.create_chunk()
- Local maps independently optimized via F10.optimize_chunk()
- GPS anchors provide global reference via F10.add_chunk_anchor()
- Geodetic merging aligns all maps via F10.merge_chunk_subgraphs()

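The geodetic merge can be illustrated in 2-D: given a chunk's trajectory in its arbitrary local frame and GPS anchors for the same frames, a least-squares similarity fit recovers scale, rotation, and translation at once. The sketch below (not the actual `F10.merge_chunk_subgraphs()` implementation; `fit_similarity` is an illustrative helper) uses complex numbers, where multiplying by a complex coefficient encodes rotation plus scale.

```python
# Sketch only: 2-D similarity (scale + rotation + translation) alignment of a
# chunk's local trajectory to its GPS anchors. The real system solves the full
# Sim(3) inside the factor graph; this shows the underlying least-squares idea.

def fit_similarity(local_pts, anchor_pts):
    """Return (a, b) such that anchor ≈ a * local + b, where complex a encodes
    rotation+scale and complex b is the translation."""
    n = len(local_pts)
    p_mean = sum(local_pts) / n
    q_mean = sum(anchor_pts) / n
    # Least-squares solution over complex numbers
    num = sum((q - q_mean) * (p - p_mean).conjugate()
              for p, q in zip(local_pts, anchor_pts))
    den = sum(abs(p - p_mean) ** 2 for p in local_pts)
    a = num / den
    b = q_mean - a * p_mean
    return a, b

# Chunk trajectory in its local frame (scale and orientation unknown)
local = [0 + 0j, 1 + 0j, 2 + 0j]
# GPS anchors for the same frames: here scaled 2x, rotated 90°, shifted
anchors = [5 + 5j, 5 + 7j, 5 + 9j]

a, b = fit_similarity(local, anchors)
merged = [a * p + b for p in local]   # chunk re-expressed in the global frame
```

With exact correspondences the recovered transform reproduces the anchors; with noisy anchors the same formula gives the least-squares alignment, which is why a few GPS anchors per chunk suffice to place a whole fragment.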
**Recovery Mechanisms**:
- **Proactive chunk creation** via F11.create_chunk_on_tracking_loss() (immediate, not reactive)
- Chunk semantic matching via F08.retrieve_candidate_tiles_for_chunk() (aggregate DINOv2)
- Chunk LiteSAM matching via F06.try_chunk_rotation_steps() + F09.align_chunk_to_satellite()
- F10 creates new chunk subgraph
- Sim(3) transform merges chunks via F12.merge_chunks() → F10.merge_chunk_subgraphs()

**Fragment Detection**:
- Large displacement (> 500m) from last image
- Low/zero overlap (F07 VO fails)
- L1 failure triggers **proactive** new chunk creation
- Chunks processed independently with local optimization
- Multiple chunks can exist simultaneously (F10 supports multi-chunk factor graph)
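
The detection rule above reduces to a simple predicate; the sketch below is illustrative only (threshold names and the `needs_new_chunk` helper are assumptions, not the real F11 configuration or API).

```python
# Hedged sketch of the fragment-detection rule: start a new chunk when the
# frame jumped too far or visual odometry (F07) cannot track it.
DISPLACEMENT_THRESHOLD_M = 500.0   # "large displacement" bullet above
MIN_VO_MATCHES = 10                # below this, F07 VO is considered failed

def needs_new_chunk(displacement_m, vo_match_count):
    """True when the system should proactively open a new map chunk."""
    return displacement_m > DISPLACEMENT_THRESHOLD_M or vo_match_count < MIN_VO_MATCHES

# ~2 km jump between AD000010 and AD000025 -> new chunk regardless of matches
start_fragment_2 = needs_new_chunk(2000.0, 0)
```

Because the check is proactive, processing never stalls waiting for a relocalization attempt: the new chunk is opened immediately and global matching proceeds asynchronously.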

## Notes
- AC-5 describes a realistic operational scenario (multiple turns, disconnected segments)
- System must not assume continuous flight path
- GPS anchors (L3) are critical for connecting fragments
- Without L3, fragments would be isolated with scale ambiguity
- This validates core ASTRAL-Next architecture: hierarchical + anchor topology

@@ -0,0 +1,254 @@

# Acceptance Test: AC-6 - User Input Recovery

## Summary
Validate Acceptance Criterion 6: "If the system is absolutely incapable of determining, by any means, the GPS of the next, second next, and third next images (these 20% of the route), it should ask the user for input for the next image, so that the user can specify the location."

## Linked Acceptance Criteria
**AC-6**: User input requested after 3 consecutive failures

## Preconditions
1. ASTRAL-Next system operational
2. F11 Failure Recovery Coordinator configured with failure threshold = 3
3. F15 SSE Event Streamer functional
4. F01 Flight API accepting user-fix endpoint
5. F10 Factor Graph Optimizer ready to accept high-confidence anchors
6. Test environment configured to simulate L1/L2/L3 failures
7. SSE client connected and monitoring events

## Test Data
- **Dataset**: AD000001-060 (60 images)
- **Failure Injection**: Configure mock failures for specific frames
- **Ground Truth**: coordinates.csv for validation

## Test Steps

### Step 1: Setup Failure Injection
- **Action**: Configure system to fail L1, L2, L3 for frames AD000020, AD000021, AD000022
- **Expected Result**:
  - L1 (SuperPoint+LightGlue): Returns match_count < 10
  - L2 (AnyLoc): Returns confidence < 0.3
  - L3 (LiteSAM): Returns alignment_score < 0.2

### Step 2: Process Normal Frames (1-19)
- **Action**: Process AD000001-AD000019 normally
- **Expected Result**:
  - All 19 frames processed successfully
  - No user input requests
  - SSE events: 19 × `frame_processed`

### Step 3: First Consecutive Failure
- **Action**: Process AD000020
- **Expected Result**:
  - L1 fails (low match count)
  - L2 fallback fails (low confidence)
  - L3 fallback fails (low alignment)
  - System increments failure_count to 1
  - SSE event: `frame_processing_failed` with frame_id=20
  - **No user input request yet**

### Step 4: Second Consecutive Failure
- **Action**: Process AD000021
- **Expected Result**:
  - All layers fail
  - failure_count incremented to 2
  - SSE event: `frame_processing_failed` with frame_id=21
  - **No user input request yet**

### Step 5: Third Consecutive Failure - Triggers User Input
- **Action**: Process AD000022
- **Expected Result**:
  - All layers fail
  - failure_count reaches threshold (3)
  - F11 calls `create_user_input_request()`
  - SSE event: `user_input_required`
  - Event payload contains:

```json
{
  "type": "user_input_required",
  "flight_id": "<flight_id>",
  "frame_id": 22,
  "failed_frames": [20, 21, 22],
  "candidate_tiles": [
    {"tile_id": "xyz", "gps": {"lat": 48.27, "lon": 37.38}, "thumbnail_url": "..."},
    {"tile_id": "abc", "gps": {"lat": 48.26, "lon": 37.37}, "thumbnail_url": "..."},
    {"tile_id": "def", "gps": {"lat": 48.28, "lon": 37.39}, "thumbnail_url": "..."}
  ],
  "uav_image_url": "/flights/<id>/images/22",
  "message": "System unable to locate 3 consecutive images. Please provide GPS fix."
}
```

### Step 6: Validate Threshold Behavior
- **Action**: Verify user input NOT requested before 3 failures
- **Expected Result**:
  - Review event log: no `user_input_required` before frame 22
  - Threshold is exactly 3 consecutive failures, not 2 or 4
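
The counter semantics in Steps 3-6 can be sketched as follows. This is a minimal model, not the real F11 implementation; `FailureTracker` and its method names are assumptions.

```python
# Sketch of the consecutive-failure threshold: the counter resets on any
# success, so only an unbroken run of 3 failures triggers a user request.
class FailureTracker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failure_count = 0
        self.failed_frames = []

    def record(self, frame_id, success):
        """Record one frame outcome; True means user input should be requested."""
        if success:
            self.failure_count = 0
            self.failed_frames.clear()
            return False
        self.failure_count += 1
        self.failed_frames.append(frame_id)
        return self.failure_count >= self.threshold

tracker = FailureTracker()
triggers = [tracker.record(f, success=False) for f in (20, 21, 22)]
# triggers == [False, False, True]: only frame 22 crosses the threshold,
# and tracker.failed_frames == [20, 21, 22] feeds the SSE payload above.
```

The reset-on-success rule is what makes the threshold "3 consecutive", not "3 total": a success between failures starts the count over.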

### Step 7: User Provides GPS Fix
- **Action**: POST /flights/{flightId}/user-fix
- **Payload**:

```json
{
  "frame_id": 22,
  "uav_pixel": [3126, 2084],
  "satellite_gps": {"lat": 48.273997, "lon": 37.379828},
  "confidence": "high"
}
```

- **Expected Result**:
  - HTTP 200 OK
  - Response: `{"status": "accepted", "frame_id": 22}`

### Step 8: System Incorporates User Fix
- **Action**: F11 processes user fix via `apply_user_anchor()`
- **Expected Result**:
  - F10 adds GPS anchor with high confidence (weight = 10.0)
  - Factor graph re-optimizes
  - SSE event: `user_fix_applied`
  - Event payload:

```json
{
  "type": "user_fix_applied",
  "frame_id": 22,
  "estimated_gps": {"lat": 48.273997, "lon": 37.379828},
  "affected_frames": [20, 21, 22]
}
```

### Step 9: Trajectory Refinement
- **Action**: Factor graph back-propagates fix to frames 20, 21
- **Expected Result**:
  - SSE event: `trajectory_refined` for frames 20, 21
  - All 3 failed frames now have GPS estimates
  - failure_count reset to 0

### Step 10: Processing Resumes Automatically
- **Action**: System processes AD000023 and beyond
- **Expected Result**:
  - Processing resumes without manual restart
  - AD000023+ processed normally (no more injected failures)
  - SSE events continue: `frame_processed`

### Step 11: Validate 20% Route Allowance
- **Action**: Calculate maximum allowed user inputs for 60-image flight
- **Expected Result**:
  - 20% of 60 = 12 images maximum can need user input
  - System tracks user_input_count per flight
  - If user_input_count > 12, system logs warning but continues
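
The allowance arithmetic in Step 11 is trivial but worth pinning down; the helper names below are illustrative, not the system's actual API.

```python
# Sketch of the 20%-of-route allowance check from Step 11.
import math

def user_input_allowance(total_frames, fraction=0.20):
    """Maximum number of frames that may rely on user input."""
    return math.floor(total_frames * fraction)

def over_allowance(user_input_count, total_frames):
    """Warning condition only; processing continues either way."""
    return user_input_count > user_input_allowance(total_frames)

allowance = user_input_allowance(60)   # 12 frames for a 60-image flight
```

Flooring (rather than rounding) keeps the allowance conservative for flight lengths that are not multiples of five.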

### Step 12: Test Multiple User Input Cycles
- **Action**: Inject failures for frames AD000040, AD000041, AD000042
- **Expected Result**:
  - Second `user_input_required` event triggered
  - User provides second fix
  - System continues processing
  - Total user inputs: 2 cycles (6 frames aided)

### Step 13: Test User Input Timeout
- **Action**: Trigger user input request, wait 5 minutes without response
- **Expected Result**:
  - System sends reminder: `user_input_reminder` at 2 minutes
  - Processing remains paused for affected chunk
  - Other chunks (if any) continue processing
  - No timeout crash

### Step 14: Test Invalid User Fix
- **Action**: Submit user fix with invalid GPS (outside geofence)
- **Payload**:

```json
{
  "frame_id": 22,
  "satellite_gps": {"lat": 0.0, "lon": 0.0}
}
```

- **Expected Result**:
  - HTTP 400 Bad Request
  - Error: "GPS coordinates outside flight geofence"
  - System re-requests user input
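
The geofence rejection in Step 14 amounts to a bounding-box check before the fix reaches the factor graph. The sketch below is illustrative: the fence coordinates and `validate_user_fix` helper are assumptions, not the deployed configuration or F01 endpoint code.

```python
# Sketch of the geofence validation behind the HTTP 400 in Step 14.
GEOFENCE = {"lat_min": 48.0, "lat_max": 48.5,
            "lon_min": 37.0, "lon_max": 37.8}   # illustrative flight area

def validate_user_fix(gps, fence=GEOFENCE):
    """Return (ok, error_message) for a user-supplied satellite GPS fix."""
    lat, lon = gps["lat"], gps["lon"]
    inside = (fence["lat_min"] <= lat <= fence["lat_max"]
              and fence["lon_min"] <= lon <= fence["lon_max"])
    if not inside:
        return False, "GPS coordinates outside flight geofence"
    return True, None

ok, err = validate_user_fix({"lat": 0.0, "lon": 0.0})   # rejected -> HTTP 400
```

Validating before applying the anchor matters because user fixes carry high confidence in the factor graph; an obviously out-of-area fix would otherwise drag the whole trajectory.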

### Step 15: Validate Final Flight Statistics
- **Action**: GET /flights/{flightId}/status
- **Expected Result**:

```json
{
  "flight_id": "<id>",
  "total_frames": 60,
  "processed_frames": 60,
  "user_input_requests": 2,
  "user_inputs_provided": 2,
  "frames_aided_by_user": 6,
  "user_input_percentage": 10.0
}
```

## Success Criteria

**Primary Criteria (AC-6)**:
- User input requested after exactly 3 consecutive failures (not 2, not 4)
- User notified via SSE with relevant context (candidate tiles, image URL)
- User fix accepted via REST API
- User fix incorporated as high-confidence GPS anchor
- Processing resumes automatically after fix
- System allows up to 20% of route to need user input

**Supporting Criteria**:
- SSE events delivered within 1 second
- Factor graph incorporates fix within 2 seconds
- Back-propagation refines earlier failed frames
- failure_count resets after successful fix
- System handles multiple user input cycles per flight

## Pass/Fail Criteria

**TEST PASSES IF**:
- User input request triggered at exactly 3 consecutive failures
- SSE event contains all required info (frame_id, candidate tiles)
- User fix accepted and incorporated
- Processing resumes automatically
- 20% allowance calculated correctly
- Multiple cycles work correctly
- Invalid fixes rejected gracefully

**TEST FAILS IF**:
- User input requested before 3 failures
- User input NOT requested after 3 failures
- SSE event missing required fields
- User fix causes system error
- Processing does not resume after fix
- System crashes on invalid user input
- Timeout causes system hang

## Error Scenarios

### Scenario A: User Provides Wrong GPS
- User fix GPS is 500m from actual location
- System accepts fix (user has authority)
- Subsequent frames may fail again
- Second user input cycle may be needed

### Scenario B: SSE Connection Lost
- Client disconnects during user input wait
- System buffers events
- Client reconnects, receives pending events
- Processing state preserved

### Scenario C: Database Failure During Fix
- User fix received but DB write fails
- System retries 3 times
- If all retries fail, returns HTTP 503
- User can retry submission

## Components Involved
- F01 Flight API: `POST /flights/{id}/user-fix`
- F02.1 Flight Lifecycle Manager: `handle_user_fix()`
- F02.2 Flight Processing Engine: `apply_user_fix()`
- F10 Factor Graph Optimizer: `add_absolute_factor()` with high confidence
- F11 Failure Recovery Coordinator: `create_user_input_request()`, `apply_user_anchor()`
- F15 SSE Event Streamer: `send_user_input_request()`, `send_user_fix_applied()`

## Notes
- AC-6 is the human-in-the-loop fallback for extreme failures
- 3-failure threshold balances automation with user intervention
- 20% allowance (12 of 60 images) is an operational constraint
- User fixes are trusted (high confidence weight in factor graph)
- System should minimize user inputs via the L1/L2/L3 layered defense
@@ -0,0 +1,171 @@

# Acceptance Test: Single Image Processing Performance (<5 seconds)

## Summary
Validate the AC-7 requirement that each individual image processes in less than 5 seconds on target hardware (NVIDIA RTX 2060/3070).

## Linked Acceptance Criteria
**AC-7**: Less than 5 seconds for processing one image.

## Preconditions
- ASTRAL-Next system deployed on target hardware
- Hardware: NVIDIA RTX 2060 (minimum) or RTX 3070 (recommended)
- TensorRT FP16 optimized models loaded
- System warmed up (first frame excluded from timing)
- No other GPU-intensive processes running

## Test Data
- **Test Images**: AD000001-AD000010 (representative sample), plus AD000032-033 for the hard case
- **Scenarios**:
  - Easy: Normal overlap, clear features (AD000001-002)
  - Medium: Agricultural texture (AD000005-006)
  - Hard: Minimal overlap (AD000032-033)

## Performance Breakdown Target
```
L1 SuperPoint+LightGlue: 50-150ms
L2 AnyLoc (keyframes only): 150-200ms
L3 LiteSAM (keyframes only): 60-100ms
Factor Graph Update: 50-100ms
Overhead (I/O, coordination): 50-100ms
-----------------------------------
Total (L1 only): <500ms
Total (L1+L2+L3): <700ms
Safety Margin: 4300ms
AC-7 Limit: 5000ms
```

## Test Steps

### Step 1: Measure Easy Scenario Performance
**Action**: Process AD000001 → AD000002 (normal overlap, clear features)
**Expected Result**:
```
Image Load: <50ms
L1 Processing: 80-120ms
Factor Graph: 30-50ms
Result Output: <20ms
---
Total: <240ms
Status: WELL_UNDER_LIMIT (4.8% of budget)
```

### Step 2: Measure Medium Scenario Performance
**Action**: Process AD000005 → AD000006 (agricultural texture)
**Expected Result**:
```
Image Load: <50ms
L1 Processing: 100-150ms (more features)
Factor Graph: 40-60ms
Result Output: <20ms
---
Total: <280ms
Status: UNDER_LIMIT (5.6% of budget)
```

### Step 3: Measure Hard Scenario Performance
**Action**: Process AD000032 → AD000033 (220.6m jump, minimal overlap)
**Expected Result**:
```
Image Load: <50ms
L1 Processing: 150-200ms (adaptive depth)
L1 Confidence: LOW → Triggers L2
L2 Processing: 150-200ms
L3 Refinement: 80-120ms
Factor Graph: 80-120ms (more complex)
Result Output: <30ms
---
Total: <720ms
Status: UNDER_LIMIT (14.4% of budget)
```

### Step 4: Measure Worst-Case Performance
**Action**: Process with all layers active + large factor graph
**Expected Result**:
```
Image Load: 80ms
L1 Processing: 200ms
L2 Processing: 200ms
L3 Refinement: 120ms
Factor Graph: 150ms (200+ nodes)
Result Output: 50ms
---
Total: ~800ms
Status: UNDER_LIMIT (16% of budget)
```

### Step 5: Statistical Performance Analysis
**Action**: Process 10 representative images, calculate statistics
**Expected Result**:
```
Mean processing time: 350ms
Median processing time: 280ms
90th percentile: 500ms
95th percentile: 650ms
99th percentile: 800ms
Max: <900ms
All: <5000ms (AC-7 requirement)
Status: PASS
```
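
A per-image timing harness for the statistics above could look like the following sketch; `process_image` is a stand-in for the real pipeline call, and the sleep only simulates work.

```python
# Illustrative latency-measurement harness for the AC-7 statistics.
import time
import statistics

def process_image(image_id):
    time.sleep(0.001)   # placeholder for L1/L2/L3 + factor graph work

def measure_latencies(image_ids):
    latencies_ms = []
    for image_id in image_ids:
        start = time.perf_counter()
        process_image(image_id)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    # method="inclusive" keeps quantile cut points within the observed range
    return {
        "mean": statistics.mean(latencies_ms),
        "p90": statistics.quantiles(latencies_ms, n=10, method="inclusive")[-1],
        "max": max(latencies_ms),
    }

stats = measure_latencies(range(10))
ac7_pass = stats["max"] < 5000.0   # AC-7: every single image under 5 s
```

Note the pass condition is on the maximum, not the mean: one slow image fails AC-7 even if the average is comfortable.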

### Step 6: Verify TensorRT Optimization Impact
**Action**: Compare TensorRT FP16 vs PyTorch FP32 performance
**Expected Result**:
```
PyTorch FP32 (baseline): 800-1200ms per image
TensorRT FP16 (optimized): 250-400ms per image
Speedup: 2.5-3.5x
Without TensorRT: AC-7 still met, but with ~3x less margin
With TensorRT: Comfortably passes AC-7
```

## Pass/Fail Criteria

**PASS if**:
- 100% of images process in <5000ms
- Mean processing time <1000ms (20% of budget)
- 99th percentile <2000ms (40% of budget)
- TensorRT FP16 optimization active and verified
- Performance consistent across easy/medium/hard scenarios

**FAIL if**:
- ANY image takes ≥5000ms
- Mean processing time >2000ms
- System cannot maintain <5s with TensorRT optimization
- Performance degrades over time (memory leak)

## Hardware Requirements Validation

### RTX 2060 (Minimum)
- VRAM: 6GB
- Expected performance: 90th percentile <1000ms
- Status: Meets AC-7 with optimization

### RTX 3070 (Recommended)
- VRAM: 8GB
- Expected performance: 90th percentile <700ms
- Status: Comfortably exceeds AC-7

## Performance Optimization Checklist
- TensorRT FP16 models compiled and loaded
- CUDA graphs enabled for inference
- Batch size = 1 (real-time constraint)
- Asynchronous GPU operations where possible
- Memory pre-allocated (no runtime allocation)
- Factor graph incremental updates (iSAM2)

## Monitoring and Profiling
- **NVIDIA Nsight**: GPU utilization >80% during processing
- **CPU Usage**: <50% (GPU-bound workload)
- **Memory**: Stable (no leaks over 100+ images)
- **Thermal**: GPU <85°C sustained

## Notes
- AC-7 specifies "processing one image", interpreted as latency per image
- 5-second budget is generous given the target ~500ms actual performance
- Margin allows for:
  - Older hardware (RTX 2060)
  - Complex scenarios (multiple layers active)
  - Factor graph growth over long flights
  - System overhead
- Real-time (<100ms) not required; <5s is the operational target

@@ -0,0 +1,187 @@

# Acceptance Test: Sustained Performance Throughput

## Summary
Validate that AC-7 performance (<5s per image) is maintained throughout long flight sequences without degradation from memory growth, thermal throttling, or resource exhaustion.

## Linked Acceptance Criteria
**AC-7**: Less than 5 seconds for processing one image (sustained over entire flight).

## Preconditions
- ASTRAL-Next system deployed on target hardware
- Hardware: NVIDIA RTX 3070 (or RTX 2060 minimum)
- Long flight test dataset available
- System monitoring tools active (GPU-Z, NVIDIA SMI, memory profiler)

## Test Data
- **Short Flight**: AD000001-AD000030 (30 images, ~3.6km flight)
- **Full Flight**: AD000001-AD000060 (60 images, ~7.2km flight)
- **Simulated Long Flight**: AD000001-060 repeated 10x (600 images, ~72km)
- **Maximum Scale**: 1500 images (target operational max from requirements)

## Test Steps

### Step 1: Baseline Short Flight Performance
**Action**: Process 30-image flight, measure per-image timing
**Expected Result**:
```
Images: 30
Mean time: <500ms per image
Total time: <15s
Memory growth: <100MB
GPU temp: <75°C
Status: BASELINE_ESTABLISHED
```

### Step 2: Full Dataset Performance
**Action**: Process 60-image flight, monitor for degradation
**Expected Result**:
```
Images: 60
Mean time: <550ms per image (slight increase acceptable)
Total time: <33s
Memory growth: <200MB (factor graph growth)
GPU temp: <78°C
No thermal throttling: True
Status: FULL_FLIGHT_STABLE
```

### Step 3: Extended Flight Simulation
**Action**: Process 600 images (10x repeat of dataset)
**Expected Result**:
```
Images: 600
Mean time per image: <600ms (monitoring degradation)
90th percentile: <800ms
99th percentile: <1200ms
Max: <2000ms
All images: <5000ms (AC-7 maintained)
Total time: <6 minutes
Status: EXTENDED_FLIGHT_STABLE
```

### Step 4: Memory Stability Analysis
**Action**: Monitor memory usage over extended flight
**Expected Result**:
```
Initial memory: 2-3GB (models loaded)
After 100 images: 3-4GB (factor graph)
After 300 images: 4-5GB (larger graph)
After 600 images: 5-6GB (stable, not growing unbounded)
Memory leaks: None detected
Factor graph pruning: Active (old nodes marginalized)
Status: MEMORY_STABLE
```

### Step 5: Thermal Management
**Action**: Monitor GPU temperature over extended processing
**Expected Result**:
```
Idle: 40-50°C
Initial processing: 65-70°C
After 15 minutes: 70-75°C
After 30 minutes: 72-78°C (stable)
Max temperature: <85°C (throttling threshold)
Thermal throttling events: 0
Fan speed: 60-80% (adequate cooling)
Status: THERMAL_STABLE
```

### Step 6: Performance Degradation Analysis
**Action**: Compare first 60 images vs last 60 images in 600-image flight
**Expected Result**:
```
First 60 images:
  Mean: 450ms
  90th percentile: 600ms

Last 60 images:
  Mean: 550ms (acceptable increase)
  90th percentile: 700ms

Degradation: ~22% (within the <30% limit)
Root cause: Larger factor graph (expected)
Mitigation: iSAM2 incremental updates working
Status: ACCEPTABLE_DEGRADATION
```
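
The Step 6 comparison reduces to a window-over-window mean calculation; the sketch below uses the expected figures from above as stand-in data (the threshold comes from the pass criteria, not from a system constant).

```python
# Sketch of the first-window vs last-window degradation check from Step 6.
def degradation_pct(first_window_ms, last_window_ms):
    """Percentage increase of the last window's mean over the first's."""
    first_mean = sum(first_window_ms) / len(first_window_ms)
    last_mean = sum(last_window_ms) / len(last_window_ms)
    return (last_mean - first_mean) / first_mean * 100.0

first60 = [450.0] * 60   # ms, stand-in for measured per-image latencies
last60 = [550.0] * 60

pct = degradation_pct(first60, last60)   # 450 -> 550 ms is ~22% slower
acceptable = pct < 30.0                  # pass threshold from the criteria
```

Comparing window means (rather than single frames) smooths out per-frame noise such as keyframe-triggered L2/L3 work.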

### Step 7: Maximum Scale Test (1500 images)
**Action**: Process or simulate 1500-image flight (operational maximum)
**Expected Result**:
```
Images: 1500
Processing time: <15 minutes total
Mean per image: <600ms
All images: <5000ms (AC-7 requirement met)
Memory: <8GB (within RTX 3070 limits)
System stability: No crashes, hangs, or errors
Status: MAX_SCALE_PASS
```

## Pass/Fail Criteria

**PASS if**:
- ALL images across all test scales process in <5000ms
- Performance degradation <30% over 600+ images
- No memory leaks detected (no unbounded growth)
- No thermal throttling occurs
- System remains stable through 1500+ image flights
- Mean processing time <1000ms across all scales

**FAIL if**:
- ANY image takes ≥5000ms at any point in flight
- Performance degradation >50% over extended flight
- Memory leaks detected (>10GB for 600 images)
- Thermal throttling reduces performance
- System crashes on long flights
- Mean processing time >2000ms

## Performance Optimization for Sustained Throughput

### Factor Graph Pruning
- Marginalize old nodes after 100 frames
- Keep only last 50 frames in active optimization
- Reduces per-update complexity from O(n²) to O(1) (fixed-size window)
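
The pruning policy can be modeled as a sliding window; this is a toy sketch of which frames stay active, not the actual GTSAM/iSAM2 marginalization (`ChunkGraph` and the window size handling are illustrative).

```python
# Toy model of sliding-window pruning: once the active set exceeds the
# window, the oldest frames are moved to the marginalized set, so per-update
# cost stays bounded regardless of flight length.
from collections import deque

ACTIVE_WINDOW = 50   # frames kept in active optimization

class ChunkGraph:
    def __init__(self):
        self.active = deque()
        self.marginalized = []

    def add_frame(self, frame_id):
        self.active.append(frame_id)
        while len(self.active) > ACTIVE_WINDOW:
            self.marginalized.append(self.active.popleft())

g = ChunkGraph()
for frame in range(1, 151):
    g.add_frame(frame)
# After 150 frames: frames 101-150 are active, frames 1-100 marginalized.
```

In the real system, marginalization folds the removed nodes' information into a prior rather than discarding it, which is why accuracy does not drop when old frames leave the window.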

### Incremental Optimization (iSAM2)
- Avoid full graph re-optimization each frame
- Update only affected nodes
- 10-20x speedup vs full batch optimization

### Memory Management
- Pre-allocate buffers for images and features
- Release satellite tile cache for old regions
- GPU memory defragmentation every 100 frames

### Thermal Management
- Monitor GPU temperature continuously
- Reduce batch size if approaching thermal limits
- Optional: reduce processing rate to maintain <80°C

## Monitoring Dashboard Metrics
```
Current FPS: 2-3 images/second
Current Latency: 350-500ms per image
Images Processed: 450/1500
Estimated Completion: 6 minutes
GPU Utilization: 85%
GPU Memory: 5.2GB / 8GB
GPU Temperature: 74°C
CPU Usage: 40%
System Memory: 12GB / 32GB
Status: HEALTHY
```

## Failure Modes Tested
- **Memory Leak**: Unbounded factor graph growth
- **Thermal Throttling**: GPU overheating reduces performance
- **Cache Thrashing**: Satellite tile cache inefficiency
- **Optimization Slowdown**: Factor graph becomes too large
- **Resource Exhaustion**: Running out of GPU memory

## Notes
- Sustained performance is more demanding than single-image performance
- Real operational flights: 500-1500 images (requirement specification)
- 5-second budget per image allows comfortable sustained operation
- System architecture designed for long-term stability
- iSAM2 incremental optimization is critical for scalability

@@ -0,0 +1,36 @@

# Acceptance Test: AC-8 - Real-Time Results + Async Refinement

## Summary
Validate Acceptance Criterion 8: "Results of image processing should appear immediately to the user, so that the user shouldn't wait for the whole route to complete in order to analyze first results. Also, the system can refine previously calculated results and send the refined results to the user again."

## Linked Acceptance Criteria
**AC-8**: Real-time streaming + async refinement

## Test Steps

### Step 1: Immediate Results (Part 1 of AC-8)
- **Action**: Upload AD000001, connect SSE client
- **Expected Result**: "image_processed" event received within 1 second of processing completing

### Step 2: Progressive Availability (Part 1)
- **Action**: Upload 60 images, monitor SSE stream
- **Expected Result**: First result available within 10 seconds; no need to wait for all 60

### Step 3: Refinement Notification (Part 2 of AC-8)
- **Action**: Image 20 adds GPS anchor, factor graph refines images 1-19
- **Expected Result**: "trajectory_refined" event sent within 2 seconds

### Step 4: Refined Results Delivered
- **Action**: Client receives refinement event, fetches updated results
- **Expected Result**: Refined GPS coordinates available, version incremented
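
A client consuming this stream only needs a small versioned store; the sketch below mirrors the two event types above but is otherwise an assumption (field names, and the use of integer frame keys where a real JSON payload would deliver strings, are for brevity).

```python
# Sketch of a client-side result store for the AC-8 flow: initial results are
# usable immediately; refinement events update entries and bump their version.
class ResultStore:
    def __init__(self):
        self.results = {}   # frame_id -> {"gps": ..., "version": int}

    def on_event(self, event):
        if event["type"] == "image_processed":
            self.results[event["frame_id"]] = {"gps": event["gps"], "version": 1}
        elif event["type"] == "trajectory_refined":
            for frame_id, gps in event["refined"].items():
                entry = self.results[frame_id]
                entry["gps"] = gps
                entry["version"] += 1

store = ResultStore()
store.on_event({"type": "image_processed", "frame_id": 1,
                "gps": {"lat": 48.275, "lon": 37.385}})
store.on_event({"type": "trajectory_refined",
                "refined": {1: {"lat": 48.2753, "lon": 37.3852}}})
# Frame 1 now holds the refined GPS at version 2.
```

The version counter is what lets a UI distinguish "first estimate" from "refined estimate" without re-fetching the whole trajectory.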

## Success Criteria
- Initial results latency < 1 second
- First result available before flight complete
- Refinements notified immediately
- Both initial and refined results accessible

## Pass/Fail Criteria
**Passes If**: AC-8 both parts met (immediate + refinement)
**Fails If**: Results delayed > 2s, or refinements not sent

@@ -0,0 +1,207 @@

# Acceptance Test: Asynchronous Trajectory Refinement

## Summary
Validate the AC-8 requirement that the system refines previously computed results asynchronously in the background without blocking real-time result delivery.

## Linked Acceptance Criteria
**AC-8**: Results of image processing should appear immediately to the user, so that the user shouldn't wait for the whole route to complete in order to analyze first results. Also, the system can refine previously calculated results and send the refined results to the user again.

## Preconditions
- ASTRAL-Next system operational
- Multi-threaded/process architecture active
- SSE event streaming enabled
- Factor graph configured for both incremental and batch optimization
- Test client subscribed to result stream

## Test Data
- **Dataset**: AD000001-AD000060 (60 images)
- **Focus**: Timing of initial results vs refined results
- **Expected Behavior**: Initial results immediate, refinements sent later

## Result Types
1. **Initial Result**: L1 sequential tracking estimate (fast, may drift)
2. **Intermediate Result**: L1 + periodic L2/L3 anchors (improved)
3. **Refined Result**: Full factor graph optimization (best accuracy)
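
The non-blocking split between the main loop and the refiner can be sketched with a worker thread and a queue. All names here are illustrative; in the real system the worker would run factor graph (F10) optimization rather than the placeholder below.

```python
# Sketch of the async-refinement pattern: the main loop emits initial results
# immediately while a background worker re-optimizes accumulated frames.
import threading
import queue

initial_results = []
refinements = []
work = queue.Queue()

def refiner():
    while True:
        batch = work.get()
        if batch is None:        # sentinel: no more refinement work
            break
        refinements.append(batch)   # stands in for F10 optimization + SSE event

worker = threading.Thread(target=refiner)
worker.start()

for frame_id in range(1, 31):
    initial_results.append(frame_id)             # delivered to the user at once
    if frame_id % 10 == 0:
        work.put(list(range(1, frame_id + 1)))   # refine frames 1..N asynchronously

work.put(None)
worker.join()
# 30 initial results streamed; 3 refinement batches processed in the background.
```

Because the main loop only enqueues work, initial-result latency is unaffected by how long each refinement cycle takes, which is the behavior Steps 1-3 below verify.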

## Test Steps

### Step 1: Start Flight Processing and Monitor Initial Results
**Action**: Begin processing, subscribe to SSE result stream
**Expected Result**:
```
Frame 1 processed: Initial result delivered <500ms
Frame 2 processed: Initial result delivered <500ms
Frame 3 processed: Initial result delivered <500ms
...
Observation: User receives results immediately
Refinement: None yet (not enough data for global optimization)
Status: INITIAL_RESULTS_STREAMING
```

### Step 2: Detect First Refinement Event
**Action**: Continue processing, monitor for refinement of early frames
**Expected Result**:
```
Frame 10 processed: Initial result delivered <500ms
Background: Factor graph optimization triggered (10 frames accumulated)
Refinement Event: Frames 1-10 refined results published
Timing: Refinement ~2-3 seconds after Frame 10 initial result
Change: Frames 1-10 positions adjusted by 5-20m (drift correction)
Status: FIRST_REFINEMENT_SENT
```

### Step 3: Verify Real-Time Processing Not Blocked
**Action**: Measure frame processing timing during background refinement
**Expected Result**:
```
Frame 11 processing: <500ms (not blocked by refinement)
Frame 12 processing: <500ms (not blocked by refinement)
Frame 13 processing: <500ms (not blocked by refinement)
Background refinement: Running in parallel
CPU usage: Main thread 40%, background thread 30%
GPU usage: Shared between inference and optimization
Status: PARALLEL_PROCESSING_VERIFIED
```

### Step 4: Monitor Multiple Refinement Cycles
**Action**: Process full flight, count refinement events
**Expected Result**:
```
Total frames: 60
Initial results: 60 (all delivered immediately)
Refinement cycles: 6 (every ~10 frames)
Refinement triggers:
  - Cycle 1: Frame 10 (refines 1-10)
  - Cycle 2: Frame 20 (refines 1-20)
  - Cycle 3: Frame 30 (refines 1-30)
  - Cycle 4: Frame 40 (refines 1-40)
  - Cycle 5: Frame 50 (refines 1-50)
  - Cycle 6: Frame 60 (refines 1-60, final)
Status: MULTIPLE_REFINEMENTS_OBSERVED
```

### Step 5: Analyze Refinement Impact
**Action**: Compare initial vs refined results for accuracy improvement
**Expected Result**:
```
Frame 10 (example):
  Initial result: 48.27532, 37.38522 (estimate)
  Refined result: 48.27529, 37.38520 (improved)
  Ground truth: 48.27529, 37.38522
  Initial error: 12m
  Refined error: 3m
  Improvement: 75% error reduction
```
|
||||
Mean improvement across all frames: 40-60% error reduction
|
||||
Status: REFINEMENT_BENEFICIAL
|
||||
```
|
||||
|
||||
### Step 6: Test User Experience Flow
|
||||
**Action**: Simulate user viewing results in real-time
|
||||
**Expected Result**:
|
||||
```
|
||||
T=0s: User starts flight processing
|
||||
T=0.5s: Frame 1 result appears (can start analysis immediately)
|
||||
T=1s: Frame 2 result appears
|
||||
T=2s: Frame 4 result appears
|
||||
T=5s: Frame 10 result appears
|
||||
T=7s: Frames 1-10 refined (map updates with improved positions)
|
||||
T=10s: Frame 20 result appears
|
||||
T=12s: Frames 1-20 refined
|
||||
...
|
||||
User experience: Immediate feedback, incremental improvement
|
||||
Blocked waiting: 0 seconds (AC-8 requirement met)
|
||||
Status: UX_OPTIMAL
|
||||
```
|
||||
|
||||
### Step 7: Validate Refinement Message Format
|
||||
**Action**: Inspect SSE messages for initial vs refined results
|
||||
**Expected Result**:
|
||||
```json
|
||||
// Initial result
|
||||
{
|
||||
"type": "initial_result",
|
||||
"frame_id": "AD000010.jpg",
|
||||
"timestamp": 1234567890.5,
|
||||
"gps": [48.27532, 37.38522],
|
||||
"confidence": 0.85,
|
||||
"status": "TRACKED"
|
||||
}
|
||||
|
||||
// Refined result (same frame, later)
|
||||
{
|
||||
"type": "refined_result",
|
||||
"frame_id": "AD000010.jpg",
|
||||
"timestamp": 1234567893.2,
|
||||
"gps": [48.27529, 37.38520],
|
||||
"confidence": 0.95,
|
||||
"refinement_cycle": 1,
|
||||
"improvement_m": 9.2,
|
||||
"status": "REFINED"
|
||||
}
|
||||
```
|
||||
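
On the wire, both message types share one SSE stream and differ only in their `event:` name, which is what lets a client register a separate listener per type. A minimal sketch of how the server side might frame these payloads (the `format_sse` helper is an assumption for illustration, not the system's actual code):

```python
import json

def format_sse(event: str, data: dict) -> str:
    """Frame a payload as a Server-Sent Events message: an `event:` line,
    a `data:` line with the JSON payload, and a blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

initial = format_sse("initial_result", {
    "type": "initial_result",
    "frame_id": "AD000010.jpg",
    "gps": [48.27532, 37.38522],
    "confidence": 0.85,
    "status": "TRACKED",
})

refined = format_sse("refined_result", {
    "type": "refined_result",
    "frame_id": "AD000010.jpg",
    "gps": [48.27529, 37.38520],
    "confidence": 0.95,
    "refinement_cycle": 1,
    "status": "REFINED",
})

# Both messages carry the same frame_id, so the client can update the
# corresponding marker in place when the refinement arrives.
```
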

## Pass/Fail Criteria

**PASS if**:
- ALL initial results delivered in <500ms (real-time requirement)
- Refinement occurs in background without blocking new frame processing
- ≥3 refinement cycles observed in 60-frame flight
- Refinement improves accuracy by ≥30% on average
- User receives both initial and refined results via SSE
- No race conditions or result ordering issues

**FAIL if**:
- Initial results delayed >1000ms waiting for refinement
- Refinement blocks real-time processing (frames queue up)
- No refinement occurs (system only outputs initial results)
- Refinement degrades accuracy (worse than initial)
- SSE messages dropped or out-of-order
- System crashes during concurrent processing

## Architecture Requirements

### Threading Model
- **Main Thread**: Image ingest, L1/L2/L3 inference, initial results
- **Background Thread**: Factor graph batch optimization, refinement
- **Communication**: Thread-safe queue for initial results, SSE for output
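
A minimal sketch of this threading model under stated assumptions (the function and queue names and the 10-frame trigger constant are illustrative; the real loop lives in the Flight Processing Engine, F02.2):

```python
import queue
import threading

result_queue: "queue.Queue[dict]" = queue.Queue()  # initial results -> SSE layer
REFINE_EVERY_N_FRAMES = 10  # frame-based trigger (assumed value)

def refine_batch(frames: list) -> None:
    """Placeholder for the batch factor-graph optimization; in the real
    system this would re-optimize all poses accumulated so far and
    publish refined_result events."""
    ...

def process_flight(frames: list) -> None:
    """Main-thread loop: every frame yields an initial result immediately;
    refinement is handed to a background thread and never blocks."""
    history = []
    for frame in frames:
        initial = {"frame_id": frame, "status": "TRACKED"}  # L1 estimate
        result_queue.put(initial)  # delivered immediately, never waits
        history.append(frame)
        if len(history) % REFINE_EVERY_N_FRAMES == 0:
            # hand a snapshot of the batch to a background thread;
            # the main loop continues with the next frame right away
            threading.Thread(target=refine_batch,
                             args=(list(history),), daemon=True).start()

# demo run: 20 synthetic frames -> 20 immediate initial results
process_flight([f"AD{i:06d}" for i in range(1, 21)])
delivered = result_queue.qsize()
```
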

### Refinement Triggers
- **Time-based**: Every 5 seconds
- **Frame-based**: Every 10 frames
- **Anchor-based**: When new L3 global anchor added
- **Final**: When flight completes

### Optimization Strategy
- **Initial**: L1 only (sequential VO, fast but drifts)
- **Incremental**: iSAM2 updates every frame (moderate accuracy)
- **Batch**: Full graph optimization during refinement (best accuracy)

## Performance Considerations
- Refinement CPU-bound (factor graph optimization)
- Inference GPU-bound (neural networks)
- Parallel execution achieves near-zero blocking
- Refinement overhead <20% of real-time processing budget

## User Interface Integration

```javascript
// Client receives two types of events
eventSource.addEventListener('initial_result', (e) => {
  const result = JSON.parse(e.data);
  map.addMarker(result.frame_id, result.gps, 'initial');
});

eventSource.addEventListener('refined_result', (e) => {
  const result = JSON.parse(e.data);
  map.updateMarker(result.frame_id, result.gps, 'refined');
  showImprovement(result.improvement_m);
});
```

## Notes
- AC-8 enables real-time mission monitoring while maintaining high accuracy
- Decoupled architecture critical for concurrent processing
- Refinement is optional enhancement, not required for basic operation
- Users benefit from both immediate feedback and eventual accuracy improvement
- Similar to Google Maps "blue dot" moving immediately, then snapping to road
@@ -0,0 +1,170 @@

# Acceptance Test: Image Registration Rate >95% - Baseline

## Summary
Validate AC-9 requirement that ≥95% of images successfully register (find enough matching features for pose estimation) under normal flight conditions.

## Linked Acceptance Criteria
**AC-9**: Image Registration Rate > 95%. The system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory.

## Preconditions
- ASTRAL-Next system operational
- Normal flight conditions (no extreme weather, lighting, or terrain)
- Multi-layer architecture (L1, L2, L3) active
- Registration success criteria defined

## Registration Success Definition
An image is **successfully registered** if:
1. L1 sequential tracking succeeds (≥50 inlier matches), OR
2. L2 global retrieval succeeds (correct satellite tile, confidence >0.6), OR
3. L3 metric refinement succeeds (cross-view match confidence >0.5)

An image **fails registration** only if ALL three layers fail.
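
The definition above reduces to a simple OR over the three layers. A sketch of it as a predicate (thresholds are the ones stated above; the function name and argument shapes are illustrative, and the L2 correct-tile check is abstracted into its confidence):

```python
from typing import Optional

L1_MIN_INLIERS = 50      # L1 sequential tracking threshold
L2_MIN_CONFIDENCE = 0.6  # L2 global retrieval threshold
L3_MIN_CONFIDENCE = 0.5  # L3 cross-view match threshold

def is_registered(l1_inliers: int,
                  l2_confidence: Optional[float],
                  l3_confidence: Optional[float]) -> bool:
    """An image registers if ANY layer succeeds; it fails only when all
    three fail. None means the layer was not attempted for this frame."""
    if l1_inliers >= L1_MIN_INLIERS:
        return True
    if l2_confidence is not None and l2_confidence > L2_MIN_CONFIDENCE:
        return True
    if l3_confidence is not None and l3_confidence > L3_MIN_CONFIDENCE:
        return True
    return False
```
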

## Test Data
- **Primary Dataset**: AD000001-AD000030 (baseline segment, normal conditions)
- **Secondary Dataset**: AD000001-AD000060 (full flight, includes challenges)
- **Expected**: ≥95% registration rate for both datasets

## Test Steps

### Step 1: Process Baseline Segment (Normal Conditions)

**Action**: Process AD000001-AD000030, track registration outcomes

**Expected Result**:

```
Total images: 30
L1 successful: 28 (93.3%)
L2 successful (L1 failed): 2 (6.7%)
L3 successful (L1+L2 failed): 0 (0%)
Total registered: 30 (100%)
Registration rate: 100% > 95% (AC-9 PASS)
Status: BASELINE_EXCELLENT
```

### Step 2: Analyze L1 Registration Performance

**Action**: Examine L1 sequential tracking success rate

**Expected Result**:

```
L1 attempts: 29 (frame-to-frame pairs)
L1 successes: 28
L1 failures: 1
L1 success rate: 96.6%
Failure case: AD000003→004 (202.2m jump, expected)
L1 failure handling: L2 activated successfully
Status: L1_BASELINE_PASS
```

### Step 3: Analyze L2 Fallback Performance

**Action**: Examine L2 global retrieval when L1 fails

**Expected Result**:

```
L2 activations: 2
L2 successes: 2
L2 failures: 0
L2 success rate: 100%
Cases: AD000001 (initial global fix), AD000004 (after 202m jump)
L2 contribution: Prevented 2 registration failures
Status: L2_FALLBACK_WORKING
```

### Step 4: Process Full Flight with Challenges

**Action**: Process AD000001-AD000060, including sharp turns and outliers

**Expected Result**:

```
Total images: 60
L1 successful: 53 (88.3%)
L2 successful (L1 failed): 6 (10%)
L3 successful (L1+L2 failed): 0 (0%)
Registration failures: 1 (1.7%)
Registration rate: 98.3% > 95% (AC-9 PASS)
Status: FULL_FLIGHT_PASS
```

### Step 5: Identify and Analyze Registration Failures

**Action**: Investigate any frames that failed all three registration layers

**Expected Result**:

```
Failed frame: AD000048 (hypothetical example)
L1 failure: 268.6m jump from AD000047, overlap ~0%
L2 failure: Satellite data outdated, wrong tile retrieved
L3 failure: Cross-view match confidence 0.3 (below 0.5 threshold)
Outcome: User intervention requested (AC-6 human-in-the-loop)
Acceptable: <5% failure rate allows for extremely difficult cases
Status: FAILURE_WITHIN_TOLERANCE
```

### Step 6: Calculate Final Registration Statistics

**Action**: Compute comprehensive registration metrics

**Expected Result**:

```
Dataset: AD000001-060 (60 images)
Successfully registered: 59 images
Registration failures: 1 image
Registration rate: 98.3%

AC-9 Requirement: >95%
Actual Performance: 98.3%
Margin: +3.3 percentage points
Status: AC-9 PASS with margin
```

## Pass/Fail Criteria

**PASS if**:
- Registration rate ≥95% on baseline dataset (AD000001-030)
- Registration rate ≥95% on full flight dataset (AD000001-060)
- Multi-layer fallback working (L2 activates when L1 fails)
- Registration failures <5% (allowing for extremely difficult frames)

**FAIL if**:
- Registration rate <95% on either dataset
- L2 fallback not activating when L1 fails
- Registration failures >5%
- System crashes on unregistered frames (should handle gracefully)

## Layer Contribution Analysis

### Layer 1 (Sequential Tracking)
- **Role**: Primary registration method (fastest)
- **Success Rate**: 85-95% (normal overlap conditions)
- **Failure Cases**: Sharp turns, low overlap

### Layer 2 (Global Retrieval)
- **Role**: Fallback for L1 failures (slower but robust)
- **Success Rate**: 85-95% when activated
- **Failure Cases**: Stale satellite data, ambiguous locations

### Layer 3 (Metric Refinement)
- **Role**: Precision improvement, not primary registration
- **Success Rate**: 80-90% when attempted
- **Failure Cases**: Large view angle difference, seasonal mismatch

### Multi-Layer Defense
```
Registration Success = L1 OR L2 OR L3
P(success) = 1 - P(all fail)
P(success) = 1 - (0.10 × 0.10 × 0.15)
P(success) = 1 - 0.0015
P(success) = 99.85% > 95% (AC-9)
```

## Registration Failure Handling
When registration fails for a frame:
1. System flags frame as UNREGISTERED
2. Continues processing subsequent frames
3. Attempts to re-register after later frames provide context
4. If still fails after 3 attempts, requests user input (AC-6)
5. Does not crash or halt processing
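
The handling policy above can be sketched as a small retry loop (a minimal sketch under assumptions: the function name, status strings, and the injected `try_register` callable are illustrative, not the system's API):

```python
MAX_REATTEMPTS = 3  # re-registration attempts before escalating, per step 4

def handle_frame(frame_id, try_register):
    """Sketch of the failure-handling policy: attempt registration,
    re-attempt as later frames provide context, then escalate to the
    user (AC-6). Never raises, so processing is never halted.
    `try_register(frame_id, attempt)` returns True on success."""
    for attempt in range(MAX_REATTEMPTS + 1):
        if try_register(frame_id, attempt):
            return "REGISTERED" if attempt == 0 else "REGISTERED_LATE"
    return "USER_INPUT_REQUESTED"  # step 4: human-in-the-loop
```
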

## Quality Metrics Beyond Registration Rate
- **Mean inlier count**: >100 matches (L1 successful cases)
- **Mean confidence**: >0.8 (registered frames)
- **Pose covariance**: <50m uncertainty (registered frames)
- **Trajectory continuity**: No gaps >3 frames

## Notes
- AC-9 threshold of 95% balances operational utility with real-world challenges
- 5% failure allowance (~3 frames per 60) accommodates extreme cases
- Multi-layer architecture critical for achieving high registration rate
- "Atlas" multi-map approach counts disconnected fragments as registered
- Registration rate (AC-9) is evaluated independently of the positioning accuracy requirements (AC-1/AC-2)
@@ -0,0 +1,213 @@

# Acceptance Test: Image Registration Rate >95% - Challenging Conditions

## Summary
Validate AC-9 requirement (≥95% registration rate) under challenging conditions including multiple sharp turns, outliers, repetitive textures, and degraded satellite data.

## Linked Acceptance Criteria
**AC-9**: Image Registration Rate > 95%. System maintains high registration rate even under adverse conditions that stress all three localization layers.

## Preconditions
- ASTRAL-Next system operational
- Multi-layer architecture robust to individual layer failures
- Challenging test scenarios prepared
- Registration fallback mechanisms active

## Challenging Conditions Tested
1. **Multiple sharp turns** (5 turns >200m in 60 images)
2. **Large outlier** (268.6m jump)
3. **Repetitive agricultural texture** (aliasing risk)
4. **Degraded satellite data** (simulated staleness)
5. **Seasonal mismatch** (summer satellite, autumn flight)
6. **Clustered failures** (consecutive difficult frames)

## Test Data
- **Full Flight**: AD000001-AD000060 (contains all 5 sharp turns + outlier)
- **Stress Test**: AD000042-AD000048 (clustered challenges)
- **Expected**: ≥95% registration despite challenges

## Test Steps

### Step 1: Multi-Sharp-Turn Scenario

**Action**: Process flight segment with 5 sharp turns (>200m jumps)

**Expected Result**:

```
Sharp turn frames: 5
- AD000003→004 (202.2m)
- AD000032→033 (220.6m)
- AD000042→043 (234.2m)
- AD000044→045 (230.2m)
- AD000047→048 (268.6m)

L1 failures at turns: 5 (expected)
L2 activations: 5
L2 successes: 4 (80%)
L2 failures: 1 (AD000048, largest jump)
L3 attempted on L2 failure: 1
L3 success: 0 (cross-view difficult)

Registration success: 4/5 sharp turn frames (80%)
Overall impact on AC-9: 1 failure out of 60 frames (1.7%)
Status: SHARP_TURNS_MOSTLY_HANDLED
```

### Step 2: Clustered Difficulty Scenario

**Action**: Process AD000042-048 (2 sharp turns + outlier in 7 frames)

**Expected Result**:

```
Total frames: 7
Normal frames: 4 (042, 044, 046, 047)
Challenging frames: 3 (043, 045, 048: the frames immediately after each jump)

L1 successes: 3/6 frame pairs (50%, expected low)
L2 activations: 3
L2 successes: 2
Combined registration: 5/7 (71%)

Observation: Clustered challenges stress system
Mitigation: Multi-layer fallback prevents catastrophic failure
Status: CLUSTERED_CHALLENGES_SURVIVED
```

### Step 3: Repetitive Texture Stress Test

**Action**: Process agricultural field segment (AD000015-025)

**Expected Result**:

```
Frames: 11
Texture: Highly repetitive crop rows
Traditional SIFT/ORB: Would fail (>50% outliers)
SuperPoint+LightGlue: Succeeds (semantic features)

L1 successes: 10/10 frame pairs (100%)
SuperPoint feature quality: High (field boundaries prioritized)
LightGlue outlier rejection: Effective (dustbin mechanism)
Registration rate: 100%
Status: REPETITIVE_TEXTURE_HANDLED
```

### Step 4: Degraded Satellite Data Simulation

**Action**: Simulate stale satellite data (2-3 years old, terrain changes)

**Expected Result**:

```
Scenario: 20% of satellite tiles outdated
L2 retrieval attempts: 10
L2 correct tile (outdated): 8
L2 wrong tile: 2

L3 refinement on outdated tiles:
- DINOv2 semantic features: Robust to changes
- Structural matching: 6/8 succeed (75%)

Combined L2+L3 success: 6/10 (60%)
Impact on overall registration: Moderate
Fallback to L1 trajectory: Maintains continuity
Overall registration rate: >95% maintained
Status: DEGRADED_DATA_TOLERATED
```

### Step 5: Seasonal Mismatch Test

**Action**: Process with summer satellite tiles, autumn UAV imagery

**Expected Result**:

```
Visual differences: Vegetation color, field state
Traditional methods: Significant accuracy loss
AnyLoc (DINOv2): Semantic invariance active

L2 retrieval (color-invariant): 85% success
L3 cross-view matching: 70% success (view angle + season)
Registration maintained: Yes (structure-based features)
Status: SEASONAL_ROBUSTNESS_VERIFIED
```

### Step 6: Calculate Challenging Conditions Registration Rate

**Action**: Process full 60-image flight with all challenges, calculate final rate

**Expected Result**:

```
Total images: 60
Challenging frames: 15 (25% of flight)
- Sharp turns: 5
- Outlier: 1
- Repetitive texture: 11 (overlapping with others)

L1 success rate: 86.4% (51/59 pairs)
L2 success rate (when L1 fails): 75% (6/8)
L3 success rate (when L1+L2 fail): 50% (1/2)

Total registered: 58/60
Registration failures: 2
Registration rate: 96.7%

AC-9 Requirement: >95%
Actual (challenging): 96.7%
Status: AC-9 PASS under stress
```

## Pass/Fail Criteria

**PASS if**:
- Registration rate ≥95% despite multiple challenges
- System demonstrates graceful degradation (challenges reduce but don't eliminate registration)
- Multi-layer fallback working across all challenge types
- No catastrophic failures (system crashes, infinite loops)
- Clustered challenges cause <3 consecutive failures

**FAIL if**:
- Registration rate <95% under challenging conditions
- Single challenge type causes >10% failure rate
- Multi-layer fallback not activating appropriately
- Catastrophic failure on any challenge type
- Clustered failures >5 consecutive frames

## Resilience Analysis

### Without Multi-Layer Architecture
```
L1 only (sequential tracking):
Sharp turns: 100% failure (0% overlap)
Expected registration: 55/60 (91.7%)
Result: FAILS AC-9
```

### With Multi-Layer Architecture
```
L1 + L2 + L3 (proposed ASTRAL-Next):
L1 handles: 86.4% of cases
L2 recovers: 10.2% of cases (when L1 fails)
L3 refines: 1.7% of cases (when L1+L2 fail)
Expected registration: 58/60 (96.7%)
Result: PASSES AC-9
```

### Robustness Multiplier
```
Multi-layer provides ~5% improvement in registration rate
This 5% is critical for meeting AC-9 threshold
Justifies architectural complexity
```

## Failure Mode Analysis

### Acceptable Failures (Within 5% Budget)
- Extreme outliers (>300m, view completely different)
- Satellite data completely missing (coverage gap)
- UAV imagery corrupted (motion blur, exposure)
- Location highly ambiguous (identical fields for km)

### Unacceptable Failures (System Defects)
- Crashes on difficult frames
- L2 not activating when L1 fails
- Infinite loops in matching algorithms
- Memory exhaustion on challenging scenarios

## Recovery Mechanisms Tested
1. **L1→L2 Fallback**: Automatic when match count <50
2. **L2→L3 Refinement**: Triggered on low retrieval confidence
3. **Multi-Map (Atlas)**: New map started if all layers fail
4. **User Input (AC-6)**: Requested after 3 consecutive failures
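
The first three mechanisms above form a cascade that never halts the flight. A minimal sketch under assumptions (layer implementations are injected callables returning a pose or `None`; the function name and return shape are illustrative):

```python
def localize(frame, l1, l2, l3, atlas_new_map):
    """Try L1, then L2, then L3 in order; return (layer_name, pose) for
    the first layer that succeeds. If every layer fails, start a new
    Atlas map fragment (mechanism 3) instead of halting processing."""
    for layer_name, layer in (("L1", l1), ("L2", l2), ("L3", l3)):
        pose = layer(frame)
        if pose is not None:
            return layer_name, pose
    atlas_new_map(frame)  # disconnected fragment; user input may follow (AC-6)
    return "ATLAS", None
```
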

## Notes
- Challenging conditions test validates real-world operational robustness
- 96.7% rate with challenges provides confidence in production deployment
- Multi-layer architecture justification demonstrated empirically
- 5% failure budget accommodates genuinely impossible registration cases
- System designed for graceful degradation, not brittle all-or-nothing behavior
@@ -0,0 +1,256 @@

# Acceptance Test: AC-10 - Mean Reprojection Error < 1.0 pixels

## Summary
Validate Acceptance Criterion 10: "Mean Reprojection Error (MRE) < 1.0 pixels. The distance, in pixels, between the original pixel location of the object and the re-projected pixel location."

## Linked Acceptance Criteria
**AC-10**: MRE < 1.0 pixels

## Preconditions
1. ASTRAL-Next system operational
2. F07 Sequential Visual Odometry extracting and matching features
3. F10 Factor Graph Optimizer computing optimized poses
4. Camera intrinsics calibrated (from F17 Configuration Manager)
5. Test dataset with ground truth poses (for reference)
6. Reprojection error calculation implemented

## Reprojection Error Definition

**Formula**:
```
For each matched feature point p_i in image I_j:
  1. Triangulate 3D point X_i from matches across images
  2. Project X_i back to image I_j using optimized pose T_j and camera intrinsics K
  3. p'_i = project(K, T_j, X_i)   (projected pixel location)
  4. e_i = ||p_i - p'_i||          (Euclidean distance in pixels)

MRE = (1/N) * Σ e_i   (mean across all features in all images)
```
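
The formula translates directly into code. The sketch below uses a plain pinhole model without distortion; the intrinsics, pose, and synthetic observations are assumed example values chosen so the expected errors are known, not the system's actual calibration:

```python
import math

def project(K, R, t, X):
    """Project one 3D point X=(x,y,z) into pixels using pose (R, t) and
    intrinsics K (3x3 nested lists), pinhole model without distortion."""
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return (u, v)

def mean_reprojection_error(observations, K, R, t):
    """MRE: mean Euclidean pixel distance between observed and reprojected
    points. `observations` is a list of (observed_pixel, point_3d) pairs."""
    errors = [math.dist(p_obs, project(K, R, t, X)) for p_obs, X in observations]
    return sum(errors) / len(errors)

# Synthetic example (assumed values): identity pose, simple intrinsics.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_zero = [0.0, 0.0, 0.0]
# A point at (0, 0, 10) projects to the principal point (320, 240); the
# observations below add known 0.6 px and 0.8 px errors, so MRE = 0.7 px.
mre = mean_reprojection_error(
    [((320.6, 240.0), (0.0, 0.0, 10.0)),
     ((360.0, 220.8), (1.0, -0.5, 20.0))],
    K, R_identity, t_zero)
```

Note that a plain mean is sensitive to outlier matches, which is why the optimization itself uses robust kernels (see Step 11 and H03 below).
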

## Test Data
- **Dataset**: AD000001-AD000030 (30 images, baseline)
- **Expected Features**: ~500-2000 matched features per image pair
- **Total Measurements**: ~15,000-60,000 reprojection measurements

## Test Steps

### Step 1: Process Flight Through Complete Pipeline
- **Action**: Process AD000001-AD000030 through full ASTRAL-Next pipeline
- **Expected Result**:
  - Factor graph initialized and optimized
  - 30 poses computed
  - All feature correspondences stored

### Step 2: Extract Feature Correspondences
- **Action**: Retrieve all matched features from F07
- **Expected Result**:
  - For each image pair (i, j):
    - List of matched keypoint pairs: [(p_i, p_j), ...]
    - Match confidence scores
    - Total: ~500-1500 matches per pair
  - Total matches across flight: ~15,000-45,000

### Step 3: Triangulate 3D Points
- **Action**: For each matched feature across multiple views, triangulate 3D position
- **Expected Result**:
  - 3D point cloud generated
  - Each point has:
    - 3D coordinates (X, Y, Z) in ENU frame
    - List of observations (image_id, pixel_location)
    - Triangulation uncertainty

### Step 4: Calculate Per-Feature Reprojection Error
- **Action**: For each 3D point and each observation:
  ```
  For point X with observation (image_j, pixel_p):
    1. Get optimized pose T_j from factor graph
    2. Get camera intrinsics K from config
    3. Project: p' = project(K, T_j, X)
    4. Error: e = sqrt((p.x - p'.x)² + (p.y - p'.y)²)
  ```
- **Expected Result**:
  - Array of per-feature reprojection errors
  - Typical range: 0.1 - 3.0 pixels

### Step 5: Compute Statistical Metrics
- **Action**: Calculate MRE and distribution statistics
- **Expected Result**:
  ```
  Total features evaluated: 25,000
  Mean Reprojection Error (MRE): 0.72 pixels
  Median Reprojection Error: 0.58 pixels
  Standard Deviation: 0.45 pixels
  90th Percentile: 1.25 pixels
  95th Percentile: 1.68 pixels
  99th Percentile: 2.41 pixels
  Max Error: 4.82 pixels
  ```

### Step 6: Validate MRE Threshold
- **Action**: Compare MRE against AC-10 requirement
- **Expected Result**:
  - **MRE = 0.72 pixels < 1.0 pixels** ✓
  - AC-10 PASS

### Step 7: Identify Outlier Reprojections
- **Action**: Find features with reprojection error > 3.0 pixels
- **Expected Result**:
  ```
  Outliers (> 3.0 pixels): 127 (0.5% of total)
  Outlier distribution:
  - 3.0-5.0 pixels: 98 features
  - 5.0-10.0 pixels: 27 features
  - > 10.0 pixels: 2 features
  ```

### Step 8: Analyze Outlier Causes
- **Action**: Investigate high-error features
- **Expected Result**:
  - Most outliers at image boundaries (lens distortion)
  - Some at occlusion boundaries
  - Moving objects (if any)
  - Repetitive textures causing mismatches

### Step 9: Per-Image MRE Analysis
- **Action**: Calculate MRE per image
- **Expected Result**:
  ```
  Per-Image MRE:
  AD000001: 0.68 px (baseline)
  AD000002: 0.71 px
  ...
  AD000004: 1.12 px (sharp turn - higher error)
  AD000005: 1.08 px
  ...
  AD000030: 0.74 px

  Images with MRE > 1.0: 2 out of 30 (6.7%)
  Overall MRE: 0.72 px
  ```

### Step 10: Temporal MRE Trend
- **Action**: Plot MRE over sequence to detect drift
- **Expected Result**:
  - MRE relatively stable across sequence
  - No significant upward trend (would indicate drift)
  - Spikes at known challenging locations (sharp turns)

### Step 11: Validate Robust Kernel Effect
- **Action**: Compare MRE with/without robust cost functions
- **Expected Result**:
  ```
  Without robust kernels: MRE = 0.89 px, outliers affect mean
  With Cauchy kernel: MRE = 0.72 px, outliers downweighted
  Improvement: 19% reduction in MRE
  ```

### Step 12: Cross-Validate with GPS Accuracy
- **Action**: Correlate MRE with GPS error
- **Expected Result**:
  - Low MRE correlates with low GPS error
  - Images with MRE > 1.5 px tend to have GPS error > 30m
  - MRE is a leading indicator of trajectory quality

### Step 13: Test Under Challenging Conditions
- **Action**: Compute MRE for challenging dataset (AD000001-060)
- **Expected Result**:
  ```
  Full Flight MRE:
  Total features: 55,000
  MRE: 0.84 pixels (still < 1.0)
  Challenging segments:
  - Sharp turns: MRE = 1.15 px (above threshold locally)
  - Normal segments: MRE = 0.68 px
  Overall: AC-10 PASS
  ```

### Step 14: Generate Reprojection Error Report
- **Action**: Create comprehensive MRE report
- **Expected Result**:
  ```
  ========================================
  REPROJECTION ERROR REPORT
  Flight: AC10_Test
  Dataset: AD000001-AD000030
  ========================================

  SUMMARY:
  Mean Reprojection Error: 0.72 pixels
  AC-10 Threshold: 1.0 pixels
  Status: PASS ✓

  DISTRIBUTION:
  < 0.5 px: 12,450 (49.8%)
  0.5-1.0 px: 9,875 (39.5%)
  1.0-2.0 px: 2,350 (9.4%)
  2.0-3.0 px: 198 (0.8%)
  > 3.0 px: 127 (0.5%)

  PER-IMAGE BREAKDOWN:
  Images meeting < 1.0 px MRE: 28/30 (93.3%)
  Images with highest MRE: AD000004 (1.12 px), AD000005 (1.08 px)

  CORRELATION WITH GPS ACCURACY:
  Pearson correlation (MRE vs GPS error): 0.73
  Low MRE predicts high GPS accuracy

  RECOMMENDATIONS:
  - System meets AC-10 requirement
  - Consider additional outlier filtering for images > 1.0 px MRE
  - Sharp turn handling could be improved
  ========================================
  ```

## Success Criteria

**Primary Criterion (AC-10)**:
- Mean Reprojection Error < 1.0 pixels across entire flight

**Supporting Criteria**:
- Standard deviation < 2.0 pixels
- No outlier reprojections > 10 pixels (indicates gross errors)
- Per-image MRE < 2.0 pixels (no catastrophic single-image failures)
- MRE stable across sequence (no drift)

## Pass/Fail Criteria

**TEST PASSES IF**:
- Overall MRE < 1.0 pixels
- Standard deviation reasonable (< 2.0 pixels)
- Less than 1% of features have error > 5.0 pixels
- MRE consistent across multiple test runs (variance < 10%)

**TEST FAILS IF**:
- MRE ≥ 1.0 pixels
- Standard deviation > 3.0 pixels (high variance indicates instability)
- More than 5% of features have error > 5.0 pixels
- MRE increases significantly over sequence (drift)

## Diagnostic Actions if Failing

**If MRE > 1.0 px**:
1. Check camera calibration accuracy
2. Verify lens distortion model
3. Review feature matching quality (outlier ratio)
4. Examine factor graph convergence
5. Check for scale drift in trajectory

**If High Variance**:
1. Investigate images with outlier MRE
2. Check for challenging conditions (blur, low texture)
3. Review robust kernel settings
4. Verify triangulation accuracy

## Components Involved
- F07 Sequential Visual Odometry: Feature extraction and matching
- F10 Factor Graph Optimizer: Pose optimization, marginal covariances
- F13 Coordinate Transformer: 3D point projection
- H01 Camera Model: Camera intrinsics, projection functions
- H03 Robust Kernels: Outlier handling in optimization

## Notes
- MRE is a geometric consistency metric, not direct GPS accuracy
- Low MRE indicates a well-constrained factor graph
- High MRE with good GPS accuracy = overfitting to GPS anchors
- Low MRE with poor GPS accuracy = scale/alignment issues
- AC-10 validates internal consistency of the vision pipeline
@@ -0,0 +1,25 @@

# Cross-Cutting Acceptance Test: Long Flight Maximum Scale

## Summary
Validate system scalability with the maximum expected flight length of 3000 images.

## Test Description
Process an extended flight with 3000 images (simulated or actual) to validate that the system can handle the maximum scale stated in the requirements.

## Test Steps
1. Create flight with 3000 images
2. Process through complete pipeline
3. Monitor memory usage, processing time, accuracy
4. Verify no degradation over time
|
||||
## Success Criteria
|
||||
- All 3000 images processed
|
||||
- Total time < 15,000 seconds (5s/image)
|
||||
- Accuracy maintained throughout (AC-1, AC-2)
|
||||
- Memory usage stable (no leaks)
|
||||
- Registration rate > 95% (AC-9)
|
||||
|
||||
## Pass/Fail
|
||||
**Passes**: System handles 3000 images without failure
|
||||
**Fails**: Memory exhaustion, crashes, significant performance degradation
|
||||
|
||||
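Step 4's "no degradation over time" can be made concrete by fitting a trend to per-image latencies; a clearly positive slope suggests the system slows as the flight grows. This is one possible check, not the project's monitoring implementation; the threshold is illustrative:

```python
def latency_slope(latencies):
    """Least-squares slope of per-image latency versus image index
    (seconds of latency gained per additional image)."""
    n = len(latencies)
    mean_x = (n - 1) / 2
    mean_y = sum(latencies) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(latencies))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

def degrades(latencies, max_slope=1e-3):
    """Flag degradation when latency grows faster than the tolerance."""
    return latency_slope(latencies) > max_slope
```

A stable run yields a slope near zero; the same idea applies to a per-image memory (RSS) series for the leak check.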
@@ -0,0 +1,28 @@
# Cross-Cutting Acceptance Test: Degraded Satellite Data

## Summary
Validate system robustness with outdated or lower-quality satellite imagery per operational constraints.

## Test Description
Test system performance when satellite data is:
- Outdated (2+ years old)
- Lower resolution (Zoom 17-18 instead of 19)
- Partially missing (some areas without coverage)
- From a different season (winter satellite, summer flight)

## Test Steps
1. Use deliberately outdated satellite tiles
2. Process AD000001-030
3. Measure accuracy degradation
4. Verify L2 (DINOv2) still matches despite appearance changes

## Success Criteria
- System continues to operate (no crash)
- L2 retrieval still works (top-5 recall > 70%)
- Accuracy degrades gracefully (may not meet AC-2, but should meet relaxed threshold)
- Clear error/warning messages about data quality
- System continues to operate with graceful degradation

## Pass/Fail
**Passes**: System operational with degraded data, graceful degradation
**Fails**: Complete failure, crashes, or no indication of data quality issues

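The "top-5 recall > 70%" criterion is a standard retrieval metric: the fraction of query images whose ground-truth tile appears among the top-k candidates. A minimal sketch (the dict-based interface here is illustrative, not the F08 API):

```python
def top_k_recall(retrievals, ground_truth, k=5):
    """Fraction of queries whose correct tile appears in the top-k candidates.

    retrievals:   query id -> ranked list of candidate tile ids
    ground_truth: query id -> correct tile id
    """
    hits = sum(
        1
        for query, truth in ground_truth.items()
        if truth in retrievals.get(query, [])[:k]
    )
    return hits / len(ground_truth)
```

The success criterion then reads `top_k_recall(...) > 0.70` over the degraded-tile run.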
@@ -0,0 +1,35 @@
# Cross-Cutting Acceptance Test: Complete System Validation

## Summary
Comprehensive end-to-end validation of all acceptance criteria simultaneously using the full 60-image test dataset.

## Linked Acceptance Criteria
ALL (AC-1 through AC-10)

## Test Description
Process the complete Test_Long_Flight dataset (AD000001-060) and validate compliance with all 10 acceptance criteria in a single comprehensive test.

## Test Steps
1. Process AD000001-060 (includes normal flight, sharp turns, outliers)
2. Calculate all metrics
3. Validate against all ACs

## Validation Matrix
- AC-1: ≥80% < 50m ✓
- AC-2: ≥60% < 20m ✓
- AC-3: Outlier handled ✓
- AC-4: Sharp turns recovered ✓
- AC-5: Multi-fragment if needed ✓
- AC-6: User input mechanism functional ✓
- AC-7: < 5s per image ✓
- AC-8: Real-time + refinement ✓
- AC-9: Registration > 95% ✓
- AC-10: MRE < 1.0px ✓

## Success Criteria
ALL 10 acceptance criteria must pass

## Pass/Fail
**Passes**: All ACs met
**Fails**: Any AC fails

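The numeric rows of the validation matrix can be evaluated mechanically from a metrics dictionary. The sketch below covers only the thresholded ACs (AC-3 through AC-6 and AC-8 are scenario/behavior checks and are omitted); the metric key names are assumptions, not the project's schema:

```python
# Hypothetical metric keys; thresholds mirror the validation matrix above.
AC_CHECKS = {
    "AC-1": lambda m: m["pct_under_50m"] >= 0.80,
    "AC-2": lambda m: m["pct_under_20m"] >= 0.60,
    "AC-7": lambda m: m["max_seconds_per_image"] < 5.0,
    "AC-9": lambda m: m["registration_rate"] > 0.95,
    "AC-10": lambda m: m["mre_px"] < 1.0,
}

def validate(metrics):
    """Return per-AC pass/fail plus the overall verdict (all must pass)."""
    results = {ac: check(metrics) for ac, check in AC_CHECKS.items()}
    return results, all(results.values())
```

This makes the "Fails: Any AC fails" rule explicit: one failed check flips the overall verdict.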
@@ -0,0 +1,33 @@
# GPS-Analyzed Scenario Test: Baseline Standard Flight

## Summary
Test using the GPS-analyzed Test_Baseline dataset (AD000001-030) with consistent ~120m spacing and no major outliers.

## Dataset Characteristics
- **Images**: AD000001-AD000030 (30 images)
- **Mean spacing**: 120.8m
- **Max spacing**: 202.2m (AD000003→004)
- **Sharp turns**: None
- **Outliers**: None
- **Purpose**: Validate ideal operation (AC-1, AC-2, AC-7)

## Test Steps
1. Process AD000001-030
2. Validate accuracy against ground truth (coordinates.csv)
3. Check processing time
4. Verify AC-1 and AC-2 compliance

## Expected Results
```
Total Images: 30
Mean Error: ~24m
Images < 50m: ≥24 (80%) - AC-1 ✓
Images < 20m: ≥18 (60%) - AC-2 ✓
Processing Time: < 150s - AC-7 ✓
Registration Rate: 100% - AC-9 ✓
```

## Pass/Fail
**Passes**: AC-1, AC-2, AC-7, AC-9 all met
**Fails**: Any accuracy or performance target missed

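Step 2's accuracy check reduces to a great-circle distance between each estimated position and the ground-truth row in coordinates.csv, then counting how many images fall under the 50m and 20m thresholds. A self-contained sketch (the haversine formula is standard; the function names are illustrative):

```python
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS-84 lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def accuracy_fractions(estimates, truth):
    """Fractions of images under the 50m (AC-1) and 20m (AC-2) thresholds."""
    errors = [haversine_m(*e, *t) for e, t in zip(estimates, truth)]
    n = len(errors)
    return (sum(e < 50 for e in errors) / n, sum(e < 20 for e in errors) / n)
```

AC-1 passes when the first fraction is ≥ 0.80, AC-2 when the second is ≥ 0.60.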
@@ -0,0 +1,29 @@
# GPS-Analyzed Scenario Test: 350m Outlier

## Summary
Test using the GPS-analyzed Test_Outlier_350m dataset (AD000045-050) containing a 268.6m jump.

## Dataset Characteristics
- **Images**: AD000045-AD000050 (6 images)
- **Outlier**: AD000047 → AD000048 (268.6m jump)
- **Purpose**: Validate AC-3 (outlier robustness)

## Test Steps
1. Process AD000045-050
2. Monitor robust kernel activation
3. Verify trajectory remains consistent
4. Check all images processed

## Expected Results
```
Outlier Detected: AD000047→048 (268.6m)
Robust Kernel: Activated ✓
Images Processed: 6/6 (100%)
Non-outlier accuracy: Good (< 50m)
AC-3 Status: PASS
```

## Pass/Fail
**Passes**: AC-3 met, system handles outlier without failure
**Fails**: System crashes or trajectory corrupted

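The "robust kernel activation" in Step 2 refers to an M-estimator down-weighting large residuals during optimization. GTSAM exposes this via robust noise-model wrappers; the standalone Huber weight below is only an illustration of the effect (the `delta` value is illustrative, not the project's setting):

```python
def huber_weight(residual, delta=1.345):
    """IRLS weight of the Huber kernel: 1 inside the quadratic region,
    shrinking as delta/|r| outside it, so an outlier like the 268.6m jump
    pulls on the trajectory far less than a plain quadratic (L2) cost would."""
    r = abs(residual)
    return 1.0 if r <= delta else delta / r
```

An inlier residual keeps full weight 1.0, while a jump at ten times the threshold is down-weighted to roughly 0.13, which is why the surrounding frames stay accurate.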
@@ -0,0 +1,40 @@
# GPS-Analyzed Scenario Test: Sharp Turn Recovery

## Summary
Test using GPS-analyzed sharp turn datasets with frame gaps to simulate tracking loss.

## Dataset Characteristics

### Dataset A: Single Frame Gap
- **Images**: AD000042, AD000044, AD000045, AD000046
- **Gap**: Skip AD000043 (simulates tracking loss)
- **Purpose**: Test L2 recovery

### Dataset B: Sequential with Jump
- **Images**: AD000032-AD000035
- **Jump**: AD000032→033 (220.6m)
- **Purpose**: Validate sharp turn handling

### Dataset C: 5-Frame Gap
- **Images**: AD000003, AD000009
- **Gap**: Skip AD000004-008
- **Purpose**: Test larger discontinuity

## Test Steps
1. Process each dataset separately
2. Verify L2 global relocalization succeeds
3. Check L1 failure detection
4. Validate recovery accuracy

## Expected Results
```
Dataset A: L2 recovery successful, error < 50m
Dataset B: Sharp turn handled, accuracy maintained
Dataset C: Large gap recovered, relocated < 200m
AC-4 Status: PASS (all datasets)
```

## Pass/Fail
**Passes**: AC-4 met, all recovery scenarios successful
**Fails**: L2 fails to relocalize, errors > 200m

@@ -0,0 +1,37 @@
# GPS-Analyzed Scenario Test: Full Long Flight

## Summary
Test using the complete GPS-analyzed dataset (AD000001-060) with all variations.

## Dataset Characteristics
- **Images**: AD000001-AD000060 (all 60 images)
- **Mean spacing**: 120.8m
- **Sharp turns**: 5 locations (>200m)
- **Outlier**: 1 location (268.6m)
- **Terrain**: Varied agricultural land
- **Purpose**: Complete system validation with real data

## Test Steps
1. Process all 60 images in order
2. Handle all sharp turns automatically
3. Manage outlier robustly
4. Achieve all accuracy and performance targets

## Expected Results
```
Total Images: 60
Processed: 60 (100%)
Sharp Turns Handled: 5/5
Outlier Managed: 1/1
Mean Error: < 30m
Images < 50m: ≥48 (80%) - AC-1 ✓
Images < 20m: ≥36 (60%) - AC-2 ✓
Processing Time: < 300s - AC-7 ✓
Registration Rate: >95% - AC-9 ✓
MRE: < 1.0px - AC-10 ✓
```

## Pass/Fail
**Passes**: All ACs met with full real dataset
**Fails**: Any AC fails on real operational data

@@ -0,0 +1,139 @@
# Acceptance Test: Chunk Rotation Recovery

## Summary
Validate chunk LiteSAM matching with rotation sweeps for chunks with unknown orientation (sharp turns).

## Linked Acceptance Criteria
**AC-4**: Robust to sharp turns (<5% overlap)
**AC-5**: Connect route chunks

## Preconditions
1. F02.2 Flight Processing Engine running
2. F11 Failure Recovery Coordinator (chunk orchestration, returns status objects)
3. F12 Route Chunk Manager functional (chunk lifecycle via `create_chunk()`, `mark_chunk_anchored()`)
4. F06 Image Rotation Manager with chunk rotation support (`try_chunk_rotation_steps()`)
5. F08 Global Place Recognition (chunk semantic matching via `retrieve_candidate_tiles_for_chunk()`)
6. F09 Metric Refinement with chunk LiteSAM matching (`align_chunk_to_satellite()`)
7. F10 Factor Graph Optimizer with chunk operations (`add_chunk_anchor()`, `merge_chunk_subgraphs()`)
8. Test dataset: Chunk with unknown orientation (simulated sharp turn)

## Test Description
Test the system's ability to match chunks with unknown orientation using rotation sweeps. When a chunk is created after a sharp turn, its orientation relative to the satellite map is unknown. The system must rotate the entire chunk to all possible angles and attempt LiteSAM matching.

## Test Steps

### Step 1: Create Chunk with Unknown Orientation
- **Action**: Simulate sharp turn scenario
  - Process frames 1-10 (normal flight, heading 0°)
  - Sharp turn at frame 11 (heading changes to 120°)
  - Tracking lost, chunk_2 created proactively
  - Process frames 11-20 in chunk_2
- **Expected Result**:
  - Chunk_2 created with frames 11-20
  - Chunk orientation unknown (previous heading not relevant)
  - Chunk ready for matching (10 frames)

### Step 2: Chunk Semantic Matching
- **Action**: Attempt chunk semantic matching
- **Expected Result**:
  - F08.compute_chunk_descriptor() → aggregate DINOv2 descriptor
  - F08.retrieve_candidate_tiles_for_chunk() → returns top-5 candidate tiles
  - Correct tile in top-5 candidates

### Step 3: Chunk Rotation Sweeps
- **Action**: Attempt chunk LiteSAM matching with rotation sweeps
- **Expected Result**:
  - F06.try_chunk_rotation_steps() called
  - For each rotation angle (0°, 30°, 60°, ..., 330°):
    - F06.rotate_chunk_360() rotates all 10 images
    - F09.align_chunk_to_satellite() attempts matching
  - Match found at 120° rotation (correct orientation)
  - Returns ChunkAlignmentResult with:
    - rotation_angle: 120°
    - chunk_center_gps: correct GPS
    - confidence > 0.7
  - Sim(3) transform computed

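The rotation sweep in Step 3 is, at its core, a loop over candidate angles that stops when an alignment clears the confidence threshold. A minimal driver sketch; `align_fn` stands in for rotating the chunk images and running chunk LiteSAM matching at one orientation, and the first-match-wins policy is an illustrative choice (a best-of-all-angles policy is equally plausible):

```python
def rotation_sweep(align_fn, angles=range(0, 360, 30), min_confidence=0.7):
    """Try chunk-to-satellite alignment at each candidate rotation.

    Returns (angle, result) for the first alignment whose confidence clears
    the threshold, or None if no angle matches.
    """
    for angle in angles:
        result = align_fn(angle)
        if result is not None and result.get("confidence", 0.0) >= min_confidence:
            return angle, result
    return None
```

With 30° steps this tries at most the 12 angles listed above; a sharp turn to 120° is found on the fifth attempt.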
### Step 4: Chunk Merging
- **Action**: Merge chunk_2 to main trajectory
- **Expected Result**:
  - F12.mark_chunk_anchored() updates chunk state (calls F10.add_chunk_anchor())
  - F12.merge_chunks() merges chunk_2 into chunk_1 (calls F10.merge_chunk_subgraphs())
  - Sim(3) transform applied correctly
  - Global trajectory consistent

### Step 5: Verify Final Trajectory
- **Action**: Verify all frames have GPS coordinates
- **Expected Result**:
  - All 20 frames have GPS coordinates
  - Frames 1-10: Original trajectory
  - Frames 11-20: Merged chunk trajectory
  - Global consistency maintained
  - Accuracy: 18/20 < 50m (90%)

## Success Criteria

**Primary Criterion**:
- Chunk rotation sweeps find correct orientation (120°)
- Chunk LiteSAM matching succeeds with rotation
- Chunk merged correctly to main trajectory

**Supporting Criteria**:
- All 12 rotation angles tried
- Match found at correct angle
- Sim(3) transform computed correctly
- Final trajectory globally consistent

## Expected Results

```
Chunk Rotation Recovery:
- Chunk_2 created: frames 11-20
- Unknown orientation: previous heading (0°) not relevant
- Chunk semantic matching: correct tile in top-5
- Rotation sweeps: 12 rotations tried (0°, 30°, ..., 330°)
- Match found: 120° rotation
- Chunk center GPS: Accurate (within 20m)
- Chunk merged: Sim(3) transform applied
- Final trajectory: Globally consistent
- Accuracy: 18/20 < 50m (90%)
```

## Pass/Fail Criteria

**TEST PASSES IF**:
- Chunk rotation sweeps find match at correct angle (120°)
- Chunk LiteSAM matching succeeds
- Chunk merged correctly
- Final trajectory globally consistent
- Accuracy acceptable (≥ 80% < 50m)

**TEST FAILS IF**:
- Rotation sweeps don't find correct angle
- Chunk LiteSAM matching fails
- Chunk merging produces inconsistent trajectory
- Final accuracy below threshold

## Architecture Elements

**Chunk Rotation**:
- F06 rotates all images in chunk by same angle
- 12 rotation steps: 0°, 30°, 60°, ..., 330°
- F09 attempts LiteSAM matching for each rotation

**Chunk LiteSAM Matching**:
- Aggregate correspondences from multiple images
- More robust than single-image matching
- Handles featureless terrain better

**Chunk Merging**:
- Sim(3) transform (translation, rotation, scale)
- Critical for monocular VO scale ambiguity
- Preserves internal consistency

## Notes
- Rotation sweeps are critical for chunks from sharp turns
- Previous heading may not be relevant after sharp turn
- Chunk matching more robust than single-image matching
- Sim(3) transform accounts for scale differences

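The Sim(3) merge applies scale, rotation, and translation to every chunk pose. The actual transform is 7-DoF in 3D (e.g. a similarity transform over SE(3) poses); the planar 2D stand-in below only illustrates why the scale factor matters for monocular VO, where trajectory scale is otherwise unobservable:

```python
import math

def sim3_apply(s, theta, t, point):
    """Planar similarity transform p' = s * R(theta) @ p + t.

    s resolves the monocular scale ambiguity, R(theta) rotates the chunk
    into the satellite frame, and t places it on the global trajectory.
    """
    c, si = math.cos(theta), math.sin(theta)
    x, y = point
    return (s * (c * x - si * y) + t[0], s * (si * x + c * y) + t[1])
```

Applying the same (s, theta, t) to every pose in a chunk preserves its internal shape, which is exactly the "preserves internal consistency" property noted above.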
@@ -0,0 +1,175 @@
# Acceptance Test: Multi-Chunk Simultaneous Processing

## Summary
Validate the system's ability to process multiple chunks simultaneously, matching and merging them asynchronously.

## Linked Acceptance Criteria
**AC-4**: Robust to sharp turns (<5% overlap)
**AC-5**: Connect route chunks (multiple chunks)

## Preconditions
1. F02.2 Flight Processing Engine running
2. F10 Factor Graph Optimizer with native multi-chunk support (subgraph operations)
3. F11 Failure Recovery Coordinator (pure logic, returns status objects to F02.2)
4. F12 Route Chunk Manager functional (chunk lifecycle: `create_chunk()`, `add_frame_to_chunk()`, `mark_chunk_anchored()`, `merge_chunks()`)
5. F08 Global Place Recognition (chunk semantic matching via `retrieve_candidate_tiles_for_chunk()`)
6. F09 Metric Refinement (chunk LiteSAM matching)
7. Test dataset: Flight with 3 disconnected segments

## Test Description
Test the system's ability to handle multiple disconnected route segments simultaneously. The system should create chunks proactively, process them independently, and match/merge them asynchronously without blocking frame processing.

## Test Steps

### Step 1: Create Multi-Segment Flight
- **Action**: Create flight with 3 disconnected segments:
  - Segment 1: AD000001-010 (10 frames, sequential)
  - Segment 2: AD000025-030 (6 frames, no overlap with Segment 1)
  - Segment 3: AD000050-055 (6 frames, no overlap with Segments 1 or 2)
- **Expected Result**: Flight created with all 22 images

### Step 2: Process Segment 1
- **Action**: Process AD000001-010
- **Expected Result**:
  - Chunk_1 created (frames 1-10)
  - Sequential VO provides relative factors
  - Factors added to chunk_1's subgraph
  - Chunk_1 optimized independently
  - GPS anchors from LiteSAM matching
  - Chunk_1 anchored and merged to main trajectory

### Step 3: Detect Discontinuity (Segment 1 → 2)
- **Action**: Process AD000025 after AD000010
- **Expected Result**:
  - Tracking lost detected (large displacement ~2km)
  - **Chunk_2 created proactively** (immediate, not reactive)
  - Processing continues immediately in chunk_2
  - Chunk_1 remains in factor graph (not deleted)

### Step 4: Process Segment 2 Independently
- **Action**: Process AD000025-030
- **Expected Result**:
  - Frames processed in chunk_2 context
  - Relative factors added to chunk_2's subgraph
  - Chunk_2 optimized independently
  - Chunk_1 and chunk_2 exist simultaneously
  - Chunk_2 matching attempted asynchronously (background)

### Step 5: Detect Discontinuity (Segment 2 → 3)
- **Action**: Process AD000050 after AD000030
- **Expected Result**:
  - Tracking lost detected again
  - **Chunk_3 created proactively**
  - Processing continues in chunk_3
  - Chunk_1, chunk_2, chunk_3 all exist simultaneously

### Step 6: Process Segment 3 Independently
- **Action**: Process AD000050-055
- **Expected Result**:
  - Frames processed in chunk_3 context
  - Chunk_3 optimized independently
  - All 3 chunks exist simultaneously
  - Each chunk processed independently

### Step 7: Asynchronous Chunk Matching
- **Action**: Background task attempts matching for unanchored chunks
- **Expected Result**:
  - Chunk_2 semantic matching attempted
  - Chunk_2 LiteSAM matching attempted
  - Chunk_2 anchored when match found
  - Chunk_3 semantic matching attempted
  - Chunk_3 LiteSAM matching attempted
  - Chunk_3 anchored when match found
  - Matching happens asynchronously (doesn't block frame processing)

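The background matching in Step 7 can be sketched as a cooperative asyncio task that polls unanchored chunks while the frame loop keeps running. This is an illustrative scheduling sketch, not the F02.2/F11 implementation; `try_match` stands in for the semantic + LiteSAM matching chain, and the dict-based chunk state is a placeholder for F12's metadata:

```python
import asyncio

async def match_unanchored_chunks(chunks, try_match, poll_s=0.01):
    """Background task: repeatedly try to anchor unmatched chunks without
    blocking the frame-processing loop. `chunks` maps chunk id -> anchored flag."""
    while not all(chunks.values()):
        for cid, anchored in list(chunks.items()):
            if not anchored and await try_match(cid):
                chunks[cid] = True  # mark_chunk_anchored() in the real system
        await asyncio.sleep(poll_s)  # yield to the frame-processing loop

async def main():
    chunks = {"chunk_2": False, "chunk_3": False}
    attempts = {"chunk_2": 0, "chunk_3": 0}

    async def try_match(cid):
        attempts[cid] += 1
        return attempts[cid] >= 2  # succeed on the second attempt

    matcher = asyncio.create_task(match_unanchored_chunks(chunks, try_match))
    for _ in range(5):
        await asyncio.sleep(0.005)  # stand-in for processing frames concurrently
    await matcher
    return chunks
```

Because both coroutines only await, neither blocks the other: frames keep being "processed" while matching retries in the background.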
### Step 8: Chunk Merging
- **Action**: Merge chunks as they become anchored
- **Expected Result**:
  - Chunk_2 merged to chunk_1 when anchored
  - Chunk_3 merged to merged trajectory when anchored
  - Sim(3) transforms applied correctly
  - Global optimization performed
  - All chunks merged into single trajectory

### Step 9: Verify Final Trajectory
- **Action**: Verify all 22 frames have GPS coordinates
- **Expected Result**:
  - All frames have GPS coordinates
  - Trajectory globally consistent
  - No systematic bias between segments
  - Accuracy: 20/22 < 50m (90.9%)

## Success Criteria

**Primary Criterion**:
- Multiple chunks created simultaneously
- Chunks processed independently
- Chunks matched and merged asynchronously
- Final trajectory globally consistent

**Supporting Criteria**:
- Chunk creation is proactive (immediate)
- Frame processing continues during chunk matching
- Chunk matching doesn't block processing
- Sim(3) transforms computed correctly

## Expected Results

```
Multi-Chunk Simultaneous Processing:
- Chunk_1: frames 1-10, anchored, merged
- Chunk_2: frames 25-30, anchored asynchronously, merged
- Chunk_3: frames 50-55, anchored asynchronously, merged
- Simultaneous existence: All 3 chunks exist at same time
- Independent processing: Each chunk optimized independently
- Asynchronous matching: Matching doesn't block frame processing
- Final trajectory: Globally consistent
- Accuracy: 20/22 < 50m (90.9%)
```

## Pass/Fail Criteria

**TEST PASSES IF**:
- Multiple chunks created simultaneously
- Chunks processed independently
- Chunks matched and merged asynchronously
- Final trajectory globally consistent
- Accuracy acceptable (≥ 80% < 50m)

**TEST FAILS IF**:
- Chunks not created simultaneously
- Chunk processing blocks frame processing
- Chunk matching blocks processing
- Merging produces inconsistent trajectory
- Final accuracy below threshold

## Architecture Elements

**Multi-Chunk Support**:
- F10 Factor Graph Optimizer supports multiple chunks via `create_chunk_subgraph()`
- Each chunk has its own subgraph, optimized independently via `optimize_chunk()`
- F12 Route Chunk Manager owns chunk metadata (status, is_active, etc.)

**Proactive Chunk Creation**:
- F11 triggers chunk creation via `create_chunk_on_tracking_loss()`
- F12.create_chunk() creates chunk and calls F10.create_chunk_subgraph()
- Processing continues in new chunk immediately (not reactive)

**Asynchronous Matching**:
- F02.2 manages background task that calls F11.process_unanchored_chunks()
- F11 calls F12.get_chunks_for_matching() to find ready chunks
- F11.try_chunk_semantic_matching() → F11.try_chunk_litesam_matching()
- Matching doesn't block frame processing

**Chunk Merging**:
- F11.merge_chunk_to_trajectory() coordinates merging
- F12.merge_chunks() updates chunk state and calls F10.merge_chunk_subgraphs()
- Sim(3) transform accounts for translation, rotation, scale
- F10.optimize_global() runs after merging

## Notes
- Multiple chunks can exist simultaneously
- Chunk processing is independent and non-blocking
- Asynchronous matching reduces user input requests
- Sim(3) transform critical for scale ambiguity resolution