update tests

2026-06-21 09:11:12 +00:00 · 2025-11-30 16:21:03 +02:00
parent ce9760fcbe
commit b12f37ab01
12 changed files with 1435 additions and 603 deletions
@@ -65,4 +65,6 @@
   - Tasks
   - Technical enablers
  ## Notes
   - Be as concise as possible when formulating epics. The fewer words needed to convey the same meaning, the better the epic.
@@ -3,7 +3,7 @@
 ## Overview
 Comprehensive test specifications for the GPS-denied navigation system following the QA testing pyramid approach.
-**Total Test Specifications**: 56
+**Total Test Specifications**: 49
 ## Test Organization
@@ -11,65 +11,77 @@ Comprehensive test specifications for the GPS-denied navigation system following
 Tests individual system components in isolation with their dependencies.
 **Vision Pipeline (01-04)**:
- 01: Sequential Visual Odometry (SuperPoint + LightGlue)
+- 01: Sequential Visual Odometry (F07 - SuperPoint + LightGlue)
- 02: Global Place Recognition (AnyLoc)
+- 02: Global Place Recognition (F08 - AnyLoc/DINOv2)
- 03: Metric Refinement (LiteSAM)
+- 03: Metric Refinement (F09 - LiteSAM)
- 04: Factor Graph Optimizer (GTSAM)
+- 04: Factor Graph Optimizer (F10 - GTSAM)
 **Data Management (05-08)**:
- 05: Satellite Data Manager
+- 05: Satellite Data Manager (F04)
- 06: Coordinate Transformer
+- 06: Coordinate Transformer (F13)
- 07: Image Input Pipeline
+- 07: Image Input Pipeline (F05)
- 08: Image Rotation Manager
+- 08: Image Rotation Manager (F06)
 **Service Infrastructure (09-12)**:
- 09: REST API (FastAPI endpoints for flight management, image uploads, user fixes)
+- 09: REST API (F01 - FastAPI endpoints)
- 10: SSE Event Streamer (Server-Sent Events for real-time result streaming)
+- 10: SSE Event Streamer (F15 - real-time streaming)
- 11: Flight Manager
+- 11a: Flight Lifecycle Manager (F02.1 - CRUD, initialization, API delegation)
- 12: Result Manager
+- 11b: Flight Processing Engine (F02.2 - processing loop, recovery coordination)
 - 12: Result Manager (F14)
 **Support Components (13-16)**:
- 13: Model Manager (TensorRT)
+- 13: Model Manager (F16 - TensorRT)
- 14: Failure Recovery Coordinator
+- 14: Failure Recovery Coordinator (F11)
- 15: Configuration Manager
+- 15: Configuration Manager (F17)
- 16: Database Layer
+- 16: Database Layer (F03)
 ### System Integration Tests (21-25): Multi-Component Flows
 Tests integration between multiple components.
 - 21: End-to-End Normal Flight
- 22: Satellite to Vision Pipeline
+- 22: Satellite to Vision Pipeline (F04 → F07/F08/F09)
- 23: Vision to Optimization Pipeline
+- 23: Vision to Optimization Pipeline (F07/F08/F09 → F10)
 - 24: Multi-Component Error Propagation
- 25: Real-Time Streaming Pipeline
+- 25: Real-Time Streaming Pipeline (F02 → F14 → F15)
-### Acceptance Tests (31-43): Requirements Validation
+### Acceptance Tests (31-50): Requirements Validation
 Tests mapped to 10 acceptance criteria.
-**Accuracy (31-32)**:
+**Accuracy (31-33)**:
- 31: AC-1 - 80% < 50m error
+- 31: AC-1 - 80% < 50m error (baseline)
- 32: AC-2 - 60% < 20m error
+- 32: AC-1 - 80% < 50m error (varied terrain)
 - 33: AC-2 - 60% < 20m error (high precision)
-**Robustness (33-35)**:
+**Robustness - Outliers (34-35)**:
- 33: AC-3 - 350m outlier handling
+- 34: AC-3 - Single 350m outlier handling
- 34: AC-4 - Sharp turn recovery
+- 35: AC-3 - Multiple outliers handling
 - 35: AC-5 - Multi-fragment route connection
-**User Interaction (36)**:
+**Robustness - Sharp Turns (36-38)**:
- 36: AC-6 - User input after 3 failures
+- 36: AC-4 - Sharp turn zero overlap recovery
 - 37: AC-4 - Sharp turn minimal overlap (<5%)
 - 38: Outlier anchor detection
-**Performance (37-38)**:
+**Multi-Fragment (39)**:
- 37: AC-7 - <5s per image
+- 39: AC-5 - Multi-fragment route connection (chunk architecture)
 - 38: AC-8 - Real-time streaming + refinement
-**Quality Metrics (39-40)**:
+**User Interaction (40)**:
- 39: AC-9 - Registration rate >95%
+- 40: AC-6 - User input after 3 consecutive failures
 - 40: AC-10 - MRE <1.0 pixels
-**Cross-Cutting (41-43)**:
+**Performance (41-44)**:
- 41: Long flight (3000 images)
+- 41: AC-7 - <5s single image processing
- 42: Degraded satellite data
+- 42: AC-7 - Sustained throughput performance
- 43: Complete system validation
+- 43: AC-8 - Real-time streaming results
 - 44: AC-8 - Async refinement delivery
 **Quality Metrics (45-47)**:
 - 45: AC-9 - Registration rate >95% (baseline)
 - 46: AC-9 - Registration rate >95% (challenging conditions)
 - 47: AC-10 - Mean Reprojection Error <1.0 pixels
 **Cross-Cutting (48-50)**:
 - 48: Long flight (3000 images)
 - 49: Degraded satellite data
 - 50: Complete system acceptance validation
 **Chunk-Based Recovery (55-56)**:
 - 55: Chunk rotation recovery (rotation sweeps for chunks)
@@ -109,16 +121,39 @@ Tests using GPS-analyzed test datasets.
 | AC | Requirement | Test Specs | Status |
 |----|-------------|------------|--------|
-| AC-1 | 80% < 50m error | 31, 43, 51, 54 | ✓ Covered |
+| AC-1 | 80% < 50m error | 31, 32, 50, 51, 54 | ✓ Covered |
-| AC-2 | 60% < 20m error | 32, 43, 51, 54 | ✓ Covered |
+| AC-2 | 60% < 20m error | 33, 50, 51, 54 | ✓ Covered |
-| AC-3 | 350m outlier robust | 33, 43, 52, 54 | ✓ Covered |
+| AC-3 | 350m outlier robust | 34, 35, 50, 52, 54 | ✓ Covered |
-| AC-4 | Sharp turn <5% overlap | 34, 43, 53, 54, 55 | ✓ Covered |
+| AC-4 | Sharp turn <5% overlap | 36, 37, 50, 53, 54, 55 | ✓ Covered |
-| AC-5 | Multi-fragment connection | 35, 39, 43, 56 | ✓ Covered |
+| AC-5 | Multi-fragment connection | 39, 50, 56 | ✓ Covered |
-| AC-6 | User input after 3 failures | 36, 43 | ✓ Covered |
+| AC-6 | User input after 3 failures | 40, 50 | ✓ Covered |
-| AC-7 | <5s per image | 37, 43, 51, 54 | ✓ Covered |
+| AC-7 | <5s per image | 41, 42, 50, 51, 54 | ✓ Covered |
-| AC-8 | Real-time + refinement | 38, 43 | ✓ Covered |
+| AC-8 | Real-time + refinement | 43, 44, 50 | ✓ Covered |
-| AC-9 | Registration >95% | 39, 43, 51, 54 | ✓ Covered |
+| AC-9 | Registration >95% | 45, 46, 50, 51, 54 | ✓ Covered |
-| AC-10 | MRE <1.0px | 40, 43 | ✓ Covered |
+| AC-10 | MRE <1.0px | 47, 50 | ✓ Covered |
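As an illustration of how the AC-1 and AC-2 rows could be evaluated, the sketch below computes the share of frames whose position error falls under a threshold. The function names and the sample error list are hypothetical and not part of these specifications. Only the thresholds (80% under 50 m, 60% under 20 m) come from the table above, and reading them as "at least" is an assumption.

```python
from typing import Dict, Sequence


def fraction_within(errors_m: Sequence[float], threshold_m: float) -> float:
    """Return the share of per-frame errors strictly below threshold_m."""
    if not errors_m:
        raise ValueError("errors_m must not be empty")
    return sum(1 for e in errors_m if e < threshold_m) / len(errors_m)


def meets_accuracy_criteria(errors_m: Sequence[float]) -> Dict[str, bool]:
    """Evaluate AC-1 (80% < 50 m) and AC-2 (60% < 20 m) for one flight."""
    return {
        "AC-1": fraction_within(errors_m, 50.0) >= 0.80,
        "AC-2": fraction_within(errors_m, 20.0) >= 0.60,
    }


# Hypothetical per-frame errors in metres, for illustration only.
sample_errors = [5.0, 12.0, 18.0, 19.5, 8.0, 15.0, 35.0, 45.0, 60.0, 120.0]
result = meets_accuracy_criteria(sample_errors)
```

In this sample, 8 of 10 frames are under 50 m and 6 of 10 are under 20 m, so both criteria sit exactly on their thresholds.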
 ## Component to Test Mapping
 | Component | ID | Integration Test |
 |-----------|-----|------------------|
 | Flight API | F01 | 09 |
 | Flight Lifecycle Manager | F02.1 | 11a |
 | Flight Processing Engine | F02.2 | 11b |
 | Flight Database | F03 | 16 |
 | Satellite Data Manager | F04 | 05 |
 | Image Input Pipeline | F05 | 07 |
 | Image Rotation Manager | F06 | 08 |
 | Sequential Visual Odometry | F07 | 01 |
 | Global Place Recognition | F08 | 02 |
 | Metric Refinement | F09 | 03 |
 | Factor Graph Optimizer | F10 | 04 |
 | Failure Recovery Coordinator | F11 | 14 |
 | Route Chunk Manager | F12 | 39, 55, 56 |
 | Coordinate Transformer | F13 | 06 |
 | Result Manager | F14 | 12 |
 | SSE Event Streamer | F15 | 10 |
 | Model Manager | F16 | 13 |
 | Configuration Manager | F17 | 15 |
 ## Test Execution Strategy
@@ -132,14 +167,15 @@ Tests using GPS-analyzed test datasets.
 - Validate end-to-end flows
 - Verify error handling across components
-### Phase 3: Acceptance Testing (31-43)
+### Phase 3: Acceptance Testing (31-50)
 - Validate all acceptance criteria
 - Use GPS-analyzed real data
 - Measure against requirements
-### Phase 4: Special Scenarios (51-54)
+### Phase 4: Special Scenarios (51-56)
 - Test specific GPS-identified situations
 - Validate outliers and sharp turns
 - Chunk-based recovery scenarios
 - Full system validation
 ## Success Criteria Summary
@@ -164,4 +200,3 @@ Tests using GPS-analyzed test datasets.
 - Specifications ready for QA team implementation
 - No code included per requirement
 - Tests cover all components and all acceptance criteria
@@ -1,15 +1,18 @@
-# Integration Test: Factor Graph Optimizer
+# Integration Test: Factor Graph Optimizer (F10)
 ## Summary
-Validate the Factor Graph Optimizer component using GTSAM to fuse sequential relative poses (L1) and absolute GPS anchors (L3) into a globally consistent trajectory.
+Validate the Factor Graph Optimizer component using GTSAM to fuse sequential relative poses (L1) and absolute GPS anchors (L3) into a globally consistent trajectory, with native multi-chunk support for disconnected route segments.
 ## Component Under Test
-**Component**: Factor Graph Optimizer
+**Component**: Factor Graph Optimizer (F10)
-**Location**: `gps_denied_10_factor_graph_optimizer`
+**Interface**: `IFactorGraphOptimizer`
 **Dependencies**:
- Sequential Visual Odometry (L1) - provides relative factors
+- F07 Sequential Visual Odometry - provides relative factors
- Metric Refinement (L3) - provides absolute GPS factors
+- F09 Metric Refinement - provides absolute GPS factors
- Coordinate Transformer
+- F12 Route Chunk Manager - chunk lifecycle (F10 provides low-level graph ops)
 - F13 Coordinate Transformer
 - H02 GSD Calculator - scale resolution
 - H03 Robust Kernels - outlier handling
 - GTSAM library
 ## Detailed Description
@@ -59,6 +62,78 @@ The optimizer is the "brain" of ASTRAL-Next, reconciling potentially conflicting
 - **Input**: Add measurements incrementally (simulate real-time operation)
 - **Expected**: Trajectory should converge smoothly, past poses may be refined
 ### Test Case 7: Create Chunk Subgraph
 - **Input**: flight_id, chunk_id, start_frame_id = 20
 - **Expected**: 
  - create_chunk_subgraph() returns True
  - New subgraph created for chunk
  - Chunk isolated from main trajectory
 ### Test Case 8: Add Relative Factors to Chunk
 - **Chunk**: chunk_2 with frames 20-30
 - **Input**: 10 relative factors from VO
 - **Expected**: 
  - add_relative_factor_to_chunk() returns True for each
  - Factors added to chunk's subgraph only
  - Main trajectory unaffected
 ### Test Case 9: Add Chunk Anchor
 - **Chunk**: chunk_2 (frames 20-30, unanchored)
 - **Input**: GPS anchor at frame 25
 - **Expected**: 
  - add_chunk_anchor() returns True
  - Chunk can now be merged
  - Chunk optimization triggered
 ### Test Case 10: Optimize Chunk
 - **Chunk**: chunk_2 with anchor
 - **Input**: optimize_chunk(chunk_id, iterations=10)
 - **Expected**: 
  - Returns OptimizationResult
  - converged = True
  - Chunk trajectory consistent
  - Other chunks unaffected
 ### Test Case 11: Merge Chunk Subgraphs
 - **Chunks**: chunk_1 (frames 1-10), chunk_2 (frames 20-30, anchored)
 - **Input**: merge_chunk_subgraphs(flight_id, chunk_2, chunk_1, transform)
 - **Expected**: 
  - Returns True
  - chunk_2 merged into chunk_1
  - Sim(3) transform applied correctly
  - Global consistency maintained
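For readers unfamiliar with Sim(3): it is a similarity transform that maps a point p to s·R·p + t, combining a scale s, a rotation R, and a translation t. The sketch below shows that mapping in plain Python. It assumes, for brevity, a rotation about the vertical axis only; the class name `Sim3Yaw` and its fields are hypothetical and do not describe the F10 interface.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Sim3Yaw:
    """Similarity transform restricted to a yaw rotation (illustrative only)."""

    scale: float
    yaw_rad: float
    translation: Point3

    def apply(self, p: Point3) -> Point3:
        """Map p to scale * R(yaw) * p + translation."""
        c, s = math.cos(self.yaw_rad), math.sin(self.yaw_rad)
        x, y, z = p
        # Rotate about the z axis, then scale and translate.
        rx, ry, rz = c * x - s * y, s * x + c * y, z
        tx, ty, tz = self.translation
        return (self.scale * rx + tx, self.scale * ry + ty, self.scale * rz + tz)


# Hypothetical alignment: double the scale, rotate 90 degrees, shift 100 m along x.
transform = Sim3Yaw(scale=2.0, yaw_rad=math.pi / 2, translation=(100.0, 0.0, 0.0))
moved = transform.apply((1.0, 0.0, 0.0))
```

A merge test could apply the expected transform to every pose of the merged chunk in this way and compare the result with the poses reported after merging.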
 ### Test Case 12: Get Chunk Trajectory
 - **Chunk**: chunk_2 with 10 frames
 - **Input**: get_chunk_trajectory(flight_id, chunk_id)
 - **Expected**: 
  - Returns Dict[int, Pose] with 10 frames
  - Poses in chunk's coordinate system
 ### Test Case 13: Optimize Global
 - **Setup**: 3 chunks, 2 anchored, 1 merged
 - **Input**: optimize_global(flight_id, iterations=50)
 - **Expected**: 
  - All chunks optimized together
  - Global consistency achieved
  - Returns OptimizationResult with all frame IDs
 ### Test Case 14: Multi-Flight Isolation
 - **Setup**: 2 flights processing simultaneously
 - **Input**: Add factors to both flights
 - **Expected**: 
  - Each flight's graph isolated
  - No cross-contamination
  - Independent optimization results
 ### Test Case 15: Delete Flight Graph
 - **Setup**: Flight with complex trajectory and chunks
 - **Input**: delete_flight_graph(flight_id)
 - **Expected**: 
  - Returns True
  - All resources cleaned up
  - No memory leaks
 ## Expected Output
 For each test case:
@@ -128,11 +203,35 @@ For each test case:
 - Each incremental update completes in <100ms
 - Final trajectory matches batch optimization (within 5m)
 **Test Cases 7-9 (Chunk Creation & Factors)**:
 - Chunk subgraph created successfully
 - Factors added to correct chunk
 - Chunk anchor enables merging
 **Test Cases 10-11 (Chunk Optimization & Merging)**:
 - Chunk optimizes independently
 - Sim(3) transform applied correctly
 - Merged trajectory globally consistent
 **Test Cases 12-13 (Chunk Queries & Global)**:
 - Chunk trajectory retrieved correctly
 - Global optimization handles all chunks
 **Test Cases 14-15 (Isolation & Cleanup)**:
 - Multi-flight isolation maintained
 - Resource cleanup complete
 ## Maximum Expected Time
 - **Small graph (10 poses)**: < 500ms
 - **Medium graph (30 poses)**: < 1000ms
 - **Incremental update**: < 100ms per new pose
- **Total test suite**: < 30 seconds
+- **Create chunk subgraph**: < 10ms
 - **Add factor to chunk**: < 5ms
 - **Add chunk anchor**: < 50ms
 - **Optimize chunk (10 frames)**: < 100ms
 - **Merge chunks**: < 200ms
 - **Optimize global (50 frames, 3 chunks)**: < 500ms
 - **Total test suite**: < 60 seconds
 ## Test Execution Steps
@@ -173,14 +272,17 @@ For each test case:
 ## Pass/Fail Criteria
 **Overall Test Passes If**:
- At least 5 out of 6 test cases meet their individual success criteria
+- At least 12 out of 15 test cases meet their individual success criteria
 - Test Case 4 (Baseline) must pass (validates AC-1, AC-2, AC-10)
 - Test Cases 7-11 (Chunk operations) must pass (validates multi-chunk architecture)
 - No crashes or numerical instabilities
 - Memory usage remains stable
 **Test Fails If**:
 - Test Case 4 fails to meet AC-1, AC-2, or AC-10
- More than 1 test case fails completely
+- Chunk creation/merging fails
 - Multi-flight isolation violated
 - More than 3 test cases fail completely
 - Optimizer produces NaN or infinite values
 - Processing time exceeds 2x maximum expected time
 - Memory leak detected (>500MB growth)
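The pass/fail rules above can be combined into a single verdict, as in the following sketch. The function and constant names are hypothetical. The sketch also assumes that the "2x maximum expected time" rule is applied to the 60-second total suite budget, and it does not model the NaN or multi-flight isolation checks.

```python
from typing import Dict

REQUIRED_CASES = {4, 7, 8, 9, 10, 11}  # baseline plus chunk operations
MIN_PASSED = 12
TOTAL_CASES = 15


def overall_verdict(
    case_passed: Dict[int, bool],
    elapsed_s: float,
    max_expected_s: float = 60.0,
    memory_growth_mb: float = 0.0,
) -> bool:
    """Combine per-case results with the time and memory limits."""
    if set(case_passed) != set(range(1, TOTAL_CASES + 1)):
        raise ValueError("expected results for test cases 1-15")
    # Mandatory cases: Test Case 4 and Test Cases 7-11.
    if not all(case_passed[c] for c in REQUIRED_CASES):
        return False
    # At least 12 of the 15 cases must pass.
    if sum(case_passed.values()) < MIN_PASSED:
        return False
    # Fail when the run takes more than twice the expected time.
    if elapsed_s > 2 * max_expected_s:
        return False
    # Fail on memory growth above 500 MB.
    if memory_growth_mb > 500.0:
        return False
    return True
```

Checking the mandatory cases first means a run with 14 passing cases still fails when the single failure is a chunk operation.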
@@ -1,395 +0,0 @@
 # Integration Test: Flight Manager
 ## Summary
 Validate the Flight Manager component responsible for managing flight sessions, coordinating image processing, and tracking flight state throughout the ASTRAL-Next pipeline.
 ## Component Under Test
 **Component**: Flight Manager
 **Location**: `gps_denied_02_flight_manager`
 **Dependencies**:
 - Database Layer (flight persistence)
 - Image Input Pipeline
 - Sequential Visual Odometry (L1)
 - Global Place Recognition (L2)
 - Metric Refinement (L3)
 - Factor Graph Optimizer
 - Failure Recovery Coordinator
 - Result Manager
 ## Detailed Description
 This test validates that the Flight Manager can:
 1. Create and initialize new flight sessions
 2. Manage flight lifecycle (created → processing → completed → archived)
 3. Queue and dispatch images for processing
 4. Coordinate between all processing layers (L1, L2, L3, Factor Graph)
 5. Track processing progress and statistics
 6. Handle processing failures and recovery
 7. Manage concurrent flights
 8. Persist flight state across system restarts
 9. Enforce constraints (image ordering, missing frames)
 10. Trigger user input requests when automated processing fails
 The Flight Manager is the central orchestrator coordinating all components in the ASTRAL-Next system.
 ## Input Data
 ### Test Case 1: Create New Flight
 - **Input**:
  - Flight name: "Test_Baseline_Flight"
  - Start GPS: 48.275292, 37.385220
  - Altitude: 400m
  - Camera params: focal_length=25mm, sensor_width=23.5mm, resolution=6252x4168
 - **Expected**: Flight created with unique ID, state = "created"
 ### Test Case 2: Add Images to Flight
 - **Flight**: Test_Baseline_Flight
 - **Images**: AD000001-AD000010 (10 images in order)
 - **Expected**: All images queued, sequence maintained
 ### Test Case 3: Process Flight (Normal)
 - **Flight**: Test_Baseline_Flight with AD000001-AD000010
 - **Expected**:
  - State transitions: created → processing → completed
  - All 10 images processed successfully
  - Results available
 ### Test Case 4: Process with Sharp Turn
 - **Flight**: Test_Sharp_Turn
 - **Images**: AD000042, AD000044, AD000045, AD000046 (skip AD000043)
 - **Expected**:
  - Detect missing frame AD000043
  - L1 tracking fails, L2 recovers location
  - Flight completes successfully
 ### Test Case 5: Process with Outlier
 - **Flight**: Test_Outlier
 - **Images**: AD000045-AD000050 (includes 268m outlier)
 - **Expected**:
  - Outlier detected by Factor Graph
  - Robust cost function handles outlier
  - Other images processed correctly
 ### Test Case 6: Process Long Flight
 - **Flight**: Test_Long_Flight
 - **Images**: All 60 images (AD000001-AD000060)
 - **Expected**:
  - Processing completes without failure
  - Registration rate > 95% (AC-9)
  - Accuracy targets met (AC-1, AC-2)
 ### Test Case 7: Handle Processing Failure
 - **Flight**: Test_Failure
 - **Images**: AD000001-AD000005
 - **Scenario**: Simulate L1, L2, L3 all failing for AD000003
 - **Expected**:
  - Failure detected
  - Failure Recovery Coordinator invoked
  - User input requested (AC-6)
  - Flight state = "awaiting_user_input"
 ### Test Case 8: Apply User Fix
 - **Flight**: Test_Failure (from Test Case 7)
 - **User Input**: GPS for AD000003 = 48.274520, 37.381657
 - **Expected**:
  - User fix accepted
  - Processing resumes
  - Factor Graph incorporates fix
  - Flight completes
 ### Test Case 9: Concurrent Flights
 - **Flights**: 3 flights processing simultaneously
  - Flight A: AD000001-AD000020
  - Flight B: AD000021-AD000040
  - Flight C: AD000041-AD000060
 - **Expected**:
  - All 3 flights process without interference
  - No resource contention issues
  - All complete successfully
 ### Test Case 10: Flight State Persistence
 - **Scenario**:
  - Start flight with AD000001-AD000030
  - Process 15 images
  - Simulate system restart
  - Reload flight state
  - Continue processing remaining images
 - **Expected**: Flight resumes from last checkpoint
 ### Test Case 11: Get Flight Statistics
 - **Flight**: Completed flight with 60 images
 - **Expected Statistics**:
  - total_images: 60
  - processed_images: 60
  - failed_images: 0-2
  - success_rate: > 0.95
  - mean_error_m: < 30
  - processing_time_s: < 300
  - registration_rate: > 0.95
 ### Test Case 12: Archive Flight
 - **Flight**: Completed flight
 - **Expected**:
  - Flight state = "archived"
  - Results still accessible
  - No longer in active processing queue
 ## Expected Output
 For each test case:
 ```json
 {
  "flight_id": "unique_flight_identifier",
  "flight_name": "string",
  "state": "created|processing|completed|failed|awaiting_user_input|archived",
  "created_at": "timestamp",
  "updated_at": "timestamp",
  "statistics": {
    "total_images": <integer>,
    "processed_images": <integer>,
    "failed_images": <integer>,
    "awaiting_user_input": <integer>,
    "success_rate": <float>,
    "mean_error_m": <float>,
    "registration_rate": <float>,
    "processing_time_s": <float>
  },
  "current_image": "string|null",
  "next_action": "process_next|await_user_input|complete|none"
 }
 ```
 ## Success Criteria
 **Test Case 1 (Create)**:
 - Flight created successfully
 - Unique flight_id assigned
 - State = "created"
 - All parameters stored correctly
 **Test Case 2 (Add Images)**:
 - All 10 images queued
 - Sequence numbers assigned (1-10)
 - No duplicates
 **Test Case 3 (Process Normal)**:
 - All images processed
 - State = "completed"
 - success_rate = 1.0
 - Processing time < 60 seconds (10 images)
 **Test Case 4 (Sharp Turn)**:
 - Missing frame detected
 - L2 successfully recovers location for AD000044
 - success_rate ≥ 0.75 (3/4 or better)
 **Test Case 5 (Outlier)**:
 - Outlier detected and handled
 - success_rate ≥ 0.83 (5/6 or better)
 - Non-outlier images have low error
 **Test Case 6 (Long Flight)**:
 - All 60 images processed
 - registration_rate > 0.95 (AC-9)
 - success_rate > 0.80
 - Processing time < 300 seconds (< 5s per image, AC-7)
 **Test Case 7 (Failure)**:
 - Failure detected for AD000003
 - State transitions to "awaiting_user_input"
 - User notification sent via SSE
 - Processing paused appropriately
 **Test Case 8 (User Fix)**:
 - User fix accepted
 - Processing resumes automatically
 - State transitions back to "processing"
 - Flight completes successfully
 **Test Case 9 (Concurrent)**:
 - All 3 flights complete
 - No race conditions
 - No resource exhaustion
 - Each flight independent
 **Test Case 10 (Persistence)**:
 - State saved correctly
 - Reloaded state matches pre-restart
 - Processing continues from checkpoint
 - No image reprocessing
 **Test Case 11 (Statistics)**:
 - All statistics calculated correctly
 - Statistics updated in real-time
 - Match actual processing results
 **Test Case 12 (Archive)**:
 - State = "archived"
 - Flight no longer active
 - Results preserved
 ## Maximum Expected Time
 - **Create flight**: < 500ms
 - **Add 10 images**: < 2 seconds
 - **Process 10 images**: < 60 seconds
 - **Process 60 images**: < 300 seconds (5s per image, AC-7)
 - **Apply user fix**: < 1 second
 - **Get statistics**: < 200ms
 - **Total test suite**: < 600 seconds (10 minutes)
 ## Test Execution Steps
 1. **Setup Phase**:
   a. Initialize Flight Manager
   b. Clear any existing test flights from database
   c. Prepare test images
   d. Configure processing parameters
 2. **Test Case 1 - Create**:
   a. Call create_flight() with parameters
   b. Verify flight_id returned
   c. Check database for flight record
   d. Validate initial state
 3. **Test Case 2 - Add Images**:
   a. Call add_images() with 10 images
   b. Verify all queued
   c. Check sequence assignment
   d. Validate database state
 4. **Test Case 3 - Process Normal**:
   a. Call start_processing()
   b. Monitor state transitions
   c. Wait for completion
   d. Verify results
 5. **Test Case 4 - Sharp Turn**:
   a. Create flight with gap in sequence
   b. Start processing
   c. Monitor L1 failure, L2 recovery
   d. Verify completion
 6. **Test Case 5 - Outlier**:
   a. Process flight with outlier
   b. Monitor Factor Graph handling
   c. Verify outlier detection
   d. Check final results
 7. **Test Case 6 - Long Flight**:
   a. Process all 60 images
   b. Monitor progress continuously
   c. Measure processing time
   d. Validate against AC-1, AC-2, AC-7, AC-9
 8. **Test Case 7 - Failure**:
   a. Simulate triple failure for one image
   b. Verify failure detection
   c. Check state transition
   d. Confirm user input request
 9. **Test Case 8 - User Fix**:
   a. Submit user fix
   b. Verify acceptance
   c. Monitor processing resume
   d. Check incorporation into results
 10. **Test Case 9 - Concurrent**:
    a. Start 3 flights simultaneously
    b. Monitor all 3 in parallel
    c. Verify isolation
    d. Wait for all completions
 11. **Test Case 10 - Persistence**:
    a. Start long flight
    b. Save state mid-processing
    c. Simulate restart (reload from DB)
    d. Continue and complete
 12. **Test Case 11 - Statistics**:
    a. Query statistics after completion
    b. Validate calculations
    c. Compare with ground truth
    d. Check all metrics present
 13. **Test Case 12 - Archive**:
    a. Archive completed flight
    b. Verify state change
    c. Check accessibility
    d. Ensure not in active queue
 ## Pass/Fail Criteria
 **Overall Test Passes If**:
 - All 12 test cases meet their success criteria
 - Flight state machine transitions correctly
 - All images processed (or user input requested)
 - No data loss or corruption
 - Concurrent flights isolated
 - State persistence works correctly
 **Test Fails If**:
 - Any flight enters invalid state
 - Images processed out of order
 - Duplicate processing occurs
 - Statistics incorrect
 - Race conditions in concurrent processing
 - State persistence fails
 - Memory or resource leaks
 ## Additional Validation
 **State Machine Validation**:
 Valid state transitions:
 - created → processing
 - processing → completed
 - processing → awaiting_user_input
 - awaiting_user_input → processing
 - processing → failed
 - completed → archived
 - failed → archived
 Invalid transitions should be rejected.
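The transition list above maps naturally onto a lookup table. The following is a minimal sketch of such a check, assuming the states are plain strings. The names `VALID_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical and do not come from the Flight Manager code.

```python
from typing import Dict, Set

# Allowed target states for each source state, copied from the list above.
VALID_TRANSITIONS: Dict[str, Set[str]] = {
    "created": {"processing"},
    "processing": {"completed", "awaiting_user_input", "failed"},
    "awaiting_user_input": {"processing"},
    "completed": {"archived"},
    "failed": {"archived"},
    "archived": set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the table."""


def transition(current: str, target: str) -> str:
    """Return the new state, or raise InvalidTransition if it is not allowed."""
    if target not in VALID_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

A validation test could iterate over every pair of states and assert that exactly the seven listed transitions succeed.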
 **Image Queue Management**:
 - FIFO processing order
 - No image skipped (unless failure)
 - Sequence numbers maintained
 - Duplicate detection
 **Resource Management**:
 - Memory usage bounded per flight
 - No orphaned resources after completion
 - Cleanup on flight deletion
 - Limits on concurrent flights (if configured)
 **Error Recovery**:
 - Graceful handling of component failures
 - Retry logic for transient errors
 - User intervention for persistent failures
 - Clear error messages
 **Integration with Acceptance Criteria**:
 - **AC-6**: User input mechanism tested (Test Cases 7, 8)
 - **AC-7**: Processing time < 5s per image (Test Case 6)
 - **AC-8**: Real-time results via SSE (monitored during processing)
 - **AC-9**: Registration rate > 95% (Test Case 6)
 **Performance Metrics**:
 - Throughput: images processed per second
 - Latency: time from image queued to result available
 - Overhead: Flight Manager processing vs actual vision processing
 - Scalability: performance with 1, 10, 100 flights
 **Database Operations**:
 - Atomic transactions for state changes
 - Proper indexing for queries
 - No deadlocks in concurrent operations
 - Backup and recovery procedures
 **Configuration Options**:
 Test various configuration options:
 - Max concurrent images processing
 - Retry attempts for failed processing
 - Timeout values
 - Buffer sizes
 - Checkpoint frequency for persistence
@@ -0,0 +1,194 @@
 # Integration Test: Flight Lifecycle Manager (F02.1)
 ## Summary
 Validate the Flight Lifecycle Manager component responsible for flight CRUD operations, system initialization, and API request routing.
 ## Component Under Test
 **Component**: Flight Lifecycle Manager (F02.1)
 **Interface**: `IFlightLifecycleManager`
 **Dependencies**:
 - F03 Flight Database (persistence)
 - F04 Satellite Data Manager (prefetching)
 - F05 Image Input Pipeline (image queuing)
 - F13 Coordinate Transformer (ENU origin)
 - F15 SSE Event Streamer (stream creation)
 - F16 Model Manager (model loading)
 - F17 Configuration Manager (config loading)
 - F02.2 Flight Processing Engine (managed child)
 ## Detailed Description
 This test validates that the Flight Lifecycle Manager can:
 1. Create and initialize new flight sessions
 2. Manage flight lifecycle (created → active → completed)
 3. Validate waypoints and geofences
 4. Queue images for processing (delegates to F05, triggers F02.2)
 5. Handle user fix requests (delegates to F02.2)
 6. Create SSE client streams (delegates to F15)
 7. Initialize system components on startup
 8. Manage F02.2 Processing Engine instances per flight
 The Lifecycle Manager is the external-facing component handling API requests and delegating to internal processing.
 ## Input Data
 ### Test Case 1: Create New Flight
 - **Input**:
  - Flight name: "Test_Baseline_Flight"
  - Start GPS: 48.275292, 37.385220
  - Altitude: 400m
  - Camera params: focal_length=25mm, sensor_width=23.5mm, resolution=6252x4168
 - **Expected**: 
  - Flight created with unique ID
  - F13.set_enu_origin() called with start_gps
  - F04.prefetch_route_corridor() triggered
  - Flight persisted to F03
 ### Test Case 2: Get Flight
 - **Input**: Existing flight_id
 - **Expected**: Flight data returned with current state
 ### Test Case 3: Get Flight State
 - **Input**: Existing flight_id
 - **Expected**: FlightState returned (processing status, current frame, etc.)
 ### Test Case 4: Delete Flight
 - **Input**: Existing flight_id
 - **Expected**: 
  - Flight marked deleted in F03
  - Associated F02.2 engine stopped
  - Resources cleaned up
 ### Test Case 5: Update Flight Status
 - **Input**: flight_id, status update (e.g., pause, resume)
 - **Expected**: Status updated, F02.2 notified if needed
 ### Test Case 6: Update Single Waypoint
 - **Input**: flight_id, waypoint_id, new Waypoint data
 - **Expected**: Waypoint updated in F03
 ### Test Case 7: Batch Update Waypoints
 - **Input**: flight_id, List of updated Waypoints
 - **Expected**: All waypoints updated atomically
 ### Test Case 8: Validate Waypoint
 - **Input**: Waypoint with GPS coordinates
 - **Expected**: ValidationResult with valid/invalid and reason
 ### Test Case 9: Validate Geofence
 - **Input**: Geofence polygon
 - **Expected**: ValidationResult (valid polygon, within limits)
 ### Test Case 10: Queue Images (Delegation)
 - **Input**: flight_id, ImageBatch (10 images)
 - **Expected**: 
  - F05.queue_batch() called
  - F02.2 engine started/triggered
  - BatchQueueResult returned
 ### Test Case 11: Handle User Fix (Delegation)
 - **Input**: flight_id, UserFixRequest (frame_id, GPS anchor)
 - **Expected**: 
  - Active F02.2 engine retrieved
  - engine.apply_user_fix() called
  - UserFixResult returned
 ### Test Case 12: Create Client Stream (Delegation)
 - **Input**: flight_id, client_id
 - **Expected**: 
  - F15.create_stream() called
  - StreamConnection returned
 ### Test Case 13: Convert Object to GPS (Delegation)
 - **Input**: flight_id, frame_id, pixel coordinates
 - **Expected**: 
  - F13.image_object_to_gps() called
  - GPSPoint returned
 ### Test Case 14: System Initialization
 - **Input**: Call initialize_system()
 - **Expected**: 
  - F17.load_config() called
  - F16.load_model() called for all models
  - F03 database initialized
  - F04 cache initialized
  - F08 index loaded
  - Returns True on success
 ### Test Case 15: Get Flight Metadata
 - **Input**: flight_id
 - **Expected**: FlightMetadata (camera params, altitude, waypoint count, etc.)
 ### Test Case 16: Validate Flight Continuity
 - **Input**: List of Waypoints
 - **Expected**: ValidationResult (continuous path, no gaps > threshold)
 ## Expected Output
 For each test case:
 ```json
 {
  "flight_id": "unique_flight_identifier",
  "flight_name": "string",
  "state": "created|active|completed|paused|deleted",
  "created_at": "timestamp",
  "updated_at": "timestamp",
  "enu_origin": {
    "latitude": <float>,
    "longitude": <float>
  },
  "waypoint_count": <integer>,
  "has_active_engine": <boolean>
 }
 ```
 ## Success Criteria
 **Test Cases 1-5 (Flight CRUD)**:
 - Flight created/retrieved/updated/deleted correctly
 - State transitions valid
 - Database persistence verified
 **Test Cases 6-9 (Validation)**:
 - Waypoint/geofence validation correct
 - Invalid inputs rejected with reason
 - Edge cases handled
 **Test Cases 10-13 (Delegation)**:
 - Correct components called
 - Parameters passed correctly
 - Results returned correctly
 **Test Case 14 (Initialization)**:
 - All components initialized in correct order
 - Failures handled gracefully
 - Startup time < 30 seconds
 **Test Cases 15-16 (Metadata/Continuity)**:
 - Metadata accurate
 - Continuity validation correct
 ## Maximum Expected Time
 - Create flight: < 500ms (excluding prefetch)
 - Get/Update flight: < 100ms
 - Delete flight: < 500ms
 - Queue images: < 2 seconds (10 images)
 - User fix delegation: < 100ms
 - System initialization: < 30 seconds
 - Total test suite: < 120 seconds
 ## Pass/Fail Criteria
 **Overall Test Passes If**:
 - All 16 test cases pass
 - CRUD operations work correctly
 - Delegation to child components works
 - System initialization completes
 - No resource leaks
 **Test Fails If**:
 - Flight CRUD fails
 - Delegation fails to reach correct component
 - System initialization fails
 - Invalid state transitions allowed
 - Resource cleanup fails on delete
@@ -0,0 +1,241 @@
 # Integration Test: Flight Processing Engine (F02.2)
 ## Summary
 Validate the Flight Processing Engine component responsible for the main processing loop, frame-by-frame orchestration, recovery coordination, and chunk management.
 ## Component Under Test
 **Component**: Flight Processing Engine (F02.2)
 **Interface**: `IFlightProcessingEngine`
 **Dependencies**:
 - F05 Image Input Pipeline (image source)
 - F06 Image Rotation Manager (pre-processing)
 - F07 Sequential Visual Odometry (motion estimation)
 - F09 Metric Refinement (satellite alignment)
 - F10 Factor Graph Optimizer (state estimation)
 - F11 Failure Recovery Coordinator (recovery logic)
 - F12 Route Chunk Manager (chunk state)
 - F14 Result Manager (saving results)
 - F15 SSE Event Streamer (real-time updates)
 ## Detailed Description
 This test validates that the Flight Processing Engine can:
 1. Run the main processing loop (Image → VO → Graph → Result)
 2. Manage flight status (Processing, Blocked, Recovering, Completed)
 3. Coordinate chunk lifecycle with F12
 4. Handle tracking loss and delegate to F11
 5. Apply user fixes and resume processing
 6. Publish results via F14/F15
 7. Manage background chunk matching tasks
 8. Handle concurrent processing gracefully
 The Processing Engine runs one background thread per active flight.
 ## Input Data
 ### Test Case 1: Start Processing
 - **Flight**: Test_Baseline_Flight with 10 queued images
 - **Action**: Call start_processing(flight_id)
 - **Expected**: 
  - Background processing thread started
  - First image retrieved from F05
  - Processing loop begins
 ### Test Case 2: Stop Processing
 - **Flight**: Active flight with processing in progress
 - **Action**: Call stop_processing(flight_id)
 - **Expected**: 
  - Processing loop stopped gracefully
  - Current frame completed or cancelled
  - State saved
 ### Test Case 3: Process Single Frame (Normal)
 - **Input**: Single frame with good tracking
 - **Expected**: 
  - F06.requires_rotation_sweep() checked
  - F07.compute_relative_pose() called
  - F12.add_frame_to_chunk() called
  - F10.add_relative_factor() called
  - F10.optimize_chunk() called
  - F14.update_frame_result() called
  - SSE event sent
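
The call order expected in Test Case 3 could be asserted with `unittest.mock`. This is a minimal sketch, not the project's test code: `process_frame` is a hypothetical stand-in for the engine's per-frame step, the argument lists are assumed, and `send_frame_result` is an assumed name for the SSE call (the spec only says "SSE event sent"). The F06/F07/F10/F12/F14 method names are taken from the list above.

```python
from unittest.mock import MagicMock


def process_frame(frame, f06, f07, f10, f12, f14, f15):
    """Hypothetical stand-in for the engine's normal-tracking frame step."""
    if f06.requires_rotation_sweep(frame):
        raise NotImplementedError("rotation sweep is covered by Test Case 4")
    pose = f07.compute_relative_pose(frame)
    f12.add_frame_to_chunk(frame)
    f10.add_relative_factor(pose)
    f10.optimize_chunk()
    f14.update_frame_result(frame)
    f15.send_frame_result(frame)  # assumed method name for the SSE event


def recorded_call_order(frame="frame-1"):
    """Run one frame against mocks and return the ordered list of call names."""
    parent = MagicMock()
    parent.f06.requires_rotation_sweep.return_value = False
    process_frame(frame, parent.f06, parent.f07, parent.f10,
                  parent.f12, parent.f14, parent.f15)
    # Calls on child mocks are recorded on the parent with dotted names.
    return [recorded[0] for recorded in parent.mock_calls]
```

Hanging all component mocks off one parent mock keeps a single ordered call log, so the test can compare the whole sequence rather than checking each component separately.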
 ### Test Case 4: Process Frame (First Frame / Sharp Turn)
 - **Input**: First frame or frame after sharp turn
 - **Expected**: 
  - F06.requires_rotation_sweep() returns True
  - F06.rotate_image_360() called (12 rotations)
  - F09.align_to_satellite() called for each rotation
  - Best rotation selected
  - Heading updated
 ### Test Case 5: Process Frame (Tracking Lost)
 - **Input**: Frame with low VO confidence
 - **Expected**: 
  - F11.check_confidence() returns LOST
  - F11.create_chunk_on_tracking_loss() called
  - New chunk created proactively
  - handle_tracking_loss() invoked
 ### Test Case 6: Handle Tracking Loss (Progressive Search)
 - **Input**: Frame with tracking lost, recoverable
 - **Expected**: 
  - F11.start_search() called
  - F11.try_current_grid() called iteratively
  - Grid expansion (1→4→9→16→25)
  - Match found, F11.mark_found() called
  - Processing continues
 ### Test Case 7: Handle Tracking Loss (User Input Needed)
 - **Input**: Frame with tracking lost, not recoverable
 - **Expected**: 
  - Progressive search exhausted (25 tiles)
  - F11.create_user_input_request() called
  - Engine receives UserInputRequest
  - F15.send_user_input_request() called
  - Status set to BLOCKED
  - Processing paused
 ### Test Case 8: Apply User Fix
 - **Input**: UserFixRequest with GPS anchor
 - **Action**: Call apply_user_fix(flight_id, fix_data)
 - **Expected**: 
  - F11.apply_user_anchor() called
  - Anchor applied to factor graph
  - Status set to PROCESSING
  - Processing loop resumes
 ### Test Case 9: Get Active Chunk
 - **Flight**: Active flight with chunks
 - **Action**: Call get_active_chunk(flight_id)
 - **Expected**: 
  - F12.get_active_chunk() called
  - Returns current active chunk or None
 ### Test Case 10: Create New Chunk
 - **Input**: Tracking loss detected
 - **Action**: Call create_new_chunk(flight_id, frame_id)
 - **Expected**: 
  - F12.create_chunk() called
  - New chunk created in factor graph
  - Returns ChunkHandle
 ### Test Case 11: Process Flight (Full - Normal)
 - **Flight**: 30 images (AD000001-030)
 - **Expected**: 
  - All 30 images processed
  - Status transitions: Processing → Completed
  - Results published for all frames
  - Processing time < 150 seconds (5s per image)
 ### Test Case 12: Process Flight (With Sharp Turn)
 - **Flight**: AD000042, AD000044, AD000045, AD000046 (skip AD000043)
 - **Expected**: 
  - Tracking lost at AD000044
  - New chunk created
  - Recovery succeeds (L2/L3)
  - Flight completes
 ### Test Case 13: Process Flight (With Outlier)
 - **Flight**: AD000045-050 (includes 268m outlier)
 - **Expected**: 
  - Outlier detected by factor graph
  - Robust kernel handles outlier
  - Other images processed correctly
 ### Test Case 14: Process Flight (Long)
 - **Flight**: All 60 images (AD000001-060)
 - **Expected**: 
  - Processing completes
  - Registration rate > 95% (AC-9)
  - Processing time < 300 seconds (AC-7)
 ### Test Case 15: Background Chunk Matching
 - **Flight**: Flight with multiple unanchored chunks
 - **Expected**: 
  - Background task processes chunks
  - F11.process_unanchored_chunks() called periodically
  - Chunks matched and merged asynchronously
  - Frame processing not blocked
 ### Test Case 16: State Persistence and Recovery
 - **Scenario**: 
  - Process 15 frames
  - Simulate restart
  - Resume processing
 - **Expected**: 
  - State saved to F03 before restart
  - State restored on resume
  - Processing continues from frame 16
 ## Expected Output
 For each frame processed:
 ```json
 {
  "flight_id": "string",
  "frame_id": <integer>,
  "status": "processed|failed|skipped|blocked",
  "gps": {
    "latitude": <float>,
    "longitude": <float>
  },
  "confidence": <float>,
  "chunk_id": "string",
  "processing_time_ms": <float>
 }
 ```
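
A per-frame record of this shape could be checked with a small validator. The sketch below is illustrative only: `validate_frame_result` is a hypothetical helper, and it assumes every field in the template is required for every status, which the spec does not state.

```python
ALLOWED_STATUSES = {"processed", "failed", "skipped", "blocked"}


def validate_frame_result(result):
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    expected_types = {
        "flight_id": str,
        "frame_id": int,
        "status": str,
        "gps": dict,
        "confidence": (int, float),
        "chunk_id": str,
        "processing_time_ms": (int, float),
    }
    for key, expected in expected_types.items():
        if key not in result:
            problems.append(f"missing field: {key}")
        elif not isinstance(result[key], expected):
            problems.append(f"wrong type for field: {key}")
    if result.get("status") not in ALLOWED_STATUSES:
        problems.append("status not in allowed set")
    gps = result.get("gps")
    if isinstance(gps, dict):
        for axis in ("latitude", "longitude"):
            if not isinstance(gps.get(axis), (int, float)):
                problems.append(f"gps.{axis} must be a number")
    return problems
```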
 ## Success Criteria
 **Test Cases 1-2 (Start/Stop)**:
 - Processing starts/stops correctly
 - No resource leaks
 - Graceful shutdown
 **Test Cases 3-5 (Frame Processing)**:
 - Correct components called in order
 - State updates correctly
 - Results published
 **Test Cases 6-8 (Recovery)**:
 - Progressive search works
 - User input flow works
 - Recovery successful
 **Test Cases 9-10 (Chunk Management)**:
 - Chunks created/managed correctly
 - F12 integration works
 **Test Cases 11-14 (Full Flights)**:
 - All acceptance criteria met
 - Processing completes successfully
 **Test Cases 15-16 (Background/Recovery)**:
 - Background tasks work
 - State persistence works
 ## Maximum Expected Time
 - Start/stop processing: < 500ms
 - Process single frame: < 5 seconds (AC-7)
 - Handle tracking loss: < 2 seconds
 - Apply user fix: < 1 second
 - Process 30 images: < 150 seconds
 - Process 60 images: < 300 seconds
 - Total test suite: < 600 seconds
 ## Pass/Fail Criteria
 **Overall Test Passes If**:
 - All 16 test cases pass
 - Processing loop works correctly
 - Recovery mechanisms work
 - Chunk management works
 - Performance targets met
 **Test Fails If**:
 - Processing loop crashes
 - Recovery fails when it should succeed
 - User input not requested when needed
 - Processing time exceeds 5s per image
 - State persistence fails
@@ -1,93 +1,308 @@
-# Integration Test: Failure Recovery Coordinator
+# Integration Test: Failure Recovery Coordinator (F11)
 ## Summary
-Validate the Failure Recovery Coordinator that detects processing failures and coordinates recovery strategies including user intervention per AC-6.
+Validate the Failure Recovery Coordinator that detects processing failures and coordinates recovery strategies. F11 is a pure logic component that returns status objects - it does NOT directly emit events or communicate with clients.
 ## Component Under Test
-**Component**: Failure Recovery Coordinator
+**Component**: Failure Recovery Coordinator (F11)
-**Location**: `gps_denied_11_failure_recovery_coordinator`
+**Interface**: `IFailureRecoveryCoordinator`
-**Dependencies**: Flight Manager, All vision layers (L1, L2, L3), Factor Graph Optimizer, SSE Event Streamer
+**Dependencies**:
 - F04 Satellite Data Manager (search grids)
 - F06 Image Rotation Manager (rotation sweeps)
 - F08 Global Place Recognition (candidate retrieval)
 - F09 Metric Refinement (alignment)
 - F10 Factor Graph Optimizer (anchor application)
 - F12 Route Chunk Manager (chunk operations)
 ## Architecture Pattern
 **Pure Logic Component**: F11 coordinates recovery strategies but delegates execution and communication.
 - **NO Events**: Returns status objects or booleans
 - **Caller Responsibility**: F02.2 decides state transitions based on F11 returns
 - **Chunk Orchestration**: Coordinates F12 and F10 operations during recovery
 ## Detailed Description
 Per AC-6: "In case of being absolutely incapable of determining the system to determine next, second next, and third next images GPS, by any means (these 20% of the route), then it should ask the user for input for the next image."
 Tests that the coordinator can:
-1. Detect when L1, L2, and L3 all fail for an image
+1. Assess tracking confidence from VO and LiteSAM results
-2. Track consecutive failures (next, second next, third next)
+2. Detect tracking loss conditions
-3. Request user input after 3 consecutive failures
+3. Coordinate progressive tile search (1→4→9→16→25)
-4. Apply user-provided GPS fixes
+4. Create user input request objects (NOT send them)
-5. Resume processing after user intervention
+5. Apply user-provided GPS anchors
-6. Handle user input rejection/timeout
+6. Proactively create chunks on tracking loss
-7. Provide failure diagnostics to user
+7. Coordinate chunk semantic matching
 8. Coordinate chunk LiteSAM matching with rotation sweeps
 9. Merge chunks to main trajectory
 ## Input Data
-### Test Case 1: Single Image Failure (L1 only)
+### Test Case 1: Check Confidence (Good)
- **Images**: AD000001-AD000005
+- **Input**: VO result with 80 inliers, LiteSAM confidence 0.85
- **Simulate**: L1 fails for AD000003, L2 succeeds
+- **Expected**: 
- **Expected**: L2 recovers, no user input needed
+  - Returns ConfidenceAssessment
  - tracking_status = "good"
  - overall_confidence > 0.7
-### Test Case 2: Triple Layer Failure (One Image)
+### Test Case 2: Check Confidence (Degraded)
- **Images**: AD000001-AD000005
+- **Input**: VO result with 35 inliers, LiteSAM confidence 0.5
- **Simulate**: L1, L2, L3 all fail for AD000003
+- **Expected**: 
- **Expected**: Mark as failed, continue to AD000004
+  - Returns ConfidenceAssessment
  - tracking_status = "degraded"
  - overall_confidence in the range 0.3-0.7
-### Test Case 3: Three Consecutive Failures (AC-6)
+### Test Case 3: Check Confidence (Lost)
- **Images**: AD000001-AD000010
+- **Input**: VO result with 10 inliers, no LiteSAM result
- **Simulate**: All layers fail for AD000003, AD000004, AD000005
+- **Expected**: 
- **Expected**: After 3rd failure, request user input via SSE
+  - Returns ConfidenceAssessment
  - tracking_status = "lost"
  - overall_confidence < 0.3
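
The three confidence bands in Test Cases 1-3 can be summarized as a threshold check. This is a sketch under assumptions: `classify_tracking` is a hypothetical helper, the 0.7 and 0.3 cut-offs come from the expected values above, and mapping the exact boundary values to "degraded" is an assumption. How F11 derives `overall_confidence` from inlier counts and LiteSAM confidence is not specified here and is not modeled.

```python
def classify_tracking(overall_confidence, good_above=0.7, lost_below=0.3):
    """Map an overall confidence score to a tracking_status string."""
    if overall_confidence > good_above:
        return "good"
    if overall_confidence < lost_below:
        return "lost"
    # Everything in [lost_below, good_above] is treated as degraded (assumed).
    return "degraded"
```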
-### Test Case 4: User Provides Fix
+### Test Case 4: Detect Tracking Loss
- **Context**: Test Case 3, user input requested
+- **Input**: ConfidenceAssessment with tracking_status = "lost"
- **User Input**: GPS for AD000005 = 48.273997, 37.379828
+- **Expected**: Returns True (tracking lost)
 - **Expected**: Fix applied, processing resumes with AD000006
-### Test Case 5: User Input Timeout
+### Test Case 5: Start Search
- **Context**: User input requested, no response for 5 minutes
+- **Input**: flight_id, frame_id, estimated_gps
- **Expected**: Continue without fix, mark images as user_input_pending
+- **Expected**: 
  - Returns SearchSession
  - session.current_grid_size = 1
  - session.found = False
  - session.exhausted = False
-### Test Case 6: Intermittent Failures
+### Test Case 6: Expand Search Radius
- **Images**: AD000001-AD000020
+- **Input**: SearchSession with grid_size = 1
- **Simulate**: Failures for AD000003, AD000007, AD000012 (not consecutive)
+- **Action**: Call expand_search_radius()
- **Expected**: No user input requested, each recovered or skipped
+- **Expected**: 
  - Returns List[TileCoords] (3 new tiles for 2x2 grid)
  - session.current_grid_size = 4
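
The "3 new tiles for 2x2 grid" figure follows from the grid sequence 1→4→9→16→25: each expansion adds the difference between consecutive square grids. A small sketch of that arithmetic, using hypothetical helper names:

```python
GRID_SIZES = [1, 4, 9, 16, 25]  # 1x1 through 5x5 search grids


def new_tiles_per_expansion(sizes=GRID_SIZES):
    """Number of tiles added by each expansion step (4-1, 9-4, ...)."""
    return [larger - smaller for smaller, larger in zip(sizes, sizes[1:])]


def next_grid_size(current, sizes=GRID_SIZES):
    """Return the next grid size, or None once the largest grid was tried."""
    index = sizes.index(current)
    return sizes[index + 1] if index + 1 < len(sizes) else None
```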
-### Test Case 7: Failure After User Fix
+### Test Case 7: Try Current Grid (Match Found)
- **Context**: User fix applied, next image also fails
+- **Input**: SearchSession, tiles dict with matching tile
- **Expected**: Request another user fix
+- **Expected**: 
  - Returns AlignmentResult
  - result.matched = True
  - result.gps populated
-### Test Case 8: Batch User Input
+### Test Case 8: Try Current Grid (No Match)
- **Context**: Multiple failures needing user input
+- **Input**: SearchSession, tiles dict with no matching tile
- **Expected**: Request fixes for all failed images, process batch
+- **Expected**: 
  - Returns None
  - Caller should call expand_search_radius()
 ### Test Case 9: Mark Found
 - **Input**: SearchSession, AlignmentResult
 - **Expected**: 
  - Returns True
  - session.found = True
 ### Test Case 10: Get Search Status
 - **Input**: SearchSession
 - **Expected**: 
  - Returns SearchStatus
  - Contains current_grid_size, found, exhausted
 ### Test Case 11: Create User Input Request
 - **Input**: flight_id, frame_id, candidate_tiles
 - **Expected**: 
  - Returns UserInputRequest object (NOT sent)
  - Contains request_id, flight_id, frame_id
  - Contains uav_image, candidate_tiles, message
  - **NOTE**: Caller (F02.2) sends to F15
 ### Test Case 12: Apply User Anchor
 - **Input**: flight_id, frame_id, UserAnchor with GPS
 - **Expected**: 
  - Calls F10.add_absolute_factor() with high confidence
  - Returns True if successful
  - **NOTE**: Caller (F02.2) updates state and publishes result
 ### Test Case 13: Create Chunk on Tracking Loss
 - **Input**: flight_id, frame_id
 - **Expected**: 
  - Calls F12.create_chunk()
  - Returns ChunkHandle
  - chunk.is_active = True
  - chunk.has_anchor = False
  - chunk.matching_status = "unanchored"
 ### Test Case 14: Try Chunk Semantic Matching
 - **Input**: chunk_id (chunk with 10 frames)
 - **Expected**: 
  - Gets chunk images via F12
  - Calls F08.retrieve_candidate_tiles_for_chunk()
  - Returns List[TileCandidate] or None
 ### Test Case 15: Try Chunk LiteSAM Matching
 - **Input**: chunk_id, candidate_tiles
 - **Expected**: 
  - Gets chunk images via F12
  - Calls F06.try_chunk_rotation_steps() (12 rotations)
  - Returns ChunkAlignmentResult or None
  - Result contains rotation_angle, chunk_center_gps, transform
 ### Test Case 16: Merge Chunk to Trajectory
 - **Input**: flight_id, chunk_id, ChunkAlignmentResult
 - **Expected**: 
  - Calls F12.mark_chunk_anchored()
  - Calls F12.merge_chunks()
  - Returns True if successful
  - **NOTE**: Caller (F02.2) coordinates result updates
 ### Test Case 17: Process Unanchored Chunks (Logic)
 - **Input**: flight_id with 2 unanchored chunks
 - **Expected**: 
  - Calls F12.get_chunks_for_matching()
  - For each ready chunk:
    - try_chunk_semantic_matching()
    - try_chunk_litesam_matching()
    - merge_chunk_to_trajectory() if match found
 ### Test Case 18: Progressive Search Full Flow
 - **Scenario**: 
  - start_search() → grid_size=1
  - try_current_grid() → None
  - expand_search_radius() → grid_size=4
  - try_current_grid() → None
  - expand_search_radius() → grid_size=9
  - try_current_grid() → AlignmentResult
  - mark_found() → success
 - **Expected**: Search succeeds at 3x3 grid
 ### Test Case 19: Search Exhaustion Flow
 - **Scenario**: 
  - start_search()
  - try all grids: 1→4→9→16→25, all fail
  - create_user_input_request()
 - **Expected**: 
  - Returns UserInputRequest
  - session.exhausted = True
  - **NOTE**: Caller sends request, waits for user fix
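
The two flows in Test Cases 18 and 19 differ only in whether some grid yields a match. A minimal driver sketch, assuming a hypothetical `try_grid(size)` callable that returns an alignment result or `None`; in the real system the caller (F02.2) would react to exhaustion by asking F11 for a `UserInputRequest`.

```python
def run_progressive_search(try_grid, grid_sizes=(1, 4, 9, 16, 25)):
    """Try each grid in order.

    Returns (grid_size, result) for the first match, or (None, None) when
    every grid was tried without a match (search exhausted).
    """
    for size in grid_sizes:
        result = try_grid(size)
        if result is not None:
            return size, result
    return None, None


def match_at(target_size):
    """Build a fake try_grid that only matches at target_size (for demos)."""
    def try_grid(size):
        return {"matched": True} if size == target_size else None
    return try_grid
```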
 ### Test Case 20: Chunk Recovery Full Flow
 - **Scenario**: 
  - create_chunk_on_tracking_loss() → chunk created
  - Processing continues in chunk
  - try_chunk_semantic_matching() → candidates found
  - try_chunk_litesam_matching() → match at 90° rotation
  - merge_chunk_to_trajectory() → success
 - **Expected**: Chunk anchored and merged without user input
 ## Expected Output
 ### ConfidenceAssessment
 ```json
 {
-  "failure_detected": true,
+  "overall_confidence": 0.85,
-  "failed_image_id": "AD000003",
+  "vo_confidence": 0.9,
-  "failure_layers": ["L1", "L2", "L3"],
+  "litesam_confidence": 0.8,
-  "consecutive_failures": 3,
+  "inlier_count": 80,
-  "action": "request_user_input|skip|continue",
+  "tracking_status": "good|degraded|lost"
-  "user_input_requested": true,
+}
-  "user_fix_received": true/false,
+```
-  "recovery_strategy": "string",
+
-  "timestamp": "ISO8601"
+### SearchSession
 ```json
 {
  "session_id": "string",
  "flight_id": "string",
  "frame_id": 42,
  "center_gps": {"latitude": 48.275, "longitude": 37.385},
  "current_grid_size": 4,
  "max_grid_size": 25,
  "found": false,
  "exhausted": false
 }
 ```
 ### UserInputRequest
 ```json
 {
  "request_id": "string",
  "flight_id": "string",
  "frame_id": 42,
  "candidate_tiles": [...],
  "message": "Please provide GPS location for this frame"
 }
 ```
 ### ChunkAlignmentResult
 ```json
 {
  "matched": true,
  "chunk_id": "string",
  "chunk_center_gps": {"latitude": 48.275, "longitude": 37.385},
  "rotation_angle": 90.0,
  "confidence": 0.85,
  "inlier_count": 150,
  "transform": {...}
 }
 ```
 ## Success Criteria
- **Test Cases 1-2**: Failures handled, no user input
+
- **Test Case 3**: User input requested after 3rd consecutive failure (AC-6)
+**Test Cases 1-4 (Confidence)**:
- **Test Case 4**: User fix applied, processing continues
+- Confidence assessment accurate
- **Test Case 5**: Timeout handled gracefully
+- Thresholds correctly applied
- **Test Cases 6-8**: Appropriate recovery strategies applied
+- Tracking loss detected correctly
 **Test Cases 5-10 (Progressive Search)**:
 - Search session management works
 - Grid expansion correct (1→4→9→16→25)
 - Match detection works
 **Test Cases 11-12 (User Input)**:
 - UserInputRequest object created correctly (not sent)
 - User anchor applied correctly
 **Test Cases 13-17 (Chunk Recovery)**:
 - Proactive chunk creation works
 - Chunk semantic matching works
 - Chunk LiteSAM matching with rotation works
 - Chunk merging works
 **Test Cases 18-20 (Full Flows)**:
 - Progressive search flow completes
 - Search exhaustion flow completes
 - Chunk recovery flow completes
 ## Maximum Expected Time
- Failure detection: < 100ms
+- check_confidence: < 10ms
- User input request: < 500ms
+- detect_tracking_loss: < 5ms
- Apply user fix: < 1 second
+- Progressive search (25 tiles): < 1.5s total
 - create_user_input_request: < 100ms
 - apply_user_anchor: < 500ms
 - Chunk semantic matching: < 2s
 - Chunk LiteSAM matching (12 rotations): < 10s
 - Total test suite: < 120 seconds
 ## Pass/Fail Criteria
 **Passes If**: AC-6 requirement met (user input after 3 consecutive failures), all recovery strategies work
 **Fails If**: User input not requested when needed, or processing deadlocks
 **Overall Test Passes If**:
 - All 20 test cases pass
 - Confidence assessment accurate
 - Progressive search works
 - User input request created correctly (not sent)
 - Chunk recovery works
 - No direct event emission (pure logic)
 **Test Fails If**:
 - Tracking loss not detected when it should be
 - Progressive search fails to expand correctly
 - User input request not created when needed
 - F11 directly emits events (violates architecture)
 - Chunk recovery fails
 - Performance exceeds targets
 ## Architecture Validation
 **F11 Must NOT**:
 - Call F15 directly (SSE events)
 - Emit events to clients
 - Manage processing state
 - Control processing flow
 **F11 Must**:
 - Return status objects for all operations
 - Let caller (F02.2) decide next actions
 - Coordinate with F10, F12 for chunk operations
 - Be testable in isolation (no I/O dependencies)
@@ -8,12 +8,14 @@ Validate Acceptance Criterion 5 (partial): "System should try to operate when UA
 ## Preconditions
 1. System with "Atlas" multi-map capability (factor graph with native chunk support)
-2. F12 Route Chunk Manager functional
+2. F02.2 Flight Processing Engine running
-3. F10 Factor Graph Optimizer with multi-chunk support
+3. F11 Failure Recovery Coordinator (chunk orchestration)
-4. L2 global place recognition functional (chunk semantic matching)
+4. F12 Route Chunk Manager functional (chunk lifecycle)
-5. L3 metric refinement functional (chunk LiteSAM matching)
+5. F10 Factor Graph Optimizer with multi-chunk support (subgraph operations)
-6. Geodetic map-merging logic implemented (Sim(3) transform)
+6. F08 Global Place Recognition (chunk semantic matching via `retrieve_candidate_tiles_for_chunk()`)
-7. Test dataset: Simulate 3 disconnected route fragments
+7. F09 Metric Refinement (chunk LiteSAM matching)
 8. Geodetic map-merging logic implemented (Sim(3) transform via F10.merge_chunk_subgraphs())
 9. Test dataset: Simulate 3 disconnected route fragments
 ## Test Description
 Test the system's ability to handle completely disconnected route segments (no overlap between segments) and eventually connect them into a coherent trajectory using global GPS anchors.
@@ -141,24 +143,24 @@ Processing Mode: Multi-Map Atlas
 ## Architecture Elements
 **Multi-Map "Atlas"** (per solution document):
- Each disconnected segment gets own local map
+- Each disconnected segment gets its own local map via F12.create_chunk()
- Local maps independently optimized
+- Local maps independently optimized via F10.optimize_chunk()
- GPS anchors provide global reference
+- GPS anchors provide global reference via F10.add_chunk_anchor()
- Geodetic merging aligns all maps
+- Geodetic merging aligns all maps via F10.merge_chunk_subgraphs()
 **Recovery Mechanisms**:
- **Proactive chunk creation** on tracking loss (immediate, not reactive)
+- **Proactive chunk creation** via F11.create_chunk_on_tracking_loss() (immediate, not reactive)
- Chunk semantic matching (aggregate DINOv2) finds location for chunk
+- Chunk semantic matching via F08.retrieve_candidate_tiles_for_chunk() (aggregate DINOv2)
- Chunk LiteSAM matching (with rotation sweeps) refines GPS anchor
+- Chunk LiteSAM matching via F06.try_chunk_rotation_steps() + F09.align_chunk_to_satellite()
- Factor graph creates new chunk subgraph
+- F10 creates new chunk subgraph
- Sim(3) transform merges chunks into global trajectory
+- Sim(3) transform merges chunks via F12.merge_chunks() → F10.merge_chunk_subgraphs()
 **Fragment Detection**:
 - Large displacement (> 500m) from last image
- Low/zero overlap
+- Low/zero overlap (F07 VO fails)
 - L1 failure triggers **proactive** new chunk creation
 - Chunks processed independently with local optimization
- Multiple chunks can exist simultaneously
+- Multiple chunks can exist simultaneously (F10 supports multi-chunk factor graph)
 ## Notes
 - AC-5 describes a realistic operational scenario (multiple turns, disconnected segments)
@@ -1,42 +1,254 @@
 # Acceptance Test: AC-6 - User Input Recovery
 ## Summary
-Validate Acceptance Criterion 6: "In case of being absolutely incapable of determining the system to determine next, second next, and third next images GPS, by any means (these 20% of the route), then it should ask the user for input for the next image."
+Validate Acceptance Criterion 6: "In case of being absolutely incapable of determining the system to determine next, second next, and third next images GPS, by any means (these 20% of the route), then it should ask the user for input for the next image, so that the user can specify the location."
 ## Linked Acceptance Criteria
 **AC-6**: User input requested after 3 consecutive failures
 ## Preconditions
 1. ASTRAL-Next system operational
 2. F11 Failure Recovery Coordinator configured with failure threshold = 3
 3. F15 SSE Event Streamer functional
 4. F01 Flight API accepting user-fix endpoint
 5. F10 Factor Graph Optimizer ready to accept high-confidence anchors
 6. Test environment configured to simulate L1/L2/L3 failures
 7. SSE client connected and monitoring events
 ## Test Data
 - **Dataset**: AD000001-060 (60 images)
 - **Failure Injection**: Configure mock failures for specific frames
 - **Ground Truth**: coordinates.csv for validation
 ## Test Steps
-### Step 1: Simulate Triple Failure
+### Step 1: Setup Failure Injection
- **Action**: Process flight where L1, L2, L3 all fail for AD000003, AD000004, AD000005
+- **Action**: Configure system to fail L1, L2, L3 for frames AD000020, AD000021, AD000022
- **Expected Result**: After 3rd consecutive failure, system requests user input via SSE event
+- **Expected Result**: 
  - L1 (SuperPoint+LightGlue): Returns match_count < 10
  - L2 (AnyLoc): Returns confidence < 0.3
  - L3 (LiteSAM): Returns alignment_score < 0.2
-### Step 2: User Receives Notification
+### Step 2: Process Normal Frames (1-19)
- **Action**: SSE client receives "user_input_required" event
+- **Action**: Process AD000001-AD000019 normally
- **Expected Result**: Event includes image needing fix (AD000005), top-3 satellite tiles for reference
+- **Expected Result**:
  - All 19 frames processed successfully
  - No user input requests
  - SSE events: 19 × `frame_processed`
-### Step 3: User Provides GPS Fix
+### Step 3: First Consecutive Failure
- **Action**: User submits GPS for AD000005: POST /flights/{flightId}/user-fix
+- **Action**: Process AD000020
- **Payload**: `{"frame_id": 5, "uav_pixel": [3126, 2084], "satellite_gps": {"lat": 48.273997, "lon": 37.379828}}`
+- **Expected Result**:
- **Expected Result**: Fix accepted, processing resumes, SSE event "user_fix_applied" sent
+  - L1 fails (low match count)
  - L2 fallback fails (low confidence)
  - L3 fallback fails (low alignment)
  - System increments failure_count to 1
  - SSE event: `frame_processing_failed` with frame_id=20
  - **No user input request yet**
-### Step 4: System Incorporates Fix
+### Step 4: Second Consecutive Failure
- **Action**: Factor graph adds user fix as high-confidence GPS anchor
+- **Action**: Process AD000021
- **Expected Result**: Trajectory refined incorporating user input
+- **Expected Result**:
  - All layers fail
  - failure_count incremented to 2
  - SSE event: `frame_processing_failed` with frame_id=21
  - **No user input request yet**
-### Step 5: Processing Continues
+### Step 5: Third Consecutive Failure - Triggers User Input
- **Action**: System processes AD000006 and beyond
+- **Action**: Process AD000022
- **Expected Result**: Processing continues normally
+- **Expected Result**:
  - All layers fail
  - failure_count reaches threshold (3)
  - F02.2 calls `F11.create_user_input_request()` (F11 only builds the request object)
  - SSE event: `user_input_required`
  - Event payload contains:
    ```json
    {
      "type": "user_input_required",
      "flight_id": "<flight_id>",
      "frame_id": 22,
      "failed_frames": [20, 21, 22],
      "candidate_tiles": [
        {"tile_id": "xyz", "gps": {"lat": 48.27, "lon": 37.38}, "thumbnail_url": "..."},
        {"tile_id": "abc", "gps": {"lat": 48.26, "lon": 37.37}, "thumbnail_url": "..."},
        {"tile_id": "def", "gps": {"lat": 48.28, "lon": 37.39}, "thumbnail_url": "..."}
      ],
      "uav_image_url": "/flights/<id>/images/22",
      "message": "System unable to locate 3 consecutive images. Please provide GPS fix."
    }
    ```
 ### Step 6: Validate Threshold Behavior
 - **Action**: Verify user input NOT requested before 3 failures
 - **Expected Result**:
  - Review event log: no `user_input_required` before frame 22
  - Threshold is exactly 3 consecutive failures, not 2 or 4
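
The threshold behavior in Steps 3-6 amounts to a counter that resets on any successful frame. The sketch below is one way to model it; `ConsecutiveFailureTracker` and `frames_requesting_input` are hypothetical names, and resetting the counter right after a request stands in for the reset that follows an applied user fix.

```python
class ConsecutiveFailureTracker:
    """Counts consecutive localization failures for one flight."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failure_count = 0

    def record(self, localized):
        """Record one frame; return True when user input should be requested."""
        if localized:
            self.failure_count = 0
            return False
        self.failure_count += 1
        return self.failure_count == self.threshold

    def reset(self):
        self.failure_count = 0


def frames_requesting_input(frame_ids, failing, threshold=3):
    """Return the frame ids at which a user input request would be raised."""
    tracker = ConsecutiveFailureTracker(threshold)
    requests = []
    for frame_id in frame_ids:
        if tracker.record(frame_id not in failing):
            requests.append(frame_id)
            tracker.reset()  # models the reset after the user fix is applied
    return requests
```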
 ### Step 7: User Provides GPS Fix
 - **Action**: POST /flights/{flightId}/user-fix
 - **Payload**:
  ```json
  {
    "frame_id": 22,
    "uav_pixel": [3126, 2084],
    "satellite_gps": {"lat": 48.273997, "lon": 37.379828},
    "confidence": "high"
  }
  ```
 - **Expected Result**:
  - HTTP 200 OK
  - Response: `{"status": "accepted", "frame_id": 22}`
 ### Step 8: System Incorporates User Fix
 - **Action**: F11 processes user fix via `apply_user_anchor()`
 - **Expected Result**:
  - F10 adds GPS anchor with high confidence (weight = 10.0)
  - Factor graph re-optimizes
  - SSE event: `user_fix_applied`
  - Event payload:
    ```json
    {
      "type": "user_fix_applied",
      "frame_id": 22,
      "estimated_gps": {"lat": 48.273997, "lon": 37.379828},
      "affected_frames": [20, 21, 22]
    }
    ```
 ### Step 9: Trajectory Refinement
 - **Action**: Factor graph back-propagates fix to frames 20, 21
 - **Expected Result**:
  - SSE event: `trajectory_refined` for frames 20, 21
  - All 3 failed frames now have GPS estimates
  - failure_count reset to 0
 ### Step 10: Processing Resumes Automatically
 - **Action**: System processes AD000023 and beyond
 - **Expected Result**:
  - Processing resumes without manual restart
  - AD000023+ processed normally (no more injected failures)
  - SSE events continue: `frame_processed`
 ### Step 11: Validate 20% Route Allowance
 - **Action**: Calculate maximum allowed user inputs for 60-image flight
 - **Expected Result**:
  - 20% of 60 = 12 images maximum can need user input
  - System tracks user_input_count per flight
  - If user_input_count > 12, system logs warning but continues
 ### Step 12: Test Multiple User Input Cycles
 - **Action**: Inject failures for frames AD000040, AD000041, AD000042
 - **Expected Result**:
  - Second `user_input_required` event triggered
  - User provides second fix
  - System continues processing
  - Total user inputs: 2 cycles (6 frames aided)
 ### Step 13: Test User Input Timeout
 - **Action**: Trigger user input request, wait 5 minutes without response
 - **Expected Result**:
  - System sends reminder: `user_input_reminder` at 2 minutes
  - Processing remains paused for affected chunk
  - Other chunks (if any) continue processing
  - No timeout crash
 ### Step 14: Test Invalid User Fix
 - **Action**: Submit user fix with invalid GPS (outside geofence)
 - **Payload**:
  ```json
  {
    "frame_id": 22,
    "satellite_gps": {"lat": 0.0, "lon": 0.0}
  }
  ```
 - **Expected Result**:
  - HTTP 400 Bad Request
  - Error: "GPS coordinates outside flight geofence"
  - System re-requests user input
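
The rejection in Step 14 could be modeled as a bounds check before the fix reaches the factor graph. This is a sketch under assumptions: the spec does not define the geofence shape, so a latitude/longitude bounding box is used here, and `validate_user_fix` is a hypothetical helper rather than the F01 handler.

```python
def validate_user_fix(lat, lon, geofence):
    """Return (http_status, error_message) for a submitted GPS fix.

    geofence is an assumed bounding box: (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = geofence
    inside = min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    if not inside:
        return 400, "GPS coordinates outside flight geofence"
    return 200, None
```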
 ### Step 15: Validate Final Flight Statistics
 - **Action**: GET /flights/{flightId}/status
 - **Expected Result**:
  ```json
  {
    "flight_id": "<id>",
    "total_frames": 60,
    "processed_frames": 60,
    "user_input_requests": 2,
    "user_inputs_provided": 2,
    "frames_aided_by_user": 6,
    "user_input_percentage": 10.0
  }
  ```
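
The allowance in Step 11 and the `user_input_percentage` value above are simple ratios over the frame count. A short sketch of the arithmetic with hypothetical helper names; integer math is used for the allowance to avoid floating-point rounding surprises.

```python
def max_user_input_frames(total_frames, allowance_percent=20):
    """Largest number of frames that may need user input (floor of the share)."""
    return total_frames * allowance_percent // 100


def user_input_percentage(frames_aided, total_frames):
    """Share of frames that were localized with user help, in percent."""
    return 100.0 * frames_aided / total_frames


def exceeds_allowance(frames_aided, total_frames, allowance_percent=20):
    """True when the flight used more user help than the allowance permits."""
    return frames_aided > max_user_input_frames(total_frames, allowance_percent)
```

For the 60-image flight this gives an allowance of 12 frames, and two cycles of three aided frames each come to 10% of the route.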
 ## Success Criteria
- User input requested after 3 consecutive failures (not before)
+
- User notified via SSE with relevant info
+**Primary Criteria (AC-6)**:
- User fix incorporated with high confidence
+- User input requested after exactly 3 consecutive failures (not 2, not 4)
- Processing resumes automatically
+- User notified via SSE with relevant context (candidate tiles, image URL)
- Allows up to 20% of route to need user input (12 out of 60 images)
+- User fix accepted via REST API
 - User fix incorporated as high-confidence GPS anchor
 - Processing resumes automatically after fix
 - System allows up to 20% of route to need user input
 **Supporting Criteria**:
 - SSE events delivered within 1 second
 - Factor graph incorporates fix within 2 seconds
 - Back-propagation refines earlier failed frames
 - failure_count resets after successful fix
 - System handles multiple user input cycles per flight
 ## Pass/Fail Criteria
 **Passes If**: User input mechanism works, threshold correct (3 failures), processing resumes
 **Fails If**: User input not requested, or system cannot incorporate user fixes
 **TEST PASSES IF**:
 - User input request triggered at exactly 3 consecutive failures
 - SSE event contains all required info (frame_id, candidate tiles)
 - User fix accepted and incorporated
 - Processing resumes automatically
 - 20% allowance calculated correctly
 - Multiple cycles work correctly
 - Invalid fixes rejected gracefully
 **TEST FAILS IF**:
 - User input requested before 3 failures
 - User input NOT requested after 3 failures
 - SSE event missing required fields
 - User fix causes system error
 - Processing does not resume after fix
 - System crashes on invalid user input
 - Timeout causes system hang
 ## Error Scenarios
 ### Scenario A: User Provides Wrong GPS
 - User fix GPS is 500m from actual location
 - System accepts fix (user has authority)
 - Subsequent frames may fail again
 - Second user input cycle may be needed
 ### Scenario B: SSE Connection Lost
 - Client disconnects during user input wait
 - System buffers events
 - Client reconnects, receives pending events
 - Processing state preserved
 ### Scenario C: Database Failure During Fix
 - User fix received but DB write fails
 - System retries 3 times
 - If all retries fail, returns HTTP 503
 - User can retry submission
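
Scenario C can be sketched as a bounded retry loop. The code below assumes "retries 3 times" means three write attempts in total, which the scenario leaves ambiguous; `DatabaseWriteError`, `store_user_fix`, and `make_flaky_writer` are hypothetical names used only for illustration.

```python
class DatabaseWriteError(Exception):
    """Hypothetical error raised when persisting a user fix fails."""


def store_user_fix(write_fix, max_attempts=3):
    """Call write_fix up to max_attempts times; return an HTTP status code."""
    for _attempt in range(max_attempts):
        try:
            write_fix()
            return 200
        except DatabaseWriteError:
            continue  # try again until the attempts are used up
    return 503


def make_flaky_writer(failures):
    """Return a writer that raises `failures` times before succeeding (demo)."""
    remaining = [failures]

    def write_fix():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise DatabaseWriteError("simulated write failure")

    return write_fix
```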
 ## Components Involved
 - F01 Flight API: `POST /flights/{id}/user-fix`
 - F02.1 Flight Lifecycle Manager: `handle_user_fix()`
 - F02.2 Flight Processing Engine: `apply_user_fix()`
 - F10 Factor Graph Optimizer: `add_absolute_factor()` with high confidence
 - F11 Failure Recovery Coordinator: `create_user_input_request()`, `apply_user_anchor()`
 - F15 SSE Event Streamer: `send_user_input_request()`, `send_user_fix_applied()`
 ## Notes
 - AC-6 is the human-in-the-loop fallback for extreme failures
 - 3-failure threshold balances automation with user intervention
 - 20% allowance (12 of 60 images) is an operational constraint
 - User fixes are trusted (high confidence weight in factor graph)
 - System should minimize user inputs via the L1/L2/L3 layered defense
@@ -6,35 +6,251 @@ Validate Acceptance Criterion 10: "Mean Reprojection Error (MRE) < 1.0 pixels. T
 ## Linked Acceptance Criteria
 **AC-10**: MRE < 1.0 pixels
 ## Preconditions
 1. ASTRAL-Next system operational
 2. F07 Sequential Visual Odometry extracting and matching features
 3. F10 Factor Graph Optimizer computing optimized poses
 4. Camera intrinsics calibrated (from F17 Configuration Manager)
 5. Test dataset with ground truth poses (for reference)
 6. Reprojection error calculation implemented
 ## Reprojection Error Definition
 **Formula**:
 ```
 For each matched feature point p_i in image I_j:
  1. Triangulate 3D point X_i from matches across images
  2. Project X_i back to image I_j using optimized pose T_j and camera K
  3. p'_i = project(K, T_j, X_i) (projected pixel location, after perspective division)
  4. e_i = ||p_i - p'_i|| (Euclidean distance in pixels)
 MRE = (1/N) * Σ e_i  (mean across all features in all images)
 ```
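The formula above can be sketched with NumPy for a single image. This is a minimal pinhole-camera illustration, not the pipeline's code: it assumes `T` is a 3x4 world-to-camera matrix `[R | t]`, ignores lens distortion, and the function names are illustrative.

```python
import numpy as np


def project(K: np.ndarray, T: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project (N, 3) world points to pixels with a 3x4 world-to-camera pose T."""
    X_h = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous points, (N, 4)
    x_cam = (K @ T @ X_h.T).T                       # homogeneous pixels, (N, 3)
    return x_cam[:, :2] / x_cam[:, 2:3]             # perspective division


def mean_reprojection_error(K, T, X, observed) -> float:
    """MRE = (1/N) * sum of ||p_i - p'_i|| over the given observations."""
    errors = np.linalg.norm(observed - project(K, T, X), axis=1)
    return float(errors.mean())
```

For the whole flight, the per-image error arrays would be concatenated before taking the mean, so that every observation carries equal weight.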
 ## Test Data
 - **Dataset**: AD000001-AD000030 (30 images, baseline)
 - **Expected Features**: ~500-2000 matched features per image pair
 - **Total Measurements**: ~15,000-60,000 reprojection measurements
 ## Test Steps
-### Step 1: Process Flight with Factor Graph
+### Step 1: Process Flight Through Complete Pipeline
- **Action**: Process AD000001-030 through complete pipeline
+- **Action**: Process AD000001-AD000030 through full ASTRAL-Next pipeline
- **Expected Result**: Factor graph optimizes full trajectory
+- **Expected Result**:
  - Factor graph initialized and optimized
  - 30 poses computed
  - All feature correspondences stored
-### Step 2: Calculate Reprojection Errors
+### Step 2: Extract Feature Correspondences
- **Action**: For each matched feature across image pairs:
+- **Action**: Retrieve all matched features from F07
-  - Project 3D point back to image plane using optimized poses
+- **Expected Result**:
-  - Measure pixel distance from original detection
+  - For each image pair (i, j):
- **Expected Result**: Array of reprojection errors for all features
+    - List of matched keypoint pairs: [(p_i, p_j), ...]
    - Match confidence scores
    - Total: ~500-1500 matches per pair
  - Total matches across flight: ~15,000-45,000
-### Step 3: Compute Mean Reprojection Error
+### Step 3: Triangulate 3D Points
- **Action**: Calculate mean across all features in all images
+- **Action**: For each matched feature across multiple views, triangulate 3D position
- **Expected Result**: MRE < 1.0 pixels
+- **Expected Result**:
  - 3D point cloud generated
  - Each point has:
    - 3D coordinates (X, Y, Z) in ENU frame
    - List of observations (image_id, pixel_location)
    - Triangulation uncertainty
-### Step 4: Validate Factor Graph Quality
+### Step 4: Calculate Per-Feature Reprojection Error
- **Action**: Low MRE indicates:
+- **Action**: For each 3D point and each observation:
-  - Poses geometrically consistent
+  ```
-  - 3D structure accurate
+  For point X with observation (image_j, pixel_p):
-  - No "tension" in factor graph
+    1. Get optimized pose T_j from factor graph
- **Expected Result**: MRE correlates with GPS accuracy
+    2. Get camera intrinsics K from config
    3. Project: p' = project(K, T_j, X)
    4. Error: e = sqrt((p.x - p'.x)² + (p.y - p'.y)²)
  ```
 - **Expected Result**:
  - Array of per-feature reprojection errors
  - Typical range: 0.1 - 3.0 pixels
 ### Step 5: Compute Statistical Metrics
 - **Action**: Calculate MRE and distribution statistics
 - **Expected Result**:
  ```
  Total features evaluated: 25,000
  Mean Reprojection Error (MRE): 0.72 pixels
  Median Reprojection Error: 0.58 pixels
  Standard Deviation: 0.45 pixels
  90th Percentile: 1.25 pixels
  95th Percentile: 1.68 pixels
  99th Percentile: 2.41 pixels
  Max Error: 4.82 pixels
  ```
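The statistics listed above could be computed along these lines. `summarize_errors` is an illustrative name, and the sketch assumes the per-feature errors are already collected in a flat array; note that `np.std` defaults to the population standard deviation.

```python
import numpy as np


def summarize_errors(errors: np.ndarray) -> dict:
    """Summary statistics for per-feature reprojection errors, in pixels."""
    p90, p95, p99 = np.percentile(errors, [90, 95, 99])
    return {
        "count": int(errors.size),
        "mean": float(errors.mean()),
        "median": float(np.median(errors)),
        "std": float(errors.std()),
        "p90": float(p90),
        "p95": float(p95),
        "p99": float(p99),
        "max": float(errors.max()),
    }
```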
 ### Step 6: Validate MRE Threshold
 - **Action**: Compare MRE against AC-10 requirement
 - **Expected Result**:
  - **MRE = 0.72 pixels < 1.0 pixels** ✓
  - AC-10 PASS
 ### Step 7: Identify Outlier Reprojections
 - **Action**: Find features with reprojection error > 3.0 pixels
 - **Expected Result**:
  ```
  Outliers (> 3.0 pixels): 127 (0.5% of total)
  Outlier distribution:
    - 3.0-5.0 pixels: 98 features
    - 5.0-10.0 pixels: 27 features
    - > 10.0 pixels: 2 features
  ```
 ### Step 8: Analyze Outlier Causes
 - **Action**: Investigate high-error features
 - **Expected Result**:
  - Most outliers at image boundaries (lens distortion)
  - Some at occlusion boundaries
  - Moving objects (if any)
  - Repetitive textures causing mismatches
 ### Step 9: Per-Image MRE Analysis
 - **Action**: Calculate MRE per image
 - **Expected Result**:
  ```
  Per-Image MRE:
  AD000001: 0.68 px (baseline)
  AD000002: 0.71 px
  ...
  AD000032: 1.12 px (sharp turn - higher error)
  AD000033: 0.95 px
  ...
  AD000030: 0.74 px
  Images with MRE > 1.0: 2 out of 30 (6.7%)
  Overall MRE: 0.72 px
  ```
 ### Step 10: Temporal MRE Trend
 - **Action**: Plot MRE over sequence to detect drift
 - **Expected Result**:
  - MRE relatively stable across sequence
  - No significant upward trend (would indicate drift)
  - Spikes at known challenging locations (sharp turns)
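Besides plotting, one way to quantify an upward trend is the least-squares slope of per-image MRE against frame index. This is an assumed approach, not one the spec prescribes; `mre_trend_slope` is an illustrative name, and any threshold on the slope would need to be chosen separately.

```python
import numpy as np


def mre_trend_slope(per_image_mre) -> float:
    """Least-squares slope of MRE vs. frame index, in pixels per frame."""
    y = np.asarray(per_image_mre, dtype=float)
    x = np.arange(y.size)
    slope, _intercept = np.polyfit(x, y, deg=1)
    return float(slope)
```

A slope near zero is consistent with a stable sequence, while a clearly positive slope would suggest drift.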
 ### Step 11: Validate Robust Kernel Effect
 - **Action**: Compare MRE with/without robust cost functions
 - **Expected Result**:
  ```
  Without robust kernels: MRE = 0.89 px, outliers affect mean
  With Cauchy kernel: MRE = 0.72 px, outliers downweighted
  Improvement: 19% reduction in MRE
  ```
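The downweighting in this step follows from the Cauchy loss, whose weight in iteratively reweighted least squares is w(r) = 1 / (1 + (r/c)^2). The sketch below only illustrates that weight function; the scale `c = 1.0` is an assumed value, not the setting used by H03.

```python
import numpy as np


def cauchy_weight(residual: np.ndarray, c: float = 1.0) -> np.ndarray:
    """IRLS weight for the Cauchy loss: w(r) = 1 / (1 + (r / c)^2)."""
    return 1.0 / (1.0 + (residual / c) ** 2)
```

With `c = 1.0`, a 1-pixel residual keeps half its weight, while a 10-pixel residual keeps under 1%.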
 ### Step 12: Cross-Validate with GPS Accuracy
 - **Action**: Correlate MRE with GPS error
 - **Expected Result**:
  - Low MRE correlates with low GPS error
  - Images with MRE > 1.5 px tend to have GPS error > 30m
  - MRE is a leading indicator of trajectory quality
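The correlation in this step can be measured with the Pearson coefficient over per-image values. A minimal sketch, assuming one MRE value and one GPS error value per image; `pearson_r` is an illustrative name.

```python
import numpy as np


def pearson_r(mre_px, gps_error_m) -> float:
    """Pearson correlation between per-image MRE and per-image GPS error."""
    return float(np.corrcoef(mre_px, gps_error_m)[0, 1])
```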
 ### Step 13: Test Under Challenging Conditions
 - **Action**: Compute MRE for the challenging dataset (AD000001-AD000060)
 - **Expected Result**:
  ```
  Full Flight MRE:
  Total features: 55,000
  MRE: 0.84 pixels (still < 1.0)
  Challenging segments:
    - Sharp turns: MRE = 1.15 px (above threshold locally)
    - Normal segments: MRE = 0.68 px
  Overall: AC-10 PASS
  ```
 ### Step 14: Generate Reprojection Error Report
 - **Action**: Create comprehensive MRE report
 - **Expected Result**:
  ```
  ========================================
  REPROJECTION ERROR REPORT
  Flight: AC10_Test
  Dataset: AD000001-AD000030
  ========================================
  SUMMARY:
  Mean Reprojection Error: 0.72 pixels
  AC-10 Threshold: 1.0 pixels
  Status: PASS ✓
  DISTRIBUTION:
  < 0.5 px: 12,450 (49.8%)
  0.5-1.0 px: 9,875 (39.5%)
  1.0-2.0 px: 2,350 (9.4%)
  2.0-3.0 px: 198 (0.8%)
  > 3.0 px: 127 (0.5%)
  PER-IMAGE BREAKDOWN:
  Images meeting < 1.0 px MRE: 28/30 (93.3%)
  Images with highest MRE: AD000032 (1.12 px), AD000048 (1.08 px)
  CORRELATION WITH GPS ACCURACY:
  Pearson correlation (MRE vs GPS error): 0.73
  Low MRE predicts high GPS accuracy
  RECOMMENDATIONS:
  - System meets AC-10 requirement
  - Consider additional outlier filtering for images > 1.0 px MRE
  - Sharp turn handling could be improved
  ========================================
  ```
 ## Success Criteria
- Mean Reprojection Error < 1.0 pixels
+
- Standard deviation reasonable (< 2.0 pixels)
+**Primary Criterion (AC-10)**:
- No outlier reprojections (> 10 pixels)
+- Mean Reprojection Error < 1.0 pixels across entire flight
 **Supporting Criteria**:
 - Standard deviation < 2.0 pixels
 - No outlier reprojections > 10 pixels (indicates gross errors)
 - Per-image MRE < 2.0 pixels (no catastrophic single-image failures)
 - MRE stable across sequence (no drift)
 ## Pass/Fail Criteria
 **Passes If**: MRE < 1.0 pixels
 **Fails If**: MRE ≥ 1.0 pixels, indicating geometry inconsistencies
 **TEST PASSES IF**:
 - Overall MRE < 1.0 pixels
 - Standard deviation reasonable (< 2.0 pixels)
 - Less than 1% of features have error > 5.0 pixels
 - MRE consistent across multiple test runs (variance < 10%)
 **TEST FAILS IF**:
 - MRE ≥ 1.0 pixels
 - Standard deviation > 3.0 pixels (high variance indicates instability)
 - More than 5% of features have error > 5.0 pixels
 - MRE increases significantly over sequence (drift)
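The single-run thresholds above can be combined into one check. This sketch is an assumption about how they fit together: it omits the run-to-run variance and drift criteria, and because the pass and fail thresholds leave a gap (for example, a standard deviation between 2.0 and 3.0 pixels), it returns a third `"REVIEW"` state that the spec does not define.

```python
import numpy as np


def evaluate_ac10(errors: np.ndarray) -> str:
    """Apply the single-run MRE, spread, and outlier thresholds."""
    mre = errors.mean()
    std = errors.std()
    frac_over_5px = np.mean(errors > 5.0)

    if mre >= 1.0 or std > 3.0 or frac_over_5px > 0.05:
        return "FAIL"
    if std < 2.0 and frac_over_5px < 0.01:
        return "PASS"
    return "REVIEW"  # between pass and fail thresholds (assumed handling)
```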
 ## Diagnostic Actions if Failing
 **If MRE ≥ 1.0 px**:
 1. Check camera calibration accuracy
 2. Verify lens distortion model
 3. Review feature matching quality (outlier ratio)
 4. Examine factor graph convergence
 5. Check for scale drift in trajectory
 **If High Variance**:
 1. Investigate images with outlier MRE
 2. Check for challenging conditions (blur, low texture)
 3. Review robust kernel settings
 4. Verify triangulation accuracy
 ## Components Involved
 - F07 Sequential Visual Odometry: Feature extraction and matching
 - F10 Factor Graph Optimizer: Pose optimization, marginal covariances
 - F13 Coordinate Transformer: 3D point projection
 - H01 Camera Model: Camera intrinsics, projection functions
 - H03 Robust Kernels: Outlier handling in optimization
 ## Notes
 - MRE is a geometric consistency metric, not direct GPS accuracy
 - Low MRE indicates well-constrained factor graph
 - High MRE with good GPS accuracy = overfitting to GPS anchors
 - Low MRE with poor GPS accuracy = scale/alignment issues
 - AC-10 validates internal consistency of vision pipeline
@@ -8,11 +8,14 @@ Validate chunk LiteSAM matching with rotation sweeps for chunks with unknown ori
 **AC-5**: Connect route chunks
 ## Preconditions
-1. F12 Route Chunk Manager functional
+1. F02.2 Flight Processing Engine running
-2. F06 Image Rotation Manager with chunk rotation support
+2. F11 Failure Recovery Coordinator (chunk orchestration, returns status objects)
-3. F09 Metric Refinement with chunk LiteSAM matching
+3. F12 Route Chunk Manager functional (chunk lifecycle via `create_chunk()`, `mark_chunk_anchored()`)
-4. F10 Factor Graph Optimizer with chunk merging
+4. F06 Image Rotation Manager with chunk rotation support (`try_chunk_rotation_steps()`)
-5. Test dataset: Chunk with unknown orientation (simulated sharp turn)
+5. F08 Global Place Recognition (chunk semantic matching via `retrieve_candidate_tiles_for_chunk()`)
 6. F09 Metric Refinement with chunk LiteSAM matching (`align_chunk_to_satellite()`)
 7. F10 Factor Graph Optimizer with chunk operations (`add_chunk_anchor()`, `merge_chunk_subgraphs()`)
 8. Test dataset: Chunk with unknown orientation (simulated sharp turn)
 ## Test Description
 Tests the system's ability to match chunks with unknown orientation using rotation sweeps. When a chunk is created after a sharp turn, its orientation relative to the satellite map is unknown, so the system must rotate the entire chunk through each rotation step and attempt LiteSAM matching at every one.
@@ -54,8 +57,8 @@ Test system's ability to match chunks with unknown orientation using rotation sw
 ### Step 4: Chunk Merging
 - **Action**: Merge chunk_2 to main trajectory
 - **Expected Result**:
-  - F10.add_chunk_anchor() anchors chunk_2
+  - F12.mark_chunk_anchored() updates chunk state (calls F10.add_chunk_anchor())
-  - F10.merge_chunks() merges chunk_2 into chunk_1
+  - F12.merge_chunks() merges chunk_2 into chunk_1 (calls F10.merge_chunk_subgraphs())
  - Sim(3) transform applied correctly
  - Global trajectory consistent
@@ -8,10 +8,13 @@ Validate system's ability to process multiple chunks simultaneously, matching an
 **AC-5**: Connect route chunks (multiple chunks)
 ## Preconditions
-1. F10 Factor Graph Optimizer with native multi-chunk support
+1. F02.2 Flight Processing Engine running
-2. F12 Route Chunk Manager functional
+2. F10 Factor Graph Optimizer with native multi-chunk support (subgraph operations)
-3. F11 Failure Recovery Coordinator with chunk orchestration
+3. F11 Failure Recovery Coordinator (pure logic, returns status objects to F02.2)
-4. Test dataset: Flight with 3 disconnected segments
+4. F12 Route Chunk Manager functional (chunk lifecycle: `create_chunk()`, `add_frame_to_chunk()`, `mark_chunk_anchored()`, `merge_chunks()`)
 5. F08 Global Place Recognition (chunk semantic matching via `retrieve_candidate_tiles_for_chunk()`)
 6. F09 Metric Refinement (chunk LiteSAM matching)
 7. Test dataset: Flight with 3 disconnected segments
 ## Test Description
 Tests the system's ability to handle multiple disconnected route segments simultaneously. The system should create chunks proactively, process them independently, and match and merge them asynchronously without blocking frame processing.
@@ -143,24 +146,26 @@ Multi-Chunk Simultaneous Processing:
 ## Architecture Elements
 **Multi-Chunk Support**:
- F10 Factor Graph Optimizer supports multiple chunks simultaneously
+- F10 Factor Graph Optimizer supports multiple chunks via `create_chunk_subgraph()`
- Each chunk has own subgraph
+- Each chunk has own subgraph, optimized independently via `optimize_chunk()`
- Chunks optimized independently
+- F12 Route Chunk Manager owns chunk metadata (status, is_active, etc.)
 **Proactive Chunk Creation**:
- Chunks created immediately on tracking loss
+- F11 triggers chunk creation via `create_chunk_on_tracking_loss()`
- Not reactive (doesn't wait for matching to fail)
+- F12.create_chunk() creates chunk and calls F10.create_chunk_subgraph()
- Processing continues in new chunk
+- Processing continues in new chunk immediately (not reactive)
 **Asynchronous Matching**:
- Background task processes unanchored chunks
+- F02.2 manages background task that calls F11.process_unanchored_chunks()
 - F11 calls F12.get_chunks_for_matching() to find ready chunks
 - F11.try_chunk_semantic_matching() → F11.try_chunk_litesam_matching()
 - Matching doesn't block frame processing
 - Chunks matched and merged asynchronously
 **Chunk Merging**:
- Sim(3) transform for merging
+- F11.merge_chunk_to_trajectory() coordinates merging
- Accounts for translation, rotation, scale
+- F12.merge_chunks() updates chunk state and calls F10.merge_chunk_subgraphs()
- Global optimization after merging
+- Sim(3) transform accounts for translation, rotation, scale
 - F10.optimize_global() runs after merging
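A Sim(3) transform maps a point x to x' = s * R * x + t, with scale s, rotation R, and translation t. The sketch below applies one to a set of chunk points; it only illustrates the math, and `apply_sim3` is a hypothetical helper, not part of F10 or F12.

```python
import numpy as np


def apply_sim3(points: np.ndarray, s: float, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map (N, 3) points with x' = s * R @ x + t."""
    return s * points @ R.T + t
```

For example, with a 90-degree rotation about the z-axis, scale 2, and translation (1, 1, 1), the point (1, 0, 0) maps to (1, 3, 1).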
 ## Notes
 - Multiple chunks can exist simultaneously