initial structure implemented

docs -> _docs
Oleksandr Bezdieniezhnykh
2025-12-01 14:20:56 +02:00
parent 9134c5db06
commit abc26d5c20
360 changed files with 3881 additions and 101 deletions
@@ -0,0 +1,61 @@
# Feature: Batch Queue Management
## Description
Handles ingestion, validation, and FIFO queuing of image batches. This feature manages the entry point for all images into the system, ensuring sequence integrity and proper queuing for downstream processing.
## Component APIs Implemented
- `queue_batch(flight_id: str, batch: ImageBatch) -> bool`
- `validate_batch(batch: ImageBatch) -> ValidationResult`
- `process_next_batch(flight_id: str) -> Optional[ProcessedBatch]`
## External Tools and Services
- **H08 Batch Validator**: Delegated validation for naming convention, sequence continuity, format, dimensions
- **Pillow**: Image decoding and metadata extraction
- **opencv-python**: Image I/O operations
## Internal Methods
| Method | Purpose |
|--------|---------|
| `_add_to_queue(flight_id, batch)` | Adds validated batch to flight's FIFO queue |
| `_dequeue_batch(flight_id)` | Removes and returns next batch from queue |
| `_check_sequence_continuity(flight_id, batch)` | Validates batch continues from last processed sequence |
| `_decode_images(batch)` | Decompresses/decodes raw image bytes to ImageData |
| `_extract_metadata(image_bytes)` | Extracts EXIF, dimensions from raw image |
| `_get_queue_capacity(flight_id)` | Returns remaining queue capacity for backpressure |
## Unit Tests
| Test | Description |
|------|-------------|
| `test_queue_batch_valid` | Valid batch queued successfully, returns True |
| `test_queue_batch_sequence_gap` | Batch with sequence gap from last processed → ValidationError |
| `test_queue_batch_invalid_naming` | Non-consecutive filenames → ValidationError |
| `test_queue_batch_queue_full` | Queue at capacity → QueueFullError |
| `test_validate_batch_size_min` | 9 images → invalid (min 10) |
| `test_validate_batch_size_max` | 51 images → invalid (max 50) |
| `test_validate_batch_naming_convention` | ADxxxxxx.jpg format validated |
| `test_validate_batch_invalid_format` | IMG_0001.jpg → invalid |
| `test_validate_batch_non_consecutive` | AD000101, AD000103 → invalid |
| `test_validate_batch_file_format` | JPEG/PNG accepted, others rejected |
| `test_validate_batch_dimensions` | Within 640x480 to 6252x4168 |
| `test_validate_batch_file_size` | < 10MB per image |
| `test_process_next_batch_dequeue` | Returns ProcessedBatch with decoded images |
| `test_process_next_batch_empty_queue` | Empty queue → returns None |
| `test_process_next_batch_corrupted_image` | Corrupted image skipped, others processed |
| `test_process_next_batch_metadata_extraction` | EXIF and dimensions extracted correctly |
| `test_fifo_order` | Multiple batches processed in queue order |
## Integration Tests
| Test | Description |
|------|-------------|
| `test_batch_flow_queue_to_process` | queue_batch → process_next_batch → verify ImageData list |
| `test_multiple_batches_fifo` | Queue 5 batches, process in order, verify sequence maintained |
| `test_batch_validation_with_h08` | Integration with H08 Batch Validator |
| `test_concurrent_queue_access` | Multiple flights queuing simultaneously |
| `test_backpressure_handling` | Queue fills up, backpressure signal returned |
@@ -0,0 +1,70 @@
# Feature: Image Storage and Retrieval
## Description
Handles persistent storage of processed images and provides retrieval mechanisms for sequential processing, random access, and metadata queries. Manages disk storage structure, maintains sequence tracking per flight, and provides processing status information.
## Component APIs Implemented
- `store_images(flight_id: str, images: List[ImageData]) -> bool`
- `get_next_image(flight_id: str) -> Optional[ImageData]`
- `get_image_by_sequence(flight_id: str, sequence: int) -> Optional[ImageData]`
- `get_image_metadata(flight_id: str, sequence: int) -> Optional[ImageMetadata]`
- `get_processing_status(flight_id: str) -> ProcessingStatus`
## External Tools and Services
- **F03 Flight Database**: Metadata persistence, flight state queries
- **opencv-python**: Image I/O (cv2.imread, cv2.imwrite)
- **numpy**: Image array handling
## Internal Methods
| Method | Purpose |
|--------|---------|
| `_create_flight_directory(flight_id)` | Creates storage directory structure for flight |
| `_write_image(flight_id, filename, image_data)` | Writes single image to disk |
| `_update_metadata_index(flight_id, metadata_list)` | Updates metadata.json with new image metadata |
| `_load_image_from_disk(flight_id, filename)` | Reads image file and returns np.ndarray |
| `_construct_filename(sequence)` | Converts sequence number to ADxxxxxx.jpg format |
| `_get_sequence_tracker(flight_id)` | Gets/initializes current sequence position for flight |
| `_increment_sequence(flight_id)` | Advances sequence counter after get_next_image |
| `_load_metadata_from_index(flight_id, sequence)` | Reads metadata from index without loading image |
| `_calculate_processing_rate(flight_id)` | Computes images/second processing rate |
## Unit Tests
| Test | Description |
|------|-------------|
| `test_store_images_success` | All images written to correct paths |
| `test_store_images_creates_directory` | Flight directory created if not exists |
| `test_store_images_updates_metadata` | metadata.json updated with image info |
| `test_store_images_disk_full` | Storage error returns False |
| `test_get_next_image_sequential` | Returns images in sequence order |
| `test_get_next_image_increments_counter` | Sequence counter advances after each call |
| `test_get_next_image_end_of_sequence` | Returns None when no more images |
| `test_get_next_image_missing_file` | Handles missing image gracefully |
| `test_get_image_by_sequence_valid` | Returns correct image for sequence number |
| `test_get_image_by_sequence_invalid` | Invalid sequence returns None |
| `test_get_image_by_sequence_constructs_filename` | Sequence 101 → AD000101.jpg |
| `test_get_image_metadata_fast` | Returns metadata without loading full image |
| `test_get_image_metadata_missing` | Missing image returns None |
| `test_get_image_metadata_contains_fields` | Returns sequence, filename, dimensions, file_size, timestamp, exif |
| `test_get_processing_status_counts` | Accurate total_images, processed_images counts |
| `test_get_processing_status_current_sequence` | Reflects current processing position |
| `test_get_processing_status_queued_batches` | Includes queue depth |
| `test_get_processing_status_rate` | Processing rate calculation |
## Integration Tests
| Test | Description |
|------|-------------|
| `test_store_then_retrieve_sequential` | store_images → get_next_image × N → all images retrieved |
| `test_store_then_retrieve_by_sequence` | store_images → get_image_by_sequence → correct image |
| `test_metadata_persistence_f03` | store_images → metadata persisted to F03 Flight Database |
| `test_crash_recovery_resume` | Restart processing from last stored sequence |
| `test_concurrent_retrieval` | Multiple consumers retrieving images simultaneously |
| `test_storage_large_batch` | Store and retrieve 3000 images for single flight |
| `test_multiple_flights_isolation` | Multiple flights don't interfere with each other's storage |
| `test_status_updates_realtime` | Status reflects current state during active processing |
@@ -0,0 +1,455 @@
# Image Input Pipeline
## Interface Definition
**Interface Name**: `IImageInputPipeline`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import List, Optional


class IImageInputPipeline(ABC):
    """Abstract interface for batch queuing, validation, storage, and retrieval."""

    @abstractmethod
    def queue_batch(self, flight_id: str, batch: ImageBatch) -> bool:
        pass

    @abstractmethod
    def process_next_batch(self, flight_id: str) -> Optional[ProcessedBatch]:
        pass

    @abstractmethod
    def validate_batch(self, batch: ImageBatch) -> ValidationResult:
        pass

    @abstractmethod
    def store_images(self, flight_id: str, images: List[ImageData]) -> bool:
        pass

    @abstractmethod
    def get_next_image(self, flight_id: str) -> Optional[ImageData]:
        pass

    @abstractmethod
    def get_image_by_sequence(self, flight_id: str, sequence: int) -> Optional[ImageData]:
        pass

    @abstractmethod
    def get_image_metadata(self, flight_id: str, sequence: int) -> Optional[ImageMetadata]:
        pass

    @abstractmethod
    def get_processing_status(self, flight_id: str) -> ProcessingStatus:
        pass
```
## Component Description
### Responsibilities
- Unified image ingestion, validation, storage, and retrieval
- FIFO batch queuing for processing
- Validate consecutive naming (AD000001, AD000002, etc.)
- Validate sequence integrity (strict sequential ordering)
- Image persistence with indexed retrieval
- Metadata extraction (EXIF, dimensions)
### Scope
- Batch queue management
- Image validation
- Disk storage management
- Sequential processing coordination
- Metadata management
## API Methods
### `queue_batch(flight_id: str, batch: ImageBatch) -> bool`
**Description**: Queues a batch of images for processing (FIFO).
**Called By**:
- F02.1 Flight Lifecycle Manager (via F01 Flight API image upload route)
**Input**:
```python
flight_id: str
batch: ImageBatch:
    images: List[bytes]  # Raw image data
    filenames: List[str]  # e.g., ["AD000101.jpg", "AD000102.jpg", ...]
    start_sequence: int  # 101
    end_sequence: int  # 150
```
**Output**:
```python
bool: True if queued successfully
```
**Processing Flow**:
1. Validate batch using H08 Batch Validator
2. Check sequence continuity (no gaps)
3. Add to FIFO queue for flight_id
4. Return immediately (async processing)
**Error Conditions**:
- `ValidationError`: Sequence gap, invalid naming
- `QueueFullError`: Queue capacity exceeded
**Test Cases**:
1. **Valid batch**: Queued successfully
2. **Sequence gap**: Batch 101-150, expecting 51-100 → error
3. **Invalid naming**: Non-consecutive names → error
4. **Queue full**: Returns error with backpressure signal
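A minimal sketch of the queuing step described above, assuming an in-memory per-flight `deque` and a fixed capacity. The class name `FlightBatchQueues`, the helper `continues_sequence`, and the default capacity of 10 batches are illustrative assumptions; the spec does not define them.

```python
from collections import deque
from typing import Deque, Dict, Optional


class QueueFullError(Exception):
    """Raised when a flight's queue has no remaining capacity."""


def continues_sequence(last_end_sequence: int, batch_start_sequence: int) -> bool:
    """True when the new batch starts right after the last accepted one."""
    return batch_start_sequence == last_end_sequence + 1


class FlightBatchQueues:
    """Per-flight FIFO queues with a fixed capacity (hypothetical helper)."""

    def __init__(self, max_batches_per_flight: int = 10) -> None:
        # The capacity value is an assumption; the spec does not state one.
        self._max = max_batches_per_flight
        self._queues: Dict[str, Deque[object]] = {}

    def remaining_capacity(self, flight_id: str) -> int:
        """Remaining slots, usable as a backpressure signal."""
        return self._max - len(self._queues.get(flight_id, ()))

    def add(self, flight_id: str, batch: object) -> bool:
        """Append a validated batch, or raise QueueFullError at capacity."""
        if self.remaining_capacity(flight_id) <= 0:
            raise QueueFullError(f"queue full for flight {flight_id}")
        self._queues.setdefault(flight_id, deque()).append(batch)
        return True

    def pop_next(self, flight_id: str) -> Optional[object]:
        """Return the oldest queued batch, or None when the queue is empty."""
        queue = self._queues.get(flight_id)
        return queue.popleft() if queue else None
```

With the example from the test cases, `continues_sequence(50, 101)` is false because batch 51-100 is still expected, so the batch would be rejected before `add` is reached.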
---
### `process_next_batch(flight_id: str) -> Optional[ProcessedBatch]`
**Description**: Dequeues and processes the next batch from FIFO queue.
**Called By**:
- Internal processing loop (background worker)
**Input**:
```python
flight_id: str
```
**Output**:
```python
ProcessedBatch:
    images: List[ImageData]
    batch_id: str
    start_sequence: int
    end_sequence: int
```
**Processing Flow**:
1. Dequeue next batch
2. Decompress/decode images
3. Extract metadata (EXIF, dimensions)
4. Store images to disk
5. Return ProcessedBatch for pipeline
**Error Conditions**:
- Returns `None`: Queue empty
- `ImageCorruptionError`: Invalid image data (the error is logged, the affected image is skipped, and the rest of the batch is processed)
**Test Cases**:
1. **Process batch**: Dequeues, returns ImageData list
2. **Empty queue**: Returns None
3. **Corrupted image**: Logs error, skips image
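The skip-on-corruption behaviour can be sketched as follows. This is an illustration only: the function `decode_batch` and the injected `decode` callable are hypothetical, and the real decoder (Pillow or opencv-python, per the dependency list) is deliberately left out so the sketch stays self-contained.

```python
import logging
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class ImageCorruptionError(Exception):
    """Stand-in for the spec's ImageCorruptionError."""


def decode_batch(
    filenames: Sequence[str],
    raw_images: Sequence[bytes],
    decode: Callable[[bytes], T],
) -> Tuple[List[T], List[str]]:
    """Decode each image; log and skip the ones the decoder rejects."""
    decoded: List[T] = []
    skipped: List[str] = []
    for name, raw in zip(filenames, raw_images):
        try:
            decoded.append(decode(raw))
        except ImageCorruptionError as exc:
            # One bad image should not fail the whole batch.
            logger.error("Skipping corrupted image %s: %s", name, exc)
            skipped.append(name)
    return decoded, skipped
```

Returning the skipped filenames alongside the decoded images is a design choice of this sketch; it lets the caller report which sequence numbers are missing downstream.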
---
### `validate_batch(batch: ImageBatch) -> ValidationResult`
**Description**: Validates batch integrity and sequence continuity.
**Called By**:
- Internal (before queuing)
- H08 Batch Validator (delegated validation)
**Input**:
```python
batch: ImageBatch
```
**Output**:
```python
ValidationResult:
valid: bool
errors: List[str]
```
**Validation Rules**:
1. **Batch size**: 10 <= len(images) <= 50
2. **Naming convention**: ADxxxxxx.jpg (6 digits)
3. **Sequence continuity**: Consecutive numbers
4. **File format**: JPEG or PNG
5. **Image dimensions**: 640x480 to 6252x4168
6. **File size**: < 10MB per image
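Rules 1 to 3 can be checked from the filenames alone. The sketch below is one possible way to do that, not the H08 implementation; `check_batch_names` and the error message wording are assumptions. It follows rule 2 literally and accepts only the `.jpg` extension.

```python
import re
from typing import List

# "AD" followed by exactly six digits and a .jpg extension (rule 2).
FILENAME_RE = re.compile(r"AD(\d{6})\.jpg")
MIN_BATCH, MAX_BATCH = 10, 50


def check_batch_names(filenames: List[str]) -> List[str]:
    """Return a list of error strings; an empty list means rules 1-3 pass."""
    errors: List[str] = []
    if not MIN_BATCH <= len(filenames) <= MAX_BATCH:
        errors.append(f"batch size {len(filenames)} outside {MIN_BATCH}-{MAX_BATCH}")

    sequences: List[int] = []
    for name in filenames:
        match = FILENAME_RE.fullmatch(name)
        if match is None:
            errors.append(f"invalid filename: {name}")
        else:
            sequences.append(int(match.group(1)))

    # Each validly named file must follow its predecessor by exactly one.
    for prev, cur in zip(sequences, sequences[1:]):
        if cur != prev + 1:
            errors.append(f"sequence gap between {prev} and {cur}")
    return errors
```

The returned list maps directly onto `ValidationResult.errors`, with `valid` being true when the list is empty.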
**Test Cases**:
1. **Valid batch**: Returns valid=True
2. **Too few images**: 5 images → invalid
3. **Too many images**: 60 images → invalid
4. **Non-consecutive**: AD000101, AD000103 → invalid
5. **Invalid naming**: IMG_0001.jpg → invalid
---
### `store_images(flight_id: str, images: List[ImageData]) -> bool`
**Description**: Persists images to disk with indexed storage.
**Called By**:
- Internal (after processing batch)
**Input**:
```python
flight_id: str
images: List[ImageData]
```
**Output**:
```python
bool: True if stored successfully
```
**Storage Structure**:
```
/image_storage/
  {flight_id}/
    AD000001.jpg
    AD000002.jpg
    metadata.json
```
**Processing Flow**:
1. Create flight directory if not exists
2. Write each image to disk
3. Update metadata index
4. Persist to F03 Flight Database (metadata only)
**Error Conditions**:
- `StorageError`: Disk full or permission error (surfaced to the caller as a `False` return value, per the test cases)
**Test Cases**:
1. **Store batch**: All images written successfully
2. **Disk full**: Returns False
3. **Verify storage**: Images retrievable after storage
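A sketch of steps 1 and 3 of the processing flow (directory creation and metadata index update), using only the standard library. The layout of `metadata.json` is not specified in this document; keying entries by sequence number is an assumption made here for illustration, as is the function name `update_metadata_index`.

```python
import json
from pathlib import Path
from typing import Dict, List


def update_metadata_index(root: Path, flight_id: str, entries: List[Dict]) -> Path:
    """Merge new metadata entries into {root}/{flight_id}/metadata.json."""
    directory = root / flight_id
    directory.mkdir(parents=True, exist_ok=True)  # step 1: create if missing

    index_path = directory / "metadata.json"
    index: Dict[str, Dict] = {}
    if index_path.exists():
        index = json.loads(index_path.read_text(encoding="utf-8"))

    # Assumed schema: one entry per image, keyed by its sequence number.
    for entry in entries:
        index[str(entry["sequence"])] = entry

    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index_path
```

Keeping the index separate from the image files is what allows `get_image_metadata` to answer without loading pixel data.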
---
### `get_next_image(flight_id: str) -> Optional[ImageData]`
**Description**: Gets the next image in sequence for processing.
**Called By**:
- F02.2 Flight Processing Engine (main processing loop)
- F06 Image Rotation Manager (via F02.2)
- F07 Sequential VO (via F02.2)
**Input**:
```python
flight_id: str
```
**Output**:
```python
ImageData:
    flight_id: str
    sequence: int
    filename: str
    image: np.ndarray  # Loaded image
    metadata: ImageMetadata
```
**Processing Flow**:
1. Track current sequence number for flight
2. Load next image from disk
3. Increment sequence counter
4. Return ImageData
**Error Conditions**:
- Returns `None`: No more images
- `ImageNotFoundError`: Expected image missing
**Test Cases**:
1. **Get sequential images**: Returns images in order
2. **End of sequence**: Returns None
3. **Missing image**: Handles gracefully
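The per-flight sequence tracking in the processing flow could look roughly like this. `SequenceTracker` is a hypothetical helper; it assumes sequences start at 1 and that the caller knows the highest stored sequence number for the flight.

```python
from typing import Dict, Optional


class SequenceTracker:
    """Remembers the next sequence number to serve for each flight."""

    def __init__(self) -> None:
        self._next: Dict[str, int] = {}

    def next_sequence(self, flight_id: str, last_stored: int, first: int = 1) -> Optional[int]:
        """Return the next sequence to load, or None when none is left."""
        current = self._next.setdefault(flight_id, first)
        if current > last_stored:
            return None  # end of sequence: the counter is not advanced
        self._next[flight_id] = current + 1
        return current
```

Because the counter is not advanced on `None`, a later call succeeds as soon as more images have been stored for the flight.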
---
### `get_image_by_sequence(flight_id: str, sequence: int) -> Optional[ImageData]`
**Description**: Retrieves a specific image by sequence number.
**Called By**:
- F11 Failure Recovery Coordinator (for user fix)
- F13 Result Manager (for refinement)
**Input**:
```python
flight_id: str
sequence: int
```
**Output**:
```python
Optional[ImageData]
```
**Processing Flow**:
1. Construct filename from sequence (ADxxxxxx.jpg)
2. Load from disk
3. Load metadata
4. Return ImageData
**Error Conditions**:
- Returns `None`: Image not found
**Test Cases**:
1. **Get specific image**: Returns correct image
2. **Invalid sequence**: Returns None
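Step 1 of the processing flow is a zero-padded format. A minimal sketch, assuming sequence numbers are non-negative and fit in six digits; raising `ValueError` for out-of-range input is a choice of this example, and a caller such as `get_image_by_sequence` would translate it into a `None` return.

```python
def construct_filename(sequence: int) -> str:
    """Convert a sequence number to the ADxxxxxx.jpg naming convention."""
    if not 0 <= sequence <= 999_999:
        raise ValueError(f"sequence {sequence} does not fit in six digits")
    return f"AD{sequence:06d}.jpg"


print(construct_filename(101))  # AD000101.jpg
```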
---
### `get_image_metadata(flight_id: str, sequence: int) -> Optional[ImageMetadata]`
**Description**: Retrieves metadata without loading full image (lightweight).
**Called By**:
- F02.1 Flight Lifecycle Manager (status checks)
- F13 Result Manager (metadata-only queries)
**Input**:
```python
flight_id: str
sequence: int
```
**Output**:
```python
ImageMetadata:
    sequence: int
    filename: str
    dimensions: Tuple[int, int]  # (width, height)
    file_size: int  # bytes
    timestamp: datetime
    exif_data: Optional[Dict]
```
**Test Cases**:
1. **Get metadata**: Returns quickly without loading image
2. **Missing image**: Returns None
---
### `get_processing_status(flight_id: str) -> ProcessingStatus`
**Description**: Gets current processing status for a flight.
**Called By**:
- F02.1 Flight Lifecycle Manager (status queries via F01 Flight API)
- F02.2 Flight Processing Engine (processing loop status)
**Input**:
```python
flight_id: str
```
**Output**:
```python
ProcessingStatus:
    flight_id: str
    total_images: int
    processed_images: int
    current_sequence: int
    queued_batches: int
    processing_rate: float  # images/second
```
**Processing Flow**:
1. Get flight state from F03 Flight Database via `get_flight(flight_id).status`
2. Combine with internal queue status
3. Return ProcessingStatus
**Test Cases**:
1. **Get status**: Returns accurate counts
2. **During processing**: Updates in real-time
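The `processing_rate` field is given in images/second. One simple way to compute it is an average over the elapsed processing time; the spec does not say whether the rate is a lifetime average or a sliding window, so the function below is an assumption.

```python
def processing_rate(processed_images: int, elapsed_seconds: float) -> float:
    """Average images/second since processing started; 0.0 before any time has elapsed."""
    if elapsed_seconds <= 0:
        return 0.0
    return processed_images / elapsed_seconds


print(processing_rate(50, 2.5))  # 20.0
```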
## Integration Tests
### Test 1: Batch Processing Flow
1. queue_batch() with 50 images
2. process_next_batch() → returns batch
3. store_images() → persists to disk
4. get_next_image() × 50 → retrieves all sequentially
5. Verify metadata
### Test 2: Multiple Batches
1. queue_batch() × 5 (250 images total)
2. process_next_batch() × 5
3. Verify FIFO order maintained
4. Verify sequence continuity
### Test 3: Error Handling
1. Queue batch with sequence gap
2. Verify validation error
3. Queue valid batch → succeeds
4. Simulate disk full → storage fails gracefully
## Non-Functional Requirements
### Performance
- **queue_batch**: < 100ms
- **process_next_batch**: < 2 seconds for 50 images
- **get_next_image**: < 50ms
- **get_image_by_sequence**: < 50ms
- **Processing throughput**: 10-20 images/second
### Scalability
- Support 3000 images per flight
- Handle 10 concurrent flights
- Manage 100GB+ image storage
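As a rough sanity check, the stated figures can be combined to estimate how long a full flight takes and how many batches it involves. This is only arithmetic on the numbers above and in the validation rules, not a measured result.

```python
import math

IMAGES_PER_FLIGHT = 3000
MIN_RATE, MAX_RATE = 10, 20    # images/second (processing throughput)
MIN_BATCH, MAX_BATCH = 10, 50  # images per batch (validation rule 1)

slowest_seconds = IMAGES_PER_FLIGHT / MIN_RATE             # 300.0 s
fastest_seconds = IMAGES_PER_FLIGHT / MAX_RATE             # 150.0 s
fewest_batches = math.ceil(IMAGES_PER_FLIGHT / MAX_BATCH)  # 60 batches
most_batches = math.ceil(IMAGES_PER_FLIGHT / MIN_BATCH)    # 300 batches
```

At the stated throughput, a 3000-image flight therefore takes between 2.5 and 5 minutes of processing time and arrives in 60 to 300 batches.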
### Reliability
- Crash recovery (resume processing from last sequence)
- Atomic batch operations
- Data integrity validation
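One common way to make state files such as `metadata.json` survive a crash mid-write is to write to a temporary file and rename it over the target. The sketch below shows that general technique; it is not taken from this component's implementation, and `write_json_atomically` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomically(path: Path, payload: Any) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, path)     # atomic replacement of the target
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

On restart, the last successfully written state (for example the last stored sequence number) is always readable, which is the property that resuming from the last sequence relies on.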
## Dependencies
### Internal Components
- **F03 Flight Database**: For metadata persistence and flight state information
- **H08 Batch Validator**: For batch validation (naming convention, sequence continuity, format, dimensions)
### External Dependencies
- **opencv-python**: Image I/O
- **Pillow**: Image processing
- **numpy**: Image arrays
## Data Models
### ImageBatch
```python
class ImageBatch(BaseModel):
    images: List[bytes]
    filenames: List[str]
    start_sequence: int
    end_sequence: int
    batch_number: int
```
### ImageData
```python
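class ImageData(BaseModel):
    # np.ndarray is not a pydantic-native type: the model needs
    # arbitrary_types_allowed enabled in its config to accept this field.
    flight_id: str
    sequence: int
    filename: str
    image: np.ndarray
    metadata: ImageMetadata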
```
### ImageMetadata
```python
class ImageMetadata(BaseModel):
    sequence: int
    filename: str
    dimensions: Tuple[int, int]
    file_size: int
    timestamp: datetime
    exif_data: Optional[Dict]
```
### ProcessingStatus
```python
class ProcessingStatus(BaseModel):
    flight_id: str
    total_images: int
    processed_images: int
    current_sequence: int
    queued_batches: int
    processing_rate: float
```