# Batch Validator Helper

## Interface Definition

**Interface Name**: `IBatchValidator`

### Interface Methods

```python
from __future__ import annotations  # lets annotations name models defined elsewhere

from abc import ABC, abstractmethod
from typing import List

# ImageBatch and ValidationResult are defined under Data Models.


class IBatchValidator(ABC):
    @abstractmethod
    def validate_batch_size(self, batch: ImageBatch) -> ValidationResult:
        pass

    @abstractmethod
    def check_sequence_continuity(self, batch: ImageBatch, expected_start: int) -> ValidationResult:
        pass

    @abstractmethod
    def validate_naming_convention(self, filenames: List[str]) -> ValidationResult:
        pass

    @abstractmethod
    def validate_format(self, image_data: bytes) -> ValidationResult:
        pass
```

## Component Description

### Responsibilities
- Validate image batch integrity
- Check sequence continuity and naming conventions
- Validate image format and dimensions
- Ensure batch size constraints (10-50 images)
- Support strict sequential ordering (ADxxxxxx.jpg)

### Scope
- Batch validation for G05 Image Input Pipeline
- Image format validation
- Filename pattern matching
- Sequence gap detection

## API Methods

### `validate_batch_size(batch: ImageBatch) -> ValidationResult`

**Description**: Validates that the batch contains between 10 and 50 images, inclusive.

**Called By**:
- G05 Image Input Pipeline (before queuing)

**Input**:
```python
batch: ImageBatch:
    images: List[bytes]
    filenames: List[str]
    start_sequence: int
    end_sequence: int
```

**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```

**Validation Rules**:
- **Minimum batch size**: 10 images
- **Maximum batch size**: 50 images
- **Reason**: Balance between upload overhead and processing granularity

**Error Conditions**:
- Returns `valid=False` with an error message (does not raise an exception)

**Test Cases**:
1. **Valid batch (20 images)**: Returns `valid=True`
2. **Too few images (5)**: Returns `valid=False`, error="Batch size 5 below minimum 10"
3. **Too many images (60)**: Returns `valid=False`, error="Batch size 60 exceeds maximum 50"
4. **Empty batch**: Returns `valid=False`

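A minimal sketch of `validate_batch_size`, assuming the bounds come from the `ValidationRules` defaults (10 and 50, both inclusive). Plain dataclasses stand in for the Pydantic `ImageBatch` and `ValidationResult` models so the example runs on its own, and `make_batch` is a hypothetical test helper; the error strings follow the test cases above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    # Stand-in for the Pydantic model under Data Models
    valid: bool
    errors: List[str] = field(default_factory=list)


@dataclass
class ImageBatch:
    # Reduced stand-in: only the fields used in this example
    images: List[bytes]
    filenames: List[str]


def validate_batch_size(batch: ImageBatch, min_size: int = 10, max_size: int = 50) -> ValidationResult:
    # Bounds are inclusive: exactly 10 and exactly 50 images are accepted
    size = len(batch.images)
    if size < min_size:
        return ValidationResult(False, [f"Batch size {size} below minimum {min_size}"])
    if size > max_size:
        return ValidationResult(False, [f"Batch size {size} exceeds maximum {max_size}"])
    return ValidationResult(True)


def make_batch(n: int) -> ImageBatch:
    # Hypothetical helper: n empty payloads with consecutive AD-style names
    return ImageBatch(images=[b""] * n, filenames=[f"AD{i:06d}.jpg" for i in range(1, n + 1)])


print(validate_batch_size(make_batch(5)).errors)  # ['Batch size 5 below minimum 10']
print(validate_batch_size(make_batch(50)).valid)  # True
```

Counting `images` rather than `filenames` is an assumption; a stricter version could also reject batches where the two lists differ in length.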
---

### `check_sequence_continuity(batch: ImageBatch, expected_start: int) -> ValidationResult`

**Description**: Validates that the images form a consecutive sequence with no gaps.

**Called By**:
- G05 Image Input Pipeline (before queuing)

**Input**:
```python
batch: ImageBatch
expected_start: int  # Expected starting sequence number
```

**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```

**Validation Rules**:
1. **Sequence starts at expected_start**: The first image's sequence number equals `expected_start`
2. **Consecutive numbers**: No gaps in the sequence (AD000101, AD000102, AD000103, ...)
3. **Filename extraction**: The sequence number is parsed from the ADxxxxxx.jpg pattern
4. **Strict ordering**: Images must be in ascending sequential order

**Algorithm**:
```python
def check_sequence_continuity(batch: ImageBatch, expected_start: int) -> ValidationResult:
    if not batch.filenames:
        return invalid("Batch is empty")
    # Assumes filenames already passed validate_naming_convention;
    # extract_sequence raises ValueError on a malformed name.
    sequences = [extract_sequence(filename) for filename in batch.filenames]
    if sequences[0] != expected_start:
        return invalid(f"Expected start {expected_start}, got {sequences[0]}")
    for prev, curr in zip(sequences, sequences[1:]):
        if curr != prev + 1:
            return invalid(f"Gap detected: {prev} -> {curr}")
    return valid()
```

**Error Conditions**:
- Returns `valid=False` with specific gap information

**Test Cases**:
1. **Valid sequence (101-150)**: expected_start=101 → valid=True
2. **Wrong start**: expected_start=101, got 102 → valid=False
3. **Gap in sequence**: AD000101, AD000103 (missing 102) → valid=False
4. **Out of order**: AD000102, AD000101 → valid=False

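`extract_sequence` raises `ValueError` on a malformed name, while the Reliability requirements say the validator never raises. One way to reconcile the two, consistent with `validate_naming_convention` being called internally during `check_sequence_continuity`, is to run the naming check first. The sketch below assumes that ordering, takes the filename list directly instead of an `ImageBatch`, and uses a dataclass stand-in for `ValidationResult`:

```python
import re
from dataclasses import dataclass, field
from typing import List

# Same pattern as the naming convention, with the digits captured as group 1
FILENAME_PATTERN = re.compile(r"^AD(\d{6})\.(jpg|JPG|png|PNG)$")


@dataclass
class ValidationResult:
    valid: bool
    errors: List[str] = field(default_factory=list)


def check_sequence_continuity(filenames: List[str], expected_start: int) -> ValidationResult:
    if not filenames:
        return ValidationResult(False, ["Batch is empty"])
    # Naming check first: malformed names become errors instead of a ValueError
    bad = [name for name in filenames if not FILENAME_PATTERN.fullmatch(name)]
    if bad:
        return ValidationResult(False, [f"Invalid filenames: {bad}"])
    sequences = [int(FILENAME_PATTERN.fullmatch(name).group(1)) for name in filenames]
    if sequences[0] != expected_start:
        return ValidationResult(False, [f"Expected start {expected_start}, got {sequences[0]}"])
    for prev, curr in zip(sequences, sequences[1:]):
        # One rule covers gaps, duplicates, and out-of-order images
        if curr != prev + 1:
            return ValidationResult(False, [f"Gap detected: {prev} -> {curr}"])
    return ValidationResult(True)
```

Stopping at the first problem keeps the error message short, at the cost of not reporting every gap in a single pass.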
---

### `validate_naming_convention(filenames: List[str]) -> ValidationResult`

**Description**: Validates that filenames match the ADxxxxxx.jpg pattern.

**Called By**:
- Internal (during check_sequence_continuity)
- G05 Image Input Pipeline

**Input**:
```python
filenames: List[str]
```

**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```

**Validation Rules**:
1. **Pattern**: `AD\d{6}\.(jpg|JPG|png|PNG)`
2. **Examples**: AD000001.jpg, AD000237.JPG, AD002000.png
3. **Extension case**: Accepts all-lowercase or all-uppercase extensions (.jpg, .JPG, .png, .PNG); mixed case such as .Jpg does not match the pattern
4. **6 digits required**: Zero-padded to 6 digits

**Regex Pattern**: `^AD\d{6}\.(jpg|JPG|png|PNG)$`

**Error Conditions**:
- Returns `valid=False` listing the invalid filenames

**Test Cases**:
1. **Valid names**: ["AD000001.jpg", "AD000002.jpg"] → valid=True
2. **Invalid prefix**: "IMG_0001.jpg" → valid=False
3. **Wrong digit count**: "AD001.jpg" (3 digits) → valid=False
4. **Missing extension**: "AD000001" → valid=False
5. **Invalid extension**: "AD000001.bmp" → valid=False

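A sketch of `validate_naming_convention` that applies the regex above unchanged and reports every non-matching name, not just the first. `re.fullmatch` is used so that a trailing newline, which `$` tolerates under `re.match`, is also rejected; the dataclass is a stand-in for the Pydantic `ValidationResult`.

```python
import re
from dataclasses import dataclass, field
from typing import List

FILENAME_PATTERN = re.compile(r"^AD\d{6}\.(jpg|JPG|png|PNG)$")


@dataclass
class ValidationResult:
    valid: bool
    errors: List[str] = field(default_factory=list)


def validate_naming_convention(filenames: List[str]) -> ValidationResult:
    # fullmatch requires the whole string to match, including the end
    bad = [name for name in filenames if not FILENAME_PATTERN.fullmatch(name)]
    errors = [f"Invalid filename: {name}" for name in bad]
    return ValidationResult(valid=not errors, errors=errors)
```

An empty list passes this check; empty batches are caught by `validate_batch_size` instead.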
---

### `validate_format(image_data: bytes) -> ValidationResult`

**Description**: Validates a single image's file format, dimensions, file size, integrity, and color channels.

**Called By**:
- G05 Image Input Pipeline (per-image validation)

**Input**:
```python
image_data: bytes  # Raw image file bytes
```

**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```

**Validation Rules**:
1. **Format**: Valid JPEG or PNG
2. **Dimensions**: 640×480 to 6252×4168 pixels
3. **File size**: < 10MB per image
4. **Image readable**: Not corrupted
5. **Color channels**: RGB (3 channels)

**Algorithm**:
```python
from io import BytesIO

from PIL import Image


def validate_format(image_data: bytes) -> ValidationResult:
    # Size check first: it is cheap and avoids decoding oversized files
    if len(image_data) > 10 * 1024 * 1024:
        return invalid("File size exceeds 10MB")
    try:
        # Image.open() is lazy: format and size come from the header, pixels are not decoded yet
        image = Image.open(BytesIO(image_data))
        width, height = image.size
        if image.format not in ("JPEG", "PNG"):
            return invalid("Format must be JPEG or PNG")
        if width < 640 or height < 480:
            return invalid("Dimensions too small")
        if width > 6252 or height > 4168:
            return invalid("Dimensions too large")
        if image.mode != "RGB":
            return invalid(f"Expected RGB (3 channels), got mode {image.mode}")
        # load() forces a full decode, so truncated or corrupted pixel data fails here
        image.load()
        return valid()
    except Exception as e:
        return invalid(f"Corrupted image: {e}")
```

**Error Conditions**:
- Returns `valid=False` with a specific error

**Test Cases**:
1. **Valid JPEG (2048×1536)**: valid=True
2. **Valid PNG (6252×4168)**: valid=True
3. **Too small (320×240)**: valid=False
4. **Too large (8000×6000)**: valid=False
5. **File too big (15MB)**: valid=False
6. **Corrupted file**: valid=False
7. **BMP format**: valid=False

## Integration Tests

### Test 1: Complete Batch Validation
1. Create a batch of 20 images, AD000101.jpg - AD000120.jpg
2. validate_batch_size() → valid
3. validate_naming_convention() → valid
4. check_sequence_continuity(expected_start=101) → valid
5. validate_format() for each image → all valid

### Test 2: Invalid Batch Detection
1. Create batch with 60 images → validate_batch_size() fails
2. Create batch with gap (AD000101, AD000103) → check_sequence_continuity() fails
3. Create batch with IMG_0001.jpg → validate_naming_convention() fails
4. Create batch with corrupted image → validate_format() fails

### Test 3: Edge Cases
1. Batch with exactly 10 images → valid
2. Batch with exactly 50 images → valid
3. Batch with 51 images → invalid
4. Batch of 10 images ending at the largest 6-digit sequence number, AD999990.jpg - AD999999.jpg → valid

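The integration tests need batches of correctly named files. A hypothetical fixture helper (not part of `IBatchValidator`) could generate consecutive, zero-padded names and refuse ranges that overflow six digits; for example, a 10-image batch can start no later than AD999990.jpg.

```python
from typing import List

MAX_SEQUENCE = 999_999  # largest number that fits the ADxxxxxx pattern


def make_filenames(start: int, count: int, ext: str = "jpg") -> List[str]:
    """Hypothetical test helper: consecutive AD-prefixed, zero-padded names."""
    end = start + count - 1
    if start < 0 or end > MAX_SEQUENCE:
        raise ValueError(f"Sequence range {start}-{end} does not fit in 6 digits")
    return [f"AD{seq:06d}.{ext}" for seq in range(start, end + 1)]


names = make_filenames(101, 20)
print(names[0], names[-1])  # AD000101.jpg AD000120.jpg
```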
## Non-Functional Requirements

### Performance
- **validate_batch_size**: < 1ms
- **check_sequence_continuity**: < 10ms for 50 images
- **validate_naming_convention**: < 5ms for 50 filenames
- **validate_format**: < 20ms per image (with PIL)
- **Total batch validation**: < 100ms for 50 images, excluding `validate_format` (50 images at up to 20ms each could take up to 1s on their own)

### Reliability
- Never raises exceptions (returns ValidationResult with errors)
- Handles edge cases gracefully (empty batches, boundary batch sizes, unreadable image data)
- Clear, actionable error messages

### Maintainability
- Configurable validation rules (min/max batch size, dimensions)
- Easy to add new validation rules
- Comprehensive error reporting

## Dependencies

### Internal Components
- None (pure utility, no internal dependencies)

### External Dependencies
- **Pillow (PIL)**: Image format validation and dimension checking
- **Pydantic**: `BaseModel` base class for the data models
- **re** (Python standard library): Filename pattern matching

## Data Models

### ImageBatch
```python
from typing import List

from pydantic import BaseModel


class ImageBatch(BaseModel):
    images: List[bytes]  # Raw image data
    filenames: List[str]  # e.g., ["AD000101.jpg", ...]
    start_sequence: int  # e.g., 101
    end_sequence: int  # e.g., 150
    batch_number: int  # Sequential batch number
```

### ValidationResult
```python
class ValidationResult(BaseModel):
    valid: bool
    errors: List[str] = []  # Empty if valid
    warnings: List[str] = []  # Optional warnings
```

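The algorithms above call `valid()` and `invalid(...)` without defining them. A minimal sketch of what these helpers might look like, plus a hypothetical `merge` for combining the per-image `validate_format` results into one batch-level result. A dataclass with the same fields stands in for the Pydantic model so the example runs without Pydantic:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def valid(*warnings: str) -> ValidationResult:
    # A passing result may still carry non-fatal warnings
    return ValidationResult(valid=True, warnings=list(warnings))


def invalid(*errors: str) -> ValidationResult:
    # A failing result carries one or more error messages
    return ValidationResult(valid=False, errors=list(errors))


def merge(results: List[ValidationResult]) -> ValidationResult:
    # Hypothetical: the batch is valid only if every individual check passed
    return ValidationResult(
        valid=all(r.valid for r in results),
        errors=[e for r in results for e in r.errors],
        warnings=[w for r in results for w in r.warnings],
    )
```

Collecting all errors in `merge` rather than stopping at the first failure matches the "comprehensive error reporting" goal under Maintainability.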
### ValidationRules (Configuration)
```python
class ValidationRules(BaseModel):
    min_batch_size: int = 10
    max_batch_size: int = 50
    min_width: int = 640
    min_height: int = 480
    max_width: int = 6252
    max_height: int = 4168
    max_file_size_mb: int = 10
    allowed_formats: List[str] = ["JPEG", "PNG"]
    filename_pattern: str = r"^AD\d{6}\.(jpg|JPG|png|PNG)$"
```

### Sequence Extraction
```python
import re


def extract_sequence(filename: str) -> int:
    """
    Extracts the sequence number from a filename.

    Example: "AD000237.jpg" -> 237

    Raises ValueError if the filename does not start with ADxxxxxx.
    """
    match = re.match(r"AD(\d{6})\.", filename)
    if match:
        return int(match.group(1))
    raise ValueError(f"Invalid filename format: {filename}")
```