component decomposition is done

Oleksandr Bezdieniezhnykh
2025-11-24 14:09:23 +02:00
parent acec83018b
commit f50006d100
# Batch Validator Helper
## Interface Definition
**Interface Name**: `IBatchValidator`
### Interface Methods
```python
from abc import ABC, abstractmethod
from typing import List

# ImageBatch and ValidationResult are defined under Data Models.
class IBatchValidator(ABC):
    @abstractmethod
    def validate_batch_size(self, batch: ImageBatch) -> ValidationResult:
        pass

    @abstractmethod
    def check_sequence_continuity(self, batch: ImageBatch, expected_start: int) -> ValidationResult:
        pass

    @abstractmethod
    def validate_naming_convention(self, filenames: List[str]) -> ValidationResult:
        pass

    @abstractmethod
    def validate_format(self, image_data: bytes) -> ValidationResult:
        pass
```
## Component Description
### Responsibilities
- Validate image batch integrity
- Check sequence continuity and naming conventions
- Validate image format and dimensions
- Ensure batch size constraints (10-50 images)
- Support strict sequential ordering (ADxxxxxx.jpg)
### Scope
- Batch validation for G05 Image Input Pipeline
- Image format validation
- Filename pattern matching
- Sequence gap detection
## API Methods
### `validate_batch_size(batch: ImageBatch) -> ValidationResult`
**Description**: Validates that the batch contains between 10 and 50 images, inclusive.
**Called By**:
- G05 Image Input Pipeline (before queuing)
**Input**:
```python
batch: ImageBatch:
    images: List[bytes]
    filenames: List[str]
    start_sequence: int
    end_sequence: int
```
**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```
**Validation Rules**:
- **Minimum batch size**: 10 images
- **Maximum batch size**: 50 images
- **Reason**: Balance between upload overhead and processing granularity
**Error Conditions**:
- Returns `valid=False` with error message (not an exception)
**Test Cases**:
1. **Valid batch (20 images)**: Returns `valid=True`
2. **Too few images (5)**: Returns `valid=False`, error="Batch size 5 below minimum 10"
3. **Too many images (60)**: Returns `valid=False`, error="Batch size 60 exceeds maximum 50"
4. **Empty batch**: Returns `valid=False`
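A minimal sketch of the size check, assuming a plain dataclass stand-in for the Pydantic `ValidationResult` and taking the image count (`len(batch.images)`) directly so it runs without the `ImageBatch` model. The limits default to the 10 and 50 from the rules above and are parameters so they could be fed from `ValidationRules`; the error strings follow the test cases.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Stand-in for the Pydantic ValidationResult model (assumption: plain dataclass)."""
    valid: bool
    errors: List[str] = field(default_factory=list)


def validate_batch_size(image_count: int, min_size: int = 10, max_size: int = 50) -> ValidationResult:
    """Accept batches holding min_size..max_size images, inclusive; report instead of raising."""
    if image_count < min_size:
        return ValidationResult(False, [f"Batch size {image_count} below minimum {min_size}"])
    if image_count > max_size:
        return ValidationResult(False, [f"Batch size {image_count} exceeds maximum {max_size}"])
    return ValidationResult(True)


# Usage mirroring the test cases above
print(validate_batch_size(20).valid)   # True
print(validate_batch_size(5).errors)   # ['Batch size 5 below minimum 10']
```

An empty batch needs no special case here: a count of 0 falls below the minimum and is reported like any other undersized batch.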
---
### `check_sequence_continuity(batch: ImageBatch, expected_start: int) -> ValidationResult`
**Description**: Validates that the images form a consecutive sequence with no gaps.
**Called By**:
- G05 Image Input Pipeline (before queuing)
**Input**:
```python
batch: ImageBatch
expected_start: int # Expected starting sequence number
```
**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```
**Validation Rules**:
1. **Sequence starts at expected_start**: First image sequence == expected_start
2. **Consecutive numbers**: No gaps in sequence (AD000101, AD000102, AD000103, ...)
3. **Filename extraction**: Parse sequence from ADxxxxxx.jpg pattern
4. **Strict ordering**: Images must be in sequential order
**Algorithm**:
```python
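# valid() / invalid(msg) build ValidationResult objects
if not batch.filenames:
    return invalid("Batch is empty")
try:
    sequences = [extract_sequence(filename) for filename in batch.filenames]
except ValueError as e:  # malformed filename: report it instead of raising
    return invalid(str(e))
if sequences[0] != expected_start:
    return invalid(f"Expected start {expected_start}, got {sequences[0]}")
for i in range(len(sequences) - 1):
    if sequences[i + 1] != sequences[i] + 1:
        return invalid(f"Gap detected: {sequences[i]} -> {sequences[i + 1]}")
return valid()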
```
**Error Conditions**:
- Returns `valid=False` with specific gap information
**Test Cases**:
1. **Valid sequence (101-150)**: expected_start=101 → valid=True
2. **Wrong start**: expected_start=101, got 102 → valid=False
3. **Gap in sequence**: AD000101, AD000103 (missing 102) → valid=False
4. **Out of order**: AD000102, AD000101 → valid=False
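A runnable sketch of the continuity check, assuming it receives the filename list directly (rather than an `ImageBatch`) and using a dataclass stand-in for `ValidationResult`. `extract_sequence` follows the helper under Data Models; malformed names are turned into errors so the validator itself never raises.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Stand-in for the Pydantic ValidationResult model (assumption: plain dataclass)."""
    valid: bool
    errors: List[str] = field(default_factory=list)


_SEQ_RE = re.compile(r"AD(\d{6})\.")


def extract_sequence(filename: str) -> int:
    """Parse the 6-digit sequence number from an ADxxxxxx.<ext> filename."""
    match = _SEQ_RE.match(filename)
    if match is None:
        raise ValueError(f"Invalid filename format: {filename}")
    return int(match.group(1))


def check_sequence_continuity(filenames: List[str], expected_start: int) -> ValidationResult:
    """Valid only if the names are expected_start, expected_start + 1, ... with no gaps."""
    if not filenames:
        return ValidationResult(False, ["Batch is empty"])
    try:
        sequences = [extract_sequence(name) for name in filenames]
    except ValueError as exc:
        return ValidationResult(False, [str(exc)])
    if sequences[0] != expected_start:
        return ValidationResult(False, [f"Expected start {expected_start}, got {sequences[0]}"])
    # Compare each neighbouring pair; out-of-order input also fails the +1 test
    for prev, curr in zip(sequences, sequences[1:]):
        if curr != prev + 1:
            return ValidationResult(False, [f"Gap detected: {prev} -> {curr}"])
    return ValidationResult(True)


names = [f"AD{n:06d}.jpg" for n in range(101, 151)]
print(check_sequence_continuity(names, 101).valid)  # True
print(check_sequence_continuity(["AD000101.jpg", "AD000103.jpg"], 101).errors)
# ['Gap detected: 101 -> 103']
```

Because the rule is strictly "next = previous + 1", reversed pairs such as AD000102, AD000101 are caught by the same comparison as gaps; no separate ordering check is needed.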
---
### `validate_naming_convention(filenames: List[str]) -> ValidationResult`
**Description**: Validates that every filename matches the `ADxxxxxx.jpg` pattern (JPEG or PNG extension).
**Called By**:
- Internal (during check_sequence_continuity)
- G05 Image Input Pipeline
**Input**:
```python
filenames: List[str]
```
**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```
**Validation Rules**:
1. **Pattern**: `AD\d{6}\.(jpg|JPG|png|PNG)`
2. **Examples**: AD000001.jpg, AD000237.JPG, AD002000.png
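3. **Extension case**: Accepts all-lowercase or all-uppercase extensions (.jpg, .JPG, .png, .PNG); mixed case such as .Jpg does not match the pattern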
4. **6 digits required**: Zero-padded to 6 digits
**Regex Pattern**: `^AD\d{6}\.(jpg|JPG|png|PNG)$`
**Error Conditions**:
- Returns `valid=False` listing invalid filenames
**Test Cases**:
1. **Valid names**: ["AD000001.jpg", "AD000002.jpg"] → valid=True
2. **Invalid prefix**: "IMG_0001.jpg" → valid=False
3. **Wrong digit count**: "AD001.jpg" (3 digits) → valid=False
4. **Missing extension**: "AD000001" → valid=False
5. **Invalid extension**: "AD000001.bmp" → valid=False
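A minimal sketch of the naming check, assuming a dataclass stand-in for `ValidationResult` and an `"Invalid filename: ..."` message format of our own choosing. It collects every offending name rather than stopping at the first, matching "listing invalid filenames" above.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Stand-in for the Pydantic ValidationResult model (assumption: plain dataclass)."""
    valid: bool
    errors: List[str] = field(default_factory=list)


# Spec pattern without ^...$: fullmatch() already requires the whole string to match
FILENAME_RE = re.compile(r"AD\d{6}\.(jpg|JPG|png|PNG)")


def validate_naming_convention(filenames: List[str]) -> ValidationResult:
    """Report every filename that does not match ADxxxxxx.(jpg|JPG|png|PNG)."""
    bad = [name for name in filenames if FILENAME_RE.fullmatch(name) is None]
    if bad:
        return ValidationResult(False, [f"Invalid filename: {name}" for name in bad])
    return ValidationResult(True)


print(validate_naming_convention(["AD000001.jpg", "IMG_0001.jpg", "AD001.jpg"]).errors)
# ['Invalid filename: IMG_0001.jpg', 'Invalid filename: AD001.jpg']
```

`re.fullmatch` is used instead of `re.match` with `^...$` because in Python `$` also matches just before a trailing newline, so `"AD000001.jpg\n"` would otherwise slip through.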
---
### `validate_format(image_data: bytes) -> ValidationResult`
**Description**: Validates image file format and properties.
**Called By**:
- G05 Image Input Pipeline (per-image validation)
**Input**:
```python
image_data: bytes # Raw image file bytes
```
**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```
**Validation Rules**:
1. **Format**: Valid JPEG or PNG
2. **Dimensions**: 640×480 to 6252×4168 pixels
3. **File size**: At most 10MB (10 × 1024 × 1024 bytes) per image
4. **Image readable**: Not corrupted
5. **Color channels**: RGB (3 channels)
**Algorithm**:
```python
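from io import BytesIO
from PIL import Image

# Cheapest check first: reject oversized payloads before decoding anything
if len(image_data) > 10 * 1024 * 1024:
    return invalid("File size exceeds 10MB")
try:
    image = Image.open(BytesIO(image_data))  # reads the header only
    width, height = image.size
    fmt, mode = image.format, image.mode
    image.verify()  # checks integrity without decoding pixels; image must be reopened to use it
except Exception as e:
    return invalid(f"Corrupted image: {e}")
if fmt not in ("JPEG", "PNG"):
    return invalid("Format must be JPEG or PNG")
if width < 640 or height < 480:
    return invalid("Dimensions too small")
if width > 6252 or height > 4168:
    return invalid("Dimensions too large")
if mode != "RGB":
    return invalid(f"Expected RGB (3 channels), got mode {mode}")
return valid()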
```
**Error Conditions**:
- Returns `valid=False` with specific error
**Test Cases**:
1. **Valid JPEG (2048×1536)**: valid=True
2. **Valid PNG (6252×4168)**: valid=True
3. **Too small (320×240)**: valid=False
4. **Too large (8000×6000)**: valid=False
5. **File too big (15MB)**: valid=False
6. **Corrupted file**: valid=False
7. **BMP format**: valid=False
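The range checks do not depend on decoding, so they can be factored into a pure function that is testable without image fixtures. A minimal sketch, assuming a hypothetical `check_image_properties` helper that receives the metadata Pillow exposes (`image.format`, `image.size`, `image.mode`) plus the byte length; unlike the early-return algorithm above, it gathers every violation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Stand-in for the Pydantic ValidationResult model (assumption: plain dataclass)."""
    valid: bool
    errors: List[str] = field(default_factory=list)


MAX_FILE_BYTES = 10 * 1024 * 1024  # "10MB" as 10 x 1024 x 1024 bytes, as in the algorithm


def check_image_properties(fmt: str, width: int, height: int, mode: str, size_bytes: int) -> ValidationResult:
    """Apply the format, dimension, size and channel rules to already-read metadata."""
    errors: List[str] = []
    if fmt not in ("JPEG", "PNG"):
        errors.append(f"Format must be JPEG or PNG, got {fmt}")
    if width < 640 or height < 480:
        errors.append(f"Dimensions too small: {width}x{height}")
    if width > 6252 or height > 4168:
        errors.append(f"Dimensions too large: {width}x{height}")
    if size_bytes > MAX_FILE_BYTES:
        errors.append(f"File size {size_bytes} bytes exceeds 10MB")
    if mode != "RGB":
        errors.append(f"Expected RGB (3 channels), got mode {mode}")
    return ValidationResult(not errors, errors)


print(check_image_properties("PNG", 8000, 6000, "RGBA", 1_000_000).errors)
# ['Dimensions too large: 8000x6000', 'Expected RGB (3 channels), got mode RGBA']
```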
## Integration Tests
### Test 1: Complete Batch Validation
1. Create batch with 20 images, AD000101.jpg - AD000120.jpg
2. validate_batch_size() → valid
3. validate_naming_convention() → valid
4. check_sequence_continuity(expected_start=101) → valid
5. validate_format() for each image → all valid
### Test 2: Invalid Batch Detection
1. Create batch with 60 images → validate_batch_size() fails
2. Create batch with gap (AD000101, AD000103) → check_sequence_continuity() fails
3. Create batch with IMG_0001.jpg → validate_naming_convention() fails
4. Create batch with corrupted image → validate_format() fails
### Test 3: Edge Cases
1. Batch with exactly 10 images → valid
2. Batch with exactly 50 images → valid
3. Batch with 51 images → invalid
4. Batch AD999990.jpg - AD999999.jpg (10 images ending at the 6-digit maximum) → valid
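The filename fixtures for these tests can be generated rather than written by hand. A minimal sketch, assuming hypothetical helpers `make_filenames` and `highest_valid_start`; it also makes the ceiling arithmetic explicit: a batch of n images stays within 6 digits only if it starts at or before 999999 - n + 1, so a minimum-size batch of 10 can start no later than AD999990.

```python
from typing import List

MAX_SEQUENCE = 999_999  # largest value that fits the 6-digit ADxxxxxx field


def make_filenames(start: int, count: int, ext: str = "jpg") -> List[str]:
    """Build consecutive ADxxxxxx filenames for test fixtures (hypothetical helper)."""
    end = start + count - 1
    if start < 0 or end > MAX_SEQUENCE:
        raise ValueError(f"Sequence range {start}-{end} does not fit in 6 digits")
    return [f"AD{n:06d}.{ext}" for n in range(start, end + 1)]


def highest_valid_start(batch_size: int) -> int:
    """Largest start for which a batch of batch_size still ends at or below 999999."""
    return MAX_SEQUENCE - batch_size + 1


print(make_filenames(101, 3))   # ['AD000101.jpg', 'AD000102.jpg', 'AD000103.jpg']
print(highest_valid_start(10))  # 999990
```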
## Non-Functional Requirements
### Performance
- **validate_batch_size**: < 1ms
- **check_sequence_continuity**: < 10ms for 50 images
- **validate_naming_convention**: < 5ms for 50 filenames
- **validate_format**: < 20ms per image (with PIL)
- **Total batch validation**: < 100ms for 50 images
### Reliability
- Never raises exceptions (returns ValidationResult with errors)
- Handles edge cases gracefully
- Clear, actionable error messages
### Maintainability
- Configurable validation rules (min/max batch size, dimensions)
- Easy to add new validation rules
- Comprehensive error reporting
## Dependencies
### Internal Components
- None (pure utility, no internal dependencies)
### External Dependencies
- **Pillow (PIL)**: Image format validation and dimension checking
- **re** (regex): Filename pattern matching
- **Pydantic**: `BaseModel` base class for the data models below
## Data Models
### ImageBatch
```python
from typing import List
from pydantic import BaseModel

class ImageBatch(BaseModel):
    images: List[bytes]     # Raw image data
    filenames: List[str]    # e.g., ["AD000101.jpg", ...]
    start_sequence: int     # e.g., 101
    end_sequence: int       # e.g., 150
    batch_number: int       # Sequential batch number
```
### ValidationResult
```python
from typing import List
from pydantic import BaseModel

class ValidationResult(BaseModel):
    valid: bool
    errors: List[str] = []    # Empty if valid
    warnings: List[str] = []  # Optional warnings
```
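Integration Test 1 runs several validators over one batch, so a caller typically needs to fold their results into one report. A minimal sketch, assuming a dataclass mirror of the model above (dataclasses reject a bare `[]` default, hence `default_factory`) and a hypothetical `combine` helper that is not part of the `IBatchValidator` interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    """Dataclass mirror of the Pydantic model (assumption, to stay dependency-free)."""
    valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def combine(results: List[ValidationResult]) -> ValidationResult:
    """Hypothetical helper: valid only if every input is valid; messages are concatenated."""
    errors = [e for r in results for e in r.errors]
    warnings = [w for r in results for w in r.warnings]
    return ValidationResult(all(r.valid for r in results), errors, warnings)


print(combine([ValidationResult(True), ValidationResult(False, ["Batch size 5 below minimum 10"])]))
# ValidationResult(valid=False, errors=['Batch size 5 below minimum 10'], warnings=[])
```

Note that `all()` of an empty list is `True`, so combining zero results yields a valid result; callers that consider "nothing checked" a failure would need to test for that separately.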
### ValidationRules (Configuration)
```python
from typing import List
from pydantic import BaseModel

class ValidationRules(BaseModel):
    min_batch_size: int = 10
    max_batch_size: int = 50
    min_width: int = 640
    min_height: int = 480
    max_width: int = 6252
    max_height: int = 4168
    max_file_size_mb: int = 10
    allowed_formats: List[str] = ["JPEG", "PNG"]
    filename_pattern: str = r"^AD\d{6}\.(jpg|JPG|png|PNG)$"
```
### Sequence Extraction
```python
import re

def extract_sequence(filename: str) -> int:
    """
    Extracts sequence number from filename.
    Example: "AD000237.jpg" -> 237
    """
    match = re.match(r"AD(\d{6})\.", filename)
    if match:
        return int(match.group(1))
    raise ValueError(f"Invalid filename format: {filename}")
```