# Batch Validator Helper

## Interface Definition

**Interface Name**: `IBatchValidator`

### Interface Methods

```python
from abc import ABC, abstractmethod
from typing import List

# ImageBatch and ValidationResult are defined under "Data Models".


class IBatchValidator(ABC):
    @abstractmethod
    def validate_batch_size(self, batch: ImageBatch) -> ValidationResult:
        pass

    @abstractmethod
    def check_sequence_continuity(self, batch: ImageBatch, expected_start: int) -> ValidationResult:
        pass

    @abstractmethod
    def validate_naming_convention(self, filenames: List[str]) -> ValidationResult:
        pass

    @abstractmethod
    def validate_format(self, image_data: bytes) -> ValidationResult:
        pass
```

## Component Description

### Responsibilities

- Validate image batch integrity
- Check sequence continuity and naming conventions
- Validate image format and dimensions
- Ensure batch size constraints (10-50 images)
- Support strict sequential ordering (ADxxxxxx.jpg)

### Scope

- Batch validation for F05 Image Input Pipeline
- Image format validation
- Filename pattern matching
- Sequence gap detection

## API Methods

### `validate_batch_size(batch: ImageBatch) -> ValidationResult`

**Description**: Validates that the batch contains 10-50 images.

**Called By**:
- F05 Image Input Pipeline (before queuing)

**Input**:
```python
batch: ImageBatch:
    images: List[bytes]
    filenames: List[str]
    start_sequence: int
    end_sequence: int
```

**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```

**Validation Rules**:
- **Minimum batch size**: 10 images
- **Maximum batch size**: 50 images
- **Reason**: Balance between upload overhead and processing granularity

**Error Conditions**:
- Returns `valid=False` with error message (not an exception)

**Test Cases**:
1. **Valid batch (20 images)**: Returns `valid=True`
2. **Too few images (5)**: Returns `valid=False`, error="Batch size 5 below minimum 10"
3. **Too many images (60)**: Returns `valid=False`, error="Batch size 60 exceeds maximum 50"
4. **Empty batch**: Returns `valid=False`

---

### `check_sequence_continuity(batch: ImageBatch, expected_start: int) -> ValidationResult`

**Description**: Validates that the images form a consecutive sequence with no gaps.

**Called By**:
- F05 Image Input Pipeline (before queuing)

**Input**:
```python
batch: ImageBatch
expected_start: int  # Expected starting sequence number
```

**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```

**Validation Rules**:
1. **Sequence starts at expected_start**: First image sequence == expected_start
2. **Consecutive numbers**: No gaps in sequence (AD000101, AD000102, AD000103, ...)
3. **Filename extraction**: Parse sequence from ADxxxxxx.jpg pattern
4. **Strict ordering**: Images must be in sequential order

**Algorithm**:
```python
# valid() and invalid(msg) are shorthand for building a ValidationResult.
# Assumes the filenames already passed validate_naming_convention
# and that the batch is non-empty.
sequences = [extract_sequence(filename) for filename in batch.filenames]

if sequences[0] != expected_start:
    return invalid(f"Expected start {expected_start}, got {sequences[0]}")

for i in range(len(sequences) - 1):
    if sequences[i+1] != sequences[i] + 1:
        return invalid(f"Gap detected: {sequences[i]} -> {sequences[i+1]}")

return valid()
```

**Error Conditions**:
- Returns `valid=False` with specific gap information

**Test Cases**:
1. **Valid sequence (101-150)**: expected_start=101 → valid=True
2. **Wrong start**: expected_start=101, got 102 → valid=False
3. **Gap in sequence**: AD000101, AD000103 (missing 102) → valid=False
4. **Out of order**: AD000102, AD000101 → valid=False

---

### `validate_naming_convention(filenames: List[str]) -> ValidationResult`

**Description**: Validates that filenames match the ADxxxxxx.jpg pattern (a .png extension is also accepted).

**Called By**:
- Internal (during check_sequence_continuity)
- F05 Image Input Pipeline

**Input**:
```python
filenames: List[str]
```

**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```

**Validation Rules**:
1. **Pattern**: `AD\d{6}\.(jpg|JPG|png|PNG)`
2. **Examples**: AD000001.jpg, AD000237.JPG, AD002000.png
3. **Extension case**: Accepts all-lowercase or all-uppercase extensions (.jpg, .JPG, .png, .PNG). The pattern as written does not match mixed case such as .Jpg; compile it with `re.IGNORECASE` if mixed case must be accepted.
4. **6 digits required**: Zero-padded to 6 digits

**Regex Pattern**: `^AD\d{6}\.(jpg|JPG|png|PNG)$`

**Error Conditions**:
- Returns `valid=False` listing invalid filenames

**Test Cases**:
1. **Valid names**: ["AD000001.jpg", "AD000002.jpg"] → valid=True
2. **Invalid prefix**: "IMG_0001.jpg" → valid=False
3. **Wrong digit count**: "AD001.jpg" (3 digits) → valid=False
4. **Missing extension**: "AD000001" → valid=False
5. **Invalid extension**: "AD000001.bmp" → valid=False

---

### `validate_format(image_data: bytes) -> ValidationResult`

**Description**: Validates image file format and properties.

**Called By**:
- F05 Image Input Pipeline (per-image validation)

**Input**:
```python
image_data: bytes  # Raw image file bytes
```

**Output**:
```python
ValidationResult:
    valid: bool
    errors: List[str]
```

**Validation Rules**:
1. **Format**: Valid JPEG or PNG
2. **Dimensions**: 640×480 to 6252×4168 pixels
3. **File size**: At most 10 MB per image
4. **Image readable**: Not corrupted
5. **Color channels**: RGB (3 channels)

**Algorithm**:
```python
from io import BytesIO
from PIL import Image

# valid() and invalid(msg) are shorthand for building a ValidationResult.
try:
    image = Image.open(BytesIO(image_data))
    width, height = image.size

    if image.format not in ['JPEG', 'PNG']:
        return invalid("Format must be JPEG or PNG")

    # Rule 5: only 3-channel RGB images are accepted.
    if image.mode != 'RGB':
        return invalid("Image must be RGB (3 channels)")

    if width < 640 or height < 480:
        return invalid("Dimensions too small")

    if width > 6252 or height > 4168:
        return invalid("Dimensions too large")

    if len(image_data) > 10 * 1024 * 1024:
        return invalid("File size exceeds 10MB")

    return valid()
except Exception as e:
    return invalid(f"Corrupted image: {e}")
```

**Error Conditions**:
- Returns `valid=False` with specific error

**Test Cases**:
1. **Valid JPEG (2048×1536)**: valid=True
2. **Valid PNG (6252×4168)**: valid=True
3. **Too small (320×240)**: valid=False
4. **Too large (8000×6000)**: valid=False
5. **File too big (15MB)**: valid=False
6. **Corrupted file**: valid=False
7. **BMP format**: valid=False

## Integration Tests

### Test 1: Complete Batch Validation

1. Create batch with 20 images, AD000101.jpg - AD000120.jpg
2. validate_batch_size() → valid
3. validate_naming_convention() → valid
4. check_sequence_continuity(expected_start=101) → valid
5. validate_format() for each image → all valid

### Test 2: Invalid Batch Detection

1. Create batch with 60 images → validate_batch_size() fails
2. Create batch with gap (AD000101, AD000103) → check_sequence_continuity() fails
3. Create batch with IMG_0001.jpg → validate_naming_convention() fails
4. Create batch with corrupted image → validate_format() fails

### Test 3: Edge Cases

1. Batch with exactly 10 images → valid
2. Batch with exactly 50 images → valid
3. Batch with 51 images → invalid
4. Batch of 10 images AD999990.jpg - AD999999.jpg (ending at the 6-digit maximum) → valid

## Non-Functional Requirements

### Performance

- **validate_batch_size**: < 1ms
- **check_sequence_continuity**: < 10ms for 50 images
- **validate_naming_convention**: < 5ms for 50 filenames
- **validate_format**: < 20ms per image (with PIL)
- **Total batch validation**: < 100ms for 50 images

### Reliability

- Never raises exceptions (returns ValidationResult with errors)
- Handles edge cases gracefully
- Clear, actionable error messages

### Maintainability

- Configurable validation rules (min/max batch size, dimensions)
- Easy to add new validation rules
- Comprehensive error reporting

## Dependencies

### Internal Components

- None (pure utility, no internal dependencies)

### External Dependencies

- **Pillow (PIL)**: Image format validation and dimension checking
- **re** (regex): Filename pattern matching

## Data Models

### ImageBatch

```python
from typing import List
from pydantic import BaseModel  # BaseModel is assumed to be pydantic's

class ImageBatch(BaseModel):
    images: List[bytes]      # Raw image data
    filenames: List[str]     # e.g., ["AD000101.jpg", ...]
    start_sequence: int      # 101
    end_sequence: int        # 150
    batch_number: int        # Sequential batch number
```

### ValidationResult

```python
class ValidationResult(BaseModel):
    valid: bool
    errors: List[str] = []    # Empty if valid
    warnings: List[str] = []  # Optional warnings
```

### ValidationRules (Configuration)

```python
class ValidationRules(BaseModel):
    min_batch_size: int = 10
    max_batch_size: int = 50
    min_width: int = 640
    min_height: int = 480
    max_width: int = 6252
    max_height: int = 4168
    max_file_size_mb: int = 10
    allowed_formats: List[str] = ["JPEG", "PNG"]
    filename_pattern: str = r"^AD\d{6}\.(jpg|JPG|png|PNG)$"
```

### Sequence Extraction

```python
import re

def extract_sequence(filename: str) -> int:
    """
    Extracts sequence number from filename.
    Example: "AD000237.jpg" -> 237
    """
    match = re.match(r"AD(\d{6})\.", filename)
    if match:
        return int(match.group(1))
    raise ValueError(f"Invalid filename format: {filename}")
```
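
The batch-level rules in this document (size, naming, continuity) need only the standard library, so they can be sketched without Pillow. The following is a minimal sketch under stated assumptions, not the component's actual implementation: `Result`, `check_names`, `check_size`, `check_continuity`, and `validate_batch` are hypothetical names, `Result` is a plain dataclass standing in for `ValidationResult`, and the "Batch is empty" message is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Same pattern as filename_pattern, with a capture group added for the digits.
FILENAME_RE = re.compile(r"^AD(\d{6})\.(jpg|JPG|png|PNG)$")


@dataclass
class Result:
    """Hypothetical stand-in for ValidationResult (no pydantic required)."""
    valid: bool
    errors: List[str] = field(default_factory=list)


def check_names(filenames: List[str]) -> Result:
    """List every filename that does not match the naming pattern."""
    errors = [f"Invalid filename: {name}"
              for name in filenames if not FILENAME_RE.match(name)]
    return Result(valid=not errors, errors=errors)


def check_size(filenames: List[str], min_size: int = 10, max_size: int = 50) -> Result:
    """Apply the 10-50 image constraint, using the documented messages."""
    n = len(filenames)
    if n < min_size:
        return Result(False, [f"Batch size {n} below minimum {min_size}"])
    if n > max_size:
        return Result(False, [f"Batch size {n} exceeds maximum {max_size}"])
    return Result(True)


def check_continuity(filenames: List[str], expected_start: int) -> Result:
    """Check start number and gaps; returns errors instead of raising."""
    names = check_names(filenames)
    if not names.valid:
        return names
    if not filenames:
        return Result(False, ["Batch is empty"])  # assumed message
    sequences = [int(FILENAME_RE.match(name).group(1)) for name in filenames]
    if sequences[0] != expected_start:
        return Result(False, [f"Expected start {expected_start}, got {sequences[0]}"])
    for prev, cur in zip(sequences, sequences[1:]):
        if cur != prev + 1:
            return Result(False, [f"Gap detected: {prev} -> {cur}"])
    return Result(True)


def validate_batch(filenames: List[str], expected_start: int) -> Result:
    """Run the batch-level checks in order and return the first failure."""
    size = check_size(filenames)
    if not size.valid:
        return size
    return check_continuity(filenames, expected_start)
```

Validating names inside `check_continuity` before parsing keeps the "never raises" requirement intact, because sequence numbers are only extracted from filenames that already matched the pattern. A caller could run `validate_batch([f"AD{n:06d}.jpg" for n in range(101, 121)], expected_start=101)` and then run the per-image format check only if the result is valid.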