gps-denied-desktop/docs/02_components/02_Image_Preprocessing/spec.md
2025-11-19 23:07:29 +02:00

Image Preprocessing Component

Detailed Description

The Image Preprocessing component is a core system module responsible for transforming raw image data into the canonical formats required by the AI models (SuperPoint, AnyLoc, LiteSAM). It ensures that all images entering the pipeline are consistently resized, normalized, and converted to tensors, decoupling the model requirements from the input source format.

API Methods

preprocess_image

  • Input: image: np.ndarray, target_size: tuple, normalize: bool
  • Output: np.ndarray (or Tensor)
  • Description: Resizes the input image to the target dimensions while handling aspect ratio mismatches (padding or cropping, as configured). Optionally normalizes pixel intensity values (e.g., scaling to the [0, 1] range, or standardizing to zero mean and unit variance).
  • Test Cases:
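    • Input 4000x3000, Target 640x480 -> Output 640x480, with all pixel values inside the expected range (e.g., [0, 1] when normalization is enabled).
    • Grayscale and RGB inputs -> Channels are expanded or reduced to match the channel count the target model expects.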
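
The behaviour described for preprocess_image can be sketched roughly as follows. This is a minimal illustration, not the component's actual implementation. It assumes that target_size is ordered (width, height), that inputs are uint8 arrays of shape (H, W) or (H, W, C), that the configured aspect-ratio policy is centered zero padding, and that normalization means scaling to [0, 1]. A plain NumPy nearest-neighbor lookup stands in for an optimized resize such as OpenCV's.

```python
import numpy as np


def preprocess_image(image: np.ndarray, target_size: tuple, normalize: bool = True) -> np.ndarray:
    """Letterbox-resize `image` to `target_size` and optionally scale to [0, 1].

    Assumptions (not from the spec): target_size is (width, height), the input
    is uint8 with shape (H, W) or (H, W, C), padding is zero-filled and centered.
    """
    target_w, target_h = target_size
    src_h, src_w = image.shape[:2]

    # Uniform scale that fits the whole image inside the target (keeps aspect ratio).
    scale = min(target_w / src_w, target_h / src_h)
    new_w = max(1, min(target_w, round(src_w * scale)))
    new_h = max(1, min(target_h, round(src_h * scale)))

    # Nearest-neighbor resize via index lookup (stand-in for an optimized resize).
    rows = np.minimum((np.arange(new_h) * src_h / new_h).astype(np.intp), src_h - 1)
    cols = np.minimum((np.arange(new_w) * src_w / new_w).astype(np.intp), src_w - 1)
    resized = image[rows][:, cols]

    # Center the resized image on a zero canvas of the target size.
    canvas = np.zeros((target_h, target_w) + image.shape[2:], dtype=image.dtype)
    top = (target_h - new_h) // 2
    left = (target_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized

    if normalize:
        # Scale uint8 intensities to the [0, 1] range as float32.
        return canvas.astype(np.float32) / 255.0
    return canvas
```

Under these assumptions, a 4000x3000 RGB input with a 640x480 target yields an array of shape (480, 640, 3). A square input is padded with zero columns on the left and right rather than stretched.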

Integration Tests

  • Pipeline Compatibility: Verify output matches the exact tensor shape expected by the Model Registry's SuperPoint and AnyLoc wrappers.
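
A shape check of this kind might look like the sketch below. The EXPECTED_SHAPES table, the to_batched_chw helper, and check_model_input are hypothetical names introduced for illustration. The shapes are placeholders, since the real values would come from the Model Registry wrappers, and NumPy arrays stand in for framework tensors.

```python
import numpy as np

# Hypothetical expected input shapes (batch, channels, height, width).
# Placeholder values only: the real ones belong to the Model Registry wrappers.
EXPECTED_SHAPES = {
    "superpoint": (1, 1, 480, 640),
    "anyloc": (1, 3, 224, 224),
}


def to_batched_chw(image: np.ndarray) -> np.ndarray:
    """Convert an (H, W) or (H, W, C) array to a (1, C, H, W) array."""
    if image.ndim == 2:
        image = image[:, :, np.newaxis]  # add a channel axis for grayscale
    return np.ascontiguousarray(image.transpose(2, 0, 1))[np.newaxis]


def check_model_input(model_name: str, tensor: np.ndarray) -> None:
    """Raise AssertionError if `tensor` does not match the expected shape."""
    expected = EXPECTED_SHAPES[model_name]
    assert tensor.shape == expected, (
        f"{model_name}: expected {expected}, got {tensor.shape}"
    )
```

An integration test could run each preprocessed image through to_batched_chw and then call check_model_input for every registered model, failing as soon as any shape differs.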

Non-functional Tests

  • Latency: Each preprocessing operation must complete in under 5 ms so that it adds negligible overhead to the pipeline. Use optimized libraries such as OpenCV or TorchVision.
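
One possible way to check the latency budget is a small timing helper like the sketch below. The name measure_latency_ms, the warm-up and repeat counts, and the choice of the median as the reported statistic are all assumptions made for illustration. The spec does not say how the 5 ms figure should be measured.

```python
import statistics
import time


def measure_latency_ms(fn, *args, repeats: int = 50, warmup: int = 5, **kwargs) -> float:
    """Return the median wall-clock latency of fn(*args, **kwargs) in milliseconds."""
    for _ in range(warmup):
        fn(*args, **kwargs)  # warm caches and lazy initialisation before timing
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

A non-functional test could then assert something like measure_latency_ms(preprocess_image, image, target_size, True) < 5.0 on the target hardware. The median is less sensitive to one-off scheduling spikes than the mean, though a stricter test might bound a high percentile instead.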