Initial commit

Made-with: Cursor
2026-04-22 20:36:37 +00:00 · 2026-03-26 00:20:30 +02:00
commit 8e2ecf50fd
144 changed files with 19781 additions and 0 deletions
@@ -0,0 +1,104 @@
+# Containerization Plan
+
+## Container Architecture
+
+| Container | Base Image | Purpose | GPU Access |
+|-----------|-----------|---------|------------|
+| semantic-detection | nvcr.io/nvidia/l4t-tensorrt:r36.x (JetPack 6.2) | Main detection service (Cython + TRT + scan controller + gimbal + recorder) | Yes (TRT inference) |
+| vlm-service | dustynv/nanollm:r36 (NanoLLM for JetPack 6) | VLM inference (VILA1.5-3B, 4-bit MLC) | Yes (GPU inference) |
+
+## Dockerfile: semantic-detection
+
+```dockerfile
+# Outline — not runnable, for planning purposes
+FROM nvcr.io/nvidia/l4t-tensorrt:r36.x
+
+# System dependencies
+RUN apt-get update && apt-get install -y python3.11 python3-pip libopencv-dev && rm -rf /var/lib/apt/lists/*
+
+# Python dependencies
+COPY requirements.txt .
+RUN pip3 install -r requirements.txt  # pyserial, crcmod, scikit-image, pyyaml
+
+# Cython build
+COPY src/ /app/src/
+RUN cd /app/src && python3 setup.py build_ext --inplace
+
+# Config and models mounted as volumes
+VOLUME ["/models", "/etc/semantic-detection", "/data/output"]
+
+ENTRYPOINT ["python3", "/app/src/main.py"]
+```
+
+## Dockerfile: vlm-service
+
+This service uses the pre-built NanoLLM Docker image, so no custom Dockerfile is needed. It is configured through environment variables and volume mounts.
+
+```yaml
+# docker-compose snippet
+vlm-service:
+  image: dustynv/nanollm:r36
+  runtime: nvidia
+  environment:
+    - MODEL=VILA1.5-3B
+    - QUANTIZATION=w4a16
+  volumes:
+    - vlm-models:/models
+    - vlm-socket:/tmp
+  ipc: host
+  shm_size: 8g
+```
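
The `vlm-socket` volume mounted at `/tmp` implies that the two containers exchange requests over a Unix domain socket. The sketch below shows one way such an exchange could look in Python. The socket file name `vlm.sock`, the newline-delimited JSON framing, and the `prompt`/`echo` fields are assumptions for illustration only; the plan does not define the wire format.

```python
import json
import os
import socket
import tempfile
import threading


def serve_once(sock_path, handler, ready):
    """Accept one connection and answer one newline-delimited JSON request."""
    if os.path.exists(sock_path):
        os.unlink(sock_path)  # remove a stale socket left by a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(sock_path)
        server.listen(1)
        ready.set()  # tell the caller the socket is ready for connections
        conn, _ = server.accept()
        with conn, conn.makefile("rwb") as stream:
            request = json.loads(stream.readline())
            stream.write(json.dumps(handler(request)).encode() + b"\n")
            stream.flush()


def query(sock_path, request, timeout=5.0):
    """Send one JSON request over the socket and return the decoded reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        client.connect(sock_path)
        with client.makefile("rwb") as stream:
            stream.write(json.dumps(request).encode() + b"\n")
            stream.flush()
            return json.loads(stream.readline())


def demo():
    """Round-trip one request through a temporary socket (hypothetical path)."""
    sock_path = os.path.join(tempfile.mkdtemp(), "vlm.sock")
    ready = threading.Event()

    def echo(request):
        # Stand-in for VLM inference: echo the prompt back.
        return {"ok": True, "echo": request["prompt"]}

    worker = threading.Thread(target=serve_once, args=(sock_path, echo, ready))
    worker.start()
    ready.wait()
    reply = query(sock_path, {"prompt": "describe frame"})
    worker.join()
    return reply
```

In the deployed layout the server side would live in `vlm-service` and the client side in `semantic-detection`, both pointing at the same path under the shared `/tmp` mount.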
+
+## Volume Strategy
+
+| Volume | Mount Point | Contents | Persistence |
+|--------|-----------|----------|-------------|
+| models | /models | TRT FP16 engines (yoloe-11s-seg.engine, yoloe-26s-seg.engine, mobilenetv3.engine) | Persistent on NVMe |
+| config | /etc/semantic-detection | config.yaml, class definitions | Persistent on NVMe |
+| output | /data/output | Detection logs, recorded frames, gimbal logs | Persistent on NVMe (circular buffer) |
+| vlm-models | /models (vlm-service) | VILA1.5-3B MLC weights | Persistent on NVMe |
+| vlm-socket | /tmp (both containers) | Unix domain socket for IPC | Ephemeral |
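
The `output` volume is described as a circular buffer. One simple way to approximate that on a filesystem is to delete the oldest files once a byte budget is exceeded. The sketch below assumes a flat directory and a caller-chosen `max_bytes` budget, neither of which the plan specifies.

```python
import os
from pathlib import Path


def prune_oldest(output_dir, max_bytes):
    """Delete the oldest files until the directory fits within max_bytes.

    Returns the names of the deleted files, oldest first.
    """
    files = sorted(
        (p for p in Path(output_dir).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,  # oldest modification time first
    )
    total = sum(p.stat().st_size for p in files)
    deleted = []
    for path in files:
        if total <= max_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        deleted.append(path.name)
    return deleted
```

A recorder could call `prune_oldest("/data/output", budget)` after each write, or on a timer, so that the NVMe never fills up.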
+
+## GPU Sharing
+
+Both containers share the same GPU. Sequential scheduling is enforced at the application level:
+- During Level 1: only semantic-detection uses GPU (YOLOE inference)
+- During Level 2 Tier 3: semantic-detection pauses YOLOE, vlm-service runs VLM inference
+- Both containers run with `--runtime=nvidia`, but application logic prevents concurrent GPU access
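
The plan does not say how the handoff between the two containers is implemented. One possible mechanism, sketched below, is an advisory file lock on a path both containers can see (for example under the shared `/tmp` mount). The lock file name and the helper names `gpu_turn` and `gpu_is_free` are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def gpu_turn(lock_path):
    """Hold an exclusive advisory lock for the duration of one GPU job."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the other holder releases
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def gpu_is_free(lock_path):
    """Non-blocking probe: True if nobody currently holds the GPU lock."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False  # another open file description holds the lock
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

With this approach `semantic-detection` would wrap each YOLOE batch in `with gpu_turn(path):` and `vlm-service` would wrap each VLM request the same way. Because the lock is advisory, it only works if both sides cooperate.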
+
+## Resource Limits
+
+| Container | Memory Limit | CPU Limit | GPU |
+|-----------|-------------|-----------|-----|
+| semantic-detection | 4GB | No limit (all 6 cores available) | Shared |
+| vlm-service | 4GB | No limit | Shared |
+
+Note: These limits are soft. Because the CPU and GPU share the same LPDDR5 memory, actual allocation is dynamic, so application-level monitoring (HealthMonitor) tracks actual usage.
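
How HealthMonitor measures usage is not described here. As a rough illustration, system-wide memory pressure on Linux can be derived from `/proc/meminfo`. The function names below are hypothetical and are not HealthMonitor's actual interface.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def used_fraction(info):
    """Fraction of total memory that is not available for new allocations."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def read_used_fraction(path="/proc/meminfo"):
    """Read the live value from the kernel (Linux only)."""
    with open(path) as handle:
        return used_fraction(parse_meminfo(handle.read()))
```

A monitor could poll `read_used_fraction()` and raise an alert above a chosen threshold. That number covers CPU and GPU allocations together, since both draw from the same memory pool.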
+
+## Development Environment
+
+```yaml
+# docker-compose.dev.yaml
+services:
+  semantic-detection:
+    build: .
+    environment:
+      - ENV=development
+      - GIMBAL_MODE=mock_tcp
+      - INFERENCE_ENGINE=onnxruntime
+    volumes:
+      - ./src:/app/src
+      - ./config/config.dev.yaml:/etc/semantic-detection/config.yaml
+      - vlm-socket:/tmp
+    ports:
+      - "8080:8080"
+
+  vlm-stub:
+    build: ./tests/vlm_stub
+    volumes:
+      - vlm-socket:/tmp
+
+  mock-gimbal:
+    build: ./tests/mock_gimbal
+    ports:
+      - "9090:9090"
+
+volumes:
+  vlm-socket:
+```
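
The development compose file switches behaviour through `ENV`, `GIMBAL_MODE` and `INFERENCE_ENGINE`. The sketch below shows how the application might read and validate these variables at startup. The default values and the `serial` and `tensorrt` options are assumptions inferred from the rest of the plan; only `mock_tcp` and `onnxruntime` appear in the compose file.

```python
import os
from dataclasses import dataclass

# Hypothetical allowed values; adjust to whatever the service really supports.
GIMBAL_MODES = {"serial", "mock_tcp"}
INFERENCE_ENGINES = {"tensorrt", "onnxruntime"}


@dataclass(frozen=True)
class RuntimeSettings:
    env: str
    gimbal_mode: str
    inference_engine: str


def load_settings(environ=None):
    """Build settings from environment variables, rejecting unknown values."""
    environ = os.environ if environ is None else environ
    settings = RuntimeSettings(
        env=environ.get("ENV", "production"),
        gimbal_mode=environ.get("GIMBAL_MODE", "serial"),
        inference_engine=environ.get("INFERENCE_ENGINE", "tensorrt"),
    )
    if settings.gimbal_mode not in GIMBAL_MODES:
        raise ValueError(f"unknown GIMBAL_MODE: {settings.gimbal_mode!r}")
    if settings.inference_engine not in INFERENCE_ENGINES:
        raise ValueError(f"unknown INFERENCE_ENGINE: {settings.inference_engine!r}")
    return settings
```

Failing fast on an unknown value keeps a typo in the compose file from silently falling back to production hardware paths.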