mirror of https://github.com/azaion/detections-semantic.git synced 2026-04-22 23:26:37 +00:00

Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit

Made-with: Cursor

2026-03-26 00:20:30 +02:00

Containerization Plan

Container Architecture

Container	Base Image	Purpose	GPU Access
semantic-detection	nvcr.io/nvidia/l4t-tensorrt:r36.x (JetPack 6.2)	Main detection service (Cython + TRT + scan controller + gimbal + recorder)	Yes (TRT inference)
vlm-service	dustynv/nanollm:r36 (NanoLLM for JetPack 6)	VLM inference (VILA1.5-3B, 4-bit MLC)	Yes (GPU inference)

Dockerfile: semantic-detection

# Outline — not runnable, for planning purposes
FROM nvcr.io/nvidia/l4t-tensorrt:r36.x

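# System dependencies
# JetPack 6 is based on Ubuntu 22.04, where python3 and pip3 are Python 3.10;
# python3-dev and build-essential are needed for the Cython build below
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip python3-dev build-essential libopencv-dev \
    && rm -rf /var/lib/apt/lists/*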

# Python dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt  # pyserial, crcmod, scikit-image, pyyaml

# Cython build
COPY src/ /app/src/
RUN cd /app/src && python3 setup.py build_ext --inplace

# Config and models mounted as volumes
VOLUME ["/models", "/etc/semantic-detection", "/data/output"]

ENTRYPOINT ["python3", "/app/src/main.py"]

Dockerfile: vlm-service

The vlm-service container uses the pre-built NanoLLM Docker image, so no custom Dockerfile is needed. It is configured through environment variables and volume mounts.

# docker-compose snippet
vlm-service:
  image: dustynv/nanollm:r36
  runtime: nvidia
  environment:
    - MODEL=VILA1.5-3B
    - QUANTIZATION=w4a16
  volumes:
    - vlm-models:/models
    - vlm-socket:/tmp
  ipc: host      # shares the host IPC namespace, including /dev/shm
  shm_size: 8g   # ignored while ipc: host is set
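
The vlm-socket volume mounted at /tmp is what lets the two containers exchange requests over a Unix domain socket. The plan does not define the wire format, so the following is only a minimal sketch, assuming length-prefixed JSON messages; the message fields, the helper names and the stub reply are all hypothetical.

```python
import json
import os
import socket
import struct
import tempfile
import threading


def send_msg(sock: socket.socket, payload: dict) -> None:
    """Send one JSON message prefixed with its 4-byte big-endian length."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> dict:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def serve_once(server: socket.socket) -> None:
    """Stand-in for vlm-service: answer a single request, then return."""
    conn, _ = server.accept()
    with conn:
        request = recv_msg(conn)
        # Hypothetical reply shape; the real service would run VLM inference here.
        send_msg(conn, {"id": request["id"], "label": "stub-answer"})


def demo_roundtrip(socket_path: str) -> dict:
    """Start a stub server on socket_path, send one request, return the reply."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(socket_path)
    server.listen(1)
    server.settimeout(5.0)  # do not block forever if the client never connects
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(socket_path)
            send_msg(client, {"id": 7, "prompt": "describe the crop"})
            return recv_msg(client)
    finally:
        worker.join()
        server.close()
        os.unlink(socket_path)
```

In the deployed setup the socket file would live under the shared /tmp mount; the demo takes the path as a parameter so it can run in any temporary directory.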

Volume Strategy

Volume	Mount Point	Contents	Persistence
models	/models	TRT FP16 engines (yoloe-11s-seg.engine, yoloe-26s-seg.engine, mobilenetv3.engine)	Persistent on NVMe
config	/etc/semantic-detection	config.yaml, class definitions	Persistent on NVMe
output	/data/output	Detection logs, recorded frames, gimbal logs	Persistent on NVMe (circular buffer)
vlm-models	/models (vlm-service)	VILA1.5-3B MLC weights	Persistent on NVMe
vlm-socket	/tmp (both containers)	Unix domain socket for IPC	Ephemeral
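
The output volume is described as a circular buffer, but the plan does not say how old data is evicted. One possible approach, sketched here under the assumption that eviction is by file modification time against a byte budget, is to delete the oldest files until the directory fits. The function name and the budget parameter are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def prune_oldest(root: Path, max_bytes: int) -> list[Path]:
    """Delete the oldest files under root until the total size is at most max_bytes.

    Returns the paths that were deleted, oldest first.
    """
    files = [p for p in root.rglob("*") if p.is_file()]
    # Oldest first; the name breaks ties so the order is deterministic.
    files.sort(key=lambda p: (p.stat().st_mtime, p.name))
    total = sum(p.stat().st_size for p in files)

    removed: list[Path] = []
    for path in files:
        if total <= max_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        removed.append(path)
    return removed
```

A recorder could call something like prune_oldest(Path("/data/output"), budget) after each write burst; the budget itself would come from config.yaml.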

Both containers share the same GPU. Sequential scheduling is enforced at the application level:

During Level 1, only semantic-detection uses the GPU (YOLOE inference).
During Level 2 Tier 3, semantic-detection pauses YOLOE while vlm-service runs VLM inference.
Both containers run with --runtime=nvidia, so Docker does not isolate the GPU; application logic is what prevents concurrent GPU access.
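
The turn-taking described above can be sketched as a single mutex inside semantic-detection that both the YOLOE loop and the VLM request path must hold while they use the GPU. This is only an illustration under that assumption, not the project's actual scheduler; GpuGate and the two stand-in workloads are hypothetical names.

```python
import threading
import time


class GpuGate:
    """Serializes GPU work so that only one workload runs at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.log: list[tuple[str, str]] = []  # (owner, "start" | "end") events

    def run(self, owner: str, fn):
        """Run fn while holding the gate and record who held it."""
        with self._lock:
            self.log.append((owner, "start"))
            try:
                return fn()
            finally:
                self.log.append((owner, "end"))


def fake_yoloe_infer() -> str:
    """Stand-in for one YOLOE inference call (Level 1)."""
    time.sleep(0.001)
    return "detections"


def fake_vlm_request() -> str:
    """Stand-in for a blocking request to vlm-service (Level 2 Tier 3)."""
    time.sleep(0.001)
    return "vlm-answer"
```

Because the VLM call blocks inside the gate until vlm-service replies, YOLOE inference cannot start in the meantime, which matches the "pause YOLOE, run VLM" behavior without any cross-container locking.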

Resource Limits

Container	Memory Limit	CPU Limit	GPU
semantic-detection	4GB	No limit (all 6 cores available)	Shared
vlm-service	4GB	No limit	Shared

Note: These limits are soft. Because the LPDDR5 memory is shared, actual allocation is dynamic, so application-level monitoring (HealthMonitor) tracks actual usage.
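
The plan names HealthMonitor but does not describe how it measures usage. As a rough sketch, assuming a system-wide reading from /proc/meminfo is an acceptable signal on the target device, memory pressure could be derived as follows. The function names are hypothetical, and any alert thresholds are left to the application.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field name: numeric value}.

    Size fields are reported by the kernel in kB; counters have no unit.
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_pressure(meminfo: dict[str, int]) -> float:
    """Return the fraction of total memory that is not available (0.0 to 1.0)."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 1.0 - available / total


def read_pressure(path: str = "/proc/meminfo") -> float:
    """Read the current system-wide memory pressure from the given file."""
    with open(path, encoding="ascii") as fh:
        return memory_pressure(parse_meminfo(fh.read()))
```

Parsing is kept separate from file access so the logic can be checked against a captured meminfo sample on a development machine.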

Development Environment

# docker-compose.dev.yaml
services:
  semantic-detection:
    build: .
    environment:
      - ENV=development
      - GIMBAL_MODE=mock_tcp
      - INFERENCE_ENGINE=onnxruntime
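    volumes:
      - ./src:/app/src
      - ./config/config.dev.yaml:/etc/semantic-detection/config.yaml
      - vlm-socket:/tmp  # shared with vlm-stub for the Unix domain socket
    ports:
      - "8080:8080"

  vlm-stub:
    build: ./tests/vlm_stub
    volumes:
      - vlm-socket:/tmp

  mock-gimbal:
    build: ./tests/mock_gimbal
    ports:
      - "9090:9090"

# Named volumes must be declared at the top level
volumes:
  vlm-socket: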
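
The development compose file switches behavior through ENV, GIMBAL_MODE and INFERENCE_ENGINE. How main.py consumes them is not shown in the plan; a minimal sketch of reading them might look like the following. The RuntimeSettings class and the fallback values ("production", "serial", "tensorrt") are assumptions made for illustration only.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeSettings:
    env: str
    gimbal_mode: str
    inference_engine: str


def load_settings(environ=None) -> RuntimeSettings:
    """Build settings from environment variables.

    The defaults below are hypothetical placeholders for the production values.
    """
    environ = os.environ if environ is None else environ
    return RuntimeSettings(
        env=environ.get("ENV", "production"),
        gimbal_mode=environ.get("GIMBAL_MODE", "serial"),
        inference_engine=environ.get("INFERENCE_ENGINE", "tensorrt"),
    )
```

Passing the mapping explicitly keeps the function easy to test; in the container it would simply be called as load_settings().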
