Refactor constants management to use Pydantic BaseModel for configuration

- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
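The refactored constants.py itself is not shown in this excerpt; as a rough sketch, a Pydantic-based replacement for module-level path variables might look like the following (field and property names here are hypothetical, not the actual ones from the commit):

```python
from pathlib import Path

from pydantic import BaseModel


class Config(BaseModel):
    """Centralized path configuration, replacing loose module-level variables."""

    # Hypothetical fields; the real constants.py may use different names.
    data_dir: Path = Path("data")
    models_dir: Path = Path("models")

    @property
    def train_images_dir(self) -> Path:
        # Derived paths live next to their base paths instead of being
        # recomputed separately in every consumer module.
        return self.data_dir / "images" / "train"


# A single shared instance that train.py, augmentation.py, etc. can import.
config = Config()
```

Consumer modules would then write `from constants import config` and read `config.train_images_dir` rather than a bare module-level variable, so all paths are validated and discoverable in one place.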
Oleksandr Bezdieniezhnykh
2026-03-27 18:18:30 +02:00
parent b68c07b540
commit 142c6c4de8
106 changed files with 5706 additions and 654 deletions
+7
@@ -0,0 +1,7 @@
# Project Management
- This project uses **Jira ONLY** for work item tracking (NOT Azure DevOps)
- Jira project key: `AZ` (AZAION)
- Jira cloud ID: `1598226f-845f-4705-bcd1-5ed0c82d6119`
- Use the `user-Jira-MCP-Server` MCP server for all Jira operations
- Never use Azure DevOps MCP for this project's work items
+22 -10
@@ -11,7 +11,7 @@ Workflow for projects with an existing codebase. Starts with documentation, prod
| 3 | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
| 4 | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Run Tests | test-run/SKILL.md | Steps 1-4 |
| 6 | Refactor | refactor/SKILL.md | Phases 0-5 (6-phase method) |
| 6 | Refactor | refactor/SKILL.md | Phases 0-6 (7-phase method) (optional) |
| 7 | New Task | new-task/SKILL.md | Steps 1-8 (loop) |
| 8 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 9 | Run Tests | test-run/SKILL.md | Steps 1-4 |
@@ -75,19 +75,31 @@ Verifies the implemented test suite passes before proceeding to refactoring. The
---
**Step 6 — Refactor**
Condition: the autopilot state shows Step 5 (Run Tests) is completed AND `_docs/04_refactoring/FINAL_report.md` does not exist
**Step 6 — Refactor (optional)**
Condition: the autopilot state shows Step 5 (Run Tests) is completed AND the autopilot state does NOT show Step 6 (Refactor) as completed or skipped AND `_docs/04_refactoring/FINAL_report.md` does not exist
Action: Read and execute `.cursor/skills/refactor/SKILL.md`
Action: Present using Choose format:
The refactor skill runs the full 6-phase method using the implemented tests as a safety net.
```
══════════════════════════════════════
DECISION REQUIRED: Refactor codebase before adding new features?
══════════════════════════════════════
A) Run refactoring (recommended if code quality issues were noted during documentation)
B) Skip — proceed directly to New Task
══════════════════════════════════════
Recommendation: [A or B — base on whether documentation
flagged significant code smells, coupling issues, or
technical debt worth addressing before new development]
══════════════════════════════════════
```
If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues.
- If user picks A → Read and execute `.cursor/skills/refactor/SKILL.md`. The refactor skill runs the full method using the implemented tests as a safety net. If `_docs/04_refactoring/` has phase reports, the refactor skill detects completed phases and continues. After completion, auto-chain to Step 7 (New Task).
- If user picks B → Mark Step 6 as `skipped` in the state file, auto-chain to Step 7 (New Task).
---
**Step 7 — New Task**
Condition: the autopilot state shows Step 6 (Refactor) is completed AND the autopilot state does NOT show Step 7 (New Task) as completed
Condition: the autopilot state shows Step 6 (Refactor) is completed or skipped AND the autopilot state does NOT show Step 7 (New Task) as completed
Action: Read and execute `.cursor/skills/new-task/SKILL.md`
@@ -198,8 +210,8 @@ Action: The project completed a full cycle. Present status and loop back to New
| Test Spec (2) | Auto-chain → Decompose Tests (3) |
| Decompose Tests (3) | **Session boundary** — suggest new conversation before Implement Tests |
| Implement Tests (4) | Auto-chain → Run Tests (5) |
| Run Tests (5, all pass) | Auto-chain → Refactor (6) |
| Refactor (6) | Auto-chain → New Task (7) |
| Run Tests (5, all pass) | Auto-chain → Refactor choice (6) |
| Refactor (6, done or skipped) | Auto-chain → New Task (7) |
| New Task (7) | **Session boundary** — suggest new conversation before Implement |
| Implement (8) | Auto-chain → Run Tests (9) |
| Run Tests (9, all pass) | Auto-chain → Security Audit choice (10) |
@@ -218,7 +230,7 @@ Action: The project completed a full cycle. Present status and loop back to New
Step 3 Decompose Tests [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 4 Implement Tests [DONE / IN PROGRESS (batch M) / NOT STARTED / FAILED (retry N/3)]
Step 5 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 6 Refactor [DONE / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)]
Step 6 Refactor [DONE / SKIPPED / IN PROGRESS (phase N) / NOT STARTED / FAILED (retry N/3)]
Step 7 New Task [DONE (N tasks) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
Step 8 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED / FAILED (retry N/3)]
Step 9 Run Tests [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED / FAILED (retry N/3)]
+66 -438
@@ -1,471 +1,99 @@
---
name: refactor
description: |
Structured refactoring workflow (6-phase method) with three execution modes:
- Full Refactoring: all 6 phases — baseline, discovery, analysis, safety net, execution, hardening
- Targeted Refactoring: skip discovery if docs exist, focus on a specific component/area
- Quick Assessment: phases 0-2 only, outputs a refactoring plan without execution
Supports project mode (_docs/ structure) and standalone mode (@file.md).
Trigger phrases:
- "refactor", "refactoring", "improve code"
- "analyze coupling", "decoupling", "technical debt"
- "refactoring assessment", "code quality improvement"
Structured 9-phase refactoring workflow with three execution modes:
Full (all phases), Targeted (skip discovery), Quick Assessment (phases 0-2 only).
Supports project mode (_docs/) and standalone mode (@file.md).
category: evolve
tags: [refactoring, coupling, technical-debt, performance, hardening]
trigger_phrases: ["refactor", "refactoring", "improve code", "analyze coupling", "decoupling", "technical debt", "code quality"]
disable-model-invocation: true
---
# Structured Refactoring (6-Phase Method)
# Structured Refactoring
Transform existing codebases through a systematic refactoring workflow: capture baseline, document current state, research improvements, build safety net, execute changes, harden, sync the test suite, verify against the baseline, and update docs.
Phase details live in `phases/` — read the relevant file before executing each phase.
## Core Principles
- **Preserve behavior first**: never refactor without a passing test suite
- **Measure before and after**: every change must be justified by metrics
- **Small incremental changes**: commit frequently, never break tests
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Save immediately**: write artifacts to disk after each phase
- **Ask, don't assume**: when scope or priorities are unclear, STOP and ask the user
## Context Resolution
Determine the operating mode based on invocation before any other logic runs.
Determine operating mode before any other logic runs. Announce detected mode and paths to user.
**Project mode** (no explicit input file provided):
- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- COMPONENTS_DIR: `_docs/02_document/components/`
- DOCUMENT_DIR: `_docs/02_document/`
- REFACTOR_DIR: `_docs/04_refactoring/`
- All existing guardrails apply.
| | Project mode (default) | Standalone mode (`/refactor @file.md`) |
|---|---|---|
| PROBLEM_DIR | `_docs/00_problem/` | N/A |
| SOLUTION_DIR | `_docs/01_solution/` | N/A |
| COMPONENTS_DIR | `_docs/02_document/components/` | N/A |
| DOCUMENT_DIR | `_docs/02_document/` | N/A |
| REFACTOR_DIR | `_docs/04_refactoring/` | `_standalone/refactoring/` |
| Prereqs | `problem.md` required; warn if `acceptance_criteria.md` absent | INPUT_FILE must exist and be non-empty |
**Standalone mode** (explicit input file provided, e.g. `/refactor @some_component.md`):
- INPUT_FILE: the provided file (treated as component/area description)
- REFACTOR_DIR: `_standalone/refactoring/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `acceptance_criteria.md` is optional — warn if absent
Announce the detected mode and resolved paths to the user before proceeding.
## Mode Detection
After context resolution, determine the execution mode:
1. **User explicitly says** "quick assessment" or "just assess" → **Quick Assessment**
2. **User explicitly says** "refactor [component/file/area]" with a specific target → **Targeted Refactoring**
3. **Default** → **Full Refactoring**
| Mode | Phases Executed | When to Use |
|------|----------------|-------------|
| **Full Refactoring** | 0 → 1 → 2 → 3 → 4 → 5 | Complete refactoring of a system or major area |
| **Targeted Refactoring** | 0 → (skip 1 if docs exist) → 2 → 3 → 4 → 5 | Refactor a specific component; docs already exist |
| **Quick Assessment** | 0 → 1 → 2 | Produce a refactoring roadmap without executing changes |
Inform the user which mode was detected and confirm before proceeding.
## Prerequisite Checks (BLOCKING)
**Project mode:**
1. PROBLEM_DIR exists with `problem.md` (or `problem_description.md`) — **STOP if missing**, ask user to create it
2. If `acceptance_criteria.md` is missing: **warn** and ask whether to proceed
3. Create REFACTOR_DIR if it does not exist
4. If REFACTOR_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
**Standalone mode:**
1. INPUT_FILE exists and is non-empty — **STOP if missing**
2. Warn if no `acceptance_criteria.md` provided
3. Create REFACTOR_DIR if it does not exist
## Artifact Management
### Directory Structure
```
REFACTOR_DIR/
├── baseline_metrics.md (Phase 0)
├── discovery/
│ ├── components/
│ │ └── [##]_[name].md (Phase 1)
│ ├── solution.md (Phase 1)
│ └── system_flows.md (Phase 1)
├── analysis/
│ ├── research_findings.md (Phase 2)
│ └── refactoring_roadmap.md (Phase 2)
├── test_specs/
│ └── [##]_[test_name].md (Phase 3)
├── coupling_analysis.md (Phase 4)
├── execution_log.md (Phase 4)
├── hardening/
│ ├── technical_debt.md (Phase 5)
│ ├── performance.md (Phase 5)
│ └── security.md (Phase 5)
└── FINAL_report.md (after all phases)
```
### Save Timing
| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 0 | Baseline captured | `baseline_metrics.md` |
| Phase 1 | Each component documented | `discovery/components/[##]_[name].md` |
| Phase 1 | Solution synthesized | `discovery/solution.md`, `discovery/system_flows.md` |
| Phase 2 | Research complete | `analysis/research_findings.md` |
| Phase 2 | Roadmap produced | `analysis/refactoring_roadmap.md` |
| Phase 3 | Test specs written | `test_specs/[##]_[test_name].md` |
| Phase 4 | Coupling analyzed | `coupling_analysis.md` |
| Phase 4 | Execution complete | `execution_log.md` |
| Phase 5 | Each hardening track | `hardening/<track>.md` |
| Final | All phases done | `FINAL_report.md` |
### Resumability
If REFACTOR_DIR already contains artifacts:
1. List existing files and match to the save timing table
2. Identify the last completed phase based on which artifacts exist
3. Resume from the next incomplete phase
4. Inform the user which phases are being skipped
## Progress Tracking
At the start of execution, create a TodoWrite with all applicable phases. Update status as each phase completes.
Create REFACTOR_DIR if missing. If it already has artifacts, ask user: **resume or start fresh?**
## Workflow
### Phase 0: Context & Baseline
**Role**: Software engineer preparing for refactoring
**Goal**: Collect refactoring goals and capture baseline metrics
**Constraints**: Measurement only — no code changes
#### 0a. Collect Goals
If PROBLEM_DIR files do not yet exist, help the user create them:
1. `problem.md` — what the system currently does, what changes are needed, pain points
2. `acceptance_criteria.md` — success criteria for the refactoring
3. `security_approach.md` — security requirements (if applicable)
Store in PROBLEM_DIR.
#### 0b. Capture Baseline
1. Read problem description and acceptance criteria
2. Measure current system metrics using project-appropriate tools:
| Metric Category | What to Capture |
|----------------|-----------------|
| **Coverage** | Overall, unit, blackbox, critical paths |
| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
| **Code Smells** | Total, critical, major |
| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
| **Dependencies** | Total count, outdated, security vulnerabilities |
| **Build** | Build time, test execution time, deployment time |
3. Create functionality inventory: all features/endpoints with status and coverage
**Self-verification**:
- [ ] All metric categories measured (or noted as N/A with reason)
- [ ] Functionality inventory is complete
- [ ] Measurements are reproducible
**Save action**: Write `REFACTOR_DIR/baseline_metrics.md`
**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
---
### Phase 1: Discovery
**Role**: Principal software architect
**Goal**: Generate documentation from existing code and form solution description
**Constraints**: Document what exists, not what should be. No code changes.
**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
#### 1a. Document Components
For each component in the codebase:
1. Analyze project structure, directories, files
2. Go file by file, analyze each method
3. Analyze connections between components
Write per component to `REFACTOR_DIR/discovery/components/[##]_[name].md`:
- Purpose and architectural patterns
- Mermaid diagrams for logic flows
- API reference table (name, description, input, output)
- Implementation details: algorithmic complexity, state management, dependencies
- Caveats, edge cases, known limitations
#### 1b. Synthesize Solution & Flows
1. Review all generated component documentation
2. Synthesize into a cohesive solution description
3. Create flow diagrams showing component interactions
Write:
- `REFACTOR_DIR/discovery/solution.md` — product description, component overview, interaction diagram
- `REFACTOR_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
Also copy to project standard locations if in project mode:
- `SOLUTION_DIR/solution.md`
- `DOCUMENT_DIR/system_flows.md`
**Self-verification**:
- [ ] Every component in the codebase is documented
- [ ] Solution description covers all components
- [ ] Flow diagrams cover all major use cases
- [ ] Mermaid diagrams are syntactically correct
**Save action**: Write discovery artifacts
**BLOCKING**: Present discovery summary to user. Do NOT proceed until user confirms documentation accuracy.
---
### Phase 2: Analysis
**Role**: Researcher and software architect
**Goal**: Research improvements and produce a refactoring roadmap
**Constraints**: Analysis only — no code changes
#### 2a. Deep Research
1. Analyze current implementation patterns
2. Research modern approaches for similar systems
3. Identify what could be done differently
4. Suggest improvements based on state-of-the-art practices
Write `REFACTOR_DIR/analysis/research_findings.md`:
- Current state analysis: patterns used, strengths, weaknesses
- Alternative approaches per component: current vs alternative, pros/cons, migration effort
- Prioritized recommendations: quick wins + strategic improvements
#### 2b. Solution Assessment
1. Assess current implementation against acceptance criteria
2. Identify weak points in codebase, map to specific code areas
3. Perform gap analysis: acceptance criteria vs current state
4. Prioritize changes by impact and effort
Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`:
- Weak points assessment: location, description, impact, proposed solution
- Gap analysis: what's missing, what needs improvement
- Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
**Self-verification**:
- [ ] All acceptance criteria are addressed in gap analysis
- [ ] Recommendations are grounded in actual code, not abstract
- [ ] Roadmap phases are prioritized by impact
- [ ] Quick wins are identified separately
**Save action**: Write analysis artifacts
**BLOCKING**: Present refactoring roadmap to user. Do NOT proceed until user confirms.
**Quick Assessment mode stops here.** Present final summary and write `FINAL_report.md` with phases 0-2 content.
---
### Phase 3: Safety Net
**Role**: QA engineer and developer
**Goal**: Design and implement tests that capture current behavior before refactoring
**Constraints**: Tests must all pass on the current codebase before proceeding
#### 3a. Design Test Specs
Coverage requirements (must meet before refactoring — see `.cursor/rules/cursor-meta.mdc` Quality Thresholds):
- Minimum overall coverage: 75%
- Critical path coverage: 90%
- All public APIs must have blackbox tests
- All error handling paths must be tested
For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
- Blackbox tests: summary, current behavior, input data, expected result, max expected time
- Acceptance tests: summary, preconditions, steps with expected results
- Coverage analysis: current %, target %, uncovered critical paths
#### 3b. Implement Tests
1. Set up the test environment and infrastructure if they do not already exist
2. Implement each test from specs
3. Run tests, verify all pass on current codebase
4. Document any discovered issues
**Self-verification**:
- [ ] Coverage requirements met (75% overall, 90% critical paths)
- [ ] All tests pass on current codebase
- [ ] All public APIs have blackbox tests
- [ ] Test data fixtures are configured
**Save action**: Write test specs; implemented tests go into the project's test folder
**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
---
### Phase 4: Execution
**Role**: Software architect and developer
**Goal**: Analyze coupling and execute decoupling changes
**Constraints**: Small incremental changes; tests must stay green after every change
#### 4a. Analyze Coupling
1. Analyze coupling between components/modules
2. Map dependencies (direct and transitive)
3. Identify circular dependencies
4. Form decoupling strategy
Write `REFACTOR_DIR/coupling_analysis.md`:
- Dependency graph (Mermaid)
- Coupling metrics per component
- Problem areas: components involved, coupling type, severity, impact
- Decoupling strategy: priority order, proposed interfaces/abstractions, effort estimates
**BLOCKING**: Present coupling analysis to user. Do NOT proceed until user confirms strategy.
#### 4b. Execute Decoupling
For each change in the decoupling strategy:
1. Implement the change
2. Run blackbox tests
3. Fix any failures
4. Commit with descriptive message
Address code smells encountered: long methods, large classes, duplicate code, dead code, magic numbers.
Write `REFACTOR_DIR/execution_log.md`:
- Change description, files affected, test status per change
- Before/after metrics comparison against baseline
**Self-verification**:
- [ ] All tests still pass after execution
- [ ] No circular dependencies remain (or reduced per plan)
- [ ] Code smells addressed
- [ ] Metrics improved compared to baseline
**Save action**: Write execution artifacts
**BLOCKING**: Present execution summary to user. Do NOT proceed until user confirms.
---
### Phase 5: Hardening (Optional, Parallel Tracks)
**Role**: Varies per track
**Goal**: Address technical debt, performance, and security
**Constraints**: Each track is optional; user picks which to run
Present the three tracks and let user choose which to execute:
#### Track A: Technical Debt
**Role**: Technical debt analyst
1. Identify and categorize debt items: design, code, test, documentation
2. Assess each: location, description, impact, effort, interest (cost of not fixing)
3. Prioritize: quick wins → strategic debt → tolerable debt
4. Create actionable plan with prevention measures
Write `REFACTOR_DIR/hardening/technical_debt.md`
#### Track B: Performance Optimization
**Role**: Performance engineer
1. Profile current performance, identify bottlenecks
2. For each bottleneck: location, symptom, root cause, impact
3. Propose optimizations with expected improvement and risk
4. Implement one at a time, benchmark after each change
5. Verify tests still pass
Write `REFACTOR_DIR/hardening/performance.md` with before/after benchmarks
#### Track C: Security Review
**Role**: Security engineer
1. Review code against OWASP Top 10
2. Verify security requirements from `security_approach.md` are met
3. Check: authentication, authorization, input validation, output encoding, encryption, logging
Write `REFACTOR_DIR/hardening/security.md`:
- Vulnerability assessment: location, type, severity, exploit scenario, fix
- Security controls review
- Compliance check against `security_approach.md`
- Recommendations: critical fixes, improvements, hardening
**Self-verification** (per track):
- [ ] All findings are grounded in actual code
- [ ] Recommendations are actionable with effort estimates
- [ ] All tests still pass after any changes
**Save action**: Write hardening artifacts
---
| Phase | File | Summary | Gate |
|-------|------|---------|------|
| 0 | `phases/00-baseline.md` | Collect goals, capture baseline metrics | BLOCKING: user confirms |
| 1 | `phases/01-discovery.md` | Document components, synthesize solution | BLOCKING: user confirms |
| 2 | `phases/02-analysis.md` | Research improvements, produce roadmap | BLOCKING: user confirms |
| | | *Quick Assessment stops here* | |
| 3 | `phases/03-safety-net.md` | Design and implement pre-refactoring tests | GATE: all tests pass |
| 4 | `phases/04-execution.md` | Analyze coupling, execute decoupling | BLOCKING: user confirms |
| 5 | `phases/05-hardening.md` | Technical debt, performance, security | Optional: user picks tracks |
| 6 | `phases/06-test-sync.md` | Remove obsolete, update broken, add new tests | GATE: all tests pass |
| 7 | `phases/07-verification.md` | Run full suite, compare metrics vs baseline | GATE: all pass, no regressions |
| 8 | `phases/08-documentation.md` | Update `_docs/` to reflect refactored state | Skip in standalone mode |
**Mode detection:**
- "quick assessment" / "just assess" → phases 0-2
- "refactor [specific target]" → skip phase 1 if docs exist
- Default → all phases
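The detection order above can be sketched as a small function; the plain-text request and optional explicit target shown here are an assumed representation, for illustration only:

```python
from typing import Optional


def detect_mode(request: str, target: Optional[str] = None) -> str:
    """Apply the detection rules in order: explicit assessment phrases win,
    then an explicit target, then the Full Refactoring default."""
    text = request.lower()
    if "quick assessment" in text or "just assess" in text:
        return "Quick Assessment"
    if target is not None:
        return "Targeted Refactoring"
    return "Full Refactoring"
```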
At the start of execution, create a TodoWrite with all applicable phases.
## Artifact Structure
All artifacts are written to REFACTOR_DIR:
```
baseline_metrics.md Phase 0
discovery/components/[##]_[name].md Phase 1
discovery/solution.md Phase 1
discovery/system_flows.md Phase 1
analysis/research_findings.md Phase 2
analysis/refactoring_roadmap.md Phase 2
test_specs/[##]_[test_name].md Phase 3
coupling_analysis.md Phase 4
execution_log.md Phase 4
hardening/{technical_debt,performance,security}.md Phase 5
test_sync/{obsolete_tests,updated_tests,new_tests}.md Phase 6
verification_report.md Phase 7
doc_update_log.md Phase 8
FINAL_report.md after all phases
```
**Resumability**: match existing artifacts to phases above, resume from next incomplete phase.
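This artifact-to-phase matching can be sketched as a first-missing-artifact scan, using one representative marker per phase taken from the listing above:

```python
from pathlib import Path

# One representative artifact per phase, in phase order.
PHASE_MARKERS = [
    (0, "baseline_metrics.md"),
    (1, "discovery/solution.md"),
    (2, "analysis/refactoring_roadmap.md"),
    (3, "test_specs"),
    (4, "execution_log.md"),
    (5, "hardening"),
]


def next_incomplete_phase(refactor_dir: Path) -> int:
    """Return the first phase whose marker artifact is missing from REFACTOR_DIR."""
    for phase, marker in PHASE_MARKERS:
        if not (refactor_dir / marker).exists():
            return phase
    return len(PHASE_MARKERS)  # everything present; all listed phases done
```

A run against an empty REFACTOR_DIR resumes at phase 0; a directory containing only `baseline_metrics.md` resumes at phase 1, and so on.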
## Final Report
After all executed phases complete, write `REFACTOR_DIR/FINAL_report.md`:
- Refactoring mode used and phases executed
- Baseline metrics vs final metrics comparison
- Changes made summary
- Remaining items (deferred to future)
- Lessons learned
After all phases complete, write `REFACTOR_DIR/FINAL_report.md`:
mode used, phases executed, baseline vs final metrics, changes summary, remaining items, lessons learned.
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unclear refactoring scope | **ASK user** |
| Ambiguous acceptance criteria | **ASK user** |
| Unclear scope or ambiguous criteria | **ASK user** |
| Tests failing before refactoring | **ASK user** — fix tests or fix code? |
| Coupling change risks breaking external contracts | **ASK user** |
| Performance optimization vs readability trade-off | **ASK user** |
| Missing baseline metrics (no test suite, no CI) | **WARN user**, suggest building safety net first |
| Security vulnerability found during refactoring | **WARN user** immediately, don't defer |
## Trigger Conditions
When the user wants to:
- Improve existing code structure or quality
- Reduce technical debt or coupling
- Prepare codebase for new features
- Assess code health before major changes
**Keywords**: "refactor", "refactoring", "improve code", "reduce coupling", "technical debt", "code quality", "decoupling"
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Structured Refactoring (6-Phase Method) │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (project vs standalone) + set paths │
│ MODE: Full / Targeted / Quick Assessment │
│ │
│ 0. Context & Baseline → baseline_metrics.md │
│ [BLOCKING: user confirms baseline] │
│ 1. Discovery → discovery/ (components, solution) │
│ [BLOCKING: user confirms documentation] │
│ 2. Analysis → analysis/ (research, roadmap) │
│ [BLOCKING: user confirms roadmap] │
│ ── Quick Assessment stops here ── │
│ 3. Safety Net → test_specs/ + implemented tests │
│ [GATE: all tests must pass] │
│ 4. Execution → coupling_analysis, execution_log │
│ [BLOCKING: user confirms changes] │
│ 5. Hardening → hardening/ (debt, perf, security) │
│ [optional, user picks tracks] │
│ ───────────────────────────────────────────────── │
│ FINAL_report.md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Preserve behavior · Measure before/after │
│ Small changes · Save immediately · Ask don't assume│
└────────────────────────────────────────────────────────────────┘
```
| Risk of breaking external contracts | **ASK user** |
| Performance vs readability trade-off | **ASK user** |
| No test suite or CI exists | **WARN user**, suggest safety net first |
| Security vulnerability found | **WARN user** immediately |
@@ -0,0 +1,40 @@
# Phase 0: Context & Baseline
**Role**: Software engineer preparing for refactoring
**Goal**: Collect refactoring goals and capture baseline metrics
**Constraints**: Measurement only — no code changes
## 0a. Collect Goals
If PROBLEM_DIR files do not yet exist, help the user create them:
1. `problem.md` — what the system currently does, what changes are needed, pain points
2. `acceptance_criteria.md` — success criteria for the refactoring
3. `security_approach.md` — security requirements (if applicable)
Store in PROBLEM_DIR.
## 0b. Capture Baseline
1. Read problem description and acceptance criteria
2. Measure current system metrics using project-appropriate tools:
| Metric Category | What to Capture |
|----------------|-----------------|
| **Coverage** | Overall, unit, blackbox, critical paths |
| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
| **Code Smells** | Total, critical, major |
| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
| **Dependencies** | Total count, outdated, security vulnerabilities |
| **Build** | Build time, test execution time, deployment time |
3. Create functionality inventory: all features/endpoints with status and coverage
**Self-verification**:
- [ ] All metric categories measured (or noted as N/A with reason)
- [ ] Functionality inventory is complete
- [ ] Measurements are reproducible
**Save action**: Write `REFACTOR_DIR/baseline_metrics.md`
**BLOCKING**: Present baseline summary to user. Do NOT proceed until user confirms.
@@ -0,0 +1,46 @@
# Phase 1: Discovery
**Role**: Principal software architect
**Goal**: Generate documentation from existing code and form solution description
**Constraints**: Document what exists, not what should be. No code changes.
**Skip condition** (Targeted mode): If `COMPONENTS_DIR` and `SOLUTION_DIR` already contain documentation for the target area, skip to Phase 2. Ask user to confirm skip.
## 1a. Document Components
For each component in the codebase:
1. Analyze project structure, directories, files
2. Go file by file, analyze each method
3. Analyze connections between components
Write per component to `REFACTOR_DIR/discovery/components/[##]_[name].md`:
- Purpose and architectural patterns
- Mermaid diagrams for logic flows
- API reference table (name, description, input, output)
- Implementation details: algorithmic complexity, state management, dependencies
- Caveats, edge cases, known limitations
## 1b. Synthesize Solution & Flows
1. Review all generated component documentation
2. Synthesize into a cohesive solution description
3. Create flow diagrams showing component interactions
Write:
- `REFACTOR_DIR/discovery/solution.md` — product description, component overview, interaction diagram
- `REFACTOR_DIR/discovery/system_flows.md` — Mermaid flowcharts per major use case
Also copy to project standard locations if in project mode:
- `SOLUTION_DIR/solution.md`
- `DOCUMENT_DIR/system_flows.md`
**Self-verification**:
- [ ] Every component in the codebase is documented
- [ ] Solution description covers all components
- [ ] Flow diagrams cover all major use cases
- [ ] Mermaid diagrams are syntactically correct
**Save action**: Write discovery artifacts
**BLOCKING**: Present discovery summary to user. Do NOT proceed until user confirms documentation accuracy.
@@ -0,0 +1,41 @@
# Phase 2: Analysis
**Role**: Researcher and software architect
**Goal**: Research improvements and produce a refactoring roadmap
**Constraints**: Analysis only — no code changes
## 2a. Deep Research
1. Analyze current implementation patterns
2. Research modern approaches for similar systems
3. Identify what could be done differently
4. Suggest improvements based on state-of-the-art practices
Write `REFACTOR_DIR/analysis/research_findings.md`:
- Current state analysis: patterns used, strengths, weaknesses
- Alternative approaches per component: current vs alternative, pros/cons, migration effort
- Prioritized recommendations: quick wins + strategic improvements
## 2b. Solution Assessment
1. Assess current implementation against acceptance criteria
2. Identify weak points in codebase, map to specific code areas
3. Perform gap analysis: acceptance criteria vs current state
4. Prioritize changes by impact and effort
Write `REFACTOR_DIR/analysis/refactoring_roadmap.md`:
- Weak points assessment: location, description, impact, proposed solution
- Gap analysis: what's missing, what needs improvement
- Phased roadmap: Phase 1 (critical fixes), Phase 2 (major improvements), Phase 3 (enhancements)
**Self-verification**:
- [ ] All acceptance criteria are addressed in gap analysis
- [ ] Recommendations are grounded in actual code, not abstract
- [ ] Roadmap phases are prioritized by impact
- [ ] Quick wins are identified separately
**Save action**: Write analysis artifacts
**BLOCKING**: Present refactoring roadmap to user. Do NOT proceed until user confirms.
**Quick Assessment mode stops here.** Present final summary and write `FINAL_report.md` with phases 0-2 content.
@@ -0,0 +1,35 @@
# Phase 3: Safety Net
**Role**: QA engineer and developer
**Goal**: Design and implement tests that capture current behavior before refactoring
**Constraints**: Tests must all pass on the current codebase before proceeding
## 3a. Design Test Specs
Coverage requirements (must meet before refactoring — see `.cursor/rules/cursor-meta.mdc` Quality Thresholds):
- Minimum overall coverage: 75%
- Critical path coverage: 90%
- All public APIs must have blackbox tests
- All error handling paths must be tested
For each critical area, write test specs to `REFACTOR_DIR/test_specs/[##]_[test_name].md`:
- Blackbox tests: summary, current behavior, input data, expected result, max expected time
- Acceptance tests: summary, preconditions, steps with expected results
- Coverage analysis: current %, target %, uncovered critical paths
## 3b. Implement Tests
1. Set up test environment and infrastructure if not exists
2. Implement each test from specs
3. Run tests, verify all pass on current codebase
4. Document any discovered issues
**Self-verification**:
- [ ] Coverage requirements met (75% overall, 90% critical paths)
- [ ] All tests pass on current codebase
- [ ] All public APIs have blackbox tests
- [ ] Test data fixtures are configured
**Save action**: Write test specs; implemented tests go into the project's test folder
**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 4. If tests fail, fix the tests (not the code) or ask user for guidance. Do NOT proceed to Phase 4 with failing tests.
@@ -0,0 +1,45 @@
# Phase 4: Execution
**Role**: Software architect and developer
**Goal**: Analyze coupling and execute decoupling changes
**Constraints**: Small incremental changes; tests must stay green after every change
## 4a. Analyze Coupling
1. Analyze coupling between components/modules
2. Map dependencies (direct and transitive)
3. Identify circular dependencies
4. Form decoupling strategy
Write `REFACTOR_DIR/coupling_analysis.md`:
- Dependency graph (Mermaid)
- Coupling metrics per component
- Problem areas: components involved, coupling type, severity, impact
- Decoupling strategy: priority order, proposed interfaces/abstractions, effort estimates
**BLOCKING**: Present coupling analysis to user. Do NOT proceed until user confirms strategy.
## 4b. Execute Decoupling
For each change in the decoupling strategy:
1. Implement the change
2. Run blackbox tests
3. Fix any failures
4. Commit with descriptive message
Address code smells encountered: long methods, large classes, duplicate code, dead code, magic numbers.
Write `REFACTOR_DIR/execution_log.md`:
- Change description, files affected, test status per change
- Before/after metrics comparison against baseline
**Self-verification**:
- [ ] All tests still pass after execution
- [ ] No circular dependencies remain (or reduced per plan)
- [ ] Code smells addressed
- [ ] Metrics improved compared to baseline
**Save action**: Write execution artifacts
**BLOCKING**: Present execution summary to user. Do NOT proceed until user confirms.
@@ -0,0 +1,51 @@
# Phase 5: Hardening (Optional, Parallel Tracks)
**Role**: Varies per track
**Goal**: Address technical debt, performance, and security
**Constraints**: Each track is optional; user picks which to run
Present the three tracks and let user choose which to execute:
## Track A: Technical Debt
**Role**: Technical debt analyst
1. Identify and categorize debt items: design, code, test, documentation
2. Assess each: location, description, impact, effort, interest (cost of not fixing)
3. Prioritize: quick wins → strategic debt → tolerable debt
4. Create actionable plan with prevention measures
Write `REFACTOR_DIR/hardening/technical_debt.md`
## Track B: Performance Optimization
**Role**: Performance engineer
1. Profile current performance, identify bottlenecks
2. For each bottleneck: location, symptom, root cause, impact
3. Propose optimizations with expected improvement and risk
4. Implement one at a time, benchmark after each change
5. Verify tests still pass
Write `REFACTOR_DIR/hardening/performance.md` with before/after benchmarks
## Track C: Security Review
**Role**: Security engineer
1. Review code against OWASP Top 10
2. Verify security requirements from `security_approach.md` are met
3. Check: authentication, authorization, input validation, output encoding, encryption, logging
Write `REFACTOR_DIR/hardening/security.md`:
- Vulnerability assessment: location, type, severity, exploit scenario, fix
- Security controls review
- Compliance check against `security_approach.md`
- Recommendations: critical fixes, improvements, hardening
**Self-verification** (per track):
- [ ] All findings are grounded in actual code
- [ ] Recommendations are actionable with effort estimates
- [ ] All tests still pass after any changes
**Save action**: Write hardening artifacts
@@ -0,0 +1,51 @@
# Phase 6: Test Synchronization
**Role**: QA engineer and developer
**Goal**: Reconcile the test suite with the refactored codebase — remove obsolete tests, update broken tests, add tests for new code
**Constraints**: All tests must pass at the end of this phase. Do not change production code here — only tests.
## 6a. Identify Obsolete Tests
1. Compare the pre-refactoring codebase structure (from Phase 0 inventory) with the current state
2. Find tests that reference removed functions, classes, modules, or endpoints
3. Find tests that duplicate coverage due to merged/consolidated code
4. Decide per test: **delete** (functionality removed) or **merge** (duplicates)
Write `REFACTOR_DIR/test_sync/obsolete_tests.md`:
- Test file, test name, reason (target removed / target merged / duplicate coverage), action taken (deleted / merged into)
## 6b. Update Existing Tests
1. Run the full test suite — collect failures and errors
2. For each failing test, determine the cause:
- Renamed/moved function or module → update import paths and references
- Changed function signature → update call sites and assertions
- Changed behavior (intentional per refactoring plan) → update expected values
- Changed data structures → update fixtures and assertions
3. Fix each test, re-run to confirm it passes
Write `REFACTOR_DIR/test_sync/updated_tests.md`:
- Test file, test name, change type (import path / signature / assertion / fixture), description of update
## 6c. Add New Tests
1. Identify new code introduced during Phases 4–5 that lacks test coverage:
- New public functions, classes, or modules
- New interfaces or abstractions introduced during decoupling
- New error handling paths
2. Write tests following the same patterns and conventions as the existing test suite
3. Ensure coverage targets from Phase 3 are maintained or improved
Write `REFACTOR_DIR/test_sync/new_tests.md`:
- Test file, test name, target function/module, coverage type (unit / integration / blackbox)
**Self-verification**:
- [ ] All obsolete tests removed or merged
- [ ] All pre-existing tests pass after updates
- [ ] New code from Phases 4–5 has test coverage
- [ ] Overall coverage meets or exceeds Phase 3 baseline (75% overall, 90% critical paths)
- [ ] No tests reference removed or renamed code
**Save action**: Write test_sync artifacts; implemented tests go into the project's test folder
**GATE (BLOCKING)**: ALL tests must pass before proceeding to Phase 7. If tests fail, fix the tests or ask user for guidance.
@@ -0,0 +1,51 @@
# Phase 7: Final Verification
**Role**: QA engineer
**Goal**: Run all tests end-to-end, compare final metrics against baseline, and confirm the refactoring succeeded
**Constraints**: No code changes. If failures are found, go back to the appropriate phase (4/5/6) to fix before retrying.
## 7a. Run Full Test Suite
1. Run unit tests, integration tests, and blackbox tests
2. Run acceptance tests derived from `acceptance_criteria.md`
3. Record pass/fail counts and any failures
If any test fails:
- Determine whether the failure is a test issue (→ return to Phase 6) or a code issue (→ return to Phase 4/5)
- Do NOT proceed until all tests pass
## 7b. Capture Final Metrics
Re-measure all metrics from Phase 0 baseline using the same tools:
| Metric Category | What to Capture |
|----------------|-----------------|
| **Coverage** | Overall, unit, blackbox, critical paths |
| **Complexity** | Cyclomatic complexity (avg + top 5 functions), LOC, tech debt ratio |
| **Code Smells** | Total, critical, major |
| **Performance** | Response times (P50/P95/P99), CPU/memory, throughput |
| **Dependencies** | Total count, outdated, security vulnerabilities |
| **Build** | Build time, test execution time, deployment time |
## 7c. Compare Against Baseline
1. Read `REFACTOR_DIR/baseline_metrics.md`
2. Produce a side-by-side comparison: baseline vs final for every metric
3. Flag any regressions (metrics that got worse)
4. Verify acceptance criteria are met
Write `REFACTOR_DIR/verification_report.md`:
- Test results summary: total, passed, failed, skipped
- Metric comparison table: metric, baseline value, final value, delta, status (improved / unchanged / regressed)
- Acceptance criteria checklist: criterion, status (met / not met), evidence
- Regressions (if any): metric, severity, explanation
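The delta computation behind the comparison table can be sketched in a few lines. The metric names and the `higher_is_better` set below are illustrative assumptions, not a prescription for the actual report tooling:

```python
def compare_metrics(baseline: dict, final: dict, higher_is_better: set):
    """Produce (metric, baseline, final, delta, status) rows for the report.

    `higher_is_better` lists metrics where an increase is an improvement
    (e.g. coverage); everything else improves by decreasing (e.g. complexity,
    build time). Illustrative sketch only.
    """
    rows = []
    for name, base in baseline.items():
        cur = final[name]
        delta = cur - base
        improved = delta > 0 if name in higher_is_better else delta < 0
        status = "unchanged" if delta == 0 else ("improved" if improved else "regressed")
        rows.append((name, base, cur, delta, status))
    return rows

rows = compare_metrics(
    {"coverage": 75.0, "avg_complexity": 12.0},
    {"coverage": 81.0, "avg_complexity": 9.5},
    higher_is_better={"coverage"},
)
```

Any row whose status is `regressed` should be flagged per step 3 above.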
**Self-verification**:
- [ ] All tests pass (zero failures)
- [ ] All acceptance criteria are met
- [ ] No critical metric regressions
- [ ] Metrics are captured with the same tools/methodology as Phase 0
**Save action**: Write `REFACTOR_DIR/verification_report.md`
**GATE (BLOCKING)**: All tests must pass and no critical regressions. Present verification report to user. Do NOT proceed to Phase 8 until user confirms.
@@ -0,0 +1,46 @@
# Phase 8: Documentation Update
**Role**: Technical writer
**Goal**: Update existing `_docs/` artifacts to reflect all changes made during refactoring
**Constraints**: Documentation only — no code changes. Only update docs that are affected by refactoring changes.
**Skip condition**: If no `_docs/02_document/` directory exists (standalone mode), skip this phase entirely.
## 8a. Identify Affected Documentation
1. Review `REFACTOR_DIR/execution_log.md` to list all files changed during Phase 4
2. Review any hardening changes from Phase 5
3. Review test changes from Phase 6
4. Map changed files to their corresponding module docs in `_docs/02_document/modules/`
5. Map changed modules to their parent component docs in `_docs/02_document/components/`
6. Determine if system-level docs need updates (`architecture.md`, `system-flows.md`, `data_model.md`)
7. Determine if test documentation needs updates (`_docs/02_document/tests/`)
## 8b. Update Module Documentation
For each module doc affected by refactoring changes:
1. Re-read the current source file
2. Update the module doc to reflect new/changed interfaces, dependencies, internal logic
3. Remove documentation for deleted code; add documentation for new code
## 8c. Update Component Documentation
For each component doc affected:
1. Re-read the updated module docs within the component
2. Update inter-module interfaces, dependency graphs, caveats
3. Update the component relationship diagram if component boundaries changed
## 8d. Update System-Level Documentation
If structural changes were made (new modules, removed modules, changed interfaces):
1. Update `_docs/02_document/architecture.md` if architecture changed
2. Update `_docs/02_document/system-flows.md` if flow sequences changed
3. Update `_docs/02_document/diagrams/components.md` if component relationships changed
**Self-verification**:
- [ ] Every changed source file has an up-to-date module doc
- [ ] Component docs reflect the refactored structure
- [ ] No stale references to removed code in any doc
- [ ] Dependency graphs in docs match actual imports
**Save action**: Updated docs written in-place to `_docs/02_document/`
@@ -0,0 +1,14 @@
FROM python:3.10-slim
WORKDIR /app
RUN apt-get update && \
apt-get install -y --no-install-recommends libgl1 libglib2.0-0 && \
rm -rf /var/lib/apt/lists/*
COPY requirements-test.txt .
RUN pip install --no-cache-dir -r requirements-test.txt
COPY . .
CMD ["python", "-m", "pytest", "tests/", "--tb=short", "--junitxml=/app/test-results/test-results.xml", "-q"]
@@ -0,0 +1,51 @@
# Acceptance Criteria
## Training
- Dataset split: 70% train, 20% validation, 10% test (hardcoded in train.py).
- Training parameters: YOLOv11 medium, 120 epochs, batch size 11, image size 1280px, save_period=1.
- Corrupted labels (bounding box coordinates > 1.0) are filtered to `/azaion/data-corrupted/`.
- Model export to ONNX: 1280px resolution, batch size 4, NMS baked in.
- Trained model encrypted with AES-256-CBC before upload.
- Encrypted model split: small part ≤3KB or 20% of total → API server; remainder → CDN.
- Post-training: model uploaded to both API and CDN endpoints.
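One plausible reading of the split rule above is that the small part is the larger of 3KB and 20% of the encrypted blob, capped at the blob length. A minimal sketch under that assumption (the function name and exact rounding are illustrative, not the project's actual code):

```python
def split_encrypted_model(blob: bytes) -> tuple[bytes, bytes]:
    """Split an encrypted model into a small part (API server) and a big part (CDN).

    Small part size: the larger of 3072 bytes and 20% of the total,
    capped at the blob length. Illustrative sketch only.
    """
    small_size = min(len(blob), max(3072, int(len(blob) * 0.2)))
    return blob[:small_size], blob[small_size:]

# Reassembly is simple concatenation: small + big == encrypted blob.
small, big = split_encrypted_model(b"\x00" * 10000)
```

For a 10000-byte blob, 20% is 2000 bytes, so the 3KB floor wins and the small part is 3072 bytes.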
## Augmentation
- Each validated image produces exactly 8 outputs (1 original + 7 augmented variants).
- Augmentation runs every 5 minutes, processing only unprocessed images.
- Bounding boxes clipped to [0, 1] range; boxes with area < 0.01% of image discarded.
- Processing is parallelized per image using ThreadPoolExecutor.
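The clipping rule can be sketched for a single normalized YOLO-format box. The center-to-corner conversion and the `min_area` threshold of `1e-4` (0.01% of the image) are assumptions drawn from the criteria above; the codebase's exact threshold and helper names may differ:

```python
def clip_yolo_box(cx, cy, w, h, min_area=1e-4):
    """Clip a normalized YOLO box (center x/y, width, height) to [0, 1].

    Returns None when the clipped box's area falls below min_area.
    Illustrative sketch only.
    """
    # Convert center format to corner format, clip to the image, convert back.
    x1 = max(0.0, cx - w / 2)
    y1 = max(0.0, cy - h / 2)
    x2 = min(1.0, cx + w / 2)
    y2 = min(1.0, cy + h / 2)
    nw, nh = max(0.0, x2 - x1), max(0.0, y2 - y1)
    if nw * nh < min_area:
        return None  # box too small after clipping -> discard
    return ((x1 + x2) / 2, (y1 + y2) / 2, nw, nh)
```

A box at `cx=0.99, w=0.1` spills past the right edge, so its width shrinks to roughly 0.06 after clipping rather than being discarded.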
## Annotation Ingestion
- Created/Edited annotations from Validators/Admins → `/azaion/data/`.
- Created/Edited annotations from Operators → `/azaion/data-seed/`.
- Validated (bulk) events → move from `/data-seed/` to `/data/`.
- Deleted (bulk) events → move to `/data_deleted/`.
- Queue consumer offset persisted to `offset.yaml` after each message.
## Inference
- TensorRT inference: ~54s for 200s video, ~3.7GB VRAM.
- ONNX inference: ~81s for 200s video, ~6.3GB VRAM.
- Frame sampling: every 4th frame.
- Batch size: 4 (for both ONNX and TensorRT).
- Confidence threshold: 0.3 (hardcoded in inference/inference.py).
- NMS IoU threshold: 0.3 (hardcoded in inference/inference.py).
- Overlapping detection removal: IoU > 0.3 with lower confidence removed.
## Security
- API authentication via JWT (email/password login).
- Model encryption: AES-256-CBC with static key.
- Resource encryption: AES-256-CBC with hardware-derived key (CPU+GPU+RAM+drive serial hash).
- CDN access: separate read/write S3 credentials.
- Split-model storage: prevents model theft from single storage compromise.
## Data Format
- Annotation format: YOLO (class_id center_x center_y width height — all normalized 0–1).
- 17 base annotation classes × 3 weather modes = 51 active classes (80 total slots).
- Image format: JPEG.
- Queue message format: msgpack with positional integer keys.
@@ -0,0 +1,47 @@
# Input Data Parameters
## Annotation Images
- **Format**: JPEG
- **Naming**: UUID-based (`{uuid}.jpg`)
- **Source**: Azaion annotation platform via RabbitMQ Streams
- **Volume**: 360K+ annotations observed in training comments
- **Delivery**: Real-time streaming via annotation queue consumer
## Annotation Labels
- **Format**: YOLO text format (one detection per line)
- **Schema**: `{class_id} {center_x} {center_y} {width} {height}`
- **Coordinate system**: All values normalized to 0–1 relative to image dimensions
- **Constraints**: Coordinates must be in [0, 1]; labels with coords > 1.0 are treated as corrupted
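The label schema and corruption rule above can be sketched as a small parser. The function names here are illustrative (the codebase exposes `check_label`, whose exact signature may differ):

```python
def parse_yolo_line(line: str):
    """Parse one YOLO label line: 'class_id cx cy w h', coords normalized.

    Raises ValueError for malformed lines. Illustrative sketch only.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(p) for p in parts[1:])
    return class_id, cx, cy, w, h

def is_corrupted(line: str) -> bool:
    """True when any coordinate falls outside [0, 1] (see constraint above)."""
    _, *coords = parse_yolo_line(line)
    return any(c < 0.0 or c > 1.0 for c in coords)
```

For example, `"0 0.5 0.5 0.1 0.1"` parses cleanly, while `"0 1.5 0.5 0.1 0.1"` is flagged as corrupted because `center_x > 1.0`.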
## Annotation Classes
- **Source file**: `classes.json` (static, 17 entries)
- **Schema per class**: `{ Id: int, Name: str, ShortName: str, Color: hex_str }`
- **Classes**: ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier
- **Weather expansion**: Each class × 3 modes (Norm offset 0, Wint offset 20, Night offset 40)
- **Total class IDs**: 80 slots (51 used, 29 reserved as placeholders)
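The weather expansion maps each base ID (0–16) into three offset ranges, yielding 51 active IDs inside an 80-slot space. A minimal sketch of the scheme (helper name is illustrative):

```python
WEATHER_OFFSETS = {"Norm": 0, "Wint": 20, "Night": 40}

def expand_class_ids(base_ids):
    """Expand base class IDs across the three weather modes by fixed offsets.

    With 17 base classes this produces IDs 0-16, 20-36, and 40-56:
    51 active IDs, leaving 29 of the 80 slots reserved. Sketch only.
    """
    return {mode: [i + off for i in base_ids] for mode, off in WEATHER_OFFSETS.items()}

expanded = expand_class_ids(range(17))
```

So the "Wint" variant of base class 0 (ArmorVehicle) gets ID 20, and the highest active ID is 56.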
## Queue Messages
- **Protocol**: AMQP via RabbitMQ Streams (rstream library)
- **Serialization**: msgpack with positional integer keys
- **Message types**: AnnotationMessage (single), AnnotationBulkMessage (batch validate/delete)
- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (raw bytes), createdRole, createdEmail, source, status
## Configuration Files
| File | Format | Key Contents |
|------|--------|-------------|
| `config.yaml` | YAML | API URL, email, password, queue host/port/username/password, directory paths |
| `cdn.yaml` | YAML | CDN endpoint, read access key/secret, write access key/secret, bucket name |
| `classes.json` | JSON | Annotation class definitions array |
| `checkpoint.txt` | Plain text | Last training run timestamp |
| `offset.yaml` | YAML | Queue consumer offset for resume |
## Video Input (Inference)
- **Format**: Any OpenCV-supported video format
- **Processing**: Every 4th frame sampled, batched in groups of 4
- **Resolution**: Resized to model input size (1280×1280) during preprocessing
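The sampling-and-batching scheme above (every 4th frame, batches of 4) can be sketched as an index generator; the per-frame resize to 1280×1280 happens separately during preprocessing. The function name and defaults are illustrative:

```python
def sample_batches(frame_count, step=4, batch_size=4):
    """Yield batches of frame indices: every `step`-th frame, grouped into
    chunks of `batch_size`. Illustrative sketch of the sampling scheme."""
    indices = list(range(0, frame_count, step))
    for i in range(0, len(indices), batch_size):
        yield indices[i:i + batch_size]

batches = list(sample_batches(40))  # a 40-frame clip
```

A 40-frame clip yields 10 sampled frames in 3 batches, the first being frames 0, 4, 8, and 12.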
@@ -0,0 +1,107 @@
# Expected Results
Maps every input data item to its quantifiable expected result.
## Result Format Legend
| Result Type | When to Use | Example |
|-------------|-------------|---------|
| Exact value | Output must match precisely | `detection_count: 3`, `file_count: 8` |
| Tolerance range | Numeric output with acceptable variance | `confidence: 0.92 ± 0.05` |
| Threshold | Output must exceed or stay below a limit | `latency < 500ms`, `confidence ≥ 0.3` |
| Pattern match | Output must match a string/regex pattern | `filename matches *_1.jpg` |
| Set/count | Output must contain specific items or counts | `output_count == 8` |
## Input → Expected Result Mapping
### Augmentation
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | 1 image + 1 label from `dataset/` | Single annotated image with valid bboxes | `output_count: 8` (1 original + 7 augmented) | exact | N/A | N/A |
| 2 | 1 image + 1 label from `dataset/` | Same image, output filenames | Original keeps name; augmented named `{stem}_1` through `{stem}_7` | pattern | N/A | N/A |
| 3 | 1 image + 1 label from `dataset/` | All output label bboxes | Every coordinate in [0, 1] range | range | [0.0, 1.0] | N/A |
| 4 | 1 image + label with bbox near edge (x=0.99, w=0.1) | Bbox partially outside image | Bbox clipped: width reduced, tiny bboxes (area < 0.01) removed | threshold_min | width ≥ 0.01, height ≥ 0.01 | N/A |
| 5 | 1 image + empty label file | Image with no detections | `output_count: 8`, all label files empty | exact | N/A | N/A |
### Dataset Formation
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 6 | 100 images + 100 labels from `dataset/` | Full fixture dataset | 3 folders created: `train/`, `valid/`, `test/` | exact | N/A | N/A |
| 7 | 100 images + 100 labels from `dataset/` | Split ratio | train: 70, valid: 20, test: 10 | exact | N/A | N/A |
| 8 | 100 images + 100 labels from `dataset/` | Each split has images/ and labels/ subdirs | `train/images/`, `train/labels/`, `valid/images/`, `valid/labels/`, `test/images/`, `test/labels/` | exact | N/A | N/A |
| 9 | 100 images + 100 labels from `dataset/` | Total files across all splits equals input count | `sum(train + valid + test) == 100` | exact | N/A | N/A |
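The 70/20/10 split in rows 6–9 can be sketched as follows. Rounding and shuffling details are assumptions; the actual `train.py` may differ:

```python
import random

def split_dataset(stems, seed=0):
    """Split file stems into train/valid/test at 70/20/10.

    Rounds down for train and valid and gives the remainder to test.
    Illustrative sketch only.
    """
    stems = list(stems)
    random.Random(seed).shuffle(stems)  # deterministic shuffle for the sketch
    n_train = int(len(stems) * 0.7)
    n_valid = int(len(stems) * 0.2)
    return {
        "train": stems[:n_train],
        "valid": stems[n_train:n_train + n_valid],
        "test": stems[n_train + n_valid:],
    }

splits = split_dataset(f"img{i}" for i in range(100))
```

For 100 inputs this yields exactly 70/20/10 files, and the union of the three splits equals the input set (row 9).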
### Label Validation
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 10 | Label file: `0 0.5 0.5 0.1 0.1` | Valid label (all coords ≤ 1.0) | `check_label` returns `True` | exact | N/A | N/A |
| 11 | Label file: `0 1.5 0.5 0.1 0.1` | Corrupted label (x > 1.0) | `check_label` returns `False` | exact | N/A | N/A |
| 12 | Label file: `0 0.5 0.5 0.1 1.2` | Corrupted label (h > 1.0) | `check_label` returns `False` | exact | N/A | N/A |
| 13 | Non-existent label path | Missing label file | `check_label` returns `False` | exact | N/A | N/A |
| 14 | Mix of 5 valid + 1 corrupted images/labels | Dataset formation with corrupted data | Corrupted image+label moved to `data-corrupted/`; valid ones in dataset splits | exact | corrupted_count: 1, valid_count: 5 | N/A |
### Encryption Roundtrip
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 15 | 1024 random bytes + key "test-key" | Arbitrary binary data | `decrypt(encrypt(data, key), key) == data` | exact | N/A | N/A |
| 16 | `azaion.onnx` bytes + model encryption key | Full ONNX model file | `decrypt(encrypt(model_bytes, key), key) == model_bytes` | exact | N/A | N/A |
| 17 | Empty bytes + key "test-key" | Edge case: zero-length input | `decrypt(encrypt(b"", key), key) == b""` | exact | N/A | N/A |
| 18 | 1 byte + key "test-key" | Edge case: minimum-length input | `decrypt(encrypt(b"\x00", key), key) == b"\x00"` | exact | N/A | N/A |
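A minimal encrypt/decrypt pair satisfying the roundtrip property in rows 15–18, using the `cryptography` library with PKCS7 padding and the IV prepended to the ciphertext (as described in the security approach). The SHA-256 passphrase derivation is an assumption; the real `security.py` key handling may differ:

```python
import os
from cryptography.hazmat.primitives import hashes, padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def _derive_key(key: str) -> bytes:
    # Derive a 32-byte AES-256 key from a passphrase (SHA-256 is an assumption).
    digest = hashes.Hash(hashes.SHA256())
    digest.update(key.encode())
    return digest.finalize()

def encrypt(data: bytes, key: str) -> bytes:
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(_derive_key(key)), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()  # IV prepended to ciphertext

def decrypt(blob: bytes, key: str) -> bytes:
    iv, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(_derive_key(key)), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

PKCS7 padding makes the zero-length and single-byte edge cases (rows 17–18) roundtrip cleanly, since even empty input gets a full padding block.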
### Model Encryption + Split
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 19 | 10000 bytes, key | Model-like binary data | Encrypted bytes split: small ≤ 3KB or 20% of total, big = remainder | threshold_max | small ≤ max(3072, total*0.2) | N/A |
| 20 | 10000 bytes, key | Same data, reassembled | `small + big == encrypted_total` | exact | N/A | N/A |
### Annotation Class Loading
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 21 | `classes.json` | Standard class definitions | `len(classes) == 17` unique base classes | exact | N/A | N/A |
| 22 | `classes.json` | Weather mode expansion | Class IDs: Norm offset 0, Wint offset 20, Night offset 40 | exact | N/A | N/A |
| 23 | `classes.json` | Total class slots in data.yaml | `nc: 80` in generated YAML | exact | N/A | N/A |
### Hardware Hash Determinism
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 24 | String "test-hardware-info" | Arbitrary hardware string | `get_hw_hash(s1) == get_hw_hash(s1)` (deterministic) | exact | N/A | N/A |
| 25 | Strings "hw-a" and "hw-b" | Different hardware strings | `get_hw_hash("hw-a") != get_hw_hash("hw-b")` | exact | N/A | N/A |
| 26 | String "test-hardware-info" | Hash format | Result is base64-encoded string, length > 0 | pattern | matches `^[A-Za-z0-9+/]+=*$` | N/A |
### ONNX Inference Smoke Test
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 27 | `azaion.onnx` + 1 image from `dataset/` | Model + annotated image (known to contain objects) | Engine loads without error; returns output array with shape [batch, N, 6] | exact (no exception) | N/A | N/A |
| 28 | `azaion.onnx` + 1 image from `dataset/` | Inference postprocessing | Returns list of Detection objects (≥ 0 items); each Detection has x, y, w, h in [0,1], cls ≥ 0, confidence in [0,1] | range | x,y,w,h ∈ [0,1]; confidence ∈ [0,1]; cls ∈ [0,79] | N/A |
### NMS / Overlap Removal
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 29 | 2 Detections: same position, conf 0.9 and 0.5, IoU > 0.3 | Overlapping detections, different confidence | 1 detection remaining (conf 0.9 kept) | exact | count: 1 | N/A |
| 30 | 2 Detections: non-overlapping positions, IoU < 0.3 | Non-overlapping detections | 2 detections remaining (both kept) | exact | count: 2 | N/A |
| 31 | 3 Detections: A overlaps B, B overlaps C, A doesn't overlap C | Chain overlap | ≤ 2 detections remaining; highest confidence per overlap pair kept | threshold_max | count ≤ 2 | N/A |
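The overlap-removal behavior in rows 29–31 can be sketched as greedy suppression: sort by confidence, keep a box only if it does not overlap an already-kept box above the IoU threshold. This is an illustrative sketch, not the project's exact NMS implementation:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in consistent units."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def remove_overlaps(dets, iou_thr=0.3):
    """Greedy overlap removal over (box, confidence) tuples.

    Keeps the higher-confidence box of any pair with IoU > iou_thr.
    Illustrative sketch only.
    """
    kept = []
    for box, conf in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, kb) <= iou_thr for kb, _ in kept):
            kept.append((box, conf))
    return kept
```

Two identical boxes with confidences 0.9 and 0.5 collapse to the 0.9 box (row 29); a disjoint box survives untouched (row 30).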
### Annotation Queue Message Parsing
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 32 | Constructed msgpack bytes matching AnnotationMessage schema | Valid Created annotation message | Parsed AnnotationMessage with correct fields: name, detections, image bytes, status == Created | exact | N/A | N/A |
| 33 | Constructed msgpack bytes for bulk Validated message | Valid bulk validation message | Parsed with status == Validated, list of annotation names | exact | N/A | N/A |
| 34 | Constructed msgpack bytes for bulk Deleted message | Valid bulk deletion message | Parsed with status == Deleted, list of annotation names | exact | N/A | N/A |
| 35 | Malformed msgpack bytes | Invalid message format | Exception raised (caught by handler) | exact (exception type) | N/A | N/A |
### YAML Generation
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 36 | `classes.json` + dataset path | Generate data.yaml for training | YAML contains: `nc: 80`, `train: train/images`, `val: valid/images`, `test: test/images`, 80 class names | exact | N/A | N/A |
| 37 | `classes.json` with 17 classes | Class name listing in YAML | 17 known class names present; 63 placeholder names as `Class-N` | exact | 17 named + 63 placeholder = 80 total | N/A |
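The data.yaml expectations in rows 36–37 can be sketched by building the mapping before serialization. The `Class-N` placeholder naming follows the expectation above; the builder name and path values are otherwise illustrative:

```python
def build_data_yaml(class_names, total_slots=80):
    """Build the training data.yaml mapping: known class names padded with
    'Class-N' placeholders up to total_slots. Illustrative sketch only."""
    names = list(class_names) + [
        f"Class-{i}" for i in range(len(class_names), total_slots)
    ]
    return {
        "train": "train/images",
        "val": "valid/images",
        "test": "test/images",
        "nc": total_slots,
        "names": names,
    }

cfg = build_data_yaml([f"known{i}" for i in range(17)])
```

With 17 known classes this yields `nc: 80`, 17 named entries, and 63 placeholders.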
@@ -0,0 +1,33 @@
# Problem Statement
## What is this system?
Azaion AI Training is an end-to-end machine learning pipeline for training and deploying object detection models. It detects military and infrastructure objects in aerial/satellite imagery — including vehicles, artillery, personnel, trenches, camouflage, and buildings — under varying weather and lighting conditions.
## What problem does it solve?
Automated detection of military assets and infrastructure from aerial imagery requires:
1. Continuous ingestion of human-annotated training data from the Azaion annotation platform
2. Automated data augmentation to expand limited labeled datasets (8× multiplication)
3. GPU-accelerated model training using state-of-the-art object detection architectures
4. Secure model distribution that prevents model theft and ties deployment to authorized hardware
5. Real-time inference on video feeds with GPU acceleration
6. Edge deployment capability for low-power field devices
## Who are the users?
- **Annotators/Operators**: Create annotation data through the Azaion platform. Their annotations flow into the training pipeline via RabbitMQ.
- **Validators/Admins**: Review and approve annotations, promoting them from seed to validated status.
- **ML Engineers**: Configure and run training pipelines, monitor model quality, trigger retraining.
- **Inference Operators**: Deploy and run inference on video feeds using trained models on GPU-equipped machines.
- **Edge Deployment Operators**: Set up and run inference on OrangePi5 edge devices in the field.
## How does it work (high level)?
1. Annotations (images + bounding box labels) arrive via a RabbitMQ stream from the Azaion annotation platform
2. A queue consumer service routes annotations to the filesystem based on user role (operator → seed, validator → validated)
3. An augmentation pipeline continuously processes validated images, producing 8 augmented variants per original
4. A training pipeline assembles datasets (70/20/10 split), trains a YOLOv11 model over ~120 epochs, and exports to ONNX format
5. Trained models are encrypted with AES-256-CBC, split into small and big parts, and uploaded to the Azaion API and S3 CDN respectively
6. Inference clients download and reassemble the model, decrypt it using a hardware-bound key, and run real-time detection on video feeds using TensorRT or ONNX Runtime
7. For edge deployment, models are exported to RKNN format for OrangePi5 devices
@@ -0,0 +1,38 @@
# Restrictions
## Hardware
- Training requires NVIDIA GPU with ≥24GB VRAM (validated: RTX 4090). Batch size 11 consumes ~22GB; batch size 12 exceeds 24GB.
- TensorRT inference requires NVIDIA GPU with TensorRT support. Engine files are GPU-architecture-specific (compiled per compute capability).
- ONNX Runtime inference requires NVIDIA GPU with CUDA support (~6.3GB VRAM for 200s video).
- Edge inference requires RK3588 SoC (OrangePi5).
- Hardware fingerprinting reads CPU model, GPU name, RAM total, and drive serial — requires access to these system properties.
## Software
- Python 3.10+ (uses `match` statements).
- CUDA 12.1 with PyTorch 2.3.0.
- TensorRT runtime for production GPU inference.
- ONNX Runtime with CUDAExecutionProvider for cross-platform inference.
- Albumentations for augmentation transforms.
- boto3 for S3-compatible CDN access.
- rstream for RabbitMQ Streams protocol.
- cryptography library for AES-256-CBC encryption.
## Environment
- Filesystem paths hardcoded to `/azaion/` root (configurable via `config.yaml`).
- Requires network access to Azaion REST API, S3-compatible CDN, and RabbitMQ instance.
- Configuration files (`config.yaml`, `cdn.yaml`) must be present with valid credentials.
- `classes.json` must be present with the 17 annotation class definitions.
- No containerization — processes run directly on host OS.
## Operational
- Training duration: ~11.5 days for 360K annotations on a single RTX 4090.
- Augmentation runs as an infinite loop with 5-minute sleep intervals.
- Annotation queue consumer runs as a persistent async process.
- TensorRT engine files are GPU-architecture-specific — must be regenerated when moving to a different GPU.
- Model encryption key is hardcoded — changing it invalidates all previously encrypted models.
- No graceful shutdown mechanism for the augmentation process.
- No reconnection logic for the annotation queue consumer on disconnect.
@@ -0,0 +1,33 @@
# Security Approach
## Authentication
- **API Authentication**: JWT-based. Client sends email/password to `POST /login`, receives JWT token used as Bearer token for subsequent requests.
- **Auto-relogin**: On HTTP 401/403 responses, the client automatically re-authenticates and retries the request.
## Encryption
- **Model encryption**: AES-256-CBC with a static key defined in `security.py`. All model artifacts (ONNX, TensorRT) are encrypted before upload.
- **Resource encryption**: AES-256-CBC with a hardware-derived key. The key is generated by hashing the machine's CPU model, GPU name, total RAM, and primary drive serial number. This ties decryption to the specific hardware.
- **Implementation**: Uses the `cryptography` library with PKCS7 padding. IV is prepended to ciphertext.
## Model Protection
- **Split storage**: Encrypted models are split into a small part (≤3KB or 20% of total size) stored on the Azaion API server and a big part stored on S3-compatible CDN. Both parts are required to reconstruct the model.
- **Hardware binding**: Inference clients must run on authorized hardware whose fingerprint matches the encryption key used during upload.
## Access Control
- **CDN access**: Separate read-only and write-only S3 credentials. Training uploads use write keys; inference downloads use read keys.
- **Role-based annotation routing**: Validator/Admin annotations go directly to validated storage; Operator annotations go to seed storage pending validation.
## Known Security Issues
| Issue | Severity | Location |
|-------|----------|----------|
| Hardcoded API credentials (email, password) | High | config.yaml |
| Hardcoded CDN access keys (4 keys) | High | cdn.yaml |
| Hardcoded model encryption key | High | security.py:67 |
| Queue credentials in plaintext | Medium | config.yaml, annotation-queue/config.yaml |
| No TLS certificate validation | Low | api_client.py |
| No input validation on API responses | Low | api_client.py |
@@ -0,0 +1,89 @@
# Solution
## Product Solution Description
Azaion AI Training is an ML pipeline for training, exporting, and deploying YOLOv11 object detection models within the Azaion platform ecosystem. The system ingests annotated image data from a RabbitMQ stream, augments it through an Albumentations-based pipeline, trains YOLOv11 models on NVIDIA GPUs, exports them to multiple formats (ONNX, TensorRT, RKNN), and deploys encrypted split-model artifacts to a REST API and S3-compatible CDN for secure distribution.
The pipeline targets aerial/satellite military object detection across 17 base classes with 3 weather modes (Normal, Winter, Night), producing 80 total class slots.
### Component Interaction
```mermaid
graph LR
RMQ[RabbitMQ Streams] -->|annotations| AQ[Annotation Queue]
AQ -->|images + labels| FS[(Filesystem)]
FS -->|raw data| AUG[Augmentation]
AUG -->|8× augmented| FS
FS -->|dataset| TRAIN[Training]
TRAIN -->|model artifacts| EXP[Export + Encrypt]
EXP -->|small part| API[Azaion API]
EXP -->|big part| CDN[S3 CDN]
API -->|small part| INF[Inference]
CDN -->|big part| INF
INF -->|detections| OUT[Video Output]
```
## Architecture
### Component Solution Table
| Component | Solution | Tools | Advantages | Limitations | Requirements | Security | Cost Indicators | Fitness |
|-----------|----------|-------|------------|-------------|-------------|----------|----------------|---------|
| Annotation Queue | Async RabbitMQ Streams consumer with role-based routing (Validator→validated, Operator→seed) | rstream, msgpack, asyncio | Decoupled ingestion, independent lifecycle, file-based offset persistence | No reconnect logic on disconnect; single consumer (no scaling) | RabbitMQ with Streams plugin, network access | Credentials in plaintext config | Low (single lightweight process) | Good for current single-server deployment |
| Data Pipeline | Continuous augmentation loop (5-min interval) producing 8× expansion via geometric + color transforms | Albumentations, OpenCV, ThreadPoolExecutor | Robust augmentation variety, parallel per-image processing | Infinite loop with no graceful shutdown; attribute bug in progress logging | Filesystem access to /azaion/data/ and /azaion/data-processed/ | None | CPU-bound, parallelized | Adequate for offline batch augmentation |
| Training | Ultralytics YOLO training with automated dataset formation (70/20/10 split), corrupt label filtering, model export and encrypted upload | Ultralytics (YOLOv11m), PyTorch 2.3.0 CUDA 12.1 | Mature framework, built-in checkpointing (save_period=1), multi-format export | Long training cycles (~11.5 days for 360K annotations); batch=11 near 24GB VRAM limit | NVIDIA GPU (RTX 4090 24GB), CUDA 12.1 | Model encrypted AES-256-CBC before upload; split storage pattern | High (GPU compute, multi-day runs) | Well-suited for periodic retraining |
| Inference | TensorRT (primary) and ONNX Runtime (fallback) engines with async CUDA streams, batch processing, NMS postprocessing | TensorRT, ONNX Runtime, PyCUDA, OpenCV | TensorRT: ~33% faster than ONNX, ~42% less VRAM; batch processing; per-GPU engine compilation | Potential uninitialized batch_size for dynamic shapes; no model caching strategy | NVIDIA GPU with TensorRT support | Hardware-bound decryption key; encrypted model download | Moderate (GPU inference) | Production-ready for GPU servers |
| Security | AES-256-CBC encryption for models and API resources; hardware fingerprinting (CPU+GPU+RAM+drive serial) for machine-bound keys | cryptography library | Split-model storage prevents single-point theft; hardware binding ties access to authorized machines | Hardcoded encryption key; hardcoded credentials in config files; no TLS cert validation | cryptography, pynvml, platform-specific hardware queries | Core security component | Minimal | Functional but needs credential externalization |
| API & CDN | REST API client with JWT auth and S3-compatible CDN for large artifact storage; split-resource upload/download pattern | requests, boto3 | Separation of small/big model parts; auto-relogin on 401/403 | No retry on 500 errors; no connection pooling | Azaion API endpoint, S3-compatible CDN endpoint | JWT tokens, separate read/write CDN keys | Low (network I/O only) | Adequate for current model distribution needs |
| Edge Deployment | RKNN export targeting RK3588 SoC (OrangePi5) with shell-based setup scripts | RKNN toolkit, bash scripts | Low-power edge inference capability | Setup scripts not integrated into main pipeline; no automated deployment | OrangePi5 hardware, RKNN runtime | N/A | Low (edge hardware) | Proof-of-concept stage |
### Deployment Architecture
The system runs as independent processes without containerization or orchestration:
| Process | Runtime Pattern | Host Requirements |
|---------|----------------|-------------------|
| Annotation Queue Consumer | Continuous (async event loop) | Network access to RabbitMQ |
| Augmentation Pipeline | Continuous loop (5-min cycle) | CPU cores, filesystem access |
| Training Pipeline | Long-running (days per run) | NVIDIA GPU (24GB VRAM), CUDA 12.1 |
| Inference | On-demand | NVIDIA GPU with TensorRT |
| Data Tools | Ad-hoc manual execution | Developer machine |
No CI/CD pipeline, container definitions, or infrastructure-as-code were found. Deployment is manual.
## Testing Strategy
### Existing Tests
| Test | Type | Coverage |
|------|------|----------|
| `tests/security_test.py` | Script-based | Encrypts a test image, verifies roundtrip decrypt matches original bytes |
| `tests/imagelabel_visualize_test.py` | Script-based | Loads sample annotations with `preprocessing.read_labels` (broken — `preprocessing` module missing) |
### Gaps
- No formal test framework (pytest/unittest) configured
- No integration tests for the training pipeline, augmentation, or inference
- No API client tests (mocked or live)
- No augmentation correctness tests (bounding box transform validation)
- Security test is a standalone script, not runnable via test runner
- The `imagelabel_visualize_test.py` cannot run due to missing `preprocessing` module
### Observed Quality Mechanisms
- Corrupt label detection during dataset formation (coords > 1.0 → moved to /data-corrupted/)
- Bounding box clipping and filtering during augmentation
- Training checkpointing (save_period=1) for crash recovery
- Augmentation exception handling per-image and per-variant
## References
| Artifact | Path | Purpose |
|----------|------|---------|
| Main config | `config.yaml` | API credentials, queue config, directory paths |
| CDN config | `cdn.yaml` | S3 CDN endpoint and access keys |
| Class definitions | `classes.json` | 17 annotation classes with colors |
| Python dependencies | `requirements.txt` | Main pipeline dependencies |
| Queue dependencies | `annotation-queue/requirements.txt` | Annotation queue service dependencies |
| Edge setup | `orangepi5/*.sh` | OrangePi5 installation and run scripts |
| Training checkpoint | `checkpoint.txt` | Last training run timestamp (2024-06-27) |
+205
View File
@@ -0,0 +1,205 @@
# Codebase Discovery
## Directory Tree
```
ai-training/
├── annotation-queue/ # Separate sub-service: annotation message queue consumer
│ ├── annotation_queue_dto.py
│ ├── annotation_queue_handler.py
│ ├── classes.json
│ ├── config.yaml
│ ├── offset.yaml
│ ├── requirements.txt
│ └── run.sh
├── dto/ # Data transfer objects for the training pipeline
│ ├── annotationClass.py
│ ├── annotation_bulk_message.py (empty)
│ ├── annotation_message.py (empty)
│ └── imageLabel.py
├── inference/ # Inference engine subsystem (ONNX + TensorRT)
│ ├── __init__.py (empty)
│ ├── dto.py
│ ├── inference.py
│ ├── onnx_engine.py
│ └── tensorrt_engine.py
├── orangepi5/ # Setup scripts for OrangePi5 edge device
│ ├── 01 install.sh
│ ├── 02 install-inference.sh
│ └── 03 run_inference.sh
├── scripts/
│ └── init-sftp.sh
├── tests/
│ ├── data.yaml
│ ├── imagelabel_visualize_test.py
│ ├── libomp140.x86_64.dll (binary workaround for Windows)
│ └── security_test.py
├── api_client.py # API client for Azaion backend + CDN resource management
├── augmentation.py # Image augmentation pipeline (albumentations)
├── cdn_manager.py # S3-compatible CDN upload/download via boto3
├── cdn.yaml # CDN credentials config
├── checkpoint.txt # Last training checkpoint timestamp
├── classes.json # Annotation class definitions (17 classes + weather modes)
├── config.yaml # Main config (API url, queue, directories)
├── constants.py # Shared path constants and config keys
├── convert-annotations.py # Annotation format converter (Pascal VOC / bbox → YOLO)
├── dataset-visualiser.py # Interactive dataset visualization tool
├── exports.py # Model export (ONNX, TensorRT, RKNN) and upload
├── hardware_service.py # Hardware fingerprinting (CPU/GPU/RAM/drive serial)
├── install.sh # Dependency installation script
├── manual_run.py # Manual training/export entry point
├── requirements.txt # Python dependencies
├── security.py # AES-256-CBC encryption/decryption + key derivation
├── start_inference.py # Inference entry point (downloads model, runs TensorRT)
├── train.py # Main training pipeline (dataset formation → YOLO training → export)
└── utils.py # Utility classes (Dotdict)
```
## Tech Stack Summary
| Category | Technology | Details |
|----------|-----------|---------|
| Language | Python 3.10+ | Match statements used (3.10 feature) |
| ML Framework | Ultralytics (YOLO) | YOLOv11 object detection model |
| Deep Learning | PyTorch 2.3.0 (CUDA 12.1) | GPU-accelerated training |
| Inference (Primary) | TensorRT | GPU inference with FP16/INT8 support |
| Inference (Fallback) | ONNX Runtime GPU | Cross-platform inference |
| Augmentation | Albumentations | Image augmentation pipeline |
| Computer Vision | OpenCV (cv2) | Image I/O, preprocessing, visualization |
| CDN/Storage | boto3 (S3-compatible) | Model artifact storage |
| Message Queue | RabbitMQ Streams (rstream) | Annotation message consumption |
| Serialization | msgpack | Queue message deserialization |
| Encryption | cryptography (AES-256-CBC) | Model encryption, API resource encryption |
| GPU Management | pycuda, pynvml | CUDA memory management, device queries |
| HTTP | requests | API communication |
| Config | PyYAML | Configuration files |
| Visualization | matplotlib, netron | Annotation display, model graph viewer |
| Edge Deployment | RKNN (RK3588) | OrangePi5 inference target |
## Dependency Graph
### Internal Module Dependencies (textual)
**Leaves (no internal dependencies):**
- `constants` — path constants, config keys
- `utils` — Dotdict helper
- `security` — encryption/decryption, key derivation
- `hardware_service` — hardware fingerprinting
- `cdn_manager` — S3-compatible CDN client
- `dto/annotationClass` — annotation class model + JSON reader
- `dto/imageLabel` — image+labels container with visualization
- `inference/dto` — Detection, Annotation, AnnotationClass (inference-specific)
- `inference/onnx_engine` — InferenceEngine ABC + OnnxEngine implementation
- `convert-annotations` — standalone annotation format converter
- `annotation-queue/annotation_queue_dto` — queue message DTOs
**Level 1 (depends on leaves):**
- `api_client` → constants, cdn_manager, hardware_service, security
- `augmentation` → constants, dto/imageLabel
- `inference/tensorrt_engine` → inference/onnx_engine (InferenceEngine ABC)
- `inference/inference` → inference/dto, inference/onnx_engine
- `annotation-queue/annotation_queue_handler` → annotation_queue_dto
**Level 2 (depends on level 1):**
- `exports` → constants, api_client, cdn_manager, security, utils
**Level 3 (depends on level 2):**
- `train` → constants, api_client, cdn_manager, dto/annotationClass, inference/onnx_engine, security, utils, exports
- `start_inference` → constants, api_client, cdn_manager, inference/inference, inference/tensorrt_engine, security, utils
**Level 4 (depends on level 3):**
- `manual_run` → constants, train, augmentation
**Broken dependency:**
- `dataset-visualiser` → constants, dto/annotationClass, dto/imageLabel, **preprocessing** (module not found in codebase)
### Dependency Graph (Mermaid)
```mermaid
graph TD
constants --> api_client
constants --> augmentation
constants --> exports
constants --> train
constants --> manual_run
constants --> start_inference
constants --> dataset-visualiser
utils --> exports
utils --> train
utils --> start_inference
security --> api_client
security --> exports
security --> train
security --> start_inference
hardware_service --> api_client
cdn_manager --> api_client
cdn_manager --> exports
cdn_manager --> train
cdn_manager --> start_inference
api_client --> exports
api_client --> train
api_client --> start_inference
dto_annotationClass[dto/annotationClass] --> train
dto_annotationClass --> dataset-visualiser
dto_imageLabel[dto/imageLabel] --> augmentation
dto_imageLabel --> dataset-visualiser
inference_dto[inference/dto] --> inference_inference[inference/inference]
inference_onnx[inference/onnx_engine] --> inference_inference
inference_onnx --> inference_trt[inference/tensorrt_engine]
inference_onnx --> train
inference_inference --> start_inference
inference_trt --> start_inference
exports --> train
train --> manual_run
augmentation --> manual_run
aq_dto[annotation-queue/annotation_queue_dto] --> aq_handler[annotation-queue/annotation_queue_handler]
```
## Topological Processing Order
| Batch | Modules |
|-------|---------|
| 1 (leaves) | constants, utils, security, hardware_service, cdn_manager |
| 2 (leaves) | dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine |
| 3 (level 1) | api_client, augmentation, inference/tensorrt_engine, inference/inference |
| 4 (level 2) | exports, convert-annotations, dataset-visualiser |
| 5 (level 3) | train, start_inference |
| 6 (level 4) | manual_run |
| 7 (separate) | annotation-queue/annotation_queue_dto, annotation-queue/annotation_queue_handler |
## Entry Points
| Entry Point | Description |
|-------------|-------------|
| `train.py` (`__main__`) | Main pipeline: form dataset → train YOLO → export + upload ONNX model |
| `augmentation.py` (`__main__`) | Continuous augmentation loop (runs indefinitely) |
| `start_inference.py` (`__main__`) | Download encrypted TensorRT model → run video inference |
| `manual_run.py` (script) | Ad-hoc training/export commands |
| `convert-annotations.py` (`__main__`) | One-shot annotation format conversion |
| `dataset-visualiser.py` (`__main__`) | Interactive annotation visualization |
| `annotation-queue/annotation_queue_handler.py` (`__main__`) | Async queue consumer for annotation CRUD events |
## Leaf Modules
constants, utils, security, hardware_service, cdn_manager, dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine, convert-annotations, annotation-queue/annotation_queue_dto
## Observations
- **Security concern**: `config.yaml` and `cdn.yaml` contain hardcoded credentials (API passwords, S3 access keys). These should be moved to environment variables or a secrets manager.
- **Missing module**: `dataset-visualiser.py` imports from `preprocessing` which does not exist in the codebase.
- **Duplicate code**: `AnnotationClass` and `WeatherMode` are defined in three separate locations: `dto/annotationClass.py`, `inference/dto.py`, and `annotation-queue/annotation_queue_dto.py`.
- **Empty files**: `dto/annotation_bulk_message.py`, `dto/annotation_message.py`, and `inference/__init__.py` are empty.
- **Separate sub-service**: `annotation-queue/` has its own `requirements.txt` and `config.yaml`, functioning as an independent service.
- **Hardcoded encryption key**: `security.py` has a hardcoded model encryption key string.
- **No formal test framework**: tests are script-based, not using pytest/unittest.
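For the credential observations above, a minimal env-first loader with a YAML-config fallback could look like this; the environment variable and key names are illustrative assumptions, not existing project conventions:

```python
import os


def load_credential(env_var: str, config: dict, *keys, default=None):
    """Prefer an environment variable, then fall back to a value nested
    under `keys` in the already-loaded YAML config dict."""
    value = os.environ.get(env_var)
    if value:
        return value
    node = config
    for k in keys:
        if not isinstance(node, dict) or k not in node:
            return default
        node = node[k]
    return node


# e.g. password = load_credential("AZAION_API_PASSWORD", cfg, "api", "password")
```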
+138
View File
@@ -0,0 +1,138 @@
# Verification Log
## Summary
| Metric | Count |
|--------|-------|
| Entities verified | 87 |
| Entities flagged | 0 |
| Corrections applied | 0 |
| Bugs found in code | 5 |
| Missing modules | 1 |
| Duplicated code | 1 pattern (3 locations) |
| Security issues | 3 |
| Completeness | 21/21 modules (100%) |
## Entity Verification
All class names, function names, method signatures, and module names referenced in documentation were verified against the actual source code. No hallucinated entities found.
### Verified Entities (key samples)
| Entity | Location | Doc Reference | Status |
|--------|----------|--------------|--------|
| `Security.encrypt_to` | security.py:14 | modules/security.md | OK |
| `Security.decrypt_to` | security.py:28 | modules/security.md | OK |
| `Security.get_model_encryption_key` | security.py:66 | modules/security.md | OK |
| `get_hardware_info` | hardware_service.py:5 | modules/hardware_service.md | OK |
| `CDNManager.upload` | cdn_manager.py:28 | modules/cdn_manager.md | OK |
| `CDNManager.download` | cdn_manager.py:37 | modules/cdn_manager.md | OK |
| `ApiClient.login` | api_client.py:43 | modules/api_client.md | OK |
| `ApiClient.load_bytes` | api_client.py:63 | modules/api_client.md | OK |
| `ApiClient.upload_big_small_resource` | api_client.py:113 | modules/api_client.md | OK |
| `Augmentator.augment_annotations` | augmentation.py:125 | modules/augmentation.md | OK |
| `Augmentator.augment_inner` | augmentation.py:55 | modules/augmentation.md | OK |
| `InferenceEngine` (ABC) | inference/onnx_engine.py:7 | modules/inference_onnx_engine.md | OK |
| `OnnxEngine` | inference/onnx_engine.py:25 | modules/inference_onnx_engine.md | OK |
| `TensorRTEngine` | inference/tensorrt_engine.py:16 | modules/inference_tensorrt_engine.md | OK |
| `TensorRTEngine.convert_from_onnx` | inference/tensorrt_engine.py:104 | modules/inference_tensorrt_engine.md | OK |
| `Inference.process` | inference/inference.py:83 | modules/inference_inference.md | OK |
| `Inference.remove_overlapping_detections` | inference/inference.py:120 | modules/inference_inference.md | OK |
| `AnnotationQueueHandler.on_message` | annotation-queue/annotation_queue_handler.py:87 | modules/annotation_queue_handler.md | OK |
| `AnnotationMessage` | annotation-queue/annotation_queue_dto.py:91 | modules/annotation_queue_dto.md | OK |
| `form_dataset` | train.py:42 | modules/train.md | OK |
| `train_dataset` | train.py:147 | modules/train.md | OK |
| `export_onnx` | exports.py:29 | modules/exports.md | OK |
| `export_rknn` | exports.py:19 | modules/exports.md | OK |
| `export_tensorrt` | exports.py:45 | modules/exports.md | OK |
| `upload_model` | exports.py:82 | modules/exports.md | OK |
| `WeatherMode` | dto/annotationClass.py:6 | modules/dto_annotationClass.md | OK |
| `AnnotationClass.read_json` | dto/annotationClass.py:18 | modules/dto_annotationClass.md | OK |
| `ImageLabel.visualize` | dto/imageLabel.py:12 | modules/dto_imageLabel.md | OK |
| `Dotdict` | utils.py:1 | modules/utils.md | OK |
## Code Bugs Found During Verification
### Bug 1: `augmentation.py` — undefined attribute `total_to_process`
- **Location**: augmentation.py, line 118
- **Issue**: References `self.total_to_process` but only `self.total_images_to_process` is defined in `__init__`
- **Impact**: AttributeError at runtime during progress logging
- **Documented in**: modules/augmentation.md, components/05_data_pipeline/description.md
### Bug 2: `train.py` `copy_annotations` — reporting bug
- **Location**: train.py, line 93 and 99
- **Issue**: `copied = 0` is declared but never incremented; the inner function increments the global `total_files_copied` instead, while the final message `f'Copied all {copied} annotations'` prints `copied`, which is always 0.
- **Impact**: Incorrect progress reporting (cosmetic)
- **Documented in**: modules/train.md, components/06_training/description.md
### Bug 3: `exports.py` `upload_model` — stale ApiClient constructor call
- **Location**: exports.py, line 97
- **Issue**: `ApiClient(ApiCredentials(api_c.url, api_c.user, api_c.pw, api_c.folder))` — but `ApiClient.__init__` takes no args, and `ApiCredentials.__init__` takes `(url, email, password)`, not `(url, user, pw, folder)`.
- **Impact**: `upload_model` function would fail at runtime. This function appears to be stale code — the actual upload flow in `train.py:export_current_model` uses the correct `ApiClient()` constructor.
- **Documented in**: modules/exports.md, components/06_training/description.md
### Bug 4: `inference/tensorrt_engine.py` — potential uninitialized `batch_size`
- **Location**: inference/tensorrt_engine.py, lines 43-44
- **Issue**: `self.batch_size` is only set if `engine_input_shape[0] != -1`. If the batch dimension is dynamic (-1), `self.batch_size` is never assigned before being used in `self.input_shape = [self.batch_size, ...]`.
- **Impact**: AttributeError at runtime for models with dynamic batch size (unless batch_size is passed via kwargs/set elsewhere)
- **Documented in**: modules/inference_tensorrt_engine.md, components/07_inference/description.md
### Bug 5: `dataset-visualiser.py` — missing import
- **Location**: dataset-visualiser.py, line 6
- **Issue**: `from preprocessing import read_labels` — the `preprocessing` module does not exist in the codebase.
- **Impact**: Script cannot run; ImportError at startup
- **Documented in**: modules/dataset_visualiser.md, components/05_data_pipeline/description.md
## Missing Modules
| Module | Referenced By | Status |
|--------|-------------|--------|
| `preprocessing` | dataset-visualiser.py, tests/imagelabel_visualize_test.py | Not found in codebase |
## Duplicated Code
### AnnotationClass + WeatherMode (3 locations)
| Location | Differences |
|----------|-------------|
| `dto/annotationClass.py` | Standard version. `color_tuple` property strips first 3 chars. |
| `inference/dto.py` | Adds `opencv_color` BGR field. Same `read_json` logic. |
| `annotation-queue/annotation_queue_dto.py` | Adds `opencv_color`. Reads `classes.json` from CWD (not relative to package). |
## Security Issues
| Issue | Location | Severity |
|-------|----------|----------|
| Hardcoded API credentials | config.yaml (email, password) | High |
| Hardcoded CDN access keys | cdn.yaml (4 access keys) | High |
| Hardcoded encryption key | security.py:67 (`get_model_encryption_key`) | High |
| Queue credentials in plaintext | config.yaml, annotation-queue/config.yaml | Medium |
| No TLS cert validation in API calls | api_client.py | Low |
## Completeness Check
All 21 source modules documented. All 8 components cover all modules with no gaps.
| Component | Modules | Complete |
|-----------|---------|----------|
| 01 Core | constants, utils | Yes |
| 02 Security | security, hardware_service | Yes |
| 03 API & CDN | api_client, cdn_manager | Yes |
| 04 Data Models | dto/annotationClass, dto/imageLabel | Yes |
| 05 Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Yes |
| 06 Training | train, exports, manual_run | Yes |
| 07 Inference | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Yes |
| 08 Annotation Queue | annotation_queue_dto, annotation_queue_handler | Yes |
## Consistency Check
- Component docs agree with architecture doc: Yes
- Flow diagrams match component interfaces: Yes
- Module dependency graph in discovery matches import analysis: Yes
- Data model doc matches filesystem layout in architecture: Yes
## Remaining Gaps / Uncertainties
- The `preprocessing` module may have existed previously and been deleted or renamed
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in train.py
- `checkpoint.txt` content (`2024-06-27 20:51:35`) suggests training infrastructure was last used in mid-2024
- The `orangepi5/` shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment
+100
View File
@@ -0,0 +1,100 @@
# Final Documentation Report — Azaion AI Training
## Executive Summary
Azaion AI Training is a Python-based ML pipeline for training, deploying, and running YOLOv11 object detection models targeting aerial military asset recognition. The system comprises 8 components (21 modules) spanning annotation ingestion, data augmentation, GPU-accelerated training, multi-format model export, encrypted model distribution, and real-time inference — with edge deployment capability via RKNN on OrangePi5 devices.
The codebase is functional and production-used (last training run: 2024-06-27) but has no CI/CD, no containerization, no formal test framework, and several hardcoded credentials. Verification identified 5 code bugs, 3 high-severity security issues, and 1 missing module.
## Problem Statement
The system automates detection of 17 classes of military objects and infrastructure in aerial/satellite imagery across 3 weather conditions (Normal, Winter, Night). It replaces manual image analysis with a continuous pipeline: human-annotated data flows in via RabbitMQ, is augmented 8× for training diversity, trains YOLOv11 models over multi-day GPU runs, and distributes encrypted models to inference clients that run real-time video detection.
## Architecture Overview
**Tech stack**: Python 3.10+ · PyTorch 2.3.0 (CUDA 12.1) · Ultralytics YOLOv11m · TensorRT · ONNX Runtime · Albumentations · boto3 · rstream · cryptography
**Deployment**: 5 independent processes (no orchestration, no containers) running on GPU-equipped servers. Manual deployment.
## Component Summary
| # | Component | Modules | Purpose | Key Dependencies |
|---|-----------|---------|---------|-----------------|
| 01 | Core Infrastructure | constants, utils | Shared paths, config keys, Dotdict helper | None |
| 02 | Security & Hardware | security, hardware_service | AES-256-CBC encryption, hardware fingerprinting | cryptography, pynvml |
| 03 | API & CDN Client | api_client, cdn_manager | REST API (JWT auth) + S3 CDN communication | requests, boto3, Security |
| 04 | Data Models | dto/annotationClass, dto/imageLabel | Annotation class definitions, image+label container | OpenCV, matplotlib |
| 05 | Data Pipeline | augmentation, convert-annotations, dataset-visualiser | 8× augmentation, format conversion, visualization | Albumentations, Data Models |
| 06 | Training Pipeline | train, exports, manual_run | Dataset formation → YOLO training → export → encrypted upload | Ultralytics, API & CDN, Security |
| 07 | Inference Engine | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Model download, decryption, TensorRT/ONNX video inference | TensorRT, ONNX Runtime, PyCUDA |
| 08 | Annotation Queue | annotation_queue_dto, annotation_queue_handler | Async RabbitMQ Streams consumer for annotation CRUD events | rstream, msgpack |
## System Flows
| # | Flow | Entry Point | Path | Output |
|---|------|-------------|------|--------|
| 1 | Annotation Ingestion | RabbitMQ message | Queue → Handler → Filesystem | Images + labels on disk |
| 2 | Data Augmentation | Filesystem scan (5-min loop) | /data/ → Augmentator → /data-processed/ | 8× augmented images + labels |
| 3 | Training Pipeline | train.py __main__ | /data-processed/ → Dataset split → YOLO train → Export → Encrypt → Upload | Encrypted model on API + CDN |
| 4 | Model Download & Inference | start_inference.py __main__ | API + CDN download → Decrypt → TensorRT init → Video frames → Detections | Annotated video output |
| 5 | Model Export (Multi-Format) | train.py / manual_run.py | .pt → .onnx / .engine / .rknn | Multi-format model artifacts |
## Risk Observations
### Code Bugs (from Verification)
| # | Location | Issue | Impact |
|---|----------|-------|--------|
| 1 | augmentation.py:118 | `self.total_to_process` undefined (should be `self.total_images_to_process`) | AttributeError during progress logging |
| 2 | train.py:93,99 | `copied` counter never incremented | Incorrect progress reporting (cosmetic) |
| 3 | exports.py:97 | Stale `ApiClient(ApiCredentials(...))` constructor call with wrong params | `upload_model` function would fail at runtime |
| 4 | inference/tensorrt_engine.py:43-44 | `batch_size` uninitialized for dynamic batch dimensions | AttributeError for models with dynamic batch size |
| 5 | dataset-visualiser.py:6 | Imports from `preprocessing` module that doesn't exist | Script cannot run |
### Security Issues
| Issue | Severity | Location |
|-------|----------|----------|
| Hardcoded API credentials | High | config.yaml |
| Hardcoded CDN access keys (4 keys) | High | cdn.yaml |
| Hardcoded model encryption key | High | security.py:67 |
| Queue credentials in plaintext | Medium | config.yaml, annotation-queue/config.yaml |
| No TLS certificate validation | Low | api_client.py |
### Structural Concerns
- No CI/CD pipeline or containerization
- No formal test framework (2 script-based tests, 1 broken)
- Duplicated AnnotationClass/WeatherMode code in 3 locations
- No graceful shutdown for augmentation process
- No reconnect logic for annotation queue consumer
- Manual deployment only
## Open Questions
- The `preprocessing` module may have existed previously and been deleted or renamed — its absence breaks `dataset-visualiser.py` and `tests/imagelabel_visualize_test.py`
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in `train.py`
- The `orangepi5/` shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment
- `checkpoint.txt` (2024-06-27) suggests training infrastructure was last used in mid-2024
## Artifact Index
| Path | Description | Step |
|------|-------------|------|
| `_docs/00_problem/problem.md` | Problem statement | 6 |
| `_docs/00_problem/restrictions.md` | Hardware, software, environment, operational restrictions | 6 |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria from code | 6 |
| `_docs/00_problem/input_data/data_parameters.md` | Input data schemas and formats | 6 |
| `_docs/00_problem/security_approach.md` | Security mechanisms and known issues | 6 |
| `_docs/01_solution/solution.md` | Retrospective solution document | 5 |
| `_docs/02_document/00_discovery.md` | Codebase discovery: tree, tech stack, dependency graph | 0 |
| `_docs/02_document/modules/*.md` | 21 module-level documentation files | 1 |
| `_docs/02_document/components/0N_*/description.md` | 8 component specifications | 2 |
| `_docs/02_document/diagrams/components.md` | Component relationship diagram (Mermaid) | 2 |
| `_docs/02_document/architecture.md` | System architecture document | 3 |
| `_docs/02_document/system-flows.md` | 5 system flow diagrams with sequence diagrams | 3 |
| `_docs/02_document/data_model.md` | Data model with ER diagram | 3 |
| `_docs/02_document/diagrams/flows/flow_*.md` | Individual flow diagrams (4 files) | 3 |
| `_docs/02_document/04_verification_log.md` | Verification results: 87 entities, 5 bugs, 3 security issues | 4 |
| `_docs/02_document/FINAL_report.md` | This report | 7 |
| `_docs/02_document/state.json` | Document skill progress tracking | — |
+175
View File
@@ -0,0 +1,175 @@
# Architecture
## System Context
Azaion AI Training is a Python-based ML pipeline for training, exporting, and deploying YOLOv11 object detection models. The system operates within the Azaion platform ecosystem, consuming annotated image data and producing encrypted inference-ready models.
### Boundaries
| Boundary | Interface | Protocol |
|----------|-----------|----------|
| Azaion REST API | ApiClient | HTTPS (JWT auth) |
| S3-compatible CDN | CDNManager (boto3) | HTTPS (S3 API) |
| RabbitMQ Streams | rstream Consumer | AMQP 1.0 |
| Local filesystem | Direct I/O | POSIX paths at `/azaion/` |
| NVIDIA GPU | PyTorch, TensorRT, ONNX RT, PyCUDA | CUDA 12.1 |
### System Context Diagram
```mermaid
graph LR
subgraph "Azaion Platform"
API[Azaion REST API]
CDN[S3-compatible CDN]
Queue[RabbitMQ Streams]
end
subgraph "AI Training System"
AQ[Annotation Queue Consumer]
AUG[Augmentation Pipeline]
TRAIN[Training Pipeline]
INF[Inference Engine]
end
subgraph "Storage"
FS["/azaion/ filesystem"]
end
subgraph "Hardware"
GPU[NVIDIA GPU]
end
Queue -->|annotation events| AQ
AQ -->|images + labels| FS
FS -->|raw annotations| AUG
AUG -->|augmented data| FS
FS -->|processed dataset| TRAIN
TRAIN -->|trained model| GPU
TRAIN -->|encrypted model| API
TRAIN -->|encrypted model big part| CDN
API -->|encrypted model small part| INF
CDN -->|encrypted model big part| INF
INF -->|inference| GPU
```
## Tech Stack
| Layer | Technology | Version/Detail |
|-------|-----------|---------------|
| Language | Python | 3.10+ (match statements) |
| ML Framework | Ultralytics YOLO | YOLOv11 medium |
| Deep Learning | PyTorch | 2.3.0 (CUDA 12.1) |
| GPU Inference | TensorRT | FP16/INT8, async CUDA streams |
| GPU Inference (alt) | ONNX Runtime GPU | CUDAExecutionProvider |
| Edge Inference | RKNN | RK3588 (OrangePi5) |
| Augmentation | Albumentations | Geometric + color transforms |
| Computer Vision | OpenCV | Image I/O, preprocessing, display |
| Object Storage | boto3 | S3-compatible CDN |
| Message Queue | rstream | RabbitMQ Streams consumer |
| Serialization | msgpack | Queue message format |
| Encryption | cryptography | AES-256-CBC |
| HTTP Client | requests | REST API communication |
| Configuration | PyYAML | YAML config files |
| Visualization | matplotlib, netron | Annotation display, model graphs |
## Deployment Model
The system runs as multiple independent processes on machines with NVIDIA GPUs:
| Process | Entry Point | Runtime | Typical Host |
|---------|------------|---------|-------------|
| Training | `train.py` | Long-running (days) | GPU server (RTX 4090, 24GB VRAM) |
| Augmentation | `augmentation.py` | Continuous loop (infinite) | Same GPU server or CPU-only |
| Annotation Queue | `annotation-queue/annotation_queue_handler.py` | Continuous (async) | Any server with network access |
| Inference | `start_inference.py` | On-demand | GPU-equipped machine |
| Data Tools | `convert-annotations.py`, `dataset-visualiser.py` | Ad-hoc | Developer machine |
No containerization (Dockerfile), CI/CD pipeline, or orchestration infrastructure was found in the codebase. Deployment appears to be manual.
## Data Model Overview
### Annotation Data Flow
```
Raw annotations (Queue) → /azaion/data-seed/ (unvalidated)
→ /azaion/data/ (validated)
→ /azaion/data-processed/ (augmented, 8×)
→ /azaion/datasets/azaion-{date}/ (train/valid/test split)
→ /azaion/data-corrupted/ (invalid labels)
→ /azaion/data_deleted/ (soft-deleted)
```
### Annotation Class System
- 17 base classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier)
- 3 weather modes: Norm (offset 0), Wint (offset 20), Night (offset 40)
- Total class slots: 80 (17 × 3 = 51 used, 29 reserved)
- Format: YOLO (center_x, center_y, width, height — all normalized 0–1)
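The weather-offset scheme above can be sketched as follows. This is an illustrative reconstruction, not code from the repository; `effective_class_id` is a hypothetical helper name:

```python
from enum import Enum

class WeatherMode(Enum):
    # Offsets match the documented scheme: Norm=0, Wint=20, Night=40
    Norm = 0
    Wint = 20
    Night = 40

def effective_class_id(base_id: int, mode: WeatherMode) -> int:
    """Map a base class ID (0-16) to its weather-specific class slot."""
    if not 0 <= base_id <= 16:
        raise ValueError(f"base class id out of range: {base_id}")
    return base_id + mode.value
```

With 17 base classes this yields IDs 0–16, 20–36, and 40–56, leaving the remaining slots of the 80-class space reserved.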
### Model Artifacts
| Format | Use | Export Details |
|--------|-----|---------------|
| `.pt` | Training checkpoint | YOLOv11 PyTorch weights |
| `.onnx` | Cross-platform inference | 1280px, batch=4, NMS baked in |
| `.engine` | GPU inference (production) | TensorRT FP16, batch=4, per-GPU architecture |
| `.rknn` | Edge inference | RK3588 target (OrangePi5) |
## Integration Points
### Azaion REST API
- `POST /login` → JWT token
- `POST /resources/{folder}` → file upload (Bearer auth)
- `POST /resources/get/{folder}` → encrypted file download (hardware-bound key)
### S3-compatible CDN
- Upload: model big parts (`upload_fileobj`)
- Download: model big parts (`download_file`)
- Separate read/write access keys
### RabbitMQ Streams
- Queue: `azaion-annotations`
- Protocol: AMQP with rstream library
- Message format: msgpack with positional integer keys
- Offset tracking: persisted to `offset.yaml`
## Non-Functional Requirements (Observed)
| Category | Observation | Source |
|----------|------------|--------|
| Training duration | ~11.5 days for 360K annotations on 1× RTX 4090 | Code comment in train.py |
| VRAM usage | batch=11 → ~22GB (batch=12 fails at 24.2GB) | Code comment in train.py |
| Inference speed | TensorRT: 54s for 200s video (3.7GB VRAM) | Code comment in start_inference.py |
| ONNX inference | 81s for 200s video (6.3GB VRAM) | Code comment in start_inference.py |
| Augmentation ratio | 8× (1 original + 7 augmented per image) | augmentation.py |
| Frame sampling | Every 4th frame during inference | inference/inference.py |
## Security Architecture
| Mechanism | Implementation | Location |
|-----------|---------------|----------|
| API authentication | JWT token (email/password login) | api_client.py |
| Resource encryption | AES-256-CBC (hardware-bound key) | security.py |
| Model encryption | AES-256-CBC (static key) | security.py |
| Split model storage | Small part on API, big part on CDN | api_client.py |
| Hardware fingerprinting | CPU+GPU+RAM+drive serial hash | hardware_service.py |
| CDN access control | Separate read/write S3 credentials | cdn_manager.py |
### Security Concerns
- Hardcoded credentials in `config.yaml` and `cdn.yaml`
- Hardcoded model encryption key in `security.py`
- No TLS certificate validation visible in code
- No input validation on API responses
- Queue credentials in plaintext config files
## Key Architectural Decisions
| Decision | Rationale (inferred) |
|----------|---------------------|
| YOLOv11 medium at 1280px | Balance between detection quality and training time |
| Split model storage | Prevent model theft from single storage compromise |
| Hardware-bound API encryption | Tie resource access to authorized machines |
| TensorRT for production inference | ~33% faster than ONNX, ~42% less VRAM |
| Augmentation as separate process | Decouples data prep from training; runs continuously |
| Annotation queue as separate service | Independent lifecycle; different dependency set |
| RKNN export for OrangePi5 | Edge deployment on low-power ARM SoC |
@@ -0,0 +1,53 @@
# Component: Core Infrastructure
## Overview
Shared constants and utility classes that form the foundation for all other components. Provides path definitions, config file references, and helper data structures.
**Pattern**: Configuration constants + utility library
**Upstream**: None (leaf component)
**Downstream**: All other components
## Modules
- `constants` — filesystem paths, config keys, thresholds
- `utils` — Dotdict helper class
## Internal Interfaces
### constants (public symbols)
All path/string constants — see module doc for full list. Key exports:
- Directory paths: `data_dir`, `processed_dir`, `datasets_dir`, `models_dir` and their images/labels subdirectories
- Config references: `CONFIG_FILE`, `CDN_CONFIG`, `OFFSET_FILE`
- Model paths: `CURRENT_PT_MODEL`, `CURRENT_ONNX_MODEL`
- Thresholds: `SMALL_SIZE_KB = 3`
### utils.Dotdict
```python
class Dotdict(dict):
    # Enables config.url instead of config["url"]
    __getattr__ = dict.get  # returns None for missing keys (see Caveats)
```
## Data Access Patterns
None — pure constants, no I/O.
## Implementation Details
- All paths rooted at `/azaion/` — assumes a fixed deployment directory structure
- No environment-variable override for any path — paths are entirely static
## Caveats
- Hardcoded root `/azaion/` makes local development without that directory structure impossible
- No `.env` or environment-based configuration override mechanism
- `Dotdict.__getattr__` uses `dict.get` which returns `None` for missing keys instead of raising `AttributeError`
## Dependency Graph
```mermaid
graph TD
constants --> api_client_comp[API & CDN]
constants --> training_comp[Training]
constants --> data_pipeline_comp[Data Pipeline]
constants --> inference_comp[Inference]
utils --> training_comp
utils --> inference_comp
```
## Logging Strategy
None.
@@ -0,0 +1,59 @@
# Component: Security & Hardware Identity
## Overview
Provides cryptographic operations (AES-256-CBC encryption/decryption) and hardware fingerprinting. Used for protecting model files in transit and at rest, and for binding API encryption keys to specific machines.
**Pattern**: Utility/service library (static methods)
**Upstream**: None (leaf component)
**Downstream**: API & CDN, Training, Inference
## Modules
- `security` — AES encryption, key derivation (SHA-384), hardcoded model key
- `hardware_service` — cross-platform hardware info collection (CPU, GPU, RAM, drive serial)
## Internal Interfaces
### Security (static methods)
```python
Security.encrypt_to(input_bytes: bytes, key: str) -> bytes
Security.decrypt_to(ciphertext_with_iv: bytes, key: str) -> bytes
Security.calc_hash(key: str) -> str
Security.get_hw_hash(hardware: str) -> str
Security.get_api_encryption_key(creds, hardware_hash: str) -> str
Security.get_model_encryption_key() -> str
```
### hardware_service
```python
get_hardware_info() -> str
```
## Data Access Patterns
- `hardware_service` executes shell commands to query OS/hardware info
- `security` performs in-memory cryptographic operations only
## Implementation Details
- **Encryption**: AES-256-CBC. Key = SHA-256(key_string). IV = 16 random bytes prepended to ciphertext. PKCS7 padding.
- **Key derivation hierarchy**:
1. `get_model_encryption_key()` → hardcoded secret → SHA-384 → base64
2. `get_hw_hash(hardware_string)` → salted hardware string → SHA-384 → base64
3. `get_api_encryption_key(creds, hw_hash)` → email+password+hw_hash+salt → SHA-384 → base64
- **Hardware fingerprint format**: `CPU: {cpu}. GPU: {gpu}. Memory: {memory}. DriveSerial: {serial}`
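The derivation chain above can be sketched with stdlib hashing. The exact salts and concatenation order are not public, so `api_encryption_key` below is a hypothetical composition that only illustrates the SHA-384 → base64 final step:

```python
import base64
import hashlib

def derive_key(material: str) -> str:
    # SHA-384 digest, base64-encoded: the final step the hierarchy describes
    digest = hashlib.sha384(material.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

def api_encryption_key(email: str, password: str, hw_hash: str, salt: str) -> str:
    # Hypothetical composition; the real concatenation order may differ
    return derive_key(email + password + hw_hash + salt)
```

The same `derive_key` step applies to the model key (hardcoded secret as input) and the hardware hash (salted hardware string as input).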
## Caveats
- **Hardcoded model encryption key** in `get_model_encryption_key()` — anyone with source code access can derive the key
- **Shell command injection risk**: `hardware_service` runs subprocess calls with `shell=True` — safe because no user input reaches them, but fragile
- **PKCS7 unpadding** in `decrypt_to` is a manual check rather than the `cryptography` library's unpadder — a potential padding oracle if padding errors are observable to an attacker
- `BUFFER_SIZE` constant declared but unused in security.py
## Dependency Graph
```mermaid
graph TD
hardware_service --> api_client[API & CDN: api_client]
security --> api_client
security --> training[Training]
security --> inference[Inference: start_inference]
```
## Logging Strategy
None — operations are silent except for exceptions.
@@ -0,0 +1,90 @@
# Component: API & CDN Client
## Overview
Communication layer for the Azaion backend API and S3-compatible CDN. Handles authentication, encrypted file transfer, and the split-resource pattern for secure model distribution.
**Pattern**: Client library with split-storage resource management
**Upstream**: Core (constants), Security (encryption, hardware identity)
**Downstream**: Training, Inference, Exports
## Modules
- `api_client` — REST client for Azaion API, JWT auth, encrypted resource download/upload, split big/small pattern
- `cdn_manager` — boto3 S3 client with separate read/write credentials
## Internal Interfaces
### CDNCredentials
```python
CDNCredentials(host, downloader_access_key, downloader_access_secret, uploader_access_key, uploader_access_secret)
```
### CDNManager
```python
CDNManager(credentials: CDNCredentials)
CDNManager.upload(bucket: str, filename: str, file_bytes: bytearray) -> bool
CDNManager.download(bucket: str, filename: str) -> bool
```
### ApiCredentials
```python
ApiCredentials(url, email, password)
```
### ApiClient
```python
ApiClient()
ApiClient.login() -> None
ApiClient.upload_file(filename: str, file_bytes: bytearray, folder: str) -> None
ApiClient.load_bytes(filename: str, folder: str) -> bytes
ApiClient.load_big_small_resource(resource_name: str, folder: str, key: str) -> bytes
ApiClient.upload_big_small_resource(resource: bytes, resource_name: str, folder: str, key: str) -> None
```
## External API Specification
### Azaion REST API (consumed)
| Endpoint | Method | Auth | Description |
|----------|--------|------|-------------|
| `/login` | POST | None (returns JWT) | `{"email": ..., "password": ...}` → `{"token": ...}` |
| `/resources/{folder}` | POST | Bearer JWT | Multipart file upload |
| `/resources/get/{folder}` | POST | Bearer JWT | Download encrypted resource (sends hardware info in body) |
### S3-compatible CDN
| Operation | Description |
|-----------|-------------|
| `upload_fileobj` | Upload bytes to S3 bucket |
| `download_file` | Download file from S3 bucket to disk |
## Data Access Patterns
- API Client reads `config.yaml` on init for API credentials
- CDN credentials loaded by API Client from encrypted `cdn.yaml` (downloaded from API)
- Split resources: big part stored locally + CDN, small part on API server
## Implementation Details
- **JWT auto-refresh**: On 401/403 response, automatically re-authenticates and retries
- **Split-resource pattern**: Encrypts data → splits at a boundary of ~20% of the payload (minimum SMALL_SIZE_KB * 1024 bytes) → small part to API, big part to CDN. Neither part alone can reconstruct the original.
- **CDN credential isolation**: Separate S3 access keys for upload vs download (least-privilege)
- **CDN self-bootstrap**: `cdn.yaml` credentials are themselves encrypted and downloaded from the API during ApiClient init
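A minimal sketch of the split step, assuming the boundary is ~20% of the payload with a floor of `SMALL_SIZE_KB * 1024` bytes (the exact rule in the codebase may differ; `split_resource`/`join_resource` are hypothetical names):

```python
SMALL_SIZE_KB = 3  # documented threshold from constants

def split_resource(encrypted: bytes) -> tuple[bytes, bytes]:
    """Split an encrypted payload into a small part (API) and a big part (CDN).

    Assumption: the split point is ~20% of the payload, but at least
    SMALL_SIZE_KB * 1024 bytes.
    """
    boundary = max(len(encrypted) // 5, SMALL_SIZE_KB * 1024)
    return encrypted[:boundary], encrypted[boundary:]

def join_resource(small: bytes, big: bytes) -> bytes:
    # Reconstruction requires both parts; either alone is an opaque fragment
    return small + big
```

Because the payload is encrypted before splitting, a compromise of either the API server or the CDN alone yields nothing usable.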
## Caveats
- Credentials hardcoded in `config.yaml` and `cdn.yaml` — not using environment variables or secrets manager
- `cdn_manager.download()` saves to current working directory with the same filename
- No retry logic beyond JWT refresh (no exponential backoff, no connection retry)
- `CDNManager` imports `sys`, `yaml`, `os` but doesn't use them
## Dependency Graph
```mermaid
graph TD
constants --> api_client
security --> api_client
hardware_service --> api_client
cdn_manager --> api_client
api_client --> exports
api_client --> train
api_client --> start_inference
cdn_manager --> exports
cdn_manager --> train
```
## Logging Strategy
Print statements for upload/download confirmations and errors. No structured logging.
@@ -0,0 +1,61 @@
# Component: Data Models
## Overview
Shared data transfer objects for the training pipeline: annotation class definitions (with weather modes) and image+label containers for visualization and augmentation.
**Pattern**: Plain data classes / value objects
**Upstream**: None (leaf)
**Downstream**: Data Pipeline (augmentation, dataset-visualiser), Training (YAML generation)
## Modules
- `dto/annotationClass` — AnnotationClass, WeatherMode enum, classes.json reader
- `dto/imageLabel` — ImageLabel container with bbox visualization
## Internal Interfaces
### WeatherMode (Enum)
| Member | Value | Description |
|--------|-------|-------------|
| Norm | 0 | Normal weather |
| Wint | 20 | Winter |
| Night | 40 | Night |
### AnnotationClass
```python
AnnotationClass(id: int, name: str, color: str)
AnnotationClass.read_json() -> dict[int, AnnotationClass] # static
AnnotationClass.color_tuple -> tuple # property, RGB ints
```
### ImageLabel
```python
ImageLabel(image_path: str, image: np.ndarray, labels_path: str, labels: list)
ImageLabel.visualize(annotation_classes: dict) -> None
```
## Data Access Patterns
- `AnnotationClass.read_json()` reads `classes.json` from project root (relative to `dto/` parent)
- `ImageLabel.visualize()` renders to matplotlib window (no disk I/O)
## Implementation Details
- 17 base annotation classes × 3 weather modes = 51 classes with offset IDs (0–16, 20–36, 40–56)
- System reserves 80 class slots (DEFAULT_CLASS_NUM in train.py)
- YOLO label format: [x_center, y_center, width, height, class_id] — all normalized 0–1
- `color_tuple` parsing strips first 3 chars (assumes "#ff" prefix format) — fragile if color format changes
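The parsing approach can be sketched as a standalone function mirroring the documented behaviour (the real code is a property on `AnnotationClass`; this helper is illustrative):

```python
def color_tuple(color: str) -> tuple[int, int, int]:
    """Parse an '#AARRGGBB'-style hex string into an (R, G, B) int tuple.

    Mirrors the documented quirk: the first 3 characters ('#' plus the
    first hex digit pair, assumed to be an 'ff' alpha byte) are stripped
    before parsing. Fragile for '#RRGGBB' or other colour formats.
    """
    rgb_hex = color[3:]  # drops '#ff'
    return tuple(int(rgb_hex[i:i + 2], 16) for i in range(0, 6, 2))
```

For a `"#RRGGBB"` input this would silently drop the first red digit, which is exactly the fragility the caveat below calls out.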
## Caveats
- `AnnotationClass` duplicated in 3 locations (dto, inference/dto, annotation-queue/annotation_queue_dto) with slight differences
- `color_tuple` property has a non-obvious parsing approach that may break on different color string formats
- Empty files: `dto/annotation_bulk_message.py` and `dto/annotation_message.py` suggest planned but unimplemented DTOs
## Dependency Graph
```mermaid
graph TD
dto_annotationClass[dto/annotationClass] --> train
dto_annotationClass --> dataset-visualiser
dto_imageLabel[dto/imageLabel] --> augmentation
dto_imageLabel --> dataset-visualiser
```
## Logging Strategy
None.
@@ -0,0 +1,74 @@
# Component: Data Pipeline
## Overview
Tools for preparing and managing annotation data: augmentation of training images, format conversion from external annotation systems, and visual inspection of annotated datasets.
**Pattern**: Batch processing tools (standalone scripts + library)
**Upstream**: Core (constants), Data Models (ImageLabel, AnnotationClass)
**Downstream**: Training (augmented images feed into dataset formation)
## Modules
- `augmentation` — image augmentation pipeline (albumentations)
- `convert-annotations` — Pascal VOC / oriented bbox → YOLO format converter
- `dataset-visualiser` — interactive annotation visualization tool
## Internal Interfaces
### Augmentator
```python
Augmentator()
Augmentator.augment_annotations(from_scratch: bool = False) -> None
Augmentator.augment_inner(img_ann: ImageLabel) -> list[ImageLabel]
Augmentator.correct_bboxes(labels) -> list
Augmentator.read_labels(labels_path) -> list[list]
```
### convert-annotations (functions)
```python
convert(folder, dest_folder, read_annotations, ann_format) -> None
minmax2yolo(width, height, xmin, xmax, ymin, ymax) -> tuple
read_pascal_voc(width, height, s: str) -> list[str]
read_bbox_oriented(width, height, s: str) -> list[str]
```
### dataset-visualiser (functions)
```python
visualise_dataset() -> None
visualise_processed_folder() -> None
```
## Data Access Patterns
- **Augmentation**: Reads from `/azaion/data/images/` + `/azaion/data/labels/`, writes to `/azaion/data-processed/images/` + `/azaion/data-processed/labels/`
- **Conversion**: Reads from user-specified source folder, writes to destination folder
- **Visualiser**: Reads from datasets or processed folder, renders to matplotlib window
## Implementation Details
- **Augmentation pipeline**: Per image → 1 original copy + 7 augmented variants (8× data expansion)
- HorizontalFlip (60%), BrightnessContrast (40%), Affine (80%), MotionBlur (10%), HueSaturation (40%)
- Bbox correction clips outside-boundary boxes, removes boxes < 1% of image
- Incremental: skips already-processed images
- Continuous mode: infinite loop with 5-minute sleep between rounds
- Concurrent: ThreadPoolExecutor for parallel image processing
- **Format conversion**: Pluggable reader pattern — `convert()` accepts any reader function that maps (width, height, text) → YOLO lines
- **Visualiser**: Interactive (waits for keypress) — developer debugging tool
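The bbox correction step can be sketched in pure Python, assuming YOLO-style rows of `[x_center, y_center, width, height, class_id]` in normalized 0–1 coordinates (a sketch of the documented behaviour, not the repository's `correct_bboxes`):

```python
def correct_bboxes(labels: list[list[float]], min_area: float = 0.01) -> list[list[float]]:
    """Clip boxes to the image and drop those under 1% of the image area."""
    corrected = []
    for x_c, y_c, w, h, cls in labels:
        # Convert centre/size to corners, clip to the 0-1 image, convert back
        x1 = max(0.0, x_c - w / 2)
        y1 = max(0.0, y_c - h / 2)
        x2 = min(1.0, x_c + w / 2)
        y2 = min(1.0, y_c + h / 2)
        nw, nh = x2 - x1, y2 - y1
        if nw <= 0 or nh <= 0 or nw * nh < min_area:
            continue  # fully outside the image, or below the 1% area threshold
        corrected.append([(x1 + x2) / 2, (y1 + y2) / 2, nw, nh, cls])
    return corrected
```

Clipping before the area check means a box straddling the border is kept only if its visible portion is still large enough.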
## Caveats
- `dataset-visualiser` imports from `preprocessing` module which does not exist — broken import
- `dataset-visualiser` has hardcoded dataset date (`2024-06-18`) and start index (35247)
- `convert-annotations` hardcodes class mappings (Truck=1, Car/Taxi=2) — not configurable
- Augmentation parameters are hardcoded, not configurable via config file
- Augmentation's `total_to_process` attribute is referenced in `augment_annotation` but never set (the code sets `total_images_to_process`)
## Dependency Graph
```mermaid
graph TD
constants --> augmentation
dto_imageLabel[dto/imageLabel] --> augmentation
constants --> dataset-visualiser
dto_annotationClass[dto/annotationClass] --> dataset-visualiser
dto_imageLabel --> dataset-visualiser
augmentation --> manual_run
```
## Logging Strategy
Print statements for progress tracking (processed count, errors). No structured logging.
@@ -0,0 +1,87 @@
# Component: Training Pipeline
## Overview
End-to-end YOLOv11 object detection training workflow: dataset formation from augmented annotations, model training, multi-format export (ONNX, TensorRT, RKNN), and encrypted model upload.
**Pattern**: Pipeline / orchestrator
**Upstream**: Core, Security, API & CDN, Data Models, Data Pipeline (augmented images)
**Downstream**: None (produces trained models consumed externally)
## Modules
- `train` — main pipeline: dataset formation → YOLO training → export → upload
- `exports` — model format conversion (ONNX, TensorRT, RKNN) + upload utilities
- `manual_run` — ad-hoc developer script for selective pipeline steps
## Internal Interfaces
### train
```python
form_dataset() -> None
copy_annotations(images, folder: str) -> None
check_label(label_path: str) -> bool
create_yaml() -> None
resume_training(last_pt_path: str) -> None
train_dataset() -> None
export_current_model() -> None
```
### exports
```python
export_rknn(model_path: str) -> None
export_onnx(model_path: str, batch_size: int = 4) -> None
export_tensorrt(model_path: str) -> None
form_data_sample(destination_path: str, size: int = 500, write_txt_log: bool = False) -> None
show_model(model: str = None) -> None
upload_model(model_path: str, filename: str, size_small_in_kb: int = 3) -> None
```
## Data Access Patterns
- **Input**: Reads augmented images from `/azaion/data-processed/images/` + labels
- **Dataset output**: Creates dated dataset at `/azaion/datasets/azaion-{YYYY-MM-DD}/` with train/valid/test splits
- **Model output**: Saves trained models to `/azaion/models/azaion-{YYYY-MM-DD}/`, copies best.pt to `/azaion/models/azaion.pt`
- **Upload**: Encrypted model uploaded as split big/small to CDN + API
- **Corrupted data**: Invalid labels moved to `/azaion/data-corrupted/`
## Implementation Details
- **Dataset split**: 70% train / 20% valid / 10% test (random shuffle)
- **Label validation**: `check_label()` verifies all YOLO coordinates are ≤ 1.0
- **YAML generation**: Writes `data.yaml` with 80 class names (17 actual from classes.json × 3 weather modes, rest as placeholders)
- **Training config**: YOLOv11 medium (`yolo11m.yaml`), epochs=120, batch=11 (tuned for 24GB VRAM), imgsz=1280, save_period=1, workers=24
- **Post-training**: Removes intermediate epoch checkpoints, keeps only `best.pt`
- **Export chain**: `.pt` → ONNX (1280px, batch=4, NMS) → encrypted → split → upload
- **TensorRT export**: batch=4, FP16, NMS, simplify
- **RKNN export**: targets RK3588 SoC (OrangePi5)
- **Concurrent file copying**: ThreadPoolExecutor for parallel image/label copying during dataset formation
- **`__main__`** in `train.py`: `train_dataset()` → `export_current_model()`
## Caveats
- Training hyperparameters are hardcoded (not configurable via config file)
- `old_images_percentage = 75` declared but unused
- `train.py` imports `subprocess`, `sleep` but doesn't use them
- `train.py` imports `OnnxEngine` but doesn't use it
- `exports.upload_model()` creates `ApiClient` with different constructor signature than the one in `api_client.py` — likely stale code
- `copy_annotations` uses a global `total_files_copied` counter with a local `copied` variable that stays at 0 — reporting bug
- `resume_training` references `yaml` (the module) instead of a YAML file path in the `data` parameter
## Dependency Graph
```mermaid
graph TD
constants --> train
constants --> exports
api_client --> train
api_client --> exports
cdn_manager --> train
cdn_manager --> exports
security --> train
security --> exports
utils --> train
utils --> exports
dto_annotationClass[dto/annotationClass] --> train
inference_onnx[inference/onnx_engine] --> train
exports --> train
train --> manual_run
augmentation --> manual_run
```
## Logging Strategy
Print statements for progress (file count, shuffling status, training results). No structured logging.
@@ -0,0 +1,85 @@
# Component: Inference Engine
## Overview
Real-time object detection inference subsystem supporting ONNX Runtime and TensorRT backends. Processes video streams with batched inference, custom NMS, and live visualization.
**Pattern**: Strategy pattern (InferenceEngine ABC) + pipeline orchestrator
**Upstream**: Core, Security, API & CDN (for model download)
**Downstream**: None (end-user facing — processes video input)
## Modules
- `inference/dto` — Detection, Annotation, AnnotationClass data classes
- `inference/onnx_engine` — InferenceEngine ABC + OnnxEngine implementation
- `inference/tensorrt_engine` — TensorRTEngine implementation with CUDA memory management + ONNX converter
- `inference/inference` — Video processing pipeline (preprocess → infer → postprocess → draw)
- `start_inference` — Entry point: downloads model, initializes engine, runs on video
## Internal Interfaces
### InferenceEngine (ABC)
```python
InferenceEngine.__init__(model_path: str, batch_size: int = 1, **kwargs)
InferenceEngine.get_input_shape() -> Tuple[int, int]
InferenceEngine.get_batch_size() -> int
InferenceEngine.run(input_data: np.ndarray) -> List[np.ndarray]
```
### OnnxEngine (extends InferenceEngine)
Constructor takes `model_bytes` (not path). Uses CUDAExecutionProvider + CPUExecutionProvider.
### TensorRTEngine (extends InferenceEngine)
Constructor takes `model_bytes: bytes`. Additional static methods:
```python
TensorRTEngine.get_gpu_memory_bytes(device_id=0) -> int
TensorRTEngine.get_engine_filename(device_id=0) -> str | None
TensorRTEngine.convert_from_onnx(onnx_model: bytes) -> bytes | None
```
### Inference
```python
Inference(engine: InferenceEngine, confidence_threshold, iou_threshold)
Inference.preprocess(frames: list) -> np.ndarray
Inference.postprocess(batch_frames, batch_timestamps, output) -> list[Annotation]
Inference.process(video: str) -> None
Inference.draw(annotation: Annotation) -> None
Inference.remove_overlapping_detections(detections) -> list[Detection]
```
## Data Access Patterns
- Model bytes loaded by caller (start_inference via ApiClient.load_big_small_resource)
- Video input via cv2.VideoCapture (file path)
- No disk writes during inference
## Implementation Details
- **Video processing**: Every 4th frame processed (25% frame sampling), batched to engine batch size
- **Preprocessing**: cv2.dnn.blobFromImage (1/255 scale, model input size, BGR→RGB)
- **Postprocessing**: Raw detections filtered by confidence, coordinates normalized to [0,1], custom NMS applied
- **Custom NMS**: Pairwise IoU comparison. Keeps higher confidence; ties broken by lower class ID.
- **TensorRT**: Async CUDA execution (memcpy_htod_async → execute_async_v3 → synchronize → memcpy_dtoh)
- **TensorRT shapes**: Default 1280×1280 input, 300 max detections, 6 values per detection (x1,y1,x2,y2,conf,cls)
- **ONNX conversion**: TensorRT builder with 90% GPU memory workspace, FP16 if supported
- **Engine filename**: GPU-architecture-specific: `azaion.cc_{major}.{minor}_sm_{sm_count}.engine`
- **start_inference flow**: ApiClient → load encrypted TensorRT model (big/small split) → decrypt → TensorRTEngine → Inference.process()
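The custom NMS rule can be sketched in pure Python. The `(box, confidence, cls)` tuples here are hypothetical; the real code works on its `Detection` DTO:

```python
def iou(a: tuple, b: tuple) -> float:
    """IoU of two (x1, y1, x2, y2) boxes with normalized 0-1 coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def remove_overlapping(detections: list, iou_threshold: float = 0.5) -> list:
    """Pairwise NMS: keep the higher confidence; break ties by lower class ID."""
    dropped = set()
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            if i in dropped or j in dropped:
                continue
            (box_i, conf_i, cls_i) = detections[i]
            (box_j, conf_j, cls_j) = detections[j]
            if iou(box_i, box_j) < iou_threshold:
                continue
            # Negating cls makes the tuple comparison prefer the lower class ID on ties
            if (conf_i, -cls_i) >= (conf_j, -cls_j):
                dropped.add(j)
            else:
                dropped.add(i)
    return [d for k, d in enumerate(detections) if k not in dropped]
```

The O(n²) pairwise comparison is acceptable here because the engine caps output at 300 detections per frame.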
## Caveats
- `start_inference.get_engine_filename()` duplicates `TensorRTEngine.get_engine_filename()`
- Video path hardcoded in `start_inference` (`tests/ForAI_test.mp4`)
- `inference/dto` has its own AnnotationClass — duplicated from `dto/annotationClass`
- cv2.imshow display requires a GUI environment — won't work headless
- TensorRT `batch_size` attribute is used before assignment if the engine input shape has a dynamic batch dimension — potential AttributeError
## Dependency Graph
```mermaid
graph TD
inference_dto[inference/dto] --> inference_inference[inference/inference]
inference_onnx[inference/onnx_engine] --> inference_inference
inference_onnx --> inference_trt[inference/tensorrt_engine]
inference_trt --> start_inference
inference_inference --> start_inference
constants --> start_inference
api_client --> start_inference
security --> start_inference
```
## Logging Strategy
Print statements for metadata, download progress, timing. cv2.imshow for visual output.
@@ -0,0 +1,71 @@
# Component: Annotation Queue Service
## Overview
Self-contained async service that consumes annotation CRUD events from a RabbitMQ Streams queue and persists images + labels to the filesystem. Operates independently from the training pipeline.
**Pattern**: Message-driven event handler / consumer service
**Upstream**: External RabbitMQ Streams queue (Azaion platform)
**Downstream**: Data Pipeline (files written become input for augmentation)
## Modules
- `annotation-queue/annotation_queue_dto` — message DTOs (AnnotationMessage, AnnotationBulkMessage, AnnotationStatus, Detection, etc.)
- `annotation-queue/annotation_queue_handler` — async queue consumer with message routing and file management
## Internal Interfaces
### AnnotationQueueHandler
```python
AnnotationQueueHandler()
AnnotationQueueHandler.start() -> async
AnnotationQueueHandler.on_message(message: AMQPMessage, context: MessageContext) -> None
AnnotationQueueHandler.save_annotation(ann: AnnotationMessage) -> None
AnnotationQueueHandler.validate(msg: AnnotationBulkMessage) -> None
AnnotationQueueHandler.delete(msg: AnnotationBulkMessage) -> None
```
### Key DTOs
```python
AnnotationMessage(msgpack_bytes) # Full annotation with image + detections
AnnotationBulkMessage(msgpack_bytes) # Bulk validate/delete
AnnotationStatus: Created(10), Edited(20), Validated(30), Deleted(40)
RoleEnum: Operator(10), Validator(20), CompanionPC(30), Admin(40), ApiAdmin(1000)
```
## Data Access Patterns
- **Queue**: Consumes from RabbitMQ Streams queue `azaion-annotations` using rstream library
- **Offset persistence**: `offset.yaml` tracks last processed message offset for resume
- **Filesystem writes**:
- Validated annotations → `{root}/data/images/` + `{root}/data/labels/`
- Unvalidated (seed) → `{root}/data-seed/images/` + `{root}/data-seed/labels/`
- Deleted → `{root}/data_deleted/images/` + `{root}/data_deleted/labels/`
## Implementation Details
- **Message routing**: Based on `AnnotationStatus` from AMQP application properties:
- Created/Edited → save label + optionally image; validator role writes to data, operator to seed
- Validated (bulk) → move from seed to data
- Deleted (bulk) → move to deleted directory
- **Role-based logic**: `RoleEnum.is_validator()` returns True for Validator, Admin, ApiAdmin — these roles write directly to validated data directory
- **Serialization**: Messages are msgpack-encoded with positional integer keys. Detections are embedded as a JSON string within the msgpack payload.
- **Offset tracking**: After each successfully processed message, offset is persisted to `offset.yaml` (survives restarts)
- **Logging**: TimedRotatingFileHandler with daily rotation, 7-day retention, writes to `logs/` directory
- **Separate dependencies**: Own `requirements.txt` (pyyaml, msgpack, rstream only)
- **Own config.yaml**: Points to test directories by default (`data-test`, `data-test-seed`)
## Caveats
- Credentials hardcoded in `config.yaml` (queue host, user, password)
- AnnotationClass duplicated (third copy) with slight differences from dto/ version
- No reconnection logic for queue disconnections
- No dead-letter queue or message retry on processing failures
- `save_annotation` writes empty label files when detections list has no newline separators between entries
- The annotation-queue `config.yaml` uses different directory names (`data-test` vs `data`) than the main `config.yaml` — likely a test vs production configuration issue
## Dependency Graph
```mermaid
graph TD
annotation_queue_dto --> annotation_queue_handler
rstream_ext[rstream library] --> annotation_queue_handler
msgpack_ext[msgpack library] --> annotation_queue_dto
```
## Logging Strategy
`logging` module with TimedRotatingFileHandler. Format: `HH:MM:SS|message`. Daily rotation, 7-day retention. Also outputs to stdout.
+106
@@ -0,0 +1,106 @@
# Data Model
## Entity Overview
This system does not use a database. All data is stored as files on the filesystem and in-memory data structures. The primary entities are annotation images, labels, and ML models.
## Entities
### Annotation Image
- **Storage**: JPEG files on filesystem
- **Naming**: `{uuid}.jpg` (name assigned by Azaion platform)
- **Lifecycle**: Created → Seed/Validated → Augmented → Dataset → Model Training
### Annotation Label (YOLO format)
- **Storage**: Text files on filesystem
- **Naming**: `{uuid}.txt` (matches image name)
- **Format**: One line per detection: `{class_id} {center_x} {center_y} {width} {height}`
- **Coordinates**: All normalized to the 0–1 range relative to image dimensions
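A minimal sketch of reading and writing this format; the helpers are illustrative (the range check mirrors the pipeline's label-validation rule and is not necessarily part of the real parsers):

```python
def parse_label_line(line: str) -> tuple[int, float, float, float, float]:
    """Parse one YOLO label line: class_id followed by four normalized coords."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    for v in (cx, cy, w, h):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"coordinate out of 0-1 range: {v}")
    return class_id, cx, cy, w, h

def format_label_line(class_id: int, cx: float, cy: float, w: float, h: float) -> str:
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

One file holds one such line per detection, with the filename stem matching the image it annotates.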
### AnnotationClass
- **Storage**: `classes.json` (static file, 17 entries)
- **Fields**: Id (int), Name (str), ShortName (str), Color (hex str)
- **Weather expansion**: Each class × 3 weather modes → IDs offset by 0/20/40
- **Total slots**: 80 (51 used, 29 reserved as "Class-N" placeholders)
### Detection (inference)
- **In-memory only**: Created during inference postprocessing
- **Fields**: x, y, w, h (normalized), cls (int), confidence (float)
### Annotation (inference)
- **In-memory only**: Groups detections per video frame
- **Fields**: frame (image), time (ms), detections (list)
### AnnotationMessage (queue)
- **Wire format**: msgpack with positional integer keys
- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status
### ML Model
- **Formats**: .pt, .onnx, .engine, .rknn
- **Encryption**: AES-256-CBC before upload
- **Split storage**: .small part (API server) + .big part (CDN)
- **Naming**: `azaion.{ext}` for current model; `azaion.cc_{major}.{minor}_sm_{count}.engine` for GPU-specific TensorRT
## Filesystem Entity Relationships
```mermaid
erDiagram
ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
TRAINING_RUN ||--|| MODEL_PT : "produces"
MODEL_PT ||--|| MODEL_ONNX : "exported to"
MODEL_PT ||--|| MODEL_ENGINE : "exported to"
MODEL_PT ||--|| MODEL_RKNN : "exported to"
MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
```
## Directory Layout (Data Lifecycle)
```
/azaion/
├── data-seed/ ← Unvalidated annotations (from operators)
│ ├── images/
│ └── labels/
├── data/ ← Validated annotations (from validators/admins)
│ ├── images/
│ └── labels/
├── data-processed/ ← Augmented data (8× expansion)
│ ├── images/
│ └── labels/
├── data-corrupted/ ← Invalid labels (coords > 1.0)
│ ├── images/
│ └── labels/
├── data_deleted/ ← Soft-deleted annotations
│ ├── images/
│ └── labels/
├── data-sample/ ← Random sample for review
├── datasets/ ← Training datasets (dated)
│ └── azaion-{YYYY-MM-DD}/
│ ├── train/images/ + labels/
│ ├── valid/images/ + labels/
│ ├── test/images/ + labels/
│ └── data.yaml
└── models/ ← Trained model artifacts
├── azaion.pt ← Current best model
├── azaion.onnx ← Current ONNX export
└── azaion-{YYYY-MM-DD}/← Per-training-run results
└── weights/
└── best.pt
```
## Configuration Files
| File | Location | Contents |
|------|----------|---------|
| `config.yaml` | Project root | API credentials, queue config, directory paths |
| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
| `classes.json` | Project root | Annotation class definitions (17 classes) |
| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
| `offset.yaml` | annotation-queue/ | Queue consumer offset |
| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |
+99
View File
@@ -0,0 +1,99 @@
# Component Relationship Diagram
```mermaid
graph TD
subgraph "Core Infrastructure"
core[01 Core<br/>constants, utils]
end
subgraph "Security & Hardware"
sec[02 Security<br/>security, hardware_service]
end
subgraph "API & CDN Client"
api[03 API & CDN<br/>api_client, cdn_manager]
end
subgraph "Data Models"
dto[04 Data Models<br/>dto/annotationClass, dto/imageLabel]
end
subgraph "Data Pipeline"
data[05 Data Pipeline<br/>augmentation, convert-annotations,<br/>dataset-visualiser]
end
subgraph "Training Pipeline"
train[06 Training<br/>train, exports, manual_run]
end
subgraph "Inference Engine"
infer[07 Inference<br/>inference/*, start_inference]
end
subgraph "Annotation Queue Service"
queue[08 Annotation Queue<br/>annotation-queue/*]
end
core --> api
core --> data
core --> train
core --> infer
sec --> api
sec --> train
sec --> infer
api --> train
api --> infer
dto --> data
dto --> train
data -.->|augmented images<br/>on filesystem| train
queue -.->|annotation files<br/>on filesystem| data
style core fill:#e8f5e9
style sec fill:#fff3e0
style api fill:#e3f2fd
style dto fill:#f3e5f5
style data fill:#fce4ec
style train fill:#e0f2f1
style infer fill:#f9fbe7
style queue fill:#efebe9
```
## Component Summary
| # | Component | Modules | Purpose |
|---|-----------|---------|---------|
| 01 | Core Infrastructure | constants, utils | Shared paths, config keys, helper classes |
| 02 | Security & Hardware | security, hardware_service | AES encryption, key derivation, hardware fingerprinting |
| 03 | API & CDN Client | api_client, cdn_manager | REST API + S3 CDN communication, split-resource pattern |
| 04 | Data Models | dto/annotationClass, dto/imageLabel | Annotation classes, image+label container |
| 05 | Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Data prep: augmentation, format conversion, visualization |
| 06 | Training Pipeline | train, exports, manual_run | YOLO training, model export, encrypted upload |
| 07 | Inference Engine | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Real-time video object detection |
| 08 | Annotation Queue | annotation_queue_dto, annotation_queue_handler | Async annotation event consumer service |
## Module Coverage Verification
All 21 source modules are covered by exactly one component:
- 01: constants, utils (2)
- 02: security, hardware_service (2)
- 03: api_client, cdn_manager (2)
- 04: dto/annotationClass, dto/imageLabel (2)
- 05: augmentation, convert-annotations, dataset-visualiser (3)
- 06: train, exports, manual_run (3)
- 07: inference/dto, inference/onnx_engine, inference/tensorrt_engine, inference/inference, start_inference (5)
- 08: annotation-queue/annotation_queue_dto, annotation-queue/annotation_queue_handler (2)
- **Total: 21 modules covered**
## Inter-Component Communication
| From | To | Mechanism |
|------|----|-----------|
| Annotation Queue → Data Pipeline | Filesystem | Queue writes images/labels → augmentation reads them |
| Data Pipeline → Training | Filesystem | Augmented images in `/azaion/data-processed/` → dataset formation |
| Training → API & CDN | API calls | Encrypted model upload (split big/small) |
| Inference → API & CDN | API calls | Encrypted model download (reassemble big/small) |
| API & CDN → Security | Function calls | Encryption/decryption for transit protection |
| API & CDN → Core | Import | Path constants, config file references |
@@ -0,0 +1,3 @@
# Flow: Annotation Ingestion
See `_docs/02_document/system-flows.md` — Flow 1.
@@ -0,0 +1,3 @@
# Flow: Data Augmentation
See `_docs/02_document/system-flows.md` — Flow 2.
@@ -0,0 +1,3 @@
# Flow: Model Download & Inference
See `_docs/02_document/system-flows.md` — Flow 4.
@@ -0,0 +1,3 @@
# Flow: Training Pipeline
See `_docs/02_document/system-flows.md` — Flow 3.
@@ -0,0 +1,97 @@
# Module: annotation-queue/annotation_queue_dto
## Purpose
Data transfer objects for the annotation queue consumer. Defines message types for annotation CRUD events received from a RabbitMQ Streams queue.
## Public Interface
### AnnotationClass (local copy)
Same as dto/annotationClass but reads `classes.json` from current working directory and adds `opencv_color` BGR field.
### AnnotationStatus (Enum)
| Member | Value |
|--------|-------|
| Created | 10 |
| Edited | 20 |
| Validated | 30 |
| Deleted | 40 |
### SourceEnum (Enum)
| Member | Value |
|--------|-------|
| AI | 0 |
| Manual | 1 |
### RoleEnum (Enum)
| Member | Value | Description |
|--------|-------|-------------|
| Operator | 10 | Regular annotator |
| Validator | 20 | Annotation validator |
| CompanionPC | 30 | Companion device |
| Admin | 40 | Administrator |
| ApiAdmin | 1000 | API-level admin |
`RoleEnum.is_validator() -> bool`: Returns True for Validator, Admin, ApiAdmin.
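A sketch of the enum and its validator check, mirroring the table above (values from this document; the real class may differ in detail):

```python
from enum import Enum

# Hypothetical mirror of the documented RoleEnum and is_validator() contract.
class RoleEnum(Enum):
    Operator = 10
    Validator = 20
    CompanionPC = 30
    Admin = 40
    ApiAdmin = 1000

    def is_validator(self) -> bool:
        # True only for the roles documented as validators
        return self in (RoleEnum.Validator, RoleEnum.Admin, RoleEnum.ApiAdmin)
```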
### Detection
| Field | Type |
|-------|------|
| `annotation_name` | str |
| `cls` | int |
| `x`, `y`, `w`, `h` | float |
| `confidence` | float (optional) |
### AnnotationCreatedMessageNarrow
Lightweight message with only `name` and `createdEmail` (from msgpack fields 1, 2).
### AnnotationMessage
Full annotation message deserialized from msgpack:
| Field | Type | Source |
|-------|------|--------|
| `createdDate` | datetime | msgpack field 0 (Timestamp) |
| `name` | str | field 1 |
| `originalMediaName` | str | field 2 |
| `time` | timedelta | field 3 (microseconds/10) |
| `imageExtension` | str | field 4 |
| `detections` | list[Detection] | field 5 (JSON string) |
| `image` | bytes | field 6 |
| `createdRole` | RoleEnum | field 7 |
| `createdEmail` | str | field 8 |
| `source` | SourceEnum | field 9 |
| `status` | AnnotationStatus | field 10 |
### AnnotationBulkMessage
Bulk operation message for validate/delete:
| Field | Type | Source |
|-------|------|--------|
| `annotation_names` | list[str] | msgpack field 0 |
| `annotation_status` | AnnotationStatus | field 1 |
| `createdEmail` | str | field 2 |
| `createdDate` | datetime | field 3 (Timestamp) |
## Internal Logic
- All messages are deserialized from msgpack binary using positional integer keys.
- Detections within AnnotationMessage are stored as a JSON string inside the msgpack payload.
- Module-level `annotation_classes = AnnotationClass.read_json()` is loaded at import time for Detection.__str__ formatting.
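The positional-key scheme can be sketched as follows. This is a hypothetical illustration: `raw` stands in for the dict produced by `msgpack.unpackb`, and only a few field numbers from the table above are shown.

```python
import json

# Hypothetical sketch: map a msgpack map with positional integer keys onto
# named fields. Field numbers follow the AnnotationMessage table above.
FIELD_NAMES = {1: "name", 5: "detections", 8: "createdEmail"}

def decode_annotation(raw: dict) -> dict:
    decoded = {FIELD_NAMES[k]: v for k, v in raw.items() if k in FIELD_NAMES}
    # Detections travel as a JSON string inside the msgpack payload.
    decoded["detections"] = json.loads(decoded["detections"])
    return decoded

msg = decode_annotation({1: "abc.jpg", 5: '[{"cls": 2}]', 8: "op@example.com"})
```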
## Dependencies
- `msgpack` (external) — binary message deserialization
- `json`, `datetime`, `enum` (stdlib)
## Consumers
annotation-queue/annotation_queue_handler
## Data Models
AnnotationClass, AnnotationStatus, SourceEnum, RoleEnum, Detection, AnnotationCreatedMessageNarrow, AnnotationMessage, AnnotationBulkMessage.
## Configuration
Reads `classes.json` from current working directory.
## External Integrations
None (pure data classes).
## Security
None.
## Tests
None.
@@ -0,0 +1,59 @@
# Module: annotation-queue/annotation_queue_handler
## Purpose
Async consumer for the Azaion annotation queue (RabbitMQ Streams). Listens for annotation CRUD events and writes/moves image+label files on the filesystem.
## Public Interface
### AnnotationQueueHandler
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Reads config.yaml, creates directories, initializes rstream Consumer, reads offset |
| `start` | `async ()` | — | Starts consumer, subscribes to queue stream, runs event loop |
| `on_message` | `(message: AMQPMessage, context: MessageContext)` | — | Message callback: routes by AnnotationStatus to save/validate/delete |
| `save_annotation` | `(ann: AnnotationMessage)` | — | Writes label file + image to data or seed directory based on role |
| `validate` | `(msg: AnnotationBulkMessage)` | — | Moves annotations from seed to data directory |
| `delete` | `(msg: AnnotationBulkMessage)` | — | Moves annotations to deleted directory |
### AnnotationQueueHandler.AnnotationName (inner class)
Helper that pre-computes file paths for an annotation name across data/seed directories.
## Internal Logic
- **Queue protocol**: Subscribes to a RabbitMQ Streams queue using rstream library with AMQP message decoding. Resumes from a persisted offset stored in `offset.yaml`.
- **Message routing** (via `application_properties['AnnotationStatus']`):
- `Created` / `Edited` → `save_annotation`: If validator role, writes to data dir; else writes to seed dir. For Created status, also saves the image bytes. For Edited by validator, moves image from seed to data.
- `Validated` → `validate`: Bulk-moves all named annotations from seed to data directory.
- `Deleted` → `delete`: Bulk-moves all named annotations to the deleted directory.
- **Offset tracking**: After each message, increments offset and persists to `offset.yaml`.
- **Directory layout**:
- `{root}/data/images/` + `{root}/data/labels/` — validated annotations
- `{root}/data-seed/images/` + `{root}/data-seed/labels/` — unvalidated annotations
- `{root}/data_deleted/images/` + `{root}/data_deleted/labels/` — soft-deleted annotations
- **Logging**: TimedRotatingFileHandler with daily rotation, 7-day retention, logs to `logs/` directory.
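The status-based routing above can be sketched as follows (hypothetical; the real handler dispatches on `application_properties['AnnotationStatus']`, and method names mirror the table in this document):

```python
# Hypothetical sketch of AnnotationStatus routing; values from the
# AnnotationStatus enum documented above.
def route(status: int) -> str:
    if status in (10, 20):   # Created / Edited
        return "save_annotation"
    if status == 30:         # Validated
        return "validate"
    if status == 40:         # Deleted
        return "delete"
    raise ValueError(f"unknown status {status}")
```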
## Dependencies
- `annotation_queue_dto` — AnnotationStatus, AnnotationMessage, AnnotationBulkMessage
- `rstream` (external) — RabbitMQ Streams consumer
- `yaml` (external) — config and offset persistence
- `asyncio`, `os`, `shutil`, `sys`, `logging`, `datetime` (stdlib)
## Consumers
None (entry point — runs via `__main__`).
## Data Models
Uses AnnotationMessage, AnnotationBulkMessage from annotation_queue_dto.
## Configuration
- `config.yaml`: API creds (url, email, password), queue config (host, port, consumer_user, consumer_pw, name), directory structure (root, data, data_seed, data_processed, data_deleted, images, labels)
- `offset.yaml`: persisted queue consumer offset
## External Integrations
- RabbitMQ Streams queue (rstream library) on host `188.245.120.247:5552`
- Filesystem: `/azaion/data/`, `/azaion/data-seed/`, `/azaion/data_deleted/`
## Security
- Queue credentials in `config.yaml` (hardcoded — security concern)
- No encryption of annotation data at rest
## Tests
None.
+64
View File
@@ -0,0 +1,64 @@
# Module: api_client
## Purpose
HTTP client for the Azaion backend API. Handles authentication, file upload/download with encryption, and split-resource management (big/small model parts).
## Public Interface
### ApiCredentials
| Field | Type | Description |
|-------|------|-------------|
| `url` | str | API base URL |
| `email` | str | Login email |
| `password` | str | Login password |
### ApiClient
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Reads `config.yaml` for API creds, reads `cdn.yaml` via `load_bytes`, initializes CDNManager |
| `login` | `()` | — | POST `/login` → stores JWT token |
| `upload_file` | `(filename: str, file_bytes: bytearray, folder: str)` | — | Uploads file to API resource endpoint |
| `load_bytes` | `(filename: str, folder: str) -> bytes` | Decrypted bytes | Downloads encrypted resource from API, decrypts with hardware-bound key |
| `load_big_small_resource` | `(resource_name: str, folder: str, key: str) -> bytes` | Decrypted bytes | Reassembles a split resource: big part from local disk + small part from API, decrypts combined |
| `upload_big_small_resource` | `(resource: bytes, resource_name: str, folder: str, key: str)` | — | Encrypts resource, splits into big (CDN) + small (API), uploads both |
## Internal Logic
- **Authentication**: JWT-based. Auto-login on first request, re-login on 401/403.
- **load_bytes**: Sends hardware fingerprint in request payload. Server returns encrypted bytes. Client decrypts using key derived from credentials + hardware hash.
- **Split resource pattern**: Large files (models) are split into two parts:
- `*.small` — first N bytes (the lesser of `SMALL_SIZE_KB * 1024` bytes and 20% of the encrypted size) — stored on API server
- `*.big` — remainder — stored on CDN (S3)
- This split ensures the model cannot be reconstructed from either storage alone.
- **CDN initialization**: On construction, `cdn.yaml` is loaded via `load_bytes` (from API, encrypted), then used to initialize `CDNManager`.
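The split-size rule can be sketched as follows (hypothetical illustration of the rule as documented in this section; the real implementation may differ):

```python
# Hypothetical sketch: the small part is the lesser of SMALL_SIZE_KB
# kilobytes or 20% of the encrypted payload; the big part is the remainder.
SMALL_SIZE_KB = 3  # value documented in the constants module

def split_resource(encrypted: bytes) -> tuple[bytes, bytes]:
    small_len = min(SMALL_SIZE_KB * 1024, int(len(encrypted) * 0.2))
    return encrypted[:small_len], encrypted[small_len:]

# For a 10 000-byte payload, 20% (2000 bytes) is below the 3072-byte cap.
small, big = split_resource(b"\x00" * 10_000)
```

Reassembly is the simple concatenation `small + big` before decryption.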
## Dependencies
- `constants` — config file paths, size thresholds, model folder name
- `cdn_manager` — CDNCredentials, CDNManager for S3 operations
- `hardware_service` — `get_hardware_info()` for hardware fingerprint
- `security` — encryption/decryption, key derivation
- `requests` (external) — HTTP client
- `yaml` (external) — config parsing
- `io`, `json`, `os` (stdlib)
## Consumers
exports, train, start_inference
## Data Models
`ApiCredentials` — API connection credentials.
## Configuration
- `config.yaml` — API URL, email, password
- `cdn.yaml` — CDN credentials (loaded encrypted from API at init time)
## External Integrations
- Azaion REST API (`POST /login`, `POST /resources/{folder}`, `POST /resources/get/{folder}`)
- S3-compatible CDN via CDNManager
## Security
- JWT token-based authentication with auto-refresh on 401/403
- Hardware-bound encryption for downloaded resources
- Split model storage prevents single-point compromise
- Credentials read from `config.yaml` (hardcoded in file — security concern)
## Tests
None.
+56
View File
@@ -0,0 +1,56 @@
# Module: augmentation
## Purpose
Image augmentation pipeline that takes raw annotated images and produces multiple augmented variants for training data expansion. Runs continuously in a loop.
## Public Interface
### Augmentator
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Initializes augmentation transforms and counters |
| `augment_annotations` | `(from_scratch: bool = False)` | — | Processes all unprocessed images from `data/images``data-processed/images` |
| `augment_annotation` | `(image_file)` | — | Processes a single image file: reads image + labels, augments, saves results |
| `augment_inner` | `(img_ann: ImageLabel) -> list[ImageLabel]` | List of augmented images | Generates 1 original + 7 augmented variants |
| `correct_bboxes` | `(labels) -> list` | Corrected labels | Clips bounding boxes to image boundaries, removes tiny boxes |
| `read_labels` | `(labels_path) -> list[list]` | Parsed YOLO labels | Reads YOLO-format label file into list of [x, y, w, h, class_id] |
## Internal Logic
- **Augmentation pipeline** (albumentations Compose):
1. HorizontalFlip (p=0.6)
2. RandomBrightnessContrast (p=0.4)
  3. Affine: scale 0.8–1.2, rotate ±35°, shear ±10° (p=0.8)
4. MotionBlur (p=0.1)
5. HueSaturationValue (p=0.4)
- Each image produces **8 outputs**: 1 original copy + 7 augmented variants
- Naming: `{stem}_{1..7}.jpg` for augmented, original keeps its name
- **Bbox correction**: clips bounding boxes that extend outside image borders, removes boxes smaller than `correct_min_bbox_size` (0.01 of image dimension)
- **Incremental processing**: skips images already present in `processed_images_dir`
- **Concurrent**: uses `ThreadPoolExecutor` for parallel processing
- **Continuous mode**: `__main__` runs augmentation in an infinite loop with 5-minute sleep between rounds
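The bbox-correction step can be sketched in pure Python (hypothetical; the real `correct_bboxes` operates on the module's label representation):

```python
# Hypothetical sketch: clip normalized YOLO boxes to the image area,
# then drop boxes below the minimum size documented above.
CORRECT_MIN_BBOX_SIZE = 0.01  # fraction of image dimension

def correct_bboxes(labels):
    corrected = []
    for x, y, w, h, cls in labels:
        # Clip box edges to [0, 1], then recompute center and size.
        x1, x2 = max(0.0, x - w / 2), min(1.0, x + w / 2)
        y1, y2 = max(0.0, y - h / 2), min(1.0, y + h / 2)
        nw, nh = x2 - x1, y2 - y1
        if nw >= CORRECT_MIN_BBOX_SIZE and nh >= CORRECT_MIN_BBOX_SIZE:
            corrected.append([(x1 + x2) / 2, (y1 + y2) / 2, nw, nh, cls])
    return corrected
```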
## Dependencies
- `constants` — directory paths (data_images_dir, data_labels_dir, processed_*)
- `dto/imageLabel` — ImageLabel container class
- `albumentations` (external) — augmentation transforms
- `cv2` (external) — image read/write
- `numpy` (external) — image array handling
- `concurrent.futures`, `os`, `shutil`, `time`, `datetime`, `pathlib` (stdlib)
## Consumers
manual_run
## Data Models
Uses `ImageLabel` from `dto/imageLabel`.
## Configuration
Hardcoded augmentation parameters (probabilities, ranges). Directory paths from `constants`.
## External Integrations
Filesystem I/O: reads from `/azaion/data/`, writes to `/azaion/data-processed/`.
## Security
None.
## Tests
None.
+51
View File
@@ -0,0 +1,51 @@
# Module: cdn_manager
## Purpose
Manages file upload and download to/from an S3-compatible CDN (MinIO/similar) using separate credentials for upload and download operations.
## Public Interface
### CDNCredentials
| Field | Type | Description |
|-------|------|-------------|
| `host` | str | CDN endpoint URL |
| `downloader_access_key` | str | S3 access key for downloads |
| `downloader_access_secret` | str | S3 secret for downloads |
| `uploader_access_key` | str | S3 access key for uploads |
| `uploader_access_secret` | str | S3 secret for uploads |
### CDNManager
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(credentials: CDNCredentials)` | — | Creates two boto3 S3 clients (download + upload) |
| `upload` | `(bucket: str, filename: str, file_bytes: bytearray) -> bool` | True on success | Uploads bytes to S3 bucket |
| `download` | `(bucket: str, filename: str) -> bool` | True on success | Downloads file from S3 to current directory |
## Internal Logic
- Maintains two separate boto3 S3 clients with different credentials (read vs write separation)
- Upload uses `upload_fileobj` with in-memory BytesIO wrapper
- Download uses `download_file` (saves directly to disk with same filename)
- Both methods catch all exceptions, print error, return bool
## Dependencies
- `boto3` (external) — S3 client
- `io`, `sys`, `os` (stdlib), `yaml` (external) — Note: `sys`, `yaml`, `os` are imported but unused
## Consumers
api_client, exports, train, start_inference
## Data Models
`CDNCredentials` — plain data class holding S3 access credentials.
## Configuration
Credentials loaded from `cdn.yaml` by callers (not by this module directly).
## External Integrations
- S3-compatible object storage (configured via `CDNCredentials.host`)
## Security
- Separate read/write credentials enforce least-privilege access
- Credentials passed in at construction time, not hardcoded here
## Tests
None.
+59
View File
@@ -0,0 +1,59 @@
# Module: constants
## Purpose
Centralizes all filesystem path constants, config file names, file extensions, and size thresholds used across the training pipeline.
## Public Interface
| Name | Type | Value/Description |
|------|------|-------------------|
| `azaion` | str | Root directory: `/azaion` |
| `prefix` | str | Naming prefix: `azaion-` |
| `data_dir` | str | `/azaion/data` |
| `data_images_dir` | str | `/azaion/data/images` |
| `data_labels_dir` | str | `/azaion/data/labels` |
| `processed_dir` | str | `/azaion/data-processed` |
| `processed_images_dir` | str | `/azaion/data-processed/images` |
| `processed_labels_dir` | str | `/azaion/data-processed/labels` |
| `corrupted_dir` | str | `/azaion/data-corrupted` |
| `corrupted_images_dir` | str | `/azaion/data-corrupted/images` |
| `corrupted_labels_dir` | str | `/azaion/data-corrupted/labels` |
| `sample_dir` | str | `/azaion/data-sample` |
| `datasets_dir` | str | `/azaion/datasets` |
| `models_dir` | str | `/azaion/models` |
| `date_format` | str | `%Y-%m-%d` |
| `checkpoint_file` | str | `checkpoint.txt` |
| `checkpoint_date_format` | str | `%Y-%m-%d %H:%M:%S` |
| `CONFIG_FILE` | str | `config.yaml` |
| `JPG_EXT` | str | `.jpg` |
| `TXT_EXT` | str | `.txt` |
| `OFFSET_FILE` | str | `offset.yaml` |
| `SMALL_SIZE_KB` | int | `3` (KB threshold for split-upload small part) |
| `CDN_CONFIG` | str | `cdn.yaml` |
| `MODELS_FOLDER` | str | `models` |
| `CURRENT_PT_MODEL` | str | `/azaion/models/azaion.pt` |
| `CURRENT_ONNX_MODEL` | str | `/azaion/models/azaion.onnx` |
## Internal Logic
Pure constant definitions using `os.path.join`. No functions, no classes, no dynamic behavior.
## Dependencies
- `os.path` (stdlib)
## Consumers
api_client, augmentation, exports, train, manual_run, start_inference, dataset-visualiser
## Data Models
None.
## Configuration
Defines `CONFIG_FILE = 'config.yaml'` and `CDN_CONFIG = 'cdn.yaml'` — the filenames for runtime configuration. Does not read them.
## External Integrations
None.
## Security
None.
## Tests
None.
@@ -0,0 +1,43 @@
# Module: convert-annotations
## Purpose
Standalone script that converts annotation files from external formats (Pascal VOC XML, oriented bounding box text) to YOLO format.
## Public Interface
| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `convert` | `(folder, dest_folder, read_annotations, ann_format)` | — | Generic converter: reads images + annotations from folder, writes YOLO format to dest |
| `minmax2yolo` | `(width, height, xmin, xmax, ymin, ymax) -> tuple` | (cx, cy, w, h) | Converts pixel min/max coords to normalized YOLO center format |
| `read_pascal_voc` | `(width, height, s: str) -> list[str]` | YOLO label lines | Parses Pascal VOC XML, maps class names to IDs, outputs YOLO lines |
| `read_bbox_oriented` | `(width, height, s: str) -> list[str]` | YOLO label lines | Parses 14-column oriented bbox format, outputs YOLO lines (hardcoded class 2) |
| `rename_images` | `(folder)` | — | Renames files by trimming last 7 chars + replacing extension with .png |
## Internal Logic
- **convert()**: Iterates image files in source folder, reads corresponding annotation file, calls format-specific reader, copies image and writes YOLO label to destination.
- **Pascal VOC**: Parses XML `<object>` elements, maps class names via `name_class_map` (Truck→1, Car/Taxi→2), filters forbidden classes (Motorcycle). Default class = 1.
- **Oriented bbox**: 14-column space-separated format, extracts min/max from columns 6–13, hardcodes class to 2.
- **Validation**: Skips labels where normalized coordinates exceed 1.0 (out of bounds).
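The coordinate conversion can be sketched as follows (hypothetical reconstruction of `minmax2yolo` from its documented signature):

```python
# Hypothetical sketch: convert pixel min/max corners to normalized
# YOLO center format, as described above.
def minmax2yolo(width, height, xmin, xmax, ymin, ymax):
    cx = (xmin + xmax) / 2 / width
    cy = (ymin + ymax) / 2 / height
    w = (xmax - xmin) / width
    h = (ymax - ymin) / height
    return cx, cy, w, h

# A 100x50 pixel box centered in a 640x480 image:
cx, cy, w, h = minmax2yolo(640, 480, 270, 370, 215, 265)
```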
## Dependencies
- `cv2` (external) — image reading for dimensions
- `xml.etree.cElementTree` (stdlib) — Pascal VOC XML parsing
- `os`, `shutil`, `pathlib` (stdlib)
## Consumers
None (standalone script).
## Data Models
None.
## Configuration
Hardcoded class mappings: `name_class_map = {'Truck': 1, 'Car': 2, 'Taxi': 2}`, `forbidden_classes = ['Motorcycle']`.
## External Integrations
Filesystem I/O only.
## Security
None.
## Tests
None.
@@ -0,0 +1,41 @@
# Module: dataset-visualiser
## Purpose
Interactive tool for visually inspecting annotated images from datasets or the processed folder, displaying bounding boxes with class colors.
## Public Interface
| Function | Signature | Description |
|----------|-----------|-------------|
| `visualise_dataset` | `()` | Iterates images in a specific dataset folder, shows each with annotations. Waits for keypress. |
| `visualise_processed_folder` | `()` | Shows images from the processed folder with annotations. |
## Internal Logic
- **visualise_dataset()**: Hardcoded to a specific dataset date (`2024-06-18`), iterates from index 35247 onward. Reads image + labels, calls `ImageLabel.visualize()`, waits for user input to advance.
- **visualise_processed_folder()**: Lists all processed images, shows the first one.
- Both functions use `read_labels()` imported from a `preprocessing` module **which does not exist** in the codebase — this is a broken import.
## Dependencies
- `constants` — directory paths (datasets_dir, prefix, processed_*)
- `dto/annotationClass` — AnnotationClass for class colors
- `dto/imageLabel` — ImageLabel for visualization
- `preprocessing`**MISSING MODULE** (read_labels function)
- `cv2` (external), `matplotlib` (external), `os`, `pathlib` (stdlib)
## Consumers
None (standalone script).
## Data Models
Uses ImageLabel, AnnotationClass.
## Configuration
Hardcoded dataset path and start index.
## External Integrations
Filesystem I/O, matplotlib interactive display.
## Security
None.
## Tests
None.
@@ -0,0 +1,49 @@
# Module: dto/annotationClass
## Purpose
Defines the `AnnotationClass` data model and `WeatherMode` enum used in the training pipeline. Reads annotation class definitions from `classes.json`.
## Public Interface
### WeatherMode (Enum)
| Member | Value | Description |
|--------|-------|-------------|
| `Norm` | 0 | Normal weather |
| `Wint` | 20 | Winter conditions |
| `Night` | 40 | Night conditions |
### AnnotationClass
| Field/Method | Type/Signature | Description |
|-------------|----------------|-------------|
| `id` | int | Class ID (weather_offset + base_id) |
| `name` | str | Class name (with weather suffix if non-Norm) |
| `color` | str | Hex color string (e.g. `#ff0000`) |
| `color_tuple` | property → tuple | RGB tuple parsed from hex color |
| `read_json()` | static → dict[int, AnnotationClass] | Reads `classes.json`, expands across weather modes, returns dict keyed by ID |
## Internal Logic
- `read_json()` locates `classes.json` relative to the parent directory of the `dto/` package
- For each of the 3 weather modes, creates an AnnotationClass per entry in `classes.json` with offset IDs (0, 20, 40)
- This produces up to 80 classes total (17 base × 3 modes = 51, but the system reserves 80 slots)
- `color_tuple` strips the first 3 characters of the color string and parses hex pairs
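A hex-to-RGB parse can be sketched as follows. This is a hypothetical illustration assuming plain `#rrggbb` color strings; the real `color_tuple` property strips a three-character prefix and may handle a different string layout.

```python
# Hypothetical sketch: parse a "#rrggbb" hex color into an RGB tuple.
def color_tuple(color: str) -> tuple[int, int, int]:
    hex_part = color.lstrip("#")
    return tuple(int(hex_part[i:i + 2], 16) for i in (0, 2, 4))
```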
## Dependencies
- `json`, `enum`, `os.path` (stdlib)
## Consumers
train (for YAML generation), dataset-visualiser (for visualization colors)
## Data Models
`AnnotationClass` — annotation class with ID, name, color. `WeatherMode` — enum for weather conditions.
## Configuration
Reads `classes.json` from project root (relative path from `dto/` parent).
## External Integrations
None.
## Security
None.
## Tests
None directly; used transitively by `tests/imagelabel_visualize_test.py`.
@@ -0,0 +1,41 @@
# Module: dto/imageLabel
## Purpose
Container class for an image with its YOLO-format bounding box labels, plus a visualization method for debugging annotations.
## Public Interface
### ImageLabel
| Field/Method | Type/Signature | Description |
|-------------|----------------|-------------|
| `image_path` | str | Filesystem path to the image |
| `image` | numpy.ndarray | OpenCV image array |
| `labels_path` | str | Filesystem path to the labels file |
| `labels` | list[list] | List of YOLO bboxes: [x_center, y_center, width, height, class_id] |
| `visualize` | `(annotation_classes: dict) -> None` | Draws bounding boxes on image and displays via matplotlib |
## Internal Logic
- `visualize()` converts BGR→RGB, iterates labels, converts normalized YOLO coordinates to pixel coordinates, draws colored rectangles using `annotation_classes[class_num].color_tuple`, displays with matplotlib.
- Labels use YOLO format: center_x, center_y, width, height (all normalized 0–1), class_id as last element.
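The normalized-to-pixel conversion used by `visualize()` can be sketched as follows (hypothetical helper; the real method draws directly with cv2):

```python
# Hypothetical sketch: convert YOLO center/size coordinates to integer
# pixel corner coordinates for rectangle drawing.
def yolo_to_pixels(cx, cy, w, h, img_w, img_h):
    x1 = int((cx - w / 2) * img_w)
    y1 = int((cy - h / 2) * img_h)
    x2 = int((cx + w / 2) * img_w)
    y2 = int((cy + h / 2) * img_h)
    return x1, y1, x2, y2
```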
## Dependencies
- `cv2` (external) — image manipulation
- `matplotlib.pyplot` (external) — image display
## Consumers
augmentation (as augmented image container), dataset-visualiser (for visualization)
## Data Models
`ImageLabel` — image + labels container.
## Configuration
None.
## External Integrations
None.
## Security
None.
## Tests
Used by `tests/imagelabel_visualize_test.py`.
+53
View File
@@ -0,0 +1,53 @@
# Module: exports
## Purpose
Model export utilities: converts trained YOLO .pt models to ONNX, TensorRT, and RKNN formats. Also handles encrypted model upload (split big/small pattern) and data sampling.
## Public Interface
| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `export_rknn` | `(model_path: str)` | — | Exports YOLO model to RKNN format (RK3588 target), cleans up temp folder |
| `export_onnx` | `(model_path: str, batch_size: int = 4)` | — | Exports YOLO model to ONNX (1280px, NMS enabled, GPU device 0) |
| `export_tensorrt` | `(model_path: str)` | — | Exports YOLO model to TensorRT engine (batch=4, half precision, NMS) |
| `form_data_sample` | `(destination_path: str, size: int = 500, write_txt_log: bool = False)` | — | Creates a random sample of processed images |
| `show_model` | `(model: str = None)` | — | Opens model visualization in netron |
| `upload_model` | `(model_path: str, filename: str, size_small_in_kb: int = 3)` | — | Encrypts model, splits big/small, uploads to API + CDN |
## Internal Logic
- **export_onnx**: Removes existing ONNX file if present, exports at 1280px with NMS baked in and simplification.
- **export_tensorrt**: Uses YOLO's built-in TensorRT export (batch=4, FP16, NMS, simplify).
- **export_rknn**: Exports to RKNN format targeting RK3588 SoC, moves result file and cleans temp directory.
- **upload_model**: Encrypts with `Security.get_model_encryption_key()`, splits encrypted bytes at 30%/70% boundary (or `size_small_in_kb * 1024`), uploads small part to API, big part to CDN.
- **form_data_sample**: Randomly shuffles processed images, copies first N to destination folder.
## Dependencies
- `constants` — directory paths, model paths, config file names
- `api_client` — ApiClient, ApiCredentials for upload
- `cdn_manager` — CDNManager, CDNCredentials for CDN upload
- `security` — model encryption key, encrypt_to
- `utils` — Dotdict for config access
- `ultralytics` (external) — YOLO model
- `netron` (external) — model visualization
- `yaml` (external); `os`, `shutil`, `random`, `pathlib` (stdlib)
## Consumers
train (export_tensorrt, upload_model, export_onnx)
## Data Models
None.
## Configuration
Reads `config.yaml` for API credentials (in `upload_model`), `cdn.yaml` for CDN credentials.
## External Integrations
- Ultralytics YOLO export pipeline
- Netron model viewer
- Azaion API + CDN for model upload
## Security
- Models are encrypted with AES-256-CBC before upload
- Split storage (big on CDN, small on API) prevents single-point compromise
## Tests
None.
@@ -0,0 +1,38 @@
# Module: hardware_service
## Purpose
Collects hardware fingerprint information (CPU, GPU, RAM, drive serial) from the host machine for use in hardware-bound encryption key derivation.
## Public Interface
| Function | Signature | Returns |
|----------|-----------|---------|
| `get_hardware_info` | `() -> str` | Formatted string: `CPU: {cpu}. GPU: {gpu}. Memory: {memory}. DriveSerial: {drive_serial}` |
## Internal Logic
- Detects OS via `os.name` (`nt` for Windows, else Linux)
- **Windows**: PowerShell commands to query `Win32_Processor`, `Win32_VideoController`, `Win32_OperatingSystem`, disk serial
- **Linux**: `lscpu`, `lspci`, `free`, `/sys/block/sda/device/` serial
- Parses multi-line output: first line = CPU, second = GPU, second-to-last = memory, last = drive serial
- Handles multiple GPUs by taking first GPU and last two lines for memory/drive
## Dependencies
- `os`, `subprocess` (stdlib)
## Consumers
api_client (used in `load_bytes` to generate hardware string for encryption)
## Data Models
None.
## Configuration
None.
## External Integrations
Executes OS-level shell commands to query hardware.
## Security
The hardware fingerprint is used as input to `Security.get_hw_hash()` and subsequently `Security.get_api_encryption_key()`, binding API encryption to the specific machine.
## Tests
None.
# Module: inference/dto
## Purpose
Data transfer objects for the inference subsystem: Detection, Annotation, and a local copy of AnnotationClass/WeatherMode.
## Public Interface
### Detection
| Field | Type | Description |
|-------|------|-------------|
| `x` | float | Normalized center X |
| `y` | float | Normalized center Y |
| `w` | float | Normalized width |
| `h` | float | Normalized height |
| `cls` | int | Class ID |
| `confidence` | float | Detection confidence score |
| `overlaps(det2, iou_threshold) -> bool` | method | IoU-based overlap check |
### Annotation
| Field | Type | Description |
|-------|------|-------------|
| `frame` | numpy.ndarray | Video frame image |
| `time` | int/float | Timestamp in the video |
| `detections` | list[Detection] | Detected objects in this frame |
### AnnotationClass (duplicate)
Same as `dto/annotationClass.AnnotationClass` but with an additional `opencv_color` field (BGR tuple). Reads from `classes.json` relative to `inference/` parent directory.
### WeatherMode (duplicate)
Same as `dto/annotationClass.WeatherMode`.
## Internal Logic
- `Detection.overlaps()` computes IoU between two bounding boxes and returns True if above threshold.
- `AnnotationClass` here adds `opencv_color` as a pre-computed BGR tuple from the hex color for efficient OpenCV rendering.
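The IoU check behind `Detection.overlaps()` can be sketched as follows (a minimal reconstruction from the field table above; the project's actual implementation may differ in detail):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    x: float           # normalized center X
    y: float           # normalized center Y
    w: float           # normalized width
    h: float           # normalized height
    cls: int
    confidence: float

    def overlaps(self, det2: "Detection", iou_threshold: float) -> bool:
        # Convert center/size boxes to corner coordinates
        x1a, y1a = self.x - self.w / 2, self.y - self.h / 2
        x2a, y2a = self.x + self.w / 2, self.y + self.h / 2
        x1b, y1b = det2.x - det2.w / 2, det2.y - det2.h / 2
        x2b, y2b = det2.x + det2.w / 2, det2.y + det2.h / 2
        # Intersection rectangle (zero if the boxes are disjoint)
        iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))
        ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))
        inter = iw * ih
        union = self.w * self.h + det2.w * det2.h - inter
        iou = inter / union if union > 0 else 0.0
        return iou > iou_threshold
```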
## Dependencies
- `json`, `enum`, `os.path` (stdlib)
## Consumers
inference/inference
## Data Models
Detection, Annotation, AnnotationClass, WeatherMode.
## Configuration
Reads `classes.json` from project root.
## External Integrations
None.
## Security
None.
## Tests
None.
# Module: inference/inference
## Purpose
High-level video inference pipeline. Orchestrates preprocessing → engine inference → postprocessing → visualization for object detection on video streams.
## Public Interface
### Inference
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(engine: InferenceEngine, confidence_threshold, iou_threshold)` | — | Stores engine, thresholds, loads annotation classes |
| `preprocess` | `(frames: list) -> np.ndarray` | Batched blob tensor | Normalizes, resizes, and stacks frames into NCHW blob |
| `postprocess` | `(batch_frames, batch_timestamps, output) -> list[Annotation]` | Annotations per frame | Extracts detections from raw output, applies confidence filter and NMS |
| `process` | `(video: str)` | — | End-to-end: reads video → batched inference → draws + displays results |
| `draw` | `(annotation: Annotation)` | — | Draws bounding boxes with class labels on frame, shows via cv2.imshow |
| `remove_overlapping_detections` | `(detections: list[Detection]) -> list[Detection]` | Filtered list | Custom NMS: removes overlapping detections keeping higher confidence |
## Internal Logic
- **Video processing**: Reads video via cv2.VideoCapture, processes every 4th frame (frames where `frame_count % 4 == 0`), batches frames to the engine batch size.
- **Preprocessing**: `cv2.dnn.blobFromImage` with 1/255 scaling, model input size, BGR→RGB swap.
- **Postprocessing**: Iterates raw output, filters by confidence threshold, normalizes coordinates from model space to [0,1], creates Detection objects, applies custom NMS.
- **Custom NMS**: Pairwise IoU comparison. When two detections overlap above threshold, keeps the one with higher confidence (ties broken by lower class ID).
- **Visualization**: Draws colored rectangles and confidence labels using annotation class colors in OpenCV window.
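The custom NMS described above can be sketched with plain tuples `(box, confidence, cls)` (a hypothetical helper, not the project's actual code; `box` is in corner form):

```python
def iou(a, b):
    # a, b: (x1, y1, x2, y2) corner boxes
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def remove_overlapping(detections, iou_threshold=0.3):
    # detections: list of (box, confidence, cls). For each overlapping pair,
    # keep the higher-confidence detection (ties broken by lower class ID).
    kept = []
    # Consider better detections first, so weaker overlaps get dropped
    for box, conf, cls in sorted(detections, key=lambda d: (-d[1], d[2])):
        if all(iou(box, kb) <= iou_threshold for kb, _, _ in kept):
            kept.append((box, conf, cls))
    return kept
```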
## Dependencies
- `inference/dto` — Detection, Annotation, AnnotationClass
- `inference/onnx_engine` — InferenceEngine ABC (type hint)
- `cv2` (external) — video I/O, image processing, display
- `numpy` (external) — tensor operations
## Consumers
start_inference
## Data Models
Uses Detection, Annotation from `inference/dto`.
## Configuration
`confidence_threshold` and `iou_threshold` set at construction.
## External Integrations
- OpenCV video capture (file or stream input)
- OpenCV GUI window for real-time display
## Security
None.
## Tests
None.
# Module: inference/onnx_engine
## Purpose
Defines the abstract `InferenceEngine` base class and the `OnnxEngine` implementation for running ONNX model inference with GPU acceleration.
## Public Interface
### InferenceEngine (ABC)
| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_path: str, batch_size: int = 1, **kwargs)` | Abstract constructor |
| `get_input_shape` | `() -> Tuple[int, int]` | Returns (height, width) of model input |
| `get_batch_size` | `() -> int` | Returns the batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Runs inference, returns output tensors |
### OnnxEngine (extends InferenceEngine)
| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_bytes, batch_size: int = 1, **kwargs)` | Loads ONNX model from bytes, creates InferenceSession with CUDA+CPU providers |
| `get_input_shape` | `() -> Tuple[int, int]` | Returns (height, width) from model input shape |
| `get_batch_size` | `() -> int` | Returns batch size (from model shape or constructor arg) |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Runs ONNX inference session |
## Internal Logic
- Uses ONNX Runtime with `CUDAExecutionProvider` (primary) and `CPUExecutionProvider` (fallback).
- Reads model metadata to extract class names from custom metadata map.
- If model input shape has a fixed batch dimension (not -1), overrides the constructor batch_size.
## Dependencies
- `onnxruntime` (external) — ONNX inference runtime
- `numpy` (external)
- `abc`, `typing` (stdlib)
## Consumers
inference/inference, inference/tensorrt_engine (inherits InferenceEngine), train (imports OnnxEngine)
## Data Models
None.
## Configuration
None.
## External Integrations
- ONNX Runtime GPU execution (CUDA)
## Security
None.
## Tests
None.
# Module: inference/tensorrt_engine
## Purpose
TensorRT-based inference engine implementation. Provides GPU-accelerated inference using NVIDIA TensorRT with CUDA memory management, plus ONNX-to-TensorRT conversion.
## Public Interface
### TensorRTEngine (extends InferenceEngine)
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(model_bytes: bytes, **kwargs)` | — | Deserializes TensorRT engine from bytes, allocates CUDA memory |
| `get_input_shape` | `() -> Tuple[int, int]` | (height, width) | Returns model input dimensions |
| `get_batch_size` | `() -> int` | int | Returns configured batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Output tensors | Runs async inference on CUDA stream |
| `get_gpu_memory_bytes` | `(device_id=0) -> int` | GPU memory in bytes | Queries total GPU VRAM via pynvml (static) |
| `get_engine_filename` | `(device_id=0) -> str \| None` | Filename string | Generates device-specific engine filename (static) |
| `convert_from_onnx` | `(onnx_model: bytes) -> bytes \| None` | Serialized TensorRT plan | Converts ONNX model to TensorRT engine (static) |
## Internal Logic
- **Initialization**: Deserializes TensorRT engine, creates execution context, allocates pinned host memory and device memory for input/output tensors.
- **Dynamic shapes**: Handles -1 (dynamic) dimensions, defaults to 1280×1280 for spatial dims, batch size from engine or constructor.
- **Output shape**: [batch_size, 300 max detections, 6 values per detection (x1, y1, x2, y2, conf, cls)].
- **Inference flow**: Host→Device async copy → execute_async_v3 → synchronize → Device→Host copy.
- **ONNX conversion**: Creates TensorRT builder, parses ONNX, configures workspace (90% of GPU memory), enables FP16 if supported, builds serialized network.
- **Engine filename**: `azaion.cc_{major}.{minor}_sm_{sm_count}.engine` — uniquely identifies engine per GPU architecture.
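The engine-naming scheme can be sketched as a pure function of the GPU properties (the real code queries pycuda for compute capability and SM count):

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    # Device-specific name so engines built for one GPU architecture
    # are never loaded on another.
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```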
## Dependencies
- `inference/onnx_engine` — InferenceEngine ABC
- `tensorrt` (external) — TensorRT runtime and builder
- `pycuda.driver` (external) — CUDA memory management
- `pycuda.autoinit` (external) — CUDA context auto-initialization
- `pynvml` (external) — GPU memory query
- `numpy`, `json`, `struct`, `re`, `subprocess`, `pathlib`, `typing` (stdlib/external)
## Consumers
start_inference
## Data Models
None.
## Configuration
None.
## External Integrations
- NVIDIA TensorRT runtime (GPU inference)
- CUDA driver API (memory allocation, streams)
- NVML (GPU hardware queries)
## Security
None.
## Tests
None.
# Module: manual_run
## Purpose
Ad-hoc script for manual training operations. Contains commented-out alternatives and a hardcoded workflow for copying model weights and exporting.
## Public Interface
No functions or classes. Script-level code only.
## Internal Logic
- Contains commented-out calls to `Augmentator().augment_annotations()`, `train.train_dataset()`, `train.resume_training()`.
- Active code: references a specific model date (`2025-05-18`), removes intermediate epoch checkpoint files, copies `best.pt` to `CURRENT_PT_MODEL`, then calls `train.export_current_model()`.
- Serves as a developer convenience script for one-off training/export operations.
## Dependencies
- `constants` — models_dir, prefix, CURRENT_PT_MODEL
- `train` — export_current_model
- `augmentation` — Augmentator (imported, usage commented out)
- `glob`, `os`, `shutil` (stdlib)
## Consumers
None (standalone script).
## Data Models
None.
## Configuration
Hardcoded model date: `2025-05-18`.
## External Integrations
Filesystem operations on `/azaion/models/`.
## Security
None.
## Tests
None.
# Module: security
## Purpose
Provides AES-256-CBC encryption/decryption and key derivation functions used to protect model files and API resources in transit.
## Public Interface
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `Security.encrypt_to` | `(input_bytes: bytes, key: str) -> bytes` | IV + ciphertext | AES-256-CBC encrypt with PKCS7 padding; prepends 16-byte random IV |
| `Security.decrypt_to` | `(ciphertext_with_iv_bytes: bytes, key: str) -> bytes` | plaintext bytes | Extracts IV from first 16 bytes, decrypts, removes PKCS7 padding |
| `Security.calc_hash` | `(key: str) -> str` | base64-encoded SHA-384 hash | General-purpose hash function |
| `Security.get_hw_hash` | `(hardware: str) -> str` | base64 hash | Derives a hardware-specific hash using `Azaion_{hardware}_%$$$)0_` salt |
| `Security.get_api_encryption_key` | `(creds, hardware_hash: str) -> str` | base64 hash | Derives API encryption key from credentials + hardware hash |
| `Security.get_model_encryption_key` | `() -> str` | base64 hash | Returns a fixed encryption key derived from a hardcoded secret string |
## Internal Logic
- Encryption: SHA-256 of the key string → 32-byte AES key. Random 16-byte IV generated per encryption. PKCS7 padding applied. Output = IV ∥ ciphertext.
- Decryption: First 16 bytes = IV, remainder = ciphertext. Manual PKCS7 unpadding (reads the pad length from the last byte, which must be in 1–16).
- Key derivation uses SHA-384 + base64 encoding for all hash-based keys.
- `BUFFER_SIZE = 64 * 1024` is declared but unused.
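A minimal sketch of the IV-prepending scheme described above, using the `cryptography` hazmat primitives the module depends on (function names mirror the interface table; the real implementation may differ in detail):

```python
import hashlib
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_to(data: bytes, key: str) -> bytes:
    aes_key = hashlib.sha256(key.encode()).digest()   # 32-byte AES-256 key
    iv = os.urandom(16)                               # fresh random IV per call
    padder = padding.PKCS7(128).padder()
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()   # output = IV || ciphertext

def decrypt_to(blob: bytes, key: str) -> bytes:
    aes_key = hashlib.sha256(key.encode()).digest()
    iv, ciphertext = blob[:16], blob[16:]             # first 16 bytes are the IV
    dec = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).decryptor()
    padded = dec.update(ciphertext) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```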
## Dependencies
- `cryptography.hazmat` (external) — AES cipher, CBC mode, PKCS7 padding
- `hashlib`, `base64`, `os` (stdlib)
## Consumers
api_client, exports, train, start_inference, tests/security_test
## Data Models
None.
## Configuration
None consumed at runtime. Contains hardcoded key material.
## External Integrations
None.
## Security
- **Hardcoded model encryption key**: `get_model_encryption_key()` uses a static string `'-#%@AzaionKey@%#---234sdfklgvhjbnn'`. This is a significant security concern — the key should be stored in a secrets manager or environment variable.
- API encryption key is derived from user credentials + hardware fingerprint, providing per-device uniqueness.
- AES-256-CBC with random IV is cryptographically sound for symmetric encryption.
## Tests
- `tests/security_test.py` — basic round-trip encrypt/decrypt test (script-based, no test framework).
# Module: start_inference
## Purpose
Entry point for running inference on video files using a TensorRT engine. Downloads the encrypted model from the API/CDN, initializes the engine, and processes video.
## Public Interface
| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `get_engine_filename` | `(device_id=0) -> str \| None` | Engine filename | Generates GPU-specific engine filename (duplicate of TensorRTEngine.get_engine_filename) |
`__main__` block: Creates ApiClient, downloads encrypted TensorRT model (split big/small), initializes TensorRTEngine, runs Inference on a test video.
## Internal Logic
- **Model download flow**: ApiClient → `load_big_small_resource` → reassembles from local big part + API-downloaded small part → decrypts with model encryption key → raw engine bytes.
- **Inference setup**: TensorRTEngine initialized from decrypted bytes, Inference configured with confidence_threshold=0.5, iou_threshold=0.3.
- **Video source**: Hardcoded to `tests/ForAI_test.mp4`.
- **get_engine_filename()**: Duplicates `TensorRTEngine.get_engine_filename()` — generates `azaion.cc_{major}.{minor}_sm_{sm_count}.engine` based on CUDA device compute capability and SM count.
## Dependencies
- `constants` — config file paths
- `api_client` — ApiClient, ApiCredentials for model download
- `cdn_manager` — CDNManager, CDNCredentials (imported but CDN managed by api_client)
- `inference/inference` — Inference pipeline
- `inference/tensorrt_engine` — TensorRTEngine
- `security` — model encryption key
- `utils` — Dotdict
- `pycuda.driver` (external) — CUDA device queries
- `yaml` (external)
## Consumers
None (entry point).
## Data Models
None.
## Configuration
- Confidence threshold: 0.5
- IoU threshold: 0.3
- Video path: `tests/ForAI_test.mp4` (hardcoded)
## External Integrations
- Azaion API + CDN for model download
- TensorRT GPU inference
- OpenCV video capture and display
## Security
- Model is downloaded encrypted (split big/small) and decrypted locally
- Uses hardware-bound and model encryption keys
## Tests
None.
# Module: train
## Purpose
Main training pipeline. Forms YOLO datasets from processed annotations, trains YOLOv11 models, and exports/uploads the trained model.
## Public Interface
| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `form_dataset` | `()` | — | Creates train/valid/test split from processed images |
| `copy_annotations` | `(images, folder: str)` | — | Copies image+label pairs to a dataset split folder (concurrent) |
| `check_label` | `(label_path: str) -> bool` | bool | Validates YOLO label file (all coords ≤ 1.0) |
| `create_yaml` | `()` | — | Generates YOLO `data.yaml` with class names from `classes.json` |
| `resume_training` | `(last_pt_path: str)` | — | Resumes training from a checkpoint |
| `train_dataset` | `()` | — | Full pipeline: form_dataset → create_yaml → train YOLOv11 → save model |
| `export_current_model` | `()` | — | Exports current .pt to ONNX, encrypts, uploads as split resource |
## Internal Logic
- **Dataset formation**: Shuffles all processed images, splits 70/20/10 (train/valid/test). Copies in parallel via ThreadPoolExecutor. Corrupted labels (coords > 1.0) are moved to `/azaion/data-corrupted/`.
- **YAML generation**: Reads annotation classes from `classes.json`, builds `data.yaml` with 80 class names (17 actual + 63 placeholders "Class-N"), sets train/valid/test paths.
- **Training**: YOLOv11 medium (`yolo11m.yaml`), 120 epochs, batch=11 (tuned for 24GB VRAM), 1280px input, save every epoch, 24 workers.
- **Post-training**: Copies results to `/azaion/models/{date}/`, removes intermediate epoch checkpoints, copies `best.pt` to `CURRENT_PT_MODEL`.
- **Export**: Calls `export_onnx`, reads the ONNX file, encrypts with model key, uploads via `upload_big_small_resource`.
- **Dataset naming**: `azaion-{YYYY-MM-DD}` using current date.
- **`__main__`**: Runs `train_dataset()` then `export_current_model()`.
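The shuffle-and-split step can be sketched as follows (a simplified helper matching the 70/20/10 ratios above; the real code also copies files and filters corrupted labels):

```python
import random

def split_dataset(images, train=70, valid=20, test=10, seed=None):
    """Shuffle a list of image paths and split it by percentage ratios."""
    assert train + valid + test == 100
    images = list(images)
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train = n * train // 100
    n_valid = n * valid // 100
    return (images[:n_train],
            images[n_train:n_train + n_valid],
            images[n_train + n_valid:])
```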
## Dependencies
- `constants` — all directory/path constants
- `api_client` — ApiClient for model upload
- `cdn_manager` — CDNCredentials, CDNManager (imported but CDN init done via api_client)
- `dto/annotationClass` — AnnotationClass for class name generation
- `inference/onnx_engine` — OnnxEngine (imported but unused in current code)
- `security` — model encryption key
- `utils` — Dotdict
- `exports` — export_tensorrt, upload_model, export_onnx
- `ultralytics` (external) — YOLO training and export
- `yaml`, `concurrent.futures`, `glob`, `os`, `random`, `shutil`, `subprocess`, `datetime`, `pathlib`, `time` (stdlib)
## Consumers
manual_run
## Data Models
Uses AnnotationClass for class definitions.
## Configuration
- Training hyperparameters hardcoded: epochs=120, batch=11, imgsz=1280, save_period=1, workers=24
- Dataset split ratios: train_set=70, valid_set=20, test_set=10
- old_images_percentage=75 (declared but unused)
- DEFAULT_CLASS_NUM=80
## External Integrations
- Ultralytics YOLOv11 training pipeline
- Azaion API + CDN for model upload
- Filesystem: `/azaion/datasets/`, `/azaion/models/`, `/azaion/data-processed/`, `/azaion/data-corrupted/`
## Security
- Trained models are encrypted before upload
- Uses `Security.get_model_encryption_key()` for encryption
## Tests
None.
# Module: utils
## Purpose
Provides a dictionary subclass that supports dot-notation attribute access.
## Public Interface
| Name | Type | Signature |
|------|------|-----------|
| `Dotdict` | class (extends `dict`) | `Dotdict(dict)` |
`Dotdict` overrides `__getattr__`, `__setattr__`, `__delattr__` to delegate to `dict.get`, `dict.__setitem__`, `dict.__delitem__` respectively.
## Internal Logic
Single-class module. Allows `config.url` instead of `config["url"]` for YAML-loaded dicts.
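The delegation described above fits in a few lines (a minimal reconstruction; note that `dict.get` means missing attributes yield `None` rather than raising `AttributeError`):

```python
class Dotdict(dict):
    """dict subclass with attribute-style access for YAML-loaded config."""
    __getattr__ = dict.get           # missing keys return None, not an error
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
```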
## Dependencies
None (stdlib `dict` only).
## Consumers
exports, train, start_inference
## Data Models
None.
## Configuration
None.
## External Integrations
None.
## Security
None.
## Tests
None.
1. Update YOLO to the 26m version.
2. Don't use external augmentation; use YOLO's built-in augmentation instead, passing the additional parameters in the train command, each parameter on its own line with a proper comment.
3. Because of that, the processed folder is no longer needed; just use the data dir.
4. Do not copy the files themselves to the dataset folder; use hard symlinks for that.
5. Unify the constants directories in config: remove annotations-queue/config.yaml and use constants for that.
{
"current_step": "complete",
"completed_steps": ["discovery", "module-analysis", "component-assembly", "system-synthesis", "verification", "solution-extraction", "problem-extraction", "final-report"],
"focus_dir": null,
"modules_total": 21,
"modules_documented": [
"constants", "utils", "security", "hardware_service", "cdn_manager",
"dto/annotationClass", "dto/imageLabel", "inference/dto", "inference/onnx_engine",
"api_client", "augmentation", "inference/tensorrt_engine", "inference/inference",
"exports", "convert-annotations", "dataset-visualiser",
"train", "start_inference",
"manual_run",
"annotation-queue/annotation_queue_dto", "annotation-queue/annotation_queue_handler"
],
"modules_remaining": [],
"module_batch": 7,
"components_written": [
"01_core", "02_security", "03_api_cdn", "04_data_models",
"05_data_pipeline", "06_training", "07_inference", "08_annotation_queue"
],
"last_updated": "2026-03-26T00:00:00Z"
}
# System Flows
## Flow 1: Annotation Ingestion (Annotation Queue → Filesystem)
```mermaid
sequenceDiagram
participant RMQ as RabbitMQ Streams
participant AQH as AnnotationQueueHandler
participant FS as Filesystem
RMQ->>AQH: AMQP message (msgpack)
AQH->>AQH: Decode message, read AnnotationStatus
alt Created / Edited
AQH->>AQH: Parse AnnotationMessage (image + detections)
alt Validator / Admin role
AQH->>FS: Write label → /data/labels/{name}.txt
AQH->>FS: Write image → /data/images/{name}.jpg
else Operator role
AQH->>FS: Write label → /data-seed/labels/{name}.txt
AQH->>FS: Write image → /data-seed/images/{name}.jpg
end
else Validated (bulk)
AQH->>FS: Move images+labels from /data-seed/ → /data/
else Deleted (bulk)
AQH->>FS: Move images+labels → /data_deleted/
end
AQH->>FS: Persist offset to offset.yaml
```
### Data Flow Table
| Step | Input | Output | Component |
|------|-------|--------|-----------|
| Receive | AMQP message (msgpack) | AnnotationMessage / AnnotationBulkMessage | Annotation Queue |
| Route | AnnotationStatus header | Dispatch to save/validate/delete | Annotation Queue |
| Save | Image bytes + detection JSON | .jpg + .txt files on disk | Annotation Queue |
| Track | Message context offset | offset.yaml | Annotation Queue |
---
## Flow 2: Data Augmentation
```mermaid
sequenceDiagram
participant FS as Filesystem (/azaion/data/)
participant AUG as Augmentator
participant PFS as Filesystem (/azaion/data-processed/)
loop Every 5 minutes
AUG->>FS: Scan /data/images/ for unprocessed files
AUG->>AUG: Filter out already-processed images
loop Each unprocessed image (parallel)
AUG->>FS: Read image + labels
AUG->>AUG: Correct bounding boxes (clip + filter)
AUG->>AUG: Generate 7 augmented variants
AUG->>PFS: Write 8 images (original + 7 augmented)
AUG->>PFS: Write 8 label files
end
AUG->>AUG: Sleep 5 minutes
end
```
---
## Flow 3: Training Pipeline
```mermaid
sequenceDiagram
participant PFS as Filesystem (/data-processed/)
participant TRAIN as train.py
participant DS as Filesystem (/datasets/)
participant YOLO as Ultralytics YOLO
participant API as Azaion API
participant CDN as S3 CDN
TRAIN->>PFS: Read all processed images
TRAIN->>TRAIN: Shuffle, split 70/20/10
TRAIN->>DS: Copy to train/valid/test folders
Note over TRAIN: Corrupted labels → /data-corrupted/
TRAIN->>TRAIN: Generate data.yaml (80 class names)
TRAIN->>YOLO: Train yolo11m (120 epochs, batch=11, 1280px)
YOLO-->>TRAIN: Training results + best.pt
TRAIN->>DS: Copy results to /models/{date}/
TRAIN->>TRAIN: Copy best.pt → /models/azaion.pt
TRAIN->>TRAIN: Export .pt → .onnx (1280px, batch=4)
TRAIN->>TRAIN: Read azaion.onnx bytes
TRAIN->>TRAIN: Encrypt with model key (AES-256-CBC)
TRAIN->>TRAIN: Split: small (≤3KB or 20%) + big (rest)
TRAIN->>API: Upload azaion.onnx.small
TRAIN->>CDN: Upload azaion.onnx.big
```
---
## Flow 4: Model Download & Inference
```mermaid
sequenceDiagram
participant INF as start_inference.py
participant API as Azaion API
participant CDN as S3 CDN
participant SEC as Security
participant TRT as TensorRTEngine
participant VID as Video File
participant GUI as OpenCV Window
INF->>INF: Determine GPU-specific engine filename
INF->>SEC: Get model encryption key
INF->>API: Login (JWT)
INF->>API: Download {engine}.small (encrypted)
INF->>INF: Read {engine}.big from local disk
INF->>INF: Reassemble: small + big
INF->>SEC: Decrypt (AES-256-CBC)
INF->>TRT: Initialize engine from bytes
TRT->>TRT: Allocate CUDA memory (input + output)
loop Video frames
INF->>VID: Read frame (every 4th)
INF->>INF: Batch frames to batch_size
INF->>TRT: Preprocess (blob, normalize, resize)
TRT->>TRT: CUDA memcpy host→device
TRT->>TRT: Execute inference (async)
TRT->>TRT: CUDA memcpy device→host
INF->>INF: Postprocess (confidence filter + NMS)
INF->>GUI: Draw bounding boxes + display
end
```
### Data Flow Table
| Step | Input | Output | Component |
|------|-------|--------|-----------|
| Model resolve | GPU compute capability | Engine filename | Inference |
| Download small | API endpoint + JWT | Encrypted small bytes | API & CDN |
| Load big | Local filesystem | Encrypted big bytes | API & CDN |
| Reassemble | small + big bytes | Full encrypted model | API & CDN |
| Decrypt | Encrypted model + key | Raw TensorRT engine | Security |
| Init engine | Engine bytes | CUDA buffers allocated | Inference |
| Preprocess | Video frame | NCHW float32 blob | Inference |
| Inference | Input blob | Raw detection tensor | Inference |
| Postprocess | Raw tensor | List[Detection] | Inference |
| Visualize | Detections + frame | Annotated frame | Inference |
---
## Flow 5: Model Export (Multi-Format)
```mermaid
flowchart LR
PT[azaion.pt] -->|export_onnx| ONNX[azaion.onnx]
PT -->|export_tensorrt| TRT[azaion.engine]
PT -->|export_rknn| RKNN[azaion.rknn]
ONNX -->|encrypt + split| UPLOAD[API + CDN upload]
TRT -->|encrypt + split| UPLOAD
```
| Target Format | Resolution | Batch | Precision | Use Case |
|---------------|-----------|-------|-----------|----------|
| ONNX | 1280px | 4 | FP32 | Cross-platform inference |
| TensorRT | auto | 4 | FP16 | Production GPU inference |
| RKNN | auto | auto | auto | OrangePi5 edge device |
---
## Error Scenarios
| Flow | Error | Handling |
|------|-------|---------|
| Annotation ingestion | Malformed message | Caught by on_message exception handler, logged |
| Annotation ingestion | Queue disconnect | Process exits (no reconnect logic) |
| Augmentation | Corrupted image | Caught per-thread, logged, skipped |
| Augmentation | Transform failure | Caught per-variant, logged, fewer augmentations produced |
| Training | Corrupted label (coords > 1.0) | Moved to /data-corrupted/ |
| Training | Power outage | save_period=1 enables resume_training from last epoch |
| API download | 401/403 | Auto-relogin + retry |
| API download | 500 | Printed, no retry |
| Inference | CUDA error | RuntimeError raised |
| CDN upload/download | Any exception | Caught, printed, returns False |
# Blackbox Test Scenarios
## BT-AUG: Augmentation Pipeline
### BT-AUG-01: Single image produces 8 outputs
- **Input**: 1 image + 1 valid label from fixture dataset
- **Action**: Run `Augmentator.augment_inner()` on the image
- **Expected**: Returns list of exactly 8 ImageLabel objects
- **Traces**: AC: 8× augmentation ratio
### BT-AUG-02: Augmented filenames follow naming convention
- **Input**: Image with stem "test_image"
- **Action**: Run `augment_inner()`
- **Expected**: Output filenames: `test_image.jpg`, `test_image_1.jpg` through `test_image_7.jpg`; matching `.txt` labels
- **Traces**: AC: Augmentation output format
### BT-AUG-03: All output bounding boxes in valid range
- **Input**: 1 image + label with multiple bboxes
- **Action**: Run `augment_inner()`
- **Expected**: Every bbox coordinate in every output label is in [0.0, 1.0]
- **Traces**: AC: Bounding boxes clipped to [0, 1]
### BT-AUG-04: Bounding box correction clips edge bboxes
- **Input**: Label with bbox near edge: `0 0.99 0.5 0.2 0.1`
- **Action**: Run `correct_bboxes()`
- **Expected**: Width reduced so bbox fits within [margin, 1-margin]; no coordinate exceeds bounds
- **Traces**: AC: Bounding boxes clipped to [0, 1]
### BT-AUG-05: Tiny bounding boxes removed after correction
- **Input**: Label with tiny bbox that becomes < 0.01 after clipping
- **Action**: Run `correct_bboxes()`
- **Expected**: Bbox removed from output (area < correct_min_bbox_size)
- **Traces**: AC: Bounding boxes with area < 0.01% discarded
### BT-AUG-06: Empty label produces 8 outputs with empty labels
- **Input**: 1 image + empty label file
- **Action**: Run `augment_inner()`
- **Expected**: 8 ImageLabel objects returned; all have empty labels lists
- **Traces**: AC: Augmentation handles empty annotations
### BT-AUG-07: Full augmentation pipeline (filesystem integration)
- **Input**: 5 images + labels copied to data/ directory in tmp_path
- **Action**: Run `augment_annotations()` with patched paths
- **Expected**: 40 images (5 × 8) in processed images dir; 40 matching labels in processed labels dir
- **Traces**: AC: 8× augmentation, filesystem output
### BT-AUG-08: Augmentation skips already-processed images
- **Input**: 5 images in data/; 3 already present in processed/ dir
- **Action**: Run `augment_annotations()`
- **Expected**: Only 2 new images processed (16 new outputs); existing 3 untouched
- **Traces**: AC: Augmentation processes only unprocessed images
---
## BT-DSF: Dataset Formation
### BT-DSF-01: 70/20/10 split ratio
- **Input**: 100 images + labels in processed/ dir
- **Action**: Run `form_dataset()` with patched paths
- **Expected**: train: 70 images+labels, valid: 20, test: 10
- **Traces**: AC: Dataset split 70/20/10
### BT-DSF-02: Split directories structure
- **Input**: 100 images + labels
- **Action**: Run `form_dataset()`
- **Expected**: Created: `train/images/`, `train/labels/`, `valid/images/`, `valid/labels/`, `test/images/`, `test/labels/`
- **Traces**: AC: YOLO dataset directory structure
### BT-DSF-03: Total files preserved across splits
- **Input**: 100 valid images + labels
- **Action**: Run `form_dataset()`
- **Expected**: `count(train) + count(valid) + count(test) == 100` (no data loss)
- **Traces**: AC: Dataset integrity
### BT-DSF-04: Corrupted labels moved to corrupted directory
- **Input**: 95 valid + 5 corrupted labels (coords > 1.0)
- **Action**: Run `form_dataset()` with patched paths
- **Expected**: 5 images+labels in `data-corrupted/`; 95 across train/valid/test splits
- **Traces**: AC: Corrupted labels filtered
---
## BT-LBL: Label Validation
### BT-LBL-01: Valid label accepted
- **Input**: Label file: `0 0.5 0.5 0.1 0.1`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `True`
- **Traces**: AC: Valid YOLO label format
### BT-LBL-02: Label with x > 1.0 rejected
- **Input**: Label file: `0 1.5 0.5 0.1 0.1`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Corrupted labels detected
### BT-LBL-03: Label with height > 1.0 rejected
- **Input**: Label file: `0 0.5 0.5 0.1 1.2`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Corrupted labels detected
### BT-LBL-04: Missing label file rejected
- **Input**: Non-existent file path
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Missing labels handled
### BT-LBL-05: Multi-line label with one corrupted line
- **Input**: Label file: `0 0.5 0.5 0.1 0.1\n3 0.5 0.5 0.1 1.5`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False` (any corrupted line fails the whole file)
- **Traces**: AC: Corrupted labels detected
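The behaviour the BT-LBL scenarios pin down can be sketched like this (inferred from the scenarios above, not copied from the project's `check_label`):

```python
import os

def check_label(label_path: str) -> bool:
    """Validate a YOLO label file: each line must be a class ID plus four
    normalized coordinates, every coordinate <= 1.0. Missing or malformed
    files, and files with any corrupted line, are rejected."""
    if not os.path.isfile(label_path):
        return False
    with open(label_path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # tolerate blank lines
            try:
                coords = [float(v) for v in parts[1:]]
            except ValueError:
                return False
            if len(coords) != 4 or any(c > 1.0 for c in coords):
                return False
    return True
```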
---
## BT-ENC: Encryption
### BT-ENC-01: Encrypt-decrypt roundtrip (arbitrary data)
- **Input**: 1024 random bytes, key "test-key"
- **Action**: `decrypt_to(encrypt_to(data, key), key)`
- **Expected**: Output equals input bytes exactly
- **Traces**: AC: AES-256-CBC encryption
### BT-ENC-02: Encrypt-decrypt roundtrip (ONNX model)
- **Input**: `azaion.onnx` bytes, model encryption key
- **Action**: `decrypt_to(encrypt_to(model_bytes, key), key)`
- **Expected**: Output equals input bytes exactly
- **Traces**: AC: Model encryption
### BT-ENC-03: Empty input roundtrip
- **Input**: `b""`, key "test-key"
- **Action**: `decrypt_to(encrypt_to(b"", key), key)`
- **Expected**: Output equals `b""`
- **Traces**: AC: Edge case handling
### BT-ENC-04: Single byte roundtrip
- **Input**: `b"\x00"`, key "test-key"
- **Action**: `decrypt_to(encrypt_to(b"\x00", key), key)`
- **Expected**: Output equals `b"\x00"`
- **Traces**: AC: Edge case handling
### BT-ENC-05: Different keys produce different ciphertext
- **Input**: Same 1024 bytes, keys "key-a" and "key-b"
- **Action**: `encrypt_to(data, "key-a")` vs `encrypt_to(data, "key-b")`
- **Expected**: Ciphertexts differ
- **Traces**: AC: Key-dependent encryption
### BT-ENC-06: Wrong key fails decryption
- **Input**: Encrypted with "key-a", decrypt with "key-b"
- **Action**: `decrypt_to(encrypted, "key-b")`
- **Expected**: Output does NOT equal original input
- **Traces**: AC: Key-dependent encryption
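The roundtrip scenarios can be written against standalone equivalents of the assumed `Security.encrypt_to`/`decrypt_to` pair. The sketch below uses the `cryptography` package (already a test dependency); deriving the AES-256 key from the string key via SHA-256 is an assumption about the real implementation:

```python
import hashlib
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_to(data: bytes, key: str) -> bytes:
    # Derive a 32-byte AES-256 key from the string key (assumed scheme);
    # a fresh random IV per call makes output non-deterministic (ST-ENC-01).
    k = hashlib.sha256(key.encode()).digest()
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    body = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(k), modes.CBC(iv)).encryptor()
    return iv + enc.update(body) + enc.finalize()

def decrypt_to(blob: bytes, key: str) -> bytes:
    k = hashlib.sha256(key.encode()).digest()
    iv, body = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(k), modes.CBC(iv)).decryptor()
    padded = dec.update(body) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

BT-ENC-06's "does not equal" phrasing matters here: decrypting with the wrong key may either return garbage or raise a padding error, so the test should accept both outcomes.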
---
## BT-SPL: Model Split Storage
### BT-SPL-01: Split respects size constraint
- **Input**: 10000 encrypted bytes
- **Action**: Split into small + big per `SMALL_SIZE_KB = 3` logic
- **Expected**: small ≤ max(3072 bytes, 20% of total); big = remainder
- **Traces**: AC: Model split ≤3KB or 20%
### BT-SPL-02: Reassembly produces original
- **Input**: 10000 encrypted bytes → split → reassemble
- **Action**: `small + big`
- **Expected**: Equals original encrypted bytes
- **Traces**: AC: Split model integrity
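Both split scenarios can be checked against a sketch of the `SMALL_SIZE_KB = 3` rule; the exact boundary logic (larger of 3 KB or 20%) is an assumption about the real implementation:

```python
SMALL_SIZE_KB = 3

def split_model(blob: bytes) -> tuple[bytes, bytes]:
    # Small part: at most 3 KB or 20% of the total, whichever is larger;
    # big part is the byte-exact remainder, so small + big == blob.
    limit = max(SMALL_SIZE_KB * 1024, len(blob) // 5)
    return blob[:limit], blob[limit:]
```

For the 10000-byte input in BT-SPL-01, 20% is 2000 bytes, so the 3072-byte cap wins and the small part is exactly 3072 bytes.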
---
## BT-CLS: Annotation Class Loading
### BT-CLS-01: Load 17 base classes
- **Input**: `classes.json`
- **Action**: `AnnotationClass.read_json()`
- **Expected**: Dict with 17 unique class entries (base IDs)
- **Traces**: AC: 17 base classes
### BT-CLS-02: Weather mode expansion
- **Input**: `classes.json`
- **Action**: `AnnotationClass.read_json()`
- **Expected**: Same class at offset 0 (Norm), 20 (Wint), 40 (Night); e.g., ID 0, 20, 40 all represent ArmorVehicle
- **Traces**: AC: 3 weather modes
### BT-CLS-03: YAML generation produces 80 class names
- **Input**: `classes.json` + dataset path
- **Action**: `create_yaml()` with patched paths
- **Expected**: data.yaml contains `nc: 80`, 17 named classes + 63 `Class-N` placeholders
- **Traces**: AC: 80 total class slots
---
## BT-HSH: Hardware Hash
### BT-HSH-01: Deterministic output
- **Input**: "test-hardware-info"
- **Action**: `Security.get_hw_hash()` called twice
- **Expected**: Both calls return identical string
- **Traces**: AC: Hardware fingerprinting determinism
### BT-HSH-02: Different inputs produce different hashes
- **Input**: "hw-a" and "hw-b"
- **Action**: `Security.get_hw_hash()` on each
- **Expected**: Results differ
- **Traces**: AC: Hardware-bound uniqueness
### BT-HSH-03: Output is valid base64
- **Input**: "test-hardware-info"
- **Action**: `Security.get_hw_hash()`
- **Expected**: Matches regex `^[A-Za-z0-9+/]+=*$`
- **Traces**: AC: Hash format
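All three checks can be exercised against any hash-then-base64 pipeline; the SHA-256 step below is an assumption about `get_hw_hash`'s internals, but the format and determinism properties hold for any such pipeline:

```python
import base64
import hashlib
import re

B64_RE = re.compile(r"^[A-Za-z0-9+/]+=*$")

def hw_hash(info: bytes) -> str:
    # Assumed shape of get_hw_hash: hash the hardware info, base64-encode it.
    return base64.b64encode(hashlib.sha256(info).digest()).decode()
```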
---
## BT-INF: ONNX Inference
### BT-INF-01: Model loads successfully
- **Input**: `azaion.onnx` bytes
- **Action**: `OnnxEngine(model_bytes)`
- **Expected**: No exception; engine object created with valid input_shape and batch_size
- **Traces**: AC: ONNX inference capability
### BT-INF-02: Inference returns output
- **Input**: ONNX engine + 1 preprocessed image
- **Action**: `engine.run(input_blob)`
- **Expected**: Returns list of numpy arrays; first array has shape [batch, N, 6+]
- **Traces**: AC: ONNX inference produces results
### BT-INF-03: Postprocessing returns valid detections
- **Input**: ONNX engine output from real image
- **Action**: `Inference.postprocess()`
- **Expected**: Returns list of Annotation objects; each Detection has x,y,w,h ∈ [0,1], cls ∈ [0,79], confidence ∈ [0,1]
- **Traces**: AC: Detection format validity
---
## BT-NMS: Overlap Removal
### BT-NMS-01: Overlapping detections — keep higher confidence
- **Input**: 2 Detection objects at same position, confidence 0.9 and 0.5, IoU > 0.3
- **Action**: `remove_overlapping_detections()`
- **Expected**: 1 detection returned (confidence 0.9)
- **Traces**: AC: NMS IoU threshold 0.3
### BT-NMS-02: Non-overlapping detections — keep both
- **Input**: 2 Detection objects at distant positions, IoU < 0.3
- **Action**: `remove_overlapping_detections()`
- **Expected**: 2 detections returned
- **Traces**: AC: NMS preserves non-overlapping
### BT-NMS-03: Chain overlap resolution
- **Input**: 3 Detection objects: A overlaps B (IoU > 0.3), B overlaps C (IoU > 0.3), A doesn't overlap C
- **Action**: `remove_overlapping_detections()`
- **Expected**: ≤ 2 detections; highest confidence per overlapping pair kept
- **Traces**: AC: NMS handles chains
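The three NMS scenarios need only a `Detection` carrier and an IoU helper. A self-contained greedy sketch (the real `remove_overlapping_detections` may differ; this illustrates the keep-highest-confidence behaviour the tests assert):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    x: float      # centre x, normalized
    y: float      # centre y, normalized
    w: float
    h: float
    cls: int
    confidence: float

def iou(a: Detection, b: Detection) -> float:
    ax1, ay1, ax2, ay2 = a.x - a.w / 2, a.y - a.h / 2, a.x + a.w / 2, a.y + a.h / 2
    bx1, by1, bx2, by2 = b.x - b.w / 2, b.y - b.h / 2, b.x + b.w / 2, b.y + b.h / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union else 0.0

def remove_overlapping_detections(dets, threshold=0.3):
    """Greedy NMS sketch: keep highest-confidence first, drop any
    later detection overlapping a kept one above the threshold."""
    kept = []
    for d in sorted(dets, key=lambda d: d.confidence, reverse=True):
        if all(iou(d, k) <= threshold for k in kept):
            kept.append(d)
    return kept
```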
---
## BT-AQM: Annotation Queue Message Parsing
### BT-AQM-01: Parse Created annotation message
- **Input**: Msgpack bytes matching AnnotationMessage schema (status=Created, role=Validator)
- **Action**: Decode and construct AnnotationMessage
- **Expected**: All fields populated: name, detections, image bytes, status == "Created", role == "Validator"
- **Traces**: AC: Annotation message parsing
### BT-AQM-02: Parse Validated bulk message
- **Input**: Msgpack bytes with status=Validated, list of names
- **Action**: Decode and construct AnnotationBulkMessage
- **Expected**: Status == "Validated", names list matches input
- **Traces**: AC: Bulk validation parsing
### BT-AQM-03: Parse Deleted bulk message
- **Input**: Msgpack bytes with status=Deleted, list of names
- **Action**: Decode and construct AnnotationBulkMessage
- **Expected**: Status == "Deleted", names list matches input
- **Traces**: AC: Bulk deletion parsing
### BT-AQM-04: Malformed message raises exception
- **Input**: Invalid msgpack bytes
- **Action**: Attempt to decode
- **Expected**: Exception raised
- **Traces**: AC: Error handling for malformed messages
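Building the fixture bytes for these scenarios is straightforward with `msgpack`. Field names below are an assumed layout; the real `AnnotationMessage` schema is defined in the codebase and may differ:

```python
import msgpack

def build_annotation_message(name: str, detections: list, image: bytes,
                             status: str, role: str) -> bytes:
    # Hypothetical field layout for an AnnotationMessage-compatible payload.
    return msgpack.packb(
        {"name": name, "detections": detections, "image": image,
         "status": status, "role": role},
        use_bin_type=True,
    )

def parse_annotation_message(raw: bytes) -> dict:
    # unpackb raises on malformed input, which is BT-AQM-04's expected path.
    return msgpack.unpackb(raw, raw=False)
```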

@@ -0,0 +1,71 @@
# Test Environment
## Runtime Requirements
| Requirement | Specification |
|-------------|--------------|
| Python | 3.10+ |
| OS | Linux or macOS (POSIX filesystem paths) |
| GPU | Optional — ONNX inference falls back to CPUExecutionProvider |
| Disk | Temp directory for fixture data (~500MB for augmentation output) |
| Network | Not required (all tests are offline) |
## Execution Modes
Tests MUST be runnable in two ways:
### 1. Local (no Docker) — primary mode
Run directly on the host machine. Required for macOS development where Docker has GPU/performance limitations.
```bash
scripts/run-tests-local.sh
```
### 2. Docker — CI/portable mode
Run inside a container for reproducible CI environments (Linux-based CI runners).
```bash
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit
```
Both modes run the same pytest suite; the only difference is the runtime environment.
## Dependencies
All test dependencies are a subset of the production `requirements.txt` plus pytest:
| Package | Purpose |
|---------|---------|
| pytest | Test runner |
| albumentations | Augmentation tests |
| opencv-python-headless | Image I/O (headless — no GUI) |
| numpy | Array operations |
| onnxruntime | ONNX inference (CPU fallback) |
| cryptography | Encryption tests |
| msgpack | Annotation queue message tests |
| PyYAML | Config/YAML generation tests |
## Fixture Data
| Fixture | Location | Size |
|---------|----------|------|
| 100 annotated images | `_docs/00_problem/input_data/dataset/images/` | ~50MB |
| 100 YOLO labels | `_docs/00_problem/input_data/dataset/labels/` | ~10KB |
| ONNX model | `_docs/00_problem/input_data/azaion.onnx` | 81MB |
| Class definitions | `classes.json` (project root) | 2KB |
## Test Isolation
- Each test creates a temporary directory (via `tmp_path` pytest fixture) for filesystem operations
- No tests modify the actual `/azaion/` directory structure
- No tests require running external services (RabbitMQ, Azaion API, S3 CDN)
- Constants paths are patched/overridden to point to temp directories during tests
## Excluded (Require External Services)
| Component | Service Required | Reason for Exclusion |
|-----------|-----------------|---------------------|
| API upload/download | Azaion REST API | No mock server; real API has auth |
| CDN upload/download | S3-compatible CDN | No mock S3; real CDN has credentials |
| Queue consumption | RabbitMQ Streams | No mock broker; rstream requires live connection |
| TensorRT inference | NVIDIA GPU + TensorRT | Hardware-specific; cannot run in CI without GPU |
@@ -0,0 +1,33 @@
# Performance Test Scenarios
## PT-AUG-01: Augmentation throughput
- **Input**: 10 images from fixture dataset
- **Action**: Run `augment_annotations()`, measure wall time
- **Expected**: Completes within 60 seconds (10 images × 8 outputs = 80 files)
- **Traces**: Restriction: Augmentation runs continuously
- **Note**: Threshold is generous; actual performance depends on CPU
## PT-AUG-02: Parallel augmentation speedup
- **Input**: 10 images from fixture dataset
- **Action**: Run with ThreadPoolExecutor vs sequential, compare times
- **Expected**: Parallel is ≥ 1.5× faster than sequential
- **Traces**: AC: Parallelized per-image processing
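PT-AUG-02's measurement harness can be sketched as below. `augment_one` is a sleeping stand-in for the real per-image work: threads only pay off when that work releases the GIL, as OpenCV and numpy calls do, so a pure-Python CPU loop would not show the speedup:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def augment_one(path: str) -> str:
    time.sleep(0.05)  # stand-in for GIL-releasing image work
    return f"{path}:done"

def run_sequential(paths):
    return [augment_one(p) for p in paths]

def run_parallel(paths, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(augment_one, paths))

paths = [f"img_{i}.jpg" for i in range(10)]
t0 = time.perf_counter(); seq = run_sequential(paths); t_seq = time.perf_counter() - t0
t0 = time.perf_counter(); par = run_parallel(paths);   t_par = time.perf_counter() - t0
assert sorted(seq) == sorted(par)       # same outputs either way
assert t_seq >= 1.5 * t_par             # the PT-AUG-02 assertion shape
```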
## PT-DSF-01: Dataset formation throughput
- **Input**: 100 images + labels
- **Action**: Run `form_dataset()`, measure wall time
- **Expected**: Completes within 30 seconds
- **Traces**: Restriction: Dataset formation before training
## PT-ENC-01: Encryption throughput
- **Input**: 10MB random bytes
- **Action**: Encrypt + decrypt roundtrip, measure wall time
- **Expected**: Completes within 5 seconds
- **Traces**: AC: Model encryption feasible for large models
## PT-INF-01: ONNX inference latency (single image)
- **Input**: 1 preprocessed image + ONNX model
- **Action**: Run single inference, measure wall time
- **Expected**: Completes within 10 seconds on CPU (no GPU requirement for test)
- **Traces**: AC: Inference capability
- **Note**: Production uses GPU; CPU is slower but validates correctness
@@ -0,0 +1,37 @@
# Resilience Test Scenarios
## RT-AUG-01: Augmentation handles corrupted image gracefully
- **Input**: 1 valid image + 1 corrupted image file (truncated JPEG) in data/ dir
- **Action**: Run `augment_annotations()`
- **Expected**: Valid image produces 8 outputs; corrupted image skipped without crashing pipeline; total output: 8 files
- **Traces**: Restriction: Augmentation exception handling per-image
## RT-AUG-02: Augmentation handles missing label file
- **Input**: 1 image with no matching label file
- **Action**: Run `augment_annotation()` on the image
- **Expected**: Exception caught per-thread; does not crash pipeline
- **Traces**: Restriction: Augmentation exception handling
## RT-AUG-03: Augmentation transform failure produces fewer variants
- **Input**: 1 image + label that causes some transforms to fail (extremely narrow bbox)
- **Action**: Run `augment_inner()`
- **Expected**: Returns 1-8 ImageLabel objects (original always present; failed variants skipped); no crash
- **Traces**: Restriction: Transform failure handling
## RT-DSF-01: Dataset formation with empty processed directory
- **Input**: Empty processed images dir
- **Action**: Run `form_dataset()`
- **Expected**: Creates empty train/valid/test directories; no crash
- **Traces**: Restriction: Edge case handling
## RT-ENC-01: Decrypt with corrupted ciphertext
- **Input**: Randomly modified ciphertext bytes
- **Action**: `Security.decrypt_to(corrupted_bytes, key)`
- **Expected**: Either raises exception or returns garbage bytes (not original)
- **Traces**: AC: Encryption integrity
## RT-AQM-01: Malformed msgpack message
- **Input**: Random bytes that aren't valid msgpack
- **Action**: Pass to message handler
- **Expected**: Exception caught; handler doesn't crash
- **Traces**: AC: Error handling for malformed messages
@@ -0,0 +1,31 @@
# Resource Limit Test Scenarios
## RL-AUG-01: Augmentation output count bounded
- **Input**: 1 image
- **Action**: Run `augment_inner()`
- **Expected**: Returns exactly 8 outputs (never more, even with retries)
- **Traces**: AC: 8× augmentation ratio (1 original + 7 augmented)
## RL-DSF-01: Dataset split ratios sum to 100%
- **Input**: Any number of images
- **Action**: Check `train_set + valid_set + test_set`
- **Expected**: Equals 100
- **Traces**: AC: 70/20/10 split
## RL-DSF-02: No data duplication across splits
- **Input**: 100 images
- **Action**: Run `form_dataset()`, collect all filenames across train/valid/test
- **Expected**: No filename appears in more than one split
- **Traces**: AC: Dataset integrity
## RL-ENC-01: Encrypted output size bounded
- **Input**: N bytes plaintext
- **Action**: Encrypt
- **Expected**: Ciphertext size ≤ N + 32 bytes (16 IV + up to 16 padding)
- **Traces**: Restriction: AES-256-CBC overhead
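The RL-ENC-01 bound follows from the CBC layout: a 16-byte IV plus the plaintext rounded up to the next 16-byte block, where PKCS7 always adds 1-16 padding bytes. A quick arithmetic check of the bound:

```python
def expected_ciphertext_size(n: int) -> int:
    """IV (16 bytes) + plaintext padded up to the next full 16-byte
    block; PKCS7 always adds between 1 and 16 padding bytes."""
    padded = (n // 16 + 1) * 16
    return 16 + padded

# The worst case is n divisible by 16: a full extra padding block.
for n in range(4096):
    assert expected_ciphertext_size(n) <= n + 32
```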
## RL-CLS-01: Total class count is exactly 80
- **Input**: `classes.json`
- **Action**: Generate class list for YAML
- **Expected**: Exactly 80 entries (17 named × 3 weather + 29 placeholders = 80)
- **Traces**: AC: 80 total class slots
@@ -0,0 +1,43 @@
# Security Test Scenarios
## ST-ENC-01: Encryption produces different ciphertext each time (random IV)
- **Input**: Same 1024 bytes, same key, encrypt twice
- **Action**: Compare two ciphertexts
- **Expected**: Ciphertexts differ (random IV ensures non-deterministic output)
- **Traces**: AC: AES-256-CBC with random IV
## ST-ENC-02: Wrong key cannot recover plaintext
- **Input**: Encrypt with "key-a", attempt decrypt with "key-b"
- **Action**: `Security.decrypt_to(encrypted, "key-b")`
- **Expected**: Output != original plaintext
- **Traces**: AC: Key-dependent encryption
## ST-ENC-03: Model encryption key is deterministic
- **Input**: Call `Security.get_model_encryption_key()` twice
- **Action**: Compare results
- **Expected**: Identical strings
- **Traces**: AC: Static model encryption key
## ST-HSH-01: Hardware hash is deterministic for same input
- **Input**: Same hardware info string
- **Action**: `Security.get_hw_hash()` called twice
- **Expected**: Identical output
- **Traces**: AC: Hardware fingerprinting determinism
## ST-HSH-02: Different hardware produces different hash
- **Input**: Two different hardware info strings
- **Action**: `Security.get_hw_hash()` on each
- **Expected**: Different outputs
- **Traces**: AC: Hardware-bound uniqueness
## ST-HSH-03: API encryption key depends on credentials + hardware
- **Input**: Same credentials with different hardware hashes
- **Action**: `Security.get_api_encryption_key()` for each
- **Expected**: Different keys
- **Traces**: AC: Hardware-bound API encryption
## ST-HSH-04: API encryption key depends on credentials
- **Input**: Different credentials with same hardware hash
- **Action**: `Security.get_api_encryption_key()` for each
- **Expected**: Different keys
- **Traces**: AC: Credential-dependent API encryption
@@ -0,0 +1,26 @@
# Test Data Management
## Fixture Sources
| ID | Data Item | Source | Format | Preparation |
|----|-----------|--------|--------|-------------|
| FD-01 | Annotated images (100) | `_docs/00_problem/input_data/dataset/images/` | JPEG | Copy subset to tmp_path at test start |
| FD-02 | YOLO labels (100) | `_docs/00_problem/input_data/dataset/labels/` | TXT | Copy subset to tmp_path at test start |
| FD-03 | ONNX model | `_docs/00_problem/input_data/azaion.onnx` | ONNX | Read bytes at test start |
| FD-04 | Class definitions | `classes.json` (project root) | JSON | Copy to tmp_path at test start |
| FD-05 | Corrupted labels | Generated at test time | TXT | Create labels with coords > 1.0 |
| FD-06 | Edge-case bboxes | Generated at test time | In-memory | Construct bboxes near image boundaries |
| FD-07 | Detection objects | Generated at test time | In-memory | Construct Detection instances for NMS tests |
| FD-08 | Msgpack messages | Generated at test time | bytes | Construct AnnotationMessage-compatible msgpack |
| FD-09 | Random binary data | Generated at test time | bytes | `os.urandom(N)` for encryption tests |
| FD-10 | Empty label file | Generated at test time | TXT | Empty file for augmentation edge case |
## Data Lifecycle
1. **Setup**: pytest `conftest.py` copies fixture files to `tmp_path`
2. **Execution**: Tests operate on copied data in isolation
3. **Teardown**: `tmp_path` is automatically cleaned by pytest
## Expected Results Location
All expected results are defined in `_docs/00_problem/input_data/expected_results/results_report.md` (37 test scenarios mapped).
@@ -0,0 +1,67 @@
# Traceability Matrix
## Acceptance Criteria Coverage
| AC / Restriction | Test IDs | Coverage |
|------------------|----------|----------|
| 8× augmentation ratio | BT-AUG-01, BT-AUG-06, BT-AUG-07, RL-AUG-01 | Full |
| Augmentation naming convention | BT-AUG-02 | Full |
| Bounding boxes clipped to [0,1] | BT-AUG-03, BT-AUG-04 | Full |
| Tiny bboxes (< 0.01) discarded | BT-AUG-05 | Full |
| Augmentation skips already-processed | BT-AUG-08 | Full |
| Augmentation parallelized | PT-AUG-02 | Full |
| Augmentation handles corrupted images | RT-AUG-01 | Full |
| Augmentation handles missing labels | RT-AUG-02 | Full |
| Transform failure graceful | RT-AUG-03 | Full |
| Dataset split 70/20/10 | BT-DSF-01, RL-DSF-01 | Full |
| Dataset directory structure | BT-DSF-02 | Full |
| Dataset integrity (no data loss) | BT-DSF-03, RL-DSF-02 | Full |
| Corrupted label filtering | BT-DSF-04, BT-LBL-01 to BT-LBL-05 | Full |
| AES-256-CBC encryption | BT-ENC-01 to BT-ENC-06, ST-ENC-01, ST-ENC-02 | Full |
| Model encryption roundtrip | BT-ENC-02 | Full |
| Model split ≤3KB or 20% | BT-SPL-01, BT-SPL-02 | Full |
| 17 base classes | BT-CLS-01 | Full |
| 3 weather modes (Norm/Wint/Night) | BT-CLS-02 | Full |
| 80 total class slots | BT-CLS-03, RL-CLS-01 | Full |
| YAML generation (nc: 80) | BT-CLS-03 | Full |
| Hardware hash determinism | BT-HSH-01 to BT-HSH-03, ST-HSH-01, ST-HSH-02 | Full |
| Hardware-bound API encryption | ST-HSH-03, ST-HSH-04 | Full |
| ONNX inference loads model | BT-INF-01 | Full |
| ONNX inference returns detections | BT-INF-02, BT-INF-03 | Full |
| NMS overlap removal (IoU 0.3) | BT-NMS-01, BT-NMS-02, BT-NMS-03 | Full |
| Annotation message parsing | BT-AQM-01 to BT-AQM-04, RT-AQM-01 | Full |
| Encryption size overhead bounded | RL-ENC-01 | Full |
| Static model encryption key | ST-ENC-03 | Full |
| Random IV per encryption | ST-ENC-01 | Full |
## Uncovered (Require External Services)
| AC / Restriction | Reason |
|------------------|--------|
| TensorRT inference (54s for 200s video) | Requires NVIDIA GPU + TensorRT runtime |
| API upload/download with JWT auth | Requires live Azaion API |
| CDN upload/download (S3) | Requires live S3-compatible CDN |
| Queue offset persistence | Requires live RabbitMQ Streams |
| Auto-relogin on 401/403 | Requires live Azaion API |
| Frame sampling every 4th frame | Requires video file (fixture not provided) |
| Confidence threshold 0.3 filtering | Partially covered by BT-INF-03 (validates range, not exact threshold) |
## Summary
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | 36 |
| Covered by tests | 29 |
| Uncovered (external deps) | 7 |
| **Coverage** | **80.6%** |
## Test Count Summary
| Category | Count |
|----------|-------|
| Blackbox tests | 32 |
| Performance tests | 5 |
| Resilience tests | 6 |
| Security tests | 7 |
| Resource limit tests | 5 |
| **Total** | **55** |
@@ -0,0 +1,137 @@
# Test Infrastructure
**Task**: AZ-152_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the test project — pytest configuration, fixtures, conftest, test data management, Docker test environment
**Complexity**: 3 points
**Dependencies**: None
**Component**: Blackbox Tests
**Jira**: AZ-152
**Epic**: AZ-151
## Test Project Folder Layout
```
tests/
├── conftest.py
├── test_augmentation.py
├── test_dataset_formation.py
├── test_label_validation.py
├── test_encryption.py
├── test_model_split.py
├── test_annotation_classes.py
├── test_hardware_hash.py
├── test_onnx_inference.py
├── test_nms.py
├── test_annotation_queue.py
├── performance/
│ ├── conftest.py
│ ├── test_augmentation_perf.py
│ ├── test_dataset_perf.py
│ ├── test_encryption_perf.py
│ └── test_inference_perf.py
├── resilience/
│ └── (resilience tests embedded in main test files via markers)
├── security/
│ └── (security tests embedded in main test files via markers)
└── resource_limits/
└── (resource limit tests embedded in main test files via markers)
```
### Layout Rationale
Flat test file structure per functional area matches the existing codebase module layout. Performance tests are separated into a subdirectory so they can be run independently (slower, threshold-based). Resilience, security, and resource limit tests use pytest markers (`@pytest.mark.resilience`, `@pytest.mark.security`, `@pytest.mark.resource_limit`) within the main test files to avoid unnecessary file proliferation while allowing selective execution.
## Mock Services
No mock services required. All 55 test scenarios operate offline against local code modules. External services (Azaion API, S3 CDN, RabbitMQ Streams, TensorRT) are excluded from the test scope per user decision.
## Docker Test Environment
### docker-compose.test.yml Structure
| Service | Image / Build | Purpose | Depends On |
|---------|--------------|---------|------------|
| test-runner | Build from `Dockerfile.test` | Runs pytest suite | — |
Single-container setup: the system under test is a Python library (not a service), so tests import modules directly. No network services required.
### Volumes
| Volume Mount | Purpose |
|-------------|---------|
| `./test-results:/app/test-results` | JUnit XML output for CI parsing |
| `./_docs/00_problem/input_data:/app/_docs/00_problem/input_data:ro` | Fixture images, labels, ONNX model (read-only) |
## Test Runner Configuration
**Framework**: pytest
**Plugins**: pytest (built-in JUnit XML via `--junitxml`)
**Entry point (local)**: `scripts/run-tests-local.sh`
**Entry point (Docker)**: `docker compose -f docker-compose.test.yml up --build --abort-on-container-exit`
### Fixture Strategy
| Fixture | Scope | Purpose |
|---------|-------|---------|
| `fixture_images_dir` | session | Path to 100 JPEG images from `_docs/00_problem/input_data/dataset/images/` |
| `fixture_labels_dir` | session | Path to 100 YOLO labels from `_docs/00_problem/input_data/dataset/labels/` |
| `fixture_onnx_model` | session | Bytes of `_docs/00_problem/input_data/azaion.onnx` |
| `fixture_classes_json` | session | Path to `classes.json` |
| `work_dir` | function | `tmp_path` based working directory for filesystem tests |
| `sample_image_label` | function | Copies 1 image + label to `tmp_path` |
| `sample_images_labels` | function | Copies N images + labels to `tmp_path` (parameterizable) |
| `corrupted_label` | function | Generates a label file with coords > 1.0 in `tmp_path` |
| `edge_bbox_label` | function | Generates a label with bbox near image edge in `tmp_path` |
| `empty_label` | function | Generates an empty label file in `tmp_path` |
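A `conftest.py` sketch of the copy-based fixtures above. The `copy_samples` helper and the fixture bodies are hypothetical; they illustrate the copy-not-symlink isolation rule rather than the final implementation:

```python
import shutil
from pathlib import Path

import pytest

FIXTURE_ROOT = Path("_docs/00_problem/input_data/dataset")  # assumed location

def copy_samples(src_images: Path, src_labels: Path, dest: Path, n: int) -> list[Path]:
    """Copy the first n image/label pairs into an isolated work dir."""
    (dest / "images").mkdir(parents=True, exist_ok=True)
    (dest / "labels").mkdir(parents=True, exist_ok=True)
    copied = []
    for img in sorted(src_images.glob("*.jpg"))[:n]:
        copied.append(Path(shutil.copy(img, dest / "images" / img.name)))
        label = src_labels / f"{img.stem}.txt"
        if label.exists():
            shutil.copy(label, dest / "labels" / label.name)
    return copied

@pytest.fixture(scope="session")
def fixture_images_dir() -> Path:
    return FIXTURE_ROOT / "images"

@pytest.fixture
def sample_images_labels(fixture_images_dir, tmp_path):
    copy_samples(fixture_images_dir, FIXTURE_ROOT / "labels", tmp_path, n=5)
    return tmp_path
```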
## Test Data Fixtures
| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| 100 annotated images | `_docs/00_problem/input_data/dataset/images/` | JPEG | Augmentation, dataset formation, inference |
| 100 YOLO labels | `_docs/00_problem/input_data/dataset/labels/` | TXT | Augmentation, dataset formation, label validation |
| ONNX model (81MB) | `_docs/00_problem/input_data/azaion.onnx` | ONNX | Encryption roundtrip, inference |
| Class definitions | `classes.json` (project root) | JSON | Annotation class loading, YAML generation |
| Corrupted labels | Generated at test time | TXT | Label validation, dataset formation |
| Edge-case bboxes | Generated at test time | In-memory | Augmentation bbox correction |
| Detection objects | Generated at test time | In-memory | NMS overlap removal |
| Msgpack messages | Generated at test time | bytes | Annotation queue parsing |
| Random binary data | Generated at test time (`os.urandom`) | bytes | Encryption tests |
| Empty label files | Generated at test time | TXT | Augmentation edge case |
### Data Isolation
Each test function receives an isolated `tmp_path` directory. Fixture files are copied (not symlinked) to `tmp_path` to prevent cross-test interference. Session-scoped fixtures (image dir, model bytes) are read-only references. No test modifies the source fixture data.
## Test Reporting
**Format**: JUnit XML
**Output path**: `test-results/test-results.xml` (local), `/app/test-results/test-results.xml` (Docker)
**CI integration**: Standard JUnit XML parseable by GitHub Actions, Azure Pipelines, GitLab CI
## Constants Patching Strategy
The production code uses hardcoded paths from `constants.py` (e.g., `/azaion/data/`). Tests must override these paths to point to `tmp_path` directories. Strategy: use `monkeypatch` or `unittest.mock.patch` to override `constants.*` module attributes at test function scope.
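A sketch of that strategy as a shared helper; the attribute names are illustrative, since the real `constants.py` defines its own set of path variables:

```python
def patch_constants_paths(constants_module, tmp_path, monkeypatch):
    """Redirect path attributes on the constants module into tmp_path.
    Attribute names here are illustrative, not the real ones."""
    for name in ("data_dir", "processed_dir", "dataset_dir"):
        monkeypatch.setattr(constants_module, name, tmp_path / name, raising=False)
```

`raising=False` lets the helper tolerate attributes that do not exist yet, and `monkeypatch` restores (or removes) every patched attribute when the test function ends, so no state leaks between tests.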
## Acceptance Criteria
**AC-1: Local test runner works**
Given requirements-test.txt is installed
When `scripts/run-tests-local.sh` is executed
Then pytest discovers and runs tests, produces JUnit XML in `test-results/`
**AC-2: Docker test runner works**
Given Dockerfile.test and docker-compose.test.yml exist
When `docker compose -f docker-compose.test.yml up --build` is executed
Then test-runner container runs all tests, JUnit XML is written to mounted `test-results/` volume
**AC-3: Fixtures provide test data**
Given conftest.py defines session and function-scoped fixtures
When a test requests `fixture_images_dir`
Then it receives a valid path to 100 JPEG images
**AC-4: Constants are properly patched**
Given a test patches `constants.data_dir` to `tmp_path`
When the test runs augmentation or dataset formation
Then all file operations target `tmp_path`, not `/azaion/`
@@ -0,0 +1,83 @@
# Augmentation Blackbox Tests
**Task**: AZ-153_test_augmentation
**Name**: Augmentation Blackbox Tests
**Description**: Implement 8 blackbox tests for the augmentation pipeline — output count, naming, bbox validation, edge cases, filesystem integration
**Complexity**: 3 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-153
**Epic**: AZ-151
## Problem
The augmentation pipeline transforms annotated images into 8 variants each. Tests must verify output count, naming conventions, bounding box validity, edge cases, and filesystem integration without referencing internals.
## Outcome
- 8 passing pytest tests in `tests/test_augmentation.py`
- Covers: single-image augmentation, naming convention, bbox range, bbox clipping, tiny bbox removal, empty labels, full pipeline, skip-already-processed
## Scope
### Included
- BT-AUG-01: Single image → 8 outputs
- BT-AUG-02: Augmented filenames follow naming convention
- BT-AUG-03: All output bounding boxes in valid range [0,1]
- BT-AUG-04: Bounding box correction clips edge bboxes
- BT-AUG-05: Tiny bounding boxes removed after correction
- BT-AUG-06: Empty label produces 8 outputs with empty labels
- BT-AUG-07: Full augmentation pipeline (filesystem, 5 images → 40 outputs)
- BT-AUG-08: Augmentation skips already-processed images
### Excluded
- Performance tests (separate task)
- Resilience tests (separate task)
## Acceptance Criteria
**AC-1: Output count**
Given 1 image + 1 valid label
When augment_inner() runs
Then exactly 8 ImageLabel objects are returned
**AC-2: Naming convention**
Given image with stem "test_image"
When augment_inner() runs
Then outputs named test_image.jpg, test_image_1.jpg through test_image_7.jpg with matching .txt labels
**AC-3: Bbox validity**
Given 1 image + label with multiple bboxes
When augment_inner() runs
Then every bbox coordinate in every output is in [0.0, 1.0]
**AC-4: Edge bbox clipping**
Given label with bbox near edge (x=0.99, w=0.2)
When correct_bboxes() runs
Then the width is reduced to fit; all coordinates stay within [margin, 1 - margin]
**AC-5: Tiny bbox removal**
Given label with bbox that becomes < 0.01 area after clipping
When correct_bboxes() runs
Then bbox is removed from output
**AC-6: Empty label**
Given 1 image + empty label file
When augment_inner() runs
Then 8 ImageLabel objects returned, all with empty labels lists
**AC-7: Full pipeline**
Given 5 images + labels in data/ directory
When augment_annotations() runs with patched paths
Then 40 images in processed images dir, 40 matching labels
**AC-8: Skip already-processed**
Given 5 images in data/, 3 already in processed/
When augment_annotations() runs
Then only 2 new images processed (16 new outputs), existing 3 untouched
## Constraints
- Must patch constants.py paths to use tmp_path
- Fixture images from _docs/00_problem/input_data/dataset/
- Each test operates in isolated tmp_path
@@ -0,0 +1,72 @@
# Augmentation Performance, Resilience & Resource Tests
**Task**: AZ-154_test_augmentation_nonfunc
**Name**: Augmentation Non-Functional Tests
**Description**: Implement performance, resilience, and resource limit tests for augmentation — throughput, parallel speedup, error handling, output bounds
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-154
**Epic**: AZ-151
## Problem
Augmentation must perform within time thresholds, handle corrupted/missing inputs gracefully, and respect output count bounds.
## Outcome
- 6 passing pytest tests across performance and resilience categories
- Performance tests in `tests/performance/test_augmentation_perf.py`
- Resilience and resource limit tests in `tests/test_augmentation.py` with markers
## Scope
### Included
- PT-AUG-01: Augmentation throughput (10 images ≤ 60s)
- PT-AUG-02: Parallel augmentation speedup (≥ 1.5× faster)
- RT-AUG-01: Handles corrupted image gracefully
- RT-AUG-02: Handles missing label file
- RT-AUG-03: Transform failure produces fewer variants (no crash)
- RL-AUG-01: Output count bounded to exactly 8
### Excluded
- Blackbox functional tests (separate task 02)
## Acceptance Criteria
**AC-1: Throughput**
Given 10 images from fixture dataset
When augment_annotations() runs
Then completes within 60 seconds
**AC-2: Parallel speedup**
Given 10 images from fixture dataset
When run with ThreadPoolExecutor vs sequential
Then parallel is ≥ 1.5× faster
**AC-3: Corrupted image**
Given 1 valid + 1 corrupted image (truncated JPEG)
When augment_annotations() runs
Then valid image produces 8 outputs, corrupted skipped, no crash
**AC-4: Missing label**
Given 1 image with no matching label file
When augment_annotation() runs on it
Then exception caught per-thread, pipeline continues
**AC-5: Transform failure**
Given 1 image + label with extremely narrow bbox
When augment_inner() runs
Then 1-8 ImageLabel objects returned, no crash
**AC-6: Output count bounded**
Given 1 image
When augment_inner() runs
Then exactly 8 outputs returned (never more)
## Constraints
- Performance tests require pytest markers: `@pytest.mark.performance`
- Resilience tests marked: `@pytest.mark.resilience`
- Resource limit tests marked: `@pytest.mark.resource_limit`
- Performance thresholds are generous (CPU-bound, no GPU requirement)
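To keep `pytest --strict-markers` happy (and selective runs like `pytest -m "not performance"` warning-free), the custom markers need registering; a `conftest.py` sketch:

```python
# conftest.py — register the custom marker names used across the suite
# so pytest does not warn about (or, with --strict-markers, reject) them.
def pytest_configure(config):
    for marker in ("performance", "resilience", "resource_limit", "security"):
        config.addinivalue_line("markers", f"{marker}: {marker} test category")
```

The same lines could instead live under `markers =` in `pytest.ini`; the hook form keeps everything in one file.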
@@ -0,0 +1,78 @@
# Dataset Formation Tests
**Task**: AZ-155_test_dataset_formation
**Name**: Dataset Formation Tests
**Description**: Implement blackbox, performance, resilience, and resource tests for dataset split — 70/20/10 ratio, directory structure, integrity, corrupted filtering
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-155
**Epic**: AZ-151
## Problem
Dataset formation splits annotated images into train/valid/test sets. Tests must verify correct ratios, directory structure, data integrity, corrupted label filtering, and performance.
## Outcome
- 8 passing pytest tests covering dataset formation
- Blackbox tests in `tests/test_dataset_formation.py`
- Performance test in `tests/performance/test_dataset_perf.py`
## Scope
### Included
- BT-DSF-01: 70/20/10 split ratio (100 images → 70/20/10)
- BT-DSF-02: Split directory structure (6 subdirs created)
- BT-DSF-03: Total files preserved (sum == 100)
- BT-DSF-04: Corrupted labels moved to corrupted directory
- PT-DSF-01: Dataset formation throughput (100 images ≤ 30s)
- RT-DSF-01: Empty processed directory handled gracefully
- RL-DSF-01: Split ratios sum to 100%
- RL-DSF-02: No data duplication across splits
### Excluded
- Label validation (separate task)
## Acceptance Criteria
**AC-1: Split ratio**
Given 100 images + labels in processed/ dir
When form_dataset() runs with patched paths
Then train: 70, valid: 20, test: 10
**AC-2: Directory structure**
Given 100 images + labels
When form_dataset() runs
Then creates train/images/, train/labels/, valid/images/, valid/labels/, test/images/, test/labels/
**AC-3: Data integrity**
Given 100 valid images + labels
When form_dataset() runs
Then count(train) + count(valid) + count(test) == 100
**AC-4: Corrupted filtering**
Given 95 valid + 5 corrupted labels
When form_dataset() runs
Then 5 in data-corrupted/, 95 across splits
**AC-5: Throughput**
Given 100 images + labels
When form_dataset() runs
Then completes within 30 seconds
**AC-6: Empty directory**
Given empty processed images dir
When form_dataset() runs
Then empty dirs created, no crash
**AC-7: No duplication**
Given 100 images after form_dataset()
When collecting all filenames across train/valid/test
Then no filename appears in more than one split
## Constraints
- Must patch constants.py paths to use tmp_path
- Requires copying 100 fixture images to tmp_path (session fixture)
- Performance test marked: `@pytest.mark.performance`
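The 70/20/10 partition checked by AC-1 and AC-7 can be sketched in a few lines. This is a minimal stand-in for illustration only; `split_dataset` is a hypothetical name, and the real `form_dataset()` also moves files and creates the directory tree.

```python
def split_dataset(names, ratios=(0.7, 0.2, 0.1)):
    """Deterministically partition filenames into train/valid/test.
    Train and valid take the floor shares; test takes the remainder,
    so the three splits always cover the input with no duplication."""
    n = len(names)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    train = names[:n_train]
    valid = names[n_train:n_train + n_valid]
    test = names[n_train + n_valid:]
    return train, valid, test
```

For 100 inputs this yields exactly 70/20/10, and because the slices are disjoint, no filename can appear in more than one split.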
@@ -0,0 +1,62 @@
# Label Validation Tests
**Task**: AZ-156_test_label_validation
**Name**: Label Validation Tests
**Description**: Implement 5 blackbox tests for YOLO label validation — valid labels, out-of-range coords, missing files, multi-line corruption
**Complexity**: 1 point
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-156
**Epic**: AZ-151
## Problem
Labels must be validated before dataset formation. Tests verify the check_label function correctly accepts valid labels and rejects corrupted ones.
## Outcome
- 5 passing pytest tests in `tests/test_label_validation.py`
## Scope
### Included
- BT-LBL-01: Valid label accepted (returns True)
- BT-LBL-02: Label with x > 1.0 rejected (returns False)
- BT-LBL-03: Label with height > 1.0 rejected (returns False)
- BT-LBL-04: Missing label file rejected (returns False)
- BT-LBL-05: Multi-line label with one corrupted line (returns False)
### Excluded
- Integration with dataset formation (separate task)
## Acceptance Criteria
**AC-1: Valid label**
Given label file with content `0 0.5 0.5 0.1 0.1`
When check_label(path) is called
Then returns True
**AC-2: x out of range**
Given label file with content `0 1.5 0.5 0.1 0.1`
When check_label(path) is called
Then returns False
**AC-3: height out of range**
Given label file with content `0 0.5 0.5 0.1 1.2`
When check_label(path) is called
Then returns False
**AC-4: Missing file**
Given non-existent file path
When check_label(path) is called
Then returns False
**AC-5: Multi-line corruption**
Given label with `0 0.5 0.5 0.1 0.1\n3 0.5 0.5 0.1 1.5`
When check_label(path) is called
Then returns False
## Constraints
- Label files are generated in tmp_path at test time
- No external fixtures needed
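The validation contract above can be sketched as a small pure function. This is an assumed shape of `check_label`, not the project's implementation: every line must be `cls x y w h` with the four coordinates in [0, 1], and a missing file or any malformed line rejects the whole label.

```python
def check_label(path):
    """Sketch of YOLO label validation per AC-1..AC-5: accept only if
    every line parses as `cls x y w h` with x, y, w, h in [0, 1]."""
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except OSError:
        return False  # AC-4: missing file is rejected
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            return False
        try:
            values = [float(p) for p in parts[1:]]
        except ValueError:
            return False
        if any(not 0.0 <= v <= 1.0 for v in values):
            return False  # AC-2/AC-3/AC-5: any out-of-range line rejects
    return True
```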
+102
View File
@@ -0,0 +1,102 @@
# Encryption & Security Tests
**Task**: AZ-157_test_encryption
**Name**: Encryption & Security Tests
**Description**: Implement blackbox, security, performance, resilience, and resource tests for AES-256-CBC encryption — roundtrips, key behavior, IV randomness, throughput, size bounds
**Complexity**: 3 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-157
**Epic**: AZ-151
## Problem
The encryption module must correctly encrypt/decrypt data, produce key-dependent ciphertexts with random IVs, handle edge cases, and meet throughput requirements.
## Outcome
- 12 passing pytest tests covering encryption (matching the 12 scenarios in scope)
- Blackbox, security, resilience, and resource tests in `tests/test_encryption.py`
- Performance test in `tests/performance/test_encryption_perf.py`
## Scope
### Included
- BT-ENC-01: Encrypt-decrypt roundtrip (1024 random bytes)
- BT-ENC-02: Encrypt-decrypt roundtrip (ONNX model)
- BT-ENC-03: Empty input roundtrip
- BT-ENC-04: Single byte roundtrip
- BT-ENC-05: Different keys produce different ciphertext
- BT-ENC-06: Wrong key fails decryption
- PT-ENC-01: Encryption throughput (10MB ≤ 5s)
- RT-ENC-01: Decrypt with corrupted ciphertext
- ST-ENC-01: Random IV (same data, same key → different ciphertexts)
- ST-ENC-02: Wrong key cannot recover plaintext
- ST-ENC-03: Model encryption key is deterministic
- RL-ENC-01: Encrypted output size bounded (≤ N + 32 bytes)
### Excluded
- Model split tests (separate task)
## Acceptance Criteria
**AC-1: Roundtrip**
Given 1024 random bytes and key "test-key"
When encrypt then decrypt
Then output equals input exactly
**AC-2: Model roundtrip**
Given azaion.onnx bytes and model encryption key
When encrypt then decrypt
Then output equals input exactly
**AC-3: Empty input**
Given b"" and key
When encrypt then decrypt
Then output equals b""
**AC-4: Single byte**
Given b"\x00" and key
When encrypt then decrypt
Then output equals b"\x00"
**AC-5: Key-dependent ciphertext**
Given same data, keys "key-a" and "key-b"
When encrypting with each key
Then ciphertexts differ
**AC-6: Wrong key failure**
Given encrypted with "key-a"
When decrypting with "key-b"
Then output does NOT equal original
**AC-7: Throughput**
Given 10MB random bytes
When encrypt + decrypt roundtrip
Then completes within 5 seconds
**AC-8: Corrupted ciphertext**
Given randomly modified ciphertext bytes
When decrypt_to is called
Then either raises exception or returns non-original bytes
**AC-9: Random IV**
Given same data, same key, encrypted twice
When comparing ciphertexts
Then they differ (random IV)
**AC-10: Model key deterministic**
Given two calls to get_model_encryption_key()
When comparing results
Then identical
**AC-11: Size bound**
Given N bytes plaintext
When encrypted
Then ciphertext size ≤ N + 32 bytes
## Constraints
- ONNX model fixture is session-scoped (77MB, read once)
- Security tests marked: `@pytest.mark.security`
- Performance test marked: `@pytest.mark.performance`
- Resource limit test marked: `@pytest.mark.resource_limit`
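The N + 32 bound in RL-ENC-01 (AC-11) follows from the CBC layout: a 16-byte IV is prepended and PKCS7 pads the plaintext up to the next 16-byte boundary, adding a full extra block when the input is already aligned. A minimal arithmetic model, assuming this IV-plus-padding layout (the real module's framing may add fields):

```python
BLOCK = 16  # AES block size in bytes

def expected_ciphertext_size(n_plaintext):
    """Upper-bound model for AES-256-CBC output: 16-byte IV plus the
    plaintext padded (PKCS7) to the next block boundary. Worst case is
    n_plaintext + 32: a full padding block plus the IV."""
    padded = (n_plaintext // BLOCK + 1) * BLOCK
    return BLOCK + padded  # IV + padded payload
```

The bound is tight exactly when the plaintext length is a multiple of 16.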
+44
View File
@@ -0,0 +1,44 @@
# Model Split Tests
**Task**: AZ-158_test_model_split
**Name**: Model Split Tests
**Description**: Implement 2 blackbox tests for model split storage — size constraint and reassembly integrity
**Complexity**: 1 point
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-158
**Epic**: AZ-151
## Problem
Encrypted models are split into small and big parts for CDN storage. Tests must verify the split respects size constraints and reassembly produces the original.
## Outcome
- 2 passing pytest tests in `tests/test_model_split.py`
## Scope
### Included
- BT-SPL-01: Split respects size constraint (small ≤ max(3072 bytes, 20% of total))
- BT-SPL-02: Reassembly produces original (small + big == encrypted bytes)
### Excluded
- CDN upload/download (requires external service)
## Acceptance Criteria
**AC-1: Size constraint**
Given 10000 encrypted bytes
When split into small + big
Then small ≤ max(3072, total × 0.2); big = remainder
**AC-2: Reassembly**
Given split parts from 10000 encrypted bytes
When small + big concatenated
Then equals original encrypted bytes
## Constraints
- Uses generated binary data (no fixture files needed)
- References SMALL_SIZE_KB constant from constants.py
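The two ACs can be sketched directly from the size rule. This is an assumed reading of the split (names and the 3 KB constant mirror `SMALL_SIZE_KB` from constants.py, but the real function may differ):

```python
SMALL_SIZE_KB = 3  # assumed value of the constants.py constant

def split_model(encrypted: bytes):
    """Sketch of the small/big split per BT-SPL-01/02: the small part is
    capped at max(3072 bytes, 20% of the total); the big part is the
    remainder, so simple concatenation restores the original bytes."""
    cap = max(SMALL_SIZE_KB * 1024, int(len(encrypted) * 0.2))
    small = encrypted[:cap]
    big = encrypted[cap:]
    return small, big
```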
@@ -0,0 +1,57 @@
# Annotation Class & YAML Tests
**Task**: AZ-159_test_annotation_classes
**Name**: Annotation Class & YAML Tests
**Description**: Implement 4 tests for annotation class loading, weather mode expansion, YAML generation, and total class count
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-159
**Epic**: AZ-151
## Problem
The system loads 17 base annotation classes, expands them across 3 weather modes, and generates a data.yaml with 80 class slots. Tests verify the class pipeline.
## Outcome
- 4 passing pytest tests in `tests/test_annotation_classes.py`
## Scope
### Included
- BT-CLS-01: Load 17 base classes from classes.json
- BT-CLS-02: Weather mode expansion (offsets 0, 20, 40)
- BT-CLS-03: YAML generation produces nc: 80 with 17 named + 63 placeholders
- RL-CLS-01: Total class count is exactly 80
### Excluded
- Training configuration (beyond scope)
## Acceptance Criteria
**AC-1: Base classes**
Given classes.json
When AnnotationClass.read_json() is called
Then returns dict with 17 unique base class entries
**AC-2: Weather expansion**
Given classes.json
When classes are read
Then same class exists at offset 0 (Norm), 20 (Wint), 40 (Night)
**AC-3: YAML generation**
Given classes.json + dataset path
When create_yaml() runs with patched paths
Then data.yaml contains nc: 80, 17 named classes + 63 Class-N placeholders
**AC-4: Total count**
Given classes.json
When generating class list
Then exactly 80 entries
## Constraints
- Uses classes.json from project root (fixture_classes_json)
- YAML output goes to tmp_path
- Resource limit test marked: `@pytest.mark.resource_limit`
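The 80-slot list checked by AC-3 and AC-4 can be sketched as follows. `build_class_list` is a hypothetical helper, not the project's `create_yaml()`: named base classes fill the leading slots and every remaining slot becomes a `Class-N` placeholder so `nc` is always exactly 80.

```python
def build_class_list(base_names, total=80):
    """Sketch of the data.yaml class list: named classes first, then
    'Class-N' placeholders for every unused slot up to `total`."""
    names = list(base_names)
    names += [f"Class-{i}" for i in range(len(names), total)]
    return names
```

With 17 base names this produces 17 named entries plus 63 placeholders, matching AC-3.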
@@ -0,0 +1,65 @@
# Hardware Hash & API Key Tests
**Task**: AZ-160_test_hardware_hash
**Name**: Hardware Hash & API Key Tests
**Description**: Implement 7 tests for hardware fingerprinting — determinism, uniqueness, base64 format, API key derivation from credentials and hardware
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-160
**Epic**: AZ-151
## Problem
Hardware hashing provides machine-bound security for model encryption and API authentication. Tests must verify determinism, uniqueness, format, and credential/hardware dependency.
## Outcome
- 7 passing pytest tests in `tests/test_hardware_hash.py`
## Scope
### Included
- BT-HSH-01: Deterministic output (same input → same hash)
- BT-HSH-02: Different inputs → different hashes
- BT-HSH-03: Output is valid base64
- ST-HSH-01: Hardware hash deterministic (duplicate of BT-HSH-01 for security coverage)
- ST-HSH-02: Different hardware → different hash
- ST-HSH-03: API encryption key depends on credentials + hardware
- ST-HSH-04: API encryption key depends on credentials
### Excluded
- Actual hardware info collection (may need mocking)
## Acceptance Criteria
**AC-1: Determinism**
Given "test-hardware-info"
When get_hw_hash() called twice
Then both calls return identical string
**AC-2: Uniqueness**
Given "hw-a" and "hw-b"
When get_hw_hash() called on each
Then results differ
**AC-3: Base64 format**
Given "test-hardware-info"
When get_hw_hash() called
Then result matches `^[A-Za-z0-9+/]+=*$`
**AC-4: API key depends on hardware**
Given same credentials, different hardware hashes
When get_api_encryption_key() called
Then different keys returned
**AC-5: API key depends on credentials**
Given different credentials, same hardware hash
When get_api_encryption_key() called
Then different keys returned
## Constraints
- Security tests marked: `@pytest.mark.security`
- May require mocking hardware info collection functions
- All inputs are generated strings (no external fixtures)
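The determinism, uniqueness, and format properties above can be sketched with a stdlib stand-in. This is illustrative only: the real `get_hw_hash` / `get_api_encryption_key` may use different digests and derivation, and the concatenation scheme here is a hypothetical way to make the key depend on both inputs.

```python
import base64
import hashlib

def get_hw_hash(hardware_info: str) -> str:
    """Stand-in hardware hash: SHA-256 of the hardware string,
    base64-encoded. Only determinism, uniqueness, and base64 format
    are relied on by the tests."""
    digest = hashlib.sha256(hardware_info.encode()).digest()
    return base64.b64encode(digest).decode()

def get_api_encryption_key(credentials: str, hw_hash: str) -> str:
    # hypothetical derivation mixing both inputs (AC-4/AC-5)
    return get_hw_hash(credentials + ":" + hw_hash)
```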
@@ -0,0 +1,62 @@
# ONNX Inference Tests
**Task**: AZ-161_test_onnx_inference
**Name**: ONNX Inference Tests
**Description**: Implement 4 tests for ONNX model loading, inference execution, postprocessing, and CPU latency
**Complexity**: 3 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-161
**Epic**: AZ-151
## Problem
The ONNX inference engine loads a model, runs detection on images, and postprocesses results. Tests must verify the full pipeline works on CPU (smoke test — no precision validation).
## Outcome
- 4 passing pytest tests
- Blackbox tests in `tests/test_onnx_inference.py`
- Performance test in `tests/performance/test_inference_perf.py`
## Scope
### Included
- BT-INF-01: Model loads successfully (no exception, valid engine)
- BT-INF-02: Inference returns output (array shape [batch, N, 6+])
- BT-INF-03: Postprocessing returns valid detections (x,y,w,h ∈ [0,1], cls ∈ [0,79], conf ∈ [0,1])
- PT-INF-01: ONNX inference latency (single image ≤ 10s on CPU)
### Excluded
- TensorRT inference (requires NVIDIA GPU)
- Detection precision/recall validation (smoke-only per user decision)
## Acceptance Criteria
**AC-1: Model loads**
Given azaion.onnx bytes
When OnnxEngine(model_bytes) is constructed
Then no exception; engine has valid input_shape and batch_size
**AC-2: Inference output**
Given ONNX engine + 1 preprocessed image
When engine.run(input_blob) is called
Then returns list of numpy arrays; first array has shape [batch, N, 6+]
**AC-3: Valid detections**
Given ONNX engine output from real image
When Inference.postprocess() is called
Then returns list of Detection objects; each has x,y,w,h ∈ [0,1], cls ∈ [0,79], confidence ∈ [0,1]
**AC-4: CPU latency**
Given 1 preprocessed image + ONNX model
When single inference runs
Then completes within 10 seconds
## Constraints
- Uses onnxruntime (CPU) not onnxruntime-gpu
- ONNX model is 77MB, loaded once (session fixture)
- Image preprocessing must match model input size (1280×1280)
- Performance test marked: `@pytest.mark.performance`
- This is a smoke test — validates structure, not detection accuracy
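The structural check in AC-3 can be sketched independently of onnxruntime. `valid_detection` is a hypothetical predicate over a plain dict; the project's Detection object will have attributes rather than keys, but the bounds are the same:

```python
def valid_detection(det, num_classes=80):
    """Structural smoke check per AC-3: x, y, w, h normalized to [0, 1],
    class id in [0, num_classes), confidence a probability."""
    in_unit = all(0.0 <= det[k] <= 1.0 for k in ("x", "y", "w", "h"))
    return (in_unit
            and 0 <= det["cls"] < num_classes
            and 0.0 <= det["confidence"] <= 1.0)
```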
+50
View File
@@ -0,0 +1,50 @@
# NMS Overlap Removal Tests
**Task**: AZ-162_test_nms
**Name**: NMS Overlap Removal Tests
**Description**: Implement 3 tests for non-maximum suppression — overlapping kept by confidence, non-overlapping preserved, chain overlap resolution
**Complexity**: 1 point
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-162
**Epic**: AZ-151
## Problem
The NMS module removes overlapping detections based on IoU threshold (0.3), keeping the higher-confidence detection. Tests verify all overlap scenarios.
## Outcome
- 3 passing pytest tests in `tests/test_nms.py`
## Scope
### Included
- BT-NMS-01: Overlapping detections — keep higher confidence (IoU > 0.3 → 1 kept)
- BT-NMS-02: Non-overlapping detections — keep both (IoU < 0.3 → 2 kept)
- BT-NMS-03: Chain overlap resolution (A↔B, B↔C → ≤ 2 kept)
### Excluded
- Integration with inference pipeline (separate task)
## Acceptance Criteria
**AC-1: Overlap removal**
Given 2 Detections at same position, confidence 0.9 and 0.5, IoU > 0.3
When remove_overlapping_detections() runs
Then 1 detection returned (confidence 0.9)
**AC-2: Non-overlapping preserved**
Given 2 Detections at distant positions, IoU < 0.3
When remove_overlapping_detections() runs
Then 2 detections returned
**AC-3: Chain overlap**
Given 3 Detections: A overlaps B, B overlaps C, A doesn't overlap C
When remove_overlapping_detections() runs
Then ≤ 2 detections; highest confidence per overlapping pair kept
## Constraints
- Detection objects constructed in-memory (no fixture files)
- IoU threshold is 0.3 (from constants or hardcoded in NMS)
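The three scenarios above can be sketched with a greedy NMS over corner-format boxes. This is a minimal illustration, assuming `(x1, y1, x2, y2)` boxes and `(box, confidence)` pairs as stand-ins for the project's Detection objects:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def remove_overlapping_detections(dets, iou_threshold=0.3):
    """Greedy NMS sketch of the contract above: visit detections in
    descending confidence; keep one only if it does not overlap an
    already-kept detection beyond the threshold."""
    kept = []
    for box, conf in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, kb) <= iou_threshold for kb, _ in kept):
            kept.append((box, conf))
    return kept
```

For a chain A-B-C where only adjacent pairs overlap, the greedy pass keeps A (highest confidence), drops B, and keeps C, so at most 2 survive.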
@@ -0,0 +1,64 @@
# Annotation Queue Message Tests
**Task**: AZ-163_test_annotation_queue
**Name**: Annotation Queue Message Tests
**Description**: Implement 5 tests for annotation queue message parsing — Created, Validated bulk, Deleted bulk, malformed handling
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-163
**Epic**: AZ-151
## Problem
The annotation queue processes msgpack-encoded messages from RabbitMQ Streams. Tests must verify correct parsing of all message types and graceful handling of malformed input.
## Outcome
- 5 passing pytest tests in `tests/test_annotation_queue.py`
## Scope
### Included
- BT-AQM-01: Parse Created annotation message (all fields populated correctly)
- BT-AQM-02: Parse Validated bulk message (status == Validated, names list matches)
- BT-AQM-03: Parse Deleted bulk message (status == Deleted, names list matches)
- BT-AQM-04: Malformed message raises exception
- RT-AQM-01: Malformed msgpack bytes handled (exception caught, no crash)
### Excluded
- Live RabbitMQ Streams connection (requires external service)
- Queue offset persistence (requires live broker)
## Acceptance Criteria
**AC-1: Created message**
Given msgpack bytes matching AnnotationMessage schema (status=Created, role=Validator)
When decoded and constructed
Then all fields populated: name, detections, image bytes, status == "Created", role == "Validator"
**AC-2: Validated bulk**
Given msgpack bytes with status=Validated, list of names
When decoded and constructed
Then status == "Validated", names list matches input
**AC-3: Deleted bulk**
Given msgpack bytes with status=Deleted, list of names
When decoded and constructed
Then status == "Deleted", names list matches input
**AC-4: Malformed msgpack**
Given invalid msgpack bytes
When decode is attempted
Then exception raised
**AC-5: Resilient handling**
Given random bytes (not valid msgpack)
When passed to message handler
Then exception caught, handler doesn't crash
## Constraints
- Msgpack messages constructed in-memory at test time
- Must match the AnnotationMessage/AnnotationBulkMessage schemas from annotation-queue/
- Resilience test marked: `@pytest.mark.resilience`
+45
View File
@@ -0,0 +1,45 @@
# Dependencies Table
**Date**: 2026-03-26
**Total Tasks**: 12
**Total Complexity Points**: 25
**Epic**: AZ-151
| Task | Name | Complexity | Dependencies | Epic | Test Scenarios |
|------|------|-----------|-------------|------|----------------|
| AZ-152 | test_infrastructure | 3 | None | AZ-151 | — |
| AZ-153 | test_augmentation | 3 | AZ-152 | AZ-151 | BT-AUG-01 to BT-AUG-08 (8) |
| AZ-154 | test_augmentation_nonfunc | 2 | AZ-152 | AZ-151 | PT-AUG-01, PT-AUG-02, RT-AUG-01 to RT-AUG-03, RL-AUG-01 (6) |
| AZ-155 | test_dataset_formation | 2 | AZ-152 | AZ-151 | BT-DSF-01 to BT-DSF-04, PT-DSF-01, RT-DSF-01, RL-DSF-01, RL-DSF-02 (8) |
| AZ-156 | test_label_validation | 1 | AZ-152 | AZ-151 | BT-LBL-01 to BT-LBL-05 (5) |
| AZ-157 | test_encryption | 3 | AZ-152 | AZ-151 | BT-ENC-01 to BT-ENC-06, PT-ENC-01, RT-ENC-01, ST-ENC-01 to ST-ENC-03, RL-ENC-01 (12) |
| AZ-158 | test_model_split | 1 | AZ-152 | AZ-151 | BT-SPL-01, BT-SPL-02 (2) |
| AZ-159 | test_annotation_classes | 2 | AZ-152 | AZ-151 | BT-CLS-01 to BT-CLS-03, RL-CLS-01 (4) |
| AZ-160 | test_hardware_hash | 2 | AZ-152 | AZ-151 | BT-HSH-01 to BT-HSH-03, ST-HSH-01 to ST-HSH-04 (7) |
| AZ-161 | test_onnx_inference | 3 | AZ-152 | AZ-151 | BT-INF-01 to BT-INF-03, PT-INF-01 (4) |
| AZ-162 | test_nms | 1 | AZ-152 | AZ-151 | BT-NMS-01 to BT-NMS-03 (3) |
| AZ-163 | test_annotation_queue | 2 | AZ-152 | AZ-151 | BT-AQM-01 to BT-AQM-04, RT-AQM-01 (5) |
## Dependency Graph
```
AZ-151 (Epic: Blackbox Tests)
└── AZ-152 test_infrastructure
├── AZ-153 test_augmentation
├── AZ-154 test_augmentation_nonfunc
├── AZ-155 test_dataset_formation
├── AZ-156 test_label_validation
├── AZ-157 test_encryption
├── AZ-158 test_model_split
├── AZ-159 test_annotation_classes
├── AZ-160 test_hardware_hash
├── AZ-161 test_onnx_inference
├── AZ-162 test_nms
└── AZ-163 test_annotation_queue
```
## Implementation Strategy
- **Batch 1**: AZ-152 (test infrastructure) — must be implemented first
- **Batch 2**: AZ-153 to AZ-163 (all test tasks) — can be implemented in parallel after infrastructure is ready
- **Estimated batches**: 2
+20 -7
View File
@@ -2,9 +2,9 @@
## Current Step
flow: existing-code
step: 5
name: Run Tests
status: not_started
step: 6
name: Refactor
status: in_progress
sub_step: 0
retry_count: 0
@@ -31,6 +31,7 @@ retry_count: 0
| 3 (sub 4) | Decompose Tests — Verification | 2026-03-26 | All 29 covered AC verified, no circular deps, no overlaps, dependencies table produced |
| 3 | Decompose Tests | 2026-03-26 | 12 tasks total (1 infrastructure + 11 test tasks), 25 complexity points, 2 implementation batches |
| 4 | Implement Tests | 2026-03-26 | 12/12 tasks implemented, 76 tests passing, 4 commits across 4 sub-batches |
| 5 | Run Tests | 2026-03-26 | 76 passed, 0 failed, 0 skipped. JUnit XML in test-results/ |
## Key Decisions
- Component breakdown: 8 components confirmed by user
@@ -43,12 +44,24 @@ retry_count: 0
- Tracker: jira (project AZ, cloud 1598226f-845f-4705-bcd1-5ed0c82d6119)
- Epic: AZ-151 (Blackbox Tests), 12 tasks: AZ-152 to AZ-163
- Task grouping: 55 test scenarios grouped into 11 atomic tasks by functional area, all ≤ 3 complexity points
- Refactor approach: Pydantic BaseModel config chosen over env vars / dataclass / plain dict. pydantic 2.12.5 already installed via ultralytics.
## Refactor Progress (Step 6)
Work done so far (across multiple sessions):
- Replaced module-level path variables + get_paths/reload_config in constants.py with Pydantic Config(BaseModel) — paths defined once as @property
- Migrated all 5 production callers (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to constants.config.X
- Fixed device=0 bug in exports.py, fixed total_to_process bug in augmentation.py
- Simplified test infrastructure: conftest.py apply_constants_patch reduced to single config swap
- Updated 7 test files to use constants.config.X
- Rewrote E2E test to AAA pattern: Arrange (copy raw data), Act (production functions only: augment_annotations, train_dataset, export_onnx, export_coreml), Assert (7 test methods)
- All 83 tests passing (76 non-E2E + 7 E2E)
- Refactor test verification phase still pending
## Last Session
date: 2026-03-26
ended_at: Step 4 Implement Tests — All batches complete
reason: auto-chain — Implement Tests complete, next is Run Tests
notes: 76 tests passing across 12 tasks. All committed and pushed to dev. Virtual environment (.venv) created with requirements-test.txt. pytest.ini added for custom marks.
date: 2026-03-27
ended_at: Step 6 Refactor — implementation done, test verification pending
reason: user indicated test phase not yet completed
notes: Pydantic config refactor + E2E rewrite implemented. 83/83 tests pass. Formal test verification phase of refactoring still pending.
## Retry Log
| Attempt | Step | Name | SubStep | Failure Reason | Timestamp |
+13 -13
View File
@@ -9,7 +9,7 @@ import albumentations as A
import cv2
import numpy as np
from constants import (data_images_dir, data_labels_dir, processed_images_dir, processed_labels_dir, processed_dir)
import constants
from dto.imageLabel import ImageLabel
@@ -60,8 +60,8 @@ class Augmentator:
results.append(ImageLabel(
image=img_ann.image,
labels=img_ann.labels,
image_path=os.path.join(processed_images_dir, Path(img_ann.image_path).name),
labels_path=os.path.join(processed_labels_dir, Path(img_ann.labels_path).name)
image_path=os.path.join(constants.config.processed_images_dir, Path(img_ann.image_path).name),
labels_path=os.path.join(constants.config.processed_labels_dir, Path(img_ann.labels_path).name)
)
)
for i in range(7):
@@ -72,8 +72,8 @@ class Augmentator:
img = ImageLabel(
image=res['image'],
labels=res['bboxes'],
image_path=os.path.join(processed_images_dir, f'{name}{path.suffix}'),
labels_path=os.path.join(processed_labels_dir, f'{name}.txt')
image_path=os.path.join(constants.config.processed_images_dir, f'{name}{path.suffix}'),
labels_path=os.path.join(constants.config.processed_labels_dir, f'{name}.txt')
)
results.append(img)
except Exception as e:
@@ -95,8 +95,8 @@ class Augmentator:
def augment_annotation(self, image_file):
try:
image_path = os.path.join(data_images_dir, image_file.name)
labels_path = os.path.join(data_labels_dir, f'{Path(str(image_path)).stem}.txt')
image_path = os.path.join(constants.config.data_images_dir, image_file.name)
labels_path = os.path.join(constants.config.data_labels_dir, f'{Path(str(image_path)).stem}.txt')
image = cv2.imdecode(np.fromfile(image_path, dtype=np.uint8), cv2.IMREAD_UNCHANGED)
img_ann = ImageLabel(
@@ -115,7 +115,7 @@ class Augmentator:
f.writelines(lines)
f.close()
print(f'{datetime.now():{"%Y-%m-%d %H:%M:%S"}}: {self.total_files_processed + 1}/{self.total_to_process} : {image_file.name} has augmented')
print(f'{datetime.now():{"%Y-%m-%d %H:%M:%S"}}: {self.total_files_processed + 1}/{self.total_images_to_process} : {image_file.name} has augmented')
except Exception as e:
print(e)
self.total_files_processed += 1
@@ -126,15 +126,15 @@ class Augmentator:
self.total_files_processed = 0
if from_scratch:
shutil.rmtree(processed_dir)
shutil.rmtree(constants.config.processed_dir)
os.makedirs(processed_images_dir, exist_ok=True)
os.makedirs(processed_labels_dir, exist_ok=True)
os.makedirs(constants.config.processed_images_dir, exist_ok=True)
os.makedirs(constants.config.processed_labels_dir, exist_ok=True)
processed_images = set(f.name for f in os.scandir(processed_images_dir))
processed_images = set(f.name for f in os.scandir(constants.config.processed_images_dir))
images = []
with os.scandir(data_images_dir) as imd:
with os.scandir(constants.config.data_images_dir) as imd:
for image_file in imd:
if image_file.is_file() and image_file.name not in processed_images:
images.append(image_file)
+11
View File
@@ -0,0 +1,11 @@
training:
model: 'yolo11n.yaml'
epochs: 1
batch: 4
imgsz: 320
save_period: 1
workers: 0
export:
onnx_imgsz: 320
onnx_batch: 1
+12
View File
@@ -18,3 +18,15 @@ dirs:
data_deleted: 'data_deleted'
images: 'images'
labels: 'labels'
training:
model: 'yolo11m.yaml'
epochs: 120
batch: 11
imgsz: 1280
save_period: 1
workers: 24
export:
onnx_imgsz: 1280
onnx_batch: 4
+100 -24
View File
@@ -1,28 +1,105 @@
from os import path
azaion = '/azaion'
import yaml
from pydantic import BaseModel
class DirsConfig(BaseModel):
root: str = '/azaion'
class TrainingConfig(BaseModel):
model: str = 'yolo11m.yaml'
epochs: int = 120
batch: int = 11
imgsz: int = 1280
save_period: int = 1
workers: int = 24
class ExportConfig(BaseModel):
onnx_imgsz: int = 1280
onnx_batch: int = 4
class Config(BaseModel):
dirs: DirsConfig = DirsConfig()
training: TrainingConfig = TrainingConfig()
export: ExportConfig = ExportConfig()
@property
def azaion(self) -> str:
return self.dirs.root
@property
def data_dir(self) -> str:
return path.join(self.dirs.root, 'data')
@property
def data_images_dir(self) -> str:
return path.join(self.data_dir, 'images')
@property
def data_labels_dir(self) -> str:
return path.join(self.data_dir, 'labels')
@property
def processed_dir(self) -> str:
return path.join(self.dirs.root, 'data-processed')
@property
def processed_images_dir(self) -> str:
return path.join(self.processed_dir, 'images')
@property
def processed_labels_dir(self) -> str:
return path.join(self.processed_dir, 'labels')
@property
def corrupted_dir(self) -> str:
return path.join(self.dirs.root, 'data-corrupted')
@property
def corrupted_images_dir(self) -> str:
return path.join(self.corrupted_dir, 'images')
@property
def corrupted_labels_dir(self) -> str:
return path.join(self.corrupted_dir, 'labels')
@property
def sample_dir(self) -> str:
return path.join(self.dirs.root, 'data-sample')
@property
def datasets_dir(self) -> str:
return path.join(self.dirs.root, 'datasets')
@property
def models_dir(self) -> str:
return path.join(self.dirs.root, 'models')
@property
def current_pt_model(self) -> str:
return path.join(self.models_dir, f'{prefix[:-1]}.pt')
@property
def current_onnx_model(self) -> str:
return path.join(self.models_dir, f'{prefix[:-1]}.onnx')
@classmethod
def from_yaml(cls, config_file: str, root: 'str | None' = None) -> 'Config':
try:
with open(config_file) as f:
data = yaml.safe_load(f) or {}
except FileNotFoundError:
data = {}
if root is not None:
data.setdefault('dirs', {})['root'] = root
return cls(**data)
prefix = 'azaion-'
images = 'images'
labels = 'labels'
data_dir = path.join(azaion, 'data')
data_images_dir = path.join(data_dir, images)
data_labels_dir = path.join(data_dir, labels)
processed_dir = path.join(azaion, 'data-processed')
processed_images_dir = path.join(processed_dir, images)
processed_labels_dir = path.join(processed_dir, labels)
corrupted_dir = path.join(azaion, 'data-corrupted')
corrupted_images_dir = path.join(corrupted_dir, images)
corrupted_labels_dir = path.join(corrupted_dir, labels)
sample_dir = path.join(azaion, 'data-sample')
datasets_dir = path.join(azaion, 'datasets')
models_dir = path.join(azaion, 'models')
date_format = '%Y-%m-%d'
checkpoint_file = 'checkpoint.txt'
checkpoint_date_format = '%Y-%m-%d %H:%M:%S'
@@ -38,5 +115,4 @@ SMALL_SIZE_KB = 3
CDN_CONFIG = 'cdn.yaml'
MODELS_FOLDER = 'models'
CURRENT_PT_MODEL = path.join(models_dir, f'{prefix[:-1]}.pt')
CURRENT_ONNX_MODEL = path.join(models_dir, f'{prefix[:-1]}.onnx')
config: Config = Config.from_yaml(CONFIG_FILE)
+5 -5
View File
@@ -6,12 +6,12 @@ from dto.imageLabel import ImageLabel
from preprocessing import read_labels
from matplotlib import pyplot as plt
from constants import datasets_dir, prefix, processed_images_dir, processed_labels_dir
import constants
annotation_classes = AnnotationClass.read_json()
def visualise_dataset():
cur_dataset = os.path.join(datasets_dir, f'{prefix}2024-06-18', 'train')
cur_dataset = os.path.join(constants.config.datasets_dir, f'{constants.prefix}2024-06-18', 'train')
images_dir = os.path.join(cur_dataset, 'images')
labels_dir = os.path.join(cur_dataset, 'labels')
@@ -33,8 +33,8 @@ def visualise_dataset():
def visualise_processed_folder():
def show_image(img):
image_path = os.path.join(processed_images_dir, img)
labels_path = os.path.join(processed_labels_dir, f'{Path(img).stem}.txt')
image_path = os.path.join(constants.config.processed_images_dir, img)
labels_path = os.path.join(constants.config.processed_labels_dir, f'{Path(img).stem}.txt')
img = ImageLabel(
image_path=image_path,
image=cv2.imread(image_path),
@@ -42,7 +42,7 @@ def visualise_processed_folder():
labels=read_labels(labels_path)
)
img.visualize(annotation_classes)
images = os.listdir(processed_images_dir)
images = os.listdir(constants.config.processed_images_dir)
cur = 0
show_image(images[cur])
pass
+11
View File
@@ -0,0 +1,11 @@
services:
test-runner:
build:
context: .
dockerfile: Dockerfile.test
volumes:
- ./test-results:/app/test-results
- ./_docs/00_problem/input_data:/app/_docs/00_problem/input_data:ro
environment:
- PYTHONDONTWRITEBYTECODE=1
- PYTHONUNBUFFERED=1
+13 -5
View File
@@ -11,7 +11,6 @@ from ultralytics import YOLO
import constants
from api_client import ApiClient, ApiCredentials
from cdn_manager import CDNManager, CDNCredentials
from constants import datasets_dir, processed_images_dir
from security import Security
from utils import Dotdict
@@ -26,7 +25,9 @@ def export_rknn(model_path):
pass
def export_onnx(model_path, batch_size=4):
def export_onnx(model_path, batch_size=None):
if batch_size is None:
batch_size = constants.config.export.onnx_batch
model = YOLO(model_path)
onnx_path = Path(model_path).stem + '.onnx'
if path.exists(onnx_path):
@@ -34,11 +35,18 @@ def export_onnx(model_path, batch_size=4):
model.export(
format="onnx",
imgsz=1280,
imgsz=constants.config.export.onnx_imgsz,
batch=batch_size,
simplify=True,
nms=True,
device=0
)
def export_coreml(model_path):
model = YOLO(model_path)
model.export(
format="coreml",
imgsz=constants.config.export.onnx_imgsz,
)
@@ -54,7 +62,7 @@ def export_tensorrt(model_path):
def form_data_sample(destination_path, size=500, write_txt_log=False):
images = []
with scandir(processed_images_dir) as imd:
with scandir(constants.config.processed_images_dir) as imd:
for image_file in imd:
if not image_file.is_file():
continue
+2 -2
View File
@@ -11,11 +11,11 @@ from augmentation import Augmentator
# train.train_dataset()
# train.resume_training('/azaion/dev/ai-training/runs/detect/train12/weights/last.pt')
model_dir = path.join(constants.models_dir, f'{constants.prefix}2025-05-18')
model_dir = path.join(constants.config.models_dir, f'{constants.prefix}2025-05-18')
for file in glob.glob(path.join(model_dir, 'weights', 'epoch*')):
os.remove(file)
shutil.copy(path.join(model_dir, 'weights', 'best.pt'), constants.CURRENT_PT_MODEL)
shutil.copy(path.join(model_dir, 'weights', 'best.pt'), constants.config.current_pt_model)
train.export_current_model()
print('success!')
+1
View File
@@ -4,3 +4,4 @@ markers =
resilience: Resilience/error handling tests
security: Security tests
resource_limit: Resource limit tests
e2e: End-to-end training pipeline tests (slow, require GPU/MPS)
+10
View File
@@ -0,0 +1,10 @@
pytest>=7.0
albumentations
opencv-python-headless
numpy==1.26.4
onnxruntime
cryptography==44.0.2
msgpack
PyYAML
ultralytics
coremltools
+64
View File
@@ -0,0 +1,64 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
RESULTS_DIR="$PROJECT_ROOT/test-results"
cleanup() {
if [ -d "$RESULTS_DIR" ]; then
echo "Results saved to $RESULTS_DIR"
fi
}
trap cleanup EXIT
mkdir -p "$RESULTS_DIR"
echo "════════════════════════════════════"
echo " Performance Tests"
echo "════════════════════════════════════"
echo ""
echo "Thresholds (from test spec):"
echo " PT-AUG-01 Augmentation 10 images ≤ 60s"
echo " PT-AUG-02 Parallel speedup ≥ 1.5×"
echo " PT-DSF-01 Dataset formation 100 img ≤ 30s"
echo " PT-ENC-01 Encrypt/decrypt 10MB ≤ 5s"
echo " PT-INF-01 ONNX inference single ≤ 10s"
echo ""
cd "$PROJECT_ROOT"
FAILED=0
if python -m pytest tests/performance/ \
--tb=short \
--junitxml="$RESULTS_DIR/performance-results.xml" \
-v; then
echo ""
echo "✓ All performance thresholds met"
else
echo ""
echo "✗ Some performance thresholds exceeded"
FAILED=1
fi
echo ""
echo "════════════════════════════════════"
echo " Summary"
echo "════════════════════════════════════"
if [ -f "$RESULTS_DIR/performance-results.xml" ]; then
python -c "
import xml.etree.ElementTree as ET
t = ET.parse('$RESULTS_DIR/performance-results.xml').getroot()
print(f\"Tests: {t.get('tests',0)} Failures: {t.get('failures',0)} Errors: {t.get('errors',0)}\")
" 2>/dev/null || echo "Could not parse performance results XML"
fi
if [ $FAILED -ne 0 ]; then
echo "EXIT: 1 (thresholds exceeded)"
exit 1
fi
echo "EXIT: 0 (all thresholds met)"
exit 0
+96
@@ -0,0 +1,96 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
RESULTS_DIR="$PROJECT_ROOT/test-results"
PERF_ONLY=false
UNIT_ONLY=false
for arg in "$@"; do
case $arg in
--unit-only) UNIT_ONLY=true ;;
--perf-only) PERF_ONLY=true ;;
--help|-h)
echo "Usage: $0 [--unit-only] [--perf-only]"
echo " --unit-only Run only unit/blackbox tests (skip performance)"
echo " --perf-only Run only performance tests"
exit 0
;;
esac
done
cleanup() {
if [ -d "$RESULTS_DIR" ]; then
echo "Results saved to $RESULTS_DIR"
fi
}
trap cleanup EXIT
mkdir -p "$RESULTS_DIR"
FAILED=0
if ! $PERF_ONLY; then
echo "════════════════════════════════════"
echo " Running blackbox + unit tests"
echo "════════════════════════════════════"
cd "$PROJECT_ROOT"
if python -m pytest tests/ \
--ignore=tests/performance/ \
--tb=short \
--junitxml="$RESULTS_DIR/test-results.xml" \
-q; then
echo "✓ Tests passed"
else
echo "✗ Tests failed"
FAILED=1
fi
fi
if ! $UNIT_ONLY; then
echo ""
echo "════════════════════════════════════"
echo " Running performance tests"
echo "════════════════════════════════════"
cd "$PROJECT_ROOT"
if python -m pytest tests/performance/ \
--tb=short \
--junitxml="$RESULTS_DIR/performance-results.xml" \
-q 2>/dev/null; then
echo "✓ Performance tests passed"
else
if [ -d "tests/performance" ]; then
echo "✗ Performance tests failed"
FAILED=1
else
echo "⊘ No performance test directory found — skipping"
fi
fi
fi
echo ""
echo "════════════════════════════════════"
echo " Summary"
echo "════════════════════════════════════"
if [ -f "$RESULTS_DIR/test-results.xml" ]; then
TESTS=$(python -c "
import xml.etree.ElementTree as ET
t = ET.parse('$RESULTS_DIR/test-results.xml').getroot()
print(f\"Tests: {t.get('tests',0)} Failures: {t.get('failures',0)} Errors: {t.get('errors',0)} Skipped: {t.get('skipped',0)}\")
" 2>/dev/null || echo "Could not parse test results XML")
echo "$TESTS"
fi
if [ $FAILED -ne 0 ]; then
echo "EXIT: 1 (failures detected)"
exit 1
fi
echo "EXIT: 0 (all passed)"
exit 0
+2 -23
@@ -8,35 +8,14 @@ _DATASET_IMAGES = _PROJECT_ROOT / "_docs/00_problem/input_data/dataset/images"
_DATASET_LABELS = _PROJECT_ROOT / "_docs/00_problem/input_data/dataset/labels"
_ONNX_MODEL = _PROJECT_ROOT / "_docs/00_problem/input_data/azaion.onnx"
_CLASSES_JSON = _PROJECT_ROOT / "classes.json"
_CONFIG_TEST = _PROJECT_ROOT / "config.test.yaml"
collect_ignore = ["security_test.py", "imagelabel_visualize_test.py"]
def apply_constants_patch(monkeypatch, base: Path):
import constants as c
from os import path
root = str(base.resolve())
azaion = path.join(root, "azaion")
monkeypatch.setattr(c, "azaion", azaion)
data_dir = path.join(azaion, "data")
monkeypatch.setattr(c, "data_dir", data_dir)
monkeypatch.setattr(c, "data_images_dir", path.join(data_dir, c.images))
monkeypatch.setattr(c, "data_labels_dir", path.join(data_dir, c.labels))
processed_dir = path.join(azaion, "data-processed")
monkeypatch.setattr(c, "processed_dir", processed_dir)
monkeypatch.setattr(c, "processed_images_dir", path.join(processed_dir, c.images))
monkeypatch.setattr(c, "processed_labels_dir", path.join(processed_dir, c.labels))
corrupted_dir = path.join(azaion, "data-corrupted")
monkeypatch.setattr(c, "corrupted_dir", corrupted_dir)
monkeypatch.setattr(c, "corrupted_images_dir", path.join(corrupted_dir, c.images))
monkeypatch.setattr(c, "corrupted_labels_dir", path.join(corrupted_dir, c.labels))
monkeypatch.setattr(c, "sample_dir", path.join(azaion, "data-sample"))
monkeypatch.setattr(c, "datasets_dir", path.join(azaion, "datasets"))
models_dir = path.join(azaion, "models")
monkeypatch.setattr(c, "models_dir", models_dir)
monkeypatch.setattr(c, "CURRENT_PT_MODEL", path.join(models_dir, f"{c.prefix[:-1]}.pt"))
monkeypatch.setattr(c, "CURRENT_ONNX_MODEL", path.join(models_dir, f"{c.prefix[:-1]}.onnx"))
monkeypatch.setattr(c, "config", c.Config.from_yaml(str(_CONFIG_TEST), root=str(base / "azaion")))
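The single `Config.from_yaml(...)` call that replaces the long block of per-attribute monkeypatches presumably behaves like this sketch — load the YAML, then let the test harness override the base directory. The YAML layout and the `root` keyword are assumptions; only the `from_yaml(path, root=...)` signature is taken from the diff.

```python
from pathlib import Path
from typing import Optional
import yaml
from pydantic import BaseModel

class Config(BaseModel):
    root: str = "/azaion"

    @classmethod
    def from_yaml(cls, path: str, root: Optional[str] = None) -> "Config":
        """Load settings from a YAML file, optionally overriding the base dir."""
        data = yaml.safe_load(Path(path).read_text()) or {}
        if root is not None:
            data["root"] = root  # test fixtures re-root the tree at tmp_path
        return cls(**data)
```

Patching the one `config` object (instead of a dozen module-level names) is what lets the conftest shrink from 23 lines to 2.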
@pytest.fixture(scope="session")
+9 -17
@@ -20,15 +20,7 @@ if "matplotlib" not in sys.modules:
def _patch_augmentation_paths(monkeypatch, base: Path):
import augmentation as aug
import constants as c
apply_constants_patch(monkeypatch, base)
monkeypatch.setattr(aug, "data_images_dir", c.data_images_dir)
monkeypatch.setattr(aug, "data_labels_dir", c.data_labels_dir)
monkeypatch.setattr(aug, "processed_images_dir", c.processed_images_dir)
monkeypatch.setattr(aug, "processed_labels_dir", c.processed_labels_dir)
monkeypatch.setattr(aug, "processed_dir", c.processed_dir)
def _augment_annotation_with_total(monkeypatch):
@@ -58,8 +50,8 @@ def test_pt_aug_01_throughput_ten_images_sixty_seconds(
import constants as c
from augmentation import Augmentator
img_dir = Path(c.data_images_dir)
lbl_dir = Path(c.data_labels_dir)
img_dir = Path(c.config.data_images_dir)
lbl_dir = Path(c.config.data_labels_dir)
img_dir.mkdir(parents=True, exist_ok=True)
lbl_dir.mkdir(parents=True, exist_ok=True)
src_img, src_lbl = sample_images_labels(10)
@@ -83,9 +75,9 @@ def test_pt_aug_02_parallel_at_least_one_point_five_x_faster(
import constants as c
from augmentation import Augmentator
img_dir = Path(c.data_images_dir)
lbl_dir = Path(c.data_labels_dir)
proc_dir = Path(c.processed_dir)
img_dir = Path(c.config.data_images_dir)
lbl_dir = Path(c.config.data_labels_dir)
proc_dir = Path(c.config.processed_dir)
img_dir.mkdir(parents=True, exist_ok=True)
lbl_dir.mkdir(parents=True, exist_ok=True)
src_img, src_lbl = sample_images_labels(10)
@@ -93,8 +85,8 @@ def test_pt_aug_02_parallel_at_least_one_point_five_x_faster(
shutil.copy2(p, img_dir / p.name)
for p in src_lbl.glob("*.txt"):
shutil.copy2(p, lbl_dir / p.name)
Path(c.processed_images_dir).mkdir(parents=True, exist_ok=True)
Path(c.processed_labels_dir).mkdir(parents=True, exist_ok=True)
Path(c.config.processed_images_dir).mkdir(parents=True, exist_ok=True)
Path(c.config.processed_labels_dir).mkdir(parents=True, exist_ok=True)
names = sorted(p.name for p in img_dir.glob("*.jpg"))
class _E:
@@ -113,8 +105,8 @@ def test_pt_aug_02_parallel_at_least_one_point_five_x_faster(
seq_elapsed = time.perf_counter() - t0
shutil.rmtree(proc_dir)
Path(c.processed_images_dir).mkdir(parents=True, exist_ok=True)
Path(c.processed_labels_dir).mkdir(parents=True, exist_ok=True)
Path(c.config.processed_images_dir).mkdir(parents=True, exist_ok=True)
Path(c.config.processed_labels_dir).mkdir(parents=True, exist_ok=True)
aug_par = Augmentator()
aug_par.total_images_to_process = len(entries)
+5 -11
@@ -56,8 +56,8 @@ def _prepare_form_dataset(
constants_patch(tmp_path)
import train
proc_img = Path(c_mod.processed_images_dir)
proc_lbl = Path(c_mod.processed_labels_dir)
proc_img = Path(c_mod.config.processed_images_dir)
proc_lbl = Path(c_mod.config.processed_labels_dir)
proc_img.mkdir(parents=True, exist_ok=True)
proc_lbl.mkdir(parents=True, exist_ok=True)
@@ -70,14 +70,8 @@ def _prepare_form_dataset(
if stem in corrupt_stems:
dst.write_text("0 1.5 0.5 0.1 0.1\n", encoding="utf-8")
today_ds = osp.join(c_mod.datasets_dir, train.today_folder)
monkeypatch.setattr(train, "today_dataset", today_ds)
monkeypatch.setattr(train, "processed_images_dir", c_mod.processed_images_dir)
monkeypatch.setattr(train, "processed_labels_dir", c_mod.processed_labels_dir)
monkeypatch.setattr(train, "corrupted_images_dir", c_mod.corrupted_images_dir)
monkeypatch.setattr(train, "corrupted_labels_dir", c_mod.corrupted_labels_dir)
monkeypatch.setattr(train, "datasets_dir", c_mod.datasets_dir)
return train
today_ds = osp.join(c_mod.config.datasets_dir, train.today_folder)
return train, today_ds
@pytest.mark.performance
@@ -88,7 +82,7 @@ def test_pt_dsf_01_dataset_formation_under_thirty_seconds(
fixture_images_dir,
fixture_labels_dir,
):
train = _prepare_form_dataset(
train, today_ds = _prepare_form_dataset(
monkeypatch,
tmp_path,
constants_patch,
+6 -2
@@ -42,9 +42,13 @@ def data_yaml_text(monkeypatch, tmp_path, fixture_classes_json):
_stub_train_imports()
import train
monkeypatch.setattr(train, "today_dataset", str(tmp_path))
import constants as c
monkeypatch.setattr(c, "config", c.Config(dirs=c.DirsConfig(root=str(tmp_path))))
monkeypatch.setattr(train, "today_folder", "")
from pathlib import Path
Path(c.config.datasets_dir).mkdir(parents=True, exist_ok=True)
train.create_yaml()
return (tmp_path / "data.yaml").read_text(encoding="utf-8")
return (Path(c.config.datasets_dir) / "data.yaml").read_text(encoding="utf-8")
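The `c.Config(dirs=c.DirsConfig(root=str(tmp_path)))` construction used by this fixture suggests paths are derived from a single `root` field, roughly as sketched below. Only the `DirsConfig(root=...)` shape and the `datasets_dir` accessor appear in the diff; the derivation itself is an assumption.

```python
from pathlib import Path
from pydantic import BaseModel

class DirsConfig(BaseModel):
    root: str  # base directory; tests pass tmp_path here

class Config(BaseModel):
    dirs: DirsConfig

    @property
    def datasets_dir(self) -> str:
        # every dataset path hangs off the configured root
        return str(Path(self.dirs.root) / "datasets")
```

Re-rooting the whole tree then takes one constructor call, which is exactly what the fixture above does with `tmp_path`.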
def test_bt_cls_01_base_classes(fixture_classes_json):

Some files were not shown because too many files have changed in this diff.