Mirror of https://github.com/azaion/detections.git, synced 2026-04-22 09:06:31 +00:00

Commit: Add a detailed file index and enhance skill documentation for the autopilot, decompose, deploy, plan, and research skills. Introduces a tests-only mode in the decompose skill, clarifies required files for the deploy and plan skills, and improves prerequisite checks across skills for better user guidance and workflow efficiency.

---
Auto-chaining execution engine that drives the full BUILD → SHIP workflow. Detects project state from `_docs/`, resumes from where work stopped, and flows through skills automatically. The user invokes `/autopilot` once — the engine handles sequencing, transitions, and re-entry.
## File Index

| File | Purpose |
|------|---------|
| `flows/greenfield.md` | Detection rules, step table, and auto-chain rules for new projects |
| `flows/existing-code.md` | Detection rules, step table, and auto-chain rules for existing codebases |
| `state.md` | State file format, rules, re-entry protocol, session boundaries |
| `protocols.md` | User interaction, Jira MCP auth, choice format, error handling, status summary |

**On every invocation**: read all four files above before executing any logic.

## Core Principles

- **Auto-chain**: when a skill completes, immediately start the next one — no pause between skills
- **Delegate, don't duplicate**: read and execute each sub-skill's SKILL.md; never inline their logic here
- **Sound on pause**: follow `.cursor/rules/human-input-sound.mdc` — play a notification sound before every pause that requires human input
- **Minimize interruptions**: only ask the user when the decision genuinely cannot be resolved automatically
- **Jira MCP recommended**: steps that create Jira artifacts (Plan Step 6, Decompose) should have authenticated Jira MCP — if unavailable, offer the user the choice to continue with local-only task tracking

## Flow Resolution

Determine which flow to use:

1. If the workspace has source code files **and** `_docs/` does not exist → **existing-code flow** (Pre-Step detection)
2. If `_docs/_autopilot_state.md` exists and records Document in `Completed Steps` → **existing-code flow**
3. If `_docs/_autopilot_state.md` exists and `step: done` AND the workspace contains source code → **existing-code flow** (completed project re-entry — loops to New Task)
4. Otherwise → **greenfield flow**

After selecting the flow, apply its detection rules (first match wins) to determine the current step.

## Jira MCP Authentication

Several workflow steps create Jira artifacts (epics, tasks, links). The Jira MCP server must be authenticated **before** any step that writes to Jira.

### Steps That Require Jira MCP

| Step | Sub-Step | Jira Action |
|------|----------|-------------|
| 2 (Plan) | Step 6 — Jira Epics | Create epics for each component |
| 3 (Decompose) | Step 1–3 — All tasks | Create a Jira ticket per task, linked to its epic |

### Authentication Gate

Before entering **Step 2 (Plan)** or **Step 3 (Decompose)** for the first time, the autopilot must:

1. Call `mcp_auth` on the Jira MCP server
2. If authentication succeeds → proceed normally
3. If the user **skips** or authentication fails → present using Choose format:

```
══════════════════════════════════════
Jira MCP authentication failed
══════════════════════════════════════
A) Retry authentication (retry mcp_auth)
B) Continue without Jira (tasks saved locally only)
══════════════════════════════════════
Recommendation: A — Jira IDs drive task referencing,
dependency tracking, and implementation batching.
Without Jira, task files use numeric prefixes instead.
══════════════════════════════════════
```

If the user picks **B** (continue without Jira):

- Set a flag in the state file: `jira_enabled: false`
- All skills that would create Jira tickets instead save metadata locally in the task/epic files with `Jira: pending` status
- Task files keep numeric prefixes (e.g., `01_initial_structure.md`) instead of Jira ID prefixes
- The workflow proceeds normally in all other respects

### Re-Authentication

If Jira MCP was already authenticated in a previous invocation (verify by listing available Jira tools beyond `mcp_auth`), skip the auth gate.

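The flow-resolution rules above reduce to a first-match function. A minimal sketch, assuming a parsed state file — the `State` container and its field names are illustrative stand-ins, not part of the skill spec:

```python
# Hypothetical sketch of the Flow Resolution rules; State is an
# illustrative stand-in for the parsed _docs/_autopilot_state.md.
class State:
    def __init__(self, exists=False, completed_steps=(), step=None):
        self.exists = exists                    # state file present?
        self.completed_steps = completed_steps  # names from "Completed Steps"
        self.step = step                        # value of the "step:" field

def resolve_flow(has_source_code, docs_exists, state):
    # Rule 1: source code present and no _docs/ yet -> existing-code (Pre-Step)
    if has_source_code and not docs_exists:
        return "existing-code"
    # Rule 2: state file records a completed Document step
    if state.exists and "Document" in state.completed_steps:
        return "existing-code"
    # Rule 3: completed project re-entry (loops to New Task)
    if state.exists and state.step == "done" and has_source_code:
        return "existing-code"
    # Rule 4: everything else is greenfield
    return "greenfield"
```

Because the first match wins, rule order matters: a documented project resolves to existing-code via rule 2 before rule 3 is ever consulted.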
## User Interaction Protocol

Every time the autopilot or a sub-skill needs a user decision, use the **Choose A / B / C / D** format. This applies to:

- State transitions where multiple valid next actions exist
- Sub-skill BLOCKING gates that require user judgment
- Any fork where the autopilot cannot confidently pick the right path
- Trade-off decisions (tech choices, scope, risk acceptance)

### When to Ask (MUST ask)

- The next action is ambiguous (e.g., "another research round or proceed?")
- The decision has irreversible consequences (e.g., architecture choices, skipping a step)
- The user's intent or preference cannot be inferred from existing artifacts
- A sub-skill's BLOCKING gate explicitly requires user confirmation
- Multiple valid approaches exist with meaningfully different trade-offs

### When NOT to Ask (auto-transition)

- Only one logical next step exists (e.g., Problem complete → Research is the only option)
- The transition is deterministic from the state (e.g., Plan complete → Decompose)
- The decision is low-risk and reversible
- Existing artifacts or prior decisions already imply the answer

### Choice Format

Always present decisions in this format:

```
══════════════════════════════════════
DECISION REQUIRED: [brief context]
══════════════════════════════════════
A) [Option A — short description]
B) [Option B — short description]
C) [Option C — short description, if applicable]
D) [Option D — short description, if applicable]
══════════════════════════════════════
Recommendation: [A/B/C/D] — [one-line reason]
══════════════════════════════════════
```

Rules:

1. Always provide 2–4 concrete options (never open-ended questions)
2. Always include a recommendation with a brief justification
3. Keep option descriptions to one line each
4. If only 2 options make sense, use A/B only — do not pad with filler options
5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice
6. Record every user decision in the state file's `Key Decisions` section
7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive

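Rules 1–4 are mechanical enough to enforce in code. A minimal sketch of a renderer for the block above — the function name and 38-character bar are illustrative assumptions, not part of the skill spec:

```python
# Hypothetical renderer for the Choose A/B/C/D block; enforces rules 1-4.
BAR = "═" * 38

def render_choice(context, options, recommendation, reason):
    # Rules 1 and 4: 2-4 concrete options, never padded with filler
    assert 2 <= len(options) <= 4, "provide 2-4 concrete options"
    lines = [BAR, f"DECISION REQUIRED: {context}", BAR]
    # Rule 3: one line per option
    lines += [f"{'ABCD'[i]}) {opt}" for i, opt in enumerate(options)]
    # Rule 2: always include a recommendation with a brief justification
    lines += [BAR, f"Recommendation: {recommendation} — {reason}", BAR]
    return "\n".join(lines)
```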
## State File: `_docs/_autopilot_state.md`

The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist.

### Format

```markdown
# Autopilot State

## Current Step
step: [0-5 or "done"]
name: [Problem / Research / Plan / Decompose / Implement / Deploy / Done]
status: [not_started / in_progress / completed]
sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]

## Step ↔ SubStep Reference
| Step | Name | Sub-Skill | Internal SubSteps |
|------|------------|------------------------|------------------------------------------|
| 0 | Problem | problem/SKILL.md | Phase 1–4 |
| 1 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 |
| 2 | Plan | plan/SKILL.md | Step 1–6 |
| 3 | Decompose | decompose/SKILL.md | Step 1–4 |
| 4 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Deploy | deploy/SKILL.md | Step 1–7 |

When updating `Current Step`, always write it as:
step: N ← autopilot step (0–5)
sub_step: M ← sub-skill's own internal step/phase number + name

Example:
step: 2
name: Plan
status: in_progress
sub_step: 4 — Architecture Review & Risk Assessment

## Completed Steps

| Step | Name | Completed | Key Outcome |
|------|------|-----------|-------------|
| 0 | Problem | [date] | [one-line summary] |
| 1 | Research | [date] | [N drafts, final approach summary] |
| 2 | Plan | [date] | [N components, architecture summary] |
| 3 | Decompose | [date] | [N tasks, total complexity points] |
| 4 | Implement | [date] | [N batches, pass/fail summary] |
| 5 | Deploy | [date] | [artifacts produced] |

## Key Decisions
- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"]
- [decision 2: e.g. "6 research rounds, final draft: solution_draft06.md"]
- [decision N]

## Last Session
date: [date]
ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
reason: [completed step / session boundary / user paused / context limit]
notes: [any context for next session, e.g. "User asked to revisit risk assessment"]

## Blockers
- [blocker 1, if any]
- [none]
```

### State File Rules

1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 0)
2. **Update** the state file after every step completion, every session boundary, and every BLOCKING gate confirmation
3. **Read** the state file as the first action on every invocation — before folder scanning
4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., the state file says Step 2 but `_docs/02_document/architecture.md` already exists), trust the folder structure and update the state file to match
5. **Never delete** the state file. It accumulates history across the entire project lifecycle

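The `Current Step` block is line-oriented and easy to read mechanically. A sketch of a parser — a hypothetical helper, assuming only the field names from the format above:

```python
import re

# Hypothetical parser for the "## Current Step" fields of the state file.
def parse_current_step(state_text):
    fields = {}
    in_section = False
    for line in state_text.splitlines():
        if line.startswith("## "):
            # only read fields while inside the Current Step section
            in_section = line.strip() == "## Current Step"
            continue
        if in_section:
            m = re.match(r"(step|name|status|sub_step):\s*(.+)", line.strip())
            if m:
                fields[m.group(1)] = m.group(2)
    return fields
```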
## Execution Entry Point

Every invocation of this skill follows the same sequence:

```
1. Read _docs/_autopilot_state.md (if exists)
2. Read all File Index files above
3. Cross-check state file against _docs/ folder structure (rules in state.md)
4. Resolve flow (see Flow Resolution above)
5. Resolve current step (detection rules from the active flow file)
6. Present Status Summary (format in protocols.md)
7. Execute:
   a. Delegate to current skill (see Skill Delegation below)
   b. When skill completes → update state file (rules in state.md)
   c. Re-detect next step from the active flow's detection rules
   d. If next skill is ready → auto-chain (go to 7a with next skill)
   e. If session boundary reached → update state, suggest new conversation (rules in state.md)
   f. If all steps done → update state → report completion
```

## State Detection

Read `_docs/_autopilot_state.md` first. If it exists and is consistent with the folder structure, use the `Current Step` from the state file. If the state file doesn't exist or is inconsistent, fall back to folder scanning.

### Folder Scan Rules (fallback)

Scan `_docs/` to determine the current workflow position. Check rules in order — first match wins.

### Detection Rules

**Pre-Step — Existing Codebase Detection**

Condition: `_docs/` does not exist AND the workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`, `src/`, `Cargo.toml`, `*.csproj`, `package.json`)

Action: An existing codebase without documentation was detected. Present using Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Existing codebase detected
══════════════════════════════════════
A) Start fresh — define the problem from scratch (normal workflow)
B) Document existing codebase first — run /document to reverse-engineer docs, then continue
══════════════════════════════════════
Recommendation: B — the /document skill analyzes your code
bottom-up and produces _docs/ artifacts automatically,
then you can continue with refactor or the normal workflow.
══════════════════════════════════════
```

- If the user picks A → proceed to Step 0 (Problem Gathering) as normal
- If the user picks B → read and execute `.cursor/skills/document/SKILL.md`. After the document skill completes, re-detect state (the produced `_docs/` artifacts will place the project at Step 2 or later).

---

**Step 0 — Problem Gathering**

Condition: `_docs/00_problem/` does not exist, OR any of these are missing/empty:
- `problem.md`
- `restrictions.md`
- `acceptance_criteria.md`
- `input_data/` (must contain at least one file)

Action: Read and execute `.cursor/skills/problem/SKILL.md`

---

**Step 1 — Research (Initial)**

Condition: `_docs/00_problem/` is complete AND `_docs/01_solution/` has no `solution_draft*.md` files

Action: Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode A)

---

**Step 1b — Research Decision**

Condition: `_docs/01_solution/` contains `solution_draft*.md` files AND `_docs/01_solution/solution.md` does not exist AND `_docs/02_document/architecture.md` does not exist

Action: Present the current research state to the user:
- How many solution drafts exist
- Whether `tech_stack.md` and `security_analysis.md` exist
- A one-line summary from the latest draft

Then present using the **Choose format**:

```
══════════════════════════════════════
DECISION REQUIRED: Research complete — next action?
══════════════════════════════════════
A) Run another research round (Mode B assessment)
B) Proceed to planning with current draft
══════════════════════════════════════
Recommendation: [A or B] — [reason based on draft quality]
══════════════════════════════════════
```

- If the user picks A → read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode B)
- If the user picks B → auto-chain to Step 2 (Plan)

---

**Step 2 — Plan**

Condition: `_docs/01_solution/` has `solution_draft*.md` files AND `_docs/02_document/architecture.md` does not exist

Action:
1. The plan skill's Prereq 2 will rename the latest draft to `solution.md` — this is handled by the plan skill itself
2. Read and execute `.cursor/skills/plan/SKILL.md`

If `_docs/02_document/` exists but is incomplete (has some artifacts but no `FINAL_report.md`), the plan skill's built-in resumability handles it.

---

**Step 3 — Decompose**

Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`)

Action: Read and execute `.cursor/skills/decompose/SKILL.md`

If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.

---

**Step 4 — Implement**

Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist

Action: Read and execute `.cursor/skills/implement/SKILL.md`

If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.

---

**Step 5 — Deploy**

Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND `_docs/04_deploy/` does not exist or is incomplete

Action: Read and execute `.cursor/skills/deploy/SKILL.md`

---

**Done**

Condition: `_docs/04_deploy/` contains all expected artifacts (`containerization.md`, `ci_cd_pipeline.md`, `environment_strategy.md`, `observability.md`, `deployment_procedures.md`)

Action: Report project completion with summary.

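The detection rules above form a first-match scan over `_docs/`. A simplified sketch, reducing each condition to bare file existence (the fuller conditions, such as the `_dependencies_table.md` exclusion, are omitted; function and return values are illustrative):

```python
from pathlib import Path

# Simplified first-match folder scan over _docs/; conditions are reduced
# to file-existence checks and are a sketch, not the full rule set.
def detect_step(docs: Path) -> str:
    if not (docs / "00_problem" / "problem.md").exists():
        return "0 Problem"
    sol = docs / "01_solution"
    if not (sol.is_dir() and any(sol.glob("solution_draft*.md"))):
        return "1 Research"
    if not (docs / "02_document" / "architecture.md").exists():
        return "2 Plan"  # or 1b Research Decision; see the full conditions
    tasks = docs / "02_tasks"
    if not (tasks.is_dir() and any(tasks.glob("*.md"))):
        return "3 Decompose"
    if not (docs / "03_implementation" / "FINAL_implementation_report.md").exists():
        return "4 Implement"
    if not (docs / "04_deploy").is_dir():
        return "5 Deploy"
    return "done"
```

Because the rules are ordered, adding artifacts never moves the detected step backwards — each new file only lets the scan fall through to a later rule.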
## Status Summary

On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback).

Format:

```
═══════════════════════════════════════════════════
AUTOPILOT STATUS
═══════════════════════════════════════════════════
Step 0 Problem [DONE / IN PROGRESS / NOT STARTED]
Step 1 Research [DONE (N drafts) / IN PROGRESS / NOT STARTED]
Step 2 Plan [DONE / IN PROGRESS / NOT STARTED]
Step 3 Decompose [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 4 Implement [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 5 Deploy [DONE / IN PROGRESS / NOT STARTED]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Action: [what will happen next]
═══════════════════════════════════════════════════
```

For re-entry (state file exists), also include:

- Key decisions from the state file's `Key Decisions` section
- Last session context from the `Last Session` section
- Any blockers from the `Blockers` section

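The status table is deterministic given per-step statuses. An illustrative builder — the input dict, column widths, and function name are assumptions, not part of the spec:

```python
# Illustrative builder for the AUTOPILOT STATUS block above.
STEPS = ["Problem", "Research", "Plan", "Decompose", "Implement", "Deploy"]

def status_summary(statuses, current, action):
    bar = "═" * 51
    lines = [bar, "AUTOPILOT STATUS", bar]
    for n, name in enumerate(STEPS):
        state = statuses.get(name, "NOT STARTED")  # default for unseen steps
        lines.append(f"Step {n}  {name:<10} [{state}]")
    lines += [bar, f"Current: {current}", f"Action:  {action}", bar]
    return "\n".join(lines)
```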
## Auto-Chain Rules

After a skill completes, apply these rules:

| Completed Step | Next Action |
|---------------|-------------|
| Problem Gathering | Auto-chain → Research (Mode A) |
| Research (any round) | Auto-chain → Research Decision (ask user: another round or proceed?) |
| Research Decision → proceed | Auto-chain → Plan |
| Plan | Auto-chain → Decompose |
| Decompose | **Session boundary** — suggest new conversation before Implement |
| Implement | Auto-chain → Deploy |
| Deploy | Report completion |

### Session Boundary: Decompose → Implement

After decompose completes, **do not auto-chain to implement**. Instead:

1. Update the state file: mark Decompose as completed, set the current step to 4 (Implement) with status `not_started`
2. Write the `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready`
3. Present a summary: number of tasks, estimated batches, total complexity points
4. Use the Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Decompose complete — start implementation?
══════════════════════════════════════
A) Start a new conversation for implementation (recommended for context freshness)
B) Continue implementation in this conversation
══════════════════════════════════════
Recommendation: A — implementation is the longest phase, fresh context helps
══════════════════════════════════════
```

This is the only hard session boundary. All other transitions auto-chain.

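The auto-chain table above is effectively a lookup from completed step to next action. Sketched as a plain mapping — the tuple encoding and key spellings are an illustrative choice:

```python
# The auto-chain rules above as a lookup table; "session_boundary" marks
# the one hard break after Decompose. The tuple encoding is illustrative.
NEXT_ACTION = {
    "Problem Gathering": ("auto", "Research (Mode A)"),
    "Research": ("ask", "Research Decision"),   # another round or proceed?
    "Research Decision -> proceed": ("auto", "Plan"),
    "Plan": ("auto", "Decompose"),
    "Decompose": ("session_boundary", "Implement"),
    "Implement": ("auto", "Deploy"),
    "Deploy": ("done", None),
}

def next_after(completed):
    return NEXT_ACTION[completed]
```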
## Skill Delegation

For each step, the delegation pattern is:

1. Update state file: set `step` to the autopilot step number, status to `in_progress`, set `sub_step` to the sub-skill's current internal step/phase
2. Announce: "Starting [Skill Name]..."
3. Read the skill file: `.cursor/skills/[name]/SKILL.md`
4. Execute the skill's workflow exactly as written, including all BLOCKING gates, self-verification checklists, save actions, and escalation rules. Update `sub_step` in state each time the sub-skill advances.
5. When complete: mark step `completed`, record date + key outcome, add key decisions to the state file, return to the auto-chain rules (from the active flow file)

Do NOT modify, skip, or abbreviate any part of the sub-skill's workflow. The autopilot is a sequencer, not an optimizer.

## Re-Entry Protocol

When the user invokes `/autopilot` and work already exists:

1. Read `_docs/_autopilot_state.md`
2. Cross-check against the `_docs/` folder structure
3. Present the Status Summary with context from the state file (key decisions, last session, blockers)
4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, and deploy all do), the sub-skill handles mid-step recovery
5. Continue execution from the detected state

## Error Handling

All error situations that require user input MUST use the **Choose A / B / C / D** format.

| Situation | Action |
|-----------|--------|
| State detection is ambiguous (artifacts suggest two different steps) | Present findings and use Choose format with the candidate steps as options |
| Sub-skill fails or hits an unrecoverable blocker | Use Choose format: A) retry, B) skip with warning, C) abort and fix manually |
| User wants to skip a step | Use Choose format: A) skip (with dependency warning), B) execute the step |
| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |

## Trigger Conditions

This skill activates when the user wants to:

```
│ Autopilot (Auto-Chain Orchestrator) │
├────────────────────────────────────────────────────────────────┤
│ EVERY INVOCATION: │
│ 1. Read state file + module files │
│ 2. Resolve flow & current step │
│ 3. Status Summary → Execute → Auto-chain (loop) │
│ │
│ GREENFIELD FLOW (flows/greenfield.md): │
│ Step 0 Problem → Step 1 Research → Step 2 Plan │
│ → Step 3 Decompose → [SESSION] → Step 4 Implement │
│ → Step 5 Run Tests → Step 6 Deploy → DONE │
│ │
│ EXISTING CODE FLOW (flows/existing-code.md): │
│ Pre-Step Document → 2b Test Spec → 2c Decompose Tests │
│ → [SESSION] → 2d Implement Tests → 2e Refactor │
│ → 2f New Task → [SESSION] → 2g Implement │
│ → 2h Run Tests → 2i Deploy → DONE │
│ │
│ STATE: _docs/_autopilot_state.md (see state.md) │
│ PROTOCOLS: choice format, Jira auth, errors (see protocols.md) │
│ PAUSE POINTS: sub-skill BLOCKING gates only │
│ SESSION BREAK: after Decompose/New Task (before Implement) │
├────────────────────────────────────────────────────────────────┤
│ Auto-chain · State to file · Rich re-entry · Delegate │
│ Pause at decisions only · Minimize interruptions │
└────────────────────────────────────────────────────────────────┘
```
|
|||||||
@@ -0,0 +1,181 @@
|
|||||||
|
# Existing Code Workflow
|
||||||
|
|
||||||
|
Workflow for projects with an existing codebase. Starts with documentation, produces test specs, decomposes and implements tests, refactors with that safety net, then adds new functionality and deploys.
|
||||||
|
|
||||||
|
## Step Reference Table
|
||||||
|
|
||||||
|
| Step | Name | Sub-Skill | Internal SubSteps |
|
||||||
|
|------|-------------------------|---------------------------------|---------------------------------------|
|
||||||
|
| — | Document (pre-step) | document/SKILL.md | Steps 1–8 |
|
||||||
|
| 2b | Blackbox Test Spec | blackbox-test-spec/SKILL.md | Phase 1a–1b |
|
||||||
|
| 2c | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
|
||||||
|
| 2d | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
|
||||||
|
| 2e | Refactor | refactor/SKILL.md | Phases 0–5 (6-phase method) |
|
||||||
|
| 2f | New Task | new-task/SKILL.md | Steps 1–8 (loop) |
|
||||||
|
| 2g | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
|
||||||
|
| 2h | Run Tests | (autopilot-managed) | Unit tests → Integration/blackbox tests |
|
||||||
|
| 2i | Deploy | deploy/SKILL.md | Steps 1–7 |
|
||||||
|
|
||||||
|
After Step 2i, the existing-code workflow is complete.
|
||||||
|
|
||||||
|
## Detection Rules
|
||||||
|
|
||||||
|
Check rules in order — first match wins.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Pre-Step — Existing Codebase Detection**

Condition: `_docs/` does not exist AND the workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`, `src/`, `Cargo.toml`, `*.csproj`, `package.json`)

Action: An existing codebase without documentation was detected. Present using Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Existing codebase detected
══════════════════════════════════════
A) Start fresh — define the problem from scratch (greenfield workflow)
B) Document existing codebase first — run /document to reverse-engineer docs, then continue
══════════════════════════════════════
Recommendation: B — the /document skill analyzes your code
bottom-up and produces _docs/ artifacts automatically,
then you can continue with test specs, refactor, and new features.
══════════════════════════════════════
```

- If user picks A → proceed to Step 0 (Problem Gathering) in the greenfield flow
- If user picks B → read and execute `.cursor/skills/document/SKILL.md`. After the document skill completes, re-detect state (the produced `_docs/` artifacts will place the project at Step 2b or later).
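A minimal sketch of the Pre-Step check, assuming the marker list from the condition above; the helper names (`has_source_files`, `is_existing_codebase`) are illustrative, not part of the skill.

```python
from pathlib import Path

# Sketch of the Pre-Step check: an existing codebase is assumed when _docs/
# is absent but recognizable source files or manifests are present.
# The globs and marker names mirror the examples in the condition above.
SOURCE_GLOBS = ["*.py", "*.cs", "*.rs", "*.ts", "*.csproj"]
SOURCE_MARKERS = ["src", "Cargo.toml", "package.json"]

def has_source_files(root: Path) -> bool:
    if any((root / marker).exists() for marker in SOURCE_MARKERS):
        return True
    return any(any(root.rglob(glob)) for glob in SOURCE_GLOBS)

def is_existing_codebase(root: Path) -> bool:
    return not (root / "_docs").exists() and has_source_files(root)
```

When this check matches, the engine presents the A/B choice above rather than silently entering the greenfield flow.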

---

**Step 2b — Blackbox Test Spec**

Condition: `_docs/02_document/FINAL_report.md` exists AND workspace contains source code files (e.g., `*.py`, `*.cs`, `*.rs`, `*.ts`) AND `_docs/02_document/integration_tests/traceability_matrix.md` does not exist AND the autopilot state shows Document was run (check `Completed Steps` for a "Document" entry)

Action: Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`

This step applies when the codebase was documented via the `/document` skill. Test specifications must be produced before refactoring or further development.

---

**Step 2c — Decompose Tests**

Condition: `_docs/02_document/integration_tests/traceability_matrix.md` exists AND workspace contains source code files AND the autopilot state shows Document was run AND (`_docs/02_tasks/` does not exist or has no task files)

Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/integration_tests/` as input). The decompose skill will:
1. Run Step 1t (test infrastructure bootstrap)
2. Run Step 3 (integration test task decomposition)
3. Run Step 4 (cross-verification against test coverage)

If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.

---

**Step 2d — Implement Tests**

Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND the autopilot state shows Step 2c (Decompose Tests) is completed AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist

Action: Read and execute `.cursor/skills/implement/SKILL.md`

The implement skill reads test tasks from `_docs/02_tasks/` and implements them.

If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.

---

**Step 2e — Refactor**

Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state shows Step 2d (Implement Tests) is completed AND `_docs/04_refactor/FINAL_refactor_report.md` does not exist

Action: Read and execute `.cursor/skills/refactor/SKILL.md`

The refactor skill runs the full 6-phase method using the implemented tests as a safety net.

If `_docs/04_refactor/` has phase reports, the refactor skill detects completed phases and continues.

---

**Step 2f — New Task**

Condition: `_docs/04_refactor/FINAL_refactor_report.md` exists AND the autopilot state shows Step 2e (Refactor) is completed AND the autopilot state does NOT show Step 2f (New Task) as completed

Action: Read and execute `.cursor/skills/new-task/SKILL.md`

The new-task skill interactively guides the user through defining new functionality. It loops until the user is done adding tasks. New task files are written to `_docs/02_tasks/`.

---

**Step 2g — Implement**

Condition: the autopilot state shows Step 2f (New Task) is completed AND `_docs/03_implementation/` does not contain a FINAL report covering the new tasks (check state for the distinction between test implementation and feature implementation)

Action: Read and execute `.cursor/skills/implement/SKILL.md`

The implement skill reads the new tasks from `_docs/02_tasks/` and implements them. Tasks already implemented in Step 2d are skipped (the implement skill tracks completed tasks in batch reports).

If `_docs/03_implementation/` has batch reports from this phase, the implement skill detects completed tasks and continues.
---

**Step 2h — Run Tests**

Condition: the autopilot state shows Step 2g (Implement) is completed AND the autopilot state does NOT show Step 2h (Run Tests) as completed

Action: Run the full test suite to verify the implementation before deployment.

1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
3. **Report results**: present a summary of passed/failed/skipped tests

If all tests pass → auto-chain to Step 2i (Deploy).

If tests fail → present using Choose format:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped]
══════════════════════════════════════
A) Fix failing tests and re-run
B) Proceed to deploy anyway (not recommended)
C) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix failures before deploying
══════════════════════════════════════
```
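One way to sketch sub-step 1's runner detection; the marker-file-to-command mapping is an assumption for illustration, and the real step runs whatever runner the project actually uses.

```python
from pathlib import Path

# Hypothetical marker-file → test-command mapping for sub-step 1.
# First match wins; None means no known runner was detected.
RUNNERS = [
    ("pyproject.toml", "pytest"),
    ("Cargo.toml", "cargo test"),
    ("package.json", "npm test"),
]

def detect_test_command(root: Path):
    for marker, command in RUNNERS:
        if (root / marker).exists():
            return command
    if any(root.glob("*.csproj")) or any(root.glob("*.sln")):
        return "dotnet test"
    return None
```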

---

**Step 2i — Deploy**

Condition: the autopilot state shows Step 2h (Run Tests) is completed AND (`_docs/04_deploy/` does not exist or is incomplete)

Action: Read and execute `.cursor/skills/deploy/SKILL.md`

After deployment completes, the existing-code workflow is done.

---

**Re-Entry After Completion**

Condition: the autopilot state shows `step: done` OR all steps through 2i (Deploy) are completed

Action: The project completed a full cycle. Present status and loop back to New Task:

```
══════════════════════════════════════
PROJECT CYCLE COMPLETE
══════════════════════════════════════
The previous cycle finished successfully.
You can now add new functionality.
══════════════════════════════════════
A) Add new features (start New Task)
B) Done — no more changes needed
══════════════════════════════════════
```

- If user picks A → set `step: 2f`, `status: not_started` in the state file, then auto-chain to Step 2f (New Task). Previous cycle history stays in Completed Steps.
- If user picks B → report final project status and exit.

## Auto-Chain Rules

| Completed Step | Next Action |
|---------------|-------------|
| Document (existing code) | Auto-chain → Blackbox Test Spec (Step 2b) |
| Blackbox Test Spec (Step 2b) | Auto-chain → Decompose Tests (Step 2c) |
| Decompose Tests (Step 2c) | **Session boundary** — suggest new conversation before Implement Tests |
| Implement Tests (Step 2d) | Auto-chain → Refactor (Step 2e) |
| Refactor (Step 2e) | Auto-chain → New Task (Step 2f) |
| New Task (Step 2f) | **Session boundary** — suggest new conversation before Implement |
| Implement (Step 2g) | Auto-chain → Run Tests (Step 2h) |
| Run Tests (Step 2h, all pass) | Auto-chain → Deploy (Step 2i) |
| Deploy (Step 2i) | **Workflow complete** — existing-code flow done |
@@ -0,0 +1,146 @@

# Greenfield Workflow

Workflow for new projects built from scratch. Flows linearly: Problem → Research → Plan → Decompose → Implement → Run Tests → Deploy.

## Step Reference Table

| Step | Name | Sub-Skill | Internal SubSteps |
|------|-----------|------------------------|---------------------------------------|
| 0 | Problem | problem/SKILL.md | Phase 1–4 |
| 1 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 |
| 2 | Plan | plan/SKILL.md | Step 1–6 |
| 3 | Decompose | decompose/SKILL.md | Step 1–4 |
| 4 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Run Tests | (autopilot-managed) | Unit tests → Integration/blackbox tests |
| 6 | Deploy | deploy/SKILL.md | Step 1–7 |

## Detection Rules

Check rules in order — first match wins.

---
**Step 0 — Problem Gathering**

Condition: `_docs/00_problem/` does not exist, OR any of these are missing/empty:
- `problem.md`
- `restrictions.md`
- `acceptance_criteria.md`
- `input_data/` (must contain at least one file)

Action: Read and execute `.cursor/skills/problem/SKILL.md`

---

**Step 1 — Research (Initial)**

Condition: `_docs/00_problem/` is complete AND `_docs/01_solution/` has no `solution_draft*.md` files

Action: Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode A)

---

**Step 1b — Research Decision**

Condition: `_docs/01_solution/` contains `solution_draft*.md` files AND `_docs/01_solution/solution.md` does not exist AND `_docs/02_document/architecture.md` does not exist

Action: Present the current research state to the user:
- How many solution drafts exist
- Whether tech_stack.md and security_analysis.md exist
- One-line summary from the latest draft

Then present using the **Choose format**:

```
══════════════════════════════════════
DECISION REQUIRED: Research complete — next action?
══════════════════════════════════════
A) Run another research round (Mode B assessment)
B) Proceed to planning with current draft
══════════════════════════════════════
Recommendation: [A or B] — [reason based on draft quality]
══════════════════════════════════════
```

- If user picks A → Read and execute `.cursor/skills/research/SKILL.md` (will auto-detect Mode B)
- If user picks B → auto-chain to Step 2 (Plan)

---

**Step 2 — Plan**

Condition: `_docs/01_solution/` has `solution_draft*.md` files AND `_docs/02_document/architecture.md` does not exist

Action:
1. The plan skill's Prereq 2 will rename the latest draft to `solution.md` — this is handled by the plan skill itself
2. Read and execute `.cursor/skills/plan/SKILL.md`

If `_docs/02_document/` exists but is incomplete (has some artifacts but no `FINAL_report.md`), the plan skill's built-in resumability handles it.

---
**Step 3 — Decompose**

Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`) AND (workspace has no source code files OR the user explicitly chose the greenfield workflow in the existing-codebase Pre-Step)

Action: Read and execute `.cursor/skills/decompose/SKILL.md`

If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.

---

**Step 4 — Implement**

Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist

Action: Read and execute `.cursor/skills/implement/SKILL.md`

If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.

---
**Step 5 — Run Tests**

Condition: `_docs/03_implementation/FINAL_implementation_report.md` exists AND the autopilot state does NOT show Step 5 (Run Tests) as completed AND (`_docs/04_deploy/` does not exist or is incomplete)

Action: Run the full test suite to verify the implementation before deployment.

1. **Unit tests**: detect the project's test runner (e.g., `pytest`, `dotnet test`, `cargo test`, `npm test`) and run all unit tests
2. **Integration / blackbox tests**: if `docker-compose.test.yml` or an equivalent test environment exists, spin it up and run the integration test suite
3. **Report results**: present a summary of passed/failed/skipped tests

If all tests pass → auto-chain to Step 6 (Deploy).

If tests fail → present using Choose format:

```
══════════════════════════════════════
TEST RESULTS: [N passed, M failed, K skipped]
══════════════════════════════════════
A) Fix failing tests and re-run
B) Proceed to deploy anyway (not recommended)
C) Abort — fix manually
══════════════════════════════════════
Recommendation: A — fix failures before deploying
══════════════════════════════════════
```

---

**Step 6 — Deploy**

Condition: the autopilot state shows Step 5 (Run Tests) is completed AND (`_docs/04_deploy/` does not exist or is incomplete)

Action: Read and execute `.cursor/skills/deploy/SKILL.md`

---

**Done**

Condition: `_docs/04_deploy/` contains all expected artifacts (containerization.md, ci_cd_pipeline.md, environment_strategy.md, observability.md, deployment_procedures.md)

Action: Report project completion with summary. If the user runs autopilot again after greenfield completion, Flow Resolution rule 3 routes to the existing-code flow (re-entry after completion) so they can add new features.

## Auto-Chain Rules

| Completed Step | Next Action |
|---------------|-------------|
| Problem Gathering | Auto-chain → Research (Mode A) |
| Research (any round) | Auto-chain → Research Decision (ask user: another round or proceed?) |
| Research Decision → proceed | Auto-chain → Plan |
| Plan | Auto-chain → Decompose |
| Decompose | **Session boundary** — suggest new conversation before Implement |
| Implement | Auto-chain → Run Tests (Step 5) |
| Run Tests (all pass) | Auto-chain → Deploy (Step 6) |
| Deploy | Report completion |
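The table above amounts to a small transition map. A sketch with illustrative step keys ("auto" chains immediately, "session" marks a hard session boundary, "ask" requires a user decision, "done" ends the flow):

```python
# The greenfield auto-chain table as a transition map. Keys and labels
# are illustrative shorthand for the steps in the table above.
NEXT_ACTION = {
    "problem":           ("auto", "research"),
    "research":          ("ask", "research_decision"),
    "research_decision": ("auto", "plan"),
    "plan":              ("auto", "decompose"),
    "decompose":         ("session", "implement"),
    "implement":         ("auto", "run_tests"),
    "run_tests":         ("auto", "deploy"),
    "deploy":            ("done", None),
}
```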
@@ -0,0 +1,158 @@

# Autopilot Protocols

## User Interaction Protocol

Every time the autopilot or a sub-skill needs a user decision, use the **Choose A / B / C / D** format. This applies to:

- State transitions where multiple valid next actions exist
- Sub-skill BLOCKING gates that require user judgment
- Any fork where the autopilot cannot confidently pick the right path
- Trade-off decisions (tech choices, scope, risk acceptance)

### When to Ask (MUST ask)

- The next action is ambiguous (e.g., "another research round or proceed?")
- The decision has irreversible consequences (e.g., architecture choices, skipping a step)
- The user's intent or preference cannot be inferred from existing artifacts
- A sub-skill's BLOCKING gate explicitly requires user confirmation
- Multiple valid approaches exist with meaningfully different trade-offs

### When NOT to Ask (auto-transition)

- Only one logical next step exists (e.g., Problem complete → Research is the only option)
- The transition is deterministic from the state (e.g., Plan complete → Decompose)
- The decision is low-risk and reversible
- Existing artifacts or prior decisions already imply the answer
### Choice Format

Always present decisions in this format:

```
══════════════════════════════════════
DECISION REQUIRED: [brief context]
══════════════════════════════════════
A) [Option A — short description]
B) [Option B — short description]
C) [Option C — short description, if applicable]
D) [Option D — short description, if applicable]
══════════════════════════════════════
Recommendation: [A/B/C/D] — [one-line reason]
══════════════════════════════════════
```

Rules:
1. Always provide 2–4 concrete options (never open-ended questions)
2. Always include a recommendation with a brief justification
3. Keep option descriptions to one line each
4. If only 2 options make sense, use A/B only — do not pad with filler options
5. Play the notification sound (per `human-input-sound.mdc`) before presenting the choice
6. Record every user decision in the state file's `Key Decisions` section
7. After the user picks, proceed immediately — no follow-up confirmation unless the choice was destructive
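Rules 1 and 2 could be enforced mechanically by a small prompt builder. A sketch; the function name and bar width are assumptions:

```python
# Illustrative builder for the Choose format. Enforces rule 1 (2-4
# options) and rule 2 (mandatory recommendation); bar width is a guess.
BAR = "═" * 38

def render_choice(context, options, recommendation):
    if not 2 <= len(options) <= 4:
        raise ValueError("provide 2-4 options, no filler")
    lines = [BAR, f"DECISION REQUIRED: {context}", BAR]
    lines += [f"{'ABCD'[i]}) {text}" for i, text in enumerate(options)]
    lines += [BAR, f"Recommendation: {recommendation}", BAR]
    return "\n".join(lines)
```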

## Jira MCP Authentication

Several workflow steps create Jira artifacts (epics, tasks, links). The Jira MCP server must be authenticated **before** any step that writes to Jira.

### Steps That Require Jira MCP

| Step | Sub-Step | Jira Action |
|------|----------|-------------|
| 2 (Plan) | Step 6 — Jira Epics | Create epics for each component |
| 2c (Decompose Tests) | Step 1t + Step 3 — All test tasks | Create Jira ticket per task, link to epic |
| 2f (New Task) | Step 7 — Jira ticket | Create Jira ticket per task, link to epic |
| 3 (Decompose) | Step 1–3 — All tasks | Create Jira ticket per task, link to epic |
### Authentication Gate

Before entering **Step 2 (Plan)**, **Step 2c (Decompose Tests)**, **Step 2f (New Task)**, or **Step 3 (Decompose)** for the first time, the autopilot must:

1. Call `mcp_auth` on the Jira MCP server
2. If authentication succeeds → proceed normally
3. If the user **skips** or authentication fails → present using Choose format:

```
══════════════════════════════════════
Jira MCP authentication failed
══════════════════════════════════════
A) Retry authentication (retry mcp_auth)
B) Continue without Jira (tasks saved locally only)
══════════════════════════════════════
Recommendation: A — Jira IDs drive task referencing,
dependency tracking, and implementation batching.
Without Jira, task files use numeric prefixes instead.
══════════════════════════════════════
```

If user picks **B** (continue without Jira):
- Set a flag in the state file: `jira_enabled: false`
- All skills that would create Jira tickets instead save metadata locally in the task/epic files with `Jira: pending` status
- Task files keep numeric prefixes (e.g., `01_initial_structure.md`) instead of Jira ID prefixes
- The workflow proceeds normally in all other respects
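The gate's retry loop can be sketched as follows; `authenticate` and `ask_user` stand in for the real `mcp_auth` call and the Choose prompt:

```python
# Sketch of the authentication gate. Returns the jira_enabled flag the
# state file should record: True after a successful auth, False once the
# user picks B (continue without Jira). "A" retries, "B" gives up.
def jira_auth_gate(authenticate, ask_user) -> bool:
    while True:
        if authenticate():
            return True
        if ask_user() == "B":
            return False  # caller writes jira_enabled: false
```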

### Re-Authentication

If Jira MCP was already authenticated in a previous invocation (verify by listing available Jira tools beyond `mcp_auth`), skip the auth gate.

## Error Handling

All error situations that require user input MUST use the **Choose A / B / C / D** format.

| Situation | Action |
|-----------|--------|
| State detection is ambiguous (artifacts suggest two different steps) | Present findings and use Choose format with the candidate steps as options |
| Sub-skill fails or hits an unrecoverable blocker | Use Choose format: A) retry, B) skip with warning, C) abort and fix manually |
| User wants to skip a step | Use Choose format: A) skip (with dependency warning), B) execute the step |
| User wants to go back to a previous step | Use Choose format: A) re-run (with overwrite warning), B) stay on current step |
| User asks "where am I?" without wanting to continue | Show Status Summary only, do not start execution |

## Status Summary

On every invocation, before executing any skill, present a status summary built from the state file (with folder scan fallback). Use the template matching the active flow (see Flow Resolution in SKILL.md).

### Greenfield Flow

```
═══════════════════════════════════════════════════
AUTOPILOT STATUS (greenfield)
═══════════════════════════════════════════════════
Step 0  Problem    [DONE / IN PROGRESS / NOT STARTED]
Step 1  Research   [DONE (N drafts) / IN PROGRESS / NOT STARTED]
Step 2  Plan       [DONE / IN PROGRESS / NOT STARTED]
Step 3  Decompose  [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 4  Implement  [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 5  Run Tests  [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED]
Step 6  Deploy     [DONE / IN PROGRESS / NOT STARTED]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Action: [what will happen next]
═══════════════════════════════════════════════════
```

### Existing Code Flow

```
═══════════════════════════════════════════════════
AUTOPILOT STATUS (existing-code)
═══════════════════════════════════════════════════
Pre      Document            [DONE / IN PROGRESS / NOT STARTED]
Step 2b  Blackbox Test Spec  [DONE / IN PROGRESS / NOT STARTED]
Step 2c  Decompose Tests     [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 2d  Implement Tests     [DONE / IN PROGRESS (batch M) / NOT STARTED]
Step 2e  Refactor            [DONE / IN PROGRESS (phase N) / NOT STARTED]
Step 2f  New Task            [DONE (N tasks) / IN PROGRESS / NOT STARTED]
Step 2g  Implement           [DONE / IN PROGRESS (batch M of ~N) / NOT STARTED]
Step 2h  Run Tests           [DONE (N passed, M failed) / IN PROGRESS / NOT STARTED]
Step 2i  Deploy              [DONE / IN PROGRESS / NOT STARTED]
═══════════════════════════════════════════════════
Current: Step N — Name
SubStep: M — [sub-skill internal step name]
Action: [what will happen next]
═══════════════════════════════════════════════════
```

For re-entry (state file exists), also include:
- Key decisions from the state file's `Key Decisions` section
- Last session context from the `Last Session` section
- Any blockers from the `Blockers` section
@@ -0,0 +1,102 @@

# Autopilot State Management

## State File: `_docs/_autopilot_state.md`

The autopilot persists its state to `_docs/_autopilot_state.md`. This file is the primary source of truth for re-entry. Folder scanning is the fallback when the state file doesn't exist.

### Format

```markdown
# Autopilot State

## Current Step
step: [0-6 or "2b" / "2c" / "2d" / "2e" / "2f" / "2g" / "2h" / "2i" or "done"]
name: [Problem / Research / Plan / Blackbox Test Spec / Decompose Tests / Implement Tests / Refactor / New Task / Implement / Run Tests / Deploy / Decompose / Done]
status: [not_started / in_progress / completed]
sub_step: [optional — sub-skill internal step number + name if interrupted mid-step]

## Step ↔ SubStep Reference
(include the step reference table from the active flow file)

When updating `Current Step`, always write it as:
step: N ← autopilot step (0–6 or 2b/2c/2d/2e/2f/2g/2h/2i)
sub_step: M ← sub-skill's own internal step/phase number + name
Example:
step: 2
name: Plan
status: in_progress
sub_step: 4 — Architecture Review & Risk Assessment

## Completed Steps

| Step | Name | Completed | Key Outcome |
|------|------|-----------|-------------|
| 0 | Problem | [date] | [one-line summary] |
| 1 | Research | [date] | [N drafts, final approach summary] |
| ... | ... | ... | ... |

## Key Decisions
- [decision 1: e.g. "Tech stack: Python + Rust for perf-critical, Postgres DB"]
- [decision N]

## Last Session
date: [date]
ended_at: Step [N] [Name] — SubStep [M] [sub-step name]
reason: [completed step / session boundary / user paused / context limit]
notes: [any context for next session]

## Blockers
- [blocker 1, if any]
- [none]
```
### State File Rules

1. **Create** the state file on the very first autopilot invocation (after state detection determines Step 0)
2. **Update** the state file after every step completion, every session boundary, and every BLOCKING gate confirmation
3. **Read** the state file as the first action on every invocation — before folder scanning
4. **Cross-check**: after reading the state file, verify against actual `_docs/` folder contents. If they disagree (e.g., state file says Step 2 but `_docs/02_document/architecture.md` already exists), trust the folder structure and update the state file to match
5. **Never delete** the state file. It accumulates history across the entire project lifecycle
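Rule 4's cross-check can be sketched as follows; the artifact-to-step mapping is a condensed, illustrative subset, ordered most advanced first:

```python
from pathlib import Path

# Sketch of rule 4: artifacts on disk outrank the state file. The list
# is ordered most-advanced-first; the mapping shown is illustrative.
ARTIFACT_STEPS = [
    ("03_implementation/FINAL_implementation_report.md", "5"),
    ("02_document/architecture.md", "3"),
    ("01_solution/solution.md", "2"),
]

def reconcile(state_step: str, docs: Path) -> str:
    for artifact, implied_step in ARTIFACT_STEPS:
        if (docs / artifact).exists():
            return implied_step  # folder wins; caller updates the state file
    return state_step
```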

## State Detection

Read `_docs/_autopilot_state.md` first. If it exists and is consistent with the folder structure, use the `Current Step` from the state file. If the state file doesn't exist or is inconsistent, fall back to folder scanning.

### Folder Scan Rules (fallback)

Scan `_docs/` to determine the current workflow position. The detection rules are defined in each flow file (`flows/greenfield.md` and `flows/existing-code.md`). Check the existing-code flow first (Pre-Step detection), then greenfield flow rules. First match wins.
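A condensed sketch of the ordered scan, with illustrative step labels; only two greenfield rules are shown, and the authoritative rules live in the flow files:

```python
from pathlib import Path

# Fallback folder scan as an ordered (matched, step) list; first match
# wins. Two condensed greenfield rules only — illustrative, not complete.
def detect_step(docs: Path) -> str:
    solution = docs / "01_solution"
    rules = [
        (not (docs / "00_problem").is_dir(), "0 — Problem"),
        (not solution.is_dir()
         or not any(solution.glob("solution_draft*.md")), "1 — Research"),
    ]
    for matched, step in rules:
        if matched:
            return step
    return "1b — Research Decision"
```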

## Re-Entry Protocol

When the user invokes `/autopilot` and work already exists:

1. Read `_docs/_autopilot_state.md`
2. Cross-check against `_docs/` folder structure
3. Present Status Summary with context from state file (key decisions, last session, blockers)
4. If the detected step has a sub-skill with built-in resumability (plan, decompose, implement, deploy all do), the sub-skill handles mid-step recovery
5. Continue execution from detected state
## Session Boundaries

After any decompose/planning step completes (Step 2c, Step 2f, or Step 3), **do not auto-chain to implement**. Instead:

1. Update state file: mark the step as completed, set current step to the next implement step with status `not_started`
   - After Step 2c (Decompose Tests) → set current step to 2d (Implement Tests)
   - After Step 2f (New Task) → set current step to 2g (Implement)
   - After Step 3 (Decompose) → set current step to 4 (Implement)
2. Write `Last Session` section: `reason: session boundary`, `notes: Decompose complete, implementation ready`
3. Present a summary: number of tasks, estimated batches, total complexity points
4. Use Choose format:

```
══════════════════════════════════════
DECISION REQUIRED: Decompose complete — start implementation?
══════════════════════════════════════
A) Start a new conversation for implementation (recommended for context freshness)
B) Continue implementation in this conversation
══════════════════════════════════════
Recommendation: A — implementation is the longest phase, fresh context helps
══════════════════════════════════════
```

These are the only hard session boundaries. All other transitions auto-chain.
@@ -0,0 +1,218 @@

---
name: blackbox-test-spec
description: |
  Black-box integration test specification skill. Analyzes input data completeness and produces
  detailed E2E test scenarios (functional + non-functional) that treat the system as a black box.
  2-phase workflow: input data completeness analysis, then test scenario specification.
  Produces 5 artifacts under integration_tests/.
  Trigger phrases:
  - "blackbox test spec", "black box tests", "integration test spec"
  - "test specification", "e2e test spec"
  - "test scenarios", "black box scenarios"
category: build
tags: [testing, black-box, integration-tests, e2e, test-specification, qa]
disable-model-invocation: true
---

# Black-Box Test Scenario Specification

Analyze input data completeness and produce detailed black-box integration test specifications. Tests describe what the system should do given specific inputs — they never reference internals.

## Core Principles

- **Black-box only**: tests describe observable behavior through public interfaces; no internal implementation details
- **Traceability**: every test traces to at least one acceptance criterion or restriction
- **Save immediately**: write artifacts to disk after each phase; never accumulate unsaved work
- **Ask, don't assume**: when requirements are ambiguous, ask the user before proceeding
- **Spec, don't code**: this workflow produces test specifications, never test implementation code

## Context Resolution

Fixed paths — no mode detection needed:

- PROBLEM_DIR: `_docs/00_problem/`
- SOLUTION_DIR: `_docs/01_solution/`
- DOCUMENT_DIR: `_docs/02_document/`
- TESTS_OUTPUT_DIR: `_docs/02_document/integration_tests/`

Announce the resolved paths to the user before proceeding.
## Input Specification
|
||||||
|
|
||||||
|
### Required Files
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `_docs/00_problem/problem.md` | Problem description and context |
|
||||||
|
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
|
||||||
|
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
|
||||||
|
| `_docs/00_problem/input_data/` | Reference data examples |
|
||||||
|
| `_docs/01_solution/solution.md` | Finalized solution |
|
||||||
|
|
||||||
|
### Optional Files (used when available)
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `DOCUMENT_DIR/architecture.md` | System architecture for environment design |
|
||||||
|
| `DOCUMENT_DIR/system-flows.md` | System flows for test scenario coverage |
|
||||||
|
| `DOCUMENT_DIR/components/` | Component specs for interface identification |
|
||||||
|
|
||||||
|
### Prerequisite Checks (BLOCKING)
|
||||||
|
|
||||||
|
1. `acceptance_criteria.md` exists and is non-empty — **STOP if missing**
|
||||||
|
2. `restrictions.md` exists and is non-empty — **STOP if missing**
|
||||||
|
3. `input_data/` exists and contains at least one file — **STOP if missing**
|
||||||
|
4. `problem.md` exists and is non-empty — **STOP if missing**
|
||||||
|
5. `solution.md` exists and is non-empty — **STOP if missing**
|
||||||
|
6. Create TESTS_OUTPUT_DIR if it does not exist
|
||||||
|
7. If TESTS_OUTPUT_DIR already contains files, ask user: **resume from last checkpoint or start fresh?**
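The gate above can be sketched as a small checker. Paths come from this skill's Context Resolution; the helper function itself is hypothetical, not part of the skill:

```python
from pathlib import Path

REQUIRED_FILES = [
    "_docs/00_problem/problem.md",
    "_docs/00_problem/acceptance_criteria.md",
    "_docs/00_problem/restrictions.md",
    "_docs/01_solution/solution.md",
]

def blocking_problems(root: str = ".") -> list[str]:
    """Return the list of blocking issues; an empty list means proceed."""
    base = Path(root)
    problems = [
        f"missing or empty: {rel}"
        for rel in REQUIRED_FILES
        if not (base / rel).is_file() or (base / rel).stat().st_size == 0
    ]
    input_data = base / "_docs/00_problem/input_data"
    if not input_data.is_dir() or not any(input_data.iterdir()):
        problems.append("missing or empty: _docs/00_problem/input_data/")
    if not problems:
        # Check 6: create the output directory only once the gate passes
        (base / "_docs/02_document/integration_tests").mkdir(parents=True, exist_ok=True)
    return problems
```

Any non-empty result maps to a **STOP** in the checklist above.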
## Artifact Management

### Directory Structure

```
TESTS_OUTPUT_DIR/
├── environment.md
├── test_data.md
├── functional_tests.md
├── non_functional_tests.md
└── traceability_matrix.md
```

### Save Timing

| Phase | Save immediately after | Filename |
|-------|------------------------|----------|
| Phase 1a | Input data analysis (no file — findings feed Phase 1b) | — |
| Phase 1b | Environment spec | `environment.md` |
| Phase 1b | Test data spec | `test_data.md` |
| Phase 1b | Functional tests | `functional_tests.md` |
| Phase 1b | Non-functional tests | `non_functional_tests.md` |
| Phase 1b | Traceability matrix | `traceability_matrix.md` |

### Resumability

If TESTS_OUTPUT_DIR already contains files:

1. List existing files and match them to the save timing table above
2. Identify which phase/artifacts are complete
3. Resume from the next incomplete artifact
4. Inform the user which artifacts are being skipped

## Progress Tracking

At the start of execution, create a TodoWrite with both phases. Update status as each phase completes.

## Workflow

### Phase 1a: Input Data Completeness Analysis

**Role**: Professional Quality Assurance Engineer
**Goal**: Assess whether the available input data is sufficient to build comprehensive test scenarios
**Constraints**: Analysis only — no test specs yet

1. Read `_docs/01_solution/solution.md`
2. Read `acceptance_criteria.md` and `restrictions.md`
3. Read the testing strategy from solution.md (if present)
4. If `DOCUMENT_DIR/architecture.md` and `DOCUMENT_DIR/system-flows.md` exist, read them for additional context on system interfaces and flows
5. Analyze `input_data/` contents against:
   - Coverage of acceptance criteria scenarios
   - Coverage of restriction edge cases
   - Coverage of testing strategy requirements
6. Threshold: at least 70% coverage of the required scenarios
7. If coverage is below the threshold, search the internet for supplementary data, assess its quality with the user, and add it to `input_data/` if the user agrees
8. Present the coverage assessment to the user
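The 70% threshold in step 6 is just a ratio over the scenario inventory. A sketch (the scenario-ID representation is an assumption; this skill does not prescribe a data model):

```python
def coverage_ratio(required: list[str], covered_by_input_data: set[str]) -> float:
    """Fraction of required scenarios that input_data/ can exercise."""
    if not required:
        return 1.0
    return sum(s in covered_by_input_data for s in required) / len(required)

def gate_passes(required: list[str], covered_by_input_data: set[str]) -> bool:
    """Phase 1a gate: proceed to Phase 1b only at or above 70% coverage."""
    return coverage_ratio(required, covered_by_input_data) >= 0.70
```

Below the gate, step 7 kicks in: supplement `input_data/` and recompute.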
**BLOCKING**: Do NOT proceed until the user confirms the input data coverage is sufficient.

---

### Phase 1b: Black-Box Test Scenario Specification

**Role**: Professional Quality Assurance Engineer
**Goal**: Produce detailed black-box test specifications covering functional and non-functional scenarios
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.

Based on all acquired data, the acceptance criteria, and the restrictions, form detailed test scenarios:

1. Define the test environment using `.cursor/skills/plan/templates/integration-environment.md` as the structure
2. Define test data management using `.cursor/skills/plan/templates/integration-test-data.md` as the structure
3. Write functional test scenarios (positive + negative) using `.cursor/skills/plan/templates/integration-functional-tests.md` as the structure
4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `.cursor/skills/plan/templates/integration-non-functional-tests.md` as the structure
5. Build the traceability matrix using `.cursor/skills/plan/templates/integration-traceability-matrix.md` as the structure

**Self-verification**:
- [ ] Every acceptance criterion is covered by at least one test scenario
- [ ] Every restriction is verified by at least one test scenario
- [ ] Positive and negative scenarios are balanced
- [ ] The consumer app has no direct access to system internals
- [ ] The Docker environment is self-contained (`docker compose up` is sufficient)
- [ ] External dependencies have mock/stub services defined
- [ ] The traceability matrix has no uncovered ACs or restrictions
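The uncovered-AC check can be done mechanically over the matrix file. A sketch, assuming the matrix renders as a two-column markdown table (the column layout is an assumption about the template, not a guarantee):

```python
def uncovered_rows(matrix_md: str) -> list[str]:
    """Return AC/restriction IDs whose 'Covered by' cell is empty."""
    uncovered = []
    for line in matrix_md.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        if len(cells) < 2 or not cells[0] or cells[0].startswith("-"):
            continue  # skip blank cells and the |---|---| separator row
        if cells[0].lower() in ("ac/restriction", "ac / restriction", "requirement"):
            continue  # skip the header row
        if not cells[1] or cells[1] in ("—", "-"):
            uncovered.append(cells[0])
    return uncovered
```

A non-empty result means the checklist item fails and Phase 1b is not done.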
**Save action**: Write all files under TESTS_OUTPUT_DIR:
- `environment.md`
- `test_data.md`
- `functional_tests.md`
- `non_functional_tests.md`
- `traceability_matrix.md`

**BLOCKING**: Present the test coverage summary (from traceability_matrix.md) to the user. Do NOT proceed until confirmed.

Capture any new questions, findings, or insights that arise during test specification — these feed forward into downstream skills (plan, refactor, etc.).

---

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Missing acceptance_criteria.md, restrictions.md, or input_data/ | **STOP** — specification cannot proceed |
| Ambiguous requirements | ASK user |
| Input data coverage below 70% | Search internet for supplementary data, ASK user to validate |
| Test scenario conflicts with restrictions | ASK user to clarify intent |
| System interfaces unclear (no architecture.md) | ASK user or derive from solution.md |

## Common Mistakes

- **Referencing internals**: tests must be black-box — no internal module names, no direct DB queries against the system under test
- **Vague expected outcomes**: "works correctly" is not a test outcome; use specific, measurable values
- **Missing negative scenarios**: every positive scenario category should have corresponding negative/edge-case tests
- **Untraceable tests**: every test should trace to at least one AC or restriction
- **Writing test code**: this skill produces specifications, never implementation code

## Trigger Conditions

When the user wants to:
- Specify black-box integration tests before implementation or refactoring
- Analyze input data completeness for test coverage
- Produce E2E test scenarios from acceptance criteria

**Keywords**: "blackbox test spec", "black box tests", "integration test spec", "test specification", "e2e test spec", "test scenarios"

## Methodology Quick Reference

```
┌────────────────────────────────────────────────────────────────┐
│ Black-Box Test Scenario Specification (2-Phase)                │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: Data Gate (BLOCKING)                                   │
│   → verify AC, restrictions, input_data, solution exist        │
│                                                                │
│ Phase 1a: Input Data Completeness Analysis                     │
│   → assess input_data/ coverage vs AC scenarios (≥70%)         │
│   [BLOCKING: user confirms input data coverage]                │
│                                                                │
│ Phase 1b: Black-Box Test Scenario Specification                │
│   → environment.md                                             │
│   → test_data.md                                               │
│   → functional_tests.md (positive + negative)                  │
│   → non_functional_tests.md (perf, resilience, security)       │
│   → traceability_matrix.md                                     │
│   [BLOCKING: user confirms test coverage]                      │
├────────────────────────────────────────────────────────────────┤
│ Principles: Black-box only · Traceability · Save immediately   │
│             Ask don't assume · Spec don't code                 │
└────────────────────────────────────────────────────────────────┘
```
@@ -3,11 +3,12 @@ name: decompose
description: |
  Decompose planned components into atomic implementable tasks with bootstrap structure plan.
  4-step workflow: bootstrap structure plan, component task decomposition, integration test task decomposition, and cross-task verification.
  Supports full decomposition (_docs/ structure), single component mode, and tests-only mode.
  Trigger phrases:
  - "decompose", "decompose features", "feature decomposition"
  - "task decomposition", "break down components"
  - "prepare for implementation"
  - "decompose tests", "test decomposition"
category: build
tags: [decomposition, tasks, dependencies, jira, implementation-prep]
disable-model-invocation: true
@@ -44,6 +45,14 @@ Determine the operating mode based on invocation before any other logic runs.
- Ask user for the parent Epic ID
- Runs Step 2 (that component only, appending to existing task numbering)

**Tests-only mode** (the provided file/directory is within `integration_tests/`, or `DOCUMENT_DIR/integration_tests/` exists and the input explicitly requests test decomposition):
- DOCUMENT_DIR: `_docs/02_document/`
- TASKS_DIR: `_docs/02_tasks/`
- TESTS_DIR: `DOCUMENT_DIR/integration_tests/`
- Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR
- Runs Step 1t (test infrastructure bootstrap) + Step 3 (integration test decomposition) + Step 4 (cross-verification against test coverage)
- Skips Step 1 (project bootstrap) and Step 2 (component decomposition) — the codebase already exists

Announce the detected mode and resolved paths to the user before proceeding.

## Input Specification

@@ -70,6 +79,19 @@ Announce the detected mode and resolved paths to the user before proceeding.
| The provided component `description.md` | Component spec to decompose |
| Corresponding `tests.md` in the same directory (if available) | Test specs for context |

**Tests-only mode:**

| File | Purpose |
|------|---------|
| `TESTS_DIR/environment.md` | Test environment specification (Docker services, networks, volumes) |
| `TESTS_DIR/test_data.md` | Test data management (seed data, mocks, isolation) |
| `TESTS_DIR/functional_tests.md` | Functional test scenarios (positive + negative) |
| `TESTS_DIR/non_functional_tests.md` | Non-functional test scenarios (perf, resilience, security, limits) |
| `TESTS_DIR/traceability_matrix.md` | AC/restriction coverage mapping |
| `_docs/00_problem/problem.md` | Problem context |
| `_docs/00_problem/restrictions.md` | Constraints for test design |
| `_docs/00_problem/acceptance_criteria.md` | Acceptance criteria being verified |

### Prerequisite Checks (BLOCKING)

**Default:**
@@ -80,6 +102,12 @@ Announce the detected mode and resolved paths to the user before proceeding.
**Single component mode:**
1. The provided component file exists and is non-empty — **STOP if missing**

**Tests-only mode:**
1. `TESTS_DIR/functional_tests.md` exists and is non-empty — **STOP if missing**
2. `TESTS_DIR/environment.md` exists — **STOP if missing**
3. Create TASKS_DIR if it does not exist
4. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?**

## Artifact Management

### Directory Structure
@@ -100,6 +128,7 @@ TASKS_DIR/
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
| Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `[JIRA-ID]_test_infrastructure.md` |
| Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 3 | Each integration test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
| Step 4 | Cross-task verification complete | `_dependencies_table.md` |
@@ -118,6 +147,42 @@ At the start of execution, create a TodoWrite with all applicable steps. Update

## Workflow

### Step 1t: Test Infrastructure Bootstrap (tests-only mode only)

**Role**: Professional Quality Assurance Engineer
**Goal**: Produce `01_test_infrastructure.md` — the first task, describing the test project scaffold
**Constraints**: This is a plan document, not code. The `/implement` skill executes it.

1. Read `TESTS_DIR/environment.md` and `TESTS_DIR/test_data.md`
2. Read problem.md, restrictions.md, and acceptance_criteria.md for domain context
3. Document the test infrastructure plan using `templates/test-infrastructure-task.md`

The test infrastructure bootstrap must include:
- Test project folder layout (`e2e/` directory structure)
- Mock/stub service definitions for each external dependency
- `docker-compose.test.yml` structure from environment.md
- Test runner configuration (framework, plugins, fixtures)
- Test data fixture setup from test_data.md seed data sets
- Test reporting configuration (format, output path)
- Data isolation strategy

**Self-verification**:
- [ ] Every external dependency from environment.md has a mock service defined
- [ ] The Docker Compose structure covers all services from environment.md
- [ ] Test data fixtures cover all seed data sets from test_data.md
- [ ] The test runner configuration matches the consumer app tech stack from environment.md
- [ ] A data isolation strategy is defined

**Save action**: Write `01_test_infrastructure.md` (temporary numeric name)

**Jira action**: Create a Jira ticket for this task under the "Integration Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.

**Rename action**: Rename the file from `01_test_infrastructure.md` to `[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.
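The rename step can be sketched as a helper. The filename convention comes from this skill; the assumption is that the template's **Task** header holds the file stem (IDs below are illustrative):

```python
from pathlib import Path

def rename_to_jira_id(task_file: Path, jira_id: str) -> Path:
    """01_test_infrastructure.md -> PROJ-123_test_infrastructure.md,
    updating the **Task** header to match the new filename."""
    short_name = task_file.stem.split("_", 1)[1]  # drop the numeric prefix
    new_path = task_file.with_name(f"{jira_id}_{short_name}.md")
    text = task_file.read_text(encoding="utf-8")
    text = text.replace(f"**Task**: {task_file.stem}", f"**Task**: {new_path.stem}")
    new_path.write_text(text, encoding="utf-8")
    task_file.unlink()
    return new_path
```

The same pattern applies to the per-task renames in Steps 2 and 3.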
**BLOCKING**: Present the test infrastructure plan summary to the user. Do NOT proceed until the user confirms.

---

### Step 1: Bootstrap Structure Plan (default mode only)

**Role**: Professional software architect
@@ -166,7 +231,7 @@ The bootstrap structure plan must include:

---

### Step 2: Task Decomposition (default and single component modes)

**Role**: Professional software architect
**Goal**: Decompose each component into atomic, implementable task specs — numbered sequentially starting from 02
@@ -200,18 +265,22 @@ For each component (or the single provided component):

---

### Step 3: Integration Test Task Decomposition (default and tests-only modes)

**Role**: Professional Quality Assurance Engineer
**Goal**: Decompose integration test specs into atomic, implementable task specs
**Constraints**: Behavioral specs only — describe what, not how. No test code.

**Numbering**:
- In default mode: continue sequential numbering from where Step 2 left off.
- In tests-only mode: start from 02 (01 is the test infrastructure bootstrap from Step 1t).

1. Read all test specs from `DOCUMENT_DIR/integration_tests/` (functional_tests.md, non_functional_tests.md)
2. Group related test scenarios into atomic tasks (e.g., one task per test category or per component under test)
3. Each task should reference the specific test scenarios it implements and the environment/test_data specs
4. Dependencies:
   - In default mode: integration test tasks depend on the component implementation tasks they exercise
   - In tests-only mode: integration test tasks depend on the test infrastructure bootstrap task (Step 1t)
5. Write each task spec using `templates/task.md`
6. Estimate complexity per task (1, 2, 3, or 5 points); no task should exceed 5 points — split it if it does
7. Note task dependencies (referencing the Jira IDs of already-created dependency tasks)
@@ -221,31 +290,41 @@ For each component (or the single provided component):
- [ ] Every functional test scenario from `integration_tests/functional_tests.md` is covered by a task
- [ ] Every non-functional test scenario from `integration_tests/non_functional_tests.md` is covered by a task
- [ ] No task exceeds 5 complexity points
- [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
- [ ] Every task has a Jira ticket linked to the "Integration Tests" epic

**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create the Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.

---

### Step 4: Cross-Task Verification (default and tests-only modes)

**Role**: Professional software architect and analyst
**Goal**: Verify task consistency and produce `_dependencies_table.md`
**Constraints**: Review step — fix gaps found; do not add new tasks

1. Verify that task dependencies are consistent across all tasks
2. Check for gaps:
   - In default mode: every interface in architecture.md has tasks covering it
   - In tests-only mode: every test scenario in `traceability_matrix.md` is covered by a task
3. Check for overlaps: tasks don't duplicate work
4. Check for circular dependencies in the task graph
5. Produce `_dependencies_table.md` using `templates/dependencies-table.md`
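Check 4 is standard cycle detection over the task dependency graph. A sketch (task IDs are hypothetical):

```python
def has_cycle(deps: dict[str, list[str]]) -> bool:
    """deps maps a task ID to the list of task IDs it depends on."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / finished
    color: dict[str, int] = {}

    def visit(task: str) -> bool:
        color[task] = GRAY
        for dep in deps.get(task, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge: dependency cycle found
                return True
            if state == WHITE and visit(dep):
                return True
        color[task] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in deps)
```

A `True` result means the task graph must be restructured before `_dependencies_table.md` is written.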
**Self-verification**:

Default mode:
- [ ] Every architecture interface is covered by at least one task
- [ ] No circular dependencies in the task graph
- [ ] Cross-component dependencies are explicitly noted in the affected task specs
- [ ] `_dependencies_table.md` contains every task with correct dependencies

Tests-only mode:
- [ ] Every test scenario from the traceability_matrix.md "Covered" entries has a corresponding task
- [ ] No circular dependencies in the task graph
- [ ] Test task dependencies reference the test infrastructure bootstrap
- [ ] `_dependencies_table.md` contains every task with correct dependencies

**Save action**: Write `_dependencies_table.md`

**BLOCKING**: Present the dependency summary to the user. Do NOT proceed until the user confirms.
@@ -279,15 +358,27 @@ For each component (or the single provided component):

```
┌────────────────────────────────────────────────────────────────┐
│ Task Decomposition (Multi-Mode)                                │
├────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (default / single component / tests-only)│
│                                                                │
│ DEFAULT MODE:                                                  │
│ 1. Bootstrap Structure → [JIRA-ID]_initial_structure.md        │
│    [BLOCKING: user confirms structure]                         │
│ 2. Component Tasks → [JIRA-ID]_[short_name].md each            │
│ 3. Integration Tests → [JIRA-ID]_[short_name].md each          │
│ 4. Cross-Verification → _dependencies_table.md                 │
│    [BLOCKING: user confirms dependencies]                      │
│                                                                │
│ TESTS-ONLY MODE:                                               │
│ 1t. Test Infrastructure → [JIRA-ID]_test_infrastructure.md     │
│     [BLOCKING: user confirms test scaffold]                    │
│ 3. Integration Tests → [JIRA-ID]_[short_name].md each          │
│ 4. Cross-Verification → _dependencies_table.md                 │
│    [BLOCKING: user confirms dependencies]                      │
│                                                                │
│ SINGLE COMPONENT MODE:                                         │
│ 2. Component Tasks → [JIRA-ID]_[short_name].md each            │
├────────────────────────────────────────────────────────────────┤
│ Principles: Atomic tasks · Behavioral specs · Flat structure   │
│   Jira inline · Rename to Jira ID · Save now · Ask don't assume│
@@ -0,0 +1,129 @@
# Test Infrastructure Task Template

Use this template for the test infrastructure bootstrap (Step 1t in tests-only mode). Save as `TASKS_DIR/01_test_infrastructure.md` initially, then rename to `TASKS_DIR/[JIRA-ID]_test_infrastructure.md` after Jira ticket creation.

---

````markdown
# Test Infrastructure

**Task**: [JIRA-ID]_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the E2E test project — test runner, mock services, Docker test environment, test data fixtures, reporting
**Complexity**: [3|5] points
**Dependencies**: None
**Component**: Integration Tests
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]

## Test Project Folder Layout

```
e2e/
├── conftest.py
├── requirements.txt
├── Dockerfile
├── mocks/
│   ├── [mock_service_1]/
│   │   ├── Dockerfile
│   │   └── [entrypoint file]
│   └── [mock_service_2]/
│       ├── Dockerfile
│       └── [entrypoint file]
├── fixtures/
│   └── [test data files]
├── tests/
│   ├── test_[category_1].py
│   ├── test_[category_2].py
│   └── ...
└── docker-compose.test.yml
```

### Layout Rationale

[Brief explanation of directory structure choices — framework conventions, separation of mocks from tests, fixture management]

## Mock Services

| Mock Service | Replaces | Endpoints | Behavior |
|--------------|----------|-----------|----------|
| [name] | [external service] | [endpoints it serves] | [response behavior, configurable via control API] |

### Mock Control API

Each mock service exposes a `POST /mock/config` endpoint for test-time behavior control (e.g., simulate downtime, inject errors). A `GET /mock/[resource]` endpoint returns recorded interactions for assertions.

## Docker Test Environment

### docker-compose.test.yml Structure

| Service | Image / Build | Purpose | Depends On |
|---------|---------------|---------|------------|
| [system-under-test] | [build context] | Main system being tested | [mock services] |
| [mock-1] | [build context] | Mock for [external service] | — |
| [e2e-consumer] | [build from e2e/] | Test runner | [system-under-test] |

### Networks and Volumes

[Isolated test network, volume mounts for test data, model files, results output]

## Test Runner Configuration

**Framework**: [e.g., pytest]
**Plugins**: [e.g., pytest-csv, sseclient-py, requests]
**Entry point**: [e.g., `pytest --csv=/results/report.csv`]

### Fixture Strategy

| Fixture | Scope | Purpose |
|---------|-------|---------|
| [name] | [session/module/function] | [what it provides] |

## Test Data Fixtures

| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| [name] | [volume mount / generated / API seed] | [format] | [test categories] |

### Data Isolation

[Strategy: fresh containers per run, volume cleanup, mock state reset]

## Test Reporting

**Format**: [e.g., CSV]
**Columns**: [e.g., Test ID, Test Name, Execution Time (ms), Result, Error Message]
**Output path**: [e.g., /results/report.csv → mounted to host]

## Acceptance Criteria

**AC-1: Test environment starts**
Given the docker-compose.test.yml
When `docker compose -f docker-compose.test.yml up` is executed
Then all services start and the system-under-test is reachable

**AC-2: Mock services respond**
Given the test environment is running
When the e2e-consumer sends requests to the mock services
Then the mock services respond with the configured behavior

**AC-3: Test runner executes**
Given the test environment is running
When the e2e-consumer starts
Then the test runner discovers and executes the test files

**AC-4: Test report generated**
Given tests have been executed
When the test run completes
Then a report file exists at the configured output path with the correct columns
````

---

## Guidance Notes

- This is a PLAN document, not code. The `/implement` skill executes it.
- Focus on test infrastructure decisions, not individual test implementations.
- Reference environment.md and test_data.md from the test specs — don't repeat everything.
- Mock services must be deterministic: the same input always produces the same output.
- The Docker environment must be self-contained: `docker compose up` must be sufficient.
|
||||||
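The mock requirements above — a control endpoint, recorded interactions, and determinism — can be sketched in-process. The class, endpoint mapping, and field names below are illustrative assumptions, not part of the template; real mocks would expose the same operations over HTTP:

```python
from dataclasses import dataclass, field


@dataclass
class MockService:
    """Minimal in-memory model of one mock's behavior.

    configure() stands in for POST /mock/config; recorded() stands in
    for GET /mock/[resource].
    """

    config: dict = field(default_factory=lambda: {"mode": "normal"})
    interactions: list = field(default_factory=list)

    def configure(self, **overrides) -> None:
        # Test-time behavior control: e.g. simulate downtime, inject errors.
        self.config.update(overrides)

    def handle(self, request: dict) -> dict:
        # Deterministic: the response depends only on config + request.
        self.interactions.append(request)
        if self.config["mode"] == "down":
            return {"status": 503, "error": "simulated downtime"}
        return {"status": 200, "echo": request}

    def recorded(self) -> list:
        # Exposed so the e2e-consumer can assert on what the SUT sent.
        return list(self.interactions)
```

A test can flip the mock into a failure mode, drive the system under test, and then assert on `recorded()`.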
@@ -45,13 +45,13 @@ Announce the resolved paths to the user before proceeding.

### Required Files

-| File | Purpose |
+| File | Purpose | Required |
-|------|---------|
+|------|---------|----------|
-| `_docs/00_problem/problem.md` | Problem description and context |
+| `_docs/00_problem/problem.md` | Problem description and context | Greenfield only |
-| `_docs/00_problem/restrictions.md` | Constraints and limitations |
+| `_docs/00_problem/restrictions.md` | Constraints and limitations | Greenfield only |
-| `_docs/01_solution/solution.md` | Finalized solution |
+| `_docs/01_solution/solution.md` | Finalized solution | Greenfield only |
-| `DOCUMENT_DIR/architecture.md` | Architecture from plan skill |
+| `DOCUMENT_DIR/architecture.md` | Architecture (from plan or document skill) | Always |
-| `DOCUMENT_DIR/components/` | Component specs |
+| `DOCUMENT_DIR/components/` | Component specs | Always |

### Prerequisite Checks (BLOCKING)
@@ -0,0 +1,302 @@
---
name: new-task
description: |
  Interactive skill for adding new functionality to an existing codebase.
  Guides the user through describing the feature, assessing complexity,
  optionally running research, analyzing the codebase for insertion points,
  validating assumptions with the user, and producing a task spec with Jira ticket.
  Supports a loop — the user can add multiple tasks in one session.
  Trigger phrases:
  - "new task", "add feature", "new functionality"
  - "I want to add", "new component", "extend"
category: build
tags: [task, feature, interactive, planning, jira]
disable-model-invocation: true
---

# New Task (Interactive Feature Planning)

Guide the user through defining new functionality for an existing codebase. Produces one or more task specifications with Jira tickets, optionally running deep research for complex features.

## Core Principles

- **User-driven**: every task starts with the user's description; never invent requirements
- **Right-size research**: only invoke the research skill when the change is big enough to warrant it
- **Validate before committing**: surface all assumptions and uncertainties to the user before writing the task file
- **Save immediately**: write task files to disk as soon as they are ready; never accumulate unsaved work
- **Ask, don't assume**: when scope, insertion point, or approach is unclear, STOP and ask the user

## Context Resolution

Fixed paths:

- TASKS_DIR: `_docs/02_tasks/`
- PLANS_DIR: `_docs/02_task_plans/`
- DOCUMENT_DIR: `_docs/02_document/`
- DEPENDENCIES_TABLE: `_docs/02_tasks/_dependencies_table.md`

Create TASKS_DIR and PLANS_DIR if they don't exist.

If TASKS_DIR already contains task files, scan them to determine the next numeric prefix for temporary file naming.
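The prefix scan can be sketched as below; the helper name and the two-digit zero-padding are assumptions — the skill text only requires finding the next free number:

```python
import re
import tempfile
from pathlib import Path


def next_prefix(tasks_dir: Path) -> str:
    # Find files like "03_auth_flow.md" and return the next zero-padded
    # numeric prefix; "01" when the directory has no numbered tasks yet.
    numbers = [
        int(m.group(1))
        for p in tasks_dir.glob("*.md")
        if (m := re.match(r"(\d+)_", p.name))
    ]
    return f"{max(numbers, default=0) + 1:02d}"


# Hypothetical layout for illustration only.
demo = Path(tempfile.mkdtemp())
(demo / "01_auth_flow.md").touch()
(demo / "07_rate_limits.md").touch()
result = next_prefix(demo)
```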
## Workflow

The skill runs as a loop. Each iteration produces one task. After each task the user chooses to add another or finish.

---

### Step 1: Gather Feature Description

**Role**: Product analyst
**Goal**: Get a clear, detailed description of the new functionality from the user.

Ask the user:

```
══════════════════════════════════════
NEW TASK: Describe the functionality
══════════════════════════════════════
Please describe in detail the new functionality you want to add:
- What should it do?
- Who is it for?
- Any specific requirements or constraints?
══════════════════════════════════════
```

**BLOCKING**: Do NOT proceed until the user provides a description.

Record the description verbatim for use in subsequent steps.

---

### Step 2: Analyze Complexity

**Role**: Technical analyst
**Goal**: Determine whether deep research is needed.

Read the user's description and the existing codebase documentation from DOCUMENT_DIR (architecture.md, components/, system-flows.md).

Assess the change along these dimensions:

- **Scope**: how many components/files are affected?
- **Novelty**: does it involve libraries, protocols, or patterns not already in the codebase?
- **Risk**: could it break existing functionality or require architectural changes?

Classification:

| Category | Criteria | Action |
|----------|----------|--------|
| **Needs research** | New libraries/frameworks, unfamiliar protocols, significant architectural change, multiple unknowns | Proceed to Step 3 (Research) |
| **Skip research** | Extends existing functionality, uses patterns already in codebase, straightforward new component with known tech | Skip to Step 4 (Codebase Analysis) |

Present the assessment to the user:

```
══════════════════════════════════════
COMPLEXITY ASSESSMENT
══════════════════════════════════════
Scope: [low / medium / high]
Novelty: [low / medium / high]
Risk: [low / medium / high]
══════════════════════════════════════
Recommendation: [Research needed / Skip research]
Reason: [one-line justification]
══════════════════════════════════════
```

**BLOCKING**: Ask the user to confirm or override the recommendation before proceeding.
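The classification above can be mechanized as a small scoring rule. The cut-offs below are illustrative assumptions — the skill leaves the exact thresholds to the analyst's judgment and the user's confirmation:

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}


def recommend_research(scope: str, novelty: str, risk: str) -> str:
    # Illustrative rule: any novelty, high risk, or high scope tips the
    # decision toward research; everything else skips to codebase analysis.
    if LEVELS[novelty] >= 1 or LEVELS[risk] >= 2 or LEVELS[scope] >= 2:
        return "Research needed"
    return "Skip research"
```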
---

### Step 3: Research (conditional)

**Role**: Researcher
**Goal**: Investigate unknowns before task specification.

This step only runs if Step 2 determined research is needed.

1. Create a problem description file at `PLANS_DIR/<task_slug>/problem.md` summarizing the feature request and the specific unknowns to investigate
2. Invoke `.cursor/skills/research/SKILL.md` in standalone mode:
   - INPUT_FILE: `PLANS_DIR/<task_slug>/problem.md`
   - BASE_DIR: `PLANS_DIR/<task_slug>/`
3. After research completes, read the solution draft from `PLANS_DIR/<task_slug>/01_solution/solution_draft01.md`
4. Extract the key findings relevant to the task specification

The `<task_slug>` is a short kebab-case name derived from the feature description (e.g., `auth-provider-integration`, `real-time-notifications`).
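Deriving `<task_slug>` can be sketched as a simple normalization; the four-word cap is an assumption, not a rule from the skill text:

```python
import re


def task_slug(description: str) -> str:
    # Lowercase, keep alphanumeric runs, and join the first few words
    # with hyphens to get a short kebab-case name.
    words = re.findall(r"[a-z0-9]+", description.lower())
    return "-".join(words[:4])
```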
---

### Step 4: Codebase Analysis

**Role**: Software architect
**Goal**: Determine where and how to insert the new functionality.

1. Read the codebase documentation from DOCUMENT_DIR:
   - `architecture.md` — overall structure
   - `components/` — component specs
   - `system-flows.md` — data flows (if exists)
   - `data_model.md` — data model (if exists)
2. If research was performed (Step 3), incorporate findings
3. Analyze and determine:
   - Which existing components are affected
   - Where new code should be inserted (which layers, modules, files)
   - What interfaces need to change
   - What new interfaces or models are needed
   - How data flows through the change
4. If the change is complex enough, read the actual source files (not just docs) to verify insertion points

Present the analysis:

```
══════════════════════════════════════
CODEBASE ANALYSIS
══════════════════════════════════════
Affected components: [list]
Insertion points: [list of modules/layers]
Interface changes: [list or "None"]
New interfaces: [list or "None"]
Data flow impact: [summary]
══════════════════════════════════════
```

---

### Step 5: Validate Assumptions

**Role**: Quality gate
**Goal**: Surface every uncertainty and get user confirmation.

Review all decisions and assumptions made in Steps 2–4. For each uncertainty:

1. State the assumption clearly
2. Propose a solution or approach
3. List alternatives if they exist

Present using the Choose format for each decision that has meaningful alternatives:

```
══════════════════════════════════════
ASSUMPTION VALIDATION
══════════════════════════════════════
1. [Assumption]: [proposed approach]
   Alternative: [other option, if any]
2. [Assumption]: [proposed approach]
   Alternative: [other option, if any]
...
══════════════════════════════════════
Please confirm or correct these assumptions.
══════════════════════════════════════
```

**BLOCKING**: Do NOT proceed until the user confirms or corrects all assumptions.

---

### Step 6: Create Task

**Role**: Technical writer
**Goal**: Produce the task specification file.

1. Determine the next numeric prefix by scanning TASKS_DIR for existing files
2. Write the task file using `templates/task.md`:
   - Fill all fields from the gathered information
   - Set **Complexity** based on the assessment from Step 2
   - Set **Dependencies** by cross-referencing existing tasks in TASKS_DIR
   - Set **Jira** and **Epic** to `pending` (filled in Step 7)
3. Save as `TASKS_DIR/[##]_[short_name].md`

**Self-verification**:

- [ ] Problem section clearly describes the user need
- [ ] Acceptance criteria are testable (Gherkin format)
- [ ] Scope boundaries are explicit
- [ ] Complexity points match the assessment
- [ ] Dependencies reference existing task Jira IDs where applicable
- [ ] No implementation details leaked into the spec

---

### Step 7: Jira Ticket

**Role**: Project coordinator
**Goal**: Create a Jira ticket and link it to the task file.

1. Create a Jira ticket for the task:
   - Summary: the task's **Name** field
   - Description: the task's **Problem** and **Acceptance Criteria** sections
   - Story points: the task's **Complexity** value
   - Link to the appropriate epic (ask user if unclear which epic)
2. Write the Jira ticket ID and Epic ID back into the task file header:
   - Update **Task** field: `[JIRA-ID]_[short_name]`
   - Update **Jira** field: `[JIRA-ID]`
   - Update **Epic** field: `[EPIC-ID]`
3. Rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`

If Jira MCP is not authenticated or unavailable:

- Keep the numeric prefix
- Set **Jira** to `pending`
- Set **Epic** to `pending`
- The task is still valid and can be implemented; Jira sync happens later
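The write-back and rename in steps 2–3 can be sketched as follows. The field markers and the `pending` sentinel come from the task template; the helper itself, its name, and the example IDs are illustrative:

```python
import re
import tempfile
from pathlib import Path


def link_jira(task_file: Path, jira_id: str, epic_id: str) -> Path:
    # Fill the pending Jira/Epic fields, then swap the numeric prefix
    # for the Jira ID: 03_auth_flow.md -> AZ-17_auth_flow.md.
    text = task_file.read_text()
    text = text.replace("**Jira**: pending", f"**Jira**: {jira_id}")
    text = text.replace("**Epic**: pending", f"**Epic**: {epic_id}")
    short_name = re.sub(r"^\d+_", "", task_file.name)
    target = task_file.with_name(f"{jira_id}_{short_name}")
    target.write_text(text)
    task_file.unlink()
    return target


# Hypothetical task file for illustration only.
demo = Path(tempfile.mkdtemp())
src = demo / "03_auth_flow.md"
src.write_text("**Jira**: pending\n**Epic**: pending\n")
renamed = link_jira(src, "AZ-17", "AZ-1")
```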
---

### Step 8: Loop Gate

Ask the user:

```
══════════════════════════════════════
Task created: [JIRA-ID or ##] — [task name]
══════════════════════════════════════
A) Add another task
B) Done — finish and update dependencies
══════════════════════════════════════
```

- If **A** → loop back to Step 1
- If **B** → proceed to Finalize

---

### Finalize

After the user chooses **Done**:

1. Update (or create) `TASKS_DIR/_dependencies_table.md` — add all newly created tasks to the dependencies table
2. Present a summary of all tasks created in this session:

```
══════════════════════════════════════
NEW TASK SUMMARY
══════════════════════════════════════
Tasks created: N
Total complexity: M points
─────────────────────────────────────
[JIRA-ID] [name] ([complexity] pts)
[JIRA-ID] [name] ([complexity] pts)
...
══════════════════════════════════════
```
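Appending the session's tasks to the dependencies table can be sketched as below. The two-column layout is an assumption for illustration — the real format is whatever `_dependencies_table.md` already uses:

```python
def add_dependency_rows(table_text: str, tasks: list[tuple[str, list[str]]]) -> str:
    # Append one row per new task, creating a minimal header if the
    # table does not exist yet.
    lines = table_text.rstrip().splitlines() if table_text.strip() else [
        "| Task | Depends On |",
        "|------|------------|",
    ]
    for task_id, deps in tasks:
        lines.append(f"| {task_id} | {', '.join(deps) or 'None'} |")
    return "\n".join(lines) + "\n"


# Hypothetical task IDs for illustration only.
updated = add_dependency_rows("", [
    ("AZ-17_auth_flow", ["AZ-12_shared_models"]),
    ("AZ-18_settings_ui", []),
])
```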
## Escalation Rules

| Situation | Action |
|-----------|--------|
| User description is vague or incomplete | **ASK** for more detail — do not guess |
| Unclear which epic to link to | **ASK** user for the epic |
| Research skill hits a blocker | Follow research skill's own escalation rules |
| Codebase analysis reveals conflicting architectures | **ASK** user which pattern to follow |
| Complexity exceeds 5 points | **WARN** user and suggest splitting into multiple tasks |
| Jira MCP unavailable | **WARN**, continue with local-only task files |

## Trigger Conditions

When the user wants to:

- Add new functionality to an existing codebase
- Plan a new feature or component
- Create task specifications for upcoming work

**Keywords**: "new task", "add feature", "new functionality", "extend", "I want to add"

**Differentiation**:

- User wants to decompose an existing plan into tasks → use `/decompose`
- User wants to research a topic without creating tasks → use `/research`
- User wants to refactor existing code → use `/refactor`
- User wants to define and plan a new feature → use this skill
@@ -0,0 +1,113 @@
# Task Specification Template

Create a focused behavioral specification that describes **what** the system should do, not **how** it should be built.

Save as `TASKS_DIR/[##]_[short_name].md` initially, then rename to `TASKS_DIR/[JIRA-ID]_[short_name].md` after Jira ticket creation.

---

```markdown
# [Feature Name]

**Task**: [JIRA-ID]_[short_name]
**Name**: [short human name]
**Description**: [one-line description of what this task delivers]
**Complexity**: [1|2|3|5] points
**Dependencies**: [AZ-43_shared_models, AZ-44_db_migrations] or "None"
**Component**: [component name for context]
**Jira**: [TASK-ID]
**Epic**: [EPIC-ID]

## Problem

Clear, concise statement of the problem users are facing.

## Outcome

- Measurable or observable goal 1
- Measurable or observable goal 2
- ...

## Scope

### Included
- What's in scope for this task

### Excluded
- Explicitly what's NOT in scope

## Acceptance Criteria

**AC-1: [Title]**
Given [precondition]
When [action]
Then [expected result]

**AC-2: [Title]**
Given [precondition]
When [action]
Then [expected result]

## Non-Functional Requirements

**Performance**
- [requirement if relevant]

**Compatibility**
- [requirement if relevant]

**Reliability**
- [requirement if relevant]

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | [test subject] | [expected result] |

## Integration Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | [setup] | [test subject] | [expected behavior] | [NFR if any] |

## Constraints

- [Architectural pattern constraint if critical]
- [Technical limitation]
- [Integration requirement]

## Risks & Mitigation

**Risk 1: [Title]**
- *Risk*: [Description]
- *Mitigation*: [Approach]
```

---

## Complexity Points Guide

- 1 point: Trivial, self-contained, no dependencies
- 2 points: Non-trivial, low complexity, minimal coordination
- 3 points: Multi-step, moderate complexity, potential alignment needed
- 5 points: Difficult, interconnected logic, medium-high risk
- 8 points: Too complex — split into smaller tasks

## Output Guidelines

**DO:**

- Focus on behavior and user experience
- Use clear, simple language
- Keep acceptance criteria testable (Gherkin format)
- Include realistic scope boundaries
- Write from the user's perspective
- Include complexity estimation
- Reference dependencies by Jira ID (e.g., AZ-43_shared_models)

**DON'T:**

- Include implementation details (file paths, classes, methods)
- Prescribe technical solutions or libraries
- Add architectural diagrams or code examples
- Specify exact API endpoints or data structures
- Include step-by-step implementation instructions
- Add "how to build" guidance
@@ -35,9 +35,7 @@ Fixed paths — no mode detection needed:
|
|||||||
|
|
||||||
Announce the resolved paths to the user before proceeding.
|
Announce the resolved paths to the user before proceeding.
|
||||||
|
|
||||||
## Input Specification
|
## Required Files
|
||||||
|
|
||||||
### Required Files
|
|
||||||
|
|
||||||
| File | Purpose |
|
| File | Purpose |
|
||||||
|------|---------|
|
|------|---------|
|
||||||
@@ -47,115 +45,13 @@ Announce the resolved paths to the user before proceeding.
|
|||||||
| `_docs/00_problem/input_data/` | Reference data examples |
|
| `_docs/00_problem/input_data/` | Reference data examples |
|
||||||
| `_docs/01_solution/solution.md` | Finalized solution to decompose |
|
| `_docs/01_solution/solution.md` | Finalized solution to decompose |
|
||||||
|
|
||||||
### Prerequisite Checks (BLOCKING)
|
## Prerequisites
|
||||||
|
|
||||||
Run sequentially before any planning step:
|
Read and follow `steps/00_prerequisites.md`. All three prerequisite checks are **BLOCKING** — do not start the workflow until they pass.
|
||||||
|
|
||||||
**Prereq 1: Data Gate**
|
|
||||||
|
|
||||||
1. `_docs/00_problem/acceptance_criteria.md` exists and is non-empty — **STOP if missing**
|
|
||||||
2. `_docs/00_problem/restrictions.md` exists and is non-empty — **STOP if missing**
|
|
||||||
3. `_docs/00_problem/input_data/` exists and contains at least one data file — **STOP if missing**
|
|
||||||
4. `_docs/00_problem/problem.md` exists and is non-empty — **STOP if missing**
|
|
||||||
|
|
||||||
All four are mandatory. If any is missing or empty, STOP and ask the user to provide them. If the user cannot provide the required data, planning cannot proceed — just stop.
|
|
||||||
|
|
||||||
**Prereq 2: Finalize Solution Draft**
|
|
||||||
|
|
||||||
Only runs after the Data Gate passes:
|
|
||||||
|
|
||||||
1. Scan `_docs/01_solution/` for files matching `solution_draft*.md`
|
|
||||||
2. Identify the highest-numbered draft (e.g. `solution_draft06.md`)
|
|
||||||
3. **Rename** it to `_docs/01_solution/solution.md`
|
|
||||||
4. If `solution.md` already exists, ask the user whether to overwrite or keep existing
|
|
||||||
5. Verify `solution.md` is non-empty — **STOP if missing or empty**
|
|
||||||
|
|
||||||
**Prereq 3: Workspace Setup**
|
|
||||||
|
|
||||||
1. Create DOCUMENT_DIR if it does not exist
|
|
||||||
2. If DOCUMENT_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
|
|
||||||
|
|
||||||
## Artifact Management
|
## Artifact Management
|
||||||
|
|
||||||
### Directory Structure
|
Read `steps/01_artifact-management.md` for directory structure, save timing, save principles, and resumability rules. Refer to it throughout the workflow.
|
||||||
|
|
||||||
All artifacts are written directly under DOCUMENT_DIR:
|
|
||||||
|
|
||||||
```
|
|
||||||
DOCUMENT_DIR/
|
|
||||||
├── integration_tests/
|
|
||||||
│ ├── environment.md
|
|
||||||
│ ├── test_data.md
|
|
||||||
│ ├── functional_tests.md
|
|
||||||
│ ├── non_functional_tests.md
|
|
||||||
│ └── traceability_matrix.md
|
|
||||||
├── architecture.md
|
|
||||||
├── system-flows.md
|
|
||||||
├── data_model.md
|
|
||||||
├── deployment/
|
|
||||||
│ ├── containerization.md
|
|
||||||
│ ├── ci_cd_pipeline.md
|
|
||||||
│ ├── environment_strategy.md
|
|
||||||
│ ├── observability.md
|
|
||||||
│ └── deployment_procedures.md
|
|
||||||
├── risk_mitigations.md
|
|
||||||
├── risk_mitigations_02.md (iterative, ## as sequence)
|
|
||||||
├── components/
|
|
||||||
│ ├── 01_[name]/
|
|
||||||
│ │ ├── description.md
|
|
||||||
│ │ └── tests.md
|
|
||||||
│ ├── 02_[name]/
|
|
||||||
│ │ ├── description.md
|
|
||||||
│ │ └── tests.md
|
|
||||||
│ └── ...
|
|
||||||
├── common-helpers/
|
|
||||||
│ ├── 01_helper_[name]/
|
|
||||||
│ ├── 02_helper_[name]/
|
|
||||||
│ └── ...
|
|
||||||
├── diagrams/
|
|
||||||
│ ├── components.drawio
|
|
||||||
│ └── flows/
|
|
||||||
│ ├── flow_[name].md (Mermaid)
|
|
||||||
│ └── ...
|
|
||||||
└── FINAL_report.md
|
|
||||||
```
|
|
||||||
|
|
||||||
### Save Timing
|
|
||||||
|
|
||||||
| Step | Save immediately after | Filename |
|
|
||||||
|------|------------------------|----------|
|
|
||||||
| Step 1 | Integration test environment spec | `integration_tests/environment.md` |
|
|
||||||
| Step 1 | Integration test data spec | `integration_tests/test_data.md` |
|
|
||||||
| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` |
|
|
||||||
| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` |
|
|
||||||
| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
|
|
||||||
| Step 2 | Architecture analysis complete | `architecture.md` |
|
|
||||||
| Step 2 | System flows documented | `system-flows.md` |
|
|
||||||
| Step 2 | Data model documented | `data_model.md` |
|
|
||||||
| Step 2 | Deployment plan complete | `deployment/` (5 files) |
|
|
||||||
| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` |
|
|
||||||
| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
|
|
||||||
| Step 3 | Diagrams generated | `diagrams/` |
|
|
||||||
| Step 4 | Risk assessment complete | `risk_mitigations.md` |
|
|
||||||
| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
|
|
||||||
| Step 6 | Epics created in Jira | Jira via MCP |
|
|
||||||
| Final | All steps complete | `FINAL_report.md` |
|
|
||||||
|
|
||||||
### Save Principles
|
|
||||||
|
|
||||||
1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end
|
|
||||||
2. **Incremental updates**: same file can be updated multiple times; append or replace
|
|
||||||
3. **Preserve process**: keep all intermediate files even after integration into final report
|
|
||||||
4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability)
|
|
||||||
|
|
||||||
### Resumability
|
|
||||||
|
|
||||||
If DOCUMENT_DIR already contains artifacts:
|
|
||||||
|
|
||||||
1. List existing files and match them to the save timing table above
|
|
||||||
2. Identify the last completed step based on which artifacts exist
|
|
||||||
3. Resume from the next incomplete step
|
|
||||||
4. Inform the user which steps are being skipped
|
|
||||||
|
|
||||||
## Progress Tracking
|
## Progress Tracking
|
||||||
|
|
||||||
@@ -165,52 +61,7 @@ At the start of execution, create a TodoWrite with all steps (1 through 6). Upda
|
|||||||
|
|
||||||
### Step 1: Integration Tests
|
### Step 1: Integration Tests
|
||||||
|
|
||||||
**Role**: Professional Quality Assurance Engineer
|
Read and execute `.cursor/skills/blackbox-test-spec/SKILL.md`.
|
||||||
**Goal**: Analyze input data completeness and produce detailed black-box integration test specifications
|
|
||||||
**Constraints**: Spec only — no test code. Tests describe what the system should do given specific inputs, not how the system is built.
|
|
||||||
|
|
||||||
#### Phase 1a: Input Data Completeness Analysis
|
|
||||||
|
|
||||||
1. Read `_docs/01_solution/solution.md` (finalized in Prereq 2)
|
|
||||||
2. Read `acceptance_criteria.md`, `restrictions.md`
|
|
||||||
3. Read testing strategy from solution.md
|
|
||||||
4. Analyze `input_data/` contents against:
|
|
||||||
- Coverage of acceptance criteria scenarios
|
|
||||||
- Coverage of restriction edge cases
|
|
||||||
- Coverage of testing strategy requirements
|
|
||||||
5. Threshold: at least 70% coverage of the scenarios
|
|
||||||
6. If coverage is low, search the internet for supplementary data, assess quality with user, and if user agrees, add to `input_data/`
|
|
||||||
7. Present coverage assessment to user
|
|
||||||
|
|
||||||
**BLOCKING**: Do NOT proceed until user confirms the input data coverage is sufficient.
|
|
||||||
|
|
||||||
#### Phase 1b: Black-Box Test Scenario Specification
|
|
||||||
|
|
||||||
Based on all acquired data, acceptance_criteria, and restrictions, form detailed test scenarios:
|
|
||||||
|
|
||||||
1. Define test environment using `templates/integration-environment.md` as structure
|
|
||||||
2. Define test data management using `templates/integration-test-data.md` as structure
|
|
||||||
3. Write functional test scenarios (positive + negative) using `templates/integration-functional-tests.md` as structure
|
|
||||||
4. Write non-functional test scenarios (performance, resilience, security, edge cases) using `templates/integration-non-functional-tests.md` as structure
|
|
||||||
5. Build traceability matrix using `templates/integration-traceability-matrix.md` as structure
|
|
||||||
|
|
||||||
**Self-verification**:
|
|
||||||
- [ ] Every acceptance criterion is covered by at least one test scenario
|
|
||||||
- [ ] Every restriction is verified by at least one test scenario
|
|
||||||
- [ ] Positive and negative scenarios are balanced
|
|
||||||
- [ ] Consumer app has no direct access to system internals
|
|
||||||
- [ ] Docker environment is self-contained (`docker compose up` sufficient)
|
|
||||||
- [ ] External dependencies have mock/stub services defined
|
|
||||||
- [ ] Traceability matrix has no uncovered AC or restrictions
|
|
||||||
|
|
||||||
**Save action**: Write all files under `integration_tests/`:
|
|
||||||
- `environment.md`
|
|
||||||
- `test_data.md`
|
|
||||||
- `functional_tests.md`
|
|
||||||
- `non_functional_tests.md`
|
|
||||||
- `traceability_matrix.md`
|
|
||||||
|
|
||||||
**BLOCKING**: Present test coverage summary (from traceability_matrix.md) to user. Do NOT proceed until confirmed.
|
|
||||||
|
|
||||||
Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3.
|
Capture any new questions, findings, or insights that arise during test specification — these feed forward into Steps 2 and 3.
|
||||||
|
|
||||||
@@ -218,285 +69,37 @@ Capture any new questions, findings, or insights that arise during test specific
|
|||||||
|
|
||||||
 ### Step 2: Solution Analysis
 
-**Role**: Professional software architect
+Read and follow `steps/02_solution-analysis.md`.
-**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
-**Constraints**: No code, no component-level detail yet; focus on system-level view
-
-#### Phase 2a: Architecture & Flows
-
-1. Read all input files thoroughly
-2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
-3. Research unknown or questionable topics via internet; ask user about ambiguities
-4. Document architecture using `templates/architecture.md` as structure
-5. Document system flows using `templates/system-flows.md` as structure
-
-**Self-verification**:
-- [ ] Architecture covers all capabilities mentioned in solution.md
-- [ ] System flows cover all main user/system interactions
-- [ ] No contradictions with problem.md or restrictions.md
-- [ ] Technology choices are justified
-- [ ] Integration test findings are reflected in architecture decisions
-
-**Save action**: Write `architecture.md` and `system-flows.md`
-
-**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.
-
-#### Phase 2b: Data Model
-
-**Role**: Professional software architect
-**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy
-
-1. Extract core entities from architecture.md and solution.md
-2. Define entity attributes, types, and constraints
-3. Define relationships between entities (Mermaid ERD)
-4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention
-5. Define seed data requirements per environment (dev, staging)
-6. Define backward compatibility approach for schema changes (additive-only by default)
-
-**Self-verification**:
-- [ ] Every entity mentioned in architecture.md is defined
-- [ ] Relationships are explicit with cardinality
-- [ ] Migration strategy specifies reversibility requirement
-- [ ] Seed data requirements defined
-- [ ] Backward compatibility approach documented
-
-**Save action**: Write `data_model.md`
-
-#### Phase 2c: Deployment Planning
-
-**Role**: DevOps / Platform engineer
-**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures
-
-Use the `/deploy` skill's templates as structure for each artifact:
-
-1. Read architecture.md and restrictions.md for infrastructure constraints
-2. Research Docker best practices for the project's tech stack
-3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests
-4. Define CI/CD pipeline: stages, quality gates, caching, parallelization
-5. Define environment strategy: dev, staging, production with secrets management
-6. Define observability: structured logging, metrics, tracing, alerting
-7. Define deployment procedures: strategy, health checks, rollback, checklist
-
-**Self-verification**:
-- [ ] Every component has a Docker specification
-- [ ] CI/CD pipeline covers lint, test, security, build, deploy
-- [ ] Environment strategy covers dev, staging, production
-- [ ] Observability covers logging, metrics, tracing, alerting
-- [ ] Deployment procedures include rollback and health checks
-
-**Save action**: Write all 5 files under `deployment/`:
-- `containerization.md`
-- `ci_cd_pipeline.md`
-- `environment_strategy.md`
-- `observability.md`
-- `deployment_procedures.md`
 
 ---
 
 ### Step 3: Component Decomposition
 
-**Role**: Professional software architect
+Read and follow `steps/03_component-decomposition.md`.
-**Goal**: Decompose the architecture into components with detailed specs
-**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.
-
-1. Identify components from the architecture; think about separation, reusability, and communication patterns
-2. Use integration test scenarios from Step 1 to validate component boundaries
-3. If additional components are needed (data preparation, shared helpers), create them
-4. For each component, write a spec using `templates/component-spec.md` as structure
-5. Generate diagrams:
-   - draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
-   - Mermaid flowchart per main control flow
-6. Components can share and reuse common logic, same for multiple components. Hence for such occurences common-helpers folder is specified.
-
-**Self-verification**:
-- [ ] Each component has a single, clear responsibility
-- [ ] No functionality is spread across multiple components
-- [ ] All inter-component interfaces are defined (who calls whom, with what)
-- [ ] Component dependency graph has no circular dependencies
-- [ ] All components from architecture.md are accounted for
-- [ ] Every integration test scenario can be traced through component interactions
-
-**Save action**: Write:
-- each component `components/[##]_[name]/description.md`
-- common helper `common-helpers/[##]_helper_[name].md`
-- diagrams `diagrams/`
-
-**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms.
 
 ---
 
 ### Step 4: Architecture Review & Risk Assessment
 
-**Role**: Professional software architect and analyst
+Read and follow `steps/04_review-risk.md`.
-**Goal**: Validate all artifacts for consistency, then identify and mitigate risks
-**Constraints**: This is a review step — fix problems found, do not add new features
-
-#### 4a. Evaluator Pass (re-read ALL artifacts)
-
-Review checklist:
-- [ ] All components follow Single Responsibility Principle
-- [ ] All components follow dumb code / smart data principle
-- [ ] Inter-component interfaces are consistent (caller's output matches callee's input)
-- [ ] No circular dependencies in the dependency graph
-- [ ] No missing interactions between components
-- [ ] No over-engineering — is there a simpler decomposition?
-- [ ] Security considerations addressed in component design
-- [ ] Performance bottlenecks identified
-- [ ] API contracts are consistent across components
-
-Fix any issues found before proceeding to risk identification.
-
-#### 4b. Risk Identification
-
-1. Identify technical and project risks
-2. Assess probability and impact using `templates/risk-register.md`
-3. Define mitigation strategies
-4. Apply mitigations to architecture, flows, and component documents where applicable
-
-**Self-verification**:
-- [ ] Every High/Critical risk has a concrete mitigation strategy
-- [ ] Mitigations are reflected in the relevant component or architecture docs
-- [ ] No new risks introduced by the mitigations themselves
-
-**Save action**: Write `risk_mitigations.md`
-
-**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient.
-
-**Iterative**: If user requests another round, repeat Step 4 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms.
 
 ---
 
 ### Step 5: Test Specifications
 
-**Role**: Professional Quality Assurance Engineer
+Read and follow `steps/05_test-specifications.md`.
-
-**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage
-
-**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.
-
-1. For each component, write tests using `templates/test-spec.md` as structure
-2. Cover all 4 types: integration, performance, security, acceptance
-3. Include test data management (setup, teardown, isolation)
-4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test
-
-**Self-verification**:
-- [ ] Every acceptance criterion has at least one test covering it
-- [ ] Test inputs are realistic and well-defined
-- [ ] Expected results are specific and measurable
-- [ ] No component is left without tests
-
-**Save action**: Write each `components/[##]_[name]/tests.md`
 
 ---
 
 ### Step 6: Jira Epics
 
-**Role**: Professional product manager
+Read and follow `steps/06_jira-epics.md`.
-
-**Goal**: Create Jira epics from components, ordered by dependency
-
-**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files.
-
-1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure.
-2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md`
-3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph)
-4. Include effort estimation per epic (T-shirt size or story points range)
-5. Ensure each epic has clear acceptance criteria cross-referenced with component specs
-6. Generate Mermaid diagrams showing component-to-epic mapping and component relationships
-
-**CRITICAL — Epic description richness requirements**:
-
-Each epic description in Jira MUST include ALL of the following sections with substantial content:
-- **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections)
-- **Problem / Context**: what problem this component solves, why it exists, current pain points
-- **Scope**: detailed in-scope and out-of-scope lists
-- **Architecture notes**: relevant ADRs, technology choices, patterns used, key design decisions
-- **Interface specification**: full method signatures, input/output types, error types (from component description.md)
-- **Data flow**: how data enters and exits this component (include Mermaid sequence or flowchart diagram)
-- **Dependencies**: epic dependencies (with Jira IDs) and external dependencies (libraries, hardware, services)
-- **Acceptance criteria**: measurable criteria with specific thresholds (from component tests.md)
-- **Non-functional requirements**: latency, memory, throughput targets with failure thresholds
-- **Risks & mitigations**: relevant risks from risk_mitigations.md with concrete mitigation strategies
-- **Effort estimation**: T-shirt size and story points range
-- **Child issues**: planned task breakdown with complexity points
-- **Key constraints**: from restrictions.md that affect this component
-- **Testing strategy**: summary of test types and coverage from tests.md
-
-Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team.
-
-**Self-verification**:
-- [ ] "Bootstrap & Initial Structure" epic exists and is first in order
-- [ ] "Integration Tests" epic exists
-- [ ] Every component maps to exactly one epic
-- [ ] Dependency order is respected (no epic depends on a later one)
-- [ ] Acceptance criteria are measurable
-- [ ] Effort estimates are realistic
-- [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
-- [ ] Epic descriptions are self-contained — readable without opening other files
-
-7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`.
-
-**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.
 
 ---
 
-## Quality Checklist (before FINAL_report.md)
+### Final: Quality Checklist
 
-Before writing the final report, verify ALL of the following:
+Read and follow `steps/07_quality-checklist.md`.
-
-### Integration Tests
-- [ ] Every acceptance criterion is covered in traceability_matrix.md
-- [ ] Every restriction is verified by at least one test
-- [ ] Positive and negative scenarios are balanced
-- [ ] Docker environment is self-contained
-- [ ] Consumer app treats main system as black box
-- [ ] CI/CD integration and reporting defined
-
-### Architecture
-- [ ] Covers all capabilities from solution.md
-- [ ] Technology choices are justified
-- [ ] Deployment model is defined
-- [ ] Integration test findings are reflected in architecture decisions
-
-### Data Model
-- [ ] Every entity from architecture.md is defined
-- [ ] Relationships have explicit cardinality
-- [ ] Migration strategy with reversibility requirement
-- [ ] Seed data requirements defined
-- [ ] Backward compatibility approach documented
-
-### Deployment
-- [ ] Containerization plan covers all components
-- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages
-- [ ] Environment strategy covers dev, staging, production
-- [ ] Observability covers logging, metrics, tracing, alerting
-- [ ] Deployment procedures include rollback and health checks
-
-### Components
-- [ ] Every component follows SRP
-- [ ] No circular dependencies
-- [ ] All inter-component interfaces are defined and consistent
-- [ ] No orphan components (unused by any flow)
-- [ ] Every integration test scenario can be traced through component interactions
-
-### Risks
-- [ ] All High/Critical risks have mitigations
-- [ ] Mitigations are reflected in component/architecture docs
-- [ ] User has confirmed risk assessment is sufficient
-
-### Tests
-- [ ] Every acceptance criterion is covered by at least one test
-- [ ] All 4 test types are represented per component (where applicable)
-- [ ] Test data management is defined
-
-### Epics
-- [ ] "Bootstrap & Initial Structure" epic exists
-- [ ] "Integration Tests" epic exists
-- [ ] Every component maps to an epic
-- [ ] Dependency order is correct
-- [ ] Acceptance criteria are measurable
-
-**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure
 
 ## Common Mistakes
 
@@ -522,36 +125,3 @@ Before writing the final report, verify ALL of the following:
 | File structure within templates | PROCEED |
 | Contradictions between input files | ASK user |
 | Risk mitigation requires architecture change | ASK user |
-
-## Methodology Quick Reference
-
-```
-┌────────────────────────────────────────────────────────────────┐
-│ Solution Planning (6-Step Method) │
-├────────────────────────────────────────────────────────────────┤
-│ PREREQ 1: Data Gate (BLOCKING) │
-│ → verify AC, restrictions, input_data exist — STOP if not │
-│ PREREQ 2: Finalize solution draft │
-│ → rename highest solution_draft##.md to solution.md │
-│ PREREQ 3: Workspace setup │
-│ → create DOCUMENT_DIR/ if needed │
-│ │
-│ 1. Integration Tests → integration_tests/ (5 files) │
-│ [BLOCKING: user confirms test coverage] │
-│ 2a. Architecture → architecture.md, system-flows.md │
-│ [BLOCKING: user confirms architecture] │
-│ 2b. Data Model → data_model.md │
-│ 2c. Deployment → deployment/ (5 files) │
-│ 3. Component Decompose → components/[##]_[name]/description │
-│ [BLOCKING: user confirms decomposition] │
-│ 4. Review & Risk → risk_mitigations.md │
-│ [BLOCKING: user confirms risks, iterative] │
-│ 5. Test Specifications → components/[##]_[name]/tests.md │
-│ 6. Jira Epics → Jira via MCP │
-│ ───────────────────────────────────────────────── │
-│ Quality Checklist → FINAL_report.md │
-├────────────────────────────────────────────────────────────────┤
-│ Principles: SRP · Dumb code/smart data · Save immediately │
-│ Ask don't assume · Plan don't code │
-└────────────────────────────────────────────────────────────────┘
-```
@@ -0,0 +1,27 @@
+## Prerequisite Checks (BLOCKING)
+
+Run sequentially before any planning step:
+
+### Prereq 1: Data Gate
+
+1. `_docs/00_problem/acceptance_criteria.md` exists and is non-empty — **STOP if missing**
+2. `_docs/00_problem/restrictions.md` exists and is non-empty — **STOP if missing**
+3. `_docs/00_problem/input_data/` exists and contains at least one data file — **STOP if missing**
+4. `_docs/00_problem/problem.md` exists and is non-empty — **STOP if missing**
+
+All four are mandatory. If any is missing or empty, STOP and ask the user to provide them. If the user cannot provide the required data, planning cannot proceed — just stop.
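The Data Gate is a mechanical check, so it can be sketched as a small validation helper (a sketch: the skill performs these checks itself during the conversation; only the four paths listed above are assumed):

```python
from pathlib import Path

# The three single-file requirements plus the input_data/ directory,
# exactly as listed in Prereq 1.
REQUIRED_FILES = [
    Path("_docs/00_problem/acceptance_criteria.md"),
    Path("_docs/00_problem/restrictions.md"),
    Path("_docs/00_problem/problem.md"),
]
INPUT_DATA_DIR = Path("_docs/00_problem/input_data")

def data_gate() -> list:
    """Return a list of blocking problems; an empty list means the gate passes."""
    problems = []
    for required in REQUIRED_FILES:
        # "exists and is non-empty": whitespace-only files also fail the gate.
        if not required.is_file() or not required.read_text().strip():
            problems.append("missing or empty: %s" % required)
    # input_data/ must exist and hold at least one data file (any depth).
    if not INPUT_DATA_DIR.is_dir() or not any(
        entry.is_file() for entry in INPUT_DATA_DIR.rglob("*")
    ):
        problems.append("missing or empty: %s/" % INPUT_DATA_DIR)
    return problems
```

If `data_gate()` returns a non-empty list, the skill STOPs and asks the user for the missing inputs.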
+
+### Prereq 2: Finalize Solution Draft
+
+Only runs after the Data Gate passes:
+
+1. Scan `_docs/01_solution/` for files matching `solution_draft*.md`
+2. Identify the highest-numbered draft (e.g. `solution_draft06.md`)
+3. **Rename** it to `_docs/01_solution/solution.md`
+4. If `solution.md` already exists, ask the user whether to overwrite or keep existing
+5. Verify `solution.md` is non-empty — **STOP if missing or empty**
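Steps 1–5 reduce to a rename driven by numeric ordering; a minimal sketch (assumes drafts follow the `solution_draft##.md` naming; the overwrite question in step 4 is left to the conversation, so this sketch just refuses):

```python
import re
from pathlib import Path

SOLUTION_DIR = Path("_docs/01_solution")

def draft_number(path: Path) -> int:
    """Numeric suffix of solution_draft##.md; 0 if the draft is unnumbered."""
    match = re.search(r"(\d+)\.md$", path.name)
    return int(match.group(1)) if match else 0

def finalize_draft(overwrite: bool = False) -> Path:
    """Promote the highest-numbered solution draft to solution.md."""
    target = SOLUTION_DIR / "solution.md"
    drafts = sorted(SOLUTION_DIR.glob("solution_draft*.md"), key=draft_number)
    if not drafts:
        raise FileNotFoundError("no solution_draft*.md found")
    if target.exists() and not overwrite:
        # The skill would ask the user here (step 4); the sketch just refuses.
        raise FileExistsError("solution.md already exists - ask the user")
    drafts[-1].rename(target)
    if not target.read_text().strip():
        raise ValueError("solution.md is empty - STOP")
    return target
```

Sorting numerically (not lexically) matters so that `solution_draft10.md` outranks `solution_draft09.md`.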
+
+### Prereq 3: Workspace Setup
+
+1. Create DOCUMENT_DIR if it does not exist
+2. If DOCUMENT_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
@@ -0,0 +1,81 @@
+## Artifact Management
+
+### Directory Structure
+
+All artifacts are written directly under DOCUMENT_DIR:
+
+```
+DOCUMENT_DIR/
+├── integration_tests/
+│   ├── environment.md
+│   ├── test_data.md
+│   ├── functional_tests.md
+│   ├── non_functional_tests.md
+│   └── traceability_matrix.md
+├── architecture.md
+├── system-flows.md
+├── data_model.md
+├── deployment/
+│   ├── containerization.md
+│   ├── ci_cd_pipeline.md
+│   ├── environment_strategy.md
+│   ├── observability.md
+│   └── deployment_procedures.md
+├── risk_mitigations.md
+├── risk_mitigations_02.md (iterative, ## as sequence)
+├── components/
+│   ├── 01_[name]/
+│   │   ├── description.md
+│   │   └── tests.md
+│   ├── 02_[name]/
+│   │   ├── description.md
+│   │   └── tests.md
+│   └── ...
+├── common-helpers/
+│   ├── 01_helper_[name]/
+│   ├── 02_helper_[name]/
+│   └── ...
+├── diagrams/
+│   ├── components.drawio
+│   └── flows/
+│       ├── flow_[name].md (Mermaid)
+│       └── ...
+└── FINAL_report.md
+```
+
+### Save Timing
+
+| Step | Save immediately after | Filename |
+|------|------------------------|----------|
+| Step 1 | Integration test environment spec | `integration_tests/environment.md` |
+| Step 1 | Integration test data spec | `integration_tests/test_data.md` |
+| Step 1 | Integration functional tests | `integration_tests/functional_tests.md` |
+| Step 1 | Integration non-functional tests | `integration_tests/non_functional_tests.md` |
+| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
+| Step 2 | Architecture analysis complete | `architecture.md` |
+| Step 2 | System flows documented | `system-flows.md` |
+| Step 2 | Data model documented | `data_model.md` |
+| Step 2 | Deployment plan complete | `deployment/` (5 files) |
+| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` |
+| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
+| Step 3 | Diagrams generated | `diagrams/` |
+| Step 4 | Risk assessment complete | `risk_mitigations.md` |
+| Step 5 | Tests written per component | `components/[##]_[name]/tests.md` |
+| Step 6 | Epics created in Jira | Jira via MCP |
+| Final | All steps complete | `FINAL_report.md` |
+
+### Save Principles
+
+1. **Save immediately**: write to disk as soon as a step completes; do not wait until the end
+2. **Incremental updates**: same file can be updated multiple times; append or replace
+3. **Preserve process**: keep all intermediate files even after integration into final report
+4. **Enable recovery**: if interrupted, resume from the last saved artifact (see Resumability)
+
+### Resumability
+
+If DOCUMENT_DIR already contains artifacts:
+
+1. List existing files and match them to the save timing table above
+2. Identify the last completed step based on which artifacts exist
+3. Resume from the next incomplete step
+4. Inform the user which steps are being skipped
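Steps 1–3 of the resumption protocol amount to an ordered checkpoint scan. A minimal sketch (assumptions: the artifact list is a subset of the save-timing table; the per-component artifacts of Steps 3 and 5 would need glob checks and are omitted here):

```python
from pathlib import Path
from typing import Optional

# Ordered (step, proof-artifact) checkpoints mirroring the save-timing table.
CHECKPOINTS = [
    ("Step 1: integration tests", "integration_tests/traceability_matrix.md"),
    ("Step 2: architecture", "system-flows.md"),
    ("Step 2: data model", "data_model.md"),
    ("Step 2: deployment", "deployment/deployment_procedures.md"),
    ("Step 4: risks", "risk_mitigations.md"),
    ("Final report", "FINAL_report.md"),
]

def last_completed_step(document_dir: Path) -> Optional[str]:
    """Scan checkpoints in order and stop at the first missing artifact."""
    completed = None
    for step, artifact in CHECKPOINTS:
        if not (document_dir / artifact).exists():
            break  # this is the step to resume from
        completed = step
    return completed
```

The engine would then report the skipped steps to the user and resume from the checkpoint after the returned one.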
@@ -0,0 +1,74 @@
+## Step 2: Solution Analysis
+
+**Role**: Professional software architect
+**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
+**Constraints**: No code, no component-level detail yet; focus on system-level view
+
+### Phase 2a: Architecture & Flows
+
+1. Read all input files thoroughly
+2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
+3. Research unknown or questionable topics via internet; ask user about ambiguities
+4. Document architecture using `templates/architecture.md` as structure
+5. Document system flows using `templates/system-flows.md` as structure
+
+**Self-verification**:
+- [ ] Architecture covers all capabilities mentioned in solution.md
+- [ ] System flows cover all main user/system interactions
+- [ ] No contradictions with problem.md or restrictions.md
+- [ ] Technology choices are justified
+- [ ] Integration test findings are reflected in architecture decisions
+
+**Save action**: Write `architecture.md` and `system-flows.md`
+
+**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.
+
+### Phase 2b: Data Model
+
+**Role**: Professional software architect
+**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy
+
+1. Extract core entities from architecture.md and solution.md
+2. Define entity attributes, types, and constraints
+3. Define relationships between entities (Mermaid ERD)
+4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention
+5. Define seed data requirements per environment (dev, staging)
+6. Define backward compatibility approach for schema changes (additive-only by default)
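The additive-only rule in item 6 can be checked mechanically. A minimal sketch (the column maps are hypothetical inputs constructed from the data model, not an API of any migration tool):

```python
from typing import Dict, List

def check_schema_change(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return violations of the additive-only rule; empty list = compatible.

    `old` and `new` map column name -> type for one table. A change is
    backward compatible when it only adds columns, never drops or retypes.
    """
    violations = []
    for column, col_type in old.items():
        if column not in new:
            violations.append("dropped column: %s" % column)
        elif new[column] != col_type:
            violations.append(
                "retyped column: %s (%s -> %s)" % (column, col_type, new[column])
            )
    return violations
```

Newly added columns never appear in the output, which is exactly the additive-only default.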
+
+**Self-verification**:
+- [ ] Every entity mentioned in architecture.md is defined
+- [ ] Relationships are explicit with cardinality
+- [ ] Migration strategy specifies reversibility requirement
+- [ ] Seed data requirements defined
+- [ ] Backward compatibility approach documented
+
+**Save action**: Write `data_model.md`
+
+### Phase 2c: Deployment Planning
+
+**Role**: DevOps / Platform engineer
+**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures
+
+Use the `/deploy` skill's templates as structure for each artifact:
+
+1. Read architecture.md and restrictions.md for infrastructure constraints
+2. Research Docker best practices for the project's tech stack
+3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests
+4. Define CI/CD pipeline: stages, quality gates, caching, parallelization
+5. Define environment strategy: dev, staging, production with secrets management
+6. Define observability: structured logging, metrics, tracing, alerting
+7. Define deployment procedures: strategy, health checks, rollback, checklist
+
+**Self-verification**:
+- [ ] Every component has a Docker specification
+- [ ] CI/CD pipeline covers lint, test, security, build, deploy
+- [ ] Environment strategy covers dev, staging, production
+- [ ] Observability covers logging, metrics, tracing, alerting
+- [ ] Deployment procedures include rollback and health checks
+
+**Save action**: Write all 5 files under `deployment/`:
+- `containerization.md`
+- `ci_cd_pipeline.md`
+- `environment_strategy.md`
+- `observability.md`
+- `deployment_procedures.md`
@@ -0,0 +1,29 @@
+## Step 3: Component Decomposition
+
+**Role**: Professional software architect
+**Goal**: Decompose the architecture into components with detailed specs
+**Constraints**: No code; only names, interfaces, inputs/outputs. Follow SRP strictly.
+
+1. Identify components from the architecture; think about separation, reusability, and communication patterns
+2. Use integration test scenarios from Step 1 to validate component boundaries
+3. If additional components are needed (data preparation, shared helpers), create them
+4. For each component, write a spec using `templates/component-spec.md` as structure
+5. Generate diagrams:
+   - draw.io component diagram showing relations (minimize line intersections, group semantically coherent components, place external users near their components)
+   - Mermaid flowchart per main control flow
+6. Components can share and reuse common logic across multiple components; place such shared logic in the `common-helpers/` folder.
+
+**Self-verification**:
+- [ ] Each component has a single, clear responsibility
+- [ ] No functionality is spread across multiple components
+- [ ] All inter-component interfaces are defined (who calls whom, with what)
+- [ ] Component dependency graph has no circular dependencies
+- [ ] All components from architecture.md are accounted for
+- [ ] Every integration test scenario can be traced through component interactions
+
+**Save action**: Write:
+- each component `components/[##]_[name]/description.md`
+- common helper `common-helpers/[##]_helper_[name].md`
+- diagrams `diagrams/`
+
+**BLOCKING**: Present component list with one-line summaries to user. Do NOT proceed until user confirms.
@@ -0,0 +1,38 @@
+## Step 4: Architecture Review & Risk Assessment
+
+**Role**: Professional software architect and analyst
+**Goal**: Validate all artifacts for consistency, then identify and mitigate risks
+**Constraints**: This is a review step — fix problems found, do not add new features
+
+### 4a. Evaluator Pass (re-read ALL artifacts)
+
+Review checklist:
+- [ ] All components follow Single Responsibility Principle
+- [ ] All components follow dumb code / smart data principle
+- [ ] Inter-component interfaces are consistent (caller's output matches callee's input)
+- [ ] No circular dependencies in the dependency graph
+- [ ] No missing interactions between components
+- [ ] No over-engineering — is there a simpler decomposition?
+- [ ] Security considerations addressed in component design
+- [ ] Performance bottlenecks identified
+- [ ] API contracts are consistent across components
+
+Fix any issues found before proceeding to risk identification.
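The circular-dependency item in the checklist is automatable. A minimal sketch, assuming the component graph has been extracted into a name-to-dependencies mapping (the component names in the usage are hypothetical):

```python
from typing import Dict, List

def find_cycle(graph: Dict[str, List[str]]) -> List[str]:
    """Return one dependency cycle as a list of components, or [] if acyclic.

    `graph` maps each component to the components it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current DFS path / finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> List[str]:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                # Back edge found: slice the current path into a cycle.
                return path[path.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return []

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return []
```

An empty result confirms the "No circular dependencies" checkbox; a non-empty result names the components to restructure.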
+
+### 4b. Risk Identification
+
+1. Identify technical and project risks
+2. Assess probability and impact using `templates/risk-register.md`
+3. Define mitigation strategies
+4. Apply mitigations to architecture, flows, and component documents where applicable
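Item 2's probability and impact assessment is commonly collapsed into a severity score. A minimal sketch (the 1–5 scales and the thresholds are illustrative assumptions, not taken from `templates/risk-register.md`):

```python
from typing import Tuple

def assess_risk(probability: int, impact: int) -> Tuple[int, str]:
    """Combine probability and impact (each on an assumed 1-5 scale)."""
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("probability and impact must be in 1..5")
    score = probability * impact
    # Illustrative thresholds; High/Critical ratings are the ones that
    # must carry a concrete mitigation strategy per the self-verification.
    if score >= 15:
        rating = "Critical"
    elif score >= 8:
        rating = "High"
    elif score >= 4:
        rating = "Medium"
    else:
        rating = "Low"
    return score, rating
```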
+
+**Self-verification**:
+- [ ] Every High/Critical risk has a concrete mitigation strategy
+- [ ] Mitigations are reflected in the relevant component or architecture docs
+- [ ] No new risks introduced by the mitigations themselves
+
+**Save action**: Write `risk_mitigations.md`
+
+**BLOCKING**: Present risk summary to user. Ask whether assessment is sufficient.
+
+**Iterative**: If user requests another round, repeat Step 4 and write `risk_mitigations_##.md` (## as sequence number). Continue until user confirms.
@@ -0,0 +1,20 @@

## Step 5: Test Specifications

**Role**: Professional Quality Assurance Engineer

**Goal**: Write test specs for each component achieving minimum 75% acceptance criteria coverage

**Constraints**: Test specs only — no test code. Each test must trace to an acceptance criterion.

1. For each component, write tests using `templates/test-spec.md` as structure
2. Cover all 4 types: integration, performance, security, acceptance
3. Include test data management (setup, teardown, isolation)
4. Verify traceability: every acceptance criterion from `acceptance_criteria.md` must be covered by at least one test
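The traceability check in step 4 can be mechanized. A minimal sketch, assuming criteria are identified by IDs and each test spec records the IDs it covers — a hypothetical shape; the actual specs are markdown in `components/*/tests.md`:

```python
def uncovered_criteria(criteria_ids, tests):
    """Return acceptance criterion IDs not referenced by any test spec.

    `tests` is a list of dicts, each with a "covers" list of criterion IDs.
    """
    covered = {cid for t in tests for cid in t.get("covers", [])}
    return sorted(set(criteria_ids) - covered)
```

An empty result means every criterion is covered; anything returned must be addressed before the self-verification checklist can pass.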

**Self-verification**:

- [ ] Every acceptance criterion has at least one test covering it
- [ ] Test inputs are realistic and well-defined
- [ ] Expected results are specific and measurable
- [ ] No component is left without tests

**Save action**: Write each `components/[##]_[name]/tests.md`
@@ -0,0 +1,48 @@

## Step 6: Jira Epics

**Role**: Professional product manager

**Goal**: Create Jira epics from components, ordered by dependency

**Constraints**: Epic descriptions must be **comprehensive and self-contained** — a developer reading only the Jira epic should understand the full context without needing to open separate files.

1. **Create "Bootstrap & Initial Structure" epic first** — this epic will parent the `01_initial_structure` task created by the decompose skill. It covers project scaffolding: folder structure, shared models, interfaces, stubs, CI/CD config, DB migrations setup, test structure.
2. Generate Jira Epics for each component using Jira MCP, structured per `templates/epic-spec.md`
3. Order epics by dependency (Bootstrap epic is always first, then components based on their dependency graph)
4. Include effort estimation per epic (T-shirt size or story points range)
5. Ensure each epic has clear acceptance criteria cross-referenced with component specs
6. Generate Mermaid diagrams showing component-to-epic mapping and component relationships
7. **Create "Integration Tests" epic** — this epic will parent the integration test tasks created by the `/decompose` skill. It covers implementing the test scenarios defined in `integration_tests/`.

**CRITICAL — Epic description richness requirements**:

Each epic description in Jira MUST include ALL of the following sections with substantial content:

- **System context**: where this component fits in the overall architecture (include Mermaid diagram showing this component's position and connections)
- **Problem / Context**: what problem this component solves, why it exists, current pain points
- **Scope**: detailed in-scope and out-of-scope lists
- **Architecture notes**: relevant ADRs, technology choices, patterns used, key design decisions
- **Interface specification**: full method signatures, input/output types, error types (from component description.md)
- **Data flow**: how data enters and exits this component (include Mermaid sequence or flowchart diagram)
- **Dependencies**: epic dependencies (with Jira IDs) and external dependencies (libraries, hardware, services)
- **Acceptance criteria**: measurable criteria with specific thresholds (from component tests.md)
- **Non-functional requirements**: latency, memory, throughput targets with failure thresholds
- **Risks & mitigations**: relevant risks from risk_mitigations.md with concrete mitigation strategies
- **Effort estimation**: T-shirt size and story points range
- **Child issues**: planned task breakdown with complexity points
- **Key constraints**: from restrictions.md that affect this component
- **Testing strategy**: summary of test types and coverage from tests.md

Do NOT create minimal epics with just a summary and short description. The Jira epic is the primary reference document for the implementation team.

**Self-verification**:

- [ ] "Bootstrap & Initial Structure" epic exists and is first in order
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to exactly one epic
- [ ] Dependency order is respected (no epic depends on a later one)
- [ ] Acceptance criteria are measurable
- [ ] Effort estimates are realistic
- [ ] Every epic description includes architecture diagram, interface spec, data flow, risks, and NFRs
- [ ] Epic descriptions are self-contained — readable without opening other files

**Save action**: Epics created in Jira via MCP. Also saved locally in `epics.md` with Jira IDs.
@@ -0,0 +1,57 @@

## Quality Checklist (before FINAL_report.md)

Before writing the final report, verify ALL of the following:

### Integration Tests

- [ ] Every acceptance criterion is covered in traceability_matrix.md
- [ ] Every restriction is verified by at least one test
- [ ] Positive and negative scenarios are balanced
- [ ] Docker environment is self-contained
- [ ] Consumer app treats main system as black box
- [ ] CI/CD integration and reporting defined

### Architecture

- [ ] Covers all capabilities from solution.md
- [ ] Technology choices are justified
- [ ] Deployment model is defined
- [ ] Integration test findings are reflected in architecture decisions

### Data Model

- [ ] Every entity from architecture.md is defined
- [ ] Relationships have explicit cardinality
- [ ] Migration strategy with reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented

### Deployment

- [ ] Containerization plan covers all components
- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks

### Components

- [ ] Every component follows SRP
- [ ] No circular dependencies
- [ ] All inter-component interfaces are defined and consistent
- [ ] No orphan components (unused by any flow)
- [ ] Every integration test scenario can be traced through component interactions

### Risks

- [ ] All High/Critical risks have mitigations
- [ ] Mitigations are reflected in component/architecture docs
- [ ] User has confirmed risk assessment is sufficient

### Tests

- [ ] Every acceptance criterion is covered by at least one test
- [ ] All 4 test types are represented per component (where applicable)
- [ ] Test data management is defined

### Epics

- [ ] "Bootstrap & Initial Structure" epic exists
- [ ] "Integration Tests" epic exists
- [ ] Every component maps to an epic
- [ ] Dependency order is correct
- [ ] Acceptance criteria are measurable

**Save action**: Write `FINAL_report.md` using `templates/final-report.md` as structure
@@ -43,257 +43,51 @@ Determine the operating mode based on invocation before any other logic runs.

**Standalone mode** (explicit input file provided, e.g. `/research @some_doc.md`):

- INPUT_FILE: the provided file (treated as problem description)
- BASE_DIR: if specified by the caller, use it; otherwise default to `_standalone/`
- OUTPUT_DIR: `BASE_DIR/01_solution/`
- RESEARCH_DIR: `BASE_DIR/00_research/`
- Guardrails relaxed: only INPUT_FILE must exist and be non-empty
- `restrictions.md` and `acceptance_criteria.md` are optional — warn if absent, proceed if user confirms
- Mode detection uses OUTPUT_DIR for `solution_draft*.md` scanning
- Draft numbering works the same, scoped to OUTPUT_DIR
- **Final step**: after all research is complete, move INPUT_FILE into BASE_DIR

Announce the detected mode and resolved paths to the user before proceeding.
## Project Integration

### Prerequisite Guardrails (BLOCKING)

Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.**

**Project mode:**

1. Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files
2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing**
6. Read **all** files in INPUT_DIR to ground the investigation in the project context
7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist

**Standalone mode:**

1. Check INPUT_FILE exists and is non-empty — **STOP if missing**
2. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms
3. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist

### Mode Detection

After guardrails pass, determine the execution mode:

1. Scan OUTPUT_DIR for files matching `solution_draft*.md`
2. **No matches found** → **Mode A: Initial Research**
3. **Matches found** → **Mode B: Solution Assessment** (use the highest-numbered draft as input)
4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts

Inform the user which mode was detected and confirm before proceeding.
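The detection rules above reduce to a few lines. A minimal sketch, assuming drafts follow the zero-padded naming convention so lexicographic order matches numeric order:

```python
from pathlib import Path

def detect_mode(output_dir, force_initial=False):
    """Return ("A", None) or ("B", latest_draft_path) per the detection rules."""
    drafts = sorted(Path(output_dir).glob("solution_draft*.md"))
    if force_initial or not drafts:
        return ("A", None)           # no drafts, or user override: initial research
    return ("B", drafts[-1])         # assess the highest-numbered draft
```

`force_initial` models the user-override rule in item 4.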

### Solution Draft Numbering

All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number:

1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md`
2. Extract the highest existing number
3. Increment by 1
4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`)

Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`.
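The four numbering steps above can be sketched as a single helper, assuming OUTPUT_DIR already exists:

```python
import re
from pathlib import Path

def next_draft_name(output_dir):
    """Compute the next solution_draft##.md filename (2-digit, zero-padded)."""
    pattern = re.compile(r"solution_draft(\d+)\.md$")
    highest = 0
    for f in Path(output_dir).glob("solution_draft*.md"):
        m = pattern.search(f.name)
        if m:
            highest = max(highest, int(m.group(1)))  # step 2: highest existing number
    return f"solution_draft{highest + 1:02d}.md"     # steps 3-4: increment, zero-pad
```

Parsing the number (rather than sorting filenames) keeps the scheme correct past draft 99.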

### Working Directory & Intermediate Artifact Management

#### Directory Structure

At the start of research, you **must** create a working directory under RESEARCH_DIR:

```
RESEARCH_DIR/
├── 00_ac_assessment.md             # Mode A Phase 1 output: AC & restrictions assessment
├── 00_question_decomposition.md    # Step 0-1 output
├── 01_source_registry.md           # Step 2 output: all consulted source links
├── 02_fact_cards.md                # Step 3 output: extracted facts
├── 03_comparison_framework.md      # Step 4 output: selected framework and populated data
├── 04_reasoning_chain.md           # Step 6 output: fact → conclusion reasoning
├── 05_validation_log.md            # Step 7 output: use-case validation results
└── raw/                            # Raw source archive (optional)
    ├── source_1.md
    └── source_2.md
```

### Save Timing & Content

| Step | Save immediately after completion | Filename |
|------|-----------------------------------|----------|
| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` |
| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` |
| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |

### Save Principles

1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end
2. **Incremental updates**: The same file can be updated multiple times; append or replace new content
3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report
4. **Enable recovery**: If research is interrupted, progress can be recovered from intermediate files

## Execution Flow

### Mode A: Initial Research

Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research.

#### Phase 1: AC & Restrictions Assessment (BLOCKING)

**Role**: Professional software architect

A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.

**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)

**Task**:

1. Read all problem context files thoroughly
2. **ASK the user about every unclear aspect** — do not assume:
   - Unclear problem boundaries → ask
   - Ambiguous acceptance criteria values → ask
   - Missing context (no `security_approach.md`, no `input_data/`) → ask what they have
   - Conflicting restrictions → ask which takes priority
3. Research the internet **extensively** — use multiple search queries per question, rephrase, and search from different angles:
   - How realistic are the acceptance criteria for this specific domain? Search for industry benchmarks, standards, and typical values
   - How critical is each criterion? Search for case studies where criteria were relaxed or tightened
   - What domain-specific acceptance criteria are we missing? Search for industry standards, regulatory requirements, and best practices in the specific domain
   - Impact of each criterion value on the whole system quality — search for research papers and engineering reports
   - Cost/budget implications of each criterion — search for pricing, total cost of ownership analyses, and comparable project budgets
   - Timeline implications — search for project timelines, development velocity reports, and comparable implementations
   - What do practitioners in this domain consider the most important criteria? Search forums, conference talks, and experience reports
4. Research restrictions from multiple perspectives:
   - Are the restrictions realistic? Search for comparable projects that operated under similar constraints
   - Should any be tightened or relaxed? Search for what constraints similar projects actually ended up with
   - Are there additional restrictions we should add? Search for regulatory, compliance, and safety requirements in this domain
   - What restrictions do practitioners wish they had defined earlier? Search for post-mortem reports and lessons learned
5. Verify findings with authoritative sources (official docs, papers, benchmarks) — each key finding must have at least 2 independent sources

**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to AC and restrictions assessment.

**📁 Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format:

```markdown
# Acceptance Criteria Assessment

## Acceptance Criteria

| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |

## Restrictions Assessment

| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |

## Key Findings

[Summary of critical findings]

## Sources

[Key references used]
```

**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on findings.

---

#### Phase 2: Problem Research & Solution Draft

**Role**: Professional researcher and software architect

Full 8-step research methodology. Produces the first solution draft.

**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts

**Task** (drives the 8-step engine):

1. Research existing/competitor solutions for similar problems — search broadly across industries and adjacent domains, not just the obvious competitors
2. Research the problem thoroughly — all possible ways to solve it, split into components; search for how different fields approach analogous problems
3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1
4. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions
5. Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist?
6. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues)
7. Include security considerations in each component analysis
8. Provide rough cost estimates for proposed solutions

Formulate concisely: the fewer words the better, but do not omit any important details.

**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`

---

#### Phase 3: Tech Stack Consolidation (OPTIONAL)

**Role**: Software architect evaluating technology choices

Focused synthesis step — no new 8-step cycle. Uses research already gathered in Phase 2 to make concrete technology decisions.

**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR

**Task**:

1. Extract technology options from the solution draft's component comparison tables
2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability
3. Produce a tech stack summary with selection rationale
4. Assess risks and learning requirements per technology choice

**📁 Save action**: Write `OUTPUT_DIR/tech_stack.md` with:

- Requirements analysis (functional, non-functional, constraints)
- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores
- Tech stack summary block
- Risk assessment and learning requirements tables

---

#### Phase 4: Security Deep Dive (OPTIONAL)

**Role**: Security architect

Focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification.

**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context

**Task**:

1. Build threat model: asset inventory, threat actors, attack vectors
2. Define security requirements and proposed controls per component (with risk level)
3. Summarize authentication/authorization, data protection, secure communication, and logging/monitoring approach

**📁 Save action**: Write `OUTPUT_DIR/security_analysis.md` with:

- Threat model (assets, actors, vectors)
- Per-component security requirements and controls table
- Security controls summary

---

### Mode B: Solution Assessment

Triggered when `solution_draft*.md` files exist in OUTPUT_DIR.

**Role**: Professional software architect

Full 8-step research methodology applied to assessing and improving an existing solution draft.

**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR

**Task** (drives the 8-step engine):

1. Read the existing solution draft thoroughly
2. Research the internet extensively — for each component/decision in the draft, search for:
   - Known problems and limitations of the chosen approach
   - What practitioners say about using it in production
   - Better alternatives that may have emerged recently
   - Common failure modes and edge cases
   - How competitors/similar projects solve the same problem differently
3. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure"
4. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft
5. Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports
6. For each identified weak point, search for multiple solution approaches and compare them
7. Based on findings, form a new solution draft in the same format

**📁 Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`

**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions above.

## Escalation Rules
@@ -317,389 +111,12 @@ When the user wants to:

- Gather information and evidence for a decision
- Assess or improve an existing solution draft

**Keywords**:

- "deep research", "deep dive", "in-depth analysis"
- "research this", "investigate", "look into"
- "assess solution", "review draft", "improve solution"
- "comparative analysis", "concept comparison", "technical comparison"

**Differentiation from other Skills**:

- Needs a **visual knowledge graph** → use `research-to-diagram`
- Needs **written output** (articles/tutorials) → use `wsy-writer`
- Needs **material organization** → use `material-to-markdown`
- Needs **research + solution draft** → use this Skill

## Research Engine (8-Step Method)

The 8-step method is the core research engine used by both modes. Steps 0-1 and Step 8 have mode-specific behavior; Steps 2-7 are identical regardless of mode.

### Step 0: Question Type Classification

First, classify the research question type and select the corresponding strategy:

| Question Type | Core Task | Focus Dimensions |
|---------------|-----------|------------------|
| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries |
| **Decision Support** | Weigh trade-offs | Cost, risk, benefit |
| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions |
| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain |
| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships |

**Mode-specific classification**:

| Mode / Phase | Typical Question Type |
|--------------|----------------------|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |

### Step 0.5: Novelty Sensitivity Assessment (BLOCKING)

Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.

**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md`

Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.

**📁 Save action**: Append timeliness assessment to the end of `00_question_decomposition.md`

---

### Step 1: Question Decomposition & Boundary Definition

**Mode-specific sub-questions**:

**Mode A Phase 2** (Initial Research — Problem & Solution):

- "What existing/competitor solutions address this problem?"
- "What are the component parts of this problem?"
- "For each component, what are the state-of-the-art solutions?"
- "What are the security considerations per component?"
- "What are the cost implications of each approach?"

**Mode B** (Solution Assessment):

- "What are the weak points and potential problems in the existing draft?"
- "What are the security vulnerabilities in the proposed architecture?"
- "Where are the performance bottlenecks?"
- "What solutions exist for each identified issue?"

**General sub-question patterns** (use when applicable):

- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis)

#### Perspective Rotation (MANDATORY)

For each research problem, examine it from **at least 3 different perspectives**. Each perspective generates its own sub-questions and search queries.

| Perspective | What it asks | Example queries |
|-------------|-------------|-----------------|
| **End-user / Consumer** | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" |
| **Implementer / Engineer** | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" |
| **Business / Decision-maker** | What are the costs, ROI, strategic implications? | "X total cost of ownership", "X ROI case study", "X vs Y business comparison" |
| **Contrarian / Devil's advocate** | What could go wrong? Why might this fail? What are critics saying? | "X criticism", "why not X", "X failures", "X disadvantages real world" |
| **Domain expert / Academic** | What does peer-reviewed research say? What are theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" |
| **Practitioner / Field** | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" |

Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in `00_question_decomposition.md`.
#### Question Explosion (MANDATORY)

For **each sub-question**, generate **at least 3-5 search query variants** before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences.

**Query variant strategies**:
- **Specificity ladder**: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy")
- **Negation/failure**: "X limitations", "X failure modes", "when X doesn't work"
- **Comparison framing**: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z"
- **Practitioner voice**: "X in production experience", "X real-world results", "X lessons learned"
- **Temporal**: "X 2025", "X latest developments", "X roadmap"
- **Geographic/domain**: "X in Europe", "X for defense applications", "X in agriculture"

Record all planned queries in `00_question_decomposition.md` alongside each sub-question.
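The variant strategies above amount to a small template expansion. A minimal sketch, assuming illustrative strategy templates (the skill itself prescribes no code):

```python
# Sketch: expand one sub-question topic into search query variants using
# the strategies above. The templates are illustrative assumptions.
VARIANT_TEMPLATES = [
    "{topic}",                           # specificity ladder: broad baseline
    "{topic} limitations",               # negation/failure
    "{topic} vs alternatives",           # comparison framing
    "{topic} in production experience",  # practitioner voice
    "{topic} 2025",                      # temporal
]

def query_variants(topic: str) -> list[str]:
    """Return one query per strategy template for a sub-question topic."""
    return [t.format(topic=topic) for t in VARIANT_TEMPLATES]
```

Each returned variant would then be recorded in `00_question_decomposition.md` alongside its sub-question.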

**⚠️ Research Subject Boundary Definition (BLOCKING - must be explicit)**:

When decomposing questions, you must explicitly define the **boundaries of the research subject**:

| Dimension | Boundary to define | Example |
|-----------|--------------------|---------|
| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |

**Common mistake**: User asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.

**📁 Save action**:
1. Read all files from INPUT_DIR to ground the research in the project context
2. Create working directory `RESEARCH_DIR/`
3. Write `00_question_decomposition.md`, including:
   - Original question
   - Active mode (A Phase 2 or B) and rationale
   - Summary of relevant problem context from INPUT_DIR
   - Classified question type and rationale
   - **Research subject boundary definition** (population, geography, timeframe, level)
   - List of decomposed sub-questions
   - **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale
   - **Search query variants** for each sub-question (at least 3-5 per sub-question)
4. Write TodoWrite to track progress

### Step 2: Source Tiering & Exhaustive Web Investigation

Tier sources by authority, **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary and validation.

**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md`

**Tool Usage**:
- Use `WebSearch` for broad searches; `WebFetch` to read specific pages
- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation
- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
- When citing web sources, include the URL and date accessed

#### Exhaustive Search Requirements (MANDATORY)

Do not stop at the first few results. The goal is to build a comprehensive evidence base.

**Minimum search effort per sub-question**:
- Execute **all** query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question)
- Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion)
- If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems

**Search broadening strategies** (use when results are thin):
- Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation"
- Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums
- Try different geographies: search in English + search for European/Asian approaches if relevant
- Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025"
- Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report"

**Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.
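The saturation rule can be checked mechanically by tracking how many genuinely new facts each search produced; the window of 3 comes from the rule above, and the function name is illustrative:

```python
def is_saturated(new_facts_per_search: list[int], window: int = 3) -> bool:
    """True when the last `window` searches produced no new facts."""
    if len(new_facts_per_search) < window:
        return False  # too few searches so far to judge saturation
    return all(count == 0 for count in new_facts_per_search[-window:])
```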

**📁 Save action**:
For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.
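The append-as-you-go discipline can be sketched as a tiny helper; the entry fields below are a simplified stand-in for the actual template in `references/source-tiering.md`:

```python
from pathlib import Path

def append_source(registry: Path, url: str, tier: str, summary: str) -> None:
    """Append one source entry to the registry file without rewriting it."""
    entry = (
        f"\n## Source\n"
        f"- **URL**: {url}\n"
        f"- **Tier**: {tier}\n"
        f"- **Summary**: {summary}\n"
    )
    with registry.open("a", encoding="utf-8") as f:
        f.write(entry)
```

Appending (rather than rewriting) keeps partial progress on disk if research is interrupted.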

### Step 3: Fact Extraction & Evidence Cards

Transform sources into **verifiable fact cards**:

```markdown
## Fact Cards

### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low

### Fact 2
...
```

**Key discipline**:
- Pin down facts first, then reason
- Distinguish "what officials said" from "what I infer"
- When conflicting information is found, annotate and preserve both sides
- Annotate confidence level:
  - ✅ High: Explicitly stated in official documentation
  - ⚠️ Medium: Mentioned in official blog but not formally documented
  - ❓ Low: Inference or from unofficial sources

**📁 Save action**:
For each extracted fact, **immediately** append to `02_fact_cards.md`:
```markdown
## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]
```

**⚠️ Target audience in fact statements**:
- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope**
- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"

### Step 3.5: Iterative Deepening — Follow-Up Investigation

After initial fact extraction, review what you have found and identify **knowledge gaps and new questions** that emerged from the initial research. This step ensures the research doesn't stop at surface-level findings.

**Process**:

1. **Gap analysis**: Review fact cards and identify:
   - Sub-questions with fewer than 3 high-confidence facts → need more searching
   - Contradictions between sources → need tie-breaking evidence
   - Perspectives (from Step 1) that have no or weak coverage → need targeted search
   - Claims that rely only on L3/L4 sources → need L1/L2 verification

2. **Follow-up question generation**: Based on initial findings, generate new questions:
   - "Source X claims [fact] — is this consistent with other evidence?"
   - "If [approach A] has [limitation], how do practitioners work around it?"
   - "What are the second-order effects of [finding]?"
   - "Who disagrees with [common finding] and why?"
   - "What happened when [solution] was deployed at scale?"

3. **Targeted deep-dive searches**: Execute follow-up searches focusing on:
   - Specific claims that need verification
   - Alternative viewpoints not yet represented
   - Real-world case studies and experience reports
   - Failure cases and edge conditions
   - Recent developments that may change the picture

4. **Update artifacts**: Append new sources to `01_source_registry.md`, new facts to `02_fact_cards.md`

**Exit criteria**: Proceed to Step 4 when:
- Every sub-question has at least 3 facts with at least one from L1/L2
- At least 3 perspectives from Step 1 have supporting evidence
- No unresolved contradictions remain (or they are explicitly documented as open questions)
- Follow-up searches are no longer producing new substantive information

### Step 4: Build Comparison/Analysis Framework

Based on the question type, select fixed analysis dimensions. **For dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md`

**📁 Save action**:
Write to `03_comparison_framework.md`:

```markdown
# Comparison Framework

## Selected Framework Type
[Concept Comparison / Decision Support / ...]

## Selected Dimensions
1. [Dimension 1]
2. [Dimension 2]
...

## Initial Population
| Dimension | X | Y | Factual Basis |
|-----------|---|---|---------------|
| [Dimension 1] | [description] | [description] | Fact #1, #3 |
| ... | | | |
```

### Step 5: Reference Point Baseline Alignment

Ensure all compared parties have clear, consistent definitions:

**Checklist**:
- [ ] Is the reference point's definition stable/widely accepted?
- [ ] Does it need verification, or can domain common knowledge be used?
- [ ] Does the reader's understanding of the reference point match mine?
- [ ] Are there ambiguities that need to be clarified first?

### Step 6: Fact-to-Conclusion Reasoning Chain

Explicitly write out the "fact → comparison → conclusion" reasoning process:

```markdown
## Reasoning Process

### Regarding [Dimension Name]

1. **Fact confirmation**: According to [source], X's mechanism is...
2. **Compare with reference**: While Y's mechanism is...
3. **Conclusion**: Therefore, the difference between X and Y on this dimension is...
```

**Key discipline**:
- Conclusions come from mechanism comparison, not "gut feelings"
- Every conclusion must be traceable to specific facts
- Uncertain conclusions must be annotated

**📁 Save action**:
Write to `04_reasoning_chain.md`:

```markdown
# Reasoning Chain

## Dimension 1: [Dimension Name]

### Fact Confirmation
According to [Fact #X], X's mechanism is...

### Reference Comparison
While Y's mechanism is... (Source: [Fact #Y])

### Conclusion
Therefore, the difference between X and Y on this dimension is...

### Confidence
✅/⚠️/❓ + rationale

---

## Dimension 2: [Dimension Name]
...
```

### Step 7: Use-Case Validation (Sanity Check)

Validate conclusions against a typical scenario:

**Validation questions**:
- Based on my conclusions, how should this scenario be handled?
- Is that actually the case?
- Are there counterexamples that need to be addressed?

**Review checklist**:
- [ ] Are draft conclusions consistent with Step 3 fact cards?
- [ ] Are there any important dimensions missed?
- [ ] Is there any over-extrapolation?
- [ ] Are conclusions actionable/verifiable?

**📁 Save action**:
Write to `05_validation_log.md`:

```markdown
# Validation Log

## Validation Scenario
[Scenario description]

## Expected Based on Conclusions
If using X: [expected behavior]
If using Y: [expected behavior]

## Actual Validation Results
[actual situation]

## Counterexamples
[yes/no, describe if yes]

## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [ ] Issue found: [if any]

## Conclusions Requiring Revision
[if any]
```

### Step 8: Deliverable Formatting

Make the output **readable, traceable, and actionable**.

**📁 Save action**:
Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the appropriate output template based on active mode:
- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`

Sources to integrate:
- Extract background from `00_question_decomposition.md`
- Reference key facts from `02_fact_cards.md`
- Organize conclusions from `04_reasoning_chain.md`
- Generate references from `01_source_registry.md`
- Supplement with use cases from `05_validation_log.md`
- For Mode A: include AC assessment from `00_ac_assessment.md`

## Solution Draft Output Templates

### Mode A: Initial Research Output

Use template: `templates/solution_draft_mode_a.md`

### Mode B: Solution Assessment Output

Use template: `templates/solution_draft_mode_b.md`

## Stakeholder Perspectives

Adjust content depth based on audience:

| **Implementers** | Specific mechanisms, how-to | Detailed, emphasize how to do it |
| **Technical experts** | Details, boundary conditions, limitations | In-depth, emphasize accuracy |

## Output Files

Default intermediate artifacts location: `RESEARCH_DIR/`

**Required files** (automatically generated through the process):

| File | Content | When Generated |
|------|---------|----------------|
| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 |
| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 |
| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
| `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |

**Optional files**:
- `raw/*.md` - Raw source archives (saved when content is lengthy)

## Methodology Quick Reference Card

```
┌──────────────────────────────────────────────────────────────────┐
│ Deep Research — Mode-Aware 8-Step Method │
├──────────────────────────────────────────────────────────────────┤
│ CONTEXT: Resolve mode (project vs standalone) + set paths │
│ GUARDRAILS: Check INPUT_DIR/INPUT_FILE exists + required files │
│ MODE DETECT: solution_draft*.md in 01_solution? → A or B │
│ │
│ MODE A: Initial Research │
│   Phase 1: AC & Restrictions Assessment (BLOCKING) │
│   Phase 2: Full 8-step → solution_draft##.md │
│   Phase 3: Tech Stack Consolidation (OPTIONAL) → tech_stack.md │
│   Phase 4: Security Deep Dive (OPTIONAL) → security_analysis.md │
│ │
│ MODE B: Solution Assessment │
│   Read latest draft → Full 8-step → solution_draft##.md (N+1) │
│   Optional: Phase 3 / Phase 4 on revised draft │
│ │
│ 8-STEP ENGINE: │
│   0. Classify question type → Select framework template │
│   0.5 Novelty sensitivity → Time windows for sources │
│   1. Decompose question → sub-questions + perspectives + queries │
│      → Perspective Rotation (3+ viewpoints, MANDATORY) │
│      → Question Explosion (3-5 query variants per sub-Q) │
│   2. Exhaustive web search → L1 > L2 > L3 > L4, broad coverage │
│      → Execute ALL query variants, search until saturation │
│   3. Extract facts → Each with source, confidence level │
│   3.5 Iterative deepening → gaps, contradictions, follow-ups │
│      → Keep searching until exit criteria met │
│   4. Build framework → Fixed dimensions, structured compare │
│   5. Align references → Ensure unified definitions │
│   6. Reasoning chain → Fact→Compare→Conclude, explicit │
│   7. Use-case validation → Sanity check, prevent armchairing │
│   8. Deliverable → solution_draft##.md (mode-specific format) │
├──────────────────────────────────────────────────────────────────┤
│ Key discipline: Ask don't assume · Facts before reasoning │
│ Conclusions from mechanism, not gut feelings │
│ Search broadly, from multiple perspectives, until saturation │
└──────────────────────────────────────────────────────────────────┘
```

## Usage Examples

For detailed execution flow examples (Mode A initial, Mode B assessment, standalone, force override): Read `references/usage-examples.md`

## Source Verifiability Requirements

Every cited piece of external information must be directly verifiable by the user. All links must be publicly accessible (annotate `[login required]` if not), citations must include exact section/page/timestamp, and unverifiable information must be annotated `[limited source]`. Full checklist in `references/quality-checklists.md`.

When replying to the user after research is complete:

**Should include**:
- Active mode used (A or B) and which optional phases were executed
- One-sentence core conclusion
- Key findings summary (3-5 points)
- Paths to optional artifacts if produced: `tech_stack.md`, `security_analysis.md`
- If there are significant uncertainties, annotate points requiring further verification

**Must not include**:
- Process file listings (e.g., `00_question_decomposition.md`, `01_source_registry.md`, etc.)
- Detailed research step descriptions
- Working directory structure display

## Project Integration

### Prerequisite Guardrails (BLOCKING)

Before any research begins, verify the input context exists. **Do not proceed if guardrails fail.**

**Project mode:**
1. Check INPUT_DIR exists — **STOP if missing**, ask user to create it and provide problem files
2. Check `problem.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
3. Check `restrictions.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
4. Check `acceptance_criteria.md` in INPUT_DIR exists and is non-empty — **STOP if missing**
5. Check `input_data/` in INPUT_DIR exists and contains at least one file — **STOP if missing**
6. Read **all** files in INPUT_DIR to ground the investigation in the project context
7. Create OUTPUT_DIR and RESEARCH_DIR if they don't exist

**Standalone mode:**
1. Check INPUT_FILE exists and is non-empty — **STOP if missing**
2. Resolve BASE_DIR: use the caller-specified directory if provided; otherwise default to `_standalone/`
3. Resolve OUTPUT_DIR (`BASE_DIR/01_solution/`) and RESEARCH_DIR (`BASE_DIR/00_research/`)
4. Warn if no `restrictions.md` or `acceptance_criteria.md` were provided alongside INPUT_FILE — proceed if user confirms
5. Create BASE_DIR, OUTPUT_DIR, and RESEARCH_DIR if they don't exist
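The project-mode checks above reduce to a single pass that collects blocking issues. A minimal sketch (the file names match the list; the function name and message strings are illustrative):

```python
from pathlib import Path

REQUIRED = ["problem.md", "restrictions.md", "acceptance_criteria.md"]

def project_guardrail_issues(input_dir: Path) -> list[str]:
    """Return blocking issues for project mode; an empty list means proceed."""
    if not input_dir.is_dir():
        return [f"INPUT_DIR missing: {input_dir}"]
    issues = []
    for name in REQUIRED:
        f = input_dir / name
        if not f.is_file() or f.stat().st_size == 0:
            issues.append(f"missing or empty: {name}")
    data = input_dir / "input_data"
    if not data.is_dir() or not any(data.iterdir()):
        issues.append("input_data/ missing or empty")
    return issues
```

A non-empty result corresponds to a STOP: report the issues to the user instead of starting research.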

### Mode Detection

After guardrails pass, determine the execution mode:

1. Scan OUTPUT_DIR for files matching `solution_draft*.md`
2. **No matches found** → **Mode A: Initial Research**
3. **Matches found** → **Mode B: Solution Assessment** (use the highest-numbered draft as input)
4. **User override**: if the user explicitly says "research from scratch" or "initial research", force Mode A regardless of existing drafts

Inform the user which mode was detected and confirm before proceeding.
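The detection rules above fit in a few lines; the glob pattern is taken from the rules, while the function name is illustrative:

```python
from pathlib import Path

def detect_mode(output_dir: Path, force_initial: bool = False) -> str:
    """Return 'A' (initial research) or 'B' (solution assessment)."""
    if force_initial:  # user said "research from scratch"
        return "A"
    return "B" if list(output_dir.glob("solution_draft*.md")) else "A"
```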

### Solution Draft Numbering

All final output is saved as `OUTPUT_DIR/solution_draft##.md` with a 2-digit zero-padded number:

1. Scan existing files in OUTPUT_DIR matching `solution_draft*.md`
2. Extract the highest existing number
3. Increment by 1
4. Zero-pad to 2 digits (e.g., `01`, `02`, ..., `10`, `11`)

Example: if `solution_draft01.md` through `solution_draft10.md` exist, the next output is `solution_draft11.md`.
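Steps 1-4 can be sketched as one small function; the regex and zero-padding mirror the rules above, and the function name is illustrative:

```python
import re
from pathlib import Path

def next_draft_name(output_dir: Path) -> str:
    """Return the next zero-padded solution draft filename."""
    numbers = [
        int(m.group(1))
        for p in output_dir.glob("solution_draft*.md")
        if (m := re.fullmatch(r"solution_draft(\d+)\.md", p.name))
    ]
    return f"solution_draft{max(numbers, default=0) + 1:02d}.md"
```

Note that `:02d` pads to two digits but does not truncate, so numbering continues past `99` unchanged.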

### Working Directory & Intermediate Artifact Management

#### Directory Structure

At the start of research, you **must** create a working directory under RESEARCH_DIR:

```
RESEARCH_DIR/
├── 00_ac_assessment.md           # Mode A Phase 1 output: AC & restrictions assessment
├── 00_question_decomposition.md  # Step 0-1 output
├── 01_source_registry.md         # Step 2 output: all consulted source links
├── 02_fact_cards.md              # Step 3 output: extracted facts
├── 03_comparison_framework.md    # Step 4 output: selected framework and populated data
├── 04_reasoning_chain.md         # Step 6 output: fact → conclusion reasoning
├── 05_validation_log.md          # Step 7 output: use-case validation results
└── raw/                          # Raw source archive (optional)
    ├── source_1.md
    └── source_2.md
```

### Save Timing & Content

| Step | Save immediately after completion | Filename |
|------|-----------------------------------|----------|
| Mode A Phase 1 | AC & restrictions assessment tables | `00_ac_assessment.md` |
| Step 0-1 | Question type classification + sub-question list | `00_question_decomposition.md` |
| Step 2 | Each consulted source link, tier, summary | `01_source_registry.md` |
| Step 3 | Each fact card (statement + source + confidence) | `02_fact_cards.md` |
| Step 4 | Selected comparison framework + initial population | `03_comparison_framework.md` |
| Step 6 | Reasoning process for each dimension | `04_reasoning_chain.md` |
| Step 7 | Validation scenarios + results + review checklist | `05_validation_log.md` |
| Step 8 | Complete solution draft | `OUTPUT_DIR/solution_draft##.md` |

### Save Principles

1. **Save immediately**: Write to the corresponding file as soon as a step is completed; don't wait until the end
2. **Incremental updates**: Same file can be updated multiple times; append or replace new content
3. **Preserve process**: Keep intermediate files even after their content is integrated into the final report
4. **Enable recovery**: If research is interrupted, progress can be recovered from intermediate files

### Output Files

**Required files** (automatically generated through the process):

| File | Content | When Generated |
|------|---------|----------------|
| `00_ac_assessment.md` | AC & restrictions assessment (Mode A only) | After Phase 1 completion |
| `00_question_decomposition.md` | Question type, sub-question list | After Step 0-1 completion |
| `01_source_registry.md` | All source links and summaries | Continuously updated during Step 2 |
| `02_fact_cards.md` | Extracted facts and sources | Continuously updated during Step 3 |
| `03_comparison_framework.md` | Selected framework and populated data | After Step 4 completion |
| `04_reasoning_chain.md` | Fact → conclusion reasoning | After Step 6 completion |
| `05_validation_log.md` | Use-case validation and review | After Step 7 completion |
| `OUTPUT_DIR/solution_draft##.md` | Complete solution draft | After Step 8 completion |
| `OUTPUT_DIR/tech_stack.md` | Tech stack evaluation and decisions | After Phase 3 (optional) |
| `OUTPUT_DIR/security_analysis.md` | Threat model and security controls | After Phase 4 (optional) |

**Optional files**:
- `raw/*.md` - Raw source archives (saved when content is lengthy)

## Mode A: Initial Research

Triggered when no `solution_draft*.md` files exist in OUTPUT_DIR, or when the user explicitly requests initial research.

### Phase 1: AC & Restrictions Assessment (BLOCKING)

**Role**: Professional software architect

A focused preliminary research pass **before** the main solution research. The goal is to validate that the acceptance criteria and restrictions are realistic before designing a solution around them.

**Input**: All files from INPUT_DIR (or INPUT_FILE in standalone mode)

**Task**:
1. Read all problem context files thoroughly
2. **ASK the user about every unclear aspect** — do not assume:
   - Unclear problem boundaries → ask
   - Ambiguous acceptance criteria values → ask
   - Missing context (no `security_approach.md`, no `input_data/`) → ask what they have
   - Conflicting restrictions → ask which takes priority
3. Research the internet **extensively** — use multiple search queries per question, rephrase, and search from different angles:
   - How realistic are the acceptance criteria for this specific domain? Search for industry benchmarks, standards, and typical values
   - How critical is each criterion? Search for case studies where criteria were relaxed or tightened
   - What domain-specific acceptance criteria are we missing? Search for industry standards, regulatory requirements, and best practices in the specific domain
   - Impact of each criterion value on the whole system quality — search for research papers and engineering reports
   - Cost/budget implications of each criterion — search for pricing, total cost of ownership analyses, and comparable project budgets
   - Timeline implications — search for project timelines, development velocity reports, and comparable implementations
   - What do practitioners in this domain consider the most important criteria? Search forums, conference talks, and experience reports
4. Research restrictions from multiple perspectives:
   - Are the restrictions realistic? Search for comparable projects that operated under similar constraints
   - Should any be tightened or relaxed? Search for what constraints similar projects actually ended up with
   - Are there additional restrictions we should add? Search for regulatory, compliance, and safety requirements in this domain
   - What restrictions do practitioners wish they had defined earlier? Search for post-mortem reports and lessons learned
5. Verify findings with authoritative sources (official docs, papers, benchmarks) — each key finding must have at least 2 independent sources

**Uses Steps 0-3 of the 8-step engine** (question classification, decomposition, source tiering, fact extraction) scoped to AC and restrictions assessment.
|
||||||
|
|
||||||
|
**Save action**: Write `RESEARCH_DIR/00_ac_assessment.md` with format:
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
# Acceptance Criteria Assessment
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
|
||||||
|
| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|
||||||
|
|-----------|-----------|-------------------|---------------------|--------|
|
||||||
|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |
|
||||||
|
|
||||||
|
## Restrictions Assessment
|
||||||
|
|
||||||
|
| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|
||||||
|
|-------------|-----------|-------------------|---------------------|--------|
|
||||||
|
| [name] | [current] | [researched range] | [impact] | Added / Modified / Removed |
|
||||||
|
|
||||||
|
## Key Findings
|
||||||
|
[Summary of critical findings]
|
||||||
|
|
||||||
|
## Sources
|
||||||
|
[Key references used]
|
||||||
|
```
|
||||||
|
|
||||||
|
**BLOCKING**: Present the AC assessment tables to the user. Wait for confirmation or adjustments before proceeding to Phase 2. The user may update `acceptance_criteria.md` or `restrictions.md` based on findings.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 2: Problem Research & Solution Draft
|
||||||
|
|
||||||
|
**Role**: Professional researcher and software architect
|
||||||
|
|
||||||
|
Full 8-step research methodology. Produces the first solution draft.
|
||||||
|
|
||||||
|
**Input**: All files from INPUT_DIR (possibly updated after Phase 1) + Phase 1 artifacts
|
||||||
|
|
||||||
|
**Task** (drives the 8-step engine):
|
||||||
|
1. Research existing/competitor solutions for similar problems — search broadly across industries and adjacent domains, not just the obvious competitors
|
||||||
|
2. Research the problem thoroughly — all possible ways to solve it, split into components; search for how different fields approach analogous problems
|
||||||
|
3. For each component, research all possible solutions and find the most efficient state-of-the-art approaches — use multiple query variants and perspectives from Step 1
|
||||||
|
4. For each promising approach, search for real-world deployment experience: success stories, failure reports, lessons learned, and practitioner opinions
|
||||||
|
5. Search for contrarian viewpoints — who argues against the common approaches and why? What failure modes exist?
|
||||||
|
6. Verify that suggested tools/libraries actually exist and work as described — check official repos, latest releases, and community health (stars, recent commits, open issues)
|
||||||
|
7. Include security considerations in each component analysis
|
||||||
|
8. Provide rough cost estimates for proposed solutions
|
||||||
|
|
||||||
|
Be concise in formulating. The fewer words, the better, but do not miss any important details.
|
||||||
|
|
||||||
|
**Save action**: Write `OUTPUT_DIR/solution_draft##.md` using template: `templates/solution_draft_mode_a.md`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 3: Tech Stack Consolidation (OPTIONAL)
|
||||||
|
|
||||||
|
**Role**: Software architect evaluating technology choices
|
||||||
|
|
||||||
|
Focused synthesis step — no new 8-step cycle. Uses research already gathered in Phase 2 to make concrete technology decisions.
|
||||||
|
|
||||||
|
**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + all files from INPUT_DIR
|
||||||
|
|
||||||
|
**Task**:
|
||||||
|
1. Extract technology options from the solution draft's component comparison tables
|
||||||
|
2. Score each option against: fitness for purpose, maturity, security track record, team expertise, cost, scalability
|
||||||
|
3. Produce a tech stack summary with selection rationale
|
||||||
|
4. Assess risks and learning requirements per technology choice
|
||||||
|
|
||||||
|
**Save action**: Write `OUTPUT_DIR/tech_stack.md` with:
|
||||||
|
- Requirements analysis (functional, non-functional, constraints)
|
||||||
|
- Technology evaluation tables (language, framework, database, infrastructure, key libraries) with scores
|
||||||
|
- Tech stack summary block
|
||||||
|
- Risk assessment and learning requirements tables
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 4: Security Deep Dive (OPTIONAL)
|
||||||
|
|
||||||
|
**Role**: Security architect
|
||||||
|
|
||||||
|
Focused analysis step — deepens the security column from the solution draft into a proper threat model and controls specification.
|
||||||
|
|
||||||
|
**Input**: Latest `solution_draft##.md` from OUTPUT_DIR + `security_approach.md` from INPUT_DIR + problem context
|
||||||
|
|
||||||
|
**Task**:
|
||||||
|
1. Build threat model: asset inventory, threat actors, attack vectors
|
||||||
|
2. Define security requirements and proposed controls per component (with risk level)
|
||||||
|
3. Summarize authentication/authorization, data protection, secure communication, and logging/monitoring approach
|
||||||
|
|
||||||
|
**Save action**: Write `OUTPUT_DIR/security_analysis.md` with:
|
||||||
|
- Threat model (assets, actors, vectors)
|
||||||
|
- Per-component security requirements and controls table
|
||||||
|
- Security controls summary
|
||||||
@@ -0,0 +1,27 @@
## Mode B: Solution Assessment

Triggered when `solution_draft*.md` files exist in OUTPUT_DIR.

**Role**: Professional software architect

Full 8-step research methodology applied to assessing and improving an existing solution draft.

**Input**: All files from INPUT_DIR + the latest (highest-numbered) `solution_draft##.md` from OUTPUT_DIR

**Task** (drives the 8-step engine):
1. Read the existing solution draft thoroughly
2. Research the internet extensively — for each component/decision in the draft, search for:
   - Known problems and limitations of the chosen approach
   - What practitioners say about using it in production
   - Better alternatives that may have emerged recently
   - Common failure modes and edge cases
   - How competitors/similar projects solve the same problem differently
3. Search specifically for contrarian views: "why not [chosen approach]", "[chosen approach] criticism", "[chosen approach] failure"
4. Identify security weak points and vulnerabilities — search for CVEs, security advisories, and known attack vectors for each technology in the draft
5. Identify performance bottlenecks — search for benchmarks, load test results, and scalability reports
6. For each identified weak point, search for multiple solution approaches and compare them
7. Based on findings, form a new solution draft in the same format

**Save action**: Write `OUTPUT_DIR/solution_draft##.md` (incremented) using template: `templates/solution_draft_mode_b.md`

**Optional follow-up**: After Mode B completes, the user can request Phase 3 (Tech Stack Consolidation) or Phase 4 (Security Deep Dive) using the revised draft. These phases work identically to their Mode A descriptions in `steps/01_mode-a-initial-research.md`.
@@ -0,0 +1,227 @@
## Research Engine — Investigation Phase (Steps 0–3.5)

### Step 0: Question Type Classification

First, classify the research question type and select the corresponding strategy:

| Question Type | Core Task | Focus Dimensions |
|---------------|-----------|------------------|
| **Concept Comparison** | Build comparison framework | Mechanism differences, applicability boundaries |
| **Decision Support** | Weigh trade-offs | Cost, risk, benefit |
| **Trend Analysis** | Map evolution trajectory | History, driving factors, predictions |
| **Problem Diagnosis** | Root cause analysis | Symptoms, causes, evidence chain |
| **Knowledge Organization** | Systematic structuring | Definitions, classifications, relationships |

**Mode-specific classification**:

| Mode / Phase | Typical Question Type |
|--------------|----------------------|
| Mode A Phase 1 | Knowledge Organization + Decision Support |
| Mode A Phase 2 | Decision Support |
| Mode B | Problem Diagnosis + Decision Support |

### Step 0.5: Novelty Sensitivity Assessment (BLOCKING)

Before starting research, assess the novelty sensitivity of the question (Critical/High/Medium/Low). This determines source time windows and filtering strategy.

**For full classification table, critical-domain rules, trigger words, and assessment template**: Read `references/novelty-sensitivity.md`

Key principle: Critical-sensitivity topics (AI/LLMs, blockchain) require sources within 6 months, mandatory version annotations, cross-validation from 2+ sources, and direct verification of official download pages.

**Save action**: Append the timeliness assessment to the end of `00_question_decomposition.md`

---

### Step 1: Question Decomposition & Boundary Definition

**Mode-specific sub-questions**:

**Mode A Phase 2** (Initial Research — Problem & Solution):
- "What existing/competitor solutions address this problem?"
- "What are the component parts of this problem?"
- "For each component, what are the state-of-the-art solutions?"
- "What are the security considerations per component?"
- "What are the cost implications of each approach?"

**Mode B** (Solution Assessment):
- "What are the weak points and potential problems in the existing draft?"
- "What are the security vulnerabilities in the proposed architecture?"
- "Where are the performance bottlenecks?"
- "What solutions exist for each identified issue?"

**General sub-question patterns** (use when applicable):
- **Sub-question A**: "What is X and how does it work?" (Definition & mechanism)
- **Sub-question B**: "What are the dimensions of relationship/difference between X and Y?" (Comparative analysis)
- **Sub-question C**: "In what scenarios is X applicable/inapplicable?" (Boundary conditions)
- **Sub-question D**: "What are X's development trends/best practices?" (Extended analysis)

#### Perspective Rotation (MANDATORY)

For each research problem, examine it from **at least 3 different perspectives**. Each perspective generates its own sub-questions and search queries.

| Perspective | What it asks | Example queries |
|-------------|-------------|-----------------|
| **End-user / Consumer** | What problems do real users encounter? What do they wish were different? | "X problems", "X frustrations reddit", "X user complaints" |
| **Implementer / Engineer** | What are the technical challenges, gotchas, hidden complexities? | "X implementation challenges", "X pitfalls", "X lessons learned" |
| **Business / Decision-maker** | What are the costs, ROI, strategic implications? | "X total cost of ownership", "X ROI case study", "X vs Y business comparison" |
| **Contrarian / Devil's advocate** | What could go wrong? Why might this fail? What are critics saying? | "X criticism", "why not X", "X failures", "X disadvantages real world" |
| **Domain expert / Academic** | What does peer-reviewed research say? What are theoretical limits? | "X research paper", "X systematic review", "X benchmarks academic" |
| **Practitioner / Field** | What do people who actually use this daily say? What works in practice vs theory? | "X in production", "X experience report", "X after 1 year" |

Select at least 3 perspectives relevant to the problem. Document the chosen perspectives in `00_question_decomposition.md`.

#### Question Explosion (MANDATORY)

For **each sub-question**, generate **at least 3-5 search query variants** before searching. This ensures broad coverage and avoids missing relevant information due to terminology differences.

**Query variant strategies**:
- **Specificity ladder**: broad ("indoor navigation systems") → narrow ("UWB-based indoor drone navigation accuracy")
- **Negation/failure**: "X limitations", "X failure modes", "when X doesn't work"
- **Comparison framing**: "X vs Y for Z", "X alternative for Z", "X or Y which is better for Z"
- **Practitioner voice**: "X in production experience", "X real-world results", "X lessons learned"
- **Temporal**: "X 2025", "X latest developments", "X roadmap"
- **Geographic/domain**: "X in Europe", "X for defense applications", "X in agriculture"

Record all planned queries in `00_question_decomposition.md` alongside each sub-question.

**Research Subject Boundary Definition (BLOCKING - must be explicit)**:

When decomposing questions, you must explicitly define the **boundaries of the research subject**:

| Dimension | Boundary to define | Example |
|-----------|--------------------|---------|
| **Population** | Which group is being studied? | University students vs K-12 vs vocational students vs all students |
| **Geography** | Which region is being studied? | Chinese universities vs US universities vs global |
| **Timeframe** | Which period is being studied? | Post-2020 vs full historical picture |
| **Level** | Which level is being studied? | Undergraduate vs graduate vs vocational |

**Common mistake**: The user asks about "university classroom issues" but sources include policies targeting "K-12 students" — mismatched target populations will invalidate the entire research.

**Save action**:
1. Read all files from INPUT_DIR to ground the research in the project context
2. Create working directory `RESEARCH_DIR/`
3. Write `00_question_decomposition.md`, including:
   - Original question
   - Active mode (A Phase 2 or B) and rationale
   - Summary of relevant problem context from INPUT_DIR
   - Classified question type and rationale
   - **Research subject boundary definition** (population, geography, timeframe, level)
   - List of decomposed sub-questions
   - **Chosen perspectives** (at least 3 from the Perspective Rotation table) with rationale
   - **Search query variants** for each sub-question (at least 3-5 per sub-question)
4. Use TodoWrite to track progress

---

### Step 2: Source Tiering & Exhaustive Web Investigation

Tier sources by authority and **prioritize primary sources** (L1 > L2 > L3 > L4). Conclusions must be traceable to L1/L2; L3/L4 serve as supplementary and validation sources.

**For full tier definitions, search strategies, community mining steps, and source registry templates**: Read `references/source-tiering.md`

**Tool Usage**:
- Use `WebSearch` for broad searches; `WebFetch` to read specific pages
- Use the `context7` MCP server (`resolve-library-id` then `get-library-docs`) for up-to-date library/framework documentation
- Always cross-verify training data claims against live sources for facts that may have changed (versions, APIs, deprecations, security advisories)
- When citing web sources, include the URL and date accessed

#### Exhaustive Search Requirements (MANDATORY)

Do not stop at the first few results. The goal is to build a comprehensive evidence base.

**Minimum search effort per sub-question**:
- Execute **all** query variants generated in Step 1's Question Explosion (at least 3-5 per sub-question)
- Consult at least **2 different source tiers** per sub-question (e.g., L1 official docs + L4 community discussion)
- If initial searches yield fewer than 3 relevant sources for a sub-question, **broaden the search** with alternative terms, related domains, or analogous problems

**Search broadening strategies** (use when results are thin):
- Try adjacent fields: if researching "drone indoor navigation", also search "robot indoor navigation", "warehouse AGV navigation"
- Try different communities: academic papers, industry whitepapers, military/defense publications, hobbyist forums
- Try different geographies: search in English + search for European/Asian approaches if relevant
- Try historical evolution: "history of X", "evolution of X approaches", "X state of the art 2024 2025"
- Try failure analysis: "X project failure", "X post-mortem", "X recall", "X incident report"

**Search saturation rule**: Continue searching until new queries stop producing substantially new information. If the last 3 searches only repeat previously found facts, the sub-question is saturated.

**Save action**:
For each source consulted, **immediately** append to `01_source_registry.md` using the entry template from `references/source-tiering.md`.

---

### Step 3: Fact Extraction & Evidence Cards

Transform sources into **verifiable fact cards**:

```markdown
## Fact Cards

### Fact 1
- **Statement**: [specific fact description]
- **Source**: [link/document section]
- **Confidence**: High/Medium/Low

### Fact 2
...
```

**Key discipline**:
- Pin down facts first, then reason
- Distinguish "what officials said" from "what I infer"
- When conflicting information is found, annotate and preserve both sides
- Annotate the confidence level:
  - ✅ High: Explicitly stated in official documentation
  - ⚠️ Medium: Mentioned in an official blog but not formally documented
  - ❓ Low: Inference or from unofficial sources

**Save action**:
For each extracted fact, **immediately** append to `02_fact_cards.md`:

```markdown
## Fact #[number]
- **Statement**: [specific fact description]
- **Source**: [Source #number] [link]
- **Phase**: [Phase 1 / Phase 2 / Assessment]
- **Target Audience**: [which group this fact applies to, inherited from the source or further refined]
- **Confidence**: ✅/⚠️/❓
- **Related Dimension**: [corresponding comparison dimension]
```

**Target audience in fact statements**:
- If a fact comes from a "partially overlapping" or "reference only" source, the statement **must explicitly annotate the applicable scope**
- Wrong: "The Ministry of Education banned phones in classrooms" (doesn't specify who)
- Correct: "The Ministry of Education banned K-12 students from bringing phones into classrooms (does not apply to university students)"

---

### Step 3.5: Iterative Deepening — Follow-Up Investigation

After initial fact extraction, review what you have found and identify **knowledge gaps and new questions** that emerged from the initial research. This step ensures the research doesn't stop at surface-level findings.

**Process**:

1. **Gap analysis**: Review fact cards and identify:
   - Sub-questions with fewer than 3 high-confidence facts → need more searching
   - Contradictions between sources → need tie-breaking evidence
   - Perspectives (from Step 1) that have no or weak coverage → need targeted search
   - Claims that rely only on L3/L4 sources → need L1/L2 verification

2. **Follow-up question generation**: Based on initial findings, generate new questions:
   - "Source X claims [fact] — is this consistent with other evidence?"
   - "If [approach A] has [limitation], how do practitioners work around it?"
   - "What are the second-order effects of [finding]?"
   - "Who disagrees with [common finding] and why?"
   - "What happened when [solution] was deployed at scale?"

3. **Targeted deep-dive searches**: Execute follow-up searches focusing on:
   - Specific claims that need verification
   - Alternative viewpoints not yet represented
   - Real-world case studies and experience reports
   - Failure cases and edge conditions
   - Recent developments that may change the picture

4. **Update artifacts**: Append new sources to `01_source_registry.md` and new facts to `02_fact_cards.md`

**Exit criteria**: Proceed to Step 4 when:
- Every sub-question has at least 3 facts, with at least one from L1/L2
- At least 3 perspectives from Step 1 have supporting evidence
- No unresolved contradictions remain (or they are explicitly documented as open questions)
- Follow-up searches are no longer producing new substantive information
@@ -0,0 +1,146 @@
## Research Engine — Analysis Phase (Steps 4–8)

### Step 4: Build Comparison/Analysis Framework

Based on the question type, select fixed analysis dimensions. **For dimension lists** (General, Concept Comparison, Decision Support): Read `references/comparison-frameworks.md`

**Save action**:
Write to `03_comparison_framework.md`:

```markdown
# Comparison Framework

## Selected Framework Type
[Concept Comparison / Decision Support / ...]

## Selected Dimensions
1. [Dimension 1]
2. [Dimension 2]
...

## Initial Population
| Dimension | X | Y | Factual Basis |
|-----------|---|---|---------------|
| [Dimension 1] | [description] | [description] | Fact #1, #3 |
| ... | | | |
```

---

### Step 5: Reference Point Baseline Alignment

Ensure all compared parties have clear, consistent definitions:

**Checklist**:
- [ ] Is the reference point's definition stable/widely accepted?
- [ ] Does it need verification, or can domain common knowledge be used?
- [ ] Does the reader's understanding of the reference point match mine?
- [ ] Are there ambiguities that need to be clarified first?

---

### Step 6: Fact-to-Conclusion Reasoning Chain

Explicitly write out the "fact → comparison → conclusion" reasoning process:

```markdown
## Reasoning Process

### Regarding [Dimension Name]

1. **Fact confirmation**: According to [source], X's mechanism is...
2. **Compare with reference**: While Y's mechanism is...
3. **Conclusion**: Therefore, the difference between X and Y on this dimension is...
```

**Key discipline**:
- Conclusions come from mechanism comparison, not "gut feelings"
- Every conclusion must be traceable to specific facts
- Uncertain conclusions must be annotated

**Save action**:
Write to `04_reasoning_chain.md`:

```markdown
# Reasoning Chain

## Dimension 1: [Dimension Name]

### Fact Confirmation
According to [Fact #X], X's mechanism is...

### Reference Comparison
While Y's mechanism is... (Source: [Fact #Y])

### Conclusion
Therefore, the difference between X and Y on this dimension is...

### Confidence
✅/⚠️/❓ + rationale

---

## Dimension 2: [Dimension Name]
...
```

---

### Step 7: Use-Case Validation (Sanity Check)

Validate conclusions against a typical scenario:

**Validation questions**:
- Based on my conclusions, how should this scenario be handled?
- Is that actually the case?
- Are there counterexamples that need to be addressed?

**Review checklist**:
- [ ] Are draft conclusions consistent with Step 3 fact cards?
- [ ] Are there any important dimensions missed?
- [ ] Is there any over-extrapolation?
- [ ] Are conclusions actionable/verifiable?

**Save action**:
Write to `05_validation_log.md`:

```markdown
# Validation Log

## Validation Scenario
[Scenario description]

## Expected Based on Conclusions
If using X: [expected behavior]
If using Y: [expected behavior]

## Actual Validation Results
[actual situation]

## Counterexamples
[yes/no, describe if yes]

## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [ ] Issue found: [if any]

## Conclusions Requiring Revision
[if any]
```

---

### Step 8: Deliverable Formatting

Make the output **readable, traceable, and actionable**.

**Save action**:
Integrate all intermediate artifacts. Write to `OUTPUT_DIR/solution_draft##.md` using the appropriate output template based on active mode:
- Mode A: `templates/solution_draft_mode_a.md`
- Mode B: `templates/solution_draft_mode_b.md`

Sources to integrate:
- Extract background from `00_question_decomposition.md`
- Reference key facts from `02_fact_cards.md`
- Organize conclusions from `04_reasoning_chain.md`
- Generate references from `01_source_registry.md`
- Supplement with use cases from `05_validation_log.md`
- For Mode A: include AC assessment from `00_ac_assessment.md`
@@ -0,0 +1,49 @@
# Acceptance Criteria

## Detection Accuracy

- Detections with confidence below `probability_threshold` (default: 0.25) are filtered out.
- Overlapping detections with containment ratio > `tracking_intersection_threshold` (default: 0.6) are deduplicated, keeping the higher-confidence detection.
- Tile duplicate detections are identified when all bounding box coordinates differ by less than 0.01 (TILE_DUPLICATE_CONFIDENCE_THRESHOLD).
- Physical size filtering: detections exceeding `max_object_size_meters` for their class (defined in classes.json, range 2–20 meters) are removed.
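The confidence and overlap rules above can be sketched as a small post-processing pass. This is an illustrative sketch, not the engine's actual code: the detection dict shape and the containment-ratio definition (intersection area over the smaller box's area) are assumptions.

```python
def containment_ratio(a, b):
    """Intersection area divided by the smaller box's area (assumed definition).

    Boxes are (x1, y1, x2, y2) tuples.
    """
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return inter / smaller if smaller > 0 else 0.0


def filter_detections(dets, probability_threshold=0.25,
                      tracking_intersection_threshold=0.6):
    """dets: list of dicts with 'box' (x1, y1, x2, y2) and 'conf'."""
    # 1. Drop low-confidence detections.
    dets = [d for d in dets if d["conf"] >= probability_threshold]
    # 2. Deduplicate overlaps, keeping the higher-confidence detection.
    dets = sorted(dets, key=lambda d: d["conf"], reverse=True)
    kept = []
    for d in dets:
        if all(containment_ratio(d["box"], k["box"]) <= tracking_intersection_threshold
               for k in kept):
            kept.append(d)
    return kept
```

The tile-duplicate and physical-size checks would slot in as additional passes over the same list.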
## Video Processing

- Frame sampling: every Nth frame is processed, controlled by `frame_period_recognition` (default: 4).
- Minimum annotation interval: `frame_recognition_seconds` (default: 2 seconds) between reported annotations.
- Tracking: a new annotation is accepted if any detection moved beyond the `tracking_distance_confidence` threshold or its confidence increased beyond `tracking_probability_increase`.
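A minimal sketch of how the two sampling knobs interact: every Nth frame is inferred, but annotations are reported no more often than the minimum interval. The function name and the `fps`/`total_frames` parameters are illustrative; only the two config keys come from the table of defaults.

```python
def frames_to_report(total_frames, fps,
                     frame_period_recognition=4,
                     frame_recognition_seconds=2):
    """Return frame indices that are both sampled and spaced far enough apart."""
    min_gap = fps * frame_recognition_seconds  # minimum gap in frames
    reported = []
    last = None
    for idx in range(0, total_frames, frame_period_recognition):
        # Report only if enough time has passed since the last annotation.
        if last is None or idx - last >= min_gap:
            reported.append(idx)
            last = idx
    return reported
```

For a 10 fps clip of 100 frames this yields one annotation every 2 seconds, even though every 4th frame is inferred.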
## Image Processing

- Images ≤ 1.5× model dimensions (1280×1280): processed as a single frame.
- Larger images: tiled based on ground sampling distance. Tile physical size: 25 meters (METERS_IN_TILE). Tile overlap: `big_image_tile_overlap_percent` (default: 20%).
- GSD calculation: `sensor_width * altitude / (focal_length * image_width)`.
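The GSD formula and the 25-meter tile size combine as follows. A quick sketch using the config defaults; the 8000-pixel image width is an assumed example value, and rounding the tile size down to whole pixels is an assumption.

```python
def gsd_meters_per_pixel(sensor_width_mm=23.5, altitude_m=400.0,
                         focal_length_mm=24.0, image_width_px=8000):
    """Ground sampling distance from the formula above (meters per pixel)."""
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)


METERS_IN_TILE = 25.0


def tile_size_px(gsd):
    """Pixel size of a tile covering METERS_IN_TILE meters of ground."""
    return int(METERS_IN_TILE / gsd)
```

At the default camera parameters and an 8000 px wide image, GSD is about 0.049 m/px, so a 25 m tile is roughly 510 px wide, well under the 1920 px single-frame cutoff.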
## API

- `GET /health` always returns `status: "healthy"` (even if the engine is unavailable — aiAvailability indicates the actual state).
- `POST /detect` returns detection results synchronously. Errors: 400 (empty/invalid image), 422 (runtime error), 503 (engine unavailable).
- `POST /detect/{media_id}` returns immediately with `{"status": "started"}`. Duplicate media_id is rejected with 409.
- `GET /detect/stream` delivers SSE events with `mediaStatus` values: AIProcessing, AIProcessed, Error.
- SSE queue maximum depth: 100 events per client. Overflow is silently dropped.
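The bounded per-client SSE queue can be sketched as below. One assumption to flag: the text does not say which end overflow drops, so this sketch drops the incoming (newest) event and keeps the backlog.

```python
from collections import deque


class SSEClientQueue:
    """Per-client event buffer: beyond MAX_DEPTH, new events are silently dropped."""

    MAX_DEPTH = 100

    def __init__(self):
        self._events = deque()

    def push(self, event):
        """Enqueue an event; return False if it was dropped due to overflow."""
        if len(self._events) >= self.MAX_DEPTH:
            return False  # dropped silently, caller is not notified
        self._events.append(event)
        return True

    def pop(self):
        """Dequeue the oldest pending event, or None if the queue is empty."""
        return self._events.popleft() if self._events else None
```

A slow SSE consumer therefore sees at most 100 buffered `mediaStatus` events and loses anything published while the buffer is full.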
## Engine Lifecycle

- Engine initialization is lazy (on the first detection request, not at startup).
- Status transitions: NONE → DOWNLOADING → (CONVERTING → UPLOADING →) ENABLED | WARNING | ERROR.
- GPU check: NVIDIA GPU with compute capability ≥ 6.1.
- TensorRT conversion uses FP16 precision when the GPU supports fast FP16.
- Background conversion does not block API responsiveness.
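One way to encode the transition diagram above is an explicit table. The exact edge set is an assumption where the arrow notation is ambiguous: the parenthesized CONVERTING → UPLOADING leg is treated as optional (skipped when a cached engine exists), and WARNING/ERROR are assumed reachable from any active state.

```python
from enum import Enum, auto


class EngineStatus(Enum):
    NONE = auto()
    DOWNLOADING = auto()
    CONVERTING = auto()
    UPLOADING = auto()
    ENABLED = auto()
    WARNING = auto()
    ERROR = auto()


# Assumed transition table derived from the arrow notation above.
ALLOWED = {
    EngineStatus.NONE: {EngineStatus.DOWNLOADING},
    EngineStatus.DOWNLOADING: {EngineStatus.CONVERTING, EngineStatus.ENABLED,
                               EngineStatus.WARNING, EngineStatus.ERROR},
    EngineStatus.CONVERTING: {EngineStatus.UPLOADING, EngineStatus.ERROR},
    EngineStatus.UPLOADING: {EngineStatus.ENABLED, EngineStatus.WARNING,
                             EngineStatus.ERROR},
}


def can_transition(src, dst):
    """True if the lifecycle allows moving from src to dst."""
    return dst in ALLOWED.get(src, set())
```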
## Logging

- Log files: `Logs/log_inference_YYYYMMDD.txt`.
- Rotation: daily.
- Retention: 30 days.
- Console: INFO/DEBUG/SUCCESS to stdout, WARNING+ to stderr.

## Object Classes

- 19 base detection classes defined in `classes.json`.
- 3 weather modes (Norm, Wint, Night) — up to 57 class variants in total.
- Each class has: Id, Name, Color, MaxSizeM (max physical size in meters).
@@ -0,0 +1,76 @@

# Input Data Parameters

## Media Input

### Single Image Detection (POST /detect)

| Parameter | Type | Source | Description |
|-----------|------|--------|-------------|
| file | bytes (multipart) | Client upload | Image file (JPEG, PNG, etc. — any format OpenCV can decode) |
| config | JSON string (optional) | Query/form field | AIConfigDto overrides |

### Media Detection (POST /detect/{media_id})

| Parameter | Type | Source | Description |
|-----------|------|--------|-------------|
| media_id | string | URL path | Identifier for media in the Loader service |
| AIConfigDto body | JSON (optional) | Request body | Configuration overrides |
| Authorization header | Bearer token | HTTP header | JWT for the Annotations service |
| x-refresh-token header | string | HTTP header | Refresh token for JWT renewal |

Media files (images and videos) are resolved by the Inference pipeline via paths in the config. The Loader service provides model files, not media files directly.

## Configuration Input (AIConfigDto / AIRecognitionConfig)

| Field | Type | Default | Range/Meaning |
|-------|------|---------|---------------|
| frame_period_recognition | int | 4 | Process every Nth video frame |
| frame_recognition_seconds | int | 2 | Minimum seconds between video annotations |
| probability_threshold | float | 0.25 | Minimum detection confidence (0..1) |
| tracking_distance_confidence | float | 0.0 | Movement threshold for tracking (model-width fraction) |
| tracking_probability_increase | float | 0.0 | Confidence increase threshold for tracking |
| tracking_intersection_threshold | float | 0.6 | Overlap ratio for NMS deduplication |
| model_batch_size | int | 1 | Inference batch size |
| big_image_tile_overlap_percent | int | 20 | Tile overlap for large images (0–100%) |
| altitude | float | 400 | Camera altitude in meters |
| focal_length | float | 24 | Camera focal length in mm |
| sensor_width | float | 23.5 | Camera sensor width in mm |
| paths | list[str] | [] | Media file paths to process |
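A client overrides only the fields it needs; everything else falls back to the defaults in the table above. A minimal sketch of the `config` form field for POST /detect (the field values here are illustrative, not recommendations):

```python
import json

# Illustrative AIConfigDto overrides for the `config` form field.
overrides = {
    "probability_threshold": 0.4,        # stricter than the 0.25 default
    "altitude": 120.0,                   # low-altitude drone flight
    "focal_length": 35.0,
    "big_image_tile_overlap_percent": 30,
}
payload = json.dumps(overrides)  # sent as a JSON string
```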

## Model Files

| File | Format | Source | Description |
|------|--------|--------|-------------|
| azaion.onnx | ONNX | Loader service | Base detection model |
| azaion.cc_{M}.{m}_sm_{N}.engine | TensorRT | Loader service (cached) | GPU-specific compiled engine |
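The engine filename pattern can be expressed as a small helper; the function name is hypothetical, only the pattern itself comes from the table above:

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the cached TensorRT engine name following the
    azaion.cc_{M}.{m}_sm_{N}.engine pattern (M.m = compute capability,
    N = streaming multiprocessor count)."""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```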

## Static Data

### classes.json

Array of 19 objects, each with:

| Field | Type | Example | Description |
|-------|------|---------|-------------|
| Id | int | 0 | Class identifier |
| Name | string | "ArmorVehicle" | English class name |
| ShortName | string | "Броня" | Ukrainian short name |
| Color | string | "#ff0000" | Hex color for visualization |
| MaxSizeM | int | 8 | Maximum physical object size in meters |
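One entry shaped like this schema, with a minimal validity check (values taken from the Example column; the validator itself is an illustrative sketch, not part of the service):

```python
entry = {
    "Id": 0,
    "Name": "ArmorVehicle",
    "ShortName": "Броня",
    "Color": "#ff0000",
    "MaxSizeM": 8,
}

# Required field names and types per the classes.json schema above.
REQUIRED = {"Id": int, "Name": str, "ShortName": str, "Color": str, "MaxSizeM": int}

def is_valid_class(obj: dict) -> bool:
    return all(isinstance(obj.get(k), t) for k, t in REQUIRED.items())
```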

## Data Volumes

- Single image: up to tens of megapixels (aerial imagery). Large images are tiled.
- Video: processed frame-by-frame with a configurable sampling rate.
- Model file: ONNX model size depends on architecture (typically 10–100 MB). TensorRT engines are GPU-specific compiled versions.
- Detection output: up to 300 detections per frame (model limit).
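Video sampling combines two config fields from the table above, `frame_period_recognition` and `frame_recognition_seconds`. A sketch of how they might interact (the service's exact heuristics are not shown in these docs, so this is an assumption):

```python
def sampled_frames(total_frames: int, fps: float,
                   frame_period: int = 4, min_gap_seconds: int = 2):
    """Yield frame indices that pass both sampling rules:
    every Nth frame, and at least `min_gap_seconds` apart."""
    last_t = None
    for i in range(0, total_frames, frame_period):
        t = i / fps
        if last_t is None or t - last_t >= min_gap_seconds:
            last_t = t
            yield i
```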

## Data Formats

| Data | Format | Serialization |
|------|--------|---------------|
| API requests | HTTP multipart / JSON | Pydantic validation |
| API responses | JSON | Pydantic model_dump |
| SSE events | text/event-stream | JSON per event |
| Internal config | Python dict | AIRecognitionConfig.from_dict() |
| Legacy (unused) | msgpack | serialize() / from_msgpack() |
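Since SSE events carry one JSON document per event, a consumer only needs to decode `data:` lines from the `text/event-stream` body. A minimal parser sketch (the event field names are illustrative; the actual payload schema is the DetectionEvent DTO in `main.py`):

```python
import json

def parse_sse_data(line: str):
    """Decode a single `data: {...}` line from a text/event-stream body.
    Returns the JSON payload, or None for comments and other field lines."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())
```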
@@ -0,0 +1,28 @@

# Problem Statement

## What is this system?

Azaion.Detections is an AI-powered object detection microservice designed for aerial reconnaissance. It processes drone and satellite imagery (both still images and video) to automatically identify and locate military and infrastructure objects — including armored vehicles, trucks, artillery, trenches, personnel, camouflage nets, buildings, and more.

## What problem does it solve?

Manual analysis of aerial imagery is slow, error-prone, and does not scale. When monitoring large areas from drones or satellites, a human analyst cannot review every frame in real time. This service automates the detection process: given an image or video feed, it returns structured bounding boxes with object classifications and confidence scores, enabling rapid situational awareness.

## Who are the users?

- **Client applications** that submit media for analysis (via HTTP API)
- **Downstream services** (Annotations service) that store and present detection results
- **Real-time consumers** that subscribe to Server-Sent Events for live detection updates during video processing

## How does it work at a high level?

1. A client sends an image or triggers detection on media files available in the Loader service
2. The service preprocesses frames — resizing, normalizing, and, for large aerial images, splitting into GSD-based tiles to preserve small-object detail
3. Frames are batched and run through a YOLO-based object detection model via TensorRT (GPU) or ONNX Runtime (CPU fallback)
4. Raw model output is postprocessed: coordinate normalization, confidence thresholding, overlapping-detection removal, physical size filtering, and tile deduplication
5. Results are returned as structured DTOs (bounding box center, dimensions, class label, confidence)
6. For video/batch processing, results are streamed in real time via SSE and optionally posted to an external Annotations service
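Steps 4–5 can be sketched as a single function: filter raw model rows by confidence and convert corner coordinates to the center/dimensions shape the DTOs use. The dict keys are illustrative, not the service's actual field names:

```python
def postprocess(raw, threshold=0.25):
    """Filter raw model rows (x1, y1, x2, y2, confidence, class_id)
    by confidence and convert to center/width/height records."""
    out = []
    for x1, y1, x2, y2, conf, cls in raw:
        if conf < threshold:
            continue
        out.append({
            "cx": (x1 + x2) / 2,
            "cy": (y1 + y2) / 2,
            "w": x2 - x1,
            "h": y2 - y1,
            "confidence": conf,
            "class_id": int(cls),
        })
    return out
```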

## Domain context

The system operates in a military/defense aerial reconnaissance context. The 19 object classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier, Ammo, Protect.Struct) reflect objects of interest in ground surveillance. Three weather modes (Normal, Winter, Night) provide environment-specific detection variants. Physical size filtering using ground sampling distance ensures detections are physically plausible given camera altitude and optics.
@@ -0,0 +1,33 @@

# Restrictions

## Hardware

- **GPU**: NVIDIA GPU with compute capability ≥ 6.1 required for TensorRT acceleration. Without a compatible GPU, the system falls back to ONNX Runtime (CPU or CUDA provider).
- **GPU memory**: TensorRT model conversion uses 90% of available GPU memory as workspace. Minimum ~2 GB GPU memory assumed (default fallback value).
- **Concurrency**: ThreadPoolExecutor limited to 2 workers — maximum 2 concurrent inference operations.

## Software

- **Python 3** with Cython 3.1.3 compilation required (setup.py build step).
- **ONNX model**: `azaion.onnx` must be available via the Loader service.
- **TensorRT engine files** are GPU-architecture-specific (the filename encodes compute capability and SM count) — not portable across different GPU models.
- **OpenCV 4.10.0** for image/video decoding and preprocessing.
- **classes.json** must exist in the working directory at startup — no fallback if missing.
- **Model input**: fixed 1280×1280 default for dynamic dimensions (hardcoded in the TensorRT engine).
- **Model output**: maximum 300 detections per frame, 6 values per detection (x1, y1, x2, y2, confidence, class_id).
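The 1280×1280 input size and the 20% default tile overlap together determine how a large image is cut. A sketch of computing tile origins under those constraints (the edge-handling details are assumptions; the actual tiler lives in the Cython pipeline):

```python
def tile_origins(image_w: int, image_h: int,
                 tile: int = 1280, overlap_percent: int = 20):
    """Top-left origins of overlapping tiles covering the image.
    Stride = tile size minus the configured overlap."""
    stride = max(1, tile - tile * overlap_percent // 100)
    xs = list(range(0, max(image_w - tile, 0) + 1, stride))
    ys = list(range(0, max(image_h - tile, 0) + 1, stride))
    # Ensure the right and bottom edges are covered.
    if xs[-1] + tile < image_w:
        xs.append(image_w - tile)
    if ys[-1] + tile < image_h:
        ys.append(image_h - tile)
    return [(x, y) for y in ys for x in xs]
```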

## Environment

- **LOADER_URL** environment variable (default: `http://loader:8080`) — Loader service must be reachable for model download/upload.
- **ANNOTATIONS_URL** environment variable (default: `http://annotations:8080`) — Annotations service must be reachable for result posting and token refresh.
- **Logging directory**: `Logs/` must be writable for loguru file output.
- **No local model storage**: models are downloaded on demand from the Loader service; converted TensorRT engines are uploaded back for caching.
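The two env vars resolve with the documented defaults; a minimal sketch (the helper name is hypothetical):

```python
import os

def service_urls(env=os.environ):
    """Resolve external service base URLs with the documented defaults."""
    return {
        "loader": env.get("LOADER_URL", "http://loader:8080"),
        "annotations": env.get("ANNOTATIONS_URL", "http://annotations:8080"),
    }
```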

## Operational

- **No persistent storage**: the service is stateless regarding detection results — all results are returned via HTTP/SSE or forwarded to the Annotations service.
- **No TLS at application level**: encryption in transit is expected to be handled by infrastructure (reverse proxy / service mesh).
- **No CORS configuration**: cross-origin requests are not explicitly handled.
- **No rate limiting**: the service has no built-in throttling.
- **No graceful shutdown**: in-progress detections are not drained on shutdown; background TensorRT conversion runs in a daemon thread.
- **Single-instance state**: the `_active_detections` dict and `_event_queues` list are in-memory — not shared across instances or persistent across restarts.
@@ -0,0 +1,70 @@

# Azaion.Detections — Solution

## 1. Product Solution Description

Azaion.Detections is a microservice that performs automated object detection on aerial imagery and video. It accepts media via HTTP API, runs inference through ONNX Runtime or TensorRT engines, and returns structured detection results (bounding boxes, class labels, confidence scores). Results are delivered synchronously for single images, or streamed via SSE for batch/video media processing.

```mermaid
graph LR
    Client["Client App"] -->|HTTP| API["FastAPI API"]
    API -->|delegates| INF["Inference Pipeline"]
    INF -->|runs| ENG["ONNX / TensorRT Engine"]
    INF -->|downloads models| LDR["Loader Service"]
    API -->|posts results| ANN["Annotations Service"]
    API -->|streams| SSE["SSE Clients"]
```

## 2. Architecture

### Component Architecture

| Component | Modules | Responsibility |
|-----------|---------|----------------|
| Domain | constants_inf, ai_config, ai_availability_status, annotation | Shared data models, constants, logging, class registry |
| Inference Engines | inference_engine, onnx_engine, tensorrt_engine | Pluggable ML backends (Strategy pattern) |
| Inference Pipeline | inference, loader_http_client | Engine lifecycle, preprocessing, postprocessing, media processing |
| API | main | HTTP endpoints, SSE streaming, auth token forwarding |

### Solution Assessment

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| Cython inference pipeline | Python 3, Cython 3.1.3, OpenCV 4.10 | Near-C performance for tight detection loops while retaining the Python ecosystem | Build complexity, limited IDE/debug support | Compilation step via setup.py | N/A | Low (open-source) | High — critical for postprocessing throughput |
| Dual engine strategy (TensorRT + ONNX) | TensorRT 10.11, ONNX Runtime 1.22 | Maximum GPU speed with CPU fallback; auto-conversion and caching | Two code paths; GPU-specific engine files not portable | NVIDIA GPU (CC ≥ 6.1) for TensorRT | N/A | TensorRT free for NVIDIA GPUs | High — balances performance and portability |
| FastAPI HTTP service | FastAPI, Uvicorn, Pydantic | Async SSE, auto-generated docs, fast development | Sync inference offloaded to ThreadPoolExecutor (2 workers) | Python 3.8+ | Bearer token pass-through | Low (open-source) | High — fits async streaming + sync inference pattern |
| GSD-based image tiling | OpenCV, NumPy | Preserves small-object detail in large aerial images | Complex tile dedup logic; overlap increases compute | Camera metadata (altitude, focal length, sensor width) | N/A | Compute cost scales with image size | High — essential for the aerial imagery use case |
| Lazy engine initialization | pynvml, threading | Fast API startup; background model conversion | First request has high latency; engine may be unavailable | None | N/A | N/A | High — prevents blocking startup on slow model download/conversion |
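The lazy-initialization pattern in the last row can be illustrated as follows. This is a generic sketch of the pattern (first use starts a daemon thread that does the slow work, so the API stays responsive), not the service's actual code:

```python
import threading

class LazyEngine:
    """Build the engine at most once, in a background daemon thread."""

    def __init__(self, build):
        self._build = build          # slow callable (download + convert)
        self._engine = None
        self._lock = threading.Lock()
        self._thread = None

    def ensure_started(self):
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._init, daemon=True)
                self._thread.start()

    def _init(self):
        self._engine = self._build()

    def get(self, timeout=None):
        """Wait up to `timeout` seconds; returns None if still initializing."""
        self.ensure_started()
        self._thread.join(timeout)
        return self._engine
```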

## 3. Testing Strategy

### Current State

No tests found in the codebase. No test directories, test frameworks, or test runner configurations exist.

### Observed Validation Mechanisms

- Detection confidence threshold filtering (`probability_threshold`)
- Overlapping detection removal (containment-biased NMS)
- Physical size filtering via ground sampling distance and max_object_size_meters
- Tile deduplication via coordinate proximity
- Video annotation validity heuristics (time gap, movement, confidence)
- AI availability status tracking with error states
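The physical size filter can be sketched with the standard ground-sampling-distance formula; the docs do not show the service's exact formula, so treat this as an assumption (default camera values from the config table: 400 m altitude, 24 mm focal length, 23.5 mm sensor width):

```python
def gsd_m_per_px(altitude_m: float, sensor_width_mm: float,
                 focal_length_mm: float, image_width_px: int) -> float:
    """Ground sampling distance: meters on the ground per image pixel."""
    return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)

def plausible(box_w_px: float, box_h_px: float, max_size_m: float,
              gsd: float) -> bool:
    """Reject detections whose physical extent exceeds the class MaxSizeM."""
    return max(box_w_px, box_h_px) * gsd <= max_size_m
```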

## 4. References

| Artifact | Path | Description |
|----------|------|-------------|
| FastAPI application | `main.py` | API endpoints, DTOs, SSE streaming |
| Inference orchestrator | `inference.pyx` / `.pxd` | Core pipeline logic |
| Engine interface | `inference_engine.pyx` / `.pxd` | Abstract base class |
| ONNX engine | `onnx_engine.pyx` | CPU/CUDA inference |
| TensorRT engine | `tensorrt_engine.pyx` / `.pxd` | GPU inference + conversion |
| Detection models | `annotation.pyx` / `.pxd` | Detection and Annotation classes |
| Configuration | `ai_config.pyx` / `.pxd` | AIRecognitionConfig |
| Status tracking | `ai_availability_status.pyx` / `.pxd` | Engine lifecycle status |
| Constants & logging | `constants_inf.pyx` / `.pxd` | Constants, class registry, logging |
| HTTP client | `loader_http_client.py` | Model download/upload |
| Class definitions | `classes.json` | 19 detection classes with metadata |
| Build config | `setup.py` | Cython compilation |
| CPU dependencies | `requirements.txt` | Python package versions |
| GPU dependencies | `requirements-gpu.txt` | TensorRT, PyCUDA additions |
@@ -0,0 +1,135 @@

# Codebase Discovery

## Directory Tree

```
detections/
├── main.py                            # FastAPI entry point
├── setup.py                           # Cython build configuration
├── requirements.txt                   # CPU dependencies
├── requirements-gpu.txt               # GPU dependencies (extends requirements.txt)
├── classes.json                       # Object detection class definitions (19 classes)
├── .gitignore
├── inference.pyx / .pxd               # Core inference orchestrator (Cython)
├── inference_engine.pyx / .pxd        # Abstract base engine class (Cython)
├── onnx_engine.pyx                    # ONNX Runtime inference engine (Cython)
├── tensorrt_engine.pyx / .pxd         # TensorRT inference engine (Cython)
├── annotation.pyx / .pxd              # Detection & Annotation data models (Cython)
├── ai_config.pyx / .pxd               # AI recognition config (Cython)
├── ai_availability_status.pyx / .pxd  # AI status enum & state (Cython)
├── constants_inf.pyx / .pxd           # Constants, logging, class registry (Cython)
└── loader_http_client.py              # HTTP client for model loading/uploading
```

## Tech Stack Summary

| Aspect | Technology |
|--------|-----------|
| Language | Python 3 + Cython |
| Web Framework | FastAPI + Uvicorn |
| ML Inference (CPU) | ONNX Runtime 1.22.0 |
| ML Inference (GPU) | TensorRT 10.11.0 + PyCUDA 2025.1.1 |
| Image Processing | OpenCV 4.10.0 |
| Serialization | msgpack 1.1.1 |
| HTTP Client | requests 2.32.4 |
| Logging | loguru 0.7.3 |
| GPU Monitoring | pynvml 12.0.0 |
| Numeric | NumPy 2.3.0 |
| Build | Cython 3.1.3 + setuptools |

## Dependency Graph

### Internal Module Dependencies

```
constants_inf      ← (leaf) no internal deps
ai_config          ← (leaf) no internal deps
inference_engine   ← (leaf) no internal deps
loader_http_client ← (leaf) no internal deps

ai_availability_status → constants_inf
annotation             → constants_inf

onnx_engine     → inference_engine, constants_inf
tensorrt_engine → inference_engine, constants_inf

inference → constants_inf, ai_availability_status, annotation, ai_config,
            onnx_engine | tensorrt_engine (conditional on GPU availability)

main → inference, constants_inf, loader_http_client
```

### Mermaid Diagram

```mermaid
graph TD
    main["main.py (FastAPI)"]
    inference["inference"]
    onnx_engine["onnx_engine"]
    tensorrt_engine["tensorrt_engine"]
    inference_engine["inference_engine (abstract)"]
    annotation["annotation"]
    ai_availability_status["ai_availability_status"]
    ai_config["ai_config"]
    constants_inf["constants_inf"]
    loader_http_client["loader_http_client"]

    main --> inference
    main --> constants_inf
    main --> loader_http_client

    inference --> constants_inf
    inference --> ai_availability_status
    inference --> annotation
    inference --> ai_config
    inference -.->|GPU available| tensorrt_engine
    inference -.->|CPU fallback| onnx_engine

    onnx_engine --> inference_engine
    onnx_engine --> constants_inf

    tensorrt_engine --> inference_engine
    tensorrt_engine --> constants_inf

    ai_availability_status --> constants_inf
    annotation --> constants_inf
```

## Topological Processing Order

1. `constants_inf` (leaf)
2. `ai_config` (leaf)
3. `inference_engine` (leaf)
4. `loader_http_client` (leaf)
5. `ai_availability_status` (depends: constants_inf)
6. `annotation` (depends: constants_inf)
7. `onnx_engine` (depends: inference_engine, constants_inf)
8. `tensorrt_engine` (depends: inference_engine, constants_inf)
9. `inference` (depends: constants_inf, ai_availability_status, annotation, ai_config, onnx_engine/tensorrt_engine)
10. `main` (depends: inference, constants_inf, loader_http_client)

## Entry Points

- `main.py` — FastAPI application, serves the HTTP API via uvicorn

## Leaf Modules

- `constants_inf` — constants, logging, class registry
- `ai_config` — recognition configuration data class
- `inference_engine` — abstract base class for engines
- `loader_http_client` — HTTP client for the external Loader service

## Cycles

None detected.

## External Services

| Service | URL Source | Purpose |
|---------|-----------|---------|
| Loader | `LOADER_URL` env var (default `http://loader:8080`) | Download/upload AI models |
| Annotations | `ANNOTATIONS_URL` env var (default `http://annotations:8080`) | Post detection results, refresh auth tokens |

## Data Files

- `classes.json` — 19 object detection classes with Ukrainian short names, colors, and max physical size in meters (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, etc.)
@@ -0,0 +1,98 @@

# Verification Log

## Summary

| Metric | Count |
|--------|-------|
| Total entities verified | 82 |
| Entities confirmed correct | 78 |
| Issues found | 4 |
| Corrections applied | 4 |
| Remaining gaps | 0 |
| Completeness score | 10/10 modules covered |

## Entity Verification

### Classes & Functions — All Verified

| Entity | Module | Status |
|--------|--------|--------|
| AnnotationClass | constants_inf | Confirmed |
| WeatherMode enum (Norm/Wint/Night) | constants_inf | Confirmed |
| log(), logerror(), format_time() | constants_inf | Confirmed |
| annotations_dict | constants_inf | Confirmed |
| AIRecognitionConfig | ai_config | Confirmed |
| from_dict(), from_msgpack() | ai_config | Confirmed |
| AIAvailabilityEnum | ai_availability_status | Confirmed |
| AIAvailabilityStatus | ai_availability_status | Confirmed |
| Detection, Annotation | annotation | Confirmed |
| InferenceEngine | inference_engine | Confirmed |
| OnnxEngine | onnx_engine | Confirmed |
| TensorRTEngine | tensorrt_engine | Confirmed |
| convert_from_onnx, get_engine_filename, get_gpu_memory_bytes | tensorrt_engine | Confirmed |
| Inference | inference | Confirmed |
| LoaderHttpClient, LoadResult | loader_http_client | Confirmed |
| DetectionDto, DetectionEvent, HealthResponse, AIConfigDto | main | Confirmed |
| TokenManager | main | Confirmed |
| detection_to_dto | main | Confirmed |

### API Endpoints — All Verified

| Endpoint | Method | Status |
|----------|--------|--------|
| /health | GET | Confirmed |
| /detect | POST | Confirmed |
| /detect/{media_id} | POST | Confirmed |
| /detect/stream | GET | Confirmed |

### Constants — All Verified

All 10 constants in the constants_inf module verified against code values.

## Issues Found & Corrections Applied

### Issue 1: Legacy PXD Declarations (constants_inf)

**Location**: `constants_inf.pxd` lines 3-5

**Finding**: The `.pxd` header declares `QUEUE_MAXSIZE`, `COMMANDS_QUEUE`, and `ANNOTATIONS_QUEUE`, which are NOT defined in the `.pyx` implementation. Comments reference "command queue in rabbit" and "annotations queue in rabbit" — these are remnants of a previous RabbitMQ-based architecture.

**Correction**: Added a note to `modules/constants_inf.md` documenting these as orphaned legacy declarations.

### Issue 2: Unused serialize() Methods

**Location**: `annotation.pyx` (Annotation.serialize, Detection — via annotation), `ai_availability_status.pyx` (AIAvailabilityStatus.serialize)

**Finding**: Both `serialize()` methods are defined but never called anywhere in the codebase. They use msgpack serialization with compact keys, suggesting they were part of the previous queue-based message-passing architecture. The current HTTP API uses Pydantic JSON serialization instead.

**Correction**: Added notes to the relevant module docs marking serialize() as legacy/unused.

### Issue 3: Unused from_msgpack() Factory Method

**Location**: `ai_config.pyx` line 55

**Finding**: `AIRecognitionConfig.from_msgpack()` is defined but never called. Only `from_dict()` is used (called from `inference.pyx`). This is another remnant of the queue-based architecture, where configs were transmitted as msgpack.

**Correction**: Added a note to `modules/ai_config.md`.

### Issue 4: Unused file_data Field

**Location**: `ai_config.pyx` line 31

**Finding**: `AIRecognitionConfig.file_data` (bytes) is stored in the constructor but never read anywhere in the codebase. It is populated from both `from_dict` and `from_msgpack` but has no consumer.

**Correction**: Added a note to `modules/ai_config.md`.

## Cross-Document Consistency

| Check | Result |
|-------|--------|
| Component docs match architecture doc | Consistent |
| Flow diagrams match component interfaces | Consistent |
| Data model matches module docs | Consistent |
| Dependency graph in discovery matches component diagram | Consistent |
| Constants values in docs match code | Confirmed |

## Infrastructure Observation

No Dockerfile, docker-compose.yml, or CI/CD configuration found in the repository. The architecture doc's deployment section is inferred from service hostnames (loader:8080, annotations:8080), which suggest containerized deployment, but no container definitions exist in this repo. They likely reside in a parent or infrastructure repository.
@@ -0,0 +1,86 @@

# Azaion.Detections — Documentation Report

## Executive Summary

Azaion.Detections is a Python/Cython microservice for automated aerial object detection. It exposes a FastAPI HTTP API that accepts images and video, runs YOLO-based inference through TensorRT (GPU) or ONNX Runtime (CPU fallback), and returns structured detection results. The system supports large aerial image tiling with ground-sampling-distance-based sizing, real-time video processing with frame sampling and tracking heuristics, and Server-Sent Events streaming for live detection updates.

The codebase consists of 10 modules (2 Python, 8 Cython) organized into 4 components. It integrates with two external services: a Loader service for model storage and an Annotations service for result persistence. The system has no tests, no containerization config in this repo, and several legacy artifacts from a prior RabbitMQ-based architecture.

## Problem Statement

Automated detection of military and infrastructure objects (19 classes including vehicles, artillery, trenches, personnel, camouflage) from aerial imagery and video feeds. Replaces manual analyst review with real-time AI-powered detection, enabling rapid situational awareness for reconnaissance operations.

## Architecture Overview

**Tech stack**: Python 3 + Cython 3.1.3 | FastAPI + Uvicorn | ONNX Runtime 1.22.0 | TensorRT 10.11.0 | OpenCV 4.10.0 | NumPy 2.3.0

**Key architectural decisions**:

1. Cython for performance-critical inference loops
2. Dual engine strategy (TensorRT + ONNX fallback) with automatic conversion and caching
3. Lazy engine initialization for fast API startup
4. GSD-based image tiling for large aerial images

## Component Summary

| # | Component | Modules | Purpose | Dependencies |
|---|-----------|---------|---------|-------------|
| 01 | Domain | constants_inf, ai_config, ai_availability_status, annotation | Shared data models, enums, constants, logging, class registry | None (foundation) |
| 02 | Inference Engines | inference_engine, onnx_engine, tensorrt_engine | Pluggable ML inference backends (Strategy pattern) | Domain |
| 03 | Inference Pipeline | inference, loader_http_client | Engine lifecycle, media preprocessing/postprocessing, model loading | Domain, Engines |
| 04 | API | main | HTTP endpoints, SSE streaming, auth token management | Domain, Pipeline |

## System Flows

| # | Flow | Trigger | Description |
|---|------|---------|-------------|
| F1 | Health Check | GET /health | Returns AI engine availability status |
| F2 | Single Image Detection | POST /detect | Synchronous image inference, returns detections |
| F3 | Media Detection (Async) | POST /detect/{media_id} | Background processing with SSE streaming + Annotations posting |
| F4 | SSE Streaming | GET /detect/stream | Real-time event delivery to connected clients |
| F5 | Engine Initialization | First detection request | TensorRT → ONNX fallback → background conversion |
| F6 | TensorRT Conversion | No cached engine | Background ONNX→TensorRT conversion and upload |

## Risk Observations

| Risk | Severity | Source |
|------|----------|--------|
| No tests in the codebase | High | Verification (Step 4) |
| No CORS, rate limiting, or request size limits | Medium | Security review (main.py) |
| JWT token handled without signature verification | Medium | Security review (main.py) |
| Legacy unused code (serialize, from_msgpack, queue declarations) | Low | Verification (Step 4) |
| No graceful shutdown for in-progress detections | Medium | Architecture review |
| Single-instance in-memory state (_active_detections, _event_queues) | Medium | Scalability review |
| No Dockerfile or CI/CD config in this repository | Low | Infrastructure review |
| classes.json must exist at startup — no fallback | Low | Reliability review |
| Hardcoded 1280×1280 default for dynamic TensorRT dimensions | Low | Flexibility review |

## Open Questions

1. Where is the Dockerfile / docker-compose.yml for this service? Likely in a separate infrastructure repository.
2. Is the legacy RabbitMQ code (serialize methods, from_msgpack, queue constants in .pxd) planned for removal?
3. What is the intended scaling model — single instance per GPU, or horizontal scaling with shared state?
4. Should JWT signature verification be added at the detection service level, or is the current pass-through approach intentional?
5. Are there integration or end-to-end tests in a separate repository?

## Artifact Index

| Path | Description |
|------|-------------|
| `_docs/00_problem/problem.md` | Problem statement |
| `_docs/00_problem/restrictions.md` | System restrictions and constraints |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria |
| `_docs/00_problem/input_data/data_parameters.md` | Input data schemas and parameters |
| `_docs/01_solution/solution.md` | Solution description and assessment |
| `_docs/02_document/00_discovery.md` | Codebase discovery (tech stack, dependency graph) |
| `_docs/02_document/modules/*.md` | Per-module documentation (10 modules) |
| `_docs/02_document/components/01_domain/description.md` | Domain component spec |
| `_docs/02_document/components/02_inference_engines/description.md` | Inference Engines component spec |
| `_docs/02_document/components/03_inference_pipeline/description.md` | Inference Pipeline component spec |
| `_docs/02_document/components/04_api/description.md` | API component spec |
| `_docs/02_document/diagrams/components.md` | Component relationship diagram |
| `_docs/02_document/architecture.md` | System architecture document |
| `_docs/02_document/system-flows.md` | System flow diagrams and descriptions |
| `_docs/02_document/data_model.md` | Data model with ERD |
| `_docs/02_document/04_verification_log.md` | Verification pass results |
| `_docs/02_document/FINAL_report.md` | This report |
| `_docs/02_document/state.json` | Documentation process state |
@@ -0,0 +1,151 @@
# Azaion.Detections — Architecture

## 1. System Context

**Problem being solved**: Automated object detection on aerial imagery and video — identifying military and infrastructure objects (vehicles, artillery, trenches, personnel, etc.) from drone/satellite feeds and returning structured detection results with bounding boxes, class labels, and confidence scores.

**System boundaries**:

- **Inside**: FastAPI HTTP service, Cython-based inference pipeline, ONNX/TensorRT inference engines, image tiling, video frame processing, detection postprocessing
- **Outside**: Loader service (model storage), Annotations service (result persistence + auth), client applications

**External systems**:

| System | Integration Type | Direction | Purpose |
|--------|-----------------|-----------|---------|
| Loader Service | REST (HTTP) | Both | Download AI models, upload converted TensorRT engines |
| Annotations Service | REST (HTTP) | Outbound | Post detection results, refresh auth tokens |
| Client Applications | REST + SSE | Inbound | Submit detection requests, receive streaming results |

## 2. Technology Stack

| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language | Python 3 + Cython | 3.1.3 (Cython) | Python for API, Cython for performance-critical inference loops |
| Framework | FastAPI + Uvicorn | latest | Async HTTP + SSE support |
| ML Runtime (CPU) | ONNX Runtime | 1.22.0 | Portable model format, CPU/CUDA provider fallback |
| ML Runtime (GPU) | TensorRT + PyCUDA | 10.11.0 / 2025.1.1 | Maximum GPU inference performance |
| Image Processing | OpenCV | 4.10.0 | Frame decoding, preprocessing, tiling |
| Serialization | msgpack | 1.1.1 | Compact binary serialization for annotations and configs |
| HTTP Client | requests | 2.32.4 | Synchronous HTTP to Loader and Annotations services |
| Logging | loguru | 0.7.3 | Structured file + console logging |
| GPU Monitoring | pynvml | 12.0.0 | GPU detection, capability checks, memory queries |
| Numeric | NumPy | 2.3.0 | Tensor manipulation |

## 3. Deployment Model

**Infrastructure**: Containerized microservice, deployed alongside Loader and Annotations services (likely Docker Compose or Kubernetes given service discovery by hostname).

**Environment-specific configuration**:

| Config | Development | Production |
|--------|-------------|------------|
| LOADER_URL | `http://loader:8080` (default) | Environment variable |
| ANNOTATIONS_URL | `http://annotations:8080` (default) | Environment variable |
| GPU | Optional (falls back to ONNX CPU) | Required (TensorRT) |
| Logging | Console + file | File (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) |

## 4. Data Model Overview

**Core entities**:

| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| AnnotationClass | Detection class metadata (name, color, max physical size) | 01 Domain |
| Detection | Single bounding box with class + confidence | 01 Domain |
| Annotation | Collection of detections for one frame/tile + image | 01 Domain |
| AIRecognitionConfig | Runtime inference parameters | 01 Domain |
| AIAvailabilityStatus | Engine lifecycle state | 01 Domain |
| DetectionDto | API-facing detection response | 04 API |
| DetectionEvent | SSE event payload | 04 API |

**Key relationships**:

- Annotation → Detection: one-to-many (detections within a frame/tile)
- Detection → AnnotationClass: many-to-one (via class ID lookup in annotations_dict)
- Annotation → Media: many-to-one (multiple annotations per video/image)

**Data flow summary**:

- Media bytes → Preprocessing → Engine → Raw output → Postprocessing → Detection/Annotation → DTO → HTTP/SSE response
- ONNX model bytes → Loader → Engine init (or TensorRT conversion → upload back to Loader)

## 5. Integration Points

### Internal Communication

| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| API | Inference Pipeline | Direct Python call | Sync (via ThreadPoolExecutor) | Lazy initialization |
| Inference Pipeline | Inference Engines | Direct Cython call | Sync | Strategy pattern selection |
| Inference Pipeline | Loader | HTTP POST | Request-Response | Model download/upload |

### External Integrations

| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| Loader Service | HTTP POST | None | None observed | Exception → LoadResult(err) |
| Annotations Service | HTTP POST | Bearer JWT | None observed | Exception silently caught |
| Annotations Auth | HTTP POST | Refresh token | None observed | Exception silently caught |

## 6. Non-Functional Requirements

| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Concurrent inference | 2 parallel jobs max | ThreadPoolExecutor workers | High |
| SSE queue depth | 100 events per client | asyncio.Queue maxsize | Medium |
| Log retention | 30 days | loguru rotation config | Medium |
| GPU compatibility | Compute capability ≥ 6.1 | pynvml check at startup | High |
| Model format | ONNX (portable) + TensorRT (GPU-specific) | Engine filename includes CC+SM | High |

## 7. Security Architecture

**Authentication**: Pass-through Bearer JWT from client → forwarded to Annotations service. JWT exp decoded locally (base64, no signature verification) for token refresh timing.

**Authorization**: None at the detection service level. Auth is delegated to the Annotations service.

**Data protection**:

- At rest: not applicable (no local persistence of detection results)
- In transit: no TLS configured at application level (expected to be handled by infrastructure/reverse proxy)
- Secrets management: tokens received per-request, no stored credentials

**Audit logging**: Inference activity logged to daily rotated files. No auth audit logging.

## 8. Key Architectural Decisions

### ADR-001: Cython for Inference Pipeline

**Context**: Detection postprocessing involves tight loops over bounding box coordinates with floating-point math.

**Decision**: Implement the inference pipeline, data models, and engines as Cython `cdef` classes with typed variables.

**Alternatives considered**:

1. Pure Python — rejected due to loop-heavy postprocessing performance
2. C/C++ extension — rejected for development velocity; Cython offers C-speed with Python-like syntax

**Consequences**: Build step required (setup.py + Cython compilation). IDE support and debugging more complex.

### ADR-002: Dual Engine Strategy (TensorRT + ONNX Fallback)

**Context**: Need maximum GPU inference speed where available, but must also run on CPU-only machines.

**Decision**: Check GPU at module load time. If compatible NVIDIA GPU found, use TensorRT; otherwise fall back to ONNX Runtime. Background-convert ONNX→TensorRT and cache the engine.

**Alternatives considered**:

1. TensorRT only — rejected; would break CPU-only development/testing
2. ONNX only — rejected; significantly slower on GPU vs TensorRT

**Consequences**: Two code paths to maintain. GPU-specific engine files cached per architecture.

### ADR-003: Lazy Inference Initialization

**Context**: Engine initialization is slow (model download, possible conversion). API should start accepting health checks immediately.

**Decision**: `Inference` is created on first actual detection request, not at app startup. Health endpoint works without engine.

**Consequences**: First detection request has higher latency. `AIAvailabilityStatus` reports state transitions during initialization.

### ADR-004: Large Image Tiling with GSD-Based Sizing

**Context**: Aerial images can be much larger than the model's fixed input size (1280×1280). Simple resize would lose small object detail.

**Decision**: Split large images into tiles sized by ground sampling distance (`METERS_IN_TILE / GSD` pixels) with configurable overlap. Deduplicate detections across tile boundaries.

**Consequences**: More complex pipeline. Tile deduplication relies on coordinate proximity threshold.
@@ -0,0 +1,95 @@
# Component: Domain Models & Configuration

## Overview

**Purpose**: Provides all data models, enums, constants, detection class registry, and logging infrastructure used across the system.

**Pattern**: Shared kernel — leaf-level types and utilities consumed by all other components.

**Upstream**: None (foundation layer).
**Downstream**: Inference Engines, Inference Pipeline, API.

## Modules

| Module | Role |
|--------|------|
| `constants_inf` | Application constants, logging, detection class registry from `classes.json` |
| `ai_config` | `AIRecognitionConfig` data class with factory methods |
| `ai_availability_status` | Thread-safe `AIAvailabilityStatus` tracker with `AIAvailabilityEnum` |
| `annotation` | `Detection` and `Annotation` data models |

## Internal Interfaces

### constants_inf

```
cdef log(str log_message) -> void
cdef logerror(str error) -> void
cdef format_time(int ms) -> str

annotations_dict: dict[int, AnnotationClass]
```

### ai_config

```
cdef class AIRecognitionConfig:
    @staticmethod cdef from_msgpack(bytes data) -> AIRecognitionConfig
    @staticmethod def from_dict(dict data) -> AIRecognitionConfig
```

### ai_availability_status

```
cdef class AIAvailabilityStatus:
    cdef set_status(AIAvailabilityEnum status, str error_message=None)
    cdef bytes serialize()
    # __str__ for display
```

### annotation

```
cdef class Detection:
    cdef overlaps(Detection det2, float confidence_threshold) -> bool
    # __eq__ for tile deduplication

cdef class Annotation:
    cdef bytes serialize()
```

## External API

None — this is a shared kernel, not an externally-facing component.

## Data Access Patterns

- `classes.json` read once at module import time (constants_inf)
- All data is in-memory, no database access

## Implementation Details

- Cython `cdef` classes for performance-critical detection processing
- Thread-safe status tracking via `threading.Lock` in `AIAvailabilityStatus`
- `Detection.__eq__` uses coordinate proximity threshold for tile deduplication
- `Detection.overlaps` uses containment-biased metric (overlap / min_area) rather than standard IoU
- Weather mode system triples the class registry (Norm/Wint/Night offsets of 0/20/40)

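The containment-biased metric mentioned above can be sketched in plain Python. Field names follow the `Detection` model; the actual Cython implementation is not shown in this document, so treat this as an illustration of the metric, not the real code:

```python
def overlap_ratio(a, b):
    """Overlap area divided by the *smaller* box's area, for center-format
    boxes {x, y, w, h} in normalized 0..1 coordinates. Unlike IoU, a small
    box fully contained in a large one scores 1.0 - hence containment-biased."""
    ax1, ay1 = a["x"] - a["w"] / 2, a["y"] - a["h"] / 2
    ax2, ay2 = a["x"] + a["w"] / 2, a["y"] + a["h"] / 2
    bx1, by1 = b["x"] - b["w"] / 2, b["y"] - b["h"] / 2
    bx2, by2 = b["x"] + b["w"] / 2, b["y"] + b["h"] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    min_area = min(a["w"] * a["h"], b["w"] * b["h"])
    return (iw * ih) / min_area if min_area > 0 else 0.0

small = {"x": 0.5, "y": 0.5, "w": 0.1, "h": 0.1}
big = {"x": 0.5, "y": 0.5, "w": 0.4, "h": 0.4}
```

With IoU these two boxes would score only 0.0625; the containment-biased ratio scores 1.0, which is what makes it suitable for suppressing boxes nested inside larger detections.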
## Caveats

- `classes.json` must exist in the working directory at import time — no fallback
- `Detection.__eq__` is designed specifically for tile deduplication, not general equality
- `annotations_dict` is a module-level global — not injectable/configurable at runtime

## Dependency Graph

```mermaid
graph TD
    ai_availability_status --> constants_inf
    annotation --> constants_inf
    ai_config
    constants_inf
```

## Logging Strategy

All logging flows through `constants_inf.log` and `constants_inf.logerror`, which delegate to loguru with file rotation and console output.
@@ -0,0 +1,86 @@
# Component: Inference Engines

## Overview

**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.

**Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.

**Upstream**: Domain (constants_inf for logging).
**Downstream**: Inference Pipeline (creates and uses engines).

## Modules

| Module | Role |
|--------|------|
| `inference_engine` | Abstract base class defining `get_input_shape`, `get_batch_size`, `run` |
| `onnx_engine` | ONNX Runtime implementation (CPU/CUDA) |
| `tensorrt_engine` | TensorRT implementation (GPU) + ONNX→TensorRT converter |

## Internal Interfaces

### InferenceEngine (abstract)

```
cdef class InferenceEngine:
    __init__(bytes model_bytes, int batch_size=1, **kwargs)
    cdef tuple get_input_shape()   # -> (height, width)
    cdef int get_batch_size()      # -> batch_size
    cdef run(input_data)           # -> list of output tensors
```

### OnnxEngine

```
cdef class OnnxEngine(InferenceEngine):
    # Implements all base methods
    # Provider priority: CUDA > CPU
```

### TensorRTEngine

```
cdef class TensorRTEngine(InferenceEngine):
    # Implements all base methods
    @staticmethod get_gpu_memory_bytes(int device_id) -> int
    @staticmethod get_engine_filename(int device_id) -> str
    @staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
```

## External API

None — internal component consumed by Inference Pipeline.

## Data Access Patterns

- Model bytes loaded in-memory (provided by caller)
- TensorRT: CUDA device memory allocated at init, async H2D/D2H transfers during inference
- ONNX: managed by onnxruntime internally

## Implementation Details

- **OnnxEngine**: default batch_size=1; loads model into `onnxruntime.InferenceSession`
- **TensorRTEngine**: default batch_size=4; dynamic dimensions default to 1280×1280 input, 300 max detections
- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as workspace, enables FP16 if hardware supports it
- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
- Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`

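The per-architecture cache key is just a deterministic filename. A minimal sketch of the documented naming pattern (in the service the compute capability and SM count come from pynvml; here they are plain parameters):

```python
def engine_filename(cc_major, cc_minor, sm_count):
    """Build the GPU-specific TensorRT engine filename following the
    documented azaion.cc_{major}.{minor}_sm_{count}.engine pattern, so a
    pre-built engine can be fetched for exactly this GPU architecture."""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

For example, a GPU reporting compute capability 8.6 with 84 streaming multiprocessors would look for `azaion.cc_8.6_sm_84.engine`.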
## Caveats

- TensorRT engine files are GPU-architecture-specific and not portable
- `pycuda.autoinit` import is required as side-effect (initializes CUDA context)
- Dynamic shapes defaulting to 1280×1280 is hardcoded — not configurable

## Dependency Graph

```mermaid
graph TD
    onnx_engine --> inference_engine
    onnx_engine --> constants_inf
    tensorrt_engine --> inference_engine
    tensorrt_engine --> constants_inf
```

## Logging Strategy

Logs model metadata at init and conversion progress/errors via `constants_inf.log`/`logerror`.
@@ -0,0 +1,129 @@
# Component: Inference Pipeline

## Overview

**Purpose**: Orchestrates the full inference lifecycle — engine initialization with fallback strategy, media preprocessing (images + video), batched inference execution, postprocessing with detection filtering, and result delivery via callbacks.

**Pattern**: Façade + Pipeline — `Inference` class is the single entry point that coordinates engine selection, preprocessing, inference, and postprocessing stages.

**Upstream**: Domain (data models, config, status), Inference Engines (OnnxEngine/TensorRTEngine), External Client (LoaderHttpClient).
**Downstream**: API (creates Inference, calls `run_detect` and `detect_single_image`).

## Modules

| Module | Role |
|--------|------|
| `inference` | Core orchestrator: engine lifecycle, preprocessing, postprocessing, image/video processing |
| `loader_http_client` | HTTP client for model download/upload from Loader service |

## Internal Interfaces

### Inference

```
cdef class Inference:
    __init__(loader_client)
    cpdef run_detect(dict config_dict, annotation_callback, status_callback=None)
    cpdef list detect_single_image(bytes image_bytes, dict config_dict)
    cpdef stop()

    # Internal pipeline stages:
    cdef init_ai()
    cdef preprocess(frames) -> ndarray
    cdef postprocess(output, ai_config) -> list[list[Detection]]
    cdef remove_overlapping_detections(list[Detection], float threshold) -> list[Detection]
    cdef _process_images(AIRecognitionConfig, list[str] paths)
    cdef _process_video(AIRecognitionConfig, str video_name)
```

### LoaderHttpClient

```
class LoaderHttpClient:
    load_big_small_resource(str filename, str directory) -> LoadResult
    upload_big_small_resource(bytes content, str filename, str directory) -> LoadResult
```

## External API

None — internal component, consumed by API layer.

## Data Access Patterns

- Model bytes downloaded from Loader service (HTTP)
- Converted TensorRT engines uploaded back to Loader for caching
- Video frames read via OpenCV VideoCapture
- Images read via OpenCV imread
- All processing is in-memory

## Implementation Details

### Engine Initialization Strategy

```
1. Check GPU availability (pynvml, compute capability ≥ 6.1)
2. If GPU:
   a. Try loading pre-built TensorRT engine from Loader
   b. If fails → download ONNX model → start background conversion thread
   c. Background thread: convert ONNX→TensorRT → upload to Loader → set _converted_model_bytes
   d. Next init_ai() call: load from _converted_model_bytes
3. If no GPU:
   a. Download ONNX model from Loader → create OnnxEngine
```

### Preprocessing

- `cv2.dnn.blobFromImage`: normalize 0..1, resize to model input, BGR→RGB
- Batch via `np.vstack`

### Postprocessing

- Parse `[batch][det][x1,y1,x2,y2,conf,cls]` output
- Normalize coordinates to 0..1
- Convert to center-format Detection objects
- Filter by confidence threshold
- Remove overlapping detections (greedy: keep higher confidence, tie-break by lower class_id)

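The greedy removal rule can be sketched as follows; the `overlaps` parameter stands in for `Detection.overlaps`, and the dict fields mirror the Detection model:

```python
def remove_overlapping(detections, threshold, overlaps):
    """Greedy filter: visit detections in descending confidence order
    (ties broken by lower class id); keep one only if it does not overlap
    an already-kept detection above the threshold."""
    kept = []
    for det in sorted(detections, key=lambda d: (-d["confidence"], d["cls"])):
        if all(not overlaps(det, k, threshold) for k in kept):
            kept.append(det)
    return kept
```

Because higher-confidence boxes are visited first, a suppressed box is always beaten by one the filter has already committed to keeping.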
### Large Image Tiling

- Ground Sampling Distance: `sensor_width * altitude / (focal_length * image_width)`
- Tile size: `METERS_IN_TILE / GSD` pixels
- Overlap: configurable percentage
- Tile deduplication: absolute-coordinate Detection equality across adjacent tiles
- Physical size filtering: remove detections exceeding class max_object_size_meters

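The two formulas above compose directly. A minimal numeric sketch (the `METERS_IN_TILE` value and the sensor parameters below are illustrative, not the service's real constants):

```python
METERS_IN_TILE = 100  # illustrative; the real constant lives in constants_inf

def gsd(sensor_width, altitude, focal_length, image_width):
    """Ground sampling distance in meters per pixel. sensor_width and
    focal_length share a unit (e.g. mm); altitude is in meters."""
    return sensor_width * altitude / (focal_length * image_width)

def tile_size_px(gsd_value):
    """Tile edge in pixels covering METERS_IN_TILE meters of ground."""
    return round(METERS_IN_TILE / gsd_value)
```

For instance, a 13.2 mm sensor at 100 m altitude with an 8.8 mm lens over a 4000 px wide image gives a GSD of 0.0375 m/px, so each 100 m tile spans roughly 2667 px — larger than the 1280×1280 model input, which is why tiling (rather than a single resize) preserves small objects.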
### Video Processing

- Frame sampling: every Nth frame
- Annotation validity heuristics: time gap, detection count increase, spatial movement, confidence improvement
- JPEG encoding of valid frames for annotation images

### Callbacks

- `annotation_callback(annotation, percent)` — called per valid annotation
- `status_callback(media_name, count)` — called when all detections for a media item are complete

## Caveats

- `ThreadPoolExecutor` with max_workers=2 limits concurrent inference (set in main.py)
- Background TensorRT conversion runs in a daemon thread — may be interrupted on shutdown
- `init_ai()` called on every `run_detect` — idempotent but checks engine state each time
- Video processing is sequential per video (no parallel video processing)
- `_tile_detections` dict is instance-level state that persists across image calls within a single `run_detect` invocation

## Dependency Graph

```mermaid
graph TD
    inference --> constants_inf
    inference --> ai_availability_status
    inference --> annotation
    inference --> ai_config
    inference -.-> onnx_engine
    inference -.-> tensorrt_engine
    inference --> loader_http_client
```

## Logging Strategy

Extensive logging via `constants_inf.log`: engine init status, media processing start, GSD calculation, tile splitting, detection results, size filtering decisions.
@@ -0,0 +1,103 @@
# Component: API

## Overview

**Purpose**: HTTP API layer exposing object detection capabilities via FastAPI — handles request/response serialization, async task management, SSE streaming, and authentication token forwarding.

**Pattern**: Controller layer — thin API surface that delegates all business logic to the Inference Pipeline.

**Upstream**: Inference Pipeline (Inference class), Domain (constants_inf for labels).
**Downstream**: None (top-level, client-facing).

## Modules

| Module | Role |
|--------|------|
| `main` | FastAPI app definition, endpoints, DTOs, TokenManager, SSE streaming |

## External API Specification

### GET /health

**Response**: `HealthResponse`

```json
{
  "status": "healthy",
  "aiAvailability": "Enabled",
  "errorMessage": null
}
```

`aiAvailability` values: None, Downloading, Converting, Uploading, Enabled, Warning, Error.

### POST /detect

**Input**: Multipart form — `file` (image bytes), optional `config` (JSON string).
**Response**: `list[DetectionDto]`

```json
[
  {
    "centerX": 0.5,
    "centerY": 0.5,
    "width": 0.1,
    "height": 0.1,
    "classNum": 0,
    "label": "ArmorVehicle",
    "confidence": 0.85
  }
]
```

**Errors**: 400 (empty image / invalid data), 422 (runtime error), 503 (engine unavailable).

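As a client-side usage sketch, the multipart request can be assembled as below and sent with `requests` (already in the stack). The base URL is an assumption; this document does not specify host or port:

```python
import json

def build_detect_request(base_url, image_bytes, config=None):
    """Assemble keyword arguments for requests.post() against POST /detect:
    multipart `file` part plus an optional `config` JSON string form field."""
    parts = {
        "url": f"{base_url}/detect",
        "files": {"file": ("frame.jpg", image_bytes, "image/jpeg")},
    }
    if config is not None:
        parts["data"] = {"config": json.dumps(config)}
    return parts

# Usage sketch (network call not executed here):
#   resp = requests.post(**build_detect_request("http://localhost:8000", img))
#   detections = resp.json()   # list of DetectionDto dicts
```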
### POST /detect/{media_id}

**Input**: Path param `media_id`, optional JSON body `AIConfigDto`, headers `Authorization: Bearer {token}`, `x-refresh-token: {token}`.
**Response**: `{"status": "started", "mediaId": "..."}` (202-style).
**Errors**: 409 (duplicate detection for same media_id).
**Side effects**: Starts async detection task; results delivered via SSE stream and/or posted to Annotations service.

### GET /detect/stream

**Response**: `text/event-stream` (SSE).

```
data: {"annotations": [...], "mediaId": "...", "mediaStatus": "AIProcessing", "mediaPercent": 50}
```

`mediaStatus` values: AIProcessing, AIProcessed, Error.

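Since each event is a single `data:` line of JSON, a client can parse the stream with a few lines of Python; a production client would typically use a dedicated SSE library instead:

```python
import json

def parse_sse_lines(lines):
    """Collect JSON payloads from single-line `data: {...}` SSE events,
    ignoring blank keep-alive lines between events."""
    events = []
    for line in lines:
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

# With requests (already in the stack) the stream would be consumed roughly as:
#   resp = requests.get("http://localhost:8000/detect/stream", stream=True)
#   for event in parse_sse_lines(resp.iter_lines(decode_unicode=True)): ...
# (host and port above are assumptions, not specified in this document)
```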
## Data Access Patterns

- In-memory state:
  - `_active_detections: dict[str, bool]` — guards against duplicate media processing
  - `_event_queues: list[asyncio.Queue]` — SSE client queues (maxsize=100)
- No database access

## Implementation Details

- `Inference` is lazy-loaded on first use via `get_inference()` global function
- `ThreadPoolExecutor(max_workers=2)` runs inference off the async event loop
- SSE: one `asyncio.Queue` per connected client; events broadcast to all queues; full queues silently drop events
- `TokenManager` decodes JWT exp from base64 payload (no signature verification), auto-refreshes 60s before expiry
- `detection_to_dto` maps Detection fields to DetectionDto, looks up label from `constants_inf.annotations_dict`
- Annotations posted to external service with base64-encoded frame image

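The TokenManager behavior described above (read `exp` from the payload without verifying the signature, refresh 60 s before expiry) can be sketched as follows. Function names are illustrative, not the actual methods:

```python
import base64
import json
import time

def jwt_exp(token):
    """Read the `exp` claim by base64url-decoding the JWT payload segment.
    No signature verification is performed, mirroring the documented
    pass-through approach."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]

def should_refresh(token, now=None, margin_s=60):
    """True once we are within margin_s seconds of expiry (60 s policy)."""
    return (now if now is not None else time.time()) >= jwt_exp(token) - margin_s
```

Note this only recovers timing information; any client can forge such a token, which is exactly why auth is delegated to the Annotations service.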
## Caveats

- No CORS middleware configured
- No rate limiting
- No request body size limits beyond FastAPI defaults
- `_active_detections` is an in-memory dict — not persistent across restarts, not distributed
- SSE queue overflow silently drops events (QueueFull caught and ignored)
- JWT token handling has no signature verification — relies entirely on the Annotations service for auth
- No graceful shutdown handling for in-progress detections

## Dependency Graph

```mermaid
graph TD
    main --> inference
    main --> constants_inf
    main --> loader_http_client
```

## Logging Strategy

No explicit logging in main.py — errors are caught and returned as HTTP responses. Logging happens in downstream components.
@@ -0,0 +1,157 @@
# Azaion.Detections — Data Model

## Entity-Relationship Diagram

```mermaid
erDiagram
    AnnotationClass {
        int id PK
        string name
        string color
        int max_object_size_meters
    }

    Detection {
        double x
        double y
        double w
        double h
        int cls FK
        double confidence
        string annotation_name
    }

    Annotation {
        string name PK
        string original_media_name
        long time
        bytes image
    }

    AIRecognitionConfig {
        int frame_period_recognition
        double frame_recognition_seconds
        double probability_threshold
        double tracking_distance_confidence
        double tracking_probability_increase
        double tracking_intersection_threshold
        int big_image_tile_overlap_percent
        int model_batch_size
        double altitude
        double focal_length
        double sensor_width
    }

    AIAvailabilityStatus {
        int status
        string error_message
    }

    DetectionDto {
        double centerX
        double centerY
        double width
        double height
        int classNum
        string label
        double confidence
    }

    DetectionEvent {
        string mediaId
        string mediaStatus
        int mediaPercent
    }

    Annotation ||--o{ Detection : contains
    Detection }o--|| AnnotationClass : "classified as"
    DetectionEvent ||--o{ DetectionDto : annotations
```

## Core Domain Entities

### AnnotationClass

Loaded from `classes.json` at startup. 19 base classes × 3 weather modes = up to 57 entries in `annotations_dict`.

| Field | Type | Description |
|-------|------|-------------|
| id | int | Unique class ID (0-18 base, +20 for winter, +40 for night) |
| name | str | Display name (e.g. "ArmorVehicle", "Truck(Wint)") |
| color | str | Hex color for visualization |
| max_object_size_meters | int | Maximum physical size — detections exceeding this are filtered out |

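The weather-mode offset scheme maps cleanly to arithmetic. A sketch of the id mapping (the `% 20` recovery helper is an inference from the 0/20/40 offsets and the 0-18 base range, not documented behavior):

```python
WEATHER_OFFSETS = {"Norm": 0, "Wint": 20, "Night": 40}

def class_id(base_id, mode):
    """Map a base class id (0-18) into the weather-mode-specific id space."""
    return base_id + WEATHER_OFFSETS[mode]

def base_class_id(cls):
    """Recover the base class id from any weather-mode variant, since
    base ids never reach 20."""
    return cls % 20
```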
### Detection
|
||||||
|
|
||||||
|
Normalized bounding box (0..1 coordinate space).
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| x, y | double | Center coordinates (normalized) |
|
||||||
|
| w, h | double | Width and height (normalized) |
|
||||||
|
| cls | int | Class ID → maps to AnnotationClass |
|
||||||
|
| confidence | double | Model confidence score (0..1) |
|
||||||
|
| annotation_name | str | Back-reference to parent Annotation name |
|
||||||
|
|
||||||
|
### Annotation
|
||||||
|
|
||||||
|
Groups detections for a single frame or image tile.
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| name | str | Unique name encoding media + tile/time info |
|
||||||
|
| original_media_name | str | Source media filename (no extension, no spaces) |
|
||||||
|
| time | long | Timestamp in ms (video) or 0 (image) |
|
||||||
|
| detections | list[Detection] | Detected objects in this frame |
|
||||||
|
| image | bytes | JPEG-encoded frame (set after validation) |
|
||||||
|
|
||||||
|
### AIRecognitionConfig
|
||||||
|
|
||||||
|
Runtime configuration for inference behavior. Created from dict (API) or msgpack (internal).
|
||||||
|
|
||||||
|
### AIAvailabilityStatus
|
||||||
|
|
||||||
|
Thread-safe engine lifecycle state. Values: NONE(0), DOWNLOADING(10), CONVERTING(20), UPLOADING(30), ENABLED(200), WARNING(300), ERROR(500).
|
||||||
|
|
||||||
|
## API DTOs (Pydantic)

### DetectionDto

Outward-facing detection result. Maps from internal Detection + AnnotationClass label lookup.

### DetectionEvent

SSE event payload. Status values: AIProcessing, AIProcessed, Error.

### AIConfigDto

API input configuration. Same fields as AIRecognitionConfig with defaults.

### HealthResponse

Health check response with AI availability status string.

## Annotation Naming Convention

Annotation names encode media source and processing context:

- **Image**: `{media_name}_000000`
- **Image tile**: `{media_name}!split!{tile_size}_{x}_{y}!_000000`
- **Video frame**: `{media_name}_{H}{MM}{SS}{f}` (compact time format)

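The patterns above can be sketched as name builders. The compact video time format `{H}{MM}{SS}{f}` is interpreted here as hours, zero-padded minutes and seconds, then a tenths-of-second digit; that digit breakdown is an assumption, since the document does not spell it out:

```python
# Build annotation names following the naming convention above.
def image_name(media_name: str) -> str:
    return f"{media_name}_000000"

def tile_name(media_name: str, tile_size: int, x: int, y: int) -> str:
    return f"{media_name}!split!{tile_size}_{x}_{y}!_000000"

def frame_name(media_name: str, time_ms: int) -> str:
    # Assumed digit layout: hours, 2-digit minutes, 2-digit seconds, tenths.
    h, rem = divmod(time_ms // 1000, 3600)
    m, s = divmod(rem, 60)
    tenths = (time_ms % 1000) // 100
    return f"{media_name}_{h}{m:02d}{s:02d}{tenths}"
```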
## Serialization Formats

| Entity | Format | Usage |
|--------|--------|-------|
| Detection/Annotation | msgpack (compact keys) | `annotation.serialize()` |
| AIRecognitionConfig | msgpack (compact keys) | `from_msgpack()` |
| AIAvailabilityStatus | msgpack | `serialize()` |
| DetectionDto/Event | JSON (Pydantic) | HTTP API responses, SSE |

## No Persistent Storage

This service has no database. All data is transient:

- `classes.json` loaded at startup (read-only)
- Model bytes downloaded from Loader on demand
- Detection results returned via HTTP/SSE and posted to Annotations service
- No local caching of results
@@ -0,0 +1,63 @@

# Component Relationship Diagram

```mermaid
graph TD
    subgraph "04 - API Layer"
        API["main.py<br/>(FastAPI endpoints, DTOs, SSE, TokenManager)"]
    end

    subgraph "03 - Inference Pipeline"
        INF["inference<br/>(orchestrator, preprocessing, postprocessing)"]
        LDR["loader_http_client<br/>(model download/upload)"]
    end

    subgraph "02 - Inference Engines"
        IE["inference_engine<br/>(abstract base)"]
        ONNX["onnx_engine<br/>(ONNX Runtime)"]
        TRT["tensorrt_engine<br/>(TensorRT + conversion)"]
    end

    subgraph "01 - Domain"
        CONST["constants_inf<br/>(constants, logging, class registry)"]
        ANNOT["annotation<br/>(Detection, Annotation)"]
        AICFG["ai_config<br/>(AIRecognitionConfig)"]
        STATUS["ai_availability_status<br/>(AIAvailabilityStatus)"]
    end

    subgraph "External Services"
        LOADER["Loader Service<br/>(http://loader:8080)"]
        ANNSVC["Annotations Service<br/>(http://annotations:8080)"]
    end

    API --> INF
    API --> CONST
    API --> LDR
    API --> ANNSVC

    INF --> ONNX
    INF --> TRT
    INF --> LDR
    INF --> CONST
    INF --> ANNOT
    INF --> AICFG
    INF --> STATUS

    ONNX --> IE
    ONNX --> CONST
    TRT --> IE
    TRT --> CONST

    STATUS --> CONST
    ANNOT --> CONST

    LDR --> LOADER
```

## Component Summary

| # | Component | Modules | Purpose |
|---|-----------|---------|---------|
| 01 | Domain | constants_inf, ai_config, ai_availability_status, annotation | Shared data models, enums, constants, logging |
| 02 | Inference Engines | inference_engine, onnx_engine, tensorrt_engine | Pluggable ML inference backends |
| 03 | Inference Pipeline | inference, loader_http_client | Orchestration: engine lifecycle, preprocessing, postprocessing, media processing |
| 04 | API | main | HTTP API, SSE streaming, auth token management |
@@ -0,0 +1,125 @@

# E2E Test Environment

## Overview

**System under test**: Azaion.Detections — FastAPI HTTP service exposing `POST /detect`, `POST /detect/{media_id}`, `GET /detect/stream`, `GET /health`
**Consumer app purpose**: Standalone test runner that exercises the detection service through its public HTTP/SSE interfaces, validating end-to-end use cases without access to internals.

## Docker Environment

### Services

| Service | Image / Build | Purpose | Ports |
|---------|--------------|---------|-------|
| detections | Build from repo root (setup.py + Cython compile, uvicorn entrypoint) | System under test — the detection microservice | 8000:8000 |
| mock-loader | Custom lightweight HTTP stub (Python/Node) | Mock of the Loader service — serves ONNX model files, accepts TensorRT uploads | 8080:8080 |
| mock-annotations | Custom lightweight HTTP stub (Python/Node) | Mock of the Annotations service — accepts detection results, provides token refresh | 8081:8081 |
| e2e-consumer | Build from `e2e/` directory | Black-box test runner (pytest) | — |

### GPU Configuration

For tests requiring TensorRT (GPU path):
- Deploy `detections` with `runtime: nvidia` and `NVIDIA_VISIBLE_DEVICES=all`
- The test suite has two profiles: `gpu` (TensorRT tests) and `cpu` (ONNX fallback tests)
- CPU-only tests run without GPU runtime, verifying ONNX fallback behavior

### Networks

| Network | Services | Purpose |
|---------|----------|---------|
| e2e-net | all | Isolated test network — all service-to-service communication via hostnames |

### Volumes

| Volume | Mounted to | Purpose |
|--------|-----------|---------|
| test-models | mock-loader:/models | Pre-built ONNX model file for test inference |
| test-media | e2e-consumer:/media | Sample images and video files for detection requests |
| test-classes | detections:/app/classes.json | classes.json with 19 detection classes |
| test-results | e2e-consumer:/results | CSV test report output |

### docker-compose structure

```yaml
services:
  mock-loader:
    build: ./e2e/mocks/loader
    ports: ["8080:8080"]
    volumes:
      - test-models:/models
    networks: [e2e-net]

  mock-annotations:
    build: ./e2e/mocks/annotations
    ports: ["8081:8081"]
    networks: [e2e-net]

  detections:
    build:
      context: .
      dockerfile: Dockerfile
    ports: ["8000:8000"]
    environment:
      - LOADER_URL=http://mock-loader:8080
      - ANNOTATIONS_URL=http://mock-annotations:8081
    volumes:
      - test-classes:/app/classes.json
    depends_on:
      - mock-loader
      - mock-annotations
    networks: [e2e-net]
    # GPU profile adds: runtime: nvidia

  e2e-consumer:
    build: ./e2e
    volumes:
      - test-media:/media
      - test-results:/results
    depends_on:
      - detections
    networks: [e2e-net]
    command: pytest --csv=/results/report.csv

volumes:
  test-models:
  test-media:
  test-classes:
  test-results:

networks:
  e2e-net:
```
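The `runtime: nvidia` note can be carried in a separate compose override file that is only applied for the GPU profile. This fragment is a sketch; the file name and exact layering are assumptions, not part of the repo:

```yaml
# docker-compose.gpu.yml — hypothetical override enabling the GPU path
services:
  detections:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```

It would be layered on with `docker compose -f docker-compose.yml -f docker-compose.gpu.yml up`, leaving the base file CPU-only.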

## Consumer Application

**Tech stack**: Python 3, pytest, requests, sseclient-py
**Entry point**: `pytest --csv=/results/report.csv`

### Communication with system under test

| Interface | Protocol | Endpoint | Authentication |
|-----------|----------|----------|----------------|
| Health check | HTTP GET | `http://detections:8000/health` | None |
| Single image detect | HTTP POST (multipart) | `http://detections:8000/detect` | None |
| Media detect | HTTP POST (JSON) | `http://detections:8000/detect/{media_id}` | Bearer JWT + x-refresh-token headers |
| SSE stream | HTTP GET (SSE) | `http://detections:8000/detect/stream` | None |

### What the consumer does NOT have access to

- No direct import of Cython modules (inference, annotation, engines)
- No direct access to the detections service filesystem or Logs/ directory
- No shared memory with the detections process
- No direct calls to mock-loader or mock-annotations (except for test setup/teardown verification)

## CI/CD Integration

**When to run**: On PR merge to dev, nightly scheduled run
**Pipeline stage**: After unit tests, before deployment
**Gate behavior**: Block merge if any functional test fails; non-functional failures are warnings
**Timeout**: 15 minutes for CPU profile, 30 minutes for GPU profile

## Reporting

**Format**: CSV
**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
**Output path**: `/results/report.csv` (mounted volume → `./e2e-results/report.csv` on host)
@@ -0,0 +1,591 @@

# E2E Functional Tests

## Positive Scenarios

### FT-P-01: Health check returns status before engine initialization

**Summary**: Verify the health endpoint responds correctly when the inference engine has not yet been initialized.
**Traces to**: AC-API-1, AC-EL-1
**Category**: API, Engine Lifecycle

**Preconditions**:
- Detections service is running
- No detection requests have been made (engine is not initialized)

**Input data**: None

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `GET /health` | 200 OK with `{"status": "healthy", "aiAvailability": "None"}` |

**Expected outcome**: Health endpoint returns `status: "healthy"` and `aiAvailability: "None"` (engine not yet loaded).
**Max execution time**: 2s

---

### FT-P-02: Health check reflects engine availability after initialization

**Summary**: Verify the health endpoint reports the correct engine state after the engine has been initialized by a detection request.
**Traces to**: AC-API-1, AC-EL-2
**Category**: API, Engine Lifecycle

**Preconditions**:
- Detections service is running
- Mock-loader serves the ONNX model file
- At least one successful detection has been performed (engine initialized)

**Input data**: small-image

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with small-image (trigger engine init) | 200 OK with detection results |
| 2 | `GET /health` | 200 OK with `aiAvailability` set to `"Enabled"` or `"Warning"` |

**Expected outcome**: `aiAvailability` reflects an initialized engine state (not `"None"` or `"Downloading"`).
**Max execution time**: 30s (includes engine init on first call)

---

### FT-P-03: Single image detection returns detections

**Summary**: Verify that a valid small image submitted via POST /detect returns structured detection results.
**Traces to**: AC-DA-1, AC-API-2
**Category**: Detection Accuracy, API

**Preconditions**:
- Engine is initialized (or will be on this call)
- Mock-loader serves the model

**Input data**: small-image (640×480, contains detectable objects)

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with small-image as multipart file | 200 OK |
| 2 | Parse response JSON | Array of detection objects, each with `x`, `y`, `width`, `height`, `label`, `confidence` |
| 3 | Verify all confidence values | Every detection has `confidence >= 0.25` (default probability_threshold) |

**Expected outcome**: Non-empty array of DetectionDto objects. All confidences meet threshold. Each detection has valid bounding box coordinates (0.0–1.0 range).
**Max execution time**: 30s

---

### FT-P-04: Large image triggers GSD-based tiling

**Summary**: Verify that an image exceeding 1.5× model dimensions is tiled and processed with tile-level detection results merged.
**Traces to**: AC-IP-1, AC-IP-2
**Category**: Image Processing

**Preconditions**:
- Engine is initialized
- Config includes altitude, focal_length, sensor_width for GSD calculation

**Input data**: large-image (4000×3000)

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with large-image and config `{"altitude": 400, "focal_length": 24, "sensor_width": 23.5}` | 200 OK |
| 2 | Parse response JSON | Array of detections |
| 3 | Verify detection coordinates | Bounding box coordinates are in 0.0–1.0 range relative to the full original image |

**Expected outcome**: Detections returned for the full image. Coordinates are normalized to original image dimensions (not tile dimensions). Processing time is longer than small-image due to tiling.
**Max execution time**: 60s

---
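The GSD implied by the FT-P-04 config can be computed with the standard photogrammetry formula; whether the service derives it exactly this way is an assumption:

```python
# Ground sample distance (meters per pixel) from altitude (m), physical
# sensor width (mm), focal length (mm), and image width (px).
def gsd_m_per_px(altitude_m: float, sensor_width_mm: float,
                 focal_length_mm: float, image_width_px: int) -> float:
    return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)
```

With the test config (altitude 400, focal length 24, sensor width 23.5) and the 4000 px large-image, this gives roughly 0.098 m/px, i.e. about 10 cm of ground per pixel.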
### FT-P-05: Detection confidence filtering respects threshold

**Summary**: Verify that detections below the configured probability_threshold are filtered out.
**Traces to**: AC-DA-1
**Category**: Detection Accuracy

**Preconditions**:
- Engine is initialized

**Input data**: small-image

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with small-image and config `{"probability_threshold": 0.8}` | 200 OK |
| 2 | Parse response JSON | All returned detections have `confidence >= 0.8` |
| 3 | `POST /detect` with same image and config `{"probability_threshold": 0.1}` | 200 OK |
| 4 | Compare result counts | Step 3 returns >= number of detections from Step 1 |

**Expected outcome**: Higher threshold produces fewer or equal detections. No detection below threshold appears in results.
**Max execution time**: 30s

---
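The consumer-side check behind steps 2–4 reduces to a filter plus a monotonicity assertion; a minimal sketch:

```python
# Keep only detections at or above the threshold, as in FT-P-05 step 2.
def filter_by_confidence(detections: list[dict], threshold: float) -> list[dict]:
    return [d for d in detections if d["confidence"] >= threshold]

# A higher threshold can never yield more detections (step 4).
detections = [{"confidence": c} for c in (0.15, 0.4, 0.85, 0.92)]
high = filter_by_confidence(detections, 0.8)
low = filter_by_confidence(detections, 0.1)
assert all(d["confidence"] >= 0.8 for d in high)
assert len(low) >= len(high)
```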
### FT-P-06: Overlapping detections are deduplicated

**Summary**: Verify that overlapping detections with containment ratio above threshold are deduplicated, keeping the higher-confidence one.
**Traces to**: AC-DA-2
**Category**: Detection Accuracy

**Preconditions**:
- Engine is initialized
- Image produces overlapping detections (dense scene)

**Input data**: small-image (scene with clustered objects)

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with small-image and config `{"tracking_intersection_threshold": 0.6}` | 200 OK |
| 2 | Collect detections | No two detections of the same class overlap by more than 60% containment ratio |
| 3 | `POST /detect` with same image and config `{"tracking_intersection_threshold": 0.01}` | 200 OK |
| 4 | Compare result counts | Step 3 returns fewer or equal detections (more aggressive dedup) |

**Expected outcome**: No pair of returned detections exceeds the configured overlap threshold.
**Max execution time**: 30s

---
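The overlap check in step 2 can be sketched as follows. "Containment ratio" is interpreted here as intersection area over the smaller box's area; that reading is an assumption, since the document does not define the ratio formally:

```python
# Containment ratio of two boxes given as (left, top, right, bottom)
# in normalized coordinates: intersection area / smaller box area.
def containment_ratio(a: tuple, b: tuple) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    smaller = min(area_a, area_b)
    return inter / smaller if smaller > 0 else 0.0
```

A box fully contained in another scores 1.0, so the 0.6 threshold in step 1 would flag it for dedup regardless of relative sizes.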
### FT-P-07: Physical size filtering removes oversized detections

**Summary**: Verify that detections exceeding the MaxSizeM for their class (given GSD) are removed.
**Traces to**: AC-DA-4
**Category**: Detection Accuracy

**Preconditions**:
- Engine is initialized
- classes.json loaded with MaxSizeM values

**Input data**: small-image, config with known GSD parameters

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with small-image and config `{"altitude": 400, "focal_length": 24, "sensor_width": 23.5}` | 200 OK |
| 2 | For each detection, compute physical size from bounding box + GSD | No detection's physical size exceeds the MaxSizeM defined for its class in classes.json |

**Expected outcome**: All returned detections have plausible physical dimensions for their class.
**Max execution time**: 30s

---
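The size computation in step 2 follows directly from the normalized box and the GSD; a minimal consumer-side sketch (helper name is illustrative):

```python
# Physical width in meters: normalized box width × image width in pixels
# × ground sample distance (meters per pixel).
def physical_width_m(box_w_norm: float, image_width_px: int,
                     gsd_m_per_px: float) -> float:
    return box_w_norm * image_width_px * gsd_m_per_px
```

The test then compares this value against the class's MaxSizeM from classes.json.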
### FT-P-08: Async media detection returns "started" immediately

**Summary**: Verify that POST /detect/{media_id} returns immediately with status "started" while processing continues in background.
**Traces to**: AC-API-3
**Category**: API

**Preconditions**:
- Engine is initialized
- Media file paths are available via config

**Input data**: jwt-token, test-video path in config

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect/test-media-001` with config paths and auth headers | 200 OK, `{"status": "started"}` |
| 2 | Measure response time | Response arrives within 1s (before video processing completes) |

**Expected outcome**: Immediate response with `{"status": "started"}`. Processing continues asynchronously.
**Max execution time**: 2s (response only; processing continues in background)

---

### FT-P-09: SSE streaming delivers detection events during async processing

**Summary**: Verify that SSE clients receive real-time detection events during async media detection.
**Traces to**: AC-API-4, AC-API-3
**Category**: API

**Preconditions**:
- Engine is initialized
- SSE client connected before triggering detection

**Input data**: jwt-token, test-video path in config

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Open SSE connection: `GET /detect/stream` | Connection established |
| 2 | `POST /detect/test-media-002` with config and auth headers | `{"status": "started"}` |
| 3 | Listen on SSE connection | Receive events with `mediaStatus: "AIProcessing"` as frames are processed |
| 4 | Wait for completion | Final event with `mediaStatus: "AIProcessed"` and `percent: 100` |

**Expected outcome**: Multiple SSE events received. Events include detection data. Final event signals completion.
**Max execution time**: 120s

---
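The completion check in FT-P-09 steps 3–4 can be sketched as a small parser over raw SSE lines. The `mediaStatus`/`percent` field names come from the steps table; the `data:` framing is standard SSE, and the function name is illustrative:

```python
import json

# Collect JSON payloads from `data:` lines of an SSE stream and report
# whether a completion event (AIProcessed at 100%) was seen.
def parse_sse_events(lines: list[str]) -> tuple[list[dict], bool]:
    events, done = [], False
    for line in lines:
        if line.startswith("data:"):
            event = json.loads(line[len("data:"):].strip())
            events.append(event)
            if event.get("mediaStatus") == "AIProcessed" and event.get("percent") == 100:
                done = True
    return events, done
```

In the real consumer, sseclient-py would yield these events off the `GET /detect/stream` response instead of a line list.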
### FT-P-10: Video frame sampling processes every Nth frame

**Summary**: Verify that video processing respects the `frame_period_recognition` setting.
**Traces to**: AC-VP-1
**Category**: Video Processing

**Preconditions**:
- Engine is initialized
- SSE client connected

**Input data**: test-video (10s, 30fps = 300 frames), config `{"frame_period_recognition": 4}`

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Open SSE connection | Connection established |
| 2 | `POST /detect/test-media-003` with config `{"frame_period_recognition": 4, "paths": ["/media/test-video.mp4"]}` | `{"status": "started"}` |
| 3 | Count distinct SSE events with detection data | Number of processed frames ≈ 300/4 = 75 (±10% tolerance for start/end frames) |

**Expected outcome**: Approximately 75 frames processed (not all 300). The count scales proportionally with frame_period_recognition.
**Max execution time**: 120s

---
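The frame-count assertion in step 3 is just integer division plus a tolerance band; a minimal sketch:

```python
# Expected processed-frame count for every-Nth-frame sampling.
def expected_frames(total_frames: int, period: int) -> int:
    return total_frames // period

# ±10% tolerance band, as allowed for start/end frames in step 3.
def within_tolerance(observed: int, expected: int, tol: float = 0.10) -> bool:
    return abs(observed - expected) <= expected * tol

assert expected_frames(300, 4) == 75
```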
### FT-P-11: Video annotation interval enforcement

**Summary**: Verify that annotations are not reported more frequently than `frame_recognition_seconds`.
**Traces to**: AC-VP-2
**Category**: Video Processing

**Preconditions**:
- Engine is initialized
- SSE client connected

**Input data**: test-video, config `{"frame_recognition_seconds": 2}`

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Open SSE connection | Connection established |
| 2 | `POST /detect/test-media-004` with config `{"frame_recognition_seconds": 2, "paths": ["/media/test-video.mp4"]}` | `{"status": "started"}` |
| 3 | Record timestamps of consecutive SSE detection events | Minimum gap between consecutive annotation events ≥ 2 seconds |

**Expected outcome**: No two annotation events are closer than 2 seconds apart.
**Max execution time**: 120s

---
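The gap check in step 3 reduces to the minimum difference between consecutive timestamps; a minimal sketch:

```python
# Smallest spacing (seconds) between consecutive annotation event
# timestamps, which FT-P-11 requires to be >= frame_recognition_seconds.
def min_gap_s(timestamps: list[float]) -> float:
    return min(b - a for a, b in zip(timestamps, timestamps[1:]))

assert min_gap_s([0.0, 2.1, 4.3, 6.4]) >= 2.0
```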
### FT-P-12: Video tracking accepts new annotations on movement

**Summary**: Verify that new annotations are accepted when detections move beyond the tracking threshold.
**Traces to**: AC-VP-3
**Category**: Video Processing

**Preconditions**:
- Engine is initialized
- SSE client connected
- Video contains moving objects

**Input data**: test-video, config with `tracking_distance_confidence > 0`

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Open SSE connection | Connection established |
| 2 | `POST /detect/test-media-005` with config `{"tracking_distance_confidence": 0.05, "paths": ["/media/test-video.mp4"]}` | `{"status": "started"}` |
| 3 | Collect SSE events | Annotations are emitted when object positions change between frames |

**Expected outcome**: Annotations contain updated positions reflecting object movement. Static objects do not generate redundant annotations.
**Max execution time**: 120s

---

### FT-P-13: Weather mode class variants

**Summary**: Verify that the system supports detection across different weather mode class variants (Norm, Wint, Night).
**Traces to**: AC-OC-1
**Category**: Object Classes

**Preconditions**:
- Engine is initialized
- classes.json includes weather-mode variants

**Input data**: small-image

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with small-image | 200 OK |
| 2 | Inspect returned detection labels | Labels correspond to valid class names from classes.json (base or weather-variant) |

**Expected outcome**: All returned labels are valid entries from the 19-class × 3-mode registry.
**Max execution time**: 30s

---

### FT-P-14: Engine lazy initialization on first detection request

**Summary**: Verify that the engine is not initialized at startup but is initialized on the first detection request.
**Traces to**: AC-EL-1, AC-EL-2
**Category**: Engine Lifecycle

**Preconditions**:
- Fresh service start, no prior requests

**Input data**: small-image

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `GET /health` immediately after service starts | `aiAvailability: "None"` — engine not loaded |
| 2 | `POST /detect` with small-image | 200 OK (may take longer — engine initializing) |
| 3 | `GET /health` | `aiAvailability` changed to `"Enabled"` or status indicating engine is active |

**Expected outcome**: Engine transitions from "None" to an active state only after a detection request.
**Max execution time**: 60s

---

### FT-P-15: ONNX fallback when GPU unavailable

**Summary**: Verify that the system falls back to ONNX Runtime when no compatible GPU is available.
**Traces to**: AC-EL-2, RESTRICT-HW-1
**Category**: Engine Lifecycle

**Preconditions**:
- Detections service running WITHOUT GPU runtime (CPU-only Docker profile)
- Mock-loader serves ONNX model

**Input data**: small-image

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with small-image | 200 OK with detection results |
| 2 | `GET /health` | `aiAvailability` indicates engine is active (ONNX fallback) |

**Expected outcome**: Detection succeeds via ONNX Runtime. No TensorRT-related errors.
**Max execution time**: 60s

---

### FT-P-16: Tile deduplication removes duplicate detections at tile boundaries

**Summary**: Verify that detections appearing in overlapping tile regions are deduplicated.
**Traces to**: AC-DA-3
**Category**: Detection Accuracy

**Preconditions**:
- Engine is initialized
- Large image that triggers tiling

**Input data**: large-image with config including GSD parameters and `big_image_tile_overlap_percent: 20`

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect` with large-image and tiling config | 200 OK |
| 2 | Inspect detections near tile boundaries | No two detections of the same class are within 0.01 coordinate difference of each other (TILE_DUPLICATE_CONFIDENCE_THRESHOLD) |

**Expected outcome**: Tile boundary detections are merged. No duplicates with near-identical coordinates remain.
**Max execution time**: 60s

---

## Negative Scenarios

### FT-N-01: Empty image returns 400
|
||||||
|
|
||||||
|
**Summary**: Verify that submitting an empty file to POST /detect returns a 400 error.
|
||||||
|
**Traces to**: AC-API-2 (negative case)
|
||||||
|
**Category**: API
|
||||||
|
|
||||||
|
**Preconditions**:
|
||||||
|
- Detections service is running
|
||||||
|
|
||||||
|
**Input data**: empty-image (zero-byte file)
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected System Response |
|
||||||
|
|------|----------------|------------------------|
|
||||||
|
| 1 | `POST /detect` with empty-image as multipart file | 400 Bad Request |
|
||||||
|
|
||||||
|
**Expected outcome**: HTTP 400 with error message indicating empty or invalid image.
|
||||||
|
**Max execution time**: 5s
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### FT-N-02: Invalid image data returns 400
|
||||||
|
|
||||||
|
**Summary**: Verify that submitting a corrupt/non-image file returns a 400 error.
|
||||||
|
**Traces to**: AC-API-2 (negative case)
|
||||||
|
**Category**: API
|
||||||
|
|
||||||
|
**Preconditions**:
|
||||||
|
- Detections service is running
|
||||||
|
|
||||||
|
**Input data**: corrupt-image (random binary data)
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected System Response |
|
||||||
|
|------|----------------|------------------------|
|
||||||
|
| 1 | `POST /detect` with corrupt-image as multipart file | 400 Bad Request |
|
||||||
|
|
||||||
|
**Expected outcome**: HTTP 400. Image decoding fails gracefully with an error response (not a 500).
|
||||||
|
**Max execution time**: 5s
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### FT-N-03: Detection when engine unavailable returns 503
|
||||||
|
|
||||||
|
**Summary**: Verify that a detection request returns 503 when the engine cannot be initialized.
|
||||||
|
**Traces to**: AC-API-2 (negative case), AC-EL-2
|
||||||
|
**Category**: API, Engine Lifecycle
|
||||||
|
|
||||||
|
**Preconditions**:
|
||||||
|
- Mock-loader configured to return errors (model download fails)
|
||||||
|
- Engine has not been previously initialized
|
||||||
|
|
||||||
|
**Input data**: small-image
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
|
||||||
|
| Step | Consumer Action | Expected System Response |
|
||||||
|
|------|----------------|------------------------|
|
||||||
|
| 1 | Configure mock-loader to return 503 on model requests | — |
|
||||||
|
| 2 | `POST /detect` with small-image | 503 Service Unavailable or 422 |
|
||||||
|
|
||||||
|
**Expected outcome**: HTTP 503 or 422 error indicating engine is not available. No crash or unhandled exception.
|
||||||
|
**Max execution time**: 30s
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### FT-N-04: Duplicate media_id returns 409

**Summary**: Verify that submitting a second async detection request with an already-active media_id returns 409.

**Traces to**: AC-API-3 (negative case)

**Category**: API

**Preconditions**:

- Engine is initialized
- An async detection is already in progress for media_id "dup-test"

**Input data**: jwt-token, test-video

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | `POST /detect/dup-test` with config and auth headers | `{"status": "started"}` |
| 2 | Immediately `POST /detect/dup-test` again (same media_id) | 409 Conflict |

**Expected outcome**: Second request is rejected with 409. First detection continues normally.

**Max execution time**: 5s

---
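The duplicate-rejection behavior exercised above can be sketched as a small in-memory registry. This is an illustrative consumer-side model, not the service's actual code; the class and method names are ours:

```python
import threading

class ActiveDetections:
    """In-memory registry of media_ids that currently have a detection in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = set()

    def try_start(self, media_id: str) -> bool:
        """Atomically claim a media_id; False means the caller should return 409."""
        with self._lock:
            if media_id in self._active:
                return False
            self._active.add(media_id)
            return True

    def finish(self, media_id: str) -> None:
        """Release a media_id once its detection completes."""
        with self._lock:
            self._active.discard(media_id)

registry = ActiveDetections()
assert registry.try_start("dup-test") is True   # first request: started
assert registry.try_start("dup-test") is False  # second request: 409 Conflict
registry.finish("dup-test")
assert registry.try_start("dup-test") is True   # accepted again after completion
```

The lock makes the check-and-insert atomic, which matters because the two `POST` requests in the steps above may race.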
### FT-N-05: Missing classes.json prevents startup

**Summary**: Verify that the service fails or returns no detections when classes.json is not present.

**Traces to**: RESTRICT-SW-4

**Category**: Restrictions

**Preconditions**:

- Detections service started WITHOUT classes.json volume mount

**Input data**: None

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Attempt to start detections service without classes.json | Service fails to start OR starts with empty class registry |
| 2 | If started: `POST /detect` with small-image | Empty detections or error response |

**Expected outcome**: Service either fails to start or returns no detections. No unhandled crash.

**Max execution time**: 30s

---
### FT-N-06: Loader service unreachable during model download

**Summary**: Verify that the system handles the Loader service being unreachable during engine initialization.

**Traces to**: RESTRICT-ENV-1, AC-EL-2

**Category**: Resilience, Engine Lifecycle

**Preconditions**:

- Mock-loader is stopped or unreachable
- Engine not yet initialized

**Input data**: small-image

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Stop mock-loader service | — |
| 2 | `POST /detect` with small-image | Error response (503 or 422) |
| 3 | `GET /health` | `aiAvailability` reflects error state |

**Expected outcome**: Detection fails gracefully. Health endpoint reflects the engine error state.

**Max execution time**: 30s

---
### FT-N-07: Annotations service unreachable — detection continues

**Summary**: Verify that async detection continues even when the Annotations service is unreachable.

**Traces to**: RESTRICT-ENV-2

**Category**: Resilience

**Preconditions**:

- Engine is initialized
- Mock-annotations is stopped or returns errors
- SSE client connected

**Input data**: jwt-token, test-video

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Stop mock-annotations service | — |
| 2 | `POST /detect/test-media-006` with config and auth | `{"status": "started"}` |
| 3 | Listen on SSE | Detection events still arrive (annotations POST failure is silently caught) |
| 4 | Wait for completion | Final `AIProcessed` event received |

**Expected outcome**: Detection processing completes. SSE events are delivered. Annotations POST failure does not stop the detection pipeline.

**Max execution time**: 120s

---
### FT-N-08: SSE queue overflow is silently dropped

**Summary**: Verify that when an SSE client's queue reaches 100 events, additional events are dropped without error.

**Traces to**: AC-API-4

**Category**: API

**Preconditions**:

- Engine is initialized
- SSE client connected but NOT consuming events (stalled reader)

**Input data**: test-video (generates many events)

**Steps**:

| Step | Consumer Action | Expected System Response |
|------|----------------|------------------------|
| 1 | Open SSE connection but pause reading | Connection established |
| 2 | `POST /detect/test-media-007` with config that generates > 100 events | `{"status": "started"}` |
| 3 | Wait for processing to complete | No error on the detection side |
| 4 | Resume reading SSE | Receive ≤ 100 events (queue max depth) |

**Expected outcome**: No crash or error. Overflow events are silently dropped. Detection completes normally.

**Max execution time**: 120s
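The drop-on-overflow behavior can be sketched with a bounded `queue.Queue`. The 100-event depth is the spec value above; the `publish` helper is a hypothetical name for whatever the service does internally:

```python
import queue

MAX_QUEUE_DEPTH = 100  # per-client SSE queue limit from the spec

def publish(q: queue.Queue, event: dict) -> bool:
    """Enqueue an event; silently drop it when the client queue is full."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        return False  # overflow is dropped, never raised to the detection pipeline

client_queue = queue.Queue(maxsize=MAX_QUEUE_DEPTH)
for i in range(150):  # stalled reader: nothing is consumed
    publish(client_queue, {"seq": i})

assert client_queue.qsize() == MAX_QUEUE_DEPTH  # only the first 100 survive
```

`put_nowait` never blocks, so a stalled SSE client cannot stall the producer; events 101-150 simply vanish, matching the "silently dropped" expectation.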
@@ -0,0 +1,325 @@
# E2E Non-Functional Tests

## Performance Tests
### NFT-PERF-01: Single image detection latency

**Summary**: Measure end-to-end latency for a single small image detection request after the engine is warm.

**Traces to**: AC-API-2

**Metric**: Request-to-response latency (ms)

**Preconditions**:

- Engine is initialized and warm (at least 1 prior detection)

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Send 10 sequential `POST /detect` with small-image | Record each request-response latency |
| 2 | Compute p50, p95, p99 | — |

**Pass criteria**: p95 latency < 5000ms for ONNX CPU, p95 < 1000ms for TensorRT GPU

**Duration**: ~60s (10 requests)

---
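Step 2's percentile computation can be done with the nearest-rank method. This is one common convention, shown here with made-up sample latencies; adjust if your tooling interpolates instead:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Hypothetical latencies (ms) from 10 sequential POST /detect calls
latencies_ms = [812, 790, 905, 1340, 880, 760, 940, 1120, 870, 830]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
assert p95 < 5000  # ONNX CPU pass criterion from above
```

With only 10 samples, p95 and p99 both land on the slowest request, which is why the test sends the requests sequentially against a warm engine: a single cold-start outlier would dominate both tails.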
### NFT-PERF-02: Concurrent inference throughput

**Summary**: Verify the system handles 2 concurrent inference requests (ThreadPoolExecutor limit).

**Traces to**: RESTRICT-HW-3

**Metric**: Throughput (requests/second), latency under concurrency

**Preconditions**:

- Engine is initialized and warm

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Send 2 concurrent `POST /detect` requests with small-image | Measure both response times |
| 2 | Send 3 concurrent requests | Third request should queue behind the first two |
| 3 | Record total time for 3 concurrent requests vs 2 concurrent | — |

**Pass criteria**: 2 concurrent requests complete without error. 3 concurrent requests: total time > time for 2 (queuing observed).

**Duration**: ~30s

---
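The queuing expected in step 2 can be reproduced locally with a 2-worker pool. This is a sketch of the constraint, not the service's code; `fake_inference` is a stand-in for one inference call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_inference(i):
    time.sleep(0.2)  # stand-in for one inference call taking ~200 ms
    return i

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:  # same 2-worker limit as the service
    results = list(pool.map(fake_inference, range(3)))
elapsed = time.monotonic() - start

assert results == [0, 1, 2]
assert elapsed >= 0.4  # third task waits for a free worker: two batches of ~0.2 s
```

The pass criterion above is the same observation: three concurrent requests cannot finish in the time two take, because the third is queued until a worker frees up.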
### NFT-PERF-03: Large image tiling processing time

**Summary**: Measure processing time for a large image that triggers GSD-based tiling.

**Traces to**: AC-IP-2

**Metric**: Total processing time (ms), tiles processed

**Preconditions**:

- Engine is initialized and warm

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | `POST /detect` with large-image (4000×3000) and GSD config | Record total response time |
| 2 | Compare with small-image baseline from NFT-PERF-01 | Ratio indicates tiling overhead |

**Pass criteria**: Request completes within 120s. Processing time scales proportionally with the number of tiles (not exponentially).

**Duration**: ~120s

---
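As a rough model of the proportional-scaling criterion: with a fixed tile size and an overlap percentage, the tile count grows with image area, and processing time should track that count. The formula below is an illustrative approximation (it ignores GSD rescaling), not the service's exact tiling algorithm:

```python
import math

def tile_count(width, height, tile=1280, overlap_pct=20):
    """Approximate number of overlapping fixed-size tiles needed to cover an image.

    Assumes a simple grid with a stride of tile * (1 - overlap); the real
    GSD-based tiler may differ.
    """
    stride = tile * (1 - overlap_pct / 100)
    cols = max(1, math.ceil((width - tile) / stride) + 1)
    rows = max(1, math.ceil((height - tile) / stride) + 1)
    return cols * rows

small = tile_count(640, 480)     # fits in one tile
large = tile_count(4000, 3000)   # triggers tiling
assert small == 1
assert large > small  # processing time should scale roughly with this ratio
```

Comparing the measured large-image/small-image time ratio against this tile ratio is one way to check "proportional, not exponential" in step 2.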
### NFT-PERF-04: Video processing frame rate

**Summary**: Measure effective frame processing rate during video detection.

**Traces to**: AC-VP-1

**Metric**: Frames processed per second, total processing time

**Preconditions**:

- Engine is initialized and warm
- SSE client connected

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | `POST /detect/test-media-perf` with test-video and `frame_period_recognition: 4` | — |
| 2 | Count SSE events and measure total time from "started" to "AIProcessed" | Compute frames/second |

**Pass criteria**: Processing completes within 5× video duration (10s video → < 50s processing). Frame processing rate is consistent (no stalls > 10s between events).

**Duration**: ~120s

---
## Resilience Tests

### NFT-RES-01: Loader service outage after engine initialization

**Summary**: Verify that detections continue working when the Loader service goes down after the engine is already loaded.

**Traces to**: RESTRICT-ENV-1

**Preconditions**:

- Engine is initialized (model already downloaded)

**Fault injection**:

- Stop mock-loader service

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Stop mock-loader | — |
| 2 | `POST /detect` with small-image | 200 OK — detection succeeds (engine already in memory) |
| 3 | `GET /health` | `aiAvailability` remains "Enabled" |

**Pass criteria**: Detection continues to work. Health status remains stable. No errors from loader unavailability.

---
### NFT-RES-02: Annotations service outage during async detection

**Summary**: Verify that async detection completes and delivers SSE events even when the Annotations service is down.

**Traces to**: RESTRICT-ENV-2

**Preconditions**:

- Engine is initialized
- SSE client connected

**Fault injection**:

- Stop mock-annotations mid-processing

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Start async detection: `POST /detect/test-media-res01` | `{"status": "started"}` |
| 2 | After the first few SSE events, stop mock-annotations | — |
| 3 | Continue listening to SSE | Events continue arriving. Annotations POST failures are silently caught |
| 4 | Wait for completion | Final `AIProcessed` event received |

**Pass criteria**: Detection pipeline completes fully. SSE delivery is unaffected. No crash or 500 errors.

---
### NFT-RES-03: Engine initialization retry after transient loader failure

**Summary**: Verify that if model download fails on the first attempt, a subsequent detection request retries initialization.

**Traces to**: AC-EL-2

**Preconditions**:

- Fresh service (engine not initialized)

**Fault injection**:

- Mock-loader returns 503 on first model request, then recovers

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Configure mock-loader to fail first request | — |
| 2 | `POST /detect` with small-image | Error (503 or 422) |
| 3 | Configure mock-loader to succeed | — |
| 4 | `POST /detect` with small-image | 200 OK — engine initializes on retry |

**Pass criteria**: Second detection succeeds after loader recovers. System does not permanently lock into an error state.

---
### NFT-RES-04: Service restart with in-memory state loss

**Summary**: Verify that after a service restart, all in-memory state (`_active_detections`, `_event_queues`) is cleanly reset.

**Traces to**: RESTRICT-OP-5, RESTRICT-OP-6

**Preconditions**:

- Previous detection may have been in progress

**Fault injection**:

- Restart detections container

**Steps**:

| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Restart detections container | — |
| 2 | `GET /health` | Returns `aiAvailability: "None"` (fresh start) |
| 3 | `POST /detect/any-media-id` | Accepted (no stale `_active_detections` blocking it) |

**Pass criteria**: No stale state from previous session. All endpoints functional after restart.

---
## Security Tests

### NFT-SEC-01: Malformed multipart payload handling

**Summary**: Verify that the service handles malformed multipart requests without crashing.

**Traces to**: AC-API-2 (security)

**Steps**:

| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Send `POST /detect` with truncated multipart body (missing boundary) | 400 or 422 — not 500 |
| 2 | Send `POST /detect` with Content-Type: multipart but no file part | 400 — empty image |
| 3 | `GET /health` after malformed requests | Service is still healthy |

**Pass criteria**: All malformed requests return 4xx. Service remains operational.

---
### NFT-SEC-02: Oversized request body

**Summary**: Verify system behavior when an extremely large file is uploaded.

**Traces to**: RESTRICT-OP-4

**Steps**:

| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Send `POST /detect` with a 500 MB random file | Error response (413, 400, or timeout) — not an OOM crash |
| 2 | `GET /health` | Service is still running |

**Pass criteria**: Service does not crash or run out of memory. Returns an error or times out gracefully.

---
### NFT-SEC-03: JWT token is forwarded without modification

**Summary**: Verify that the Authorization header is forwarded to the Annotations service as-is.

**Traces to**: AC-API-3

**Steps**:

| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | `POST /detect/test-media-sec` with `Authorization: Bearer test-jwt-123` and `x-refresh-token: refresh-456` | `{"status": "started"}` |
| 2 | After processing, query mock-annotations `GET /mock/annotations` | Recorded request contains `Authorization: Bearer test-jwt-123` header |

**Pass criteria**: The exact token received by mock-annotations matches what the consumer sent.

---
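The pass-through contract can be sketched as a filter over incoming headers. The helper name is ours; the real service forwards these headers via its HTTP client without inspecting or rewriting them:

```python
def forward_headers(incoming: dict) -> dict:
    """Select the auth headers to forward to Annotations, unchanged."""
    passthrough = ("Authorization", "x-refresh-token")
    return {k: v for k, v in incoming.items() if k in passthrough}

incoming = {
    "Authorization": "Bearer test-jwt-123",
    "x-refresh-token": "refresh-456",
    "Host": "detections.local",  # transport header, not forwarded
}
forwarded = forward_headers(incoming)
assert forwarded["Authorization"] == "Bearer test-jwt-123"  # byte-for-byte identical
```

The test's step 2 verifies exactly this: the token recorded by mock-annotations must be byte-for-byte what the consumer sent, which also implies the service never validates the JWT signature itself.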
## Resource Limit Tests

### NFT-RES-LIM-01: ThreadPoolExecutor worker limit (2 concurrent)

**Summary**: Verify that no more than 2 inference operations run simultaneously.

**Traces to**: RESTRICT-HW-3

**Preconditions**:

- Engine is initialized

**Monitoring**:

- Track concurrent request timings

**Steps**:

| Step | Consumer Action | Expected Behavior |
|------|----------------|------------------|
| 1 | Send 4 concurrent `POST /detect` requests | — |
| 2 | Measure response arrival times | First 2 complete roughly together; next 2 complete after |

**Duration**: ~60s

**Pass criteria**: Clear evidence of 2-at-a-time processing (second batch starts after first completes). All 4 requests eventually succeed.

---
### NFT-RES-LIM-02: SSE queue depth limit (100 events)

**Summary**: Verify that the SSE queue per client does not exceed 100 events.

**Traces to**: AC-API-4

**Preconditions**:

- Engine is initialized

**Monitoring**:

- SSE event count

**Steps**:

| Step | Consumer Action | Expected Behavior |
|------|----------------|------------------|
| 1 | Open SSE connection but do not read (stall client) | — |
| 2 | Trigger async detection that produces > 100 events | — |
| 3 | After processing completes, drain the SSE queue | ≤ 100 events received |

**Duration**: ~120s

**Pass criteria**: No more than 100 events buffered. No OOM or connection errors from queue growth.

---
### NFT-RES-LIM-03: Max 300 detections per frame

**Summary**: Verify that the system returns at most 300 detections per frame (model output limit).

**Traces to**: RESTRICT-SW-6

**Preconditions**:

- Engine is initialized
- Image with a dense scene expected to produce many detections

**Monitoring**:

- Detection count per response

**Duration**: ~30s

**Pass criteria**: No response contains more than 300 detections. Dense images hit the cap without errors.

---
### NFT-RES-LIM-04: Log file rotation and retention

**Summary**: Verify that log files rotate daily and are retained for 30 days.

**Traces to**: AC-LOG-1, AC-LOG-2

**Preconditions**:

- Detections service running with Logs/ volume mounted for inspection

**Monitoring**:

- Log file creation, naming, and count

**Steps**:

| Step | Consumer Action | Expected Behavior |
|------|----------------|------------------|
| 1 | Make several detection requests | Logs written to `Logs/log_inference_YYYYMMDD.txt` |
| 2 | Verify log file name matches current date | File name contains today's date |
| 3 | Verify log content format | Contains INFO/DEBUG/WARNING entries with timestamps |

**Duration**: ~10s

**Pass criteria**: Log file exists with correct date-based naming. Content includes structured log entries.
@@ -0,0 +1,41 @@
# E2E Test Data Management

## Seed Data Sets

| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|----------|-------------|---------------|-----------|---------|
| onnx-model | Small YOLO ONNX model (valid architecture, 1280×1280 input, 19 classes) | All detection tests | Volume mount to mock-loader `/models/azaion.onnx` | Container restart |
| classes-json | classes.json with 19 detection classes, 3 weather modes, MaxSizeM values | All tests | Volume mount to detections `/app/classes.json` | Container restart |
| small-image | JPEG image 640×480 — below 1.5× model size (1920×1920 threshold) | FT-P-03, FT-P-05, FT-P-06, FT-P-07, FT-N-01, FT-N-02, NFT-PERF-01 | Volume mount to consumer `/media/` | N/A (read-only) |
| large-image | JPEG image 4000×3000 — above 1.5× model size, triggers tiling | FT-P-04, FT-P-16, NFT-PERF-03 | Volume mount to consumer `/media/` | N/A (read-only) |
| test-video | MP4 video, 10s duration, 30fps — contains objects across frames | FT-P-10, FT-P-11, FT-P-12, NFT-PERF-04 | Volume mount to consumer `/media/` | N/A (read-only) |
| empty-image | Zero-byte file | FT-N-01 | Volume mount to consumer `/media/` | N/A (read-only) |
| corrupt-image | Binary garbage (not a valid image format) | FT-N-02 | Volume mount to consumer `/media/` | N/A (read-only) |
| jwt-token | Valid JWT with exp claim (not signature-verified by detections) | FT-P-08, FT-P-09 | Generated by consumer at runtime | N/A |

## Data Isolation Strategy

Each test run starts with fresh containers (`docker compose down -v && docker compose up`). The detections service is stateless — no persistent data between runs. Mock services reset their state on container restart. Tests that modify mock behavior (e.g., making the loader unreachable) must run in isolated test groups.

## Input Data Mapping

| Input Data File | Source Location | Description | Covers Scenarios |
|-----------------|----------------|-------------|-----------------|
| data_parameters.md | `_docs/00_problem/input_data/data_parameters.md` | API parameter schemas, config defaults, classes.json structure | Informs all test input construction |

## External Dependency Mocks

| External Service | Mock/Stub | How Provided | Behavior |
|-----------------|-----------|-------------|----------|
| Loader Service | HTTP stub | Docker service `mock-loader` | Serves ONNX model from volume on `GET /models/azaion.onnx`. Accepts TensorRT upload on `POST /upload`. Returns 404 for unknown files. Configurable: can simulate downtime (503) via control endpoint `POST /mock/config`. |
| Annotations Service | HTTP stub | Docker service `mock-annotations` | Accepts annotation POST on `POST /annotations` — stores in memory for verification. Provides token refresh on `POST /auth/refresh`. Configurable: can simulate downtime (503) via control endpoint `POST /mock/config`. Returns recorded annotations on `GET /mock/annotations` for test assertions. |

## Data Validation Rules

| Data Type | Validation | Invalid Examples | Expected System Behavior |
|-----------|-----------|-----------------|------------------------|
| Image file (POST /detect) | Non-empty bytes, decodable by OpenCV | Zero-byte file, random binary, text file | 400 Bad Request |
| media_id (POST /detect/{media_id}) | String, unique among active detections | Already-active media_id | 409 Conflict |
| AIConfigDto fields | probability_threshold: 0.0–1.0; frame_period_recognition: positive int; big_image_tile_overlap_percent: 0–100 | probability_threshold: -1 or 2.0; frame_period_recognition: 0 | System uses defaults or returns validation error |
| Authorization header | Bearer token format | Missing header, malformed JWT | Token forwarded to Annotations as-is; detections still proceeds |
| classes.json | JSON array of objects with Id, Name, Color, MaxSizeM | Missing file, empty array, malformed JSON | Service fails to start / returns empty detections |
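The "uses defaults or returns validation error" rule for AIConfigDto fields can be sketched as a clamp-to-default check. Field names and ranges come from the table above; the default values shown are placeholders, not the service's actual defaults:

```python
DEFAULTS = {
    "probability_threshold": 0.5,          # placeholder default; valid range 0.0-1.0
    "frame_period_recognition": 4,         # placeholder default; must be a positive int
    "big_image_tile_overlap_percent": 20,  # placeholder default; valid range 0-100
}

VALIDATORS = {
    "probability_threshold": lambda v: isinstance(v, (int, float)) and 0.0 <= v <= 1.0,
    "frame_period_recognition": lambda v: isinstance(v, int) and v > 0,
    "big_image_tile_overlap_percent": lambda v: isinstance(v, (int, float)) and 0 <= v <= 100,
}

def sanitize(config: dict) -> dict:
    """Replace missing or out-of-range fields with defaults, per the validation table."""
    return {
        key: config[key] if key in config and VALIDATORS[key](config[key]) else default
        for key, default in DEFAULTS.items()
    }

clean = sanitize({"probability_threshold": 2.0, "frame_period_recognition": 0})
assert clean["probability_threshold"] == 0.5           # invalid -> default
assert clean["frame_period_recognition"] == 4          # invalid -> default
assert clean["big_image_tile_overlap_percent"] == 20   # missing -> default
```

Tests exercising invalid config values should accept either outcome the table allows: a sanitized default like this, or an explicit validation error from the API.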
@@ -0,0 +1,70 @@
# E2E Traceability Matrix

## Acceptance Criteria Coverage

| AC ID | Acceptance Criterion | Test IDs | Coverage |
|-------|---------------------|----------|----------|
| AC-DA-1 | Detections with confidence below probability_threshold are filtered out | FT-P-03, FT-P-05 | Covered |
| AC-DA-2 | Overlapping detections with containment ratio > tracking_intersection_threshold are deduplicated | FT-P-06 | Covered |
| AC-DA-3 | Tile duplicate detections identified when bounding box coordinates differ by < 0.01 | FT-P-16 | Covered |
| AC-DA-4 | Physical size filtering: detections exceeding max_object_size_meters removed | FT-P-07 | Covered |
| AC-VP-1 | Frame sampling: every Nth frame processed (frame_period_recognition) | FT-P-10, NFT-PERF-04 | Covered |
| AC-VP-2 | Minimum annotation interval: frame_recognition_seconds between annotations | FT-P-11 | Covered |
| AC-VP-3 | Tracking: new annotation accepted on movement/confidence change | FT-P-12 | Covered |
| AC-IP-1 | Images ≤ 1.5× model dimensions processed as single frame | FT-P-03 | Covered |
| AC-IP-2 | Larger images: tiled based on GSD, tile overlap configurable | FT-P-04, FT-P-16, NFT-PERF-03 | Covered |
| AC-API-1 | GET /health returns status: "healthy" with aiAvailability | FT-P-01, FT-P-02 | Covered |
| AC-API-2 | POST /detect returns detections synchronously. Errors: 400, 422, 503 | FT-P-03, FT-N-01, FT-N-02, FT-N-03, NFT-SEC-01 | Covered |
| AC-API-3 | POST /detect/{media_id} returns immediately with "started". Rejects duplicate with 409 | FT-P-08, FT-N-04, NFT-SEC-03 | Covered |
| AC-API-4 | GET /detect/stream delivers SSE events. Queue max depth: 100 | FT-P-09, FT-N-08, NFT-RES-LIM-02 | Covered |
| AC-EL-1 | Engine initialization is lazy (first detection, not startup) | FT-P-01, FT-P-14 | Covered |
| AC-EL-2 | Status transitions: NONE → DOWNLOADING → ENABLED / ERROR | FT-P-02, FT-P-14, FT-N-03, NFT-RES-03 | Covered |
| AC-EL-3 | GPU check: NVIDIA GPU with compute capability ≥ 6.1 | FT-P-15 | Covered |
| AC-EL-4 | TensorRT conversion uses FP16 when GPU supports it | — | NOT COVERED — requires specific GPU hardware; verified by visual inspection of TensorRT build logs |
| AC-EL-5 | Background conversion does not block API responsiveness | FT-P-01, FT-P-14 | Covered |
| AC-LOG-1 | Log files: Logs/log_inference_YYYYMMDD.txt | NFT-RES-LIM-04 | Covered |
| AC-LOG-2 | Rotation: daily. Retention: 30 days | NFT-RES-LIM-04 | Covered |
| AC-OC-1 | 19 base classes, 3 weather modes, up to 57 variants | FT-P-13 | Covered |
| AC-OC-2 | Each class has Id, Name, Color, MaxSizeM | FT-P-07, FT-P-13 | Covered |

## Restrictions Coverage

| Restriction ID | Restriction | Test IDs | Coverage |
|---------------|-------------|----------|----------|
| RESTRICT-HW-1 | GPU CC ≥ 6.1 required for TensorRT | FT-P-15 | Covered |
| RESTRICT-HW-2 | TensorRT conversion uses 90% GPU memory workspace | — | NOT COVERED — requires controlled GPU memory environment; verified during manual engine build |
| RESTRICT-HW-3 | ThreadPoolExecutor limited to 2 workers | NFT-PERF-02, NFT-RES-LIM-01 | Covered |
| RESTRICT-SW-1 | Python 3 + Cython 3.1.3 compilation required | — | NOT COVERED — build-time constraint; verified by Docker build succeeding |
| RESTRICT-SW-2 | ONNX model (azaion.onnx) must be available via Loader | FT-N-06, NFT-RES-01, NFT-RES-03 | Covered |
| RESTRICT-SW-3 | TensorRT engines are GPU-architecture-specific (not portable) | — | NOT COVERED — requires multiple GPU architectures; documented constraint |
| RESTRICT-SW-4 | classes.json must exist at startup | FT-N-05 | Covered |
| RESTRICT-SW-5 | Model input: fixed 1280×1280 | FT-P-03, FT-P-04 | Covered |
| RESTRICT-SW-6 | Max 300 detections per frame | NFT-RES-LIM-03 | Covered |
| RESTRICT-ENV-1 | LOADER_URL must be reachable for model download | FT-N-06, NFT-RES-01, NFT-RES-03 | Covered |
| RESTRICT-ENV-2 | ANNOTATIONS_URL must be reachable for result posting | FT-N-07, NFT-RES-02 | Covered |
| RESTRICT-ENV-3 | Logs/ directory must be writable | NFT-RES-LIM-04 | Covered |
| RESTRICT-OP-1 | Stateless — no local persistence of detection results | NFT-RES-04 | Covered |
| RESTRICT-OP-2 | No TLS at application level | — | NOT COVERED — infrastructure-level concern; out of scope for application E2E tests |
| RESTRICT-OP-3 | No CORS configuration | — | NOT COVERED — requires browser-based testing; out of scope for API-level E2E |
| RESTRICT-OP-4 | No rate limiting | NFT-SEC-02 | Covered |
| RESTRICT-OP-5 | No graceful shutdown — in-progress detections not drained | NFT-RES-04 | Covered |
| RESTRICT-OP-6 | Single-instance in-memory state (not shared across instances) | NFT-RES-04 | Covered |

## Coverage Summary

| Category | Total Items | Covered | Not Covered | Coverage % |
|----------|-----------|---------|-------------|-----------|
| Acceptance Criteria | 22 | 21 | 1 | 95% |
| Restrictions | 18 | 13 | 5 | 72% |
| **Total** | **40** | **34** | **6** | **85%** |

## Uncovered Items Analysis

| Item | Reason Not Covered | Risk | Mitigation |
|------|-------------------|------|-----------|
| AC-EL-4 (FP16 TensorRT) | Requires specific GPU with FP16 support; E2E test cannot control hardware capabilities | Low — TensorRT builder auto-detects FP16 | Verified during manual TensorRT build; logged by engine |
| RESTRICT-HW-2 (90% GPU memory) | Requires controlled GPU memory environment with specific memory sizes | Low — hardcoded workspace fraction | Verified by observing TensorRT build logs on target hardware |
| RESTRICT-SW-1 (Cython compilation) | Build-time constraint, not runtime behavior | Low — Docker build validates this | Docker build step serves as the validation gate |
| RESTRICT-SW-3 (TensorRT non-portable) | Requires multiple GPU architectures in test environment | Low — engine filename encodes architecture | Architecture-specific filenames prevent incorrect loading |
| RESTRICT-OP-2 (No TLS) | Infrastructure-level concern; application does not implement TLS | None — by design | TLS handled by reverse proxy / service mesh in deployment |
| RESTRICT-OP-3 (No CORS) | Browser-specific concern; API-level E2E tests don't use browsers | Low — known limitation | Can be tested separately with browser automation if needed |
@@ -0,0 +1,68 @@
# Module: ai_availability_status

## Purpose

Thread-safe status tracker for the AI engine lifecycle (downloading, converting, uploading, enabled, warning, error).

## Public Interface

### Enum: AIAvailabilityEnum

| Value | Name | Meaning |
|-------|------|---------|
| 0 | NONE | Initial state, not yet initialized |
| 10 | DOWNLOADING | Model download in progress |
| 20 | CONVERTING | ONNX-to-TensorRT conversion in progress |
| 30 | UPLOADING | Converted model upload in progress |
| 200 | ENABLED | Engine ready for inference |
| 300 | WARNING | Operational with warnings |
| 500 | ERROR | Failed, not operational |

### Class: AIAvailabilityStatus

| Field | Type | Description |
|-------|------|-------------|
| `status` | AIAvailabilityEnum | Current status |
| `error_message` | str or None | Error/warning details |

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `()` | Sets status=NONE, error_message=None |
| `__str__` | `() -> str` | Thread-safe formatted string: `"StatusText ErrorText"` |
| `serialize` | `() -> bytes` | Thread-safe msgpack serialization `{s: status, m: error_message}` **(legacy — not called in current codebase)** |
| `set_status` | `(AIAvailabilityEnum status, str error_message=None) -> void` | Thread-safe status update; logs via constants_inf (error or info) |

## Internal Logic

All public methods acquire a `threading.Lock` before reading/writing status fields. `set_status` logs the transition: errors go to `constants_inf.logerror`, normal transitions go to `constants_inf.log`.
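The lock-per-method pattern described above can be sketched as follows (a minimal pure-Python illustration, not the Cython source; `print` stands in for the `constants_inf.log`/`logerror` helpers):

```python
import threading
from enum import IntEnum

class AIAvailabilityEnum(IntEnum):
    NONE = 0
    DOWNLOADING = 10
    CONVERTING = 20
    UPLOADING = 30
    ENABLED = 200
    WARNING = 300
    ERROR = 500

class AIAvailabilityStatus:
    def __init__(self):
        self._lock = threading.Lock()
        self.status = AIAvailabilityEnum.NONE
        self.error_message = None

    def set_status(self, status, error_message=None):
        # Every read/write of the shared fields happens under the lock.
        with self._lock:
            self.status = status
            self.error_message = error_message
            # Placeholder for constants_inf.logerror / constants_inf.log.
            print(f"status -> {status.name} {error_message or ''}".strip())

    def __str__(self):
        with self._lock:
            return f"{self.status.name} {self.error_message or ''}".strip()
```

Because both the FastAPI event loop and the ThreadPoolExecutor touch this object, every accessor takes the lock, not just the writer.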
## Dependencies

- **External**: `msgpack`, `threading`
- **Internal**: `constants_inf` (logging)

## Consumers

- `inference` — creates instance, calls `set_status` during engine lifecycle, exposes `ai_availability_status` for health checks
- `main` — reads `ai_availability_status` via inference for `/health` endpoint

## Data Models

- `AIAvailabilityEnum` — status enum
- `AIAvailabilityStatus` — stateful status holder

## Configuration

None.

## External Integrations

None.

## Security

Thread-safe via Lock — safe for concurrent access from FastAPI async + ThreadPoolExecutor.

## Tests

None found.
@@ -0,0 +1,69 @@
# Module: ai_config

## Purpose

Data class holding all AI recognition configuration parameters, with factory methods for deserialization from msgpack and dict formats.

## Public Interface

### Class: AIRecognitionConfig

#### Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `frame_period_recognition` | int | 4 | Process every Nth frame in video |
| `frame_recognition_seconds` | double | 2.0 | Minimum seconds between valid video annotations |
| `probability_threshold` | double | 0.25 | Minimum detection confidence |
| `tracking_distance_confidence` | double | 0.0 | Distance threshold for tracking (model-width units) |
| `tracking_probability_increase` | double | 0.0 | Required confidence increase for tracking update |
| `tracking_intersection_threshold` | double | 0.6 | IoU threshold for overlapping detection removal |
| `file_data` | bytes | `b''` | Raw file data (msgpack use) |
| `paths` | list[str] | `[]` | Media file paths to process |
| `model_batch_size` | int | 1 | Batch size for inference |
| `big_image_tile_overlap_percent` | int | 20 | Tile overlap percentage for large image splitting |
| `altitude` | double | 400 | Camera altitude in meters |
| `focal_length` | double | 24 | Camera focal length in mm |
| `sensor_width` | double | 23.5 | Camera sensor width in mm |

#### Methods

| Method | Signature | Description |
|--------|-----------|-------------|
| `from_msgpack` | `(bytes data) -> AIRecognitionConfig` | Static cdef; deserializes from msgpack binary |
| `from_dict` | `(dict data) -> AIRecognitionConfig` | Static def; deserializes from Python dict |

## Internal Logic

Both factory methods apply defaults for missing keys. `from_msgpack` uses compact single-character keys (`f_pr`, `pt`, `t_dc`, etc.) while `from_dict` uses full descriptive keys.

**Legacy/unused**: `from_msgpack()` is defined but never called in the current codebase — it is a remnant of a previous queue-based architecture. Only `from_dict()` is actively used. The `file_data` field is stored but never read anywhere.
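The defaults-on-missing-keys behavior of `from_dict` can be sketched as follows (field names and defaults taken from the table above; a plain-Python illustration, not the Cython implementation):

```python
class AIRecognitionConfig:
    # (field, default) pairs as documented in the Fields table.
    _DEFAULTS = {
        "frame_period_recognition": 4,
        "frame_recognition_seconds": 2.0,
        "probability_threshold": 0.25,
        "tracking_distance_confidence": 0.0,
        "tracking_probability_increase": 0.0,
        "tracking_intersection_threshold": 0.6,
        "model_batch_size": 1,
        "big_image_tile_overlap_percent": 20,
        "altitude": 400.0,
        "focal_length": 24.0,
        "sensor_width": 23.5,
    }

    @staticmethod
    def from_dict(data):
        cfg = AIRecognitionConfig()
        for key, default in AIRecognitionConfig._DEFAULTS.items():
            # A missing key silently falls back to the documented default.
            setattr(cfg, key, data.get(key, default))
        cfg.paths = list(data.get("paths", []))
        return cfg
```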
## Dependencies

- **External**: `msgpack`
- **Internal**: none (leaf module)

## Consumers

- `inference` — creates config from dict, uses all fields for frame selection, detection filtering, image tiling, and tracking

## Data Models

- `AIRecognitionConfig` — the sole data class

## Configuration

Camera/altitude parameters (`altitude`, `focal_length`, `sensor_width`) are used for ground sampling distance calculation in aerial image processing.

## External Integrations

None.

## Security

None.

## Tests

None found.
@@ -0,0 +1,83 @@
# Module: annotation

## Purpose

Data models for object detections and annotations (grouped detections for a frame/tile with metadata).

## Public Interface

### Class: Detection

Represents a single bounding box detection in normalized coordinates.

| Field | Type | Description |
|-------|------|-------------|
| `x` | double | Center X (normalized 0..1) |
| `y` | double | Center Y (normalized 0..1) |
| `w` | double | Width (normalized 0..1) |
| `h` | double | Height (normalized 0..1) |
| `cls` | int | Class ID (maps to constants_inf.annotations_dict) |
| `confidence` | double | Detection confidence (0..1) |
| `annotation_name` | str | Parent annotation name (set after construction) |

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(double x, y, w, h, int cls, double confidence)` | Constructor |
| `__str__` | `() -> str` | Format: `"{cls}: {x} {y} {w} {h}, prob: {confidence}%"` |
| `__eq__` | `(other) -> bool` | Two detections are equal if all bbox coordinates differ by less than `TILE_DUPLICATE_CONFIDENCE_THRESHOLD` |
| `overlaps` | `(Detection det2, float confidence_threshold) -> bool` | Returns True if IoU-like overlap ratio (overlap area / min area) exceeds threshold |

### Class: Annotation

Groups detections for a single frame or image tile.

| Field | Type | Description |
|-------|------|-------------|
| `name` | str | Unique annotation name (encodes tile/time info) |
| `original_media_name` | str | Source media filename (without extension/spaces) |
| `time` | long | Timestamp in milliseconds (video) or 0 (image) |
| `detections` | list[Detection] | Detections found in this frame/tile |
| `image` | bytes | JPEG-encoded frame image (set after validation) |

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(str name, str original_media_name, long ms, list[Detection] detections)` | Sets annotation_name on all detections |
| `__str__` | `() -> str` | Formatted detection summary |
| `serialize` | `() -> bytes` | Msgpack serialization with compact keys **(legacy — not called in current codebase)** |

## Internal Logic

- `Detection.__eq__` uses `constants_inf.TILE_DUPLICATE_CONFIDENCE_THRESHOLD` (0.01) to determine if two detections at absolute coordinates are duplicates across adjacent tiles.
- `Detection.overlaps` computes the overlap as `overlap_area / min(area1, area2)` — this is not standard IoU but a containment-biased metric.
- `Annotation.__init__` sets `annotation_name` on every child detection.
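The containment-biased overlap check can be expressed as a standalone function (a sketch over `(cx, cy, w, h)` tuples following the documented semantics, not the Cython method itself):

```python
def overlaps(a, b, threshold):
    """a, b: (cx, cy, w, h) boxes; ratio = overlap_area / min(area_a, area_b)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    ow = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    oh = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height
    min_area = min(a[2] * a[3], b[2] * b[3])
    # Dividing by the smaller area (instead of the union, as IoU does) means
    # a box fully contained in a larger one scores 1.0.
    return min_area > 0 and ow * oh / min_area > threshold
```

This is why the metric is containment-biased: a small detection entirely inside a larger one always exceeds any threshold below 1.0, whereas its IoU could be arbitrarily small.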
## Dependencies

- **External**: `msgpack`
- **Internal**: `constants_inf` (TILE_DUPLICATE_CONFIDENCE_THRESHOLD constant)

## Consumers

- `inference` — creates Detection and Annotation instances during postprocessing, uses overlaps for NMS, uses equality for tile dedup
- `main` — reads Detection fields for DTO conversion

## Data Models

- `Detection` — bounding box + class + confidence
- `Annotation` — frame/tile container for detections + metadata + image

## Configuration

None.

## External Integrations

None.

## Security

None.

## Tests

None found.
@@ -0,0 +1,95 @@
# Module: constants_inf

## Purpose

Application-wide constants, logging infrastructure, and the object detection class registry loaded from `classes.json`.

## Public Interface

### Constants

| Name | Type | Value | Description |
|------|------|-------|-------------|
| `CONFIG_FILE` | str | `"config.yaml"` | Configuration file path |
| `QUEUE_CONFIG_FILENAME` | str | `"secured-config.json"` | Queue config filename |
| `AI_ONNX_MODEL_FILE` | str | `"azaion.onnx"` | ONNX model filename |
| `CDN_CONFIG` | str | `"cdn.yaml"` | CDN configuration file |
| `MODELS_FOLDER` | str | `"models"` | Directory for model files |
| `SMALL_SIZE_KB` | int | `3` | Small file size threshold (KB) |
| `SPLIT_SUFFIX` | str | `"!split!"` | Delimiter in tiled image names |
| `TILE_DUPLICATE_CONFIDENCE_THRESHOLD` | double | `0.01` | Threshold for tile duplicate detection equality |
| `METERS_IN_TILE` | int | `25` | Physical tile size in meters for large image splitting |
| `weather_switcher_increase` | int | `20` | Offset between weather mode class ID ranges |

### Enum: WeatherMode

| Value | Name | Meaning |
|-------|------|---------|
| 0 | Norm | Normal weather |
| 20 | Wint | Winter |
| 40 | Night | Night |

### Class: AnnotationClass

Fields: `id` (int), `name` (str), `color` (str), `max_object_size_meters` (int).

Represents a detection class with its display metadata and physical size constraint.

### Functions

| Function | Signature | Description |
|----------|-----------|-------------|
| `log` | `(str log_message) -> void` | Info-level log via loguru |
| `logerror` | `(str error) -> void` | Error-level log via loguru |
| `format_time` | `(int ms) -> str` | Converts milliseconds to compact time string `HMMSSf` |

### Global: `annotations_dict`

`dict[int, AnnotationClass]` — loaded at module init from `classes.json`. Contains 19 base classes × 3 weather modes (Norm/Wint/Night) = up to 57 entries. Keys are class IDs, values are `AnnotationClass` instances.

## Internal Logic

- On import, reads `classes.json` and builds `annotations_dict` by iterating 3 weather mode offsets (0, 20, 40) and adding class ID offsets. Weather mode names are appended to class names for non-Norm modes.
- Configures loguru with:
  - File sink: `Logs/log_inference_YYYYMMDD.txt` (daily rotation, 30-day retention)
  - Stdout: INFO/DEBUG/SUCCESS levels
  - Stderr: WARNING and above
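The registry build can be sketched as follows (the real module constructs `AnnotationClass` instances from `classes.json`, whose exact schema is not shown here; this sketch uses plain dicts with the documented fields):

```python
# Offsets match the WeatherMode enum: 0 = Norm, 20 = Wint, 40 = Night.
WEATHER_MODES = {0: "Norm", 20: "Wint", 40: "Night"}

def build_annotations_dict(base_classes):
    """base_classes: list of dicts with id/name/color/max_object_size_meters."""
    registry = {}
    for offset, mode in WEATHER_MODES.items():
        for cls in base_classes:
            # Non-Norm modes get the mode name appended to the class name
            # and the class ID shifted by the weather offset.
            name = cls["name"] if offset == 0 else f'{cls["name"]} {mode}'
            registry[cls["id"] + offset] = {**cls, "name": name}
    return registry
```

With 19 base classes this yields up to 57 entries, matching the count stated above.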
## Legacy / Orphaned Declarations

The `.pxd` header declares `QUEUE_MAXSIZE`, `COMMANDS_QUEUE`, and `ANNOTATIONS_QUEUE` (with comments referencing RabbitMQ) that are **not defined** in the `.pyx` implementation. These are remnants of a previous queue-based architecture and are unused.

## Dependencies

- **External**: `json`, `sys`, `loguru`
- **Internal**: none (leaf module)

## Consumers

- `ai_availability_status` (logging)
- `annotation` (tile duplicate threshold)
- `onnx_engine` (logging)
- `tensorrt_engine` (logging)
- `inference` (logging, constants, annotations_dict, format_time, SPLIT_SUFFIX, METERS_IN_TILE, MODELS_FOLDER, AI_ONNX_MODEL_FILE)
- `main` (annotations_dict for label lookup)

## Data Models

- `AnnotationClass` — detection class metadata
- `WeatherMode` — enum for weather conditions

## Configuration

- Reads `classes.json` at import time (must exist in working directory)

## External Integrations

None.

## Security

None.

## Tests

None found.
@@ -0,0 +1,107 @@
# Module: inference

## Purpose

Core inference orchestrator — manages the AI engine lifecycle, preprocesses media (images and video), runs batched inference, postprocesses detections, and applies validation filters (overlap removal, size filtering, tile deduplication, video tracking).

## Public Interface

### Class: Inference

#### Fields

| Field | Type | Access | Description |
|-------|------|--------|-------------|
| `loader_client` | object | internal | LoaderHttpClient instance |
| `engine` | InferenceEngine | internal | Active engine (OnnxEngine or TensorRTEngine), None if unavailable |
| `ai_availability_status` | AIAvailabilityStatus | public | Current AI readiness status |
| `stop_signal` | bool | internal | Flag to abort video processing |
| `model_width` | int | internal | Model input width in pixels |
| `model_height` | int | internal | Model input height in pixels |
| `detection_counts` | dict[str, int] | internal | Per-media detection count |
| `is_building_engine` | bool | internal | True during async TensorRT conversion |

#### Methods

| Method | Signature | Access | Description |
|--------|-----------|--------|-------------|
| `__init__` | `(loader_client)` | public | Initializes state, calls `init_ai()` |
| `run_detect` | `(dict config_dict, annotation_callback, status_callback=None)` | cpdef | Main entry: parses config, separates images/videos, processes each |
| `detect_single_image` | `(bytes image_bytes, dict config_dict) -> list` | cpdef | Single-image detection from raw bytes, returns list[Detection] |
| `stop` | `()` | cpdef | Sets stop_signal to True |
| `init_ai` | `()` | cdef | Engine initialization: tries TensorRT engine file → falls back to ONNX → background TensorRT conversion |
| `preprocess` | `(frames) -> ndarray` | cdef | OpenCV blobFromImage: resize, normalize to 0..1, swap RGB, stack batch |
| `postprocess` | `(output, ai_config) -> list[list[Detection]]` | cdef | Parses engine output to Detection objects, applies confidence threshold and overlap removal |

## Internal Logic

### Engine Initialization (`init_ai`)

1. If `_converted_model_bytes` exists → load TensorRT from those bytes
2. If GPU available → try downloading pre-built TensorRT engine from loader
3. If download fails → download ONNX model, start background thread for ONNX→TensorRT conversion
4. If no GPU → load OnnxEngine from ONNX model bytes
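The fallback chain above can be sketched as one function (the callables are hypothetical stand-ins for the real loader/engine operations, and the assumption that ONNX serves requests while the background conversion runs is ours, not stated explicitly in the source):

```python
def init_engine(gpu_available, converted_bytes, try_download_trt, download_onnx,
                load_trt, load_onnx, start_background_conversion):
    """Illustrative fallback chain for engine initialization."""
    if converted_bytes is not None:
        return load_trt(converted_bytes)         # 1. reuse already-converted engine
    if gpu_available:
        trt_bytes = try_download_trt()
        if trt_bytes is not None:
            return load_trt(trt_bytes)           # 2. pre-built engine from loader
        onnx_bytes = download_onnx()
        start_background_conversion(onnx_bytes)  # 3. convert ONNX -> TRT in background
        return load_onnx(onnx_bytes)             #    (assumption: ONNX serves meanwhile)
    return load_onnx(download_onnx())            # 4. CPU-only path
```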
### Preprocessing

- `cv2.dnn.blobFromImage`: scale 1/255, resize to model dims, BGR→RGB, no crop
- Stack multiple frames via `np.vstack` for batched inference
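The per-frame transform that `cv2.dnn.blobFromImage` performs here can be shown with plain NumPy (a sketch that skips the resize step and assumes frames already match the model's H×W):

```python
import numpy as np

def preprocess(frames):
    """NumPy equivalent of the documented blobFromImage settings
    (frames: BGR uint8 arrays already sized to the model input)."""
    blobs = []
    for frame in frames:
        x = frame.astype(np.float32) / 255.0   # scale 1/255 -> 0..1
        x = x[:, :, ::-1]                      # BGR -> RGB channel swap
        x = np.transpose(x, (2, 0, 1))[None]   # HWC -> NCHW with batch dim
        blobs.append(x)
    return np.vstack(blobs)                    # stack frames into one batch
```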
### Postprocessing

- Engine output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
- Coordinates normalized to 0..1 by dividing by model width/height
- Converted to center-format (cx, cy, w, h) Detection objects
- Filtered by `probability_threshold`
- Overlapping detections removed via `remove_overlapping_detections` (greedy, keeps higher confidence)
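The corner-to-center conversion and threshold filter can be sketched as (plain tuples stand in for `Detection` objects; overlap removal is omitted):

```python
def postprocess_batch(output, model_w, model_h, threshold):
    """output[b][i] = [x1, y1, x2, y2, confidence, class_id] in model pixels.
    Returns per-image lists of (cx, cy, w, h, cls, conf) normalized to 0..1."""
    results = []
    for image_rows in output:
        dets = []
        for x1, y1, x2, y2, conf, cls in image_rows:
            if conf < threshold:              # probability_threshold filter
                continue
            cx = (x1 + x2) / 2 / model_w      # corner -> center, normalized
            cy = (y1 + y2) / 2 / model_h
            w = (x2 - x1) / model_w
            h = (y2 - y1) / model_h
            dets.append((cx, cy, w, h, int(cls), conf))
        results.append(dets)
    return results
```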
### Image Processing

- Small images (≤1.5× model size): processed as single frame
- Large images: split into tiles based on ground sampling distance. Tile size = `METERS_IN_TILE / GSD` pixels. Tiles overlap by configurable percentage.
- Tile deduplication: absolute-coordinate comparison across adjacent tiles using `Detection.__eq__`
- Size filtering: detections whose physical size (meters) exceeds `AnnotationClass.max_object_size_meters` are removed. Physical size computed from GSD × pixel dimensions.

### Video Processing

- Frame sampling: every Nth frame (`frame_period_recognition`)
- Batch accumulation up to engine batch size
- Annotation validity: must differ from the previous annotation in at least one of these ways:
  - Time gap ≥ `frame_recognition_seconds`
  - More detections than previous
  - Any detection moved beyond `tracking_distance_confidence` threshold
  - Any detection confidence increased beyond `tracking_probability_increase`
- Valid frames get JPEG-encoded image attached

### Ground Sampling Distance (GSD)

`GSD = sensor_width * altitude / (focal_length * image_width)` — meters per pixel, used for physical size filtering of aerial detections.
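The GSD formula is a one-liner; with the documented config defaults (sensor width 23.5 mm, altitude 400 m, focal length 24 mm) and a hypothetical 4000-pixel-wide image it works out to roughly 0.098 m/pixel:

```python
def ground_sampling_distance(sensor_width_mm, altitude_m, focal_length_mm, image_width_px):
    """Meters of ground per image pixel; the mm units cancel."""
    return sensor_width_mm * altitude_m / (focal_length_mm * image_width_px)
```

A detection's physical size is then its pixel dimensions times this value, which is what the size filter compares against `max_object_size_meters`.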
## Dependencies

- **External**: `cv2`, `numpy`, `pynvml`, `mimetypes`, `pathlib`, `threading`
- **Internal**: `constants_inf`, `ai_availability_status`, `annotation`, `ai_config`, `tensorrt_engine` (conditional), `onnx_engine` (conditional), `inference_engine` (type)

## Consumers

- `main` — lazy-initializes Inference, calls `run_detect`, `detect_single_image`, reads `ai_availability_status`

## Data Models

Uses `Detection`, `Annotation` (from annotation), `AIRecognitionConfig` (from ai_config), `AIAvailabilityStatus` (from ai_availability_status).

## Configuration

All runtime config comes via `AIRecognitionConfig` dict. Engine selection is automatic based on GPU availability (checked at module level via pynvml).

## External Integrations

- **Loader service** (via loader_client): model download/upload

## Security

None.

## Tests

None found.
@@ -0,0 +1,59 @@
# Module: inference_engine

## Purpose

Abstract base class defining the interface that all inference engine implementations must follow.

## Public Interface

### Class: InferenceEngine

#### Fields

| Field | Type | Description |
|-------|------|-------------|
| `batch_size` | int | Number of images per inference batch |

#### Methods

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=1, **kwargs)` | Stores batch_size |
| `get_input_shape` | `() -> tuple` | Returns (height, width) of model input. Abstract — raises `NotImplementedError` |
| `get_batch_size` | `() -> int` | Returns `self.batch_size` |
| `run` | `(input_data) -> list` | Runs inference on preprocessed input blob. Abstract — raises `NotImplementedError` |

## Internal Logic

Pure abstract class. All methods except `__init__` and `get_batch_size` raise `NotImplementedError` and must be overridden by subclasses (`OnnxEngine`, `TensorRTEngine`).

## Dependencies

- **External**: `numpy` (declared in .pxd, not used in base)
- **Internal**: none (leaf module)

## Consumers

- `onnx_engine` — subclass
- `tensorrt_engine` — subclass
- `inference` — type reference in .pxd

## Data Models

None.

## Configuration

None.

## External Integrations

None.

## Security

None.

## Tests

None found.
@@ -0,0 +1,61 @@
# Module: loader_http_client

## Purpose

HTTP client for downloading and uploading model files (and other binary resources) via an external Loader microservice.

## Public Interface

### Class: LoadResult

Simple result wrapper.

| Field | Type | Description |
|-------|------|-------------|
| `err` | str or None | Error message if operation failed |
| `data` | bytes or None | Response payload on success |

### Class: LoaderHttpClient

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(str base_url)` | Stores base URL, strips trailing slash |
| `load_big_small_resource` | `(str filename, str directory) -> LoadResult` | POST to `/load/(unknown)` with JSON body `{filename, folder}`, returns raw bytes |
| `upload_big_small_resource` | `(bytes content, str filename, str directory) -> LoadResult` | POST to `/upload/(unknown)` with multipart file + form data `{folder}` |
| `stop` | `() -> None` | No-op placeholder |

## Internal Logic

Both load/upload methods wrap all exceptions into `LoadResult(err=str(e))`. Errors are logged via loguru but never raised.
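The error-or-data wrapping pattern can be sketched without the HTTP layer (`fetch` is a hypothetical stand-in for the `requests.post` call; logging is omitted):

```python
class LoadResult:
    def __init__(self, err=None, data=None):
        self.err = err
        self.data = data

def safe_load(fetch):
    """Wrap any exception into LoadResult.err; never propagate it."""
    try:
        return LoadResult(data=fetch())
    except Exception as e:
        return LoadResult(err=str(e))
```

Callers therefore always receive a `LoadResult` and must check `err` before using `data`.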
## Dependencies

- **External**: `requests`, `loguru`
- **Internal**: none (leaf module)

## Consumers

- `inference` — downloads ONNX/TensorRT models, uploads converted TensorRT engines
- `main` — instantiates client with `LOADER_URL`

## Data Models

- `LoadResult` — operation result with error-or-data semantics

## Configuration

- `base_url` — provided at construction time, sourced from `LOADER_URL` environment variable in `main.py`

## External Integrations

| Integration | Protocol | Endpoint Pattern |
|-------------|----------|-----------------|
| Loader service | HTTP POST | `/load/(unknown)` (download), `/upload/(unknown)` (upload) |

## Security

None (no auth headers sent to loader).

## Tests

None found.
@@ -0,0 +1,115 @@
# Module: main

## Purpose

FastAPI application entry point — exposes HTTP API for object detection on images and video media, health checks, and Server-Sent Events (SSE) streaming of detection results.

## Public Interface

### API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Returns AI engine availability status |
| POST | `/detect` | Single image detection (multipart file upload) |
| POST | `/detect/{media_id}` | Start async detection on media from loader service |
| GET | `/detect/stream` | SSE stream of detection events |

### DTOs (Pydantic Models)

| Model | Fields | Description |
|-------|--------|-------------|
| `DetectionDto` | centerX, centerY, width, height, classNum, label, confidence | Single detection result |
| `DetectionEvent` | annotations (list[DetectionDto]), mediaId, mediaStatus, mediaPercent | SSE event payload |
| `HealthResponse` | status, aiAvailability, errorMessage | Health check response |
| `AIConfigDto` | frame_period_recognition, frame_recognition_seconds, probability_threshold, tracking_*, model_batch_size, big_image_tile_overlap_percent, altitude, focal_length, sensor_width, paths | Configuration input for media detection |

### Class: TokenManager

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(str access_token, str refresh_token)` | Stores tokens |
| `get_valid_token` | `() -> str` | Returns access_token; auto-refreshes if expiring within 60s |

## Internal Logic

### `/health`

Always returns `HealthResponse` with `status="healthy"`; `aiAvailability` reflects the engine's `AIAvailabilityStatus`. On exception, returns `aiAvailability="None"`.

### `/detect` (single image)

1. Reads uploaded file bytes
2. Parses optional JSON config
3. Runs `inference.detect_single_image` in ThreadPoolExecutor (max 2 workers)
4. Returns list of DetectionDto

Error mapping: RuntimeError("not available") → 503, RuntimeError → 422, ValueError → 400.

### `/detect/{media_id}` (async media)

1. Checks for duplicate active detection (409 if already running)
2. Extracts auth tokens from Authorization header and x-refresh-token header
3. Creates `asyncio.Task` for background detection
4. Detection runs `inference.run_detect` in ThreadPoolExecutor
5. Callbacks push `DetectionEvent` to all SSE queues
6. If auth token present, also POSTs annotations to the Annotations service
7. Returns immediately: `{"status": "started", "mediaId": media_id}`

### `/detect/stream` (SSE)

- Creates asyncio.Queue per client (maxsize=100)
- Yields `data: {json}\n\n` SSE format
- Cleans up queue on disconnect
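The queue-per-client fan-out and the SSE framing can be sketched as follows (the drop-on-full policy for slow clients is an assumption; the document only states the queue's maxsize):

```python
import asyncio
import json

def sse_format(event):
    """One SSE frame per event, in the documented 'data: {json}\\n\\n' shape."""
    return f"data: {json.dumps(event)}\n\n"

async def broadcast(queues, event):
    """Push an event onto every connected client's bounded queue
    (maxsize=100 in the module). Assumption: full queues drop the event."""
    for q in list(queues):
        try:
            q.put_nowait(event)
        except asyncio.QueueFull:
            pass  # slow client; event is dropped for this queue only
```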
### Token Management

- Decodes JWT exp claim from base64 payload (no signature verification)
- Auto-refreshes via POST to `{ANNOTATIONS_URL}/auth/refresh` when within 60s of expiry
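The expiry check can be sketched as below (decoding the middle JWT segment without verifying the signature, exactly as the document warns; the helper name is ours):

```python
import base64
import json
import time

def token_expiring_soon(jwt_token, window_s=60):
    """True when the JWT's exp claim is within window_s seconds of now.
    NOTE: no signature verification -- payload is trusted as-is."""
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"] - time.time() < window_s
```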
### Annotations Service Integration

- POST to `{ANNOTATIONS_URL}/annotations` with:
  - `mediaId`, `source: 0`, `videoTime` (formatted from ms), `detections` (list of dto dicts)
  - Optional base64-encoded `image`
  - Bearer token in Authorization header

## Dependencies

- **External**: `asyncio`, `base64`, `json`, `os`, `time`, `concurrent.futures`, `typing`, `requests`, `fastapi`, `pydantic`
- **Internal**: `inference` (lazy import), `constants_inf` (label lookup), `loader_http_client` (client instantiation)

## Consumers

None (entry point).

## Data Models

- `DetectionDto`, `DetectionEvent`, `HealthResponse`, `AIConfigDto` — Pydantic models for API
- `TokenManager` — JWT token lifecycle

## Configuration

| Env Var | Default | Description |
|---------|---------|-------------|
| `LOADER_URL` | `http://loader:8080` | Loader service base URL |
| `ANNOTATIONS_URL` | `http://annotations:8080` | Annotations service base URL |

## External Integrations

| Service | Protocol | Purpose |
|---------|----------|---------|
| Loader | HTTP (via LoaderHttpClient) | Model loading |
| Annotations | HTTP POST | Auth refresh (`/auth/refresh`), annotation posting (`/annotations`) |

## Security

- Bearer token from request headers, refreshed via Annotations service
- JWT exp decoded (base64, no signature verification) — token validation is not performed locally
- No CORS configuration
- No rate limiting
- No input validation on media_id path parameter beyond string type
@@ -0,0 +1,51 @@
# Module: onnx_engine

## Purpose

ONNX Runtime-based inference engine — CPU/CUDA fallback when TensorRT is unavailable.

## Public Interface

### Class: OnnxEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=1, **kwargs)` | Loads ONNX model from bytes, creates InferenceSession with CUDA > CPU provider priority. Reads input shape and batch size from model metadata. |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
| `get_batch_size` | `() -> int` | Returns batch size (from model if not dynamic, else from constructor) |
| `run` | `(input_data) -> list` | Runs session inference, returns output tensors |

## Internal Logic

- Provider order: `["CUDAExecutionProvider", "CPUExecutionProvider"]` — ONNX Runtime selects the best available.
- If the model's batch dimension is dynamic (-1), uses the constructor's `batch_size` parameter.
- Logs model input metadata and custom metadata map at init.

## Dependencies

- **External**: `onnxruntime`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)

## Consumers

- `inference` — instantiated when no compatible NVIDIA GPU is found

## Data Models

None (wraps onnxruntime.InferenceSession).

## Configuration

None.

## External Integrations

None directly — model bytes are provided by caller (loaded via `loader_http_client`).

## Security

None.

## Tests

None found.
@@ -0,0 +1,57 @@
# Module: tensorrt_engine

## Purpose

TensorRT-based inference engine — high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion.

## Public Interface

### Class: TensorRTEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes TensorRT engine from bytes, allocates CUDA input/output memory, creates execution context and stream |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
| `get_batch_size` | `() -> int` | Returns batch size |
| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy, returns output as numpy array |
| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (default 2GB if unavailable) |
| `get_engine_filename` | `(int device_id) -> str` | Static. Returns engine filename with compute capability and SM count: `azaion.cc_{major}.{minor}_sm_{count}.engine` |
| `convert_from_onnx` | `(bytes onnx_model) -> bytes or None` | Static. Converts ONNX model to TensorRT serialized engine. Uses 90% of GPU memory as workspace. Enables FP16 if supported. |

## Internal Logic

- Input shape defaults to 1280×1280 for dynamic dimensions.
- Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
- `run` uses async CUDA memory transfers with stream synchronization.
- `convert_from_onnx` uses explicit batch mode, configures FP16 precision when GPU supports it.
- Default batch size is 4 (vs OnnxEngine's 1).
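The GPU-specific filename scheme from `get_engine_filename` reduces to a format string over the device's compute capability and SM count. A minimal sketch, with the three arguments assumed to come from the device query (the real method takes only `device_id` and performs that query itself):

```python
def engine_filename(cc_major, cc_minor, sm_count):
    """Build the GPU-specific engine filename: azaion.cc_{major}.{minor}_sm_{count}.engine"""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

For example, an Ampere GA104 device (compute capability 8.6) would produce a filename like `azaion.cc_8.6_sm_28.engine`, so engines serialized on one GPU model are never loaded on an incompatible one.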
## Dependencies

- **External**: `tensorrt`, `pycuda.driver`, `pycuda.autoinit`, `pynvml`, `numpy`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)

## Consumers

- `inference` — instantiated when compatible NVIDIA GPU is found; also calls `convert_from_onnx` and `get_engine_filename`

## Data Models

None (wraps TensorRT runtime objects).

## Configuration

- Engine filename is GPU-specific (compute capability + SM count).
- Workspace memory is 90% of available GPU memory.

## External Integrations

None directly — model bytes provided by caller.

## Security

None.

## Tests

None found.
@@ -0,0 +1,13 @@
{
  "current_step": "complete",
  "completed_steps": ["discovery", "module-analysis", "component-assembly", "system-synthesis", "verification", "solution-extraction", "problem-extraction", "final-report"],
  "modules_total": 10,
  "modules_documented": [
    "constants_inf", "ai_config", "inference_engine", "loader_http_client",
    "ai_availability_status", "annotation", "onnx_engine", "tensorrt_engine",
    "inference", "main"
  ],
  "modules_remaining": [],
  "components_written": ["01_domain", "02_inference_engines", "03_inference_pipeline", "04_api"],
  "last_updated": "2026-03-21"
}
@@ -0,0 +1,259 @@
# Azaion.Detections — System Flows

## Flow Inventory

| # | Flow Name | Trigger | Primary Components | Criticality |
|---|-----------|---------|-------------------|-------------|
| F1 | Health Check | Client GET /health | API, Inference Pipeline | High |
| F2 | Single Image Detection | Client POST /detect | API, Inference Pipeline, Engines, Domain | High |
| F3 | Media Detection (Async) | Client POST /detect/{media_id} | API, Inference Pipeline, Engines, Domain, Loader, Annotations | High |
| F4 | SSE Event Streaming | Client GET /detect/stream | API | Medium |
| F5 | Engine Initialization | First detection request | Inference Pipeline, Engines, Loader | High |
| F6 | TensorRT Background Conversion | No pre-built TensorRT engine | Inference Pipeline, Engines, Loader | Medium |

## Flow Dependencies

| Flow | Depends On | Shares Data With |
|------|-----------|-----------------|
| F1 | F5 (for meaningful status) | — |
| F2 | F5 (engine must be ready) | — |
| F3 | F5 (engine must be ready) | F4 (via SSE event queues) |
| F4 | — | F3 (receives events) |
| F5 | — | F6 (triggers conversion if needed) |
| F6 | F5 (triggered by init failure) | F5 (provides converted bytes) |

---
## Flow F1: Health Check

### Description

Client queries the service health status. Returns the current AI engine availability (None, Downloading, Converting, Enabled, Error, etc.) without triggering engine initialization.

### Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as main.py
    participant INF as Inference
    participant STATUS as AIAvailabilityStatus

    Client->>API: GET /health
    API->>INF: get_inference()
    INF-->>API: Inference instance
    API->>STATUS: str(ai_availability_status)
    STATUS-->>API: "Enabled" / "Downloading" / etc.
    API-->>Client: HealthResponse{status, aiAvailability, errorMessage}
```

### Error Scenarios

| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Inference not yet created | get_inference() | Exception caught | Returns aiAvailability="None" |

---
## Flow F2: Single Image Detection

### Description

Client uploads an image file and optionally provides config. The service runs inference synchronously (via ThreadPoolExecutor) and returns detection results.

### Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as main.py
    participant INF as Inference
    participant ENG as Engine (ONNX/TRT)
    participant CONST as constants_inf

    Client->>API: POST /detect (file + config?)
    API->>API: Read image bytes, parse config
    API->>INF: detect_single_image(bytes, config_dict)
    INF->>INF: init_ai() (idempotent)
    INF->>INF: cv2.imdecode → preprocess
    INF->>ENG: run(input_blob)
    ENG-->>INF: raw output
    INF->>INF: postprocess → filter by threshold → remove overlaps
    INF-->>API: list[Detection]
    API->>CONST: annotations_dict[cls].name (label lookup)
    API-->>Client: list[DetectionDto]
```

### Error Scenarios

| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Empty image | API | len(bytes)==0 | 400 Bad Request |
| Invalid image data | imdecode | frame is None | 400 ValueError |
| Engine not available | init_ai | engine is None | 503 Service Unavailable |
| Inference failure | run/postprocess | RuntimeError | 422 Unprocessable Entity |

---
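The "filter by threshold" step of F2's postprocessing is conceptually a one-liner. A minimal sketch, assuming the raw detections follow the `(x1, y1, x2, y2, conf, cls)` layout documented for the engines (the real code operates on numpy arrays, not tuples):

```python
def filter_by_threshold(detections, threshold):
    """Keep only detections whose confidence (index 4 of x1,y1,x2,y2,conf,cls)
    meets or exceeds the configured threshold."""
    return [d for d in detections if d[4] >= threshold]
```

Overlap removal then runs on this reduced list, which keeps the quadratic pairwise comparison cheap for typical detection counts.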
## Flow F3: Media Detection (Async)

### Description

Client triggers detection on media files (images/video) available via the Loader service. Processing runs asynchronously. Results are streamed via SSE (F4) and optionally posted to the Annotations service.

### Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as main.py
    participant INF as Inference
    participant ENG as Engine
    participant LDR as Loader Service
    participant ANN as Annotations Service
    participant SSE as SSE Queues

    Client->>API: POST /detect/{media_id} (config + auth headers)
    API->>API: Check _active_detections (duplicate guard)
    API-->>Client: {"status": "started"}

    Note over API: asyncio.Task created

    API->>INF: run_detect(config, on_annotation, on_status)
    loop For each media file
        INF->>INF: Read/decode media (cv2)
        INF->>INF: Preprocess (tile/batch)
        INF->>ENG: run(input_blob)
        ENG-->>INF: raw output
        INF->>INF: Postprocess + validate

        opt Valid annotation found
            INF->>API: on_annotation(annotation, percent)
            API->>SSE: DetectionEvent → all queues
            opt Auth token present
                API->>ANN: POST /annotations (detections + image)
            end
        end
    end

    INF->>API: on_status(media_name, count)
    API->>SSE: DetectionEvent(status=AIProcessed, percent=100)
```

### Data Flow

| Step | From | To | Data | Format |
|------|------|----|------|--------|
| 1 | Client | API | media_id, config, auth tokens | HTTP POST JSON + headers |
| 2 | API | Inference | config_dict, callbacks | Python dict + callables |
| 3 | Inference | Engine | preprocessed batch | numpy ndarray |
| 4 | Engine | Inference | raw detections | numpy ndarray |
| 5 | Inference | API (callback) | Annotation + percent | Python objects |
| 6 | API | SSE clients | DetectionEvent | SSE JSON stream |
| 7 | API | Annotations Service | detections + base64 image | HTTP POST JSON |

### Error Scenarios

| Error | Where | Detection | Recovery |
|-------|-------|-----------|----------|
| Duplicate media_id | API | _active_detections check | 409 Conflict |
| Engine unavailable | run_detect | engine is None | Error event pushed to SSE |
| Inference failure | processing | Exception | Error event pushed to SSE, media_id cleared |
| Annotations POST failure | _post_annotation | Exception | Silently caught, detection continues |
| SSE queue full | event broadcast | QueueFull | Event silently dropped for that client |

---
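The `_active_detections` duplicate guard that produces the 409 Conflict above amounts to a set of in-flight media ids. A minimal sketch under assumed names (`DetectionRegistry`, `try_start`, `finish` are illustrative, not the service's actual API):

```python
class DetectionRegistry:
    """Tracks in-flight media detections; rejects duplicate starts."""

    def __init__(self):
        self._active = set()

    def try_start(self, media_id):
        """Return False for a duplicate (caller maps this to 409 Conflict)."""
        if media_id in self._active:
            return False
        self._active.add(media_id)
        return True

    def finish(self, media_id):
        """Clear the id on completion or failure so a retry can start."""
        self._active.discard(media_id)
```

Clearing the id in `finish` on both success and failure matches the "media_id cleared" recovery in the error table: a failed run must not block a retry forever.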
## Flow F4: SSE Event Streaming

### Description

Client opens a persistent SSE connection. Receives real-time detection events from all active F3 media detection tasks.

### Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as main.py
    participant Queue as asyncio.Queue

    Client->>API: GET /detect/stream
    API->>Queue: Create queue (maxsize=100)
    API->>API: Add to _event_queues

    loop Until disconnect
        Queue-->>API: await event
        API-->>Client: data: {DetectionEvent JSON}
    end

    Note over API: Client disconnects (CancelledError)
    API->>API: Remove from _event_queues
```

---
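The fan-out to per-client queues, including the drop-on-full behavior from F3's error table, can be sketched with stdlib asyncio. The `broadcast` and `demo` names are illustrative, not the service's actual identifiers:

```python
import asyncio

def broadcast(queues, event):
    """Push an event to every subscriber queue; drop silently when one is full."""
    for q in queues:
        try:
            q.put_nowait(event)
        except asyncio.QueueFull:
            pass  # slow client: event dropped for that subscriber only

async def demo():
    # Tiny queue so the second broadcast overflows, as with a stalled client.
    q = asyncio.Queue(maxsize=1)
    broadcast([q], {"status": "AIProcessed", "percent": 100})
    broadcast([q], {"status": "extra"})  # dropped: queue already full
    return await q.get()
```

Dropping per-subscriber (rather than blocking the producer) keeps one slow SSE client from stalling detection progress for everyone else.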
## Flow F5: Engine Initialization

### Description

On first detection request, the Inference class initializes the ML engine. Strategy: try TensorRT pre-built engine → fall back to ONNX → background TensorRT conversion.

### Flowchart

```mermaid
flowchart TD
    Start([init_ai called]) --> CheckEngine{engine exists?}
    CheckEngine -->|Yes| Done([Return])
    CheckEngine -->|No| CheckBuilding{is_building_engine?}
    CheckBuilding -->|Yes| Done
    CheckBuilding -->|No| CheckConverted{_converted_model_bytes?}
    CheckConverted -->|Yes| LoadConverted[Load TensorRT from bytes]
    LoadConverted --> SetEnabled[status = ENABLED]
    SetEnabled --> Done

    CheckConverted -->|No| CheckGPU{GPU available?}
    CheckGPU -->|Yes| DownloadTRT[Download pre-built TensorRT engine]
    DownloadTRT --> TRTSuccess{Success?}
    TRTSuccess -->|Yes| LoadTRT[Create TensorRTEngine]
    LoadTRT --> SetEnabled
    TRTSuccess -->|No| DownloadONNX[Download ONNX model]
    DownloadONNX --> StartConversion[Start background thread: convert ONNX→TRT]
    StartConversion --> Done

    CheckGPU -->|No| DownloadONNX2[Download ONNX model]
    DownloadONNX2 --> LoadONNX[Create OnnxEngine]
    LoadONNX --> Done
```

---
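The decision order in the flowchart can be condensed into a pure function, useful for unit-testing the branching without any GPU present. This is a sketch: the parameter and return-value names are hypothetical labels for the flowchart's branches, not the real `init_ai` signature:

```python
def choose_init_action(engine, is_building, converted_bytes,
                       gpu_available, prebuilt_trt_ok):
    """Mirror F5's decision order: existing engine > in-progress build >
    cached conversion > pre-built TensorRT > ONNX (with background conversion
    on GPU hosts, plain ONNX otherwise)."""
    if engine is not None or is_building:
        return "noop"
    if converted_bytes is not None:
        return "load_converted_trt"
    if gpu_available:
        return "load_prebuilt_trt" if prebuilt_trt_ok else "onnx_then_convert"
    return "load_onnx"
```

Note the ordering matters: the `_converted_model_bytes` check precedes the GPU download so that a finished F6 conversion is used instead of re-downloading anything.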
## Flow F6: TensorRT Background Conversion

### Description

When no pre-built TensorRT engine exists, a background daemon thread converts the ONNX model to TensorRT, uploads the result to Loader for caching, and stores the bytes for the next `init_ai` call.

### Sequence Diagram

```mermaid
sequenceDiagram
    participant INF as Inference
    participant TRT as TensorRTEngine
    participant LDR as Loader Service
    participant STATUS as AIAvailabilityStatus

    Note over INF: Background thread starts
    INF->>STATUS: set_status(CONVERTING)
    INF->>TRT: convert_from_onnx(onnx_bytes)
    TRT->>TRT: Build TensorRT engine (90% GPU memory workspace)
    TRT-->>INF: engine_bytes

    INF->>STATUS: set_status(UPLOADING)
    INF->>LDR: upload_big_small_resource(engine_bytes, filename)
    LDR-->>INF: LoadResult

    INF->>INF: _converted_model_bytes = engine_bytes
    INF->>STATUS: set_status(ENABLED)
    Note over INF: Next init_ai() call will load from _converted_model_bytes
```
@@ -0,0 +1,175 @@
# Test Infrastructure

**Task**: AZ-138_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the E2E test project — test runner, mock services, Docker test environment, test data fixtures, reporting
**Complexity**: 5 points
**Dependencies**: None
**Component**: Integration Tests
**Jira**: AZ-138
**Epic**: AZ-137

## Test Project Folder Layout

```
e2e/
├── conftest.py
├── requirements.txt
├── Dockerfile
├── pytest.ini
├── mocks/
│   ├── loader/
│   │   ├── Dockerfile
│   │   └── app.py
│   └── annotations/
│       ├── Dockerfile
│       └── app.py
├── fixtures/
│   ├── small_image.jpg (640×480 JPEG with detectable objects)
│   ├── large_image.jpg (4000×3000 JPEG for tiling tests)
│   ├── test_video.mp4 (10s, 30fps MP4 with moving objects)
│   ├── empty_image (zero-byte file)
│   ├── corrupt_image (random binary garbage)
│   ├── classes.json (19 classes, 3 weather modes, MaxSizeM values)
│   └── azaion.onnx (small valid YOLO ONNX model, 1280×1280 input, 19 classes)
├── tests/
│   ├── test_health_engine.py
│   ├── test_single_image.py
│   ├── test_tiling.py
│   ├── test_async_sse.py
│   ├── test_video.py
│   ├── test_negative.py
│   ├── test_resilience.py
│   ├── test_performance.py
│   ├── test_security.py
│   └── test_resource_limits.py
└── docker-compose.test.yml
```
### Layout Rationale

- `mocks/` separated from tests — each mock is a standalone Docker service with its own Dockerfile
- `fixtures/` holds all static test data, volume-mounted into containers
- `tests/` organized by test category matching the test spec structure (one file per task group)
- `conftest.py` provides shared pytest fixtures (HTTP clients, SSE helpers, service readiness checks)
- `pytest.ini` configures markers for `gpu`/`cpu` profiles and test ordering
## Mock Services

| Mock Service | Replaces | Endpoints | Behavior |
|-------------|----------|-----------|----------|
| mock-loader | Loader service (model download/upload) | `GET /models/azaion.onnx` — serves ONNX model from volume. `POST /upload` — accepts TensorRT engine upload, stores in memory. `POST /mock/config` — control API (simulate 503, reset state). `GET /mock/status` — returns mock state. | Deterministic: serves model file from `/models/` volume. Configurable downtime via control endpoint. First-request-fail mode for retry tests. |
| mock-annotations | Annotations service (result posting, token refresh) | `POST /annotations` — accepts annotation POST, stores in memory. `POST /auth/refresh` — returns refreshed token. `POST /mock/config` — control API (simulate 503, reset state). `GET /mock/annotations` — returns recorded annotations for assertion. | Records all incoming annotations in memory. Provides token refresh. Configurable downtime. Assertions via GET endpoint to verify what was received. |

### Mock Control API

Both mock services expose:

- `POST /mock/config` — accepts JSON `{"mode": "normal"|"error"|"first_fail"}` to control behavior
- `POST /mock/reset` — clears recorded state (annotations, uploads)
- `GET /mock/status` — returns current mode and recorded interaction count
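The mode switch behind `POST /mock/config` reduces to a small state machine that both mocks can share. A minimal sketch with assumed names (`MockBehavior`, `status_code`); the real `app.py` files would wrap this in their HTTP handlers:

```python
class MockBehavior:
    """Mode-driven response selector for the mock control API."""

    MODES = ("normal", "error", "first_fail")

    def __init__(self):
        self.mode = "normal"
        self.requests_seen = 0

    def configure(self, mode):
        """Handler body for POST /mock/config."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.requests_seen = 0  # new mode starts a fresh request count

    def status_code(self):
        """Status code for the next incoming business request."""
        self.requests_seen += 1
        if self.mode == "error":
            return 503
        if self.mode == "first_fail" and self.requests_seen == 1:
            return 503
        return 200
```

`first_fail` is what the retry tests lean on: the first model download fails with 503, and the test then asserts the service retried and succeeded.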
## Docker Test Environment

### docker-compose.test.yml Structure

| Service | Image / Build | Purpose | Depends On |
|---------|--------------|---------|------------|
| detections | Build from repo root (Dockerfile) | System under test — FastAPI detection service | mock-loader, mock-annotations |
| mock-loader | Build from `e2e/mocks/loader/` | Serves ONNX model, accepts TensorRT uploads | — |
| mock-annotations | Build from `e2e/mocks/annotations/` | Accepts annotation results, provides token refresh | — |
| e2e-consumer | Build from `e2e/` | pytest test runner | detections |

### Networks and Volumes

**Network**: `e2e-net` — isolated bridge network, all services communicate via hostnames

**Volumes**:

| Volume | Mount Target | Content |
|--------|-------------|---------|
| test-models | mock-loader:/models | `azaion.onnx` model file |
| test-media | e2e-consumer:/media | Test images and video files |
| test-classes | detections:/app/classes.json | `classes.json` with 19 detection classes |
| test-results | e2e-consumer:/results | CSV test report output |

### GPU Profile

Two Docker Compose profiles:

- **cpu** (default): `detections` runs without GPU runtime, exercises ONNX fallback path
- **gpu**: `detections` runs with `runtime: nvidia` and `NVIDIA_VISIBLE_DEVICES=all`, exercises TensorRT path
### Environment Variables (detections service)

| Variable | Value | Purpose |
|----------|-------|---------|
| LOADER_URL | http://mock-loader:8080 | Points to mock Loader |
| ANNOTATIONS_URL | http://mock-annotations:8081 | Points to mock Annotations |
## Test Runner Configuration

**Framework**: pytest
**Plugins**: pytest-csv (reporting), requests (HTTP client), sseclient-py (SSE streaming), pytest-timeout (per-test timeouts)
**Entry point**: `pytest --csv=/results/report.csv -v`
### Fixture Strategy

| Fixture | Scope | Purpose |
|---------|-------|---------|
| `base_url` | session | Detections service base URL (`http://detections:8000`) |
| `http_client` | session | `requests.Session` configured with base URL and default timeout |
| `sse_client_factory` | function | Factory that opens SSE connection to `/detect/stream` |
| `mock_loader_url` | session | Mock-loader base URL for control API calls |
| `mock_annotations_url` | session | Mock-annotations base URL for control API and assertion calls |
| `wait_for_services` | session (autouse) | Polls health endpoints until all services are ready |
| `reset_mocks` | function (autouse) | Calls `POST /mock/reset` on both mocks before each test |
| `small_image` | session | Reads `small_image.jpg` from `/media/` volume |
| `large_image` | session | Reads `large_image.jpg` from `/media/` volume |
| `test_video_path` | session | Path to `test_video.mp4` on host filesystem |
| `empty_image` | session | Reads zero-byte file |
| `corrupt_image` | session | Reads random binary file |
| `jwt_token` | function | Generates a valid JWT with exp claim for auth tests |
| `warm_engine` | module | Sends one detection request to initialize engine, used by tests that need warm engine |
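The heart of the `wait_for_services` fixture is a deadline-bounded polling loop. A minimal sketch with assumed names (`wait_until_ready`, `probe`); in `conftest.py` the probe would be an HTTP GET against each service's health endpoint:

```python
import time

def wait_until_ready(probe, timeout=30.0, interval=0.5):
    """Poll `probe` (a zero-arg callable returning bool) until it succeeds
    or `timeout` seconds elapse. Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Demo with a fake probe that succeeds on its third call.
calls = {"n": 0}

def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until_ready(fake_probe, timeout=1.0, interval=0.01)
```

Using `time.monotonic` (rather than `time.time`) keeps the deadline immune to wall-clock adjustments inside the container.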
## Test Data Fixtures

| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| azaion.onnx | Pre-built small YOLO model | ONNX (1280×1280 input, 19 classes) | All detection tests (via mock-loader) |
| classes.json | Static fixture | JSON (19 objects with Id, Name, Color, MaxSizeM) | All tests (volume mount to detections) |
| small_image.jpg | Static fixture | JPEG 640×480 | Health, single image, filtering, negative, performance tests |
| large_image.jpg | Static fixture | JPEG 4000×3000 | Tiling tests, performance tests |
| test_video.mp4 | Static fixture | MP4 10s 30fps | Async, SSE, video processing tests |
| empty_image | Static fixture | Zero-byte file | FT-N-01 |
| corrupt_image | Static fixture | Random binary | FT-N-02 |

### Data Isolation

Each test run starts with fresh containers (`docker compose down -v && docker compose up`). The detections service is stateless — no persistent data between runs. Mock services reset state via `POST /mock/reset` before each test. Tests that modify mock behavior (e.g., making loader unreachable) run with function-scoped mock resets.
## Test Reporting

**Format**: CSV
**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
**Output path**: `/results/report.csv` → mounted to `./e2e-results/report.csv` on host
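The report columns can be pinned down with a small stdlib-csv writer. This is a sketch of the column order only; in practice pytest-csv produces the file, and the `write_report` name is hypothetical:

```python
import csv
import io

# Column order from the reporting spec above.
COLUMNS = ["Test ID", "Test Name", "Execution Time (ms)", "Result", "Error Message"]

def write_report(rows, stream):
    """Write test results as CSV with the report's header row."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

buf = io.StringIO()
write_report([
    {"Test ID": "FT-N-01", "Test Name": "empty image rejected",
     "Execution Time (ms)": 12, "Result": "PASS", "Error Message": ""},
], buf)
report = buf.getvalue()
```

Keeping the header canonical matters because AC-4 below asserts on exactly these columns.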
## Acceptance Criteria

**AC-1: Test environment starts**
Given the docker-compose.test.yml
When `docker compose -f docker-compose.test.yml up` is executed
Then all services start and the detections service is reachable at http://detections:8000/health

**AC-2: Mock services respond**
Given the test environment is running
When the e2e-consumer sends requests to mock-loader and mock-annotations
Then mock services respond with configured behavior and record interactions

**AC-3: Test runner executes**
Given the test environment is running
When the e2e-consumer starts
Then pytest discovers and executes test files from `tests/` directory

**AC-4: Test report generated**
Given tests have been executed
When the test run completes
Then `/results/report.csv` exists with columns: Test ID, Test Name, Execution Time, Result, Error Message
@@ -0,0 +1,47 @@
# Autopilot State

## Current Step

step: 2d
name: Decompose Tests
status: in_progress
sub_step: 1t — Test Infrastructure Bootstrap

## Step ↔ SubStep Reference

| Step | Name | Sub-Skill | Internal SubSteps |
|------|------------------------|----------------------------------|------------------------------------------|
| 0 | Problem | problem/SKILL.md | Phase 1–4 |
| 1 | Research | research/SKILL.md | Mode A: Phase 1–4 · Mode B: Step 0–8 |
| 2 | Plan | plan/SKILL.md | Step 1–6 |
| 2b | Blackbox Test Spec | blackbox-test-spec/SKILL.md | Phase 1a–1b (existing code path only) |
| 2c | Post-Test-Spec Decision | (autopilot decision gate) | Refactor vs normal workflow |
| 2d | Decompose Tests | decompose/SKILL.md (tests-only) | Step 1t + Step 3 + Step 4 |
| 2e | Implement Tests | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 3 | Decompose | decompose/SKILL.md | Step 1–4 |
| 4 | Implement | implement/SKILL.md | (batch-driven, no fixed sub-steps) |
| 5 | Deploy | deploy/SKILL.md | Step 1–7 |

## Completed Steps

| Step | Name | Completed | Key Outcome |
|------|------|-----------|-------------|
| — | Document (pre-step) | 2026-03-21 | 10 modules, 4 components, full _docs/ generated from existing codebase |
| 2b | Blackbox Test Spec | 2026-03-21 | 39 test scenarios (16 positive, 8 negative, 11 non-functional), 85% total coverage, 5 artifacts produced |
| 2c | Post-Test-Spec Decision | 2026-03-22 | User chose refactor path (A) |

## Key Decisions

- User chose B: Document existing codebase before proceeding
- Component breakdown: 4 components (Domain, Inference Engines, Inference Pipeline, API)
- Verification: 4 legacy issues found and documented (unused serialize/from_msgpack, orphaned queue declarations)
- Input data coverage approved at ~90% (Phase 1a)
- Test coverage approved at 85% (21/22 AC, 13/18 restrictions) with all gaps justified
- User chose A: Refactor path (decompose tests → implement tests → refactor)
- Integration Tests Epic: AZ-137

## Last Session

date: 2026-03-22
ended_at: Step 2d Decompose Tests — SubStep 1t Test Infrastructure Bootstrap
reason: in progress
notes: Starting tests-only mode decomposition. 39 test scenarios to decompose into atomic tasks.

## Blockers

- none