Refactor task management structure and update documentation

- Changed the directory structure for task specifications to include a dedicated `todo/` folder within `_docs/02_tasks/` for tasks ready for implementation.
- Updated references in various skills and documentation to reflect the new task lifecycle, including changes in the `implementer` and `decompose` skills.
- Enhanced the README and flow documentation to clarify the new task organization and its implications for the implementation process.

These updates improve task management clarity and streamline the implementation workflow.
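The migration this commit performs can be sketched as a small shell script (a hypothetical example — the folder names match the commit, but the sample task file names and the temp-directory setup are illustrative; in a real repo you would use `git mv`):

```shell
# Hypothetical migration sketch for the new task lifecycle layout.
set -eu
cd "$(mktemp -d)"

# Simulate the old flat layout
mkdir -p _docs/02_tasks
touch _docs/02_tasks/AZ-42_initial_structure.md \
      _docs/02_tasks/AZ-43_parser.md \
      _docs/02_tasks/_dependencies_table.md

# Create the new lifecycle folders
mkdir -p _docs/02_tasks/todo _docs/02_tasks/backlog _docs/02_tasks/done

# _dependencies_table.md stays at the top level; task specs move to todo/
# (use `git mv` instead of `mv` inside a real repository)
for f in _docs/02_tasks/*.md; do
  case "$(basename "$f")" in
    _*) ;;                              # keep _-prefixed files in place
    *)  mv "$f" _docs/02_tasks/todo/ ;;
  esac
done
```

The `case` guard is what keeps `_dependencies_table.md` (and any other `_`-prefixed metadata file) at the top level while everything else lands in `todo/`.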
Oleksandr Bezdieniezhnykh
2026-03-28 01:17:45 +02:00
parent 8c665bd0a4
commit cbf370c765
35 changed files with 1348 additions and 58 deletions
+9 -5
@@ -12,7 +12,7 @@ If you want to run a specific skill directly (without the orchestrator), use the
 /problem — interactive problem gathering → _docs/00_problem/
 /research — solution drafts → _docs/01_solution/
 /plan — architecture, components, tests → _docs/02_document/
-/decompose — atomic task specs → _docs/02_tasks/
+/decompose — atomic task specs → _docs/02_tasks/todo/
 /implement — batched parallel implementation → _docs/03_implementation/
 /deploy — containerization, CI/CD, observability → _docs/04_deploy/
 ```
@@ -118,7 +118,7 @@ Bottom-up codebase documentation. Analyzes existing code from modules through co
 2. /plan — architecture, data model, deployment, components, risks, tests, Jira epics → _docs/02_document/
-3. /decompose — atomic task specs + dependency table → _docs/02_tasks/
+3. /decompose — atomic task specs + dependency table → _docs/02_tasks/todo/
 4. /implement — batched parallel agents, code review, commit per batch → _docs/03_implementation/
 ```
@@ -146,7 +146,7 @@ Or just use `/autopilot` to run steps 0-5 automatically.
 | **problem** | "problem", "define problem", "new project" | `_docs/00_problem/` |
 | **research** | "research", "investigate" | `_docs/01_solution/` |
 | **plan** | "plan", "decompose solution" | `_docs/02_document/` |
-| **decompose** | "decompose", "task decomposition" | `_docs/02_tasks/` |
+| **decompose** | "decompose", "task decomposition" | `_docs/02_tasks/todo/` |
 | **implement** | "implement", "start implementation" | `_docs/03_implementation/` |
 | **code-review** | "code review", "review code" | Verdict: PASS / FAIL / PASS_WITH_WARNINGS |
 | **refactor** | "refactor", "improve code" | `_docs/04_refactoring/` |
@@ -180,7 +180,11 @@ _docs/
 │ ├── deployment/ — containerization, CI/CD, environments, observability, procedures
 │ ├── diagrams/
 │ └── FINAL_report.md
-├── 02_tasks/ — [JIRA-ID]_[name].md + _dependencies_table.md
+├── 02_tasks/ — task lifecycle folders + _dependencies_table.md
+│ ├── _dependencies_table.md
+│ ├── todo/ — tasks ready for implementation
+│ ├── backlog/ — parked tasks (not scheduled yet)
+│ └── done/ — completed/archived tasks
 ├── 03_implementation/ — batch reports, FINAL report
 ├── 04_deploy/ — containerization, CI/CD, environments, observability, procedures, scripts
 ├── 04_refactoring/ — baseline, discovery, analysis, execution, hardening
@@ -202,4 +206,4 @@ _docs/
 /decompose @_docs/02_document/components/03_parser/description.md
 ```
-Appends tasks for that component to `_docs/02_tasks/` without running bootstrap or cross-verification.
+Appends tasks for that component to `_docs/02_tasks/todo/` without running bootstrap or cross-verification.
+2 -2
@@ -1,7 +1,7 @@
 ---
 name: implementer
 description: |
-  Implements a single task from its spec file. Use when implementing tasks from _docs/02_tasks/.
+  Implements a single task from its spec file. Use when implementing tasks from _docs/02_tasks/todo/.
   Reads the task spec, analyzes the codebase, implements the feature with tests, and verifies acceptance criteria.
   Launched by the /implement skill as a subagent.
 ---
@@ -11,7 +11,7 @@ You are a professional software developer implementing a single task.
 ## Input
 You receive from the `/implement` orchestrator:
-- Path to a task spec file (e.g., `_docs/02_tasks/[JIRA-ID]_[short_name].md`)
+- Path to a task spec file (e.g., `_docs/02_tasks/todo/[JIRA-ID]_[short_name].md`)
 - Files OWNED (exclusive write access — only you may modify these)
 - Files READ-ONLY (shared interfaces, types — read but do not modify)
 - Files FORBIDDEN (other agents' owned files — do not touch)
+1 -1
View File
@@ -18,7 +18,7 @@ This applies to:
 - Requesting approval for destructive actions
 - Reporting that you are blocked and need guidance
 - Any situation where the conversation will stall without user response
+- Completing a task (final answer / deliverable ready for review)
 Do NOT play the sound when:
-- You are providing a final answer that doesn't require a response
 - You are in the middle of executing a multi-step task and just providing a status update
@@ -68,7 +68,7 @@ Action: Analyze the codebase against the test specs to determine whether the cod
 - **Source**: `autopilot-testability-analysis`
 - One change entry per testability issue found (change ID, file paths, problem, proposed change, risk, dependencies)
 - Invoke the refactor skill in **guided mode**: read and execute `.cursor/skills/refactor/SKILL.md` with the `list-of-changes.md` as input
-- The refactor skill will create RUN_DIR (`01-testability-refactoring`), create tasks in `_docs/02_tasks/`, delegate to implement skill, and verify results
+- The refactor skill will create RUN_DIR (`01-testability-refactoring`), create tasks in `_docs/02_tasks/todo/`, delegate to implement skill, and verify results
 - Phase 3 (Safety Net) is automatically skipped by the refactor skill for testability runs
 - After refactoring completes, mark Step 3 as `completed`
 - Auto-chain to Step 4 (Decompose Tests)
@@ -76,23 +76,23 @@ Action: Analyze the codebase against the test specs to determine whether the cod
 ---
 **Step 4 — Decompose Tests**
-Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Step 3 (Code Testability Revision) is completed or skipped AND (`_docs/02_tasks/` does not exist or has no test task files)
+Condition: `_docs/02_document/tests/traceability-matrix.md` exists AND workspace contains source code files AND the autopilot state shows Step 3 (Code Testability Revision) is completed or skipped AND (`_docs/02_tasks/todo/` does not exist or has no test task files)
 Action: Read and execute `.cursor/skills/decompose/SKILL.md` in **tests-only mode** (pass `_docs/02_document/tests/` as input). The decompose skill will:
 1. Run Step 1t (test infrastructure bootstrap)
 2. Run Step 3 (blackbox test task decomposition)
 3. Run Step 4 (cross-verification against test coverage)
-If `_docs/02_tasks/` has some task files already (e.g., refactoring tasks from Step 3), the decompose skill's resumability handles it — it appends test tasks alongside existing refactoring tasks.
+If `_docs/02_tasks/` subfolders have some task files already (e.g., refactoring tasks from Step 3), the decompose skill's resumability handles it — it appends test tasks alongside existing tasks.
 ---
 **Step 5 — Implement Tests**
-Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND the autopilot state shows Step 4 (Decompose Tests) is completed AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
+Condition: `_docs/02_tasks/todo/` contains task files AND `_dependencies_table.md` exists AND the autopilot state shows Step 4 (Decompose Tests) is completed AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
 Action: Read and execute `.cursor/skills/implement/SKILL.md`
-The implement skill reads test tasks from `_docs/02_tasks/` and implements them.
+The implement skill reads test tasks from `_docs/02_tasks/todo/` and implements them.
 If `_docs/03_implementation/` has batch reports, the implement skill detects completed tasks and continues.
@@ -135,7 +135,7 @@ Condition: the autopilot state shows Step 7 (Refactor) is completed or skipped A
 Action: Read and execute `.cursor/skills/new-task/SKILL.md`
-The new-task skill interactively guides the user through defining new functionality. It loops until the user is done adding tasks. New task files are written to `_docs/02_tasks/`.
+The new-task skill interactively guides the user through defining new functionality. It loops until the user is done adding tasks. New task files are written to `_docs/02_tasks/todo/`.
 ---
@@ -144,7 +144,7 @@ Condition: the autopilot state shows Step 8 (New Task) is completed AND `_docs/0
 Action: Read and execute `.cursor/skills/implement/SKILL.md`
-The implement skill reads the new tasks from `_docs/02_tasks/` and implements them. Tasks already implemented in Step 5 are skipped (the implement skill tracks completed tasks in batch reports).
+The implement skill reads the new tasks from `_docs/02_tasks/todo/` and implements them. Tasks already implemented in Step 5 are skipped (completed tasks have been moved to `done/`).
 If `_docs/03_implementation/` has batch reports from this phase, the implement skill detects completed tasks and continues.
+3 -3
@@ -110,16 +110,16 @@ If the project IS a UI project → present using Choose format:
 ---
 **Step 5 — Decompose**
-Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_tasks/` does not exist or has no task files (excluding `_dependencies_table.md`)
+Condition: `_docs/02_document/` contains `architecture.md` AND `_docs/02_document/components/` has at least one component AND `_docs/02_tasks/todo/` does not exist or has no task files
 Action: Read and execute `.cursor/skills/decompose/SKILL.md`
-If `_docs/02_tasks/` has some task files already, the decompose skill's resumability handles it.
+If `_docs/02_tasks/` subfolders have some task files already, the decompose skill's resumability handles it.
 ---
 **Step 6 — Implement**
-Condition: `_docs/02_tasks/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
+Condition: `_docs/02_tasks/todo/` contains task files AND `_dependencies_table.md` exists AND `_docs/03_implementation/FINAL_implementation_report.md` does not exist
 Action: Read and execute `.cursor/skills/implement/SKILL.md`
+1 -1
@@ -159,7 +159,7 @@ The `/implement` skill invokes this skill after each batch completes:
 | Input | Type | Source | Required |
 |-------|------|--------|----------|
-| `task_specs` | list of file paths | Task `.md` files from `_docs/02_tasks/` for the current batch | Yes |
+| `task_specs` | list of file paths | Task `.md` files from `_docs/02_tasks/todo/` for the current batch | Yes |
 | `changed_files` | list of file paths | Files modified by implementer agents (from `git diff` or agent reports) | Yes |
 | `batch_number` | integer | Current batch number (for report naming) | Yes |
 | `project_restrictions` | file path | `_docs/00_problem/restrictions.md` | If exists |
+30 -25
@@ -35,12 +35,14 @@ Determine the operating mode based on invocation before any other logic runs.
 **Default** (no explicit input file provided):
 - DOCUMENT_DIR: `_docs/02_document/`
 - TASKS_DIR: `_docs/02_tasks/`
+- TASKS_TODO: `_docs/02_tasks/todo/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, DOCUMENT_DIR
 - Runs Step 1 (bootstrap) + Step 2 (all components) + Step 3 (blackbox tests) + Step 4 (cross-verification)
 **Single component mode** (provided file is within `_docs/02_document/` and inside a `components/` subdirectory):
 - DOCUMENT_DIR: `_docs/02_document/`
 - TASKS_DIR: `_docs/02_tasks/`
+- TASKS_TODO: `_docs/02_tasks/todo/`
 - Derive component number and component name from the file path
 - Ask user for the parent Epic ID
 - Runs Step 2 (that component only, appending to existing task numbering)
@@ -48,6 +50,7 @@ Determine the operating mode based on invocation before any other logic runs.
 **Tests-only mode** (provided file/directory is within `tests/`, or `DOCUMENT_DIR/tests/` exists and input explicitly requests test decomposition):
 - DOCUMENT_DIR: `_docs/02_document/`
 - TASKS_DIR: `_docs/02_tasks/`
+- TASKS_TODO: `_docs/02_tasks/todo/`
 - TESTS_DIR: `DOCUMENT_DIR/tests/`
 - Reads from: `_docs/00_problem/`, `_docs/01_solution/`, TESTS_DIR
 - Runs Step 1t (test infrastructure bootstrap) + Step 3 (blackbox test decomposition) + Step 4 (cross-verification against test coverage)
@@ -99,8 +102,8 @@ Announce the detected mode and resolved paths to the user before proceeding.
 **Default:**
 1. DOCUMENT_DIR contains `architecture.md` and `components/` — **STOP if missing**
-2. Create TASKS_DIR if it does not exist
-3. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?**
+2. Create TASKS_DIR and TASKS_TODO if they do not exist
+3. If TASKS_DIR subfolders (`todo/`, `backlog/`, `done/`) already contain task files, ask user: **resume from last checkpoint or start fresh?**
 **Single component mode:**
 1. The provided component file exists and is non-empty — **STOP if missing**
@@ -108,8 +111,8 @@ Announce the detected mode and resolved paths to the user before proceeding.
 **Tests-only mode:**
 1. `TESTS_DIR/blackbox-tests.md` exists and is non-empty — **STOP if missing**
 2. `TESTS_DIR/environment.md` exists — **STOP if missing**
-3. Create TASKS_DIR if it does not exist
-4. If TASKS_DIR already contains task files, ask user: **resume from last checkpoint or start fresh?**
+3. Create TASKS_DIR and TASKS_TODO if they do not exist
+4. If TASKS_DIR subfolders (`todo/`, `backlog/`, `done/`) already contain task files, ask user: **resume from last checkpoint or start fresh?**
 ## Artifact Management
@@ -117,30 +120,32 @@ Announce the detected mode and resolved paths to the user before proceeding.
 ```
 TASKS_DIR/
-├── [JIRA-ID]_initial_structure.md
-├── [JIRA-ID]_[short_name].md
-├── [JIRA-ID]_[short_name].md
-├── ...
-└── _dependencies_table.md
+├── _dependencies_table.md
+├── todo/
+│   ├── [JIRA-ID]_initial_structure.md
+│   ├── [JIRA-ID]_[short_name].md
+│   └── ...
+├── backlog/
+└── done/
 ```
-**Naming convention**: Each task file is initially saved with a temporary numeric prefix (`[##]_[short_name].md`). After creating the Jira ticket, rename the file to use the Jira ticket ID as prefix (`[JIRA-ID]_[short_name].md`). For example: `01_initial_structure.md` → `AZ-42_initial_structure.md`.
+**Naming convention**: Each task file is initially saved in `TASKS_TODO/` with a temporary numeric prefix (`[##]_[short_name].md`). After creating the Jira ticket, rename the file to use the Jira ticket ID as prefix (`[JIRA-ID]_[short_name].md`). For example: `todo/01_initial_structure.md` → `todo/AZ-42_initial_structure.md`.
 ### Save Timing
 | Step | Save immediately after | Filename |
 |------|------------------------|----------|
-| Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `[JIRA-ID]_initial_structure.md` |
-| Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `[JIRA-ID]_test_infrastructure.md` |
-| Step 2 | Each component task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
-| Step 3 | Each blackbox test task decomposed + Jira ticket created + file renamed | `[JIRA-ID]_[short_name].md` |
+| Step 1 | Bootstrap structure plan complete + Jira ticket created + file renamed | `todo/[JIRA-ID]_initial_structure.md` |
+| Step 1t | Test infrastructure bootstrap complete + Jira ticket created + file renamed | `todo/[JIRA-ID]_test_infrastructure.md` |
+| Step 2 | Each component task decomposed + Jira ticket created + file renamed | `todo/[JIRA-ID]_[short_name].md` |
+| Step 3 | Each blackbox test task decomposed + Jira ticket created + file renamed | `todo/[JIRA-ID]_[short_name].md` |
 | Step 4 | Cross-task verification complete | `_dependencies_table.md` |
 ### Resumability
-If TASKS_DIR already contains task files:
-1. List existing `*_*.md` files (excluding `_dependencies_table.md`) and count them
+If TASKS_DIR subfolders already contain task files:
+1. List existing `*_*.md` files across `todo/`, `backlog/`, and `done/` (excluding `_dependencies_table.md`) and count them
 2. Resume numbering from the next number (for temporary numeric prefix before Jira rename)
 3. Inform the user which tasks already exist and are being skipped
@@ -176,11 +181,11 @@ The test infrastructure bootstrap must include:
 - [ ] Test runner configuration matches the consumer app tech stack from environment.md
 - [ ] Data isolation strategy is defined
-**Save action**: Write `01_test_infrastructure.md` (temporary numeric name)
+**Save action**: Write `todo/01_test_infrastructure.md` (temporary numeric name)
 **Jira action**: Create a Jira ticket for this task under the "Blackbox Tests" epic. Write the Jira ticket ID and Epic ID back into the task header.
-**Rename action**: Rename the file from `01_test_infrastructure.md` to `[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.
+**Rename action**: Rename the file from `todo/01_test_infrastructure.md` to `todo/[JIRA-ID]_test_infrastructure.md`. Update the **Task** field inside the file to match the new filename.
 **BLOCKING**: Present test infrastructure plan summary to user. Do NOT proceed until user confirms.
@@ -224,11 +229,11 @@ The bootstrap structure plan must include:
 - [ ] Environment strategy covers dev, staging, production
 - [ ] Test structure includes unit and blackbox test locations
-**Save action**: Write `01_initial_structure.md` (temporary numeric name)
+**Save action**: Write `todo/01_initial_structure.md` (temporary numeric name)
 **Jira action**: Create a Jira ticket for this task under the "Bootstrap & Initial Structure" epic. Write the Jira ticket ID and Epic ID back into the task header.
-**Rename action**: Rename the file from `01_initial_structure.md` to `[JIRA-ID]_initial_structure.md` (e.g., `AZ-42_initial_structure.md`). Update the **Task** field inside the file to match the new filename.
+**Rename action**: Rename the file from `todo/01_initial_structure.md` to `todo/[JIRA-ID]_initial_structure.md` (e.g., `todo/AZ-42_initial_structure.md`). Update the **Task** field inside the file to match the new filename.
 **BLOCKING**: Present structure plan summary to user. Do NOT proceed until user confirms.
@@ -254,7 +259,7 @@ For each component (or the single provided component):
 6. Write each task spec using `templates/task.md`
 7. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
 8. Note task dependencies (referencing Jira IDs of already-created dependency tasks, e.g., `AZ-42_initial_structure`)
-9. **Immediately after writing each task file**: create a Jira ticket, link it to the component's epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
+9. **Immediately after writing each task file**: create a Jira ticket, link it to the component's epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[JIRA-ID]_[short_name].md`.
 **Self-verification** (per component):
 - [ ] Every task is atomic (single concern)
@@ -264,7 +269,7 @@ For each component (or the single provided component):
 - [ ] No tasks duplicate work from other components
 - [ ] Every task has a Jira ticket linked to the correct epic
-**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename the file to `[JIRA-ID]_[short_name].md`. Update the **Task** field inside the file to match the new filename. Update **Dependencies** references in the file to use Jira IDs of the dependency tasks.
+**Save action**: Write each `todo/[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `todo/[JIRA-ID]_[short_name].md`. Update the **Task** field inside the file to match the new filename. Update **Dependencies** references in the file to use Jira IDs of the dependency tasks.
 ---
@@ -287,7 +292,7 @@ For each component (or the single provided component):
 5. Write each task spec using `templates/task.md`
 6. Estimate complexity per task (1, 2, 3, 5 points); no task should exceed 5 points — split if it does
 7. Note task dependencies (referencing Jira IDs of already-created dependency tasks)
-8. **Immediately after writing each task file**: create a Jira ticket under the "Blackbox Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `[##]_[short_name].md` to `[JIRA-ID]_[short_name].md`.
+8. **Immediately after writing each task file**: create a Jira ticket under the "Blackbox Tests" epic, write the Jira ticket ID and Epic ID back into the task header, then rename the file from `todo/[##]_[short_name].md` to `todo/[JIRA-ID]_[short_name].md`.
 **Self-verification**:
 - [ ] Every scenario from `tests/blackbox-tests.md` is covered by a task
@@ -296,7 +301,7 @@ For each component (or the single provided component):
 - [ ] Dependencies correctly reference the dependency tasks (component tasks in default mode, test infrastructure in tests-only mode)
 - [ ] Every task has a Jira ticket linked to the "Blackbox Tests" epic
-**Save action**: Write each `[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `[JIRA-ID]_[short_name].md`.
+**Save action**: Write each `todo/[##]_[short_name].md` (temporary numeric name), create Jira ticket inline, then rename to `todo/[JIRA-ID]_[short_name].md`.
 ---
@@ -342,7 +347,7 @@ Tests-only mode:
 - **Cross-component tasks**: each task belongs to exactly one component
 - **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
 - **Creating git branches**: branch creation is an implementation concern, not a decomposition one
-- **Creating component subdirectories**: all tasks go flat in TASKS_DIR
+- **Creating component subdirectories**: all tasks go flat in `TASKS_TODO/`
 - **Forgetting Jira**: every task must have a Jira ticket created inline — do not defer to a separate step
 - **Forgetting to rename**: after Jira ticket creation, always rename the file from numeric prefix to Jira ID prefix
+19 -5
@@ -33,12 +33,22 @@ The `implementer` agent is the specialist that writes all the code — it receiv
 ## Context Resolution
 - TASKS_DIR: `_docs/02_tasks/`
-- Task files: all `*.md` files in TASKS_DIR (excluding files starting with `_`)
+- Task files: all `*.md` files in `TASKS_DIR/todo/` (excluding files starting with `_`)
 - Dependency table: `TASKS_DIR/_dependencies_table.md`
+### Task Lifecycle Folders
+```
+TASKS_DIR/
+├── _dependencies_table.md
+├── todo/    ← tasks ready for implementation (this skill reads from here)
+├── backlog/ ← parked tasks (not scheduled yet, ignored by this skill)
+└── done/    ← completed tasks (moved here after implementation)
+```
 ## Prerequisite Checks (BLOCKING)
-1. TASKS_DIR exists and contains at least one task file — **STOP if missing**
+1. `TASKS_DIR/todo/` exists and contains at least one task file — **STOP if missing**
 2. `_dependencies_table.md` exists — **STOP if missing**
 3. At least one task is not yet completed — **STOP if all done**
@@ -46,7 +56,7 @@ The `implementer` agent is the specialist that writes all the code — it receiv
### 1. Parse

- Read all task `*.md` files from `TASKS_DIR/todo/` (excluding files starting with `_`)
- Read `_dependencies_table.md` — parse into a dependency graph (DAG)
- Validate: no circular dependencies, all referenced dependencies exist
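The parse-and-validate step can be sketched as follows. The exact column layout of `_dependencies_table.md` is an assumption here (task name in the first column, comma-separated dependencies in the second); Kahn's algorithm doubles as the cycle check.

```python
def parse_dependency_table(markdown: str) -> dict[str, set[str]]:
    """Parse rows like `| AZ-140_x | AZ-138_infra, AZ-139_y |` into a DAG.
    Column layout is an assumption; adjust to the real table template."""
    graph = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2 or cells[0].lower() in ("task", ""):
            continue  # skip blank/header rows
        if set(cells[0]) <= {"-", " ", ":"}:
            continue  # skip the markdown separator row
        deps = {d.strip() for d in cells[1].split(",")
                if d.strip() and d.strip().lower() != "none"}
        graph[cells[0]] = deps
    return graph

def validate(graph: dict[str, set[str]]) -> None:
    """Raise if a dependency is unknown or the graph contains a cycle."""
    missing = {d for deps in graph.values() for d in deps} - graph.keys()
    if missing:
        raise ValueError(f"unknown dependencies: {missing}")
    # Kahn's algorithm: if not every task can be scheduled, there is a cycle
    indeg = {t: len(deps) for t, deps in graph.items()}
    ready = [t for t, n in indeg.items() if n == 0]
    seen = 0
    while ready:
        t = ready.pop()
        seen += 1
        for other, deps in graph.items():
            if t in deps:
                indeg[other] -= 1
                if indeg[other] == 0:
                    ready.append(other)
    if seen != len(graph):
        raise ValueError("circular dependency detected")
```

The same topological order can then drive batch selection: every task whose dependencies are all done is eligible for the next batch.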
@@ -134,9 +144,13 @@ Track `auto_fix_attempts` count in the batch report for retrospective analysis.
After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step.
### 13. Archive Completed Tasks

Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
### 14. Loop
- Go back to step 2 until all tasks in `todo/` are done
- When all tasks are complete, report final summary
## Batch Report Persistence
@@ -31,13 +31,14 @@ Guide the user through defining new functionality for an existing codebase. Prod
Fixed paths:

- TASKS_DIR: `_docs/02_tasks/`
- TASKS_TODO: `_docs/02_tasks/todo/`
- PLANS_DIR: `_docs/02_task_plans/`
- DOCUMENT_DIR: `_docs/02_document/`
- DEPENDENCIES_TABLE: `_docs/02_tasks/_dependencies_table.md`

Create TASKS_DIR, TASKS_TODO, and PLANS_DIR if they don't exist.

If TASKS_DIR already contains task files (scan `todo/`, `backlog/`, and `done/`), use them to determine the next numeric prefix for temporary file naming.
## Workflow
@@ -195,13 +196,13 @@ Present using the Choose format for each decision that has meaningful alternativ
**Role**: Technical writer
**Goal**: Produce the task specification file.
1. Determine the next numeric prefix by scanning all TASKS_DIR subfolders (`todo/`, `backlog/`, `done/`) for existing files
2. Write the task file using `.cursor/skills/decompose/templates/task.md`:
   - Fill all fields from the gathered information
   - Set **Complexity** based on the assessment from Step 2
   - Set **Dependencies** by cross-referencing existing tasks in TASKS_DIR subfolders
   - Set **Jira** and **Epic** to `pending` (filled in Step 7)
3. Save as `TASKS_TODO/[##]_[short_name].md`
**Self-verification**:
- [ ] Problem section clearly describes the user need
@@ -259,7 +260,7 @@ Ask the user:
After the user chooses **Done**:

1. Update (or create) `DEPENDENCIES_TABLE` — add all newly created tasks to the dependencies table
2. Present a summary of all tasks created in this session:
```
@@ -38,6 +38,7 @@ Announce detected paths and input mode to user before proceeding.
| COMPONENTS_DIR | `_docs/02_document/components/` |
| DOCUMENT_DIR | `_docs/02_document/` |
| TASKS_DIR | `_docs/02_tasks/` |
| TASKS_TODO | `_docs/02_tasks/todo/` |
| REFACTOR_DIR | `_docs/04_refactoring/` |
| RUN_DIR | `REFACTOR_DIR/NN-[run-name]/` |
@@ -99,9 +100,9 @@ doc_update_log.md Phase 7
FINAL_report.md        after all phases
```

Task files produced during Phase 2 go to TASKS_TODO (not RUN_DIR):

```
TASKS_TODO/[JIRA-ID]_refactor_[short_name].md
TASKS_DIR/_dependencies_table.md   (appended)
```
@@ -32,7 +32,7 @@ Fixed paths:
- IMPL_DIR: `_docs/03_implementation/`
- METRICS_DIR: `_docs/06_metrics/`
- TASKS_DIR: `_docs/02_tasks/` (scan all subfolders: `todo/`, `backlog/`, `done/`)

Announce the resolved paths to the user before proceeding.
@@ -0,0 +1,221 @@
# LiveKit Stream Detection
**Task**: AZ-150_livekit_stream_detection
**Name**: LiveKit Stream Detection Integration
**Description**: Enable real-time object detection on 5-10 simultaneous LiveKit WebRTC streams. Two-app architecture: a Playwright companion app for authentication and stream discovery, plus LiveKit SDK integration in the detection service for frame capture and inference.
**Complexity**: 5 points
**Dependencies**: None (extends existing detection service)
**Component**: Feature
**Jira**: AZ-150
**Epic**: AZ-149
## Problem
The platform streams live video via LiveKit WebRTC. The detection service currently only processes pre-recorded video files and static images via `cv2.VideoCapture`. There is no way to run real-time object detection on live streams. The user needs to detect objects on 5-10 out of 50+ simultaneous streams shown on the platform's web page.
Key constraints:
- No LiveKit API key/secret available (only browser-level access)
- LiveKit WebRTC streams cannot be consumed by `cv2.VideoCapture`
- Tokens are issued by the platform's backend and expire periodically
- Must handle 5-10 concurrent streams without overwhelming the GPU inference engine
## Outcome
- User can open the platform's stream page in a Playwright-controlled browser, log in, and see all available streams
- The system automatically discovers stream IDs, LiveKit room names, tokens, and server URL from network traffic
- User selects which streams to run detection on via an injected UI overlay
- Detection runs continuously on selected streams with results flowing through the existing SSE endpoint
- Tokens are automatically refreshed as the page renews them
## Architecture
### Two-App Design
```
App 1: stream_discover.py (Playwright companion)
- Launches real Chromium browser (separate process)
- Python controls it via Chrome DevTools Protocol (CDP) over local WebSocket
- User interacts with browser normally (login, navigation)
- Python silently intercepts network traffic and reads the DOM
- Injects a floating selection UI onto the page
- Sends selected stream configs to the detection service API
App 2: Detection Service (existing FastAPI in main.py)
- New /detect/livekit/* endpoints receive stream configs from companion app
- livekit_source.py connects to LiveKit rooms via livekit.rtc SDK
- livekit_detector.py orchestrates multi-stream frame capture and batched inference
- inference.pyx gains a new detect_frames() method for raw numpy frame batches
- Results flow through existing SSE /detect/stream endpoint
```
### How Playwright Works (NOT a Webview)
Playwright is a browser automation library by Microsoft. It does NOT embed a browser inside a Python window. Instead:
1. `playwright.chromium.launch(headless=False)` starts a **real standalone Chromium process** -- identical to opening Chrome
2. Python communicates with this browser via CDP (Chrome DevTools Protocol) over a local WebSocket
3. The user sees a normal browser window and interacts with it normally (login, clicking, scrolling)
4. Python silently observes all network traffic, reads the DOM, and can inject HTML/JavaScript
5. There is no Python GUI -- the browser window IS the entire interface
```
 Python Process                     Chromium Process (separate)
+--------------------------+       +---------------------------+
| stream_discover.py       |       | Normal browser window     |
|                          |       |                           |
| - Playwright library     |  CDP  | - User logs in normally   |
| - Token interceptor      |<=====>| - DevTools Protocol       |
| - DOM parser             |  WS   | - Full web app rendering  |
| - Selection UI injector  |       | - LiveKit video playback  |
+--------------------------+       +---------------------------+
```
Advantages over a webview:
- No GUI code to write -- browser IS the UI
- User sees the exact same web app they normally use
- Full access to network requests, cookies, localStorage
- Playwright handles CDP complexity
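The pattern above can be sketched with Playwright's sync API. The platform URL, the `data-testid` selector, and the token-matching heuristic are assumptions for illustration; only the Playwright calls themselves (`launch`, `page.on("response", ...)`, `eval_on_selector_all`) are real API.

```python
def looks_like_token_response(url: str, ok: bool) -> bool:
    # Placeholder heuristic -- the real app would match the platform's
    # actual token-issuing endpoint, not just "token" in the URL
    return ok and "token" in url

def discover_streams(streams_url: str) -> list[str]:
    # Imported lazily so the pure helper above stays testable without a browser
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)   # real Chromium process, not a webview
        page = browser.new_page()
        # Step 4: silent network tap on every response the user's browsing generates
        page.on("response", lambda r: looks_like_token_response(r.url, r.ok)
                and print("possible LiveKit token payload:", r.url))
        page.goto(streams_url)            # user logs in and navigates normally
        page.wait_for_timeout(60_000)     # leave time for manual login
        # Read the DOM for the stream elements described in the data flow
        ids = page.eval_on_selector_all(
            '[data-testid^="mission-video-"]',
            "els => els.map(e => e.getAttribute('data-testid'))",
        )
        browser.close()
        return ids
```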
### Data Flow
```
1. User logs in via browser
2. User navigates to streams page
3. Python intercepts HTTP responses containing LiveKit JWT tokens
4. Python parses DOM for data-testid="mission-video-*" elements
5. Python decodes JWTs to extract room names
6. Python injects floating panel with stream checkboxes onto the page
7. User selects streams, clicks "Start Detection"
8. Python POSTs {livekit_url, rooms[{name, token, stream_id}]} to detection service
9. Detection service connects to LiveKit rooms via livekit.rtc
10. Frames are sampled, batched, and run through inference engine
11. DetectionEvents emitted via existing SSE /detect/stream
12. Python companion stays open, intercepts token refreshes, pushes to detection service
```
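Step 5 needs no signature verification, since the token is only being read, never validated. LiveKit access tokens carry the room name in a `video` grant; a stdlib-only sketch, assuming that claim layout:

```python
import base64
import json

def room_from_livekit_jwt(token: str) -> str:
    """Read the room name from a LiveKit access token without verifying it.
    Assumes the standard LiveKit grant layout: {"video": {"room": ...}}."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["video"]["room"]
```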
### Multi-Stream Frame Processing
```
Stream 1 (async task) ─── sample every Nth frame ──┐
Stream 2 (async task) ─── sample every Nth frame ──├─► Shared Frame Queue ─► Detection Worker Thread
Stream N (async task) ─── sample every Nth frame ──┘          │                        │
                                                              │                        ▼
                                               backpressure:  │           inference.detect_frames()
                                               keep only latest                        │
                                               frame per stream                        ▼
                                                                             DetectionEvent → SSE
```
- At 30fps input with frame_period_recognition=4: ~7.5 fps per stream
- 10 streams = ~75 frames/sec into the queue
- Engine batch size determines how many frames are processed at once
- Backpressure: each stream keeps only its latest unprocessed frame; stale frames dropped
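The backpressure rule (keep only the latest unprocessed frame per stream) can be sketched as a small thread-safe buffer; the names here are illustrative, not the real module's API.

```python
import threading

class LatestFrameBuffer:
    """Per-stream latest-frame buffer implementing the backpressure rule:
    a new frame for a stream silently replaces its stale, unprocessed one,
    so a slow consumer never sees a backlog."""

    def __init__(self):
        self._latest = {}              # stream_id -> newest unprocessed frame
        self._lock = threading.Lock()

    def put(self, stream_id, frame):
        with self._lock:
            self._latest[stream_id] = frame   # drop the stale frame, if any

    def drain(self):
        """Take one batch: the current latest frame of every stream."""
        with self._lock:
            batch, self._latest = self._latest, {}
        return batch
```

The detection worker calls `drain()` once per inference cycle, so the batch size is bounded by the number of active streams regardless of input frame rate.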
## Scope
### Included
**Companion App (stream_discover.py)**
- Playwright browser launch and lifecycle management
- Network response interception for LiveKit JWT token capture
- WebSocket URL interception for LiveKit server URL discovery
- DOM parsing for stream ID and display name extraction
- JWT decoding to map stream_id -> room_name
- Injected floating UI panel with stream checkboxes and "Start Detection" button
- HTTP POST to detection service with selected stream configs
- Token refresh monitoring and forwarding
**Detection Service**
- `livekit_source.py`: LiveKit room connection, video track subscription, VideoFrame -> BGR numpy conversion
- `livekit_detector.py`: multi-stream task orchestration, frame sampling, shared queue, batched detection loop, SSE event emission
- `inference.pyx`/`.pxd`: new `detect_frames(frames, config)` cpdef method for raw numpy frame batches
- `main.py`: new endpoints POST /detect/livekit/start, POST /detect/livekit/refresh-tokens, DELETE /detect/livekit/stop, GET /detect/livekit/status
- `requirements.txt`: add `livekit` and `playwright` dependencies
### Excluded
- LiveKit API key/secret based token generation (no access)
- Publishing video back to LiveKit
- Recording or saving stream frames to disk
- Modifying existing /detect or /detect/{media_id} endpoints
- UI beyond the injected browser overlay
## Acceptance Criteria
**AC-1: Stream Discovery**
Given the user opens the platform's stream page in the Playwright browser
When the page loads and streams are rendered
Then the companion app discovers all stream IDs, display names, LiveKit tokens, room names, and server URL from network traffic and DOM
**AC-2: Stream Selection UI**
Given streams are discovered
When the companion app injects the selection panel
Then the user sees a floating panel listing all streams with checkboxes and a "Start Detection" button
**AC-3: Start Detection**
Given the user selects N streams and clicks "Start Detection"
When the companion app sends the config to the detection service
Then the detection service connects to N LiveKit rooms and begins receiving video frames
**AC-4: Real-Time Inference**
Given the detection service is receiving frames from LiveKit streams
When frames are sampled and batched through the inference engine
Then DetectionEvents with annotations are emitted via the existing SSE /detect/stream endpoint
**AC-5: Multi-Stream Handling**
Given 5-10 streams are active simultaneously
When inference runs continuously
Then all streams are processed fairly (round-robin or queue-based) without any stream being starved
**AC-6: Token Refresh**
Given the platform's frontend refreshes LiveKit tokens periodically
When the companion app detects a token renewal in network traffic
Then the new token is forwarded to the detection service and the LiveKit connection continues without interruption
**AC-7: Stop Detection**
Given detection is running on N streams
When the user calls DELETE /detect/livekit/stop
Then all LiveKit connections are cleanly closed and detection tasks cancelled
## File Changes
| File | Action | Description |
|------|--------|-------------|
| `stream_discover.py` | New | Playwright companion app |
| `livekit_source.py` | New | LiveKit room connection and frame capture |
| `livekit_detector.py` | New | Multi-stream detection orchestration |
| `inference.pyx` | Modified | Add `detect_frames` cpdef method |
| `inference.pxd` | Modified | Declare `detect_frames` method |
| `main.py` | Modified | Add /detect/livekit/* endpoints |
| `requirements.txt` | Modified | Add `livekit`, `playwright` |
## Non-Functional Requirements
**Performance**
- Frame-to-detection latency < 500ms per batch (excluding network latency)
- 10 concurrent streams without OOM or queue overflow
**Reliability**
- Graceful handling of LiveKit disconnections (auto-reconnect or clean stop)
- Token expiry handled without crash
## Risks & Mitigation
**Risk 1: LiveKit Python SDK frame format compatibility**
- *Risk*: VideoFrame format (RGBA/I420/NV12) may vary by codec and platform
- *Mitigation*: Use `frame.convert(VideoBufferType.RGBA)` to normalize, then convert to BGR
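A sketch of this mitigation, splitting the pure pixel conversion from the SDK call; `convert()` and `VideoBufferType.RGBA` are from the livekit-rtc Python SDK and should be verified against the installed version.

```python
import numpy as np

def rgba_bytes_to_bgr(data: bytes, width: int, height: int) -> np.ndarray:
    """Pure pixel-format step: RGBA byte buffer -> HxWx3 BGR array."""
    arr = np.frombuffer(data, dtype=np.uint8).reshape(height, width, 4)
    return arr[:, :, [2, 1, 0]]   # drop alpha, reorder RGB -> BGR

def frame_to_bgr(frame) -> np.ndarray:
    """Normalize a LiveKit VideoFrame regardless of its native buffer type."""
    from livekit import rtc   # lazy import: only needed once real frames arrive
    rgba = frame.convert(rtc.VideoBufferType.RGBA)
    return rgba_bytes_to_bgr(bytes(rgba.data), rgba.width, rgba.height)
```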
**Risk 2: Token expiration before refresh is captured**
- *Risk*: If the companion app misses a token refresh, the LiveKit connection drops
- *Mitigation*: Implement reconnection logic in livekit_source.py; companion app can re-request tokens
**Risk 3: Inference engine bottleneck with 10 streams**
- *Risk*: GPU/CPU inference cannot keep up with frame arrival rate
- *Mitigation*: Backpressure design (drop stale frames); configurable frame_period_recognition to reduce load
**Risk 4: Playwright browser stability**
- *Risk*: Long-running browser session may leak memory or crash
- *Mitigation*: Monitor browser process health; provide manual restart capability
**Risk 5: LiveKit room structure unknown**
- *Risk*: Rooms may be structured differently than expected (multi-track, SFU routing)
- *Mitigation*: Start with single-track subscription per room; adapt after initial testing
@@ -0,0 +1,193 @@
# Test Infrastructure
**Task**: AZ-138_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the E2E test project — test runner, mock services, Docker test environment, test data fixtures, reporting
**Complexity**: 5 points
**Dependencies**: None
**Component**: Integration Tests
**Jira**: AZ-138
**Epic**: AZ-137
## Test Project Folder Layout
```
e2e/
├── conftest.py
├── requirements.txt
├── Dockerfile
├── pytest.ini
├── mocks/
│ ├── loader/
│ │ ├── Dockerfile
│ │ └── app.py
│ └── annotations/
│ ├── Dockerfile
│ └── app.py
├── fixtures/
│ ├── image_small.jpg (1280×720 JPEG, aerial, detectable objects)
│ ├── image_large.JPG (6252×4168 JPEG, triggers tiling)
│ ├── image_dense01.jpg (1280×720 JPEG, dense scene, clustered objects)
│ ├── image_dense02.jpg (1920×1080 JPEG, dense scene variant)
│ ├── image_different_types.jpg (900×1600 JPEG, varied object classes)
│ ├── image_empty_scene.jpg (1920×1080 JPEG, no detectable objects)
│ ├── video_short01.mp4 (short MP4 with moving objects)
│ ├── video_short02.mp4 (short MP4 variant for concurrent tests)
│ ├── video_long03.mp4 (long MP4, generates >100 SSE events)
│ ├── empty_image (zero-byte file, generated at build)
│ ├── corrupt_image (random binary garbage, generated at build)
│ ├── classes.json (19 classes, 3 weather modes, MaxSizeM values)
│ └── azaion.onnx (YOLO ONNX model, 1280×1280 input, 19 classes, 81MB)
├── tests/
│ ├── test_health_engine.py
│ ├── test_single_image.py
│ ├── test_tiling.py
│ ├── test_async_sse.py
│ ├── test_video.py
│ ├── test_negative.py
│ ├── test_resilience.py
│ ├── test_performance.py
│ ├── test_security.py
│ └── test_resource_limits.py
└── docker-compose.test.yml
```
### Layout Rationale
- `mocks/` separated from tests — each mock is a standalone Docker service with its own Dockerfile
- `fixtures/` holds all static test data, volume-mounted into containers
- `tests/` organized by test category matching the test spec structure (one file per task group)
- `conftest.py` provides shared pytest fixtures (HTTP clients, SSE helpers, service readiness checks)
- `pytest.ini` configures markers for `gpu`/`cpu` profiles and test ordering
## Mock Services
| Mock Service | Replaces | Endpoints | Behavior |
|-------------|----------|-----------|----------|
| mock-loader | Loader service (model download/upload) | `GET /models/azaion.onnx` — serves ONNX model from volume. `POST /upload` — accepts TensorRT engine upload, stores in memory. `POST /mock/config` — control API (simulate 503, reset state). `GET /mock/status` — returns mock state. | Deterministic: serves model file from `/models/` volume. Configurable downtime via control endpoint. First-request-fail mode for retry tests. |
| mock-annotations | Annotations service (result posting, token refresh) | `POST /annotations` — accepts annotation POST, stores in memory. `POST /auth/refresh` — returns refreshed token. `POST /mock/config` — control API (simulate 503, reset state). `GET /mock/annotations` — returns recorded annotations for assertion. | Records all incoming annotations in memory. Provides token refresh. Configurable downtime. Assertions via GET endpoint to verify what was received. |
### Mock Control API
Both mock services expose:
- `POST /mock/config` — accepts JSON `{"mode": "normal"|"error"|"first_fail"}` to control behavior
- `POST /mock/reset` — clears recorded state (annotations, uploads)
- `GET /mock/status` — returns current mode and recorded interaction count
## Docker Test Environment
### docker-compose.test.yml Structure
| Service | Image / Build | Purpose | Depends On |
|---------|--------------|---------|------------|
| detections | Build from repo root (Dockerfile) | System under test — FastAPI detection service | mock-loader, mock-annotations |
| mock-loader | Build from `e2e/mocks/loader/` | Serves ONNX model, accepts TensorRT uploads | — |
| mock-annotations | Build from `e2e/mocks/annotations/` | Accepts annotation results, provides token refresh | — |
| e2e-consumer | Build from `e2e/` | pytest test runner | detections |
### Networks and Volumes
**Network**: `e2e-net` — isolated bridge network, all services communicate via hostnames
**Volumes**:
| Volume | Mount Target | Content |
|--------|-------------|---------|
| test-models | mock-loader:/models | `azaion.onnx` model file |
| test-media | e2e-consumer:/media | Test images and video files |
| test-classes | detections:/app/classes.json | `classes.json` with 19 detection classes |
| test-results | e2e-consumer:/results | CSV test report output |
### GPU Profile
Two Docker Compose profiles:
- **cpu** (default): `detections` runs without GPU runtime, exercises ONNX fallback path
- **gpu**: `detections` runs with `runtime: nvidia` and `NVIDIA_VISIBLE_DEVICES=all`, exercises TensorRT path
### Environment Variables (detections service)
| Variable | Value | Purpose |
|----------|-------|---------|
| LOADER_URL | http://mock-loader:8080 | Points to mock Loader |
| ANNOTATIONS_URL | http://mock-annotations:8081 | Points to mock Annotations |
## Test Runner Configuration
**Framework**: pytest
**Plugins**: pytest-csv (reporting), requests (HTTP client), sseclient-py (SSE streaming), pytest-timeout (per-test timeouts)
**Entry point**: `pytest --csv=/results/report.csv -v`
### Fixture Strategy
| Fixture | Scope | Purpose |
|---------|-------|---------|
| `base_url` | session | Detections service base URL (`http://detections:8000`) |
| `http_client` | session | `requests.Session` configured with base URL and default timeout |
| `sse_client_factory` | function | Factory that opens SSE connection to `/detect/stream` |
| `mock_loader_url` | session | Mock-loader base URL for control API calls |
| `mock_annotations_url` | session | Mock-annotations base URL for control API and assertion calls |
| `wait_for_services` | session (autouse) | Polls health endpoints until all services are ready |
| `reset_mocks` | function (autouse) | Calls `POST /mock/reset` on both mocks before each test |
| `image_small` | session | Reads `image_small.jpg` from `/media/` volume |
| `image_large` | session | Reads `image_large.JPG` from `/media/` volume |
| `image_dense` | session | Reads `image_dense01.jpg` from `/media/` volume |
| `image_dense_02` | session | Reads `image_dense02.jpg` from `/media/` volume |
| `image_different_types` | session | Reads `image_different_types.jpg` from `/media/` volume |
| `image_empty_scene` | session | Reads `image_empty_scene.jpg` from `/media/` volume |
| `video_short_path` | session | Path to `video_short01.mp4` on `/media/` volume |
| `video_short_02_path` | session | Path to `video_short02.mp4` on `/media/` volume |
| `video_long_path` | session | Path to `video_long03.mp4` on `/media/` volume |
| `empty_image` | session | Reads zero-byte file |
| `corrupt_image` | session | Reads random binary file |
| `jwt_token` | function | Generates a valid JWT with exp claim for auth tests |
| `warm_engine` | module | Sends one detection request to initialize engine, used by tests that need warm engine |
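The two autouse fixtures above might be sketched in `conftest.py` as follows; the service hostnames come from the compose file, while the retry and timeout values are illustrative.

```python
import time
import pytest
import requests

BASE_URL = "http://detections:8000"
READINESS = {
    BASE_URL: "/health",
    "http://mock-loader:8080": "/mock/status",
    "http://mock-annotations:8081": "/mock/status",
}

@pytest.fixture(scope="session", autouse=True)
def wait_for_services():
    """Poll each service's readiness endpoint until it answers, or fail the run."""
    deadline = time.time() + 120
    for base, path in READINESS.items():
        while True:
            try:
                if requests.get(base + path, timeout=2).ok:
                    break
            except requests.RequestException:
                pass  # service not up yet; keep polling
            if time.time() > deadline:
                pytest.fail(f"service at {base} never became ready")
            time.sleep(1)

@pytest.fixture(autouse=True)
def reset_mocks():
    """Clear recorded mock state before every test, per the isolation rules."""
    for base, path in READINESS.items():
        if path == "/mock/status":
            requests.post(base + "/mock/reset", timeout=5)
```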
## Test Data Fixtures
| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| azaion.onnx | `input_data/azaion.onnx` | ONNX (1280×1280 input, 19 classes, 81MB) | All detection tests (via mock-loader) |
| classes.json | repo root `classes.json` | JSON (19 objects with Id, Name, Color, MaxSizeM) | All tests (volume mount to detections) |
| image_small.jpg | `input_data/image_small.jpg` | JPEG 1280×720 | Health, single image, filtering, negative, performance tests |
| image_large.JPG | `input_data/image_large.JPG` | JPEG 6252×4168 | Tiling tests, performance tests |
| image_dense01.jpg | `input_data/image_dense01.jpg` | JPEG 1280×720 dense scene | Dedup tests, detection cap tests |
| image_dense02.jpg | `input_data/image_dense02.jpg` | JPEG 1920×1080 dense scene | Dedup variant |
| image_different_types.jpg | `input_data/image_different_types.jpg` | JPEG 900×1600 varied classes | Weather mode class variant tests |
| image_empty_scene.jpg | `input_data/image_empty_scene.jpg` | JPEG 1920×1080 empty | Zero-detection edge case |
| video_short01.mp4 | `input_data/video_short01.mp4` | MP4 short video | Async, SSE, video processing tests |
| video_short02.mp4 | `input_data/video_short02.mp4` | MP4 short video variant | Concurrent, resilience tests |
| video_long03.mp4 | `input_data/video_long03.mp4` | MP4 long video (288MB) | SSE overflow, queue depth tests |
| empty_image | Generated at build | Zero-byte file | FT-N-01 |
| corrupt_image | Generated at build | Random binary | FT-N-02 |
### Data Isolation
Each test run starts with fresh containers (`docker compose down -v && docker compose up`). The detections service is stateless, so no data persists between runs. Mock services reset state via `POST /mock/reset` before each test. Tests that modify mock behavior (e.g., making the loader unreachable) rely on the function-scoped mock reset to restore normal mode afterwards.
## Test Reporting
**Format**: CSV
**Columns**: Test ID, Test Name, Execution Time (ms), Result (PASS/FAIL/SKIP), Error Message (if FAIL)
**Output path**: `/results/report.csv` → mounted to `./e2e-results/report.csv` on host
## Acceptance Criteria
**AC-1: Test environment starts**
Given the docker-compose.test.yml
When `docker compose -f docker-compose.test.yml up` is executed
Then all services start and the detections service is reachable at http://detections:8000/health
**AC-2: Mock services respond**
Given the test environment is running
When the e2e-consumer sends requests to mock-loader and mock-annotations
Then mock services respond with configured behavior and record interactions
**AC-3: Test runner executes**
Given the test environment is running
When the e2e-consumer starts
Then pytest discovers and executes test files from `tests/` directory
**AC-4: Test report generated**
Given tests have been executed
When the test run completes
Then `/results/report.csv` exists with columns: Test ID, Test Name, Execution Time, Result, Error Message
@@ -0,0 +1,87 @@
# Health & Engine Lifecycle Tests
**Task**: AZ-139_test_health_engine
**Name**: Health & Engine Lifecycle Tests
**Description**: Implement E2E tests verifying health endpoint responses and engine lazy initialization lifecycle
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-139
**Epic**: AZ-137
## Problem
The health endpoint and engine initialization lifecycle are critical for operational monitoring and service readiness. Tests must verify that the health endpoint correctly reflects engine state transitions (None → Downloading → Enabled/Error) and that engine initialization is lazy (triggered by first detection, not at startup).
## Outcome
- Health endpoint behavior verified across all engine states
- Lazy initialization confirmed (no engine load at startup)
- ONNX fallback path validated on CPU-only environments
- Engine state transitions observable through health endpoint
## Scope
### Included
- FT-P-01: Health check returns status before engine initialization
- FT-P-02: Health check reflects engine availability after initialization
- FT-P-14: Engine lazy initialization on first detection request
- FT-P-15: ONNX fallback when GPU unavailable
### Excluded
- TensorRT-specific engine tests (require GPU hardware)
- Performance benchmarking of engine initialization time
- Engine error recovery scenarios (covered in resilience tests)
## Acceptance Criteria
**AC-1: Pre-init health check**
Given the detections service just started with no prior requests
When GET /health is called
Then response is 200 with status "healthy" and aiAvailability "None"
**AC-2: Post-init health check**
Given a successful detection has been performed
When GET /health is called
Then aiAvailability reflects an active engine state (not "None" or "Downloading")
**AC-3: Lazy initialization**
Given a fresh service start
When GET /health is called immediately
Then aiAvailability is "None" (engine not loaded at startup)
And after POST /detect with a valid image, GET /health shows engine active
**AC-4: ONNX fallback**
Given the service runs without GPU runtime (CPU-only profile)
When POST /detect is called with a valid image
Then detection succeeds via ONNX Runtime without TensorRT errors
## Non-Functional Requirements
**Performance**
- Health check response within 2s
- First detection (including engine init) within 60s
**Reliability**
- Tests must work on both CPU-only and GPU Docker profiles
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Fresh service, no requests | GET /health before any detection | 200, aiAvailability: "None" | Max 2s |
| AC-2 | After POST /detect succeeds | GET /health after detection | aiAvailability not "None" | Max 30s |
| AC-3 | Fresh service | Health → Detect → Health sequence | State transition None → active | Max 60s |
| AC-4 | CPU-only Docker profile | POST /detect on CPU profile | Detection succeeds via ONNX | Max 60s |
## Constraints
- Tests must use the CPU Docker profile for ONNX fallback verification
- Engine initialization time varies by hardware; timeouts must be generous
- Health endpoint schema depends on AiAvailabilityStatus enum from codebase
## Risks & Mitigation
**Risk 1: Engine init timeout on slow CI**
- *Risk*: Engine initialization may exceed timeout on resource-constrained CI runners
- *Mitigation*: Use generous timeouts (60s) and mark as known slow test
@@ -0,0 +1,92 @@
# Single Image Detection Tests
**Task**: AZ-140_test_single_image
**Name**: Single Image Detection Tests
**Description**: Implement E2E tests verifying single image detection, confidence filtering, overlap deduplication, physical size filtering, and weather mode classes
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-140
**Epic**: AZ-137
## Problem
Single image detection is the core functionality of the system. Tests must verify that detections are returned with correct structure, confidence filtering works at different thresholds, overlapping detections are deduplicated, physical size filtering removes implausible detections, and weather mode class variants are recognized.
## Outcome
- Detection response structure validated (x, y, width, height, label, confidence)
- Confidence threshold filtering confirmed at multiple thresholds
- Overlap deduplication verified with configurable containment ratio
- Physical size filtering validated against MaxSizeM from classes.json
- Weather mode class variants (Norm, Wint, Night) recognized correctly
## Scope
### Included
- FT-P-03: Single image detection returns detections
- FT-P-05: Detection confidence filtering respects threshold
- FT-P-06: Overlapping detections are deduplicated
- FT-P-07: Physical size filtering removes oversized detections
- FT-P-13: Weather mode class variants
### Excluded
- Large image tiling (covered in tiling tests)
- Async/video detection (covered in async and video tests)
- Negative input validation (covered in negative tests)
## Acceptance Criteria
**AC-1: Detection response structure**
Given an initialized engine and a valid small image
When POST /detect is called with the image
Then response is 200 with an array of DetectionDto objects containing x, y, width, height, label, confidence fields with coordinates in 0.0-1.0 range
**AC-2: Confidence filtering**
Given an initialized engine
When POST /detect is called with probability_threshold 0.8
Then all returned detections have confidence >= 0.8
And calling with threshold 0.1 returns at least as many detections as the call with threshold 0.8
**AC-3: Overlap deduplication**
Given an initialized engine and a scene with clustered objects
When POST /detect is called with tracking_intersection_threshold 0.6
Then no two detections of the same class overlap by more than 60%
And a lower threshold (0.01) produces no more detections than the 0.6 run (more aggressive merging)
**AC-4: Physical size filtering**
Given an initialized engine and known GSD parameters
When POST /detect is called with altitude, focal_length, sensor_width config
Then no detection's computed physical size exceeds the MaxSizeM for its class
**AC-5: Weather mode classes**
Given an initialized engine with classes.json including weather variants
When POST /detect is called
Then all returned labels are valid entries from the 19-class x 3-mode registry
## Non-Functional Requirements
**Performance**
- Single image detection within 30s (includes potential engine init)
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Engine warm, small-image | POST /detect response structure | Array of DetectionDto, coords 0.0-1.0 | Max 30s |
| AC-2 | Engine warm, small-image | Two thresholds (0.8 vs 0.1) | Higher threshold = fewer detections | Max 30s |
| AC-3 | Engine warm, small-image | Two containment thresholds | Lower threshold = more dedup | Max 30s |
| AC-4 | Engine warm, small-image, GSD config | Physical size vs MaxSizeM | No oversized detections returned | Max 30s |
| AC-5 | Engine warm, small-image | Detection label validation | Labels match classes.json entries | Max 30s |
## Constraints
- Deduplication verification requires the test image to produce overlapping detections
- Physical size filtering requires correct GSD parameters matching the fixture image
- Weather mode verification depends on classes.json fixture content
## Risks & Mitigation
**Risk 1: Insufficient detections from test image**
- *Risk*: Small test image may not produce enough detections for meaningful filtering/dedup tests
- *Mitigation*: Use an image with known dense object content; accept >= 1 detection as valid
@@ -0,0 +1,68 @@
# Image Tiling Tests
**Task**: AZ-141_test_tiling
**Name**: Image Tiling Tests
**Description**: Implement E2E tests verifying GSD-based tiling for large images and tile boundary deduplication
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-141
**Epic**: AZ-137
## Problem
Images exceeding 1.5x the model input dimensions (1280x1280, i.e. larger than 1920x1920) must be tiled based on Ground Sample Distance (GSD) calculations. Tests must verify that tiling produces correct results with coordinates normalized to the original image, and that duplicate detections at tile boundaries are properly merged.
## Outcome
- Large image tiling confirmed with GSD-based tile sizing
- Detection coordinates normalized to original image dimensions (not tile-local)
- Tile boundary deduplication verified (no near-identical coordinate duplicates)
- Bounding box coordinates remain in 0.0-1.0 range
## Scope
### Included
- FT-P-04: Large image triggers GSD-based tiling
- FT-P-16: Tile deduplication removes duplicate detections at tile boundaries
### Excluded
- Small image detection (covered in single image tests)
- Tiling performance benchmarks (covered in performance tests)
- Tile overlap configuration beyond default (implementation detail)
## Acceptance Criteria
**AC-1: GSD-based tiling**
Given an initialized engine and a large image (4000x3000)
When POST /detect is called with altitude, focal_length, sensor_width config
Then detections are returned with coordinates in 0.0-1.0 range relative to the full original image
**AC-2: Tile boundary deduplication**
Given an initialized engine and a large image with tile overlap
When POST /detect is called with tiling config including big_image_tile_overlap_percent
Then no two detections of the same class have coordinates within 0.01 of each other (TILE_DUPLICATE_CONFIDENCE_THRESHOLD)
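The near-duplicate check in AC-2 can be sketched as a pure helper; the 0.01 tolerance mirrors the system's hardcoded threshold, and the detection dict shape is assumed for illustration:

```python
def find_tile_duplicates(detections, tol=0.01):
    """Return pairs of same-class detections whose box coordinates all differ
    by less than tol (AC-2: such pairs should have been merged by dedup)."""
    dupes = []
    for i, a in enumerate(detections):
        for b in detections[i + 1:]:
            if a["label"] != b["label"]:
                continue
            if all(abs(a[f] - b[f]) < tol for f in ("x", "y", "width", "height")):
                dupes.append((a, b))
    return dupes
```

The test then asserts `find_tile_duplicates(response_detections) == []` on the large-image response.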
## Non-Functional Requirements
**Performance**
- Large image processing within 60s (tiling adds overhead)
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Engine warm, large-image (4000x3000), GSD config | POST /detect with large image | Detections with coords 0.0-1.0 relative to full image | Max 60s |
| AC-2 | Engine warm, large-image, tile overlap config | Check for near-duplicate detections | No same-class duplicates within 0.01 coords | Max 60s |
## Constraints
- Large image fixture must exceed 1.5x model input (1920x1920) to trigger tiling
- GSD parameters must be physically plausible for the test scenario
- Tile dedup threshold is hardcoded at 0.01 in the system
## Risks & Mitigation
**Risk 1: No detections at tile boundaries**
- *Risk*: Test image may not have objects near tile boundaries
- *Mitigation*: Verify tiling occurred by checking that processing time exceeds that of a small image; the dedup assertion is vacuously true if no objects lie near tile boundaries
@@ -0,0 +1,77 @@
# Async Detection & SSE Streaming Tests
**Task**: AZ-142_test_async_sse
**Name**: Async Detection & SSE Streaming Tests
**Description**: Implement E2E tests verifying async media detection initiation, SSE event streaming, and duplicate media_id rejection
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-142
**Epic**: AZ-137
## Problem
Async media detection via POST /detect/{media_id} must return immediately with "started" status while processing continues in background. SSE streaming must deliver real-time detection events to connected clients. Duplicate media_id submissions must be rejected with 409.
## Outcome
- Async detection returns immediately without waiting for processing
- SSE connection receives detection events during processing
- Final SSE event signals completion with mediaStatus "AIProcessed"
- Duplicate media_id correctly rejected with 409 Conflict
## Scope
### Included
- FT-P-08: Async media detection returns "started" immediately
- FT-P-09: SSE streaming delivers detection events during async processing
- FT-N-04: Duplicate media_id returns 409
### Excluded
- Video frame sampling details (covered in video tests)
- SSE queue overflow behavior (covered in resource limit tests)
- Annotations service interaction (covered in resilience tests)
## Acceptance Criteria
**AC-1: Immediate async response**
Given an initialized engine
When POST /detect/{media_id} is called with config and auth headers
Then response arrives within 1s with {"status": "started"}
**AC-2: SSE event delivery**
Given an SSE client connected to GET /detect/stream
When async detection is triggered via POST /detect/{media_id}
Then SSE events are received with detection data and mediaStatus "AIProcessing"
And a final event with mediaStatus "AIProcessed" and percent 100 arrives
**AC-3: Duplicate media_id rejection**
Given an async detection is already in progress for a media_id
When POST /detect/{media_id} is called again with the same ID
Then response is 409 Conflict
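SSE consumption for AC-2 can be sketched with a small parser over the `data:` lines of the event stream. The completion fields (`mediaStatus`, `percent`) follow this spec; the exact JSON layout of intermediate events is an assumption:

```python
import json

def parse_sse_events(raw_lines):
    """Collect JSON payloads from the 'data: ...' lines of an SSE stream."""
    events = []
    for line in raw_lines:
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

def assert_stream_completed(events):
    """AC-2: the final event must signal completion at 100 percent."""
    assert events, "no SSE events received"
    last = events[-1]
    assert last.get("mediaStatus") == "AIProcessed"
    assert last.get("percent") == 100
```

In the real test these lines come from a streaming GET /detect/stream response opened before the async detection is triggered.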
## Non-Functional Requirements
**Performance**
- Async initiation response within 1s
- SSE events delivered within 120s total processing time
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Engine warm, test-video, JWT token | POST /detect/{media_id} | Response < 1s, status "started" | Max 2s |
| AC-2 | Engine warm, SSE connected, test-video | Listen SSE during async detection | Events received, final AIProcessed | Max 120s |
| AC-3 | Active detection in progress | Second POST with same media_id | 409 Conflict | Max 5s |
## Constraints
- SSE client must connect before triggering async detection
- JWT token required for async detection endpoint
- Test video must be accessible via configured paths
## Risks & Mitigation
**Risk 1: SSE connection timing**
- *Risk*: SSE connection may not be established before detection starts
- *Mitigation*: Add small delay between SSE connect and detection trigger; verify connection established
@@ -0,0 +1,75 @@
# Video Processing Tests
**Task**: AZ-143_test_video
**Name**: Video Processing Tests
**Description**: Implement E2E tests verifying video frame sampling, annotation interval enforcement, and movement-based tracking
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure, AZ-142_test_async_sse
**Component**: Integration Tests
**Jira**: AZ-143
**Epic**: AZ-137
## Problem
Video detection processes frames at a configurable interval (frame_period_recognition), enforces minimum annotation intervals (frame_recognition_seconds), and tracks object movement to avoid redundant annotations. Tests must verify these three video-specific behaviors work correctly.
## Outcome
- Frame sampling verified: only every Nth frame processed (±10% tolerance)
- Annotation interval enforced: no two annotations closer than configured seconds
- Movement tracking confirmed: annotations emitted on object movement, suppressed for static objects
## Scope
### Included
- FT-P-10: Video frame sampling processes every Nth frame
- FT-P-11: Video annotation interval enforcement
- FT-P-12: Video tracking accepts new annotations on movement
### Excluded
- Async detection initiation (covered in async/SSE tests)
- SSE delivery mechanics (covered in async/SSE tests)
- Video processing performance (covered in performance tests)
## Acceptance Criteria
**AC-1: Frame sampling**
Given a 10s 30fps video (300 frames) and frame_period_recognition=4
When async detection is triggered
Then approximately 75 frames are processed (±10% tolerance)
**AC-2: Annotation interval**
Given a test video and frame_recognition_seconds=2
When async detection is triggered
Then minimum gap between consecutive annotation events >= 2 seconds
**AC-3: Movement tracking**
Given a test video with moving objects and tracking_distance_confidence > 0
When async detection is triggered
Then annotations contain updated positions for moving objects
And static objects do not generate redundant annotations
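The frame-count tolerance (AC-1) and annotation-interval (AC-2) checks can be sketched as pure helpers; event timestamps are assumed to be in seconds:

```python
def assert_frame_count_within_tolerance(processed, total_frames, period, tol=0.10):
    """AC-1: roughly every Nth frame processed, within the ±10% tolerance."""
    expected = total_frames / period
    assert abs(processed - expected) <= expected * tol, (
        f"processed {processed}, expected ~{expected}"
    )

def min_annotation_gap(timestamps):
    """AC-2: smallest gap between consecutive annotation events, in seconds."""
    ts = sorted(timestamps)
    return min(b - a for a, b in zip(ts, ts[1:]))
```

The test asserts `min_annotation_gap(event_times) >= frame_recognition_seconds`, subject to the 0.5s clock-precision constraint noted below.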
## Non-Functional Requirements
**Performance**
- Video processing completes within 120s
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Engine warm, SSE connected, test-video, frame_period=4 | Count processed frames via SSE | ~75 frames (±10%) | Max 120s |
| AC-2 | Engine warm, SSE connected, test-video, frame_recognition_seconds=2 | Measure time between annotations | >= 2s gap between annotations | Max 120s |
| AC-3 | Engine warm, SSE connected, test-video, tracking config | Inspect annotation positions | Updated coords for moving objects | Max 120s |
## Constraints
- Test video must contain moving objects for tracking verification
- Frame counting tolerance accounts for start/end frame edge cases
- Annotation interval measurement requires clock precision within 0.5s
## Risks & Mitigation
**Risk 1: Inconsistent frame counts**
- *Risk*: Frame sampling may vary slightly depending on video codec and frame extraction
- *Mitigation*: Use ±10% tolerance as specified in test spec
@@ -0,0 +1,82 @@
# Negative Input Tests
**Task**: AZ-144_test_negative
**Name**: Negative Input Tests
**Description**: Implement E2E tests verifying proper error responses for invalid inputs, unavailable engine, and missing configuration
**Complexity**: 2 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-144
**Epic**: AZ-137
## Problem
The system must handle invalid and edge-case inputs gracefully, returning appropriate HTTP error codes without crashing. Tests must verify error responses for empty files, corrupt data, engine unavailability, and missing configuration.
## Outcome
- Empty image returns 400 Bad Request
- Corrupt/non-image data returns 400 Bad Request
- Detection when engine unavailable returns 503 or 422
- Missing classes.json prevents normal operation
- Service remains healthy after all negative inputs
## Scope
### Included
- FT-N-01: Empty image returns 400
- FT-N-02: Invalid image data returns 400
- FT-N-03: Detection when engine unavailable returns 503
- FT-N-05: Missing classes.json prevents startup
### Excluded
- Duplicate media_id (covered in async/SSE tests)
- Service outage scenarios (covered in resilience tests)
- Malformed multipart payloads (covered in security tests)
## Acceptance Criteria
**AC-1: Empty image**
Given the detections service is running
When POST /detect is called with a zero-byte file
Then response is 400 Bad Request with error message
**AC-2: Corrupt image**
Given the detections service is running
When POST /detect is called with random binary data
Then response is 400 Bad Request (not 500)
**AC-3: Engine unavailable**
Given mock-loader is configured to fail model requests
When POST /detect is called
Then response is 503 or 422 with no crash or unhandled exception
**AC-4: Missing classes.json**
Given detections service started without classes.json volume mount
When the service runs or a detection is attempted
Then service either fails to start or returns empty/error results without crashing
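The empty and corrupt payloads for AC-1/AC-2 can be built directly. The `files`-dict shape and the field name `"file"` follow the requests/httpx multipart convention and are assumptions about the endpoint's form field:

```python
import os

def empty_image_payload():
    """AC-1: a zero-byte file part."""
    return {"file": ("empty.jpg", b"", "image/jpeg")}

def corrupt_image_payload(size=4096):
    """AC-2: random bytes that do not decode as any image format."""
    return {"file": ("corrupt.jpg", os.urandom(size), "image/jpeg")}
```

After each request the test re-checks GET /health to confirm the service stayed operational, per the reliability NFR.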
## Non-Functional Requirements
**Reliability**
- Service must remain operational after processing invalid inputs (AC-1, AC-2)
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Service running | POST /detect with empty file | 400 Bad Request | Max 5s |
| AC-2 | Service running | POST /detect with corrupt binary | 400 Bad Request | Max 5s |
| AC-3 | mock-loader returns 503 | POST /detect with valid image | 503 or 422 | Max 30s |
| AC-4 | No classes.json mounted | Start service or detect | Fail gracefully | Max 30s |
## Constraints
- AC-4 requires a separate Docker Compose configuration without the classes.json volume
- AC-3 requires mock-loader control API to simulate failure
## Risks & Mitigation
**Risk 1: AC-4 service start behavior**
- *Risk*: Behavior when classes.json is missing may vary (fail at start vs. fail at detection)
- *Mitigation*: Test both paths; accept either as valid graceful handling
@@ -0,0 +1,107 @@
# Resilience Tests
**Task**: AZ-145_test_resilience
**Name**: Resilience Tests
**Description**: Implement E2E tests verifying service resilience during external service outages, transient failures, and container restarts
**Complexity**: 5 points
**Dependencies**: AZ-138_test_infrastructure, AZ-142_test_async_sse
**Component**: Integration Tests
**Jira**: AZ-145
**Epic**: AZ-137
## Problem
The detection service must continue operating when external dependencies fail. Tests must verify resilience during loader outages (before and after engine init), annotations service outages, transient loader failures with retry, and service restarts with state loss.
## Outcome
- Detection continues when loader goes down after engine is loaded
- Async detection completes when annotations service is down
- Engine initialization retries after transient loader failure
- Service restart clears all in-memory state cleanly
- Loader unreachable during initial model download handled gracefully
- Annotations failure during async detection does not stop the pipeline
## Scope
### Included
- FT-N-06: Loader service unreachable during model download
- FT-N-07: Annotations service unreachable — detection continues
- NFT-RES-01: Loader service outage after engine initialization
- NFT-RES-02: Annotations service outage during async detection
- NFT-RES-03: Engine initialization retry after transient loader failure
- NFT-RES-04: Service restart with in-memory state loss
### Excluded
- Input validation errors (covered in negative tests)
- Performance under fault conditions
- Network partition simulation beyond service stop/start
## Acceptance Criteria
**AC-1: Loader unreachable during init**
Given mock-loader is stopped and engine not initialized
When POST /detect is called
Then response is 503 or 422 error
And GET /health reflects engine error state
**AC-2: Annotations unreachable — detection continues**
Given engine is initialized and mock-annotations is stopped
When async detection is triggered
Then SSE events still arrive and final AIProcessed event is received
**AC-3: Loader outage after init**
Given engine is already initialized (model in memory)
When mock-loader is stopped and POST /detect is called
Then detection succeeds (200 OK, engine already loaded)
And GET /health remains "Enabled"
**AC-4: Annotations outage mid-processing**
Given async detection is in progress
When mock-annotations is stopped mid-processing
Then SSE events continue arriving
And detection completes with AIProcessed event
**AC-5: Transient loader failure with retry**
Given mock-loader fails first request then recovers
When first POST /detect fails and second POST /detect is sent
Then second detection succeeds (engine initializes on retry)
**AC-6: Service restart state reset**
Given a detection may have been in progress
When the detections container is restarted
Then GET /health returns aiAvailability "None" (fresh start)
And POST /detect/{media_id} is accepted (no stale _active_detections)
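Several of these scenarios (AC-5, AC-6, Risk 1) need readiness polling after a mock or container is stopped and restarted. A generic poll helper, sketched here, avoids flaky fixed sleeps:

```python
import time

def wait_until(predicate, timeout=60.0, interval=1.0):
    """Poll predicate() until it returns truthy or the timeout elapses.
    Used after docker stop/start so assertions run against a ready service."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

For AC-6 the predicate would be a health probe, e.g. a lambda that GETs /health and checks for a 200 response.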
## Non-Functional Requirements
**Reliability**
- All fault injection tests must restore mock services after test completion
- Service must not crash or leave zombie processes after any failure scenario
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | mock-loader stopped, fresh engine | POST /detect | 503/422 graceful error | Max 30s |
| AC-2 | Engine warm, mock-annotations stopped | Async detection + SSE | SSE events continue, AIProcessed | Max 120s |
| AC-3 | Engine warm, mock-loader stopped | POST /detect (sync) | 200 OK, detection succeeds | Max 30s |
| AC-4 | Async detection started, then stop mock-annotations | SSE event stream | Events continue, pipeline completes | Max 120s |
| AC-5 | mock-loader: first_fail mode | Two sequential POST /detect | First fails, second succeeds | Max 60s |
| AC-6 | Restart detections container | Health + detect after restart | Clean state, no stale data | Max 60s |
## Constraints
- Fault injection via Docker service stop/start and mock control API
- Container restart test requires docker compose restart capability
- Mock services must support configurable failure modes (normal, error, first_fail)
## Risks & Mitigation
**Risk 1: Container restart timing**
- *Risk*: Container restart may take variable time, causing flaky tests
- *Mitigation*: Use service readiness polling with generous timeout before assertions
**Risk 2: Mock state leakage between tests**
- *Risk*: Stopped mock may affect subsequent tests
- *Mitigation*: Function-scoped mock reset fixture restores all mocks before each test
@@ -0,0 +1,86 @@
# Performance Tests
**Task**: AZ-146_test_performance
**Name**: Performance Tests
**Description**: Implement E2E tests measuring detection latency, concurrent inference throughput, tiling overhead, and video processing frame rate
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-146
**Epic**: AZ-137
## Problem
Performance characteristics must be baselined and verified: single image latency, concurrent request handling with the 2-worker ThreadPoolExecutor, tiling overhead for large images, and video processing frame rate. These tests establish performance contracts.
## Outcome
- Single image latency profiled (p50, p95, p99) for warm engine
- Concurrent inference behavior validated (2-at-a-time processing confirmed)
- Large image tiling overhead measured and bounded
- Video processing frame rate baselined
## Scope
### Included
- NFT-PERF-01: Single image detection latency
- NFT-PERF-02: Concurrent inference throughput
- NFT-PERF-03: Large image tiling processing time
- NFT-PERF-04: Video processing frame rate
### Excluded
- GPU vs CPU comparative benchmarks
- Memory usage profiling
- Load testing beyond 4 concurrent requests
## Acceptance Criteria
**AC-1: Single image latency**
Given a warm engine
When 10 sequential POST /detect requests are sent with small-image
Then p95 latency < 5000ms for ONNX CPU or p95 < 1000ms for TensorRT GPU
**AC-2: Concurrent throughput**
Given a warm engine
When 2 concurrent POST /detect requests are sent
Then both complete without error
And 3 concurrent requests show queuing (total time > time for 2)
**AC-3: Tiling overhead**
Given a warm engine
When POST /detect is sent with large-image (4000x3000)
Then request completes within 120s
And processing time scales proportionally with tile count
**AC-4: Video frame rate**
Given a warm engine with SSE connected
When async detection processes test-video with frame_period=4
Then processing completes within 5x video duration (< 50s)
And frame processing rate is consistent (no stalls > 10s)
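The p95 computation for AC-1 can be sketched with a nearest-rank percentile helper; the 5000ms/1000ms budgets come from the criteria above:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    xs = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(xs)) - 1)
    return xs[k]

def assert_latency_budget(latencies_ms, budget_ms=5000):
    """AC-1: p95 under the ONNX CPU budget; pass budget_ms=1000 for TensorRT GPU."""
    assert percentile(latencies_ms, 95) < budget_ms
```

Logging the raw latency list alongside the assertion supports the trend-analysis NFR.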
## Non-Functional Requirements
**Performance**
- Tests themselves should complete within defined bounds
- Results should be logged for trend analysis
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Engine warm | 10 sequential detections | p95 < 5000ms (CPU) | ~60s |
| AC-2 | Engine warm | 2 then 3 concurrent requests | Queuing observed at 3 | ~30s |
| AC-3 | Engine warm, large-image | Single large image detection | Completes < 120s | ~120s |
| AC-4 | Engine warm, SSE connected | Video detection | < 50s, consistent rate | ~120s |
## Constraints
- Pass criteria differ between CPU (ONNX) and GPU (TensorRT) profiles
- Concurrent request tests must account for connection overhead
- Video frame rate depends on hardware; test measures consistency, not absolute speed
## Risks & Mitigation
**Risk 1: CI hardware variability**
- *Risk*: Latency thresholds may fail on slower CI hardware
- *Mitigation*: Use generous thresholds; mark as performance benchmark tests that can be skipped in resource-constrained CI
@@ -0,0 +1,78 @@
# Security Tests
**Task**: AZ-147_test_security
**Name**: Security Tests
**Description**: Implement E2E tests verifying handling of malformed payloads, oversized requests, and JWT token forwarding
**Complexity**: 2 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-147
**Epic**: AZ-137
## Problem
The service must handle malicious or malformed input without crashing, reject oversized uploads, and correctly forward authentication tokens to downstream services. These tests verify security-relevant behaviors at the API boundary.
## Outcome
- Malformed multipart payloads return 4xx (not 500 or crash)
- Oversized request bodies handled without OOM or crash
- JWT token forwarded to annotations service exactly as received
- Service remains operational after all security test scenarios
## Scope
### Included
- NFT-SEC-01: Malformed multipart payload handling
- NFT-SEC-02: Oversized request body
- NFT-SEC-03: JWT token is forwarded without modification
### Excluded
- Authentication/authorization enforcement (service doesn't implement auth)
- TLS verification (handled at infrastructure level)
- CORS testing (requires browser context)
## Acceptance Criteria
**AC-1: Malformed multipart**
Given the service is running
When POST /detect is sent with truncated multipart (missing boundary) or empty file part
Then response is 400 or 422 (not 500)
And GET /health confirms service still healthy
**AC-2: Oversized request**
Given the service is running
When POST /detect is sent with a 500MB random file
Then response is an error (413, 400, or timeout) without OOM crash
And GET /health confirms service still running
**AC-3: JWT forwarding**
Given engine is initialized and mock-annotations is recording
When POST /detect/{media_id} is sent with Authorization and x-refresh-token headers
Then mock-annotations received the exact same Authorization header value
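AC-1 requires building a raw multipart body by hand, since well-behaved HTTP clients won't emit a truncated one. A sketch, with an arbitrary example boundary string:

```python
def truncated_multipart_body(boundary="testboundary"):
    """A multipart body whose closing boundary is deliberately missing (AC-1)."""
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="x.jpg"\r\n'
        "Content-Type: image/jpeg\r\n\r\n"
        "\xff\xd8partial"
        # the closing --boundary-- terminator is intentionally omitted
    )
    return body.encode("latin-1"), f"multipart/form-data; boundary={boundary}"
```

The test POSTs these raw bytes with the returned Content-Type header and asserts a 400/422 response rather than a 500.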
## Non-Functional Requirements
**Reliability**
- Service must not crash on any malformed input
- Memory usage must not spike beyond bounds on oversized uploads
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Service running | Truncated multipart + no file part | 400/422, not 500 | Max 5s |
| AC-2 | Service running | 500MB random file upload | Error response, no crash | Max 60s |
| AC-3 | Engine warm, mock-annotations recording | Detect with JWT headers | Exact token match in mock | Max 120s |
## Constraints
- Oversized request test may require increased client timeout
- JWT forwarding verification requires async detection to complete annotation POST
- Malformed multipart construction requires raw HTTP request building
## Risks & Mitigation
**Risk 1: Oversized upload behavior varies**
- *Risk*: FastAPI/Starlette may handle oversized bodies differently across versions
- *Mitigation*: Accept any non-crash error response (413, 400, timeout, connection reset)
@@ -0,0 +1,99 @@
# Resource Limit Tests
**Task**: AZ-148_test_resource_limits
**Name**: Resource Limit Tests
**Description**: Implement E2E tests verifying ThreadPoolExecutor worker limit, SSE queue depth cap, max detections per frame, SSE overflow handling, and log file rotation
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure, AZ-142_test_async_sse
**Component**: Integration Tests
**Jira**: AZ-148
**Epic**: AZ-137
## Problem
The system enforces several resource limits: 2 concurrent inference workers, 100-event SSE queue depth, 300 max detections per frame, and daily log rotation. Tests must verify these limits are enforced correctly and that overflow conditions are handled gracefully.
## Outcome
- ThreadPoolExecutor limited to 2 concurrent inference operations
- SSE queue capped at 100 events per client, overflow silently dropped
- No response contains more than 300 detections per frame
- Log files use date-based naming with daily rotation
- SSE overflow does not crash the service or the detection pipeline
## Scope
### Included
- FT-N-08: SSE queue overflow is silently dropped
- NFT-RES-LIM-01: ThreadPoolExecutor worker limit (2 concurrent)
- NFT-RES-LIM-02: SSE queue depth limit (100 events)
- NFT-RES-LIM-03: Max 300 detections per frame
- NFT-RES-LIM-04: Log file rotation and retention
### Excluded
- Memory limits (OS-level, not application-enforced)
- Disk space limits
- Network bandwidth throttling
## Acceptance Criteria
**AC-1: Worker limit**
Given an initialized engine
When 4 concurrent POST /detect requests are sent
Then the first 2 complete at roughly the same time and the remaining 2 complete afterwards (2-at-a-time processing)
And all 4 requests eventually succeed
**AC-2: SSE queue depth**
Given an SSE client connected but not reading (stalled)
When async detection produces > 100 events
Then stalled client receives <= 100 events when it resumes reading
And no OOM or connection errors
**AC-3: SSE overflow handling**
Given an SSE client pauses reading
When async detection generates many events
Then detection completes normally (no error from overflow)
And stalled client receives at most 100 buffered events
**AC-4: Max detections per frame**
Given an initialized engine and a dense scene image
When POST /detect is called
Then response contains at most 300 detections
**AC-5: Log file rotation**
Given the service is running with Logs/ volume mounted
When detection requests are made
Then log file exists at Logs/log_inference_YYYYMMDD.txt with today's date
And log content contains structured INFO/DEBUG/WARNING entries
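The log-naming convention (AC-5) and detection cap (AC-4) translate into small pure checks; the filename pattern follows the spec above:

```python
from datetime import date

def expected_log_name(day=None):
    """AC-5: date-based log file name, e.g. log_inference_20250101.txt."""
    day = day or date.today()
    return f"log_inference_{day:%Y%m%d}.txt"

def assert_detection_cap(detections, cap=300):
    """AC-4: never more than `cap` detections in a single frame/response."""
    assert len(detections) <= cap
```

The AC-5 test then asserts that `Logs/<expected_log_name()>` exists in the mounted volume and contains INFO/DEBUG/WARNING entries.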
## Non-Functional Requirements
**Reliability**
- Resource limits must be enforced without crash or undefined behavior
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Engine warm | 4 concurrent POST /detect | 2-at-a-time processing pattern | Max 60s |
| AC-2 | Engine warm, stalled SSE | Async detection > 100 events | <= 100 events buffered | Max 120s |
| AC-3 | Engine warm, stalled SSE | Detection pipeline behavior | Completes normally | Max 120s |
| AC-4 | Engine warm, dense scene image | POST /detect | <= 300 detections | Max 30s |
| AC-5 | Service running, Logs/ mounted | Detection requests | Date-named log file exists | Max 10s |
## Constraints
- Worker limit test requires precise timing measurement of response arrivals
- SSE overflow test requires ability to pause/resume SSE client reading
- Detection cap test requires an image producing many detections (may not reach 300 with test fixture)
- Log rotation test verifies naming convention; full 30-day retention requires long-running test
## Risks & Mitigation
**Risk 1: Insufficient detections for cap test**
- *Risk*: Test image may not produce 300 detections to actually hit the cap
- *Mitigation*: Verify the cap exists by checking detection count <= 300; accept as passing if under limit
**Risk 2: SSE client stall implementation**
- *Risk*: HTTP client libraries may not support controlled read pausing
- *Mitigation*: Use raw socket or thread-based approach to control when events are consumed