mirror of https://github.com/azaion/ai-training.git synced 2026-04-22 22:46:35 +00:00

Oleksandr Bezdieniezhnykh 222f552a10 [AZ-171] Add TensorRT tests, AC coverage gate in implement skill, optimize test infrastructure

- Add TensorRT export tests with graceful skip when no GPU available
- Add AC test coverage verification step (Step 8) to implement skill
- Add test coverage gap analysis to new-task skill
- Move exported_models fixture to conftest.py as session-scoped (shared across modules)
- Reorder tests: e2e training runs first so images/labels are available for all tests
- Consolidate teardown into single session-level cleanup in conftest.py
- Fix infrastructure tests to count files dynamically instead of hardcoded 20

Made-with: Cursor

2026-03-28 21:32:28 +02:00

11 KiB

name, description, category, tags, disable-model-invocation

Implementation Orchestrator

Orchestrate the implementation of all tasks produced by the /decompose skill. This skill is a pure orchestrator — it does NOT write implementation code itself. It reads task specs, computes execution order, delegates to implementer subagents, validates results via the /code-review skill, and escalates issues.

The implementer agent is the specialist that writes all the code — it receives a task spec, analyzes the codebase, implements the feature, writes tests, and verifies acceptance criteria.

Core Principles

Orchestrate, don't implement: this skill delegates all coding to implementer subagents
Dependency-aware batching: tasks run only when all their dependencies are satisfied
Max 4 parallel agents: never launch more than 4 implementer subagents simultaneously
File isolation: no two parallel agents may write to the same file
Integrated review: /code-review skill runs automatically after each batch
Auto-start: batches launch immediately — no user confirmation before a batch
Gate on failure: user confirmation is required only when code review returns FAIL
Commit and push per batch: after each batch is confirmed, commit and push to remote

Context Resolution

TASKS_DIR: _docs/02_tasks/
Task files: all *.md files in TASKS_DIR/todo/ (excluding files starting with _)
Dependency table: TASKS_DIR/_dependencies_table.md

Task Lifecycle Folders

TASKS_DIR/
├── _dependencies_table.md
├── todo/        ← tasks ready for implementation (this skill reads from here)
├── backlog/     ← parked tasks (not scheduled yet, ignored by this skill)
└── done/        ← completed tasks (moved here after implementation)

Prerequisite Checks (BLOCKING)

TASKS_DIR/todo/ exists and contains at least one task file — STOP if missing
_dependencies_table.md exists — STOP if missing
At least one task is not yet completed — STOP if all done
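
The first two blocking checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `list_task_files` and `check_prerequisites` are hypothetical, and the third check (at least one task not yet completed) is left out because it depends on the progress detection in step 2.

```python
from pathlib import Path

TASKS_DIR = Path("_docs/02_tasks")  # path taken from Context Resolution


def list_task_files(tasks_dir: Path) -> list[Path]:
    """Return task specs in todo/, ignoring files whose name starts with '_'."""
    todo = tasks_dir / "todo"
    if not todo.is_dir():
        return []
    return sorted(p for p in todo.glob("*.md") if not p.name.startswith("_"))


def check_prerequisites(tasks_dir: Path) -> list[str]:
    """Return blocking problems; an empty list means the skill may start."""
    problems = []
    if not list_task_files(tasks_dir):
        problems.append("todo/ is missing or contains no task files")
    if not (tasks_dir / "_dependencies_table.md").is_file():
        problems.append("_dependencies_table.md is missing; run /decompose first")
    return problems
```

Calling `check_prerequisites(TASKS_DIR)` from the repository root and stopping on a non-empty result mirrors the STOP behaviour described above.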

Algorithm

1. Parse

Read all task *.md files from TASKS_DIR/todo/ (excluding files starting with _)
Read _dependencies_table.md — parse into a dependency graph (DAG)
Validate: no circular dependencies, all referenced dependencies exist
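
The validation rule can be illustrated with Python's standard `graphlib` module. The sketch assumes the dependency table has already been parsed into a mapping from task ID to the set of task IDs it depends on; the table's actual layout is not specified here, and `validate_graph` is a hypothetical name.

```python
from graphlib import CycleError, TopologicalSorter


def validate_graph(deps: dict[str, set[str]]) -> list[str]:
    """Return validation errors for a task -> dependencies mapping."""
    errors = []
    known = set(deps)
    # Every referenced dependency must itself be a known task.
    for task, needs in deps.items():
        for missing in sorted(needs - known):
            errors.append(f"{task} depends on unknown task {missing}")
    # prepare() raises CycleError when the graph contains a cycle.
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as exc:
        errors.append("circular dependency: " + " -> ".join(exc.args[1]))
    return errors
```

An empty result means the graph is a valid DAG and every dependency resolves to a known task.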

2. Detect Progress

Scan the codebase to determine which tasks are already completed
Match implemented code against task acceptance criteria
Mark completed tasks as done in the DAG
Report progress to user: "X of Y tasks completed"

3. Compute Next Batch

Topological sort remaining tasks
Select tasks whose dependencies are ALL satisfied (completed)
If a task depends on another task selected for this batch, it is not ready yet and must wait for a later batch
Cap the batch at 4 parallel agents
If the batch would exceed 20 total complexity points, suggest splitting and let the user decide
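
A minimal sketch of the selection rule follows. The limits 4 and 20 come from the text; the alphabetical tie-break, the `points` mapping of task ID to complexity points, and the name `next_batch` are assumptions made for illustration.

```python
MAX_PARALLEL = 4        # cap on parallel agents, from the text
COMPLEXITY_LIMIT = 20   # complexity threshold, from the text


def next_batch(
    deps: dict[str, set[str]], done: set[str], points: dict[str, int]
) -> tuple[list[str], bool]:
    """Pick up to MAX_PARALLEL tasks whose dependencies are all completed.

    Returns (batch, needs_split). needs_split is True when the batch exceeds
    COMPLEXITY_LIMIT, in which case the user decides whether to split it.
    """
    # A task is ready only if every dependency is already in `done`, so no
    # task in the batch can depend on another task in the same batch.
    ready = sorted(t for t, needs in deps.items() if t not in done and needs <= done)
    batch = ready[:MAX_PARALLEL]
    total = sum(points.get(t, 0) for t in batch)
    return batch, total > COMPLEXITY_LIMIT
```

Because readiness is computed against completed tasks only, the "wait for the next batch" rule falls out of the subset check rather than needing a separate pass.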

4. Assign File Ownership

For each task in the batch:

Parse the task spec's Component field and Scope section
Map the component to directories/files in the project
Determine: files OWNED (exclusive write), files READ-ONLY (shared interfaces, types), files FORBIDDEN (other agents' owned files)
If two tasks in the same batch would modify the same file, schedule them sequentially instead of in parallel
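
Conflict detection over the OWNED sets could look like the sketch below. It assumes each task has already been mapped to the set of files it will write; `find_conflicts` and `forbidden_for` are hypothetical helper names.

```python
from itertools import combinations


def find_conflicts(owned: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of tasks to the files both of them want to write."""
    conflicts = {}
    for a, b in combinations(sorted(owned), 2):
        shared = owned[a] & owned[b]
        if shared:
            conflicts[(a, b)] = shared
    return conflicts


def forbidden_for(task: str, owned: dict[str, set[str]]) -> set[str]:
    """Files a task must not touch: everything owned by the other tasks."""
    others = (files for name, files in owned.items() if name != task)
    return set().union(*others)
```

Any pair reported by `find_conflicts` would be scheduled sequentially; the remaining tasks can run in parallel with `forbidden_for` supplying each agent's FORBIDDEN list.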

5. Update Tracker Status → In Progress

For each task in the batch, transition its ticket status to In Progress via the configured work item tracker (see protocols.md for tracker detection) before launching the implementer. If tracker: local, skip this step.

6. Launch Implementer Subagents

For each task in the batch, launch an implementer subagent with:

Path to the task spec file
List of files OWNED (exclusive write access)
List of files READ-ONLY
List of files FORBIDDEN
Explicit instruction: the implementer must write or update tests that validate each acceptance criterion in the task spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason.

Launch all subagents immediately — no user confirmation.
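
A test that is written but skips with a clear reason might look like this, assuming the project uses pytest. The GPU check is a heuristic chosen for illustration and is not taken from the source: it treats "nvidia-smi on PATH plus an importable tensorrt package" as "TensorRT can run here". The test name is hypothetical.

```python
import importlib.util
import shutil

import pytest

# Assumption for this sketch: nvidia-smi on PATH and an importable tensorrt
# package together indicate that TensorRT tests can run in this environment.
HAS_TENSORRT = (
    shutil.which("nvidia-smi") is not None
    and importlib.util.find_spec("tensorrt") is not None
)


@pytest.mark.skipif(
    not HAS_TENSORRT,
    reason="TensorRT tests need an NVIDIA GPU and the tensorrt package",
)
def test_tensorrt_is_usable():
    import tensorrt  # imported lazily so collection works without it

    assert tensorrt.__version__
```

On a machine without a GPU, pytest reports the test as skipped together with the reason string, so the test still exists and runs as soon as the environment allows.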

7. Monitor

Wait for all subagents to complete
Collect structured status reports from each implementer
If any implementer reports "Blocked", log the blocker and continue with others

Stuck detection — while monitoring, watch for these signals per subagent:

Same file modified 3+ times without test pass rate improving → flag as stuck, stop the subagent, report as Blocked
Subagent has not produced new output for an extended period → flag as potentially hung
If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report
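
One possible reading of the first signal is sketched below: count edits per file since the last improvement in test pass rate and flag the subagent once any file reaches three. The class name `StuckDetector` and the choice to reset all counters on any improvement are assumptions, not part of the skill.

```python
from collections import defaultdict


class StuckDetector:
    """Flag a subagent after `limit` edits to one file with no pass-rate gain."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.best_pass_rate = 0.0
        self.edits_since_gain = defaultdict(int)

    def record(self, path: str, pass_rate: float) -> bool:
        """Record one edit and the pass rate measured after it; True = stuck."""
        if pass_rate > self.best_pass_rate:
            self.best_pass_rate = pass_rate
            self.edits_since_gain.clear()  # progress resets every counter
            return False
        self.edits_since_gain[path] += 1
        return self.edits_since_gain[path] >= self.limit
```

When `record` returns True, the orchestrator would stop the subagent and report it as Blocked in the batch report.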

8. AC Test Coverage Verification

Before code review, verify that every acceptance criterion in each task spec has at least one test that validates it. For each task in the batch:

Read the task spec's Acceptance Criteria section
Search the test files (new and existing) for tests that cover each AC
Classify each AC as:
- Covered: a test directly validates this AC (running or skipped-with-reason)
- Not covered: no test exists for this AC

If any AC is Not covered:

This is a BLOCKING failure — the implementer must write the missing test before proceeding
Re-launch the implementer with the specific ACs that need tests
If the test cannot run in the current environment (GPU required, platform-specific, external service), the test must still exist and skip with pytest.mark.skipif or pytest.skip() explaining the prerequisite
A skipped test counts as Covered — the test exists and will run when the environment allows

Only proceed to Step 9 when every AC has a corresponding test.
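
The gate itself reduces to a set difference. In this sketch, `tests` is a hypothetical list of `(ac_id, outcome)` pairs collected from the test files; how tests are matched to acceptance criteria is not specified in this skill and is assumed to have been done already.

```python
def missing_acs(ac_ids: list[str], tests: list[tuple[str, str]]) -> list[str]:
    """Return ACs with no test at all; skipped tests still count as covered."""
    covered = {ac for ac, _outcome in tests}
    return [ac for ac in ac_ids if ac not in covered]


def coverage_summary(ac_ids: list[str], tests: list[tuple[str, str]]) -> str:
    """Format the value used in the AC Coverage column of the batch report."""
    missing = missing_acs(ac_ids, tests)
    return f"{len(ac_ids) - len(missing)}/{len(ac_ids)} ACs covered"
```

A non-empty result from `missing_acs` is the list of ACs to hand back to the implementer when it is re-launched.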

9. Code Review

Run /code-review skill on the batch's changed files + corresponding task specs
The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL

10. Auto-Fix Gate

Auto-fix loop with bounded retries (max 2 attempts) before escalating to user:

If verdict is PASS or PASS_WITH_WARNINGS: show findings as info, continue automatically to step 11
If verdict is FAIL (attempt 1 or 2):
1. Parse the code review findings (Critical and High severity items)
2. For each finding, attempt an automated fix using the finding's location, description, and suggestion
3. Re-run /code-review on the modified files
4. If now PASS or PASS_WITH_WARNINGS → continue to step 11
5. If still FAIL → increment the retry counter and repeat from (2), up to a maximum of 2 attempts
If still FAIL after 2 auto-fix attempts: present all findings to the user (BLOCKING). The user must either confirm fixes or accept the findings before proceeding.

Track auto_fix_attempts count in the batch report for retrospective analysis.
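
The bounded retry loop can be sketched as follows. The `review` and `fix` callables are hypothetical stand-ins for running /code-review and applying automated fixes; only the verdict names and the limit of 2 attempts come from the text.

```python
from typing import Callable

MAX_AUTO_FIX_ATTEMPTS = 2
PASSING = {"PASS", "PASS_WITH_WARNINGS"}


def auto_fix_gate(
    review: Callable[[], str], fix: Callable[[], None]
) -> tuple[str, int]:
    """Run review, auto-fix on FAIL at most twice; return (verdict, attempts)."""
    verdict = review()
    attempts = 0
    while verdict not in PASSING and attempts < MAX_AUTO_FIX_ATTEMPTS:
        fix()              # attempt automated fixes for the findings
        attempts += 1
        verdict = review()  # re-run the review on the modified files
    return verdict, attempts
```

A returned verdict of FAIL means both attempts were used up and the findings go to the user; the second element is the `auto_fix_attempts` value for the batch report.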

11. Commit and Push

After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
- git add all changed files from the batch
- git commit with a message that includes ALL task IDs (tracker IDs or numeric prefixes) of tasks implemented in the batch, followed by a summary of what was implemented. Format: [TASK-ID-1] [TASK-ID-2] ... Summary of changes
- git push to the remote branch
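
The commit message format can be built with a small helper. The name `commit_message` is hypothetical, and the example IDs below are for illustration only.

```python
def commit_message(task_ids: list[str], summary: str) -> str:
    """Build '[TASK-ID-1] [TASK-ID-2] ... Summary of changes'."""
    if not task_ids:
        raise ValueError("a batch commit must reference at least one task ID")
    prefix = " ".join(f"[{task_id}]" for task_id in task_ids)
    return f"{prefix} {summary}"
```

For instance, `commit_message(["AZ-171", "AZ-172"], "Add TensorRT tests")` yields `[AZ-171] [AZ-172] Add TensorRT tests`.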

12. Update Tracker Status → In Testing

After the batch is committed and pushed, transition the ticket status of each task in the batch to In Testing via the configured work item tracker. If tracker: local, skip this step.

13. Archive Completed Tasks

Move each completed task file from TASKS_DIR/todo/ to TASKS_DIR/done/.

14. Loop

Go back to step 2 until all tasks in todo/ are done

15. Final Test Run

After all batches are complete, run the full test suite once
Read and execute .cursor/skills/test-run/SKILL.md (detect runner, run suite, diagnose failures, present blocking choices)
Test failures are a blocking gate — do not proceed until the test-run skill completes with a user decision
When tests pass, report final summary

Batch Report Persistence

After each batch completes, save the batch report to _docs/03_implementation/batch_[NN]_report.md. Create the directory if it doesn't exist. When all tasks are complete, produce a FINAL implementation report with a summary of all batches. The filename depends on context:

Test implementation (tasks from test decomposition): _docs/03_implementation/implementation_report_tests.md
Feature implementation: _docs/03_implementation/implementation_report_{feature_slug}.md where {feature_slug} is derived from the batch task names (e.g., implementation_report_core_api.md)
Refactoring: _docs/03_implementation/implementation_report_refactor_{run_name}.md

Determine the context from the task files being implemented: if all tasks have test-related names or belong to a test epic, use the tests filename; otherwise derive the feature slug from the component names.
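
The naming rules above might be encoded like this. Two details are assumptions: `[NN]` is read as a zero-padded two-digit batch number, and the slug is produced by lowercasing and replacing runs of non-alphanumeric characters with underscores. The function names are hypothetical.

```python
import re

REPORT_DIR = "_docs/03_implementation"


def slugify(name: str) -> str:
    """Lowercase and collapse non-alphanumeric runs into single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def batch_report_path(batch_number: int) -> str:
    return f"{REPORT_DIR}/batch_{batch_number:02d}_report.md"


def final_report_path(context: str, name: str = "") -> str:
    """context is one of 'tests', 'feature', 'refactor'."""
    if context == "tests":
        return f"{REPORT_DIR}/implementation_report_tests.md"
    if context == "refactor":
        return f"{REPORT_DIR}/implementation_report_refactor_{slugify(name)}.md"
    return f"{REPORT_DIR}/implementation_report_{slugify(name)}.md"
```

With these rules, a feature named "Core API" maps to `implementation_report_core_api.md`, matching the example in the text.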

Batch Report

After each batch, produce a structured report:

# Batch Report

**Batch**: [N]
**Tasks**: [list]
**Date**: [YYYY-MM-DD]

## Task Results

| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|---------------|-------|-------------|--------|
| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [N/N ACs covered] | [count or None] |

## AC Test Coverage: [All covered / X of Y covered]
## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
## Auto-Fix Attempts: [0/1/2]
## Stuck Agents: [count or None]

## Next Batch: [task list] or "All tasks complete"

Stop Conditions and Escalation

| Situation | Action |
|-----------|--------|
| Implementer fails same approach 3+ times | Stop it, escalate to user |
| Task blocked on external dependency (not in task list) | Report and skip |
| File ownership conflict unresolvable | ASK user |
| Test failure after final test run | Delegate to test-run skill (blocking gate) |
| All tasks complete | Report final summary, suggest final commit |
| `_dependencies_table.md` missing | STOP: run `/decompose` first |

Recovery

Each batch commit serves as a rollback checkpoint. If recovery is needed:

Tests fail after final test run: git revert <batch-commit-hash> using hashes from the batch reports in _docs/03_implementation/
Resuming after interruption: Read _docs/03_implementation/batch_*_report.md files to determine which batches completed, then continue from the next batch
Multiple consecutive batches fail: Stop and escalate to user with links to batch reports and commit hashes
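
Resuming after an interruption amounts to finding the highest batch number that already has a report. The sketch below assumes report filenames follow the `batch_[NN]_report.md` pattern with a numeric `NN`; `next_batch_number` is a hypothetical name.

```python
import re
from pathlib import Path


def next_batch_number(report_dir: Path) -> int:
    """Return the batch number to continue from, based on saved reports."""
    numbers = []
    for path in report_dir.glob("batch_*_report.md"):
        match = re.fullmatch(r"batch_(\d+)_report\.md", path.name)
        if match:  # ignore files whose middle part is not a number
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1
```

Calling `next_batch_number(Path("_docs/03_implementation"))` returns 1 when no reports exist, so the same helper covers both a fresh start and a resume.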

Safety Rules

Never launch tasks whose dependencies are not yet completed
Never allow two parallel agents to write to the same file
If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely
Always run the full test suite after all batches complete (step 15)
