mirror of https://github.com/azaion/ai-training.git (synced 2026-04-23 04:26:35 +00:00)
[AZ-171] Add TensorRT tests, AC coverage gate in implement skill, optimize test infrastructure
- Add TensorRT export tests with graceful skip when no GPU available
- Add AC test coverage verification step (Step 8) to implement skill
- Add test coverage gap analysis to new-task skill
- Move exported_models fixture to conftest.py as session-scoped (shared across modules)
- Reorder tests: e2e training runs first so images/labels are available for all tests
- Consolidate teardown into single session-level cleanup in conftest.py
- Fix infrastructure tests to count files dynamically instead of hardcoded 20

Made-with: Cursor
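The message describes a session-scoped `exported_models` fixture with a single session-level teardown. A minimal sketch of that pattern, assuming a stubbed export step and an invented file layout (this is not the conftest.py from the commit):

```python
# conftest.py — illustrative sketch; the export step and file layout here
# are assumptions, not this repository's actual code.
import shutil
from pathlib import Path

import pytest


def _export_models(out_dir: Path) -> list[Path]:
    # Stand-in for the repo's real export step (e.g., ONNX/TensorRT export).
    (out_dir / "model.onnx").write_bytes(b"")
    return sorted(out_dir.glob("*.onnx"))


@pytest.fixture(scope="session")
def exported_models(tmp_path_factory) -> list[Path]:
    """Export once per session and share the paths across all test modules."""
    out_dir = tmp_path_factory.mktemp("exports")
    paths = _export_models(out_dir)
    yield paths
    # Consolidated teardown: one session-level cleanup instead of per-module.
    shutil.rmtree(out_dir, ignore_errors=True)
```

In the same spirit, the infrastructure-test fix replaces a hardcoded count (`assert len(files) == 20`) with one computed from the directory contents at run time.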
@@ -94,6 +94,7 @@ For each task in the batch, launch an `implementer` subagent with:
 - List of files OWNED (exclusive write access)
 - List of files READ-ONLY
 - List of files FORBIDDEN
+- **Explicit instruction**: the implementer must write or update tests that validate each acceptance criterion in the task spec. If a test cannot run in the current environment (e.g., TensorRT requires GPU), the test must still be written and skip with a clear reason.
 
 Launch all subagents immediately — no user confirmation.
 
@@ -108,46 +109,64 @@ Launch all subagents immediately — no user confirmation.
 - Subagent has not produced new output for an extended period → flag as potentially hung
 - If a subagent is flagged as stuck, do NOT let it continue looping — stop it and record the blocker in the batch report
 
-### 8. Code Review
+### 8. AC Test Coverage Verification
+
+Before code review, verify that every acceptance criterion in each task spec has at least one test that validates it. For each task in the batch:
+
+1. Read the task spec's **Acceptance Criteria** section
+2. Search the test files (new and existing) for tests that cover each AC
+3. Classify each AC as:
+   - **Covered**: a test directly validates this AC (running or skipped-with-reason)
+   - **Not covered**: no test exists for this AC
+
+If any AC is **Not covered**:
+- This is a **BLOCKING** failure — the implementer must write the missing test before proceeding
+- Re-launch the implementer with the specific ACs that need tests
+- If the test cannot run in the current environment (GPU required, platform-specific, external service), the test must still exist and skip with `pytest.mark.skipif` or `pytest.skip()` explaining the prerequisite
+- A skipped test counts as **Covered** — the test exists and will run when the environment allows
+
+Only proceed to Step 9 when every AC has a corresponding test.
+
+### 9. Code Review
 
 - Run `/code-review` skill on the batch's changed files + corresponding task specs
 - The code-review skill produces a verdict: PASS, PASS_WITH_WARNINGS, or FAIL
 
-### 9. Auto-Fix Gate
+### 10. Auto-Fix Gate
 
 Auto-fix loop with bounded retries (max 2 attempts) before escalating to user:
 
-1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 10
+1. If verdict is **PASS** or **PASS_WITH_WARNINGS**: show findings as info, continue automatically to step 11
 2. If verdict is **FAIL** (attempt 1 or 2):
    - Parse the code review findings (Critical and High severity items)
    - For each finding, attempt an automated fix using the finding's location, description, and suggestion
    - Re-run `/code-review` on the modified files
-   - If now PASS or PASS_WITH_WARNINGS → continue to step 10
+   - If now PASS or PASS_WITH_WARNINGS → continue to step 11
    - If still FAIL → increment retry counter, repeat from (2) up to max 2 attempts
 3. If still **FAIL** after 2 auto-fix attempts: present all findings to user (**BLOCKING**). User must confirm fixes or accept before proceeding.
 
 Track `auto_fix_attempts` count in the batch report for retrospective analysis.
 
-### 10. Commit and Push
+### 11. Commit and Push
 
 - After user confirms the batch (explicitly for FAIL, implicitly for PASS/PASS_WITH_WARNINGS):
   - `git add` all changed files from the batch
   - `git commit` with a message that includes ALL task IDs (tracker IDs or numeric prefixes) of tasks implemented in the batch, followed by a summary of what was implemented. Format: `[TASK-ID-1] [TASK-ID-2] ... Summary of changes`
   - `git push` to the remote branch
 
-### 11. Update Tracker Status → In Testing
+### 12. Update Tracker Status → In Testing
 
 After the batch is committed and pushed, transition the ticket status of each task in the batch to **In Testing** via the configured work item tracker. If `tracker: local`, skip this step.
 
-### 12. Archive Completed Tasks
+### 13. Archive Completed Tasks
 
 Move each completed task file from `TASKS_DIR/todo/` to `TASKS_DIR/done/`.
 
-### 13. Loop
+### 14. Loop
 
 - Go back to step 2 until all tasks in `todo/` are done
 
-### 14. Final Test Run
+### 15. Final Test Run
 
 - After all batches are complete, run the full test suite once
 - Read and execute `.cursor/skills/test-run/SKILL.md` (detect runner, run suite, diagnose failures, present blocking choices)
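Step 8 requires that environment-gated tests still exist and skip with a clear reason. A minimal pytest sketch of that pattern for the TensorRT tests; the module-level guard is the general technique, while the test names and assertions are assumptions rather than this repository's code:

```python
# test_tensorrt_export.py — sketch of the skip-with-reason pattern; test
# names and assertions are assumptions, not this repository's tests.
import importlib.util

import pytest

# Module-level guard: every test below skips, with a stated reason, when
# TensorRT (and therefore a usable GPU stack) is unavailable.
pytestmark = pytest.mark.skipif(
    importlib.util.find_spec("tensorrt") is None,
    reason="TensorRT not installed; requires an NVIDIA GPU environment",
)


def test_trt_builder_is_constructible():
    import tensorrt as trt  # safe: the skipif above guarantees the import

    builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
    assert builder is not None
```

For prerequisites only discoverable at run time, `pytest.skip("reason")` inside the test body serves the same purpose; either way the AC counts as Covered, because the test exists and will run when the environment allows.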
@@ -177,10 +196,11 @@ After each batch, produce a structured report:
 
 ## Task Results
 
-| Task | Status | Files Modified | Tests | Issues |
-|------|--------|---------------|-------|--------|
-| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [count or None] |
+| Task | Status | Files Modified | Tests | AC Coverage | Issues |
+|------|--------|---------------|-------|-------------|--------|
+| [TRACKER-ID]_[name] | Done | [count] files | [pass/fail] | [N/N ACs covered] | [count or None] |
 
+## AC Test Coverage: [All covered / X of Y covered]
 ## Code Review Verdict: [PASS/FAIL/PASS_WITH_WARNINGS]
 ## Auto-Fix Attempts: [0/1/2]
 ## Stuck Agents: [count or None]
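The step-10 auto-fix gate whose attempts this report tracks is bounded-retry control flow. A sketch with the review and fix passes stubbed out: `run_code_review` and `apply_auto_fixes` are hypothetical stand-ins for the `/code-review` skill and the automated-fix pass, not a real API.

```python
# Sketch of the step-10 auto-fix gate: at most two fix attempts, then escalate.
Finding = dict

MAX_ATTEMPTS = 2


def run_code_review(files: list[str]) -> tuple[str, list[Finding]]:
    return "PASS", []  # stub: the real step invokes the /code-review skill


def apply_auto_fixes(findings: list[Finding]) -> None:
    pass  # stub: the real step edits files per finding location/suggestion


def auto_fix_gate(changed_files: list[str]) -> dict:
    verdict, findings = run_code_review(changed_files)
    attempts = 0
    while verdict == "FAIL" and attempts < MAX_ATTEMPTS:
        attempts += 1
        # Only Critical and High severity findings are auto-fixed.
        apply_auto_fixes([f for f in findings if f["severity"] in ("Critical", "High")])
        verdict, findings = run_code_review(changed_files)
    return {
        "verdict": verdict,
        "auto_fix_attempts": attempts,  # recorded in the batch report above
        "blocking": verdict == "FAIL",  # still FAIL after 2 attempts → user decides
    }
```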
@@ -212,4 +232,4 @@ Each batch commit serves as a rollback checkpoint. If recovery is needed:
 - Never launch tasks whose dependencies are not yet completed
 - Never allow two parallel agents to write to the same file
 - If a subagent fails or is flagged as stuck, stop it and report — do not let it loop indefinitely
-- Always run the full test suite after all batches complete (step 14)
+- Always run the full test suite after all batches complete (step 15)
@@ -129,7 +129,7 @@ The `<task_slug>` is a short kebab-case name derived from the feature descriptio
 ### Step 4: Codebase Analysis
 
 **Role**: Software architect
-**Goal**: Determine where and how to insert the new functionality.
+**Goal**: Determine where and how to insert the new functionality, and whether existing tests cover the new requirements.
 
 1. Read the codebase documentation from DOCUMENT_DIR:
    - `architecture.md` — overall structure
@@ -144,6 +144,10 @@ The `<task_slug>` is a short kebab-case name derived from the feature descriptio
    - What new interfaces or models are needed
    - How data flows through the change
 4. If the change is complex enough, read the actual source files (not just docs) to verify insertion points
+5. **Test coverage gap analysis**: Read existing test files that cover the affected components. For each acceptance criterion from Step 1, determine whether an existing test already validates it. Classify each AC as:
+   - **Covered**: an existing test directly validates this behavior
+   - **Partially covered**: an existing test exercises the code path but doesn't assert the new requirement
+   - **Not covered**: no existing test validates this behavior — a new test is required
 
 Present the analysis:
 
@@ -156,9 +160,22 @@ Present the analysis:
 Interface changes: [list or "None"]
 New interfaces: [list or "None"]
 Data flow impact: [summary]
+─────────────────────────────────────
+TEST COVERAGE GAP ANALYSIS
+─────────────────────────────────────
+AC-1: [Covered / Partially covered / Not covered]
+      [existing test name or "needs new test"]
+AC-2: [Covered / Partially covered / Not covered]
+      [existing test name or "needs new test"]
+...
+─────────────────────────────────────
+New tests needed: [count]
+Existing tests to update: [count or "None"]
 ══════════════════════════════════════
 ```
 
+When gaps are found, the task spec (Step 6) MUST include the missing tests in the Scope (Included) section and the Unit/Blackbox Tests tables. Tests are not optional — if an AC is not covered by an existing test, the task must deliver a test for it.
+
 ---
 
 ### Step 5: Validate Assumptions
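Both coverage checks, Step 4's gap analysis here and Step 8's verification in the implement skill, can be partially mechanized. A crude sketch, assuming tests mention AC IDs such as `AC-1` somewhere in their source; that naming convention is an assumption, and the "Partially covered" class still needs human judgment:

```python
# Crude AC-coverage scan: mark each acceptance criterion Covered if any test
# file mentions its ID. The AC-ID-in-test-source convention is an assumption.
import re
from pathlib import Path


def classify_acs(ac_ids: list[str], test_dir: Path) -> dict[str, str]:
    source = "\n".join(
        p.read_text(encoding="utf-8") for p in test_dir.rglob("test_*.py")
    )
    return {
        ac: "Covered" if re.search(rf"\b{re.escape(ac)}\b", source) else "Not covered"
        for ac in ac_ids
    }


if __name__ == "__main__":
    for ac, status in classify_acs(["AC-1", "AC-2"], Path("tests")).items():
        print(f"{ac}: {status}")
```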