admin/_docs/06_metrics/retro_2026-05-13.md
Oleksandr Bezdieniezhnykh 3a925b9b0f
refactor: remove obsolete resource download and installer endpoints
- Deleted the `POST /resources/get/{dataFolder?}` and `GET /resources/get-installer` endpoints as part of the architectural shift towards simplified resource management.
- Removed associated methods and configurations, including `ResourcesService.GetEncryptedResource`, `ResourcesService.GetInstaller`, and related properties in `ResourcesConfig`.
- Cleaned up environment variables and configuration files to reflect the removal of installer-related settings.
- Eliminated the `GetResourceRequest` DTO and its validator, along with the `WrongResourceName` error code.
- Updated documentation to clarify the changes in resource handling and the retirement of per-user file encryption.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 04:17:55 +03:00

# Retrospective — 2026-05-13 (Cycle 1, end of cycle)
**Mode**: cycle-end
**Cycle**: 1
**Window**: 2026-04-16 (Phase A baseline) → 2026-05-13 (Phase B feature cycle complete + Deploy)
**Previous retro**: N/A — first retrospective
## Implementation Summary
| Metric | Phase A (baseline) | Phase B (cycle 1) | Total |
|--------|-------------------:|------------------:|------:|
| Total tasks | 7 | 4 | **11** |
| Total batches | 4 | 2 | **6** |
| Total complexity points | 29 | 11 | **40** |
| Avg tasks per batch | 1.75 | 2.0 | 1.83 |
| Avg complexity per batch | 7.25 | 5.5 | 6.67 |
| Tasks per task spec | — | — | 1 |

Per-task complexity (Phase B): AZ-513 (3) + AZ-196 (2) + AZ-183 (3, reverted) + AZ-197 (3) = 11 points.
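
The per-batch averages in the table can be recomputed from the raw totals. The following is a minimal sketch in Python (chosen for brevity; the dictionary layout and function name are assumptions for illustration, only the numbers come from the table):

```python
# Task, batch, and complexity-point totals copied from the summary table.
phases = {
    "Phase A": {"tasks": 7, "batches": 4, "points": 29},
    "Phase B": {"tasks": 4, "batches": 2, "points": 11},
}


def per_batch(tasks, batches, points):
    """Return (avg tasks per batch, avg complexity per batch), rounded to 2 places."""
    return round(tasks / batches, 2), round(points / batches, 2)


# Sum each raw column first; the overall averages are derived from these sums.
totals = {
    key: sum(phase[key] for phase in phases.values())
    for key in ("tasks", "batches", "points")
}

phase_a = per_batch(**phases["Phase A"])
phase_b = per_batch(**phases["Phase B"])
overall = per_batch(**totals)

print(phase_a, phase_b, overall)
```

The Total column is computed from the summed raw counts (11 tasks, 40 points, 6 batches), not by averaging the two phase averages, which would weight the 2-batch Phase B the same as the 4-batch Phase A.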
## Quality Metrics
### Code Review Results
| Verdict | Count | % |
|---------|------:|--:|
| PASS | 5 | 83% |
| PASS_WITH_WARNINGS | 1 | 17% |
| FAIL | 0 | 0% |
### Findings by Severity (code review only — security audit findings counted separately below)
| Severity | Count | Source |
|----------|------:|--------|
| Critical | 0 | — |
| High | 0 | — |
| Medium | 1 | batch_05 F1 (race on sequential serial) |
| Low | 3 | batch_05 F2/F3/F4 (uniqueness, key rotation, default empty key) |
### Findings by Category
| Category | Count | Top Files |
|----------|------:|-----------|
| Bug | 1 | `Azaion.Services/UserService.cs` (RegisterDevice) |
| Maintainability | 3 | `Azaion.Services/ResourceUpdateService.cs` (×2), `Azaion.AdminApi/appsettings.json` |
| Spec-Gap | 0 | — |
| Security | 0 *(code review)* / 13 *(security audit)* | — |
| Performance | 0 | — |
| Style | 0 | — |
| Scope | 0 | — |
### Security Audit (out-of-band, post-implementation)
| Severity | Count | Status at end of cycle |
|----------|------:|------------------------|
| Critical | 0 | — |
| High | 3 | F-1 closed (OTA reverted), F-3 closed (UNIQUE INDEX), D-1 closed (Newtonsoft 13.0.4); 1 pre-existing (F-2 path traversal) deferred to AZ-516 |
| Medium | 5 | 0 closed in audit; recorded as AZ-517..AZ-520 |
| Low | 5 | 0 closed; recorded as AZ-521 (bundle) |
> The audit found one **regression** introduced by cycle-1 work: F-1 (the `/get-update` endpoint from AZ-183 exposed plaintext encryption keys). It was fixed by fully reverting AZ-183. F-3 amplified a pre-existing race (`RegisterDevice` wrote rows with no UNIQUE INDEX to reject duplicates); the audit closed it by adding `env/db/06_users_email_unique.sql` and consolidating `RegisterDevice` to delegate row insertion to `RegisterUser`.
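
To illustrate why a UNIQUE INDEX closes this class of race, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the project's migration: the in-memory SQLite database, the `users` table shape, the index name, and the `register` helper are all assumptions for illustration. The point is that the database rejects the second insert even if two callers both passed an application-level "does this email exist?" check.

```python
import sqlite3


def register(conn, email):
    """Insert a user row; return True if inserted, False if the email already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected a duplicate; treat it as "already registered".
        return False


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX users_email_unique ON users (email)")

first = register(conn, "device-0001@example.invalid")
second = register(conn, "device-0001@example.invalid")  # simulated racing duplicate
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(first, second, row_count)
```

Routing every insert through one function mirrors the consolidation described above, where `RegisterDevice` delegates row insertion to `RegisterUser`, so there is a single place that handles the duplicate-key error.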
### Performance Test
| Verdict | NFT thresholds met | Coverage gaps |
|---------|--------------------|---------------|
| PASS | 2/2 (NFT-PERF-01 login p95=33 ms vs 500 ms; NFT-PERF-04 user-list p95=152 ms vs 1000 ms) | NFT-PERF-02/03 obsolete (OTA reverted); no `/classes` perf coverage yet |
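
The verdict logic in this table can be sketched as follows. This is an illustrative Python sketch, not the project's perf script: the nearest-rank percentile method and both function names are assumptions, and only the test IDs, measured p95 values, and thresholds are taken from the table.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math, 1-based
    return ordered[rank - 1]


def verdict(results):
    """results maps test id -> (measured p95 ms, threshold ms)."""
    met = sum(1 for measured, limit in results.values() if measured <= limit)
    status = "PASS" if met == len(results) else "FAIL"
    return status, f"{met}/{len(results)}"


# Values reported in the table above.
reported = {
    "NFT-PERF-01": (33, 500),    # login
    "NFT-PERF-04": (152, 1000),  # user-list
}

print(verdict(reported))
```

A gate built this way fails as soon as any single threshold is missed, which matches a "thresholds met" column reported as a fraction.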
### Deploy Audit (this step)
| Drift | Severity | Resolved this cycle | Carried forward |
|-------|---------:|--------------------:|----------------:|
| A — host pulls `:latest`, CI never produces it | Medium | yes | — |
| B — no secret manager | Medium | yes (sops + age) | — |
| C — container runs as root | Medium | yes (`USER app`) | — |
| D — stale `.woodpecker/build-arm.yml` reference | Low | yes (doc + actual files audited) | — |
| E — perf script run-on-demand | Low | spec'd; auto-gating deferred | yes (auto-gating) |
| F — no vulnerable-dep gate | Low | yes (deps-audit step) | — |
| G — unused `docker.test/Dockerfile` | Low | yes (deleted) | — |
| H — TCP-only healthcheck in test compose | Low | yes (curl /health/live) | — |
| I — no coverage threshold | Low | — | yes |
| J — manual DB migrations | Low | — | yes |
| K — no metrics / tracing implemented | Medium | spec only | yes |
| L — no central log aggregator | Low | — | yes |
| M — no tracing exporter | Low | — | yes |
| N — no zero-downtime deploy | Medium | — | yes |
| O — no remote SSH wrapper | Low | — | yes |

**7 resolved this cycle, 8 carried forward.**
## Efficiency Metrics
| Metric | Value | Notes |
|--------|------:|-------|
| Blocked tasks | 0 | — |
| Tasks requiring fixes after review | 0 | All findings were deferred or descoped; none required cycle re-entry |
| Auto-fix attempts triggered | 0 | Across all 6 batches |
| Stuck agents | 0 | — |
| Reverts after main code shipped | 1 | **AZ-183** — same-day revert after security audit finding F-1 |
| Skipped tests with documented reason | 1 | AZ-195 AC-1 (DB recovery test needs Docker socket access) |
| Test pass rate (E2E suite, end of Step 7) | 44/44 | After Dockerfile + healthcheck changes |
### Blocker Analysis
No blockers, but two notable mid-cycle pivots:
| Event | Type | Prevention idea |
|-------|------|------------------|
| User clarified mid-implementation (2026-05-13) that the Loader is architecturally retired → AZ-197 was rescoped from cross-workspace to admin-only | Spec ambiguity discovered late | Add an "implicit assumptions" review gate to `new-task` Step 5 (Acceptance Criteria) that explicitly asks: which other workspaces does this touch? Are they still active? |
| Security audit found AZ-183 ships plaintext encryption keys → entire feature reverted same day | Threat model gap not caught at planning | Add a lightweight "what new authenticated endpoints / persistence does this introduce?" prompt to `new-task` Step 5; route any non-zero answer through a 5-minute threat-model check before complexity is finalized |
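
The second prevention idea amounts to a small gate over the task spec. The following Python sketch shows one way it could work. The two labels follow the wording proposed in this retro's improvement actions; the parsing rules (comma-separated items, `none` or `[]` meaning empty) and the function names are assumptions, not an existing part of the `new-task` skill.

```python
import re

# Labels expected in the task spec, one answer per line.
PROMPTS = (
    "New authenticated endpoints introduced",
    "New persistent data introduced",
)


def declared_items(spec_text, prompt):
    """Return items listed after `prompt:`; [] if the line is blank, 'none', or missing."""
    pattern = rf"^{re.escape(prompt)}:[ \t]*(.*)$"
    match = re.search(pattern, spec_text, flags=re.MULTILINE)
    if not match:
        return []
    raw = match.group(1).strip().strip("[]")
    items = [part.strip() for part in raw.split(",")]
    return [item for item in items if item and item.lower() != "none"]


def needs_threat_model_check(spec_text):
    """True when either prompt has a non-empty answer."""
    return any(declared_items(spec_text, prompt) for prompt in PROMPTS)


example_spec = (
    "New authenticated endpoints introduced: GET /get-update\n"
    "New persistent data introduced: none\n"
)
print(needs_threat_model_check(example_spec))
```

This sketch treats a missing line the same as an empty answer. A stricter gate could instead reject any spec that omits either line, so the question cannot be skipped silently.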
## Structural Snapshot
This is the first retro, so there is no delta to compute. The snapshot is persisted to `_docs/06_metrics/structure_2026-05-13.md` as a placeholder: `module-layout.md` has 5 conceptual sub-components but only **one** ownership boundary in the registry, so cross-component edge counting is degenerate for this workspace.
| Metric | Value | Source |
|--------|------:|--------|
| Components (registry) | 1 (`Admin API`) | `_docs/02_document/module-layout.md` |
| Conceptual sub-components | 5 | same |
| csproj projects | 5 | `Azaion.AdminApi.sln` (4 prod + 1 e2e) |
| Cycles in module graph | 0 | inspection (single deployable, no cross-component edges in the registry) |
| New Architecture violations this cycle | 0 | no `cumulative_review_batches_*.md` exists; verified by inspection of batch reviews — no Architecture-category findings |
| Resolved Architecture violations | 0 | — |
| Net Architecture delta | 0 | — |
| Public-API contract files (`_docs/02_document/contracts/`) | 0 | folder absent |
| Contract coverage % | n/a | n/a |
> Contract files are not part of this project's documentation set today. If future cycles introduce them (e.g., as part of a UI ↔ admin contract test effort), this section will start carrying real coverage numbers.
## Trend Comparison
| Metric | Previous | Current | Change |
|--------|----------|--------:|--------|
| Pass rate | n/a | 83% (5/6) | n/a |
| Avg findings per batch | n/a | 0.67 | n/a |
| Reverts | n/a | 1 | n/a |
| Carried-forward operational drifts | n/a | 8 | n/a |
## Top 3 Improvement Actions
1. **Add a security threat-model micro-step to `new-task` Step 5 (Acceptance Criteria)**
- **What**: Two extra lines on every task spec — "New authenticated endpoints introduced: [list]" and "New persistent data introduced: [list]". If either is non-empty, the next sub-step is a 5-minute threat-model check (data flow, secrets exposure, replay surface). Output recorded in the task spec under `## Threat Model Notes`.
- **Impact**: catches the AZ-183-style "endpoint exposes plaintext key" class of regression at planning time, before the 3-pt budget is committed. Saves at least one cycle of implement → security-audit → revert per occurrence.
- **Effort**: low (skill text edit + template addition).
2. **Adopt the `_cycleN_` batch-report naming convention starting cycle 2**
- **What**: Rename forward — every new batch report and code-review file in cycle 2+ uses `batch_NN_cycleM_report.md` and `batch_NN_cycleM_review.md`. Cycle-1 files stay as `batch_NN_report.md` for history. Update the `implement` skill's report-filename template.
- **Impact**: prevents silent overwrite of cycle-1 batch reports when cycle 2's `batch_07` lands (it would currently collide with `batch_07_report.md` if that name were used). The convention is already documented in the existing-code flow Step 10; this action enforces it.
- **Effort**: low (one edit in `.cursor/skills/implement/`).
3. **File the 8 carried-forward deploy drifts as Jira tickets in cycle 2 backlog**
- **What**: Drifts E, I, J, K, L, M, N, and O are real backlog items (perf auto-gating, coverage gates, automated migrations, metrics + tracing, central logs, tracing exporter, zero-downtime deploy, remote SSH wrapper). They currently live only as references in `_docs/04_deploy/*.md`. Promote them to AZ-tickets with story points.
- **Impact**: makes operational debt visible alongside feature work; protects against silent erosion of the deploy plan over multiple cycles.
- **Effort**: medium (≈ 30 min of ticket creation + sizing).
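
The naming convention in action 2 can be expressed as a small helper. This is a hedged sketch in Python: the function name and its parameters are hypothetical, and it assumes two-digit zero-padded batch numbers and an unpadded cycle number, which is how the file names in this retro are written.

```python
def batch_file_name(batch, cycle, kind="report"):
    """Build a batch file name; cycle 1 keeps the legacy name without a cycle tag."""
    if kind not in ("report", "review"):
        raise ValueError(f"unknown kind: {kind}")
    if cycle == 1:
        # Cycle-1 files stay as batch_NN_report.md / batch_NN_review.md for history.
        return f"batch_{batch:02d}_{kind}.md"
    return f"batch_{batch:02d}_cycle{cycle}_{kind}.md"


print(batch_file_name(7, 2))            # batch_07_cycle2_report.md
print(batch_file_name(5, 1, "review"))  # batch_05_review.md
```

Because the cycle tag is part of the name from cycle 2 onward, the same batch number in two different cycles can never map to the same file.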
## Suggested Rule / Skill Updates
| File | Change | Rationale |
|------|--------|-----------|
| `.cursor/skills/new-task/SKILL.md` | Add Step 5.5 — "Threat-Model Micro-Check" with the two prompts above | AZ-183 revert (cycle 1) |
| `.cursor/skills/implement/SKILL.md` | Update batch-report filename template to `batch_NN_cycleM_report.md` (and review file analogously) | Naming-collision risk on cycle 2 |
| `.cursor/rules/coderule.mdc` | Add bullet: "Do not reuse retired numeric error codes (gaps are intentional)" | Batch 6 deletes codes 40 and 45 from `ExceptionEnum` — needs a rule so cycle 2 reviewers know not to fill the gap |
| `_docs/04_deploy/`-derived backlog | New AZ-* tickets for drifts E, I, J, K, L, M, N, O | Top action 3 above |
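
The "do not reuse retired numeric error codes" rule could also be backed by an automated check. The sketch below is in Python for brevity, while the real `ExceptionEnum` lives in the .NET codebase. The enum members and their values are hypothetical; only the retired codes 40 and 45 come from the table above.

```python
from enum import IntEnum

# Codes removed in batch 6; they must stay unassigned.
RETIRED_CODES = frozenset({40, 45})


class ExceptionEnum(IntEnum):
    # Hypothetical members and values, for illustration only.
    EXAMPLE_ERROR_A = 39
    EXAMPLE_ERROR_B = 41
    EXAMPLE_ERROR_C = 46


def reused_retired_codes(enum_cls, retired=RETIRED_CODES):
    """Return the sorted retired values that the enum currently assigns to a member."""
    return sorted(member.value for member in enum_cls if member.value in retired)


print(reused_retired_codes(ExceptionEnum))
```

A unit test that asserts this list is empty would fail the build when someone fills the gap, so reviewers would not have to rely on remembering the rule.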
## Notes
- **First retrospective.** No prior baseline; cycle 2 will be the first one with delta numbers.
- **Cycle health**: green. 0 FAIL verdicts, 0 stuck agents, 0 auto-fix attempts, 44/44 E2E tests pass after Step 7's code edits. The single revert (AZ-183) was caught by the next-step security audit and resolved before deploy — the system worked, but the goal of the threat-model micro-check is to catch it one step earlier.
- **Operator burden after this cycle**: the 8 carried-forward drifts represent ≈ 22 story points of follow-up infrastructure work (rough sizing — to be confirmed when filed as tickets per Top Action 3).