# Azaion Admin API — CI/CD Pipeline
**Date**: 2026-05-13 · **Cycle**: 1 · **Status**: planning artifact (current Woodpecker files audited; proposed changes land as concrete YAML in Step 7).
## 1. Platform & Constraints
| Constraint | Value | Source |
|------------|-------|--------|
| CI platform | **Woodpecker CI** | restrictions.md §Operational |
| Default agent label | `arm64` | `.woodpecker/01-test.yml`, `.woodpecker/02-build-push.yml` |
| Future agent label | `amd64` (matrix entry, currently commented out) | `.woodpecker/02-build-push.yml` |
| Two-workflow contract | `01-test.yml` → tests; `02-build-push.yml` (`depends_on: 01-test`) → image | Already in repo |
| Registry | `$REGISTRY_HOST/azaion/admin` | Woodpecker secret `registry_host` |
| Branches with full pipeline | `dev`, `stage`, `main` | both files' `when.branch` |

The reference contract from `.cursor/skills/deploy/templates/ci_cd_pipeline.md` is already partially adopted. This step closes the remaining gaps.
## 2. Current Pipeline (audited)
### `.woodpecker/01-test.yml` — what it does today
| Step | Image | Action | Quality gate |
|------|-------|--------|--------------|
| `unit-tests` | `mcr.microsoft.com/dotnet/sdk:10.0` | `dotnet restore` + `dotnet test Azaion.AdminApi.sln` (release, TRX logger) | All unit tests pass |
| `e2e-tests` | `mcr.microsoft.com/dotnet/sdk:10.0` | `dotnet restore` + `dotnet test e2e/Azaion.E2E/Azaion.E2E.csproj` | All E2E tests pass |

**Audit findings**:
1. ✅ Tests are gated before build (matches contract).
2. ❌ E2E test step runs `dotnet test` directly — but the project uses **Docker-orchestrated black-box tests** via `docker-compose.test.yml`. The pure `dotnet test` invocation cannot start `system-under-test` + `test-db` containers, so `e2e-tests` as written either skips integration scenarios or relies on undocumented agent state. The reference contract uses `docker compose … --abort-on-container-exit --exit-code-from e2e-runner` instead.
3. ❌ No coverage report.
4. ❌ No SAST / dependency scan / image scan stage. Security audit recommendation 13 explicitly asked for `dotnet list package --vulnerable` in CI (Drift F).
5. ❌ No artifact upload of TRX results — failures are visible only in console logs.
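
Finding 2 could be closed by letting the `e2e-tests` step drive Docker Compose instead of calling `dotnet test` directly. The following is a minimal sketch, not the final step definition. It assumes the runner service is named `e2e-consumer` (the name used in §3; the reference contract uses `e2e-runner`) and that the step has access to a Docker daemon. `run_e2e`, `COMPOSE_FILE`, and `E2E_SERVICE` are illustrative names.

```shell
# Sketch only. Assumptions: the runner service is called e2e-consumer and the
# compose file sits at the repository root; adjust both to the real project.
COMPOSE_FILE="${COMPOSE_FILE:-docker-compose.test.yml}"
E2E_SERVICE="${E2E_SERVICE:-e2e-consumer}"

run_e2e() {
  e2e_rc=0
  # Start SUT, database, and test runner; the runner's exit code becomes ours.
  docker compose -f "$COMPOSE_FILE" up --build \
    --abort-on-container-exit --exit-code-from "$E2E_SERVICE" || e2e_rc=$?
  # Always remove containers and volumes so the next run starts from a clean DB.
  docker compose -f "$COMPOSE_FILE" down -v || true
  return "$e2e_rc"
}

# Usage inside a CI step:
#   run_e2e
```

Capturing the exit code before `down -v` keeps the cleanup from masking a test failure.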
### `.woodpecker/02-build-push.yml` — what it does today
| Step | Image | Action | Quality gate |
|------|-------|--------|--------------|
| `build-push` | `docker` | `docker login` → `docker build` (with three OCI labels + `CI_COMMIT_SHA` build-arg) → `docker push $REGISTRY_HOST/azaion/admin:${CI_COMMIT_BRANCH}-${TAG_SUFFIX}` | Push succeeds |

**Audit findings**:
1. ✅ Multi-arch matrix scaffolding present (`PLATFORM` / `TAG_SUFFIX`) with amd64 commented for future use.
2. ✅ `depends_on: [01-test]` is set, so gating is correct.
3. ✅ OCI labels (`revision`, `created`, `source`) injected as build-time labels.
4. ❌ Only branch-based mutable tag pushed. No immutable `<sha12>-<arch>` tag → host scripts cannot pin (Drift A).
5. ❌ No image scan (Trivy) before push.
6. ❌ Old documentation referenced `.woodpecker/build-arm.yml` which no longer exists (Drift D — fix in this doc, see §10).
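
Findings 4 and 5 both concern what happens between `docker build` and `docker push`. One possible ordering is sketched below: build once under a local tag, scan it, and publish the registry tags only if the scan passes. The function name `build_scan_push` and the tag values in the usage comment are hypothetical; the sketch assumes `docker` and `trivy` are both available in the step image.

```shell
# Sketch only: build once, scan, and push only if the scan passes.
# Assumes `docker` and `trivy` are on PATH in the step image.
build_scan_push() {
  local_tag="$1"; shift          # tag used for the local build and the scan
  docker build -t "$local_tag" . || return 1
  # Non-zero exit when any HIGH or CRITICAL finding is reported.
  trivy image --severity HIGH,CRITICAL --exit-code 1 "$local_tag" || return 2
  for remote_tag in "$@"; do     # remaining arguments: registry tags to publish
    docker tag "$local_tag" "$remote_tag" || return 3
    docker push "$remote_tag" || return 3
  done
}

# Usage (hypothetical values):
#   build_scan_push admin:ci \
#     "$REGISTRY_HOST/azaion/admin:dev-arm" \
#     "$REGISTRY_HOST/azaion/admin:0123456789ab-arm"
```

Distinct return codes (1 build, 2 scan, 3 push) make it easier to route a scan failure to a different notification channel than a build failure.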
## 3. Proposed Stage Map (target state for cycle 1)
| Stage | Trigger | Workflow file | Quality gate |
|-------|---------|---------------|--------------|
| Lint / format | every push & PR | `01-test.yml` (new step) | `dotnet format --verify-no-changes` returns 0 |
| Unit tests | every push & PR | `01-test.yml` | All `Azaion.*Tests` pass; TRX uploaded |
| Black-box E2E (Docker compose) | every push & PR | `01-test.yml` | `docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from e2e-consumer` returns 0; results uploaded |
| Security: dependency audit | every push & PR | `01-test.yml` (new step) | `dotnet list package --vulnerable --include-transitive` reports zero High/Critical CVEs |
| Security: image scan | post-build, pre-push | `02-build-push.yml` (new step) | `trivy image --severity HIGH,CRITICAL --exit-code 1` returns 0 |
| Build | push to `dev` / `stage` / `main` | `02-build-push.yml` | `docker build` succeeds |
| Push (branch tag + SHA tag) | push to `dev` / `stage` / `main` | `02-build-push.yml` | both `docker push` calls succeed |
| Performance smoke (optional) | manual on `stage` / `main` | `03-perf.yml` (new) | k6 thresholds in `scripts/perf-scenarios.js` all `ok: true` |
| Deploy staging | tag push or `stage` branch | `04-deploy.yml` (new) | health check returns 200 within timeout |
| Deploy production | manual approval | `04-deploy.yml` (new) | health check returns 200 within timeout |
> Note on coverage: the test infrastructure (cycle 1) does not yet collect or report coverage. The skill's 75% gate cannot be enforced this cycle. Recorded as **Drift I** (carried forward to a future cycle); does NOT block this deploy.
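
The dependency-audit gate in the table above needs more than the command itself, because `dotnet list package --vulnerable` may exit 0 even when it lists vulnerable packages. A minimal sketch of a gate that inspects the report instead of the exit status follows. It assumes the default text output, where each vulnerable package row carries a severity column (`Low`, `Moderate`, `High`, `Critical`); `audit_gate` and `deps-audit.txt` are illustrative names.

```shell
# Sketch only. Reads a `dotnet list package --vulnerable` text report on stdin.
# Returns 1 if any row carries a High or Critical severity, 0 otherwise.
# Assumption: severity appears as its own whitespace-delimited column.
audit_gate() {
  if grep -Eq '[[:space:]](High|Critical)[[:space:]]'; then
    return 1
  fi
  return 0
}

# Usage inside a CI step (after `dotnet restore`):
#   dotnet list Azaion.AdminApi.sln package --vulnerable --include-transitive \
#     | tee deps-audit.txt | audit_gate
```

Writing the report to a file with `tee` keeps it available for upload regardless of the gate result.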
## 4. Caching Strategy
| Cache | Key | Notes |
|-------|-----|-------|
| `nuget` packages | hash of `**/*.csproj` | Mounted on `/root/.nuget/packages`; restored before `dotnet restore`. Cache invalidates on any csproj change. |
| Docker layer cache | hash of `Dockerfile` + `**/*.csproj` | Pass `--cache-from` to `docker build`, pointing at the previous push of the same branch (e.g. `--cache-from $REGISTRY_HOST/azaion/admin:dev-arm`). Cheapest cache available without buildx. |
| E2E DB init scripts | none — re-init each run | Schema differences would mask test failures. `down -v` between runs is intentional (mirrors `scripts/run-tests.sh`). |
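
The Docker layer cache row can be sketched as a pull followed by a build that names the pulled image as its cache source. This is an illustration under assumptions, not the final step: `cache_ref` and `build_with_cache` are hypothetical helpers, and the `BUILDKIT_INLINE_CACHE=1` build-arg only matters when BuildKit is the active builder (the classic builder ignores it with a warning).

```shell
# Sketch only. Registry host, branch and suffix values come from the CI
# environment; the helper names are illustrative.
cache_ref() {
  # $1 = registry host, $2 = branch, $3 = arch suffix
  printf '%s/azaion/admin:%s-%s' "$1" "$2" "$3"
}

build_with_cache() {
  ref=$(cache_ref "$REGISTRY_HOST" "$CI_COMMIT_BRANCH" "$TAG_SUFFIX")
  # A missing previous image (first build of a branch) must not fail the step.
  docker pull "$ref" || true
  # BUILDKIT_INLINE_CACHE=1 embeds cache metadata so the next build can reuse
  # these layers when BuildKit is the active builder.
  docker build --cache-from "$ref" \
    --build-arg BUILDKIT_INLINE_CACHE=1 -t "$ref" .
}
```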
## 5. Parallelization
```
01-test.yml  (matrix: arm64 [+ amd64 future])
  ├── lint-format  ─┐
  ├── unit-tests   ─┼── all run in parallel on the same agent;
  ├── e2e-tests    ─┤   the slowest (e2e) gates the workflow
  └── deps-audit   ─┘

02-build-push.yml  (matrix: arm64 [+ amd64 future])
  ├── build ─→ image-scan ─→ push (branch tag) ─→ push (sha tag)
  └─→ artifact: image digest stored as Woodpecker artifact

03-perf.yml  (manual; arm64 only)
  └── k6-perf (uses the docker-compose.test.yml SUT)

04-deploy.yml  (manual; per-environment)
  └── pull → stop → start → health-check → smoke
```
Cross-workflow gates: `02 depends_on 01`; `04 depends_on 02` for the same SHA.
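
The `health-check` link in the `04-deploy.yml` chain, with its "returns 200 within timeout" gate, could look roughly like the polling loop below. This is a sketch under assumptions: the health endpoint URL is not specified in this document, and `wait_healthy` with its default timeout and interval is illustrative.

```shell
# Sketch only. The URL is a placeholder; the real health endpoint is not
# specified in this document.
wait_healthy() {
  url="$1"; timeout="${2:-60}"; interval="${3:-2}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # -w prints only the HTTP status; "000" means no response was received.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || true
    if [ "$code" = "200" ]; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Usage (hypothetical URL):
#   wait_healthy "http://localhost:8080/health" 60 || exit 1
```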
## 6. Quality Gates (summary)
| Gate | Threshold | Action on breach |
|------|-----------|------------------|
| Lint | 0 violations | fail workflow |
| Unit tests | 100% pass | fail workflow |
| E2E tests | 100% pass | fail workflow |
| Dependency audit (High / Critical) | 0 CVEs | fail workflow (Drift F) |
| Image scan (High / Critical) | 0 CVEs | fail workflow |
| Coverage | not enforced this cycle (Drift I) | inform-only |
| Performance (k6) | thresholds in `perf-scenarios.js` | fail workflow when run |
## 7. Notifications
| Event | Channel | Recipients |
|-------|---------|------------|
| `01-test` failure | Woodpecker UI + Slack `#azaion-ci` | Backend team |
| `02-build-push` failure | Woodpecker UI + Slack `#azaion-ci` | Backend team |
| Image-scan High/Critical finding | Slack `#azaion-security` | Security + on-call |
| `04-deploy` failure | Slack `#azaion-ops` + email on-call | Ops on-call |
| Manual production deploy approval requested | Slack `#azaion-ops` | Approvers |
> Slack channel names are placeholders — swap to actual channel IDs in Step 7 when wiring `from_secret: slack_webhook_*`. Email/Pager wiring is deferred until those secrets exist.
## 8. Image Tags
Resolves Drift A:
| Push order | Tag | Stability | Used by |
|-----------|-----|-----------|---------|
| 1 | `${CI_COMMIT_BRANCH}-${TAG_SUFFIX}` | mutable (overwritten each push to the branch) | quick dev pulls (`docker pull …:dev-arm`) |
| 2 | `${CI_COMMIT_SHA:0:12}-${TAG_SUFFIX}` | immutable | host deploy scripts; rollback target |

Production deploys MUST reference the SHA tag, never the branch tag (Step 6 procedures will enforce this).
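
To make the two tag shapes concrete, the sketch below derives both from the same inputs and adds a check that a deploy script could use to reject branch tags. The helper names `image_tags` and `is_sha_tag` are hypothetical, and `cut -c1-12` stands in for the `${CI_COMMIT_SHA:0:12}` substring so the example does not depend on a particular shell.

```shell
# Sketch only. The registry host and SHA passed in are placeholders.
image_tags() {
  # $1 = registry host, $2 = branch, $3 = full commit SHA, $4 = arch suffix
  short_sha=$(printf '%s' "$3" | cut -c1-12)
  printf '%s/azaion/admin:%s-%s\n' "$1" "$2" "$4"          # mutable branch tag
  printf '%s/azaion/admin:%s-%s\n' "$1" "$short_sha" "$4"  # immutable SHA tag
}

# Returns 0 only for tags shaped like <12 hex chars>-<suffix>.
is_sha_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{12}-[a-z0-9]+$'
}

# Usage (hypothetical values):
#   image_tags "$REGISTRY_HOST" dev "$CI_COMMIT_SHA" arm
#   is_sha_tag "dev-arm" || echo "refusing to deploy a branch tag"
```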
## 9. Reproducibility & Audit
- Every pushed image carries `org.opencontainers.image.revision` = full `CI_COMMIT_SHA`. The 12-char prefix in the tag is for human reading; the label is the source of truth.
- `org.opencontainers.image.created` = ISO-8601 build start time (UTC).
- `org.opencontainers.image.source` = `$CI_REPO_URL`.
- Both image scan and dependency audit reports are uploaded as Woodpecker artifacts on every run (success and failure).
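
Because the `revision` label is the source of truth, an audit or deploy script can compare it against the expected commit before starting a container. A minimal sketch follows, assuming the image is already present locally; `image_revision` and `verify_revision` are illustrative helper names.

```shell
# Sketch only. Assumes the image has already been pulled to the local daemon.
image_revision() {
  # Print the value of the OCI revision label for image $1.
  docker inspect \
    --format '{{ index .Config.Labels "org.opencontainers.image.revision" }}' "$1"
}

verify_revision() {
  # $1 = image reference, $2 = expected full commit SHA
  actual=$(image_revision "$1") || return 2   # inspect failed (image missing?)
  [ "$actual" = "$2" ]
}

# Usage (hypothetical values):
#   verify_revision "$REGISTRY_HOST/azaion/admin:0123456789ab-arm" "$EXPECTED_SHA"
```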
## 10. Drifts Resolved Here / Carried Forward
| ID | Severity | Description | Status |
|----|----------|-------------|--------|
| A | Medium | Branch-tag-only push; host pulls `:latest` that CI never produces | **Resolved in spec** — add SHA-tag push (§8); script change in Step 7 |
| D | Low | Old docs referenced `.woodpecker/build-arm.yml` | **Resolved here** — corrected to `01-test.yml` + `02-build-push.yml` everywhere |
| E | Low | `scripts/run-performance-tests.sh` is run-on-demand only | **Spec**: `03-perf.yml` planned; manual trigger in cycle 1, automatic gate in a future cycle when threshold fluctuation is understood |
| F | Low | No vulnerable-dep gate in CI | **Resolved in spec**: `deps-audit` step in `01-test.yml`; concrete YAML in Step 7 |
| I | Low (NEW) | No coverage threshold enforced (no coverage collection wired) | **Carried forward** to a future cycle; recorded in the deploy plan, not blocking |
## 11. Self-verification
- [x] All pipeline stages defined with triggers and gates.
- [ ] Coverage threshold enforced — **deferred (Drift I)** with explicit justification.
- [x] Security scanning included (deps + image; SAST deferred to a future cycle when a SAST tool is selected).
- [x] Caching configured (NuGet + Docker layer).
- [x] Multi-environment deployment scaffold (staging → production manual).
- [x] Rollback referenced (SHA-tagged images make `docker run …:<previous-sha>-arm` a one-line rollback; details in Step 6).
- [x] Notification matrix defined.