mirror of
https://github.com/azaion/annotations.git
synced 2026-06-21 13:51:07 +00:00
docs+src: complete Steps 1-3 outcomes + auth re-sync baseline
This commit captures everything produced during autodev existing-code Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec), together with the targeted auth + CORS re-sync triggered on 2026-05-14 when codebase drift was detected at Step 4 entry. None of this work was previously committed.

Step 1 (Document): 50+ _docs/02_document/ files covering problem, solution, architecture, system flows, glossary, module-layout, per-component specs (01..06), modules, deployment, diagrams, data model, FINAL report, verification log, discovery.

Step 2 (Architecture Baseline): architecture_compliance_baseline.md. Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec): _docs/02_document/tests/* (67 scenarios across blackbox, security, resilience, resource-limit, performance), plus e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh, scripts/run-performance-tests.sh. Coverage 88% over the active scope (40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync: replaces the deleted in-house token issuer with a JWKS-verifier model. AuthController and TokenService removed; JwtExtensions switched from HS256 symmetric to ES256 over admin's JWKS. ConfigurationResolver and CorsConfigurationValidator added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01, SEC-02, SEC-03 marked Closed. One new testability risk recorded in architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified): ES256, JWKS, alg pinning
- src/Program.cs (modified): DI wiring for ConfigurationResolver and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted): no in-service issuance
- src/Services/TokenService.cs (deleted): same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new): required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec captures the change-spec for downstream services that consumed the now deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,72 @@
# CI / CD Pipeline

Source of truth: `.woodpecker/build-arm.yml`.

## Engine

Woodpecker CI. No GitHub Actions / GitLab CI / Azure Pipelines are configured in this repo; `.github/workflows/` is absent (`00_discovery.md`). Suite-wide CI may layer on top of this, but that lives outside the workspace.

## Trigger

```yaml
when:
  event: [push, manual]
  branch: [dev, stage, main]
```

- Builds run on push to **`dev`**, **`stage`**, or **`main`**, plus manual triggers.
- Other branches do **not** build images.

## Runner constraint

```yaml
labels:
  platform: arm64
```

The pipeline pins to ARM64 runners. The Dockerfile is multi-arch capable, but this pipeline builds `arm64` only.

## Steps (single step `build-push`)

1. Log in to the private registry using secrets `registry_host`, `registry_user`, `registry_token`.
2. Compute `TAG=${CI_COMMIT_BRANCH}-arm` and `BUILD_DATE` (`date -u +%Y-%m-%dT%H:%M:%SZ`).
3. `docker build -f src/Dockerfile` with build args + OCI labels:
   - `--build-arg CI_COMMIT_SHA=$CI_COMMIT_SHA`
   - `--label org.opencontainers.image.revision=$CI_COMMIT_SHA`
   - `--label org.opencontainers.image.created=$BUILD_DATE`
   - `--label org.opencontainers.image.source=$CI_REPO_URL`
   - tag: `$REGISTRY_HOST/azaion/annotations:$TAG`
4. `docker push` of that tag.
5. The step mounts `/var/run/docker.sock` into the build container (Docker-out-of-Docker pattern).

## Image tagging

Per branch:

| Branch | Image tag |
|--------|-----------|
| `dev` | `dev-arm` |
| `stage` | `stage-arm` |
| `main` | `main-arm` |

Tags are **mutable**: every push to a branch overwrites the prior image at that tag. No immutable revision-tagged images are produced today (`main-arm-${SHA}` is not pushed). Adding immutable tags would simplify rollback and trace-back from a running image to a commit.

## Secrets

| Secret | Purpose |
|--------|---------|
| `registry_host` | Registry hostname (also used in pushed image FQN) |
| `registry_user` | Registry login user |
| `registry_token` | Registry login token (used via `--password-stdin`) |

Secrets are referenced via `from_secret:` and never echoed.

## What CI does NOT do today

- No tests run (no test project exists in repo per `00_discovery.md`).
- No linters / format checks (`dotnet format`).
- No `amd64` image.
- No scan (Trivy / Grype) on the produced image.
- No automated rollback on failed deploy (deploy itself is out of pipeline scope).

These are gaps to track when the test project is added in autodev Phase A Step 6.
@@ -0,0 +1,53 @@
# Containerization

Source of truth: `src/Dockerfile`.

## Build

Two-stage build:

1. **build stage**: `mcr.microsoft.com/dotnet/sdk:10.0`, `--platform=$BUILDPLATFORM`. Reads `$TARGETARCH` and runs `dotnet publish -c Release -o /app --os linux --arch $arch` (mapping `amd64 → x64`, otherwise `$TARGETARCH`).
2. **runtime stage**: `mcr.microsoft.com/dotnet/aspnet:10.0`. Copies the published output, exposes port `8080`, sets `ENTRYPOINT ["dotnet", "Azaion.Annotations.dll"]`.

## Build arguments

| Arg | Default | Purpose |
|-----|---------|---------|
| `BUILDPLATFORM` | provided by Buildx | Platform of the build host (the SDK stage runs here and cross-compiles) |
| `TARGETARCH` | provided by Buildx | Output arch (`amd64` / `arm64`) |
| `CI_COMMIT_SHA` | `unknown` | Stamped into `AZAION_REVISION` env at runtime |

## Runtime

| Aspect | Value |
|--------|-------|
| Base image | `mcr.microsoft.com/dotnet/aspnet:10.0` |
| Working dir | `/app` |
| Exposed port | `8080` (HTTP) |
| Entry point | `dotnet Azaion.Annotations.dll` |
| Runtime env stamped at build | `AZAION_REVISION = $CI_COMMIT_SHA` |

## Multi-arch

The Dockerfile is multi-arch capable via Buildx. The current Woodpecker pipeline emits **`arm64` only** (label `platform: arm64`, tag `${BRANCH}-arm`). Producing `amd64` requires an additional pipeline (or extending the existing one to a matrix).

## Image size & caching

- Layers: SDK install → `COPY . .` → publish → runtime copy. The final layer is the published `/app` directory only; the SDK is not in the runtime image.
- Cache invalidation on `COPY . .` is wide: any change under `src/` invalidates that layer and every layer after it. Finer caching (e.g., `COPY *.csproj` first, then `dotnet restore`, then sources) is **not configured** and is an improvement candidate.

## Image labels

Set in CI (`.woodpecker/build-arm.yml`), not in the Dockerfile:

- `org.opencontainers.image.revision = $CI_COMMIT_SHA`
- `org.opencontainers.image.created = $BUILD_DATE`
- `org.opencontainers.image.source = $CI_REPO_URL`

These keys follow the OCI image annotation conventions (`org.opencontainers.image.*`), so registries and tooling that read them can surface revision, build time, and source.

## Open items

- Add `amd64` build target if non-ARM hosts are required.
- Consider non-root user inside the runtime image (none configured today).
- Consider `dotnet restore` cache layer split for faster CI builds.
@@ -0,0 +1,75 @@
# Environment Strategy

Source of truth: `src/Program.cs` + `src/Database/DatabaseMigrator.cs` + `.woodpecker/build-arm.yml`.

## Environments

Branch-driven from CI:

| Branch | Image tag | Intended environment |
|--------|-----------|----------------------|
| `dev` | `dev-arm` | Development (shared) |
| `stage` | `stage-arm` | Pre-production |
| `main` | `main-arm` | Production |

The service binary is identical across environments; all variation is **runtime configuration via env vars** (no per-environment build flags).

## Configuration sources (priority order, per `Program.cs`)

1. `Environment.GetEnvironmentVariable("KEY")`.
2. ASP.NET Core `IConfiguration` (`builder.Configuration["KEY"]`), which covers `appsettings.json`, command-line args, etc.
3. **No hard-coded fallback for security-sensitive values.** `DATABASE_URL`, `JWT_ISSUER`, `JWT_AUDIENCE`, `JWT_JWKS_URL`, and (in `Production`) a non-empty `CorsConfig:AllowedOrigins` are required; missing values cause startup to fail fast via `ConfigurationResolver.ResolveRequiredOrThrow` / `CorsConfigurationValidator.EnsureSafeForEnvironment`.
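
The lookup order and fail-fast behaviour above can be sketched roughly as follows. This is an illustration, not the code under `src/Infrastructure/`: the function bodies, parameter shapes, and exception messages are assumptions, and a plain dictionary stands in for `IConfiguration` so the snippet runs without ASP.NET Core. Only the Production rule stated above (non-empty allowed origins) is modelled for CORS.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ConfigurationResolver.ResolveRequiredOrThrow.
// Lookup order: environment variable first, then the configuration source.
static string ResolveRequiredOrThrow(
    string key,
    Func<string, string?> env,
    IReadOnlyDictionary<string, string?> config)
{
    var fromEnv = env(key);
    if (!string.IsNullOrWhiteSpace(fromEnv)) return fromEnv;

    if (config.TryGetValue(key, out var fromConfig) && !string.IsNullOrWhiteSpace(fromConfig))
        return fromConfig;

    // No hard-coded fallback: a missing value stops startup.
    throw new InvalidOperationException($"Required configuration '{key}' is not set.");
}

// Hypothetical stand-in for CorsConfigurationValidator.EnsureSafeForEnvironment.
// Assumption: only the "non-empty origins in Production" rule is enforced here.
static void EnsureSafeForEnvironment(string environmentName, IEnumerable<string?> allowedOrigins)
{
    var isProduction = string.Equals(environmentName, "Production", StringComparison.OrdinalIgnoreCase);
    var hasOrigin = allowedOrigins.Any(origin => !string.IsNullOrWhiteSpace(origin));
    if (isProduction && !hasOrigin)
        throw new InvalidOperationException("CorsConfig:AllowedOrigins must be non-empty in Production.");
}

// Usage: the dictionary plays the role of appsettings.json values.
var config = new Dictionary<string, string?> { ["JWT_ISSUER"] = "issuer-from-appsettings" };
var issuer = ResolveRequiredOrThrow("JWT_ISSUER", Environment.GetEnvironmentVariable, config);
Console.WriteLine($"JWT_ISSUER resolved to: {issuer}");

// Outside Production an empty origin list passes the check.
EnsureSafeForEnvironment("Development", Array.Empty<string>());
```

Passing the environment lookup as a delegate keeps the sketch easy to exercise without mutating the process environment.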

## Required environment variables

| Variable | Purpose | Default | Production action |
|----------|---------|---------|---------------------|
| `DATABASE_URL` | Postgres connection (URL or LinqToDB conn string) | none (required, fail-fast) | **MUST set** |
| `JWT_ISSUER` | Expected `iss` claim; must match admin's `JwtConfig:Issuer` | none (required, fail-fast) | **MUST set** |
| `JWT_AUDIENCE` | Expected `aud` claim; must match admin's `JwtConfig:Audience` | none (required, fail-fast) | **MUST set** |
| `JWT_JWKS_URL` | Admin's JWKS endpoint (HTTPS) | none (required, fail-fast) | `https://admin.azaion.com/.well-known/jwks.json` |
| `CorsConfig__AllowedOrigins__0` | First allowed CORS origin (array via `__N` indices) | none | **MUST set** (or `CorsConfig__AllowAnyOrigin=true`) in Production |
| `CorsConfig__AllowAnyOrigin` | Opt-in to permissive CORS (non-production only) | `false` | Leave `false` in Production |
| `RABBITMQ_HOST` | Stream host | `127.0.0.1` | Override |
| `RABBITMQ_STREAM_PORT` | Stream port | `5552` | Override if non-default |
| `RABBITMQ_PRODUCER_USER` | Stream user | `azaion_producer` | Override |
| `RABBITMQ_PRODUCER_PASS` | Stream password | `producer_pass` | Override |
| `RABBITMQ_STREAM_NAME` | Stream name | `azaion-annotations` | Usually keep (suite contract) |

`JWT_SECRET` was removed in this cycle: annotations no longer mints HS256 tokens, and admin is the sole token issuer (ES256).

## URL format conversion

`Program.cs` accepts `DATABASE_URL` either as a LinqToDB connection string or as a `postgresql://user:pass@host:port/db` URL. The `ConvertPostgresUrl` helper rewrites the URL form into LinqToDB conn-string form. This means operators can use either ENV-style URLs (kubectl/Postgres operator output) or `Host=...` directly.
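
A conversion of this kind might look like the sketch below. It is an illustrative re-implementation, not the `ConvertPostgresUrl` in `Program.cs`: the pass-through rule for non-URL input, the default port `5432`, and the exact output key names (`Host`, `Port`, `Database`, `Username`, `Password`, the usual Npgsql-style keys) are assumptions.

```csharp
using System;

// Illustrative re-implementation; the real ConvertPostgresUrl in Program.cs may differ.
static string ConvertPostgresUrl(string value)
{
    // Assumption: anything that is not a postgresql:// URL is already a connection string.
    if (!value.StartsWith("postgresql://", StringComparison.OrdinalIgnoreCase))
        return value;

    var uri = new Uri(value);

    // UserInfo is "user:pass"; split on the first ':' only so passwords may contain ':'.
    var userInfo = uri.UserInfo.Split(':', 2);
    var user = Uri.UnescapeDataString(userInfo[0]);
    var password = userInfo.Length > 1 ? Uri.UnescapeDataString(userInfo[1]) : string.Empty;

    // Assumption: fall back to the default Postgres port when the URL omits it.
    var port = uri.Port > 0 ? uri.Port : 5432;
    var database = Uri.UnescapeDataString(uri.AbsolutePath.TrimStart('/'));

    return $"Host={uri.Host};Port={port};Database={database};Username={user};Password={password}";
}

Console.WriteLine(ConvertPostgresUrl("postgresql://app:s3cret@db.internal:5432/annotations"));
Console.WriteLine(ConvertPostgresUrl("Host=localhost;Database=annotations"));
```

The sketch does not quote values, so a password containing `;` would need extra handling before it is safe to use in a connection string.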

## DB-driven configuration

Several runtime concerns are stored in **database tables**, not env:

- **Filesystem roots**: `directory_settings` (defaults `/data/...`). Updated via `PUT /settings/directories`; **must trigger** `PathResolver.Reset` for the change to take effect (Flow F7).
- **System settings**: `system_settings` (`generate_annotated_image`, `silent_detection`, thumbnail dimensions).
- **User settings**: `user_settings` (per UI session prefs).

Operators changing filesystem layout in production need an `ADM` JWT and the right cluster connectivity, **not** a redeploy.

## Filesystem mounts

The container expects `/data/` (or whatever `directory_settings` points at) to be a **writable persistent mount**:

- `/data/images`: annotation full images
- `/data/labels`: YOLO `.txt` files
- `/data/thumbnails`: thumbnails
- `/data/results`: annotated images (when `generate_annotated_image=true`)
- `/data/videos`: media uploads
- `/data/gps_sat`, `/data/gps_route`: GPS overlays

Without these mounts, every annotation-create / media-upload flow returns 500 from `ErrorHandlingMiddleware` (FS write fails).
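
One way to catch a missing mount before the first request fails is a writability probe at startup. The repo does not contain such a check today; the helper name `FindUnwritableDirectories` and the probe-file approach are hypothetical, and the hard-coded paths are just the defaults listed above (the real roots come from `directory_settings`).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical startup probe: returns the directories that are missing or not writable.
static List<string> FindUnwritableDirectories(IEnumerable<string> directories)
{
    var failed = new List<string>();
    foreach (var dir in directories)
    {
        try
        {
            if (!Directory.Exists(dir))
            {
                failed.Add(dir);
                continue;
            }

            // Write and delete a throwaway file to prove the mount accepts writes.
            var probe = Path.Combine(dir, $".write-probe-{Guid.NewGuid():N}");
            File.WriteAllText(probe, string.Empty);
            File.Delete(probe);
        }
        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
        {
            failed.Add(dir);
        }
    }
    return failed;
}

// Default roots from the list above; a real check would read them from directory_settings.
var required = new[]
{
    "/data/images", "/data/labels", "/data/thumbnails", "/data/results",
    "/data/videos", "/data/gps_sat", "/data/gps_route",
};

foreach (var dir in FindUnwritableDirectories(required))
    Console.WriteLine($"not writable: {dir}");
```

Logging the failing paths at startup gives operators a clearer signal than a later HTTP 500 on the first upload.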

## Config drift between environments

Today, environment-specific config is held wherever the deployment platform places env vars (Helm values / Kustomize overlays / Compose files in `_infra/`). This repo intentionally does not commit per-environment values; the only environment-aware file in-repo is `.woodpecker/build-arm.yml`.

## Open items

- No `appsettings.Production.json`: all env-specific config is operator-supplied.
- `Swagger UI` is mounted in all environments (ADR-005); production exposure must be controlled at the perimeter.
@@ -0,0 +1,55 @@
# Observability

Source of truth: `src/Program.cs` (no dedicated logging config files exist in repo).

## Health check

```csharp
app.MapGet("/health", () => Results.Ok(new { status = "healthy" }));
```

- Path: `GET /health`
- Auth: none (`MapGet` bypasses controller-level `[Authorize]`).
- Response: `200 { "status": "healthy" }`
- **Liveness only**: this endpoint does not probe the DB, RabbitMQ, or filesystem. A pod can return healthy while the failsafe outbox is unable to publish or while DB connectivity is broken.
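
For contrast, a readiness check that does probe dependencies could be structured along these lines. Nothing like this exists in the repo; `CheckReadinessAsync`, the probe names, and the two-second timeout are hypothetical, and the probes here are placeholders rather than real DB or RabbitMQ calls.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Runs each dependency probe with a timeout and collects the names of those that fail.
static async Task<(bool Ready, List<string> Failed)> CheckReadinessAsync(
    IEnumerable<KeyValuePair<string, Func<Task>>> probes,
    TimeSpan timeout)
{
    var failed = new List<string>();
    foreach (var (name, probe) in probes)
    {
        try
        {
            // WaitAsync throws TimeoutException if the probe does not finish in time.
            await probe().WaitAsync(timeout);
        }
        catch (Exception)
        {
            failed.Add(name);
        }
    }
    return (failed.Count == 0, failed);
}

// Placeholder probes; real ones might run `SELECT 1` and open a RabbitMQ stream connection.
var probes = new Dictionary<string, Func<Task>>
{
    ["database"] = () => Task.CompletedTask,
    ["rabbitmq"] = () => Task.FromException(new TimeoutException("broker unreachable")),
};

var (ready, failedDeps) = await CheckReadinessAsync(probes, TimeSpan.FromSeconds(2));
Console.WriteLine(ready ? "ready" : $"not ready: {string.Join(", ", failedDeps)}");
```

In `Program.cs` a helper like this could back a separate endpoint that returns 200 when every probe passes and 503 otherwise, leaving `/health` as the cheap liveness signal.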

## API documentation

- `app.UseSwagger()` and `app.UseSwaggerUI()` mounted unconditionally (ADR-005).
- Endpoints: `/swagger/v1/swagger.json` (OpenAPI), `/swagger/index.html` (UI).
- No version pinning of the OpenAPI document (Swashbuckle defaults).

## Logging

- Default ASP.NET Core console logger. No `appsettings.json` overrides in repo.
- No structured logger (Serilog / NLog) configured.
- No correlation id middleware in repo (`X-Request-Id` not propagated).

## Metrics

None configured today. Possible additions:

- `prometheus-net` exporter on `/metrics`.
- ASP.NET Core built-in metrics via `System.Diagnostics.Metrics` (HTTP / runtime counters).

## Traces

None configured. OpenTelemetry SDK is not referenced in `csproj`.

## Image revision stamp

The runtime container has `AZAION_REVISION = $CI_COMMIT_SHA` set as an env var (Dockerfile + Woodpecker pipeline). This makes "what's running?" diagnosable from inside the container with `printenv AZAION_REVISION` or by surfacing it in a future `/info` endpoint.
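
Reading the stamp from application code is a one-liner; the sketch below shows what a future `/info` handler could return. The helper name `GetRevision` is hypothetical, and the `unknown` fallback simply mirrors the default of the `CI_COMMIT_SHA` build arg.

```csharp
using System;

// Hypothetical helper for a future /info endpoint: reads the revision stamped into the image.
static string GetRevision(Func<string, string?> env)
{
    var value = env("AZAION_REVISION");
    // Mirror the build-arg default when the variable is absent or empty.
    return string.IsNullOrWhiteSpace(value) ? "unknown" : value;
}

Console.WriteLine($"revision: {GetRevision(Environment.GetEnvironmentVariable)}");
```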

## Error visibility to clients

`ErrorHandlingMiddleware` maps exceptions to JSON `{ statusCode, message }` with HTTP 400 / 404 / 409 / 500. Internal exception details are not expected to leak beyond the `message` string; confirm during Step 4 verification that 500 paths do not echo stack traces.
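
The shape of such a mapping is sketched below. Which exception types the real `ErrorHandlingMiddleware` maps to which status is not stated here, so the pairs in this example (and the fixed 500 message) are assumptions chosen only to illustrate the pattern.

```csharp
using System;
using System.Collections.Generic;

// Assumed exception-to-status mapping; the real middleware may use different exception types.
static (int StatusCode, string Message) MapException(Exception ex) => ex switch
{
    ArgumentException => (400, ex.Message),
    KeyNotFoundException => (404, ex.Message),
    InvalidOperationException => (409, ex.Message),
    // Unknown failures get a fixed message so internals and stack traces stay server-side.
    _ => (500, "Internal server error"),
};

var (status, message) = MapException(new KeyNotFoundException("Annotation 42 not found"));
Console.WriteLine($"{{ \"statusCode\": {status}, \"message\": \"{message}\" }}");
```

Keeping the 500 branch on a constant message is the property the Step 4 verification would need to confirm for the real middleware.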

## Open observability items

- **Readiness vs liveness split**: today there is one endpoint that does not check dependencies. A `GET /ready` that pings DB and (optionally) RabbitMQ would unblock proper rolling-update gates.
- **Structured logs** with request id correlation across HTTP + outbox drain + SSE.
- **Outbox depth metric** (`COUNT(*)` on `annotations_queue_records`): surfaces stuck-failsafe scenarios early.
- **SSE subscriber count metric**: visibility into connected UIs.
- **Stream publish lag**: time from outbox row insertion to RabbitMQ publish.
- **Failure injection / chaos hooks**: none today.

These are candidates for the deploy-and-retro phase of autodev (Steps 14–17 once the project enters Phase B).