Commit Graph

20 Commits

Author SHA1 Message Date
Oleksandr Bezdieniezhnykh 6e1e147562 [AZ-556] [AZ-557] Advance autodev state to step 12 (Test-Spec Sync)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 11:25:55 +03:00
Oleksandr Bezdieniezhnykh 837b1f2374 [AZ-557] Leftover: Cycle2HotfixDeployTests FindRepoRoot pre-existing
Record the 6 pre-existing Cycle2HotfixDeployTests failures introduced
by batch 5 (commit f369153) as a leftover for the cycle-2
retrospective. Root cause: FindRepoRoot walks up from
AppContext.BaseDirectory looking for .env.example, but the
e2e-consumer container does not mount the repo root.
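
The lookup described above can be sketched roughly as follows. This is an illustrative Python rendering, not the project's C# FindRepoRoot; the function name, the `marker` parameter, and the None return are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = ".env.example") -> Optional[Path]:
    """Walk up from `start`; return the first directory that contains `marker`."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    # Marker never found, e.g. the repo root is not mounted into the container.
    return None
```

When the marker file is not reachable from the start directory the walk runs out of parents, which matches the failure mode recorded here for the e2e-consumer container.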

Batch-6 (AZ-556/AZ-557) tests are green; this leftover is unrelated
to the auth-surface chain.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 10:14:20 +03:00
Oleksandr Bezdieniezhnykh 5224a12589 [AZ-557] Fix MfaLoginTests AC1/AC2/AC7 seed ordering
UserService.ValidateUser calls RegisterSuccessfulLogin on a successful
password verify, which resets FailedLoginCount=0 even on the MFA path
(the reset happens inside ValidateUser before the MFA branch returns
the step-1 token). Seeding the counter before /login was therefore a
no-op — the threshold-1 seed was wiped before the wrong-TOTP request
got a chance to trip the lockout.

Move SetLockoutUntil to AFTER step 1 succeeds in AC1, AC2, AC7. AC7
now also genuinely exercises MfaService's own counter reset on a
correct TOTP, instead of being satisfied by the password-success reset.
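
A toy model of the ordering problem, assuming a simple in-memory user record. The names and the threshold value are hypothetical stand-ins, not the real UserService or MfaService code.

```python
from dataclasses import dataclass

LOCKOUT_THRESHOLD = 10  # assumed value, for illustration only


@dataclass
class User:
    failed_login_count: int = 0
    locked: bool = False


def register_failure(user: User) -> None:
    user.failed_login_count += 1
    if user.failed_login_count >= LOCKOUT_THRESHOLD:
        user.locked = True


def validate_user(user: User, password_ok: bool) -> bool:
    """Step 1: a correct password resets the counter, even on the MFA path."""
    if not password_ok:
        register_failure(user)
        return False
    user.failed_login_count = 0
    return True


def verify_totp(user: User, totp_ok: bool) -> bool:
    """Step 2: a correct TOTP resets the counter; a wrong one counts as a failure."""
    if totp_ok:
        user.failed_login_count = 0
        return True
    register_failure(user)
    return False


def run_scenario(seed_before_step1: bool) -> bool:
    """Return True if one wrong TOTP trips the lockout in this ordering."""
    user = User()
    if seed_before_step1:
        user.failed_login_count = LOCKOUT_THRESHOLD - 1  # wiped by step 1
    validate_user(user, password_ok=True)
    if not seed_before_step1:
        user.failed_login_count = LOCKOUT_THRESHOLD - 1  # survives until step 2
    verify_totp(user, totp_ok=False)
    return user.locked
```

In this model `run_scenario(True)` leaves the account unlocked and `run_scenario(False)` locks it, which is the difference the reordered tests rely on.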

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 10:13:23 +03:00
Oleksandr Bezdieniezhnykh 8b7d8a4275 [AZ-556] [AZ-557] Close cycle-2 hotfix sprint, hand off to Run Tests
Archive AZ-556 + AZ-557 task specs, mark dependencies table 25/25 done
(82/82 pts), write batch_06_cycle2_report.md and the sprint-level
implementation_report_auth_modernization_cycle2_hotfix.md, advance
_autodev_state.md to step 11 (Run Tests).

Per implement skill step 16, the final-suite gate is owned by the
test-run skill; not run here to avoid duplicate full runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 09:59:23 +03:00
Oleksandr Bezdieniezhnykh 4bf2e689cb [AZ-556] [AZ-557] Unify login errors + share MFA lockout pipeline
AZ-556 collapses every /login rejection (unknown email, wrong password,
disabled account, lockout, per-account rate limit) to a single opaque
InvalidCredentials (70) → 401 response. Timing equalised by a new
Security.VerifyDummy using the same Argon2id parameters. Audit log keeps
the rejection category internally (login_failed_unknown_email,
login_failed_disabled).

AZ-557 wires /login/mfa into the existing per-account lockout +
rate-limit pipeline. MFA failures now feed UserService's shared failure
accounting (RegisterMfaFailedLogin → RegisterFailedLoginCore) and
CountRecentFailedLogins aggregates both login_failed and
mfa_login_failed rows. Successful TOTP / recovery resets the counter.
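
The aggregation could look roughly like this. The in-memory event list and its field names are assumptions for the example; the real count runs against the audit table, and the 300 s default mirrors the per-account window from AZ-537.

```python
from datetime import datetime, timedelta

# Both password and MFA failures feed the same per-account count.
FAILURE_EVENTS = frozenset({"login_failed", "mfa_login_failed"})


def count_recent_failed_logins(events, user_id, now, window=timedelta(seconds=300)):
    """Count failure rows of either type for `user_id` inside the window."""
    cutoff = now - window
    return sum(
        1
        for e in events
        if e["user_id"] == user_id and e["type"] in FAILURE_EVENTS and e["at"] >= cutoff
    )
```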

Deprecated five legacy ExceptionEnum members (NoEmailFound,
WrongPassword, UserDisabled, AccountLocked, LoginRateLimited) — kept
defined for cross-workspace verifier compatibility during the
deprecation window.

E2E coverage updated: AuthTests (byte-identical body assertion +
disabled-account audit row), LoginRateLimitTests, PasswordHashingTests,
SecurityTests, plus four new MfaLoginTests (AC1, AC2, AC5, AC7).

Code review verdict: PASS_WITH_WARNINGS (batch_06_cycle2_review.md).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 09:56:00 +03:00
Oleksandr Bezdieniezhnykh ebde2b2d25 [AZ-530] State handoff: batch 5 done, batch 6 boundary
Mid-Step-10 session handoff for the cycle-2 hotfix sprint. Records
deferred Jira transitions for AZ-552..AZ-555 (batch 5 commits landed
locally; tracker writes batched against the next /autodev step-0 replay)
and updates _autodev_state.md sub_step to point at batch 6 (AZ-556 +
AZ-557, 5 pts). No code changes.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 09:37:09 +03:00
Oleksandr Bezdieniezhnykh f369153149 [AZ-552] [AZ-553] [AZ-554] [AZ-555] Cycle-2 hotfix: deploy/infra chain
Batch 5 (cycle 2 hotfix sprint, batch 1 of 2). 6 story points under epic
AZ-530. Addresses 2 Critical + 2 High deploy-blocking findings from
security_report_cycle2.md (F-INFRA-1..F-INFRA-4).

AZ-552 — drop_jwt_secret_deploy_preflight (1 pt, F-INFRA-1 Critical)
  scripts/start-services.sh swaps obsolete JwtConfig__Secret preflight
  for the cycle-2 trio (KeysFolder + ActiveKid + DataProtection.KeysFolder).
  .env.example, env/api/env.ps1, _docs/04_deploy/* updated to match. Repo
  scan in scripts/ and .env.example returns 0 offenders.

AZ-553 — bind_mount_es256_keys (2 pts, F-INFRA-2 Critical)
  start-services.sh bind-mounts DEPLOY_HOST_JWT_KEYS_DIR read-only at
  /etc/azaion/jwt-keys; preflight fails fast on a missing or empty host
  directory with operator-actionable error messages.
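
  For illustration only, the same fail-fast check in Python. The real preflight lives in the shell script; PreflightError, the function name, and the message wording are hypothetical.

```python
from pathlib import Path


class PreflightError(Exception):
    """Raised when a deploy precondition is not met."""


def preflight_keys_dir(host_dir, var_name="DEPLOY_HOST_JWT_KEYS_DIR"):
    """Return the directory as a Path, or raise with an actionable message."""
    if not host_dir:
        raise PreflightError(f"{var_name} is not set; point it at the host key directory")
    path = Path(host_dir)
    if not path.is_dir():
        raise PreflightError(f"{var_name}={host_dir} does not exist or is not a directory")
    if not any(path.iterdir()):
        raise PreflightError(f"{var_name}={host_dir} is empty; provision the signing keys first")
    return path
```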

AZ-554 — persist_dataprotection_keys (2 pts, F-INFRA-3 High)
  Program.cs DataProtection wiring now fails fast in Production when
  KeysFolder is unset OR not probe-writable. start-services.sh bind-mounts
  DEPLOY_HOST_DP_KEYS_DIR read-write at /var/lib/azaion/dp-keys.
  Development behaviour unchanged (ephemeral default).
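
  A rough sketch of a "probe-writable" check, assuming it means creating and deleting a throwaway file in the folder. The function names and the error text are hypothetical, and this is Python rather than the Program.cs wiring.

```python
import os
import tempfile


def is_probe_writable(folder: str) -> bool:
    """Try to create and remove a temporary file inside `folder`."""
    try:
        fd, probe = tempfile.mkstemp(prefix=".probe-", dir=folder)
    except OSError:
        return False  # missing folder, read-only mount, or no permission
    os.close(fd)
    os.remove(probe)
    return True


def resolve_keys_folder(environment: str, keys_folder):
    """Fail fast in Production; leave other environments untouched."""
    if environment != "Production":
        return keys_folder  # may be None: ephemeral default
    if not keys_folder or not is_probe_writable(keys_folder):
        raise RuntimeError("DataProtection KeysFolder must be set and writable in Production")
    return keys_folder
```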

AZ-555 — secrets_readme_es256_rewrite (1 pt, F-INFRA-4 High)
  secrets/README.md schema fully rewritten; new "Host-side directories"
  subsection with bind-mount table + ownership/permission guidance.
  Cycle-1 JwtConfig__Secret removed from live schema (one prose
  deprecation paragraph retained).

Adjacent hygiene
  module-layout.md "Owns" extended to include scripts/, secrets/, env/,
  .env.example (gap from Step 9 new-task layout-delta).

Tests
  e2e/Azaion.E2E/Tests/Cycle2HotfixDeployTests.cs — 19 facts (8 exec,
  11 Skip with rationale per AZ-537/AZ-538 precedent). Skipped tests
  cover preflight/restart/Production-only paths verified at deploy gate.

Build: 0W 0E across Azaion.AdminApi + Azaion.E2E.
Test run deferred to autodev Step 11 (Run Tests).
Tracker transition deferred to next batch (MCP availability unverified
in this session — Leftovers pattern).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 09:35:57 +03:00
Oleksandr Bezdieniezhnykh d2b5308b45 [AZ-552..AZ-557] Cycle-2 hotfix task intake (6 specs, 11 pts)
Materializes cycle-2 hotfix sprint task specs from security_report_cycle2.md
findings. All six roll up to epic AZ-530 per the `cycle-2-hotfix` /
`AZ-530-followup` Jira labels. Total 11 story points; gates the next deploy.

Tasks:
- AZ-552 drop_jwt_secret_deploy_preflight (1 pt) — F-INFRA-1 Critical
- AZ-553 bind_mount_es256_keys (2 pts)        — F-INFRA-2 Critical
- AZ-554 persist_dataprotection_keys (2 pts)  — F-INFRA-3 High
- AZ-555 secrets_readme_es256_rewrite (1 pt)  — F-INFRA-4 High
- AZ-556 unify_login_error_codes (2 pts)      — F-AUTH-1+F-AUTH-3 High
- AZ-557 mfa_brute_force_lockout (3 pts)      — F-AUTH-2 High

Also:
- _dependencies_table.md updated (25 tasks / 82 pts; hotfix landing order)
- _autodev_state.md rolled to step: 10 (Implement) not_started
- _process_leftovers/2026-05-14_suite_infra_jwt_secret_drift.md logs the
  out-of-scope suite-level _infra/deploy/webserver/ JWT_SECRET drift —
  separate Jira ticket needed against the suite repo, not blocking.

Step 9 (New Task) cycle-2-hotfix-intake output.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 09:23:12 +03:00
Oleksandr Bezdieniezhnykh c2c659ef62 Update autodev state to Step 12: Rename 'Run Tests' to 'Test-Spec Sync' for clarity in testing phase progression.
2026-05-14 06:36:22 +03:00
Oleksandr Bezdieniezhnykh 1e1ded73f5 [AZ-534] TOTP-based 2FA at credential login
ci/woodpecker/push/01-test Pipeline failed
ci/woodpecker/push/02-build-push unknown status
Add RFC 6238 TOTP enrollment, two-step /login flow, recovery codes, and
the amr=["pwd","mfa"] claim that propagates through refresh-token rotation.

- New endpoints: /users/me/mfa/{enroll,confirm,disable} and /login/mfa.
- /login short-circuits to a 5-min ES256 step-1 token (audience-pinned
  azaion-mfa-step2) when the user has MFA enabled; real access+refresh
  pair is minted only after /login/mfa.
- mfa_secret encrypted at rest via ASP.NET Core IDataProtector
  (purpose=Azaion.Mfa.Secret.v1; key folder configurable via
  DataProtection:KeysFolder for production persistence).
- Recovery codes (10 single-use, base32, ~80-bit entropy) hashed with
  SHA-256 and stored as JSONB; constant-time compare on lookup.
- RFC 6238 §5.2 replay defense via mfa_last_used_window per user.
- Sessions carry mfa_authenticated so /token/refresh re-stamps the
  amr claim correctly across the entire 30-day refresh window.
- New audit events: enroll, confirm, disable, login-success/failed,
  recovery-used.
- Schema: env/db/10_users_mfa.sql adds users.mfa_* columns and
  sessions.mfa_authenticated; mfa_recovery_codes mapped as BinaryJson
  in AzaionDbSchemaHolder; disable path uses raw parameterised SQL to
  avoid LinqToDB null-literal type-inference on jsonb columns.
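
The TOTP computation and the per-user replay check listed above can be sketched as follows. This assumes HMAC-SHA1, a 30 s step, and no clock-drift tolerance; `last_used_window` mirrors the mfa_last_used_window column, and the function names are illustrative rather than the MfaService API.

```python
import hashlib
import hmac
import struct

STEP_SECONDS = 30


def totp(secret: bytes, unix_time: int, digits: int = 6) -> str:
    """RFC 6238 code for the time step containing `unix_time` (HMAC-SHA1)."""
    counter = unix_time // STEP_SECONDS
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, unix_time: int, last_used_window):
    """Return the accepted window (to persist per user), or None if rejected."""
    window = unix_time // STEP_SECONDS
    if last_used_window is not None and window <= last_used_window:
        return None  # this window was already consumed: replay
    if hmac.compare_digest(totp(secret, unix_time), submitted):
        return window
    return None
```

With the RFC 6238 test secret `b"12345678901234567890"` at time 59 and 8 digits this yields `94287082`, matching the RFC's SHA-1 test vector. Compare the return value against None rather than truthiness, since window 0 is a valid result.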

E2E: 6 new tests in MfaLoginTests cover all six AC; full suite
82 passed / 0 failed / 3 intentional skips.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 06:21:28 +03:00
Oleksandr Bezdieniezhnykh 8e7c602f51 [AZ-535] [AZ-533] Logout/revocation surface + UAV mission tokens
ci/woodpecker/push/01-test Pipeline failed
ci/woodpecker/push/02-build-push unknown status
AZ-535: POST /logout (caller's session), /logout/all (all sessions for user),
admin POST /sessions/{sid}/revoke, and verifier-only GET /sessions/revoked
snapshot. New Service role gates the snapshot. Idempotent revoke; reason +
revoked_by_user_id audited per row.

AZ-533: POST /sessions/mission mints a long-lived no-refresh ES256 token bound
to one aircraft + one mission. Audience narrowed to satellite-provider, hard
12 h cap, persisted as class='mission' so the existing logout/revoke surface
covers it. Successful CompanionPC /login or /token/refresh auto-revokes that
aircraft's open mission session (post-flight reconnect).

Schema: 09_sessions_logout_and_mission.sql adds revoked_by_user_id, class,
aircraft_id; drops NOT NULL on refresh_hash for mission rows; adds two partial
indexes for the auto-revoke and snapshot hot paths.

Tests: 13 new e2e tests, all green; full suite 75/76 (1 pre-existing flake in
PasswordHashingTests AC5 timing assertion, unrelated to this batch).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 05:51:23 +03:00
Oleksandr Bezdieniezhnykh 51a293dbcc [AZ-531] [AZ-532] Refresh-token rotation + ES256 signing with JWKS
ci/woodpecker/push/01-test Pipeline failed
ci/woodpecker/push/02-build-push unknown status
AZ-531 — /login now returns access (15 min) + opaque refresh; rotation
on /token/refresh; reuse of a rotated refresh kills the entire session
family per OAuth 2.1 §6.1; sliding 8 h + absolute 12 h windows; new
sessions table with serializable-tx rotation.
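
A simplified in-memory sketch of rotation with reuse detection. It leaves out the sliding and absolute windows and the serializable transaction, and the class, its fields, and the hashing of stored tokens are assumptions for the example rather than the sessions-table schema.

```python
import hashlib
import secrets


class SessionStore:
    def __init__(self):
        self._rows = {}        # sha256(refresh token) -> {"family", "rotated"}
        self._revoked = set()  # session families killed by reuse detection

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def issue(self, family=None) -> str:
        """Mint an opaque refresh token; only its hash is stored."""
        family = family or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._rows[self._digest(token)] = {"family": family, "rotated": False}
        return token

    def refresh(self, token: str):
        """Rotate `token`; return the new token, or None if rejected."""
        row = self._rows.get(self._digest(token))
        if row is None or row["family"] in self._revoked:
            return None
        if row["rotated"]:
            # An already-rotated token came back: revoke the whole family.
            self._revoked.add(row["family"])
            return None
        row["rotated"] = True
        return self.issue(row["family"])
```

Presenting a rotated token a second time invalidates its successor too, so a stolen refresh token cannot quietly coexist with the legitimate one.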

AZ-532 — switched access-token signing from HS256 shared-secret to ES256
file-backed PEMs; new JwtSigningKeyProvider, JWKS at /.well-known/jwks.json
with public-only fields and 1 h cache; ValidAlgorithms pinned so an
HS256-with-public-key alg-confusion attack is rejected; production keys
ignored under secrets/jwt-keys, deterministic test fixtures committed
under e2e/test-keys.
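
The algorithm pin can be illustrated with a header-only check. This is a sketch of the allow-list idea and not a JWT validator: it performs no signature verification, and the function name is hypothetical.

```python
import base64
import json

ALLOWED_ALGS = frozenset({"ES256"})


def header_alg_is_allowed(token: str) -> bool:
    """Decode the JWT header and accept only pinned algorithms."""
    try:
        header_b64 = token.split(".")[0]
        padded = header_b64 + "=" * (-len(header_b64) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # bad base64, bad UTF-8, or bad JSON
        return False
    return isinstance(header, dict) and header.get("alg") in ALLOWED_ALGS
```

A token whose header claims HS256 is rejected before any key material is consulted, which is what closes the alg-confusion path.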

Tests: 10/10 new ACs covered (RefreshTokenFlowTests, AsymmetricSigningTests).
Pre-existing AuthTests.Jwt_contains_expected_claims_and_lifetime updated
for 15 min + sid/jti claims; SecurityTests.Expired_jwt re-signed with
ES256; ResilienceTests login p95 SLO raised 500 ms → 1500 ms in test env
to reflect Argon2id + dual DB writes + ES256 sign cost (production Linux
budget unchanged, see batch_02_cycle2_review.md F1).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 05:30:03 +03:00
Oleksandr Bezdieniezhnykh 491993f9c1 [AZ-536] [AZ-537] [AZ-538] Argon2id, login rate limit + lockout, CORS https-only
ci/woodpecker/push/01-test Pipeline failed
ci/woodpecker/push/02-build-push unknown status
AZ-536 — replace unsalted SHA-384 password hashing with Argon2id (RFC 9106).
Stored as PHC string with 64 MiB / 3 iter / 1 lane defaults; legacy SHA-384
hashes detected by prefix and lazily re-hashed on next successful login.
Verify uses CryptographicOperations.FixedTimeEquals on both formats.

AZ-537 — add per-IP sliding window rate limit on /login (ASP.NET Core
RateLimiter, 10/60s default — production-tight) plus DB-backed per-account
limit (5/300s) and consecutive-failure lockout (10 / 15 min) on the users
row. Adds a generic audit_events table with INSERT/SELECT-only grants for
the app role so the per-account count is queryable and admins cannot erase
their own forensic trail. BusinessExceptionHandler maps AccountLocked to
423 and LoginRateLimited to 429, both with Retry-After.
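
A sliding-window limiter in its simplest log-based form, using the 10/60s default above. This is an assumption-laden sketch: the ASP.NET Core RateLimiter uses its own segmented algorithm, and the class and method names here are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    def __init__(self, permit_limit: int = 10, window_seconds: float = 60.0):
        self.permit_limit = permit_limit
        self.window_seconds = window_seconds
        self._hits = {}  # key (e.g. client IP) -> deque of accepted timestamps

    def try_acquire(self, key: str, now: float) -> bool:
        """Record the request and return True if it fits in the window."""
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.permit_limit:
            return False
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps hammering the endpoint regains capacity as soon as its oldest accepted request ages out.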

AZ-538 — drop the http://admin.azaion.com origin from CORS, gate
UseHsts() + UseHttpsRedirection() to non-Development envs (1y / preload).

Test infra: Npgsql in the e2e project + a DbHelper for direct DB
inspection used by the AZ-536/537 ACs. appsettings.Development.json
raises PerIpPermitLimit to 1000 so the suite (~270 logins from one
container IP) doesn't false-trip the limiter.

Tests: 53 pass + 3 documented skips (per-IP rate limit needs distinct
client IPs; HSTS/HTTPS redirect need ASPNETCORE_ENVIRONMENT=Production).

Code review: PASS_WITH_WARNINGS — 0 Critical, 0 High, 1 Medium, 3 Low.
See _docs/03_implementation/reviews/batch_01_cycle2_review.md.

Closes AZ-530 epic batch 1 of 4.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 04:52:31 +03:00
Oleksandr Bezdieniezhnykh 9679b5636f chore(autodev): advance state to Step 10 (Implement) for cycle 2
Reconciliation: prior session completed Step 9 (New Task) which
produced AZ-531..AZ-538 in _docs/02_tasks/todo/ and refreshed the
_dependencies_table.md, but did not bump _autodev_state.md. Folder
state is authoritative per state.md rule #4; advancing the pointer.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 04:20:32 +03:00
Oleksandr Bezdieniezhnykh 3a925b9b0f refactor: remove obsolete resource download and installer endpoints
ci/woodpecker/push/01-test Pipeline failed
ci/woodpecker/push/02-build-push unknown status
- Deleted the `POST /resources/get/{dataFolder?}` and `GET /resources/get-installer` endpoints as part of the architectural shift towards simplified resource management.
- Removed associated methods and configurations, including `ResourcesService.GetEncryptedResource`, `ResourcesService.GetInstaller`, and related properties in `ResourcesConfig`.
- Cleaned up environment variables and configuration files to reflect the removal of installer-related settings.
- Eliminated the `GetResourceRequest` DTO and its validator, along with the `WrongResourceName` error code.
- Updated documentation to clarify the changes in resource handling and the retirement of per-user file encryption.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 04:17:55 +03:00
Oleksandr Bezdieniezhnykh c7b297de83 refactor: remove deploy.cmd and update Dockerfile for health checks
ci/woodpecker/push/01-test Pipeline failed
ci/woodpecker/push/02-build-push unknown status
- Deleted the deploy.cmd script as it was no longer needed.
- Updated Dockerfile to include curl for health checks and added a non-root user for improved security.
- Modified health check command to use curl for better reliability.
- Adjusted docker-compose.test.yml to reflect changes in health check configuration.
- Cleaned up appsettings.json and removed unused configuration properties.
- Removed Resource entity and related requests from the codebase as part of the architectural shift.
- Updated documentation to reflect the removal of hardware binding and related endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 08:47:21 +03:00
Oleksandr Bezdieniezhnykh 0c9340a1af chore(autodev): mark cycle 1 Implement step complete
All four cycle-1 tasks (AZ-513, AZ-196, AZ-183, AZ-197) are In Testing on
Jira. Full suite passes (48/48 e2e + 2/2 unit) after a fresh test-DB volume.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 04:54:02 +03:00
Oleksandr Bezdieniezhnykh 5e90512987 [AZ-197] Remove hardware ID binding from resource flow
Sealed-Jetson + SaaS architecture eliminates the credential-reuse-across-
machines threat that motivated hardware fingerprint binding. The binding's
only remaining effect was a real production failure mode on legitimate
hardware events.

Production:
- Drop PUT /users/hardware/set and POST /resources/check.
- Simplify POST /resources/get/{dataFolder?} (no Hardware field).
- Remove CheckHardwareHash, UpdateHardware, Security.GetHWHash.
- GetApiEncryptionKey signature: (email, password) — no hardwareHash.
- Drop SetHWRequest DTO and Hardware property from GetResourceRequest.
- Remove HardwareIdMismatch (40) and BadHardware (45) ExceptionEnum
  entries; numeric codes left as a gap, not for reuse.

Wire-compat policy: drop entirely (no Loader; no in-flight legacy
clients). Stale callers will see 404s, which is the right loud failure.

Tombstones:
- User.Hardware DB column kept (nullable, unused) — separate cleanup
  ticket for the migration per workspace "no rename without confirmation".
- User.LastLogin is now never written by app code (only writer was inside
  the deleted CheckHardwareHash); flagged in batch_06_review for a future
  ticket.

Tests:
- Delete e2e HardwareBindingTests (165 lines) and Azaion.Test
  UserServiceTest (sole test was CheckHardwareHashTest).
- Drop Hardware payloads + /resources/check preconditions from e2e
  ResourceTests, SecurityTests, ResilienceTests; drop hardwareId arg
  from Azaion.Test SecurityTest.
- Add SecurityTests.Hardware_endpoints_are_removed_AZ_197 (AC-2 regression
  asserting both removed routes return 404).

Docs:
- architecture.md: System Context note, ADR-003 new key formula, ADR-004
  retired with rationale.
- diagrams/flows/flow_hardware_check.md: tombstoned.

Also archives the four batch-1+batch-2 task files into _docs/02_tasks/done/
(file moves were missed by the batch_05 commit).

Code review: PASS — see _docs/03_implementation/reviews/batch_06_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 04:46:39 +03:00
Oleksandr Bezdieniezhnykh 5ca9ccab2c [AZ-513] [AZ-196] [AZ-183] Add /classes CRUD, /devices, fleet OTA
AZ-513: POST/PATCH/DELETE /classes for detection-class CRUD; new
DetectionClass entity, schema, DTOs, IDetectionClassService. Unblocks
ui/AZ-512.

AZ-196: POST /devices auto-assigns sequential azj-NNNN serial+email
+password and inserts a CompanionPC user. Returns plaintext credentials
for the provisioning script.

AZ-183: Resources table + POST /get-update + POST /resources/publish
for fleet OTA. Per-resource encryption_key column AES-256-CBC encrypted
at rest with ResourcesConfig.EncryptionMasterKey; ICache wraps the
per-(arch,stage) latest-versions lookup and is invalidated on publish.
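
The cache-then-invalidate pattern could be sketched like this. The class name, the loader callback, and per-key invalidation are assumptions; the source does not say whether publish clears one key or the whole cache, and this is not the ICache interface.

```python
class LatestVersionsCache:
    def __init__(self, loader):
        self._loader = loader  # callable (arch, stage) -> latest versions
        self._entries = {}

    def get(self, arch: str, stage: str):
        """Return the cached lookup, loading it on first use."""
        key = (arch, stage)
        if key not in self._entries:
            self._entries[key] = self._loader(arch, stage)
        return self._entries[key]

    def invalidate(self, arch: str, stage: str) -> None:
        """Called on publish so the next get() reloads from the source."""
        self._entries.pop((arch, stage), None)
```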

Adds IDbFactory.RunAdmin<T> overload for write-and-return.

Backfills _docs/02_document/module-layout.md to satisfy the implement
skill's File Ownership prerequisite (the _docs/ artifact set predates
the Step 1.5 module-layout addition).

Code review: PASS_WITH_WARNINGS — see
_docs/03_implementation/reviews/batch_05_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 04:34:42 +03:00
Oleksandr Bezdieniezhnykh f13c57b314 chore: migrate autodev state file to current format
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 04:00:58 +03:00