mirror of https://github.com/azaion/missions.git synced 2026-06-21 06:41:07 +00:00

Oleksandr Bezdieniezhnykh 001e80fe96 [AZ-585] [AZ-586] ResLim+Perf NFT tests; close test cycle 1

Batch 4 of test implementation cycle 1 (existing-code Step 6, final batch).

- AZ-585 SteadyStateLoadTests + ColdStartRssTests: NFT-RES-LIM-01..04.
  SteadyStateLoadFixture runs one 5-min sustained-load window and samples
  RSS (docker stats), Npgsql conns (pg_stat_activity), and FDs
  (/proc/1/fd) every 5s; three test methods assert independently. All
  SkippableFact-gated on docker primitives.
- AZ-586 PerformanceTests: NFT-PERF-01..04. Sequential single-client,
  5 warm-ups + N measured calls, P50+P95 via LatencyPercentiles, recorded
  to PERF_RESULTS_FILE. Tagged Category=Perf so default gate excludes them.

Infrastructure:
- entrypoint.sh now applies --filter "${TEST_FILTER:-Category!=Perf}"
  per AZ-586 (default CI gate excludes performance).
- MetricCsvRecorder: idempotent CSV appender keyed on env var, used by
  both Perf and ResLim categories.

Step 6 (Implement Tests) is complete. Final report at
_docs/03_implementation/implementation_report_tests.md handoffs the
full-suite gate to test-run/SKILL.md (Step 7).

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-15 09:11:53 +03:00

Resource Limit Tests

Task: AZ-585_test_resource_limits
Name: Resource limit tests (NFT-RES-LIM-01..04)
Description: Implement xUnit blackbox tests for the 4 resource-limit observation scenarios: steady-state RSS memory under 5-min sustained load (P95 ≤ 250 MiB; no monotonic climb), Npgsql connection pool ≤ 100 with no unbounded growth, file-descriptor count ≤ 1024 with no leak, and cold-start RSS ≤ 200 MiB at t=30s after health-ok. Provisional gates are documented per restrictions.md H6 and locked in after the first green run.
Complexity: 3 points
Dependencies: AZ-576_test_infrastructure
Component: Blackbox Tests
Tracker: AZ-585
Epic: AZ-575

Problem

Per restrictions.md H6, container-level resource limits are NOT enforced inside the container; they will be set at the suite level (_infra/_compose/) per device type once locked. These tests establish baseline observations so the suite can size the cgroup limits correctly, and they provide an upper-bound regression gate so future changes do not silently 10× the memory or FD footprint. The 8 GB Jetson Orin must accommodate ~6 .NET edge services + Postgres + UI; the missions service's budget is ~200 MiB cold and ~250 MiB hot. Without these observation tests, a leak or library bloat could ship to the device and force a re-sizing decision late in deployment.

Outcome

All four NFT-RES-LIM-01..04 scenarios run and pass against the dockerised missions service.
Each test produces a CSV row with Category=ResLim, Traces=H1|H3|H6|O10, Result=pass, AND records the measured value (e.g., P95_RSS_MiB=187) in the Traces column so suite-level deployment planning can read it.
NFT-RES-LIM-01 measures P95 RSS over 5 minutes of mixed sustained load AND asserts |final_RSS - P95_RSS| ≤ 20% * P95_RSS, matching the ± 20% band in AC-1 (no monotonic climb).
NFT-RES-LIM-02 measures Npgsql connection count via pg_stat_activity every 5s AND asserts both max ≤ 100 AND final ≤ 1.3 * first_minute_steady_state.
NFT-RES-LIM-03 measures ls /proc/<pid>/fd | wc -l inside the container every 5s AND asserts both max ≤ 1024 AND final ≤ 1.3 * minute_one_count.
NFT-RES-LIM-04 measures cold-start RSS exactly 30s after GET /health first returns 200 (no requests issued yet) AND asserts RSS ≤ 200 MiB.
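
The CSV row described above can be sketched as follows. This is an illustrative Python sketch, not the suite's MetricCsvRecorder (which is C#): the helper names, the column order, and the choice to append the measurement to the Traces field with a `|` separator are all assumptions; only the Category, Traces, Result, and ErrorMessage fields and the `P95_RSS_MiB=187` example come from this task.

```python
import csv
import io


def build_traces(requirement_ids, metric_name, value):
    """Join requirement IDs and one NAME=value measurement with '|'.

    Appending the measurement with the same '|' separator is an assumption.
    """
    return "|".join([*requirement_ids, f"{metric_name}={value}"])


def csv_row(test_id, traces, result, error_message=""):
    """Serialise one report row; the column order here is hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([test_id, "ResLim", traces, result, error_message])
    return buf.getvalue()


traces = build_traces(["H1", "H6", "O10"], "P95_RSS_MiB", 187)
row = csv_row("NFT-RES-LIM-01", traces, "pass")
print(row, end="")
```

Keeping the measurement in a machine-readable `NAME=value` form lets deployment planning read the number back without parsing free text.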

Scope

Included

NFT-RES-LIM-01 Steady-state memory under 5-min sustained load.
NFT-RES-LIM-02 Connection pool steady-state.
NFT-RES-LIM-03 File-descriptor steady-state.
NFT-RES-LIM-04 Cold-start RSS budget.
Each test records the measured value to the CSV Traces field so deployment planning can pick it up.
Provisional gates: 250 MiB hot, 200 MiB cold, 100 connections, 1024 FDs. On first green run, replace provisional gates with measured + 50% and open a Refactor Backlog ticket if the provisional gate was exceeded.
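
The lock-in rule (measured + 50%) is simple arithmetic; a minimal Python sketch follows. The function name, the dictionary keys (borrowed from the metric names used in the acceptance criteria), and the returned structure are illustrative assumptions, not part of the suite.

```python
# Provisional gates as listed in this task.
PROVISIONAL_GATES = {
    "P95_RSS_MiB": 250,
    "COLD_RSS_MiB": 200,
    "MAX_NPGSQL_CONNS": 100,
    "MAX_FD": 1024,
}


def lock_in(metric, measured, headroom=0.5):
    """Return the locked gate (measured + 50%) and whether the measurement
    exceeded the provisional gate, which would call for a backlog ticket."""
    provisional = PROVISIONAL_GATES[metric]
    return {
        "locked_gate": measured * (1 + headroom),
        "exceeded_provisional": measured > provisional,
    }


print(lock_in("P95_RSS_MiB", 187))
print(lock_in("MAX_FD", 1100))
```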

Excluded

Performance (latency / throughput) tests live in Task 19.
GPU / temperature / disk-I/O monitoring (per restrictions.md H8 — no specialised hardware on a CRUD service).
Long-soak / endurance tests (> 5 min) — explicitly deferred per restrictions.md H8.

Acceptance Criteria

AC-1: NFT-RES-LIM-01 steady-state RSS ≤ provisional 250 MiB with no monotonic climb
Given missions running with seed_25_missions + seed_3_vehicles_2_default and no host-side memory limit
When the test orchestrator drives ~50 RPS of mixed GET /vehicles, GET /missions, GET /missions/{id}/waypoints for 5 minutes from a single client, while polling docker stats --no-stream missions-sut every 5s
Then the P95 of the 60 RSS samples is ≤ 250 MiB (provisional gate)
And the final-sample RSS is within ± 20% of the P95 RSS (no sustained leak: RSS does not climb monotonically)
And the measured P95 is recorded to the CSV Traces column as P95_RSS_MiB=<n>
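
The two AC-1 assertions can be sketched in a few lines of Python. This is illustrative only: the task does not say which percentile method the C# tests use, so the nearest-rank definition below is an assumption, as are the function names.

```python
def p95_nearest_rank(samples):
    """P95 by the nearest-rank method (assumed; the spec names no method)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def check_rss(samples_mib, gate_mib=250.0, drift=0.20):
    """Return (p95, passed) for the gate and the +/- 20% final-sample band."""
    p95 = p95_nearest_rank(samples_mib)
    final = samples_mib[-1]
    within_gate = p95 <= gate_mib
    no_climb = abs(final - p95) <= drift * p95
    return p95, within_gate and no_climb


# 60 samples (5 minutes at one sample per 5s) with a slow, bounded drift.
steady = [150 + 0.5 * i for i in range(60)]
# 60 samples that climb by 5 MiB every sample.
leaking = [100 + 5 * i for i in range(60)]

print(check_rss(steady))
print(check_rss(leaking))
```

With 60 samples, nearest-rank P95 is the 57th smallest value, so three outlier samples cannot fail the gate on their own.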

AC-2: NFT-RES-LIM-02 connection pool ≤ 100 with no unbounded growth
Given the same setup as NFT-RES-LIM-01
When the test orchestrator polls side-channel SELECT count(*) FROM pg_stat_activity WHERE application_name LIKE 'Npgsql%' OR (usename='postgres' AND backend_type='client backend') every 5s for 5 minutes
Then the max sampled connection count is ≤ 100
And the final-sample count is ≤ 1.3 × (mean of samples in the first minute)
And the measured max is recorded as MAX_NPGSQL_CONNS=<n>
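
The growth check in AC-2 can be sketched as below, assuming one sample every 5s so that the first minute covers the first 12 samples. The helper names are hypothetical; the 100-connection cap and the 1.3 factor come from the criterion itself.

```python
def first_minute_mean(samples, interval_s=5):
    """Mean of the samples taken during the first 60 seconds."""
    first = samples[: 60 // interval_s]
    return sum(first) / len(first)


def check_growth(samples, hard_max, baseline, factor=1.3):
    """Return (peak, passed): peak under the cap and final sample within
    factor x baseline."""
    peak = max(samples)
    passed = peak <= hard_max and samples[-1] <= factor * baseline
    return peak, passed


# Pool settles at 8 connections, then at 10 under load: bounded growth.
bounded = [8] * 12 + [10] * 48
# Pool gains one connection per sample after the first minute: a leak.
unbounded = [8] * 12 + list(range(9, 57))

print(check_growth(bounded, 100, first_minute_mean(bounded)))
print(check_growth(unbounded, 100, first_minute_mean(unbounded)))
```

The second series never reaches the 100-connection cap, yet it fails on the relative check, which is the case the 1.3× rule exists to catch.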

AC-3: NFT-RES-LIM-03 file descriptors ≤ 1024 with no leak
Given the same setup as NFT-RES-LIM-01
When the test orchestrator executes docker exec missions-sut sh -c 'ls /proc/$(pgrep -f Azaion.Missions.dll | head -1)/fd | wc -l' every 5s for 5 minutes
Then the max sampled FD count is ≤ 1024
And the final-sample count is ≤ 1.3 × (count at t=1min)
And the measured max is recorded as MAX_FD=<n>
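
One FD sample from AC-3 could be taken as sketched below. The docker command is the one quoted in the criterion; the Python wrapper, its function names, and the injectable `run` parameter (used so the parsing can be exercised without Docker) are illustrative assumptions. Treating a count of 0 as an error is also an assumption: the exit status of the pipeline is that of `wc`, so a missing pgrep would otherwise surface as a silent 0 rather than a failure.

```python
import subprocess

# Command taken from AC-3.
FD_COUNT_CMD = [
    "docker", "exec", "missions-sut", "sh", "-c",
    "ls /proc/$(pgrep -f Azaion.Missions.dll | head -1)/fd | wc -l",
]


def run_command(argv):
    """Run a command and return its stdout as text."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


def sample_fd_count(run=run_command):
    """Take one FD-count sample; raise if the output is not a positive integer."""
    out = run(FD_COUNT_CMD).strip()
    if not out.isdigit() or int(out) == 0:
        raise RuntimeError(f"unexpected FD count output: {out!r}")
    return int(out)


# Exercise the parsing with a fake runner instead of a real container.
print(sample_fd_count(lambda argv: "  143\n"))
```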

AC-4: NFT-RES-LIM-04 cold-start RSS ≤ 200 MiB
Given missions has been started fresh (via docker compose up -d missions after down -v), no requests issued yet
When GET /health first returns 200 AND 30s have elapsed
Then docker stats --no-stream missions-sut reports MEM USAGE ≤ 200 MiB
And the measured cold-start RSS is recorded as COLD_RSS_MiB=<n>
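
Comparing the MEM USAGE column against a MiB gate requires converting the text docker prints into a number. The sketch below assumes the value is obtained with `docker stats --no-stream --format "{{.MemUsage}}"`, which yields strings such as `187.4MiB / 7.67GiB`, and that the units are B, KiB, MiB, or GiB; the task does not specify this format and the function name is hypothetical.

```python
import re

# Conversion factors from docker's binary units to MiB.
_UNITS_TO_MIB = {
    "B": 1 / (1024 * 1024),
    "KiB": 1 / 1024,
    "MiB": 1.0,
    "GiB": 1024.0,
}


def parse_mem_usage_mib(mem_usage):
    """Convert the 'used' half of 'used / limit' into MiB."""
    used = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*(B|KiB|MiB|GiB)", used)
    if match is None:
        raise ValueError(f"unrecognised MEM USAGE value: {mem_usage!r}")
    return float(match.group(1)) * _UNITS_TO_MIB[match.group(2)]


cold_rss = parse_mem_usage_mib("187.4MiB / 7.67GiB")
print(cold_rss, cold_rss <= 200)
```

Failing loudly on an unrecognised unit is deliberate: a silently mis-scaled value would be recorded to the report as if it were a real measurement.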

Non-Functional Requirements

Performance

NFT-RES-LIM-01..03: each takes exactly 5 minutes (sampling window). With Arrange/teardown, ≤ 6 minutes wall-clock.
NFT-RES-LIM-04: ≤ 60s wall-clock (fresh start + health-poll + 30s wait + measurement).
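NFT-RES-LIM-01..03 share the same 5-minute window and run concurrently against a single dockerised missions; NFT-RES-LIM-04 runs separately because it requires a fresh start. The total task runtime is therefore ≤ 7 minutes (one shared window of ≤ 6 minutes plus ≤ 60s for the cold start), which fits inside the documented 15-min suite CI gate per environment.md. Run back-to-back, the three steady-state tests alone would take up to 18 minutes and exceed that gate.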

Reliability

The load generator is a single-threaded HttpClient driving requests in a tight loop; it is documented as sustaining approximately 50 RPS on the in-suite test runner. If the runner cannot sustain 50 RPS (CI infrastructure too slow), NFT-RES-LIM-01..03 are SKIPPED with Result=skip and a clear ErrorMessage=runner cannot sustain target load. CI then reruns them on a more powerful worker.
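
The skip decision can be sketched as a check on the achieved request rate. The task does not define how far below 50 RPS counts as "cannot sustain", so the 10% tolerance below is a hypothetical placeholder, as are the function name and the returned structure; the Result and ErrorMessage values are the ones stated above.

```python
def load_verdict(requests_completed, elapsed_s, target_rps=50.0, tolerance=0.9):
    """Decide whether the measurement window is valid.

    tolerance is an assumed knob: the window is skipped when the achieved
    rate falls below target_rps * tolerance.
    """
    achieved = requests_completed / elapsed_s
    if achieved < target_rps * tolerance:
        return {
            "Result": "skip",
            "ErrorMessage": "runner cannot sustain target load",
            "achieved_rps": achieved,
        }
    # Load was sustained; the metric assertions decide pass or fail.
    return {"Result": None, "ErrorMessage": "", "achieved_rps": achieved}


print(load_verdict(15000, 300))  # 50 RPS over the 5-minute window
print(load_verdict(9000, 300))   # 30 RPS: too slow, skip
```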

Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | `seed_25_missions` + 50 RPS for 5 min | P95 RSS sampling | P95 ≤ 250 MiB + no monotonic climb | H1, H6, O10 |
| AC-2 | same | `pg_stat_activity` polling | max ≤ 100 + final ≤ 1.3×steady | O10 |
| AC-3 | same | `/proc/<pid>/fd` polling | max ≤ 1024 + final ≤ 1.3×minute-one | H6, O10 |
| AC-4 | fresh `docker compose up -d` | cold-start RSS at t=30s | RSS ≤ 200 MiB | H1, H3 |

Constraints

docker stats and docker exec from inside the runner: requires Docker socket access; AZ-576 covers this.
NFT-RES-LIM-03 requires pgrep inside the missions image; the test FAILS in Arrange (not Assert) if pgrep is unavailable. Alternative: if /proc/1/comm shows that PID 1 is the .NET process, read /proc/1/fd directly and skip pgrep (preferred for the small Dockerfile).
All measurements are recorded to the CSV report's Traces field so deployment planning can pick them up; this is more important than the pass/fail gate.
Provisional gates are documented per restrictions.md H6 — locked in based on first measured run.
AAA pattern with // Arrange / // Act / // Assert per test.

Risks & Mitigation

Risk 1: Measurement variance on shared CI runners

Risk: A runner under noisy-neighbour load reports inflated RSS, flaking the gate.
Mitigation: Gates are provisional and generous (250 MiB vs. typical .NET service of ~150 MiB; 100 connections vs. typical idle pool of ~5–10). After the first green run, the gate is locked at measured + 50%.

Risk 2: NFT-RES-LIM-01..03 share a 5-minute window — flake correlation

Risk: A CI hiccup that kills the SUT mid-window flakes all three at once.
Mitigation: Each test asserts its own metric; on missions-sut exit during the window, the test FAILS with a "SUT exited during measurement window" ErrorMessage rather than reporting a misleading metric value.

Risk 3: Provisional gates silently accepted as the locked gate

Risk: If the first green run measures 200 MiB and the test passes, a future engineer treats 250 MiB as the gate forever — but actual headroom is only 50 MiB.
Mitigation: The test logs (measured / gate) ratio; CI dashboards flag ratios > 0.8 for re-tuning consideration. The lock-in workflow is documented in restrictions.md H6.
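
The ratio check in this mitigation can be sketched as below. The function names are illustrative; the 0.8 threshold and the strict "greater than" comparison follow the wording above.

```python
def gate_ratio(measured, gate):
    """Fraction of the gate consumed by the measured value."""
    return measured / gate


def needs_retuning(measured, gate, threshold=0.8):
    """True when the ratio is strictly above the dashboard threshold."""
    return gate_ratio(measured, gate) > threshold


for measured in (187, 200, 220):
    print(measured, gate_ratio(measured, 250), needs_retuning(measured, 250))
```

Note that the 200 MiB example from the risk description gives a ratio of exactly 0.8 against the 250 MiB gate, which a strict > 0.8 comparison does not flag.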

System Under Test Boundary

Tests drive the product through the public HTTP surface for load generation and use docker stats, docker exec, and side-channel pg_stat_activity queries for measurement. Expected outputs are the documented gates from _docs/02_document/tests/resource-limit-tests.md (provisional) and the corresponding entries in _docs/00_problem/input_data/expected_results/results_report.md (when locked).
Stubs are allowed ONLY for: the external admin JWT issuer (jwks-mock container) and the DB-only stub tables for media, annotations, detection, map_objects.
Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are NOT allowed for any internal product module — including the Npgsql connection pool, the AppDataConnection lifetime, or the Program.cs startup path. If any of these is not implemented, the test MUST fail/block as missing product implementation — it must not pass by replacing the module with a test stub.
