Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,100 @@
# FDR Log Bridge (ERROR + WARN forwarding)
**Task**: AZ-267_fdr_log_bridge
**Name**: FDR Log Bridge
**Description**: Install a logging Handler on the shared logger that forwards every ERROR and WARN record to the Flight Data Recorder through the FDR producer client. Each forwarded record is tagged `kind="log"` so post-flight tooling can correlate log events with the rest of the recorded telemetry.
**Complexity**: 2 points
**Dependencies**: AZ-266_log_module, AZ-247 (forward — FDR producer + record schema not yet decomposed; this task's contract surface is satisfied once AZ-247's record schema contract is published)
**Component**: shared.logging (cross-cutting; epic AZ-245 / E-CC-LOG)
**Tracker**: AZ-267
**Epic**: AZ-245 (E-CC-LOG)
### Document Dependencies
- `_docs/02_document/contracts/shared_logging/log_record_schema.md` — log envelope this bridge consumes (produced by AZ-266).
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md` — FDR record schema this bridge writes into (produced by AZ-247; document does not yet exist — Step 4 cross-verification will catch the forward reference).
## Problem
The acceptance criterion "ERROR + WARN records appear in FDR with `kind = "log"` and a back-reference to the originating component" requires a bridge between the shared Python `logging` machinery and the FDR producer client. Without this bridge, post-flight tools cannot correlate a `c5_state` ERROR log with the surrounding telemetry frames captured at the same flight time.
## Outcome
- Every emitted log record at level WARN or ERROR is enqueued into the FDR producer queue with `kind="log"` and the originating component slug preserved.
- INFO and DEBUG records are NEVER enqueued into FDR (verified by the contract test in PBI #3 of this epic).
- The bridge never blocks the calling thread — it uses the FDR producer client's drop-oldest semantics so a saturated queue cannot stall a `logger.error(...)` call on the hot path.
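
The non-blocking outcome can be illustrated with a stand-in for the producer queue. This is only a sketch of drop-oldest semantics, not the AZ-247 client (which is not yet decomposed); `DropOldestQueue`, `put_nowait`, `snapshot`, and `dropped` are hypothetical names chosen for the example.

```python
import threading
from collections import deque


class DropOldestQueue:
    """Hypothetical stand-in for the FDR producer queue (not the AZ-247 API)."""

    def __init__(self, capacity: int) -> None:
        # A bounded deque silently discards its oldest item when appended to while full.
        self._items = deque(maxlen=capacity)
        self._lock = threading.Lock()
        self.dropped = 0  # how many records were evicted to make room

    def put_nowait(self, record: dict) -> None:
        """Enqueue without ever waiting for free space."""
        with self._lock:
            if len(self._items) == self._items.maxlen:
                self.dropped += 1  # the append below evicts the oldest record
            self._items.append(record)

    def snapshot(self) -> list:
        """Return the queued records, oldest first."""
        with self._lock:
            return list(self._items)
```

Because the producer never waits for a consumer to free a slot, a saturated queue costs the caller one lock acquisition and one append; the price is that the oldest queued record is lost and counted.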
## Scope
### Included
- A logging Handler subclass installed onto the root onboard logger (or each `get_logger(...)` instance, whichever the AZ-266 implementation chose) that subscribes to records at WARN and ERROR.
- Translation logic from `LogRecord` (per `log_record_schema` v1.0.0) into the FDR record envelope expected by the FDR producer client, with `kind="log"` and a `component` back-reference.
- Wire-up in the composition root (consumed from AZ-246 / E-CC-CONF) so the bridge is attached exactly once, after the logger and the FDR client are both initialised.
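
A minimal sketch of the first two items, under stated assumptions: the logger name is taken to be the component slug, the `msg` field name is invented for illustration, and `enqueue` is any callable standing in for the FDR producer client. The real field set comes from `fdr_record_schema.md`, which does not exist yet. The spec does not say how `CRITICAL` records are treated; this sketch labels anything at or above `ERROR` as `ERROR`.

```python
import logging
from typing import Callable


def to_fdr_record(record: logging.LogRecord) -> dict:
    """Translate a LogRecord into an FDR-style envelope (field set is assumed)."""
    return {
        "kind": "log",
        # Python names this level WARNING; the spec's label is WARN.
        "level": "ERROR" if record.levelno >= logging.ERROR else "WARN",
        "component": record.name,  # assumption: logger name == component slug
        "msg": record.getMessage(),  # hypothetical field name
    }


class FdrLogBridgeHandler(logging.Handler):
    """Forwards WARN and ERROR records to an enqueue callable (illustrative)."""

    def __init__(self, enqueue: Callable[[dict], None]) -> None:
        # The handler level filters out INFO and DEBUG before emit() is called.
        super().__init__(level=logging.WARNING)
        self._enqueue = enqueue

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._enqueue(to_fdr_record(record))
        except Exception:
            # Standard logging hook; reports the failure on stderr instead of raising.
            self.handleError(record)
```

Usage would look like `logging.getLogger("c5_state").addHandler(FdrLogBridgeHandler(fdr_client_enqueue))`, where `fdr_client_enqueue` is a placeholder for whatever enqueue entry point AZ-247 publishes.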
### Excluded
- The FDR producer client itself — owned by AZ-247 / E-CC-FDR-CLIENT.
- The on-disk FDR segment writer thread — owned by AZ-248 / E-C13.
- The contract test that verifies "DEBUG + INFO never reach FDR" — owned by PBI #3 of this epic (next task).
- Per-component log call sites — owned by each component epic.
## Acceptance Criteria
**AC-1: WARN records reach FDR**
Given the bridge is installed and the FDR client's queue is below capacity
When any component emits `logger.warning(...)` via the shared logger
Then a single FDR record with `kind="log"`, `level="WARN"`, and `component=<originating component slug>` is enqueued
**AC-2: ERROR records reach FDR with traceback when applicable**
Given the bridge is installed
When a component emits `logger.exception(...)` from inside an `except` clause
Then the enqueued FDR record's `exc` field carries the formatted traceback string from the `LogRecord`
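
One way the `exc` value for AC-2 could be derived, sketched with the standard library only. `format_exc_field` is a hypothetical helper name; whether the bridge reads `exc_info` directly or an already formatted field from `log_record_schema` v1.0.0 is for that contract to decide.

```python
import logging
from typing import Optional

_formatter = logging.Formatter()


def format_exc_field(record: logging.LogRecord) -> Optional[str]:
    """Return the formatted traceback for a record, or None if it has no exception."""
    exc_info = record.exc_info
    # logger.exception(...) stores the (type, value, traceback) tuple on the record;
    # a call made outside an except block can leave a tuple of Nones instead.
    if exc_info and exc_info[0] is not None:
        return _formatter.formatException(exc_info)
    return None
```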
**AC-3: INFO and DEBUG never reach FDR**
Given the bridge is installed
When any component emits `logger.info(...)` or `logger.debug(...)`
Then no FDR record is enqueued for that log call (verified by both unit tests here and the contract test in the next task)
**AC-4: Backpressure is non-blocking**
Given the FDR producer queue is at its drop-oldest threshold
When a component emits `logger.error(...)` on the hot path
Then the call returns within the same latency budget as a stdout-only WARN call (no blocking on the queue), and the FDR client's existing drop counter is incremented
**AC-5: Single attachment**
Given `compose_root(config)` runs at process start
When the bridge wire-up is invoked
Then exactly one bridge Handler is attached to the logger; reinitialising the composition root in tests does not stack duplicates
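
AC-5 amounts to an idempotent attach. The sketch below shows one possible approach, scanning the logger's existing handlers by type; `BridgeHandler` is a no-op placeholder for the real bridge and `attach_bridge_once` is a hypothetical helper that `compose_root` might call.

```python
import logging


class BridgeHandler(logging.Handler):
    """Placeholder for the real bridge Handler; emit is a no-op here."""

    def emit(self, record: logging.LogRecord) -> None:
        pass


def attach_bridge_once(logger: logging.Logger) -> BridgeHandler:
    """Attach a BridgeHandler unless one is already present, then return it."""
    for existing in logger.handlers:
        if isinstance(existing, BridgeHandler):
            return existing  # a second wire-up reuses the first handler
    handler = BridgeHandler(level=logging.WARNING)
    logger.addHandler(handler)
    return handler
```

Reusing the existing handler keeps the check cheap, but it also keeps whatever FDR client that handler was built with. If tests rebuild the client on each `compose_root` call, removing the old handler and attaching a fresh one may be the better variant.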
## Non-Functional Requirements
**Performance**
- The bridge adds ≤ 0.05 ms p99 latency on top of the formatter's 0.2 ms budget (i.e. `logger.error` → bridge enqueue total p99 ≤ 0.25 ms on Tier-2).
**Reliability**
- A failure to enqueue (queue full + drop-oldest already saturated) MUST NOT raise into the caller; it MUST emit a single internal `WARN` record once every N occurrences, where N is at least 1000. That record goes to stdout only; recursion into the bridge is short-circuited by a thread-local flag.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Emit a WARN through the shared logger with the bridge installed | Stub FDR queue receives one record with `kind="log"`, `level="WARN"`, `component` matching origin |
| AC-2 | Inside an `except` block, call `logger.exception("boom")` | Stub FDR queue's record carries non-empty `exc` traceback string |
| AC-3 | Emit INFO and DEBUG records | Stub FDR queue receives zero records |
| AC-4 | Pre-fill stub FDR queue to drop-oldest threshold, then emit an ERROR | Caller returns under 0.5 ms wall clock; FDR client's drop counter increments |
| AC-5 | Call `compose_root` twice with the same config in a single process | Logger has exactly one bridge Handler attached after the second call |
## Constraints
- The bridge has a forward dependency on AZ-247 (FDR producer client + record schema). It cannot pass its own AC tests until AZ-247 is implemented; Step 4 cross-verification will record this temporal dependency in `_dependencies_table.md`.
- The bridge's record translation MUST consume only the public surface of `log_record_schema` v1.0.0 — no peeking into formatter internals.
## Risks & Mitigation
**Risk 1: Recursion via internal `WARN` on enqueue failure**
- *Risk*: The "queue full" internal WARN itself goes through the bridge, recurses, and corrupts the queue further.
- *Mitigation*: Thread-local "in-bridge" flag short-circuits any logging call originating from the bridge itself; verified by a unit test that fills the queue and asserts no infinite loop.
**Risk 2: Forward dependency on AZ-247 contract not yet written**
- *Risk*: The FDR record schema is described in epic AZ-247's text but not yet a contract file; this task's expectations may drift from AZ-247's eventual contract.
- *Mitigation*: AZ-247's first PBI MUST publish `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md` before AZ-247's other PBIs; this task's implementation begins only after that contract exists. Step 4 cross-verification flags the temporal dependency.