FDR Log Bridge (ERROR + WARN forwarding)
Task: AZ-267_fdr_log_bridge
Name: FDR Log Bridge
Description: Subscribe a logging Handler to the shared logger that forwards every ERROR and WARN record into the Flight Data Recorder via the FDR producer client, tagged kind="log" so post-flight tooling can correlate log events with the rest of the recorded telemetry.
Complexity: 2 points
Dependencies: AZ-266_log_module, AZ-247 (forward dependency: the FDR producer client and record schema are not yet decomposed; this task's contract surface is satisfied once AZ-247 publishes its record schema contract)
Component: shared.logging (cross-cutting; epic AZ-245 / E-CC-LOG)
Tracker: AZ-267
Epic: AZ-245 (E-CC-LOG)
Document Dependencies
- _docs/02_document/contracts/shared_logging/log_record_schema.md: log envelope this bridge consumes (produced by AZ-266).
- _docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md: FDR record schema this bridge writes into (produced by AZ-247; the document does not yet exist, and Step 4 cross-verification will catch the forward reference).
Problem
The acceptance criterion "ERROR + WARN records appear in FDR with kind="log" and a back-reference to the originating component" requires a bridge between the shared Python logging machinery and the FDR producer client. Without this bridge, post-flight tools cannot correlate a c5_state ERROR log with the surrounding telemetry frames captured at the same flight time.
Outcome
- Every emitted log record at level WARN or ERROR is enqueued into the FDR producer queue with kind="log" and the originating component slug preserved.
- INFO and DEBUG records are NEVER enqueued into FDR (verified by the contract test in PBI #3 of this epic).
- The bridge never blocks the calling thread: it relies on the FDR producer client's drop-oldest semantics, so a saturated queue cannot stall a logger.error(...) call on the hot path.
Scope
Included
- A logging Handler subclass installed on the root onboard logger (or on each get_logger(...) instance, whichever the AZ-266 implementation chose) that subscribes to records at WARN and ERROR.
- Translation logic from LogRecord (per log_record_schema v1.0.0) into the FDR record envelope expected by the FDR producer client, with kind="log" and a component back-reference.
- Wire-up in the composition root (consumed from AZ-246 / E-CC-CONF) so the bridge is attached exactly once, after the logger and the FDR client are both initialised.
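A minimal sketch of the Handler subclass and its translation step follows. It rests on several assumptions. The FDR client method name enqueue and the envelope field names (kind, level, component, msg, ts, exc) are hypothetical until AZ-247 publishes fdr_record_schema.md. The component slug is also assumed to travel on the LogRecord as a component attribute, falling back to the logger name. AZ-266's log_record_schema defines the real field.

```python
import logging
from typing import Any, Protocol


class FdrClient(Protocol):
    """Hypothetical FDR producer client surface; AZ-247 has not published it yet."""

    def enqueue(self, record: dict[str, Any]) -> None: ...


# Short level names used by the acceptance criteria; only these are forwarded.
_FORWARDED_LEVELS = {logging.WARNING: "WARN", logging.ERROR: "ERROR"}


def to_fdr_record(record: logging.LogRecord) -> dict[str, Any]:
    """Translate a LogRecord into a hypothetical FDR envelope tagged kind="log"."""
    exc = None
    # logger.exception() outside an except block leaves exc_info == (None, None, None).
    if record.exc_info and record.exc_info[0] is not None:
        exc = logging.Formatter().formatException(record.exc_info)
    return {
        "kind": "log",
        "level": _FORWARDED_LEVELS[record.levelno],
        # Assumed attribute name; falls back to the logger name when absent.
        "component": getattr(record, "component", record.name),
        "msg": record.getMessage(),
        "ts": record.created,
        "exc": exc,
    }


class FdrLogBridge(logging.Handler):
    """Forwards WARN and ERROR records from the shared logger into the FDR client."""

    def __init__(self, client: FdrClient) -> None:
        # Handler level WARNING: the stdlib never calls emit() for INFO or DEBUG.
        super().__init__(level=logging.WARNING)
        self._client = client

    def emit(self, record: logging.LogRecord) -> None:
        if record.levelno not in _FORWARDED_LEVELS:
            return  # e.g. CRITICAL: the spec names only WARN and ERROR
        try:
            self._client.enqueue(to_fdr_record(record))
        except Exception:
            # Never propagate into logger.error(...) callers; handleError reports
            # the failure to stderr (stdlib behaviour) without re-raising.
            self.handleError(record)
```

The traceback is formatted inside the bridge rather than read from a formatter's cached output. This keeps the translation independent of formatter internals, in line with the Constraints below. The spec does not say whether CRITICAL records should also be forwarded, so this sketch leaves them out.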
Excluded
- The FDR producer client itself — owned by AZ-247 / E-CC-FDR-CLIENT.
- The on-disk FDR segment writer thread — owned by AZ-248 / E-C13.
- The contract test that verifies "DEBUG + INFO never reach FDR" — owned by PBI #3 of this epic (next task).
- Per-component log call sites — owned by each component epic.
Acceptance Criteria
AC-1: WARN records reach FDR
Given the bridge is installed and the FDR client's queue is below capacity
When any component emits logger.warning(...) via the shared logger
Then a single FDR record with kind="log", level="WARN", and component=<originating component slug> is enqueued
AC-2: ERROR records reach FDR with traceback when applicable
Given the bridge is installed
When a component emits logger.exception(...) from inside an except clause
Then the enqueued FDR record's exc field carries the formatted traceback string from the LogRecord
AC-3: INFO and DEBUG never reach FDR
Given the bridge is installed
When any component emits logger.info(...) or logger.debug(...)
Then no FDR record is enqueued for that log call (verified by both unit tests here and the contract test in the next task)
AC-4: Backpressure is non-blocking
Given the FDR producer queue is at its drop-oldest threshold
When a component emits logger.error(...) on the hot path
Then the call returns within the same latency budget as a stdout-only WARN call (no blocking on the queue), and the FDR client's existing drop counter is incremented
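AC-4 leans on the FDR client's existing drop-oldest behaviour. The sketch below makes those semantics concrete with a hypothetical stand-in, not AZ-247's client. A collections.deque created with maxlen never waits for a consumer to free space: appending to a full deque discards the oldest entry. The only extra work on the hot path is therefore bumping a drop counter, and a short lock keeps that counter exact when several threads log at once.

```python
import threading
from collections import deque
from typing import Any


class DropOldestQueue:
    """Hypothetical stand-in for the FDR producer queue's drop-oldest semantics."""

    def __init__(self, capacity: int) -> None:
        self._items = deque(maxlen=capacity)
        self._lock = threading.Lock()
        self.dropped = 0  # mirrors the FDR client's drop counter

    def enqueue(self, item: Any) -> None:
        """Never waits for space: when full, the oldest item is evicted and counted."""
        with self._lock:
            if len(self._items) == self._items.maxlen:
                self.dropped += 1
            self._items.append(item)  # deque(maxlen=...) evicts from the left

    def drain(self) -> list[Any]:
        """Remove and return everything currently queued, oldest first."""
        with self._lock:
            items = list(self._items)
            self._items.clear()
            return items
```

You can reproduce the AC-4 setup by pre-filling this stand-in to capacity and then logging one ERROR through a bridge that enqueues into it. The call returns without waiting, and dropped increases by one.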
AC-5: Single attachment
Given compose_root(config) runs at process start
When the bridge wire-up is invoked
Then exactly one bridge Handler is attached to the logger; reinitialising the composition root in tests does not stack duplicates
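One way to satisfy AC-5 is sketched below. It assumes the wire-up is a plain function called from compose_root, and the name attach_fdr_bridge is hypothetical. The function removes any previously attached bridge, identified by class, before adding the new one. Repeated composition-root runs in tests therefore converge on exactly one handler, and that handler points at the most recently initialised FDR client.

```python
import logging


class FdrLogBridge(logging.Handler):
    """Placeholder for the bridge handler; only its type matters here."""

    def __init__(self, client: object) -> None:
        super().__init__(level=logging.WARNING)
        self.client = client

    def emit(self, record: logging.LogRecord) -> None:
        pass  # translation and enqueue omitted in this sketch


def attach_fdr_bridge(logger: logging.Logger, client: object) -> FdrLogBridge:
    """Attach exactly one bridge, replacing any bridge from an earlier call."""
    for handler in list(logger.handlers):  # copy: the list is mutated in the loop
        if isinstance(handler, FdrLogBridge):
            logger.removeHandler(handler)
            handler.close()
    bridge = FdrLogBridge(client)
    logger.addHandler(bridge)
    return bridge
```

Replacing the old bridge, rather than keeping it and skipping the new one, avoids a stale reference to an FDR client from a previous test run.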
Non-Functional Requirements
Performance
- The bridge adds at most 0.05 ms of p99 latency on top of the formatter's 0.2 ms budget (i.e. logger.error → bridge enqueue total p99 ≤ 0.25 ms on Tier-2).
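The budget arithmetic is 0.2 ms (formatter) + 0.05 ms (bridge) = 0.25 ms total p99. The hypothetical helpers below sketch how a Tier-2 benchmark might check it. They time individual calls with time.perf_counter_ns and report a nearest-rank p99 in milliseconds. Measuring logger.error(...) with and without the bridge attached gives the two figures to compare. Percentiles are not additive, though, so the difference between two p99 values only approximates the bridge's own contribution.

```python
import time
from typing import Callable

FORMATTER_P99_MS = 0.2
BRIDGE_P99_MS = 0.05
TOTAL_P99_MS = FORMATTER_P99_MS + BRIDGE_P99_MS  # 0.25 ms end-to-end budget


def p99_ms(samples_ns: list[int]) -> float:
    """Nearest-rank 99th percentile of nanosecond samples, in milliseconds."""
    if not samples_ns:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ns)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n), 1-based
    return ordered[rank - 1] / 1e6


def time_calls(fn: Callable[[], None], n: int = 10_000) -> list[int]:
    """Duration of each of n calls to fn, in nanoseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return samples
```

To use it, pass lambda: logger.error("probe") to time_calls twice, once with the bridge attached and once without. Then compare both p99_ms results against the budgets.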
Reliability
- A failure to enqueue (queue full and drop-oldest already saturated) MUST NOT raise into the caller. Instead, the bridge MUST log a single internal WARN record once every N failures, where N is at least 1000. That record goes to stdout only; recursion into the bridge is short-circuited by a thread-local flag.
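The sketch below covers that failure path and the Risk 1 mitigation. It assumes the FDR client is exposed as a plain enqueue callable that raises when the queue is saturated. Both that callable and the class name GuardedBridge are hypothetical. A threading.local flag marks code running inside the bridge, so any re-entrant emit() returns immediately. A lock-protected counter prints one stdout WARN line on the first failure and then once per N failures, with N = 1000 here.

```python
import logging
import threading
from typing import Any, Callable


class GuardedBridge(logging.Handler):
    """Failure path of the bridge: never raises, never recurses, warns once per N failures."""

    WARN_EVERY = 1000  # N; the Reliability NFR requires N >= 1000

    def __init__(self, enqueue: Callable[[dict[str, Any]], None]) -> None:
        super().__init__(level=logging.WARNING)
        self._enqueue = enqueue
        self._in_bridge = threading.local()  # per-thread "in-bridge" flag
        self._count_lock = threading.Lock()
        self.failures = 0

    def emit(self, record: logging.LogRecord) -> None:
        if getattr(self._in_bridge, "active", False):
            return  # logging call originating inside the bridge: short-circuit
        self._in_bridge.active = True
        try:
            self._enqueue({"kind": "log", "msg": record.getMessage()})
        except Exception:
            with self._count_lock:
                self.failures += 1
                count = self.failures
            if (count - 1) % self.WARN_EVERY == 0:
                # stdout only, so this warning can never re-enter the bridge
                print(f"WARN fdr_log_bridge: FDR enqueue failed ({count} total)")
        finally:
            self._in_bridge.active = False
```

The early return happens before the try block, so a nested call never reaches the finally clause and cannot clear the outer call's flag.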
Unit Tests
| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | Emit a WARN through the shared logger with the bridge installed | Stub FDR queue receives one record with kind="log", level="WARN", component matching origin |
| AC-2 | Inside an except block, call logger.exception("boom") | Stub FDR queue's record carries a non-empty exc traceback string |
| AC-3 | Emit INFO and DEBUG records | Stub FDR queue receives zero records |
| AC-4 | Pre-fill stub FDR queue to drop-oldest threshold, then emit an ERROR | Caller returns under 0.5 ms wall clock; FDR client's drop counter increments |
| AC-5 | Call compose_root twice with the same config in a single process | Logger has exactly one bridge Handler attached after the second call |
Constraints
- The bridge has a forward dependency on AZ-247 (FDR producer client + record schema). It cannot pass its own AC tests until AZ-247 is implemented; Step 4 cross-verification will record this temporal dependency in _dependencies_table.md.
- The bridge's record translation MUST consume only the public surface of log_record_schema v1.0.0, with no peeking into formatter internals.
Risks & Mitigation
Risk 1: Recursion via internal WARN on enqueue failure
- Risk: The "queue full" internal WARN itself goes through the bridge, recurses, and corrupts the queue further.
- Mitigation: Thread-local "in-bridge" flag short-circuits any logging call originating from the bridge itself; verified by a unit test that fills the queue and asserts no infinite loop.
Risk 2: Forward dependency on AZ-247 contract not yet written
- Risk: The FDR record schema is described in epic AZ-247's text but not yet a contract file; this task's expectations may drift from AZ-247's eventual contract.
- Mitigation: AZ-247's first PBI MUST publish _docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md before AZ-247's other PBIs; this task's implementation begins only after that contract exists. Step 4 cross-verification flags the temporal dependency.