Skip to content

Add streaming task log support to BaseExecutor and FileTaskHandler#69299

Open
jason810496 wants to merge 1 commit into
apache:mainfrom
jason810496:refactor/logging/add-stream-method-for-base-executor
Open

Add streaming task log support to BaseExecutor and FileTaskHandler#69299
jason810496 wants to merge 1 commit into
apache:mainfrom
jason810496:refactor/logging/add-stream-method-for-base-executor

Conversation

@jason810496

@jason810496 jason810496 commented Jul 3, 2026

Copy link
Copy Markdown
Member

part of the streaming task log series

  1. Merge this first - the other two build on it.
  2. Add streaming task log support to KubernetesExecutor #69300
  3. Add running_pod_log_lines config option to KubernetesExecutor #69301

Why

When the API server reads logs for a RUNNING task, the executor's get_task_log returns the whole log fully materialized, so it sits on the heap at once before the bounded LogStreamAccumulator downstream (5000 messages resident, rest spilled to disk) can do its job. Large logs spike the API server's anonymous heap and can OOM it.

What

  • Add get_streaming_task_log and a supports_streaming_logs class attribute (default False) to BaseExecutor.
  • FileTaskHandler._read prefers the streaming method when the executor advertises supports_streaming_logs, and falls back to the legacy get_task_log otherwise, so provider and custom executors that haven't implemented it keep working unchanged.
  • No executor in this PR advertises support yet; the KubernetesExecutor family implementation is the follow-up PR.

Benchmark

A/B measurement of API server memory serving the same ~415 MB ndjson log (1M lines) of a RUNNING KubernetesExecutor task through GET .../logs/{try_number} with Accept: application/x-ndjson, sampling the API server cgroup. A = materializing read (before this series), B = streaming read (this PR + the KubernetesExecutor follow-up).

Metric (API server cgroup) A: materializing B: streaming
Peak anonymous heap growth +2093.9 MiB +179.9 MiB (~11.6x lower)
Peak RSS (memory.current) 2964.4 MiB 1193.4 MiB
Elapsed 33.2 s 19.4 s

Without streaming the full log lives on the heap at once (~2.1 GiB anonymous, non-reclaimable memory, which is what OOMs the API server). With streaming the executor yields into LogStreamAccumulator; B's remaining RSS growth is mostly reclaimable page cache from the accumulator's disk spill.

Interpretation caveat: KubernetesExecutor.RUNNING_POD_LOG_LINES = 100 normally caps the executor read to the last 100 lines, which would hide the difference; the benchmark lifted the cap to 100,000,000 in both builds. Production keeps the cap at 100, so this measures the mechanism's headroom for executors returning large logs, not a shipped memory reduction today (the running_pod_log_lines config PR is what makes the cap tunable). Single run per build, and only the ndjson path streams end to end (Accept: application/json buffers the whole response regardless).


Was generative AI tooling used to co-author this PR?

Reading a running task's log through an executor materializes the whole
log in the API server before the bounded LogStreamAccumulator can bound
memory, so large logs spike the API server heap. This adds an interface
executors can implement to stream log lines lazily instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API area:Executors-core LocalExecutor & SequentialExecutor area:logging

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant