Skip to content

feat: add stderr heartbeat to diagnose stdout pipe blocking#953

Draft
devin-ai-integration[bot] wants to merge 12 commits intomainfrom
devin/1773412868-nonblocking-stdout
Draft

feat: add stderr heartbeat to diagnose stdout pipe blocking#953
devin-ai-integration[bot] wants to merge 12 commits intomainfrom
devin/1773412868-nonblocking-stdout

Conversation

@devin-ai-integration
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot commented Mar 13, 2026

feat: add stderr heartbeat to diagnose stdout pipe blocking

Summary

Adds a lightweight background heartbeat thread to launch() that writes periodic diagnostic status to stderr (fd 2) every 30 seconds. This is a diagnostic-only change — the actual stdout write path (print()) is unchanged.

Purpose: In Airbyte Cloud, the orchestrator controls reading from the source container's stdout pipe. When it pauses reading, print() blocks in the kernel. We need to prove whether the platform ever resumes reading or permanently stops. Since stderr is collected independently by the Kubernetes container runtime, heartbeat lines will appear in logs even when stdout is blocked.

Heartbeat output:

STDOUT_HEARTBEAT: t=30s msgs=1042 bytes=5242880 print_blocked=NO
STDOUT_HEARTBEAT: t=60s msgs=1042 bytes=5242880 print_blocked=YES blocked_since=28s

What changed:

  • Added a daemon thread (stdout-heartbeat) that writes to fd 2 every 30s
  • Wrapped the existing print() loop with print_blocked / messages_written / bytes_written tracking
  • Added try/finally to signal the heartbeat thread to stop on exit
  • No changes to the stdout write path itself — still uses print() with PRINT_BUFFER

Updates since last revision

The PR was simplified significantly. The previous implementation included a dedicated writer thread, bounded write queue, O_NONBLOCK + select(), watchdog timeout, and capsys detection. All of that has been removed. Only the stderr heartbeat diagnostic remains, as the previous approach caused OOM kills in Cloud testing.

Review & Testing Checklist for Human

  • Thread safety of shared state: print_blocked, messages_written, and bytes_written are read by the heartbeat thread without locks. This is safe under CPython's GIL for simple type reads/writes, but worth confirming no subtle issues arise.
  • End-to-end test in Cloud: Deploy a connector prerelease pinned to this CDK version against a high-throughput source (e.g., Google Ads) in the sandbox workspace. Verify STDOUT_HEARTBEAT lines appear in stderr logs, and that print_blocked=YES with increasing blocked_since is visible when the sync hangs.
  • No performance regression: The heartbeat writes once per 30s to stderr — negligible overhead, but confirm no unexpected impact on high-throughput syncs.

Notes

Requested by @AnatoliiYatsuk.

Related issues:

Devin session

…sure

When the Airbyte platform pauses reading from the source container's
stdout pipe, the main thread's print() call blocks in an OS-level
write() syscall. This stalls the record queue consumer, filling the
bounded queue and blocking all worker threads — a complete deadlock.

This change replaces blocking print() with non-blocking os.write()
using select() to wait for the pipe to become writable. The main
thread stays in a Python-level retry loop instead of getting stuck
in a kernel syscall. When the platform resumes reading, select()
returns, the write completes, and the pipeline resumes automatically.

Key properties:
- Memory stays bounded (queue maxsize=10,000 unchanged)
- No deadlock (main thread never stuck in blocking syscall)
- Automatic recovery when platform resumes reading
- 600s watchdog raises RuntimeError if pipe stays blocked

Co-Authored-By: unknown <>
@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1773412868-nonblocking-stdout#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1773412868-nonblocking-stdout

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

@github-actions
Copy link

github-actions bot commented Mar 13, 2026

PyTest Results (Fast)

3 934 tests  ±0   3 923 ✅ +1   7m 37s ⏱️ +28s
    1 suites ±0      11 💤  - 1 
    1 files   ±0       0 ❌ ±0 

Results for commit 3a41aaa. ± Comparison against base commit 0e57414.

♻️ This comment has been updated with latest results.

…nd log restore failures

Co-Authored-By: unknown <>
@github-actions
Copy link

github-actions bot commented Mar 13, 2026

PyTest Results (Full)

3 937 tests  ±0   3 925 ✅ ±0   11m 13s ⏱️ -1s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 3a41aaa. ± Comparison against base commit 0e57414.

♻️ This comment has been updated with latest results.

…global BlockingIOError

Setting os.set_blocking(fd, False) is a process-wide change that causes
BlockingIOError in other threads (logging via print_buffer, worker threads).
Instead, use a dedicated stdout-writer thread that does blocking os.write()
calls. If the pipe is full, only the writer thread stalls - the main thread
continues draining the record queue.

Co-Authored-By: unknown <>
…ise KeyboardInterrupt/SystemExit

Co-Authored-By: unknown <>
…ite timing

Logs when os.write() blocks for >5s (indicates platform paused reading),
and logs every 30s when write_queue is full. This will help validate
whether the platform ever resumes reading from the pipe after a pause.

Co-Authored-By: unknown <>
…ding from stdout pipe

Co-Authored-By: unknown <>
Co-Authored-By: unknown <>
@devin-ai-integration devin-ai-integration bot changed the title feat: non-blocking stdout writes to prevent deadlock on pipe backpressure feat: add stderr heartbeat to diagnose stdout pipe blocking Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant