Skip to content

Conversation

@liaoxin01
Copy link
Contributor

pick from:#60284

@liaoxin01 liaoxin01 requested a review from yiguolei as a code owner January 27, 2026 14:46
Copilot AI review requested due to automatic review settings January 27, 2026 14:46
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01
Copy link
Contributor Author

run buildall

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a deadlock issue in load_stream by ensuring flush tasks are properly waited for before TabletStream destruction. The fix reverts the use of shared_from_this() and instead relies on explicit lifetime management through the IndexStream destructor.

Changes:

  • Removed std::enable_shared_from_this inheritance from TabletStream and all shared_from_this() usages
  • Added wait_for_flush_tasks() method to TabletStream to safely wait for all flush tasks before destruction
  • Added IndexStream destructor to ensure all TabletStreams wait for flush tasks before cleanup

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
be/src/runtime/load_stream.h Removed shared_from_this inheritance, added wait_for_flush_tasks() declaration, added IndexStream destructor, added _flush_tasks_done flag
be/src/runtime/load_stream.cpp Replaced shared_from_this with this in lambdas, implemented wait_for_flush_tasks() and IndexStream destructor, refactored pre_close() to use wait_for_flush_tasks()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@liaoxin01 liaoxin01 force-pushed the fix-load-stream-flush-token-deadlock branch from 1529b65 to ed0add4 Compare January 27, 2026 15:08
@liaoxin01
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 77.78% (28/36) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.72% (18902/35852)
Line Coverage 35.84% (175554/489826)
Region Coverage 32.43% (135628/418238)
Branch Coverage 33.33% (58864/176591)

@liaoxin01 liaoxin01 force-pushed the fix-load-stream-flush-token-deadlock branch from ed0add4 to 3e507a7 Compare January 28, 2026 01:36
…h_tasks is called before destruction

Problem:
When TabletStream is destroyed without pre_close() being called (e.g., on_idle_timeout scenario),
the _flush_token destructor calls shutdown() which triggers deadlock detection if called from
the pool thread.

Root cause:
- on_idle_timeout() directly calls brpc::StreamClose() without calling LoadStream::close()
- This triggers the destruction chain without calling pre_close() on TabletStreams
- If flush tasks are still running, TabletStream may be destroyed in pool thread

Solution:
- Add IndexStream::~IndexStream() to ensure wait_for_flush_tasks() is called on all TabletStreams
- Add TabletStream::wait_for_flush_tasks() to wait for all flush tasks to complete
- This ensures _flush_token is properly handled before TabletStream destruction
- Revert commit c18ef17 (shared_from_this) as it is no longer needed
@liaoxin01 liaoxin01 force-pushed the fix-load-stream-flush-token-deadlock branch from 3e507a7 to 2a1ef37 Compare January 28, 2026 01:37
@liaoxin01
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 78.38% (29/37) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.71% (18899/35852)
Line Coverage 35.82% (175477/489826)
Region Coverage 32.48% (135823/418238)
Branch Coverage 33.33% (58860/176591)

@yiguolei
Copy link
Contributor

skip buildall

@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Jan 28, 2026
@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 947f495 into apache:branch-4.0 Jan 28, 2026
27 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants