-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Wrap immutable plan parts into Arc (make creating ExecutionPlans less costly)
#19893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b9caf1d to
bb97763
Compare
|
run benchmark reset_plan_states |
|
🤖 Hi @askalt, thanks for the request (#19893 (comment)). |
|
@xudong963 could you please |
|
run benchmark reset_plan_states |
|
🤖 |
|
🤖: Benchmark completed Details
|
70a396a to
3614687
Compare
3614687 to
bb97763
Compare
bb97763 to
f2d829a
Compare
20b3042 to
e6fed67
Compare
|
I decided to simplify |
1e27378 to
2c0746c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry this is taking so long, but I a very worried about API churn and it takes me a non trivial amount of time to review the potential implications of this change
I have two major points:
- I really think it is important to not force downstream users to change how they call
project_schemaas it is a very common operation and there is no technical reason we can't keep their code the same (though it is harder in Datafusion). I made a PR to do so (and backout many of the changes currently required in this PR): askalt#5 as suggested by @crepererum - As I mentioned previously
OptionProjectionRefseems overly complicated to me to add to the public API -- and because theOptionis wrapped, it means that the normal functions forOption(like match and .and_then, etc) don't work. I think it would be simpler, and follow the rest of the code more closely if it wereOption<ProjectionRef>and the special projection application code is put into ProjectionRef
Thank you for your work on this @askalt
Ok, I will do it. |
2c0746c to
6053a7c
Compare
6053a7c to
5d11e60
Compare
@alamb, done as suggested, please check
|
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @askalt - this looks really nice to me now. I think the code is faster and easier to work with 🏆 (HashJoinExecBuilder, NestedLoopJoinExecBuilder are especially nice)
I'll kick off a final benchmark run but I think this should be good to merge tomorrow
| hash_join.null_aware, | ||
| )?))) | ||
| Ok(Some(Arc::new( | ||
| HashJoinExecBuilder::from(hash_join) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
| @@ -1370,9 +1377,9 @@ impl ExecutionPlan for AggregateExec { | |||
| ) -> Result<Arc<dyn ExecutionPlan>> { | |||
| let mut me = AggregateExec::try_new_with_schema( | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow on PR, I think we can probably avoid a lot of redundant work here -- try_new_with_schema does many things, including re-compute plan properties
| } | ||
| } | ||
|
|
||
| /// Helps to build [`HashJoinExec`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice
|
run benchmark sql_planner |
1 similar comment
|
run benchmark sql_planner |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
- Closes apache#19852 Improve performance of query planning and plan state re-set by making node clone cheap. - Store projection as `Option<Arc<[usize]>>` instead of `Option<Vec<usize>>` in `FilterExec`, `HashJoinExec`, `NestedLoopJoinExec`. - Store exprs as `Arc<[ProjectionExpr]>` instead of Vec in `ProjectionExprs`. - Store arced aggregation, filter, group by expressions within `AggregateExec`.
5d11e60 to
a4bbb71
Compare
ExecutionPlans less costly)
| can_project(&self.schema(), projection.as_deref())?; | ||
| let projection = | ||
| combine_projections(projection.as_ref(), self.projection.as_ref())?; | ||
| HashJoinExecBuilder::from(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this builder pattern a bit risky. Currently, HashJoinExecBuilder::from(self) clones some fields from self while implicitly resetting others (for example, the dynamic filter).
This behavior may be surprising for certain transformations. It might be safer to make such partial resets explicit, so callers are clearly aware of which fields are preserved and which are dropped.
Would it be possible to remove these builders and require all cloning to be explicit instead? Since immutable fields are already wrapped in Arc, planning performance should not be affected. The main downside I see is increased verbosity.
Which issue does this PR close?
Arc#19852Rationale for this change
Improve performance of query planning and plan state re-set by making node clone cheap.
What changes are included in this PR?
Option<Arc<[usize]>>instead ofOption<Vec<usize>>inFilterExec,HashJoinExec,NestedLoopJoinExec.Arc<[ProjectionExpr]>instead of Vec inProjectionExprs.AggregateExec.