A thread on Zulip mentioned that NCCL calls made inside a Legion task need some additional care. Rohan spotted a problem in FlexFlow's use of ncclAllReduce:

flexflow-train/lib/kernels/src/cuda/optimizer_kernel.cu, line 83 in de7fa32:

checkNCCL(ncclAllReduce(w_grad_ptr,

You may need to add concurrent_task_barrier before and after the call, and call set_concurrent_barrier on the task. More information is in the comment on that barrier.
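The suggested fix puts the collective between two barriers shared by all concurrently running point tasks. The sketch below is a CPU-only analogy of that pattern, not the Legion or NCCL API. Threads stand in for point tasks, and the hand-written `Barrier` class stands in for `concurrent_task_barrier`. `sum_slots`, `point_task` and `run_demo` are hypothetical names invented for this illustration, and `sum_slots` is a toy shared-memory all-reduce.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier for a fixed number of participants.
// Hypothetical CPU-only stand-in for a per-launch task barrier; NOT Legion's API.
class Barrier {
 public:
  explicit Barrier(std::size_t count) : count_(count) {}

  void arrive_and_wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t gen = generation_;
    if (++waiting_ == count_) {
      // Last arrival releases everyone and starts a new generation,
      // so the same barrier can be reused on the next call.
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return gen != generation_; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t count_;
  std::size_t waiting_ = 0;
  std::size_t generation_ = 0;
};

// Hypothetical toy "all-reduce": sums the value every rank has staged.
double sum_slots(const std::vector<double>& slots) {
  double total = 0.0;
  for (double v : slots) total += v;
  return total;
}

// One hypothetical "point task": stage a local value, then run the collective
// between two barriers, mirroring "barrier before and after the call".
void point_task(std::size_t rank, int iterations, Barrier& barrier,
                std::vector<double>& slots, std::vector<double>& results) {
  for (int it = 0; it < iterations; ++it) {
    // Stand-in for a locally computed gradient.
    slots[rank] = static_cast<double>(rank + 1) * (it + 1);
    // Before: no rank reads until every rank has staged its input.
    barrier.arrive_and_wait();
    const double total = sum_slots(slots);
    // After: no rank overwrites its slot (next iteration) until every
    // rank has finished reading this iteration's values.
    barrier.arrive_and_wait();
    results[rank * static_cast<std::size_t>(iterations) +
            static_cast<std::size_t>(it)] = total;
  }
}

// Launches num_ranks concurrent "point tasks" and returns every rank's
// all-reduce result, laid out as results[rank * iterations + it].
std::vector<double> run_demo(std::size_t num_ranks, int iterations) {
  assert(num_ranks > 0 && iterations >= 0);
  Barrier barrier(num_ranks);
  std::vector<double> slots(num_ranks, 0.0);
  std::vector<double> results(num_ranks * static_cast<std::size_t>(iterations), 0.0);
  std::vector<std::thread> tasks;
  for (std::size_t r = 0; r < num_ranks; ++r) {
    tasks.emplace_back(point_task, r, iterations, std::ref(barrier),
                       std::ref(slots), std::ref(results));
  }
  for (std::thread& t : tasks) t.join();
  return results;
}
```

Each barrier guards a different hazard. The first keeps any rank from reading before every input is staged. The second keeps a fast rank from overwriting shared state while slower ranks are still inside the collective. With 4 ranks, every rank sees 10, 20 and 30 for iterations 0, 1 and 2. In the FlexFlow case, the analogous calls would go around `ncclAllReduce`. The real signatures of `concurrent_task_barrier` and `set_concurrent_barrier` should be taken from Legion's headers, not from this sketch.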