Migrate OMP use to Aten/Parallel.h (#5947) by q10 · Pull Request #5947 · pytorch/FBGEMM

q10 · 2026-06-23T18:59:15Z

Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2862

This is the only direct use of OpenMP (omp) in the library. I believe that after migrating to ATen/Parallel.h the code will still use OpenMP under the hood whenever PyTorch is built with AT_PARALLEL_OPENMP set to 1, so this change lets fbgemm_gpu users pick their threading model consistently through their PyTorch build.
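For readers unfamiliar with the pattern, here is a minimal sketch of how a per-index OpenMP loop is typically reshaped for `at::parallel_for`, whose callback receives a half-open sub-range rather than a single index. To keep the sketch compilable without PyTorch, `parallel_for_sketch` is a hypothetical, sequential stand-in that only mirrors the call shape `(begin, end, grain_size, f)`; `row_lengths` is likewise an illustrative function and is not taken from this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in mirroring the call shape of at::parallel_for:
// f is invoked with half-open sub-ranges [chunk_begin, chunk_end).
// Assumption: it runs the chunks sequentially and uses grain_size as the
// chunk length, whereas ATen treats grain_size as a minimum amount of work
// per task and may dispatch chunks to its thread pool.
template <class F>
void parallel_for_sketch(int64_t begin, int64_t end, int64_t grain_size, const F& f) {
  assert(grain_size > 0);
  for (int64_t b = begin; b < end; b += grain_size) {
    f(b, std::min(end, b + grain_size));
  }
}

// OpenMP style (before):
//   #pragma omp parallel for
//   for (int64_t i = 0; i < n; ++i) { lengths[i] = offsets[i + 1] - offsets[i]; }
//
// Range-callback style (after): the loop over i moves inside the lambda.
std::vector<int64_t> row_lengths(const std::vector<int64_t>& offsets) {
  const int64_t n = static_cast<int64_t>(offsets.size()) - 1;
  std::vector<int64_t> lengths(n > 0 ? n : 0);
  parallel_for_sketch(0, n, /*grain_size=*/2,
                      [&](int64_t chunk_begin, int64_t chunk_end) {
    for (int64_t i = chunk_begin; i < chunk_end; ++i) {
      // Each i writes a distinct slot, so chunks never touch the same element.
      lengths[i] = offsets[i + 1] - offsets[i];
    }
  });
  return lengths;
}
```

With real ATen, the call site would use `at::parallel_for(0, n, grain_size, ...)` from `ATen/Parallel.h` instead of the stand-in; the body of the lambda stays the same as long as iterations are independent.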

Test Plan:
Ran the CPU unit test that directly exercises the rewritten csr2csc_template_ (the function this diff migrates from #pragma omp to at::parallel_for), covering both index_t instantiations:

buck2 test fbcode//mode/opt fbcode//deeplearning/fbgemm/fbgemm_gpu/test/tbe:cpu_kernel

Result: Pass 2. Fail 0. (CpuKernelTest.csr2csc_test_int32, CpuKernelTest.csr2csc_test_int64)
Test session: https://www.internalfb.com/intern/testinfra/testrun/26458647838875303
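As background on what the two test cases vary, the following is a textbook CSR-to-CSC conversion templated on the index type, checked with both `int32_t` and `int64_t`. It is only a generic sketch: the names `Csc` and `csr_to_csc_sketch` are hypothetical, and FBGEMM's `csr2csc_template_` is not reproduced here and may differ substantially in inputs and layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the column-major result.
template <typename index_t>
struct Csc {
  std::vector<index_t> col_ptr;  // size num_cols + 1
  std::vector<index_t> row_idx;  // size nnz
};

// Generic CSR -> CSC conversion (count, prefix-sum, scatter).
// Assumption: col_idx values lie in [0, num_cols).
template <typename index_t>
Csc<index_t> csr_to_csc_sketch(const std::vector<index_t>& row_ptr,
                               const std::vector<index_t>& col_idx,
                               index_t num_cols) {
  assert(!row_ptr.empty());
  assert(static_cast<std::size_t>(row_ptr.back()) == col_idx.size());
  const index_t num_rows = static_cast<index_t>(row_ptr.size() - 1);

  Csc<index_t> out;
  out.col_ptr.assign(static_cast<std::size_t>(num_cols) + 1, 0);
  out.row_idx.resize(col_idx.size());

  // 1) Count the entries that fall in each column.
  for (index_t c : col_idx) {
    ++out.col_ptr[c + 1];
  }
  // 2) Prefix-sum the counts into column start offsets.
  for (index_t c = 0; c < num_cols; ++c) {
    out.col_ptr[c + 1] += out.col_ptr[c];
  }
  // 3) Scatter row ids into their column segments, in row order.
  std::vector<index_t> next(out.col_ptr.begin(), out.col_ptr.end() - 1);
  for (index_t r = 0; r < num_rows; ++r) {
    for (index_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
      out.row_idx[next[col_idx[k]]++] = r;
    }
  }
  return out;
}
```

Instantiating the same template with `int32_t` and `int64_t` is what makes it worth running one test per index type: both paths share logic but differ in width and conversion behavior.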

Higher-level TBE suites that exercise csr2csc indirectly:

buck2 test fbcode//mode/opt \
  fbcode//deeplearning/fbgemm/fbgemm_gpu/test/tbe:forward \
  fbcode//deeplearning/fbgemm/fbgemm_gpu/test/tbe:backward_sgd

Not exercised locally: the Python test runtime fails with "cannot execute binary file: Exec format error" (exit status 126) on this aarch64 devserver, including for the CPU-only cases (e.g. test_forward_cpu_fp32, test_backward_sgd_fp32_pmNONE_cpu). This is a host architecture/toolchain limitation, not a code issue. These suites should be run on an x86_64 GPU host or via Sandcastle CI.

Differential Revision: D109460231

Pulled By: q10

meta-codesync · 2026-06-23T18:59:33Z

@q10 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D109460231.


meta-codesync · 2026-06-24T22:32:31Z

@q10 merged this pull request in a632f25.

meta-cla Bot added the cla signed label Jun 23, 2026

meta-codesync Bot added the meta-exported label Jun 23, 2026

meta-codesync Bot changed the title ~~Migrate OMP use to Aten/Parallel.h (#5943)~~ Migrate OMP use to Aten/Parallel.h (#5947) Jun 23, 2026

q10 force-pushed the export-D109460231 branch from 6a3a3bd to 8c963a9 Compare June 23, 2026 23:20

q10 force-pushed the export-D109460231 branch from 8c963a9 to 08afaaf Compare June 23, 2026 23:27

meta-codesync Bot closed this in a632f25 Jun 24, 2026

meta-codesync Bot added the Merged label Jun 24, 2026

q10 wants to merge 1 commit into pytorch:main from q10:export-D109460231
