Adaptive Regularization for sparse embeddings — kernel + codegen by hdmeta · Pull Request #5949 · pytorch/FBGEMM

hdmeta · 2026-06-24T18:28:07Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2864

Backend (C++) half of the RMSprop-style Adaptive Regularization (AR) optimizer for sparse embeddings.

This diff adds the CUDA/CPU kernel template, codegen registration, and BUCK entries — the backend half of a kernel/frontend split. The frontend (Python TBE dispatch, EmbOptimType.ROWWISE_RMSPROP_AR, OptimizerArgs fields, tests) lands separately.

AR replaces static L2 weight decay with a lazy, staleness-aware shrink paid only when a row is touched: weight ← max(min_shrinkage, 1 − min(1, lr·ar_alpha·I))·weight − mult·grad, where I is the number of steps since the row was last updated and mult is the lr/(√v+eps) multiplier.
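As a rough illustration of the shrink term in the rule above, the following sketch computes only the factor max(min_shrinkage, 1 − min(1, lr·ar_alpha·I)). The function name and argument names are hypothetical and chosen for readability; this is not the kernel code from this diff.

```python
def ar_shrink_factor(lr, ar_alpha, steps_since_update, min_shrinkage=0.1):
    """Multiplicative shrink applied to a row's weight when it is touched.

    Hypothetical helper mirroring the formula in the PR summary:
    max(min_shrinkage, 1 - min(1, lr * ar_alpha * I)).
    """
    # Decay strength grows linearly with staleness and is clipped at 1.
    lam = min(1.0, lr * ar_alpha * steps_since_update)
    # The floor keeps at least `min_shrinkage` of the weight.
    return max(min_shrinkage, 1.0 - lam)


if __name__ == "__main__":
    # A recently updated row is barely shrunk; a very stale row hits the floor.
    print(ar_shrink_factor(lr=0.01, ar_alpha=0.5, steps_since_update=10))
    print(ar_shrink_factor(lr=0.01, ar_alpha=0.5, steps_since_update=1000))
```

With lr=0.01 and ar_alpha=0.5, a row last touched 10 steps ago keeps about 95% of its weight, while a row untouched for 1000 steps is clipped to the min_shrinkage floor.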

Design:

- RMSprop-style EMA accumulator: v ← ema_beta·v + (1−ema_beta)·mean(g²). Named ROWWISE_RMSPROP_AR (not ROWWISE_ADAGRAD_AR) since the accumulator is an EMA, not a cumulative Adagrad sum; this keeps the lr/(√v+eps) multiplier bounded on hot rows.
- Linear-clipped decay max(min_shrinkage, 1−λ), with λ = min(1, lr·ar_alpha·I). Default min_shrinkage=0.1 retains ≥10% of a cold row's weight; 0.0 enables a full soft-reset.
- ar_alpha=0 gives λ=0 and a shrink factor of 1, so the update reduces exactly to plain rowwise RMSprop (safe default).
- Per-row state: momentum1 (EMA of g²) + prev_iter, 2 floats/row.
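The design points above can be put together in a small per-row reference sketch. It is written in plain Python for readability and rests on several assumptions the summary does not confirm: that mult uses the freshly updated v, that I is computed as cur_iter − prev_iter, and that the shrink and the gradient step are applied in a single expression. The function and parameter names are hypothetical; the actual kernel template may order or fuse these steps differently.

```python
import math


def rowwise_rmsprop_ar_step(weight, grad, v, prev_iter, cur_iter,
                            lr, ema_beta, eps, ar_alpha, min_shrinkage=0.1):
    """One update for a single embedding row (hypothetical reference sketch).

    weight, grad: lists of floats for the row.
    v: the row's EMA of mean(g^2) (momentum1 in the summary).
    prev_iter: iteration at which the row was last updated.
    """
    # RMSprop-style EMA of the row-wise mean squared gradient.
    g2_mean = sum(g * g for g in grad) / len(grad)
    v_new = ema_beta * v + (1.0 - ema_beta) * g2_mean

    # Assumption: the multiplier is computed from the updated accumulator.
    mult = lr / (math.sqrt(v_new) + eps)

    # Assumption: staleness I is the iteration gap since the last update.
    interval = cur_iter - prev_iter
    lam = min(1.0, lr * ar_alpha * interval)
    shrink = max(min_shrinkage, 1.0 - lam)

    new_weight = [shrink * w - mult * g for w, g in zip(weight, grad)]
    # Per-row state returned to the caller: updated EMA and new prev_iter.
    return new_weight, v_new, cur_iter
```

Calling this with ar_alpha=0.0 yields weight − mult·grad regardless of how stale the row is, which matches the stated reduction to plain rowwise RMSprop.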

After landing: torch.ops.fbgemm.split_embedding_backward_codegen_rowwise_rmsprop_ar_* C++ ops exist; the Python-visible enum/args land with the frontend.

Reference: Li & Lyu, "Adaptive Regularization for Large-Scale Sparse Feature Embedding Models" (ICLR 2026), Algorithm 2 (appendix G).

Differential Revision: D105272960

The kernel/frontend split avoids `RuntimeError: No such operator` when the C++ op (`training_platform` fbpkg) and the Python frontend (`ads_dper3` fbpkg) ship at different cadences.

Reviewed By: spcyppt, skarakulak

meta-codesync · 2026-06-24T18:28:24Z

@hdmeta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105272960.

meta-cla Bot added the cla signed label Jun 24, 2026

meta-codesync Bot added the meta-exported label Jun 24, 2026

hdmeta wants to merge 1 commit into pytorch:main from hdmeta:export-D105272960
