Conversation
Greptile Overview
Greptile Summary
Adds the original GLU (Gated Linear Unit) activation with sigmoid gating to match the paper definition.
Key changes
Issues found
Confidence Score: 2/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Python as Python Layer<br/>(layernorm_mlp.py)
participant OpsAPI as Ops API<br/>(ops/basic/activation.py)
participant PyBind as PyBind<br/>(pybind.cpp)
participant CPP as C++ Wrapper<br/>(activation.cpp)
participant CUDA as CUDA Kernel<br/>(glu.cu)
User->>Python: LayerNormMLP(activation="glu")
Python->>Python: Check activation in supported list
Python->>Python: Set fc1_output_features = 2 * hidden_size
Note over User,CUDA: Forward Pass
User->>OpsAPI: GLU.forward(input)
OpsAPI->>PyBind: tex.glu(input, quantizer)
PyBind->>CPP: glu(input, quantizer)
CPP->>CPP: Create output tensor (shape_divisor=2)
CPP->>CUDA: nvte_glu(input, output, stream)
CUDA->>CUDA: gated_act_fn<sigmoid>(input, output)
CUDA->>CUDA: Split input into a, b
CUDA->>CUDA: Compute sigmoid(a) * b
CUDA-->>CPP: output
CPP-->>PyBind: output tensor
PyBind-->>OpsAPI: output tensor
OpsAPI-->>User: result
Note over User,CUDA: Backward Pass
User->>OpsAPI: GLU.backward(grad_output)
OpsAPI->>PyBind: tex.dglu(grad, input, quantizer)
PyBind->>CPP: dglu(grad, input, quantizer)
CPP->>CUDA: nvte_dglu(grad, input, output, stream)
CUDA->>CUDA: dgated_act_fn<sigmoid, dsigmoid>(grad, input, output)
CUDA->>CUDA: Compute gradient w.r.t. a and b
CUDA-->>CPP: grad_input
CPP-->>PyBind: grad_input tensor
PyBind-->>OpsAPI: grad_input tensor
OpsAPI-->>User: gradient
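For reference, the forward and backward math the diagram walks through (split the input, gate with a sigmoid, then chain-rule back through both halves) can be written as a short PyTorch sketch. The helper names glu_forward/glu_backward are made up for illustration; the actual implementation lives in the glu.cu CUDA kernels.

```python
import torch

def glu_forward(x: torch.Tensor) -> torch.Tensor:
    # Split the last dimension into the gate half `a` and the value half `b`,
    # then gate: GLU([a, b]) = sigmoid(a) * b.
    a, b = x.chunk(2, dim=-1)
    return torch.sigmoid(a) * b

def glu_backward(grad_output: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # Recompute the sigmoid gate and apply the chain rule to both halves,
    # mirroring what the gated backward kernel computes on the GPU.
    a, b = x.chunk(2, dim=-1)
    s = torch.sigmoid(a)
    grad_a = grad_output * b * s * (1.0 - s)  # d/da of sigmoid(a) * b
    grad_b = grad_output * s                  # d/db of sigmoid(a) * b
    return torch.cat([grad_a, grad_b], dim=-1)
```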
Additional Comments (3)
Signed-off-by: Kim, Jin <jinn.kim@sk.com>
for more information, see https://pre-commit.ci
Signed-off-by: Kim, Jin <jinn.kim@sk.com>
Force-pushed from 8adb19a to f7c6385
Signed-off-by: Tim Moon <tmoon@nvidia.com>
for more information, see https://pre-commit.ci
Additional Comments (3)
Fix: add explicit
The pybind enum export for
Fix: add
Fix: add the appropriate key (consistent with the existing conventions, e.g. how GEGLU/REGLU/SWIGLU are represented) mapping to
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Additional Comments (1)
elif activation == "sigmoid":
    y_ref = torch.nn.functional.sigmoid(x_ref)
Sigmoid is not an option in the test; is this leftover code?
It's not that important, but I find it awkward to have GeLU/GeGLU, ReLU/ReGLU, SiLU/SwiGLU, and then just GLU.
Could we also add a test for the regular LayerNormMLP module in test_numerics.py?
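For what it's worth, a rough sketch of the kind of smoke test being requested; the function name, sizes, and structure here are illustrative and do not follow the actual test_numerics.py conventions:

```python
import torch
import transformer_engine.pytorch as te

def test_layernorm_mlp_glu_smoke():
    # Build a LayerNormMLP with the new activation and run forward/backward.
    mlp = te.LayerNormMLP(hidden_size=256, ffn_hidden_size=512,
                          activation="glu").cuda()
    x = torch.randn(4, 256, device="cuda", requires_grad=True)
    y = mlp(x)
    y.sum().backward()
    # Sanity checks only; a real numerics test would compare against a
    # pure-PyTorch reference implementation.
    assert torch.isfinite(y).all()
    assert torch.isfinite(x.grad).all()
```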
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Additional Comments (2)
Also update the JAX activation dispatch switch to handle
The activation dispatch
Add a
/te-ci
Additional Comments (1)
Description
Add the original GLU (Gated Linear Unit) activation function as described in
Dauphin et al. (2017) and referenced in
Shazeer (2020), "GLU Variants Improve Transformer".
GLU is defined as:

$$\mathrm{GLU}([a, b]) = \sigma(a) \odot b$$

where $\sigma$ is the sigmoid function and the input is split into two halves $a$ and $b$ along the last dimension.
Transformer Engine already supports several GLU variants (GEGLU, ReGLU, SReGLU, SwiGLU, etc.) but was missing the original sigmoid-gated GLU. This PR fills that gap so that users can simply pass activation="glu" to LayerNormMLP or TransformerLayer (see the usage sketch below).
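A minimal usage sketch (sizes are arbitrary; a CUDA device and an installed Transformer Engine build are assumed):

```python
import torch
import transformer_engine.pytorch as te

# For gated activations the FC1 output width is doubled internally so the
# activation input can be split into the two GLU halves.
mlp = te.LayerNormMLP(hidden_size=1024, ffn_hidden_size=4096,
                      activation="glu").cuda()

x = torch.randn(8, 1024, device="cuda")
y = mlp(x)  # same hidden size as the input
```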
activation="glu"toLayerNormMLPorTransformerLayer.Type of change
Changes
- transformer_engine/common/activation/glu.cu (new file): CUDA kernels nvte_glu and nvte_dglu using existing sigmoid/dsigmoid primitives from math.h and the gated_act_fn/dgated_act_fn templates.
- transformer_engine/common/include/transformer_engine/activation.h: Added GLU to the NVTE_Activation_Type enum; declared nvte_glu and nvte_dglu with doxygen documentation.
- transformer_engine/common/CMakeLists.txt: Registered activation/glu.cu in both the arch_specific_sources and fast_math build lists.
- transformer_engine/pytorch/csrc/extensions/activation.cpp: Added glu() and dglu() C++ wrapper functions.
- transformer_engine/pytorch/csrc/extensions.h: Declared glu and dglu.
- transformer_engine/pytorch/csrc/extensions/pybind.cpp: Exposed tex.glu and tex.dglu to Python.
- transformer_engine/pytorch/module/layernorm_mlp.py: Added "glu" to _get_act_func_supported_list (all 3 recipe branches), the FC1 output-doubling condition, the ONNX export activation_map, and the docstring.
- transformer_engine/pytorch/ops/basic/activation.py: Added a GLU operation class with forward (tex.glu) and backward (tex.dglu).
- transformer_engine/pytorch/ops/basic/__init__.py: Exported GLU.
- transformer_engine/pytorch/transformer.py: Updated the TransformerLayer docstring to list 'glu' as a supported activation.

Checklist: