Create a TransformerLayer implementation that provides full end-to-end support for grouped tensor flows, as required by MoE training and inference workflows. This could be achieved either by extending the existing TransformerLayer or by creating a separate layer.
The goal is both ease of use and easier end-to-end testing and benchmarking of MoE workloads.