[rocm7.2_internal_testing] update related_commit #3003
Open
amd-sriram wants to merge 1 commit into rocm7.2_internal_testing from
Commit Messages:

- Create custom python operators for MixedFusedLayerNorm and MixedFusedRMSNorm. (#304)
- Add new apex module to jit load system (#294)
  * add code to add loader module for jit module
  * fix errors to create jit module adder - use correct file name to save code to
  * fix errors to create jit module adder - use correct class name of the builder and parameter to supply builder module name
  * fix errors to create jit module loader
  * add description about jit module script to add jit loader for a jit module with builder provided
  * add description about jit module script to add jit loader for a jit module with builder provided
  * add attributes and methods to override when creating a jit module builder
  * add extra new lines
  * update jit module to take the builder file name and extract module name from the builder, update missing entries in the table in readme for adding new module in jit
  * refine the description about module to jit
  * add description about jit
  * add description about jit
  * add code to create a builder based on user inputs
  * change the example from fused_dense to swiglu
  * allow user to skip sources list
  * change description of cxx and nvcc flags, add description of methods and fields in the initial builder code created by script
- add details of fused_conv_bias_relu in table of modules and fix error of maximum depth reached (#297)
  * add details of fused_conv_bias_relu in table of modules and build flag
  * solve the maximum depth error.
- Port fused_conv_bias_relu to ROCm (#295)
  * Add support for conv bias relu
  * Fix compilation failure
  * omit check_cudnn_version_and_warn check (no cuDNN on ROCm)
  * Flatten bias for PyTorch from 4D to 1D
  * Implement fusion of Conv with ReLU with MIOpen
  * Fix compilation issues
  * Fix crash for ConvBias
  * Fix merge issues
  * Add support for ConvBias and ConvBiasMaskRelu
  * Fix segmentation fault on bwd for ConvBias
  * add code for fusing conv+bias for retinanet, add test case for retinanet
  * Fix torch warning
  * Fix warnings in a unit test file as well
  * add builder and loader for fused_conv_bias_relu module

  Co-authored-by: Sergey Solovyev <sergey.solovyev@amd.com>
  Co-authored-by: Mikko Tukiainen <mikko.tukiainen@amd.com>
- Bump version from 1.10.0 to 1.11.0 (#293)
- [REDUX] Refactor Apex build process to use the PyTorch JIT extension flow (#291)
  * Created initial code for loading fused_dense module dynamically instead of building it. Code uses accelerator and op_builder modules from deepspeed code.
  * add apex/git_version_info_installed.py to gitignore as it is dynamically created by setup.py for the build process
  * add code for building fused rope dynamically
  * add code for building fused bias swiglu dynamically
  * fix the code so that fused rope and fused softmax are not compiled in jit mode, add csrc back to setup.py since it is not copied to apex wheel
  * load the jit modules inside and this prevents them from building when building the wheel
  * convert syncbn module to jit
  * fix the unnecessary compile of syncbn module in wheel building due to imports in python module
  * add fused layer norm module to jit build
  * make focal loss module as jit module
  * make focal loss module as jit module
  * make xentropy module as jit module
  * make bpn module as jit module
  * add code to build individual extensions without JIT
  * clean up the flags for the modules based on apex/setup.py
  * add function to get the backward_pass_guard_args in CudaOpBuilder and make MLP JIT compile
  * add fused weight gradient mlp to jit compile
  * move fused_weight_gradient_mlp_cuda load inside so that it is not compiled during apex installation
  * make fused index mul 2d jit compile and dd aten atomic header flag method to CUDAOpBuilder to support its jit compile
  * make fast multihead attention as jit module, add generator_args to CudaOpBuilder support jit of this module
  * make transducer loss and transducer joint modules as jit modules, add nvcc_threads_args method in CUDAOpBuilder to support these jit modules
  * remove extra method - installed_cuda_version from CUDAOpBuilder
  * add apex_C module to jit compile, add py-cpuinfo to requirements.txt as it is needed for TorchCPUOpBuilder
  * make nccl allocator as a jit compile module, add nccl_args method to CUDAOpBuilder to support this
  * make amp_C as a jit module
  * add a few uses of amp_C jit module
  * add a few uses of amp_C jit module
  * make fused adam as a jit module
  * add a few uses of amp_C jit module
  * fix the issue with fused adam jit module
  * make fused lamb as jit module
  * make distributed adam as jit module
  * make distributed lamb as jit module
  * add remaining amp_C uses with jit loader
  * add remaining usage of apexC jit module
  * make nccl p2p module as jit compile
  * make peer memory module as jit compile
  * add code to check for minimum nccl version to compile nccl allocator module
  * add provision to provide APEX_CPP_OPS=1 and APEX_CUDA_OPS=1 as replacement for --cpp_ext --cuda_ext command line arguments for building specific extensions in apex, save these settings for later use
  * check for minimum torch version for nccl allocator, check if the module is compatible other removed from installed ops list
  * add build as a dependency to support wheel building
  * Replace is_compatible to check for installation conditions with is_supported, because there is an issue with loading nccl allocator
  * Similar to pytorch we create a make command to install aiter, that the user can use. There will be no building aiter in the setup.py
  * update extension import test so that it considers jit compile extensions
  * clean up MultiTensorApply usages so that amp_C is not build in jit compile mode
  * Adding missing modules from deepspeed repo. Remove extra code in setup.py. Use is_compatible instead of is_supported
  * change name of apex_C module
  * change the name of cpp and cuda build flags, remove APEX_BUILD_OPS, cleanup the logic to build specific modules
  * add missing files used in cpu accelerator
  * add make clean command to handle deleting torch extensions installed for jit modules, fix the cpu builder import error
  * remove unused code in setup.py, fix the code to build for cpu mode
  * Removing unused code
  * remove accelerator package and refactor the used code into op_builder.all_ops BuilderUtils class
  * remove accelerator package usages
  * revert code that was removed by mistake
  * Cleaning up the setup file and renaming functions and variable to more readable names.
  * Fix the nccl version so that the nccl_allocator.so file can be loaded properly. Setup() call has an argument called py_modules which copies the python class into sitepackages folder. The python modules in the compatibility folder do lazy load of the builder classes. First these files are copied in the parent folder so that the files themselves are copied into sitepackages so that the kernel can be loaded into python then these temporary files are deleted.
  * Restore to original importing the extension code.
  * renamed compatibility/scaled_masked_softmax_cuda.py, added some extra tests in the contrib test runner
  * Added instructions for JIT load and changes in installation options
  * Restructuring the README
  * Added instructions for building wheel
  * replaced TorchCPUBuilder with CPUBuilder, added a main method in contrib test runner
  * create a script to build different jit conditions for running different tests
  * add script to run tests with different jit builds, add instructions to run jit build and tests in readme, add other tests in readme
  * fix the issues with running the tests - improper paths, counting .so files in apex folder
  * add mad internal scripts
  * remove print statement
  * remove testing section from readme
  * change location of result file
  * remove multiple results file from models.json
  * add platform specific description to wheel name even if no CppExtension or CUDAExtension is built with JIT load approach
  * add ninja and wheel to requirements to be installed
  * Update Release notes in Readme
  * Exclude compatibility folder while installing apex
  * Update README.md
  * Update README.md
  * Update README.md
  * Adding modification note to the original copywrite
  * fix the issue with symbolic links for op_builder, csrc when the apex repo is cloned in the docker
  * assign the symbolically linked folders into a variable and then loop across the list entries
  * remove unnecessary tabs

  Co-authored-by: skishore <sriramkumar.kishorekumar@amd.com>
  Co-authored-by: sriram <sriram.kumar@silo.ai>
- Pow implementation is very expensive on AMD CDNA4. (#292)
  This commit changes it to a mathematically equivalent exp(y*log(x)) for x > 0. However 1-2 ULP prec loss might be possible.
- Update README.md (#289)
- Update version to 1.10.0 (#282)
- add code to read BUILD_VERSION env variable, so that it is used instead of version.txt when creating a wheel (#278)

PRs:

- ROCm/apex#304

Fixes:

- https://example.com/issue-292
- https://example.com/issue-278
- https://example.com/issue-295
- https://example.com/issue-294
- https://example.com/issue-289
- https://example.com/issue-304
- https://example.com/issue-291
- https://example.com/issue-282
- https://example.com/issue-293
- https://example.com/issue-297
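For reviewers checking the pow change from #292 in the commit list above, the identity it relies on can be sketched in plain Python. This is only an illustration in double precision on the CPU, not the kernel code from that commit; the helper names `pow_via_exp_log` and `ulp_distance` are hypothetical and the sample inputs are arbitrary.

```python
import math


def pow_via_exp_log(x: float, y: float) -> float:
    """Compute x**y as exp(y * log(x)). The identity only holds for x > 0."""
    if x <= 0.0:
        raise ValueError("exp(y*log(x)) is only a valid rewrite of pow for x > 0")
    return math.exp(y * math.log(x))


def ulp_distance(approx: float, reference: float) -> float:
    """Absolute difference expressed in units of ulp(reference)."""
    return abs(approx - reference) / math.ulp(reference)


if __name__ == "__main__":
    # Arbitrary sample points; the observed ULP gap depends on the inputs
    # and on the math library, so no fixed output is claimed here.
    for x, y in [(2.0, 10.0), (1.5, 2.5), (0.3, 7.0)]:
        reference = math.pow(x, y)
        approx = pow_via_exp_log(x, y)
        print(f"x={x} y={y} pow={reference!r} exp_log={approx!r} "
              f"ulps={ulp_distance(approx, reference):.2f}")
```

The rounding error of `log(x)` is scaled by `y` before `exp` is applied, which is why a small precision loss relative to a native `pow` is expected, and why the rewrite has to be guarded for `x <= 0`.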
Jenkins build for 11d5b7fb26193b5007c497b4e9c9ef9680c089f9 commit finished as FAILURE