Add GPU Functionality to the dev Branch #440
Draft
braxtoncuneo wants to merge 11 commits into
It appears the builtin implementation is not provided on AMD. This, or something like it, would need to be included until there is a real fix.
Summary of changes
This PR adds GPU execution to the dev branch. This was accomplished by re-implementing some functionality in a manner compatible with execution through `numba-hip` and `numba-cuda`. This includes changes to how GPU state objects are managed, the introduction of helper intrinsics/functions to handle returning arrays, and replacement of builtin functions that are not supported on one or more GPU platforms (e.g. powers for complex numbers, numpy polynomial solvers).
Types of changes
New feature: Helper functions/intrinsics for returning arrays on GPU
As has been mentioned in previous issues/PRs, `numba-hip` and `numba-cuda` do not permit returning arrays from functions. In brief, this is because memory management on GPU is hard, and the simplest way to create an array as a local variable is to allocate it on the stack. Once the function that created a local array returns, that array no longer "exists" in a way that it can be safely used.
This is a problem, since the helper functions provided by `numba_objects_generator.py` return references to and slices of arrays in the mcdc global data structure. We'd like to keep those helper functions, considering how much convenience and quality of life they provide to developers. Also, since these global data structures have a lifetime which encompasses the lifetime of all transport logic (i.e. the structures are in memory and safe to use whenever any transport logic is run), it should be safe to ignore this array-returning restriction for these helper functions.
For this specific use case, when we know for certain that an array is safe to use after a function returns, this PR introduces the `array_return` decorator and the `array_result` function, which together enable the return of array references/slices.
This works by smuggling a pointer to the array's data and the size of the array out as a tuple, then reconstructing the array from the returned tuple in the calling function.
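The pointer-and-size idea can be illustrated on the host with a minimal sketch. This is not the PR's implementation and does not use `numba`; it is a plain-Python analogy built on `ctypes`, and the names `GLOBAL_DATA`, `slice_as_tuple`, and `rebuild_view` are hypothetical. It assumes, as the PR does for the mcdc global data, that the underlying buffer outlives every view built from it.

```python
import ctypes
from array import array

# Hypothetical stand-in for a long-lived global buffer (not MC/DC's real layout).
GLOBAL_DATA = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])


def slice_as_tuple(start, length):
    """Describe GLOBAL_DATA[start:start + length] as a (pointer, size) tuple."""
    base_address, _ = GLOBAL_DATA.buffer_info()
    return (base_address + start * GLOBAL_DATA.itemsize, length)


def rebuild_view(ptr_and_size):
    """Rebuild an array view over the same memory from a (pointer, size) tuple."""
    address, length = ptr_and_size
    # from_address shares memory with the original buffer; nothing is copied.
    return (ctypes.c_double * length).from_address(address)


view = rebuild_view(slice_as_tuple(2, 3))
view[0] = 20.0  # writes through to GLOBAL_DATA[2]
```

The rebuilt view is only valid while `GLOBAL_DATA` stays alive and is not resized, which mirrors the lifetime argument made above for the global data structures.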
In normal Python execution, the `array_result` function simply returns its input. However, there exists an overload of `array_result` which handles the conversion to a tuple in compiled contexts.
The `array_return` decorator registers an intrinsic overload of the decorated function. This intrinsic calls the decorated function and creates a new array based on the returned tuple. Because intrinsics are inlined directly into their calling function, they aren't their own function with a separate stack frame and so are not rejected by the array-returning guards of `numba-hip` or `numba-cuda`.
New feature: GPU-compatible quartic solver and complex power calculation
`numpy.roots` solves polynomials and is implemented by `numba` in compiled contexts.
`numba-hip` and `numba-cuda` don't implement `numpy.roots`, likely because the general case does not have analytic solutions and requires the use of arbitrarily large matrices (determined by the degree of the polynomial).
`numba-hip` doesn't implement the intrinsic for powers of complex numbers.
New feature: GPU execution
Developer Checklist
Associated Issues and PRs
Associated Developers