
Tuple of functions (with identical signature) does not compile for CUDA #9420

s-m-e opened this issue Feb 3, 2024 · 2 comments
s-m-e commented Feb 3, 2024

The following example fails to compile, raising a NumbaNotImplementedError:

import numpy as np
import numba as nb
from numba import cuda

@cuda.jit("f8(f8)", device = True, inline = True)
def foo(x):
    return x * 10

@cuda.jit("f8(f8)", device = True, inline = True)
def bar(x):
    return x * 100

FUNCS = (
    foo,
    bar,
)

@nb.vectorize("f8(f8,i8)", target = "cuda")
def demo(x, idx):
    return FUNCS[idx](x)

X = np.arange(0., 10.)
mask = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype="i8")

print(demo(X, mask))

A similar construct does work on the CPU, albeit with a warning:

import numpy as np
import numba as nb

@nb.njit("f8(f8)", inline = "always")
def foo(x):
    return x * 10

@nb.njit("f8(f8)", inline = "always")
def bar(x):
    return x * 100

FUNCS = (
    foo,
    bar,
)

@nb.vectorize("f8(f8,i8)", target = "cpu")
def demo(x, idx):
    return FUNCS[idx](x)

X = np.arange(0., 10.)
mask = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype="i8")

print(demo(X, mask))

I am trying to work around the limitation that custom, user-provided functions cannot be passed to a CUDA kernel. The tuple FUNCS is generated dynamically (from a list or array); a custom version of demo is then supposed to be compiled against it. All functions in FUNCS share the exact same signature. For further context and the underlying use-case, see here.
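Stripped of numba and CUDA, the intended behaviour is plain index-based dispatch into a tuple of same-signature functions. A minimal pure-Python sketch of what the CUDA target is being asked to compile (no compilation involved here):

```python
# Pure-Python sketch of the intended dispatch pattern, without numba/CUDA:
# an integer index selects one of several functions with identical signatures.
def foo(x):
    return x * 10

def bar(x):
    return x * 100

FUNCS = (foo, bar)

def demo(x, idx):
    # Tuple indexing with a runtime idx is exactly what the CUDA
    # target currently refuses to compile.
    return FUNCS[idx](x)

results = [demo(float(i), i % 2) for i in range(10)]
print(results)  # [0.0, 100.0, 20.0, 300.0, 40.0, 500.0, 60.0, 700.0, 80.0, 900.0]
```

In plain Python this is trivial; the issue is that numba's CUDA target cannot lower the runtime tuple index.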

s-m-e commented Feb 3, 2024

Workaround:

import numpy as np
import numba as nb
from numba import cuda

@cuda.jit("f8(f8)", device = True, inline = True)
def foo(x):
    return x * 10

@cuda.jit("f8(f8)", device = True, inline = True)
def bar(x):
    return x * 100

FUNCS = (
    foo,
    bar,
)

exec("""
@cuda.jit("f8(f8,i8)", device = True, inline = True)
def dispatcher(x, idx):
{DISPATCHER:s}
    return np.nan
""".format(DISPATCHER = "\n".join([
    f"    {'if' if idx == 0 else 'elif':s} idx == {idx:d}:\n        return {func.__name__:s}(x)"
    for idx, func in enumerate(FUNCS)
])))

@nb.vectorize("f8(f8,i8)", target = "cuda")
def demo(x, idx):
    return dispatcher(x, idx)

X = np.arange(0., 10.)
mask = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype="i8")

print(demo(X, mask))
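For readers unfamiliar with the trick: the exec string simply expands into an if/elif chain over the tuple indices. The code-generation step can be sketched in isolation (generate_dispatcher_body is a hypothetical helper name, shown only to make the generated source visible):

```python
# Sketch of the code-generation step alone: build the if/elif chain that
# the exec template above expands to. `generate_dispatcher_body` is a
# hypothetical helper, not part of the workaround code itself.
def generate_dispatcher_body(func_names):
    return "\n".join(
        f"    {'if' if idx == 0 else 'elif'} idx == {idx}:\n        return {name}(x)"
        for idx, name in enumerate(func_names)
    )

body = generate_dispatcher_body(["foo", "bar"])
print(body)
# Prints:
#     if idx == 0:
#         return foo(x)
#     elif idx == 1:
#         return bar(x)
```

Because every branch index is a compile-time constant in the generated source, numba can compile each branch without ever indexing the tuple at runtime.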

s-m-e commented Feb 3, 2024

Dynamically building custom functions around arbitrary lists or tuples of functions is non-trivial. It took me a while to piece it together in a semi-robust fashion. In case someone comes across this later, the full workaround looks somewhat like this:

import numpy as np
import numba as nb
from numba import cuda

def factory():
    out = []
    for factor in (10, 100):
        @cuda.jit("f8(f8)", device = True, inline = True)
        def foo(x):
            return x * factor
        out.append(foo)
    return out

def builder(funcs):
    # names are not unique, ids are ...
    funcs = [(f"func_{id(func):x}", func) for func in funcs]

    # HACK https://stackoverflow.com/a/71560563
    globals_, locals_ = globals(), locals()
    globals_.update({name: handle for name, handle in funcs})
    exec("""
@cuda.jit("f8(f8,i8)", device = True, inline = True)
def dispatcher(x, idx):
{DISPATCHER:s}
    return np.nan
    """.format(DISPATCHER = "\n".join([
        f"    {'if' if idx == 0 else 'elif':s} idx == {idx:d}:\n        return {name:s}(x)"
        for idx, (name, _) in enumerate(funcs)
    ])), globals_, locals_)
    globals_["dispatcher"] = locals_["dispatcher"]

    @nb.vectorize("f8(f8,i8)", target = "cuda")
    def prototype(x, idx):
        return dispatcher(x, idx)

    return prototype

user_funcs = factory()
demo1 = builder(user_funcs)
demo2 = builder(user_funcs[::-1])

X = np.arange(0., 10.)
mask = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype="i8")

print(demo1(X, mask))
print(demo2(X, mask))

factory deliberately prepares two different functions with identical names, in order to exercise this edge case in builder.
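That edge case can be reproduced without numba at all. A minimal sketch (pure Python; note the default-argument binding stands in for numba's eager freevar capture at decoration time):

```python
# Minimal reproduction of the name-collision edge case: a factory returns
# several distinct closures that all carry the same __name__, so id()-based
# names are needed to tell them apart (as `builder` above does).
def factory():
    out = []
    for factor in (10, 100):
        def foo(x, factor=factor):  # default arg pins the loop variable
            return x * factor
        out.append(foo)
    return out

funcs = factory()
names = [f.__name__ for f in funcs]          # ['foo', 'foo'] -- not unique
unique = [f"func_{id(f):x}" for f in funcs]  # unique per function object
```

Injecting the functions into globals under their __name__ would silently overwrite one with the other; the id()-derived names avoid that collision.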

s-m-e added a commit to pleiszenburg/hapsira that referenced this issue Feb 5, 2024
@gmarkall gmarkall self-assigned this Feb 6, 2024
@stuartarchibald stuartarchibald added feature_request CUDA CUDA related issue/PR labels Apr 24, 2024