
Tuple of functions (with identical signature) does not compile for CUDA #9420

s-m-e opened this issue Feb 3, 2024 · 2 comments
s-m-e commented Feb 3, 2024

The following example fails to compile, raising a NumbaNotImplementedError:

import numpy as np
import numba as nb
from numba import cuda

@cuda.jit("f8(f8)", device = True, inline = True)
def foo(x):
    return x * 10

@cuda.jit("f8(f8)", device = True, inline = True)
def bar(x):
    return x * 100

FUNCS = (
    foo,
    bar,
)

@nb.vectorize("f8(f8,i8)", target = "cuda")
def demo(x, idx):
    return FUNCS[idx](x)

X = np.arange(0., 10.)
mask = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype="i8")

print(demo(X, mask))

A similar construct does work on the CPU, albeit with a warning:

import numpy as np
import numba as nb

@nb.njit("f8(f8)", inline = "always")
def foo(x):
    return x * 10

@nb.njit("f8(f8)", inline = "always")
def bar(x):
    return x * 100

FUNCS = (
    foo,
    bar,
)

@nb.vectorize("f8(f8,i8)", target = "cpu")
def demo(x, idx):
    return FUNCS[idx](x)

X = np.arange(0., 10.)
mask = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype="i8")

print(demo(X, mask))

I am trying to work around the limitation that custom, user-provided functions cannot be passed to a CUDA kernel. The tuple FUNCS is generated dynamically (from a list or array); a custom version of demo is then supposed to be compiled against it. All functions in FUNCS share the exact same signature. For further context and the underlying use-case, see here.
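Stripped of numba and CUDA, the intended behaviour is plain index-based dispatch into a tuple of same-signature functions. A minimal pure-Python sketch of what the CUDA target is being asked to compile (no compilation involved here):

```python
# Pure-Python sketch of the intended dispatch pattern, without numba/CUDA:
# an integer index selects one of several functions with identical signatures.
def foo(x):
    return x * 10

def bar(x):
    return x * 100

FUNCS = (foo, bar)

def demo(x, idx):
    # Tuple indexing with a runtime idx is exactly what the CUDA
    # target currently refuses to compile.
    return FUNCS[idx](x)

results = [demo(float(i), i % 2) for i in range(10)]
print(results)  # [0.0, 100.0, 20.0, 300.0, 40.0, 500.0, 60.0, 700.0, 80.0, 900.0]
```

In plain Python this is trivial; the issue is that numba's CUDA target cannot lower the runtime tuple index.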

s-m-e commented Feb 3, 2024

Workaround:

import numpy as np
import numba as nb
from numba import cuda

@cuda.jit("f8(f8)", device = True, inline = True)
def foo(x):
    return x * 10

@cuda.jit("f8(f8)", device = True, inline = True)
def bar(x):
    return x * 100

FUNCS = (
    foo,
    bar,
)

exec("""
@cuda.jit("f8(f8,i8)", device = True, inline = True)
def dispatcher(x, idx):
{DISPATCHER:s}
    return np.nan
""".format(DISPATCHER = "\n".join([
    f"    {'if' if idx == 0 else 'elif':s} idx == {idx:d}:\n        return {func.__name__:s}(x)"
    for idx, func in enumerate(FUNCS)
])))

@nb.vectorize("f8(f8,i8)", target = "cuda")
def demo(x, idx):
    return dispatcher(x, idx)

X = np.arange(0., 10.)
mask = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype="i8")

print(demo(X, mask))
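For readers unfamiliar with the trick: the exec string simply expands into an if/elif chain over the tuple indices. The code-generation step can be sketched in isolation (generate_dispatcher_body is a hypothetical helper name, shown only to make the generated source visible):

```python
# Sketch of the code-generation step alone: build the if/elif chain that
# the exec template above expands to. `generate_dispatcher_body` is a
# hypothetical helper, not part of the workaround code itself.
def generate_dispatcher_body(func_names):
    return "\n".join(
        f"    {'if' if idx == 0 else 'elif'} idx == {idx}:\n        return {name}(x)"
        for idx, name in enumerate(func_names)
    )

body = generate_dispatcher_body(["foo", "bar"])
print(body)
# Prints:
#     if idx == 0:
#         return foo(x)
#     elif idx == 1:
#         return bar(x)
```

Because every branch index is a compile-time constant in the generated source, numba can compile each branch without ever indexing the tuple at runtime.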

s-m-e commented Feb 3, 2024

Dynamically building custom functions around arbitrary lists or tuples of functions is non-trivial. It took me a while to piece it together in a semi-robust fashion. In case someone comes across this later, the full workaround looks somewhat like this:

import numpy as np
import numba as nb
from numba import cuda

def factory():
    out = []
    for factor in (10, 100):
        @cuda.jit("f8(f8)", device = True, inline = True)
        def foo(x):
            return x * factor
        out.append(foo)
    return out

def builder(funcs):
    # names are not unique, ids are ...
    funcs = [(f"func_{id(func):x}", func) for func in funcs]

    # HACK https://stackoverflow.com/a/71560563
    globals_, locals_ = globals(), locals()
    globals_.update({name: handle for name, handle in funcs})
    exec("""
@cuda.jit("f8(f8,i8)", device = True, inline = True)
def dispatcher(x, idx):
{DISPATCHER:s}
    return np.nan
    """.format(DISPATCHER = "\n".join([
        f"    {'if' if idx == 0 else 'elif':s} idx == {idx:d}:\n        return {name:s}(x)"
        for idx, (name, _) in enumerate(funcs)
    ])), globals_, locals_)
    globals_["dispatcher"] = locals_["dispatcher"]

    @nb.vectorize("f8(f8,i8)", target = "cuda")
    def prototype(x, idx):
        return dispatcher(x, idx)

    return prototype

user_funcs = factory()
demo1 = builder(user_funcs)
demo2 = builder(user_funcs[::-1])

X = np.arange(0., 10.)
mask = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype="i8")

print(demo1(X, mask))
print(demo2(X, mask))

factory deliberately prepares two different functions with identical names, in order to exercise this edge case in builder.
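That edge case can be reproduced without numba at all. A minimal sketch (pure Python; note the default-argument binding stands in for numba's eager freevar capture at decoration time):

```python
# Minimal reproduction of the name-collision edge case: a factory returns
# several distinct closures that all carry the same __name__, so id()-based
# names are needed to tell them apart (as `builder` above does).
def factory():
    out = []
    for factor in (10, 100):
        def foo(x, factor=factor):  # default arg pins the loop variable
            return x * factor
        out.append(foo)
    return out

funcs = factory()
names = [f.__name__ for f in funcs]          # ['foo', 'foo'] -- not unique
unique = [f"func_{id(f):x}" for f in funcs]  # unique per function object
```

Injecting the functions into globals under their __name__ would silently overwrite one with the other; the id()-derived names avoid that collision.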

s-m-e added a commit to pleiszenburg/hapsira that referenced this issue Feb 5, 2024
@gmarkall gmarkall self-assigned this Feb 6, 2024
@stuartarchibald stuartarchibald added feature_request CUDA CUDA related issue/PR labels Apr 24, 2024