
This will solve CPU-only, CUDA-only and any mix of them. #98

Merged: 2 commits, May 7, 2024

Conversation

@AlessandroFlati (Contributor) commented May 6, 2024

This solves the post-fix_symbolic problem with CUDA, the initialize_from_another_model problem with CUDA, and the related CPU problem (already mentioned in this PR) that forced users to use CUDA.
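For context, a minimal sketch of the calls this PR touches (assuming the device keyword added here on the KAN constructor; the width/grid values, the target function f, and the fix_symbolic indices are just placeholders):

```python
import torch
from kan import KAN, create_dataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Build the model directly on the chosen device (device keyword from this PR)
model = KAN(width=[2, 5, 1], grid=5, k=3, device=device)

# Placeholder target function and dataset, moved to the same device
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)
for key in ['train_input', 'train_label', 'test_input', 'test_label']:
    dataset[key] = dataset[key].to(device)

model.train(dataset, opt="LBFGS", steps=20, device=device.type)

# Calls that previously failed on CUDA:
model.fix_symbolic(0, 0, 0, 'sin')  # placeholder layer/indices/function
model2 = KAN(width=[2, 5, 1], grid=10, k=3, device=device)
model2.initialize_from_another_model(model, dataset['train_input'])
```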

@AlessandroFlati (Contributor, Author)

@KindXiaoming This should close many issues related to using CUDA. For it to work properly, I recommend updating requirements.txt to the following:

matplotlib==3.6.2
numpy==1.26.4
scikit-learn==1.4.2
setuptools==69.5.1
sympy==1.11.1
torch==2.2.2
tqdm==4.66.2

Please let me know whether you want me to make another PR or you'd rather handle this yourself.

@Jim137 (Contributor) commented May 6, 2024

There's another missing device argument in https://github.com/KindXiaoming/pykan/blob/master/kan/KAN.py#L205. I've addressed it in my fork at https://github.com/Jim137/pykan/tree/develop. Would you be open to merging my changes and submitting a pull request together?

@AlessandroFlati (Contributor, Author)

Good point, I added it.

@brainer3220 commented May 7, 2024

I don't know why, but if I use MPS (Apple Silicon), the loss is nan.

model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10., device=device.type);
train loss: nan | test loss: nan | reg: nan : 100%|█████████████████| 20/20 [00:03<00:00,  5.11it/s]

@AlessandroFlati (Contributor, Author)

@brainer3220 I'm afraid I can't help much with MPS, but it nonetheless seems to be a common issue between MPS and Torch (see pytorch/pytorch#112834, for example).

@rajdeepbanerjee-git commented May 7, 2024

I am trying to run the given KAN example in Colab with the AlessandroFlati:develop implementation:
[screenshot of the error]

Still getting the above error. I used the following requirements:
matplotlib==3.6.2
numpy==1.26.4
scikit-learn==1.4.2
setuptools==69.5.1
sympy==1.11.1
torch==2.2.1
tqdm==4.66.2

If I instead try to run on CPU, it says no NVIDIA drivers were found.

Any help to resolve this is appreciated. Thanks!

@SimoSbara mentioned this pull request May 7, 2024

@SimoSbara

> I am trying to run the given KAN example in Colab with the AlessandroFlati:develop implementation: [...] In case I want to run on CPU, it says no NVIDIA drivers were found.

First you need to initialize a torch.device, like this:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Then pass the device to every constructor with the device=device argument.

Finally, you will need to put the dataset tensors on the device by doing this:

dataset['train_input'] = dataset['train_input'].to(device)
dataset['train_label'] = dataset['train_label'].to(device)
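Putting those three steps together, a minimal end-to-end sketch (assuming the device keyword from this PR on the KAN constructor; the width/grid values and the target function f are placeholders):

```python
import torch
from kan import KAN, create_dataset

# Step 1: initialize a torch.device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Step 2: pass the device to the constructor (keyword added by this PR)
model = KAN(width=[2, 5, 1], grid=5, k=3, device=device)

# Step 3: move the dataset tensors to the same device
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)  # placeholder target
dataset = create_dataset(f, n_var=2)
dataset['train_input'] = dataset['train_input'].to(device)
dataset['train_label'] = dataset['train_label'].to(device)
dataset['test_input'] = dataset['test_input'].to(device)
dataset['test_label'] = dataset['test_label'].to(device)

model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10., device=device.type)
```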

@rajdeepbanerjee-git

Thanks, now I am able to run on colab GPU. But the CPU problem persists.

@SimoSbara

> Thanks, now I am able to run on colab GPU. But the CPU problem persists.

This pull request solves it; you can try modifying pykan as in these commits:
d606bd8
c857dd6

I had the same problem #75.

@KindXiaoming (Owner)

Hi @AlessandroFlati, I would appreciate it if you made another PR for me! Thanks in advance :)

@alpaca202204

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device(type='cuda')

print(torch.cuda.is_available())
# True

model.to(device)

dataset['train_input'] = dataset['train_input'].to(device)
dataset['train_label'] = dataset['train_label'].to(device)

but there is still a problem:

--> 170 x = torch.einsum('ij,k->ikj', x, torch.ones(self.out_dim, device=self.device)).reshape(batch, self.size).permute(1, 0)
171 preacts = x.permute(1, 0).clone().reshape(batch, self.out_dim, self.in_dim)
172 base = self.base_fun(x).permute(1, 0) # shape (batch, size)

File E:\anaconda\envs\4torch2\lib\site-packages\torch\functional.py:380, in einsum(*args)
375 return einsum(equation, *_operands)
377 if len(operands) <= 2 or not opt_einsum.enabled:
378 # the path for contracting 0 or 1 time(s) is already optimized
379 # or the user has disabled using opt_einsum
--> 380 return _VF.einsum(equation, operands) # type: ignore[attr-defined]
382 path = None
383 if opt_einsum.is_available():

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

@AlessandroFlati (Contributor, Author)

You shouldn't just call model.to(device); rather, create both the model and the dataset passing the device=device argument. Besides, you're missing the test_input and test_label keys of the dataset; see the sketch below.
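A minimal sketch of the corrected setup (assuming the device keyword from this PR; the width/grid values and the target function are placeholders):

```python
import torch
from kan import KAN, create_dataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create the model on the device up front instead of calling model.to(device) afterwards
model = KAN(width=[2, 5, 1], grid=5, k=3, device=device)  # placeholder width/grid

# Move all four dataset tensors, not just the training ones
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)
for key in ['train_input', 'train_label', 'test_input', 'test_label']:
    dataset[key] = dataset[key].to(device)
```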
