This will solve CPU-only, CUDA-only and any mix of them. #98
Conversation
@KindXiaoming This should close many issues related to using CUDA. For it to work properly, I recommend updating `requirements.txt` to the following:
Please let me know if you want me to make another PR, or if you'd rather handle this yourself.
There's another missing device in https://github.com/KindXiaoming/pykan/blob/master/kan/KAN.py#L205. I've addressed it in my fork at https://github.com/Jim137/pykan/tree/develop. Would you be open to merging my changes and submitting a pull request together?
Good point, I added it.
I don't know why, but when using MPS (Apple Silicon) the loss is nan. With `model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10., device=device.type)` I get: `train loss: nan | test loss: nan | reg: nan : 100%|█████████████████| 20/20 [00:03<00:00, 5.11it/s]`
@brainer3220 I'm afraid I can't help too much with MPS, but it nonetheless seems to be a common issue between MPS and Torch (see pytorch/pytorch#112834, for example).
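For readers hitting the same wall: a minimal device-selection sketch that prefers CUDA, then MPS, then CPU. The fallback order is my suggestion, not part of this PR; the `getattr` guard is there because `torch.backends.mps` only exists on newer Torch versions.

```python
import torch

# Prefer CUDA, then MPS; given the nan issue reported above
# (pytorch/pytorch#112834), CPU remains the safe fallback on Apple Silicon.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(device.type)
```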
I am trying to run the given KAN example in Colab with @AlessandroFlati's `develop` implementation, and I still get the above error. I used the following requirements: And when I try to run on CPU, it says no NVIDIA drivers are selected. Any help to resolve this is appreciated. Thanks!
First, you need to initialize a `torch.device`. Then use `device` in all constructors. Finally, you will need to put the dataset tensors on `device`.
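The code snippets originally attached to this comment did not survive the page scrape; a minimal sketch of the three steps, assuming the dataset dict layout used in the pykan examples (`train_input`, `train_label`, `test_input`, `test_label`), might look like:

```python
import torch

# Step 1: initialize a torch.device (CUDA when available, else CPU).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 2: pass `device` to every constructor, e.g. (hypothetical call,
# mirroring the pykan examples):
#   model = KAN(width=[2, 5, 1], grid=5, k=3, device=device)

# Step 3: move every dataset tensor onto the same device. The dict below
# is a stand-in for what pykan's create_dataset would normally return.
dataset = {
    "train_input": torch.rand(100, 2),
    "train_label": torch.rand(100, 1),
    "test_input": torch.rand(100, 2),
    "test_label": torch.rand(100, 1),
}
dataset = {key: tensor.to(device) for key, tensor in dataset.items()}
```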
Thanks, now I am able to run on the Colab GPU. But the CPU problem persists.
Hi @AlessandroFlati, I would appreciate it if you made another PR! Thanks in advance :)
I tried:
`device = torch.device("cuda" if torch.cuda.is_available() else "cpu")`
`print(torch.cuda.is_available())`
`model.to(device)`
`dataset['train_input'] = dataset['train_input'].to(device)`
but there is still a problem at line 170, `x = torch.einsum('ij,k->ikj', x, torch.ones(self.out_dim, device=self.device)).reshape(batch, self.size).permute(1, 0)`, which raises (from `torch\functional.py:380`, in `einsum`): `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`
You shouldn't just move the model with `model.to(device)`; the device has to be passed to the constructors as well.
This solves the post-`fix_symbolic` problem with CUDA, the `initialize_from_another_model` problem with CUDA, and the related CPU problem (already mentioned in this PR) that forced the use of CUDA.