Revised nvidia docker support #6

Merged
2 commits merged into Noeda:master on Apr 9, 2023
Conversation

DaniAsh551 (Contributor)

I have built on my previous PR by adding the missing "opencl" build feature for rllama.

I have added proper instructions for setting up the host (for Fedora, Debian, and Arch).

I also took the liberty of correcting the NVIDIA GPU names in the README.md to reflect their official names (RTX vs. GTX).

Noeda merged commit 1e1131f into Noeda:master on Apr 9, 2023
Noeda (Owner) commented Apr 9, 2023

Wonderful! Thanks also for the cleanup work on the README.md; somehow I never realized my RTX card says RTX and not GTX, even though it is sitting right next to me in fancy rainbow LEDs.

DaniAsh551 (Contributor, Author)

@Noeda haha, no problem

While I have you here: I saw some commits related to huggingface models; does this mean we could use those with rllama now?

Also, have you given any thought to LoRA? I mean supporting those models as well, or do they already work as-is?

LoRA should prove very helpful for people with terrible internet connections like me, or for people with limited [V]RAM or disk space.

Noeda (Owner) commented Apr 9, 2023

While I have you here: I saw some commits related to huggingface models; does this mean we could use those with rllama now?

You can't use arbitrary huggingface models, only the LLaMA ones. I added it so that we could run the Vicuna model, which is architecturally identical to LLaMA, just finetuned to respond to instructions. The Vicuna model comes in a format that's different from the original LLaMA format, so I had to write a loader for it.

It does mean that in the future, if I decide to support non-LLaMA models from huggingface, some parts of the code won't need to be rewritten.
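(For reference, the two layouts can usually be told apart just by which files sit in the model directory, if I remember the layouts right. A rough sketch of that kind of detection, with hypothetical helper names rather than rllama's actual loader code:)

```rust
use std::path::Path;

/// Hypothetical format detection, not rllama's actual loader:
/// original LLaMA checkpoints ship as consolidated.*.pth next to params.json,
/// while huggingface-style exports ship config.json plus (possibly sharded)
/// pytorch_model*.bin files described by pytorch_model.bin.index.json.
#[derive(Debug, PartialEq)]
enum ModelFormat {
    OriginalLlama,
    HuggingFace,
    Unknown,
}

fn detect_format(dir: &Path) -> ModelFormat {
    if dir.join("params.json").exists() && dir.join("consolidated.00.pth").exists() {
        ModelFormat::OriginalLlama
    } else if dir.join("pytorch_model.bin.index.json").exists() || dir.join("config.json").exists() {
        ModelFormat::HuggingFace
    } else {
        ModelFormat::Unknown
    }
}

fn main() {
    // Example path is made up; point it at whatever model directory you have.
    println!("{:?}", detect_format(Path::new("models/vicuna-13b")));
}
```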

Also, have you given any thought to LoRA? I mean supporting those models as well, or do they already work as-is?

I feel like I need to understand what exactly the LoRA repo there is offering. I thought it would be about saving memory by replacing large matrices with their low-rank approximations, but I think that's not exactly right. It's about fine-tuning existing models? It sounds like it would keep both the original weights and the fine-tuned weights loaded at the same time, but from skimming it I can't tell. (I'll read it later along with the paper to understand it.)
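If my reading is right, the effective weight would be W + (alpha/r) * B * A, with the base W frozen and A (r x k), B (d x r) being small low-rank matrices. At inference time that could be applied without ever materializing the merged matrix, roughly like this toy sketch (made-up matrix type, not rllama code):

```rust
/// Toy dense matrix stored row-major: rows x cols.
struct Mat {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Mat {
    fn matvec(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.cols);
        (0..self.rows)
            .map(|i| (0..self.cols).map(|j| self.data[i * self.cols + j] * x[j]).sum::<f32>())
            .collect()
    }
}

/// y = W x + (alpha / r) * B (A x): the frozen base weight plus the
/// low-rank update, applied without forming W + BA explicitly.
fn lora_forward(w: &Mat, a: &Mat, b: &Mat, x: &[f32], alpha: f32) -> Vec<f32> {
    let base = w.matvec(x);
    let delta = b.matvec(&a.matvec(x)); // A: r x k, B: d x r
    let scale = alpha / a.rows as f32;
    base.iter().zip(delta.iter()).map(|(y, d)| y + scale * d).collect()
}

fn main() {
    // 2x2 base weight with a rank-1 update, just to show the shapes.
    let w = Mat { rows: 2, cols: 2, data: vec![1.0, 0.0, 0.0, 1.0] };
    let a = Mat { rows: 1, cols: 2, data: vec![0.5, 0.5] };
    let b = Mat { rows: 2, cols: 1, data: vec![1.0, -1.0] };
    println!("{:?}", lora_forward(&w, &a, &b, &[1.0, 2.0], 1.0));
}
```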

Training finetunes with rllama would need either a gradient descent implementation or manually implemented gradients, which is tough work. I've done that sort of thing in Haskell projects, so I would know an attack plan, but to be perfectly honest I'm not sure if I'll ever add any training capability to rllama.

For memory savings, I've thought of implementing quantization. I have a branch, k4bit, where I already implemented 4-bit weights, but the quality of the output dropped so dramatically that I decided it was worse than nothing.
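(For illustration, a naive 4-bit scheme looks something like blockwise min/max quantization; this sketch is not the actual k4bit code, just the general idea and a hint of where the quality loss comes from:)

```rust
/// Quantize a block of f32 weights to 4-bit indices with a single
/// per-block scale and offset (naive min/max scheme). Returns the
/// indices plus the parameters needed to dequantize.
fn quantize_block_4bit(block: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = block.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = block.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / 15.0 } else { 1.0 };
    let quantized: Vec<u8> = block
        .iter()
        .map(|&w| (((w - min) / scale).round() as u8).min(15))
        .collect();
    (quantized, scale, min)
}

/// Reconstruct approximate f32 weights from the 4-bit indices.
fn dequantize_block_4bit(q: &[u8], scale: f32, min: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale + min).collect()
}

fn main() {
    let block = [0.12f32, -0.3, 0.07, 0.9, -0.55, 0.0];
    let (q, scale, min) = quantize_block_4bit(&block);
    let back = dequantize_block_4bit(&q, scale, min);
    // With only 16 levels per block, a single outlier stretches the range
    // and the rounding error on everything else is what eats output quality.
    println!("{:?}", back);
}
```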

DaniAsh551 (Contributor, Author)

Oh, sorry for the broad description; by huggingface models I meant the LLaMA ones on huggingface.

As for LoRA, there are already LLaMA LoRA models on huggingface from what I can gather; see here and here.

As you said, I don't care much about training with rllama, at least not yet anyway.
