Revised nvidia docker support #6

Merged
2 commits merged into Noeda:master on Apr 9, 2023
Conversation

DaniAsh551 (Contributor)

I have built on my previous PR by adding the missing "opencl" build feature for rllama.

I have added proper instructions for setting up the host (for Fedora, Debian, and Arch).

I also took the liberty of correcting the NVIDIA GPU names in the README.md to reflect their official names (RTX vs. GTX).

Noeda merged commit 1e1131f into Noeda:master on Apr 9, 2023
Noeda (Owner) commented Apr 9, 2023

Wonderful! Thanks also for the cleanup work on the README.md; somehow I never realized my RTX card says RTX and not GTX, even though it is sitting right next to me in fancy rainbow LEDs.

DaniAsh551 (Contributor, Author)

@Noeda haha, no problem

While I have you here: I saw some commits related to huggingface models; does this mean we could use those with rllama now?

Also, have you given any thought to LoRA? I mean supporting those models as well, or do they already work as-is?

LoRA should prove very helpful for people with terrible internet connections like me, or for people with limited [V]RAM or disk space.

Noeda (Owner) commented Apr 9, 2023

While I have you here: I saw some commits related to huggingface models; does this mean we could use those with rllama now?

You can't use arbitrary huggingface models, only the LLaMA ones. I added it so that we could run the Vicuna model, which is architecturally identical to LLaMA, just finetuned to respond to instructions. The Vicuna model comes in a format that's different from the original LLaMA format, so I had to write a loader for it.

It does mean that in the future, if I decide to support non-LLaMA models from huggingface, some parts of the code won't need to be rewritten.
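(For reference, the two layouts can usually be told apart just by which files sit in the model directory, if I remember the layouts right. A rough sketch of that kind of detection, with hypothetical helper names rather than rllama's actual loader code:)

```rust
use std::path::Path;

/// Hypothetical format detection, not rllama's actual loader:
/// original LLaMA checkpoints ship as consolidated.*.pth next to params.json,
/// while huggingface-style exports ship config.json plus (possibly sharded)
/// pytorch_model*.bin files described by pytorch_model.bin.index.json.
#[derive(Debug, PartialEq)]
enum ModelFormat {
    OriginalLlama,
    HuggingFace,
    Unknown,
}

fn detect_format(dir: &Path) -> ModelFormat {
    if dir.join("params.json").exists() && dir.join("consolidated.00.pth").exists() {
        ModelFormat::OriginalLlama
    } else if dir.join("pytorch_model.bin.index.json").exists() || dir.join("config.json").exists() {
        ModelFormat::HuggingFace
    } else {
        ModelFormat::Unknown
    }
}

fn main() {
    // Example path is made up; point it at whatever model directory you have.
    println!("{:?}", detect_format(Path::new("models/vicuna-13b")));
}
```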

Also, have you given any thought to LoRA? I mean supporting those models as well, or do they already work as-is?

I feel like I need to understand what exactly the LoRA repo there is offering. I thought it would be about saving memory by replacing large matrices with their low-rank approximations, but I think that's not exactly right. It's about fine-tuning existing models? It sounds like it would keep both the original weights and the fine-tuned weights loaded at the same time, but from skimming it I can't tell. (I'll read it later along with the paper to understand it.)
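If my reading is right, the effective weight would be W + (alpha/r) * B * A, with the base W frozen and A (r x k), B (d x r) being small low-rank matrices. At inference time that could be applied without ever materializing the merged matrix, roughly like this toy sketch (made-up matrix type, not rllama code):

```rust
/// Toy dense matrix stored row-major: rows x cols.
struct Mat {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Mat {
    fn matvec(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.cols);
        (0..self.rows)
            .map(|i| (0..self.cols).map(|j| self.data[i * self.cols + j] * x[j]).sum::<f32>())
            .collect()
    }
}

/// y = W x + (alpha / r) * B (A x): the frozen base weight plus the
/// low-rank update, applied without forming W + BA explicitly.
fn lora_forward(w: &Mat, a: &Mat, b: &Mat, x: &[f32], alpha: f32) -> Vec<f32> {
    let base = w.matvec(x);
    let delta = b.matvec(&a.matvec(x)); // A: r x k, B: d x r
    let scale = alpha / a.rows as f32;
    base.iter().zip(delta.iter()).map(|(y, d)| y + scale * d).collect()
}

fn main() {
    // 2x2 base weight with a rank-1 update, just to show the shapes.
    let w = Mat { rows: 2, cols: 2, data: vec![1.0, 0.0, 0.0, 1.0] };
    let a = Mat { rows: 1, cols: 2, data: vec![0.5, 0.5] };
    let b = Mat { rows: 2, cols: 1, data: vec![1.0, -1.0] };
    println!("{:?}", lora_forward(&w, &a, &b, &[1.0, 2.0], 1.0));
}
```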

Training finetunes with rllama would need either a gradient descent implementation or manually implemented gradients, which is tough work. I've done that sort of thing in Haskell projects, so I would know an attack plan, but to be perfectly honest I'm not sure if I'll ever add any training capability to rllama.

For memory savings, I've thought of implementing quantization. I have a branch, k4bit, where I already implemented 4-bit weights, but the quality of the output dropped so dramatically that I decided it was worse than nothing.
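(For illustration, a naive 4-bit scheme looks something like blockwise min/max quantization; this sketch is not the actual k4bit code, just the general idea and a hint of where the quality loss comes from:)

```rust
/// Quantize a block of f32 weights to 4-bit indices with a single
/// per-block scale and offset (naive min/max scheme). Returns the
/// indices plus the parameters needed to dequantize.
fn quantize_block_4bit(block: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = block.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = block.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / 15.0 } else { 1.0 };
    let quantized: Vec<u8> = block
        .iter()
        .map(|&w| (((w - min) / scale).round() as u8).min(15))
        .collect();
    (quantized, scale, min)
}

/// Reconstruct approximate f32 weights from the 4-bit indices.
fn dequantize_block_4bit(q: &[u8], scale: f32, min: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale + min).collect()
}

fn main() {
    let block = [0.12f32, -0.3, 0.07, 0.9, -0.55, 0.0];
    let (q, scale, min) = quantize_block_4bit(&block);
    let back = dequantize_block_4bit(&q, scale, min);
    // With only 16 levels per block, a single outlier stretches the range
    // and the rounding error on everything else is what eats output quality.
    println!("{:?}", back);
}
```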

DaniAsh551 (Contributor, Author)

Oh, sorry for the broad description; by huggingface models I meant the LLaMA ones on huggingface.

As for LoRA, there are already LLaMA LoRA models on huggingface from what I can gather; see here and here.

As you said, I don't care much about training with rllama, at least not yet anyway.
