
🦙🦙🦙 SNO on Spot running LLMs 🦙🦙🦙

A simple method to provision RHOAI on Single Node OpenShift to try out different quantized LLMs, including Meta's Llama 2 and 3 and the IBM/Red Hat Granite models.

We use a g6.4xlarge on AWS Spot, which comes with a modern NVIDIA L4 (24 GB), 16 vCPUs and 64 GiB RAM.

It runs OpenShift 4.15 as a Single Node. We configure NVIDIA time slicing to share the GPU in parallel between Jupyter notebooks and model serving.
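Once the GitOps install below has configured the NVIDIA GPU operator, you can verify time slicing is in effect because the node advertises more than one GPU replica. A minimal check, assuming the default nvidia.com/gpu resource name:

# allocatable GPUs should show the time-sliced replica count, not 1
oc get node -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}{"\n"}'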

Install OpenShift

Install OCP as Single Node OpenShift (SNO) on an AWS Spot instance.

export AWS_PROFILE=sno-llama
export AWS_DEFAULT_REGION=us-east-2
export AWS_DEFAULT_ZONES=["us-east-2c"]
export CLUSTER_NAME=sno
export BASE_DOMAIN=sandbox.opentlc.com
export PULL_SECRET=$(cat ~/tmp/pull-secret)
export SSH_KEY=$(cat ~/.ssh/id_rsa.pub)
export INSTANCE_TYPE=g6.4xlarge
export ROOT_VOLUME_SIZE=200
export OPENSHIFT_VERSION=4.15.9

mkdir -p ~/tmp/sno-${AWS_PROFILE} && cd ~/tmp/sno-${AWS_PROFILE}

curl -Ls https://raw.githubusercontent.com/eformat/sno-for-100/main/sno-for-100.sh | bash -s -- -d
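When the installer finishes, the cluster assets should land in a cluster/ directory under your run directory (the standard openshift-install layout - double check the path the script prints). A quick smoke test:

# assumed install dir layout - adjust to wherever the script writes it
export KUBECONFIG=$(pwd)/cluster/auth/kubeconfig
oc get nodes -o wide
oc get clusterversion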

Install Everything Else

Bootstrap the ArgoCD operator and everything else using GitOps (GPU operator, cluster performance enhancements, cert-manager, GPU setup, LVM + Noobaa/S3 storage, RHOAI). Your SNO will reboot for MachineConfig updates.

./gitops/install.sh -d
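While the bootstrap runs, it can be handy to watch the Argo CD applications sync and the MachineConfig rollout that triggers the reboot. A couple of illustrative commands (namespaces may differ depending on how the GitOps applications are laid out):

oc get applications.argoproj.io -A
oc get mcp
oc get pods -n nvidia-gpu-operator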

Create users using htpasswd. This deletes the kubeadmin user.

./gitops/users.sh
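Under the hood this follows the standard OpenShift htpasswd identity provider flow. A rough sketch of the equivalent manual steps (user names and passwords here are placeholders, not what the script uses):

# generate an htpasswd file with two users (bcrypt hashes)
htpasswd -c -B -b users.htpasswd admin changeme
htpasswd -B -b users.htpasswd admin2 changeme
# store it where the OAuth config expects it, then reference it from an htpasswd identity provider in the OAuth CR
oc create secret generic htpass-secret --from-file=htpasswd=users.htpasswd -n openshift-config
# once a real cluster-admin can log in, the temporary kubeadmin secret can go
oc delete secret kubeadmin -n kube-system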

Install Let's Encrypt certificates for the api and apps endpoints using cert-manager and Route53.

./gitops/certificates.sh
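cert-manager solves a DNS-01 challenge against Route53, so issuing can take a few minutes while DNS propagates. To watch progress using the standard cert-manager resources:

oc get certificates,certificaterequests -A
oc get challenges.acme.cert-manager.io -A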

Scale the RHOAI platform down a bit to free up some CPU.

./gitops/scale-resources.sh
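If you want to trim further by hand, the same effect can be had with oc scale against the RHOAI deployments. For example (deployment and namespace names are illustrative of a default RHOAI install - check yours first):

oc get deployments -n redhat-ods-applications
oc scale deployment/rhods-dashboard -n redhat-ods-applications --replicas=1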

The manual instructions are still in the repository if you want to run them step by step.

Model Notebooks

Now open the RHOAI dashboard and log in.

Run the Jupyter notebook image - "PyTorch, CUDA v11.8, Python v3.9, PyTorch v2.0" - with a Small size and 1 NVIDIA GPU accelerator.

Make sure you give your notebook plenty of local storage (50-100 GB).

You can log in as admin or admin2 and work in separate notebooks to see GPU time slicing in action.
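From a terminal inside either notebook you can confirm both workloads land on the same physical L4 (assuming the notebook image ships nvidia-smi, as the CUDA-based images normally do):

# both users' python processes show up against the one GPU
nvidia-smi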

Llama2

Meta's Llama 2 model.

Open the sno-llama2.ipynb notebook and have a play.

Llama3

Meta's Llama 3 model.

Open the sno-llama3.ipynb notebook and have a play.

Granite

InstructLab's open source Granite model.

Open the sno-granite.ipynb notebook and have a play.

Code-Llama

Deploy your own Python coding assistant for your IDE.

Open the sno-code-llama.ipynb notebook and have a play.

Prompt Caching

How can we start to remember previous chat contexts using llama.cpp?

Open the sno-prompt-cache.ipynb notebook and have a play.
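If you later move from the notebook to the llama.cpp HTTP server, the same idea is exposed as a per-request flag. A sketch, assuming a llama.cpp server is already listening on localhost:8080:

# cache_prompt asks the server to reuse the KV cache for the shared prompt prefix
curl -s http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "You are a helpful assistant. User: hello", "n_predict": 64, "cache_prompt": true}'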

Instructlab

Use RHOAI to try out InstructLab using a notebook image. See the InstructLab README.md.

Open the sno-instructlab.ipynb notebook and have a play.

Model Serving

Use RHOAI to serve the models with a llama-cpp custom runtime. See the Serving README.md.
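Once a model is deployed, it can be queried over the route RHOAI exposes. A hedged example, assuming the custom runtime exposes llama.cpp's OpenAI-compatible API and using a placeholder hostname:

# placeholder URL - take the real one from the RHOAI model serving UI
MODEL_URL=https://your-model-route.apps.example.com
curl -sk ${MODEL_URL}/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}], "max_tokens": 64}'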

Delete SNO instance

If you no longer need your instance, remove all related AWS objects by running the following from inside your $RUNDIR:

openshift-install destroy cluster --dir=cluster
