yonikremer/grouped_sampling

A Single Usage Is All You Need

Awards:

5 high school credit points in data science. Grade: 99%

Abstract:

I developed and published an open-source efficient text-generation algorithm called grouped sampling to enable cheap and accessible AI text-generation services for everyone.

Causal language models are state-of-the-art text-generation models that power many popular products, such as ChatGPT. The naive text-generation algorithm requires n uses of a causal language model to generate n words, which makes it inefficient.

Grouped sampling is an alternative algorithm that manipulates the input text before passing it to the model, forcing the model to predict the entire output at once. As a result, grouped sampling requires only one use of a causal language model to generate text of any length, making it much more efficient.
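The contrast between the two algorithms can be sketched with a toy stand-in for a causal language model. This is only an illustration of the call-count difference, not the library's actual implementation; `toy_model`, `naive_generate`, and `grouped_generate` are hypothetical names, and the padding scheme is a simplified assumption:

```python
import numpy as np

VOCAB_SIZE = 5  # toy vocabulary; token id 0 serves as a pad/placeholder

def toy_model(token_ids):
    """Hypothetical stand-in for a causal language model.

    Like a real transformer, one forward pass returns next-token
    logits for EVERY input position, not just the last one.
    """
    rng = np.random.default_rng(0)  # deterministic for the demo
    return rng.standard_normal((len(token_ids), VOCAB_SIZE))

def naive_generate(prompt_ids, n_new):
    """Naive decoding: one model call per generated token."""
    ids, calls = list(prompt_ids), 0
    for _ in range(n_new):
        logits = toy_model(ids)            # full forward pass each step
        calls += 1
        ids.append(int(logits[-1].argmax()))  # greedy pick at last position
    return ids, calls

def grouped_generate(prompt_ids, n_new):
    """Grouped sampling (simplified sketch): pad the input so the model
    sees slots for the whole output, then read every position's
    prediction from a single forward pass."""
    padded = list(prompt_ids) + [0] * n_new  # placeholder slots for the output
    logits = toy_model(padded)               # ONE forward pass total
    start = len(prompt_ids) - 1
    new_ids = [int(logits[i].argmax()) for i in range(start, start + n_new)]
    return list(prompt_ids) + new_ids, 1
```

Generating five tokens this way costs five calls to `toy_model` with `naive_generate` but a single call with `grouped_generate`, which is the source of the efficiency gap described below.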

I compared grouped sampling with the naive algorithm on the task of translating TED Talks.

The naive algorithm required 33.049 GPU hours, costing $17.87.

Grouped sampling required 0.028 GPU hours, costing $0.015.

Grouped sampling also translated 5%-24% more accurately, measured using BERT scores.

In conclusion, grouped sampling is an accurate and efficient text-generation technique.

It is 1180 times faster and cheaper to run than the naive algorithm.
