Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
benchmark_decode.py		benchmark_decode.py
benchmark_pytorch_engine_a100.sh		benchmark_pytorch_engine_a100.sh
benchmark_turbomind_engine_a100.sh		benchmark_turbomind_engine_a100.sh
profile_generation.py		profile_generation.py
profile_hf_generation.py		profile_hf_generation.py
profile_pipeline_api.py		profile_pipeline_api.py
profile_restful_api.py		profile_restful_api.py
profile_serving.py		profile_serving.py
profile_throughput.py		profile_throughput.py
profile_triton_python_backend.py		profile_triton_python_backend.py

README.md

Benchmark

We provide several profiling tools to benchmark our models.

profile with dataset

Download the dataset below or create your own dataset.

wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

Profiling your model with profile_throughput.py

python profile_throughput.py \
 ShareGPT_V3_unfiltered_cleaned_split.json \
 /path/to/your/model \
 --concurrency 64

profile without dataset

profile_generation.py perform benchmark with dummy data.

pip install nvidia-ml-py

python profile_generation.py \
 /path/to/your/model \
 --concurrency 1 8 --prompt-tokens 1 512 --completion-tokens 2048 512

profile serving

Tools above profile models with Python API. profile_serving.py is used to do benchmark on serving.

wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

python profile_serving.py \
    ${TritonServerAddress} \
    /path/to/tokenizer \ # ends with .model for most models. Otherwise, please pass model_path/triton_models/tokenizer.
    ShareGPT_V3_unfiltered_cleaned_split.json \
    --concurrency 64

profile restful api

profile_restful_api.py is used to do benchmark on api server.

wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json

python profile_restful_api.py \
    ${ServerAddress} \
    /path/to/tokenizer \ # ends with .model for most models. Otherwise, please pass model_path/triton_models/tokenizer.
    ShareGPT_V3_unfiltered_cleaned_split.json \
    --concurrency 64

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

benchmark

benchmark

README.md

Benchmark

profile with dataset

profile without dataset

profile serving

profile restful api

Files

benchmark

Directory actions

More options

Directory actions

More options

Latest commit

History

benchmark

Folders and files

parent directory

README.md

Benchmark

profile with dataset

profile without dataset

profile serving

profile restful api