Causal LM

These applications showcase inference of a causal language model (LM). They deliberately expose few configuration options to encourage readers to explore and modify the source code. There is a Jupyter notebook that corresponds to these pipelines and discusses how to create an LLM-powered chatbot: https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/254-llm-chatbot.

Note

This project is not for production use.

How it works

greedy_causal_lm

The program loads a tokenizer, a detokenizer, and a model (.xml and .bin) into OpenVINO. A prompt is tokenized and passed to the model, which greedily generates one token at a time until the special end-of-sequence (EOS) token is produced. The predicted tokens are detokenized and printed in a streaming fashion.
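
As a minimal illustration of this loop (the sample itself is C++; this Python sketch uses a hypothetical next_token_logits helper standing in for one OpenVINO inference step, and eos_token_id/max_new_tokens as assumed parameters):

import numpy as np

def greedy_generate(next_token_logits, prompt_tokens, eos_token_id, max_new_tokens=100):
    # next_token_logits(tokens) is a hypothetical stand-in for one model
    # inference step; it returns the logits for the next position.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        next_token = int(np.argmax(logits))   # greedy: always take the most likely token
        if next_token == eos_token_id:        # stop on the end-of-sequence token
            break
        tokens.append(next_token)
        # a real pipeline detokenizes and prints each new token here (streaming)
    return tokens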

beam_search_causal_lm

The program loads a tokenizer, a detokenizer, and a model (.xml and .bin) into OpenVINO. A prompt is tokenized and passed to the model. The model predicts a distribution over the next tokens, and group beam search keeps the highest-scoring candidate sequences at each step to explore possible continuations. The resulting sequences are detokenized and printed.
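
The core idea can be sketched as plain beam search (the sample itself implements group beam search, which additionally splits beams into groups with a diversity penalty); next_token_log_probs and the parameters below are hypothetical stand-ins, as in the greedy sketch above:

import numpy as np

def beam_search(next_token_log_probs, prompt_tokens, eos_token_id,
                n_beams=3, max_new_tokens=100):
    # next_token_log_probs(tokens) is a hypothetical stand-in for one model
    # inference step; it returns log-probabilities over the whole vocabulary.
    beams = [(list(prompt_tokens), 0.0)]   # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_new_tokens):
        candidates = []
        for tokens, score in beams:
            log_probs = next_token_log_probs(tokens)
            # extend each beam with its top n_beams continuations
            for token in np.argsort(log_probs)[-n_beams:]:
                candidates.append((tokens + [int(token)], score + float(log_probs[token])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == eos_token_id:
                finished.append((tokens, score))   # this candidate ended with EOS
            elif len(beams) < n_beams:
                beams.append((tokens, score))      # survives to the next step
        if not beams:
            break
    return sorted(finished + beams, key=lambda c: c[1], reverse=True)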

Install OpenVINO Runtime

Install OpenVINO Runtime from an archive (Linux). <INSTALL_DIR> below refers to the extraction location.

Build greedy_causal_lm, beam_search_causal_lm and user_ov_extensions

Linux/macOS

git submodule update --init
source <INSTALL_DIR>/setupvars.sh
cmake -DCMAKE_BUILD_TYPE=Release -S ./ -B ./build/ && cmake --build ./build/ -j

Windows

git submodule update --init
<INSTALL_DIR>\setupvars.bat
cmake -S .\ -B .\build\ && cmake --build .\build\ --config Release -j

Supported models

  1. chatglm (in case of AttributeError: can't set attribute, refer to the chatglm2-6b discussion of that error)
    1. https://huggingface.co/THUDM/chatglm2-6b
    2. https://huggingface.co/THUDM/chatglm3-6b
  2. LLaMA 2
    1. https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
    2. https://huggingface.co/meta-llama/Llama-2-13b-hf
    3. https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
    4. https://huggingface.co/meta-llama/Llama-2-7b-hf
    5. https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
    6. https://huggingface.co/meta-llama/Llama-2-70b-hf
  3. Llama2-7b-WhoIsHarryPotter
  4. OpenLLaMA
    1. https://huggingface.co/openlm-research/open_llama_13b
    2. https://huggingface.co/openlm-research/open_llama_3b
    3. https://huggingface.co/openlm-research/open_llama_3b_v2
    4. https://huggingface.co/openlm-research/open_llama_7b
    5. https://huggingface.co/openlm-research/open_llama_7b_v2
  5. TinyLlama
    1. https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6
    2. https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T
  6. Qwen
    1. https://huggingface.co/Qwen/Qwen-7B-Chat
    2. https://huggingface.co/Qwen/Qwen-7B-Chat-Int4 (in case of AssertionError: Torch not compiled with CUDA enabled, refer to the Qwen-7B-Chat-Int4 discussion of that error)

These pipelines can work with other similar topologies produced by optimum-intel, as long as the model keeps the same signature (the same set of inputs and outputs).
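
To check whether a converted model matches that signature, you can inspect its inputs and outputs with the OpenVINO Python API. A quick sketch, assuming the converted IR was saved as openvino_model.xml in the output directory produced by the conversion step below:

from openvino.runtime import Core

core = Core()
model = core.read_model("./Llama-2-7b-hf/pytorch/dldt/FP16/openvino_model.xml")
# list every model input and output with its name and (possibly dynamic) shape
for model_input in model.inputs:
    print("input: ", model_input.any_name, model_input.partial_shape)
for model_output in model.outputs:
    print("output:", model_output.any_name, model_output.partial_shape)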

Download and convert the model and tokenizers

The --upgrade-strategy eager option is needed to ensure optimum-intel is upgraded to the latest version.

Linux/macOS

source <INSTALL_DIR>/setupvars.sh
python3 -m pip install --upgrade-strategy eager "optimum>=1.14" -r ../../../llm_bench/python/requirements.txt ../../../thirdparty/openvino_contrib/modules/custom_operations/[transformers] --extra-index-url https://download.pytorch.org/whl/cpu
python3 ../../../llm_bench/python/convert.py --model_id meta-llama/Llama-2-7b-hf --output_dir ./Llama-2-7b-hf/ --precision FP16 --stateful
convert_tokenizer ./Llama-2-7b-hf/pytorch/dldt/FP16/ --output ./Llama-2-7b-hf/pytorch/dldt/FP16/ --with-detokenizer --trust-remote-code

Windows

<INSTALL_DIR>\setupvars.bat
python -m pip install --upgrade-strategy eager "optimum>=1.14" -r ..\..\..\llm_bench\python\requirements.txt ..\..\..\thirdparty\openvino_contrib\modules\custom_operations\[transformers] --extra-index-url https://download.pytorch.org/whl/cpu
python ..\..\..\llm_bench\python\convert.py --model_id meta-llama/Llama-2-7b-hf --output_dir .\Llama-2-7b-hf\ --precision FP16 --stateful
convert_tokenizer .\Llama-2-7b-hf\pytorch\dldt\FP16\ --output .\Llama-2-7b-hf\pytorch\dldt\FP16\ --with-detokenizer --trust-remote-code

Run

Usage:

  1. greedy_causal_lm <MODEL_DIR> "<PROMPT>"
  2. beam_search_causal_lm <MODEL_DIR> "<PROMPT>"

Examples:

  1. ./build/greedy_causal_lm ./Llama-2-7b-hf/pytorch/dldt/FP16/ "Why is the Sun yellow?"
  2. ./build/beam_search_causal_lm ./Llama-2-7b-hf/pytorch/dldt/FP16/ "Why is the Sun yellow?"

To enable Unicode characters in the Windows cmd console, open Region settings from the Control Panel: Administrative -> Change system locale -> check Beta: Use Unicode UTF-8 for worldwide language support -> OK, then reboot.