Add llm #2

Merged · 18 commits · Oct 23, 2023
Conversation

Wovchena (Collaborator)

Ticket 121865
@Wovchena (Collaborator, Author)

> @slyalin, @apaniukov, @ilya-lavrenov - see this PR. It is basically dependent on llama.cpp due to tokenization, and this is not great, to be honest. Could we cover that differently, with ov-tokenizers perhaps?

> Yes, there is a Python full pipeline example that uses the llama tokenizer, which can be used here after tokenizer conversion: https://github.com/apaniukov/openvino_contrib/tree/tokenizer-fix-decode/modules/custom_operations/user_ie_extensions/tokenizer/python#text-generation-pipeline
>
> ```
> python demos/thirdparty/llama.cpp/convert.py open_llama_3b_v2/ --vocab-only --outfile open_llama_3b_v2/vocab.gguf
> ```
>
> We could provide a similar convert_tokenizer.py CLI script to get tokenizer/detokenizer.xml from the HF hub or disk.

@apaniukov is there a C++ API for tokenization and detokenization?

@apaniukov (Contributor)

> @slyalin, @apaniukov, @ilya-lavrenov - see this PR. It is basically dependent on llama.cpp due to tokenization, and this is not great, to be honest. Could we cover that differently, with ov-tokenizers perhaps?
>
> Yes, there is a Python full pipeline example that uses the llama tokenizer, which can be used here after tokenizer conversion: https://github.com/apaniukov/openvino_contrib/tree/tokenizer-fix-decode/modules/custom_operations/user_ie_extensions/tokenizer/python#text-generation-pipeline
>
> ```
> python demos/thirdparty/llama.cpp/convert.py open_llama_3b_v2/ --vocab-only --outfile open_llama_3b_v2/vocab.gguf
> ```
>
> We could provide a similar convert_tokenizer.py CLI script to get tokenizer/detokenizer.xml from the HF hub or disk.
>
> @apaniukov is there a C++ API for tokenization and detokenization?

Yes, there is. Right now it is on the openvino_contrib branch. Here is an instruction for building and installation.

It does not support llama tokenizers from the tokenizer.json file, only from *.model files. All llama tokenizers should be interchangeable, so you can get one from here (or any other llama repo that has a .model file).

Then you can convert the tokenizer to OV models:

```python
from transformers import AutoTokenizer
from openvino import save_model
from ov_tokenizer import init_extension, convert_tokenizer


init_extension("path/to/libuser_ov_extensions.so")

hf_tokenizer = AutoTokenizer.from_pretrained("microsoft/Llama2-7b-WhoIsHarryPotter")
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_decoder=True)

save_model(ov_tokenizer, "tokenizer.xml")
save_model(ov_detokenizer, "detokenizer.xml")
```

From here you can work with them like any other OV model. You can also add similar postprocessing to the llama model to get out_token right away instead of logits.
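As an illustration of that idea, here is a minimal sketch of appending an argmax (TopK with k=1) over the vocabulary axis so the model returns the token id directly. The model path "openvino_model.xml", the "logits" output name, and the opset version are assumptions for this sketch, not taken from this PR:

```python
# Hypothetical sketch: add argmax postprocessing to an LLM IR so inference
# yields out_token directly instead of raw logits. Paths and names are assumed.
import numpy as np
import openvino.runtime as ov
import openvino.runtime.opset12 as opset
from openvino import save_model

core = ov.Core()
model = core.read_model("openvino_model.xml")  # assumed model path

logits = model.output("logits")  # assumed output name; shape [batch, seq, vocab]
k = opset.constant(np.array(1, dtype=np.int64))
topk = opset.topk(logits, k, axis=-1, mode="max", sort="value")

# TopK output 1 holds the indices, i.e. the argmax token ids (shape [batch, seq, 1])
new_model = ov.Model([topk.output(1)], model.get_parameters(), "llm_with_argmax")
save_model(new_model, "llm_with_argmax.xml")
```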

Models cannot work with strings at this point, so you need to convert the input to a uint8 tensor with a predefined format; see pack_strings. To get strings from the detokenizer's uint8 output, see unpack_strings.
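A rough usage sketch of that round-trip, assuming pack_strings/unpack_strings come from the ov_tokenizer package linked above and the XML files were produced by the earlier snippet (the device choice and positional output indexing are my assumptions):

```python
# Hypothetical round-trip: strings -> packed uint8 -> token ids -> strings.
from openvino import Core
from ov_tokenizer import pack_strings, unpack_strings  # from the package linked above

core = Core()
core.add_extension("path/to/libuser_ov_extensions.so")  # string ops live in the extension

tokenizer = core.compile_model("tokenizer.xml", "CPU")
detokenizer = core.compile_model("detokenizer.xml", "CPU")

packed = pack_strings(["Who is Harry Potter?"])  # strings -> uint8 tensor
token_ids = tokenizer(packed)["input_ids"]       # tokenize the packed batch

u8_out = detokenizer(token_ids)[0]               # assume the first output is packed uint8
print(unpack_strings(u8_out))                    # uint8 tensor -> list of strings
```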

llm/llm.cpp (outdated)

```cpp
    throw std::runtime_error("Model and vocab number of tokens don't match");
}
float* logits = ireq.get_tensor("logits").data<float>() + (prompt.size() - 1) * n_vocab;
ptrdiff_t out_token = std::max_element(logits, logits + n_vocab) - logits;
```
@Wovchena (Collaborator, Author)

@yury-gorbachev, I was told that you requested to add beam search. Should it be a separate application, or should I provide only a beam search implementation, given that a beam size of 1 is greedy sampling?

@Wovchena (Collaborator, Author)

I'm going to merge greedy sampling for now to unblock others.
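For context on why a beam size of 1 collapses to greedy, here is a toy, framework-free sketch of the relationship (an illustration only, not the PR's C++ code; `step` is a hypothetical function returning next-token log-probabilities):

```python
# Toy beam search over an abstract step(ids) -> log-probs function.
# With beam_width=1 the top-k pick is a plain argmax, i.e. greedy sampling.
import numpy as np

def beam_search(step, start_ids, beam_width, n_steps):
    beams = [(list(start_ids), 0.0)]  # (token ids, cumulative log-prob)
    for _ in range(n_steps):
        candidates = []
        for ids, score in beams:
            log_probs = step(ids)  # shape: [vocab]
            for t in np.argsort(log_probs)[-beam_width:]:
                candidates.append((ids + [int(t)], score + float(log_probs[t])))
        # Keep only the beam_width best-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]
```

With beam_width=1 each step keeps a single argmax candidate, which is exactly the greedy sampling being merged here.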

llm/README.md (outdated)

## Supported models

1. [LLaMA 2](https://huggingface.co/meta-llama/Llama-2-13b-hf)
Collaborator

probably any models of these families?

Wovchena marked this pull request as ready for review on Oct 23, 2023 at 10:51.
Wovchena merged commit e2cff7b into openvinotoolkit:master on Oct 23, 2023. 1 check passed.
```cpp
}
ireq.get_tensor("input_ids").set_shape(tokenizer.get_tensor("input_ids").get_shape()); // TODO: replace with ireq.set_tensor("input_ids", tokenizer.get_tensor("input_ids")); after it's fixed
ireq.get_tensor("attention_mask").set_shape(tokenizer.get_tensor("input_ids").get_shape());
std::copy_n(tokenizer.get_tensor("input_ids").data<int32_t>(), tokenizer.get_tensor("input_ids").get_size(), ireq.get_tensor("input_ids").data<int32_t>());
```
Contributor

We have Tensor::copy_to, which can also allocate the output tensor.

```cmake
)
else()
target_compile_options(llm PRIVATE -Wall) # Display all warnings
target_compile_options(sentencepiece-static PRIVATE -Wno-stringop-overflow) # Disable the warning from openvino_contrib
```
Contributor

let's move this code to contrib

eaidova pushed a commit that referenced this pull request on Dec 14, 2023: "Verify beam_idx added, upgrade OpenVINO"
vshampor pushed a commit to vshampor/openvino.genai that referenced this pull request on May 13, 2024
as-suvorov pushed a commit to as-suvorov/openvino.genai that referenced this pull request on May 16, 2024
sammysun0711 added a commit that referenced this pull request on Jul 23, 2024: "Added embedding pipeline and embedding handle, encapsulated handle code"