RuntimeError: Internal: could not parse ModelProto from /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct/tokenizer.model #34017

Open
Itime-ren opened this issue Oct 8, 2024 · 1 comment
Labels
bug, Core: Tokenization

Comments

@Itime-ren

System Info

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in #24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Traceback (most recent call last):
  File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 479, in <module>
    main()
  File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 457, in main
    write_tokenizer(
  File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 367, in write_tokenizer
    tokenizer = tokenizer_class(input_tokenizer_path)
  File "/home/transformers/src/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
    super().__init__(
  File "/home/transformers/src/transformers/tokenization_utils_fast.py", line 132, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/transformers/src/transformers/models/llama/tokenization_llama.py", line 171, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/home/transformers/src/transformers/models/llama/tokenization_llama.py", line 198, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct/tokenizer.model
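
This error means sentencepiece could not deserialize tokenizer.model as a ModelProto. Llama 3.x releases ship a tiktoken-style tokenizer.model (one base64-encoded token and its rank per line) rather than a sentencepiece model, so the slow-tokenizer code path above is bound to fail on it. A minimal sketch to confirm which format the file is in (the tiktoken-format description is an assumption about the Meta checkpoint, not something stated in this traceback):

import sentencepiece as spm

path = "/Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct/tokenizer.model"
sp = spm.SentencePieceProcessor()
try:
    sp.Load(path)  # succeeds only for a serialized SentencePiece ModelProto
    print("sentencepiece model, vocab size:", sp.GetPieceSize())
except RuntimeError:
    # a tiktoken-style file is plain text: "<base64 token> <rank>" per line
    with open(path, "rb") as f:
        print("not a ModelProto; first line:", f.readline()[:60])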

Who can help?

@ArthurZucker @itazap

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

python3 /Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
  --input_dir /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct \
  --model_size 1B \
  --output_dir /Data_disk/meta_llama/meta_llama3.2/out
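
Note that this invocation relies on the script's default tokenizer handling, which assumes a sentencepiece model; that is exactly the code path that fails above. Recent versions of convert_llama_weights_to_hf.py accept a --llama_version flag that switches Llama 3.x checkpoints to the tiktoken-based tokenizer converter. Assuming your checkout includes that flag, a sketch of the adjusted command:

python3 /Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
  --input_dir /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct \
  --model_size 1B \
  --output_dir /Data_disk/meta_llama/meta_llama3.2/out \
  --llama_version 3.2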

Expected behavior

The conversion completes and writes safetensors files to the output directory.

Itime-ren added the bug label Oct 8, 2024
LysandreJik added the Core: Tokenization label Oct 8, 2024
@LysandreJik
Member

Hey @Itime-ren, what's the content of /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct?

If you're trying to use Llama 3.2 1B Instruct, why not use the meta-llama/Llama-3.2-1B-Instruct repo on the Hugging Face Hub, which is already transformers-compatible?
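
For reference, a minimal sketch of loading that repo directly (repo id assumed to be meta-llama/Llama-3.2-1B-Instruct, a gated repo that requires accepting Meta's license on the Hub), which downloads safetensors weights with no manual conversion step:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed repo id; gated on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # fetches safetensors shards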
