
.compute() not compatible with perplexity for text-generation #483

hans-ekbrand opened this issue Aug 7, 2023 · 1 comment

I'm trying to evaluate a model that does not fit on a single GPU and therefore needs to be loaded with device_map="auto" and torch_dtype=torch.float16. As far as I understand, evaluate.evaluator(...).compute() does not support passing arguments through to AutoModelForCausalLM.from_pretrained(), nor does it accept an already instantiated model, so I had to resort to using a pipeline.

$ python3.11
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import evaluate, torch, accelerate
[2023-08-07 17:06:13,359] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
>>> from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
>>> from datasets import load_dataset
>>> model_id = "TheBloke/Llama-2-7b-chat-fp16"
>>> my_pipe = pipeline(model=model_id, device_map="auto", torch_dtype=torch.float16, max_length=256)
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:37<00:00, 18.81s/it]
>>> my_data = load_dataset("Abirate/english_quotes", split='train').select([1])
>>> my_metric=evaluate.load("perplexity")
>>> my_evaluator = evaluate.evaluator(task="text-generation")
>>> result = my_evaluator.compute(model_or_pipeline=my_pipe,
...                               data=my_data, input_column = "quote",
...                               metric=my_metric,
...                               tokenizer=AutoTokenizer.from_pretrained(model_id))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hans/.local/lib/python3.11/site-packages/evaluate/evaluator/base.py", line 261, in compute
    metric_results = self.compute_metric(
                     ^^^^^^^^^^^^^^^^^^^^
  File "/home/hans/.local/lib/python3.11/site-packages/evaluate/evaluator/base.py", line 467, in compute_metric
    result = metric.compute(**metric_inputs, **self.METRIC_KWARGS)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hans/.local/lib/python3.11/site-packages/evaluate/module.py", line 433, in compute
    self._finalize()
  File "/home/hans/.local/lib/python3.11/site-packages/evaluate/module.py", line 385, in _finalize
    file_paths, filelocks = self._get_all_cache_files()
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hans/.local/lib/python3.11/site-packages/evaluate/module.py", line 302, in _get_all_cache_files
    raise ValueError(
ValueError: Evaluation module cache file doesn't exist. Please make sure that you call `add` or `add_batch` at least once before calling `compute`.

How can one compute perplexity with a custom pipeline?

SamuelGong commented Dec 19, 2023

As a user, I found a workaround: defining a customized metric, carefully. It works for me, so I am sharing it here.

Step 1: At some /path/to/somewhere, create a folder my_perplexity and, inside it, a Python file my_perplexity.py. Copy the full content of the official perplexity.py (e.g., here) and paste it there.

Step 2: Comment out the following two lines in the definition of _compute() in my_perplexity.py:

model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

and add something like this:

model, tokenizer = model_id

This way, the customized model and tokenizer are passed in through the model_id argument. Note that model_id is the only argument you can repurpose like this without side effects in this context.
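
For orientation, here is a minimal sketch of what the edited part of _compute() in my_perplexity.py could look like; the signature is paraphrased from the official perplexity.py and may differ slightly between evaluate versions:

def _compute(self, predictions, model_id, batch_size=16, add_start_token=True, device=None, max_length=None):
    # Original lines, now commented out:
    # model = AutoModelForCausalLM.from_pretrained(model_id)
    # tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Workaround: treat model_id as a (model, tokenizer) tuple supplied by the caller.
    model, tokenizer = model_id

    # ... rest of the copied implementation unchanged ...

Depending on the evaluate version, the copied _compute() may also move the model with model = model.to(device) right after loading; if your model was loaded with device_map="auto", you may need to comment out that line as well.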

Step 3: In the original evaluation code, we can now use a customized model and tokenizer in this way:

perplexity = evaluate.load("/path/to/somewhere/my_perplexity", module_type="metric")
results = perplexity.compute(
    model_id=(model, tokenizer),
    predictions=predictions  # a list of input texts
)
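
To connect this back to the original question, the model and tokenizer could be taken straight from the pipeline built in the first post; this bridging snippet is an assumption on my part, not part of the workaround above:

# Reuse the model and tokenizer already loaded inside the pipeline
# (device_map="auto", torch_dtype=torch.float16), so nothing is loaded twice.
model = my_pipe.model
tokenizer = my_pipe.tokenizer

# The "quote" column of the dataset from the first post serves as the input texts.
predictions = [row["quote"] for row in my_data]

perplexity = evaluate.load("/path/to/somewhere/my_perplexity", module_type="metric")
results = perplexity.compute(
    model_id=(model, tokenizer),
    predictions=predictions,
)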

Still, I hope the developers will support this directly; it does not look intrinsically difficult.
