
Refactor perplexity implementations to be usable with evaluators #240

Open
mathemakitten opened this issue Aug 9, 2022 · 0 comments · May be fixed by #252

Currently the perplexity metric and measurement both instantiate an entire model object within the _compute() function and run inference. This breaks the pattern where only predictions, references, and other metadata are passed in and only the metric computation is performed (i.e. inference has already been run before the metric reaches its _compute() function, which is the case for most other metrics).

[Screenshot: perplexity's _compute() instantiating a model and running inference inside the metric]
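
For concreteness, here is a rough sketch of the two calling conventions (the accuracy example is purely illustrative, and the perplexity argument names follow the current implementation and may differ between evaluate versions):

```python
import evaluate

# Conventional pattern: inference happens upstream; the metric only scores
# predictions against references.
accuracy = evaluate.load("accuracy")
accuracy.compute(predictions=[0, 1, 1], references=[0, 1, 0])

# Current perplexity pattern: _compute() itself loads the model via
# AutoModelForCausalLM.from_pretrained(model_id) and runs inference.
perplexity = evaluate.load("perplexity", module_type="metric")
perplexity.compute(
    model_id="gpt2",
    add_start_token=False,
    input_texts=["lorem ipsum", "Happy Birthday!"],
)
```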

Evaluators already instantiate model-like objects via model_or_pipeline, which makes it redundant to instantiate another copy of the model/pipeline within the _compute() function. It seems this is done because perplexity can be used in a standalone manner to calculate the perplexity of some text with respect to a pretrained model (e.g. perplexity.compute(model_id='gpt2', add_start_token=False, input_texts=input_texts)), trading developer flexibility for user-friendliness.
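
For comparison, the existing text-classification evaluator already handles model instantiation through model_or_pipeline; roughly like this (the checkpoint and dataset below are just placeholders for illustration):

```python
import evaluate
from datasets import load_dataset

# The evaluator instantiates the model/pipeline once and runs inference itself;
# the metric only sees the resulting predictions and references.
task_evaluator = evaluate.evaluator("text-classification")
results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
    data=load_dataset("imdb", split="test[:100]"),
    metric="accuracy",
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
)
```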

For background, this is relevant because I'm in the middle of implementing a "text-generation" evaluator, and it makes sense for perplexity to be the default metric for text generation. We could alternatively write one-off logic in the text-generation evaluator to compute perplexity with the model instantiated in the evaluator, instead of using the metric implemented in evaluate, but that seems suboptimal and out-of-pattern. Additionally, the perplexity implementations are currently limited to models which can easily be instantiated via AutoModelForCausalLM.from_pretrained(model_id); they do not support a generic call to a language model. A rough sketch of one possible direction follows below.
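
One possible direction (just a sketch, not a concrete proposal): have the evaluator compute per-example negative log-likelihoods with whatever model object it already holds, and let the metric's _compute() do only the exponentiation and aggregation. The helper name and signature below are hypothetical:

```python
import numpy as np
import torch

def example_nlls(model, tokenizer, texts, device="cpu"):
    # Hypothetical evaluator-side helper: works with any model whose forward()
    # accepts `labels` and returns a language-modeling loss, not only models
    # loaded via AutoModelForCausalLM.from_pretrained().
    nlls = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        nlls.append(out.loss.item())  # mean NLL per token for this example
    return nlls

def _compute(predictions, references=None):
    # Hypothetical metric-side computation: `predictions` are the per-example
    # mean NLLs computed upstream; no model is instantiated here.
    perplexities = np.exp(np.asarray(predictions, dtype=np.float64))
    return {
        "perplexities": perplexities.tolist(),
        "mean_perplexity": float(perplexities.mean()),
    }
```

The standalone model_id code path could remain as a thin convenience wrapper on top, while evaluators pass in outputs from the model they have already instantiated.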
