
Steering sampling and decoding strategy with num_beams and do_sample #8353

Open
d-kleine opened this issue Jul 7, 2024 · Discussed in #8265 · 0 comments
d-kleine commented Jul 7, 2024

Discussed in #8265

Originally posted by d-kleine July 3, 2024
Enhancement idea
Would it be possible to add num_beams and do_sample to llama.cpp to make it easier to steer the sampling and decoding strategy?

For example, for greedy decoding:
Setting temperature to 0 makes the model deterministic by concentrating the probability mass on the most likely token. However, this setting alone does not control the overall decoding strategy.

  • Setting num_beams to 1 ensures that the model does not use beam search, a strategy that explores multiple candidate sequences to find the most probable one.
  • Setting do_sample to False ensures that the model does not use sampling methods such as multinomial sampling, which introduce randomness into token selection (see the sketch below).
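
For reference, a minimal sketch of how these two parameters select the decoding strategy in Transformers (the model name and prompt are just illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Greedy decoding: num_beams=1 disables beam search and do_sample=False
# disables sampling, so the most likely token is chosen at every step.
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    num_beams=1,
    do_sample=False,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```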

Currently, llama.cpp has no native support for these parameters; passing them fails with Error: unknown parameter.
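
As a workaround, greedy-like decoding can be approximated today with the existing sampling flags (a sketch; the binary name and model path are illustrative and may vary by version):

```sh
# Approximate greedy decoding with llama.cpp's existing sampling flags:
# --temp 0 collapses the distribution onto the most likely token.
./llama-cli -m model.gguf --temp 0 -p "The capital of France is"
```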

Please see https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig

What do you think about this idea (before making it an Enhancement in Issues)?
