
Steering sampling and decoding strategy with num_beams and do_sample #8353

Open
d-kleine opened this issue Jul 7, 2024 · Discussed in #8265 · 0 comments
d-kleine commented Jul 7, 2024

Discussed in #8265

Originally posted by d-kleine July 3, 2024
Enhancement idea
Would it be possible to add num_beams and do_sample to llama.cpp to make it easier to steer the sampling and decoding strategy?

For example, for greedy decoding:
Setting temperature to 0 makes the model deterministic by concentrating the probability mass on the most likely token. However, this setting alone does not control the overall decoding strategy.

  • Setting num_beams to 1 ensures that the model does not use beam search, a strategy that explores multiple candidate sequences to find the most probable one.
  • Setting do_sample to False ensures that the model does not use sampling methods such as multinomial sampling, which introduce randomness into token selection (see the sketch below).
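
For reference, a minimal sketch of how these two parameters select the decoding strategy in Transformers (the model name and prompt are just illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Greedy decoding: num_beams=1 disables beam search and do_sample=False
# disables sampling, so the most likely token is chosen at every step.
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    num_beams=1,
    do_sample=False,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```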

Currently, llama.cpp has no native support for these parameters; passing them fails with Error: unknown parameter.
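
As a workaround, greedy-like decoding can be approximated today with the existing sampling flags (a sketch; the binary name and model path are illustrative and may vary by version):

```sh
# Approximate greedy decoding with llama.cpp's existing sampling flags:
# --temp 0 collapses the distribution onto the most likely token.
./llama-cli -m model.gguf --temp 0 -p "The capital of France is"
```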

Please see https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig

What do you think about this idea (before making it an Enhancement in Issues)?
