
Bug: Model ignores system prompt when using /completion endpoint #8393

Closed

andreys42 opened this issue Jul 9, 2024 · 5 comments
Labels: bug-unconfirmed, medium severity, stale

Comments


andreys42 commented Jul 9, 2024

What happened?

I'm testing the Meta-Llama-3-8B-Instruct-Q8_0 model with the llama.cpp HTTP server, both through the chatui interface and via direct requests using Python's requests library.

When I use chatui with the chatPromptTemplate option, everything works fine, and the model's output is predictable and desirable.

However, when I make direct requests to the same server with the same model, the output is messy (lots of newline characters, repetition of the question, and so on) and most of the system instructions are ignored (although the general logic of the output is fine): when I ask it to answer only with 0 or 1, the model still tries to justify its decision in the output.

My attempts so far have been:

  1. Using the same template (chatPromptTemplate from chatui) as the prompt key, filled in with the user requests and assistant answers (see the request sketch after this list).

  2. Passing {"chat-template": "llama3"} in the request.

  3. Using the prompt as a raw string containing only the current user's prompt, with the system instructions in the "system_prompt" key.
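
Roughly, the direct requests I'm describing look like this (the server address and generation parameters are just placeholders, not my exact values):

import requests

# Placeholder system/user text for illustration.
system = "Answer only with 0 or 1."
user = "Is the sky blue?"

# Llama 3 Instruct prompt template built by hand.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{system}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    f"{user}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

# Assumed server address; adjust to wherever llama-server is listening.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": prompt,
        "n_predict": 16,
        "temperature": 0,
        "stop": ["<|eot_id|>"],
    },
)
print(resp.json()["content"])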

I've spent a lot of time trying to figure out the issue, but all of these approaches work much worse than the chatui way.

I believe the problem lies in my understanding of how to format the input prompts; I'm not familiar enough with the prompt syntax documentation.

Name and Version

latest libs
Meta-Llama-3-8B-Instruct-Q8_0

What operating system are you seeing the problem on?

No response

Relevant log output

No response

andreys42 added the bug-unconfirmed and medium severity labels Jul 9, 2024
dspasyuk (Contributor) commented Jul 9, 2024

@andreys42 Unless you are using llama-cli in conversation mode (-cnv), you will need to use --in-prefix/--in-suffix or wrap your input in the Llama 3 prompt template.

andreys42 (Author) commented

@andreys42 Unless you are using llama-cli in conversation mode (-cnv), you will need to use --in-prefix/--in-suffix or wrap your input in the Llama 3 prompt template.

@dspasyuk Thanks for the suggestion, --in-prefix/--in-suffix does indeed make sense; I will try it, thank you.
As for using the Llama 3 prompt template for my input, I did that and mentioned it above; it made no difference for me...

matteoserva (Contributor) commented

You are probably using the wrong template.

Send your request to the /completion endpoint, then open the /slots endpoint to see what was effectively sent.

You can compare the good and bad prompts to see what was wrong.
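
For instance, a minimal sketch of that check (assuming the server listens on localhost:8080 and exposes the /slots endpoint; exact field names may vary between llama.cpp versions):

import requests

base = "http://localhost:8080"  # assumed server address

# Send the hand-built prompt to /completion first.
requests.post(f"{base}/completion",
              json={"prompt": "...your formatted prompt...", "n_predict": 8})

# Then read the slot state; each slot reports the prompt it actually processed,
# which can be diffed against the prompt produced by the chatui path.
for slot in requests.get(f"{base}/slots").json():
    print(slot.get("id"), repr(slot.get("prompt")))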

dspasyuk (Contributor) commented

@andreys42 here are the settings I use in llama.cui that work well across major models:

../llama.cpp/llama-cli --model ../../models/meta-llama-3-8b-instruct-q5_k_s.gguf --n-gpu-layers 25 -cnv --simple-io -b 2048 --ctx_size 0 --temp 0 --top_k 10 --multiline-input --chat-template llama3 --log-disable

Here is the result:

Screencast.from.2024-07-10.10.20.44.AM.webm

You can test it for yourself here: https://github.com/dspasyuk/llama.cui

github-actions bot added the stale label Aug 10, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
