
When completions streaming mode is enabled, the response no longer includes the usage field. #291

Open · 3 tasks done · Hime-Hina opened this issue Mar 18, 2023 · 2 comments
Labels: enhancement (New feature or request), UI V2

Comments

@Hime-Hina

Clear and concise description of the problem

As the official cookbook How to stream completions notes:

Another small drawback of streaming responses is that the response no longer includes the usage field to tell you how many tokens were consumed. After receiving and combining all of the responses, you can calculate this yourself using tiktoken.

Personally, I think it would be useful to implement that feature. Users wouldn't have to check the daily usage breakdown on their account page, and it would make for a more responsive and user-friendly experience.

Suggested solution

I have actually implemented that feature on the front-end already, using the @dqbd/tiktoken library, which is a third-party TypeScript port of the official tiktoken library. OpenAI also provides an example of how to count tokens with the tiktoken library. For the specific implementation, please refer to my repo.
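A minimal sketch of the idea (not the exact code from my repo; the function name and the streamed-chunk shape here are just for illustration):

```ts
// Minimal sketch: count the tokens of a streamed completion on the front-end
// with @dqbd/tiktoken. gpt-3.5-turbo uses the cl100k_base encoding.
import { encoding_for_model } from "@dqbd/tiktoken";

export function countCompletionTokens(streamedDeltas: string[]): number {
  const enc = encoding_for_model("gpt-3.5-turbo");
  try {
    // Combine all streamed deltas into the full completion text,
    // then encode it and count the resulting tokens.
    const fullText = streamedDeltas.join("");
    return enc.encode(fullText).length;
  } finally {
    // The wasm-backed encoder must be freed explicitly to avoid leaking memory.
    enc.free();
  }
}
```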

Alternative

Maybe there is a way to implement it on the back-end by providing an API, but I have not succeeded so far because it seems impossible to load a wasm file when deploying on Vercel. I followed the tutorial in the Vercel docs and tried some plugins to load the wasm file, but failed. If anyone knows about this, please let me know! 😁

Additional context

[screen recording: GIF 2023-3-18 2-14-08]

I have not optimized my code, but it suffices for now. There are some bugs, as shown below:

[screenshot: 7_local_1__]

The first completion is primed with \n\n, and 20 tokens are used. After some testing, I observed that the reported number of tokens for the completion seems to equal the number of tokens in the completion content only, which indicates that special tokens and line breaks are not included in the count (please refer to the code for more details).

[screenshot: 7_local_1_]

The second completion has exactly the same content as the first one, but is not primed with \n\n. Since \n\n is encoded as token 271, it takes up exactly one token; therefore the result is 19, which is what we expected.
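This is easy to verify directly (a quick check, assuming the same @dqbd/tiktoken setup as above):

```ts
import { get_encoding } from "@dqbd/tiktoken";

const enc = get_encoding("cl100k_base");
console.log(enc.encode("\n\n")); // Uint32Array [ 271 ]  -> "\n\n" is a single token
enc.free();
```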

But here is the paradox:

[screenshot: 7_remote]

The daily usage page reports 19 for both. I have no idea why; this requires further testing.

If you know anything about this, please let me know! I would appreciate it.

In addition, my implementation is still quite rough and only supports the 'gpt-3.5' model; I have not tested it on other models. If you have any advice, please let me know as well.

@Hime-Hina added the enhancement (New feature or request) label on Mar 18, 2023
@yzh990918
Member

Thank you for the thoughtful suggestion.

@CNSeniorious000
Contributor

Thanks a lot @Hime-Hina! Inspired by your demo, I integrated the latest tiktoken library into my chatgpt-demo fork; you can view a live demo here. I reached some conclusions about token counting.

In summary, the pseudo-formula can be represented as:

$$
\begin{align*}
\text{prompt tokens} &= \sum_{\texttt{msg}} \left( \texttt{encode(msg).length} + 4 \right) + 3 \\
\text{completion tokens} &= \texttt{encode(completion).length}
\end{align*}
$$

Specifically, the 3 extra tokens for the context are for <|im_start|>, "assistant", \n, and the 4 extra tokens for each message are for <|im_start|>, role/name, \n, <|im_end|>.
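As a rough sketch (not my exact implementation; the message type and helper name here are made up for illustration), the formula translates to something like:

```ts
import { get_encoding } from "@dqbd/tiktoken";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough sketch of the pseudo-formula above (gpt-3.5-turbo / cl100k_base):
//   prompt tokens     = sum over messages of (encode(msg).length + 4) + 3
//   completion tokens = encode(completion).length
export function countChatTokens(messages: ChatMessage[], completion: string) {
  const enc = get_encoding("cl100k_base");
  try {
    const promptTokens =
      messages.reduce(
        // +4 per message: <|im_start|>, role/name, \n, <|im_end|>
        (sum, msg) => sum + enc.encode(msg.content).length + 4,
        0,
      ) + 3; // +3 to prime the reply: <|im_start|>, "assistant", \n
    const completionTokens = enc.encode(completion).length;
    return { promptTokens, completionTokens };
  } finally {
    enc.free();
  }
}
```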

I've compared the token count in the API response's header with the count I calculated myself in both Python and JavaScript, and found no discrepancy. (Interestingly, I also found that the official tokenizer demo is in fact a GPT-3 tokenizer, which encodes Chinese characters much less efficiently than gpt-3.5-turbo's tokenizer.)

OpenAI also has a note on ChatML, the markup language they created for conversations.


As you said, getting WASM to work on edge functions is incredibly tough; I spent almost half a day fighting bugs. At first I found that this approach works well on a self-hosted route, which is similar to your solution in the dev branch of your demo repo. But it doesn't work on Vercel or Netlify Edge Functions (yes, serverless functions work, but they can't stream responses). In the end, I used fetch to load the wasm and a dynamic import to solve this.
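Roughly, the idea looks like this (a simplified sketch; the module paths follow @dqbd/tiktoken's docs for edge runtimes, and the wasm URL is whatever your deployment exposes, so treat the details as assumptions):

```ts
// Simplified sketch: load tiktoken's wasm in an edge function via fetch
// plus a dynamic import, instead of bundling the wasm statically.
import model from "@dqbd/tiktoken/encoders/cl100k_base.json";

export async function createEncoder(wasmUrl: string) {
  // Dynamic import so the wasm-backed module is only pulled in at runtime.
  const { init, Tiktoken } = await import("@dqbd/tiktoken/lite/init");

  // Fetch the wasm binary ourselves and hand it to tiktoken's init hook.
  const wasmBinary = await fetch(wasmUrl).then((res) => res.arrayBuffer());
  await init((imports) => WebAssembly.instantiate(wasmBinary, imports));

  return new Tiktoken(model.bpe_ranks, model.special_tokens, model.pat_str);
}
```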

You can view my implementation through the following pages:
