GenAI (LLM): how to capture streaming #1170

Open · lmolkova opened this issue Jun 20, 2024 · 3 comments

lmolkova (Contributor) commented Jun 20, 2024

Some questions (and proposals) on capturing streaming LLM completions (a rough instrumentation sketch follows the list):

  1. Should the GenAI span cover the duration until the last token in the streaming case?
    • Yes; otherwise how do we capture the completion, errors, usage, etc.?
  2. Do we need an event when the first token arrives, or another span to capture the time from the start of the call to the first token?
    • This might be too verbose and not particularly useful.
  3. Do we need some indication on the span that it represents a streaming call?
  4. Do we need new metrics?
    • See Add LLM model server metrics #1103 for server streaming metrics:
      • Time-to-first-token
      • Time-to-next-token
      • Number of active streams would also be useful: streaming seems to be hard to get right and error-prone, and users would appreciate being able to detect when they don't close streams, don't read them to the end, etc.
  5. What should gen_ai.client.operation.duration capture?
    • Same as the span: time-to-last-token.
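
To make question 1 concrete, here is a minimal sketch (not a proposal for the conventions) of keeping the GenAI span open until the last chunk, so errors, the finish reason, and usage reported with the final chunk can still land on the span. The `client.chat(...)` call and the event name are placeholders; `gen_ai.operation.name` and `gen_ai.request.model` are existing attributes.

```python
import time

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("genai-streaming-demo")


def stream_chat(client, model, messages):
    """Wrap a streaming chat completion so the span ends only after the last chunk."""
    span = tracer.start_span(f"chat {model}")
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.request.model", model)
    start = time.monotonic()
    got_first_chunk = False
    try:
        # `client.chat(..., stream=True)` stands in for any streaming API.
        for chunk in client.chat(model=model, messages=messages, stream=True):
            if not got_first_chunk:
                got_first_chunk = True
                # Hypothetical event name; marks time-to-first-chunk (question 2).
                span.add_event(
                    "gen_ai.first_chunk", {"elapsed_s": time.monotonic() - start}
                )
            yield chunk
        # Usage and finish reason typically arrive with the last chunk and can be
        # recorded as span attributes here, before the span ends.
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(Status(StatusCode.ERROR))
        raise
    finally:
        span.end()
```

One nice property of ending the span in `finally`: if the caller abandons the iterator early, the generator's `GeneratorExit` still runs the `finally` block and the span is closed, which ties into the "number of active streams" concern in question 4.
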
karthikscale3 (Contributor) commented:

Token Generation Latency is another metric that could be useful.
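
One possible way to approximate it when providers only report token counts at the end of the stream (a sketch; the timestamps and `completion_tokens` field are illustrative and provider-specific):

```python
def time_per_output_token(first_chunk_time: float, last_chunk_time: float,
                          completion_tokens: int) -> float:
    """Average generation latency per output token, derived from chunk timestamps
    and the usage reported with the final chunk (illustrative only)."""
    return (last_chunk_time - first_chunk_time) / max(completion_tokens, 1)
```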

TaoChenOSU (Contributor) commented:

time-to-first-token and time-to-next-token could be hard for some SDKs to capture, since a single chunk returned by some APIs may contain multiple tokens. Would time-to-first-response make more sense?

Another option would be to recommend indicating streaming vs. non-streaming in the operation name, e.g. streaming chat for streaming and chat for non-streaming.
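
For illustration, that operation-name suggestion would look roughly like this on the client span (the streaming chat value is the proposal above, not an existing convention):

```python
from opentelemetry import trace

tracer = trace.get_tracer("genai-streaming-demo")

# Non-streaming call
with tracer.start_as_current_span("chat gpt-4") as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    ...

# Streaming call, with streaming indicated in the operation name as proposed above
with tracer.start_as_current_span("streaming chat gpt-4") as span:
    span.set_attribute("gen_ai.operation.name", "streaming chat")
    ...
```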

lmolkova (Contributor, Author) commented Oct 9, 2024

> time-to-first-token and time-to-next-token could be hard for some SDKs to capture, since a single chunk returned by some APIs may contain multiple tokens. Would time-to-first-response make more sense?

Good catch! Maybe time-to-first-chunk and time-to-next-chunk?
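
If that naming were adopted, a client could record the two values as histograms roughly along these lines (instrument names are placeholders, not approved conventions; `stream` stands for any streaming response iterator):

```python
import time

from opentelemetry import metrics

meter = metrics.get_meter("genai-streaming-demo")

time_to_first_chunk = meter.create_histogram(
    "gen_ai.client.time_to_first_chunk", unit="s",
    description="Time from the start of the request to the first streamed chunk",
)
time_to_next_chunk = meter.create_histogram(
    "gen_ai.client.time_to_next_chunk", unit="s",
    description="Time between consecutive streamed chunks",
)


def consume(stream, request_start: float):
    """Record chunk timings while draining a streaming response (illustrative only)."""
    previous = request_start
    for index, chunk in enumerate(stream):
        now = time.monotonic()
        if index == 0:
            time_to_first_chunk.record(now - request_start)
        else:
            time_to_next_chunk.record(now - previous)
        previous = now
```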
