
adding inference trace injection #36890

Open
wants to merge 43 commits into
base: main
Changes from all commits
43 commits
06cef91
adding inference trace injection
Aug 14, 2024
9dc2cf9
changing the interface based on feedback
Aug 16, 2024
58a032b
updates
Aug 16, 2024
ec1cd16
changing name of environment variable
Aug 20, 2024
3270076
changes based on review comments and some other changes
Sep 6, 2024
7cbbc0b
file name change
Sep 6, 2024
941a9ae
fixing exception handling
Sep 10, 2024
bcc6e74
relocating inference trace instrumentation
Sep 10, 2024
709923c
reverting change in azure core tracing
Sep 10, 2024
baac83f
Merge branch 'main' into mhietala/inference_genai_tracing
Sep 16, 2024
a64d870
fixes
Sep 16, 2024
198b9cd
changing span and model name for cases when model info not available
Sep 17, 2024
cd8bba2
some fixes
Sep 17, 2024
b28a3fe
adding sync trace tests
Sep 20, 2024
b549b38
fix and async trace test
Sep 23, 2024
469d32c
updating readme and setup
Sep 23, 2024
f1424a1
adding tracing sample
Sep 23, 2024
92da09a
changes based on review comments
Sep 25, 2024
d9652f5
changed to readme based on review comments
Sep 26, 2024
6da2a7d
removed distributed_trace and some other updates
Sep 26, 2024
521f7f0
fixing pre python v3.10 issue
Sep 26, 2024
814f87f
Merge branch 'Azure:main' into mhietala/inference_genai_tracing
M-Hietala Sep 26, 2024
8c80099
test fixes
Sep 26, 2024
514dea4
Fix some of the non-trace tests
dargilco Sep 26, 2024
83f85d6
fixing issues reported by tools
Sep 27, 2024
79ea9b3
Merge branch 'mhietala/inference_genai_tracing' of https://github.com…
Sep 27, 2024
e8dd67d
adding uninstrumentation to the beginning of tracing tests
Sep 27, 2024
0c286c3
updating readme and sample
Sep 27, 2024
1aaf87c
adding ignore related to tool issue
Sep 27, 2024
a1b1f13
Merge branch 'Azure:main' into mhietala/inference_genai_tracing
M-Hietala Sep 30, 2024
510a6ca
updating code snippet in readme
Sep 30, 2024
04da0e6
Merge branch 'mhietala/inference_genai_tracing' of https://github.com…
Sep 30, 2024
fa8e8b0
Add missing `@recorded_by_proxy` decorators to new tracing tests
dargilco Oct 1, 2024
e410c31
Push new recordings
dargilco Oct 1, 2024
18b3d92
fixing issues reported by tools
Oct 2, 2024
200ab61
Merge branch 'mhietala/inference_genai_tracing' of https://github.com…
Oct 2, 2024
4a56354
adding inference to shared requirements
Oct 2, 2024
3113e35
Merge branch 'Azure:main' into mhietala/inference_genai_tracing
M-Hietala Oct 2, 2024
58a754f
remove inference from setup
Oct 2, 2024
4ed67dc
adding comma to setup
Oct 3, 2024
5a0aa71
updating version requirement for core
Oct 3, 2024
1214978
changes based on review comments
Oct 7, 2024
1350293
Merge branch 'Azure:main' into mhietala/inference_genai_tracing
M-Hietala Oct 10, 2024
3 changes: 3 additions & 0 deletions .vscode/cspell.json
@@ -406,6 +406,9 @@
"uamqp",
"uksouth",
"ukwest",
"uninstrument",
"uninstrumented",
"uninstrumenting",
"unpad",
"unpadder",
"unpartial",
89 changes: 89 additions & 0 deletions sdk/ai/azure-ai-inference/README.md
@@ -57,6 +57,14 @@ To update an existing installation of the package, use:
pip install --upgrade azure-ai-inference
```

To install the Azure AI Inference package with support for OpenTelemetry-based tracing, use the following command:

```bash
pip install azure-ai-inference[trace]
```



## Key concepts

### Create and authenticate a client directly, using API key or GitHub token
@@ -530,6 +538,87 @@

To report issues with the client library, or request additional features, please open a GitHub issue [here](https://github.com/Azure/azure-sdk-for-python/issues)

## Tracing

The Azure AI Inferencing API Tracing library provides tracing for the Azure AI Inference client library for Python. Refer to the Installation section above for installation instructions.

### Setup

The environment variable AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED controls whether the actual message contents are recorded in the traces. By default, message contents are not recorded as part of the trace. When message content recording is disabled, the function names, function parameter names, and function parameter values of any function call tools are also not recorded in the trace. Set the environment variable to "true" (case-insensitive) to record message contents as part of the trace; any other value leaves message contents unrecorded.
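
As an illustration of the behavior described above (the helper function below is hypothetical, not part of the library), the flag is interpreted like this:

```python
import os

def content_recording_enabled() -> bool:
    # Hypothetical helper: only the value "true" (case-insensitive) enables
    # recording of message contents; any other value, or an unset variable,
    # disables it.
    value = os.environ.get("AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED", "")
    return value.lower() == "true"
```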

You also need to configure the tracing implementation, either by setting the environment variable `AZURE_SDK_TRACING_IMPLEMENTATION` to `opentelemetry` or by configuring it in code with the following snippet:

<!-- SNIPPET:sample_chat_completions_with_tracing.trace_setting -->

```python
from azure.core.settings import settings
settings.tracing_implementation = "opentelemetry"
```

<!-- END SNIPPET -->

Please refer to [azure-core-tracing-documentation](https://learn.microsoft.com/python/api/overview/azure/core-tracing-opentelemetry-readme) for more information.

### Exporting Traces with OpenTelemetry

Azure AI Inference is instrumented with OpenTelemetry. In order to enable tracing you need to configure OpenTelemetry to export traces to your observability backend.
Refer to [Azure SDK tracing in Python](https://learn.microsoft.com/python/api/overview/azure/core-tracing-opentelemetry-readme?view=azure-python-preview) for more details.

Refer to [Azure Monitor OpenTelemetry documentation](https://learn.microsoft.com/azure/azure-monitor/app/opentelemetry-enable?tabs=python) for details on how to send Azure AI Inference traces to Azure Monitor and create an Azure Monitor resource.

### Instrumentation

Use the AIInferenceInstrumentor to instrument the Azure AI Inferencing API for LLM tracing; this causes LLM traces to be emitted by the Azure AI Inferencing API.

<!-- SNIPPET:sample_chat_completions_with_tracing.instrument_inferencing -->

```python
from azure.core.tracing.ai.inference import AIInferenceInstrumentor
# Instrument AI Inference API
AIInferenceInstrumentor().instrument()
```

<!-- END SNIPPET -->


It is also possible to uninstrument the Azure AI Inferencing API with the `uninstrument` call. After this call, traces will no longer be emitted by the Azure AI Inferencing API until `instrument` is called again.

<!-- SNIPPET:sample_chat_completions_with_tracing.uninstrument_inferencing -->

```python
AIInferenceInstrumentor().uninstrument()
```

<!-- END SNIPPET -->

### Tracing Your Own Functions
The @tracer.start_as_current_span decorator can be used to trace your own functions. This will trace the function parameters and their values. You can also add further attributes to the span in the function implementation, as demonstrated below. Note that you will have to set up the tracer in your code before using the decorator. More information is available [here](https://opentelemetry.io/docs/languages/python/).

<!-- SNIPPET:sample_chat_completions_with_tracing.trace_function -->

```python
from opentelemetry import trace
from opentelemetry.trace import get_tracer

tracer = get_tracer(__name__)

# The tracer.start_as_current_span decorator will trace the function call and enable adding additional attributes
# to the span in the function implementation. Note that this will trace the function parameters and their values.
@tracer.start_as_current_span("get_temperature")  # type: ignore
def get_temperature(city: str) -> str:

    # Adding attributes to the current span
    span = trace.get_current_span()
    span.set_attribute("requested_city", city)

    if city == "Seattle":
        return "75"
    elif city == "New York City":
        return "80"
    else:
        return "Unavailable"
```

<!-- END SNIPPET -->

## Next steps

* Have a look at the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder, containing fully runnable Python code for doing inference using synchronous and asynchronous clients.
2 changes: 1 addition & 1 deletion sdk/ai/azure-ai-inference/assets.json
@@ -2,5 +2,5 @@
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "python",
"TagPrefix": "python/ai/azure-ai-inference",
"Tag": "python/ai/azure-ai-inference_498e85cbfd"
"Tag": "python/ai/azure-ai-inference_19a0adafc6"
}
6 changes: 3 additions & 3 deletions sdk/ai/azure-ai-inference/azure/ai/inference/_patch.py
@@ -102,8 +102,8 @@ def load_client(
"The AI model information is missing a value for `model type`. Cannot create an appropriate client."
)

# TODO: Remove "completions" and "embedding" once Mistral Large and Cohere fixes their model type
if model_info.model_type in (_models.ModelType.CHAT, "completion"):
# TODO: Remove "completions", "chat-completions" and "embedding" once Mistral Large and Cohere fixes their model type
if model_info.model_type in (_models.ModelType.CHAT, "completion", "chat-completion", "chat-completions"):
chat_completion_client = ChatCompletionsClient(endpoint, credential, **kwargs)
chat_completion_client._model_info = ( # pylint: disable=protected-access,attribute-defined-outside-init
model_info
@@ -454,7 +454,7 @@ def complete(
:raises ~azure.core.exceptions.HttpResponseError:
"""

@distributed_trace
# pylint:disable=client-method-missing-tracing-decorator
def complete(
self,
body: Union[JSON, IO[bytes]] = _Unset,
4 changes: 2 additions & 2 deletions sdk/ai/azure-ai-inference/azure/ai/inference/aio/_patch.py
@@ -87,7 +87,7 @@ async def load_client(
)

# TODO: Remove "completions" and "embedding" once Mistral Large and Cohere fixes their model type
if model_info.model_type in (_models.ModelType.CHAT, "completion"):
if model_info.model_type in (_models.ModelType.CHAT, "completion", "chat-completion", "chat-completions"):
chat_completion_client = ChatCompletionsClient(endpoint, credential, **kwargs)
chat_completion_client._model_info = ( # pylint: disable=protected-access,attribute-defined-outside-init
model_info
@@ -630,7 +630,7 @@ async def complete(

return _deserialize(_models._patch.ChatCompletions, response.json()) # pylint: disable=protected-access

@distributed_trace_async
# pylint:disable=client-method-missing-tracing-decorator-async
async def get_model_info(self, **kwargs: Any) -> _models.ModelInfo:
# pylint: disable=line-too-long
"""Returns information about the AI model.
4 changes: 3 additions & 1 deletion sdk/ai/azure-ai-inference/dev_requirements.txt
@@ -1,3 +1,5 @@
-e ../../../tools/azure-sdk-tools
../../core/azure-core
aiohttp
../../core/azure-core-tracing-opentelemetry
aiohttp
opentelemetry-sdk
10 changes: 2 additions & 8 deletions sdk/ai/azure-ai-inference/samples/README.md
@@ -24,14 +24,7 @@ See [Prerequisites](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/

* Clone or download this sample repository
* Open a command prompt / terminal window in this samples folder
* Install the client library for Python with pip:
```bash
pip install azure-ai-inference
```
or update an existing installation:
```bash
pip install --upgrade azure-ai-inference
```
* Install the client library for Python with pip. See [Install the package](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/README.md#install-the-package)
* If you plan to run the asynchronous client samples, install the additional package [aiohttp](https://pypi.org/project/aiohttp/):
```bash
pip install aiohttp
@@ -105,6 +98,7 @@ similarly for the other samples.
|[sample_get_model_info.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_get_model_info.py) | Get AI model information using the chat completions client. Similarly can be done with all other clients. |
|[sample_chat_completions_with_model_extras.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_model_extras.py) | Chat completions with additional model-specific parameters. |
|[sample_chat_completions_azure_openai.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_azure_openai.py) | Chat completions against Azure OpenAI endpoint. |
|[sample_chat_completions_with_tracing.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_tracing.py) | Chat completions with traces enabled. Includes streaming and non-streaming chat operations. The non-streaming chat uses a function call tool and also demonstrates how to add traces to client code so that they are included in the emitted traces. |

### Text embeddings

@@ -6,7 +6,7 @@
DESCRIPTION:
This sample demonstrates how to get a chat completions response from
the service using a synchronous client. The sample also shows how to
set default chat compoletions configuration in the client constructor,
set default chat completions configuration in the client constructor,
which will be applied to all `complete` calls to the service.
This sample assumes the AI model is hosted on a Serverless API or
@@ -0,0 +1,192 @@
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------
"""
DESCRIPTION:
This sample demonstrates how to use tracing with the Inference client library.
Azure AI Inference is instrumented with OpenTelemetry. In order to enable tracing
you need to configure OpenTelemetry to export traces to your observability backend.
This sample shows how to capture the traces to a file.

This sample assumes the AI model is hosted on a Serverless API or
Managed Compute endpoint. For GitHub Models or Azure OpenAI endpoints,
the client constructor needs to be modified. See package documentation:
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/README.md#key-concepts

USAGE:
python sample_chat_completions_with_tracing.py

Set these two environment variables before running the sample:
1) AZURE_AI_CHAT_ENDPOINT - Your endpoint URL, in the form
https://<your-deployment-name>.<your-azure-region>.models.ai.azure.com
where `your-deployment-name` is your unique AI Model deployment name, and
`your-azure-region` is the Azure region where your model is deployed.
2) AZURE_AI_CHAT_KEY - Your model key (a 32-character string). Keep it secret.
"""


import os
from opentelemetry import trace
# opentelemetry-sdk is required for the opentelemetry.sdk imports.
# You can install it with command "pip install opentelemetry-sdk".
#from opentelemetry.sdk.trace import TracerProvider
#from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage, CompletionsFinishReason
from azure.core.credentials import AzureKeyCredential

# [START trace_setting]
from azure.core.settings import settings
settings.tracing_implementation = "opentelemetry"
# [END trace_setting]

# Setup tracing to console
# Requires opentelemetry-sdk
#exporter = ConsoleSpanExporter()
#trace.set_tracer_provider(TracerProvider())
#tracer = trace.get_tracer(__name__)
#trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(exporter))


def chat_completion_streaming(key, endpoint):
    client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    response = client.complete(
        stream=True,
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="Tell me about software engineering in five sentences."),
        ],
    )
    for update in response:
        if update.choices:
            print(update.choices[0].delta.content or "", end="")
    client.close()

# [START trace_function]
from opentelemetry.trace import get_tracer
tracer = get_tracer(__name__)

# The tracer.start_as_current_span decorator will trace the function call and enable adding additional attributes
# to the span in the function implementation. Note that this will trace the function parameters and their values.
@tracer.start_as_current_span("get_temperature") # type: ignore
def get_temperature(city: str) -> str:

    # Adding attributes to the current span
    span = trace.get_current_span()
    span.set_attribute("requested_city", city)

    if city == "Seattle":
        return "75"
    elif city == "New York City":
        return "80"
    else:
        return "Unavailable"
# [END trace_function]


def get_weather(city: str) -> str:
    if city == "Seattle":
        return "Nice weather"
    elif city == "New York City":
        return "Good weather"
    else:
        return "Unavailable"


def chat_completion_with_function_call(key, endpoint):
    import json
    from azure.ai.inference.models import (
        ToolMessage,
        AssistantMessage,
        ChatCompletionsToolCall,
        ChatCompletionsToolDefinition,
        FunctionDefinition,
    )

    weather_description = ChatCompletionsToolDefinition(
        function=FunctionDefinition(
            name="get_weather",
            description="Returns description of the weather in the specified city",
            parameters={
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city for which weather info is requested",
                    },
                },
                "required": ["city"],
            },
        )
    )

    temperature_in_city = ChatCompletionsToolDefinition(
        function=FunctionDefinition(
            name="get_temperature",
            description="Returns the current temperature for the specified city",
            parameters={
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city for which temperature info is requested",
                    },
                },
                "required": ["city"],
            },
        )
    )

    client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    messages = [
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What is the weather and temperature in Seattle?"),
    ]

    response = client.complete(messages=messages, tools=[weather_description, temperature_in_city])

    if response.choices[0].finish_reason == CompletionsFinishReason.TOOL_CALLS:
        # Append the previous model response to the chat history
        messages.append(AssistantMessage(tool_calls=response.choices[0].message.tool_calls))
        # The tool should be of type function call.
        if response.choices[0].message.tool_calls is not None and len(response.choices[0].message.tool_calls) > 0:
            for tool_call in response.choices[0].message.tool_calls:
                if type(tool_call) is ChatCompletionsToolCall:
                    function_args = json.loads(tool_call.function.arguments.replace("'", '"'))
                    print(f"Calling function `{tool_call.function.name}` with arguments {function_args}")
                    callable_func = globals()[tool_call.function.name]
                    function_response = callable_func(**function_args)
                    print(f"Function response = {function_response}")
                    # Provide the tool response to the model, by appending it to the chat history
                    messages.append(ToolMessage(tool_call_id=tool_call.id, content=function_response))
            # With the additional tools information on hand, get another response from the model
            response = client.complete(messages=messages, tools=[weather_description, temperature_in_city])

    print(f"Model response = {response.choices[0].message.content}")


def main():
    # [START instrument_inferencing]
    from azure.core.tracing.ai.inference import AIInferenceInstrumentor

    # Instrument AI Inference API
    AIInferenceInstrumentor().instrument()
    # [END instrument_inferencing]

    try:
        endpoint = os.environ["AZURE_AI_CHAT_ENDPOINT"]
        key = os.environ["AZURE_AI_CHAT_KEY"]
    except KeyError:
        print("Missing environment variable 'AZURE_AI_CHAT_ENDPOINT' or 'AZURE_AI_CHAT_KEY'")
        print("Set them before running this sample.")
        exit()

    print("===== starting chat_completion_streaming() =====")
    chat_completion_streaming(key, endpoint)
    print("===== chat_completion_streaming() done =====")

    print("===== starting chat_completion_with_function_call() =====")
    chat_completion_with_function_call(key, endpoint)
    print("===== chat_completion_with_function_call() done =====")

    # [START uninstrument_inferencing]
    AIInferenceInstrumentor().uninstrument()
    # [END uninstrument_inferencing]


if __name__ == "__main__":
main()