
adding inference trace injection #36890

Open
wants to merge 43 commits into
base: main
Changes from all commits
43 commits
06cef91
adding inference trace injection
Aug 14, 2024
9dc2cf9
changing the interface based on feedback
Aug 16, 2024
58a032b
updates
Aug 16, 2024
ec1cd16
changing name of environment variable
Aug 20, 2024
3270076
changes based on review comments and some other changes
Sep 6, 2024
7cbbc0b
file name change
Sep 6, 2024
941a9ae
fixing exception handling
Sep 10, 2024
bcc6e74
relocating inference trace instrumentation
Sep 10, 2024
709923c
reverting change in azure core tracing
Sep 10, 2024
baac83f
Merge branch 'main' into mhietala/inference_genai_tracing
Sep 16, 2024
a64d870
fixes
Sep 16, 2024
198b9cd
changing span and model name for cases when model info not available
Sep 17, 2024
cd8bba2
some fixes
Sep 17, 2024
b28a3fe
adding sync trace tests
Sep 20, 2024
b549b38
fix and async trace test
Sep 23, 2024
469d32c
updating readme and setup
Sep 23, 2024
f1424a1
adding tracing sample
Sep 23, 2024
92da09a
changes based on review comments
Sep 25, 2024
d9652f5
changed to readme based on review comments
Sep 26, 2024
6da2a7d
removed distributed_trace and some other updates
Sep 26, 2024
521f7f0
fixing pre python v3.10 issue
Sep 26, 2024
814f87f
Merge branch 'Azure:main' into mhietala/inference_genai_tracing
M-Hietala Sep 26, 2024
8c80099
test fixes
Sep 26, 2024
514dea4
Fix some of the non-trace tests
dargilco Sep 26, 2024
83f85d6
fixing issues reported by tools
Sep 27, 2024
79ea9b3
Merge branch 'mhietala/inference_genai_tracing' of https://github.com…
Sep 27, 2024
e8dd67d
adding uninstrumentation to the beginning of tracing tests
Sep 27, 2024
0c286c3
updating readme and sample
Sep 27, 2024
1aaf87c
adding ignore related to tool issue
Sep 27, 2024
a1b1f13
Merge branch 'Azure:main' into mhietala/inference_genai_tracing
M-Hietala Sep 30, 2024
510a6ca
updating code snippet in readme
Sep 30, 2024
04da0e6
Merge branch 'mhietala/inference_genai_tracing' of https://github.com…
Sep 30, 2024
fa8e8b0
Add missing `@recorded_by_proxy` decorators to new tracing tests
dargilco Oct 1, 2024
e410c31
Push new recordings
dargilco Oct 1, 2024
18b3d92
fixing issues reported by tools
Oct 2, 2024
200ab61
Merge branch 'mhietala/inference_genai_tracing' of https://github.com…
Oct 2, 2024
4a56354
adding inference to shared requirements
Oct 2, 2024
3113e35
Merge branch 'Azure:main' into mhietala/inference_genai_tracing
M-Hietala Oct 2, 2024
58a754f
remove inference from setup
Oct 2, 2024
4ed67dc
adding comma to setup
Oct 3, 2024
5a0aa71
updating version requirement for core
Oct 3, 2024
1214978
changes based on review comments
Oct 7, 2024
1350293
Merge branch 'Azure:main' into mhietala/inference_genai_tracing
M-Hietala Oct 10, 2024
3 changes: 3 additions & 0 deletions .vscode/cspell.json
@@ -406,6 +406,9 @@
"uamqp",
"uksouth",
"ukwest",
"uninstrument",
"uninstrumented",
"uninstrumenting",
"unpad",
"unpadder",
"unpartial",
89 changes: 89 additions & 0 deletions sdk/ai/azure-ai-inference/README.md
@@ -57,6 +57,14 @@ To update an existing installation of the package, use:
pip install --upgrade azure-ai-inference
```

To install the Azure AI Inference package with support for OpenTelemetry-based tracing, use the following command:

```bash
pip install azure-ai-inference[trace]
```



## Key concepts

### Create and authenticate a client directly, using API key or GitHub token
@@ -530,6 +538,87 @@

To report issues with the client library, or request additional features, please open a GitHub issue [here](https://github.com/Azure/azure-sdk-for-python/issues)

## Tracing

The Azure AI Inferencing API Tracing library provides tracing for the Azure AI Inference client library for Python. Refer to the Installation section above for installation instructions.

### Setup

The environment variable AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED controls whether the actual message contents are recorded in the traces. By default, message contents are not recorded as part of the trace. When message content recording is disabled, the function names, function parameter names, and function parameter values of any function call tools are also not recorded in the trace. Set the environment variable to "true" (case-insensitive) to record message contents as part of the trace; any other value leaves message contents unrecorded.
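
As an illustration of the behavior described above (the helper function below is hypothetical, not part of the library), the flag is interpreted like this:

```python
import os

def content_recording_enabled() -> bool:
    # Hypothetical helper: only the value "true" (case-insensitive) enables
    # recording of message contents; any other value, or an unset variable,
    # disables it.
    value = os.environ.get("AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED", "")
    return value.lower() == "true"
```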

You also need to configure the tracing implementation, either by setting the environment variable `AZURE_SDK_TRACING_IMPLEMENTATION` to `opentelemetry` or by configuring it in code with the following snippet:

<!-- SNIPPET:sample_chat_completions_with_tracing.trace_setting -->

```python
from azure.core.settings import settings
settings.tracing_implementation = "opentelemetry"
```

<!-- END SNIPPET -->

Please refer to [azure-core-tracing-documentation](https://learn.microsoft.com/python/api/overview/azure/core-tracing-opentelemetry-readme) for more information.

### Exporting Traces with OpenTelemetry

Azure AI Inference is instrumented with OpenTelemetry. In order to enable tracing you need to configure OpenTelemetry to export traces to your observability backend.
Refer to [Azure SDK tracing in Python](https://learn.microsoft.com/python/api/overview/azure/core-tracing-opentelemetry-readme?view=azure-python-preview) for more details.

Refer to [Azure Monitor OpenTelemetry documentation](https://learn.microsoft.com/azure/azure-monitor/app/opentelemetry-enable?tabs=python) for details on how to send Azure AI Inference traces to Azure Monitor and create an Azure Monitor resource.

### Instrumentation

Use the AIInferenceInstrumentor to instrument the Azure AI Inferencing API for LLM tracing; this causes LLM traces to be emitted by the Azure AI Inferencing API.

<!-- SNIPPET:sample_chat_completions_with_tracing.instrument_inferencing -->

```python
from azure.core.tracing.ai.inference import AIInferenceInstrumentor
# Instrument AI Inference API
AIInferenceInstrumentor().instrument()
```

<!-- END SNIPPET -->


It is also possible to uninstrument the Azure AI Inferencing API with the `uninstrument` call. After this call, traces will no longer be emitted by the Azure AI Inferencing API until `instrument` is called again.

<!-- SNIPPET:sample_chat_completions_with_tracing.uninstrument_inferencing -->

```python
AIInferenceInstrumentor().uninstrument()
```

<!-- END SNIPPET -->

### Tracing Your Own Functions
The @tracer.start_as_current_span decorator can be used to trace your own functions. This will trace the function parameters and their values. You can also add further attributes to the span in the function implementation, as demonstrated below. Note that you will have to set up the tracer in your code before using the decorator. More information is available [here](https://opentelemetry.io/docs/languages/python/).

<!-- SNIPPET:sample_chat_completions_with_tracing.trace_function -->

```python
from opentelemetry import trace
from opentelemetry.trace import get_tracer

tracer = get_tracer(__name__)

# The tracer.start_as_current_span decorator will trace the function call and enable adding additional attributes
# to the span in the function implementation. Note that this will trace the function parameters and their values.
@tracer.start_as_current_span("get_temperature")  # type: ignore
def get_temperature(city: str) -> str:

    # Adding attributes to the current span
    span = trace.get_current_span()
    span.set_attribute("requested_city", city)

    if city == "Seattle":
        return "75"
    elif city == "New York City":
        return "80"
    else:
        return "Unavailable"
```

<!-- END SNIPPET -->

## Next steps

* Have a look at the [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples) folder, containing fully runnable Python code for doing inference using synchronous and asynchronous clients.
2 changes: 1 addition & 1 deletion sdk/ai/azure-ai-inference/assets.json
@@ -2,5 +2,5 @@
"AssetsRepo": "Azure/azure-sdk-assets",
"AssetsRepoPrefixPath": "python",
"TagPrefix": "python/ai/azure-ai-inference",
"Tag": "python/ai/azure-ai-inference_498e85cbfd"
"Tag": "python/ai/azure-ai-inference_19a0adafc6"
}
6 changes: 3 additions & 3 deletions sdk/ai/azure-ai-inference/azure/ai/inference/_patch.py
@@ -102,8 +102,8 @@ def load_client(
"The AI model information is missing a value for `model type`. Cannot create an appropriate client."
)

# TODO: Remove "completions" and "embedding" once Mistral Large and Cohere fixes their model type
if model_info.model_type in (_models.ModelType.CHAT, "completion"):
# TODO: Remove "completions", "chat-completions" and "embedding" once Mistral Large and Cohere fixes their model type
if model_info.model_type in (_models.ModelType.CHAT, "completion", "chat-completion", "chat-completions"):
chat_completion_client = ChatCompletionsClient(endpoint, credential, **kwargs)
chat_completion_client._model_info = ( # pylint: disable=protected-access,attribute-defined-outside-init
model_info
@@ -454,7 +454,7 @@ def complete(
:raises ~azure.core.exceptions.HttpResponseError:
"""

@distributed_trace
# pylint:disable=client-method-missing-tracing-decorator
def complete(
self,
body: Union[JSON, IO[bytes]] = _Unset,
4 changes: 2 additions & 2 deletions sdk/ai/azure-ai-inference/azure/ai/inference/aio/_patch.py
@@ -87,7 +87,7 @@ async def load_client(
)

# TODO: Remove "completions" and "embedding" once Mistral Large and Cohere fixes their model type
if model_info.model_type in (_models.ModelType.CHAT, "completion"):
if model_info.model_type in (_models.ModelType.CHAT, "completion", "chat-completion", "chat-completions"):
chat_completion_client = ChatCompletionsClient(endpoint, credential, **kwargs)
chat_completion_client._model_info = ( # pylint: disable=protected-access,attribute-defined-outside-init
model_info
@@ -630,7 +630,7 @@ async def complete(

return _deserialize(_models._patch.ChatCompletions, response.json()) # pylint: disable=protected-access

@distributed_trace_async
# pylint:disable=client-method-missing-tracing-decorator-async
async def get_model_info(self, **kwargs: Any) -> _models.ModelInfo:
# pylint: disable=line-too-long
"""Returns information about the AI model.
4 changes: 3 additions & 1 deletion sdk/ai/azure-ai-inference/dev_requirements.txt
@@ -1,3 +1,5 @@
-e ../../../tools/azure-sdk-tools
../../core/azure-core
aiohttp
../../core/azure-core-tracing-opentelemetry
aiohttp
opentelemetry-sdk
10 changes: 2 additions & 8 deletions sdk/ai/azure-ai-inference/samples/README.md
@@ -24,14 +24,7 @@ See [Prerequisites](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/

* Clone or download this sample repository
* Open a command prompt / terminal window in this samples folder
* Install the client library for Python with pip:
```bash
pip install azure-ai-inference
```
or update an existing installation:
```bash
pip install --upgrade azure-ai-inference
```
* Install the client library for Python with pip. See [Install the package](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/README.md#install-the-package)
* If you plan to run the asynchronous client samples, install the additional package [aiohttp](https://pypi.org/project/aiohttp/):
```bash
pip install aiohttp
@@ -105,6 +98,7 @@ similarly for the other samples.
|[sample_get_model_info.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_get_model_info.py) | Get AI model information using the chat completions client. Similarly can be done with all other clients. |
|[sample_chat_completions_with_model_extras.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_model_extras.py) | Chat completions with additional model-specific parameters. |
|[sample_chat_completions_azure_openai.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_azure_openai.py) | Chat completions against Azure OpenAI endpoint. |
|[sample_chat_completions_with_tracing.py](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/samples/sample_chat_completions_with_tracing.py) | Chat completions with traces enabled. Includes streaming and non-streaming chat operations. The non-streaming chat uses a function call tool and also demonstrates how to add traces to client code so that they are included in the emitted traces. |

### Text embeddings

@@ -6,7 +6,7 @@
DESCRIPTION:
This sample demonstrates how to get a chat completions response from
the service using a synchronous client. The sample also shows how to
set default chat compoletions configuration in the client constructor,
set default chat completions configuration in the client constructor,
which will be applied to all `complete` calls to the service.
This sample assumes the AI model is hosted on a Serverless API or
@@ -0,0 +1,192 @@
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------
"""
DESCRIPTION:
This sample demonstrates how to use tracing with the Inference client library.
Azure AI Inference is instrumented with OpenTelemetry. In order to enable tracing
you need to configure OpenTelemetry to export traces to your observability backend.
This sample shows how to capture the traces to a file.

This sample assumes the AI model is hosted on a Serverless API or
Managed Compute endpoint. For GitHub Models or Azure OpenAI endpoints,
the client constructor needs to be modified. See package documentation:
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/ai/azure-ai-inference/README.md#key-concepts

USAGE:
python sample_chat_completions_with_tracing.py

Set these two environment variables before running the sample:
1) AZURE_AI_CHAT_ENDPOINT - Your endpoint URL, in the form
https://<your-deployment-name>.<your-azure-region>.models.ai.azure.com
where `your-deployment-name` is your unique AI Model deployment name, and
`your-azure-region` is the Azure region where your model is deployed.
2) AZURE_AI_CHAT_KEY - Your model key (a 32-character string). Keep it secret.
"""


import os
from opentelemetry import trace
# opentelemetry-sdk is required for the opentelemetry.sdk imports.
# You can install it with command "pip install opentelemetry-sdk".
#from opentelemetry.sdk.trace import TracerProvider
#from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage, CompletionsFinishReason
from azure.core.credentials import AzureKeyCredential

# [START trace_setting]
from azure.core.settings import settings
settings.tracing_implementation = "opentelemetry"
# [END trace_setting]

# Setup tracing to console
# Requires opentelemetry-sdk
#exporter = ConsoleSpanExporter()
#trace.set_tracer_provider(TracerProvider())
#tracer = trace.get_tracer(__name__)
#trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(exporter))


def chat_completion_streaming(key, endpoint):
    client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    response = client.complete(
        stream=True,
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="Tell me about software engineering in five sentences."),
        ],
    )
    for update in response:
        if update.choices:
            print(update.choices[0].delta.content or "", end="")
    client.close()

# [START trace_function]
from opentelemetry.trace import get_tracer
tracer = get_tracer(__name__)

# The tracer.start_as_current_span decorator will trace the function call and enable adding additional attributes
# to the span in the function implementation. Note that this will trace the function parameters and their values.
@tracer.start_as_current_span("get_temperature") # type: ignore
def get_temperature(city: str) -> str:

    # Adding attributes to the current span
    span = trace.get_current_span()
    span.set_attribute("requested_city", city)

    if city == "Seattle":
        return "75"
    elif city == "New York City":
        return "80"
    else:
        return "Unavailable"
# [END trace_function]


def get_weather(city: str) -> str:
    if city == "Seattle":
        return "Nice weather"
    elif city == "New York City":
        return "Good weather"
    else:
        return "Unavailable"


def chat_completion_with_function_call(key, endpoint):
    import json
    from azure.ai.inference.models import (
        ToolMessage,
        AssistantMessage,
        ChatCompletionsToolCall,
        ChatCompletionsToolDefinition,
        FunctionDefinition,
    )

    weather_description = ChatCompletionsToolDefinition(
        function=FunctionDefinition(
            name="get_weather",
            description="Returns description of the weather in the specified city",
            parameters={
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city for which weather info is requested",
                    },
                },
                "required": ["city"],
            },
        )
    )

    temperature_in_city = ChatCompletionsToolDefinition(
        function=FunctionDefinition(
            name="get_temperature",
            description="Returns the current temperature for the specified city",
            parameters={
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city for which temperature info is requested",
                    },
                },
                "required": ["city"],
            },
        )
    )

    client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    messages = [
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What is the weather and temperature in Seattle?"),
    ]

    response = client.complete(messages=messages, tools=[weather_description, temperature_in_city])

    if response.choices[0].finish_reason == CompletionsFinishReason.TOOL_CALLS:
        # Append the previous model response to the chat history
        messages.append(AssistantMessage(tool_calls=response.choices[0].message.tool_calls))
        # The tool should be of type function call.
        if response.choices[0].message.tool_calls is not None and len(response.choices[0].message.tool_calls) > 0:
            for tool_call in response.choices[0].message.tool_calls:
                if type(tool_call) is ChatCompletionsToolCall:
                    function_args = json.loads(tool_call.function.arguments.replace("'", '"'))
                    print(f"Calling function `{tool_call.function.name}` with arguments {function_args}")
                    callable_func = globals()[tool_call.function.name]
                    function_response = callable_func(**function_args)
                    print(f"Function response = {function_response}")
                    # Provide the tool response to the model, by appending it to the chat history
                    messages.append(ToolMessage(tool_call_id=tool_call.id, content=function_response))
            # With the additional tools information on hand, get another response from the model
            response = client.complete(messages=messages, tools=[weather_description, temperature_in_city])

    print(f"Model response = {response.choices[0].message.content}")


def main():
    # [START instrument_inferencing]
    from azure.core.tracing.ai.inference import AIInferenceInstrumentor

    # Instrument AI Inference API
    AIInferenceInstrumentor().instrument()
    # [END instrument_inferencing]

    try:
        endpoint = os.environ["AZURE_AI_CHAT_ENDPOINT"]
        key = os.environ["AZURE_AI_CHAT_KEY"]
    except KeyError:
        print("Missing environment variable 'AZURE_AI_CHAT_ENDPOINT' or 'AZURE_AI_CHAT_KEY'")
        print("Set them before running this sample.")
        exit()

    print("===== starting chat_completion_streaming() =====")
    chat_completion_streaming(key, endpoint)
    print("===== chat_completion_streaming() done =====")

    print("===== starting chat_completion_with_function_call() =====")
    chat_completion_with_function_call(key, endpoint)
    print("===== chat_completion_with_function_call() done =====")

    # [START uninstrument_inferencing]
    AIInferenceInstrumentor().uninstrument()
    # [END uninstrument_inferencing]


if __name__ == "__main__":
main()