Azure-EmbodiedAI-Sample

Welcome to the Azure EmbodiedAI Sample repository on GitHub. With only about 500 lines of custom code, written with the help of ChatGPT, this simple Unity project demonstrates how to build an EmbodiedAI character that can speak, listen, and see using Azure Cognitive Services (Azure Speech SDK), the Azure Kinect Body Tracking SDK, and Unity Visual Scripting. The initial design of Microsoft's legendary Clippy character is inspired by the Blender project here.

Motivation

Make everything as simple as possible, but not simpler.

Getting Started

This Unity sample targets the Windows Desktop platform. The Azure Kinect DK and the Azure Kinect Body Tracking SDK are leveraged for Body Tracking (seeing); NVIDIA GeForce RTX graphics with the CUDA processing mode are recommended for better performance, though the CPU processing mode may also be used otherwise. For speaking and listening, the Azure Speech SDK is utilized; it is assumed that your PC has a microphone and speakers.

This sample is built on the following foundational GitHub templates:

The solution architecture of this template is described in detail in the following Medium articles:

A solution demo video can be found here, and a code walkthrough is available here.

Azure EmbodiedAI Sample Setup and Demo

Prerequisites

Please see the table below for compatibility guidance on the software versions the sample was developed and tested with.

| Prerequisite | Download Link | Version |
| --- | --- | --- |
| Unity | https://unity.com/download | 2021.3.16f1 (LTS) |
| Azure Speech SDK for Unity | https://aka.ms/csspeech/unitypackage | 1.24.2 |
| Azure Kinect Body Tracking SDK | https://www.microsoft.com/en-us/download/details.aspx?id=104221 | 1.1.2 |
| NVIDIA CUDA Toolkit | https://developer.nvidia.com/cuda-downloads | 11.7 |
| NVIDIA cuDNN for CUDA | https://developer.nvidia.com/rdp/cudnn-archive | 8.5.0 |

Please note that you will need to install Unity, the Azure Kinect Body Tracking SDK, the NVIDIA CUDA Toolkit, and NVIDIA cuDNN for CUDA on your PC. The Azure Speech SDK for Unity, along with select Azure Kinect Body Tracking SDK, NVIDIA CUDA Toolkit, and NVIDIA cuDNN DLLs, has already been packaged with the project for convenience of setup and configuration.

Installing

After you've cloned or downloaded the code, please install the prerequisites. Please note that select large files (DLLs) are stored with Git LFS, so you will need to manually download the files listed here.

The default processing mode for Body Tracking is set to CUDA to optimize performance; to configure CUDA support, please refer to the guidance here and here. You may change the processing mode to CPU as necessary, as sketched below.
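For reference, here is a minimal sketch (not code taken from this project) of how the processing mode can be selected with the Body Tracking SDK's C# wrapper; the actual wiring in the sample may differ:

```csharp
using Microsoft.Azure.Kinect.Sensor;
using Microsoft.Azure.Kinect.BodyTracking;

public static class BodyTrackingSetup
{
    // Opens the depth camera and creates a body tracker in the requested processing mode.
    public static Tracker CreateTracker(Device device, bool useCuda)
    {
        device.StartCameras(new DeviceConfiguration
        {
            DepthMode = DepthMode.NFOV_Unbinned,
            ColorResolution = ColorResolution.Off,
            CameraFPS = FPS.FPS30
        });

        var config = new TrackerConfiguration
        {
            // Use TrackerProcessingMode.Cpu on machines without a CUDA-capable GPU.
            ProcessingMode = useCuda ? TrackerProcessingMode.Cuda : TrackerProcessingMode.Cpu,
            SensorOrientation = SensorOrientation.Default
        };

        Calibration calibration = device.GetCalibration(DepthMode.NFOV_Unbinned, ColorResolution.Off);
        return Tracker.Create(calibration, config);
    }
}
```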

Because this sample leverages Azure Backend services deployed in the Cloud, please follow the Deployment guidance to deploy the necessary Azure Backend services. Once deployed, please copy this Auth file to the Desktop of your PC and fill in the configuration details from the deployed Azure Backend services.
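As an illustration only, a loader like the sketch below could read such a configuration file from the Desktop; the `BackendAuthConfig` field names here are hypothetical placeholders and must be matched to the keys in the repository's actual Auth file:

```csharp
using System;
using System.IO;
using UnityEngine;

[Serializable]
public class BackendAuthConfig
{
    // Field names are illustrative; align them with the keys in the repo's Auth file.
    public string speechSubscriptionKey;
    public string speechRegion;
}

public static class AuthConfigLoader
{
    // Reads a JSON auth file from the user's Desktop, as the setup guidance suggests.
    public static BackendAuthConfig LoadFromDesktop(string fileName)
    {
        string path = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Desktop), fileName);
        return JsonUtility.FromJson<BackendAuthConfig>(File.ReadAllText(path));
    }
}
```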

This sample also introduces a number of custom C# nodes for Unity Visual Scripting; after you open the project in Unity, please Generate Nodes as described here.
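For context, a custom Visual Scripting node is just a C# class deriving from Unit. The sketch below (a hypothetical LogPhraseNode, not one of this project's nodes) shows the general pattern:

```csharp
using Unity.VisualScripting;
using UnityEngine;

// A minimal custom node: logs a phrase when the control flow passes through it.
public class LogPhraseNode : Unit
{
    [DoNotSerialize] public ControlInput Enter;
    [DoNotSerialize] public ControlOutput Exit;
    [DoNotSerialize] public ValueInput Phrase;

    protected override void Definition()
    {
        Enter = ControlInput(nameof(Enter), flow =>
        {
            Debug.Log(flow.GetValue<string>(Phrase));
            return Exit;
        });
        Exit = ControlOutput(nameof(Exit));
        Phrase = ValueInput<string>(nameof(Phrase), string.Empty);
        Succession(Enter, Exit); // declares that Enter leads to Exit
    }
}
```

After adding or changing such classes, regenerate the node library so the new node shows up in the graph editor's fuzzy finder.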

Unity Sample Project

To run the sample, please open SampleScene and hit Play. You may also select the Clippy game object in the Hierarchy and open the SampleStateMachine graph to visually inspect the sample flow.

Unity Visual Scripting State Graph

Cloud Deployment

This sample features One-Click Deployment for the necessary Azure Backend services. If you need to sign up for an Azure Subscription, please follow this link.

Deploy to Azure

Edge Deployment

| Capability | Docker Container | Protocol | Port |
| --- | --- | --- | --- |
| STT (Speech-to-Text) | mcr.microsoft.com/azure-cognitive-services/speechservices/speech-to-text | WS(S) | 5001 |
| TTS (Neural-Text-to-Speech) | mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech | WS(S) | 5002 |
| LUIS (Intent Recognition) | mcr.microsoft.com/azure-cognitive-services/language/luis | HTTP(S) | 5003 |
| ELASTIC-OSS Elasticsearch | docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2 | HTTP(S) | 9200 |
| ELASTIC-OSS Kibana | docker.elastic.co/kibana/kibana-oss:7.10.2 | HTTP(S) | 5601 |
| GPT-J | | HTTP(S) | 5004 |
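When these containers are running, the Speech SDK can be pointed at them instead of the cloud endpoints. The sketch below assumes the STT and TTS containers from the table above are reachable on localhost; the host name and ports are assumptions to adjust for your own deployment:

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

public static class EdgeSpeechConfigFactory
{
    // Targets the local STT container (port 5001 in the table above).
    public static SpeechConfig CreateSpeechToTextConfig() =>
        SpeechConfig.FromHost(new Uri("ws://localhost:5001"));

    // Targets the local neural TTS container (port 5002 in the table above).
    public static SpeechConfig CreateTextToSpeechConfig() =>
        SpeechConfig.FromHost(new Uri("ws://localhost:5002"));
}
```

The resulting SpeechConfig objects are passed to SpeechRecognizer and SpeechSynthesizer exactly like cloud-backed configurations.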

Next Steps

This minimalistic template can be a great jump start for your own EmbodiedAI project, and from here the possibilities are truly endless.

This template already gets you started with the following capabilities:

Please consider adding the following capabilities to your EmbodiedAI project as described in this Medium article:

Note: If you are interested in leveraging Microsoft Power Virtual Agents (PVA) on Azure while building your EmbodiedAI project, please review this Medium article, which provides details about using the DialogServiceConnector class based on the Interact with a bot in C# Unity GitHub template.
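For orientation, a minimal DialogServiceConnector round trip might look like the sketch below; the key/region parameters and the logging are illustrative, not the template's actual code:

```csharp
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Dialog;

public static class BotClient
{
    // Connects to a Bot Framework / PVA bot and runs a single listen-once turn.
    public static async System.Threading.Tasks.Task TalkToBotAsync(string key, string region)
    {
        var config = BotFrameworkConfig.FromSubscription(key, region);
        using var connector = new DialogServiceConnector(
            config, AudioConfig.FromDefaultMicrophoneInput());

        connector.ActivityReceived += (s, e) =>
        {
            // Bot replies arrive here as Bot Framework activities (JSON strings).
            UnityEngine.Debug.Log(e.Activity);
        };

        await connector.ConnectAsync();
        await connector.ListenOnceAsync(); // capture one utterance and send it to the bot
    }
}
```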

Please also review the important Game Development Concepts in the context of the Azure Speech SDK, explained in detail in this article.

Future Updates

  • Character Rig and animations using Unity Mecanim
  • Azure Backend services deployment options using Bicep and Terraform
  • Backend services deployed on the Edge using Helm chart on Kubernetes (K8S) cluster
  • Sample template for Unreal Engine (UE) using Blueprints
  • Sample template for WebGL using Babylon.js

Lip Sync Visemes

This template features a simple yet robust implementation of Lip Sync Visemes using the Azure Speech SDK, as explained in this video. You can introduce visemes for your character using Maya Blend shapes, Blender shape keys, etc. Then, in Unity, you can animate the visemes by manipulating Blend shape weights, by means of a Unity Blend tree, and so on.
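On the Speech SDK side, the synthesizer raises a VisemeReceived event for each viseme. The component below is an illustrative sketch, not the project's actual script: it logs each viseme, and a comment marks where blend shape weights would be driven. Note that the event fires on a background thread, so Unity objects must only be touched on the main thread:

```csharp
using Microsoft.CognitiveServices.Speech;
using UnityEngine;

public class VisemeDriver : MonoBehaviour
{
    public SkinnedMeshRenderer face; // mesh with one blend shape per viseme

    public void Bind(SpeechSynthesizer synthesizer)
    {
        synthesizer.VisemeReceived += (s, e) =>
        {
            // e.VisemeId identifies the mouth shape; e.AudioOffset is in 100-ns ticks.
            int visemeId = (int)e.VisemeId;
            Debug.Log($"Viseme {visemeId} at {e.AudioOffset / 10000} ms");
            // A real implementation would queue this event and, on the main thread,
            // drive the weights, e.g. face.SetBlendShapeWeight(visemeId, 100f);
        };
    }
}
```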

Lip Sync Visemes Maya Blend Shapes

Azure OpenAI

Sample context: Clippy is an endearing and helpful digital assistant, designed to make using Microsoft Office Suite of products more efficient and user-friendly. With his iconic paperclip shape and friendly personality, Clippy is always ready and willing to assist users with any task or question they may have. His ability to anticipate and address potential issues before they even arise has made him a beloved and iconic figure in the world of technology, widely recognized as an invaluable tool for productivity.

Experiment (1/23/2023): Azure OpenAI is deployed in the South Central US region and the Unity App client runs in the West US geo.

| Chat turn # | Sample Human request | Responsiveness test example 1 | Responsiveness test example 2 |
| --- | --- | --- | --- |
| 1 | What is Azure Cognitive Search? | Total tokens: 117, Azure OpenAI processing timing: 90.6115 ms, Azure OpenAI Studio browser request/response timing: 1.56 s | Total tokens: 117, Azure OpenAI processing timing: 96.2115 ms, Azure OpenAI Studio browser request/response timing: 1.40 s |
| 2 | What are the benefits of using Azure Cognitive Search? | Total tokens: 177, Azure OpenAI processing timing: 128.6817 ms, Azure OpenAI Studio browser request/response timing: 1.41 s | Total tokens: 167, Azure OpenAI processing timing: 73.1746 ms, Azure OpenAI Studio browser request/response timing: 957.91 ms |
| 3 | How can I index my data with Azure Cognitive Search? | Total tokens: 246, Azure OpenAI processing timing: 168.1983 ms, Azure OpenAI Studio browser request/response timing: 1.34 s | Total tokens: 229, Azure OpenAI processing timing: 229.8757 ms, Azure OpenAI Studio browser request/response timing: 597.07 ms |
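For reference, the Unity client can reach Azure OpenAI over plain REST. The sketch below is a minimal, illustrative completions call; the resource name, deployment name, and API version are placeholders, and a real client should JSON-encode the prompt properly rather than concatenating strings:

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class AzureOpenAIClient
{
    // Placeholder endpoint: substitute your own resource, deployment, and API version.
    private const string Endpoint =
        "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/completions?api-version=2022-12-01";

    public static async Task<string> CompleteAsync(HttpClient http, string apiKey, string prompt)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, Endpoint);
        request.Headers.Add("api-key", apiKey);
        // Naive JSON for brevity; escape/serialize the prompt in real code.
        request.Content = new StringContent(
            "{\"prompt\": \"" + prompt + "\", \"max_tokens\": 100}",
            Encoding.UTF8, "application/json");

        HttpResponseMessage response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(); // JSON with choices[0].text
    }
}
```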

GPU Acceleration

When building modern immersive EmbodiedAI experiences, responsiveness, fluidity, and speed of interaction are very important. Thus, GPU Acceleration will be an important factor in the successful adoption of your solution.

We've already described above how we take advantage of GPU Acceleration for the Azure Kinect Body Tracking SDK.

GPU Acceleration can also help in the context of any custom Machine Learning models you may want to build, train, deploy, and expose as a Web Service (Web API) for consumption in your EmbodiedAI solution. Please find more information about building and deploying custom Machine Learning models in Azure Cloud and Edge in these Medium articles: here and here. PyTorch is a popular open source machine learning framework that allows you to build such models and has built-in support for GPU Acceleration.

When building and deploying custom models in Azure Cloud, you can take advantage of Azure Machine Learning capabilities. When dealing with Hybrid and Edge scenarios, you may consider Azure Stack managed appliances, for example, Azure Stack Edge Pro with GPU, as described in these Medium articles: here and here. For the latter, you may also need to build your own Docker images for model training, process optimization, or real-time inference. The table below provides a handful of base-image options for GPU Accelerated PyTorch applications (MNIST example) using NVIDIA GPUs with CUDA support, which may be great for training and/or inference.

| Base Docker Image | Description | Dockerfile |
| --- | --- | --- |
| python:3.10 | Python | Link |
| nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04 | NVIDIA CUDA + cuDNN | Link |
| pytorch/pytorch:1.13.1-cuda11.6-cudnn8-devel | PyTorch | Link |
| mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04 | Azure ML (AML) | Link |
| nvcr.io/nvidia/pytorch:23.02-py3 | NVIDIA PyTorch | Link |
| mcr.microsoft.com/vscode/devcontainers/anaconda | VSCode Dev Container | Link |

To build the Docker images, please use the `docker build -t your_name .` command. To run the Docker containers with GPU acceleration, please use the `docker run --gpus all your_name` command.

Reference local setup: a laptop with an NVIDIA RTX GPU (great for training), a desktop with 2 NVIDIA GTX GPUs, and an Azure Stack Edge Pro with an NVIDIA Tesla T4 GPU (great for inference).

Disclaimer

This code is provided "as is", without warranties of any kind, and is to be used at your own risk.
