Intune AI Voice Bot

In today’s blog post, I am announcing the release of our experimental AI-driven voice assistant for Microsoft Intune-related questions. As you know, I am a huge fan of automation and AI technologies. I teamed up with Fabian Peschke to develop this voice bot, which aims to help users with their Intune questions.

Our voice bot is built using two Microsoft cognitive services: Azure Speech Services and OpenAI’s GPT-35 Turbo. Azure Speech Services allows the bot to recognize and synthesize speech, while OpenAI’s engine enables the bot to understand and respond to user queries intelligently. The bot was developed based on this example from Microsoft.

Content

  1. Content
  2. Requirements
  3. How it works
    1. Azure Speech services
    2. OpenAI’s services
    3. Integration of Azure Speech Services and GPT-35 Turbo Engine
  4. Where can I find the script
  5. How to set up
    1. Set up Azure Speech Services
    2. Set up OpenAI’s GPT-35 Turbo Engine
    3. Configure the Script
  6. Conclusion

Requirements

  • Python 3.6 or higher
  • Azure Cognitive Services Speech SDK
  • OpenAI Python library

You can install the required libraries using pip:

pip install azure-cognitiveservices-speech
pip install openai

How it works

In this chapter, we will dive into how the AI-driven voice assistant for Microsoft Intune works. We will look at the integration of Azure Speech Services and OpenAI’s engine and explain how they work together to provide a seamless user experience.

Azure Speech services

Azure Speech Services is a suite of APIs provided by Microsoft that facilitates speech recognition and synthesis. In our voice assistant, we use two main components of Azure Speech Services:

Speech Recognition: This component enables the voice assistant to transcribe spoken words into text. It listens to the user’s voice and converts the speech into text, which can then be processed by the OpenAI engine.

Speech Synthesis: This component is responsible for converting the text-based responses generated by the GPT-35 Turbo engine into spoken words. It uses a neural text-to-speech system to synthesize human-like speech, allowing the voice assistant to deliver answers audibly.
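To make these two components more tangible, here is a minimal sketch (key, region, language and voice are placeholder values) that recognizes a single utterance from the default microphone and then reads a short reply back through the default speaker:

import azure.cognitiveservices.speech as speechsdk

# Placeholder values - replace them with your own Speech key and region
speech_config = speechsdk.SpeechConfig(subscription="<your-speech-key>", region="eastus")
speech_config.speech_recognition_language = "en-US"
speech_config.speech_synthesis_voice_name = "en-US-JennyMultilingualNeural"

# Speech Recognition: transcribe one utterance from the default microphone
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    audio_config=speechsdk.audio.AudioConfig(use_default_microphone=True))
result = recognizer.recognize_once_async().get()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: " + result.text)

# Speech Synthesis: read a short answer back through the default speaker
synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config,
    audio_config=speechsdk.audio.AudioOutputConfig(use_default_speaker=True))
synthesizer.speak_text_async("I heard you.").get()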

OpenAI’s services

The GPT-35 Turbo engine is a powerful language model developed by OpenAI. It can understand and generate human-like text based on a given prompt. In our voice assistant, the GPT-35 Turbo engine processes the text generated by the speech recognition component and generates an appropriate response based on the user’s query.
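Conceptually, a single question-and-answer round trip against the GPT-35 Turbo deployment looks like the sketch below (endpoint, key, deployment name and prompts are placeholders; the full script further down additionally keeps the whole conversation history):

import openai

# Placeholder Azure OpenAI settings - use your own endpoint, key and deployment name
openai.api_type = "azure"
openai.api_base = "https://XXXXXXX.openai.azure.com/"
openai.api_version = "2023-03-15-preview"
openai.api_key = "<your-openai-key>"

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # name of your model deployment
    messages=[
        {"role": "system", "content": "You are a friendly and concise Microsoft Intune expert."},
        {"role": "user", "content": "How do I create a compliance policy in Intune?"}],
    max_tokens=50)

print(response['choices'][0]['message']['content'])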

Integration of Azure Speech Services and GPT-35 Turbo Engine

The script we’ve developed integrates Azure Speech Services and OpenAI’s GPT-35 Turbo engine. Here’s a step-by-step explanation of how the process works:

  • The user initiates the conversation with the voice assistant by saying “Hey” followed by their question.
  • The speech recognition component of Azure Speech Services transcribes the user’s speech into text.
  • The text is passed to the GPT-35 Turbo model, which processes the input and generates an appropriate response based on the conversation history and the system prompt (base_message) defined in the script:

You are an Microsoft Intune senior expert voice assistant who can answer all intune related questions. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Microsoft products or intune.

  • The generated response is then passed to the speech synthesis component, which converts the text into speech and delivers the answer audibly.
  • If the user wishes to end the conversation, they can say “Stop” or press Ctrl-Z. To reset the conversation and delete the history, the user can say “Reset.”

Where can I find the script

As always, you can find the script in my GitHub repository or here:

import os
import azure.cognitiveservices.speech as speechsdk
import openai

# Speech Services
speech_key = ""
speech_region = "" #"eastus"
language = "" #"en-US"
voice = "" #"en-US-JennyMultilingualNeural"

# Open Ai
openai.api_key = ""
openai.api_base =  "https://XXXXXXX.openai.azure.com/"
openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
deployment_id= "" #"gpt-35-turbo"

# Prompt
base_message = [{"role":"system","content":"You are an Microsoft Intune senior expert voice assistant who can answer all intune related questions. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Microsoft products or intune."}]


#######################
###### Functions ######
#######################
def ask_openai(prompt):
    # Append the user's question to the conversation history
    base_message.append({"role": "user", "content": prompt})

    response = openai.ChatCompletion.create(
        engine=deployment_id,
        messages=base_message,
        temperature=0.24,
        max_tokens=50,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None)

    # Clean up the generated answer and keep it in the history for follow-up questions
    text = response['choices'][0]['message']['content'].replace('\n', ' ').replace(' .', '.').strip()
    print('Azure OpenAI response: ' + text)
    base_message.append({"role": "assistant", "content": text})

    # Read the answer out loud
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized to speaker for text [{}]".format(text))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))

def chat_with_open_ai():
    # base_message is reassigned when the user says "Reset", so declare it global
    global base_message
    while True:
        print("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.")
        try:
            speech_recognition_result = speech_recognizer.recognize_once_async().get()
            if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                text = speech_recognition_result.text
                if text == "Stop.":
                    print("Conversation ended.")
                    break
                if text == "Reset.":
                    # Drop the conversation history and start over with a fresh system prompt
                    print("Reset")
                    base_message = [{"role":"system","content":"You are an AI voice assistant that helps to answer questions."}]
                # Only questions starting with "Hey" are sent to Azure OpenAI
                if "Hey" in text:
                    print("Recognized speech: {}".format(speech_recognition_result.text))
                    ask_openai(speech_recognition_result.text)
            elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
                print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
                break
            elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = speech_recognition_result.cancellation_details
                print("Speech Recognition canceled: {}".format(cancellation_details.reason))
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(cancellation_details.error_details))
        except EOFError:
            break

#######################
######## Start ########
#######################
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
audio_output_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
speech_config.speech_recognition_language=language
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
speech_config.speech_synthesis_voice_name=voice
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config)

try:
    chat_with_open_ai()
except Exception as err:
    print("Encountered exception. {}".format(err))

How to set up

As described in the chapter “How it works”, we need two cognitive services. In this chapter, I will show you how to add both of these services in Azure.

Set up Azure Speech Services

To set up Azure Speech Services, follow these steps:

  • In the search bar, type “Speech” and select “Speech” from the search results.
  • Click on the “Create” button to start the setup process.
  • Fill in the required fields, including the subscription, resource group, name, region, and pricing tier. Then click “Review + create”.

If this is your first Speech resource, you can select the pricing tier F0 to get a free monthly quota.

  • After reviewing your settings, click “Create” to deploy the Speech service.
  • Once the deployment is complete, navigate to the “Keys and Endpoint” section of the Speech resource. Make a note of the “Key1” and the “Location/Region” as you will need them later to configure the script.

Set up OpenAI’s GPT-35 Turbo Engine

To set up the GPT-35 Turbo engine, you will need access to OpenAI’s API. Follow these steps:

  • Fill out the following form to request access to the Azure OpenAI service. (It can take a few days until the request is approved.)
  • Sign in to the Azure portal (https://portal.azure.com/)
  • Click Create a resource
  • In the search bar, type “OpenAI” and select “Azure OpenAI” from the search results.
  • Click on the “Create” button to start the setup process.
  • Fill in the required fields, including the subscription, resource group, name, region, and pricing tier. Then click “Review + create”.
  • After reviewing your settings, click “Create” to deploy the OpenAI service.
  • Once the deployment is complete, navigate to the “Keys and Endpoint” section of the OpenAI resource. Make a note of the “Key1” and the “Endpoint” as you will need them later to configure the script.
  • Click on “Model deployments” and then “+ Create”.
  • Deploy the gpt-35-turbo model

Configure the Script

Now that you have both Azure Speech Services and the Azure OpenAI service set up, you need to configure the script with the required API keys and endpoints.

  • Open the script in your preferred code editor, e.g. VS Code.
  • Fill in the values you noted earlier into the following variables (see the example after this list):
  • speech_key: Your Azure Cognitive Services Speech subscription key.
  • speech_region: The region of your Azure Cognitive Services Speech service (e.g., “eastus”).
  • language: The language for speech recognition and synthesis (e.g., “en-US”).
  • voice: The voice for speech synthesis (e.g., “en-US-JennyMultilingualNeural”).
  • openai.api_key: Your OpenAI API key.
  • openai.api_base: The base URL for the OpenAI API (e.g., “https://XXXXXXX.openai.azure.com/”).
  • deployment_id: The deployment ID for GPT-3.5 Turbo (e.g., “gpt-35-turbo”).
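For illustration, a filled-in configuration block could look like this (all values are placeholders; use your own key, region, endpoint and deployment name):

# Speech Services
speech_key = "<your-speech-key>"
speech_region = "eastus"
language = "en-US"
voice = "en-US-JennyMultilingualNeural"

# Azure OpenAI
openai.api_key = "<your-openai-key>"
openai.api_base = "https://XXXXXXX.openai.azure.com/"
openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
deployment_id = "gpt-35-turbo"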

You can find more information about the supported languages and voices here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=stt

  • Save the changes and run the script.

You can run the script on your PC, or you can build your own voice assistant with the help of a Raspberry Pi, a speaker, and a microphone.

Conclusion

With the script configured, you can now run the voice assistant. Follow the instructions in the script to start a conversation with the bot and ask your Microsoft Intune-related questions.

You also have plenty of possibilities to develop this further, for example to simplify your daily work or to answer recurring questions.

That’s it! You have successfully set up both cognitive services and configured the script for the AI-driven voice assistant. Enjoy your new Intune voice assistant and let me know if you have any questions or feedback.