Building a Voice-Activated Chatbot with JavaScript

Diving into various programming techniques can be both educational and immensely exciting. One project that particularly caught my eye involves building a voice-activated chatbot using JavaScript. This project integrates several APIs, including OpenAI's GPT-3.5 for natural language processing, OpenAI's Whisper for audio transcription, and Eleven Labs for text-to-speech conversion, to create a seamless user experience.

In this blog post, I'll guide you through the process of developing this chatbot, explaining the code and sharing insights along the way. By the end, you'll understand how to set up and implement a voice-activated chatbot in JavaScript.


Project Overview

This project combines multiple APIs to create an interactive chatbot that listens to user input, converts audio to text, processes the text using OpenAI's GPT-3.5 model, and converts the responses back to speech. This setup allows for a fluid conversation with the chatbot through voice commands.

My implementation maintains a conversation history to ensure the bot's responses are contextually relevant. To manage performance and memory, the conversation history is truncated so that it stays at a manageable length.


What I Learned

  • API Integration: Learned how to integrate different APIs to handle tasks like transcription, generating responses, and text-to-speech conversion.
  • User Interface Management: Improved skills in managing user interactions and dynamically updating the web page using JavaScript.
  • Audio Processing: Gained experience in handling audio input, processing it, and converting text back to audio.

Implementing the Voice-Activated Chatbot

Let's break down the key components and explore how to implement the voice-activated chatbot using JavaScript.

Conversation History

The conversation history is stored in an array, allowing the bot to maintain context and generate relevant responses. The history is truncated to manage memory and performance.
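
A minimal sketch of such a history array, assuming the OpenAI chat message format ({ role, content }) and a hypothetical cap of 10 messages; the post does not say which limit the project actually uses, and `addToHistory` and `MAX_HISTORY` are illustrative names:

```javascript
// Hypothetical cap on stored messages; the real project's limit is not stated.
const MAX_HISTORY = 10;

// Each entry follows the OpenAI chat format: { role, content }.
const conversation = [];

// Append a message, then drop the oldest entries once the cap is exceeded.
function addToHistory(history, role, content, maxLength = MAX_HISTORY) {
  history.push({ role, content });
  if (history.length > maxLength) {
    // Remove from the front so only the newest maxLength messages remain
    history.splice(0, history.length - maxLength);
  }
  return history;
}
```

Dropping from the front keeps the most recent exchanges, which matter most for context. If you use a system prompt, it is simpler to keep it outside this array and prepend it when building each request, so truncation never removes it.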

Getting the Chatbot Response

We send the user's input to OpenAI's GPT-3.5 API and retrieve the chatbot's response. The conversation history is managed to keep context relevant and concise. This involves pushing user inputs and bot responses to the conversation array and trimming it to maintain a manageable length.
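
The request could look something like the sketch below, which posts the conversation to OpenAI's Chat Completions endpoint with `fetch`. The function names and passing the API key in as a parameter are assumptions for illustration, not the project's actual code:

```javascript
// Build the request for OpenAI's Chat Completions endpoint.
function buildChatRequest(history, apiKey, model = "gpt-3.5-turbo") {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // The whole (already truncated) history is sent so the model sees context
      body: JSON.stringify({ model, messages: history }),
    },
  };
}

// Send the conversation and return the assistant's reply text.
async function getChatbotResponse(history, apiKey) {
  const { url, options } = buildChatRequest(history, apiKey);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`OpenAI request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The caller then pushes the returned reply onto the history as an "assistant" message. Calling the API directly from the browser exposes the key to anyone who opens the developer tools, so a small server-side proxy is the safer choice for a public site.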

Audio Transcription

To handle audio inputs, we use OpenAI's Whisper API to convert spoken input into text. I combined the recorded audio into a Blob and uploaded it to the Whisper API, which returns the transcribed text:

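```javascript
// Combine the recorded audio chunks into a single Blob
const blobTranscript = new Blob(chunks, { type: "audio/wav" });
// Reset the buffer so the next recording starts empty (chunks must be declared with let)
chunks = [];
// Send the audio to Whisper and store the transcribed text (runs inside an async function)
user_text = await getTranscript(blobTranscript);
```
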
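
The post does not show `getTranscript` itself. A plausible sketch, assuming OpenAI's `/v1/audio/transcriptions` endpoint with the `whisper-1` model and an extra `apiKey` parameter that the original call does not take (`buildTranscriptionForm` is a hypothetical helper):

```javascript
// Build the multipart form expected by OpenAI's transcription endpoint.
function buildTranscriptionForm(audioBlob, filename = "recording.wav") {
  const form = new FormData();
  // The filename's extension should match the audio format actually recorded
  form.append("file", audioBlob, filename);
  form.append("model", "whisper-1");
  return form;
}

// Upload the audio and return the transcribed text.
async function getTranscript(audioBlob, apiKey) {
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    // No Content-Type header: fetch sets the multipart boundary itself
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audioBlob),
  });
  if (!res.ok) throw new Error(`Whisper request failed: ${res.status}`);
  const data = await res.json();
  return data.text; // the default JSON response carries the transcript in "text"
}
```

One thing to watch: in Chromium-based browsers, MediaRecorder typically produces WebM audio, so labeling the Blob "audio/wav" and naming it ".wav" may not match the real format; using the recorder's own mimeType and a matching extension avoids surprises.
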
Text-to-Speech Conversion

Eleven Labs' API converts the text responses into speech. The text is sent to the API, which returns an audio file that is then played back to the user. Because I only have limited free access to the Eleven Labs API, I cannot guarantee that the text-to-speech feature will work.
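
A sketch of that round trip, assuming Eleven Labs' `/v1/text-to-speech/{voice_id}` endpoint authenticated with the `xi-api-key` header. The voice ID is a placeholder, and `buildTtsRequest` and `speak` are hypothetical names:

```javascript
// Build a request for Eleven Labs' text-to-speech endpoint.
// voiceId is a placeholder: use an ID from your own Eleven Labs voice list.
function buildTtsRequest(text, voiceId, apiKey) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "xi-api-key": apiKey,
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({ text }),
    },
  };
}

// Fetch the synthesized speech and play it in the browser.
async function speak(text, voiceId, apiKey) {
  const { url, options } = buildTtsRequest(text, voiceId, apiKey);
  const res = await fetch(url, options);
  // Failures (for example, an exhausted free quota) surface as a non-OK status
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  const src = URL.createObjectURL(await res.blob());
  const audio = new Audio(src); // browser-only: Audio does not exist in Node
  audio.addEventListener("ended", () => URL.revokeObjectURL(src)); // free the blob URL
  await audio.play();
}
```

Catching the error thrown by `speak` lets the page still show the text reply when the quota runs out.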

Handling User Input

The user interface includes input fields for text, a dropdown for selecting prompts, and buttons for controlling voice recording. When a user enters text or uses the voice command, the chatbot processes the input and provides a response, which can also be spoken back using text-to-speech.
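
Whether the input comes from the text field or from a voice transcript, it can flow through one shared function. A minimal sketch, assuming the response and speech functions are passed in (`handleUserInput`, `getResponse`, and `speak` are hypothetical names), which also makes the flow testable without calling any API:

```javascript
// Route typed text or a voice transcript through the same pipeline.
// getResponse(history) -> Promise<string>; speak(text) -> Promise<void>.
async function handleUserInput(text, history, { getResponse, speak, useVoice = false }) {
  const input = text.trim();
  if (!input) return null; // ignore empty submissions
  history.push({ role: "user", content: input });
  const reply = await getResponse(history);
  history.push({ role: "assistant", content: reply });
  if (useVoice) await speak(reply); // optional text-to-speech playback
  return reply;
}
```

In the page, the text field's submit handler and the "stop recording" button would both call this function, the latter after the transcript comes back.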


Experimentation and Exploration

Once the basics are in place, you can start experimenting with different functionalities:

  • Custom Prompts: Add custom prompts to guide the conversation and provide users with options.
  • Enhanced Responses: Improve the chatbot's responses by incorporating more context and personalization.
  • Multi-Language Support: Implement multi-language support to cater to a wider audience.
  • Enhanced Interactivity: Add more commands to the bot to handle various interactions and responses.
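
For custom prompts and multi-language replies, one option is to turn the preset chosen in the dropdown into a system message. The presets, their keys, and `buildMessages` below are hypothetical examples:

```javascript
// Hypothetical prompt presets, keyed by the value of each dropdown option.
const PROMPT_PRESETS = {
  tutor: "You are a patient tutor. Explain ideas step by step.",
  chef: "You are a friendly chef who suggests simple recipes.",
};

// Build the messages sent to the model: a system message from the selected
// preset (plus an optional reply-language instruction), then the history.
function buildMessages(presetKey, history, language) {
  let system = PROMPT_PRESETS[presetKey] ?? "You are a helpful assistant.";
  if (language) system += ` Always reply in ${language}.`;
  return [{ role: "system", content: system }, ...history];
}
```

For spoken input in another language, Whisper's transcription endpoint also accepts an optional `language` parameter (an ISO-639-1 code such as `es`), which can improve accuracy when the language is known in advance.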

Conclusion

Creating a voice-activated chatbot is a fascinating project that combines multiple technologies to enhance user engagement. This project provided valuable insights into bot development, text-to-speech integration, and working with AI language models. I hope this post has inspired you to explore bot development and perhaps experiment with your own implementations.

The website for this project can be found here, but as mentioned earlier, some features may not work because of usage limits on the free API tokens.


Posted by: Aidan Vidal

Posted on: June 11, 2024