Virtual reality (VR) is evolving rapidly, and AI-based chatbots are now being integrated into VR environments. We set out to explore to what extent it is already possible to equip scripted non-player characters, i.e. the computer-controlled characters in games, with artificial intelligence and turn them into autonomous agents. The aim is for these AI agents to engage in free dialog with players. In theory, this would make future gaming experiences far more immersive and personalized, as the characters in the games could improvise in a similar way to actors.
Our VR chatbot prototype offers a showroom experience in virtual reality, accompanied by two virtual moderators. These are not conventional chatbots, but can provide comprehensive answers to questions about VR devices. The showroom is divided into areas for different VR devices such as Pico Neo 3, Oculus Rift, Meta Quest and HTC Vive.
The chatbot reacts proactively to user actions, for example when a user approaches a VR device. These interactions make the experience feel more realistic. In our example, the two moderators showed clearly different character traits. Notably, both the female and the male character tended to be extremely long-winded in their responses, a tendency we could hardly rein in; with more time we probably could have managed it. I found it particularly fascinating that the virtual characters reacted to some conversations in surprising and sometimes angry ways, making it feel like we were communicating with a human being with their own whims. However, discussions beyond our predetermined VR theme were sometimes on a rather childish level: the AI-controlled characters reminded me of twelve-year-old children in the body of an adult avatar who, for some inexplicable reason, had a detailed knowledge of virtual reality. The conversations became particularly amusing when I asked the avatars personal questions. For example, the female avatar found her existence boring at times, while the other described hers as extremely fulfilling.
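To give an idea of how such a proactive reaction can be wired up in Unity, here is a minimal sketch: a trigger collider around an exhibit notifies a moderator when the player walks up to it. The `ModeratorAgent` base class and its `CommentOn` method are illustrative assumptions for this example, not part of any specific SDK.

```csharp
using UnityEngine;

// Hypothetical base class for whatever component drives a moderator's dialog.
public abstract class ModeratorAgent : MonoBehaviour
{
    // Ask the agent to proactively comment on an exhibit the player walked up to.
    public abstract void CommentOn(string exhibitName);
}

// Attach to an exhibit that has a trigger collider surrounding it.
public class ExhibitProximityTrigger : MonoBehaviour
{
    [SerializeField] private ModeratorAgent moderator;       // assigned in the Inspector
    [SerializeField] private string exhibitName = "Meta Quest";
    private bool alreadyCommented;

    private void OnTriggerEnter(Collider other)
    {
        // React only to the player, and only once while they stay in range.
        if (alreadyCommented || !other.CompareTag("Player")) return;
        alreadyCommented = true;
        moderator.CommentOn(exhibitName);
    }

    private void OnTriggerExit(Collider other)
    {
        if (other.CompareTag("Player")) alreadyCommented = false;
    }
}
```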
An extensive knowledge database provides the chatbots with information about VR devices so that they can give precise answers and compare different devices. As for specialist knowledge: although large language models carry a great deal of knowledge, it is often not up to date. In our case, the chatbot lacked knowledge about the latest hardware, so we had to implement our own, up-to-date knowledge database on the topic of virtual reality. With it in place, the chatbot understood, for example, that the devices formerly known as Oculus Quest are now called Meta Quest. Before the updated database existed, the chatbot tried to correct me when I spoke of the Meta Quest and could not be convinced that the name had changed.
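One simple way to ground a language model with up-to-date facts is to prepend a handful of relevant entries from a curated knowledge base to the prompt before it is sent to the model. The sketch below only illustrates that idea under our own assumptions; the fact list, the `BuildPrompt` helper and the naive keyword matching are placeholders, not the exact implementation of our database.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Minimal illustration of grounding the LLM with curated, up-to-date VR facts.
public static class KnowledgePrompt
{
    // A tiny hand-maintained knowledge base; in practice this would be much larger.
    private static readonly List<string> Facts = new List<string>
    {
        "The headsets formerly sold as Oculus Quest are now called Meta Quest.",
        "The Pico Neo 3 is a standalone VR headset.",
        "The HTC Vive requires external base stations for tracking."
    };

    // Prepend the facts that look relevant to the user's question.
    public static string BuildPrompt(string userQuestion)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Use the following up-to-date facts when answering:");
        foreach (var fact in Facts.Where(f => SharesKeyword(f, userQuestion)))
            sb.AppendLine("- " + fact);
        sb.AppendLine("Question: " + userQuestion);
        return sb.ToString();
    }

    // Naive keyword overlap; a real system would use embeddings or proper retrieval.
    private static bool SharesKeyword(string fact, string question) =>
        question.ToLowerInvariant().Split(' ')
                .Any(w => w.Length > 3 && fact.ToLowerInvariant().Contains(w));
}
```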
An intent recognition function in the chatbot prototype enables the AI to react appropriately to certain inputs or events, such as switching to another moderator or following the user.
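Conceptually, intent recognition maps a recognized phrase or event to a discrete action the avatar can perform. The enum and the keyword matching below are a deliberately simplified stand-in for whatever the chatbot platform actually provides; they only illustrate the mapping.

```csharp
// The discrete actions our moderators can perform in the showroom.
public enum ModeratorIntent
{
    None,
    SwitchModerator,   // hand the conversation over to the other moderator
    FollowUser,        // walk along with the visitor
    StopFollowing
}

public static class IntentRecognizer
{
    // Very simple keyword-based recognition as a stand-in for the real classifier.
    public static ModeratorIntent Recognize(string utterance)
    {
        var text = utterance.ToLowerInvariant();
        if (text.Contains("other moderator") || text.Contains("someone else"))
            return ModeratorIntent.SwitchModerator;
        if (text.Contains("follow me") || text.Contains("come with me"))
            return ModeratorIntent.FollowUser;
        if (text.Contains("stay here") || text.Contains("stop following"))
            return ModeratorIntent.StopFollowing;
        return ModeratorIntent.None;
    }
}
```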
The moderators of the VR chatbot can have different personalities and speaking styles, which personalizes the experience and makes it more authentic. However, it is difficult to predict how these abstract parameters will actually shape the behavior of the respective AI character.
We had to do a lot of testing here to avoid ending up with a thoroughly moody and rude presenter, or an unbearably overenthusiastic one.
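To make those abstract parameters a bit more concrete, here is a rough sketch of how such personality settings could be modelled as a serializable config that is folded into a system prompt. The parameter names are illustrative assumptions; platforms like Inworld or ConvAI expose their own sets of sliders.

```csharp
using System;
using UnityEngine;

// Illustrative personality parameters; the real platforms expose their own sliders.
[Serializable]
public class ModeratorPersonality
{
    [Range(0f, 1f)] public float friendliness = 0.7f;
    [Range(0f, 1f)] public float enthusiasm = 0.5f;   // too high felt "over-motivated"
    [Range(0f, 1f)] public float verbosity = 0.3f;    // kept low to curb long-winded answers
    public string speakingStyle = "casual, slightly witty";

    // Fold the sliders into a system prompt for the language model.
    public string ToSystemPrompt(string name) =>
        $"You are {name}, a showroom moderator. " +
        $"Friendliness {friendliness:0.0}/1, enthusiasm {enthusiasm:0.0}/1. " +
        $"Keep answers short (verbosity {verbosity:0.0}/1) and speak in a {speakingStyle} tone.";
}
```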
The main challenge was to keep the latency between the user's question and the chatbots' response as short as possible. This is a complex task, as a variety of technical processes take place in the background. First, the user's voice message is recorded and, as soon as there is a pause in the conversation, sent to a server running a speech-to-text service. The server converts the spoken words into text, which is forwarded to the language model. The language model then generates a response, which is converted into audio data by a text-to-speech engine. This audio data is sent back to our application and played back by the avatar as the answer. Each of these phases must complete as quickly as possible to keep the conversation credible and fluid. As there is no language model on the market that works as fast in German as in English, we had to have the avatars speak English; all attempts with German resulted in long waiting times between a question and the answer of our AI chatbots.
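The whole round trip can be thought of as a small asynchronous pipeline, sketched below with the stages in order. The `ISpeechToText`, `ILanguageModel` and `ITextToSpeech` interfaces and the `ConversationPipeline` class are placeholders of our own for whichever services are plugged in (Whisper, ChatGPT, ElevenLabs, and so on); timing the stages is how latency problems can be pinned down.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

// Placeholder interfaces for the pluggable services (STT, LLM, TTS).
public interface ISpeechToText  { Task<string> TranscribeAsync(byte[] recordedAudio); }
public interface ILanguageModel { Task<string> GenerateReplyAsync(string prompt); }
public interface ITextToSpeech  { Task<byte[]> SynthesizeAsync(string reply); }

public class ConversationPipeline
{
    private readonly ISpeechToText stt;
    private readonly ILanguageModel llm;
    private readonly ITextToSpeech tts;

    public ConversationPipeline(ISpeechToText stt, ILanguageModel llm, ITextToSpeech tts)
    {
        this.stt = stt; this.llm = llm; this.tts = tts;
    }

    // One full round trip: recorded question -> transcript -> reply -> audio for the avatar.
    public async Task<byte[]> HandleUtteranceAsync(byte[] recordedAudio)
    {
        var watch = Stopwatch.StartNew();

        string transcript = await stt.TranscribeAsync(recordedAudio);
        string reply      = await llm.GenerateReplyAsync(transcript);
        byte[] replyAudio = await tts.SynthesizeAsync(reply);

        // Total latency is what decides whether the conversation feels fluid.
        UnityEngine.Debug.Log($"Round trip took {watch.ElapsedMilliseconds} ms");
        return replyAudio;
    }
}
```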
Unity is our 3D engine of choice. Unreal would also have been an option, but we work faster in Unity.
We created the avatars with Ready Player Me. This is quite quick, but the avatars all have a somewhat cartoonish look.
Inworld is a company that specializes in the development of AI-based avatars. The company offers a platform for the creation and management of avatars that can be used in virtual worlds. With Inworld, AI avatars can be created quite intuitively and require hardly any programming experience. Conveniently, Ready Player Me avatars can be integrated directly into Inworld.
TTS stands for "text-to-speech" and refers to the technology of generating speech from text.
IBM Watson is an AI platform from IBM that can be used for a variety of applications, including TTS. Watson's TTS function sounds very human and offers a range of setting options to personalize the output.
ReadSpeaker is a company that specializes in the development of TTS solutions. The company offers a range of TTS solutions for various application areas.
ConvAI is a company that specializes in the development of AI-based chatbots. The company offers a range of chatbot solutions for various application areas.
LMNT: Emotive AI is a company that specializes in AI-based text-to-speech voices that can express emotion.
ElevenLabs is a company that specializes in the development of TTS solutions. The company offers a range of TTS solutions for various application areas.
| Category | Technology | Features |
|---|---|---|
| STT | OpenAI Whisper | + Automatically multilingual<br>+ Understands slurred speech<br>+ Very accurate<br>~ Auto-corrects wrong inflections<br>- No audio stream support<br>- Slow |
| | ConvAI | + Fast<br>+ Supports audio streaming<br>~ Moderately accurate with clear pronunciation<br>- Often swallows the last spoken word |
| | Inworld | + Fast<br>+ Supports audio streaming<br>+ Relatively accurate<br>- English only |
| LLM | OpenAI ChatGPT | + Very accurate<br>+ Extensive knowledge<br>+ Very human answers<br>+ Responds appropriately to the role and with emotion<br>+ Also answers in colloquial language or slang<br>- Content of the answers is difficult to control<br>- Response length is difficult to limit<br>- Often falls out of role despite instructions not to<br>- Clever prompts lead to far-reaching digressions<br>- Variable speed independent of prompt length<br>- Noticeably decreasing speed with prompt length |
| | ConvAI | + Very fast<br>+ Very good setting options<br>+ Simultaneous use of different personalities<br>~ Supports actions (but often triggers them unreliably and randomly)<br>- Repetitive<br>- Often starts German answers with a meaningless "And," |
| | Inworld | + Fast<br>+ Very good setting options<br>+ Does not fall out of role quickly<br>+ Simultaneous use of different personalities<br>+ Emotion events<br>- English only |
| TTS | IBM Watson | + Sounds very human<br>+ Adjustment options (pitch/speed)<br>~ Intonation OK<br>~ Phrasing OK |
| | ReadSpeaker | + Fast<br>- Recognizably computer-generated<br>- No emotion<br>- Lack of intonation<br>- Monotonous rhythm<br>- No phrasing<br>- No setting options |
| | ConvAI | + Fast<br>+ Sounds human<br>- Monotone<br>- Emotionless |
| | LMNT: Emotive AI | + Fast<br>- Robotic intonation<br>- English only |
| | ElevenLabs | + Sounds human<br>+ Good emotion<br>+ Good intonation<br>+ Natural rhythm<br>+ Phrasing<br>+ Many setting options<br>- Slow |
| | Inworld | + Fast<br>+ Sounds human<br>+ Many setting options<br>- English only |
Despite promising aspects, there are challenges such as fine-tuning the AI and the risk of inappropriate responses. We are committed to further developing this technology and providing a safer, more interactive VR experience.
Our VR chatbot prototype represents a significant step forward in combining VR and AI. We continue to explore new possibilities and strive to improve the VR experience through advanced chatbots.
Are you interested in developing a virtual reality or 360° application? You may still have questions about budget and implementation. Feel free to contact me.
I look forward to hearing from you.
Clarence Dadson
CEO, Design4real