Alone Together with ChatGPT's language model
ChatGPT has been making headlines across the internet over the past couple of months, but it’s worth remembering that the technology behind it, OpenAI’s GPT-3 family of language models, has been around for quite a bit longer. In fact, we have been working with GPT-3 for the past 18 months, and I thought it would be useful to reflect on our experience.
As far back as mid-to-late 2021, we chose GPT-3 as the engine to drive our upcoming immersive AI short film series, Alone Together, produced with Dustin Harvey from XO Secret and Afro Viking Pictures. From the moment we integrated it into our app architecture we were blown away by its ability to mimic human speech behaviour, and it continues to amaze us every day.
Alone Together is a collection of episodes in which users watch a dramatic situation unfold between two characters, before finding themselves thrust into one of the roles and tasked with continuing the conversation. They are free to further explore the drama or to veer off and ask the character absolutely anything they want. Whatever the user decides, the character responds in an astonishingly human way – all thanks to GPT-3.
A hit at SXSW
Our goal was to explore the boundaries of AI chat in a creative context, and GPT-3 was by far the most promising of the language models available. As one of the first implementations of the technology in a creative medium, our demo certainly caught the attention of visitors at SXSW last year.
The experience goes like this:
Watch a short film episode. After 2-3 minutes the linear film seamlessly becomes interactive, with the user taking the position of the main protagonist, but also now in control of the camera – with which they can look around the film set in 360 degrees. An accurate avatar of another character from the film is sitting opposite the user and speaks to them. The user talks back, and is understood by the avatar, leading them into a conversation. After a couple of minutes, the film becomes linear again.
For the SXSW demo we developed a single short narrative, which we have since expanded into a four-episode series, with each episode both a discrete experience and part of an overarching whole. To avoid any spoilers I won’t go into any more details just yet!
On the path to true human-AI conversation
The more important point, at least for those of us interested in the tech, is what’s happening inside the app. To deliver the user experience above, we complete the following sequence of events in under two seconds:
- Capture the user’s voice as an audio file (using a custom algorithm to determine when the user has started and stopped a sentence)
- Convert the audio to text using the native iOS speech-to-text engine
- Send the resulting text to our GPT-3 model using OpenAI’s GPT-3 API
- Get the resulting text response from GPT-3
- Send that text to Microsoft’s Text-to-Speech service
- Get the resulting audio file from Microsoft and play it in the app
- Synchronise the avatar to the resulting audio using the SALSA lip-sync plugin
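For readers who think in code, the sequence above can be sketched roughly as follows. This is a minimal illustration in Python, not the code running in the app. Every name in it (`run_turn`, `TurnResult`, the `fake_*` stand-ins, `LATENCY_BUDGET_SECONDS`) is hypothetical. The three stages are passed in as plain functions so that real speech-to-text, language-model and text-to-speech clients could be swapped in, and each stage is timed so the whole turn can be checked against the two-second target. Voice capture, playback and lip-sync are left out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

# Target from the text: the whole round trip should finish in under two seconds.
LATENCY_BUDGET_SECONDS = 2.0


@dataclass
class TurnResult:
    """Outcome of one conversational turn, with per-stage timings in seconds."""
    reply_text: str
    reply_audio: bytes
    timings: Dict[str, float] = field(default_factory=dict)

    @property
    def total_seconds(self) -> float:
        return sum(self.timings.values())


def run_turn(
    user_audio: bytes,
    speech_to_text: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> TurnResult:
    """Run audio -> text -> reply text -> reply audio, timing every stage."""
    timings: Dict[str, float] = {}

    def timed(name, stage, value):
        start = clock()
        result = stage(value)
        timings[name] = clock() - start
        return result

    user_text = timed("speech_to_text", speech_to_text, user_audio)
    reply_text = timed("generate_reply", generate_reply, user_text)
    reply_audio = timed("text_to_speech", text_to_speech, reply_text)
    return TurnResult(reply_text, reply_audio, timings)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate_reply(text: str) -> str:
    return f"You said: {text}"


def fake_text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    result = run_turn(
        b"hello", fake_speech_to_text, fake_generate_reply, fake_text_to_speech
    )
    print(result.reply_text)
    print("within budget:", result.total_seconds < LATENCY_BUDGET_SECONDS)
```

Timing each stage separately is a deliberate choice in this sketch: with several network round trips in the chain, knowing which step consumes the budget matters more than knowing the total alone.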
The result is AI interaction that is getting closer and closer to true human conversational behaviour. Its potential uses are clearly vast, but in our case we are thrilled to have used it to unleash entirely new forms of creative expression.
ChatGPT will continue to astonish, terrify, delight and anger the world, but our experience with GPT-3, the model family behind it, demonstrates the mind-bending creative possibilities that this incredibly advanced technology unlocks.
Alone Together is coming to festivals this February, with a wider release scheduled for later in 2023.