What is the split between your individual and enterprise customers?
Today it’s close to 60/40, or 50/50. It was [previously] lower on the enterprise side; at the beginning of 2024, it was 90/10. A lot of that growth is on the enterprise side, especially now as we co-build and work deeper with some companies, like Deutsche Telekom or Epic Games.
ElevenLabs builds both conversational AI chatbots and creative tools that generate audio based on prompts. Which is moving quicker?
Conversational AI. The portion of self-serve that’s using conversational AI is mostly developers. There’s an interesting push from very traditional spaces to also move into that conversational segment. Epic Games is a great example. That was probably one of the biggest deployments that we’ve ever done: bringing into Fortnite an experience where everybody could interact with Darth Vader. That was done in partnership with the estate of James Earl Jones. Millions of players could interact with Darth Vader live, on the fly. That was not something you could do traditionally, moving from static pre-generated lines to a dynamic character. That was huge. Now we are seeing the customer-support [field] moving in the direction of huge disruption [thanks to] conversational AI.
How can you keep ahead of big players?
I think we have some of the best people. And with text-to-speech [and] with speech-to-text, the next big challenge that everybody’s trying to solve is: can you train an omni-model, a combination of an LLM [large language model] and speech, which can produce much better conversation while making sure it’s not only emotional and quick, but also stable. We have a prototype internally; that’s the biggest thing we are trying to create later this year. But the goal for conversational AI as a product, and for the research that will power that product, is effectively passing the Turing test for a conversation with an AI agent. So you feel like it’s a real conversation. That’s the North Star.
I thought we’d already passed the Turing test benchmark.
Pure voice interactions like customer support probably pass the Turing test, and I think, hopefully, we were one of the first to do so with some of the things we’re doing. But I think what’s still missing is emotional, contextual awareness and [a] higher intelligence threshold in a conversation.
How do you deal with people misusing your technology?
We’ve built safeguards. One is transparency, or provenance, so every [piece of] content is traceable back to the account. Second is the moderation side, where we moderate both the text and the voice. So we moderate for fraud and scams, we moderate for child safety. On the voice side, we moderate to check voices are not misused. The last piece is how we can bring the technology to people, so they know that they are interacting with AI. [There is] a classifier so people can upload audio content and get information as to [whether it’s] AI or not. We’re partnering with Oxford, Berkeley, Reality Defender, and AI safety institutes in the U.S. and U.K. to give a classifier to other organizations. The very big piece is how we can give all of the technology to good actors while preventing the bad actors. It’s that balance.