Ego revolutionizes gaming with interactive characters powered by Cartesia

Ego is an AI-native simulation engine where users can create and share 3D animated characters, worlds, and game scripts using natural language. Ego’s vision is revolutionary for gaming: imagine being able to prompt today's most popular games, like Minecraft or Animal Crossing, into existence through simple commands.
Ego’s founders bring deep expertise in both gaming and AI. CEO Vishnu Hari, a former Top 500 Overwatch player, previously led product teams at Facebook AI Applied Research (FAIR) and the Meta Horizon Scripting team. CTO Peggy Wang, who shipped ML algorithms for Meta's AR Avatar face tracking, brings experience from Stanford AI Research and Lyft Level 5 in autonomous behavior planning algorithms for robotics.
Partnering with Cartesia has accelerated Ego's vision of allowing gamers to spawn the 3D world they imagine, complete with AI agents and Non-Player Characters (NPCs) that display human-like behavior. This partnership was recently showcased with the launch of Thrall, a game mod for the Viking-themed survival game Valheim. Using Cartesia's ultra-low latency voice technology (<90 ms) and advanced voice design capabilities, Thrall responds to players with human-like reactions, emotional awareness, and natural interactions.
Check out the launch tweet from Ego here.
The Role of Voice AI in Immersive Gaming
Until recently, open-world gaming faced three major limitations:
NPCs were limited to rigid, pre-programmed behaviors without true agency
Creating game interactions required extensive coding expertise
Developing and scaling 3D assets and environments was prohibitively resource-intensive
The proliferation of generative AI has changed this landscape dramatically. Ego saw an opportunity to democratize immersive gaming by enabling anyone to create rich gaming experiences through natural language—similar to filling out a character sheet in Dungeons & Dragons.
Voice AI enhances two critical elements in modern gaming:
Non-Player Characters (NPCs): The computer-controlled characters that populate game worlds—from shopkeepers to quest-givers
AI Companions: Intelligent agents that can make decisions and interact like human players
Both require voices that can match their intelligence and adapt to emotion and context in real time.
The challenge
When evaluating voice providers, Ego needed to meet several critical requirements:
Achieving natural, real-time voice interactions between players and AI companions
Generating contextually appropriate emotional responses
Supporting multiple languages for international players
Creating distinct voices for different character personalities
Maintaining low latency for seamless gaming experiences
The solution
Gaming demands voices that can match dynamic player interactions. Ego chose Cartesia's Sonic model for its distinctive capabilities:
Emotion-First Design: Unlike traditional text-to-speech models, Sonic enables AI companions to express a full spectrum of emotions, from curiosity to combat urgency, making every interaction feel authentic. Alternative providers typically have some form of emotion recognition pre-trained into their models, but their contextual awareness falls flat. Cartesia gives customers fine-grained control over emotions without having to regenerate the voice over and over again.
Gaming-Optimized Latency: Powered by Cartesia's breakthrough state space model (SSM) architecture, Sonic delivers industry-leading 90 ms latency, which is essential for maintaining immersion in real-time gaming environments.
Unlimited Dynamic Character Voices: Cartesia's instant voice cloning needs just 10 seconds of audio to give each AI character its own unique voice characteristics. This enables Ego to create diverse, memorable characters that resonate with players while maintaining consistent personalities.
Global Gaming Community: Ego’s games reach a global audience through support for 14 other languages as well as regional accents, ensuring authentic character interactions worldwide.
Voice Prompting: Cartesia's voice changer allows precise control over dialogue delivery. Developers can record a sample with the desired style, emotion, and prosody, then apply it to any character voice, enabling natural voice interactions without the need for repeated studio sessions.
The results
Ego's launch of Thrall represents a breakthrough in NPC design: players can use natural voice commands to direct it to perform any task a human player would do, from gathering resources to engaging in combat. The AI companion responds with contextually appropriate emotions, celebrating successful hunts or acknowledging commands with human-like reactions. Most importantly, players can customize Thrall's personality and behavioral traits, creating a truly personalized gaming companion.
The partnership represents a significant step toward Ego's vision of democratizing game creation and enabling truly interactive gaming experiences where AI characters feel alive and responsive.
"Gaming has always been where communities form - from my generation's World of Warcraft and Runescape to today's Roblox and Minecraft. As games evolve into social platforms, AI characters need to feel genuinely human in both their responsiveness and emotional depth. Cartesia's technology, with its ultra-low latency, natural voices, and precise emotional control, helps us create truly immersive worlds where AI characters feel alive and authentic."
Peggy Wang, Co-Founder and CTO, Ego
"As an avid gamer, I've been amazed by Ego's breakthrough in creating truly responsive gaming experiences. Previously, NPCs and game characters were limited by rigid, pre-programmed behaviors. Sonic brings these characters to life with natural personalities and expressiveness. I'm excited to contribute to Ego's vision for the future of interactive gaming."
Karan Goel, CEO, Cartesia


Experience the emotion-first voice AI
Sonic enables AI companions to express a full spectrum of emotions, from curiosity to combat urgency
Ego is an AI-native simulation engine where users can create and share 3D animated characters, worlds, and game scripts using natural language.
PRODUCTS
Voice Conversion
Voice Changer
Voice Cloning