Explore our advanced text to speech capabilities
Discover our text-to-speech technology in the playground or via the API: lifelike voices that follow your transcript accurately and give you full control over delivery.
No hallucinations
Our TTS technology enables accurate transcript handling, even with complex content like numbers, dates, or medical jargon.
Fine-grained control
Adjust pitch, speed, and emotion with our creator studio and API to create a personalized and engaging audio experience.
Fast response time
Experience a blazing fast 40ms model latency with our Sonic Turbo model, ensuring seamless real-time interactions.
Accurate and high quality text to speech
Cartesia text to speech accurately handles complex transcript elements such as names, phone numbers, confirmation numbers, medical terms, and industry jargon. It's perfect for voice AI agents in healthcare, insurance, banking, etc.
Phone number
Date
Take full control of the voice delivery for your content, with complete expressiveness and slider controls for speed and emotions.
Surprised
Sad
You can fine-tune the audio by adding breaks or pauses using <break /> tags. You can specify the break or pause duration in seconds (s) or milliseconds (ms).
With 500ms break tags
With 300ms and 600ms break tags
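As a rough illustration of how break tags might be assembled programmatically, here is a minimal Python sketch. The helper names `break_tag` and `join_with_breaks` are hypothetical, and the `time` attribute is an assumption: the text above only states that the duration is given in seconds (s) or milliseconds (ms), so check the API documentation for the exact tag syntax.

```python
def break_tag(duration, unit="ms"):
    """Return a break tag string such as <break time="500ms" />.

    Assumption: the duration is passed in a `time` attribute.
    """
    if unit not in ("s", "ms"):
        raise ValueError("unit must be 's' or 'ms'")
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Render whole numbers without a trailing ".0" (500.0 -> "500").
    value = str(int(duration)) if float(duration).is_integer() else str(duration)
    return f'<break time="{value}{unit}" />'


def join_with_breaks(sentences, pauses_ms):
    """Interleave sentences with millisecond break tags."""
    if not sentences:
        raise ValueError("need at least one sentence")
    if len(pauses_ms) != len(sentences) - 1:
        raise ValueError("need exactly one pause between each pair of sentences")
    parts = [sentences[0]]
    for pause, sentence in zip(pauses_ms, sentences[1:]):
        parts.append(break_tag(pause, "ms"))
        parts.append(sentence)
    return " ".join(parts)


transcript = join_with_breaks(
    ["Hello.", "My name is Sonic.", "Nice to meet you."],
    [300, 600],
)
print(transcript)
```

The resulting string can then be pasted into the playground or sent as the transcript in an API request.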
To spell out input text, you can wrap it in <spell> tags. This is particularly useful for pronouncing long numbers or identifiers, such as credit card numbers, phone numbers, or unique IDs.
Spell out numbers
Complex number and letter
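For transcripts generated by an application, identifiers can be wrapped in spell tags before the text is sent for synthesis. The sketch below is one possible approach, not an official utility: the function name `spell_identifiers`, the six-character threshold, and the rule "uppercase letters and digits, with at least one digit" are all illustrative assumptions.

```python
import re

# Candidate tokens: unbroken runs of uppercase letters and digits.
TOKEN = re.compile(r"\b[A-Z0-9]+\b")


def spell_identifiers(text, min_len=6):
    """Wrap long alphanumeric identifiers in <spell>...</spell> tags.

    A token is wrapped only if it has at least `min_len` characters
    and contains at least one digit, so ordinary words, acronyms,
    and short numbers such as years are left untouched.
    """
    def wrap(match):
        token = match.group(0)
        if len(token) >= min_len and any(ch.isdigit() for ch in token):
            return f"<spell>{token}</spell>"
        return token

    return TOKEN.sub(wrap, text)


print(spell_identifiers("Your confirmation number is AB12CD34."))
print(spell_identifiers("Call 4155550123 before the end of 2025."))
```

Note that this sketch does not detect text that is already wrapped, so it should be applied only once to a given transcript.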
What our customers say
Join the growing list of companies opting for Sonic.
"Communication in healthcare relies on countless forms of complex phone-based interactions—from appointment scheduling and benefits verification to following up on claim denials. We wanted a text-to-speech solution that's both natural and context aware in order to optimize our solution for every common workflow. Cartesia has delivered this, helping our best-in-class voice AI agents automate tedious phone calls across various use cases on behalf of the healthcare organizations we serve."
Sam Schwager, CEO and Co-Founder of SuperDial
"In 1999, Salesforce brought software to the cloud. In 2025, 11x is killing software as we know it and unleashing the era of digital workers. To realise this vision, we needed AI voice technology that feels truly human. Cartesia's technology gives our AI digital workers reps the speed, reliability, and natural expressiveness required to engage customers at scale. It's the only solution fit for our relentless drive toward innovation."
Keith Fearon, Head of Product & Growth, 11x
"Before conversational voice models like Cartesia, Thoughtly relied on legacy text-to-speech APIs from major cloud providers. Nearly two years later, the evolution of this technology is staggering—customers can clone their voice and hear it speaking autonomously over the phone in just 60 seconds."
Torrey Leonard, CEO, Thoughtly
How to use our text to speech playground and API
Step One
Visit Cartesia's website and sign up to access our TTS in the playground. Explore the documentation for API integration details.
Step Two
Select the desired language and voice settings. Use the playground or API to input text and generate audio in real time.
Step Three
Integrate the generated audio into your application, or export it to MP3, M4A, or another preferred audio format.