Cartesia | Series A and the future of voice AI

At Cartesia, we’re building the future of voice AI: ultra-realistic, fast, and controllable. Over the past year, we’ve powered millions of calls and helped tens of thousands of creators make their content more accessible.

We’re thrilled to announce our $64 million Series A led by Kleiner Perkins. The new funding will help us expand our team and invest in research to build the next generation of models, infrastructure, and products for voice, starting with the launch of our latest voice generation model, Sonic 2.0.

Sonic 2.0 is built on our new state space model architecture and is the fastest and most controllable voice model available today. It’s twice as large as Sonic, yet runs faster, at just 90ms latency for the full model and 40ms for turbo. And in blind, head-to-head evaluations on 100 held-out voices, 1.5x as many people preferred Sonic 2.0 over the next-best provider.

Beyond speed and quality, Sonic 2.0 offers unprecedented control over generations, with best-in-class voice cloning that captures complex accents and rich audio soundscapes. We’ve also introduced two powerful new endpoints:

Voice changer – Perfect the style and voice of your audio.
Infill – Seamlessly edit content within your audio.

We’re building the platform for voice AI with enterprise-grade infrastructure. The Sonic API is purpose-built for developers and has the most reliable serving stack for voice generation, with 99.9% uptime and the fastest P90 latencies globally. We’re SOC 2 and HIPAA compliant and support real-time on-premises and on-device deployments.

Finally, we’re continuing to advance our long-term research agenda. The next generation of audio models will require algorithmic advances in several areas, including streaming architectures, codecs, long-context modeling, and on-device inference, and we’re excited to share our progress here.

Learn more about our work on voice AI at cartesia.ai/sonic. If you’re interested in working with us, please reach out.