Ink: The fastest and most accurate speech-to-text model

Ranked #1 on accuracy, built for voice agents with semantic endpointing and industry-leading latency.

Join the teams making the switch to Cartesia

Ranked #1 in the Speech Arena and Speech to Text leaderboards by Artificial Analysis

One transcription model for every environment your business takes you to

Trains rumble past, announcements crackle overhead. Ink-2 transcribes every word the caller says.

Noisy city


Built for Voice Agents

Four capabilities that make Ink the transcription layer production agents rely on.

Dates, alphanumerics, IDs

Accuracy

Heard right the first time.

In practice

In a voice agent, the transcript is the foundation everything else builds on. A transcription error undermines the LLM input and takes the interaction in the wrong direction.

The inverse is equally true — accuracy compounds, and a precise transcript means a better response and a call that resolves.


Ink-2's approach

Ink has the lowest Word Error Rate (WER) of any streaming STT model, natively handling structured data — phone numbers, dates, emails, currencies, and UUIDs. Built for real-world audio settings — telephony, background noise, varied accents, and more.

Cartesia

  • turn.eager_end (0:05.8): Ink flags a likely end-of-turn, so your agent can start replying.

  • turn.eager_end (0:09.6): Caller keeps talking. Ink resumes, no false cutoff.

  • turn.end (9.6s sooner): Your agent can respond before the competitor even starts.

Competitor

  • turn.end (0:15.4): Only now does the agent know the caller finished, so it starts from scratch.

Conversational flow

Knows when you start and finish.

In practice

A conversation has two critical moments — when a caller starts talking and when they finish. Miss the start and the agent misses the turn entirely. Trigger too early on the end and the agent jumps in mid-thought. The right transcription model gets both right without the wait.


Ink-2's approach

Ink-2 is built with native turn detection — turn.start and turn.end signaled directly by the model, with no external VAD to integrate or maintain. For lower latency, turn.eager_end gives your LLM a head start before the turn is confirmed complete.

Semantic endpointing determines turn end by meaning, not silence — so pauses mid-thought don't trigger the agent prematurely.
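As a rough illustration of how an agent might consume these signals, the sketch below starts a speculative LLM draft on turn.eager_end, discards it if the caller keeps talking, and commits it on turn.end. Only the three event names come from this page. The event payload shape (a dict with "type" and "transcript" keys), the TurnTracker class, and the action names are hypothetical and are not Cartesia's SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnTracker:
    """Hypothetical handler that records what an agent would do per event."""

    actions: List[str] = field(default_factory=list)
    draft_for: Optional[str] = None  # transcript the current draft is based on

    def _cancel_draft(self) -> None:
        if self.draft_for is not None:
            self.actions.append("cancel_draft")
            self.draft_for = None

    def handle(self, event: dict) -> None:
        kind = event["type"]                    # assumed payload field
        text = event.get("transcript", "")      # assumed payload field
        if kind == "turn.start":
            # Caller is speaking: any draft from an earlier eager end is stale.
            self._cancel_draft()
        elif kind == "turn.eager_end":
            # Likely end of turn: start drafting early, replacing any old draft.
            self._cancel_draft()
            self.actions.append("start_draft")
            self.draft_for = text
        elif kind == "turn.end":
            # Confirmed end: reuse the draft only if it saw the full transcript.
            if self.draft_for == text:
                self.actions.append("commit_draft")
            else:
                self._cancel_draft()
                self.actions.append("respond_fresh")
            self.draft_for = None


tracker = TurnTracker()
for event in [
    {"type": "turn.start"},
    {"type": "turn.eager_end", "transcript": "I need to change my flight"},
    {"type": "turn.eager_end", "transcript": "I need to change my flight to Friday"},
    {"type": "turn.end", "transcript": "I need to change my flight to Friday"},
]:
    tracker.handle(event)
print(tracker.actions)
```

The design choice in this sketch is that a draft is cheap to throw away: acting on turn.eager_end costs little when the caller continues and saves time when the turn really has ended.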

  • Ink: 88ms

  • Blink of an eye: 100ms

  • Human response threshold: 150ms

Speed

The caller stops talking.
The agent starts thinking.

In practice

When transcription is fast and consistent, the agent's response feels immediate. One slow transcript in ten means one call in ten where that readiness breaks. Nine great calls don't cancel out the one that didn't feel right.
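The "one in ten" point is about tail latency rather than the average. The sketch below makes it concrete: the median of a batch can look excellent while a tenth of calls still miss the budget. The sample latencies are made up for illustration, the 150 ms budget borrows the human response threshold quoted on this page, and slow_fraction is a hypothetical helper.

```python
from statistics import median
from typing import Sequence


def slow_fraction(latencies_ms: Sequence[float], budget_ms: float) -> float:
    """Share of transcripts whose latency exceeds the budget."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    slow = sum(1 for latency in latencies_ms if latency > budget_ms)
    return slow / len(latencies_ms)


# Hypothetical samples: nine fast transcripts and one slow outlier.
samples_ms = [88, 90, 85, 92, 87, 89, 91, 86, 90, 450]

print(median(samples_ms))              # typical-case latency
print(slow_fraction(samples_ms, 150))  # share of calls that felt slow
```

Tracking the share of slow calls, or a high percentile, alongside the median is what surfaces the one call in ten that did not feel right.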


Ink-2's approach

Ink is the fastest streaming ASR model, running on a custom inference engine built for real-time conversation. Time to final transcript is 0.1s, and turn.eager_end further reduces the gap between the caller's last word and the agent's first response.

Cost

Quality that doesn't cost more as you grow.

In practice

Voice is the most natural interface for communication. Getting cost and quality right at scale enables voice everywhere — the default interface across every agentic interaction.


Ink-2's approach

Ink's State Space Model architecture delivers 10-100x the throughput of transformers — lower compute cost at scale, with no quality tradeoffs. Ongoing optimization of our model stack means better unit economics as you scale.

Enterprise-grade security. From Cloud to Local.

  • HIPAA compliant

  • SOC 2 Type 2

  • GDPR

  • PCI

Get started today

Talk to an expert. Connect with a member of our team and learn how Cartesia can help you build world-class voice experiences.

Contact Sales

Start building. Access our models via API and bring an agent into production with our robust SDKs and developer tools.

Try Cartesia