SPEECH-TO-TEXT

Ink-Whisper: streaming speech-to-text

The fastest, most affordable real-world transcription model for conversational AI

From Sonic to Ink: fastest voice out, now fastest voice in

Fastest TTCT. Ink-Whisper has the fastest time-to-complete-transcript (TTCT) for fluid, responsive interactions.

Real-world accuracy. Get clear transcriptions despite everyday background noise, accents, and jargon.

Most affordable. Ink-Whisper is the lowest-cost streaming STT at 1 credit/sec ($0.13/hr on our Scale plan).

Purpose-built for voice agents, flexible for all streaming speech

Voice agents

Ink-Whisper delivers ultra-fast, accurate transcription that powers responsive human-agent conversations, from support and sales to healthcare and finance.

Live captioning

Real-time translations

The fastest streaming model

Delivering the most fluid conversations, Ink-Whisper has the fastest time-to-complete-transcript (TTCT) of any streaming speech-to-text model we’ve tested.

66ms time-to-complete-transcript

TTCT by streaming model (chart views: MEDIAN, P90)

Cartesia Ink-Whisper: 66ms
Fireworks Whisper Streaming: 70ms
Deepgram Nova3 Streaming: 74ms
AssemblyAI Universal Streaming: 737ms
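
To compare models on your own audio, median and P90 can be computed from a list of per-utterance TTCT measurements. The sketch below is illustrative only: the sample values are made up and are not Cartesia benchmark data, the function names are hypothetical, and the nearest-rank method for P90 is an assumption, since this page does not say how its percentiles were calculated.

```python
import math
import statistics


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_ttct(samples_ms):
    """Summarize TTCT samples (milliseconds) as median and P90."""
    return {
        "median_ms": statistics.median(samples_ms),
        "p90_ms": percentile_nearest_rank(samples_ms, 90),
    }


# Hypothetical measurements in milliseconds (not real benchmark results).
ttct_samples_ms = [58, 61, 64, 66, 66, 69, 72, 80, 95, 140]
summary = summarize_ttct(ttct_samples_ms)
print(summary)
```

Reporting P90 alongside the median matters for voice agents because a single slow transcript in a conversation is noticeable even when the typical latency is low.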

Optimized for accuracy in real-world complexity

Ink-Whisper delivers accurate transcription in the highly variable conditions of real-world conversation, where standard STT models fall short.

Audio Quality and Environment

AUDIO

Telephony Artifacts

Caused by compression or low-bandwidth audio.

TRANSCRIPTION

Hey Vanessa, um I need to learn a little bit more about my PTO.

AUDIO

Background noise

Such as traffic, chatter, babies, or static

TRANSCRIPTION

Hi, David. This is Carla. I'm calling about my shipment. The tracking ID is 1Z4489XT73.

Elements of Natural Conversation

AUDIO

Disfluencies

Such as "um", "ah", and pauses

TRANSCRIPTION

Hey, um, just wanted to check that you’re still on for dinner tomorrow, um ya just like call me back and let me know haha um ok yeah thats it, bye!

AUDIO

Accents

Globally diverse voices and pronunciations

TRANSCRIPTION

Hi, I’m trying to reschedule my hair appointment at Rural Salon. It was supposed to be on Thursday.

Linguistic complexity

AUDIO

Proper nouns and domain terms

Such as brands and medical or financial terms

TRANSCRIPTION

Hi, I just started a round of Amoxicillin, and I wanted to ask if it was safe to take that with my current Spironolactone prescription?

Get started quickly and confidently

Voice platform integrations

Rapidly deploy Ink-Whisper to your voice agent through our integrations with Vapi, LiveKit, and Pipecat.

Lowest-cost

The most affordable streaming STT at just 1 credit per second ($0.13/hr) on our Scale plan.
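
As a rough budgeting aid, the two figures above (1 credit per second and $0.13 per hour on the Scale plan) can be turned into a small estimator. This is a sketch under stated assumptions: it treats billing as strictly linear in audio duration, with no rounding, minimums, or plan allowances, none of which this page specifies. The function names and the example workload are hypothetical.

```python
SECONDS_PER_HOUR = 3600
CREDITS_PER_SECOND = 1   # stated on this page
USD_PER_HOUR = 0.13      # stated on this page for the Scale plan


def credits_for(audio_seconds):
    """Credits consumed for a given amount of audio, in seconds."""
    return audio_seconds * CREDITS_PER_SECOND


def estimated_cost_usd(audio_seconds):
    """Estimated cost in USD, assuming purely linear per-second billing."""
    return audio_seconds / SECONDS_PER_HOUR * USD_PER_HOUR


# Hypothetical workload: 10,000 calls averaging 3 minutes each.
total_seconds = 10_000 * 3 * 60
total_credits = credits_for(total_seconds)
total_cost = estimated_cost_usd(total_seconds)

print(f"{total_credits:,} credits")
print(f"${total_cost:.2f} estimated")
```

One hour of audio is 3,600 credits, so the hypothetical 500 hours above comes to 1,800,000 credits and roughly $65 at the stated rate.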

Enterprise-grade

With 99.9% uptime and enterprise-grade compliance (SOC 2 Type II, HIPAA, PCI), you can trust us for reliability and security.

Real-time, multimodal intelligence for every device.
