A speech-to-text API for developers

Integrate transcription into your product with a clean REST API, rtvk_ keys, webhooks, and predictable JSON that includes word-level timestamps and speaker labels, in 100+ languages.

Try it now, no signup

Record live or drop in a file (up to 30 MB) and watch it transcribe.

RealtimeVoiceKIT gives you speech-to-text as a simple HTTP API. Authenticate with an rtvk_ key, submit audio or video by upload or URL, and receive predictable JSON with the transcript, word-level timestamps, confidence scores, and speaker labels. Jobs are asynchronous: you submit a job and we call your webhook as soon as the result is ready, so you never need to poll. The same API powers subtitles, translation, and AI summaries, so you can build a complete pipeline on one integration.
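As a rough illustration of the submit step, here is a minimal Python sketch that builds a request to submit media by URL. The bearer rtvk_ key comes from this page; the endpoint address and the JSON field names (url, webhook_url) are placeholders assumed for the example, not the documented API, so check the real reference before using them.

```python
import json
import urllib.request

# Hypothetical endpoint: this page does not state the real URL.
API_URL = "https://api.example.com/v1/transcriptions"


def build_submit_request(api_key: str, media_url: str, webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST that submits media by URL."""
    if not api_key.startswith("rtvk_"):
        raise ValueError("expected an rtvk_ API key")
    # Field names below are assumptions made for this sketch.
    payload = {"url": media_url, "webhook_url": webhook_url}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # bearer key auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen. Because jobs are asynchronous, the immediate response would only acknowledge the job; the transcript arrives later at the webhook.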

What developers build

In-product transcription

Add transcription to your app without running speech models yourself.

Automated pipelines

Wire transcription into ingestion and processing with webhooks.

Captioning at scale

Generate SRT and VTT for large media libraries programmatically.

Voice analytics

Feed timestamps, speakers, and summaries into your own analysis.
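For the voice analytics case, a small Python sketch of what "your own analysis" could look like: totalling talk time per speaker from word-level timestamps. The word shape used here (speaker, start, end, with times in seconds) is an assumption for illustration; the actual field names in the response may differ.

```python
from collections import defaultdict


def talk_time_by_speaker(words):
    """Sum the spoken duration (in seconds) for each speaker label.

    Each word is assumed to look like
    {"word": "hello", "start": 0.0, "end": 0.5, "speaker": "Speaker 1"}.
    """
    totals = defaultdict(float)
    for word in words:
        totals[word["speaker"]] += word["end"] - word["start"]
    return dict(totals)
```

Summing word durations ignores the pauses between words, so this measures time spent speaking rather than the length of each turn.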

What's included

REST API with rtvk_ keys
Webhooks (no polling)
Word-level timestamps
Speaker labels
Subtitles, translation & summaries
100+ languages

How it works

01

Create a key

Generate an rtvk_ API key from your dashboard.

02

Submit audio

POST a file or URL; we transcribe it asynchronously.

03

Receive results

We call your webhook with predictable JSON: text, timestamps, speakers, and more.
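The receiving side of step 03 can be sketched with Python's standard library. This is a minimal example under stated assumptions: the payload keys (text, words, and a speaker field on each word) are guesses at the shape of the webhook body, and a production endpoint would also need whatever request verification the service provides.

```python
import json
from http.server import BaseHTTPRequestHandler


def summarize_result(body: bytes) -> dict:
    """Pull a few fields out of a webhook body (assumed schema)."""
    payload = json.loads(body)
    words = payload.get("words", [])
    return {
        "text": payload.get("text", ""),
        "word_count": len(words),
        "speakers": sorted({w["speaker"] for w in words if "speaker" in w}),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts the POST sent when a transcription job finishes."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = summarize_result(self.rfile.read(length))
        except ValueError:  # covers malformed JSON
            self.send_response(400)
            self.end_headers()
            return
        print(result)  # replace with your own processing
        self.send_response(204)  # acknowledge with no body
        self.end_headers()
```

To try it locally, serve the handler with http.server.HTTPServer(("127.0.0.1", 8000), WebhookHandler).serve_forever() and point the job's webhook at a publicly reachable address that forwards to it.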

Frequently asked questions

How is the speech-to-text API authenticated?

With bearer rtvk_ API keys you create in your dashboard. The same keys also work with our MCP server.

Does it use webhooks or polling?

Webhooks. Submit a job and RealtimeVoiceKIT calls your endpoint when it finishes, so you don't have to poll.

What does a response contain?

Predictable JSON with the transcript text, word-level timestamps, confidence scores, and speaker labels, plus subtitle, translation, and summary output.

Is there a free plan?

Yes. The free plan includes 10 minutes every month, so you can build and test before you scale. API access is included on the Premium and Business plans.

Build with the speech-to-text API

Create an rtvk_ key and add transcription to your product. Start free with 10 minutes every month.