Try it now, no signup
Record live or drop in a file (up to 30 MB) and watch it transcribe.
RealtimeVoiceKIT gives you speech-to-text as a simple HTTP API. Authenticate with an rtvk_ key, submit audio or video by upload or URL, and receive predictable JSON with the transcript, word-level timestamps, confidence scores, and speaker labels. Jobs are asynchronous: you submit a job and we call your webhook as soon as the result is ready, so there is no need to poll. The same API powers subtitles, translation, and AI summaries, so you can build a complete pipeline on one integration.
What developers build
In-product transcription
Add transcription to your app without running speech models yourself.
Automated pipelines
Wire transcription into ingestion and processing with webhooks.
Captioning at scale
Generate SRT and VTT for large media libraries programmatically.
Voice analytics
Feed timestamps, speakers, and summaries into your own analysis.
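As a rough illustration of the voice analytics case, here is a minimal Python sketch that totals talk time per speaker from word-level timestamps. The field names ("start", "end", "speaker") and the sample data are assumptions made for this example, not the documented response schema.

```python
from collections import defaultdict

# Hypothetical word entries: the keys "start", "end", and "speaker" are
# assumed for illustration and may differ from the real response.
words = [
    {"text": "Hello", "start": 0.0, "end": 0.4, "speaker": "A"},
    {"text": "there", "start": 0.4, "end": 0.8, "speaker": "A"},
    {"text": "Hi", "start": 1.0, "end": 1.25, "speaker": "B"},
]


def talk_time_by_speaker(words):
    """Sum the spoken duration, in seconds, of each speaker's words."""
    totals = defaultdict(float)
    for word in words:
        totals[word["speaker"]] += word["end"] - word["start"]
    return dict(totals)


print(talk_time_by_speaker(words))
```

The same loop could feed a dashboard or a per-call report; only the field names would need to match the actual payload.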
What's included
How it works
Create a key
Generate an rtvk_ API key from your dashboard.
Submit audio
POST a file or URL; we transcribe it asynchronously.
Receive results
We call your webhook with predictable JSON: text, timestamps, speakers, and more.
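The submit step above might look something like the following Python sketch. It builds the request without sending it. The bearer rtvk_ key comes from this page; the base URL, the /transcriptions path, and the "url" and "webhook_url" field names are placeholders assumed for illustration, so check the API reference for the real ones.

```python
import json
import urllib.request

# Assumptions for illustration only: this base URL, the /transcriptions path,
# and the "url" / "webhook_url" field names are hypothetical placeholders.
API_BASE = "https://api.example.invalid/v1"


def build_submit_request(api_key, media_url, webhook_url):
    """Build (but do not send) a POST request that submits media by URL."""
    body = json.dumps({"url": media_url, "webhook_url": webhook_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcriptions",
        data=body,
        method="POST",
        headers={
            # Bearer authentication with an rtvk_ key, as described on this page.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_submit_request(
    "rtvk_your_key_here",
    "https://media.example.invalid/interview.mp3",
    "https://app.example.invalid/hooks/transcripts",
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it. Because the job is asynchronous, the transcript itself would arrive later at the webhook URL rather than in the HTTP response.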
Frequently asked questions
How is the speech-to-text API authenticated?
With bearer rtvk_ API keys you create in your dashboard. The same keys also work with our MCP server.
Does it use webhooks or polling?
Webhooks. Submit a job and RealtimeVoiceKIT calls your endpoint when it finishes, so you don't have to poll.
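For a sense of what the receiving side could look like, here is a minimal webhook endpoint sketched with Python's standard library. It assumes the result arrives as a JSON object in a POST body and that the transcript is under a "text" key; both are assumptions for illustration, and the port and function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw):
    """Return the decoded payload dict, or None if the body is not a JSON object."""
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return payload if isinstance(payload, dict) else None


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        # Acknowledge right away; hand heavy processing to a queue or worker.
        self.send_response(200 if payload is not None else 400)
        self.end_headers()
        if payload is not None:
            # "text" is an assumed field name for the transcript.
            print("transcript received:", str(payload.get("text", ""))[:80])


def serve(port=8000):
    """Run the receiver until interrupted (not called automatically here)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. Keeping the parsing in its own function makes it easy to test without opening a socket.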
What does a response contain?
Predictable JSON with the transcript text, word-level timestamps, confidence scores, and speaker labels, plus subtitle, translation, and summary output.
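To show how such a response could be consumed, the Python sketch below loads word entries into a small dataclass and flags low-confidence words for review. The payload shape and key names ("words", "start", "end", "confidence", "speaker") are assumed for this example and are not taken from the API reference.

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float       # seconds from the start of the media
    end: float
    confidence: float  # assumed to range from 0.0 to 1.0
    speaker: str


# Hypothetical payload: the keys below are illustrative assumptions.
SAMPLE = """{"text": "Hello there", "words": [
  {"text": "Hello", "start": 0.0, "end": 0.4, "confidence": 0.98, "speaker": "A"},
  {"text": "there", "start": 0.4, "end": 0.8, "confidence": 0.61, "speaker": "A"}
]}"""


def parse_words(raw):
    """Turn the JSON string into a list of Word records."""
    data = json.loads(raw)
    return [Word(**entry) for entry in data.get("words", [])]


def low_confidence(words, threshold=0.8):
    """Return the text of every word scored below the threshold."""
    return [word.text for word in words if word.confidence < threshold]


parsed = parse_words(SAMPLE)
print(low_confidence(parsed))  # prints ['there']
```

A typed record like this fails loudly if a field is missing or renamed, which is useful when you rely on the response staying predictable.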
Is there a free plan?
Yes. You get 10 free minutes every month, so you can build and test before you scale. API access is included on the Premium and Business plans.