You recorded an interview in French and you need it in English: as quotes for an article, as captions for a video, or as a clean transcript a colleague can read. Doing this by hand means transcribing French, then translating it, then matching each line back to the right speaker. It is slow, and small errors creep in at every step. With the right workflow you can get an accurate, speaker-labeled English transcript in minutes, and still trace every sentence back to the original recording.
Here is how to do it well.
Start with the cleanest source you can get
Translation quality begins with transcription quality, and transcription quality begins with the audio. If you can, record each speaker on a separate channel, or at least ask people not to talk over each other. Before you upload, trim long silences and remove any segment that is off the record. A French interview with two clear voices and little background noise will translate far better than the same conversation captured in a noisy cafe. If your interview is inside a video file, you do not need to extract the audio first; modern tools read the audio track directly from MP4 and similar formats.
Transcribe first, then translate, and keep both
The reliable order is: transcribe the French audio into French text, then translate that text into English. Skipping straight to a single English output throws away the original wording, and the original wording is exactly what you need when you quote someone. A good system separates speakers automatically, so the interview reads like a script with Speaker A and Speaker B instead of one long block. It also timestamps every word, so each line is tied to a moment in the recording.
This is the workflow RealtimeVoiceKIT is built around. You upload the French audio or video, or paste a URL, and it returns a speaker-labeled transcript with word-level timestamps and confidence scores, then translates it into clean English while keeping the speaker labels and timing intact. You end up with the French transcript and the English version side by side, which is what makes accurate quoting possible.
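To make "keep both" concrete, here is a minimal Python sketch of a bilingual transcript record. The `Segment` class, its field names, and the sample line are assumptions made for illustration; they are not RealtimeVoiceKIT's actual export format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker turn, kept in both languages (hypothetical structure)."""
    speaker: str   # e.g. "Speaker A"
    start: float   # seconds from the start of the recording
    end: float
    french: str    # original transcription
    english: str   # translation of that same turn


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, dropping fractions of a second."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_side_by_side(segments):
    """Print each turn as a script line with the French above the English."""
    lines = []
    for seg in segments:
        lines.append(f"[{format_timestamp(seg.start)}] {seg.speaker}")
        lines.append(f"  FR: {seg.french}")
        lines.append(f"  EN: {seg.english}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Segment("Speaker A", 75.2, 78.9, "Bonjour.", "Hello.")]
    print(render_side_by_side(demo))
```

Because the French and the English live in the same record, a quote can always be checked against its original wording and its place in the audio.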
Review the parts that matter
AI translation is strong, but no system is flawless with idioms, fast crosstalk, names, and technical terms. You do not need to reread everything. Skim the low-confidence sections the transcript flags, then focus on the lines you actually plan to publish. For a direct quote, check the English against the original French and the audio at that timestamp. French has formal and informal registers, and figures of speech that do not translate word for word, so a quick human pass keeps the meaning faithful. Fix any mistranslated names or jargon once, and the rest of the transcript usually holds up.
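If you work with the word-level data directly, the "skim the low-confidence sections" step can be sketched as below. This assumes each word carries a start time and a confidence score between 0 and 1; the `Word` class and the 0.8 threshold are illustrative choices, not values taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcribed word with timing and confidence (hypothetical structure)."""
    text: str
    start: float       # seconds from the start of the recording
    confidence: float  # assumed to range from 0.0 to 1.0


def low_confidence_spans(words, threshold=0.8):
    """Group consecutive words below the threshold into review spans.

    Returns a list of (start_time, text) pairs, one per run of doubtful words.
    """
    runs = []
    current = []
    for word in words:
        if word.confidence < threshold:
            current.append(word)
        elif current:
            runs.append(current)
            current = []
    if current:  # close a run that reaches the end of the transcript
        runs.append(current)
    return [(run[0].start, " ".join(w.text for w in run)) for run in runs]
```

Each returned pair gives you a timestamp to jump to in the audio and the words to double-check there, so review time goes to the doubtful passages rather than the whole transcript.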
Export, caption, and cite reliably
Once the English reads cleanly, export in the format you need. Plain text works for an article. If the interview was filmed, export subtitles as SRT or WebVTT and you have English captions in minutes, with timing carried over from the original French audio. An AI summary is useful for pulling the key points and candidate pull-quotes before you write.
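For readers curious what a subtitle export contains, the sketch below builds SRT text from timed English lines. The SRT layout itself (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, then a blank line) is the standard one; the input shape, a list of `(start, end, text)` tuples in seconds, is an assumption for illustration.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Turn (start, end, text) tuples into the body of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}")
    # Cues are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello."), (2.5, 4.0, "Thanks for coming.")]))
```

The start and end times come straight from the French audio, which is why the English captions stay in sync without manual retiming.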
For citation, anchor every published quote to a timestamp in the source recording and keep the original French alongside your English. That way an editor or reader can verify the wording against the audio, and you can show that a translated quote reflects what was actually said. RealtimeVoiceKIT keeps both languages and the timestamps together, so this is a matter of copying, not reconstructing.
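One way to keep citations consistent is to generate them from the same data every time. The helper below is a hypothetical sketch: the function name, its parameters, and the citation layout are assumptions, and you should adapt the format to your publication's style guide.

```python
def cite_quote(speaker, start_seconds, english, french, source):
    """Build a citation string that pairs a translated quote with its origin."""
    total = int(start_seconds)  # drop fractions of a second
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    stamp = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return (
        f'"{english}" ({speaker}, {source} at {stamp}; '
        f'original French: "{french}")'
    )


if __name__ == "__main__":
    print(cite_quote(
        "Speaker B",
        754.8,
        "We had no choice.",
        "Nous n'avions pas le choix.",
        "interview.mp4",  # placeholder file name
    ))
```

Storing the output next to each published quote gives an editor the speaker, the moment in the recording, and the original wording in one place.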
The easiest way to see whether this fits your work is to run a real interview through it. RealtimeVoiceKIT offers a free plan with 10 minutes per month, including speaker labels and subtitle export, with no credit card required. Upload a French clip, read the English back, and check a quote against the audio. When you need more, the Premium plan unlocks more minutes, translation across many languages, and the full developer API.
The RealtimeVoiceKIT team writes about audio, AI, and the workflows that turn recordings into reach.