Speech to Text

* Model data (75–145 MB) is downloaded on first use only. Audio is processed entirely in your browser.

Drag & drop audio/video here, or click to select

Supported formats: MP3 / WAV / MP4 / WebM / M4A / OGG

Model

Language (leave blank for auto-detect)

Transcribe audio and video files with AI, entirely in your browser, using Whisper. Nothing is sent to a server.