Transcription Overview

FloWords uses AI speech-recognition models to convert your voice into text. By default, all processing happens locally on your Mac, so your audio never leaves the device unless you opt in to a cloud provider.

How It Works

When you speak, FloWords:

Captures audio from your selected microphone
Processes it with your selected model (locally on your Mac by default)
Transcribes the speech to text with high accuracy
Pastes the text automatically at your cursor position

Transcription Methods

FloWords offers multiple transcription approaches:

Local Models

Whisper, Parakeet, and Apple Speech run entirely on your Mac. Once a model is downloaded, no internet connection is required and your audio stays on the device.

Cloud Providers

Optional cloud services for enhanced accuracy or speed. They require an API key and an internet connection.

Local vs Cloud

Aspect	Local Models	Cloud Providers
Privacy	100% private	Data sent to servers
Internet	Not required	Required
Cost	Free (included)	Pay per usage
Speed	Depends on your Mac's hardware	Generally faster
Accuracy	Very good to excellent	Excellent

Supported Local Models

FloWords includes three local transcription engines. All run on your Mac and are multilingual.

Model	Engine	Download	RAM	Latency
Whisper Turbo (default)	OpenAI • Q5_0	~547 MB	~2 GB	~200-800 ms
Parakeet V3	NVIDIA • INT8	~640 MB	~2 GB	~50-200 ms
Apple Speech	Native macOS	None	Minimal	~100-500 ms

Whisper Turbo - recommended default, best balance of accuracy and speed
Parakeet V3 - fastest, lowest latency
Apple Speech - no download, on-device, great for quick drafts
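
The trade-offs in the table can be expressed as a small selection helper. This is a minimal sketch, not FloWords code: the `LocalModel` class and `fastest_model` function are hypothetical names, and the numbers are the approximate figures from the table above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LocalModel:
    name: str
    download_mb: int              # approximate download size; 0 means no download
    latency_ms: Tuple[int, int]   # approximate (best, worst) latency range


# Approximate values copied from the table above.
MODELS = [
    LocalModel("Whisper Turbo", 547, (200, 800)),
    LocalModel("Parakeet V3", 640, (50, 200)),
    LocalModel("Apple Speech", 0, (100, 500)),
]


def fastest_model(
    models: Sequence[LocalModel],
    max_download_mb: Optional[int] = None,
) -> Optional[LocalModel]:
    """Return the model with the lowest worst-case latency.

    If max_download_mb is given, models with a larger download are skipped.
    Returns None when no model qualifies.
    """
    candidates = [
        m for m in models
        if max_download_mb is None or m.download_mb <= max_download_mb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.latency_ms[1])


print(fastest_model(MODELS).name)                     # Parakeet V3
print(fastest_model(MODELS, max_download_mb=0).name)  # Apple Speech
```

Latency is only one criterion. The table lists Whisper Turbo as the recommended default for its balance of accuracy and speed, which this sketch does not model.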

Supported Cloud Providers

For users who prefer cloud transcription:

OpenAI Whisper API - High accuracy, reliable
Groq - Ultra-fast transcription
Deepgram - Real-time streaming
Google Gemini - Multimodal capabilities
ElevenLabs - Speech recognition
Mistral - European AI provider
Soniox - Multilingual async transcription

Fallback System

FloWords includes an intelligent fallback system:

Primary: Your selected model (local or cloud)
Secondary: Alternative model if primary fails
Tertiary: Apple’s native Speech Recognition

Because the last resort runs on-device and needs no download, you still get a transcription when your preferred method fails, for example when a cloud provider is unreachable.
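
The fallback order above can be sketched as a loop that tries each transcriber in turn. This is an illustration under assumptions, not the FloWords implementation: `transcribe_with_fallback`, `AllTranscribersFailed`, and the stub transcribers are hypothetical, and "fails" is modelled here as raising an exception, since the exact failure conditions are not documented on this page.

```python
from typing import Callable, List, Sequence, Tuple

# A transcriber takes raw audio bytes and returns text (hypothetical interface).
Transcriber = Callable[[bytes], str]


class AllTranscribersFailed(Exception):
    """Raised when every transcriber in the chain has failed."""


def transcribe_with_fallback(
    audio: bytes,
    chain: Sequence[Tuple[str, Transcriber]],
) -> Tuple[str, str]:
    """Try each (name, transcriber) pair in order.

    Returns (name_of_transcriber_used, text) for the first one that succeeds.
    """
    errors: List[str] = []
    for name, transcribe in chain:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # any failure moves on to the next entry
            errors.append(f"{name}: {exc}")
    raise AllTranscribersFailed("; ".join(errors))


# Stub transcribers standing in for real engines.
def unreachable_cloud(audio: bytes) -> str:
    raise ConnectionError("network unavailable")


def local_stub(audio: bytes) -> str:
    return "hello world"


used, text = transcribe_with_fallback(
    b"\x00\x01",
    [("primary", unreachable_cloud), ("secondary", local_stub)],
)
print(used, text)  # secondary hello world
```

Collecting each error message before moving on keeps the reason for every failed attempt available if the whole chain fails.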

Audio Sources

FloWords can transcribe from:

Live microphone - Real-time dictation
Audio files - WAV, MP3, M4A, AAC, FLAC, AIFF, CAF
Video files - MP4, MOV (extracts audio)
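
If you batch files before transcribing them, you can pre-sort them by the formats listed above. A minimal sketch, assuming that matching on the file extension is sufficient. `classify_media` is a hypothetical helper, and how FloWords itself detects formats is not stated here.

```python
from pathlib import Path

# Extensions taken from the list above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".aac", ".flac", ".aiff", ".caf"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}


def classify_media(path: str) -> str:
    """Return "audio", "video", or "unsupported" based on the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"


for name in ["Interview.MP3", "clip.mov", "notes.txt"]:
    print(name, classify_media(name))
# Interview.MP3 audio
# clip.mov video
# notes.txt unsupported
```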

Language Support

Whisper models support 99+ languages, including:

English, Spanish, French, German
Chinese, Japanese, Korean
Arabic, Hindi, Portuguese
And many more…

Set your language in Settings > Model > Language, or enable auto-detection there.

Next Steps

Learn about Local Models for privacy-first transcription
Explore Cloud Providers for enhanced options
Transcribe Audio Files for existing recordings