← Back to Samsara

How Samsara Compares

An honest comparison with existing voice control and dictation tools.

Samsara Dragon Professional Talon Voice Windows Voice Access OpenWhispr
Price Free & open source $500 – $700 Free (beta $25/mo) Free (Win 11 only) Free (Pro $8/mo)
Platform Windows Windows Mac, Linux, Windows Windows 11 Windows, Mac
Works offline ✓ Fully local ✓ Local ✓ Local ✗ Requires internet ✓ Local
Speech engine Whisper (faster-whisper / CTranslate2) Proprietary (per-user trained) Conformer (wav2letter) Microsoft cloud ASR Whisper + NVIDIA Parakeet
Voice commands 360+ built-in Basic app & text control Extensive (community scripts) Limited OS-level Dictation only
Custom commands Python plugins Voice macros .talon files + Python Custom shortcuts (limited)
Dictation Real-time streaming Real-time (requires training) Raw transcription Real-time AI-cleaned dictation
Training required None 20–30 min + ongoing None (steep learning curve) None None
Wake word ✓ "Jarvis" (OpenWakeWord) "Voice access" ✗ Hotkey only
AI assistant ✓ Ava (local or cloud LLM)
Text selection Voice markers Voice editing Cursor/selection commands Grid-based clicking
Eye tracking ✓ Tobii
GPU acceleration ✓ CUDA optional N/A N/A N/A ✓ CUDA
Download size 137 MB ~3 GB ~500 MB Built-in ~200 MB
Privacy Zero telemetry Phone-home licensing No telemetry Microsoft telemetry No telemetry
Open source ✓ BSL-1.1 → MIT 2030 ✗ Proprietary Partial (scripts open, core closed) ✗ Proprietary
Best for Accessibility + desktop control Professional dictation Developers / power users General Windows users Fast dictation

Under the Hood

Samsara uses OpenAI's Whisper speech recognition model, running locally through faster-whisper — a CTranslate2-optimized build. CTranslate2 is a C++ inference engine that runs the Whisper Transformer with int8 quantization, meaning 3.5 seconds of audio transcribes in about 300ms on a GPU. That's an RTF (real-time factor) of 0.09 — roughly 11× faster than real time. On CPU, expect 1–2 seconds.

Wake word detection runs separately through OpenWakeWord, which uses small ONNX models (~5ms per audio chunk on CPU). This means the app uses near-zero resources while idle — Whisper only fires when the wake word is detected.

Voice activity detection uses Silero VAD, a lightweight neural network that classifies audio as speech or silence before Whisper runs. This prevents wasted cycles on background noise.

The AI assistant (Ava) runs on local Ollama models or optionally routes to cloud providers (DeepSeek, OpenAI, Anthropic) for users who want faster, more capable responses.