An honest comparison with existing voice control and dictation tools.

| | Samsara | Dragon Professional | Talon Voice | Windows Voice Access | OpenWhispr |
|---|---|---|---|---|---|
| Price | Free & open source | $500 – $700 | Free (beta $25/mo) | Free (Win 11 only) | Free (Pro $8/mo) |
| Platform | Windows | Windows | Mac, Linux, Windows | Windows 11 | Windows, Mac |
| Works offline | ✓ Fully local | ✓ Local | ✓ Local | ✓ On-device (internet needed for initial setup) | ✓ Local |
| Speech engine | Whisper (faster-whisper / CTranslate2) | Proprietary (per-user trained) | Conformer (wav2letter) | Microsoft on-device ASR | Whisper + NVIDIA Parakeet |
| Voice commands | 360+ built-in | Basic app & text control | Extensive (community scripts) | Limited OS-level | Dictation only |
| Custom commands | Python plugins | Voice macros | .talon files + Python | Custom shortcuts (limited) | ✗ |
| Dictation | Real-time streaming | Real-time (requires training) | Raw transcription | Real-time | AI-cleaned dictation |
| Training required | None | 20–30 min + ongoing | None (steep learning curve) | None | None |
| Wake word | ✓ "Jarvis" (OpenWakeWord) | ✗ | ✗ | "Voice access" | ✗ Hotkey only |
| AI assistant | ✓ Ava (local or cloud LLM) | ✗ | ✗ | ✗ | ✗ |
| Text selection | Voice markers | Voice editing | Cursor/selection commands | Grid-based clicking | ✗ |
| Eye tracking | ✗ | ✗ | ✓ Tobii | ✗ | ✗ |
| GPU acceleration | ✓ CUDA optional | N/A | N/A | N/A | ✓ CUDA |
| Download size | 137 MB | ~3 GB | ~500 MB | Built-in | ~200 MB |
| Privacy | Zero telemetry | Phone-home licensing | No telemetry | Microsoft telemetry | No telemetry |
| Open source | ✓ BSL-1.1 → MIT 2030 | ✗ Proprietary | Partial (scripts open, core closed) | ✗ Proprietary | ✓ |
| Best for | Accessibility + desktop control | Professional dictation | Developers / power users | General Windows users | Fast dictation |

Samsara uses OpenAI's Whisper speech recognition model, running locally through
faster-whisper, a reimplementation of Whisper built on CTranslate2. CTranslate2 is a C++
inference engine for Transformer models that supports int8 quantization. With it, 3.5 seconds
of audio transcribes in about 300 ms on a GPU. That is a real-time factor (RTF, processing time
divided by audio duration) of about 0.09, or roughly 11 to 12× faster than real time. On CPU,
expect 1–2 seconds for the same audio.
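
The RTF arithmetic above can be checked with a few lines of Python. This is only a worked calculation using the figures quoted in the paragraph; the function names are illustrative, and the CPU rows assume the 1–2 second estimate refers to the same 3.5-second clip.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio transcribed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


AUDIO_SECONDS = 3.5  # clip length quoted in the text

gpu_rtf = real_time_factor(0.300, AUDIO_SECONDS)
print(f"GPU RTF: {gpu_rtf:.2f}")              # GPU RTF: 0.09
print(f"GPU speedup: {speedup(gpu_rtf):.1f}x")  # GPU speedup: 11.7x

# Assumption: the 1-2 s CPU estimate applies to the same 3.5 s clip.
cpu_rtf_best = real_time_factor(1.0, AUDIO_SECONDS)
cpu_rtf_worst = real_time_factor(2.0, AUDIO_SECONDS)
print(f"CPU RTF: {cpu_rtf_best:.2f} to {cpu_rtf_worst:.2f}")  # CPU RTF: 0.29 to 0.57
```

An RTF below 1.0 means transcription finishes before the audio would have finished playing, so under that assumption even the CPU path keeps up with live speech.
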

Wake word detection runs separately through OpenWakeWord, which uses small ONNX models
(~5 ms per audio chunk on CPU). This keeps idle resource usage near zero, because Whisper
only runs once the wake word is detected.
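
The gating idea can be sketched as a small generator. This is not Samsara's actual code: `gate_on_wake_word`, the `wake_score` callable, the 0.5 threshold, and the fixed `capture_chunks` window are all hypothetical stand-ins, and a real detector would score audio frames rather than the byte strings used here.

```python
from typing import Callable, Iterable, Iterator, List


def gate_on_wake_word(
    chunks: Iterable[bytes],
    wake_score: Callable[[bytes], float],  # hypothetical: cheap detector score in [0, 1]
    threshold: float = 0.5,                # assumed trigger threshold
    capture_chunks: int = 3,               # assumed number of chunks to hand to the recognizer
) -> Iterator[List[bytes]]:
    """Yield groups of chunks to transcribe; every other chunk is only scored."""
    stream = iter(chunks)
    for chunk in stream:
        if wake_score(chunk) < threshold:
            continue  # idle path: only the small wake-word model has run
        # Wake word heard: collect the chunks that follow it for transcription.
        utterance: List[bytes] = []
        for _ in range(capture_chunks):
            following = next(stream, None)
            if following is None:
                break
            utterance.append(following)
        if utterance:
            yield utterance


# Toy stream: the wake word is followed by a two-chunk command.
demo_chunks = [b"noise", b"noise", b"jarvis", b"open", b"notepad", b"noise"]
fake_score = lambda c: 0.9 if c == b"jarvis" else 0.1
triggered = list(gate_on_wake_word(demo_chunks, fake_score, capture_chunks=2))
print(triggered)  # [[b'open', b'notepad']]
```

Only the chunks inside `triggered` would ever reach the expensive recognizer; the other chunks cost one cheap score each.
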

Voice activity detection uses Silero VAD, a lightweight neural network that classifies
audio as speech or silence before Whisper runs, so Whisper does not spend cycles transcribing
background noise.
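
A minimal sketch of VAD-style filtering, assuming a per-chunk speech probability is available. The `speech_prob` callable and the 0.5 threshold are hypothetical placeholders for whatever the VAD model returns; this does not reproduce Silero VAD's own API.

```python
from typing import Callable, List, Sequence


def speech_segments(
    chunks: Sequence[str],
    speech_prob: Callable[[str], float],  # hypothetical: probability that a chunk is speech
    threshold: float = 0.5,               # assumed decision threshold
) -> List[List[str]]:
    """Group consecutive speech chunks into segments and drop silence/noise."""
    segments: List[List[str]] = []
    current: List[str] = []
    for chunk in chunks:
        if speech_prob(chunk) >= threshold:
            current.append(chunk)
        elif current:
            # Silence ends the running segment.
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Toy input: labels stand in for audio chunks.
fake_probs = {"hum": 0.05, "hello": 0.9, "world": 0.9, "stop": 0.8}
segments = speech_segments(["hum", "hello", "world", "hum", "stop"], fake_probs.get)
print(segments)  # [['hello', 'world'], ['stop']]
```

Only the returned segments would be passed on for transcription, which is where the saving on background noise comes from.
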

The AI assistant (Ava) runs on local Ollama models, or optionally routes to cloud providers
(DeepSeek, OpenAI, Anthropic) for users who want faster, more capable responses.
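
The local-or-cloud choice could be expressed as a small routing function. Everything here is a hypothetical illustration: `AssistantConfig`, `choose_backend`, and the rule of falling back to Ollama when no API key is set are assumptions, not Samsara's documented configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Cloud providers named in the text; the local backend is Ollama.
CLOUD_PROVIDERS = {"deepseek", "openai", "anthropic"}


@dataclass
class AssistantConfig:
    """Hypothetical settings object for the assistant backend."""
    provider: str = "ollama"
    api_key: Optional[str] = None


def choose_backend(config: AssistantConfig) -> str:
    """Return the backend to use, falling back to local Ollama.

    Assumption: a cloud provider is used only when it is one of the
    known providers and an API key has been supplied.
    """
    provider = config.provider.lower()
    if provider in CLOUD_PROVIDERS and config.api_key:
        return provider
    return "ollama"


print(choose_backend(AssistantConfig()))                     # ollama
print(choose_backend(AssistantConfig("OpenAI", "sk-test")))  # openai
print(choose_backend(AssistantConfig("anthropic")))          # ollama (no key set)
```

Defaulting to the local backend in this sketch means a missing or mistyped cloud setting never sends a prompt off the machine.
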