Getting Started
Everything you need to install, configure, and start using Samsara.
Installation
Requirements
- Windows 10 or 11
- Python 3.10+ (if running from source)
- A microphone (USB, headset, or built-in)
- Optional: NVIDIA GPU with CUDA for faster transcription
- Optional: Ollama installed for Ava voice assistant
Download
Download the latest release from GitHub Releases:
- Samsara-Windows-v0.9.9.7z — CPU version, works on any machine
- Samsara-CUDA-Pack-v0.9.9.zip — Optional GPU acceleration (extract into the same folder)
From Source
git clone https://github.com/Morne-Ingstar/Samsara.git
cd Samsara
pip install -r requirements.txt
python dictation.py
First Run
On first launch, a setup wizard walks you through:
- Selecting your microphone
- Choosing a Whisper model size (small.en recommended for most users)
- Setting your record hotkey (default: Ctrl+Shift, hold to talk)
- Choosing a wake word (default: "Jarvis")
- Configuring streaming dictation hotkey (default: CapsLock)
All settings can be changed later from the Settings window.
Basic Usage
Push-to-Talk Dictation
Hold Ctrl+Shift and speak. Release when done. Your speech is transcribed and pasted at the cursor position.
Wake Word Commands
Say your wake word (default: "Jarvis") followed by a command. For example:
- "Jarvis, open Chrome"
- "Jarvis, scroll down"
- "Jarvis, take a screenshot"
- "Jarvis, mute"
Samsara listens for the wake word passively. When it hears it, it captures the following command and executes it.
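At the text level, the wake-word step amounts to separating the wake phrase from the command that follows it. The sketch below is only an illustration of that idea: `split_wake_command` is a hypothetical helper, not part of Samsara, and it assumes the utterance has already been transcribed and starts with the wake phrase.

```python
import re


def split_wake_command(transcript, wake_phrase="jarvis"):
    """Return the command text that follows the wake phrase, or None.

    Hypothetical helper: Samsara's real detection works on audio and may differ.
    """
    text = transcript.strip().lower()
    # The phrase must start the utterance; skip punctuation and spaces after it.
    pattern = r"^" + re.escape(wake_phrase.lower()) + r"\b[\s,.:!]*"
    match = re.match(pattern, text)
    if match is None:
        return None
    command = text[match.end():].strip(" .!?")
    return command or None
```

Calling `split_wake_command("Jarvis, open Chrome")` yields `"open chrome"`, while an utterance without the wake phrase yields `None` and would be ignored.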
Streaming Dictation
With streaming enabled in the tray menu, press CapsLock to start streaming mode. Speak continuously and text appears in real time as you talk. Press CapsLock again to stop.
Voice Modes
Samsara has six voice modes. All can run simultaneously — wake word mode and continuous mode share a single audio stream internally, so there are no conflicts.
| Mode | Activation | Purpose |
|---|---|---|
| Push-to-Talk | Hold Ctrl+Shift | Quick dictation — speak while held, pastes on release |
| Wake Word | Say your wake phrase (default: "Jarvis") | Hands-free commands — Samsara listens passively for the wake phrase, then captures and executes your command |
| Streaming | Enable in tray menu, then toggle CapsLock | Continuous dictation — text flows in real time as you speak |
| Command Mode | Hold Right Ctrl (configurable) | Walkie-talkie style — hold, speak a command, release to execute |
| Continuous | Toggle Ctrl+Alt+D | Always-on dictation — everything you say is transcribed |
| Ava Mode | Hold Right Alt | Talk to Ava voice assistant — ask questions, request actions, teach aliases |
All hotkeys are configurable in Settings. See the Hotkeys & Modes section for details on each.
Settings
Samsara's settings are organised into eight tabs. Open settings from the system tray icon or say "Jarvis, open settings."
General
| Setting | What it does |
|---|---|
| Microphone | Select your input device. USB mics and headsets are listed by name. |
| Model Size | Whisper model for transcription. small.en is the best balance of speed and accuracy. medium.en or large-v3 are more accurate but slower. |
| Language | Primary language for transcription. en for English. Affects Whisper's accuracy. |
| Auto-paste | Automatically paste transcribed text at the cursor. Disable if you want to copy it manually. |
| Add trailing space | Adds a space after each dictation so the next word doesn't concatenate. |
| Auto-capitalize | Capitalises the first letter of each sentence. |
| Format numbers | Converts "twenty three" to "23" in dictation output. |
| Cleanup mode | clean removes filler words and false starts. verbatim keeps everything. |
Hotkeys & Modes
All hotkeys are configurable in Settings. This table shows the defaults.
| Setting | Default | What it does |
|---|---|---|
| Record hotkey | Ctrl+Shift | Push-to-talk dictation. Hold to record, release to transcribe and paste. |
| Record mode | Hold | Hold = push-to-talk. Toggle = press to start, press again to stop. |
| Continuous hotkey | Ctrl+Alt+D | Toggle always-on dictation. |
| Wake word hotkey | Ctrl+Alt+W | Toggle wake word listening. |
| Command mode hotkey | Ctrl+Alt+C | Toggle command mode (walkie-talkie via Right Ctrl). |
| Streaming hotkey | CapsLock | Toggle streaming dictation. Must be enabled in tray menu first. |
| Cancel hotkey | Escape | Cancel the current recording. |
| Undo hotkey | Ctrl+Alt+Z | Undo the last dictation paste. |
| Ava mode key | Right Alt | Hold to talk to Ava voice assistant. |
Wake Word Configuration
| Setting | Default | What it does |
|---|---|---|
| Wake phrase | "jarvis" | The word that activates Samsara. Options: samsara, hey samsara, computer, hey computer, jarvis, hey jarvis. |
| Speech threshold | Auto | Sensitivity for detecting speech. Auto-calibrates to your environment. |
| Wake command timeout | 5 seconds | How long Samsara listens for a command after hearing the wake word. |
| End words | "over", "done", "end dictation" | Words that signal you've finished a command. Experimental — may not work reliably in all situations. |
| Cancel words | "cancel", "abort" | Words that cancel the current wake word session. Experimental — may not work reliably in all situations. |
Command Mode
A walkie-talkie style mode. Hold the button, speak a command, release to execute. Ideal for rapid command sequences without saying the wake word each time.
| Setting | Default | What it does |
|---|---|---|
| Button | Right Ctrl | The button to hold. Configurable to any keyboard key or mouse button (Mouse 4, Mouse 5, etc). |
| Mode | Hold | Hold to talk, release to execute. |
| Enter debounce | 200ms | Prevents accidental activations from brief taps. |
| Inactivity timeout | 30 seconds | Automatically exits command mode after this long without a command. |
| Miss limit | 5 | After this many unrecognised commands, exits command mode. |
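The two exit conditions in the table (inactivity timeout and miss limit) can be pictured as a small state tracker. This is a sketch under assumptions, not Samsara's implementation: the class name `CommandModeSession` is hypothetical, and it assumes that a recognised command resets the miss count, which the guide does not state.

```python
import time


class CommandModeSession:
    """Tracks when a command-mode session should end (illustrative only)."""

    def __init__(self, inactivity_timeout=30.0, miss_limit=5, clock=time.monotonic):
        self.inactivity_timeout = inactivity_timeout
        self.miss_limit = miss_limit
        self._clock = clock
        self._last_command_at = clock()
        self._misses = 0

    def record(self, recognised):
        """Register one spoken command and whether it was recognised."""
        if recognised:
            self._last_command_at = self._clock()
            self._misses = 0  # assumption: a recognised command resets the count
        else:
            self._misses += 1

    def should_exit(self):
        """True once the miss limit or the inactivity timeout is reached."""
        idle = self._clock() - self._last_command_at
        return self._misses >= self.miss_limit or idle >= self.inactivity_timeout
```

Passing the clock in as a parameter keeps the timeout logic testable without waiting 30 real seconds.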
Commands
The Commands tab lets you browse, enable, and disable command packs. Each pack is a named group of related commands.
| Pack | Commands | Default |
|---|---|---|
| core | Essential commands: open apps, copy, paste, undo, redo, scroll, repeat, restart | Enabled |
| text-editing | Select all, bold, italic, word navigation, delete word, markers | Enabled |
| window-management | Snap, maximize, minimize, move between monitors, saved layouts | Enabled |
| browsers | Tab management, bookmarks, address bar, refresh, navigation | Enabled |
| media | Play, pause, next/previous track, volume | Enabled |
| smart-home | Hyperion LED control: lights red, lights off, etc. | Disabled |
| 3d-printing | FlashForge printer control: start print, check status, abort | Disabled |
| stremio | Stremio media control: play, pause, fullscreen | Disabled |
| screen-capture | Screenshot, screen recording, GIF capture | Enabled |
| macros | Delete line, duplicate tab, custom key sequences | Enabled |
| audio | Audio device switching | Enabled |
| ai | Ava commands, corrections, scheduling | Enabled |
| accessibility | Narrator, magnifier, high contrast, cursor size | Enabled |
| mouse | Left click, double click, right click by voice | Disabled |
Sounds
| Setting | What it does |
|---|---|
| Sound theme | Choose from several earcon themes. Each has distinct sounds for wake detection, command success, errors, etc. |
| Volume | Master volume for all sound effects (0.0 to 1.0). |
| Audio feedback | Enable/disable all earcons. When off, Samsara is completely silent except for TTS. |
Text-to-Speech
Samsara can speak. TTS is used by Ava for responses, and optionally for confirmations and status updates.
| Setting | Default | What it does |
|---|---|---|
| Enabled | Off | Master switch for all TTS. Turn on to hear Ava speak. |
| Engine | EdgeTTS | edge = Microsoft Edge TTS (high quality, requires internet). winrt = Windows built-in voices (offline, lower quality). |
| Voice | en-US-AvaNeural | The TTS voice. EdgeTTS has many options — Ava, Jenny, Guy, etc. |
| Speed | 1.0 | Speech rate. 1.0 = normal, 1.5 = fast, 0.75 = slow. |
| Volume | 0.8 | TTS output volume. |
TTS Categories
Control which types of speech Samsara produces:
| Category | Default | When it speaks |
|---|---|---|
| Agent responses | On | Ava's answers to questions and conversational replies. |
| Confirmations | On | "Opening Chrome", "Schedule stopped", etc. |
| Warnings | On | Error messages and safety warnings. |
| Status updates | On | "Cloud mode enabled", "Restarting", etc. |
| Dictation readback | Off | Reads your dictated text back to you after transcription. |
| Errors | On | Command failures and system errors. |
Audio Coordinator
The AudioCoordinator manages the relationship between TTS, your microphone, and background audio:
- Ducking — lowers mic sensitivity while Samsara is speaking, preventing echo
- Interrupt — if you start talking while Samsara is speaking, TTS stops immediately
- Pre-buffer discard — audio captured during TTS is discarded, so Samsara doesn't transcribe its own voice
Alarms
Configurable reminders for health and productivity. Built for people who need regular prompts to move, stretch, hydrate, or rest their eyes.
| Setting | Default | What it does |
|---|---|---|
| Enable alarms | On | Master switch for the alarm system. |
| Complete hotkey | F7 | Mark the current alarm as complete. |
| Dismiss hotkey | F8 | Dismiss the current alarm without completing it. |
| Nag interval | 60 seconds | How often an unacknowledged alarm repeats. |
Built-in Alarms
- Hydration — every 60 minutes. "Time to drink some water."
- Break — every 45 minutes. "Take a short break — stretch and rest your eyes."
- Stretch — every 120 minutes. "Time to stretch your hands, wrists, and neck."
- 20-20-20 Rule — every 20 minutes. "Look at something 20 feet away for 20 seconds."
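Because the built-in intervals differ, several alarms can fall due at the same moment. The sketch below shows which ones coincide, using the intervals listed above. It assumes all alarms start counting from the same instant, which may not match how Samsara schedules them, and `alarms_due` is a hypothetical helper.

```python
# Intervals in minutes, taken from the built-in alarm list above.
BUILT_IN_ALARMS = {
    "Hydration": 60,
    "Break": 45,
    "Stretch": 120,
    "20-20-20 Rule": 20,
}


def alarms_due(elapsed_minutes, alarms=BUILT_IN_ALARMS):
    """Return the names of alarms whose interval divides the elapsed time."""
    if elapsed_minutes <= 0:
        return []
    return [name for name, every in alarms.items() if elapsed_minutes % every == 0]
```

For example, at the 60-minute mark both Hydration and the 20-20-20 Rule are due, and at 180 minutes Break joins them.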
Smart Actions
Smart Actions allow Samsara to interact with external services through a webhook bridge.
| Setting | What it does |
|---|---|
| Enabled | Master switch. Off by default. |
| Endpoint URL | The webhook URL that receives Smart Action payloads. |
| Auth header | Optional authentication header sent with each request. |
| Brain dump path | Where voice-captured brain dumps are saved. Default: Documents\Samsara Brain Dump.md |
| Allowed directories | Directories Smart Actions can read from (sandboxed). |
| Session window | Minutes before a Smart Action session expires. |
Advanced
| Setting | Default | What it does |
|---|---|---|
| Device | cuda | Hardware for Whisper inference. cuda = NVIDIA GPU, cpu = processor only. |
| Compute type | float16 | Precision for GPU inference. float16 = fast, int8 = smaller memory, float32 = most accurate. |
| Performance mode | balanced | fast = prioritise speed. balanced = good accuracy and speed. accurate = best transcription, slower. |
| Silence threshold | 2.0 | Seconds of silence before dictation is considered complete. |
| Min speech duration | 0.3s | Minimum audio length to process. Filters out brief noises. |
| Calibration multiplier | 3.0 | Sensitivity multiplier for auto speech threshold. Higher = less sensitive. |
| Echo cancellation | Enabled | Reduces feedback when speakers are near the microphone. |
| Listening indicator | On, bottom-center | Shows a small pill overlay when Samsara is actively listening. |
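One way to read the calibration multiplier is as a scale factor applied to the measured ambient noise level. The sketch below illustrates that reading only. The formula (ambient RMS times multiplier) is an assumption, not taken from Samsara's source, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(ambient_samples, multiplier=3.0):
    """Assumed formula: speech threshold = ambient noise level * multiplier."""
    return rms(ambient_samples) * multiplier


def is_speech(samples, threshold):
    """Audio counts as speech only when it is louder than the threshold."""
    return rms(samples) > threshold
```

Under this reading, a higher multiplier raises the threshold, so quieter sounds stop counting as speech, which matches "Higher = less sensitive".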
Ava Voice Assistant
Ava is Samsara's built-in AI assistant. She runs on a local LLM (phi3.5 via Ollama) and can answer questions, execute commands, schedule tasks, and learn your personal vocabulary.
Talking to Ava
Hold Right Alt and speak naturally. Ava will respond via TTS.
- "Hey Ava, what's the capital of France?" — she answers conversationally
- "Hey Ava, how are you?" — she responds briefly and doesn't loop
- Say "no thanks" or "I'm good" — she stops, doesn't keep offering help
Requirements
- Ollama installed and running
- A model pulled: ollama pull phi3.5 (recommended, ~2.2GB)
- TTS enabled in Settings → Text-to-Speech
Ava + Commands
Ava can execute Samsara commands through natural language. Use action-oriented phrasing:
- "Hey Ava, can you open Spotify?"
- "Hey Ava, take a screenshot"
- "Hey Ava, go full screen"
- "Hey Ava, mute this tab"
Most commands execute immediately. Potentially destructive commands (close window, delete file, lock screen) require confirmation — Ava will ask you to say "yes" before executing.
How It Works
Ava translates your natural language into the exact Samsara command name, then Samsara executes it through the normal command pipeline. Of the 320+ commands, she can use every one that is marked as visible to her.
Scheduling Actions
Ask Ava to repeat an action on a timer:
- "Hey Ava, refresh this page every 5 minutes"
- "Hey Ava, scroll down every 30 seconds"
- "Hey Ava, press F5 every minute"
Ava confirms the schedule and waits for "yes" before starting. Say "stop schedule" to cancel a running schedule. Only one schedule can be active at a time.
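The timer part of these requests boils down to an interval in seconds. As a rough illustration, the hypothetical helper below extracts it from the example phrasings above. In Samsara the interpretation is done by Ava's language model, so this regex-based sketch is a stand-in, not the actual mechanism.

```python
import re

UNIT_SECONDS = {"second": 1, "minute": 60}


def parse_interval(phrase):
    """Return the interval in seconds from text like 'every 5 minutes', else None."""
    match = re.search(r"\bevery\s+(?:(\d+)\s+)?(second|minute)s?\b", phrase.lower())
    if match is None:
        return None
    # "every minute" has no number, so it counts as 1.
    count = int(match.group(1)) if match.group(1) else 1
    return count * UNIT_SECONDS[match.group(2)]
```

With this, "refresh this page every 5 minutes" maps to 300 seconds and "press F5 every minute" to 60.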
Teaching Aliases
Teach Ava your personal vocabulary so she understands your shortcuts:
- "Hey Ava, when I say browser I mean open Firefox"
- "Hey Ava, remember that the meeting doc means open https://docs.google.com/..."
- "Hey Ava, from now on project means open VS Code"
Aliases persist across restarts. They're stored in ~/.samsara/ava_corrections.json.
Managing Aliases
- Forget: "Hey Ava, forget browser"
- Query: "Hey Ava, what does browser mean?"
- List all: "Hey Ava, list my aliases"
If you teach an alias that already exists, Ava asks for confirmation before replacing it. Aliases are injected into Ava's context as structured knowledge — she applies them intelligently rather than doing blind text substitution.
Your Profile
Teach Ava basic information about yourself:
- "Hey Ava, my name is Morne"
- "Hey Ava, I live in Cape Town"
- "Hey Ava, I'm a developer"
- "Hey Ava, remember about me that I have chronic hand pain"
Profile information is stored in ~/.samsara/ava_profile.json and injected into every conversation with Ava. She'll use your name naturally and adapt responses to your context.
Managing Your Profile
- "Hey Ava, what do you know about me?" — reads back everything
- "Hey Ava, forget my location" — clears one field
- "Hey Ava, forget what you know about me" — clears everything
Cloud LLM Mode
For users who want a more capable AI, Ava can route requests through a cloud LLM instead of local Ollama. You provide your own API key and pay your own usage costs.
Supported Providers
| Provider | Default Model | Cost |
|---|---|---|
| DeepSeek (default) | deepseek-chat | Very low (~$0.14/million tokens) |
| OpenAI | gpt-4o-mini | Low |
| Anthropic | claude-sonnet-4-20250514 | Medium |
Setup
Cloud LLM configuration is currently done through config.json. A settings UI for this is planned.
- Open config.json and find the cloud_llm section
- Set your API key: "api_key": "sk-your-key-here"
- Optionally change the provider: "provider": "deepseek"
- Save the file and restart Samsara
- Say "Jarvis, ava cloud" to enable cloud mode
- Say "Jarvis, ava local" to switch back to offline mode at any time
If the cloud provider is unreachable, Ava automatically falls back to local Ollama, so you still get a response as long as Ollama is running.
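The fallback behaviour follows a common try-then-fall-back pattern. The sketch below shows the pattern in isolation. `ask_with_fallback`, `cloud_ask`, and `local_ask` are hypothetical names, and which errors Samsara actually treats as "unreachable" is an assumption here.

```python
def ask_with_fallback(prompt, cloud_ask, local_ask):
    """Try the cloud provider first; use the local model if it cannot be reached.

    `cloud_ask` and `local_ask` are hypothetical callables that take a prompt
    and return a reply string. Returns (reply, source).
    """
    try:
        return cloud_ask(prompt), "cloud"
    except (ConnectionError, TimeoutError):
        # Cloud provider unreachable: answer with the local model instead.
        return local_ask(prompt), "local"
```

Returning the source alongside the reply lets the caller announce or log which backend answered.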
Commands
Samsara ships with 320+ voice commands organised into packs. All commands are deterministic — they execute the same way every time.
Overview
Commands are triggered by saying the wake word followed by the command name:
- "Jarvis, open chrome"
- "Jarvis, scroll down fast"
- "Jarvis, mark here"
- "Jarvis, again"
Commands are defined in two places:
- commands.json: hotkeys, macros, press commands, launch commands, web shortcuts
- Plugin files in plugins/commands/: Python-implemented commands using the @command decorator
Command Packs
Packs are named groups of commands that can be enabled or disabled from Settings → Commands. This keeps the command vocabulary focused on what you actually use.
Disabled packs don't load their commands at all — they won't be recognised by the wake word listener or appear in the cheat sheet.
Custom Commands
Using the Settings Menu (recommended)
The easiest way to add a command is through Settings → Commands. Click "Add Command", choose a type (hotkey, launch, macro), fill in the fields, and save. No file editing required.
Editing commands.json (advanced)
For more control, edit commands.json directly in any text editor. Each entry maps a spoken phrase to a command definition; the examples that follow show one entry of each kind.
Adding a Hotkey Command
Add an entry to commands.json:
{
  "my custom shortcut": {
    "type": "hotkey",
    "keys": ["ctrl", "shift", "n"],
    "description": "Open new window",
    "pack": "core"
  }
}
Adding a Launch Command
{
  "open blender": {
    "type": "launch",
    "target": "C:\\Program Files\\Blender\\blender.exe",
    "description": "Launch Blender 3D",
    "pack": "core"
  }
}
Adding a Web Shortcut
{
  "open my project": {
    "type": "launch",
    "target": "https://github.com/your-repo",
    "description": "Open project on GitHub",
    "pack": "core"
  }
}
Adding a Macro
{
  "select line": {
    "type": "macro",
    "steps": [
      {"action": "press", "key": "home"},
      {"action": "press", "key": "shift+end"}
    ],
    "description": "Select the current line",
    "pack": "text-editing"
  }
}
Command Types
| Type | What it does |
|---|---|
| hotkey | Presses a key combination (e.g. ctrl+c) |
| press | Presses a single key (e.g. enter, delete) |
| launch | Opens an application, URL, or file |
| macro | Executes a sequence of key presses with optional delays |
| method | Calls a Python method on the main app class |
| text | Types a text string (e.g. punctuation characters) |
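When hand-editing commands.json, it can help to check entries before restarting Samsara. The sketch below checks only the three types that have worked examples in this section (hotkey, launch, macro) and the fields those examples use. `validate_command` is a hypothetical helper, not something Samsara provides, and the required fields for press, method, and text are not documented here, so they are left unchecked.

```python
# Required fields per command type, as seen in the examples in this section.
# Types without a documented example are accepted without field checks.
REQUIRED_FIELDS = {
    "hotkey": ["keys"],
    "launch": ["target"],
    "macro": ["steps"],
}


def validate_command(name, entry):
    """Return a list of problems found in one commands.json entry."""
    problems = []
    kind = entry.get("type")
    if kind is None:
        problems.append(f"{name}: missing 'type'")
        return problems
    for field in REQUIRED_FIELDS.get(kind, []):
        if field not in entry:
            problems.append(f"{name}: type '{kind}' needs '{field}'")
    return problems
```

To check a whole file, load it with `json.load` and call `validate_command` on each key and value; an empty list means no problems were found.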
Command Reference
A selection of commonly used commands. For the full list, say "Jarvis, show commands" to open the floating cheat sheet.
| Category | Example Commands |
|---|---|
| Navigation | open chrome, open spotify, open file explorer, open settings, go back, go forward, new tab, close tab |
| Text editing | select all, copy, paste, cut, undo, redo, save, find, bold, italic, delete word |
| Scrolling | scroll up, scroll down, scroll up fast, scroll down a little, scroll to top |
| Text selection | mark here, select to here — anchor-based selection across any distance |
| Window management | maximize, minimize, snap left, snap right, move to left monitor, full screen |
| Media | play, pause, back a song, volume up, volume down, mute |
| Repeat | again, repeat — re-fires the last safe command |
| Overlays | show commands, hide commands, show numbers |
| System | screenshot, lock screen, dark mode, restart samsara |
| Accessibility | start narrator, bigger cursor, bigger text, high contrast |
Plugins
Plugins extend Samsara with new voice commands written in Python. Every plugin is a single .py file in plugins/commands/.
How Plugins Work
On startup, Samsara scans plugins/commands/ and loads every .py file. Each file can register voice commands using the @command decorator. Plugins have full access to the app instance, so they can control audio, UI, system calls, and hardware.
Creating a Plugin
Create a new file in plugins/commands/, for example my_plugin.py:
from samsara.plugin_commands import command


@command("say hello", aliases=["greet me"], pack="core")
def say_hello(app, remainder="", **kwargs):
    """Says hello via TTS."""
    if hasattr(app, 'audio_coordinator') and app.audio_coordinator:
        app.audio_coordinator.speak("Hello there!", category="agent_response")
    else:
        print("Hello there!")
The @command Decorator
| Parameter | What it does |
|---|---|
| First argument (string) | The primary voice command phrase, e.g. "say hello" |
| aliases | Alternative phrases that trigger the same command. |
| pack | Which command pack this belongs to. Must match a key in command_packs config. |
| ai_visible | True (default) or False. When False, Ava won't see or suggest this command. |
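To build intuition for what a registering decorator does, here is a self-contained stand-in that accepts the same parameters. It is not the code in samsara.plugin_commands. The `COMMANDS` registry and its layout are assumptions made for illustration.

```python
COMMANDS = {}  # phrase -> handler info (illustrative registry, not Samsara's)


def command(phrase, aliases=(), pack="core", ai_visible=True):
    """Register the decorated function under its phrase and every alias."""
    def decorator(func):
        info = {"func": func, "pack": pack, "ai_visible": ai_visible}
        for name in (phrase, *aliases):
            COMMANDS[name.lower()] = info
        return func  # the function itself is returned unchanged
    return decorator


@command("say hello", aliases=["greet me"], pack="core")
def say_hello(app, remainder="", **kwargs):
    return "Hello there!"
```

After the module is imported, both "say hello" and "greet me" resolve to the same function, which is the behaviour the aliases parameter describes.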
Accessing the App
Every command function receives app as its first argument. Through it you can access:
- app.config: the full config dictionary
- app.command_executor: execute other commands programmatically
- app.audio_coordinator: TTS, ducking, audio state
- app.root: the Tkinter root window (for UI operations)
Using ctypes for System Control
For Windows-level operations (mouse control, key injection, audio APIs), use ctypes directly rather than pyautogui. This avoids dependencies and gives you full Win32 access. See plugins/commands/volume.py and plugins/commands/text_marker.py for examples.
Built-in Plugins
| Plugin | File | What it does |
|---|---|---|
| Volume | volume.py | Volume up/down/mute via Core Audio API (no media keys) |
| Text Marker | text_marker.py | Deferred anchor text selection: mark here → select to here |
| Scroll | scroll.py | 5-speed mouse wheel scrolling via SendInput |
| Stremio | stremio.py | Stremio media control via AutoHotkey |
| Ava / Ollama | ask_ollama.py | Voice AI assistant, command translation, scheduling |
| Music | music.py | Spotify playback control |
| Hyperion | hyperion.py | LED strip control via Hyperion JSON API |
| FlashForge | flashforge.py | 3D printer monitoring and control |
| Brain Dump | brain_dump.py | Capture voice thoughts to a markdown file |
| Window Manager | windows.py | Window positioning, saved layouts, lost window recovery |
Features
Detailed guides for Samsara's key features.
Streaming Dictation
With streaming enabled in the tray menu, press CapsLock to start streaming. Text appears at your cursor as you speak, updating in real time. Press CapsLock again to stop.
Under the hood, Samsara uses a rolling Ctrl+Z approach: each update undoes the previous paste and replaces it with the expanded text. This works in any text field that supports undo.
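The rolling undo-and-replace idea can be simulated on an in-memory buffer. The sketch below is a model of the technique, not Samsara's code: `FakeTextField` stands in for a real text field, and `stream_updates` is a hypothetical name for the update loop.

```python
class FakeTextField:
    """In-memory stand-in for a text field that supports paste and undo."""

    def __init__(self):
        self.text = ""
        self._history = []

    def paste(self, chunk):
        self._history.append(self.text)  # remember the state before the paste
        self.text += chunk

    def undo(self):
        if self._history:
            self.text = self._history.pop()


def stream_updates(field, partial_transcripts):
    """Apply each partial transcript by undoing the previous paste, then pasting anew."""
    pasted = False
    for partial in partial_transcripts:
        if pasted:
            field.undo()  # remove the previous, shorter version
        field.paste(partial)
        pasted = True
```

Feeding the partials "hello", "hello world", "hello world today" into a field that already contains "Note: " leaves "Note: hello world today", with the existing text untouched.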
Configuration
- streaming_hotkey: the toggle key (default: CapsLock)
- streaming_direct_paste: when true, pastes directly instead of typing character by character
Text Selection Markers
Selecting large blocks of text by click-dragging is painful. Samsara replaces it with two voice commands, used as follows:
- Click where you want the selection to start
- Say "Jarvis, mark here" — sets an invisible anchor at the cursor position
- Scroll freely (by voice or mouse) to where you want the selection to end
- Say "Jarvis, select to here" — everything between the anchor and the current position is highlighted
The anchor is tied to the window that was focused when you set it. If you switch to a different window, Samsara warns you and clears the anchor rather than selecting in the wrong app.
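The window check can be modelled as a small piece of state. The sketch below is an abstraction under assumptions: positions are plain integers rather than screen coordinates, `TextMarker` is a hypothetical class, and the anchor is kept after a successful selection because the guide does not say what happens to it.

```python
class TextMarker:
    """Remembers one anchor together with the window it was set in."""

    def __init__(self):
        self._anchor = None  # (window_id, position) or None

    def mark_here(self, window_id, position):
        self._anchor = (window_id, position)

    def select_to_here(self, window_id, position):
        """Return (start, end), or None if there is no anchor or the window changed."""
        if self._anchor is None:
            return None
        anchor_window, anchor_pos = self._anchor
        if anchor_window != window_id:
            self._anchor = None  # wrong window: clear rather than select in the wrong app
            return None
        return (min(anchor_pos, position), max(anchor_pos, position))
```

Ordering the pair with `min` and `max` means the selection works whether you scroll down or up from the anchor.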
Voice Scrolling
Five speed tiers, all using mouse wheel simulation so they work in every application:
| Command | Speed |
|---|---|
| "scroll up a little" / "scroll down a little" | Slow (3 clicks) |
| "scroll up" / "scroll down" | Default (8 clicks) |
| "scroll up medium" / "scroll down medium" | Medium (15 clicks) |
| "scroll up high" / "scroll down high" | Medium-high (25 clicks) |
| "scroll up fast" / "scroll down fast" | Fast (40 clicks) |
Scroll amounts are configurable in config.json under the scroll key. Values are in wheel "clicks" (each click = 120 wheel delta units).
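The conversion from clicks to wheel delta is simple arithmetic. The sketch below uses the default click counts from the table and the 120-units-per-click figure stated above. The dictionary keys and the `wheel_delta` helper are hypothetical and do not reflect the actual layout of the scroll key in config.json.

```python
WHEEL_DELTA = 120  # wheel delta units per click, as stated above

# Default click counts from the speed table above (key names are illustrative).
SCROLL_CLICKS = {"a little": 3, "default": 8, "medium": 15, "high": 25, "fast": 40}


def wheel_delta(direction, speed="default", clicks=SCROLL_CLICKS):
    """Signed wheel delta: positive scrolls up, negative scrolls down."""
    if direction not in ("up", "down"):
        raise ValueError(f"unknown direction: {direction}")
    sign = 1 if direction == "up" else -1
    return sign * clicks[speed] * WHEEL_DELTA
```

So a plain "scroll up" corresponds to 8 × 120 = 960 delta units, and "scroll down fast" to 40 × 120 = 4800 units in the negative direction.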
Repeat / Again
Say "again" or "repeat" to re-fire the last command. Chain them: "scroll up a little, again, again, again" scrolls four times.
Destructive commands (close tab, delete file, etc.) are blacklisted from repeat to prevent accidental damage.
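One way to picture the repeat rule is a tracker that remembers only safe commands. This is a sketch of one interpretation, not Samsara's implementation: the blacklist contents are limited to the two examples named above, `RepeatTracker` is a hypothetical class, and it assumes a blacklisted command leaves the previously remembered safe command in place.

```python
# Illustrative blacklist; the guide names close tab and delete file as examples.
REPEAT_BLACKLIST = {"close tab", "delete file"}


class RepeatTracker:
    """Remembers the last command that is safe to re-fire."""

    def __init__(self, blacklist=REPEAT_BLACKLIST):
        self._blacklist = set(blacklist)
        self._last_safe = None

    def record(self, command):
        if command in ("again", "repeat"):
            return  # repeating does not change what gets repeated
        if command not in self._blacklist:
            self._last_safe = command

    def again(self):
        """Return the command to re-fire, or None if nothing safe has run yet."""
        return self._last_safe
```

With this model, saying "again" after "close tab" re-fires whatever safe command came before it instead of closing another tab.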
Overlays
Command Cheat Sheet
Say "show commands" to open a floating overlay listing all active commands. It stays on top of other windows and has an adjustable opacity slider. Filter by typing in the search box or scrolling.
Show Numbers
Say "show numbers" and every clickable element on screen gets a numbered label. Say the number to click that element. Fully hands-free UI navigation.
Listening Indicator
A small pill-shaped overlay appears when Samsara is actively listening. Position it anywhere on screen from Settings → Advanced.
Window Manager
Control window positioning by voice:
- "Move to left monitor" / "move to right monitor" — shifts the active window between screens
- "Snap left" / "snap right" — snaps to half-screen
- "Save layout" — saves the current arrangement of all windows
- "Restore layout" — restores a saved arrangement
- "Bring back lost windows" — recovers windows that are off-screen