Getting Started

Everything you need to install, configure, and start using Samsara.

Installation

Requirements

Windows 10 or 11
Python 3.10+ (if running from source)
A microphone (USB, headset, or built-in)
Optional: NVIDIA GPU with CUDA for faster transcription
Optional: Ollama, for the Ava voice assistant

Download

Download the latest release from GitHub Releases:

Samsara-Windows-v0.9.9.7z — CPU version, works on any machine
Samsara-CUDA-Pack-v0.9.9.zip — Optional GPU acceleration (extract into the same folder)

From Source

git clone https://github.com/Morne-Ingstar/Samsara.git
cd Samsara
pip install -r requirements.txt
python dictation.py

First Run

On first launch, a setup wizard walks you through:

Selecting your microphone
Choosing a Whisper model size (small.en recommended for most users)
Setting your record hotkey (default: Ctrl+Shift, hold to talk)
Choosing a wake word (default: "Jarvis")
Configuring the streaming dictation hotkey (default: CapsLock)

All settings can be changed later from the Settings window.

Basic Usage

Push-to-Talk Dictation

Hold Ctrl+Shift and speak. Release when done. Your speech is transcribed and pasted at the cursor position.

Wake Word Commands

Say your wake word (default: "Jarvis") followed by a command. For example:

"Jarvis, open Chrome"
"Jarvis, scroll down"
"Jarvis, take a screenshot"
"Jarvis, mute"

Samsara listens passively for the wake word. Once the wake word is detected, Samsara captures the command that follows and executes it.
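As a rough illustration of the capture step, the sketch below pulls the command text out of a transcribed phrase. The function name `extract_command` and the regex-based matching are assumptions made for this example, not Samsara's actual implementation.

```python
import re
from typing import Optional


def extract_command(transcript: str, wake_phrase: str = "jarvis") -> Optional[str]:
    """Return the lowercased text that follows the wake phrase, or None.

    Hypothetical helper: Samsara's real wake word pipeline is not shown here.
    """
    text = transcript.lower().strip()
    # Match the wake phrase as a whole word, then skip punctuation and spaces.
    match = re.search(rf"\b{re.escape(wake_phrase)}\b[\s,.!?:]*", text)
    if match is None:
        return None
    command = text[match.end():].strip(" .!?")
    return command or None


print(extract_command("Jarvis, open Chrome"))
```

A transcript that contains only the wake phrase, or no wake phrase at all, yields `None` in this sketch, so nothing would be executed.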

Streaming Dictation

Enable Streaming Mode from the tray menu first, then press CapsLock to start streaming. Speak continuously; text appears in real time as you talk. Press CapsLock again to stop.

Voice Modes

Samsara has six voice modes. All can run simultaneously — wake word mode and continuous mode share a single audio stream internally, so there are no conflicts.

Mode	Activation	Purpose
Push-to-Talk	Hold `Ctrl`+`Shift`	Quick dictation — speak while held, pastes on release
Wake Word	Say your wake phrase (default: "Jarvis")	Hands-free commands — Samsara listens passively for the wake phrase, then captures and executes your command
Streaming	Enable in tray menu, then toggle `CapsLock`	Continuous dictation — text flows in real time as you speak
Command Mode	Hold `Right Ctrl` (configurable)	Walkie-talkie style — hold, speak a command, release to execute
Continuous	Toggle `Ctrl`+`Alt`+`D`	Always-on dictation — everything you say is transcribed
Ava Mode	Hold `Right Alt`	Talk to Ava voice assistant — ask questions, request actions, teach aliases

All hotkeys are configurable in Settings. See the Hotkeys & Modes section for details on each.

Settings

Samsara's settings are organised into eight tabs. Open settings from the system tray icon or say "Jarvis, open settings."

General

Setting	What it does
Microphone	Select your input device. USB mics and headsets are listed by name.
Model Size	Whisper model for transcription. `small.en` is the best balance of speed and accuracy. `medium.en` or `large-v3` are more accurate but slower.
Language	Primary language for transcription. `en` for English. Affects Whisper's accuracy.
Auto-paste	Automatically paste transcribed text at the cursor. Disable if you want to copy it manually.
Add trailing space	Adds a space after each dictation so the next dictation doesn't run into it.
Auto-capitalize	Capitalises the first letter of each sentence.
Format numbers	Converts "twenty three" to "23" in dictation output.
Cleanup mode	`clean` removes filler words and false starts. `verbatim` keeps everything.

Hotkeys & Modes

All hotkeys are configurable in Settings. This table shows the defaults.

Setting	Default	What it does
Record hotkey	`Ctrl`+`Shift`	Push-to-talk dictation. Hold to record, release to transcribe and paste.
Record mode	Hold	`Hold` = push-to-talk. `Toggle` = press to start, press again to stop.
Continuous hotkey	`Ctrl`+`Alt`+`D`	Toggle always-on dictation.
Wake word hotkey	`Ctrl`+`Alt`+`W`	Toggle wake word listening.
Command mode hotkey	`Ctrl`+`Alt`+`C`	Toggle command mode (walkie-talkie via `Right Ctrl`).
Streaming hotkey	`CapsLock`	Toggle streaming dictation. Must be enabled in tray menu first.
Cancel hotkey	`Escape`	Cancel the current recording.
Undo hotkey	`Ctrl`+`Alt`+`Z`	Undo the last dictation paste.
Ava mode key	`Right Alt`	Hold to talk to the Ava voice assistant.

Wake Word Configuration

Setting	Default	What it does
Wake phrase	"jarvis"	The word that activates Samsara. Options: samsara, hey samsara, computer, hey computer, jarvis, hey jarvis.
Speech threshold	Auto	Sensitivity for detecting speech. Auto-calibrates to your environment.
Wake command timeout	5 seconds	How long Samsara listens for a command after hearing the wake word.
End words	"over", "done", "end dictation"	Words that signal you've finished a command. Experimental — may not work reliably in all situations.
Cancel words	"cancel", "abort"	Words that cancel the current wake word session. Experimental — may not work reliably in all situations.

Command Mode

A walkie-talkie style mode. Hold the button, speak a command, release to execute. Ideal for rapid command sequences without saying the wake word each time.

Setting	Default	What it does
Button	`Right Ctrl`	The button to hold. Configurable to any keyboard key or mouse button (Mouse 4, Mouse 5, etc).
Mode	Hold	Hold to talk, release to execute.
Enter debounce	200ms	Prevents accidental activations from brief taps.
Inactivity timeout	30 seconds	Automatically exits command mode after this long without a command.
Miss limit	5	After this many unrecognised commands, exits command mode.
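The inactivity timeout and miss limit can be pictured as two exit conditions on a session object. The sketch below is an illustration only: the class and method names are hypothetical, and it assumes that misses are counted consecutively and that only recognised commands reset the inactivity clock, neither of which is confirmed by the settings table.

```python
from dataclasses import dataclass


@dataclass
class CommandModeSession:
    """Hypothetical model of when a command-mode session ends."""

    inactivity_timeout: float = 30.0  # seconds without a recognised command
    miss_limit: int = 5               # unrecognised commands tolerated
    last_activity: float = 0.0        # time of the last recognised command
    misses: int = 0                   # assumption: consecutive misses only

    def on_command(self, recognised: bool, now: float) -> None:
        if recognised:
            # A recognised command resets both exit conditions.
            self.last_activity = now
            self.misses = 0
        else:
            self.misses += 1

    def should_exit(self, now: float) -> bool:
        timed_out = now - self.last_activity >= self.inactivity_timeout
        return timed_out or self.misses >= self.miss_limit


session = CommandModeSession()
session.on_command(recognised=True, now=0.0)
print(session.should_exit(now=10.0))
```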

Commands

The Commands tab lets you browse, enable, and disable command packs. Each pack is a named group of related commands.

Pack	Commands	Default
core	Essential commands: open apps, copy, paste, undo, redo, scroll, repeat, restart	Enabled
text-editing	Select all, bold, italic, word navigation, delete word, markers	Enabled
window-management	Snap, maximize, minimize, move between monitors, saved layouts	Enabled
browsers	Tab management, bookmarks, address bar, refresh, navigation	Enabled
media	Play, pause, next/previous track, volume	Enabled
smart-home	Hyperion LED control: lights red, lights off, etc.	Disabled
3d-printing	FlashForge printer control: start print, check status, abort	Disabled
stremio	Stremio media control: play, pause, fullscreen	Disabled
screen-capture	Screenshot, screen recording, GIF capture	Enabled
macros	Delete line, duplicate tab, custom key sequences	Enabled
audio	Audio device switching	Enabled
ai	Ava commands, corrections, scheduling	Enabled
accessibility	Narrator, magnifier, high contrast, cursor size	Enabled
mouse	Left click, double click, right click by voice	Disabled

Sounds

Setting	What it does
Sound theme	Choose from several earcon themes. Each has distinct sounds for wake detection, command success, errors, etc.
Volume	Master volume for all sound effects (0.0 to 1.0).
Audio feedback	Enable/disable all earcons. When off, Samsara is completely silent except for TTS.

Text-to-Speech

Samsara can speak. TTS is used by Ava for responses, and optionally for confirmations and status updates.

Setting	Default	What it does
Enabled	Off	Master switch for all TTS. Turn on to hear Ava speak.
Engine	EdgeTTS	`edge` = Microsoft Edge TTS (high quality, requires internet). `winrt` = Windows built-in voices (offline, lower quality).
Voice	en-US-AvaNeural	The TTS voice. EdgeTTS has many options — Ava, Jenny, Guy, etc.
Speed	1.0	Speech rate. 1.0 = normal, 1.5 = fast, 0.75 = slow.
Volume	0.8	TTS output volume.

TTS Categories

Control which types of speech Samsara produces:

Category	Default	When it speaks
Agent responses	On	Ava's answers to questions and conversational replies.
Confirmations	On	"Opening Chrome", "Schedule stopped", etc.
Warnings	On	Error messages and safety warnings.
Status updates	On	"Cloud mode enabled", "Restarting", etc.
Dictation readback	Off	Reads your dictated text back to you after transcription.
Errors	On	Command failures and system errors.

Audio Coordinator

The AudioCoordinator manages the relationship between TTS, your microphone, and background audio:

Ducking — lowers mic sensitivity while Samsara is speaking, preventing echo
Interrupt — if you start talking while Samsara is speaking, TTS stops immediately
Pre-buffer discard — audio captured during TTS is discarded, so Samsara doesn't transcribe its own voice
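To make the pre-buffer discard idea concrete, here is a minimal sketch that drops any captured audio chunk overlapping a period in which TTS was playing. The `Chunk` type, the timestamps, and the function name are assumptions for illustration; the real AudioCoordinator internals are not documented here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """A captured slice of microphone audio (hypothetical structure)."""

    start: float  # seconds
    end: float    # seconds
    samples: bytes = b""


def discard_during_tts(chunks: List[Chunk], tts_intervals: List[Tuple[float, float]]) -> List[Chunk]:
    """Keep only chunks that do not overlap any (start, end) TTS interval."""

    def overlaps(chunk: Chunk) -> bool:
        return any(chunk.start < t_end and chunk.end > t_start
                   for t_start, t_end in tts_intervals)

    return [chunk for chunk in chunks if not overlaps(chunk)]


captured = [Chunk(0.0, 1.0), Chunk(1.0, 2.0), Chunk(2.0, 3.0)]
kept = discard_during_tts(captured, tts_intervals=[(1.2, 1.8)])
print([(c.start, c.end) for c in kept])
```

Only the middle chunk overlaps the speaking interval, so it is the only one removed before transcription.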

Alarms

Configurable reminders for health and productivity. Built for people who need regular prompts to move, stretch, hydrate, or rest their eyes.

Setting	Default	What it does
Enable alarms	On	Master switch for the alarm system.
Complete hotkey	`F7`	Mark the current alarm as complete.
Dismiss hotkey	`F8`	Dismiss the current alarm without completing it.
Nag interval	60 seconds	How often an unacknowledged alarm repeats.

Built-in Alarms

Hydration — every 60 minutes. "Time to drink some water."
Break — every 45 minutes. "Take a short break — stretch and rest your eyes."
Stretch — every 120 minutes. "Time to stretch your hands, wrists, and neck."
20-20-20 Rule — every 20 minutes. "Look at something 20 feet away for 20 seconds."

Smart Actions

Smart Actions allow Samsara to interact with external services through a webhook bridge.

Setting	What it does
Enabled	Master switch. Off by default.
Endpoint URL	The webhook URL that receives Smart Action payloads.
Auth header	Optional authentication header sent with each request.
Brain dump path	Where voice-captured brain dumps are saved. Default: `Documents\Samsara Brain Dump.md`
Allowed directories	Directories Smart Actions can read from (sandboxed).
Session window	Minutes before a Smart Action session expires.

Smart Actions can send data outside your machine. Only enable this if you understand what the configured endpoint does with your data.

Advanced

Setting	Default	What it does
Device	`cuda`	Hardware for Whisper inference. `cuda` = NVIDIA GPU, `cpu` = processor only.
Compute type	`float16`	Precision for GPU inference. `float16` = fast, `int8` = smaller memory, `float32` = most accurate.
Performance mode	`balanced`	`fast` = prioritise speed. `balanced` = good accuracy and speed. `accurate` = best transcription, slower.
Silence threshold	2.0	Seconds of silence before dictation is considered complete.
Min speech duration	0.3s	Minimum audio length to process. Filters out brief noises.
Calibration multiplier	3.0	Sensitivity multiplier for auto speech threshold. Higher = less sensitive.
Echo cancellation	Enabled	Reduces feedback when speakers are near the microphone.
Listening indicator	On, bottom-center	Shows a small pill overlay when Samsara is actively listening.
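One plausible reading of the calibration multiplier is that the speech threshold equals the measured ambient level times the multiplier, which would explain why a higher value is less sensitive. The sketch below works through that reading; the RMS-based measurement and all function names are assumptions, not a description of Samsara's actual calibration code.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of audio samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(ambient: Sequence[float], multiplier: float = 3.0) -> float:
    """Assumed formula: threshold = ambient RMS x calibration multiplier."""
    return rms(ambient) * multiplier


def is_speech(samples: Sequence[float], threshold: float) -> bool:
    """Treat a block as speech only if it is louder than the threshold."""
    return rms(samples) > threshold


threshold = calibrate_threshold([0.01, -0.01, 0.01, -0.01])
print(is_speech([0.1, -0.1], threshold), is_speech([0.02, -0.02], threshold))
```

Under this assumption, raising the multiplier from 3.0 to 5.0 in a noisy room would require speech to be five times louder than the background before it is picked up.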

Ava Voice Assistant

Ava is Samsara's built-in AI assistant. She runs on a local LLM (phi3.5 via Ollama) and can answer questions, execute commands, schedule tasks, and learn your personal vocabulary.

Talking to Ava

Hold Right Alt and speak naturally. Ava will respond via TTS.

"Hey Ava, what's the capital of France?" — she answers conversationally
"Hey Ava, how are you?" — she responds briefly and doesn't loop
Say "no thanks" or "I'm good" — she stops, doesn't keep offering help

Requirements

Ollama installed and running
A model pulled: ollama pull phi3.5 (recommended, ~2.2GB)
TTS enabled in Settings → Text-to-Speech

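Ava's language model runs entirely on your machine, and your requests stay local unless you enable Cloud LLM mode. The EdgeTTS engine needs an internet connection to speak Ava's replies; switch the engine to winrt in Settings → Text-to-Speech if you want speech to work offline too.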

Ava + Commands

Ava can execute Samsara commands through natural language. Use action-oriented phrasing:

"Hey Ava, can you open Spotify?"
"Hey Ava, take a screenshot"
"Hey Ava, go full screen"
"Hey Ava, mute this tab"

Most commands execute immediately. Potentially destructive commands (close window, delete file, lock screen) require confirmation — Ava will ask you to say "yes" before executing.

How It Works

Ava translates your natural language into the exact Samsara command name, then Samsara executes it through the normal command pipeline. She can use any of the 320+ commands that are marked as visible to her.

Scheduling Actions

Ask Ava to repeat an action on a timer:

"Hey Ava, refresh this page every 5 minutes"
"Hey Ava, scroll down every 30 seconds"
"Hey Ava, press F5 every minute"

Ava confirms the schedule and waits for "yes" before starting. Say "stop schedule" to cancel a running schedule. Only one schedule can be active at a time.

Teaching Aliases

Teach Ava your personal vocabulary so she understands your shortcuts:

"Hey Ava, when I say browser I mean open Firefox"
"Hey Ava, remember that the meeting doc means open https://docs.google.com/..."
"Hey Ava, from now on project means open VS Code"

Aliases persist across restarts. They're stored in ~/.samsara/ava_corrections.json.

Managing Aliases

Forget: "Hey Ava, forget browser"
Query: "Hey Ava, what does browser mean?"
List all: "Hey Ava, list my aliases"

If you teach an alias that already exists, Ava asks for confirmation before replacing it. Aliases are injected into Ava's context as structured knowledge — she applies them intelligently rather than doing blind text substitution.
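The confirm-before-replace behaviour can be sketched as a small JSON-backed store. This is an illustration under stated assumptions: the flat phrase-to-meaning layout, the `AliasStore` class, and its method names are hypothetical, and the real schema of ava_corrections.json is not documented here, so the example writes to its own file.

```python
import json
from pathlib import Path


class AliasStore:
    """Hypothetical alias store; not the real ava_corrections.json format."""

    def __init__(self, path: Path):
        self.path = path
        # Load existing aliases if the file is already there.
        self.aliases = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    def teach(self, phrase: str, meaning: str, confirm_replace: bool = False) -> bool:
        """Store an alias. Returns False when the caller must confirm a replacement."""
        key = phrase.lower()
        if key in self.aliases and self.aliases[key] != meaning and not confirm_replace:
            return False
        self.aliases[key] = meaning
        self._save()
        return True

    def forget(self, phrase: str) -> bool:
        """Remove an alias. Returns True if something was removed."""
        removed = self.aliases.pop(phrase.lower(), None) is not None
        if removed:
            self._save()
        return removed

    def _save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.aliases, indent=2), encoding="utf-8")
```

In this sketch, `teach("browser", "open chrome")` returns `False` when "browser" already means something else, which is the point at which an assistant would ask for a spoken "yes" before calling it again with `confirm_replace=True`.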

Your Profile

Teach Ava basic information about yourself:

"Hey Ava, my name is Morne"
"Hey Ava, I live in Cape Town"
"Hey Ava, I'm a developer"
"Hey Ava, remember about me that I have chronic hand pain"

Profile information is stored in ~/.samsara/ava_profile.json and injected into every conversation with Ava. She'll use your name naturally and adapt responses to your context.

Managing Your Profile

"Hey Ava, what do you know about me?" — reads back everything
"Hey Ava, forget my location" — clears one field
"Hey Ava, forget what you know about me" — clears everything

Cloud LLM Mode

For users who want a more capable AI, Ava can route requests through a cloud LLM instead of local Ollama. You provide your own API key and pay your own usage costs.

Supported Providers

Provider	Default Model	Cost
DeepSeek (default)	deepseek-chat	Very low (~$0.14/million tokens)
OpenAI	gpt-4o-mini	Low
Anthropic	claude-sonnet-4-20250514	Medium

Setup

Cloud LLM configuration is currently done through config.json. A settings UI for this is planned.

Open config.json and find the cloud_llm section
Set your API key: "api_key": "sk-your-key-here"
Optionally change the provider: "provider": "deepseek"
Save the file and restart Samsara
Say "Jarvis, ava cloud" to enable cloud mode
Say "Jarvis, ava local" to switch back to offline mode at any time
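If you prefer to script the config.json edit from the steps above, a minimal sketch could look like this. The `cloud_llm`, `api_key`, and `provider` keys come from this guide; the helper names, the lowercase provider strings other than "deepseek", and the path you pass in are assumptions to check against your own config.json.

```python
import json
from pathlib import Path

# Assumed provider identifiers; only "deepseek" appears verbatim in this guide.
SUPPORTED_PROVIDERS = {"deepseek", "openai", "anthropic"}


def set_cloud_llm(config: dict, api_key: str, provider: str = "deepseek") -> dict:
    """Set the API key and provider inside the cloud_llm section of a config dict."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    section = config.setdefault("cloud_llm", {})
    section["api_key"] = api_key
    section["provider"] = provider
    return config


def update_config_file(path: Path, api_key: str, provider: str = "deepseek") -> None:
    """Read config.json, update the cloud_llm section, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8"))
    set_cloud_llm(config, api_key, provider)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Other keys in the file are left untouched, and Samsara still needs a restart afterwards, as in the manual steps.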

When cloud mode is enabled, your voice requests are sent to the cloud provider's servers.

If the cloud provider is unreachable, Ava automatically falls back to local Ollama, so you still get a response as long as Ollama is running locally.
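The fallback pattern itself is simple to sketch: try the cloud backend, and on a network-level failure hand the same prompt to the local one. The function below is a generic illustration with hypothetical names; it does not show how Ava actually calls either backend.

```python
from typing import Callable, Tuple


def ask_with_fallback(
    prompt: str,
    cloud: Callable[[str], str],
    local: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, reply), preferring the cloud backend."""
    try:
        return "cloud", cloud(prompt)
    except OSError:
        # ConnectionError and TimeoutError are both subclasses of OSError,
        # so this covers an unreachable or unresponsive provider.
        return "local", local(prompt)


def unreachable_cloud(prompt: str) -> str:
    raise ConnectionError("provider unreachable")


print(ask_with_fallback("hello", unreachable_cloud, lambda p: f"local reply to {p}"))
```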

Commands

Samsara ships with 320+ voice commands organised into packs. All commands are deterministic — they execute the same way every time.

Overview

Commands are triggered by saying the wake word followed by the command name:

"Jarvis, open chrome"
"Jarvis, scroll down fast"
"Jarvis, mark here"
"Jarvis, again"

Commands are defined in two places:

commands.json — hotkeys, macros, press commands, launch commands, web shortcuts
Plugin files in plugins/commands/ — Python-implemented commands using the @command decorator

Command Packs

Packs are named groups of commands that can be enabled or disabled from Settings → Commands. This keeps the command vocabulary focused on what you actually use.

Disabled packs don't load their commands at all — they won't be recognised by the wake word listener or appear in the cheat sheet.

Custom Commands

Using the Settings Menu (recommended)

The easiest way to add a command is through Settings → Commands. Click "Add Command", choose a type (hotkey, launch, macro), fill in the fields, and save. No file editing required.

Editing commands.json (advanced)

For more control, you can edit commands.json directly. Open it in any text editor and add entries:

Adding a Hotkey Command

This entry presses Ctrl+Shift+N when you say "my custom shortcut":

{
  "my custom shortcut": {
    "type": "hotkey",
    "keys": ["ctrl", "shift", "n"],
    "description": "Open new window",
    "pack": "core"
  }
}

Adding a Launch Command

{
  "open blender": {
    "type": "launch",
    "target": "C:\\Program Files\\Blender\\blender.exe",
    "description": "Launch Blender 3D",
    "pack": "core"
  }
}

Adding a Web Shortcut

{
  "open my project": {
    "type": "launch",
    "target": "https://github.com/your-repo",
    "description": "Open project on GitHub",
    "pack": "core"
  }
}

Adding a Macro

{
  "select line": {
    "type": "macro",
    "steps": [
      {"action": "press", "key": "home"},
      {"action": "press", "key": "shift+end"}
    ],
    "description": "Select the current line",
    "pack": "text-editing"
  }
}

Command Types

Type	What it does
`hotkey`	Presses a key combination (e.g. `ctrl+c`)
`press`	Presses a single key (e.g. `enter`, `delete`)
`launch`	Opens an application, URL, or file
`macro`	Executes a sequence of key presses with optional delays
`method`	Calls a Python method on the main app class
`text`	Types a text string (e.g. punctuation characters)
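Before restarting Samsara with a hand-edited commands.json, it can help to sanity-check the entries. The sketch below is a hypothetical checker, not a tool that ships with Samsara: it knows the six types from the table above and only checks the fields shown in this guide's examples (`keys`, `target`, `steps`), because the required fields for `press`, `method`, and `text` are not documented here.

```python
from typing import Dict, List

KNOWN_TYPES = {"hotkey", "press", "launch", "macro", "method", "text"}

# Fields taken from the examples in this guide; other types are not checked.
REQUIRED_FIELDS = {"hotkey": "keys", "launch": "target", "macro": "steps"}


def validate_commands(commands: Dict[str, dict]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for phrase, entry in commands.items():
        kind = entry.get("type")
        if kind not in KNOWN_TYPES:
            problems.append(f"{phrase!r}: unknown type {kind!r}")
            continue
        field = REQUIRED_FIELDS.get(kind)
        if field and field not in entry:
            problems.append(f"{phrase!r}: type {kind!r} needs a {field!r} field")
    return problems


sample = {
    "my custom shortcut": {"type": "hotkey", "keys": ["ctrl", "shift", "n"]},
    "open blender": {"type": "launch"},  # missing "target"
}
for problem in validate_commands(sample):
    print(problem)
```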

Command Reference

A selection of commonly used commands. For the full list, say "Jarvis, show commands" to open the floating cheat sheet.

Category	Example Commands
Navigation	open chrome, open spotify, open file explorer, open settings, go back, go forward, new tab, close tab
Text editing	select all, copy, paste, cut, undo, redo, save, find, bold, italic, delete word
Scrolling	scroll up, scroll down, scroll up fast, scroll down a little, scroll to top
Text selection	mark here, select to here — anchor-based selection across any distance
Window management	maximize, minimize, snap left, snap right, move to left monitor, full screen
Media	play, pause, back a song, volume up, volume down, mute
Repeat	again, repeat — re-fires the last safe command
Overlays	show commands, hide commands, show numbers
System	screenshot, lock screen, dark mode, restart samsara
Accessibility	start narrator, bigger cursor, bigger text, high contrast

Plugins

Plugins extend Samsara with new voice commands written in Python. Every plugin is a single .py file in plugins/commands/.

How Plugins Work

On startup, Samsara scans plugins/commands/ and loads every .py file. Each file can register voice commands using the @command decorator. Plugins have full access to the app instance, so they can control audio, UI, system calls, and hardware.
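A directory scan of this kind is commonly built on `importlib`. The sketch below shows the general technique using only the standard library; the function name and the choice to skip files that fail to import are assumptions, and Samsara's own loader may behave differently.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Dict


def load_plugins(directory: Path) -> Dict[str, ModuleType]:
    """Import every .py file in a directory and return the loaded modules by name."""
    modules = {}
    for path in sorted(directory.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        try:
            # Running the module body is what lets decorators register commands.
            spec.loader.exec_module(module)
        except Exception as exc:
            # Assumption: one broken plugin should not stop the others loading.
            print(f"Skipping {path.name}: {exc}")
            continue
        modules[path.stem] = module
    return modules
```

Because importing a file executes its top-level code, any decorator applied at module level runs during this scan, which is why no separate registration step is needed.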

Creating a Plugin

Create a new file in plugins/commands/, for example my_plugin.py:

from samsara.plugin_commands import command

# Registers "say hello" (and the alias "greet me") in the core pack.
@command("say hello", aliases=["greet me"], pack="core")
def say_hello(app, remainder="", **kwargs):
    """Says hello via TTS."""
    # Speak through the audio coordinator when it is available; otherwise print.
    if getattr(app, "audio_coordinator", None):
        app.audio_coordinator.speak("Hello there!", category="agent_response")
    else:
        print("Hello there!")

The @command Decorator

Parameter	What it does
First argument (string)	The primary voice command phrase, e.g. "say hello"
`aliases`	Alternative phrases that trigger the same command.
`pack`	Which command pack this belongs to. Must match a key in `command_packs` config.
`ai_visible`	`True` (default) or `False`. When False, Ava won't see or suggest this command.
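To see how a decorator with these parameters can feed a command registry, here is a self-contained stand-in. It deliberately does not import `samsara.plugin_commands`: the `command` function, `REGISTRY` dict, and `dispatch` helper below are hypothetical and only mimic the parameters listed in the table.

```python
from typing import Callable, Dict, Sequence

# Maps each spoken phrase (primary or alias) to its handler and metadata.
REGISTRY: Dict[str, dict] = {}


def command(phrase: str, aliases: Sequence[str] = (), pack: str = "core", ai_visible: bool = True):
    """Stand-in decorator that records a handler under its phrase and aliases."""

    def decorator(func: Callable) -> Callable:
        info = {"handler": func, "pack": pack, "ai_visible": ai_visible}
        for name in (phrase, *aliases):
            REGISTRY[name.lower()] = info
        return func  # the function itself is left unchanged

    return decorator


@command("say hello", aliases=["greet me"], pack="core")
def say_hello(app, remainder="", **kwargs):
    return "Hello there!"


def dispatch(app, spoken: str):
    """Look up a spoken phrase and call its handler, or return None if unknown."""
    info = REGISTRY.get(spoken.lower())
    if info is None:
        return None
    return info["handler"](app, remainder="")


print(dispatch(None, "greet me"))
```

Both the primary phrase and the alias point at the same registry entry, so metadata such as `pack` and `ai_visible` only needs to be stored once per command.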

Accessing the App

Every command function receives app as its first argument. Through it you can access:

app.config — the full config dictionary
app.command_executor — execute other commands programmatically
app.audio_coordinator — TTS, ducking, audio state
app.root — the Tkinter root window (for UI operations)

Using ctypes for System Control

For Windows-level operations (mouse control, key injection, audio APIs), use ctypes directly rather than pyautogui. This avoids dependencies and gives you full Win32 access. See plugins/commands/volume.py and plugins/commands/text_marker.py for examples.

Built-in Plugins

Plugin	File	What it does
Volume	volume.py	Volume up/down/mute via Core Audio API (no media keys)
Text Marker	text_marker.py	Deferred anchor text selection: mark here → select to here
Scroll	scroll.py	5-speed mouse wheel scrolling via SendInput
Stremio	stremio.py	Stremio media control via AutoHotkey
Ava / Ollama	ask_ollama.py	Voice AI assistant, command translation, scheduling
Music	music.py	Spotify playback control
Hyperion	hyperion.py	LED strip control via Hyperion JSON API
FlashForge	flashforge.py	3D printer monitoring and control
Brain Dump	brain_dump.py	Capture voice thoughts to a markdown file
Window Manager	windows.py	Window positioning, saved layouts, lost window recovery

Features

Detailed guides for Samsara's key features.

Streaming Dictation

Streaming dictation must be enabled first. Right-click the Samsara tray icon and enable "Streaming Mode". The toggle is also available in Settings → Hotkeys & Modes.

Press CapsLock to start streaming. Text appears at your cursor as you speak, updating in real time. Press CapsLock again to stop.

Under the hood, Samsara uses a rolling Ctrl+Z approach: each update undoes the previous paste and replaces it with the expanded text. This works in any text field that supports undo.
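The rolling undo-and-replace idea can be simulated without touching a real editor. In the sketch below, `TextField` is a toy stand-in for any text box with paste and undo, and `stream_update` is a hypothetical name for the per-update step; the real implementation sends actual key presses.

```python
class TextField:
    """Toy text field with paste and undo, standing in for a real editor."""

    def __init__(self, text: str = ""):
        self.text = text
        self._history = []

    def paste(self, snippet: str) -> None:
        self._history.append(self.text)  # remember the state before the paste
        self.text += snippet

    def undo(self) -> None:
        if self._history:
            self.text = self._history.pop()


def stream_update(field: TextField, partial: str, has_previous: bool) -> None:
    """Replace the previously pasted partial transcript with the newer one."""
    if has_previous:
        field.undo()      # the Ctrl+Z step: remove the last pasted partial
    field.paste(partial)  # paste the expanded text in its place


field = TextField("Note: ")
for index, partial in enumerate(["hello", "hello wor", "hello world"]):
    stream_update(field, partial, has_previous=index > 0)
print(field.text)  # Note: hello world
```

Text that was in the field before streaming started ("Note: " here) is never touched, because each undo only reverts the most recent paste.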

Configuration

streaming_hotkey — the toggle key (default: CapsLock)
streaming_direct_paste — when true, pastes directly instead of typing character by character

Text Selection Markers

Selecting large blocks of text by click-dragging is painful. Samsara replaces it with two voice commands:

Click where you want the selection to start
Say "Jarvis, mark here" — sets an invisible anchor at the cursor position
Scroll freely (by voice or mouse) to where you want the selection to end
Say "Jarvis, select to here" — everything between the anchor and the current position is highlighted

The anchor is tied to the window that was focused when you set it. If you switch to a different window, Samsara warns you and clears the anchor rather than selecting in the wrong app.

Voice Scrolling

Five speed tiers, all using mouse wheel simulation so they work in every application:

Command	Speed
"scroll up a little" / "scroll down a little"	Slow (3 clicks)
"scroll up" / "scroll down"	Default (8 clicks)
"scroll up medium" / "scroll down medium"	Medium (15 clicks)
"scroll up high" / "scroll down high"	Medium-high (25 clicks)
"scroll up fast" / "scroll down fast"	Fast (40 clicks)

Scroll amounts are configurable in config.json under the scroll key. Values are in wheel "clicks" (each click = 120 wheel delta units).
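The conversion from clicks to wheel delta is plain multiplication. The sketch below uses the default click counts from the table; the dictionary keys and the function name are made up for this example and are not the actual config.json key names.

```python
WHEEL_DELTA = 120  # wheel delta units per click

# Default click counts from the table above; key names are hypothetical.
SCROLL_CLICKS = {"a little": 3, "default": 8, "medium": 15, "high": 25, "fast": 40}


def wheel_delta(direction: str, speed: str = "default") -> int:
    """Return the signed wheel delta for a scroll command."""
    if direction not in ("up", "down"):
        raise ValueError(f"unknown direction: {direction!r}")
    sign = 1 if direction == "up" else -1  # positive delta scrolls up on Windows
    return sign * SCROLL_CLICKS[speed] * WHEEL_DELTA


print(wheel_delta("down"))        # -960
print(wheel_delta("up", "fast"))  # 4800
```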

Repeat / Again

Say "again" or "repeat" to re-fire the last command. Chain them: "scroll up a little, again, again, again" scrolls four times.

Destructive commands (close tab, delete file, etc.) are excluded from repeat to prevent accidental damage.
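A minimal way to model this is to remember only the last command that is safe to repeat. The sketch below is one interpretation, not Samsara's code: the class name is hypothetical, and the exclusion set contains just the two examples named above because the full list is not shown in this guide.

```python
from typing import Optional

# Only the two examples named in the text; the real exclusion list is longer.
NO_REPEAT = {"close tab", "delete file"}
REPEAT_WORDS = {"again", "repeat"}


class RepeatTracker:
    """Remembers the last safe command so "again" can re-fire it."""

    def __init__(self) -> None:
        self.last_safe: Optional[str] = None

    def resolve(self, spoken: str) -> Optional[str]:
        """Return the command to execute for a spoken phrase, or None."""
        if spoken in REPEAT_WORDS:
            return self.last_safe
        if spoken not in NO_REPEAT:
            self.last_safe = spoken  # destructive commands are never remembered
        return spoken


tracker = RepeatTracker()
for phrase in ["scroll up a little", "again", "close tab", "again"]:
    print(phrase, "->", tracker.resolve(phrase))
```

In this interpretation, saying "again" after "close tab" re-fires the earlier scroll rather than closing a second tab.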

Overlays

Command Cheat Sheet

Say "show commands" to open a floating overlay listing all active commands. It stays on top of other windows and has an adjustable opacity slider. Type in the search box to filter the list, or scroll to browse it.

Show Numbers

Say "show numbers" and every clickable element on screen gets a numbered label. Say the number to click that element. Fully hands-free UI navigation.

Listening Indicator

A small pill-shaped overlay appears when Samsara is actively listening. Position it anywhere on screen from Settings → Advanced.

Window Manager

Control window positioning by voice:

"Move to left monitor" / "move to right monitor" — shifts the active window between screens
"Snap left" / "snap right" — snaps to half-screen
"Save layout" — saves the current arrangement of all windows
"Restore layout" — restores a saved arrangement
"Bring back lost windows" — recovers windows that are off-screen