Getting Started

Everything you need to install, configure, and start using Samsara.

Installation

Requirements

Download

Download the latest release from GitHub Releases: https://github.com/Morne-Ingstar/Samsara/releases

From Source

git clone https://github.com/Morne-Ingstar/Samsara.git
cd Samsara
pip install -r requirements.txt
python dictation.py

First Run

On first launch, a setup wizard walks you through:

All settings can be changed later from the Settings window.

Basic Usage

Push-to-Talk Dictation

Hold Ctrl+Shift and speak. Release when done. Your speech is transcribed and pasted at the cursor position.

Wake Word Commands

Say your wake word (default: "Jarvis") followed by a command, for example "Jarvis, open chrome" or "Jarvis, scroll down fast".

Samsara listens for the wake word passively. When it hears the wake word, it captures the command that follows and executes it.

Streaming Dictation

Once Streaming Mode is enabled in the tray menu, press CapsLock to start streaming. Speak continuously and text appears in real time as you talk. Press CapsLock again to stop.

Voice Modes

Samsara has six voice modes. All can run simultaneously — wake word mode and continuous mode share a single audio stream internally, so there are no conflicts.

Mode | Activation | Purpose
Push-to-Talk | Hold Ctrl+Shift | Quick dictation: speak while held, pastes on release
Wake Word | Say your wake phrase (default: "Jarvis") | Hands-free commands: Samsara listens passively for the wake phrase, then captures and executes your command
Streaming | Enable in tray menu, then toggle CapsLock | Continuous dictation: text flows in real time as you speak
Command Mode | Hold Right Ctrl (configurable) | Walkie-talkie style: hold, speak a command, release to execute
Continuous | Toggle Ctrl+Alt+D | Always-on dictation: everything you say is transcribed
Ava Mode | Hold Right Alt | Talk to the Ava voice assistant: ask questions, request actions, teach aliases

All hotkeys are configurable in Settings. See the Hotkeys & Modes section for details on each.
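Because several modes can listen at the same time, one capture stream has to feed all of them. The sketch below shows one common way to do that: a fan-out of audio frames into a separate queue for each mode. It is a hypothetical illustration. The class name, the queue sizes, and the drop-oldest policy are assumptions, not Samsara's internals.

```python
import queue


class AudioFanout:
    """Hypothetical: one microphone stream feeding several listeners, each with its own queue."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, name, maxsize=100):
        """Register a consumer (e.g. wake word, continuous) and return its frame queue."""
        q = queue.Queue(maxsize=maxsize)
        self._queues[name] = q
        return q

    def publish(self, frame):
        """Called once per captured frame by the single audio callback."""
        for q in self._queues.values():
            try:
                q.put_nowait(frame)
            except queue.Full:
                # A slow consumer loses its oldest frame instead of blocking capture.
                try:
                    q.get_nowait()
                except queue.Empty:
                    pass
                q.put_nowait(frame)
```

Each mode reads only from its own queue, so a busy consumer cannot starve the others or stall the microphone callback.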

Settings

Samsara's settings are organised into eight tabs. Open settings from the system tray icon or say "Jarvis, open settings."

General

Setting | What it does
Microphone | Select your input device. USB mics and headsets are listed by name.
Model Size | Whisper model for transcription. small.en is the best balance of speed and accuracy. medium.en or large-v3 are more accurate but slower.
Language | Primary language for transcription (en for English). Setting it correctly improves Whisper's accuracy.
Auto-paste | Automatically paste transcribed text at the cursor. Disable if you want to copy it manually.
Add trailing space | Adds a space after each dictation so the next dictation doesn't run into the previous word.
Auto-capitalize | Capitalises the first letter of each sentence.
Format numbers | Converts "twenty three" to "23" in dictation output.
Cleanup mode | clean removes filler words and false starts. verbatim keeps everything.
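Several General options act as a post-processing pipeline on the raw transcript. The minimal sketch below assumes that pipeline order: filler removal, then number formatting, then capitalisation, then trailing space. The filler list, the number parser (which covers only 0 to 99), and all function names are hypothetical. False-start removal is not modelled.

```python
import re

# Hypothetical filler list for "clean" mode; Samsara's real list may differ.
FILLER_RE = re.compile(r"\b(?:um+|uh+|er|ah)\b,?\s*", re.IGNORECASE)

UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
DIGITS = [w for w, v in UNITS.items() if 1 <= v <= 9]

# Matches a tens word with an optional digit ("twenty three", "forty-two") or one word up to nineteen.
# The trailing \b forces whole words, so "seven" never matches inside "seventeen".
NUMBER_RE = re.compile(
    rf"\b(?:(?P<tens>{'|'.join(TENS)})(?:[ -](?P<digit>{'|'.join(DIGITS)}))?"
    rf"|(?P<unit>{'|'.join(UNITS)}))\b",
    re.IGNORECASE,
)


def _to_digits(m):
    if m.group("tens"):
        value = TENS[m.group("tens").lower()]
        if m.group("digit"):
            value += UNITS[m.group("digit").lower()]
        return str(value)
    return str(UNITS[m.group("unit").lower()])


def format_numbers(text):
    return NUMBER_RE.sub(_to_digits, text)


def capitalize_sentences(text):
    # Uppercase a lowercase letter at the start of the text or after . ! ?
    return re.sub(r"(^|[.!?]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), text)


def postprocess(text, cleanup_mode="clean", auto_capitalize=True,
                format_nums=True, trailing_space=True):
    if cleanup_mode == "clean":
        text = FILLER_RE.sub("", text).strip()
    if format_nums:
        text = format_numbers(text)
    if auto_capitalize:
        text = capitalize_sentences(text)
    if trailing_space:
        text += " "
    return text
```

For example, `postprocess("um, i need twenty three copies. thanks")` yields `"I need 23 copies. Thanks "`. A real formatter has to handle ambiguous words such as "one" in "no one", which this sketch converts naively.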

Hotkeys & Modes

All hotkeys are configurable in Settings. This table shows the defaults.

Setting | Default | What it does
Record hotkey | Ctrl+Shift | Push-to-talk dictation. Hold to record, release to transcribe and paste.
Record mode | Hold | Hold = push-to-talk. Toggle = press to start, press again to stop.
Continuous hotkey | Ctrl+Alt+D | Toggle always-on dictation.
Wake word hotkey | Ctrl+Alt+W | Toggle wake word listening.
Command mode hotkey | Ctrl+Alt+C | Toggle command mode (walkie-talkie via Right Ctrl).
Streaming hotkey | CapsLock | Toggle streaming dictation. Must be enabled in the tray menu first.
Cancel hotkey | Escape | Cancel the current recording.
Undo hotkey | Ctrl+Alt+Z | Undo the last dictation paste.
Ava mode key | Right Alt | Hold to talk to the Ava voice assistant.

Wake Word Configuration

Setting | Default | What it does
Wake phrase | "jarvis" | The phrase that activates Samsara. Options: samsara, hey samsara, computer, hey computer, jarvis, hey jarvis.
Speech threshold | Auto | Sensitivity for detecting speech. Auto-calibrates to your environment.
Wake command timeout | 5 seconds | How long Samsara listens for a command after hearing the wake word.
End words | "over", "done", "end dictation" | Words that signal you've finished a command. Experimental: may not work reliably in all situations.
Cancel words | "cancel", "abort" | Words that cancel the current wake word session. Experimental: may not work reliably in all situations.
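The following hypothetical sketch shows how a transcript captured after the wake word might be split into a command. It uses the default wake phrase, end words, and cancel words from the table above. The function name and return values are assumptions. The wake command timeout is not modelled here because it only bounds how much audio is captured.

```python
import string

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def parse_wake_utterance(transcript, wake_phrase="jarvis",
                         end_words=("over", "done", "end dictation"),
                         cancel_words=("cancel", "abort")):
    """Hypothetical: classify a transcript heard after wake-word detection.

    Returns ("ignored", ""), ("cancelled", ""), or ("command", text).
    """
    # Whisper output usually has capitals and punctuation: "Jarvis, open chrome."
    words = transcript.lower().translate(_STRIP_PUNCT).split()
    wake = wake_phrase.lower().split()
    if words[:len(wake)] != wake:
        return "ignored", ""
    command = words[len(wake):]
    # A cancel word at the end abandons the whole session.
    for cancel in cancel_words:
        c = cancel.split()
        if command[-len(c):] == c:
            return "cancelled", ""
    # Strip one trailing end word, checking longer phrases first ("end dictation" before "done").
    for end in sorted(end_words, key=len, reverse=True):
        e = end.split()
        if command[-len(e):] == e:
            command = command[:-len(e)]
            break
    return "command", " ".join(command)
```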

Command Mode

A walkie-talkie style mode. Hold the button, speak a command, release to execute. Ideal for rapid command sequences without saying the wake word each time.

Setting | Default | What it does
Button | Right Ctrl | The button to hold. Configurable to any keyboard key or mouse button (Mouse 4, Mouse 5, etc.).
Mode | Hold | Hold to talk, release to execute.
Enter debounce | 200 ms | Prevents accidental activations from brief taps.
Inactivity timeout | 30 seconds | Automatically exits command mode after this long without a command.
Miss limit | 5 | After this many unrecognised commands, exits command mode.
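The three timing rules above can be modelled as a small state machine. This is a hypothetical sketch, not Samsara's implementation. It makes three assumptions: a hold shorter than the debounce is ignored, the miss limit counts consecutive misses, and only a recognised command resets the inactivity timer. Times are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandModeSession:
    """Hypothetical model of command mode's debounce, inactivity, and miss-limit rules."""
    debounce_s: float = 0.2           # "Enter debounce": holds shorter than this are ignored
    inactivity_timeout_s: float = 30.0
    miss_limit: int = 5
    started_at: float = 0.0
    active: bool = True
    misses: int = 0
    _last_command_at: Optional[float] = None
    _pressed_at: Optional[float] = None

    def __post_init__(self):
        self._last_command_at = self.started_at

    def press(self, t):
        self._pressed_at = t

    def release(self, t, recognised):
        """Handle a button release at time t; recognised = the speech matched a command."""
        if not self.active or self._pressed_at is None:
            return "inactive"
        held = t - self._pressed_at
        self._pressed_at = None
        if held < self.debounce_s:
            return "ignored"          # brief tap, treated as accidental
        if recognised:
            self.misses = 0
            self._last_command_at = t
            return "executed"
        self.misses += 1
        if self.misses >= self.miss_limit:
            self.active = False
            return "exited"
        return "missed"

    def tick(self, t):
        """Call periodically; exits after inactivity_timeout_s without a recognised command."""
        if self.active and t - self._last_command_at >= self.inactivity_timeout_s:
            self.active = False
        return self.active
```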

Commands

The Commands tab lets you browse, enable, and disable command packs. Each pack is a named group of related commands.

Pack | Commands | Default
core | Essential commands: open apps, copy, paste, undo, redo, scroll, repeat, restart | Enabled
text-editing | Select all, bold, italic, word navigation, delete word, markers | Enabled
window-management | Snap, maximize, minimize, move between monitors, saved layouts | Enabled
browsers | Tab management, bookmarks, address bar, refresh, navigation | Enabled
media | Play, pause, next/previous track, volume | Enabled
smart-home | Hyperion LED control: lights red, lights off, etc. | Disabled
3d-printing | FlashForge printer control: start print, check status, abort | Disabled
stremio | Stremio media control: play, pause, fullscreen | Disabled
screen-capture | Screenshot, screen recording, GIF capture | Enabled
macros | Delete line, duplicate tab, custom key sequences | Enabled
audio | Audio device switching | Enabled
ai | Ava commands, corrections, scheduling | Enabled
accessibility | Narrator, magnifier, high contrast, cursor size | Enabled
mouse | Left click, double click, right click by voice | Disabled

Sounds

Setting | What it does
Sound theme | Choose from several earcon themes. Each has distinct sounds for wake detection, command success, errors, etc.
Volume | Master volume for all sound effects (0.0 to 1.0).
Audio feedback | Enable/disable all earcons. When off, Samsara is completely silent except for TTS.

Text-to-Speech

Samsara can speak. TTS is used by Ava for responses, and optionally for confirmations and status updates.

Setting | Default | What it does
Enabled | Off | Master switch for all TTS. Turn on to hear Ava speak.
Engine | EdgeTTS | edge = Microsoft Edge TTS (high quality, requires internet). winrt = Windows built-in voices (offline, lower quality).
Voice | en-US-AvaNeural | The TTS voice. EdgeTTS has many options, such as Ava, Jenny, and Guy.
Speed | 1.0 | Speech rate. 1.0 = normal, 1.5 = fast, 0.75 = slow.
Volume | 0.8 | TTS output volume.

TTS Categories

Control which types of speech Samsara produces:

Category | Default | When it speaks
Agent responses | On | Ava's answers to questions and conversational replies.
Confirmations | On | "Opening Chrome", "Schedule stopped", etc.
Warnings | On | Error messages and safety warnings.
Status updates | On | "Cloud mode enabled", "Restarting", etc.
Dictation readback | Off | Reads your dictated text back to you after transcription.
Errors | On | Command failures and system errors.
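Speech is gated twice: by the master Enabled switch and by the category toggle. The sketch below is hypothetical. Only the "agent_response" key is taken from the plugin example in these docs. The other keys and the function name are assumptions.

```python
# Hypothetical category keys mirroring the table above; only "agent_response" comes from these docs.
DEFAULT_TTS_CATEGORIES = {
    "agent_response": True,       # Agent responses
    "confirmation": True,         # Confirmations
    "warning": True,              # Warnings
    "status": True,               # Status updates
    "dictation_readback": False,  # Dictation readback
    "error": True,                # Errors
}


def should_speak(category, tts_enabled, categories=DEFAULT_TTS_CATEGORIES):
    """Speak only if the master Enabled switch is on and the category's toggle is on."""
    # Unknown categories stay silent rather than surprising the user.
    return bool(tts_enabled) and categories.get(category, False)
```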

Audio Coordinator

The AudioCoordinator manages the relationship between TTS, your microphone, and background audio:

Alarms

Configurable reminders for health and productivity. Built for people who need regular prompts to move, stretch, hydrate, or rest their eyes.

Setting | Default | What it does
Enable alarms | On | Master switch for the alarm system.
Complete hotkey | F7 | Mark the current alarm as complete.
Dismiss hotkey | F8 | Dismiss the current alarm without completing it.
Nag interval | 60 seconds | How often an unacknowledged alarm repeats.
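The nag interval determines when an unacknowledged alarm sounds again. The minimal sketch below shows this, assuming the alarm first sounds at its due time and repeats until either hotkey is pressed. The function names are hypothetical.

```python
def ring_times(due_at, acknowledged_at, nag_interval_s=60.0):
    """Seconds at which an alarm sounds: at due_at, then every nag_interval_s until acknowledged."""
    if nag_interval_s <= 0:
        raise ValueError("nag interval must be positive")
    times = []
    t = due_at
    while t < acknowledged_at:
        times.append(t)
        t += nag_interval_s
    return times


def acknowledge(key, complete_key="F7", dismiss_key="F8"):
    """Map the default hotkeys to an outcome; any other key leaves the alarm nagging (None)."""
    return {complete_key: "completed", dismiss_key: "dismissed"}.get(key)
```

For example, an alarm due at 0 s and acknowledged at 150 s would sound at 0, 60, and 120 seconds.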

Built-in Alarms

Smart Actions

Smart Actions allow Samsara to interact with external services through a webhook bridge.

Setting | What it does
Enabled | Master switch. Off by default.
Endpoint URL | The webhook URL that receives Smart Action payloads.
Auth header | Optional authentication header sent with each request.
Brain dump path | Where voice-captured brain dumps are saved. Default: Documents\Samsara Brain Dump.md
Allowed directories | Directories Smart Actions can read from (sandboxed).
Session window | Minutes before a Smart Action session expires.

Smart Actions can send data outside your machine. Only enable this if you understand what the configured endpoint does with your data.

Advanced

Setting | Default | What it does
Device | cuda | Hardware for Whisper inference. cuda = NVIDIA GPU, cpu = processor only.
Compute type | float16 | Precision for GPU inference. float16 = fast, int8 = smaller memory, float32 = most accurate.
Performance mode | balanced | fast = prioritise speed. balanced = good accuracy and speed. accurate = best transcription, slower.
Silence threshold | 2.0 s | Seconds of silence before dictation is considered complete.
Min speech duration | 0.3 s | Minimum audio length to process. Filters out brief noises.
Calibration multiplier | 3.0 | Sensitivity multiplier for auto speech threshold. Higher = less sensitive.
Echo cancellation | Enabled | Reduces feedback when speakers are near the microphone.
Listening indicator | On, bottom-center | Shows a small pill overlay when Samsara is actively listening.
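The speech settings above work together. One plausible reading is that the auto speech threshold equals the ambient noise level scaled by the calibration multiplier. Silence threshold and min speech duration then decide where an utterance ends and whether it is kept. The minimal sketch below follows that reading. The RMS level measure, 100 ms frames, and function names are assumptions, not Samsara's code.

```python
import math


def rms(samples):
    """Root-mean-square level of one audio frame (samples as floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(ambient_frames, multiplier=3.0):
    """Speech threshold = mean ambient RMS x multiplier (higher multiplier = less sensitive)."""
    levels = [rms(frame) for frame in ambient_frames]
    return multiplier * sum(levels) / len(levels)


def find_utterances(levels, threshold, frame_s=0.1, silence_s=2.0, min_speech_s=0.3):
    """Return (start_s, end_s) for each utterance in a sequence of per-frame levels."""
    silence_frames = round(silence_s / frame_s)
    min_frames = round(min_speech_s / frame_s)
    utterances = []
    start = last_loud = None

    def close():
        # Drop utterances shorter than the minimum speech duration (brief noises).
        if last_loud - start + 1 >= min_frames:
            utterances.append((round(start * frame_s, 3), round((last_loud + 1) * frame_s, 3)))

    for i, level in enumerate(levels):
        if level > threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= silence_frames:
            close()                   # enough trailing silence: utterance complete
            start = last_loud = None
    if start is not None:
        close()                       # end of input: flush any open utterance
    return utterances
```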

Ava Voice Assistant

Ava is Samsara's built-in AI assistant. She runs on a local LLM (phi3.5 via Ollama) and can answer questions, execute commands, schedule tasks, and learn your personal vocabulary.

Talking to Ava

Hold Right Alt and speak naturally. Ava will respond via TTS.

Requirements

Ava runs entirely on your machine. No data leaves your computer unless you enable Cloud LLM mode.

Ava + Commands

Ava can execute Samsara commands through natural language. Use action-oriented phrasing:

Most commands execute immediately. Potentially destructive commands (close window, delete file, lock screen) require confirmation — Ava will ask you to say "yes" before executing.

How It Works

Ava translates your natural language into the exact Samsara command name, then Samsara executes it through the normal command pipeline. Of Samsara's 320+ commands, she can use every one that is marked as visible to her.

Scheduling Actions

Ask Ava to repeat an action on a timer:

Ava confirms the schedule and waits for "yes" before starting. Say "stop schedule" to cancel a running schedule. Only one schedule can be active at a time.
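The one-schedule-at-a-time rule can be sketched with a background thread and a stop event. This is a hypothetical illustration. The class name, method names, and error message are assumptions, and it omits the "yes" confirmation step.

```python
import threading


class Scheduler:
    """Hypothetical: repeat one action every interval_s; only one schedule may run at a time."""

    def __init__(self):
        self._stop = None
        self._thread = None

    def start(self, action, interval_s):
        if self._thread is not None:
            raise RuntimeError("a schedule is already running; say 'stop schedule' first")
        stop = threading.Event()

        def loop():
            # Event.wait returns True as soon as stop is set, which ends the loop.
            while not stop.wait(interval_s):
                action()

        self._stop = stop
        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()

    def stop(self):
        """Cancel the running schedule ("stop schedule"); returns False if none was running."""
        if self._thread is None:
            return False
        self._stop.set()
        self._thread.join()
        self._thread = None
        return True
```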

Teaching Aliases

Teach Ava your personal vocabulary so she understands your shortcuts:

Aliases persist across restarts. They're stored in ~/.samsara/ava_corrections.json.

Managing Aliases

If you teach an alias that already exists, Ava asks for confirmation before replacing it. Aliases are injected into Ava's context as structured knowledge — she applies them intelligently rather than doing blind text substitution.
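The following minimal sketch shows alias storage with replace-confirmation. It assumes ava_corrections.json is a flat JSON object that maps each alias to its meaning. That layout, the function names, and the prompt format are all hypothetical. The last function illustrates the "structured knowledge" approach: aliases become facts in the prompt rather than text substitutions.

```python
import json
from pathlib import Path

ALIAS_PATH = Path.home() / ".samsara" / "ava_corrections.json"


def load_aliases(path=ALIAS_PATH):
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def teach_alias(alias, meaning, path=ALIAS_PATH, confirm=lambda old, new: False):
    """Save alias -> meaning. Replacing a different existing meaning requires confirm(old, new)."""
    path = Path(path)
    aliases = load_aliases(path)
    key = alias.strip().lower()
    old = aliases.get(key)
    if old is not None and old != meaning and not confirm(old, meaning):
        return False
    aliases[key] = meaning
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(aliases, indent=2), encoding="utf-8")
    return True


def alias_context(aliases):
    """Render aliases as facts for the LLM prompt instead of rewriting the user's words."""
    return "\n".join(f'- "{a}" means "{m}"' for a, m in sorted(aliases.items()))
```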

Your Profile

Teach Ava basic information about yourself:

Profile information is stored in ~/.samsara/ava_profile.json and injected into every conversation with Ava. She'll use your name naturally and adapt responses to your context.

Managing Your Profile

Cloud LLM Mode

For users who want a more capable AI, Ava can route requests through a cloud LLM instead of local Ollama. You provide your own API key and pay your own usage costs.

Supported Providers

Provider | Default Model | Cost
DeepSeek (default) | deepseek-chat | Very low (~$0.14/million tokens)
OpenAI | gpt-4o-mini | Low
Anthropic | claude-sonnet-4-20250514 | Medium

Setup

Cloud LLM configuration is currently done through config.json. A settings UI for this is planned.

  1. Open config.json and find the cloud_llm section
  2. Set your API key: "api_key": "sk-your-key-here"
  3. Optionally change the provider: "provider": "deepseek"
  4. Save the file and restart Samsara
  5. Say "Jarvis, ava cloud" to enable cloud mode
  6. Say "Jarvis, ava local" to switch back to offline mode at any time
When cloud mode is enabled, your voice requests are sent to the cloud provider's servers.

If the cloud provider is unreachable, Ava automatically falls back to local Ollama, so you still get a response as long as Ollama is running.
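The fallback behaviour amounts to try-cloud-then-local. In the hypothetical sketch below, `ask_cloud` and `ask_local` are placeholder callables standing in for the provider client and the Ollama client. Catching every exception is a deliberate simplification.

```python
import logging


def ask_ava(prompt, cloud_mode, ask_cloud, ask_local):
    """Send prompt to the cloud LLM when cloud mode is on, falling back to local on failure.

    ask_cloud and ask_local are placeholder callables (prompt -> reply text).
    """
    if cloud_mode:
        try:
            return ask_cloud(prompt)
        except Exception as exc:  # unreachable host, timeout, bad API key, ...
            logging.warning("Cloud LLM failed (%s); falling back to local Ollama", exc)
    return ask_local(prompt)
```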

Commands

Samsara ships with 320+ voice commands organised into packs. All commands are deterministic — they execute the same way every time.

Overview

Commands are triggered by saying the wake word followed by the command name:

"Jarvis, open chrome"
"Jarvis, scroll down fast"
"Jarvis, mark here"
"Jarvis, again"

Commands are defined in two places:

Command Packs

Packs are named groups of commands that can be enabled or disabled from Settings → Commands. This keeps the command vocabulary focused on what you actually use.

Disabled packs don't load their commands at all — they won't be recognised by the wake word listener or appear in the cheat sheet.

Custom Commands

Using the Settings Menu (recommended)

The easiest way to add a command is through Settings → Commands. Click "Add Command", choose a type (hotkey, launch, macro), fill in the fields, and save. No file editing required.

Editing commands.json (advanced)

For more control, you can edit commands.json directly. Open it in any text editor and add entries:

Adding a Hotkey Command

Add an entry to commands.json:

{
  "my custom shortcut": {
    "type": "hotkey",
    "keys": ["ctrl", "shift", "n"],
    "description": "Open new window",
    "pack": "core"
  }
}

Adding a Launch Command

{
  "open blender": {
    "type": "launch",
    "target": "C:\\Program Files\\Blender\\blender.exe",
    "description": "Launch Blender 3D",
    "pack": "core"
  }
}

Adding a Web Shortcut

{
  "open my project": {
    "type": "launch",
    "target": "https://github.com/your-repo",
    "description": "Open project on GitHub",
    "pack": "core"
  }
}

Adding a Macro

{
  "select line": {
    "type": "macro",
    "steps": [
      {"action": "press", "key": "home"},
      {"action": "press", "key": "shift+end"}
    ],
    "description": "Select the current line",
    "pack": "text-editing"
  }
}

Command Types

Type | What it does
hotkey | Presses a key combination (e.g. ctrl+c)
press | Presses a single key (e.g. enter, delete)
launch | Opens an application, URL, or file
macro | Executes a sequence of key presses with optional delays
method | Calls a Python method on the main app class
text | Types a text string (e.g. punctuation characters)
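This hypothetical dispatcher sketches how each entry type in commands.json might be executed. The "type", "keys", "target", "steps", "action", and "key" fields come from the examples above. Three other field names are assumptions: "key" and "text" for the press and text types, "method" for method commands, and a "wait" macro step with "seconds" for delays. The backend is injected so the sketch records calls instead of pressing real keys.

```python
import time


def run_command(entry, backend, app=None):
    """Hypothetical: execute one commands.json entry through backend."""
    kind = entry["type"]
    if kind == "hotkey":
        backend.hotkey(*entry["keys"])
    elif kind == "press":
        backend.press(entry["key"])            # hypothetical field name
    elif kind == "launch":
        backend.launch(entry["target"])
    elif kind == "macro":
        for step in entry["steps"]:
            if step["action"] == "press":
                backend.press(step["key"])
            elif step["action"] == "wait":     # hypothetical delay step
                time.sleep(step["seconds"])
            else:
                raise ValueError(f"unknown macro action: {step['action']}")
    elif kind == "method":
        getattr(app, entry["method"])()        # hypothetical field name
    elif kind == "text":
        backend.type_text(entry["text"])       # hypothetical field name
    else:
        raise ValueError(f"unknown command type: {kind}")


class RecordingBackend:
    """Test double that records calls instead of touching the keyboard."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        return lambda *args: self.calls.append((name, *args))
```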

Command Reference

A selection of commonly used commands. For the full list, say "Jarvis, show commands" to open the floating cheat sheet.

Category | Example Commands
Navigation | open chrome, open spotify, open file explorer, open settings, go back, go forward, new tab, close tab
Text editing | select all, copy, paste, cut, undo, redo, save, find, bold, italic, delete word
Scrolling | scroll up, scroll down, scroll up fast, scroll down a little, scroll to top
Text selection | mark here, select to here: anchor-based selection across any distance
Window management | maximize, minimize, snap left, snap right, move to left monitor, full screen
Media | play, pause, back a song, volume up, volume down, mute
Repeat | again, repeat: re-fires the last safe command
Overlays | show commands, hide commands, show numbers
System | screenshot, lock screen, dark mode, restart samsara
Accessibility | start narrator, bigger cursor, bigger text, high contrast

Plugins

Plugins extend Samsara with new voice commands written in Python. Every plugin is a single .py file in plugins/commands/.

How Plugins Work

On startup, Samsara scans plugins/commands/ and loads every .py file. Each file can register voice commands using the @command decorator. Plugins have full access to the app instance, so they can control audio, UI, system calls, and hardware.
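The following minimal sketch shows the scan-and-load step using the standard library's importlib. Executing each module is what runs its @command decorators. This is an illustration of the general technique, not Samsara's loader. The module naming and the skip-on-error policy are assumptions.

```python
import importlib.util
from pathlib import Path


def load_plugins(directory="plugins/commands"):
    """Import every .py file in directory; return loaded modules keyed by file stem."""
    modules = {}
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(f"plugins_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)  # runs the file, so its decorators register here
        except Exception as exc:
            print(f"skipping plugin {path.name}: {exc}")
            continue
        modules[path.stem] = module
    return modules
```

Isolating each import in a try block means one broken plugin does not stop the rest from loading.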

Creating a Plugin

Create a new file in plugins/commands/, for example my_plugin.py:

from samsara.plugin_commands import command

@command("say hello", aliases=["greet me"], pack="core")
def say_hello(app, remainder="", **kwargs):
    """Says hello via TTS."""
    if hasattr(app, 'audio_coordinator') and app.audio_coordinator:
        app.audio_coordinator.speak("Hello there!", category="agent_response")
    else:
        print("Hello there!")

The @command Decorator

Parameter | What it does
First argument (string) | The primary voice command phrase, e.g. "say hello"
aliases | Alternative phrases that trigger the same command.
pack | Which command pack this belongs to. Must match a key in command_packs config.
ai_visible | True (default) or False. When False, Ava won't see or suggest this command.
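To show what a decorator with these parameters typically does, here is a self-contained stand-in that registers phrases in a dictionary. It then dispatches an utterance to the longest matching phrase and passes the rest as `remainder`. It is hypothetical and is not the real `samsara.plugin_commands.command`. The registry layout and the `dispatch` and `ai_visible_phrases` helpers are assumptions.

```python
COMMANDS = {}


def command(phrase, aliases=(), pack="core", ai_visible=True):
    """Hypothetical stand-in for samsara.plugin_commands.command."""
    def decorator(func):
        entry = {"func": func, "pack": pack, "ai_visible": ai_visible}
        for p in (phrase, *aliases):
            COMMANDS[p.lower()] = entry
        return func
    return decorator


def dispatch(app, utterance, enabled_packs):
    """Run the longest registered phrase that starts the utterance; the rest becomes remainder."""
    text = " ".join(utterance.lower().split())
    for p in sorted(COMMANDS, key=len, reverse=True):
        entry = COMMANDS[p]
        if entry["pack"] not in enabled_packs:
            continue  # disabled packs behave as if never loaded
        if text == p or text.startswith(p + " "):
            return entry["func"](app, remainder=text[len(p):].strip())
    return None


def ai_visible_phrases():
    """Phrases Ava may see and suggest (ai_visible=True)."""
    return sorted(p for p, e in COMMANDS.items() if e["ai_visible"])
```

With the say_hello plugin above registered this way, `dispatch(app, "greet me", {"core"})` would call `say_hello(app, remainder="")`.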

Accessing the App

Every command function receives app as its first argument. Through it you can access:

Using ctypes for System Control

For Windows-level operations (mouse control, key injection, audio APIs), use ctypes directly rather than pyautogui. This avoids dependencies and gives you full Win32 access. See plugins/commands/volume.py and plugins/commands/text_marker.py for examples.

Built-in Plugins

Plugin | File | What it does
Volume | volume.py | Volume up/down/mute via Core Audio API (no media keys)
Text Marker | text_marker.py | Deferred anchor text selection: mark here → select to here
Scroll | scroll.py | 5-speed mouse wheel scrolling via SendInput
Stremio | stremio.py | Stremio media control via AutoHotkey
Ava / Ollama | ask_ollama.py | Voice AI assistant, command translation, scheduling
Music | music.py | Spotify playback control
Hyperion | hyperion.py | LED strip control via Hyperion JSON API
FlashForge | flashforge.py | 3D printer monitoring and control
Brain Dump | brain_dump.py | Capture voice thoughts to a markdown file
Window Manager | windows.py | Window positioning, saved layouts, lost window recovery

Features

Detailed guides for Samsara's key features.

Streaming Dictation

Streaming dictation must be enabled first. Right-click the Samsara tray icon and enable "Streaming Mode". The toggle is also available in Settings → Hotkeys & Modes.

Press CapsLock to start streaming. Text appears at your cursor as you speak, updating in real time. Press CapsLock again to stop.

Under the hood, Samsara uses a rolling Ctrl+Z approach: each update undoes the previous paste and replaces it with the expanded text. This works in any text field that supports undo.
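The rolling Ctrl+Z technique can be simulated without a real editor. In the sketch below, a fake text field stands in for the target application, and `undo()` stands in for sending Ctrl+Z. It assumes each paste is exactly one undo step. Editors that merge consecutive edits into a single undo step would break that assumption. All names are hypothetical.

```python
class FakeTextField:
    """Stand-in for an editor: each paste is one undo step."""

    def __init__(self, text=""):
        self.text = text
        self._undo_stack = []

    def paste(self, s):
        self._undo_stack.append(self.text)
        self.text += s          # simplification: the cursor is always at the end

    def undo(self):
        if self._undo_stack:
            self.text = self._undo_stack.pop()


class RollingStreamer:
    """Rolling Ctrl+Z: undo the previous partial paste, then paste the longer transcript."""

    def __init__(self, field):
        self.field = field
        self.current = None     # partial transcript currently on screen

    def update(self, partial):
        if partial == self.current:
            return              # nothing new; skip a needless undo/paste cycle
        if self.current is not None:
            self.field.undo()   # stands in for sending Ctrl+Z
        self.field.paste(partial)
        self.current = partial

    def finish(self):
        """Stop streaming; the last paste stays in place."""
        self.current = None
```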

Configuration

Text Selection Markers

Selecting large blocks of text by click-dragging is painful. Samsara replaces it with two voice commands:

  1. Click where you want the selection to start
  2. Say "Jarvis, mark here" — sets an invisible anchor at the cursor position
  3. Scroll freely (by voice or mouse) to where you want the selection to end
  4. Say "Jarvis, select to here" — everything between the anchor and the current position is highlighted

The anchor is tied to the window that was focused when you set it. If you switch to a different window, Samsara warns you and clears the anchor rather than selecting in the wrong app.
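Here is a hypothetical sketch of the window check. The anchor stores the focused window's identifier along with a position. "Select to here" refuses to act, and clears the anchor, if focus has moved to another window. Storing screen coordinates and keeping the anchor after a successful selection are assumptions. The real text_marker.py may do either differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Anchor:
    window_id: int               # e.g. the handle of the foreground window
    position: Tuple[int, int]    # hypothetical: screen coordinates of the caret or mouse


class TextMarker:
    def __init__(self):
        self.anchor: Optional[Anchor] = None

    def mark_here(self, window_id, position):
        self.anchor = Anchor(window_id, position)

    def select_to_here(self, window_id, position):
        """Return (anchor_position, current_position) to select between, or None."""
        if self.anchor is None:
            print("No mark set. Say 'mark here' first.")
            return None
        if window_id != self.anchor.window_id:
            # Never select in a different app than the one the anchor belongs to.
            print("Focused window changed since 'mark here'; clearing the anchor.")
            self.anchor = None
            return None
        return self.anchor.position, position
```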

Voice Scrolling

Five speed tiers, all using mouse wheel simulation so they work in every application:

Command | Speed
"scroll up a little" / "scroll down a little" | Slow (3 clicks)
"scroll up" / "scroll down" | Default (8 clicks)
"scroll up medium" / "scroll down medium" | Medium (15 clicks)
"scroll up high" / "scroll down high" | Medium-high (25 clicks)
"scroll up fast" / "scroll down fast" | Fast (40 clicks)

Scroll amounts are configurable in config.json under the scroll key. Values are in wheel "clicks" (each click = 120 wheel delta units).
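The click arithmetic works out as follows. A phrase is mapped to a signed wheel delta, where positive means scroll up, following the Windows mouse-wheel convention. The value could then be sent through SendInput with MOUSEEVENTF_WHEEL. The dictionary keys below are hypothetical and do not reflect the real layout of the scroll key in config.json. Whether scroll.py sends one large delta or one event per click is also not stated.

```python
WHEEL_DELTA = 120  # one wheel "click" on Windows

# Hypothetical mapping of speed words to click counts (defaults from the table above).
SCROLL_CLICKS = {"a little": 3, "": 8, "medium": 15, "high": 25, "fast": 40}


def wheel_delta(phrase, clicks_table=SCROLL_CLICKS):
    """Map e.g. "scroll down fast" to a signed wheel delta (positive = up)."""
    words = phrase.lower().split()
    if len(words) < 2 or words[0] != "scroll" or words[1] not in ("up", "down"):
        raise ValueError(f"not a scroll command: {phrase!r}")
    speed = " ".join(words[2:])
    if speed not in clicks_table:
        raise ValueError(f"unknown scroll speed: {speed!r}")
    direction = 1 if words[1] == "up" else -1
    return direction * clicks_table[speed] * WHEEL_DELTA
```

For example, "scroll down fast" gives -40 × 120 = -4800.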

Repeat / Again

Say "again" or "repeat" to re-fire the last command. Chain them: "scroll up a little, again, again, again" scrolls four times.

Destructive commands (close tab, delete file, etc.) are blacklisted from repeat to prevent accidental damage.
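The repeat rule can be sketched as a tracker that remembers the last command not on a blacklist. After a destructive command, "again" therefore re-fires the previous safe command. The blacklist contents and class name are hypothetical. Samsara's actual list of non-repeatable commands may differ.

```python
# Hypothetical blacklist; Samsara's actual non-repeatable commands may differ.
NO_REPEAT = {"close tab", "close window", "delete file", "lock screen"}
REPEAT_WORDS = {"again", "repeat"}


class RepeatTracker:
    def __init__(self, execute):
        self.execute = execute      # callable that runs a command phrase
        self.last_safe = None

    def handle(self, phrase):
        phrase = phrase.lower().strip()
        if phrase in REPEAT_WORDS:
            if self.last_safe is not None:
                self.execute(self.last_safe)   # re-fire the last safe command
            return
        self.execute(phrase)
        if phrase not in NO_REPEAT:
            self.last_safe = phrase
```

Splitting "scroll up a little, again, again, again" on commas and feeding each part to `handle` runs the scroll four times.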

Overlays

Command Cheat Sheet

Say "show commands" to open a floating overlay listing all active commands. It stays on top of other windows and has an adjustable opacity slider. Type in the search box to filter the list, or scroll to browse it.

Show Numbers

Say "show numbers" and every clickable element on screen gets a numbered label. Say the number to click that element. Fully hands-free UI navigation.

Listening Indicator

A small pill-shaped overlay appears when Samsara is actively listening. Position it anywhere on screen from Settings → Advanced.

Window Manager

Control window positioning by voice: