Draft
22 commits
8cdeb83
add initial mvp working integration
sumerc-psh Jun 16, 2026
0bb4058
add parakeet files
sumerc-psh Jun 16, 2026
987a968
add licenses
sumerc-psh Jun 16, 2026
25e61fe
download local models for dev + config checks
sumerc-psh Jun 16, 2026
44e890b
add local model + parakeet transcriber files
sumerc-psh Jun 16, 2026
546de16
add tray icons
sumerc-psh Jun 16, 2026
d67940c
fix: tray imgs+add rss metric to diag log+doctor
sumerc Jun 16, 2026
51d26b8
fix: rss metrics
sumerc Jun 16, 2026
8625dca
update mudler-parakeet + recompile parakeet each time
sumerc Jun 16, 2026
2f8eab7
ignore: dirty in submodules (we have self-modifying code in ggml)
sumerc Jun 16, 2026
aabfbdc
add support for save last audio for local models
sumerc Jun 19, 2026
e088e2a
fix: multilang model only auto-detect
sumerc Jun 22, 2026
fc8ba93
add integration tests for local models
sumerc Jun 22, 2026
61d1415
fix: tray lang/model switch issues
sumerc Jun 22, 2026
1aab06a
debug CI local model inference times
sumerc Jun 22, 2026
c7eb46c
batch transcribe for faster CI
sumerc Jun 22, 2026
ea31331
shorten CI test audio fixtures to ~1.5s
sumerc Jun 22, 2026
7bbec28
block re-record during inference + blue icon and denied beep
sumerc Jun 23, 2026
489a517
tweak: sky-blue transcribing icon
sumerc Jun 23, 2026
7a90ddb
fix rare segv on concurrent malgo ctx
sumerc Jun 24, 2026
49b05df
fix: segv (double free after sleep)
sumerc Jun 25, 2026
0730014
fix: add missing malgolock package; finalize double-free fix
sumerc Jun 25, 2026
14 changes: 14 additions & 0 deletions .github/workflows/ci.yml
@@ -27,11 +27,25 @@ jobs:
    needs: test
    steps:
      - uses: actions/checkout@v5
        with:
          submodules: recursive

      - uses: actions/setup-go@v6
        with:
          go-version: '1.24'

      # Cache the on-device Parakeet ggufs (~940MB) so the local-model tests
      # run without re-downloading every time. Bump the key when the models
      # release changes (localmodel.Version).
      - name: Cache local models
        uses: actions/cache@v4
        with:
          path: models/parakeet/v1
          key: parakeet-models-v1

      - name: Download local models
        run: make download-models

      - name: Integration tests
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
13 changes: 9 additions & 4 deletions .github/workflows/release.yml
@@ -23,6 +23,8 @@ jobs:
runs-on: macos-15
steps:
- uses: actions/checkout@v5
with:
submodules: recursive

- uses: actions/setup-go@v6
with:
@@ -36,6 +38,8 @@ jobs:
runs-on: macos-15
steps:
- uses: actions/checkout@v5
with:
submodules: recursive

- uses: actions/setup-go@v6
with:
@@ -54,6 +58,7 @@ jobs:
- uses: actions/checkout@v5
with:
fetch-depth: 0
submodules: recursive

- uses: actions/setup-go@v6
with:
@@ -72,9 +77,9 @@ jobs:
VERSION: ${{ github.ref_name }}
GITHUB_TOKEN: ${{ github.token }}
run: |
universal=$(find dist -path "*universal*" -name "zee" -type f | head -1)
test -n "$universal" || { echo "universal binary not found"; find dist -type f; exit 1; }
chmod +x "$universal"
packaging/mkdmg.sh "$universal" "$VERSION" "Zee-${VERSION}.dmg"
bin=$(find dist -name "zee" -type f | head -1)
test -n "$bin" || { echo "arm64 binary not found"; find dist -type f; exit 1; }
chmod +x "$bin"
packaging/mkdmg.sh "$bin" "$VERSION" "Zee-${VERSION}.dmg"
shasum -a 256 "Zee-${VERSION}.dmg" >> dist/checksums.txt
gh release upload "$VERSION" "Zee-${VERSION}.dmg" dist/checksums.txt --clobber
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,6 +5,9 @@ zee-gui
Zee-*.dmg
packaging/Zee.icns

# Local STT models (downloaded / dev-placed ggufs — never committed)
/models/

# Environment
.env

4 changes: 4 additions & 0 deletions .gitmodules
@@ -0,0 +1,4 @@
[submodule "parakeet.cpp"]
	path = third_party/parakeet.cpp
	url = https://github.com/mudler/parakeet.cpp
	ignore = dirty
18 changes: 10 additions & 8 deletions .goreleaser.yml
@@ -2,26 +2,28 @@ version: 2

before:
hooks:
# Init the parakeet.cpp submodule and build its static archives before the
# arm64 cgo build links them (darwin/arm64 only; no-op elsewhere).
- git submodule update --init --recursive
- make parakeet-lib
- packaging/mkicns.sh packaging/appicon.png

builds:
- id: zee
env:
- CGO_ENABLED=1
# Stamp the macOS 11.0 deploy target so the binary runs on every supported
# Mac (matches the -mcpu=apple-m1 / deploy target the archives were built with).
- CGO_CFLAGS=-mmacosx-version-min=11.0
- CGO_LDFLAGS=-mmacosx-version-min=11.0
- MACOSX_DEPLOYMENT_TARGET=11.0
goos:
- darwin
goarch:
- amd64
- arm64
- arm64 # Apple Silicon only — local STT (parakeet.cpp) is arm64-only
ldflags:
- -s -w -X main.version={{ .Version }}

universal_binaries:
- id: zee-universal
ids:
- zee
replace: false

archives:
- builds:
- zee
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,22 @@
# Changelog

## Unreleased

- Add offline, on-device transcription via Parakeet (parakeet.cpp, CPU) on Apple Silicon
- Works out of the box with no API key on Apple Silicon (falls back to the local 110M model)
- Add local model picker in the tray: 110M English (default), 0.6B v3 multilingual, 0.6B v2 English (opt-in download)
- Download missing local models on demand from the tray with progress
- `-transcribe` supports local WAV (16 kHz mono) transcription without a network call, and accepts multiple files in one invocation (model loaded once; one transcript printed per line)
- Block starting a new recording while the previous transcription is still in progress (the recording guard now spans inference, not just capture); show a blue status-dot tray icon during transcription and play a short "denied" beep if the hotkey is pressed during that time
- `-doctor` reports local model status (present, path, size, decoder)
- `-doctor` transcription test uses the app's default engine (local Parakeet, else first cloud key) instead of prompting for a provider + API key
- Idle tray icon adapts to the menubar appearance (template tinting) — renders white on dark/transparent menubars instead of black
- Diagnostics log per-transcription process RSS (`rss_mb`, from gopsutil — includes cgo/mmap model memory) for both batch and stream sessions
- "Save Last Recording" now works for the local (Parakeet) model — captured PCM is saved as WAV (was cloud-only before)
- Add `-provider` and `-model` flags to override the saved provider/model from the CLI (an unavailable explicit `-provider` is now a hard error)
- Fix a crash (SIGSEGV/SIGABRT double-free in `ma_device_uninit`) when recording after a sleep/wake or audio-device change: the per-call device reinit left the device pointer dangling when reinit failed, so the next call uninited the already-freed device. Null the device after uninit in both capture and beep playback. Also serialize all miniaudio device lifecycle calls behind a process-wide lock (`internal/malgolock`) as defense against concurrent capture/playback init/uninit
- Fix the tray language menu to always reflect the active model's languages — both at startup and on model switch. English-only Parakeet models no longer offer Auto-detect, and switching providers (e.g. to Groq) now updates the list

## v0.3.8

- Update dialog points to install instructions
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Sumer Cip

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
57 changes: 47 additions & 10 deletions Makefile
@@ -1,34 +1,71 @@
.PHONY: build build-linux-amd64 build-linux-arm64 test test-integration benchmark integration-test clean bump-version release icns app
.PHONY: build build-linux-amd64 build-linux-arm64 test test-integration benchmark integration-test clean bump-version release icns app parakeet-lib download-models

VERSION ?= $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")

build:
go build -ldflags="-X main.version=$(VERSION)" -o zee
# Local STT (Parakeet) is a darwin/arm64-only cgo feature. On that host we build
# the static parakeet.cpp + ggml archives first and stamp the macOS deploy
# target; everywhere else the no-cgo stub is compiled and these are no-ops.
MACOS_MIN := 11.0
PARAKEET_DIR := third_party/parakeet.cpp
PARAKEET_LIB := $(PARAKEET_DIR)/build-release/libparakeet.a
HOST := $(shell go env GOOS)/$(shell go env GOARCH)
ifeq ($(HOST),darwin/arm64)
CGO_ENV := MACOSX_DEPLOYMENT_TARGET=$(MACOS_MIN) CGO_CFLAGS=-mmacosx-version-min=$(MACOS_MIN) CGO_LDFLAGS=-mmacosx-version-min=$(MACOS_MIN)
endif

build: parakeet-lib download-models
$(CGO_ENV) go build -ldflags="-X main.version=$(VERSION)" -o zee

# Fetch the mandatory (PreFetch) local models into the dev folder from the
# pinned models-<Version> GitHub release. Reuses the localmodel registry +
# downloader (single source of truth) and is a per-file no-op when present.
download-models:
go run ./cmd/modeldl

# Configure once (submodule init + cmake, which auto-applies the in-tree ggml
# patches), then always `cmake --build` so source changes recompile incrementally
# and relink — a no-op when nothing changed. After a submodule bump, delete
# build-release to force a reconfigure (re-applies the patch to the new ggml).
parakeet-lib:
@if [ "$(HOST)" != "darwin/arm64" ]; then exit 0; fi; \
if [ ! -f $(PARAKEET_DIR)/CMakeLists.txt ]; then \
echo "==> initializing parakeet.cpp submodule (first checkout)"; \
git submodule update --init --recursive $(PARAKEET_DIR); \
fi; \
if [ ! -d $(PARAKEET_DIR)/build-release ]; then \
echo "==> configuring parakeet.cpp (one-time)"; \
cmake -S $(PARAKEET_DIR) -B $(PARAKEET_DIR)/build-release \
-DBUILD_SHARED_LIBS=OFF -DPARAKEET_SHARED=OFF -DPARAKEET_BUILD_CLI=OFF \
-DPARAKEET_GGML_METAL=OFF -DGGML_NATIVE=OFF \
-DCMAKE_OSX_DEPLOYMENT_TARGET=$(MACOS_MIN) \
-DCMAKE_C_FLAGS="-mcpu=apple-m1" -DCMAKE_CXX_FLAGS="-mcpu=apple-m1"; \
fi && \
cmake --build $(PARAKEET_DIR)/build-release -j

build-linux-amd64:
GOOS=linux GOARCH=amd64 go build -ldflags="-X main.version=$(VERSION) -s -w" -o zee-linux-amd64

build-linux-arm64:
GOOS=linux GOARCH=arm64 go build -ldflags="-X main.version=$(VERSION) -s -w" -o zee-linux-arm64

test:
go test -race -v ./...
test: parakeet-lib
$(CGO_ENV) go test -race -v ./...

integration-test:
integration-test: parakeet-lib
@test -n "$(WAV)" || (echo "Usage: make integration-test WAV=file.wav" && exit 1)
@if [ -f .env ]; then export $$(grep -v '^#' .env | xargs); fi; \
test -n "$$GROQ_API_KEY" || (echo "Error: GROQ_API_KEY not set (create .env or export it)" && exit 1); \
go run test/integration_test.go $(WAV)
$(CGO_ENV) go run test/integration_test.go $(WAV)

benchmark: build
@test -n "$(WAV)" || (echo "Usage: make benchmark WAV=file.wav [RUNS=5]" && exit 1)
@if [ -f .env ]; then export $$(grep -v '^#' .env | xargs); fi; \
./zee -benchmark $(WAV) -runs $(or $(RUNS),3)

test-integration:
test-integration: parakeet-lib
@tmp=$$(mktemp -d) && \
go build -o "$$tmp/zee-test-bin" . && \
ZEE_TEST_BIN="$$tmp/zee-test-bin" go test -race -tags integration -v -timeout 120s -count=1 ./test/ ; \
$(CGO_ENV) go build -o "$$tmp/zee-test-bin" . && \
ZEE_TEST_BIN="$$tmp/zee-test-bin" $(CGO_ENV) go test -race -tags integration -v -timeout 600s -count=1 ./test/ ; \
status=$$? ; rm -rf "$$tmp" ; exit $$status

icns:
20 changes: 12 additions & 8 deletions README.md
@@ -15,6 +15,7 @@

## Highlights

- **Offline, on-device** — on Apple Silicon, transcribes fully locally via Parakeet (parakeet.cpp, CPU) with **no API key and no network**. Cloud providers are optional and switchable from the tray.
- **System tray app** — lives in the menu bar. Switch microphones, transcription providers, and languages from the tray menu. Dynamic icons show recording and warning states.
- **Two recording modes** — push-to-talk (hold hotkey) or tap-to-toggle (tap to start/stop).
- **Real-time streaming** — when a streaming-capable model is selected (e.g. Deepgram Nova-3), words appear as you speak and auto-paste into the focused window incrementally.
@@ -25,7 +26,7 @@
- **Multiple providers** — Groq, OpenAI, Mistral, ElevenLabs, and Deepgram, switchable from the tray menu at runtime.
- **36 languages** — select transcription language from the tray menu or via `-lang` flag.
- **Cross-platform** — minimal dependencies, pure Go where possible.
- [x] macOS
- [x] macOS (Apple Silicon)
- [ ] Linux
- [ ] Windows

@@ -50,15 +51,13 @@ Downloads the latest DMG, verifies its SHA256 against `checksums.txt`, copies `Z
For terminal usage:

```bash
# Apple Silicon
# Apple Silicon (the only supported target)
curl -L https://github.com/sumerc/zee/releases/latest/download/zee_darwin_arm64.tar.gz | tar xz

# Intel
curl -L https://github.com/sumerc/zee/releases/latest/download/zee_darwin_amd64.tar.gz | tar xz
```

```bash
GROQ_API_KEY=xxx ./zee # Groq Whisper
./zee # offline, on-device (no key needed)
GROQ_API_KEY=xxx ./zee # Groq Whisper (cloud)
DEEPGRAM_API_KEY=xxx ./zee # Deepgram (streaming auto-enabled when a streaming model is selected from the tray)
./zee -debug-transcribe # include transcription text logs
```
@@ -67,15 +66,20 @@ DEEPGRAM_API_KEY=xxx ./zee # Deepgram (streaming auto-enabled when a st

### Build from source

Requires **Apple Silicon**, plus `cmake` and the Xcode Command Line Tools (for the one-time on-device STT engine build).

```bash
git clone https://github.com/sumerc/zee && cd zee
make build # CLI binary
make build # builds the local STT engine (cmake) + CLI binary;
# first run also fetches the default models (~900 MB) into models/parakeet/v1/
make app # macOS DMG
```

The submodule, static libraries, and models are all set up automatically by `make build` — no manual steps.

## Usage

Set at least one API key, then run zee:
On Apple Silicon, zee works offline out of the box — no key required. To use a cloud provider instead, set its key (pick the provider from the tray), then run zee:

```bash
export GROQ_API_KEY=your_key # batch mode (Groq Whisper)
88 changes: 88 additions & 0 deletions THIRD_PARTY_LICENSES
@@ -0,0 +1,88 @@
Third-Party Licenses and Attribution
=====================================

Zee statically links the components below and bundles/downloads the local
speech-to-text models below. Their licenses and the required attributions are
reproduced here.


--------------------------------------------------------------------------------
1. parakeet.cpp — local speech-to-text engine (statically linked)
https://github.com/mudler/parakeet.cpp
License: MIT
--------------------------------------------------------------------------------

MIT License

Copyright (c) 2026 the parakeet.cpp authors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


--------------------------------------------------------------------------------
2. ggml — tensor library (statically linked via parakeet.cpp)
https://github.com/ggml-org/ggml
License: MIT
--------------------------------------------------------------------------------

MIT License

Copyright (c) 2023-2026 The ggml authors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


--------------------------------------------------------------------------------
3. Parakeet speech-to-text models (GGUF)
License: Creative Commons Attribution 4.0 International (CC-BY-4.0)
https://creativecommons.org/licenses/by/4.0/
--------------------------------------------------------------------------------

The local transcription models that Zee bundles and/or downloads are GGUF
conversions of the NVIDIA NeMo Parakeet checkpoints:

- nvidia/parakeet-tdt_ctc-110m (110M, English)
- nvidia/parakeet-tdt-0.6b-v2 (0.6B, English)
- nvidia/parakeet-tdt-0.6b-v3 (0.6B, multilingual)

Source: NVIDIA NeMo — https://github.com/NVIDIA/NeMo
License: CC-BY-4.0 — https://creativecommons.org/licenses/by/4.0/

Changes made: the original NeMo checkpoints were converted to the GGUF format
for use with parakeet.cpp; the v3 model is additionally quantized (q4_k). No
other modifications were made to the model weights.

"Parakeet", "NeMo", and "NVIDIA" are trademarks of NVIDIA Corporation. Zee is
not affiliated with, sponsored by, or endorsed by NVIDIA.