kitsu is a self-hosted chat application built on top of llama.cpp. It provides a web frontend and a backend API that manages llama.cpp processes, handles authentication, and runs tool calls.
Features:
- Model management (sleep mode and wake-on-keystroke): define multiple models in config.json; kitsu serves them on demand and swaps between them automatically. Idle models are shut down after a configurable timeout to save system resources. When a user starts typing on the frontend, the backend begins waking up the selected model.
- Web search and page loading: uses Brave search + Playwright to give models access to the web
- File uploads: supports PDF, JSON, text, and image uploads in chat
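The sleep and wake behavior in the first feature can be sketched roughly as follows. This is a minimal illustration, not kitsu's actual implementation: `IdleModelTracker`, its methods, the `Hooks` type, and the injected clock are all hypothetical names. Only the idea of a configurable idle timeout and waking on a keystroke comes from this README.

```typescript
// Hypothetical sketch: tracks which model is awake and when it was last used.
// Launching and stopping the llama.cpp process is left to caller-supplied hooks.
type Hooks = {
  start: (alias: string) => void; // assumed hook: launch the model
  stop: (alias: string) => void; // assumed hook: shut the model down
};

class IdleModelTracker {
  private active: string | null = null;
  private lastUsedMs = 0;
  private readonly sleepAfterSeconds: number;
  private readonly hooks: Hooks;
  private readonly now: () => number;

  constructor(sleepAfterSeconds: number, hooks: Hooks, now: () => number = Date.now) {
    this.sleepAfterSeconds = sleepAfterSeconds;
    this.hooks = hooks;
    this.now = now;
  }

  // Call on a keystroke or chat request: wakes the model, swapping if needed.
  touch(alias: string): void {
    if (this.active !== alias) {
      if (this.active !== null) this.hooks.stop(this.active);
      this.hooks.start(alias);
      this.active = alias;
    }
    this.lastUsedMs = this.now();
  }

  // Call periodically (for example from setInterval): stops the active model
  // once it has been idle for at least the configured timeout.
  sweep(): void {
    if (this.active === null) return;
    const idleMs = this.now() - this.lastUsedMs;
    if (idleMs >= this.sleepAfterSeconds * 1000) {
      this.hooks.stop(this.active);
      this.active = null;
    }
  }

  // Alias of the model that is currently awake, or null if none is.
  activeModel(): string | null {
    return this.active;
  }
}
```

In this sketch the frontend would trigger `touch(alias)` as soon as the user starts typing, so the model is already loading by the time the message is sent. Injecting the clock keeps the timeout logic easy to test without real timers.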
This is a hobby project; use at your own risk. The codebase makes many assumptions specific to my personal setup:
- OS: Ubuntu Server 24
- GPU: NVIDIA GeForce RTX 5090
- Preferred models: Qwen3.5-27B, Qwen3.5-35B-A3B
git clone --recurse-submodules https://github.com/wadealexc/kitsu
cd kitsu
# Install deps
npm run install:all
# Build llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
cd ..
See llama.cpp build docs for other configurations.
cp config.example.json config.json
Edit config.json. At minimum, set your model paths and ports. See Configuration below.
TODO: add a script to automate creation of config.json from config.example.json
Place GGUF files in the models/ directory. Subdirectories are supported:
models/
  my-model/
    model-q6.gguf
    mmproj.gguf  # optional, for vision models
Then reference them in config.json under models.models.
TODO: add a script that auto-updates config.json as models are added to models/
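As a starting point for that script, the mapping from GGUF files to config entries might look like the sketch below. Everything here is an assumption for illustration: the `ModelEntry` type and `toModelEntries` function are hypothetical, the entry shape simply mirrors the example in config.json (where the gguf value omits the .gguf extension), and treating files that start with "mmproj" as projector files rather than models is a guess based on the directory layout above.

```typescript
// Hypothetical entry shape, mirroring the "models.models" example in config.json.
type ModelEntry = { gguf: string; alias: string; args: string[] };

// Turns GGUF paths (relative to models/) into config entries.
// Assumptions: the "gguf" field omits the ".gguf" extension, files whose name
// starts with "mmproj" are vision projectors rather than models, and the alias
// is the containing directory (or the file stem for top-level files).
function toModelEntries(relativePaths: string[]): ModelEntry[] {
  const entries: ModelEntry[] = [];
  for (const path of relativePaths) {
    if (!path.endsWith(".gguf")) continue; // ignore non-GGUF files
    const parts = path.split("/");
    const fileName = parts[parts.length - 1] ?? "";
    if (fileName.startsWith("mmproj")) continue; // projector, not a model
    const stem = path.slice(0, -".gguf".length);
    const alias = parts.length > 1 ? parts[parts.length - 2] ?? stem : stem;
    entries.push({ gguf: stem, alias, args: [] });
  }
  return entries;
}
```

Two models in the same subdirectory would get the same alias under this scheme, so a real script would need a rule for disambiguating them. Listing the files themselves (for example with `fs.readdirSync` and its `recursive` option) is left out to keep the mapping a pure function.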
npm run dev           # backend
npm run dev:frontend  # frontend
Build the backend and install the systemd service:
npm run build
./install-service.sh
sudo systemctl enable --now kitsu
Start the frontend:
docker compose up -d
The frontend is available at http://localhost:5050.
Useful service commands:
systemctl status kitsu
journalctl -u kitsu -f -o cat
sudo systemctl stop kitsu
sudo systemctl restart kitsu
kitsu is configured via config.json at the repo root. Copy config.example.json as a starting point.
models - define models and the models directory:
"models": {
  "path": "./models",
  "onStart": "my-model",
  "models": [
    {
      "gguf": "my-model/model-q6",
      "alias": "my-model",
      "args": ["--ctx-size", "32768"]
    }
  ]
}
ports - configure host/port for llama-server and the backend:
"ports": {
  "llamaCpp": { "port": 8070, "host": "0.0.0.0" },
  "backend": { "port": 8071, "host": "0.0.0.0" }
}
llamaCpp - sleep timeout:
"llamaCpp": {
  "sleepAfterXSeconds": 600
}
web - enable web search and page loading (requires a Brave API key):
"web": {
  "enable": true,
  "braveAPIKey": "YOUR_API_KEY",
  "blacklistHosts": ["grokipedia.com"]
}
blacklistHosts is an optional list of hostnames to exclude from web search results. Blacklisted hosts are never fetched. Defaults to [].
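To illustrate how a host blacklist like this can be applied, here is a minimal sketch. It is not taken from kitsu's code: the `SearchResult` type and both function names are hypothetical, and matching subdomains of a blacklisted host is an assumption that the README does not confirm.

```typescript
// Hypothetical result shape; a real search response carries more fields.
type SearchResult = { title: string; url: string };

// Returns true when the URL's hostname matches a blacklisted host.
// Assumption: subdomains of a blacklisted host are excluded as well.
function isBlacklisted(url: string, blacklistHosts: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return true; // unparseable URLs are dropped rather than fetched
  }
  return blacklistHosts.some((host) => {
    const h = host.toLowerCase();
    return hostname === h || hostname.endsWith("." + h);
  });
}

// Drops blacklisted results; the default of [] mirrors the config default.
function filterResults(results: SearchResult[], blacklistHosts: string[] = []): SearchResult[] {
  return results.filter((r) => !isBlacklisted(r.url, blacklistHosts));
}
```

Comparing parsed hostnames rather than searching the raw URL string avoids false positives such as a blacklisted name appearing in a path or query string, or as the tail of an unrelated domain.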
After setting enable: true, install the Firefox browser for Playwright. This is a one-time setup; the command below is run from the repo root and changes into the backend/ directory first:
cd backend && npx playwright install firefox