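About Kitten TTS: Kitten TTS is an open-source (Apache 2.0) text-to-speech library from KittenML. It is ONNX-based and CPU-first, with models ranging from 15M to 80M parameters. This guide targets v0.8.1.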

Prerequisites

  • Ubuntu 20.04+, Debian 11+, or Fedora 36+
  • Python 3.8+
  • pip and venv
  • ~100 MB disk space
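
The Python version and disk-space requirements can be checked from Python itself. The following is a minimal sketch: the function name check_prerequisites is illustrative and not part of Kitten TTS, and the thresholds are simply copied from the list above.

```python
import shutil
import sys


def check_prerequisites(path=".", min_python=(3, 8), min_free_mb=100):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    # Free space on the filesystem that holds `path`, in whole megabytes.
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(
            f"Only {free_mb} MB free at {path!r}, need about {min_free_mb} MB"
        )
    return problems


for problem in check_prerequisites():
    print("WARNING:", problem)
```

If nothing is printed, both checks passed for the current directory.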

Step 1: System Dependencies

# Ubuntu / Debian
sudo apt update
sudo apt install python3 python3-pip python3-venv libsndfile1-dev

# Fedora
sudo dnf install python3 python3-pip python3-virtualenv libsndfile-devel

Step 2: Virtual Environment

mkdir kitten-tts-project
cd kitten-tts-project
python3 -m venv venv
source venv/bin/activate

Step 3: Install Kitten TTS v0.8.1

pip install --upgrade pip
pip install soundfile
pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl

Step 4: Verify

python3 -c "from kittentts import KittenTTS; print('Ready!')"
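
If the one-liner fails, it helps to know whether a package is missing or the virtual environment is simply not active. The sketch below uses only the standard library; the helper names missing_modules and in_virtualenv are illustrative.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def in_virtualenv():
    """True when the interpreter runs inside a venv (python3 -m venv)."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("missing modules:", missing_modules(["kittentts", "soundfile"]) or "none")
```

This only confirms that the packages can be located. A located package can still fail to import, for example soundfile when the system libsndfile library is absent, so the import test in Step 4 remains the real check.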

Step 5: First Generation

from kittentts import KittenTTS
import soundfile as sf

model = KittenTTS("KittenML/kitten-tts-mini-0.8")
audio = model.generate("Hello from Linux!", voice="Bruno")
sf.write("output.wav", audio, 24000)
print("Saved output.wav")
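
On a machine without speakers, one way to confirm the result is to read the file back and look at its duration. This is a sketch using Python's standard wave module. It assumes output.wav was written as integer PCM, which to my knowledge is soundfile's default for WAV files; the wave module cannot read floating-point WAVs. The helper name wav_info is illustrative.

```python
import wave


def wav_info(path):
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            # Frames divided by frames-per-second gives the length in seconds.
            "seconds": wf.getnframes() / rate,
        }
```

Calling wav_info("output.wav") after Step 5 should report a sample rate of 24000, matching the value passed to sf.write.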

Step 6: systemd Service

To keep TTS running around the clock, create a systemd service. Open a new unit file:

sudo nano /etc/systemd/system/kittentts.service

Paste the following, replacing youruser with your account name. tts_server.py stands for your own server script in the project directory.

[Unit]
Description=Kitten TTS Service
After=network.target

[Service]
Type=simple
User=youruser
WorkingDirectory=/home/youruser/kitten-tts-project
ExecStart=/home/youruser/kitten-tts-project/venv/bin/python3 tts_server.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target

Reload systemd, enable the service at boot, start it, and check its status:

sudo systemctl daemon-reload
sudo systemctl enable kittentts
sudo systemctl start kittentts
sudo systemctl status kittentts
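
The unit file runs tts_server.py, which the steps above do not define. What follows is one possible sketch of such a script, not an official Kitten TTS server. It assumes that model.generate returns a one-dimensional sequence of floats in the range [-1, 1] at 24000 Hz, as Step 5 suggests. The HTTP interface (POST plain text, receive a WAV), the port, and all helper names are hypothetical choices made for this example.

```python
import io
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer

SAMPLE_RATE = 24000  # same rate as in Step 5


def pcm16_wav_bytes(samples, rate=SAMPLE_RATE):
    """Encode floats in [-1, 1] as an in-memory mono 16-bit PCM WAV."""
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))
        pcm += int(clipped * 32767).to_bytes(2, "little", signed=True)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(bytes(pcm))
    return buf.getvalue()


def make_handler(synthesize):
    """Build a request handler that turns POSTed text into WAV audio."""

    class TTSHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            text = self.rfile.read(length).decode("utf-8").strip()
            if not text:
                self.send_error(400, "Request body must contain text")
                return
            body = pcm16_wav_bytes(synthesize(text))
            self.send_response(200)
            self.send_header("Content-Type", "audio/wav")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return TTSHandler


def build_server(synthesize, host="127.0.0.1", port=8000):
    # HTTPServer handles one request at a time, so the model is never
    # called from two threads at once.
    return HTTPServer((host, port), make_handler(synthesize))


def main():
    # Imported here so the helpers above can be loaded without kittentts.
    from kittentts import KittenTTS

    model = KittenTTS("KittenML/kitten-tts-mini-0.8")
    server = build_server(lambda text: model.generate(text, voice="Bruno"))
    server.serve_forever()

# When saving this as tts_server.py, end the file with a call to main().
```

With the service running, a request such as curl --data "Hello" http://127.0.0.1:8000/ -o hello.wav should return audio. The server binds to localhost only. Exposing it on a public interface would need authentication and request size limits, which this sketch omits.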

Model Variants for Linux

Model                      Params   Size    Best For
kitten-tts-mini-0.8        80M      80 MB   Production servers
kitten-tts-micro-0.8       40M      41 MB   VPS deployment
kitten-tts-nano-0.8        15M      56 MB   Raspberry Pi, edge
kitten-tts-nano-0.8-int8   15M      25 MB   Ultra-light VPS
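
The table can also be expressed as data, so a deployment script can choose a variant from a size budget. This is a sketch only: pick_model is a hypothetical helper, and treating "more parameters" as "preferred" is an assumption rather than a documented quality ranking. Step 5 loads the mini model with a KittenML/ prefix; that the other variants use the same prefix is also an assumption.

```python
# (model name, parameters in millions, size in MB), copied from the table.
MODELS = [
    ("kitten-tts-mini-0.8", 80, 80),
    ("kitten-tts-micro-0.8", 40, 41),
    ("kitten-tts-nano-0.8", 15, 56),
    ("kitten-tts-nano-0.8-int8", 15, 25),
]


def pick_model(max_size_mb):
    """Return the highest-parameter model within the size budget, or None."""
    fitting = [m for m in MODELS if m[2] <= max_size_mb]
    if not fitting:
        return None
    # Prefer more parameters; on a tie, prefer the smaller file.
    return max(fitting, key=lambda m: (m[1], -m[2]))[0]


print(pick_model(50))  # kitten-tts-micro-0.8
```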

Troubleshooting

libsndfile not found: Install libsndfile1-dev (Debian/Ubuntu) or libsndfile-devel (Fedora).
Headless server: On servers without audio hardware, save the result as a WAV file with soundfile.write() (sf.write in Step 5) instead of playing it back. No sound card is needed.