wyoming-chatterbox

wyoming protocol server for chatterbox tts with voice cloning.

clone any voice with a 10-30 second audio sample.

requirements

nvidia gpu with 4gb+ vram
cuda 12.x
python 3.10+

install

from source:

git clone https://github.com/sudoxreboot/wyoming-chatterbox
cd wyoming-chatterbox
pip install .

usage

wyoming-chatterbox --uri tcp://0.0.0.0:10201 --voice-ref /path/to/voice_sample.wav

options

option	default	description
`--uri`	required	server uri (e.g., `tcp://0.0.0.0:10201`)
`--voice-ref`	required	path to voice reference wav (10-30s of speech)
`--volume-boost`	3.0	output volume multiplier
`--device`	cuda	torch device (`cuda` or `cpu`)
`--debug`	false	enable debug logging

voice reference tips

for best results:

10-30 seconds of clean speech
no background music or noise
consistent speaking style
wav format (any sample rate)

systemd service

sudo tee /etc/systemd/system/wyoming-chatterbox.service << 'EOF'
[Unit]
Description=Wyoming Chatterbox TTS
After=network-online.target

[Service]
Type=simple
User=YOUR_USER
Environment=PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
ExecStart=/path/to/venv/bin/wyoming-chatterbox \
  --uri tcp://0.0.0.0:10201 \
  --voice-ref /path/to/voice_reference.wav \
  --volume-boost 3.0
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now wyoming-chatterbox

home assistant

settings → devices & services → add integration
search "wyoming protocol"
host: YOUR_IP, port: 10201
use in your voice assistant pipeline as tts

gpu memory

chatterbox uses ~3.5gb vram. if you get oom errors:

# check gpu usage
nvidia-smi

# kill zombie processes
pkill -f wyoming-chatterbox

license

mit

made by sudoxnym ⚡

2 KiB Raw Blame History