
wyoming-chatterbox

wyoming protocol server for chatterbox tts with voice cloning.

clone any voice with a 10-30 second audio sample. integrates directly with home assistant as a tts provider.

requirements

  • nvidia gpu with 4gb+ vram (3.5gb used at runtime)
  • cuda 12.x host driver (≥550.54.14)
  • nvidia container toolkit installed on host
  • docker + docker compose v2

1. configure

git clone https://github.com/sudoxreboot/wyoming-chatterbox
cd wyoming-chatterbox
cp .env.example .env

edit .env:

WYOMING_PORT=10800          # host port — change if 10800 is taken
VOICE_REF_DIR=/path/to/dir  # directory containing your reference wav
VOICE_REF_FILE=reference.wav
VOLUME_BOOST=3.0
TORCH_DEVICE=cuda
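
a quick sanity check before building can catch a bad port or a wrong reference path. the sketch below is not part of the project; it assumes the reference wav lives at VOICE_REF_DIR/VOICE_REF_FILE on the host, and the helper names `parse_env` and `check_env` are made up for illustration.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # drop a trailing inline comment such as "10800   # host port"
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """return a list of human-readable problems (empty list means ok)."""
    problems = []
    port = values.get("WYOMING_PORT", "")
    if not port.isdecimal() or not 1 <= int(port) <= 65535:
        problems.append(f"WYOMING_PORT is not a valid port: {port!r}")
    # assumption: the wav is expected at VOICE_REF_DIR/VOICE_REF_FILE on the host
    ref = Path(values.get("VOICE_REF_DIR", "")) / values.get("VOICE_REF_FILE", "")
    if not ref.is_file():
        problems.append(f"voice reference not found: {ref}")
    return problems
```

usage: `print(check_env(parse_env(Path(".env").read_text())))` from the repo directory; an empty list means both checks passed.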

2. build and run

docker compose build
docker compose up -d

first run downloads ~3.5gb of chatterbox model weights into a named docker volume (chatterbox-cache). the download only happens once; later runs reuse the volume.

3. check logs

docker compose logs -f
# you should see: "starting server at tcp://0.0.0.0:10800"
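
if the log line appears but clients still cannot connect, a plain tcp check from another machine can rule out firewall or port-mapping problems. this is a minimal sketch using only the python standard library; it confirms that something accepts connections on the port, not that it speaks the wyoming protocol. `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """return True if a tcp connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and unreachable hosts
        return False
```

usage: `port_is_open("your-server-ip", 10800)`, replacing 10800 with the WYOMING_PORT value from .env.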

voice reference tips

  • 10-30 seconds of clean speech
  • no background music or noise
  • consistent speaking style
  • wav format (any sample rate)
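
the length guideline above can be checked before handing a file to the server. a minimal sketch with python's standard `wave` module, which only reads uncompressed pcm wav files; the function names are made up for illustration.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """duration of a pcm wav file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_length_ok(path: str, min_s: float = 10.0, max_s: float = 30.0) -> bool:
    """True if the clip falls inside the 10-30 second guideline."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

usage: `reference_length_ok("/path/to/voice.wav")`. this checks length only; noise and speaking style still need a listen.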

install from source (no docker)

git clone https://github.com/sudoxreboot/wyoming-chatterbox
cd wyoming-chatterbox
python3 -m venv .venv
source .venv/bin/activate
pip install .
wyoming-chatterbox --uri tcp://0.0.0.0:10800 --voice-ref /path/to/voice.wav

options

option          default    description
--uri           required   server uri (e.g., tcp://0.0.0.0:10800)
--voice-ref     required   path to voice reference wav (10-30s of speech)
--volume-boost  3.0        output volume multiplier
--device        cuda       torch device (cuda or cpu)
--debug         false      enable debug logging
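
to get a feel for what a volume multiplier does, here is a sketch of one plausible reading: each sample is scaled by the gain and clipped to the valid range. this is an assumption for illustration, not the project's actual code, and `boost` is a hypothetical name.

```python
def boost(samples: list[float], gain: float = 3.0) -> list[float]:
    """scale float samples in [-1.0, 1.0] by gain, clipping at full scale."""
    # assumption: clipping is used so loud peaks flatten instead of wrapping around
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

under this reading, a gain of 3.0 pushes any sample above roughly a third of full scale into clipping, so a lower value may sound cleaner if the output is distorted.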

systemd service (source install)

sudo tee /etc/systemd/system/wyoming-chatterbox.service << EOF
[Unit]
Description=Wyoming Chatterbox TTS
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=$(whoami)
Environment=PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
ExecStart=$(pwd)/.venv/bin/wyoming-chatterbox \
  --uri tcp://0.0.0.0:10800 \
  --voice-ref /path/to/voice_reference.wav \
  --volume-boost 3.0
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now wyoming-chatterbox

home assistant

  1. settings → devices & services → add integration
  2. search for "wyoming protocol"
  3. host: your server ip, port: 10800 (or whatever you set in .env)
  4. select it as your tts provider in the voice assistant pipeline

gpu memory

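chatterbox uses ~3.5gb vram at runtime. if you get oom errors, check what else is holding gpu memory, then restart the server so it releases and reallocates its own:

# see per-process gpu memory usage
nvidia-smi

# docker: restart the container
docker compose restart

# source: stop the process (systemd starts it again if the service above is enabled)
pkill -f wyoming-chatterbox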
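
to check headroom from a script instead of reading the nvidia-smi table, its csv query mode can be parsed. a minimal sketch, assuming nvidia-smi is on PATH; the function names are made up for illustration, and values are in MiB.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """turn 'used, total' csv lines into (used, total) tuples, one per gpu."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> list[tuple[int, int]]:
    """ask nvidia-smi for used and total memory of every gpu."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

usage: `for used, total in query_gpu_memory(): print(total - used, "MiB free")`. if free memory is well under the ~3.5gb the model needs, something else on the gpu has to go first.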

license

mit


made by sudoxnym