mirror of https://github.com/sudoxnym/wyoming-chatterbox.git synced 2026-06-17 09:44:04 +00:00


sudoxreboot 6653cec9bc Update Dockerfile		2026-04-23 19:06:37 -05:00
wyoming_chatterbox	initial release - wyoming protocol server for chatterbox tts	2025-12-14 21:33:26 -06:00
.env.example	Rename 1.env.example to .env.example	2026-04-23 17:50:12 -05:00
.gitignore	initial release - wyoming protocol server for chatterbox tts	2025-12-14 21:33:26 -06:00
compose.yaml	Add files via upload	2026-04-23 17:46:01 -05:00
Dockerfile	Update Dockerfile	2026-04-23 19:06:37 -05:00
LICENSE	initial release - wyoming protocol server for chatterbox tts	2025-12-14 21:33:26 -06:00
pyproject.toml	initial release - wyoming protocol server for chatterbox tts	2025-12-14 21:33:26 -06:00
README.md	Add files via upload	2026-04-23 17:46:01 -05:00

README.md

wyoming-chatterbox

wyoming protocol server for chatterbox tts with voice cloning.

clone any voice with a 10-30 second audio sample. integrates directly with home assistant as a tts provider.

requirements

nvidia gpu with 4gb+ vram (3.5gb used at runtime)
cuda 12.x host driver (≥550.54.14)
nvidia container toolkit installed on host
docker + docker compose v2

docker (recommended)

1. configure

git clone https://github.com/sudoxreboot/wyoming-chatterbox
cd wyoming-chatterbox
cp .env.example .env

edit .env:

WYOMING_PORT=10800          # host port — change if 10800 is taken
VOICE_REF_DIR=/path/to/dir  # directory containing your reference wav
VOICE_REF_FILE=reference.wav
VOLUME_BOOST=3.0
TORCH_DEVICE=cuda
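
before building, it can help to confirm that the reference wav named in .env actually exists. the sketch below is not part of the project: `parse_env` and `reference_path` are hypothetical helpers, and the parser assumes plain `KEY=VALUE` lines with optional ` # comment` tails, as in the example above.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """parse simple KEY=VALUE lines, skipping blank lines and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # assumption: an unquoted value ends at the first " #" (inline comment)
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def reference_path(env: dict[str, str]) -> Path:
    """join VOICE_REF_DIR and VOICE_REF_FILE into the expected wav path."""
    return Path(env["VOICE_REF_DIR"]) / env["VOICE_REF_FILE"]
```

usage, from the repo directory: `reference_path(parse_env(Path(".env").read_text())).is_file()` should be true before you run `docker compose up`.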

2. build and run

docker compose build
docker compose up -d

first run downloads ~3.5gb of chatterbox model weights into a named docker volume (chatterbox-cache). later runs reuse the volume, so the download only happens once.

3. check logs

docker compose logs -f
# you should see: "starting server at tcp://0.0.0.0:10800"

voice reference tips

10-30 seconds of clean speech
no background music or noise
consistent speaking style
wav format (any sample rate)
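
the length guideline above can be checked before you point the server at a file. this is a minimal sketch using python's standard `wave` module, which only reads uncompressed pcm wav files; the function names are made up for illustration and are not part of wyoming-chatterbox.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """return the length of a pcm wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(seconds: float, low: float = 10.0, high: float = 30.0) -> bool:
    """true when the clip falls inside the 10-30 second range suggested above."""
    return low <= seconds <= high
```

usage: `is_good_reference_length(wav_duration_seconds("/path/to/voice.wav"))`. it only checks duration; noise and speaking style still need a listen.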

install from source (no docker)

git clone https://github.com/sudoxreboot/wyoming-chatterbox
cd wyoming-chatterbox
python3 -m venv .venv
source .venv/bin/activate
pip install .

wyoming-chatterbox --uri tcp://0.0.0.0:10800 --voice-ref /path/to/voice.wav

options

option	default	description
`--uri`	required	server uri (e.g., `tcp://0.0.0.0:10800`)
`--voice-ref`	required	path to voice reference wav (10-30s of speech)
`--volume-boost`	3.0	output volume multiplier
`--device`	cuda	torch device (`cuda` or `cpu`)
`--debug`	false	enable debug logging
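
the readme does not say how `--volume-boost` is applied internally. as an illustration only, the sketch below assumes the simplest reading of "output volume multiplier": each signed 16-bit pcm sample is multiplied by the gain and clipped to the int16 range. `boost_pcm16` is a hypothetical name, not the project's code.

```python
from array import array


def boost_pcm16(samples: bytes, gain: float) -> bytes:
    """multiply signed 16-bit samples by gain, clipping to the int16 range."""
    pcm = array("h")
    pcm.frombytes(samples)
    # clip so loud samples saturate instead of wrapping around
    boosted = array("h", (max(-32768, min(32767, int(s * gain))) for s in pcm))
    return boosted.tobytes()
```

under that assumption, a large multiplier pushes loud passages into clipping, so lowering the value is the first thing to try if the output sounds distorted.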

systemd service (source install)

sudo tee /etc/systemd/system/wyoming-chatterbox.service << EOF
[Unit]
Description=Wyoming Chatterbox TTS
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=$(whoami)
Environment=PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
ExecStart=$(pwd)/.venv/bin/wyoming-chatterbox \
  --uri tcp://0.0.0.0:10800 \
  --voice-ref /path/to/voice_reference.wav \
  --volume-boost 3.0
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now wyoming-chatterbox

home assistant

settings → devices & services → add integration
search wyoming protocol
host: your server ip, port: 10800 (or the WYOMING_PORT you set in .env)
select it as your tts provider in the voice assistant pipeline
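
if home assistant cannot find the server, a quick tcp check from another machine rules out firewall and port mapping problems. this is a rough sketch with python's standard `socket` module; `wyoming_port_open` is a hypothetical helper, and it only tests that the port accepts connections, not that the wyoming protocol is answering.

```python
import socket


def wyoming_port_open(host: str, port: int = 10800, timeout: float = 3.0) -> bool:
    """return True if a tcp connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and unreachable hosts
        return False
```

usage: `wyoming_port_open("your-server-ip", 10800)`, using the same host and port you enter in home assistant.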

gpu memory

chatterbox uses ~3.5gb vram at runtime. if you get oom errors, check what else is holding vram, then restart the server to release its memory:

nvidia-smi

# docker
docker compose restart

# source (if the systemd service is installed, it restarts the server after 5s)
pkill -f wyoming-chatterbox
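
to see whether enough vram is free before starting the server, the `nvidia-smi` query mode can be read from a script. the sketch below assumes `nvidia-smi` is on the path and reports plain integer MiB values; the helper names are hypothetical.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(output: str) -> list[tuple[int, int]]:
    """turn 'used, total' csv lines (MiB) into (used, total) pairs, one per gpu."""
    pairs = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def free_mib_per_gpu() -> list[int]:
    """run nvidia-smi and return free memory per gpu in MiB."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [total - used for used, total in parse_gpu_memory(out)]
```

with the ~3.5gb figure above, a gpu reporting less than roughly 3600 MiB free is a likely candidate for oom errors.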

license

mit

made by sudoxnym ⚡