The setup script checks your hardware, installs dependencies, walks you through cloning a voice from YouTube, and starts the server.
git clone https://github.com/adrianwedd/afterwords.git
cd afterwords
bash setup.sh
Setup installs the afterwords command to your PATH and starts the server. If Claude Code is installed, it also wires a Stop hook so every response is spoken aloud. Without it, you get a standalone TTS API at localhost:7860.
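If you prefer to wire the hook yourself, Claude Code reads Stop hooks from its settings.json. A sketch of the shape involved; the script path below is a placeholder, not necessarily the file setup actually installs:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "/path/to/your/speak-response.sh" }
        ]
      }
    ]
  }
}
```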
Each sample says “You are absolutely right. Your Claude Code session could sound like me.” — generated locally on a 32 GB M1.
Add your own:
afterwords clone "https://youtube.com/watch?v=..." myvoice 30
Three flagship voices, each synthesized by both Qwen3-TTS sizes. Click a tab to hear the difference between the 0.6B (default, fastest) and 1.7B (higher quality, slower) variants on the same 15-second reference.
Chatterbox and VoxCPM are also loaded as backends but their voice-cloning fidelity in this integration is under verification — samples are tracked in #14 and will be added once we’re confident they’re cloning the reference rather than producing default voices.
Claude Code integration
Standalone API
Programmatic cloning (--allow-clone)
/voice handles input. This project handles output. Together: voice conversations.
The server ships four MLX backends: Qwen3-TTS (0.6B and 1.7B, 8-bit), Chatterbox (fp16, multilingual), and VoxCPM 1.5 (44.1 kHz). Zero-shot voice cloning — no training. A 15-second reference + transcript = cloned voice on every backend.
The server runs on localhost:7860. No authentication. Use it from curl, scripts, other editors, web apps — anything that speaks HTTP. Endpoints marked --allow-clone require launching the server with that flag.
Server status, loaded voices, and readiness.
curl localhost:7860/health | jq .
{
"status": "ok",
"model": "mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit",
"backend": "mlx",
"model_loaded": true,
"ready": true,
"voices": ["attenborough", "attenborough-chatterbox", "attenborough-qwen3-17b", "attenborough-voxcpm-15", "audrey", "..."],
"default_voice": "galadriel",
"loaded_backends": {
"qwen3-0.6b": {"loaded": true, "voice_count": 45, "sample_rate": 24000, "display_name": "Qwen3-TTS 0.6B", "supported_langs": ["en","zh","ja","ko","es","fr","de","it","pt","ru"]},
"qwen3-1.7b": {"loaded": true, "voice_count": 61, "sample_rate": 24000, "display_name": "Qwen3-TTS 1.7B", "supported_langs": ["en","zh","ja","ko","es","fr","de","it","pt","ru"]},
"chatterbox": {"loaded": true, "voice_count": 3, "sample_rate": 24000, "display_name": "Chatterbox (fp16, multilingual)", "supported_langs": ["en","es","fr","de","it","pt","zh","ja","ko"]},
"voxcpm-1.5": {"loaded": true, "voice_count": 3, "sample_rate": 44100, "display_name": "VoxCPM 1.5", "supported_langs": ["en","zh"]},
"voxtral": {"loaded": true, "voice_count": 0, "sample_rate": 24000, "display_name": "Voxtral 4B", "supported_langs": ["en","fr","de","es","it","pt","nl","ru","zh","ja","ko","ar","hi"]},
"soprotts": {"loaded": true, "voice_count": 0, "sample_rate": 24000, "display_name": "SoproTTS", "supported_langs": ["en"]}
}
}
Generate speech from text. Returns 16-bit PCM WAV audio.
# Synthesize and play
curl "localhost:7860/synthesize?text=Hello+world&voice=snape" -o out.wav
afplay out.wav
# Non-English (the voice's backend must support the lang, or its family must)
curl "localhost:7860/synthesize?text=Ni+hao&voice=galadriel&lang=zh" -o hi.wav
# Pipe directly to speaker (macOS)
curl -s "localhost:7860/synthesize?text=Testing" | afplay -
Response includes timing headers: X-Synthesis-Time, X-Duration, X-Sample-Rate, X-Backend (the actual backend that synthesized — may differ from the voice's pinned backend if family-routing kicked in).
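Those headers make throughput easy to measure. A small helper (the function name is ours) that turns X-Synthesis-Time and X-Duration into a real-time factor, where values below 1.0 mean faster than real time:

```shell
# rtf SYNTHESIS_SECONDS AUDIO_SECONDS -> seconds of compute per second of audio
rtf() { awk -v s="$1" -v d="$2" 'BEGIN { printf "%.2f\n", s / d }'; }

# Capture the headers, then feed the two values in:
#   curl -sD headers.txt "localhost:7860/synthesize?text=Testing" -o out.wav
rtf 2.0 4.0   # prints 0.50: generated twice as fast as real time
```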
Clone a new voice from a YouTube clip. Run afterwords reload to load it (no restart needed).
# Interactive (prompts for URL and name)
afterwords clone
# Non-interactive (URL, name, start-second)
afterwords clone "https://youtube.com/watch?v=..." mycustomvoice 30
# Fully automated (skip transcript confirmation)
afterwords clone "https://youtube.com/watch?v=..." mycustomvoice 30 --yes
Each voice is a 700 KB WAV + JSON profile in voices/. Adding voices costs zero extra memory.
JSON body version of /synthesize. Supports emotion-based palette lookup for session-cloned voices. Requires --allow-clone.
curl -X POST localhost:7860/synthesize \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "voice": "my-session", "emotion": "cheerful"}' \
-o out.wav
Create a voice profile from uploaded audio. Denoises, optionally transcribes via Whisper, and registers the voice for immediate use. Requires --allow-clone.
# Clone a single voice
curl -X POST localhost:7860/clone \
-F "audio=@sample.wav" \
-F "session_id=my-voice" \
-F "emotion=neutral"
# Build a palette with multiple emotions
curl -X POST localhost:7860/clone \
-F "audio=@cheerful.wav" -F "session_id=my-voice" -F "emotion=cheerful"
curl -X POST localhost:7860/clone \
-F "audio=@serious.wav" -F "session_id=my-voice" -F "emotion=serious"
Returns voice name, quality rating (rough/developing/good based on duration), session ID, and sequence number. Voices are available immediately — no restart needed.
Rescan voices/*.json and merge new or changed profiles into the live registry — no restart, no synthesis interruption. Add-only and atomic: if any profile fails to validate, no changes commit. Requires --allow-clone.
# After cloning a new voice or editing voices/*.json:
curl -X POST localhost:7860/reload | jq .
# Or via the CLI wrapper:
afterwords reload
Returns {"status":"ok","reloaded":[names...],"errors":[]} on success or {"status":"failed","errors":[{file, error}]} on atomic abort. Voices removed from disk are NOT dropped — use DELETE /session/{id} for that.
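Because a failed reload commits nothing, automation only needs to inspect the status field. A grep-only check against the response shapes above (the helper name is ours; no jq required):

```shell
# reload_ok JSON -> exit 0 if the reload committed, nonzero on atomic abort
reload_ok() { printf '%s' "$1" | grep -q '"status": *"ok"'; }

reload_ok "$(curl -s -X POST localhost:7860/reload)" || echo "reload aborted"
```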
Remove all voice palette entries and files for a session. Also cleans up any backend temp files (e.g. VoxCPM resampled refs). Requires --allow-clone.
curl -X DELETE localhost:7860/session/my-voice
Drop a .afterwords file in any project root. The hook reads it before each synthesis — no server restart.
echo "galadriel" > ~/work/frontend/.afterwords
echo "snape" > ~/work/backend/.afterwords
echo "loki" > ~/fun/side-project/.afterwords
If your project uses multiple agents, map each one to its own voice:
# .afterwords — agent-to-voice mapping
default: data
clara-oswald: clara-oswald
donna-noble: donna-noble
k9: k9
When Claude Code spawns a subagent, the hook reads its agent_type and looks up the matching voice. Falls back to default: if no match.
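The lookup the hook performs can be approximated in a few lines of shell (the function name is ours; the real hook's logic may differ in detail):

```shell
# voice_for_agent FILE AGENT_TYPE -> voice name, honoring the default: fallback
voice_for_agent() {
  v=$(sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1)
  [ -n "$v" ] && { echo "$v"; return; }
  d=$(sed -n 's/^default:[[:space:]]*//p' "$1" | head -n 1)
  [ -n "$d" ] && { echo "$d"; return; }
  head -n 1 "$1"   # plain single-voice .afterwords file
}

printf 'default: data\nk9: k9\n' > /tmp/afterwords-demo
voice_for_agent /tmp/afterwords-demo k9             # k9
voice_for_agent /tmp/afterwords-demo clara-oswald   # data
```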
Clone the same speaker multiple times with different emotional deliveries. The server groups them into a palette by session ID.
# Clone three emotions from the same speaker
curl -X POST localhost:7860/clone \
-F "audio=@neutral.wav" -F "session_id=narrator" -F "emotion=neutral"
curl -X POST localhost:7860/clone \
-F "audio=@cheerful.wav" -F "session_id=narrator" -F "emotion=cheerful"
curl -X POST localhost:7860/clone \
-F "audio=@serious.wav" -F "session_id=narrator" -F "emotion=serious"
# Synthesize with a specific emotion
curl -X POST localhost:7860/synthesize \
-H "Content-Type: application/json" \
-d '{"text": "Great news!", "voice": "narrator", "emotion": "cheerful"}' \
-o out.wav
Each palette entry gets a quality rating based on clip duration: rough (<5s), developing (5–15s), good (15s+). If no emotion match is found, the server falls back to the best-quality entry for that session. Clean up with DELETE /session/{id}.
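The thresholds are simple enough to mirror when pre-screening clips before upload (the function name is ours; it takes whole seconds):

```shell
# quality SECONDS -> the server's duration-based rating
quality() {
  if [ "$1" -lt 5 ]; then echo rough
  elif [ "$1" -lt 15 ]; then echo developing
  else echo good
  fi
}

quality 3    # rough
quality 10   # developing
quality 20   # good
```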
Requires --allow-clone. Palette voices are ephemeral — stored in memory and on disk only while the server runs. Use afterwords clone for permanent voices.
The afterwords CLI is added to your PATH during setup. It wraps launchd, health checks, and voice management into a single tool.
afterwords start # start the server (auto-starts on login)
afterwords stop # stop the server
afterwords restart # restart after adding voices
afterwords status # show health, model, loaded voices
afterwords logs # tail the server log
afterwords voices # list available voices
afterwords reload # pick up new voices without restart
afterwords clone # clone a new voice from YouTube
afterwords uninstall # remove service and optionally hooks
The server runs on localhost:7860 and auto-starts on login via macOS launchd. If you prefer to run it manually:
source .venv/bin/activate
python server.py # read-only (GET endpoints only)
python server.py --allow-clone # enables POST /clone, POST /synthesize, DELETE /session
The server binds to 127.0.0.1 (localhost only) for security.
The setup script installs everything else. Claude Code is optional — use bash setup.sh --server-only for the API without hooks. Either way, you get the afterwords CLI for managing the server.
Qwen3-TTS (Alibaba) · Chatterbox (mlx-community) · VoxCPM (mlx-community) · mlx-audio · MLX (Apple) · Claude Code (Anthropic)
Originally built for SPARK, a robot with an inner life. Full tutorial: Voice Cloning with Qwen3-TTS.