Open Source · Apple Silicon · Zero Cloud

Give your code
a voice

Clone any voice from a 15-second YouTube clip. Run it locally on your Mac. Hear Claude Code speak every response — or use the API from anything.

16 voices included · ~6 GB peak memory · ~20s per sentence · 0 cloud calls
Five minutes to your first voice

The setup script checks your hardware, installs dependencies, walks you through cloning a voice from YouTube, and starts the server.

git clone https://github.com/adrianwedd/afterwords.git
cd afterwords
bash setup.sh

If Claude Code is installed, setup wires a Stop hook so every response is spoken aloud. Without it, you get a standalone TTS API at localhost:7860.
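For context, Claude Code hooks live in its settings.json. A rough sketch of what the wired Stop hook entry could look like — the script path and filename here are placeholders, and the actual entry setup.sh writes may differ:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "bash ~/afterwords/speak.sh" }
        ]
      }
    ]
  }
}
```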

16 voices, each cloned from a 15-second clip

Each says “You are absolutely right. Your Claude Code session could sound like me.” — generated locally on an 8 GB M1.

Add your own:

bash clone-voice.sh "https://youtube.com/watch?v=..." myvoice 30
Input meets output
You speak → /voice → Claude responds → Stop hook → TTS server → Speaker

/voice handles input. This project handles output. Together: voice conversations.

The server uses Qwen3-TTS (0.6B, 8-bit) on MLX. Zero-shot voice cloning — no training. A 15-second reference + transcript = cloned voice.

Why local? Nothing leaves your machine. No API key, no rate limits, no bill. The voice is yours.
A plain HTTP interface

The server runs on localhost:7860. No authentication. Use it from curl, scripts, other editors, web apps — anything that speaks HTTP.

GET /health

Server status, loaded voices, and readiness.

curl localhost:7860/health | jq .
{
  "status": "ok",
  "model": "mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit",
  "backend": "mlx",
  "model_loaded": true,
  "ready": true,
  "voices": ["audrey", "aurora", "avasarala", "bardem", ...],
  "default_voice": "galadriel"
}
200 OK
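Because /health reports readiness, a script can poll it before sending work instead of eating 503s during warm-up. A minimal sketch — the wait_ready helper is ours, not part of the repo, and grep-matching the raw JSON is a shortcut (jq .ready would be sturdier):

```shell
#!/bin/sh
# wait_ready: poll /health until the server reports ready, or time out.
# Assumed helper, not shipped with afterwords. Usage: wait_ready [max_seconds]
wait_ready() {
  max=${1:-30}
  i=0
  while [ "$i" -lt "$max" ]; do
    # -f makes curl fail silently on HTTP errors; grep checks the ready flag
    if curl -sf localhost:7860/health | grep -q '"ready": true'; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

With the server starting up, `wait_ready && curl "localhost:7860/synthesize?text=Hi"` avoids racing the model load.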
GET /synthesize

Generate speech from text. Returns 16-bit PCM WAV audio.

text required — string, max 5000 chars
voice optional — defaults to galadriel. Any name from /health
# Synthesize and play
curl "localhost:7860/synthesize?text=Hello+world&voice=snape" -o out.wav
afplay out.wav

# Pipe directly to speaker (macOS)
curl -s "localhost:7860/synthesize?text=Testing" | afplay -

Response includes timing headers: X-Synthesis-Time, X-Duration, X-Sample-Rate.

200 audio/wav · 400 unknown voice · 400 text empty / too long · 503 warming up
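The examples above pass URL-safe text; arbitrary text needs URL encoding, and a script should check the status code before playing anything. A sketch — the speak helper is a name we made up, not something the repo ships:

```shell
#!/bin/sh
# speak: URL-encode text, synthesize it, and play the result on macOS.
# Assumed helper; expects the afterwords server on localhost:7860.
speak() {
  text=$1
  voice=${2:-galadriel}
  # Percent-encode the text so spaces, ampersands, etc. survive the query string
  encoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$text")
  status=$(curl -s -o /tmp/afterwords-out.wav -w '%{http_code}' \
    "localhost:7860/synthesize?text=${encoded}&voice=${voice}")
  if [ "$status" = "200" ]; then
    afplay /tmp/afterwords-out.wav
  else
    echo "synthesis failed: HTTP $status" >&2
    return 1
  fi
}
```

Then `speak "Hello, world & good morning" snape` handles punctuation the raw query-string examples would choke on.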
CLI clone-voice.sh

Clone a new voice from a YouTube clip. The server auto-discovers new voices on restart.

# Interactive
bash clone-voice.sh

# Non-interactive (URL, name, start-second)
bash clone-voice.sh "https://youtube.com/watch?v=..." mycustomvoice 30

# Fully automated (skip transcript confirmation)
bash clone-voice.sh "https://youtube.com/watch?v=..." mycustomvoice 30 --yes

Each voice is a 700 KB WAV + JSON profile in voices/. Adding voices costs zero extra memory.

Different voice, different repo

Drop a .afterwords file in any project root. The hook reads it before each synthesis — no server restart.

echo "galadriel" > ~/work/frontend/.afterwords
echo "snape"     > ~/work/backend/.afterwords
echo "loki"      > ~/fun/side-project/.afterwordsclick to copy
On an 8 GB M1:
Model load: ~5s (cached)
Per sentence: ~20s
Peak memory: ~6 GB
Adding a voice: 0 extra RAM
What you need:
Hardware: Apple Silicon (M1 or later)
Memory: 8 GB+ RAM
Python: 3.11+
Disk: ~2 GB

The setup script installs everything else. Claude Code is optional — use --server-only for the API without hooks.

Qwen3-TTS (Alibaba, Apache 2.0) · mlx-audio · MLX (Apple) · Claude Code (Anthropic)

Originally built for SPARK, a robot with an inner life. Full tutorial: Voice Cloning with Qwen3-TTS.