Open Source · Apple Silicon · Zero Cloud

Give your code a voice

Clone any voice from a 15-second YouTube clip. Run it locally on your Mac. Hear Claude Code speak every response — or use the API from anything.

20 voices included · 4 backends · ~10 GB peak memory · 0 cloud calls
Five minutes to your first voice

The setup script checks your hardware, installs dependencies, walks you through cloning a voice from YouTube, and starts the server.

git clone https://github.com/adrianwedd/afterwords.git
cd afterwords
bash setup.sh

Setup installs the afterwords command to your PATH and starts the server. If Claude Code is installed, it also wires a Stop hook so every response is spoken aloud. Without it, you get a standalone TTS API at localhost:7860.

20 demo clips from the 110+ voice gallery — see backend comparison below for flagship voices across all 6 models

Each says “You are absolutely right. Your Claude Code session could sound like me.” — generated locally on a 32 GB M1.

Add your own:

afterwords clone "https://youtube.com/watch?v=..." myvoice 30
Same voice, two model sizes

Three flagship voices, each synthesized by both Qwen3-TTS sizes. Click a tab to hear the difference between the 0.6B (default, fastest) and 1.7B (higher quality, slower) variants on the same 15-second reference.

Picard · Patrick Stewart, Star Trek
Galadriel · Cate Blanchett, LOTR
Attenborough · David Attenborough, BBC Earth

Chatterbox and VoxCPM are also loaded as backends, but their voice-cloning fidelity in this integration is still being verified — samples are tracked in #14 and will be added once we’re confident they’re cloning the reference rather than producing default voices.

Input meets output

Claude Code integration

You speak → /voice → Claude responds → Stop hook → TTS server → Speaker

Standalone API

Any client → GET /synthesize → TTS server → WAV audio

Programmatic cloning (--allow-clone)

Upload audio → POST /clone → Denoise + transcribe → Voice palette → POST /synthesize

/voice handles input. This project handles output. Together: voice conversations.

The server ships four MLX backends: Qwen3-TTS (0.6B and 1.7B, 8-bit), Chatterbox (fp16, multilingual), and VoxCPM 1.5 (44.1 kHz). Zero-shot voice cloning — no training. A 15-second reference + transcript = cloned voice on every backend.

Why local? Nothing leaves your machine. No API key, no rate limits, no bill. The voice is yours.
A plain HTTP interface

The server runs on localhost:7860. No authentication. Use it from curl, scripts, other editors, web apps — anything that speaks HTTP. Endpoints marked --allow-clone require launching the server with that flag.

GET /health

Server status, loaded voices, and readiness.

curl localhost:7860/health | jq .
{
  "status": "ok",
  "model": "mlx-community/Qwen3-TTS-12Hz-0.6B-Base-8bit",
  "backend": "mlx",
  "model_loaded": true,
  "ready": true,
  "voices": ["attenborough", "attenborough-chatterbox", "attenborough-qwen3-17b", "attenborough-voxcpm-15", "audrey", "..."],
  "default_voice": "galadriel",
  "loaded_backends": {
    "qwen3-0.6b":  {"loaded": true, "voice_count": 45, "sample_rate": 24000, "display_name": "Qwen3-TTS 0.6B", "supported_langs": ["en","zh","ja","ko","es","fr","de","it","pt","ru"]},
    "qwen3-1.7b":  {"loaded": true, "voice_count": 61, "sample_rate": 24000, "display_name": "Qwen3-TTS 1.7B", "supported_langs": ["en","zh","ja","ko","es","fr","de","it","pt","ru"]},
    "chatterbox":  {"loaded": true, "voice_count": 3,  "sample_rate": 24000, "display_name": "Chatterbox (fp16, multilingual)", "supported_langs": ["en","es","fr","de","it","pt","zh","ja","ko"]},
    "voxcpm-1.5":  {"loaded": true, "voice_count": 3,  "sample_rate": 44100, "display_name": "VoxCPM 1.5", "supported_langs": ["en","zh"]},
    "voxtral":     {"loaded": true, "voice_count": 0,  "sample_rate": 24000, "display_name": "Voxtral 4B", "supported_langs": ["en","fr","de","es","it","pt","nl","ru","zh","ja","ko","ar","hi"]},
    "soprotts":    {"loaded": true, "voice_count": 0,  "sample_rate": 24000, "display_name": "SoproTTS", "supported_langs": ["en"]}
  }
}
200 OK
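The loaded_backends map is enough for clients to do their own routing. A minimal sketch in Python — the helper name and the trimmed sample data are illustrative, not part of the API:

```python
# Illustrative helper: list loaded backends that support a language,
# using the shape of the /health "loaded_backends" field shown above.
def backends_for_lang(loaded_backends: dict, lang: str) -> list[str]:
    return [
        name for name, info in loaded_backends.items()
        if info.get("loaded") and lang in info.get("supported_langs", [])
    ]

# Trimmed sample of a /health response (supported_langs abbreviated)
health = {
    "qwen3-0.6b": {"loaded": True, "supported_langs": ["en", "zh", "ja"]},
    "voxcpm-1.5": {"loaded": True, "supported_langs": ["en", "zh"]},
    "soprotts":   {"loaded": True, "supported_langs": ["en"]},
}
print(backends_for_lang(health, "zh"))  # ['qwen3-0.6b', 'voxcpm-1.5']
```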
GET /synthesize

Generate speech from text. Returns 16-bit PCM WAV audio.

text required — string, max 5000 chars
voice optional — defaults to galadriel. Any name from /health
lang optional — BCP-47 language code, defaults to en. Must be in the voice's backend's supported_langs (see /health). If unsupported and the voice declares a family, the server auto-routes to a same-family voice on a backend that supports it.
# Synthesize and play
curl "localhost:7860/synthesize?text=Hello+world&voice=snape" -o out.wav
afplay out.wav

# Non-English (the voice's backend must support the lang, or its family must)
curl "localhost:7860/synthesize?text=Ni+hao&voice=galadriel&lang=zh" -o hi.wav

# Pipe directly to speaker (macOS)
curl -s "localhost:7860/synthesize?text=Testing" | afplay -

Response includes timing headers: X-Synthesis-Time, X-Duration, X-Sample-Rate, X-Backend (the actual backend that synthesized — may differ from the voice's pinned backend if family-routing kicked in).
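Those headers make throughput easy to measure. A hedged sketch — only the header names come from the docs above; the values and helper name are made up:

```python
# Compute a real-time factor from the /synthesize timing headers:
# seconds of audio produced per second of synthesis (>1 is faster than real time).
def realtime_factor(headers: dict) -> float:
    return float(headers["X-Duration"]) / float(headers["X-Synthesis-Time"])

# Example header values (made up)
headers = {"X-Synthesis-Time": "20.0", "X-Duration": "8.0", "X-Backend": "qwen3-0.6b"}
print(round(realtime_factor(headers), 2))  # 0.4
```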

200 audio/wav · 400 unknown voice · 400 lang not supported · 400 text empty / too long · 500 synthesis failed · 503 warming up
CLI · afterwords clone

Clone a new voice from a YouTube clip. Run afterwords reload to load it (no restart needed).

# Interactive (prompts for URL and name)
afterwords clone

# Non-interactive (URL, name, start-second)
afterwords clone "https://youtube.com/watch?v=..." mycustomvoice 30

# Fully automated (skip transcript confirmation)
afterwords clone "https://youtube.com/watch?v=..." mycustomvoice 30 --yes

Each voice is a 700 KB WAV + JSON profile in voices/. Adding voices costs zero extra memory.

POST /synthesize

JSON body version of /synthesize. Supports emotion-based palette lookup for session-cloned voices. Requires --allow-clone.

text required — string, max 5000 chars
voice required — voice name or session ID
emotion optional — selects the matching palette entry (e.g. "cheerful", "serious")
curl -X POST localhost:7860/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "my-session", "emotion": "cheerful"}' \
  -o out.wav
200 audio/wav · 400 unknown voice · 404 --allow-clone not enabled · 503 warming up
POST /clone

Create a voice profile from uploaded audio. Denoises, optionally transcribes via Whisper, and registers the voice for immediate use. Requires --allow-clone.

audio required — WAV file upload (multipart/form-data)
session_id required — groups palette entries (e.g. "my-voice")
emotion optional — tag for this entry, defaults to "neutral"
transcript optional — if omitted, auto-transcribed with Whisper
# Clone a single voice
curl -X POST localhost:7860/clone \
  -F "audio=@sample.wav" \
  -F "session_id=my-voice" \
  -F "emotion=neutral"

# Build a palette with multiple emotions
curl -X POST localhost:7860/clone \
  -F "audio=@cheerful.wav" -F "session_id=my-voice" -F "emotion=cheerful"
curl -X POST localhost:7860/clone \
  -F "audio=@serious.wav" -F "session_id=my-voice" -F "emotion=serious"

Returns voice name, quality rating (rough/developing/good based on duration), session ID, and sequence number. Voices are available immediately — no restart needed.

200 JSON · 400 audio too short · 404 --allow-clone not enabled · 500 clone failed
POST /reload

Rescan voices/*.json and merge new or changed profiles into the live registry — no restart, no synthesis interruption. Add-only and atomic: if any profile fails to validate, no changes commit. Requires --allow-clone.

# After cloning a new voice or editing voices/*.json:
curl -X POST localhost:7860/reload | jq .

# Or via the CLI wrapper:
afterwords reload

Returns {"status":"ok","reloaded":[names...],"errors":[]} on success or {"status":"failed","errors":[{file, error}]} on atomic abort. Voices removed from disk are NOT dropped — use DELETE /session/{id} for that.

200 OK · 404 --allow-clone not enabled · 500 atomic abort (errors[] populated)
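Because both response shapes are documented above, a client can branch on them directly. A sketch — the function name is illustrative:

```python
# Summarize a POST /reload response using the documented shapes:
#   {"status": "ok", "reloaded": [...], "errors": []}
#   {"status": "failed", "errors": [{"file": ..., "error": ...}]}
def summarize_reload(resp: dict) -> str:
    if resp.get("status") == "ok":
        return f"reloaded {len(resp['reloaded'])} voice(s)"
    bad = ", ".join(e["file"] for e in resp.get("errors", []))
    return f"reload aborted, fix: {bad}"

print(summarize_reload({"status": "ok", "reloaded": ["picard"], "errors": []}))
# reloaded 1 voice(s)
```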
DELETE /session/{session_id}

Remove all voice palette entries and files for a session. Also cleans up any backend temp files (e.g. VoxCPM resampled refs). Requires --allow-clone.

curl -X DELETE localhost:7860/session/my-voice
200 OK · 404 --allow-clone not enabled
Different voice, different repo

Drop a .afterwords file in any project root. The hook reads it before each synthesis — no server restart.

echo "galadriel" > ~/work/frontend/.afterwords
echo "snape"     > ~/work/backend/.afterwords
echo "loki"      > ~/fun/side-project/.afterwords

If your project uses multiple agents, map each one to its own voice:

# .afterwords — agent-to-voice mapping
default: data
clara-oswald: clara-oswald
donna-noble: donna-noble
k9: k9

When Claude Code spawns a subagent, the hook reads its agent_type and looks up the matching voice. Falls back to default: if no match.

One voice, many moods

Clone the same speaker multiple times with different emotional deliveries. The server groups them into a palette by session ID.

# Clone three emotions from the same speaker
curl -X POST localhost:7860/clone \
  -F "audio=@neutral.wav" -F "session_id=narrator" -F "emotion=neutral"
curl -X POST localhost:7860/clone \
  -F "audio=@cheerful.wav" -F "session_id=narrator" -F "emotion=cheerful"
curl -X POST localhost:7860/clone \
  -F "audio=@serious.wav" -F "session_id=narrator" -F "emotion=serious"

# Synthesize with a specific emotion
curl -X POST localhost:7860/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Great news!", "voice": "narrator", "emotion": "cheerful"}' \
  -o out.wav

Each palette entry gets a quality rating based on clip duration: rough (<5s), developing (5–15s), good (15s+). If no emotion match is found, the server falls back to the best-quality entry for that session. Clean up with DELETE /session/{id}.
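The duration thresholds above reduce to a simple rating function — a sketch of the documented thresholds, not the server's code:

```python
def quality_rating(duration_s: float) -> str:
    # Thresholds from the docs: rough (<5s), developing (5-15s), good (15s+)
    if duration_s < 5:
        return "rough"
    if duration_s < 15:
        return "developing"
    return "good"

print(quality_rating(3), quality_rating(10), quality_rating(20))
# rough developing good
```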

Requires --allow-clone. Palette voices are ephemeral — stored in memory and on disk while the server runs. Use afterwords clone for permanent voices.
One command for everything

The afterwords CLI is added to your PATH during setup. It wraps launchd, health checks, and voice management into a single tool.

afterwords start       # start the server (auto-starts on login)
afterwords stop        # stop the server
afterwords restart     # restart after adding voices
afterwords status      # show health, model, loaded voices
afterwords logs        # tail the server log
afterwords voices      # list available voices
afterwords reload      # pick up new voices without restart
afterwords clone       # clone a new voice from YouTube
afterwords uninstall   # remove service and optionally hooks

The server runs on localhost:7860 and auto-starts on login via macOS launchd. If you prefer to run it manually:

source .venv/bin/activate
python server.py                  # read-only (GET endpoints only)
python server.py --allow-clone    # enables POST /clone, POST /synthesize, DELETE /session
--allow-clone enables the clone and session endpoints and automatically binds to 127.0.0.1 (localhost only) for security.
On a 32 GB Apple Silicon Mac:
Qwen3 0.6B: ~20s / sentence
Qwen3 1.7B: ~35s / sentence
Chatterbox: ~25s / sentence
VoxCPM 1.5: ~30s / sentence
Peak memory: ~10 GB (all 4 loaded)
Adding a voice: 0 extra RAM
What you need:
Hardware: Apple Silicon M1+
Memory: 32 GB+ RAM
Python: 3.11+
Disk: ~2 GB

The setup script installs everything else. Claude Code is optional — use bash setup.sh --server-only for the API without hooks. Either way, you get the afterwords CLI for managing the server.

Qwen3-TTS (Alibaba) · Chatterbox (mlx-community) · VoxCPM (mlx-community) · mlx-audio · MLX (Apple) · Claude Code (Anthropic)

Originally built for SPARK, a robot with an inner life. Full tutorial: Voice Cloning with Qwen3-TTS.