Zero-shot voice cloning on Apple Silicon using Qwen3-TTS. 98 voices included. Runs entirely on-device via a local HTTP server.
Each clip says "You are absolutely right. Your AI session could sound like me." — synthesized locally on a 32 GB M1.
Same voice, two model sizes. Qwen3-TTS 0.6B is the default and the fastest; 1.7B gives higher quality but is roughly 2× slower.
Add your own with afterwords clone "https://youtube.com/watch?v=..." myvoice 30
The setup script checks your hardware, installs dependencies, and starts the server. It also puts the afterwords CLI on your PATH.
git clone https://github.com/adrianwedd/afterwords
cd afterwords
bash setup.sh
curl "localhost:7860/synthesize?text=Hello+world&voice=picard" | afplay -
The server runs at localhost:7860 and auto-starts on login via launchd. It has no authentication.
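The curl example above hand-encodes spaces as `+`. For arbitrary text (commas, quotes, ampersands) the query string needs percent-encoding. Here is a minimal sketch of one way to build the URL in POSIX shell. The helpers `urlencode` and `synth_url` are hypothetical names and not part of the afterwords CLI; only the `/synthesize` endpoint and its `text` and `voice` parameters come from the example above. It assumes the server accepts `%20` as well as `+` for spaces.

```shell
# Sketch only: urlencode and synth_url are hypothetical helpers,
# not part of the afterwords CLI.

# Percent-encode every character outside the RFC 3986 unreserved set.
# Assumes ASCII input; multi-byte characters are not handled reliably.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Build a /synthesize URL from a text string and a voice name.
synth_url() {
  printf 'http://localhost:7860/synthesize?text=%s&voice=%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")"
}
```

Usage would mirror the earlier command: `curl -s "$(synth_url 'Hello, world!' picard)" | afplay -`. Alternatively, curl can do the encoding itself with `-G --data-urlencode "text=Hello, world!" --data-urlencode "voice=picard"`.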
Afterwords integrates with every major AI coding harness. A hook fires after each response, pipes the text to the local server, and plays it back in whatever voice you've set for that project or agent.
# Per-project voice — drop a .afterwords file in any repo
echo "galadriel" > ~/work/my-project/.afterwords
# Per-agent voices — one key per harness
cat > ~/work/my-project/.afterwords <<EOF
default: picard
cursor: lister
codex: seven-of-nine
agy: samantha
hermes: data
EOF
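The two `.afterwords` forms above (a bare voice name, or one `key: voice` line per harness) suggest a simple lookup order: the harness's own key first, then `default`. The sketch below shows how a hook might resolve a voice under that assumption. `resolve_voice` and `lookup_key` are hypothetical helpers, and the parsing and fallback rules are guesses from the examples, not the project's documented behavior.

```shell
# Sketch only: resolve_voice and lookup_key are hypothetical helpers;
# the real hook's parsing rules may differ.

# Print the value for "key: value" in FILE, or nothing if KEY is absent.
# Usage: lookup_key FILE KEY
lookup_key() {
  awk -F': *' -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Print the voice for AGENT from FILE; exit non-zero if none is found.
# Usage: resolve_voice FILE AGENT
resolve_voice() (
  file=$1
  agent=$2
  [ -f "$file" ] || exit 1
  if grep -q ':' "$file"; then
    # Per-agent form: try the agent's key, then fall back to "default".
    voice=$(lookup_key "$file" "$agent")
    [ -n "$voice" ] || voice=$(lookup_key "$file" default)
  else
    # Single-voice form: the file holds one bare voice name.
    voice=$(head -n 1 "$file" | tr -d '[:space:]')
  fi
  [ -n "$voice" ] || exit 1
  printf '%s\n' "$voice"
)
```

With the per-agent file above, `resolve_voice ~/work/my-project/.afterwords codex` would print `seven-of-nine`, and a harness with no key of its own would fall back to `picard`.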
Each voice profile is a 700 KB WAV + JSON pair. Adding voices costs zero extra RAM — the reference audio is loaded per-synthesis, not held in memory.
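Since profiles are read from disk per synthesis, the cost of a large library is storage rather than RAM. As rough arithmetic, assuming the 700 KB figure covers the whole WAV + JSON pair and using decimal units, the 98 bundled voices come to about 68 MB. `profiles_mb` is a hypothetical helper written for this illustration.

```shell
# Hypothetical helper: total size in MB for COUNT profiles of KB_EACH
# kilobytes each (decimal units, rounded down).
# Usage: profiles_mb COUNT KB_EACH
profiles_mb() {
  echo $(( $1 * $2 / 1000 ))
}

profiles_mb 98 700   # prints 68
```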
| Backend | Voice profiles | Languages | Status |
|---|---|---|---|
| qwen3-0.6b | 100+ | en zh ja ko es fr de it pt ru | stable |
| qwen3-1.7b | 100+ | en zh ja ko es fr de it pt ru | stable |
| voxtral-4b | yours | en fr de es it pt nl ru zh ja ko ar hi | verified |
| soprotts | yours | en | verified |
13 further backends are scaffolded and available for experimentation (OpenVoice, F5-TTS, CosyVoice2, and others). See the README for integration status and hardware requirements.