ElevenLabs voice cloning for product demo narration that does not sound like a robot
Clone your own voice with a 60-second sample, generate narration for product demo videos via the API, and edit the output in Descript. The result sounds like you recorded it, because the voice is yours.
- Time saved
- Saves 2-3 hrs per demo video
- Monthly cost
- ~$5-22/mo (ElevenLabs Starter/Creator)/mo
- Published
Product demo videos need narration. Recording that narration means booking a quiet room, doing multiple takes, editing out the pauses and mistakes, and doing it again when the product changes and the demo needs updating. For a team shipping weekly demos, the recording step is the bottleneck.
ElevenLabs voice cloning lets you record a 60-second sample of your voice, train a clone from it, and then generate speech from any text that sounds like you. Combined with Descript for editing and Claude for script writing, this turns demo narration from a half-day recording session into a 15-minute pipeline.
What you'll build
A workflow that:
- Clones your voice from a short audio sample in ElevenLabs
- Writes demo narration scripts from product release notes using Claude
- Generates narration audio via the ElevenLabs API
- Edits and polishes the audio in Descript
- Outputs a finished narration track ready to overlay on a screen recording
Prerequisites
- An ElevenLabs account. The Starter plan at $5/month gives you 30,000 characters per month (roughly 30 minutes of audio) and 3 custom voice clones. For heavier usage, the Creator plan at $22/month gives you 100,000 characters and 10 clones.
- A 60-second audio recording of your voice, recorded in a quiet room. This is your clone training data. Read something natural, not a script. A paragraph from a blog post or a product description works well. Record in WAV or high-quality MP3.
- A Descript account. The Hobbyist plan at $24/month includes transcription and the audio editor. The Creator plan at $35/month adds more transcription hours and AI features.
- An Anthropic API key for Claude, or access to the Claude web app.
- Python 3.10+ with the
elevenlabsSDK installed (pip install elevenlabs).
How to build it
Step 1: Clone your voice
Log into ElevenLabs, go to Voices > Add Voice > Instant Voice Clone. Upload your 60-second recording. Give the voice a name. ElevenLabs processes it in about 30 seconds and the clone is available immediately.
Test it by typing a sentence in the voice preview and clicking play. The first-generation clone is usually 80-90% accurate. If it sounds off, the most common fix is recording a better training sample: less background noise, more natural pacing, and no vocal fry or extreme intonation.
Note your voice ID from the ElevenLabs dashboard. You will need it for the API calls.
Step 2: Write the narration script with Claude
For each demo, paste the product release notes or feature description into Claude with this prompt:
Write a product demo narration script for a screen recording.
The narrator is walking the viewer through the feature while
the screen shows the product in use. The script should:
- Be 250-400 words (roughly 2-3 minutes of speech)
- Use short, clear sentences
- Include natural pauses marked as [pause]
- Assume the viewer can see the screen, so describe what to
notice rather than what to click
- British English spelling
- Conversational tone, not formal
Feature to demo:
[paste your release notes or feature description]
Claude's output is ready to send to ElevenLabs. The [pause] markers translate to natural breath points in the generated speech.
Step 3: Generate narration via the API
from elevenlabs import ElevenLabs
client = ElevenLabs(api_key="YOUR_API_KEY")
script = """Your script text goes here. Include pauses
as short sentences or line breaks."""
audio = client.text_to_speech.convert(
voice_id="YOUR_VOICE_ID",
model_id="eleven_multilingual_v2",
text=script,
voice_settings={
"stability": 0.65,
"similarity_boost": 0.80,
"style": 0.15,
}
)
with open("demo-narration.mp3", "wb") as f:
for chunk in audio:
f.write(chunk)
print("Narration saved to demo-narration.mp3")
The stability parameter controls how consistent the voice sounds. Higher values (0.7-0.9) sound more robotic but consistent. Lower values (0.4-0.6) sound more natural but may vary between sentences. For product demos, 0.65 is a good middle ground.
The similarity_boost controls how closely the output matches your voice clone versus the base model. At 0.8, it sounds recognisably like you. At 1.0, it may pick up artefacts from your training sample (background noise, room tone).
Step 4: Edit in Descript
Import the MP3 into Descript. Descript auto-transcribes the audio, giving you a text document that you edit like a word processor. Delete a sentence in the transcript and the corresponding audio disappears.
This is where you fix pacing issues. ElevenLabs occasionally rushes through a list or pauses too long between sentences. In Descript, you can shorten pauses by selecting the gap in the waveform, or add pauses by pressing Enter in the transcript.
Normalise the loudness to -16 LUFS for video overlay. Export as WAV or high-quality MP3 and overlay it on your screen recording in your video editor of choice.
Cost breakdown
- ElevenLabs Starter: $5/month (30 minutes of audio, enough for 10-15 demos)
- Descript Hobbyist: $24/month
- Claude: free for script generation
- Total: $29/month for roughly 10-15 demo narrations
The voice drift problem
After generating 20-30 narrations over several weeks, you may notice the clone drifting slightly from your actual voice. This happens because the clone is a statistical model, not a recording, and repeated generation from the same model introduces subtle variation. The fix is to re-record your training sample every 2-3 months with fresh audio. This resets the clone and keeps it sounding current.
ElevenLabs' Flash model generates audio roughly 4x faster than the standard model and sounds nearly as good for speaking-pace narration. For batch processing 10 demos at once, Flash saves real time on the generation step. The quality difference is most noticeable on emotional or highly expressive speech, which product demo narration rarely requires.
More Recipes
Automated Podcast Production Workflow
Automated Podcast Production Workflow: From Raw Audio to Published Episode
Build an Automated YouTube Channel with AI
Build an Automated YouTube Channel with AI
Medical device regulatory documentation from technical specifications
Medtech companies spend significant resources translating technical specs into regulatory-compliant documentation.