ElevenLabs voice cloning for product demo narration that does not sound like a robot

Product demo videos need narration. Recording that narration means booking a quiet room, doing multiple takes, editing out the pauses and mistakes, and doing it again when the product changes and the demo needs updating. For a team shipping weekly demos, the recording step is the bottleneck.

ElevenLabs voice cloning lets you record a 60-second sample of your voice, train a clone from it, and then generate speech from any text that sounds like you. Combined with Descript for editing and Claude for script writing, this turns demo narration from a half-day recording session into a 15-minute pipeline.

What you'll build

A workflow that:

Clones your voice from a short audio sample in ElevenLabs
Writes demo narration scripts from product release notes using Claude
Generates narration audio via the ElevenLabs API
Edits and polishes the audio in Descript
Outputs a finished narration track ready to overlay on a screen recording

Prerequisites

An ElevenLabs account. The Starter plan at $5/month gives you 30,000 characters per month (roughly 30 minutes of audio) and 3 custom voice clones. For heavier usage, the Creator plan at $22/month gives you 100,000 characters and 10 clones.
A 60-second audio recording of your voice, recorded in a quiet room. This is your clone training data. Read something natural, not a script. A paragraph from a blog post or a product description works well. Record in WAV or high-quality MP3.
A Descript account. The Hobbyist plan at $24/month includes transcription and the audio editor. The Creator plan at $35/month adds more transcription hours and AI features.
An Anthropic API key for Claude, or access to the Claude web app.
Python 3.10+ with the elevenlabs SDK installed (pip install elevenlabs).

How to build it

Step 1: Clone your voice

Log into ElevenLabs, go to Voices > Add Voice > Instant Voice Clone. Upload your 60-second recording. Give the voice a name. ElevenLabs processes it in about 30 seconds and the clone is available immediately.

Test it by typing a sentence in the voice preview and clicking play. The first-generation clone is usually 80-90% accurate. If it sounds off, the most common fix is recording a better training sample: less background noise, more natural pacing, and no vocal fry or extreme intonation.

Note your voice ID from the ElevenLabs dashboard. You will need it for the API calls.

Step 2: Write the narration script with Claude

For each demo, paste the product release notes or feature description into Claude with this prompt:

Write a product demo narration script for a screen recording.
The narrator is walking the viewer through the feature while
the screen shows the product in use. The script should:
- Be 250-400 words (roughly 2-3 minutes of speech)
- Use short, clear sentences
- Include natural pauses marked as [pause]
- Assume the viewer can see the screen, so describe what to
  notice rather than what to click
- British English spelling
- Conversational tone, not formal

Feature to demo:
[paste your release notes or feature description]

Claude's output is ready to send to ElevenLabs. The [pause] markers translate to natural breath points in the generated speech.

Step 3: Generate narration via the API

from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

script = """Your script text goes here. Include pauses
as short sentences or line breaks."""

audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",
    model_id="eleven_multilingual_v2",
    text=script,
    voice_settings={
        "stability": 0.65,
        "similarity_boost": 0.80,
        "style": 0.15,
    }
)

with open("demo-narration.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

print("Narration saved to demo-narration.mp3")

The stability parameter controls how consistent the voice sounds. Higher values (0.7-0.9) sound more robotic but consistent. Lower values (0.4-0.6) sound more natural but may vary between sentences. For product demos, 0.65 is a good middle ground.

The similarity_boost controls how closely the output matches your voice clone versus the base model. At 0.8, it sounds recognisably like you. At 1.0, it may pick up artefacts from your training sample (background noise, room tone).

Step 4: Edit in Descript

Import the MP3 into Descript. Descript auto-transcribes the audio, giving you a text document that you edit like a word processor. Delete a sentence in the transcript and the corresponding audio disappears.

This is where you fix pacing issues. ElevenLabs occasionally rushes through a list or pauses too long between sentences. In Descript, you can shorten pauses by selecting the gap in the waveform, or add pauses by pressing Enter in the transcript.

Normalise the loudness to -16 LUFS for video overlay. Export as WAV or high-quality MP3 and overlay it on your screen recording in your video editor of choice.

Cost breakdown

ElevenLabs Starter: $5/month (30 minutes of audio, enough for 10-15 demos)
Descript Hobbyist: $24/month
Claude: free for script generation
Total: $29/month for roughly 10-15 demo narrations

The voice drift problem

After generating 20-30 narrations over several weeks, you may notice the clone drifting slightly from your actual voice. This happens because the clone is a statistical model, not a recording, and repeated generation from the same model introduces subtle variation. The fix is to re-record your training sample every 2-3 months with fresh audio. This resets the clone and keeps it sounding current.

ElevenLabs' Flash model generates audio roughly 4x faster than the standard model and sounds nearly as good for speaking-pace narration. For batch processing 10 demos at once, Flash saves real time on the generation step. The quality difference is most noticeable on emotional or highly expressive speech, which product demo narration rarely requires.