Alchemy Recipe · Intermediate · Workflow

Create AI-powered video demos for product launches with talking avatars and multilingual audio

A product launch without a demo video feels incomplete. Your engineering team has built something remarkable, but prospects need to see it in motion, not just read about it. The old approach meant hiring a video production agency (£3,000–10,000), booking talent, scheduling shoots around time zones, and waiting six to eight weeks for a final cut. By then, your launch window has closed.

There is another path. AI video generation tools combined with text-to-speech platforms can produce professional demo videos in hours, not weeks. The workflow orchestrates multiple APIs to turn a product description into a talking-head video with AI-generated voiceover in any language you choose. No actors. No camera crew. No manual handoffs between tools.

This post walks through a fully automated pipeline that takes a single product brief and outputs a finished demo video in your chosen language, ready to publish.

The Automated Workflow

The sequence runs like this: a product description triggers a workflow; an AI model writes the script; text-to-speech generates audio in multiple languages; a video generation tool creates the avatar video; and the final file is delivered to cloud storage. The entire process requires no human intervention between the trigger and the output.

Why this tool combination works:

ElevenLabs handles voice synthesis with natural prosody and emotion control. HeyGen or Hour One generates the video with realistic talking avatars. Pika AI adds motion and visual polish if you need animated product footage woven in. An orchestration platform like n8n or Make coordinates the API calls, passing outputs from one tool into the next.

For this workflow, n8n is the best choice. It has native integrations with ElevenLabs and supports webhook-based triggers, so you can kick off the entire pipeline from a Slack message, a form submission, or an API call.

The workflow structure:

  1. Trigger: A product brief arrives via webhook or Slack.

  2. Script generation: Claude Opus 4.6 writes a 60–90 second demo script from the brief.

  3. Voice synthesis: ElevenLabs converts the script to audio (multiple language versions in parallel).

  4. Video generation: HeyGen creates a video with a talking avatar, synced to the audio.

  5. Storage: The finished video uploads to an S3 bucket or Google Drive.

Here is the n8n configuration:

Step 1: Webhook trigger

POST /webhook/demo-video-request

Payload:

{
  "product_name": "Acme Dashboard",
  "product_description": "Real-time analytics for SaaS metrics",
  "key_features": ["Live dashboards", "Custom alerts", "Data export"],
  "target_languages": ["en", "es", "fr"]
}
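To trigger the pipeline from your own tooling rather than Slack, you can build and post that payload yourself. A minimal sketch (the helper name and validation rules are mine; the webhook URL is whatever your n8n instance exposes):

```python
import json

REQUIRED_FIELDS = {"product_name", "product_description",
                   "key_features", "target_languages"}

def build_brief(name, description, features, languages):
    """Assemble the JSON payload expected by /webhook/demo-video-request."""
    brief = {
        "product_name": name,
        "product_description": description,
        "key_features": features,
        "target_languages": languages,
    }
    missing = REQUIRED_FIELDS - brief.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return json.dumps(brief)

payload = build_brief(
    "Acme Dashboard",
    "Real-time analytics for SaaS metrics",
    ["Live dashboards", "Custom alerts", "Data export"],
    ["en", "es", "fr"],
)
```

Send `payload` with any HTTP client, e.g. `requests.post(webhook_url, data=payload, headers={"content-type": "application/json"})`.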

Step 2: Script generation with Claude

Use the HTTP Request node in n8n to call the Anthropic API:

POST https://api.anthropic.com/v1/messages

Headers:

{
  "x-api-key": "{{ $secrets.ANTHROPIC_API_KEY }}",
  "anthropic-version": "2023-06-01",
  "content-type": "application/json"
}

Body:

{
  "model": "claude-opus-4.6",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "Write a 90-second demo script for a SaaS product. Product: {{ $json.product_name }}. Description: {{ $json.product_description }}. Key features: {{ $json.key_features.join(', ') }}. The script should be conversational, highlight one use case, and include a call-to-action. Format as plain text, no stage directions."
    }
  ]
}

Store the script in a variable for the next steps. Claude will return something like: "Welcome to Acme Dashboard. Managing SaaS metrics shouldn't be this hard. Here's how it works. See all your important numbers in one place, in real time. Set up an alert, and we'll notify you the moment something changes. And when you need to share data with your team, just export it with one click. Try Acme Dashboard free today."
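n8n expands the `{{ … }}` expressions at runtime; if you ever run this step outside n8n, you have to interpolate the brief into the prompt yourself. A sketch of that templating step (the prompt text mirrors the request body above; the function name is mine):

```python
def build_script_prompt(brief: dict) -> str:
    """Expand a product brief into the prompt sent to the Claude messages endpoint."""
    features = ", ".join(brief["key_features"])
    return (
        "Write a 90-second demo script for a SaaS product. "
        f"Product: {brief['product_name']}. "
        f"Description: {brief['product_description']}. "
        f"Key features: {features}. "
        "The script should be conversational, highlight one use case, "
        "and include a call-to-action. Format as plain text, no stage directions."
    )

prompt = build_script_prompt({
    "product_name": "Acme Dashboard",
    "product_description": "Real-time analytics for SaaS metrics",
    "key_features": ["Live dashboards", "Custom alerts", "Data export"],
})
```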

Step 3: Text-to-speech in parallel

Create one HTTP Request node per language. This example uses English:

POST https://api.elevenlabs.io/v1/text-to-speech/{{ voice_id }}

Headers:

{
  "xi-api-key": "{{ $secrets.ELEVENLABS_API_KEY }}",
  "content-type": "application/json"
}

Body:

{
  "text": "{{ $json.script }}",
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  }
}

The endpoint returns audio in MP3 format. Save the binary output to a variable. Repeat this node for each target language (es, fr) with different voice IDs. n8n's parallel execution runs all three simultaneously, cutting time by two-thirds.
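Outside n8n, the same fan-out is a thread pool: one synthesis call per language, all in flight at once. A sketch with a stubbed `synthesize` (the real version would POST to the ElevenLabs endpoint above; the voice IDs here are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder voice IDs -- substitute real ElevenLabs voice IDs per language.
VOICE_IDS = {"en": "voice_en", "es": "voice_es", "fr": "voice_fr"}

def synthesize(language: str, script: str) -> bytes:
    """Stub for the ElevenLabs text-to-speech call; returns MP3 bytes."""
    # Real version: POST script to /v1/text-to-speech/{VOICE_IDS[language]}
    return f"mp3:{language}:{script}".encode()

def synthesize_all(script: str) -> dict:
    """Run one synthesis job per target language in parallel."""
    with ThreadPoolExecutor(max_workers=len(VOICE_IDS)) as pool:
        futures = {lang: pool.submit(synthesize, lang, script)
                   for lang in VOICE_IDS}
        return {lang: f.result() for lang, f in futures.items()}

audio = synthesize_all("Welcome to Acme Dashboard.")
```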

Step 4: Video generation

Once you have the audio file, post it to HeyGen's API:

POST https://api.heygen.com/v1/video_talks.generate

Headers:

{
  "X-API-Key": "{{ $secrets.HEYGEN_API_KEY }}",
  "content-type": "application/json"
}

Body:

{
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "Ashley-incasual-20220827",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "audio",
        "audio_url": "{{ $json.audio_url_en }}"
      },
      "background": {
        "type": "color",
        "color": "#ffffff"
      }
    }
  ],
  "test": false
}

HeyGen queues the video and returns a session ID. You will need to poll this endpoint to check generation status:

GET https://api.heygen.com/v1/video_talks.get?session_id={{ session_id }}

Poll every 5 seconds until the status changes from "processing" to "completed", then extract the video_url.
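That polling loop, sketched in Python. `fetch_status` stands in for the GET request above, and the interval mirrors the 5-second spacing; I've added a timeout so a stuck job can't poll forever:

```python
import time

def poll_until_complete(fetch_status, interval=5, timeout=600, sleep=time.sleep):
    """Poll fetch_status() until it reports 'completed'; return the video URL."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # fetch_status stands in for GET /v1/video_talks.get?session_id=...
        status = fetch_status()
        if status["status"] == "completed":
            return status["video_url"]
        if status["status"] == "failed":
            raise RuntimeError("video generation failed")
        sleep(interval)
    raise TimeoutError("video not ready within timeout")
```

The `sleep` parameter is injectable so the loop can be tested without real waiting.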

Step 5: Upload to storage

Once the video is ready, download it and push to S3 (or Google Drive, OneDrive, etc.):

PUT https://s3.amazonaws.com/your-bucket/demo-videos/{{ $json.product_name }}-{{ language }}.mp4

Headers:

{
  "Authorization": "AWS4-HMAC-SHA256 ...",
  "Content-Type": "video/mp4"
}

Binary: [video file from HeyGen]
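The object key interpolates the product name directly; names with spaces or mixed case make awkward URLs, so it's worth normalising the key first. A small sketch (the key scheme matches the request above; the sanitisation rules are my own choice):

```python
import re

def object_key(product_name: str, language: str) -> str:
    """Build the S3 key demo-videos/<slug>-<lang>.mp4 from a product name."""
    # Lowercase and collapse anything non-alphanumeric into single dashes.
    slug = re.sub(r"[^a-z0-9]+", "-", product_name.lower()).strip("-")
    return f"demo-videos/{slug}-{language}.mp4"
```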

Step 6: Notification

Send a Slack message with the download link:

POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL

Body:

{
  "text": "Demo video ready: {{ $json.product_name }}",
  "blocks": [
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*{{ $json.product_name }} demo video generated*\n\nEnglish: {{ $json.s3_url_en }}\nSpanish: {{ $json.s3_url_es }}\nFrench: {{ $json.s3_url_fr }}"
      }
    }
  ]
}
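Built outside n8n, that notification is just a dict keyed by language. A sketch that assembles the same Block Kit payload (field names match the JSON above; the URLs are placeholders):

```python
def slack_payload(product_name: str, urls: dict) -> dict:
    """Assemble a Slack Block Kit message with one download link per language."""
    labels = {"en": "English", "es": "Spanish", "fr": "French"}
    lines = "\n".join(f"{labels.get(lang, lang)}: {url}"
                      for lang, url in urls.items())
    return {
        "text": f"Demo video ready: {product_name}",
        "blocks": [{
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*{product_name} demo video generated*\n\n{lines}",
            },
        }],
    }

msg = slack_payload("Acme Dashboard", {"en": "https://example.test/en.mp4"})
```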

The entire pipeline, from product brief to finished video, runs in 8–12 minutes. Error handling is crucial: wrap the HeyGen polling loop in a try-catch in case a video fails to render, and retry the request three times before sending an alert that asks for manual intervention.
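The retry-then-alert behaviour can be expressed as a small wrapper: three attempts, then an alert callback fires. A sketch (in practice `alert` would post to Slack or a paging service):

```python
def with_retries(fn, attempts=3, alert=print):
    """Call fn(); retry up to `attempts` times, then alert and re-raise."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
    alert(f"render failed after {attempts} attempts: {last_error}")
    raise last_error
```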

The Manual Alternative

If you prefer more control over script tone, avatar selection, or background visuals, handle each step separately:

  1. Write the script yourself in a Google Doc or text editor, refining it for pacing and emphasis.

  2. Record audio in ElevenLabs directly, choosing voice, emotion, and speed via their web interface.

  3. Upload audio to HeyGen's dashboard, select your avatar, pick a background template, and render.

  4. Download the MP4 and upload it to your video hosting platform (Vimeo, YouTube, Wistia).

This approach takes 30–45 minutes but gives you more creative input. Use it for your first demo to set a brand standard, then automate subsequent videos once you know what works.

Pro Tips

1. Voice cloning for brand consistency

If you want the same voice across all languages, train ElevenLabs to clone a real speaker's voice. Record 30 seconds of clean audio, upload it to ElevenLabs, and it will generate a voice ID. Then use that ID across all language variants. Cost is around £10 per voice clone.

2. Rate limiting and queuing

Both ElevenLabs and HeyGen enforce strict rate limits: ElevenLabs allows 1,000 characters per minute on the free tier, and HeyGen queues only three concurrent generation jobs. Build a queue in n8n using the "Wait" node to space requests 30 seconds apart; this prevents 429 errors and keeps the pipeline stable.
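The 30-second spacing is easy to replicate outside n8n with a gate that remembers the last request time. A sketch (the clock and sleep are injectable so it can be tested without real waiting):

```python
import time

class Spacer:
    """Enforce a minimum gap between successive requests."""
    def __init__(self, gap_seconds=30.0, clock=time.monotonic, sleep=time.sleep):
        self.gap = gap_seconds
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        """Block until at least `gap_seconds` have passed since the last call."""
        now = self.clock()
        if self.last is not None and now - self.last < self.gap:
            self.sleep(self.gap - (now - self.last))
        self.last = self.clock()
```

Call `spacer.wait()` immediately before each ElevenLabs or HeyGen request.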

3. Cache scripts for multiple avatars

Generate one script, then create videos with three different avatars in parallel. This gives you variation for A/B testing without running Claude again. Save the script output to n8n's internal database, reuse it for each avatar variant, and reduce API costs by 66%.
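Caching the script is plain memoisation keyed by the product. A sketch (the in-memory dict here plays the role of n8n's static data store):

```python
_script_cache = {}

def get_script(product_name: str, generate) -> str:
    """Return a cached script for this product, calling generate() only once."""
    if product_name not in _script_cache:
        _script_cache[product_name] = generate()  # e.g. the Claude API call
    return _script_cache[product_name]
```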

4. Add a fallback TTS engine

If ElevenLabs returns a timeout, automatically retry with OpenAI TTS-1-HD as a backup. The voice quality is slightly lower, but it prevents the entire workflow from failing. Set up a conditional branch in n8n that checks for errors and routes to the fallback.
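That conditional branch is a try/except around the primary engine. A sketch with both engines stubbed (the real calls would be the ElevenLabs request above and OpenAI's TTS endpoint):

```python
def synthesize_with_fallback(text, primary, fallback):
    """Try the primary TTS engine; on any failure, route to the fallback."""
    try:
        return primary(text)
    except Exception:
        return fallback(text)
```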

5. Monitor costs in real time

ElevenLabs charges per character generated. A 90-second script is roughly 1,200 characters. Three languages = 3,600 characters = £0.18 per video (on paid tier). HeyGen charges £1.20–3.00 per video depending on your plan. Track cumulative spend with an n8n monitoring dashboard or send weekly summaries to Slack so you catch runaway costs early.
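The per-video arithmetic, checked in code (rates taken from the figures above: £5 per 100k characters on ElevenLabs's paid tier, 1,200 characters per script, three languages):

```python
CHARS_PER_SCRIPT = 1200
LANGUAGES = 3
GBP_PER_100K_CHARS = 5.0

def tts_cost_per_video(chars=CHARS_PER_SCRIPT, languages=LANGUAGES,
                       rate=GBP_PER_100K_CHARS):
    """Voice-synthesis cost: total characters at the per-100k-character rate."""
    return chars * languages * rate / 100_000

cost = tts_cost_per_video()  # 1,200 chars x 3 languages = 3,600 chars
```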

Cost Breakdown

| Tool | Plan Needed | Monthly Cost | Notes |
| --- | --- | --- | --- |
| ElevenLabs | Creator (pay-as-you-go) | £0.18–0.36 per video | 100k characters free tier; £5 per 100k characters on paid |
| HeyGen | Standard (unlimited videos) | £120 | Pay upfront; covers 100 video generations per month |
| Claude Opus 4.6 | Via Anthropic API | £0.01–0.03 per script | ~1,500 input tokens, 200 output tokens per script |
| n8n | Cloud Pro | £25 | Includes 10k executions monthly; enough for 200+ videos |
| AWS S3 (storage) | Standard tier | £0.02–0.05 | ~5GB of video storage per 50 videos |
| Total | | £145–155 | Produces 100+ demo videos per month |