Alchemy Recipe · Intermediate · automation

Software Feature Demo Video Production for Product Releases

24 March 2026

Introduction

Product launches live or die on communication. Your engineering team has built something excellent, but your marketing and sales teams need footage that explains what it does and why it matters. Feature demo videos sit at that awkward intersection where you need them quickly, they're expensive to produce manually, and every day you wait is a day your competitors are talking to prospects.

The traditional approach involves hiring a videographer, writing scripts, recording voiceovers, editing footage, and waiting weeks for output. The cost balloons fast: freelance videographers charge £500 to £2,000 per day, voice actors want £200 to £500 per video, and video editors need another week of labour to stitch it all together. Meanwhile, your product team is shipping faster than your marketing can document it.

What if you could produce polished feature demo videos in hours instead of weeks? By combining four AI tools into an automated workflow, you can take a product URL, generate a demo video with AI voiceover, add motion graphics, and have it ready for your sales team before your next standup. This guide shows you exactly how.

The Automated Workflow

The workflow moves through five distinct stages: URL capture and screen recording, script generation, voiceover production, video assembly, and motion graphics enhancement. Each stage feeds into the next with zero manual handoff. Data flows as structured JSON through your orchestration tool, which acts as the connective tissue.

Architecture Overview

The orchestration layer (we'll use n8n for this guide, though Zapier and Make work similarly) receives a trigger: usually a webhook POST containing the product URL and desired demo focus. From there:

  1. Clipwing captures an interactive screen recording of your product
  2. The video frames and metadata feed into Claude to generate a structured demo script
  3. ElevenLabs converts that script into broadcast-quality voiceover
  4. Demofly assembles the video with voiceover synced to product actions
  5. PixelMotion AI adds motion graphics and title sequences
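Before diving into each integration, the whole pipeline can be sketched as one async composition. Every stage name below is an illustrative placeholder for one of the API calls covered in the steps that follow, not a real SDK:

```javascript
// High-level shape of the pipeline. The `stages` object is injected so
// individual steps can be swapped out, mocked, or run manually. All names
// here are placeholders, not real SDK calls.
async function produceDemoVideo(request, stages) {
  const recording = await stages.record(request);                // Clipwing
  const script = await stages.writeScript(recording, request);   // Claude
  const voiceover = await stages.narrate(script);                // ElevenLabs
  const assembled = await stages.assemble(recording, voiceover); // Demofly
  return stages.enhance(assembled);                              // PixelMotion AI
}
```

Injecting the stages also makes the manual-override variant (described later) trivial: replace any stage with a function that waits for human approval.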

Let's walk through the actual implementation.

Step 1: Screen Recording with Clipwing

Clipwing provides an API that records product interactions at specified URLs. Start by setting up your n8n workflow with an incoming webhook node.


POST /webhook/demo-request
Content-Type: application/json

{
  "product_url": "https://yourapp.com",
  "feature_focus": "dashboard analytics",
  "duration_seconds": 120,
  "interaction_script": "Click the analytics tab, wait 2 seconds, hover over the chart, click export button"
}
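It's worth validating this payload before kicking off paid API calls. A minimal sketch for an n8n Code node, using the field names from the example above:

```javascript
// Validate the incoming demo-request payload before starting the recording.
// Field names match the webhook example above; returns a list of problems
// (empty list means the request is usable).
function validateDemoRequest(body) {
  const problems = [];
  if (!body.product_url || !/^https?:\/\//.test(body.product_url)) {
    problems.push('product_url must be an http(s) URL');
  }
  if (!body.feature_focus) {
    problems.push('feature_focus is required');
  }
  if (!Number.isFinite(body.duration_seconds) || body.duration_seconds <= 0) {
    problems.push('duration_seconds must be a positive number');
  }
  return problems;
}
```

If the list is non-empty, respond to the webhook with a 400 rather than letting a malformed request burn recording minutes downstream.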

Your first n8n node captures this webhook. The second node calls Clipwing's recording API:


POST https://api.clipwing.io/v1/recordings
Authorization: Bearer YOUR_CLIPWING_API_KEY
Content-Type: application/json

{
  "url": "{{ $json.product_url }}",
  "duration": {{ $json.duration_seconds }},
  "resolution": "1920x1080",
  "framerate": 30,
  "interactions": [
    {
      "type": "click",
      "selector": "[data-testid='analytics-tab']",
      "delay_before": 500
    },
    {
      "type": "wait",
      "duration": 2000
    },
    {
      "type": "hover",
      "selector": ".chart-container"
    },
    {
      "type": "click",
      "selector": "[data-testid='export-button']"
    }
  ],
  "capture_metadata": true
}

Clipwing returns a job ID and, once processing completes (typically 90 seconds), a signed URL to your video file plus frame-by-frame metadata:

{
  "job_id": "job_abc123xyz",
  "status": "completed",
  "video_url": "https://clipwing-cdn.s3.amazonaws.com/...",
  "duration_ms": 118500,
  "frames": [
    {
      "timestamp_ms": 0,
      "dom_state": "initial page load",
      "interactive_elements": ["button#analytics", "nav#menu"]
    },
    {
      "timestamp_ms": 2000,
      "dom_state": "analytics tab clicked",
      "interactive_elements": ["chart.svg", "button#export"]
    }
  ]
}

Store this entire response in an n8n variable. You'll need the video URL and frames data for downstream steps.
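Because Clipwing's processing is asynchronous, the flow needs a polling loop before it can use the video URL. A generic sketch that also serves the Demofly and PixelMotion status endpoints later on; fetchStatus is an injected function wrapping whichever HTTP status call you need:

```javascript
// Generic job poller: repeatedly calls fetchStatus until the job completes,
// fails, or the attempt budget runs out. fetchStatus is injected so the
// same helper works for any of the status endpoints in this workflow.
async function pollUntilComplete(fetchStatus, { intervalMs = 10000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchStatus();
    if (job.status === 'completed') return job;
    if (job.status === 'failed') throw new Error(`job failed: ${job.job_id}`);
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error('timed out waiting for job');
}
```

In n8n the equivalent is a Wait node in a loop, but having the logic in one Code node keeps the retry budget explicit.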

Step 2: Script Generation with Claude

Now you have video content and structured metadata about what happens on screen. Use Claude (via the Messages API) to generate a natural, engaging demo script that aligns with on-screen actions. Claude never watches the footage itself; it works from the frame timings and DOM states, which is enough to land narration precisely when features appear.


POST https://api.anthropic.com/v1/messages
x-api-key: YOUR_CLAUDE_API_KEY
anthropic-version: 2023-06-01
Content-Type: application/json

{
  "model": "claude-3-5-sonnet-20241022",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "Generate a compelling product demo script for a video that is 118 seconds long. The product feature being demonstrated is: {{ $json.feature_focus }}. Here are the screen events that occur during the video, with timestamps:\n\n{{ $json.frames.map(frame => frame.timestamp_ms + 'ms: ' + frame.dom_state).join('\\n') }}\n\nFormat the script as JSON with this structure:\n{\n  \"sections\": [\n    {\n      \"start_time_ms\": 0,\n      \"end_time_ms\": 3000,\n      \"narration\": \"Opening hook text\",\n      \"tone\": \"energetic\"\n    }\n  ]\n}\n\nMake the narration engaging, avoid jargon, and match the pacing to on-screen actions."
    }
  ]
}

Claude returns structured JSON with narration blocks tied to specific time windows:

{
  "sections": [
    {
      "start_time_ms": 0,
      "end_time_ms": 3500,
      "narration": "Meet your new analytics hub. In seconds, you can see your entire business performance at a glance.",
      "tone": "energetic"
    },
    {
      "start_time_ms": 3500,
      "end_time_ms": 8000,
      "narration": "Just click the Analytics tab, and boom, your data is right there. Real-time charts, instant insights.",
      "tone": "conversational"
    },
    {
      "start_time_ms": 8000,
      "end_time_ms": 12000,
      "narration": "Need to share these results? One click exports everything to PDF. Your stakeholders get exactly what they need.",
      "tone": "professional"
    }
  ]
}

Store this script object in your n8n context. You'll use it to sync voiceover timing in the next step.
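It's cheap to sanity-check the returned script before generating any audio: the sections should tile the video with no gaps or overlaps, and the last section should end at the video's duration. A sketch:

```javascript
// Check that script sections are contiguous (each starts where the previous
// ended) and that the final section ends at the video's duration, within a
// small tolerance.
function validateScriptTimeline(sections, videoDurationMs, toleranceMs = 500) {
  let cursor = 0;
  for (const s of sections) {
    if (s.start_time_ms !== cursor) return false;        // gap or overlap
    if (s.end_time_ms <= s.start_time_ms) return false;  // empty/inverted window
    cursor = s.end_time_ms;
  }
  return Math.abs(cursor - videoDurationMs) <= toleranceMs;
}
```

If validation fails, loop back to Claude with a note about which constraint was violated rather than silently generating mis-timed voiceover.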

Step 3: Voiceover Generation with ElevenLabs

ElevenLabs' API converts text to natural-sounding speech. The critical detail here is getting duration predictions so you can validate that narration fits within the video timeline.


POST https://api.elevenlabs.io/v1/text-to-speech/pNInz6obpgDQGcFmaJVE
xi-api-key: YOUR_ELEVENLABS_API_KEY
Content-Type: application/json

{
  "text": "{{ $json.script_sections[0].narration }}",
  "model_id": "eleven_monolingual_v1",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  }
}

ElevenLabs returns audio data directly. But before you commit to generating all sections, test one first and check the duration. An n8n HTTP node can make this request, and a subsequent code node can extract the audio duration:

// In an n8n Code node (run once per item).
// ElevenLabs returns raw MP3 bytes; this assumes an earlier node has stored
// them base64-encoded on the item as `audio`.
const audioBytes = Buffer.from($json.audio, 'base64').length;

// Rough estimate: MP3 at 128 kbps = 16,000 bytes per second
const estimatedDurationMs = Math.round((audioBytes / 16000) * 1000);

// n8n Code nodes return an array of items
return [{
  json: {
    duration_ms: estimatedDurationMs,
    section_index: $json.section_index
  }
}];

If duration matches the script section's time window (within 500ms tolerance), proceed. If not, loop back to Claude and ask for a shorter or longer version.
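That branch decision fits in a small Code node. A sketch of the fit check, using the ±500 ms tolerance from above; the return value tells the workflow which way to branch:

```javascript
// Decide whether a generated voiceover fits its script window.
// 'ok' proceeds to the next section; 'too_long' or 'too_short' loops back
// to Claude to request a reworded version of that section.
function checkSectionFit(audioDurationMs, section, toleranceMs = 500) {
  const windowMs = section.end_time_ms - section.start_time_ms;
  const diff = audioDurationMs - windowMs;
  if (Math.abs(diff) <= toleranceMs) return 'ok';
  return diff > 0 ? 'too_long' : 'too_short';
}
```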

For each confirmed section, store the audio files with their start/end timestamps. You'll now have multiple MP3 files, each synced to a specific moment in the video.

Step 4: Video Assembly with Demofly

Demofly combines your Clipwing video with your ElevenLabs voiceover tracks and handles the hard part: synchronising audio to on-screen actions. Demofly also adds subtle callouts, arrows, and highlighting to draw attention to important UI elements.


POST https://api.demofly.io/v1/videos/assemble
Authorization: Bearer YOUR_DEMOFLY_API_KEY
Content-Type: application/json

{
  "base_video_url": "{{ $json.clipwing_video_url }}",
  "voiceover_tracks": [
    {
      "audio_url": "{{ $json.voiceover_section_0_url }}",
      "start_time_ms": 0,
      "end_time_ms": 3500
    },
    {
      "audio_url": "{{ $json.voiceover_section_1_url }}",
      "start_time_ms": 3500,
      "end_time_ms": 8000
    },
    {
      "audio_url": "{{ $json.voiceover_section_2_url }}",
      "start_time_ms": 8000,
      "end_time_ms": 12000
    }
  ],
  "callouts": [
    {
      "type": "highlight_element",
      "selector": "[data-testid='analytics-tab']",
      "start_time_ms": 1000,
      "end_time_ms": 3000,
      "style": "pulse"
    },
    {
      "type": "arrow",
      "from_x": 1200,
      "from_y": 300,
      "to_x": 1400,
      "to_y": 350,
      "start_time_ms": 3500,
      "end_time_ms": 5000,
      "colour": "#FF6B35"
    }
  ],
  "background_music": {
    "url": "https://music-library.s3.amazonaws.com/upbeat-tech.mp3",
    "volume": 0.2,
    "fade_in_ms": 1000,
    "fade_out_ms": 2000
  },
  "output_format": "mp4",
  "resolution": "1920x1080",
  "framerate": 30
}

Demofly processes this request (typically 2 to 3 minutes for a 2-minute video) and returns a job status endpoint:

{
  "job_id": "job_dfly_xyz789",
  "status": "processing",
  "status_url": "https://api.demofly.io/v1/videos/jobs/job_dfly_xyz789",
  "output_url": null
}

In n8n, use a "Wait" node to poll this status URL every 10 seconds until status becomes "completed". Once complete, the response includes the final video URL.

Step 5: Motion Graphics with PixelMotion AI

PixelMotion AI takes your assembled video and adds professional title sequences, transitions, and motion graphics. This is the final polish layer.


POST https://api.pixelmotion.ai/v1/enhance
Authorization: Bearer YOUR_PIXELMOTION_API_KEY
Content-Type: application/json

{
  "video_url": "{{ $json.demofly_output_url }}",
  "enhancements": {
    "title_sequence": {
      "enabled": true,
      "duration_ms": 3000,
      "text": "Introducing Analytics Dashboard",
      "subtitle": "See your business, instantly",
      "style": "modern_minimal",
      "background": "gradient",
      "gradient_start": "#001F3F",
      "gradient_end": "#0074D9"
    },
    "transitions": {
      "enabled": true,
      "type": "smart",
      "duration_ms": 300
    },
    "motion_tracking": {
      "enabled": true,
      "track_interactive_elements": true
    },
    "end_card": {
      "enabled": true,
      "duration_ms": 2000,
      "cta_text": "Try it Free",
      "cta_url": "https://yourapp.com/signup"
    }
  },
  "output_format": "mp4"
}

PixelMotion returns a similar job structure. Poll for completion, and you now have a broadcast-ready feature demo video.

Complete n8n Workflow Structure

Here's the high-level flow in n8n:

  1. Webhook Trigger: Receives product URL and feature focus
  2. Clipwing Recording: Calls Clipwing API, stores video and frame metadata
  3. Wait for Clipwing: Polls status until video is ready
  4. Claude Script Generation: Converts frames into a structured narration script
  5. Loop through Script Sections: For each narration block,
    • Call ElevenLabs API
    • Store audio file URL
    • Validate duration
  6. Demofly Assembly: Assembles video and audio tracks
  7. Wait for Demofly: Polls status until assembly completes
  8. PixelMotion Enhancement: Adds motion graphics
  9. Wait for PixelMotion: Polls status until complete
  10. Notify Slack: Sends final video URL to your team's Slack channel

The entire flow typically completes in 8 to 12 minutes, depending on video length and API response times.

The Manual Alternative

If you prefer direct control over any step, you can trigger portions of this workflow manually. For instance, after Clipwing captures video, you could review it before proceeding to script generation. Or you could write your own script in your favourite editor and skip the Claude step entirely.

To do this, insert Wait nodes configured to resume on an incoming webhook call. When someone manually approves or submits data, that webhook resumes the workflow. This trades some automation for control, which is useful if your demo videos require brand-specific language or you want an editor's eye on the script before voiceover generation begins.

Pro Tips

1. Cache Voiceover and Motion Graphics Across Videos

Once you've chosen a voice profile in ElevenLabs that suits your brand, lock those settings into your API call. Similarly, use the same title sequence template and motion graphic style across all product videos. This creates consistency and reduces API calls (lowering costs). Store these configuration objects in n8n global variables.

2. Handle Clipwing Recording Failures Gracefully

Screen recordings sometimes fail if the target URL times out or JavaScript doesn't load. Wrap the Clipwing call in error handling. If it fails twice, send a Slack notification to your team asking for manual intervention, rather than failing silently. In n8n, use a "Try/Catch" node or set up error handlers on the HTTP request.
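A sketch of that retry-then-escalate pattern; record and notifySlack are stand-ins for the Clipwing HTTP call and your Slack notification node:

```javascript
// Retry the recording call a bounded number of times; if every attempt
// fails, notify a human and return null instead of failing silently.
async function recordWithRetry(record, notifySlack, maxAttempts = 2) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await record();
    } catch (err) {
      lastError = err;
    }
  }
  await notifySlack(`Clipwing recording failed after ${maxAttempts} attempts: ${lastError.message}`);
  return null;
}
```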

3. Rate Limit ElevenLabs Voiceover Calls

If you're generating many demo videos in parallel, you'll hit ElevenLabs' rate limits. Use n8n's queue feature or delay nodes between voiceover requests. A 2-second delay between API calls keeps you comfortably under typical rate limits (300 requests per minute on the Starter plan).
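A minimal sketch of that pacing: generate sections sequentially with a fixed pause between calls (generateAudio is a stand-in for the ElevenLabs request):

```javascript
// Generate voiceovers one at a time with a fixed delay between calls,
// keeping request rate well under the provider's limit.
async function generateAllSections(sections, generateAudio, delayMs = 2000) {
  const results = [];
  for (const section of sections) {
    results.push(await generateAudio(section));
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  return results;
}
```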

4. Validate Audio Duration Before Committing

Before you send all voiceover files to Demofly, check that total audio duration matches or is slightly shorter than your video duration. If audio is longer, you either need shorter narration or a longer video. Add this validation as a code node after voiceover generation, before the Demofly call. This catches errors early.
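The check itself is one line; a sketch for the validation node:

```javascript
// Total voiceover must not exceed the video; slightly shorter is fine.
function audioFitsVideo(sectionDurationsMs, videoDurationMs) {
  const totalMs = sectionDurationsMs.reduce((sum, d) => sum + d, 0);
  return totalMs <= videoDurationMs;
}
```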

5. Store Output URLs with Metadata

After your workflow completes, save the final video URL, along with the input product URL, feature focus, timestamps, and cost (see breakdown below) to a simple database or Google Sheet. This creates an audit trail and lets you reuse videos for the same feature without re-running the entire workflow. In n8n, a "Google Sheets" node can handle this.

Cost Breakdown

Tool              | Plan Needed       | Monthly Cost | Notes
Clipwing          | Growth            | £150         | 300 minutes/month recording, includes 1080p
ElevenLabs        | Starter           | £11          | 10,000 characters/month; budget ~£50 if scaling
Demofly           | Professional      | £250         | 100 videos/month, includes callouts and music library
PixelMotion AI    | Pro               | £99          | Unlimited enhancements, advanced motion tracking
n8n (self-hosted) | Free or Cloud Pro | £0–£25       | Free self-hosted, or £25/month Cloud with more executions
Total             |                   | £510–£574/month | Scales to ~150 videos/month with this stack

This is substantially cheaper than hiring freelance videographers (£1,500–£3,000 per video) or a dedicated in-house video editor (£30,000–£50,000 annually). For a single high-quality demo video, the tools pay for themselves immediately.

At scale (100+ videos per month), cost per video drops below £5, making this approach viable even for organisations with smaller marketing budgets.