Introduction
Podcast creators face a familiar problem: you've just published a 90-minute episode, and now you need a transcript, chapter markers, social media clips, and promotional content. The manual workflow is tedious. You upload to a transcription service, wait for results, manually identify chapter breaks, export clips, and create social posts from each one. If you have a weekly show, this easily consumes 4-6 hours per episode.
What if this entire process happened automatically the moment your episode publishes? No waiting between steps, no manual file juggling, no copy-pasting timestamps. You upload once, and an hour later you have a full transcript, timestamped chapters, edited video clips ready for TikTok and Instagram Reels, and social media captions already written.
This is entirely possible with three focused tools and an orchestration layer to connect them. You'll combine Whisper API for transcription, Mirra for intelligent chapter detection, and Clipwing for automatic clip generation. The orchestration handles everything in between, routing data and managing the timing so each tool receives exactly what it needs.
The Automated Workflow
Why This Combination Works
Whisper API excels at speech-to-text and returns segment-level timestamps, although it does not label individual speakers. Mirra analyses transcripts to find natural chapter boundaries based on topic changes and conversation flow. Clipwing then extracts video segments tied to those chapters, and its built-in social editing handles format optimisation for different platforms. The three tools have different strengths, which is why stringing them together produces better results than any single tool alone.
Choosing Your Orchestration Tool
For this particular workflow, n8n is the best choice. Here's why: Zapier has rate limits that make it unreliable for large audio files; Make (Integromat) has weaker error handling when API responses are delayed; Claude Code requires constant human intervention. Only n8n gives you enough control over retry logic and enough flexibility to handle a multi-hour transcription job properly.
If you're already invested in Make or Zapier, they'll work, but expect occasional failures on longer episodes (90+ minutes). Self-host n8n if you can, as cloud versions sometimes have timeout issues with large file operations.
Step-by-Step Architecture
Here's the flow:
- Podcast episode publishes to your hosting platform (Transistor, Podbean, or wherever).
- A webhook notification reaches n8n, containing the episode audio file URL.
- n8n downloads the audio (or receives a direct file upload) and sends it to Whisper API for transcription.
- Once transcription completes, n8n passes the full transcript text to Mirra's chapter detection endpoint.
- Mirra returns chapter boundaries with summaries.
- n8n sends the original audio file and chapter timestamps to Clipwing.
- Clipwing generates social clips (60-second cuts, 30-second cuts, vertical format, etc.) and returns download URLs.
- n8n formats all outputs into a clean JSON structure and stores it (or sends it to your CMS, email, or Slack).
Let's build this in n8n.
Setting Up the Webhook
First, create an n8n workflow and add a Webhook node. Configure it to accept POST requests. Your podcast host will send something like this:
POST https://your-n8n-instance.com/webhook/podcast-episode
{
"episode_id": "ep_12345",
"title": "How to Build a Personal Brand",
"audio_url": "https://cdn.example.com/episodes/ep_12345.mp3",
"duration_seconds": 5400,
"published_at": "2025-01-15T10:00:00Z"
}
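In n8n, the Webhook node parses this automatically and passes it to the next step. The parsed payload sits under the body key of the node's output, alongside the request headers and query parameters.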
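Before anything downstream runs, it can help to reject malformed payloads early. The following is a minimal Python sketch of that check, assuming the field names from the example payload above; the function name validate_episode_payload is hypothetical, and in n8n the same logic would live in a Code node.

```python
from datetime import datetime

# Field names taken from the example webhook payload.
REQUIRED_FIELDS = ("episode_id", "title", "audio_url", "duration_seconds", "published_at")


def validate_episode_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload]
    if problems:
        return problems

    if not str(payload["audio_url"]).startswith(("http://", "https://")):
        problems.append("audio_url is not an http(s) URL")

    duration = payload["duration_seconds"]
    if isinstance(duration, bool) or not isinstance(duration, (int, float)) or duration <= 0:
        problems.append("duration_seconds must be a positive number")

    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        datetime.fromisoformat(str(payload["published_at"]).replace("Z", "+00:00"))
    except ValueError:
        problems.append("published_at is not an ISO 8601 timestamp")

    return problems
```

If the returned list is not empty, stop the run and alert rather than sending a bad URL to the transcription step.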
Transcription with Whisper API
Add an HTTP Request node to call Whisper. You'll need an OpenAI API key.
HTTP Request Node Configuration:
Method: POST
URL: https://api.openai.com/v1/audio/transcriptions
Headers:
Authorization: Bearer YOUR_OPENAI_API_KEY
Body (form-data):
file: [audio file from previous step]
model: whisper-1
language: en
timestamp_granularities[]: segment
response_format: verbose_json
If the audio URL comes from your podcast host, you'll need to download it first. Add a separate HTTP node:
HTTP Request: Download Audio
Method: GET
URL: {{ $json.body.audio_url }}
Send Query: false
Response Type: File
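Then use that downloaded file in the Whisper request. Whisper accepts files up to 25 MB, which a long episode can easily exceed: 90 minutes of MP3 at 128 kbps is roughly 86 MB. Re-encode to a lower bitrate or split the file before uploading (see "Handle very long episodes" under Pro Tips).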
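A quick size estimate tells you whether an episode will fit under the upload limit before you send it. This is a rough sketch that assumes constant-bitrate audio and reads "25 MB" conservatively as 25,000,000 bytes; the function names are illustrative.

```python
# Conservative reading of the 25 MB upload limit (an assumption about how the limit is counted).
WHISPER_LIMIT_BYTES = 25 * 1000 * 1000


def estimated_size_bytes(duration_seconds: float, bitrate_kbps: int) -> int:
    """Approximate size of a constant-bitrate audio file, ignoring container overhead."""
    return int(duration_seconds * bitrate_kbps * 1000 / 8)


def max_bitrate_kbps(duration_seconds: float, limit_bytes: int = WHISPER_LIMIT_BYTES) -> int:
    """Highest whole-number bitrate (kbps) that keeps the file under the limit."""
    return int(limit_bytes * 8 / (duration_seconds * 1000))


# A 90-minute episode (5400 s) at 128 kbps is about 86.4 MB, well over the limit.
size = estimated_size_bytes(5400, 128)
fits = size <= WHISPER_LIMIT_BYTES
```

For the 5400-second example episode, max_bitrate_kbps(5400) comes out at 37, so a low-bitrate mono re-encode is one way to fit the whole episode into a single request.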
Whisper returns this structure:
{
"text": "Full transcript as plain text...",
"segments": [
{
"id": 0,
"seek": 0,
"start": 0.0,
"end": 5.2,
"text": "Hello, welcome to the podcast.",
"avg_logprob": -0.25,
"compression_ratio": 1.5,
"no_speech_prob": 0.001
}
],
"language": "en"
}
Store the full transcript somewhere accessible. A Webhook response or a saved file works fine.
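If you want a subtitle file as well as the raw JSON, the segments array converts directly. Here is a small sketch that turns segments shaped like the response above into SRT text; only the start, end, and text fields are used, and the function names are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from Whisper-style segments (start, end, text)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_line}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"
```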
Chapter Detection with Mirra
Mirra's API endpoint for chapter detection looks like this:
HTTP Request Node Configuration:
Method: POST
URL: https://api.mirra.ai/v1/chapters/detect
Headers:
Authorization: Bearer YOUR_MIRRA_API_KEY
Content-Type: application/json
Body (JSON):
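{
"transcript": {{ JSON.stringify($json.text) }},
"episode_title": {{ JSON.stringify($('Webhook').item.json.body.title) }},
"duration_seconds": {{ $('Webhook').item.json.body.duration_seconds }},
"language": "en"
}
JSON.stringify escapes the quotes and line breaks that a real transcript contains, which would otherwise produce invalid JSON. The title and duration come from the Webhook node (assumed here to keep its default name, Webhook), because $json only holds the output of the previous node. Set the timeout on this node to at least 30 seconds, as Mirra's analysis can take a while for longer transcripts.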
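If you build the request body in code rather than in an expression, let a JSON serialiser handle the escaping. This sketch assumes the request fields shown above; build_chapter_request is a hypothetical helper, not part of any Mirra SDK.

```python
import json


def build_chapter_request(transcript: str, title: str,
                          duration_seconds: int, language: str = "en") -> str:
    """Serialise the chapter-detection request body as a JSON string."""
    body = {
        "transcript": transcript,
        "episode_title": title,
        "duration_seconds": duration_seconds,
        "language": language,
    }
    # json.dumps escapes quotes and newlines inside the transcript text.
    return json.dumps(body, ensure_ascii=False)
```

Pasting transcript text between quotes by hand breaks as soon as a speaker is quoted or a segment contains a line break, which is the failure this avoids.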
Mirra returns chapters with timestamps:
{
"chapters": [
{
"start_time": 0,
"end_time": 420,
"title": "Introduction",
"summary": "The host introduces today's topic on personal branding..."
},
{
"start_time": 420,
"end_time": 1240,
"title": "Building Your Origin Story",
"summary": "Discussion on crafting a compelling personal narrative..."
}
],
"total_chapters": 5
}
This is where the workflow gets powerful. You now have structured chapter data tied to exact timestamps. No manual chapter marking needed.
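Since clip generation depends on these timestamps, a sanity check on the chapter list is cheap insurance. The sketch below assumes the chapter shape shown above (start_time and end_time in seconds); chapter_problems is an illustrative name.

```python
def chapter_problems(chapters: list[dict], duration_seconds: float) -> list[str]:
    """Return human-readable problems found in a chapter list; empty means it looks sane."""
    problems = []
    previous_end = 0
    for index, chapter in enumerate(chapters):
        start, end = chapter["start_time"], chapter["end_time"]
        if end <= start:
            problems.append(f"chapter {index}: end_time {end} is not after start_time {start}")
        if start < previous_end:
            problems.append(f"chapter {index}: overlaps the previous chapter")
        elif start > previous_end:
            problems.append(f"chapter {index}: gap of {start - previous_end}s before it starts")
        if end > duration_seconds:
            problems.append(f"chapter {index}: runs past the end of the episode")
        previous_end = max(previous_end, end)
    return problems
```

Run it with the duration_seconds value from the webhook payload and route any non-empty result to a manual review step.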
Clip Generation with Clipwing
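Clipwing requires both the original video/audio file and the chapter data. The workflow passes both. The source URL is read from the Webhook node (assumed to keep its default name, Webhook), and the chapters array from the previous Mirra node is serialised with JSON.stringify so it arrives as valid JSON:
HTTP Request Node Configuration:
Method: POST
URL: https://api.clipwing.io/v1/clips/generate
Headers:
Authorization: Bearer YOUR_CLIPWING_API_KEY
Content-Type: application/json
Body (JSON):
{
"source_url": "{{ $('Webhook').item.json.body.audio_url }}",
"chapters": {{ JSON.stringify($json.chapters) }},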
"output_formats": [
{
"format": "video",
"resolution": "1080p",
"aspect_ratio": "9:16",
"max_duration": 60,
"include_captions": true
},
{
"format": "video",
"resolution": "1080p",
"aspect_ratio": "16:9",
"max_duration": 30,
"include_captions": false
}
],
"caption_style": {
"font": "Arial",
"size": 24,
"position": "bottom",
"background": "semi-transparent"
}
}
Clipwing will process this asynchronously. It returns a job ID immediately:
{
"job_id": "job_xyz789",
"status": "processing",
"estimated_completion_seconds": 180
}
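You'll need to poll this job until completion. n8n has no single polling node, so build a small loop: a Wait node that pauses 10 seconds, the status request below, and an If node that routes back to the Wait node while status is still "processing":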
HTTP Request: Check Clipwing Status
Method: GET
URL: https://api.clipwing.io/v1/clips/jobs/{{ $json.job_id }}
Headers:
Authorization: Bearer YOUR_CLIPWING_API_KEY
When status changes to "completed", Clipwing provides download URLs for each clip and format:
{
"job_id": "job_xyz789",
"status": "completed",
"clips": [
{
"chapter_index": 0,
"chapter_title": "Introduction",
"formats": [
{
"format": "9:16",
"duration": 60,
"download_url": "https://clips.clipwing.io/...",
"thumbnail_url": "https://thumbs.clipwing.io/..."
}
]
}
]
}
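The nested response is easier to store or post once it is flattened to one row per clip file. A small sketch, assuming the completed-job shape shown above; list_clip_downloads is an illustrative name.

```python
def list_clip_downloads(job: dict) -> list[dict]:
    """Flatten a completed job into one row per chapter and output format."""
    if job.get("status") != "completed":
        return []
    rows = []
    for clip in job.get("clips", []):
        for fmt in clip.get("formats", []):
            rows.append({
                "chapter_index": clip["chapter_index"],
                "chapter_title": clip["chapter_title"],
                "format": fmt["format"],
                "duration": fmt["duration"],
                "download_url": fmt["download_url"],
            })
    return rows
```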
Storing and Distributing Results
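Add a final step that consolidates everything into a single JSON document. Use a Set node to structure the output. The template below uses $json for brevity; in a real workflow $json holds only the previous node's output, so read earlier values by node name, for example $('Webhook').item.json.body.title: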
{
"episode_id": "{{ $json.episode_id }}",
"title": "{{ $json.title }}",
"transcript": {
"full_text": "{{ $json.text }}",
"language": "en"
},
"chapters": {{ $json.chapters }},
"clips": {{ $json.clips }},
"generated_at": "{{ $now.toISO() }}"
}
Then choose where this goes. Options:
Send to your CMS or database: Add a PostgreSQL or MongoDB node to save the entire structure as a record tied to the episode ID. Future content calendars, search, and analytics all become possible.
Email to yourself: Add an Email node that sends the JSON as a downloadable attachment plus a summary in the body.
Post to Slack: Create a Slack message with the episode title, chapter list, and clip download links. Notify your team immediately when new content is ready.
Upload clips to a CDN: Add additional HTTP requests that PUT the clip files to AWS S3 or Cloudflare R2. This way clips are globally available the moment they're generated.
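For the Slack option, the message body is just the chapter list and clip links rendered as text. This sketch assumes the chapter and clip shapes shown earlier and uses Slack's angle-bracket link syntax; build_slack_text and format_clock are illustrative names.

```python
def format_clock(seconds: float) -> str:
    """Render seconds as H:MM:SS for chapter listings."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def build_slack_text(title: str, chapters: list[dict], clips: list[dict]) -> str:
    """Build a plain-text Slack message with chapters and clip download links."""
    lines = [f"*New episode processed:* {title}", "", "*Chapters*"]
    for chapter in chapters:
        lines.append(f"- {format_clock(chapter['start_time'])} {chapter['title']}")
    lines += ["", "*Clips*"]
    for clip in clips:
        for fmt in clip["formats"]:
            label = f"{clip['chapter_title']} ({fmt['format']}, {fmt['duration']}s)"
            lines.append(f"- <{fmt['download_url']}|{label}>")
    return "\n".join(lines)
```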
Error Handling and Retries
Real workflows fail sometimes. Add error handling to each critical node:
- Whisper timeout: Set a 5-minute timeout and retry twice. If it still fails, send an alert and stop this run, since every later step needs the transcript; transcribing the file manually and re-triggering the workflow is quick.
- Mirra analysis failure: Provide sensible defaults (automatic chapter breaks every 15 minutes) if Mirra can't analyse the transcript.
- Clipwing polling timeout: After 15 minutes of polling, treat the job as still running rather than failed; don't error out. Check status manually via the dashboard later.
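The 15-minute default for a failed Mirra call is simple to generate. Here is a sketch that produces chapters in the same shape Mirra returns, so the Clipwing step does not need to know a fallback was used; fallback_chapters and the "Part N" titles are assumptions for illustration.

```python
def fallback_chapters(duration_seconds: int, interval_seconds: int = 900) -> list[dict]:
    """Split an episode into fixed-length chapters (default: every 15 minutes)."""
    chapters = []
    start = 0
    while start < duration_seconds:
        # The final chapter is shortened so it ends exactly at the episode end.
        end = min(start + interval_seconds, duration_seconds)
        chapters.append({
            "start_time": start,
            "end_time": end,
            "title": f"Part {len(chapters) + 1}",
            "summary": "",
        })
        start = end
    return chapters
```

For the 5400-second example episode this yields six chapters of 900 seconds each.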
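n8n has no dedicated Try/Catch node. Instead, configure retries and error behaviour in each node's Settings tab and attach the alert to the node's error output:
Node: [Whisper API call]
Retry On Fail: true, Max Tries: 3
On Error: Continue (using error output)
Error output: Send email alert with error details, then stop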
For Clipwing polling specifically, set max iterations to 90 (15 minutes at 10-second intervals), then have a fallback action rather than a hard error.
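The capped polling loop can be sketched as follows. The status fetcher is passed in as a function so the example stays independent of any HTTP client; poll_job and the "timed_out" marker are illustrative choices, not part of Clipwing's API.

```python
import time
from typing import Callable


def poll_job(fetch_status: Callable[[], dict],
             max_attempts: int = 90,
             interval_seconds: float = 10.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reports "completed" or the attempt cap is reached."""
    for attempt in range(1, max_attempts + 1):
        job = fetch_status()
        if job.get("status") == "completed":
            return job
        if attempt < max_attempts:
            sleep(interval_seconds)
    # Fallback instead of a hard error: the caller decides what to do next.
    return {"status": "timed_out", "attempts": max_attempts}
```

On a timed_out result the workflow can post a reminder to check the dashboard instead of failing the whole run.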
The Manual Alternative
If you prefer more control or want to handle special cases differently, you can use each tool independently:
- Upload your episode audio to OpenAI's Whisper service directly (or use Whisper Desktop) to get a transcript. Download it as an SRT or VTT file.
- Copy the full transcript text into Mirra's web interface, review the suggested chapters, and adjust timestamps if needed.
- Download the chapter data as JSON, then upload your video file plus the JSON to Clipwing's web app. Configure each output format manually.
- Download the finished clips and upload them to your social media scheduling tool or CDN.
This approach takes roughly 20-30 minutes of hands-on time per episode, compared with 5-10 minutes of review when the workflow is automated. It also lets you listen to the episode again while marking chapters, which sometimes reveals better chapter placements than the algorithm suggests. For shows where episode quality matters more than speed, this is reasonable.
Pro Tips
Manage Whisper API costs. Whisper charges $0.006 per minute of audio. A 90-minute episode costs $0.54. This is cheap, but if you publish multiple episodes per week or process a back catalogue, costs add up. One tip: use Whisper's language parameter to specify English upfront, which improves both speed and accuracy. If your orchestration tool supports scheduling, you can also batch transcription requests into off-peak hours (2-4 AM UTC); this does not change the per-minute price, but it keeps long jobs away from your busiest workflow hours.
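The arithmetic is easy to check for your own publishing schedule. This sketch uses the $0.006 per minute rate quoted above and Decimal to avoid floating-point rounding; whisper_cost is an illustrative name.

```python
from decimal import Decimal

# Per-minute rate quoted in this article.
WHISPER_RATE_PER_MINUTE = Decimal("0.006")


def whisper_cost(duration_minutes: int, episodes: int = 1) -> Decimal:
    """Transcription cost in dollars, rounded to cents."""
    total = WHISPER_RATE_PER_MINUTE * Decimal(duration_minutes) * episodes
    return total.quantize(Decimal("0.01"))


one_episode = whisper_cost(90)        # 0.54
weekly_show = whisper_cost(90, 4)     # 2.16 for four episodes in a month
```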
Handle very long episodes. Whisper has a 25 MB file limit, and long episodes at typical podcast bitrates exceed it (a 90-minute MP3 at 128 kbps is roughly 86 MB, and video files are larger still). Add a step in n8n that checks file size and, when the file is too large, either re-encodes it to a lower bitrate or splits the audio into 20-minute chunks using FFmpeg, transcribes each chunk separately, then concatenates the transcripts after shifting each chunk's timestamps by its start offset.
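The part that is easy to get wrong when chunking is the timestamps: each chunk's segments start again at zero. A minimal sketch of the planning and merging steps, assuming Whisper-style segments with start, end, and text; the splitting itself is left to FFmpeg, and the function names are illustrative.

```python
def plan_chunks(duration_seconds: int, chunk_seconds: int = 1200) -> list[tuple[int, int]]:
    """Return (start, end) second ranges covering the episode in 20-minute chunks."""
    return [(start, min(start + chunk_seconds, duration_seconds))
            for start in range(0, duration_seconds, chunk_seconds)]


def merge_chunk_segments(chunks: list[tuple[int, list[dict]]]) -> list[dict]:
    """Merge per-chunk segments into one list with episode-level timestamps.

    Each item in chunks is (chunk_start_offset_seconds, segments_for_that_chunk).
    """
    merged = []
    for offset, segments in sorted(chunks, key=lambda chunk: chunk[0]):
        for seg in segments:
            merged.append({
                "id": len(merged),
                # Shift chunk-relative times to episode-relative times.
                "start": round(seg["start"] + offset, 3),
                "end": round(seg["end"] + offset, 3),
                "text": seg["text"],
            })
    return merged
```

Joining the text fields of the merged list with spaces gives the full transcript that the chapter detection step expects.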
Test the workflow with a short episode first. Don't run your entire archive through this on day one. Pick a recent 30-minute episode, test each tool independently, verify the JSON output at each step, then run the full workflow. You'll catch configuration errors before you're waiting on a 2-hour episode to process.
Set reasonable expectations for Mirra's chapters. Mirra's algorithm works best on interviews and narrative-driven content. Highly technical episodes with constant topic switching might not break into obvious chapters. Always review the output and adjust manually if you spot issues. The automation saves you 80% of the work, not 100%.
Monitor API rate limits closely. OpenAI applies rate limits to Whisper requests that depend on your account tier, on top of the per-minute charge. Mirra allows 100 requests per minute on most plans. Clipwing's free tier allows 3 concurrent clip generation jobs. If you run multiple episodes through the workflow simultaneously, you'll hit Clipwing's limit. Stagger episodes using n8n's scheduling features, or upgrade Clipwing's plan to allow more concurrent jobs.
Store intermediate outputs. Save the transcript, chapter data, and clip URLs even if you don't use them immediately. These become invaluable for future projects: clip compilations, transcription archives, search functionality on your website, and training data for other AI systems. A simple PostgreSQL database indexed by episode ID costs almost nothing.
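An episode-keyed store needs very little schema. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs without a server; the table and function names are hypothetical, and the same single-table layout carries over to a JSON column in PostgreSQL.

```python
import json
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the table on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episode_outputs ("
        "episode_id TEXT PRIMARY KEY, title TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def save_episode(conn: sqlite3.Connection, episode_id: str, title: str, payload: dict) -> None:
    """Insert or overwrite the stored outputs for one episode."""
    conn.execute(
        "INSERT OR REPLACE INTO episode_outputs (episode_id, title, payload) VALUES (?, ?, ?)",
        (episode_id, title, json.dumps(payload)),
    )
    conn.commit()


def load_episode(conn: sqlite3.Connection, episode_id: str) -> Optional[dict]:
    """Return the stored payload, or None if the episode is unknown."""
    row = conn.execute(
        "SELECT payload FROM episode_outputs WHERE episode_id = ?", (episode_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Re-running the workflow for the same episode overwrites the earlier record, which keeps reprocessing idempotent.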
Cost Breakdown
| Tool | Plan Needed | Monthly Cost | Notes |
|---|---|---|---|
| Whisper API | Pay-as-you-go | $0.006 per minute of audio | One 90-minute episode = $0.54. Roughly $2-3 per month for a weekly podcast. |
| Mirra | Starter | $29 | Includes 1,000 analysis requests per month. Sufficient for 250+ episodes per month. |
| Clipwing | Professional | $99 | Includes 50 concurrent clip jobs, 1 TB storage, priority processing. |
| n8n | Cloud | $20 (or self-hosted free) | Cloud plan covers up to 40,000 workflow executions per month. Self-hosted n8n is free but requires your own server. |
| Total | | $148-150 per month | Based on one 90-minute episode per week. The fixed plans leave headroom for roughly 250 episodes a month, but Whisper charges scale with audio minutes. |
If you publish fewer than 4 episodes per month, Clipwing's Hobby plan ($29) and Mirra's free tier suffice, bringing the total to around $35 per month with self-hosted n8n.
The workflow described here handles the repetitive 80% of podcast production. You still need a human to listen and decide on episode strategy, guest quality, and whether chapters make sense. But the mechanical work, the file conversions and timing adjustments, now happens in the background. For creators managing multiple shows or publishing frequently, this reclaims hours every week. Set it up once, test it thoroughly, then let it work invisibly while you focus on making better episodes.