You've written a demo track in your bedroom, and it's genuinely good. The melody hooks, the production is tight, but your voice recording sounds thin and amateur. Hiring a vocalist costs £500 minimum. A mixing engineer wants £800. Mastering runs another £300. That's £1,600 you don't have, and meanwhile, three other artists have already released similar tracks this week.

This is the reality for most independent musicians working outside major label infrastructure. You own the creative vision but lack access to the professional resources that make the difference between a bedroom demo and a release-ready track. Traditionally, you'd compromise on quality, save for months, or shelve the song entirely.

An automated AI music production pipeline changes this entirely. You can generate studio-quality vocals from your lyrics, produce complementary instrumental layers, mix everything coherently, and master the final product, all without touching a mixing console or paying session musicians. The workflow runs end-to-end with minimal human intervention, turning raw songwriting into a polished release in hours rather than weeks. This post shows how to build it.
The Automated Workflow
The core strategy here is sequential processing: lyrics go in at one end, a professionally mastered track emerges at the other. We'll use n8n as the orchestration layer because it has direct integrations with most music AI platforms and gives you fine-grained control over error handling and retries, which matters when you're orchestrating multiple API calls across different services. The flow looks like this:

1. Accept lyrics and song metadata (title, key, tempo, mood) via webhook
2. Generate an instrumental backing track using Hydra AI
3. Synthesise the vocal performance using ElevenLabs
4. Combine vocals and instrumentals using Bronze
5. Master the final mix using Landr

Start by setting up an n8n instance (self-hosted or cloud). Create a new workflow and add a Webhook Trigger node to receive incoming requests.
```http
POST /webhook/music-production
Content-Type: application/json

{
  "title": "Summer Clarity",
  "lyrics": "Waking up to morning light...",
  "tempo": 120,
  "key": "G major",
  "mood": "uplifting",
  "vocal_style": "warm",
  "artist_name": "Your Name"
}
```
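It's worth validating this payload before any paid API calls happen downstream. A minimal sketch in Python — the field names are taken from the example payload above, but the validator itself (and its tempo bounds) is an illustrative assumption, not part of n8n:

```python
# Validate an incoming music-production request before spending API credits.
# Field names match the example webhook payload; the rest is a sketch.
REQUIRED_FIELDS = {"title", "lyrics", "tempo", "key", "mood"}

def validate_payload(payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the payload is usable."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - payload.keys())]
    tempo = payload.get("tempo")
    if tempo is not None and not (40 <= tempo <= 220):
        errors.append(f"tempo out of range: {tempo}")
    if not payload.get("lyrics", "").strip():
        errors.append("lyrics must be non-empty")
    return errors

payload = {"title": "Summer Clarity", "lyrics": "Waking up to morning light...",
           "tempo": 120, "key": "G major", "mood": "uplifting"}
print(validate_payload(payload))  # → []
```

In n8n this logic would live in a Code node directly after the Webhook Trigger, returning an early error response instead of letting a malformed request burn generation credits.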
Once the webhook receives the data, pass it to an HTTP Request node to call the Hydra AI API. Hydra generates copyright-free instrumentals based on mood, tempo, and key, which is crucial because it sidesteps licensing complications entirely.
```http
POST https://api.hydra-ai.io/v1/generate-instrumental
Authorization: Bearer YOUR_HYDRA_API_KEY
Content-Type: application/json

{
  "tempo": 120,
  "key": "G major",
  "mood": "uplifting",
  "duration": 180,
  "style": "pop"
}
```
Hydra returns a URL pointing to an MP3 file. Store this URL in an n8n variable for later reference. The instrumental typically takes 10-20 seconds to generate.

Next, send the lyrics to ElevenLabs Turbo v2.5 to synthesise the vocal track. ElevenLabs excels at producing natural-sounding vocals with emotional consistency. You'll want to choose a voice that matches your artistic direction; the platform offers hundreds of pre-built voices, but you can also clone your own voice and have the model sing in your timbre.
```http
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream
xi-api-key: YOUR_ELEVENLABS_API_KEY
Content-Type: application/json

{
  "text": "Waking up to morning light, everything feels right...",
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.75,
    "similarity_boost": 0.85
  }
}
```
This returns a WAV file. The stability and similarity_boost parameters are important: stability at 0.75 gives you natural variation without sounding robotic, and similarity_boost at 0.85 keeps the vocal close to your chosen voice identity without excessive distortion.

ElevenLabs doesn't handle multi-layer vocal arrangements or melismatic (multi-note) singing particularly well, so if your song has multiple vocal layers or complex melodic runs, split the lyrics into separate chunks, call the API once per chunk, and stack the results later in Bronze.

Now you need to combine the instrumental and vocal(s) into a cohesive mix. This is where Bronze comes in. Bronze is an AI-driven music platform that handles vocal-to-instrumental alignment, automatic volume balancing, and basic EQ matching. It's not a full DAW, but it's precisely what you need for this workflow.
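If you do need multiple vocal layers, the chunk-splitting mentioned above can be as simple as breaking the lyrics on blank lines (stanza boundaries) with a character cap; each chunk becomes one ElevenLabs call and one entry in Bronze's vocal list. A rough sketch — the 500-character cap is an arbitrary assumption, not an ElevenLabs limit:

```python
def chunk_lyrics(lyrics: str, max_chars: int = 500) -> list[str]:
    """Split lyrics into chunks at blank-line (stanza) boundaries,
    keeping each chunk under max_chars so it fits one TTS request."""
    stanzas = [s.strip() for s in lyrics.split("\n\n") if s.strip()]
    chunks, current = [], ""
    for stanza in stanzas:
        candidate = f"{current}\n\n{stanza}".strip() if current else stanza
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Start a new chunk; a single stanza longer than max_chars is
            # passed through whole here -- real code would split it further.
            if current:
                chunks.append(current)
            current = stanza
    if current:
        chunks.append(current)
    return chunks
```

Each returned string gets its own ElevenLabs request; the resulting audio URLs are then collected into one list for the mix step.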
```http
POST https://api.bronze-music.io/v1/mix
Authorization: Bearer YOUR_BRONZE_API_KEY
Content-Type: application/json

{
  "instrumental_url": "https://output.hydra-ai.io/track_xyz.mp3",
  "vocal_urls": [
    "https://output.elevenlabs.io/vocal_abc.wav"
  ],
  "title": "Summer Clarity",
  "tempo": 120,
  "key": "G major"
}
```
Bronze processes the mix and returns a URL to a balanced, pre-mastered file. The entire process typically takes 30-60 seconds.

You now have a professionally balanced mix, but it's not yet ready for distribution. That final step belongs to Landr. Landr's AI mastering engine analyses your mix and applies compression, EQ, limiting, and loudness normalisation tailored to your genre and target platform (Spotify, Apple Music, YouTube, etc.). Mastering is the step most independent artists skip, which is why so many releases sound dull or fatiguing after a few listens.
```http
POST https://api.landr.com/v1/master
Authorization: Bearer YOUR_LANDR_API_KEY
Content-Type: application/json

{
  "audio_url": "https://output.bronze-music.io/mix_def.mp3",
  "target_platform": "spotify",
  "loudness_target": -14,
  "stem_separation": false
}
```
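The loudness_target should track where the song will actually be played. A small lookup using commonly cited streaming normalisation levels — these LUFS figures are general industry reference points, not values from Landr's API, so verify them against each platform's current documentation:

```python
# Commonly cited playback-normalisation targets in integrated LUFS.
# Platforms adjust these over time -- treat as starting points, not gospel.
LOUDNESS_TARGETS = {
    "spotify": -14,
    "apple_music": -16,
    "youtube": -14,
    "tidal": -14,
}

def loudness_for(platform: str, default: int = -14) -> int:
    """Pick a loudness target for the mastering request, falling back to -14."""
    return LOUDNESS_TARGETS.get(platform, default)

print(loudness_for("spotify"))  # → -14
```

If you master one file for several platforms, the safest single target is the loudest one in the set, since louder masters get turned down cleanly while quiet ones may be limited or left quiet.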
Landr returns a mastered stereo file and optionally a stem set (separate tracks for further editing). The mastering process takes 2-5 minutes depending on queue load.

In n8n, chain these HTTP Request nodes sequentially, using the "Continue on Error" setting for each node so that if one step fails, you get a clear error log rather than a silent timeout. Set a 10-minute total timeout at the workflow level; if any service takes longer, something is wrong and you should be notified.

Finally, add a node that stores the mastered file URL and metadata in your preferred database (PostgreSQL, MongoDB, Airtable, etc.) so you have a record of what you created and can retrieve it later. Optionally, add a Slack notification node so you get pinged when a track is ready.
```json
{
  "track_id": "uuid-here",
  "title": "Summer Clarity",
  "artist": "Your Name",
  "mastered_file_url": "https://output.landr.com/master_ghi.wav",
  "created_at": "2026-03-15T10:42:00Z",
  "cost_usd": 4.85,
  "processing_time_seconds": 187
}
```
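Outside n8n, the same chain-with-retries-and-deadline pattern is only a few lines of Python. This sketch stubs out the four service calls with lambdas; only the orchestration logic (sequential steps, per-step retries, an overall 10-minute deadline) mirrors the workflow described above:

```python
import time

def run_pipeline(payload, steps, retries=2, deadline_s=600):
    """Run named steps in order, retrying each up to `retries` times and
    aborting if the overall deadline is exceeded."""
    start = time.monotonic()
    result = payload
    for name, step in steps:
        for attempt in range(retries + 1):
            if time.monotonic() - start > deadline_s:
                raise TimeoutError(f"pipeline deadline exceeded at step '{name}'")
            try:
                result = step(result)
                break  # step succeeded, move on
            except Exception as exc:
                if attempt == retries:
                    raise RuntimeError(f"step '{name}' failed: {exc}") from exc

    return result

# Stub steps standing in for the real HTTP calls:
steps = [
    ("hydra",      lambda d: {**d, "instrumental_url": "stub.mp3"}),
    ("elevenlabs", lambda d: {**d, "vocal_url": "stub.wav"}),
    ("bronze",     lambda d: {**d, "mix_url": "stub_mix.mp3"}),
    ("landr",      lambda d: {**d, "mastered_url": "stub_master.wav"}),
]
out = run_pipeline({"title": "Summer Clarity"}, steps)
```

Replacing each lambda with a real HTTP call (plus response parsing) gives you a portable fallback if you ever need to run the pipeline outside n8n.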
The entire pipeline, from webhook trigger to mastered file, runs in 5-10 minutes. You've created a release-ready track from nothing but lyrics and a few parameters, with zero manual mixing work.
The Manual Alternative
If you want more creative control, you can stop after the Bronze step and manually refine the mix in a DAW like Ableton Live, Logic, or Reaper. Drop the balanced mix and instrumental stems into your project, adjust levels, EQ, reverb, and compression to taste, then submit the refined version to Landr for mastering only. This hybrid approach lets you shape the sound while still outsourcing the technical mastering step that requires acoustic treatment and monitor calibration most home studios lack.

Alternatively, if you have strong opinions about vocal performance, replace the ElevenLabs step with your own recording: record yourself singing the melody, clean up the timing in your DAW, then route it into Bronze alongside the Hydra-generated instrumental. This preserves your unique voice while using AI for the instrumental foundation and final mastering.
Pro Tips
Rate limiting and queuing.
ElevenLabs allows 10,000 characters per minute on the Starter plan; if you're processing multiple tracks daily, implement a queue system in n8n using the "Wait" node to space out requests.
Hydra and Landr also have concurrent job limits, so stagger requests to avoid hitting caps during peak hours.
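The Wait-node interval can be derived directly from the character budget. A sketch using the 10,000-characters-per-minute figure mentioned above (check your plan's actual limit before relying on it):

```python
def wait_seconds(chars_in_request: int, chars_per_minute: int = 10_000) -> float:
    """Minimum spacing between requests so a rolling
    characters-per-minute budget is never exceeded."""
    return 60.0 * chars_in_request / chars_per_minute

# A 2,500-character lyric chunk against a 10,000 char/min budget:
print(wait_seconds(2_500))  # → 15.0
```

In n8n, feed this value into the Wait node's duration expression so the delay scales with chunk size instead of using a fixed pause.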
Cost optimisation.
Run test workflows with shorter instrumental durations (60 seconds instead of 180) during development. The per-second pricing means you can prototype for pence rather than pounds, then only run full-length production workflows once the pipeline works reliably. Also, batch multiple songs in a single Landr submission if they're going to the same platform; Landr offers modest discounts for bulk mastering.
Error handling and retries.
Landr's API sometimes queues mastering jobs instead of processing them immediately. Instead of polling indefinitely, use Landr's webhook callback feature to get notified when mastering completes, then have n8n listen for that callback and trigger the next step. This avoids wasted API calls and handles concurrency elegantly.
Vocal timing.
ElevenLabs doesn't produce tempo-locked vocals by default. If the synthetic vocal drifts slightly ahead of or behind the beat, download the vocal file and time-stretch it in Audacity or your DAW before uploading to Bronze. Bronze itself can handle minor timing variances, but larger drift (>50 ms) will be audible as the vocal pulling against the beat.
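Once you've measured the drift, the stretch ratio needed to realign the vocal is just the target length over the rendered length. A quick helper — pure arithmetic, with the actual correction still done in Audacity or your DAW:

```python
def stretch_ratio(rendered_s: float, target_s: float) -> float:
    """Time-stretch ratio to land the vocal on the instrumental grid.
    A ratio > 1.0 lengthens (slows) the vocal; < 1.0 shortens (speeds) it."""
    if rendered_s <= 0:
        raise ValueError("rendered length must be positive")
    return target_s / rendered_s

# A vocal rendered at 181.2 s against a 180 s section:
print(round(stretch_ratio(181.2, 180.0), 4))  # → 0.9934
```

Note this assumes the drift is a uniform tempo offset; if the vocal wanders within the take, you'll need to warp it section by section instead.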
Voice cloning.
ElevenLabs' Voice Lab feature lets you upload 5-10 seconds of your own voice and create a custom voice model that sings in your timbre. This takes 24-48 hours to train but is worth doing early; once your clone exists, you can reuse it across dozens of tracks, giving your releases a coherent vocal identity that pre-built voices can't match.
Cost Breakdown
| Tool | Plan Needed | Monthly Cost | Notes |
|---|---|---|---|
| n8n | Cloud (paid tier) or self-hosted | £0–£50 | Self-hosted is free; cloud paid tier adds priority support |
| Hydra AI | Pro | £25–£50 | Pay-as-you-go typically cheaper for occasional use |
| ElevenLabs | Creator | £99 | 100,000 characters monthly; ~8-10 full songs at typical length |
| Bronze | Pro | £29/month | Unlimited mixes after subscription |
| Landr | Pro | £60–£120 | Depends on volume; typically £9–£15 per master at pay-as-you-go |
| Total | — | £213–£348 | Per month assuming moderate output (~10 tracks) |
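A quick sanity check, summing the per-tool rows above (low and high ends of each range):

```python
# (low, high) monthly GBP figures copied from the cost table rows.
costs = {
    "n8n":        (0, 50),
    "Hydra AI":   (25, 50),
    "ElevenLabs": (99, 99),
    "Bronze":     (29, 29),
    "Landr":      (60, 120),
}
low = sum(lo for lo, _ in costs.values())
high = sum(hi for _, hi in costs.values())
print(f"£{low}–£{high} per month")  # → £213–£348 per month
```

At roughly ten tracks a month, that works out to about £21–£35 per release at the top of the range, against the £1,600 per-track figure for hired professionals quoted in the introduction.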