Music production pipeline from lyrics to mastered track with AI vocals

Most bedroom producers hit the same wall: they can write a decent hook, maybe sketch out drums and bass, but the moment vocals come into play, they're stuck. Studio time costs £50 to £150 per hour. A professional vocalist wants £200 to £500 per track. Mastering runs another £75 to £200. By the time a solo artist budgets for these essentials, their DIY project has become a luxury they cannot afford, and the track stays unfinished in a folder somewhere.

What if you could move from lyric to release-ready master in under two hours, spending less than £20? Not by cutting corners or compromising on quality, but by treating production like an assembly line where each AI tool does exactly what it does best. You write the lyrics. One system generates the instrumental. Another synthesises professional vocals. A third masters everything to competitive loudness standards. No context-switching between platforms. No manual exports and re-imports. Just data flowing from one tool to the next, fully automated.

This workflow is not hypothetical. It works, it's affordable, and it produces genuinely usable results. Here's how to build it.

The Automated Workflow

The core chain looks like this: a trigger (your lyrics in a spreadsheet or email) fires a webhook; an orchestration platform coordinates the calls to Hydra AI for instrumental generation, ElevenLabs for vocal synthesis, and Landr for mastering; the final file lands in your cloud storage ready to listen. We'll use n8n as the orchestrator, because it offers the most granular control and the cheapest self-hosted option. If you prefer a managed service, Make (Integromat) or Zapier both work, though they'll cost more per automation run.

Setting up the trigger

Start by defining what kicks off your workflow. The simplest approach is a webhook that accepts a JSON payload containing the song title, lyrics, and a few metadata fields.

json
{
  "songTitle": "Neon Dreams",
  "lyrics": "City lights and midnight flights...",
  "genre": "synthwave",
  "tempo": 95,
  "key": "A minor",
  "vocalStyle": "breathy"
}

In n8n, create a Webhook node set to POST and give it a descriptive name. Its URL becomes your entry point. You can trigger it manually with curl, from a form, or from a spreadsheet integration that fires whenever you add a new row.
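As a rough sketch of the manual trigger, the same payload could be posted from Python's standard library instead of curl. The webhook URL here is a placeholder, not a real endpoint; n8n shows the actual one on the Webhook node. The request is only built, not sent, so nothing fires until you uncomment the last lines.

```python
import json
import urllib.request

# Placeholder: copy the real URL from your n8n Webhook node.
WEBHOOK_URL = "https://n8n.example.com/webhook/new-song"


def build_trigger_request(payload: dict, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the song payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


song = {
    "songTitle": "Neon Dreams",
    "lyrics": "City lights and midnight flights...",
    "genre": "synthwave",
    "tempo": 95,
    "key": "A minor",
    "vocalStyle": "breathy",
}

request = build_trigger_request(song)

# To actually fire the workflow, uncomment:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status)
```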

Generating the instrumental with Hydra AI

Hydra AI accepts a request containing your genre, tempo, key, and duration. The response is a URL to your generated instrumental (usually a WAV file that will persist for 24 hours).

POST https://api.hydra.ai/v1/generate
Content-Type: application/json
Authorization: Bearer YOUR_HYDRA_API_KEY

{
  "genre": "synthwave",
  "tempo": 95,
  "key": "A minor",
  "duration": 180,
  "style": "atmospheric"
}

The response includes a track ID and a download URL. Store both; you'll need the URL in the next step. In n8n, add an HTTP Request node that makes this call. Map your webhook payload fields to Hydra's parameters. Once you get the response, extract the download URL using a Set node so you can reuse it later.
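The field mapping described above might look something like this if written out as plain Python. This is only a sketch: the `download_url` field name is a guess at the response shape, and the default `duration` (assumed to be in seconds) and `style` simply reuse the values from the example request, so check all three against the real API before relying on them.

```python
def hydra_request_body(payload: dict, duration: int = 180, style: str = "atmospheric") -> dict:
    """Map the webhook payload onto the generation parameters shown above."""
    missing = [field for field in ("genre", "tempo", "key") if field not in payload]
    if missing:
        raise ValueError(f"payload is missing: {', '.join(missing)}")
    return {
        "genre": payload["genre"],
        "tempo": int(payload["tempo"]),
        "key": payload["key"],
        "duration": duration,  # assumed to be seconds; not part of the webhook payload
        "style": style,        # assumed default, also not in the webhook payload
    }


def extract_download_url(response: dict, field: str = "download_url") -> str:
    """Pull the instrumental URL out of the parsed response.

    The field name is hypothetical; inspect a real response and adjust it.
    """
    url = response.get(field)
    if not url:
        raise KeyError(f"no '{field}' in response")
    return url
```

Failing loudly on a missing field or URL keeps the workflow from carrying an empty value into the vocal and mixing steps.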

Synthesising vocals with ElevenLabs

ElevenLabs' Text-to-Speech endpoint takes your lyrics and a voice ID, then returns an audio file (MP3 or WAV). The tricky part is timing: you need to match the vocal duration roughly to the instrumental so they fit together.

POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
Content-Type: application/json
xi-api-key: YOUR_ELEVENLABS_API_KEY

{
  "text": "City lights and midnight flights...",
  "model_id": "eleven_monolingual_v1",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  }
}

Use the voice ID that matches your desired vocal style. Stability controls how consistent the delivery is; similarity_boost keeps it close to the reference voice. Aim for stability around 0.5 and similarity around 0.75 for natural results. The endpoint returns audio directly. Save it to a temporary variable or upload it to a file storage service (Google Drive, Dropbox, or S3) so Landr can fetch it later.
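For reference, here is a minimal sketch of the same request built in Python, with a small helper for writing the returned audio to disk. The voice ID and API key are placeholders, and `save_audio` is an illustrative helper rather than part of any SDK. The request is constructed but not sent.

```python
import json
import urllib.parse
import urllib.request

TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(lyrics: str, voice_id: str, api_key: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.75) -> urllib.request.Request:
    """Build (but do not send) the text-to-speech request shown above."""
    body = {
        "text": lyrics,
        "model_id": "eleven_monolingual_v1",
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        TTS_URL.format(voice_id=urllib.parse.quote(voice_id, safe="")),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "xi-api-key": api_key},
        method="POST",
    )


def save_audio(audio_bytes: bytes, path: str) -> int:
    """Write the raw audio returned by the endpoint to a file; return bytes written."""
    with open(path, "wb") as handle:
        return handle.write(audio_bytes)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` and handing `response.read()` to `save_audio`.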

Combining instrumental and vocals

This is the step that separates amateur from usable. You need to mix the instrumental and the vocal file together. You have two options here.

Option 1: Use an audio mixing library embedded in n8n. This requires running n8n with a custom node that calls FFmpeg or a similar tool. FFmpeg can layer two audio tracks:

bash
ffmpeg -i instrumental.wav -i vocal.wav -filter_complex "[0][1]amix=inputs=2:duration=longest" output.wav

If you're self-hosting n8n on a VPS or local machine, this is straightforward. Add an Execute Command node that runs the ffmpeg command on your downloaded files.

Option 2: Skip this step for now and let Landr handle it. Upload both files to Landr separately; some Landr workflows can accept multiple stems and mix them during mastering. Check their API documentation for stem support.

We'll assume Option 1 for this guide. After ffmpeg combines the tracks, upload the mixed file back to your cloud storage.
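If you would rather drive the mix from a script than from a node, the same ffmpeg call can be wrapped in a few lines of Python. This is a sketch that assumes ffmpeg is installed and on the PATH; the file names are whatever you downloaded in the earlier steps. The `-y` flag, which overwrites an existing output file, is the only addition to the command above.

```python
import subprocess


def build_mix_command(instrumental: str, vocal: str, output: str) -> list[str]:
    """Return the ffmpeg argument list that layers the vocal over the instrumental."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", instrumental,
        "-i", vocal,
        "-filter_complex", "[0][1]amix=inputs=2:duration=longest",
        output,
    ]


def mix_tracks(instrumental: str, vocal: str, output: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if the mix fails."""
    subprocess.run(
        build_mix_command(instrumental, vocal, output),
        check=True,
        capture_output=True,
    )
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a song title contains spaces or apostrophes.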

Mastering with Landr

Landr's API accepts an audio file and returns a mastered version optimised for loudness, EQ balance, and stereo width. Landr is one of the few AI mastering tools that actually produces competitive results; it's worth the cost.

POST https://www.landr.com/api/v1/upload
Content-Type: multipart/form-data
Authorization: Bearer YOUR_LANDR_API_KEY

file: <your mixed audio file>
genre: synthwave
service: master

The response includes a job ID. Mastering takes anywhere from 30 seconds to 5 minutes depending on Landr's queue. Poll the job endpoint every 10 seconds until status is "complete", then download the mastered file.

GET https://www.landr.com/api/v1/mastered/{job_id}
Authorization: Bearer YOUR_LANDR_API_KEY
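The polling logic is easy to get subtly wrong, so here is a small sketch of it in Python. The status check is passed in as a function, which keeps the loop independent of the exact response format. Only the "complete" status comes from the description above; the "failed" status is an assumption, as is the five-minute timeout taken from the upper end of the quoted mastering time.

```python
import time
from typing import Callable


def wait_for_master(fetch_status: Callable[[], str],
                    interval: float = 10.0,
                    timeout: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until the job reports "complete"; return how many polls it took."""
    waited = 0.0
    polls = 0
    while True:
        polls += 1
        status = fetch_status()
        if status == "complete":
            return polls
        if status == "failed":  # assumed status name; adjust to the real API
            raise RuntimeError("mastering job failed")
        if waited >= timeout:
            raise TimeoutError(f"job not complete after {waited:.0f} seconds")
        sleep(interval)
        waited += interval
```

In practice `fetch_status` would wrap the GET request above and return the status field from its response. The hard timeout matters: without it, a stuck job keeps the workflow polling forever.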

Final automation loop in n8n

Your n8n workflow now looks like this:

1. Webhook receives lyrics and metadata.
2. HTTP Request calls Hydra AI, extracts instrumental URL.
3. HTTP Request calls ElevenLabs, stores vocal file in cloud storage.
4. Execute Command node downloads both files, runs ffmpeg to mix them, uploads result.
5. HTTP Request uploads mixed file to Landr, receives job ID.
6. Wait node holds for 30 seconds.
7. Loop node polls Landr's API every 10 seconds until mastering is done.
8. HTTP Request downloads the final mastered file.
9. Google Drive node saves the file to a folder called "Finished Tracks".
10. Email node sends you a notification with the download link.

This entire sequence takes 5 to 10 minutes from webhook trigger to finished master in your drive.

The Manual Alternative

If you want human oversight at any point (for instance, to approve the vocal before mastering), skip the full automation and instead have n8n stop at step 4. You listen to the mixed version, tweak the levels in Audacity or your DAW if needed, then manually upload to Landr. This adds 10 minutes of work but gives you absolute control over how the vocal sits in the mix. Alternatively, use Zapier to build a simpler 2-3 step workflow that just triggers Hydra AI and ElevenLabs separately, then download both files and mix them locally. This is slower but requires no coding knowledge and costs less per run if you only do it occasionally.

Pro Tips

Rate limits and queuing. Hydra AI and ElevenLabs both have rate limits (typically 10 to 100 requests per minute depending on your plan). If you're generating multiple tracks in quick succession, add a 60-second delay between webhook triggers or batch them with a 10-minute gap. Landr's mastering queue can back up during peak hours (6 PM to 11 PM UK time); schedule production runs for early morning if you're impatient.

Vocal timing and pacing. ElevenLabs reads at a consistent speed, which works fine for rap or spoken word but sounds robotic for emotional ballads. Slow down the delivery by adding line breaks and punctuation to your lyrics, or record the vocals manually and use ElevenLabs only as a backup. For better results, pay for ElevenLabs' Premium Voice feature, which includes prosody controls.

Audio format compatibility. Hydra AI usually outputs WAV; ElevenLabs outputs MP3 or WAV; Landr accepts both but prefers WAV at 16-bit/44.1 kHz or better. To avoid surprises, use ffmpeg to convert all audio to 24-bit/48 kHz WAV before mixing.

Cost optimisation. Generate the instrumental first, check the duration, then set your ElevenLabs text length accordingly. If your instrumental is 3 minutes but your lyrics only take 90 seconds to read aloud, Landr will have blank space to master. Pad the lyrics with a chorus repeat, or, if the vocal runs longer than the instrumental, use ffmpeg to loop the instrumental to match the vocal length.

Error handling. Add error handling in n8n after every HTTP request. If Hydra AI fails, log the error and send yourself an alert rather than proceeding with a missing file. Store job IDs and file URLs in a database so you can retry failed runs manually without starting from scratch.
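Two of these tips reduce to a line or two of code each. The sketch below builds the ffmpeg command for the 24-bit/48 kHz WAV conversion and works out how many passes of the instrumental are needed to cover a longer vocal. Both function names are illustrative, and the conversion assumes ffmpeg is available on the machine running n8n.

```python
import math


def build_normalise_command(source: str, target: str) -> list[str]:
    """Return an ffmpeg command that converts any input to 24-bit, 48 kHz WAV."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the target if it exists
        "-i", source,
        "-ar", "48000",        # resample to 48 kHz
        "-c:a", "pcm_s24le",   # 24-bit little-endian PCM
        target,
    ]


def loops_needed(instrumental_seconds: float, vocal_seconds: float) -> int:
    """Number of back-to-back plays of the instrumental needed to cover the vocal."""
    if instrumental_seconds <= 0:
        raise ValueError("instrumental duration must be positive")
    return max(1, math.ceil(vocal_seconds / instrumental_seconds))
```

Run the conversion on both the instrumental and the vocal before the mixing step, so that ffmpeg is not left to reconcile mismatched sample rates on its own.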

Cost Breakdown

Tool	Plan Needed	Monthly Cost	Notes
Hydra AI	Starter	£9.99	50 generations per month; most solo projects need 2–4.
ElevenLabs	Creator	£9.99	10,000 characters per month; a 3-minute song is roughly 300–500 characters.
Landr	Unlimited Mastering	£11.99	Unlimited masters, one per day. Upgrade to Pro (£19.99) for multiple masters per day.
n8n	Self-hosted (free) or Cloud Starter	£0–25	Free if self-hosted on your own VPS. Cloud Starter (£25/month) if you prefer managed hosting.
Cloud storage	Google Drive (free) or S3 Standard	£0–5	Google Drive free tier includes 15 GB. S3 Standard costs about $0.023 per GB per month.
Total		£32–62	Per month; the range reflects your hosting and storage choices.
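To check the monthly range yourself, sum the cheapest and the most expensive option in each row of the table. The short sketch below does that with the figures as listed; swap in your own plan prices if they differ.

```python
# (lowest, highest) monthly cost in GBP for each row of the table
monthly_costs = {
    "Hydra AI (Starter)": (9.99, 9.99),
    "ElevenLabs (Creator)": (9.99, 9.99),
    "Landr (Unlimited Mastering)": (11.99, 11.99),
    "n8n (self-hosted to Cloud Starter)": (0.00, 25.00),
    "Cloud storage (Drive free tier to S3)": (0.00, 5.00),
}

low = round(sum(lowest for lowest, _ in monthly_costs.values()), 2)
high = round(sum(highest for _, highest in monthly_costs.values()), 2)

print(f"£{low:.2f} to £{high:.2f} per month")  # prints: £31.97 to £61.97 per month
```

Choosing Landr's Pro plan at £19.99 instead of Unlimited Mastering adds £8 to both ends of the range.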