
Getting Started with Voice Synthesis: Which Tool Fits Your Project and Budget

24 March 2026

Introduction

Voice synthesis tools have become essential for projects ranging from audiobook production to automated customer service systems. If you're building a web application, mobile app, or content platform that needs realistic speech output, you'll quickly realise that choosing the right tool matters more than you might expect. The difference between a robotic voice that sounds like a 1990s GPS and natural-sounding speech that users actually enjoy listening to can significantly impact user engagement and satisfaction.

This guide covers three popular voice synthesis platforms: ElevenLabs, iSpeech, and Resemble AI. Each serves different needs and budgets, so by the end of this post, you'll understand which one makes sense for your specific project. We'll walk through practical setup steps, show you actual code examples, and highlight the real costs involved, not just the marketing claims.

The goal isn't to declare one "winner". Instead, you'll learn the genuine strengths and limitations of each platform so you can make an informed decision based on your requirements, timeline, and budget.

What You'll Need

Before diving into the technical setup, make sure you have the following in place:

  • A text editor or IDE for writing code (VS Code, PyCharm, or similar)
  • Basic familiarity with HTTP requests and API keys
  • A working internet connection and a credit card for paid services
  • Python 3.8 or later installed if you plan to use Python examples
  • Node.js 14+ if you prefer JavaScript (the examples in this guide use Python, so you would need to port them)

All three tools offer free trial tiers, so you don't need to commit financially to test them. However, free tiers come with significant limitations: reduced monthly character limits, no access to advanced voices, or watermarked audio files. Budget roughly £20 to £50 per month if you're testing these tools seriously or running a small production project.

If you're building something for a single feature request or one-off project, the free tier might be sufficient. For production applications with regular usage, expect to spend between £30 and £200 per month depending on traffic and voice quality requirements.

Step-by-Step Setup

ElevenLabs

Account creation and API key generation

Start by visiting the ElevenLabs website and signing up for an account. The registration is straightforward: enter your email, choose a password, and verify your email address. Once logged in, navigate to your account settings and find the "API Key" section. Copy this key and store it somewhere safe, like a password manager or environment variable file.

export ELEVENLABS_API_KEY="your_key_here"

ElevenLabs offers a generous free tier: 10,000 characters per month with access to basic voices. This is enough to test the API without spending money.
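Because the scripts in this guide read their keys from environment variables, it can help to fail early when a variable is missing rather than send a request with an empty key. The following is a minimal sketch; the helper name require_env is hypothetical and not part of any SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before running."
        )
    return value


if __name__ == "__main__":
    # Raises RuntimeError if the key was never exported in this shell.
    api_key = require_env("ELEVENLABS_API_KEY")
    print(f"Loaded a key of {len(api_key)} characters")
```

The same helper works for the iSpeech and Resemble AI variables used later in this guide.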

Testing with a simple Python request

Create a new Python file called elevenlabs_test.py. This script will send text to ElevenLabs and download the resulting audio file:

import requests
import os

api_key = os.getenv("ELEVENLABS_API_KEY")
voice_id = "21m00Tcm4TlvDq8ikWAM"  # Pre-built voice ID for "Rachel"

text = "Hello, this is a test of ElevenLabs voice synthesis."
url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

headers = {
    "xi-api-key": api_key,
    "Content-Type": "application/json"
}

payload = {
    "text": text,
    "model_id": "eleven_monolingual_v1",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75
    }
}

response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    with open("output.mp3", "wb") as f:
        f.write(response.content)
    print("Audio file saved as output.mp3")
else:
    print(f"Error: {response.status_code} - {response.text}")

Run the script with python elevenlabs_test.py. If it succeeds, you'll have an MP3 file of your text spoken by the chosen ElevenLabs voice. The stability parameter controls how consistent the voice sounds (higher values mean less variation), while similarity_boost makes the voice closer to the original model.
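To experiment with these two settings without editing the payload by hand, a small helper can build the voice_settings dictionary. This is only a sketch: the function name is hypothetical, and the 0.0 to 1.0 range check is an assumption, so confirm the accepted range in the ElevenLabs documentation.

```python
def make_voice_settings(stability: float = 0.5, similarity_boost: float = 0.75) -> dict:
    """Build the voice_settings part of the request payload."""
    for name, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        # Assumption: both settings are fractions between 0.0 and 1.0.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {"stability": stability, "similarity_boost": similarity_boost}


# A steadier, less varied delivery than the defaults used in the test script.
narration = make_voice_settings(stability=0.8, similarity_boost=0.75)
print(narration)
```

Generating the same sentence with two or three different settings and listening to them side by side is usually the quickest way to find values that suit your content.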

Choosing voices and customising settings

ElevenLabs provides a list of available voices through their API. Fetch this list to see what's available:

import requests
import os

api_key = os.getenv("ELEVENLABS_API_KEY")
url = "https://api.elevenlabs.io/v1/voices"

headers = {
    "xi-api-key": api_key
}

response = requests.get(url, headers=headers)
response.raise_for_status()  # fail with an HTTP error here rather than a KeyError below
voices = response.json()["voices"]

for voice in voices:
    print(f"{voice['name']}: {voice['voice_id']}")

This will list all available voices with their IDs. Premium voices offer more natural intonation and emotion, but they require a paid subscription.
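If you would rather refer to voices by name in your own code, the listing can be turned into a lookup table. The sketch below works on already-parsed JSON, so it runs without a network call; it assumes only the name and voice_id fields used in the script above, and the second sample entry is invented for illustration.

```python
def voices_by_name(payload: dict) -> dict:
    """Map each voice name to its voice_id, given the parsed voices JSON."""
    return {voice["name"]: voice["voice_id"] for voice in payload.get("voices", [])}


# Sample data shaped like the response above (the second entry is made up).
sample = {
    "voices": [
        {"name": "Rachel", "voice_id": "21m00Tcm4TlvDq8ikWAM"},
        {"name": "Example Voice", "voice_id": "hypothetical-id-123"},
    ]
}
lookup = voices_by_name(sample)
print(lookup["Rachel"])
```

In a real script you would pass response.json() to voices_by_name instead of the sample dictionary.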

iSpeech

Getting started with iSpeech credentials

iSpeech uses a different authentication method from ElevenLabs. Sign up at their website and navigate to your API credentials page. You'll receive two separate values, an API key and a "Speech Key". Store both in environment variables:

export ISPEECH_API_KEY="your_api_key"
export ISPEECH_SPEECH_KEY="your_speech_key"

iSpeech's free tier includes 500 requests per month. Because the cap is on requests rather than characters, it isn't directly comparable to ElevenLabs' allowance, but it is still workable for testing.

Making your first iSpeech request

Create a file called ispeech_test.py:

import requests
import os
from urllib.parse import urlencode

api_key = os.getenv("ISPEECH_API_KEY")
speech_key = os.getenv("ISPEECH_SPEECH_KEY")

text = "This is a test of iSpeech voice synthesis."
voice = "usenglishfemale"

params = {
    "apikey": api_key,
    "speechkey": speech_key,
    "action": "convert",
    "text": text,
    "format": "mp3",
    "voice": voice,
    "speed": 0,
    "quality": "hq"
}

url = "https://tts-api.ispeech.org/api/tts?" + urlencode(params)

response = requests.get(url)

if response.status_code == 200:
    with open("ispeech_output.mp3", "wb") as f:
        f.write(response.content)
    print("Audio file saved as ispeech_output.mp3")
else:
    print(f"Error: {response.status_code} - {response.text}")

iSpeech uses query parameters rather than JSON bodies, which makes it slightly simpler to test in a browser if needed. The quality parameter accepts "lq" (low), "mq" (medium), or "hq" (high quality), with obvious trade-offs in file size and processing time.
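To see exactly what a request looks like before sending it, you can build the URL separately and print it. This sketch reuses the endpoint and parameter names from the script above; the function name build_ispeech_url is hypothetical, and the keys are placeholders.

```python
from urllib.parse import urlencode

ISPEECH_ENDPOINT = "https://tts-api.ispeech.org/api/tts"  # endpoint from the script above
VALID_QUALITIES = {"lq", "mq", "hq"}


def build_ispeech_url(text, api_key, speech_key,
                      voice="usenglishfemale", quality="hq", speed=0):
    """Return the full GET URL for a conversion request without sending it."""
    if quality not in VALID_QUALITIES:
        raise ValueError(f"quality must be one of {sorted(VALID_QUALITIES)}")
    params = {
        "apikey": api_key,
        "speechkey": speech_key,
        "action": "convert",
        "text": text,
        "format": "mp3",
        "voice": voice,
        "speed": speed,
        "quality": quality,
    }
    return f"{ISPEECH_ENDPOINT}?{urlencode(params)}"


# Placeholder keys: printing the URL shows how the text is URL-encoded.
print(build_ispeech_url("Hello world", "KEY", "SPEECH", quality="mq"))
```

Keep in mind that a URL containing real keys ends up in browser history and server logs, so use this for debugging rather than sharing.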

Voice selection and speed control

iSpeech offers fewer voice options than ElevenLabs, but it does include basic speed control. The speed parameter ranges from -10 (very slow) to 10 (very fast), with 0 being normal speed. Experiment with different voice names like "usenglishmale", "usenglishfemale", "usenglisholdmale", and "usenglisholdwoman".
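One way to organise that experimentation is to generate every combination of voice and speed you want to compare, checking each speed against the range described above. This is a sketch with hypothetical helper names; treating speed as a whole number is an assumption.

```python
from itertools import product

VOICES = ["usenglishfemale", "usenglishmale"]  # names taken from the text above


def validate_speed(speed: int) -> int:
    """Return speed unchanged if it is an integer from -10 to 10, else raise."""
    # Assumption: the API expects whole numbers; bool is excluded explicitly.
    if isinstance(speed, bool) or not isinstance(speed, int):
        raise TypeError("speed must be an integer")
    if not -10 <= speed <= 10:
        raise ValueError(f"speed must be between -10 and 10, got {speed}")
    return speed


def comparison_grid(voices, speeds):
    """Return every (voice, speed) pair to render for a listening test."""
    return [(voice, validate_speed(speed)) for voice, speed in product(voices, speeds)]


for voice, speed in comparison_grid(VOICES, (-3, 0, 3)):
    print(f"voice={voice} speed={speed}")
```

Each pair can then be passed as the voice and speed parameters of a request. Remember that every combination you render counts against the monthly request cap.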

Resemble AI

Setting up your Resemble AI workspace

Resemble AI takes a different approach: you create a custom voice by uploading voice samples. Sign up on their website and, once logged in, navigate to the "Voices" section. You'll see an option to create a new voice project. Upload at least 30 minutes of clean audio in WAV format (they'll guide you through this). This process takes several hours as Resemble trains a model specific to your voice.

While you wait for your custom voice to train, grab your API key from the account settings:

export RESEMBLE_API_KEY="your_api_key"
export RESEMBLE_PROJECT_UUID="your_project_uuid"
export RESEMBLE_VOICE_UUID="your_voice_uuid"

You'll need the project UUID and voice UUID from the Resemble dashboard. The free tier limits you to 1,000 characters per month, which is quite restrictive.

Creating and using a custom voice

Once your voice has finished training, use this Python script to synthesise speech with your custom voice:

import requests
import os
import time

api_key = os.getenv("RESEMBLE_API_KEY")
project_uuid = os.getenv("RESEMBLE_PROJECT_UUID")
voice_uuid = os.getenv("RESEMBLE_VOICE_UUID")

text = "This is a test using my custom voice from Resemble AI."

url = f"https://api.resemble.ai/v2/projects/{project_uuid}/clips"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "body": text,
    "voice_uuid": voice_uuid
}

response = requests.post(url, json=payload, headers=headers)

if response.status_code == 201:
    clip_data = response.json()
    clip_uuid = clip_data["clip"]["uuid"]
    print(f"Clip created with UUID: {clip_uuid}")
    
    # Poll for completion, giving up after roughly two minutes (60 checks, 2 seconds apart)
    clip_url = f"https://api.resemble.ai/v2/clips/{clip_uuid}"
    for attempt in range(60):
        clip_response = requests.get(clip_url, headers=headers)
        clip_status = clip_response.json()["clip"]

        if clip_status["status"] == "done":
            audio_url = clip_status["audio_url"]
            print(f"Audio ready at: {audio_url}")
            break
        elif clip_status["status"] == "error":
            print(f"Synthesis failed: {clip_status['error_message']}")
            break
        else:
            print(f"Status: {clip_status['status']}, waiting...")
            time.sleep(2)
    else:
        # The loop ended without a break, so the clip never finished
        print("Timed out waiting for the clip to finish")
else:
    print(f"Error: {response.status_code} - {response.text}")

Note that Resemble operates asynchronously. You create a clip request, then poll the API until the audio is ready. This differs from ElevenLabs and iSpeech, which return audio immediately.
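The create-then-poll pattern is easier to reuse and test when the waiting logic is separated from the HTTP call. Below is a minimal sketch of such a helper. The name wait_for_clip is hypothetical, the "done" and "error" status values are taken from the script above, and the "processing" status in the stand-in data is an assumption.

```python
import time
from typing import Callable


def wait_for_clip(fetch_status: Callable[[], dict], interval: float = 2.0,
                  timeout: float = 120.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until the clip is done or failed, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        clip = fetch_status()
        if clip["status"] in ("done", "error"):
            return clip
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Clip still '{clip['status']}' after {timeout} seconds")
        sleep(interval)


# Stand-in for the real GET request: reports "processing" twice, then "done".
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "done", "audio_url": "https://example.com/clip.wav"},
])
result = wait_for_clip(lambda: next(responses), interval=0.01)
print(result["status"])
```

In a real script, fetch_status would be a small function that performs the GET request and returns the "clip" object from the JSON response.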

Tips and Pitfalls

Character limits and billing confusion

Each service measures usage differently. ElevenLabs counts the exact number of characters in your input text, iSpeech counts requests rather than characters, and Resemble AI counts characters but applies the allowance per project. Before going live, calculate your expected monthly usage and check each tool's pricing against that number. Overage charges often come as a surprise to people who did not anticipate how quickly their usage would grow.
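A rough usage estimate takes only a few lines. The sketch below returns both a character total and a request total, since the services bill on different units; the function name and the 30-day month are assumptions for illustration.

```python
def estimate_monthly_usage(texts_per_day, days=30):
    """Return (characters, requests) for a month, given one day's worth of texts."""
    characters = sum(len(text) for text in texts_per_day) * days
    requests_made = len(texts_per_day) * days
    return characters, requests_made


daily = ["Your order has shipped.", "Your appointment is tomorrow at ten."]
chars, reqs = estimate_monthly_usage(daily)
print(f"{chars} characters and {reqs} requests per month")
```

Compare the character figure with character-based plans and the request figure with iSpeech's request-based plans.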

Audio quality versus latency trade-offs

Higher quality audio takes longer to generate and costs more. If you're generating speech in real time (like a chatbot), choose "medium" quality settings initially. If you're pre-generating content (like an audiobook), you can afford to wait for the best quality. ElevenLabs' "eleven_multilingual_v2" model is slower but significantly more natural-sounding than their v1 model.

Managing API rate limits

All three services limit usage in some way. ElevenLabs allows roughly 3 requests per second on its free tier. iSpeech applies a monthly cap of 500 requests rather than a per-second limit. Resemble AI doesn't publish strict rate limits but expects reasonable usage patterns. If you're building a high-traffic application, test thoroughly before launch and contact support about enterprise plans.
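A simple way to stay under a per-second limit is to space requests out on the client side. The following sketch is one possible approach, not something any of the three services prescribes; the Throttle class is hypothetical, and the figure of 3 requests per second is the rough free-tier number quoted above.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_second of them start each second."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
        return delay


throttle = Throttle(max_per_second=3)
for sentence in ["First line.", "Second line.", "Third line."]:
    throttle.wait()  # call this immediately before each API request
    print(f"Would send: {sentence}")
```

The clock and sleep arguments exist so the class can be tested without real waiting. In production code you would also want to handle an explicit rate-limit response from the server, since client-side spacing cannot account for other processes sharing the same key.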

Handling long text inputs

Services perform better with shorter inputs. If you're synthesising a 5,000-word article, split it into paragraphs of 500 words or fewer. This also makes it easier to handle failures (if one paragraph fails, you don't lose the entire synthesis). Most services will reject text longer than a certain limit anyway (ElevenLabs caps individual requests at around 5,000 characters).
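Splitting can be automated so that chunks break at sentence boundaries rather than mid-word. This is a minimal sketch: the function name is hypothetical, the default of 2,500 characters is an arbitrary value chosen to sit well below the approximate cap mentioned above, and the sentence detection is a simple punctuation rule that will not handle every abbreviation.

```python
import re


def split_text(text: str, max_chars: int = 2500) -> list:
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


article = "One short sentence. Another short sentence! A third one? " * 3
for number, chunk in enumerate(split_text(article, max_chars=60), start=1):
    print(number, len(chunk), chunk)
```

Each chunk can then be synthesised in its own request and the resulting audio files joined afterwards, so a failure only requires retrying one chunk.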

Voice consistency in production

If you're building a customer-facing application, choose one voice and stick with it. Users develop familiarity with a voice, and switching between different voices feels jarring. Test voice options thoroughly before launching, preferably with actual users.

Watermarking and licensing

Some free tiers add watermarks to audio files. Check whether your use case allows this. iSpeech's free tier includes watermarks; ElevenLabs' free tier does not. If you're creating content for commercial use, read the licensing section of each provider's terms of service before publishing.

Cost Breakdown

Tool | Plan | Monthly Cost | Notes
ElevenLabs | Free | £0 | 10,000 characters per month, basic voices only
ElevenLabs | Starter | £11 | 100,000 characters per month, access to all voices
ElevenLabs | Professional | £99 | 1,000,000 characters per month, commercial use rights
iSpeech | Free | £0 | 500 requests per month, watermarked audio
iSpeech | Standard | £15 | 10,000 requests per month, watermark-free
iSpeech | Professional | £45 | 50,000 requests per month, premium support
Resemble AI | Free | £0 | 1,000 characters per month, limited voice training
Resemble AI | Starter | £50 | 100,000 characters per month, custom voice training included
Resemble AI | Professional | £200 | 1,000,000 characters per month, multi-voice support

Notes on costs:

  • Characters are typically counted as the length of the input text, not the duration of the generated audio. A 1,000-character paragraph might generate 30-60 seconds of audio depending on speech rate.
  • For rough budgeting, assume you'll use 3-5x your expected character count to account for testing, experimentation, and edge cases.
  • iSpeech's pricing structure is simpler if you have predictable request volumes; ElevenLabs is better if character count varies significantly month to month.
  • Resemble AI makes sense if voice quality and naturalness are critical, especially if you need a voice that sounds like a specific person.
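The budgeting rule above can be turned into a quick calculation. This sketch copies the character-based plans from the cost table and picks the cheapest plan that still fits after applying a safety factor. The function name is hypothetical, and the prices should be checked against each provider's current pricing page before you rely on them.

```python
# Monthly prices (GBP) and character allowances copied from the cost table.
PLANS = {
    "ElevenLabs": [("Free", 0, 10_000), ("Starter", 11, 100_000),
                   ("Professional", 99, 1_000_000)],
    "Resemble AI": [("Free", 0, 1_000), ("Starter", 50, 100_000),
                    ("Professional", 200, 1_000_000)],
}


def cheapest_plan(tool: str, expected_chars: int, safety_factor: float = 3.0):
    """Return (plan, price, budgeted_chars) for the smallest plan that fits, or None."""
    budgeted = int(expected_chars * safety_factor)
    for plan, price, allowance in PLANS[tool]:  # plans are listed cheapest first
        if budgeted <= allowance:
            return plan, price, budgeted
    return None  # usage exceeds every listed plan


# 40 paragraphs of about 1,000 characters each per month, budgeted at 3x.
print(cheapest_plan("ElevenLabs", 40_000))
```

With a 3x safety factor, 40,000 expected characters becomes 120,000 budgeted characters, which exceeds the Starter allowance and lands on the Professional plan. iSpeech is left out because its plans are priced per request rather than per character.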

Summary

ElevenLabs offers the best balance of ease of use, voice quality, and reasonable pricing for most projects. Start here if you're unsure. iSpeech works well for straightforward applications where natural sound is less critical, and its simpler API might appeal to smaller teams. Resemble AI is the one to choose if you need a truly custom voice that sounds like a specific real person, though the setup time and training cost make it better suited to larger projects.

Test all three with their free tiers before committing financially. Your specific requirements, not generic recommendations, should drive your final choice.