What is Deepgram Nova?

Deepgram Nova is a speech-to-text API that converts spoken audio into written text. It is aimed at developers and companies that need accurate transcription at scale, for example for customer service recordings, meeting notes, or accessibility features. Deepgram positions Nova as its fastest and most accurate model, built for production use where reliability matters. You integrate it through API calls, so it suits applications that process audio either in real time or in batches. The freemium pricing model lets you test the service before committing to a paid plan.

Key Features

High-accuracy speech-to-text conversion with support for multiple languages and accents

Real-time and batch audio processing via REST and WebSocket APIs

Speaker diarisation to identify who is speaking in multi-speaker recordings

Customisable vocabulary and domain-specific language models for technical or industry terminology

Automatic punctuation and capitalisation in transcripts

Low-latency processing suitable for live and interactive applications
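
As a rough illustration of how the features above map onto an API call, the following Python sketch builds a batch (pre-recorded) transcription request and reads the transcript from the response. The endpoint URL, the `Token` authorisation scheme, the `model`, `language`, `punctuate` and `diarize` query parameters, the `nova-2` model name, and the response layout are all assumptions based on Deepgram's public REST conventions and are not taken from this page; check the current API reference before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint for pre-recorded audio; confirm against Deepgram's API reference.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-2", language="en", punctuate=True, diarize=False):
    """Return the request URL with transcription options as query parameters."""
    params = {
        "model": model,          # assumed model identifier
        "language": language,
        "punctuate": str(punctuate).lower(),
        "diarize": str(diarize).lower(),
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def build_request(audio_bytes, api_key, content_type="audio/wav", **options):
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        build_listen_url(**options),
        data=audio_bytes,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )


def extract_transcript(response_body):
    """Pull the first transcript out of a parsed JSON response.

    Assumes the shape results -> channels[0] -> alternatives[0] -> transcript.
    """
    channels = response_body.get("results", {}).get("channels", [])
    if not channels or not channels[0].get("alternatives"):
        return ""
    return channels[0]["alternatives"][0].get("transcript", "")


def transcribe(audio_bytes, api_key, **options):
    """Send the audio and return the transcript text (performs a network call)."""
    request = build_request(audio_bytes, api_key, **options)
    with urllib.request.urlopen(request) as response:
        return extract_transcript(json.load(response))
```

Calling `transcribe(open("call.wav", "rb").read(), api_key, diarize=True)` would send one file for batch processing; `call.wav` is a placeholder file name. Real-time use goes through the WebSocket API instead and is not covered by this sketch.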

Pros & Cons

Advantages

  • Fast processing compared with competing models, which shortens the wait for results
  • High accuracy reduces manual correction work and makes transcripts more useful downstream
  • Freemium tier allows hands-on testing without requiring a credit card upfront
  • Flexible API integration works with existing workflows and applications

Limitations

  • Accuracy and speed vary with audio quality and language; heavily accented or noisy recordings may be transcribed less accurately
  • Custom model training or fine-tuning may require contacting support, limiting self-service customisation
  • Pricing scales with usage, so high-volume transcription can become expensive without careful monitoring
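
Because pricing scales with usage, a small projection can help with the monitoring mentioned above. The sketch below is only arithmetic: it assumes billing is per minute of audio, and the rate is a parameter you supply from your own plan. The function names and the 30-day month are hypothetical choices made for illustration, and no actual Deepgram price is implied.

```python
from decimal import Decimal


def estimate_cost(audio_minutes, rate_per_minute):
    """Cost of a given number of audio minutes at a per-minute rate."""
    minutes = Decimal(str(audio_minutes))
    rate = Decimal(str(rate_per_minute))
    if minutes < 0 or rate < 0:
        raise ValueError("minutes and rate must be non-negative")
    return minutes * rate


def project_monthly_cost(daily_minutes, rate_per_minute, days_in_month=30):
    """Extrapolate a month-end cost from the daily usage observed so far."""
    if not daily_minutes:
        return Decimal("0")
    average = Decimal(str(sum(daily_minutes))) / len(daily_minutes)
    return estimate_cost(average * days_in_month, rate_per_minute)


def is_over_budget(daily_minutes, rate_per_minute, monthly_budget):
    """True when the projected monthly cost exceeds the budget."""
    projected = project_monthly_cost(daily_minutes, rate_per_minute)
    return projected > Decimal(str(monthly_budget))
```

Passing the rate as a string (for example `"0.01"`) keeps the arithmetic exact, and running the check daily against recorded usage gives early warning before a bill grows unexpectedly.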

Use Cases

Transcribing customer service and support calls for quality assurance and training

Converting meeting recordings into searchable transcripts for remote and hybrid teams

Providing captions for video content and podcasts, and live captions for events

Automating voice input for accessibility features in applications

Analysing interview or survey recordings for market research and content analysis
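
For the meeting and interview use cases, a diarised transcript is usually easier to search once word-level output is grouped into speaker turns. The sketch below assumes each word in the response is a dictionary with `word`, `punctuated_word`, `start` and `speaker` keys; that layout is an assumption about the response format, and the helper names are hypothetical.

```python
def group_speaker_turns(words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for word in words:
        speaker = word.get("speaker")
        # Prefer the punctuated form when the response provides one.
        text = word.get("punctuated_word") or word.get("word", "")
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += " " + text
        else:
            turns.append({"speaker": speaker, "start": word.get("start"), "text": text})
    return turns


def search_turns(turns, query):
    """Return the turns whose text contains the query, ignoring case."""
    needle = query.lower()
    return [turn for turn in turns if needle in turn["text"].lower()]


sample_words = [
    {"word": "hello", "punctuated_word": "Hello,", "start": 0.0, "speaker": 0},
    {"word": "team", "punctuated_word": "team.", "start": 0.4, "speaker": 0},
    {"word": "budget", "punctuated_word": "Budget?", "start": 1.2, "speaker": 1},
]
turns = group_speaker_turns(sample_words)
matches = search_turns(turns, "budget")
```

Here `turns` holds two entries, one per speaker, and `matches` contains only the second, whose start time can be used to jump to that point in the recording.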