Deepgram

Enterprise-grade Voice AI APIs for real-time speech-to-text, text-to-speech, and conversational voice agents.

Freemium

#AI Transcription #Multilingual Transcription #Natural Language Processing #Real-Time Transcription #Speech Recognition #Video Transcription+2

Overview

Deepgram offers advanced voice AI solutions including speech-to-text, text-to-speech, and a unified Voice Agent API that integrates conversational AI with real-time transcription and natural voice synthesis. It supports over 36 languages with ultra-low latency, high accuracy, and customizable models tailored for industries like healthcare, customer support, and media. Trusted by enterprises and startups, Deepgram enables scalable, secure, and cost-effective voice AI experiences through flexible cloud and self-hosted deployments.

Key Features

Unified Voice Agent API

Combines speech-to-text, text-to-speech, and large language model orchestration into a single API to reduce complexity, latency, and cost for building conversational AI agents.

Flux Conversational STT Model

A speech-to-text model optimized for real-time conversation with built-in turn detection, natural interruption handling, and sub-300ms latency for human-like voice agents.

Nova-3 Transcription Model

High-performance speech-to-text model offering top accuracy, multilingual support, and noise robustness for production transcription needs.

Industry-Tuned and Custom Models

Specialized models optimized for domains like healthcare, legal, and finance, plus custom models trained on proprietary datasets for maximum accuracy.

Audio Intelligence Features

Includes summarization, topic detection, sentiment analysis, and intent recognition powered by task-specific language models that work with or without transcription.

Multichannel Audio Support

Ability to transcribe multichannel audio with speaker diarization and separate channel billing for accurate transcription in overlapping speech scenarios.

Text-to-Speech with Natural Voices

Responsive, natural-sounding text-to-speech models designed for high-throughput voicebots and conversational AI applications, billed per character.

Enterprise-Grade Security and Scalability

Offers cloud and self-hosted deployment options, priority support, and compliance-ready solutions for large volume and sensitive data environments.

Who It's For

How to Use

Sign Up and Get API Key

Create a free Deepgram account to access your API key and start using the platform with $200 in free credits.

Choose Your Speech-to-Text Model

Select from Flux for real-time conversation, Nova-3 for transcription accuracy, or industry/custom models based on your needs.

Integrate Audio Input

Stream live audio or upload pre-recorded files to the Deepgram API for transcription and analysis.

Configure Features

Enable optional features like speaker diarization, keyterm boosting, redaction, and smart formatting to tailor output.

Use Voice Agent API for Conversational AI

Leverage the unified API to build voice agents that combine STT, LLM orchestration, and TTS for natural interactions.

Pricing

Pricing details are gathered from the official Deepgram website and are provided for reference only. Always confirm the latest information directly with the vendor.

Plan	Price	Highlights
Pay As You Go	Free $200 credit then pay-as-you-go	Access all speech-to-text, text-to-speech, and audio intelligence endpoints No minimums or expiration No credit card required to start
Growth	From $4,000	All Pay As You Go features Up to 20% discount on usage Higher concurrency limits Discord and community support
Enterprise	Custom Pricing	Custom-trained speech-to-text models Priority access to new features and models Highest concurrency support Self-hosted deployment options Paid support plans available

Found a change in pricing? We welcome corrections. Reach out so we can keep this listing accurate.

Pros & Cons

Pros

Unified API simplifies building conversational AI agents by integrating STT, TTS, and LLM orchestration.
Ultra-low latency transcription with sub-300ms delay supports real-time applications.
Supports over 36 languages and dialects for global reach.
Custom and industry-tuned models improve accuracy for specialized domains.
Flexible pricing plans including pay-as-you-go and enterprise options with self-hosting available.

Cons

Pricing can be complex due to multiple models and add-ons like redaction and keyterm prompting.
Some advanced features require contacting sales, limiting transparency for smaller users.
Text-to-speech currently supports only English language.
Voice Agent API pricing depends on WebSocket connection time, which may be harder to estimate.
Limited public documentation on detailed LLM integration options and tiers.

Use Cases

Explore tools grouped by use case so you can keep researching without losing momentum.

Frequently Asked Questions

Alternatives

Compare other vetted products our editors see buyers evaluate alongside Deepgram.

Ratings & reviews

Key Features

Unified Voice Agent API

Combines speech-to-text, text-to-speech, and large language model orchestration into a single API to reduce complexity, latency, and cost for building conversational AI agents.

Flux Conversational STT Model

A speech-to-text model optimized for real-time conversation with built-in turn detection, natural interruption handling, and sub-300ms latency for human-like voice agents.

Nova-3 Transcription Model

High-performance speech-to-text model offering top accuracy, multilingual support, and noise robustness for production transcription needs.

Industry-Tuned and Custom Models

Specialized models optimized for domains like healthcare, legal, and finance, plus custom models trained on proprietary datasets for maximum accuracy.

Audio Intelligence Features

Includes summarization, topic detection, sentiment analysis, and intent recognition powered by task-specific language models that work with or without transcription.

Multichannel Audio Support

Ability to transcribe multichannel audio with speaker diarization and separate channel billing for accurate transcription in overlapping speech scenarios.

Text-to-Speech with Natural Voices

Responsive, natural-sounding text-to-speech models designed for high-throughput voicebots and conversational AI applications, billed per character.

Enterprise-Grade Security and Scalability

Offers cloud and self-hosted deployment options, priority support, and compliance-ready solutions for large volume and sensitive data environments.

Who It's For

How to Use

Sign Up and Get API Key

Create a free Deepgram account to access your API key and start using the platform with $200 in free credits.

Choose Your Speech-to-Text Model

Select from Flux for real-time conversation, Nova-3 for transcription accuracy, or industry/custom models based on your needs.

Integrate Audio Input

Stream live audio or upload pre-recorded files to the Deepgram API for transcription and analysis.

Configure Features

Enable optional features like speaker diarization, keyterm boosting, redaction, and smart formatting to tailor output.

Use Voice Agent API for Conversational AI

Leverage the unified API to build voice agents that combine STT, LLM orchestration, and TTS for natural interactions.

Pricing

Pricing details are gathered from the official Deepgram website and are provided for reference only. Always confirm the latest information directly with the vendor.

Plan	Price	Highlights
Pay As You Go	Free $200 credit then pay-as-you-go	Access all speech-to-text, text-to-speech, and audio intelligence endpoints No minimums or expiration No credit card required to start
Growth	From $4,000	All Pay As You Go features Up to 20% discount on usage Higher concurrency limits Discord and community support
Enterprise	Custom Pricing	Custom-trained speech-to-text models Priority access to new features and models Highest concurrency support Self-hosted deployment options Paid support plans available

Found a change in pricing? We welcome corrections. Reach out so we can keep this listing accurate.

Pros & Cons

Pros

Unified API simplifies building conversational AI agents by integrating STT, TTS, and LLM orchestration.
Ultra-low latency transcription with sub-300ms delay supports real-time applications.
Supports over 36 languages and dialects for global reach.
Custom and industry-tuned models improve accuracy for specialized domains.
Flexible pricing plans including pay-as-you-go and enterprise options with self-hosting available.

Cons

Pricing can be complex due to multiple models and add-ons like redaction and keyterm prompting.
Some advanced features require contacting sales, limiting transparency for smaller users.
Text-to-speech currently supports only English language.
Voice Agent API pricing depends on WebSocket connection time, which may be harder to estimate.
Limited public documentation on detailed LLM integration options and tiers.

Use Cases

Explore tools grouped by use case so you can keep researching without losing momentum.

Frequently Asked Questions

Alternatives

Compare other vetted products our editors see buyers evaluate alongside Deepgram.

Deepgram

Overview

Featured Video

Key Features

Unified Voice Agent API

Flux Conversational STT Model

Nova-3 Transcription Model

Industry-Tuned and Custom Models

Audio Intelligence Features

Multichannel Audio Support

Text-to-Speech with Natural Voices

Enterprise-Grade Security and Scalability

Who It's For

How to Use

Sign Up and Get API Key

Choose Your Speech-to-Text Model

Integrate Audio Input

Configure Features

Use Voice Agent API for Conversational AI

Pricing

Pros & Cons

Pros

Cons

Use Cases

Frequently Asked Questions

Alternatives

Ratings & reviews

Featured Video

Key Features

Unified Voice Agent API

Flux Conversational STT Model

Nova-3 Transcription Model

Industry-Tuned and Custom Models

Audio Intelligence Features

Multichannel Audio Support

Text-to-Speech with Natural Voices

Enterprise-Grade Security and Scalability

Who It's For

How to Use

Sign Up and Get API Key

Choose Your Speech-to-Text Model

Integrate Audio Input

Configure Features

Use Voice Agent API for Conversational AI

Pricing

Pros & Cons

Pros

Cons

Use Cases

Frequently Asked Questions

Alternatives

Ratings & reviews