> ## Documentation Index
> Fetch the complete documentation index at: https://www.edenai.co/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Speech To Text

> Speech recognition (also called speech to text or voice to text) transcribes the spoken words in an audio file into text, with or without speaker diarization.

export const TechArticleSchema = ({title, description, path, articleSection, about, proficiencyLevel = "Beginner", dependencies, keywords = [], datePublished, dateModified, image, inLanguage = "en"}) => {
  const baseUrl = "https://www.edenai.co/docs";
  const canonicalUrl = `${baseUrl}/${path}`.replace(/\/+$/, "");
  const ogParams = new URLSearchParams({
    division: articleSection || "",
    title: title || "",
    description: description || ""
  });
  const resolvedImage = image || `https://edenai.mintlify.app/_mintlify/api/og?${ogParams.toString()}`;
  const data = {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    "@id": `${canonicalUrl}#techarticle`,
    mainEntityOfPage: {
      "@type": "WebPage",
      "@id": canonicalUrl
    },
    headline: title,
    name: title,
    description: description,
    url: canonicalUrl,
    inLanguage: inLanguage,
    isPartOf: {
      "@type": "WebSite",
      name: "Eden AI Documentation",
      url: baseUrl
    },
    author: [{
      "@type": "Organization",
      name: "Eden AI",
      url: "https://www.edenai.co/"
    }],
    publisher: {
      "@type": "Organization",
      name: "Eden AI",
      url: "https://www.edenai.co/",
      logo: {
        "@type": "ImageObject",
        url: "https://www.edenai.co/assets/logo.png"
      }
    }
  };
  if (articleSection) data.articleSection = articleSection;
  if (about) data.about = {
    "@type": "Thing",
    name: about
  };
  if (proficiencyLevel) data.proficiencyLevel = proficiencyLevel;
  if (dependencies) data.dependencies = dependencies;
  if (keywords && keywords.length) data.keywords = keywords;
  if (datePublished) data.datePublished = datePublished;
  if (dateModified) data.dateModified = dateModified;
  data.image = Array.isArray(resolvedImage) ? resolvedImage : [resolvedImage];
  const json = JSON.stringify(data);
  const schemaId = `techarticle-${canonicalUrl}`;
  React.useEffect(() => {
    if (typeof document === "undefined") return;
    document.querySelectorAll(`script[data-schema-id="${schemaId}"]`).forEach(n => n.remove());
    const script = document.createElement("script");
    script.type = "application/ld+json";
    script.dataset.schemaId = schemaId;
    script.textContent = json;
    document.head.appendChild(script);
    return () => script.remove();
  }, [json, schemaId]);
  return null;
};

<TechArticleSchema title={`Speech To Text`} description={`Speech recognition (also called speech to text or voice to text) transcribes the spoken words in an audio file into text, with or without speaker diarization.`} path="v3/expert-models/features/audio/speech-to-text-async" articleSection="Audio Features" about={`Audio AI API`} proficiencyLevel="Intermediate" keywords={[`Eden AI`, `AI API`, `speech to text`, `text to speech`]} datePublished="2026-05-06T00:00:00Z" dateModified="2026-05-07T00:00:00Z" />

## Endpoint

`POST /v3/universal-ai/async` (async)

Model string pattern: `audio/speech_to_text_async/{provider}[/{model}]`
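Because the model string is plain text, you can assemble it from a provider name and an optional model name. The following is a minimal sketch that assumes only the pattern shown above. `build_model_string` is a hypothetical helper and is not part of any Eden AI SDK.

```python
from typing import Optional

# Prefix taken from the documented pattern audio/speech_to_text_async/{provider}[/{model}]
PREFIX = "audio/speech_to_text_async"


def build_model_string(provider: str, model: Optional[str] = None) -> str:
    """Join the prefix, provider, and optional model into a model string (hypothetical helper)."""
    segments = [provider] if model is None else [provider, model]
    for seg in segments:
        # Reject empty segments or embedded slashes, which would change the path structure.
        if not seg or "/" in seg:
            raise ValueError(f"invalid model string segment: {seg!r}")
    return "/".join([PREFIX, *segments])
```

For example, `build_model_string("deepgram", "nova-3")` yields one of the model strings listed under Available Providers.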

## Input

| Field             | Type           | Required | Description                                                                            |
| ----------------- | -------------- | -------- | -------------------------------------------------------------------------------------- |
| file              | file\_input    | Yes      | Audio file ID returned by /v3/upload, or a direct URL to the audio file                |
| language          | string         | No       | Language code in ISO format (e.g., 'en', 'fr', 'es')                                   |
| speakers          | int            | No       | Number of speakers present in the audio                                                |
| profanity\_filter | bool           | No       | Whether to filter profanity, replacing inappropriate words with a series of asterisks  |
| vocabulary        | array\[string] | No       | List of words or multi-word phrases the speech-to-text engine should recognize         |
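The following sketch assembles the `input` object from the fields in this table. `build_input` is a hypothetical helper. Two behaviors are choices made for illustration and are not documented requirements: unset optional fields are left out of the request instead of being sent as `null`, and `speakers` must be positive.

```python
from typing import Any, Dict, List, Optional


def build_input(
    file: str,
    language: Optional[str] = None,
    speakers: Optional[int] = None,
    profanity_filter: Optional[bool] = None,
    vocabulary: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build the `input` object; only fields that were explicitly set are included."""
    if not file:
        raise ValueError("file (upload ID or URL) is required")
    if speakers is not None and speakers < 1:
        # Illustrative sanity check, not a documented API constraint.
        raise ValueError("speakers must be a positive integer")
    payload: Dict[str, Any] = {"file": file}
    optional = {
        "language": language,
        "speakers": speakers,
        "profanity_filter": profanity_filter,
        "vocabulary": vocabulary,
    }
    # Drop None values; False (e.g. profanity_filter=False) is kept because it was set explicitly.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dictionary can be used as the `input` value in the Quick Start payload.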

## Output

| Field               | Type           | Required | Description |
| ------------------- | -------------- | -------- | ----------- |
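| text                | string         | Yes      | Full transcription of the audio |
| **diarization**     | object         | Yes      | Speaker diarization results |
|     total\_speakers | int            | Yes      | Number of speakers detected |
|     **entries**     | array\[object] | No       | Transcribed segments with speaker labels |
|         segment     | string         | Yes      | Transcribed text of the segment |
|         start\_time | string         | Yes      | Start time of the segment |
|         end\_time   | string         | Yes      | End time of the segment |
|         speaker     | int            | Yes      | Speaker identifier for the segment |
|         confidence  | float          | Yes      | Confidence score for the segment |
|     error\_message  | string         | No       | Diarization error message, if any |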
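Each diarization entry carries a `speaker` and a `segment`, so you can merge consecutive entries from the same speaker into readable turns. This minimal sketch makes two assumptions. First, the output object matches the fields in the table above. Second, `entries` already arrive in chronological order, which the table does not state. `speaker_turns` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def speaker_turns(output: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Merge consecutive diarization entries from the same speaker into (speaker, text) turns."""
    # `entries` is optional in the output schema, so fall back to an empty list.
    entries = (output.get("diarization") or {}).get("entries") or []
    turns: List[Tuple[int, str]] = []
    for entry in entries:
        speaker, segment = entry["speaker"], entry["segment"]
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous entry: extend the current turn.
            turns[-1] = (speaker, f"{turns[-1][1]} {segment}")
        else:
            # Speaker changed (or first entry): start a new turn.
            turns.append((speaker, segment))
    return turns
```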

## Available Providers

| Provider            | Model String                                   | Price                    |
| ------------------- | ---------------------------------------------- | ------------------------ |
| amazon              | `audio/speech_to_text_async/amazon`            | \$0.024 per 60 seconds   |
| assembly            | `audio/speech_to_text_async/assembly`          | \$0.011 per 60 seconds   |
| deepgram (base)     | `audio/speech_to_text_async/deepgram/base`     | \$0.0169 per 60 seconds  |
| deepgram            | `audio/speech_to_text_async/deepgram`          | \$0.0189 per 60 seconds  |
| deepgram (enhanced) | `audio/speech_to_text_async/deepgram/enhanced` | \$0.0189 per 60 seconds  |
| deepgram (nova-3)   | `audio/speech_to_text_async/deepgram/nova-3`   | \$0.0052 per 60 seconds  |
| gladia              | `audio/speech_to_text_async/gladia`            | \$0.0102 per 60 seconds  |
| google              | `audio/speech_to_text_async/google`            | \$0.024 per 60 seconds   |
| microsoft           | `audio/speech_to_text_async/microsoft`         | \$0.0168 per 60 seconds  |
| openai              | `audio/speech_to_text_async/openai`            | \$0.006 per 60 seconds   |
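Prices are quoted per 60 seconds of audio, so a rough estimate is the per-minute rate multiplied by the audio duration in minutes. This sketch assumes linear, pro-rated billing with no minimum charge and no rounding up to whole minutes. This page does not specify either behavior. `estimate_cost` and `PRICE_PER_60S` are hypothetical names, and the rates are copied from the table above and may change.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict

# USD per 60 seconds of audio, copied from the provider table (subject to change).
PRICE_PER_60S: Dict[str, Decimal] = {
    "audio/speech_to_text_async/amazon": Decimal("0.024"),
    "audio/speech_to_text_async/assembly": Decimal("0.011"),
    "audio/speech_to_text_async/deepgram/base": Decimal("0.0169"),
    "audio/speech_to_text_async/deepgram": Decimal("0.0189"),
    "audio/speech_to_text_async/deepgram/enhanced": Decimal("0.0189"),
    "audio/speech_to_text_async/deepgram/nova-3": Decimal("0.0052"),
    "audio/speech_to_text_async/gladia": Decimal("0.0102"),
    "audio/speech_to_text_async/google": Decimal("0.024"),
    "audio/speech_to_text_async/microsoft": Decimal("0.0168"),
    "audio/speech_to_text_async/openai": Decimal("0.006"),
}


def estimate_cost(model: str, duration_seconds: float) -> Decimal:
    """Estimate USD cost, ASSUMING linear pro-rated billing (no minimums, no rounding up)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    rate = PRICE_PER_60S[model]  # raises KeyError for models not in the table
    # Go through str() so float durations convert to their shortest decimal form.
    cost = rate * Decimal(str(duration_seconds)) / Decimal(60)
    return cost.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)
```

For example, under these assumptions 120 seconds with the `openai` model costs 0.006 × 2 = 0.012 USD.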

## Quick Start

> This is an **async** feature. The initial response returns a job ID. Poll `GET /v3/universal-ai/async/{job_id}` until the job completes.

<CodeGroup>
  ```python Python theme={null}
  import requests

  url = "https://api.edenai.run/v3/universal-ai/async"
  headers = {
      "Authorization": "Bearer YOUR_API_KEY",
      "Content-Type": "application/json"
  }

  payload = {
      "model": "audio/speech_to_text_async/amazon",
      "input": {
          "file": "YOUR_FILE_UUID_OR_URL",
          "language": "en"
      }
  }

  response = requests.post(url, headers=headers, json=payload)
  print(response.json())
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.edenai.run/v3/universal-ai/async \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "audio/speech_to_text_async/amazon",
      "input": {"file": "YOUR_FILE_UUID_OR_URL", "language": "en"}
    }'
  ```
</CodeGroup>
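The Quick Start only submits the job. The following is a minimal polling sketch for `GET /v3/universal-ai/async/{job_id}`. It uses the standard library (`urllib.request`) so it runs without extra dependencies. This page does not document the status field's name or its values, so the defaults below (`status`, with values such as `success` or `failed`) are **assumptions**. They are exposed as parameters; inspect a real job response and adjust them. `fetch_job` and `wait_for_job` are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Tuple
from urllib.parse import quote

API_BASE = "https://api.edenai.run/v3/universal-ai/async"


def fetch_job(job_id: str, api_key: str) -> Dict[str, Any]:
    """GET the job by ID and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/{quote(job_id, safe='')}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    status_key: str = "status",  # ASSUMPTION: name of the status field
    done_values: Tuple[str, ...] = ("success", "finished", "completed"),  # ASSUMPTION
    failed_values: Tuple[str, ...] = ("failed", "error"),  # ASSUMPTION
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `fetch` until the job reaches a terminal status or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        status = str(job.get(status_key, "")).lower()
        if status in done_values:
            return job
        if status in failed_values:
            raise RuntimeError(f"job failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete before the timeout")
        sleep(interval)  # still running: wait before polling again
```

Usage: `wait_for_job(lambda: fetch_job(job_id, "YOUR_API_KEY"))`, where `job_id` is the job ID returned by the POST request above. Passing `fetch` as a callable keeps the polling loop independent of the HTTP client, so you can use `requests.get` instead.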
