How to transcribe long audio files

Using Eden AI for long audio transcription

Audio files are often encountered in various applications, ranging from podcasts and interviews to recordings of lectures or meetings. Nevertheless, dealing with long audio files can be challenging when the objective is to transcribe or process specific segments of the content. This is where Eden AI comes into play.

In this tutorial, we will guide you through the process of splitting long audio files into smaller chunks, generating text transcriptions, and concatenating the resulting text. Let’s get started.

Prerequisites:

Ensure that you have the following requirements in place beforehand:

  1. A valid API key from Eden AI.
  2. Python installed on your system.
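  3. The necessary Python libraries: requests and pydub. The pydub.silence module used below ships with pydub, so it needs no separate install. pydub relies on ffmpeg to read and write MP3 files, so ffmpeg should be installed as well.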

Step 1: Import the Required Libraries

To start with, let’s import the necessary libraries to access the Eden AI API and handle audio processing. Open your Python environment or IDE and import the following libraries:


import json
import requests
from pydub import AudioSegment
from pydub.silence import split_on_silence

Step 2: Set Up the API Key and Audio File URL

Next, we need to set up the API key and specify the URL of the audio file that you want to split. To get your API key, you’ll need to create an account on Eden AI.

Update the following variables with your API key and audio file URL:


# Replace with your API key and audio file URL
api_key = "YOUR_API_KEY"
audio_file_url = "AUDIO_FILE_URL"
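Hard-coding the key is fine for a quick test, but it is easy to leak it that way. As a minimal sketch, the key could instead be read from an environment variable; the variable name EDENAI_API_KEY and the helper load_api_key are hypothetical choices for this example, not something Eden AI prescribes.

```python
import os


def load_api_key(env_var="EDENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Eden AI API key"
        )
    return key
```

With this in place, api_key = load_api_key() can replace the hard-coded string above.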

Step 3: Download and Prepare the Audio File

In this step, we will download the long audio file from the specified URL and prepare it for further processing. Add the following code:


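# Download the audio file and fail early on an HTTP error
response = requests.get(audio_file_url)
response.raise_for_status()
with open("temp_audio_file.mp3", "wb") as file:
    file.write(response.content)

# Load the downloaded file (this tutorial assumes it is an MP3)
audio = AudioSegment.from_file("temp_audio_file.mp3", format="mp3")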

Step 4: Split the Audio File into Chunks

Now, let’s split the audio file into smaller chunks based on periods of silence. We’ll use the split_on_silence function from the pydub.silence module. Include the following code:


# Split the audio wherever at least 500 ms of it is quieter than -40 dBFS
chunks = split_on_silence(audio, min_silence_len=500, silence_thresh=-40)

Step 5: Define the Transcription Function

To transcribe each audio chunk, we need to define a function that utilizes the Eden AI API. Add the following code:


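# Function to transcribe an audio chunk
def transcribe_audio_chunk(chunk, index):
    # Export the chunk to a temporary MP3 file so it can be uploaded
    chunk_path = f"temp_chunk_{index}.mp3"
    chunk.export(chunk_path, format="mp3")

    url = "https://api.edenai.run/v2/audio/speech_to_text_async"
    headers = {"Authorization": f"Bearer {api_key}"}
    data = {
        "providers": "google,amazon",
        "language": "en-US",
    }

    # Upload the exported chunk as multipart form data.
    # Assumption: the form field is named "file"; check the Eden AI API reference.
    with open(chunk_path, "rb") as chunk_file:
        response = requests.post(url, data=data, files={"file": chunk_file}, headers=headers)
    response.raise_for_status()
    result = json.loads(response.text)

    # Assumption: these response keys may not match what the API returns. This is an
    # asynchronous endpoint, so the first response may not contain the transcription
    # yet; adjust the keys to the response you actually receive.
    return result["result"]["google"]["transcription"]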
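The endpoint name ends in `_async`, which suggests that a transcription job may still be running when the first response comes back. The following is a minimal polling sketch under that assumption. The helper wait_for_result, the caller-supplied fetch_job function, and the status values "finished" and "failed" are all hypothetical; the real request for fetching a job and its status values should be taken from the Eden AI API reference.

```python
import time


def wait_for_result(fetch_job, max_attempts=20, delay_seconds=3.0, sleep=time.sleep):
    """Call fetch_job() until it reports a finished job, then return that job.

    fetch_job is a caller-supplied function returning a dict with a "status" key.
    The status values "finished" and "failed" are assumptions of this sketch.
    """
    for attempt in range(max_attempts):
        job = fetch_job()
        status = job.get("status")
        if status == "finished":
            return job
        if status == "failed":
            raise RuntimeError(f"Transcription job failed: {job}")
        if attempt < max_attempts - 1:
            # Wait before asking again, except after the final attempt
            sleep(delay_seconds)
    raise TimeoutError(f"Job not finished after {max_attempts} attempts")
```

Passing the sleep function as a parameter keeps the helper easy to test without real delays.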

Step 6: Transcribe and Concatenate Text

In this final step, we will transcribe each audio chunk and concatenate the resulting text. Add the following code:


# Transcribe each chunk and concatenate the text
transcribed_text = ""
for index, chunk in enumerate(chunks):
    text = transcribe_audio_chunk(chunk, index)
    transcribed_text += " " + text

print(transcribed_text)
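Building the string with += leaves a leading space and fails if a chunk yields no text. As a small alternative sketch, the per-chunk results can be collected in a list and joined at the end; join_transcripts is a hypothetical helper name used only for this example.

```python
def join_transcripts(pieces):
    """Join chunk transcriptions with single spaces, skipping empty results."""
    # Drop None and empty strings first, then trim surrounding whitespace
    cleaned = (piece.strip() for piece in pieces if piece)
    # Drop anything that was whitespace only before joining
    return " ".join(piece for piece in cleaned if piece)
```

For example, the loop above could append each transcription to a list and then call join_transcripts on that list.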

Best Practices for Working with Audio Files

1. Audio File Format and Quality

Ensure that your audio file is in a compatible format supported by the Eden AI API. Commonly used formats include MP3, WAV, FLAC, and OGG. Additionally, consider the quality of the audio file. Higher-quality recordings generally yield better transcription results.

2. Pre-processing and Noise Reduction

Before splitting your audio file, consider applying pre-processing techniques to improve transcription accuracy. This includes reducing background noise, normalizing audio levels, and enhancing speech clarity. The pydub library covers the basics, such as level normalization and high-pass or low-pass filtering; heavier noise reduction generally calls for a dedicated tool.

3. Optimal Chunk Size

Choose an appropriate chunk size based on your specific requirements. Smaller chunks allow for more granular processing but may increase API usage and processing time. Larger chunks reduce API calls but may result in longer transcriptions or lower accuracy for sections with significant background noise or overlapping speech. Experiment with different chunk sizes to find the balance that suits your needs.
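Splitting on silence can produce many very short chunks, each of which costs an API call. One way to control chunk size, sketched below, is to merge consecutive chunks until each reaches a minimum length. The helper merge_short_chunks is hypothetical; it relies only on len() and +, which pydub's AudioSegment supports (its len() is the duration in milliseconds).

```python
def merge_short_chunks(chunks, min_length):
    """Combine consecutive chunks until each merged chunk reaches min_length.

    Works with any objects that support len() and +, which includes pydub
    AudioSegment objects (len() gives the duration in milliseconds).
    """
    merged = []
    current = None
    for chunk in chunks:
        current = chunk if current is None else current + chunk
        if len(current) >= min_length:
            merged.append(current)
            current = None
    if current is not None:
        # Keep the trailing remainder even if it is shorter than min_length
        merged.append(current)
    return merged
```

For instance, merge_short_chunks(chunks, 30_000) would aim for chunks of at least 30 seconds; the right figure depends on your audio and should be found by experiment.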

4. Silence Threshold and Minimum Silence Length

The split_on_silence function takes a silence threshold (silence_thresh, in dBFS) and a minimum silence length (min_silence_len, in milliseconds). Adjust these values to the characteristics of your audio file. A higher threshold (closer to 0 dBFS) treats louder passages as silence and so produces more splits, and a shorter minimum silence length also leads to more frequent splits. Fine-tune both parameters until the chunks break at natural pauses.
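A fixed threshold such as -40 dBFS may suit one recording and fail on a quieter one. A common approach, sketched here as an assumption rather than a rule, is to set the threshold relative to the recording's average loudness. The helper relative_silence_thresh and its default margin of 16 dB are hypothetical choices for this example.

```python
def relative_silence_thresh(average_dbfs, margin_db=16.0):
    """Return a silence threshold margin_db below the recording's average level."""
    if margin_db < 0:
        raise ValueError("margin_db must be non-negative")
    return average_dbfs - margin_db
```

With pydub, the average level of a loaded recording is available as audio.dBFS, so the result of relative_silence_thresh(audio.dBFS) can be passed as silence_thresh. A completely silent recording reports negative infinity there, which this sketch does not handle.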

5. Error Handling and Retry Mechanisms

When making API calls to Eden AI, implement error handling and retry mechanisms. Network disruptions or API limitations may cause intermittent failures, and retrying a failed request after a short delay keeps one bad chunk from aborting the whole transcription.
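As a minimal sketch of such a retry mechanism, the helper below retries a callable with exponentially growing delays. The name call_with_retries and its defaults are hypothetical, and catching every exception is a simplification; in practice you would likely retry only on network or rate-limit errors.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry on any exception, doubling the delay each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate to the caller
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A call such as call_with_retries(lambda: transcribe_audio_chunk(chunk, index)) would then wrap the transcription function defined in Step 5.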

Note: As a good practice, make sure to clean up any temporary files generated during the process.
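One way to make the cleanup automatic, sketched below, is to do all file work inside a temporary directory that Python deletes when the work is done. The helper run_in_temp_dir is a hypothetical name for this example; it uses only the standard library.

```python
import os
import tempfile


def run_in_temp_dir(work):
    """Run work(directory) inside a temporary directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory() as directory:
        return work(directory)


def demo(directory):
    # Stand-in for exporting a chunk: write a file inside the temporary directory
    path = os.path.join(directory, "temp_chunk_0.mp3")
    with open(path, "wb") as handle:
        handle.write(b"placeholder")
    return path


leftover_path = run_in_temp_dir(demo)
print(os.path.exists(leftover_path))
```

In the tutorial's code, the downloaded file and the exported chunks could be written under such a directory instead of the current working directory.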

Complete Code

Handling long audio files can be a complex task, especially when you need to extract specific sections for transcription or further processing. With the knowledge gained from this tutorial, you are now equipped to tackle the challenge of working with long audio files.

Remember to handle your API key securely, and consider optimizing your audio files by choosing compatible formats and applying pre-processing techniques for noise reduction.


import json
import requests
from pydub import AudioSegment
from pydub.silence import split_on_silence

# Replace with your API key and audio file URL
api_key = "your_API_key"
audio_file_url = "your_audio_url"

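# Download the audio file and fail early on an HTTP error
response = requests.get(audio_file_url)
response.raise_for_status()
with open("temp_audio_file.mp3", "wb") as file:
    file.write(response.content)

# Load the audio file and split it into chunks
audio = AudioSegment.from_file("temp_audio_file.mp3", format="mp3")
chunks = split_on_silence(audio, min_silence_len=500, silence_thresh=-40)

# Function to transcribe an audio chunk
def transcribe_audio_chunk(chunk, index):
    # Export the chunk to a temporary MP3 file so it can be uploaded
    chunk_path = f"temp_chunk_{index}.mp3"
    chunk.export(chunk_path, format="mp3")

    url = "https://api.edenai.run/v2/audio/speech_to_text_async"
    headers = {"Authorization": f"Bearer {api_key}"}
    data = {
        "providers": "google,amazon",
        "language": "en-US",
    }

    # Upload the exported chunk as multipart form data.
    # Assumption: the form field is named "file"; check the Eden AI API reference.
    with open(chunk_path, "rb") as chunk_file:
        response = requests.post(url, data=data, files={"file": chunk_file}, headers=headers)
    response.raise_for_status()
    result = json.loads(response.text)

    # Assumption: these response keys may not match what the API returns. This is an
    # asynchronous endpoint, so the first response may not contain the transcription
    # yet; adjust the keys to the response you actually receive.
    return result["result"]["google"]["transcription"]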

# Transcribe each chunk and concatenate the text
transcribed_text = ""
for index, chunk in enumerate(chunks):
    text = transcribe_audio_chunk(chunk, index)
    transcribed_text += " " + text

print(transcribed_text)
