Easily analyze audio files with AI: Speech Recognition + Translation + Text Mining (NLP)



This article is brought to you by the Eden AI team. We let you test AI engines from many different providers, and use them in production, directly through our API and platform. In this article, we show how an AI pipeline makes it easy to solve complex use cases that combine Speech-to-Text and text analysis (NLP).


If you are a solution provider and want to integrate Eden AI, contact us at contact@edenai.co.


With AI services, you can build pipelines that solve common problems. When you need a Speech-to-Text engine to transcribe your audio files, you often need other engines to analyze or translate the transcribed text. To solve this problem, you have several options:

  • First option: many open-source Speech-to-Text and NLP engines exist and are free to use. Some of them perform well, but they can be complex to set up and use: working with an open-source AI library requires data science expertise, and you also need to set up an internal server to run the engines.

  • Second option: you can use the engines from your cloud provider. Cloud providers such as Google Cloud, AWS, Microsoft Azure, Alibaba Cloud, and IBM Watson all offer multiple AI engines for vision, text, translation, prediction, and speech. This option looks very easy: all the engines are centralized on the same platform, so you can access them quickly. You also stay in a familiar environment where your company may already have expertise.

However, you cannot be sure that your cloud provider's engines offer the best performance, speed, and price. Your cloud provider may also not offer the engine you are looking for at all, since no single provider covers every AI service available on the market.


That leaves the third option, which we recommend: the multi-cloud approach. The performance ranking of the different providers varies with your data (amount, type, quality, etc.) and with the technology you need (object detection, invoice OCR, explicit content detection, syntax analysis, speech-to-text, etc.). Many providers exist for each type of engine, from big cloud providers to AI specialists. Here are some examples of rankings on different datasets:

The Multi cloud approach


The only way to select the right provider is to benchmark different providers' engines on your own data, then either choose the best one or combine the results of several providers' engines. If price or speed is one of your priorities, you can compare those as well.
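A common way to benchmark Speech-to-Text engines on your own data is the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn an engine's transcript into a human-made reference transcript, divided by the number of words in the reference. The minimal sketch below computes WER with a word-level edit distance and ranks providers by it. It assumes you already have a reference transcript and one transcript per provider; the provider names and sentences are hypothetical, and the only normalization applied is lowercasing (no punctuation stripping).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit) distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the transcript
                curr[j - 1] + 1,     # insertion: extra word in the transcript
                prev[j - 1] + cost,  # substitution (or exact match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def rank_providers(reference: str, transcripts: dict[str, str]) -> list[tuple[str, float]]:
    """Return (provider, WER) pairs sorted from best (lowest WER) to worst."""
    scores = {name: word_error_rate(reference, text) for name, text in transcripts.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical transcripts of the same recording returned by three providers.
    reference = "please call me back tomorrow morning"
    transcripts = {
        "provider_a": "please call me back tomorrow morning",
        "provider_b": "please call me tomorrow morning",
        "provider_c": "lease call be back tomorrow mourning",
    }
    for name, wer in rank_providers(reference, transcripts):
        print(f"{name}: WER = {wer:.2f}")
    # → provider_a: WER = 0.00
    # → provider_b: WER = 0.17
    # → provider_c: WER = 0.50
```

Lower is better: a WER of 0.17 means roughly one error per six reference words. In a real benchmark you would average WER over many representative recordings, then weigh it against each provider's price and response time.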


With this method, you can build powerful AI pipelines from the Speech-to-Text and text analysis engines that are best suited to your data and your project.

This method gives the best results in terms of performance and optimization, but it has several drawbacks:

  • you may not know every high-performing provider on the market

  • you need to subscribe to and sign a contract with each provider

  • you need to master each provider's API documentation

  • you need to check each provider's pricing

  • you need to run your data through each engine to perform the benchmark


This is where Eden AI becomes very useful. You only need to create an Eden AI account to access engines from many providers for many technologies (vision, NLP, speech, OCR, translation, prediction). The platform lets you benchmark and visualize results from different engines, and it centralizes the cost of using different providers.


Here is an example of a pipeline:

Eden AI: example pipeline (Speech-to-Text + Translation + Sentiment Analysis)

Eden AI provides the same easy-to-use API, with the same documentation, for every technology. For example, you can use the Eden AI API to call Speech-to-Text, Translation, and Sentiment Analysis, with the provider as a simple parameter. With only a few lines of code, you can put your project into production:


Speech-to-Text:

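import requests

# `url` (the Eden AI API base URL) and `headers` (containing your API key) are assumed
# to be defined beforehand; they are not shown in this article.
payload = {'providers': "['microsoft']", 'language': 'en-US'}

# Forward slashes also work on Windows, and `with` closes the file after the upload.
with open('Downloads/call.mp3', 'rb') as audio_file:
    files = {'files': audio_file}
    response = requests.post(url + "audio/speech_recognition", headers=headers, data=payload, files=files)

result = response.json()["result"]
transcription = result[0]["transcribe"]

print(transcription)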

Translation:

payload = {'providers': "['amazon']", 'source_language': 'en-US', 'target_language': 'fr-FR', 'text_to_translate': transcription}

response = requests.post(url + "text/automatic_translation", headers=headers, data=payload)

result1 = response.json()["result"]
translated_text = result1[0]["result"]["translated_text"]
print(translated_text)

Sentiment Analysis:

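# The translated text is French, so the language parameter is 'fr-FR' rather than 'en-US'.
# Only the first 1,000 characters of the translation are analyzed.
payload = {'providers': "['lettria']", 'text': translated_text[:1000], 'language': 'fr-FR'}

response = requests.post(url + "text/sentiment_analysis", headers=headers, data=payload)

result2 = response.json()["result"]
sentiment = result2[0]
print(sentiment)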

Building the pipeline is straightforward, and Eden AI lets you go further. Because the provider is just a parameter, you can set up a fallback provider with two lines of code in case the first one is down. You can also combine results from several providers if a single provider's engine does not give you the performance you are looking for.
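Both ideas can be sketched on the client side in plain Python. This is a minimal sketch, not an Eden AI feature: `call_with_fallback` tries providers in order and returns the first result that does not raise, and `majority_vote` combines label-style results (for example sentiment labels) from several providers. Both function names are hypothetical. In the real pipeline, `call` would wrap one of the `requests.post` calls shown above, with the provider name inserted into the `'providers'` parameter and `response.raise_for_status()` called so that an HTTP error counts as a failure; here a fake function stands in for it, and the provider names are illustrative.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(providers: Sequence[str], call: Callable[[str], T]) -> tuple[str, T]:
    """Try each provider in order; return (provider, result) for the first call
    that does not raise. Raise RuntimeError if every provider fails."""
    errors: dict[str, Exception] = {}
    for provider in providers:
        try:
            return provider, call(provider)
        except Exception as exc:  # e.g. a timeout or an HTTP error raised by raise_for_status()
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {errors}")


def majority_vote(labels: Sequence[str]) -> str:
    """Combine several providers' labels by majority vote.
    Ties go to the label encountered first (Counter.most_common keeps insertion order)."""
    if not labels:
        raise ValueError("no labels to combine")
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    def fake_transcribe(provider: str) -> str:
        # Stand-in for a real Eden AI call; pretend the first provider is down.
        if provider == "microsoft":
            raise ConnectionError("provider unavailable")
        return f"transcript from {provider}"

    print(call_with_fallback(["microsoft", "google"], fake_transcribe))
    # → ('google', 'transcript from google')
    print(majority_vote(["Positive", "Negative", "Positive"]))
    # → Positive
```

Keeping the fallback logic independent of the HTTP call means the same helper works for every step of the pipeline (Speech-to-Text, Translation, Sentiment Analysis). Majority voting is only one simple way to combine results; numeric outputs such as confidence scores could be averaged instead.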


Here's a video showing how Eden AI works:




Conclusion:


There are hundreds of AI engines available on the market: it is impossible to know all of them, or to know which ones perform well. Most of the time, you do not use a single engine; you combine several of them in a pipeline to process your data (Speech + NLP, for example). The best way to build this pipeline is the multi-cloud approach, which lets you get the best performance and price for each technology. This approach may seem complex, but Eden AI simplifies it by centralizing the best providers' APIs for each technology.
