This technology (also called automatic speech recognition, or ASR) automatically transcribes audio (or video) files, extracting the spoken text so that it can be analyzed afterwards.
Unlike pre-trained speech-to-text engines, this technology makes it possible to build a custom transcription engine trained specifically on the user's own labeled data.
This technology generates audio from text and lets the user choose the voice used (for example, a male or female voice).
This technology separates the different voices in an audio recording and identifies the precise moments when each person speaks.
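To make "the precise moments when each person speaks" more concrete, here is a minimal Python sketch of how such results might be post-processed. It assumes, hypothetically, that the engine returns a list of segments, each with a speaker label and start/end times in seconds. The `Segment` class, the `SPEAKER_1`/`SPEAKER_2` labels, and the `max_gap` merging rule are illustrative assumptions, not any particular provider's output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical diarization result: who spoke, and when (seconds)."""
    speaker: str
    start: float
    end: float


def merge_turns(segments, max_gap=0.5):
    """Merge consecutive segments from the same speaker into speaking turns.

    Two segments are merged when they share a speaker label and the silence
    between them is at most `max_gap` seconds (an assumed, tunable rule).
    The input segments are copied, never modified.
    """
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = turns[-1] if turns else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            # Same speaker after a short pause: extend the current turn.
            last.end = max(last.end, seg.end)
        else:
            # New speaker (or long pause): start a new turn from a copy.
            turns.append(Segment(seg.speaker, seg.start, seg.end))
    return turns


def talk_time(segments):
    """Total speaking time per speaker label, in seconds."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical engine output for a short two-person conversation.
    segments = [
        Segment("SPEAKER_1", 0.0, 2.5),
        Segment("SPEAKER_1", 2.7, 4.0),
        Segment("SPEAKER_2", 4.3, 7.0),
        Segment("SPEAKER_1", 7.2, 8.0),
    ]
    for turn in merge_turns(segments):
        print(f"{turn.speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
    print(talk_time(segments))
```

With the sample data, the first two segments collapse into a single turn for `SPEAKER_1` (0.0s to 4.0s), because the 0.2-second pause is shorter than `max_gap`. Real engines may use different labels, time units, or segment fields.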
This technology (also called opinion mining) automatically analyzes the feelings and emotions expressed in an audio or video recording.
This technology automatically determines the most likely language spoken in an audio recording. Once the language is known, the audio can then, for example, be translated.
This technology extracts information describing the voice being analyzed, such as an estimate of the speaker's sex or age. It can also be used to identify a person.
This technology translates the speech in an audio recording directly from one specified language into another.