Transcribe audio into text and get an accurate translation with artificial intelligence

How to obtain a text transcription of an audio file effortlessly with the help of artificial intelligence and generative models, and how the OpenAI APIs can be used to transcribe audio and obtain accurate translations.

Transcribing audio into editable text is an increasingly popular activity in many industries. However, turning hours of recordings into written text can be a long and tedious process that requires many hours of work. Fortunately, there are online tools and services that can help make this task easier.

In this article, we’ll show you how to transcribe audio to text using artificial intelligence and a service built on the Whisper API by OpenAI, the company that developed the well-known chatbot ChatGPT.

The tool in question is called Writeout.ai and is currently completely free. It runs in a web browser and converts audio to text very simply, in a few minutes.

The “bonus” feature that sets Writeout.ai apart is that it can translate the transcript generated from an audio recording. Besides simplifying audio-to-text transcription and improving your productivity, it makes it possible to translate the transcribed text into different languages, making audio content easier to understand all over the world.

In this article we focus first on the practical aspects, that is, on how to obtain an accurate transcription of an audio clip of any length; in the second part, we will briefly look at how a tool like Writeout.ai works and how developers can exploit the potential of the OpenAI API locally as well.

How to transcribe audio to text with Writeout.ai

Get ready, because the results that a tool like Writeout.ai can achieve are really impressive!

Suppose we have an audio file recorded in one of the 10 languages currently supported by Writeout.ai: Italian is of course included, as well as English, German, Spanish, French, Dutch, Portuguese, Russian, Japanese, Chinese and… Klingon.

The latter is the language spoken by the humanoid extraterrestrial species of the Star Trek science fiction universe: if you ever want to communicate with them… you never know. What can we say? It shows a fine sense of humor and sharp insight on the part of the authors of Writeout.ai, who have evidently taken advantage of the fact that the generative model also knows languages that are the product of imagination and are not spoken on this planet.

The audio file cannot currently exceed 25 MB in size and must be in one of the following formats: mp3, mp4, mpeg, mpga, m4a, wav, webm. What does MP4 have to do with audio? Writeout.ai is able to extract the audio track contained in videos.
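
Before uploading, the size and format limits described above can be checked locally. The following is only a minimal sketch: the function and constant names are hypothetical, it assumes that "25 MB" means 25 × 1024 × 1024 bytes, and it looks only at the file extension, not at the actual content of the file.

```python
from pathlib import Path

# Limits quoted in the article; names are illustrative, not part of Writeout.ai.
MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB
ALLOWED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Compare the extension case-insensitively and without the leading dot.
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format: " + (extension or "(none)"))
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 25 MB")
    return problems
```

For a real file, the size can be obtained with Path(path).stat().st_size and passed as the second argument.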

Moreover, using one of the tools available on the web, it is even possible to generate MP3, M4A or WAV files from multimedia content published on the main video sharing platforms.

From the Writeout.ai home page, just click on Transcribe for free, then choose the audio file whose content is to be turned into editable text.

In the Prompt box we strongly advise you to insert a detailed description of what the file contains, in Italian or, preferably, in the language used in the audio recording. The Prompt text should describe the audio content: this is very useful for the generative model, helping it to “understand” specific words and acronyms correctly.

After clicking Transcribe and waiting a few moments, Writeout.ai shows a page confirming that the transcription has been completed and is available.

By clicking on the Download transcript button, the browser automatically downloads a file in VTT format.

The acronym VTT stands for WebVTT, which means Web Video Text Tracks: this is a format created for the subtitles of web videos that has become a widely used standard across several online video platforms, such as YouTube and Vimeo.

The VTT file is plain text that can be opened with a normal text editor (or specialized software). It contains the subtitle information: the start and end time of each sentence relative to the audio or video content, and the text to be shown at that moment.
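
The structure just described (a timing line followed by the text to display) can be read with a few lines of Python. This is only a sketch under stated assumptions: the function name and the sample content are invented for illustration, timestamps are assumed to use the hh:mm:ss.mmm form, and optional WebVTT features such as cue identifiers with settings, notes or styles are not handled.

```python
import re

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def parse_vtt(vtt_text):
    """Return a list of (start, end, text) tuples, one per cue."""
    cues = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Cues are separated from each other by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[index + 1:])
                cues.append((match.group(1), match.group(2), text))
                break
    return cues


# Invented sample, not actual Writeout.ai output.
SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:03.500
Welcome to this video.

00:00:03.500 --> 00:00:07.000
Today we talk about transcription.
"""

for start, end, text in parse_vtt(SAMPLE):
    print(start, end, text)
```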

What you see when opening a VTT file generated with Writeout.ai is similar to what is reproduced in the figure. In the example we used the audio of our YouTube video to verify the quality of the transcription obtained. The result? Absolutely perfect!

In another article we have seen how to transcribe the text of YouTube subtitles, but here, with Writeout.ai, we are at a much higher level. YouTube videos come with a file containing the corresponding subtitles; Writeout.ai, on the other hand, uses AI to transcribe text from audio tracks that don’t have any “attached” subtitles.

Using the same tips presented in the article dedicated to YouTube, it is still possible to clean the VTT file and keep only the text of the transcript.

Despite this, Chrome and other web browsers treat the VTT format as “uncommon”: when the transcript is downloaded, the warning This file is not commonly downloaded and could be dangerous may therefore be shown.

To proceed, just click the button that continues the download and authorize the download of the text file in VTT format.

How to translate the transcript with just one click

It doesn’t end there. By selecting one of the languages supported by Writeout.ai and clicking on Change, you can obtain a translation of the previously generated transcript: in this case too, the result is a VTT file with all the timings needed for subtitling the audio or video content.

How to delete superfluous information from the VTT file

If you simply want the running text of the transcript generated by Writeout.ai, you can open the file in a text editor such as Notepad++ and press CTRL+H to open the Replace window.

By pasting the following into the Find what box, leaving the Replace with field blank, selecting the Regular expression option and finally clicking Replace All, Notepad++ removes all timestamps:

\d{2}:\d{2}:\d{2}\.\d+ --> \d{2}:\d{2}:\d{2}\.\d+\s*\r?\n?

To remove the empty lines in the VTT file, just paste the following into the Find what field and repeat the procedure just described:
^\s*\r?\n
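
The same two-step cleanup can be scripted for those who prefer not to use a text editor. The sketch below applies the two patterns with Python's re module, with the dot in the timestamps escaped and the WebVTT arrow written as "-->". The function name is hypothetical, and removing the WEBVTT header line is an extra step that is not part of the procedure described above.

```python
import re

# Cue timing lines, e.g. "00:00:01.000 --> 00:00:04.000".
TIMESTAMP_LINE = re.compile(
    r"\d{2}:\d{2}:\d{2}\.\d+ --> \d{2}:\d{2}:\d{2}\.\d+\s*\r?\n?"
)
# Lines that contain nothing but whitespace.
EMPTY_LINE = re.compile(r"^\s*\r?\n", re.MULTILINE)


def vtt_to_plain_text(vtt_text):
    """Strip the header, the cue timings and the blank lines from a VTT transcript."""
    # Extra step (assumption): drop the "WEBVTT" header line if present.
    text = re.sub(r"\AWEBVTT[^\n]*\n", "", vtt_text)
    text = TIMESTAMP_LINE.sub("", text)
    text = EMPTY_LINE.sub("", text)
    return text.strip()
```

Reading the downloaded file with open(path, encoding="utf-8").read() and passing the result to vtt_to_plain_text returns only the spoken text, one cue per line.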

What are the limitations of Writeout.ai?

As we explained earlier, Writeout.ai relies on the OpenAI API but uses the free plan. When the credits offered monthly by OpenAI run out, the web application stops working and shows the error message Transcription failed.

For the moment, Writeout.ai works on a first come, first served basis: the users who are first to “consume” the tokens provided free by OpenAI can convert audio into text right away.

As we explain in the next section, however, the source code of Writeout.ai has been published on GitHub. Anyone can therefore create an OpenAI account (even at no cost) and run Writeout.ai on a local system, within their own network infrastructure. In this way it is possible to overcome the limitations currently present in the web version of Writeout.ai.

How AI-powered audio-to-text transcription works

Laravel is an open source framework for developing PHP-based web applications. It provides a robust, modular architecture and a wide range of built-in features, such as user management, authentication, data processing, caching and many others, which make it possible to quickly write efficient code that is also easy to manage and maintain.

On the GitHub page of Writeout.ai, the developers explain that they used Laravel to interact in real time with the OpenAI API.

Specifically, Laravel’s so-called queued jobs were used to perform, in a reliable and scalable way, the asynchronous tasks that lead to the generation of audio transcripts.
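
The idea behind queued jobs can be illustrated with a generic producer and worker. What follows is a simplified analogy written in Python with the standard library, not Laravel code and not how Writeout.ai is implemented: the request handler only places a job on a queue, and a separate worker performs the slow transcription step. All names are hypothetical and the transcription call is replaced by a stub.

```python
import queue
import threading


def transcribe_stub(audio_name):
    # Placeholder for the slow call to a speech-to-text service.
    return "transcript of " + audio_name


def worker(jobs, results):
    """Process jobs from the queue until a None sentinel is received."""
    while True:
        audio_name = jobs.get()
        if audio_name is None:
            jobs.task_done()
            break
        results[audio_name] = transcribe_stub(audio_name)
        jobs.task_done()


def run_jobs(audio_names):
    jobs = queue.Queue()
    results = {}
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for name in audio_names:
        jobs.put(name)  # enqueueing returns immediately; the work happens later
    jobs.put(None)      # sentinel: tell the worker to stop
    jobs.join()         # wait until every queued item has been processed
    thread.join()
    return results
```

In a real deployment the queue would typically be persistent (for example backed by a database) and the workers would run as separate processes, which is what makes the approach reliable and scalable.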

Because the source code of Writeout.ai is published on GitHub, developers can simply clone the repository with the command git clone https://github.com/beyondcode/writeout.ai and then register an OpenAI account to obtain the key for using the API.

Your key must then be specified in the OPENAI_API_KEY variable inside the .env configuration file of Writeout.ai.
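
To verify that the key has been written correctly, the .env file can be read with a small script. This is only a sketch: the function name is hypothetical, the key shown is a placeholder, and the parser handles only simple NAME=value lines with optional quotes, which is an assumption about the file's layout rather than a full implementation of the .env syntax.

```python
def read_env_value(env_text, key):
    """Return the value of `key` from .env-style text, or None if it is absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not NAME=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Remove surrounding whitespace and optional quotes.
            return value.strip().strip('"').strip("'")
    return None


# Placeholder content; a real key comes from the OpenAI account page.
EXAMPLE_ENV = """
# OpenAI credentials
OPENAI_API_KEY="sk-your-key-here"
"""

print(read_env_value(EXAMPLE_ENV, "OPENAI_API_KEY"))
```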

Using the OpenAI API has a cost, as we also saw in the article on the integration between ChatGPT and Google Sheets. The work done by Writeout.ai, however, can be adapted to local use with minimal effort.

Indeed, the same OpenAI Whisper model can run locally free of charge: in this way it is possible to manage transcriptions and translations without going through the web and remote cloud services.

To the programmers who read us, we point out a WASM-based version of Whisper: it is an interesting project, also published on GitHub, which allows audio data to be processed locally, on the user’s computer. WASM (WebAssembly) improves the performance of web applications by making it possible to execute high-performance code directly in the browser, so transcription and translation can be carried out very quickly even locally, on your own systems. Fantastic, right?