Hi everyone. Kevin here. Today, we’re going to look at how you can take speech and turn it into text using AI. And the really crazy thing is that it does a better job than most humans. You can use it with English and 96 other languages. It works even if you have a lot of background noise. And it also works if you have a very thick accent. The best part is that it’s completely free and also open source. Let’s check out how to do this.
We’re going to use an AI tool called Whisper. Whisper is made by a company called OpenAI. And you might have heard of them before. That’s the same company behind the immensely popular ChatGPT, which allows you to converse with a computer. They’re also the company behind DALL-E 2, where you can type in some text, and then it’ll generate an image based on that text.
You can install Whisper directly on your computer. You can click on the link right up above. But you do need a somewhat capable computer. So instead, we’re going to use something called Google Colaboratory. This allows you to run code directly in your web browser. So it doesn’t really matter what type of PC you have.
To use Google Colaboratory, head to Google Drive. You can click on the link right up above. You’ll need a Google account, and if you don’t have one yet, it’s entirely free to set up. On Google Drive, in the top left-hand corner, let’s click on the New button. And at the very bottom, let’s click on More, and then go down to Connect More Apps. At the top of this dialog, let’s click into the search field, and here, type in Google Colaboratory and then search. Here, we see this result for Colaboratory. Let’s click on that, and here, let’s click on Install. Next, let’s click on Continue. Next, you should see a message saying that Google Colaboratory was connected to Google Drive. Let’s click on OK. And look at that. It has successfully been installed. Let’s click on Done. Now, you can close out this window. Let’s now go back to the top left-hand corner.
Click on the New button again. Then go down to More. And here, you should now see an option for Google Colaboratory. Let’s click on this one. This drops us into the Google Colaboratory space. And at first glance, it might look a little bit intimidating. But trust me, this is going to be so easy, and the results are going to be so good.
In the top left-hand corner, first off, let’s give our file a name. This way, you can find your way back to this in the future. I’ll click on Untitled. Let’s double-click on that, and here, I’ll type in Transcribe Audio. Here, I’ll click away, and that’s now the name of the file.
Next, let’s click on the menu titled Runtime, and right here, there’s the option for Change Runtime Type. Let’s click on that, and that opens up this dialog where we can choose the hardware accelerator. Be sure to select GPU, which is a graphics card. It turns out that graphics cards run these models extremely well. Next, let’s click on Save.
Next, we need to install Whisper AI. So let’s go up to this field right up above where we can enter in code. And here, I’ll enter this in. You’ll find this in the description, so you can simply copy and paste it from there. First, we’re going to install Whisper, and we’re getting this from GitHub. This is where all of the code is kept and also maintained. Once we get that, we’re going to install something called ffmpeg. And this allows us to work with audio and video files. And although I say we’re going to install it, don’t worry, we’re not installing anything on your computer. This is installing it all to Google Colaboratory. Once you’re all set, over on the left-hand side, let’s click on this Run icon. This will now go through and install Whisper and also ffmpeg. And it looks like the installation finished in about 23 seconds. Not too bad.
Over on the left-hand side, let’s click on this Folder icon. And you can now drag in an audio file or a video file that you would like to transcribe.
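The install step described above (Whisper from GitHub, then ffmpeg) can be sketched in Python as follows. The exact commands used in the video are in its description and are not part of this transcript, so everything here is an assumption: the repository address is the one published in the Whisper README, `apt-get` assumes a Debian-style environment with root access, and `install_commands` and `run_install` are hypothetical helper names. In a notebook cell, the same two steps are usually written as shell lines prefixed with `!`.

```python
import subprocess
import sys

# Assumption: install source taken from the Whisper README, not from the video.
WHISPER_SOURCE = "git+https://github.com/openai/whisper.git"


def install_commands():
    """Return the two install steps as argument lists (hypothetical helper)."""
    return [
        # Step 1: install Whisper straight from its GitHub repository.
        [sys.executable, "-m", "pip", "install", WHISPER_SOURCE],
        # Step 2: install ffmpeg so audio and video files can be decoded.
        # Assumption: Debian-style system where the session has root access.
        ["apt-get", "install", "-y", "ffmpeg"],
    ]


def run_install():
    """Run each step in order and stop at the first failure."""
    for command in install_commands():
        subprocess.run(command, check=True)
```

Calling `run_install()` performs the actual installation inside the hosted runtime, not on your own computer; nothing runs until you call it.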
Here, I have an MP3 file, and I’ll simply drop this in. Here, it says that the uploaded files will get deleted when this runtime is recycled. That’s OK, so let’s click on OK. And now we can see that the file has been successfully uploaded. I’m now ready to extract text from this audio file.
Let’s go back up to the top and here, I’ll insert some code. This inserts another field down below, and here, I’ll type in whisper. This calls the Whisper AI. Then you need to type in the name of the file that you want to extract text from. Mine is called cookies.mp3. So here, I’ll make sure it says cookies.mp3. And last, you can also specify the model that you would like to use. I want to use the medium model.
You have five different models that you can choose from. On the low end, you have the tiny model. This takes up the least space. It also works the quickest, but you get the worst accuracy. On the other end, you have the large model. It takes up about a gig and a half. It also takes the longest time to process. But you also get the highest quality level. I found that a good sweet spot is going with the medium model.
Once you finish entering this in, let’s click on the Run icon. And check that out. It has now finished running. And right down here, I can see a transcript of everything that was said in this audio file. Also, over on the left-hand side, if you don’t see these three new files, right up on top, click on the Refresh icon, and you should see an SRT file, a TXT file, and a VTT file. A text file is just all of the text from the audio. SRT and VTT are caption formats that also include timestamps, so you know what was said when.
To download any one of these files, over on the right-hand side, click on the ellipsis, or the three dots, and here you can click on Download. I’ll download the SRT file and also the TXT file. Here, I’ll click on Download. Here, we can see the TXT file.
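As a rough sketch of the command typed in this step, the Python below assembles the same `whisper` call: the file name, then the model. The five size names and the `--model` flag are assumed from Whisper's own documentation, since the transcript only names tiny, medium, and large; `build_whisper_command` and `transcribe` are hypothetical helpers for illustration.

```python
import subprocess

# The five model sizes, smallest and fastest first, largest and most accurate last.
# Assumption: Whisper also ships variants (for example English-only ones)
# that this sketch leaves out.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def build_whisper_command(audio_path, model="medium"):
    """Build the argument list for a basic transcription run."""
    if model not in MODEL_SIZES:
        raise ValueError(f"unknown model {model!r}; expected one of {MODEL_SIZES}")
    return ["whisper", audio_path, "--model", model]


def transcribe(audio_path, model="medium"):
    """Run Whisper on one file; requires Whisper and ffmpeg to be installed."""
    subprocess.run(build_whisper_command(audio_path, model), check=True)
```

For the file used in the video, `build_whisper_command("cookies.mp3")` gives the argument list for `whisper cookies.mp3 --model medium`. Swapping in `"tiny"` trades accuracy for speed, and `"large"` does the reverse.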
And the thing I love about using Whisper is, first off, reading through this, it looks like it did a perfect job transcribing. Also, look at all of this: it applied capitalization. You also get punctuation, so this is a very high-quality transcript. When I open up the SRT file, here you’ll see the exact same transcript, but it also includes timestamps for when everything is said. To transcribe another file, you can simply drag another audio or video file in, and then simply update the name right here, and you can run again, and then you’ll get another transcript for your next file.
To transcribe this file, we just used a very basic command. You also have some additional parameters that you can use. Right up on top, let’s add some more code, and right down here, type in whisper -h. You’ll also find this in the description, and then let’s click on Run. This opens up all of the available parameters. Here, for instance, you can specify where you want to save the output. Here, you can also specify whether you want to transcribe a file or whether you also want to translate a file. Here, you can also specify the language, and you have many other parameters. If you’re not sure what a parameter does, if you scroll down a little bit, here you’ll see a detailed explanation of what every single parameter does.
Once you leave Google Colaboratory, your runtime will end, and it’ll automatically remove all of your files. So if you’ve transcribed some audio, I’d recommend downloading it before you leave.
This is such amazing technology. I personally use it for all of my YouTube video captions. It does a better job than Google’s auto-generated captions because it gets all the words right. It applies capitalization. It takes care of the punctuation. I just have to go in and make a few very minor tweaks and refinements to get it perfect. To watch more videos like this one, please consider subscribing, and I’ll see you in the next video.
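The three extra parameters mentioned above (output location, transcribe versus translate, and language) can be sketched as optional flags added to the basic command. The flag names `--output_dir`, `--task`, and `--language` are assumed from Whisper's help output and are not spelled out in the transcript; `extra_options`, the file name `interview.mp3`, and the folder `transcripts` are hypothetical, chosen only for illustration.

```python
# The two tasks described in the help output: keep the spoken language,
# or translate the speech into English.
TASKS = ("transcribe", "translate")


def extra_options(output_dir=None, task=None, language=None):
    """Return optional flags to append to a basic whisper command."""
    options = []
    if output_dir is not None:
        # Where the TXT, SRT, and VTT files are written.
        options += ["--output_dir", output_dir]
    if task is not None:
        if task not in TASKS:
            raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
        options += ["--task", task]
    if language is not None:
        # Language spoken in the audio, so Whisper does not have to detect it.
        options += ["--language", language]
    return options


# Hypothetical example: translate a Japanese recording into English text.
command = ["whisper", "interview.mp3", "--model", "medium"] + extra_options(
    output_dir="transcripts", task="translate", language="Japanese"
)
```

Leaving every argument at `None` adds no flags, so the command falls back to Whisper's defaults. Running `whisper -h` remains the authoritative list of what each parameter accepts.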