Kevin Stratvert

How to Install & Use Whisper AI Voice to Text


Hi everyone, Kevin here. Today, we’re going to look at how you can both install and use OpenAI’s Whisper AI. With Whisper, you can transcribe speech to text, and the quality is extremely high. In fact, you can click on the caption icon down below on this video to see captions generated by Whisper. It works with over 96 different languages and, my favorite part, it’s completely free to use. If you would prefer not to install anything on your PC, check out the video right up above, which shows you how you can use Whisper AI entirely in the cloud. In this video, we’re going to install it on your PC. Let’s check this out.

To get Whisper AI working on your computer, we need to install five different items. I know that sounds like a lot, but we’ll walk through step by step how you install all of them. Also, at the very end, if you no longer have a need for Whisper AI for transcribing audio, I’ll walk you through how you can uninstall all of this.

First, we need to download something called Python. You can click on the card up above or the link down below in the description. Python is the programming language that Whisper AI uses. On the Python homepage, click on the text that says download, and on the download page, you have a few different versions. Whisper AI works from version 3.7 all the way up to 3.10. It currently does not work on 3.11. If I scroll down a little bit, here we see all of the different release versions. I’ll click on 3.10.10, and on this page, if we scroll all the way to the bottom, you can choose your operating system. I’m running a Windows machine, so I’ll select the Windows installer (64-bit).

Once you finish downloading Python, click on the exe file in your downloads folder. This kicks off the installation process, and there’s one thing you have to make sure to check. Down at the very bottom, you’ll see a checkbox that says add python.exe to PATH. Check this box.
Checking this box allows us to run Python directly from the command prompt, which we’re going to do later, so make sure to check it. Next, click on install now and run through the install. And just like that, it looks like the setup was successful. To confirm the installation, go down to the search icon on the taskbar and type in CMD for command prompt. This opens up the command prompt, where you can type in python -V, V for version. When I hit enter, it tells me that I have Python 3.10.10 installed, and that’s exactly what I expected.

Next, we need to install something called PyTorch. You can click on the card up above or the link down below in the description. PyTorch is a machine learning library. Here on the homepage, if we scroll down just a little bit, we see a section that says start locally. Basically, we want to run this on our computer. Right down below, we have to configure a few different settings. First, we want to install the current stable version, so I’ll make sure this is selected. Next, you can choose your operating system. It works on Linux, Mac, and also Windows. I’ll select Windows. Then we have to choose the package type, and I’ll select Pip since we just installed Python. For the language, we’ll use Python.

Finally, we can choose the compute platform. If you have a high-end graphics card in your computer, like let’s say an Nvidia graphics card, I would recommend choosing CUDA 11.8. That’s the most recent version. Over on the right-hand side, if you don’t have a high-powered GPU in your computer, you could select CPU, but this doesn’t go as quickly as a dedicated graphics card, so ideally you would select the CUDA option. Once you make all these selections, let’s copy the command down below. I’ll press Ctrl+C. Back in command prompt, you can press Ctrl+V or your right mouse button, and that will paste the command that we just copied. To install PyTorch, simply press enter now.
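Once the install finishes, a short script can double-check the two things covered so far. This is only a minimal sketch: the version range is the one stated in the video (3.7 through 3.10), the helper names `python_version_supported` and `describe_torch` are made up for illustration, and the PyTorch check simply reports "not installed" if the import fails.

```python
import sys

# Supported range as stated in the video: Python 3.7 up to 3.10 (not 3.11).
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 10)


def python_version_supported(version_info):
    """Return True if the (major, minor) version is inside the stated range."""
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


def describe_torch():
    """Report whether PyTorch imports and whether it can see a CUDA GPU."""
    try:
        import torch  # only present after the PyTorch install step
    except ImportError:
        return "PyTorch is not installed in this Python environment."
    device = "CUDA GPU available" if torch.cuda.is_available() else "CPU only"
    return f"PyTorch {torch.__version__} ({device})"


print("Python", sys.version.split()[0],
      "supported:", python_version_supported(sys.version_info))
print(describe_torch())
```

If the second line reports CPU only on a machine with an Nvidia card, the CPU build of PyTorch was probably selected on the PyTorch page instead of the CUDA one.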
And it looks like PyTorch has now successfully completed installing. We’re on number three now. See, we’re making some really good progress. Here we need to download a package manager called Chocolatey, and this will work on Windows. If you’re running a Mac, I recommend downloading and installing something called Homebrew. In the top right-hand corner, let’s click on the text that says install. This drops us onto the install page, and right down here, we need to choose how to install Chocolatey. Here, I’ll select individual. If we go down a little bit more, you’ll see a text box. Let’s click into this and then select copy. On your Windows desktop, go down to the search icon and type in PowerShell. Here, we see PowerShell as the best match. Right click on that and then select run as administrator. This now opens up PowerShell. You can press Ctrl+V or your right mouse button, and that pastes in the command that we just copied from Chocolatey. Press enter, and this will go through and install Chocolatey.

Now that we’ve finished installing Chocolatey, we’re going to use the Chocolatey package manager to install something called FFmpeg, and we’re going to use FFmpeg to read the different audio files, whether it’s a WAV file or an MP3. Down below within PowerShell, type in choco install ffmpeg, then hit enter. This will now install the package. Here I’ll click on yes. And it looks like FFmpeg was successfully installed.

Here I am now in command prompt in administrator mode, and this brings us to the fifth and final item to install, and that’s Whisper AI. To install it, type in pip install -U openai-whisper, and then hit enter. The -U means that if for whatever reason you already have Whisper on your computer, it will be upgraded to the latest version. This will now go through and install Whisper AI. Congratulations.
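Before moving on, it can help to confirm that the command-line tools from these steps are actually reachable. The sketch below assumes the commands are named ffmpeg and whisper, as they are typed in the video; `missing_tools` is a hypothetical helper, and `shutil.which` is the standard-library way to look a command up on the PATH.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


# Command names as typed in the video.
missing = missing_tools(["ffmpeg", "whisper"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg and whisper were both found on PATH.")
```

If something shows up as missing right after installing it, opening a new command prompt window is worth trying first, since an already-open window may not have picked up the updated PATH.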
We have now finished installing all of the prerequisites to run Whisper AI. Next, navigate to the folder that has all of your different audio files. This will work with WAV files, MP3, MP4, all types of audio and also video files. Within File Explorer, click into the address field right up here, type in CMD, and press enter. This opens up command prompt, and we’re currently in the same directory that all of our audio files are in. So that’s perfect, and we’re now ready to finally run Whisper.

To run Whisper, simply type in whisper, a space, and then the file name. Here I’ll type in SampleAudio1.wav. If your file name has spaces in it, you can put quotes around it. So here I could also type in "SampleAudio1.wav" with quotes around it, and that will also work. And that’s all you need to do. Let’s now hit enter, and this will start running Whisper AI. By default, this will use the small model, and later on, I’ll show you how you can use other models. One of the really neat things is that it automatically detects the language used in the file. Here it successfully identified that I used English, and right down below, I can see all the text that makes up this file, so it looks like it has successfully transcribed the file.

Let’s now minimize command prompt. This brings me back into File Explorer, and you’ll probably notice that we have several new files here. They’re all different file formats, but they all include the transcript. For instance, I can click into the JSON file, and here we have all of the transcribed text. Especially if you want to pull your text in paragraph format, this is a really good way to do that. I can also click into the SRT file, and this is a caption file that includes a transcript of everything that was said, along with timestamps up above.
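If you want to pull the paragraph text out of that JSON file with a script, something like the following could work. Treat it as a sketch under assumptions: it assumes the JSON file has a top-level "text" field plus a "segments" list whose entries carry "start", "end", and "text", which is how Whisper's JSON output is commonly laid out, and the file name SampleAudio1.json is hypothetical, so check the actual name in your folder.

```python
import json
from pathlib import Path


def load_transcript(json_path):
    """Load a Whisper JSON output file and return (full_text, segments)."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    text = data.get("text", "").strip()   # whole transcript as one string
    segments = data.get("segments", [])   # timestamped pieces, if present
    return text, segments


def format_segment(segment):
    """Render one segment as '[start -> end] text' with times in seconds."""
    start, end = segment["start"], segment["end"]
    return f"[{start:.1f}s -> {end:.1f}s] {segment['text'].strip()}"


path = Path("SampleAudio1.json")  # hypothetical output file name
if path.exists():
    full_text, segments = load_transcript(path)
    print(full_text)
    for segment in segments:
        print(format_segment(segment))
else:
    print(f"{path} not found; run whisper on an audio file first.")
```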
Here you have some additional caption formats, and you also have a TXT file, where we just see the pure text without any timestamps at all.

Let’s now go back into command prompt to see how we can transcribe multiple files at once. Just like we did before, let’s type in whisper and one of my file names, SampleAudio1.wav. Then I simply insert a space and type in another file name, here SampleAudio2.wav. Now I can press enter, and it’ll go through and transcribe both audio files. This works especially well if you have a number of different files that you need to transcribe. And just like that, it has now finished transcribing both of my files. If I minimize command prompt, I can see that I have a set of output files for each one of my audio files. That’s pretty quick and easy.

By default, Whisper AI uses the small model, but you have five different models that you can choose from. In general, the larger the model, the better the quality that you’ll get, but you do need to have a GPU that’s capable of running it. You’ll also find that the larger the model, the longer it tends to take to process. And at least from what I’ve found, there are diminishing returns as you go larger.

Next, let’s look at how you can use one of these different models when you run your transcript. Back in command prompt, to use another model, simply type in whisper and your .wav file name. Next, type in --model, and here you can specify the model that you would like to use. I’ll type in medium and press enter, and it’ll use that other model. If you haven’t used that model before, it will first need to download it. And it’s now finished transcribing all of my audio. The main thing that stands out to me is that it included some additional punctuation, for example a comma after “attention” and another comma over here, and I didn’t get that when I just used the small model.
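The multi-file and --model steps above can also be scripted. The sketch below only builds the command and prints it. The five model names are an assumption taken from Whisper's documentation (the video itself only names small and medium), and `find_audio_files`, `build_whisper_command`, and `run_whisper` are illustrative helpers, not part of Whisper.

```python
import subprocess
from pathlib import Path

# Assumed model names; the video only mentions "small" and "medium" by name.
MODELS = ("tiny", "base", "small", "medium", "large")
AUDIO_EXTENSIONS = {".wav", ".mp3", ".mp4"}


def find_audio_files(folder):
    """Return the sorted names of audio/video files directly inside `folder`."""
    return sorted(
        p.name for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def build_whisper_command(files, model=None):
    """Build the argument list for: whisper file1 file2 ... [--model NAME]."""
    if not files:
        raise ValueError("at least one file name is required")
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    command = ["whisper", *files]
    if model is not None:
        command += ["--model", model]
    return command


def run_whisper(command):
    """Run the command; requires whisper to be installed and on the PATH."""
    subprocess.run(command, check=True)


files = find_audio_files(".")
if files:
    print(build_whisper_command(files, model="medium"))
else:
    print("No audio files found in this folder.")
```

Because the command is passed as a list rather than one long string, file names with spaces do not need extra quotes. Calling `run_whisper(...)` on the printed command would start the actual transcription.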
So, you do get slightly better quality with the larger model, but again, it will take a little bit longer.

Back within File Explorer, here I have a file titled German.wav. This is audio in German, and I would like to transcribe it. Right up above, let’s again launch command prompt. Within command prompt, just like we’ve been doing all along, type in whisper and then the file name. Here I’ll type in German.wav. Now I could just press enter and it’ll auto-detect the language, but I can also specify the language. Here I’ll type in --language, a space, and then German, so it doesn’t have to auto-detect, and then I press enter. And just like that, here I have a transcript in German of all of the audio in this file.

Along with transcribing audio in different languages, you can also translate the audio into English. Unfortunately, you cannot currently translate into any other language. Here I simply entered the same command that I entered previously. Now I’ll add --task. By default, it’s set to transcribe, but here I can set it to translate. Then I’ll press enter, and there I can see a translation of all of this German text. Now it’s not perfect and I’ll have to go back and make some tweaks, but overall, I’d say it’s pretty solid.

As we’ve been walking through this, I’ve been using lots of different arguments like --language or --task. If you’d like to see a list of all of the different arguments that you can use with Whisper, simply type in whisper --help. This will spit out a list of all the arguments that you have available to you, along with a description of what each of them does. For instance, you can choose where you want to save all of your output files, and there are lots of other settings here. So, feel free to look through to see what all of the different options are.
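The same language and task choices are also available from Python, since installing openai-whisper provides a `whisper` module. The following is a rough sketch rather than the method used in the video: `decode_options` and `transcribe_file` are hypothetical helpers, "de" is assumed as the language code for German, and the actual transcription only runs if you call `transcribe_file` yourself with Whisper installed.

```python
TASKS = ("transcribe", "translate")  # translate always targets English


def decode_options(language=None, task="transcribe"):
    """Build the keyword options passed on to Whisper's transcribe call."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}")
    options = {"task": task}
    if language is not None:
        options["language"] = language  # omit to let Whisper auto-detect
    return options


def transcribe_file(path, model_name="small", language=None, task="transcribe"):
    """Transcribe or translate one file and return the resulting text."""
    import whisper  # imported lazily; needs the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **decode_options(language, task))
    return result["text"]


# Options equivalent to: whisper German.wav --language German --task translate
print(decode_options(language="de", task="translate"))
```

With Whisper installed, `transcribe_file("German.wav", language="de", task="translate")` would return the English translation as a single string.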
On the following page, which you’ll find linked in the description, you can see all of the different languages that Whisper AI supports. In general, the lower the number shown for a language, the higher the quality that you’ll see. Typically, when I finish transcribing, I’ll listen to the audio and look at the text just to make sure that it’s accurate. Overall, it works incredibly well, but I do find that I have to go back and make a few small tweaks here and there.

If you decide that you no longer want Whisper AI on your computer, you can uninstall it by walking through all of these different steps. You’ll also find this in the description of this video.

All right, well, let me know down below in the comments whether you were successful at transcribing audio. To watch more videos like this one, please consider subscribing, and I’ll see you in the next video.
