Hey there, tech enthusiasts! In this Whisper tutorial, we'll transcribe audio with Python. We'll use the Pytube library to download a YouTube video's audio track as an MP4 file, and then use Whisper to transcribe that audio into text.
First things first, let's install the Pytube library. Open your terminal and run the following command:
Code:
pip install pytube
Now that Pytube is installed, we can import it, point it at the YouTube video we want to transcribe, and download the audio-only stream as an MP4 file:
Code:
# Import the Pytube library
import pytube
# Read the YouTube link
video = "https://www.youtube.com/watch?v=x7X9w_GIm1s"
data = pytube.YouTube(video)
# Select the audio-only stream and download it as an MP4 file
audio = data.streams.get_audio_only()
audio.download()
The output is a file named after the video title, saved in your current directory. In our case, the file is named "Python in 100 Seconds.mp4".
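Because the file name depends on the video title, hardcoding it in later steps can be brittle. Here is a minimal sketch, using only the standard library, that picks the most recently modified .mp4 file in a directory. The helper name `latest_mp4` is hypothetical and not part of Pytube, and it assumes the file you just downloaded is the newest .mp4 in that directory.

```python
from pathlib import Path


def latest_mp4(directory="."):
    """Return the most recently modified .mp4 file in `directory`, or None."""
    candidates = list(Path(directory).glob("*.mp4"))
    if not candidates:
        return None
    # Pick the file with the newest modification time
    return max(candidates, key=lambda path: path.stat().st_mtime)


# Hypothetical usage: look in the current directory
audio_file = latest_mp4()
print(audio_file)
```

The returned path can be converted with `str(audio_file)` wherever a plain file name is expected.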
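Now it's time to convert the audio into text using Whisper. We'll start by installing the Whisper library from its GitHub repository and then importing it. The leading ! runs the install from a notebook cell; drop it if you are working in a regular terminal. Whisper also relies on the ffmpeg command-line tool to read audio files, so make sure it is installed on your system.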
Code:
!pip install git+https://github.com/openai/whisper.git -q
Code:
import whisper
Next, we'll load the model and transcribe the audio file. We'll use the "base" model for this tutorial; the Whisper README lists the other available models. Each one trades accuracy against speed (compute needed):
Code:
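# Load the "base" model (downloaded automatically on first use)
model = whisper.load_model("base")
# Transcribe the audio file; the result is a dictionary
text = model.transcribe("Python in 100 Seconds.mp4")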
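To make the accuracy versus speed tradeoff a bit more concrete, here is a rough sketch that picks the largest model fitting a given memory budget. The VRAM figures are the approximate values listed in the Whisper README and should be treated as assumptions that may change; the helper name `pick_model` is hypothetical and not part of Whisper. The figures apply to GPU use, and Whisper can also run on a CPU, only more slowly.

```python
# Approximate VRAM needs in GB per model, taken from the Whisper README
# (assumed figures; check the README for current values).
REQUIRED_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(available_vram_gb):
    """Return the largest model name whose approximate VRAM need fits the budget."""
    # Dicts keep insertion order, so reversing walks from largest to smallest
    for name in reversed(list(REQUIRED_VRAM_GB)):
        if REQUIRED_VRAM_GB[name] <= available_vram_gb:
            return name
    raise ValueError("Not enough memory for any listed Whisper model")


print(pick_model(4))   # small
print(pick_model(16))  # large
```

The returned name can then be passed to `whisper.load_model` in place of "base".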
And that's it! The transcription is stored under the "text" key of the result, so we can print it out:
Code:
# Print the transcription
print(text['text'])
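Besides the full text, the dictionary returned by `model.transcribe` also contains a "segments" list, where each segment carries "start" and "end" times in seconds along with its own "text". As a small sketch, the helpers below turn those segments into timestamped lines. The function names and the sample dictionary are made up for illustration, so the example runs without Whisper installed.

```python
def format_timestamp(seconds):
    """Format a number of seconds as MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(result):
    """Build one '[start - end] text' line per segment in a transcription result."""
    lines = []
    for segment in result["segments"]:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


# Made-up sample shaped like a transcription result (not real Whisper output)
sample = {
    "text": " Hello world. This is a test.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello world."},
        {"start": 2.5, "end": 65.0, "text": " This is a test."},
    ],
}

for line in format_segments(sample):
    print(line)
```

With a real transcription, you would pass the result of `model.transcribe` instead of `sample`.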