- YouTube -> Text: Transcribe YouTube URLs to a text file (csv)
- YouTube -> Audio: Download YouTube URLs as an audio file (wav, flac)
- Audio -> Text: Transcribe an audio file (wav, flac) to a text file (csv)
Three folders will be created to store the output files.
<Own Path> or <HOME_DIRECTORY>/youtube2text
│
├── audio/
│   └── 2022Jan02_011802.flac
│
├── audio-chunks/
│   └── 2022Jan02_011802/
│       ├── chunk1.flac
│       ├── chunk2.flac
│       └── chunk3.flac
│
└── text/
    └── 2022Jan02_011802.csv
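The layout above suggests that one timestamp-like stem is shared by the audio file, the chunk folder and the text file. A minimal sketch of that naming scheme follows. The helper name `build_output_paths` and the `strftime` pattern are assumptions inferred from the sample name `2022Jan02_011802`; they are not taken from the library's code.

```python
from datetime import datetime
from pathlib import Path


def build_output_paths(base_dir, when, audioformat="flac"):
    """Return the audio, chunk-folder and text paths for one run (hypothetical helper)."""
    # Assumed pattern that reproduces names like 2022Jan02_011802.
    # %b is locale-dependent; it yields "Jan" under the default C locale.
    stem = when.strftime("%Y%b%d_%H%M%S")
    base = Path(base_dir)
    return {
        "audio": base / "audio" / f"{stem}.{audioformat}",
        "chunks": base / "audio-chunks" / stem,
        "text": base / "text" / f"{stem}.csv",
    }


paths = build_output_paths(Path.home() / "youtube2text", datetime(2022, 1, 2, 1, 18, 2))
print(paths["text"].name)
```

Keeping a single stem per run makes it easy to match a transcript in text/ with its source audio and chunks.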
Install and update using pip
pip install youtube2text
git clone <this_repo>
cd <this_repo>
python setup.py install
- Using the library requires an internet connection, both for downloading YouTube videos and for speech recognition.
from youtube2text import Youtube2Text
converter = Youtube2Text()
converter.url2text(urlpath="https://www.youtube.com/watch?v=Ad9Q8rM0Am0&t=114s")
Check out more at howtouse.ipynb
- Supports audio output in:
  - wav
  - flac
- Supports automatic speech recognition with the speech-recognition library
def url2text(self, urlpath, outfile=None, audioformat="flac", audiosamplingrate=16000):
    '''
    Convert youtube url to text

    Parameters:
        urlpath (str): Youtube url
        outfile (str, optional): File path/name of output file (.csv)
        audioformat (str, optional): Audioformat supported in self.__audioextension
        audiosamplingrate (int, optional): Audio sampling rate
    '''

def url2audio(self, urlpath, audiofile=None, audiosamplingrate=16000):
    '''
    Convert youtube url to audiofile

    Parameters:
        urlpath (str): Youtube url
        audiofile (str, optional): File path/name to save audio file
        audiosamplingrate (int, optional): Audio sampling rate
    '''

def audio2text(self, audiofile, textfile=None):
    '''
    Convert audio to csv file

    Parameters:
        audiofile (str): File path/name of audio file
        textfile (str, optional): File path/name of text file (*.csv)
    '''
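The two lower-level methods can be chained when the audio file should be kept or inspected before transcription. The sketch below assumes only the signatures documented above. The wrapper `transcribe_in_two_steps`, the stand-in class `RecordingConverter` and the file names are hypothetical, and the stand-in only records calls so that the example runs offline.

```python
def transcribe_in_two_steps(converter, url, audiofile, textfile, audiosamplingrate=16000):
    """Download the audio first, then transcribe it (hypothetical wrapper)."""
    converter.url2audio(urlpath=url, audiofile=audiofile,
                        audiosamplingrate=audiosamplingrate)
    converter.audio2text(audiofile=audiofile, textfile=textfile)
    return textfile


class RecordingConverter:
    """Offline stand-in that mimics the documented method signatures."""

    def __init__(self):
        self.calls = []

    def url2audio(self, urlpath, audiofile=None, audiosamplingrate=16000):
        self.calls.append(("url2audio", audiofile))

    def audio2text(self, audiofile, textfile=None):
        self.calls.append(("audio2text", textfile))


fake = RecordingConverter()
result = transcribe_in_two_steps(
    fake, "https://www.youtube.com/watch?v=Ad9Q8rM0Am0", "talk.flac", "talk.csv"
)
print(fake.calls)
```

With the real library installed and a network connection available, passing `Youtube2Text()` instead of the stand-in should perform the actual download and transcription.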
- This repository depends heavily on Pytube to download YouTube videos, and Pytube is buggy at times. Workarounds are often posted on the issues page of the Pytube repository or of this repository. Please take the initiative to file issues to help others who use this repository.
Read the article below on how to use the repository.
This repository was created for personal use, to retrieve audio files for conversational speech recognition and audio classification.
For custom functionality development support, enterprise support and other related questions, reach out at