Inappropriate content: NSFW, hate speech, offensive words, sentiments
Title, captions, tags, comments
Quality data:
Length, Aspect Ratio, Resolution, likes, views, number of subscribers of the creator, comments filtered with language model, ensure some amount of movement in the video
Language: English for now
(From Michael) the final version of the dataset from the collection side of things will look like the following:
channels.tsv with columns ['link', 'name', 'description', 'subscribers', 'isFamilySafe', 'tags']
videos.tsv with columns ['channel_link', 'id', 'title', 'date', 'length', 'views']
A sample with 2 videos from each of a random set of channels listed in channels.tsv can be found in the DuckAI Google Drive (note: there may be some duplicates!)
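Since the sample may contain duplicates, here is a minimal sketch of loading the TSVs and dropping repeated videos with the standard library. It assumes each file has a header row matching the columns listed above, that `id` uniquely identifies a video, and that fields are not quoted (hence `csv.QUOTE_NONE`); none of that is confirmed by the collection code. The helper names `read_tsv`, `load_tsv` and `drop_duplicate_videos` are just illustrative.

```python
import csv


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    # QUOTE_NONE is an assumption: titles may contain stray quote characters.
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def load_tsv(path):
    """Open a TSV file from disk and parse it with read_tsv."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_tsv(handle)


def drop_duplicate_videos(rows):
    """Keep the first row seen for each video id, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        unique.append(row)
    return unique
```

Usage would be something like `videos = drop_duplicate_videos(load_tsv("videos.tsv"))`.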
Loose tasks:
Data analysis on metadata in the TSVs
Build pipeline to collect metadata using video2dataset (may need to alter video2dataset)
Analysis on NSFW (using some kind of NSFW filter)
English filter
Pixel-based filters using thumbnails (see comments below on YouTube's auto-generated thumbnails)
Brainstorm more ideas on filtering to have high quality video!
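As a starting point for the metadata-based quality filtering, here is a rough sketch that joins videos to their channel and applies simple thresholds. Everything numeric here is a placeholder: the thresholds are made up, `length` is assumed to be in seconds, and `isFamilySafe` is assumed to be stored as the string `True`/`False`. The function names are hypothetical and the rows are assumed to be dicts keyed by the TSV column names.

```python
def to_int(value, default=0):
    """Parse an integer field, falling back to a default on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def passes_quality(video, channel, min_length=10, max_length=600,
                   min_views=1000, min_subscribers=100):
    """Return True if a video row meets the (placeholder) thresholds."""
    # Assumption: isFamilySafe is serialized as "True" / "False".
    if str(channel.get("isFamilySafe", "")).strip().lower() != "true":
        return False
    length = to_int(video.get("length"))  # assumed to be seconds
    if not min_length <= length <= max_length:
        return False
    if to_int(video.get("views")) < min_views:
        return False
    return to_int(channel.get("subscribers")) >= min_subscribers


def filter_videos(videos, channels, **thresholds):
    """Keep videos whose channel is known and which pass passes_quality."""
    by_link = {channel["link"]: channel for channel in channels}
    kept = []
    for video in videos:
        channel = by_link.get(video["channel_link"])
        if channel is not None and passes_quality(video, channel, **thresholds):
            kept.append(video)
    return kept
```

Keeping the thresholds as keyword arguments makes it easy to sweep them during the data analysis task before committing to values.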
We could use VidGear/CamGear to do frame-based filtering/streaming (it allows reading frames from a video as a stream, without needing to download the whole file)
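A rough sketch of how that could cover the "ensure some amount of movement" check: sample a few frames through CamGear and compare consecutive samples with a mean absolute pixel difference. The motion metric, the threshold of 2.0, and the sampling parameters are all assumptions rather than anything tested on real data. `CamGear(source=..., stream_mode=True)` is VidGear's streaming mode (it needs `yt_dlp` installed); the import is kept inside the function so the scoring helpers work without it.

```python
def mean_abs_diff(prev, curr):
    """Mean absolute difference between two equal-length pixel lists."""
    if not prev or len(prev) != len(curr):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def has_motion(samples, threshold=2.0):
    """True if the average change between consecutive samples exceeds
    the (placeholder) threshold."""
    diffs = [mean_abs_diff(a, b) for a, b in zip(samples, samples[1:])]
    return bool(diffs) and sum(diffs) / len(diffs) > threshold


def sample_frames(url, max_frames=10, every_n=30, step=16):
    """Stream a video with CamGear and return subsampled frames as flat
    lists of ints. Keeps every `every_n`-th frame and every `step`-th pixel."""
    from vidgear.gears import CamGear  # lazy import: optional dependency

    stream = CamGear(source=url, stream_mode=True).start()
    samples = []
    index = 0
    try:
        while len(samples) < max_frames:
            frame = stream.read()
            if frame is None:  # end of stream
                break
            if index % every_n == 0:
                # Frames are NumPy arrays; subsample to keep this cheap.
                samples.append(frame[::step, ::step].ravel().tolist())
            index += 1
    finally:
        stream.stop()
    return samples
```

A call would look like `has_motion(sample_frames(video_url))`, where `video_url` is a YouTube link built from the `id` column.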