MohsenAmiri79 / DigiKlothes


A dataset of more than 55,000 clothing items from the Digikala website with their current information: name, item URL, and image URLs (plus current price, rating, and discount).

GPL-3.0 license

clean_Data
clean_reduced_Data
raw_Data
LICENSE
README.md
Single product image scraper.py
category product scraper.py
preprocess_clean.ipynb
preprocess_reduce.ipynb


DigiKlothes

A Digikala Clothes Image Dataset

Introduction

This dataset contains:

more than 55,000 clothing items from digikala.com
the name, price, discount, rating, and URL of each item, plus a list of the URLs of all of its images
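As a rough illustration of the record layout listed above, the sketch below models one item as a Python dataclass. The class name, field names, and field types are assumptions made for this example; the actual column headers and value formats in the CSV files may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClothingItem:
    """One dataset record. All field names here are hypothetical."""

    name: str
    url: str
    # Price, discount, and rating are optional because the reduced
    # variant of the dataset does not include them.
    price: Optional[float] = None
    discount: Optional[float] = None
    rating: Optional[float] = None
    # One item can have several images, so this is a list of URLs.
    image_urls: List[str] = field(default_factory=list)


# Placeholder values only, not taken from the dataset.
example = ClothingItem(
    name="example item",
    url="https://example.com/product/1",
    image_urls=["https://example.com/img/1.jpg", "https://example.com/img/2.jpg"],
)
```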

Data

What each folder contains:

raw_Data: Contains every field except the image URLs.
clean_Data: Contains every field, including the image URLs.
clean_reduced_Data: Contains only names, URLs, and image URLs.
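A minimal sketch of reading one of these folders, assuming each folder holds UTF-8 CSV files with a header row. The helper name `iter_records` is hypothetical and is not part of the repository; it relies only on the Python standard library.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator


def iter_records(folder: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per row from every .csv file in `folder`.

    Keys come from each file's header row, so no column names
    are hard-coded here.
    """
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                yield row
```

Calling `iter_records("clean_reduced_Data")` from the repository root would then stream the reduced records one at a time, without loading a whole file into memory.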

Reproducing the data

What each code file does:

'category product scraper.py' searches the website for the input category and extracts the name, price, discount, rating, and URL of each item.
'preprocess_clean.ipynb' reads the CSV files in the 'raw_Data' folder and finds the image links for each record.
'Single product image scraper.py' extracts the image links for a single input link, which has to be set by hand.
'preprocess_reduce.ipynb' reads the CSV files in the 'clean_Data' folder and drops the price, discount, and rating columns.
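The reduction step described for 'preprocess_reduce.ipynb' can be sketched as follows. This is not the notebook's actual code: the function name `reduce_csv` and the header names `price`, `discount`, and `rating` are assumptions, and the sketch uses the standard `csv` module rather than whatever library the notebook relies on.

```python
import csv
from typing import Iterable, List, TextIO

# Assumed header names; adjust to match the real CSV files.
DROPPED_COLUMNS = ("price", "discount", "rating")


def reduce_csv(
    source: TextIO,
    target: TextIO,
    dropped: Iterable[str] = DROPPED_COLUMNS,
) -> List[str]:
    """Copy CSV rows from `source` to `target` without the dropped columns.

    Returns the list of column names that were kept.
    """
    dropped_set = set(dropped)
    reader = csv.DictReader(source)
    kept = [name for name in (reader.fieldnames or []) if name not in dropped_set]
    # extrasaction="ignore" makes the writer skip the dropped keys
    # instead of raising ValueError.
    writer = csv.DictWriter(target, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept
```

Both arguments are open text file objects, so the same function works on real files (opened with `newline=""`) and on in-memory `io.StringIO` buffers.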
