
athlatif/Goodreads_BlogPost


Data Science Blog Post - Goodreads

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Acknowledgements

1. Installation

  • Python 3.x.
  • Libraries: pandas, matplotlib, seaborn.

2. Project Motivation

In this project, I explored what motivates Goodreads users, and readers in general, to pick up a book. Goodreads is a website and social network where users share reviews and discover new books. I tried to answer the following questions:

  1. Are classic books better than modern books?
  2. What is the most popular genre?
  3. Do the number of reviews a book has and its average rating influence which books users choose to read?

3. File Descriptions

  1. Notebook file
  2. Data files:
  • to_read.csv lists the books each user has marked "to read", as (user_id, book_id) pairs sorted by time. There are close to a million pairs.
  • books.csv has metadata for each book (book_id, Goodreads IDs, authors, title, average rating, etc.).
  • book_tags.csv contains the tags/shelves/genres users have assigned to books. Tags are represented by their IDs, and books by goodreads_book_id rather than book_id. Rows are sorted by goodreads_book_id (ascending), then by count (descending).
  • tags.csv translates tag IDs to names.
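The files link together through two different book keys: to_read.csv uses book_id, while book_tags.csv uses goodreads_book_id, and books.csv carries both. The sketch below shows one possible way to join them with pandas to get tag names per book and a "to read" count per book. It is not the notebook's actual code: the rows are made-up stand-ins, and the column names (book_id, goodreads_book_id, title, average_rating, tag_id, tag_name, count) are assumed to match the public release of this dataset, so check them against your copy.

```python
import io

import pandas as pd

# Made-up stand-in rows so the sketch runs without the real files.
# With the actual data, replace io.StringIO(...) with a path, e.g. pd.read_csv("books.csv").
BOOKS_CSV = """book_id,goodreads_book_id,title,average_rating
1,101,Book A,4.1
2,202,Book B,3.8
"""
TO_READ_CSV = """user_id,book_id
9,1
15,1
37,2
"""
BOOK_TAGS_CSV = """goodreads_book_id,tag_id,count
101,1,500
202,2,700
202,1,300
"""
TAGS_CSV = """tag_id,tag_name
1,fantasy
2,to-read
"""

books = pd.read_csv(io.StringIO(BOOKS_CSV))
to_read = pd.read_csv(io.StringIO(TO_READ_CSV))
book_tags = pd.read_csv(io.StringIO(BOOK_TAGS_CSV))
tags = pd.read_csv(io.StringIO(TAGS_CSV))

# Translate tag IDs to names, then attach them to books.
# book_tags is keyed by goodreads_book_id, not by the internal book_id.
named_tags = book_tags.merge(tags, on="tag_id", how="left")
tagged_books = named_tags.merge(
    books[["book_id", "goodreads_book_id", "title"]],
    on="goodreads_book_id",
    how="inner",
)

# Sum of tag counts across all books: a rough proxy for shelf/genre popularity.
tag_totals = (
    tagged_books.groupby("tag_name")["count"].sum().sort_values(ascending=False)
)

# to_read is keyed by book_id: count how many users shelved each book "to read".
to_read_counts = to_read.groupby("book_id").size().rename("to_read_users").reset_index()
books_with_demand = books.merge(to_read_counts, on="book_id", how="left")
books_with_demand["to_read_users"] = (
    books_with_demand["to_read_users"].fillna(0).astype(int)
)

print(tag_totals)
print(books_with_demand[["title", "average_rating", "to_read_users"]])
```

Shelf names such as "to-read" show up as tags alongside genres, so a genre ranking would likely need to filter out non-genre shelves before comparing totals.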

4. Results

The findings are discussed in the accompanying blog post: Here

5. Acknowledgements

Data credits: Here

About

This project is a data analysis and visualisation of a Goodreads dataset with 10k+ records.
