Python-Project

Investigating Netflix Movies and Guest Stars in The Office

Create the years and durations lists

years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
durations = [103, 101, 99, 100, 100, 95, 95, 96, 93, 90]

Create a dictionary with the two lists

movie_dict = { "years" : years, "durations": durations }
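`pd.DataFrame` expects every list in the dictionary to have the same length, so it can help to confirm that the years and durations line up before going further. A minimal sketch using the same values as above; the helper name `paired_rows` is hypothetical and not part of the project:

```python
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
durations = [103, 101, 99, 100, 100, 95, 95, 96, 93, 90]


def paired_rows(columns):
    """Return row tuples from a dict of equal-length lists (hypothetical helper)."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    # zip(*...) walks the lists in parallel, one tuple per row
    return list(zip(*columns.values()))


rows = paired_rows({"years": years, "durations": durations})
print(rows[0])   # first (year, duration) pair
print(rows[-1])  # last (year, duration) pair
```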

Print the dictionary

print(movie_dict)

Import pandas under its usual alias

import pandas as pd

Create a DataFrame from the dictionary

durations_df = pd.DataFrame(movie_dict)

Print the DataFrame

print(durations_df)

Import matplotlib.pyplot under its usual alias and create a figure

import matplotlib.pyplot as plt
fig = plt.figure()

Draw a line plot of release_years and durations

plt.plot(durations_df["years"], durations_df["durations"])
plt.xlabel("Release Years")
plt.ylabel("Durations")

Create a title

plt.title("Netflix Movie Durations 2011-2020")

Show the plot

plt.show()

Read in the CSV as a DataFrame

netflix_df = pd.read_csv("datasets/netflix_data.csv")

Print the first five rows of the DataFrame

print(netflix_df.head())

Subset the DataFrame for type "Movie"

netflix_df_movies_only = netflix_df[netflix_df["type"] == "Movie"]

Select only the columns of interest

netflix_movies_col_subset = netflix_df_movies_only[["title", "country", "genre", "release_year","duration"]]
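The row filter and the column selection can also be written as a single `.loc` call. The sketch below uses a small invented DataFrame (`toy_df`) as a stand-in for `netflix_df`, so the rows are illustrative only; the column names follow the ones used above.

```python
import pandas as pd

# Toy stand-in for netflix_df; these rows are invented for illustration only.
toy_df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "title": ["A", "B", "C"],
    "country": ["X", "Y", "Z"],
    "genre": ["Dramas", "Comedies", "Documentaries"],
    "release_year": [2016, 2018, 2020],
    "duration": [95, 2, 58],
})

columns_of_interest = ["title", "country", "genre", "release_year", "duration"]

# Boolean row mask and column list in one .loc call; .copy() returns an
# independent DataFrame, so later changes to the subset never touch toy_df.
movies_subset = toy_df.loc[toy_df["type"] == "Movie", columns_of_interest].copy()

print(movies_subset.shape)
```

This is an alternative to the two-step version, not a correction of it; both should select the same rows and columns.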

Print the first five rows of the new DataFrame

print(netflix_movies_col_subset.head())

Create a figure and increase the figure size

fig = plt.figure(figsize=(12,8))

Create a scatter plot of duration versus year

plt.scatter(netflix_movies_col_subset["release_year"], netflix_movies_col_subset["duration"])
plt.xlabel("Release Year")
plt.ylabel("Duration")

Create a title

plt.title("Movie Duration by Year of Release")

Show the plot

plt.show()

Filter for durations shorter than 60 minutes

short_movies = netflix_movies_col_subset[netflix_movies_col_subset["duration"] < 60]

Print the first 20 rows of short_movies

print(short_movies.head(20))

Define an empty list

colors = []

Iterate over rows of netflix_movies_col_subset

for lab, row in netflix_movies_col_subset.iterrows():
    if row["genre"] == "Children":
        colors.append("red")
    elif row["genre"] == "Documentaries":
        colors.append("blue")
    elif row["genre"] == "Stand-Up":
        colors.append("green")
    else:
        colors.append("black")
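The same genre-to-color rule can be expressed as a dictionary lookup with a default, which avoids the chain of `if`/`elif` branches. A minimal sketch: `GENRE_COLORS` and `colors_for` are hypothetical names, and the sample genres are invented for illustration.

```python
# Genre-to-color lookup mirroring the three highlighted genres above.
GENRE_COLORS = {
    "Children": "red",
    "Documentaries": "blue",
    "Stand-Up": "green",
}


def colors_for(genres, default="black"):
    """Map each genre to its plot color, falling back to `default`."""
    return [GENRE_COLORS.get(genre, default) for genre in genres]


# Invented sample genres, for illustration only.
sample = ["Dramas", "Children", "Stand-Up", "Documentaries", "Horror Movies"]
print(colors_for(sample))
```

With the real data, `colors_for(netflix_movies_col_subset["genre"])` should produce the same list as the loop, since iterating over a pandas Series yields its values in row order.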

Inspect the first 10 values in your list

print(colors[:10])

Set the figure style and initialize a new figure

plt.style.use("fivethirtyeight")
fig = plt.figure(figsize=(12, 8))

Create a scatter plot of duration versus release_year

plt.scatter(netflix_movies_col_subset["release_year"],netflix_movies_col_subset["duration"],c=colors)

Create a title and axis labels

plt.xlabel("Release Year")
plt.ylabel("Duration(min)")
plt.title("Movie duration by year of release")

Show the plot

plt.show()

Are we certain that movies are getting shorter?

are_movies_getting_shorter = "No, not all movies are getting shorter, but more short movies have been produced in recent years"
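One way to put a number on that answer is to compare, for each release year, the share of titles under 60 minutes with the median duration of the remaining titles. The sketch below runs on a small invented DataFrame (`toy_movies`), not on the Netflix data, so its figures only illustrate the calculation.

```python
import pandas as pd

# Invented toy data, for illustration only; not taken from the Netflix dataset.
toy_movies = pd.DataFrame({
    "release_year": [2010, 2010, 2015, 2015, 2015, 2015, 2020, 2020, 2020, 2020],
    "duration": [100, 110, 95, 105, 100, 50, 105, 45, 30, 100],
})

SHORT_CUTOFF = 60  # minutes, the same threshold used for short_movies

is_short = toy_movies["duration"] < SHORT_CUTOFF

# Fraction of titles per year that fall below the cutoff
# (the mean of a boolean column is the share of True values).
short_share = is_short.groupby(toy_movies["release_year"]).mean()

# Median duration per year among titles at or above the cutoff.
long_median = (
    toy_movies[~is_short]
    .groupby("release_year")["duration"]
    .median()
)

summary = pd.DataFrame({"short_share": short_share, "long_median": long_median})
print(summary)
```

On this toy data the share of short titles rises while the median of the longer titles stays near 100 minutes, which is the pattern the answer above describes. Whether the real dataset behaves the same way would need to be checked by running the calculation on `netflix_movies_col_subset`.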
