Analyzing Netflix Movies and TV Shows Dataset – A Simple Data Science Project

This project explores a dataset of Netflix titles to uncover insights about content type, release trends, and popular genres. It's a great beginner data science project using Python and pandas.

Tools Used: Python, pandas, matplotlib, seaborn

Step 1: Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Load the Dataset

Download the “Netflix Movies and TV Shows” dataset from Kaggle and place netflix_titles.csv in your working directory.

df = pd.read_csv("netflix_titles.csv")
df.head()

Step 3: Basic Information

df.info()
df.isnull().sum()

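Check the null counts above. Columns such as 'country' and 'director' usually have missing values; fill them with a placeholder like "Unknown" if your analysis needs complete rows.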
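As a minimal sketch of that fill step, the snippet below replaces missing text values with a placeholder. The helper name fill_missing_text and the tiny sample DataFrame are made up for illustration; on the real data you would pass df instead.

```python
import pandas as pd

def fill_missing_text(df, columns, placeholder="Unknown"):
    """Return a copy of df with missing values in the given columns replaced."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(placeholder)
    return out

# Made-up rows standing in for the Netflix data
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "country": ["India", None, "Japan"],
    "director": [None, "Jane Doe", None],
})

filled = fill_missing_text(sample, ["country", "director"])
print(filled.isnull().sum())
```

Working on a copy keeps the original frame untouched, so you can still compare counts before and after filling.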

Step 4: Data Cleaning

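# Some dates carry stray whitespace, so strip before parsing; values that cannot be parsed become NaT
df['date_added'] = pd.to_datetime(df['date_added'].str.strip(), errors='coerce')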
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
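A quick sanity check after parsing is to count how many dates failed to convert. The sketch below uses a small made-up Series in place of df['date_added'] and assumes the dates follow a "Month D, YYYY" pattern; parse_date_added is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def parse_date_added(raw):
    """Strip whitespace and parse 'Month D, YYYY' strings; bad values become NaT."""
    return pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# Made-up values standing in for df['date_added']
raw_dates = pd.Series([" August 4, 2017", "September 25, 2021", None, "not a date"])
parsed = parse_date_added(raw_dates)

# Nullable integer dtype keeps years as whole numbers even when some dates are missing
year_added = parsed.dt.year.astype("Int64")

print("Unparsed rows:", parsed.isna().sum())
print(year_added.tolist())
```

If the unparsed count is much higher than the number of missing dates reported in Step 3, the assumed format probably does not match your copy of the file.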

Step 5: Data Visualization

Content Type Count

sns.countplot(data=df, x='type', palette='pastel')
plt.title("Content Type Distribution")
plt.show()

Top 10 Countries by Number of Titles

top_countries = df['country'].value_counts().head(10)
top_countries.plot(kind='barh', color='coral')
plt.title("Top 10 Countries with Netflix Content")
plt.show()
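One caveat: a single 'country' cell can list several countries separated by commas, and value_counts() treats each combination as its own category. A possible way to count each country separately is sketched below; count_individual is a hypothetical helper and the sample Series is made up for illustration.

```python
import pandas as pd

def count_individual(series, sep=","):
    """Split a delimited text column and count each value on its own."""
    values = series.dropna().str.split(sep).explode().str.strip()
    values = values[values != ""]  # drop empties left by trailing separators
    return values.value_counts()

# Made-up values standing in for df['country']
sample_country = pd.Series([
    "United States, India",
    "India",
    None,
    "United States, Canada,",
])

country_counts = count_individual(sample_country)
print(country_counts.head(10))
```

On the real data, count_individual(df['country']).head(10) could replace the top_countries line above before plotting.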

New Titles Added by Year

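# Drop missing years and cast to int so the axis shows 2019 rather than 2019.0
df['year_added'].dropna().astype(int).value_counts().sort_index().plot(kind='bar', color='skyblue')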
plt.title("Content Added by Year")
plt.xlabel("Year")
plt.ylabel("Number of Titles")
plt.show()

Step 6: Genre and Duration Analysis

Most Common Genres

df['listed_in'].value_counts().head(10)
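Note that 'listed_in' holds a comma-separated list of genres per title, so the line above ranks genre combinations rather than single genres. The sketch below, run on made-up rows, shows one way to split the lists and count each genre, both overall and per content type.

```python
import pandas as pd

# Made-up rows standing in for the Netflix data
sample = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "listed_in": [
        "Dramas, International Movies",
        "TV Dramas, Crime TV Shows",
        "Dramas, Comedies",
    ],
})

# One row per (title, genre) pair
genres = sample.assign(genre=sample["listed_in"].str.split(",")).explode(
    "genre", ignore_index=True
)
genres["genre"] = genres["genre"].str.strip()

genre_counts = genres["genre"].value_counts()
genre_by_type = genres.groupby("type")["genre"].value_counts()

print(genre_counts.head(10))
print(genre_by_type)
```

Grouping by 'type' is an optional extra; it makes it easy to compare the most common movie genres with the most common TV genres.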

Movie Duration Distribution

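# Copy the subset so adding a column does not trigger SettingWithCopyWarning
df_movies = df[df['type'] == 'Movie'].copy()
# Strip the " min" suffix; missing durations become NaN instead of raising an error
df_movies['duration'] = pd.to_numeric(df_movies['duration'].str.replace(' min', '', regex=False), errors='coerce')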
df_movies['duration'].plot(kind='hist', bins=20, color='purple')
plt.title("Movie Duration Distribution")
plt.xlabel("Minutes")
plt.show()
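The histogram gives a visual impression of typical runtimes; to put a number on it, you can compute the share of movies inside a given range. The helper share_between and the sample values below are hypothetical; on the real data you would pass df_movies['duration'].

```python
import pandas as pd

def share_between(durations, low, high):
    """Fraction of non-missing durations that fall within [low, high]."""
    valid = durations.dropna()
    if valid.empty:
        return float("nan")
    return float(valid.between(low, high).mean())

# Made-up runtimes in minutes, including one missing value
sample_minutes = pd.Series([75, 90, 100, 118, 150, None])

share = share_between(sample_minutes, 80, 120)
print(f"{share:.0%} of sample movies run 80 to 120 minutes")
```

Series.between is inclusive on both ends by default, so a movie of exactly 80 or 120 minutes counts as inside the range.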

Step 7: Conclusion

  • Movies make up most of the Netflix catalog.
  • The United States contributes the most titles to Netflix's catalog.
  • The number of titles added per year has grown significantly in recent years.
  • Most movies are between 80 and 120 minutes long.

What's Next?

  • Create visual dashboards using Plotly or Power BI.
  • Perform sentiment analysis on Netflix show descriptions.
  • Apply clustering to group similar titles based on metadata.
