Netflix Data Analysis and Visualization Using Python

Fitri Widya Nanda
Aug 31, 2021
picture by eDigital

Netflix is one of the most popular digital streaming services today. It offers movies and TV shows from countries around the world. With so many users, Netflix also has a very large amount of data. In this article, I analyze a Netflix dataset using several Python libraries.

Dataset: The dataset that is used comes from Netflix Movies and TV Shows 2021. To download the dataset, please click here.

Source Code: Please check my Github page to see the source code.

Table of Contents:

  1. Importing Libraries
  2. Loading the Dataset
  3. Data Overview
  4. Data Cleansing
  5. Exploratory Data Analysis and Visualization
  6. Conclusions

Importing Libraries

Here are some libraries that will be used.

Loading the Dataset

The dataset is loaded using the pandas library and then named netflix.
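The loading cell itself is not reproduced in this text. As a minimal sketch, assuming the download is an ordinary CSV file, loading could look like the following. The file name is not given in the article, so the example reads a small in-memory CSV with made-up rows and only a few of the columns.

```python
import io

import pandas as pd

# Made-up rows standing in for the downloaded file. With the real dataset,
# pass the CSV path to pd.read_csv instead of this in-memory buffer.
sample_csv = io.StringIO(
    "show_id,title,director,production_country,release_date,rating\n"
    "s1,Title A,Director A,United States,2019,TV-MA\n"
    "s2,Title B,,Indonesia,2020,TV-14\n"
    "s3,Title C,Director C,,2018,TV-MA\n"
)

# Empty fields in the CSV are read as missing values (NaN).
netflix = pd.read_csv(sample_csv)
print(netflix)
```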

Data Overview

#The first 3 rows of the dataset

#Shape of the dataset

Based on the output obtained, it can be seen that the dataset consists of 5,967 rows and 13 columns.

#Columns of the dataset

The dataset has 13 columns: show_id, title, description, director, genres, cast, production_country, release_date, rating, duration, imdb_score, content_type, and date_added.
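The three overview steps above (first rows, shape, and column names) map onto standard pandas attributes. The sketch below uses a tiny made-up frame rather than the real data, so the numbers it prints differ from the ones quoted in the article.

```python
import pandas as pd

# Tiny made-up frame; the real dataset has 5,967 rows and 13 columns.
netflix = pd.DataFrame({
    "show_id": ["s1", "s2", "s3", "s4"],
    "title": ["Title A", "Title B", "Title C", "Title D"],
    "content_type": ["Movie", "TV Show", "Movie", "Movie"],
})

first_rows = netflix.head(3)          # the first 3 rows
n_rows, n_cols = netflix.shape        # (number of rows, number of columns)
column_names = list(netflix.columns)  # column labels in order

print(first_rows)
print(n_rows, n_cols)
print(column_names)
```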

#Information of the dataset

Here we can see dataset information such as the number of non-null data and the data type of each column.

Based on the output, 5 columns have all 5,967 values present, while the other 8 columns have fewer non-null values. In addition, 12 columns have the object data type and 1 column has the float data type.

#Number of null values per column

From the output above, we know that there are 8 columns with null values: ‘director’ has 2,064, ‘cast’ has 530, ‘production_country’ has 559, ‘release_date’ has 3, ‘rating’ has 4, ‘duration’ has 3, ‘imdb_score’ has 608, and ‘date_added’ has 1,335.
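Both the dataset information and the per-column null counts come from built-in pandas methods. A minimal sketch on made-up data (the counts here are not the article's):

```python
import pandas as pd

# Made-up frame with a few missing values.
netflix = pd.DataFrame({
    "title": ["Title A", "Title B", "Title C"],
    "director": ["Director A", None, None],
    "release_date": [2019.0, None, 2018.0],
})

# Prints the non-null count and data type of each column.
netflix.info()

# Number of missing values per column.
null_counts = netflix.isnull().sum()
print(null_counts)
```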

Data Cleansing

#Imputing missing values

The pandas fillna() function can be used to fill in missing values in a dataset. I replaced the null values in the ‘cast’ column with ‘cast unavailable’, the null values in the ‘production_country’ column with ‘production country unavailable’, and the null values in the ‘director’ column with ‘director unavailable’.
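One way to apply these three replacements in a single call is to pass fillna() a dictionary that maps each column to its placeholder. This is a sketch on made-up rows, and the original code may have filled each column separately.

```python
import pandas as pd

# Made-up rows with missing cast, country, and director values.
netflix = pd.DataFrame({
    "title": ["Title A", "Title B"],
    "cast": ["Actor A, Actor B", None],
    "production_country": [None, "Indonesia"],
    "director": [None, "Director B"],
})

# Placeholder text per column, as described in the article.
placeholders = {
    "cast": "cast unavailable",
    "production_country": "production country unavailable",
    "director": "director unavailable",
}
netflix = netflix.fillna(value=placeholders)
print(netflix)
```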

#Dropping missing values

The ‘release_date’, ‘rating’, ‘duration’, and ‘imdb_score’ columns have comparatively few missing values, so I remove every row that has a missing value in any of those columns. The ‘date_added’ column has many more missing values, so I drop that column entirely.
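These two steps correspond to dropna() with a column subset and drop() with a column list. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; the second and third rows each miss a required value.
netflix = pd.DataFrame({
    "title": ["Title A", "Title B", "Title C"],
    "release_date": [2019.0, None, 2018.0],
    "rating": ["TV-MA", "TV-14", "TV-MA"],
    "duration": ["102 min", "1 Season", "97 min"],
    "imdb_score": ["7.1", "6.5", None],
    "date_added": [None, None, "August 1, 2021"],
})

# Remove rows with a missing value in any of the four columns.
netflix = netflix.dropna(
    subset=["release_date", "rating", "duration", "imdb_score"]
)

# Remove the date_added column altogether.
netflix = netflix.drop(columns=["date_added"])
print(netflix)
```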

#Converting data type

Some of the columns have unsuitable data types. The ‘imdb_score’ column should be float rather than object, and the ‘release_date’ column should be integer rather than float, so both columns must be converted.
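A sketch of the conversion follows, under one assumption that the article does not confirm: the scores are stored as plain numeric strings such as "7.1". If the raw values carry extra text, that would need to be stripped first. The rename to ‘release_year’ is included because the cleaned dataset summary reports it.

```python
import pandas as pd

# Made-up rows after the missing values have been handled.
netflix = pd.DataFrame({
    "title": ["Title A", "Title B"],
    "release_date": [2019.0, 2020.0],  # float in the raw data
    "imdb_score": ["7.1", "6.5"],      # assumed format: plain numeric strings
})

# Text scores to float.
netflix["imdb_score"] = pd.to_numeric(netflix["imdb_score"])

# Float years to integer, then rename the column to release_year.
netflix["release_date"] = netflix["release_date"].astype(int)
netflix = netflix.rename(columns={"release_date": "release_year"})

print(netflix.dtypes)
```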

#Dataset information

The following is the dataset information after cleaning the data.

From the output, we can see that all columns now have 5,359 non-null values, the ‘release_date’ column has been renamed to ‘release_year’, the data type of ‘release_year’ is now integer, and the data type of ‘imdb_score’ is now float. With the data cleaning complete, we can move on to exploratory data analysis and visualization.

Exploratory Data Analysis and Visualization

1. Top 10 Content Producing Countries

The first analysis begins by finding the top 10 countries that produce the largest number of content titles. We can enter the following code to see the top 10 countries that produce the largest number of content titles on Netflix:
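The original code cell is not included in this text. A minimal sketch of how such a ranking could be computed and drawn as a bar chart with pandas and matplotlib is shown below on made-up rows. Leaving out the ‘production country unavailable’ placeholder is a choice made for this sketch, not something the article states.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; one of them carries the placeholder for a missing country.
netflix = pd.DataFrame({
    "production_country": [
        "United States", "United States", "India", "United States",
        "India", "Indonesia", "production country unavailable",
    ],
})

# Leave out the placeholder, then count titles per country.
known = netflix[
    netflix["production_country"] != "production country unavailable"
]
top_countries = known["production_country"].value_counts().head(10)

# Draw the counts as a bar chart.
fig, ax = plt.subplots()
top_countries.plot(kind="bar", ax=ax)
ax.set_xlabel("Country")
ax.set_ylabel("Number of titles")
ax.set_title("Top 10 content-producing countries")
fig.tight_layout()
```

In a notebook the figure is displayed automatically; in a script, call plt.show() or fig.savefig() with a file name of your choice.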

From the chart above, we can see the top 10 countries that produce the largest number of content titles on Netflix. The United States is in first position with more than 2,000 titles.

2. Distribution Map of Producing Countries

Next, we want to see a map of the distribution of countries that produce Netflix content.

From the picture above, we can see that countries that produce Netflix content are marked in dark blue. We can also see that almost half of all countries in the world produce Netflix content.

3. The Number of Content Titles by Rating

To find out the number of content titles by rating, we can enter the code below:

The chart above shows the number of content titles for each rating. The rating with the largest number of titles is TV-MA, which is content intended for adult viewers that may be unsuitable for children under 17. The rating with the smallest number of titles is NC-17 (no children under 17 admitted).

4. Top 10 Genres with the Largest Number of Content Titles

Now, we want to look at the top 10 genres with the largest number of content titles.

From the chart above, we can see the top 10 genres with the largest number of content titles. International Movies ranks first with more than 1,700 titles.

5. Movies and TV Shows Ratings

Next, we want to see how ratings are distributed across Movies and TV Shows. We can enter the code below to see it:
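The original cell is not reproduced here. One way to get title counts per rating and content type is a cross-tabulation. This is a sketch on made-up rows, not necessarily the method the article used.

```python
import pandas as pd

# Made-up rows with a content type and a rating per title.
netflix = pd.DataFrame({
    "content_type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "rating": ["TV-MA", "TV-14", "TV-MA", "TV-MA", "TV-14", "TV-MA"],
})

# Rows are ratings, columns are content types, cells are title counts.
rating_by_type = pd.crosstab(netflix["rating"], netflix["content_type"])

# Rating with the most titles for each content type.
top_rating_per_type = rating_by_type.idxmax()

print(rating_by_type)
print(top_rating_per_type)
```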

Based on the picture above, the rating with the largest number of Movies is TV-MA with more than 1,400 titles, and the rating with the largest number of TV Shows is also TV-MA with more than 800 titles.

6. Comparison of Ratings in the US and Indonesia

We want to compare the number of Movies and TV Shows by rating in Indonesia and the United States.
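A comparison like this can be sketched by filtering on the two countries and counting titles per country, content type, and rating. The rows below are made up, and matching the country name exactly assumes each title lists a single production country.

```python
import pandas as pd

# Made-up rows for three countries.
netflix = pd.DataFrame({
    "production_country": ["United States", "United States", "Indonesia",
                           "Indonesia", "India", "United States"],
    "content_type": ["Movie", "TV Show", "Movie", "Movie", "Movie", "Movie"],
    "rating": ["TV-MA", "TV-MA", "TV-14", "TV-14", "TV-14", "PG-13"],
})

# Keep only the two countries being compared.
subset = netflix[
    netflix["production_country"].isin(["United States", "Indonesia"])
]

# Count titles for each country, content type, and rating combination.
comparison = (
    subset.groupby(["production_country", "content_type", "rating"])
    .size()
    .reset_index(name="titles")
)
print(comparison)
```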

From the picture above, we can see that the United States produces more TV Show content than Indonesia. The rating with the largest number of Movies and TV Shows in the United States is TV-MA, while in Indonesia it is TV-14. Content rated TV-14 contains material that parents or adults may find unsuitable for children under the age of 14, and content rated TV-MA is intended for mature, adult audiences and may be unsuitable for children under 17. So Indonesia produces more Netflix content for viewers aged 14 and over, while the United States produces more Netflix content for viewers aged 17 and over.

7. The Number of Content Titles in the Last 10 Years

To find out the number of content titles in the last 10 years, we can enter the following code:
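The original cell is not shown. Since the ‘date_added’ column was dropped during cleaning, this sketch assumes the count is based on ‘release_year’. The rows are made up.

```python
import pandas as pd

# Made-up release years.
netflix = pd.DataFrame({
    "release_year": [2010, 2012, 2019, 2019, 2020, 2021, 2019, 2012],
})

# Keep the 10 most recent years, counted back from the latest year present.
last_year = netflix["release_year"].max()
recent = netflix[netflix["release_year"] > last_year - 10]

# Titles per year, in chronological order.
titles_per_year = recent["release_year"].value_counts().sort_index()
print(titles_per_year)
```

With 2021 as the latest year, the filter keeps 2012 through 2021, the same range the article discusses.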

From the graph above, we can see that the number of content titles on Netflix increased every year from 2012 to 2019. In 2020, however, the number declined slightly compared to the previous year. This may be due to COVID-19, which appeared in early 2020. The 2021 count is naturally lower than the others because this analysis was carried out in mid-2021, so we can ignore it.

8. Top 10 Actors by Number of Titles

Next, we want to find out who are the 10 actors with the largest number of content titles on Netflix.
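Counting titles per actor needs one extra step, assuming the ‘cast’ column holds several comma-separated names per title (the article does not show the raw values). The sketch below splits the names, gives each actor a separate row, and leaves out the ‘cast unavailable’ placeholder added during cleaning. The rows are made up.

```python
import pandas as pd

# Made-up rows; one title has the placeholder instead of a cast list.
netflix = pd.DataFrame({
    "title": ["Title A", "Title B", "Title C", "Title D"],
    "cast": [
        "Actor A, Actor B",
        "Actor A",
        "cast unavailable",
        "Actor B, Actor A, Actor C",
    ],
})

# One row per actor: split on commas, explode the lists, trim spaces.
actors = netflix["cast"].str.split(",").explode().str.strip()

# Drop the placeholder so it does not rank as an actor.
actors = actors[actors != "cast unavailable"]

top_actors = actors.value_counts().head(10)
print(top_actors)
```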

The 10 actors with the largest number of content titles on Netflix can be seen in the image above. The first position is occupied by Shah Rukh Khan who is an actor from India.

9. Top 5 Durations Based on The Number of Content Titles

We want to know the 5 durations with the largest number of content titles on Netflix.

The top 5 durations with the largest number of content titles on Netflix are 1 season, 2 seasons, 3 seasons, 102 minutes, and 97 minutes. The most common duration is 1 season, with more than 1,000 titles.

10. The Percentage of Content Types

Netflix content types are divided into Movie and TV Show. So, to find out the percentage of Netflix content types we can enter the following code:

Based on the chart above, Movies make up 66.3% of the total content, while TV Shows make up 33.7%. So, out of 5,359 content titles, 3,551 are Movies and 1,808 are TV Shows.
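The percentages and counts quoted above are consistent with each other, which a short check can confirm. The sketch first computes shares on a made-up column with value_counts(normalize=True), one common way to obtain such percentages, and then redoes the arithmetic with the counts reported in the article.

```python
import pandas as pd

# Made-up column: three movies and one TV show.
netflix = pd.DataFrame({
    "content_type": ["Movie", "Movie", "TV Show", "Movie"],
})

# Share of each content type in percent, rounded to one decimal place.
share = netflix["content_type"].value_counts(normalize=True).mul(100).round(1)
print(share)

# Counts reported in the article for the cleaned dataset.
movies, tv_shows = 3551, 1808
total = movies + tv_shows

movie_pct = round(movies / total * 100, 1)
tv_pct = round(tv_shows / total * 100, 1)
print(total, movie_pct, tv_pct)
```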

11. Top 10 IMDb Scores with the Largest Number of Content Titles

Now, we want to find the 10 IMDb Scores with the largest number of content titles on Netflix.

The image above shows the 10 IMDb scores with the largest number of content titles on Netflix. We can see that more than 200 titles have an IMDb score of 7.1 out of 10.

12. Top 5 Directors with the Largest Number of Content Titles

The final analysis we will do is find out who the 5 directors with the largest number of content titles on Netflix are. So, we can enter the following code to find out.

The chart above shows that the top 5 directors with the largest number of content titles on Netflix are Jan Suter, Raúl Campos, Marcus Raboy, Jay Karas, and Youssef Chahine. Jan Suter is in first position, having directed more than 20 titles on Netflix.

Conclusions

  • The country that produces the largest number of content titles on Netflix is the United States, with more than 2,000 titles.
  • The genre with the largest number of content titles is International Movies, with more than 1,700 titles.
  • The rating with the largest number of Movies is TV-MA with more than 1,400 titles, and the rating with the largest number of TV Shows is also TV-MA with more than 800 titles.
  • The number of content titles on Netflix increased every year from 2012 to 2019.
  • The actor with the largest number of content titles on Netflix is Shah Rukh Khan.
  • Movies make up 66.3% of the total content, while TV Shows make up 33.7%.
  • The director with the largest number of content titles on Netflix is Jan Suter, who has directed more than 20 titles.
