Data Analysis for E-Commerce Challenge
Data analysis for e-commerce challenge is one of the project-based modules provided by DQLab to find out the extent of our ability to analyze e-commerce company data. E-commerce sales data was analyzed using Google BigQuery as a cloud data warehouse (CDW) that allows us to run super-fast queries against large data sets. The results of the analysis are then visualized using the Tableau software.
Goals: Provide business insights based on the e-commerce performance in 2019–2020.
Source Code: Please check my Github page to see the source code.
Dataset: The dataset that is used comes from the DQLab Store (e-commerce) where buyers and sellers meet each other. Users can buy products from other users who sell them. Each user can be both a buyer and a seller. The dataset contains e-commerce transactions from January 2019 to May 2020. To download the dataset, please click here.
Tables Description :
- Orders: The order table contains sales transaction data. The order table consists of 10 columns and 74,874 rows. Based on the existing 74,874 transaction data, it can be seen that there were orders that were successfully paid and sent, orders that were not paid and not shipped, and orders that were paid and not sent. Further information about the orders table can be seen in the image below:
2. Order_details: The order_details table contains sales transaction data in more detail when compared to the orders table. From the order_details table, it can be seen that the total quantity sold during the January 2019-May 2020 period was 4,025,016. The total quantity sold throughout 2019 was 1,536,259, while the total quantity sold in January-May 2020 was 2,488,757. More information about the orders_details table can be seen in the image below:
3. Products: The products table contains data about product names, product categories, etc. Products are divided into 12 product categories and 1,145 unique products. Further information regarding the products table can be seen in the image below:
4. Users: The users table contains the user's personal information. Based on the users table, it can be seen that there are 17,936 unique users. Further information about the users table can be seen in the image below:
Monthly Transaction Summary
To find out the number of transactions every month, we can enter the following query:
SELECT FORMAT_DATE(‘%Y-%m’, created_at) AS Months, COUNT(order_id) AS Number_of_Transactions
FROM `first-trial-321912.dqlab_project.order`
GROUP BY 1
ORDER BY 1;
Output:
Based on the output of the query results, it can be seen that the number of transactions continues to increase every month from January 2019 to December 2019. The number of transactions decreased in January 2020 and then increased again every month. December 2019 was the month with the highest number of transactions compared to other months.
Transaction Status
Transaction status in e-commerce data is divided into the time when the order was created, the time when the order was paid, and the time when the order was delivered. However, there are unpaid or undeliverable orders. Therefore I divide the transaction status into orders completed (when the order has been paid and shipped), orders paid but not shipped, and orders not paid and not shipped.
- The number of transactions that have been paid and sent
We can enter the following query to get the number of transactions that have been paid and sent:
SELECT COUNT(order_id) AS Complited_Order
FROM `first-trial-321912.dqlab_project.order`
WHERE paid_at IS NOT NULL AND delivery_at IS NOT NULL;
Output:
2. The number of transactions that have been paid and not sent
We can enter the following query to get the number of transactions that have been paid and not sent:
SELECT COUNT(order_id) AS Undelivered_Order
FROM `first-trial-321912.dqlab_project.order`
WHERE paid_at IS NOT NULL AND delivery_at IS NULL;
Output:
3. The number of transactions that are not paid and not sent
We can enter the following query to get the number of transactions that are not paid and not sent
SELECT COUNT(order_id)
FROM `first-trial-321912.dqlab_project.order`
WHERE paid_at IS NULL AND delivery_at IS NULL;
Output:
Based on 74,874 transaction data, it can be seen that there were a total of 65,084 orders that were completed (successfully paid and sent), 5,046 orders that were not paid and not sent, and 4,744 orders that were paid and not sent.
User Transactions
E-commerce users are generally divided into sellers and buyers. Therefore we will count the number of users who transact as sellers, buyers, sellers and buyers, and users who have never made a transaction
- The number of users who have transacted as buyers
We can enter the following query to get the number of users who have transacted as buyers
SELECT COUNT(DISTINCT buyer_id) AS buyers
FROM `first-trial-321912.dqlab_project.order`;
Output:
2. The number of users who have transacted as sellers
We can enter the following query to get the number of users who have transacted as sellers
SELECT COUNT(DISTINCT seller_id) AS sellers
FROM `first-trial-321912.dqlab_project.order`;
Output:
3. The number of users who have transacted as sellers and buyers
We can enter the following query to get the number of users who have transacted as sellers and buyers
SELECT COUNT(DISTINCT seller_id) AS buyer_seller
FROM `first-trial-321912.dqlab_project.order`
WHERE seller_id IN (SELECT buyer_id FROM `first-trial-321912.dqlab_project.order`);
Output:
4. The number of users who have never transacted as sellers and buyers
We can enter the following query to get the number of users who have never transacted as sellers and buyers
SELECT COUNT(DISTINCT user_id) AS no_transaction
FROM `first-trial-321912.dqlab_project.users`
WHERE user_id NOT IN
(SELECT buyer_id FROM `first-trial-321912.dqlab_project.order`
UNION ALL
SELECT seller_id FROM `first-trial-321912.dqlab_project.order`);
Top 5 Users with The highest Purchase Value
To find out who are the 5 users with the highest purchase value, we can enter the following query:
SELECT user_id, nama_user, SUM(total) AS total
FROM `first-trial-321912.dqlab_project.users`
JOIN `first-trial-321912.dqlab_project.order`
ON user_id = buyer_id
GROUP BY user_id,nama_user
ORDER BY total DESC
LIMIT 5;
Output:
The query results show that the 5 users with the highest purchase value have a total transaction value above Rp. 40,000,000. The user with the highest total transaction value is Jaga Puspasari with a total transaction value of Rp.54,102,250.
Top 5 Users with The Highest Number of Transactions But Never Use Discount
To find out who are the 5 users with the highest number of transactions but never use discount, we can enter the following query:
SELECT user_id, nama_user, COUNT(total) AS total
FROM `first-trial-321912.dqlab_project.users`
JOIN `first-trial-321912.dqlab_project.order`
ON user_id = buyer_id
GROUP BY user_id,nama_user
HAVING SUM(discount) = 0
ORDER BY total DESC
LIMIT 5;
Output:
Based on the query results, it can be seen that the 5 users with the highest number of transactions but never use discount all have made transactions more than 10 times. The user with the largest number of transactions and has never used a discount is Yessi Wibisono with a total of 13 transactions.
Top 5 Users with The Highest Number of Transactions and Highest Purchase Value
To find out who are the 5 users with the highest number of transactions and highest purchase value, we can enter the following query:
SELECT user_id, nama_user, COUNT(user_id) AS transaction, SUM(total) AS total
FROM `first-trial-321912.dqlab_project.users`
JOIN `first-trial-321912.dqlab_project.order`
ON user_id = buyer_id
GROUP BY user_id, nama_user
ORDER BY transaction DESC, total DESC
LIMIT 5;
Output:
The 5 users with the highest number of transactions and highest purchase value have made transactions more than 10 times. The user with the largest number of transactions and highest purchase value is Yessi Wibisono with a total of 13 transactions and Rp.40,589,000 total purchase value.
Number of Transactions with Discounts and without Discounts
We can enter the following query to get the number of transactions using discounts:
SELECT COUNT(order_id) AS discount, SUM(discount) AS total_discountFROM `first-trial-321912.dqlab_project.order`WHERE discount != 0;
Output:
We can enter the following query to get the number of transactions without using a discount:
SELECT COUNT(order_id) AS non_discount
FROM `first-trial-321912.dqlab_project.order`
WHERE discount = 0;
Output:
Based on 74,874 transaction data, it can be seen that there were a total of 6,391 orders that were use for discounts with a total discount value of Rp.1,213,710,550. We also know that there were 68,483 transactions without a discount.
Top 5 Best Selling Products in December 2019
To find out what is the top 5 products purchased in December 2019 based on total quantity, we can enter the following query:
SELECT desc_product, SUM(quantity) AS total_quantity
FROM `first-trial-321912.dqlab_project.products`
JOIN `first-trial-321912.dqlab_project.order_details`
ON product_id = productID
JOIN (SELECT * FROM `first-trial-321912.dqlab_project.order` WHERE paid_at BETWEEN ‘2019–12–01’ AND ‘2019–12–31’) AS paid_order
ON (order_id = orderID)
GROUP BY desc_product
ORDER BY total_quantity DESC
LIMIT 5;
Output:
Top 5 Best Selling Products in 2019
To find out what is the top 5 products purchased in 2019 based on total quantity, we can enter the following query:
SELECT desc_product, SUM(quantity) AS total_quantity
FROM `first-trial-321912.dqlab_project.products`
JOIN `first-trial-321912.dqlab_project.order_details`
ON product_id = productID
JOIN (SELECT * FROM `first-trial-321912.dqlab_project.order` WHERE paid_at BETWEEN ‘2019–01–01’ AND ‘2019–12–31’) AS paid_order
ON (order_id = orderID)
GROUP BY desc_product
ORDER BY total_quantity DESC
LIMIT 5;
Output:
Top 5 Best Selling Products in 2020
To find out what is the top 5 products purchased in 2020 based on total quantity, we can enter the following query:
SELECT desc_product, SUM(quantity) AS total_quantity
FROM `first-trial-321912.dqlab_project.products`
JOIN `first-trial-321912.dqlab_project.order_details`
ON product_id = productID
JOIN (SELECT * FROM `first-trial-321912.dqlab_project.order` WHERE paid_at BETWEEN ‘2020–01–01’ AND ‘2020–05–31’) AS paid_order
ON (order_id = orderID)
GROUP BY desc_product
ORDER BY total_quantity DESC
LIMIT 5;
Output:
Monthly Transaction in 2019
This SQL query below can be used to find monthly transaction in 2019 based on amount of transactions and total transaction value:
SELECT FORMAT_DATE(‘%Y-%m’, created_at) AS Month, COUNT(order_id) AS Number_of_Transaction, SUM(total) AS total_transaction_value
FROM `first-trial-321912.dqlab_project.order`
WHERE created_at BETWEEN ‘2019–01–01’ AND ‘2019–12–31’
GROUP BY 1
ORDER BY 1;
Output:
Monthly Transaction in 2020
This SQL query below can be used to find monthly transaction in 2020 based on amount of transactions and total transaction value:
SELECT FORMAT_DATE(‘%Y-%m’, created_at) AS Month, COUNT(order_id) AS Number_of_Transaction, SUM(total) AS total_transaction_value
FROM `first-trial-321912.dqlab_project.order`
WHERE created_at >= ‘2020–01–01’
GROUP BY 1
ORDER BY 1;
Output:
Top 5 Best Selling Product Category in 2019
we can get 5 product categories with the highest total quantity in 2019, only for transactions that have been sent to buyers by entering the following query:
SELECT category, SUM(quantity) AS total_quantity, SUM(price) AS total_price
FROM `first-trial-321912.dqlab_project.order`
INNER JOIN `first-trial-321912.dqlab_project.order_details`
ON (order_id = orderID)
INNER JOIN `first-trial-321912.dqlab_project.products`
ON (product_id = productID)
WHERE created_at BETWEEN ‘2019–01–01’ AND ‘2020–01–01’ AND delivery_at IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
Output:
Top 5 Best Selling Product Category in 2020
we can get 5 product categories with the highest total quantity in 2020, only for transactions that have been sent to buyers by entering the following query:
SELECT category, SUM(quantity) AS total_quantity, SUM(price) AS total_price
FROM `first-trial-321912.dqlab_project.order`
INNER JOIN `first-trial-321912.dqlab_project.order_details`
ON (order_id = orderID)
INNER JOIN `first-trial-321912.dqlab_project.products`
ON (product_id = productID)
WHERE created_at>=’2020–01–01' AND delivery_at IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
Output:
High-Value Buyers
I want to find buyers who have transacted more than 5 times, and each transaction is more than 2,000,000. To find out who are the high value buyers, we can enter the following query:
SELECT nama_user, COUNT(nama_user) AS number_of_transaction, SUM(total) AS total_transaction_value, MIN(total) AS min_transaction_value
FROM `first-trial-321912.dqlab_project.users`
JOIN `first-trial-321912.dqlab_project.order`
ON user_id = buyer_id
GROUP BY user_id, nama_user
HAVING COUNT(nama_user) > 5 AND MIN(total) > 2000000
ORDER BY total_transaction_value DESC;
Output:
Dropshipper
I want to find out which users are dropshippers, which are buyers who bought goods but sent them to someone else. Its characteristics are many transactions, with different addresses.
So I created a SQL query to find buyers with 10 transactions or more whose transaction delivery addresses are always different for each transaction.
SELECT nama_user, COUNT(nama_user) AS number_of_transaction, COUNT(DISTINCT kodepos) AS pos_code, SUM(total) AS total_transaction_value, AVG(total) AS average_transaction_value
FROM `first-trial-321912.dqlab_project.users`
JOIN `first-trial-321912.dqlab_project.order`
ON user_id = buyer_id
GROUP BY user_id, nama_user
HAVING COUNT(nama_user) >= 10 AND COUNT(nama_user) = COUNT(DISTINCT kodepos)
ORDER BY 2 DESC;
Output:
The User who is Transacted as Buyer and seller
I want to find a seller who has also transacted as a buyer at least 7 times. The query below can be used to get it:
SELECT nama_user, Jumlah_transaksi_beli, Jumlah_transaksi_jual
FROM `first-trial-321912.dqlab_project.users`
INNER JOIN
(SELECT buyer_id, COUNT(1) AS Jumlah_transaksi_beli
FROM `first-trial-321912.dqlab_project.order`
GROUP BY 1) AS buyer
ON buyer_id = user_id
INNER JOIN (SELECT seller_id, COUNT(1) AS Jumlah_transaksi_jual FROM `first-trial-321912.dqlab_project.order` GROUP BY 1) AS seller
ON seller_id = user_id
WHERE Jumlah_transaksi_beli >= 7
ORDER BY 1;
Summary
- Throughout January 2019 to May 2020 there were 74,874 transactions. Of the total transactions, there were 86.92% of transactions that were successful (paid and sent), 6.74% of transactions that were not paid and not sent, 6.34% of transactions that were paid but not sent. The company needs to improve the system to ensure that all orders that have been paid for, must be sent by the seller within a certain period of time.
- The total quantity sold during the period January 2019-May 2020 was 4,025,016. The total quantity sold throughout 2019 was 1,536,259, while in January-May 2020 it was 2,488,757. This shows an increase in sales in 2020. Even the quantity sold in January-May 2020 almost reached 2 times the sales in 2019.
- There are 12 product categories and 1,145 products sold throughout January 2019 — May 2020. The Top 5 Best Selling Product Categories in 2019 and 2020 are personal hygiene, fresh food, instant food, groceries, soft drinks. The Top 5 Best Selling Product Category did not change from 2019 to 2020 but experienced an increase in purchase volume.
- Of the 17,936 users, there are 99.67% users who have transacted as buyers, 0.38% of users have transacted as sellers, 0.38% of users have transacted as sellers and buyers, 0.33% of users who have never made transactions either as seller and buyer. Companies need to make strategies to increase the number of sellers so that the variety of products that can be purchased by customers also increases. In addition, companies also need to carry out promotions such as providing discount vouchers for new users in order to reduce the percentage of users who have never made a transaction at all.
- The total value of transactions and the number of transactions carried out in 2019 increased from month to month. December 2019 is the month where the total transaction value and the number of transactions are the highest among other months. Transactions in 2020 also experienced an increase in the total transaction value and the number of transactions from month to month. The company can carry out various promotions such as giving discount vouchers, cashback, etc. in December due to the high number of transactions in that month so that it can attract the attention of customers.
- From a total of 74,874 transactions made, there were 6,391 users who made transactions using discounts with a total discount value of Rp. 1,213,710,550 and 68,483 users who transact without using a discount.