NYC Citi Bike Trips Data Analysis

Fitri Widya Nanda
8 min readNov 15, 2021

--

picture by allvectorlogo.com

Citi Bike is New York City’s bike-sharing system with thousands of bikes at hundreds of stations and is available 24/7 every day of the year. Citi Bike is a convenient solution for quick trips around the City. Users could unlock a bike at any station and ride wherever they want. Daily, three-day, and annual passes are available, making Citi Bike a great option for visitors and locals alike.

Dataset Description

The dataset that is used comes from bigquery public data with table ID `bigquery-public-data:new_york_citibike.citibike_trips`. The dataset contains information about NYC Citibike trips from July 1, 2013, to May 31, 2018. The dataset consists of 58,937,715 rows and 16 columns. Further information about the dataset can be seen below:

  1. The Number of Citibikes

we can get the number of city bikes in 2013–2018 by entering the following query:

SELECT EXTRACT(YEAR FROM starttime) AS year, 
COUNT(DISTINCT(bikeid)) AS num_bikes
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY year
ORDER BY year;

Output :

Based on the query results, it can be seen that the number of bikes provided continues to increase every year from 2013 to 2017. The number of bikes in 2013 was 6,503 units and continued to increase until it reached 13,820 units in 2017. In 2018, it is seen that the number of bikes has decreased compared to 2017. This is because the data used is only up to May 2018, so the query results do not fully describe the number of bicycles in 2018.

2. Number of Users by Generation

We can enter the following query to get the number of users by generation:

WITH new_view AS(
SELECT birth_year,
CASE WHEN birth_year BETWEEN 1883 AND 1900 THEN ‘Lost Generation’
WHEN birth_year BETWEEN 1901 AND 1927 THEN ‘G.I. Generation’
WHEN birth_year BETWEEN 1928 AND 1945 THEN ‘Silent Generation’
WHEN birth_year BETWEEN 1946 AND 1964 THEN ‘Baby Boomers’
WHEN birth_year BETWEEN 1965 AND 1980 THEN ‘Generation X’
WHEN birth_year BETWEEN 1981 AND 1996 THEN ‘Millenials'
WHEN birth_year BETWEEN 1997 AND 2012 THEN ‘Generation Z'
ELSE 'Other'
END AS generation
FROM `bigquery-public-data.new_york_citibike.citibike_trips`)
SELECT COUNT(birth_year) AS users, generation
FROM new_view
GROUP BY generation
ORDER BY users DESC;

Output:

We obtained the query results that the NYC Citibike users in 2013–2018 mostly came from the millennial generation, as many as 22,515,467 people. Generation X is the second largest user with 17,087,299 and Baby Boomers is the third most user with 7,246,214. Based on the results obtained, it can be seen that the 3 generations that used the Citibike the most during the 2013–2018 period were millennials, generation x, and baby boomers, namely people born in 1965–1996.

3. The top 10 stations serve the most trip for annual members in 2016

To find out the top 10 stations that serve the most trip for annual members in 2016, we can enter the following query:

SELECT start_station_name, num_station
FROM
(SELECT start_station_name, COUNT(start_station_name) AS
num_station, EXTRACT(YEAR FROM starttime) AS year
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
WHERE usertype = 'Subscriber'
GROUP BY start_station_name,year
ORDER BY year)
WHERE year = 2016
ORDER BY num_station DESC
LIMIT 10

Output:

Based on the query results, it can be seen that the station that served the most trips for annual members in 2016 was Pershing Square North station which served 105,479 trips.

4. Top 10 Bikes with The Highest Trip Duration

We can get the top 10 bikes with the highest trip duration by entering the following query:

SELECT bikeid, num_trip, duration, ROUND((duration/num_trip), 2) AS avg_duration_trip
FROM
(SELECT bikeid, SUM(tripduration) AS duration, COUNT(*) AS
num_trip
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
WHERE bikeid IS NOT NULL
GROUP BY
bikeid
ORDER BY duration DESC)
LIMIT 10;

Output:

Based on the query results, it can be seen that the bike with the highest travel duration in 2013–2018 was a bike with a bikeid of 30503. This bike has traveled for 20,378,802 seconds or 5,660.8 hours, with a total of 1,111 trips, and an average duration of trips made this bike is 18,342.76 seconds or 5.09 hours.

5. Number of Users Based On Gender

To find out the number of users based on gender, we can enter the following query:

SELECT EXTRACT(YEAR FROM starttime) AS year,
COUNT(CASE WHEN gender = ‘female’ THEN 1 END ) AS
count_female,
COUNT(CASE WHEN gender = ‘male’ THEN 1 END ) AS count_male
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY year
ORDER BY year;

Output:

Based on the query results, it can be seen that there are more male than female NYC Citibike users. Every year, female users are not even half of the number of male users. This may be due to a lack of security for women when using public transportation, especially bicycles.

6. Number of Users Based On User Types

We can get the number of users based on user types by entering the following query:

SELECT EXTRACT(YEAR FROM starttime) AS year,
COUNT(CASE WHEN usertype = ‘Subscriber’ THEN 1 END ) AS
count_subscriber,
COUNT(CASE WHEN usertype = ‘Customer’ THEN 1 END ) AS
count_customer
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY year
ORDER BY year;

Output:

Based on the query results, it can be seen that during the 2013–2017 period, the number of NYC Citibike users who are the subscriber type is always more than the customer type. Subscriber type users continued to increase from 4,370,245 in 2013 to 12,441,957 in 2017. Meanwhile, customer type users were 666,940 in 2013 and increased to 1,676,712 in 2017.

The subscriber/annual membership is just $15/month ($179 billed upfront annually). It includes unlimited 45-minute rides on a classic Citi Bike, while The day pass costs $15 for a 24-hour period. Based on the price comparison, it can be seen that the subscriber type is very worth it to be used by people who use bicycles every day, while the customer type is very suitable for use by tourists or people who want to cycle all day long.

7. Trip Growth

We can get the trip growth during 2013–2018 by entering the following query:

SELECT year, trip, previous, trip-previous AS trip_growth, 
ROUND((trip-previous)/previous*100, 2) AS
percentage_trip_growth
FROM (SELECT year, trip, LAG(trip) OVER (ORDER BY year) AS previous
FROM (SELECT EXTRACT(YEAR FROM starttime) AS year,
COUNT(start_station_id) AS trip
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY year)
WHERE year IS NOT NULL
GROUP BY year, trip
ORDER BY year)
ORDER BY year

Output:

Based on the query results, it can be seen that in 2013–2017, the number of trips continued to increase from year to year. In fact, in 2017 the number of trips experienced a sharp increase compared to the previous year, which was 3,856,020 trips compared to the previous year. In 2018, it is seen that the trip growth has decreased compared to 2017. This is because the data used is only up to May 2018, so the query results do not fully describe the trip growth in 2018.

8. Trip Duration Per Month in 2014–2015

We can get the trip duration per month in 2014–2015 by entering the following query:

SELECT year,
SUM(CASE WHEN Month = ‘Jan’ THEN duration END) AS January,
SUM(CASE WHEN Month = ‘Feb’ THEN duration END) AS February,
SUM(CASE WHEN Month = ‘Mar’ THEN duration END) AS March,
SUM(CASE WHEN Month = ‘Apr’ THEN duration END) AS April,
SUM(CASE WHEN Month = ‘May’ THEN duration END) AS May,
SUM(CASE WHEN Month = ‘Jun’ THEN duration END) AS June,
SUM(CASE WHEN Month = ‘Jul’ THEN duration END) AS July,
SUM(CASE WHEN Month = ‘Aug’ THEN duration END) AS August,
SUM(CASE WHEN Month = ‘Sep’ THEN duration END) AS September,
SUM(CASE WHEN Month = ‘Oct’ THEN duration END) AS October,
SUM(CASE WHEN Month = ‘Nov’ THEN duration END) AS November,
SUM(CASE WHEN Month = ‘Dec’ THEN duration END) AS December
FROM (SELECT EXTRACT (YEAR FROM starttime) AS year,
FORMAT_DATE(‘%b’,starttime) AS Month,
ROUND(SUM(tripduration)/3600, 1) AS duration
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY year, Month)
WHERE year BETWEEN 2014 AND 2015
GROUP BY year
ORDER BY year

Output:

In 2015–2016, the most use of Citibikes was in May-October. While the use of Citibike decreased in November, December, January, February, and April. This could have been caused by changes in the weather where in November is the transition to winter and December to February is the winter period.

9. 10 Stations That Serve the Lowest Number of Trips

We can get the top 10 stations with the lowest number of trips by entering the following query:

SELECT start_station_name AS station, COUNT(start_station_name) AS num_station
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY station
ORDER BY num_station
LIMIT 10

Output:

Based on the query results, it can be seen that the stations that serve the lowest number of customers are stations 1 Ave & E 105 St and 333 Johnson TEST 1, which have only served customer departures once each.

10. Number of Users by Times of Day

We can get the number of users by times of day by entering the following query:

WITH time AS (SELECT FORMAT_DATETIME(‘%H:%M:%S’, CAST(starttime AS DATETIME)) AS hour
FROM `bigquery-public-data.new_york_citibike.citibike_trips`)
SELECT
COUNT(CASE WHEN hour BETWEEN ‘05:00:00’ AND ‘11:59:59’ THEN 1 END)
AS morning,
COUNT(CASE WHEN hour BETWEEN ‘12:00:00’ AND ‘16:59:59’ THEN 1 END)
AS afternoon,
COUNT(CASE WHEN hour BETWEEN ‘17:00:00’ AND ‘19:59:59’ THEN 1 END)
AS evening,
COUNT(CASE WHEN hour BETWEEN ‘20:00:00’ AND ‘23:59:59’ THEN 1 END)
AS night,
COUNT(CASE WHEN hour BETWEEN ‘00:00:00’ AND ‘04:59:59’ THEN 1 END)
AS late_night
FROM time;

Output:

Based on the query results, it can be seen that most NYC Citibike users start using bicycles in the morning and afternoon. There are 16,089,253 users who use bicycles between 05:00 and 11:59 and 16,043,162 users who use bicycles between 12:00 and 16:59. While the lowest number of bicycle users is from 00.00 to 04.59, which is 1,121,711 throughout 2013–2018.

--

--