Simple EDA Project Udemy

Zuda Pradana Putra
7 min read · May 9, 2023

Hey, I hope everyone reading this is staying healthy. This is my second article, in which I complete a simple EDA project using a Kaggle dataset about the courses available on Udemy.

Photo by Firmbee.com on Unsplash

Many online course platforms are available today, but the one that caught my attention is Udemy, which offers many categories of lessons at very low prices. Over the past two years, I have bought several courses on topics such as web development, database engineering, cloud, and data analytics. I am interested in analyzing the profits generated by instructors and by Udemy, which courses are the most popular and best rated among both free and paid courses, how course prices are distributed, and whether courses are kept up to date.

This data set contains information on every course available on Udemy as of October 10, 2022. There are a total of 209k courses from 72k different instructors, with a total of 648M enrolled students. The average course price is $89 (I suspect the data was not collected during a discount period), with a highest price of $999 and a lowest of $0.10 (I ignored free courses priced at $0). In this project, I explore the data using Python with supporting libraries such as pandas, numpy, and plotly for visualization. Well, let’s dive into the analysis.

Subscription Type Percentage

Udemy offers two types of subscription: instructors can either sell their courses at a set price (paid) or offer them for free. Almost 90% of the available courses (187,996) are paid, which allows instructors to profit from their courses.
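A percentage like the one above can be computed with a few lines of pandas. This is only a minimal sketch on toy data: the column name `is_paid` and the five rows are assumptions for illustration, not the actual dataset schema.

```python
import pandas as pd

# Toy data; the column name `is_paid` is an assumption, not taken from the dataset.
courses = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "is_paid": [True, True, True, True, False],
})

# Count courses per subscription type, then convert the counts to percentages.
counts = courses["is_paid"].map({True: "Paid", False: "Free"}).value_counts()
share = (counts / counts.sum() * 100).round(1)

summary = pd.DataFrame({"courses": counts, "percent": share})
print(summary)
```

On the real data, the same two steps (count, then divide by the total) would give the paid and free shares.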

The Distribution of Prices for Each Course

Udemy’s course prices range from $0.10 to $999. However, the distribution below shows that most prices fall below $200. Around 20.7% of all Udemy courses are priced between $18 and $19.99, and 7% are priced between $198 and $199. The Development category has the highest price distribution of all categories. My tip is to wait for specific events, because Udemy often applies discounts of up to 90% off the original price for all courses.
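To see how prices cluster, one option is to bucket them into ranges and look at the share of courses in each bucket. The sketch below uses made-up prices, a hypothetical `price` column, and bin edges I picked by hand; the real chart may have been built differently.

```python
import pandas as pd

# Toy prices in USD; `price` is a hypothetical column name.
prices = pd.Series([0.1, 19.99, 19.99, 18.0, 49.99, 199.0, 199.99, 999.0], name="price")

# Free courses ($0) are excluded, as in the article.
paid = prices[prices > 0]

# Right-closed bins with hand-picked edges: (0, 20], (20, 100], (100, 200], (200, 1000].
bins = [0, 20, 100, 200, 1000]
labels = ["$0-20", "$20-100", "$100-200", "$200-1000"]
bucket = pd.cut(paid, bins=bins, labels=labels)

# Share of courses per price bucket, in percent, in bucket order.
share = (bucket.value_counts(normalize=True).sort_index() * 100).round(1)
print(share)
```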

Total Revenue and Subscribers by Category

There are currently 13 categories available on Udemy, such as Design, Marketing, Lifestyle, Development, Business, and others. What caught my attention are the top 3 categories, as shown in the figure below. First place goes to the Development category with 15% (31.6k) of all courses, followed by IT & Software at 14.5% and Teaching & Academics at 12.5%. The lowest total revenue comes from the Lifestyle and Music categories, both of which have less than $800M.
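A per-category summary of this kind is a typical groupby. The sketch below is an illustration on toy rows: the column names are hypothetical, and approximating revenue as list price times subscribers is my assumption here, not a documented formula.

```python
import pandas as pd

# Toy rows; column names are hypothetical.
courses = pd.DataFrame({
    "category": ["Development", "Development", "IT & Software", "Music"],
    "price": [20.0, 100.0, 50.0, 10.0],
    "num_subscribers": [1000, 200, 300, 50],
})

# Assumption: revenue is approximated as list price times number of subscribers.
courses["est_revenue"] = courses["price"] * courses["num_subscribers"]

by_category = (
    courses.groupby("category")
    .agg(
        courses=("price", "count"),
        subscribers=("num_subscribers", "sum"),
        est_revenue=("est_revenue", "sum"),
    )
    .sort_values("est_revenue", ascending=False)
)

# Share of all courses that belong to each category, in percent.
by_category["course_share_pct"] = (by_category["courses"] / len(courses) * 100).round(1)
print(by_category)
```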

Top 3 Sub-Category

Let’s explore the sub-categories within the top three categories. For the first category, Development (with 10 different topics), the largest sub-category is Web Development (10.3k courses). For IT & Software, the largest sub-category is Other IT & Software (11.5k courses), and for Teaching & Academics, it is Language Learning (7.5k courses).

Top 10 Topics by Number of Subscribers (Paid & Free Courses)

Python tops the list with the highest number of available courses and 27 million enrollments for paid courses alone. The Python topic is spread across several sub-categories, including Programming Languages, Other IT & Software, and Data Science. Excel is in second place, followed by JavaScript in third.

Top 10 Courses by Subscribers (Paid/Free)

The top three paid subscription courses on Udemy with over 1 million subscriptions are “2022 Complete Python Bootcamp From Zero to Hero in Python”, “Microsoft Excel — Excel From Beginner to Advanced”, and “Automate the Boring Stuff with Python Programming”. On the other hand, the top two free courses with high subscriptions are “Java Tutorial for Complete Beginners” with 1.8 million subscriptions, followed by “Introduction To Python Programming” with 780,000 subscriptions.

Top 10 Languages Used by Instructors

There are 79 languages on the Udemy platform, counted by the main language each instructor uses. About 59% (123.9k) of courses are in English, followed by Portuguese at 8.8% and Spanish at 8.3%. However, some popular courses are also available in other languages with the help of automatic subtitles.

Distribution of Course Duration (Min) by Subscription Type

The distribution of course durations differs between paid and free courses. The average duration of paid courses is 285 minutes (4.7 hours). However, there are outliers with a duration of more than 675 minutes, which account for 16,648 out of 187,996 courses. Free courses, on the other hand, have an average duration of 91 minutes, but 1,114 of them run longer than 200 minutes.
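Outlier thresholds like the ones above often come from the upper fence of a box plot, that is, Q3 + 1.5 × IQR. I am assuming that rule here only for illustration; the durations are toy values and `duration_min` is a hypothetical column name.

```python
import pandas as pd

# Toy durations in minutes; `duration_min` is a hypothetical column name.
durations = pd.Series([30, 60, 90, 120, 150, 180, 210, 240, 1200], name="duration_min")

# Interquartile range and the usual box-plot upper fence (an assumed rule).
q1 = durations.quantile(0.25)
q3 = durations.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)

# Courses longer than the fence are treated as outliers.
outliers = durations[durations > upper_fence]
print(upper_fence, len(outliers))
```

Running this separately for paid and free courses would give one threshold per subscription type.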

Annual Growth in Published Courses

Since its establishment in 2010, Udemy has steadily grown its course offerings every year. Growth peaked in 2020, with a 90% increase in the total number of courses (44,929) compared to the previous year. In 2021, there was a further increase of around 14.5%, resulting in the highest number of published courses to date. This is not surprising given that instructors took advantage of the pandemic to create more courses. Unfortunately, subscriber growth did not follow the same trend as the increase in available courses. The only similarity was the peak in 2020. After that, there was a significant and unexpected drop in subscriber growth in 2021 and 2022, with a decrease of 85.2%.

Total Courses vs Number of Subscribers

Profit Share: Udemy & Instructors
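Year-over-year growth is just the change from one year to the next, divided by the earlier year. Here is a small sketch with made-up yearly counts (not the real Udemy numbers) and a hypothetical helper called `yoy_growth`.

```python
def yoy_growth(counts_by_year):
    """Return the percentage change versus the previous year, keyed by year."""
    years = sorted(counts_by_year)
    growth = {}
    for prev, curr in zip(years, years[1:]):
        change = counts_by_year[curr] / counts_by_year[prev] - 1
        growth[curr] = round(change * 100, 1)
    return growth


# Made-up counts of courses published per year, for illustration only.
toy_counts = {2018: 100, 2019: 200, 2020: 380, 2021: 435}
print(yoy_growth(toy_counts))
```

The same function works for subscribers per year, which makes the two trends easy to compare side by side.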

Based on the information on the Udemy website, instructors receive 97% of the revenue when a sale is made through their referral link, but only 37% on sales made without one, with Udemy keeping the remaining 63%. Assuming instructors receive only 37%, the highest profits were recorded in 2020, with Udemy earning up to 6.8 billion USD and instructors earning 4 billion USD. This aligns with the subscriber growth graph, which peaks in 2020 and shows a significant decrease in 2021 and 2022.

Percentage of Courses Updated Within the Last Half-Year
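As a quick arithmetic check, the 2020 figures quoted above can be used to see which split they imply. The helper `split_revenue` and the constant names below are hypothetical, and the split is assumed to apply to every sale.

```python
UDEMY_SHARE = 0.63       # assumed Udemy cut on sales without a referral link
INSTRUCTOR_SHARE = 0.37  # assumed instructor cut on the same sales


def split_revenue(gross_revenue):
    """Split gross revenue into (udemy, instructor) parts under the assumed shares."""
    return gross_revenue * UDEMY_SHARE, gross_revenue * INSTRUCTOR_SHARE


# Cross-check with the 2020 figures from the article (in billions of USD).
udemy_2020, instructors_2020 = 6.8, 4.0
implied_udemy_share = udemy_2020 / (udemy_2020 + instructors_2020)
print(round(implied_udemy_share, 2))  # 0.63
```

So 6.8 out of 10.8 billion is about 63% for Udemy, leaving about 37% for instructors.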

Not all courses are updated regularly. Udemy does not have specific rules for sellers (instructors) to update their courses. Out of the total number of courses, only 23% (48,160) have been updated within the last 6 months, or are newly published courses. Before purchasing a course, it is advisable to check how recent the material is, especially for IT-related courses where developments occur rapidly every year.
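Checking recency comes down to comparing each course's last update date with a cutoff six months before the data was collected. A minimal sketch, assuming a hypothetical `last_update_date` column and toy dates:

```python
import pandas as pd

SNAPSHOT = pd.Timestamp("2022-10-10")        # data collection date from the article
cutoff = SNAPSHOT - pd.DateOffset(months=6)  # six months earlier: 2022-04-10

# Toy rows; `last_update_date` is a hypothetical column name.
courses = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "last_update_date": pd.to_datetime(
        ["2022-09-01", "2021-01-15", "2022-04-10", "2019-06-30"]
    ),
})

# A course counts as recent if it was updated on or after the cutoff.
courses["recently_updated"] = courses["last_update_date"] >= cutoff
pct_recent = courses["recently_updated"].mean() * 100
print(pct_recent)
```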

Top Instructors by Number of Subscribers, Total Courses, Profit, and Average Rating

Ranked by the number of subscribers, Learn Tech Plus comes first with a total of 191 courses, 7,910,243 subscribers, and a profit of $443 million, but it has a relatively low average course rating of 3.6 compared to other instructors. Jose Portilla, however, stands out as the best performer with a 4.58 rating, 49 courses, 4,196,088 subscribers, and a high profit of $302 million for a relatively small number of courses. The trendline between the number of subscribers and profit shows a fairly strong fit, with an R-squared of 0.75.

Udemy is a hugely popular Massive Open Online Course (MOOC) platform that allows anyone to become an instructor, create courses in their own style, and set their own prices. Based on this simple analysis, I have drawn several conclusions:
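For a straight-line trendline with a single predictor, R-squared equals the square of the Pearson correlation. The sketch below computes it with numpy on toy values; it is a stand-in for illustration and not necessarily how the chart's trendline was fitted.

```python
import numpy as np

# Toy values; not the real instructor data.
subscribers = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
profit = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson correlation between the two variables.
r = np.corrcoef(subscribers, profit)[0, 1]

# For simple linear regression, R-squared is the squared correlation.
r_squared = r ** 2
print(round(r_squared, 2))  # 0.6
```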

  • Almost 90% of the courses available on the platform are paid. Prices range from $0.10 to $999, with most courses priced below $200; the largest single group, about 20% of all courses, falls between $18 and $19.99.
  • Python is the most popular topic among both paid and free courses, with the largest number of subscribers. By individual course, however, the free “Java Tutorial for Complete Beginners” leads with 1.8M subscribers (average rating 4.44/5), followed by the paid “2022 Complete Python Bootcamp” with 1.6M subscribers (average rating 4.61/5).
  • The average course duration varies, with free courses averaging 91 minutes and paid courses averaging 285 minutes. However, about 8.9% of paid courses (16,648 of 187,996) run longer than 675 minutes. In total, there are 79 languages used by instructors, with 59% of the content in English.
  • The number of courses published on Udemy has consistently increased from 2010 to 2021, with the highest increase in 2020 of approximately 90%. Similarly, the number of subscribers reached its highest peak in 2020, with a total of 120.7M subscribers.
  • Assuming Udemy keeps 63% of each sale, it made a profit of $6.89B in 2020, while instructors earned about $4B, with Learn Tech Plus (191 courses) and TJ Walker (209 courses) among the top earners. Overall, only 23% of courses were updated within six months of data collection.

Thank you for taking the time to read my second article. I hope it was useful and gave you new insights into the topic, and I appreciate all the support. To see the cleaning process before EDA, visit my GitHub profile via this link
