John is a young data scientist who thoroughly enjoys the Applied Data Analysis course. He also has another passion in life: YouTube. He wants to start his own adventure and turn this passion into a living. Before embarking on his journey, John wants to analyze YouTube in order to identify how it has evolved and to better understand the platform.
John has the YouNiverse dataset at his disposal. It is divided into several subsets, including a dataset of YouTube channels and another containing the time series of these channels.
First, let's look at the categories of these channels.
Over time, some categories have gained prominence and come to occupy a large portion of the landscape, such as Music, Entertainment, and Gaming.
This can be explained by the fact that these genres target a much broader audience than others (compare Music with Pets & Animals).
Another explanation is that the channels in these major categories do not always belong to typical independent YouTube creators, but rather to businesses or players that are part of a broader industry. Take Music, where T-Series, a major Indian record label, has the most subscribers (more than 100 million).
Next, how large an audience does each category attract?
As indicated by the previous pie chart, a few categories, namely Entertainment, Gaming, Music, and News & Politics, account for most of our sample, both in terms of subscribers and of published videos.
However, when we combine these two metrics, the picture shifts: other categories have the highest ratio of subscribers to videos produced, namely Comedy, Film & Animation, and Howto & Style. This could be due to greater user loyalty, as users may prefer to follow funny content creators rather than news channels (as suggested by the very low ratio of News & Politics).
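The subscribers-to-videos ratio can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the original charts: it assumes each channel row provides a category, a subscriber count and a video count (the exact column names in the YouNiverse channel file are not relied on here), and the numbers below are invented toy values.

```python
from collections import defaultdict

# Illustrative toy rows: (category, subscribers, videos).
# The numbers are invented; they are NOT taken from YouNiverse.
channels = [
    ("Comedy", 2_000_000, 100),
    ("Comedy", 500_000, 50),
    ("Music", 3_000_000, 1_500),
    ("News & Politics", 1_000_000, 5_000),
]


def subscribers_per_video(rows):
    """Return {category: total subscribers / total videos} (a ratio of sums)."""
    subscribers = defaultdict(int)
    videos = defaultdict(int)
    for category, subs, vids in rows:
        subscribers[category] += subs
        videos[category] += vids
    # Skip categories with no videos to avoid dividing by zero.
    return {c: subscribers[c] / videos[c] for c in subscribers if videos[c] > 0}


ratios = subscribers_per_video(channels)
for category, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {ratio:.1f} subscribers per video")
```

Summing before dividing weights large channels more heavily; averaging per-channel ratios instead would give every channel equal weight and could rank categories differently.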
To get a better understanding, let's take a deeper look at the video metadata of these categories. When we examine the interactions between users and creators (in particular likes and dislikes), we reach the same conclusion as before: a few categories (Entertainment, Gaming, and Music) appear to monopolize the interactions.
However, once these figures are normalized by the number of videos in each category, the ranking shifts again, and other categories generate a high number of interactions per video. This reinforces the trend of user preference and loyalty noted above, particularly for Comedy, which once again has the highest ratio. The three dominant categories (Entertainment, Gaming, and Music) still generate many interactions per video, ranking second, fourth, and seventh, respectively.
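Normalizing interactions by the number of videos can be sketched as follows. This assumes, for illustration only, that each video row carries a category plus like and dislike counts, that "interactions" means likes plus dislikes, and that videos with missing counts are simply skipped; the data are invented.

```python
from collections import defaultdict

# Illustrative toy video rows: (category, likes, dislikes).
# Numbers are invented; None stands for a missing count.
videos = [
    ("Comedy", 900, 20),
    ("Comedy", 1_100, 30),
    ("Entertainment", 400, 100),
    ("Entertainment", 200, 50),
    ("Entertainment", 300, 50),
    ("Entertainment", None, None),
]


def interactions_per_video(rows):
    """Return {category: (likes + dislikes) / number of videos with counts}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for category, likes, dislikes in rows:
        if likes is None or dislikes is None:
            continue  # ignore videos whose counts are missing
        totals[category] += likes + dislikes
        counts[category] += 1
    return {c: totals[c] / counts[c] for c in totals}


per_video = interactions_per_video(videos)
ranking = sorted(per_video, key=per_video.get, reverse=True)
print(ranking)
```

Here Comedy has fewer videos than Entertainment yet comes out ahead, which is the kind of reordering the normalization is meant to reveal.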
Finally, comparing likes with dislikes shows that the most represented categories are not necessarily the ones users approve of most: Entertainment has a very low like/dislike ratio, whereas Shows has a very high one compared with the other categories.
Shows is followed by Comedy, which also has a high engagement rate, and Gaming, one of the most represented categories in terms of channel count.
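A like/dislike ratio per category can be sketched as total likes divided by total dislikes. This is one possible definition, assumed here for illustration (averaging per-video ratios would be another); the rows are invented toy data.

```python
from collections import defaultdict

# Illustrative toy video rows: (category, likes, dislikes); invented numbers.
videos = [
    ("Shows", 1_000, 10),
    ("Comedy", 900, 20),
    ("Comedy", 1_100, 30),
    ("Entertainment", 600, 150),
    ("Entertainment", 300, 50),
]


def like_dislike_ratio(rows):
    """Return {category: total likes / total dislikes}."""
    likes = defaultdict(int)
    dislikes = defaultdict(int)
    for category, n_likes, n_dislikes in rows:
        likes[category] += n_likes
        dislikes[category] += n_dislikes
    # Categories without any dislike are left out: their ratio is undefined.
    return {c: likes[c] / dislikes[c] for c in likes if dislikes[c] > 0}


ratios = like_dislike_ratio(videos)
for category in sorted(ratios, key=ratios.get, reverse=True):
    print(f"{category}: {ratios[category]:.1f} likes per dislike")
```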
Looking at the mean and median video durations, a few categories stand out as outliers: Shows, Gaming, Education, and Nonprofits & Activism all sit around 1,000 seconds or longer, while most other categories fall between 200 and 600 seconds.
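Reporting both the mean and the median is useful because a few very long videos can pull the mean far above the typical duration, while the median is robust to them. The sketch below uses Python's `statistics` module on invented toy durations (in seconds); the row layout is an assumption, not the dataset's actual schema.

```python
from collections import defaultdict
from statistics import mean, median

# Illustrative toy rows: (category, duration in seconds); invented numbers.
videos = [
    ("Gaming", 600),
    ("Gaming", 1_500),
    ("Gaming", 1_200),
    ("Howto & Style", 300),
    ("Howto & Style", 420),
    ("Howto & Style", 360),
    ("Howto & Style", 5_000),  # one very long video
]


def duration_stats(rows):
    """Return {category: (mean duration, median duration)} in seconds."""
    durations = defaultdict(list)
    for category, seconds in rows:
        durations[category].append(seconds)
    return {c: (mean(d), median(d)) for c, d in durations.items()}


stats = duration_stats(videos)
for category, (avg, med) in stats.items():
    print(f"{category}: mean {avg:.0f} s, median {med:.0f} s")
```

In this toy data, the single 5,000-second video lifts the Howto & Style mean to 1,520 seconds while its median stays at 390 seconds.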
Choosing a category of short videos, such as Howto & Style, could benefit John, since shorter videos are likely to take less time and energy to produce.
To summarize, this quick analysis shows that certain categories are overrepresented, but that quantity is not synonymous with quality: these same categories are not always the ones that generate the most engagement from their audience.
This helped John, our data scientist, decide which field to start in: he needs a category that reaches a large audience while still allowing him to build a community. Howto & Style could be an appropriate choice: while ranking among the top categories in subscribers per video, it still piques the audience's interest, with a high number of likes and dislikes per video.