Check out the Shiny app!

Summary:

Digital content creation is an industry worth approximately $12 billion and is estimated to grow at an annual rate of 12%. Platforms like YouTube are lowering the barrier to entry for individuals, and content created by small businesses and individuals has never been more prominent than it is now. The YouTube Trending list is a potentially useful tool for small content producers to get their work in front of a wider audience. This project explores which features the videos on this list share, in the hope of better informing content producers who want to make the list and have YouTube market their videos for them. An app was built in R using Shiny to walk visitors through the project's findings and offer advice on creating content with a better chance of making the list.

YouTube Trending List:

The purpose of YouTube's Trending list is to surface videos that a wide range of viewers find interesting. It highlights videos that (1) appeal to a wide range of viewers, (2) are not misleading, clickbaity, or sensational, (3) capture the breadth of what is happening on YouTube and in the world, and (4) showcase a diversity of creators. Some trends are predictable, such as a new song by a popular artist or a new movie trailer. Others are surprising, like a viral video. Trending shows the same list of trending videos to all users in a given country, which puts the content producers who make the list in front of a wide range of new viewers. This exposure to new viewers, with YouTube in effect marketing your videos for you, is the main benefit of making the Trending list for content producers.

Data:

Daily data on the videos in the YouTube Trending list was used for this analysis. Kaggle user Rishav Sharma collected it using the YouTube API. The first section of the Shiny app walks through an exploration of the data. The dataset contained approximately 56,000 records over a 280-day period from mid-August 2020 to May 2021. More than 9,800 unique videos from 3,200 creators made the Trending list during this period, and the average video remained on the list for 5.69 days.

Figure 1. Table of basic information used.
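
As a rough illustration of how summary figures like these can be derived, here is a minimal Python sketch. It is not the project's code (the app itself was built in R), and it assumes the data has one row per video per day on the list. The sample rows, field order, and the summarize function are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows: (video_id, channel, trending_date).
# Assumption: the dataset holds one row per video per day on the list.
rows = [
    ("a1", "ChannelX", "2020-08-12"),
    ("a1", "ChannelX", "2020-08-13"),
    ("a1", "ChannelX", "2020-08-14"),
    ("b2", "ChannelY", "2020-08-12"),
    ("c3", "ChannelX", "2020-08-13"),
    ("c3", "ChannelX", "2020-08-14"),
]


def summarize(rows):
    """Return basic counts for a list of (video_id, channel, date) rows."""
    # Number of rows per video equals its number of days on the list.
    days_per_video = Counter(video_id for video_id, _, _ in rows)
    channels = {channel for _, channel, _ in rows}
    return {
        "records": len(rows),
        "unique_videos": len(days_per_video),
        "unique_channels": len(channels),
        "mean_days_on_list": len(rows) / len(days_per_video),
    }


summary = summarize(rows)
```

Under the one-row-per-day assumption, the average time on the list is simply the record count divided by the number of unique videos, which is consistent with the figures above (about 56,000 records across roughly 9,800 videos).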

Exploratory data analysis showed that the median video had 1.14 million views at the time it made the Trending list. The distribution of views per video was found to be log-normal, as were the distributions of likes and comments per video.

Figure 2. Distribution of likes per video on the Trending list.
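
A log-normal shape means the logarithm of the values is roughly normally distributed, so the geometric mean sits near the median while the arithmetic mean is pulled upward by a few very large videos. The Python sketch below illustrates that property only. It is not how the project assessed the distribution, and the view counts are hypothetical.

```python
import math
import statistics


def geometric_mean(values):
    """Exponential of the mean of the logs; values must be positive."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.fmean(logs))


# Hypothetical view counts, evenly spaced on a log scale.
views = [10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]

median_views = statistics.median(views)
geo_mean_views = geometric_mean(views)
mean_views = statistics.fmean(views)
```

For data like this, the median and geometric mean agree, while the arithmetic mean is many times larger. This is why the median is the more useful summary for views, likes, and comments here.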

A particularly interesting finding from the data exploration concerned the "continuity" of appearances on the Trending list. Continuity was calculated as the ratio of the number of days a video was trending to the number of calendar days between its first and last appearances. A ratio of 1.0 meant that all of the video's trending days were consecutive, while a ratio of 0.5 meant that the video was on the Trending list every other day, on average, during the period in which it was trending. Most videos had a ratio of 1.0, and the great majority were above 0.75. This means that once a video drops off the Trending list for a day or two, its run is likely over.

Figure 3. Continuity of trending days per video.
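
The continuity ratio can be sketched in a few lines of Python. This is an illustration rather than the project's own code. It assumes the calendar span counts both the first and last day, which is what makes a fully consecutive run come out at exactly 1.0. The continuity function name is hypothetical.

```python
from datetime import date


def continuity(trending_dates):
    """Ratio of distinct trending days to the calendar span they cover.

    Assumption: the span is inclusive of the first and last trending day.
    """
    days = sorted(set(trending_dates))
    span = (days[-1] - days[0]).days + 1
    return len(days) / span


# Three consecutive days: every trending day is back to back.
consecutive = continuity(
    [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]
)

# Two appearances across a four-day span: half of the days were trending.
gappy = continuity([date(2021, 1, 1), date(2021, 1, 4)])
```

A video that trends on a single day has a span of one day and therefore a ratio of 1.0 under this definition.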

The Shiny app for this project has an entire section devoted to this EDA. One tab contains the basic information described above. A second tab contains the visualizations from the data exploration, presented one chart at a time and selectable from a drop-down menu, along with a short paragraph explaining what each chart shows.

Analysis and recommendations:

The main analysis of this project addressed four different features, each with its own section in the Shiny app. The first feature was the video title, which was examined by plotting the distributions of its length in characters, its word count, and its capitalization. The title plays a big role in whether a viewer decides to spend time on a video, and it is an important consideration for content producers. The median title on the Trending list was 51 characters long, and content producers are encouraged to keep their titles between 30 and 80 characters to produce content with characteristics similar to previously trending videos. It should be noted that appearances on the Trending list drop off sharply at the 100-character mark, which may be due to YouTube actively selecting against videos with very long titles.

Figure 4. Video title length (in characters) distribution.

Another aspect of the video title analyzed in this project was its word count. The median title was eight words long, and content producers are advised to keep their titles between 4 and 15 words, keeping in mind the previous character length recommendation. The third and final title feature studied was the proportion of capital letters. The median ratio was 0.21. This makes sense because capitalizing only the first letter of each word (assuming the average English word is 5.1 characters long, according to Wolfram Alpha) would give an average ratio of about 0.20 (1/5.1 ≈ 0.196). Apparently, most of the videos on the Trending list capitalize the first letter of each word. The distribution of the capitalization ratio also has a long tail, indicating that some videos use significantly more capital letters. Content producers are encouraged to capitalize the first letter of each word in their video titles and optionally add more capitals for emphasis.

Figure 5. Uppercase ratio in a video title.
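
The three title features discussed above can be computed roughly as follows. This Python sketch is an illustration, not the project's implementation. In particular, it assumes the uppercase ratio is taken over letters only (ignoring spaces, digits, and punctuation), which the write-up does not specify. The function name and sample title are hypothetical.

```python
def title_features(title):
    """Character length, word count, and uppercase ratio of a title.

    Assumption: the uppercase ratio is uppercase letters divided by all
    letters; spaces, digits, and punctuation are excluded.
    """
    letters = [ch for ch in title if ch.isalpha()]
    upper = sum(1 for ch in letters if ch.isupper())
    return {
        "chars": len(title),
        "words": len(title.split()),
        "upper_ratio": upper / len(letters) if letters else 0.0,
    }


features = title_features("My New Video")

# Expected ratio when only the first letter of each word is capitalized,
# using the 5.1-character average word length cited in the text.
title_case_ratio = 1 / 5.1
```

The same function applies to channel names, where only the character length and word count were analyzed.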

The second feature examined in this analysis was the channel name of the videos on the Trending list. The median channel name was found to be 12 characters long, while the median word count was only two words. It is recommended that content producers keep the channel name short: 5-20 characters long, with only 1-3 words.

Figure 6. Channel name length (in characters).

The third feature studied in this project was the video category. When a video is published, the creator must select one of 15 categories as its genre. The analysis looked at what proportion of the videos on the Trending list fell into each category. The Music, Entertainment, and Gaming categories were found to be the most represented, while Pets & Animals, Travel & Events, and Nonprofits & Activism were underrepresented. Although a video needs more likes, on average, to make the Trending list in the overrepresented categories (about 3 times as many), we still recommend that content producers make content in the Music, Entertainment, or Gaming categories to maximize their chances of getting on the list. These categories appear on the list about 20 times as often as the underrepresented categories.

Figure 7. Categories by number of appearances.
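
A comparison like this comes down to counting appearances per category and dividing by the total. The Python sketch below shows the idea with hypothetical counts. It does not reproduce the project's numbers, and the category_shares function is an illustrative name.

```python
from collections import Counter


def category_shares(categories):
    """Map each category to its share of all list appearances."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.most_common()}


# Hypothetical sample: one entry per appearance on the list.
appearances = ["Music"] * 4 + ["Gaming"] * 3 + ["Pets & Animals"] * 1

shares = category_shares(appearances)

# How many times more often one category appears than another.
music_vs_pets = shares["Music"] / shares["Pets & Animals"]
```

Note that a share computed this way describes the makeup of the list. It does not by itself show how likely a video in a given category is to make the list, because the total number of videos uploaded per category is unknown.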

Last but not least, the user-added tags field was analyzed. This is a fairly unstructured feature: the user can enter any tags they want, and any number of them. To restore some structure, each video's tags were reduced to the single tag that was most common across all videos, and this tag was stored in a 'header' field. No firm recommendations were made regarding tags, other than to use them: more than 85% of the videos that made the Trending list used tags. An interactive bar chart of tags by category was created for the Shiny app, allowing users to explore the most commonly used tags in each category or combination of categories. This gives content producers a way to explore how tags are used in trending videos, even though the project makes no specific recommendations about them.

Figure 8. The interactive tag chart in the Shiny app.
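
The tag reduction described above (keeping, for each video, whichever of its tags is most common across all videos) could look something like the following Python sketch. It is an illustration under stated assumptions, not the project's code: it assumes tags arrive as a single string delimited by "|" with "[None]" marking videos without tags, and all names and sample data are hypothetical.

```python
from collections import Counter


def split_tags(raw):
    """Split a raw tag string into normalized tags.

    Assumption: tags are "|"-delimited and "[None]" means no tags.
    """
    if not raw or raw == "[None]":
        return []
    return [t.strip().lower() for t in raw.split("|") if t.strip()]


def top_tag_per_video(raw_tags_by_video):
    """For each video, keep its tag that is most common across all videos."""
    parsed = {vid: split_tags(raw) for vid, raw in raw_tags_by_video.items()}
    # Count each tag once per video so repeated tags do not inflate totals.
    global_counts = Counter(
        tag for tags in parsed.values() for tag in set(tags)
    )
    return {
        vid: max(tags, key=lambda t: global_counts[t]) if tags else None
        for vid, tags in parsed.items()
    }


# Hypothetical sample data.
raw = {
    "v1": "music|pop|official video",
    "v2": "pop|dance",
    "v3": "[None]",
    "v4": "gaming|pop",
}

top_tags = top_tag_per_video(raw)
share_with_tags = sum(t is not None for t in top_tags.values()) / len(top_tags)
```

When several of a video's tags are equally common, this sketch keeps the one listed first. That is an arbitrary tie-breaking choice.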

Future work:

The aforementioned tags field is a priority area for future work. Due to the time constraints of the project, this field was analyzed using fairly simple methods. More sophisticated natural language processing techniques, such as latent Dirichlet allocation, could yield useful insights into how tags are used by videos on the Trending list.

The more I worked with this data, the more limiting I found it to be. Because I only have data on videos that made the Trending list, and nothing on videos that didn't make the list, or even a random sample of all videos, I couldn't compare the videos that made the list with the ones that didn't. Another limitation of the data is that views, likes, and comments do not represent the popularity of videos with certain features. Instead, they represent the level of engagement needed to get onto the Trending list. I'd like to work with YouTube data that captures views, likes, and comments for videos that both made and didn't make the Trending list at a fixed time after publication, perhaps somewhere between a month and a year (or at multiple time frames). This would allow me to compare the popularity of videos with different features. With this additional data, I could probably find more useful advice for content producers on YouTube.
