To understand how YouTube’s recommendation system works, we have to look at three problems YouTube must solve to give its users a best-in-class recommendation experience.
1. Scale
There are over 800 million videos on YouTube. Recommendation algorithms that work reasonably efficiently on small data sets, such as those that compare user profiles, do not scale to a data set this large.
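To make the scaling issue concrete, here is a minimal sketch of the "comparing user profiles" approach, assuming each profile is a small dictionary of video IDs mapped to ratings. The function names and toy profiles are hypothetical and only illustrate the idea; this is not how YouTube implements anything.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse user profiles (video_id -> rating)."""
    shared = set(a) & set(b)
    dot = sum(a[v] * b[v] for v in shared)
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def all_pair_similarities(profiles):
    """Brute force: compare every user with every other user."""
    return {
        (u, v): cosine_similarity(profiles[u], profiles[v])
        for u, v in combinations(sorted(profiles), 2)
    }


def pair_count(n_users):
    """Number of user pairs a brute-force comparison has to evaluate."""
    return n_users * (n_users - 1) // 2


# Hypothetical toy data: three users, three videos.
profiles = {
    "ana": {"v1": 1, "v2": 1},
    "ben": {"v1": 1, "v2": 1},
    "cy": {"v3": 1},
}
similarities = all_pair_similarities(profiles)
```

Three users need only three comparisons, but the count grows quadratically: `pair_count(2_000_000_000)` is roughly 2 × 10^18 pairs, which is why brute-force profile comparison stops being practical at this size.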
2. The cold-start problem
The cold-start problem arises when an item or piece of content is newly uploaded to a platform and there is little to no behavioural data for it, so we are unable to recommend it to users.
Susan Wojcicki, YouTube’s CEO, made the scale of the cold-start problem apparent in February 2020 when she announced that YouTube had two billion monthly users around the world, with 500 hours of video uploaded every minute. These figures have likely grown since, given the Covid-19 lockdowns of the past 12-18 months.
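The cold-start problem can be illustrated with a small sketch of a recommender that relies purely on behavioural data, here a simple "watched together" count. The recommender, the function names and the toy watch histories are all assumptions made for illustration, not YouTube's actual method.

```python
from collections import Counter, defaultdict


def build_co_watch_counts(histories):
    """Count how often each pair of videos appears in the same watch history."""
    co_counts = defaultdict(Counter)
    for videos in histories.values():
        unique = set(videos)
        for a in unique:
            for b in unique:
                if a != b:
                    co_counts[a][b] += 1
    return co_counts


def recommend(user, histories, co_counts, k=3):
    """Rank unseen videos by how often they were co-watched with the user's videos."""
    seen = set(histories.get(user, []))
    scores = Counter()
    for watched in seen:
        for other, count in co_counts.get(watched, {}).items():
            if other not in seen:
                scores[other] += count
    # Only videos with at least one co-watch ever receive a score.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:k]]


# Hypothetical toy data: "new_upload" is in the catalogue but has no watches yet.
catalogue = ["a", "b", "c", "new_upload"]
histories = {"u1": ["a", "b"], "u2": ["a", "b", "c"], "u3": ["a"]}

co_counts = build_co_watch_counts(histories)
recommendations = recommend("u3", histories, co_counts)
```

Because `new_upload` has never been watched, it has no co-watch counts and can never appear in `recommendations`, however relevant it might be. A system that depends only on behavioural signals needs some other source of information to surface fresh content.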
3. Data noise
With over 800 million videos on YouTube, there are inevitably many videos with extremely sparse historical user behaviour data. As a result, these videos are less likely to surface as a recommendation.
YouTube can also rarely obtain the ground truth of user satisfaction and instead has to model noisy implicit feedback signals. Later in this blog post, we will go into why explicit signals are an ineffective measure of user satisfaction for a recommendation system.
The problem of noisy implicit data is compounded when you take into account the textual data associated with content (titles, descriptions, etc.). This metadata is notoriously poor in quality because it is written by the content creators themselves.