Popularity Characterization and Modelling for User-generated Videos
Date
2013-01-29
Degree Level
Masters
Abstract
User-generated content systems such as YouTube have become highly popular. It is difficult to understand and predict content popularity in such systems. Characterizing and modelling content popularity can provide deeper insights into system design trade-offs and enable prediction of system behaviour in advance.
Borghol et al. collected two datasets of YouTube video weekly view counts over eight months in 2008/09, namely a “recently-uploaded” dataset and a “keyword-search” dataset, and analyzed the popularity characteristics of the videos in the recently-uploaded dataset, including the video popularity evolution over time. Based on the observed characteristics, they developed a model that can generate synthetic video weekly view counts whose characteristics with respect to video popularity evolution match those observed in the recently-uploaded dataset.
For this thesis, new weekly view count data was collected over two months in 2011 for the videos in the recently-uploaded and keyword-search datasets of Borghol et al. This data was used to evaluate the accuracy of the Borghol et al. model when used to generate synthetic view counts for a much longer time period than the eight-month period previously considered. Although the model yielded distributions of total (lifetime) video view counts that match the empirical distributions, significant differences between the model and empirical data were observed. These differences appear to arise because of particular popularity characteristics that change over time rather than being week-invariant as assumed in the model.
This thesis also characterizes how video popularity evolves beyond the eight-month period considered by Borghol et al., and studies the characteristics of the keyword-search dataset with respect to content popularity, popularity evolution, and sampling biases. Finally, the thesis studies the popularity characteristics of the videos in the recently-uploaded and keyword-search datasets for which additional view count data could not be collected, owing to the removal of these videos from YouTube.
Keywords
VoD (Video on Demand), UGC (User-generated Content), HBO (Home Box Office), DVR (Digital Video Recorder), OECD (Organisation for Economic Co-operation and Development), P2P (Peer-to-Peer), IPTV (Internet Protocol Television), CDN (Content Distribution Network), OSN (Online Social Network), CDF (Cumulative Distribution Function), CCDF (Complementary Cumulative Distribution Function), MLE (Maximum Likelihood Estimation)
Degree
Master of Science (M.Sc.)
Department
Computer Science
Program
Computer Science