Application of data analytics and machine learning on data collected by smartphones to understand human behavioural patterns
A growing number of health studies seek to leverage smartphone-based recording to continuously monitor consenting participants’ health behaviours, including those related to mental health, mobility, and activity. To better understand health risks and the influence of the environment on human physical and mental health, such studies commonly use smartphones to collect metrics relevant to health behaviour, such as screen state, app usage, location, activity level, and browsing behaviour. They also typically use survey instruments incorporating questionnaires, voice recordings, photos, and multimedia content on which the user is asked to provide feedback. When data volume and variety grow substantially, as is common with sensed data, the challenges associated with data quantity, quality, diversity, and trustworthiness also increase significantly. Because most health scientists are unfamiliar with the tools and concepts required for effective analysis of such high-volume, high-velocity data, it is challenging for them to perform, on their own, the computationally intensive analyses needed to secure certain types of insight from the collected data. The primary objective of this thesis is to provide computational mechanisms to support the research teams associated with three distinct case studies utilizing smartphone-based data, so as to help obtain insights accessible to the teams’ health scientists. The data sets for these three studies were collected from participants using a pre-existing smartphone-based application named Ethica. The data were accumulated over periods ranging from 2 weeks to 6 months, with the study period differing across the three studies, through a set of surveys and mobile sensors such as those for the battery, screen state, and GPS. This thesis addresses three significant challenges associated with the extraction and processing of smartphone data.
The first is the computational burden and intricacy of the data extraction, preprocessing, and analytic steps. The second is the need to handle omitted and missing data points with the help of machine learning and statistical methods. The final challenge is to secure valuable findings from these data sets through exploratory analysis, following examination of participant adherence patterns and evaluation of the quantity and quality of the data collected. The methods applied in this thesis are useful for other studies using the Ethica platform, because Ethica data sets share a common structure and the code can be reused and readily adapted for other such data sets.
Machine learning, data analysis, big data, simulation modelling, computational statistics, behavioural patterns, smartphone data, data analytics, statistical analysis, exploratory analysis, quantitative analysis, health data, Scala and Spark, human behavior, health scientists, computer science research
Master of Science (M.Sc.)