A Comparison of Machine Learning Techniques to Classify Tweets relevant to People impacted by Dementia and COVID-19

Azizi, Mehrnoosh

dc.contributor.advisor	Spiteri, Raymond
dc.contributor.committeeMember	Vassileva, Julita
dc.contributor.committeeMember	Klarkowski, Madison
dc.creator	Azizi, Mehrnoosh
dc.creator.orcid	0000-0002-4337-4630
dc.date.accessioned	2022-11-18T22:02:34Z
dc.date.available	2022-11-18T22:02:34Z
dc.date.copyright	2022
dc.date.created	2022-11
dc.date.issued	2022-11-18
dc.date.submitted	November 2022
dc.date.updated	2022-11-18T22:02:34Z
dc.description.abstract	Dementia has emerged as one of today's biggest healthcare challenges due to the increasing demand for medical, social, and institutional care. Moreover, the COVID-19 pandemic has had a unique impact on people with dementia, who are at an increased risk of contracting COVID-19 and of experiencing more severe symptoms and disease consequences. This highlights the importance of focusing on the issues of people living with dementia. Modern technologies, including social media, can help psychologists analyze people’s experiences and take necessary measures. However, one of the principal problems for psychologists is that they must process huge amounts of data, and not all of the data can be analyzed because much of it is irrelevant. The data therefore need to be labeled manually by one or several researchers, which is tedious, time-consuming, and potentially costly because of the human effort involved. Thus, improvements to existing methodologies are needed to enable psychologists to make better use of the data and understand the impacts of COVID-19 on people with dementia. One reasonable modern way to perform such a task (e.g., automatic labeling) is to use machine learning (ML) algorithms to save time and effort. To this end, this study compares various ML algorithms for classifying tweets relevant to dementia and COVID-19 in order to help psychologists examine the impacts of COVID-19 on people living with dementia.
Three different datasets are used: (i) a dataset of 5,058 tweets on COVID-19 and dementia extracted from Twitter from February 15 to September 7, 2020, to train, evaluate, and compare different models; (ii) a dataset of 6,240 tweets from September 8, 2020 to December 8, 2021, to test the best model; and (iii) a dataset of 1,289 tweets related to Canada’s Alzheimer’s Awareness Month from January 1 to January 31, 2022, to retrain and test the best model. In the first step, to choose the best machine learning model, several classification models are trained and compared in terms of classification performance: logistic regression, Gaussian naïve Bayes classifier, multinomial naïve Bayes classifier, support vector classifier, decision tree classifier, K-nearest neighbor classifier, random forest classifier, AdaBoost classifier, XGBoost classifier, BERT classifier, and ALBERT classifier. According to the classification results, the ALBERT model outperformed all other models in the comparison, showing the least overfitting and the highest accuracy, AUC, and F1-score. In the second step, the ALBERT model is tested on the second dataset (a completely unseen dataset) and achieves an accuracy of 80% in classifying tweets as relevant or irrelevant to people impacted by dementia and COVID-19. Finally, to show that the ALBERT model can be used efficiently in future studies of people impacted by dementia and COVID-19, the model is trained on 10% of the third dataset and tested on the remaining 90%, reaching an accuracy of 88%.
dc.format.mimetype	application/pdf
dc.identifier.uri	https://hdl.handle.net/10388/14317
dc.language.iso	en
dc.subject	Dementia, COVID-19, logistic regression, Gaussian naïve Bayes classifier, multinomial naïve Bayes classifier, support vector classifier, decision tree classifier, K-nearest neighbor classifier, random forest classifier, AdaBoost classifier, XGBoost classifier, BERT classifier, ALBERT classifier
dc.title	A Comparison of Machine Learning Techniques to Classify Tweets relevant to People impacted by Dementia and COVID-19
dc.type	Thesis
dc.type.material	text
thesis.degree.department	Computer Science
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Saskatchewan
thesis.degree.level	Masters
thesis.degree.name	Master of Science (M.Sc.)
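
The abstract compares models by accuracy and F1-score and describes training on 10% of a dataset and testing on the remaining 90%. The following is a minimal sketch of those two ideas in plain Python, not the thesis's actual code. The function names (`binary_metrics`, `split_train_test`) and the toy labels are hypothetical and chosen only for illustration; the split shown is a simple ordered cut, whereas a real pipeline would typically shuffle or stratify first.

```python
# Illustrative sketch only: toy labels, not data or results from the thesis.

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary labeling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def split_train_test(items, train_fraction=0.1):
    """Ordered split: the first train_fraction of items train, the rest test."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


# Hypothetical labels: 1 = relevant tweet, 0 = irrelevant tweet.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
scores = binary_metrics(y_true, y_pred)

# Index stand-ins for a 1,289-item dataset (the size of the third dataset).
train, test = split_train_test(list(range(1289)), train_fraction=0.1)
print(scores, len(train), len(test))
```

F1 is reported alongside accuracy because relevant and irrelevant tweets may be imbalanced, in which case accuracy alone can be misleading.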

Files

Original bundle

Name: AZIZI-THESIS-2022.pdf
Size: 3.81 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 2.27 KB
Format: Plain Text

Collections

Graduate Theses and Dissertations