A Comparison of Machine Learning Techniques to Classify Tweets relevant to People impacted by Dementia and COVID-19

dc.contributor.advisor: Spiteri, Raymond
dc.contributor.committeeMember: Vassileva, Julita
dc.contributor.committeeMember: Klarkowski, Madison
dc.creator: Azizi, Mehrnoosh
dc.creator.orcid: 0000-0002-4337-4630
dc.date.accessioned: 2022-11-18T22:02:34Z
dc.date.available: 2022-11-18T22:02:34Z
dc.date.copyright: 2022
dc.date.created: 2022-11
dc.date.issued: 2022-11-18
dc.date.submitted: November 2022
dc.date.updated: 2022-11-18T22:02:34Z
dc.description.abstract: Dementia has emerged as one of today's biggest healthcare challenges because of the growing demand for medical, social, and institutional care. The COVID-19 pandemic has also had a distinctive impact on people with dementia, who are at increased risk of contracting COVID-19 and of experiencing more severe symptoms and disease outcomes. This underscores the importance of focusing on the issues faced by people living with dementia. Modern technologies, including social media, can help psychologists analyze people’s experiences and take necessary measures. However, psychologists must process huge amounts of data, much of which is irrelevant, so not all of it can be analyzed. The data therefore need to be labeled manually by one or more researchers, which is tedious, time-consuming, and potentially costly because of the human effort involved. Improvements to existing methodologies are thus needed so that psychologists can make better use of the data and understand the impacts of COVID-19 on people with dementia. Machine Learning (ML) algorithms offer a practical way to automate tasks such as labeling, saving time and effort. To this end, this study compares various ML algorithms for classifying tweets relevant to dementia and COVID-19, in order to help psychologists examine the impacts of COVID-19 on people living with dementia.
Three datasets are used: (i) 5,058 tweets on COVID-19 and dementia collected from Twitter between February 15 and September 7, 2020, used to train, evaluate, and compare different models; (ii) 6,240 tweets from September 8, 2020 to December 8, 2021, used to test the best model; and (iii) 1,289 tweets related to Canada’s Alzheimer’s Awareness Month from January 1 to January 31, 2022, used to retrain and test the best model. In the first step, to choose the best machine learning model, several classifiers (logistic regression, Gaussian naïve Bayes, multinomial naïve Bayes, support vector classifier, decision tree, K-nearest neighbor, random forest, AdaBoost, XGBoost, BERT, and ALBERT) are trained and compared in terms of classification performance. The ALBERT model outperforms all other models, showing the least over-fitting and achieving the highest accuracy, AUC, and F1-score. In the second step, the ALBERT model is tested on the second, completely unseen dataset and achieves an accuracy of 80% in classifying tweets as relevant or irrelevant to people impacted by dementia and COVID-19. Finally, to show that the ALBERT model can be used efficiently in future studies of people impacted by dementia and COVID-19, it is retrained on 10% of the third dataset and tested on the remaining 90%, reaching an accuracy of 88%.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10388/14317
dc.language.iso: en
dc.subject: Dementia, COVID-19, logistic regression, Gaussian naïve Bayes classifier, multinomial naïve Bayes classifier, support vector classifier, decision tree classifier, K-nearest neighbor classifier, random forest classifier, AdaBoost classifier, XGBoost classifier, BERT classifier, ALBERT classifier
dc.title: A Comparison of Machine Learning Techniques to Classify Tweets relevant to People impacted by Dementia and COVID-19
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)
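The abstract compares classifiers by accuracy, AUC, and F1-score, and in its final step retrains the ALBERT model on 10% of the third dataset and tests it on the remaining 90%. The following is a minimal, dependency-free Python sketch of those evaluation pieces, assuming binary labels (1 = relevant, 0 = irrelevant) and a per-tweet relevance score from any classifier. The function names and the per-class (stratified) sampling in `stratified_split` are illustrative assumptions, not code from the thesis; the abstract does not say whether its 10%/90% split was stratified.

```python
import random
from typing import List, Sequence, Tuple

# Assumed label convention: 1 = relevant tweet, 0 = irrelevant tweet.


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of tweets whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive ('relevant') class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random relevant tweet scores higher
    than a random irrelevant one (ties count as half). O(P*N), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_split(labels: Sequence[int], train_frac: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: put train_frac of each class in the training set
    and the rest in the test set (e.g. 10% train / 90% test)."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = max(1, round(train_frac * len(idx)))  # keep at least one example per class
        train.extend(idx[:k])
        test.extend(idx[k:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Made-up toy data for illustration only, not results from the thesis.
    y_true = [1, 0, 1, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.1]              # classifier relevance scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5
    print("accuracy:", accuracy(y_true, y_pred))
    print("F1:", f1_score(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
    train_idx, test_idx = stratified_split([1] * 30 + [0] * 70, train_frac=0.1)
    print(len(train_idx), "train /", len(test_idx), "test")
```

Computing the same metrics on both training and held-out predictions and comparing them gives a rough indication of over-fitting, the property on which the abstract reports ALBERT performed best.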

Files

Original bundle
Name: AZIZI-THESIS-2022.pdf
Size: 3.81 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.27 KB
Format: Plain Text