Repository logo
 

Identifying cell types with single cell sequencing data

dc.contributor.committeeMemberWu, Fangxiang
dc.contributor.committeeMemberBui, Francis
dc.contributor.committeeMemberJin, Lingling
dc.contributor.committeeMemberXing, Li
dc.creatorZhang, Wenjing
dc.creator.orcid0000-0003-3350-6350
dc.date.accessioned2022-01-31T20:52:38Z
dc.date.available2022-01-31T20:52:38Z
dc.date.created2022-01
dc.date.issued2022-01-31
dc.date.submittedJanuary 2022
dc.date.updated2022-01-31T20:52:38Z
dc.description.abstractSingle-cell RNA sequencing (scRNA-seq) techniques, which examine the genetic information of individual cells, provide an unparalleled resolution to discern deeply into cellular heterogeneity. On the contrary, traditional RNA sequencing technologies (bulk RNA sequencing technologies), measure the average RNA expression level of a large number of input cells, which are insufficient for studying heterogeneous systems. Hence, scRNA-seq technologies make it possible to tackle many inaccessible problems, such as rare cell types identification, cancer evolution and cell lineage relationship inference. Cell population identification is the fundamental of the analysis of scRNA-seq data. Generally, the workflow of scRNA-seq analysis includes data processing, dropout imputation, feature selection, dimensionality reduction, similarity matrix construction and unsupervised clustering. Many single-cell clustering algorithms rely on similarity matrices of cells, but many existing studies have not received the expectant results. There are some unique challenges in analyzing scRNA-seq data sets, including a significant level of biological and technical noise, so similarity matrix construction still deserves further study. In my study, I present a new method, named Learning Sparse Similarity Matrices (LSSM), to construct cell-cell similarity matrices, and then several clustering methods are used to identify cell populations respectively with scRNA-seq data. Firstly, based on sparse subspace theory, the relationship between a cell and the other cells in the same cell type is expressed by a linear combination. Secondly, I construct a convex optimization objective function to find the similarity matrix, which is consist of the corresponding coefficients of the linear combinations mentioned above. Thirdly, I design an algorithm with column-wise learning and greedy algorithm to solve the objective function. As a result, the large optimization problem on the similarity matrix can be decomposed into a series of smaller optimization problems on the single column of the similarity matrix respectively, and the sparsity of the whole matrix can be ensured by the sparsity of each column. Fourthly, in order to pick an optimal clustering method for identifying cell populations based on the similarity matrix developed by LSSM, I use several clustering methods separately based on the similarity matrix calculated by LSSM from eight scRNA-seq data sets. The clustering results show that my method performs the best when combined with spectral clustering (Laplacian eigenmaps + k-means clustering). In addition, compared with five state-of-the-art methods, my method outperforms most competing methods on eight data sets. Finally, I combine LSSM with t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the data points of scRNA-seq data in the two-dimensional space. The results show that for most data points, in the same cell types they are close, while from different cell clusters, they are separated.
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://hdl.handle.net/10388/13800
dc.subjectClustering
dc.subjectCell type identification
dc.subjectSparse similarity learning
dc.subjectSingle cell
dc.titleIdentifying cell types with single cell sequencing data
dc.typeThesis
dc.type.materialtext
thesis.degree.departmentBiomedical Engineering
thesis.degree.disciplineBiomedical Engineering
thesis.degree.grantorUniversity of Saskatchewan
thesis.degree.levelMasters
thesis.degree.nameMaster of Science (M.Sc.)

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
ZHANG-THESIS-2022.pdf
Size:
1.75 MB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
LICENSE.txt
Size:
2.27 KB
Format:
Plain Text
Description: