Development of New Computational Tools for Analyzing Hi-C Data and Predicting Three-Dimensional Genome Organization
Background: The development of Hi-C (and related methods) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. There has been extensive effort in developing new tools to analyze this data in order to better understand the relationship between 3D genomic structure and function. While useful, the existing tools are far from maturity and (in some cases) lack the generalizability that would be required for application in a diverse set of organisms. This is problematic since the research community has proposed many cross-species "hallmarks" of 3D genome organization without confirming their existence in a variety of organisms. Research Objective: Develop new, generalizable computational tools for Hi-C analysis and 3D genome prediction. Results: Three new computational tools were developed for Hi-C analysis or 3D genome prediction: GrapHi-C (visualization), GeneRHi-C (3D prediction) and StoHi-C (3D prediction). Each tool has the potential to be used for 3D genome analysis in both model and non-model organisms since the underlying algorithms do not rely on any organism-specific constraints. A brief description of each tool follows. GrapHi-C is a graph-based visualization of Hi-C data. Unlike existing visualization methods, GrapHi-C allows for a more intuitive structural visualization of the underlying data. GeneRHi-C and StoHi-C are tools that can be used to predict 3D genome organizations from Hi-C data (the 3D-genome reconstruction problem). GeneRHi-C uses a combination of mixed integer programming and network layout algorithms to generate 3D coordinates from a ploidy-dependent subset of the Hi-C data. Alternatively, StoHi-C uses t-stochastic neighbour embedding with the complete set of Hi-C data to generate 3D coordinates of the genome. Each tool was applied to multiple, independent existing Hi-C datasets from fission yeast to demonstrate their utility. This is the first time 3D genome prediction has been successfully applied to these datasets. Overall, the tools developed here more clearly recapitulated documented features of fission yeast genomic organization when compared to existing techniques. Future work will focus on extending and applying these tools to analyze Hi-C datasets from other organisms. Additional Information: This thesis contains a collection of papers pertaining to the development of new tools for analyzing Hi-C data and predicting 3D genome organization. Each paper's publication status (as of January 2020) has been provided at the beginning of the corresponding chapter. For published papers, reprint permission was obtained and is available in the appendix.
3D Genome Organization, 3D Genome Structure, 3D Genome Reconstruction Problem, Hi-C, Fission Yeast
Doctor of Philosophy (Ph.D.)