|dc.description.abstract||Ciliates are a type of unicellular eukaryotic organism that has two types of nuclei within each cell; one is called the macronucleus (MAC) and the other is known as the micronucleus (MIC). During mating, ciliates exchange their MIC, destroy their own MAC, and create a new MAC from the genetic material of their new MIC. The process of developing a new MAC from the exchanged new MIC is known as gene assembly in ciliates, and it consists of a massive amount of DNA excision from the micronucleus, and the rearrangement of the rest of the DNA sequences. During the gene assembly process, the DNA segments that get eliminated are known as internal eliminated segments (IESs), and the remaining DNA segments that are rearranged in an order that is correct for creating proteins, are called macronuclear destined segments (MDSs).
A topic of interest is to predict the correct order to descramble a gene or chromosomal segment. A prediction can be made based on the principle of parsimony, whereby the smallest sequence of operations is likely close to the actual number of operations that occurred. Interestingly, the order of MDSs in the newly assembled 22,354 Oxytricha trifallax MIC chromosome fragments provides evidence that multiple parallel recombinations occur, where the structure of the chromosomes allows for interleaving between two sections of the developing macronuclear chromosome in a manner that can be captured with a common string operation called the shuffle operation (the shuffle operation on two strings results in a new string by weaving together the first two, while preserving the order within each string). Thus, we studied four similar systems involving applications of shuffle to see how the minimum number of operations needed to assemble differs between the types. Two algorithms for each of the first two systems have been implemented that are both shown to be optimal. And, for the third and fourth systems, four and two heuristic algorithms, respectively, have been implemented. The results from these algorithms revealed that, in most cases, the third system gives the minimum number of applications of shuffle to descramble, but whether the best implemented algorithm for the third system is optimal or not remains an open question. The best implemented algorithm for the third system showed that 96.63% of the scrambled micronuclear chromosome fragments of Oxytricha trifallax can be descrambled by only 1 or 2 applications of shuffle. This small number of steps lends theoretical evidence that some structural component is enforcing an alignment of segments in a shuffle-like fashion, and then parallel recombination is taking place to enable MDS rearrangement and IES elimination.
Another problem of interest is to classify segments of the MIC into MDSs and IESs; this is the second topic of the thesis, and is a matter of determining the right "class label", i.e. MDS or IES, on each nucleotide. Thus, training data of labelled input sequences was used with hidden Markov models (HMMs), which is a well-known supervised machine learning classification algorithm. HMMs of first-, second-, third-, fourth-, and fifth-order have been implemented. The accuracy of the classification was verified through 10-fold cross validation. Results from this work show that an HMM is more likely to fail to accurately classify micronuclear chromosomes without having some additional knowledge.||