The BioLighthouse: Reusable Software Design for Bioinformatics
Advances in next-generation sequencing have accelerated the field of microbiology by making accessible a wealth of information about microbiomes. Unfortunately, microbiome experiments are among the least reproducible in terms of bioinformatics. Software tools are often poorly documented and under-maintained, and commonly have arcane dependencies that take significant time to configure correctly. Microbiome studies are multidisciplinary efforts, but discrepancies in communication and knowledge make the accessibility, reproducibility, and transparency of computational workflows difficult to achieve. The BioLighthouse uses Ansible roles, playbooks, and modules to automate the configuration and execution of bioinformatics workflows. The roles and playbooks act as virtual laboratory notebooks by documenting the provenance of a bioinformatics workflow. The BioLighthouse was tested for platform dependence and data-scale dependence with a microbial profiling pipeline consisting of Cutadapt, FLASH2, and DADA2. The pipeline was tested on three canola root and soil microbiome datasets that differed in size by orders of magnitude: 1 sample, 10 samples, and 100 samples. Each dataset was processed by The BioLighthouse with 10 unique parameter sets, and outputs were compared across 8 computing environments, for a total of 240 pipeline runs. Outputs after each step in the pipeline were tested for identity using the Linux diff command to ensure reproducible results. Testing of The BioLighthouse suggested no platform or data-scale dependence. To provide an easy way of maintaining environment reproducibility in user space, Conda and its Bioconda channel were used to manage the virtual environments and software dependencies needed to configure the bioinformatics tools.
The BioLighthouse provides a framework for developers to make their tools accessible to the research community, for bioinformaticians to build bioinformatics workflows, and for the broader research community to consume these tools at a high level while knowing the tools will execute as intended.
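The identity testing described in the abstract can be illustrated with a short sketch. This is not the thesis code: the thesis reports using the Linux diff command, whereas the sketch below swaps in SHA-256 digests as an equivalent byte-identity check. All function names and the directory layout (one output directory per computing environment) are hypothetical.

```python
import hashlib
import itertools
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large sequencing outputs are not loaded whole.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def directory_fingerprint(root):
    """Map each file path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def mismatches(reference, candidate):
    """List relative paths that are missing or differ between two output trees."""
    ref = directory_fingerprint(reference)
    cand = directory_fingerprint(candidate)
    return sorted(
        name for name in ref.keys() | cand.keys()
        if ref.get(name) != cand.get(name)
    )


def run_matrix(datasets, parameter_sets, environments):
    """Enumerate every (dataset, parameter set, environment) combination."""
    return list(itertools.product(datasets, parameter_sets, environments))
```

With 3 datasets, 10 parameter sets, and 8 environments, `run_matrix` enumerates the 240 runs reported in the abstract, and an empty list from `mismatches` for a pair of environments corresponds to diff reporting no differences between their outputs.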
Degree: Master of Science (M.Sc.)
Committee: Kusalik, Anthony; Horsch, Michael; Dumonceaux, Tim; Sharbel, Tim
Copyright Date: August 2020