A Provenance-Aware Visual Framework for Explorative and Reproducible Computational Scientific Experiments

Date

2025-06-02

ORCID

0000-0003-1704-669X

Type

Thesis

Degree Level

Doctoral

Abstract

Researchers encapsulate diverse tools and data into a cohesive pipeline, known as a scientific workflow, to conduct computational scientific experiments. This workflow is then submitted to a runtime infrastructure for execution. Scientific Workflow Management Systems (SWfMSs) integrate various tools, techniques, languages, and graphical interfaces to provide platforms for specifying, executing, monitoring, and managing workflows, effectively abstracting the complexities of data and process management for researchers. These systems support both reproducible research, which ensures valid results through established protocols, and exploratory research, which investigates new phenomena iteratively. Provenance information collected during workflow composition and execution can be queried and visualized to validate both the workflow structure and its execution results. SWfMSs support workflow composition through either a graphical or a textual language. Graphical languages are user-friendly but can become unwieldy for complex workflows, whereas textual languages offer concise expressions but have a steeper learning curve. Despite advancements in execution environments, limited usability in composition interfaces hinders SWfMS adoption and has led to the retirement of once-prominent systems. Additionally, managing tool integration poses challenges, especially in web-based SWfMSs, which must serve many users simultaneously and deal with platform and tool incompatibility. Empowering end-users to integrate external tools via extensibility mechanisms is essential for improving usability and flexibility in exploratory research. Similarly, reproducing external experiments within SWfMSs is challenging and requires innovative solutions. To address these issues, we conducted five studies.
First, we investigated existing SWfMS architectures and derived a novel architecture for a graphical SWfMS framework designed for intuitive workflow composition, together with abstracted execution, data management, and process management. Second, we examined the challenges in designing scientific workflows and addressed them by proposing an interactive experiment development environment. This environment facilitates the rapid development of scientific experiments by combining textual Domain-Specific Language (DSL)-based workflow specifications with graphical tools that accelerate composition and aid comprehension. Third, we designed a domain-specific environment for capturing, querying, and visualizing provenance information. Fourth, we addressed tool integration challenges through case studies in bioinformatics and software analytics. Fifth, we proposed packaging entire experiments and complex tools in Docker containers and registering them via a graphical interface, which overcomes their installation barriers and improves their integration with SWfMS composition and runtime environments. Through prototypes, experiments, user studies, and case studies, our work advances the usability, flexibility, reproducibility, extensibility, and scalability of SWfMSs for computational scientific experiments.

Keywords

Scientific Experiments, Scientific Workflow Management System, SWfMS, Data Analysis, Provenance

Degree

Doctor of Philosophy (Ph.D.)

Department

Computer Science

Program

Computer Science

Advisor

Roy, Chanchal K
Schneider, Kevin A
Roy, Banani

Committee

Mohamed, Ebrahim Bedeer
Mondal, Manishankar
Hamou-Lhadj, Wahab
Derek, Eager
McQuillan, Ian
