Facilitating Asynchronous Collaboration in Scientific Workflow Composition Using Provenance
Recent advances in various domains have led to a data explosion, which has created many significant opportunities for scientific discovery. Researchers therefore need systems that allow them to analyze data efficiently. Scientific Workflow Management Systems (SWfMSs) such as Galaxy, Taverna, Kepler, and VizSciFlow are popular among researchers for data-intensive experiments. Advances across domains have also increased the complexity of experiments and the demand for collaboration between scientists. Many scientific experiments require scientists from different domains to work together on a problem. Very few existing SWfMSs, such as ProveDB, SciWorCS, and Workspace, support collaboration, and in many cases their methods are not efficient. In existing collaborative data analysis systems, researchers can share their work, but all collaborators must work on a single version of the workflow, which increases the chance of interference as the number of collaborators grows. Furthermore, to contribute effectively, collaborators who join an experiment need information about the project’s status, such as the history of its changes and its current problems. Existing SWfMSs neither offer this insight nor provide group awareness in an asynchronous setting. The first contribution of this work is a set of tools that facilitate collaborative workflow composition in the context of SWfMSs. To this end, we simulated some standard concepts of version control systems (VCS, e.g., GitHub), such as branching and versioning, in SWfMSs. As a proof of concept of these collaborative features, we developed an API capable of capturing provenance information and managing the branches and versions of a workflow. As the second contribution, we propose a set of visualizations and reports that provide the information collaborators require when joining a project or continuing to collaborate, so that they can work more efficiently.
We capture the system’s event log, also known as provenance information, during the workflow composition and execution phases, and we use this data to generate the visualizations and reports. Before implementing the visualizations, we created a demo of our work and surveyed potential users to discover how much the proposed visualizations could contribute to group awareness. We also asked to what extent the proposed version control system could help address shortcomings in collaborative experiments. We invited programmers and researchers with experience using SWfMSs, as well as domain specialists from associated areas, to participate in our study. We selected these roles because their experience is relevant to our research topic. Twelve individuals participated in the survey. They provided valuable feedback on how to improve the proposed collaborative tools and on what other kinds of visualizations they would need as potential users. Of the participants, 70% found the proposed tools beneficial for collaborative workflow composition.
Provenance, Groupware, Workflow, SWfMS, Asynchronous Collaboration, Version Control System, Group Awareness
Master of Science (M.Sc.)