Workflow Provenance: from Modeling to Reporting

dc.contributor.advisor: Roy, Chanchal K.
dc.contributor.advisor: Schneider, Kevin A.
dc.contributor.committeeMember: Khan, Shahedul
dc.contributor.committeeMember: Deters, Ralph
dc.contributor.committeeMember: Keil, Mark
dc.creator: Ferdous, Rayhan 1992-
dc.creator.orcid: 0000-0002-5937-0925
dc.date.accessioned: 2019-03-12T04:21:21Z
dc.date.available: 2019-03-12T04:21:21Z
dc.date.created: 2019-02
dc.date.issued: 2019-03-11
dc.date.submitted: February 2019
dc.date.updated: 2019-03-12T04:21:21Z
dc.description.abstract: Workflow provenance is a crucial part of a workflow system as it enables data lineage analysis, error tracking, workflow monitoring, usage pattern discovery, and so on. Integrating provenance into a workflow system, or modifying a workflow system to capture or analyze different provenance information, is burdensome and requires extensive development because provenance mechanisms rely heavily on the modelling, architecture, and design of the workflow system. Various tools and technologies exist for logging events in a software system. Unfortunately, these logging tools and technologies are not designed for capturing and analyzing provenance information. Workflow provenance is not only about logging, but also about retrieving workflow-related information from logs. In this work, we propose a taxonomy of provenance questions and, guided by these questions, we created a workflow programming model, 'ProvMod', with a supporting run-time library to provide automated provenance and log analysis for any workflow system. The design and provenance mechanism of ProvMod is based on recommendations from prominent research and is easy to integrate into any workflow system. ProvMod offers Neo4j graph database support to manage semi-structured heterogeneous JSON logs. The log structure is adaptable to any NoSQL technology. For each provenance question in our taxonomy, ProvMod provides the answer with data visualization using Neo4j and the ELK Stack. Besides analyzing performance from various angles, we demonstrate the ease of integration by integrating ProvMod with Apache Taverna and evaluate ProvMod's usability by engaging users. Finally, we present two software engineering research cases (clone detection and architecture extraction) where our proposed model ProvMod and the provenance questions taxonomy can be applied to discover meaningful insights.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10388/11902
dc.subject: Scientific workflow, provenance, log analytics, automated logging, programming model, graph analysis, provenance questions, classification, taxonomy, data visualization, software engineering, software architecture.
dc.title: Workflow Provenance: from Modeling to Reporting
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)

Files

Original bundle
Name: FERDOUS-THESIS-2019.pdf
Size: 8.86 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.27 KB
Format: Plain Text