Dataflows and Provenance: From Nested Relational Calculus to the Open Provenance Model

KWASNIKOWSKA, Natalia

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/21926

Title:	Dataflows and Provenance: From Nested Relational Calculus to the Open Provenance Model
Authors:	KWASNIKOWSKA, Natalia
Advisors:	VAN DEN BUSSCHE, Jan
Issue Date:	2011
Abstract:	Excerpt from the introduction: ... A conceptual data model for dataflow repositories should offer a precise specification of the types of data (including the data flows themselves) stored in the repository, and of the relationships among them. Such a data model is important because it provides a formal framework that allows:
• Analyzing, in a rigorous manner, the possibilities and limitations of dataflow repositories.
• Comparing, again in a rigorous manner, the functionality of different existing systems.
• Highlighting differences in meaning of common notions as used by different authors or in different systems, such as "workflow", "provenance", or "collection".
It should be clear that the purpose of our contribution is not to propose a blueprint for a new dataflow management system with innovative features that "competing" systems do not support yet. Rather, our work is a detailed effort to model such a system, hoping to contribute a formally defined synthesis of some key database aspects of dataflow management systems. Indeed, each of the aspects we touch upon has been addressed in existing systems, of course each particular system with its own emphasis. In a perfect world, one would like a standard database schema for the exchange of dataflow specifications and executions, similar to the myExperiment.org initiative that is specific to the Taverna system. The need for such a repository has also been emphasized by Lacroix. Our work is a new step in this direction, after initial steps during the 1990s did not receive the follow-up they deserved. The need for a workflow repository is also acknowledged in other fields, as shown by Blockeel and Vanschoren's Experiment Databases for Machine Learning. Dataflow management systems have been largely developed within the computer systems community, and have received less interest in the database theory community. We hope to partially fill this gap by the present work.
We do note that much attention has been paid to automated verification of data intensive workflows, but this is a research focus that is orthogonal to the focus on data modelling and querying taken in this work. For a review of scientific workflow management and provenance systems, we refer to Freire et al., Yogesh et al., Bose and Frew, and Davidson and Freire. Where-provenance is one form of data provenance as investigated in database research. In this work we show how the concept of where-provenance can also be defined in the context of workflow provenance. ...
Document URI:	http://hdl.handle.net/1942/21926
Category:	T1
Type:	Theses and Dissertations
Appears in Collections:	PhD theses; Research publications

Files in This Item:

File	Description	Size	Format
Natalia Kwasnikowska.pdf		24.77 MB	Adobe PDF
