WPAT09

Workflows and Programmatically Accessible Tools



   Deadline for applications: October 30th 2009
   Notification of acceptance date: November 3rd 2009
   Course date: November 16th - November 17th 2009

Instructor:

Katy Wolstencroft is a Research Fellow working with Professor Carole Goble in the School of Computer Science at the University of Manchester. Her interests include workflows, semantic service discovery and biological ontologies. She has been working on Taverna as part of the myGrid project (http://www.mygrid.org) for the past 3 years, where she leads the outreach and teaching activities.

University of Manchester, Manchester, UK


Course description:

The quantity and size of bioinformatics data are continually growing, providing rich resources to researchers but also presenting problems of interoperability and data management. Workflow technologies offer a solution to these problems, as they enable the automated and systematic use of distributed bioinformatics data and applications from the scientist's desktop. This provides a fast and efficient methodology for conducting large-scale experiments without the overhead of installing and maintaining local resources. Additionally, data and metadata management capabilities support the whole in silico experiment life cycle.
Many workflow systems are built on cutting-edge computer science technologies. For example, the use of distributed Web Services by workflow management systems allows co-ordinated access to supercomputing resources from standard desktop computers. Also, the use of ontologies for describing workflow processes and for service description and discovery makes possible automated workflow composition and the collection of experimental provenance.
Workflows have already been used in many areas of biology, including transcriptomics, proteomics, systems biology, data integration, comparative genomics, sequence analysis and structural biology. However, their use is not restricted to these areas. Any in silico experiment involving multiple steps in its analysis and multiple data sources and resources can potentially benefit from using workflows.
Taverna (a component of the myGrid Project) is a workflow management system that allows users to develop and run workflows by combining distributed and local analysis tools and data resources. It has over 62500 downloads and is used by over 350 institutions worldwide. It has been used by scientists in different domains, including, amongst others, life sciences, medicine and astronomy. Many of the workflows that have already been developed are available for reuse or repurposing from myExperiment, a repository for publishing workflows and a social networking resource for sharing expertise and experience. myExperiment also enables workflow enactment through a web interface, providing an alternative mechanism for running workflows and sharing results between groups of collaborators. Currently, myExperiment has 1400 users and stores over 560 workflows. This pool of workflows and know-how provides scientists with a wealth of components to integrate into the design of new experiments.

Aims and objectives:

This course aims to provide attendees with a "hands-on" introduction to designing and building workflows in Taverna. We will provide background on workflow-based systems available and examples of workflow projects. We will show practical demonstrations of workflow construction and highlight associated issues such as provenance, service discovery and workflow reuse. The objectives of the course are to:
  1) Understand and experience the steps involved in good workflow design and implementation
  2) Design and build workflows using distributed and local analysis tools and data resources
  3) Understand the major issues faced when designing and building workflows
  4) Gain experience of where using workflows would be advantageous to your research

Target Audience:

This "tutorial-style" course will benefit anyone (postgraduate students and researchers) wishing to explore new methods of designing complex and/or repetitive in silico experiments in the life sciences. It will also be of interest to those who are already exploring workflow technology and have use cases in mind.



Detailed Program

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal

GTPB Homepage

IGC Homepage

Last updated:  June 18th 2009