NGSDM11

Next Generation Sequencing Data Management

   IMPORTANT DATES for this Course
   Deadline for applications: April 30th 2011
   Notification of acceptance dates:
        EARLY: April 16th 2011
        NORMAL: May 10th 2011
   Course date: May 26th to May 27th 2011

Instructors:

Matthias Haimel is working as a software engineer for the Ensembl Genomes Team (led by Dr. Paul Kersey) at the European Bioinformatics Institute in Hinxton, UK. At the Upper Austria University of Applied Sciences in Hagenberg, Austria, he studied Bioinformatics with a focus on Computer Science and graduated in 2006 after working on his diploma thesis at the Friedrich Miescher Institute for Biomedical Research in Basel, Switzerland. Starting at the EBI in 2006, he took over the development and production for the International Protein Index and worked in a team on redeveloping the back-end production framework in the Integr8 project. Since 2008 he has worked on the application of next generation sequence assembly algorithms to a variety of genomes and started developing Curtain, a paired-end assembly pipeline for larger genomes.

Affiliation: European Bioinformatics Institute, Hinxton, UK

David Judge is a Computer Scientist who has taught Bioinformatics since 1985. He runs the Bioinformatics Training Facility housed at the Department of Genetics in the University of Cambridge, providing the necessary environment for graduate and undergraduate courses, as well as a comprehensive training programme in cooperation with the European Bioinformatics Institute, the Wellcome Trust Sanger Institute and the Instituto Gulbenkian de Ciência through GTPB. He teaches Bioinformatics in several international training programmes and is regularly invited to teach in many places in Europe, Asia, Africa and America. His course notes and exercises are well known in the international community of Bioinformatics professionals and users, many of whom (difficult to count) have had their first contact with Bioinformatics through him.

Affiliation: University of Cambridge, Department of Genetics, Cambridge, UK

Note: Matthias and David have prepared a brand new set of course notes and exercises for NGSDM11.

Course description

Since new sequencing technologies have lowered the cost of sequencing entire genomes, the major difficulty has shifted away from the generation of data to the handling, processing and assembly of large quantities of reads.
This course is aimed at providing hands-on skills with tools for assembly and mapping that can handle NGS data. To achieve that, a brief introduction to the methodologies is provided. From the beginning of the course, we will mix hands-on exercises with short presentations.
We will explain the use of de Bruijn-based algorithms to enhance assembly results. We will also illustrate the need for using sequence quality to avoid pitfalls in the assembly process.
This course will also provide a review of well-known de Bruijn-based assemblers. Through exercises using both real and simulated data, we will show their advantages and limitations. The course will also provide practical skills in using a set of additional tools for mapping and visualization.
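
As a rough illustration of the idea behind de Bruijn-based assembly mentioned above, the following minimal Python sketch breaks reads into k-mers and links each k-mer's (k-1)-base prefix to its (k-1)-base suffix. The function name build_de_bruijn and the toy reads are made up for this example. It is not how Velvet or any other assembler named here is implemented: real tools also handle reverse complements, sequencing errors and memory-efficient graph storage.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a toy de Bruijn graph as {prefix (k-1)-mer: [suffix (k-1)-mer, ...]}.

    Hypothetical helper for illustration only; each k-mer found in a read
    contributes one edge from its first k-1 bases to its last k-1 bases.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read; reads shorter than k
        # yield no k-mers and therefore no edges.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two overlapping toy reads, invented for this example.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)

for node in sorted(graph):
    print(node, "->", ", ".join(graph[node]))
```

Edges that appear more than once (here CG to GT and GT to TC) come from k-mers shared between the reads. This overlap is what lets an assembler join reads into longer contiguous sequences.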

Course participants will learn to:
- Run Velvet assemblies using a variety of read types, and combine them
- Assess the quality of short reads and filter them
- Visualise assemblies
- Create, use and view SAM/BAM files

Software used extensively: Velvet, Curtain, Cortex, FastQC, bwa, samtools/Picard
Software shown for illustration: Ensembl Genomes, Tablet, IGV, ABySS-Explorer

Course prerequisites:
- basic knowledge of working with the Linux / Unix command line
- basic knowledge of Next Generation Sequencing data
- specific interest in de novo genome assembly

Detailed Program

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal


Last updated: March 6th 2011