CPANG19

Computational PANGenomics

   IMPORTANT DATES for this Course
   Deadline for applications: September 2nd 2019
   Course date: September 9th - September 13th 2019

Candidates with an adequate profile will be accepted within 72 hours of applying, until we reach 20 participants.

Instructors:

Erik Garrison is a postdoctoral fellow at the University of California, Santa Cruz. His research has focused on the development of a software toolkit for practical pangenomics: vg. He has eight years of experience in genomics, where he has worked on the development of sequencing systems, participated in large-scale sequencing projects such as the 1000 Genomes Project, and authored popular bioinformatics software such as the freebayes variant caller. Raised in Kentucky, Erik obtained an undergraduate degree in the social sciences from Harvard University. After graduation he worked at Harvard Medical School, One Laptop Per Child, and Boston College before beginning his studies at the Sanger Institute.

Affiliation: University of California, Santa Cruz, US

Mikko Rautiainen is a PhD student at the Max Planck Institute for Informatics. His doctoral work covers the theory and practice of graph genomes and related topics, with a focus on read alignment to graphs. His research includes projects such as genome assembly, read error correction and RNA expression quantification. He is the author of GraphAligner, a tool for aligning third-generation sequencing reads, such as Pacific Biosciences or Oxford Nanopore reads, to sequence graphs. He graduated from the University of Helsinki with a master's degree in computer science, worked professionally for three years as a software developer, and is now studying for a doctorate in bioinformatics in Saarbrücken.

Affiliation: Max Planck Institute for Informatics, Algorithms for Computational Genomics, Saarbrücken, DE

Course Description

Reference genomes have become central to bioinformatics approaches, and form the core of standard analyses using contemporary sequencing data. However, the use of linear reference genomes, which provide the sequence of one representative genome for a species, is increasingly becoming a limitation as the number of sequenced genomes grows. In particular, they tend to bias us away from the observation of variation in the genomes we study.

A general solution to this problem is to use a pangenome that incorporates both sequence and variation from many individuals as our reference system. This pangenome is naturally modelled as a graph with annotations and can provide all the functionality traditionally provided by linear reference genomes. Unlike linear reference genomes, a pangenome readily incorporates both small and large variation, allowing bias-free genotyping at known alleles.

In this course we will explore the use of modern bioinformatic tools that allow researchers to use pangenomes as their reference system when engaging in studies of organisms of all types. Such techniques will aid any researcher working on organisms of high genetic diversity or on organisms lacking a high-quality reference genome. This course targets all researchers interested in learning about an exciting paradigm shift in computational genomics.

Objectives

Participants will first learn about the limitations of linear reference-based methods and work through a brief refresher on, or introduction to, standard approaches for processing sequencing data, including read alignment and variant calling. Building on these motivating examples, we will use data from a variety of relevant sources to develop an intuition about pangenomic methods and a practical familiarity with applicable tools.

Target Audience

This course is oriented towards biologists and bioinformaticians. The course will be of particular interest to researchers investigating organisms without a reference genome or populations featuring high levels of genetic diversity.

Course Pre-requisites

  • Minimal to intermediate skills in using the Unix/Linux command line.
  • Familiarity with basic sequence data handling procedures and the most common data formats used in genomics.

Detailed Program

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal

Last updated: July 29th 2019