Computational PANGenomics

Downloadable poster in PDF

   IMPORTANT DATES for this Course
   Deadline for applications: February 23rd 2018
   Course date: March 6th - March 9th 2018

Candidates with an adequate profile will be accepted within 72 hours of applying, until we reach 20 participants.


Tobias Marschall is an assistant professor at the Center for Bioinformatics at Saarland University and is affiliated with the Max Planck Institute for Informatics as a senior researcher. He heads the Algorithms for Computational Genomics group. His research targets algorithmic and statistical challenges arising from present-day genomics technologies, with a particular focus on population-scale sequencing efforts, structural variation discovery, algorithms and data structures for pangenomes, and haplotyping in various contexts (from humans to viruses). He has co-organized two workshops on computational pangenomics: one at the Lorentz Center in Leiden, The Netherlands, which led to a comprehensive review paper, and one at ECCB 2016.
Tobias received his undergraduate education at Bielefeld University (Germany), where he studied computer science and physics. He obtained his PhD in bioinformatics from TU Dortmund. He then moved to Centrum Wiskunde & Informatica (Amsterdam, The Netherlands), the Dutch national institute for mathematics and computer science, as a postdoctoral researcher. During his time as a postdoc, he was also a long-term participant in the semester program on Mathematical and Computational Approaches in High-Throughput Genomics held at the Institute for Pure and Applied Mathematics (IPAM) at the University of California, Los Angeles (UCLA).

Affiliation: Center for Bioinformatics at Saarland University and Max Planck Institute for Informatics

Erik Garrison is a PhD student at the Wellcome Trust Sanger Institute and Cambridge University. His ongoing doctoral research focuses on the development of vg, a software toolkit for practical pangenomics. He has seven years of experience in genomics, during which he has worked on the development of sequencing systems, participated in large-scale sequencing projects such as the 1000 Genomes Project, and authored popular bioinformatics software such as the freebayes variant caller. Raised in Kentucky, Erik obtained an undergraduate degree in the social sciences from Harvard University. After graduation he worked at Harvard Medical School, One Laptop Per Child, and Boston College before beginning his studies at the Sanger Institute.

Affiliation: Wellcome Trust Sanger Institute and Cambridge University

Course Description

Reference genomes have become central to bioinformatics approaches, and form the core of standard analyses using contemporary sequencing data. However, the use of linear reference genomes, which provide the sequence of one representative genome for a species, is increasingly becoming a limitation as the number of sequenced genomes grows. In particular, they tend to bias us away from the observation of variation in the genomes we study.

A general solution to this problem is to use a pangenome that incorporates both sequence and variation from many individuals as our reference system. This pangenome is naturally modeled as a graph with annotations, and can provide all the functionality traditionally provided by linear reference genomes. Unlike linear reference genomes, a pangenome readily incorporates both small and large variation, allowing bias-free genotyping at known alleles.

In this course we will explore the use of modern bioinformatic tools that allow researchers to use pangenomes as their reference system when engaging in studies of organisms of all types. Such techniques will aid any researcher working on organisms of high genetic diversity or on organisms lacking a high-quality reference genome. This course targets all researchers interested in learning about an exciting paradigm shift in computational genomics.


Participants will first learn about the limitations of linear reference-based methods and work through a brief refresher on, or introduction to, standard approaches for processing sequencing data, including read alignment and variant calling. Building on these motivating examples, we will use data from a variety of relevant sources to develop an intuition for pangenomic methods and a practical familiarity with the applicable tools.

Target Audience

This course is oriented towards biologists and bioinformaticians. The course will be of particular interest to researchers investigating organisms without a reference genome or populations featuring high levels of genetic diversity.

Course Pre-requisites

  • Minimal to intermediate skills in using the Unix/Linux command line.
  • Familiarity with basic sequence data handling procedures and the most common data formats used in genomics.

(*) Note: An optional free session will be arranged for interested participants on the eve of the first day (Monday, March 5th, at 4 PM), where we will ensure that every attendee can use the Linux operating system at the required level.

Detailed Program

Instituto Gulbenkian de Ciência,
Apartado 14, 2781-901 Oeiras, Portugal

GTPB Homepage

IGC Homepage

Last updated:  January 26th 2018