Computational PANGenomics

   IMPORTANT DATES for this Course
   Deadline for applications: May 10th 2022
   Course date: May 23rd - May 27th 2022

Candidates with an adequate profile will be accepted within 72 hours of applying, until we reach 20 participants.


Erik Garrison builds methods that let us understand the precise relationships between thousands of genomes. His work on this topic began with the development of Bayesian methods to detect and genotype genomic variants, which he applied to the thousands of human genomes cataloged in the 1000 Genomes Project. Lessons learned in that effort led him to work on unbiased methods for genome inference based on graphical models of pangenomes. In these models, the genome is encoded in a graph that may represent a population sample of individuals from the same species, a metagenome, the diploid genome of a single individual, or any other useful collection of genomic sequence information. Erik has shown that this approach provides more accurate read alignment when a high-quality pangenome can be constructed, and he is currently applying it to the construction of human pangenome graphs as part of the Human Pangenome Reference Consortium (HPRC).

Affiliation: University of Tennessee Health Science Center, Memphis, TN, US

Andrea Guarracino is a Scientific Advisor at the Human Technopole in Milan, where he is part of the Pangenomic Research Unit, a bioinformatics group dedicated to developing and applying new methods to analyze collections of sequences (also called pangenomes). After a degree in computer engineering and a period working at a software company, he encountered biology by chance and fell in love with it. He then pursued a master's degree and a PhD in bioinformatics. During his doctorate, he explored chromosomal instability in colorectal cancer stem cells to develop novel therapeutic approaches. Midway through his research, "thanks" to the pandemic, he attended a virtual event where he encountered pangenomics. He then began developing new tools for building and analyzing pangenomes, which he continues to do today.

Affiliation: Human Technopole, Milan, IT

Course Description

Reference genomes are central to most bioinformatics approaches. However, using a single reference genome to guide an analysis can result in reference bias: other genomes appear more similar to the reference than they actually are. We might miss or misinterpret information about genome variability and relationships that cannot be expressed relative to a chosen reference genome. Moreover, new assembly methods are making it easier than ever to generate high-quality, complete genome assemblies. To obtain a complete understanding of variation between multiple whole genomes, we need a model that expresses many genomes and their mutual alignment. A general solution to these problems is a pangenome graph, in which genomes are described as labeled walks through an underlying sequence graph. In this course, we will work with methods to build such graphs from whole genome assemblies, and to use the resulting graphs in an array of downstream applications in comparative genomics, evolution, variation analysis, sequence alignment, and phenotype association.
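To make the idea of "genomes as labeled walks through a sequence graph" concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions, not one of the tools used in the course: the graph, the genome labels (`genomeA`, `genomeB`) and the helper functions (`spell_path`, `shared_nodes`) are hypothetical. Nodes carry sequence (loosely analogous to GFA segment lines), and each genome is an ordered list of oriented node visits (loosely analogous to GFA path lines). Spelling a walk recovers that genome's sequence, and comparing walks shows which parts of the graph all genomes share.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: one SNP "bubble" (node 2 vs node 3)
# sitting between two nodes that both genomes traverse.
Step = Tuple[int, str]  # (node id, orientation "+" or "-")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Node id -> sequence (loosely like GFA segment lines).
nodes: Dict[int, str] = {1: "ACGT", 2: "T", 3: "G", 4: "CCA"}

# Genome label -> walk through the graph (loosely like GFA path lines).
paths: Dict[str, List[Step]] = {
    "genomeA": [(1, "+"), (2, "+"), (4, "+")],
    "genomeB": [(1, "+"), (3, "+"), (4, "+")],
}


def spell_path(walk: List[Step], nodes: Dict[int, str]) -> str:
    """Concatenate node sequences along a walk, reverse-complementing
    any node traversed in "-" orientation."""
    parts = []
    for node_id, orientation in walk:
        seq = nodes[node_id]
        parts.append(seq if orientation == "+" else reverse_complement(seq))
    return "".join(parts)


def shared_nodes(paths: Dict[str, List[Step]]) -> Set[int]:
    """Return the ids of nodes visited by every genome's walk."""
    visited = [{node_id for node_id, _ in walk} for walk in paths.values()]
    return set.intersection(*visited)


if __name__ == "__main__":
    for name, walk in paths.items():
        print(name, spell_path(walk, nodes))
    print("shared nodes:", sorted(shared_nodes(paths)))
```

Running it prints `genomeA ACGTTCCA`, `genomeB ACGTGCCA` and `shared nodes: [1, 4]`: both genomes are expressed in one structure, and their single difference appears as a bubble rather than as a deviation from a chosen reference.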


Participants will develop an understanding of pangenome concepts and refine it through hands-on experience with methods to build and interrogate pangenome graphs. We will apply these methods to difficult study questions in which we need to understand the relationships between many genomes, or account for variability when we analyze new genomes. Participants will leave with a deep understanding of pangenome methods based on whole genome assemblies.

Target Audience

This course is oriented towards biologists and bioinformaticians. The course will be of particular interest to researchers investigating organisms without a reference genome or populations featuring high levels of genetic diversity, to researchers in comparative genomics, and to researchers who are assembling pangenomes of any species.

Course Pre-requisites

  • Minimal to intermediate skills in using the Unix/Linux command line.
  • Familiarity with basic sequence data handling procedures and the most common data formats used in genomics.


Detailed Program

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal

Last updated: May 3rd 2022