Analysis and manipulation of phylogenomic data using ETE

   Deadline for applications: June 11th 2010
   Notification of acceptance dates:
        EARLY: June 2nd
        NORMAL: June 12th 2010
   Course date: June 23rd - June 25th 2010


Jaime Huerta-Cepas is a postdoctoral researcher in the Comparative Genomics group at the Centre for Genomic Regulation, headed by Toni Gabaldón. He obtained his PhD on human genome evolution [1] and large scale phylogenetic analyses at the "Universidad Autónoma de Madrid" in 2008. Jaime is the main developer of the phylomeDB database [2] and the ETE toolkit [3]. His work focuses on applying large scale phylogenetic analyses to different biological problems, such as understanding gene duplication, the evolution of gene expression, functional genome annotation, orthology and paralogy prediction, and the reconstruction of the species Tree of Life.

Affiliation: CRG-Centre for Genomic Regulation, Barcelona, ES

Marina Marcet-Houben obtained her degree in Biochemistry at the Rovira i Virgili University (Tarragona, Spain), where she also presented her diploma in advanced studies in the evolutionary genomics group. She is currently a final-year PhD student in the Comparative Genomics lab at the Centre for Genomic Regulation in Barcelona. Her main research interests are the application of large scale phylogenomics tools to the evolution of fungi [4] and the robustness of species trees [5]. Marina is an active collaborator in the phylomeDB project and the main developer of TreeKo, a tool for comparing phylogenetic tree topologies.

Affiliation: CRG-Centre for Genomic Regulation, Barcelona, ES

Course description:

Phylogenetic analyses are gradually reaching genomic scales. Nowadays, many resources and surveys encompass a large number of trees that often cannot be analyzed manually. Bioinformatics toolkits are intended to provide a flexible framework for dealing with specific data programmatically, thus facilitating the analysis of large data collections. The Environment for Tree Exploration (ETE) is a Python programming toolkit focused on handling hierarchical trees. It allows users, for instance, to perform a number of operations on phylogenetic trees and to design automatic pipelines. It also provides a highly customizable drawing engine, which can be used to create complex annotated tree images automatically or to explore single trees interactively. Moreover, the ETE toolkit is not limited to large scale analyses: it can also be used to easily develop specific analysis methods for single trees.
The purpose of this course is to provide an introduction to the analysis of phylogenetic trees. It will cover a broad range of tasks that are usually required in any phylogenomic analysis: tree rooting, prediction of orthology and paralogy relationships, tree annotation, calculation of distances among sequences or species, tree pruning, tree comparison, and tree visualization. The use of large scale phylogenomic resources, such as phylomeDB or Ensembl Compara, will also be covered through examples and exercises. The course will be mostly practical and will focus on solving real life examples.
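
As a rough illustration of two of the tasks listed above (calculating distances among species and tree pruning), the following is a minimal sketch in plain Python. It does not use ETE and is not ETE's API: the Node class and the leaf_distance and prune functions are hypothetical names made up for this example, and the small three-species tree with its branch lengths is invented.

```python
class Node:
    """Hypothetical minimal tree node (illustration only, not ETE's class)."""

    def __init__(self, name="", dist=0.0):
        self.name = name        # leaf or internal node label
        self.dist = dist        # branch length to the parent node
        self.children = []
        self.parent = None

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def leaf_names(self):
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.leaf_names())
        return names


def leaf_distance(a, b):
    """Sum of branch lengths along the path connecting nodes a and b."""
    # Record the distance from a to each of its ancestors (a itself is 0).
    from_a = {}
    total, node = 0.0, a
    while node is not None:
        from_a[id(node)] = total
        total += node.dist
        node = node.parent
    # Walk up from b until the first ancestor shared with a is reached.
    total, node = 0.0, b
    while node is not None:
        if id(node) in from_a:
            return from_a[id(node)] + total
        total += node.dist
        node = node.parent
    raise ValueError("nodes do not belong to the same tree")


def prune(node, keep):
    """Return a pruned copy keeping only leaves whose names are in keep."""
    if not node.children:
        return Node(node.name, node.dist) if node.name in keep else None
    kept = [c for c in (prune(ch, keep) for ch in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse a one-child internal node and merge the branch lengths.
        kept[0].dist += node.dist
        return kept[0]
    copy = Node(node.name, node.dist)
    for child in kept:
        copy.add_child(child)
    return copy


# Invented example tree: ((human:1,chimp:1):2,mouse:4);
root = Node("root")
primates = root.add_child(Node("", 2.0))
human = primates.add_child(Node("human", 1.0))
chimp = primates.add_child(Node("chimp", 1.0))
mouse = root.add_child(Node("mouse", 4.0))

print(leaf_distance(human, chimp))
print(leaf_distance(human, mouse))
print(prune(root, {"human", "mouse"}).leaf_names())
```

In this sketch, pruning merges the branch lengths of collapsed internal nodes, so the distance between the remaining leaves is the same before and after pruning.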

Course Pre-requisites:

Course attendees are expected to have basic programming skills (not necessarily in Python, although it is recommended*). All exercises will consist of developing Python scripts that perform different analyses on phylogenetic trees using the ETE toolkit in a GNU/Linux environment.

*Important Note: NO introduction to Python programming is scheduled in the course. However, Python is a very intuitive language that can be learned quickly if you have programmed in other languages. As a reference, Chapters 3-7 and 9 from this tutorial would be more than enough to follow the whole course.

Detailed Program


[1] Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldón T. The Human Phylome. Genome Biol. 2007;8:R109.
[2] Huerta-Cepas J, Bueno A, Dopazo J, Gabaldón T. PhylomeDB: a database for complete collections of gene phylogenies. Nucleic Acids Res. 2008 Jan;36(Database issue):D491-6.
[3] Huerta-Cepas J, Dopazo J, Gabaldón T. ETE: a python Environment for Tree Exploration. BMC Bioinformatics. 2010;11:24.
[4] Marcet-Houben M, Gabaldón T. Acquisition of prokaryotic genes by fungal genomes. Trends Genet. 2010 Jan;26(1):5-8.
[5] Marcet-Houben M, Gabaldón T. The tree versus the forest: the fungal tree of life and the topological diversity within the yeast phylome. PLoS One. 2009;4(2):e4357. Epub 2009 Feb 3.

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal

Last updated:  April 19th 2010