Computational Phyloinformatics - a GTPB-NESCent Collaboration

   NEW extended deadline for applications: July 3rd 2009
   Notification of acceptance: within three working days
   Course dates: July 9th - July 19th 2009


Hilmar Lapp is the Assistant Director for Informatics at the US National Evolutionary Synthesis Center (NESCent). In this role he is involved in several phyloinformatics and cyberinfrastructure research projects (such as Phenoscape, Dryad, Hackathons, PhyloWS). He has been an active leader in several free and open-source software projects for more than a decade, among which are BioPerl and BioSQL. He obtained his M.Sc. in Biology from the University of Freiburg (Germany), and previously held bioinformatics research and leadership positions at the Novartis Research Institute in Vienna (Austria) and the Genomics Institute of the Novartis Foundation (GNF) in San Diego, California. His main interests are in building sustainable distributed developer communities, data modeling, large-scale data integration, and reusability of data and software.

Affiliation: US National Evolutionary Synthesis Center (NESCent)

Darin London is a Lead Programmer at the Institute for Genome Sciences and Policy, Duke University Medical Center. He is involved with automating analyses for clinical trials, developing data sharing systems to facilitate collaborative research, and designing analysis pipelines. Previously he was a lead developer of the BioMart system at the European Bioinformatics Institute, and has also held a position with the Bioinformatics Unit at GlaxoSmithKline, Inc. He has a Master's degree in Biology from Texas Tech University.

Affiliation: Institute for Genome Sciences and Policy, Duke University Medical Center

William Piel is Associate Director for Evolutionary Informatics at the Yale Peabody Museum. He has developed and managed TreeBASE, a database of phylogenetic knowledge, and is involved in various projects relating to phyloinformatics, such as pPOD, PhyloWidget, and PhyloDB. He is currently active in developing a new version of TreeBASE, mapping the evolution of wing patterns in butterflies, assembling a tree of all plants as an iPlant Grand Challenge, and developing new tree visualization tools for the Encyclopedia of Life. Piel has a Ph.D. in Organismic and Evolutionary Biology from Harvard University.

Affiliation: Peabody Museum of Natural History, Yale University

Rutger Vos is a postdoctoral research fellow in Wayne Maddison's lab at the University of British Columbia where he studies computational biology, phylogenetics, and evolution. He is the main developer for the Bio::Phylo and NeXML open source projects, and contributes to CIPRES, TreeBASE, and Mesquite. Rutger has taught programming, phylogenetic theory, and phylogenetic methods at the University of Amsterdam, Simon Fraser University, and the U.S. National Evolutionary Synthesis Center. Rutger received a Ph.D. from Simon Fraser University with a dissertation on big tree inference.

Affiliation: University of British Columbia

Course description:

Computational Phyloinformatics will provide hands-on instruction in phyloinformatics using Perl (BioPerl and Bio::Phylo) and SQL (Postgres and BioSQL). Phylogenetics is key to studying evolution, systematics, comparative genomics, and bioinformatics — phylogenies are now ubiquitous in the biological literature. However, with the growth in computational power and DNA sequencing, and with ever more complex substitution models and analytical methods, it is less and less practical to run simple, one-shot analyses on a personal computer with an off-the-shelf program. As a result, we increasingly rely on custom-scripted analyses or custom-designed computational pipelines, and often on large compute machines or clusters. This course aims to introduce these skills with practical, hands-on training in Perl and SQL, and will be structured to accommodate students with less prior programming experience.

The course is divided into three parts:

  • Part I: Students bring their programming skills up to par with a review and tutorial in Perl, starting with "hello world" and ending with object-oriented programming.
  • Part II: Students learn how to do large-scale phyloinformatics with BioPerl and Bio::Phylo -- for example, automating the assembly and molecular clock recalibration of a supertree from a large number of component trees.
  • Part III: Students learn to design a database, write queries in SQL, store and query trees, and build bindings between BioPerl and BioSQL.

Students will learn how to write basic phylogenetic or comparative analysis scripts: parsing NEXUS files; traversing and computing over trees; and making practical use of phylogenetic libraries. These skills will be learned in a biological context, touching on a diverse array of topics.

Course Pre-requisites:

Biology: A solid understanding of phylogenetics — for example, having already taken the Workshop on Molecular Evolution, the Molecular Evolution, Phylogenetics and Adaptation course, or equivalent coursework or experience.

Computing: Prior experience with Perl, or having studied books on Perl before the course starts, as well as prior experience with basic operations in Unix. We will offer two days of review to bring everyone up to speed, but the onus is on the students to have studied Perl ahead of time.

Detailed Program

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal


Last updated:  May 17th 2009