Biostatistical Foundations in Bioinformatics

   Deadline for applications: October 18th 2010
        NORMAL: October 22nd 2010
   Course date: November 15th - November 18th 2010


Lisete Sousa is an Assistant Professor at the Faculty of Sciences of the University of Lisbon (FCUL), where she coordinates the MSc in Biostatistics. She has been an integrated member of the Center of Statistics and Applications (CEAUL) since 2004, when she obtained her PhD in Probability and Statistics (Lisbon University and Dortmund University, Germany). Since then she has taught the course Statistical Methods in Bioinformatics at the University of Lisbon, among other statistics courses at BSc and MSc level. Since 2001, her main research interests have been the development of statistical methodologies for detecting differentially expressed genes, and the prediction of transmembrane protein topology using hidden Markov models from a Bayesian perspective. She collaborates with researchers from Instituto de Medicina Molecular (IMM) and Instituto de Ciência Aplicada e Tecnologia of FCUL (ICAT).
Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal

Carina Silva graduated in Statistics and Operational Research and holds an MSc degree in Probability and Statistics from Faculdade de Ciências da Universidade de Lisboa. She is an Adjunct Professor at Escola Superior de Tecnologia da Saúde de Lisboa (ESTeSL - IPL), where she teaches Applied Mathematics and Biostatistics. She coordinated the course "Statistical Data Analysis with R" at ESTeSL. She is a member and collaborator of CEAUL (Center of Statistics and Applications). She contributed to the design of the Biostatistics module of EMPIRION, the European Masters Programme in Radiation Sciences for Oncology. Her main research interests are Statistics in Health Sciences, ROC analysis, Molecular Genetics (Microarray Data Analysis) and Bayesian Statistics.
Escola Superior de Tecnologia da Saúde de Lisboa, Lisboa, Portugal

Course description:

This is one of our «Foundations» type courses, providing a systematic and detailed review of fundamental concepts and techniques used in Bioinformatics. Many analytical and inferential methods, regardless of their novelty, find applications throughout Bioinformatics. Newer techniques, such as those employed in high-throughput data analysis, are no different in this respect. We will look at statistical methods and dig into their inner workings, taking the point of view of a professional statistician. Attendees can expect a thorough set of lectures that reveal the conceptual frameworks needed to understand the methods, together with extensive hands-on practice based exclusively on biological examples.

Target Audience

Everybody using Bioinformatics methods is implicitly using statistical methods. Most people have had one or more semester courses in Statistics in their graduate education. For many, Statistics happened a long time ago, which makes it difficult to go back and manipulate the concepts with full confidence. Moreover, proper judgment of results often calls for a deeper level of understanding than is required to solve textbook exercises.

Attending this course is a chance to revisit subjects like experimental design, hypothesis testing, inference and prediction in an intensive and systematic way. We will look into particular areas such as Bayesian Inference, Hidden Markov Chains and Multivariate methods with the attitude, eyes and brains of a statistician who wants to understand how the methods work.

Some of the software that will be used for practicals:

- R: The R Project for Statistical Computing
- WinBUGS: Bayesian inference Using Gibbs Sampling
- PRODIV-TMHMM: Transmembrane protein topology prediction using hidden Markov models and evolutionary information
- TOP-MOD: Topological Mesh Modeler
- dChip: DNA-Chip Analyzer
- BAMarray: Bayesian analysis of variance for microarray data
- SVM-light, SVM-Struct: Support Vector Machine for classification and regression problems


The course will introduce a relatively large number of concepts and methods. To keep it highly practical, we will spend most of the time in hands-on sessions.
- We will focus on each method using examples taken from real-world Bioinformatics practice.
- We will then dissect the method, identifying the concepts and exploring their interrelationships.
- The applicability and limitations of each method will be emphasized.
- The use of the method will be illustrated using appropriate Bioinformatics tools and biological data resources.

Course Pre-requisites

Basic knowledge of Statistics is helpful.
Before the course, attendees should briefly review the following concepts, if needed: probability, conditional probability, distribution, statistical tests, hypothesis testing, and inference.
Elementary skills in computer usage are needed.
Familiarity with the R environment is useful but not required.

Detailed Program

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal


Last updated:   September 24th 2010