Massive Data Analysis (using Babelomics)

Feb 21st - Feb 23rd 2011

   Deadline for applications: Feb 14th 2011
   Notification of acceptance dates:
        EARLY: Feb 8th 2011
        NORMAL: Feb 15th 2011


Joaquín Dopazo has a master's degree in Chemistry (Universidad de Valencia) and a PhD in Biology (Universidad de Valencia). He is the head of the Department of Bioinformatics at the CIPF (Valencia). In previous appointments he was responsible for the Bioinformatics units at the CNIO (Madrid) and at GlaxoWellcome SA (Madrid). He has supervised several large-scale software development projects, such as GEPAS and Babelomics, where more than 500 microarray experiments are analysed daily. He has published more than one hundred papers in international peer-reviewed journals and has edited a book on genomic data analysis. His main interests include functional and comparative genomics.

Affiliation Centro de Investigacion Principe Felipe, Valencia, ES

Javier Santoyo-Lopez obtained his PhD in Biological Sciences from the Universidad Autónoma de Madrid (UAM) in 1997. His PhD studies were performed at the Spanish Research Council (CSIC) Centre of Molecular Biology "Severo Ochoa" (Madrid, Spain). After his PhD he moved to Dundee (Scotland, UK), where he obtained his MSc in Bioinformatics at the University of Abertay Dundee. He was a Bioinformatician at the Wellcome Trust Biocentre (Dundee, Scotland, UK), at the Sanger Institute (Hinxton, England, UK), at the Spanish National Cancer Centre - CNIO (Madrid, Spain) and at the Division of Pathway Medicine (Edinburgh, Scotland, UK). Since 2008 he has worked at the Biomedical Network Research Centre for Rare Diseases (CIBERER) in the Prince Felipe Research Centre CIPF (Valencia, Spain), where he is involved in the study of the genomic variations and regulatory elements that cause rare diseases. He is also involved in the study of non-coding RNAs, in particular the RNA interference phenomenon (RNAi).

Affiliation Centro de Investigacion Principe Felipe, Valencia, ES

Sara C. Madeira obtained her PhD in Computer Science and Engineering (CSE) at Instituto Superior Técnico, Technical University of Lisbon, in December 2008, with a thesis entitled "Efficient Biclustering Algorithms for Time Series Gene Expression Data Analysis". She is currently an Auxiliary Professor at the CSE Department at IST, where she teaches graduate courses on Computational Biology and Data Integration for Bioinformatics, and undergraduate courses on Algorithms. She is also a researcher at the Knowledge Discovery and Bioinformatics (KDBIO) group at INESC-ID. Her research interests include biclustering algorithms and gene expression data analysis, the design and development of algorithms and data mining techniques to tackle biomedical problems, clinical and omics data integration/fusion, and integrative approaches to the study of complex diseases. She is currently the principal investigator of the research project NEUROCLINOMICS - Understanding NEUROdegenerative diseases through CLINical and OMICS data integration.

Affiliation INESC-ID, Lisboa PT and Instituto Superior Técnico, Lisboa PT

Course description:

High-throughput technologies such as expression microarrays, genotyping (GWAS) and next-generation sequencing (NGS) are characterized by producing massive amounts of data, whose analysis and interpretation are not trivial. New genome-scale technologies offer possibilities for querying living systems that only a few years ago we could not even dream of. At the same time, however, they pose new challenges in the way hypotheses must be tested and results analyzed.

Since the first microarray papers were published in the late nineties, the number of questions addressed has both increased and diversified, especially in recent years with the advent of NGS technologies. A common interest in transcriptomic analysis has been to find genes that are differentially expressed among distinct experimental conditions, or that are correlated with diverse parameters. There is also much interest in robust methods for building predictors of clinical outcomes, and a clear demand for methods that automatically transfer biological information to the results of transcriptomics or GWAS experiments and interpret them in the light of biological knowledge. New methods of analysis have been proposed that directly test hypotheses on blocks of functionally related genes, and these have been shown to be superior to the classical one-gene-at-a-time approaches (Mootha et al., 2003; Al-Shahrour et al., 2007). Protein-protein interaction networks and text-mining methodologies can also help in the interpretation of the results of genomic experiments (Minguez et al., 2009).
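To make the contrast with gene-set methods concrete, here is a minimal sketch of the classical two-step analysis: genes are first selected one at a time (for example, as differentially expressed), and then a functional class is tested for over-representation in that selection with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The function name and the toy numbers are hypothetical. This illustrates the general technique only and is not presented as the implementation used in Babelomics. Broadly speaking, the gene-set methods cited above avoid the initial single cut-off by scoring the ranked gene list as a whole.

```python
from math import comb


def enrichment_p_value(n_genome: int, n_annotated: int,
                       n_selected: int, n_selected_annotated: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) (hypothetical helper).

    n_genome: genes on the array (N)
    n_annotated: genes on the array annotated to the functional class (K)
    n_selected: genes selected one at a time, e.g. as differentially expressed (n)
    n_selected_annotated: selected genes that carry the annotation (k)
    """
    # Count favourable draws exactly with integers and divide once at the end.
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    upper = min(n_annotated, n_selected)
    favourable = sum(
        comb(n_annotated, x) * comb(n_genome - n_annotated, n_selected - x)
        for x in range(n_selected_annotated, upper + 1)
    )
    return favourable / comb(n_genome, n_selected)


if __name__ == "__main__":
    # Toy numbers (hypothetical): 20 genes, 5 in the class, 4 selected, 3 annotated.
    print(enrichment_p_value(20, 5, 4, 3))
```

With these toy numbers the p-value is 155/4845, about 0.032. On a real array the same calculation is repeated for every functional class, which is one reason multiple testing corrections become necessary.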

New generations of sequencing technologies will increase the availability of transcriptomic and genomic data, in a different format but with the same scientific questions behind them.

The MDA11 course covers the state of the art in the topics mentioned above, which are of major relevance in today's genomic and gene expression data analysis. Through theoretical sessions and practical examples, attendees will acquire the experience needed to pose relevant scientific questions to genomic or transcriptomic datasets and to answer them. Special attention will be devoted to important aspects of massive data analysis, such as multiple testing and functional profiling.
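Multiple testing matters because thousands of genes are tested at once: at a nominal threshold of 0.05, testing 10,000 genes of which none is truly affected would be expected to yield about 500 false positives. As one illustration, the sketch below implements the Benjamini-Hochberg false discovery rate (FDR) adjustment, a widely used correction. The choice of this particular procedure and the function name are assumptions made for the example; other corrections exist.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down to the smallest, scaling by m / rank
    # and taking a running minimum so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

With these toy p-values the adjusted values are approximately 0.04, 0.053, 0.053 and 0.20, so only the first gene stays below an FDR threshold of 0.05, although three of the four raw p-values do.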

The course is designed to be a mixture of theoretical and practical sessions. The latter will require some familiarity with the use of web-based tools and basic notions of statistics. Practical sessions will be carried out using the Babelomics suite (Al-Shahrour et al., 2005; Medina et al., 2010), an integrated web tool for microarray data analysis and functional profiling of genome-scale experiments.

NOTE: This course is immediately followed by AFADM11, which is a natural choice for those who want to perform automatic annotation, especially (but not exclusively) those who work with non-model organisms. You may apply to MDA11, AFADM11 or both, but please apply separately to each of them.

Course Pre-requisites:

Basic knowledge in Molecular Biology and Statistics.

Detailed Program for MDA11
Detailed Program for AFADM11

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal

Last updated:  Jan 25th 2011