The Gulbenkian Training Programme in Bioinformatics
Powerful Search and Mining in Biological Databases
PSMBD08 - Course Website
IMPORTANT DATES
Deadline for applications: September 8th 2008
Course date: September 15th - 19th 2008 |
Instructors: |
Ophir Frieder is the Royden B. Davis, S.J., Chair in Interdisciplinary Studies at Georgetown University. He is also the IITRI Chair Professor of Computer Science and the Director of the Information Retrieval Laboratory at the Illinois Institute of Technology, from which he is currently on leave. He frequently consults for industry and government and on key intellectual property litigation. His research focuses on scalable information retrieval systems, spanning search and retrieval and communications issues. His systems are deployed in commercial and governmental production environments worldwide. He is a Fellow of the AAAS, ACM, and IEEE, and the recipient of the 2007 ASIS&T Research Award.
Georgetown University, Washington DC, USA |
Jay Urbain completed his PhD in Computer Science at the Illinois Institute of Technology where he was a member of the Information
Retrieval Laboratory. Dr. Urbain is currently an Assistant Professor in the Department of Electrical Engineering and Computer Science at
the Milwaukee School of Engineering where he has taught since 2003. His research interests include information retrieval, text and data
mining, machine learning, and bioinformatics. Dr. Urbain has taught for Learning Tree International, and has consulted for Blue Cross Blue
Shield of Northeastern Pennsylvania, Intelligent Medical Objects, Emerson Electric, and American Honda Motor Co. Previously, Jay was VP
of Software Development for ThinkMed LLC, a health care data mining company, and Sr. Engineering Manager for Patient Monitoring Software
at Marquette Medical Systems.
Milwaukee School of Engineering, Milwaukee WI, USA |
Course Description |
We introduce the foundations of information systems as they relate to Bioinformatics. We assume no prior knowledge: the course begins with an overview of the principal methods in relational database management systems, information retrieval, and data mining. We describe evaluation approaches and support instruction with hands-on laboratory assignments. The course serves as an introduction to work in the field and gives participants the fundamental understanding needed to conduct research in this domain. |
Course Timetable: |
PSMBD08 | Powerful Search and Mining in Biological Databases |
Sept 15th | Bioinformatics: scope and purpose Day 1 |
09:30 - 11:00 | Introduction to the course
- A brief Molecular Biology primer (for non-Biologists)
- Gene sequences -> amino acids -> proteins -> function
- Topics in Bioinformatics
  - Sequence alignments (find similarity between DNA / protein - amino acid sequences)
  - Genome assembly (combine genomic fragments to form whole genome)
  - Gene identification & annotation (classify)
  - Microarrays & gene expression analysis (use DNA microarray to measure mRNA)
  - Protein folding (compute 3D protein structure from protein sequence and vice versa)
  - Phylogeny (infer common ancestry relationships and assign confidence levels) |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | Bioinformatics: grand challenges
- Find genomes of all organisms
- Identify and annotate all genes
- Compute sequence from 3D structure for all proteins, and vice versa
- Identify protein function and gene regulatory networks
- Compare DNA / protein sequences for similarity
- Compare families of DNA / protein sequences |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | Bioinformatics hands on lab and algorithms
- Understanding the complexity of bioinformatics problems
- Introduction to basic bioinformatics algorithms
- Pairwise sequence alignment
- Multiple sequence alignment
- Phylogeny analysis
- Protein structure prediction & alignment |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Hands on lab: Pairwise sequence alignment with dynamic programming, BLAST |
Sept 16th | DBMS and the relational data model Day 2 |
09:30 - 11:00 | Introduction to Database Management Systems
- Relational data model |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | Relational operators and SQL
- Relational operators
- SQL
- Parallel database principles |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | Data modeling
- Data modeling: 1:1, 1:m, m:m relationships
- Hands on lab: ERD data modeling |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Hands on lab: SQL query lab |
Sept 17th | Information Retrieval and Online biological databases Day 3 |
09:30 - 11:00 | Information Retrieval - Part 1
- Introduction to IR
- Measuring effectiveness: Precision/Recall
- IR System architecture |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | Application development
- Indexing
- Query processing |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | Introduction to NCBI databases
- Entrez Gene
- MEDLINE
- UMLS
- OMIM
- MeSH
- Hands on Lab 1: Searching online biological databases |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Application development
- NCBI database application development
- Hands on Lab 2: Using Java to interface to NCBI databases |
Sept 18th | Information Retrieval / Introduction to Data Mining Day 4 |
09:30 - 11:00 | Retrieval Models and utilities
- Boolean
- Vector Space
- Probabilistic
- Utilities: stemming, synonyms, term proximity |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | Scalable Information Retrieval Techniques
- Parallel information retrieval principles
- Compression |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | Hands on IR lab and Introduction to Data Mining
- Hands on IR lab: evaluate different retrieval functions with IR system
- Introduction to Data Mining: data preprocessing, data warehousing, machine learning |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Applications
- Gene co-occurrence clustering
- Identifying biological pathways
- Hands on DNA chip data clustering lab |
Sept 19th | Data Mining / Biological Literature Search Day 5 |
09:30 - 11:00 | Machine learning
- Principles of machine learning
- Machine learning algorithms: Naïve Bayes, decision trees, association rules, clustering, probabilistic graphical models |
11:00 - 11:30 | Coffee Break |
11:30 - 12:30 | Hands on machine learning lab |
12:30 - 14:00 | Lunch Break |
14:00 - 16:00 | Biological Literature Search and Mining
- Rationale: explosion of biological data and literature from research using high throughput devices
- Need to annotate gene expression, protein function, regulatory networks
- Gene/protein named entity identification, term normalization, synonymy, polysemy
- Integration of structured data and text
- Dimensional data model |
16:00 - 16:30 | Tea Break |
16:30 - 18:00 | Application development
- Multi-evidentiary retrieval models
- Document, passage retrieval
- Wrap up |
Pre-requisites: Elementary computing skills. |
Instituto Gulbenkian de Ciência,
Apartado 14, 2781-901 Oeiras, Portugal