3C-based data analysis and 3D reconstruction of chromatin folding

Downloadable poster in PDF

IMPORTANT DATES for this Course

Deadline for applications: Aug 31st 2018
Course date: Sept 17th to Sept 21st 2018

Candidates with an adequate profile will be accepted within 72 hours of applying, until we reach 20 participants.

Instructors:

Marc A. Martí-Renom obtained a Ph.D. in Biophysics from the Universidad Autonoma de Barcelona, where he worked on protein folding under the supervision of B. Oliva, F.X. Aviles and M. Karplus. He then moved to the US for postdoctoral training in protein structure modeling at the Sali Lab (Rockefeller University) as the recipient of a Burroughs Wellcome Fund fellowship. Later on, Marc was appointed Assistant Adjunct Professor at UCSF. Between 2006 and 2011, he headed the Structural Genomics Group at the CIPF in Valencia (Spain). Currently, Marc is an ICREA research professor and leads the Structural Genomics Group at the National Center for Genomic Analysis - Centre for Genomic Regulation (CNAG-CRG) in Barcelona. His group is broadly interested in how RNA, proteins and genomes organize and regulate cell fate. Marc is also an Associate Editor of the PLoS Computational Biology journal and has published over 90 articles in international peer-reviewed journals.

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

François Serra obtained his Degree in Biology, specializing in Physiology and Neurophysiology, his Master's Degree in Structural Genomics and Bioinformatics (Strasbourg I University, France) and his PhD in Evolutionary Genomics in the Department of Bioinformatics at the CIPF (Valencia). He is now part of the Structural Genomics team of Marc A. Martí-Renom at CNAG-CRG (Barcelona). His main research interests are grounded in comparative genomics and evolution, with a special focus on the effect of evolution on the structural arrangement of genomes. He has taught MEPA and 3DMOG for GTPB, as well as similar courses at CIPF (Valencia, ES) and the Department of Genetics of the University of Cambridge (UK).

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

David Castillo obtained his MSc in Photonics from the Universitat Politècnica de Catalunya in Barcelona (Spain), where he worked on super-resolution microscopy. He has a background in Physics and Engineering. He works as a technician in the Structural Genomics team of Marc A. Martí-Renom at CNAG-CRG (Barcelona), developing tools for the analysis, modelling and visualization of Hi-C data. He is also interested in the integration of microscopy into the modeling of genomic 3D structures.

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

Jürgen Walther obtained his B.Sc. degree in Physics at the University of Würzburg (Germany), with a final project in astrophysics, and his master's degree in Physics, with a specialization in Biophysics, at the University of Texas at Austin (USA). His master's project concerned imaging molecular motor regulation at the single-molecule level, for which he integrated single-molecule imaging, optical trapping and convolutional microscopy in one machine. He is now a PhD student in the Molecular Modeling and Bioinformatics laboratory of Modesto Orozco at the Institute for Research in Biomedicine (IRB) Barcelona. His main focus is developing coarse-grained models to simulate large pieces of DNA and chromatin at very high resolution. His other projects involve in-depth analysis of molecular dynamics (MD) trajectories, analysis of NGS data and 3D visualization of 2D microscopy data. His models 'MC-DNA' and 'Chromatin Dynamics' are components of the Multiscale Complex Genomics VRE, a web interface under development that offers a unified view of the genome at all length scales, from base pair to chromosome.

Affiliation: Institute for Research in Biomedicine (IRB), Barcelona, ES

Diana Buitrago obtained her Master's degree in bioinformatics at Universidad Pompeu Fabra (Barcelona). She is currently a PhD student in the Molecular Modeling and Bioinformatics group (Institute for Research in Biomedicine), working with Dr. Modesto Orozco. She has worked on the analysis of nucleosome positioning, its relation to epigenetic marks (DNA methylation and histone modifications) and how these factors change the physical properties of DNA and influence gene regulation and expression. She is also interested in modeling chromatin structure in 3D using polymer models based on chromatin types derived from epigenetic modifications.

Affiliation: Institute for Research in Biomedicine (IRB), Barcelona, ES

Course description

3C-based methods, such as Hi-C, produce large amounts of raw data in the form of pairs of DNA reads from loci that are in close spatial proximity in the cell nucleus. Once processed into interaction matrices, these data have been used to study how the genome folds within the nucleus, which is one of the most fascinating problems in modern biology. Rigorous computational analysis of those paired reads has been essential to fully exploit the experimental technique. The wealth of data on genome structure is currently expanding rapidly, with many Hi-C datasets available at resolutions down to 1 kb (see for example: http://hic.umassmed.edu/welcome/welcome.php ; http://promoter.bx.psu.edu/hi-c/view.php or http://www.aidenlab.org/data.html). In this course, participants will learn to use TADbit, a software package designed and developed to manage all dimensionalities of Hi-C data:

1D - Map paired-end sequences to generate Hi-C interaction matrices
2D - Normalize matrices and identify constitutive domains (TADs, compartments)
3D - Generate populations of structures which satisfy the Hi-C interaction matrices
4D - Compare samples at different time points
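
As a rough illustration of the 1D and 2D steps listed above, the sketch below bins mapped read pairs from a single chromosome into a symmetric interaction matrix and then rescales it by bin coverage. It is a conceptual toy written for this page, not TADbit code: the function names `bin_contacts` and `scale_by_coverage`, the example positions and the single-pass coverage scaling are all assumptions, and real pipelines use more elaborate filtering and iterative normalization.

```python
def bin_contacts(read_pairs, chrom_length, resolution):
    """Count read pairs per pair of genomic bins on one chromosome.

    Positions are assumed to be 0-based and smaller than chrom_length.
    """
    n_bins = (chrom_length + resolution - 1) // resolution  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // resolution, pos2 // resolution
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count to keep the matrix symmetric
    return matrix


def scale_by_coverage(raw):
    """Single-pass coverage scaling (a simplified stand-in for real normalization).

    Each cell is divided by the product of its row and column coverage and
    multiplied by the total count. Bins with zero coverage are left at 0.
    """
    coverage = [sum(row) for row in raw]
    total = float(sum(coverage))
    n = len(raw)
    scaled = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if coverage[i] and coverage[j]:
                scaled[i][j] = raw[i][j] * total / (coverage[i] * coverage[j])
    return scaled


# Hypothetical read-pair positions on a 10 kb chromosome, binned at 5 kb.
pairs = [(1200, 8700), (1500, 9100), (4000, 4200), (9500, 300)]
raw_matrix = bin_contacts(pairs, chrom_length=10000, resolution=5000)
print(raw_matrix)                     # [[1, 3], [3, 0]]
print(scale_by_coverage(raw_matrix))  # [[0.4375, 1.75], [1.75, 0.0]]
```

Plain lists are used here only to keep the example dependency-free; with genome-scale data, array or sparse-matrix libraries would be the natural choice.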

Participants can bring specific biological questions and/or their own 3C-based data to analyze during the course. At the end of the course, participants will be familiar with the TADbit software and will be able to fully analyze Hi-C data. On the last day, in order to look beyond kb resolution in genome dynamics, participants will be introduced to the computational approaches needed to bridge known atomistic insights into DNA with chromosome structure information. Two models will be introduced that give users of the MuG VRE a higher level of resolution. The first model, 'MC-DNA', is a coarse-grained model of B-DNA in which DNA is represented at the base-pair level, with an elastic potential describing the interactions between adjacent base pairs. This model makes it possible to probe sequence-specific properties of naked B-DNA for a sequence specified by the user. The second model, 'Chromatin Dynamics', extends this coarse-grained DNA model to kb-long chromatin chains by placing nucleosomes between stretches of linker DNA.
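
To give a feel for what an elastic potential between adjacent base pairs means, the sketch below sums a harmonic energy over base-pair steps for a single coordinate. It is an illustrative simplification and not the MC-DNA implementation: the use of one coordinate (a twist-like angle), the function name `elastic_energy`, the equilibrium value and the stiffness are all assumed here for the example, and a sequence-dependent model would use different parameters for each type of step.

```python
def elastic_energy(step_values, equilibrium, stiffness):
    """Sum of harmonic terms 0.5 * k * (x - x0)**2 over base-pair steps."""
    return sum(0.5 * stiffness * (x - equilibrium) ** 2 for x in step_values)


# Hypothetical twist-like values (degrees) for three consecutive base-pair steps.
steps = [34.0, 36.0, 38.0]

# Assumed parameters: equilibrium of 36 degrees, stiffness in arbitrary units.
energy = elastic_energy(steps, equilibrium=36.0, stiffness=0.05)
print(round(energy, 6))  # 0.2
```

In a Monte Carlo scheme, an energy of this kind would be evaluated for each trial deformation to decide whether the move is accepted.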

Note: Although the TADbit software is central in this course, alternative software will be discussed for each part of the analysis.

Target Audience

The course is aimed at experimental researchers and bioinformaticians at the graduate and post-graduate levels. The last edition of this course was attended by people from different backgrounds with an interest in genome organization.
Moreover, Hi-C data have recently been used in metagenomics studies to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species.
Participants in this course are likely either to be planning to generate Hi-C data for chromosome structure determination or to want to correctly interpret and analyse publicly available data.

Course Pre-requisites

Recommended: Linux and basic Python programming skills; graduate level in Life Sciences.
All hands-on sessions will be offered at 3 levels of computational expertise (web platform, command-line tool and Python scripting).

TADbit API

This tutorial is associated with a specific version of TADbit. If you wish to reproduce the results exactly, you should use the version of TADbit tagged 3DAROC_2018.

To install this version, please issue these commands:

git clone https://github.com/3DGenomes/TADbit

cd TADbit

git checkout tags/3DAROC_2018

sudo python setup.py install
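
Once the commands above have finished, a quick check from Python can confirm that the package is importable. The sketch below is a suggestion rather than part of the official instructions: it assumes the installed package is importable as `pytadbit`, the helper name `tadbit_status` is made up for this example, and the version lookup falls back to "unknown" if the module exposes no `__version__` attribute. It avoids version-specific syntax so that it runs under either Python 2 or Python 3.

```python
import importlib


def tadbit_status(module_name="pytadbit"):
    """Report whether a module can be imported and, if known, its version."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "{} is not installed".format(module_name)
    version = getattr(module, "__version__", "unknown")
    return "{} is installed (version: {})".format(module_name, version)


if __name__ == "__main__":
    print(tadbit_status())
```

If the message says the package is not installed, check that `python` on your path is the same interpreter that ran `setup.py`.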

TADbit tools

Most of the tasks of the "core pipeline" can be run directly from the command line (without writing any Python), using the TADbit tool. Have a look at the commands and at the metadata of the results.

For now, the TADbit tool is not included in the general documentation, as it is still under active development. Use it carefully, and don't hesitate to report any unexpected behaviour you observe.

TADbit page at github: https://github.com/3DGenomes/TADbit

Virtual research environment

With small datasets, the TADbit core pipeline can be run through a new Virtual Research Environment (VRE), hosted by the MuG project. This might also be the best place to try TADkit for visualizing genomes in 3D together with interaction matrices and any other genomic track.

MuG website from which to access the VRE: https://www.multiscalegenomics.eu/MuGVRE/

Detailed Program

Support and sponsorship

Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal


Last updated: Sept 4th 2018