ARANGS15

Automated and reproducible analysis of NGS data

Timetable (provisional)

Mon, May 11th
Day #1
09:30 - 11:00 Introduction, Installation, Git and Github 101
Introduce ourselves, the course concepts and discover each other's expectations for the coming week. Download any required software if you haven't already done so. Introduce Git:
  • Github
    • open source licenses
    • class Github repositories
    • forking our repositories to make your own copy
    • Using forks as a proxy for 'citations' of code
    • Following our repositories
    • Pulling changes from our repositories
    • Submitting Pull Requests (sharing back)
  • Git
    • Clone
    • Local and remote repositories
    • .gitignore and secrets
    • The Git Workflow
11:00 - 11:30 Coffee Break
11:30 - 12:30 Introduce the pipeline software requirements, and discuss its compute environment
Using the pipeline as a launching point, introduce the Linux operating system variants, and discuss their different package management systems. Demonstrate the breadth of internet resources available for students to find out how to install the packages they need.
12:30 - 14:00 Lunch Break
14:00 - 16:00 First run of the pipeline
Students will attempt to use the documentation provided to get their local machines ready to run the pipeline. We will discuss pain points along the way. Instructors will run the pipeline to demonstrate that it works and to show the expected output.
16:00 - 16:30 Tea Break
16:30 - 18:00 Continue running the pipeline and note any differing software versions across machines
Tue, May 12th
Day #2
09:30 - 11:00 Morning Wrap-up (what have we done so far?), Virtualization 101, VirtualBox
Review day 1, then dive into virtualization with VirtualBox.
11:00 - 11:30 Coffee Break
11:30 - 12:30 Adding Vagrant to your virtualization toolbox
Students will learn how to use Vagrant to automate and manage their research pipeline projects. They will learn the Vagrantfile syntax and where to find free base boxes from which they can start to build the machine environment for the pipeline. They will also use package management applications to install the software needed to run the pipeline on their virtual machine.
12:30 - 14:00 Lunch Break
14:00 - 16:00 Adding Puppet to your virtualization toolbox
Students will learn about configuration management, and start to use Puppet to automate the process of provisioning their pipeline virtual machine.
16:00 - 16:30 Tea Break
16:30 - 18:00 Run the pipeline inside a virtual machine image provisioned with Puppet
Students will finish provisioning their pipeline virtual machine with Puppet, and run the pipeline inside the virtual machine.
Wed, May 13th
Day #3
09:30 - 11:00 Morning Wrap-up (what have we done so far?), Vagrant boxes
Students will learn how Vagrant can be used to store and share the exact machine image that they created, with specific versions of the software required to run their pipeline, and a standard directory structure.
11:00 - 11:30 Coffee Break
11:30 - 12:30 Host machine directory mounts
Students will learn how to mount local directories onto their virtual machine. They will learn how data from different projects can be plugged into their virtual machine in place of the directory structure it expects.
12:30 - 14:00 Lunch Break
14:00 - 16:00 Introduce Docker
Students will learn how Docker builds on virtualization but takes a different approach. They will learn about the Docker toolset: docker, docker-machine, and docker-compose. The session will end with a discussion of Docker concepts:
  • Container Applications compared to Virtual Machine Images
  • Volume Containers to permanently store and share data
  • Volumes to plug different host directories into the directory structure expected by a container application
16:00 - 16:30 Tea Break
16:30 - 18:00 Docker Machine, and Docker Commandline
Students will learn how virtualization is leveraged to produce a standard machine environment on any host machine, within which Docker images can be stored and Docker containers can be run. Students will create their Docker machine, and learn how to configure the Docker command-line application to work with it. Students will also learn how the Docker command line's preconfigured knowledge of the global Docker Hub registry can speed up their use of Docker to design their environments.
Thu, May 14th
Day #4
09:30 - 11:00 Morning Wrap-up (what have we done so far?), Dockerfile and the Docker build context
Students will learn how to create a Docker build context to automate and document the way they build their machine environment.
11:00 - 11:30 Coffee Break
11:30 - 12:30 Docker Compose
Students will learn how to use docker-compose to automate the way they build and launch their pipeline components. They will use docker-compose as they continue creating the pipeline image build contexts.
12:30 - 14:00 Lunch Break
14:00 - 16:00 Run the pipeline
Students will run their pipeline using docker-compose. They will learn how docker-compose can automate volume mounts to host locally stored data, and how its logging system can help them debug their applications.
16:00 - 16:30 Tea Break
16:30 - 18:00 Share the pipeline
Students will learn how to use Docker to store their pipeline machine images, and share them with each other to reuse.
Fri, May 15th
Day #5
09:30 - 11:00 Morning Wrap-up (what have we done so far?), Sharing your pipeline machine environments with the world
Students will begin to learn how to use Vagrant to share their machine images with the rest of the world.
11:00 - 11:30 Coffee Break
11:30 - 12:30 Sharing your code and Vagrantfile
Students will get a brief introduction to how they can use GitHub to share with the world the code and Vagrantfile needed to build their machine image.
12:30 - 14:00 Lunch Break
14:00 - 16:00 Sharing your Docker images, code, and build contexts with the world
Students will learn how they can use the worldwide Docker Hub registry to share their Docker images with the world. They will learn how to store their code in GitHub alongside the Docker build contexts needed to build the images that run it.
16:00 - 16:30 Tea Break
16:30 - 18:00 Final Wrap-up session
Instituto Gulbenkian de Ciência,

Apartado 14, 2781-901 Oeiras, Portugal

Last updated:   Apr 12th 2015