Introduction to high-throughput sequencing data analysis

Description

This course provides an introduction to high-throughput sequencing (HTS) data analysis methodologies. Lectures will give insight into how biological knowledge can be generated from RNA-seq, ChIP-seq and DNA-seq experiments and illustrate different ways of analyzing such data.

Practicals will consist of computer exercises that will enable the participants to apply statistical methods to the analysis of RNA-seq, ChIP-seq and DNA-seq data under the guidance of the lecturers and teaching assistants.

 

Trainers

Guillermo Parada, Sanger Institute

Konrad Rudolph, University of Cambridge

Luigi Grassi, University of Cambridge

Myrto Kostadima, EMBL-EBI

Sandra Cortijo, The Sainsbury Laboratory

Stefan Graf, University of Cambridge

 

Audience and Prerequisites

  • This course is aimed at researchers who are applying, or planning to apply, HTS technologies and bioinformatics methods in their research.
  • Basic experience with the UNIX command line
  • Sufficient UNIX experience can be obtained from one of the many UNIX tutorials available online.
  • Basic knowledge of R syntax
  • For a beginner's introduction to R, see here. More advanced R instruction can be found at Quick-R or An Introduction to R.
  • Open to graduate students, postdocs and staff members from the University of Cambridge, affiliated institutions, and other external institutions or individuals

 

Syllabus, Tools and Resources

During this course you will learn about:

  • HTS technology
  • Quality control of raw reads: FastQC and FASTX-Toolkit
  • Experimental design considerations for ChIP-seq and RNA-seq
  • Read alignment to a reference genome: Bowtie and TopHat
  • File format conversion and processing: UCSC tools and samtools
  • Peak calling: MACS
  • Motif analysis: MEME
  • Quantification of expression and guided transcriptome assembly: Cufflinks
  • Analysis of variants
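
As a small illustration of what quality control of raw reads involves, the sketch below computes the mean Phred quality score of each read in a FASTQ file. It assumes four-line FASTQ records and the common Phred+33 quality encoding. The function names and the example reads are hypothetical and are not taken from the course materials; tools such as FastQC report many more metrics than this.

```python
from typing import Iterator, List, Tuple

# Hypothetical example data: two reads in four-line FASTQ format.
EXAMPLE_FASTQ = """@read1
ACGT
+
IIII
@read2
TTGA
+
5555
"""


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) for each four-line record."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.strip() for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+offset ASCII encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


if __name__ == "__main__":
    for read_id, _seq, qual in parse_fastq(EXAMPLE_FASTQ.splitlines()):
        print(read_id, mean_phred(qual))
```

A read whose mean score falls below a chosen threshold (20 is a common but arbitrary choice) would typically be trimmed or discarded before alignment.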

 

Learning Objectives

After this course you should be able to: 

  • Understand the advantages and limitations of the high-throughput assays presented
  • Assess the quality of your datasets
  • Perform alignment and peak calling of ChIP-seq datasets
  • Perform alignment, quantification of expression and guided transcriptome assembly of RNA-seq datasets
  • Understand the standard file formats for representing variant data
  • Apply filters to your list of variants and functionally annotate variants
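
To make the last two objectives more concrete, here is a minimal sketch of filtering variants stored in VCF, the standard tab-separated text format for variant data. It keeps header lines (those starting with `#`) and drops records whose QUAL column (the sixth column) is missing or below a threshold. The function name, the threshold of 30, and the example records are assumptions made for illustration; the course itself may use dedicated tools for this step.

```python
from typing import List

# Hypothetical example: a tiny VCF with one header line and three records.
EXAMPLE_VCF = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50.0\tPASS\t.",
    "chr1\t200\t.\tC\tT\t12.5\tPASS\t.",
    "chr2\t300\t.\tG\tA\t.\t.\t.",
]


def filter_vcf_by_qual(lines: List[str], min_qual: float = 30.0) -> List[str]:
    """Return header lines plus records with QUAL >= min_qual."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth fixed VCF column
        if qual != "." and float(qual) >= min_qual:
            kept.append(line)
    return kept


if __name__ == "__main__":
    for line in filter_vcf_by_qual(EXAMPLE_VCF):
        print(line)
```

In practice, filters usually combine several criteria (for example depth and strand bias stored in the INFO column), so a single QUAL cutoff is only a starting point.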

 

Links

Book Here

 

Timetable

Day 1 | Introduction to NGS analysis, Quality Control and Mapping
9:30 - 10:30 Introduction to Unix
10:30 - 10:45 Coffee / Tea
10:45 - 12:00 Introduction to Unix
12:00 - 13:00 Lunch
13:00 - 15:30 Introduction to R and Bioconductor
15:30 - 15:45 Coffee / Tea
15:45 - 16:30 High-throughput sequencing overview
16:30 - 17:30 High-throughput sequencing analysis workflow - group exercise
Day 2 | ChIP-seq analysis
9:30 - 10:45 High-throughput sequencing quality control
10:45 - 11:00 Coffee / Tea
11:00 - 12:00 Sequence alignment
12:00 - 13:00 Lunch
13:00 - 14:00 Introduction to ChIP-seq
14:00 - 15:00 ChIP-seq - peak calling and annotation
15:00 - 15:30 ChIP-seq - motif analysis
15:30 - 15:45 Coffee / Tea
15:45 - 17:30 ChIP-seq - enrichment plots
Day 3 | RNA-seq analysis
9:30 - 11:00 Introduction to RNA-seq
11:00 - 11:15 Coffee / Tea
11:15 - 12:00 RNA-seq - alignment and splice junction identification
12:00 - 13:00 RNA-seq - transcriptome assembly
13:00 - 14:00 Lunch
14:00 - 15:00 RNA-seq - differential expression analysis
15:00 - 15:30 RNA-seq - functional annotation
15:30 - 15:45 Coffee / Tea
15:45 - 16:30 RNA-seq - functional annotation (cont.)
16:30 - 17:30 RNA-seq - advanced differential expression
Day 4 | Variant analysis
9:30 - 11:00 Preparing a BAM file for variant calling
11:00 - 11:15 Coffee / Tea
11:15 - 12:30 Calling variants
12:30 - 13:30 Lunch
13:30 - 15:00 Filtering and recalibrating variants
15:00 - 15:15 Coffee / Tea
15:15 - 17:00 Variant annotation

 
