 

Introduction to high-throughput sequencing data analysis

Description

This course provides an introduction to high-throughput sequencing (HTS) data analysis methodologies. Lectures will give insight into how biological knowledge can be generated from RNA-seq, ChIP-seq and DNA-seq experiments and illustrate different ways of analyzing such data.

Practicals will consist of computer exercises that will enable the participants to apply statistical methods to the analysis of RNA-seq, ChIP-seq and DNA-seq data under the guidance of the lecturers and teaching assistants.

 

Trainers

Myrto Kostadima, University of Cambridge

Stefan Graf, University of Cambridge

 

Audience and Prerequisites

  • This course is aimed at PhD students and post-doctoral researchers who are considering generating, or are already generating, NGS datasets but have limited experience in data analysis.
  • Participants are expected to have some UNIX experience, such as a basic understanding of the command-line operations cd and ls (for navigating and listing directories, respectively) and of the difference between absolute and relative paths. Sufficient UNIX experience can be obtained from one of the many UNIX tutorials available online.
  • The course is open to graduate students, postdocs and staff members from the University of Cambridge and affiliated institutions, as well as to participants from external institutions and to individuals.

 

Syllabus, Tools and Resources

During this course you will learn about:

  • HTS technology
  • Quality control of raw reads: FastQC and the FASTX-Toolkit
  • Considerations on experiment design for ChIP-seq and RNA-seq
  • Read alignment to a reference genome: Bowtie and TopHat
  • File format conversion and processing: UCSC tools and samtools
  • Peak calling: MACS
  • Motif analysis: MEME
  • Quantification of expression and guided transcriptome assembly: Cufflinks
  • Analysis of variants

 

Learning Objectives

After this course you should be able to: 

  • Understand the advantages and limitations of the high-throughput assays presented
  • Assess the quality of your datasets
  • Understand the difference between splice-aware and splice-unaware aligners
  • Perform alignment and peak calling of ChIP-seq datasets
  • Perform alignment, quantification of expression and guided transcriptome assembly of RNA-seq datasets

 

Links

Book Here

 

Timetable

Day 1 | Introduction to NGS analysis, Quality Control and Mapping
9:30 - 10:30 Introduction to Unix
10:30 - 10:45 Coffee / Tea
10:45 - 12:00 Introduction to Unix
12:00 - 13:00 Lunch
13:00 - 15:30 Introduction to R and Bioconductor
15:30 - 15:45 Coffee / Tea
15:45 - 16:30 High-throughput sequencing overview
16:30 - 17:30 High-throughput sequencing analysis workflow - group exercise
Day 2 | ChIP-seq analysis
9:30 - 10:45 High-throughput sequencing quality control
10:45 - 11:00 Coffee / Tea
11:00 - 12:00 Sequence alignment
12:00 - 13:00 Lunch
13:00 - 14:00 Introduction to ChIP-seq
14:00 - 15:00 ChIP-seq - peak calling and annotation
15:00 - 15:30 ChIP-seq - motif analysis
15:30 - 15:45 Coffee / Tea
15:45 - 17:30 ChIP-seq - enrichment plots
Day 3 | RNA-seq analysis
9:30 - 11:00 Introduction to RNA-seq
11:00 - 11:15 Coffee / Tea
11:15 - 12:00 RNA-seq - Alignment and splice junction identification
12:00 - 13:00 RNA-seq - Transcriptome assembly
13:00 - 14:00 Lunch
14:00 - 15:00 RNA-seq - Differential expression analysis
15:00 - 15:30 RNA-seq - Functional annotation
15:30 - 15:45 Coffee / Tea
15:45 - 16:30 RNA-seq - Functional annotation (Cont.)
16:30 - 17:30 RNA-seq - Advanced differential expression
Day 4 | Variant analysis
9:30 - 11:00 Preparing a BAM file for variant calling
11:00 - 11:15 Coffee / Tea
11:15 - 12:30 Calling variants
12:30 - 13:30 Lunch
13:30 - 15:00 Filtering and recalibrating variants
15:00 - 15:15 Coffee / Tea
15:15 - 17:00 Variant annotation

 
