
Variant analysis with GATK



This workshop will focus on the core steps involved in calling variants with the Broad’s Genome Analysis Toolkit, using the “Best Practices” developed by the GATK team. You will learn why each step is essential to the variant discovery process, what operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset.

In the course of this workshop, we highlight key functionalities such as the germline GVCF workflow for joint variant discovery in cohorts, RNAseq-specific processing, and somatic variant discovery using MuTect2. We also preview capabilities of the upcoming GATK version 4, including a new workflow for CNV discovery, and we demonstrate the use of pipelining tools to assemble and execute GATK workflows.

The workshop is composed of one day of lectures and two days of hands-on training, structured as follows.

Day 1: theory and application of the Best Practices for Variant Discovery in high-throughput sequencing data.

Day 2 and the morning of Day 3: hands-on exercises on how to manipulate the standard data formats involved in variant discovery and how to apply GATK tools appropriately to various use cases and data types.

Day 3 afternoon: hands-on exercises on how to write workflow scripts using WDL, the Broad's new Workflow Description Language, and how to execute these workflows both locally and through a publicly accessible cloud-based service.

Please note that this workshop is focused on human data analysis. Most of the material presented applies equally to non-human data, and we will address some questions about the adaptations needed to analyse non-human data, but we will not go into much detail on those points.



Geraldine Van der Auwera, Broad Institute


Audience and Prerequisites

  • Familiarity with the basic terms and concepts of genetics and genomics.
  • Basic familiarity with the command line environment is required.
  • Sufficient UNIX experience might be obtained from one of the many UNIX tutorials available online.
  • The lecture-based component of the workshop is aimed at a mixed audience: people who are new to variant discovery or to GATK and are seeking an introduction to the tools, and existing GATK users who want to improve their understanding of and proficiency with the tools.
  • The hands-on component is aimed at novice to intermediate users who are seeking detailed guidance with GATK and related tools.
  • Graduate students, postdocs and staff members from the University of Cambridge, affiliated institutions, and other external institutions or individuals.


Syllabus, Tools and Resources

During this course you will learn about:

  • Pre-processing of high-throughput DNA and RNA sequencing data
  • Variant discovery (germline and somatic short variants, somatic CNV)
  • Germline variant filtering and evaluation
  • Pipelining strategies


Learning Objectives

After this course you should be able to:

  • Understand the overall variant discovery workflow rationale and requirements
  • Understand key methods and functionalities in light of the latest research
  • Understand key differences between germline and somatic variant discovery approaches
  • Apply analysis tools and Best Practices workflows to a real data set
  • Interpret analysis results and troubleshoot common problems
  • Write and execute WDL analysis pipelines






Day 1: Lectures
  • Introduction to variant discovery analysis and GATK Best Practices
  • Coffee break
  • Marking Duplicates; Indel Realignment; Base Recalibration
  • Lunch break
  • Variant Calling and Joint Genotyping; Filtering variants with VQSR; Genotype Refinement Workflow
  • Coffee break
  • Callset Evaluation
  • Somatic variant discovery with MuTect2; Preview of CNV discovery with GATK4

Day 2: Hands-on exercises
  • Working with standard data formats and data types (BAM, VCF, WGS, WEx, RNAseq); running Picard and GATK tools to process sequence data and collect QC metrics
  • Coffee break
  • Variant calling with HaplotypeCaller and the GVCF workflow
  • Lunch break
  • Variant callset evaluation and filtering 1
  • Coffee break
  • Variant callset evaluation and filtering 2

