This workshop will focus on the core steps involved in calling germline short variants, somatic short variants, and copy number alterations with the Broad’s Genome Analysis Toolkit (GATK), using “Best Practices” developed by the GATK methods development team. A team of methods developers and instructors from the Data Sciences Platform at Broad will give talks explaining the rationale, theory, and real-world applications of the GATK Best Practices. You will learn why each step is essential to the variant-calling process, what key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset. If you are an experienced GATK user, you will gain a deeper understanding of how the GATK works under the hood and how to improve your results further, especially with respect to the latest innovations.

The hands-on tutorials for learning GATK tools and commands will be on Terra, a new platform developed at Broad in collaboration with Verily Life Sciences for accessing data, running analysis tools and collaborating securely and seamlessly. (If you’ve heard of or been a user of FireCloud, think of Terra as the new and improved user interface for FireCloud that makes doing research easier than before!)

  • Day 1: Introductory topics and hands-on tutorials. We will start off with introductory lectures on sequencing data, preprocessing, variant discovery, and pipelining. Then you will get hands-on with a recreation of a real variant discovery analysis in Terra.
  • Day 2: Germline short variant discovery. Through a combination of lectures and hands-on tutorials, you will learn about germline single nucleotide variants and indels, joint calling, variant filtering, genotype refinement, and callset evaluation.
  • Day 3: Somatic variant discovery. In a format similar to Day 2, you will learn about somatic single nucleotide variants and indels, Mutect2, and somatic copy number alterations.
  • Day 4: Pipelining and performing your analysis end-to-end in Terra. On the final day, you will learn how to write your own pipelining scripts in the Workflow Description Language (WDL) and execute them with the Cromwell workflow management system. You will also be introduced to additional tools that help you do your analysis end-to-end in Terra.

Please note that this workshop is focused on human data analysis. Most of the material presented applies equally to non-human data, and we will address some questions about the adaptations needed to analyze non-human data, but we will not go into much detail on those points.



  • The course is aimed primarily at mid-career scientists, especially those whose formal education likely included statistics but who may not have put it into practice since.
  • Familiarity with the basic terms and concepts of genetics and genomics is expected.
  • Basic familiarity with the command line environment is required.
  • Sufficient UNIX experience might be obtained from one of the many UNIX tutorials available online.

