Source: publish.illinois.edu/compgenomicscourse/files/2020/... (Jun 03, 2020)

Variant Calling Workshop

Chris Fields

Variant Calling Workshop | Chris Fields | 2020 1

PowerPoint by Casey Hanson

Edited by Saba Ghaffari

Introduction

In this lab, we will do the following:

1. Perform variant calling analysis on the IGB biocluster.

2. Visualize our results on the desktop using the Integrative Genomics Viewer (IGV) tool.

Start the VM

• Follow the instructions for starting the VM, which you access through the Remote Desktop software.

• The instructions are different for UIUC and Mayo participants.

• Instructions for UIUC users are here: http://publish.illinois.edu/compgenomicscourse/files/2020/06/SetupVM_UIUC.pdf

• Instructions for Mayo users are here: http://publish.illinois.edu/compgenomicscourse/files/2020/06/VM_Setup_Mayo.pdf

Step 0A: Accessing the IGB Biocluster

1. In the VM, run Putty.exe.

2. In the hostname textbox type:

biologin.igb.illinois.edu

3. Click Open

4. If a popup appears (it may not), click Yes.

5. Enter the login credentials assigned to you, e.g.:

login as: class00
password: (type it out here)

Now you are logged on to biocluster.

Where is this?

Step 0B: Lab Setup

The data and code needed for the lab, as well as the output files from the lab, are located in the following directory:

/home/classroom/hpcbio/mayo_workshop/2019/Mayo-Variant-Calling/

You don’t need to do anything yet. You and the TA may consult it together later if you are unsure about your runs.

DON’T TYPE THIS!

We’ll call the above the “common” directory for the lab, since it is visible to all students.

Step 0C: Lab Setup

The three commands below will achieve the following:

a) Create a directory called 03_Variant_Calling in your home directory.

b) Change your current directory to this new directory.

c) Copy all code (files ending with “.sh”) from the common directory (Step 0B) to your current directory.

• Note: a “directory” is the Unix word for “folder”

• Note: your “home directory” is your landing folder when you log on.

• Note: for better organization, we don’t put all files in the home directory. We create a folder (directory) inside it, and “move into” it, just as you would move inside a sub-folder in Windows/Mac by double-clicking it.

$ mkdir ~/03_Variant_Calling

# Make new directory inside your home directory. “mkdir” stands for “make directory”

$ cd ~/03_Variant_Calling

# Move into this directory. The “cd” stands for “change directory”

$ cp /home/classroom/hpcbio/mayo_workshop/2019/Mayo-Variant-Calling/*.sh .

# Copy all files whose names end with “.sh” FROM the common directory TO the current directory. These are code files.

DON’T RUN ANYTHING YET

Q: Why “~/03_Variant_Calling”?

A: This is Linux shorthand for “a directory called 03_Variant_Calling inside the home directory (~)”.

RUN the first two commands now (mkdir and cd). (TYPE & HIT ENTER)

Q: What is this “.” (period)?

A: This is Linux shorthand for “current directory”.

Note: There is a space after “cp” and another space before the “.”.

RUN the third command now (cp).

DON’T RUN ANYTHING NOW. YOU’RE DONE WITH THESE COMMANDS

Copied Files:

annotate_snpeff.sh

call_variants_ug.sh

hard_filtering.sh

post_annotate.sh

HOW TO CHECK IF THE COPYING WORKED?
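The quickest check is to list the directory with `ls *.sh`. As a slightly more explicit sketch, the hypothetical helper `check_copied` (not one of the lab scripts) reports each of the four file names from the "Copied Files" list and returns a non-zero status if any of them is missing.

```shell
# Hypothetical helper: report which of the four lab scripts (names taken from
# the "Copied Files" list) exist in a directory (default: current directory).
# Returns 0 only if all four are present.
check_copied() {
    local dir="${1:-.}"
    local missing=0
    local f
    for f in annotate_snpeff.sh call_variants_ug.sh hard_filtering.sh post_annotate.sh; do
        if [ -f "$dir/$f" ]; then
            echo "found:   $f"
        else
            echo "MISSING: $f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Usage, from ~/03_Variant_Calling:
#   ls *.sh          # quick visual check
#   check_copied     # prints found/MISSING for each expected script
```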

Variant Calling Setup

In this exercise, we will use exome data (60x coverage) from the 1000 Genomes Project to call variants, in particular single nucleotide polymorphisms (SNPs).

The initial part of the GATK pipeline (alignment, local realignment, base quality score recalibration) has already been done, and the BAM file has been reduced to a portion of human chromosome 20. This is the data we will be working with in this exercise.

CHECKPOINT REACHED. LAB PAUSE HERE.

Step 1A: Running a Variant Calling Job

In this step, we will start a variant calling job using the sbatch command.

$ sbatch call_variants_ug.sh

# This will execute call_variants_ug.sh on the biocluster.

# After you hit enter, it should say something like:

# Submitted batch job 5143759

RUN the sbatch command. (TYPE IT, HIT ENTER)

Note: a successful submission to the biocluster does not mean the job has run successfully. The job may take a few minutes to finish.

Additionally, we will gather statistics about our job using the squeue command.

$ squeue -u <userID>

# Get statistics on your submitted job

RUN the squeue command, but REPLACE <userID> with your user ID, e.g., class07.

Step 1B: Output of Variant Calling Job

Files

raw_indels.vcf

raw_indels.vcf.idx

raw_snps.vcf

raw_snps.vcf.idx

HOW TO CHECK THIS?

As long as the code you submitted (through sbatch) is running, squeue will show something like this:

Repeat the “squeue -u <userID>” command periodically to see if your job has finished. When it finishes, squeue will only show the header row. When you don’t see your job listed anymore, it’s done!

Quick tip: You don’t need to retype or repaste the command every time. Hit the UP-ARROW key and see what happens!

You should have 4 files when the job has completed.
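The polling described above can also be scripted. The sketch below assumes Slurm's `squeue` and `sbatch` are on your PATH; `wait_for_job` is a hypothetical helper and the 20-second default interval is an arbitrary choice. It only tells you that the job has left the queue, not whether it succeeded, so still check for the 4 output files afterwards.

```shell
# Hypothetical helper: block until a Slurm job is no longer listed by squeue.
#   $1 = job ID (as printed by sbatch), $2 = polling interval in seconds (default 20)
wait_for_job() {
    local jobid="$1"
    local interval="${2:-20}"
    # -h drops the header row and -j restricts the listing to this one job,
    # so empty output means the job has left the queue. Once a finished job is
    # purged, squeue may report an invalid job ID on stderr; that is discarded.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage (on the biocluster):
#   jobid=$(sbatch --parsable call_variants_ug.sh)   # --parsable prints just the job ID (single cluster)
#   wait_for_job "$jobid" && echo "job $jobid has left the queue"
```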

Discussion

What did we just do?

• We ran the GATK UnifiedGenotyper to call variants.

Look at file structure.

• Which file(s)?

• How to “look”?
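One possible way to "look", sketched under the assumption that the file follows the standard VCF layout (##-prefixed meta lines, then a single #CHROM column-header line, then tab-separated records): print the column header and the first few records. `vcf_peek` is a hypothetical helper; `less -S raw_snps.vcf` is an interactive alternative that keeps long lines unwrapped.

```shell
# Hypothetical helper: show the VCF column names and the first N records.
# Usage: vcf_peek FILE [N]   (N defaults to 3)
vcf_peek() {
    local file="$1"
    local n="${2:-3}"
    grep -m 1 '^#CHROM' "$file"            # column names: CHROM POS ID REF ALT QUAL FILTER INFO ...
    grep -v '^#' "$file" | head -n "$n"    # first N variant records
}

# Usage: vcf_peek raw_snps.vcf
```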

CHECKPOINT REACHED. LAB PAUSE HERE.

Step 1C: SNP and Indel Counting

• In this step, we will count the number of SNPs and indels identified in the raw_snps.vcf and raw_indels.vcf files.

• We will use the program grep, which is a text matching program.

$ grep -c -v '^#' raw_snps.vcf

# Get the number of SNPs in file “raw_snps.vcf”

# -v Tells grep to show all lines not beginning with “#” in raw_snps.vcf.

# -c Tells grep to report the total number of lines that match the above criterion. Each such line is a SNP.

# Output should be approx. 14400.

RUN the first grep command. Be SUPER CAREFUL with spaces and case.

$ grep -c -v '^#' raw_indels.vcf

# Get the number of indels.

# Output should be approx. 1069.

RUN the second grep command.
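To count both files in one go, a small loop over the file names is enough. This hypothetical helper, `count_vcf_records`, simply wraps the same `grep -c -v '^#'` command used above and prints one tab-separated line per file.

```shell
# Hypothetical helper: print "<file><TAB><number of records>" for each VCF given.
# Records are all lines not starting with "#", exactly as in the grep commands above.
count_vcf_records() {
    local f
    for f in "$@"; do
        # grep -c prints 0 when no line matches; its non-zero exit status does not stop printf.
        printf '%s\t%s\n' "$f" "$(grep -c -v '^#' "$f")"
    done
}

# Usage: count_vcf_records raw_snps.vcf raw_indels.vcf
```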

Step 1D: SNP and Indel Counting in dbSNP

• In this step, we will count how many of our SNPs and indels are already in dbSNP.

• dbSNP SNPs and indels have the rs# identifier, where # is a number, e.g., rs1000.

$ grep -c 'rs[0-9]*' raw_snps.vcf

# Get the number of dbSNP SNPs.

# Report all lines in raw_snps.vcf containing “rs” followed by a number. (Strictly, “*” also allows zero digits, so this is a loose count.)

# -c Tells grep to report the number of matching lines.

# Output should be approx. 13329

$ grep -c 'rs[0-9]*' raw_indels.vcf

# Get the number of dbSNP indels.

# Output should be approx. 983.

RUN both commands.
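A caveat on the pattern: in `rs[0-9]*`, the `*` allows zero digits, so any line containing "rs" anywhere (including header lines) is counted. A stricter sketch, assuming the rs identifiers sit in the VCF ID column (column 3) as the VCF format specifies, checks only that column of non-header lines and requires at least one digit. Its counts may differ slightly from the approximate numbers above; `count_dbsnp` is a hypothetical name.

```shell
# Hypothetical helper: count records whose ID column (column 3) starts with "rs"
# followed by at least one digit. Header lines (starting with "#") are skipped.
count_dbsnp() {
    awk -F'\t' '!/^#/ && $3 ~ /^rs[0-9]+/ { n++ } END { print n + 0 }' "$1"
}

# Usage:
#   count_dbsnp raw_snps.vcf
#   count_dbsnp raw_indels.vcf
```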

Step 2A: Hard Filtering Variant Calls

• We need to filter these variant calls in some way.

• In general, we would filter with variant quality score recalibration (VQSR), which requires a large set of variants. Since we have a very small set of variants, we will use hard filtering instead.

Output Files

hard_filtered_snps.vcf

hard_filtered_indels.vcf

$ sbatch hard_filtering.sh

# Execute hard_filtering.sh on the biocluster.

$ squeue -u <userID>

Periodically repeat the squeue command to see if your job has finished. When you don’t see your job listed anymore, it’s done!

Quick tip: You don’t need to retype or repaste the command every time. Hit the UP-ARROW key and see what happens!

RUN both commands. (REPLACE <userID> appropriately.)

Hard filtering was explained in the lecture. Please refer to the lecture slides.

Where are these?

Step 2B: Hard Filtering Variant Calls

In this step, we will count the number of filtered SNPs and indels.

$ grep -c 'PASS' hard_filtered_snps.vcf

# Count # of passes

# Output approx. 8547.

$ grep -c 'PASS' hard_filtered_indels.vcf

# Count # of PASSES

# Output approx. 1069

RUN both commands.
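`grep -c 'PASS'` counts every line that contains the string PASS anywhere. To see how many records carry each FILTER value (PASS versus the names of failed filters), which also bears on the discussion questions that follow, one could tally column 7 with awk. This is a sketch; `filter_tally` is a hypothetical helper.

```shell
# Hypothetical helper: count records per FILTER value (column 7), skipping header lines.
# Output: one "<FILTER value><TAB><count>" line per distinct value, sorted by value.
filter_tally() {
    awk -F'\t' '!/^#/ { n[$7]++ } END { for (k in n) print k "\t" n[k] }' "$1" | sort
}

# Usage:
#   filter_tally hard_filtered_snps.vcf
#   filter_tally hard_filtered_indels.vcf
```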

Step 2B: Discussion

1. Did we lose any variants?

2. How many PASSED the filter?

3. What is the difference between the filtered and raw input?

4. Why are these counts approximate (why do results differ slightly)?

CHECKPOINT REACHED. LAB PAUSE HERE.

Step 2B: Discussion

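Some of the filters are as follows (a variant fails a filter when its condition holds):

• QD (variant confidence/quality by depth): QD < 2.0

• MQ (RMS mapping quality): MQ < 40.0

• FS (Phred-scaled p-value from Fisher’s exact test for strand bias): FS > 60.0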

What is the difference between the filtered and raw input?

In the filtered output, the FILTER column records whether the SNP passed all the filters (“PASS”) or names the filter(s) it failed.
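To make the three thresholds concrete, here is an illustrative awk sketch. `hard_filter_sketch` is a hypothetical teaching helper, not the contents of hard_filtering.sh: it reads QD, MQ and FS from the INFO column of each record and writes either PASS or a made-up filter name, `hard_filter`, into the FILTER column. It assumes the annotations are stored as KEY=VALUE pairs in INFO, as the VCF format specifies; a record missing an annotation is simply not tested on it.

```shell
# Hypothetical helper (illustration only): apply QD < 2, MQ < 40, FS > 60 to a VCF
# and write "PASS" or "hard_filter" into the FILTER column (7). Output goes to stdout.
hard_filter_sketch() {
    awk -F'\t' '
    BEGIN { OFS = "\t" }
    /^#/ { print; next }                     # copy header lines unchanged
    {
        split("", info)                      # clear the per-record INFO map
        n = split($8, kv, ";")               # INFO holds KEY=VALUE entries separated by ";"
        for (i = 1; i <= n; i++) {
            eq = index(kv[i], "=")
            if (eq > 0) info[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1)
        }
        fail = 0
        if (("QD" in info) && info["QD"] + 0 < 2)  fail = 1
        if (("MQ" in info) && info["MQ"] + 0 < 40) fail = 1
        if (("FS" in info) && info["FS"] + 0 > 60) fail = 1
        $7 = fail ? "hard_filter" : "PASS"     # FILTER is column 7
        print
    }' "$1"
}

# Usage (illustrative): hard_filter_sketch raw_snps.vcf > sketch_filtered_snps.vcf
```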

Step 3A: Annotating Variants With SnpEff

• With our filtered variants in hand, we now need to annotate them with SnpEff.

• SnpEff adds information about where variants are in relation to specific genes.

$ sbatch annotate_snpeff.sh

# This will execute annotate_snpeff.sh on the biocluster.

$ squeue -u <userID>

Output Files

hard_filtered_snps_annotated.vcf

hard_filtered_indels_annotated.vcf

Periodically (every 20 seconds or so), run the “squeue -u <userID>” command to see if your job has finished.

RUN both commands. (REPLACE <userID> appropriately.)

Where are these?

Step 3B: Summarizing Annotated Variants

• The gene IDs for the human assembly version we use are from Ensembl. Ensembl gene IDs have the format ENSGXXXXXXXXXXX.

• Example: FOXA2’s Ensembl ID is ENSG00000125798.

• In this step, we would like to see if there are any variants in FOXA2.

$ grep -c 'ENSG00000125798' hard_filtered_snps_annotated.vcf

# Get the number of SNPs in FOXA2, ENSG00000125798.

# Output should be 3.

$ grep -c 'ENSG00000125798' hard_filtered_indels_annotated.vcf

# Get the number of Indels in FOXA2, ENSG00000125798.

# Output should be 0.

RUN both commands.
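To see where the FOXA2 variants are, rather than only how many there are, a sketch like the following prints CHROM, POS, REF, ALT and FILTER for every record whose line contains the ID (the same substring match as the grep commands above). `variants_for_id` is a hypothetical helper.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT, FILTER (tab-separated) for every
# non-header record whose line contains the given ID string.
# Usage: variants_for_id ID FILE
variants_for_id() {
    awk -F'\t' -v OFS='\t' -v id="$1" '!/^#/ && index($0, id) { print $1, $2, $4, $5, $7 }' "$2"
}

# Usage: variants_for_id ENSG00000125798 hard_filtered_snps_annotated.vcf
```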

Step 4 (Optional): GATK Variant Annotator

• Run this on your own (later); we’ll skip this step in the live session.

• SnpEff adds a lot of information to the VCF file.

• GATK Variant Annotator helps remove a lot of the extraneous information.

$ sbatch post_annotate.sh

# This will execute post_annotate.sh on the biocluster.

$ squeue -u <userID>

RUN both commands. (REPLACE <userID> appropriately.)

What files are created? Check for yourself. (Or ask TA after session)

• We’re done finding variants and annotating them.

• Files created are in the current directory.

• We want to visualize the variants now.

• But for that we’ll go back to the VM (Remote Desktop).

• What happens to the files?

• We already copied the result files over to your VM, so you don’t have to copy them now!

• Exit PuTTY by either closing the window or typing ‘exit’ in the command prompt.

CHECKPOINT REACHED. LAB PAUSE HERE.

Visualization of Results

In this exercise, we will visualize the results of the previous exercise using the Integrative Genomics Viewer (IGV).

We will do the visualization on the VM.

Local Files (for UIUC users)

The files needed for this laboratory exercise are in a folder called “VM” on your Desktop. See if you can locate it. Double-click on it to get inside.

Once you are inside the folder, navigate to the subfolder 03_Variant_Calling\results

You should see something like this:

Local Files (for Mayo Clinic users)

The files needed for this laboratory exercise are in a folder called “datafiles” on your Desktop. See if you can locate it. Double-click on it to get inside.

Once you are inside the folder, navigate to the subfolder 03_Variant_Calling\results

You should see something like this:

Run IGV

• IGV software should be on your VM Desktop or searchable through the search bar at the bottom.

CHECKPOINT REACHED. LAB PAUSE HERE.

Step 5A: Visualization With IGV

Switch the genome to Human (b37).

Note: If you can’t find “Human (b37)”, click “More” and type “human” in the “Filter” text box; you should then see it.

Step 5B: Loading VCF Files

On the menu bar, click File

Click Load from File…

Navigate to:

UIUC VM: Desktop -> VM -> 03_Variant_Calling -> results

Mayo VM: Desktop -> datafiles -> 03_Variant_Calling -> results

Hold the Ctrl key down.

Click both vcf files.

Click Open.

Step 5C: Loading VCF Files

You should see a window similar to below:

Step 5D: Navigate to Chromosome 20

In the drop-down menu next to “Human (b37)”, select “20”.

You should see a view similar to the (partial) screenshot on the right.

Step 5E: Navigate to Chromosome 20

Click and drag from around the 20 Mb mark to about the 27 Mb mark. (It doesn’t have to be exact.)

Step 5F: Navigate to Chromosome 20

The result should look similar to the screenshot below:

Step 5G: Setting Feature Visibility Window

Do this for each VCF track:

Right-click the track and select Set Feature Visibility Window.

Enter 10000 (the window is given in kilobases, so this is 10 Mb).

Click OK.

Step 5H: Viewing FOXA2 Polymorphisms

In the search box, type FOXA2 and press Enter.

You should see something like the window below:

Checkpoint: FOXA2 Polymorphisms

1. How many SNPs are here?

2. How many Indels are here?

3. How many SNPs are heterozygotes?
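Question 3 can also be cross-checked on the command line. The sketch below assumes a single-sample VCF (genotype values in column 10, keys in the FORMAT column 9) and counts a call as heterozygous when its two alleles differ, e.g. 0/1 or 0|1; calls with a missing allele (.) are skipped. `count_het_for_id` is a hypothetical helper.

```shell
# Hypothetical helper: count heterozygous GT calls (first sample) among records
# whose line contains the given ID string.
# Usage: count_het_for_id ID FILE
count_het_for_id() {
    awk -F'\t' -v id="$1" '
    !/^#/ && index($0, id) {
        nf = split($9, fmt, ":")              # FORMAT keys, e.g. GT:AD:DP:GQ:PL
        split($10, val, ":")                  # matching values for the first sample
        gt = ""
        for (i = 1; i <= nf; i++) if (fmt[i] == "GT") gt = val[i]
        # Split on "/" (unphased) or "|" (phased); count two different, non-missing alleles.
        if (split(gt, al, "[/|]") == 2 && al[1] != al[2] && al[1] != "." && al[2] != ".") het++
    }
    END { print het + 0 }' "$2"
}

# Usage: count_het_for_id ENSG00000125798 hard_filtered_snps_annotated.vcf
```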

CHECKPOINT REACHED. LAB PAUSE HERE.

Step 6A: Loading a BAM File

On the menu bar, click File

Click Load from File…

Navigate to:

UIUC VM: Desktop -> VM -> 03_Variant_Calling -> results

Mayo VM: Desktop -> datafiles -> 03_Variant_Calling -> results

Click the .bam file.

Click Open.

Step 6B: Viewing a loaded BAM File

You should see a new track in your window, similar to the one below:

Step 6C: Show Coverage Track

Note: If you don't see a summary track like the one below:

Right-click the BAM track.

Click Show Coverage Track.

Step 6D: Color Alignments by Read Strand

Right-click the .bam track. (Not the .bam.Coverage track.)

Click Color Alignment by, and then Read Strand.

Step 6E: FOXA2 Read Gap Question

What is happening in the highlighted portion? (You won’t see the green rectangle; we added it here to highlight that region.)

Step 6F: Viewing SNP Calls

Zoom in (double-click several times) on a SNP to see the base calls on each read. Note: the double-click may be tricky to do; keep trying!

How to zoom out once this is done?

Find the instructions here:

http://software.broadinstitute.org/software/igv/?q=navigate#zoom

Done with IGV!

• You may close the software by going to “File” (top left), then “Exit”

Step 7: SnpEff Results (do the remaining slides on your own)

SnpEff gives a nice summary HTML file.

Navigate to the results directory for this lab:

UIUC: Desktop -> VM -> 03_Variant_Calling -> results

Mayo: Desktop -> datafiles -> 03_Variant_Calling -> results

Open snpEff_summary.html in each of the following subdirectories:

1. snpeff_snp_results

2. snpeff_indel_results

Browse each of the HTML files and note the results of the following slides:

Step 7B: SnpEff Summary of SNPs

Step 7B: SnpEff Summary of Indel Lengths

The SnpEff indel summary shows the following distribution of indel lengths:
