+ All Categories
Home > Documents > Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with...

Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with...

Date post: 03-Jul-2019
Category:
Upload: dangthu
View: 215 times
Download: 0 times
Share this document with a friend
19
Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session. Commands should be copy/pasted from this Gist Aaron Quinlan University of Utah quinlanlab.org 1 https://gist.github.com/arq5x/9e1928638397ba45da2e#file-autosomal-dominant-sh
Transcript
Page 1: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Identifying dominant gene candidates with GEMINI

Please refer to the following Github Gist to find each command for this session. Commands should be copy/pasted from this Gist

Aaron Quinlan University of Utah

!!!!!quinlanlab.org

1

https://gist.github.com/arq5x/9e1928638397ba45da2e#file-autosomal-dominant-sh

Page 2: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Autosomal dominant pedigrees

Jessica Chong2

Page 3: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Create a PED file (done for you)

Jessica Chong3

Page 4: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Create a GEMINI database from a VCF

Notes: 1. The VCF has been normalized and decomposed with VT 2. The VCF has been annotated with VEP.

$  curl  https://s3.amazonaws.com/gemini-­‐tutorials/trio.trim.vep.vcf.gz  >  trio.trim.vep.vcf.gz  

$  curl  https://s3.amazonaws.com/gemini-­‐tutorials/dominant.ped  >  dominant.ped  

$  gemini  load  -­‐-­‐cores  4  \  

                           -­‐v  trio.trim.vep.vcf.gz  \  

                           -­‐t  VEP  \  

                           -­‐-­‐skip-­‐gene-­‐tables  \  

                           -­‐p  dominant.ped  \  !                  trio.trim.vep.dominant.db

Note: copy and paste the full command from the Github Gist to avoid errors

http://gemini.readthedocs.org/en/latest/content/preprocessing.html#step-1-split-left-align-and-trim-variants

4

Page 5: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Running the autosomal_dominant tool

$ gemini autosomal_dominant --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \ trio.trim.vep.dominant.db \ | head \ | column -t

Note: copy and paste the full command from the Github Gist

chrom    start            end                ref    alt    gene                impact                          cadd_raw    variant_id    family_id    family_members    family_genotypes    samples        family_count  chr15    59508889      59508890      T        C        AC092756.1    mature_miRNA              -­‐0.78          8480                family1        1805,1847,4805    T/C,T/T,T/C              1805,4805    1  chr15    84868493      84868494      G        C        AC103965.1    downstream                  -­‐0.37          9360                family1        1805,1847,4805    G/C,G/G,G/C              1805,4805    1  chr15    75966884      75966885      G        A        AC105020.1    upstream                      -­‐0.74          8906                family1        1805,1847,4805    G/A,G/G,G/A              1805,4805    1  chr15    84945184      84945185      C        T        AC136704.1    upstream                      0.01            9370                family1        1805,1847,4805    C/T,C/C,C/T              1805,4805    1  chr15    84945511      84945512      C        T        AC136704.1    upstream                      0.74            9371                family1        1805,1847,4805    C/T,C/C,C/T              1805,4805    1  chr15    84949950      84949951      C        G        AC136704.1    mature_miRNA              -­‐0.43          9378                family1        1805,1847,4805    C/G,C/C,C/G              1805,4805    1  chr15    89398552      89398553      C        A        ACAN                non_syn_coding          2.57            9519                family1        1805,1847,4805    C/A,C/C,C/A              1805,4805    1  chr15    100649247    100649248    G        A        ADAMTS17        synonymous_coding    1.51            9856                family1        1805,1847,4805    G/A,G/G,G/A              1805,4805    1  chr15    100692844    100692845    A        G        ADAMTS17        non_syn_coding          -­‐1.16          9858                family1        1805,1847,4805    A/G,A/A,A/G              1805,4805    1

5

Page 6: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Time to start filtering

$ gemini autosomal_dominant \ --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \ trio.trim.vep.dominant.db \ | wc -l

Note: copy and paste the full command from the Github Gist

1515  candidates

6

Page 7: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Again, use the -­‐-­‐filter option

$ gemini autosomal_dominant \ --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \ --filter "filter is NULL" \ trio.trim.vep.dominant.db \ | wc -l

Note: copy and paste the full command from the Github Gist

1288  candidates

Exclude variants that failed GATK filters

7

Page 8: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Further restrict variants with functional consequence

$ gemini autosomal_dominant \ --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \ --filter "filter is NULL and impact_severity != 'LOW'" \ trio.trim.vep.dominant.db \ | wc -l

Note: copy and paste the full command from the Github Gist

347  candidates

8

Page 9: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

$ gemini autosomal_dominant \ --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \ --filter "filter is NULL and impact_severity != 'LOW' and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" trio.trim.vep.dominant.db \ wc -l

Use ESP and ExAC to focus on rare variants

Note: copy and paste the full command from the Github Gist

40  candidates

9

Page 10: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

$ gemini autosomal_dominant \ --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \ --filter "filter is NULL and impact_severity == 'HIGH' and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" trio.trim.vep.dominant.db \ wc -l

Let’s be more strict with the functional consequence

Note: copy and paste the full command from the Github Gist

chrom    start          end              ref    alt    gene          impact                      cadd_raw    variant_id    family_id    family_members    family_genotypes    samples        family_count  chr17    28778816    28778817    T        C        CPD            stop_loss                1.09            11860              family1        1805,1847,4805    T/C,T/T,T/C              1805,4805    1  chr2      21236250    21236251    G        A        APOB          stop_gain                7.82            492                  family1        1805,1847,4805    G/A,G/G,G/A              1805,4805    1  chr22    21363743    21363744    T        C        TUBA3FP    splice_acceptor    2.46            15272              family1        1805,1847,4805    T/C,T/T,T/C              1805,4805    1

10

Page 11: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Are any of these variants known to underlie a clinical phenotype?

chrom    start          end              ref    alt    gene          impact                      cadd_raw    variant_id    family_id    family_members    family_genotypes    samples        family_count  chr17    28778816    28778817    T        C        CPD            stop_loss                1.09            11860              family1        1805,1847,4805    T/C,T/T,T/C              1805,4805    1  chr2      21236250    21236251    G        A        APOB          stop_gain                7.82            492                  family1        1805,1847,4805    G/A,G/G,G/A              1805,4805    1  chr22    21363743    21363744    T        C        TUBA3FP    splice_acceptor    2.46            15272              family1        1805,1847,4805    T/C,T/T,T/C              1805,4805    1

How could you extend the query from the previous slide to answer this? Hint: use the gemini documentation

11

Page 12: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Custom analyses: Use the query module to

identify autosomal dominant variants

12

Page 13: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Using the -­‐-­‐gt-­‐filter

We need to use the query module to enforce an autosomal dominant inheritance pattern.

13

Page 14: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Using the -­‐-­‐gt-­‐filter

Note: copy and paste the full command from the Github Gist

$ gemini query \ -q "SELECT chrom, start, end, ref, alt, gene, impact, (gts).(*) \ FROM variants" \ --header \ --gt-filter "gt_types.4805 == HET \ and gt_types.1805 == HET \ and gt_types.1847 == HOM_REF" \ trio.trim.vep.dominant.db \ | head \ | column -t

chrom    start        end            ref    alt    gene                impact              gts.1805    gts.1847    gts.4805  chr2      229933      229934      A        G        SH3YL1            intron              A/G              A/A              A/G  chr2      277249      277250      G        A        ACP1                downstream      G/A              G/G              G/A  chr2      675830      675831      G        T        TMEM18            intron              G/T              G/G              G/T  chr2      905441      905442      A        T        AC113607.2    upstream          A/T              A/A              A/T  chr2      1636903    1636904    C        T        PXDN                UTR_3_prime    C/T              C/C              C/T  chr2      1642789    1642790    T        C        PXDN                intron              T/C              T/T              T/C  chr2      3653841    3653842    C        A        COLEC11          intron              C/A              C/C              C/A  chr2      3660868    3660869    G        A        COLEC11          intron              G/A              G/G              G/A  chr2      3661001    3661002    G        A        COLEC11          intron              G/A              G/G              G/A

14

Page 15: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

What about large pedigrees or multiple families? --gt-­‐filter  “wildcards”

15

Page 16: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Using the --gt-­‐filter  “wildcards”

Note: copy and paste the full command from the Github Gist

$ gemini query \ -q "SELECT chrom, start, end, ref, alt, gene, impact, (gts).(*) \ FROM variants" \ --header \ --gt-filter "(gt_types).(phenotype==2).(==HET).(all) \ and (gt_types).(phenotype==1).(==HOM_REF).(all)” \ trio.trim.vep.dominant.db \ | head \ | column -t

Affected individuals must be HETUnffected individuals must be HOM_REF

chrom    start        end            ref    alt    gene                impact              gts.1805    gts.1847    gts.4805  chr2      229933      229934      A        G        SH3YL1            intron              A/G              A/A              A/G  chr2      277249      277250      G        A        ACP1                downstream      G/A              G/G              G/A  chr2      675830      675831      G        T        TMEM18            intron              G/T              G/G              G/T  chr2      905441      905442      A        T        AC113607.2    upstream          A/T              A/A              A/T  chr2      1636903    1636904    C        T        PXDN                UTR_3_prime    C/T              C/C              C/T  chr2      1642789    1642790    T        C        PXDN                intron              T/C              T/T              T/C  chr2      3653841    3653842    C        A        COLEC11          intron              C/A              C/C              C/A  chr2      3660868    3660869    G        A        COLEC11          intron              G/A              G/G              G/A  chr2      3661001    3661002    G        A        COLEC11          intron              G/A              G/G              G/A16

Page 17: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Can apply multiple --gt-­‐filter  “wildcards”

Note: copy and paste the full command from the Github Gist

$ gemini query \ -q "SELECT chrom, start, end, ref, alt, gene, impact, \ (gts).(*), (gt_depths).(*) \ FROM variants" \ --header \ --gt-filter "(gt_types).(phenotype==2).(==HET).(all) \ and (gt_types).(phenotype==1).(==HOM_REF).(all) \ and (gt_depths).(*).(>=20).(all)" \ trio.trim.vep.dominant.db \ | head \ | column -t

Affected individuals must be HETUnaffected individuals must be HOM_REF

chrom    start        end            ref    alt    gene                impact              gts.1805    gts.1847    gts.4805    gt_depths.1805    gt_depths.1847    gt_depths.4805  chr2      229933      229934      A        G        SH3YL1            intron              A/G              A/A              A/G              161                          225                          231  chr2      277249      277250      G        A        ACP1                downstream      G/A              G/G              G/A              183                          237                          234  chr2      905441      905442      A        T        AC113607.2    upstream          A/T              A/A              A/T              90                            142                          246  chr2      1636903    1636904    C        T        PXDN                UTR_3_prime    C/T              C/C              C/T              54                            87                            174  chr2      1642789    1642790    T        C        PXDN                intron              T/C              T/T              T/C              74                            120                          179  chr2      3653841    3653842    C        A        COLEC11          intron              C/A              C/C              C/A              123                          223                          199  chr2      3660868    3660869    G        A        COLEC11          intron              G/A              G/G              G/A              76                            96                            250  chr2      3661001    3661002    G        A        COLEC11          intron              G/A              G/G              G/A              70                            86                            182  chr2      6879884    6879885    T        A        LINC00487      intron              T/A              T/T              T/A              83                            110                          106

Everyone must have sequence depth >=20

17

Page 18: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

We have the inheritance model, now apply the annotation filters

Note: copy and paste the full command from the Github Gist

$ gemini query \ -q "SELECT chrom, start, end, ref, alt, gene, impact, \ (gts).(*), (gt_depths).(*) \ FROM variants \ WHERE filter is NULL and impact_severity == 'HIGH' and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" \ --header \ --gt-filter "(gt_types).(phenotype==2).(==HET).(all) \ and (gt_types).(phenotype==1).(==HOM_REF).(all) \ and (gt_depths).(*).(>=20).(all)" \ trio.trim.vep.dominant.db \ | column -t

chrom    start          end              ref    alt    gene          impact                      gts.1805    gts.1847    gts.4805    gt_depths.1805    gt_depths.1847    gt_depths.4805  chr2      21236250    21236251    G        A        APOB          stop_gain                G/A              G/G              G/A              46                            72                            112  chr17    28778816    28778817    T        C        CPD            stop_loss                T/C              T/T              T/C              144                          171                          207  chr22    21363743    21363744    T        C        TUBA3FP    splice_acceptor    T/C              T/T              T/C              36                            56                            241

Note that (pleasingly) these are the same three candidates as detected with the autosomal_dominant tool.

18

Page 19: Identifying dominant gene candidates with GEMINI · Identifying dominant gene candidates with GEMINI Please refer to the following Github Gist to find each command for this session.

Load the following files into IGV (Load from URL) and inspect your candidates

BAM alignment files: !https://s3.amazonaws.com/gemini-­‐tutorials/1805.workshop.bam  https://s3.amazonaws.com/gemini-­‐tutorials/1847.workshop.bam  https://s3.amazonaws.com/gemini-­‐tutorials/4805.workshop.bam

VCF variant file: !https://s3.amazonaws.com/gemini-­‐tutorials/trio.trim.vep.vcf.gz  

!

19


Recommended