
Introduction to Gene Mining Part A: BLASTn-off!

Date post: 19-Aug-2015
Upload: adcobb

After Part A you will demonstrate your ability to:

• Use the bioinformatics NCBI Gene and BLASTn tools to search for a human gene of interest in a plant model.

• Evaluate the significance of your search results to see how similar human and plant genes might be.

The Arabidopsis Information Portal is funded by a grant from the National Science Foundation (#DBI-1262414) and co-funded by a grant from the Biotechnology and Biological Sciences Research Council (BB/L027151/1).

These lessons were developed during the summer of 2015 as education outreach for the www.Araport.org portal in conjunction with the J. Craig Venter Institute, Rockville, MD, 20850, USA.

Contact information

General information: [email protected]

Jason Miller, Grant Co-Principal Investigator, JCVI [email protected]

This lesson was prepared by Andrea Cobb, Ph.D. ([email protected]) with the help of Margot Goldberg ([email protected]).

The images below are all examples of….?


What science models do you recall?

• Lipid bilayer model

• Lock and key model of enzymes

• Stickleback model of evolution

• Computer models

• Experimental model of osmosis


Why use models instead of the “real thing”?

• To simplify a complex system. Example: Study an enzyme reaction in a test tube rather than in the whole organism, which contains many enzymes.

• To better manipulate and measure an effect. Example: Treat Drosophila with drug X and measure the drug’s effect on Drosophila life span.

• To predict (test the model). Example: Use a computer model to find protein coding regions in the DNA of a newly sequenced genome.

Other ideas?


Thanks for volunteering for our study. Your chart says you have problems eating, facial weakness, and overall poor muscle tone. Looks like your mother had the same symptoms.

Your diagnosis is nemaline myopathy. I am sad to tell you that no known treatment exists, but my researchers and I are working hard to find a treatment.

You can find information on this genetic disorder on a website called Online Mendelian Inheritance in Man (http://www.OMIM.org). The OMIM database shows that you might have a mutation in your Actin alpha 1 gene.

We won’t experiment on you! It is much faster, kinder and less expensive to use a plant model.

Thanks for your help, Doctor!

This video explains why scientists use a certain plant as a model: https://www.youtube.com/watch?v=foHiKrlY9Qc


Which plant will you use to study a version of my actin alpha 1 (ACTA1) gene?

https://www.arabidopsis.org/portals/education/aboutarabidopsis.jsp


Can plants really be used as models for studying human diseases?


Xiang Ming Xu and Simon Geir Møller, Current Opinion in Biotechnology, 2011, 22, 300-307.

• http://www.bbc.co.uk/programmes/p00lx6cl

• https://www.youtube.com/watch?v=eDA8rmUP5ZM

http://aboutlifting.com/music-helps-plants-grow-and-will-help-muscles-grow/

Before we find out whether plants have human muscle genes, it would be important to know if plants move!


Why don’t you rest? I am going to search the OMIM database to find out more about your possible gene mutation.

Use your computer and go to http://www.OMIM.org and find out more about nemaline myopathy and the ACTA1 gene that may be involved.

After you answer questions on your handout, type in any human disease that interests you and examine the results.


Use your textbook, open access textbooks, videos and databases to begin to find information about muscle genes and proteins.

https://www.boundless.com/biology/

Usually, a general search engine will give you too many hits for the question below!


108 results

Even a broad scientific database may provide too many unrelated hits!

Why are there SO MANY results?


“BIG DATA”

Biologists are increasingly able to generate enormous amounts of data quickly, but analyzing that data may take weeks or even years. Data transfer protocols are not interchangeable, data storage is expensive, and queries can crash!

https://en.wikipedia.org/wiki/List_of_RNAs


What scientific approach finds better information?

• Bioinformatics is an interdisciplinary approach which uses computational, mathematical, and engineering methods to analyze and make discoveries from enormous data sets.


To address the problem of BIG DATA, scientists can share data and analysis with other scientists.

This speeds analysis and adds expertise. Scientists can share their data in research-specific portals.

These research-specific portals usually have customized bioinformatics tools.


A few examples of how bioinformatics is used, and the questions each use addresses:

• Basic research: How is DNA organized in chromosomes? Are genes related to other genes? Given sequence data, how do we find a gene? How are genes expressed in response to the environment?

• Biomedicine: Will this drug work on this patient? Can we cure genetic diseases? Which genetic variations are associated with heart disease? Which pathogen proteins are best for vaccine development?

• Microbiology: Can microbes remove pollution? Can microbes decrease the impact of climate change? Where did a disease originate?

• Agriculture: Can drought-resistant plants be identified, bred or engineered? Can insect-resistant plants improve food supplies? Can more healthful food sources be developed?


Scientists are more likely to find useful information in bioinformatics portals that support their particular research.


Explore: an example of increasingly more specific research-centered portals

• National Center for Biotechnology Information: http://www.ncbi.nlm.nih.gov/gene

• Araport: https://www.araport.org/

• FLOR-ID: http://www.phytosystems.ulg.ac.be/florid/


For our plant model to be useful for my research, I must find a similar plant version of the ACTA1 gene involved in nemaline myopathy.

Since plants and animals both move, do they use the same types of proteins to move? Do they have the same genes coding for these proteins?

Begin your search on the NCBI portal to find names of human muscle genes.

Go to http://www.ncbi.nlm.nih.gov/, enter the information shown, and use the pull-down menu to select Gene. (Note: Araport.org and similar genome browsers will also allow you to search for genes and proteins of interest.)
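The same Gene search can also be written as a web address. The sketch below only builds a URL for NCBI's E-utilities `esearch` service; it does not download anything. The field tags `[sym]` (gene symbol) and `[orgn]` (organism), the `retmode=json` option and the helper name `gene_search_url` are our assumptions for illustration, not part of this lesson. Pasting the printed URL into a browser would be expected to return a short list of matching Gene IDs.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (this sketch only builds the URL; nothing is fetched).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol: str, organism: str) -> str:
    """Hypothetical helper: URL that searches NCBI Gene for a symbol in one organism."""
    # [sym] limits the match to the gene symbol; [orgn] limits it to the organism.
    term = f"{symbol}[sym] AND {organism}[orgn]"
    query = urlencode({"db": "gene", "term": term, "retmode": "json"})
    return f"{ESEARCH}?{query}"


url = gene_search_url("ACTA1", "Homo sapiens")
print(url)
```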


Could plant and animal versions of this gene have a function in common?


Actin subunits self-assemble to form filaments which have a role in cell structure.

Check the “Inner Life of the Cell” video: https://www.youtube.com/watch?v=FzcTgrxMzZk (2:20 until 3:15).

https://www.youtube.com/watch?v=VVgXDW_8O4U is a video showing polymerization of G-actin, a protein similar to alpha actin.

This is how your actin should work.


Click on FASTA to obtain the human ACTA1 gene sequence.

If it is reasonable that plants might have a gene similar to human ACTA1, you will need to find the ACTA1 gene sequence.


Copy the ACTA1 gene sequence, then paste it into a new Word document or clipboard; we will use it to look for an Arabidopsis thaliana version of this gene. Save the Word document as “human ACTA1 DNA sequence”.
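A FASTA record is one header line starting with `>`, followed by the sequence split over several lines. BLAST accepts the pasted record as it is, but if you want to check how long your sequence is before searching, a minimal Python sketch like this one strips the header and joins the lines. The record below is a made-up placeholder, not the real ACTA1 sequence, and `read_fasta` is a hypothetical helper name.

```python
def read_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    seq_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first
            header = line[1:].strip()
        elif header is not None:
            seq_lines.append(line.upper())
    return header or "", "".join(seq_lines)


# Made-up placeholder record (NOT real ACTA1 data), wrapped over lines like FASTA output.
example = """>placeholder_sequence made-up example
ATGTGCGACG
AAGAGGAGAC
"""
header, seq = read_fasta(example)
print(header)
print(len(seq), "nucleotides")
```

With the real ACTA1 record, the printed length is how you could confirm the query size mentioned later in this lesson.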


I want to search for a version of the human ACTA1 gene in Arabidopsis thaliana.

What bioinformatics tool could I use?


BLAST Types

BLASTn compares 2 or more DNA sequences

BLASTp compares 2 or more protein sequences

BLASTX translates a DNA sequence in all 6 possible reading frames, then compares the translations to a protein sequence database

tBLASTX translates both the DNA query and the DNA database sequences in all 6 reading frames, then compares the translations
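The “6 reading frames” are 3 frames on the strand you pasted (starting at nucleotide 1, 2 or 3) plus 3 frames on the reverse complement (the opposite strand, read in the opposite direction). A short sketch, assuming the standard genetic code, shows how one DNA string becomes six protein strings. The function names are hypothetical, and the real BLASTX and tBLASTX programs do much more than this translation step.

```python
# Standard genetic code, indexed by codon with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Protein translation of frames +1, +2, +3 (given strand) and -1, -2, -3 (other strand)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frames("ATGGCC"))
# {'+1': 'MA', '+2': 'W', '+3': 'G', '-1': 'GH', '-2': 'A', '-3': 'P'}
```

`*` marks a stop codon. Only one of the six frames usually codes for the real protein, which is why the translated BLAST programs check all of them.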


There are several ways to access NCBI BLAST. Start at the NCBI home page (http://www.ncbi.nlm.nih.gov/), then select BLAST.

Or go directly to the BLAST page: http://blast.ncbi.nlm.nih.gov/Blast.cgi

Select nucleotide blast.

If I have a known DNA sequence, how can I use BLASTn to look for an unknown similar sequence?


Click on FASTA to obtain the human ACTA1 gene sequence.

You found a human gene to compare…


And you’ve already copied and pasted the ACTA1 gene sequence into a Word document or clipboard; we will use it to look for an Arabidopsis thaliana version of this gene.


Steps to use BLASTn:

1. Paste in your copied ACTA1 sequence.

2. Enter the name of the organism in which we are looking for the same gene (Arabidopsis thaliana).

3. Select the program: use “Somewhat similar sequences” for the broadest search.

4. Check “Show results in a new window”, then click the BLAST button.
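The four form steps map roughly onto parameters of NCBI's BLAST URL API. The sketch below only assembles a request body and does not submit anything. The parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `ENTREZ_QUERY`) reflect our reading of that API and should be checked against NCBI's documentation before use. Treating plain `blastn` as the “Somewhat similar sequences” option and searching the `nt` database are also assumptions, not settings this lesson specifies. Step 4 (showing results in a new window) is a web-page option with no counterpart here.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blastn_request_body(query_seq: str, organism: str) -> str:
    """Hypothetical helper: form-encoded body mirroring web-form steps 1 to 3."""
    params = {
        "CMD": "Put",                               # submit a new search
        "PROGRAM": "blastn",                        # step 3 (assumed = "Somewhat similar sequences")
        "DATABASE": "nt",                           # assumption: nucleotide collection
        "QUERY": query_seq,                         # step 1: the pasted sequence
        "ENTREZ_QUERY": f'"{organism}"[organism]',  # step 2: limit hits to one organism
    }
    return urlencode(params)


# Placeholder sequence, not real ACTA1 data.
body = blastn_request_body("ATGTGCGACGAAGAGGAGAC", "Arabidopsis thaliana")
print(BLAST_URL)
print(body)
```

If you ever do send such a request, NCBI asks users to follow its usage guidelines, for example by not checking for results too frequently.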

What information is provided in an NCBI BLASTn report?

The Graphics Section shows the query sequence in the red bar (green arrow) and aligned sequences are shown in colored tracks below.

Each “track” represents a sequence that the BLASTn tool discovered in the database that is similar to your query sequence. The colored sections in each track are blocks of DNA which align with varying similarity (score), shown by the colored bar above. The black lines connecting the colored blocks are poorly aligned sequences (less than 40% identity).

Move the mouse over a block to see the definition and score for that sequence result (also called “hit”).

By clicking on a colored box, you will jump to the actual DNA alignment farther down the page.


What information is provided in an NCBI BLASTn report?

The Descriptions Section lists the aligned sequence names and provides information about the alignment. In this search, we are using one gene sequence to find a similar gene sequence. Look at the results that end in “gene”.

What is gene alignment?

What BLASTn values tell us whether the alignment is meaningful?


For a more detailed explanation of alignment, watch https://www.youtube.com/watch?v=6Udqou3vmng from 31:13 to 40:15.

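Parts of an alignment:

• Query: your input sequence

• Subject: the matching sequence from the database used for the search

• Numbers at the ends of the Query line: starting and ending nucleotides of your query

• Numbers at the ends of the Subject line: starting and ending nucleotide coordinates for this sequence in its database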
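BLAST prints each alignment as a Query line, a match line and a Subject (“Sbjct”) line, with start and end coordinates at either end. The sketch below rebuilds that layout from two aligned strings so you can see where the coordinates come from: gaps (`-`) take up space in the alignment but are not counted as nucleotides. The sequences and start positions are made up, `alignment_block` is a hypothetical helper, and the sketch assumes both sequences run in the same (Plus/Plus) direction.

```python
def alignment_block(query_aln: str, subject_aln: str, q_start: int, s_start: int) -> str:
    """Format one BLAST-style alignment block with start and end coordinates."""
    # Match line: '|' where the bases are identical, a space for mismatches or gaps.
    match_line = "".join(
        "|" if q == s and q != "-" else " " for q, s in zip(query_aln, subject_aln)
    )
    # End coordinates count only real nucleotides, not gap characters.
    q_end = q_start + len(query_aln.replace("-", "")) - 1
    s_end = s_start + len(subject_aln.replace("-", "")) - 1
    width = max(len(str(q_start)), len(str(s_start)))
    return "\n".join([
        f"Query  {q_start:<{width}}  {query_aln}  {q_end}",
        f"       {'':<{width}}  {match_line}",
        f"Sbjct  {s_start:<{width}}  {subject_aln}  {s_end}",
    ])


block = alignment_block("ACGTAC-GT", "ACGAACTGT", q_start=101, s_start=5001)
print(block)
```

The query line ends at 108, not 109, because the gap in the query covers no nucleotide.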


BLASTn looks for high-scoring local alignments: shorter stretches of the query that align well with sequences in the database. Local alignment does not require the entire query to align. Matching nucleotides are given a score of +1, mismatches receive a negative score, and gaps are penalized. Different algorithms use different scoring details, but this is the general idea.
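To make the scoring idea concrete, here is a small local-alignment scorer in the Smith-Waterman style, using +1 for a match, -1 for a mismatch and -2 per gap position. Those exact penalty values are our assumption for illustration. BLASTn itself does not fill a full table like this; it uses a faster seed-and-extend shortcut, and its default scores differ, but the goal of finding the best-scoring local stretch is the same.

```python
def local_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gap penalty)."""
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned against a gap
            left = H[i][j - 1] + gap  # b[j-1] aligned against a gap
            # Scores never drop below 0, so a local alignment can start fresh anywhere.
            H[i][j] = max(0, diagonal, up, left)
            best = max(best, H[i][j])
    return best


print(local_alignment_score("TTACGTTT", "GGACGTGG"))  # 4: only the shared ACGT stretch counts
```

The mismatched ends of the two strings do not lower the score, because a local alignment is free to leave them out.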


“Query cover” tells what percentage of your input sequence (query) is included in the aligned segments.

Note that the query is more than 2750 nucleotides long.
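Query cover can be estimated from the query coordinates of the aligned blocks: merge any overlapping blocks, count the query positions they cover, and divide by the query length. The numbers below are hypothetical, chosen so the result matches the 20% coverage discussed in this lesson, and `query_cover` is a hypothetical helper; NCBI's own calculation may differ in small details.

```python
def query_cover(query_length: int, hsp_ranges: list[tuple[int, int]]) -> float:
    """Percent of query positions covered by at least one aligned block.

    hsp_ranges holds 1-based, inclusive (start, end) query coordinates.
    """
    merged: list[list[int]] = []
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsp_ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping or touching: extend
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return 100 * covered / query_length


# Hypothetical 2800-nt query with two overlapping aligned blocks.
print(query_cover(2800, [(1, 300), (250, 560)]))  # 20.0
```

Merging first matters: counting the two blocks separately would double-count positions 250 to 300.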


The query coverage is low here (20%) because you are comparing 2 DNA sequences that contain exons (conserved, thus aligned) and introns (not highly conserved, thus non-aligned or poorly aligned).


Although only 20% of the query aligns to a sequence in the Arabidopsis database, 80% of the aligned part is identical to the query (see the “Ident” value of 80% and the color-coded portions of the result track).
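The “Ident” value is the number of identical positions divided by the length of the alignment, where gap columns count toward the length. A minimal sketch with a made-up alignment (the helper name `percent_identity` is hypothetical):

```python
def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Identical columns divided by alignment length (gap columns included), times 100."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    return 100 * matches / len(query_aln)


identity = percent_identity("ACGTAC-GT", "ACGAACTGT")
print(f"{identity:.0f}%")  # 7 of 9 columns identical -> 78%
```

Note that identity only describes the aligned part; a hit can show high identity and still have low query cover, as in the ACTA1 example.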


“Alignments” provides details about nucleotide locations, matches, gaps or mismatches.

Access more info about the sequence by clicking on the sequence ID


The E-value indicates the number of alignments with an equivalent or better score that would be expected in this database just by chance. For example, a one-in-a-million (1/1,000,000) chance is a very small chance and would be written 1e-6. The lower the E-value, the more significant the score (the less likely it is due just to chance). E-values are written in scientific notation, for example: 3e-80 = 3 × 10⁻⁸⁰.


In general, an E-value of 1 × 10⁻⁵ (1e-5) or smaller is considered significant (not just aligned by chance).
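Because E-values use scientific notation, they can be compared directly as numbers. A minimal sketch, assuming the 1e-5 rule of thumb from this lesson; `is_significant` is a hypothetical helper, and the sample values other than 3e-80 are made up.

```python
SIGNIFICANCE_CUTOFF = 1e-5  # general rule of thumb from this lesson


def is_significant(evalue_text: str, cutoff: float = SIGNIFICANCE_CUTOFF) -> bool:
    """True if an E-value written like '3e-80' is at or below the cutoff."""
    return float(evalue_text) <= cutoff


# "One in a million" written in BLAST's notation.
assert float("1e-6") == 1 / 1_000_000

for value in ["3e-80", "1e-5", "0.02"]:
    print(value, is_significant(value))
```

Here 3e-80 and 1e-5 pass the cutoff, while 0.02 does not.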

This is from the Alignments Section and shows the details.

By default, results are sorted from lowest E-value to highest. Compare the E-value, Query cover and % identity for the checked “hits”. Which GENE is most similar to the human ACTA1 sequence query?

Click on the accession number for more information about the gene that had the most significant alignment.
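The comparison asked for here can be sketched as: keep the hits that pass the E-value cutoff, then sort by E-value, breaking ties by higher query cover and then higher identity. The tie-breaking order is our own assumption, and the hit labels are hypothetical placeholders, not real accessions; only the 1e-80 / 20% / 80% values echo the example in this lesson.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    label: str          # hypothetical placeholder name, not a real accession
    evalue: float
    query_cover: float  # percent
    identity: float     # percent


def rank_hits(hits: list[Hit], cutoff: float = 1e-5) -> list[Hit]:
    """Significant hits only, best first: lowest E-value, then higher cover, then higher identity."""
    significant = [h for h in hits if h.evalue <= cutoff]
    return sorted(significant, key=lambda h: (h.evalue, -h.query_cover, -h.identity))


hits = [
    Hit("hit_A", 2e-30, 12.0, 76.0),
    Hit("hit_B", 1e-80, 20.0, 80.0),
    Hit("hit_C", 0.4, 3.0, 95.0),  # high identity, but the E-value is not significant
]
ranked = rank_hits(hits)
print([h.label for h in ranked])
```

Note that hit_C is dropped even though its identity is highest: a tiny aligned stretch can be 95% identical just by chance.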


Amino acid sequence

Link for more info!


Explain the process you used to find a version of the human ACTA1 gene in Arabidopsis thaliana.

What information did you use to indicate that the plant version was a meaningful find?


Extend: You choose!

1. Pick a human gene that you think is highly conserved between plants and animals.

2. Follow the procedure you just learned to see if a similar Arabidopsis version exists.

3. Record your info on the scorecard.

4. Repeat for a gene that you predict is unique to humans.


Gene Discovery Scorecard

Human gene name: Actin alpha 1
Human gene ID: ACTA1
Human gene function: Cytoskeletal structure
Arabidopsis gene name: Actin 7
Arabidopsis gene ID: ACT7
Arabidopsis gene function: Cytoskeletal structure
Outcome evidence (score, E-value, similar function): E-value was 1e-80, not random, both have similar functions…
Prediction? Yes


• What information so far indicates whether or not plants have animal muscle genes?

• What additional information might you need to be more certain whether ACT7 is a plant version of human ACTA1?


