2. Intro MolBio (3 hours) - Home. Tecnun. Universidad de .... Intro_MolBio.pdf · Central dogma...

1

Introduction to molecular biology

Summary

• Cells

• Chromosomes

• DNA

• RNA

• Aminoacids

• Proteins

• Genomics

• Transcriptomics

• Proteomics

2

Cells

All the living beings are composed of cells, thatare the basic unit of life. Each cell derives fromother cell.

Prokaryotes

No nucleus or internal membranes. Eukaryotes

• Nucleus. • Internal membranes.• Organelles inside the cell that play different and specific roles.

Organisms can be:Unicellular

• Prokaryotes: bacteria, rchaea.• Eukaryotes: baker yeast.

Multicellular

•Eukaryotes: animals, plants, fungi…

Human beings: 60 E18 cells, 320 different types

Cells

Composition

70% Water7% Small molecules:

• Salts• Lipids• Aminoacids• Nucleotides

23% Macromolecules:• Proteins• Polysaccharide

Cell functions:

A cell contains all the necessary information toperform a replication (a virus does not!). Processesdeveloped by cells include:

Metabolic pathwaysTraduction of RNA to proteins…

3

Chromosomes

• The nucleus of Eukaryots containsone or more DNA molecules (doublestranded). Each of thesesupermoleluces are calledchomosomes.

• For examples, human beings have22 pairs of autosomes) and 1 pair ofsexual chromosomes. :

Cells

• Almost all the cells in an organism have the samegenome (some times there are slight differences).

• The DNA represents all the information needed by thecell to perform its functions.

4

Three basic macromolecules for life

• DNA

– It contains all the information needed by the cell (the “hard drive”)

– Actually, since almost all the cells in an organism share the samegenome, it contains all the information needed by ANY cell to performtheir functions.

– It stays (almost) always in the nucleus.

• RNA

– RNA has two main functions:

• It mimics the information in DNA (located in the nucleus) and migrates toother parts of the cell where this information is used (messenger RNA, mRNA)

• It has a crucial role in protein synthesis (transfer RNA, tRNA).

• Proteins

– Many different functions (signalling, structural, enzymes, regulation…). They are the key constituents of the organism.

Central dogma of molecular biology

• It is not a DOGMA– A dogma is some

important part of thefaith that must be believed.

– The researcher thatcoined this term finallyrecognized that “I did notknow what dogma meant”.

– There are strong supportto this… it is not a dogma (or at least there are other fields of knowledgethat deserve this termmuch more ☺.

5

DNA vs RNA

DNA: code of life

There are four different nucleotides for all living beings: Adenine (A), Guanine(G), Cytosine (C) y Thymine (T). They have two complimentary pairs: A-T andC-G

DeoxyriboNucleic Acid

6

DNA structure

DNA replication

7

Structure of a nucleotide

• Purines: Adenine (A) and Guanine (G). There is a double ring.

• Pyrimidines: Thymine (T), Cytosine (C) andUracil (U). Thymine is substituted by Uracilon RNA. Single ring.

One “nucleotide” is a compound formed by one base (A, C, T ó G), one sugarmolecule and phosphoric acid.

How to read a DNA sequence?

The DNA molecule is created when bonds betwen 5’

and 3’ of the nucleotides are set. DNA is alwyas readfrom 5’ to 3’. Funny equivocations…

All the nucleotides have two

bonds: 5’ and 3’. The number isthe position of the carbon atoms

in the sugar molecule.

Nucleotides, in turn, form a

phosphodiesther bond.

Sequence: TGACT

8

DNA

• It can be seen as a codewith only 4 letters instead of2 (binary coding).

• How many letters? 16 todescribe differentpossibilities.

None (gap)----

aNyG or A or T or CN

not-C, D follows CG or A or TD

not-T (not-U), V follows UG or C or AV

not-A, B follows AG or T or CB

not-G, H follows G in the alphabetA or C or TH

Weak interaction (2 H bonds)A or TW

Strong interaction (3 H bonds)G or CS

KetoG or TK

aMinoA or CM

pYrimidineT or CY

puRineG or AR

CytosineCC

ThymineTT

AdenineAA

GuanineGG

Origin of the nameMeaningSymbol

Code:

DNA is double stranded

• Hydrogen bonds between the nucleotide pairs..

• DNA is not symmetric!! It has two directions and isread, always, from 5’ to 3’.

• Both strands are complimentary: A-T and C-G

– Forward strand

– Reverse strand

9

Mitochondrial DNA

Mitochondrial DNA (mtDNA) is the DNA located in organelles called mitochondria. All mtDNA is received by the mother (since mitochondria is provided by the zygote.

Mitochondria are sometimes described as "cellular power plants," because they generate ATP, used as a source of chemical energy .

RNA (RiboNucleic Acid)

• Protein synthesis occurs in the Ribosomes• Organelles located in in the cytoplasm outside the nucleus.

• DNA is in the nucleus.• RNA transports the information from the nucleus to the Ribosomes

• The mechanism that creates RNA complimentary to DNA is called

Transcription.

DNA vs RNA:

(T) is substituted by uracil (U). • RNA is single stranded. It can bend and form two stranded chains (palindromes) (“Sit on a potato pan, Otis”).

10

Messenger RNA (mRNA)

• Part of the DNA is

trascripted into RNA (RNA is

a copy of the DNA).

• RNA goes to the cytoplasm

and in the ribosomes, mRNA

is used to build proteins.

• RNA itself is the message

from the nucleus to the

cytoplasm.

Transcription process

Inititiation

In the first stage, RNA polimerase binds to a region of DNA (that is called the promoter). The enzyme opens de DNA, and allows the creation of the RNA molecule thathas a complementary sequence to the DNA.

Elongation

RNA polimerase moves along the supporting strand andRNA nucleotides are inserted in the new RNA molecule

Termination

RNA termination process is a complex process (it involvespalidrons –hairpins- in prokaryots and more complexprocesses in eukaryots). Once it has finished, DNA isclosed again, and RNA moves form the nucleus to thecytoplasm.

11

Transcription in action

RNA maturation

• In Eukaryots, the sequence that appears in the genome is not exactly the onetranslated.

• RNA has a maturation process• Remove intermediate sequences called introns.

• Join the exons using polymerases.

• A single gene (DNA) can raise several variants (using different exons). This process iscalled Alternative Splicing.

12

What is a gene?

• Promoter region. It contains the necessary sequences to activate or deactivatethe gene. Limits are fuzzy and depends on different genes. Proximal promoter isconsidered to be 1000-50000 bp upstream the TSS (transcription start site)

• Exons: Coding regions of the gen (it converts into proteins) 1 to 178 exones/gene (average: 8.8)

8 bp to 17 kb /exon (average145 bp)

• Introns: Non coding region flanked by two exons.Size (average): 1 kb – 50 kb /intron (much larger than exons)

• Size of a gene: the largest: 2.4 Mb (Dystrophin). Average: 27 kb.

13

PROTEINS

Aminoacids

Amino acids are the basic structural building units of proteins. An amino acid is a molecule that contains both amine and carboxyl functional groups with the general formula H2NCHRCOOH, where R is an organic

substituent.

They form polymer chains

Short ones called peptides, large ones called polypeptides or proteins.

TranslationProcess to form the protein according to the mRNA template.

As both the amine and carboxylic acid groups of amino acids can react to form amide bonds, one amino

acid molecule can react with another and become joined through an amide linkage. This polymerization

of amino acids is what creates proteins.

14

Aminoacids

• 20 standard aminoacids• Bricks to build proteins.

• 10 essential amino acids

• Cannot be synthesized by human body. • They therefore must be obtained from food• Plants synthesizes all ofthem.

Aminoacids

Amino Acid 3-Letter 1-Letter polarity acidity hydrophobycityAlanine Ala A nonpolar neutral 1.8Arginine Arg R polar basic -4.5Asparagine Asn N polar neutral -3.5Aspartic acid Asp D polar acidic -3.5Cysteine Cys C polar neutral 2.5Glutamic acid Glu E polar acidic -3.5Glutamine Gln Q polar neutral -3.5Glycine Gly G nonpolar neutral -0.4Histidine His H polar basic -3.2Isoleucine Ile I nonpolar neutral 4.5Leucine Leu L nonpolar neutral 3.8Lysine Lys K polar basic -3.9Methionine Met M nonpolar neutral 1.9Phenylalanine Phe F nonpolar neutral 2.8Proline Pro P nonpolar neutral -1.6Serine Ser S polar neutral -0.8Threonine Thr T polar neutral -0.7Tryptophan Trp W nonpolar neutral -0.9Tyrosine Tyr Y polar neutral -1.3Valine Val V nonpolar neutral 4.2

15

Proteins

• Proteins are large molecules composed of aminoacids.

• Their 3D structure is complex

• It is not a double helix as DNA: the shape is different for each of

them.

• Proteins fold. This folding plays a crucial role in their function

• For example, mad cow disease is produced by an anormal folding

of a protein.

Protein structure

• Protein structure is crucial todetermine their chemicalproperties, and even, theirfunction.

• 3D structure determines whichare the aminoacids in thesurface.

• There are 4 levels at whichstructure can be studied:1. Aminoacid sequence

2. Polipeptide folding

3. Protein shape

4. Protein interactions (that includechanges in the positions of theatoms).

16

Central Dogma (once again)

• Transcription brings the data fromDNA to RNA

• RNA from the nucleus to theRibosoms

• Translation obtains protein accordingto the genetic code and thecorresponding mRNA

– tRNA is used as a lorry to carry theaminoacids as we will see.

Translation

• Translation is the second step in the central dogma. • mARN is decoded using the genetic code

• Aminoacids follow the sequence givenby mRNA.

• This process takes place in the cytoplasm.• tRNA is used as a “lorry” to carry theaminoacids.•Ribosomes are the factories to build the

proteins.

17

Trasnfer RNA (tRNA)

tRNA is a RNA that is used to carry aminoacids to the ribosomes in order to build teh proteins.tRNA abundance is larger than mRNA (75% vs 15%)

Most RNA in the cell is tRNAtRNA acknoledges mRNA and transfer the correspondign aminoacid to the protein beingcreated.

Genetic code

Codon: a sequence of 3 nucleotides that codes for an aminoacid according tothis table.

AUG codes methionine, and is also the start code. First AUG in mRNA is the region where translation starts.

18

Some exceptions:

Genetic code is almost universal.

Other considerations…

• A codon is a sequence of 3 nuclotides (DNA or RNA) thatcodes for a particular aminoacid.

– There are 4 possible bases (RNA) : A, C, G y U

– 3 bases per codon

• Therefore, there are: 4 * 4 * 4 = 64 possible codons

• Special codons:

– Start codon: AUG. Translation starts in this codon. It also codesfor an aminoacid (methinine)

– Stop codons (three flavours): UAA, UAG, UGA

• There are 61 codons left to code 19 aminoacids

– Genetic code is redundant: the same aminoacid may be codedby several codons.

19

Translation again:

Anticodon: A sequence of 3

nucleotides in tRNA that acknowledgethe corresponding codon in mRNA.

Using the anticodon, the aminoacid toinclude in the protein is selected.

tRNA carries the “free” aminoacid and, in the Ribosome, it is joined to thepolypeptide chain that it is beingcreated.

For example, tRNA with anticodonUAC, corresponds to the AUG codonthat, in turn, codes methionine.

ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG…

ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G…

M E V F K A P P I G I stop

Translation in action

20

In brief:

• Proteins are coded in the genes in ADN located in the nucleus. DNA stays always in the nucleus.

• Ribosomes are factories to build proteins located in the cytoplasm. mRNA carries the mesage from the nucleus to the ribosomes. There is an intermediate step called mRNA maturation in whichintrons are excluded and exons are retained.

• Ribosomes build what mRNA codes, using aminoacids that in turn, are carried by tRNA.– Ribosomes are composed of proteins and rRNA (a third class of RNA…

and there are even more!!)

Some important

Definitions in

BIOINFORMATICS

21

• Coding From DNA to protein is done by codons. There are three possibilities (startswith the first, the second or the third nucleotide in the sequence). We can use onestrand (forward) or the other (reverse strand). Each of these six possbilities are called a reading frame. Only one of them is valid. For example, this sequence has the following possibilities :

ATGCC (M) ATGCC (C) ATGCC (A)

• A sequence flanked by start codon and a stop codon is called an Open ReadingFrame (ORF).

ORF (Open Reading Frame):

Genomic Sequence

Open reading frame

ATG TGA

ORFs as gene candidates

• An open reading frame that begins with a start codon (ATG)

• Most prokaryotic genes code for proteins that are 60 or more amino acids in length

• The probability that a random sequence of nucleotides of length n has no stop codons is (61/64)n

• When n is 50, there is a probability of 92% that the random sequence contains a stop codon

• When n is 100, this probability exceeds 99%

– A large sequence without stop codons is probably coding a protein.

22

• Nucleic acids = composed of nucleotides= bases or base pairs

• Short form: nt (nucleotides), bp (base pairs).

• 400-nt: means 400 nucleotide positions (in DNA theyare 800!)

• 400-bp menas 400 base pairs

• 1000000-bp = 1000-kb = 1-Mb

Definitions:

Genomic analysis

How to build a whole genome in four steps:– Cut it!:

• Restriction enzymes break the DNA in specific sites.

• It is divided into sort pieces.

– Copy it!: • It is easy to copy DNA (it was designed for that!).

• We get several clones of each DNA sequence using the Polymerase ChainReaction (PCR).

– Using a cycle of PCR, the concentration of DNA is doubled

» 20 cycles of PCR increases the concentration by 2^20…

» (about 1 million times)

– Read it!:• Electrophoresis to read small fragments.

– Ensembl it!:• Using all the fragments, there are overlapping sequences that can be used to

perform the ensembl (just like building a puzzle).

• This puzzle has “large sky regions” difficult to build: there are large parts ofthe genomes quite repetitve (and they are also important).

23

Genomic analysis and bioinformatics

DNA sequence analysis

Gene hunting

Protein sequenceanalysis

2001: First draft versionof the human genome. 2003: Human genomecurated. First “releaseversion”. Mouse genomecompleted.

Protein function can be inferred from

their sequence and structure. Structure

analysis gives better resutls

Once that we have the sequence we can find genes (using

statistical properties of the intra gene regions).

It is also important to measure gene expression and predict

their function.

• DNA– Useful for genomic diseases

• Single gene (mendelian), chromosomal.

• Multifactorial o complex diseases.

Predisposition to develop a disease

– Does not change if the organism has an acquired disease condition �

Not valid as a marker of an acquiered disease

• RNA– Easy to measure

– RNA concentration changes for disease state

Early marker for different diseases

• Proteins

– It is difficult to perform a whole proteome analysis.

– They finally explain most of the disease targets� Closer to thebiological fact

Most reasonable drug targets

Bioinformatics analysis:

24

Genomes:

ORGANISM CHROMOSOMES Size GENE Number

Homo sapiens

(Humans)23 3,200,000,000 ~ 30,000

Mus musculus

(Mouse)20 2,600,000,000 ~30,000

Drosophila

melanogaster

(Fruit Fly)

4 180,000,000 ~18,000

Saccharomyces

cerevisiae (Yeast)16 14,000,000 ~6,000

Zea mays (Corn) 10 2,400,000,000 ???

Transcriptome:

• Different mRNA (including splice forms) for a particular organism.

– About 30.000 genes

– About 250.000 splice variants

• Other RNA fucntions related with trasncription regulation

– miRNA: small pieces of RNA that interrupt the transcription of a gene

25

Proteome

• The complete collection of proteins in an organism– Nobody knows how many… At least several millions.

– For each splice variant, using post transductional modifications, different proteins (with different functions can be obtaines).

• One gene � Several splice forms � Several proteins– Proteins are modified by other molecules that are joined to it

• Phosphate, acyl, methil, sugars, lípids, etc.,

• They change radically the activity of the protein

– There are many proteins with two forms: idle and active. Thetransition is done by adding a phosphate group.

• Many disparate biological activity can be assigned to a single gene.

Questions?

26

Problem:

Genetic code

AUG codes methionine, and it is also the start codon.

Date post:	16-Apr-2018
Category:	Documents
Upload:	hoangdien
View:	218 times
Download:	3 times

2. Intro MolBio (3 hours) - Home. Tecnun. Universidad de .... Intro_MolBio.pdf · Central dogma...

Documents