COT 6930 HPC and Bioinformatics
Introduction to Molecular Biology
Xingquan Zhu, Dept. of Computer Science and Engineering
Outline
• Cell
• DNA
  • DNA structure
  • DNA sequencing
• RNA (DNA -> RNA)
• Protein
  • Protein structure
  • Protein synthesis
Central Dogma of Biology: DNA, RNA, and the Flow of Information
[Figure: Replication (DNA -> DNA), Transcription (DNA -> RNA), Translation (RNA -> protein)]
A protein is a sequence built from 20 amino acids
Adopts a stable 3D structure that can be measured experimentally
[Figure: one protein shown as ribbon, space-filling, cartoon, and surface representations; atoms colored by element: oxygen, nitrogen, carbon, sulfur]
Example protein sequence: Lys Lys Gly Gly Leu Val Ala His
The 20 amino acids
• Each amino acid contains an "amine" group (NH2) and a "carboxy" group (COOH) (shown in black in the diagram).
• The amino acids vary in their side chains (indicated in blue in the diagram).
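Because a protein's primary structure is a string over this 20-letter alphabet, software usually stores it with one-letter codes. The following is a minimal Python sketch that assumes the standard one-letter codes. The helper name `to_one_letter` is hypothetical and chosen only for illustration. It converts the example sequence Lys Lys Gly Gly Leu Val Ala His.

```python
# Standard one-letter codes for the 20 amino acids, keyed by three-letter code.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq: str) -> str:
    """Convert a space-separated three-letter sequence (hypothetical helper)."""
    residues = three_letter_seq.split()
    unknown = [r for r in residues if r not in THREE_TO_ONE]
    if unknown:
        raise ValueError(f"not a standard amino acid: {unknown}")
    return "".join(THREE_TO_ONE[r] for r in residues)


print(len(THREE_TO_ONE))                                 # 20
print(to_one_letter("Lys Lys Gly Gly Leu Val Ala His"))  # KKGGLVAH
```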
Protein Structure
• Primary structure (amino acid sequence)
• Secondary structure (local folding)
• Tertiary structure (global folding)
• Quaternary structure (multiple chains)
Protein Structure Animation: https://mywebspace.wisc.edu/jonovic/web/proteins.html
Polypeptide
[Figure: polypeptide chain with its N-terminal and C-terminal ends labeled]
One end of every polypeptide, called the amino terminal or N-terminal, has a free amino group. The other end, with its free carboxyl group, is called the carboxyl terminal or C-terminal.
• Peptide: 50 amino acids or fewer
• Polypeptide: 51-100 amino acids
• Protein: over 100 amino acids
Polypeptide
The amino acids are linked covalently by peptide bonds. The image shows how three amino acids are linked by two peptide bonds into a tripeptide.
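The size labels and the bond count can be expressed as two small functions. This sketch relies on two assumptions. First, the overlap at 50 in the size ranges is resolved so that a peptide has 50 or fewer amino acids. Second, the chain is linear, so n amino acids are joined by n - 1 peptide bonds; the tripeptide, for example, has two. Both function names are hypothetical.

```python
def classify_chain(n_residues: int) -> str:
    """Size label using the slide's cut-offs (assumed: 51-100 = polypeptide)."""
    if n_residues < 1:
        raise ValueError("a chain needs at least one amino acid")
    if n_residues <= 50:
        return "peptide"
    if n_residues <= 100:
        return "polypeptide"
    return "protein"


def peptide_bonds(n_residues: int) -> int:
    """A linear chain of n amino acids is joined by n - 1 peptide bonds."""
    return max(n_residues - 1, 0)


for n in (3, 80, 250):
    print(n, classify_chain(n), peptide_bonds(n))
# 3 peptide 2
# 80 polypeptide 79
# 250 protein 249
```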
Secondary Structure
Secondary structure describes the way the chain folds locally:
• Local structure of consecutive amino acids
• Common regular secondary structures: α helix, β sheet, β turn
Tertiary Structure of protein
Tertiary structure describes the overall shape that forms when the secondary structure elements (helices and sheets) of the protein chain further fold up on themselves.
Quaternary structure (multi-chain structures)
Quaternary structure describes how multiple polypeptide chains (subunits) assemble into the final molecule before it can become active. For example, pairs of chains may bind together, or other inorganic substances may be incorporated into the molecule.
Protein Structure Space
http://www.nigms.nih.gov/psi/
Protein folding taxonomy:
• all alpha
• all beta
• alpha/beta
• alpha+beta
• others
Geometry of Protein Structure
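[Figure: protein backbone with two rotatable bonds per peptide unit marked]
The total number of degrees of freedom is $2(n-1)$, where $n$ is the length of the protein: each of the $n-1$ peptide units between consecutive residues contributes two rotatable backbone bonds, while the peptide bond itself is planar.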
The Levinthal Paradox
Given a small protein (100 aa), assume 3 possible conformations per peptide bond:
• $3^{100} \approx 5 \times 10^{47}$ conformations
• The fastest motions take $10^{-15}$ s, so sampling all conformations would take $5 \times 10^{32}$ s
• $60 \times 60 \times 24 \times 365 = 31{,}536{,}000$ seconds in a year
• Sampling all conformations would therefore take $1.6 \times 10^{25}$ years
Proteins have no problem folding; we do! This is the Levinthal paradox.
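The Levinthal arithmetic can be reproduced directly. The constants below are the slide's assumptions (100 residues, 3 conformations each, $10^{-15}$ s per conformation) and are not measured values.

```python
# Assumptions taken from the slide, not measurements.
N_RESIDUES = 100                  # a "small" protein
CONFORMATIONS_PER_BOND = 3        # possible conformations per peptide bond
SECONDS_PER_CONFORMATION = 1e-15  # fastest molecular motions
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

conformations = CONFORMATIONS_PER_BOND ** N_RESIDUES  # exact integer
seconds = conformations * SECONDS_PER_CONFORMATION
years = seconds / SECONDS_PER_YEAR

print(f"{conformations:.2e} conformations")  # 5.15e+47 conformations
print(f"{seconds:.2e} s")                     # 5.15e+32 s
print(SECONDS_PER_YEAR, "s per year")         # 31536000 s per year
print(f"{years:.2e} years")                   # 1.63e+25 years
```

Changing `CONFORMATIONS_PER_BOND` to 2, or `N_RESIDUES` to 50, still leaves an astronomically long search time. That is the point of the paradox.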
Codons and anticodons
DNA:  TAC CAT GAG ACT … ATC
mRNA: AUG GUA CUC UGA … UAG
tRNA: UAC CAU GAG ACU … AUC
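The three rows of the codon example are related by base pairing. The mRNA is complementary to the DNA template strand, with U in place of T, and each tRNA anticodon is complementary to its mRNA codon. The following minimal sketch applies that rule to the first four codons shown. It assumes the strands are written codon-aligned, as on the slide, and it ignores 5'/3' orientation. All function names are hypothetical.

```python
# Base pairing from a DNA template base to the RNA base made opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base pairing between RNA bases (codon <-> anticodon).
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def transcribe_template(template: str) -> str:
    """mRNA complementary to the DNA template strand (hypothetical helper)."""
    return "".join(DNA_TO_RNA[b] for b in template)


def anticodons(mrna: str) -> str:
    """tRNA anticodons that base-pair with the mRNA codons."""
    return "".join(RNA_PAIR[b] for b in mrna)


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


template = "TACCATGAGACT"  # first four DNA codons from the slide
mrna = transcribe_template(template)
print(" ".join(codons(mrna)))              # AUG GUA CUC UGA
print(" ".join(codons(anticodons(mrna))))  # UAC CAU GAG ACU
```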
[Figure: DNA -> RNA (transcription) -> protein (translation) -> phenotype, with the related databases: genomic DNA databases; cDNA, ESTs, UniGene; gene expression databases; protein sequence databases; protein structure databases]
Transcription & Open Reading Frame (ORF)
Open Reading Frame (ORF)
• Where to start reading codons: the start codon (ATG)
• 6 possible reading frames (3 forward, 3 backward on the reverse-complement strand)
• The gene is usually the longest ORF found
Forward reading frame example
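The following is a minimal sketch of the ORF heuristic. It scans the three forward frames and the three frames of the reverse-complement strand. In each frame it opens an ORF at ATG, closes it at the first in-frame stop codon, and finally keeps the longest ORF found. It assumes the stop codons of the standard genetic code (TAA, TAG, TGA) and imposes no minimum ORF length. The input sequence is made up. This is only the simple "longest ORF" rule, not a full gene finder.

```python
# Stop codons of the standard genetic code (assumption: standard code).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """The opposite strand read 5'->3' (source of the 3 'backward' frames)."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int) -> list[str]:
    """ORFs in one frame: from an ATG to the first in-frame stop codon.

    ATGs that occur inside an already open ORF are not reported separately.
    """
    found = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            found.append(seq[start:i + 3])
            start = None
    return found


def longest_orf(dna: str) -> str:
    """Check all 6 reading frames and return the longest ORF ('' if none)."""
    candidates = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            candidates.extend(orfs_in_frame(strand, frame))
    return max(candidates, key=len, default="")


seq = "CCATGAAATTTGGGTAACC"  # hypothetical sequence
print(longest_orf(seq))                      # ATGAAATTTGGGTAA
print(longest_orf(reverse_complement(seq)))  # ATGAAATTTGGGTAA
```

The second call shows why the backward frames matter: the same gene on the opposite strand is found only through the reverse complement.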
Complication – Non-coding Regions
• Non-coding regions: very little genomic DNA produces proteins
  • Exon – DNA expressed in protein (2–3% of human genome)
  • Intron – DNA transcribed into mRNA but later removed
  • Untranslated region (UTR) – DNA transcribed into mRNA but not translated into protein; UTRs may affect gene regulation and expression
• Biological processes
  • Remove introns from the mRNA and splice the exons together
  • The transition between an intron and an exon is a splice site
• Splicing can be inconsistent
  • Some exons may be skipped
  • Result: a splice-variant gene / isoform
  • An estimated 30% of human proteins come from splice-variant genes
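Splicing can be sketched as cutting the exon segments out of the pre-mRNA and joining them in order. Leaving an exon out yields a splice variant. The sequence and exon coordinates below are hypothetical, and the introns are written in lower case only for readability. Coordinates are assumed to be 0-based and end-exclusive.

```python
# Hypothetical pre-mRNA: exons upper case, introns lower case (readability only).
PRE_MRNA = "AUGGCC" "guaaag" "UUCAAA" "guuag" "GGGUAA"
# Exon coordinates: 0-based, end-exclusive (assumed convention).
EXONS = [(0, 6), (12, 18), (23, 29)]


def introns(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Introns lie between consecutive exons; their two ends are splice sites."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def splice(pre_mrna: str, exons: list[tuple[int, int]],
           skip: frozenset[int] = frozenset()) -> str:
    """Remove the introns by joining exon segments in order.

    Exon indices in `skip` are left out, producing a splice variant (isoform).
    """
    return "".join(pre_mrna[start:end]
                   for i, (start, end) in enumerate(exons) if i not in skip)


print(introns(EXONS))                           # [(6, 12), (18, 23)]
print(splice(PRE_MRNA, EXONS))                  # AUGGCCUUCAAAGGGUAA
print(splice(PRE_MRNA, EXONS, frozenset({1})))  # AUGGCCGGGUAA
```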
Transcription
The process of making RNA from DNA
Needs a promoter region to begin transcription.
[Figure: a gene's control regions, exons, and introns; transcription produces RNA, and splicing then removes the introns]
Alternative Splicing
A single gene can produce different forms of a protein. A gene can contain numerous exons and introns, and the exons can be spliced together in different ways.
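To see how one gene can yield several transcripts, the sketch below enumerates every transcript obtained by optionally skipping internal exons. It models only exon skipping and assumes the first and last exons are always kept. The exon sequences and the function name `isoforms` are hypothetical.

```python
from itertools import product


def isoforms(exon_seqs: list[str]) -> list[str]:
    """All transcripts obtainable by skipping any subset of the internal exons.

    Assumes the first and last exons are always kept (exon skipping only).
    """
    if len(exon_seqs) < 2:
        return ["".join(exon_seqs)]
    first, *middle, last = exon_seqs
    variants = []
    for keep in product((True, False), repeat=len(middle)):
        chosen = "".join(exon for exon, kept in zip(middle, keep) if kept)
        variants.append(first + chosen + last)
    return variants


# Hypothetical gene with four exons (two internal exons -> 2**2 = 4 isoforms).
for mrna in isoforms(["AUG", "GCU", "UUC", "UAA"]):
    print(mrna)
# AUGGCUUUCUAA
# AUGGCUUAA
# AUGUUCUAA
# AUGUAA
```

With k internal exons this simple model already allows 2^k isoforms, which suggests how a modest number of genes can give rise to many more proteins.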
Complication: Mutations
• Mutations: modifications during DNA replication
• Possible changes:
  • Point mutation / single nucleotide polymorphism (SNP), e.g.
      5’ A T A C G T A …
      5’ A T G C G T A …
    SNPs occur every 100 to 300 bases along the 3-billion-base human genome
  • Duplicate sequence
  • Inverted sequence
  • Insert / delete sequence (indel)
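For two sequences that are already aligned and of equal length, point mutations can be found by comparing them position by position. The example uses the two SNP sequences from the slide, without the trailing "…". Detecting duplications, inversions, or indels requires sequence alignment and is not attempted here. The last lines turn the slide's SNP density into a rough genome-wide count. The function name `point_mutations` is hypothetical.

```python
def point_mutations(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """1-based positions where two aligned, equal-length sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and equal in length")
    return [(pos, r, a)
            for pos, (r, a) in enumerate(zip(ref, alt), start=1) if r != a]


print(point_mutations("ATACGTA", "ATGCGTA"))  # [(3, 'A', 'G')]

# Rough genome-wide SNP count from the slide's density (one per 100-300 bases).
GENOME_LENGTH = 3_000_000_000
for spacing in (300, 100):
    print(f"one SNP per {spacing} bases -> about {GENOME_LENGTH // spacing:,} SNPs")
# one SNP per 300 bases -> about 10,000,000 SNPs
# one SNP per 100 bases -> about 30,000,000 SNPs
```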