
Genome organization & its genetic implications

Date post: 24-Feb-2016
Upload: lerato
Description:
Genome organization & its genetic implications. Lander, ES (2011) Initial impact of the sequencing of the human genome. Nature 470:187. Feuillet, C, JE Leach, J Rogers, PS Schnable, K Eversole (2011) Crop genome sequencing: lessons and rationales. Trends Plant Sci 16:77.
Transcript
Page 1: Genome organization  & its genetic implications

Genome organization & its genetic implications

Lander, ES (2011) Initial impact of the sequencing of the human genome. Nature 470:187

Feuillet, C, JE Leach, J Rogers, PS Schnable, K Eversole (2011) Crop genome sequencing: lessons and rationales. Trends Plant Sci 16:77

Page 2: Genome organization  & its genetic implications

DNA sequencing technologies

                      First gen (Sanger)   Next gen (454/Illumina/APG)
Read length           800 bases            30-300 bases
Speed                 0.1 Gb/day           1-5 Gb/day
Cost / human genome   $70,000,000          $75,000-$250,000

Metzker, M (2010) Sequencing technologies – the next generation. Nature Rev Genet 11:31

Page 3: Genome organization  & its genetic implications

What are the challenges for the correct assembly of genome sequence information?

• Genome size: eukaryotic genomes ~ 10^9 – 10^10 bp

• Genome composition: eukaryotic genomes ~ 50% repetitive DNA

Page 4: Genome organization  & its genetic implications

Genome size – the C-value paradox

[Figure; axis label: genome size in base pairs]

Page 5: Genome organization  & its genetic implications

Genome size – the C-value paradox:

The amount of DNA in a haploid cell of an organism is not related to its evolutionary complexity or number of genes

Page 6: Genome organization  & its genetic implications

Genome composition

• Complexity = length in nucleotides of the longest non-repeating sequence that can be formed by splicing together all unique sequences in a sample

• Eukaryotic genomes contain different classes of DNA based on sequence complexity:

highly repetitive

middle repetitive

unique

Page 7: Genome organization  & its genetic implications

Genome composition – DNA re-association kinetics

[Figure: re-association curves; labels: complexity, [moles of nucleotide / liter] x sec]

Page 8: Genome organization  & its genetic implications

Genome composition - DNA re-association kinetics for a complex eukaryotic genome

[Figure: re-association curve; axis: [moles of nucleotide / liter] x sec; components: highly repetitive sequences, middle repetitive sequences, single copy sequences]
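The shape of such a re-association curve can be sketched numerically. The slides do not give an equation, so the block below assumes the standard second-order model, in which the fraction of DNA still single-stranded is C/C0 = 1 / (1 + k × C0t). The three-component genome (fractions and rate constants in `COMPONENTS`) is purely hypothetical and chosen only to make the three plateaus visible.

```python
def fraction_single_stranded(cot, k):
    """Assumed second-order re-association model: C/C0 = 1 / (1 + k * C0t).

    cot : C0 * t in [moles of nucleotide / liter] x sec
    k   : rate constant; Cot(1/2) = 1 / k
    """
    return 1.0 / (1.0 + k * cot)


def genome_cot_curve(cot, components):
    """Sum the independent components of a genome.

    components : list of (fraction_of_genome, rate_constant) pairs.
    """
    return sum(f * fraction_single_stranded(cot, k) for f, k in components)


# Hypothetical genome: 25% highly repetitive (fast), 25% middle
# repetitive, 50% single copy (slow). Illustrative values only.
COMPONENTS = [(0.25, 1000.0), (0.25, 1.0), (0.50, 0.001)]

if __name__ == "__main__":
    for exponent in range(-4, 5):
        cot = 10.0 ** exponent
        remaining = genome_cot_curve(cot, COMPONENTS)
        print(f"Cot = 1e{exponent:+d}  single-stranded fraction = {remaining:.3f}")
```

Highly repetitive sequences re-associate at low Cot because their effective concentration is high; single copy sequences need the largest Cot values.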

Page 9: Genome organization  & its genetic implications

From genome composition to genome organization

How are unique, middle repetitive and highly repetitive sequences organized in the genome?

Page 10: Genome organization  & its genetic implications

Genome organization

[Figure: arrangement of genes and repeats in E. coli, S. cerevisiae, H. sapiens and Z. mays; legend: repeat, gene; labels: gene island, gene desert]

Page 11: Genome organization  & its genetic implications

Genetic complexity

• Eukaryotic genomes contain ~ 20,000 – 30,000 genes

• 30% of protein coding genes are members of gene families

duplication & divergence of sequence & gene function

Page 12: Genome organization  & its genetic implications

Gene complexity

• What does a gene look like from a sequence or transcript perspective?
  No “typical gene”

• Introns and exons
  Introns can be numerous and long, i.e. some genes are more intron than exon!
  Alternative splicing variants are common

• Not all genes encode proteins

non-coding structural RNAs (e.g. rRNA, tRNA, snRNA, snoRNA)

non-coding regulatory RNAs (e.g. miRNA, lncRNA)

Page 13: Genome organization  & its genetic implications

Implications of gene and genetic complexity

• Forward genetics: Have mutant – want gene

• Via map-based cloning:
  Map your mutation
  Look at the genome sequence in the map interval to identify candidate genes

• Candidate gene identification may not be trivial, even with good genome annotation!
  Especially an issue for plant genome sequences – only Arabidopsis and rice are considered “finished” quality

• Note: further genetic tests are required, even if the perfect candidate is identified.

Page 14: Genome organization  & its genetic implications

Gene identification - open reading frames

5'atgcccaagctgaatagcgtagaggggttttcatcatga

frame 1  atg ccc aag ctg aat agc gta gag ggg ttt tca tca tga
         M   P   K   L   N   S   V   E   G   F   S   S   *

frame 2  tgc cca agc tga ata gcg tag agg ggt ttt cat cat ga
         C   P   S   *   I   A   *   R   G   F   H   H

How to tell real ORFs from random chance ORFs?
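The frame-by-frame reading above can be reproduced with a short script. This is a minimal sketch, not a gene finder: it assumes the standard genetic code, scans only the three forward frames, and reports every stretch that runs from an atg to the first in-frame stop. The function names `translate` and `find_orfs` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SEQ = "atgcccaagctgaatagcgtagaggggttttcatcatga"  # sequence from the slide


def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.lower()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


def find_orfs(seq):
    """Return (frame number, start, end, peptide) for each atg...stop stretch."""
    seq = seq.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and CODON_TABLE[codon] == "*":
                orfs.append((frame + 1, start, i + 3, translate(seq[start:i + 3])))
                start = None
    return orfs


if __name__ == "__main__":
    print("frame 1:", translate(SEQ, 0))  # MPKLNSVEGFSS*
    print("frame 2:", translate(SEQ, 1))  # CPS*IA*RGFHH
    print(find_orfs(SEQ))
```

Frame 2 hits two stop codons within a dozen codons, which is what random sequence tends to look like; frame 1 stays open from the atg to the final stop. ORF length is one commonly used filter, but it cannot be the only one.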

Page 15: Genome organization  & its genetic implications

Gene identification - short ORFs can be translated!

• e.g. the Drosophila tarsal-less gene

Galindo et al. PLoS Biol 5(5): e106 doi:10.1371/journal.pbio.0050106

Page 16: Genome organization  & its genetic implications

Gene identification – database searching

e.g. http://blast.ncbi.nlm.nih.gov/Blast.cgi

Page 17: Genome organization  & its genetic implications

Gene identification – shared synteny

Preserved localization of genes on chromosomes of different species

e.g. mouse chromosome 11 and parts of 5 different human chromosomes

Perfect correspondence in order, orientation and spacing of 23 putative genes, and 245 conserved sequence blocks in noncoding regions

Caution! Even regions of high synteny may not show perfect gene-for-gene correspondence

from Gibson & Muse (2002) A Primer of Genome Science, Sinauer Inc.

Page 18: Genome organization  & its genetic implications

Gene identification – shared synteny

Preserved localization of genes on chromosomes of different species

e.g. maize – sorghum (G) -rice (H)

Schnable et al. Science 326:1112

Page 19: Genome organization  & its genetic implications

Gene identification – promoter elements

• TATA-box elements
  5'-TATAAA-3' or variant
  Plant and animal promoters

• CpG islands
  Regions of higher than expected CpG dinucleotide content, unmethylated in active promoters
  ~ 40% of mammalian promoters
  ~ 70% of human promoters
  but NOT in plant promoter regions

• Y patch (pyrimidine-rich patch)
  Plant, not mammalian, promoters
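"Higher than expected CpG dinucleotide content" can be made concrete with an observed/expected ratio. The sketch below assumes the commonly used definition, observed CpG count × sequence length / (C count × G count); the slides do not specify a formula or any threshold, so the function name and the toy sequences are illustrative only.

```python
def cpg_observed_expected(seq):
    """Observed/expected CpG ratio for one sequence window.

    Assumed definition: (number of CG dinucleotides * length) /
    (number of C * number of G). Returns 0.0 when C or G is absent.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself
    return cpg_count * len(seq) / (c_count * g_count)


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    for window in ("CGCGCGCG", "ATGCATGC", "TATAAATACGCG"):
        print(window, round(cpg_observed_expected(window), 2), round(gc_content(window), 2))
```

In practice the ratio is computed over sliding windows of a few hundred bases rather than over toy strings like these.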

Page 20: Genome organization  & its genetic implications

Gene identification – introns & exons

• Long gene space: more intron than exon

• Extreme example - human clotting factor VIII gene

Page 21: Genome organization  & its genetic implications

Gene identification – alternative splicing variants

Pistoni et al. RNA Biol 7:441

Page 22: Genome organization  & its genetic implications

Gene identification – trans-splicing

Gingeras, Nature 461: 206

Page 23: Genome organization  & its genetic implications

Gene identification – non-coding RNAs

• Non-coding structural RNAs
  rRNA & tRNA – transcription & translation
  snoRNA – small nucleolar RNAs; guide chemical modification of rRNAs & tRNAs
  snRNA – small nuclear RNAs; guide splicing reactions

• Non-coding regulatory RNAs
  miRNA & siRNA – small interfering RNAs; RNAi pathway
  lncRNA – long noncoding RNAs

Page 24: Genome organization  & its genetic implications

Origins of long non-coding RNAs

Kapranov, Nature Rev Genet 8:413

Overlapping transcriptional architecture

• e.g. the human phosphatidylserine decarboxylase (PISD) gene

Page 25: Genome organization  & its genetic implications

Wilusz et al. Genes Dev. 23: 1494–1504

Functions of lncRNAs

Page 26: Genome organization  & its genetic implications

Genome - Transcriptome - Proteome

• Genome
  Full complement of an organism’s hereditary information

• Transcriptome
  Full set of RNA molecules, coding and non-coding, transcribed from the genome

• Proteome
  Full set of proteins expressed from a genome

• Not a 1:1:1 correspondence

Page 27: Genome organization  & its genetic implications

Implications of gene and genetic complexity

• What is the take-home message for forward genetics?

Page 28: Genome organization  & its genetic implications

Implications of gene and genetic complexity

• Reverse genetics: Have gene – want phenotype

Predict phenotypes based on gene function in other organisms

Knock out or knock down your gene of interest & look for corresponding changes in phenotype

Page 29: Genome organization  & its genetic implications

Gene families

• Gene duplication followed by:
  Duplication of gene function
  Divergence of gene function
  Loss of gene function leading to a pseudogene

• e.g. human globin gene family

Page 30: Genome organization  & its genetic implications

Gene families

• Gene duplication followed by:
  Duplication of gene function
  Divergence of gene function
  Loss of gene function leading to a pseudogene

• e.g. human beta-globin gene cluster on chromosome 11
  Five functional genes and two pseudogenes

Page 31: Genome organization  & its genetic implications

Gene families – paralogs & orthologs

• Homologs
  Protein or DNA sequences having shared ancestry

• Orthologs
  Homologs created by a speciation event
  May or may not retain the same function!

• Paralogs
  Homologs created by a gene duplication event
  May or may not retain the same function!

• It is not always easy or possible to distinguish orthologs from paralogs when comparing genes or proteins between species

Page 32: Genome organization  & its genetic implications

Gene families – paralogs & orthologs

[Figure: globin gene paralogs]

Page 33: Genome organization  & its genetic implications

Gene families – paralogs & orthologs

[Figure labels: orthologs, paralogs]

Storz et al. IUBMB Life 63:313

Page 34: Genome organization  & its genetic implications

Implications of gene and genetic complexity

• What are the implications of gene families for forward genetics (i.e. looking for candidate genes that condition a mutant phenotype)?

• What are the implications of gene families for reverse genetics (i.e. altering gene function and looking for a phenotype)?

Page 35: Genome organization  & its genetic implications

Genome organization – repeated sequences ~ 50% of the genome

• Segmental duplications and copy number variation

• Tandemly repeated genes
  rRNA, tRNA and histone gene products needed in large amounts

• Duplicated gene families

• Transposons

• Tandem simple sequence repeats
  centromeric & telomeric repeats
  minisatellites
  microsatellites

Page 36: Genome organization  & its genetic implications

Repeated sequences – segmental duplications & copy number variants

• Segmental duplications
  > 1 kb block of duplicated sequence with > 90% sequence identity
  Recombine to mediate further copy number variants

Koszul & Fischer, C.R. Biologies 332:254

Page 37: Genome organization  & its genetic implications

Repeated sequences – segmental duplications & copy number variants

Page 38: Genome organization  & its genetic implications

Repeated sequences – segmental duplications & copy number variants

Girirajan et al. Annu Rev Genet 45:203

• Copy number variant (CNV)
  Deviation from diploid copy number at a locus

• Copy number polymorphism (CNP)
  CNV present in >1% of a population

• Recent association with human developmental syndromes
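The two definitions above translate directly into a frequency check. The following is a minimal sketch under stated assumptions: each individual is represented by an integer copy number at one locus, the expected diploid copy number is 2, and the 1% cutoff is applied as a strict "greater than". The function name, labels and sample data are hypothetical.

```python
def classify_locus(copy_numbers, expected=2, cnp_threshold=0.01):
    """Classify one locus from per-individual copy numbers.

    Returns (label, frequency), where frequency is the fraction of
    individuals whose copy number deviates from the expected value.
    """
    if not copy_numbers:
        raise ValueError("need at least one individual")
    deviating = sum(1 for n in copy_numbers if n != expected)
    frequency = deviating / len(copy_numbers)
    if deviating == 0:
        return "no CNV", frequency
    if frequency > cnp_threshold:
        return "CNP", frequency
    return "rare CNV", frequency


if __name__ == "__main__":
    # Hypothetical populations of 1,000 individuals each.
    print(classify_locus([2] * 1000))             # no deviation
    print(classify_locus([2] * 995 + [1] * 5))    # 0.5% carry a deletion
    print(classify_locus([2] * 900 + [3] * 100))  # 10% carry a duplication
```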

Page 39: Genome organization  & its genetic implications

Transposon-derived repeated sequences

• ~ 45% of human & 85% of maize genome

Page 40: Genome organization  & its genetic implications

Transposon-derived repeated sequences

• Many are truncated & inactive

• Considered to be important in the evolution of genome organization & function

Gogvadze & Buzdin Cell Mol Life Sci 66:3727

Page 41: Genome organization  & its genetic implications

Repeated sequences – short tandem repeats

• Centromeric
  Long array (~100,000 bp) of short tandem repeats
  ~ 5 bp Drosophila, ~150 bp maize, ~170 bp human
  Not conserved across species
  In some cases not even conserved in all chromosomes of the same species
  Association with a centromere-specific histone H3

• Telomeric
  Length varies between species: ~ 300 base pairs – 150 kilobase pairs
  Conserved, G-rich repeat sequence
  Vertebrates TTAGGG; most plants TTTAGGG

Page 42: Genome organization  & its genetic implications

Repeated sequences – short tandem repeats

• Minisatellites (variable number tandem repeats, VNTRs)
  10-100 bp repeat units
  500-30,000 bp arrays
  The original DNA fingerprinting marker via Southern blotting
  Now supplanted by microsatellites

Page 43: Genome organization  & its genetic implications

Repeated sequences – short tandem repeats

variety A
  [CACACACA]
  [GTGTGTGT]

variety B
  [CACA]
  [GTGT]

• Microsatellites (simple sequence repeats, SSRs)
  Di-, tri- or tetra-nucleotide repeats; 1-10 repeat units per locus
  Repeat numbers expand or contract over a short evolutionary, or even generational, time-frame
  Amplified by PCR
    Primers based on unique flanking sequence
    Products fractionated by capillary or acrylamide gel electrophoresis
  Co-dominant mapping & fingerprinting markers
    Both alleles can be detected in a heterozygous individual
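The logic of SSR genotyping can be sketched in a few lines: primers sit in the unique flanking sequence, so product size depends only on the number of repeat units between them. In this sketch the flanking sequences `LEFT` and `RIGHT` are invented for illustration, the (CA)4 and (CA)2 alleles follow the variety A and variety B example, and the "PCR product" is simply modeled as the span from the start of one flank to the end of the other.

```python
import re

# Hypothetical unique flanking sequences where the primers would anneal.
LEFT = "GGATTG"
RIGHT = "TTCGAA"


def genotype_ssr(template, left, right, motif):
    """Return (repeat units, product length in bp) for one allele, or None."""
    pattern = re.compile(
        f"{re.escape(left)}((?:{re.escape(motif)})+){re.escape(right)}"
    )
    match = pattern.search(template.upper())
    if match is None:
        return None
    repeat_units = len(match.group(1)) // len(motif)
    return repeat_units, len(match.group(0))


# Alleles modeled on the slide: variety A carries (CA)4, variety B (CA)2.
VARIETY_A = "AAAT" + LEFT + "CA" * 4 + RIGHT + "GGC"
VARIETY_B = "AAAT" + LEFT + "CA" * 2 + RIGHT + "GGC"

if __name__ == "__main__":
    allele_a = genotype_ssr(VARIETY_A, LEFT, RIGHT, "CA")
    allele_b = genotype_ssr(VARIETY_B, LEFT, RIGHT, "CA")
    print("variety A:", allele_a)
    print("variety B:", allele_b)
    # A heterozygote yields both product sizes, which is why the
    # marker is co-dominant.
    print("A x B heterozygote product sizes:", sorted({allele_a[1], allele_b[1]}))
```

The 4 bp size difference between the two products is what electrophoresis resolves.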

