
Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical...

Date post: 21-Jan-2016
Upload: madlyn-ellis
Transcript
Page 1: Sequence Tracking Deanna M. Church Staff Scientist, NCBI @deannachurch Short Course in Medical Genetics 2013 Understanding your sequence context.

Sequence Tracking

Deanna M. Church Staff Scientist, NCBI

@deannachurch Short Course in Medical Genetics 2013

Understanding your sequence context

Page 2:

What’s in a name?

Bob Bob

Bob Bob

Page 3:

Bob*

* http://howmanyofme.com

What’s in a name?

123-45-6789

Page 4:

Bob

Miranda

Lydia

Samantha

What’s in a name?

Need more than a unique identifier: must also track updates/improvements

Page 5:

chr1
Chr1
1
Chrom1
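The same chromosome can appear under several labels, such as chr1, Chr1, 1 and Chrom1. A minimal sketch of reconciling such labels follows. The helper `normalize_chrom` and its prefix list are assumptions made for illustration, not a function from NCBI or any genome browser, and a normalized label still says nothing about which assembly or sequence version it refers to.

```python
import re

# Hypothetical helper: strips common prefixes so that "chr1", "Chr1",
# "Chrom1" and "1" all compare equal. The prefix list is an assumption.
_PREFIX = re.compile(r"^(chromosome|chrom|chr)", re.IGNORECASE)

def normalize_chrom(label: str) -> str:
    """Return the bare chromosome label, e.g. 'Chrom1' -> '1'."""
    bare = _PREFIX.sub("", label.strip())
    # Upper-case the non-numeric chromosome names so 'chrx' and 'chrX' agree.
    return bare.upper() if bare.upper() in {"X", "Y", "M", "MT"} else bare

labels = ["chr1", "Chr1", "1", "Chrom1"]
normalized = {normalize_chrom(label) for label in labels}
print(normalized)  # {'1'}
```

Normalizing labels only solves the spelling problem. Two files that both say "1" may still refer to different sequences if they come from different assemblies.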

Page 6:

Mouse chrX: 34,800,000-34,890,000

NC_000086 versions: .1 .2 .3 .4 .5 .6 .7

CM001013 versions: .1 .2

Page 7:

Mouse chrX: 35,000,000-36,000,000

X

MGSCv3 MGSCv36

Page 8:

GenBank

Data Archives

Data in a common format
Data in a single location (and mirrored)
Most quality checked prior to deposition
Robust data tracking mechanism (accession.version)
Data owned by submitter

Page 9:

Data tracking

ABC14-1065514J1

Accession.version   Phase   Gaps   Length   Date
FP565796.1          1       1               21-Oct-2009
FP565796.2          1       0               14-Oct-2010
FP565796.3          3       0               07-Nov-2010
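The FP565796 history above shows why the version suffix matters: the accession stays fixed while the sequence behind it changes. A minimal sketch of splitting an accession.version string and finding the newest version per accession follows. The names `AccVer`, `parse_accession_version` and `latest` are hypothetical helpers, not part of any NCBI toolkit.

```python
from typing import Dict, Iterable, NamedTuple

class AccVer(NamedTuple):
    accession: str
    version: int

def parse_accession_version(text: str) -> AccVer:
    """Split 'FP565796.2' into ('FP565796', 2); reject unversioned input."""
    accession, sep, version = text.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {text!r}")
    return AccVer(accession, int(version))

def latest(records: Iterable[str]) -> Dict[str, int]:
    """Return the highest version seen for each accession."""
    newest: Dict[str, int] = {}
    for rec in map(parse_accession_version, records):
        if rec.version > newest.get(rec.accession, 0):
            newest[rec.accession] = rec.version
    return newest

history = ["FP565796.1", "FP565796.3", "FP565796.2"]
print(latest(history))  # {'FP565796': 3}
```

Rejecting a bare accession is a deliberate choice in this sketch: an identifier without a version does not pin down which sequence was used.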

Page 10:

Data Archives

Initial versions of human and mouse reference assemblies not in INSDC!!*

First human version in INSDC: GRCh37
First mouse version in INSDC: NCBI36

* But were tracked by RefSeq

Page 11:

Data Archives

INSDC archives track INDIVIDUAL sequences

An assembly is a COLLECTION of sequences
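One way to picture the distinction is as a data structure: an assembly is a named mapping from chromosome labels to individual accession.version records. The sketch below is illustrative only. The `Assembly` class, the assembly names and the `EX...` accessions are all made-up placeholders, not real records or an NCBI data model.

```python
from dataclasses import dataclass
from typing import Mapping

@dataclass(frozen=True)
class Assembly:
    name: str
    # Chromosome label -> accession.version of the individual sequence.
    sequences: Mapping[str, str]

    def resolve(self, label: str) -> str:
        """Return the sequence record that a label means in this assembly."""
        return self.sequences[label]

# Hypothetical assemblies: the accessions below are placeholders.
old = Assembly("ExampleAsm_v1", {"X": "EX000001.1", "MT": "EX000002.1"})
new = Assembly("ExampleAsm_v2", {"X": "EX000001.2", "MT": "EX000002.1"})

# Labels whose underlying sequence changed between the two assemblies.
changed = sorted(k for k in old.sequences if old.resolve(k) != new.resolve(k))
print(changed)  # ['X']
```

Under this view, a coordinate on "X" is only meaningful together with the assembly that says which sequence version "X" is.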

Page 12:

hg19 GRCh37

mm8 MGSCv37

NCBIM37

danRer5 Zv7

More naming issues
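Because one assembly can carry several names, a lookup table helps when comparing data from different sources. The sketch below uses two of the name pairs listed on this slide (hg19/GRCh37 and danRer5/Zv7). The `ALIASES` table and `canonical_assembly` function are hypothetical, and a real pipeline would more safely key on assembly accessions than on names.

```python
# Browser-style name (lower-cased) -> submitter's assembly name.
# Only pairs that appear in the slide are included; extend with care.
ALIASES = {
    "hg19": "GRCh37",
    "danrer5": "Zv7",
}

# Lower-cased canonical names, so "grch37" also resolves to "GRCh37".
_CANONICAL = {name.lower(): name for name in ALIASES.values()}

def canonical_assembly(name: str) -> str:
    """Map an assembly name or known alias to one canonical spelling."""
    key = name.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    if key in _CANONICAL:
        return _CANONICAL[key]
    raise KeyError(f"unknown assembly name: {name!r}")

print(canonical_assembly("hg19"))     # GRCh37
print(canonical_assembly("danRer5"))  # Zv7
```

Raising on unknown names, rather than passing them through, keeps an unrecognized assembly from silently being treated as a match.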

Page 13:

chr21:8,913,216-9,246,964

Zv7

Page 14:

Zv7 chr21:8,913,216-9,246,964 vs MGSCv36 chrX

Page 15:

http://www.ncbi.nlm.nih.gov/genome/assembly

GRCh37 hg19

Page 16:
Page 17:

Genome Browser Agreement

Submitter deposits assembly to GenBank/EMBL/DDBJ

Assembly QA

Submitter updates assembly based on QA results

Browsers pick up assembly from GenBank/EMBL/DDBJ

Assemblies must be in GenBank/EMBL/DDBJ

Page 18:

GenBank vs RefSeq

GenBank                 RefSeq
Submitter Owned         RefSeq Owned
Redundancy              Non-Redundant
Updated rarely          Curated
INSDC                   Not INSDC

BRCA1
83 genomic records      3 genomic records
31 mRNA records         5 mRNA records
                        1 RNA record
27 protein records      5 protein records

Page 19:
Page 20:

RefSeq for Assemblies

Typical assembly edits

Addition of non-nuclear (e.g. MT) assembly units

Removal of contamination

Drop unlocalized/unplaced scaffolds
Mask contamination that is placed on chromosome (while preserving coordinate space)
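Masking rather than deleting is what keeps the coordinate space intact: every base after the masked region keeps its position. A minimal sketch follows, assuming the mask character is N and the interval is 0-based and half-open. Both are assumptions made for illustration, and `mask_interval` is a hypothetical helper, not the procedure RefSeq actually runs.

```python
def mask_interval(seq: str, start: int, end: int, mask_char: str = "N") -> str:
    """Replace seq[start:end] (0-based, half-open) with mask_char.

    The result has the same length as the input, so downstream
    coordinates are unchanged.
    """
    if not 0 <= start <= end <= len(seq):
        raise ValueError("interval lies outside the sequence")
    return seq[:start] + mask_char * (end - start) + seq[end:]

chrom = "ACGTACGTACGT"
masked = mask_interval(chrom, 4, 8)
print(masked)                     # ACGTNNNNACGT
print(len(masked) == len(chrom))  # True
```

Deleting the same four bases would instead shift every later position by four, which is why contamination placed on a chromosome is masked rather than removed.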

Page 21:

http://www.ncbi.nlm.nih.gov/assembly/organism/9606/

Human assemblies in assembly database

Page 22:

Take home messages

Assemblies can (and do) update!
Know what assembly you are working on

Track by accession.version, not just name
Data in INSDC databases are mirrored
RefSeq is NCBI specific

