Date post: | 11-Jan-2016 |
Upload: | bennett-glenn |
BioInformatics Consultation
Practice 3
Gábor Pauler, Ph.D.
Tax reg. no.: 63673852-3-22
Bank account: 50400113-11065546
Location: 1st Széchenyi str., 7666 Pogány, Hungary
Tel: +36-309-015-488
E-mail: [email protected]
Content of the Practice
- Fragment processing:
  - Restriction site database: WebCutter
  - Primer cleaning: SMS2 DNA Pattern
  - Vector cleaning: NCBI VecScreen
  - Fragment assembly: CAP3
- Auxiliary sequence operations: SMS2
  - GUI
  - Conversion operations
  - Sequence analysis
  - Sequence mapping
  - Random sequences
- Uploading sequences: EBI
  - Registration
  - Upload auxiliary data
  - Upload sequence
- Data Import/Export/Conversion operations: Excel, Access
  - Text file formats
  - Converting text file formats
  - HTML tables and wide text
  - Text to Excel
  - From Excel to Text, HTML, Picture Metafile, Bitmap, Access tables
- Home Assignment 3: Fragment clean and match
- References
Fragment processing: Restriction maps: WebCutter: Input
- The first task in cloning where bioinformatics is heavily involved is pre-processing:
  - Selecting restriction enzymes
  - Forecasting restriction sites when the sequence to be cloned is known
- For these tasks we need a restriction mapping tool backed by a database of restriction enzymes
- We will use WebCutter for this purpose (http://rna.lundberg.gu.se/cutter2/index.html)
- At the Start Screen:
  - Sequence title: title of the analysis
  - DNA sequence box: paste the examined nucleotide sequence in FASTA format from the clipboard (max. 50000 chars)
  - Type: type of analysis
    - Linear: linear DNA
    - Circular: circular DNA (e.g. plasmids)
    - Silent mutagenesis: sites that could be introduced by silent mutations, i.e. base changes that leave the encoded protein unchanged
  - Display options: results can be displayed in graphic or tabular format, ordered by nucleotide position or by enzyme name
  - Enzymes: can be filtered by
    - Least and most number of cuts
    - Length of the recognition site in bases (longer sites cut less often and more specifically)
    - Enzyme name list (multiple selection with Ctrl+Click)
  - Press Analyze sequence to run
Fragment processing: Restriction maps: WebCutter: Output
- Character-based restriction map by base positions:
  - This is great for manual processing and for predicting the length of possible fragments
  - However, it is hard to process automatically when many fragment lengths have to be computed
- Tabular list of restriction sites:
  - It contains enzyme names, the number of sites, the list of site coordinates, and the recognized sequence, with GCG-style ambiguity codes at uncertain matches
  - It can be copied into Excel for more detailed fragment length forecasts (see later)
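The fragment length forecast mentioned above is simple arithmetic on the site coordinates, so it can also be sketched in a few lines of Python. This is only an illustration: the function name `fragment_lengths` is made up here, and it assumes each coordinate is a 1-based position after which the strand is cut, which may differ from how a given tool reports its sites.

```python
def fragment_lengths(seq_len, cut_positions, circular=False):
    """Return the fragment lengths produced by cutting at the given positions.

    Assumption: each position p is 1-based and the cut falls after base p.
    """
    # Ignore duplicates and positions outside the sequence.
    cuts = sorted(set(p for p in cut_positions if 0 < p < seq_len))
    if not cuts:
        return [seq_len]
    if circular:
        # Fragments between consecutive cuts, plus the one spanning the origin.
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(seq_len - cuts[-1] + cuts[0])
        return lengths
    # Linear DNA: the two ends of the molecule act as extra boundaries.
    bounds = [0] + cuts + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


print(fragment_lengths(1000, [200, 650]))                 # linear molecule
print(fragment_lengths(1000, [200, 650], circular=True))  # plasmid
```

Note that a circular molecule with two cuts yields two fragments, while the same cuts on linear DNA yield three.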
Fragment processing: Primer cleaning: SMS2 DNA Pattern
- In post-processing the sequenced fragments, the first task is to eliminate the primer sequence, as it can confuse further analysis
- As primers are at the very beginning of fragment sequences, they are usually already eliminated during chromatogram analysis, because recognition of the initial bases is uncertain most of the time
- If the primer has not been eliminated, we can use SMS2 DNA Pattern (http://www.bioinformatics.org/sms2/dna_pattern.html) to do it:
- At the Start Screen:
  - Raw sequence: paste one or more nucleotide sequences in FASTA format (max. 50000 chars)
  - Search pattern: sequence of the primer. We can give alternative bases for one position in brackets: [AT]
  - We assume here that the sequence of the primer(s) used is known!
  - Submit button: run
- At the Output Screen:
  - It gives the coordinates of matching sequences
  - On both strands of the DNA!

Results for 180 residue sequence "sample sequence one" starting "ttaaggaccc"
>match number 1 to "ctt[ca]" start=68 end=71 on the reverse strand
ctta
>match number 2 to "ctt[ca]" start=2 end=5 on the reverse strand
ctta
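The same kind of search can be sketched with Python's standard `re` module, since the bracket notation [AT] is also valid regular expression syntax. This is a minimal sketch, not how SMS2 itself is implemented: `find_pattern` is a hypothetical helper, it handles only plain a/c/g/t sequences, and it reports reverse-strand hits in forward-strand, 1-based inclusive coordinates.

```python
import re

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Reverse complement of a plain a/c/g/t sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_pattern(seq, pattern):
    """Return (start, end, strand, matched_text) for both strands.

    Coordinates are 1-based, inclusive, and always given on the forward strand.
    Only non-overlapping matches are reported (a property of re.finditer).
    """
    n = len(seq)
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for m in regex.finditer(seq):
        hits.append((m.start() + 1, m.end(), "direct", m.group()))
    for m in regex.finditer(reverse_complement(seq)):
        # Map positions on the reverse complement back to forward numbering.
        hits.append((n - m.end() + 1, n - m.start(), "reverse", m.group()))
    return hits


print(find_pattern("ttaaggaccc", "ctt[ca]"))
```

Once the coordinates of a primer match at the start of a fragment are known, the primer can be removed by simple slicing, for example `seq[end:]` for a match on the direct strand.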
Fragment processing: Vector cleaning: NCBI VecScreen: Input
- Compared to primers, cleaning vector sequences out of fragments is a more cumbersome task:
  - Vector sequences are longer
  - They can occur at both the beginning and the end of fragments
  - Vectors are usually multi-purpose and contain many feature-rich sites
- So vector contamination can completely confuse any further analysis if it is left in the fragment sequence!
- We will use NCBI’s VecScreen (http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html):
- At the Start Screen:
  - Sequence box: paste the analyzed nucleotide sequence in FASTA format (max. 50000 chars)
  - Run VecScreen button: it matches the sequence against the vectors stored in NCBI’s UniVec database (ftp://ftp.ncbi.nih.gov/pub/UniVec/)
- At the Output Settings Screen:
  - Graphic output
  - Sequence retrieval: display the cleaned sequence
  - View report button: go to the output
Fragment processing: Vector cleaning: NCBI VecScreen: Output
- At the Output Screen:
  - At the top, we can see a graphic overview map of the matching vector parts
    - Different match strengths are coded with differently colored regions
  - Below it, there is a text list of the matching vector sequences with:
    - Data of the vector
    - Matching statistics: ratio of identities and gaps
    - Detailed character maps of the matches
Fragment assembly: Basic definition, CAP3: Input
- A limitation of PCR is that regular DNA polymerases work only on stretches of ca. 500-1000 base pairs, and most sequencing techniques also have serious length limitations
- So, longer sequences can be assembled only from cloned fragments, which usually have a 50-100 base pair overlap at their ends
- However, restriction sites are not distributed evenly in the genome, which may disturb overlapped assembly. That is why we use restriction maps when designing the cloning.
- Once the cloned fragments are sequenced and cleaned of primer and vector sequences, we need software that assembles the fragments: it finds ca. 100 matching base pairs between the beginning/end of one fragment and the end/beginning of another fragment or of its reverse complement.
- After the assembly of fragments, we have the contig: the longest possible consensus sequence assembled
- We use the CAP3 software (http://pbil.univ-lyon1.fr/cap3.php) for fragment assembly:
- At the Start screen:
  - Sequence box: paste the fragment DNA sequences here in FASTA format, one after the other
  - Submit button: run
- At the Output screen we get a menu of outputs:
  - Contigs: sequence of the longest resulting contig(s) (ideally there should be one) in FASTA format:

>Contig1
TCCTTTAAATCCCTTACATGATCTGAGTTCAGACCGGCGTGAGCCAGGTCGGTTTCTATCCTTATTTTTTGTTTATATTTTAGTACGAAAGGACCAAGTATTTTAAATAATTTATTTTAT
Fragment assembly: CAP3: Output
- Assembly details:
  - It gives the sequence pairs matched during contig assembly:
    - + denotes a fragment sequence used as given
    - - denotes a fragment sequence used as its reverse complement
  - Below them it gives the consensus sequence:

Number of segment pairs = 2; number of pairwise comparisons = 1
'+' means given segment; '-' means reverse complement

Overlaps            Containments  No. of Constraints Supporting Overlap

******************* Contig 1 ********************
2006-ISO-TD1-
2006-ISO-16S+

DETAILED DISPLAY OF CONTIGS
******************* Contig 1 ********************
                    .    :    .    :    .    :    .    :    .    :    .    :
2006-ISO-TD1-   TCCTTTAAATCCCTTACATGATCTGAGTTCAGACCGGCGTGAGCCAGGTCGGTTTCTATC
2006-ISO-16S+                                  GACCGGCGTGAGCCAGGTCGGTTTCTC-C
                ____________________________________________________________
consensus       TCCTTTAAATCCCTTACATGATCTGAGTTCAGACCGGCGTGAGCCAGGTCGGTTTCTATC
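To make the idea of overlap-based assembly concrete, here is a toy sketch of joining two fragments. It is deliberately much simpler than CAP3: it accepts only exact overlaps between the end of one fragment and the start of the other (or of its reverse complement), whereas a real assembler tolerates mismatches and gaps and builds a consensus. The function names are invented for this illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (exact match)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge_pair(a, b, min_len=3):
    """Join b (or its reverse complement) onto the end of a, or return None."""
    best = None
    for candidate in (b, reverse_complement(b)):
        k = suffix_prefix_overlap(a, candidate, min_len)
        if k and (best is None or k > best[0]):
            best = (k, a + candidate[k:])
    return best[1] if best else None


# The second fragment is given in the opposite orientation here.
print(merge_pair("ACGTTGCA", "CCCTGCA"))
```

With real fragments, `min_len` would be set much higher (tens of bases) so that short chance matches are not mistaken for true overlaps.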
Auxiliary sequence operations: SMS2: GUI
- Before uploading and further analysis of assembled contig sequences, we may need certain transformations and format conversions, called sequence manipulation. There is an easy-to-use, comprehensive toolkit for this:
- Sequence Manipulation Suite (SMS2) (http://www.bioinformatics.org/sms2/index.html)
- Graphic User Interface (GUI): all SMS2 tools share a very similar user interface:
  - Left menu: we can choose the required operation from the hierarchically ordered list
  - At Start screens:
    - Top: we can see the explanation of the operation
    - Sequence box: we can paste the input sequence here in FASTA format (or several sequences consecutively, if the current operation requires it)
      - There is always a suitable example nucleotide/protein input sequence in the box, making it easier to try out the tools!
    - Below: the parameter settings of the current operation
    - Submit button: run
  - At Output screens:
    - Outputs are partly graphic, partly text or HTML tables, depending on the operation

Site:              Positions:
AatI agg|cct       none
AatII gacgt|c      160
Acc16I tgc|gca     none
AccII cg|cg        44
Auxiliary sequence operations: SMS2: Format conversion operations
- Split/Combine FASTA: cuts a longer continuous FASTA sequence into rows of standard length, or concatenates several shorter FASTA sequences into one
- EMBL/GenBank to FASTA: converts an EMBL/GenBank record to a FASTA sequence
- EMBL/GenBank Feature Extractor: extracts the exons from an EMBL/GenBank record and assembles them into cDNA, based on the record’s feature table
- EMBL/GenBank Trans Extractor: extracts the possible translated proteins from an EMBL/GenBank record in FASTA format (considering alternative splicing)
- Filter DNA/Protein: cleans illegal characters from a FASTA-formatted DNA/protein sequence (except N, which denotes uncertain sequencing in DNA)
- One to Three/Three to One: converts FASTA-formatted protein sequences between 1-char and 3-char coding, where * and *** respectively denote uncertain sequencing or translation
- Window Extractor DNA/Protein: extracts a window from a FASTA-formatted DNA/protein sequence, given the center position and the width of the window
- Range Extractor DNA/Protein: extracts multiple ranges from a FASTA-formatted DNA/protein sequence, given by comma-separated coordinates or coordinate ranges, e.g.: 1,2,3..15,END
  - It can concatenate them into one FASTA record or split them into equal-length FASTA files
- Reverse Complement: computes the reverse (5’-3’ to 3’-5’), the complement (A↔T, C↔G), or the reverse complement of a FASTA-formatted DNA sequence
  - Complements of the ambiguity characters that denote uncertain sequencing are taken from the GCG code table!
- Split Codons: a coding DNA sequence given in FASTA format is read as an uninterrupted run of triplets (codons) and is split into 3 sequences by in-codon position (1,2,3), e.g.: from the sequence ATGATG we get the 3 sequences AA, TT, GG
  - It is used solely in statistical analysis of codon positions
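Two of the operations above, Reverse Complement and Split Codons, are easy to sketch in Python. The complement table below follows the standard IUPAC ambiguity codes (R↔Y, K↔M, B↔V, D↔H; S, W and N are their own complements); treating it as equivalent to the GCG table used by SMS2 is an assumption of this sketch, and the function names are illustrative only.

```python
# IUPAC nucleotide codes and their complements (upper case only).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def complement(seq):
    return seq.upper().translate(IUPAC_COMPLEMENT)


def reverse_complement(seq):
    return complement(seq)[::-1]


def split_codons(seq):
    """Split a coding sequence into three strings by in-codon position."""
    return seq[0::3], seq[1::3], seq[2::3]


print(reverse_complement("ATGCRN"))
print(split_codons("ATGATG"))
```

The slicing in `split_codons` assumes the sequence starts at codon position 1 and contains no introns or gaps, exactly as the tool description requires.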
Auxiliary sequence operations: SMS2: Sequence analysis operations
- Restriction Digest: simulates the digestion of a longer DNA sequence given in FASTA format with a restriction enzyme selected from the SMS database (or with a manually given recognition site sequence):
  - It computes the list of possible fragment sequences and writes them into one text file as consecutive FASTA records for further processing
- Restriction Summary: the same as above, except that it gives not the fragments themselves but a statistical summary table of their properties
- PCR Primer Stats: for designed primer sequences given in FASTA format, it forecasts:
  - Melting temperature (important for PCR temperature programming)
  - Complementarity or partial complementarity (considerably complementary primers bind to each other instead of the template DNA strand, reducing PCR efficiency)
  - For linear or circular DNA
- PCR Products: simulates the PCR of a DNA sequence given in FASTA format:
  - Using the selected or manually entered forward/reverse primers
  - It prepares a list of the expected PCR product sequences in one text file as consecutive FASTA records
- ORF Finder: in a DNA sequence given in FASTA format, it searches for Open Reading Frames (ORFs): sequence parts bordered by stop codons, on 2 DNA strands × 3 codon starting positions (1,2,3) = 6 reading frames. It is used to find the possible coding parts of a DNA sequence
  - It gives the list of ORFs in one file as consecutive FASTA records
  - It gives a summary table of their length and position
- CpG Islands: in a DNA sequence given in FASTA format, it searches for CG-dimer-rich "islands": in vertebrates they are usually located at the 5’ end of genes
  - It gives a summary table of the length and position of the CpG islands
- Translate/Reverse Translate: translates FASTA DNA to a FASTA 1-char coded protein, or translates a protein back to the most likely cDNA sequence based on the selected species’ Codon Usage Table, which gives the probability of the alternative codons of each amino acid in that species
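The six-reading-frame idea behind ORF Finder can be sketched as follows. This is a simplified illustration, not the SMS2 algorithm: it uses the definition given above (a stretch free of stop codons), assumes the standard stop codons TAA, TAG and TGA, does not require a start codon, and reports reverse-strand coordinates on the reverse complement rather than on the original strand. The function names and the `min_codons` threshold are assumptions of the sketch.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons=30):
    """Stop-free stretches in one frame (0, 1 or 2) as 1-based inclusive (start, end)."""
    found = []
    start = frame
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOPS:
            if (pos - start) // 3 >= min_codons:
                found.append((start + 1, pos))
            start = pos + 3
    # Stretch running up to the last complete codon of the frame.
    end = frame + ((len(seq) - frame) // 3) * 3
    if (end - start) // 3 >= min_codons:
        found.append((start + 1, end))
    return found


def six_frame_orfs(seq, min_codons=30):
    """ORFs for frames 1-3 on the given strand (+) and on its reverse complement (-)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    result = {}
    for frame in range(3):
        result[("+", frame + 1)] = orfs_in_frame(seq, frame, min_codons)
        result[("-", frame + 1)] = orfs_in_frame(rc, frame, min_codons)
    return result


print(six_frame_orfs("ATGAAATAGCCCGGG", min_codons=1))
```

Raising `min_codons` filters out the many short stop-free stretches that occur by chance in non-coding DNA.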
Auxiliary sequence operations: SMS2: Other operations
- Sequence mapping operations:
  - Primer Map: in a DNA sequence given in FASTA format, it prepares a graphic map of the binding sites of a given list of primers
    - It also gives a summary table of site coordinates and primer names
  - Restriction Map: the same as above, but for restriction enzymes
  - Translation Map: in a DNA sequence given in FASTA format, it translates all 6 possible reading frames into FASTA 1-char coded protein sequences
    - The valid codon table can be selected (the default is the standard nuclear table, not the mitochondrial one, as used for vertebrates)
    - It assumes that the DNA contains only coding parts; there should be no introns
- Random sequence generation operations:
  - Random DNA/cDNA/Protein: random DNA/cDNA/protein sequences for:
    - Simulation or trying out other software, or
    - Making unprepared students really cry at the sequence analysis computer lab exam! Wohahaha, yeah!
  - Mutate/Shuffle DNA/Protein: in a DNA/protein sequence given in FASTA format, it creates flip/insert/shuffle mutations
  - Random DNA/Protein Regions: in a DNA/protein sequence given in FASTA format, it randomizes the regions given by comma-separated coordinates or coordinate ranges, e.g.: 1,2,3..15,END
Uploading sequences: EBI: Registration, Upload auxiliary data 1
- After cleaning and assembling the fragments, we now have a nice sequence that we would like to share with other researchers
- For this purpose, we will use EBI’s interface (http://www.ebi.ac.uk/embl/Submission/index.html)
- At the Registration & Login Screen:
  - Register: register with the EBI database first:
    - Give your personal data and press Save
    - You will then receive a validation e-mail at the address you gave, where you should click a link to validate your registration
  - After that you can log in by giving your e-mail and password and pressing the Log in button
- At the Function Select Screen we have to select:
  - The Submit sequences option button
  - At the Here link, we get a utility to check whether any vector contamination is left in the sequence: it uses EBI’s BLASTN nucleotide alignment tool to check a FASTA-formatted DNA sequence for contamination, based on the EMVEC vector database
- At the Sequence Type Select Screen: we can give the type of the uploaded sequence, e.g.:
  - WGS (Unannotated): whole genome shotgun sequence
  - EST: expressed sequence tags
  - Data can be uploaded much faster if it is preformatted; in that case select EMBL, MIENS, etc.
- At the Valid From Date Screen: we can set whether to release the entry immediately or with a delay. Delayed release is important when you want to be able to prove later that you submitted first, but do not want other researchers to access the sequence until your paper is published
Uploading sequences: EBI: Upload auxiliary data and sequence
- At the Publication Reference Screen:
  - Citation type: published/unpublished journal article, etc.
  - Title, Year, Journal name
  - Authors: Initials, Surname
  - In case of multiple publications you can return to this screen and add more
- At the Auxiliary Info Selection Screen: we can select what kind of environmental info will be attached to the sequence:
  - Organism, Organelle
  - Strain, Isolate
  - Contig name
- At the following Auxiliary Info screen: we can enter the values of the previously selected auxiliary data
- At the Validation Screen: it checks the internal logical dependencies of the data. When the Validate button is pressed, it looks up the Organism and Organelle in the EBI databases. If everything is OK, then:
- At the Sequence Upload Screen: you can upload the sequence in FASTA format
Data Import/Export/Conversion: Introduction, Text file formats
- Most bioinformatics software takes its input and gives back its output in text files (FASTA, EMBL and GenBank are all text files)
- The problem is that they also output sizeable table-like results (e.g. restriction site lists) as text files or HTML tables, which we would like to transfer efficiently to spreadsheets (Excel) or databases (Access) for advanced analysis
- By learning some simple tricks and techniques, one can solve in 5 minutes what would otherwise take days of manual work taken away from research!
- Text file formats: to describe tables in text files, software uses alternative methods:
- Fixed column width tables: this is the most popular format, but also the worst:
  - Every column of the table has its own fixed character width
  - Data content cannot be longer than the column width. If it is shorter, the excess space is filled with Space (ASCII 32) characters
  - Viewed in a word processor, the columns look nice and aligned (assuming that the text is set in a fixed-width font such as Courier New)
  - Sometimes it does not contain column names, or only in abbreviated form, as they may not fit into the same number of characters as the data content
- Column delimiter symbol-based tables: less frequent, but better:
  - Columns are separated by a given delimiter symbol (space, comma, colon or semicolon)
  - So when looking at the file in a word processor, we can see a bunch of them
  - But the columns do not look nice and aligned, as their data content can vary considerably in length
  - On the other hand, the first line can contain column names of any length

Fixed column width example:
AA    BB    CC
6.45  5.5   7.35
15.6  17.8  3.2

Comma-delimited example:
AA,BB,CC
6.45,5.5,7.35
15.6,17.8,3.2
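The difference between the two formats shows up clearly when they are read by a program. The following sketch parses the same small table once from fixed-width text (by slicing at known column widths) and once from comma-delimited text (with Python's standard `csv` module). The column widths of 6 characters and the helper names are assumptions made for this example.

```python
import csv
import io

FIXED = (
    "AA    BB    CC\n"
    "6.45  5.5   7.35\n"
    "15.6  17.8  3.2\n"
)
DELIMITED = "AA,BB,CC\n6.45,5.5,7.35\n15.6,17.8,3.2\n"


def parse_fixed(text, widths):
    """Cut every line at the given column widths and strip the padding spaces."""
    rows = []
    for line in text.splitlines():
        row, pos = [], 0
        for width in widths:
            row.append(line[pos:pos + width].strip())
            pos += width
        rows.append(row)
    return rows


def parse_delimited(text, delimiter=","):
    """Split every line at the delimiter symbol."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


print(parse_fixed(FIXED, [6, 6, 6]))
print(parse_delimited(DELIMITED))
```

The fixed-width parser needs the widths supplied from outside, while the delimited parser needs only the delimiter symbol, which is one reason the delimited format is easier to process automatically.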
Data Import/Export/Conversion: Text file formats 2
- There are different variants of delimited formats:
- Comma Separated Values (CSV):
  - Popular among USB-connected instruments
  - However, in German and Hungarian the comma is used as the decimal separator instead of the dot, so it can be confused with the column separator. Text data can also contain commas
  - Therefore text data can be put between text markers ("")
- Space (ASCII 32) delimited format:
  - This is also a very popular format
  - One serious issue is that it is very easy to mix up with the fixed column width format, which prevents automatic processing:
    - If the columns are not aligned with spaces in all rows, the file cannot be processed as fixed format
    - Meanwhile, the space-delimited format interprets two consecutive spaces as an empty (null) field, messing up the columns: e.g. if there are 2 spaces before 7.35, the value ends up one column to the right
  - Such a messed-up text file can be corrected in Word by selecting the text with Shift+Cursor and launching the Edit|Find/Replace (Szerkesztés|Keresés/Csere) menu to replace two consecutive spaces with one, using the Replace all (Összes cseréje) button. Repeating this a few times eliminates all duplicated spaces
- Colon and semicolon separated formats: better than space-delimited, but these characters can also appear in the stored text. This can again be solved with text markers
- Tab (ASCII 9) delimited format: Tab specifically denotes a column break
  - It cannot be mixed up with other characters
  - But casual users can get confused, as Tab is invisible, except when the Show/Hide formatting marks (¶) button is pressed in Word

Comma-separated example with text markers:
"AA","This,not delimiter!","CC"
6.45,5.5,7.35
15.6,17.8,3.2

Semicolon-separated example with text markers:
"AA";"This;not delimiter!";"CC"
6.45;5.5;7.35
15.6;17.8;3.2

Space-delimited example (two spaces before 7.35 create an empty field, so 7.35 shifts into a fourth column):
AA BB CC
6.45 5.5  7.35
15.6 17.8 3.2
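Both pitfalls described above, delimiters inside text fields and duplicated spaces, can be reproduced in a few lines. The sketch below uses Python's standard `csv` and `re` modules; the sample strings are modelled on the slide examples, with ASCII double quotes as text markers.

```python
import csv
import io
import re

# 1) A comma inside a quoted field is not treated as a column delimiter.
quoted = '"AA","This,not delimiter!","CC"\n6.45,5.5,7.35\n'
rows = list(csv.reader(io.StringIO(quoted), delimiter=",", quotechar='"'))
print(rows[0])

# 2) Two consecutive spaces produce an empty field in a space-delimited row.
line = "6.45 5.5  7.35"           # note the two spaces before 7.35
naive = line.split(" ")           # the empty string is the unwanted null field
print(naive)

# Replacing every run of spaces by one space repairs the row,
# like the repeated Find/Replace in a word processor.
collapsed = re.sub(r" {2,}", " ", line).split(" ")
print(collapsed)
```

Collapsing spaces is only safe for space-delimited data; applied to a fixed-width table it would destroy the column alignment, which is why the two formats must not be mixed up.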
Data Import/Export/Conversion: Converting text file formats
- A frequent task is to export table-like text outputs into Excel, Access or PowerPoint (e.g. codon usage frequencies):
- Word text to HTML table:
  - Select the text with Shift+Pull
  - Table|Convert|Text to Table (Táblázat|Konvertálás|Szöveg táblázattá) menu:
    - It tries to autodetect whether the text is in fixed column width or in delimited format
    - If it misjudges (e.g. when the 2 formats are mixed), we can correct it
    - It gives the number of rows/columns to be created
- Properties of an HTML table in Word:
  - Its rows/columns/cells are fully formattable: they can be sized, colored and framed, and the font/style/size/color of the text can be set
  - Its cells can also contain pictures, while an Excel table cell cannot: there a picture can only be in the background or on an overlay
  - The width of columns can be set to Manual, Uniform, Fit to content, or Fit to window width
  - One annoyance of HTML tables is that the default cell margins are huge, eating up a lot of screen space. Reduce them to 0:
    - Select the whole table with Shift+Pull
    - Table|Table Properties (Táblázat|Táblázat tulajdonságai) menu:
      - Cells (Cella) tab|Settings (Beállítások) button
      - Uncheck Same as whole table (Teljes táblázattal egyezően)
      - Set Cell Margins (Margók) to 0 cm
  - Another annoyance is that the default row height is not 0, adding redundant space between rows. Set it to 0:
    - Rows (Sorok) tab
    - Define row height (Magasság megadása):
    - At least (legalább) 0 cm

Example text before conversion:
AA BB CC
6.45 5.5 7.35
15.6 17.8 3.2
Data Import/Export/Conversion: Converting HTML and wide text
- HTML table from Word to Excel/PowerPoint/HTML webpage:
  - It can simply be copied through the clipboard with Edit|Copy (Szerkesztés|Másolás) Ctrl+C and Edit|Paste (Szerkesztés|Beillesztés) Ctrl+V, keeping all the formatting
- HTML table from Word to text:
  - Select the whole table with Shift+Pull
  - Table|Convert|Table to Text (Táblázat|Konvertálás|Táblázat szöveggé) menu: writes the table out in delimited text format
  - Give the delimiter character: Tab
- Text from Word to Picture Metafile:
  - The output of numerous bioinformatics programs consists of text files with lines so wide (e.g. restriction or alignment maps drawn with characters) that they do not fit into the page body of a Word document or a PowerPoint slide, and the lines get broken up. How can we solve this?
  - We can reduce the font size, but that reduces legibility
  - Or we can switch from the fixed-width font Courier New to a more compact font (e.g. Arial Narrow), but then the alignment of rows is destroyed, because it is not a fixed-width font
  - Therefore copy the text to the clipboard, and instead of pasting it normally with Edit|Paste (Szerkesztés|Beillesztés) Ctrl+V, paste it with the Edit|Paste special (Szerkesztés|Irányított beillesztés) menu:
    - Select the Enhanced Metafile (Kép) format
    - The text will be pasted into Word or PowerPoint as an easy-to-resize picture
    - Additionally, using the drawing tools (View|Toolbars|Drawing (Nézet|Eszköztárak|Rajzoló) menu), the picture can still be edited as a set of graphic objects: we can rewrite characters or add further graphics
    - But it can no longer be edited as word processor text

Example of a wide character-based sequence map:
1 cagctggggggaggtggcgaggaagatgacgtggtagttgtcgcggcagctgccaggaga
1        10        20        30        40        50
1 gtcgacccccctccaccgctccttctactgcaccatcaacagcgccgtcgacggtcctct
Data Import/Export/Conversion: Text to Excel
- Text file table to Excel table:
  - Select the table in the text file and copy it to the clipboard, then paste it into cell A1 of an empty Excel worksheet with the Edit|Paste special (Szerkesztés|Irányított beillesztés) menu, selecting the Plain text (Nem formázott szöveg) format:
    - This will look pretty nasty at first: Excel places the lines into separate rows, but the columns are merged together as text in one cell
  - Select this single column (A1:A3) with Shift+Pull, and make sure that the columns to the right of it are empty
  - Then use the Data|Text to Columns (Adatok|Szövegből oszlopok) menu to start the text splitting wizard:
    - First it asks whether the text data is in fixed or delimited format:
      - If delimited, give the delimiter symbol (e.g. comma) and the text marker, and set whether consecutive delimiters are merged or create empty fields
      - If fixed, it shows a splitting screen where you can define the column break arrows with Click/Pull
    - Then it shows the columns created, and we can set their data type manually or leave it to automatic detection:
      - The first problem is that Excel by default recognizes text as dates only if it conforms to the international settings of Windows at Start button|Control panel|International settings|Date and numeric format (Start gomb|Vezérlőpult|Területi beállítások|Dátum- és számformátum). Dates in any other format are left as text!
      - You can recognize incorrect detection by the alignment: text is aligned to the left of the cell, recognized dates/numbers to the right
      - This can be solved by setting the Date (Dátum) format to match the data content (YMD (ÉHN), MDY (HNÉ), etc.)
      - With the Special (Irányított) button we can define the Decimal separator (Tizedesjel) and the Thousands separator (Ezres elválasztó) if they are not detected correctly
    - After pressing the Finish (Bezár) button of the wizard, the table is placed in consecutive columns with correct data formatting
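The type detection problem described above (decimal commas, thousands separators, dates in a particular order) can be illustrated with a small Python sketch. It is not how Excel works internally; it only mimics the idea that a cell is tried as a date, then as a number, and otherwise left as text. The function names, the Hungarian-style defaults (decimal comma, thousands dot) and the year.month.day date format are assumptions chosen for this example.

```python
from datetime import datetime


def parse_number(text, decimal=",", thousands="."):
    """Convert a locale-formatted number to float, e.g. '1.234,5' -> 1234.5."""
    cleaned = text.replace(thousands, "").replace(decimal, ".")
    return float(cleaned)


def parse_cell(text, date_format="%Y.%m.%d"):
    """Try a date first, then a number; fall back to plain text."""
    text = text.strip()
    try:
        return datetime.strptime(text, date_format).date()
    except ValueError:
        pass
    try:
        return parse_number(text)
    except ValueError:
        return text


for cell in ["6,45", "1.234,5", "2016.01.11", "AA", "5.5"]:
    print(repr(cell), "->", repr(parse_cell(cell)))
```

The last value shows the hazard: with a decimal comma and thousands dot configured, the dot-decimal value 5.5 is silently read as 55. This is why the separators must be set to match the data before importing.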
Data Import/Export/Conversion: From Excel to Text/HTML/Presentation
- Excel table to text format table:
  - We can copy the selected Excel table, diagram, or both together to the clipboard with Edit|Copy (Szerkesztés|Másolás) Ctrl+C
  - Paste it into Word or PowerPoint with the Edit|Paste special (Szerkesztés|Irányított beillesztés) menu in Plain text (Nem formázott szöveg) format. It puts Tab (ASCII 9) characters between the columns as delimiters
  - If we would like another delimiter, paste the table as HTML and convert it to text as described earlier, choosing the delimiter character
  - Alternatively, you can concatenate the content of the columns into continuous text in a separate column using a cell formula: =A1&","&B1&","&C1 where & is text concatenation, "" encloses a text constant, and A1 is a cell reference
- Excel table to presentation: HTML table/Picture Metafile/Bitmap:
  - Never ever paste it with Edit|Paste (Szerkesztés|Beillesztés) Ctrl+V into Word or PowerPoint! This embeds the WHOLE Excel file invisibly into the document/presentation, as many times as you pasted any part of the table:
    - The embedded Excel file can still perform computations with cell formulas, but most of the time we do not need that
    - However, it results in a huge document/presentation file, which will frequently freeze Word and PowerPoint
  - Correctly, you should paste it with Edit|Paste special (Szerkesztés|Irányított beillesztés), where you have the following options:
    - HTML format:
      - Preserves color/font formatting well, and the table is fully editable (cell formulas are replaced with numbers)
      - Row/column sizes and margins get messed up, and it is a lot of work to fix them!
    - Picture Metafile:
      - Excellent preservation of all formatting
      - Excellent resizability
      - Cannot be edited as a table
      - Can be edited as a drawing with the Word/PowerPoint drawing tools
      - For a simple table/graphic it consumes fewer resources than a bitmap
    - Bitmap:
      - It is pasted exactly as you see it on screen
      - Poor resizability: quality deteriorates rapidly
      - Very limited editability with Paintbrush
      - For highly complex diagrams a bitmap consumes fewer resources than a metafile

Example table:
AA    BB    CC
6.45  5.5   7.35
15.6  17.8  3.2
Data Import/Export/Conversion: Excel diagram to Picture Metafile in PPT
- Re-formatting charts for a presentation: there are some chart features we cannot set in Excel, but which are possible in a metafile: e.g. for complex 3D area charts, it would be great to create semi-transparent function surfaces partially covering each other, but this cannot be done in Excel. How to do it:
  - Copy the 3D area chart through the clipboard as a metafile
  - Convert the metafile into a PPT drawing with View|Toolbars|Drawing|Drawing menu|Ungroup (Nézet|Eszköztárak|Rajzoló|Rajzoló menü|Csoportbontás); repeat it as long as it can be done
  - Delete unnecessary chart background, axis, axis text, etc. elements
  - Select all the remaining elements and format them: double-clicking the selection, set their color, border and transparency
  - Group the elements together again
  - But a complicated drawing containing thousands of elements can eat up a lot of resources and freeze the presentation
  - Therefore, cut the metafile to the clipboard with Edit|Cut (Szerkesztés|Kivágás)
  - Paste it as a GIF picture with Edit|Paste special|GIF (Szerkesztés|Irányított beillesztés|GIF). It keeps transparency and reduces resource consumption, but from then on it can be edited only as an image
Data Import/Export/Conversion: From Excel to Access Database Table
- As a worksheet in the legacy Excel .xls format can hold at most 65,536 rows (1,048,576 in the .xlsx format of Excel 2007 and later), it is worth putting sizeable data tables into a database before Excel freezes. In Access, the steps are the following:
  - With the File|New|Empty database|{Path/Name.mdb}|Save (Fájl|Új|Üres adatbázis|{Elérési út/Név.mdb}|Mentés) menu we create a new empty *.mdb database file with the given name on the given path
  - With the File|Get external data|Import|Excel + Name.xls|Import (Fájl|Külső adatok átvétele|Importál|Excel fájlok + Név.xls|Importálás) menu, the import wizard is launched (only if Access was installed with the full setup!):
    - First, we select the worksheet from which we will import the table: it should have a regular row/column structure, with column names in the first line and an identical type of data within each column, otherwise Access cannot import it
    - Next, we can see the table to import, and it asks whether there are column names in the first line
    - Next, it asks whether to put the data in a new database table or in an already existing one (which should have a compatible column structure to receive the data)
    - Next, we can review the types of the columns
    - Next, it asks whether to assign a primary key to the table: No
    - At Finish, it asks for the name of the new table: Munka1 (the default sheet name in Hungarian Excel, Sheet1 in English)
  - After the wizard has finished, the new table can be opened with a double click on the Tables|Munka1 icon
- Access can handle roughly 10 million rows in a table (in practice bounded by its 2 GB database file limit) and computes much faster than Excel
- However, programming it is much more difficult: it is done in Structured Query Language (SQL)
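To give a feel for what importing a table into a database and querying it with SQL looks like, here is a small sketch. It deliberately swaps Access for SQLite, because SQLite ships with Python's standard library (`sqlite3`); the steps mirror the import wizard (column names from the first line, one type per column, a new table), but the table name `Sheet1`, the helper `load_table` and the choice of numeric (REAL) columns are assumptions of the example.

```python
import csv
import io
import sqlite3

TABLE = "AA,BB,CC\n6.45,5.5,7.35\n15.6,17.8,3.2\n"


def load_table(conn, name, text):
    """Create a table from delimited text whose first line holds the column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    # Names are interpolated into SQL here, so use this only with trusted input.
    columns = ", ".join(f'"{col}" REAL' for col in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', data)
    conn.commit()


conn = sqlite3.connect(":memory:")   # temporary in-memory database
load_table(conn, "Sheet1", TABLE)

row_count = conn.execute('SELECT COUNT(*) FROM "Sheet1"').fetchone()[0]
total = conn.execute('SELECT SUM("AA") FROM "Sheet1"').fetchone()[0]
print(row_count, total)
```

The two SELECT statements are ordinary SQL and would look much the same in an Access query, although Access has its own dialect for quoting names and some functions.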
Home Assignment 3: Fragment clean and match
- Clean the following fragments, given in FASTA format, of primer and vector sequences, and try to match them using suitable software! (5 pts)
  - Fragment1: Fragment1.txt
  - Fragment2: Fragment2.txt
  - Fragment3: Fragment3.txt
- Solution: 3-1HomeAssignSolution.doc
References
- Cloning, fragment processing:
  - Restriction site database: WebCutter: http://rna.lundberg.gu.se/cutter2/index.html
  - Primer cleaning: SMS2 DNA Pattern: http://www.bioinformatics.org/sms2/index.html
  - Vector cleaning: NCBI VecScreen: http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html
  - Fragment assembly: CAP3: http://pbil.univ-lyon1.fr/cap3.php
- Auxiliary sequence operations: SMS2: http://www.bioinformatics.org/sms2/index.html
- Uploading sequences: EBI: http://www.ebi.ac.uk/embl/Submission/index.html
- Data Import/Export/Conversion operations in Excel/Access:
  - http://www.andrewsexceltips.com/
  - http://www.andypope.info/
  - http://www.dicks-blog.com/