Date posted: 19-Dec-2015
Sample PDB File
Each PDB file is stored as a text file (~60K lines).
A flat text format is inefficient for pattern matching; a relational database was required for the most efficient solution.
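To show what "a PDB file as text" means for querying, here is a minimal Python sketch that turns the coordinate records of a PDB file into relational rows. It assumes the standard fixed-column layout of `ATOM`/`HETATM` records; `AtomRow`, `parse_atom_line`, and `parse_pdb_text` are hypothetical names, not the project's Perl scripts.

```python
from typing import List, NamedTuple, Optional


class AtomRow(NamedTuple):
    """One ATOM/HETATM record flattened into a relational row (hypothetical column set)."""
    serial: int
    atom_name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Optional[AtomRow]:
    """Parse one fixed-column PDB line; return None for non-coordinate records."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        return None
    # Slices are the 0-based equivalents of the 1-based PDB column ranges.
    return AtomRow(
        serial=int(line[6:11]),         # columns 7-11
        atom_name=line[12:16].strip(),  # columns 13-16
        res_name=line[17:20].strip(),   # columns 18-20
        chain_id=line[21],              # column 22
        res_seq=int(line[22:26]),       # columns 23-26
        x=float(line[30:38]),           # columns 31-38
        y=float(line[38:46]),           # columns 39-46
        z=float(line[46:54]),           # columns 47-54
    )


def parse_pdb_text(text: str) -> List[AtomRow]:
    """Scan a whole PDB file line by line and keep only the coordinate rows."""
    rows = [parse_atom_line(line) for line in text.splitlines()]
    return [row for row in rows if row is not None]


if __name__ == "__main__":
    sample = "ATOM      1  N   MET A   1      38.198  19.582  28.575"
    print(parse_pdb_text("HEADER    EXAMPLE\n" + sample))
```

Every pattern-matching question against the raw file means re-scanning tens of thousands of lines like this; once the rows sit in indexed tables, the database can answer the same question directly.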
Structure of Database
The database is divided into two major components: protein data and motif (occurrence) data.
Protein data: obtained from PDB files (Protein Data Bank), plus derived data.
Motif data: obtained from Luke’s FFSM technique, plus derived data.
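The two-component layout could be expressed as a relational schema along these lines. This is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for SQL Server; every table and column name is hypothetical, since the slides do not give the actual schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the actual MotifSpace tables.
SCHEMA = """
-- Protein data: loaded from PDB files
CREATE TABLE protein_chain (
    chain_key INTEGER PRIMARY KEY,
    pdb_id    TEXT NOT NULL,
    chain_id  TEXT NOT NULL,
    UNIQUE (pdb_id, chain_id)
);
CREATE TABLE residue (
    chain_key INTEGER NOT NULL REFERENCES protein_chain (chain_key),
    res_seq   INTEGER NOT NULL,
    res_name  TEXT NOT NULL,
    PRIMARY KEY (chain_key, res_seq)
);
-- Motif (occurrence) data: loaded from the FFSM output
CREATE TABLE motif (
    motif_id INTEGER PRIMARY KEY,
    label    TEXT NOT NULL
);
CREATE TABLE motif_occurrence (
    occurrence_id INTEGER PRIMARY KEY,
    motif_id      INTEGER NOT NULL REFERENCES motif (motif_id),
    chain_key     INTEGER NOT NULL REFERENCES protein_chain (chain_key)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both components (protein data and motif data) in one database."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(names)
```

The `motif_occurrence` table is what links the two components: each row ties a motif to a protein chain, so motif questions become joins rather than file scans.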
Tools Used
Obtaining data: Perl scripts
Database: SQL Server 2000 and SQL Server 2005
Bulk data import: T-SQL
Obtaining Data
Extract: PDB file → CSV file
Import: CSV file → temp tables (T-SQL)
Convert and derive: temp tables → final DB (T-SQL procedures)
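The three steps can be sketched end to end in Python. This is only an illustration under stated assumptions: the `csv` module stands in for the Perl extraction scripts, `sqlite3` with `executemany` stands in for SQL Server temp tables and T-SQL bulk import, and the staging and final table names (`atom_stage`, `protein_chain`) and the derived `n_residues` column are hypothetical.

```python
import csv
import io
import sqlite3
from typing import TextIO


def extract_to_csv(pdb_id: str, pdb_text: str, out: TextIO) -> None:
    """Extract: write one CSV row per ATOM record (hypothetical column set)."""
    writer = csv.writer(out)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM  "):
            writer.writerow([pdb_id, line[21], int(line[22:26]), line[12:16].strip(),
                             float(line[30:38]), float(line[38:46]), float(line[46:54])])


def import_csv(conn: sqlite3.Connection, csv_text: str) -> None:
    """Import: bulk-load the CSV rows into a temporary staging table."""
    conn.execute("CREATE TEMP TABLE atom_stage (pdb_id TEXT, chain_id TEXT, res_seq INTEGER,"
                 " atom_name TEXT, x REAL, y REAL, z REAL)")
    rows = csv.reader(io.StringIO(csv_text))
    conn.executemany("INSERT INTO atom_stage VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def convert_and_derive(conn: sqlite3.Connection) -> None:
    """Convert and derive: move staged data into a final table, computing a derived column."""
    conn.execute("CREATE TABLE IF NOT EXISTS protein_chain (pdb_id TEXT, chain_id TEXT,"
                 " n_residues INTEGER, PRIMARY KEY (pdb_id, chain_id))")
    conn.execute("""INSERT INTO protein_chain (pdb_id, chain_id, n_residues)
                    SELECT pdb_id, chain_id, COUNT(DISTINCT res_seq)
                    FROM atom_stage GROUP BY pdb_id, chain_id""")
    conn.execute("DROP TABLE atom_stage")  # staging data is no longer needed
    conn.commit()
```

A typical run extracts into a buffer, then calls `import_csv(conn, buf.getvalue())` followed by `convert_and_derive(conn)`. Keeping the raw import and the conversion in separate steps lets the bulk load stay simple while the derivation logic lives in the database.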
Uploading Protein Data
Input dataset: ~70,000 PDB/chain combinations
Entries in tables: e.g., approx. 800 million rows in the proteinchaindistance table
Import speed: the initial version imported 10 PDB files in 1 day; the current version takes under 3 minutes.
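The slides do not say what a row of `proteinchaindistance` holds. Assuming, hypothetically, that it stores one distance per residue pair within a chain, a chain of n residues contributes n(n-1)/2 rows, which is how ~70,000 chains can add up to hundreds of millions of rows. A minimal Python sketch of generating such rows, with hypothetical function names and C-alpha coordinates keyed by residue number as the assumed input:

```python
import math
from itertools import combinations
from typing import Dict, Iterator, Tuple

Coord = Tuple[float, float, float]


def chain_distance_rows(pdb_id: str, chain_id: str,
                        ca_coords: Dict[int, Coord]) -> Iterator[Tuple[str, str, int, int, float]]:
    """Yield one hypothetical proteinchaindistance row per residue pair (i < j)."""
    for (res_i, p), (res_j, q) in combinations(sorted(ca_coords.items()), 2):
        yield (pdb_id, chain_id, res_i, res_j, math.dist(p, q))


def pair_count(n_residues: int) -> int:
    """Number of rows one chain contributes: n choose 2."""
    return n_residues * (n_residues - 1) // 2


if __name__ == "__main__":
    coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 1.0)}
    for row in chain_distance_rows("1ABC", "A", coords):
        print(row)
    print(pair_count(150))
```

Under that assumption, 800 million rows over ~70,000 chains averages roughly 11,400 pairs per chain, about what a 150-residue chain produces (150 × 149 / 2 = 11,175). The quadratic growth is why row counts dwarf the number of input files.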
Current Functionality
Protein (PDB) data has been completely uploaded into both the production database (MotifSpace) and the development database (MotifSpaceDev).
Protein structures can be visualized using data from the database (data available).
Data can be obtained from the server via SOAP or other web services.
Basic queries, such as: in which different PDBs does a specific motif occur? Histograms for computing statistics.
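The two basic query types listed above could look roughly like this. The sketch again uses `sqlite3` as a stand-in for SQL Server and assumes a hypothetical `motif_occurrence` table with one row per occurrence of a motif in a PDB chain; the histogram counts how many PDBs contain a given motif once, twice, and so on.

```python
import sqlite3
from typing import Dict, List


def pdbs_for_motif(conn: sqlite3.Connection, motif_id: int) -> List[str]:
    """Which different PDB entries does a given motif occur in?"""
    rows = conn.execute(
        "SELECT DISTINCT pdb_id FROM motif_occurrence WHERE motif_id = ? ORDER BY pdb_id",
        (motif_id,),
    )
    return [pdb_id for (pdb_id,) in rows]


def occurrence_histogram(conn: sqlite3.Connection, motif_id: int) -> Dict[int, int]:
    """Map occurrences-per-PDB to the number of PDBs with that many occurrences."""
    rows = conn.execute(
        """
        SELECT per_pdb.n, COUNT(*)
        FROM (SELECT pdb_id, COUNT(*) AS n
              FROM motif_occurrence
              WHERE motif_id = ?
              GROUP BY pdb_id) AS per_pdb
        GROUP BY per_pdb.n
        ORDER BY per_pdb.n
        """,
        (motif_id,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE motif_occurrence (motif_id INTEGER, pdb_id TEXT, chain_id TEXT)")
    conn.executemany(
        "INSERT INTO motif_occurrence VALUES (?, ?, ?)",
        [(7, "1ABC", "A"), (7, "1ABC", "B"), (7, "2XYZ", "A"), (7, "3DEF", "A"), (9, "1ABC", "A")],
    )
    print(pdbs_for_motif(conn, 7))
    print(occurrence_histogram(conn, 7))
```

The derived-table alias (`AS per_pdb`) is optional in SQLite but required by T-SQL, so the query is written in a form both accept.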