
Motif Space Database Design

Kiranjit Sidhu

2

Outline
- Schema Design
- Content of Database
- Functionality
- Future Plans

3

Sample PDB File


Each PDB file is represented as a plain-text file (~60K lines)

Scanning these text files directly is inefficient for pattern matching, so a relational database is needed for an efficient solution
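To make the contrast concrete, here is a minimal Python sketch of what "a text file" means in practice: each atom sits on a fixed-width ATOM or HETATM line, so every search over the raw file has to re-read and re-slice text. The column positions follow the published PDB format; the names `AtomRecord`, `parse_atom_line`, and `parse_pdb` are hypothetical, and only a subset of the fields is kept.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class AtomRecord(NamedTuple):
    """Subset of the fields on one ATOM/HETATM line (hypothetical name)."""
    record: str
    serial: int
    atom_name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Optional[AtomRecord]:
    """Slice one fixed-width PDB line; return None if it is not a coordinate record."""
    record = line[0:6].strip()            # columns 1-6: record name
    if record not in ("ATOM", "HETATM"):
        return None
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),           # columns 7-11: atom serial number
        atom_name=line[12:16].strip(),    # columns 13-16: atom name
        res_name=line[17:20].strip(),     # columns 18-20: residue name
        chain_id=line[21],                # column 22: chain identifier
        res_seq=int(line[22:26]),         # columns 23-26: residue sequence number
        x=float(line[30:38]),             # columns 31-38: x coordinate (angstroms)
        y=float(line[38:46]),             # columns 39-46: y coordinate
        z=float(line[46:54]),             # columns 47-54: z coordinate
    )


def parse_pdb(lines: Iterable[str]) -> Iterator[AtomRecord]:
    """Yield an AtomRecord for every coordinate line, skipping all other records."""
    for line in lines:
        rec = parse_atom_line(line)
        if rec is not None:
            yield rec
```

Every query against the flat file repeats this parsing; importing the records into tables pays that cost once.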

4

Structure of Database

The database is divided into two major components:

- Protein Data
  - Obtained from PDB files (Protein Data Bank)
  - Derived data
- Motif (Occurrence) Data
  - Obtained from Luke’s FFSM technique
  - Derived data

5

Schema Design

6

Schema Design - Protein

7

Schema Design - Motif
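The schema diagrams themselves are not reproduced here. Purely as an assumption-laden sketch, the two components could map to tables like the following, using SQLite through Python's `sqlite3` as a stand-in for SQL Server. Apart from `proteinchaindistance`, which the slides name, every table and column below is hypothetical.

```python
import sqlite3

# Hypothetical layout: only the name proteinchaindistance comes from the slides.
SCHEMA = """
-- Protein component (from PDB files, plus derived data)
CREATE TABLE protein (
    pdb_id   TEXT PRIMARY KEY
);
CREATE TABLE protein_chain (
    pdb_id   TEXT NOT NULL REFERENCES protein(pdb_id),
    chain_id TEXT NOT NULL,
    PRIMARY KEY (pdb_id, chain_id)
);
CREATE TABLE residue (
    pdb_id   TEXT NOT NULL,
    chain_id TEXT NOT NULL,
    res_seq  INTEGER NOT NULL,
    res_name TEXT NOT NULL,
    PRIMARY KEY (pdb_id, chain_id, res_seq),
    FOREIGN KEY (pdb_id, chain_id) REFERENCES protein_chain(pdb_id, chain_id)
);
-- Derived protein data: one row per residue pair within a chain (assumed)
CREATE TABLE proteinchaindistance (
    pdb_id    TEXT NOT NULL,
    chain_id  TEXT NOT NULL,
    res_seq_a INTEGER NOT NULL,
    res_seq_b INTEGER NOT NULL,
    distance  REAL NOT NULL,
    PRIMARY KEY (pdb_id, chain_id, res_seq_a, res_seq_b)
);
-- Motif component (from FFSM) and where each motif occurs
CREATE TABLE motif (
    motif_id   INTEGER PRIMARY KEY,
    definition TEXT NOT NULL
);
CREATE TABLE motif_occurrence (
    motif_id INTEGER NOT NULL REFERENCES motif(motif_id),
    pdb_id   TEXT NOT NULL,
    chain_id TEXT NOT NULL,
    PRIMARY KEY (motif_id, pdb_id, chain_id),
    FOREIGN KEY (pdb_id, chain_id) REFERENCES protein_chain(pdb_id, chain_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all hypothetical tables in one script."""
    conn.executescript(SCHEMA)
```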

8

Tools Used

- Obtaining data: Perl scripts
- Database: SQL Server 2000 and SQL Server 2005
- T-SQL (bulk data import)

9

Obtaining Data

PDB File → (Extract) → CSV File → (Import) → Temp Tables (T-SQL) → (Convert and Derive, T-SQL Procedures) → Final DB
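A minimal sketch of the import and convert-and-derive steps, using SQLite via Python's `sqlite3` as a stand-in for SQL Server and T-SQL. It assumes the extract step has already written one CSV row per atom; the table names `staging_atom` and `atom`, the column layout, and the C-alpha filter used as the example derivation are all hypothetical.

```python
import csv
import io
import sqlite3


def load_via_staging(conn: sqlite3.Connection, csv_text: str) -> int:
    """Import extracted CSV rows into a temp table, then derive typed final rows."""
    cur = conn.cursor()
    # Temp (staging) table mirrors the raw CSV columns as untyped text.
    cur.execute(
        "CREATE TEMP TABLE staging_atom ("
        " pdb_id TEXT, chain_id TEXT, res_seq TEXT, res_name TEXT,"
        " atom_name TEXT, x TEXT, y TEXT, z TEXT)"
    )
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Import step: one batched call instead of a separate statement per row.
    cur.executemany("INSERT INTO staging_atom VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    # Final table with proper column types (hypothetical name and columns).
    cur.execute(
        "CREATE TABLE IF NOT EXISTS atom ("
        " pdb_id TEXT, chain_id TEXT, res_seq INTEGER, res_name TEXT,"
        " atom_name TEXT, x REAL, y REAL, z REAL)"
    )
    # Convert-and-derive step: cast types and keep only C-alpha atoms (example rule).
    cur.execute(
        "INSERT INTO atom "
        "SELECT pdb_id, chain_id, CAST(res_seq AS INTEGER), res_name, atom_name,"
        " CAST(x AS REAL), CAST(y AS REAL), CAST(z AS REAL) "
        "FROM staging_atom WHERE atom_name = 'CA'"
    )
    cur.execute("DROP TABLE staging_atom")
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM atom").fetchone()[0]
```

Keeping the raw rows in a staging table lets the set-based conversion run as a single SQL statement rather than per-row logic in the extraction script.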

10

Uploading Protein Data

- Input dataset: ~70,000 PDB/chain combinations
- Entries in tables: e.g. approx. 800 million rows in the proteinchaindistance table
- Initial version: imported 10 PDB files in 1 day
- Current version: under 3 minutes
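Assuming, hypothetically, that `proteinchaindistance` stores one row per unordered pair of residues within a chain, a chain of n residues contributes n(n-1)/2 rows, so the row count grows quadratically with chain length. That growth would be consistent with a table of roughly 800 million rows. A sketch of generating such rows, with one representative coordinate per residue (for example the C-alpha atom, also an assumption):

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def chain_distance_rows(
    residues: Sequence[Tuple[int, Point]],
) -> List[Tuple[int, int, float]]:
    """Return (res_seq_a, res_seq_b, distance) for every unordered residue pair.

    A chain of n residues yields n * (n - 1) // 2 rows.
    """
    return [
        (seq_a, seq_b, math.dist(pos_a, pos_b))  # Euclidean distance
        for (seq_a, pos_a), (seq_b, pos_b) in combinations(residues, 2)
    ]
```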

11

Current Functionality

- Protein (PDB) data has been completely uploaded into both:
  - Production database (MotifSpace)
  - Development database (MotifSpaceDev)
- Protein structure can be visualized using data from the database (data available)
- Data can be obtained from the server using SOAP or other web services
- Basic queries, e.g. in which different PDB entries does a specific motif occur?
- Histograms to compute statistics
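The motif query above can be sketched as a `SELECT DISTINCT` over an occurrence table, and a histogram as a `GROUP BY` count. This assumes a hypothetical `motif_occurrence(motif_id, pdb_id, chain_id)` table in SQLite; the real MotifSpace table names may differ.

```python
import sqlite3
from collections import Counter
from typing import List


def pdbs_for_motif(conn: sqlite3.Connection, motif_id: int) -> List[str]:
    """Distinct PDB entries in which the given motif occurs, sorted by code."""
    sql = (
        "SELECT DISTINCT pdb_id FROM motif_occurrence "
        "WHERE motif_id = ? ORDER BY pdb_id"
    )
    return [row[0] for row in conn.execute(sql, (motif_id,))]


def occurrence_histogram(conn: sqlite3.Connection) -> Counter:
    """Map k to the number of motifs that occur in exactly k distinct PDB entries."""
    sql = (
        "SELECT motif_id, COUNT(DISTINCT pdb_id) FROM motif_occurrence "
        "GROUP BY motif_id"
    )
    return Counter(k for _, k in conn.execute(sql))
```

For example, `pdbs_for_motif(conn, 1)` lists the PDB codes containing motif 1, and `occurrence_histogram(conn)` gives the counts behind a histogram of how widely motifs are spread across entries.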

12

Demo

Date post:	19-Dec-2015