An Introduction to "Bioinformatics & Internet"

Post on 03-Jul-2015


description

An introduction to Bioinformatics & its relationship with internet.

transcript

01/31/14 Introduction to Computers

Bioinformatics & Internet

By ASAR KHAN

M.Sc Zoology AWKUM

Buner Campus

Bioinformatics

• Information technology applied to biological information in order to receive, analyze & retrieve biological data.


Let's Discuss in Detail…


What is a Computer?

• In general, a computer is a machine which accepts data, processes it and returns new information as output.

Data (input) → Processing → Information (output)


Software

• Software is a set of programs (step-by-step instructions) telling the computer how to process data. Software stored permanently in a hardware device is called “firmware”.

• Software needs to be installed on a computer, usually from a CD or USB drive.

• e.g. digital audio editors, Windows 98, Windows 2000, Windows XP, Windows 7, MS Office.


Advantages of Using Computers

• Speed: Computers can carry out instructions in less than a millionth of a second.

• Accuracy: Computers perform calculations very accurately and without errors.

• Diligence: Computers are capable of performing any task given to them repetitively.

• Storage capacity: Computers can store large volumes of data and information on magnetic media.

What is PERL?

• Larry Wall developed Perl, which was first released in 1987.

• Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from these files, and printing reports based on that information.

• It is also a good language for many system management tasks.

• In addition, Perl 5 is used for graphics programming, system administration, network programming, finance, bioinformatics, and other applications.


Advantages of PERL

• Cost and licensing: Perl's benefits include its generous licensing (it's free).

• Availability: Perl is generally available on most server platforms, including most UNIX variants, MS-DOS, Windows NT, Windows 95 and OS/2.


What is the Internet?

• The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite (TCP/IP) to serve several billion users worldwide.

• The Internet provides many services:
– Email
– World Wide Web (WWW)
– Remote login (Telnet)
– File transfer (FTP)


Computer Network

• A computer network is an interconnection of computers to share resources.

• Resources can be: information, processing load, devices etc.

Through Wi-Fi

• Wi-Fi is a popular technology that allows an electronic device to exchange data or connect to the internet wirelessly using radio waves.


Browsers

• Browsers are clients that communicate with servers, using a set of standard protocols & conventions.

• A browser contains the software we need in order to find, retrieve, view & send information over the internet.


Browsers

• Lynx

It was developed at the University of Kansas, USA, to construct a campus-wide information system. It provides a text-only interface at low cost.

• Mosaic

Developed in 1993 at NCSA, University of Illinois, USA. Originally designed for the X Window System, it provides a single user-friendly interface to the diverse protocols, data formats & information servers available throughout the internet.


• Netscape

Developed in 1994 by Netscape Communications Corporation (NCC), California, USA. It became the most popular package of its time for browsing information on the internet, e.g. e-mail, audio, video etc.

• Internet Explorer

Developed in 1995 by Microsoft Corp., Redmond, USA. Designed to work with PC-based operating systems, it offers hypermedia browsing, including support for Java & ActiveX.

Users can navigate by clicking on specific buttons or pictures, which are known as hyperlinks.


• Hyperlinks are usually characterized by being highlighted in some way, either by using a different color from the main body of the text or by being boxed etc.

• Each link has a uniform address known as a URL (Uniform Resource Locator).

• HTTP (HyperText Transfer Protocol) is used to exchange information over the internet.

• HTML (HyperText Markup Language)

Hypertext documents are written in a standard markup language known as HTML.

HTML code is strictly text-based, & any associated graphics or sound for a document exist as separate files in a common format.
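To make the relationship between HTML, hyperlinks and URLs concrete, here is a minimal Python sketch using only the standard library. The HTML snippet is invented for illustration (it reuses two addresses that appear elsewhere in these slides), and the class name LinkCollector is a hypothetical helper, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href target of every <a> tag (hyperlink) in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Illustrative HTML: plain text in which hyperlinks are marked up with <a> tags.
page = (
    '<p>Visit <a href="http://www.ncbi.nlm.nih.gov/entrez/">Entrez</a> '
    'or <a href="http://www.expasy.ch/">ExPASy</a>.</p>'
)

collector = LinkCollector()
collector.feed(page)

# Split each URL into protocol (scheme), server name and path on that server.
for link in collector.links:
    parts = urlsplit(link)
    print(parts.scheme, parts.hostname, parts.path)
```

Each URL names the protocol (here HTTP), the server to contact and the document requested from that server.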


EMBnet

• EMBnet (European Molecular Biology network) is an international network that aims to enhance bioinformatics services by bringing together bioinformatics service providers.


EMBnet

• Computers store sequence information as simple rows of sequence characters called strings. Each character is stored in binary code in a byte, the smallest addressable unit of memory (1 byte = 8 bits).

• A DNA sequence is usually stored & read in the computer as a series of 8-bit words in binary format. Each bit has the value 0 or 1, so an 8-bit word has 2^8 = 256 possible combinations (values 0 to 255).

• A protein sequence appears as a series of 8-bit words comprising the binary form of the amino acid letters.
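As a small illustration of the points above, the following Python sketch shows how each letter of a sequence maps to one 8-bit word. The sequence GATTACA and the function name to_bits are made up for the example.

```python
def to_bits(sequence: str) -> list[str]:
    """Return the 8-bit binary form of each character of an ASCII sequence."""
    return [format(byte, "08b") for byte in sequence.encode("ascii")]


dna = "GATTACA"  # hypothetical example sequence
dna_bits = to_bits(dna)

# Print each base next to the byte that represents it in memory.
for base, bits in zip(dna, dna_bits):
    print(base, bits)

# One byte holds 8 bits, each 0 or 1, so it can take 2**8 distinct values.
values_per_byte = 2 ** 8
print(values_per_byte)
```

The same function works unchanged for a protein sequence, since amino acid letters are ASCII characters too.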


• Normally DNA & protein sequences are stored as ASCII (American Standard Code for Information Interchange) text & presented in FASTA (FAST-All) format.


(1)
>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
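A FASTA record like the calmodulin example is a header line starting with ">" followed by one or more sequence lines. The Python sketch below reads such text into (header, sequence) pairs. The function name parse_fasta is a hypothetical helper, and stripping the trailing "*" (a terminator used by some databases) is an assumption about how one would want to treat it.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined; a trailing "*" terminator is dropped.
            chunks.append(line.rstrip("*"))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


fasta_text = """\
>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
"""

records = parse_fasta(fasta_text)
header, sequence = records[0]
print(header)
print(len(sequence), "residues")
```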

TELNET

• It allows a user to log on to a computer remotely & access its facilities. It is useful only for occasional queries.

• Its disadvantages are that it requires extensive management of user identification & can overload the processing power of the remote computer.


Address

• To facilitate communication between nodes, each computer on the internet is given a unique identifying number (its IP address).

• It is encoded in dotted decimal format, e.g. 182.181.255.15, which represents a particular machine (PC).

• The Domain Name System (DNS) is also implemented, which makes internet addresses easier for users.

e.g. ncbi.nlm.nih.gov, meaning

ncbi = National Center for Biotechnology Information

nlm = National Library of Medicine

nih = National Institutes of Health
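The dotted decimal notation above is just a readable way of writing a 32-bit number, one byte per dotted part. The Python sketch below illustrates this with the standard ipaddress module, using the example address from the slide. It also reads the domain name from right to left, from the most general label to the most specific one. No network lookup is performed.

```python
import ipaddress

# Parse the example address; each dotted part is one byte (0 to 255).
address = ipaddress.IPv4Address("182.181.255.15")
octets = [int(part) for part in str(address).split(".")]

# The same address written as a single 32-bit integer.
as_integer = int(address)
print(octets, as_integer)

# A domain name is a list of labels, most specific first.
labels = "ncbi.nlm.nih.gov".split(".")
for label in reversed(labels):
    print(label)
```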

World Wide Web (WWW)

• The World Wide Web consists of all the public Web sites connected to the Internet worldwide, including the client devices (such as computers and cell phones) that access Web content.


• It was developed at CERN (the European Council for Nuclear Research, now the European Organization for Nuclear Research) in 1989 to allow international information sharing. It led to a medium through which text, images, sounds & videos could be delivered on demand to users.

• The WWW greatly enhanced the power of cross-references, with the guarantee of retrieving the latest information.

• The first molecular biology web server was ExPASy (Expert Protein Analysis System), developed in 1993 by Geneva University Hospital & the University of Geneva.


Web pages

• The documents which appear in the web browser window when we surf the WWW are called web pages.

• Each document displayed on the web is called a “web page”, & all of the related pages on a particular server are collectively called a web site.

• A web site is a collection of related web pages stored on one computer, & each web site has a unique address. The most important feature of a site is the link, which allows a jump to another page anywhere in the current web site.


Nodes

• In communication networks, a node is a connection point, either a redistribution point or a communication endpoint.

• EMBnet operates 34 nodes, of which 20 are national nodes. National nodes are mandated by their countries to provide databases, software and online services, including sequence analysis, protein modeling, genetic mapping etc.


• 8 nodes are designed for user support & training and to undertake research and development.

• These are academic, industrial or research centers that have knowledge of specific areas of bioinformatics.

• They are responsible for the maintenance of biological databases & software.


• The remaining 6 sites have been accepted within EMBnet as associate nodes. These are biocomputing centers from non-European countries that serve their user communities with the same kinds of service as a typical national node.

• Most of them offer up-to-date access to sequence databases & analysis software for molecular mapping, genome management, genetic mapping & so on.


EMBnet associate nodes

Abbreviation   Country        Site
IBBM           Argentina      http://sol.biol.unlp.edu.ar/
ANGIS          Australia      http://www.angis.su.oz.au/
CBI            China          http://www.cbi.pku.edu.cn/
CIGB           Cuba           http://www.bio.cigb.edu.cu
CDFD           India          http://salarjung.embnet/
SANBI          South Africa   http://www.sanbi.ac.za


SRS (Sequence Retrieval System)

• It is a network browser for databases in molecular biology, which evolved to help EMBnet users.

• It allows any flat-file database to be indexed to any other, so that users can retrieve, link & access entries from all the interconnected resources.

• It links nucleic acid, protein sequence, structure, pattern & bibliographic databases.


• SRS is an integrated system for information retrieval from many different sequence databases & for feeding the sequences retrieved into analytic tools such as sequence comparison and alignment programs.

• It can search a total of 141 databases of protein & nucleotide sequences, metabolic pathways, 3D structures & functions, genomes, diseases and phenotype information.
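The idea of indexing one flat-file database against another can be sketched in a few lines of Python. This is not how SRS itself is implemented: the two-letter line codes (ID, DE, DR), the "//" entry terminator and every identifier below are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical flat-file entries; all IDs and descriptions are invented.
PROTEIN_FLAT_FILE = """\
ID   P_EXAMPLE1
DE   Example calcium-binding protein
DR   NUC; N_EXAMPLE1
//
ID   P_EXAMPLE2
DE   Example kinase
DR   NUC; N_EXAMPLE2
//
"""

NUCLEOTIDE_FLAT_FILE = """\
ID   N_EXAMPLE1
DE   Example mRNA for a calcium-binding protein
//
ID   N_EXAMPLE2
DE   Example mRNA for a kinase
//
"""


def index_flat_file(text: str) -> dict[str, dict[str, list[str]]]:
    """Index entries by their ID line; each entry maps line codes to values."""
    index = {}
    for block in text.split("//"):
        fields = {}
        for line in block.strip().splitlines():
            code, value = line.split(maxsplit=1)
            fields.setdefault(code, []).append(value.strip())
        if "ID" in fields:
            index[fields["ID"][0]] = fields
    return index


def follow_links(entry_id, source_index, target_index):
    """Return the target entries cross-referenced by one entry's DR lines."""
    linked = []
    for reference in source_index[entry_id].get("DR", []):
        _, target_id = reference.split(";")
        target_id = target_id.strip()
        if target_id in target_index:
            linked.append(target_index[target_id])
    return linked


proteins = index_flat_file(PROTEIN_FLAT_FILE)
nucleotides = index_flat_file(NUCLEOTIDE_FLAT_FILE)

# Retrieve a protein entry, then follow its link into the other database.
linked = follow_links("P_EXAMPLE1", proteins, nucleotides)
print(linked[0]["DE"][0])
```

Once both files are indexed by ID, a cross-reference in one entry is enough to retrieve the related entry from the other database.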


NCBI (The National Center for Biotechnology Information)

• Established in 1988 in the USA as a division of the National Library of Medicine, located in Bethesda, Maryland.

• Its role is to develop new information technologies to aid our understanding of the molecular & genetic processes that underlie health & disease.

• Its specific aims include the creation of automated systems for storing and analyzing biological information.


• The development of advanced methods of computer-based information processing.

• The facilitation of user access to databases & software, and the coordination of efforts to gather biotechnology information worldwide.

• It maintains GenBank, the NIH DNA sequence database. These data are exchanged with the international nucleotide databases EMBL & DDBJ.


Entrez

• Databases of different kinds are merged together and become global hubs of knowledge.

• Just like SRS for EMBnet, the Entrez facility evolved at NCBI to allow retrieval of molecular biology data & bibliographic citations from NCBI's databases.

• It permits related articles in different databases to be linked to each other.


• It provides access to DNA sequences from GenBank, EMBL & DDBJ, and to protein sequences from SWISS-PROT, PIR, PRF, PDB & translations of protein sequences from the DNA sequence databases.

• It is the front end to all databases maintained by NCBI & is extremely easy to use. It is linked to a total of 11 databases.

• It can be accessed through the NCBI website at the following URL:

http://www.ncbi.nlm.nih.gov/entrez/


Databases covered by Entrez are listed below

Category                     Databases
1. Nucleic acid sequences    Entrez Nucleotides: sequences obtained from GenBank, RefSeq & PDB
2. Protein sequences         Entrez Protein: sequences obtained from SWISS-PROT, PIR, PRF, PDB & translations from coding regions in GenBank & RefSeq
3. 3D structures             Entrez Molecular Modeling Database (MMDB)
4. Genomes                   Complete genome assemblies from many sources
5. PopSet                    From GenBank, sets of DNA sequences that have been collected to analyze the evolutionary relatedness of a population
6. OMIM                      Online Mendelian Inheritance in Man
7. Taxonomy                  NCBI taxonomy database
8. Books                     Bookshelf
9. ProbeSet                  Gene Expression Omnibus (GEO)
10. 3D domains               Domains from the Entrez Molecular Modeling Database
11. Literature               PubMed

Retrieval & Application

• The two main reasons for putting data on the computer are retrieval & discovery.

• Retrieval is the ability to get back out what we put in. Discovery is more valuable: getting back from the system more knowledge than was put in.

• This helps in making biological discoveries.

• NCBI uses 4 core data elements: bibliographic citations, DNA sequences, protein sequences & 3D structures.


Bioseq

• A Bioseq, or biological sequence, is the central element in the NCBI data model. It contains a single, continuous molecule of nucleic acid or protein.
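To illustrate the idea of a Bioseq as one continuous molecule with an identifier and a molecule type, here is a minimal Python sketch. It is not NCBI's actual definition: the class layout, the field names (seq_id, molecule, residues) and the validation rules are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bioseq:
    """Illustrative record for one continuous biological sequence."""

    seq_id: str     # hypothetical identifier for the sequence
    molecule: str   # "nucleic acid" or "protein"
    residues: str   # the sequence letters, with no gaps or breaks

    def __post_init__(self):
        if self.molecule not in ("nucleic acid", "protein"):
            raise ValueError("molecule must be 'nucleic acid' or 'protein'")
        if not self.residues.isalpha():
            raise ValueError("residues must be one continuous run of letters")

    def __len__(self):
        return len(self.residues)


dna = Bioseq("example_dna", "nucleic acid", "GATTACA")
protein = Bioseq("example_protein", "protein", "ADQLTEEQIAEF")
print(len(dna), len(protein))
```

Rejecting spaces or other separators in residues is one simple way to express the "single, continuous molecule" rule in code.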


Mirrors & Intranet

• Different servers providing the same services are called mirrors. To access a particular web site, it is necessary to type its URL in the address bar of the browser.


Intranet

• Many academic institutions have an intranet, which means a local network that can be accessed only from computers within the institution.


• What makes the web so powerful is its network.

• Here are some basic sites for beginners in bioinformatics:

1. http://www.ncbi.nlm.nih.gov/

2. http://www.ebi.ac.uk/

3. http://www.expasy.ch/

4. http://www.embl-heidelberg.de/

5. http://www.gmd.de/welcome.en.html

6. http://links.bmn.com/


• Apart from these sites , there are a great number of specialist sites with biological data which can be accessed. e.g

• General purpose search engines such as


THANK YOU FOR YOUR ATTENTION

Questions are Welcomed . . .
