
Post on 15-Jan-2016


Databases in Internet Applications:

Case Studies

Anil Nori
CTO
AserA Inc.
Palo Alto
USA
anori@asera.com

Acknowledgements

Sources for some of the material:
• Oracle Corporation
• CNN Custom News
• Excite
• Cisco

Database Technology Timeline

[Figure: timeline running from "Simple Data Management" to "Global Enterprise Management"]
Periods: Early 80s; Late 80s; Early - Mid 90s; Late 90s - 21st C
Technology stages: Pre-relational; Early Relational; Client-server Relational; Enterprise-capable Relational; Internet Computing
Workloads: Simple OLTP; Active Database; Data Warehouse & Hi-end OLTP; Packaged & Vertical Applications
Features:
• Simple transactions, on-line backup & recovery
• Stored procedures, triggers
• Scalable OLTP, parallel query, partitioning, cluster support, row-level locking, high availability
• Middleware (messaging, queues, events), Java, CORBA, Web interfaces
• Support for all types of data, extensibility, objects

Current State of DBMSs

OLTP applications

• Large amounts of data

• Simple data, simple queries and updates
  Update statement from debit/credit transaction:

    UPDATE accounts
    SET abalance = abalance + :delta
    WHERE aid = :aid;

• Typically update intensive

• Large number of concurrent users (transactions)

Data warehousing applications

• Large amounts of data

• Simple data but complex querying

• Typically read intensive

• Large number of users
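The debit/credit update shown for OLTP applications can be run end to end as a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a production DBMS. The table layout and the `apply_delta` helper are assumptions made for illustration; only the UPDATE statement and its `:delta` and `:aid` bind variables come from the slide.

```python
import sqlite3

def apply_delta(conn, aid, delta):
    """Apply one debit/credit update as a single short transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid",
            {"delta": delta, "aid": aid},
        )
        if cur.rowcount != 1:
            raise KeyError(f"no such account: {aid}")

# Hypothetical schema: one row per account.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (aid INTEGER PRIMARY KEY, abalance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

apply_delta(conn, aid=1, delta=-30)  # debit 30 from account 1
balance = conn.execute("SELECT abalance FROM accounts WHERE aid = 1").fetchone()[0]
print(balance)
```

Bind variables let the same statement be reused by every transaction with different values, which suits the simple, update-intensive workload described above.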

Current State of DBMSs

These applications require:

• Large numbers of users/transactions

• High performance

• High availability (7x24 operations)

• Scalability

• High levels of security

• Administrative support

• Good utilities

Internet Applications: Challenges

Size: Terabytes vs. Gigabytes
Usage: Immediate vs. Batch
Importance: Business-Critical vs. Useful
Users: Every Employee vs. Analysts
Larger User Populations: Self-Service vs. Trained
Network Systems: Integrated vs. Independent
Systems Management: Intelligent vs. Simple
Operations Hours: Global vs. Local

Transaction Processing / Data Warehousing

Internet Applications: Challenges

Information Management
• Type: Heterogeneous vs. Tabular
• Delivery: Personalized vs. Generic
• Access: Lots of read-only vs. Read/write
• Content: Search vs. Direct
E-commerce/Apps
• APIs: Open vs. Proprietary
• Applications: Integrated vs. Standalone
Site Operation
• Management: Low TCO, Mission Critical
• Availability: 24X7 vs. Occasional

Internet Challenges

Availability
• Need near 100% availability
• Must be easy to manage
• Replication, hot standby, foolproof system?
Scalability
• Number of users is orders of magnitude higher
Security
• Global users
• Managing millions of users
• Encryption
• Performance
Internet user expectations
• Speed vs correctness (e.g. Search engines vs blade/cartridge/extender)
• Availability vs correctness

Internet Application Architecture: Today

[Figure: multi-tier architecture diagram]
Tiers: Client Tier; Physical Middle Tier; Data Sources
Components: Browsers; WEB/APP Server; Middle Tier Application; ORDBMS (Data Integration, Storage, Query, Management); Other Data Sources; Gateways; OLE/DB data source; authoring tools etc.
Links: HTTP; application messages; remote messages

Case Studies

CNN Custom News
Excite
Cisco Internet Applications

CNN Custom News

On-line news service
Allows users to customize news in a personalized manner
Offers a variety of news items (e.g. national, international, business etc.)

Custom News Application Architecture

[Figure: three-tier architecture diagram]
Tiers: Client Tier; Physical Middle Tier; Database Tier
Components: Browsers; Hardware Load Balancing; WEB Servers; Application Servers; Oracle DBMS instances (OPS)
Links: HTTP

CNN Custom News

Backend:
• SUN Solaris enterprise servers
• Oracle Parallel Server 7.3.4
Middle-Tier (9 Machines)
• Web Servers
• Oracle Application Servers
• PL/SQL Cartridges
Load Balancing
• Hardware based
• DNS router
• Round-robin
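The round-robin policy named under Load Balancing can be illustrated in a few lines. The slide describes a hardware-based balancer with a DNS router, so this software sketch shows only the rotation policy, not the deployment. The `RoundRobinBalancer` class and the host names are hypothetical.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def next_server(self):
        return next(self._rotation)

# Nine hypothetical middle-tier hosts, matching the machine count on the slide.
balancer = RoundRobinBalancer([f"mid{i}" for i in range(1, 10)])
picks = [balancer.next_server() for _ in range(10)]
print(picks)
```

Round-robin ignores server load, which keeps the balancer stateless apart from its position in the rotation.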

Oracle Application Server

[Figure: Oracle Application Server diagram; labels: CORBA, Backend, Adapter, and three Cartridges]

CNN Custom News

Data feeds into the database
Keeps text in the database
Images in files
Images accessed in the middle-tier PL/SQL Cartridge

PL/SQL Cartridge

[Figure: OAS hosting the PL/SQL Cartridge, connected to PL/SQL in the Oracle DBMS]
• Connection pooling
• Session caching
• Parameter marshalling
• Validation
• Result processing

PL/SQL

Server-side
Used to generate HTML
Suited for database logic

Searching

Uses Oracle ConText cartridge
Content-based searching
Uses bitmap indexes

CNN Custom News: Observations

Database-centric
Uses PL/SQL based scripting
Application Server for scalability

Excite

Personalized online service that gives Web users everything they want, all in one place
Builds tools that manage vast amounts of information available on the internet
Provides variety of user services (apps):
• News
• Money and Investing -- stock quotes
• Message boards and Chat
• Mail
• Communities
• Classifieds
• Jobs

Excite

Supports a suite of applications
Each application uses a three-tier architecture
Federated approach
• Many databases
• Databases specific to applications
Application logic in the middle-tier as multi-threaded embedded C programs (Pro*C programs)

Excite: An Application Architecture

[Figure: three-tier architecture diagram]
Tiers: Client Tier; Physical Middle Tier; Database Tier
Components: Browsers; WEB Servers; Middle Tier Applications; Oracle DBMS
Links: HTTP

Excite - PFP Application

Personalized front page application
Application is deployed as 50 middle-tier daemon processes
The middle-tier application daemons perform:
• Application logic in C
• Connection pooling
  Each daemon keeps about 40 connections to the database (about 2000 total connections to the database)
• Load balancing
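The connection pooling performed by each daemon, and the arithmetic behind the total connection count, can be sketched as follows. The slide states that the real daemons are C programs, so this Python version is only an illustration. The `ConnectionPool` class is hypothetical and sqlite3 stands in for the Oracle client library.

```python
import queue
import sqlite3

class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

DAEMONS = 50                  # figures taken from the slide
CONNECTIONS_PER_DAEMON = 40
total_connections = DAEMONS * CONNECTIONS_PER_DAEMON  # 50 * 40 = 2000

# One daemon's pool; the in-memory database is a stand-in for the real DBMS.
pool = ConnectionPool(
    CONNECTIONS_PER_DAEMON,
    lambda: sqlite3.connect(":memory:", check_same_thread=False),
)
conn = pool.acquire()
try:
    answer = conn.execute("SELECT 1").fetchone()[0]
finally:
    pool.release(conn)  # always hand the connection back
```

Pooling caps the load each daemon places on the database: the number of open sessions stays at 40 per daemon no matter how many requests arrive.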

Excite - PFP Database Configuration

Oracle8 on SUN Solaris server

• 2 SUN 6500s -- 28 way SMP

PFP database is split into multiple databases for load balancing and scalability

Scalar data stored in the database in relational tables

About 20 tables for storing user profiles; 100 tables for content

Excite - PFP Database Configuration

Multi-media content (e.g. stock quotes or news items) stored in memory mapped files for fast access. File references stored in the database

A lot of the content is read-only; need not be backed up; can be reconstructed from the original sources
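The split described here, with content in memory mapped files and only a file reference in the database, can be sketched as follows. The table name, column names, item identifier, and sample payload are all assumptions for illustration, and sqlite3 stands in for the Oracle database.

```python
import mmap
import os
import sqlite3
import tempfile

# Write one content item to a file (stand-in for a feed-generated file).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "quote_XYZ.txt")
with open(path, "wb") as f:
    f.write(b"XYZ 42.50 +0.25")

# The database holds only the reference to the file, not the content.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE content (item_id TEXT PRIMARY KEY, file_path TEXT NOT NULL)")
db.execute("INSERT INTO content VALUES (?, ?)", ("quote:XYZ", path))

def read_content(db, item_id):
    """Look up the file reference, then read the content through a memory map."""
    row = db.execute(
        "SELECT file_path FROM content WHERE item_id = ?", (item_id,)
    ).fetchone()
    if row is None:
        raise KeyError(item_id)
    with open(row[0], "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]  # copy the bytes out before the mapping is closed

payload = read_content(db, "quote:XYZ")
```

Because the files can be regenerated from the original feeds, losing one costs a reload rather than a restore, which is why the slide says such content need not be backed up.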


Excite - Scalability

By partitioning the application across multiple databases

Each application partition supported by multiple middle-tier daemon processes

Multiple web servers to reduce traffic congestion
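Partitioning an application across multiple databases requires a stable rule for deciding which database holds a given user's data. The slides do not say how Excite chose partitions, so the sketch below assumes hashing on a user id. The `partition_for` function and the database names are hypothetical.

```python
import hashlib

def partition_for(user_id, partitions):
    """Map a user id to one partition using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(partitions)
    return partitions[index]

# Hypothetical partition names, one per physical database.
PARTITIONS = ["pfp_db_0", "pfp_db_1", "pfp_db_2", "pfp_db_3"]

target = partition_for("alice", PARTITIONS)
print(target)
```

A cryptographic hash is used rather than Python's built-in `hash()` because the latter is randomized per process for strings, and every middle-tier daemon must route the same user to the same database.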

Excite - Availability

Using replication and hot standby
Uses Oracle8 hot standby feature
Uses asynchronous replication. Data replicated at 10 sec latency
Almost every database is replicated for failover
Replication preferred over hot standby. Hot standby cannot be used for normal usage

Excite - Other Applications

Most of the Excite applications have a similar three-tier architecture

Excite - Observations

Some content (especially for communities applications) could be stored in the database. Management benefits are attractive. If content is stored in the database, access performance is critical
Need fast replication
Currently not using middle-tier caching. Caching could be quite useful but coherency is an issue
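The coherency issue with middle-tier caching can be made concrete with a small time-to-live cache. This is a sketch under assumptions: the `TTLCache` class, the 10-second lifetime, and the dictionary standing in for the database are all illustrative, and the clock is faked so the behavior is deterministic.

```python
import time

class TTLCache:
    """Middle-tier cache whose entries expire after a fixed lifetime."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # served from cache; may lag behind the database
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)

database = {"headline": "v1"}  # stand-in for the real database
now = [0.0]                    # fake clock, advanced by hand
cache = TTLCache(database.__getitem__, ttl_seconds=10, clock=lambda: now[0])

first = cache.get("headline")   # miss: loads "v1" from the database
database["headline"] = "v2"     # the database changes behind the cache
stale = cache.get("headline")   # hit: still "v1", the coherency problem
now[0] = 11.0
fresh = cache.get("headline")   # entry expired: reloads "v2"
```

Expiry bounds how stale a read can be but does not eliminate staleness; stronger coherency needs the database to notify the cache so it can call something like `invalidate`.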

Cisco

Successfully implemented applications for the internet
Internet commerce
• Order placement
• Checking order status
• On-line, guided product configuration
• Price quotes
Employee self-service
• Provides all employee services electronically
• Employee directories
• Employee benefits
• Expense reports

Cisco

Supply chain management
• Networked suppliers, resellers and customers
• Enables business partners to manage and operate major portions of its supply chain
• Entire supply chain works off one central demand forecast
Customer care
• Exchange of technical information
• Software upgrades (90% of software upgrades via internet)
• On-line support (70% of support on-line)
• On-line, assisted trouble-shooting

Cisco

Communications and collaboration

• Sales and technical training

• Virtual classrooms

• Company-wide meetings and broadcasts

Cisco Commerce Server Architecture

[Figure: three-tier architecture diagram]
Tiers: Client Tier; Physical Middle Tier; Database Tier
Components: Browsers; WEB Server; Commerce Server; Oracle DBMS (two instances); Oracle Applications
Links: HTTP

Cisco Commerce Server

Typical three-tier architecture
Proprietary web server
• Performs content aggregation
• Encryption
• Accesses Oracle DBMS
• Runs on a dedicated SUN server
Proprietary commerce server
• Proprietary application server
• Performs a variety of commerce functions

Cisco Commerce Server

Scalability and availability

• Big servers for scalability

• Multiple commerce server processes for load balancing

• Databases replicated

• Hot standby for availability

Case Studies: Observations

Database is being used mostly for storage

Application in the middle-tier

Middle-tier also provides:

• scalability

• load balancing

• large number of users

Analyzing Internet Applications

Web integration
Web publishing
Application integration
E-commerce

WEB Integration

Heterogeneous data sources
Heterogeneous data types
1000s of data sources
Dynamic data
Warehousing

Web Publishing

Problem: internet placing new requirements on content management

• Heterogeneity: access different types of content from browsers e.g. Email, data warehouses, reports, HTML files

• Personalized: structured, dynamic, customized content

• Transactive: content blending with application

• Aggregation: portalization via major “gateways”

Application Integration

Integrating Multiple Applications (e.g. ERP/Front Office)
• Application workflow specification
Asynchronous communication
• Queuing and propagation
  Message tracking
  Message warehouse (persistence)
• Message broker/server
Data transformation
• Transforming messages to different application formats (e.g. SAP, CLARIFY, …)
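The combination of queuing, message tracking, and data transformation listed for application integration can be sketched in a few lines. Everything named here is hypothetical: the order fields, the target record layout, and the in-memory queue and log that stand in for a persistent queue and a message warehouse.

```python
import json
import queue

def to_erp_format(order):
    """Transform a front-office order into a hypothetical ERP record layout."""
    return {
        "ORDER_NO": order["id"],
        "CUST": order["customer"].upper(),
        "QTY": int(order["quantity"]),
    }

outbound = queue.Queue()  # stand-in for a persistent message queue
message_log = []          # stand-in for the message warehouse

def publish(order):
    """Sender side: record the message, then enqueue it and return at once."""
    msg = json.dumps(order)
    message_log.append(msg)  # tracking: every message is kept
    outbound.put(msg)

def consume_one():
    """Receiver side: dequeue one message and transform it for the target."""
    return to_erp_format(json.loads(outbound.get_nowait()))

publish({"id": 17, "customer": "acme", "quantity": "3"})
record = consume_one()
print(record)
```

The sender never calls the receiver directly, so the two applications can run at different speeds or be offline at different times, which is the point of asynchronous communication.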

Electronic Commerce

Automating business-to-business, business-to-consumer interactions

• Selling and buying
  Order management
  Product catalogs
  Product configuration

• Sales and marketing

• Education and training

• Service

• Communities

Database Technology Uses

Business/workflow transactions

• Support across multiple database/ERP systems

• Transactional

• Tools to generate compensating actions

• Transformations

Queuing

• Support for heterogeneous messages

• Transactional

• Querying, e.g. On attribute, value pairs

• Indexing, e.g. On attribute, value pairs

• Publish/subscribe

Database Technology Uses

Rule engines

• Complex business processing rules

• Customization/profiling rules
  Business domain rules
  Presentation rules

Repositories for Application Development

• Managing Java objects, interfaces, etc.

• Must for application integration

• Standardized object models and protocols

• Directories vs repositories

Database Technology Uses

XML support

• XML schema/storage

• XML caching

• XML querying

• Coexistence with SQL -- current efforts seem disjoint

Multiple caches

• Consistency of middle-tier and database caches

Data mining

• Algorithms need to become more pragmatic

Database Technology Uses

Internet user expectations
• Speed vs correctness (e.g. Search engines vs blade/cartridge/extender)
• Availability vs correctness
Component Architecture
• Caching
• XML support
• Querying
• Transactions
• Rule engines
• Metadata management
• Queueing

Database Technology Uses

Availability

• Need near 100% availability

• Must be easy to manage

• Replication, hot standby, foolproof system?

Scalability

• Number of users is orders of magnitude higher

Security

• Global users

• Managing millions of users

• Encryption

• Performance

Internet Applications Architecture: Future

[Figure: XML-enabled multi-tier architecture diagram]
Tiers: Client Tier; Logical Middle Tier; Data Sources
Components: Browsers; WEB/APP Server; ORDBMS; XML Database; XML Integration & Query Server; Warehouse Server; XML Transformer & Gateway; OLE/DB data source; XML enabled tools (authoring tools etc.)
Content: XML documents on the Web; other documents on the Web (e.g. HTML, WORD)
Labels: XML enabled; XML (on the links); XML enabled Application Messages


XML in the Database

XML has the potential to impact four important markets

• Web integration

• Web publishing

• Application integration

• Electronic commerce

XML-enable the DBMS

“XML-enable” the database system
• Store XML data/documents in the database server
• Querying and searching of structured and unstructured XML
• Generate XML data from the database server
• Add XML capabilities in supporting database facilities

[Figure: XML-enabled DBMS; labels: Store XML; Generate XML; Integrate with other facilities]

Store XML Data

Enhance XML storage facilities in the database with support in utilities

• Facilities to load XML data into the database

• Provide more efficient database storage (componentized storage, compression, indexing,…)

• XML export facilities from the server
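Loading XML into the database with componentized storage and indexing can be sketched as follows. This is an illustration under assumptions, not a description of any vendor's loader: the document shape, the `news_item` table, and the `load_items` helper are hypothetical, and sqlite3 stands in for the DBMS. Each item is split into relational columns and its original XML fragment is kept alongside.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML_DOC = """<news>
  <item id="1" category="business"><title>Markets rally</title></item>
  <item id="2" category="national"><title>Election update</title></item>
</news>"""

def load_items(conn, xml_text):
    """Split each <item> into columns and keep the original fragment as text."""
    root = ET.fromstring(xml_text)
    rows = [
        (
            int(item.get("id")),
            item.get("category"),
            item.findtext("title"),
            ET.tostring(item, encoding="unicode"),
        )
        for item in root.findall("item")
    ]
    with conn:  # one transaction for the whole load
        conn.executemany("INSERT INTO news_item VALUES (?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news_item (id INTEGER PRIMARY KEY, category TEXT, title TEXT, xml TEXT)"
)
conn.execute("CREATE INDEX news_item_category ON news_item (category)")

loaded = load_items(conn, XML_DOC)
titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM news_item WHERE category = ?", ("business",)
    )
]
```

Storing the parts as columns lets ordinary SQL and indexes answer structured questions, while the retained fragment supports export of the original XML.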

Search and Query XML Data

Search XML data efficiently

• Special SQL queries over structured + unstructured XML

• Content-based indexing (e.g. Text indexes) for searching XML data efficiently

• Support for XML query languages (e.g. XQL) on XML data
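The difference between structured and unstructured search over XML can be shown with Python's standard library. The slide names XQL as an example query language; the sketch below instead uses the limited XPath subset of `xml.etree.ElementTree` as a stand-in. The catalog document and the `contains_text` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<catalog>
  <product sku="A1"><name>Router</name><price>1200</price></product>
  <product sku="B2"><name>Switch</name><price>300</price></product>
</catalog>""")

# Structured search: follow the element path and match on an attribute.
router = doc.find("./product[@sku='A1']/name").text

# Value filter done in Python, because this XPath subset has no numeric comparison.
cheap = [
    p.findtext("name")
    for p in doc.iterfind("product")
    if float(p.findtext("price")) < 500
]

def contains_text(elem, word):
    """Unstructured search: naive keyword match over all text under elem."""
    return word.lower() in "".join(elem.itertext()).lower()

hits = [p.get("sku") for p in doc.iterfind("product") if contains_text(p, "switch")]
```

The keyword match scans every element on each call. A content-based text index, as the slide suggests, replaces that scan with a lookup.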

Generate XML

Generate XML from the database server
• Map SQL92, SQL3 and PL/SQL datatypes to XML
• Provide mappings between Java, SQL and XML types
Script XML content from the database
• Allow SQL queries to return XML results
• Provide embedded XML in stored procedures
• Java scripting: support embedded XML in Java
• Common APIs to access any XML content in databases
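Returning XML from a SQL query can be sketched by wrapping a result set in elements named after its columns. The `query_to_xml` function, the tag names, and the `orders` table are assumptions for illustration, and sqlite3 stands in for the DBMS. The type mapping is deliberately crude: every value becomes text and NULL becomes an empty element.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn, sql, params=(), root_tag="rows", row_tag="row"):
    """Run a query and serialize the result set as an XML string."""
    cur = conn.execute(sql, params)
    # Column names become element names, so they must be valid XML names.
    columns = [d[0] for d in cur.description]
    root = ET.Element(root_tag)
    for values in cur:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, values):
            child = ET.SubElement(row, name)
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'shipped')")

xml_text = query_to_xml(conn, "SELECT order_id, status FROM orders")
print(xml_text)
```

A fuller mapping would carry SQL type information into the XML, for example as attributes or a schema, so that consumers can tell the number 1 from the string "1".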

XML and Supporting Facilities

Provide XML capabilities in supporting database facilities

• Support XML in database utilities - loader, export/import ..

• Allow server-to-server replication of XML data

• Fine grained access to XML documents

XML Caching

Need to temporarily cache it, index it, update the cached copy, transact it
Need to query XML caches
Also requires a store for managing it in the middle-tier
Provides XML logical views

DBMS Architecture for Internet Applications

Monolithic architecture

• Enhance the DBMS with all the features necessary for supporting internet applications

Component architecture

• Provide components for supporting internet applications

• Components can reside in the DBMS or in the middle-tier


Monolithic Approach

+ Database is the platform

+ Leverage DBMS infrastructure

+ Uniform management

- Not flexible

- Forces 2-tier architecture

- May not be suitable for high-end configurations

- Not suitable for heterogeneous application integration


Component Approach

+ Flexible

+ Accommodates multi-tier architecture - components can be deployed in the middle or database tier

+ Facilitates heterogeneous integration of applications

- Need to manage multiple components

Looking Ahead

Database technology has a lot to offer for building internet applications!

Componentized Databases?