
Post on 23-Dec-2015


1© 2005

Cloud Computing Overview: Big Data and Business Analytics

Hsinchun Chen, University of Arizona


Interesting Questions

Cloud Computing Applications

Big Data Analytics

Business Models (CIA)

Cloud Computing Applications: Overview and Examples

IQ: How does Amazon make its money?

Cloud Computing Overview

• Cloud computing: applications, system software, and hardware delivered as services over the Internet.
• Service-oriented architecture + virtualization + utility computing
• Software as a Service (SaaS), Infrastructure as a Service (IaaS), Platform as a Service (PaaS)
• From web services to cloud computing applications
• Moving towards cloud applications and cloud business models, e.g., Salesforce.com, Apple iTunes, Amazon

Major Cloud Computing Platforms

• Amazon Elastic Compute Cloud (EC2): LAMP (Linux, Apache, MySQL, and PHP) stack

• Google App Engine: Java and Python runtime, Java Persistence API (JPA), Google Bigtable, file systems; Hadoop, MapReduce

• Windows Azure: .NET, MS SQL, SharePoint

Emerging Applications

• E-Commerce: B2C, lifestyle & entertainment, global supply chain, banking, telecommunications, IT hosting, business intelligence and analytics
• E-Government: government data sources, services
• E-Education: online education content delivery
• E-Security: cybersecurity, intelligence
• E-Health: healthcare big data, healthcare 2.0; genomics + EHR

Selected Health Cloud Initiatives

• National Electronic Health Record Data Bank, Singapore: MOH + Accenture, August 2010; healthcare management, quality and performance management, EHR information aggregation, patient self-management, decision support
• E-Health, E-Health Cloud, England: Chelsea Westminster Hospital + Flexiant, July 2011; patient EHR access
• CareStream Cloud, US: Carestream Health (Onex + Kodak), 2009; health imaging sharing, 1B medical images, health cloud SaaS vendor
• Taiwan Smart Health Cloud, NTU & NCKU

(Source: NTU Health Cloud proposal)


IQ: What’s the difference between 2005 and 2012 for web computing?


Web Computing and Mining

• Emerging web applications → business models

• Web services, APIs, mashups → cloud & mobile computing

• Business analytics → data, text, and web mining


Web Services and Computing (No Cloud), 2005 (Web 2.0)-2011

50 Projects, 2005-2012 (“Business Web Mining Using Amazon, Google, eBay, and Google”)

• E-commerce and e-Services: iRelocate RealTomatoes SmallBH HobbyCentral NewPlaceSeekCollege Advisor Friendly Gifter Clipper GottaCouch SkiStop vTrackBarter Bay Link-US Smart Gift Card Timely Bid Tucson Gamer Café TV and More Deliverables Cellphone Intelligent Auctioning Tucson Book Exchange SciBubble Wish Sky GiftChannel PriceSmart WetYourWhistle

• Life Style and Entertainment: BetSmart XTREME F1 MLB 100Yards CricWeb iBollywood Sa Ri Ga Ma WOW Bollywood Funzic HinduShrines Indiapaaru NachBaliye Movie Location Quest Remakes SugarSuite MusicBox Artist Connection Concerto Star Search

• Government and Education: RepCheck SmallNGreenCars Change of Base iDog Tasty Park iSupport


SmallNGreenCars

• By Kumar Vakeel, Kunal Jain, Neeraj Munshi; MS MIS, Spring 2010
• One-stop portal for green car information and resources
• Unique concept
• Global customers
• YouTube vehicle videos
• Flickr vehicle photos
• Google Maps and Local Search
• Google visualization
• RSS feeds of global vehicle news
• Facebook recommendations from friends
• Yahoo Finance for currency exchange
• Google Translate for web pages
• Recommendation system
• Fuel Efficiency Challenge



Sa Ri Ga Ma


• By Mahalakshmi Sundararajan, Pavithra Ravi, Sahana Nagaraja; Spring 2010
• Carnatic music: one of the two main genres of Indian classical music; mostly performed vocally
• Sarigama.com: one-stop information portal for Carnatic music
• Sarigama.com latest news and RSS feeds
• Artist information
• Transliteration
• Music play and video
• Shopping
• Lessons and library
• Concert locator
• Forums
• Interactive features
• Tag clouds
• Lyrics Recommender system



Web Services, Cloud Computing, and Mobile Web, 2012 (Web 3.0)

25 Projects, 2012: Cloud and Mobile Computing

• E-commerce and e-Services: GamerzLykMe MobileAppPortal Gemstones PersonalInvestment iScream iRace SeeMeSocial AZRegionTrend HelpMeAZ

• Health & Life Style: EatRight OrganiCook RoadTrip Xtravel WreckDivers VoiceOfNature HealthMiners HelpAsthma DiabeatUS HikeAday YogaWorld BikersParadise


OrganiCook


• By Zilong Chang, Mengwen Cheng, Yajie Wang, and Haiqing Wu; Spring 2012
• One-stop portal for healthy foods
• Organic food supplier locations
• Recipe catalogs for different health concerns
• Integrate healthy content with social media
• Text mining for cookware recommendation
• Mark allergens among ingredients
• Provide health news
• Advertisement
• Unique recommendation system
• Amazon EC2 cloud server
• Integrate Mahout with Hadoop

OrganiCook

FatSecret: get recipes and nutrition facts
Yahoo Local: get locations of organic food suppliers
Google Maps: map the locations
Google Places: get detailed information about the food suppliers
Facebook Social Plugin: Like button, comments
Twitter Buttons: share a link, follow
Twitter Search: return tweets based on the user's search keyword and recipe name
Google+: share the page
Video search: return relevant videos
Flickr: return pictures of the recipe
Amazon: return information about cookware

OrganiCook

[Architecture diagram: User, Browser, Internet connection, Cloud (Amazon EC2). Labeled components: application server (Apache Tomcat; J2EE, REST API), database server (MySQL 5.5), data mining (Mahout Taste), JavaScript API, API servers.]


EatRight


• By Jim Marquardson, Justin William, Dave Wilson, and Mark Grimes; Spring 2012
• Health & nutrition mobile site
• True SoLoMo (Web 3.0)
• Nutrition-based meal shopping
• Capturing user preferences: “Eat This” button
• Directed search advertising rates
• Targeted ads based on nutrition preferences and location
• EatRight API
• Twitter sentiment
• PCI-compliant credit card processing
• Amazon EC2 cloud
• Android mobile app (iOS too!)



Big Data & Business Analytics

IQ: What is the size (storage) of the LOC (Library of Congress) book collection?


IQ: What is a Yottabyte & who owns it?

The Data Deluge (Big Data)

• The Economist, March 2010
  – LOC total book collection: 15 TB
  – Google processes 10 PB per day
  – Internet traffic: 667 exabytes by 2013 (Cisco)
  – Total amount of world information in 2010: 1.2 zettabytes
• KB-MB-GB-TB-PB-EB-ZB-YB (yottabyte)

• E-Commerce, Government, Health, Security applications: many with TB/PB of valuable content from customers, citizens, patients, etc.
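To make the unit ladder above concrete, here is a minimal Python sketch that compares the slide's figures. It assumes decimal (SI) units, where each step is a factor of 1,000; the helper name `to_bytes` and the variable names are illustrative and not part of the original deck.

```python
# Decimal (SI) byte units; each unit is 1000x the previous one (assumption).
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a value expressed in a decimal unit to bytes."""
    exponent = UNITS.index(unit) + 1  # KB = 1000**1, MB = 1000**2, ...
    return value * 1000 ** exponent

# Figures quoted on the slide.
loc_books = to_bytes(15, "TB")      # LOC total book collection
google_daily = to_bytes(10, "PB")   # Google daily processing
world_2010 = to_bytes(1.2, "ZB")    # world information in 2010

# How many LOC-sized collections Google processes per day.
loc_per_google_day = google_daily / loc_books
# How many LOC-sized collections make up the 2010 world total.
loc_per_world = world_2010 / loc_books

print(f"{loc_per_google_day:,.0f} LOC collections per Google day")
print(f"{loc_per_world:,.0f} LOC collections in the 2010 world total")
```

Under these assumptions, Google's daily volume is several hundred LOC book collections, and the 2010 world total is on the order of tens of millions of them.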

BI & Analytics: The Market

• $3B BI revenue in 2009 (Gartner, 2006); $9.4B BI software M&A spending in 2010 and $14.1B by 2014 (Forrester)

• IBM spent $14B on BI in five years; $9B BI revenue in 2010 (USA Today, November 2010); 24 acquisitions, 10,000 BI software developers, 8,000 BI consultants, 200 BI mathematicians; acquired i2/COPLINK in 2011


BI & Analytics: Definition and Components

• BI and Analytics refer to (1) the technologies, systems, practices, and applications that (2) analyze critical business data to (3) help an enterprise better understand its business and market.

• Core technologies: data warehousing, Extraction, Transformation, and Load (ETL); Business Performance Management (BPM), visual dashboards; data and text mining, social network analysis

• BI 2.0 & 3.0 research: web analytics, web 2.0; in-memory and real-time BI; web 3.0, cloud computing, Hadoop, MapReduce; mobile computing, stream data mining
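As an illustration of the MapReduce model named in the last bullet, the following is a minimal single-process sketch in plain Python. It is not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical and only mimic the map, shuffle, and reduce steps a framework would distribute across machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input documents.
docs = ["big data analytics", "big data in the cloud"]
counts = word_count(docs)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on different nodes; the sketch keeps only the data flow.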

Big Data Analytics Research at UA/AI Lab

• Applications/problems: digital libraries, search engines, biomedical informatics, healthcare data mining, security informatics, business intelligence

• Approaches: web collection/spidering, databases, data warehousing, data mining, text mining, web mining, statistical NLP, ontologies, social media analytics, interface design, information visualization, economic modeling, assessment

• Structure: federal funding, director, affiliated faculty, post-docs, Ph.D./MS/BS students, commercialization

• Major phases: DLI → COPLINK → Dark Web → DiabeticLink


Business Models

IQ: What is “CIA”, and how do its members differ?

CIA in the Global IT Landscape

• Central Intelligence Agency; Culinary Institute of America
• Chinese: math/science, team player, IT/hardware/web, China market (China)
• Indians: math/science, entrepreneurial spirit, English
• Americans: English, entrepreneurial spirit, IT/software, business development, market (US), VC access ($)

My COPLINK Experience

• Taiwan/US training: NCTU (math) → SUNY Buffalo (MBA) → NYU (AI) → U of Arizona (top 3)
• AI Lab: Digital Library → COPLINK → Dark Web → DiabeticLink
• COPLINK federal funding ($4M), NSF/NIJ, 1997-2002
• COPLINK commercialization ($4.6M), angels/VCs (Taiwan, CA, AZ), 2000 & 2003
• Customer sales ($30M), 4,500 agencies, 120 FTEs, 2000-2011
• M&A exit: Silver Lake/i2/IBM acquisition, 2009 (i2), 2011 (IBM); $500M valuation

COPLINK Identity Resolution and Criminal Network Analysis (DHS)

Cross-jurisdictional Information Sharing/Collaboration
• Border crossing data (AZ, CA, TX): vehicles, people
• Law-enforcement data: AZ, CA, TX

Criminal Network Analysis
• CAN Visualizer
• Criminal Link Prediction: predict interaction between individuals and vehicles using link prediction techniques to identify high-risk border crossers.
• High-risk Vehicle Identification: identify high-risk vehicles using association techniques such as mutual information, applied to border crossing and law-enforcement data.
• Suspect Traffic Burst Detection: detect real-time anomalies and threats in border traffic using Markov switching and other models.

[Figure: crossing times of Vehicle A and Vehicle B; x-axis: dates, 2004 to 2005; y-axis: time of day. Panel labels: Frequent Crossers at Night, Mutual Information, Narcotics Network.]
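The mutual-information association mentioned above for high-risk vehicle identification can be sketched as follows. This is a generic illustration, not the COPLINK implementation: it assumes each vehicle's crossings are reduced to a set of time-window indices, and the function name `mutual_information` and all sample data are hypothetical.

```python
import math

def mutual_information(a_windows, b_windows, n_windows):
    """Mutual information (bits) between two binary crossing indicators.

    a_windows, b_windows: sets of time-window indices (0..n_windows-1)
    in which vehicle A or vehicle B crossed.
    n_windows: total number of observed time windows.
    """
    mi = 0.0
    for a in (True, False):
        for b in (True, False):
            # Count windows where the two indicators take the values (a, b).
            joint = sum(
                1 for w in range(n_windows)
                if (w in a_windows) == a and (w in b_windows) == b
            )
            if joint == 0:
                continue  # a zero-probability cell contributes nothing
            p_ab = joint / n_windows
            p_a = (len(a_windows) if a else n_windows - len(a_windows)) / n_windows
            p_b = (len(b_windows) if b else n_windows - len(b_windows)) / n_windows
            mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Hypothetical data: 8 time windows.
vehicle_a = {0, 1, 2, 3}
vehicle_b = {0, 1, 2, 3}   # always crosses together with A
vehicle_c = {0, 1, 4, 5}   # crosses independently of A

mi_ab = mutual_information(vehicle_a, vehicle_b, 8)
mi_ac = mutual_information(vehicle_a, vehicle_c, 8)
print(mi_ab, mi_ac)
```

A pair of vehicles whose crossings are strongly coupled scores high, while an independent pair scores near zero; a vehicle strongly associated with a known offender's vehicle would then be a candidate for closer inspection.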

Identity Resolution (Arizona IDMatcher)

Detect false and deceptive identities across jurisdictions using a probabilistic naïve-Bayes-based resolution system.

IdentityMatch
• NameMatch: First Name Match (First Name Similarity), Middle Name Match (Middle Name Similarity), Last Name Match (Last Name Similarity)
• DOBMatch: DOBSimilarity
• IDMatch: IDSimilarity
• AddressMatch: AddressSimilarity

* Only the grayed datasets are available to the AI Lab
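A probabilistic naïve-Bayes identity match of the kind described above could look roughly like the sketch below. It is not the Arizona IDMatcher itself: the feature names follow the slide (name, DOB, ID, address), but every probability, the prior, and the function name `match_probability` are invented for illustration.

```python
import math

# Hypothetical likelihoods, illustrative only:
# (P(feature agrees | same person), P(feature agrees | different people))
LIKELIHOODS = {
    "name":    (0.90, 0.05),
    "dob":     (0.95, 0.01),
    "id":      (0.85, 0.001),
    "address": (0.70, 0.02),
}
PRIOR_MATCH = 0.01  # hypothetical prior that two records are the same person

def match_probability(agreements):
    """Posterior P(same person | feature agreements) under naive Bayes.

    agreements: dict mapping a feature name to True (the two records
    agree on it) or False (they disagree).
    """
    log_match = math.log(PRIOR_MATCH)
    log_non = math.log(1 - PRIOR_MATCH)
    for feature, agrees in agreements.items():
        p_m, p_n = LIKELIHOODS[feature]
        # Naive Bayes: features are treated as conditionally independent.
        log_match += math.log(p_m if agrees else 1 - p_m)
        log_non += math.log(p_n if agrees else 1 - p_n)
    # Normalize the two joint log-probabilities into a posterior.
    top = max(log_match, log_non)
    m = math.exp(log_match - top)
    n = math.exp(log_non - top)
    return m / (m + n)

all_agree = match_probability(
    {"name": True, "dob": True, "id": True, "address": True})
none_agree = match_probability(
    {"name": False, "dob": False, "id": False, "address": False})
print(all_agree, none_agree)
```

In practice the agreement flags would come from thresholded similarity scores (for example, name or address similarity), and the likelihoods would be estimated from labeled record pairs rather than set by hand.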

• Funding: NSF, DOJ, DHS ($4M), VCs ($4.6M); Digital Government
• Publications: ACM TOIS, CACM, IEEE TKDE, IEEE IS, JASIST, DSS
• Impact: 3500 agencies, 25 NATO countries, 1M users in public safety

Newsweek Magazine, March 3, 2003: A computerized way for police to coordinate crime databases

Washington Post, March 6, 2008: COPLINK in use in 3,500 police agencies in US!

COPLINK acquired by i2 (Silver Lake) in 2009; i2/COPLINK acquired by IBM in 2011 for $500M

ABC News, April 15, 2003: Google for Cops: Coplink software helps police search for cyber clues to bust criminals

The New York Times, November 2, 2002: COPLINK assisted in DC sniper investigation

IT Business Models: Some Thoughts

• Startup Phase: business ideas (product and market), team (founders & mentors), share structure (shares, directors, options; legal/CPA), business plan (short plan, good introduction), funding (government, angels, VCs, family); Year 0, 1-3 founders, $250K funding (IT/cloud)

• Early Phase: first product, product positioning, team building, initial sales; Years 1-3, $500K sales

• Growth Phase: product plan, strong sales team, sustainable revenues, unique IPs (SW, content), loyal customers; Years 3-8, $10M sales

• Exit Phase: IPO or M&A (partners), when ($20M+), next venture

Taking risks!

Pain, Sorrow, and Regret

• Loss of family time/life (but never money)
• Managing university obligations and COI
• University bureaucracy, Office of Technology Transfer (OPTT)
• Lawyers, accountants are expensive
• Chasing angels/VCs (40 frogs, 1 prince)
• Office, employees, products
• Selling products (becoming a vendor)
• Burning cash
• Bubble burst
• Raising second-round funding when you are down ($2M)
• Board room yelling matches
• University accusations
• Losing control and shares
• Anti-dilution clause (losing $60M for the $2M you never used)


hchen@eller.Arizona.edu

http://ai.Arizona.edu