Date post: | 23-Dec-2015 |
1© 2005
Cloud Computing Overview: Big Data and Business Analytics
Hsinchun Chen, University of Arizona
Interesting Questions
Cloud Computing Applications
Big Data Analytics
Business Models (CIA)
Cloud Computing Applications: Overview and Examples
IQ: How does Amazon make its money?
Cloud Computing Overview
• Cloud computing: applications, system software, and hardware delivered as services over the Internet.
• Service-oriented architecture + virtualization + utility computing
• Software as a Service (SaaS), Infrastructure as a Service (IaaS), Platform as a Service (PaaS)
• From web services to cloud computing applications
• Moving towards cloud applications and cloud business models, e.g., Salesforce.com, Apple iTunes, Amazon
Major Cloud Computing Platforms
• Amazon Elastic Compute Cloud (EC2): LAMP (Linux, Apache, MySQL, and PHP) stack
• Google App Engine: Java and Python runtime, Java Persistence API (JPA), Google Bigtable, File systems; Hadoop, MapReduce
• Windows Azure: .Net, MS SQL, SharePoint
Emerging Applications
• E-Commerce: B2C, lifestyle & entertainment, global supply-chain, banking, telecommunications, IT hosting, business intelligence and analytics
• E-Government: government data sources, services
• E-Education: online education content delivery
• E-Security: cybersecurity, intelligence
• E-Health: healthcare big data, healthcare 2.0; genomics + EHR
Selected Health Cloud Initiatives
• National Electronic Health Record Data Bank, Singapore: MOH + Accenture, August 2010; healthcare management, quality and performance management, EHR information aggregation, patient self-management, decision support
• E-Health, E-Health Cloud, England: Chelsea Westminster Hospital + Flexiant, July 2011; patient EHR access
• CareStream Cloud, US: Carestream Health (Onex + Kodak), 2009; health image sharing, 1B medical images, health cloud SaaS vendor
• Taiwan Smart Health Cloud, NTU & NCKU
(Sources: NTU Health Cloud proposal)
IQ: What’s the difference between 2005 and 2012 for web computing?
Web Computing and Mining
• Emerging web applications → business models
• Web services, APIs, mashups → cloud & mobile computing
• Business analytics → data, text and web mining
Web Services and Computing (No Cloud), 2005 (Web 2.0)-2011
50 Projects, 2005-2012 (“Business Web Mining Using Amazon, Google, eBay, and Google”)
• E-commerce and e-Services: iRelocate RealTomatoes SmallBH HobbyCentral NewPlaceSeek College Advisor Friendly Gifter Clipper GottaCouch SkiStop vTrack Barter Bay Link-US Smart Gift Card Timely Bid Tucson Gamer Café TV and More Deliverables Cellphone Intelligent Auctioning Tucson Book Exchange SciBubble Wish Sky GiftChannel PriceSmart WetYourWhistle
• Life Style and Entertainment: BetSmart XTREME F1 MLB 100Yards CricWeb iBollywood Sa Ri Ga Ma WOW Bollywood Funzic HinduShrines Indiapaaru NachBaliye Movie Location Quest Remakes SugarSuite MusicBox Artist Connection Concerto Star Search
• Government and Education: RepCheck SmallNGreenCars Change of Base iDog Tasty Park iSupport
SmallNGreenCars
• By Kumar Vakeel, Kunal Jain, Neeraj Munshi; MS MIS, Spring 2010
• One-stop portal for green car information and resources
• Unique concept
• Global customers
• YouTube vehicle videos
• Flickr vehicle photos
• Google Maps and Local Search
• Google visualization
• RSS feeds of global vehicle news
• Facebook recommendations from friends
• Yahoo Finance for currency exchange
• Google Translate for web pages
• Recommendation system
• Fuel Efficiency Challenge
Sa Ri Ga Ma
• By Mahalakshmi Sundararajan, Pavithra Ravi, Sahana Nagaraja; Spring 2010
• Carnatic music: one of the two main genres of Indian classical music; mostly performed vocally
• Sarigama.com: one-stop information portal for Carnatic music
• Sarigama.com latest news and RSS feeds
• Artist information
• Transliteration
• Music play and video
• Shopping
• Lessons and library
• Concert locator
• Forums
• Interactive features
• Tag clouds
• Lyrics
• Recommender system
Web Services, Cloud Computing, and Mobile Web, 2012 (Web 3.0)
25 Projects, 2012: Cloud and Mobile Computing
• E-commerce and e-Services: GamerzLykMe MobileAppPortal Gemstones PersonalInvestment iScream iRace SeeMeSocial AZRegionTrend HelpMeAZ
• Health & Life Style: EatRight OrganiCook RoadTrip Xtravel WreckDivers VoiceOfNature HealthMiners HelpAsthma DiabeatUS HikeAday YogaWorld BikersParadise
OrganiCook
• By Zilong Chang, Mengwen Cheng, Yajie Wang, and Haiqing Wu, Spring 2012
• One-stop portal for healthy foods
• Organic food supplier locations
• Recipe catalogs for different health concerns
• Integrate healthy content with social media
• Text mining for cookware recommendation
• Mark allergens among ingredients
• Provide health news
• Advertisement
• Unique recommendation system
• Amazon EC2 cloud server
• Integrate Mahout with Hadoop
OrganiCook
FatSecret: get recipes and nutrition facts
Yahoo Local: get locations of organic food suppliers
Google Maps: map the locations
Google Places: get detailed information about the food suppliers
Facebook Social Plugin: Like button, comments
Twitter Buttons: share a link, follow
Twitter Search: return tweets based on the user’s search keyword and recipe name
Google+: share the page
(video API, name not given): return relevant videos
Flickr: return pictures of the recipe
Amazon: return information about cookware
OrganiCook
(Figure: system architecture. A user’s browser reaches the cloud over an Internet connection. Cloud components on Amazon EC2: application server (Apache Tomcat, J2EE, REST API), database server (MySQL 5.5), data mining (Mahout Taste). Other labels: JavaScript API, API servers.)
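The architecture names Mahout Taste as the data mining component. Taste is a Java collaborative-filtering library; the sketch below does not use its API. It is a minimal Python illustration of user-based collaborative filtering, one technique a recommender of this kind might apply. The ratings, recipe names, and function names are hypothetical and do not come from the OrganiCook project.

```python
from math import sqrt

# Hypothetical user -> {recipe: rating} data (an assumption for illustration).
RATINGS = {
    "ann": {"kale salad": 5.0, "lentil soup": 4.0, "tofu stir-fry": 1.0},
    "bob": {"kale salad": 4.0, "lentil soup": 5.0, "quinoa bowl": 5.0},
    "cat": {"kale salad": 1.0, "tofu stir-fry": 5.0, "quinoa bowl": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Calling recommend("ann", RATINGS) ranks the one recipe ann has not rated, "quinoa bowl", using bob's and cat's ratings weighted by how similar each is to ann.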
EatRight
• By Jim Marquardson, Justin William, Dave Wilson, and Mark Grimes, Spring 2012
• Health & nutrition mobile site
• True SoLoMo (Web 3.0)
• Nutrition-based meal shopping
• Capturing user preferences: “Eat This” button
• Directed search advertising rates
• Targeted ads based on nutrition preferences and location
• EatRight API
• Twitter Sentiment
• PCI Compliant Credit Card Processing
• Amazon EC2 Cloud
• Android Mobile App (iOS too!)
Big Data & Business Analytics
IQ: Size (storage) of LOC book collection?
IQ: What is a Yottabyte & who owns it?
The Data Deluge (Big Data)
• The Economist, March 2010
– LOC total book collection: 15 TB
– Google processes 10 PB per day
– Internet traffic: 667 exabytes by 2013 (Cisco)
– Total amount of world information in 2010: 1.2 zettabytes
• KB-MB-GB-TB-PB-EB-ZB-Yottabyte
• E-Commerce, Government, Health, Security applications: many with TB/PB of valuable content from customers, citizens, patients, etc.
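To put the figures above on one scale, the following sketch converts them to bytes and compares them. It assumes decimal (SI) prefixes, where each step in the KB-to-yottabyte ladder is a factor of 1,000; the slide does not say whether decimal or binary units are meant, and the variable names are illustrative.

```python
# Decimal (SI) units, in the order listed on the slide. Assumption: 1 KB = 10**3 bytes.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
BYTES_PER_UNIT = {unit: 10 ** (3 * (i + 1)) for i, unit in enumerate(UNITS)}


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * BYTES_PER_UNIT[unit]


loc_collection = to_bytes(15, "TB")     # LOC total book collection
google_per_day = to_bytes(10, "PB")     # Google daily processing
world_info_2010 = to_bytes(1.2, "ZB")   # world information in 2010

# How many LOC-sized collections fit in each quantity?
loc_per_google_day = google_per_day / loc_collection    # about 667
loc_per_world_info = world_info_2010 / loc_collection   # about 80 million

# One yottabyte is 1,000 zettabytes under the decimal assumption.
zb_per_yb = BYTES_PER_UNIT["YB"] // BYTES_PER_UNIT["ZB"]
```

Under these assumptions, one day of Google processing corresponds to roughly 667 LOC book collections, and the 2010 world total to roughly 80 million.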
BI & Analytics: The Market
• $3B BI revenue in 2009 (Gartner, 2006); $9.4B BI software M&A spending in 2010 and $14.1B by 2014 (Forrester)
• IBM spent $14B in BI in five years; $9B BI revenue in 2010 (USA Today, November 2010); 24 acquisitions, 10,000 BI software developers, 8,000 BI consultants, 200 BI mathematicians; acquired i2/COPLINK in 2011
BI & Analytics: Definition and Components
• BI and Analytics refers to (1) the technologies, systems, practices, and applications that (2) analyze critical business data to (3) help an enterprise better understand its business and market.
• Core technologies: data warehousing, Extraction, Transformation, and Load (ETL); Business Performance Management (BPM), visual dashboards; data and text mining, social network analysis
• BI 2.0 & 3.0 research: web analytics, web 2.0; in-memory and real-time BI; web 3.0, cloud computing, Hadoop, MapReduce; mobile computing, stream data mining
Big Data Analytics Research at UA/AI Lab
• Applications/problems: digital libraries, search engines, biomedical informatics, healthcare data mining, security informatics, business intelligence
• Approaches: web collection/spidering, databases, data warehousing, data mining, text mining, web mining, statistical NLP, ontologies, social media analytics, interface design, information visualization, economic modeling, assessment
• Structure: federal funding, director, affiliated faculty, post-docs, Ph.D./MS/BS students, commercialization
• Major phases: DLI → COPLINK → Dark Web → DiabeticLink
Business Models
IQ: What is “CIA,” and how do they differ?
CIA in the Global IT Landscape
• Central Intelligence Agency; Culinary Institute of America
• Chinese: math/science, team player, IT/hardware/web, China market (China)
• Indians: math/science, entrepreneurial spirit, English
• Americans: English, entrepreneurial spirit, IT/software, business development, market (US), VC access ($)
My COPLINK Experience
• Taiwan/US Training: NCTU (math) → SUNY Buffalo (MBA) → NYU (AI) → U of Arizona (top 3)
• AI Lab: Digital Library → COPLINK → Dark Web → DiabeticLink
• COPLINK federal funding ($4M), NSF/NIJ, 1997-2002
• COPLINK commercialization ($4.6M), angels/VCs (Taiwan, CA, AZ), 2000 & 2003
• Customer sales ($30M), 4,500 agencies, 120 FTEs, 2000-2011
• M&A Exit, Silver Lake/i2/IBM acquisition, 2009 (i2), 2011 (IBM); $500M valuation
COPLINK Identity Resolution and Criminal Network Analysis (DHS)
Cross-jurisdictional Information Sharing/Collaboration
Border Crossing Data (AZ, CA, TX): Vehicles, People
Law-enforcement Data: AZ, CA, TX
CAN Visualizer
Criminal Network Analysis
Criminal Link Prediction
Predict interaction between individuals and vehicles using link prediction techniques to identify high-risk border crossers.
High-risk Vehicle Identification
Identify high-risk vehicles using association techniques such as mutual information, applied to border crossing and law enforcement data.
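The poster does not say which link prediction technique is used. As a rough illustration only, the sketch below scores unseen person-vehicle links in a bipartite graph by counting length-3 paths (person, shared vehicle, other person, candidate vehicle), a simple neighborhood-based heuristic. The crossing records and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (person, vehicle) crossing records, for illustration only.
CROSSINGS = [
    ("p1", "vA"), ("p2", "vA"), ("p2", "vB"),
    ("p3", "vB"), ("p3", "vC"), ("p1", "vC"),
]


def build_graph(records):
    """Build both adjacency maps of the bipartite person-vehicle graph."""
    vehicles_of = defaultdict(set)
    people_in = defaultdict(set)
    for person, vehicle in records:
        vehicles_of[person].add(vehicle)
        people_in[vehicle].add(person)
    return vehicles_of, people_in


def link_scores(person, vehicles_of, people_in):
    """Score vehicles the person has not used by the number of
    person - vehicle - other person - candidate vehicle paths."""
    scores = defaultdict(int)
    for vehicle in vehicles_of[person]:
        for other in people_in[vehicle]:
            if other == person:
                continue
            for candidate in vehicles_of[other]:
                if candidate not in vehicles_of[person]:
                    scores[candidate] += 1
    return dict(scores)
```

A higher score means more indirect connections between the person and the vehicle; in practice such scores would be one input among several, not a decision on their own.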
(Figure: time of day of border crossings by date, November 2004 to June 2005, for Vehicle A and Vehicle B. Panel labels: Frequent Crossers at Night; Mutual Information; Narcotics Network.)
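Mutual information, named above as the association measure for vehicle pairs, can be computed from co-occurrence counts. The sketch below is a minimal version that treats each time window as one observation and records whether each vehicle crossed in it; that encoding, the sample data, and the function name are assumptions, not the AI Lab's implementation.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical indicators: 1 if the vehicle crossed in a given time window.
vehicle_a = [1, 1, 0, 0, 1, 0, 1, 0]
vehicle_b = [1, 1, 0, 0, 1, 0, 1, 0]  # always crosses together with A
vehicle_c = [1, 0, 1, 0, 1, 0, 0, 1]  # crosses independently of A
```

With this data, vehicles A and B share 1 bit of information, while A and C share none, so the A-B pair would stand out as associated.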
Suspect Traffic Burst Detection
Detect real-time anomalies and threats in border traffic using Markov switching and other models.
Arizona IDMatcher
Detect false and deceptive identities across jurisdictions using a probabilistic naïve-Bayes based resolution system.
Identity Resolution
(Figure: identity resolution network. Nodes: IdentityMatch; NameMatch, DOBMatch, IDMatch, AddressMatch; First Name Match, Middle Name Match, Last Name Match; First Name Similarity, Middle Name Similarity, Last Name Similarity, DOBSimilarity, IDSimilarity, AddressSimilarity.)
* Only the grayed datasets are available to the AI Lab
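A naïve-Bayes identity matcher of the kind described for Arizona IDMatcher can be sketched as follows. The sketch reduces each attribute to a binary match flag and combines the flags under the naïve independence assumption; the likelihoods, the prior, and the function name are invented for illustration and are not the system's actual parameters, and the similarity nodes of the original network are not modeled.

```python
# Hypothetical likelihoods: feature -> (P(match | same person), P(match | different people)).
LIKELIHOODS = {
    "name": (0.90, 0.05),
    "dob": (0.85, 0.01),
    "id": (0.80, 0.001),
    "address": (0.60, 0.02),
}
PRIOR_SAME = 0.01  # assumed prior probability that two records are the same person


def posterior_same(matches, likelihoods=LIKELIHOODS, prior=PRIOR_SAME):
    """Posterior probability that two records refer to the same identity,
    given a dict of feature -> bool match flags."""
    p_same, p_diff = prior, 1.0 - prior
    for feature, matched in matches.items():
        p_match_same, p_match_diff = likelihoods[feature]
        if matched:
            p_same *= p_match_same
            p_diff *= p_match_diff
        else:
            p_same *= 1.0 - p_match_same
            p_diff *= 1.0 - p_match_diff
    return p_same / (p_same + p_diff)
```

For example, posterior_same({"name": True, "dob": True, "id": False, "address": True}) still gives a high probability despite the mismatched ID, which is how such a model can flag records where one attribute was falsified.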
• Funding: NSF, DOJ, DHS ($4M), VCs ($4.6M); Digital Government
• Publications: ACM TOIS, CACM, IEEE TKDE, IEEE IS, JASIST, DSS
• Impact: 3500 agencies, 25 NATO countries, 1M users, public safety
Newsweek Magazine, March 3, 2003
A computerized way for police to coordinate crime databases
Washington Post, March 6, 2008, COPLINK in use in 3,500 police agencies in US!
COPLINK acquired by i2 (Silver Lake) in 2009; i2/COPLINK acquired by IBM in 2011 for $500M
ABC News April 15, 2003
Google for Cops: Coplink software helps police search for cyber clues to bust criminals
The New York Times, November 2, 2002
COPLINK assisted in DC sniper investigation
IT Business Models: Some Thoughts
• Startup Phase: business ideas (product and market), team (founders & mentors), share structure (shares, directors, options; legal/CPA), business plan (short plan, good introduction), funding (government, angels, VCs, family); Year 0, 1-3 founders, $250K funding (IT/cloud)
• Early Phase: first product, product positioning, team building, initial sales; Years 1-3, $500K sales
• Growth Phase: product plan, strong sales team, sustainable revenues, unique IPs (SW, content), loyal customers; Years 3-8, $10M sales
• Exit Phase: IPO or M&A (partners), when ($20M+), next venture
Taking risks!
Pain, Sorrow, and Regret
• Loss of family time/life (but never money)
• Managing university obligations and COI
• University bureaucracy, Office of Technology Transfer (OPTT)
• Lawyers, accountants are expensive
• Chasing angels/VCs (40 frogs, 1 prince)
• Office, employees, products
• Selling products (becoming a vendor)
• Burning cash
• Bubble burst
• Raising second round funding when you are down ($2M)
• Board room yelling matches
• University accusations
• Losing control and shares
• Anti-dilution clause (losing $60M for the $2M you never used)