Date posted: 13-Dec-2014
Category: Technology
Uploaded by: isaac-mosquera
Copyright © 2013 Splunk Inc.
Big Data at the Speed of Business
Isaac Mosquera, Director of Mobile, ShareThis
Clint Sharp, Principal Big Data Product Manager, Splunk
What We’ll Talk About
Our quest for visibility
Analyzing at scale
Splunk and Big Data
Where do you start?
Q&A
About Splunk
Company (NASDAQ: SPLK)
Founded 2004, first software release in 2006
HQ: San Francisco
Business Model / Products
Industry-leading machine data platform
On-premise, in the cloud and SaaS
5,600+ Customers
63 of the Fortune 100
Largest license: 100 Terabytes per day
#1 Big Data Innovator*
* Fast Company's Most Innovative Companies Issue (March 2013)
About ShareThis and Socialize
ShareThis makes the world more connected, trusted and valuable through sharing
Powers the social web, touching the lives of 95 percent of U.S.
Acquires Socialize, which makes mobile and social more engaging
Socialize is integrated into thousands of iOS and Android apps
Installed on 80M+ devices
Evaluating 20 Billion Ad Impressions Monthly
Ad Request RTB
Ad Request
Socialize Bidder
Bid Response
Winning Bidder's Ad
Ad Impression
Ad Click
Little Bit About Real-Time Bidding
All this needs to happen in less than 100 milliseconds!
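A minimal sketch of what the 100 millisecond budget means for a bidder, assuming a hypothetical compute_bid function (not part of the talk). It only measures the elapsed time after the fact and drops late bids; a production bidder would also need to cancel slow work.

```python
import time

BID_DEADLINE_SECONDS = 0.100  # the 100 ms budget from the slide


def bid_within_deadline(compute_bid, request, deadline=BID_DEADLINE_SECONDS):
    """Run compute_bid(request) and report whether it met the deadline.

    compute_bid is a hypothetical callable standing in for the bid algorithm.
    Returns (bid, elapsed_seconds); bid is None when the budget was exceeded,
    which stands in for "no bid".
    """
    start = time.monotonic()
    bid = compute_bid(request)
    elapsed = time.monotonic() - start
    if elapsed > deadline:
        return None, elapsed
    return bid, elapsed
```

Logging the elapsed value for every request is what makes a question like "which bids are > 100ms" answerable later.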
So What Are Some of the Problems?
Operational
• Ingesting more than 10,000 queries per second
• Which bids are > 100ms
• Quickly finding any errors within the system
Decision Making (Bid Algorithms)
• Campaign spending
• Campaign efficiency
• Dissect data by:
– apps
– users
– devices
Analyzing Big Data Efficiently
1. Collection
2. Storage
3. Retrieval
4. Analysis/Aggregation
Some Options
RDBMS: SQL functions like count() present problems at scale
RDBMS: Write volume is too high for a single DB, which is also a single point of failure
NoSQL: Would work well for high insert and query volumes; however, we would need to build alerting, charting and reporting dashboards
Hadoop: Easy to set up and query using Hive; however, we would have to set up a new environment and learn a new technology
Splunk Fits the Bill
Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off, we hit a script which shuts off the bidder.
AdHoc Queries: Allows us to find patterns in the data to improve our bid algorithms
Application Reporting: Instantly know campaign metrics for us and our clients
Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key.
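The alert-driven shutoff mentioned under operational reporting could look roughly like the sketch below. The threshold, the function name and the shut_off_bidder callback are all hypothetical; the talk does not show the actual script.

```python
def check_spend_and_shut_off(spend_last_minute, max_spend_per_minute, shut_off_bidder):
    """Call shut_off_bidder() when spend exceeds the threshold.

    All three parameters are illustrative assumptions: the spend figure would
    come from the alerting search, and shut_off_bidder is whatever action
    stops the bidder. Returns True if the shutoff fired.
    """
    if spend_last_minute > max_spend_per_minute:
        shut_off_bidder()
        return True
    return False
```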
Analysis/Aggregation
index=ad_events displayed_ad
| bin _time span=1m
| stats count(meta.displayed_ad) as displays sum(price/1000) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time
| mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"
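For readers who do not know the search language, the aggregation in this query can be approximated in plain Python. This is only a sketch: the event layout (a dict with time, campaign_id and price) is an assumption, price is assumed to be a CPM price, and the final write to the database is left out.

```python
from collections import defaultdict


def aggregate_displays(events):
    """Group displayed-ad events by (campaign_id, minute).

    Each event is assumed to be a dict with 'time' (epoch seconds),
    'campaign_id', and 'price' (a CPM price, so price / 1000 is the
    cost of one impression).
    """
    buckets = defaultdict(list)
    for event in events:
        minute = int(event["time"]) // 60 * 60  # floor to the minute, like span=1m
        buckets[(event["campaign_id"], minute)].append(event["price"])

    rows = []
    for (campaign_id, minute), prices in sorted(buckets.items()):
        rows.append({
            "campaign_id": campaign_id,
            "stat_date": minute,
            "displays": len(prices),
            "dollars_spent": sum(p / 1000 for p in prices),
            "avg_cpm_price": sum(prices) / len(prices),
        })
    return rows
```

Each returned row has the same columns the query hands to the reporting table.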
RDBMS (Generated Reports)
Search Head
Indexer
Indexer
Indexer
Interactive analysis with Search Processing Language:
Using Splunk to Analyze Operational Data
Easily digest information through charts
source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime , stdev(ResponseTime) as stdev_rtime
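The three statistics in this search can be illustrated with Python's standard library. This is a sketch, not how Splunk computes them: the percentile here uses the nearest-rank method, so results may differ slightly from the search above on the same data.

```python
import statistics


def response_time_stats(samples):
    """Return (mean, 95th percentile, sample standard deviation).

    Needs at least two samples because of the standard deviation.
    """
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest value with at least 95% of the
    # samples at or below it. Integer arithmetic avoids float rounding.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return statistics.mean(ordered), p95, statistics.stdev(ordered)
```

The average alone hides slow outliers, which is why the p95 and the standard deviation sit next to it.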
Final Architecture
RDBMS (Generated Reports)
S3 Snapshots
Search Head
Socialize Bidder
Splunk
Indexer
Indexer
Indexer
Cache Cluster
Memcache Memcache Memcache
So, What is Splunk?
Expanding Universe of Data Sources
Machine-generated Data
Business Application Data
Human-generated Data
Highly Structured Arbitrarily Structured
2012-12-05 07:04:44 Id=00Q000000Rd910EAJ City=New York Country=US CreatedDate=“2012-12-05 07:06:44” [email protected] Email_Opt_In_c Customer_Street_Address_c=“123 Main St.” purchased_product_id=product_i BD-01 twitter_username john_t_doe
Industry Leading Platform for Machine Data
Any Machine Data Operational Intelligence
HA Indexes and Storage
Commodity Servers
Developer Platform
Custom dashboards
Monitor and alert
Ad hoc search
Report and analyze
Analyzing Heterogeneous Data
Universal Index
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern “blindly”
• No attempt to “understand” up front
Schema-on-the-fly
• Structure applied at search-time
• No brittle schema to work around
• Automatically find transactions, patterns and trends
• Normalization as it’s needed
Flexibility and Fast Time to Value
• Faster implementation
• Easy search language
• Multiple views into the same data
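The idea of applying structure at search time rather than at load time can be sketched in a few lines of Python. The raw events stay untouched, and key=value fields are pulled out only when a query runs. The function names and the regular expression are illustrative assumptions, and the pattern expects straight double quotes around multi-word values.

```python
import re

# key=value pairs; values may be bare tokens or double-quoted strings
FIELD_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(raw_event):
    """Pull key=value fields out of a raw event at query time."""
    return {key: value.strip('"') for key, value in FIELD_PATTERN.findall(raw_event)}


def search(raw_events, **criteria):
    """Return raw events whose extracted fields match every criterion."""
    return [
        raw for raw in raw_events
        if all(extract_fields(raw).get(k) == v for k, v in criteria.items())
    ]
```

Because nothing is parsed up front, a new field in tomorrow's logs becomes searchable without a schema change.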
Gain Critical Insights … in Real-time
Order ID
Customer’s Tweet
Time Waiting On Hold
Product ID
Company’s Name
Sources
Care IVR
Middleware Error
Order Processing
Order ID
Customer ID
Twitter ID
Customer ID
Customer ID
Deep Visibility and Insight for IT and Business
IT Operations Management
Web Intelligence
Business Analytics
Application Management
Security and Compliance
Industrial Data / Internet of Things
Over 5,600 organizations using Splunk across IT and business users
Driving Insights from Big Data
Hadoop
The ShareThis Insights Platform
On Father’s Day:
“What were the most shared topics?”
“What type of beers do people drink?”
API ETL Pre-aggregation Analytics
?
Finding the Optimal Approach
Hadoop and MapReduce are great for complex data science on data at rest – the previous architecture took 9 months with a team of engineers, data architects, etc.
The Splunk platform delivers real-time, interactive analysis – we can build many of the same insights within 1 hour
What should be the core focus or competency of your team?
Conclusion: find the optimal approach for the business
What About Ad Hoc Analysis?
PR Insights Example
What was the situation? (e.g. fast moving business, needed real-time insights)
What was the PR team struggling with? It was difficult to find useful data to build interesting use cases
What did they want? A flexible real-time reporting environment to extract insights useful for the market
How did my team help? Delivered a single dashboard with real-time data on the sharing behaviors across our network
PR Insights Dashboard
Let’s not forget
The low-hanging fruit
Operational Analytics for an Online World
website
API Notification
Google (GCM)
Feedback Processor
Apple (APNS)
? !
Notifications Systems
Driving Superior Customer Experience
How many 500 errors have I had over time?
Look for anomalies and spikes!
Zero in directly on the customer!
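Counting 500 errors over time and looking for spikes can be sketched as follows. The input layout, the function names and the spike rule (mean plus a multiple of the standard deviation) are assumptions chosen for illustration, not what the dashboard in the talk does.

```python
from collections import Counter
import statistics


def count_500s_per_minute(events):
    """events: iterable of (epoch_seconds, http_status) pairs.

    Minutes with no 500 errors do not appear in the result.
    """
    counts = Counter()
    for timestamp, status in events:
        if status == 500:
            counts[int(timestamp) // 60 * 60] += 1
    return dict(counts)


def find_spikes(counts_by_minute, threshold=3.0):
    """Return minutes whose count exceeds mean + threshold * stdev."""
    values = list(counts_by_minute.values())
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return sorted(m for m, c in counts_by_minute.items() if c > mean + threshold * spread)
```

Once a spike minute is known, filtering the same events by customer or device identifier narrows it to who was affected.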
Online Device Notifications
One More Thing …
New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop
Announcing Hunk Beta
Splunk Analytics for Hadoop
Derive Actionable Insights from Raw Data
Hadoop Storage
1. Point Splunk at Hadoop Cluster
2. Immediately start exploring, analyzing and visualizing raw data in Hadoop
Explore Analyze Visualize Dashboards Share
Learn More
splunk.com/bigdata
Questions?