Fluentd ♥ MongoDB
Log Everything As JSON
Kazuki Ohta, CTO at Treasure Data, Inc.
Tuesday, July 17, 2012
Self-Introduction
• Kazuki Ohta
  > twitter: @kzk_mover
  > github: kzk
• Treasure Data, Inc.
  > Chief Technology Officer; Founder
  > Original Fluentd Author @frsyuki is another co-founder.
• Open-Source Enthusiast
  > KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc.
  > Fluentd rpm/deb package manager
Logging? Why?
Figure 1: Common Logging Purposes
Analytics
Error Notification
Recommendation
Figure 2: Types of Logs
App Log
Access Log (Apache, Rails, etc.)
System Log (syslog etc.)
Others
From “Scaling Lessons learned at Dropbox”
Fragile to format changes, no type information, no field names, etc.
About Fluentd
It's like syslogd, but uses JSON for log messages
Logs in JSON? Why?
1. Machine-Readable
   > machines are going to be the main consumers of logs
2. Schema-Free
   > you want to add/remove fields from logs at any time
Write Logs for Machines, use JSON
http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
Logs As TEXT
"2011-04-01 host1 myapp: cmessage size=12MB user=me"
Logs As JSON
2011-04-01 myapp.message { "on_host": "host1", "combined": true, "size": 12000000, "user": "me" }
+ Field Name
+ No Custom Parser
+ Type Information
+ Schema Free
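The contrast between the two log lines can be sketched in Ruby. This is a minimal illustration, not part of the talk: `parse_text_log` and its regular expression are hypothetical and written only for this one text format, and the sample lines are the slide's with quotes normalized to ASCII.

```ruby
require "json"

# The two log lines from the slide (quotes normalized to ASCII).
TEXT_LOG = "2011-04-01 host1 myapp: cmessage size=12MB user=me"
JSON_LOG = '{"on_host": "host1", "combined": true, "size": 12000000, "user": "me"}'

# Text logs need a custom parser, and every captured value is a String.
# This regex is a hypothetical parser that only understands this one layout.
def parse_text_log(line)
  m = line.match(/\A(\S+) (\S+) (\w+): (\w+) size=(\S+) user=(\S+)\z/)
  raise ArgumentError, "unexpected format: #{line}" unless m
  { "date" => m[1], "host" => m[2], "app" => m[3],
    "message" => m[4], "size" => m[5], "user" => m[6] }
end

text_record = parse_text_log(TEXT_LOG)
json_record = JSON.parse(JSON_LOG)   # no custom parser needed

puts text_record["size"].inspect   # a String that still needs unit parsing
puts json_record["size"].inspect   # a number, ready for arithmetic
```

Adding a field to the text format would mean changing the regex, while the JSON record simply gains a key.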
• Website
  > http://fluentd.org/
• Community
  > http://github.com/fluent
  > 16 committers across many organizations
  > web, game, enterprise
• Mailing list
  > Google groups
Fluentd Architecture
Fluentd: Log Format
Application → Fluentd → Storage
2012-02-04 01:33:51 myapp.buylog { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
time: 2012-02-04 01:33:51
tag: myapp.buylog
record: the JSON object
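The time / tag / record layout can be modeled as a small value object. The `Event` struct and its `to_line` method below are hypothetical names for illustration only; they are not Fluentd's internal classes.

```ruby
require "json"

# Hypothetical value object for one event: time, tag, and record.
Event = Struct.new(:time, :tag, :record) do
  # Render the event the way the slide prints it: "<time> <tag> <json>".
  def to_line
    "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51),
  "myapp.buylog",
  { "user" => "me", "path" => "/buyItem", "price" => 150, "referer" => "/landing" }
)

puts event.to_line
```

Keeping the tag separate from the record is what lets routing decisions be made without inspecting the JSON body.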
Fluentd: Plugins
Input plug-ins: Application, File (tail), Scribe, syslogd
Fluentd core: filter / buffer / routing
Output plug-ins: Fluentd, Storage, SaaS
• Client libraries
  > Ruby
  > Perl
  > PHP
  > Python
  > Java
  > ...
Fluent.open("myapp")
Fluent.event("login", {"user"=>38})
#=> 2012-02-04 04:56:01 myapp.login {"user":38}
Application → Fluentd (HTTP / TCP / UDS)
Buffering
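The buffering step between the application and Fluentd can be sketched as follows. `BufferedLogger`, its `limit` option, and the use of a `StringIO` in place of the HTTP / TCP / UDS connection are all assumptions made for this example; this is not the API of any real Fluentd client library.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for a client library; not a real Fluentd logger API.
class BufferedLogger
  attr_reader :buffer

  def initialize(tag_prefix, limit: 3)
    @tag_prefix = tag_prefix
    @limit = limit       # flush once this many events are buffered
    @buffer = []
  end

  # Queue one event as [time, tag, record]; flush to `io` when the buffer is full.
  def event(label, record, io, now: Time.now)
    @buffer << [now.to_i, "#{@tag_prefix}.#{label}", record]
    flush(io) if @buffer.size >= @limit
  end

  # Write every buffered event as one JSON line, then clear the buffer.
  def flush(io)
    @buffer.each { |entry| io.puts(JSON.generate(entry)) }
    @buffer.clear
  end
end

out = StringIO.new               # stands in for the HTTP / TCP / UDS connection
logger = BufferedLogger.new("myapp", limit: 2)
logger.event("login", { "user" => 38 }, out)
logger.event("logout", { "user" => 38 }, out)   # second event triggers a flush
puts out.string
```

Batching events this way trades a little latency for fewer writes to the transport.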
Typical Log Collection by `rsync`
App server: Application → File File File ...
App server: Application → File File File ...
App server: Application → File File File ...
Log server: File
• Burst of traffic: rsync consumes all bandwidth
• High latency: must wait for a day
• Hard to analyze: complex text parsers
Log Collection using Fluentd
Fluentd Fluentd Fluentd → Fluentd Fluentd → Hadoop / Hive, MongoDB, Amazon S3 / EMR
Realtime!
Ready to Analyze!
Fluentd Case Study
Ruby on Rails  Ruby on Rails  Ruby on Rails
Fluentd  Fluentd  Fluentd
Fluentd  Fluentd (routing)
Hadoop / Hive
MongoDB
PV logs
User behavior logs
✓ 127 RoR servers
✓ 100,000 msgs/sec
✓ 120Mbps at peak
✓ 1TB/day
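As a rough sanity check on the case-study figures, the arithmetic below relates them to each other. It assumes decimal units (1 Mbps = 10^6 bits/s, 1 TB = 10^12 bytes) and that the message rate and the bandwidth both describe the same peak; the slide does not state either.

```ruby
# Figures from the slide; that they all describe the same peak is an assumption.
MSGS_PER_SEC  = 100_000
PEAK_BITS_SEC = 120 * 1_000_000        # 120 Mbps, decimal megabits
BYTES_PER_DAY = 1_000_000_000_000      # 1 TB, decimal terabyte

# Average message size if 100,000 msgs/sec fill 120 Mbps.
BYTES_PER_MSG_AT_PEAK = PEAK_BITS_SEC / 8.0 / MSGS_PER_SEC

# Average bandwidth needed to move 1 TB in one day (86,400 seconds).
AVG_MBPS = BYTES_PER_DAY * 8.0 / 86_400 / 1_000_000

puts "bytes per message at peak: #{BYTES_PER_MSG_AT_PEAK}"
puts "average Mbps for 1 TB/day: #{AVG_MBPS.round(2)}"
```

Under these assumptions the daily volume implies an average rate somewhat below the quoted peak, so the figures are mutually consistent.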
# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  host 127.0.0.1
</match>

# forward other logs to servers
# (load-balancing + fail-over)
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>
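How the two `<match>` sections divide the traffic can be sketched in Ruby. This is an illustrative model rather than Fluentd's own matcher: it assumes that `*` matches exactly one tag part, that `**` matches zero or more parts, and that sections are tried in order with the first match winning. `tag_match?`, `route`, and the `:mongo` / `:forward` labels are hypothetical names.

```ruby
# Does a tag (split on ".") match a pattern? Assumes "*" matches exactly one
# tag part and "**" matches zero or more parts.
def tag_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?
  head, *rest = pattern_parts
  case head
  when "**"
    # Try letting "**" swallow 0, 1, 2, ... leading tag parts.
    (0..tag_parts.length).any? { |n| tag_match?(rest, tag_parts.drop(n)) }
  when "*"
    !tag_parts.empty? && tag_match?(rest, tag_parts.drop(1))
  else
    tag_parts.first == head && tag_match?(rest, tag_parts.drop(1))
  end
end

# Rules are checked in order and the first match wins, mirroring the order of
# the <match> sections. The output names are labels, not real plugin objects.
RULES = [
  ["apache.access", :mongo],
  ["**",            :forward]
].freeze

def route(tag)
  rule = RULES.find { |pattern, _| tag_match?(pattern.split("."), tag.split(".")) }
  rule && rule.last
end

puts route("apache.access")
puts route("myapp.login")
```

Because the catch-all `**` comes last, access logs reach MongoDB and everything else is forwarded.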
Comparison
Scribe: log collector by Facebook
Frontend servers (scribe) → Aggregator nodes (scribe) → Hadoop HDFS
Scribe’s Pros & Cons
• Pros.
  • Fast (written in C++)
• Cons.
  • VERY HARD to install
    • nightmare of boost, thrift, libhdfs, etc.
  • Unstructured Logs
    • logs must be parsed before analysis
  • Hard to extend
    • recompiling the C++ program is required
  • No longer maintained
Fluentd vs Scribe
• Easy to install
  • “gem install fluentd”
  • Stable RPM and Deb packages
    • http://packages.treasure-data.com/
• Easy to write plugins
  • you can use Ruby
• Easy plugin distribution
  • “gem search -rd fluent-plugin”
Flume: distributed log collector by Cloudera
Flume Flume Flume → Hadoop HDFS
Flume Master: Physical Topology, Logical Topology
Flume’s Pros & Cons
• Pros.
  • Central master server manages all nodes
• Cons.
  • Difficult to understand
    • logical topologies, physical servers, and the configuration of the logical/physical mapping
  • Difficult to configure
    • replicated master servers, log servers and agents
  • Big footprint
    • 50,000 lines of Java
Fluentd vs Flume
• Easy to understand
  • “syslogd that understands JSON”
• Easy to set up
  • “sudo fluentd --setup && fluentd”
• Very small footprint
  • small engine (3,000 lines) + plugins
  • small, but battle-tested!
• Easy to configure
                      Fluentd               Scribe               Flume
Installation          gem/rpm/deb           make                 jar/rpm/deb
Footprint             3000 lines of Ruby    8000 lines of C++    50,000 lines of Java
Plugin                Ruby                  N/A                  Java
Plugin distribution   RubyGems.org          N/A                  N/A
Master Server         No                    No                   Yes
License               Apache License        Apache License       Apache License
Fluentd Plugin for
fluent-plugin-mongo
• Included within rpm/deb by default!
  • http://github.com/fluent/fluent-plugin-mongo
• #1 plugin among 50+ Fluentd plugins
  • Logs As JSON. WHY NOT Put Them Into Mongo??
  • http://fluentd.org/plugin/
• Supports most of the MongoDB features
  • Authentication
  • ReplicaSet
  • Capped Collection
• MongoDB Output Plugin
  • Maintain JSON Structure
  • Reliable Buffering
  • Batch Insertion
  • Handle Broken Records
    • Ruby Driver #82
Application → Fluentd (Buffering) → MongoDB (Authentication)
Supported targets:
  • Single Instance (Capped or Not)
  • ReplicaSet
  • Sharding
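Batch insertion with broken records set aside can be sketched without a database. Everything here is an assumption for illustration: `broken?` treats a record as broken when a string value is not valid UTF-8, and `insert` is any callable standing in for a bulk insert. It does not reproduce the plugin's actual logic or the driver issue the slide refers to.

```ruby
# Hypothetical check for a "broken" record: any string value that is not valid
# UTF-8. This is an assumed definition for the sketch, not the plugin's logic.
def broken?(record)
  record.values.any? { |v| v.is_a?(String) && !v.valid_encoding? }
end

# Insert the good records in batches of `batch_size` and return the broken ones.
# `insert` is any callable that stands in for a bulk insert into a collection.
def flush_in_batches(records, batch_size, insert)
  good, bad = records.partition { |r| !broken?(r) }
  good.each_slice(batch_size) { |batch| insert.call(batch) }
  bad
end

inserted = []
records = [
  { "user" => "me", "price" => 150 },
  { "user" => "\xFF".dup.force_encoding("UTF-8") },  # invalid byte sequence
  { "user" => "you", "price" => 80 },
  { "user" => "them", "price" => 20 }
]
rejected = flush_in_batches(records, 2, ->(batch) { inserted << batch })
puts "batches: #{inserted.size}, rejected: #{rejected.size}"
```

Separating broken records before the bulk call keeps one bad record from failing the whole batch.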
• MongoDB Input Plugin
  • Tailing Capped Collections
MongoDB (Authentication) → Fluentd (Buffering) → MongoDB
Supported sources:
  • Single Instance (Capped Collection)
  • ReplicaSet (Capped Collection)
Realtime Analytics with Fluentd + MongoDB
App App App → Fluentd Fluentd Fluentd → Fluentd Fluentd (routing) → MongoDB
MongoDB → query → Charting, Alert (Nagios, Zabbix, etc.)
Realtime or Batch? No, BOTH!
App App App → Fluentd Fluentd Fluentd → Fluentd Fluentd (routing)
realtime: MongoDB → query → Charting
archive: Amazon S3
batch: Hadoop / Hive
Intro of our company’s service: Treasure Data
App App App → Fluentd Fluentd Fluentd → Fluentd Fluentd (routing)
realtime: MongoDB
batch: Treasure Data (Hadoop-based Cloud Data Warehouse)
Exercise: Apache Logs into MongoDB
Log File
Conclusion
• Log Everything as JSON
  • Machine Readability
  • Schema Freeness
• MongoDB fits into Fluentd’s backend perfectly
  • Both use a JSON representation