Date posted: 28-Jan-2015
Uploaded by: vivek-parihar
Centralized Logging System Using MongoDB
@vparihar
AVP Engineering, Webonise Lab
Vivek Parihar
Who Am I?
● A Weboniser and Rubyist
● Blogger (vparihar01.github.com)
● MongoDB user
● Geek
● DevOps
● Mainly write Ruby, but have a great passion for JavaScript and Cloud Platforms...
● What is Logging?
● Why do we need Logging?
● Logging Do’s and Don’ts
● Logs Are Streams, Not Files
● Problems managing Logs for a huge Infra
● What can a Centralized Logging System do for us?
● Centralized Logging System Architecture
● What and why Fluentd?
● Why MongoDB is a good fit
Agenda
What is Logging?
Logging is one of the most important parts of any application.
In general, logging refers to keeping track of what an application is doing.
Why do we need Logging?
Logging: Helps in finding and fixing bugs
Logging: Extensively used for debugging
Logging: Helps us diagnose and understand the behaviour of an application
Logging: Tells us exactly what happened, when, where, and why
Who did it? At what time? What did they steal?
Logging: Do’s and Don’ts
#1 It should be FAST
Logging: Do’s and Don’ts
#2 It should not affect the user
Prevent DISK BLOAT
It should not be like:
● "#########its working#########"
● "!!!!!coming here in to get secondary users!!!!!"
● "#########I am Here#########"
● "#########Task completed#######"
Logging: Do’s and Don’ts
#3 Log only useful info
Logging: Do’s and Don’ts
#4 Differentiate Log Levels
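To make the levels concrete, here is a minimal sketch using Ruby's standard Logger; the log device, threshold, and messages are illustrative:

```ruby
require "logger"
require "stringio"

# Illustrative only: log to an in-memory buffer so the output is easy to
# inspect; in a real app the device would be $stdout or a file.
buffer = StringIO.new
logger = Logger.new(buffer)

# Everything below the threshold is filtered out before formatting.
logger.level = Logger::WARN

logger.debug("verbose detail")        # dropped
logger.info("routine event")          # dropped
logger.warn("disk usage at 85%")      # kept
logger.error("payment API timed out") # kept

puts buffer.string
```

Raising the threshold in production keeps logging fast and the output small, which is exactly the point of #1 and #2 above.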
Logs Are Streams, Not Files
Logs are a stream, and it behooves everyone to treat them as such. Your programs should log to stdout and/or stderr and omit any attempt to handle log paths, log rotation, or sending logs over the syslog protocol.
Directing where the program’s log stream goes can be left up to the runtime container: a local terminal or IDE (in development environments), an Upstart / Systemd launch script (in traditional hosting environments), or a system like Logplex/Heroku (in a platform environment).
By: Adam Wiggins, Heroku co-founder.
Problems managing Logs for a huge Infra
What about infra like these?
Problems managing Logs for a huge Infra
How can we solve the huge Infra problem?
Solution: Centralized Logging System
What can a Centralized Logging System do for us?
All of the logs are in one place. This makes searching and analysis across multiple servers much easier than bouncing around between boxes, greatly simplifying log analysis and correlation tasks.
#1 Log Collections
#2 Aggregation
Scaled-out servers behind load balancers each produce their own log files, making it impossible to debug a single action flow that is distributed across servers unless the logs converge into a single place.
What can a Centralized Logging System do for us?
#3 High Availability
Suppose your system is down or overloaded and unable to tell you what happened; a separate, centralized log store still can.
What can a Centralized Logging System do for us?
Local logs on a server may be lost in the event of an intrusion or system failure. By keeping the logs elsewhere, you at least have a chance of finding something useful about what happened.
#4 Security
What can a Centralized Logging System do for us?
It reduces disk space usage and disk I/O on core servers that should be busy doing something else.
#5 Prevent Disk BLOAT
What can a Centralized Logging System do for us?
#6 Visual Indicators
Abnormal behaviors can be detected faster when we see them in a visual instrument such as a graph, where peak points are easily noticed.
What can a Centralized Logging System do for us?
Centralized Logging System Architecture
What and Why Fluentd?
What’s Fluentd?
It’s like syslogd, but uses JSON for log messages
What’s Fluentd?
What’s Fluentd?
A Fluentd event consists of three parts: a tag, a time, and a record.
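As an illustration, a single event might look like this (the tag, time, and record values here are made up):

```
tag:    app.access
time:   2015-01-28 10:15:00 +0000
record: {"method": "GET", "path": "/users/42", "status": 200, "elapsed_ms": 12}
```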
What’s Fluentd?
(diagram: input, buffer, and output plug-ins)
So Fluentd is a:
● Buffer
● Router
● Collector
● Converter
● Aggregator
● ...
What’s Fluentd?
It’s written in Ruby :)
Why Fluentd?
Extensibility - Plugin Architecture
Why Fluentd?
Unified log format - JSON format
Why Fluentd?
Reliable - HA configuration
Why Fluentd?
Easy to install - RPM/deb packages
> sudo fluentd --setup && fluentd
Very small footprint
> small engine (~3,000 lines) + plugins
Why Fluentd?
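To tie the pieces together, here is a minimal sketch of a Fluentd configuration that tails an application log and forwards it to MongoDB. It assumes the fluent-plugin-mongo output plugin is installed; all paths, tags, hostnames, and sizes are illustrative:

```
# Tail the application log and parse each line as JSON
<source>
  type tail
  path /var/log/app/production.log
  format json
  tag app.production
</source>

# Forward everything under app.* to a capped MongoDB collection
<match app.**>
  type mongo
  host localhost
  port 27017
  database logs
  collection app
  capped
  capped_size 100m
</match>
```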
Why is MongoDB a good fit?
1. It’s Schemaless
Document-oriented / JSON is a great format for log information. Very flexible and “schemaless” in the sense that we can throw in an extra field any time we want.
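For example, two log documents in the same collection can carry different fields without any schema migration (field names here are made up):

```
{ "level": "error", "msg": "timeout",     "service": "payments", "elapsed_ms": 3021 }
{ "level": "warn",  "msg": "disk at 85%", "host": "web-03",      "mount": "/var" }
```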
Why ?
2. Fire and Forget
MongoDB inserts can be done asynchronously: unacknowledged writes do not wait for a server reply, so logging does not block the application.
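With the Ruby driver, for instance, a fire-and-forget insert can be requested by setting the write concern to w: 0. This is a sketch only: it assumes a mongod on localhost and is not meant to run standalone; the database and field names are illustrative.

```ruby
require "mongo" # mongo gem; needs a running mongod, so this is a sketch only

client = Mongo::Client.new(["localhost:27017"], database: "logs")

# w: 0 means the driver does not wait for the server to acknowledge the
# write, so logging never blocks the application (at the cost of possible
# silent loss if the server is down).
logs = client[:app, write: { w: 0 }]
logs.insert_one(level: "info", msg: "user signed in", at: Time.now.utc)
```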
Why ?
3. Scalable and easy to replicate.
Built-in replica sets and sharding provide high availability.
Why ?
4. Centralized and easy remote access
Why ?
5. Capped Collections
● They "remember" the insertion order of their documents
● They store inserted documents in insertion order on disk
● They remove the oldest documents automatically as new documents are inserted

However, you give up some things with capped collections:
● They have a fixed maximum size
● You cannot shard a capped collection
● Updates to documents in a capped collection must not cause a document to grow (i.e. not all $set operations will work, and no $push or $pushAll will)
● You may not explicitly .remove() documents from a capped collection
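Creating a capped collection is a single command in the mongo shell; the collection name and size here are illustrative:

```
// Create a 100 MB capped collection for logs; the oldest entries are
// discarded automatically once the size limit is reached.
db.createCollection("app_logs", { capped: true, size: 100 * 1024 * 1024 })
```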
Why ?
6. Tailing Logs
● You’ll really miss the ability to tail logfiles
● Or... will you?
● MongoDB offers tailable cursors
Why ?
Tailable Cursors
What can we do with Tailable Cursors?
We can implement pub/sub using Node.js and MongoDB
https://github.com/scttnlsn/mubsub
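In Ruby, tailing a capped collection looks roughly like this. Sketch only: it assumes a running mongod and an existing capped collection (the names are illustrative), so it is not meant to run standalone:

```ruby
require "mongo" # mongo gem; sketch only, needs a live server

client = Mongo::Client.new(["localhost:27017"], database: "logs")

# A tailable "await" cursor stays open and blocks waiting for new documents,
# much like `tail -f` on a logfile.
client[:app_logs].find({}, cursor_type: :tailable_await).each do |doc|
  puts doc.inspect
end
```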
Why ?
Thanks
Would love to answer your queries...
Vivek Parihar@vparihar