How our ISP cost us a full day of the entire R&D team - Lior Redlus - DevOpsDays Tel Aviv 2017

Date post: 22-Jan-2018
Upload: devopsdays-tel-aviv
Transcript
Page 1: How our ISP cost us a full day of the entire R&D team - Lior Redlus - DevOpsDays Tel Aviv 2017

How our ISP cost us a full day of the entire R&D team

Lior Redlus

Co-founder and Chief Data Scientist

Coralogix

Page 2

About Myself

• 32 years old. Scientist at heart

• B.Sc. and M.Sc. in Neuroscience and Information Processing (BIU)

• Co-founder and Chief Data Scientist @ Coralogix

Page 3

About Coralogix

• A scalable, machine-learning-powered log analysis solution

• Log management is already included: indexing, querying, filtering, alerting, etc.

• Coralogix Analytics:

• Turns your data into patterns and flows

• Gives you deep insights on your system

• Automatically detects production problems

• Finds system behavior changes between code deployments

Page 4

Interacting with your logging data

• Coralogix provides 3 ways to get insights from your logs:

1. Coralogix Dashboard – a simple and powerful dashboard with machine learning capabilities

2. Elastic’s Kibana – with a rich query language and flexible visualizations

3. Elasticsearch API – for deep technical querying and aggregations
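As an illustration of the third option, the kind of aggregation the Elasticsearch API allows can be sketched as a search request body. This is a minimal sketch, not taken from the talk: the field names `severity` and `@timestamp` and the one-hour window are assumptions, and the body would be sent to whatever `_search` endpoint the service exposes.

```python
def severity_counts_body(severity_field="severity",
                         time_field="@timestamp",
                         window="now-1h"):
    """Build a search body that counts recent log entries per severity.

    The field names are hypothetical; adjust them to the actual log schema.
    """
    return {
        # "size": 0 asks for aggregation results only, with no individual hits.
        "size": 0,
        # Restrict the search to documents newer than the chosen window.
        "query": {"range": {time_field: {"gte": window}}},
        # A terms aggregation buckets documents by the value of one field.
        "aggs": {
            "by_severity": {"terms": {"field": severity_field, "size": 10}}
        },
    }
```

Serializing the returned dictionary with `json.dumps` gives a JSON document that can be posted as the request body of a search call.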

Page 5

Good product, happy customers!

• Everything worked smoothly for months

• Until we got a call from a customer (0.5TB / day)

• Some of his heavier dashboards could not be loaded

• He was not happy

• And neither were we

Page 6

Well, of course. This makes no one happy!

• The error message was replicated in our offices as well

Page 7

Kibana – technical overview

[Architecture diagram. Labels: Customer, public domain, port 5601, Node.js server, Angular.js client, localhost, port 9200 (twice), Docker container (three times).]

Our proprietary Kibana proxy:

• Emulates elasticsearch for Kibana

• Confines customers to only access their data

• Parses queries for various SLA restrictions
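The idea of a proxy that confines each customer to their own data can be sketched as a query rewrite. This is only an illustration under stated assumptions, not Coralogix's implementation: the function name, the `customer_id` field, and the choice of a `bool` filter are all hypothetical.

```python
import copy


def confine_to_customer(search_body, customer_id, tenant_field="customer_id"):
    """Return a copy of an Elasticsearch search body limited to one customer.

    tenant_field is a hypothetical field assumed to be stored on every document.
    """
    confined = copy.deepcopy(search_body)  # never mutate the caller's request
    # A request without a query matches everything, so make that explicit.
    original_query = confined.get("query", {"match_all": {}})
    confined["query"] = {
        "bool": {
            # Keep whatever the client asked for...
            "must": [original_query],
            # ...but only among documents that belong to this customer.
            "filter": [{"term": {tenant_field: customer_id}}],
        }
    }
    return confined
```

For example, `confine_to_customer({"query": {"match": {"message": "timeout"}}}, "acme")` wraps the original `match` query in a `bool` query whose filter restricts results to documents where `customer_id` is `acme`. Other keys in the body, such as `size` or `aggs`, pass through unchanged.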

Page 8

So what could have gone wrong..?

• We looked into everything we could think of:

• Was the customer’s dashboard defined properly?

• It was.

• Was any indexed elasticsearch data corrupted?

• No.

• Was a large Kibana dashboard overloading our Kibana Proxy?

• Not according to the CPU and memory monitoring.

• Was there a hidden bug in our Kibana Proxy for certain queries?

• Replies seemed to be correct for every query we researched.

• Was any Docker container replaced recently, possibly with different settings?

• Yes, but new settings were not introduced.

• Was any Docker networking bug (and there are many…) interacting here?

• Not any that we could find.

Page 9

Everything looked perfect!

• However, we did have one odd finding:

• When we were connected to our VPN, all the problems disappeared!

• Late at night and disappointed, we decided to call it a day.

Page 10

Connecting the dots…

• Returning home, we each loaded the dashboard, and to our surprise – everything worked!

• The same ISP served us and the customer, but not our homes.

• The new suspect was our Internet Service Provider!

Page 11

Results – 1

• The next day, confident, we experimented:

• SSL vs. no SSL

• Kibana’s standard port 5601 vs. HTTPS port 443

• Adding our Kibana CNAME to Cloudflare

• The results were staggering!
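An experiment like the one above can be scripted so that each variant is requested several times from the same network. The following is a rough sketch under assumptions, not the procedure used in the talk: the function names and the number of attempts are illustrative, and any URLs passed in are the caller's own.

```python
import time
import urllib.request


def fetch(url, timeout=10.0):
    """Download a URL and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def compare_endpoints(urls, fetch_fn=fetch, attempts=3):
    """Request each labelled URL several times; record failures and best time."""
    results = {}
    for label, url in urls.items():
        durations, failures = [], 0
        for _ in range(attempts):
            started = time.perf_counter()
            try:
                fetch_fn(url)
                durations.append(time.perf_counter() - started)
            except OSError:
                # URLError and socket timeouts are both OSError subclasses.
                failures += 1
        results[label] = {
            "failures": failures,
            "fastest_s": min(durations) if durations else None,
        }
    return results
```

Calling `compare_endpoints` with a dictionary such as `{"plain-5601": "http://kibana.example.com:5601/", "tls-443": "https://kibana.example.com/"}` (a placeholder host) from the affected network and again from a different one gives a side-by-side view of which combination of port and encryption fails. The `fetch_fn` parameter exists so the comparison logic can be exercised without real network access.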

Page 12

Results – 2

• Loading the dashboard without SSL through port 5601:

• Loading the same dashboard with SSL through port 443 and Cloudflare:

Page 13

Results, solution and conclusion

• The ISP was throttling our requests, causing timeouts and packet loss – eventually crashing heavily loaded dashboards

• Adding our Kibana to Cloudflare under port 443 solved our problems

• (aside from wasting a whole day of our R&D team!)

• Conclusion: trust no one!

Page 14

Questions?

• Please feel free to contact me directly:

Lior Redlus, Chief Data Scientist, [email protected]

One month free trial @ http://www.coralogix.com

