Making big data work

Post on 19-Feb-2017


Making Big Data work
Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk

© the DataShed Limited 2015

intro

Who am I?

• For the last 3 years, the DataShed has been providing consultancy services to a vast array of large clients. Our primary focus is ensuring that technology and analytical strategies are truly aligned, so that businesses can use the latest technology to model, mine and describe their data asset.

• We were working with Big Data technology before the term was coined. We have experience delivering analytical systems driven by petabyte-scale data sets, and have designed, implemented and supported one of the largest real-time data integration and predictive analytics platforms in the aviation world.

• Our model is based on using a small number of exceptionally highly skilled individuals to deliver disruptive and innovative solutions in an agile and delivery-focused manner.


So what is ‘Big Data’?


Why do Big Data projects fail?

Too many people think that Big Data is:

“The belief that the more data you have, the more insights and answers will rise automatically from the pool of ones and zeros.”

Gill Press, Forbes.com


How to make Big Data work?

1. Understand your problem

2. Apply appropriate tools

3. Automate everything.


Real-time data


Continuous Integration Demo


How to make Big Data work?

1. Understand your problem

2. Apply appropriate tools

3. Automate everything.


Little Big Data


A problem closer to home…

• Every business needs to understand:
  • Their potential customers and market
  • Current customers
  • Their products and sales
  • How and when they engage prospects and customers

• Analytics and data are expensive
• Many of the mandatory elements are very similar for everyone
• The DataShed is Analytics as a Service and Single Customer View as a Service.


The deduplication problem…

• An SME has 250,000 customers (two systems of record)
• Identifying duplicates by brute force takes 31,249,875,000 pairwise comparisons
• Building a system to process a minimum of 100 clients a day…
• …means roughly 3.1 trillion record pairs to compare each day, using > 10 different algorithms
• A traditional scale-up approach would be expensive, and makes large assumptions around blocking and partitioning rules
• A small data problem but a big data solution?
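The figures on this slide follow from counting unordered pairs, n(n - 1) / 2. The short sketch below reproduces them. The blocking scenario at the end (1,000 equal-sized blocks) is purely an illustrative assumption, not something stated in the talk; it shows why blocking is tempting, and also what it assumes, namely that duplicates always land in the same block.

```python
from math import comb


def pairwise_comparisons(n_records: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return comb(n_records, 2)


customers = 250_000        # from the slide
clients_per_day = 100      # from the slide

per_client = pairwise_comparisons(customers)
per_day = per_client * clients_per_day

print(f"per client: {per_client:,}")   # 31,249,875,000
print(f"per day:    {per_day:,}")      # 3,124,987,500,000

# Hypothetical blocking rule: split the customers into 1,000 equal blocks
# and compare only within a block. This assumes duplicates never straddle
# two blocks, which is exactly the "large assumption" the slide warns about.
blocks = 1_000
blocked = blocks * pairwise_comparisons(customers // blocks)
print(f"with blocking: {blocked:,}")   # 31,125,000
```

Each of those pairs would then be scored by every matching algorithm, so the daily workload scales with the number of algorithms as well.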

Title   First Name   Surname   Address 1        Address 2     Address 3
Dr      R J          Smith     Two Oaks         112 Old St.   County Durham
Mrs     Robyn        Smith     112 Old Street   Durham        DH1 5YJ
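To see why a single exact-match rule is not enough for the two example records, here is a minimal sketch that scores them with three simple measures from the Python standard library. These are not the DataShed's algorithms, which the talk does not describe; the abbreviation table, the function names and the choice of measures are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"st": "street", "rd": "road"}


def normalise(text: str) -> list[str]:
    """Lower-case, strip basic punctuation, expand known abbreviations."""
    tokens = text.lower().replace(".", "").replace(",", "").split()
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def token_jaccard(a: str, b: str) -> float:
    """Share of distinct tokens the two strings have in common."""
    set_a, set_b = set(normalise(a)), set(normalise(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, " ".join(normalise(a)),
                           " ".join(normalise(b))).ratio()


def initial_match(first_a: str, first_b: str) -> bool:
    """True when both first names start with the same letter."""
    a, b = normalise(first_a), normalise(first_b)
    return bool(a and b) and a[0][0] == b[0][0]


# The two records from the slide, with the address columns joined.
rec_a = {"first_name": "R J", "surname": "Smith",
         "address": "Two Oaks 112 Old St. County Durham"}
rec_b = {"first_name": "Robyn", "surname": "Smith",
         "address": "112 Old Street Durham DH1 5YJ"}

print("surname ratio: ", char_ratio(rec_a["surname"], rec_b["surname"]))
print("initial match: ", initial_match(rec_a["first_name"], rec_b["first_name"]))
print("address tokens:", round(token_jaccard(rec_a["address"], rec_b["address"]), 3))
print("address chars: ", round(char_ratio(rec_a["address"], rec_b["address"]), 3))
```

No single score settles the question: the surnames agree, the first names are only compatible, and the addresses overlap in part. That is one reason to run several algorithms per pair, as the slide suggests.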


The Shed demo


How to make Big Data work?

1. Understand your problem

2. Apply appropriate tools

3. Automate everything.


How to make Big Data work?

1. Understand your problem
• ‘Big Data’ challenges aren’t necessarily new; however, much of the technology is
• Articulate and communicate – focus on distilling your problem down
• Incremental improvement, not wholesale replacement

2. Apply appropriate tools
• Understand the economics as well as the technology
• New technologies need to be evaluated within the context of your problem scope
• New technologies are enablers, not deliverables (#datalake)
• ‘Big Data’ technology should be seen as complementary to existing technology

3. Automate everything
• Continuous integration to include all testing
• Containerise where possible
• Measure everything


If you really want to get involved…


Get your hands dirty

If you’re interested in learning more, we’ll be hosting a hands-on labs event in the near future. Send your details to:

Email: hello@thedatashed.co.uk
Twitter: @thedatashed


Any questions?


Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk