1 © Cloudera, Inc. All rights reserved.
Big Data is not Rocket Science
European Commission Workshop, December 9th 2014
Lars George | EMEA Chief Architect
“It took 51 years before hard disk drives reached the size of 1 TB. This happened in 2007. In 2009, the first hard drive with 2 TB of storage arrived. So while it took 51 years to reach the first terabyte, it took just two years to reach the second.” —Royal Pingdom
Source: http://royal.pingdom.com/2010/02/18/amazing-facts-and-figures-about-the-evolution-of-hard-disk-drives/
Architectural Changes Trigger
• This is the third age of data processing
• We were always data-driven, but the scale has changed
• Today data is too fast, too varied, and too heavy to move around
• IT needs to adapt to this model and move faster
What is the Problem with Big Data?
• Big Data is as much a buzzword as Cloud is
• The definition is fuzzy, but it tries to describe a new piece of technology
• The important takeaway is that Big Data is turning data processing upside down
  • Load before Extract and Transform
• The current technology stack is not (yet) suited for data-centric architectures
• Current academic education is split into multiple, disconnected approaches
  • Science: trains mathematicians, statisticians, and engineers
  • Applied science: trains polyglot, “generic” developers (coders)
  • Research: develops new tools to store and process data
• None of this training helps to speed up the adoption of Big Data!
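The "Load before Extract and Transform" point can be illustrated with a minimal ELT sketch. It assumes Python's built-in `sqlite3` module as a stand-in for a real Big Data store; the table name `raw_events`, the sample records, and the helper functions are all hypothetical. Raw records are loaded verbatim first, and parsing, filtering, and aggregation happen afterwards inside the store, at query time (schema on read).

```python
import sqlite3

# Hypothetical raw input: records exactly as they arrive from a source system.
RAW_LINES = [
    "2014-12-09,click,3",
    "2014-12-09,view,10",
    "2014-12-10,click,5",
    "malformed line",
]


def load_raw(conn, lines):
    """Load step: store every record verbatim, without parsing or validating it."""
    conn.execute("CREATE TABLE raw_events (line TEXT)")
    conn.executemany(
        "INSERT INTO raw_events (line) VALUES (?)", [(line,) for line in lines]
    )


def transform_clicks_per_day(conn):
    """Transform step: parse and aggregate inside the store, at query time.

    The structure is applied on read, so malformed rows are only skipped here,
    and another query could interpret the same raw data differently later.
    """
    query = """
        WITH parsed AS (
            SELECT substr(line, 1, instr(line, ',') - 1) AS day,
                   substr(line, instr(line, ',') + 1) AS rest
            FROM raw_events
            WHERE instr(line, ',') > 0
        ), fields AS (
            SELECT day,
                   substr(rest, 1, instr(rest, ',') - 1) AS action,
                   CAST(substr(rest, instr(rest, ',') + 1) AS INTEGER) AS cnt
            FROM parsed
            WHERE instr(rest, ',') > 0
        )
        SELECT day, SUM(cnt) FROM fields
        WHERE action = 'click'
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_LINES)
    print(transform_clicks_per_day(conn))  # [('2014-12-09', 3), ('2014-12-10', 5)]
```

In a classic ETL pipeline the malformed record would be rejected before loading; here it is kept in `raw_events`, and deciding what counts as valid is deferred to each transformation.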
DaaS is not PaaS
• There is a gap between PaaS and being successful with Big Data as a Service (DaaS)
• Big Data engineers need to fill this gap for the time being
• The future will bring building blocks for building data applications
• For now, there is nothing that simplifies the technology for users!
Big Data Engineering
Engineering Task:
• Build reliable, automated, scalable, managed, and governed data processing pipelines
• Apply all existing knowledge smartly
Job Requirements
Question: What are the requirements for a Big Data engineer? IT systems are built with various layers to handle specific tasks. There are distinct sections that can be assigned to differently trained people.
[Figure: layered IT stack (OSI Model): Hardware, Operating System, Application Software]
Job Requirements
• Developer, DBA, etc.
• System Administrator
• Network Engineer
• Datacenter Technician
• Building Facilities
[Figure: layered IT stack (OSI Model): Hardware, Operating System, Application Software, with the roles above assigned to the layers]
Job Requirements
The problem is that Big Data needs all of these skills combined: DevOps! This is a big issue, as it requires change and training on every level. This is mostly an organisational challenge, not a technology one.
[Figure: layered IT stack (OSI Model): Hardware, Operating System, Application Software, with a single "Big Data Skills" label across all layers]
Summary
• The biggest issue is that the technology is not yet complete, but at the same time it requires complete vertical adoption within the IT department
• Existing and new IT and science professionals need to be trained and educated
• Action Items:
  • Combine existing educational material to reflect the new challenges
  • Train staff to understand the challenges concerning their responsibilities
  • Develop new middleware that makes adoption of the platform easier
Thank you
Lars George | EMEA Chief Architect
@larsgeorge