Data Science Overview

Post on 02-Dec-2014


description

The buzzword of the past year is “Data Science”. But what does it really mean? What does a “Data Scientist” do? What tools does Microsoft provide? And what other tools exist beyond Microsoft's?

transcript

Previously known as

Think Big. Move Fast.


SolidQ

• Born in 2002 in USA and Spain

• Established in 2007 in Italy

• More than 1000 customers and more than 200 consultants worldwide

• Dedicated to Data Management on the Microsoft Platform

• Book Authors, Conference Speakers, SQL Server MVPs and Regional Directors

• www.solidq.com

Davide Mauri

• 18 Years of experience on the SQL Server Platform

• Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence

• Microsoft SQL Server MVP

• President of UGISS (Italian SQL Server UG)

• Mentor @ SolidQ

• Video, Book & Article Author

• Regular Speaker @ SQL Server events

• Projects, Consulting, Mentoring & Training

Data Science

Renaissance 2.0

“Companies are collecting mountains of information about you, to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so”

Business Week Magazine, 1994

Data Science

• Extraction of knowledge from data

• So, what’s new?

• Nothing. Except that it’s now cheap and fast.

• It’s now applicable to everything. And a lot of data is produced every day that can be used to extract knowledge

Data Science

[Diagram: Data → Information → Knowledge → Decisions]

Data Science

• A Sum Of: Statistics, Mathematics, Machine Learning, Data Mining, Computer Programming, Data Engineering, Visualization, Data Warehousing, High Performance Computing

• To support (Informed) Decision Making

• Data-Driven Decisions

Data Scientist

• IBM

• A data scientist represents an evolution from the business or data analyst role.

• The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math.

• What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.

• It's almost like a Renaissance individual who really wants to learn and bring change to an organization.

Algorithms

• Algorithms are the new gatekeepers (http://www.slideshare.net/socialisten/algorithms-are-the-new-gatekeepers)

• There is simply too much data for a human to analyze!

• They decide what we find, what we see, and what we buy

• Data is the foundation upon which algorithms work: better data leads to better results

• Data-Driven Decisions will be a MUST in the coming years!

• Data Scientists will help companies to leverage their most valuable asset: Data

Modern Data Environment

[Diagram: Structured Data, Unstructured Data, Master Data, EDW, Data Mart, Big Data, BI Environment, Analytics Environment]

Big Data

The 3 V

No, the 4 V!!!

No, no, the 5 V!!!!!

6 V!!!

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Big Data

• Volume, Velocity, Variety, Veracity ... V<your-v-here>

• Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time

• Grid computing and parallel computing are needed to keep processing time reasonable and to provide scalability
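The idea behind parallel processing of large data sets can be sketched on a toy scale: split the input into chunks, process each chunk independently, then merge the partial results. The sketch below is only an illustration under stated assumptions: the function names (`count_words`, `split_into_chunks`, `parallel_word_count`) and the sample lines are hypothetical, and a local thread pool stands in for the nodes of a real cluster, so it shows the pattern rather than an actual speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Split the input into at most n_chunks roughly equal pieces."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines, n_workers=4):
    """Count words by processing chunks independently, then merging."""
    chunks = split_into_chunks(lines, n_workers)
    # Threads stand in for cluster nodes here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:  # reduce step: merge partial results
        total.update(partial)
    return total


# Hypothetical sample input.
lines = [
    "big data big results",
    "data is the new resource",
    "store now figure out later",
]
totals = parallel_word_count(lines, n_workers=2)
print(totals["data"], totals["big"])
```

Because each chunk is processed without reference to the others, adding workers (or machines) only changes how the chunks are distributed, which is what makes this pattern scale.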

Big Data Data

• Paradigm: “Store Now, Figure Out Later”

• Data is the new resource. Never throw it away!

• Unstructured Data: Text Files, Images, Sounds

• Structured/Semi-Structured Data: Sensors, Transactions, Logs

Data Storage

• RDBMS: SQL Server

• Hadoop: HDInsight, Hortonworks Data Platform

• Distributed File (Eco)System: CSV, JSON, *.*

Data Storage

• Hadoop Ecosystem

http://hortonworks.com/hadoop-modern-data-architecture/

Data Science & Big Data

• Data Science != Big Data

• Data Science is not only for Big Data

• Data Science can be applied to Big Data

• Data Science starts from Small Data: 1) find the algorithm that extracts knowledge; 2) measure the algorithm's results in terms of probability
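The two steps on small data can be illustrated with a deliberately simple sketch. Everything here is an assumption made for illustration: the data set is synthetic, the "algorithm" is a single learned threshold, and the helper names (`train_threshold`, `accuracy`) are hypothetical. The point is the workflow: fit on one part of the data, then estimate the probability of a correct prediction on the held-out part.

```python
import random


def train_threshold(samples):
    """Step 1: pick the threshold that best separates the two classes.

    Each sample is (value, label) with label 0 or 1;
    the rule predicts 1 when value >= threshold.
    """
    candidates = sorted(value for value, _ in samples)
    best_threshold, best_correct = candidates[0], -1
    for threshold in candidates:
        correct = sum((value >= threshold) == bool(label) for value, label in samples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, samples):
    """Step 2: fraction classified correctly, an estimate of P(correct)."""
    correct = sum((value >= threshold) == bool(label) for value, label in samples)
    return correct / len(samples)


# Synthetic small data set (an assumption): class 1 values tend to be larger.
rng = random.Random(42)
data = [(rng.gauss(2.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(5.0, 1.0), 1) for _ in range(100)]
rng.shuffle(data)

train, test = data[:140], data[140:]  # hold out 30% for measurement
threshold = train_threshold(train)
held_out_accuracy = accuracy(threshold, test)
print(f"threshold={threshold:.2f}  estimated P(correct)={held_out_accuracy:.2f}")
```

Measuring on data the rule has not seen is what makes the number a usable estimate; accuracy on the training part alone would be optimistic.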

Machine Learning

• Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. (Wikipedia)

• For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.

• Flavors: Supervised, Unsupervised
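The spam example is a case of supervised learning: the system is given messages that are already labeled and learns to label new ones. A minimal sketch follows, assuming a multinomial naive Bayes model with add-one smoothing, which is one common choice and not something the slides prescribe. The class name `NaiveBayesSpamFilter` and the training messages are hypothetical, and only the Python standard library is used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Assumes at least one training message per label before classify().
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def train(self, text, label):
        """Learning step: update counts from one labeled message."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        """Return the label with the higher log-probability score."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # log P(label) + sum over known words of log P(word | label)
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical labeled training messages.
spam_filter = NaiveBayesSpamFilter()
for message in ["win money now", "cheap pills buy now", "win a free prize now"]:
    spam_filter.train(message, "spam")
for message in ["meeting at noon tomorrow", "lunch tomorrow with the team", "project report attached"]:
    spam_filter.train(message, "ham")

print(spam_filter.classify("win free money"))
print(spam_filter.classify("team meeting tomorrow"))
```

An unsupervised method would instead receive the messages without labels and try to group similar ones together on its own.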

Data Analysis

• Common Data Scientist Tools: R, Weka, Octave, Scikit-Learn

• Common Data Scientist Languages: Python, Scala, F#

Resources

• https://www.coursera.org/ (Data Scientist Specialization)

• https://www.khanacademy.org/ (Math)

• http://www.osservatori.net/business_intelligence (Italian Big Data Market Analysis Resources)

• http://www.solidq.com/consulting/ (Data Science Services; Big Data / Business Intelligence / Data Warehousing)
