Boston KM Forum

Date post: 23-Feb-2016
Upload: ave
Transcript
Page 1: Boston KM Forum

Boston KM Forum

• How big data becomes actionable information
  – Tweaked version of Gilbane big data presentation
• Other Gilbane Conference impressions
  – And some open source/content management market dynamics slides
• Discussion

Page 2: Boston KM Forum

Big Data 101 Agenda

• Big data in context
• Recap
• Risks
• Recommendations

Page 3: Boston KM Forum

Big Data in Context

• What is “big data”?
  – Unhelpfully, both “big data” and “NoSQL,” generally considered a key part of the big data wave, are defined more in terms of what they aren’t than what they are
  – A typical big data definition (Wikipedia):
    • “[…] data sets that grow so large that they become awkward to work with using on-hand database management tools”
  – Often associated with Gartner’s volume, variety (and complexity), and velocity model
    • Also value and veracity considerations

Page 4: Boston KM Forum

Big Data in Context

• Why is big data a big deal now?
  – The need to deal with really big data sources, e.g., Web site logs, social network activities, and sensor network feeds
  – Commoditized hardware, software, and networking
    • Capability and price/performance curves that continue to defy all economic “laws”
    • Cloud services with radical new capability/cost equations
  – Maturation and uptake of related open source software, especially Hadoop
    • Powerful and often no- or low-cost

Page 5: Boston KM Forum

Big Data in Context

• Why is big data a big deal now (continued)?
  – Market enthusiasm for “NoSQL” systems
    • Which often simply means Hadoop
  – Useful and often “open source”/public domain data sources and services
  – Mainstreaming of semantic tools and techniques
• Overall: many things that used to be complex, expensive, and scarce
  – Are now relatively straightforward, inexpensive, and abundant

Page 6: Boston KM Forum

Big Data in Context

• Big data reality checks
  – Most decision-makers don’t want big data per se; instead, they probably want
    • Relevant, accurate, and timely answers to big questions
      – Including alerts pertaining to questions they may or may not have asked yet
    • The ability to purposefully analyze information without having to master arcane technologies
  – It’s more about the ability to formulate and ask big questions (and to effectively analyze and act on answers) than it is about related technologies

Page 7: Boston KM Forum

A Prime Minicomputer, c1982

Page 8: Boston KM Forum

Fast-Forward to 2012

Page 9: Boston KM Forum

Fast-Forward to 2012

Page 10: Boston KM Forum

Fast-Forward to 2012

Page 11: Boston KM Forum

Fast-Forward to 2012

Page 12: Boston KM Forum

Fast-Forward to 2012

Page 13: Boston KM Forum

Google BigQuery

Page 14: Boston KM Forum

Hadoop

• Hadoop is often considered central to big data
  – Originating with Google’s MapReduce architecture, Apache Hadoop is an open source architecture for distributed processing on networks of commodity hardware
  – From Wikipedia:
    • “‘Map’ step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes
    • ‘Reduce’ step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve”
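
The map and reduce steps quoted above can be illustrated with a word count, the customary introductory example. The following is a minimal single-process sketch in Python, not Hadoop code: the function names (map_step, reduce_step, word_count) and the "workers" parameter are hypothetical, and the "worker nodes" are simulated by splitting a list rather than by distributing work across machines.

```python
from collections import defaultdict
from itertools import chain


def map_step(chunk):
    """Worker: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_step(pairs):
    """Combine the workers' intermediate pairs into per-word totals."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines, workers=2):
    # "Map": divide the input into sub-problems, one per simulated worker.
    chunks = [" ".join(lines[i::workers]) for i in range(workers)]
    mapped = [map_step(chunk) for chunk in chunks]
    # "Reduce": collect the sub-answers and combine them into the output.
    return reduce_step(chain.from_iterable(mapped))


counts = word_count(["big data big questions", "big answers"])
print(counts)
```

In a real cluster the map calls would run in parallel on separate nodes, and the framework would group the intermediate pairs by key before reducing; this sketch keeps only the divide-then-combine shape.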

Page 15: Boston KM Forum

• Hadoop commercial application domains (from Wikipedia) include
  – Log and/or clickstream analysis of various kinds
  – Marketing analytics
  – Machine learning and/or sophisticated data mining
  – Image processing
  – Processing of XML messages
  – Web crawling and/or text processing
  – General archiving, including of relational/tabular data, e.g. for compliance

Page 16: Boston KM Forum

Hadoop

• Hadoop is popular and rapidly evolving
  – Most leading information management vendors have embraced Hadoop
  – There is now a Hadoop ecosystem

Page 17: Boston KM Forum

Meanwhile, Back in the Googleplex

• Dremel, BigQuery, Spanner, and other really big data projects

Page 18: Boston KM Forum

Meanwhile, Back in the Googleplex

Page 19: Boston KM Forum

Google Now

Page 20: Boston KM Forum

A NoSQL Taxonomy

• From the NoSQL Wikipedia article:

Page 21: Boston KM Forum

A View of the NoSQL Landscape

Page 22: Boston KM Forum

Another NoSQL Landscape View

Page 23: Boston KM Forum

NoSQL Perspectives

• The “NoSQL” meme confusingly conflates
  – Document database requirements
    • Best served by XML DBMS (XDBMS)
  – Physical database model decisions on which only DBAs and systems architects should focus
    • And which are more complementary than competitive with DBMS
  – Object databases, which have floundered for decades
    • But with which some application developers are nonetheless enamored, for minimized “impedance mismatch,” despite significant information management compromises
  – Semantic (e.g., RDF) models
    • Also more complementary than competitive with RDBMS/XDBMS
• Also consider: the “traditional” DBMS players can leverage the same underlying technology power curves

Page 24: Boston KM Forum

Modeling Abstractions

• Conceptual
  – Resources: Documents and links; documents focused primarily on narrative, hierarchy, and sequence
  – Relations: Entities, attributes, relationships, and identifiers
• Logical
  – Resources: Model: hypertext; Language: XQuery (ideally…)
  – Relations: Model: extended relational; Language: SQL
• Physical
  – Resources and Relations: Indexing (e.g., scalar data types, XML, and full-text), locking and isolation levels (for transactions), federation, replication/synchronization, in-memory databases, columnar storage, table spaces, caching, and more
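
The “Relations” column above (entities, attributes, relationships, and identifiers at the conceptual level, SQL at the logical level) can be made concrete with a small sketch using Python's built-in sqlite3 module. The table and column names and the sample rows are hypothetical, chosen only for illustration; the “Resources” column is not shown because the Python standard library has no XQuery engine.

```python
import sqlite3

# In-memory database: physical-level choices (indexing, caching, storage)
# are left to the engine, which is the point of the layered model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity with an identifier (primary key) and one attribute.
    CREATE TABLE speaker (
        speaker_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- Second entity; the foreign key expresses the relationship.
    CREATE TABLE talk (
        talk_id    INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        speaker_id INTEGER NOT NULL REFERENCES speaker(speaker_id)
    );
    INSERT INTO speaker VALUES (1, 'Example Speaker');
    INSERT INTO talk VALUES (10, 'Big Data 101', 1);
""")

# Logical-level query: follow the relationship with a join.
rows = conn.execute("""
    SELECT s.name, t.title
    FROM talk AS t
    JOIN speaker AS s ON s.speaker_id = t.speaker_id
""").fetchall()
print(rows)
```

The query names only entities and relationships; nothing in it depends on how the rows are stored or indexed.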

Page 25: Boston KM Forum

Data as a Service

• The (single source of) truth is out there?...
  – High-quality data sources are being commoditized
  – Value is shifting to the ability to discern and leverage conceptual connections, not just to manage big databases
• Some resources and developments to explore
  – Social networking graphs and activities
  – Data.com (Salesforce.com)
  – Data.gov
  – Google Knowledge Graph
  – Linked Data
  – Microsoft Windows Azure Data Marketplace
  – Wikidata.org
  – Wolfram Alpha

Page 26: Boston KM Forum

Mainstreaming Semantics

• Tools and techniques applied in search of more meaning, e.g.,
  – Vocabulary management
  – Disambiguation and auto-categorization
  – Text mining and analysis
  – Context and relationship analysis
• It’s still ideal to help people capture and apply data and metadata in context
  – Semantic tools/techniques are complementary

Page 27: Boston KM Forum

Mainstreaming Semantics

• The Semantic Web is still more vision than reality
  – But Google, Microsoft, Yahoo, and Yandex, for example, are improving Web searches by capturing and applying more metadata and relationships via schema.org schemas in Web pages
  – And Google’s Knowledge Graph is about “things, not strings,” with, as of mid-2012, “500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects”
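
The “things, not strings” idea can be sketched as a tiny store of subject/predicate/object facts, in the spirit of RDF-style semantic models. This is an illustrative toy, not Google's Knowledge Graph or schema.org markup: the identifiers (thing:boston and so on), the predicate names, and the helper functions are all hypothetical.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate obj")

# Hypothetical facts; the identifiers are made up for illustration.
FACTS = [
    Triple("thing:boston", "type", "City"),
    Triple("thing:boston", "name", "Boston"),
    Triple("thing:boston", "locatedIn", "thing:massachusetts"),
    Triple("thing:massachusetts", "type", "State"),
    Triple("thing:massachusetts", "name", "Massachusetts"),
]


def match(facts, subject=None, predicate=None, obj=None):
    """Return the facts matching every field that is not None (None = wildcard)."""
    return [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (predicate is None or f.predicate == predicate)
        and (obj is None or f.obj == obj)
    ]


def located_in(facts, name):
    """Resolve a name string to things, then follow 'locatedIn' relationships."""
    results = []
    for hit in match(facts, predicate="name", obj=name):
        for edge in match(facts, subject=hit.subject, predicate="locatedIn"):
            results += [n.obj for n in match(facts, subject=edge.obj, predicate="name")]
    return results


print(located_in(FACTS, "Boston"))
```

The string “Boston” is used only once, to find the thing it names; everything after that follows relationships between identified objects rather than comparing text.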

Page 28: Boston KM Forum

Recap

• Commoditization and cloud
  – Very significant new opportunities
• Hadoop and related frameworks
  – Complementary to RDBMS and XDBMS
• NoSQL
  – Likely headed for meme-bust…
• Data services
  – Game-changing potential
• Semantic tools and techniques
  – Rapidly gaining momentum

Page 29: Boston KM Forum

Risks

• The potential for an ever-expanding set of information silos
  – Focus on minimized redundancy and optimized integration
• GIGO (garbage in, garbage out) at super-scale
  – New opportunities for unprecedented self-inflicted damage, for organizations that don’t model or query effectively
• Cognitive overreach
  – The potential for information workers to create and act on nonsensical queries based on poorly designed and/or misunderstood information models
• Skills gaps can create competitive disadvantages
  – Modeling, query formulation, and data analysis
  – Critical thinking and information literacy

Page 30: Boston KM Forum

Recommendations

• Aim high: big data is in many respects just getting started…
  – A lot of technology recycling but also significant and disruptive innovation
• Work to build consensus among stakeholders on the opportunities and risks
• Focus on human skills – e.g., critical thinking and information literacy
  – For now, an instance of the most creative and powerful type of semantic big data processor we know of is between your ears

[End of tweaked Gilbane presentation]

Page 31: Boston KM Forum

Gilbane 2012 Impressions

• The big themes
  – Cloud
  – Social
  – Mobile
  – Big data
  – Web
• Other recurring themes
  – Open source: enterprise-ready for many domains

Page 32: Boston KM Forum

Gilbane 2012 Impressions

• Projections
  – Consolidation ahead for W*M and ECM vendors
    • Likely to be accelerated by market uptake of native XML information management systems
      – And rediscovery of the utility of modern DBMSs
        » Along with SQL/XML (e.g., XQuery) synergy
  – Cloud as accelerator
    • Ridiculously low entry cost and complexity, relative to earlier on-premises alternatives
    • Tipping point with other shifts to cloud, e.g., for social, CRM/SFA, and public data sources

Page 33: Boston KM Forum

Gilbane 2012 Impressions

• Projections
  – New challenges and opportunities for IT groups
    • Potential to derive unprecedented value from both existing and new information resources
    • Transition systems to “the cloud”
      – With or without IT assistance…
  – Blurring boundaries
    • Application, document, page…
    • Ability to apply and capture data and metadata in context, e.g., activity streams

Page 34: Boston KM Forum

Gilbane 2012 Impressions

• Projections
  – The next critical IT scarcity is not about technology
    • It is instead the number of people who can
      – Think critically and structure problems/scenarios
      – Understand and apply conceptual models
      – Formulate queries and objectively analyze results
        » And generally get into an event/action routine, for work and personal activities
  – Growing awareness of the critical need for information responsibility
    • Producer: information quality, integrity, context…
    • Consumer: information literacy; critical and purposeful thinking

Page 35: Boston KM Forum

Reference Slides

• Content management + open source
• Hypertext

Page 36: Boston KM Forum

Open source examples

Page 37: Boston KM Forum

Open source examples

Page 38: Boston KM Forum

Open source examples

Page 39: Boston KM Forum

Open source examples

Page 40: Boston KM Forum

Hypertext

• Criteria from a 2006 Burton Group report:
  – A content model based on collections of information items and links
  – Pervasive support for info item labels
  – Typed and bidirectional info item relationships
  – A means of creating, organizing, and sharing info item collections
  – Journaling (tracking info item changes)
  – Robust access control privilege management
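
Most of the criteria above describe a data structure, so a rough sketch may help. The Python below is one possible reading, not the Burton Group's model or any product's API: the class names (Item, Collection), the method names, and the sample slide identifiers are hypothetical, and access control is deliberately left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """An information item with free-form labels."""
    item_id: str
    body: str
    labels: set = field(default_factory=set)


class Collection:
    """A named set of items plus typed links and a change journal."""

    def __init__(self, name):
        self.name = name
        self.items = {}
        self.links = []    # (source_id, link_type, target_id)
        self.journal = []  # (action, detail) entries, oldest first

    def add(self, item):
        self.items[item.item_id] = item
        self.journal.append(("add", item.item_id))

    def label(self, item_id, text):
        self.items[item_id].labels.add(text)
        self.journal.append(("label", f"{item_id}:{text}"))

    def link(self, source_id, link_type, target_id):
        self.links.append((source_id, link_type, target_id))
        self.journal.append(("link", f"{source_id}-{link_type}->{target_id}"))

    def outgoing(self, item_id):
        return [(kind, dst) for src, kind, dst in self.links if src == item_id]

    def incoming(self, item_id):
        # Bidirectional: a link can be followed from its target back to its source.
        return [(kind, src) for src, kind, dst in self.links if dst == item_id]


notes = Collection("km-forum")
notes.add(Item("s14", "Hadoop overview"))
notes.add(Item("s28", "Recap"))
notes.label("s14", "hadoop")
notes.link("s28", "summarizes", "s14")
print(notes.incoming("s14"))
```

Storing each link once and querying it from either end is what makes the relationships bidirectional; the journal records every change in order so that item history can be reconstructed.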

Page 41: Boston KM Forum

Discussion

[email protected]

