The Next Generation SharePoint: Powered by Text Analytics

Date post: 11-Nov-2014
Upload: alyona-medelyan
Transcript
Page 1: The Next Generation SharePoint: Powered by Text Analytics

PLATINUM SPONSOR

GOLD SPONSORS

THE NEXT-GENERATION SHAREPOINT:

POWERED BY TEXT ANALYTICS

Alyona Medelyan (Pingar) @zelandiya

Page 2: The Next Generation SharePoint: Powered by Text Analytics

AGENDA

• Information tasks
• Text analytics
• APIs
• Demos
• Conclusions

Page 3: The Next Generation SharePoint: Powered by Text Analytics

Information tasks
What do they cost us?
How does SharePoint help?

Page 4: The Next Generation SharePoint: Powered by Text Analytics

Avg. hours per week spent on information tasks (bar chart):

Emails: 14.5
Creating docs: 13.3
Analyzing info: 9.6
Searching: 9.5
Reviewing: 8.8
Gathering info: 8.3
Organizing docs: 6.8
Creating presentations: 6.7
Creating images: 5.6
Data entry: 5.6
Doc approval: 4.3
Publishing: 4.2
Translating: 1

= $37K per year per person

Source: IDC, Hidden Cost of Information (2005)

Page 5: The Next Generation SharePoint: Powered by Text Analytics

Information tasks (chart labels): Emails, Creating docs, Analyzing info, Searching, Reviewing, Gathering info, Organizing docs, Creating presentations, Creating images, Data entry, Doc approval, Publishing, Translating

SHAREPOINT SAVES TIME

• Interact with SP from Outlook
• Create docs collaboratively
• Customize search configuration
• Define Managed Metadata
• Configure forms
• Design Workflow
• Use sites, sets & libraries

Page 6: The Next Generation SharePoint: Powered by Text Analytics

Text Analytics
What is it and how does it work?
What tasks does it solve?

Page 7: The Next Generation SharePoint: Powered by Text Analytics

WHAT IS TEXT ANALYTICS?

Text Mining, Natural Language Processing

unstructured data

Opinion Mining, Business Intelligence, Document Organization, Data Extraction, Search

Machine Learning, Text Processing, Statistics, Linguistics

Page 8: The Next Generation SharePoint: Powered by Text Analytics

Information tasks (chart labels): Emails, Creating docs, Analyzing info, Searching, Reviewing, Gathering info, Organizing docs, Creating presentations, Creating images, Data entry, Doc approval, Publishing, Translating

TEXT ANALYTICS SAVES MORE TIME

• Compose search reports
• Extract entities
• Redact
• Generate metadata
• Fill databases
• Cluster search results
• Summarize
• Mine opinions & sentiment
• Profanity check

… automatically

Page 9: The Next Generation SharePoint: Powered by Text Analytics

Text Analytics Software
What companies offer text analytics?
What are open source tools like?

Page 10: The Next Generation SharePoint: Powered by Text Analytics

TEXT ANALYTICS: GLOBAL PERSPECTIVE

User adoption grew by 25% in 2010, creating an $835 million market, because:

• Unstructured data keeps growing (e.g. social media), so text analytics is needed

• Text analytics is central to effective information access

• Many successes in NLP: IBM Watson, Wolfram Alpha

Full report by Seth Grimes: http://altaplana.com/TA2011

Page 11: The Next Generation SharePoint: Powered by Text Analytics

APPLICATIONS OF TEXT ANALYTICS

(Bar chart: share of respondents per application area)

Law enforcement: 6%
Military intelligence: 7%
Insurance & fraud: 8%
Content management: 8%
Other: 9%
Finance: 10%
Online commerce: 11%
Product design: 15%
Life sciences: 15%
E-discovery: 15%
Customer service: 26%
Competitive intelligence: 33%
Research: 36%
Brand management: 39%
Customer experience management: 39%
Search & info access: 39%

Source: http://altaplana.com/TA2011

Page 12: The Next Generation SharePoint: Powered by Text Analytics

SEARCH & INFO ACCESS: METADATA EXTRACTION

Document

Easy to extract: File type, name & location, creation & modification date, authors

Difficult to extract: Keywords, people & companies mentioned, suppliers & addresses mentioned

Metadata
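A minimal sketch of the "easy to extract" side, assuming the document sits on a local file system rather than in SharePoint. The function name `basic_file_metadata` is hypothetical. Authors are left out because they usually live inside the document format, and a creation date is omitted because file systems report it inconsistently across platforms.

```python
from datetime import datetime, timezone
from pathlib import Path


def basic_file_metadata(path):
    """Collect metadata that the file system already knows about a document."""
    p = Path(path)
    info = p.stat()
    return {
        "name": p.name,
        # File type taken from the extension, e.g. "docx" for Report.DOCX.
        "file_type": p.suffix.lstrip(".").lower(),
        "location": str(p.resolve().parent),
        # Modification time as a timezone-aware UTC timestamp.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }
```

None of the "difficult to extract" fields can be read this way: keywords and mentioned people or companies require looking at the text itself.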

Page 13: The Next Generation SharePoint: Powered by Text Analytics

SEARCH & INFO ACCESS: KEYWORD EXTRACTION

Document → Candidates → Keywords

Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.

Page 14: The Next Generation SharePoint: Powered by Text Analytics

SEARCH & INFO ACCESS: KEYWORD EXTRACTION

Document → Candidates → Keywords

Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.

Page 15: The Next Generation SharePoint: Powered by Text Analytics

SEARCH & INFO ACCESS: KEYWORD EXTRACTION

Document → Candidates → Properties → Keywords

Properties: Frequency, Position, Corpus stats, Relatedness

Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.

Page 16: The Next Generation SharePoint: Powered by Text Analytics

SEARCH & INFO ACCESS: KEYWORD EXTRACTION

Document → Candidates → Properties → Scoring → Keywords

Scoring: Heuristic scoring, Machine learning

Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
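The steps on the keyword extraction slides (candidates, properties, scoring) can be sketched roughly as follows. This is an illustration under stated assumptions, not Pingar's or any vendor's algorithm: the stopword list, the n-gram candidate rule and the scoring formula are all made up for the example, and only two of the four listed properties (frequency and position) are used, with a heuristic score instead of machine learning.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {
    "a", "an", "the", "and", "of", "to", "is", "has", "have", "this", "that",
    "as", "if", "you", "any", "also", "most", "please", "hi", "all",
}


def candidate_phrases(text, max_words=3):
    """Candidates: word n-grams that neither start nor end with a stopword."""
    found = []
    offset = 0  # running word position in the document
    for chunk in re.split(r"[.,;:!?()]|--", text.lower()):
        words = re.findall(r"[a-z]+", chunk)
        for i in range(len(words)):
            for n in range(1, max_words + 1):
                gram = words[i:i + n]
                if len(gram) < n:
                    break  # ran past the end of this chunk
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                found.append((" ".join(gram), offset + i))
        offset += len(words)
    return found


def extract_keywords(text, top_n=3):
    """Properties (frequency, first position) combined by a heuristic score."""
    cands = candidate_phrases(text)
    if not cands:
        return []
    doc_len = max(pos for _, pos in cands) + 1
    frequency = Counter(phrase for phrase, _ in cands)
    first_pos = {}
    for phrase, pos in cands:
        first_pos.setdefault(phrase, pos)

    def score(phrase):
        # Frequent phrases that appear early in the document score higher.
        return frequency[phrase] * (1.0 - first_pos[phrase] / doc_len)

    return sorted(frequency, key=lambda p: (-score(p), p))[:top_n]


EMAIL = (
    "Hi All, As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward heat "
    "rate charts. Also, notice that the interface looks different -- this "
    "reflects and accommodates the new features. If you have any questions "
    "regarding this new version of MetaStock, please contact Bella Santuri."
)

print(extract_keywords(EMAIL))
```

On this email the generic word "new" ranks first and "metastock" second. Frequency and position alone cannot tell a generic word from a topical one, which is the gap the corpus statistics and relatedness properties are meant to close.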

Page 17: The Next Generation SharePoint: Powered by Text Analytics

SEARCH & INFO ACCESS: NAMES EXTRACTION

Document → Names

If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.

Examples: Training data (annotations)

Properties: NLP, Heuristics, Text mining

Learning: Machine Learning
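To see why name extraction is usually learned from annotated training data rather than hand-written, here is a deliberately naive heuristic, not the approach the slide describes: treat any run of two or more capitalized words as a name. The pattern and the function name `extract_name_candidates` are assumptions made for this sketch.

```python
import re

# Two or more consecutive words that start with a capital letter.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b")


def extract_name_candidates(text):
    """Return capitalized word sequences as rough person-name candidates."""
    return NAME_PATTERN.findall(text)


sentence = (
    "If you have any questions regarding this new version of MetaStock, "
    "please contact Bella Santuri."
)
print(extract_name_candidates(sentence))  # ['Bella Santuri']

greeting = "Hi All, As of today, MetaStock has several new functions."
print(extract_name_candidates(greeting))  # ['Hi All']
```

The heuristic finds "Bella Santuri" but also reports the greeting "Hi All" as a name, and it misses single-word names altogether. Properties learned from annotated examples are meant to handle such cases.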

Page 18: The Next Generation SharePoint: Powered by Text Analytics

<SEARCH + TEXT ANALYTICS> COMPANIES

Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv

Page 19: The Next Generation SharePoint: Powered by Text Analytics

BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS

Documents, Reviews, Tweets, Surveys → Sentiment Analysis → Visualization, Summary

Naïve approach: Sentiment-words dictionary!

Negative: suck, terrible, awful

Positive: fantastic, excellent, awesome

BUT: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."

No sentiment words!
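The naïve dictionary approach and its failure on the fragrance review can be reproduced in a few lines. The word lists are the ones on the slide; the function name `dictionary_sentiment` and the "positive/negative/neutral" labels are assumptions for this sketch.

```python
import re

# Word lists taken from the slide.
POSITIVE = {"fantastic", "excellent", "awesome"}
NEGATIVE = {"suck", "terrible", "awful"}


def dictionary_sentiment(text):
    """Count dictionary hits and label the text by the sign of the difference."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = (
    "If you are reading this because it is your darling fragrance, "
    "please wear it at home exclusively, and tape the windows shut."
)
print(dictionary_sentiment("The new interface is awesome"))  # positive
print(dictionary_sentiment(review))                          # neutral
```

The review is clearly negative to a human reader, yet it contains none of the dictionary words, so the lookup returns "neutral".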

Page 20: The Next Generation SharePoint: Powered by Text Analytics

BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS

Documents, Reviews, Tweets, Surveys

Visualization, Summary

Examples: Training data (annotations)

Properties: Presence, Position, Part-of-Speech, Negation

Generalization, Lexicon induction

Learning: Machine Learning

Important:
• Identifying sentiment-bearing sentences
• Attaching sentiment to a topic!

Page 21: The Next Generation SharePoint: Powered by Text Analytics

SENTIMENT ANALYSIS COMPANIES

Attensity, AlchemyAPI, Lexalytics, Saplo, Medallia, SAS

Page 22: The Next Generation SharePoint: Powered by Text Analytics

RESEARCH: TEXT SUMMARIZATION

Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.

Sentence roles: Address, Announcement, Details, More details, Conclusion

Extractive summary: As of today, MetaStock has several new functions.

Sentence compression: MetaStock has several new functions. The new interface looks different.

Abstractive summary: MetaStock has new features and a new interface.
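Of the three summary types, the extractive one is the simplest to sketch: score each sentence and return the best ones unchanged. The version below scores a sentence by the average document frequency of its content words. That scoring rule, the stopword list and the function names are assumptions for illustration, not the method of any product named in this deck.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {
    "a", "an", "the", "and", "of", "to", "is", "has", "have", "this", "that",
    "as", "if", "you", "any", "also", "most", "please", "hi", "all",
}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


EMAIL = (
    "Hi All, As of today, MetaStock has several new functions. "
    "The most important new feature is the ability to display forward heat "
    "rate charts. Also, notice that the interface looks different -- this "
    "reflects and accommodates the new features. If you have any questions "
    "regarding this new version of MetaStock, please contact Bella Santuri."
)

print(extractive_summary(EMAIL))
```

With these assumptions the announcement sentence is selected, greeting included. Trimming it down to "MetaStock has several new functions." is the separate sentence compression step.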

Page 23: The Next Generation SharePoint: Powered by Text Analytics

TEXT SUMMARIZATION COMPANIES

Lexalytics, Pingar

Page 24: The Next Generation SharePoint: Powered by Text Analytics

COMPETITIVE INTELLIGENCE: ENTITY & ENTITY RELATION EXTRACTION

Companies: OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta

Page 25: The Next Generation SharePoint: Powered by Text Analytics

FRAUD INVESTIGATION: NORMALIZATION OF DATES & NAMES

Companies: Cicero, BasisTech

Page 26: The Next Generation SharePoint: Powered by Text Analytics

OPEN-SOURCE TOOLS

• NLTK – Apache license, Book, Python & academic datasets, nltk.org

• LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segmentation, alias-i.com/lingpipe

• OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp

• GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk

• Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu

Page 27: The Next Generation SharePoint: Powered by Text Analytics

APIs
What’s an API and how does it work?
What are the advantages of the API model?
Which API is the right one for you?

Page 28: The Next Generation SharePoint: Powered by Text Analytics

API ENGINE

API ACCESS

Developer creates an application

Software engine solves a specific task

An interface that ensures communication

• calls via a web service
• includes API authentication
• a call is an XML message describing the request
• a protocol specifies how XML needs to be encoded:
  • SOAP
  • REST

SDK, usage examples
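To make "a call is an XML message describing the request" concrete, here is a sketch that builds a SOAP 1.1 envelope with Python's standard library. Only the envelope namespace is standard. The service namespace, the `ExtractKeywords` operation and the `apiKey` and `text` parameters are hypothetical and stand in for whatever a real text analytics service defines in its WSDL.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for this illustration.
SERVICE_NS = "http://example.com/textanalytics"


def build_soap_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_soap_request(
    "ExtractKeywords",  # hypothetical operation name
    {"apiKey": "demo-key", "text": "MetaStock has several new functions."},
)
print(message)
```

An SDK typically hides this step: it generates the envelope, sends it over HTTP and parses the XML reply into objects.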

Page 29: The Next Generation SharePoint: Powered by Text Analytics

REST API ACCESS FROM A BROWSER

API request: http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration

API response
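The request URL on this slide is a base address plus URL-encoded query parameters, so it can be assembled programmatically. The sketch below only builds the URL and does not send it, since it makes no assumption that the endpoint from the slide is still available. The helper name `build_rest_request` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"


def build_rest_request(base_url, **params):
    """Append URL-encoded query parameters to the service address."""
    return f"{base_url}?{urlencode(params)}"


url = build_rest_request(
    BASE_URL,
    appid="YahooDemo",  # the API key identifies the calling application
    query="madonna",
    context=(
        "Italian sculptors and painters of the renaissance "
        "favored the Virgin Mary for inspiration"
    ),
)
print(url)
```

`urlencode` replaces the spaces in the `context` value with `+`, which reproduces the request shown on the slide.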

Page 30: The Next Generation SharePoint: Powered by Text Analytics

SOAP API ACCESS FROM VS2010

Page 31: The Next Generation SharePoint: Powered by Text Analytics

SOAP API ACCESS IN POWERSHELL

Read complete blog post “Bulk metadata extraction in SharePoint”: http://bit.ly/powershell-migrate

Page 32: The Next Generation SharePoint: Powered by Text Analytics

API = EASY INTEGRATION & FLEXIBILITY

• Integrate into existing architecture via any programming language
• Improve known flaws in the current system/process
• Minimize adoption barriers within the company: little or no training required for staff
• Only pay for the features you need
• Flexible deployment:
  • Host API on site = Secure data exchange
  • Access the API in the cloud = Save on tech support & hardware

Page 33: The Next Generation SharePoint: Powered by Text Analytics

WHICH API IS BEST FOR YOU?

I need to take some text and get a list of the important entities/keywords/phrases.

Blog post on API comparison: faganm.com/blog

APIs compared: Y: Term Extractor, OpenCalais, BeliefNetworks, OpenAmplify, AlchemyAPI, Evri

Comparison criteria: API restrictions, Supported languages, Quality of results, Semantic links, Synonyms/Duplicates

1st, 2nd

Page 34: The Next Generation SharePoint: Powered by Text Analytics

HOW TO CHOOSE AN API:

• Define a specific task
• Think of what features are important
• Get prepared:
  • Subscribe for API keys
  • Get SDKs
  • Learn libraries
• Find representative data
• Build a test framework
• Compare results
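The last two steps, building a test framework and comparing results, could look roughly like this for a keyword extraction task. Precision, recall and F1 against a hand-made reference list are one common choice, not a method named in the slides. The API names `api_a` and `api_b`, their outputs and the reference keywords are invented for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Compare one API's keywords against a reference list, ignoring case."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rank_apis(outputs, gold):
    """Rank APIs by F1, best first. `outputs` maps API name to keyword list."""
    scores = {name: precision_recall_f1(kws, gold)[2] for name, kws in outputs.items()}
    return sorted(scores.items(), key=lambda item: -item[1])


# Hypothetical reference keywords and API outputs for one test document.
gold = ["MetaStock", "heat rate charts", "interface"]
outputs = {
    "api_a": ["MetaStock", "interface", "new"],
    "api_b": ["Bella Santuri", "MetaStock"],
}
for name, f1 in rank_apis(outputs, gold):
    print(f"{name}: F1 = {f1:.2f}")
```

In practice the same comparison would run over a set of representative documents, with scores averaged per API.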

Page 35: The Next Generation SharePoint: Powered by Text Analytics

METADATA EXTRACTION IN SHAREPOINT

Demo: Pingar’s add-on for SharePoint 2010, built using a text analytics API

Page 36: The Next Generation SharePoint: Powered by Text Analytics

INTEGRATING APIS INTO SCANNING

Video: Using Fuji Xerox SmartConnect and Pingar API to scan documents in batch into SharePoint

http://www.youtube.com/watch?v=kluVp25upag

Page 38: The Next Generation SharePoint: Powered by Text Analytics

THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS

• What can be automated?
  • Metadata extraction, Data entry, Opinion mining, Sanitization, Doc approval, Summarization, …
• How to integrate text analytics into existing SharePoint applications?
  • Easy! Via an API
• How to find the right text analytics API?
  • Review what’s available
  • Set up an experiment
  • Compare results

Page 39: The Next Generation SharePoint: Powered by Text Analytics

Thank you to all of our Sponsors

PLATINUM SPONSOR

SILVER SPONSORS

GOLD SPONSORS

BRONZE SPONSORS

