Date post: | 27-Jan-2015 |
Category: | Technology |
Upload: | rajatkr |
Building intelligence through semantics
Text Analytics
Ontology Building
Context Analysis
Sentiment Analysis
Machine Learning
About Veda
Who we are
• A semantic technology service provider leveraging its capabilities to provide standardized and bespoke solutions
Formation and background
• Started as a JV with the Fraunhofer Institute, Germany
• Earlier part of 3i Infotech, a large listed IT firm. Acquired by the current promoters as part of a management buyout
Location
• Headquartered in Bangalore, India’s software capital, with ready access to critical talent
Team
• Currently a 20 member team, with a sales presence in Chicago, USA. Key members of the technology team each have over a decade’s experience in semantic technology
Awards and references
• One of 5 companies worldwide named as Semantic Application Specialists by Gartner (Who’s Who of Text Analytics, September 2012)
Enterprise Information Distribution
Unstructured Data (~70%):
• Consists of textual information like contracts, emails, presentations
• About 70% of organizations’ information remains in an unstructured form, and hence is not utilized at all
Structured Data (~30%):
• Consists of information from ERP, CRM systems, XML data
• It is organized and manageable
• Currently only 30% of organizations’ information is analysed for decision making
Are we using only structured data for decision making? What are the critical misses that are made as a result?
What is hidden in unstructured data
Examples of unstructured data
• Customer complaints
• Employee feedback
• Brand perception
• Financial data from reports
• Competitive news
• And many, many more…
What it contains
• Information
• Facts
• Events etc.
• Insights
• Opportunities
• Risks
• Just the things needed for good decision making!
Semantics – making sense of unstructured data
• Semantics is the study of meaning. It focuses on the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for: their denotation. [Wikipedia]
• SEMANTICS = MEANING
• It is about describing things
• In linguistics, semantics is the subfield devoted to the study of meaning as inherent at the levels of words, phrases, sentences, and larger units of discourse.
Industry Overview - Need for Semantic Technology
Information overload
• Heterogeneous
• Distributed
• Unorganized
High data volumes
• Increasing numbers
• Increasing sources
• Unmanageable
Inefficient retrieval
• Keyword search is inefficient
• Lack of classification and relevance
• Focus on “Search” rather than “Find”
The definition of ‘Data’, which had been artificially restricted to only numerical data, can now extend to text and other unstructured data as well…
…Providing more insights and richness for decision making
Top 9 Technology Trends Likely to Impact Information Management in 2013
Technology Trend
Big Data
Modern information infrastructure
Semantic technologies
The logical data warehouse
NoSQL DBMSs
In-memory computing
Chief data officer and other information-centric roles
Information stewardship applications
Information valuation / infonomics
Source: Gartner
Broadly, text based offerings can be clubbed under two main heads
Statistical text mining
• Looks for documents based on statistical techniques
• Helps identify high frequency terms or expressions
• Identifies other terms being used in conjunction with them
• Assigns match probability to documents based on mathematical techniques to facilitate searches and knowledge management
• Accuracy could be improved further by using machine learning principles
• Primary applications: Text mining and document matching (e.g. VoC analysis, email analysis, E Discovery, etc.)
Natural language processing
• Parses a sentence to identify the nature of the words in it
• More relevant for sentence level analysis as opposed to document level analysis
• Principles of English, as opposed to statistical techniques, take precedence in analysis
• Accuracy dependent on the strength of the algorithms written
• Primary applications: Named Entity Extraction (knowledge management), sentiment analysis (VoC analysis, email monitoring, etc.)
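The statistical side of this comparison can be illustrated with a minimal Python sketch. It is not Veda's implementation: the function names, the sample documents and the use of cosine similarity as the document match score are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(docs):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurring_terms(docs, term):
    """Count the terms that appear in the same document as `term`."""
    counts = Counter()
    for doc in docs:
        tokens = set(tokenize(doc))
        if term in tokens:
            counts.update(tokens - {term})
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words count vectors (0.0 to 1.0)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


# Made-up customer comments used only to exercise the functions.
docs = [
    "The delivery was late and the packaging was damaged",
    "Late delivery again, very poor service",
    "Great price and quick delivery",
]
```

Here `term_frequencies(docs)` surfaces high-frequency terms, `cooccurring_terms(docs, "late")` shows what is mentioned alongside a term, and `cosine_similarity` gives a crude match score between two documents. None of these look at sentence structure, which is the gap the NLP approach addresses.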
Industry Overview – usual application areas
Marketing
• Areas: Social media analytics; better advertising placement; CRM information capture and action
• Technique used: Sentiment analysis using NLP, coupled with vertical specific taxonomies
Compliance
• Areas: E Discovery; auto classification; forensic analysis
• Technique used: Statistical text mining; Named Entity Recognition (NER); machine learning
Risk analysis, Fraud detection
• Areas: Pattern analysis; predictive modelling
• Technique used: Statistical text mining; Named Entity Recognition, coupled with structured data (e.g. frequency of mails, department information, etc.)
Knowledge Management
• Areas: Auto tagging and classification; discovery (e.g. healthcare information sharing)
• Technique used: NER (for named entities); statistical text mining; custom ontologies / semantic networks
Vertical specific use cases
• Areas: Examples: financial services, publishing, pharma, healthcare, legal, insurance, etc.
• Technique used: Various degrees of text mining, NLP and sentiment analysis, and entity extraction techniques
But purely from an R&D perspective, quality thresholds have a very high standard deviation
NLP
• Attaching sentiment to attribute, and attribute to object
• Handling basic keywords (e.g. “I like something” vs. “something is like another”)
• Vertical taxonomies that allow aggregation
• Vertical specific sentiment words (e.g. executing a man vs. executing a transaction, high fuel economy vs. high fuel consumption)
eDiscovery
• High variability in recall and precision rates
• Tagging of concepts remains difficult
• Summarization techniques based on basic lexical parsing
Ontology
• Limited use cases
• Often seen as multi-year projects as opposed to quick-win areas
The reason for the quality difference is that, many times, the client context is not fully understood and the software is not trained on that context
• What is the primary purpose for which the tool will be used: finding trends, better search, forensics, fraud prevention, building predictive models, etc.
• Are certain terms so common that they must be ignored while doing an analysis
• Are there domain specific words that attain a different meaning than in other domains (eg ‘execution’ has a different meaning in financial services than in the news domain)
• Should weightages assigned to certain kinds of documents / words be increased to improve relevance
• How will the results be presented – are they to be shown visually and not be connected to other enterprise systems, or should they be an integrated part of the overall BI roadmap of an organization
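Several of these questions translate directly into configuration. The sketch below is a hypothetical illustration: `AnalysisContext`, its fields and the sample weights are invented names and values. It shows how client-specific stop terms and term weightings could change which terms an analysis surfaces.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AnalysisContext:
    """Client-specific settings; every value used below is illustrative."""
    stop_terms: set = field(default_factory=set)      # terms too common to be informative
    term_weights: dict = field(default_factory=dict)  # boosts for terms that matter to the client
    default_weight: float = 1.0


def weighted_term_scores(text, context):
    """Score each remaining term as frequency times its client-specific weight."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in context.stop_terms)
    return {
        term: n * context.term_weights.get(term, context.default_weight)
        for term, n in counts.items()
    }


# A made-up context for a bank: "bank" itself is too common to be useful,
# while "execution" and "delay" are weighted up.
bank_context = AnalysisContext(
    stop_terms={"the", "a", "was", "of", "bank"},
    term_weights={"execution": 3.0, "delay": 2.0},
)
scores = weighted_term_scores(
    "The execution of the trade was a delay, a delay the bank regrets",
    bank_context,
)
```

The same text scored under a different `AnalysisContext` would rank terms differently, which is the practical meaning of context dependence here.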
Unlike traditional systems, text analytics has a large dependency on context. Consequently, in order to unleash its full potential, the usual bifurcation between consultancy, software development and software implementation must disappear in the case of text analytics. An off-the-shelf product approach will definitely not help, and one must adopt a services model to better serve client needs!
In addition, there is limited focus on client needs and use cases
• Companies mostly founded and run by technology experts
• Focus on technology capability and terms as opposed to problems to be solved
Technology focused
• Leave out value to be derived by examining enterprise specific data more closely, or integrating it with structured data for greater insights
Product approach
Customer language
An example of our Natural Language Processing capabilities
“The car model looks like the old one”
“I loved the food, but the service was terrible”
“Did anyone like the car?”
“I really luuuuv it”
“The Tokyo office does not like the current prototype of the product. Bob said we should talk to them to find out why they are unhappy. Must close this ASAP to get the launch done by August 2013.”
• Can tag sentiments to attributes, and attributes to products
• Can handle difficult words, eg ‘like’ based on context – most engines cannot
• Can handle anaphora resolution (eg pronouns)
• Can handle Named Entity Recognition with high recall and precision
IP protection:
• Patent being filed for clause based sentiment extraction process
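To make the idea of tagging sentiments to attributes concrete, here is a deliberately small Python sketch. It is not the clause-based process referred to above, whose details are not given here. The word lists, the clause splitting rule and the handling of “like” are all simplifying assumptions. Questions (“Did anyone like the car?”), creative spellings (“luuuuv”) and anaphora are out of scope for it.

```python
import re

# Tiny illustrative lexicons; a real system would use far richer resources.
POSITIVE = {"loved", "love", "like", "liked", "great"}
NEGATIVE = {"terrible", "hate", "hated", "poor", "unhappy"}
ATTRIBUTES = {"food", "service", "price", "comfort", "car", "prototype"}
COMPARISON_VERBS = {"looks", "is", "was", "feels", "sounds"}  # "X looks like Y" is a comparison
NEGATORS = {"not", "never", "no"}


def clause_sentiments(sentence):
    """Return (attribute, polarity) pairs, one per clause that contains both."""
    results = []
    for clause in re.split(r"\bbut\b|\band\b|[,;]", sentence.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        polarity = 0
        for i, tok in enumerate(tokens):
            prev = tokens[i - 1] if i else ""
            if tok == "like" and prev in COMPARISON_VERBS:
                continue  # comparative use of "like" carries no sentiment
            sign = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
            if sign and prev in NEGATORS:
                sign = -sign  # "not like" flips the polarity
            polarity += sign
        attribute = next((t for t in tokens if t in ATTRIBUTES), None)
        if attribute and polarity:
            results.append((attribute, "positive" if polarity > 0 else "negative"))
    return results
```

On “I loved the food, but the service was terrible” this returns one pair per clause, and on “The car model looks like the old one” it returns nothing, because “like” follows a comparison verb.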
Our Discovery product demonstrates the NLP capability in a powerful manner, making consumer feedback actionable
• In this example about a vehicle, most people care about comfort, and luckily, the product gets mostly positive reviews in this area
• Clickthrough allows deeper dives into each category
• Though price gets mainly negative reviews, not too many people seem to talk about it. Perhaps a discount scheme could help?
• Actual sentences are displayed, and things to which the sentiments are attached are highlighted
• Sentiments are associated with specific aspects of the product
Example of Natural Language Processing in Financial Domain (continuing R&D)
Extracts economic factors that have been impacted
Recommendations and predictions help analyze complex financial information in quickest time.
Helps in predictive analytics
Linguistic rules to extract financial / economic indicators
Domain specific verbs and nouns to understand movement
Financial markets rebounded strongly in 2006's third quarter .
FINANCE ENT : Financial markets
ACTION : rebounded
TIME : 2006's third quarter
MOVEMENT : UP
By the end of the third quarter , crude oil had fallen over 20 %
from its[crude_oil] July peak , while a similar retreat in natural
gas prices produced the latest high-profile hedge fund debacle .
FINANCE ENT : crude oil
ACTION : had fallen
TIME : the end of the third quarter
QUANTITY : 20 %
MOVEMENT : DOWN
FINANCE ENT : natural gas prices
ACTION : produced the latest high-profile hedge fund debacle
MOVEMENT : DOWN
Prices of longer-dated bonds rallied too : the 10-year U. S.
Treasury bond yield fell over 60 basis points during the third
quarter .
FINANCE ENT : Prices of longer-dated bonds
ACTION : rallied
MOVEMENT : UP
FINANCE ENT : the 10-year U. S. Treasury bond yield
ACTION : fell over 60 basis points
TIME : the third quarter
QUANTITY : 60 basis points
MOVEMENT : DOWN
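A rough idea of how domain-specific verbs can drive the MOVEMENT and QUANTITY fields is sketched below. The verb lists and the quantity pattern are assumptions chosen to cover the sample sentences; the actual linguistic rules behind the output above are not described in this deck.

```python
import re

# Illustrative domain verb lists; both are assumptions, not Veda's lexicon.
UP_VERBS = {"rebounded", "rallied", "rose", "gained", "climbed"}
DOWN_VERBS = {"fell", "fallen", "dropped", "declined", "slid"}
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(%|basis points)")


def extract_movement(sentence):
    """Return ACTION, MOVEMENT and (if present) QUANTITY for the first movement verb."""
    for tok in re.findall(r"[A-Za-z]+", sentence):
        word = tok.lower()
        if word in UP_VERBS or word in DOWN_VERBS:
            record = {
                "ACTION": word,
                "MOVEMENT": "UP" if word in UP_VERBS else "DOWN",
            }
            match = QUANTITY.search(sentence)
            if match:
                record["QUANTITY"] = f"{match.group(1)} {match.group(2)}"
            return record
    return None  # no known movement verb found
```

Unlike the output shown above, this sketch does not identify the financial entity or the time expression, and it handles only one movement per sentence.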
Example of Natural Language Processing in Financial Domain – highlighting outlook by driver (continuing R&D)
As the fourth quarter begins , financial markets remain supported by
positive earnings and interest rate trends .
FINANCE ENT : financial markets
ACTION : remain supported
TIME : the fourth quarter
CAUSE : positive earnings and interest rate trends
EFFECT : financial markets remain supported
However , the pace of U. S. economic activity will slow further by
year-end as weakness in the housing and automotive sectors becomes
increasingly acute .
FINANCE ENT : the pace of U. S. economic activity
ACTION : will slow
TIME : year-end
MOVEMENT : DOWN
CAUSE : weakness in the housing and automotive sectors becomes
increasingly acute .
EFFECT : the pace of U. S. economic activity will slow year-end
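Cause and effect pairs like these can be approximated with connective patterns. The following Python sketch is an assumption-laden illustration: the connective list, the leading-clause rule and the function name are invented, and its output will not match the engine's output above word for word.

```python
import re

# Drops a sentence-initial temporal or contrastive clause such as "As ... ," or "However ,".
LEADING_CLAUSE = re.compile(r"^(?:as|however)\b[^,]*,\s*", re.IGNORECASE)

# Each pattern assumes the effect is stated before the cause.
PATTERNS = [
    re.compile(r"^(?P<effect>.+?\bsupported) by (?P<cause>.+)$", re.IGNORECASE),
    re.compile(r"^(?P<effect>.+?) (?:because of|due to|as) (?P<cause>.+)$", re.IGNORECASE),
]


def extract_cause_effect(sentence):
    """Return a dict with CAUSE and EFFECT, or None if no connective matches."""
    text = LEADING_CLAUSE.sub("", sentence.strip())
    text = text.rstrip(" .")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return {"CAUSE": match.group("cause"), "EFFECT": match.group("effect")}
    return None
```

Pattern matching of this kind is brittle: “as” is also a temporal and comparative word, which is one reason a linguistic parse is preferable to surface patterns.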
Example of Natural Language Processing in Financial Domain – extracting Cause and Effect (continuing R&D)
An example of our Enterprise capabilities
• Ontology modeling using RDF and OWL semantic web standards
• Document matching / similarity using statistical models and a concept-based approach, for patent search, knowledge management, etc.
• Information extraction using linguistic models, for fraud detection, analysis of news stories, etc.
• Demonstrated capability for patent search, legal cases, handling survey data
• Machine learning capability allows for precision to be attuned and increased for specific client situations
• Can disambiguate based on domain specific situations, e.g. execution may mean a different thing in a news domain, vs. executing a transaction in financial services domain
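The “execution” example can be sketched as a cue-word overlap check, a much simplified cousin of dictionary-based word sense disambiguation. The sense inventory, cue words and function name below are hypothetical, and the machine learning component mentioned above is not modelled.

```python
import re

# Hypothetical sense inventory: each sense lists context words that signal it.
SENSES = {
    "execution": {
        "financial_services": {"trade", "order", "transaction", "broker", "settlement"},
        "news": {"prisoner", "sentence", "court", "death", "convicted"},
    }
}


def disambiguate(word, sentence, default_domain=None):
    """Pick the sense whose cue words overlap most with the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {
        sense: len(cues & tokens) for sense, cues in SENSES.get(word, {}).items()
    }
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return default_domain  # no evidence: fall back to the client's own domain
    return best
```

The `default_domain` argument reflects the point made above: when the text itself gives no cue, the client's domain is the most sensible prior.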
Veda Text Mining capability – key features
Input
• Data input in various forms (e.g. txt, doc, etc.)
• Can accept data from public sources (e.g. Facebook, Twitter) apart from enterprise sources
Preprocessing
• Removal of junk text around emails
• Removal of small emails like “Thanks”
• Removal of forwarded emails attached to the main email from analysis
• Spell checks and autocorrects
• Language parsing for English
Processing
• Natural language and statistical processing techniques
• Extraction of key discussion items from the text, and what is being said in relation to them
• Key themes from messages and semantic chaining. Can be combined with sentiment analysis as well.
• Ability to handle high velocity and high volume data using Big Data infrastructure (Hadoop, Storm, etc.)
Categorization
• Group discussion items into categories and sub-categories, while identifying what is being said about them:
• Automatic for synonyms, singular and plural, etc.
• Ability to add / delete categories
• Ability to further analyse sub-categories
UI, editing and export
• Simple, easy, custom-built UI with filtering and drill down capability
• Machine learning approach where human insight guides further results
• Output not only available in visual format, but exportable to other applications or databases
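The email preprocessing steps can be sketched in a few lines of Python. The marker patterns, the sign-off list and the four-word threshold are assumptions picked for illustration. Spell checking and language parsing are left out.

```python
import re

# Common markers that introduce a forwarded or quoted message (assumed formats).
FORWARD_MARKER = re.compile(
    r"^-{2,}\s*(?:Forwarded message|Original Message)\s*-{2,}",
    re.IGNORECASE | re.MULTILINE,
)
# Sign-off lines after which the remaining text is treated as junk.
SIGNOFF = re.compile(
    r"^(?:regards|best regards|thanks and regards|sent from my .*)[,.]?\s*$",
    re.IGNORECASE | re.MULTILINE,
)
MIN_WORDS = 4  # assumed threshold for "small" emails such as "Thanks"


def preprocess_email(body):
    """Return the cleaned body text, or None if too little remains to analyse."""
    marker = FORWARD_MARKER.search(body)
    if marker:
        body = body[: marker.start()]  # drop the forwarded email
    signoff = SIGNOFF.search(body)
    if signoff:
        body = body[: signoff.start()]  # drop the sign-off and signature
    text = " ".join(body.split())
    return text if len(text.split()) >= MIN_WORDS else None


sample = (
    "Hi team,\nThe invoice total looks wrong.\n\nRegards,\nAsha\n"
    "---------- Forwarded message ----------\nFrom: someone\nOld text"
)
```

Calling `preprocess_email(sample)` keeps only the new message text, and a one-word reply such as “Thanks” is dropped from analysis by returning `None`.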
Veda Text Mining capability – screens of analysis in progress
Clustering conversations into categories using semantic analysis.
Example customized outputs
Our Delivery Capabilities
Proof of Concept (Trial & Demonstration): High-level client requirements
- Define the scope of work
- Proof of concept
Delivery Methodology: Detailed solution requirements
- Delivery framework (core offering + value added services)
- Documented external interfaces, with volume and associated recurring cost (if any) information
- User guide and training
- Methodology (Agile, Waterfall, or client-specified approach)
- Timelines for each deliverable
- Responsibility matrix
Delivery Methodology
[Figure: delivery methodology diagram. Project Delivery: Project Kick-off, Business Requirements, Analysis and Design, Data Set Creation, Feature Selection, Machine Learning, Development, Test & Verify, Release, Post Release Support, Project Closure. Program Activities: Program Initiation, Program Mgmt, Program HR Mgmt, Change Analysis, Program Benefits Tracking. Support Activities: Infrastructure Readiness, Training, Operational Readiness, Support Delivery.]
Client assignments
Phase 1: Veda will solve a business challenge you choose, to demonstrate the power of a semantics-based solution in a quick turnaround exercise (typically within a few days)
Phase 2: Taking the next step
* Implement for a business function, a division, or a single geography
* Multiple features of SIS implemented, including cross-business solutions leading to concrete, measurable gains
Phase 3: Replicating the success of the previous phase
* Across larger sections of the enterprise
* Wider data consolidation scope
* Multiple output delivery channels
* Visible long-term gains
For bespoke development, we are prepared to start small, to show clients clear value and RoI
* Collecting unstructured data from disparate sources
* Analyse all collected unstructured data, Organize it using rich knowledge representation/domain ontologies
* Insights from Unstructured data coupled with Analytics from Structured Data assets (E.g. BI, Big Data)
But ultimately, we believe that clients will benefit considerably by a unified Semantic Information System
[Figure: Semantic Information System architecture. Structured data from line-of-business applications (Marketing, Purchasing, Payroll, Operations, Sales) passes through databases, a staging area and a data warehouse into data marts, cubes and reporting. Unstructured and semi-structured data (servers, SAN, SAS; public web data; social media chatter) is gathered by the Veda Collection Processes (web, email, files and social media crawlers) and handled by the Veda Organising Processes (natural language processing, ontologies, semantic analysis, knowledge base, auto classification, visual segregation), producing categorized data, ready insights, dashboards and alerts.]
Our proprietary Collect – Organize- Present framework and tools allow us to undertake quick bespoke development
• Connectors— Collect information from variety of (heterogeneous) sources
• Information Extraction— Using NLP and semantic analysis
• Semantic Net / Ontology Editor— Smart knowledge representation of a domain
• Auto Classifier— Classify data and tag it to industry specific concepts automatically
• Ontology Reasoning— Analyze industry knowledge and infer from ontological knowledge
• Analytics— Identify various patterns and insights from the data
• Semantic Matching— Provide most relevant information
• Semantic Search and Browsing— Semantic explorer to retrieve contextual concept-based information
Collect
Organize
Present
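The “Ontology Reasoning” component can be illustrated with the simplest kind of inference: following subclass links transitively. The sketch below stores facts as plain Python tuples in the spirit of RDF triples. The class names and predicate strings are invented for illustration, and a production system would normally use an RDF store and an OWL reasoner instead.

```python
from collections import defaultdict

# Illustrative (subject, predicate, object) triples; all names are made up.
TRIPLES = [
    ("Sedan", "subClassOf", "Car"),
    ("Car", "subClassOf", "Vehicle"),
    ("Truck", "subClassOf", "Vehicle"),
    ("ModelX", "type", "Sedan"),
]


def superclasses(cls, triples):
    """All classes reachable from `cls` by following subClassOf links."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "subClassOf":
            parents[s].add(o)
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:  # also protects against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(instance, triples):
    """Asserted types of an instance plus every superclass they imply."""
    asserted = {o for s, p, o in triples if s == instance and p == "type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred
```

Only one fact about `ModelX` is stated, yet a search for vehicles can still find it, because `inferred_types("ModelX", TRIPLES)` also yields `Car` and `Vehicle`.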
Veda Approach – COP Framework
Veda’s Value Proposition
Technology
• Deep understanding of the Semantics space
• In the semantic technology space for more than a decade
• Expertise in both NLP and ontologies / taxonomies, and in standards (RDF / OWL)
• Team has provided services not only to clients, but to other semantic service providers
• Tie up with academia
• Tie up with leading Indian university in the area
• Allows for cutting edge R&D
• High quality talent pipeline
Delivery
• Live - Delivery and Support Turnaround
– The Veda Platform is the core that:
– Is a solution accelerator giving a head start to all our assignments (tested and certified components)
– Allows for lower costs
– Allows for incremental rollouts
Veda’s Value Proposition (contd)
Experience
• Expertise in Multiple Business Domains
• Healthy mix of business and technology expertise: can provide clear use cases for Semantics and help establish clear RoI metrics
• Core team members have had experience in Semantic technology since 2003, longer than most other companies
• Technology team experienced in providing expertise in a wide variety of business domains, leading to speedy and effective solution implementations
Location
• Located in India, with associated inherent advantages
• Lower cost options for clients with onshore – offshore model
• 24 hour work cycle
• Large talent pool
• Tie ups with companies focused on various other related technologies to offer integrated offerings, e.g. full service offering / working with offshore vendor to make outsourced processes more efficient using semantics
Veda’s End-to-End Semantic Expertise
• Text Analytics: Analyzing unstructured text, converting it to structured data
• Machine learning: Statistical techniques resulting in increasing accuracy over time (with more inputs)
• Sentiment Analysis: Identifying if the sentiment of a sentence is positive, negative or neutral (and the various shades in between)
• Semantic Information Retrieval: More artifacts searched, with more accuracy: emails, documents, spreadsheets, output from existing structured data sources
• Semantic Web Standards: Standardized storage and output formats for easier information sharing
Past Experience
Client profile: A global publishing house in legal, tax, finance and healthcare
Project description: Context-based content research platform for the tax and legal domain. Automatic meta-tagging, ontology modeling and ontology-driven content reference system.
Client profile: A prominent product manufacturer in inference and reasoning engines
Project description: Leveraged semantics for a supply chain process to integrate systems with heterogeneous data sources and help in automatic decision making in case of any disruptions in the cycle. Provided ontology modeling and application development services.
Client profile: A reputed university and complex systems research lab in Australia
Project description: Produced a method for organizing and potentially navigating the wide range of web pages associated with the Murray-Darling river system in a seamless fashion.
Client profile: An analytics software manufacturer in Australia
Project description: Assisted investigation of fraud and terrorism by establishing links between entities. Unstructured data analysis.
Client profile: A premier worldwide online provider of news, information, communication, entertainment and shopping services
Project description: Developed a web analytics platform for analyzing click-stream data in real time.
Some sample use cases mapped to our current technology demonstrators
Legal contracts
Current situation:
• Saved in C drives or in DMS; separate Excel sheets maintained to check on timely renewals, etc.
• Tough to compare specific clauses across contracts or find the relevant clause as needed
How Semantics will help:
• Search for a specific kind of contract and a specific clause will throw up (a) master template (b) earlier contracts entered into in the area (c) extracts from the relevant clause
Mapping to current Veda technology demonstrator:
• Patent search demonstrator uses similar techniques, allowing the user to also see probabilistic match of documents
Process changes
Current situation:
• Dig deep into embedded code to see what departments and areas will get impacted
How Semantics will help:
• Ontology based relational steps make it easy to see connected departments, processes, etc. that will be impacted
Mapping to current Veda technology demonstrator:
• Tax caselaw and section ontology created
Marketing
Current situation:
• Mapping social sentiment and reviews done manually or using dictionary based social monitoring tools
How Semantics will help:
• Some social marketing and social listening already being done, though not accurately. A better quality NLP engine allows for more accurate results (e.g. the word ‘like’).
Mapping to current Veda technology demonstrator:
• Veda Discovery Engine, which has sentiment capabilities
HR
Current situation:
• Obtaining the right resumes using keyword search remains time consuming
• Employee suggestions in open ended surveys not aggregatable
• Qualitative comments in employee evaluations not aggregated
How Semantics will help:
• Identify key intervention areas at aggregate levels
• Map trends in overall ratings to key strength and weakness areas
Mapping to current Veda technology demonstrator:
• Veda Discovery for aggregation, Veda Txt for identification of gist of comments
Knowledge management
Current situation:
• Metatagging remains a manual process and, as a result, searches remain searches, not findings
How Semantics will help:
• Automatic metatagging (persons, locations, organizations, concepts, etc.)
Mapping to current Veda technology demonstrator:
• Veda Discovery – NER Engine, Veda Legal demonstrator, Veda Msg (for alerts)
Sample use cases by industries
Publishing, media
Allows automatic extraction of people, locations, dates and events, being extended to themes and concepts. Helps in automatic metatagging.
• Current tagging process is manual and time consuming. Technology provides clear RoI by reducing this time and manual labour, providing consistent tagging, and allowing easier search for future reference, rather than relying on keywords (e.g. Mahatma vs Gandhi vs Mahatma Gandhi).
Oil and Gas
Can make incident monitoring and reporting systems more robust, thereby reducing the risk of major accidents
• For incident reporting, a user need not fill in multiple structured data fields. Text analytics can quickly match data to structured inputs.
• Witness reports, once converted to text, can be monitored across incidents for patterns that would otherwise have gone unnoticed.
Helps make process changes easier and allows all linked aspects to be seen at one go
• Helps determine what other processes and safety regulations are relevant if a sub-process is sought to be changed (could also include contractual information etc. if relevant)
Usually, companies have millions of oil well logs which can be classified by performing named entity extraction and enrichment
Financial services
• Contract matching (including addendums)
• VoC analysis
• Churn prediction
• Highlights capability gaps
• Promotion management
• Avoids duplication of creation of similar material across divisions / locations. Saving in man hours and resources by leveraging all available material produced earlier
• Risk analysis
• Manage and gather customer documents from various sources to look for areas of concern
• “Know your customer” analysis
• Competitor analysis
• Financial news analysis for investment managers
Telecom
• Legal interception and pattern recognition
• SMS analyses for recognizing spam to avoid penalties
• VoC analysis
Airlines
• Analysis of unstructured problem and safety logs to avoid incidents
Healthcare
• Link and compare patient records to obtain insights on:
– Symptoms, medicines and discharge times, to determine if some medication mixes may be more beneficial than others across a wide set of patient records
– Why some patients may be re-admitted
Pharma
• R&D improvement by allowing scientists, who need to refer to papers but may not know exactly what to look for, to see relevant topics (based on automatic metatagging, and linked ontology at the backend)
• Better knowledge management: automatically tag papers, saving scientist time and making search consistent
• Feedback analysis for product from distributors, doctors and end patients
Insurance
• Broker document analysis to deepen insight on insured risks to improve risk management
Sample functional use cases
Marketing
• Voice of Customer analysis
• New product ideas
• Competitor analysis
• Complaint monitoring
HR
• Drawing insights from employee suggestions
• Analysing unstructured inputs in evaluations and improving training efficacy
Risk
• Internal document monitoring for risk and compliance
Legal
• Better contract management
Veda Solutions Currently Deployed
Veda for Business Process Workflow
• Configurable to any business requirement across industries
• Sources of content can be structured and unstructured
• Can be integrated with various business applications: ERP, content management, portals, etc.
• Configurable user interface with features such as:
– Saving of search for later reference
– Tabbed views
– Number of results to be displayed, with sort order
Veda Social Media Analytics
Registration & log in
Inputs from Social Media
Inputs from Blogs, Websites
Hierarchy & Relevance Analysis
Sentiment Analysis
Rich Reporting
Veda Recruiter
Veda Patent Search
Registration & log in
Subscription
Payment Gateway
Keyword Search
Semantic Search
Rich Internet Application
Saved Search
Filters
Veda SMS Service
Registration & log in
Subscription
Payment Gateway
Keyword Search
Semantic Search
Legal ontology (Indian)
Filters
• Crunches judgment text into high relevance words that can be sent through an SMS for immediate access
• Is combined with website service offering full access for relevant cases
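One naive way to “crunch” a long text into high relevance words that fit in an SMS is frequency-based keyword selection under a character limit. The sketch below is only that: the stop-word list and the relevance measure (raw frequency) are assumptions, and the deployed service presumably relies on the legal ontology instead. The 160-character default is the standard limit for a single-part GSM text message.

```python
import re
from collections import Counter

# Small illustrative stop-word list.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "was", "for", "by", "on", "it", "be"}
SMS_LIMIT = 160  # characters in a single-part GSM SMS


def crunch_for_sms(text, limit=SMS_LIMIT):
    """Join the most frequent non-stop words, most frequent first, within `limit` characters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    message = ""
    for word, _ in counts.most_common():
        candidate = f"{message} {word}".strip()
        if len(candidate) > limit:
            break  # adding this word would exceed the SMS length
        message = candidate
    return message


judgment = (
    "The court held that the tenant must vacate. "
    "The tenant appealed and the court dismissed the appeal."
)
summary = crunch_for_sms(judgment, limit=20)
```

The result is a bag of keywords rather than a readable sentence, which suits an alert that points the reader to the full case on the website.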
Contact details
Veda Semantics Pvt Ltd
www.vedasemantics.com
Contact person: Rajat Kumar (CEO)
Email: [email protected]
Phone: +91-9619308745