A System for Real-time Competitive Market Intelligence
SIGKDD 02 Edmonton, Alberta, Canada
Copyright 2002 ACM
Sholom M. Weiss and Naval K. Verma
IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA
sholom@us.ibm.com, nverma@ics.uci.edu
Agenda
• Introduction
• Related technical research areas
• Methods and procedures
• Lightweight rule induction
• Results and discussion
Presented by Joyce Chen
Introduction
Goal: Detect critical differences in the text written about a company vs. the text written about its competitors.
Real-time market intelligence and competitive analysis.
Many sources of news that feed in real-time.
So much information is available on-line and is immediately accessible.
Introduction (Continued)
Overall design consists of the following components:
Real-time crawler.
Conditional document retriever.
Text analysis techniques that convert the documents to numerical format.
Rule induction methods for finding patterns in data.
Display of results.
Related technical research areas
Prior research:
Learning rules for extraction of useful data from unstructured information.
Using an information retrieval score (tf-idf) to rank words and phrases that characterize each web site.
In this paper:
The goal is also to learn, not for the purpose of assembling data but to find patterns in unstructured text that distinguish among competitors.
The system has a real-time crawler.
We apply complete patterns: they are not just sets of words but more meaningful conjunctions and disjunctions.
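The tf-idf ranking mentioned under prior research can be sketched as follows. This is a minimal illustration, not the cited researchers' implementation: the function name `tfidf_rank`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the three sample "sites" are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_rank(docs, target, top_k=3):
    """Rank the words of docs[target] by tf * idf, highest first."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Term frequency within the target document only.
    tf = Counter(tokenized[target])
    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

# Hypothetical text for three web sites (not data from the paper).
sites = [
    "servers storage servers services",
    "software windows software services",
    "laptops desktops services",
]
top_words = tfidf_rank(sites, target=0, top_k=2)
```

A word that appears on every site, such as "services" here, gets an idf of zero and drops to the bottom of the ranking. Such a score ranks individual words, whereas the system described in these slides looks for conjunctions and disjunctions.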
Methods and Procedures
Methods and Procedures (Continued.)
Crawl the net in real time for articles about the competitors.
Specify conditions for separating the documents into groups for comparison.
Transform the text into a numerical form in preparation for applying machine learning methods.
Apply machine learning methods. (Decision rule induction methods)
Determine interesting word patterns for specific companies.
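The step that transforms text into numerical form can be illustrated with a small sketch. The slides do not state which encoding is used, so the binary word-presence vectors below are an assumption, as are the helper names `build_vocabulary` and `to_binary_vector` and the two sample headlines.

```python
def build_vocabulary(docs):
    """Collect the sorted set of lowercase words across all documents."""
    return sorted({w for d in docs for w in d.lower().split()})

def to_binary_vector(doc, vocab):
    """Encode a document as 1/0 flags: is each vocabulary word present?"""
    words = set(doc.lower().split())
    return [1 if w in words else 0 for w in vocab]

# Hypothetical headlines; label 1 = target company, 0 = competitor.
docs = ["IBM expands network services", "Sun ships new servers"]
labels = [1, 0]

vocab = build_vocabulary(docs)
matrix = [to_binary_vector(d, vocab) for d in docs]
```

Each row of `matrix`, paired with its entry in `labels`, is the kind of labeled numerical example a rule induction method can take as input.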
Methods and Procedures (Continued.)
Assembling the documents for a designated group of competitors
Methods and Procedures (Continued.)
Methods and Procedures (Continued.)
Lightweight Rule Induction method (LRI)
The method learns compact rules in disjunctive normal form (DNF).
DNF (Disjunctive Normal Form): a wff is in DNF iff (1) its clauses are joined by disjunction, and (2) the literals within each clause are joined by conjunction. Example: (P ∧ Q) ∨ (X ∧ Y ∧ Z)
Figure 5 shows an example of a typical DNF rule generated by LRI.
In this example, the rule has a length of three with two disjuncts.
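To make the DNF structure concrete, here is a minimal sketch of how such a rule could be represented and evaluated against a document. The list-of-lists representation, the function names, and the words in the sample rule are assumptions for illustration; they are not the contents of Figure 5 and this is not the LRI learner itself.

```python
def dnf_matches(rule, doc_words):
    """A DNF rule fires if ANY disjunct has ALL of its words in the document."""
    return any(all(w in doc_words for w in conj) for conj in rule)

def rule_length(rule):
    """Rule length = total number of terms across all disjuncts."""
    return sum(len(conj) for conj in rule)

# Hypothetical rule: (network AND services) OR storage.
# Two disjuncts, three terms in total.
rule = [["network", "services"], ["storage"]]

hit_first = dnf_matches(rule, {"ibm", "network", "services"})
hit_second = dnf_matches(rule, {"storage", "array"})
miss = dnf_matches(rule, {"network", "earnings"})
```

Here `hit_first` and `hit_second` are true because one disjunct is fully satisfied in each case, while `miss` is false because "network" alone does not complete the first conjunction.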
Results and discussion
Step 1: Starting with stories dated after September 1, 2001, crawl the newswires.
Collect stories for IBM, Microsoft, Dell, Compaq, and Sun.
Sample every 15 minutes and add any new material.
Clean and convert to XML.
Add stories to the current database.
Step 2: Indicate conditions for forming analytical groups and labels.
IBM stories: December 1 to December 10, 2001 vs. December 11 to December 31, 2001.
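The date condition in Step 2 amounts to assigning each story to one of two labeled groups. A minimal sketch, assuming stories are stored as dictionaries with `date` and `text` fields (a hypothetical layout; the slides only say that stories are converted to XML and added to a database):

```python
from datetime import date

def label_by_date(stories, first_start, first_end, second_start, second_end):
    """Label stories 1 if dated in the first window, 0 if in the second.

    Stories outside both windows are left out of the comparison.
    """
    labeled = []
    for story in stories:
        d = story["date"]
        if first_start <= d <= first_end:
            labeled.append((story["text"], 1))
        elif second_start <= d <= second_end:
            labeled.append((story["text"], 0))
    return labeled

# Hypothetical stories, not taken from the crawled newswires.
stories = [
    {"date": date(2001, 12, 3), "text": "early story"},
    {"date": date(2001, 12, 20), "text": "late story"},
    {"date": date(2001, 11, 28), "text": "out of range"},
]
labeled = label_by_date(
    stories,
    date(2001, 12, 1), date(2001, 12, 10),
    date(2001, 12, 11), date(2001, 12, 31),
)
```

The same shape of function would serve other conditions, such as company vs. company, by swapping the date test for a different predicate.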
Results and discussion (Continued)
Step 3: Compare using rules of the form A or B, where A and B are no more than 2 words each.
Resulting patterns:
Service or network
York or work
Step 4: Delete "york" and invoke a new comparison.
Resulting patterns:
Sign or systems is added as the second pattern.
Display documents and highlight words
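The compare, delete, and compare again loop of Steps 3 and 4 can be illustrated with a brute-force search over rules of the form "A or B". This is a stand-in for illustration only, not the LRI method: it restricts A and B to single words (the slides allow up to two words each), scores a rule by the number of correctly classified documents, and uses invented sample stories. The name `best_disjunction` and its `excluded` parameter are hypothetical.

```python
from itertools import combinations

def best_disjunction(pos_docs, neg_docs, excluded=()):
    """Exhaustively find the word pair (a, b) whose rule 'a or b' best
    separates pos_docs (rule should fire) from neg_docs (should not)."""
    pos = [set(d.lower().split()) for d in pos_docs]
    neg = [set(d.lower().split()) for d in neg_docs]
    # Candidate words, minus any the analyst has deleted.
    vocab = sorted(set().union(*pos, *neg) - set(excluded))
    best, best_correct = None, -1
    for a, b in combinations(vocab, 2):
        fires_on_pos = sum(1 for d in pos if a in d or b in d)
        silent_on_neg = sum(1 for d in neg if a not in d and b not in d)
        correct = fires_on_pos + silent_on_neg
        if correct > best_correct:  # keep the first pair on ties
            best, best_correct = (a, b), correct
    return best, best_correct

# Hypothetical stories for the two groups being compared.
group_a = ["ibm service contract", "ibm network upgrade", "new york service"]
group_b = ["ibm earnings report", "ibm earnings call"]

first = best_disjunction(group_a, group_b)
# Analyst deletes a word from consideration and reruns the comparison.
second = best_disjunction(group_a, group_b, excluded=("network",))
```

On this toy data the first run returns the pair ("network", "service"), and after excluding "network" the search falls back to ("service", "upgrade"). Exhaustive search is quadratic in vocabulary size, so it only suits small examples.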
Results and discussion (Continued)
Step 5: New conditions
IBM vs. Sun for the month of December.
Step 6: Compare
Resulting patterns:
IBM: data or sign.
Step 7: New conditions
IBM market cap increases relative to competitors vs. IBM market cap decreases, same time period.
Snapshot comparing IBM and Microsoft Newswires