
Solving a business problem through text and sentiment mining

Date post: 11-Apr-2017
Upload: onkar-jadhav
Page 1: Solving a business problem through text and sentiment mining

Solving a business problem through Text and Sentiment mining

Onkar Jadhav

[email protected] 405-762-2829


Note:

This study has been a major part of my research and experience. The things and theories expressed here are based solely on my own approach and validations. There might be a different way to deal with the problem.

Below are the stages, starting from a grass-roots survey where we first got to know about an issue, followed by how we advanced through each stage.

The Stages

A. Survey Stage:

Interviewing or surveying the audience concerned helps us get to know the issues and concerns they face. At this stage we are not really sure (our thoughts are still ambiguous) about the concerns we are to tackle.

B. Human Intervention Stage:

Reading through the surveys and interviews helps us identify the keywords mentioned by the interviewees and the audience. The frequency of the keywords should be noted. These keywords could be "concerns" of the interviewees or even "solutions" to concerns. They could also be "sentiments".

C. Revised survey/interviews

After human intervention, the initial surveys should be reframed. This time they should be more structured, asking specific questions that revolve around the keywords found in Stage B. The questions should be strategic and framed in such a way that those "keywords" serve as answers to them.

D. Cleaning of the interview/survey data

This depends on the type of software you would be using for text/sentiment mining. Each software needs its input text in a specific format, and the results it gives depend on this input format. Cleaning could be done using Python, R or C. Cleaning can even involve human intervention, where one reads through the interviews and filters out the important chunks of text.


E. Text mining results and validations

I have used SAS Text Miner for text mining and will try to adhere to concepts rather than being software specific. "Concept Links" is a technique that gives associations between keywords. "Text Clustering" clusters together words which appeared in proximity when they were uttered by the interviewees. "Rules Builder" helps us build certain rules (predefined tags), which sort the text as per the tags.

F. Thinking of alternatives and brainstorming to solve these validations

When we have validated the concerns through data, we have to think of solutions which could solve these concerns. I had to validate a "Transportation" issue. I thought of ways and means by which we could deploy a new transportation system or improve the existing transportation systems and make them efficient, and so on. With each alternative we need to have a revenue model built, right from the capital investment phase, including the costs that would be incurred in its running phase and methods to make it sustainable.


G. Pitching the best possible alternative

After stage F, we should be in a position to defend our best possible alternative. The best alternative is one which has the best ROI (return on investment), needs low capital investment, is low on maintenance, is self-sustainable, and has well-defined liability in case a failure occurs. This best possible solution should be pitched with its business/revenue model and with the problem validations from stage E.

I. Pilot model deployment

This stage involves setting up resources and inventories to deploy a pilot model. It arises when our idea pitching stage went well and we have been given a green signal to deploy our pilot model. This stage involves an actual real-time model, put in place to collect real-time data and simulate scenarios/use cases.

J. The final stage

This involves evaluation of our pilot model: the KPIs of the model, simulation of the real-time data collected, the degree to which the model could solve the problem, the break-even between the cost incurred and the cost recovered, and evaluation of future scenarios.


A. Survey Stage:

This is a preliminary stage, when we are not really sure of the business problem or the concern that we need to address. One can skip this phase if the concerns are already clear. In my case, we had only a rough idea of the concerns and issues of our audience, who were staff at the clinics. Our survey questions at this stage were deliberately vague, just to get a broad view of the issues. The types of questions framed were:

A. What kinds of things do you suppose hinder the normal workflow efficiency of clinics?

B. Which normal, day-to-day activities cause a loss of time or revenue?

These are open-ended questions. Their answers would be phrases or a few sentences. One can also frame survey questions with "options" to choose answers from. This limits the responses to those four or five options.

A. Do you think some redundant activities affect your daily workflow efficiency in clinics?

a. Strongly Agree
b. Agree
c. Neutral
d. Disagree
e. Strongly Disagree

One can even frame questions that are answered on a frequency scale (very frequent / frequent / not so frequent / never) or with a Yes/No.

If one follows this options approach, the options then need to be recoded numerically to gauge the metrics. For example: Strongly Agree = +2, Agree = +1, Neutral = 0, Disagree = -1, Strongly Disagree = -2. Totaling the responses of all the responders gives a cumulative weight for a particular question.

One can have both a quantitative survey and a qualitative survey. As this post mostly talks about text mining, I will address the qualitative approach. Responders answer open-ended questions and mention keywords that they believe describe problems, possible solutions, concerns or sentiments.
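The numeric recoding of answer options described in this stage can be sketched in Python as follows. This is a minimal illustration, not part of the original study: the codes come from the text, while the function name and the sample answers are made up for the example.

```python
# Numeric codes for the five answer options, as given in the text.
LIKERT_CODES = {
    "strongly agree": 2,
    "agree": 1,
    "neutral": 0,
    "disagree": -1,
    "strongly disagree": -2,
}


def cumulative_weight(responses):
    """Sum the numeric codes of all responses to one question."""
    return sum(LIKERT_CODES[r.strip().lower()] for r in responses)


# Hypothetical answers from five responders to a single question.
sample_responses = ["Agree", "Strongly Agree", "Neutral", "Disagree", "Agree"]
print(cumulative_weight(sample_responses))  # 1 + 2 + 0 - 1 + 1 = 3
```

A positive total suggests that responders lean towards agreement on that question, and a negative total suggests the opposite.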


B. Human Intervention Stage

After a brief qualitative survey, we need someone who can actually go through the answers given by our audience.

Surveys/interviews might be recorded on audio recorders and exist as audio files. We need someone who can go through each of the files, listening to the interviews and responses.

If the surveys were paper based, one must go through the answers to select the keywords.

Keyword: These are words with a high frequency (occurring a fairly high number of times relative to other words) that are important from the point of view of the audience.

For example, if one goes through an interview answer file and finds the phrase "internet lag" a fairly large number of times, then this might be a keyword. If the same phrase recurs in other interview files as well, then it is a keyword.

Choose keywords wisely, as this stage lays the foundation for our revised survey stage, when we frame new questions which specifically address the keyword issue.
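The keyword test described above (high frequency, and recurring across several interview files) is done by a human reader in this stage, but a rough count can support it. The sketch below is an assumption-laden illustration: it counts single words rather than phrases such as "internet lag", and the stop-word list, function names and sample interviews are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; extend it for real interview data.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "we", "our", "it", "so"}


def tokenize(text):
    """Lower-case the text and keep words that are not stop words."""
    words = re.findall(r"[a-z][a-z\-]*", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def candidate_keywords(interviews, min_files=2, top_n=5):
    """Return (word, total_count) for words found in at least min_files interviews."""
    total = Counter()            # how often each word occurs overall
    files_with_word = Counter()  # in how many interviews each word occurs
    for text in interviews:
        tokens = tokenize(text)
        total.update(tokens)
        files_with_word.update(set(tokens))
    recurring = [(w, c) for w, c in total.most_common() if files_with_word[w] >= min_files]
    return recurring[:top_n]


# Hypothetical interview answers, one string per interview file.
interviews = [
    "The internet is slow and the internet lag delays billing.",
    "We lose time because of internet lag in the afternoon.",
    "Parking is difficult so patients arrive late.",
]
print(candidate_keywords(interviews))
```

Words that appear in only one interview are dropped, which mirrors the advice to look for terms that recur in other interview files as well.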


C. Revised survey/interviews

Following the identification of keywords is the revised survey/interview phase. As I mentioned earlier, this phase involves restructuring the questions/interviews/surveys so that they revolve around the keywords, or so that the keywords are the answers to them. For example, if the keyword "internet lag" was identified in the previous phase, one could frame questions like:

A. Is "internet lag" or "connectivity" an important factor that causes hindrances in your routine? The answer would be a Yes/No, or may be open ended.

B. Do you think establishing a sophisticated "wi-fi" network would make any considerable difference?

C. What tasks face hindrances due to slow internet?

Such reframing of questions around the keywords would give you "keywords" as their answers and also suggest concerns and associations of these keywords with other words.

One should always decide the format of this revised survey with a view to the ease of the text mining done in the next phase. This is specific to the software and the text mining technique.


D. The data cleaning phase:

This phase depends on the extent of data cleaning required and the input format expected by the software/tool. Here I'll explain generic things that you'd need to implement in order to get clean data that most software accepts as input.

1. Word file format I

This format, a transcript in which each turn is labelled by speaker, is not suitable for text mining, as the words "Interviewer" and "Interviewee" repeat a large number of times. Software would always show these high-frequency words as associated with other words, which makes no sense.

2. Word file format II

Filter the "interviewee" content (the answers given by the interviewee) onto a new document, so that the answer given by the interviewee to every question is in a new text/Word document. Number of answers = number of documents.
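Moving from format I to format II can be sketched in Python, which the overview of stage D names as one possible cleaning tool. The sketch assumes a plain-text transcript in which every turn starts with a literal "Interviewer:" or "Interviewee:" label; those labels, the function names and the output file names are assumptions for illustration only.

```python
from pathlib import Path


def extract_answers(transcript, answer_label="Interviewee:", question_label="Interviewer:"):
    """Return the interviewee's answers, one string per answer, without speaker labels."""
    answers = []
    current = None  # None while we are inside an interviewer turn
    for line in transcript.splitlines():
        line = line.strip()
        if line.startswith(answer_label):
            if current is not None:
                answers.append(" ".join(current))
            rest = line[len(answer_label):].strip()
            current = [rest] if rest else []
        elif line.startswith(question_label):
            if current is not None:
                answers.append(" ".join(current))
            current = None
        elif line and current is not None:
            current.append(line)  # continuation line of the current answer
    if current is not None:
        answers.append(" ".join(current))
    return answers


def write_documents(answers, out_dir):
    """Write each answer to its own text file: number of answers = number of documents."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, answer in enumerate(answers, start=1):
        (out / f"answer_{i}.txt").write_text(answer, encoding="utf-8")


# Hypothetical transcript in "format I".
transcript = """Interviewer: What slows down your day?
Interviewee: The internet lag is the main issue.
It affects billing.
Interviewer: Anything else?
Interviewee: Transportation for patients."""

answers = extract_answers(transcript)
print(answers)
```

Calling `write_documents(answers, "cleaned")` would then produce one file per answer, with the speaker labels no longer present to distort the word associations.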


3. Excel file input format

The following snapshot shows questions and answers captured in a tabular format. This is a structured way of storing information, but it does not work well with text mining software.

The greater the number of files, the better it is for the software to generate associations between keywords. One can split each cell in this file into a new file of its own. Excel comes with Visual Basic, which can come to our rescue. I used the following code for splitting an Excel file into its cells.

Note: I had 243 rows (243 responders) and 8 questions. I used SAS Text Miner for text mining.

Sub SplitCellsToFiles()
    ' Write each answer in column 3 (rows 3 to 243) to its own text file,
    ' named Q1<row>.txt and saved in the workbook's folder.
    Dim k As Long
    Dim f As Integer
    For k = 3 To 243
        f = FreeFile
        Open ActiveWorkbook.Path & "\Q1" & k & ".txt" For Output As #f
        Print #f, Sheet1.Cells(k, 3).Value
        Close #f
    Next k
End Sub


E. Text mining results and validations:

The strongest validations and claims to support your thoughts can be generated at this phase.

A concept link looks like a tree, branching out into words which are closely associated with the word from which they branch out. We can see the word "weak" connected to "conclude" with a thicker line, which shows a strong association between these words. A link could be traced in the following way: Tear - mother - weak - worst - engender. After connecting the words, one needs to make sense out of them. It is easiest for a person who has actually done these surveys and done the question framing and reframing to make sense of these words, as only he/she can interpret the sentiment behind them. Make sure to branch out the keywords, as these support our business idea and the cause we are trying to prove through our validations.
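The idea behind a concept link, that a thicker line means a stronger association, can be approximated with a simple co-occurrence measure. The sketch below is not the statistic SAS Text Miner computes; it is a stand-in that scores each word by the share of documents containing the chosen term that also contain that word. The function name and the sample documents are hypothetical, and the documents are assumed to be already cleaned (lower case, no punctuation).

```python
from collections import Counter


def concept_links(documents, term, top_n=5):
    """Rank words by the share of documents containing `term` that also contain them."""
    word_sets = [set(d.lower().split()) for d in documents]
    with_term = [words for words in word_sets if term in words]
    if not with_term:
        return []
    co_counts = Counter()
    for words in with_term:
        co_counts.update(words - {term})
    n = len(with_term)
    # Strongest links first; ties are broken alphabetically for stable output.
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count / n) for word, count in ranked[:top_n]]


# Hypothetical cleaned answers, one string per document.
documents = [
    "internet lag delays billing",
    "internet lag every afternoon",
    "internet outage delays email",
    "parking is difficult",
]
for word, strength in concept_links(documents, "internet", top_n=2):
    print(word, round(strength, 2))
```

A strength close to 1 plays the role of a thick line in the concept link diagram, and interpreting why the words are linked still needs the person who ran the surveys.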


A text clustering technique can also be useful. This technique creates clusters of words which appear together in a chunk of text. If, in the interviews, our audience has mentioned the words "internet" and "lag" together, or has mentioned "internet" and "hindrance" in the same sentence (close proximity), then these words are bound to be clustered together. This figure shows words which are clustered together as per their cluster ID. Each ID stands for a new cluster.

Thus, the figure shows a text clustering and the words which represent their respective clusters.
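As a toy illustration of how words that appear together can end up under one cluster ID, the sketch below links two words when they share a sentence at least `min_count` times and then groups linked words together. This is a deliberately simple stand-in, not the clustering algorithm used by SAS Text Miner; the threshold, the function name and the sample sentences are assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def cluster_terms(sentences, min_count=2):
    """Group words that co-occur in the same sentence at least min_count times."""
    pair_counts = Counter()
    for sentence in sentences:
        words = sorted(set(re.findall(r"[a-z]+", sentence.lower())))
        pair_counts.update(combinations(words, 2))

    parent = {}  # union-find structure: word -> representative word

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for (a, b), count in pair_counts.items():
        if count >= min_count:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for word in list(parent):
        groups.setdefault(find(word), set()).add(word)
    clusters = sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
    return {cluster_id: words for cluster_id, words in enumerate(clusters, start=1)}


# Hypothetical cleaned sentences (stop words already removed).
sentences = [
    "internet lag hindrance",
    "internet lag",
    "bus schedule late",
    "bus late",
    "parking",
]
print(cluster_terms(sentences))
```

On real interview text, stop words should be removed first, otherwise very common words would tie almost everything into a single cluster.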


The text topic technique is also popularly used, like text clustering. This technique allows the user to create a "Topic" of his/her choice and list the words that he/she believes fall under this topic. If I chose the keyword "internet", I would create a text topic named "Internet" and ask the software to group words like lag, connectivity, emails, Google and wi-fi under this topic. If these words appear anywhere in the text in proximity to the word "internet", they would immediately be included under the "internet" topic. This technique works just like text clustering, except that here you are able to choose your own words (topics) that represent your clusters.

We have chosen terms which fall under the topic "internet". A weight is the importance given to a term. It ranges from 0 (least important) to 1 (most important).
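A user-defined topic with term weights between 0 and 1 can be imitated with a small sketch. Here a term's weight is simply how often it occurs within a few words of the anchor word, scaled so that the most frequent term gets 1. How SAS Text Miner actually derives its weights is not described in this post, so the window size, the scaling and the function name are all assumptions.

```python
import re
from collections import Counter


def topic_weights(text, anchor, topic_terms, window=5):
    """Weight each topic term by how often it occurs within `window` words of `anchor`."""
    tokens = re.findall(r"[a-z\-]+", text.lower())
    anchor_positions = [i for i, tok in enumerate(tokens) if tok == anchor]
    counts = Counter({term: 0 for term in topic_terms})
    for i, tok in enumerate(tokens):
        if tok in counts and any(abs(i - p) <= window for p in anchor_positions):
            counts[tok] += 1
    highest = max(counts.values(), default=0)
    if highest == 0:
        return {term: 0.0 for term in topic_terms}
    # Scale so that the most frequent term near the anchor has weight 1.
    return {term: counts[term] / highest for term in topic_terms}


# Hypothetical interview text and user-chosen topic terms.
text = (
    "The internet lag is bad. Internet lag and poor connectivity slow the wi-fi. "
    "Emails pile up while we wait for lunch."
)
terms = ["lag", "connectivity", "emails", "wi-fi"]
print(topic_weights(text, "internet", terms))
```

Terms that never appear close to the anchor word keep a weight of 0, which matches the idea that only words in proximity to "internet" are pulled into the topic.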


F. & G. Thinking of alternatives and brainstorming to solve these validations and pitching

Now that we have validated the issue "Internet" and proved its association with words like "lag" and "hindrance" through concept links and text topics, we need to think and brainstorm about the alternatives which could solve this problem. This is the root-cause analysis. We need to answer questions like:

A. What is the cause behind the slow/lagging internet?

B. Can switching to a new operator solve the problem?

C. Are the devices that harness the internet faulty?

D. If we need to install new devices, what would be the total investment?

E. To what degree does the new installation resolve the issue, and what is the time frame for the transformation?

This is a highly subjective phase. Every possible alternative we think of would be supported by its pros and weakened by its cons. Teamwork and good domain knowledge are what it takes for successful implementation. Whatever the claim or idea might be, it should always be backed by figures and percentages. A cash flow is a must when one drafts a proposal from this phase. Money flowing in and out is always a concern for the funding authority. The cash flow also allows the break-even point of any possible solution to be evaluated.


In my case the root cause of the issues was a weak transportation system. We brainstormed and came down to the following conclusions:

1. Improving the existing transportation system in terms of frequency of rides, routes, scheduling of rides and improved vehicles.

2. Setting up a new transportation system as per our needs, routes and schedules.

For each of the alternatives we plugged in the cash flow: the amount invested, where the amount would be put to use, the running costs incurred, the cost recovered, and the liabilities and costs associated with it. With all these evaluations one can prepare a business and revenue model which supports his/her claims. A model pitched should be self-sustainable and, as far as possible, need a one-time investment. A one-time investment in a pilot model is the best way to approach investors. It is like claiming, "Invest once and just reap the benefits later." If one is able to show in figures the ROI his/her model can achieve, it is easy to convey the model's profitability and efficiency to its audience.
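The cash flow, break-even and ROI figures mentioned above can be worked out with a few lines of arithmetic. The sketch below assumes a one-time investment followed by a net inflow per period (cost recovered minus running cost); the function names and all amounts are hypothetical and are not figures from the transportation project.

```python
def cumulative_cash_flow(initial_investment, net_inflows):
    """Running balance after each period, starting from the one-time investment."""
    balance = -initial_investment
    balances = []
    for inflow in net_inflows:
        balance += inflow
        balances.append(balance)
    return balances


def break_even_period(initial_investment, net_inflows):
    """First period (1-based) in which the cumulative cash flow is non-negative, else None."""
    balances = cumulative_cash_flow(initial_investment, net_inflows)
    for period, balance in enumerate(balances, start=1):
        if balance >= 0:
            return period
    return None


def roi(initial_investment, net_inflows):
    """Return on investment: net gain divided by the amount invested."""
    return (sum(net_inflows) - initial_investment) / initial_investment


# Hypothetical alternative: 10,000 invested once, then yearly net inflows.
investment = 10_000
yearly_net_inflows = [3_000, 4_000, 4_000, 4_000]

print(cumulative_cash_flow(investment, yearly_net_inflows))
print(break_even_period(investment, yearly_net_inflows))
print(roi(investment, yearly_net_inflows))
```

Running the same three functions for each alternative gives comparable figures to put in front of the funding authority.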


I. & J. Pilot model deployment and model evaluation:

Finally, after a successful pitching stage and after getting approval and the necessary funds, comes the model deployment phase. This phase is more about prototyping or setting up a scaled-down version of our pitched model. In my case, our pitched model covered almost 3 dozen cities/towns, and our scaled model (pilot model) was for one small town. We made provisions and arrangements for the inventories and resources which were needed. For example, if it was for "internet", we could purchase a few new routers and devices and install them at selected areas. We would then shift operations from the old network to the new network and start collecting real-time data. One important thing while we proceed through this stage is that we have to prove our pilot model is better than the traditional way of doing things. The real-time data collection that I mentioned serves this purpose. If it was for internet, we could collect data like:

1. Current speed of the internet vs. speed with the old system.

2. Time saved in minutes/hours.

3. Work efficiency that has increased due to the new system, in percentage or another suitable unit.

4. Future scope, when our system would be installed on a full scale.

One can also understand the pitfalls of a model during its pilot deployment phase. Remember the keywords. Internet was connected with words like lag and hindrances. Now that our model has been running, we need to address them. These keywords should be "solved" or overcome through our proposed model. If one could come up with forecasted cash figures that we could be saving when the model is deployed on a full scale, then it could be considered an impetus for our pilot model and idea. Always remember to abide by the time frame and costs that one has pitched to get his/her model funded. One could also have a final satisfaction survey, where the degree to which the issue has been resolved can be gauged. This would serve as a validation for our pilot model.
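The comparison between the pilot and the old system can be reduced to a couple of simple measures, such as the percentage gain in speed and the time saved per day. The sketch below uses made-up measurements for the "internet" example; the function names, the units and the numbers are assumptions for illustration.

```python
def percent_improvement(old_value, new_value):
    """Percentage change from the old system to the pilot."""
    return (new_value - old_value) / old_value * 100


def pilot_summary(old_speed_mbps, new_speed_mbps, old_task_minutes, new_task_minutes, tasks_per_day):
    """Summarize speed gain and time saved per day for the pilot versus the old system."""
    return {
        "speed_gain_percent": percent_improvement(old_speed_mbps, new_speed_mbps),
        "minutes_saved_per_day": (old_task_minutes - new_task_minutes) * tasks_per_day,
    }


# Hypothetical measurements: the old network versus the pilot network.
summary = pilot_summary(
    old_speed_mbps=10,
    new_speed_mbps=25,
    old_task_minutes=6,
    new_task_minutes=4,
    tasks_per_day=30,
)
print(summary)
```

Figures like these, collected during the pilot, are what would back the claim that the new system beats the traditional way of doing things.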

