Applications of Natural Language Processing

Course 3 - 8 March 2012

„Al. I. Cuza” University of Iasi, Romania

Faculty of Computer Science


Data Mining

◦ Definition

◦ Examples

◦ Data, Information, Knowledge

◦ Elements, Levels of Analysis

◦ Notable Uses

◦ Resources

Text Mining

◦ Definition

◦ Domains

◦ Applications

◦ TerMine, AcroMine, FACTA+, KLEIO, MEDIE


The process of analyzing data from different perspectives and summarizing it into useful information (information that can be used to increase revenue, cut costs, etc.)

The process of finding correlations or patterns among dozens of fields in large relational databases

Data, Information, and Knowledge


A Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns

They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer

To increase revenue:

◦ They could move the beer display closer to the diaper display

◦ They could make sure beer and diapers were sold at full price on Thursdays


Data: any facts, numbers, or text that can be processed by a computer

Organizations are accumulating vast and growing amounts of data in different formats and different databases:

◦ operational or transactional data, such as sales, cost, inventory, payroll, and accounting

◦ nonoperational data, such as industry sales, forecast data, and macroeconomic data

◦ metadata - data about the data itself, such as logical database design or data dictionary definitions


The patterns, associations, or relationships among all this data can provide information

For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when


Information can be converted into knowledge about historical patterns and future trends

For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior

Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts


Data warehousing represents an ideal vision of maintaining a central repository of all organizational data

Centralization of data is needed to maximize user access and analysis


Data mining enables companies to determine relationships among "internal" factors (price, product positioning, or staff skills) and "external" factors (economic indicators, competition, and customer demographics)

It also enables them to determine the impact on sales, customer satisfaction, and corporate profits, and to "drill down" into summary information to view detailed transactional data


NBA - The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies

For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one!


Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes – existing data are used to locate records in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order

Clusters – group existing data according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities


Associations – data are mined to identify associations, as in the beer-diaper example

Sequential patterns – data are mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes


Extract, transform, and load transaction data onto the data warehouse system

Store and manage the data in a multidimensional database system

Provide data access to business analysts and information technology professionals

Analyze the data by application software

Present the data in a useful format, such as a graph or table
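
The five steps above can be sketched end to end on a toy scale. This is only an illustration under stated assumptions: the sales rows are invented, an in-memory SQLite database stands in for the multidimensional warehouse, and all function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw point-of-sale export; every value is made up for illustration.
RAW_SALES = """store,weekday,product,amount
A,Thu,diapers,12.50
A,Thu,beer,8.00
B,Sat,beer,9.50
B,Sat,diapers,11.00
A,Mon,milk,2.25
"""

def extract(raw_text):
    """Extract: parse the raw export into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: convert amounts to numbers so the stored values are clean."""
    return [(r["store"], r["weekday"], r["product"], float(r["amount"])) for r in rows]

def load(conn, records):
    """Load and store: write the cleaned records into the warehouse table."""
    conn.execute("CREATE TABLE sales (store TEXT, weekday TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

def revenue_by_product(conn):
    """Analyze: total revenue per product, highest first."""
    query = "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY SUM(amount) DESC"
    return conn.execute(query).fetchall()

def present(summary):
    """Present: render the summary as a plain-text table."""
    lines = [f"{'product':<10}{'revenue':>10}"]
    lines += [f"{product:<10}{total:>10.2f}" for product, total in summary]
    return "\n".join(lines)

# The connection object plays the role of "data access" for analysts.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_SALES)))
summary = revenue_by_product(conn)
print(present(summary))
```

A real warehouse would add dimensions (time, store, product hierarchy) and scheduled loads; the sketch only shows how the stages hand data to one another.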


Artificial neural networks learn through training and resemble biological neural networks in structure

Genetic algorithms: a design based on the concepts of natural evolution

Decision trees: tree-shaped structures that represent sets of decisions


Nearest neighbor method: classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset

Rule induction: The extraction of useful if-then rules from data based on statistical significance

Data visualization: The visual interpretation of complex relationships in multidimensional data
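
As a concrete illustration of the nearest neighbor method listed above, here is a minimal sketch. It assumes Euclidean distance and a simple majority vote as the way of "combining" the classes of the k most similar records; the customer data and the function name are hypothetical.

```python
from collections import Counter
import math

def knn_classify(query, historical, k=3):
    """Classify `query` by majority vote among its k nearest historical records."""
    # Sort historical (features, label) pairs by Euclidean distance to the query.
    by_distance = sorted(historical, key=lambda rec: math.dist(query, rec[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (visits per month, average spend) -> customer class.
HISTORICAL = [
    ((1.0, 10.0), "occasional"),
    ((2.0, 12.0), "occasional"),
    ((1.5, 9.0), "occasional"),
    ((8.0, 40.0), "regular"),
    ((9.0, 45.0), "regular"),
    ((7.5, 38.0), "regular"),
]

print(knn_classify((2.0, 11.0), HISTORICAL))
print(knn_classify((8.5, 42.0), HISTORICAL))
```

Other distance measures or weighted votes are equally valid choices; the value of k is usually tuned on held-out data.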


Games: dots-and-boxes and chess

Business: customer relationship management, identifying the characteristics of a business's most successful employees, market basket analysis

Science and engineering: genetics, bioinformatics, medicine, education and electrical power engineering


Spatial data mining: geography, GIS

Organizations possessing huge databases with thematic and geographically referenced data include:

◦ offices requiring analysis or dissemination of geo-referenced statistical data

◦ public health services searching for explanations of disease clusters

◦ environmental agencies assessing the impact of changing land-use patterns on climate change

◦ geo-marketing companies doing customer segmentation based on spatial location


Sensor data mining: wireless sensor networks (air pollution monitoring)

Visual data mining: discovering patterns in the large data sets generated by the move from analog to digital

Music data mining: discover relevant similarities among music corpora


Surveillance: data mining used in programs intended to stop terrorism

◦ Pattern mining: For example, an association rule "beer ⇒ potato chips (80%)" states that four out of five customers that bought beer also bought potato chips

◦ Subject-based data mining: associations between individuals in data
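
The 80% figure in the pattern mining example is the confidence of the rule. A minimal sketch of how support and confidence could be computed follows; the baskets are invented so that four of the five beer baskets also contain potato chips, and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical baskets: five contain beer, and four of those also contain potato chips.
BASKETS = [
    {"beer", "potato chips"},
    {"beer", "potato chips", "diapers"},
    {"beer", "potato chips"},
    {"beer", "potato chips", "bread"},
    {"beer", "diapers"},
    {"bread", "milk"},
    {"milk", "potato chips"},
    {"bread"},
]

rule_confidence = confidence(BASKETS, {"beer"}, {"potato chips"})
print(f"beer => potato chips: {rule_confidence:.0%}")
```

Practical association mining also filters rules by a minimum support, so that a rule seen in only a handful of baskets is not reported.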


Data mining in meteorology: changes in temperature, air pressure, moisture and wind direction - Self-Organizing Map (SOM)

Educational data mining: methods to better understand students (feedback, recommendations, predicting performance, planning and scheduling)


The term "data mining" itself has no ethical implications

Data mining requires data preparation, which can uncover information or patterns that may compromise confidentiality and privacy obligations (in particular through data aggregation)

The threat to an individual's privacy comes into play when the data, once compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to identify specific individuals, especially when originally the data were anonymous


http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/hall/resources.htm


The process of discovering and extracting previously unknown knowledge from unstructured data

Text mining (sometimes text data mining) comprises three main activities:

◦ Information retrieval to gather relevant texts

◦ Information extraction to identify and extract entities, facts and relationships between them

◦ Data mining to find associations among the pieces of information extracted from many different texts
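
The three activities above can be chained in a toy pipeline. This sketch rests on loud assumptions: the documents are invented, retrieval is plain keyword matching, and extraction is a lookup in a small hand-made entity dictionary rather than a real information-extraction system.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-collection; the sentences are invented for illustration.
DOCUMENTS = [
    "Aspirin reduces Fever in adults.",
    "The weather in Iasi was sunny.",
    "Aspirin may irritate the Stomach.",
    "Ibuprofen reduces Fever and Inflammation.",
    "Aspirin lowers Fever quickly.",
]

# Hypothetical entity dictionary standing in for a real extraction component.
KNOWN_ENTITIES = {"Aspirin", "Ibuprofen", "Fever", "Stomach", "Inflammation"}

def retrieve(documents, keywords):
    """Information retrieval: keep documents that mention any keyword."""
    return [d for d in documents if any(k.lower() in d.lower() for k in keywords)]

def extract_entities(document):
    """Information extraction: return the known entities found in a document."""
    tokens = re.findall(r"[A-Za-z]+", document)
    return sorted(set(tokens) & KNOWN_ENTITIES)

def mine_associations(documents):
    """Data mining: count how often each pair of entities co-occurs in a document."""
    pair_counts = Counter()
    for doc in documents:
        pair_counts.update(combinations(extract_entities(doc), 2))
    return pair_counts

relevant = retrieve(DOCUMENTS, ["aspirin", "ibuprofen"])
associations = mine_associations(relevant)
print(associations.most_common())
```

Co-occurrence counts are only the simplest form of association; the point is that the mining step operates on extracted facts gathered from many texts, not on the raw text itself.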


Data Mining

◦ In Text Mining, patterns are extracted from natural language text rather than databases

Web Mining

◦ In Text Mining, the input is free unstructured text, whilst web sources are structured

Information Retrieval (Information Access)

◦ No genuinely new information is found

◦ The desired information merely coexists with other valid pieces of information


Computational Linguistics (CPL) & Natural Language Processing (NLP)

◦ An extrapolation from Data Mining on numerical data to Data Mining from textual collections [Hearst 1999]

◦ CPL computes statistics over large text collections in order to discover useful patterns which are used to inform algorithms for various sub-problems within NLP, e.g. part-of-speech tagging and word sense disambiguation [Armstrong 1994]


Text mining can help make implicit information explicit


Information retrieval

Data mining

Machine learning

Statistics

Computational linguistics

Multilingual data mining: the ability to gain information across languages and cluster similar items from different linguistic sources according to their meaning


Text categorization

Text clustering

Concept/entity extraction

Production of granular taxonomies

Sentiment analysis

Document summarization

Entity relation modeling (i.e., learning relations between named entities)


Security applications: analysis of plain text sources such as Internet news and the study of text encryption

Biomedical applications: GoPubMed - the first semantic search engine on the Web (in biomedical literature), PubGene – combines biomedical text mining with network visualization


Software and applications: IBM, Microsoft – for tracking and monitoring terrorist activities

Online media applications: editors are benefiting by being able to share, associate and package news across properties, significantly increasing opportunities to monetize content


Marketing applications: more specifically in analytical customer relationship management

Sentiment analysis: analysis of movie reviews, student evaluations, children's stories and news stories


National Centre for Text Mining (NaCTeM) (University of Manchester + Tsujii Lab, University of Tokyo) http://www.nactem.ac.uk/index.php

School of Information at University of California, Berkeley http://www.ischool.berkeley.edu/


TerMine: automatically detects and extracts multi-word technical terms from text

http://www.nactem.ac.uk/software/termine/


AcroMine: finds expanded forms of acronyms from a database of those previously used by authors

http://www.nactem.ac.uk/software/acromine/
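
To give a feel for how a database of acronym expansions could be collected from author usage, here is a naive sketch. It is not the method used by the tool above: it only recognizes the pattern "expanded form (ACRONYM)" where the initials of the preceding words spell the acronym, and the abstracts and function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

def find_definitions(text):
    """Yield (acronym, expanded form) pairs written as 'expanded form (ACRONYM)'."""
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        words_before = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words_before[-len(acronym):]
        # Accept only when the initials of the preceding words spell the acronym.
        if len(candidate) == len(acronym) and all(
            w[0].upper() == c for w, c in zip(candidate, acronym)
        ):
            yield acronym, " ".join(candidate).lower()

def build_acronym_db(texts):
    """Count every expanded form seen for each acronym across the texts."""
    db = defaultdict(Counter)
    for text in texts:
        for acronym, expansion in find_definitions(text):
            db[acronym][expansion] += 1
    return db

# Hypothetical abstracts, invented for illustration.
ABSTRACTS = [
    "We apply natural language processing (NLP) to clinical notes.",
    "Natural language processing (NLP) supports information retrieval (IR).",
    "Patients received hormone replacement therapy (HRT) for a year.",
]

db = build_acronym_db(ABSTRACTS)
print(db["NLP"].most_common())
```

Keeping a count per expansion matters because the same acronym often has several expanded forms, and frequency is one simple way to rank them.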


FACTA+: a tool that helps discover associations between biomedical concepts contained in MEDLINE articles

http://refine1-nactem.mc.man.ac.uk/facta/


KLEIO: an advanced information retrieval system providing knowledge-enriched searching for biomedicine

http://www.nactem.ac.uk/Kleio/


MEDIE: uses semantic search to retrieve biomedical correlations from MEDLINE

http://www.nactem.ac.uk/medie/


1) Build a miniMEDIE application for Romanian that, given a subject and a verb, identifies possible objects from a previously built corpus.

Use the Romanian POS tagging service at: http://instrumente.infoiasi.ro/WebPosTagger/


Data Mining: What is Data Mining? http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

Microsoft Association Rules: http://e-university.wisdomjobs.com/data-mining/chapter-377-199/microsoft-association-rules.html

Data mining: http://en.wikipedia.org/wiki/Data_mining

Data mining in meteorology: http://en.wikipedia.org/wiki/Data_mining_in_meteorology

Applied data mining: http://en.wikipedia.org/wiki/Category:Applied_data_mining

Self organizing map: http://en.wikipedia.org/wiki/Self-Organizing_Map

GoPubMed: http://www.gopubmed.com/web/gopubmed/

PubGene: http://www.pubgene.org/

Text Mining: http://en.wikipedia.org/wiki/Text_mining

NaCTeM: http://www.nactem.ac.uk/index.php

NaCTeM Brochure: http://www.nactem.ac.uk/brochure/NaCTeM_Brochure.pdf

Text Mining Tutorial: http://eprints.pascal-network.org/archive/00000017/01/Tutorial_Marko.pdf

Text Mining: http://www.cs.sunysb.edu/~cse634/presentations/TextMining.pdf

Text Mining Resources: http://bioinformatics.ualr.edu/resources/links/text_mining_category.html

BioNLP Shared Task: https://sites.google.com/site/bionlpst/
