
II-SDV 2012 Text Mining, Term Mining and Visualization - Improving the Impact of Scholarly...

Page 1: II-SDV 2012 Text Mining, Term Mining and Visualization  - Improving the Impact of Scholarly Publishing

TEXT MINING, TERM MINING, AND VISUALIZATION

IMPROVING THE IMPACT OF SCHOLARLY PUBLISHING

MONDAY 16 APRIL 2012

NICE, FRANCE

Marjorie M.K. Hlava, President

Jay Ven Eman, CEO

Access Innovations, Inc.

[email protected]

[email protected]


What we will cover today

• Term and Text Mining

• The basics of visualization

• Case studies

• Using subject terms as metrics

• Applications

• Visualizing the results


Definitions

• Term Mining – using controlled vocabulary tags in text to find patterns and directions

• Text Mining – a systematic, algorithmic method of comparing and processing text to find patterns

• Term & text mining

− Many similarities

− Can be complementary; not mutually exclusive


Term mining

• Precise

− Meaningful semantic relationships; contextual

− Replicable; repeatable; consistent

− Vetted; controlled

− Based on a controlled vocabulary

− Trends; gaps; relationship analysis; visualizations

− Less data processing load


Text mining

• Algorithmic; formulaic

• Neural nets, statistical, latent semantic, co-occurrence

• Serendipitous relationships

• Sentiment; hot topics; trends

• False drops; noise

• Misleading semantic relationships

• Heavy processing load


Why take a visual look?

• Humans can process information 17 times faster in visual presentations

• Data can now be analyzed, manipulated, and presented as visual displays

• To see the trends effectively, we need to put the data into rich, graphable formats


Visualization of data

• Needs

− Measurement

− Metrics

− Numbers

• Shows

− Adjacency

− Relationships

− Trends

− Co-occurrence

− Conceptual distance

• Is richer with

− Linking

− Semantic enrichment

− Classification

• Supports

− Forecasting

− Trend analysis

− Segmentation

− Distribution


Well Formed Data • Semantic Enrichment • Taxonomies • Access Innovations • Data Harmony

Man’s attention to visual display to convey knowledge is ancient


The art in maps is a longstanding tradition


Superimposing data is now common: a mash-up example

Traffic Injury Map: UK Data Archive, US National Highway Safety Administration, Google Maps base

• Accident categories include children, automobile, bicycle, etc.

• Data: time, place, type

Source: JISC TechWatch: Data Mash-ups, September 2010


Mash up of bird flight migrations and weather patterns

http://www.youtube.com/watch?v=uPff1t4pXiI&feature=youtu.be


http://www.youtube.com/watch?v=nokQBjk1s_8&feature=player_embedded


How does it work?

• Develop controlled vocabulary

» Prefer one with hierarchy

• Apply to full text

» Or to the “heads”

• Decide on data points to convey information

• Divide the XML into graphable sections
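The first two steps above can be sketched in a few lines of Python. This is only an illustration, not the Data Harmony rule base: the `VOCABULARY` table, its terms, and the `tag_text` helper are hypothetical, and matching is plain word lookup against synonyms.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary with hierarchy:
# preferred term -> (broader term or None, synonyms to match in text).
VOCABULARY = {
    "Lasers": ("Optics", ["laser", "lasers"]),
    "Optics": (None, ["optics", "optical"]),
    "Turbines": ("Energy", ["turbine", "turbines"]),
    "Energy": (None, ["energy"]),
}

def tag_text(text):
    """Return a Counter of preferred terms whose synonyms occur in the text."""
    word_counts = Counter(re.findall(r"[a-z]+", text.lower()))
    tags = Counter()
    for term, (_broader, synonyms) in VOCABULARY.items():
        hits = sum(word_counts[s] for s in synonyms)
        if hits:
            tags[term] = hits
    return tags

sample = "Optical pumping of lasers: a laser design for wind turbines."
print(tag_text(sample))
```

The same function could be applied to full text or only to the headings, as the slide suggests; a production indexer would use rules rather than bare word matching.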


Start with data – like this XML file


Index or tag using subject terms from thesaurus or taxonomy

• date, category, taxonomy term, frequency
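As a rough sketch of how indexed XML might be reduced to these four data points, the example below counts taxonomy terms per date and category. The XML layout (`record`, `term`, the `date` and `category` attributes) is an assumption made for illustration; the deck does not show the actual schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical indexed records; element and attribute names are assumptions.
XML = """<records>
  <record date="2010" category="journal"><term>Lasers</term><term>Optics</term></record>
  <record date="2010" category="journal"><term>Lasers</term></record>
  <record date="2011" category="proceedings"><term>Lasers</term></record>
</records>"""

def data_points(xml_text):
    """Return sorted (date, category, taxonomy term, frequency) tuples."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for record in root.iter("record"):
        for term in record.iter("term"):
            key = (record.get("date"), record.get("category"), term.text)
            counts[key] += 1
    return sorted((d, c, t, n) for (d, c, t), n in counts.items())

for row in data_points(XML):
    print(row)
```

Rows in this shape can be pivoted by any of the first three fields, which is one way to get several views from a single data set.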


Many views of one set of data


Load into a visualization program like Prefuse


Or Pajek
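For Pajek, one plausible hand-off is a plain-text `.net` file: a `*Vertices` section of numbered labels followed by an `*Edges` section of weighted pairs. The sketch below writes that layout from a list of term pairs; the `to_pajek` helper and the sample weights are illustrative assumptions, not part of the original workflow.

```python
def to_pajek(edges):
    """Render (term_a, term_b, weight) triples as Pajek .net text."""
    labels = sorted({label for a, b, _w in edges for label in (a, b)})
    # Pajek vertex numbers start at 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {w}" for a, b, w in edges]
    return "\n".join(lines) + "\n"

# Hypothetical co-occurrence weights between taxonomy terms.
sample_edges = [("Lasers", "Optics", 5), ("Optics", "Turbines", 1)]
print(to_pajek(sample_edges))
```

Writing the returned string to a file with a `.net` extension should give something Pajek can open, though labels containing double quotes would need extra handling.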


National Information Center for Educational Media

Albuquerque’s own

» Sandia developed VxInsight

» Access Innovations = NICEM

Same data – several views

Primary and Secondary Education in US

Shows the US Valley of Science

Little Science taught in elementary years


Using visualization to show

• From a society / publisher perspective

» Identify Core, Boundary and Cross Border

» Provides Indicators

− Activity

− Growth

− Relatedness

− Centrality

» Locates Journal domains

• From a thesaurus perspective

» Identifies terms that are too broadly defined

» Potential improvements in thesaurus structure using topic structures


Case Study: Mapping IEEE thesaurus space

• We are interested in an expanded map that includes adjacencies to the IEEE data

» Expanded term set shows adjacent white space; opportunities for expansion

• Overlaps and edges of the science

» We need comparison data

• Learn the directions in the field

» Low occurrence rate in IEEE documents?

» Linkage to terms in IEEE documents?

• Where do we find these terms? How can we add them?


The process

• Built a rule base to auto index IEEE content

» “90% accuracy out of the box on journal data”*

» “80% out of the box on proceedings data”*

• The overlapping data sets

» Auto indexed 1.2 million Xplore records

» Auto indexed 10 years of US Patent data

» Auto indexed 10 years of Medline

• Term sets used

» IEEE thesaurus terms rule base

» Medical Subject Headings (MeSH) (and simple rule base)

» Defense Technical Information Center (DTIC) Thesaurus (and simple rule base)

» Similar level of detail to current IEEE thesaurus terms


Defining expanded term space

1. The data: select related corpus

• IEEE: 2k terms, 1.2M documents

• DTIC: 14k terms, 475k patents

• MeSH: 24k terms, PubMed 525k docs


Defining expanded term space

2. Identify related terms: use the IEEE Thesaurus to index the three collections

[Diagram: IEEE, 2k terms, 1.2M documents]


Defining expanded term space

2. Identify related terms: use MeSH and DTIC to also index the three collections

[Diagram: IEEE, 2k terms, 1.2M documents]


Defining expanded term space

3. Resulting term set: the co-indexed items from the three collections

[Diagram: IEEE, 2k terms, 1.2M documents]


Defining expanded term space

4. Term:Term matrix: where do the articles and their indexing intersect?
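A term:term matrix of this kind can be sketched as a co-occurrence count: every pair of terms assigned to the same document adds one to that pair's cell. The code below is a minimal illustration with made-up documents and terms; it is not the matrix construction used in the IEEE study.

```python
from collections import Counter
from itertools import combinations

def term_term_matrix(doc_terms):
    """Count, for each unordered term pair, the documents indexed with both."""
    matrix = Counter()
    for terms in doc_terms:
        # sorted(set(...)) gives each pair one canonical (a, b) ordering
        # and ignores a term repeated within one document.
        for a, b in combinations(sorted(set(terms)), 2):
            matrix[(a, b)] += 1
    return matrix

# Hypothetical indexing: one list of assigned terms per document.
docs = [
    ["Lasers", "Optics", "Radiology"],
    ["Lasers", "Optics"],
    ["Radiology", "Medical Devices"],
]
matrix = term_term_matrix(docs)
print(matrix[("Lasers", "Optics")])
```

Storing only the upper triangle as a sparse `Counter` keeps the structure small, which matters when the vocabularies run to thousands of terms.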


Visualization Strategies

Matrix

Visualization Software


All data up-posted to the top level
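Up-posting here is taken to mean rolling each term's count up the hierarchy to its top-level ancestor. The sketch below assumes a simple child-to-parent table (`BROADER`) with invented terms; a real thesaurus may allow several broader terms per entry, which this does not handle.

```python
from collections import Counter

# Hypothetical hierarchy: narrower term -> broader term.
BROADER = {"Lasers": "Optics", "Optics": "Physics", "Turbines": "Energy"}

def top_level(term):
    """Follow broader-term links until a term with no parent is reached."""
    while term in BROADER:
        term = BROADER[term]
    return term

def up_post(term_counts):
    """Sum term frequencies under their top-level ancestors."""
    totals = Counter()
    for term, n in term_counts.items():
        totals[top_level(term)] += n
    return dict(totals)

print(up_post({"Lasers": 3, "Optics": 2, "Turbines": 4, "Energy": 1}))
```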


Many map options

IEEE Experience

Previous Experience


IEEE Portfolio

[Figure: map of IEEE societies and councils. Labels include Sensors Council, Nucl Plasma Sci Soc, Nanotech Council, Computer Society, Signal Proc Soc, Commun Soc, Info Theory Soc, Photonics Soc, and others.]


Radial Visualization


Publication Strategy

JASIST reference


Conference Strategy


Use a Thesaurus to Label Maps

[Figure: map labeled with thesaurus terms.

Sector labels: Agriculture; Food; Consumer Products; Construction; Automotive + Defense; Industrial Products; Leisure; Energy; Telecom; Computer HW/SW; Electronics; Chemicals; Pharma; Metals; Health Care.

Detailed labels: Turbines; Measurement; Circuits; Amplifiers; Displays; Games; Toys; Flow; Cooling; Heating; Components; Gearing; Brakes; Dynamics; Vehicles, Parts; Disk; Optics; Photochem; Molding; Conductors; Coatings; Lasers; Lamps; Motors; Plants, Micro-orgs; Control; Boats; Oilfield Services; Med Instruments; Welding; Conveyers; Rubber; Acyclic Comp; Footwear; Lubricants; Radiology; Catalysis; Macromolecules; Sprayers; Electrochem; Fitness; Hygiene; Cleaning; Printing; Paper; IC Engines; Magn/Elect; Magnets; Textiles; Layers; Medical Devices; Clocks; Pipes; Valves; Blasting; Cables; Appliances; Outerwear; Exhaust; Pumps; Packaging; Aircraft; Semiconductors.]


Questions Answered

• Is there a way, using our own information, to forecast our direction?

• Where is the industry headed? What about by technology sector?

• Does our coverage match our mission and vision?

• Can we become smarter about our data and potential markets using our collection in new ways?

• Are the societies publishing and talking about what their charter indicates they cover?

• What are the trends – are topics emerging/cooling?

• Can we use technology and our own data to explore these questions while enhancing our data?
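One simple way to put a number on "emerging" versus "cooling" is to compare a term's share of all indexed documents in the first and last year of a period. The sketch below does exactly that; the `trend` function, the one-percentage-point threshold, and the sample counts are assumptions for illustration, not the method used in the study.

```python
def trend(counts_by_year, totals_by_year, threshold=0.01):
    """Classify a term by the change in its share of documents over time.

    counts_by_year: year -> documents indexed with the term
    totals_by_year: year -> all documents indexed that year
    threshold: minimum change in share treated as a real shift (assumed)
    """
    years = sorted(counts_by_year)
    first, last = years[0], years[-1]
    change = (counts_by_year[last] / totals_by_year[last]
              - counts_by_year[first] / totals_by_year[first])
    if change > threshold:
        return "emerging"
    if change < -threshold:
        return "cooling"
    return "stable"

# Hypothetical yearly counts for one taxonomy term.
totals = {2008: 1000, 2011: 1000}
print(trend({2008: 10, 2011: 40}, totals))
```

Using shares rather than raw counts keeps overall growth of the collection from making every topic look as if it were emerging.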


The research team

• Access Innovations / Data Harmony

» Founded in 1978

» Data enrichment and normalization

» Suite of Semantic Enrichment tools

• SciTechStrategies

» Understanding data through visualization

• IEEE Indexing & Abstracting Group


We looked at visualization of data

• Finding the Metrics

» Measurement

» Numbers

» Terms as indicators

• Ways to show

» Adjacency

» Relationships

» Trends

» Co-occurrence

» Conceptual distance

• How to enrich with

» Linking

» Semantic enrichment

» Classification

• Maps supporting

» Forecasting

» Trend analysis

» Segmentation

» Distribution


Effective maps require

• Contextual data

• Detailed data

• Classification methods

• At least two directions in the matrix

• A little art for fun


It just takes a little imagination

Thank you

Marjorie M.K. Hlava, President

[email protected]

Jay Ven Eman, CEO

[email protected]

Access Innovations

505-998-0800

