Web Intelligence (WI)

Definition, Research Challenges and Major Tools

Yang Chen

UNC Charlotte

Outline

• A brief history of Web Intelligence

• Motivations for WI

• Definition and Perspectives of WI

• Research Agenda

• Major Web Intelligence Tools

• Conclusion

A Brief History of WI

• 1999: Collaborative research initiatives
– Ning Zhong: data mining and knowledge systems
– Jiming Liu: intelligent agents and multi-agent systems
– Yiyu Yao: information retrieval and intelligent information systems

• Combined research efforts with a common goal: to create a new sub-discipline covering the theories and techniques related to Web information.

• 2000: Publication of a two-page position paper on WI (Zhong, Liu, Yao, Ohsuga, COMPSAC 2000)

• 2001: First Asia-Pacific Conference on Web Intelligence

• 2002: Publication of the first special issue on WI in IEEE Computer

• 2002: Web Intelligence Consortium (WIC)

• 2003: First edited book on WI

• 2005: The international WIC Institute

Motivation

• The sheer size of the Web
– Difficulties in storage, management, and efficient, effective retrieval

• Complexity of the Web
– A heterogeneous collection of structured, semi-structured, unstructured, interrelated, and distributed Web documents
– Documents consist of text, images, and sounds

Motivation: Web Intelligence on the Web

Industrial Interests in WI

• Web Intelligence
– kis-lab.com/wi01/

• Web-Intelligence Home Page
– www.web-intelligence.com/

• Intelligence on the Web
– www.fas.org/irp/intelwww.html

• WIN: WEB INTELLIGENCE NETWORK home page
– smarter.net/

• CatchTheWeb - Web Research, Web Intelligence Collaboration
– www.catchtheweb.com/

• Infonoia: Web Intelligence In Your Hands
– www.infonoia.com/myagent/en/baseframe.html

Motivations

• Data on the Web is being produced at an exponential rate.

• Industrial interest in WI is growing fast.

• Only a few academic papers address WI.

• We need to narrow the gap between industry needs and academic research.

What Is Web Intelligence?

• Web Intelligence (WI) explores the fundamental and practical impact that advanced Information Technology (IT) and innovative Artificial Intelligence (AI) will have on the Web:
– Integration of IT with AI
– Applications of AI on the Web

Web Intelligence System

Based on Zhong's AWIC03 keynote talk

An Example

Advanced Questions

• How does the customer enter the VIP portal, and how can we use that to target products and manage promotions and marketing campaigns?

• What is the semantic association between the pages the customer visited?

• Is the visitor familiar with the website's structure, or is he or she a new or random user?

• Is the visitor a Web robot or a human user?

• …
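The robot question above can be approached with simple log heuristics. The Python below is a minimal sketch under stated assumptions: the `Session` record, the user-agent substrings, and the two-requests-per-second threshold are hypothetical choices, not rules from this talk or from any specific WI system.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Hypothetical session record reconstructed from a server log."""
    user_agent: str
    paths: list[str] = field(default_factory=list)  # requested URL paths, in order
    duration_s: float = 0.0                         # time from first to last request

# Assumed user-agent substrings that self-identified crawlers often use.
BOT_UA_HINTS = ("bot", "crawler", "spider", "slurp")

def looks_like_robot(s: Session, max_rate: float = 2.0) -> bool:
    """Return True if any of three illustrative robot signals fires."""
    ua = s.user_agent.lower()
    if any(hint in ua for hint in BOT_UA_HINTS):
        return True   # crawler announces itself in the user agent
    if "/robots.txt" in s.paths:
        return True   # well-behaved robots fetch robots.txt first
    if s.duration_s > 0 and len(s.paths) / s.duration_s > max_rate:
        return True   # requests arrive faster than the assumed human rate
    return False

human = Session("Mozilla/5.0", ["/", "/products", "/vip"], duration_s=40.0)
crawler = Session("ExampleBot/1.0", ["/robots.txt", "/"], duration_s=1.0)
print(looks_like_robot(human), looks_like_robot(crawler))  # False True
```

Each signal alone is easy to evade; the sketch only shows how such signals can be combined per session.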

Advanced WI System

• Making dynamic recommendations to a Web user based on the user's profile and usage behavior;

• Automatically modifying a website's contents and organization;

• Combining Web usage data with marketing data to show how visitors use a website.
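As a rough illustration of the first item (dynamic recommendation from usage behavior), here is a minimal Python sketch that suggests pages that co-occurred with the current session's pages in past sessions. It assumes sessions have already been extracted from the logs as lists of URL paths; the function names and the co-occurrence scoring are illustrative choices, and a real system would also draw on user profiles.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(sessions):
    """Count how often each pair of pages appears in the same past session."""
    co = defaultdict(Counter)
    for pages in sessions:
        for a, b in combinations(sorted(set(pages)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co, current_session, k=2):
    """Score unseen pages by their co-occurrence with pages already visited."""
    seen = set(current_session)
    scores = Counter()
    for page in seen:
        for other, n in co[page].items():
            if other not in seen:
                scores[other] += n
    return [page for page, _ in scores.most_common(k)]

# Hypothetical past sessions (URL paths).
logs = [["/home", "/shoes", "/socks"],
        ["/home", "/shoes", "/laces"],
        ["/shoes", "/socks"]]
co = build_cooccurrence(logs)
print(recommend(co, ["/home", "/shoes"], k=1))  # ['/socks']
```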

Advanced WI System

Perspectives of WI

• WI can be classified into four categories (based on Russell & Norvig's scheme)

Research Agenda of WI

• Semantic Web mining and automatic construction of ontologies

• Social network intelligence

The Semantic Web

• The Semantic Web is based on languages that make more of the semantic content of the page available in machine-readable formats for agent-based computing.

A “semantic” language ties the information on a page to machine-readable semantics (an ontology).

Components of Semantic Web

• A unifying data model such as RDF.

• Languages with defined semantics, built on RDF, such as OWL (which grew out of DAML+OIL).

• Ontologies of standardized terminology for marking up Web resources.

• Tools that assist in generating and processing semantic markup.
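To make the RDF data model concrete: RDF describes resources as subject, predicate, object triples. The Python below is a toy sketch that stores a few triples in a set and answers wildcard pattern queries. It is not an RDF library; `dc:title`, `dc:creator`, `rdf:type`, and `foaf:Person` are real vocabulary terms shortened to prefixed names, while `ex:page42` and `ex:alice` are hypothetical resources.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements about a Web page, using prefixed names instead of full URIs.
graph: set[Triple] = {
    ("ex:page42", "dc:title", "Web Intelligence"),
    ("ex:page42", "dc:creator", "ex:alice"),
    ("ex:alice", "rdf:type", "foaf:Person"),
}

def match(g, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in g
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# An agent can ask "who created ex:page42?" without parsing any HTML.
print(match(graph, s="ex:page42", p="dc:creator"))
# [('ex:page42', 'dc:creator', 'ex:alice')]
```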

Ontologies provide the semantic backbone for Semantic Web applications.

Ontologies offer

• Communication
– Normative models, networks of relationships

• Sharing and reuse
– Specifications, reliability

• Control
– Classification; finding, sharing, and discovering relationships

Categories of Ontologies

• A domain-specific ontology describes a well-defined technical or business domain.

• A task ontology might be either domain-specific or assembled from a set of domain-specific ontologies to meet the requirements of a task.

• A universal ontology describes knowledge at higher, more general levels.
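Ontologies support classification by arranging concepts in hierarchies, from domain-specific classes up to universal ones. A minimal sketch, assuming a hypothetical retail ontology encoded as a Python dictionary in the spirit of `rdfs:subClassOf`, shows how following subclass links transitively lets an agent infer that a running shoe is a product.

```python
# Hypothetical mini-ontology: class -> direct superclasses (rdfs:subClassOf style).
SUBCLASS_OF = {
    "RunningShoe": ["Shoe"],   # domain-specific (retail) concepts
    "Shoe": ["Product"],
    "Customer": ["Person"],
    "Product": ["Thing"],      # "Thing" plays the role of a universal concept
    "Person": ["Thing"],
}

def superclasses(cls):
    """All ancestors of cls, following subClassOf links transitively."""
    result, stack = set(), list(SUBCLASS_OF.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, []))
    return result

def is_a(cls, ancestor):
    """True if cls equals ancestor or is (indirectly) a subclass of it."""
    return cls == ancestor or ancestor in superclasses(cls)

print(sorted(superclasses("RunningShoe")))  # ['Product', 'Shoe', 'Thing']
print(is_a("Customer", "Product"))          # False
```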

The Web as a Graph

• We can view the Web as a directed social network that connects people, organizations, and other social entities.

• Research questions:
– How big is the graph? (out-degree and in-degree)
– Can we browse from any page to any other? (clicks)
– Can we exploit the structure of the Web? (searching and mining)
– How can we discover and manage Web communities?
– What does the Web graph reveal about social dynamics?
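The first two research questions (degrees and click distance) can be computed directly on a link graph. The Python below is a minimal sketch that assumes a small hypothetical adjacency list held in memory; a real Web graph would need crawled data and far more scalable processing.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],   # d links out, but no page links to d
}

def degrees(links):
    """Return (out_degree, in_degree) dictionaries for every page."""
    out_deg = {p: len(targets) for p, targets in links.items()}
    in_deg = {p: 0 for p in links}
    for targets in links.values():
        for t in targets:
            in_deg[t] = in_deg.get(t, 0) + 1
    return out_deg, in_deg

def clicks(links, start, goal):
    """Fewest clicks from start to goal via breadth-first search, or None."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            return dist[page]
        for nxt in links.get(page, []):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return None

out_deg, in_deg = degrees(LINKS)
print(in_deg)                                        # {'a': 1, 'b': 1, 'c': 3, 'd': 0}
print(clicks(LINKS, "b", "a"), clicks(LINKS, "a", "d"))  # 2 None
```

The `None` result for `d` shows concretely why "can we browse from any page to any other?" is a real question: directed links need not be reciprocated.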

Social Network Intelligence

Social Network

Major Web Intelligence Tools

• I. Collection
– Offline Explorer
– SpidersRUs (AI Lab)
– Google Scholar

• II. Analysis (Data and Text Mining)
– Google APIs
– Google Translation
– GATE
– Arizona Noun Phraser (AI Lab)
– Self-Organizing Map, SOM (AI Lab)
– Weka

• III. Visualization
– NetDraw
– JUNG
– Analyst's Notebook and Starlight

Collection: Offline Explorer

Screenshot callouts: project list; project properties setup window; file filters, URL filters, and other advanced properties; download URLs; download level; file modification check.
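The download-level and filter settings above amount to a depth-limited, filtered crawl. The Python below is not Offline Explorer's implementation; it is a sketch that assumes "download level" means the maximum number of links followed from the start URL, replaces HTTP fetching with a hypothetical in-memory `FAKE_WEB` dictionary, and uses a host check and a file-extension check as stand-ins for URL and file filters.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for the Web: page URL -> links found on that page.
FAKE_WEB = {
    "http://example.org/": ["/docs/", "/about", "http://other.net/"],
    "http://example.org/docs/": ["intro.html", "big.zip"],
    "http://example.org/docs/intro.html": ["/docs/deep/"],
    "http://example.org/about": [],
}

def crawl(start, max_level, allowed_host, skip_ext=(".zip",)):
    """Breadth-first crawl at most max_level links away from start."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, level = queue.popleft()
        order.append(url)                 # "download" the page
        if level == max_level:
            continue                      # download level reached: do not follow links
        for href in FAKE_WEB.get(url, []):
            target = urljoin(url, href)   # resolve relative links
            if urlparse(target).netloc != allowed_host:
                continue                  # URL filter: stay on one host
            if target.endswith(skip_ext):
                continue                  # file filter: skip unwanted file types
            if target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return order

print(crawl("http://example.org/", 1, "example.org"))
# ['http://example.org/', 'http://example.org/docs/', 'http://example.org/about']
```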

Analysis: Google APIs

• Google provides many APIs to help you quickly develop your own applications.

http://code.google.com/more/

• Examples of Google APIs:
– Google API for Inlink: Discovers which pages link to your website.
– Google Data APIs: Provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Base, Blogger, Google Calendar, Google Spreadsheets, and Picasa Web Albums.
– Google AJAX Search API: Uses JavaScript to embed a simple, dynamic Google search box and display search results in your own Web pages.
– Google Analytics: Allows users to gather, view, and analyze data about their website traffic. Users can see which content gets the most visits, as well as average page views and time on site per visit.
– Google Safe Browsing APIs: Allow client applications to check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.
– YouTube Data API: Integrates online videos from YouTube into your applications.

GATE (General Architecture for Text Engineering)

• Information Extraction tasks:
– Named Entity Recognition (NE): finds names, places, dates, etc.
– Co-reference Resolution (CO): identifies identity relations between entities in texts.
– Template Element Construction (TE): adds descriptive information to NE results (using CO).
– Template Relation Construction (TR): finds relations between TE entities.
– Scenario Template Production (ST): fits TE and TR results into specified event scenarios.

• GATE also includes:
– Parsers, stemmers, and Information Retrieval tools;
– Tools for visualizing and manipulating ontologies; and
– Evaluation and benchmarking tools.
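To illustrate the first IE task (NE), here is a minimal gazetteer-plus-pattern recognizer in Python. GATE's own ANNIE pipeline combines gazetteer lookup with JAPE grammar rules and much larger resources; the lists, the year-only date pattern, and `find_entities` below are hypothetical simplifications, not GATE's API.

```python
import re

# Hypothetical gazetteer lists (tiny, for illustration only).
GAZETTEER = {
    "Person": ["Ning Zhong", "Jiming Liu", "Yiyu Yao"],
    "Location": ["Charlotte", "Arizona", "Irvine"],
}
# Assumed date rule: four-digit years 1900-2099 only, for brevity.
DATE_RE = re.compile(r"\b(?:19|20)\d{2}\b")

def find_entities(text):
    """Return (start, end, type, string) annotations sorted by position."""
    anns = []
    for etype, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                anns.append((m.start(), m.end(), etype, m.group()))
    for m in DATE_RE.finditer(text):
        anns.append((m.start(), m.end(), "Date", m.group()))
    return sorted(anns)

text = "Ning Zhong visited Charlotte in 2003."
print([(a[2], a[3]) for a in find_entities(text)])
# [('Person', 'Ning Zhong'), ('Location', 'Charlotte'), ('Date', '2003')]
```

Like GATE's annotations, each result records character offsets, so later stages (CO, TE) can refer back to exact spans.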

GATE

Screenshot callouts: results display; attributes; project information.

SOM

• The multi-level self-organizing map neural network algorithm was developed by the Artificial Intelligence Lab at the University of Arizona.
– On a 2D map display, topics with similar co-occurrence patterns are positioned closer together; more important topics occupy larger regions.
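The AI Lab's multi-level variant is not specified here, so the Python below sketches only the basic single-level Kohonen SOM that such maps build on: each input pulls its best-matching grid node, and that node's neighbors, toward itself, so similar inputs end up in nearby map regions. The linear decay schedule, Gaussian neighborhood, and default parameters are assumed textbook choices.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda n: sum((w - xi) ** 2 for w, xi in zip(weights[n], x)))

def train_som(data, rows, cols, epochs=200, lr0=0.5, radius0=None, seed=0):
    """Basic single-level SOM on a rows x cols grid (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2
    # Random initial weight vector for every grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks, with a floor
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2 * radius ** 2))  # Gaussian neighborhood
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])   # pull the node toward the input
    return weights

# Hypothetical term vectors for four documents (two rough topics).
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]
som = train_som(docs, rows=3, cols=3)
print([best_matching_unit(som, d) for d in docs])
```

After training, documents with similar term vectors typically map to the same or adjacent grid nodes, which is what lets a 2D display place related topics near each other.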

SOM

Figure callouts: topic regions, each labeled with its topic and the number of documents belonging to that topic; different topics occupy different regions; warm colors represent new topics.

Visualization: JUNG

• The Java Universal Network/Graph Framework (JUNG) is a software library for the modeling, analysis, and visualization of data that can be represented as a graph or network. It was developed at the School of Information and Computer Science at the University of California, Irvine. http://jung.sourceforge.net/index.html

• The current distribution of JUNG includes implementations of a number of algorithms from graph theory, data mining, and social network analysis:
– Clustering
– Decomposition
– Optimization
– Random graph generation
– Statistical analysis
– Calculation of network distances, flows, and importance measures (centrality, PageRank, HITS, etc.)
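JUNG is a Java library, and the Python below does not use its API; it is a minimal power-iteration sketch of PageRank, one of the importance measures listed above. The damping factor of 0.85 and the fixed 50 iterations are common but assumed defaults, and pages with no out-links are assumed to spread their rank uniformly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform distribution
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}   # random-jump share
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share             # pass rank along each out-link
            else:
                for q in pages:                 # dangling page: spread uniformly
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical graph: two pages both link to c, which links nowhere.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
print(max(ranks, key=ranks.get))  # c
```

Because every page's rank is redistributed each round, the scores keep summing to 1 and can be read as the share of time a random surfer spends on each page.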

JUNG

Examples of visualization types

Conclusion

• The marriage of hypertext and the Internet led to a revolution: the Web.

• The marriage of Artificial Intelligence and advanced Information Technology, on the platform of the Web, will lead to another paradigm shift: the Intelligent and Wisdom Web.

Thank You

Any Questions?

