
2010-04-29-swnj-pcls-presentation

Date post: 11-Jul-2015
Upload: douglas-randall
Phonotonetic Chinese Language System and the Semantic Web Phonotonetic Chinese Language Institute April 29, 2010 Douglas R. Donahue
Transcript
Page 1: 2010-04-29-swnj-pcls-presentation

Phonotonetic Chinese Language System and the

Semantic Web

Phonotonetic Chinese Language Institute

April 29, 2010

Douglas R. Donahue

Page 2: 2010-04-29-swnj-pcls-presentation

Orthographic Types

Global Writing System Methods

Alphabetic based

Logographic based

Syllabary based

Abjad based

Abugida based

“A different language is a different vision of life.”

Federico Fellini

Page 3: 2010-04-29-swnj-pcls-presentation

Linguistic Word Order Patterns

SVO: Subject-Verb-Object (English, Mandarin Chinese)

VSO: Verb-Subject-Object (Celtic languages such as Irish and Welsh, Hawaiian)

SOV: Subject-Object-Verb (Hindi, Persian, Latin, Korean)

VOS: Verb-Object-Subject (Fijian, Malagasy)

OVS: Object-Verb-Subject (Hixkaryana)

OSV: Object-Subject-Verb (Xavante, Warao)

Page 4: 2010-04-29-swnj-pcls-presentation

Chinese Orthographies

Logographs
− Traditional
− Simplified

Transliteration Methods
− Hanyu Pinyin, Zhuyin Fuhao, Wade-Giles
− Logographic searches take place according to phonetic soundings

Radical Collation: Primary Chinese character dictionary ordering rules, which enable users to locate, compare, sort, merge, etc. written logographs

Stroke Order: Minor primary ordering classifier

Page 5: 2010-04-29-swnj-pcls-presentation

Logographic Transliterative Side Effects

Homophones: Words that are pronounced the same as another word but differ in meaning.

Heterographs: Homophones that are spelled differently.

Homographs: Words that share the same spelling but have different meanings.

Homonyms: Words that share the same spelling and the same pronunciation but have different meanings.

Heterophones: Identically written words that have different pronunciations and meanings.
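
The transliteration side effect described above can be illustrated with a minimal sketch. It assumes a small hand-picked sample of common characters and their Hanyu Pinyin readings with tone numbers; the `SAMPLE` list and the `group_by` helper are hypothetical names introduced for illustration only and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical sample: (character, pinyin syllable, tone number 1-4).
SAMPLE = [
    ("妈", "ma", 1),
    ("麻", "ma", 2),
    ("马", "ma", 3),
    ("骂", "ma", 4),
    ("是", "shi", 4),
    ("事", "shi", 4),
    ("市", "shi", 4),
]


def group_by(entries, with_tone):
    """Group characters by their transliteration, optionally keeping the tone."""
    groups = defaultdict(list)
    for char, syllable, tone in entries:
        key = f"{syllable}{tone}" if with_tone else syllable
        groups[key].append(char)
    return dict(groups)


# Dropping the tone collapses four distinct words onto the key "ma".
toneless = group_by(SAMPLE, with_tone=False)
# Keeping the tone separates the "ma" words, but "shi4" is still shared
# by three different characters (true homophones).
toned = group_by(SAMPLE, with_tone=True)

print(toneless)
print(toned)
```

Recording the tone removes part of the ambiguity, while characters that share both syllable and tone still need the logograph itself to be told apart.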

Page 6: 2010-04-29-swnj-pcls-presentation

Phonetic Chinese Language System

A Chinese language specific alphabet and dictionary

1-to-1 correspondence between the spoken sounds and written symbols for words of the Mandarin Chinese language

A bridge between the phonological (alphabetic) and ideographic orthographic types

Provides lexicographical (a.k.a. dictionary) ordering of both Mandarin words and logographs (via transliteration)

Provides utilitarian collation of written Mandarin Chinese, for both human and machine

Enables software written in Western languages to work directly with Mandarin

Extends existing Chinese orthographies, and works in union with them

Page 7: 2010-04-29-swnj-pcls-presentation

Lexicographical Order

Alphabetic order; Dictionary order

The natural order on the Cartesian product of two ordered sets, i.e.

(a,b) ≤ (a′,b′) if and only if a < a′ or (a = a′ and b ≤ b′)
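
As a minimal sketch of the ordering rule for pairs, the function below spells out the comparison and checks it against Python's built-in tuple ordering, which is also lexicographic. The name `lex_le` and the sample pairs (a syllable with a tone number) are assumptions made for illustration.

```python
def lex_le(p, q):
    """Return True if pair p = (a, b) precedes or equals pair q = (a2, b2)."""
    a, b = p
    a2, b2 = q
    # The first components decide unless they are equal;
    # only then are the second components compared.
    return a < a2 or (a == a2 and b <= b2)


# Hypothetical pairs: (syllable, tone number).
pairs = [("ma", 3), ("li", 4), ("ma", 1), ("li", 2)]

# Python compares tuples lexicographically, so sorted() applies the same rule.
ordered = sorted(pairs)
print(ordered)
```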

Page 8: 2010-04-29-swnj-pcls-presentation

WWW Temporal Characteristic

• The Internet
• WWW v1 & v2
• Social Web
• Semantic Web
• Ubiquitous Web

• (International) Domain Name System

• RDF
• OWL
• RSS
• I18N Ontologies
• I18N Datasets

Page 9: 2010-04-29-swnj-pcls-presentation

Finding Information

Collation: The assembly of written information into a standard order. Collating lists of words or names into alphabetical order is the basis of most office filing systems, library catalogs and reference books.

Classification: Concerned with arranging information into logical categories, while collation is concerned with the ordering of those categories.

Collation algorithm: A process which defines the order involved with the comparison of two values (e.g. the "Unicode collation algorithm")

Sorting algorithm: A procedure to put a list of items in the order specified by the associated 'rules' of collation
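
The split between a collation algorithm (how two values compare) and a sorting algorithm (how a list is put in that order) can be sketched as follows. The `collate` rule here is a hypothetical, deliberately simple one (case-insensitive first, raw code points as a tie-breaker); it is not the Unicode collation algorithm.

```python
from functools import cmp_to_key


def collate(x, y):
    """Hypothetical collation rule: negative, zero or positive like a comparator."""
    kx, ky = x.casefold(), y.casefold()
    if kx != ky:
        return -1 if kx < ky else 1
    # Tie-breaker: fall back to raw code point order.
    return (x > y) - (x < y)


names = ["beta", "Alpha", "alpha", "Beta", "gamma"]

# sorted() is the sorting algorithm; collate() supplies the ordering rule.
collated = sorted(names, key=cmp_to_key(collate))
# Without a collation rule, Python orders strings by code point,
# which places every uppercase letter before every lowercase one.
raw = sorted(names)

print(collated)
print(raw)
```

The same sorting procedure yields different results once the collation rule changes, which is why the two are defined separately.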

Page 10: 2010-04-29-swnj-pcls-presentation

Structured Data

Structured Data: particular ways of organizing data in a computer for efficient use. Essential ingredients that make the management of huge amounts of data possible; i.e. databases and (internet) indexing services

Lists, Sets, Queues, Maps

DiGraph: An abstract data type which represents a relationship or connection. Consists of two types of elements, namely vertices and edges.

Edges: Vertex (endpoint) connectors.

Vertices: Graph element endpoints
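
A directed graph of this kind can be sketched with an adjacency map. The `DiGraph` class below and its method names are assumptions for illustration, not an API referenced by the slides.

```python
from collections import defaultdict


class DiGraph:
    """Minimal directed graph: vertices are endpoints, edges connect them."""

    def __init__(self):
        # Maps each vertex to the set of vertices its outgoing edges point to.
        self._succ = defaultdict(set)

    def add_edge(self, source, target):
        self._succ[source].add(target)
        # Touch the target so that it is registered as a vertex as well.
        self._succ[target]

    def vertices(self):
        return set(self._succ)

    def edges(self):
        return {(s, t) for s, targets in self._succ.items() for t in targets}

    def successors(self, vertex):
        return set(self._succ.get(vertex, ()))


g = DiGraph()
g.add_edge("logograph", "transliteration")
g.add_edge("transliteration", "spoken sound")

print(sorted(g.vertices()))
print(sorted(g.edges()))
```

Because edges are directed, ("logograph", "transliteration") is present while the reverse pair is not.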

Page 11: 2010-04-29-swnj-pcls-presentation

Structured Web

Information on the Web is becoming more structured.

Information on the Semantic Web MUST become more structured!

Movements toward structure include:

The rise of APIs

The proliferation of vertical applications that run on top of existing data

An increase in classic Semantic Technologies and Microformats

The spread of RSS as an information delivery mechanism

Page 12: 2010-04-29-swnj-pcls-presentation

Resource Description Framework (RDF)

A method employed to describe conceptual understanding through a modeling approach involving information represented by Web associated Resources

Employs the use of URIs to make statements about Web associated Resources

Employs the use of URIs to make statements about Resources that are of interest in the real world

Standard Data Model for the exchange of data on The Web

Various languages are used to express the modelled data

A Directed Graph is employed whose elements consist of a pair of vertices and an edge
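
The graph model above, where each statement joins two vertices with one labelled edge, can be sketched in plain Python as a set of (subject, predicate, object) triples. The `example.org` URIs, the `Triple` type and the `match` helper are hypothetical; the two predicate URIs use the FOAF vocabulary namespace. A real application would more likely use an RDF library than this hand-rolled structure.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # vertex: the resource being described
    predicate: str  # edge label: a URI naming the relationship
    obj: str        # vertex: another resource or a literal value


# Hypothetical resources, for illustration only.
ALICE = "http://example.org/people#alice"
BOB = "http://example.org/people#bob"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

graph = {
    Triple(ALICE, FOAF_NAME, "Alice"),
    Triple(ALICE, FOAF_KNOWS, BOB),
    Triple(BOB, FOAF_NAME, "Bob"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


for triple in match(graph, s=ALICE):
    print(triple)
```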

Page 13: 2010-04-29-swnj-pcls-presentation

Semantic Web

• Storing data on the network is not sufficient

• Ability to sieve information from network resident datastores

• Transforming network resident datastores to knowledge

• Converting network resident knowledge to action

Page 14: 2010-04-29-swnj-pcls-presentation

Internationalized RDF

http://odoncaoa.orgfree.com/foaf.rdf#odoncaoa ifoaf:name_cn http://唐道革.中国
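
An internationalized hostname like the one in the object URI above is carried in the Domain Name System in an ASCII-compatible form. The sketch below assumes the hostname is meant to read 唐道革.中国 (the extracted slide text contains stray spaces) and uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
# Assumed hostname, read from the object URI on the slide.
host = "唐道革.中国"

# Each label is converted separately to an ASCII form with the "xn--" prefix.
ascii_host = host.encode("idna")

# Decoding restores the original Unicode labels.
roundtrip = ascii_host.decode("idna")

print(ascii_host)
print(roundtrip)
```

The RDF statement can keep the readable Unicode form while resolvers work with the ASCII form.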

Page 15: 2010-04-29-swnj-pcls-presentation

Data models

• Description of how data are represented and accessed via a computational system.

• Formal definition of the computational system's data elements and their relations, within the domain

• Wayfinding mechanism that employs a set of symbols and text to explain an information subset and so improve communication

Page 16: 2010-04-29-swnj-pcls-presentation

Giant Global Graph

Not a separate Web but an extension of the current WWW

Information is given well-defined structure and meaning

Facilitate a more automated networking environment, where computers can operate more autonomously on behalf of people

A Structured Web, akin to an immense DB

A common framework enabling data to be shared, and reused across application, enterprise and community boundaries

Page 17: 2010-04-29-swnj-pcls-presentation

Hash Tables/Maps

• Data Structure for collections involving unique IDs

• Most prevalent data structure in use to perform common key/value paired searches

• Structured data mechanism which supports the management of the concepts and relationships used to describe, and represent knowledge areas via ontologies
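
A minimal sketch of the key/value idea, using Python's built-in `dict` as the hash map. The concept IDs, labels and the `broader_chain` helper are hypothetical and only hint at how ontology concepts and their relationships could be looked up by unique ID.

```python
# Hypothetical concept index: unique ID -> concept record.
concepts = {
    "c1": {"label": "Logograph", "broader": None},
    "c2": {"label": "Traditional character", "broader": "c1"},
    "c3": {"label": "Simplified character", "broader": "c1"},
}


def broader_chain(concept_id, index):
    """Follow 'broader' links from a concept up to its root, collecting labels."""
    labels = []
    while concept_id is not None:
        record = index[concept_id]  # average-case constant-time lookup by key
        labels.append(record["label"])
        concept_id = record["broader"]
    return labels


print(broader_chain("c2", concepts))
```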

Page 18: 2010-04-29-swnj-pcls-presentation

PCL System

• PCLS ameliorates the indeterminism generated as a transliterative side effect

• Sorting
• Searching
• Indexing
• Lexicographical Order
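
The four operations listed above can be sketched together on a tiny index. The slides do not spell out the PCLS alphabet, so this example stands in Hanyu Pinyin with tone numbers as the transliteration key; that substitution, the `ENTRIES` sample and the `lookup` helper are all assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample: (character, pinyin syllable, tone number).
ENTRIES = [
    ("马", "ma", 3),
    ("力", "li", 4),
    ("妈", "ma", 1),
    ("里", "li", 3),
    ("骂", "ma", 4),
]

# Indexing: sort the entries lexicographically by (syllable, tone).
index = sorted((syllable, tone, char) for char, syllable, tone in ENTRIES)
keys = [(syllable, tone) for syllable, tone, _ in index]


def lookup(syllable, tone):
    """Searching: binary search over the sorted keys."""
    lo = bisect_left(keys, (syllable, tone))
    hi = bisect_right(keys, (syllable, tone))
    return [char for _, _, char in index[lo:hi]]


print([char for _, _, char in index])
print(lookup("ma", 3))
```

Because the key includes the tone, characters that would collide under a toneless spelling receive distinct, sortable positions.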

Page 19: 2010-04-29-swnj-pcls-presentation

Orthographic Relations

Page 20: 2010-04-29-swnj-pcls-presentation

Conclusion

Metcalfe's Law

Formulated by Robert Metcalfe in regard to Ethernet

Explains many network effects involving human use of communication technologies, e.g. the Internet, Social Networking, and the World Wide Web

More of a heuristic or metaphor than an iron-clad empirical rule.

The social utility of a network depends upon the number of nodes in contact.

If English-speaking Americans and Mandarin-speaking Chinese users don't understand each other, the utility of the network of users speaking the other language is zero, and the heuristic has to be calculated for the two networks separately.
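
The closing argument can be made concrete with a short calculation. It assumes the common statement of Metcalfe's Law, in which a network's value grows in proportion to the square of its number of users; the user counts are made-up figures chosen only to show the effect.

```python
def metcalfe_value(users):
    """Heuristic network value, proportional to the square of the user count."""
    return users * users


# Hypothetical user counts for two groups that cannot understand each other.
english_users = 300
mandarin_users = 700

# Calculated separately, as the slide describes.
separate = metcalfe_value(english_users) + metcalfe_value(mandarin_users)
# Calculated as one network, if the language barrier were bridged.
combined = metcalfe_value(english_users + mandarin_users)

print(separate, combined)
```

Under these assumptions the bridged network scores 1,000,000 against 580,000 for the two isolated ones; the gap is the value of the cross-language connections.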

