Big Open Data and Semantics for a Real-World Application Near You

Dr. Biplav Srivastava, IBM Research – India

Keynote Talk at AMECSE 2014 on 21 October 2014

The Distinguished Speakers Program is made possible by

For additional information, please visit http://dsp.acm.org/

About ACM

ACM, the Association for Computing Machinery, is the world’s largest educational and scientific computing society, uniting educators, researchers and professionals to inspire dialogue, share resources and address the field’s challenges.

ACM strengthens the computing profession’s collective voice through strong leadership, promotion of the highest standards, and recognition of technical excellence.

ACM supports the professional growth of its members by providing opportunities for life-long learning, career development, and professional networking.

With over 100,000 members from over 100 countries, ACM works to advance computing as a science and a profession. www.acm.org

Real-World Applications of ICT: Ingredients

• Data – Available, Consumable with Semantics, Visualization / Analysis

• Access – APIs, Apps (Applications), Usability – Human Computer Interface

• Value – Providing benefits that matter to the people who need them most, in a timely and cost-efficient manner. Going beyond technology to process and people aspects.

Running Example – Data from Conference

• Data – Technical Program

• Access – Website

• Value – To participants, organizers and wider ecosystem

Thought: Can any real-world application immediately benefit from data created at this event?

Outline

• “Big Result”
   • IBM’s Watson Q-A System: Intersection of Big Data, Analytics and Human Computer Interaction
• “Small Problem” – do it repeatedly and rapidly for key city services
   • Data challenge: Make data available freely; Give semantics to data
      • Open: World Wide Web Consortium, Data.gov movement
      • Semantic: Linked Open Data, Ontologies
   • Access – APIs: standards-based access, composition
   • Value – application challenge: Give benefit to citizens; create business opportunities
• Emerging Examples of Societal Applications with Analytical (AI) Techniques and Open Government Data
   • Tourism: attract people to visit for new experiences and spend their money as well
   • Traffic: make public transportation attractive for commuting even without physical sensors
   • Corruption: predictable, uniform, public services
   • Public Health (covered more later in panel): reduce disease impact
   • Not covered: Environment, Water, Public Safety, Energy, …
• Call for action
   • Make your data available in a usable manner
   • Use more open data in your ongoing work (apps, research, monitoring, …)
   • Build apps and make them available to citizens and other stakeholders

Big Result: Watson


Slides Courtesy: IBM Watson Team

Technical details: Ferrucci, D., et al. (2010), "Building Watson: An Overview of the DeepQA Project", AI Magazine 31(3).

Want to Play Chess or Just Chat?

• Chess
   • A finite, mathematically well-defined search space
   • Limited number of moves and states
   • All the symbols are completely grounded in the mathematical rules of the game

• Human Language
   • Words by themselves have no meaning
   • They are only grounded in human cognition
   • Words navigate, align and communicate an infinite space of intended meaning
   • Computers cannot ground words to human experiences to derive meaning

“Built on IBM's DeepQA technology for hypothesis generation, massive evidence gathering, analysis, and scoring” – IBM

IBM’s Watson is an emerging technology at the intersection of Big Data, Analytics and Human / Computer Interaction trends

IBM's Watson: A HorizonWatching Trend Report


“Watson is an artificial intelligence computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project” – Wikipedia

“An application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, and Machine Learning technologies to the field of open domain question answering” – IBM

Wikipedia Definition

IBM Definition

Enabling Technology Areas
• Natural Language Processing
• Semantic Analysis
• Information Retrieval
• Automated Reasoning
• Machine Learning

http://www.youtube.com/watch?v=dQmuETLeQcg

Video: What is Watson?

“DeepQA is an effective and extensible architecture that can be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of question answering (QA)” – AI Magazine

AI Magazine

Easy Questions?

(ln(12,546,798 × π))² / 34,567.46 = 0.00885

Owner | Serial Number
David Jones | 45322190-AK

Serial Number | Type | Invoice #
45322190-AK | LapTop | INV10895

Invoice # | Vendor | Payment
INV10895 | MyBuy | $104.56

Select Payment where Owner=“David Jones” and Type(Product)=“Laptop”

David Jones = David Jones

David Jones ≠ Dave Jones
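The slide's point can be made concrete. The minimal Python sketch below assumes the three tables are held as plain dictionaries (the names `owners`, `products`, `invoices` and `payment_for` are hypothetical). It evaluates the arithmetic exactly and answers the structured query, but only when the owner's name matches character for character: "Dave Jones" finds nothing, even though a human reader knows it is the same person.

```python
import math

# The slide's arithmetic: the natural log of (12,546,798 * pi) is squared,
# then divided by 34,567.46.
value = math.log(12_546_798 * math.pi) ** 2 / 34_567.46

# The slide's three tables, keyed on the column each join uses (hypothetical layout).
owners = {"David Jones": "45322190-AK"}             # Owner -> Serial Number
products = {"45322190-AK": ("LapTop", "INV10895")}  # Serial Number -> (Type, Invoice #)
invoices = {"INV10895": ("MyBuy", 104.56)}          # Invoice # -> (Vendor, Payment)


def payment_for(owner, product_type):
    """Select Payment where Owner = owner and Type(Product) = product_type."""
    serial = owners.get(owner)  # exact string match: "Dave Jones" is a different key
    if serial is None:
        return None
    ptype, invoice = products[serial]
    # Assumption: the type comparison ignores case, so "Laptop" matches "LapTop".
    if ptype.casefold() != product_type.casefold():
        return None
    _vendor, payment = invoices[invoice]
    return payment


print(f"{value:.5f}")
print(payment_for("David Jones", "Laptop"))
print(payment_for("Dave Jones", "Laptop"))
```

The computer is exact where the rules are explicit and helpless where meaning depends on context, which is the contrast the next slide draws.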

Hard Questions?

Computer programs are natively explicit, fast and exacting in their calculation over numbers and symbols… But Natural Language is implicit, highly contextual, ambiguous and often imprecise.

Unstructured:

• Where was X born? One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein's birthplace.

• X ran this? If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.

Structured:

Person | Birth Place
A. Einstein | Ulm

Person | Organization
J. Welch | GE

The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions

• Broad/Open Domain
• Complex Language
• High Precision
• Accurate Confidence
• High Speed

$600: In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus

$200: If you're standing, it's the direction you should look to check out the wainscoting.

$2000: Of the 4 countries in the world that the U.S. does not have diplomatic relations with, the one that’s farthest north

$1000: The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author.

Basic Game Play

6 Categories (Technology, Classics, The Great Outdoors, Speak of the Dickens, Mind Your Manners, Before and After) × 5 Levels of Difficulty ($200, $400, $600, $800, $1000)

• 1 of 3 Players Selects a Clue
• Host reads Clue out loud, e.g., TECHNOLOGY: ALL POLICEMEN CAN THANK STEPHANIE KWOLEK FOR HER INVENTION OF THIS POLYMER FIBER, 5 TIMES TOUGHER THAN STEEL
• All Players compete to answer
• 1st to buzz-in gets to answer
• IF correct
   • earns $ value
   • selects Next Clue
• IF wrong
   • loses $ value
   • other players buzz again (rebounds)
• Two Rounds Per Game + Final Question
• ONE Daily Double in First Round, TWO in 2nd Round

Broad Domain

[Figure: frequency of answer types in Jeopardy! questions; y-axis 0.00% to 3.00% of questions; x-axis answer types: he, film, group, capital, woman, song, singer, show, composer, title, fruit, planet, there, person, language, holiday, color, place, son, tree, line, product, birds, animals, site, lady, province, dog, substance, insect, way, founder, senator, form, disease, someone, maker, father, words, object, writer, novelist, heroine, dish, post, month, vegetable, sign, countries, hat, bay]

Our focus is on reusable NLP technology for analyzing volumes of as-is text. Structured sources (DBs and KBs) are used to help interpret the text.

We do NOT attempt to anticipate all questions and build specialized databases.

In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent type occurs less than 3% of the time. The distribution has a very long tail, and for each of these types, thousands of different things may be asked.

*13% are non-distinct (e.g., it, this, these or NA)

Even going for the head of the tail will barely make a dent.

DeepQA: The Technology Behind Watson. A Massively Parallel, Probabilistic, Evidence-Based Architecture

It generates and scores many hypotheses using a combination of thousands of Natural Language Processing, Information Retrieval, Machine Learning and Reasoning algorithms.

These gather, evaluate, weigh and balance different types of evidence to deliver the answer with the best support they can find.

[Figure: DeepQA architecture. Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) → Hypothesis and Evidence Scoring (Evidence Retrieval from Evidence Sources, Deep Evidence Scoring, Answer Scoring) → Synthesis → Final Confidence Merging & Ranking, where learned Models help combine and weigh the evidence → Answer & Confidence. Annotations: multiple interpretations; 100’s of sources; 100’s of possible answers; 1000’s of pieces of evidence; 100,000’s of scores from many deep analysis algorithms; balance & combine.]

Example Question: IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE

• Question Analysis – Keywords: 1698, comet, paramour, pink, …; AnswerType(comet discoverer), Date(1698), Took(discoverer, ship), Called(ship, Paramour Pink), …
• Primary Search – related content (structured & unstructured)
• Candidate Answer Generation – Wilhelm Tempel, HMS Paramour, Isaac Newton, Halley’s Comet, Pink Panther, Christiaan Huygens, Peter Sellers, Edmond Halley
• Evidence Retrieval and Evidence Scoring – candidates are described by vectors of feature scores, e.g., [0.58 0 -1.3 … 0.97], [0.71 1 13.4 … 0.72], [0.12 0 2.0 … 0.40], [0.84 1 10.6 … 0.21], [0.33 0 6.3 … 0.83], [0.21 1 11.1 … 0.92], [0.91 0 -8.2 … 0.61], [0.91 0 -1.7 … 0.60]
• Merging & Ranking – 1) Edmond Halley (0.85) 2) Christiaan Huygens (0.20) 3) Peter Sellers (0.05)
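The step where “learned models help combine and weigh the evidence” can be illustrated with a toy logistic model. This is a minimal Python sketch under stated assumptions, not DeepQA's implementation: the three features, their values and the weights are hypothetical stand-ins for the slide's elided feature vectors, and in DeepQA such weights are learned from training questions rather than set by hand. Its confidences therefore do not reproduce the slide's 0.85/0.20/0.05, only the idea of combining many scores into one confidence per candidate and ranking by it.

```python
import math

# Hypothetical evidence features per candidate:
# (type match with "comet discoverer", date 1698 supported, passage support score)
candidates = {
    "Edmond Halley": (1.0, 1.0, 0.9),
    "Christiaan Huygens": (1.0, 0.0, 0.4),
    "Peter Sellers": (0.0, 0.0, 0.3),
    "Halley's Comet": (0.0, 1.0, 0.8),
}

# Hypothetical weights and bias; a real system would learn these from data.
WEIGHTS = (2.5, 1.5, 2.0)
BIAS = -3.0


def confidence(features):
    """Logistic combination of evidence scores into a value in (0, 1)."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank(cands):
    """Merge and rank: highest-confidence candidate first."""
    scored = [(name, confidence(f)) for name, f in cands.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, conf in rank(candidates):
    print(f"{name}: {conf:.2f}")
```

Note how "Halley's Comet" gets support from the date and the passage but is pushed down by failing the answer-type check, which mirrors why type evidence matters in the example question.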

One Jeopardy! question can take 2 hours on a single 2.6 GHz core. Optimized and scaled out on a 2,880-core Power750 cluster using UIMA-AS, Watson answers in 2-6 seconds.
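A back-of-the-envelope check shows the two timings are consistent, assuming ideal linear speedup across all cores (an assumption; real pipelines lose some time to coordination and to stages that do not parallelize):

```latex
\[
  t_{\text{parallel}} \approx \frac{t_{\text{serial}}}{N_{\text{cores}}}
  = \frac{2\ \text{h} \times 3600\ \text{s/h}}{2880}
  = \frac{7200\ \text{s}}{2880}
  = 2.5\ \text{s}
\]
```

The ideal figure of 2.5 seconds sits at the low end of the reported 2-6 seconds, which is what one would expect if some overhead remains.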

[Figure: the same DeepQA pipeline, from Question through Question & Topic Analysis, Question Decomposition, Hypothesis Generation, Hypothesis and Evidence Scoring, Synthesis, and Final Confidence Merging & Ranking to Answer & Confidence; annotated with multiple interpretations, 100s of sources, 100s of possible answers, 1000’s of pieces of evidence, and 100,000’s of scores from many simultaneous text analysis algorithms.]

IBM’s Watson has been recognized as one of the most important technology achievements of 2011


“CIOs, business planners, enterprise architects, and strategy teams should familiarize themselves with its capabilities, and brainstorm ways in which human decision processes can be supported” – Gartner

“The impact of Watson…will be felt far beyond the game show. This technology could have significant effect on business, government and society.” – TED

Link: TED: Final Jeopardy and the Future of Watson

“Much of the technology that IBM built for Watson can be deployed against other types of tasks besides winning a Jeopardy game, to make solutions for these tasks "smarter." This technology addresses all of the five A's of smart computing that we have identified, that is, Awareness, Analysis, Alternatives, Actions, and Auditability.” – Forrester

“What is thinking? What is intelligence? What is the role that computers should and will play in our lives, and what are the boundaries between humans and computers? IBM's Watson demands that we reconsider each of these questions” – IDC

Video: The Future of Watson

Watson – Additional Information and Resources

• AI Magazine: Building Watson: An Overview of the DeepQA Project
• CIO Insight: IBM’s Watson: 11 Personal Apps
• eWeek: IBM’s Watson: The Future of Computing
• IDC: What is Watson: The IBM Jeopardy Challenge
• IBM’s Watson Portal: IBM Watson
• IBM: Watson press kit and Watson Facebook Page and IBM Research: The DeepQA Project
• NY Times: What is IBM’s Watson?
• PBS Video: Smartest Machine on Earth
• Time: 10 Questions for Watson's Human
• Twitter: @IBMWatson and hashtag #ibmwatson
• YouTube: Watson playlist
• Wikipedia: Watson

“We believe this will be an invaluable resource for our partnering physicians and will dramatically enhance the quality and effectiveness of medical care they deliver to our members.” – Wellpoint

Small Problem

• Data – Make data available freely; Give semantics to data
• Access – APIs: standards-based access, composition
• Value – Give benefit to citizens; create business opportunities

Do it repeatedly and rapidly for core services

Big Data

• Volume

• Variety

• Velocity

• Veracity

• …

Cartoon critical of big data application, by T. Gregorius. http://upload.wikimedia.org/wikipedia/commons/thumb/b/b3/Big_data_cartoon_t_gregorius.jpg/220px-Big_data_cartoon_t_gregorius.jpg

Open Data
• Open data is the notion that data should not be hidden, but made available to everyone. The idea is not new.
• Scientific publications follow this: “standing on the shoulders of giants”
   • Science stands for repeatability of results and hence, sharing
   • The scientific community asserts that open data leads to an increased pace of discovery. (See: Ray P. Norris, How to Make the Dream Come True: The Astronomers' Data Manifesto, At http://www.jstage.jst.go.jp/article/dsj/6/0/6_S116/_article, Accessed 2 Apr, 2012)
• Governments are the new source for open data
   • Data.gov efforts world-wide: 300+ governmental bodies, including 20+ national agencies (India among them), have opened data
• In India, an additional movement is the “Right to Information Act”

390 Data Catalogs of Public Data

As of 20 Sep 2014

Open Data – It’s Time for Africa!

Data.gov.in (India)

As of 20 Sep 2014

India: Right to Information Act

• Any citizen “may request information from a "public authority" (a body of Government or "instrumentality of State") which is required to reply expeditiously or within thirty days.”
• Passed by Parliament on 15 June 2005 and came fully into force on 13 October 2005. Citation: Act No. 22 of 2005

• Lauded and reviled
   • Brought transparency
   • Also:
      • Increased bureaucracy
      • Shortcomings in preventing corruption

• More information
   • http://en.wikipedia.org/wiki/Right_to_Information_Act
   • http://rti.gov.in

Illustration


Source: http://5stardata.info/

Does Opening Data Make It Reusable? No

[Figure: the 5-star deployment scheme for open data, levels 1 to 5]

Running Example – Temperature at Conference Location

• Measurement System – Celsius, Fahrenheit, Kelvin, Color of spectrum, …

• Indoor or Outdoor
   • Indoor – do we need to capture events happening inside?
   • Outdoor – do we need to capture predicted weather?

• Location – Latitude, Longitude, Address, Part of building

• Measuring equipment details

• Data quality – refresh rates, default values when equipment is broken
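One way to settle those choices is to make them explicit fields of every reading rather than leaving them implicit. The Python sketch below is illustrative only: the `TemperatureReading` class, its field names and the sample values (including the coordinates) are hypothetical, not a published schema. Normalizing to a single unit (here Kelvin) lets readings from different equipment be compared, and a flag marks values filled in while a sensor was broken.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TemperatureReading:
    """One reading plus the context needed to reuse it (hypothetical schema)."""
    value: float
    unit: str                     # "C", "F" or "K": the measurement system is explicit
    indoor: bool                  # indoor or outdoor reading
    latitude: float
    longitude: float
    building_part: Optional[str]  # e.g. "Hall A"; None for outdoor sensors
    equipment_id: str             # which measuring device produced the value
    observed_at: datetime         # when it was measured (supports refresh-rate checks)
    is_default: bool = False      # True if a default was filled in because equipment broke

    def kelvin(self) -> float:
        """Normalize the value to Kelvin so readings from different sources compare."""
        if self.unit == "K":
            return self.value
        if self.unit == "C":
            return self.value + 273.15
        if self.unit == "F":
            return (self.value - 32.0) * 5.0 / 9.0 + 273.15
        raise ValueError(f"unknown unit: {self.unit}")


def usable(readings):
    """Drop placeholder values recorded while the equipment was broken."""
    return [r for r in readings if not r.is_default]


# Usage with hypothetical values and coordinates.
reading = TemperatureReading(
    value=86.0, unit="F", indoor=False, latitude=30.04, longitude=31.24,
    building_part=None, equipment_id="sensor-1",
    observed_at=datetime(2014, 10, 21, 9, 0, tzinfo=timezone.utc),
)
broken = TemperatureReading(
    value=0.0, unit="C", indoor=True, latitude=30.04, longitude=31.24,
    building_part="Hall A", equipment_id="sensor-2",
    observed_at=datetime(2014, 10, 21, 9, 0, tzinfo=timezone.utc), is_default=True,
)
print(round(reading.kelvin(), 2))
print(len(usable([reading, broken])))
```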

Data Quality in Public Data in India

• Right to Information
   • Not even 1 star
   • Information available to the requester, but no one else

• Data.gov.in
   • 2-3 stars
   • Available in CSV, etc., but not uniquely referenceable

• Open data movements are moving to linked data form for semantics
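The star ratings above refer to the 5-star scheme for open data (http://5stardata.info/): one star for data on the Web under an open licence, two for structured data, three for a non-proprietary format such as CSV, four for using URIs to identify things, and five for linking to other data. Each level assumes the ones below it. A minimal Python sketch of that cumulative rating follows; the `Dataset` fields are hypothetical simplifications of each criterion, and the two example datasets paraphrase the bullets above.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_license: bool  # 1 star: on the Web under an open licence
    structured: bool           # 2 stars: machine-readable structure (e.g. a spreadsheet)
    open_format: bool          # 3 stars: non-proprietary format (e.g. CSV)
    uses_uris: bool            # 4 stars: things identified by URIs
    links_out: bool            # 5 stars: linked to other people's data


def stars(d: Dataset) -> int:
    """Stars are cumulative: a level counts only if every level below it is met."""
    criteria = [d.on_web_open_license, d.structured, d.open_format, d.uses_uris, d.links_out]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


# An RTI reply reaches only the requester, so it is not on the open Web at all.
rti_reply = Dataset(False, False, False, False, False)
# A Data.gov.in CSV file: open and non-proprietary, but not uniquely referenceable.
csv_file = Dataset(True, True, True, False, False)
print(stars(rti_reply), stars(csv_file))
```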

Linking of Open Data for Reusability


Source: http://5stardata.info/

Source: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/

Illustration: W3C Organization
• Abstract: This document describes a core ontology for organizational structures, aimed at supporting linked-data publishing of organizational information across a number of domains. It is designed to allow domain-specific extensions to add classification of organizations and roles, as well as extensions to support neighbouring information such as organizational activities.

1. Introduction
2. Conformance
3. Namespaces
4. Overview of ontology
5. Design notes
6. Notes on style
7. Organizational structure
   7.1 Class: Organization – 7.1.1 Property: subOrganizationOf; 7.1.2 Property: transitiveSubOrganizationOf; 7.1.3 Property: hasSubOrganization; 7.1.4 Property: purpose; 7.1.5 Property: hasUnit; 7.1.6 Property: unitOf; 7.1.7 Property: classification; 7.1.8 Property: identifier; 7.1.9 Property: linkedTo
   7.2 Class: FormalOrganization
   7.3 Class: OrganizationalUnit
   7.4 Notes on formal organizations
   7.5 Notes on organizational hierarchy
   7.6 Notes on organizational classification
8. Reporting relationships and roles
   8.1 Class: Membership – 8.1.1 Property: member; 8.1.2 Property: organization; 8.1.3 Property: role; 8.1.4 Property: hasMembership; 8.1.5 Property: memberDuring; 8.1.6 Property: remuneration
   8.2 Class: Role – 8.2.1 Property: roleProperty
   8.3 Property: hasMember
   8.4 Property: reportsTo
   8.5 Property: headOf
   8.6 Discussion
9. Location
   9.1 Class: Site – 9.1.1 Property: siteAddress; 9.1.2 Property: hasSite; 9.1.3 Property: siteOf; 9.1.4 Property: hasPrimarySite; 9.1.5 Property: hasRegisteredSite; 9.1.6 Property: basedAt
   9.2 Property: location
10. Projects and other activities
   10.1 Class: OrganizationalCollaboration
11. Historical information
   11.1 Class: ChangeEvent – 11.1.1 Property: originalOrganization; 11.1.2 Property: changedBy; 11.1.3 Property: resultedFrom; 11.1.4 Property: resultingOrganization
A. Change history
B. Acknowledgments
C. References
   C.1 Normative references
   C.2 Informative references

http://www.w3.org/TR/vocab-org/

Usage of W3C’s Org Ontology – Community Directory

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dir: <http://dir.w3.org/directory/schema#> .
@prefix directory: <http://dir.w3.org/directory/orgtypes/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix org: <http://www.w3.org/ns/org#> .

<> foaf:primaryTopic <#org> .

<#org> a org:Organization, dir:Organization, gr:BusinessEntity, vcard:Organization ;
    rdfs:label "International Business Machines" ;
    gr:legalName "International Business Machines" ;
    vcard:organization-name "International Business Machines" ;
    skos:prefLabel "International Business Machines" ;
    dir:isOrganizationType directory:commercial ;
    vcard:url <http://www.ibm.com> ;
    vcard:logo <http://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/200px-IBM_logo.svg.png> ;
    rdfs:comment """International Business Machines Corporation (NYSE: IBM), or IBM, is an American multinational technology and consulting corporation, with headquarters in Armonk, New York, United States. IBM manufactures and markets computer hardware and software, and offers infrastructure, hosting and consulting services in areas ranging from mainframe computers to nanotechnology.""" .

<#org> org:siteAddress <#address-1NewOrchardRoad+Armonk+UnitedStates> .

<#address-1NewOrchardRoad+Armonk+UnitedStates> a vcard:VCard, vcard:Address ;
    vcard:street-address "1 New Orchard Road" ;
    vcard:locality "Armonk" ;
    vcard:country-name "United States" ;
    vcard:region "New York" ;
    vcard:postal-code "10504-1722" .

Still Confused about Semantics? Start with the Linked Data Glossary

Peek into the Future - Amsterdam

http://citydashboard.waag.org/

Small Problem

• Data – Make data available freely; Give semantics to data
• Access – APIs: standards-based access, composition
• Value – Give benefit to citizens; create business opportunities

Do it repeatedly and rapidly for core services

API Example


http://www.programmableweb.com/api/sabre-instaflights-search

Example: API Registry

As of 16 Oct 2014

A Composition (Mashup) Example

Source: Bessemer Venture Partners 2012

Business capabilities are being exposed as services via APIs and delivered as-a-service, allowing businesses to engage with clients and partners with speed, at scale

REST v/s Web Services?

REST
• supports a limited set of integration styles, and involves fewer decisions on architectural alternatives
• This simplifies client-side integration steps (at the cost of less automation in system evolution); more focus on do-it-yourself

Source: Pautasso et al, RESTful Web Services vs. “Big” Web Services: Making the Right Architectural Decision, WWW 2008

Running Example – APIs for Temperature at Conference Location

• API examples
   • Get temperature (input: current, last, or a given instant)
   • Get temperature interval (input: day)
   • Get average temperature (input: time range)

• REST or web-service

• Semantic annotation on input and output
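The three operations above can be sketched as plain Python functions over an in-memory series of readings, with the REST paths they might be exposed under noted in the docstrings. The paths, function names and sample readings are hypothetical, and values are assumed to be in degrees Celsius; the semantic annotation the slide calls for would attach that unit (and the location) explicitly to inputs and outputs.

```python
from datetime import date, datetime
from statistics import mean

# Hypothetical readings in degrees Celsius, keyed by observation time.
READINGS = {
    datetime(2014, 10, 21, 9, 0): 24.0,
    datetime(2014, 10, 21, 12, 0): 29.5,
    datetime(2014, 10, 21, 15, 0): 31.0,
    datetime(2014, 10, 22, 9, 0): 23.5,
}


def get_temperature(at=None):
    """GET /temperature[?at=...]: latest reading, or latest one at or before `at`."""
    times = sorted(t for t in READINGS if at is None or t <= at)
    if not times:
        return None
    return READINGS[times[-1]]


def get_temperature_interval(day: date):
    """GET /temperature/interval?day=...: (min, max) for one day."""
    values = [v for t, v in READINGS.items() if t.date() == day]
    return (min(values), max(values)) if values else None


def get_average_temperature(start: datetime, end: datetime):
    """GET /temperature/average?start=...&end=...: mean over [start, end]."""
    values = [v for t, v in READINGS.items() if start <= t <= end]
    return mean(values) if values else None


print(get_temperature())
print(get_temperature_interval(date(2014, 10, 21)))
```

Whether these are exposed as REST resources or as web-service operations, keeping the inputs this small is what makes them easy to compose.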

Every citizen is a potential city event sensor
• Citizen notices 311 event worth reporting
• Reports event using mobile
   • Launches mobile application
   • Browses recent already-reported events
   • Creates new event report
      • [Is pre-enabled or gets any needed credentials to report event]
      • Identifies service type for new event
      • Shares location using mobile device (coordinates)
      • Can add location annotations (road, district, city) and description
• Gets confirmation of submission
• Gets updates on service request

Extreme Personalization = Location Intelligence + Empowered Citizen + Social Analytics

ALLGOV SCENARIO: CROWDSOURCING 311* EVENT REPORTING

*311 data standard
• non-emergency events like graffiti, garbage, down trees, abandoned cars, …; not threatening to human life
• 60+ cities support it world-wide; the demo works on 4 (Chicago, Boston, Tucson – USA; Bonn – Germany), and the backend has been tested on 10s more.
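The reporting flow described earlier maps onto Open311 GeoReport v2 calls: the app lists a city's services (`GET services.json`), browses recent requests (`GET requests.json`), and creates a report (`POST requests.json`) with a `service_code`, a location given as `lat`/`long`, and an optional `description`. The Python sketch below only builds the requests with the standard library and does not send them. The endpoint URL, API key, service code and coordinates are hypothetical; each city publishes its own endpoint and issues its own credentials.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Hypothetical GeoReport v2 endpoint; real cities publish their own base URLs.
BASE_URL = "https://311.example-city.gov/open311/v2/"


def list_services_url(jurisdiction_id=None):
    """URL for GET services.json (the city's catalogue of service types)."""
    query = urlencode({"jurisdiction_id": jurisdiction_id}) if jurisdiction_id else ""
    return urljoin(BASE_URL, "services.json") + (f"?{query}" if query else "")


def create_request(api_key, service_code, lat, lon, description=""):
    """Build (but do not send) a POST requests.json call for a new service request."""
    form = {
        "api_key": api_key,
        "service_code": service_code,
        "lat": f"{lat:.6f}",   # coordinates shared by the mobile device
        "long": f"{lon:.6f}",
    }
    if description:
        form["description"] = description
    return Request(
        urljoin(BASE_URL, "requests.json"),
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Usage with hypothetical values (service code and key are placeholders).
req = create_request("demo-key", "001", 41.881832, -87.623177, "Graffiti on wall")
print(req.get_method(), req.full_url)
```

Sending the prepared request with `urllib.request.urlopen(req)` would only work against a real endpoint with a valid key.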

Browsing Services in One’s City: Mary M. can look at the 311 services her city provides. On selecting the icon,
• She sees a small set of categories (health, building, traffic, cityimage, others) around which all the city’s services are grouped.
• She can look at a list of services and check out the agencies involved.
• If there has been a change in the agency responsible, or new services have been added for an agency, she can note that directly.

Browsing Services in Other Cities: Her colleagues from another city are visiting. She may want to open a window (instantiate an app with the browse-city pattern) to look at what that city offers its citizens. [Alternatively, if she is travelling to another city, she may want to know how that city compares to hers, by which agency, etc.] On selecting the icon,
• She sees a small set of familiar categories (health, building, traffic, cityimage, others) regardless of what the city calls its services.
• She can look at a list of services and check out the agencies involved.

If her city does something different, she can show that to her colleagues in her or other cities.
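Showing the same five categories for every city requires mapping each city's own service names onto them. A keyword-based mapping is one simple way to do it; the keyword lists and sample service names in this Python sketch are hypothetical, not the ones AllGov used. Matching is on whole words so that, for instance, "tree" does not fire on "street", and anything unmatched falls into "others".

```python
import re

# Hypothetical keyword lists for the five common categories shown to users.
CATEGORY_KEYWORDS = {
    "health": ("rodent", "sanitation", "restaurant", "mosquito"),
    "building": ("building", "construction", "vacant", "permit"),
    "traffic": ("pothole", "street light", "traffic", "signal", "abandoned", "parking"),
    "cityimage": ("graffiti", "litter", "garbage", "tree", "sidewalk"),
}


def categorize(service_name: str) -> str:
    """Return the first common category with a keyword in the service name."""
    name = service_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        for keyword in keywords:
            # Whole-word match, so "tree" does not fire on "street".
            if re.search(r"\b" + re.escape(keyword) + r"\b", name):
                return category
    return "others"


def group_services(service_names):
    """Group one city's service names under the common categories."""
    groups = {category: [] for category in (*CATEGORY_KEYWORDS, "others")}
    for service in service_names:
        groups[categorize(service)].append(service)
    return groups


print(group_services(["Graffiti Removal", "Pothole in Street", "Noise Complaint"]))
```

Because the mapping is data rather than code, changing to a different set of categories only means editing the dictionary.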

A Demonstration of AllGov Pattern with Open 311

Application Pattern

• What is it?: A pattern is any application using APIs, with some information generalized (i.e., removed and parameterized)

• Business Value: A pattern
   • standardizes the usage experience by promoting similar behavior (for users)
   • simplifies application development by templatizing API interactions (for developers)
   • serves as the organization’s memory of the best practices in developing a class of applications even when the specific APIs may not be relevant (for business)

• Key Technical Issue
   • What patterns should one build? Theoretically, there exists a trivial method to blindly generate a pattern from any application. Any pattern development process has to do better than this baseline.
   • How should the patterns be used in practice?
      • Building a tool-enabled process around pattern-based programming

Application Pattern

• Approach followed in AllGov
   • Common steps taken by a role player are a candidate pattern
   • Common steps that can be executed in the same infrastructure are a candidate pattern

• Pattern 1: Browse city services pattern [User Role: Govt. Dept Admin; Environment: PRODUCTION system]
   • find a city's services
   • find a service's definition
   • find services of a particular high-level category (example: building, graffiti, ...)

• Pattern 2: Create service request pattern [User Role: Developer; Environment: TEST system]
   • Browse city services
   • Browse raised city service requests
   • Create a new service request

• Pattern 3: Create service request pattern [User Role: General citizen of a particular City; Environment: PRODUCTION system]
   • Browse city services
   • Browse raised city service requests
   • Create a new service request
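The definition above, an application using APIs with some information removed and parameterized, can be sketched as a workflow template whose city-specific parts are supplied when it is instantiated. In this Python sketch the `ApiClient` protocol, the `FakeCity` stand-in and the method names are hypothetical; only the three steps follow Pattern 1 (browse city services). Instantiating the pattern for another city means passing a different client, not rewriting the steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


class ApiClient(Protocol):
    """Hypothetical minimal interface a city's service API must offer."""
    def services(self) -> List[Dict[str, str]]: ...


@dataclass
class BrowseCityServicesPattern:
    """Pattern 1: the city-specific client and category mapping are parameters."""
    client: ApiClient
    categorize: Callable[[str], str]

    def find_services(self):
        """Step 1: find a city's services."""
        return self.client.services()

    def find_definition(self, service_code):
        """Step 2: find a service's definition (None if the code is unknown)."""
        return next((s for s in self.find_services() if s["service_code"] == service_code), None)

    def find_by_category(self, category):
        """Step 3: find services of a particular high-level category."""
        return [s for s in self.find_services() if self.categorize(s["service_name"]) == category]


class FakeCity:
    """Stand-in for a real city endpoint, for illustration only."""
    def services(self):
        return [
            {"service_code": "001", "service_name": "Graffiti Removal"},
            {"service_code": "002", "service_name": "Pothole in Street"},
        ]


def simple_categorize(name):
    """Toy category mapping used only for this example."""
    return "cityimage" if "graffiti" in name.lower() else "traffic"


pattern = BrowseCityServicesPattern(client=FakeCity(), categorize=simple_categorize)
print(pattern.find_by_category("cityimage"))
```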

AllGov Scenario Deconstruction (flows)

[Figure: interactions among Customer Mobile, AllGov, City Services and an External IBM Client, in numbered flows 1 to 3. Labelled steps: browse events, get recent events, get service types, create request, post location coordinates, post details on event and location, request confirmation; flow 3: notify service completed. Pattern labels: P1, P1+; P2, P3.]

Emerging Examples of Societal Applications with AI Techniques and Open Government Data

Smarter Tourism

Why Tourism Matters

• Pros
   • Promotes service-sector jobs
   • Helps upgrade infrastructure
   • Gives government an alternative revenue source beyond traditional agriculture and manufacturing
   • Helps take local culture world-wide
   • Promotes country image

• Cons
   • Can lead to environmental impact if not planned well
   • Can dilute local traditions and culture if unplanned

World Tourism in Numbers

Key Points
• In 2013, more than 1 billion people stayed overnight in another city and spent more than 1 trillion USD
• Of the world's oldest civilizations (more than 5,000 years old), China, Egypt and India, only China is in the top 5 both for receiving and for sending tourists, by numbers and by money spent.
• Tourists go beyond language and history to spend their money on novel experiences

Key Points for Africa and Middle East
• In 2013, there were over 55.7 million international tourist arrivals to Africa, an increase of 5.4% over 2012.
• In 2013, there were over 51.5 million international tourist arrivals to the Middle East, a decrease of 0.2% from 2012.
• Top countries individually get more tourists than Africa or the Middle East as a whole (70-80M range v/s 50-55M)

Tables Courtesy: http://en.wikipedia.org/wiki/World_Tourism_rankings (Accessed 20 Oct, 2014)

Top Cities Tourists Visit (by money spent)

Figure Courtesy: MasterCard 2014 Global Destination Cities Index, At http://newsroom.mastercard.com/digital-press-kits/mastercard-global-destination-cities-index-2014/

Top cities already earn from tourists what countries in the Middle East/Africa are planning to earn by 2020

Top Cities in MEA

Figure Courtesy: MasterCard 2014 Global Destination Cities Index, At http://newsroom.mastercard.com/digital-press-kits/mastercard-global-destination-cities-index-2014/

There is tremendous scope to grow if things are done differently

Possible Strategy to Promote Tourism

• Increase quality of experience for USPs using better information availability. Examples:
   • Service quality – Information on what is happening and what to expect, when, at what cost; make it easy to consume offerings

• Remove barriers to travel and spending – Remove perception of lack of safety, increase transparency about supporting services like roads, hospitals, taxis

• Promote domestic tourism in addition to international tourism
   • Helps natives inculcate service-industry culture, build capacity

City Concierge (CC): Serving People by Design

• Target users
   • Citizens wanting to know more about their city
   • Travellers planning to visit new cities with memorable experiences
   • People (e.g., business, government) wanting to compare cities

• Group information along a small set of easy-to-follow categories
   • We selected – Traffic, health, building, city image, others
   • Easy to change to any set of categories

• Languages supported – English, Portuguese, Spanish, German
   • Easy to extend to any language

2nd place winner in Europe’s CitySDK App Hackathon in June 2014. Details: http://www.slideshare.net/biplavsrivastava/city-concierge-presentation10june2014

Serving People by Design
• Target users: Citizens, Travellers, People

Citizens, Travellers: Most events – Helsinki; Most open service requests – Lisbon

Check Services of Your Favorite City – Chicago, in example

Lisbon (in Portuguese), Bonn (in German)

People, Travellers: Most city services – Lisbon; Traffic most common category in cities

CC Design Principles

• Focus on features that promote usage of city data
   • Overcoming language barriers
   • Overcoming API and data diversity barriers
   • Highlight commonalities, promote comparison

• Follow standards
   • CitySDK for upcoming tourism events
   • Open 311 for city’s non-emergency services and service requests

• Programming level approach
   • Overcome (City API) errors to stay useful
   • Be resource efficient to promote mobile apps
   • Standardize on output formats
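"Overcome (City API) errors to stay useful" can be realized in several ways. One common approach, sketched here in Python as an assumption rather than City Concierge's actual code, is to retry a failing call a few times with exponential back-off and then fall back to the last good response, so the app keeps showing (possibly stale) data and tells the caller it is stale. `ResilientFetcher` and the `flaky_city_api` stand-in are hypothetical.

```python
import time


class ResilientFetcher:
    """Wrap a city API call with retries and a last-known-good cache."""

    def __init__(self, fetch, retries=2, backoff_seconds=0.5, sleep=time.sleep):
        self.fetch = fetch            # callable(key) -> data; may raise on API errors
        self.retries = retries        # extra attempts after the first failure
        self.backoff_seconds = backoff_seconds
        self.sleep = sleep            # injectable so tests need not actually wait
        self.cache = {}               # key -> last successful response

    def get(self, key):
        """Return (data, is_fresh); fall back to cached data if every attempt fails."""
        for attempt in range(self.retries + 1):
            try:
                data = self.fetch(key)
            except (OSError, ValueError):
                # Network failures (OSError) or malformed payloads (ValueError).
                if attempt < self.retries:
                    self.sleep(self.backoff_seconds * 2 ** attempt)
                continue
            self.cache[key] = data
            return data, True
        if key in self.cache:
            return self.cache[key], False
        return None, False


# Usage with a stand-in API that succeeds once and then keeps failing.
calls = []


def flaky_city_api(city):
    calls.append(city)
    if len(calls) == 1:
        return ["event A"]
    raise OSError("city API unavailable")


fetcher = ResilientFetcher(flaky_city_api, sleep=lambda seconds: None)
first = fetcher.get("helsinki")   # fresh data
second = fetcher.get("helsinki")  # all attempts fail, cached data returned
print(first, second)
```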

Prototype: Bharat Khoj – Searching Events on Mobile and Web


Tourism Capacity Building with Smarter Transportation

Details:
• Making Public Transportation Schedule Information Consumable for Improved Decision Making, Raj Gupta, Biplav Srivastava, Srikanth Tamilselvam, In 15th International IEEE Annual Conference on Intelligent Transportation Systems (ITSC 2012), Anchorage, USA, Sep 16-19, 2012.
• City Notifications as a Data Source for Traffic Management, Pramod Anantharam, Biplav Srivastava, in 20th ITS World Congress 2013, Tokyo

Promoting Public Transportation: Before and After We Seek

Many cities around the world, especially in India and other emerging economies, are getting their transportation infrastructure in shape.
– They have multiple, fragmented transportation agencies in a region (e.g., a city)
– They do not have instrumentation on their vehicles, like GPS, to know about their operations in real time
– Schedules of public transportation are widely available in semi-structured form. They are also beginning to invest in new, novel sensing technologies
– Cities give SMS-based alerts about events on the road.

Our approach seeks to accelerate time-to-value for such cities.

Kind of Information | Today Available to Bus User | With IRL-Transit+ | Benefit
Bus Schedule (static) | Available online and pamphlets | Available from IT-enabled devices (low-cost phones, smart phones, web) | Increase accessibility
Bus Schedule Changes (dynamic) | No information | Infer from city updates | Increase information
Analytics (Bus Selection Decision Support) | No information | Will be available (Transit) | Increase information
Standardization of information | No support | Will be supported (SCRIBE, Transit) | Increase information’s interoperability

Background: Public Transportation Schedule Information

• Is widely available for public transportation agencies around the world

• Gives the basic, static, information about transportation service

• Usually in semi-structured format with varying semantics

• Can have errors, missing data

Basic Solution Steps

!   Use the widely available schedule information from individual operators (agencies)

!   Clean and consolidate it across agencies and modes to get a multi-modal view for the region

!   Optionally: convert it into a standard form

!   Optionally: enhance (fuse) it with any real-time updates about services for the region

!   Perform what-if analysis on the consolidated data
    !   Path finding using Dijkstra's algorithm
    !   Analyses can be pre-determined; they can also be user-created and defined

!   Make analysis results available as a service
    !   On any device
    !   To any subscriber
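The path-finding step can be sketched as Dijkstra's algorithm over a multi-modal graph whose nodes are stops and whose edges are legs labelled with travel minutes and mode. This is a minimal sketch assuming fixed leg durations; it ignores timetable waiting times, which a time-dependent variant would need. The stop names and durations below are hypothetical.

```python
import heapq
from collections import defaultdict

def build_graph(legs):
    """Index legs of the form (from_stop, to_stop, minutes, mode) by origin stop."""
    graph = defaultdict(list)
    for src, dst, minutes, mode in legs:
        graph[src].append((dst, minutes, mode))
    return graph

def shortest_trip(graph, source, target):
    """Dijkstra's algorithm over travel minutes.

    Returns (total_minutes, [(stop, mode_used_to_reach_stop), ...]),
    or (inf, []) when the target cannot be reached.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break  # first pop of the target carries its final shortest distance
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter route to node was already found
        for nxt, minutes, mode in graph.get(node, []):
            nd = d + minutes
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, mode)
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return float("inf"), []
    path, node = [], target
    while node != source:  # walk predecessor links back to the source
        parent, mode = prev[node]
        path.append((node, mode))
        node = parent
    path.append((source, None))
    return dist[target], path[::-1]

legs = [
    ("A", "M1", 5, "walk"), ("M1", "M2", 12, "metro"), ("M2", "B", 4, "walk"),
    ("A", "B", 30, "bus"), ("A", "X", 8, "bus"), ("X", "B", 25, "auto"),
]
minutes, route = shortest_trip(build_graph(legs), "A", "B")
print(minutes, route)
```

Here the walk-metro-walk combination (21 minutes) beats the direct bus (30 minutes), which is the kind of cross-agency, cross-mode answer that consolidation makes possible.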

Multi-Mode Commuting Recommender in Delhi and Bangalore


Highlights

•  Published data of multiple authorities used; repeatable process

•  Multiple modes searched

•  Preferences over modes, time, hops and number of choices supported; more extensions, like fare, possible

•  Integration of results with maps as future work; already done as part of other projects, viz. SCRIBE-STAT
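Preferences over modes, time, hops and number of choices can be applied after the search as a filter-then-rank step. The sketch below is one hypothetical way to do it, not the recommender's actual scheme: avoided modes and a hop limit are hard constraints, and the remaining itineraries are ranked lexicographically (uses a preferred mode, then travel time, then hops). `Itinerary`, `Preferences` and `recommend` are illustrative names.

```python
from dataclasses import dataclass, field

@dataclass
class Itinerary:
    modes: list   # mode used on each leg, e.g. ["walk", "metro", "walk"]
    minutes: int  # total door-to-door travel time

    @property
    def hops(self) -> int:
        return len(self.modes) - 1  # number of transfers between legs

@dataclass
class Preferences:
    avoid_modes: set = field(default_factory=set)      # hard constraint
    preferred_modes: set = field(default_factory=set)  # soft preference
    max_hops: int = 3
    choices: int = 3  # how many options to show the commuter

def recommend(candidates, prefs):
    """Drop itineraries violating hard constraints, then rank the rest
    lexicographically: preferred mode used, then travel time, then hops."""
    feasible = [it for it in candidates
                if not (set(it.modes) & prefs.avoid_modes) and it.hops <= prefs.max_hops]

    def rank(it):
        uses_preferred = bool(set(it.modes) & prefs.preferred_modes)
        return (0 if uses_preferred or not prefs.preferred_modes else 1, it.minutes, it.hops)

    return sorted(feasible, key=rank)[: prefs.choices]

candidates = [
    Itinerary(["bus"], 30), Itinerary(["walk", "metro", "walk"], 21),
    Itinerary(["bus", "auto"], 33), Itinerary(["taxi"], 15),
]
picks = recommend(candidates, Preferences(avoid_modes={"taxi"}, preferred_modes={"metro"}, choices=2))
print([(it.modes, it.minutes) for it in picks])
```

A fare preference could be added the same way, as another component of the ranking key.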

Further Work*

!   Invariant inputs:
    !   The person has a vehicle (e.g., car), and can also walk short distances
    !   The city has taxis, buses, metros, autos and rickshaws
        !   Buses and metros have published routes, frequencies and stops
        !   Autos and rickshaws can be available at stands, or opportunistically, on the road
        !   Taxis can be ordered over the phone

!   Input:
    !   A person wants to travel from place A to B
    !   [Optional] The city provides updates on ongoing events, some of which may affect traffic

!   Output:
    !   Suggest to the person which mode or combination of modes to select

!   Observation: use preferences over factors that matter to users to keep commuting convenient, while making the best use of available public and para-transit commute methods

* City Notifications as a Data Source for Traffic Management, Pramod Anantharam, Biplav Srivastava, in 20th ITS World Congress 2013, Tokyo

Number of SMS messages for bus stops in Delhi for 2 years (Aug 2010 – Aug 2012)*

•  344 stops with updates

•  3931 total stops

* using Exact Matching
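Exact matching here means a stop counts as "updated" only when its published name appears verbatim in an SMS alert. A minimal sketch, assuming case-insensitive whole-word matching; the stop list and messages below are made-up examples, not the Delhi data. The third message shows the method's main limitation: a spelling variant of a stop name is missed.

```python
import re
from collections import Counter

def stops_mentioned(message, stop_names):
    """Stop names that appear verbatim in the message (case-insensitive, whole words only)."""
    text = message.lower()
    return {stop for stop in stop_names
            if re.search(r"\b" + re.escape(stop.lower()) + r"\b", text)}

def count_updates(messages, stop_names):
    """Number of messages mentioning each stop; stops never mentioned are absent."""
    counts = Counter()
    for msg in messages:
        counts.update(stops_mentioned(msg, stop_names))
    return counts

stops = ["ITO", "Kashmere Gate", "Dhaula Kuan", "Nehru Place"]
messages = [
    "Traffic diverted near ITO due to procession",
    "Waterlogging at Dhaula Kuan; avoid ITO flyover",
    "Road repair at Kashmiri Gate",  # spelling variant: exact matching misses it
]
counts = count_updates(messages, stops)
print(f"{len(counts)} of {len(stops)} stops with updates:", dict(counts))
```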

IRL-Transit in Aug 2012

Key Points

•  SMS message from city

•  Event and location identified

•  Impact assessed

•  Impact used in search
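One simple way to let an assessed event impact reach the search is to slow down the affected legs before ranking routes. The sketch assumes a hypothetical impact model (a fixed delay added to road-based legs that start or end at a stop named in the alert, with metro and walking unaffected); the stops and times are illustrative.

```python
# Assumption: a road event slows road-based modes; metro and walking are unaffected.
ROAD_MODES = {"bus", "auto", "taxi", "car", "rickshaw"}

def apply_event_impact(legs, affected_stops, delay_minutes):
    """Return new (src, dst, minutes, mode) legs, adding a fixed delay to road legs
    that start or end at a stop named in a city SMS alert."""
    adjusted = []
    for src, dst, minutes, mode in legs:
        if mode in ROAD_MODES and (src in affected_stops or dst in affected_stops):
            minutes += delay_minutes
        adjusted.append((src, dst, minutes, mode))
    return adjusted

def best_option(options):
    """options maps an option name to its list of legs; pick the fastest in total minutes."""
    return min(options, key=lambda name: sum(leg[2] for leg in options[name]))

legs = [("A", "ITO", 10, "bus"), ("ITO", "B", 10, "bus"),
        ("A", "M", 5, "walk"), ("M", "B", 18, "metro")]
before = {"bus": legs[:2], "metro": legs[2:]}
after_legs = apply_event_impact(legs, {"ITO"}, delay_minutes=15)
after = {"bus": after_legs[:2], "metro": after_legs[2:]}
print(best_option(before), "->", best_option(after))  # bus -> metro
```

Without the alert the bus (20 minutes) wins; with a 15-minute delay at ITO the bus takes 50 minutes and the metro option (23 minutes) is recommended instead.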

Increase Accessibility and Availability of Bus Information to Passengers

| Kind of Information | Today Available to Bus Users | With Project in Bangalore | Mysore ITS (for reference)* | Benefit |
| --- | --- | --- | --- | --- |
| Bus Schedule (static) | Available online and in pamphlets | Available from low-cost phones (Spoken Web – Static) | Available online and in pamphlets | Increase accessibility |
| Bus Schedule Changes (dynamic) | No information today | Will be available (Spoken Web – Human) | No information, but in plan | Increase information |
| Bus Location | No information today | Will be available (GPS) | Will be available (GPS) | Increase information |
| Bus Condition | No information today | Will be available (Spoken Web – Human) | No information today | Increase information |
| Analytics (Bus Selection Decision Support) | No information today | Will be available (Transit) | No information, but in plan | Increase information |
| Last-mile Connectivity to/from nearest stop | No information today | Will be available (Spoken Web – Human) | No information today | Increase information |
| Standardization of information | No support | Will be supported (SCRIBE, Transit) | Some support due to GPS | Increase information's interoperability |

* Opinion based only on public information

A Flexible Journey Plan


Our End Vision: Information to Commuters to Reach Their Destination in Every Eventuality

Pilots running in Dublin, Ireland

Resources

!   Tutorial on AI-Driven Analytics in Traffic Management, in conjunction with the International Joint Conference on Artificial Intelligence (IJCAI-13), Biplav Srivastava, Akshat Kumar, Beijing, China, Aug 3-5, 2013 (tutorial slides).

!   Tutorial on Traffic Management and AI, in conjunction with the 26th Conference of the Association for the Advancement of Artificial Intelligence (AAAI-12), Biplav Srivastava, Anand Ranganathan, Toronto, Canada, July 22-26, 2012 (tutorial slides).

!   Making Public Transportation Schedule Information Consumable for Improved Decision Making, Raj Gupta, Biplav Srivastava, Srikanth Tamilselvam, In 15th International IEEE Annual Conference on Intelligent Transportation Systems (ITSC 2012), Anchorage, USA, Sep 16-19, 2012.

!   Mythologies, Metros & Future Urban Transport, by Prof. Dinesh Mohan, TRIPP, 2008.

!   A new look at the traffic management problem and where to start, by Biplav Srivastava, In 18th ITS Congress, Orlando, USA, Oct 16-20, 2011.

!   Arnott, Richard and K.A. Small, 1994, “The Economics of Traffic Congestion,” American Scientist, Vol. 82, No. 5, pp. 446-455.

!   Chengri Ding and Shunfeng Song, Paradoxes of Traffic Flow and Congestion Pricing,

Tourism Capacity Building with Corruption Prevention

Details:

•  A Computational Model for Corruption Assessment, Nidhi Rajshree, Nirmit V. Desai and Biplav Srivastava, IJCAI 2013 Workshop on Semantic Cities, Beijing, 2013 [Corruption-FormalModels]

•  Open Government Data for Tackling Corruption – A Perspective, Nidhi Rajshree, Biplav Srivastava, in AAAI 2012 Workshop on Semantic Cities, Toronto, July 2012. [Area: Open data-Corruption]

Corruption “the misuse of public office for personal gains”

* Source: http://cpi.transparency.org/cpi2012/results/

Corruption afflicts both public and corporate services worldwide. It is known to have a significant negative impact on economic growth and hence is universally considered undesirable.

Corruption : “Monopoly + Discretion – Accountability” (Klitgaard, Robert E. Controlling corruption. Berkeley: U. of California Press, 1988)
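Klitgaard's heuristic can be written compactly as a qualitative formula; the symbols are shorthand for the three factors, not measured quantities:

```latex
C = M + D - A
\quad\text{where } C = \text{corruption},\; M = \text{monopoly},\; D = \text{discretion},\; A = \text{accountability}
```

Read this way, corruption risk grows with monopoly and discretion and shrinks with accountability.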

A Nation’s Competitiveness and Corruption Perception Don’t Go Hand-in-Hand

For Promoting Tourism, Corruption Perception Has to Be Removed

Corruption – It’s Far and Near

Some Key Questions Related to Corruption

• Exchange of money: can a service for which the customer does not pay a fee (free service) be termed corrupt? Or conversely, can a corrupt practice only happen if the customer pays for a service?

• Human agents: can a service be corrupt if the agent delivering the service is not a human but an automated agent?

• Contention for resources: can corruption happen if delivering the service involves no contention for resources? Alternatively, if resources are scarce, will an objective way of allocating them help remove corruption?

Metamodel – Expressing Key Concepts for Corruption

[Figure: metamodel relating Provider, Requestor, Person, Organization, Process, Activity (Task, Decision, Inputs, Outputs, Escalation), Process Instance, Activity Instance, Execution Time and Execution Cost.]
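The metamodel's concepts can be sketched as plain data classes so that concrete processes can be encoded and compared. The links between concepts below are assumptions (a process has providers and activities; an activity is a task or decision with inputs, outputs and an optional escalation; a process instance records activity instances with execution time and cost), as is the `objective` flag used to mark discretionary steps. The toy instance borrows step names from the Kenya example, with made-up times and costs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class ActivityKind(Enum):
    TASK = "task"
    DECISION = "decision"

@dataclass
class Party:
    """A Person or an Organization acting as Provider or Requestor."""
    name: str
    is_organization: bool = False

@dataclass
class Activity:
    name: str
    kind: ActivityKind
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    escalation: Optional[Party] = None  # who can review this step, if anyone
    objective: bool = True              # assumption: False marks a discretionary step

@dataclass
class Process:
    name: str
    providers: List[Party]  # more than one provider means no monopoly
    activities: List[Activity]

@dataclass
class ActivityInstance:
    activity: Activity
    execution_time_days: float
    execution_cost: float

@dataclass
class ProcessInstance:
    process: Process
    requestor: Party
    steps: List[ActivityInstance] = field(default_factory=list)

    def total_time(self) -> float:
        return sum(s.execution_time_days for s in self.steps)

    def total_cost(self) -> float:
        return sum(s.execution_cost for s in self.steps)

# Toy instance loosely based on the Kenya registration example; times and costs are made up.
office = Party("Registration Office", is_organization=True)
validate = Activity("Validate docs", ActivityKind.DECISION, inputs=["Proof of birth"])
vetting = Activity("Vetting", ActivityKind.DECISION, objective=False)
proc = Process("National Registration", providers=[office], activities=[validate, vetting])
run = ProcessInstance(proc, Party("Applicant"),
                      [ActivityInstance(validate, 1, 0), ActivityInstance(vetting, 5, 0)])
print(run.total_time(), [a.name for a in proc.activities if not a.objective])
```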

Framework Evaluation, by Example

[Figure: process flow for National Registration in Kenya, with swimlanes Citizen, Registration Officer, Vetting Committee, Chief Asst. Officer and NRB Officer. Steps: 1. Submit supporting documents (proof of birth, proof of citizenship, proof of residence); 2. Validate docs; 3. Vetting; 4. Hand over serialized application form; 5. Fill and submit application form (Form 101, Form 136 A, Form 136 C); 6. Take fingerprints; 7. Take photograph for ID card; 8. Hand over the waiting card; 9. Receive waiting card and wait for processing; 10. Submit documents to Chief; 11. Application signed and stamped by Chief Asst. Officer; 12. Submit documents to NRB; 13. Verify identity of the applicant; 14. Process ID card; 15. Send ID card to the Registration Office; 16. Receive ID card from NRB; 17. Collect ID card. Branch conditions shown: insufficient / sufficient documents; ancestral home town is a border district or age >> 18 (additional proof of residence); satisfied / not satisfied.]

National Registration: Kenya vs. India (Aadhaar) vs. USA (Social Security)

Kenya

•  The decision node, 3 - vetting, and the activity, 13 - verify identity, are discretionary, with no clear mechanism on how to accomplish them.

•  In contrast, the checks for documents having been submitted are objective.

•  There is no Service Level Agreement (SLA) for the process.

•  The ID process is monopolistic since only a single authority (the registration office) can process it.

•  The process has little reviewability and low visibility since there is no escalation mechanism.

India (Aadhaar)

•  18 Proof of Identity (PoI) and 33 Proof of Address (PoA) documents are permitted for making the request.

•  The process also allows discretion by accepting documents attested by high-level officials.

•  The cost and time limits for the service are prescribed.

•  The process, however, can only be handled by a single agency, creating a monopoly.

USA (Social Security)

•  In Social Security, a clear list of documents proving US citizenship (or legal residence), age and identity is provided.

•  There is little room for discretion because no category accepts a signed attestation by a high-level official.

•  The cost and time limits for the service are prescribed.

•  The process, however, can only be handled by a single agency, creating a monopoly.
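The comparison above can be turned into a crude Monopoly + Discretion - Accountability score. This is an illustrative sketch, not the framework's actual assessment: the one-point weights are hypothetical, and the inputs are a simple encoding of the bullets (all three processes are handled by a single agency; Kenya has two discretionary steps, India one via attested documents, the USA none; Kenya has no SLA, while India and the USA prescribe cost and time limits).

```python
from dataclasses import dataclass

@dataclass
class ProcessProfile:
    name: str
    agencies: int             # agencies that can deliver the service
    discretionary_steps: int  # steps without an objective criterion
    prescribed_limits: bool   # SLA, or prescribed cost and time limits

def corruption_risk(p: ProcessProfile) -> int:
    """Crude Monopoly + Discretion - Accountability score (illustrative weights)."""
    monopoly = 1 if p.agencies == 1 else 0
    accountability = 1 if p.prescribed_limits else 0
    return monopoly + p.discretionary_steps - accountability

profiles = [
    ProcessProfile("Kenya", agencies=1, discretionary_steps=2, prescribed_limits=False),
    ProcessProfile("India (Aadhaar)", agencies=1, discretionary_steps=1, prescribed_limits=True),
    ProcessProfile("USA (Social Security)", agencies=1, discretionary_steps=0, prescribed_limits=True),
]
for p in sorted(profiles, key=corruption_risk, reverse=True):
    print(f"{p.name}: {corruption_risk(p)}")
```

The ordering (Kenya highest, USA lowest) only restates the qualitative bullets; a real assessment would need a justified weighting of each factor.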

Framework Evaluation, by Example

International Driving Permit (IDP)

[Figure: process flow for the International Driving Permit, with swimlanes Citizen, Front Desk Officer, Inspector and Regional Transport Officer. Steps: 1. Submit supporting documents (driver's license, passport, air tickets, visa); 2. Validate docs; 3. Validate address; 4. DL address change process (sub-process); 5. Hand over application form; 5. Fill and submit application form (Form CMV1); 6. Verify DL issuance date; 7. Send applicant for DL test; 8. Verify applicant's driving skills; 9. Send application to Regional Transport Officer; 10. Stamp and sign the IDP; 11. Send IDP to front desk officer; 12. Receive IDP from Regional Transport Officer; 13. Collect IDP. Branch conditions shown: insufficient documents; DL address under / not under RTO jurisdiction; address has / has not changed; DL issued within 3 months / more than 3 months ago; satisfied / not satisfied.]

International Driving License: India (IDP) vs. USA (AAA)

India (IDP)

•  The service execution cost is specified (Rs 500), but no service execution time is given.

•  There is no escalation mechanism.

•  The check whether all documents have been submitted is objective.

•  The IDP is monopolistic since only a single authority (RTO) can process it.

•  The process has little reviewability and low visibility since there is no escalation mechanism.

USA (AAA)

The procedure involves filling in a form online, visiting the office of an authorized agency with a valid state-issued driver’s license, photos and fees, and getting the permit. Here, multiple agencies can process the request, and the prerequisite driver’s license can be verified objectively (e.g., with social security databases).

•  No monopoly

•  Objective criteria

Tackling Corruption

Tackling corruption proactively:

!   Open Gov. Data
    !   Increases transparency, hence increasing the risk of being caught in the act of corruption
    !   Makes measurement by SLAs possible

!   Process Redesign
    !   Ensures a robust process design, reducing corruption hotspots
    !   Reduces monopoly and discretion

!   Automation
    !   Automation needs outcomes to be formally defined
    !   Reduces discretion; requires data (input, output, outcome) to be adequately captured
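Once service records and SLA limits are published as open data, measuring compliance becomes a simple aggregation. A minimal sketch, assuming hypothetical records of `(service, days_taken)` and a hypothetical published limit per service; services without a published SLA cannot be measured and are skipped.

```python
from collections import defaultdict

def sla_report(records, sla_days):
    """records: iterable of (service, days_taken); sla_days: service -> limit in days.
    Returns service -> (compliance_rate, breach_count) for services with a published SLA."""
    totals = defaultdict(int)
    breaches = defaultdict(int)
    for service, days in records:
        if service not in sla_days:
            continue  # no SLA published, so nothing can be measured
        totals[service] += 1
        if days > sla_days[service]:
            breaches[service] += 1
    return {s: ((totals[s] - breaches[s]) / totals[s], breaches[s]) for s in totals}

records = [("id_card", 20), ("id_card", 45), ("id_card", 28), ("permit", 3)]
report = sla_report(records, {"id_card": 30})
print(report)
```

The skipped `permit` service illustrates the point in the text: without a published SLA there is nothing to hold the process accountable against.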

Corruption : “Monopoly + Discretion – Accountability” (Klitgaard, Robert E. Controlling corruption. Berkeley: U. of California Press, 1988)

Running Example – Potential Applications of Temperature at Conference Location (Over Time)

!   External temperature
    !   Environment models, weather forecasting, pollution spread models, disease spread rates, …

!   Internal temperature
    !   Energy management, security management, building management, traffic management, …

!   Temperature is unrelated to the technical program. Imagine what could be enabled if the conference’s technical content were made machine-consumable with APIs and used in real applications.
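As a small illustration of making conference content machine-consumable, a talk record can be published as JSON-LD. This sketch assumes the schema.org vocabulary (`Event`, `performer`, `affiliation`, `superEvent`, `startDate`) and uses this keynote's own details; the helper name `talk_to_jsonld` is hypothetical.

```python
import json

def talk_to_jsonld(title, speaker, affiliation, date, conference):
    """Describe one talk as a schema.org Event nested inside its conference."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": title,
        "startDate": date,  # ISO 8601 date
        "performer": {
            "@type": "Person",
            "name": speaker,
            "affiliation": {"@type": "Organization", "name": affiliation},
        },
        "superEvent": {"@type": "Event", "name": conference},
    }

doc = talk_to_jsonld(
    "Big Open Data and Semantics for a Real-World Application Near You",
    "Biplav Srivastava", "IBM Research – India", "2014-10-21", "AMECSE 2014",
)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

Publishing every talk this way would let other applications link speakers, topics and venues without scraping the conference website.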

Call for Action

!   Main message
    !   Use more open data in your research
    !   Build apps and make them available

!   Specifics
    !   Governments should
        !   Come out with data sharing/disclosure policies
            !   Example: USA - US Executive Order 13556, Controlled Unclassified Information, at http://www.whitehouse.gov/the-pressoffice/2010/11/04/executive-order-controlled-unclassifiedinformation
            !   Example: India - National Data Sharing and Accessibility Policy (NDSAP) at http://dst.gov.in/NDSAP.pdf
        !   Come out with specific application licensing guidelines
        !   Implement them!
    !   Academia must
        !   Lead research in this area
        !   Make their own data available in linked open form (LOD)
    !   Industry and standardization bodies should help by documenting best practices, building necessary tools, using open standards, and reporting case studies.

Thank You


Dr. Biplav Srivastava, http://www.research.ibm.com/people/b/biplav/

