
NiceCover: A Serverless Webapp for Crowdsourcing Data Extraction and Knowledge Generation on Top of...

Date post: 16-May-2015
Upload: marat-zhanikeev
Description:
Although it is now possible to host socially distributed web applications in the cloud without resorting to traditional servers, few existing tools even try to support social collaboration when generating knowledge. The Semantic Web in its early definition has withered, replaced by search services (APIs) provided separately by each (scientific) portal. This leaves a void that social groups can fill by collaborating on knowledge generation on top of the data provided by scientific portals. This paper presents one such application, codenamed NiceCover. Use cases in the paper are based on the web portal for IEICE Technical Meetings (kens), but the webapp itself can be applied to any portal(s).
Transcript


M.Zhanikeev -- [email protected] -- NiceCover: A Serverless Webapp for ... on top of Scientific Portals 2/26...


Objectives

1. facilitate true social collaboration of researchers

2. base your collaboration on existing scientific portals
   ◦ like IEICE iSCover, IEEE Xplore, SpringerLink, ADL, etc.

3. create software for that


What's a Serverless Webapp?


Serverless Apps in the Cloud

Serverless Webapp: ... is a webapp that can run without a web server

• entirely based on clouds
• hosted in cloud drives
• data management using APIs
  ◦ Google Drive API, Dropbox API

[Figure: diagram with labels Service/Apps, Portal, The Internet, Truly Open Clients, Traditional Clients, Data, Apps, Sync, Big Data, User, Facilitators, Data Hoarders]


What's Social Collaboration... in the world of webapps and cloud APIs?


Social Collaboration in the Cloud

• data can be yours only or it can be shared
• you can see all these models in practice today
• the shared model is only used by big players (big data) 07

[Figure: diagram of Data and App arrangements for the data models listed above]

07 R.Barros+4 "A Collaborative Approach to Building Evaluated Web Pages Datasets" Future Gen. Comp.Sys., vol.27(1) (2011)


Simple Example: maps2graphs

http://tinyurl.com/maps2graphs
• you can try it now! just let it run or commit before closing

Problem Statement: You need to create a graph where nodes are actual places on a map and links are roads connecting them

• the above link: 300 Family Marts in Fukuoka

• about 45k GoogleMaps API requests (300 × 299 / 2 = 44,850), assuming A→B = B→A routes
  ◦ free accounts are allowed 2.5k requests per day

• non-social solution: single client, 18 days

• social solution: 100 clients, a couple of hours
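The arithmetic behind these figures can be checked with a short sketch. Under the slide's assumption that A→B = B→A, 300 places give 300 × 299 / 2 = 44,850 route requests, which at 2.5k requests per day is 18 days for one client. The function names and the round-robin split below are illustrative assumptions, not the way maps2graphs itself assigns work.

```python
from itertools import combinations
from math import ceil

DAILY_QUOTA = 2500  # free-account limit quoted on the slide


def route_requests(places: int) -> int:
    """Number of unordered pairs of places (A→B and B→A count once)."""
    return places * (places - 1) // 2


def days_needed(places: int, clients: int, quota: int = DAILY_QUOTA) -> int:
    """Days until every pair is resolved when clients share the work evenly."""
    per_client = ceil(route_requests(places) / clients)
    return ceil(per_client / quota)


def split_pairs(places: int, clients: int) -> list:
    """Deal pairs to clients round-robin (a hypothetical assignment scheme)."""
    shares = [[] for _ in range(clients)]
    for i, pair in enumerate(combinations(range(places), 2)):
        shares[i % clients].append(pair)
    return shares


if __name__ == "__main__":
    print(route_requests(300))    # total requests for 300 places
    print(days_needed(300, 1))    # single client
    print(days_needed(300, 100))  # 100 collaborating clients
```

With 100 clients each share is at most 449 requests, well under one day's quota, which is consistent with the "couple of hours" estimate.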


Step 1: Make Social Collaboration Happen


1. is someone interested in graphs of actual locations?
   ◦ supply chains, road traffic optimization, etc.

2. is someone interested in research collaboration based on content published in IEEE Xplore?
   ◦ post-publication discussion?
   ◦ professor-students collaboration in a lab?


Step 2: Distribute and Make it Data-Centric


Problem: ... is that most existing software is all about big players and is completely useless for real social collaboration

• problems that need solutions

1. client-side indexing with cloud storage 12
   • Lucene is a really bad choice! 13

2. seamless operation on top of scientific portals
   • LinkedData APIs are not enough!

3. reasonable level of security
   • your data should be safe!

12 myself "Stringex client" https://github.com/maratishe/stringex (current)

13 "Apache Lucene" lucene.apache.org (current)


Step 2: The Stringex Problem in Client-Side Indexing

• JSON based

• index is created locally in browser

• block-wise updates

• optimization problem -- minimize traffic exchange 11
  ◦ part of NiceCover, but have PHP client as well 12

[Figure: Stringex index layout. JSON documents ({ name: value1, age: value2, … }) are stored under document numbers; per JSON key, hashing with a bit mask maps values to hash-table entries listing document numbers; the index is split into blocks (name.block1, name.block2, age.block1, age.block2, docs.block1, docs.block2) kept in local storage and synced in real time with cloud storage in the Cloud Drive API app space]

11 myself "The Stringex Problem: a New Formulation and Optimizations for Client-Side Cloud Applications" Springer (in review)

12 myself "Stringex client" https://github.com/maratishe/stringex (current)
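The block-wise, per-key hashing idea on this slide can be illustrated with a loose sketch. This is not the Stringex client's actual implementation: the class, the function names, the choice of MD5, and the way blocks are named are all assumptions made for illustration. Only the general shape (JSON documents, a masked hash per key, small blocks that are synced individually) comes from the slide.

```python
import hashlib
import json
from collections import defaultdict

KEY_HASH_BITS = 4   # assumed: bits of the value hash that select a block per key
DOC_HASH_BITS = 24  # assumed: bits of the document hash used as document number


def masked_hash(text: str, bits: int) -> int:
    """Hash text and keep only the lowest `bits` bits (the bit mask)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)


def block_name(key: str, value) -> str:
    """Name of the block that holds documents with this key/value (hypothetical)."""
    return f"{key}.block{masked_hash(json.dumps(value), KEY_HASH_BITS)}"


class BlockIndex:
    """In-memory stand-in for the local index; `dirty` lists blocks to sync."""

    def __init__(self):
        self.docs = {}                  # document number -> JSON document
        self.blocks = defaultdict(set)  # block name -> document numbers
        self.dirty = set()              # blocks changed since the last flush

    def add(self, doc: dict) -> int:
        doc_id = masked_hash(json.dumps(doc, sort_keys=True), DOC_HASH_BITS)
        self.docs[doc_id] = doc
        for key, value in doc.items():
            name = block_name(key, value)
            self.blocks[name].add(doc_id)
            self.dirty.add(name)        # only this small block needs re-upload
        return doc_id

    def search(self, key: str, value) -> list:
        # A block can mix several values, so candidates are checked exactly.
        candidates = self.blocks.get(block_name(key, value), set())
        return [self.docs[d] for d in sorted(candidates)
                if self.docs[d].get(key) == value]

    def flush(self) -> list:
        """Return the blocks that would be uploaded, then mark them clean."""
        changed = sorted(self.dirty)
        self.dirty.clear()
        return changed
```

The point of the design is that an update touches a few small blocks rather than one large index file, so only those blocks travel to the cloud drive.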


Step 2: Stringex: Performance

• Lucene is too jittery
• Stringex client writes more smaller files
  ◦ localized updates later on

• Stringex is more efficient

[Figure: file count (0 to 60) versus transmitted traffic volume (0 to 24000 kb) for Stringex (keyHashMask=4; docHashMask=24) and Lucene]


Step 3: Make it Easy to Come By and Use


Step 3: Chrome Extension

• 3 JS scripts as defined by Chrome 03b
  1. run for each page
  2. run once, stays in background
  3. control panel

[Figure: NiceCover browser extension with three parts: control panel (user clickable), add-to-page parsing script with a parsing report (popup), and background script; arrows labeled Status, Pass data for storage, Cloud Drive writes only, and Cloud Drive reads/writes]

03b "Google Chrome Extensions Developer Reference" http://developer.chrome.com/extensions/getstarted.html (current)


Step 3: Full Automation

• will run with zero user input

• user still has freedom to cancel indexing of pages

• Stringex index and cloud sync all happen in the background
  ◦ assuming you do not close your browser...

[Figure: flow between NiceCover (robot) and users, with steps URL Prefix Match, Extract data, Wait for input, Timeout, Store default, Hide popup, User click, Manual input]
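The zero-input flow in the figure can be sketched as a single decision function. This is a minimal sketch under stated assumptions: the function names, the CANCEL marker, and the way user input is modeled (a callable that returns None on timeout) are hypothetical, and the prefixes are simply the portal URLs from the reference list, assumed here to be what the URL prefix match looks for.

```python
# Assumed prefixes, taken from the portal URLs in the reference list.
PORTAL_PREFIXES = (
    "http://ieeexplore.ieee.org/",
    "http://i-scover.ieice.org/",
)

CANCEL = "cancel"  # hypothetical marker for "user cancelled indexing"


def matches_portal(url: str, prefixes=PORTAL_PREFIXES) -> bool:
    """URL prefix match: only pages of known portals are parsed."""
    return url.startswith(tuple(prefixes))


def handle_page(url, extract, wait_for_user, timeout_s=5.0):
    """Return the record to store for this page, or None if nothing is stored.

    extract(url) returns the automatically parsed metadata (a dict).
    wait_for_user(timeout_s) returns None on timeout, CANCEL if the user
    cancelled, or a dict of manually corrected fields after a user click.
    """
    if not matches_portal(url):
        return None                  # not a portal page: do nothing
    default = extract(url)           # extract data, then show the popup
    decision = wait_for_user(timeout_s)
    if decision == CANCEL:
        return None                  # user cancelled indexing of this page
    if decision is None:
        return default               # timeout: store default, hide popup
    return {**default, **decision}   # manual input overrides parsed fields
```

Because a timeout falls through to the default record, the extension keeps working with no clicks at all, while a click still lets the user correct or cancel.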


Step 4: Make it Secure


OAuth-Based Communication

• NiceCover will only use what others shared with you

[Figure: Cloud Drive API access by Miner and Mapper; labels: User's own content, NiceCover webapp, NiceCover Public Data, NiceCover Private Data]


Public and Private, ... or Both?

• shadow writes, private + public

• simple protection against malicious actions
  ◦ user confirms deletes and overwrites
  ◦ added data is logged and presented to user on request
  ◦ rollback backups

[Figure: NiceCover Mapper instances connected to a Cloud Drive Provider and to each other over peer connections, with shadow writes]
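The protections listed on this slide can be sketched with an in-memory store. The slides do not define "shadow writes", so this sketch assumes it means that every write to the private copy is mirrored to the public copy. The class and method names are hypothetical, and plain dicts stand in for the cloud drive folders.

```python
class GuardedStore:
    """Private + public stores with confirmation, logging and rollback."""

    def __init__(self, confirm):
        self.private = {}
        self.public = {}
        self.log = []           # (action, key) entries, shown to user on request
        self.backups = []       # snapshots taken before each change
        self.confirm = confirm  # callable(action, key) -> bool, asks the user

    def _backup(self):
        self.backups.append((dict(self.private), dict(self.public)))

    def write(self, key, value) -> bool:
        # Overwrites need confirmation; brand-new data is only logged.
        if key in self.private and not self.confirm("overwrite", key):
            return False
        self._backup()
        self.private[key] = value
        self.public[key] = value  # assumed meaning of a shadow write
        self.log.append(("write", key))
        return True

    def delete(self, key) -> bool:
        if key not in self.private or not self.confirm("delete", key):
            return False
        self._backup()
        del self.private[key]
        self.public.pop(key, None)
        self.log.append(("delete", key))
        return True

    def rollback(self) -> bool:
        """Restore the snapshot taken before the most recent change."""
        if not self.backups:
            return False
        self.private, self.public = self.backups.pop()
        self.log.append(("rollback", None))
        return True
```

A store created with `GuardedStore(confirm=lambda action, key: False)` accepts new data but refuses every overwrite and delete, which is the conservative default the slide describes.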


A DEMO?


That’s all, thank you ...


[01] "The enPiT Project" http://www.enpit.jp (current)

[02] "NiceCover Project Page" https://github.com/maratishe/nicecover (current)

[03] "Kontagent Homepage" http://www.kontagent.com/ (current)

[03b] "Google Chrome Extensions Developer Reference" http://developer.chrome.com/extensions/getstarted.html (current)

[04] "Dropbox Homepage" www.dropbox.com/ (current)

[05] "IEEE Xplore Terms of Use" http://ieeexplore.ieee.org/xpl/termsOfUse.jsp (current)

[06] "IEICE iSCover Search Engine" http://i-scover.ieice.org/ (current)

[07] R.Barros+4 "A Collaborative Approach to Building Evaluated Web Pages Datasets" Future Gen. Comp.Sys., vol.27(1) (2011)

[08] myself "On Metro Maps versus Ontology Graphs in Assisted Context Creation and Browsing" IEICE Tran. Info. (in review)

[09] K.Nesbitt "Getting to more abstract places using the metro map metaphor" Conf. on Info. Visual. (IV) (2004)

[10] myself "Metromaps Project" https://github.com/maratishe/metromaps (current)

[11] myself "The Stringex Problem: a New Formulation and Optimizations for Client-Side Cloud Applications" Springer (in review)

[12] myself "Stringex client" https://github.com/maratishe/stringex (current)

[13] "Apache Lucene" lucene.apache.org (current)


Q/A 1: Structure/Visualization

• currently, NiceCover only provides raw metadata

• need to connect

• metromaps is a good way to do that 08 09 10

• otherwise, ontologies

[Figure: graph of topic keywords, including visualization, software, performance, optimization, cloud, wireless, energy efficiency, network tomography, and missing values]

08 myself "On Metro Maps versus Ontology Graphs in Assisted Context Creation and Browsing" IEICE Tran. Info. (in review)

09 K.Nesbitt "Getting to more abstract places using the metro map metaphor" Conf. on Info. Visual. (IV) (2004)

10 myself "Metromaps Project" https://github.com/maratishe/metromaps (current)


Q/A 2: Why Not Wikipedia Style?

1. Wikipedia is a few-create, many-watch model
   ◦ collaborators are roughly equal in NiceCover, you contribute by browsing
   ◦ some may contribute more (professors versus students) but the effort gap is not extreme

2. Wikipedia is a centralized place
   ◦ NiceCover is a bottom-up aggregation
   ◦ there is a hierarchy
   ◦ sideways (peer) connections are also important

3. Wikipedia pages are about one thing with references to other things
   ◦ NiceCover nodes are aggregates of things under a rough common intersection of topics
   ◦ at least they are supposed to be by design
   ◦ see metromaps 08 09 10

08 myself "On Metro Maps versus Ontology Graphs in Assisted Context Creation and Browsing" IEICE Tran. Info. (in review)

09 K.Nesbitt "Getting to more abstract places using the metro map metaphor" Conf. on Info. Visual. (IV) (2004)

10 myself "Metromaps Project" https://github.com/maratishe/metromaps (current)


Q/A 3: Is It Even Legal?

• not working with files, only metadata freely available on websites
  ◦ think about this as group browsing

• does not violate existing terms of use
  ◦ IEEE Xplore 05
  ◦ IEICE iSCover seems to have no terms of use

• no problem with Dropbox -- we are just storing some data

05 "IEEE Xplore Terms of Use" http://ieeexplore.ieee.org/xpl/termsOfUse.jsp (current)


Q/A 4: Why Bother? IEEE Xplore Does it Better!

• you seem to have missed the entire point

• IEEE Xplore: all about big information
  ◦ big data: big aggregates of information -- papers, presentations, etc.
  ◦ with growing volume, precision/focus/relevance is lost

• NiceCover: all about specific information
  ◦ focus is as big as your collaboration -- which is normally comparatively small

• big portals have poor APIs -- not suitable for collaboration, search only
