
Building a super database from linked data

Date post: 25-Jun-2015
Upload: stephen-wang
Description:
Stephen Wang (http://stephenwang.com), Alivenotdead.com CTO. mongoDB Beijing presentation (March 3, 2011): From Rotten Tomatoes to alivenotdead.com to alive.cn, an explanation of how an entertainment database was built at each stage of its evolution. The current version is a multilingual global entertainment database using linked open data and mongoDB.
Transcript
Page 1: Building a super database from linked data

Building a super database from linked data

Stephen Wang 王傳仁 [email protected]

March 3, 2011

Page 2: Building a super database from linked data

Who is this NOT for?

Building a large database from a tiny team
Organizing the world's information
Information innovation

Who IS this for?

Page 3: Building a super database from linked data

About Rotten Tomatoes

Co-founder, CTO
Popular movie reviews web site
Aggregated reviews, comprehensive film database

Page 4: Building a super database from linked data

The Stone Age

Static HTML templates

Editors read articles and pull quotations

Only cover the newest movies

~1000 films

Page 5: Building a super database from linked data

Modern Times

Shift to LAMP
License long-tail database
Automated spiders, early UGC via critics
Use homegrown CMS for additional content

(How I felt maintaining Rotten Tomatoes' overloaded database servers)

Page 6: Building a super database from linked data

The Result

8 million unique visitors / month
Lean startup: 25x traffic with 7 staff
Great site for film lovers (including Steve Jobs)

Page 7: Building a super database from linked data

About alivenotdead.com

Co-founder, CTO

SNS for artists started with Daniel Wu 吴彦祖

Started with six artists, now 1,600 artists, 600K registered users

Also powers official web sites:

李连杰: JetLi.com

成龙: JackieChan.com

莫文蔚: KarenMok.com

Page 8: Building a super database from linked data

Our LAMP stack: Not the best setup for...

Newsfeeds...
Viral loop analysis...
Multivariate testing...

The Problem?!?

Scalability issues with real-time data, but without traffic from public, long-tail content

Page 9: Building a super database from linked data

About alive.cn

A better entertainment database

Providing the long-tail content

Still a part of alivenotdead.com

Still in alpha

Page 10: Building a super database from linked data

Features

Comprehensive info for celebrities, films, music, and TV

Searchable, structured data

Multilingual: English, Chinese, Japanese

Aggregated social media from inside/outside China

Page 11: Building a super database from linked data

Why use mongoDB?

Flexible schema for different data sources

Dozens of other sources...
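
To illustrate what a flexible schema offers when records arrive from different sources, here is a minimal sketch that uses plain Python dictionaries as stand-ins for documents in one collection. The field names and source labels are hypothetical and not taken from the talk, and nothing here calls a mongoDB driver.

```python
# Hypothetical records from two different sources; the field names are
# assumptions for illustration, not the schema used in the talk.
topics = [
    {"name": "Jet Li", "source": "source_a", "types": ["person", "actor"]},
    {"name": "Jet Li", "source": "source_b",
     "labels": {"en": "Jet Li", "zh": "李连杰"},
     "official_site": "JetLi.com"},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every criterion.

    A document that lacks a requested field does not match, so records
    with different shapes can sit side by side without a shared schema.
    """
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value
               for key, value in criteria.items())
    ]


def fields_seen(collection):
    """Collect the union of field names across all documents."""
    seen = set()
    for doc in collection:
        seen.update(doc)
    return sorted(seen)
```

Adding a new source with fields of its own then means appending another dictionary; existing records and queries are left as they are.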

Page 12: Building a super database from linked data

Why use mongoDB?

Scalable big data
2 million+ topics covered
500,000 translations

Next challenge:
Aggregating and storing the social media firehose

Page 13: Building a super database from linked data

Why use mongoDB?

Crossing the border...
alive.tom.com in Tianjin
Alivenotdead.com in Hong Kong

Use replica sets/eventual consistency to overcome frequent cross-border network issues
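
The idea of eventual consistency across an unreliable link can be sketched with a toy simulation. This is an assumption-laden illustration in plain Python, not how mongoDB implements replica sets: the Primary and Secondary classes, the link_up flag, and the keys are all hypothetical.

```python
class Primary:
    """Accepts writes and records them in an ordered operation log."""

    def __init__(self):
        self.data = {}
        self.oplog = []  # (key, value) writes, in the order they happened

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    """Serves local reads and catches up by replaying the primary's log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def sync(self, primary, link_up=True):
        """Replay unseen log entries; do nothing while the link is down."""
        if not link_up:
            return 0
        pending = primary.oplog[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


primary = Primary()      # the site on one side of the border
secondary = Secondary()  # the site on the other side

primary.write("topic:1", "v1")
secondary.sync(primary)

# Network outage: the write still succeeds on the primary, and the
# secondary keeps serving its older copy.
primary.write("topic:1", "v2")
secondary.sync(primary, link_up=False)
stale = secondary.data["topic:1"]

# Link restored: replaying the log brings both copies into agreement.
secondary.sync(primary)
fresh = secondary.data["topic:1"]
```

The trade-off shown is that reads on the far side may briefly return older data, in exchange for both sites staying available while the cross-border link is down.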

Page 14: Building a super database from linked data

Using Linked Open Data

Freebase
Wikipedia as structured data
Creative Commons license
Multiple CC sources
Organized taxonomy
Acquired by Google
No Chinese/Japanese yet!

Page 15: Building a super database from linked data

Using Linked Open Data

DBpedia
Wikipedia as structured data
Creative Commons license
Only Wikipedia
Messy taxonomy
Chinese/Japanese topic translations, but requires English topic link

Page 16: Building a super database from linked data

Using Linked Open Data

Use Freebase organized taxonomy, broad data
Expand DBpedia to Chinese-only topics
Same methodology across Chinese wiki sources
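
One way to picture the combination described on this slide is the sketch below. It assumes heavily simplified, hypothetical records: real Freebase and DBpedia dumps use different formats and identifiers, and build_topics is an illustrative helper, not code from the talk.

```python
# Hypothetical, simplified inputs for illustration only.
taxonomy = {  # Freebase-style: English topic -> organized types
    "Jet Li": ["person", "actor"],
    "Jackie Chan": ["person", "actor"],
}
translations = {  # DBpedia-style: English topic -> labels by language
    "Jet Li": {"zh": "李连杰"},
    "Jackie Chan": {"zh": "成龙"},
}
chinese_only = ["示例主题"]  # placeholder topic with no English link


def build_topics(taxonomy, translations, chinese_only):
    """Join types and translations on the English topic name."""
    topics = {}
    for english, types in taxonomy.items():
        labels = {"en": english}
        labels.update(translations.get(english, {}))
        topics[english] = {"types": list(types), "labels": labels}
    # Chinese-only topics have no English key to join on, so they are
    # keyed by their Chinese title and left untyped until classified.
    for title in chinese_only:
        topics.setdefault(title, {"types": [], "labels": {"zh": title}})
    return topics


topics = build_topics(taxonomy, translations, chinese_only)
```

The Chinese-only branch is the part that a join on English topic links cannot cover, which is why it is handled as a separate pass here.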

Page 17: Building a super database from linked data

The Future

Developer API
Topic extraction
Real-time trends across languages
Other verticals

Already 10x more data than Rotten Tomatoes...

The complete sum of information from across the web...

Information not constrained by language...

Page 18: Building a super database from linked data

We're hiring PHP engineers! Send your CV to [email protected]

My blog: http://stephenwang.com

