Date posted: 25-Jun-2015
Category: Technology
Uploaded by: stephen-wang
Who is this NOT for?
Who IS this for?
Building a large database from a tiny team
Organizing the world's information
Information innovation
About
Co-founder, CTO
Popular movie reviews web site
Aggregated reviews, comprehensive film database
The Stone Age
Static HTML templates
Editors read articles and pull quotations
Only cover the newest movies
~1000 films
Modern Times
Shift to LAMP
License long-tail database
Automated spiders, early UGC via critics
Use homegrown CMS for additional content
(How I felt maintaining Rotten Tomatoes' overloaded database servers)
The Result
8 million unique visitors / month
Lean startup: 25x traffic with 7 staff
Great site for film lovers (including Steve Jobs)
About
Co-founder, CTO
SNS for artists started with Daniel Wu 吴彦祖
Started with six artists, now 1,600 artists, 600K registered users
Also powers official web sites:
Jet Li (李连杰): JetLi.com
Jackie Chan (成龙): JackieChan.com
Karen Mok (莫文蔚): KarenMok.com
Our LAMP stack: not the best setup for...
Newsfeeds...
Viral loop analysis...
Multivariate testing...
The Problem?!?
Scalability issues with real-time data, but without traffic from public, long-tail content
About
A better entertainment database
Providing the long-tail content
Still a part of alivenotdead.com
Still in alpha
Features
Comprehensive info for celebrities, films, music, and TV
Searchable, structured data
Multilingual: English, Chinese, Japanese
Aggregated social media from inside/outside China
Why use MongoDB?
Flexible schema for different data sources
Dozens of other sources...
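As a rough illustration of what a flexible schema allows when sources disagree on fields, the sketch below uses plain Python dictionaries as stand-ins for documents in a single collection. The source names, field names, and the `merge_topic` helper are hypothetical and are not taken from the actual system.

```python
from collections import defaultdict

# Hypothetical records for one topic from two sources with different fields.
# In a document store each could be saved as-is, with no shared table schema.
records = [
    {"topic": "Jet Li", "source": "source_a", "films": ["Hero"]},
    {"topic": "Jet Li", "source": "source_b", "names": {"zh": "李连杰"}},
]

def merge_topic(docs):
    """Fold per-source documents into one topic document, keeping provenance."""
    merged = {"sources": []}
    for doc in docs:
        merged["sources"].append(doc["source"])
        for key, value in doc.items():
            if key != "source":
                # The first source to supply a field wins; later ones only add new fields.
                merged.setdefault(key, value)
    return merged

# Group the raw records by topic, then merge each group.
by_topic = defaultdict(list)
for rec in records:
    by_topic[rec["topic"]].append(rec)

topics = {name: merge_topic(docs) for name, docs in by_topic.items()}
```

Adding a new source with fields nobody has seen before needs no migration in this model; the merged document simply gains more keys.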
Why use MongoDB?
Scalable big data
500,000 translations
Next challenge:
Aggregating and storing the social media firehose
2 million+ topics covered
Why use MongoDB?
Crossing the border...
alive.tom.com in Tianjin
Alivenotdead.com in Hong Kong
Use replica sets/eventual consistency to overcome frequent cross-border network issues
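One way a client might be pointed at a replica set that spans both sites is sketched below. The deck does not say how the deployment was configured, so the host names, the replica set name `rs0`, and the `replica_set_uri` helper are all assumptions; `replicaSet` and `readPreference` are standard MongoDB connection string options.

```python
from urllib.parse import urlencode

def replica_set_uri(hosts, replica_set, read_preference="nearest"):
    """Build a MongoDB connection URI that lists every replica set member."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + options

# Hypothetical members, one per data center.
uri = replica_set_uri(
    ["db1.tianjin.example:27017", "db1.hongkong.example:27017"],
    replica_set="rs0",
)
```

A driver would receive this string as its connection URI. Reading from the nearest member keeps reads local when the cross-border link is down, at the cost of possibly stale data until replication catches up.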
Wikipedia as structured data
Creative Commons license
Multiple CC sources
Organized taxonomy
Acquired by Google
No Chinese/Japanese yet!
Using Linked Open Data
Wikipedia as structured data
Creative Commons license
Only Wikipedia
Messy taxonomy
Chinese/Japanese topic translations, but requires English topic link
Using Linked Open Data
Using Linked Open Data
Use Freebase organized taxonomy, broad data
Expand DBpedia to Chinese-only topics
Same methodology across Chinese wiki sources
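The idea of keeping topics that exist only in Chinese, instead of dropping anything without an English topic link, can be sketched as follows. This is a minimal illustration under assumed record shapes; the field names `id`, `label`, and `en_link` and the `link_topics` helper are hypothetical, not the real DBpedia or Freebase formats.

```python
def link_topics(english_topics, chinese_topics):
    """Attach Chinese labels to English topics; keep unlinked Chinese topics."""
    merged = {
        t["id"]: {"id": t["id"], "labels": {"en": t["label"]}}
        for t in english_topics
    }
    for topic in chinese_topics:
        link = topic.get("en_link")
        if link in merged:
            # Translation of a known English topic.
            merged[link]["labels"]["zh"] = topic["label"]
        else:
            # No usable English link: treat it as a Chinese-only topic
            # and give it a record of its own.
            key = "zh:" + topic["label"]
            merged[key] = {"id": key, "labels": {"zh": topic["label"]}}
    return merged

english = [{"id": "Jackie_Chan", "label": "Jackie Chan"}]
chinese = [
    {"label": "成龙", "en_link": "Jackie_Chan"},
    {"label": "示例条目"},  # placeholder entry with no English counterpart
]
merged = link_topics(english, chinese)
```

The same function could be run once per Chinese wiki source, which is one reading of "same methodology across Chinese wiki sources".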
The Future
Developer API
Topic extraction
Real-time trends across languages
Other verticals
Already 10x more data than Rotten Tomatoes...
The complete sum of information from across the web...
Information not constrained by language...