21 May 2010, Apache Lucene EuroCon
From publisher to platform
How the Guardian used content, search, and open source to build a powerful new business model
Stephen Dunn, Guardian News and Media
The publishing era
We started a long time ago:
To become the world's leading liberal voice.
“To secure the financial and editorial independence of The Guardian in perpetuity.”
“To promote freedom in the press and liberal journalism globally.”
2010
Swine flu
Keyword page
Twitter updates
Content partnerships
Audio
Video
Data API
Live blogs
Comment
Mobile site
iPhone app
Newspapers
1996
1999
01-> 06
2009
★ 1.5M pages and counting
★ 250M+ pages/month
★ 30M visitors/month
★ 4x Webby award winner (best newspaper site)
Part of the Web
1. Permanent
• “A cool URI is one that does not change” (Tim Berners-Lee, 1998)
• 1.5 million resources redirected to new scheme
http://www.flickr.com/photos/fstorr/
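One way to keep old addresses alive when moving to a new URL scheme is to answer requests for legacy paths with a permanent (301) redirect. The sketch below is only an illustration of that idea: the lookup table, the legacy paths, and the `resolve` helper are all hypothetical and are not taken from the Guardian's implementation.

```python
from typing import Optional, Tuple

# Hypothetical lookup from legacy paths to paths in the new URL scheme.
# Both entries are invented for illustration only.
LEGACY_TO_NEW = {
    "/old/story/12345.html": "/technology/internet",
    "/old/story/67890.html": "/environment/climatechange",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header value) for a requested path."""
    target = LEGACY_TO_NEW.get(path)
    if target is not None:
        # Legacy URL: tell clients and crawlers the resource has moved for good.
        return 301, target
    # Not a legacy URL: no redirect, let normal request handling take over.
    return 200, None
```

A permanent redirect, rather than a temporary one, signals to browsers and search engines that the new address should replace the old one.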
2. Addressable
★ Resources are “about” something - ready for the social web.
★ We live in “the age of point-at-things” (Coates 2005)
3. Discoverable
★ Multiple routes to content
★ Tagging drives discovery
The hackable guardian.co.uk
http://www.guardian.co.uk/....
/technology/internet
/technology/all
/environment/climatechange
/environment/climatechange+business/globaleconomy
/technology/internet/rss
/technology/all/rss
/environment/climatechange+business/globaleconomy/rss
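The URL patterns on this slide can be read as a small grammar: a tag path, optionally combined with further tag paths using `+`, optionally followed by `/rss`. A minimal sketch of that reading follows; the `tag_url` helper is hypothetical, and the rules it encodes are inferred from the slide's examples only.

```python
BASE = "http://www.guardian.co.uk"


def tag_url(*tags: str, rss: bool = False) -> str:
    """Build a tag-page URL from one or more tag paths.

    Tags are joined with '+', and '/rss' is appended for the feed form,
    mirroring the examples on the slide (an inferred rule, not a spec).
    """
    if not tags:
        raise ValueError("at least one tag path is required")
    path = "+".join(tag.strip("/") for tag in tags)
    url = f"{BASE}/{path}"
    return f"{url}/rss" if rss else url


print(tag_url("technology/internet"))
print(tag_url("environment/climatechange", "business/globaleconomy", rss=True))
```

The second call produces the combined-tag feed address shown as the last example on the slide.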
Results...
Site traffic growth: Unique Users
[Chart: unique users, Sep 2005 to Jan 2009; y-axis 3,750,000 to 30,000,000; annotations: Pre-project, First release, Final release, 36M]
However...
1 Billion+ Internet Users!
“How I stopped worrying about my website and learned to love the whole Internet.”
Matt McAlister
The Open Strategy
OPEN IN
Bring in data and apps from the Internet
OPEN OUT
Enable partners to build applications using Guardian content and services for other digital platforms
“Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”
The Open Platform (BETA)
The suite of services enabling partners to build applications with the Guardian
CONTENT API: A service for selecting and collecting content from the Guardian for re-use
DATA STORE: A directory of useful data curated by Guardian editors
POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day
CONTENT API: A service for selecting and collecting content from the Guardian for re-use
[Diagram: Guardian database, CMS, Search engine, REST API, Your App Here!]
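Since the Content API is exposed as a REST service, a client application mostly needs to build query URLs and read the responses. The sketch below only builds such a URL. The endpoint and the parameter names (`q`, `tag`, `api-key`) are assumptions here and should be checked against the Open Platform documentation; `search_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; verify against the Open Platform documentation.
API_BASE = "http://content.guardianapis.com/search"


def search_url(query: str, tag: Optional[str] = None,
               api_key: Optional[str] = None) -> str:
    """Build a search URL for a free-text query, optionally narrowed by tag."""
    params = {"q": query}
    if tag:
        params["tag"] = tag          # assumed parameter name
    if api_key:
        params["api-key"] = api_key  # assumed parameter name
    return f"{API_BASE}?{urlencode(params)}"


print(search_url("swine flu"))
```

Keeping URL construction separate from the HTTP call makes it easy to test the client without network access.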
• Stamen Design - APIMaps.org
Open for Business
3 Tiers of access, 3 Revenue models
BESPOKE: Take, reformat, augment our content. Same access as Guardian. Revenue model to be negotiated. Combination of Media, Fees, Downloads.
APPROVED: Take our full article content, with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue
KEYLESS: Take our headlines. You keep associated revenues
1
What this means
OPEN OUT: Developers can now access our full content APIs on demand, with keys post-approved.
We are now positioning the platform as a place to do business with us.
So rapid scalability, reliability, and performance are now core requirements.
2 Open In
CONTENT API: A service for selecting and collecting content from the Guardian for re-use
DATA STORE: A directory of useful data curated by Guardian editors
POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day
MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk.
App showcase
What this means
OPEN IN: Partners can now more easily integrate into our core
The Open Platform will become key to our commercial future.
Evolving the architecture
From Publisher to Platform
★ Seeking massive growth, but no longer only broadcasting content
★ User/partner engagement & contribution on:
  ★ journalism
  ★ data
  ★ software
  ★ applications
  ★ revenue and ads
★ Supporting developers and partners with data and APIs requires scalability, reliability, speed
[Diagram: 3 web servers, 3 app servers, CMS, data feeds, Memcached, Oracle]
Why RDBMS?
5 years ago, fewer alternatives
Understand operations procedures
Can easily recruit DBAs / devs
Developer/ops tools
Business critical system: a safe choice
Scaling
[Chart: Unique Users, Sep 2005 to Jan 2009, y-axis 3,750,000 to 30,000,000; then zoomed to May 2008 to Jan 2009, y-axis 12,250,000 to 28,000,000]
What's going on?
★ We tag our content (multifaceted)
★ Guardian.co.uk is a faceted browse through our tag-space, with editorial teams “spotlighting” key resources on selected nodes.
★ Multiple facets can be applied in a query faster in a search-like architecture than in an RDBMS
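To illustrate why a search-like architecture suits multi-facet queries, here is a toy inverted index in which each tag maps to the set of documents carrying it, so that applying several facets becomes a set intersection. This is a simplified sketch, not how Solr/Lucene or the Guardian's systems are implemented; `TagIndex` and the sample tags are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TagIndex:
    """Toy inverted index: tag -> set of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._postings[tag].add(doc_id)

    def query(self, *tags: str) -> Set[str]:
        """Return ids of documents that carry every one of the given tags."""
        if not tags:
            return set()
        # Intersect starting from the smallest posting set to keep work low.
        sets = sorted((self._postings.get(t, set()) for t in tags), key=len)
        result = set(sets[0])
        for postings in sets[1:]:
            result &= postings
        return result


index = TagIndex()
index.add("a1", ["environment/climatechange", "business/globaleconomy"])
index.add("a2", ["environment/climatechange"])
print(index.query("environment/climatechange", "business/globaleconomy"))
```

In a relational schema the same question typically needs one join per facet, which is the kind of cost the slide contrasts with.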
“Related content” from search engine
We used Solr/Lucene
Can perform complex queries, including full text search
We can change the schema with no downtime.
On our dataset most queries are of a similar cost
Scales very well horizontally
Replication makes it easy to work in the cloud
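As a rough illustration of combining full-text search with tag filters in Solr, the sketch below builds a select URL using Solr's standard `q`, `fq`, `rows`, and `wt` request parameters. The base URL and the `tag` field name are assumptions for illustration, not the Guardian's actual schema, and `solr_select_url` is a hypothetical helper.

```python
from typing import Sequence
from urllib.parse import urlencode


def solr_select_url(base: str, text: str, tags: Sequence[str],
                    rows: int = 10) -> str:
    """Build a Solr select URL: free text in q, one fq filter per tag."""
    params = [("q", text or "*:*")]  # match everything if no text is given
    # Each fq narrows the result set; 'tag' is a hypothetical field name.
    # Tag values are assumed not to contain double quotes.
    params += [("fq", f'tag:"{tag}"') for tag in tags]
    params += [("rows", str(rows)), ("wt", "json")]
    return f"{base}?{urlencode(params)}"


url = solr_select_url(
    "http://localhost:8983/solr/select",  # assumed local Solr instance
    "carbon trading",
    ["environment/climatechange", "business/globaleconomy"],
)
print(url)
```

Putting each facet in its own `fq` lets Solr cache the filters independently of the main query.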
[Diagram: core of web servers, app server, CMS, Memcached, and RDBMS; Content API backed by multiple Solr instances in the cloud (EC2)]
Open in?
MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk.
Simple REST/HTTP framework allows lightweight development
Applications proxied for performance
Apps generally hosted in the cloud, hot deployment into production
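The slide says only that applications are proxied for performance. One plausible reading is a proxy that caches each app's HTTP response for a short time, so that a slow external app does not slow down page rendering. The sketch below assumes that reading; `CachingProxy`, its time-to-live policy, and the injected `fetch` function are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Serve app responses from a short-lived cache (assumed behaviour)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # performs the real HTTP request
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # still fresh: skip the external call
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client library.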
[Diagram: core of web servers, app servers, CMS, Memcached, and RDBMS; OPEN OUT via Solr instances in the cloud (EC2); OPEN IN via apps on external hosting (App Engine etc.) behind a proxy]
Thank you
http://www.guardian.co.uk/open-platform
Twitter: @openplatform @cuica (Stephen Dunn)