
From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a...

Date post: 09-May-2015
Description:
Last year The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications leveraging The Guardian's rich content. This talk will cover how The Guardian opened up its content, enriched it, and reached new markets with its platform strategy. We cover the background platform strategy, technical architecture, implementation of Solr, and how the new release of the Guardian's Open Platform, launched May 20th, 2010, has embraced disruption in the media space while at the same time accelerating revenue.
Transcript
Page 1: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

21 May 2010, Apache Lucene EuroCon

From publisher to platform

How the Guardian used content, search, and open source to build a powerful new business model

Stephen Dunn, Guardian News and Media

Page 4: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

To secure the financial and editorial independence of the Guardian in perpetuity. To promote freedom in the press and liberal journalism globally.

To become the world's leading liberal voice.

Page 6: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

Swine flu

Keyword page

Twitter updates

Content partnerships

Audio

Video

Data API

Live blogs

Comment

Mobile site

iPhone app

Newspapers

2010

Page 11: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

2009

★ 1.5M pages and counting

★ 250M+ pages/month

★ 30M visitors/month

★ 4x Webby award winner (best newspaper site)

Page 17: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

2. Addressable

★ Resources are “about” something - ready for the social web.

★ We live in “the age of point-at-things” (Coates 2005)

Page 23: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

The hackable guardian.co.uk

http://www.guardian.co.uk/....

Page 24: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

The hackable guardian.co.uk

http://www.guardian.co.uk/....

/technology/internet

/technology/all

/environment/climatechange

Page 25: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

The hackable guardian.co.uk

http://www.guardian.co.uk/....

/technology/internet

/technology/all

/environment/climatechange+business/globaleconomy

Page 27: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

The hackable guardian.co.uk

http://www.guardian.co.uk/....

/technology/internet/rss

/technology/all/rss

/environment/climatechange+business/globaleconomy/rss
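
The URL pattern on these slides can be sketched in a few lines of Python. This is only an illustration of the scheme as it appears here (tag paths joined with "+", an optional "/rss" suffix); the function name `tag_url` and its parameters are hypothetical and not part of any Guardian library.

```python
BASE_URL = "http://www.guardian.co.uk"  # site root as shown on the slide


def tag_url(*tags: str, rss: bool = False) -> str:
    """Build a tag-page URL from one or more "section/keyword" tags.

    Multiple tags are joined with "+", and "/rss" is appended when a feed
    is wanted, following the pattern shown on the slides.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    # Strip stray slashes so "/technology/internet" and
    # "technology/internet" are treated the same.
    path = "+".join(tag.strip("/") for tag in tags)
    url = f"{BASE_URL}/{path}"
    return url + "/rss" if rss else url


print(tag_url("technology/internet"))
print(tag_url("environment/climatechange", "business/globaleconomy", rss=True))
```

Because the address is just a composition of tags, a reader (or a program) can guess new pages and feeds without consulting a site map, which is what makes the site "hackable".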

Page 30: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Chart: Site traffic growth. Unique users (y-axis up to 30,000,000), Sep 2005 to Jan 2009. Annotations: First release, Final release.]

Page 31: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Chart: Site traffic growth. Unique users (y-axis up to 30,000,000), Sep 2005 to Jan 2009. Annotations: Pre-project, First release, Final release.]

Page 32: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Chart: Site traffic growth. Unique users (y-axis up to 30,000,000), Sep 2005 to Jan 2009. Annotations: Pre-project, First release, Final release. Callout: 36M.]

Page 38: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

.... “How I stopped worrying about my website and learned to love the whole Internet.”

Matt McAlister

Page 39: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

The Open Strategy

OPEN IN

Bring in data and apps from the Internet

OPEN OUT

Enable partners to build applications using Guardian content and services for other digital platforms

Page 43: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

“Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”

Page 45: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

OPEN OUT

Allow partners to build applications using Guardian content and services for other digital platforms

OPEN IN

Bring in data and apps from the Internet

BETA

Page 47: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

The suite of services enabling partners to build applications with the Guardian

BETA

Page 49: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

CONTENT API: A service for selecting and collecting content from the Guardian for re-use

DATA STORE: A directory of useful data curated by Guardian editors

POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day

BETA

Page 50: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

CONTENT API: A service for selecting and collecting content from the Guardian for re-use

BETA

[Diagram labels: Guardian database, CMS, Search engine, REST API, Your App Here!]
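
From the point of view of "Your App Here!", the REST API in this diagram amounts to an HTTP GET with query parameters. The sketch below only builds such a request URL and makes no network call. The endpoint and the parameter names (`q`, `tag`, `page-size`, `api-key`) are assumptions based on the publicly documented Content API rather than anything shown on the slides, and `build_search_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; check the Open Platform documentation before relying on it.
SEARCH_ENDPOINT = "https://content.guardianapis.com/search"


def build_search_url(query: str, api_key: str, tag: Optional[str] = None,
                     page_size: int = 10) -> str:
    """Return a search URL for the given free-text query and optional tag."""
    params = [("q", query), ("page-size", str(page_size)), ("api-key", api_key)]
    if tag:
        # Restrict results to content carrying this tag.
        params.insert(1, ("tag", tag))
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("swine flu", "YOUR-KEY",
                       tag="environment/climatechange", page_size=5)
print(url)
```

A client would then fetch this URL with any HTTP library and parse the response; the key placeholder `YOUR-KEY` stands in for whatever key the chosen access tier provides.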

Page 55: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

BETA

POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day

Page 59: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

3 Tiers of access, 3 Revenue models

BESPOKE: Take, reformat, augment our content. Same access as Guardian. Revenue model to be negotiated. Combination of Media, Fees, Downloads.

APPROVED: Take our full article content, with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue

KEYLESS: Take our headlines. You keep associated revenues

1

Page 61: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

What this means

OPEN OUT: Developers can now access our full content APIs on demand with keys post-approved.

We are now positioning the platform as a place to do business with us.

So rapid scalability, reliability, and performance are now core requirements.

Page 63: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

CONTENT API: A service for selecting and collecting content from the Guardian for re-use

DATA STORE: A directory of useful data curated by Guardian editors

POLITICS API: Open database of candidates, voting records, constituencies, election results, live data on election day

MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk.

2 Open In

Page 64: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

OPEN OUT

Allow partners to build applications using Guardian content and services for other digital platforms

OPEN IN

Bring in data and apps from the Internet

Page 68: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

What this means

Open In: Partners can now more easily integrate into our core

The Open Platform will become key to our commercial future.

Page 70: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

From Publisher to Platform

★ Seeking massive growth, but no longer only broadcasting content

★ User/partner engagement & contribution on:
  ★ journalism
  ★ data
  ★ software
  ★ applications
  ★ revenue and ads

★ Supporting developers and partners with data and APIs requires scalability, reliability, and speed

Page 71: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Diagram labels: Web server (three instances), App server (three instances), CMS, Oracle, Memcached]

Page 72: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Diagram labels: Web server (three instances), App server (three instances), CMS, Data feeds, Oracle, Memcached]

Why RDBMS?

5 years ago, fewer alternatives

Understand operations procedures

Can easily recruit DBAs / devs

Developer/ops tools

Business critical system: a safe choice

Page 75: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Chart: Unique users (y-axis up to 30,000,000), Sep 2005 to Jan 2009.]

Page 77: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Chart: Unique users (y-axis from 12,250,000 to 28,000,000), May 2008 to Jan 2009.]

Page 78: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

What's going on?

★ We tag our content (multifaceted)

★ Guardian.co.uk is a faceted browse through our tag-space, with editorial teams “spotlighting” key resources on selected nodes.

★ Multiple facets can be applied in a query faster in a search-like architecture than in an RDBMS
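
One way to picture why a search-like architecture suits multi-facet queries is a toy inverted index: each tag maps to the set of items that carry it, and combining facets becomes a set intersection. The sketch below is a simplified illustration under that assumption, not a description of how Solr or the Guardian's systems are implemented; the article ids and the helper names are made up.

```python
from collections import defaultdict

# Hypothetical content items and their tags, for illustration only.
ARTICLES = {
    "a1": {"technology/internet", "business/globaleconomy"},
    "a2": {"environment/climatechange", "business/globaleconomy"},
    "a3": {"environment/climatechange"},
}


def build_index(articles):
    """Map each tag to the set of article ids carrying it (an inverted index)."""
    index = defaultdict(set)
    for article_id, tags in articles.items():
        for tag in tags:
            index[tag].add(article_id)
    return index


def with_all_tags(index, *tags):
    """Return the ids of articles that carry every requested tag."""
    if not tags:
        return set()
    # Intersect starting from the smallest set so the working set stays small.
    postings = sorted((index.get(tag, set()) for tag in tags), key=len)
    result = set(postings[0])
    for ids in postings[1:]:
        result &= ids
    return result


index = build_index(ARTICLES)
print(with_all_tags(index, "environment/climatechange", "business/globaleconomy"))
```

In a relational schema the same question is typically answered by joining a tagging table once per facet, so the work tends to grow with each facet added, whereas here each extra facet is one more intersection.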

Page 83: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

CONTENT API: A service for selecting and collecting content from the Guardian for re-use

[Diagram labels: Guardian database, CMS, Search engine, REST API, Your App Here!]

Page 85: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

We used Solr/Lucene

Can perform complex queries, including full text search

We can change the schema with no downtime.

On our dataset most queries are of a similar cost

Scales very well horizontally

Replication makes it easy to work in the cloud
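
As a rough sketch of what a "complex query, including full text search" might look like against Solr, the snippet below builds a select URL that combines a text query with one filter query per tag and asks for facet counts. The parameters `q`, `fq`, `rows`, `wt`, `facet` and `facet.field` are standard Solr query parameters; the host, the request path and the field name `tag` are assumptions, since the slides do not show the Guardian's schema.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; a real deployment would differ.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def faceted_query_url(text, tags, facet_field="tag", rows=10):
    """Build a Solr select URL: full-text query, tag filters, facet counts."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # One fq (filter query) per tag; Solr returns documents matching all of them.
    params += [("fq", f'{facet_field}:"{tag}"') for tag in tags]
    # Ask for counts of the remaining values of the facet field.
    params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = faceted_query_url(
    "flood", ["environment/climatechange", "business/globaleconomy"]
)
print(url)
```

Each additional facet is just another `fq` clause on the same request, which matches the earlier observation that multiple facets are cheap to combine in a search engine.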

Page 87: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Diagram labels: Web servers, App server, CMS, Memcached, rdbms, Core, Content API, Solr (multiple instances), Cloud/EC2]

Page 88: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

Open in?

MICROAPPS: A framework for integrating 3rd party applications into guardian.co.uk.

Simple REST/HTTP framework allows lightweight development

Applications proxied for performance

Apps generally hosted in the cloud, hot deployment into production
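
The slides do not say how the proxying works. One plausible reading is that the site fetches each externally hosted app's output over HTTP and caches it briefly, so a slow app does not slow every page view. The sketch below illustrates that idea only; `FragmentProxy`, its parameters and the example URL are hypothetical, and the HTTP fetch is injected as a plain function so the example runs without a network.

```python
import time
from typing import Callable, Dict, Tuple


class FragmentProxy:
    """Fetches HTML fragments from external apps and caches them briefly."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch        # function that performs the HTTP GET
        self._ttl = ttl_seconds    # how long a cached fragment stays fresh
        self._clock = clock        # injectable clock, handy for testing
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached and now - cached[0] < self._ttl:
            return cached[1]       # still fresh: skip the external call
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body


calls = []


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP client call to the external app."""
    calls.append(url)
    return f"<div>fragment from {url}</div>"


proxy = FragmentProxy(fake_fetch, ttl_seconds=60)
proxy.get("http://example.com/microapp")
proxy.get("http://example.com/microapp")
print(len(calls))  # the second request is served from the cache
```

Keeping the app behind a plain HTTP contract like this is also what would allow it to be hosted anywhere and redeployed without touching the core site.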

Page 90: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Diagram labels: Web servers, App server, CMS, Memcached, rdbms, Core, Proxy, Apps (multiple instances; external hosting, app engine etc.)]

Page 91: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

[Diagram labels: Web servers, App servers, CMS, Memcached, rdbms, Solr (multiple instances), Cloud/EC2, Proxy, Apps (multiple instances; external hosting, app engine etc.), OPEN IN, OPEN OUT]

Page 93: From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model

Thank you

http://www.guardian.co.uk/open-platform

Twitter: @openplatform @cuica (Stephen Dunn)

