Mashed Up Playlist

Post on 30-Oct-2014

description

Presented at Web Directions South 09. The ABC launched three new socially networked digital radio websites, ABC Dig Music, ABC Jazz and ABC Country, in July 2009. They are the first of several ABC projects involving content aggregation. As well as having slick, highly usable designs, the music platform integrates with various sources including MusicBrainz, YouTube, Last.fm and Wikipedia. This aggregation functionality graphically illustrates the possibilities of Semantic Web technology for an editorial organisation such as the ABC.

transcript

THE MASHED UP PLAYLIST part II

David Peterson @davidseth #w3c http://www.flickr.com/photos/soyignatius/

Challenge

Create a snapshot of an artist

Problem

<xml>
  <track>
    <title>Purple Rain</title>
    <artistName>Prince</artistName>
  </track>
</xml>
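
As a starting point, the track record above can be read into a plain dictionary. This is a minimal sketch using Python's standard library; the function name `read_track` and the dictionary keys are illustrative choices, not part of the original talk.

```python
import xml.etree.ElementTree as ET

# The playlist record from the slide, reproduced as a string.
TRACK_XML = """
<xml>
  <track>
    <title>Purple Rain</title>
    <artistName>Prince</artistName>
  </track>
</xml>
"""


def read_track(xml_text):
    """Return the title and artist name of the first <track> element."""
    root = ET.fromstring(xml_text.strip())
    track = root.find("track")
    if track is None:
        raise ValueError("no <track> element found")
    return {
        "title": track.findtext("title"),
        "artist": track.findtext("artistName"),
    }


track = read_track(TRACK_XML)
print(track)
```

All that comes out is two strings, a title and an artist name, which is exactly the problem: nothing here says which "Prince" is meant.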

Into

It’s all about storytelling

Shared Understanding

• Can’t tell a story if the other person doesn’t get what we mean
• Or even speak the same language
• Imagine
  – explain what a kiwi was
  – or what a sheep was
• The story matters
• ... but ...
• You never really have all the information you need, whether big or small

You Just don’t Always Know

• Someone else knows more than you
• How to find it?

One Exception

Semantic Web

• Core idea – you never really know the entire picture

• This is a good thing
• Freedom

Closed World

Open World

http://www.flickr.com/photos/almasryalyoum_e/

Finding a Solution

• Which APIs to use
• Which APIs can we use
• How can we combine data from multiple sources
• How can we automate it

The Curse of too much

• There are over 50 APIs listed on programmableweb.com
• Too many to look into
• Each has its own API methods and return data formats
  – JSON, XML, RSS, RDF !!!

Take your Pick

• APIs everywhere
  – BBC Music
  – Discogs
  – Last.fm
  – MusicBrainz
  – Yahoo Music
  – Flickr
  – YouTube
  – The Hype Machine

Finding the key

• One common feature was the use of a MusicBrainz ID
  – Last.fm
  – Discogs
  – Freebase
  – Wikipedia/DBpedia
  – BBC
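
To illustrate why a shared identifier matters, here is a rough sketch that turns one MusicBrainz ID into lookup URLs for two services. The URL patterns are assumptions based on the public MusicBrainz site and the Last.fm `artist.getinfo` web service method, not something shown in the talk; `YOUR_API_KEY` is a placeholder and the helper names are hypothetical.

```python
import uuid
from urllib.parse import urlencode

# Assumed URL patterns; check each service's documentation before relying on them.
MUSICBRAINZ_ARTIST_URL = "https://musicbrainz.org/artist/{mbid}"
LASTFM_API_ROOT = "http://ws.audioscrobbler.com/2.0/"


def normalise_mbid(raw):
    """Validate an MBID (a UUID) and return it in lowercase hyphenated form."""
    return str(uuid.UUID(raw))


def lookup_urls(raw_mbid, lastfm_api_key="YOUR_API_KEY"):
    """Build one lookup URL per service from a single MusicBrainz ID."""
    mbid = normalise_mbid(raw_mbid)
    lastfm_query = urlencode([
        ("method", "artist.getinfo"),
        ("mbid", mbid),
        ("api_key", lastfm_api_key),
        ("format", "json"),
    ])
    return {
        "musicbrainz": MUSICBRAINZ_ARTIST_URL.format(mbid=mbid),
        "lastfm": LASTFM_API_ROOT + "?" + lastfm_query,
    }


# The braces-wrapped form used on the slides is accepted by uuid.UUID.
urls = lookup_urls("{070d193a-845c-479f-980e-bef15710653e}")
for service, url in urls.items():
    print(service, url)
```

No network request is made here; the point is only that one key fans out to many sources.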

Eureka!

• Great, now all I had to do was use the MusicBrainz API to look up the ID and I was done. Easy...

• :(
• The search API sucked. It returned too many fuzzy results
• crap

Back to the future

• This is where the Semantic Web enters the picture
  – All that stuff about storytelling
  – Shared understanding
  – URIs (web links)

SPARQL

Think of it as Google with a WHERE clause

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?artist WHERE {
  ?artist foaf:name "Prince"@en .
  ?artist a <http://dbpedia.org/ontology/MusicalArtist> .
}

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia2: <http://dbpedia.org/property/>

SELECT ?artist ?bio ?url ?album WHERE {
  ?artist foaf:name "Prince"@en .
  ?artist a <http://dbpedia.org/ontology/MusicalArtist> .
  ?artist dbpedia2:abstract ?bio .
  ?artist foaf:page ?url .

  OPTIONAL {
    ?album <http://dbpedia.org/ontology/artist> ?artist .
    ?album rdfs:label "Purple Rain"@en .
  }
}
LIMIT 1
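
A query like this can be sent to a SPARQL endpoint over plain HTTP. The sketch below builds the simpler artist query and the request for it using only Python's standard library. It assumes the public DBpedia endpoint at `http://dbpedia.org/sparql` and the standard `application/sparql-results+json` media type; the helper names are hypothetical, and `run_query` is defined but deliberately not called, since it needs network access.

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint

QUERY_TEMPLATE = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?artist WHERE {{
  ?artist foaf:name "{name}"@en .
  ?artist a <http://dbpedia.org/ontology/MusicalArtist> .
}}
"""


def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_artist_query(name):
    """Fill the artist name into the query template."""
    return QUERY_TEMPLATE.format(name=escape_literal(name))


def build_request(query, endpoint=DBPEDIA_ENDPOINT):
    """Build a GET request that asks the endpoint for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query):
    """Send the query and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=10) as response:
        return json.load(response)


query = build_artist_query("Prince")
request = build_request(query)
print(request.full_url)
```

Escaping the name before it goes into the template keeps an artist name containing quotes from breaking the query.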

Pinpoint results

• This returns ONE result
• “exactly” what we are looking for (or nothing!)

{170d193a-845c-479f-980e-bef15710653e}

http://www.flickr.com/photos/riseofphoenix/

{070d193a-845c-479f-980e-bef15710653e}

http://www.flickr.com/photos/angeldew/

Raw Data

• Not too pretty to look at
• But computers LOVE this stuff
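
For a sense of what that raw data looks like, here is a small sketch that flattens a response in the W3C SPARQL JSON results format into plain rows. The sample response is hand-written for illustration and is not actual endpoint output; `flatten_bindings` is a hypothetical helper name.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (illustrative only).
SAMPLE_RESPONSE = json.loads("""
{
  "head": {"vars": ["artist"]},
  "results": {
    "bindings": [
      {"artist": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Prince_(musician)"}}
    ]
  }
}
""")


def flatten_bindings(response):
    """Turn each result binding into a {variable: value} dictionary."""
    rows = []
    for binding in response["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


rows = flatten_bindings(SAMPLE_RESPONSE)
print(rows)
```

Variables from an unmatched OPTIONAL block are simply absent from a binding, so callers should use `row.get("album")` rather than indexing directly.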

So, what do we get

• Disambiguation
• MusicBrainz ID
• Discography
• Related Artists
• Official homepage
• Bio
• Credit card details (in Semantic Web 2.0)

The Rosetta Stone

• MusicBrainz ID is our key to the wild web of APIs
• Wikipedia URL is the key to Semantic Web
• One happy family
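
The Wikipedia half of the key can be sketched in a few lines: DBpedia names its resources after English Wikipedia article titles, so the article URL maps directly onto a DBpedia URI. The function name below is hypothetical, and the sketch only handles `en.wikipedia.org` article URLs.

```python
from urllib.parse import urlsplit

DBPEDIA_RESOURCE_ROOT = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(wikipedia_url):
    """Map an English Wikipedia article URL to the matching DBpedia URI."""
    parts = urlsplit(wikipedia_url)
    prefix = "/wiki/"
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(prefix):
        raise ValueError("not an English Wikipedia article URL: " + wikipedia_url)
    title = parts.path[len(prefix):]
    if not title:
        raise ValueError("missing article title: " + wikipedia_url)
    return DBPEDIA_RESOURCE_ROOT + title


uri = wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Prince_(musician)")
print(uri)
```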

http://www.flickr.com/photos/vportals/

• [insert LOD graph]

Take a look

[browser]

Hindsight is 20/20

... or lessons learned

Drupal Sucks

• Drupal performance, what performance?
• Out of the box it’s been beaten with an ugly stick

Don’t use Drupal

• To get the best performance out of Drupal, don’t use Drupal

Pressflow

• Key patches and enhancements
• Releases mirror official Drupal releases
• Big players are using it
  – Drupal.org
  – ABC
  – Music labels
  – Newspapers

Start your Engines

• MySQL base install is ... lacking
• MyISAM == slow
• Use Percona XtraDB
• ... or ... InnoDB
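
Switching storage engines comes down to one `ALTER TABLE ... ENGINE=...` statement per table. The sketch below only generates those statements so they can be reviewed before running them against a database. The table names are illustrative examples, and the function name is hypothetical.

```python
def engine_conversion_statements(tables, engine="InnoDB"):
    """Return one ALTER TABLE statement per table name."""
    statements = []
    for table in tables:
        if "`" in table:
            # Refuse names that would break out of the backtick quoting.
            raise ValueError("unsupported table name: " + table)
        statements.append("ALTER TABLE `{0}` ENGINE={1};".format(table, engine))
    return statements


# Illustrative table names only; list the real ones from your own schema.
EXAMPLE_TABLES = ["node", "users", "cache"]

for statement in engine_conversion_statements(EXAMPLE_TABLES):
    print(statement)
```

Converting a large table rebuilds it, so this is best tried on a copy of the database first.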

Reduce your footprint

• APC
  – PHP app is compiled & cached in memory

Search

• Drupal’s built-in search can be a dawg
• Solr
  – Much faster search
  – Offers faceting
  – Can become a platform in its own right.
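
Faceting in Solr is switched on with query parameters. Here is a minimal sketch that builds a faceted search URL using the standard `q`, `wt`, `facet` and `facet.field` parameters. The base URL and the field names `genre` and `artist` are assumptions for illustration; they depend entirely on how a given Solr instance and schema are set up.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed location of a local Solr select handler; adjust for your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_url(text, facet_fields, base_url=SOLR_SELECT_URL):
    """Build a Solr search URL that also requests facet counts per field."""
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


url = build_facet_url("purple rain", ["genre", "artist"])
print(url)
```

The response would then carry, alongside the matching documents, a count of hits per genre and per artist, which is what drives "narrow your results" style navigation.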

A Fresh Coat of Paint

• Varnish
  – Last but certainly not least
  – Up to 10 million hits per hour

What’s Next?

• Project Mercury
• Drupal 7
  – RDFa
  – Views 3
  – FOAF+SSL
• open social networking
• everything under your control