Media Fragment Indexing Using Social Media

Yunjia Li¹, Raphael Troncy², Mike Wald¹ and Gary Wills¹

¹School of Electronics and Computer Science, University of Southampton, UK

²EURECOM, Sophia Antipolis, France

Agenda

• Media Fragments

• Media Fragment Indexing Framework

• Survey on Media Fragment URI Implementations on Video Sharing Platforms

• Indexing Media Fragments Using Twitter

• Conclusions and Future Work

Media Fragment

• Denotes a part of the content inside a multimedia resource

• Dimensions defined in the Media Fragment URI 1.0 spec

– Temporal dimension

http://example.org/test.mp4#t=3,7

– Spatial dimension (a rectangle area)

http://example.org/test.mp4#xywh=120,240,180,240
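The two example URIs above can be parsed with a few lines of Python. This is a minimal sketch, not a full implementation of the specification: the function name `parse_media_fragment` is hypothetical, and it assumes plain seconds for `t` and plain pixel values for `xywh` (no `npt:`, `pixel:` or `percent:` prefixes, no `hh:mm:ss` notation).

```python
from urllib.parse import urlsplit


def parse_media_fragment(url):
    """Return the temporal and spatial dimensions found in a URL fragment.

    Simplifying assumptions: "t" holds plain seconds and "xywh" holds
    four plain integers; other notations are not handled here.
    """
    fragment = urlsplit(url).fragment
    result = {}
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name == "t":
            # "3,7" -> (3.0, 7.0); "3" -> (3.0, None); ",7" -> (0.0, 7.0)
            start, _, end = value.partition(",")
            result["t"] = (
                float(start) if start else 0.0,
                float(end) if end else None,
            )
        elif name == "xywh":
            x, y, w, h = (int(v) for v in value.split(","))
            result["xywh"] = (x, y, w, h)
    return result
```

For example, `parse_media_fragment("http://example.org/test.mp4#t=3,7")` yields a start of 3 seconds and an end of 7 seconds.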

Current Situation

• Multimedia uploading, sharing and tagging are easy

• Searching a complete multimedia resource on major search engines is easy

• But searching within a multimedia resource at a fine-grained level on major search engines is difficult

– Availability of annotations: only a limited number of annotations are linked to media fragments

– SEO problem:

• The landing page is not search-engine-friendly

• Everything is on the same page, and the notion of a media fragment is not explicitly embedded in the HTML

Media Fragment Indexing Framework

Google’s Ajax Content Crawler

• The Crawler is designed to index Ajax content

• The crawler rewrites the “#!” token in a URL to “?_escaped_fragment_=” before requesting the page

*Diagram from https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
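The URL rewriting performed by the crawler can be sketched as follows. The function name `to_escaped_fragment_url` is hypothetical, and the percent-escaping shown is only an approximation of the crawler's actual escaping rules: it leaves `=` and `,` untouched so that the media fragment stays readable.

```python
from urllib.parse import quote


def to_escaped_fragment_url(pretty_url):
    """Rewrite a "#!" URL into the form requested by the Ajax crawler."""
    base, sep, fragment = pretty_url.partition("#!")
    if not sep:
        # No hashbang: the URL is requested as-is.
        return pretty_url
    # Append to an existing query string if there is one.
    joiner = "&" if "?" in base else "?"
    # Approximate escaping; "=" and "," are kept literal (assumption).
    return f"{base}{joiner}_escaped_fragment_={quote(fragment, safe='=,')}"
```

With this sketch, `http://example.org/replay/1#!t=3,7` becomes `http://example.org/replay/1?_escaped_fragment_=t=3,7`.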

Key Ideas

• The fragment information must be included in the URL

– Syntax: W3C Media Fragment 1.0 Specification

• Prepare two sets of pages for every media fragment

– original landing page for end-users

– a snapshot page for SEO

• Landing page keeps the original user interaction

– Highlight media fragments on opening

• SEO page

– ONLY includes annotations of the media fragment

– Embed rich snippet
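A snapshot page of the kind described above might be generated roughly like this. It is a sketch under assumptions: the function name `render_snapshot` and the page layout are hypothetical, and the markup uses only the schema.org `VideoObject` properties `contentUrl` and `description`. It omits properties that search engines may additionally expect, such as a name or a thumbnail.

```python
from html import escape


def render_snapshot(video_url, start, end, annotations):
    """Build a minimal snapshot page for one temporal fragment.

    The page contains only the annotations of this fragment, wrapped in
    schema.org VideoObject Microdata (layout is an assumption).
    """
    fragment_url = escape(f"{video_url}#t={start},{end}", quote=True)
    descriptions = "\n".join(
        f'    <p itemprop="description">{escape(text)}</p>'
        for text in annotations
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "  <body>\n"
        '  <div itemscope itemtype="http://schema.org/VideoObject">\n'
        f'    <link itemprop="contentUrl" href="{fragment_url}">\n'
        f"{descriptions}\n"
        "  </div>\n"
        "  </body>\n"
        "</html>\n"
    )
```

Escaping the annotation text matters here because annotations come from user-generated content.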

The Solution

[Diagram: interaction between the crawler, the server and the user]

1: Submit the pretty URL replay/1#!t=3,7 to the crawler

2: The crawler asks the server for replay/1?_escaped_fragment_=t=3,7

3: The server redirects the request to the snapshot page it generates, Snapshot/1?_escaped_fragment_=t=3,7. The snapshot page contains only the annotations and Microdata for “#t=3,7” (here, “Terrace Theater”)

4: The snapshot page is returned to the crawler under the URL replay/1#!t=3,7

5: A user searches for the keyword “Terrace Theater”

6: Google includes replay/1#!t=3,7 in the search results

7: The user clicks the link and requests the document at replay/1#!t=3,7

8: The server returns the landing page, which contains both “Terrace Theater” and “Linked Data”

9: The landing page highlights the media fragment by playing from 3s to 7s
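The server-side decision in steps 2, 3 and 8 can be sketched as a small routing function. Everything here is illustrative: the function name `route`, the return format and the exact path prefixes are assumptions modelled on the example URLs, not the authors' implementation. Note that the `#!t=3,7` part of a user's request never reaches the server, so highlighting the fragment (step 9) has to happen in the landing page itself.

```python
from urllib.parse import urlsplit, parse_qs

# Path prefixes modelled on the example URLs (assumed, not confirmed).
LANDING_PREFIX = "/replay/"
SNAPSHOT_PREFIX = "/snapshot/"


def route(request_url):
    """Decide whether a request gets the landing page or the snapshot page."""
    parts = urlsplit(request_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    if parts.path.startswith(LANDING_PREFIX) and "_escaped_fragment_" in query:
        # Crawler request: redirect to the snapshot page of this fragment.
        fragment = query["_escaped_fragment_"][0]
        video_id = parts.path[len(LANDING_PREFIX):]
        target = f"{SNAPSHOT_PREFIX}{video_id}?_escaped_fragment_={fragment}"
        return ("redirect", target)
    # Ordinary user request: serve the full landing page.
    return ("landing", parts.path)
```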

Discussion

• The Media Fragment Indexing Framework solves the SEO problem of media fragments

• The scalability of this method depends largely on whether a large number of annotations are linked to media fragments

• Looking for media fragment annotations?

– Timed-text, transcript, speech recognition

– Manual annotations on each video sharing platform

– Social Media (Twitter)

Survey on Media Fragment URI Implementation

Media Fragments and Social Media

• The deep-linking function

• A Media Fragment URL can be embedded in a Tweet

• Text of the Tweet is the annotation to the URL

• Get annotations by filtering Tweets that have MF URIs

Filter Tweets by Media Fragment URIs

• Problem:

– Any URL in a Tweet is potentially an MF URI

– Too many false-positive cases

http://example.org/1234#t=23

http://example.org/1234?t=23

http://example.org/1234?track=23

– They could all be MF URIs and would need to be identified manually

• Workaround:

– Identify platforms (partially-)implementing MF URI

– Only filter Tweets containing URLs from those domains
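The workaround can be sketched as a domain whitelist applied to the URLs found in a Tweet. The names `MF_DOMAINS` and `candidate_urls` are hypothetical, the domain set is only an illustrative placeholder, and the sketch assumes that shortened links have already been expanded to their final URLs.

```python
import re
from urllib.parse import urlsplit

# Illustrative placeholder set of platforms known to support deep links.
MF_DOMAINS = {
    "youtube.com",
    "dailymotion.com",
    "vimeo.com",
    "vbox7.com",
    "viddler.com",
}
# Deliberately simple URL pattern; real Tweets need more careful handling.
URL_PATTERN = re.compile(r"https?://\S+")


def candidate_urls(tweet_text):
    """Return the URLs in a Tweet whose host is on the whitelist."""
    urls = []
    for url in URL_PATTERN.findall(tweet_text):
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host in MF_DOMAINS:
            urls.append(url)
    return urls
```

A URL such as `http://example.org/1234?t=23` is dropped by this filter, which removes the false positives at the cost of missing fragments on unlisted platforms.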

Survey Methodology

• Find a list of video sharing platforms

– http://en.wikipedia.org/wiki/List_of_video_hosting_services

– 59 websites are targeted in the survey

– Some of them have access restrictions

• Go through each website manually to see whether it provides a deep-linking function, such as:

– Social sharing button from a certain time point

– Deep-linking option in right click menu

Survey Results (1)

• 9 websites partially implement MF URI

– 56.com, Dailymotion, Hulu, Vbox7, Viddler, Vimeo, Tudou, Youku and YouTube

• They use different syntaxes to encode the temporal dimension

– Most of them use the URI query, except YouTube and Vimeo

– Parameter name: “start”, “t”, “st”, etc.

– Only Hulu implemented the end time

• Only YouTube partially implemented spatial dimension

– This is an external function implemented by Clickberry

https://clickberry.tv/video/6dafe30e-dcb8-44b8-8190-32be8249a297
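Because the platforms encode the start time differently, a parser has to look in both the URI query and the URI fragment and accept several parameter names. The sketch below is an assumption-laden illustration: the survey does not state which platform uses which parameter or value format, so the accepted formats (plain seconds and an `1h2m3s` style) and the names `to_seconds` and `start_time` are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Parameter names reported in the survey; their mapping to platforms is
# not given, so all of them are tried on every URL (assumption).
START_PARAMS = ("t", "start", "st")
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s?)?$")


def to_seconds(value):
    """Convert "23", "90s" or "1m30s" to seconds; return None if unparseable."""
    match = _DURATION.match(value.split(",")[0])  # ignore an end time, if any
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def start_time(url):
    """Return the start time in seconds encoded in the query or fragment."""
    parts = urlsplit(url)
    for component in (parts.query, parts.fragment):
        params = parse_qs(component)
        for name in START_PARAMS:
            if name in params:
                return to_seconds(params[name][0])
    return None
```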

Survey Results (2)

• Only 9 websites partially implement MF URI, however:

– Those websites cover most of the videos shared on the web

– eBizMBA report: http://www.ebizmba.com/articles/video-websites

• Select filter keywords based on the survey results:

– Twitter is banned in China, so 56.com, Tudou and Youku are ignored

– Hulu restricts access outside the U.S., so it is ignored as well

• Filter keywords: “YouTube”, “Dailymotion”, “Vbox7”, “Vimeo” and “Viddler”

Indexing Media Fragments Using Twitter

Twitter Media Fragment Indexer

• Collect Tweets filtered by the keywords

• Extract MF URIs in Tweets, parse the media fragment information

• Use Media Fragment Indexing Framework to publish Tweets as media fragment annotations

• Embed rich snippets in the snapshot pages

• Create a sitemap so that Google can crawl the snapshot pages

• When a user searches Google for keywords in the Tweet, the resulting link leads to the video at the corresponding start time
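The sitemap step could look roughly like the following, using only the standard library. The function name `build_sitemap` is hypothetical, and the sketch assumes that the sitemap lists the pretty "#!" URLs, one entry per distinct media fragment.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialise a de-duplicated list of URLs as a sitemap document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Retweets share the same URL, so duplicates are collapsed first.
    for url in sorted(set(urls)):
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")
```

De-duplicating before serialising keeps one entry per distinct media fragment even when many Tweets share it.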

The Detailed Workflow

Indexing Results (1)

• Monitored the Twitter stream non-stop for 50 hours

• Filter phrase: “youtube, dailymotion, vimeo, vbox7, viddler”

• 5,779,858 Tweets examined, 5,269,742 contain URLs

• 32,754 Tweets contain MF URIs, 32,796 MF URIs in total

• Media Fragment URIs shared in each website:

Website        No. of MF URIs    %
YouTube        32,666            99.604
Dailymotion    101               0.308
Vbox7          0                 0
Viddler        0                 0
Vimeo          29                0.088
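The percentages in the table follow directly from the per-site counts, which sum to the 32,796 MF URIs reported. A short check, with variable names chosen only for illustration:

```python
# Counts of MF URIs per website, as reported in the table.
counts = {
    "YouTube": 32666,
    "Dailymotion": 101,
    "Vbox7": 0,
    "Viddler": 0,
    "Vimeo": 29,
}

total = sum(counts.values())
# Share of each website in percent, rounded to three decimals.
shares = {site: round(100 * n / total, 3) for site, n in counts.items()}

for site, share in shares.items():
    print(f"{site}: {share}%")
```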

Indexing Results (2)

• 13,088 distinct videos are found

• 17,854 distinct MF URIs for sitemap

– Many Tweets share the same video, but different fragments

– Many retweets

– Some videos are not available in the UK

• 17,479 URLs (97.9%) in the sitemap have been indexed by Google

• Only 775 URLs are indexed as VideoObject, even though rich snippets are embedded in all snapshot pages

Demo

• Search for “Chris Eppstein”

• The result opens this landing page, and the video starts playing from the time indicated in the Tweet containing the keywords “Chris Eppstein”

https://www.google.co.uk/

http://twitter-mediafragment-indexer.herokuapp.com/v/Ug6XAw6hzaw

Conclusions and Future Work

Conclusions

• Introduced the Media Fragment Indexing Framework

• Proposed using social media to acquire more annotations for media fragments

• Surveyed MF URI implementations on major video sharing platforms

• Twitter Media Fragment Indexer

– Monitor Tweet Stream and automatically create media fragment annotations

– Index media fragments in Google

– YouTube is by far the most common domain for media fragments shared on Twitter

Future Work

• How valid are tweets as media fragment annotations?

– much noisy and unrelated text

– many re-tweets

• Experiment on larger scale (billions of tweets and continuous monitoring)

• Expand the methodology to other media fragment annotations, such as timed-text

• Extract named entities from tweets and further link media fragments to the Linked Data Cloud

Questions?

Date post:	11-May-2015
Category:	Internet
Upload:	linkedtv