
Big-Data Analytics for Media Management

Date post: 15-Dec-2014
Upload: techkrish
Description:
An overview of media analytics outlining the evolution of image classification and knowledge extraction. The presentation offers insight into big-data analytics for media management.
07/06/22 1 Krishna Chandramouli, Associate Professor, Media Engineering and Analytics Research Group, School of Information Technology and Engineering, VIT University [email protected] Big-Data Analytics for Media Management
Transcript
Page 1: Big-Data Analytics for Media Management

10/04/23 1

Krishna Chandramouli, Associate Professor, Media Engineering and Analytics Research Group, School of Information Technology and Engineering, VIT University [email protected]

Big-Data Analytics for Media Management


Page 3: Big-Data Analytics for Media Management

Overview

• Media and Internet
• Information Access
• Subjective vs Objective Indexing
• The Semantic Gap
• Evolving Strategies
• Social Media Analysis
• Indexing Large-scale Repositories
• Future Research Directions
• Take Away Message
• Q & A

Page 4: Big-Data Analytics for Media Management

Media and internet


Page 5: Big-Data Analytics for Media Management

Media and Internet

• In March 2013, Flickr had a total of 87 million registered members and more than 3.5 million new images uploaded daily.

• Facebook: "There are currently almost 90 billion photos total on Facebook. This means we are, by far, the largest photos site on the Internet."

Page 6: Big-Data Analytics for Media Management

Information access

Textual search

Visual search

Search query formulation


Page 7: Big-Data Analytics for Media Management

Information Access

• Traditional ordering of images is achieved through categorization of information into logical structures:
  • Creation of albums
  • Categorizing through date/time
  • Clustering through location

• Image-based search engines are gaining popularity with the increase in power of indexing schemes

Page 8: Big-Data Analytics for Media Management

Information Access


Page 9: Big-Data Analytics for Media Management

Information Access


Page 10: Big-Data Analytics for Media Management

Information Access


Page 11: Big-Data Analytics for Media Management

Information Access


Page 12: Big-Data Analytics for Media Management

Indexing: subjective or objective

Page 13: Big-Data Analytics for Media Management

Subjective vs Objective Indexing

• How can an image be named uniquely so that it is distinguishable from other images?
• What names can be used to search images?
• How many names are needed to make the images unique?
• Will all humans use the same names to identify the images?

Page 14: Big-Data Analytics for Media Management

Subjective vs Objective Indexing

• Humans are culturally influenced
• Terms carry different meanings across boundaries and cultures
• Therefore, any tag/word assigned to an image will be considered subjective
• Objective signatures for images are generated from the characteristics of the images
• The beginning of MPEG-7 standardisation activities

Page 15: Big-Data Analytics for Media Management

Subjective vs Objective Indexing

Image characteristics exploited for objective annotation include:

• Colour
  • Colour Layout Descriptor
  • Colour Structure Descriptor
  • Dominant Colour Descriptor
  • Scalable Colour Descriptor
• Texture
  • Texture Browsing Descriptor
  • Edge Histogram Descriptor
  • Homogeneous Texture Descriptor
• Shape
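To make the idea of an objective signature concrete, the sketch below computes a coarse colour histogram from raw pixel values. It is only an illustration of deriving a descriptor from image content; it is not one of the MPEG-7 descriptors listed above, and the function names, the bin count and the toy pixel lists are all assumptions.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Quantise (R, G, B) pixels into a normalised histogram (a crude colour signature)."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels) or 1
    return [count / total for count in hist]


def l1_distance(sig_a, sig_b):
    """Smaller distance means more similar colour content."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


# Hypothetical toy "images" given as lists of (R, G, B) tuples.
sky = [(40, 90, 220)] * 8 + [(250, 250, 250)] * 2
sea = [(30, 80, 200)] * 9 + [(240, 240, 240)] * 1
forest = [(20, 140, 30)] * 10

sky_sig, sea_sig, forest_sig = map(colour_histogram, (sky, sea, forest))
```

Because the signature depends only on pixel values, two annotators cannot disagree about it, which is the sense in which such descriptors are objective.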

Page 16: Big-Data Analytics for Media Management

The semantic gap


Page 17: Big-Data Analytics for Media Management

The Semantic Gap

The semantic gap characterizes the difference between two descriptions of an object by different linguistic representations, for instance languages or symbols.

In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation


Page 18: Big-Data Analytics for Media Management

The Semantic Gap


Page 19: Big-Data Analytics for Media Management

The Semantic Gap


Page 20: Big-Data Analytics for Media Management

Evolving strategies

Image Classification

Visual Classifier

Knowledge Assisted Analysis

Image Retrieval and User Relevance Feedback

Multi-Concept Space Search and Retrieval


Page 21: Big-Data Analytics for Media Management

Evolving Strategies

• The problem of image classification and clustering has been the subject of active research for the last decade, mainly owing to the exponential growth of digital content.

• The efficiency of clustering and classification algorithms can be attributed to the efficiency of the underlying machine learning approaches.

• To improve the performance of machine learning algorithms, different optimisation techniques, such as Genetic Algorithms, have been employed.

Page 22: Big-Data Analytics for Media Management

Evolving Strategies

• Recent developments in applied and heuristic optimisation techniques have been strongly influenced and inspired by natural and biological systems.

• Algorithms developed from such observations include:
  • Ant Colony Optimisation (ACO): based on the ability of an ant colony, compared to an individual ant, to find the shortest path between the nest and the food source.
  • Artificial Immune System (AIS): typically exploits the immune system's characteristics of learning and memory to solve a problem.
  • Particle Swarm Optimisation (PSO): inspired by the social behaviour of a flock of birds.

Page 23: Big-Data Analytics for Media Management

Evolving Strategies

• In the study of the "Semantic Gap", machine learning algorithms are the building blocks of the bottom-up approach.

• Some applications of efficient machine learning algorithms are:
  • Automatic Content Annotation
  • Knowledge Extraction
  • Content Retrieval

• In the top-down approach, an ontology provides a partial understanding of human semantics.

Page 24: Big-Data Analytics for Media Management

Visual classifier


Page 25: Big-Data Analytics for Media Management

Particle Swarm Optimisation

• In an effort to transform the social interaction of different species into a computer simulation, Kennedy and Eberhart developed an optimisation technique named Particle Swarm Optimisation.

• In theory, the universal behaviour of individuals is summarised in terms of the Evaluate, Compare and Imitate principles.

Page 26: Big-Data Analytics for Media Management

Particle Swarm Optimisation

• Evaluate: The tendency to evaluate stimuli, to rate them as positive or negative, attractive or repulsive, is perhaps the most ubiquitous behavioural characteristic of living organisms.

• Compare: In almost every aspect of life, humans tend to compare themselves with others.

• Imitate: Human imitation comprises taking the perspective of the other person, not only imitating a behaviour but also realising its purpose and executing the behaviour when it is appropriate.

Page 27: Big-Data Analytics for Media Management


Particle Swarm Optimisation

Equations governing the motion of particles in PSO.

Page 28: Big-Data Analytics for Media Management

Particle Swarm Optimisation

Pseudo code for the algorithm:

Step 1: Random initialisation of particles
Step 2: Function evaluation
Step 3: Computation of personal best and global best
Step 4: Velocity update
Step 5: Position update
Step 6: Loop to step 2 until the stopping criterion is reached
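The six steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the talk: the inertia weight, the acceleration constants `c1` and `c2`, the fixed iteration count used as the stopping criterion, the clamping to `bounds`, and the `sphere` test function are all assumptions chosen for the example.

```python
import random


def pso(objective, dim, bounds, n_particles=20, iterations=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: random initialisation of particles (zero initial velocity).
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Steps 2-3: evaluate, then record personal bests and the global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    # Step 6: loop until the (assumed) stopping criterion, a fixed iteration count.
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 4: velocity update, v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x).
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Step 5: position update, x = x + v, clamped to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            value = objective(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a classification setting the objective would instead measure how well a candidate set of classifier parameters fits the training data.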

Page 29: Big-Data Analytics for Media Management

Visual Classification Framework

Self Organising Map

[Figure legend: [X] - input feature vector; Class 1 - red; Untrained - black]

Winner node selected based on the L2 norm

Page 30: Big-Data Analytics for Media Management


Visual Classification Framework

Training of R-SOM network with PSO Algorithm

Page 31: Big-Data Analytics for Media Management

Visual Classification Framework

Dual-Layer SOM Network

Winner Node

$m_i(t+1) = m_i(t) + h_{ci}\,[x - m_i(t)]$
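A small sketch of the two SOM operations named on these slides: picking the winner node by L2 distance and applying the standard Kohonen update m_i(t+1) = m_i(t) + h_ci [x - m_i(t)]. The one-dimensional chain of nodes, the Gaussian neighbourhood function, and the `learning_rate` and `radius` values are assumptions made for illustration; the dual-layer arrangement and the PSO-based training from the talk are not reproduced here.

```python
import math


def winner_node(weights, x):
    """Index of the node whose weight vector is closest to x in L2 norm."""
    def l2(w):
        return math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(w, x)))
    return min(range(len(weights)), key=lambda i: l2(weights[i]))


def som_update(weights, x, learning_rate=0.5, radius=1.0):
    """One update step on a 1-D chain of nodes; returns (winner_index, new_weights)."""
    c = winner_node(weights, x)
    updated = []
    for i, w in enumerate(weights):
        # Assumed neighbourhood h_ci: Gaussian in the index distance |i - c|,
        # scaled by the learning rate, so the winner moves the most.
        h_ci = learning_rate * math.exp(-((i - c) ** 2) / (2.0 * radius ** 2))
        updated.append([wi + h_ci * (xi - wi) for wi, xi in zip(w, x)])
    return c, updated


weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
x = [0.9, 1.0]
c, new_weights = som_update(weights, x)
```

Every node moves toward the input, but nodes far from the winner move less, which is what lets neighbouring nodes specialise on similar feature vectors.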

Page 32: Big-Data Analytics for Media Management

Chaos-Particle Swarm Classifier

• The elementary principle of “Chaos” is introduced to model the behaviour of particle motion.

• The theoretical discussion on Chaotic-PSO includes the notion of “wind speed” and “wind direction”, modelling the biological atmosphere for the position update of the particles.

Page 33: Big-Data Analytics for Media Management


Chaos-Particle Swarm Optimisation

The wind speed and therefore the position update equation are presented by:

Page 34: Big-Data Analytics for Media Management

Knowledge assisted framework


Page 35: Big-Data Analytics for Media Management


Knowledge Assisted Analysis

Architecture

Page 36: Big-Data Analytics for Media Management


Knowledge Assisted Analysis

Machine Learning - Evaluation

Page 37: Big-Data Analytics for Media Management

Knowledge Assisted Analysis

Experimental Dataset

• A set of 500 images belonging to the general category of vacation images was assembled.

• The content was mainly obtained from the Flickr online photo management and sharing application and includes images that depict cityscape, seaside, mountain and landscape locations.

• Every image was manually annotated: after the segmentation algorithm was applied, a single concept was associated with each resulting image segment.

Page 38: Big-Data Analytics for Media Management


Knowledge Assisted Analysis

A subset of Database

Page 39: Big-Data Analytics for Media Management


Knowledge Assisted Analysis

Comparison of Machine Learning techniques

Page 40: Big-Data Analytics for Media Management

Knowledge Assisted Analysis

• From the results it can be seen that combining the PSO optimisation technique with SOM gives better classification accuracy than using SOM alone.

• The PSO classifier also performs better than the SVM and GA classifiers; SVMs in particular need large amounts of training data to discriminate accurately between image classes.

Page 41: Big-Data Analytics for Media Management

Image retrieval and user relevance feedback


Page 42: Big-Data Analytics for Media Management


User Relevance Feedback

Overview of Multimedia Retrieval System

Page 43: Big-Data Analytics for Media Management


User Relevance Feedback

Relevance Feedback Framework

Page 44: Big-Data Analytics for Media Management

User Relevance Feedback

• The database used in the experiment is generated from the Corel dataset and consists of seven concepts, namely building, cloud, car, elephant, grass, lion and tiger.

• The test set has been modelled for the seven concepts with a variety of background elements and overlapping concepts, which makes the test set complex.

Page 45: Big-Data Analytics for Media Management


User Relevance Feedback

Example images from Corel Dataset

Page 46: Big-Data Analytics for Media Management

User Relevance Feedback

Average accuracy for 7 concepts and 10 user interactions

Page 47: Big-Data Analytics for Media Management

Multi-concept search space


Page 48: Big-Data Analytics for Media Management

Multi-concept framework

• High-level queries: “A tiger resting in the forest and guarding his territory”
• Mid-level features (context independent): “Tiger”, “Grass”, “Rock”, “Water”, ...

Page 49: Big-Data Analytics for Media Management

Multi-concept framework

• Mid-level features: In a constrained environment with a limited number of mid-level features, the performance of the classification algorithm has been found to be satisfactory.

• High-level queries: Open to subjective interpretation of the concepts, and may involve more than one mid-level feature.

• Main objective: In this multi-concept framework, users are encouraged to construct high-level queries based on their preferences.

Page 50: Big-Data Analytics for Media Management

Multi-concept framework


Page 51: Big-Data Analytics for Media Management

Mid-level feature extraction

• SVM Classifier
  • The SVM Light toolbox was used to generate semantic labels
  • CLD+EHD

• Multi-feature classifier (MF)
  • Employs a mixture of 7 visual features
  • The visual features are merged using Multi-Objective Learning (MOL)

Page 52: Big-Data Analytics for Media Management

Query space formulation

• Pre-processing stage: mid-level feature concept detection

• Query formulation: users construct a high-level semantic information space
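One way to picture the two stages is sketched below: the pre-processing stage is assumed to have produced a confidence per mid-level concept for every image, and a user-defined high-level query is simply a set of mid-level concepts. Averaging the confidences is an assumption made for illustration and is not the scoring rule of the framework in the talk; the image ids and confidence values are hypothetical.

```python
def score_query(concept_scores, query_concepts):
    """Average the detector confidences of the concepts a user picked for a query."""
    if not query_concepts:
        return 0.0
    total = sum(concept_scores.get(c, 0.0) for c in query_concepts)
    return total / len(query_concepts)


def rank_images(detections, query_concepts):
    """Return image ids sorted by descending query score."""
    return sorted(detections,
                  key=lambda image_id: score_query(detections[image_id], query_concepts),
                  reverse=True)


# Hypothetical output of the pre-processing stage (mid-level concept detection).
detections = {
    "img_a": {"tiger": 0.9, "grass": 0.7, "rock": 0.1},
    "img_b": {"building": 0.8, "cloud": 0.6},
    "img_c": {"grass": 0.9, "water": 0.5},
}

# One user's definition of the high-level query "wild life".
wild_life = ["tiger", "grass", "rock"]
ranking = rank_images(detections, wild_life)
```

Because each user chooses the concept set, the same high-level query name can rank the collection differently for different users.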

Page 53: Big-Data Analytics for Media Management

Query space visualisation

• Fisheye distortion technique
• Overview + focus

Page 54: Big-Data Analytics for Media Management

Query space visualisation

• Query space panel
• Concept map panel
• Concept chart panel

Page 55: Big-Data Analytics for Media Management

Experiments and Evaluation

• A collection of 3500 images
  • From the Corel dataset
  • Natural images with many elements
  • Foreground and background
  • Rich semantic context
  • Fully annotated

• 10 mid-level concepts: lion, water, grass, building, car, cloud, rock, tiger, elephant, flower

• 8 high-level concepts: flower fields, modern city view, rural garden, mountain view, waterfalls, wild life, city street, boat

Page 56: Big-Data Analytics for Media Management

Comparison of results

Retrieval of high-level queries using the proposed MCB framework

Page 57: Big-Data Analytics for Media Management

Comparison of results

Retrieval of high-level queries using SVM classification

Page 58: Big-Data Analytics for Media Management

Comparison of results

Content-based retrieval with RF mechanism

Page 59: Big-Data Analytics for Media Management

Experiments and Evaluation


Page 60: Big-Data Analytics for Media Management

Experiment and Evaluation


Page 61: Big-Data Analytics for Media Management

Experiment and Evaluation

User   | High-level query | Mid-level concepts                 | Score
User 1 | Landscape        | water, grass                       | 0.58
User 1 | Modern city      | building, cloud                    | 0.8
User 1 | Wild life        | lion, tiger, elephant              | 0.59
User 1 | Rural garden     | flower, water, grass               | 0.9
User 2 | Landscape        | water                              | 0.23
User 2 | Modern city      | building                           | 0.71
User 2 | Wild life        | lion, rock, grass, tiger, elephant | 0.87
User 2 | Rural garden     | flower                             | 0.28
User 3 | Landscape        | water, grass, cloud, car, elephant | 0.59
User 3 | Modern city      | cloud, building, car               | 0.91
User 3 | Wild life        | lion, tiger, grass, elephant, rock | 0.82
User 3 | Rural garden     | flower, water, grass               | 0.88

Page 62: Big-Data Analytics for Media Management

Social media analysis


Page 63: Big-Data Analytics for Media Management

Social Media Analysis

Social media is the interaction among people in which they create, share or exchange information and ideas in virtual communities and networks.

Andreas Kaplan and Michael Haenlein define social media as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0".

Page 64: Big-Data Analytics for Media Management

Social Media Analysis

Social media allows for the creation and exchange of user-generated content.

Social media differ from traditional or industrial media in many ways, including quality, reach, frequency, usability, immediacy, and permanence.


Page 65: Big-Data Analytics for Media Management

Textual and Visual Analysis

• Images are often accompanied by free-text annotations, which can be used as complementary information for content-based classification

• The challenge is to extract entities from text and classify them into an arbitrary set of classes

Example captions: "Plansarsko lake"; "Shepherd in Bucegi National Park"

Page 66: Big-Data Analytics for Media Management

Textual and Visual Analysis

Labeled Corpora (MUC, BBN, ACE)

Page 67: Big-Data Analytics for Media Management


Textual and Visual Analysis

Page 68: Big-Data Analytics for Media Management

Use-case scenario

[Figure: processing pipeline with three blocks]

• Visual analysis (KAA): Annotated Images, Binary Segmentation Masks, Segmentation of Images, Feature Extraction, Biologically Inspired Classifier, Training Model, Labeled Segments with Cic = {Sky, Rock, ..}
• Text analysis (SCM+THD): Semantic Concept Mapping, Targeted Hypernym Discovery, WordNet, Wikipedia, Labeled Entities with Ctc = {Person, Landscape, ..}
• Fusion: Classifier Fusion

Page 69: Big-Data Analytics for Media Management

Use-case scenario

[Figure: worked example for the caption "Church of our Lady Mercy in Buje"; diagram labels: Building, EXIF, Binary mask, PSO, Largest region size]

Page 70: Big-Data Analytics for Media Management

Semantic Concept Mapping

• Map word to WordNet concept:

1. noun phrase

2. head noun

3. hypernym for noun phrase (with THD)

4. hypernym for head noun (with THD)

• Compute similarity with each of the classes

• Experiments carried out with the Lin similarity measure

• The probability of encountering concept c is usually estimated from a large corpus
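The Lin measure mentioned here is commonly written as sim(c1, c2) = 2 * IC(lcs) / (IC(c1) + IC(c2)), where IC(c) = -log P(c) and lcs is the lowest common subsumer of the two concepts. The sketch below applies that formula to a toy taxonomy; the `PARENT` hierarchy and the `PROB` values are hypothetical stand-ins for WordNet and for corpus-estimated probabilities.

```python
import math

# Hypothetical toy taxonomy (child -> parent); a real system would use WordNet.
PARENT = {"church": "building", "house": "building", "building": "entity",
          "lake": "landscape", "landscape": "entity"}

# Hypothetical probabilities of encountering each concept (or any concept below it).
PROB = {"entity": 1.0, "building": 0.2, "church": 0.02, "house": 0.1,
        "landscape": 0.1, "lake": 0.03}


def ancestors(concept):
    """The concept followed by its chain of ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def information_content(concept):
    return -math.log(PROB[concept])


def lin_similarity(c1, c2):
    """Lin similarity in [0, 1]; 1 for identical concepts, 0 when only the root is shared."""
    chain2 = set(ancestors(c2))
    # In a tree, the first ancestor of c1 that also subsumes c2 is the lowest common subsumer.
    lcs = next(a for a in ancestors(c1) if a in chain2)
    denominator = information_content(c1) + information_content(c2)
    if denominator == 0.0:
        return 1.0  # both concepts are the root
    return 2.0 * information_content(lcs) / denominator


same_branch = lin_similarity("church", "house")
different_branch = lin_similarity("church", "lake")
```

An extracted entity would then be assigned to the class with the highest similarity among the candidate classes.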

Page 71: Big-Data Analytics for Media Management

Classifier Fusion

• Content-based analysis (KAA) is restricted to classes for which the classifier has been learnt

• For text-based analysis (SCM/THD), the classes have to be exhaustive: all entities are classified

• Mapping from SCM/THD to KAA

• Perform intersection between the individual classifier results

• Select the concept occupying the largest area of the image

[Figure labels: Image Class. (KAA); Text Class.]
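The fusion steps listed on this slide can be sketched as follows, assuming the visual classifier returns labelled segments with their pixel areas and the text classifier returns a set of class names. The function name, the `text_to_visual` mapping table and the example values are hypothetical; only the three steps (map, intersect, pick the largest area) come from the slide.

```python
def fuse(visual_segments, text_classes, text_to_visual):
    """Return the visual label confirmed by the text analysis that covers the largest area.

    visual_segments: list of (label, area_in_pixels) pairs from the image classifier.
    text_classes: class names produced by the text analysis.
    text_to_visual: mapping from text classes to the visual classifier's vocabulary.
    """
    # Step 1: map the text classes onto the visual vocabulary.
    mapped = {text_to_visual[c] for c in text_classes if c in text_to_visual}
    # Accumulate the total area covered by each visual label.
    area = {}
    for label, size in visual_segments:
        area[label] = area.get(label, 0) + size
    # Step 2: intersect the two classifier results.
    agreed = {label: size for label, size in area.items() if label in mapped}
    if not agreed:
        return None  # the classifiers do not agree on any concept
    # Step 3: select the concept occupying the largest area of the image.
    return max(agreed, key=agreed.get)


# Hypothetical inputs, loosely modelled on the church example in the use-case slide.
segments = [("Sky", 4000), ("Building", 2500), ("Building", 3000), ("Rock", 500)]
text_classes = ["Church", "Rock formation"]
mapping = {"Church": "Building", "Rock formation": "Rock"}

fused_label = fuse(segments, text_classes, mapping)
```

Returning `None` when the intersection is empty leaves the decision about a fallback to the caller, which is a design choice of this sketch rather than something stated in the talk.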

Page 72: Big-Data Analytics for Media Management

Indexing large-scale repositories


Page 73: Big-Data Analytics for Media Management

Indexing Large-scale Repositories


Page 74: Big-Data Analytics for Media Management

Indexing Large-scale Repositories

• The textual analysis block aims to generate a list of named entities extracted from the textual metadata associated with the input video

• The pre-processing framework classifies the tags into two general categories:
  • common tags
  • named entities

Page 75: Big-Data Analytics for Media Management

Indexing Large-scale Repositories

• Common tags correspond to an action or a country, or are associated with a synset in WordNet

• Named-entity tags do not have a WordNet synset and thus depend on external resources to contextualise them

• The objective of the pre-processing module is to ensure the named entities are disambiguated to enable a semantic similarity search

Page 76: Big-Data Analytics for Media Management

Indexing Large-scale Repositories

Bag of Articles Classifier

• The input of a BOA classifier is a set of labelled instances and a set of unlabelled instances (noun chunks).

• Wikipedia article titles provide an unambiguous mapping between the labelled instance and a Wikipedia article

• Each article is described by its type (article, page, disambiguation page, category page and so forth)

Page 77: Big-Data Analytics for Media Management

Indexing Large-scale Repositories

• A BOA classifier requires a Wikipedia index containing the following information about each article:
  • term vectors with term frequencies
  • out-links and popularity ranking (for most-frequent-sense relevance ranking)

• For geo-tagging adaptation, the textual analysis block searches for geographical named entities in the queried Wikipedia articles

• The location details are extracted with the help of DBpedia using a SPARQL end-point
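As an illustration of the last step, the sketch below builds a SPARQL query that asks DBpedia for the coordinates of a single resource. It only constructs the query text and does not contact any end-point. The function name and the example resource `Buje` are assumptions; the `geo:lat` and `geo:long` properties come from the W3C WGS84 vocabulary that DBpedia uses for coordinates, and the actual queries used in the talk are not shown in the slides.

```python
def build_location_query(resource_name):
    """Return a SPARQL query string asking DBpedia for the coordinates of one resource."""
    forbidden = ' <>"{}|\\^`'
    if not resource_name or any(ch in forbidden for ch in resource_name):
        raise ValueError("resource_name must be a DBpedia resource local name, e.g. 'Buje'")
    return (
        "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n"
        "SELECT ?lat ?long WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource_name}> geo:lat ?lat ; geo:long ?long .\n"
        "}\n"
        "LIMIT 1"
    )


# Hypothetical geographical named entity found by the textual analysis block.
query = build_location_query("Buje")
```

The resulting string can be submitted to a SPARQL end-point with any HTTP client; validating the resource name first avoids injecting arbitrary text into the query.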

Page 78: Big-Data Analytics for Media Management

On-going research challenges


Page 79: Big-Data Analytics for Media Management

VIT@MediaEval 2013

Social Event Detection Task


Page 80: Big-Data Analytics for Media Management

VIT@MediaEval 2013

• Geographical coordinates are an important component and indicator of where an event has happened.

• The event clusters are finalised through the weighted occurrence of tags among the distribution of media annotation

Page 81: Big-Data Analytics for Media Management

VIT@MediaEval 2013

• The system computes the similarity between the synset representing each tag and each of the categories.

• We use the Lin similarity measure to evaluate the semantic distance between the synset and the category.

Page 82: Big-Data Analytics for Media Management

VIT@MediaEval 2013

Placing Task


Page 83: Big-Data Analytics for Media Management

VIT@MediaEval 2013

The globe is divided into grids with a maximum of 10,000 images per grid. Starting from an initial grid that spans the entire globe, grids are recursively subdivided into smaller ones once the threshold is reached.
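The recursive subdivision described above behaves like a quadtree over latitude and longitude. The sketch below is one possible reading, not the system from the talk: splitting each cell into four equal quadrants, the `MAX_DEPTH` guard against endlessly splitting co-located images, and the class and method names are all assumptions.

```python
MAX_PER_CELL = 10_000  # threshold quoted on the slide
MAX_DEPTH = 20         # assumed guard so identical coordinates cannot split forever


class GridCell:
    def __init__(self, south, west, north, east, capacity=MAX_PER_CELL, depth=0):
        self.bounds = (south, west, north, east)
        self.capacity = capacity
        self.depth = depth
        self.points = []      # (lat, lon) pairs held while the cell is a leaf
        self.children = None  # four sub-cells once the cell has been split

    def insert(self, lat, lon):
        if self.children is not None:
            self._child_for(lat, lon).insert(lat, lon)
            return
        self.points.append((lat, lon))
        if len(self.points) > self.capacity and self.depth < MAX_DEPTH:
            self._split()

    def _midpoint(self):
        south, west, north, east = self.bounds
        return (south + north) / 2.0, (west + east) / 2.0

    def _split(self):
        south, west, north, east = self.bounds
        mid_lat, mid_lon = self._midpoint()
        args = (self.capacity, self.depth + 1)
        # Order: south-west, south-east, north-west, north-east.
        self.children = [
            GridCell(south, west, mid_lat, mid_lon, *args),
            GridCell(south, mid_lon, mid_lat, east, *args),
            GridCell(mid_lat, west, north, mid_lon, *args),
            GridCell(mid_lat, mid_lon, north, east, *args),
        ]
        points, self.points = self.points, []
        for lat, lon in points:
            self._child_for(lat, lon).insert(lat, lon)

    def _child_for(self, lat, lon):
        mid_lat, mid_lon = self._midpoint()
        return self.children[2 * int(lat >= mid_lat) + int(lon >= mid_lon)]

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Tiny capacity so the toy data triggers subdivision; coordinates are hypothetical.
globe = GridCell(-90.0, -180.0, 90.0, 180.0, capacity=2)
for lat, lon in [(45.1, 13.7), (45.2, 13.6), (12.9, 79.1), (-33.9, 151.2)]:
    globe.insert(lat, lon)
```

Densely photographed regions end up with many small cells while sparse regions keep large ones, so every cell holds a comparable number of training images.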

Page 84: Big-Data Analytics for Media Management

VIT@MediaEval 2013


Page 85: Big-Data Analytics for Media Management

Future research directions


Page 86: Big-Data Analytics for Media Management

Future Research Directions

MediaEval is a multimedia benchmarking initiative that offers tasks and datasets to the research community that emphasize the human and social aspects of multimedia.

In 2014, MediaEval is offering eight classic tasks and three Brave New Tasks.

http://www.multimediaeval.org/mediaeval2014/


Page 87: Big-Data Analytics for Media Management

Future Research Directions

ImageCLEF 2014

ImageCLEF organizes four main tasks to benchmark the challenging task of image annotation for a wide range of source images and annotation objectives, such as general multi-domain images for object or concept detection, as well as domain-specific tasks such as visual-depth images for robot vision and volumetric medical images for automated structured reporting.

Page 88: Big-Data Analytics for Media Management

Future Research Directions

The tasks address different aspects of the annotation problem and are aimed at supporting and promoting cutting-edge research addressing the key challenges in the field, such as multi-modal image annotation, domain adaptation and ontology-driven image annotation.

http://www.imageclef.org/2014

Page 89: Big-Data Analytics for Media Management

Thank you!

Q & A

