
April 2, 2016 16:3 WSPC IJSC2016

International Journal of Semantic Computing
© World Scientific Publishing Company

Supporting Semantic Concept Retrieval with Negative Correlations in a Multimedia Big Data Mining System

Yilin Yan

Department of Electrical and Computer Engineering

University of Miami

Coral Gables, Florida 33146, USA
[email protected]

Mei-Ling Shyu

Department of Electrical and Computer Engineering

University of Miami
Coral Gables, Florida 33146, USA

[email protected]
http://rvc.eng.miami.edu

Qiusha Zhu

Senzari, Inc.

601 Brickell Key Drive
Miami, Florida 33131, USA

[email protected]

With the extensive use of smart devices and the blooming popularity of social media websites such as Flickr, YouTube, Twitter, and Facebook, we have witnessed an explosion of multimedia data. Without effective big data technologies, the amount of data available nowadays is formidable. It is well acknowledged that multimedia high-level semantic concept mining and retrieval has become an important research topic, and the semantic gap (i.e., the gap between low-level features and high-level concepts) makes it even more challenging. Addressing these challenges requires joint research efforts from both the big data mining and the multimedia areas. In particular, the correlations among classes can provide important context cues to help bridge the semantic gap. However, correlation discovery is computationally expensive due to the huge amount of data. In this paper, a novel multimedia big data mining system based on the MapReduce framework is proposed to discover negative correlations for semantic concept mining and retrieval. Furthermore, the proposed system includes a big data processing platform that uses Mesos for efficient resource management and Cassandra for handling data across multiple data centers. Experimental results on the TRECVID benchmark datasets demonstrate the feasibility and effectiveness of the proposed multimedia big data mining system with negative correlation discovery for semantic concept mining and retrieval.

Keywords: Big Data; Negative Correlation; Multimedia Semantic Mining and Retrieval; Information Integration; Hadoop; MapReduce; Spark; Mesos; Cassandra.



2 Yilin Yan, Mei-Ling Shyu, Qiusha Zhu

1. Introduction

The objective of multimedia big data mining is to extract useful information from large quantities of multimedia data, including texts, images, videos, etc. [1][2][3]. The data are considered big in terms of their volume, variety, and velocity. IBM claims that 90 percent of today's stored data was generated in just the last two years. We have seen many big data examples in our daily lives: Twitter generates 7 TB of data each day [4], while Facebook generates 10 TB daily. As another example, an Airbus A380 has as many as 25,000 sensors and generates a huge amount of data during a single flight [4]. This deluge of data presents both opportunities and challenges. On the one hand, large collections of data offer many opportunities since they contain rich information. On the other hand, their rapid growth outpaces the development of data storage and search technologies, making it tremendously difficult to mine and retrieve useful information efficiently [5][6][7].

Automatically mining interesting patterns and human-understandable semantic features from raw multimedia data to facilitate large-scale knowledge discovery and information retrieval has become an essential research topic in today's multimedia big data mining [8][9][10][11]. Traditional text-based indexing and retrieval systems suffer from noisy and missing tags, as today's data are highly unorganized and often crowdsourced. As a result, more and more researchers have turned to content-based approaches [12][13][14], which aim to discover information by analyzing images and videos with little or no help from their associated text. One main research task in content-based multimedia retrieval is multimedia concept mining and retrieval [15][16][17][18][19][20]. It focuses on mining high-level semantic features, or so-called semantic concepts (such as sky, airplane flying, buildings, and dogs), directly from the extracted low-level visual features (e.g., color, shape, and texture).

In most existing approaches, the annotation of each concept (or class) is handled independently as a binary classification problem or as a multi-class classification problem [21], and the correlations among different concepts are ignored. However, many concepts are correlated, either positively or negatively. In other words, some concepts co-occur frequently (such as sky and cloud), while others rarely co-occur (such as road and fish). Such correlations can provide important context cues to help detect the concepts [22][23][24]. There are many types of correlation measures, including the Pearson correlation, the Spearman correlation, and the cross-correlation, which quantify how strongly two variables vary together [25].

In this paper, a novel multimedia big data mining system that extracts negative concept correlations is proposed. The ICF-based (Integrated Correlation Factor) negative correlation discovery approach is able to effectively select concepts with significant negative associations [26][27]. Moreover, the proposed system adopts Hadoop and Spark to address the big data challenge. Apache Hadoop™ [28] is an open-source software framework written in Java for the distributed storage and distributed processing of very large datasets on computer clusters built from

A Multimedia Big Data Processing Platform for Negative Correlation Discovery 3

commodity hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and should be handled automatically by the framework. Apache Spark™ [29], in turn, is a large-scale data processing engine that is considered more advanced than Hadoop MapReduce because it supports in-memory processing and a much richer set of operations; hence, many popular big data systems have adopted Spark. Furthermore, Spark can be deployed on different kinds of resource management systems, such as Apache Mesos™ [30], a cluster resource management system with efficient resource isolation and sharing across distributed applications. It has also been shown that traditional Relational Database Management Systems (RDBMSs) cannot meet big data management needs and have become the bottleneck of big data processing platforms. New database technologies such as NoSQL databases have emerged to address this problem. Among them, Apache Cassandra™ [31] is a distributed, highly available database designed to handle large amounts of data across multiple data centers. Using these technologies, an efficient multimedia big data mining system is built to discover negative concept correlations from multimedia big data for multimedia semantic concept retrieval.

The main contribution of this paper is the design and development of an efficient multimedia big data mining system. Using the ICF-based negative correlation discovery approach, the proposed system is built on Spark, Mesos, and Cassandra so that it is able to process and handle multimedia big datasets. In addition, experimental results on the benchmark datasets from the TRECVID semantic indexing task, part of the annual TREC Video Retrieval Evaluation (TRECVID) organized by the National Institute of Standards and Technology (NIST) [32], show that the proposed system is capable of processing and handling multimedia big datasets while achieving the best MAP (mean average precision) results of the ICF-based negative correlation discovery approach for multimedia semantic concept mining and retrieval.

This paper is organized as follows. Section 2 discusses several types of correlations and their uses. Section 3 presents the proposed system, including the ICF-based negative correlation discovery. Sections 4 and 5 describe the system design and implementation in Hadoop and Spark, respectively. Section 6 presents a multimedia big data mining system using Spark, Mesos, and Cassandra for the efficient calculation of the ICF values. Section 7 demonstrates the effectiveness of using the generated ICF values for semantic concept mining and retrieval on the TRECVID IACC.1.A dataset. Finally, Section 8 draws conclusions and identifies future research directions.

2. Correlation Coefficients

A correlation coefficient is a quantitative measure of the statistical relationship between two or more random variables or observed data values. Based on the measurement scale of the input data, correlations can be divided into four types: nominal, ordinal, interval, and ratio.



2.1. Nominal Scale

Nominal scales are used to label variables that do not have quantitative values; the data are put into categories without any order or structure. For example, any data with YES/NO labels are nominal, since there is no order and no distance between YES and NO. Another example is color data: the underlying spectrum is ordered, but the color names are nominal.

There are many coefficients in this category. Given a contingency table, let $O_i$ be an observed frequency, $E_i$ the expected (theoretical) frequency asserted by the null hypothesis, and $n$ the number of cells in the table. Pearson's cumulative test statistic $\chi^2$ [33], which asymptotically approaches a $\chi^2$ distribution, is shown in Equation (1). If $N$ is the grand total of the observations, the coefficient $\phi$ is defined in Equation (2). This coefficient can only be calculated for frequency data represented in 2×2 tables.

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \qquad (1)$$

$$\phi = \sqrt{\frac{\chi^2}{N}} \qquad (2)$$
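A minimal Python sketch of Equations (1) and (2) follows. It assumes, as is standard but not spelled out above, that the expected counts come from the independence null hypothesis, $E_{ij} = (\text{row } i \text{ total})(\text{column } j \text{ total})/N$, and that every row and column total is positive; the function names are illustrative only.

```python
import math

def chi_square(observed):
    """Pearson's chi-square statistic, Eq. (1), for an r x c table of counts.

    Expected counts assume independence: E_ij = row_i * col_j / N.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n_total
            chi2 += (o - e) ** 2 / e
    return chi2

def phi_coefficient(observed):
    """Phi coefficient, Eq. (2); defined for 2 x 2 tables only."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("phi is defined for 2 x 2 tables only")
    n_total = sum(sum(row) for row in observed)
    return math.sqrt(chi_square(observed) / n_total)

# Two binary concepts that always agree: perfect association.
table = [[10, 0], [0, 10]]
print(chi_square(table), phi_coefficient(table))  # 20.0 1.0
```

Each of the four cells contributes $(10-5)^2/5 = 5$, so $\chi^2 = 20$ and $\phi = \sqrt{20/20} = 1$.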

The contingency coefficient (C) and Cramér's V coefficient are similar. The C coefficient [34] in Equation (3) is used when each nominal variable has three or more values, as long as both variables have the same number of possible values, so that the data matrix has the same number of rows and columns (e.g., 3×3, 4×4, etc.). C has the disadvantage that it is non-negative and never reaches the maximum of 1: the highest value it can reach in a 2×2 table is 0.707, and in a 4×4 table it is about 0.866. It can reach values closer to 1 in contingency tables with more categories. Therefore, it should not be used to compare associations among tables with different numbers of categories. Moreover, it does not apply to asymmetrical tables (i.e., those with different numbers of rows and columns). Cramér's V coefficient in Equation (4), on the other hand, is used when the numbers of possible values of the two variables differ, yielding different numbers of rows and columns in the data matrix (e.g., 2×3, 3×5, etc.). It is also a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). Cramér's V coefficient [35] is widely used because it handles both multi-valued variables and asymmetrical tables, but it can be a heavily biased estimator of its population counterpart and tends to overestimate the strength of association.

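$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}} \qquad (3)$$

$$V = \sqrt{\frac{\chi^2}{N(k-1)}} = \sqrt{\frac{\phi^2}{k-1}} = \sqrt{\frac{\phi^2}{\min(r-1,\, c-1)}} \qquad (4)$$

where $r$ and $c$ are the numbers of rows and columns, and $k = \min(r, c)$.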
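The following Python sketch computes Equations (3) and (4) and reproduces the maxima of C quoted above on perfectly associated (diagonal) tables. It reuses the independence-based $\chi^2$ of Equation (1); the diagonal test tables and the function names are illustrative assumptions.

```python
import math

def chi_square(observed):
    """Pearson's chi-square with independence-based expected counts (Eq. 1)."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    total = 0.0
    for i, r in enumerate(observed):
        for j, o in enumerate(r):
            e = rows[i] * cols[j] / n
            total += (o - e) ** 2 / e
    return total

def contingency_c(observed):
    """Contingency coefficient C = sqrt(chi2 / (chi2 + N)), Eq. (3)."""
    chi2 = chi_square(observed)
    n = sum(map(sum, observed))
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(observed):
    """Cramer's V = sqrt(chi2 / (N * min(r - 1, c - 1))), Eq. (4)."""
    chi2 = chi_square(observed)
    n = sum(map(sum, observed))
    k_minus_1 = min(len(observed), len(observed[0])) - 1
    return math.sqrt(chi2 / (n * k_minus_1))

# Perfect association on the diagonal of a k x k table.
for k in (2, 4):
    diag = [[5 if i == j else 0 for j in range(k)] for i in range(k)]
    print(k, round(contingency_c(diag), 3), round(cramers_v(diag), 3))
# 2 0.707 1.0
# 4 0.866 1.0
```

For a perfectly associated k×k table, $\chi^2 = N(k-1)$, so C tops out at $\sqrt{(k-1)/k}$ while V reaches 1, which is why V is preferred when comparing tables of different sizes.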



2.2. Ordinal Scale

In ordinal scales, the order of the values matters, so "smaller than" (<) and "greater than" (>) can be applied to the values, but the differences between them are not really known. That is, an ordinal scale conveys only a gross order, not the relative positional distances. The simplest example is a ranking, where there is no objective distance between any two points on the subjective scale: the top item may be far superior to the second in one case, while the gap may be subjectively small in another.

There are several coefficients in this category. The Gamma (G) coefficient [36] measures symmetrical association and ranges from −1 to +1:

$$G = \frac{N_s - N_d}{N_s + N_d} \qquad (5)$$

where $N_s$ is the number of non-inversions (concordant pairs), i.e., pairs of elements $x, y$ of a permutation $\sigma$ such that $x < y$ and $\sigma(x) < \sigma(y)$, or $x > y$ and $\sigma(x) > \sigma(y)$; and $N_d$ is the number of inversions (discordant pairs), i.e., pairs such that $x < y$ and $\sigma(x) > \sigma(y)$, or $x > y$ and $\sigma(x) < \sigma(y)$. In other words, $N_s$ is the number of pairs of cases ranked in the same order on both variables, while $N_d$ is the number of pairs ranked in opposite orders. The Gamma coefficient requires both variables to be ranked, and tied pairs are ignored. A similar measure is the Kendall tau (τ) coefficient, whose $\tau_b$ variant takes tied pairs into account.
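A small Python sketch of Equation (5) counts concordant and discordant pairs by brute force over all $O(n^2)$ pairs of cases, skipping tied pairs. The function names and the example rankings are illustrative.

```python
from itertools import combinations

def concordance_counts(x, y):
    """Return (N_s, N_d): concordant and discordant pairs; tied pairs are skipped."""
    n_s = n_d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            n_s += 1      # same order on both variables
        elif sign < 0:
            n_d += 1      # opposite order (an inversion)
    return n_s, n_d

def goodman_kruskal_gamma(x, y):
    """Gamma coefficient, Eq. (5)."""
    n_s, n_d = concordance_counts(x, y)
    return (n_s - n_d) / (n_s + n_d)

ranks_a = [1, 2, 3, 4]
ranks_b = [1, 3, 2, 4]   # one swapped pair
print(concordance_counts(ranks_a, ranks_b))      # (5, 1)
print(goodman_kruskal_gamma(ranks_a, ranks_b))   # 0.6666666666666666
```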

The Somers' d coefficient [33] measures asymmetrical association. Let $N_y$ be the number of pairs tied on Y but not on X, and $N_x$ the number of pairs tied on X but not on Y. When X is the independent (column) variable, $d_{xy}$ is given by Equation (6); when Y is the independent (row) variable, $d_{yx}$ is given by Equation (7).

$$d_{xy} = \frac{N_s - N_d}{N_s + N_d + N_y} \qquad (6)$$

$$d_{yx} = \frac{N_s - N_d}{N_s + N_d + N_x} \qquad (7)$$
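The Python sketch below computes Equations (6) and (7). It reads $N_x$ and $N_y$ as the usual tie counts (pairs tied on only one of the two variables; pairs tied on both are ignored), which is our interpretation of the definitions; the example data are made up for illustration.

```python
from itertools import combinations

def pair_counts(x, y):
    """Return (N_s, N_d, N_x, N_y).

    N_x counts pairs tied on x only and N_y pairs tied on y only;
    pairs tied on both variables are ignored.
    """
    n_s = n_d = n_x = n_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            n_x += 1
        elif dy == 0:
            n_y += 1
        elif dx * dy > 0:
            n_s += 1
        else:
            n_d += 1
    return n_s, n_d, n_x, n_y

def somers_d(x, y):
    """Return (d_xy, d_yx) following Eqs. (6) and (7)."""
    n_s, n_d, n_x, n_y = pair_counts(x, y)
    d_xy = (n_s - n_d) / (n_s + n_d + n_y)  # X treated as the independent variable
    d_yx = (n_s - n_d) / (n_s + n_d + n_x)  # Y treated as the independent variable
    return d_xy, d_yx

x = [1, 1, 2, 2]
y = [1, 2, 2, 3]
print(pair_counts(x, y))  # (3, 0, 2, 1)
print(somers_d(x, y))     # (0.75, 0.6)
```

The two values differ only through the tie term in the denominator, which is what makes Somers' d asymmetric.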

Spearman's rank correlation coefficient [37] is a nonparametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described by a monotonic function. If there are no repeated data values, a perfect Spearman correlation of −1 or +1 occurs when each variable is a perfect monotone function of the other. The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. For a sample of size $n$, the $n$ raw scores $X_i$ and $Y_i$ are converted to ranks $x_i$ and $y_i$, and $\rho$ is computed from Equation (8), where $\bar{x}$ and $\bar{y}$ are the mean ranks. Identical values (rank ties or value duplicates) are assigned a rank equal to the average of their positions in the ascending order of the values.

$$\rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (8)$$
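As a sketch, Equation (8) can be computed in Python by first assigning average ranks to tied values and then taking the Pearson correlation of the ranks. The helper names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1   # average of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; the 1/n factors cancel in Eq. (8)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    """Spearman's rho, Eq. (8): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))               # [1.0, 2.5, 2.5, 4.0]
print(spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 100]))  # 1.0
```

The second call shows the point of ranking: the outlier 100 does not matter, because the relationship is perfectly monotonic even though it is not linear.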

2.3. Interval Scale

Interval scales are numeric scales in which both the order and the exact differences between values are known, which opens up the realm of statistical analysis on such data. Temperature in degrees Celsius is the classic example in this category. Furthermore, since these are numeric variables, addition (+) and subtraction (−) can also be applied.

The Pearson product-moment correlation coefficient (ρ or r) [26][27] is an example in this category. It measures the linear dependence between two variables X and Y, giving a value between −1 and +1 (inclusive), where −1 is a total negative correlation, 0 is no correlation, and +1 is a total positive correlation. It is widely used in the sciences as a measure of the degree of linear dependence between two variables. For a population:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y} \qquad (9)$$

where $\operatorname{cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$ is the covariance of X and Y; $\sigma_X = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_X)^2}$ and $\sigma_Y = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i-\mu_Y)^2}$ are the standard deviations of X and Y; $\mu_X = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $\mu_Y = \frac{1}{N}\sum_{i=1}^{N} y_i$ are the means of X and Y; and $E$ denotes the expectation.
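A minimal Python sketch of Equation (9) using population (1/N) moments and the $E[XY] - E[X]E[Y]$ form of the covariance is shown below. Returning `None` when a standard deviation is zero is an illustrative convention; it corresponds to the undefined case that Section 3 has to handle for binary concept labels.

```python
import math

def pearson_population(x, y):
    """Eq. (9) with population (1/N) moments; returns None if undefined."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # cov(X, Y) = E[XY] - E[X]E[Y]
    cov = sum(a * b for a, b in zip(x, y)) / n - mu_x * mu_y
    sigma_x = math.sqrt(sum((a - mu_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mu_y) ** 2 for b in y) / n)
    if sigma_x == 0 or sigma_y == 0:
        return None  # a constant variable makes rho undefined
    return cov / (sigma_x * sigma_y)

# Binary concept labels over four shots (1 = concept present).
print(pearson_population([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0
print(pearson_population([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.0
print(pearson_population([1, 1, 1, 1], [1, 0, 1, 0]))  # None
```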

2.4. Ratio Scale

Data on ratio scales have ordered values, exact differences between units, and an absolute zero. Thus, a wide range of both descriptive and inferential statistics can be applied to them. Since they are numeric variables with an absolute zero (e.g., temperature in kelvins, or mass), the multiplication (×) and division (÷) operators can also be applied. Data in multimedia and social research are usually not in this category, but most of the correlation coefficients for interval scales can also be applied to data on ratio scales.

3. Integrated Correlation Factor (ICF)

While a number of approaches have adopted positive correlations to improve the performance of semantic concept mining and retrieval, very few studies have explored negative correlations among concepts. Based on our literature review, finding negative correlations in the multimedia big data area is challenging. Some studies speculated that negative correlations might improve the performance only slightly [38], while other groups even reported that the performance gain from negative correlations was nearly zero.

Unlike the labels of positive instances, the labels of a large number of negative instances in many large-scale multimedia datasets are actually inferred. This is partially because a training sample manually annotated as "skipped" or "not sure" is usually treated as a negative instance. Although the co-occurrence of two concepts in a video shot (keyframe) increases the probability that they are positively correlated, the fact that one concept is absent while the other appears does not necessarily mean that they are negatively correlated. For example, it is hard to conclude that the appearance of the concept "bird" suppresses the appearance of the concept "flower" just because they do not co-occur.

Inspired by the above findings, we use a three-step hierarchical selection strategy in this paper. To efficiently eliminate irrelevant correlations, the first step applies a conditional probability-based coarse filtering algorithm. Let $C_T$ be a target concept and $C_R$ a reference concept. Then $C_T^+$ or $C_T^-$ denotes the positive or negative collection of $C_T$, whereas $C_R^+$ or $C_R^-$ denotes the positive or negative collection of $C_R$. $P(\cdot)$ denotes probability as usual.

If $C_T$ and $C_R$ are negatively correlated, the probability of $C_T$ appearing decreases if $C_R$ appears, and at the same time, the probability of $C_T$ appearing increases if $C_R$ does not appear. This idea can be represented by the following two inequalities:

$$\frac{P(C_T^+ \mid C_R^-)}{P(C_T^+)} > 1 \quad \text{and} \quad \frac{P(C_T^- \mid C_R^+)}{P(C_T^-)} > 1 \qquad (10)$$

From the association rule point of view, these two values are related to the conviction measure introduced in [39]. To satisfy the necessary condition for negative correlations, i.e., $P(C_T^+ \mid C_R^-) > P(C_T^+)$ and $P(C_T^- \mid C_R^+) > P(C_T^-)$, the threshold is set to 1.
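The first (coarse filtering) step can be sketched in Python as follows. The probabilities are estimated as relative frequencies over binary per-shot label lists (1 = positive), both concepts are assumed to have at least one positive and one negative instance, and the names `target` and `reference` are illustrative.

```python
def eq10_ratios(target, reference):
    """Return the two ratios of Eq. (10) for binary label lists (1 = positive).

    Assumes both concepts have at least one positive and one negative instance.
    """
    n = len(target)
    p_t_pos = sum(target) / n
    p_t_neg = (n - sum(target)) / n
    t_when_r_neg = [t for t, r in zip(target, reference) if r == 0]
    t_when_r_pos = [t for t, r in zip(target, reference) if r == 1]
    left = (sum(t_when_r_neg) / len(t_when_r_neg)) / p_t_pos        # P(T+|R-)/P(T+)
    right = (t_when_r_pos.count(0) / len(t_when_r_pos)) / p_t_neg   # P(T-|R+)/P(T-)
    return left, right

def passes_coarse_filter(target, reference, threshold=1.0):
    """First step: keep the pair only if both ratios exceed the threshold (1)."""
    left, right = eq10_ratios(target, reference)
    return left > threshold and right > threshold

target = [1, 1, 0, 0, 0, 0]      # C_T labels over six shots
reference = [0, 0, 1, 1, 1, 0]   # C_R labels over the same shots
print(passes_coarse_filter(target, reference))  # True
```

Here the ratios are 2.0 and 1.5, so the pair survives; two independent concepts would give ratios of exactly 1 and be filtered out.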

To save computation time, we use the fact that

$$\frac{P(C_T^- \mid C_R^+)}{P(C_T^-)} = \frac{P(C_T^- C_R^+)}{P(C_R^+)\,P(C_T^-)} = \frac{P(C_R^+ \mid C_T^-)}{P(C_R^+)} \qquad (11)$$

Therefore, only the left-hand ratio in Equation (10) needs to be calculated, for all ordered concept pairs: by Equation (11), the right-hand ratio for the pair $(T, R)$ equals the left-hand ratio for the swapped pair $(R, T)$. By simply switching the target concept with the reference concept during training, the values of the right-hand inequality can be read from the same result table.
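The saving from Equation (11) can be checked with a short Python sketch: it builds a table of left-hand ratios for every ordered pair and confirms that the right-hand ratio for $(T, R)$, computed directly, equals the table entry for $(R, T)$. The concept names and labels are made-up examples, and each concept is assumed to have both positive and negative instances.

```python
from itertools import permutations

def left_ratio(target, reference):
    """P(C_T^+ | C_R^-) / P(C_T^+) estimated from binary label lists."""
    p_t_pos = sum(target) / len(target)
    t_when_r_neg = [t for t, r in zip(target, reference) if r == 0]
    return (sum(t_when_r_neg) / len(t_when_r_neg)) / p_t_pos

def right_ratio_direct(target, reference):
    """P(C_T^- | C_R^+) / P(C_T^-), computed directly, for comparison only."""
    p_t_neg = target.count(0) / len(target)
    t_when_r_pos = [t for t, r in zip(target, reference) if r == 1]
    return (t_when_r_pos.count(0) / len(t_when_r_pos)) / p_t_neg

def left_ratio_table(labels):
    """Left-hand ratio of Eq. (10) for every ordered pair (T, R)."""
    return {(t, r): left_ratio(labels[t], labels[r])
            for t, r in permutations(labels, 2)}

labels = {
    "sky":  [1, 1, 0, 0, 1, 0],
    "fish": [0, 0, 1, 1, 0, 0],
    "road": [0, 1, 0, 0, 1, 1],
}
table = left_ratio_table(labels)
# By Eq. (11), the right-hand ratio for (T, R) is the table entry for (R, T).
for t, r in permutations(labels, 2):
    assert abs(table[(r, t)] - right_ratio_direct(labels[t], labels[r])) < 1e-9
print(table[("fish", "sky")])  # 2.0
```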

Next, NIC (Negative Independent Coefficient) is defined in Equation (12) to measure the negative correlations between concepts and to eliminate a large number of possible correlations:

$$\mathrm{NIC}(T,R) = \frac{P(C_T^+ \mid C_R^-)}{P(C_T^+)} + \frac{P(C_T^- \mid C_R^+)}{P(C_T^-)} \qquad (12)$$

All concept pairs with NIC values less than a certain threshold are filtered out, which significantly reduces the computational complexity of the next step.
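The second step can be sketched in Python as below. Note that if $C_T$ and $C_R$ were exactly independent, both ratios in Equation (12) would equal 1 and NIC would be 2. The threshold of 2.5 used here is a hypothetical value, not one reported in the text, and keeping only pairs with T before R alphabetically is an illustrative stand-in for the $i < j$ filtering described in Section 5.

```python
def nic(target, reference):
    """NIC(T, R), Eq. (12), from binary label lists (empirical probabilities)."""
    n = len(target)
    p_t_pos = sum(target) / n
    p_t_neg = target.count(0) / n
    t_when_r_neg = [t for t, r in zip(target, reference) if r == 0]
    t_when_r_pos = [t for t, r in zip(target, reference) if r == 1]
    left = (sum(t_when_r_neg) / len(t_when_r_neg)) / p_t_pos
    right = (t_when_r_pos.count(0) / len(t_when_r_pos)) / p_t_neg
    return left + right

def filter_pairs_by_nic(labels, threshold):
    """Keep pairs (T, R), T before R alphabetically, with NIC >= threshold."""
    names = sorted(labels)
    kept = {}
    for i, t in enumerate(names):
        for r in names[i + 1:]:
            score = nic(labels[t], labels[r])
            if score >= threshold:
                kept[(t, r)] = score
    return kept

labels = {
    "road":       [1, 1, 0, 0, 0, 0],
    "waterscape": [0, 0, 1, 1, 1, 0],
    "daytime":    [1, 0, 1, 0, 1, 0],
}
print(sorted(filter_pairs_by_nic(labels, threshold=2.5)))  # [('road', 'waterscape')]
```

In this toy data, the "daytime" pairs score 2.0 or less and are discarded, while ("road", "waterscape") scores 3.5 and moves on to the ICF step.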

Consider the following example. The concepts "Road" and "Waterscape" should have (nearly) perfect negative correlations. However, 113,220 of the 115,806 data instances have negative labels for both concepts, and thus the conditional probability is

$$P(C_{\text{Road}}^+ \mid C_{\text{Waterscape}}^-) = \frac{P(C_{\text{Road}}^+ C_{\text{Waterscape}}^-)}{P(C_{\text{Waterscape}}^-)} = 0.01211 \qquad (13)$$

This value deviates severely from the ideal value of 1. Since our goal is to use the correlation information to improve concept detection for all kinds of test data instances, it would be biased to simply discard those data instances. Moreover, discarding them does not resolve the issue, since it introduces the bias that these two concepts are negatively correlated.

Therefore, based on the observation that a huge number of data instances are not labeled and are given inferred negative labels, the third step is applied as follows. Generally speaking, if two concepts are negatively correlated, their correlation should not be affected by the presence of a third concept (called a control concept in this paper). ICF (Integrated Correlation Factor) is an average quantitative metric of correlations under different control concepts. Let $\Omega$ be the set of all concepts, $|\Omega|$ the total number of concepts, and $C_D$ the control concept. Analogous to $C_T^+$ and $C_R^+$, $C_D^+$ denotes the condition that a data instance is positive for $C_D$. Using this information, we define the ICF between the target concept and the reference concept in Equation (14).

$$\mathrm{ICF}(T,R) = \frac{1}{|\Omega| - 2} \sum_{D \in \Omega,\, D \neq T,\, D \neq R} \rho(C_T, C_R \mid C_D^+) \qquad (14)$$

where $\rho(C_T, C_R \mid C_D^+)$ is the Pearson product-moment correlation coefficient [40] between $C_T$ and $C_R$ given $C_D^+$, as explained in Section 2.3.

However, one issue remains. In special cases, $\rho(C_T, C_R \mid C_D^+)$ is undefined because the corresponding Pearson correlation coefficient is undefined. This occurs when, among the data instances positive for $C_D$, all labels of $C_T$ or all labels of $C_R$ are identical, so that the corresponding standard deviation in Equation (9) is zero. In that case, if $C_T$ and $C_R$ co-occur in all instances of $C_D^+$, the value is set to 1; if neither $C_T$ nor $C_R$ appears in any instance of $C_D^+$, the value is set to 0; and if $C_T$ and $C_R$ co-occur at least once, the value is set to 0.5 to impose a relatively large penalty on that concept pair. These values are determined from our empirical studies. In the remaining cases, the value is set to the average of the negative Pearson correlation coefficients.
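A Python sketch of Equation (14) with the special-case rules is given below. Two points are our own assumptions rather than details from the text: the "remaining" undefined cases use the mean of the negative coefficients computed for the same pair under the other control concepts (0.0 if there are none), and a control concept with no positive instances contributes 0. The concept names and labels are made up.

```python
import math

def rho_given(t, r, d):
    """rho(C_T, C_R | C_D^+) with the special-case rules for undefined values.

    Returns None for the 'remaining' undefined case, resolved by icf().
    """
    rows = [i for i, v in enumerate(d) if v == 1]   # instances positive for C_D
    if not rows:
        return 0.0                                   # assumption: no C_D^+ instances
    x = [t[i] for i in rows]
    y = [r[i] for i in rows]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    if sx > 0 and sy > 0:                           # Pearson is defined
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        return cov / math.sqrt(sx * sy)
    if all(a == 1 and b == 1 for a, b in zip(x, y)):
        return 1.0                                   # co-occur in every C_D^+ instance
    if not any(x) and not any(y):
        return 0.0                                   # neither concept ever appears
    if any(a == 1 and b == 1 for a, b in zip(x, y)):
        return 0.5                                   # co-occur at least once: penalty
    return None

def icf(labels, t, r):
    """ICF(T, R), Eq. (14): average over all control concepts D != T, R."""
    controls = [d for d in labels if d not in (t, r)]
    values = [rho_given(labels[t], labels[r], labels[d]) for d in controls]
    negatives = [v for v in values if v is not None and v < 0]
    # Assumption: 'remaining' cases use the mean negative rho (0.0 if none).
    fallback = sum(negatives) / len(negatives) if negatives else 0.0
    return sum(fallback if v is None else v for v in values) / len(controls)

labels = {
    "road":       [1, 1, 0, 0, 1, 0, 0, 0],
    "waterscape": [0, 0, 1, 1, 0, 1, 0, 0],
    "outdoor":    [1, 1, 1, 1, 0, 0, 0, 0],
    "daytime":    [1, 0, 1, 0, 1, 1, 1, 0],
}
print(round(icf(labels, "road", "waterscape"), 4))  # -0.8333
```

Under "outdoor" the two concepts are perfectly anti-correlated (ρ = −1) and under "daytime" ρ = −2/3, so the ICF is their average, −5/6.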



4. Hadoop MapReduce for Negative Correlations

Hadoop MapReduce [41] is a software framework for easily writing applications that process vast amounts of data in parallel on thousands of nodes of commodity hardware in a reliable, fault-tolerant manner. Hadoop [28], its open-source implementation, allows the distributed processing of large datasets across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, Hadoop is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

Hadoop provides a distributed file system called the Hadoop Distributed File System (HDFS), which splits input files into large blocks and distributes them among the nodes in the cluster. To process the data in parallel, Hadoop MapReduce transfers packaged code to the nodes, based on the data each node needs to process. A MapReduce program consists of two user-defined functions: a map function that processes pieces of the input data (called input splits), and a reduce function that aggregates the output of the map function invocations. Both functions use user-defined key-value pairs as their input and output.

Let $S_n$ represent a shot in a video, where $n = 1, \ldots, N$; let $C_m$ be a concept, where $m = 1, \ldots, M$; and let $v_{mn}$ indicate whether concept $C_m$ appears in shot $S_n$. A value of $v_{mn} = 1$ (or 0) means that concept $C_m$ appears (or does not appear) in shot $S_n$. To adopt the MapReduce framework, the concept IDs and labels of each shot are obtained from each video file, and <Key, Value> pairs are generated for each shot and each concept pair. For example, for shot $S_n$ we have the following <Key, Value> pairs:

Key = $(C_1, C_2)$, Value = $(v_{1n}, v_{2n})$
...
Key = $(C_1, C_M)$, Value = $(v_{1n}, v_{Mn})$
...
Key = $(C_2, C_M)$, Value = $(v_{2n}, v_{Mn})$
...
Key = $(C_{M-1}, C_M)$, Value = $(v_{(M-1)n}, v_{Mn})$

The Map() and Reduce() functions use these key-value pairs as their input and output. In this paper, the Map() function generates the aforementioned key-value pairs, while the Reduce() function calculates the conditional probability ratios of Equation (10) for each key. The rationale behind these functions is that, unlike symmetrical correlations such as the Pearson and Spearman correlations, conditional probability values are asymmetrical. This results in $N \times M \times M$ keys and $M \times M$ outputs. The next step is to filter the outputs using the ICF values for each concept pair. Note that the running time complexities of the Mappers and the Reducers are $O(NM^2)$ and $O(M^2)$, respectively. As can be observed, both time complexities are high and do not scale up well.
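To make the data flow concrete, the job can be simulated in a single Python process. This is only a sketch of the logic, not Hadoop's Java Mapper/Reducer API: `map_shot`, `reduce_pair`, and `run_job` are hypothetical names, the `defaultdict` grouping stands in for Hadoop's shuffle, and the reducer computes only the left-hand ratio of Equation (10), since Equation (11) supplies the right-hand one.

```python
from collections import defaultdict
from itertools import permutations

def map_shot(shot_labels):
    """Map(): {concept: label} for one shot -> ((C_i, C_j), (v_i, v_j)) per ordered pair."""
    for ci, cj in permutations(sorted(shot_labels), 2):
        yield (ci, cj), (shot_labels[ci], shot_labels[cj])

def reduce_pair(key, values):
    """Reduce(): left-hand ratio of Eq. (10), P(C_T^+ | C_R^-) / P(C_T^+), for key (T, R)."""
    p_t_pos = sum(vt for vt, _ in values) / len(values)
    t_when_r_neg = [vt for vt, vr in values if vr == 0]
    return (sum(t_when_r_neg) / len(t_when_r_neg)) / p_t_pos

def run_job(shots):
    """Run map, an in-memory shuffle (group by key), and reduce."""
    groups = defaultdict(list)
    for shot in shots:
        for key, value in map_shot(shot):
            groups[key].append(value)
    return {key: reduce_pair(key, values) for key, values in groups.items()}

shots = [
    {"sky": 1, "fish": 0},
    {"sky": 1, "fish": 0},
    {"sky": 0, "fish": 1},
    {"sky": 0, "fish": 0},
]
result = run_job(shots)
print(result[("fish", "sky")])  # 2.0
```

Each shot emits $M(M-1)$ pairs, which is where the $O(NM^2)$ mapper cost comes from.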

5. Spark for Negative Correlations

Spark is an open-source big data processing framework, advertised as "lightning-fast" cluster computing, built around speed, ease of use, and sophisticated analytics. It provides a faster and more general data processing platform. The Spark core is complemented by a set of powerful, higher-level libraries that can be used seamlessly in the same application. These libraries currently include Spark SQL, Spark Streaming, MLlib (for machine learning), and GraphX, and additional Spark libraries and extensions are under development. Spark introduces the concept of an RDD (Resilient Distributed Dataset), an immutable, fault-tolerant, distributed collection of objects that can be operated on in parallel. An RDD can contain any type of object and is created by loading an external dataset or by distributing a collection from the driver program.

Spark has several advantages over other big data and MapReduce technologies, including Hadoop. For example, for certain applications, Spark's multi-stage in-memory primitives provide performance up to 100 times faster than Hadoop's two-stage, disk-based MapReduce paradigm [29][42][43], and even on disk, Spark can run programs up to 10 times faster than Hadoop. In addition to the Map and Reduce operations, Spark provides many operations called transformations, such as map, flatMap, sample, filter, groupByKey, reduceByKey, union, join, sort, cogroup, mapValues, and partitionBy. Developers can use these operations stand-alone or combine them to run in a single data pipeline.

To build a more efficient correlation discovery system using Spark, the shots and their concept labels are obtained from a video collection and grouped by concept ID using the combineByKey function. The key is now the concept ID, and its value is the labels of all the shots, represented as an array or an iterator in Java:

Key = $(C_1)$, Value = $(v_{11}, v_{12}, \ldots, v_{1N})$
Key = $(C_2)$, Value = $(v_{21}, v_{22}, \ldots, v_{2N})$
Key = $(C_3)$, Value = $(v_{31}, v_{32}, \ldots, v_{3N})$
...
Key = $(C_M)$, Value = $(v_{M1}, v_{M2}, \ldots, v_{MN})$
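The grouping step can be illustrated with a local Python stand-in for Spark's combineByKey, which takes three callbacks: one that creates a combiner from the first value seen for a key in a partition, one that merges a further value into it, and one that merges combiners across partitions. In PySpark the same three callbacks could be passed to `RDD.combineByKey(createCombiner, mergeValue, mergeCombiners)`; the record layout `(concept_id, (shot_index, label))` and all helper names here are hypothetical.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Local stand-in for Spark's combineByKey over a list of partitions."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                acc[key] = merge_value(acc[key], value)
            else:
                acc[key] = create_combiner(value)
        per_partition.append(acc)
    merged = {}
    for acc in per_partition:                      # combine across partitions
        for key, comb in acc.items():
            merged[key] = merge_combiners(merged[key], comb) if key in merged else comb
    return merged

def group_labels_by_concept(partitions, num_shots):
    """(concept_id, (shot_index, label)) records -> {concept_id: [v_1, ..., v_N]}."""
    def create_combiner(value):
        vec = [0] * num_shots
        vec[value[0]] = value[1]
        return vec

    def merge_value(vec, value):
        vec[value[0]] = value[1]
        return vec

    def merge_combiners(a, b):
        # Each (concept, shot) label appears once, so an element-wise max merges
        # the partial binary vectors without conflict.
        return [max(x, y) for x, y in zip(a, b)]

    return combine_by_key(partitions, create_combiner, merge_value, merge_combiners)

partitions = [
    [("C1", (0, 1)), ("C2", (0, 0))],
    [("C1", (1, 0)), ("C2", (1, 1)), ("C1", (2, 1))],
]
print(group_labels_by_concept(partitions, num_shots=3))
# {'C1': [1, 0, 1], 'C2': [0, 1, 0]}
```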

The data can then be represented by a 2-D matrix whose first dimension is the concept ID and whose second dimension is the shot ID. Again, $v_{mn}$ indicates whether concept $C_m$ appears in shot $S_n$. To correlate all the concepts, the Cartesian product of the grouped data with itself is computed, which generates all possible combinations $(C_i, C_j)$; these are used as the input keys for computing the correlations as follows, where $v_m[\,]$ is a vector containing the values $v_{m1}$ to $v_{mN}$.

Key = $(C_1, C_2)$, Value = $(v_1[\,], v_2[\,])$
Key = $(C_1, C_3)$, Value = $(v_1[\,], v_3[\,])$
...
Key = $(C_1, C_M)$, Value = $(v_1[\,], v_M[\,])$
Key = $(C_2, C_1)$, Value = $(v_2[\,], v_1[\,])$
...
Key = $(C_2, C_M)$, Value = $(v_2[\,], v_M[\,])$
...
Key = $(C_{M-1}, C_M)$, Value = $(v_{M-1}[\,], v_M[\,])$

Next, the half of the key pairs $(C_i, C_j)$ with $i > j$ is filtered out, and the NIC values are computed using Equation (12). Furthermore, key pairs whose NIC values are less than a threshold are discarded. Afterward, the ICF values are generated with new key-value pairs as follows. Since the keys are the concept pairs that survived the NIC filtering, the values are unchanged; however, the target concept and the reference concept differ from key to key (as illustrated in Equation (14)). The idea is that, for each key pair, the rest of the concepts are used as the control concepts. Therefore, the ICF values are generated for each key pair with the following final output.

Key = $(C_1, C_2)$, Value = $(\mathrm{ICF}_{1,2})$
...
Key = $(C_1, C_M)$, Value = $(\mathrm{ICF}_{1,M})$
Key = $(C_2, C_3)$, Value = $(\mathrm{ICF}_{2,3})$
...
Key = $(C_2, C_M)$, Value = $(\mathrm{ICF}_{2,M})$
...
Key = $(C_{M-1}, C_M)$, Value = $(\mathrm{ICF}_{M-1,M})$
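The whole Spark-side pipeline can be sketched as a local Python function that mirrors the stages described above: a Cartesian product of the grouped concept vectors (`itertools.product` plays the role of `rdd.cartesian(rdd)`), removal of pairs with $i \ge j$, the NIC filter, and the ICF computation. This is an in-process illustration, not the authors' Spark code; the function names, the toy labels, and the threshold of 3.0 are hypothetical, and the undefined-case handling follows the same assumptions as the earlier ICF description (remaining cases use the mean negative ρ, and a control concept with no positives contributes 0).

```python
import math
from itertools import product

def nic(t, r):
    """Eq. (12) for binary label vectors t (target) and r (reference)."""
    p_pos = sum(t) / len(t)
    t_r_neg = [a for a, b in zip(t, r) if b == 0]
    t_r_pos = [a for a, b in zip(t, r) if b == 1]
    return ((sum(t_r_neg) / len(t_r_neg)) / p_pos
            + (t_r_pos.count(0) / len(t_r_pos)) / (1 - p_pos))

def rho_given(t, r, d):
    """rho(C_T, C_R | C_D^+) with the Section 3 rules; None = 'remaining' case."""
    rows = [i for i, v in enumerate(d) if v == 1]
    if not rows:
        return 0.0                                   # assumption: no C_D^+ instances
    x, y = [t[i] for i in rows], [r[i] for i in rows]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    if sx > 0 and sy > 0:
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(sx * sy)
    pairs = list(zip(x, y))
    if all(p == (1, 1) for p in pairs):
        return 1.0
    if not any(x) and not any(y):
        return 0.0
    if (1, 1) in pairs:
        return 0.5
    return None

def icf(labels, t, r):
    """Eq. (14), with undefined 'remaining' cases set to the mean negative rho."""
    vals = [rho_given(labels[t], labels[r], labels[d]) for d in labels if d not in (t, r)]
    negs = [v for v in vals if v is not None and v < 0]
    fallback = sum(negs) / len(negs) if negs else 0.0
    return sum(fallback if v is None else v for v in vals) / len(vals)

def correlation_job(labels, nic_threshold):
    """Local stand-in for the Spark stages: cartesian -> keep i < j -> NIC filter -> ICF."""
    ids = sorted(labels)                              # grouped (concept, vector) records
    pairs = product(ids, ids)                         # like rdd.cartesian(rdd)
    upper = [(a, b) for a, b in pairs if a < b]      # drop i > j (and i == j)
    kept = [(a, b) for a, b in upper if nic(labels[a], labels[b]) >= nic_threshold]
    return {(a, b): icf(labels, a, b) for a, b in kept}

labels = {
    "daytime":    [1, 0, 1, 0, 1, 1, 1, 0],
    "outdoor":    [1, 1, 1, 1, 0, 0, 0, 0],
    "road":       [1, 1, 0, 0, 1, 0, 0, 0],
    "waterscape": [0, 0, 1, 1, 0, 1, 0, 0],
}
for key, value in correlation_job(labels, nic_threshold=3.0).items():
    print(key, round(value, 4))  # ('road', 'waterscape') -0.8333
```

Only ("road", "waterscape") passes the NIC filter here (NIC = 3.2), so ICF is computed for a single pair instead of all six, which is the saving the NIC step is meant to provide.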

6. Big Data Processing Platform for Negative Correlations using Spark, Mesos, and Cassandra

A big data processing platform not only needs a cluster computing infrastructure but also requires complementary tools for efficient calculation, since an outdated component could become the bottleneck of the platform. In the past five years, many such tools have been developed to support the big data era. In this paper, an efficient processing platform for negative correlation discovery is built using some of these tools.



6.1. Spark on Mesos

In Sections 4 and 5, both Hadoop and Spark run on YARN (Yet Another Resource Negotiator). In Hadoop 2.0, resource management was split out of MapReduce into YARN, and MapReduce runs on top of YARN. YARN is a generic cluster resource management framework that can run applications on a Hadoop cluster. In the YARN model of computation, the ResourceManager runs as a master daemon and manages the ApplicationMasters and NodeManagers. An ApplicationMaster is a lightweight process that coordinates the execution of an application's tasks and requests resource containers for those tasks from the ResourceManager, while the NodeManagers offer their resources (memory and CPU) as resource containers.

Mesos is a distributed systems kernel that essentially uses a container architecture but is abstract enough to allow the seamless execution of multiple (sometimes identical) distributed systems on the same infrastructure, without the resource overhead of virtualization systems. This includes appropriate resource isolation while still allowing the data locality needed by frameworks such as MapReduce. Mesos was built to be a global resource manager for an entire data center. The primary difference between Mesos and YARN lies in their schedulers. In Mesos, when a job request arrives at the Mesos master, Mesos determines which resources are available and makes resource offers back to the framework. The framework can accept or reject these offers, which allows it to decide what best fits the job that needs to run. If the framework accepts an offer, Mesos places the job's tasks on the corresponding slave; otherwise, the framework can reject the offer and wait for another one. One big advantage of Mesos over YARN is that it can manage all the resources in the data center. Therefore, Spark runs on Mesos instead of YARN in our proposed system. A Mesos cluster consists of master nodes, which are responsible for resource offers and scheduling, and slave nodes, which do the actual heavy lifting of task execution.

6.2. Spark with Cassandra

A DBMS (Database Management System) is a software system that enables users to define, create, maintain, and control access to a database based on a database model. For the past half-century, the RDBMS (Relational Database Management System) using SQL (Structured Query Language) has been the dominant solution. Relations bring the benefit of keeping data grouped as constrained collections (tables) that contain the information in a structured way, and they relate all the inputs by assigning values to attributes. Over the past decades, database systems implementing the relational model, e.g., MySQL, PostgreSQL, and SQLite, have become increasingly efficient and reliable. However, with the rapid development of big data computing techniques, the traditional relational data model faces great challenges when working with other big data processing tools and often becomes the bottleneck of the infrastructure. In particular, when the size of a relational table increases tremendously, even answering simple queries becomes a problem.



Recent developments have made many alternative implementations available, though each works very differently and serves a specific need. These schema-less solutions either allow free-form entries or provide very simple but extremely efficient key-value stores. NoSQL database systems, for example, do not impose the structured model used by relational solutions. Contrary to common belief, NoSQL databases have existed since the 1960s, but they have recently gained traction with popular options such as MongoDB, CouchDB, Redis, and Apache Cassandra. Among them, Cassandra is well known for its high availability and high throughput, and it is capable of handling enormous write loads and surviving cluster node failures. In terms of the CAP (Consistency, Availability, and Partition tolerance) theorem, Cassandra provides tunable consistency and availability for operations. More importantly for data processing, Cassandra scales linearly and provides cross-datacenter replication. In our proposed processing platform, the key-value pairs can be stored directly in Cassandra since it supports multiple values in one column.
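Tunable consistency in Cassandra is usually reasoned about with replica-count arithmetic: with replication factor RF, a QUORUM is a strict majority (RF // 2 + 1), and a read at R replicas is guaranteed to see the latest acknowledged write at W replicas when R + W > RF. The helpers below are a minimal sketch of that arithmetic only; they do not talk to Cassandra, and the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a strict majority of the replication factor."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read replica set must intersect every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def tolerated_down(required_replicas: int, replication_factor: int) -> int:
    """Replicas that may be unavailable while an operation at this level can still succeed."""
    return replication_factor - required_replicas


RF = 3
levels = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}
for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    r, w = levels[read_level], levels[write_level]
    print(read_level, write_level, overlaps(r, w, RF), tolerated_down(max(r, w), RF))
# ONE ONE False 2
# QUORUM QUORUM True 1
# ONE ALL True 0
```

The QUORUM/QUORUM row shows the usual trade-off point: reads always see the latest acknowledged write, yet one of three replicas may be down.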

6.3. System Integration

Spark and Cassandra are connected by the spark-cassandra-connector, which works similarly to the way JDBC (Java Database Connectivity) connects Java code with a database. The connector can expose Cassandra tables as Spark RDDs and map the table rows to CassandraRow objects or tuples.

In the proposed platform, the Mesos slaves and Cassandra nodes are co-located to enforce better data locality for Spark, while the Spark binaries are configured with the proper master endpoints and executor JAR location and are then deployed to all worker nodes. A deployment overview of the proposed platform is shown in Fig. 1.
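Co-location pays off because each Cassandra node owns a slice of a token ring, so a Spark task that reads that slice can run on the same machine. The sketch below is a simplified illustration under stated assumptions: a single token per node hashed with MD5 stands in for Cassandra's actual partitioning (the default partitioner is Murmur3, and deployments commonly use virtual nodes), and `TokenRing` and `locality_plan` are hypothetical names.

```python
import bisect
import hashlib


class TokenRing:
    """Simplified single-token ring: a node owns keys hashing up to and including its token."""

    def __init__(self, nodes):
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(value):
        # MD5 is only a stand-in for Cassandra's partitioner.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def owner(self, partition_key):
        """First node whose token is >= the key's token, wrapping past the largest token."""
        i = bisect.bisect_left(self.tokens, self._token(partition_key))
        return self.ring[i % len(self.ring)][1]


def locality_plan(ring, partition_keys):
    """Group partition keys by owning node, i.e., where a co-located executor reads them locally."""
    plan = {}
    for key in partition_keys:
        plan.setdefault(ring.owner(key), []).append(key)
    return plan


ring = TokenRing(["node1", "node2", "node3"])
plan = locality_plan(ring, [f"shot{i}" for i in range(1, 13)])
for node, keys in sorted(plan.items()):
    print(node, keys)
```

With executors on every Cassandra node, each group in the plan can be processed without moving its rows across the network.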

Although our proposed big data processing platform is concise and consists of only three components, it can support different system designs, fitting not only purely batch or stream processing but also more complex Lambda and Kappa architectures.

7. Experiments and Results

7.1. Dataset

The IACC.1.A dataset from the semantic indexing (SIN) task of the TRECVID 2015 benchmark [44] is chosen for our experiments. The SIN task aims to detect the semantic concepts contained within a video shot, which is essential for retrieval, categorization, and other video exploitation tasks. This task poses several challenges, such as data imbalance, scalability, and the semantic gap [45], as mentioned earlier.

The TRECVID conference series encourages research in information retrieval and provides a large number of videos for training (more than 200 hours in the IACC.1.A dataset). Because of the huge number of videos (and video shots), these datasets are well suited for testing our big data mining system. By extracting a keyframe from each video shot, we obtain a total of 144,774 training data instances. Such a large number of video shots makes the training phase very time consuming, as reported by many groups participating in TRECVID each year [46]. A total of 130 concepts are defined in this dataset, including popular semantic concepts such as "Sky", "Mountain", and "Person". The list of concepts and their detailed explanations can be found in [32]. In this paper, we use the detection scores for all video shots downloaded from the DVMM Lab of Columbia University [47]. The TRECVID 2015 training labels are also utilized to enlarge the ground truth available to the negative association selection component. The proposed multimedia big data mining system is tested using some of the results from our previous work [26][27].

[Figure: a framework scheduler and a ZooKeeper (ZK) quorum coordinate one leader master and two standby masters; cluster nodes 1 to N each run a slave node hosting several Spark executors, each executing tasks.]

Fig. 1. Deployment Overview of the Proposed Big Data Processing Platform

7.2. NIC-based and ICF-based Negative Correlation Selection

In the IACC.1.A dataset, there are 16,770 possible links among the 130 concepts if a bi-directional link between two concepts is counted as two different links. Thus, the number of all pair-wise associations is 8,385. We test our proposed system on the IACC.1.A dataset and show the top 10 negative correlations obtained with the NIC-based and the ICF-based selection approaches in Table 1. For the NIC-based selection approach, the two probability ratios in Equation (10) are added before the selected concept pairs are ranked. Some of these concept pairs are correlated simply because of how the concepts are defined. For instance, the concepts "Two People" and "Single Person" cannot occur in the same frame by definition. Selecting such a pair therefore serves as a sanity check that our system captures genuine negative correlations.
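The link counts above follow from basic combinatorics: n(n - 1) ordered pairs and n(n - 1)/2 unordered pairs for n = 130 concepts. A quick Python check, enumerating the pairs explicitly as a pair-generation step would, is:

```python
import math
from itertools import combinations, permutations

NUM_CONCEPTS = 130  # concepts in the IACC.1.A SIN task
concepts = range(NUM_CONCEPTS)

directed_links = sum(1 for _ in permutations(concepts, 2))  # (A, B) and (B, A) counted separately
pairwise_associations = sum(1 for _ in combinations(concepts, 2))  # each unordered pair once

assert directed_links == math.perm(NUM_CONCEPTS, 2) == NUM_CONCEPTS * (NUM_CONCEPTS - 1)
assert pairwise_associations == math.comb(NUM_CONCEPTS, 2) == directed_links // 2
print(directed_links, pairwise_associations)  # 16770 8385
```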



In comparison, the ICF-based selection approach selects more significant negative associations than the NIC-based approach, which suggests that the ICF-based approach is more effective. Here, we target seven concepts in the top 10 ICF-based negative correlations, namely "Road", "Indoor", "Daytime Outdoor", "Suburban", "Trees", "Male Person", and "Two People".

Table 1. Comparison of Negative Correlation Selection

Rank | NIC-based | ICF-based
1 | Entertainment, Building | Road, Waterscape Waterfront
2 | Infants, Industrial | Indoor, Plant
3 | Person, Helicopter Hovering | Indoor, Vegetation
4 | Person, Natural-Disaster | Daytime Outdoor, Indoor
5 | Person, Airplane Flying | Indoor, Outdoor
6 | Canoe, Bus | Suburban, Indoor
7 | Telephones, Swimming | Indoor, Building
8 | Cats, Person | Trees, Indoor
9 | Canoe, Car Racing | Male Person, Female Human Face Closeup
10 | Person, Birds | Two People, Single Person

The shot IDs and the corresponding concept labels are obtained from all the training video files and are read into the Key-Value pairs described in Section 5, which are much more efficient than the Key-Value pairs in Section 4. The conditional probability values are first calculated and then fed into our proposed system again to generate the NIC values. To save computation time and make the proposed system more efficient, key pairs with low NIC values are discarded, and the remaining pairs are kept to generate the ICF values. These steps are shown in Fig. 2 and discussed in Sections 3 and 5. All the correlation values are stored in Cassandra after processing.
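The first stage of this pipeline can be sketched without a cluster as a map step that emits concept and co-occurring concept-pair counts per shot, followed by a reduce step that sums them; on Spark these would correspond to `flatMap` and `reduceByKey`. This minimal pure-Python sketch only computes the conditional probability P(B | A) = count(A, B) / count(A) from (shot ID, label list) records. The record layout and function names are hypothetical, and the NIC and ICF formulas of Section 3 are not reproduced here.

```python
from collections import Counter
from itertools import permutations


def map_shot(labels):
    """Map step: emit ((A,), 1) for each concept and ((A, B), 1) for each ordered co-occurring pair."""
    concepts = sorted(set(labels))
    for a in concepts:
        yield (a,), 1
    for a, b in permutations(concepts, 2):
        yield (a, b), 1


def reduce_by_key(pairs):
    """Reduce step: sum the emitted counts per key (what reduceByKey does on a cluster)."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def conditional_probabilities(records):
    """Return {(A, B): P(B | A)} for ordered pairs that co-occur in at least one shot."""
    counts = reduce_by_key(kv for _shot_id, labels in records for kv in map_shot(labels))
    return {key: n / counts[(key[0],)] for key, n in counts.items() if len(key) == 2}


# Hypothetical (shot ID, concept labels) records.
records = [
    ("shot1_1", ["Outdoor", "Trees"]),
    ("shot1_2", ["Outdoor", "Road"]),
    ("shot2_1", ["Indoor"]),
    ("shot2_2", ["Outdoor", "Trees", "Road"]),
]
probs = conditional_probabilities(records)
print(round(probs[("Outdoor", "Trees")], 3))  # 0.667
```

Pairs that never co-occur, such as ("Indoor", "Outdoor"), are absent from the result and implicitly have P(B | A) = 0. Since these are exactly the candidates of interest for negative correlations, a full pipeline would need to enumerate them as well, for example from the Cartesian product of the concept set.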

Then, following our previous work [26][27], we extract a set of features from the original training dataset to train the MCA-based negative weight estimation model [48]. After score normalization, a regression model for score integration is built using the label, the score of the target concept, the score of the reference concept, and the MCA-based weight. All the trained negative correlations, MCA models, and regression models are stored and then applied to the testing data.

In the testing phase, all the testing data instances are fed into all the concept detection models to generate the testing scores. The features used by the trained model are extracted from the testing data instances to compute the MCA-based weights, and the same score and weight normalization steps are applied. Finally, these values are fed into the trained regression-based score integration model to generate the predicted scores for the testing data instances.
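The text does not spell out the normalization scheme or the regression form, so the following is only one plausible instantiation, labeled as such: min-max normalization of the detection scores and a logistic regression, fitted by full-batch gradient descent, on three features (normalized target score, normalized reference score, and the MCA-based weight). The data, function names, and hyperparameters are hypothetical, and the MCA-based weights are taken as given inputs rather than computed.

```python
import math


def min_max(values):
    """Min-max normalize scores to [0, 1] (an assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """w[0] is the bias; w[1:] pair up with the feature vector x."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(features, labels, lr=0.5, epochs=5000):
    """Fit a logistic regression with full-batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(features[0]) + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = predict(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# Hypothetical training data for one target/reference concept pair:
# target-concept scores, reference-concept scores, MCA-based weights, labels.
target = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
reference = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7]
mca_weight = [0.2, 0.3, 0.9, 0.8, 0.4, 0.6]
labels = [1, 1, 0, 0, 0, 0]

X = [list(f) for f in zip(min_max(target), min_max(reference), mca_weight)]
w = fit_logistic(X, labels)
integrated = [predict(w, x) for x in X]  # integrated scores used for re-ranking
```

In the testing phase the same `min_max` and `predict` calls would be applied to the testing scores and weights with the stored `w`. Strictly, the min and max should then come from the training data so that testing scores are mapped consistently.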



Fig. 2. Spark Implementation of the ICF Calculation

7.3. Performance Evaluation

For the performance evaluation, this paper adopts the average precision (AP), which is widely used in the multimedia concept retrieval domain. For a given concept, Pre(a) denotes the precision at the a-th data instance in the ranking list, ψ denotes the number of retrieved data instances, and G_n denotes the total number of data instances containing that concept in the database. Min(G_n, ψ) is the smaller of G_n and ψ. The average precision at ψ (i.e., AP@ψ) is defined in Equation (15). The AP values are computed for all the concepts, and their mean, the mean average precision (MAP), is used to capture the ranking information.

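$$
\mathrm{AP}@\psi = \frac{\sum_{a=1}^{\psi} \mathrm{Pre}(a) \times \mathrm{rel}(a)}{\mathrm{Min}(G_n, \psi)}, \tag{15}
$$

where

$$
\mathrm{rel}(a) = \begin{cases} 1, & \text{if instance } a \text{ is positive,} \\ 0, & \text{if instance } a \text{ is negative.} \end{cases}
$$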
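Equation (15) translates directly into code. The sketch below assumes that rel(a) is given as a ranked list of 0/1 flags and that Pre(a) is the fraction of positives among the top a instances; the function names are hypothetical.

```python
def ap_at(ranked_relevance, psi, num_positives):
    """AP@psi per Equation (15).

    ranked_relevance: rel(a) for the ranked list, 1 = positive, 0 = negative.
    psi: number of retrieved data instances considered.
    num_positives: G_n, the number of instances containing the concept.
    """
    hits, total = 0, 0.0
    for a, rel in enumerate(ranked_relevance[:psi], start=1):
        hits += rel
        total += (hits / a) * rel  # Pre(a) x rel(a)
    denom = min(num_positives, psi)
    return total / denom if denom else 0.0


def map_at(runs, psi):
    """MAP@psi: mean of AP@psi over concepts; runs holds (ranked_relevance, G_n) pairs."""
    return sum(ap_at(r, psi, g) for r, g in runs) / len(runs)


runs = [([1, 0, 1, 0, 0], 2), ([0, 1, 1], 4)]
print(round(map_at(runs, 3), 4))  # 0.6111
```

For the first concept, the positives at ranks 1 and 3 give (1 + 2/3) / Min(2, 3) = 5/6. For the second, the positives at ranks 2 and 3 give (1/2 + 2/3) / Min(4, 3) = 7/18. Their mean is 11/18.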

To conduct the comparison, the proposed multimedia big data mining system with the ICF-based negative correlation discovery approach is evaluated against four approaches. The first one, "Base", consists of the raw scores from [47] without modification. The "Subtraction" approach subtracts the scores of a reference concept from those of a target concept. Another intuitive idea, the so-called "Random" approach, randomly selects a reference concept. The fourth one is the domain adaptive semantic diffusion (DASD) framework [49]. In our previous study, DASD was implemented by setting the number of iterations to 20 and keeping all negative affinities, as described in [49]. We compare our results with DASD because it also exploits correlation information and achieves the best performance among many similar techniques. Three-fold cross-validation is used over the seven ICF-based target concepts listed in Table 1.
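The two simple baselines can be sketched in a few lines. The following is a minimal illustration with hypothetical scores and names: "Subtraction" is a shot-by-shot difference, and "Random" is shown only as the uniform choice of a reference concept other than the target. How the Random approach then combines the scores is not specified here, so the sketch stops at the selection step.

```python
import random


def subtraction_scores(target_scores, reference_scores):
    """'Subtraction' baseline: target score minus reference score for each shot."""
    return [t - r for t, r in zip(target_scores, reference_scores)]


def pick_random_reference(target, concepts, rng):
    """'Random' baseline: choose a reference concept uniformly at random, never the target itself."""
    return rng.choice([c for c in concepts if c != target])


# Hypothetical detection scores for three shots.
scores = {
    "Indoor": [0.9, 0.2, 0.6],
    "Outdoor": [0.1, 0.8, 0.5],
    "Trees": [0.0, 0.7, 0.3],
}
adjusted = subtraction_scores(scores["Indoor"], scores["Outdoor"])
rng = random.Random(42)  # fixed seed so the random choice is reproducible
reference = pick_random_reference("Indoor", sorted(scores), rng)
print(adjusted, reference)
```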

Here, we set two thresholds, Th1 and Th2, to the mean of the NIC values minus one standard deviation and minus two standard deviations, respectively, to show the application of negative correlations and to compare the effects of the thresholds on the performance. The two thresholds yield different numbers of target concepts. For all the approaches in the comparison, the MAP values for the selected concepts at different numbers of retrieved data instances are compared in Table 2 and Table 3, using Th1 and Th2 on the IACC.1.A dataset, respectively. As can be seen, the ICF-based approach outperforms all the other approaches by large margins across all MAP measures (e.g., 0.8626 versus at most 0.4827 for MAP@10 in Table 2). This comparison shows the effectiveness of the ICF-based approach under different numbers of target concepts. Please note that the MAP values in these two tables were reported in our previous work [26][27]. The focus of the performance evaluation in this paper is to demonstrate that the proposed multimedia big data mining system built on Hadoop and Spark is able to handle big datasets like TRECVID and achieve the same MAP results, in terms of both the number of target concepts and the number of retrieved data instances.
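The threshold rule can be sketched as follows, assuming the population standard deviation (the text does not say which estimator is used) and keeping pairs at or above the threshold, consistent with the earlier statement that low-NIC pairs are discarded. The NIC values and function names are hypothetical. Under this reading the lower threshold Th2 keeps more pairs, which agrees with Table 3 listing more selected links than Table 2.

```python
import statistics


def nic_thresholds(nic_values):
    """Return (Th1, Th2) = (mean - 1 std, mean - 2 std); population std is an assumption."""
    mu = statistics.mean(nic_values)
    sigma = statistics.pstdev(nic_values)
    return mu - sigma, mu - 2 * sigma


def keep_pairs(nic_by_pair, threshold):
    """Discard low-NIC pairs: keep pairs whose NIC value is at or above the threshold."""
    return {pair: v for pair, v in nic_by_pair.items() if v >= threshold}


# Hypothetical NIC values for five concept pairs.
nic_by_pair = {
    ("Indoor", "Outdoor"): 5.0,
    ("Indoor", "Trees"): 4.0,
    ("Road", "Indoor"): 3.0,
    ("Trees", "Road"): 2.0,
    ("Person", "Trees"): 1.0,
}
th1, th2 = nic_thresholds(list(nic_by_pair.values()))
print(len(keep_pairs(nic_by_pair, th1)), len(keep_pairs(nic_by_pair, th2)))  # 4 5
```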

Table 2. MAP Values at Different Numbers of Retrieved Data Instances (7 Target Concepts and 10 Selected Links)

Framework MAP@10 MAP@20 MAP@40 MAP@60 MAP@80 MAP@100 MAP@500

Base 0.4508 0.4084 0.3576 0.3137 0.2738 0.2441 0.1305

Subtraction 0.4729 0.3997 0.3391 0.2872 0.2537 0.2227 0.1155

Random 0.3601 0.3156 0.2317 0.1844 0.1587 0.1397 0.0845

DASD 0.4827 0.4020 0.3340 0.3113 0.2786 0.2431 0.1222

ICF-based 0.8626 0.7355 0.6054 0.5588 0.5105 0.4729 0.3397

Table 3. MAP Values at Different Numbers of Retrieved Data Instances (30 Target Concepts and 67 Selected Links)

Framework MAP@10 MAP@20 MAP@40 MAP@60 MAP@80 MAP@100 MAP@500

Base 0.4782 0.4060 0.3397 0.2911 0.2577 0.2346 0.1555

Subtraction 0.4472 0.3798 0.3169 0.2734 0.2424 0.2197 0.1400

Random 0.3990 0.3548 0.2406 0.1974 0.1745 0.1541 0.1010

DASD 0.4719 0.3890 0.3095 0.2682 0.2399 0.2154 0.1377

ICF-based 0.6472 0.5485 0.4614 0.4157 0.3827 0.3539 0.2868



8. Conclusion and Future Work

In this paper, a novel multimedia big data mining system with ICF-based negative correlation discovery for semantic concept mining and retrieval is proposed. For efficient big data processing, our system is built on top of Mesos with a NoSQL database (Cassandra) and adopts both Hadoop and Spark. The performance of the ICF-based approach running in the proposed system is compared across different numbers of target concepts and retrieved data instances. The experimental results demonstrate that our proposed multimedia big data mining system is able to handle big datasets while executing the ICF-based approach, which captures the ICF values between the target and reference concepts to effectively utilize the negative correlation information for semantic concept mining and retrieval.

Given the design of the proposed system, the ICF-based negative correlation discovery approach can easily be extended to calculate other kinds of correlations and coefficients for big datasets. For example, other researchers may wish to replace the "Conditional Probability Calculation" step with other correlation coefficients suited to their circumstances. Furthermore, if some correlations are ternary or even quaternary, the Cartesian product step can be replaced to generate the corresponding groups of concepts. It may also be beneficial to include correlations between frames to further improve the current system.

References

[1] X. Li, S.-C. Chen, M.-L. Shyu, and B. Furht, "An effective content-based visual image retrieval system," in Proceedings of the Computer Software and Applications Conference, 2002, pp. 914–919.

[2] S.-C. Chen, S. Sista, M.-L. Shyu, and R. Kashyap, "Augmented transition networks as video browsing models for multimedia databases and multimedia information systems," in Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, 1999, pp. 175–182.

[3] S.-C. Chen, M.-L. Shyu, and R. Kashyap, "Augmented transition network as a semantic model for video data," International Journal of Networking and Information Systems, vol. 3, no. 1, pp. 9–25, 2000.

[4] MongoDB World, "Big data explained," https://www.mongodb.com/big-data-explained, accessed March 2016.

[5] M.-L. Shyu, C. Haruechaiyasak, and S.-C. Chen, "Category cluster discovery from distributed WWW directories," Information Sciences, vol. 155, no. 3, pp. 181–197, 2003.

[6] S.-C. Chen and R. Kashyap, "Temporal and spatial semantic models for multimedia presentations," in Proceedings of the 1997 International Symposium on Multimedia Information Processing, 1997, pp. 441–446.

[7] M.-L. Shyu, S.-C. Chen, M. Chen, C. Zhang, and K. Sarinnapakorn, "Image database retrieval utilizing affinity relationships," in Proceedings of the 1st ACM International Workshop on Multimedia Databases, ser. MMDB '03. New York, NY, USA: ACM, 2003, pp. 78–85. [Online]. Available: http://doi.acm.org/10.1145/951676.951691

[8] L. Lin, G. Ravitz, M.-L. Shyu, and S.-C. Chen, "Video semantic concept discovery using multimodal-based association classification," in Proceedings of the IEEE International Conference on Multimedia & Expo, July 2007, pp. 859–862.



[9] X. Li, S.-C. Chen, M.-L. Shyu, and B. Furht, "Image retrieval by color, texture, and spatial information," in Proceedings of the 8th International Conference on Distributed Multimedia Systems, September 2002, pp. 152–159.

[10] X. Huang, S.-C. Chen, M.-L. Shyu, and C. Zhang, "User concept pattern discovery using relevance feedback and multiple instance learning for content-based image retrieval," in Proceedings of the Third International Workshop on Multimedia Data Mining, in conjunction with the 8th ACM International Conference on Knowledge Discovery & Data Mining, July 2002, pp. 100–108.

[11] M.-L. Shyu, S.-C. Chen, and R. Kashyap, "Generalized affinity-based association rule mining for multimedia database queries," Knowledge and Information Systems (KAIS): An International Journal, vol. 3, no. 3, pp. 319–337, August 2001.

[12] L. Lin, G. Ravitz, M.-L. Shyu, and S.-C. Chen, "Effective feature space reduction with imbalanced data for semantic concept detection," in Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, 2008, pp. 262–269.

[13] M.-L. Shyu, Z. Xie, M. Chen, and S.-C. Chen, "Video semantic event/concept detection using a subspace-based multimedia data mining framework," IEEE Transactions on Multimedia, vol. 10, no. 2, pp. 252–259, Feb 2008.

[14] S.-C. Chen, A. Ghafoor, and R. L. Kashyap, Semantic Models for Multimedia Database Searching and Browsing. Springer Science & Business Media, 2000.

[15] S.-C. Chen, S. Rubin, M.-L. Shyu, and C. Zhang, "A dynamic user concept pattern learning framework for content-based image retrieval," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 36, no. 6, pp. 772–783, Nov 2006.

[16] M.-L. Shyu, C. Haruechaiyasak, S.-C. Chen, and N. Zhao, "Collaborative filtering by mining association rules from user access sequences," in Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration, April 2005, pp. 128–135.

[17] S.-C. Chen, M.-L. Shyu, and C. Zhang, "Innovative shot boundary detection for video indexing," in Video Data Management and Information Retrieval, S. Deb, Ed. Idea Group Publishing, 2005, pp. 217–236.

[18] S.-C. Chen, M.-L. Shyu, C. Zhang, and R. L. Kashyap, "Identifying overlapped objects for video indexing and modeling in multimedia database systems," International Journal on Artificial Intelligence Tools, vol. 10, no. 4, pp. 715–734, 2001.

[19] S.-C. Chen, M.-L. Shyu, and C. Zhang, "An intelligent framework for spatio-temporal vehicle tracking," in Proceedings of the 4th IEEE International Conference on Intelligent Transportation Systems, August 2001, pp. 213–218.

[20] X. Chen, C. Zhang, S.-C. Chen, and M. Chen, "A latent semantic indexing based method for solving multiple instance learning problem in region-based image retrieval," in Seventh IEEE International Symposium on Multimedia, Dec 2005, pp. 37–44.

[21] E. A. Cherman, J. Metz, and M. C. Monard, "Incorporating label dependency into the binary relevance framework for multi-label classification," Expert Systems with Applications, vol. 39, no. 2, pp. 1647–1655, February 2011.

[22] L. Lin and M.-L. Shyu, "Weighted association rule mining for video semantic detection," International Journal of Multimedia Data Engineering and Management, vol. 1, no. 1, pp. 37–54, 2010.

[23] M.-L. Shyu, T. Quirino, Z. Xie, S.-C. Chen, and L. Chang, "Network intrusion detection through adaptive sub-eigenspace modeling in multiagent systems," ACM Trans. Auton. Adapt. Syst., vol. 2, no. 3, Sep. 2007. [Online]. Available: http://doi.acm.org/10.1145/1278460.1278463

[24] M.-L. Shyu, S.-C. Chen, M. Chen, and C. Zhang, "A unified framework for image database clustering and content-based retrieval," in Proceedings of the 2nd ACM International Workshop on Multimedia Databases, ser. MMDB '04. New York, NY, USA: ACM, 2004, pp. 19–27. [Online]. Available: http://doi.acm.org/10.1145/1032604.1032609

[25] S. Perera, Instant MapReduce Patterns - Hadoop Essentials How-to. Packt Publishing, May 2005.

[26] Y. Yan, M.-L. Shyu, and Q. Zhu, "Negative correlation discovery for big multimedia data semantic concept mining and retrieval," in Proceedings of the IEEE International Conference on Semantic Computing, Feb 2016, pp. 55–62.

[27] T. Meng, Y. Liu, M.-L. Shyu, Y. Yan, and C.-M. Shu, "Enhancing multimedia semantic concept mining and retrieval by incorporating negative correlations," in Proceedings of the IEEE International Conference on Semantic Computing, June 2014, pp. 28–35.

[28] Apache, "Hadoop," http://hadoop.apache.org, accessed Oct. 2015.

[29] Apache, "Spark," https://spark.apache.org, accessed Oct. 2015.

[30] Apache, "Mesos," http://mesos.apache.org/, accessed March 2016.

[31] Apache, "Cassandra," http://cassandra.apache.org/, accessed March 2016.

[32] A. F. Smeaton, P. Over, and W. Kraaij, "Evaluation campaigns and TRECVid," in Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, October 2006, pp. 321–330.

[33] I. Kaimi, "Understanding Advanced Statistical Methods, P. Westfall and K. S. S. Henning, 2013, Boca Raton, Chapman and Hall/CRC, 570 pp., 44.99, ISBN 978-1-466-51210-8," Journal of the Royal Statistical Society: Series A (Statistics in Society), vol. 178, no. 1, pp. 302–302, 2015. [Online]. Available: http://dx.doi.org/10.1111/rssa.12096

[34] E. Wong, T. Wei, Y. Qi, and L. Zhao, "A crosstab-based statistical method for effective fault localization," in Software Testing, Verification, and Validation, 2008 1st International Conference on, April 2008, pp. 42–51.

[35] W. Bergsma, "A bias-correction for Cramér's V and Tschuprow's T," Journal of the Korean Statistical Society, vol. 42, no. 3, pp. 323–328, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1226319212001032

[36] L. A. Goodman and W. H. Kruskal, "Measures of association for cross classifications, IV: Simplification of asymptotic variances," Journal of the American Statistical Association, vol. 67, no. 338, pp. 415–421, 1972. [Online]. Available: http://www.jstor.org/stable/2284396

[37] J. Piantadosi, P. Howlett, and J. Boland, "Matching the grade correlation coefficient using a copula with maximum disorder," Journal of Industrial and Management Optimization, vol. 3, no. 2, pp. 305–312, 2007. [Online]. Available: http://aimsciences.org/journals/displayArticlesnew.jsp?paperID=2265

[38] Y.-G. Jiang, J. Wang, S.-F. Chang, and C.-W. Ngo, "Domain adaptive semantic diffusion for large scale context-based video annotation," in Proceedings of the 12th IEEE International Conference on Computer Vision, Sept 2009, pp. 1420–1427.

[39] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data," in Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 26, 1997, pp. 255–264.

[40] K. Pearson, "Notes on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, vol. 58, pp. 240–242, 1895.

[41] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008. [Online]. Available: http://doi.acm.org/10.1145/1327452.1327492

[42] UC Berkeley, "AMPLab," https://amplab.cs.berkeley.edu, accessed Oct. 2015.

[43] R. S. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica, "Shark: SQL and rich analytics at scale," in Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '13. New York, NY, USA: ACM, 2013, pp. 13–24. [Online]. Available: http://doi.acm.org/10.1145/2463676.2465288

[44] P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders, W. Kraaij, A. F. Smeaton, G. Quenot, and R. Ordelman, "TRECVID 2015 – an overview of the goals, tasks, data, evaluation mechanisms and metrics," in Proceedings of TRECVID 2015. NIST, USA, 2015.

[45] L. Lin, C. Chen, M.-L. Shyu, and S.-C. Chen, "Weighted subspace filtering and ranking algorithms for video concept retrieval," IEEE Multimedia, vol. 18, no. 3, pp. 32–43, March 2011.

[46] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, "Large-scale video classification with convolutional neural networks," in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, ser. CVPR '14. Washington, DC, USA: IEEE Computer Society, 2014, pp. 1725–1732. [Online]. Available: http://dx.doi.org/10.1109/CVPR.2014.223

[47] Y.-G. Jiang, "Prediction scores on TRECVID 2010 data set," http://www.ee.columbia.edu/ln/dvmm/CU-VIREO374/, 2010, last accessed on September 2011.

[48] Q. Zhu and M.-L. Shyu, "Sparse linear integration of content and context modalities for semantic concept retrieval," IEEE Transactions on Emerging Topics in Computing, vol. 3, no. 2, pp. 152–160, June 2015.

[49] Y.-G. Jiang, J. Wang, S.-F. Chang, and C.-W. Ngo, "Domain adaptive semantic diffusion for large scale context-based video annotation," in Proceedings of the International Conference on Computer Vision, Kyoto, Japan, September 2009, pp. 1420–1427.
