Activity recognition through MLNs, Jason Filippou, CVSS Summer Seminar series, July 3rd 2013
Transcript
Page 1: Activity recognition through MLNs (jasonfil/files/cvss_07_03.pdf · 2013-07-12)

Activity recognition through MLNs

Jason Filippou
CVSS Summer Seminar series
July 3rd 2013

Page 2:

Activity recognition through MLNs

• Joint work with the Complex Event Recognition (CER) Lab in NCSR “Demokritos”, Athens, Greece

Page 4:

Traditional logical reasoning for activity recognition

• Pros:
  • Can model events of arbitrary complexity (FOL expressiveness)
    • Complexity = multiple actors, temporal complexity, and persistence (inertia)
  • Rules are easy to write:
    • Facilitates interaction between developer & domain expert
  • Formal semantics
    • Satisfiability of a rule
    • Deterministic operators (⇒, ∃, …)
  • Popular implementations
    • Efficient Prolog systems (SWI, YAP, …)
• Cons:
  • Cannot handle uncertainty (facts / rules are 0-1)

Goal: Ameliorate the cons without losing the pros

Page 5:

Uncertainty in activity recognition

• Uncertainty in the input stream
  • Human detection confidence
  • Occlusions
  • Identity maintenance / tracking
• Uncertainty in the rules for recognizing events
  • E.g. a rule dictating that two people are “moving” along together if they are moving in parallel might not always be true
• Traditional logic cannot handle either kind
• This paper deals with the second type of uncertainty
  • [4] deals with the first; [5] deals with both

Page 7:

Today’s paper: DEC-MLN

• DEC-MLN: Discrete Event Calculus based on Markov Logic Networks.
• Format of the rest of the presentation:
  1. The Event Calculus
  2. The CAVIAR dataset
  3. Markov Logic Networks
  4. The DEC-MLN approach
  5. Pros / cons of the approach & pointers for further discussion

Discrete? Event Calculus? Markov Logic Networks? Will discuss all of those…

Page 8:

The Event Calculus

Page 9:

The Event Calculus

• A formal logical language for representing events and their effects
• Introduced by Kowalski and Sergot [1]
• Core constructs:
  • Fluents F, which may take different values over time (F = V)
  • Events E, considered instantaneous
  • A time model T (discrete? continuous? smooth?); the discrete choice is the “D” in DEC-MLN

Page 10:

EC (contd.)

• Domain-independent axioms govern when a fluent F has a value V
• Two predicates are mainly studied in the literature:
  • holdsAt(F = V, T): fluent F has value V at time T
  • holdsFor(F = V, I): I is the union of time intervals during which F has value V continuously

Page 11:

EC (contd.)

• Domain-dependent predicates govern the initiation and termination of value assignments to fluents:

  initiatedAt(F = V, T) ← happens(E, T), Conditions[T]

• Conditions[T] is a set of further conditions that must be satisfied by the actors involved in either F or E
  • Will provide concrete examples
• terminatedAt/2 rules have the same form

Page 12:

Law of inertia

• In EC, a fluent holds its value V once initiated and until terminated. Concretely:

  holdsAt(F = V, T + 1) ← initiatedAt(F = V, T)

  holdsAt(F = V, T + 1) ← holdsAt(F = V, T), not terminatedAt(F = V, T)

• This temporal persistence (inertia) is at the core of EC
• Different approaches model it in different ways
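The two inertia axioms above can be simulated directly in discrete time. A minimal sketch (not the paper's implementation; the single fluent-value pair and the example time points are assumed for illustration):

```python
# Discrete-time inertia: a fluent-value pair holds at T+1 if it was
# initiated at T, or if it held at T and was not terminated at T.

def holds_over_time(initiated, terminated, horizon):
    """initiated / terminated: sets of time points for one fluent-value pair."""
    holds = [False] * (horizon + 1)
    for t in range(horizon):
        holds[t + 1] = (t in initiated) or (holds[t] and t not in terminated)
    return holds

# Example: fighting(P1,P2)=true initiated at t=2, terminated at t=5:
timeline = holds_over_time(initiated={2}, terminated={5}, horizon=8)
print(timeline)  # holds from t=3 through t=5, false afterwards
```

Note how the value persists across time points 3-5 with no further initiation evidence: that persistence is exactly what the OWA threatens later in the talk.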

Page 13:

EC take-home messages

• EC induces:
  1. A separation between (a) domain-independent axioms, encoded in terms of holdsAt, and (b) domain knowledge, encoded in terms of initiatedAt and terminatedAt
     • Vision analogue: an event hierarchy (low-level and high-level events)
  2. A natural characterization of temporal inertia
     • An event holds until evidence to the contrary suggests that it shouldn’t
     • Vision analogue: human detection, tracking…

Page 14:

CAVIAR (Context-Aware Vision using Image-based Activity Recognition)

Page 15:

The CAVIAR dataset

• Two sub-sets:
  • INRIA
  • Portuguese shopping center
• This work focuses on INRIA
  • 28 (staged) surveillance videos, 26419 frames
  • Annotations for coordinates, trajectories, events
  • Even identity maintenance has been annotated
  • One can always discard unwanted annotations depending on what kind of inference / learning is to be performed

Page 16:

CAVIAR-INRIA

Page 17:

CAVIAR-INRIA (contd.)

[figure: annotated frame distinguishing high-level (group) events from low-level (atomic) events]

Page 18:

CAVIAR & EC

• CAVIAR naturally induces an activity hierarchy
• Annotations for:
  • LLE: walking, running, inactive, active, abrupt
  • HLE: fighting, meeting, moving, leaving_object
• Very compatible with the Event Calculus formalism!
  • HLE → binary fluents (true/false)
  • LLE → events (logical facts)
• In this work, the focus is on HLE recognition (group activities) based on LLE + locations / orientations
• LLE recognition is also possible
  • Pose identification / learning…

Page 20:

EC rule examples in CAVIAR

• Fights are initiated by a person performing the “abrupt” activity near a non-inactive person:

  initiatedAt(fighting(P1, P2) = true, T) ←
      happens(abrupt(P1), T), close(P1, P2, T), ¬happens(inactive(P2), T)

• Fights are terminated when one person walks away from the other:

  terminatedAt(fighting(P1, P2) = true, T) ←
      happens(walking(P2), T), ¬close(P1, P2, T)

• In each rule, the head is an HLE (fluent), the happens atoms are LLEs (events), and the remaining atoms form Conditions[T]

Page 22:

CAVIAR Pros / Cons

• Pros:
  • Suitable for modeling group activities (High-Level Events, HLE) in conjunction with atomic activities (Low-Level Events, LLE)
  • Annotations for very different elements
• Cons:
  • An “easy” dataset for many reasons:
    • Very few occlusions (person-person, person-object); NOT the case for the Portuguese shopping center sub-dataset
    • Staged activities
    • Controlled lighting and a static camera / background (natural properties of indoor surveillance)

Page 23:

Markov Logic Networks (in 5 slides…)

Page 24:

Markov Logic Networks

• Perhaps the most prevalent framework for merging FOL with probabilistic inference
  • From (binary) possible worlds to (soft) probable worlds
• Underlying representation: a ground Markov Random Field (MRF)
  • A set of FOL rules and constants is used to ground the MRF
  • Nodes = binary ground facts (can be true or false)
  • Every rule is translated to a clique in the network
• Main idea: rules are accompanied by weights
• Possible world = an assignment to all variables (ground facts)

Page 26:

Grounding example

• Assume the following rule set, drawn from a parking-lot surveillance scenario. We need to translate the formulae into clausal form before grounding.

1. A car almost certainly has someone driving it:

   ∀x (Car(x) ⇒ ∃y Drives(y, x))    (w1 = 4.0)

   Clausal form: ¬Car(x) ∨ Drives(y, x)

2. Vehicles parked next to cars are usually also cars:

   ∀x, y (Car(x) ∧ ParkedNextTo(x, y) ⇒ Car(y))    (w2 = 2.5)

   Clausal form: ¬(Car(x) ∧ ParkedNextTo(x, y)) ∨ Car(y)
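Grounding a clause over a finite constant set is a mechanical substitution. A toy sketch of grounding rule 2 (not Alchemy's actual grounding procedure; the string encoding of atoms is assumed for illustration):

```python
from itertools import product

# Ground rule 2's clause, ¬(Car(x) ∧ ParkedNextTo(x,y)) ∨ Car(y),
# over the constant set {John, Nissan}: one ground clause (and hence
# one clique) per substitution of (x, y).
constants = ["John", "Nissan"]

groundings = [f"not (Car({x}) and ParkedNextTo({x},{y})) or Car({y})"
              for x, y in product(constants, repeat=2)]

for g in groundings:
    print(g)
print(len(groundings))  # 4 groundings: one per (x, y) pair
```

With c constants and a clause of v variables there are c^v groundings, which is why ground networks blow up quickly.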

Page 28:

Grounding example (contd.)

• Combined with a constant set C = {John, Nissan}, grounding yields a network over the ground atoms (abbreviating J, N; PNT = ParkedNextTo, Dr = Drives):

  Car(J), Car(N), PNT(J, N), PNT(N, J), Dr(J, J), Dr(J, N), Dr(N, J), Dr(N, N)

• The ground facts participating in rule 2, ¬(Car(x) ∧ ParkedNextTo(x, y)) ∨ Car(y), are the Car and PNT atoms

Page 30:

Joint probability

• The joint probability distribution for X = x is equal to:

  P(X = x) = (1/Z) · exp( Σ_{i=1..F} w_i · n_i(x) )

• w_i: a real number representing a rule weight
  • Can be set to “infinite” (very large) for rules that capture certainties about the environment
• n_i(x): the number of times rule i is true in x
  • Many groundings of a high-weight rule add to the probability of x
  • This is the model feature associated with every clique
• Z = Σ_{x ∈ X} exp( Σ_{i=1..F} w_i · n_i(x) ) is a normalization constant that ensures P is an actual probability distribution
  • Its sum ranges over all 2^|X| possible worlds…
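The formula above can be checked on a toy network. A sketch (my own minimal example, not from the paper): one soft clause (¬a ∨ b) with weight 2.0 over two ground atoms, with P computed by brute-force enumeration of all 2^|X| = 4 worlds:

```python
import math
from itertools import product

# P(X = x) = (1/Z) * exp(sum_i w_i * n_i(x)) for a single soft clause.
w = 2.0
worlds = list(product([False, True], repeat=2))  # all assignments to (a, b)

def n(world):
    # Number of true groundings of the clause (¬a ∨ b); here just 0 or 1.
    a, b = world
    return 1 if ((not a) or b) else 0

Z = sum(math.exp(w * n(x)) for x in worlds)           # normalization constant
P = {x: math.exp(w * n(x)) / Z for x in worlds}

# The single violating world (a=True, b=False) is exp(w) times less
# likely than any satisfying world, but its probability is not zero:
print(P[(False, False)] / P[(True, False)])  # e^2 ≈ 7.389
print(abs(sum(P.values()) - 1.0) < 1e-12)    # True: P is normalized
```

This is the "soft" behavior the talk emphasizes: violating a weighted rule makes a world less probable, not impossible.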

Page 31:

MLN Inference

• Direct computation of the joint distribution P is intractable
  • Often resort to approximations such as sampling (Gibbs sampling / MC-SAT)
• But sometimes the joint isn’t even what we want!
• Maximum A Posteriori (MAP) inference (argmax_x P(X = x))
  • Approximate (MaxWalkSAT)
  • Exact (AND/OR Branch & Bound)
• Marginals P(X = x) and conditionals P(X = x | E = e)
  • Sampling
  • The possible presence of evidence E prunes out areas of the network and makes sampling faster
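On toy networks, MAP inference can be done exactly by enumeration, since maximizing P(X = x) is the same as maximizing the weighted count of satisfied clauses. A sketch under assumed weights and clauses of my own (real systems use MaxWalkSAT or branch-and-bound instead of this brute force):

```python
from itertools import product

# MAP world = argmax_x sum of weights of clauses satisfied in x.
# Toy weighted clauses over three ground atoms (a, b, c):
clauses = [
    (4.0, lambda a, b, c: (not a) or b),   # a => b, weight 4.0
    (2.5, lambda a, b, c: (not b) or c),   # b => c, weight 2.5
    (1.0, lambda a, b, c: a),              # soft preference for a, weight 1.0
]

def score(world):
    return sum(w for w, f in clauses if f(*world))

map_world = max(product([False, True], repeat=3), key=score)
print(map_world)  # (True, True, True): the only world satisfying all clauses
```

Enumeration costs 2^|X| evaluations, which is exactly why approximate MAP solvers are needed at realistic scales.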

Page 32:

DEC-MLN (Putting it all together)

Page 34:

EC-LP: The Baseline

• To evaluate their EC dialect, the authors compare against a Prolog-based EC dialect described in [2] (EC-LP)
• The axioms (holdsAt) and domain-dependent predicates initiatedAt, terminatedAt in that dialect have the form already discussed
• As a Prolog-based implementation, that dialect employs the Closed World Assumption (CWA)

CWA???

Page 35:

Open vs. Closed World Assumption

• MLNs employ First-Order Logic (FOL), which is more general than Logic Programming (LP) in that it makes the Open World Assumption (OWA) instead of the stricter Closed World Assumption (CWA)

  DEC-MLN ⇒ FOL ⇒ OWA
  EC-LP ⇒ LP ⇒ CWA

Page 38:

OWA vs CWA (contd.)

• Under the CWA, we only believe what is known to us; everything else is taken to be false. Under the OWA, what is not known is merely unknown.
• Example:

  Knowledge base:      in(Room_4424, Jason)
  Query:               ?- in(Room_4424, Paul)
  Response under CWA:  NO
  Response under OWA:  UNKNOWN
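The two query semantics in the example above can be contrasted in a few lines. A sketch (the atom strings are just the slide's example, encoded naively):

```python
# A tiny knowledge base of ground facts, as in the slide's example.
kb = {"in(Room_4424, Jason)"}

def query_cwa(atom):
    # Closed world: anything not provable from the KB is false.
    return atom in kb

def query_owa(atom):
    # Open world: absence of proof only means "unknown" (here: None).
    return True if atom in kb else None

print(query_cwa("in(Room_4424, Paul)"))  # False  ("NO")
print(query_owa("in(Room_4424, Paul)"))  # None   ("UNKNOWN")
```

The three-valued OWA answer is what breaks naive inertia in FOL: a fluent's termination being unproven no longer forces it to be false.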

Page 39:

OWA and EC

• We discussed the law of inertia in EC earlier
• In EC-LP, the law is straightforward, because Prolog employs the CWA
  • If the initiatedAt and terminatedAt pre-conditions are not satisfied by the evidence, then there is no fluent initiation / termination
• However, DEC-MLN employs FOL (not LP), which follows the OWA!
  • Fluents may be initiated / terminated by irrelevant events, causing the loss of inertia
• The solution: circumscription via predicate completion
  • Intuitively: introduce equivalences (⇔) wherever we have implications (⇐)
  • This “injects” the CWA into FOL, by ensuring that pre-conditions have a 1-1 relationship with conclusions

Page 40:

Predicate completion

Σ =
  initiatedAt(meet(P1, P2) = true, T) ←
      happens(active(P1), T), ¬happens(running(P2), T), close(P1, P2, T)

  initiatedAt(meet(P1, P2) = true, T) ←
      happens(inactive(P1), T), ¬happens(running(P2), T),
      ¬happens(active(P2), T), close(P1, P2, T)

Σ′ =
  initiatedAt(meet(P1, P2) = true, T) →
      (entire body of first rule above) ∨ (entire body of second rule above)

• By splitting every equivalence (⇔) into the sets Σ and Σ′, we can add weights to rules in either set and control how (with what probability) an HLE is recognized
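The effect of the completed definition can be sketched concretely: the head holds iff some rule body holds, so irrelevant evidence cannot fire it. A toy encoding (my own, with event atoms as strings; real DEC-MLN works on weighted ground clauses, not Python booleans):

```python
# The two initiation bodies for initiatedAt(meet(P1,P2)=true, T),
# each checked against the set of events observed at one time point.
bodies = [
    lambda ev: "active(P1)" in ev and "running(P2)" not in ev
               and "close(P1,P2)" in ev,
    lambda ev: "inactive(P1)" in ev and "running(P2)" not in ev
               and "active(P2)" not in ev and "close(P1,P2)" in ev,
]

def initiated(ev):
    # Completed definition: head <=> body1 OR body2.
    return any(body(ev) for body in bodies)

# Relevant evidence fires the head; irrelevant evidence cannot,
# which is the CWA-like behavior that completion injects into FOL:
print(initiated({"active(P1)", "close(P1,P2)"}))   # True
print(initiated({"walking(P1)", "close(P1,P2)"}))  # False
```

Without the "only if" direction (Σ′), the second query could be true in some models, and inertia would leak.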

Page 41:

Different cases

• Case 1: Σ and Σ′ both have “infinite” weights
  • HLEs are initiated and terminated with absolute certainty, and inertia is retained

[plot: HLE probability jumps to 1 at each initiation time and back to 0 at each termination time, staying constant in between]

Page 42:

Different cases

• Case 2: Σ has non-infinite weights, Σ′ has infinite weights
  • Given the preconditions for initiation and termination, the result (head) may or may not hold!
  • Hence HLEs are initiated and terminated with a degree of uncertainty, but inertia is retained

[plot: HLE probability steps to an intermediate level (e.g. 0.6) at initiation and down to a lower level (e.g. 0.4) at termination, staying constant in between]

Page 43:

Different cases

• Case 3: rules in Σ have infinite weights, Σ′ contains “soft” rules
  • HLEs are initiated / terminated with certainty, but inertia is gradually lost
  • Because of the implication in Σ′, initiation / termination conditions for the HLE might fire in the presence of evidence that is irrelevant w.r.t. the rules

[plot: HLE probability jumps to 1 at initiation, then decays gradually over time instead of persisting]

Page 44:

Different cases

• Case 4: both Σ and Σ′ are “soft” sets
  • A combination of cases (2) and (3) occurs

[plot: HLE probability steps to an intermediate level at initiation and decays gradually afterwards]

Page 45:

Experimental results

• Input:
  • All 26419 frames of CAVIAR
  • LLE annotations in the form of ground happens facts
  • People’s coordinates, used to compute close-ness of people
  • People’s poses, encoded in ground orientation predicates
  • The frames at which a person enters or exits the scene, in the form of corresponding ground predicates enter and exit
• Output:
  • A sequence of ground holdsAt(F = V, T) predicates, indicating that F = V at time T
  • A detection probability of 0.5 or above signifies a positive
• Comparison to the HLE annotation makes evaluation possible
• Evaluation: precision, recall, F-measure
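The evaluation metrics follow from the per-frame TP / FP / FN counts; checking them against the paper's EC-LP row (TP = 3099, FP = 2258, FN = 525) reproduces the reported precision and recall:

```python
# Precision, recall, and F-measure from raw counts.
def prf(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

p, r, f = prf(3099, 2258, 525)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.578 0.855 0.69
```

(The F-measure is not listed in the slide's table; the 0.69 value here is simply the harmonic mean of the two reported numbers.)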

Page 48:

Experimental results (contd.)

• Compared against EC-LP for the HLE “meeting” over all 28 videos:
  • DEC-MLNa: only rules in Σ are soft-constrained
  • DEC-MLNb: both Σ and Σ′ are soft-constrained

  Method     TP    FP    FN   Precision  Recall
  EC-LP      3099  2258  525  0.578      0.855
  DEC-MLNa   3048  1762  576  0.633      0.841
  DEC-MLNb   3048  1154  576  0.725      0.841

• Adding a small weight to the rarely fired 2nd initiation rule for meet reduces the number of FPs (see paper for details)

Page 49:

Experimental results (contd.)

• What does the improvement imply?
• Allowing the inertia to decrease enables the meet HLE’s probability to decrease faster in cases of co-occurrence with move (see paper for details)

Page 50:

Experimental results - Conclusions

• Rules are imperfect
  • According to the human annotation, meeting may occur in cases that are not captured by our rules
  • In this system, the rules were written manually, i.e. there was no structure learning
  • But even with learning, one cannot achieve a perfect F-measure, because that would require extreme overfitting
  • Therefore, rules will always be imperfect!
• DEC-MLN allows for a relaxation of how much we trust our event-modeling rules

Page 51:

Pros

• First probabilistic dialect of the EC
• Good theoretical insights into how the Event Calculus can be translated into MLNs; implications for inference
• Lift over the EC-LP baseline
  • Interesting explanation of how imperfect rules can be regularized to compensate for their own inadequacies
• Per-frame approach
  • Applicable online

Page 52:

Cons

• Hard to motivate to Vision people
  • Rules might appear simplistic
  • The notion of rule uncertainty is prevalent in SRL, but not all activity recognition in Vision is rule-based
• Lackluster experimental evaluation
  • Compare to the 3rd paper, which experiments on the same dataset and against the same baseline
• Per-frame approach
  • Errors accumulate

Page 54:

Input stream uncertainty

What about uncertainty in the input stream? Can you handle that with DEC-MLN?

• Yes!
  • Assigning weights to Σ only allows emulation of the approach of [4], which deals with input stream uncertainty through ProbLog
  • Also see [5] for an MLN-based approach that incorporates input stream uncertainty by using “observational variables” (dummy rules)

Page 55:

Discussion

• Logic-based expressiveness
  • Integration with state-of-the-art probabilistic modeling systems
    • Alchemy
    • YAP Prolog / ProbLog
  • Is it important enough?
    • I think so, but LP has fallen out of favor, probably because of the semantic web’s inadequacies…
• Important directions: weight / structure learning from data
  • Part of current work at NCSR
  • Structure learning is challenging because of the OWA / CWA semantics
• What can be treated “inertially” in Vision?
  • Identity maintenance?
  • Activity recognition in the presence of occlusions?

Page 57:

References

1. R. Kowalski and M. Sergot: A Logic-based Calculus of Events. New Generation Computing 4: 67–95, 1986.
2. A. Artikis, M. Sergot and G. Paliouras: A Logic Programming Approach to Activity Recognition. ACM International Workshop on Events in Multimedia, 2010.
3. V. Shet, J. Neumann, V. Ramesh and L. S. Davis: Bilattice-based Logical Reasoning for Human Detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
4. A. Skarlatidis, A. Artikis, J. Filippou and G. Paliouras: A Probabilistic Logic Programming Event Calculus. Theory and Practice of Logic Programming, Special Issue on Probability, Logic & Learning, 2013.
5. S. Tran and L. S. Davis: Event Modeling and Recognition using Markov Logic Networks. European Conference on Computer Vision (ECCV), 2008.
6. M. Richardson and P. Domingos: Markov Logic Networks. Machine Learning, vol. 62, pp. 107–136, 2006.

