Page 1: An#old#Ar(ficial#Intelligence#dream#that comes#true:# …bernardi/Slides/lavi_tutorial... · 2018-09-10 · AI Knowledge Representaon# Planning# Machine# Learning# Natural# Language#

An old Artificial Intelligence dream that comes true:

Merging language and vision modalities

Raffaella Bernardi, University of Trento

Page 2:

An old AI dream

A. Turing, Computing machinery and intelligence, Mind 59, pp. 433-460, 1950

Requires:
• Natural Language Processing (NLP)
• Knowledge Representation
• Reasoning
• …

Page 3:

AI
• Knowledge Representation
• Planning
• Machine Learning
• Natural Language Processing
• Computer Vision
• Reasoning
• Robotics
• Social Intelligence
• Creativity

Page 4:

Natural Language Processing (NLP):
• Part-of-Speech (PoS) Tagging
• Syntax
• Semantics
• Discourse
• Dialogue

Page 5:

Distributional Semantics: the meaning of a word is given by its context

Page 6:

Distributional Semantics: counting word distributions

Words are represented by vectors harvested from a corpus of texts by counting word co-occurrences.
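A minimal sketch of count-based vectors in Python, assuming a toy five-sentence corpus, whitespace tokenization and a symmetric window of two words (all illustrative choices, not taken from the slides). Each word's vector is the table of counts of its neighbours, and similarity is the cosine between two such tables:

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """Count how often each word appears within `window` tokens of every other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "a car needs fuel",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 2))  # 0.92
print(cosine(vecs["cat"], vecs["car"]))            # 0.0
```

"cat" and "dog" share contexts (the, drinks, chases) and come out close, while "car" shares none with "cat". Practical count models usually reweight raw counts (e.g. with PMI) and reduce dimensionality; this sketch keeps raw counts for clarity.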

Page 7:

Distributional Semantics: predict the context

The vector representing a word is obtained by learning to predict its nearby words (Mikolov et al., 2013).
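The prediction-based alternative can be sketched with skip-gram and negative sampling, the objective popularized by word2vec. This NumPy version is a simplified illustration under stated assumptions (a one-sentence corpus, uniformly drawn noise words, tiny dimensions, plain SGD), not the reference word2vec implementation:

```python
import numpy as np

def skipgram_pairs(token_ids, window=2):
    """(center, context) pairs: each word is trained to predict the words around it."""
    pairs = []
    for i, center in enumerate(token_ids):
        for j in range(max(0, i - window), min(len(token_ids), i + window + 1)):
            if j != i:
                pairs.append((center, token_ids[j]))
    return pairs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_sgns(pairs, vocab_size, dim=10, negatives=3, lr=0.1, epochs=100, seed=0):
    """Skip-gram with negative sampling: raise sigma(w . c) for observed
    (center, context) pairs and lower it for randomly drawn noise contexts."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(vocab_size, dim))  # center-word vectors (the embeddings)
    C = rng.normal(scale=0.1, size=(vocab_size, dim))  # context vectors
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            # Simplification: noise words are drawn uniformly (word2vec uses unigram^0.75).
            noise = rng.integers(0, vocab_size, size=negatives)
            targets = np.concatenate(([context], noise))
            labels = np.zeros(len(targets))
            labels[0] = 1.0
            w = W[center].copy()
            probs = sigmoid(C[targets] @ w)
            total -= np.log(probs[0] + 1e-9) + np.sum(np.log(1.0 - probs[1:] + 1e-9))
            grad = probs - labels                               # dLoss/dScore per target
            W[center] -= lr * (grad @ C[targets])
            np.subtract.at(C, targets, lr * np.outer(grad, w))  # handles repeated indices
        losses.append(total / len(pairs))
    return W, losses

tokens = "the cat chases the mouse and the dog chases the cat".split()
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
pairs = skipgram_pairs([vocab[t] for t in tokens])
embeddings, losses = train_sgns(pairs, len(vocab))
print(f"average loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The rows of `embeddings` play the role of the word vectors; the separate context matrix `C` is typically discarded after training.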

Page 8:

Semantic Relationships, Mikolov et al. NIPS 2013
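The semantic relationships in Mikolov et al. are commonly probed with vector offsets (king - man + woman ≈ queen). The sketch below uses hand-made 4-dimensional vectors whose values are invented for illustration, not trained embeddings:

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' with the offset b - a + c, returning the
    nearest remaining word by cosine similarity (a, b and c are excluded)."""
    target = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Hypothetical dimensions: [royalty, male, female, fruit].
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.9, 0.0]),
    "apple": np.array([0.0, 0.1, 0.1, 0.9]),
}
print(analogy("man", "king", "woman", vectors))  # queen
```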

Page 9:

Pause: Neural Network

It is a composition of functions (neurons) that maps an n-dimensional input vector to class scores.

Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The last (fully connected) layer feeds a loss function (e.g., the softmax loss).
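A minimal NumPy sketch of this description, assuming a two-layer network with ReLU as the non-linearity and softmax cross-entropy as the loss; the layer sizes and random weights are illustrative only:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(scores):
    shifted = scores - scores.max()   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def forward(x, layers):
    """Compose layers: each neuron takes a dot product of its inputs (plus a bias),
    followed by a ReLU on every layer except the last, which outputs class scores."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = W @ h + b
        if i < len(layers) - 1:
            h = relu(h)
    return h

def cross_entropy(scores, label):
    """Softmax loss computed on the final layer's class scores."""
    return -np.log(softmax(scores)[label])

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),   # 4-d input -> 8 hidden units
          (rng.normal(size=(3, 8)), np.zeros(3))]   # 8 hidden units -> 3 class scores
x = np.array([0.5, -1.0, 2.0, 0.1])
scores = forward(x, layers)
print(softmax(scores), cross_entropy(scores, label=2))
```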

Page 10:

Pause: Recurrent NN

Traditional (feed-forward) neural networks cannot use information about previous inputs to inform later ones.
• Recurrent neural networks (RNNs) address this issue: they are networks with loops in them, allowing information to persist. They work well with short-range dependencies.
• Long Short-Term Memory networks (LSTMs) are a special kind of RNN, capable of learning long-term dependencies.
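To make "loops that let information persist" concrete, here is a minimal Elman-style RNN in NumPy (random weights and toy sizes, chosen for illustration). Two sequences that differ only in their first input end in different hidden states, which a feed-forward network looking only at the last input could not distinguish:

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    The recurrent weights W_hh carry information from earlier steps forward."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

rng = np.random.default_rng(1)
W_xh, W_hh, b_h = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)) * 0.5, np.zeros(5)
seq_a = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]
seq_b = [np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]  # same last input
h_a = rnn_forward(seq_a, W_xh, W_hh, b_h)[-1]
h_b = rnn_forward(seq_b, W_xh, W_hh, b_h)[-1]
print(np.allclose(h_a, h_b))  # False: the final state still reflects the first input
```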

Page 11:

LSTM: Sentence representation

Starting from word2vec word representations, or from the plain words, obtain the sentence representation via an LSTM.
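A minimal sketch of an LSTM sentence encoder in NumPy, assuming random vectors as stand-ins for word2vec embeddings, untrained random weights, and the common choice of taking the last hidden state as the sentence representation (the gate ordering inside `z` is an arbitrary convention of this sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(word_vectors, W, U, b):
    """Run a single-layer LSTM over a sentence and return the last hidden state.
    Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in word_vectors:
        z = W @ x + U @ h + b
        i = sigmoid(z[0:H])            # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        o = sigmoid(z[2 * H:3 * H])    # output gate
        g = np.tanh(z[3 * H:])         # candidate cell content
        c = f * c + i * g              # cell state: carries long-term information
        h = o * np.tanh(c)             # hidden state exposed at this step
    return h

rng = np.random.default_rng(0)
D, H = 6, 4
words = "a cat sits on the mat".split()
embeddings = {w: rng.normal(size=D) for w in words}   # stand-ins for word2vec vectors
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
sentence_vec = lstm_encode([embeddings[w] for w in words], W, U, b)
print(sentence_vec.shape)  # (4,)
```

The additive cell update `c = f * c + i * g` is what lets gradients and information survive over long sequences, unlike the purely multiplicative update of a plain RNN.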

Page 12:

Distributional Semantics: a success story…

Lexical meaning
• Synonyms
• Concept categorization (e.g. car ISA vehicle)
• Selectional preferences (e.g. eat chocolate vs. *eat sympathy)
• Relation classification (exam-anxiety: CAUSE-EFFECT relation)
• Salient properties (car-wheels)

Compositionality: phrases and sentences
• Similarity
• Entailment

Page 13:

Distributional Semantics: … but the Grounding Problem

Grounding language representations in the world: pointing to the referent of our mental representations.

Page 14:

Computer Vision: from pixels to meaning

Page 15:

Computer Vision: abstract features

Page 16:

Traditional CV tasks: objects
• Image classification
• Object localization
• From objects to scene classification

Page 17:

CV's first important revolution: ImageNet

ImageNet:
• Stanford Vision Lab, Stanford University & Princeton University
• Image database organized according to the WordNet hierarchy
• Challenges: 2007-present
• AMT: 48,940 annotators from 167 countries
• 15M images
• 22K object categories

Page 18:

CV's second important revolution: Convolutional Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, 2012
• 2012: Krizhevsky et al. outperformed the other systems using a CNN
• 2013: half of the systems used CNNs
• 2014: all of the systems used CNNs

Page 19:

CNN: hierarchy of features
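The hierarchy is built by stacking convolution, non-linearity and pooling stages. A minimal NumPy sketch of one such stage, with a hand-picked vertical-edge kernel standing in for a learned first-layer filter (real CNNs learn their kernels and stack many stages, so later layers respond to combinations of earlier features):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)   # dark left half, bright right half
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])           # responds to dark-to-bright edges
features = max_pool(np.maximum(conv2d(image, kernel), 0.0))  # conv -> ReLU -> pool
print(features)  # the edge response (2.0) survives pooling in the left column
```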

Page 20:

CNN: off-the-shelf vector representation

• Train a CNN on a vision task (e.g. AlexNet on ImageNet)
• Do a forward pass given an input image
• Transfer one or more layers (e.g. FC7 or C5)
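The three steps can be sketched in NumPy. To stay self-contained, the "pretrained" fully connected layers (named fc6 and fc7 by analogy with AlexNet) are random stand-ins and the inputs are random vectors playing the role of C5 features; both are assumptions for illustration only:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def extract_features(x, layers, stop_at):
    """Forward pass through named fully connected layers that returns the
    activations of layer `stop_at` instead of the final class scores."""
    h = x
    for name, W, b in layers:
        h = relu(W @ h + b)
        if name == stop_at:
            return h
    raise ValueError(f"no layer named {stop_at!r}")

# Hypothetical stand-in for a pretrained network: random weights, tiny sizes.
rng = np.random.default_rng(0)
pretrained = [("fc6", rng.normal(size=(16, 32)), np.zeros(16)),
              ("fc7", rng.normal(size=(8, 16)), np.zeros(8))]
image_a, image_b = rng.normal(size=32), rng.normal(size=32)  # stand-ins for C5 features
va, vb = [extract_features(im, pretrained, "fc7") for im in (image_a, image_b)]
print(va.shape)  # (8,): a fixed-length image vector ready for other tasks
```

With a real pretrained model (for example torchvision's AlexNet, whose fully connected layers live in `model.classifier`), the same effect is usually obtained by truncating the classifier or registering a forward hook on the chosen layer.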

Page 21:

Language and Vision

Language and visual spaces can be combined!

Cognitive angle: language and vision representations must be combined!

Applied angle: combining language and vision representations gives very useful

Page 22:

Language and Vision

• Multimodal tasks:
  – Exploit language to improve on traditional CV tasks
  – Exploit vision to improve on traditional NLP tasks
  – New multimodal tasks
• Multimodal representations:
  – learned separately and translated one into the other
  – learned separately and concatenated
  – learned jointly

Page 23:

Multimodal Tasks: Improve traditional CV tasks

Not a lemon: more probably a tennis ball. The information comes from a KB (a word-similarity list extracted from the Internet via Google Sets). A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, S. Belongie, Objects in Context (ICCV 2007).

Use of Corpora for Action Recognition. Thu Le Dieu, Jasper Uijlings and R. Bernardi (2010, 2011)

Page 24:

Multimodal Tasks: Improve traditional NLP tasks

E. Bruni, G.B. Tran and M. Baroni (GEMS 2011, ACL 2012, JAIR 2014); E. Bruni, G. Boleda, M. Baroni and N. Tran (ACL 2012)

Page 25:

Multimodal Vector Spaces

Kiros et al. 2014
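Joint image-sentence spaces such as the one in Kiros et al. (2014) are typically trained with a margin-based ranking objective: a matching image and sentence must be closer (by cosine) than mismatched pairs by at least a margin. A minimal NumPy sketch of that objective, assuming pre-computed embeddings and an illustrative margin of 0.2:

```python
import numpy as np

def normalize(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def ranking_loss(img_emb, txt_emb, margin=0.2):
    """Bidirectional hinge ranking loss for a batch where row i of each matrix is
    a matching image/sentence pair and all other rows act as contrastive examples."""
    S = normalize(img_emb) @ normalize(txt_emb).T            # S[i, j] = cos(image i, sentence j)
    pos = np.diag(S)
    cost_txt = np.maximum(0.0, margin - pos[:, None] + S)   # image i vs wrong sentence j
    cost_img = np.maximum(0.0, margin - pos[None, :] + S)   # sentence j vs wrong image i
    np.fill_diagonal(cost_txt, 0.0)                          # matching pairs cost nothing
    np.fill_diagonal(cost_img, 0.0)
    return cost_txt.sum() + cost_img.sum()

aligned = ranking_loss(np.eye(3), np.eye(3))                 # every pair already well separated
shuffled = ranking_loss(np.eye(3), np.eye(3)[[1, 2, 0]])     # every sentence matched to the wrong image
print(aligned, shuffled)
```

Training would backpropagate this loss into the image and sentence encoders so that both modalities end up in one vector space.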

Page 26:

New Multimodal Tasks: Cross-Modal Mapping

Lazaridou, Bruni and Baroni, ACL 2014
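Cross-modal mapping can be illustrated with a linear map from the visual to the linguistic space, fit on seen concepts and then used to label images of unseen ones by nearest-neighbour search. This is a minimal sketch on synthetic data; the ridge-regression mapping, the dimensions and the word list are assumptions, not the exact setup of Lazaridou, Bruni and Baroni:

```python
import numpy as np

def fit_linear_map(X_visual, Y_text, ridge=1e-3):
    """Closed-form ridge regression: M = argmin ||X M - Y||^2 + ridge * ||M||^2."""
    d = X_visual.shape[1]
    return np.linalg.solve(X_visual.T @ X_visual + ridge * np.eye(d), X_visual.T @ Y_text)

def label_image(image_vec, M, word_vectors):
    """Map an image into the linguistic space and return the nearest word (cosine)."""
    projected = image_vec @ M
    words = list(word_vectors)
    V = np.array([word_vectors[w] for w in words])
    sims = V @ projected / (np.linalg.norm(V, axis=1) * np.linalg.norm(projected))
    return words[int(np.argmax(sims))]

# Synthetic setup: visual (8-d) and text (5-d) spaces related by an unknown linear map.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(8, 5))
X_train = rng.normal(size=(50, 8))                                # images of seen concepts
Y_train = X_train @ true_map + 0.01 * rng.normal(size=(50, 5))
M = fit_linear_map(X_train, Y_train)

new_images = rng.normal(size=(3, 8))                              # images of unseen concepts
word_vectors = dict(zip(["dog", "cat", "car"], new_images @ true_map))
print(label_image(new_images[0], M, word_vectors))                # dog
```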

Page 27:

New Multimodal Tasks: Image Captioning (IC)

• Datasets: Flickr, Pascal, MS-COCO (164K images, 5 captions each)
• Survey: Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures, Bernardi et al. JAIR 2016
• Very good talk: Karpathy (2015)

Limitations:
• Evaluation measures (BLEU, ROUGE, etc.) are not precise
• No reasoning
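The imprecision of n-gram metrics is easy to demonstrate. Below is a simplified sentence-level BLEU (clipped n-gram precision plus brevity penalty) in Python; it stops at bigrams because the example captions are short, whereas standard BLEU uses up to 4-grams and is normally computed over a whole corpus:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as many
    times as it occurs in the single reference that contains it most often."""
    cand_counts = ngrams(candidate, n)
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / max(1, sum(cand_counts.values()))

def bleu(candidate, references, max_n=2):
    """Geometric mean of the 1..max_n clipped precisions, times a brevity penalty
    for candidates that are not longer than the closest reference."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    r = min(references, key=lambda ref: (abs(len(ref) - c), len(ref)))  # closest reference
    brevity = 1.0 if c > len(r) else math.exp(1.0 - len(r) / c)
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "a cat is on the mat".split()
wrong_object = "a dog is on the mat".split()
print(round(bleu(wrong_object, [reference]), 2))  # 0.71, although the object is wrong
```

The wrong caption still scores about 0.71 because five of its six words and three of its five bigrams match the reference: the metric rewards surface overlap, not correctness.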

Page 28:

New Multimodal Tasks: Visual Question Answering (VQA)

Limitations:
• Language prior problem: blind (question-only) models perform pretty well (50% accuracy on COCO-VQA!). → But see the development of new real-image datasets: VQA2, TDIUC

Datasets: DAQUAR 2014, COCO-QA, VQA, Visual7W, Visual Genome, VizWiz
Survey: Visual Question Answering: A Survey of Methods and Datasets, Wu et al. (2016)
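The language prior problem can be illustrated with a "blind" baseline that answers from the question alone. The training pairs below are invented for illustration, and keying answers on the first three words of the question is one simple choice among many:

```python
from collections import Counter, defaultdict

def train_blind_baseline(qa_pairs, prefix_len=3):
    """A 'blind' VQA model: it never sees the image and simply memorises the most
    frequent training answer for each question prefix (e.g. 'what color is')."""
    by_prefix = defaultdict(Counter)
    for question, answer in qa_pairs:
        by_prefix[tuple(question.lower().split()[:prefix_len])][answer] += 1
    fallback = Counter(answer for _, answer in qa_pairs).most_common(1)[0][0]
    table = {prefix: counts.most_common(1)[0][0] for prefix, counts in by_prefix.items()}

    def predict(question):
        return table.get(tuple(question.lower().split()[:prefix_len]), fallback)
    return predict

# Hypothetical training data.
train = [
    ("is there a dog in the picture", "yes"),
    ("is there a cat on the sofa", "yes"),
    ("is there a person", "no"),
    ("what color is the bus", "white"),
    ("what color is the car", "white"),
    ("how many people are there", "2"),
]
answer = train_blind_baseline(train)
print(answer("is there a horse"), answer("what color is the kite"))  # yes white
```

Whenever answer distributions are this skewed per question type, such a model scores well without looking at a single pixel, which is what balanced datasets like VQA2 try to prevent.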

Page 29:

New Multimodal Tasks

Image-Text Alignment
Datasets: Faces in the Wild, Flickr30k Entities, VRD, Visual Genome
Duygulu et al. 2002, Barnard et al. 2003, Berg et al. 2004, Plummer et al. 2015, Karpathy and Fei-Fei 2015, Zhu et al. 2015, Krishna et al. 2016, Lu et al. 2016

Referring Expressions
Datasets: D-TUNA Corpus, Referit Game Dataset, Referit Game MS-COCO
Mitchell et al. 2013, Fitzgerald et al. 2013, Kazemzadeh et al. 2014, Mao et al. 2015, Yu et al. 2016, Hu et al. 2016, Yu et al. 2017, Nagaraja et al. 2016, Fang et al. 2015

Credits: Vicente Ordóñez-Román

Page 30:

New Multimodal Tasks: Diagnostic Datasets: FOIL

Shekhar et al. ACL 2017: https://foilunitn.github.io/

Page 31:

New Multimodal Tasks: Diagnostic Dataset: CLEVR

Johnson et al. CVPR 2017: https://cs.stanford.edu/people/jcjohns/clevr/

Page 32:

New Multimodal Tasks: Diagnostic Datasets: NLVR

Suhr et al. ACL 2017: https://github.com/clic-lab/nlvr

Page 33:

Other, more recent, new multimodal tasks:
• Spoken VQA
• Multimodal Machine Translation
• Image Generation
• Visual Dialogue
• Visual Storytelling (Huang et al. 2016)
• Question Generation (Mostafazadeh et al. 2016, Jain et al. 2017)
• Explanation (Park et al. 2018), Counterfactual (Hendricks et al. 2018), Inferences (Iyyer et al. 2017), Entailment (Vu et al. 2018)
• Emotion recognition (You et al. 2016)
• Learning to quantify (vague quantifiers, exact numbers): Pezzelle et al. 2016, 2017, 2018
• …

Page 34:

Visual Dialogue: GuessWhat?! game

• Collected by de Vries et al. 2017 via AMT
• Two participants see an image (from MS-COCO)
• 155K dialogues about 66K different images
• Average number of QA pairs per game: 5.2
• 84.6% of the games are completed successfully

See also: Visual Dialog https://visualdialog.org

Page 35:

Multimodal Representations

• Multimodal Distributional Semantics, Bruni, Tran and Baroni (2014)
• Combining Language and Vision with a Multimodal Skipgram Model, Lazaridou, Pham and Baroni (2015)
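One of the simplest ways to combine separately learned vectors is weighted concatenation after normalization. The sketch below assumes hypothetical text and visual vectors for two concepts and an illustrative weight `alpha`; the cited papers go beyond this basic step:

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

def concat_fusion(text_vec, visual_vec, alpha=0.5):
    """L2-normalise each modality so neither dominates by scale, weight the text
    part by alpha and the visual part by 1 - alpha, then stack them."""
    return np.concatenate([alpha * l2_normalize(text_vec),
                           (1.0 - alpha) * l2_normalize(visual_vec)])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical text (corpus-based, 3-d) and visual (CNN-based, 4-d) vectors.
moon_text, moon_vis = np.array([0.9, 0.1, 0.3]), np.array([0.2, 0.8, 0.1, 0.5])
ball_text, ball_vis = np.array([0.1, 0.9, 0.2]), np.array([0.3, 0.7, 0.2, 0.4])
moon = concat_fusion(moon_text, moon_vis)
ball = concat_fusion(ball_text, ball_vis)
print(cosine(moon_text, ball_text), cosine(moon, ball))
```

With `alpha = 0.5` the multimodal cosine is exactly the average of the text and visual cosines, so a high visual similarity can lift a low text-based one.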

Page 36:

Basic Multimodal Models: Point-wise multiplication
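A minimal sketch of point-wise (element-wise) multiplicative fusion for VQA-style answer scoring, assuming a question vector (e.g. from an LSTM), an image vector (e.g. CNN features), a shared size of 4 and 3 candidate answers; all weights are random placeholders rather than trained parameters:

```python
import numpy as np

def pointwise_fusion(question_vec, image_vec, W_q, W_i, W_out):
    """Map both modalities into a shared space, fuse them with an element-wise
    product, and turn the fused vector into a distribution over answers."""
    q = np.tanh(W_q @ question_vec)                              # question embedding
    v = np.tanh(W_i @ (image_vec / np.linalg.norm(image_vec)))   # normalised image features
    fused = q * v                                                # point-wise multiplication
    scores = W_out @ fused
    exp = np.exp(scores - scores.max())                          # stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
question_vec, image_vec = rng.normal(size=6), rng.normal(size=10)
W_q, W_i = rng.normal(size=(4, 6)), rng.normal(size=(4, 10))
W_out = rng.normal(size=(3, 4))   # 3 candidate answers
probs = pointwise_fusion(question_vec, image_vec, W_q, W_i, W_out)
print(probs)
```

Compared with concatenation, the product lets each question dimension gate the corresponding image dimension: if either factor is near zero, so is the fused feature.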

Page 37:

What has the community gained?
• Attention Networks
• Hierarchical Co-attention
• Bottom-up Top-down attention
• Compositionality
• Multi-modal Pooling

Bottom-Up and Top-Down Attention, Anderson et al., CVPR 18

Multimodal Compact Bilinear Pooling, Fukui et al., EMNLP 16

Neural Module Networks, Andreas et al., CVPR 16

Hierarchical Question-Image Co-Attention, Lu et al., NIPS 16

Stacked Attention Networks, Yang et al., CVPR 16

Credits: Aishwarya Agrawal
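The attention models listed above share one core step: the question decides which image regions to weight most. A minimal sketch of a single soft-attention hop (tanh scoring followed by softmax, in the spirit of stacked attention; stacking would repeat it with a query refined by the attended vector). The parameter shapes and random initialization are illustrative assumptions, not any one paper's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(regions, question, W_r, W_q, w_a):
    """One question-guided attention hop over K image regions.
    regions: (K, D) region features; question: (Q,) question vector;
    W_r: (A, D), W_q: (A, Q), w_a: (A,) are learned parameters."""
    hidden = np.tanh(regions @ W_r.T + W_q @ question)  # (K, A): region-question interaction
    weights = softmax(hidden @ w_a)                     # (K,): where to look
    return weights @ regions, weights                   # (D,): attended image vector

rng = np.random.default_rng(0)
K, D, Q, A = 5, 6, 4, 3
regions = rng.normal(size=(K, D))     # e.g. grid cells or detected object regions
question = rng.normal(size=Q)
W_r, W_q, w_a = rng.normal(size=(A, D)), rng.normal(size=(A, Q)), rng.normal(size=A)
attended, weights = attend(regions, question, W_r, W_q, w_a)
print(weights.round(3))
```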

Page 38:

Cutting-edge fancy models: Learning Paradigms

• Adversarial learning
• Reinforcement learning
• Cooperative learning
• …

Page 39:

Surveys

• ACL 2017: https://www.cs.cmu.edu/~morency/MMML-Tutorial-ACL2017.pdf
• COLING 2018: https://arxiv.org/abs/1806.06371

Page 40:

Some research groups
• Stanford Vision Lab, Li Fei-Fei: http://vision.stanford.edu/
• MIT: Antonio Torralba http://web.mit.edu/torralba/www/
• University of North Carolina: Tamara Berg http://www.tamaraberg.com/
• Virginia Tech: Devi Parikh https://filebox.ece.vt.edu/~parikh/CVL.html
• CLIC http://clic.cimec.unitn.it/lavi/
• Edinburgh University (M. Lapata, F. Keller)
• University of Sheffield: Lucia Specia http://staffwww.dcs.shef.ac.uk/people/L.Specia/
• Universitat Pompeu Fabra, COLT group, Gemma Boleda: http://gboleda.utcompling.com/
• Facebook FAIR
• Google DeepMind
• More on the iV&L Net COST Action: http://www.cost.eu/COST_Actions/ict/Actions/IC1307

Page 41:

LaVi @ UniTn

• Learning the meaning of Quantifiers from Language and Vision: https://quantit-clic.github.io/
• Visually Grounded Talking Agents (in collaboration with UvA): https://vista-unitn-uva.github.io/
• Grounded TE (in collaboration with Malta): https://github.com/claudiogreco/coling18-gte

Ongoing work:
• Be Different to Be Better (with SAP)
• Continual learning

Page 42:

UniTN LaVi People

Ionut (-> Barcelona), Sandro, Ravi, Aliia, Claudio, Alberto, Aurelie, me

Page 43:

References on tasks

• Jain, U., Zhang, Z., Schwing, A.G.: Creativity: Generating Diverse Questions using Variational Autoencoders. In: CVPR. (2017) 5415-5424

• Li, Y., Huang, C., Tang, X., Change Loy, C.: Learning to Disambiguate by Asking Discriminative Questions. In: Proceedings of the IEEE International Conference on Computer Vision. (2017) 3419-3428

• Vondrick, C., Oktay, D., Pirsiavash, H., Torralba, A.: Predicting motivations of actions by leveraging text. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2016) 2997-3005

• Park, D.H., Hendricks, L.A., Akata, Z., Rohrbach, A., Schiele, B., Darrell, T., Rohrbach, M.: Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. In: 31st IEEE Conference on Computer Vision and Pattern Recognition. (2018)

• Hendricks, L.A., Hu, R., Darrell, T., Akata, Z.: Generating Counterfactual Explanations with Natural Language. arXiv preprint arXiv:1806.09809 (2018)

• You, Q., Luo, J., Jin, H., Yang, J.: Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark. In: AAAI. (2016) 308-314

• Yu, L., Park, E., Berg, A.C., Berg, T.L.: Visual madlibs: Fill in the blank description generation and question answering. In: Proceedings of the IEEE International Conference on Computer Vision. (2015) 2461-2469

• Iyyer, M., Manjunatha, V., Guha, A., Vyas, Y., Boyd-Graber, J.L., Daume III, H., Davis, L.S.: The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. In: CVPR. (2017) 6478-6487

→ For a rather extensive overview, see Pezzelle et al. SiVL 2018

