TAC 2018 Streaming Multimedia KBP Pilot

Hoa Trang Dang

National Institute of Standards and Technology

Background

• NIST will evaluate performers in the DARPA AIDA Program (Active Interpretation of Disparate Alternatives)
• Some AIDA evaluations will be open evaluations in TAC and TRECVID.
• The goal of AIDA is to develop a semantic engine that automatically generates multiple alternative analytic interpretations of a situation, based on a variety of unstructured sources that may be noisy, conflicting, or deceptive.
• Documents can contain a mix of multilingual text, speech, image, video; including metadata.
• A document can be as small as a single tweet, or as large as a Web page containing a news article with text, pictures and video clips.

§ All data will be in streaming mode; systems can access the data only once in raw format, but may access a KB containing a structured semantic representation of all data seen to date

ACTIVE INTERPRETATION OF DISPARATE ALTERNATIVES (AIDA)

• Given a scenario (“Benghazi”), document stream, and several topics. For each topic:
  • TA1 outputs all Knowledge Elements (entities, relations, events, etc., defined in the ontology) in the documents, including alternative interpretations
  • TA2 fuses KEs from TA1 into the TA2 KB, maintaining alternative interpretations
  • TA3 constructs internally consistent hypotheses (partial KBs) from TA2 KB

[Figure: TA1, TA2, TA3 pipeline]

Scenario-Specific Ontology

• Scenarios will involve events such as international conflicts, natural disasters, violence at international events, or protests and demonstrations.
• AIDA will extend KBP ontology of entities, relations, events, belief and sentiment to include additional concepts that are needed to cover informational conflicts in each topic in the scenario
• Ideally, would have a single ontology for all topics in the scenario (?)

AIDA KB representation

• Knowledge Element (KE) is a structured representation of entities, relations, events, etc. -- likely an augmented triple like in Cold Start KB
• Triple is augmented with provenance and confidence
• Provenance is a set of justifications. Each justification has a justification-level confidence
• KE-level confidence is explicitly provided by TA1 and TA2, and is an aggregation of justification-level confidences

• KB contains conflicting KEs (as found in the raw documents)
• Representation -- not reconciliation -- of conflicts
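
The augmented triple described above can be pictured with a small data structure. This is only an illustrative sketch: the class names, field names, and the noisy-OR aggregation are assumptions made here, since the slides say only that KE-level confidence is an aggregation of justification-level confidences and do not name the function or the serialization format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Justification:
    """One piece of provenance: a pointer into a raw document plus a confidence."""
    doc_id: str
    span: str          # hypothetical encoding of a text span, audio span, image, or video shot
    confidence: float  # justification-level confidence in [0, 1]


@dataclass
class KnowledgeElement:
    """A triple augmented with provenance (a set of justifications)."""
    subject: str
    predicate: str
    obj: str
    provenance: List[Justification] = field(default_factory=list)

    def confidence(self) -> float:
        # ASSUMPTION: noisy-OR aggregation of justification-level confidences.
        # The slides do not specify which aggregation TA1/TA2 should use.
        remaining_doubt = 1.0
        for justification in self.provenance:
            remaining_doubt *= 1.0 - justification.confidence
        return 1.0 - remaining_doubt


# Two conflicting KEs about the same event are both kept:
# the KB represents the conflict rather than reconciling it.
ke_a = KnowledgeElement(
    "attack_01", "cause", "protest",
    [Justification("doc1", "text:10-42", 0.5), Justification("doc7", "video:shot3", 0.5)],
)
ke_b = KnowledgeElement(
    "attack_01", "cause", "planned_assault",
    [Justification("doc2", "text:0-31", 0.8)],
)
kb = [ke_a, ke_b]
```

Keeping the justifications on the KE, rather than only the aggregated number, leaves room for the per-justification assessment described later in the deck.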

What is allowed in KB representation?

• AIDA: “Although there may be need for some natural language, image thumbnails, featurized media, etc. in the KB for reference, registration, or matching purposes, it is expected that most of the assertions in the KB will be expressible in the structured representation, with elements derived from an ontology.”
• Features accessible to TA1/TA2 in KE cannot be document-level content features (?). Allowable features include
  • Number of supporting docs, and link to docs (but can’t read docs)
  • Time of first supporting doc, most recent supporting doc

• Comments/recommendations from participating teams are welcome regarding what features should be allowed in the KB
• For evaluation purposes, provenance accessible to LDC should be pointers into the raw documents denoting text spans, audio spans, images, or video shots

TAC/TRECVID 2018 tasks (pilot)

• Task 1: Extract all events, subevents or actions, entities, relations, locations, time, and sentiment from multimedia document stream, conditioned on zero or more different contexts, or hypotheses (TAC, TRECVID 2018)
  • Output is a set of all possible KEs, including confidence and provenance
  • Mention-level output, including within-document linking

• Task 2: Build KB by aggregating all KEs from TA1 and “user” (TAC 2018)
  • Output is KB including cross-doc linking
  • Evaluate by queries (with entry points) and assessment

• [Task 3: Create hypotheses from Task 2 KBs (AIDA program-internal in 2018)]

Training/Evaluation data

• One new scenario per evaluation cycle; 4 scenarios total over lifetime of AIDA program.
• 100K docs/scenario, including relevant and irrelevant documents
• 5-20% of docs will be relevant to the scenario
• 200 labeled docs per scenario

• 12-20 topics per scenario
• At least one foreign language per scenario, plus English
• AIDA: “Government will provide linguistic resources and tools of a quality and composition to be determined, but consisting at least of the type and size found in a LORELEI Related Language Pack (LRLP)”

Low Resource Language Packs

• 1Mw - 2Mw+ monotext from news, web text & social media
• 300Kw - 1.1Mw+ parallel text of variable quality (professional, crowd, found, comparable)
• Annotations for 25Kw - 75Kw/language including
  • Simple Named Entity (PER, ORG, GPE, LOC/FAC)
  • KB linking of names to GeoNames and CIA World FactBook
  • Situation Frames: needs/issues for an incident (e.g. Urgent shelter need in Kermanshah province)
  • Full Entity (name, nom, pro) and within-doc coref
  • Predicate-argument annotation of disaster-relevant Acts and States

• Grammatical resources ranging from full grammatical sketch to found resources (dictionaries, grammars, primers, gazetteers) to lexicons
• Basic NLP tools including word, sentence segmenters, encoding converters; name taggers

Related TRECVID Tasks

TRECVID (2001 – Present)

• Shot boundary detection: Identify the shot boundaries in the given video clip(s)
• High-level feature extraction / Semantic Indexing: Given a standard set of shot boundaries and a list of feature (concepts) definitions, return a ranked list of shots according to the highest possibility of detecting the presence of each feature

• Ad-hoc Video Search: Given a statement of information need, return a ranked list of shots which best satisfy the need; similar to semantic indexing, but with complex concepts (combination of concepts); e.g., find group of children playing frisbee in a park.

• Rushes Summarization: Given a video from the rushes test collection, automatically create an MPEG-1 summary clip less than or equal to a maximum duration that shows the main objects and events in the rushes video to be summarized

• Surveillance event detection: detect a set of predefined events and identify their occurrences temporally

• Content-based copy detection: given a test collection of videos and a set of (video, audio, video+audio) queries, determine for each query the place, if any, that some part of the query occurs, with possible transformations, in the test collection

TRECVID (2001 – Present)

• Known-item Search: Given a text-only description of the video desired and a test collection of video with associated metadata, automatically return a list of up to 100 video IDs ranked by probability to be the one sought
• Instance Search: Given a collection of test videos, a master shot reference, and a collection of queries that delimit a person, object, or place entity in some example video, locate for each query the 1000 shots most likely to contain a recognizable instance of the entity [AIDA TA2 cross-doc coref]
• Multimedia Event Detection: Given a collection of test videos and a list of test events, indicate whether each of the test events is present anywhere in each of the test videos and give the strength of evidence for each such judgment
• Localization: Given a video shot, determine the presence of a concept temporally within the shot, with respect to a subset of the frames comprised by the shot, and, spatially, for each such frame that contains the concept, to a bounding rectangle [AIDA provenance?]

Latest task introduced in 2016: Video-to-Text

• Given a set of 2000 URLs of Twitter (Vine) videos and sets of text descriptions (each composed of 2000 sentences), systems are asked to work and submit results for two subtasks:
  • Matching and Ranking: Return for each video URL a ranked list of the most likely text description that correspond (was annotated) to the video from each of the different text description sets.
  • Description Generation: Automatically generate for each video URL a text description (1 sentence) independently and without taking into consideration the existence of text description

• Systems and annotators were encouraged to describe videos using 4 facets:
  • Who is the video describing such as concrete objects and beings (kinds of persons, animals, things)
  • What are the objects and beings doing? (generic actions, conditions/state or events)
  • Where such as locale, site, place, geographic, architectural (kind of place, geographic or architectural)
  • When such as time of day, season, etc

Airplane, Anchorperson, Animal, Basketball, Beach, Bicycling, Boat_Ship, Boy, Bridges, Bus, Car_Racing, Chair, Cheering, Classroom, Computers, Dancing, Demonstration_Or_Protest, Greeting, Hand, Highway

Sitting_Down, Stadium, Swimming, Telephones, Throwing, Baby, Door_Opening, Fields, Flags, Forest, George_Bush, Hill, Lakes, Military_Airplane, Explosion_Fire, Female-Human-Face-Closeup, Flowers, Girl, Government-Leader, Instrumental_Musician

Oceans, Quadruped, Skating, Skier, Soldiers, Studio_With_Anchorperson, Traffic, Kitchen, Meeting, Motorcycle, News_Studio, Nighttime, Office, Old_People, People_Marching, Press_Conference, Reporters, Roadway_Junction, Running, Singing

Examples of concepts used in the TRECVID Semantic INdexing (SIN) task

Multimedia

• Each document can contain a mix of text, speech, image, video; including metadata.
• Multiple languages: English plus 1-2 foreign languages (TBA)
• LDC will provide language packs containing resources for each language

• All participants will be given the same documents
• Participants are allowed to process info in a proper subset of the languages or media types
• NIST may report breakdown evaluation results by language, media type, etc.

Streaming Extraction

• Documents arrive in batches as a chunk.
• ~100 documents/chunk (?), with cap on length of time covered in a chunk

• TA1 (and TA2?) system emits KE’s (triple + confidence + extras) after each chunk.
• At specified time points in the stream, the set of accumulated KE’s is evaluated.
• Ranked precision/recall derivatives.

• At some of those points, a wild hypothesis appears!
• A hypothesis = a set of proposed tuples.
• TA1 system outputs KE’s primed by the hypothesis, which are evaluated.
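
The chunked, read-once processing described above can be sketched as a simple loop. Everything named here (`chunked`, `run_stream`, `toy_extract`, the dictionary KB, and the keep-the-maximum confidence update) is a hypothetical illustration and not part of the task specification; the sketch only shows that each raw document is visited once while the accumulated KEs are snapshotted at chosen evaluation points.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
KB = Dict[Triple, float]  # triple -> confidence (hypothetical minimal KB)


def chunked(docs: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Group a document stream into consecutive chunks of at most chunk_size."""
    chunk: List[str] = []
    for doc in docs:
        chunk.append(doc)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def run_stream(
    docs: Iterable[str],
    extract: Callable[[str, KB], List[Tuple[Triple, float]]],
    chunk_size: int,
    eval_points: Set[int],
) -> Dict[int, KB]:
    """Process each raw document exactly once; snapshot the KB at eval points."""
    kb: KB = {}
    snapshots: Dict[int, KB] = {}
    for chunk_index, chunk in enumerate(chunked(docs, chunk_size), start=1):
        for doc in chunk:
            # The extractor may consult the KB built so far, but never earlier raw docs.
            for triple, confidence in extract(doc, kb):
                # ASSUMPTION: keep the highest confidence seen for a repeated triple.
                kb[triple] = max(confidence, kb.get(triple, 0.0))
        if chunk_index in eval_points:
            snapshots[chunk_index] = dict(kb)  # copy, so later chunks do not alter it
    return snapshots


def toy_extract(doc: str, kb: KB) -> List[Tuple[Triple, float]]:
    """Toy extractor: treats each document as a single 'subject predicate object' string."""
    subject, predicate, obj = doc.split()
    return [((subject, predicate, obj), 0.5)]


snapshots = run_stream(
    ["a r b", "a r c", "a r b", "d r e"], toy_extract, chunk_size=2, eval_points={1, 2}
)
```

In this toy run the snapshot after chunk 1 holds two triples and the snapshot after chunk 2 holds three, because the repeated document contributes no new triple.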

TA1 Extraction Conditioned on Context

• TA1 must be capable of accepting alternate contexts and producing alternate analyses for each context.
• For example, the analysis of a certain image produces knowledge elements representing a bus on a road. However, knowledge elements in one or more hypotheses suggest that this is a river rather than a road. The analysis algorithm should use this information for additional analysis of the image with priors favoring a boat.

• Simplifying assumptions for evaluation purposes:
  • Contexts are coherent hypotheses (represented as a partial KB) drawn from a small static set of possible hypotheses that are produced manually by LDC
  • Only “what if” hypotheses are input to TA1; KEs and confidence values resulting from “what if” hypotheses do not get passed on to TA2 but are evaluated separately
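
One way to picture the bus/boat example is as Bayesian re-weighting: the image analysis supplies likelihoods, and the hypothesis supplies a prior. The sketch below is an assumption-laden toy, not the AIDA method; the function name, the scores, and the prior values are all invented for illustration.

```python
from typing import Dict


def condition_on_context(
    likelihoods: Dict[str, float], prior: Dict[str, float]
) -> Dict[str, float]:
    """Return posterior scores proportional to likelihood * prior, normalized to sum to 1."""
    unnormalized = {
        label: likelihoods[label] * prior.get(label, 0.0) for label in likelihoods
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("context assigns zero prior to every candidate label")
    return {label: score / total for label, score in unnormalized.items()}


# Hypothetical detector scores for the vehicle in the image.
likelihoods = {"bus": 0.6, "boat": 0.4}

# Without a hypothesis: uniform prior, so the bus reading wins.
baseline = condition_on_context(likelihoods, {"bus": 0.5, "boat": 0.5})

# "What if" hypothesis says the scene is a river: the prior now favors a boat.
river_context = condition_on_context(likelihoods, {"bus": 0.1, "boat": 0.9})

best_baseline = max(baseline, key=baseline.get)
best_in_context = max(river_context, key=river_context.get)
```

Both analyses are retained: the baseline output goes forward as usual, while the output primed by the "what if" hypothesis is evaluated separately, as the simplifying assumptions above state.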

How is Task 1 different from past TRECVID and TAC component tasks?

• Multimedia
• Streaming input
• Can’t go back to reanalyze raw docs in previous data chunks

• TA1 has access to TA2 KB encoding previously added KE’s

• Multiple hypotheses and interpretations
• Expanded ontology to cover informational conflicts in scenario
• TA1 outputs all possible extractions and interpretations, not just the most confident ones
• TA1 extraction from data items may be conditioned on hypothesis

How is Task 2 different from Cold Start KBP?

• Multimedia
• Streaming input
• TA2 has no access to raw data items to assist in fusing incoming KEs with existing KB; can only use what’s represented in the incoming KE and existing KB

• Multiple hypotheses and interpretations
• Expanded ontology to cover information conflicts in scenario
• TA2 KB must maintain all possible KEs (even low-confidence KEs) in order to support creation of multiple hypotheses and disparate interpretations
• TA2 KEs and confidences theoretically could be conditioned on hypothesis in future, but for 2018 the TA2 KB is independent of any “what if” hypotheses.
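
A rough sketch of what fusion under these constraints might look like: the fuser sees only incoming KEs and the existing KB, never raw documents, and it discards nothing on the basis of low confidence. The `FusedKE` record, the `fuse` function, exact-match keying on the triple, and the keep-the-maximum confidence rule are all assumptions for illustration; real TA2 systems would also need cross-document entity linking, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class FusedKE:
    """A KB entry: the triple, a confidence, and links to supporting documents."""
    triple: Triple
    confidence: float
    supporting_docs: Set[str] = field(default_factory=set)  # links only; docs are not readable


def fuse(kb: Dict[Triple, FusedKE], incoming: Iterable[Tuple[Triple, float, str]]) -> None:
    """Merge incoming (triple, confidence, doc_id) records into the KB in place."""
    for triple, confidence, doc_id in incoming:
        entry = kb.get(triple)
        if entry is None:
            # No confidence threshold: low-confidence KEs are kept so that
            # alternative hypotheses can still be built from them later.
            kb[triple] = FusedKE(triple, confidence, {doc_id})
        else:
            # ASSUMPTION: keep the maximum confidence; the slides do not fix a rule.
            entry.confidence = max(entry.confidence, confidence)
            entry.supporting_docs.add(doc_id)


kb: Dict[Triple, FusedKE] = {}
fuse(kb, [
    (("attack_01", "cause", "protest"), 0.70, "doc1"),
    (("attack_01", "cause", "planned_assault"), 0.05, "doc2"),
    (("attack_01", "cause", "protest"), 0.40, "doc3"),
])
```

After this call the KB holds both conflicting causes, including the one with confidence 0.05, and the first entry records two supporting documents.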

Evaluation by Assessment

• Evaluate using post-submission assessment and clustering of pooled mentions
• To support evaluation of TA1 extraction conditioned on context, ground-truth must be conditioned on a small set of hypotheses, predetermined by LDC.

• Only targeted KEs (relevant to hypotheses) will be evaluated
• Only k highest-confidence mentions/justifications for each KE will be pooled and assessed
• LDC might provide exhaustive annotation of mentions of entities for a small set of documents, for gold-standard based “NER” evaluation
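
The pooling rule above (targeted KEs only, k highest-confidence justifications each) can be sketched as follows. The function name, the identifiers, and the data layout are hypothetical; the value of k and the actual pooling procedure are not given in the slides.

```python
import heapq
from typing import Dict, List, Set, Tuple

Justification = Tuple[str, float]  # (pointer into a raw document, confidence)


def pool_for_assessment(
    submitted: Dict[str, List[Justification]], targeted: Set[str], k: int
) -> Dict[str, List[Justification]]:
    """Keep only targeted KEs and, for each, the k highest-confidence justifications."""
    pool: Dict[str, List[Justification]] = {}
    for ke_id, justifications in submitted.items():
        if ke_id not in targeted:
            continue  # KEs not relevant to any hypothesis are not assessed
        pool[ke_id] = heapq.nlargest(k, justifications, key=lambda j: j[1])
    return pool


submitted = {
    "ke1": [("doc1:10-20", 0.9), ("doc4:shot2", 0.3), ("doc9:5-15", 0.6)],
    "ke2": [("doc2:0-8", 0.8)],
}
pool = pool_for_assessment(submitted, targeted={"ke1"}, k=2)
```

Here `pool` contains only `ke1`, with its two strongest justifications in descending order of confidence.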

AIDA Evaluation Schedule

• Three 18-month phases
• January 2018 kick-off

• ~Sept 2018: Eval Pilot
• ~May 2019: Eval 1 (Phase 1)
• ~Nov 2020: Eval 2 (Phase 2)
• ~May 2022: Eval 3 (Phase 3)

TAC 2018 Streaming MM KBP Pilot Evaluation Schedule

• Sample/training/eval data release:
  • ~January: scenario and 3 mostly labeled topics for training; all 100K unlabeled docs for the scenario (foreign languages announced at this time)
  • ~April: 3 additional labeled topics for training
  • ~September: 6 “evaluation” topics

• Early September (?): Task 1 evaluation window
• Mid September (?): Task 2 evaluation window