Vagueness - the Neglected Feature in Big Data

Walther von Hahn
Universität Hamburg • Computer Science Department
E-Mail: vhahn@informatik.uni-hamburg.de
© Walther v. Hahn

Colloque international • Les mégadonnées et les sciences sociales. 22.–24. Sept. 2017 • Bucarest

Contents

•  Theoretical Background
•  Which Big Data?
•  Vagueness
   –  types
   –  on several layers
   –  historical material
•  Annotations and Inferencing
•  Data collection and interpretation in DH
•  Vagueness in Interpretation GUIs
•  Lexical and Syntactic Sources of vagueness in the original
•  Example for necessary Manual Annotation of Factual uncertainty
•  Summary
•  References

Overview

In social science (as in other fields of „Digital Humanities“) big data projects tend to collect data as facts in a (relational) database. Social science, however, partakes, as a humanity, according to Wilhelm Dilthey in a hermeneutic paradigm for establishing social hypotheses. Accordingly, social data often consist either of texts mirroring attitudes, allegations, beliefs, etc., or are reactions of test subjects to verbal stimuli. Such material cannot be treated as facts like numbers or positi[…], analysing only formal features in the material rarely contributes to the hermeneutic aims of the sociological quest.

•  The talk is about possible ways out of this dilemma. A first solution is the subsequent usage of big data for human reading and interpretation only, which, however, underestimates the scientific power of computing.

•  Another solution is a semi-automati[…] by metadata about the credibility of texts and authors as well as by lexical annotations of vagueness expressions. Occurrences of “perhaps”, “mostly” or “to a certain extent” (to name only obvious examples) […], annotation supports semantic qualifications and allows for reasoning over vague features in big data.

Empirical Social Science, Humanity or Science

SCIENCE:
–  statistics and stochastics
–  computerized methods for (semi-automatic) collection, retrieval, annotation and analysis, data exchange, linking among data

HUMANITY:
–  Quantitative methods support the humanities' qualitative research, but do not replace it. The main hermeneutic task is left open.
–  Computer formalisms typically model facts in databases. However, only few humanistic issues are facts; most are open to interpretation. Standard databases in some way obscure the data by alleging “facts”. The main hermeneutic task is still left open.
–  Example: Most owners of TV sets have a low IQ. → Background: Both facts are co-occurrent; who decides that they are not causal?


Science and Humanities

Wilhelm Dilthey (Einleitung in die Geisteswissenschaften, 1922): Dilthey describes history as “a series of world views.” Man can only understand himself through what “history can tell him” … […] “intrinsic temporality of all understanding”, i.e., that man's understanding is dependent on past worldviews, interpretations, and a shared world.

Later on, Hans-Georg Gadamer (Wahrheit und Methode, 1960) declared that interpreting a text involves a fusion of horizons (Horizontverschmelzung). Both the text and the interpreter find themselves within a particular historical tradition, or “horizon”. Each horizon is expressed through the medium of language, and both text and interpreter belong to and participate in history and language.

Jürgen Habermas (Technik und Wissenschaft als Ideologie, Theorie des kommunikativen Handelns, 1968) distinguishes between purposive rational action and social action, […]. Jürgen Habermas' concept and theory of communicative rationality distinguishes itself from the rationalist tradition by locating rationality in structures of interpersonal linguistic communication rather than in the structure of the cosmos.


Which Big Data?

•  Vagueness in social science is an issue for those big data which in the end are evaluated semantically, i.e. by analyses or annotations above the level of formal linguistic structures.

•  At least when you use WordNet synsets or, even worse, their translations from English, you have to expect vagueness problems, because you use word senses in a hermeneutic way, not only by measuring.

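Dialogue and Semantic Interpretation

[Figure: a dialogue and its interpretation. Question: “How do you live in Idohalive?” Answer: “Oh, in general, ok!” Interpretation: “He is satisfied”. Label: “this a fact in the database”.]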

Example

•  Many social science data come from public opinion polls and are individual responses to verbal stimuli.
•  Measuring formal details (e.g. sentence length, response time) is not a hermeneutic activity.
•  However, a later attachment of meaning („A majority of respondents are sceptic against African immigrants“) to numerical results is a hermeneutic issue and subject to interpretation.
•  To avoid invalid interpretations, users have to include in their evaluation metadata about the survey details, i.e. about the questionnaire and the respondents.
•  A linguistic analysis has to check the homogeneity of the random sample or possible interference between the interviewer and the respondents.
•  Within big data a user has to merge the metadata into a data reliability info.
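The last bullet, merging survey metadata into one reliability info, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three metadata fields, the counting rule and the labels "high", "medium" and "low" are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class SurveyMetadata:
    # Hypothetical fields; the slides only name "questionnaire" and "respondents".
    sample_homogeneous: bool          # is the random sample homogeneous?
    interviewer_interference: bool    # did interviewer and respondents interfere?
    neutral_question_wording: bool    # is the questionnaire wording neutral?


def reliability_info(meta: SurveyMetadata) -> str:
    """Merge the metadata into one coarse reliability label (assumed rule)."""
    problems = sum([
        not meta.sample_homogeneous,
        meta.interviewer_interference,
        not meta.neutral_question_wording,
    ])
    labels = ["high", "medium", "low"]
    # Two or more problems are all mapped to the lowest label.
    return labels[min(problems, 2)]


poll = SurveyMetadata(sample_homogeneous=True,
                      interviewer_interference=True,
                      neutral_question_wording=True)
print(reliability_info(poll))
```

A label of this kind could be stored next to each numerical poll result, so that a later attachment of meaning can take it into account.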

M. Pinkal's Schema of Semantic Vagueness

[Tree diagram; node labels as far as recoverable:]

Semantic Vagueness
–  Vagueness in a narrow sense: Porosity, Relativity, Inexactness, Borderline Uncertainty
–  Ambiguity: Homonymy, Polysemy, Syntactic Ambiguity, Referential Ambiguity, Elliptical Ambiguity, Metaphorical Ambiguity

Further labels in the diagram: one-dimensional, many-dimensional, Illocutive Unclarity, Communicative Underspecification


Cimiano, Unger and McCrae

Referential Uncertainty / Factual Uncertainty

(yet) unexplored facts: "the moon is 384402,56 m distant from the earth"
range expressions: “The beginning of the 18. century”, “Romania in the middle ages”
uncertain definition: “the northern slope of the mountain”
inexact measures: „4 Tagereisen“, „10 Fuß“ (“a 4 days' journey”, “10 feet”)
unclear place: „Syrfia“
unclear facts: „auf Befehl des Sultans“ („by order of the sultan“)
unclear time: „In grauer Vorzeit“ („in prehistoric times“)
unclear person: „der damalige Fürst“ („the former prince“)
unclear action: „Die Unterwerfung der Barbaren“ („the submission of the barbarians“)

Challenge for DH: Vagueness on several layers

Examples from historic texts:
•  Linguistic vagueness,
•  Logical vagueness,
•  Fuzzy concepts
   –  “Before Stephan the Great, all mountains around Moldavia belonged to Transilvania and the country was narrow on this side” …,
•  Vague or concurrent ontologies:
   –  The Turkish and the Moldavian administration,
•  Referential vagueness or uncertainty
   –  The origin of the hill “Chan Tepesi” or “Mogila Rabuy”,
•  Naïve History (derived from ‘naive physics’)
   –  „The Roman Empire conquered Dacia“,
•  Historical change,
•  Vagueness of the sources


The more you go into history, the more data become vague

•  measures
•  time span expressions: In the beginning of Xth century, shortly later
•  Persons: the former prince, the current pope
•  even NEs (named entities) are often vague,
•  Additionally, changes in writing create artificial vagueness.

How to annotate

•  In big data you cannot annotate large amounts of text at reasonable cost.
•  The only ways out:
   –  small learning texts and automatic propagation,
   –  automatic annotation of lexical indicators,
   –  including meta-data for text classes,
   –  establishing inference rules for „vagueness combinations“.
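The talk does not spell out what an inference rule for „vagueness combinations“ looks like. One possible reading, given here purely as an assumption, is that several certainty degrees attached to the same statement (for example one from a lexical indicator and one from text-class metadata) are combined conservatively by taking the minimum, as in the usual fuzzy-logic conjunction. The function name and the scale from 0.0 to 1.0 are hypothetical.

```python
def combine_certainty(*degrees: float) -> float:
    """Assumed rule: a statement is only as certain as its weakest indication.

    Each degree lies in [0.0, 1.0]; 1.0 means "stated as plain fact".
    """
    for d in degrees:
        if not 0.0 <= d <= 1.0:
            raise ValueError(f"certainty degree out of range: {d}")
    # Without any indication of vagueness the statement counts as certain.
    return min(degrees, default=1.0)


# Hypothetical degrees: a hedging adverb (0.5) inside a legend-like text (0.3).
print(combine_certainty(0.5, 0.3))
```

Other combination rules (for instance the product of the degrees) would be equally defensible; the minimum is chosen here only because it is the simplest.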

Lexical Vagueness Predictors

•  Modal verbs: must, should, will, can ...
•  Adverbs: perhaps, for example, so to say, possibly, maybe, by any chance, roughly, rude, coarse, and so on, and so forth, basically, ...
•  Adjectives: simplified,
•  Comparative degrees: better, more, worse ...
•  Vague quantifiers: many, most, mostly, majority, often
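An automatic annotation of such lexical indicators can be sketched with a plain word list and regular expressions. The sketch below uses a subset of the predictors listed above; the function name, the category keys and the output format are assumptions made for illustration, and a real annotator would need part-of-speech information (e.g. "will" and "can" are also nouns).

```python
import re

# Subset of the predictors from the slide, grouped by category.
VAGUENESS_PREDICTORS = {
    "modal verb": ["must", "should", "will", "can"],
    "adverb": ["perhaps", "possibly", "maybe", "roughly",
               "by any chance", "so to say"],
    "comparative": ["better", "more", "worse"],
    "vague quantifier": ["many", "most", "mostly", "majority", "often"],
}


def annotate_vagueness(text: str) -> list[tuple[int, str, str]]:
    """Return (character offset, cue, category) for every cue found in text."""
    hits = []
    for category, cues in VAGUENESS_PREDICTORS.items():
        for cue in cues:
            # Word boundaries keep "most" from matching inside "mostly".
            pattern = r"\b" + re.escape(cue) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((match.start(), match.group(0).lower(), category))
    return sorted(hits)


for hit in annotate_vagueness("Perhaps most respondents can agree."):
    print(hit)
```

The offsets make it possible to attach the annotation to the exact position in the source text instead of to the document as a whole.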

Metadata to be included in the GUI

genre (arrow label: decreasing reliability):
•  official document
•  letter
•  fiction
•  fairy tale
•  legend
•  folk tradition

credibility of author (arrow label: decreasing credibility):
•  politician
•  journalist
•  fiction writer

historical distance:
•  modern
•  historical
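These metadata scales are ordinal, so a GUI could store them as ordered lists and derive a rough score from the position of a value. The following sketch assumes that the order of the entries on the slide is the order of the "decreasing" arrows; the linear mapping to a score between 1.0 and 0.0 and all names are hypothetical.

```python
# Orders as read from the slide (assumed to run from most to least reliable).
GENRE_ORDER = ["official document", "letter", "fiction",
               "fairy tale", "legend", "folk tradition"]
AUTHOR_ORDER = ["politician", "journalist", "fiction writer"]
DISTANCE_ORDER = ["modern", "historical"]


def rank_score(value: str, order: list[str]) -> float:
    """Map a value to [0.0, 1.0]; the first entry of the order scores 1.0."""
    index = order.index(value)  # raises ValueError for unknown values
    if len(order) == 1:
        return 1.0
    return 1.0 - index / (len(order) - 1)


def source_metadata(genre: str, author: str, distance: str) -> dict[str, float]:
    """Collect the three scores for one text (kept separate, not merged)."""
    return {
        "genre": rank_score(genre, GENRE_ORDER),
        "author": rank_score(author, AUTHOR_ORDER),
        "distance": rank_score(distance, DISTANCE_ORDER),
    }


print(source_metadata("legend", "journalist", "historical"))
```

Keeping the three scores separate leaves the decision of how to weigh genre against author credibility to the interpreting user.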

Current DH-Approach

[Diagram. Labels: DATA; Machine Learning; Manual Annotation (Domain specific); CL-Tools; NE-Recognition; Knowledge-Base; Quantitative Measures; Interpretation; markers A, B, C, D, E.]

Including Vagueness

[Diagram. Labels: DATA; Machine Learning; Manual Vagueness Annotation; Manual Domain Annotation; CL-Tools; NE-Recognition; Fuzzy-KB; Quantitative Measures; Interpretation; markers A, B, C, D, E; open points marked “?” and “??”.]

Lexical and Syntactic Sources of vagueness in the original

Hactenus Gregoras: ad cuius verba observare haud extra propositum erit τὴν πρώτην, quam Gregoras vocat „Tartariam”, eandem esse, quam hodie vulgo „Magnam” appellamus eiusque incolarum nomina, etsi ab historicis recenseantur, tamen adscita magis, aut ab exteris indita, quam propria eisque, dum in suis sedibus morarentur, peculiaria fuisse. Ita, si quis in praefixa huic tractationi Praefatione legerit Oguzorum gentis Principes in duas stirpes fuisse divisos, „Aliothman” unam, et „Ali Dzengiz”1 alteram, ne credat sub ipsis horum generum conditoribus hanc appellationem iam apud eas gentes invaluisse. Vti enim absonum videtur, Aliothmanos Suleimano parentes ab huius nepote, qui integro post saeculo iis imperabat, nomen fuisse sortitos; ita non minus falso vulgo praedicantur Tartarorum Crimensium Principes ab ipso Dzengizchano „Alidzengiz” appellationem retinuisse.

Până aici l-am citat pe Gregoras: faţă de cuvintele lui nu va fi nepotrivit să observăm că acea Tartaria „ἡ πρώτη”, pe care o numeşte Gregoras, este chiar aceea pe care o numim îndeobşte cea „Mare”, iar numele locuitorilor ei, chiar dacă sunt înregistrate de istorici, au fost totuşi mai degrabă împrumutate sau date de străini decât proprii lor, purtate întocmai pe vremea când se aflau în sălaşurile lor. Astfel, dacă va fi citit cineva în Prefaţa pusă înaintea acestui tratat că principii neamului oguzilor au fost împărţiţi în două stirpe, una „aliothmană”, cealaltă, „alidzengiză”, să nu creadă că denumirea aceasta era de-acum valabilă pentru întemeietorii acestor neamuri. Căci, după cum pare nepotrivit ca aliothmanizii care i se supun lui Suleiman să-şi fi ales numele de la nepotul acestuia, care a domnit peste ei după un secol întreg, la fel de fals se spune îndeobşte că principii tartarilor din Crimea şi-ar fi păstrat denumirea „alidzengiz” chiar de la Dzengizchan

[Annotation labels attached to the passages above: More plausible; Quotation; Would have been …; seems unlikely; equally false]

Domnul cel dintâi carele după năvălirea lui Batie, a agonisit iarăși strălucirea cea mai dinainte a Moldovei a fost: 1. Dragoș și măcar că hronografiile noastre nu arată pentru știința neamului său, dar la noi se zice necontenit, că a fost din neamul cel vechiu al crailor Moldovinești, și a avut tată pe Bogdan fiul lui Ioan, de la carele toți Domnii obișnuesc a-și pune la iscălitură numele Ioan. Și cuvântul acesta este mai ușor de a se adeveri și pentru aceasta, căci cu greu este de a se crede, că altul din neam mai prost, ar fi putut cu o tovărășie așa mare să meargă la vânat, carele a dat prilej la descoperirea Moldovei și ar fi putut...

Der erste demnach, der nach Batia Einfall (*) der Moldau ihren vorigen Glanz wieder verschafft hat, war 1. Dragosch. Obgleich unsre Jahrbücher sein Geschlechtsregister nicht angeben, so ist es doch eine beständige Sage bey uns, daß er aus dem alten königlichen moldauischen Stamme gewesen sey, und den Bogdan zum Vater gehabt habe, welcher ein Sohn des Johannis war, von welchem alle Fürsten den Namen Johannis in ihrem Titel zu führen pflegen; dieser Meinung ist desto mehr Glauben beyzumessen, weil man schwerlich glauben kan, daß einer von gemeiner Herkunft mit einem so großen Gefolge auf die Jagd (welche die Moldau zu entdecken Gelegenheit gegeben,) habe ausgehen, ….

[Annotation labels: “was”; “should have been”]

Dragos = belongs_to Moldavian kings
Dragosch ≈ belongs_to Moldavian kings

Example for wrong knowledge extracted without deeper linguistic annotation – German and Romanian case
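The contrast between “=” and “≈” can be made explicit in a knowledge base by storing a certainty degree with every extracted relation. The sketch below is a minimal illustration, not the representation used in the talk: the class, its fields and the value 0.6 are hypothetical, and the assignment of “=” to the Romanian and “≈” to the German version is inferred from the spellings Dragos and Dragosch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    relation: str
    obj: str
    certainty: float  # 1.0 = extracted as plain fact; lower = hedged in the text
    source: str

    def render(self) -> str:
        """Show '=' for a crisp fact and '≈' for a hedged one."""
        operator = "=" if self.certainty >= 1.0 else "≈"
        return f"{self.subject} {operator} {self.relation} {self.obj}"


# Naive extraction from "a fost" ("was") yields a crisp fact.
romanian = Statement("Dragos", "belongs_to", "Moldavian kings",
                     certainty=1.0, source="Romanian version")
# "gewesen sey" ("should have been") is hedged; 0.6 is an arbitrary placeholder.
german = Statement("Dragosch", "belongs_to", "Moldavian kings",
                   certainty=0.6, source="German version")

for statement in (romanian, german):
    print(statement.render(), "|", statement.source)
```

With the degree stored alongside the relation, a later query can still tell a reported tradition apart from an asserted fact.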

Example for necessary Manual Annotation of Factual uncertainty

[…] He fought two Battles with Bajazet Ildirim; in the first he was victor, and in the second he routed him with a memorable slaughter, which seven vast piles of Turkish Bodies erected after the Battle, witnessed, by the Confession of Hezarfenn himself, the faithful Turkish Historian.

Cantemir, pp. 47 (Annotations)

Hezarfen (Hezarfen Hüseyin Efendi) (?-1691/92), Tenkih-i Tevarih-i Mülük: does NOT mention these facts

The Turkish historians so extoll this prince's expedition in assembling his troops, in executing his designs, and in vanquishing his enemies, that when they talk of the natural speed of the Tartars in comparison with his wonderful marches, they call the first, the creeping of a snail.

Cantemir, pp. 48 (Annotations)

Described in Solakzade: ?, Hoca Saadettin: […], Neşri: […]

How to represent vagueness

[Diagram. Labels: Author; Genre; History Facts; Translation; Vocabulary]

Summary

To avoid,
•  that words/texts become facts or concepts without semantic annotations,
•  that big social data become uniform database entries without some sort of reliability check,
we need indications of their vagueness.

References

•  Thomas T. Ballmer and Pinkal, Manfred, Approaching Vagueness, Amsterdam 1983
•  Geeraerts, Dirk, Vagueness's puzzles, polysemy's vagaries. In: Newman, John, Cogniti[…].
•  v. Hahn, Walther, Vagheit bei der Verwendung von Fachsprachen. In: Hoffmann/Kalverkämper/Wiegand: Fachsprachen. Band 1. Berlin 1998. S. 383–390.
•  Pinkal, Manfred, Semantische Vagheit: Phänomene und Theorien, Teil I. In: […], S. 1-26, Wiesbaden 1980.
•  Pinkal, Manfred, Semantische Vagheit: Phänomene und Theorien, Teil II. In: […], S. 1-26, Wiesbaden 1981.
•  Edeltraud Winkler, Überlegungen zu Artefaktbezeichnungen im Deutschen. In: Deutsche Sprache 37 (2009) H. 1, S. 33-47.

