Automated Personality Classification A. KARTELJ and V. FILIPOVIC School of Mathematics, University of Belgrade, Serbia and V. MILUTINOVIC School of Electrical Engineering, University of Belgrade, Serbia
Transcript
Slide 1
Automated Personality Classification
A. KARTELJ and V. FILIPOVIC, School of Mathematics, University of Belgrade, Serbia
V. MILUTINOVIC, School of Electrical Engineering, University of Belgrade, Serbia
Slide 2
Agenda
Problem overview
Classification of the existing solutions
Presentation of the existing solutions
Comparison of the solutions
Work in progress: Bayesian Structure Learning for the APC
Future work: Video Based APC
Conclusions
MULTI 2012, 3.10.2012
Slide 3
Problem Overview
Slide 4
The Big 5 Model
Slide 5
The Steps in Our Research
1. Survey paper (under review at ACM CSUR)
2. Research paper: A new APC model based on Bayesian structure learning (in progress)
3. Real-purpose application of the APC model from step 2
4. Go to step 3
Slide 6
Elements of APC
Corpus: Essay, weblog, email, news group, Twitter counts...
Personality measurement: Questionnaire (internet and written). We are searching for an alternative!
Model: Stylistic analysis, linguistic features, machine learning techniques
Slide 7
Applications
Slide 8
Mining People's Characteristics
Slide 9
Classification of Solutions
C1 criterion separates solutions by type of conversation (1 = self-reflexive, N = continuous)
C2 criterion separates solutions by approach (TD = top-down, DD = data-driven, or HY = hybrid)
Slide 10
Linguistic Styles: Language Use as an Individual Difference
Pennebaker and King [1999]
Slide 11
LIWC and MRC Features
Feature | Type | Example
Anger words | LIWC | Hate, kill
Metaphysical issues | LIWC | God, heaven, coffin
Physical state / function | LIWC | Ache, breast, sleep
Inclusive words | LIWC | With, and, include
Social processes | LIWC | Talk, us, friend
Family members | LIWC | Mom, brother, cousin
Past tense verbs | LIWC | Walked, were, had
References to friends | LIWC | Pal, buddy, coworker
Imagery of words | MRC | Low: future, peace; High: table, car
Syllables per word | MRC | Low: a; High: uncompromisingly
Concreteness | MRC | Low: patience, candor; High: ship
Frequency of use | MRC | Low: duly, nudity; High: he, the
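As a rough illustration of how dictionary-based features of this kind can be computed, the sketch below counts what fraction of a text's words fall into each category. The category names and the tiny word sets are hypothetical, built only from the example words in the table; the real LIWC dictionary is much larger and is not reproduced here, and `category_rates` is an illustrative helper, not part of any LIWC tool.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon taken from the example words in the table above.
CATEGORIES = {
    "anger": {"hate", "kill"},
    "inclusive": {"with", "and", "include"},
    "social": {"talk", "us", "friend"},
    "family": {"mom", "brother", "cousin"},
}

def category_rates(text):
    """Return, per category, the fraction of tokens in `text` that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {name: counts[name] / total for name in CATEGORIES}

rates = category_rates("I talk with my mom and my brother")
print(rates)
```

The resulting rates form a feature vector that could be correlated with questionnaire scores or passed to a classifier.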
Slide 12
What Are They Blogging About? Personality, Topic and Motivation in Blogs
Gill et al. [2009]
Slide 13
Taking Care of the Linguistic Features of Extraversion
Gill and Oberlander [2002]
Slide 14
Personality Based Latent Friendship Mining
Wang et al. [2009]
Slide 15
A Comparative Evaluation of Personality Estimation Algorithms for the TWIN Recommender System
Roshchina et al. [2011]
Slide 16
Predicting Personality with Social Media
Golbeck et al. [2011]
Slide 17
Our Twitter Profiles, Our Selves: Predicting Personality with Twitter
Quercia et al. [2011]
Slide 18
Paper | Input | Corpus | Features | Algorithm | Soft. | Cit. | ISAR
[Pennebaker and King 1999] | text | essays | LIWC | correlations | n/a | 455 | HHHM
[Mairesse et al. 2007] | text, speech | essays | LIWC, MRC | C4.5, NB, SMO, M5 | Weka | 99 | MMHM
[Gill et al. 2009] | text | weblogs (14.8words) | LIWC | linear regression | n/a | 26 | HHMM
[Yarkoni 2010] | text | weblogs (100K words) | LIWC | correlations | n/a | 21 | HMMM
[Gill and Oberlander 2002] | text | emails (105 students) | bigrams | bigram analysis | n/a | 49 | LMML
[Nowson et al. 2005] | text | weblogs (410K words) | word list | correlations | n/a | 48 | LHHL
[Oberlander 2006] | text | weblogs (410K words) | N-grams | NB, SMO | Weka | 53 | HMHM
[Wang et al. 2009] | text | weblogs (200 pairs) | lexical freq., TFIDF | logistic regression | Minitab | 1 | HMMM
[Iacobelli et al. 2011] | text | weblogs (3000) | LIWC, bigrams, | SVM, SMO, NB.. | Weka | 1 | HHMH
[Argamon et al. 2005] | text | essays | word list, conj. | SMO | Weka | 38 | HMMM
[Argamon et al. 2007] | text | essays | word list, conj. | SMO | Weka, ATMan | 45 | HMMM
[Mairesse and Walker 2006] | text, conv. extracts | 96 persons (100K words) | LIWC, MRC, utterance | RankBoost | n/a | 22 | MMHM
[Rigby and Hassan 2007] | text | mail. lists (140K emails) | LIWC | C4.5 | Weka, SPSS | 30 | MHML
[Roshchina et al. 2011] | text | TripAdvisor reviews | LIWC, MRC | Linear, M5, SVM | Weka | 2 | HMLM
[Quercia et al. 2011] | meta | 335 Twitter users | Twitter counts | M5 rules | Weka | 5 | MHMM
[Golbeck et al. 2011] | text, meta | 279 FB users | 5 classes (161 in total) | M5 rules, Gaussian processes | Weka | 12 | HMMM
[Celli 2012] | text | 1065 posts | 22 ling. features | majority-based classification | n/a | 1 | MMMM
Slide 19
Naive Bayes Classifier
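The transcript preserves only the title of this slide. As a minimal sketch of the technique it names, the following is a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. The function names, the "high"/"low" labels, and the toy training documents are all hypothetical and are not taken from the presentation or from any of the surveyed corpora.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Collect per-label document and word counts from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior under naive Bayes."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen (word, label) pairs non-zero.
                score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up training data: "high" vs "low" on a single trait.
toy_docs = [
    (["party", "friends", "talk"], "high"),
    (["friends", "fun", "party"], "high"),
    (["alone", "read", "quiet"], "low"),
    (["quiet", "book", "alone"], "low"),
]
model = train_nb(toy_docs)
print(predict_nb(model, ["party", "talk"]))
print(predict_nb(model, ["quiet", "book"]))
```

The classifier treats every word as conditionally independent of the others given the label, which is the simplifying assumption that a Bayesian network relaxes.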
Slide 20
Naive Bayes and Bayesian Network
Slide 21
Bayesian Network for the APC
Slide 22
Bayesian Network Structure Learning
1. Obtain corpus (training set T)
2. Fit T to appropriate network structure by:
   a) ILP formulation + solver (CPLEX, Gurobi) on smaller instances
   b) Apply metaheuristic on larger instances
3. Validate quality of metaheuristic approach
4. Compare obtained APC accuracy with other approaches
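The slide does not say which score or which metaheuristic is used, so the following is only a sketch of step 2b under stated assumptions: binary variables, a BIC score, and plain greedy hill climbing over single-edge additions and removals as a stand-in for a real metaheuristic. The names `local_bic`, `is_ancestor`, and `hill_climb`, and the toy data, are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def local_bic(data, child, parents):
    """BIC contribution of one binary node given its parent set."""
    parents = sorted(parents)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(n * math.log(n / marginal[cfg]) for (cfg, _), n in joint.items())
    n_params = 2 ** len(parents)  # one free parameter per parent configuration
    return loglik - 0.5 * math.log(len(data)) * n_params

def is_ancestor(parents, a, b):
    """True if `a` is `b` itself or an ancestor of `b` in the current graph."""
    stack, seen = [b], set()
    while stack:
        node = stack.pop()
        if node == a:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(data, n_vars):
    """Greedily add or remove the single edge that most improves total BIC."""
    parents = {v: set() for v in range(n_vars)}
    scores = {v: local_bic(data, v, parents[v]) for v in range(n_vars)}
    while True:
        best_gain, best_move = 1e-9, None
        for u, v in permutations(range(n_vars), 2):
            if u in parents[v]:
                candidate = parents[v] - {u}   # try removing edge u -> v
            elif not is_ancestor(parents, v, u):
                candidate = parents[v] | {u}   # try adding edge u -> v
            else:
                continue                       # adding u -> v would create a cycle
            gain = local_bic(data, v, candidate) - scores[v]
            if gain > best_gain:
                best_gain, best_move = gain, (v, candidate)
        if best_move is None:
            return parents                     # local optimum reached
        v, candidate = best_move
        parents[v] = candidate
        scores[v] = local_bic(data, v, candidate)

# Toy data: variables 0 and 1 always agree, variable 2 is unrelated to both.
toy_data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 10
result = hill_climb(toy_data, 3)
edges = {(u, v) for v, ps in result.items() for u in ps}
print(edges)
```

Because BIC decomposes over nodes, each candidate move only requires rescoring the one node whose parent set changes, which is what makes local search practical on larger instances. Greedy search can stop in a local optimum, which is one reason to validate it against an exact ILP solution on small instances, as step 3 proposes.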
Slide 23
Other Ideas
Games with a purpose (GWAP)
Clustering personality characteristics
Slide 24
Packing everything together: Video Based APC
Slide 25
Conclusions
Classification of the existing solutions (Survey paper)
Filling the gaps inside classification tree
Introducing Bayesian Structure Learning for the APC
Utilizing metaheuristics in dealing with high dimensionality
APC potential: social networks, recommender, and expert systems