
Truman, Or How I Learned to Stop Worrying and Love Simulated Social Media

Lee Taber ltaber@ucsc.edu

Kevin Weatherwax wax@ucsc.edu

Alexander Mayben amayben@ucsc.edu

Sean Fernandes semafern@ucsc.edu

ABSTRACT

Conducting unbiased empirical research in social media spaces is of great importance. However, it is a highly restrictive space for researchers who are not aligned with, or working from within, the companies that control those spaces. Additionally, even when access to social media networks is available, there are a range of ethical concerns and hurdles around conducting experimental research. Here we present a proof of concept for a system of easily accessible software tools which leverage procedural generation, machine learning, and interactive media to allow researchers, from both technical and non-technical domains, to conduct investigations that involve human personality traits (i.e., Big 5 attributes) in a simulated social media environment. Our past research on personality attributes in social media spaces is both our motivation and exemplar use case, as we have found that affordances and social mores influence individual presentation as measured by personality traits.

INTRODUCTION

Most researchers are not large tech companies; they are largely unable to present participants with a fully functioning social media platform with an established user base. Data from large social media platforms (e.g., Facebook, Instagram, Snapchat, etc.) are largely inaccessible to researchers outside of, and sometimes within, their own companies. Unless they already work at a social media company, researchers may find it significantly more difficult to test many of the insights that they might glean from users. Furthermore, faking such a social media platform would be very difficult using common practices such as Wizard of Oz (WOZ). Consistency in posts is very important, especially if one is attempting to maintain any sort of validity. In order to further research in this space, collecting empirical data becomes increasingly important for a wide range of research fields.

Personality and social media research is an example of an area where empirical tools would be of great use. Personality is a long-standing and intuitive construct that we use as a way to understand how people behave across different situations. Much research has been collected on how personality is exhibited online, especially in social media [10, 24, 1, 6, 11]. However, this research is mostly correlational and is not useful for making causal claims about the relationship between personality and online platforms. Furthermore, little of this research has informed the creation of personas or character archetypes that tend to portray specific traits. The issue of correlation is significant in the social media research space. Without control over certain aspects of the social media platform, performing empirical research becomes especially challenging. Measuring correlations is about the limit of what most researchers can accomplish with current tools.

In this paper, we present our first steps at developing an ecosystem of tools to simulate, manipulate, and record human interactions on social media platforms. We build upon an existing simulated social media framework, DiFranzo's Truman platform [4], but we also utilize machine learning processes to generate and validate potential simulated posts. The development of this system was motivated by our own past work investigating personality traits in social media. However, this is also a good exemplar use case for presenting simulated social media posts, as the core principles and language are widely understood and thus accessible beyond our own discipline.

Personality

The Big Five (OCEAN) personality trait taxonomy is a highly validated and well-studied survey [14, 5, 21]. The five personality traits are commonly referred to by the acronym OCEAN: Openness to Experience, Conscientiousness, Extroversion, Agreeableness, and Neuroticism. Openness to Experience is related to the range of a person's interests, intellectual curiosity, and aesthetic sense. Someone who rates high on Openness is likely to have a wide range of interests and enjoy tackling new ideas. Someone who rates low on Openness is likely to focus on a few areas and enjoy routines. Conscientiousness is related to responsibility, organization, and time-keeping. Someone who rates high on Conscientiousness is likely to be punctual and keep spaces around them clean and tidy. Someone who rates low on Conscientiousness is likely to procrastinate, have a messy desk, and have trouble being on time for events. Extroversion is related to social activity, energy level, and assertiveness. Someone who rates high on Extroversion is likely to be the life of the party, energetic in social situations, and not afraid to speak their mind. Someone who rates low on Extroversion is likely to stay at home and not engage frequently in social situations. Agreeableness is related to warmth, trust, and respect. Someone who rates higher on Agreeableness is likely to be a kind, forgiving person who sees the best in everyone. Someone who rates lower on Agreeableness is likely to be a cold, mean person who is distrusting of others. Finally, Neuroticism is related to emotional volatility, anxiety, and depression. Someone who rates high on Neuroticism is likely to have mood swings, be a worrier, and/or have depression. Someone who rates low on Neuroticism is likely to be emotionally stable and to be seen as a generally calm person.

All of the above descriptions of the traits describe so-called offline behaviors. However, people can also make predictions about someone's personality from a variety of clues. These can include the spaces where they work and live [9], how they conduct themselves [18], and even their handwriting [12]. Within social media spaces, other researchers have correlated personality traits with certain behaviors on social media. For example, Amichai-Hamburger & Vinitzky [1] correlated OCEAN traits with Facebook behaviors and found that Extroversion correlated with number of friends and negatively correlated with the amount of posted personal information. Openness correlated with the amount of posted personal information as well as the use of certain features within the user's personal info section, such as emoji. Neuroticism was correlated with the likelihood of the user posting a photo on their profile, with particularly high and low scorers sharing more basic info (such as selfies and personal information). This was hypothesized to be the result of emotionally stable people being willing to share more because they are secure, while emotionally volatile people do so when seeking self-assurance.

Gosling et al. [10] also found correlations between OCEAN traits and Facebook behaviors. Extroversion correlated with number of friends, number of friends in local networks, likelihood of maintaining an up-to-date presence, and likelihood of commenting on other pages. Openness correlated with number of friends, shared local networks, and photos of new activities, people, and things. Conscientiousness was negatively correlated with time spent on Facebook: those who are low on Conscientiousness tend to procrastinate more and thus spend more time on social media. People who were high on Agreeableness were more likely to view users' pages more often, including both others' and their own. Personality is a useful test case for this tool, as there is a great deal of existing research related to its correlations with social media behavior.

Social Media Research

Our research partly builds off of an existing simulated social media platform, named Truman by the original researcher [4]. In DiFranzo et al.'s study, researchers created a simulated social media environment in order to better understand which attributes of the system could potentially influence cyberbullying intervention among college-age students. The social media is simulated because the participant is the only real actor on the platform. Each person who signs in sees their own version of the platform. All other posts that participants see come from actors: scripts that post pictures and comments at pre-specified times. The design of the simulation is also extremely clever. To limit the types of images needed, and to avoid having to show many photos of people (which would be difficult to simulate for a large number of people), their simulated platform, named EatSnap.Love, is essentially a food documentation-based social network. People are only allowed to post pictures of food, and anything else would be automatically deleted, ensuring that participants are encouraged not to post anything crazy or risqué. Truman also employed two hidden variables within its design communication. The first variable decided whether the system indicated that other people could see what the user viewed, and the second chose whether to portray the user's audience as large, small, or vacant. These variables aimed to test the influence of social pressure and/or the bystander effect in cases where users may have seen one actor bullying another within a given post. Sadly, this intervention failed, as the vast majority (approximately 75%) of participants, regardless of condition, simply didn't do anything about bullying that they witnessed.

Figure 1. Example of current human-authored Truman post.

While Truman's original experiment was an unfortunate failure, the underlying systems remain remarkably robust for conducting empirical research. As an early test of this system, one of the authors wanted to understand how social media users evaluate aspects of other users' behaviors to judge their personalities. To this end, the author used previous research on personality correlates of social media behaviors [1, 10] in order to generate several personas, each persona attempting to portray one pole of each of the Big 5 dimensions (see Figure 1 for an example of human-authored posts). Participants in this version were asked to read posts from these and other personas over a period of three days and, at the end, guess which "mask" each persona was wearing (visit https://truman.herokuapp.com to see for yourself). Personas would use as many attributes of the post format as possible to convey their personality traits. For example, the high-Extroversion persona made more posts, commented frequently on other posts, solicited interaction in their posts, had more likes and comments than other personas, and so on. However, this method quickly raised several issues. It was onerous to have one person generate hundreds of posts and comments and make sure that each was exhibiting the correct traits, and it was unclear which aspects people were examining to make their determinations. From these concerns, the author became interested in ways to procedurally generate posts as well as validate the personality traits being exhibited. This, in turn, motivated the assemblage and application of this set of tools.

Figure 2. Proof of concept workflow for implementing generative personality type agents in Truman.

METHOD AND RESULTS

In this section, we describe the workflow and assemblage of our toolkit. This serves as a proof of concept (refer to Figure 2 for an overview) for procedurally generated but controllable stimuli for populating an artificial social media platform with posts measured for personality traits. We provide a high-level description of each individual piece and how it contributes to the overall proposed system.

Image Captioning

People on the largest social media platforms (e.g., Facebook, Instagram, etc.) [3] generally start conversations with an image and an accompanying piece of text. Normally, the process of manually captioning images to populate platforms like Truman is time-consuming and difficult. To address this, we started with an image classifier to parse and describe the content of images. Specifically, we used a machine learning model for captioning visual images, built from a TensorFlow tutorial [25]. This model is attention-based [31]: it learns by focusing on different sections of an image as it decides which captions to apply. This allows users to see which parts of the image are in focus when each word of a caption is decided upon. It uses InceptionV3 [23] for image classification and feature recognition. The model was trained on the MS-COCO dataset of 221,000 captioned images [16]. Each image in the dataset is segmented into discrete objects and has five distinct captions, making it ideal for this type of image captioning. The captioning vocabulary was created by tokenizing the content of the MS-COCO captions (i.e., generating and saving text files). After the model was fully trained, we input images from Truman to generate sample captions for later modification. In most cases, these captions accurately identified the main subject of the image, albeit with some confusion about context and background details.
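The vocabulary step described above can be illustrated with a minimal sketch. This is not the code used in the project: the start and end markers, the padding and unknown-word ids, and the 5,000-word cap are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(caption):
    """Lowercase a caption, drop punctuation, and wrap it in start/end markers."""
    words = re.findall(r"[a-z0-9']+", caption.lower())
    return ["<start>"] + words + ["<end>"]


def build_vocab(captions, max_words=5000):
    """Map the most frequent tokens to integer ids (0 = padding, 1 = unknown).

    max_words is an assumed cap, not a value taken from the paper.
    """
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    vocab = {"<pad>": 0, "<unk>": 1}
    for token, _ in counts.most_common(max_words):
        vocab[token] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Turn a caption into a list of ids, using the unknown id for unseen words."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(caption)]


captions = ["A cat sitting next to a phone book", "A cat on a couch"]
vocab = build_vocab(captions)
ids = encode("A dog on a couch", vocab)
```

Words that never appear in the training captions (here, "dog") fall back to the unknown id, which is one reason a captioner can misdescribe unfamiliar content.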

Social Mediafication

Having produced an accurate descriptive caption of a social media image, we then need to adjust the language so that it feels appropriate for a social media post. This adjustment is done by generating a new line of text, based on the descriptive caption passed from the previous step, through GPT-2 (Generative Pre-trained Transformer 2). GPT-2 is a generative text algorithm based upon an unsupervised Transformer machine learning model, which generates text based on a pre-trained dataset of several million webpages' worth of content. The model can then be further tuned on a specific corpus of data, which tailors the output to a specific genre or style. In lay terms, you give the model a chunk of relevant source data and it re-frames its already robust ability to generate cohesive text to produce original content that is thematically/semantically similar to what it was given. GPT-2 was developed by OpenAI and has been leveraged for a wide range of text-based natural language projects [20]. However, the original version is notably difficult to fine-tune towards specific implementations. Our team therefore utilized GPT-2-simple, a version of GPT-2 packaged by Max Woolf that wraps the model's code in Python and TensorFlow [28, 29, 30]. GPT-2-simple takes the GPT-2 algorithm developed by OpenAI and utilizes these wrappers to streamline the algorithm's tuning and execution.

Table 1. Image Caption Tweets

Image caption: A cat sitting next to a phone book

Tweets:
  "I don't see a cat anymore"
  "Except she's terrified"
  "Let me know oh yeah I cant record her making a noise as that sounds like an accident lol"
  "A cat hollering at a rogue landlord PICTURE You might only be responsible for two days but at least you got it fixed"

For our purposes, we fine-tuned GPT-2-simple on a dataset of social media posts. Specifically, we used the Sentiment140 corpus of approximately 1.6 million Twitter posts gathered by Alec Go, Richa Bhayani, and Lei Huang [8]. Once GPT-2 was trained on this corpus (we trained the 124M model on the corpus for a total of 1,000 runs), the algorithm used our image caption (see Image Captioning) as a prompt before generating several lines of text. In line with how our corpus is structured, each line of the output represents a separate social media post related to that original caption (i.e., responses). However, since the Twitter data from our corpus is not articulated in narrative chunks, this output is largely nonsensical when taken together. To address this issue, we applied beginning and ending tokens to each tweet in the corpus, which then gave us the ability to stipulate cutoff points. The end result is that the model accepts a descriptive caption, generates a piece of text that it thinks should immediately follow, and then discards the original caption. This leaves a stub of Twitter-style text that is based on the image description (see Table 1).
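The token-and-cutoff idea can be sketched in plain Python, independent of any particular GPT-2 wrapper. The token strings below are an assumption (the paper does not state which markers were used), and both helper names are hypothetical.

```python
# Assumed marker strings; the actual tokens used in the project are not stated.
START = "<|startoftext|>"
END = "<|endoftext|>"


def wrap_tweet(tweet):
    """Surround one corpus tweet with begin/end markers before fine-tuning."""
    return f"{START}{tweet.strip()}{END}"


def first_post_after_caption(generated, caption):
    """Keep only the text between the prompt caption and the first end marker."""
    text = generated
    if text.startswith(START):
        text = text[len(START):]
    if text.startswith(caption):
        text = text[len(caption):]  # discard the original caption
    stub = text.split(END, 1)[0]    # cut at the first end marker
    return stub.replace(START, "").strip()


caption = "A cat sitting next to a phone book"
raw_output = (
    caption
    + " I don't see a cat anymore" + END
    + START + "Except she's terrified" + END
)
stub = first_post_after_caption(raw_output, caption)
```

Cutting at the first end marker is what turns a run of unrelated tweet-like lines into a single usable post.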

OCEAN Classification and Sorting

Finally, after text has been generated, we need a way to validate any personality traits it appears to be showing. Because we intend to build personas with consistent traits, we need to be able to collect several posts that exhibit the same pattern of traits. In order to validate this text, we run it through a multi-label classifier. Multi-label classifiers categorize text by using multiple independent binary flags to indicate different attributes. As in Figure 2, each trait is given a probability of the presence or absence of that trait within the text, and each trait is scored independently of the others. Thus, in this model, one's Extroversion has no bearing on their Agreeableness, which in turn reflects the correct empirical approach for evaluating personality traits.
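A minimal sketch of what independent per-trait scoring means in practice is shown below. It assumes one sigmoid output per trait, which is a common design for multi-label classifiers but is not confirmed by the paper; the raw scores are made-up numbers.

```python
import math

TRAITS = ("Openness", "Conscientiousness", "Extroversion",
          "Agreeableness", "Neuroticism")


def sigmoid(x):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def trait_probabilities(raw_scores):
    """Score each trait on its own; the five values need not sum to 1."""
    return {trait: sigmoid(z) for trait, z in zip(TRAITS, raw_scores)}


def trait_flags(probs, threshold=0.5):
    """Convert probabilities into binary present/absent flags."""
    return {trait: p >= threshold for trait, p in probs.items()}


# Hypothetical raw classifier outputs for one post.
probs = trait_probabilities([2.0, -1.0, 0.0, 3.0, -3.0])
flags = trait_flags(probs)

# Raising only the Extroversion score leaves every other trait unchanged.
probs_more_extroverted = trait_probabilities([2.0, -1.0, 5.0, 3.0, -3.0])
```

A softmax output, by contrast, would force the traits to compete for a single unit of probability, which would not match how the Big Five dimensions are treated.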

The model we used was built using TensorFlow and trained on the myPersonality dataset generated by Kosinski et al. [15]. This dataset is a collection of 10,000 Facebook posts which have been coded for Big Five personality values. This ensures that we can create consistent personality profiles for each post. However, this method also has some flaws. Ideally, a larger dataset of Facebook posts would result in a more robust model, since the current model often fails to understand text that does not appear in the original dataset. For example, certain instances of netspeak seem to have little impact on the personality rating. It is also difficult to map the binary presence or absence of personality traits onto the typical 1-5 scale used in personality research; the probability of a given trait occurring in a given body of text indicates nothing about the perceived strength of said trait. If the probability is in the middle (i.e., around 50%), does that necessarily mean that the person is neutral in regards to that trait? The answer to this is unclear and may present a useful area for future exploration. This is currently defined as the last stage in our workflow, but we aim to collect the posts and then move them into Truman as an example of persona creation.

Platform Implementation

As of now, the implementation of our platform is a bit rough. We would ideally have all of the systems working together to a greater degree, such that the information is passed directly from one stage to another. Right now, we are passing the information indirectly from one step to the next. Another implementation issue that we could address in the short term would be greater validation of automatically generated information. For example, when using GPT-2 to generate a caption, we have to select one based on what makes the most sense for that photo. We could perhaps implement some form of check that makes sure that GPT-2 captions more directly reference an element of the photo (maybe by using the caption generated from the image captioning step). Although this would make GPT-2 generation take a bit longer, we could hopefully avoid having to have eyes on each step of the process to the same degree.
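One simple form such a check could take is a word-overlap test between each generated post and the descriptive caption. The sketch below is only an illustration of that idea, not an implemented part of the system; the stopword list and function names are hypothetical.

```python
import re

# A small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "to", "on", "in", "at",
             "next", "and", "with", "is"}


def content_words(text):
    """Return the lowercase words of a text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def references_photo(post, image_caption):
    """True if the post shares at least one content word with the caption."""
    return bool(content_words(post) & content_words(image_caption))


def filter_posts(posts, image_caption):
    """Keep only generated posts that mention something from the caption."""
    return [p for p in posts if references_photo(p, image_caption)]


caption = "A cat sitting next to a phone book"
candidates = [
    "I don't see a cat anymore",
    "Except she's terrified",
    "A cat hollering at a rogue landlord",
]
kept = filter_posts(candidates, caption)
```

Exact word matching is deliberately crude: it would miss a post that says "kitty" instead of "cat", so a real check might need stemming or embeddings.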

Although this is currently unimplemented, the last step would be to build a collection of social media posts for each pattern of traits (e.g., a persona who is high on Extroversion and low on everything else) and use those posts for that persona. One way this could be implemented in Truman would be by going through several rounds of GPT-2 generation for a caption, with a caption only passing if it is classified within certain limits for each persona. For example, to build the persona in the above example, a caption is only collected if its Extroversion probability is greater than 80% and all other trait probabilities are less than 30%.
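The acceptance rule for a persona can be sketched as a small filter. The 0.8 and 0.3 cutoffs mirror the example thresholds in the text; the function name and the sample probabilities are hypothetical, and this step is described in the paper as not yet implemented.

```python
def matches_persona(probs, high_traits, high=0.8, low=0.3):
    """Accept a caption only if every trait falls on the persona's side.

    probs       -- dict of trait name -> classifier probability (0..1)
    high_traits -- traits the persona should exhibit strongly
    high, low   -- cutoffs taken from the example thresholds in the text
    """
    for trait, p in probs.items():
        if trait in high_traits:
            if p <= high:
                return False  # target trait is not strong enough
        elif p >= low:
            return False      # a non-target trait is too prominent
    return True


# Hypothetical classifier output for one generated caption.
caption_probs = {
    "Openness": 0.10,
    "Conscientiousness": 0.20,
    "Extroversion": 0.90,
    "Agreeableness": 0.25,
    "Neuroticism": 0.05,
}
accepted = matches_persona(caption_probs, {"Extroversion"})

# The same caption fails once Agreeableness rises above the low cutoff.
rejected = matches_persona({**caption_probs, "Agreeableness": 0.5},
                           {"Extroversion"})
```

Because most generated captions will fail a filter this strict, several rounds of generation per image would likely be needed to fill out a persona.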

One potential issue with this implementation is that there will be little narrative captured between posts. In the current version of Truman, smaller narratives can take place over posts and in comments (for example, two personas get into a fight in the comments, with one declaring that the other is blocked and not interacting with them anymore). Following DiFranzo's example, clever design might help reduce participant skepticism about the simulated platform's posts. For example, if the social media is framed such that other users are posting whatever they are observing at certain specified times, then a participant may be more willing to forgive a lack of cohesion in the posts shown. We can also reduce this skepticism by constraining the class of pictures that each persona can use. For example, a persona that is supposed to be in New York could have several photos only from New York, with a relatively consistent tone to the photos. Careful photo selection and clever design can thus help strengthen the illusion of the personas.

Another potential issue with implementation is that the current model doesn't account for other attributes, as mentioned in the introduction (views, comments, likes, posting behavior, etc.). However, these could be set post hoc for each persona. For example, any personas that score high on Extroversion might automatically receive more likes and have more auto-generated comments on their posts (again using GPT-2 to generate social media-like posts, except by using the GPT-2 generated post text as the prompt). These sorts of implementation changes would help reduce the amount of necessary human intervention while still preserving most of the useful aspects of personality research. This would be very useful for social media research, as one could easily tune what the social media looks like through the personas' exhibited behavior. For instance, a study based on this platform could examine how high versus low Agreeableness in personas might influence participants' willingness to troll or bully personas. By keeping other patterns the same and only flipping a certain switch, one could therefore experimentally test the effect of Agreeableness on perceptions and actions within social media: if everyone on the social media seems to be a jerk, can we show that a new user is more likely to act like one too?
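As a rough illustration of setting engagement post hoc, the sketch below gives high-Extroversion personas a fixed multiplier on likes and comments. Every number here (the base range, the multiplier, the 0.8 cutoff) is an invented placeholder rather than a value from the paper, and the function name is hypothetical.

```python
import random


def engagement_for(persona_probs, rng, base_likes=5, boost=3.0, threshold=0.8):
    """Assign like and comment counts to one post after generation.

    All parameters are illustrative assumptions: personas whose Extroversion
    probability exceeds `threshold` get `boost` times the baseline engagement.
    """
    is_extrovert = persona_probs.get("Extroversion", 0.0) > threshold
    multiplier = boost if is_extrovert else 1.0
    likes = int(rng.randint(base_likes, 2 * base_likes) * multiplier)
    comments = int(rng.randint(0, 3) * multiplier)
    return {"likes": likes, "comments": comments}


# Using the same seed for both personas isolates the effect of the trait.
outgoing = engagement_for({"Extroversion": 0.9}, random.Random(0))
reserved = engagement_for({"Extroversion": 0.1}, random.Random(0))
```

Seeding the generator keeps each participant's view of the platform reproducible, which matters when the same stimuli must be shown across conditions.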

Finally, the current implementation is based upon personality, as that is the example use case we decided to build for. However, this could easily be tweaked for other potential use cases depending on the research goals of the study. For example, if one wishes to study whether a reader's understanding of bridging social capital is different from their understanding of bonding social capital (as studies of social capital are common in the space of social media research [7, 2, 22]), then all that is needed is to train a different classifier. If one has a corpus of posts that are tagged correctly, then it would be easy to train a model to classify whatever one wishes to study. Other research goals could be achieved through Truman as well. If one is interested in how different affordances shape a user's understanding of the social media (such as ephemerality in Snapchat [26, 17, 19]), then they are able to encode said affordances into Truman. Do people really act worse if the only difference is anonymity? Alter Truman and provide the exact same posts to each participant. The level of control gained for researchers through the machine learning models and Truman allows for a wide array of studies to be created.

FUTURE WORK

Our short-term goal for this project would be to redesign the existing implementation of Truman and experimentally test it against an equivalent version using the machine learning method as outlined above. We could measure participants on the correct identification of personality attributes and whether they chose to post or interact with any agents, as well as self-report data on how easy it was to understand other users, how coherent or believable their posts seemed, and so on. This could also be compared to a baseline using real social media data to see how both methods rank against one another. Further research could then investigate the effects of altering certain social variables on the platform. If we manipulate the presence or absence of Agreeableness in a group of social media posts, we could see how this potentially affects participants by measuring log-file-style data (number of log-ins, posts, comments, content of comments) as well as participant self-report ("On a scale of 1-5, how annoyed were you by the other posters?") and qualitative data. This would help present a richer picture of how social mores or social presentation might influence social media behaviors in an empirical study with fairly high ecological validity.

In the long term, making this system available for other researchers would be an important contribution. For instance, many traditional methods and tools for social science research struggle at the intersection of human experience and technology because the breadth of knowledge, experience, and fine-tuned methodologies cannot readily be applied. Supporting personality attributes is itself only a fractional area, but it is not hard to imagine that this workflow and set of tools could be re-implemented as a proof of concept to assist in other research domains.

LIMITATIONS

Our method presents a flexible framework for allowing social media researchers to begin to build their own empirical research plans. However, the current implementation has some issues. One potential issue is that the GPT-2 model and the multi-label classifier are trained on slightly different social media, and the output is formatted for a third type of social media. This issue could lead to potential conflicts in the overall effectiveness of the stages. For example, the GPT-2 model is trained on Twitter data, which is heavily text-based and places tight constraints on messages (including a strict character limit) [27, 13]. Facebook is relatively more picture-focused, although the posts the classifier was trained on don't involve images themselves. Finally, the visual output of Truman is formatted to be similar to Instagram, which is heavily image-based, with the caption offering supporting context or a shift of frame. Although GPT-2 is flexible, it could be that the textual nature of Twitter would lead to a different output than if GPT-2 were to be fine-tuned on an equivalent corpus of Instagram posts. Future research would be advised to verify whether datasets are justifiably equivalent or, where possible, create their own unique datasets (using human coders for a new classifier dataset, for example).

CONCLUSION

Overall, this project was successful in accomplishing its broader goals. We were able to take an image, create a relevant caption, and rate the caption on personality using only machine learning models with minimal human writing. This is a promising start for a set of tools that may be used by current or future researchers within the social media space. Future work along this path would need to clean up, streamline, and make the overall experience more user-friendly for less technical researchers. Furthermore, making the set of tools widely available would be a useful task in itself, as other researchers would be able to conduct empirical research and test long-held ideas about online socialization. Other future work would be to start implementation of one or more example studies so as to evaluate how useful the general system is. We aim to continue to build upon these tools in order to provide studies for other researchers and develop further experiments for our own research goals.

REFERENCES[1] Yair Amichai-Hamburger and Gideon Vinitzky 2010

Social network use and personality Computers inHuman Behavior 26 6 (Nov 2010) 1289ndash1295 DOIhttpdxdoiorg101016jchb201003018

[2] Moira Burke and Robert E Kraut 2014 Growing closeron facebook Proceedings of the 32nd annual ACMconference on Human factors in computing systems -CHI rsquo14 (2014) 4187ndash4196 DOIhttpdxdoiorg10114525562882557094 ISBN9781450324731

[3] J Clement 2020 Global social media ranking 2019(Feb 2020)httpswwwstatistacomstatistics272014

global-social-networks-ranked-by-number-of-users

[4] Dominic DiFranzo Samuel Hardman TaylorFranccesca Kazerooni Olivia D Wherry and Natalya NBazarova 2018 Upstanding by Design BystanderIntervention in Cyberbullying In Proceedings of the2018 CHI Conference on Human Factors in ComputingSystems - CHI rsquo18 ACM Press Montreal QC Canada1ndash12 DOIhttpdxdoiorg10114531735743173785

[5] John M Digman 1990 Personality StructureEmergence of the Five-Factor Model Annual Review ofPsychology 41 1 (1990) 417ndash440 DOIhttpdxdoiorg101146annurevps41020190002221

arXiv 10111669v3 ISBN 00664308

[6] Nicole Ellison Rebecca Heino and Jennifer Gibbs2006 Managing Impressions Online Self-PresentationProcesses in the Online Dating Environment Journal ofComputer-Mediated Communication 11 2 (2006)415ndash441 DOI

httpdxdoiorg101111j1083-6101200600020x

ISBN 1083-6101

[7] Nicole B Ellison Charles Steinfield and Cliff Lampe2007 The benefits of facebook friends Social capitaland college studentsrsquo use of online social network sitesJournal of Computer-Mediated Communication 12 4(2007) 1143ndash1168 DOIhttpdxdoiorg101111j1083-6101200700367x

ISBN 0521832969

[8] Alec Go Richa Bhayani and Lei Huang 2014 ForAcademics - Sentiment140 (2014)httphelpsentiment140comfor-students

[9] Samuel D Gosling 2009 Snoop What your stuff saysabout you Basic Books -

[10] Samuel D Gosling Adam A Augustine Simine VazireNicholas Holtzman and Sam Gaddis 2011Manifestations of Personality in Online SocialNetworks Self-Reported Facebook-Related Behaviorsand Observable Profile Information CyberpsychologyBehavior and Social Networking 14 9 (Sept 2011)483ndash488 DOIhttpdxdoiorg101089cyber20100087

[11] Rachel Grieve and Jarrah Watkinson 2016 ThePsychological Benefits of Being Authentic on FacebookCyberPsychology Behavior amp Social Networking 19 7(July 2016) 420ndash425 DOIhttpdxdoiorg101089cyber20160010

[12] Jacob B Hirsh and Jordan B Peterson 2009Personality and language use in self-narratives Journalof Research in Personality 43 3 (2009) 524ndash527 DOIhttpdxdoiorg101016jjrp200901006 ISBN0092-6566

[13] David John Hughes Moss Rowe Mark Batey andAndrew Lee 2012 A tale of two sites Twitter vsFacebook and the personality predictors of social mediausage Computers in Human Behavior 28 2 (March2012) 561ndash569 DOIhttpdxdoiorg101016jchb201111001

[14] Oliver P John Laura P Naumann and Cristopher JSoto 2008 Paradigm shift to the integrative Big FiveTrait taxonomy Handbook of personality Theory andresearch (2008) 114ndash158 DOIhttpdxdoiorg101016S0191-8869(97)81000-8

ISBN 978-1-59385-836-0

[15] Michal Kosinski David Stillwell Pushmeet KohliYoram Bachrach and Thore Graepel 2012 Personalityand Website Choice WebSci (2012) 0ndash3 ISBN9781450302678

[16] Tsung-Yi Lin Michael Maire Serge Belongie JamesHays Pietro Perona Deva Ramanan Piotr Dollaacuter andC Lawrence Zitnick 2014 Microsoft coco Commonobjects in context In European conference on computervision Springer - - 740ndash755

[17] Sarah McRoberts Haiwei Ma Andrew Hall andSvetlana Yarosh 2017 Share First Save LaterPerformance of Self through Snapchat Stories InProceedings of the 2017 CHI Conference on HumanFactors in Computing Systems - CHI rsquo17 - -6902ndash6911 DOIhttpdxdoiorg10114530254533025771

[18] Laura P Naumann Simine Vazire Peter J Rentfrowand Samuel D Gosling 2009 Personality JudgmentsBased on Physical Appearance Personality and SocialPsychology Bulletin 35 12 (2009) 1661ndash1671 DOIhttpdxdoiorg1011770146167209346309 ISBN0146-1672

[19] Lukasz Piwek and Adam Joinson 2016 What do theysnapchat about Patterns of use in time-limited instantmessaging service Computers in Human Behavior 54(2016) 358ndash367 DOIhttpdxdoiorg101016jchb201508026

Publisher Elsevier Ltd ISBN 0747-5632

[20] Alec Radford Jeffrey Wu Rewon Child David LuanDario Amodei and Ilya Sutskever 2019 Languagemodels are unsupervised multitask learners OpenAIBlog 1 8 (2019) 9

[21] Christopher J Soto and Oliver P John 2017 The nextBig Five Inventory (BFI-2) Developing and assessing ahierarchical model with 15 facets to enhance bandwidthfidelity and predictive power Journal of Personality andSocial Psychology 113 1 (2017) 117ndash143 DOIhttpdxdoiorg101037pspp0000096

[22] Charles Steinfield Nicole B Ellison and Cliff Lampe2008 Social capital self-esteem and use of onlinesocial network sites A longitudinal analysis Journal ofApplied Developmental Psychology 29 6 (2008)434ndash445 DOIhttpdxdoiorg101016jappdev200807002 arXiv10111669v3 ISBN 0193-3973

[23] Christian Szegedy Vincent Vanhoucke Sergey IoffeJon Shlens and Zbigniew Wojna 2016 Rethinking theInception Architecture for Computer Vision In TheIEEE Conference on Computer Vision and PatternRecognition (CVPR) CHI -

[24] Lee Taber and Steve Whittaker. 2018. Personality Depends on The Medium: Differences in Self-Perception on Snapchat, Facebook and Offline. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI '18. ACM Press, Montreal QC, Canada, 1–13. DOI: http://dx.doi.org/10.1145/3173574.3174181

[25] Tensorflow Tutorials. 2020. Image captioning with visual attention. (2020). https://www.tensorflow.org/tutorials/text/image_captioning

[26] J. Mitchell Vaterlaus, Kathryn Barnett, Cesia Roche, and Jimmy A. Young. 2016. "Snapchat is more personal": An exploratory study on Snapchat behaviors and young adult interpersonal relationships. Computers in Human Behavior 62 (2016), 594–601. DOI: http://dx.doi.org/10.1016/j.chb.2016.04.029 Publisher: Elsevier Ltd.

[27] Sophie F. Waterloo, Susanne E. Baumgartner, Jochen Peter, and Patti M. Valkenburg. 2018. Norms of online expressions of emotion: Comparing Facebook, Twitter, Instagram, and WhatsApp. New Media & Society 20, 5 (May 2018), 1813–1831. DOI: http://dx.doi.org/10.1177/1461444817707349

[28] Max Woolf. 2019a. How To Make Custom AI-Generated Text With GPT-2. (2019). https://minimaxir.com/2019/09/howto-gpt2

[29] Max Woolf. 2019b. Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts. (2019). https://github.com/minimaxir/gpt-2-simple

[30] Max Woolf. 2019c. Train a GPT-2 Text-Generating Model w/ GPU For Free. (2019). https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce#scrollTo=H7LoMj4GA4n_

[31] Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In 2015 IEEE International Conference on Machine Learning. 2048–2057. DOI: http://dx.doi.org/10.1109/ICDMW.2017.51

- Introduction
  - Personality
  - Social Media Research
- Method and Results
  - Image Captioning
  - Social Mediafication
  - OCEAN Classification and Sorting
  - Platform Implementation
- Future Work
- Limitations
- Conclusion
- References


emotional volatility, anxiety, and depression. Someone who rates high on Neuroticism is likely to have mood swings, be a worrier, and/or have depression. Someone who rates low on Neuroticism is likely to be emotionally stable and to be seen as a generally calm person.

All of the above descriptions of the traits describe so-called offline behaviors. However, people can also make predictions about someone's personality from a variety of clues. These can include the spaces where they work and live [9], how they conduct themselves [18], and even their handwriting [12]. Within social media spaces, other researchers have correlated personality traits with specific behaviors. For example, Amichai-Hamburger & Vinitzky [1] correlated OCEAN traits with Facebook behaviors and found that Extroversion correlated with number of friends and negatively correlated with the amount of posted personal information. Openness correlated with the amount of posted personal information as well as the use of certain features within the user's personal info section, such as emoji. Neuroticism was correlated with the likelihood of the user posting a photo on their profile; in addition, both particularly high and particularly low scorers shared more basic info (such as selfies and personal information). This was hypothesized to be the result of emotionally stable people being willing to share more because they are secure, while emotionally volatile people do so when seeking self-assurance.

Gosling et al. [10] also found correlations between OCEAN traits and Facebook behaviors. Extroversion correlated with number of friends, number of friends in local networks, likelihood of maintaining an up-to-date presence, and likelihood of commenting on other pages. Openness correlated with number of friends, shared local networks, and photos of new activities, people, and things. Conscientiousness was negatively correlated with time spent on Facebook: those who are low on Conscientiousness tend to procrastinate more and thus spend more time on social media. People who were high on Agreeableness viewed users' pages more often, including both others' and their own. Personality is a useful test case for this tool, as there is a great deal of existing research on its correlations with social media behavior.

Social Media Research

Our research partly builds on an existing simulated social media platform, named Truman by the original researcher [4]. In DiFranzo et al.'s study, researchers created a simulated social media environment in order to better understand which attributes of the system could potentially influence cyberbullying intervention among college-age students. The social media is simulated because the participant is the only real actor on the platform. Each person who signs in sees their own version of the platform. All other posts that participants see come from actors: scripts that post pictures and comments at pre-specified times. The design of the simulation is also extremely clever. To limit the types of images needed, and to avoid having to show a bunch of photos of people (which would be difficult to simulate for a large number of people), their simulated platform, named EatSnapLove, is essentially a food documentation-based social network. People are only allowed to post pictures of food, and anything else would be automatically deleted, ensuring that participants are encouraged not to post anything crazy or risqué. Truman also employed two hidden variables within its design communication. The first variable decided whether the system indicated that other people could see what the user viewed, and the second chose whether to portray the user's audience as large, small, or vacant. These variables aimed to test the influence of social pressure and/or the bystander effect in cases where users may have seen one actor bullying another within a given post. Sadly, this intervention failed, as the vast majority (~75%) of participants, regardless of condition, simply didn't do anything about bullying that they witnessed.

Figure 1. Example of current human-authored Truman post.
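The mechanics described for the original Truman study can be made concrete with a minimal Python sketch. It is an illustration only, not the platform's actual code: the names `ScriptedPost`, `assign_condition`, `visible_posts`, and `SCRIPT` are hypothetical, the sample posts are invented placeholders, and fully crossing the two hidden variables into six cells is an assumption about the design rather than something stated here.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import product

# Hypothetical encoding of the two hidden variables described in the text.
VIEW_NOTICE = (True, False)              # does the system say others can see what the user viewed?
AUDIENCE = ("large", "small", "vacant")  # how the user's audience is portrayed
# Assumption: the two variables are fully crossed (2 x 3 = 6 cells).
CONDITIONS = list(product(VIEW_NOTICE, AUDIENCE))


@dataclass(frozen=True)
class ScriptedPost:
    """One actor post, scheduled relative to the participant's sign-up."""
    actor: str
    offset: timedelta
    text: str


def assign_condition(rng):
    """Pick one condition cell uniformly at random for a new participant."""
    return rng.choice(CONDITIONS)


def visible_posts(script, signup, now):
    """Return the scripted posts that are due for this participant, oldest first."""
    due = [p for p in script if signup + p.offset <= now]
    return sorted(due, key=lambda p: p.offset)


# Placeholder script; every participant replays the same actor posts
# on their own clock, so each sees a private copy of the platform.
SCRIPT = [
    ScriptedPost("actor_a", timedelta(hours=1), "Pancakes for breakfast"),
    ScriptedPost("actor_b", timedelta(hours=5), "Trying a new ramen place"),
    ScriptedPost("actor_a", timedelta(days=1), "Leftovers again"),
]

signup = datetime(2020, 1, 1, 9, 0)
condition = assign_condition(random.Random(0))
feed = visible_posts(SCRIPT, signup, signup + timedelta(hours=6))
print(condition, [p.text for p in feed])
```

Scheduling posts as offsets from each participant's sign-up time, rather than as absolute timestamps, is one simple way to give every participant the same scripted experience regardless of when they join.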

While Truman's original experiment was an unfortunate failure, the underlying systems remain amazingly robust for conducting empirical research. As an early test of this system, one of the authors wanted to understand how social media users use aspects of other users' behaviors to evaluate their personalities. To this end, the author used previous research on personality correlates of social media behaviors [1, 10] to generate several personas, each attempting to portray one pole of one of the Big 5 dimensions (see Figure 1 for an example of a human-authored post). Participants in this version were asked to read posts from these and other personas over a period of three days and, at the end, guess which mask each persona was wearing (visit https://truman.herokuapp.com to see for yourself). Personas would use as many attributes of the post format as possible to convey their personality traits. For example, the high-Extroversion persona made more posts, commented frequently on other posts, solicited interaction in