Computational Models of Discourse Analysis
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Warm-Up
How would you rate the new girl and the Indian blogger on these scales?
And why?
Warm-Up Discussion
We have read two theory papers so far in this unit:
The first paper was about what style of reference says about identification with a community and with an interlocutor (as an ingroup member or not)
The second paper was about how the use of time in a narrative reveals the writer's self-concept and projected reader
How do these issues relate to the aspects of personality covered in the Gill paper?
Based on that comparison, do these numbers make sense?
Further Discussion
What research question were the authors trying to answer?
Why did they choose LIWC? Do you think this was a reasonable approach?
How much of personality, as it is revealed through text, do you conclude is captured by LIWC features?
Notice that the author cited a lot of prior work from his own lab or from people who used a similar methodology
Background on LIWC
Developed by Pennebaker
Used frequently in medical informatics
Usually applied to highly controlled data
Isolates as much as possible the variable being examined
Is the blog data controlled in the right way?
* Connection with subpopulations / domain adaptation
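For readers unfamiliar with how LIWC-style features are produced: LIWC reports, for each text, the percentage of its words that fall into each of its word categories, with dictionary entries ending in "*" matching any word sharing that prefix. The sketch below illustrates that counting scheme only. It assumes a tiny, hypothetical dictionary (TOY_DICTIONARY, with made-up category names and word lists), not the real LIWC lexicon, which is licensed, and a simple regular-expression tokenizer; `category_percentages` is an illustrative helper, not part of any LIWC tool.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; NOT the real LIWC lexicon.
# An entry ending in "*" matches any word that starts with the preceding prefix.
TOY_DICTIONARY = {
    "i": ["i", "me", "my", "mine", "myself"],
    "we": ["we", "us", "our", "ours", "ourselves"],
    "posemo": ["happ*", "love*", "good", "great"],
    "negemo": ["sad*", "hate*", "bad", "awful"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """Return True if a word matches a dictionary entry, honoring '*' prefixes."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Percentage of the text's words that fall into each category.

    A word may count toward several categories, and every category is
    reported even when its count is zero.
    """
    words = tokenize(text)
    if not words:
        return {cat: 0.0 for cat in dictionary}
    counts = Counter()
    for word in words:
        for cat, entries in dictionary.items():
            if any(matches(word, entry) for entry in entries):
                counts[cat] += 1
    return {cat: 100.0 * counts[cat] / len(words) for cat in dictionary}


if __name__ == "__main__":
    print(category_percentages("I love my new school but I hate the bus"))
```

On the ten-word example sentence, "I", "my", and "I" fall in the toy "i" category (30%), "love" in "posemo" (10%), and "hate" in "negemo" (10%). Because each feature is a relative word frequency, word order and context are ignored, which is worth keeping in mind when judging what such features can capture.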
Online LIWC Assessment
What would we conclude?
The new girl is more neurotic
The Indian blogger is a little more extroverted
The Indian blogger is a little more open
The new girl is more conscientious
The new girl is more agreeable
* Would you expect a machine learning model with these features to work well?
What features might you try instead?
Announcement
For Monday
Analyze one of the two blog posts from this perspective (pp. 92-135)
Questions?