Date post: | 29-Jan-2016 |
Upload: | leo-lester |
Crowdsourcing Insights with Opinion Space
Ken Goldberg, IEOR, School of Information, EECS, UC Berkeley
“We’re moving from an Information Age to an Opinion Age.” - Warren Sack, UCSC
Motivation
Goals
• Engage community
• Understand community
– Solicit input
– Understand the distribution of viewpoints
– Discover insightful comments
Goals of Community Members
• Understand relationship to other community members
• Participate, express ideas, and be heard
• Encounter a diversity of viewpoints
Motivation
Classic approach: surveys, polls
Drawbacks: limited samples, slow, doesn’t increase engagement
Modern approach: online forums, comment lists
Drawbacks: data deluge, cyberpolarization, hard to discover insights
Approach: Visualization
Approach: Level the Playing Field
Approach: Wisdom of Crowds
Related Work: Visualization
Clockwise, starting from top left:
Morningside Analytics, MusicBox, Starry Night
Related Work: Politics
Clockwise, starting from top left:
EU Profiler, Poligraph, The How Progressive Are You? quiz
Related Work: Opinion Sharing
• Polling & Opinion Mining
– Fishkin, 1991: deliberative polling
– Dahlgren, 2005: Internet & the public sphere
– Berinsky, 1999: understanding public opinion
– Pang & Lee, 2008: sentiment analysis
• Increasing Participation
– Bishop, 2007: theoretical framework
– Brandtzaeg & Heim: user study
– Ludford et al, 2004: uniqueness & group dissimilarity
Related Work: Info Filtering
• K. Goldberg et al, 2001: Eigentaste
• E. Bitton, 2009: spatial model
• Polikar, 2006: ensemble learning
Opinion Space: Live Demonstration
Six 50-minute Learning Object Modules, preparation materials, slides for in-class lectures, discussion ideas, hands-on activities, and homework assignments.
To try it: google “opinion space”
Contact us: http://goldberg.berkeley.edu
Dimensionality Reduction
Figure panels: low-variance projection; maximal-variance projection
Dimensionality Reduction
Principal Component Analysis (PCA)
• Assumes independence and linearity
• Minimizes squared error
• Scalable: compute position of new user in constant time
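The scalability point above can be illustrated with a minimal NumPy sketch. It assumes each user is a vector of k ratings in [-1, 1]; the data, the sizes (200 users, 5 propositions), and the function names `fit_pca` and `project_user` are hypothetical and are not taken from the Opinion Space code. Once the projection is fitted, placing a new user costs one k-by-2 product, which does not depend on the number of existing users.

```python
import numpy as np

def fit_pca(ratings, n_components=2):
    """Fit PCA on an (n_users, k) rating matrix; return the mean and projection."""
    mean = ratings.mean(axis=0)
    centered = ratings - mean
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components].T  # shapes: (k,), (k, n_components)

def project_user(new_ratings, mean, components):
    """Place one new user on the map; the cost is independent of n_users."""
    return (new_ratings - mean) @ components

# Hypothetical data: 200 users rating 5 propositions on a [-1, 1] scale.
rng = np.random.default_rng(0)
ratings = rng.uniform(-1.0, 1.0, size=(200, 5))
mean, components = fit_pca(ratings)

# A new user arrives: only the stored mean and projection are needed.
xy = project_user(np.array([0.3, -0.8, 0.1, 0.5, -0.2]), mean, components)
```

In this sketch the map is refitted only when the projection itself should change; new arrivals reuse the stored `mean` and `components`.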
Canonical Correlation Analysis
• 2-view PCA
• Assume:
– Each data point has a latent low-dim canonical representation z
– Observe two different representations of each data point (e.g. numerical ratings and text)
• Learn MLEs for low-rank projections A and B
• Equivalently, pick projection that maximizes correlation between views
Graphical model for CCA: z → x, z → y
$x = Az + \varepsilon$, $y = Bz + \varepsilon$
$z = A^{-1}x = B^{-1}y$
CCA on Opinion Space
• Each user is a data point
– $x_i$ = user i’s responses to propositions
– $y_i$ = vector representation of textual comment
• Run CCA to find A and B, use $A^{-1}$ to find 2D representation
• Position of users reflects rating vector and textual response
• Ignores ratings that are not correlated with text, and vice versa
• Given text, can predict ratings (using B)
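As a rough illustration of the two-view idea, here is a sketch of classical CCA using whitening followed by an SVD. This is the correlation-maximizing formulation, not the probabilistic MLE fit mentioned in the slides, and everything in it is an assumption made for the example: the synthetic ratings `x` and text vectors `y`, the noise level, the regularizer `reg`, and the function names. The matrix `wx` plays the role the slides assign to $A^{-1}$: it maps a rating vector to a 2D position.

```python
import numpy as np

def inv_sqrt(matrix):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(x, y, n_components=2, reg=1e-6):
    """Classical CCA: returns projections for x and y and the canonical correlations."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Small ridge term (an assumption of this sketch) keeps the covariances invertible.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx_white, wy_white = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, corr, vt = np.linalg.svd(wx_white @ cxy @ wy_white)
    wx = wx_white @ u[:, :n_components]
    wy = wy_white @ vt[:n_components].T
    return wx, wy, corr[:n_components]

# Synthetic two-view data generated from a shared 2D latent z (hypothetical).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
a = rng.normal(size=(5, 2))   # ratings view: 5 propositions
b = rng.normal(size=(8, 2))   # text view: 8-dimensional comment vector
x = z @ a.T + 0.1 * rng.normal(size=(500, 5))
y = z @ b.T + 0.1 * rng.normal(size=(500, 8))

wx, wy, corr = cca(x, y)
positions = (x - x.mean(axis=0)) @ wx  # 2D map position of each user
```

Directions in the ratings that have no counterpart in the text receive low canonical correlation and are left out of the first two components, which matches the "ignores ratings that are not correlated with text" bullet.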
Multidimensional Scaling
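• Goal: rearrange objects in a low-dimensional space so as to reproduce the distances in the higher-dimensional space
• Strategy: rearrange and compare solutions, maximizing goodness of fit (i.e., minimizing the stress):
$\sum_{i,j} \left( d_{ij} - f(\delta_{ij}) \right)^2$
where $\delta_{ij}$ is the observed dissimilarity between objects i and j and $d_{ij}$ is their distance in the low-dimensional map
• Can use any kind of similarity function
• Pros
– Data need not be normal, relationships need not be linear
– Tends to yield fewer factors than factor analysis (FA)
• Con: slow, not scalable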
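One way to carry out the "rearrange and compare" strategy is the SMACOF iteration (repeated Guttman transforms), sketched below with f taken as the identity. The slides do not say which MDS solver was used, so treat this as one possible choice; the data and the names `pairwise_distances`, `stress`, and `mds_smacof` are hypothetical. Each iteration needs all pairwise distances, which is the source of the "slow, not scalable" drawback.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n, dim) array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(delta, points):
    """Raw stress: squared mismatch between map distances and delta, each pair once."""
    return ((pairwise_distances(points) - delta) ** 2).sum() / 2.0

def mds_smacof(delta, init, n_iter=200):
    """Reduce stress with repeated Guttman transforms, starting from init."""
    y = init.copy()
    n = delta.shape[0]
    for _ in range(n_iter):
        d = pairwise_distances(y)
        # ratio_ij = delta_ij / d_ij, with 0 on the diagonal where d_ij == 0.
        ratio = np.divide(delta, d, out=np.zeros_like(delta), where=d > 0)
        b = -ratio
        b[np.diag_indices(n)] = ratio.sum(axis=1)
        y = b @ y / n
    return y

# Hypothetical data: 30 objects in 5 dimensions, mapped down to 2.
rng = np.random.default_rng(0)
high_dim = rng.normal(size=(30, 5))
delta = pairwise_distances(high_dim)
init = rng.normal(size=(30, 2))
embedding = mds_smacof(delta, init)
```

Comparing `stress(delta, init)` with `stress(delta, embedding)` shows how much the rearrangement improved the fit.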
Kernel-based Nonlinear PCA
• Intuition: in general, can’t linearly separate n points in d < n dim, but can almost always do so in d ≥ n dim
• Method: compute covariance matrix after transforming data into higher dim space:
$C = \frac{1}{m} \sum_{j=1}^{m} \Phi(x_j)\,\Phi(x_j)^{T}$
• Kernel trick used to improve complexity
• If Φ is the identity, Kernel PCA = PCA
Kernel-based Nonlinear PCA
• Pro: Good for finding clusters with arbitrary shape
• Cons: Need to choose appropriate kernel (no unique solution); does not preserve distance relationships
Figure panels: input data; KPCA output with Gaussian kernel
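A compact sketch of kernel PCA follows, working only with the kernel matrix so that Φ is never formed explicitly. The two-ring data, the `gamma` value, and the function names are assumptions for illustration. The last lines check numerically the statement that with Φ equal to the identity (a linear kernel), kernel PCA reproduces ordinary PCA scores up to sign.

```python
import numpy as np

def gaussian_kernel(x, gamma=1.0):
    """K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kernel_pca(k, n_components=2):
    """Project the training points using a precomputed kernel matrix k."""
    n = k.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Center the data in feature space without computing Phi.
    kc = k - one_n @ k - k @ one_n + one_n @ k @ one_n
    vals, vecs = np.linalg.eigh(kc)
    top = np.argsort(vals)[::-1][:n_components]
    # Score of point i on a component is sqrt(eigenvalue) * eigenvector_i.
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

rng = np.random.default_rng(0)

# Hypothetical data: two concentric rings, a cluster shape linear PCA cannot untangle.
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.where(np.arange(100) < 50, 1.0, 3.0)
rings = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
rings += 0.05 * rng.normal(size=rings.shape)
z_gauss = kernel_pca(gaussian_kernel(rings, gamma=1.0))

# Linear kernel (Phi = identity): should match ordinary PCA scores up to sign.
data = rng.normal(size=(100, 3)) * np.array([3.0, 2.0, 1.0])
z_linear = kernel_pca(data @ data.T)
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pca_scores = centered @ vt[:2].T
```

The result for the rings depends strongly on `gamma`, which is the "need to choose appropriate kernel" drawback in practice.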
Stochastic Neighbor Embedding
• Converts Euclidean dists to conditional probabilities
• $p_{j|i}$ = Pr($x_i$ would pick $x_j$ as its neighbor | neighbors picked according to their density under a Gaussian centered at $x_i$)
• Compute similar prob $q_{j|i}$ in lower dim space
• Goal: minimize mismatch between $p_{j|i}$ and $q_{j|i}$:
$C = \sum_i KL(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$
• Cons: tends to crowd points in center of map; difficult to optimize
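The cost above can be computed directly, as in the sketch below. Two simplifications are assumptions of this example: a single fixed bandwidth `sigma` is used for every point (SNE normally tunes one bandwidth per point from a target perplexity), and no optimizer is included, only the cost for a given embedding. The data and the names `conditional_probs` and `sne_cost` are hypothetical.

```python
import numpy as np

def conditional_probs(points, sigma=1.0):
    """Row i holds p_{j|i}: a softmax over Gaussian affinities centered at point i."""
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point never picks itself
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def sne_cost(p, q, eps=1e-12):
    """Sum over i of KL(P_i || Q_i), skipping the diagonal entries."""
    mask = ~np.eye(p.shape[0], dtype=bool)
    return float(np.sum(p[mask] * np.log((p[mask] + eps) / (q[mask] + eps))))

# Hypothetical data: 50 points in 10 dimensions and a random 2D embedding.
rng = np.random.default_rng(0)
high_dim = rng.normal(size=(50, 10))
embedding = rng.normal(size=(50, 2))

p = conditional_probs(high_dim, sigma=1.0)
# In the map, SNE uses exp(-||y_i - y_j||^2), i.e. sigma = 1 / sqrt(2).
q = conditional_probs(embedding, sigma=1.0 / np.sqrt(2.0))
cost = sne_cost(p, q)
```

An optimizer would move the rows of `embedding` to lower `cost`; the cost is zero only when every $q_{j|i}$ matches $p_{j|i}$.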
Metavid
Opinion Space: Crowdsourcing Insights
Scalability: n Participants, n Viewpoints
n² Peer to Peer Reviews
Viewpoints are k-Dimensional
Dim. Reduction: 2D Map of Affinity/Similarity
Insight vs. Agreement: Nonlinear Scoring
Ken Goldberg, UC Berkeley
Alec Ross, U.S. State Dept
Opinion Space
Wisdom of Crowds: Insights are Rare
Scalable, Self-Organizing, Spatial Interface
Visualize Diversity of Viewpoints
Incorporate Position into Scoring Metrics
Ken Goldberg, UC Berkeley