Computation and Minimax Risk
• The most challenging topic…
• Some recent progress:
– tradeoffs between time and accuracy via convex relaxations (Chandrasekaran & Jordan, 2013)
– constraints on computation via optimization oracles (Duchi, McMahan & Jordan, 2014)
– parallelization via optimistic concurrency control (Pan, et al., 2014)
Concurrency Control for Distributed Machine Learning
Michael I. Jordan
University of California, Berkeley
(with Xinghao Pan, Joseph Gonzalez, Stefanie Jegelka, Tamara Broderick and Joseph Bradley)
Distributed Computing Meets Large-Scale Statistical Inference
• In many areas of statistics, parallel/distributed approaches are increasingly essential (e.g., to provide time/sample tradeoffs)
• Many methods, either optimization-based or integration-based, involve exploring models having variable structure
• Leading to a core problem: how to ensure that statistical consistency and coherence are maintained when multiple processors are making structural changes to a model?
Serial Inference
[Diagram: Data, Model State]
Coordination-Free Parallel Inference
[Diagram: Data, Model State, Processor 1, Processor 2]
Keep Calm and Carry On.
[Figure: Accuracy (Low to High) versus Scalability (Low to High), locating the Serial and Coordination-free approaches]
Concurrency Control
Database mechanisms
• Guarantee correctness
• Maximize concurrency
– Mutual exclusion
– Optimistic CC
Mutual Exclusion Through Locking
[Diagram: Data, Model State, Processor 1, Processor 2]
• Introducing locking (scheduling) protocols to identify potential conflicts.
• Enforce serialization of computation that could conflict.
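A minimal sketch of mutual exclusion through locking, assuming the model state is a single shared counter (the names `model_state`, `state_lock`, `worker`, and `run` are hypothetical and not from the slides). Every update that could conflict is forced through one lock, so the processors take turns:

```python
import threading

# Hypothetical shared model state: one counter standing in for the model.
model_state = {"count": 0}
state_lock = threading.Lock()


def worker(num_updates):
    """Apply updates to the shared state, one at a time under the lock."""
    for _ in range(num_updates):
        # Only one thread can hold the lock, so the conflicting
        # read-modify-write steps are serialized.
        with state_lock:
            model_state["count"] += 1


def run(num_threads=2, num_updates=10_000):
    """Start the workers, wait for them, and return the final count."""
    model_state["count"] = 0
    threads = [threading.Thread(target=worker, args=(num_updates,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model_state["count"]
```

The result is always correct, but each update waits for the lock even when no conflict would have occurred, which is the cost that the optimistic approach tries to avoid.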
Optimistic Concurrency Control
[Diagram: Data, Model State, Processor 1, Processor 2]
• Allow computation to proceed without blocking.
• Validate potential conflicts.
– Valid outcome
– Invalid outcome
• Take a compensating action.
– Amend the value
– Rollback and redo
Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981.
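The compute, validate, and compensate cycle can be sketched as follows. This is an illustrative toy, not the scheme from Kung & Robinson or from the talk: the class `VersionedState`, the version counter, and the retry loop are assumptions of the sketch, and the short commit step is still guarded by a lock so that validation is atomic.

```python
import threading


class VersionedState:
    """Hypothetical shared state with a version counter used for validation."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # guards only read and commit

    def read(self):
        # Take a consistent snapshot of (value, version).
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, new_value, read_version):
        # Validation: succeed only if nobody committed since our read.
        with self._commit_lock:
            if self.version != read_version:
                return False  # invalid outcome
            self.value = new_value
            self.version += 1
            return True  # valid outcome


def optimistic_update(state, fn):
    """Apply fn to the state; on failed validation, roll back and redo."""
    retries = 0
    while True:
        value, version = state.read()
        proposal = fn(value)  # computed without holding any lock
        if state.try_commit(proposal, version):
            return retries
        retries += 1  # compensating action: discard the proposal and retry


def run(num_threads=4, num_updates=500):
    """Run concurrent optimistic increments and return the final value."""
    state = VersionedState(0)

    def worker():
        for _ in range(num_updates):
            optimistic_update(state, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.value
```

The compensating action here is rollback and redo. Amending the value in place, the other option on the slides, would replace the retry with a problem-specific correction.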
• Non-blocking computation (concurrency)
• Validation: identify errors (accuracy)
• Resolution: correct errors (accuracy)
Requirements:
• Validation must be fast
• Resolution must be infrequent
Concurrency Control
Coordination Free:
Provably fast and correct under key assumptions.
Concurrency Control:
Provably correct and fast under key assumptions.
Systems Ideas to Improve Efficiency
Examples
• Clustering: DP-means
• Submodularity: Double Greedy
• Bayesian Nonparametrics: Chinese Restaurant Process
[Figures: keywords A to H and queries 1 to 8, annotated with costs and values; parameters θ1 to θ6 and ϕ1 to ϕ4]
Clustering with DP-means
Bayesian Nonparametrics Meets Optimization
• A methodology whereby optimization functionals arise when “small-variance asymptotics” are applied to Bayesian models based on combinatorial stochastic process priors
• Inspiration: the venerable, scalable K-means algorithm can be derived as the limit of an Expectation-Maximization algorithm for fitting a mixture model
• We do something similar in spirit, taking limits of various Bayesian nonparametric models:
– Dirichlet process mixtures
– hierarchical Dirichlet process mixtures
– beta processes and hierarchical beta processes
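As a reminder of the K-means limit mentioned above, here is one standard way to write it. The notation is an assumption of this note rather than something taken from the slides: a Gaussian mixture with mixing weights \pi_k, means \mu_k, shared covariance \sigma^2 I, and a unique nearest mean for each point. The second line is the penalized objective that Kulis and Jordan (2012) obtain from the analogous limit of a Dirichlet process mixture, where \ell_k denotes the set of points in cluster k.

```latex
% EM responsibilities harden into nearest-center assignments as the variance vanishes
\gamma_{ik}
  = \frac{\pi_k \exp\!\left(-\lVert x_i - \mu_k \rVert^2 / 2\sigma^2\right)}
         {\sum_{j} \pi_j \exp\!\left(-\lVert x_i - \mu_j \rVert^2 / 2\sigma^2\right)}
  \;\xrightarrow{\;\sigma^2 \to 0\;}\;
  \begin{cases}
    1 & \text{if } k = \arg\min_j \lVert x_i - \mu_j \rVert^2, \\
    0 & \text{otherwise.}
  \end{cases}

% DP-means objective: K-means cost plus a penalty \lambda per cluster
\min_{K,\,\{\ell_k\}_{k=1}^{K}} \;
  \sum_{k=1}^{K} \sum_{x \in \ell_k} \lVert x - \mu_k \rVert^2 + \lambda K,
\qquad
\mu_k = \frac{1}{\lvert \ell_k \rvert} \sum_{x \in \ell_k} x .
```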
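DP-Means Algorithm [Kulis and Jordan, ICML’12]
• Computing cluster membership: assign each point to its nearest cluster center; a point farther than the threshold λ from every existing center starts a new cluster.
• Updating cluster centers: move each center to the mean of the points assigned to it.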
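The two steps above can be sketched serially in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: λ is compared with the squared Euclidean distance, the run starts from a single center at the global mean, and the helper names `sq_dist`, `mean`, and `dp_means` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def dp_means(data, lam, max_iters=100):
    """Serial DP-means sketch; lam is a squared-distance threshold (assumption)."""
    centers = [mean(data)]  # start with one global cluster
    assign = [0] * len(data)
    for _ in range(max_iters):
        changed = False
        # Step 1: compute cluster membership.
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                centers.append(tuple(x))  # open a new cluster at x
                k = len(centers) - 1
            if assign[i] != k:
                assign[i] = k
                changed = True
        # Step 2: update each center to the mean of its members,
        # dropping clusters that ended up empty and renumbering the rest.
        new_centers, remap = [], {}
        for k in range(len(centers)):
            members = [x for x, z in zip(data, assign) if z == k]
            if members:
                remap[k] = len(new_centers)
                new_centers.append(mean(members))
        centers = new_centers
        assign = [remap[z] for z in assign]
        if not changed:
            break
    return centers, assign
```

For example, `dp_means([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], lam=9)` separates the two well-spaced groups into two clusters. Note that the number of clusters is not fixed in advance; it is controlled by λ.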
DP-Means Parallel Execution
Computing cluster membership in parallel:
[Diagram: CPU 1 and CPU 2 each propose a new cluster center; the two proposals are less than λ apart]
• Cannot introduce overlapping clusters in parallel
Optimistic Concurrency Control for Parallel DP-Means
[Diagram: CPU 1 and CPU 2 propose new cluster centers less than λ apart]
• Optimistic Assumption: No new cluster created nearby
• Validation: Verify that new clusters don’t overlap
• Resolution: Assign new cluster center to existing cluster
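One membership pass with this assumption, validation, and resolution might look like the sketch below. It is a simplified illustration and not the implementation evaluated in the talk: the workers are threads in one process, the threshold λ is compared with the squared distance, validation runs serially on a single master, and the function names `worker_step`, `validate`, and `occ_epoch` are hypothetical. The center-update step that would follow each pass is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def worker_step(points, centers, lam):
    """Optimistic phase: assign against a snapshot and propose new centers."""
    assigned, proposals = [], []
    for x in points:
        if centers:
            k = min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            if sq_dist(x, centers[k]) <= lam:
                assigned.append((x, k))
                continue
        # Optimistic assumption: no other worker creates a cluster nearby.
        proposals.append(x)
    return assigned, proposals


def validate(proposals, centers, lam):
    """Serial validation and resolution; appends accepted centers in place."""
    first_new = len(centers)
    assigned, rejected = [], 0
    for x in proposals:
        new_ids = range(first_new, len(centers))
        k = min(new_ids, key=lambda j: sq_dist(x, centers[j]), default=None)
        if k is not None and sq_dist(x, centers[k]) <= lam:
            # Resolution: the proposal overlaps a center accepted in this
            # pass, so assign the point to that existing cluster instead.
            assigned.append((x, k))
            rejected += 1
        else:
            centers.append(x)  # validated: accept as a new center
            assigned.append((x, len(centers) - 1))
    return assigned, rejected


def occ_epoch(data, centers, lam, num_workers=2):
    """One parallel membership pass followed by serial validation."""
    centers = list(centers)
    snapshot = list(centers)
    chunks = [data[w::num_workers] for w in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda c: worker_step(c, snapshot, lam), chunks))
    assigned = [pair for a, _ in results for pair in a]
    proposals = [x for _, p in results for x in p]
    resolved, rejected = validate(proposals, centers, lam)
    return centers, assigned + resolved, rejected
```

Only the proposals reach the serial validation step, so when clusters are well spaced and new clusters are rare, most of the work stays in the parallel phase.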
Concurrency Control for DP-means
Correctness
• Theorem: OCC DP-means is serializable, i.e., equivalent to some sequential execution.
• Corollary: OCC DP-means preserves the theoretical properties of DP-means.
Concurrency
• Theorem: Assuming well-spaced clusters, the expected overhead of OCC DP-means, in terms of the number of rejected proposals, does not depend on the size of the data set.
Empirical Validation Failure Rate
[Figure: OCC overhead (points failing validation) versus dataset size, for λ-separable clusters; curves for 2, 4, 8, 16, and 32 processors]
Independence of dataset size
Empirical Validation Failure Rate
[Figure: OCC overhead (points failing validation) versus dataset size, for overlapping clusters; curves for 2, 4, 8, 16, and 32 processors]
Weak dependence on dataset size
Distributed Evaluation: Amazon EC2
[Figure: runtime in seconds per complete pass over the data (0 to 3500) versus number of machines (1 to 8); OCC DP-means runtime compared with projected linear scaling]
• 2x #machines ≈ ½x runtime
• ~140 million data points; 1, 2, 4, 8 machines
Summary
Approach              Accuracy                            Scalability
Sequential            Appealing theoretical properties    Little
Coordination-free     Approximate, under assumptions      Always fast
Concurrency Control   Always correct                      Good, under assumptions
• Coordination-free approach guarantees speed, and analysis focuses on showing accuracy under assumptions.
• Our approach guarantees accuracy, and analysis focuses on showing speed under assumptions.
Conclusions
• Many conceptual and mathematical challenges arising in taking seriously the problem of “Big Data”
• Facing these challenges will require a rapprochement between computer science and statistics, bringing them together at the level of their foundations – thus reshaping both disciplines