Source: alinlab.kaist.ac.kr/resource/lookahead_slide.pdf · 2020. 9. 2.

Lookahead: A Far-sighted Alternative of Magnitude-based Pruning

Speaker: Sejun Park

Joint work with Jaeho Lee, Sangwoo Mo, and Jinwoo Shin

Korea Advanced Institute of Science and Technology (KAIST)

ICLR 2020

Park et al. (KAIST) Lookahead: A far sighted alternative of magnitude-based pruning ICLR 2020

Motivation: Over-parametrization in Modern Deep Learning

Modern neural networks are severely over-parametrized

• A network with sufficiently many parameters relative to the number of training data can achieve zero training error [Yun et al.’19]

• e.g., 16M parameters are enough to fit the ImageNet dataset perfectly

Figure: Number of parameters vs. ImageNet classification accuracy [Zoph et al.’18]

More parameters

• Better generalization

• Better training accuracy

• Better optimization landscape

• Better convergence speed

• …

More parameters

• More memory

• More inference time

• More power consumption

• More CO2

• …

Pruning an over-parametrized network

Fewer parameters

• Less memory

• Less inference time

• Less power consumption

• Less CO2

• …

+

More parameters

• Better generalization

• Better training accuracy

• Better optimization landscape

• Better convergence speed

• …

Motivation: Magnitude-based Pruning

Magnitude-based pruning (MP) is a popular pruning algorithm that removes edges with small weights

Despite its simplicity, MP has shown remarkable performance in practice

• [Han et al.’15, Han et al.’16, Guo et al.’16, Han et al.’17, Narang et al.’17, Zhu and Gupta’18, Frankle and Carbin’19, Gale et al.’19, Renda et al.’20, Lin et al.’20]

[Figure: a network before and after pruning, with edges marked as smaller weight or larger weight]

However, edges with large weights may not be as important as their magnitudes suggest

What if there exist large-weight edges connected only to small-weight edges?
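The question above can be made concrete with a small numeric sketch. The two-layer linear network and its weights below are hypothetical, chosen only for illustration; `block_change` is a helper name introduced here, not something from the slides.

```python
import numpy as np

# Hypothetical two-layer linear network y = W2 @ W1 @ x (illustrative weights only).
W1 = np.array([[5.0, 0.0],    # hidden neuron 0 receives one large-weight edge
               [0.0, 1.0]])   # hidden neuron 1 receives one moderate-weight edge
W2 = np.array([[0.01, 1.0],   # hidden neuron 0 feeds only small-weight edges
               [0.01, 1.0]])

def block_change(W1, W2, row, col):
    """Frobenius norm of the change in W2 @ W1 when W1[row, col] is set to zero."""
    pruned = W1.copy()
    pruned[row, col] = 0.0
    return np.linalg.norm(W2 @ W1 - W2 @ pruned)

change_large = block_change(W1, W2, 0, 0)  # remove the weight 5.0
change_small = block_change(W1, W2, 1, 1)  # remove the weight 1.0
print(change_large, change_small)
```

In this toy case, removing the weight 5.0 changes the end-to-end map far less than removing the weight 1.0, even though MP would keep the former and prune the latter first.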

We propose a new pruning algorithm by

1. Interpreting MP as layerwise approximation

2. Extending it to block approximation

Intuition: Magnitude-based Pruning = Layerwise Approximation

For each layer, MP minimizes the Frobenius norm of the difference between the weight tensors before and after pruning

[Equation annotations: input of the layer, weight before pruning, weight after pruning]
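One way to see why the Frobenius norm is a natural layerwise objective is the standard bound below. The symbols are assumptions introduced for this sketch: x is the input of the layer, W the weight before pruning, and \widetilde{W} the weight after pruning, for a fully-connected layer. It is not necessarily the exact expression shown on the slide.

```latex
\| W x - \widetilde{W} x \|_2
\;\le\; \| W - \widetilde{W} \|_2 \, \| x \|_2
\;\le\; \| W - \widetilde{W} \|_F \, \| x \|_2
```

The first step is the definition of the operator norm and the second uses the fact that the operator norm never exceeds the Frobenius norm, so a small Frobenius difference bounds the change in the layer output for every input.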

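Pruning the edge with the smallest MP score (weight magnitude) minimizes the Frobenius norm of the difference between the weight tensors before and after pruning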
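A minimal sketch of magnitude-based pruning for a single weight matrix, assuming an unstructured mask and a fixed number of weights to keep. The function name `magnitude_prune` and the parameter `num_keep` are hypothetical names used only in this example.

```python
import numpy as np

def magnitude_prune(W, num_keep):
    """Zero out all but the num_keep largest-magnitude entries of W (ties broken arbitrarily)."""
    flat = np.abs(W).ravel()
    keep = np.argsort(flat)[flat.size - num_keep:]   # indices of the largest |w|
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep] = True
    return W * mask.reshape(W.shape)

W = np.array([[0.1, -2.0, 0.5],
              [1.5, -0.2, 0.05]])
W_pruned = magnitude_prune(W, num_keep=3)

# Frobenius norm of the difference = root of the sum of squares of the removed weights.
error = np.linalg.norm(W - W_pruned)
print(W_pruned, error)
```

Because the squared Frobenius error is just the sum of squares of the removed entries, keeping the largest magnitudes gives the smallest possible error for a given number of kept weights.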

Contribution: Lookahead Pruning = Block Approximation

We propose lookahead pruning (LAP), which extends the layerwise approximation of MP to a block of layers

Assume linear activation for now

Pruning the edge with the smallest LAP score minimizes the Frobenius norm of the change in the block of layers, rather than in a single layer
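A sketch of how such a block-level score could be computed for fully-connected layers with linear activation. It assumes the block is the product `W_next @ W @ W_prev` and scores each weight by the Frobenius change of that product when the weight alone is removed, which works out to the weight magnitude times the norms of the connected row and column. The function and variable names are hypothetical.

```python
import numpy as np

def lap_scores(W_prev, W, W_next):
    """Lookahead-style scores for W inside the linear block W_next @ W @ W_prev.

    Score of W[k, j] = |W[k, j]| * ||W_prev[j, :]|| * ||W_next[:, k]||.
    """
    row_norms = np.linalg.norm(W_prev, axis=1)   # one norm per input neuron j of W
    col_norms = np.linalg.norm(W_next, axis=0)   # one norm per output neuron k of W
    return np.abs(W) * np.outer(col_norms, row_norms)

rng = np.random.default_rng(0)
W_prev = rng.normal(size=(4, 3))
W = rng.normal(size=(5, 4))
W_next = rng.normal(size=(2, 5))
scores = lap_scores(W_prev, W, W_next)

# Brute-force check: the score equals the block's Frobenius change when one edge is removed.
k, j = np.unravel_index(np.argmin(scores), scores.shape)
W_removed = W.copy()
W_removed[k, j] = 0.0
brute = np.linalg.norm(W_next @ W @ W_prev - W_next @ W_removed @ W_prev)
print(scores[k, j], brute)
```

Removing a single weight changes the block by a rank-one matrix, which is why the closed-form score and the brute-force distortion agree.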

Contribution: Lookahead Pruning for ReLU

LAP for ReLU activation, assuming i.i.d. activation probabilities

[Equation annotation: 0-1 random diagonal matrix indicating activated neurons]

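Pruning the edge with the smallest LAP score minimizes the expected change in the block, where the expectation is taken over the random activation matrices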
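A sketch of why the same score can carry over to ReLU, under assumptions labeled here rather than taken from the slide: D_{i-1} and D_i denote the 0-1 random diagonal matrices of activated neurons, their diagonal entries are i.i.d. Bernoulli with activation probability p, and w = W_i[k, j] is the weight being removed. The symbols p, D, e_k and e_j are notation introduced for this sketch.

```latex
\mathbb{E}\,\bigl\| W_{i+1} D_i \,\bigl(w\, e_k e_j^\top\bigr)\, D_{i-1} W_{i-1} \bigr\|_F^2
= w^2 \,\mathbb{E}\bigl[(D_i)_{kk}\,(D_{i-1})_{jj}\bigr]\,
  \bigl\| W_{i+1}[:,k] \bigr\|_2^2 \,\bigl\| W_{i-1}[j,:] \bigr\|_2^2
= p^2\, w^2\, \bigl\| W_{i+1}[:,k] \bigr\|_2^2 \,\bigl\| W_{i-1}[j,:] \bigr\|_2^2
```

Since p is the same for every edge under this assumption, the expected squared distortion is a constant multiple of the squared linear-case score, so the ranking of edges is unchanged.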

Experiments: Lookahead Pruning for Other Activations

Empirical evaluation of LAP and MP on the MNIST classification task under non-linear activation functions

[Figure panels: ReLU, sigmoid, hyperbolic tangent]

Experiments: Lookahead Pruning for Modern CNNs

Empirical evaluation of LAP and MP on the CIFAR-10 and Tiny-ImageNet classification tasks

LAP outperforms MP, especially in the high-sparsity regime!

Summary

We propose lookahead pruning by extending the layerwise approximation of MP to a block approximation

Codes are available at

Our paper additionally covers

• More empirical evaluations

• Variants and sequential version of LAP

• LAP for various types of layers

• LAP utilizing real activation probability

• …

