Date post: 25-May-2020
Introduction to practical aspects of Deep Learning Yuping Luo
Page 1: of Deep Learning Introduction to practical aspects · Basic Pipeline 1. (Installation and Import) 2. Data Loading 3. Network Architecture 4. Optimization(train) 5. Evaluation

Introduction to practical aspects of Deep Learning
Yuping Luo

Introduction to practical aspects of Deep Learning
PyTorch
Yuping Luo

Why use a framework?

● Performance;

● Fewer bugs;

● Code reuse (backpropagation, convolution, etc.);

● Community;

● ...

Basic Pipeline

1. (Installation and Import)

2. Data Loading

3. Network Architecture

4. Optimization (train)

5. Evaluation

Installation and Import

$ pip install torch torchvision

# Optimizers

# Basic Tensor operations

# Modules and layers (class style)

# Modules and layers (function style)
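The import statements these four comments annotated did not survive extraction. A minimal sketch of what they most likely were, assuming the conventional PyTorch aliases (the aliases and the toy tensors below are assumptions, not taken from the slide):

```python
import torch                      # basic tensor operations
import torch.nn as nn             # modules and layers (class style)
import torch.nn.functional as F   # modules and layers (function style)
import torch.optim as optim       # optimizers

x = torch.randn(4, 3)                          # a batch of 4 inputs with 3 features
layer = nn.Linear(3, 2)                        # class style: the layer owns its parameters
y = F.relu(layer(x))                           # function style: stateless operation
opt = optim.SGD(layer.parameters(), lr=0.1)    # optimizer over the layer's parameters
```

The class style suits layers with learnable state; the function style suits stateless operations such as activations.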

Data Loading

Data preprocessing
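The slide's code is missing, so the following is only a sketch of one common way to load and preprocess data in PyTorch. It uses a synthetic stand-in for a real dataset (the shapes, split, and batch size are assumptions) and standardizes inputs with statistics computed on the training split only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Synthetic stand-in for a real dataset: 256 single-channel 28x28 "images", 10 classes.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))

# Split first, then preprocess with statistics from the training split only.
train_x, test_x = images[:200], images[200:]
mean, std = train_x.mean(), train_x.std()
train_set = TensorDataset((train_x - mean) / std, labels[:200])
test_set = TensorDataset((test_x - mean) / std, labels[200:])

# Shuffle the training data each epoch; keep the test order fixed.
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)

xb, yb = next(iter(train_loader))   # one mini-batch of inputs and labels
```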

Network Architecture and Optimizer

Move parameters to device (cpu or cuda)
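As a hedged illustration of this step (the architecture, learning rate, and momentum are placeholders, not the slide's values), a model can be defined, moved to the available device, and handed to an optimizer like this:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(128, 10),
).to(device)   # moves every parameter and buffer to the chosen device

# The optimizer keeps references to the parameters it will update.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Inputs must live on the same device as the model.
logits = model(torch.zeros(2, 1, 28, 28, device=device))
```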

Training

Use train mode: Dropout/BatchNorm/etc.

compute gradient

apply gradient
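The three annotations above map onto a standard PyTorch training step. A minimal sketch, assuming a classification task with cross-entropy loss; the function name `train_one_epoch` and the toy data are illustrative, not from the slide:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer, device):
    model.train()                       # train mode: Dropout/BatchNorm behave as in training
    total_loss, n = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()           # clear gradients left over from the previous step
        loss = F.cross_entropy(model(x), y)
        loss.backward()                 # compute gradient
        optimizer.step()                # apply gradient
        total_loss += loss.item() * x.size(0)
        n += x.size(0)
    return total_loss / n               # average loss over the epoch

# Toy problem: the label depends on the sign of the first feature.
torch.manual_seed(0)
x = torch.randn(64, 4)
y = (x[:, 0] > 0).long()
loader = [(x[i:i + 16], y[i:i + 16]) for i in range(0, 64, 16)]
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
losses = [train_one_epoch(model, loader, optimizer, torch.device("cpu")) for _ in range(20)]
```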

Evaluation

+ Average over multiple random seeds!

No gradients needed

Use eval mode: Dropout/BatchNorm/etc.
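A sketch of an evaluation routine under the usual PyTorch conventions: `model.eval()` switches Dropout and BatchNorm to inference behaviour, and `torch.no_grad()` skips gradient bookkeeping. The function name `evaluate` and the toy data are assumptions for illustration.

```python
import torch
import torch.nn as nn

def evaluate(model, loader, device):
    model.eval()                        # eval mode: Dropout off, BatchNorm uses running stats
    correct, total = 0, 0
    with torch.no_grad():               # no gradients needed
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            pred = model(x).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.size(0)
    return correct / total              # accuracy in [0, 1]

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(0.5), nn.Linear(8, 2))
x = torch.randn(40, 4)
y = torch.randint(0, 2, (40,))
loader = [(x[i:i + 10], y[i:i + 10]) for i in range(0, 40, 10)]
acc_first = evaluate(model, loader, torch.device("cpu"))
acc_second = evaluate(model, loader, torch.device("cpu"))
```

Because Dropout is disabled in eval mode, the two calls return the same accuracy.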

Main Loop
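The main-loop code itself is not in the transcript. The sketch below is one plausible shape for it on a toy problem: train for a few epochs, evaluate after each, and repeat over several random seeds so the reported number is an average. All names, sizes, and hyperparameters here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def run(seed, epochs=5):
    torch.manual_seed(seed)                          # one seed per run
    x = torch.randn(200, 4)
    y = (x[:, 0] + x[:, 1] > 0).long()               # toy labels
    train_x, train_y, test_x, test_y = x[:150], y[:150], x[150:], y[150:]
    model = nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
    acc = 0.0
    for epoch in range(epochs):
        model.train()
        for i in range(0, 150, 25):                  # mini-batches of 25
            optimizer.zero_grad()
            loss = F.cross_entropy(model(train_x[i:i + 25]), train_y[i:i + 25])
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():                        # evaluate after every epoch
            acc = (model(test_x).argmax(dim=1) == test_y).float().mean().item()
    return acc                                       # test accuracy after the last epoch

accs = [run(seed) for seed in range(3)]
mean_acc = sum(accs) / len(accs)                     # average over random seeds
```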

Run!

nn.Module: Flexible Network Architecture

A container of parameters.

Nested module
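To make "container of parameters" and "nested module" concrete, here is a small sketch. The class names `MLPBlock` and `Net` are made up for illustration; the point is that parameters of submodules are registered automatically and show up under dotted names.

```python
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.fc = nn.Linear(dim_in, dim_out)   # registered as a submodule

    def forward(self, x):
        return torch.relu(self.fc(x))

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = MLPBlock(8, 16)          # nested modules
        self.block2 = MLPBlock(16, 16)
        self.head = nn.Linear(16, 3)

    def forward(self, x):
        return self.head(self.block2(self.block1(x)))

net = Net()
names = [name for name, _ in net.named_parameters()]
n_params = sum(p.numel() for p in net.parameters())
# names == ['block1.fc.weight', 'block1.fc.bias', 'block2.fc.weight',
#           'block2.fc.bias', 'head.weight', 'head.bias']
# n_params == 467
```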

nn.Module: Flexible Network Architecture

He, Kaiming, et al.
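The figure credited to He, Kaiming, et al. is missing from the transcript; it presumably shows a residual block. Assuming so, a basic residual block can be written as a single `nn.Module` with a skip connection. The layer sizes below are illustrative and do not reproduce any specific configuration from that work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions plus an identity skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)          # add the input back before the final activation

block = ResidualBlock(channels=4)
out = block(torch.randn(2, 4, 8, 8))    # output has the same shape as the input
```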

In-place Operations

Do NOT use in-place operations on tensors that are needed for gradient computation.

Live examples!
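The live examples were not recorded. As a stand-in, the following sketch shows one typical failure: `exp()` saves its output for the backward pass, so overwriting that output in place makes the later `backward()` call raise a `RuntimeError`.

```python
import torch

a = torch.ones(3, requires_grad=True)

# Out-of-place version: works as expected.
b = a.exp()
c = b + 1                       # creates a new tensor, b is untouched
c.sum().backward()
grad_ok = a.grad.clone()        # d/da sum(exp(a) + 1) = exp(a)

# In-place version: the value autograd saved for exp() gets overwritten.
a.grad = None
b = a.exp()
b.add_(1)                       # in-place modification of a saved tensor
try:
    b.sum().backward()
    failed = False
except RuntimeError:
    failed = True               # autograd detects the modification and refuses
```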

GPU

1. GPU/CPU interaction is slow.

2. Large batch size.

a. Data parallel (mostly used)

b. Async gradient update (ASGD, etc.)

3. Floating point: 32-bit (float) vs 64-bit (double) vs 16-bit (half)

4. nvidia-smi

5. Async preprocessing (by CPU).
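A few of the points above can be sketched in code, under stated assumptions: the element sizes of the three floating-point types, and a loop that keeps its running statistic on the device so that only one device-to-host transfer happens at the end. The model and data are placeholders. Setting `num_workers` above 0 on the `DataLoader` would additionally move preprocessing into background CPU processes; it is left at the default here to keep the sketch simple.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Bytes per element: half = 2, float = 4, double = 8.
sizes = {dt: torch.empty(0, dtype=dt).element_size()
         for dt in (torch.float16, torch.float32, torch.float64)}

data = TensorDataset(torch.randn(512, 16), torch.randn(512, 1))
# Pinned host memory allows asynchronous copies to the GPU.
loader = DataLoader(data, batch_size=128, pin_memory=(device.type == "cuda"))
model = torch.nn.Linear(16, 1).to(device)

running = torch.zeros((), device=device)
with torch.no_grad():
    for x, y in loader:
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        running += ((model(x) - y) ** 2).sum()   # accumulate on the device
mse = running.item() / len(data)                 # single device-to-host synchronization
```

Calling `.item()` inside the loop would force a synchronization on every batch, which is the kind of GPU/CPU interaction the first point warns about.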

Reproducibility

1. Easier to debug.

2. Fix random seed!

3. Make a copy of source code / command lines.
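Fixing the random seed can be sketched as follows. The helper name `set_seed` is hypothetical; if NumPy or other libraries generate random numbers in the pipeline, they would need seeding too. Some GPU kernels remain nondeterministic even with fixed seeds.

```python
import random
import torch

def set_seed(seed):
    random.seed(seed)                   # Python's built-in RNG
    torch.manual_seed(seed)             # PyTorch RNG
    torch.cuda.manual_seed_all(seed)    # silently ignored when CUDA is unavailable

def draw():
    return torch.rand(3), random.random()

set_seed(0)
first = draw()
set_seed(0)
second = draw()                         # same seed, same numbers
```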

...TensorFlow

1. Make fewer calls to sess.run: each call has a large overhead in TensorFlow.

2. Debug?

a. tf.Print

b. sess.run('Add:0')

c. ...or eager mode

Overfitting

1. Regularization (L2, etc.);

2. Dropout;

3. Data augmentation;

4. Smaller network;

5. Early stopping;

6. ...
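Three of the remedies above can be sketched briefly: L2 regularization through the optimizer's `weight_decay` argument, a `Dropout` layer, and early stopping on validation loss. The `EarlyStopping` class is a hypothetical helper written for this illustration, and the validation losses are made-up numbers.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 2))
# weight_decay adds an L2 penalty on the parameters to each SGD update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience   # True means "stop now"

stopper = EarlyStopping(patience=2)
val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.9]     # made-up validation curve
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
```

In practice one would also save the model weights whenever `best` improves and restore them after stopping.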

Hyperparameter Tuning

1. Coordinate Descent;

2. Grid Search;

3. Random Search;

4. …
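As an illustration of random search, the sketch below samples the learning rate log-uniformly and the dropout rate uniformly, then keeps the best configuration. The `objective` function is a hypothetical stand-in for "train a model and return its validation loss"; the search ranges are assumptions.

```python
import math
import random

def objective(lr, dropout):
    # Hypothetical stand-in for training a model and returning validation loss.
    return (math.log10(lr) + 2) ** 2 + (dropout - 0.3) ** 2

rng = random.Random(0)                       # fixed seed so the search is repeatable
best = None
for _ in range(50):
    config = {
        "lr": 10 ** rng.uniform(-4, 0),      # log-uniform over [1e-4, 1]
        "dropout": rng.uniform(0.0, 0.7),    # uniform over [0, 0.7]
    }
    score = objective(**config)
    if best is None or score < best[0]:
        best = (score, config)               # keep the lowest validation loss so far

best_score, best_config = best
```

Grid search would replace the random draws with nested loops over fixed values; coordinate descent would tune one hyperparameter at a time while holding the others fixed.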

Advanced Techniques

Hessian-vector product

+ Why not store the whole matrix? (With n parameters, the Hessian has n² entries.)

+ Quadratic Form

+ Minimizer

+ Conjugate Gradient only requires computing Hv
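The formulas on this slide were lost. Assuming the quadratic form meant is f(x) = ½ xᵀHx − bᵀx with H symmetric positive definite, its minimizer solves Hx = b, and conjugate gradient finds it using only products Hv. The sketch below takes a callable `hvp` for that product; the explicit matrix `H` is built here only to check the result.

```python
import torch

def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    """Solve H x = b given only a function v -> H v (H symmetric positive definite)."""
    x = torch.zeros_like(b)
    r = b.clone()                # residual b - H x, with x = 0 initially
    p = r.clone()                # first search direction
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)              # the only place H is used
        alpha = rs / (p @ Hp)    # exact step size along p
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if rs_new < tol:         # squared residual norm small enough
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

torch.manual_seed(0)
A = torch.randn(5, 5, dtype=torch.float64)
H = A @ A.T + 5 * torch.eye(5, dtype=torch.float64)   # symmetric positive definite
b = torch.randn(5, dtype=torch.float64)
x = conjugate_gradient(lambda v: H @ v, b)
```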

symbolic
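The slide's code is missing. One common way to get Hv without forming H, and possibly what "symbolic" refers to, is to differentiate the gradient a second time: Hv is the gradient of (∇f(x) · v) with respect to x. A sketch using `torch.autograd.grad`; the helper name `hvp` and the test function are assumptions.

```python
import torch

def hvp(f, x, v):
    """Hessian-vector product of scalar function f at x, via double backward."""
    x = x.detach().requires_grad_(True)
    # First backward pass: keep the graph so the gradient is itself differentiable.
    (g,) = torch.autograd.grad(f(x), x, create_graph=True)
    # Second backward pass: d/dx (g . v) = H v, since v does not depend on x.
    (Hv,) = torch.autograd.grad(g @ v, x)
    return Hv

f = lambda x: (x ** 3).sum()            # Hessian is diag(6 x)
x = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([1.0, 0.0, -1.0])
result = hvp(f, x, v)                   # tensor([6., 0., -18.])
```

The cost is roughly two backward passes, and memory stays linear in the number of parameters.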

Gradient Checkpointing

From Rhu, Minsoo, et al.

https://github.com/openai/gradient-checkpointing

Re-compute: activations that were not stored are recomputed during the backward pass.
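The repository linked above is a TensorFlow implementation. As a swapped-in illustration in PyTorch, `torch.utils.checkpoint.checkpoint` applies the same idea: activations inside the wrapped function are dropped after the forward pass and recomputed during backward. This sketch assumes a recent PyTorch version that accepts the `use_reentrant` argument; the network is a placeholder.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh())
x = torch.randn(8, 32, requires_grad=True)

# Standard backward: intermediate activations of `block` are kept in memory.
block(x).sum().backward()
grad_plain = x.grad.clone()

# Checkpointed backward: only the input is kept; the forward pass of `block`
# is run again during backward to rebuild the intermediate activations.
x.grad = None
block.zero_grad()
checkpoint(block, x, use_reentrant=False).sum().backward()
grad_ckpt = x.grad.clone()
```

The gradients match; the trade is extra compute (one more forward pass through the checkpointed segment) for lower peak memory.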
