Transcript

  • Feed-forward neural network

    Input x = (x_1, …, x_d), output y.

    z_1 = σ(W_1 x + b_1)
    z_2 = σ(W_2 z_1 + b_2)
    ⋮
    y = W_L z_{L-1} + b_L

    where W_1, …, W_L are weight matrices, b_1, …, b_L are bias vectors, and σ is the activation function.
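    A minimal NumPy sketch of this layer recursion, assuming a ReLU activation for σ and arbitrary layer widths (neither is fixed by the slide):

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def forward(x, weights, biases):
        """Compute z_l = relu(W_l z_{l-1} + b_l); the last layer is affine only."""
        z = x
        for W, b in zip(weights[:-1], biases[:-1]):
            z = relu(W @ z + b)
        return weights[-1] @ z + biases[-1]

    # Toy example: widths chosen arbitrarily, mapping R^4 -> R^2.
    rng = np.random.default_rng(0)
    dims = [4, 8, 8, 2]
    weights = [0.1 * rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
    biases = [np.zeros(m) for m in dims[1:]]
    print(forward(rng.standard_normal(4), weights, biases))  # length-2 output vector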

  • Training objective (least squares)

    Given training data (x_i, y_i), i = 1, …, n, the parameters are fit by

    min_θ Σ_i (y_i − f_θ(x_i))²

    where f_θ denotes the network above with parameters θ = (W_1, b_1, …, W_L, b_L).
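    A gradient-descent sketch of this objective for the simplest choice of f_θ, a linear model f_θ(x) = θᵀx (assumed here only to keep the example short; on the slide f_θ is the deep network defined above):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 3))                      # inputs x_i
    theta_true = np.array([2.0, -1.0, 0.5])
    y = X @ theta_true + 0.1 * rng.standard_normal(100)    # targets y_i

    theta = np.zeros(3)
    lr = 0.01
    for _ in range(500):
        residual = X @ theta - y                # f_theta(x_i) - y_i
        grad = 2.0 * X.T @ residual / len(y)    # gradient of the mean squared error
        theta -= lr * grad
    print(theta)  # converges towards theta_true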


    Under review as a conference paper at ICLR 2017

    Figure 1: Top1 vs. network. Single-crop top-1 validation accuracies for top scoring single-model architectures. We introduce with this chart our choice of colour scheme, which will be used throughout this publication to distinguish effectively different architectures and their correspondent authors. Notice that networks of the same group share the same hue, for example ResNet are all variations of pink.

    Figure 2: Top1 vs. operations, size / parameters. Top-1 one-crop accuracy versus amount of operations required for a single forward pass. The size of the blobs is proportional to the number of network parameters; a legend is reported in the bottom right corner, spanning from 5×10⁶ to 155×10⁶ params. Both these figures share the same y-axis, and the grey dots highlight the centre of the blobs.

    The errors of a single run of VGG-16¹ (Simonyan & Zisserman, 2014) and GoogLeNet (Szegedy et al., 2014) are 8.70% and 10.07% respectively, revealing that VGG-16 performs better than GoogLeNet. When models are run with 10-crop sampling,² the errors become 9.33% and 9.15% respectively, and therefore VGG-16 will perform worse than GoogLeNet, using a single central-crop. For this reason, we decided to base our analysis on re-evaluations of top-1 accuracies³ for all networks with a single central-crop sampling technique (Zagoruyko, 2016).

    For inference time and memory usage measurements we have used Torch7 (Collobert et al., 2011) with cuDNN-v5 (Chetlur et al., 2014) and CUDA-v8 back-end. All experiments were conducted on a JetPack-2.3 NVIDIA Jetson TX1 board (nVIDIA): an embedded visual computing system with a 64-bit ARM® A57 CPU, a 1 T-Flop/s 256-core NVIDIA Maxwell GPU and 4 GB LPDDR4 of shared RAM. We use this resource-limited device to better underline the differences between network architectures, but similar results can be obtained on most recent GPUs, such as the NVIDIA K40 or Titan X, to name a few. Operation counts were obtained using an open-source tool that we developed (Paszke, 2016). For measuring the power consumption, a Keysight 1146B Hall effect current probe has been used with a Keysight MSO-X 2024A 200 MHz digital oscilloscope with a sampling period of 2 s and 50 kSa/s sample rate. The system was powered by a Keysight E3645A GPIB-controlled DC power supply.
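    The power and energy figures implied by this setup come from combining the sampled probe current with the supply voltage; the snippet below is only an illustrative sketch with assumed values (a 12 V rail and a synthetic current trace), not the authors' measurement code:

    import numpy as np

    SAMPLE_RATE = 50_000        # 50 kSa/s, as stated for the oscilloscope
    SUPPLY_VOLTAGE = 12.0       # assumed supply voltage; not given in the excerpt

    # Synthetic current trace standing in for the Hall-probe readings (amperes).
    t = np.arange(0, 2.0, 1.0 / SAMPLE_RATE)            # 2 s capture window
    current = 0.8 + 0.2 * np.sin(2 * np.pi * 5.0 * t)   # hypothetical load profile

    power = SUPPLY_VOLTAGE * current          # instantaneous power in watts
    avg_power = power.mean()
    energy = power.sum() / SAMPLE_RATE        # Riemann-sum energy over the window, in joules
    print(f"average power: {avg_power:.2f} W, energy: {energy:.2f} J")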

    3 RESULTS

    In this section we report our results and comparisons. We analysed the following DNNs: AlexNet (Krizhevsky et al., 2012), batch normalised AlexNet (Zagoruyko, 2016), batch normalised Network In Network (NIN) (Lin et al., 2013), ENet (Paszke et al., 2016) for ImageNet (Culurciello, 2016), GoogLeNet (Szegedy et al., 2014), VGG-16 and -19 (Simonyan & Zisserman, 2014), ResNet-18, -34, -50, -101 and -152 (He et al., 2015), Inception-v3 (Szegedy et al., 2015) and Inception-v4 (Szegedy et al., 2016), since they obtained the highest performance, in these four years, on the ImageNet (Russakovsky et al., 2015) challenge.

    ¹ In the original paper this network is called VGG-D, which is the best performing network. Here we prefer to highlight the number of layers utilised, so we will call it VGG-16 in this publication.

    ² From a given image multiple patches are extracted: four corners plus central crop and their horizontally mirrored twins.

    ³ Accuracy and error rate always sum to 100, therefore in this paper they are used interchangeably.
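    Footnote 2 describes the 10-crop protocol; the sketch below extracts those ten patches with NumPy (the 224-pixel crop size and HWC image layout are assumptions, not values from the excerpt):

    import numpy as np

    def ten_crop(image, crop=224):
        """Four corner crops, the central crop, and their horizontally mirrored twins."""
        h, w, _ = image.shape
        ch, cw = (h - crop) // 2, (w - crop) // 2
        origins = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop), (ch, cw)]
        crops = [image[y:y + crop, x:x + crop] for y, x in origins]
        crops += [c[:, ::-1] for c in crops]   # horizontal mirrors
        return np.stack(crops)                 # shape (10, crop, crop, channels)

    patches = ten_crop(np.zeros((256, 256, 3), dtype=np.float32))
    print(patches.shape)  # (10, 224, 224, 3)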


  • Loss landscape visualisation: Li+ (2018)


  • Convergence and local-minima results: Allen-Zhu+ (2019), Liang+ (2018), Kawaguchi+ (2019)


  • Stone, C. J. (1980). Optimal rates of convergence for nonparametric estimators. The Annals of Statistics.
  • Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems.
  • Safran, I., & Shamir, O. (2017). Depth-width tradeoffs in approximating natural functions with neural networks. International Conference on Machine Learning.
  • Li, H., Xu, Z., Taylor, G., Studer, C., & Goldstein, T. (2018). Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems.
  • Imaizumi, M., & Fukumizu, K. (2019). Deep neural networks learn non-smooth functions effectively. Artificial Intelligence and Statistics.
  • Suzuki, T. (2018). Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces. International Conference on Learning Representations.
  • Neyshabur, B., Tomioka, R., & Srebro, N. (2015). Norm-based capacity control in neural networks. Conference on Learning Theory.
  • Bartlett, P. L., Foster, D. J., & Telgarsky, M. J. (2017). Spectrally-normalized margin bounds for neural networks. Advances in Neural Information Processing Systems.
  • Arora, S., Ge, R., Neyshabur, B., & Zhang, Y. (2018). Stronger generalization bounds for deep nets via a compression approach. International Conference on Machine Learning.
  • Allen-Zhu, Z., Li, Y., & Song, Z. (2018). A convergence theory for deep learning via over-parameterization. International Conference on Machine Learning.
  • Liang, S., Sun, R., Lee, J. D., & Srikant, R. (2018). Adding one neuron can eliminate all bad local minima. Advances in Neural Information Processing Systems.
  • Kawaguchi, K., & Kaelbling, L. P. (2019). Elimination of all bad local minima in deep learning. arXiv preprint.

  • [email protected]
