Information Fusion and Data Science

Series Editor

Henry Leung, University of Calgary, Calgary, AB, Canada


This book series provides a forum to systematically summarize recent developments, discoveries, and progress on multi-sensor, multi-source/multi-level data and information fusion, along with its connection to data-enabled science. Emphasis is also placed on fundamental theories, algorithms, and real-world applications of massive data as well as information processing, analysis, fusion, and knowledge generation.

The aim of this book series is to provide the most up-to-date research results and tutorial materials on current topics in this growing field, and to stimulate further research interest by transmitting this knowledge to the next generation of scientists and engineers in the corresponding fields. The target audiences are graduate students, academic scientists, and researchers in industry and government working in computational sciences and engineering, complex systems, and artificial intelligence. Formats suitable for the series are contributed volumes, monographs, and lecture notes.

More information about this series at http://www.springer.com/series/15462


Haitao Zhao • Zhihui Lai • Henry Leung • Xianyi Zhang

Feature Learning and Understanding
Algorithms and Applications


Haitao Zhao
East China University of Science and Technology
Shanghai, China

Zhihui Lai
Shenzhen University
Shenzhen, China

Henry Leung
Department of Electrical & Computer Engineering
University of Calgary
Calgary, AB, Canada

Xianyi Zhang
East China University of Science and Technology
Shanghai, China

ISSN 2510-1528          ISSN 2510-1536 (electronic)
Information Fusion and Data Science
ISBN 978-3-030-40793-3          ISBN 978-3-030-40794-0 (eBook)
https://doi.org/10.1007/978-3-030-40794-0

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.


Preface

Big data is a big opportunity in this era. The digital transformation leaves no organization untouched, and all companies have to derive value and insight from data. The theories and applications of mining specific information and extracting knowledge from massive data are therefore becoming increasingly important.

Features are representations of data that are much easier to understand in the context of a problem, and feature learning is the process of using domain knowledge and specialized techniques to transform raw data into features. Feature learning is an essential procedure for data analysis and machine intelligence. Readers who are not sure what feature learning is can turn to Chap. 1 or other introductory works.

This book covers the essential concepts and strategies within traditional and cutting-edge feature learning methods. Each feature learning method has its own dedicated chapter that explains how it is theoretically derived and shows how it is implemented for real-world applications through case studies.

In this book, readers can find not only traditional feature learning methods, such as principal component analysis, linear discriminant analysis, geometrical-structure-based methods, and kernel-based learning methods, but also advanced feature learning methods, such as sparse learning, low-rank decomposition, tensor-based feature extraction, and deep-learning-based feature learning. Relevant code and experimental results, uploaded at https://github.com/haitaozhao/flu, allow readers to reproduce the experiments easily by themselves.

The intended audience of this book is nonspecialists whose needs cannot be satisfied by black-box tools. Such readers will be chiefly interested in the methods themselves: how they are derived and how they can be adapted to particular problems. The aim of this book is to bring readers to the point where they can go to the research literature to augment what is in this book.

Readers are assumed to have a knowledge of elementary analysis and linear algebra and a reasonable amount of programming experience. Strictly speaking, this book is not a textbook. The guiding principle has been that if something is worth explaining, it is worth explaining clearly. This is necessarily restricted to the scope of the book, but the authors hope that the selected feature learning methods will give the reader a good basis for further study or research.

Many people have contributed to this book. We would like to thank the following colleagues and friends for their help: Qianqian Wang, Yuqi Li, Yuru Chen, Ziyan Liao, Zhengwei Hu, Jingchao Peng, and Yaobin Xu. Their suggestions and remarks have contributed significantly to improvements.

Shanghai, China
January 2020

Haitao Zhao


Contents

1 A Gentle Introduction to Feature Learning
  1.1 Introduction
  1.2 Data and Preprocessing
    1.2.1 Data Collection
    1.2.2 Data Cleaning
    1.2.3 Data Sampling
    1.2.4 Data Transformation
  1.3 Feature Learning
    1.3.1 Solutions to Eigenvalue Equations
    1.3.2 Convex Optimization
    1.3.3 Gradient Descent
  1.4 Summary

2 Latent Semantic Feature Extraction
  2.1 Introduction
  2.2 Singular Value Decomposition
    2.2.1 Feature Extraction by SVD
    2.2.2 An Example of SVD
  2.3 SVD Updating
  2.4 SVD with Compressive Sampling
  2.5 Case Studies
    2.5.1 Analysis of Coil-20 Data Set
    2.5.2 Latent Semantic Feature Extraction for Recommendation
  2.6 Summary

3 Principal Component Analysis
  3.1 Introduction
  3.2 Classical Principal Component Analysis
    3.2.1 Maximizing Variance and Minimizing Residuals
    3.2.2 Theoretical Derivation of PCA
    3.2.3 An Alternative View of PCA
    3.2.4 Selection of the Reduced Dimension
    3.2.5 Eigendecomposition of XX^T or X^TX
    3.2.6 Relationship between PCA and SVD
  3.3 Probabilistic Principal Component Analysis
    3.3.1 Latent Variable Model
    3.3.2 The Probability Model of PPCA
    3.3.3 The Maximum Likelihood Estimation of PPCA
    3.3.4 The PPCA Algorithm
  3.4 Case Studies
    3.4.1 Enterprise Profit Ratio Analysis Using PCA
    3.4.2 Fault Detection Based on PCA
  3.5 Summary

4 Manifold-Learning-Based Feature Extraction
  4.1 Introduction
  4.2 Manifold Learning and Spectral Graph Theory
  4.3 Neighborhood Preserving Projection
    4.3.1 Locally Linear Embedding (LLE)
    4.3.2 Neighborhood Preserving Embedding (NPE)
  4.4 Locality Preserving Projection (LPP)
    4.4.1 Relationship to PCA
    4.4.2 Relationship to Laplacian Eigenmaps
  4.5 Case Studies
    4.5.1 Handwritten Digit Visualization
    4.5.2 Face Manifold Analysis
  4.6 Summary

5 Linear Discriminant Analysis
  5.1 Introduction
  5.2 Fisher's Linear Discriminant
  5.3 Analysis of FLD
  5.4 Linear Discriminant Analysis
    5.4.1 An Example of LDA
    5.4.2 Foley-Sammon Optimal Discriminant Vectors
  5.5 Case Study
  5.6 Summary

6 Kernel-Based Nonlinear Feature Learning
  6.1 Introduction
  6.2 Kernel Trick
  6.3 Kernel Principal Component Analysis
    6.3.1 Revisiting of PCA
    6.3.2 Derivation of Kernel Principal Component Analysis
    6.3.3 Kernel Averaging Filter
  6.4 Kernel Fisher Discriminant
  6.5 Generalized Discriminant Analysis
  6.6 Case Study
  6.7 Summary

7 Sparse Feature Learning
  7.1 Introduction
  7.2 Sparse Representation Problem with Different Norm Regularizations
    7.2.1 ℓ0-norm Regularized Sparse Representation
    7.2.2 ℓ1-norm Regularized Sparse Representation
    7.2.3 ℓp-norm (0 < p < 1) Regularized Sparse Representation
    7.2.4 ℓ2,1-norm Regularized Group-Wise Sparse Representation
  7.3 Lasso Estimator
  7.4 Sparse Feature Learning with Generalized Regression
    7.4.1 Sparse Principal Component Analysis
    7.4.2 Generalized Robust Regression (GRR) for Jointly Sparse Subspace Learning
    7.4.3 Robust Jointly Sparse Regression with Generalized Orthogonal Learning for Image Feature Selection
    7.4.4 Locally Joint Sparse Marginal Embedding for Feature Extraction
  7.5 Case Study
  7.6 Summary

8 Low Rank Feature Learning
  8.1 Introduction
  8.2 Low Rank Approximation Problems
  8.3 Low Rank Projection Learning Algorithms
  8.4 Robust Low Rank Projection Learning
    8.4.1 Low-Rank Preserving Projections
    8.4.2 Low-Rank Preserving Projection with GRR
    8.4.3 Low-Rank Linear Embedding
    8.4.4 Feature Selective Projection with Low-Rank Embedding and Dual Laplacian Regularization
  8.5 Case Study
    8.5.1 Databases
    8.5.2 Observations and Discussions
  8.6 Summary

9 Tensor-Based Feature Learning
  9.1 Introduction
  9.2 Tensor Representation Based on Tucker Decomposition
    9.2.1 Preliminaries of Tucker Decomposition
    9.2.2 Main Idea of Tucker-Based Feature Learning
  9.3 Rationality: Criteria for Tucker-Based Feature Learning Models
    9.3.1 Least Square Error Multi-linear Representation: Tucker-Based PCA
    9.3.2 Living in a Manifold: Tucker-Based Manifold Learning
    9.3.3 Learning with the Truth: Tucker-Based Discriminant Analysis
  9.4 Solvability: An Algorithmic Framework of Alternative Minimization
    9.4.1 Alternative Minimization Algorithms
    9.4.2 A Unified Framework
    9.4.3 Sparsity Helps: Sparse Tensor Alignment
  9.5 Case Study
    9.5.1 Alternative Minimization for MJSPCA
    9.5.2 Action Recognition with MJSPCA
  9.6 Summary

10 Neural-Network-Based Feature Learning: Auto-Encoder
  10.1 Introduction
  10.2 Auto-Encoder (AE)
    10.2.1 Fully Connected Layer and Activation Function
    10.2.2 Basic Auto-Encoder
    10.2.3 Backpropagation and Computational Graphs
    10.2.4 Relationship Between the Dimension of Data and the Dimension of Features
  10.3 Denoising Auto-Encoder (DAE)
  10.4 Stacked Auto-Encoder
    10.4.1 Training Stacked Auto-Encoder
    10.4.2 Stacked Denoising Auto-Encoders (SDAE)
  10.5 Applications of Auto-Encoders
  10.6 Case Studies
    10.6.1 Auto-Encoder for Feature Learning
    10.6.2 Auto-Encoder for Fault Detection
  10.7 Summary

11 Neural-Network-Based Feature Learning: Convolutional Neural Network
  11.1 Introduction
  11.2 Basic Architecture of CNNs
    11.2.1 Convolutional Layer
    11.2.2 Pooling Layer
    11.2.3 Batch Normalization
    11.2.4 Dropout
    11.2.5 Relationship between Convolutional Layer and Fully Connected Layer
    11.2.6 Backpropagation of Convolutional Layers
  11.3 Transfer Feature Learning of CNN
    11.3.1 Formalization of Transfer Learning Problems
    11.3.2 Basic Method of Transfer Learning
  11.4 Deep Convolutional Models
    11.4.1 The Beginning of Deep Convolutional Neural Networks: AlexNet
    11.4.2 Common Architecture: VGG
    11.4.3 Inception Mechanism: GoogLeNet
    11.4.4 Stacked Convolutional Auto-Encoders
  11.5 Case Studies
    11.5.1 CNN-Based Handwritten Numeral Recognition
    11.5.2 Spatial Transformer Network
  11.6 Summary

12 Neural-Network-Based Feature Learning: Recurrent Neural Network
  12.1 Introduction
  12.2 Recurrent Neural Networks
    12.2.1 Forward Propagation
    12.2.2 Backpropagation Through Time (BPTT)
    12.2.3 Different Types of RNNs
  12.3 Long Short-Term Memory (LSTM)
    12.3.1 Forget Gate
    12.3.2 Input Gate
    12.3.3 Output Gate
    12.3.4 The Backpropagation of LSTM
    12.3.5 Explanation of Gradient Vanishing
  12.4 Gated Recurrent Unit (GRU)
  12.5 Deep RNNs
  12.6 Case Study
    12.6.1 Datasets Introduction
    12.6.2 Data Preprocessing
    12.6.3 Define Network Architecture and Training Options
    12.6.4 Test the Networks
  12.7 Summary

References

Index


Notation

Numbers and Sets

$x$                         A scalar
$\mathbf{x}$                A vector
$\mathbf{X}$                A matrix
$\mathcal{X}$               A tensor
$x_{ij}$                    The element of matrix $\mathbf{X}$ at row $i$ and column $j$
$\mathbf{X}_r$              The matrix formed by the first $r$ columns of $\mathbf{X}$
$\cap$                      The intersection of two sets
$\cup$                      The union of two sets
$\bigcup_{i=1}^{n}$         The union of $n$ sets
$\forall$                   For all
$\mathbb{R}$                The set of all real numbers
$\mathbb{R}^{n}$            The set of $n$-dimensional column vectors of real numbers
$\mathbb{R}^{m \times n}$   The set of matrices of real numbers with $m$ rows and $n$ columns
$\{x_i\}_{i=1}^{n}$         The set consisting of samples $x_1, x_2, \ldots, x_n$

Operators and Functions

$(\cdot)^T$                   The transpose of a vector or a matrix
$\|\mathbf{x}\|_0$            The $\ell_0$-norm of a vector $\mathbf{x}$
$\|\mathbf{x}\|$ ($\|\mathbf{x}\|_2$)   The $\ell_2$-norm of a vector $\mathbf{x}$
$\|\mathbf{X}\|_F$            The Frobenius norm of a matrix $\mathbf{X}$
$\|\mathbf{X}\|_{2,1}$        The $\ell_{2,1}$-norm of a matrix $\mathbf{X}$
$\|\mathbf{X}\|_*$            The nuclear norm of a matrix $\mathbf{X}$: the sum of all its singular values
$\mathbf{X}^{-1}$             The inverse of matrix $\mathbf{X}$
$\mathbf{X}^{+}$              The generalized inverse of matrix $\mathbf{X}$
$\det(\mathbf{X})$            The determinant of matrix $\mathbf{X}$
$\mathrm{rank}(\mathbf{X})$   The rank of a matrix $\mathbf{X}$: the dimension of the vector space generated (or spanned) by its columns
$\mathrm{trace}(\mathbf{X})$  The trace of a matrix $\mathbf{X}$: the sum of all its diagonal entries
$\otimes$                     The Kronecker product
$\odot$                       The Hadamard (elementwise) product
$\lfloor \cdot \rfloor$       Rounding a number down to the nearest integer
$\sum$                        Series addition
$\prod$                       Series multiplication
$\int f(x)\,dx$               The indefinite integral of $f$ with respect to $x$
$\int_a^b f(x)\,dx$           The definite integral of $f$ from $a$ to $b$ with respect to $x$
$f(\cdot)$                    A function
$\ln(\cdot)$                  The natural logarithm
$\exp(\cdot)$                 The exponential function
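
Most of these operators map directly onto functions in standard numerical libraries. The short sketch below is an illustrative aid rather than part of the book's notation table; it assumes NumPy is available, uses an arbitrary example matrix and vector, and adopts the common convention that the ℓ2,1-norm sums the ℓ2-norms of the matrix rows.

import numpy as np

X = np.array([[3.0, 0.0],
              [4.0, 5.0]])                       # arbitrary example matrix
x = np.array([1.0, 0.0, -2.0])                   # arbitrary example vector

l0_norm   = np.count_nonzero(x)                  # ||x||_0: number of nonzero entries
l2_norm   = np.linalg.norm(x)                    # ||x||_2: Euclidean norm
fro_norm  = np.linalg.norm(X, 'fro')             # ||X||_F: Frobenius norm
l21_norm  = np.linalg.norm(X, axis=1).sum()      # ||X||_{2,1}: sum of row-wise l2-norms
nuc_norm  = np.linalg.norm(X, 'nuc')             # ||X||_*: sum of singular values
X_pinv    = np.linalg.pinv(X)                    # X^+: Moore-Penrose generalized inverse
X_rank    = np.linalg.matrix_rank(X)             # rank(X)
X_trace   = np.trace(X)                          # trace(X): sum of diagonal entries
kron_prod = np.kron(X, X)                        # Kronecker product of X with itself
hadamard  = X * X                                # Hadamard (elementwise) product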

Derivative and Gradient

$\frac{dy}{dx}$                   The derivative of $y$ with respect to $x$
$\frac{\partial y}{\partial x}$   The partial derivative of $y$ with respect to $x$
$\nabla f$                        The gradient of function $f$

Probability and Statistics

$p(x)$                       The probability density function of a random variable $x$
$p(x \mid y)$                The conditional probability of the random variable $x$ given $y$
$P(n)$                       The probability distribution of a discrete variable $n$
$E(x)$                       The expectation of $x$
$\mathrm{cov}(\mathbf{x})$   The covariance matrix of a random vector $\mathbf{x}$
$\mathcal{N}(\mu, \Sigma)$   The normal (Gaussian) distribution with mean $\mu$ and covariance $\Sigma$
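
As a worked use of the determinant, exponential, transpose, and inverse notation above (this expansion is not part of the original notation list), the density of an $n$-dimensional normal distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ can be written as

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,\det(\boldsymbol{\Sigma})^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$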

Graph Theory and Symbols

$G$        A graph
$V$        The set of nodes
$E$        The set of edges connecting the points
$L$        The Laplace-Beltrami operator
$\ll$      $a \ll b$ means $a$ is much less than $b$
$\in$      $s \in S$ means $s$ is an element of the set $S$


