ISBN-13: 9781119810452 / English / Hardcover / 2021 / 240 pages
Author Biographies xi
Preface xiii
Acknowledgments xv
Table of Figures xvii
1 Introduction 1
1.1 Development History 2
1.2 Neural Network Models 4
1.3 Neural Network Classification 4
1.3.1 Supervised Learning 4
1.3.2 Semi-supervised Learning 5
1.3.3 Unsupervised Learning 6
1.4 Neural Network Framework 6
1.5 Neural Network Comparison 10
Exercise 11
References 12
2 Deep Learning 13
2.1 Neural Network Layer 13
2.1.1 Convolutional Layer 13
2.1.2 Activation Layer 17
2.1.3 Pooling Layer 18
2.1.4 Normalization Layer 19
2.1.5 Dropout Layer 20
2.1.6 Fully Connected Layer 20
2.2 Deep Learning Challenges 22
Exercise 22
References 24
3 Parallel Architecture 25
3.1 Intel Central Processing Unit (CPU) 25
3.1.1 Skylake Mesh Architecture 27
3.1.2 Intel Ultra Path Interconnect (UPI) 28
3.1.3 Sub Non-unified Memory Access Clustering (SNC) 29
3.1.4 Cache Hierarchy Changes 31
3.1.5 Single/Multiple Socket Parallel Processing 32
3.1.6 Advanced Vector Software Extension 33
3.1.7 Math Kernel Library for Deep Neural Network (MKL-DNN) 34
3.2 NVIDIA Graphics Processing Unit (GPU) 39
3.2.1 Tensor Core Architecture 41
3.2.2 Winograd Transform 44
3.2.3 Simultaneous Multithreading (SMT) 45
3.2.4 High Bandwidth Memory (HBM2) 46
3.2.5 NVLink2 Configuration 47
3.3 NVIDIA Deep Learning Accelerator (NVDLA) 49
3.3.1 Convolution Operation 50
3.3.2 Single Data Point Operation 50
3.3.3 Planar Data Operation 50
3.3.4 Multiplane Operation 50
3.3.5 Data Memory and Reshape Operations 51
3.3.6 System Configuration 51
3.3.7 External Interface 52
3.3.8 Software Design 52
3.4 Google Tensor Processing Unit (TPU) 53
3.4.1 System Architecture 53
3.4.2 Multiply-Accumulate (MAC) Systolic Array 55
3.4.3 New Brain Floating-Point Format 55
3.4.4 Performance Comparison 57
3.4.5 Cloud TPU Configuration 58
3.4.6 Cloud Software Architecture 60
3.5 Microsoft Catapult Fabric Accelerator 61
3.5.1 System Configuration 64
3.5.2 Catapult Fabric Architecture 65
3.5.3 Matrix-Vector Multiplier 65
3.5.4 Hierarchical Decode and Dispatch (HDD) 67
3.5.5 Sparse Matrix-Vector Multiplication 68
Exercise 70
References 71
4 Streaming Graph Theory 73
4.1 Blaize Graph Streaming Processor 73
4.1.1 Stream Graph Model 73
4.1.2 Depth First Scheduling Approach 75
4.1.3 Graph Streaming Processor Architecture 76
4.2 Graphcore Intelligence Processing Unit 79
4.2.1 Intelligence Processor Unit Architecture 79
4.2.2 Accumulating Matrix Product (AMP) Unit 79
4.2.3 Memory Architecture 79
4.2.4 Interconnect Architecture 79
4.2.5 Bulk Synchronous Parallel Model 81
Exercise 83
References 84
5 Convolution Optimization 85
5.1 Deep Convolutional Neural Network Accelerator 85
5.1.1 System Architecture 86
5.1.2 Filter Decomposition 87
5.1.3 Streaming Architecture 90
5.1.3.1 Filter Weights Reuse 90
5.1.3.2 Input Channel Reuse 92
5.1.4 Pooling 92
5.1.4.1 Average Pooling 92
5.1.4.2 Max Pooling 93
5.1.5 Convolution Unit (CU) Engine 94
5.1.6 Accumulation (ACCU) Buffer 94
5.1.7 Model Compression 95
5.1.8 System Performance 95
5.2 Eyeriss Accelerator 97
5.2.1 Eyeriss System Architecture 97
5.2.2 2D Convolution to 1D Multiplication 98
5.2.3 Stationary Dataflow 99
5.2.3.1 Output Stationary 99
5.2.3.2 Weight Stationary 101
5.2.3.3 Input Stationary 101
5.2.4 Row Stationary (RS) Dataflow 104
5.2.4.1 Filter Reuse 104
5.2.4.2 Input Feature Maps Reuse 106
5.2.4.3 Partial Sums Reuse 106
5.2.5 Run-Length Compression (RLC) 106
5.2.6 Global Buffer 108
5.2.7 Processing Element Architecture 108
5.2.8 Network-on-Chip (NoC) 108
5.2.9 Eyeriss v2 System Architecture 112
5.2.10 Hierarchical Mesh Network 116
5.2.10.1 Input Activation HM-NoC 118
5.2.10.2 Filter Weight HM-NoC 118
5.2.10.3 Partial Sum HM-NoC 119
5.2.11 Compressed Sparse Column Format 120
5.2.12 Row Stationary Plus (RS+) Dataflow 122
5.2.13 System Performance 123
Exercise 125
References 125
6 In-Memory Computation 127
6.1 Neurocube Architecture 127
6.1.1 Hybrid Memory Cube (HMC) 127
6.1.2 Memory Centric Neural Computing (MCNC) 130
6.1.3 Programmable Neurosequence Generator (PNG) 131
6.1.4 System Performance 132
6.2 Tetris Accelerator 133
6.2.1 Memory Hierarchy 133
6.2.2 In-Memory Accumulation 133
6.2.3 Data Scheduling 135
6.2.4 Neural Network Vaults Partition 136
6.2.5 System Performance 137
6.3 NeuroStream Accelerator 138
6.3.1 System Architecture 138
6.3.2 NeuroStream Coprocessor 140
6.3.3 4D Tiling Mechanism 140
6.3.4 System Performance 141
Exercise 143
References 143
7 Near-Memory Architecture 145
7.1 DaDianNao Supercomputer 145
7.1.1 Memory Configuration 145
7.1.2 Neural Functional Unit (NFU) 146
7.1.3 System Performance 149
7.2 Cnvlutin Accelerator 150
7.2.1 Basic Operation 151
7.2.2 System Architecture 151
7.2.3 Processing Order 154
7.2.4 Zero-Free Neuron Array Format (ZFNAf) 155
7.2.5 The Dispatcher 155
7.2.6 Network Pruning 157
7.2.7 System Performance 157
7.2.8 Raw or Encoded Format (RoE) 158
7.2.9 Vector Ineffectual Activation Identifier Format (VIAI) 159
7.2.10 Ineffectual Activation Skipping 159
7.2.11 Ineffectual Weight Skipping 161
Exercise 161
References 161
8 Network Sparsity 163
8.1 Energy Efficient Inference Engine (EIE) 163
8.1.1 Leading Nonzero Detection (LNZD) Network 163
8.1.2 Central Control Unit (CCU) 164
8.1.3 Processing Element (PE) 164
8.1.4 Deep Compression 166
8.1.5 Sparse Matrix Computation 167
8.1.6 System Performance 169
8.2 Cambricon-X Accelerator 169
8.2.1 Computation Unit 171
8.2.2 Buffer Controller 171
8.2.3 System Performance 174
8.3 SCNN Accelerator 175
8.3.1 SCNN PT-IS-CP-Dense Dataflow 175
8.3.2 SCNN PT-IS-CP-Sparse Dataflow 177
8.3.3 SCNN Tiled Architecture 178
8.3.4 Processing Element Architecture 179
8.3.5 Data Compression 180
8.3.6 System Performance 180
8.4 SeerNet Accelerator 183
8.4.1 Low-Bit Quantization 183
8.4.2 Efficient Quantization 184
8.4.3 Quantized Convolution 185
8.4.4 Inference Acceleration 186
8.4.5 Sparsity-Mask Encoding 186
8.4.6 System Performance 188
Exercise 188
References 188
9 3D Neural Processing 191
9.1 3D Integrated Circuit Architecture 191
9.2 Power Distribution Network 193
9.3 3D Network Bridge 195
9.3.1 3D Network-on-Chip 195
9.3.2 Multiple-Channel High-Speed Link 195
9.4 Power-Saving Techniques 198
9.4.1 Power Gating 198
9.4.2 Clock Gating 199
Exercise 200
References 201
Appendix A: Neural Network Topology 203
Index 205
Albert Chun Chen Liu, PhD, is Chief Executive Officer of Kneron. He is an Adjunct Associate Professor at National Tsing Hua University, National Chiao Tung University, and National Cheng Kung University. He has published over 15 IEEE papers and is an IEEE Senior Member. He received the 2007 IBM Problem Solving Award, based on the use of the EIP tool suite, and the 2021 IEEE TCAS Darlington Award.

Oscar Ming Kin Law, PhD, is Director of Engineering at Kneron. He works on smart robot development and in-memory architecture for neural networks. He has over twenty years of experience in the semiconductor industry, working on CPU, GPU, and mobile design. He also holds over 60 patents in various areas.