ISBN-13: 9781119873679 / English / Hardcover / 2023
Notes on Contributors xiii
Foreword xiv
Preface xv
Acknowledgments xviii
Acronyms xix
Introduction xxii

Part I Fundamentals of Deep Reinforcement Learning 1

1 Deep Reinforcement Learning and Its Applications 3
1.1 Wireless Networks and Emerging Challenges 3
1.2 Machine Learning Techniques and Development of DRL 4
1.2.1 Machine Learning 4
1.2.2 Artificial Neural Network 7
1.2.3 Convolutional Neural Network 8
1.2.4 Recurrent Neural Network 9
1.2.5 Development of Deep Reinforcement Learning 10
1.3 Potentials and Applications of DRL 11
1.3.1 Benefits of DRL in Human Lives 11
1.3.2 Features and Advantages of DRL Techniques 12
1.3.3 Academic Research Activities 12
1.3.4 Applications of DRL Techniques 13
1.3.5 Applications of DRL Techniques in Wireless Networks 15
1.4 Structure of this Book and Target Readership 16
1.4.1 Motivations and Structure of this Book 16
1.4.2 Target Readership 19
1.5 Chapter Summary 20
References 21

2 Markov Decision Process and Reinforcement Learning 25
2.1 Markov Decision Process 25
2.2 Partially Observable Markov Decision Process 26
2.3 Policy and Value Functions 29
2.4 Bellman Equations 30
2.5 Solutions of MDP Problems 31
2.5.1 Dynamic Programming 31
2.5.1.1 Policy Evaluation 31
2.5.1.2 Policy Improvement 31
2.5.1.3 Policy Iteration 31
2.5.2 Monte Carlo Sampling 32
2.6 Reinforcement Learning 33
2.7 Chapter Summary 35
References 35

3 Deep Reinforcement Learning Models and Techniques 37
3.1 Value-Based DRL Methods 37
3.1.1 Deep Q-Network 38
3.1.2 Double DQN 41
3.1.3 Prioritized Experience Replay 42
3.1.4 Dueling Network 44
3.2 Policy-Gradient Methods 45
3.2.1 REINFORCE Algorithm 46
3.2.1.1 Policy Gradient Estimation 46
3.2.1.2 Reducing the Variance 48
3.2.1.3 Policy Gradient Theorem 50
3.2.2 Actor-Critic Methods 51
3.2.3 Advantage Actor-Critic Methods 52
3.2.3.1 Advantage Actor-Critic (A2C) 53
3.2.3.2 Asynchronous Advantage Actor-Critic (A3C) 55
3.2.3.3 Generalized Advantage Estimate (GAE) 57
3.3 Deterministic Policy Gradient (DPG) 59
3.3.1 Deterministic Policy Gradient Theorem 59
3.3.2 Deep Deterministic Policy Gradient (DDPG) 61
3.3.3 Distributed Distributional DDPG (D4PG) 63
3.4 Natural Gradients 63
3.4.1 Principle of Natural Gradients 64
3.4.2 Trust Region Policy Optimization (TRPO) 67
3.4.2.1 Trust Region 69
3.4.2.2 Sample-Based Formulation 70
3.4.2.3 Practical Implementation 70
3.4.3 Proximal Policy Optimization (PPO) 72
3.5 Model-Based RL 74
3.5.1 Vanilla Model-Based RL 75
3.5.2 Robust Model-Based RL: Model-Ensemble TRPO (ME-TRPO) 76
3.5.3 Adaptive Model-Based RL: Model-Based Meta-Policy Optimization (MB-MPO) 77
3.6 Chapter Summary 78
References 79

4 A Case Study and Detailed Implementation 83
4.1 System Model and Problem Formulation 83
4.1.1 System Model and Assumptions 84
4.1.1.1 Jamming Model 84
4.1.1.2 System Operation 85
4.1.2 Problem Formulation 86
4.1.2.1 State Space 86
4.1.2.2 Action Space 87
4.1.2.3 Immediate Reward 88
4.1.2.4 Optimization Formulation 88
4.2 Implementation and Environment Settings 89
4.2.1 Install TensorFlow with Anaconda 89
4.2.2 Q-Learning 90
4.2.2.1 Codes for the Environment 91
4.2.2.2 Codes for the Agent 96
4.2.3 Deep Q-Learning 97
4.3 Simulation Results and Performance Analysis 102
4.4 Chapter Summary 106
References 106

Part II Applications of DRL in Wireless Communications and Networking 109

5 DRL at the Physical Layer 111
5.1 Beamforming, Signal Detection, and Decoding 111
5.1.1 Beamforming 111
5.1.1.1 Beamforming Optimization Problem 111
5.1.1.2 DRL-Based Beamforming 113
5.1.2 Signal Detection and Channel Estimation 118
5.1.2.1 Signal Detection and Channel Estimation Problem 118
5.1.2.2 RL-Based Approaches 120
5.1.3 Channel Decoding 122
5.2 Power and Rate Control 123
5.2.1 Power and Rate Control Problem 123
5.2.2 DRL-Based Power and Rate Control 124
5.3 Physical-Layer Security 128
5.4 Chapter Summary 129
References 131

6 DRL at the MAC Layer 137
6.1 Resource Management and Optimization 137
6.2 Channel Access Control 139
6.2.1 DRL in the IEEE 802.11 MAC 141
6.2.2 MAC for Massive Access in IoT 143
6.2.3 MAC for 5G and B5G Cellular Systems 147
6.3 Heterogeneous MAC Protocols 155
6.4 Chapter Summary 158
References 158

7 DRL at the Network Layer 163
7.1 Traffic Routing 163
7.2 Network Slicing 166
7.2.1 Network Slicing-Based Architecture 166
7.2.2 Applications of DRL in Network Slicing 168
7.3 Network Intrusion Detection 179
7.3.1 Host-Based IDS 180
7.3.2 Network-Based IDS 181
7.4 Chapter Summary 183
References 183

8 DRL at the Application and Service Layer 187
8.1 Content Caching 187
8.1.1 QoS-Aware Caching 187
8.1.2 Joint Caching and Transmission Control 189
8.1.3 Joint Caching, Networking, and Computation 191
8.2 Data and Computation Offloading 193
8.3 Data Processing and Analytics 198
8.3.1 Data Organization 198
8.3.1.1 Data Partitioning 198
8.3.1.2 Data Compression 199
8.3.2 Data Scheduling 200
8.3.3 Tuning of Data Processing Systems 201
8.3.4 Data Indexing 202
8.3.4.1 Database Index Selection 202
8.3.4.2 Index Structure Construction 203
8.3.5 Query Optimization 205
8.4 Chapter Summary 206
References 207

Part III Challenges, Approaches, Open Issues, and Emerging Research Topics 213

9 DRL Challenges in Wireless Networks 215
9.1 Adversarial Attacks on DRL 215
9.1.1 Attacks Perturbing the State Space 215
9.1.1.1 Manipulation of Observations 216
9.1.1.2 Manipulation of Training Data 218
9.1.2 Attacks Perturbing the Reward Function 220
9.1.3 Attacks Perturbing the Action Space 222
9.2 Multiagent DRL in Dynamic Environments 223
9.2.1 Motivations 223
9.2.2 Multiagent Reinforcement Learning Models 224
9.2.2.1 Markov/Stochastic Games 225
9.2.2.2 Decentralized Partially Observable Markov Decision Process (Dec-POMDP) 226
9.2.3 Applications of Multiagent DRL in Wireless Networks 227
9.2.4 Challenges of Using Multiagent DRL in Wireless Networks 229
9.2.4.1 Nonstationarity Issue 229
9.2.4.2 Partial Observability Issue 229
9.3 Other Challenges 230
9.3.1 Inherent Problems of Using RL in Real-World Systems 230
9.3.1.1 Limited Learning Samples 230
9.3.1.2 System Delays 230
9.3.1.3 High-Dimensional State and Action Spaces 231
9.3.1.4 System and Environment Constraints 231
9.3.1.5 Partial Observability and Nonstationarity 231
9.3.1.6 Multiobjective Reward Functions 232
9.3.2 Inherent Problems of DL and Beyond 232
9.3.2.1 Inherent Problems of DL 232
9.3.2.2 Challenges of DRL Beyond Deep Learning 233
9.3.3 Implementation of DL Models in Wireless Devices 236
9.4 Chapter Summary 237
References 237

10 DRL and Emerging Topics in Wireless Networks 241
10.1 DRL for Emerging Problems in Future Wireless Networks 241
10.1.1 Joint Radar and Data Communications 241
10.1.2 Ambient Backscatter Communications 244
10.1.3 Reconfigurable Intelligent Surface-Aided Communications 247
10.1.4 Rate Splitting Communications 249
10.2 Advanced DRL Models 252
10.2.1 Deep Reinforcement Transfer Learning 252
10.2.1.1 Reward Shaping 253
10.2.1.2 Intertask Mapping 254
10.2.1.3 Learning from Demonstrations 255
10.2.1.4 Policy Transfer 255
10.2.1.5 Reusing Representations 256
10.2.2 Generative Adversarial Network (GAN) for DRL 257
10.2.3 Meta Reinforcement Learning 258
10.3 Chapter Summary 259
References 259

Index 263
Dinh Thai Hoang, Ph.D., is a faculty member at the University of Technology Sydney, Australia. He is also an Associate Editor of IEEE Communications Surveys & Tutorials and an Editor of IEEE Transactions on Wireless Communications, IEEE Transactions on Cognitive Communications and Networking, and IEEE Transactions on Vehicular Technology.

Nguyen Van Huynh, Ph.D., obtained his Ph.D. from the University of Technology Sydney in 2022. He is currently a Research Associate in the Department of Electrical and Electronic Engineering, Imperial College London, UK.

Diep N. Nguyen, Ph.D., is Director of the Agile Communications and Computing Group and a member of the Faculty of Engineering and Information Technology at the University of Technology Sydney, Australia.

Ekram Hossain, Ph.D., is a Professor in the Department of Electrical and Computer Engineering at the University of Manitoba, Canada, and a Fellow of the IEEE. He co-authored the Wiley title Radio Resource Management in Multi-Tier Cellular Wireless Networks (2013).

Dusit Niyato, Ph.D., is a Professor in the School of Computer Science and Engineering at Nanyang Technological University, Singapore. He co-authored the Wiley title Radio Resource Management in Multi-Tier Cellular Wireless Networks (2013).