ISBN-13: 9781119815037 / English / Hardcover / 2022 / 1136 pp.
Preface
Acknowledgments

Part I - Introduction

1 Sequential Decision Problems
  1.1 The Audience
  1.2 The Communities of Sequential Decision Problems
  1.3 Our Universal Modeling Framework
  1.4 Designing Policies for Sequential Decision Problems
  1.5 Learning
  1.6 Themes
  1.7 Our Modeling Approach
  1.8 How to Read this Book
  1.9 Bibliographic Notes
  Exercises
  Bibliography

2 Canonical Problems and Applications
  2.1 Canonical Problems
  2.2 A Universal Modeling Framework for Sequential Decision Problems
  2.3 Applications
  2.4 Bibliographic Notes
  Exercises
  Bibliography

3 Online Learning
  3.1 Machine Learning for Sequential Decisions
  3.2 Adaptive Learning Using Exponential Smoothing
  3.3 Lookup Tables with Frequentist Updating
  3.4 Lookup Tables with Bayesian Updating
  3.5 Computing Bias and Variance*
  3.6 Lookup Tables and Aggregation*
  3.7 Linear Parametric Models
  3.8 Recursive Least Squares for Linear Models
  3.9 Nonlinear Parametric Models
  3.10 Nonparametric Models*
  3.11 Nonstationary Learning*
  3.12 The Curse of Dimensionality
  3.13 Designing Approximation Architectures in Adaptive Learning
  3.14 Why Does It Work?**
  3.15 Bibliographic Notes
  Exercises
  Bibliography

4 Introduction to Stochastic Search
  4.1 Illustrations of the Basic Stochastic Optimization Problem
  4.2 Deterministic Methods
  4.3 Sampled Models
  4.4 Adaptive Learning Algorithms
  4.5 Closing Remarks
  4.6 Bibliographic Notes
  Exercises
  Bibliography

Part II - Stochastic Search

5 Derivative-Based Stochastic Search
  5.1 Some Sample Applications
  5.2 Modeling Uncertainty
  5.3 Stochastic Gradient Methods
  5.4 Styles of Gradients
  5.5 Parameter Optimization for Neural Networks*
  5.6 Stochastic Gradient Algorithm as a Sequential Decision Problem
  5.7 Empirical Issues
  5.8 Transient Problems*
  5.9 Theoretical Performance*
  5.10 Why Does it Work?
  5.11 Bibliographic Notes
  Exercises
  Bibliography

6 Stepsize Policies
  6.1 Deterministic Stepsize Policies
  6.2 Adaptive Stepsize Policies
  6.3 Optimal Stepsize Policies*
  6.4 Optimal Stepsizes for Approximate Value Iteration*
  6.5 Convergence
  6.6 Guidelines for Choosing Stepsize Policies
  6.7 Why Does it Work?*
  6.8 Bibliographic Notes
  Exercises
  Bibliography

7 Derivative-Free Stochastic Search
  7.1 Overview of Derivative-free Stochastic Search
  7.2 Modeling Derivative-free Stochastic Search
  7.3 Designing Policies
  7.4 Policy Function Approximations
  7.5 Cost Function Approximations
  7.6 VFA-based Policies
  7.7 Direct Lookahead Policies
  7.8 The Knowledge Gradient (Continued)*
  7.9 Learning in Batches
  7.10 Simulation Optimization*
  7.11 Evaluating Policies
  7.12 Designing Policies
  7.13 Extensions*
  7.14 Bibliographic Notes
  Exercises
  Bibliography

Part III - State-dependent Problems

8 State-dependent Problems
  8.1 Graph Problems
  8.2 Inventory Problems
  8.3 Complex Resource Allocation Problems
  8.4 State-dependent Learning Problems
  8.5 A Sequence of Problem Classes
  8.6 Bibliographic Notes
  Exercises
  Bibliography

9 Modeling Sequential Decision Problems
  9.1 A Simple Modeling Illustration
  9.2 Notational Style
  9.3 Modeling Time
  9.4 The States of Our System
  9.5 Modeling Decisions
  9.6 The Exogenous Information Process
  9.7 The Transition Function
  9.8 The Objective Function
  9.9 Illustration: An Energy Storage Model
  9.10 Base Models and Lookahead Models
  9.11 A Classification of Problems*
  9.12 Policy Evaluation*
  9.13 Advanced Probabilistic Modeling Concepts**
  9.14 Looking Forward
  9.15 Bibliographic Notes
  Exercises
  Bibliography

10 Uncertainty Modeling
  10.1 Sources of Uncertainty
  10.2 A Modeling Case Study: The COVID Pandemic
  10.3 Stochastic Modeling
  10.4 Monte Carlo Simulation
  10.5 Case Study: Modeling Electricity Prices
  10.6 Sampling vs. Sampled Models
  10.7 Closing Notes
  10.8 Bibliographic Notes
  Exercises
  Bibliography

11 Designing Policies
  11.1 From Optimization to Machine Learning to Sequential Decision Problems
  11.2 The Classes of Policies
  11.3 Policy Function Approximations
  11.4 Cost Function Approximations
  11.5 Value Function Approximations
  11.6 Direct Lookahead Approximations
  11.7 Hybrid Strategies
  11.8 Randomized Policies
  11.9 Illustration: An Energy Storage Model Revisited
  11.10 Choosing the Policy Class
  11.11 Policy Evaluation
  11.12 Parameter Tuning
  11.13 Bibliographic Notes
  Exercises
  Bibliography

Part IV - Policy Search

12 Policy Function Approximations and Policy Search
  12.1 Policy Search as a Sequential Decision Problem
  12.2 Classes of Policy Function Approximations
  12.3 Problem Characteristics
  12.4 Flavors of Policy Search
  12.5 Policy Search with Numerical Derivatives
  12.6 Derivative-Free Methods for Policy Search
  12.7 Exact Derivatives for Continuous Sequential Problems*
  12.8 Exact Derivatives for Discrete Dynamic Programs**
  12.9 Supervised Learning
  12.10 Why Does it Work?
  12.11 Bibliographic Notes
  Exercises
  Bibliography

13 Cost Function Approximations
  13.1 General Formulation for Parametric CFA
  13.2 Objective-Modified CFAs
  13.3 Constraint-Modified CFAs
  13.4 Bibliographic Notes
  Exercises
  Bibliography

Part V - Lookahead Policies

14 Exact Dynamic Programming
  14.1 Discrete Dynamic Programming
  14.2 The Optimality Equations
  14.3 Finite Horizon Problems
  14.4 Continuous Problems with Exact Solutions
  14.5 Infinite Horizon Problems*
  14.6 Value Iteration for Infinite Horizon Problems*
  14.7 Policy Iteration for Infinite Horizon Problems*
  14.8 Hybrid Value-Policy Iteration*
  14.9 Average Reward Dynamic Programming*
  14.10 The Linear Programming Method for Dynamic Programs**
  14.11 Linear Quadratic Regulation
  14.12 Why Does it Work?**
  14.13 Bibliographic Notes
  Exercises
  Bibliography

15 Backward Approximate Dynamic Programming
  15.1 Backward Approximate Dynamic Programming for Finite Horizon Problems
  15.2 Fitted Value Iteration for Infinite Horizon Problems
  15.3 Value Function Approximation Strategies
  15.4 Computational Observations
  15.5 Bibliographic Notes
  Exercises
  Bibliography

16 Forward ADP I: The Value of a Policy
  16.1 Sampling the Value of a Policy
  16.2 Stochastic Approximation Methods
  16.3 Bellman's Equation Using a Linear Model*
  16.4 Analysis of TD(0), LSTD, and LSPE Using a Single State*
  16.5 Gradient-based Methods for Approximate Value Iteration*
  16.6 Value Function Approximations Based on Bayesian Learning*
  16.7 Learning Algorithms and Stepsizes
  16.8 Bibliographic Notes
  Exercises
  Bibliography

17 Forward ADP II: Policy Optimization
  17.1 Overview of Algorithmic Strategies
  17.2 Approximate Value Iteration and Q-Learning Using Lookup Tables
  17.3 Styles of Learning
  17.4 Approximate Value Iteration Using Linear Models
  17.5 On-policy vs. Off-policy Learning and the Exploration-Exploitation Problem
  17.6 Applications
  17.7 Approximate Policy Iteration
  17.8 The Actor-Critic Paradigm
  17.9 Statistical Bias in the Max Operator*
  17.10 The Linear Programming Method Using Linear Models*
  17.11 Finite Horizon Approximations for Steady-State Applications
  17.12 Bibliographic Notes
  Exercises
  Bibliography

18 Forward ADP III: Convex Resource Allocation Problems
  18.1 Resource Allocation Problems
  18.2 Values Versus Marginal Values
  18.3 Piecewise Linear Approximations for Scalar Functions
  18.4 Regression Methods
  18.5 Separable Piecewise Linear Approximations
  18.6 Benders Decomposition for Nonseparable Approximations**
  18.7 Linear Approximations for High-Dimensional Applications
  18.8 Resource Allocation with Exogenous Information State
  18.9 Closing Notes
  18.10 Bibliographic Notes
  Exercises
  Bibliography

19 Direct Lookahead Policies
  19.1 Optimal Policies Using Lookahead Models
  19.2 Creating an Approximate Lookahead Model
  19.3 Modified Objectives in Lookahead Models
  19.4 Evaluating DLA Policies
  19.5 Why Use a DLA?
  19.6 Deterministic Lookaheads
  19.7 A Tour of Stochastic Lookahead Policies
  19.8 Monte Carlo Tree Search for Discrete Decisions
  19.9 Two-Stage Stochastic Programming for Vector Decisions*
  19.10 Observations on DLA Policies
  19.11 Bibliographic Notes
  Exercises
  Bibliography

Part VI - Multiagent Systems

20 Multiagent Modeling and Learning
  20.1 Overview of Multiagent Systems
  20.2 A Learning Problem - Flu Mitigation
  20.3 The POMDP Perspective*
  20.4 The Two-Agent Newsvendor Problem
  20.5 Multiple Independent Agents - An HVAC Controller Model
  20.6 Cooperative Agents - A Spatially Distributed Blood Management Problem
  20.7 Closing Notes
  20.8 Why Does it Work?
  20.9 Bibliographic Notes
  Exercises
  Bibliography

Index
Warren B. Powell, PhD, is Professor Emeritus of Operations Research and Financial Engineering at Princeton University, where he taught for 39 years. He was the founder and Director of CASTLE Laboratory, a research unit that works with industrial partners to test new ideas found in operations research. He supervised 70 graduate students and post-docs, with whom he wrote over 250 papers. He is currently the Chief Analytics Officer of Optimal Dynamics, a lab spinoff that is taking his research to industry.