
Reinforcement Learning

Industrial Applications of Intelligent Agents

Paperback, English, 2020, ISBN 9781098114831


Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to learn by reinforcement and enable a machine to learn by itself.

Author Phil Winder of Winder Research covers everything from basic building blocks to state-of-the-art practices. You'll explore the current state of RL, focus on industrial applications, learn numerous algorithms, and benefit from dedicated chapters on deploying RL solutions to production. This is no cookbook; it doesn't shy away from math and expects familiarity with ML.

- Learn what RL is and how the algorithms help solve problems
- Become grounded in RL fundamentals including Markov decision processes, dynamic programming, and temporal difference learning
- Dive deep into a range of value and policy gradient methods
- Apply advanced RL solutions such as meta learning, hierarchical learning, multi-agent, and imitation learning
- Understand cutting-edge deep RL algorithms including Rainbow, PPO, TD3, SAC, and more
- Get practical examples through the accompanying website


Number of pages: 350
Main category: IT management / ICT




Who Should Read This Book?
Guiding Principles and Style
Scope and Outline
Supplementary Materials
Conventions Used in This Book
Mathematical Notation
Fair Use Policy
O’Reilly Online Learning
How to Contact Us

1. Why Reinforcement Learning?
Why Now?
Machine Learning
Reinforcement Learning
When Should You Use RL?
RL Applications
Taxonomy of RL Approaches
Model-Free or Model-Based
How Agents Use and Update Their Strategy
Discrete or Continuous Actions
Optimization Methods
Policy Evaluation and Improvement
Fundamental Concepts in Reinforcement Learning
The First RL Algorithm
Is RL the Same as ML?
Reward and Feedback
Reinforcement Learning as a Discipline
Further Reading

2. Markov Decision Processes, Dynamic Programming, and Monte Carlo Methods
Multi-Arm Bandit Testing
Reward Engineering
Policy Evaluation: The Value Function
Policy Improvement: Choosing the Best Action
Simulating the Environment
Running the Experiment
Improving the ϵ-greedy Algorithm
Markov Decision Processes
Inventory Control
Inventory Control Simulation
Policies and Value Functions
Discounted Rewards
Predicting Rewards with the State-Value Function
Predicting Rewards with the Action-Value Function
Optimal Policies
Monte Carlo Policy Generation
Value Iteration with Dynamic Programming
Implementing Value Iteration
Results of Value Iteration
Further Reading

3. Temporal-Difference Learning, Q-Learning, and n-Step Algorithms
Formulation of Temporal-Difference Learning
Q-Learning Versus SARSA
Case Study: Automatically Scaling Application Containers to Reduce Cost
Industrial Example: Real-Time Bidding in Advertising
Defining the MDP
Results of the Real-Time Bidding Environments
Further Improvements
Extensions to Q-Learning
Double Q-Learning
Delayed Q-Learning
Comparing Standard, Double, and Delayed Q-learning
Opposition Learning
n-Step Algorithms
n-Step Algorithms on Grid Environments
Eligibility Traces
Extensions to Eligibility Traces
Watkins’s Q(λ)
Fuzzy Wipes in Watkins’s Q(λ)
Speedy Q-Learning
Accumulating Versus Replacing Eligibility Traces
Further Reading

4. Deep Q-Networks
Deep Learning Architectures
Common Neural Network Architectures
Deep Learning Frameworks
Deep Reinforcement Learning
Deep Q-Learning
Experience Replay
Q-Network Clones
Neural Network Architecture
Implementing DQN
Example: DQN on the CartPole Environment
Case Study: Reducing Energy Usage in Buildings
Rainbow DQN
Distributional RL
Prioritized Experience Replay
Noisy Nets
Dueling Networks
Example: Rainbow DQN on Atari Games
Other DQN Improvements
Improving Exploration
Improving Rewards
Learning from Offline Data
Further Reading

5. Policy Gradient Methods
Benefits of Learning a Policy Directly
How to Calculate the Gradient of a Policy
Policy Gradient Theorem
Policy Functions
Linear Policies
Arbitrary Policies
Basic Implementations
Monte Carlo (REINFORCE)
REINFORCE with Baseline
Gradient Variance Reduction
n-Step Actor-Critic and Advantage Actor-Critic (A2C)
Eligibility Traces Actor-Critic
A Comparison of Basic Policy Gradient Algorithms
Industrial Example: Automatically Purchasing Products for Customers
The Environment: Gym-Shopping-Cart
Results from the Shopping Cart Environment
Further Reading

6. Beyond Policy Gradients
Off-Policy Algorithms
Importance Sampling
Behavior and Target Policies
Off-Policy Q-Learning
Gradient Temporal-Difference Learning
Off-Policy Actor-Critics
Deterministic Policy Gradients
Deterministic Policy Gradients
Deep Deterministic Policy Gradients
Twin Delayed DDPG
Case Study: Recommendations Using Reviews
Improvements to DPG
Trust Region Methods
Kullback–Leibler Divergence
Natural Policy Gradients and Trust Region Policy Optimization
Proximal Policy Optimization
Example: Using Servos for a Real-Life Reacher
Experiment Setup
RL Algorithm Implementation
Increasing the Complexity of the Algorithm
Hyperparameter Tuning in a Simulation
Resulting Policies
Other Policy Gradient Algorithms
Retrace(λ)
Actor-Critic with Experience Replay (ACER)
Actor-Critic Using Kronecker-Factored Trust Regions (ACKTR)
Emphatic Methods
Extensions to Policy Gradient Algorithms
Quantile Regression in Policy Gradient Algorithms
Which Algorithm Should I Use?
A Note on Asynchronous Methods
Further Reading

7. Learning All Possible Policies with Entropy Methods
What Is Entropy?
Maximum Entropy Reinforcement Learning
Soft Actor-Critic
SAC Implementation Details and Discrete Action Spaces
Automatically Adjusting Temperature
Case Study: Automated Traffic Management to Reduce Queuing
Extensions to Maximum Entropy Methods
Other Measures of Entropy (and Ensembles)
Optimistic Exploration Using the Upper Bound of Double Q-Learning
Tinkering with Experience Replay
Soft Policy Gradient
Soft Q-Learning (and Derivatives)
Path Consistency Learning
Performance Comparison: SAC Versus PPO
How Does Entropy Encourage Exploration?
How Does the Temperature Parameter Alter Exploration?
Industrial Example: Learning to Drive with a Remote Control Car
Description of the Problem
Minimizing Training Time
Dramatic Actions
Hyperparameter Search
Final Policy
Further Improvements
Equivalence Between Policy Gradients and Soft Q-Learning
What Does This Mean For the Future?
What Does This Mean Now?

8. Improving How an Agent Learns
Rethinking the MDP
Partially Observable Markov Decision Process
Case Study: Using POMDPs in Autonomous Vehicles
Contextual Markov Decision Processes
MDPs with Changing Actions
Regularized MDPs
Hierarchical Reinforcement Learning
Naive HRL
High-Low Hierarchies with Intrinsic Rewards (HIRO)
Learning Skills and Unsupervised RL
Using Skills in HRL
HRL Conclusions
Multi-Agent Reinforcement Learning
MARL Frameworks
Centralized or Decentralized
Single-Agent Algorithms
Case Study: Using Single-Agent Decentralized Learning in UAVs
Centralized Learning, Decentralized Execution
Decentralized Learning
Other Combinations
Challenges of MARL
MARL Conclusions
Expert Guidance
Behavior Cloning
Imitation RL
Inverse RL
Curriculum Learning
Other Paradigms
Transfer Learning
Further Reading

9. Practical Reinforcement Learning
The RL Project Life Cycle
Life Cycle Definition
Problem Definition: What Is an RL Project?
RL Problems Are Sequential
RL Problems Are Strategic
Low-Level RL Indicators
Types of Learning
RL Engineering and Refinement
Environment Engineering
State Engineering or State Representation Learning
Policy Engineering
Mapping Policies to Action Spaces
Reward Engineering
Further Reading

10. Operational Reinforcement Learning
Scaling RL
Ancillary Tooling
Safety, Security, and Ethics
Further Reading

11. Conclusions and the Future
Tips and Tricks
Framing the Problem
Your Data
Monitoring for Debugging
The Future of Reinforcement Learning
RL Market Opportunities
Future RL and Research Directions
Concluding Remarks
Next Steps
Now It’s Your Turn
Further Reading

A. The Gradient of a Logistic Policy for Two Actions
B. The Gradient of a Softmax Policy

Acronyms and Common Terms
Symbols and Notation

