
Reinforcement Learning and Optimal Control

2023-06-15 22:57 Author: 想不到吧我还是我

Link: https://pan.baidu.com/s/1wgyQz-jpgF32a-lkQzuV7A?pwd=8xc7

Extraction code: 8xc7

About the Book

Reinforcement Learning and Optimal Control (English edition) considers large and challenging multistage decision problems, which can in principle be solved by dynamic programming and optimal control, but whose exact solution is computationally intractable. The book discusses solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively known as reinforcement learning, and are also referred to as approximate dynamic programming and neuro-dynamic programming. The subject of the book has grown out of the interplay between ideas from optimal control and from artificial intelligence. One of its aims is to explore the common boundary between these two fields and to build a bridge that is accessible to readers with a background in either field.

About the Author

Dimitri P. Bertsekas is a tenured professor at MIT, a member of the US National Academy of Engineering, and a guest professor at the Research Center for Complex and Networked Systems at Tsinghua University. He is an internationally renowned author in electrical engineering and computer science, with more than a dozen best-selling textbooks and monographs to his name, including Nonlinear Programming, Network Optimization, and Convex Optimization.

Contents

1. Exact Dynamic Programming
1.1. Deterministic Dynamic Programming
1.1.1. Deterministic Problems
1.1.2. The Dynamic Programming Algorithm
1.1.3. Approximation in Value Space
1.2. Stochastic Dynamic Programming
1.3. Examples, Variations, and Simplifications
1.3.1. Deterministic Shortest Path Problems
1.3.2. Discrete Deterministic Optimization
1.3.3. Problems with a Termination State
1.3.4. Forecasts
1.3.5. Problems with Uncontrollable State Components
1.3.6. Partial State Information and Belief States
1.3.7. Linear Quadratic Optimal Control
1.3.8. Systems with Unknown Parameters - Adaptive Control
1.4. Reinforcement Learning and Optimal Control - Some Terminology
1.5. Notes and Sources

2. Approximation in Value Space
2.1. Approximation Approaches in Reinforcement Learning
2.1.1. General Issues of Approximation in Value Space
2.1.2. Off-Line and On-Line Methods
2.1.3. Model-Based Simplification of the Lookahead Minimization
2.1.4. Model-Free Off-Line Q-Factor Approximation
2.1.5. Approximation in Policy Space on Top of Approximation in Value Space
2.1.6. When is Approximation in Value Space Effective?
2.2. Multistep Lookahead
2.2.1. Multistep Lookahead and Rolling Horizon
2.2.2. Multistep Lookahead and Deterministic Problems
2.3. Problem Approximation
2.3.1. Enforced Decomposition
2.3.2. Probabilistic Approximation - Certainty Equivalent Control
2.4. Rollout and the Policy Improvement Principle
2.4.1. On-Line Rollout for Deterministic Discrete Optimization
2.4.2. Stochastic Rollout and Monte Carlo Tree Search
2.4.3. Rollout with an Expert
2.5. On-Line Rollout for Deterministic Infinite-Spaces Problems - Optimization Heuristics
2.5.1. Model Predictive Control
2.5.2. Target Tubes and the Constrained Controllability Condition
2.5.3. Variants of Model Predictive Control
2.6. Notes and Sources

3. Parametric Approximation
3.1. Approximation Architectures
3.1.1. Linear and Nonlinear Feature-Based Architectures
3.1.2. Training of Linear and Nonlinear Architectures
3.1.3. Incremental Gradient and Newton Methods
3.2. Neural Networks
3.2.1. Training of Neural Networks
3.2.2. Multilayer and Deep Neural Networks
3.3. Sequential Dynamic Programming Approximation
3.4. Q-Factor Parametric Approximation
3.5. Parametric Approximation in Policy Space by Classification
3.6. Notes and Sources

4. Infinite Horizon Dynamic Programming
4.1. An Overview of Infinite Horizon Problems
4.2. Stochastic Shortest Path Problems
4.3. Discounted Problems
4.4. Semi-Markov Discounted Problems
4.5. Asynchronous Distributed Value Iteration
4.6. Policy Iteration
4.6.1. Exact Policy Iteration
4.6.2. Optimistic and Multistep Lookahead Policy Iteration
4.6.3. Policy Iteration for Q-factors
4.7. Notes and Sources
4.8. Appendix: Mathematical Analysis
4.8.1. Proofs for Stochastic Shortest Path Problems
4.8.2. Proofs for Discounted Problems
4.8.3. Convergence of Exact and Optimistic Policy Iteration

5. Infinite Horizon Reinforcement Learning
5.1. Approximation in Value Space - Performance Bounds
5.1.1. Limited Lookahead
5.1.2. Rollout and Approximate Policy Improvement
5.1.3. Approximate Policy Iteration
5.2. Fitted Value Iteration
5.3. Simulation-Based Policy Iteration with Parametric Approximation
5.3.1. Self-Learning and Actor-Critic Methods
5.3.2. Model-Based Variant of a Critic-Only Method
5.3.3. Model-Free Variant of a Critic-Only Method
5.3.4. Implementation Issues of Parametric Policy Iteration
5.3.5. Convergence Issues of Parametric Policy Iteration - Oscillations
5.4. Q-Learning
5.4.1. Optimistic Policy Iteration with Parametric Q-Factor Approximation - SARSA and DQN
5.5. Additional Methods - Temporal Differences
5.6. Exact and Approximate Linear Programming
5.7. Approximation in Policy Space
5.7.1. Training by Cost Optimization - Policy Gradient, Cross-Entropy, and Random Search Methods
5.7.2. Expert-Based Supervised Learning
5.7.3. Approximate Policy Iteration, Rollout, and Approximation in Policy Space
5.8. Notes and Sources
5.9. Appendix: Mathematical Analysis
5.9.1. Performance Bounds for Multistep Lookahead
5.9.2. Performance Bounds for Rollout
5.9.3. Performance Bounds for Approximate Policy Iteration

6. Aggregation
6.1. Aggregation with Representative States
6.1.1. Continuous State and Control Space Discretization
6.1.2. Continuous State Space - POMDP Discretization
6.2. Aggregation with Representative Features
6.2.1. Hard Aggregation and Error Bounds
6.2.2. Aggregation Using Features
6.3. Methods for Solving the Aggregate Problem
6.3.1. Simulation-Based Policy Iteration
6.3.2. Simulation-Based Value Iteration
6.4. Feature-Based Aggregation with a Neural Network
6.5. Biased Aggregation
6.6. Notes and Sources
6.7. Appendix: Mathematical Analysis

References

Index



Preface

Turning to the succor of modern computing machines, let us renounce all analytic tools.
Richard Bellman [Bel57]

From a teleological point of view the particular numerical solution of any particular set of equations is of far less importance than the understanding of the nature of the solution.
Richard Bellman [Bel57]

In this book we consider large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP for short), but their exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively known by several essentially equivalent names: reinforcement learning, approximate dynamic programming, and neuro-dynamic programming. We will use primarily the most popular name: reinforcement learning.

Our subject has benefited greatly from the interplay of ideas from optimal control and from artificial intelligence. One of the aims of the book is to explore the common boundary between these two fields and to form a bridge that is accessible by workers with background in either field. Another aim is to organize coherently the broad mosaic of methods that have proved successful in practice while having a solid theoretical and/or logical foundation. This may help researchers and practitioners to find their way through the maze of competing ideas that constitute the current state of the art.

There are two general approaches for DP-based suboptimal control. The first is approximation in value space, where we approximate in some way the optimal cost-to-go function with some other function. The major alternative to approximation in value space is approximation in policy space ...


