Controlled Stochastic Process

a stochastic process whose probability characteristics can be changed by means of control actions. The main goal of the theory of stochastic control is to find optimal or near-optimal controls that provide an extremum for a given performance criterion.

Let us take the simple case of controlled Markov chains and consider one of the ways in which the problem of finding an optimal control can be stated mathematically. Suppose that $X^d = (x_n^d)_{n \ge 0}$, $d \in D$, is a family of homogeneous Markov chains with a finite set of states $E = \{0, 1, \dots, N\}$ and matrices of transition probabilities $\|p_{xy}(d)\|$, $x, y \in E$. The transition probabilities depend on the parameter $d$, which belongs to some set $D$ of control actions. A family of functions $\alpha = \{\alpha_0(x_0), \alpha_1(x_0, x_1), \dots\}$ with values in $D$ is called a strategy, and each function $\alpha_n = \alpha_n(x_0, \dots, x_n)$ is called the control at time $n$. To every strategy $\alpha$ there corresponds a controlled Markov chain $X^\alpha = (x_n^\alpha)_{n \ge 0}$, where

$$P\{x_{n+1}^\alpha = y \mid x_0^\alpha, \dots, x_n^\alpha\} = p_{x_n^\alpha y}\bigl(\alpha_n(x_0^\alpha, \dots, x_n^\alpha)\bigr), \quad y \in E.$$
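
The definition of a controlled chain can be made concrete with a short simulation. This is a minimal sketch on a hypothetical toy instance that is not part of the text: $E = \{0, 1, 2\}$, $D = \{0, 1\}$, and the invented matrices `P[d]`, where `P[d][x][y]` is the probability of moving from state x to state y under control d. A strategy is any Python function of the history $(x_0, \dots, x_n)$, so history-dependent controls are allowed.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical toy instance (not from the text): E = {0, 1, 2}, D = {0, 1}.
# P[d][x][y] is the transition probability from x to y under control d;
# state 0 is absorbing under every control.
P: Dict[int, List[List[float]]] = {
    0: [[1.0, 0.0, 0.0],   # control 0: from x >= 1, move to x - 1 or stay, w.p. 1/2 each
        [0.5, 0.5, 0.0],
        [0.0, 0.5, 0.5]],
    1: [[1.0, 0.0, 0.0],   # control 1: jump straight to state 0
        [1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0]],
}

# A strategy maps the history (x_0, ..., x_n) to the control alpha_n in D.
Strategy = Callable[[Sequence[int]], int]


def simulate(alpha: Strategy, x0: int, n_steps: int, rng: random.Random) -> List[int]:
    """Sample the path x_0, ..., x_{n_steps} of the controlled chain X^alpha."""
    path = [x0]
    for _ in range(n_steps):
        d = alpha(path)                 # control at time n, chosen from the whole history
        row = P[d][path[-1]]            # transition probabilities out of the current state
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


# History-dependent strategy: control 0 at times 0 and 1, control 1 afterwards.
alpha = lambda history: 0 if len(history) <= 2 else 1
path = simulate(alpha, x0=2, n_steps=5, rng=random.Random(0))
print(path)
```

Because control 1 sends every state to the absorbing state 0, the sampled path is in state 0 from time 3 on, whatever the random draws at times 1 and 2 happen to be.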

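Let

$$V^\alpha(x) = \mathbb{E}_x \sum_{n=0}^{\infty} f\bigl(\alpha_n(x_0^\alpha, \dots, x_n^\alpha),\, x_n^\alpha\bigr), \quad x \in E,$$

where $\mathbb{E}_x$ denotes the mathematical expectation for the chain started at $x_0^\alpha = x$, and the function $f(d, x)$ satisfies $f(d, x) \ge 0$ and $f(d, 0) = 0$. (If state 0 is absorbing and $f(d, x) = 1$ for all $d \in D$ and $x = 1, \dots, N$, then $V^\alpha(x)$ is the expected time needed to reach state 0 from state $x$.)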
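
The quantity $V^\alpha(x)$ can be estimated by simulation, taking it, as the parenthetical remark indicates, to be the expected total cost $f$ accumulated along the controlled chain started at $x$. The sketch below reuses the hypothetical toy instance ($E = \{0, 1, 2\}$, $D = \{0, 1\}$, state 0 absorbing) with an invented cost `f` that is 1 per step under control 0 and 3 under control 1. The truncation `horizon`, the number of runs, and the seed are illustrative assumptions; stopping a run at state 0 relies on that state being absorbing and cost-free.

```python
import random
from typing import Callable, Dict, List, Sequence

# Same hypothetical toy instance: E = {0, 1, 2}, D = {0, 1}, state 0 absorbing.
P: Dict[int, List[List[float]]] = {
    0: [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
    1: [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
}


def f(d: int, x: int) -> float:
    """Hypothetical cost: f(d, 0) = 0; otherwise 1 for control 0 and 3 for control 1."""
    if x == 0:
        return 0.0
    return 1.0 if d == 0 else 3.0


def estimate_value(alpha: Callable[[Sequence[int]], int], x0: int,
                   n_runs: int = 20_000, horizon: int = 1_000, seed: int = 0) -> float:
    """Monte Carlo estimate of V^alpha(x0), the expected total cost from x0.

    The infinite sum is truncated after `horizon` steps, and a run stops as soon
    as it enters the absorbing state 0, because every later cost is f(d, 0) = 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        path = [x0]
        for _ in range(horizon):
            x = path[-1]
            if x == 0:
                break
            d = alpha(path)             # control alpha_n(x_0, ..., x_n)
            total += f(d, x)
            row = P[d][x]
            path.append(rng.choices(range(len(row)), weights=row)[0])
    return total / n_runs


# With control 0 every step costs 1, so V^alpha(x) is the expected time to reach 0.
print(estimate_value(lambda h: 0, x0=2))   # close to 4
print(estimate_value(lambda h: 1, x0=2))   # 3.0: a single jump costing 3
```

Under the "always control 0" strategy the estimate approximates the expected hitting time of state 0, which is the special case described in the parenthetical remark.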

is called the value, and the strategy α* is said to be optimal if

Vα* (x) = V (x)

for all x ∊ E.

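Under quite general assumptions regarding the set $D$, it can be shown that the value $V(x)$ satisfies the following optimality equation (the Bellman equation):

$$V(x) = \inf_{d \in D} \bigl[f(d, x) + T^d V(x)\bigr], \quad x \in E,$$

where

$$T^d V(x) = \sum_{y \in E} p_{xy}(d)\, V(y).$$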
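
One common way to solve the Bellman equation numerically is value iteration (successive approximations): start from $V_0 = 0$ and repeatedly apply $V_{k+1}(x) = \min_{d} [f(d, x) + T^d V_k(x)]$, with $T^d V(x) = \sum_y p_{xy}(d) V(y)$. This is a sketch on the same hypothetical toy instance, assuming a finite set $D$; convergence here relies on the nonnegative costs and the absorbing state 0 of this example and is not claimed under the general assumptions of the text.

```python
from typing import Dict, List

# Same hypothetical toy instance: E = {0, 1, 2}, D = {0, 1}, state 0 absorbing.
P: Dict[int, List[List[float]]] = {
    0: [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
    1: [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
}
D = sorted(P)
N_STATES = 3


def f(d: int, x: int) -> float:
    """Hypothetical cost: f(d, 0) = 0; otherwise 1 for control 0 and 3 for control 1."""
    if x == 0:
        return 0.0
    return 1.0 if d == 0 else 3.0


def T(d: int, V: List[float], x: int) -> float:
    """T^d V(x) = sum over y in E of p_xy(d) V(y)."""
    return sum(p * v for p, v in zip(P[d][x], V))


def value_iteration(tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Iterate V <- min_d [f(d, .) + T^d V] from V = 0 until successive iterates agree."""
    V = [0.0] * N_STATES
    for _ in range(max_iter):
        V_new = [min(f(d, x) + T(d, V, x) for d in D) for x in range(N_STATES)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value iteration did not converge")


V = value_iteration()
print([round(v, 6) for v in V])   # [0.0, 2.0, 3.0]
```

In this instance it is cheaper to step down from state 1 (expected cost 2) but to jump directly from state 2 (cost 3 rather than 4), which the fixed point reflects.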

Among all strategies, the homogeneous Markovian strategies are of the greatest interest; such a strategy is characterized by a single function $\alpha(x)$ such that $\alpha_n(x_0, \dots, x_n) = \alpha(x_n)$ for all $n = 0, 1, \dots$.
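
Restricting attention to homogeneous Markovian strategies makes a simple search possible. The sketch below uses policy iteration (often credited to R. A. Howard) on the same hypothetical toy instance: a strategy is stored as a list `alpha` with `alpha[x]` the control in state x; it is evaluated by solving the linear system $V(x) = f(\alpha(x), x) + T^{\alpha(x)} V(x)$ with $V(0) = 0$, and then improved greedily. The helper names and the assumption that every strategy reaches state 0 with probability 1 are illustrative, not from the text.

```python
from typing import Dict, List, Tuple

# Same hypothetical toy instance: E = {0, 1, 2}, D = {0, 1}, state 0 absorbing.
P: Dict[int, List[List[float]]] = {
    0: [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
    1: [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
}
D = sorted(P)


def f(d: int, x: int) -> float:
    """Hypothetical cost: f(d, 0) = 0; otherwise 1 for control 0 and 3 for control 1."""
    if x == 0:
        return 0.0
    return 1.0 if d == 0 else 3.0


def T(d: int, V: List[float], x: int) -> float:
    """T^d V(x) = sum over y in E of p_xy(d) V(y)."""
    return sum(p * v for p, v in zip(P[d][x], V))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system A v = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    v = [0.0] * n
    for r in reversed(range(n)):
        v[r] = (M[r][n] - sum(M[r][k] * v[k] for k in range(r + 1, n))) / M[r][r]
    return v


def evaluate(alpha: List[int]) -> List[float]:
    """V^alpha for a homogeneous Markovian strategy, with V(0) = 0.

    Solves V(x) = f(alpha(x), x) + T^{alpha(x)} V(x) for x = 1, ..., N; assumes the
    strategy reaches state 0 with probability 1 (otherwise the system may be singular).
    """
    n = len(alpha)
    A = [[(1.0 if x == y else 0.0) - P[alpha[x]][x][y] for y in range(1, n)]
         for x in range(1, n)]
    b = [f(alpha[x], x) for x in range(1, n)]
    return [0.0] + solve(A, b)


def policy_iteration(alpha: List[int]) -> Tuple[List[int], List[float]]:
    """Alternate evaluation and greedy improvement until the strategy is stable."""
    alpha = list(alpha)
    while True:
        V = evaluate(alpha)
        new_alpha = []
        for x in range(len(alpha)):
            q = {d: f(d, x) + T(d, V, x) for d in D}
            best = min(q, key=q.get)
            # Keep the current control on ties so that the loop terminates.
            new_alpha.append(alpha[x] if q[alpha[x]] <= q[best] + 1e-12 else best)
        if new_alpha == alpha:
            return alpha, V
        alpha = new_alpha


alpha_star, V = policy_iteration([1, 1, 1])   # start from "always jump to 0"
print(alpha_star, V)                          # [1, 0, 1] [0.0, 2.0, 3.0]
```

The control chosen in state 0 is irrelevant here, since that state is absorbing and free; only `alpha_star[1:]` carries information.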

The following optimality criterion (a sufficient condition for optimality) can be used to verify that a given homogeneous Markovian strategy is optimal. Suppose there exist functions $\alpha^* = \alpha^*(x)$ and $V^* = V^*(x)$ such that for all $x \in E$ and any $d \in D$

$$0 = f\bigl(\alpha^*(x), x\bigr) + L^{\alpha^*(x)} V^*(x) \le f(d, x) + L^d V^*(x),$$

where $L^d = T^d - I$ and $I$ is the identity operator. Then $V^*$ is the value ($V^* = V$), and the strategy $\alpha^* = \alpha^*(x)$ is optimal.
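
The criterion is easy to check mechanically for a candidate pair $(\alpha^*, V^*)$. The sketch below, again on the hypothetical toy instance, tests the equality and the inequality for every state and control with a floating-point tolerance. The additional requirement $V^*(0) = 0$ is an assumption added for this example: because state 0 is absorbing and free and the rows of each transition matrix sum to 1, adding a constant to $V^*$ leaves every $L^d V^*$ unchanged, so some normalization is needed to pin $V^*$ down.

```python
from typing import Dict, List

# Same hypothetical toy instance: E = {0, 1, 2}, D = {0, 1}, state 0 absorbing.
P: Dict[int, List[List[float]]] = {
    0: [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
    1: [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
}
D = sorted(P)


def f(d: int, x: int) -> float:
    """Hypothetical cost: f(d, 0) = 0; otherwise 1 for control 0 and 3 for control 1."""
    if x == 0:
        return 0.0
    return 1.0 if d == 0 else 3.0


def L(d: int, V: List[float], x: int) -> float:
    """L^d V(x) = T^d V(x) - V(x), i.e. L^d = T^d - I."""
    return sum(p * v for p, v in zip(P[d][x], V)) - V[x]


def satisfies_criterion(alpha_star: List[int], V_star: List[float],
                        tol: float = 1e-9) -> bool:
    """Check 0 = f(alpha*(x), x) + L^{alpha*(x)} V*(x) <= f(d, x) + L^d V*(x) for all x, d.

    The test V*(0) == 0 is an added normalization for this toy instance only.
    """
    if abs(V_star[0]) > tol:
        return False
    for x in range(len(V_star)):
        if abs(f(alpha_star[x], x) + L(alpha_star[x], V_star, x)) > tol:
            return False
        if any(f(d, x) + L(d, V_star, x) < -tol for d in D):
            return False
    return True


print(satisfies_criterion([1, 0, 1], [0.0, 2.0, 3.0]))  # True
print(satisfies_criterion([1, 1, 1], [0.0, 3.0, 3.0]))  # False: control 0 is better at x = 1
```

The second candidate satisfies the equality (it is the correct cost of its own strategy) but violates the inequality, since $f(0, 1) + L^0 V^*(1) = -0.5 < 0$.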

A. N. SHIRIAEV
