Lecture 5 (Dutta): Dynamic Programming 4

Original article (link) posted: 13/10/2005

First we looked at two more properties of the value function, supermodularity and differentiability.
Then, we examined the model with slightly different assumptions; deterministic transition function, unbounded reward function, unit discount factor, and finite horizon DP.

Result (Supermodularity)
Suppose action space is subset of R. If the reward function is supermodular and transition function is action-dependent, then the optimal action correspondence is monotone (the largest action in the optimal set is monotone).

Result (Differentiability)
Suppose reward function is differentiable on S and either
a) transition function is action-dependent
b) transition function has a density which is differentiable on S
then, V is differentiable on int S

Deterministic model
Just a special case of the stochastic case

Unbounded reward
Some kind of bound conditions are needed

No discounting
Continuity of discounting case and no discounting case (See Dutta (1995))
It is hard to derive the value function without discounting (Long-run average payoff). So, we can first solve the value function in the discounting case and make a discount factor go to unity to solve it.

Finite horizon DP (continuity of V)
The value function of a finite DP problem with T remaining period, V(T) converges (in sup norm) to that of infinite horizon model as T goes to infinite.

Finite horizon DP (continuity of h)
Under continuity and compactness assumptions, there exists a sequence of optimal action policies h(t), h(t-1),… If h(T) converges to some policy, say h, as T goes to infinity, then h is a stationary optimal policy for the corresponding infinite DP problem.

No comments: