*Original article (link) posted: 13/10/2005*

First we looked at two more properties of the value function, supermodularity and differentiability.

Then we examined the model under slightly different assumptions: a deterministic transition function, an unbounded reward function, a unit discount factor, and finite horizon DP.

__Result (Supermodularity)__

Suppose the action space is a subset of *R*. If the reward function is supermodular and the transition function is action-dependent, then the optimal action correspondence is monotone (the largest action in the optimal set is monotone).
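
The result above can be illustrated numerically. The following is a minimal sketch, not the lecture's own example: the reward `s * a - 0.5 * a * a`, the grid of six states and actions, and the rule "next state equals the action" are all assumptions chosen for illustration. "Action-dependent" is read here as "the transition depends only on the action". The function name `solve_monotone_example` is hypothetical.

```python
def reward(s, a):
    # Assumed reward: the cross term s * a gives increasing differences
    # in (s, a), i.e. supermodularity on this grid.
    return s * a - 0.5 * a * a


def solve_monotone_example(n=6, beta=0.9, iters=500):
    """Value iteration on a toy problem where the next state equals the action."""
    states = range(n)
    actions = range(n)
    V = [0.0] * n
    for _ in range(iters):
        # The continuation value beta * V[a] depends on the action only.
        V = [max(reward(s, a) + beta * V[a] for a in actions) for s in states]

    policy = []
    for s in states:
        q = [reward(s, a) + beta * V[a] for a in actions]
        best = max(q)
        # Pick the largest action among the (numerical) maximizers.
        policy.append(max(a for a in actions if q[a] >= best - 1e-12))
    return V, policy


if __name__ == "__main__":
    V, policy = solve_monotone_example()
    print("largest optimal action by state:", policy)
    print("nondecreasing in the state:",
          all(x <= y for x, y in zip(policy, policy[1:])))
```

Because the continuation term does not depend on the current state, the objective inherits increasing differences from the reward, which is what drives the monotonicity.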

__Result (Differentiability)__

Suppose the reward function is differentiable on *S* and either

a) the transition function is action-dependent, or

b) the transition function has a density which is differentiable on *S*.

Then *V* is differentiable on *int S*.

__Deterministic model__

This is just a special case of the stochastic model.

__Unbounded reward__

Some kind of boundedness conditions are needed.

__No discounting__

There is continuity between the discounting case and the no-discounting case (see **Dutta (1995)**).

*Note)* It is hard to derive the value function without discounting (long-run average payoff) directly. So, we can first solve for the value function in the discounting case and then let the discount factor go to unity.
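
The procedure in the note can be sketched on a toy problem. Everything below is an assumption made for illustration: a deterministic two-state problem in which `stay` pays 1 in state 0 and 2 in state 1, while `move` pays 0 and switches the state. The best long-run average payoff here is 2 (reach state 1 and stay), and the normalized discounted values (1 - beta) * V should approach it as beta goes to unity. The names `discounted_value` and `normalized_values` are hypothetical.

```python
ACTIONS = ("stay", "move")
# Assumed toy rewards and deterministic transitions.
REWARD = {(0, "stay"): 1.0, (0, "move"): 0.0, (1, "stay"): 2.0, (1, "move"): 0.0}
NEXT = {(0, "stay"): 0, (0, "move"): 1, (1, "stay"): 1, (1, "move"): 0}


def discounted_value(beta, tol=1e-10):
    """Value iteration for a discount factor beta < 1."""
    V = [0.0, 0.0]
    while True:
        new = [max(REWARD[s, a] + beta * V[NEXT[s, a]] for a in ACTIONS)
               for s in (0, 1)]
        if max(abs(x - y) for x, y in zip(new, V)) < tol:
            return new
        V = new


def normalized_values(betas=(0.9, 0.99, 0.999)):
    """Return (1 - beta) * V_beta for each beta, as a dict keyed by beta."""
    return {beta: [(1 - beta) * v for v in discounted_value(beta)]
            for beta in betas}


if __name__ == "__main__":
    for beta, vals in normalized_values().items():
        print(beta, [round(v, 4) for v in vals])
```

Value iteration slows down as beta approaches 1, so in practice one stops at a discount factor close to unity rather than at unity itself.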

__Finite horizon DP (continuity of V)__

The value function of a finite horizon DP problem with *T* remaining periods, *V(T)*, converges (in sup norm) to that of the infinite horizon model as *T* goes to infinity.
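
A small numerical check of this convergence, as a sketch under stated assumptions: a discount factor of 0.9, bounded rewards, and a hypothetical two-state deterministic problem (`stay` pays 1 in state 0 and 2 in state 1, `move` pays 0 and switches the state). For this toy problem the infinite horizon values have a closed form, so the sup norm gap can be computed directly. The function names are illustrative only.

```python
BETA = 0.9  # assumed discount factor, strictly below one
ACTIONS = ("stay", "move")
REWARD = {(0, "stay"): 1.0, (0, "move"): 0.0, (1, "stay"): 2.0, (1, "move"): 0.0}
NEXT = {(0, "stay"): 0, (0, "move"): 1, (1, "stay"): 1, (1, "move"): 0}


def bellman(V):
    """One step of backward induction: add one more remaining period."""
    return [max(REWARD[s, a] + BETA * V[NEXT[s, a]] for a in ACTIONS)
            for s in (0, 1)]


def finite_horizon_values(max_T):
    """Return the list of value functions for T = 0, 1, ..., max_T remaining periods."""
    values = [[0.0, 0.0]]
    for _ in range(max_T):
        values.append(bellman(values[-1]))
    return values


def sup_norm_gaps(max_T=50):
    """Sup norm distance between each finite horizon value and the infinite horizon one."""
    # Closed form for this toy problem: stay forever in state 1,
    # and move there immediately from state 0.
    V_inf = [2 * BETA / (1 - BETA), 2 / (1 - BETA)]
    return [max(abs(v - w) for v, w in zip(V, V_inf))
            for V in finite_horizon_values(max_T)]


if __name__ == "__main__":
    gaps = sup_norm_gaps()
    for T in (0, 1, 5, 10, 50):
        print(T, round(gaps[T], 4))
```

With the terminal value set to zero, the gap shrinks at least geometrically in *T*, which is the usual contraction argument behind the sup norm convergence.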

__Finite horizon DP (continuity of h)__

Under continuity and compactness assumptions, there exists a sequence of optimal action policies *h(t), h(t-1)*,… If *h(T)* converges to some policy, say *h*, as *T* goes to infinity, then *h* is a stationary optimal policy for the corresponding infinite horizon DP problem.
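
To see the policies settle down, here is a minimal sketch on an assumed two-state deterministic problem with discount factor 0.9 (`stay` pays 1 in state 0 and 2 in state 1, `move` pays 0 and switches the state). With one or two periods remaining it is optimal to stay in state 0, but from three remaining periods onward it is optimal to move, and that limiting policy coincides with the greedy policy of the infinite horizon problem. All names below are hypothetical.

```python
BETA = 0.9  # assumed discount factor
STATES = (0, 1)
ACTIONS = ("stay", "move")
REWARD = {(0, "stay"): 1.0, (0, "move"): 0.0, (1, "stay"): 2.0, (1, "move"): 0.0}
NEXT = {(0, "stay"): 0, (0, "move"): 1, (1, "stay"): 1, (1, "move"): 0}


def greedy(V):
    """Return the optimal policy against continuation value V, and the new value."""
    q = {(s, a): REWARD[s, a] + BETA * V[NEXT[s, a]]
         for s in STATES for a in ACTIONS}
    policy = tuple(max(ACTIONS, key=lambda a, s=s: q[s, a]) for s in STATES)
    value = [q[s, policy[s]] for s in STATES]
    return policy, value


def finite_horizon_policies(max_T):
    """Optimal policies with 1, 2, ..., max_T periods remaining."""
    V = [0.0, 0.0]
    policies = []
    for _ in range(max_T):
        policy, V = greedy(V)
        policies.append(policy)
    return policies


def infinite_horizon_policy(iters=1000):
    """Greedy policy after many value iteration steps (numerically converged)."""
    V = [0.0, 0.0]
    for _ in range(iters):
        _, V = greedy(V)
    return greedy(V)[0]


if __name__ == "__main__":
    for T, policy in enumerate(finite_horizon_policies(6), start=1):
        print(T, policy)
    print("infinite horizon:", infinite_horizon_policy())
```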
