Unpredictably Rational : Lecture 4 (Dutta): Dynamic Programming 3

2010-12-12

Lecture 4 (Dutta): Dynamic Programming 3

Original article (link) posted: 06/10/2005

In the last class, we established the close relationship between the dynamic optimization problem and the functional equation. In this class, we examine the continuity, monotonicity, and concavity of the value function.

Assumptions (for continuity)
(A1) State Space: Borel subset of a metric space
(A2) Action Space: Metric space
(A3) Reward Function: Bounded, measurable and continuous
(A4) Transition Function: Weakly continuous
(A5) Feasibility Correspondence: continuous

Theorem (Maitra 68)
Under (A1)-(A5), the value function V is a bounded and continuous function. Furthermore, there is a stationary Markovian policy, h, that is optimal.
Proof
Step1 The Bellman operator T maps the space of bounded and continuous functions on S (denoted by C(S)) back into itself.
Step2 C(S) is a complete metric space (under the sup metric).
Step3 T is a contraction on C(S).
Step4 There is a measurable selection h from the maximizers in the Bellman equation.
Note) To prove Step1, we use the Maximum Theorem. For the other steps of the proof, we rely on established results.
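
The contraction in Step3 can be illustrated numerically. The following is a minimal sketch, assuming a finite state and action space with randomly generated data; the names bellman, reward, trans and the discount factor beta = 0.9 are illustrative assumptions, not part of the lecture. On a finite state space every function is continuous, so Step1 is trivial here and the sketch only shows the contraction property and the resulting fixed point.

```python
import numpy as np

def bellman(v, reward, trans, beta):
    """Apply T once: (Tv)(s) = max_a [ r(s,a) + beta * sum_t q(t|s,a) v(t) ]."""
    # reward has shape (S, A), trans has shape (S, A, S), v has shape (S,)
    q_values = reward + beta * (trans @ v)
    return q_values.max(axis=1), q_values.argmax(axis=1)

# Hypothetical finite problem (all numbers are made up for illustration)
rng = np.random.default_rng(0)
n_states, n_actions, beta = 5, 3, 0.9
reward = rng.uniform(0.0, 1.0, size=(n_states, n_actions))  # bounded reward
trans = rng.uniform(size=(n_states, n_actions, n_states))
trans /= trans.sum(axis=2, keepdims=True)                   # each q(.|s,a) sums to 1

# Contraction check: ||Tv - Tw|| <= beta * ||v - w|| in the sup norm
v = rng.normal(size=n_states)
w = rng.normal(size=n_states)
gap_before = np.max(np.abs(v - w))
gap_after = np.max(np.abs(bellman(v, reward, trans, beta)[0]
                          - bellman(w, reward, trans, beta)[0]))
print("sup-norm gap before and after applying T:", gap_before, gap_after)

# Value iteration: iterate T from v = 0 until successive iterates stop moving
value = np.zeros(n_states)
for _ in range(1000):
    new_value, policy = bellman(value, reward, trans, beta)
    if np.max(np.abs(new_value - value)) < 1e-10:
        break
    value = new_value

print("approximate fixed point V:", value)
print("stationary policy h (action index per state):", policy)
```

Because T is a contraction on a complete space, the iteration converges to the same V from any bounded starting function; the argmax at the fixed point plays the role of the selection h in Step4.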

Assumptions (for monotonicity)
(B1) State Space: A subset of R^n
(B2)=(A2)
(B3)=(A3) + increasing on S (State Space)
(B4)=(A4) + first-order stochastically increasing on S
(B5)=(A5) + A higher state has a larger feasible set

Theorem (Monotonicity)
Under (B1)-(B5), the value function is bounded, continuous and increasing on S.

Note) Step2 to Step4 are almost the same as before. So we essentially need to check only Step1: that T maps the space of bounded, continuous and increasing functions back into itself.
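
As a rough numerical illustration of the monotonicity result, here is a sketch on a hypothetical discrete savings problem. Everything in it is an assumption made for the example: the grid size n, the discount factor beta, the reward sqrt(s - a), and the shock process. The state s is the current stock, the action a is the amount saved (feasible if a <= s, so a higher state has a larger feasible set), the reward is increasing in s, and the transition depends only on a, so it is trivially weakly increasing in s.

```python
import numpy as np

n, beta = 8, 0.9                      # hypothetical grid size and discount factor
states = np.arange(n)
actions = np.arange(n)                # a = amount saved

# (B5): a is feasible at s iff a <= s, so the feasible set grows with s
feasible = actions[None, :] <= states[:, None]

# (B3): reward sqrt(s - a) is bounded on the grid and increasing in s
consumption = np.where(feasible, states[:, None] - actions[None, :], 0)
reward = np.where(feasible, np.sqrt(consumption), -np.inf)

# (B4): next state = min(a + shock, n - 1), shock in {0, 1} equally likely.
# trans[a, t] is the probability of moving to state t after saving a.
trans = np.zeros((n, n))
for a in actions:
    for shock in (0, 1):
        trans[a, min(a + shock, n - 1)] += 0.5

# Value iteration on the Bellman equation
value = np.zeros(n)
for _ in range(2000):
    q_values = reward + beta * (trans @ value)[None, :]
    new_value = q_values.max(axis=1)
    converged = np.max(np.abs(new_value - value)) < 1e-10
    value = new_value
    if converged:
        break

print("V on the grid:", value)
print("V increasing in s:", bool(np.all(np.diff(value) > 0)))
```

In this example the reward is in fact strictly increasing in s for each feasible a, so the computed V comes out strictly increasing as well.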

Theorem (Strict Monotonicity)
If, in addition to (B1)-(B5), the reward function is strictly increasing on S, then V is also strictly increasing.
Proof
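This follows directly from the functional equation: an action that is optimal at a lower state remains feasible at a higher state, where it yields a strictly larger current reward and a continuation value that is no smaller.
You should not try to prove this by restricting T to the set of strictly increasing functions, because that set is not complete (the limit of a sequence of strictly increasing functions may fail to be strictly increasing).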
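
Written out, the argument and the completeness caveat might look as follows. This is a sketch only; the symbols r (reward), \Phi (feasibility correspondence), q (transition function) and \beta (discount factor) are assumed notation that does not appear in the notes above.

```latex
% Assumed notation: r = reward, \Phi = feasibility correspondence,
% q = transition function, \beta \in (0,1) = discount factor, h = optimal policy.
Let $s' > s$ and $a^* = h(s)$. By (B5), $a^* \in \Phi(s')$, so
\[
V(s') \;\ge\; r(s', a^*) + \beta \int V(t)\, q(dt \mid s', a^*)
      \;>\; r(s, a^*) + \beta \int V(t)\, q(dt \mid s, a^*) \;=\; V(s).
\]
% The strict inequality uses that r is strictly increasing in s. The integral
% term is weakly larger because V is increasing (previous theorem) and q is
% first-order stochastically increasing (B4).

% Why the set of strictly increasing functions is not complete: on S = [0,1],
\[
f_n(s) = \frac{s}{n}, \qquad \| f_n - 0 \|_\infty = \frac{1}{n} \to 0 .
\]
% Each f_n is strictly increasing, but the uniform limit f \equiv 0 is not.
```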

Assumptions (Concavity)
(C1)=(B1) + S is convex
(C2) Action Space: A subset of R^m and convex
(C3)=(B3) + reward function is concave
(C4)=(A4) + transition function is "concave"
(concavity here is defined in terms of Second Order Stochastic Dominance)
(C5)=(B5) + graph is convex

Theorem (Concavity)
Under (C1)-(C5), the value function is bounded, continuous, and concave.
If the reward function is strictly concave, then V is bounded, continuous, and strictly concave. In this case the maximizer in the Bellman equation is unique, so the optimal policy h is a continuous function.
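
As in the earlier theorems, the key step is that T preserves the property in question. A hedged sketch of that step follows; the notation (r, \Phi, q, \beta) and the exact reading of "concave" in (C4) are assumptions made here, not statements from the lecture.

```latex
% Assumed notation: r = reward, \Phi = feasibility correspondence,
% q = transition function, \beta \in (0,1) = discount factor.
% Assumed reading of (C4): for every increasing concave v and \lambda \in [0,1],
%   \int v \, dq(\cdot \mid s_\lambda, a_\lambda)
%     \ge \lambda \int v \, dq(\cdot \mid s, a)
%       + (1-\lambda) \int v \, dq(\cdot \mid s', a').
Take $s, s' \in S$, $a \in \Phi(s)$, $a' \in \Phi(s')$, and set
$s_\lambda = \lambda s + (1-\lambda) s'$, $a_\lambda = \lambda a + (1-\lambda) a'$.
By (C5) the graph of $\Phi$ is convex, so $a_\lambda \in \Phi(s_\lambda)$, and
\[
(Tv)(s_\lambda)
  \;\ge\; r(s_\lambda, a_\lambda) + \beta \int v \, dq(\cdot \mid s_\lambda, a_\lambda)
  \;\ge\; \lambda \Big[ r(s,a) + \beta \int v \, dq(\cdot \mid s, a) \Big]
        + (1-\lambda) \Big[ r(s',a') + \beta \int v \, dq(\cdot \mid s', a') \Big].
\]
% Taking the supremum over a \in \Phi(s) and a' \in \Phi(s') on the right gives
\[
(Tv)(s_\lambda) \;\ge\; \lambda\, (Tv)(s) + (1-\lambda)\, (Tv)(s'),
\]
% so Tv is concave whenever v is increasing and concave.
```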
