
2011-02-19

Lecture 7 (Dutta): Repeated Games 2

Original article (link) posted: 25/10/2005

We continued to examine the Abreu-Pearce-Stacchetti (APS) operator, particularly focusing on the following two theorems.

Theorem 1 (Necessity)
V* = LV*

Theorem 2 (Sufficiency)
If V = LV (and V is bounded), then V is a subset of V*,

where L is the APS operator and V* is the set of SPE payoffs of the repeated game.

The proof of Theorem 1 is not difficult. We used "unimprovability" to prove Theorem 2. The APS operator also delivers the following results.

1. V* is compact
2. V* is increasing in the discount factor
3. The APS operator is monotone

Using the third result together with the two theorems above, we can derive an algorithm for computing the set of SPE payoffs. That is, starting from a large set of candidate equilibrium payoffs (say, the convex hull of the set of feasible payoffs), we apply the APS operator iteratively until the sequence of sets converges. The limit must then coincide with V*.
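As a concrete illustration of this algorithm, here is a minimal sketch in Python, under assumptions that are mine rather than the lecture's: strongly symmetric equilibria of a repeated prisoner's dilemma (stage payoffs 2 from mutual cooperation, 1 from mutual defection, 3 from a unilateral deviation against a cooperator) with public randomization, so every candidate payoff set is simply an interval.

def aps_operator(lo, hi, delta):
    """Apply the APS operator L to an interval [lo, hi] of continuation payoffs."""
    pieces = []
    # Decompose with (D, D): a stage Nash profile, so no incentive constraint binds.
    pieces.append(((1 - delta) * 1 + delta * lo, (1 - delta) * 1 + delta * hi))
    # Decompose with (C, C): the continuation payoff w must satisfy
    #   (1 - delta) * 2 + delta * w >= (1 - delta) * 3 + delta * lo.
    w_min = lo + (1 - delta) / delta * (3 - 2)
    if w_min <= hi:
        pieces.append(((1 - delta) * 2 + delta * w_min, (1 - delta) * 2 + delta * hi))
    # Public randomization convexifies the union of decomposable payoffs.
    return min(p[0] for p in pieces), max(p[1] for p in pieces)

def compute_v_star(delta, tol=1e-10):
    """Iterate L starting from a large candidate set (the feasible interval [1, 2])."""
    lo, hi = 1.0, 2.0
    while True:
        new_lo, new_hi = aps_operator(lo, hi, delta)
        if abs(new_lo - lo) < tol and abs(new_hi - hi) < tol:
            return new_lo, new_hi
        lo, hi = new_lo, new_hi

print(compute_v_star(0.6))  # cooperation sustainable: the set stays at (1.0, 2.0)
print(compute_v_star(0.3))  # cooperation not sustainable: the set collapses toward (1.0, 1.0)

Starting from the whole feasible interval, each application of the operator can only shrink the set (by monotonicity), and the limit is the fixed point V*.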

2010-12-28

Lecture 5 (Dutta): Dynamic Programming 4

Original article (link) posted: 13/10/2005

First we looked at two more properties of the value function: supermodularity and differentiability.
Then we examined the model under slightly different assumptions: a deterministic transition function, an unbounded reward function, a unit discount factor, and finite-horizon DP.

Result (Supermodularity)
Suppose the action space is a subset of R. If the reward function is supermodular and the transition function is action-dependent, then the optimal action correspondence is monotone (the largest action in the optimal set is increasing in the state).
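For reference, and in my own wording rather than the lecture's: supermodularity here can be read as increasing differences, r(s', a') - r(s', a) >= r(s, a') - r(s, a) for all s' >= s and a' >= a. When the transition depends only on the action, the continuation term in the Bellman equation contributes no interaction between the state and the action, so the whole objective has increasing differences and the Topkis-style monotonicity of the optimal action follows.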

Result (Differentiability)
Suppose the reward function is differentiable on S and either
a) the transition function is action-dependent, or
b) the transition function has a density that is differentiable on S.
Then V is differentiable on int S.
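The formula behind case (a), in my notation (an envelope argument in the spirit of Benveniste-Scheinkman, not spelled out in class): when the transition depends only on the action, the Bellman equation reads V(s) = max over feasible a of { r(s, a) + delta * Integral V(s') q(ds'|a) }, and, at an interior optimal action h(s), differentiating with respect to the state gives V'(s) = r_s(s, h(s)) on int S.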

Deterministic model
Just a special case of the stochastic model

Unbounded reward
Some kind of boundedness condition is needed

No discounting
Continuity between the discounting case and the no-discounting case (see Dutta (1995))
Note)
It is hard to derive the value function directly without discounting (i.e., under the long-run average payoff criterion). Instead, we can solve for the value function in the discounting case and then let the discount factor go to unity.
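The limit being invoked here, as I understand it, is of the Abel/Tauberian type: under appropriate conditions (presumably the continuity result in Dutta (1995) referred to above), the normalized discounted value converges to the optimal long-run average value, i.e., lim_{delta -> 1} (1 - delta) V_delta(s) equals the optimal long-run average payoff from state s.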

Finite horizon DP (continuity of V)
The value function of a finite-horizon DP problem with T remaining periods, V(T), converges (in sup norm) to that of the infinite-horizon model as T goes to infinity.

Finite horizon DP (continuity of h)
Under continuity and compactness assumptions, there exists a sequence of optimal action policies h(T), h(T-1), … If h(T) converges to some policy, say h, as T goes to infinity, then h is a stationary optimal policy for the corresponding infinite-horizon DP problem.
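Both finite-horizon results are easy to see numerically. Below is a small sketch; the two-state, two-action Markov decision problem and all of its numbers are my own made-up example, not anything from the class. V(T) is just T steps of backward induction (value iteration) starting from the zero function, and its sup-norm distance to the infinite-horizon value function shrinks geometrically in T.

import numpy as np

delta = 0.9
r = np.array([[1.0, 0.0],    # rewards r[s, a]
              [0.0, 2.0]])
P = np.array([[[0.9, 0.1],   # transitions P[a, s, s'] for action 0
               [0.5, 0.5]],
              [[0.2, 0.8],   # ... and for action 1
               [0.1, 0.9]]])

def bellman(v):
    """One application of the Bellman operator, returning the new value and policy."""
    q = r + delta * np.einsum('ast,t->sa', P, v)   # q[s, a]
    return q.max(axis=1), q.argmax(axis=1)

# Infinite-horizon value function: iterate the operator to (approximate) convergence.
v_inf = np.zeros(2)
for _ in range(1000):
    v_inf, h_inf = bellman(v_inf)
print("stationary policy:", h_inf)

# Finite-horizon value functions V(T): T rounds of backward induction from zero.
v = np.zeros(2)
for T in range(1, 11):
    v, h_T = bellman(v)
    print(T, np.abs(v - v_inf).max(), h_T)   # the sup-norm gap shrinks roughly like delta**T

In this example the finite-horizon policy h_T also settles down after a few periods to the stationary policy of the infinite-horizon problem, which is the content of the second result.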

2010-12-12

Lecture 4 (Dutta): Dynamic Programming 3

Original article (link) posted: 06/10/2005

In the last class, we established the close relationship between the dynamic optimization problem and the functional equation. In this class, we examined continuity, monotonicity, and concavity of the value function.

Assumptions (for continuity)
(A1) State Space: Borel subset of a metric space
(A2) Action Space: Metric space
(A3) Reward Function: Bounded, measurable and continuous
(A4) Transition Function: Weakly continuous
(A5) Feasibility Correspondence: continuous

Theorem (Maitra 68)
Under (A1)-(A5), the value function V is a bounded and continuous function. Furthermore, there is a stationary Markovian policy, h, that is optimal.
Proof
Step 1: The Bellman operator T maps the space of bounded and continuous functions (denoted C(S)) back into itself.
Step 2: C(S) is a complete metric space.
Step 3: T is a contraction.
Step 4: There is a measurable selection h from the Bellman equation.
Note) To prove Step 1, we use the Maximum Theorem. For the other parts of the proof, we rely on established results.
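For concreteness (my notation, with the course's objects as I understand them), the operator in question is

(Tf)(s) = max over a in G(s) of { r(s, a) + delta * Integral f(s') q(ds' | s, a) },

where G is the feasibility correspondence, r the reward function, q the transition function, and delta the discount factor. Step 1 then says that if f is bounded and continuous, so is Tf; the Maximum Theorem handles the continuity of the maximized value, and the weak continuity of the transition function (A4) handles the continuity of the integral term.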

Assumptions (for monotonicity)
(B1) State Space: A subset of R^n
(B2)=(A2)
(B3)=(A3) + increasing on S (State Space)
(B4)=(A4) + first-order stochastically increasing in the state
(B5)=(A5) + A higher state has a larger feasible set

Theorem (Monotonicity)
Under (B1)-(B5), the value function is bounded, continuous and increasing on S.

Note) Steps 2-4 are almost the same as before, so we essentially only need to check Step 1.

Theorem (Strict Monotonicity)
If we have a strictly increasing reward function, then V is also strictly increasing.
Proof
This follows easily from the definition of the functional equation.
Note that you should not try to restrict the space to the set of strictly increasing functions, because that set is not complete: the limit may fail to be strictly increasing (e.g., f_n(x) = x/n converges to the constant function 0).

Assumptions (Concavity)
(C1)=(B1) + S is convex
(C2) Action Space: A subset of R^m and convex
(C3)=(B3) + reward function is concave
(C4)=(A4) + transition function is "concave"
(concavity here is defined in terms of Second Order Stochastic Dominance)
(C5)=(B5) + the graph of the feasibility correspondence is convex

Theorem (Concavity)
Under (C1)-(C5), the value function is bounded, continuous, and concave.
With strict concavity of the reward function, V is bounded, continuous, and strictly concave. In this case, h becomes a continuous function.

2010-09-30

Lecture 3 (Dutta): Dynamic Programming 2

Original article (link) posted: 28/09/2005

In the last class, we saw the following results;
Necessity: the value function V of the dynamic problem is a fixed point of the corresponding Bellman operator, i.e., TV=V.
Sufficiency: if a bounded function U satisfies TU=U in the Bellman problem and there exists a maximizing selection for TU, then U is the value function of the original dynamic problem.

Our focus in Lecture 3 is to prove these results formally and to derive some further properties of the Bellman operator T.

Topics in the class

1) The proof of necessity
2) The proof of sufficiency
Reading Chapter 4 of SLP in advance helped me a lot in understanding the proofs. Although Chapter 4 deals with DP under certainty, it is directly linked to the stochastic case covered in class.

3) Finding a value function
(a) The Bellman operator is a contraction (by Blackwell's sufficient conditions)
(b) The set of bounded and continuous functions with the sup norm is a complete metric space. Completeness is quite important for deriving properties of value functions, because it ensures that those properties carry over to the limit (i.e., the value function).
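A quick numerical way to see (a) at work (a toy check of my own, not something from the class): build a small random Markov decision problem, apply the Bellman operator to two arbitrary bounded functions, and compare the sup-norm distances before and after.

import numpy as np

rng = np.random.default_rng(0)
delta = 0.95
n_states, n_actions = 3, 2
r = rng.normal(size=(n_states, n_actions))            # bounded rewards r[s, a]
P = rng.random(size=(n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)                     # transition probabilities P[a, s, s']

def T(v):
    """The Bellman operator on a finite state space."""
    return (r + delta * np.einsum('ast,t->sa', P, v)).max(axis=1)

v, w = rng.normal(size=n_states), rng.normal(size=n_states)
print(np.abs(v - w).max(), np.abs(T(v) - T(w)).max())
# The second number is at most delta times the first: monotonicity plus discounting
# (Blackwell's two conditions) give ||Tv - Tw|| <= delta * ||v - w|| in the sup norm.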

Comments
Blackwell's sufficient conditions for a mapping to be a contraction seem very useful. Professor Dutta mentioned that, despite his seminal work in applied mathematics, the first half of Blackwell's career was held back by racial discrimination (Blackwell was African American).

2010-08-26

Lecture 2 (Dutta): Dynamic Programming 1

Original article (link) posted: 22/09/2005

Topics in the class

1) Conditional Probability and Feller Property
(a) The definitions of a transition function q and the conditional expectation operator T.
(b) The properties of Tg (for a measurable function g): measurable, non-negative (if g is non-negative), and bounded (if g is bounded).
(c) Feller Property.

2) Dynamic Programming Set Up
(d) The def. of a reward function and a feasibility correspondence.
(e) Examples: neoclassical growth model, portfolio choice, capital accumulation, search, and pricing with inertia (menu costs).
(f) The def. of history and policy (action).
(g) Setting up optimization problem and value function.

3) Bellman (Optimality) Equation
(h) Necessity: If the value function V is measurable, then TV=V.
(i) Sufficiency: If a bounded and measurable function U solves U=TU, then U is larger than or equal to V. Additionally, if there is a maximizing selection from the optimality equation, then U=V. Note) A selection from TU=U is a stationary Markovian policy that attains the maximum in TU=U.

Comments

Basic concepts from measure theory, such as sigma-algebras and measurability, are presupposed. I should check what the Feller property exactly means. (I'm not sure whether it is just a definition or comes with necessary and sufficient conditions.)

2010-07-21

Great statistician passed away

I heard sad news: the eminent statistician David Blackwell has died at 91.

Professor Blackwell, though a mathematician and statistician, is quite famous among economists for his significant contributions to mathematical statistics and dynamic programming.
Here is a link to the Wikipedia article about him.

He wrote many influential papers, such as:
  • "Conditional Expectation and Unbiased Sequential Estimation," Annals of Mathematical Statistics, Vol. 18, No. 1 (Mar., 1947), pp. 105-110
  • "Equivalent Comparisons of Experiments," Annals of Mathematical Statistics, Vol. 24, No. 2 (Jun., 1953), pp. 265-272  
  • "Discounted Dynamic Programming," Annals of Mathematical Statistics, Vol. 36, No. 1 (Feb., 1965), pp. 226-235

Interestingly, despite his deep thinking and ingenious work on each research topic, he described himself as follows:
I've worked in so many areas—I'm sort of a dilettante. Basically, I'm not interested in doing research and I never have been.

May he rest in peace.

2010-07-07

Lecture 1 (Dutta): Math Preliminaries

Original article (link) posted: 13/09/2005

The class by Professor Dutta started with mathematical preliminaries. It might be good to study those concepts and theorems again, because some of them remained unclear to me from the math class I took in my first year.
Well, I am thinking of writing a brief summary each week (the class meets once a week, on Mondays). Here is the first one.

Topics in the class

1) Correspondences and the Maximum Theorem
2) Contraction Mapping Theorem

What we covered in the class

In (1):
(a) The definitions of correspondences and several notions of continuity: upper semi-continuity (USC), lower semi-continuity (LSC) (and continuity).
(b) The Maximum Theorem and its proof. Note) The Maximum Theorem says that, under some conditions, the maximum value is continuous and the maximizer is USC in the parameters.
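Stated slightly more precisely (my wording, from memory of Berge's theorem): if the objective f(x, theta) is continuous and the constraint correspondence G(theta) is non-empty, compact-valued, and continuous, then the value v(theta) = max of f(x, theta) over x in G(theta) is continuous in theta, and the correspondence of maximizers is non-empty, compact-valued, and USC in theta.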

In (2):
(c) The def. of a contraction, Cauchy sequences, and a complete metric space.
(d) The Contraction Mapping Theorem and its proof. Note) The theorem says that if a mapping on a complete metric space is a contraction, then it has a unique fixed point.
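In symbols (again my own summary): if (X, d) is a complete metric space and T: X -> X satisfies d(Tx, Ty) <= b * d(x, y) for some b < 1, then T has a unique fixed point x*, and the iterates T^n(x0) converge to x* from any starting point x0. Uniqueness is immediate: if Tx = x and Ty = y, then d(x, y) = d(Tx, Ty) <= b * d(x, y), which forces d(x, y) = 0.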

Comments

(a) I've often mixed up USC and LSC, but the difference finally seems clear to me.
(b) I need to reconsider the proof. It's not so complicated but not that easy either.
(c) I realized that I had forgotten the def. of complete metric space...
(d) The proof is much easier than (b). Uniqueness is almost straightforward.

Recommended readings

SLP Chapter 3
Sundaram (1995) "A Course in Optimization Theory"

2010-07-04

Microeconomic Analysis - G6218

Original article (link) posted: 13/09/2005

Today, I attended the first class of G6218, an advanced microeconomics course for upper-year students taught by Professor Dutta. The course covers basic issues in dynamic economics and consists of the following three parts.

1) Dynamic Programming: The theory of single-agent models
2) Repeated/Dynamic Games: The theory of multi-agent models
3) Application: A strategic analysis of climate change

Although there is no required textbook, the two books listed below seem to be useful.

Stokey, Lucas and Prescott (1989) "Recursive Methods in Economic Dynamics" for (1)
Fudenberg and Tirole (1991) "Game Theory" for (2)

Since Professor Dutta is very good at teaching, and the topics are both interesting to me and mathematically deep, I am thinking of taking this course. I should probably have my copy of SLP sent over from Princeton... (I asked my friends in Princeton to keep my books.)

2010-06-09

Two essential papers by APS

Original article (link) posted: 20/08/2005

What's the difference between APS (1986) and APS (1990)? The distinction is similar to that between Abreu (1986) and Abreu (1988). The earlier papers (both published in JET), APS (1986) and Abreu (1986), analyze optimal strategies in actual oligopoly models using the powerful properties of repeated games that they established. The later papers (both in Econometrica) extend the results to more general settings. Although the two Econometrica papers are more general and sophisticated than the JET ones, they have some drawbacks, such as a lack of motivation, overly concise explanations, and, more seriously, the absence of explicit optimal strategies. The two JET papers are strongly motivated by actual collusion problems in IO, and hence you can see how useful the properties they found are.

The two APS papers rely heavily on techniques from dynamic programming. Using them, APS reduce the repeated game to a static structure from which the optimal equilibria in question can be extracted. That is, they construct a new game by truncating the discounted supergame as follows:
after each first-period history, replace the sequential equilibrium successor by the payoffs associated with that successor.
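In my notation, for a game with a public signal y drawn from a distribution pi(.|a): a payoff vector v is decomposable on a set W of continuation payoffs if there exist an action profile a and a function w mapping signals into W such that, for every player i, v_i = (1 - delta) g_i(a) + delta * Integral w_i(y) d pi(y|a), and no player can gain by deviating from a given these continuation rewards. The equilibrium payoff set is then the largest set that is "self-generating" in this sense.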

APS (1990) summarize the differences between the two papers:

First, it relaxes the restriction of symmetry, showing the theory capable of embracing both asymmetric equilibria of symmetric games and arbitrary asymmetric games. Secondly, the sufficiency of using bang-bang reward functions in efficiently collusive equilibria is strengthened to a necessity theorem. Finally, we provide an algorithm useful in computing the sequential equilibrium value set.
(APS (1990), p.1044)

Finally, I would like to remind you of the difference between Abreu and APS. These papers are essentially different in that the Abreu papers analyze perfect monitoring, whereas the APS papers consider imperfect monitoring. Moreover, even though all of these papers are inspired by dynamic programming techniques, their focuses differ somewhat: the arguments in Abreu are about equilibrium paths, while those in APS are about equilibrium payoff sets.

References

Abreu (1986) "Extremal Equilibria of Oligopolistic Supergames" JET, 39
Abreu (1988) "On the Theory of Infinitely Repeated Games with Discounting" Econometrica, 56
Abreu, Pearce and Stacchetti (1986) "Optimal Cartel Equilibria with Imperfect Monitoring" JET, 39
Abreu, Pearce and Stacchetti (1990) "Toward a Theory of Discounted Repeated Games with Imperfect Monitoring" Econometrica, 58

2010-05-24

POMDP

Updated (July 24, 2010)
Their paper has now been uploaded:
KANDORI, M. and Obara, I. (2010) "Towards a Belief-Based Theory of Repeated Games with Private Monitoring: An Application of POMDP"
(paper can be downloaded from the author's website)

When I had a chat with Prof. Kandori on Friday, he talked about his recent joint work with Ichiro Obara (UCLA) on repeated games with private monitoring. The key idea is to use a well-known method in dynamic programming called POMDP (partially observable Markov decision process).
Then, they derived the (right notion of the) one-shot deviation principle for private monitoring, and constructed a method to check whether an arbitrary strategy, expressed as a finite automaton, constitutes an equilibrium or not. I couldn't follow the arguments in detail, but what they have found sounds really path-breaking.
I look forward to seeing their paper soon!

Information about POMDP resources
A useful site with tutorials and code: link
Lecture video at Stanford (Machine Learning: Lecture 20): link

2010-05-21

Discounting and dynamic programming

Original article (link) posted: 03/08/2005

Radner (1985) demonstrated the existence of fully efficient perfect equilibria in a class of partnership games with the limit of means criterion.
...
In my opinion the repeated partnership games (and most repeated games) are better modeled with discounting than with the limit of means criterion.
...
The per-period nature of the problem is nicely reflected in the discounting case, where the loss gets capitalized in the value set: player must be punished (sooner or later) if current output is low. Without discounting, it is not necessary to deter shirking period by period: if a player cheats for k periods, it has no effect on his long-run average payoff. Only infinite strings of deviations are a problem, and these Radner detects using a "review strategy" that, according to the law of the iterated log, will yield a first-best equilibrium average payoff.
...
Statistical methods are ideally suited to guarding against long-run deviations, whereas dynamic programming methods are largely inapplicable at "delta"=1. With discounting, the problem of deterring current deviations leads naturally to the decomposition of a supergame profile into behavior today and continuation values for the future. The dynamic programming perspective has the benefit of unifying the treatment of patient and impatient players, infinite and finite horizon games, and implicit and explicit contracts.

(Pearce (1992), p.154-5)

Reference

Radner (1985) "Repeated Partnership Games with Imperfect Monitoring and No Discounting" RES, 53

2010-05-20

Bang-bang rewards

Original article (link) posted: 03/08/2005

Self-generation and related techniques were first developed in the context of unrestricted symmetric equilibria of the Green-Porter model in APS (1986), and then presented in greater generality in APS (1990).
(Pearce (1992), p.152)

In APS (1990), they found the following bang-bang property.

The equilibrium value set V is compact, and for all elements v in V there exists an equilibrium whose implicit reward functions after each history take only values in the set of extreme points of V.
Under certain conditions the "bang-bang sufficiency" result given above can be strengthened to a necessity result: an equilibrium that maximizes a linear combination of player payoffs (including negative combinations) must have implicit reward functions that use only extreme points of V. The rough intuition is the same as the one given earlier for the Green-Porter model: if you are creating incentives by moving rewards in a direction that reduces the objective function of the problem, do so aggressively (move until you can't go any further in V) but in as small and informative a region of signal space as possible. This advice cannot be applied literally in a model with a discrete signal space, so the bang-bang necessity result does not hold. The sufficiency result can be restored trivially in an essentially discrete model if the signal space is taken to include the outcome space of a public randomization device.

(Pearce (1992), p.153)