
# Markov Decision -- Type 3 Gains

Module by: Paul E Pfeiffer

## Example 1

An electronic store stocks a certain type of VCR. At the end of each week, an order is placed for early delivery the following Monday. A maximum of four units is stocked. Let the states be the number of units on hand at the end of the sales week: E = {0, 1, 2, 3, 4}. Two possible actions:

• Order two units, at a cost of $150 each
• Order four units, at a cost of $120 each

Units sell for $200. If demand exceeds the stock in hand, the retailer assumes a penalty of $40 per unit (in losses due to customer dissatisfaction, etc.). Because of turnover, return on sales is considered two percent per week, so that the discount factor is α = 1/1.02 on a weekly basis.

In state 0, there are three possible actions: order 0, 2, or 4. In states 1 and 2, there are two possible actions: order 0 or 2. In states 3 and 4, the only action is to order 0. Customer demand in week n+1 is represented by a random variable D_{n+1}. The class {D_n : 1 ≤ n} is iid, uniformly distributed on the values 0, 1, 2, 3, 4. If X_n is the state at the end of week n, then the pair {X_n, D_{n+1}} is independent for each n.

Analyze the system as a Markov decision process with type 3 gains, depending upon current state, action, and demand. Determine the transition probability matrix PA (properly padded) and the gain matrix GA (also padded). Sample calculations are as follows:

• State 0, action 0: p_{00}(0) = 1 (all other p_{0k}(0) = 0)
• State 0, action 2: p_{00}(2) = P(D ≥ 2) = 3/5, p_{01}(2) = P(D = 1) = 1/5, etc.
• State 2, action 2: p_{2j}(2) = 1/5, j = 0, 1, 2, 3, 4

For state = i, action = a, and demand = k, we seek the gain g(i, a, k). Sample values for state 0:

| Gain formula | k = 0 | k = 1 | k = 2 | k = 3 | k = 4 |
| --- | --- | --- | --- | --- | --- |
| g(0, 0, k) = -40k | 0 | -40 | -80 | -120 | -160 |
| g(0, 2, k) = -300 + 200 min{k, 2} - 40 max{k - 2, 0} | -300 | -100 | 100 | 60 | 20 |
| g(0, 4, k) = -480 + 200k | -480 | -280 | -80 | 120 | 320 |
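
These rows are easy to check numerically. The following short script (illustrative only; the names k, g0, g2, g4 do not appear in the original files) evaluates the three gain formulas over the demand range:

k  = 0:4;                                    % possible weekly demands
g0 = -40*k                                   % state 0, order 0
g2 = -300 + 200*min(k,2) - 40*max(k-2,0)     % state 0, order 2
g4 = -480 + 200*k                            % state 0, order 4

These reproduce the first three rows of the gain matrix GA computed below.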

1. Complete the transition probability table and the gain table.
2. Determine an optimum infinite-horizon strategy with no discounting.
3. Determine an optimum infinite-horizon strategy with discounting (α = 1/1.02).
4. The manager decides to set up a six-week strategy, after which new sales conditions may be established. Determine an optimum strategy for the six-week period.

Data file

% file orderdata.m
% Version of 4/5/94
% Data organized for computation
type   = 3;
states = 0:4;
A  = [0    2    4 ...            % Actions (padded)
      0    2    2 ...
      0    2    2 ...
      0    0    0 ...
      0    0    0];
C  = [0 -300 -480 ...            % Order costs (padded)
      0 -300 -300 ...
      0 -300 -300 ...
      0    0    0 ...
      0    0    0];
D  = 0:4;                        % Demand values
SP = 200;                        % Selling price
BP = 40;                         % Backorder penalty
PD = 0.2*ones(1,5);              % Demand probabilities


## Transition Probabilities and Gains

The procedure

% file reorder.m
% Version of 4/11/94
% Calculates PA and GA for reorder policy
states = input('Enter row vector of states   ');
A  = input('Enter row vector A of actions (padded)   ');
C  = input('Enter row vector C of order costs (padded)   ');
D  = input('Enter row vector D of demand values   ');
PD = input('Enter row vector PD of demand probabilities   ');
SP = input('Enter unit selling price SP   ');
BP = input('Enter backorder penalty cost BP   ');
m  = length(states);             % Number of states
q  = length(A);                  % Number of state-action rows
na = q/m;                        % Number of actions per state (after padding)
N  = length(D);                  % Number of demand values
S  = ones(na,1)*states;          % Each state repeated once per action
S  = S(:)';
[d,s] = meshgrid(D,S);           % q by N grids of demands and states
a  = A'*ones(1,N);               % Action (order size) for each row
ca = C'*ones(1,N);               % Order cost for each row
TA = (s + a - d).*(s + a - d >= 0);     % Next state: stock plus order less demand, floored at 0
for i = 1:q
  PA(i,:) = tdbn(states,TA(i,:),PD);    % Distribution of the next state for row i
end
PA
GA = ca + SP*d - (SP + BP)*(d - s - a).*(d > s + a)   % Gain: order cost plus revenue, less lost-sale adjustment
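
The helper tdbn is a function from Pfeiffer's toolbox and is not listed in this module. A minimal sketch of the role it plays here, assuming it does nothing more than accumulate the demand probabilities onto the next-state values in a row of TA:

function P = tdbn(states,T,PD)
% Sketch only -- an assumption, not the original toolbox code.
% P(j) collects the probability of all demands that carry the
% system to next state states(j).
P = zeros(1,length(states));
for j = 1:length(states)
    P(j) = sum(PD(T == states(j)));
end

For example, in state 0 with action 2 the next-state values are [2 1 0 0 0] for demands 0 through 4, which yields the row [0.6 0.2 0.2 0 0] of PA below.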


The calculations

orderdata
reorder
Enter row vector of states   states
Enter row vector A of actions (padded)   A
Enter row vector C of order costs (padded)   C
Enter row vector D of demand values   D
Enter row vector PD of demand probabilities     PD
Enter unit selling price SP   SP
Enter backorder penalty cost BP   BP
PA =
1.0000            0            0            0            0
0.6000       0.2000       0.2000            0            0
0.2000       0.2000       0.2000       0.2000       0.2000
0.8000       0.2000            0            0            0
0.4000       0.2000       0.2000       0.2000            0
0.4000       0.2000       0.2000       0.2000            0
0.6000       0.2000       0.2000            0            0
0.2000       0.2000       0.2000       0.2000       0.2000
0.2000       0.2000       0.2000       0.2000       0.2000
0.4000       0.2000       0.2000       0.2000            0
0.4000       0.2000       0.2000       0.2000            0
0.4000       0.2000       0.2000       0.2000            0
0.2000       0.2000       0.2000       0.2000       0.2000
0.2000       0.2000       0.2000       0.2000       0.2000
0.2000       0.2000       0.2000       0.2000       0.2000
GA =
0          -40          -80         -120         -160
-300         -100          100           60           20
-480         -280          -80          120          320
0          200          160          120           80
-300         -100          100          300          260
-300         -100          100          300          260
0          200          400          360          320
-300         -100          100          300          500
-300         -100          100          300          500
0          200          400          600          560
0          200          400          600          560
0          200          400          600          560
0          200          400          600          800
0          200          400          600          800
0          200          400          600          800


## Infinite-horizon strategy (no discounting)

polit
Data needed:

- - - - - - - - - - - - - - -

Enter type number to show gain type   type
Enter row vector of states   states
Enter row vector A of possible actions   A
Enter value of alpha (= 1 for no discounting)   1
Enter matrix PA of transition probabilities   PA
Enter matrix GA of gains   GA
Enter row vector PD of demand probabilities   PD
Index    Action   Value
1         0     -80
2         2     -44
3         4     -80
4         0     112
5         2      52
6         2      52
7         0     256
8         2     100
9         2     100
10         0     352
11         0     352
12         0     352
13         0     400
14         0     400
15         0     400
Initial policy: action numbers
2         1         1         1         1
Policy: actions
2         0         0         0         0

New policy: action numbers
3         2         2         1         1
Policy: actions
4         2         2         0         0
Long-run distribution
0.2800       0.2000       0.2000       0.2000       0.1200
Test values for selecting new policy
Index         Action   Test Value
1.0000             0    -248.0000
2.0000        2.0000    -168.8000
3.0000        4.0000     -41.6000
4.0000             0     -48.8000
5.0000        2.0000      -5.6000
6.0000        2.0000      -5.6000
7.0000             0     131.2000
8.0000        2.0000     138.4000
9.0000        2.0000     138.4000
10.0000             0     294.4000
11.0000             0     294.4000
12.0000             0     294.4000
13.0000             0     438.4000
14.0000             0     438.4000
15.0000             0     438.4000
Optimum policy
State         Action        Value
0        4.0000    -168.0000
1.0000        2.0000    -132.0000
2.0000        2.0000      12.0000
3.0000             0     168.0000
4.0000             0     312.0000
Long-run expected gain per period G
126.4000
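
The long-run gain G is the chosen policy's long-run distribution applied to its expected one-period gains (the initial Value column gives these for rows 3, 5, 8, 10, 13): 0.28(-80) + 0.2(52) + 0.2(100) + 0.2(352) + 0.12(400) = 126.4. The test values come from the policy-improvement step: for each state-action row, the expected one-period gain plus the expected value of the next state under the current policy. A minimal sketch of that step (v is the current value vector and idx the selected action index per state; both names are illustrative, and the constant-offset bookkeeping of the undiscounted case is omitted):

gA = GA*PD';                          % expected one-period gain, one entry per row
T  = gA + PA*v;                       % test value for each state-action row
[vbest,idx] = max(reshape(T,na,m));   % best test value and action index per state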


## Infinite-horizon strategy (with discounting)

polit
Data needed:
- - - - - - - - - - - - - - -
Enter type number to show gain type   type
Enter row vector of states   states
Enter row vector A of possible actions   A
Enter value of alpha (= 1 for no discounting)   1/1.02
Enter matrix PA of transition probabilities   PA
Enter matrix GA of gains   GA
Enter row vector PD of demand probabilities   PD
Index    Action     Value
1         0      -80
2         2      -44
3         4      -80
4         0      112
5         2       52
6         2       52
7         0      256
8         2      100
9         2      100
10         0     352
11         0     352
12         0     352
13         0     400
14         0     400
15         0     400
Initial policy: action numbers
2         1         1         1         1
Policy: actions
2         0         0         0         0
New policy: action numbers
3         2         2         1         1
Policy: actions
4         2         2         0         0
Test values for selecting policy
Index         Action     Test Value
1.0e+03 *
0.0010             0         6.0746
0.0020        0.0020         6.1533
0.0030        0.0040         6.2776
0.0040             0         6.2740
0.0050        0.0020         6.3155
0.0060        0.0020         6.3155
0.0070             0         6.4533
0.0080        0.0020         6.4576
0.0090        0.0020         6.4576
0.0100             0         6.6155
0.0110             0         6.6155
0.0120             0         6.6155
0.0130             0         6.7576
0.0140             0         6.7576
0.0150             0         6.7576
Optimum policy
State       Action       Value
1.0e+03 *
0       0.0040       6.2776
0.0010       0.0020       6.3155
0.0020       0.0020       6.4576
0.0030            0       6.6155
0.0040            0       6.7576
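
With discounting, the values of a fixed policy satisfy v = g + αPv, so they may be found by solving a linear system. A minimal sketch (not part of polit) of evaluating the optimum policy [4 2 2 0 0], whose state-action rows are 3, 5, 8, 10, 13:

P = PA([3 5 8 10 13],:);         % transition rows of the chosen actions
g = GA([3 5 8 10 13],:)*PD';     % their expected one-period gains
alpha = 1/1.02;
v = (eye(5) - alpha*P)\g         % solves v = g + alpha*P*v

This reproduces the optimum values listed above (6277.6, 6315.5, 6457.6, 6615.5, 6757.6).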


## Finite-horizon calculations

dpinit
Initialize for finite horizon calculations
Matrices A, PA, and GA, padded if necessary
Enter type number to show gain type   type
Enter vector of states   states
Enter row vector A of possible actions   A
Enter matrix PA of transition probabilities   PA
Enter matrix GA of gains   GA
Enter row vector PD of demand probabilities   PD
Call for dprog
dprog
States and expected total gains
0       1       2       3       4
-44     112     256     352     400
States   Actions
0         2
1         0
2         0
3         0
4         0
dprog
States and expected total gains
0     1.0000     2.0000     3.0000     4.0000
135.2000   178.4000   315.2000   478.4000   615.2000
States   Actions
0         4
1         2
2         2
3         0
4         0
dprog
States and expected total gains
0     1.0000     2.0000     3.0000     4.0000
264.4800   300.4800   444.4800   600.4800   744.4800
States   Actions
0         4
1         2
2         2
3         0
4         0
dprog
States and expected total gains
0     1.0000     2.0000     3.0000     4.0000
390.8800   426.8800   570.8800   726.8800   870.8800
States   Actions
0         4
1         2
2         2
3         0
4         0
dprog
States and expected total gains
0     1.0000     2.0000     3.0000     4.0000
517.2800   553.2800   697.2800   853.2800   997.2800
States   Actions
0         4
1         2
2         2
3         0
4         0
dprog
States and expected total gains
1.0e+03 *
0       0.0010       0.0020       0.0030       0.0040
0.6437       0.6797       0.8237       0.9797       1.1237
States   Actions
0         4
1         2
2         2
3         0
4         0
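
Each call to dprog performs one backward step of the dynamic-programming recursion: the new value of a state is the best, over its available actions, of the expected one-period gain plus the expected value of the resulting state. A minimal sketch of one step with no discounting, matching the run above (v is the current value vector, initially zeros(5,1); the names are illustrative):

T = GA*PD' + PA*v;                % test value for every state-action row
[v,idx] = max(reshape(T,na,m));   % best value and best action index per state
v = v(:);                         % updated values, one per state

With v = 0, the first step returns the values (-44, 112, 256, 352, 400) shown after the first dprog call; repeating the step six times yields the six-week strategy.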

