Markov Decision -- Type 3 Gains

Module by: Paul E Pfeiffer

Example 1

An electronics store stocks a certain type of VCR. At the end of each week, an order is placed for early delivery the following Monday. A maximum of four units is stocked. Let the states be the number of units on hand at the end of the sales week: \(E = \{0, 1, 2, 3, 4\}\). Two nonzero order sizes are possible:

  • Order two units, at a cost of $150 each
  • Order four units, at a cost of $120 each

Units sell for $200. If demand exceeds the stock on hand, the retailer assumes a penalty of $40 per unit (in losses due to customer dissatisfaction, etc.). Because of turnover, return on sales is considered two percent per week, so that the discount factor is \(\alpha = 1/1.02\) on a weekly basis.

In state 0, there are three possible actions: order 0, 2, or 4. In states 1 and 2 there are two possible actions: order 0 or 2. In states 3 and 4, the only action is to order 0. Customer demand in week \(n+1\) is represented by a random variable \(D_{n+1}\). The class \(\{D_n\}\) is iid, uniformly distributed on the values 0, 1, 2, 3, 4. If \(X_n\) is the state at the end of week \(n\), then \(\{X_n, D_{n+1}\}\) is independent for each \(n\).

Analyze the system as a Markov decision process with case 3 gains, depending upon current state, action, and demand. Determine the transition probability matrix PA (properly padded) and the gain matrix (also padded). Sample calculations are as follows:

  • State 0, action 0: \(p_{00}(0) = 1\) (all other \(p_{0k}(0) = 0\))
  • State 0, action 2: \(p_{00}(2) = P(D \ge 2) = 3/5\), \(p_{01}(2) = P(D = 1) = 1/5\), etc.
  • State 2, action 2: \(p_{2j}(2) = 1/5\), \(j = 0, 1, 2, 3, 4\)
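
These sample values all follow from one update rule, the same rule that reorder.m uses to build the matrix TA. As a worked restatement (the symbol \(a\) for the order size is the only notation added here):

```latex
X_{n+1} = \max\{X_n + a - D_{n+1},\ 0\}, \qquad
p_{ij}(a) = P\bigl(\max\{i + a - D,\ 0\} = j\bigr)
```

For state 0 and action 2 this gives \(p_{00}(2) = P(D \ge 2) = 3/5\), \(p_{01}(2) = P(D = 1) = 1/5\), and \(p_{02}(2) = P(D = 0) = 1/5\), which agrees with the second row of PA in the calculations.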

For state \(i\), action \(a\), and demand \(k\), we seek the gain \(g(i, a, k)\). For state 0:

Gain                                              k = 0      1      2      3      4
g(0,0,k) = -40k                                       0    -40    -80   -120   -160
g(0,2,k) = -300 + 200 min{k,2} - 40 max{k-2,0}     -300   -100    100     60     20
g(0,4,k) = -480 + 200k                             -480   -280    -80    120    320
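
The three state-0 formulas are instances of a single expression. As a sketch consistent with the GA line in reorder.m, write \(c(a)\) for the total order cost (a symbol introduced here, with \(c(0) = 0\), \(c(2) = 300\), \(c(4) = 480\)):

```latex
g(i, a, k) = -c(a) + 200 \min\{k,\ i + a\} - 40 \max\{k - (i + a),\ 0\}
```

For example, \(g(1, 2, 4) = -300 + 200 \cdot 3 - 40 \cdot 1 = 260\), the last entry in the GA row for state 1 with an order of two units.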

  1. Complete the transition probability table and the gain table.
  2. Determine an optimum infinite-horizon strategy with no discounting.
  3. Determine an optimum infinite-horizon strategy with discounting (alpha = 1/1.02).
  4. The manager decides to set up a six-week strategy, after which new sales conditions may be established. Determine an optimum strategy for the six-week period.

Data file

% file orderdata.m
% Version of 4/5/94
% Data organized for computation
type = 3;
states = 0:4;
A   = [0   2   4 ...          % Actions (padded)
       0   2   2 ...
       0   2   2 ...
       0   0   0 ...
       0   0   0];
C   = [0 -300 -480 ...        % Order costs (padded)
       0 -300 -300 ...
       0 -300 -300 ...
       0    0    0 ...
       0    0    0];
D   = 0:4;                    % Demand values (requested by reorder.m)
SP  = 200;                    % Selling price
BP  = 40;                     % Backorder penalty
PD  = 0.2*ones(1,5);          % Demand probabilities

Transition Probabilities and Gains

The procedure

% file reorder.m
% Version of 4/11/94
% Calculates PA and GA for reorder policy
states = input('Enter row vector of states   ');
A   = input('Enter row vector A of actions (padded)   ');
C   = input('Enter row vector C of order costs (padded)   ');
D   = input('Enter row vector D of demand values   ');
PD = input('Enter row vector PD of demand probabilities     ');
SP = input('Enter unit selling price SP   ');
BP = input('Enter backorder penalty cost BP   ');
m   = length(states);                  % Number of states
q   = length(A);                       % Number of (state, action) rows
na = q/m;                              % Actions per state, after padding
N   = length(D);                       % Number of demand values
S   = ones(na,1)*states;               % Repeat each state na times
S   = S(:)';
[d,s] = meshgrid(D,S);                 % d: demand by column, s: state by row
a   = A'*ones(1,N);                    % Action for each row
ca = C'*ones(1,N);                     % Order cost for each row
TA = (s + a - d).*(s + a - d >= 0);    % Next state: max(s + a - d, 0)
PA = zeros(q,m);                       % Preallocate
for i = 1:q
    PA(i,:) = tdbn(states,TA(i,:),PD); % tdbn is not listed in this module
end
PA
GA = ca + SP*d - (SP + BP)*(d - s - a).*(d > s + a)
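
Because reorder.m depends on interactive prompts and on the m-function tdbn, which is not listed in this module, the following is a minimal self-contained sketch of the same calculation. It replaces the tdbn call with an explicit loop that accumulates demand probabilities by next state. That substitution is an assumption about what tdbn returns, checked only against the PA output shown in this module. The names stock, sold, and unmet are illustrative.

```matlab
% Sketch: build PA and GA without input prompts or tdbn.
states = 0:4;                      % units on hand at end of week
D  = 0:4;                          % demand values
PD = 0.2*ones(1,5);                % demand probabilities
A  = [0 2 4  0 2 2  0 2 2  0 0 0  0 0 0];                    % actions (padded)
C  = [0 -300 -480  0 -300 -300  0 -300 -300  0 0 0  0 0 0];  % order costs
SP = 200;  BP = 40;                % selling price, shortage penalty
na = numel(A)/numel(states);       % actions per state after padding
S  = kron(states, ones(1,na));     % state belonging to each row
PA = zeros(numel(A), numel(states));
GA = zeros(numel(A), numel(D));
for r = 1:numel(A)
    stock = S(r) + A(r);                     % units available for sale
    for k = 1:numel(D)
        j     = max(stock - D(k), 0);        % next state
        sold  = min(D(k), stock);            % units actually sold
        unmet = max(D(k) - stock, 0);        % demand that cannot be met
        PA(r, j+1) = PA(r, j+1) + PD(k);     % states are 0:4, so index j+1
        GA(r, k)   = C(r) + SP*sold - BP*unmet;
    end
end
disp(PA(2,:))      % state 0, order 2
disp(GA(2,:))
```

Writing the gain as revenue on units sold minus a penalty on unmet demand is algebraically the same as the GA expression in reorder.m; it is used here only because it reads closer to the problem statement.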

The calculations

orderdata
reorder
Enter row vector of states   states
Enter row vector A of actions (padded)   A
Enter row vector C of order costs (padded)   C
Enter row vector D of demand values   D
Enter row vector PD of demand probabilities     PD
Enter unit selling price SP   SP
Enter backorder penalty cost BP   BP
PA =
      1.0000            0            0            0            0
      0.6000       0.2000       0.2000            0            0
      0.2000       0.2000       0.2000       0.2000       0.2000
      0.8000       0.2000            0            0            0
      0.4000       0.2000       0.2000       0.2000            0
      0.4000       0.2000       0.2000       0.2000            0
      0.6000       0.2000       0.2000            0            0
      0.2000       0.2000       0.2000       0.2000       0.2000
      0.2000       0.2000       0.2000       0.2000       0.2000
      0.4000       0.2000       0.2000       0.2000            0
      0.4000       0.2000       0.2000       0.2000            0
      0.4000       0.2000       0.2000       0.2000            0
      0.2000       0.2000       0.2000       0.2000       0.2000
      0.2000       0.2000       0.2000       0.2000       0.2000
      0.2000       0.2000       0.2000       0.2000       0.2000
GA =
           0          -40          -80         -120         -160
        -300         -100          100           60           20
        -480         -280          -80          120          320
           0          200          160          120           80
        -300         -100          100          300          260
        -300         -100          100          300          260
           0          200          400          360          320
        -300         -100          100          300          500
        -300         -100          100          300          500
           0          200          400          600          560
           0          200          400          600          560
           0          200          400          600          560
           0          200          400          600          800
           0          200          400          600          800
           0          200          400          600          800
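
The Value column in the first table that polit prints appears to be the expected one-step gain for each row, that is, the row of GA averaged over the demand distribution. This reading is inferred from the numbers rather than from the polit source, which is not listed here. A short check for the three state-0 rows, using only values from the output above (G0 and q0 are names chosen for this sketch):

```matlab
% Sketch: expected one-step gains for state 0 under each action.
PD = 0.2*ones(1,5);                  % demand probabilities
G0 = [   0  -40  -80 -120 -160       % state 0, order 0
      -300 -100  100   60   20       % state 0, order 2
      -480 -280  -80  120  320];     % state 0, order 4
q0 = G0*PD';                         % average each row over demand
disp(q0')
```

The result, -80, -44, -80, agrees with rows 1 to 3 of the polit table and explains why the initial policy selects action 2 in state 0.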
    

Infinite-horizon strategy (no discounting)

polit
Data needed:

- - - - - - - - - - - - - - -

Enter type number to show gain type   type
Enter row vector of states   states
Enter row vector A of possible actions   A
Enter value of alpha (= 1 for no discounting)   1
Enter matrix PA of transition probabilities   PA
Enter matrix GA of gains   GA
Enter row vector PD of demand probabilities   PD
Index    Action   Value
    1         0     -80
    2         2     -44
    3         4     -80
    4         0     112
    5         2      52
    6         2      52
    7         0     256
    8         2     100
    9         2     100
   10         0     352
   11         0     352
   12         0     352
   13         0     400
   14         0     400
   15         0     400
Initial policy: action numbers
        2         1         1         1         1
Policy: actions
        2         0         0         0         0

New policy: action numbers
        3         2         2         1         1
Policy: actions
        4         2         2         0         0
Long-run distribution
      0.2800       0.2000       0.2000       0.2000       0.1200
Test values for selecting new policy
      Index         Action   Test Value
      1.0000             0    -248.0000
      2.0000        2.0000    -168.8000
      3.0000        4.0000     -41.6000
      4.0000             0     -48.8000
      5.0000        2.0000      -5.6000
      6.0000        2.0000      -5.6000
      7.0000             0     131.2000
      8.0000        2.0000     138.4000
      9.0000        2.0000     138.4000
     10.0000             0     294.4000
     11.0000             0     294.4000
     12.0000             0     294.4000
     13.0000             0     438.4000
     14.0000             0     438.4000
     15.0000             0     438.4000
    Optimum policy
      State         Action        Value
           0        4.0000    -168.0000
      1.0000        2.0000    -132.0000
      2.0000        2.0000      12.0000
      3.0000             0     168.0000
      4.0000             0     312.0000
Long-run expected gain per period G
  126.4000
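
The long-run figures reported by polit can be cross-checked directly. The following is a minimal sketch, not the polit algorithm itself: it fixes the optimum policy shown above, builds its transition matrix, and solves for the stationary distribution, the gain per period, and relative values normalized so that their long-run average is zero (a normalization inferred from the printed values). The names act, pi0, and v are chosen for this sketch.

```matlab
% Sketch: long-run distribution and gain for the policy (4, 2, 2, 0, 0).
states = 0:4;  D = 0:4;  PD = 0.2*ones(1,5);
act = [4 2 2 0 0];                   % optimum policy reported by polit
q   = [-80 52 100 352 400]';         % one-step expected gains for those actions
P = zeros(5);
for i = 1:5
    next = max(states(i) + act(i) - D, 0);     % next state for each demand
    for k = 1:5
        P(i, next(k)+1) = P(i, next(k)+1) + PD(k);
    end
end
% Stationary distribution: pi0*P = pi0 together with sum(pi0) = 1.
M   = [P' - eye(5); ones(1,5)];
pi0 = (M \ [zeros(5,1); 1])';
G   = pi0*q;                         % long-run expected gain per period
% Relative values: v = q - G + P*v, normalized by pi0*v = 0.
v   = [eye(5) - P; pi0] \ [q - G; 0];
disp(pi0);  disp(G);  disp(v')
```

Both linear systems are overdetermined but consistent, so the backslash operator returns their exact solution up to rounding. The results should match the long-run distribution, the gain 126.4, and the Value column of the optimum policy table.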
    

Infinite-horizon strategy (with discounting)

polit
Data needed:
- - - - - - - - - - - - - - -
Enter case number to show gain type   type
Enter row vector of states   states
Enter row vector A of possible actions   A
Enter value of alpha (= 1 for no discounting)   1/1.02
Enter matrix PA of transition probabilities   PA
Enter matrix GA of gains   GA
Enter row vector PD of demand probabilities   PD
 Index    Action     Value
     1         0      -80
     2         2      -44
     3         4      -80
     4         0      112
     5         2       52
     6         2       52
     7         0      256
     8         2      100
     9         2      100
     10         0     352
     11         0     352
     12         0     352
     13         0     400
     14         0     400
     15         0     400
Initial policy: action numbers
        2         1         1         1         1
Policy: actions
        2         0         0         0         0
New policy: action numbers
        3         2         2         1         1
Policy: actions
        4         2         2         0         0
Test values for selecting policy
Index         Action     Test Value
1.0e+03 *
0.0010             0         6.0746
0.0020        0.0020         6.1533
0.0030        0.0040         6.2776
0.0040             0         6.2740
0.0050        0.0020         6.3155
0.0060        0.0020         6.3155
0.0070             0         6.4533
0.0080        0.0020         6.4576
0.0090        0.0020         6.4576
0.0100             0         6.6155
0.0110             0         6.6155
0.0120             0         6.6155
0.0130             0         6.7576
0.0140             0         6.7576
0.0150             0         6.7576
Optimum policy
State       Action       Value
1.0e+03 * 
     0       0.0040       6.2776
0.0010       0.0020       6.3155
0.0020       0.0020       6.4576
0.0030            0       6.6155
0.0040            0       6.7576
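
With discounting, the value of a fixed policy satisfies v = q + alpha*P*v, so it can be obtained from a single linear solve. The sketch below applies this to the optimum policy shown above. It is a check on the reported values, not a reproduction of the policy-iteration steps inside polit.

```matlab
% Sketch: discounted values for the policy (4, 2, 2, 0, 0).
states = 0:4;  D = 0:4;  PD = 0.2*ones(1,5);
alpha = 1/1.02;                      % weekly discount factor
act = [4 2 2 0 0];                   % optimum policy reported by polit
q   = [-80 52 100 352 400]';         % one-step expected gains for those actions
P = zeros(5);
for i = 1:5
    next = max(states(i) + act(i) - D, 0);     % next state for each demand
    for k = 1:5
        P(i, next(k)+1) = P(i, next(k)+1) + PD(k);
    end
end
v = (eye(5) - alpha*P) \ q;          % solves v = q + alpha*P*v
disp(v')
```

The values agree with the optimum policy table to the precision printed there (about 6277.6, 6315.5, 6457.6, 6615.5, 6757.6). They lie near G/(1 - alpha) = 126.4 * 51 = 6446.4, as expected when the discount factor is close to 1.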

Finite-horizon calculations

dpinit
Initialize for finite horizon calculations
Matrices A, PA, and GA, padded if necessary
Enter type number to show gain type   type
Enter vector of states   states
Enter row vector A of possible actions   A
Enter matrix PA of transition probabilities   PA
Enter matrix GA of gains   GA
Enter row vector PD of demand probabilities   PD
Call for dprog
dprog
States and expected total gains
      0       1       2       3       4
    -44     112     256     352     400
States   Actions
     0         2
     1         0
     2         0
     3         0
     4         0
dprog
States and expected total gains
         0     1.0000     2.0000     3.0000     4.0000
  135.2000   178.4000   315.2000   478.4000   615.2000
States   Actions
     0         4
     1         2
     2         2
     3         0
     4         0
dprog
States and expected total gains
         0     1.0000     2.0000     3.0000     4.0000
  264.4800   300.4800   444.4800   600.4800   744.4800
States   Actions
     0         4
     1         2
     2         2
     3         0
     4         0
dprog
States and expected total gains
         0     1.0000     2.0000     3.0000     4.0000
  390.8800   426.8800   570.8800   726.8800   870.8800
States   Actions
     0         4
     1         2
     2         2
     3         0
     4         0
dprog
States and expected total gains
         0     1.0000     2.0000     3.0000     4.0000
  517.2800   553.2800   697.2800   853.2800   997.2800
States   Actions
     0         4
     1         2
     2         2
     3         0
     4         0
dprog
States and expected total gains
    1.0e+03 *
           0       0.0010       0.0020       0.0030       0.0040
      0.6437       0.6797       0.8237       0.9797       1.1237
States   Actions
     0         4
     1         2
     2         2
     3         0
     4         0
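
Each call to dprog appears to add one more week to the horizon, working backward from the final week. The following is a minimal sketch of that backward induction, assuming zero terminal value and no discounting. Both assumptions are inferred from the first dprog output, which equals the largest one-step expected gain in each state. The 5-by-3 layout of A and C and the names Q, V, best, and act are choices made for this sketch, not taken from dprog.

```matlab
% Sketch: six-week backward induction (no discounting, zero terminal value).
states = 0:4;  D = 0:4;  PD = 0.2*ones(1,5);
A  = [0 2 4; 0 2 2; 0 2 2; 0 0 0; 0 0 0];              % row i: actions in state i-1 (padded)
C  = [0 300 480; 0 300 300; 0 300 300; 0 0 0; 0 0 0];  % matching order costs
SP = 200;  BP = 40;                  % selling price, shortage penalty
V  = zeros(5,1);                     % value with no weeks remaining
for n = 1:6                          % n = number of weeks remaining
    Q = zeros(5,3);
    for i = 1:5
        for j = 1:3
            stock = states(i) + A(i,j);
            next  = max(stock - D, 0);                             % next state per demand
            gain  = -C(i,j) + SP*min(D,stock) - BP*max(D-stock,0); % gain per demand
            Q(i,j) = PD*(gain' + V(next+1));                       % expected gain plus future value
        end
    end
    [V, best] = max(Q, [], 2);                   % best action index in each state
    act = A(sub2ind(size(A), (1:5)', best));     % corresponding order sizes
end
disp([states' act V])
```

After six passes, V and act should agree with the last dprog output. Reading the six outputs in order, the first corresponds to the final week, where the best order in state 0 is two units. With two or more weeks remaining the actions are 4, 2, 2, 0, 0, the same as the infinite-horizon policy.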
