Example 1
An electronics store stocks a certain type of DVD player. At the end of each week,
an order is placed for early delivery the following Monday. A maximum of four
units is stocked. Let the states be the number of units on hand at the end
of the sales week: E = {0, 1, 2, 3, 4}. An order, when placed, takes one of two forms:
- Order two units, at a cost of $150 each
- Order four units, at a cost of $120 each
Units sell for $200. If demand exceeds the stock on hand, the retailer incurs
a penalty of $40 per unit of unmet demand (in losses due to customer
dissatisfaction, etc.). Because of turnover, return on sales is considered two
percent per week, so that the discount factor is alpha = 1/1.02.
In state 0, there are three possible actions: order 0, 2, or 4. In states 1 and 2
there are two possible actions: order 0 or 2. In states 3 and 4, the only action
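is to order 0. Weekly customer demand is random; the data file below assigns
probability 0.2 to each of the five demand levels 0, 1, 2, 3, 4.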
Analyze the system as a Markov decision process with type 3 gains, depending upon current state, action, and demand. Determine the transition probability matrix PA (properly padded) and the gain matrix (also padded). Sample calculations are as follows:
For state = i, action = a, and demand = k, we seek the resulting gain. The tasks are:
- Complete the transition probability table and the gain table.
- Determine an optimum infinite-horizon strategy with no discounting.
- Determine an optimum infinite-horizon strategy with discounting (alpha = 1/1.02).
- The manager decides to set up a six-week strategy, after which new sales conditions may be established. Determine an optimum strategy for the six-week period.
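
As an illustration of the kind of sample calculation called for above, the following is a minimal sketch for a single week with state i = 0, action a = 2, and demand k = 3. It rests on assumptions not stated in the text: that the order arrives before sales begin, that unmet demand is lost and penalized at $40 per unit, and that the next state is the leftover stock. The names cost, stock, sold, and short are hypothetical helpers.

```matlab
% Sketch only: one-week gain and next state for a single (i, a, k).
SP = 200;                 % selling price per unit
BP = 40;                  % penalty per unit of unmet demand
i = 0;  a = 2;  k = 3;    % state, action (units ordered), demand

cost  = 2*150;            % two units at $150 each
stock = i + a;            % ASSUMED: the order arrives before sales begin
sold  = min(k, stock);    % cannot sell more than is on hand
short = max(k - stock, 0);% units of demand not met

g = -cost + SP*sold - BP*short;   % gain for the week: -300 + 400 - 40 = 60
j = stock - sold;                 % ASSUMED next state: leftover stock, here 0
fprintf('gain = %d, next state = %d\n', g, j);
```

Under these assumptions the week yields a gain of $60 and ends in state 0. Repeating the calculation over all demand levels for a fixed state and action gives one row of the gain table.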
Data file
% file orderdata.m
% Version of 4/5/94
% Data organized for computation
type = 3;
states = 0:4;
A = [0 2 4 ... % Actions (padded)
0 2 02 ...
0 2 02 ...
0 00 00 ...
0 00 00];
C = [0 -300 -480 ... % Order costs (padded)
0 -300 -300 ...
0 -300 -300 ...
0 0 0 ...
0 0 0];
SP = 200; % Selling price
BP = 40; % Backorder penalty
PD = 0.2*ones(1,5); % Demand probabilities
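
The data above can be turned into a transition probability table and an expected-gain table along the following lines. This is a sketch under stated assumptions, not the procedures developed later: it assumes the demand levels are 0 through 4, that an order arrives before sales begin, that unmet demand is lost, and that the padded tables hold one row per (state, action slot) in the same order as the data file. The names D, GA, na, stock, and next are hypothetical; PA follows the name used in the problem statement.

```matlab
% Sketch only: padded transition and expected-gain tables from the order data.
states = 0:4;
A  = [0 2 4  0 2 2  0 2 2  0 0 0  0 0 0];              % padded actions, as in the data file
C  = [0 -300 -480  0 -300 -300  0 -300 -300  0 0 0  0 0 0];  % order costs
SP = 200;                   % selling price
BP = 40;                    % penalty per unit of unmet demand
PD = 0.2*ones(1,5);         % demand probabilities
D  = 0:4;                   % ASSUMED demand levels matching PD

na = numel(A)/numel(states);          % action slots per state (3)
PA = zeros(numel(A), numel(states));  % one row per (state, action slot)
GA = zeros(numel(A), 1);              % expected one-week gain per row

for r = 1:numel(A)
  s     = ceil(r/na);                 % index of the state this row belongs to
  stock = states(s) + A(r);           % ASSUMED: order arrives before sales
  next  = max(stock - D, 0);          % leftover stock for each demand level
  for k = 1:numel(D)
    PA(r, next(k)+1) = PA(r, next(k)+1) + PD(k);   % accumulate probability
  end
  gain  = C(r) + SP*min(D, stock) - BP*max(D - stock, 0);  % gain per demand level
  GA(r) = gain*PD';                   % average over demand
end

disp(PA);
disp(GA');
```

Each row of PA sums to one, and padded rows simply repeat the row of the action they duplicate, so they do not change which action is optimal.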







Basic M-Procedures for Calculation
