### Example 1

An electronics store stocks a certain type of DVD player. At the end of each week,
an order is placed for early delivery the following Monday. A maximum of four
units is stocked. Let the states be the number of units on hand at the end
of the sales week: 0, 1, 2, 3, or 4. Two order sizes are available:

- Order two units, at a cost of $150 each
- Order four units, at a cost of $120 each

Units sell for $200. If demand exceeds the stock in hand, the retailer assumes
a penalty of $40 per unit (in losses due to customer dissatisfaction, etc.).
Because of turnover, return on sales is considered two percent per week, so
that the discount factor is α = 1/1.02.

In state 0, there are three possible actions: order 0, 2, or 4. In states 1 and 2
there are two possible actions: order 0 or 2. In states 3 and 4, the only action
is to order 0. Let *D _{n}* be the customer demand in week *n*, assumed uniform
on {0, 1, 2, 3, 4}. If *X _{n}* is the state at the end of week *n* and *a* units
are ordered, then

*X _{n+1}* = max{ *X _{n}* + *a* − *D _{n+1}*, 0 }.
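As an illustrative sketch (not part of the text), the one-step transition probabilities implied by these dynamics can be tabulated directly from the demand distribution. The function name is hypothetical; it assumes demand uniform on {0,...,4} and that unmet demand is lost (penalized, not backlogged):

```python
# Illustrative sketch: one-step transition probabilities implied by the
# dynamics  next state = max(i + a - D, 0), with demand D uniform on
# {0,...,4} (PD = 0.2 each) and unmet demand lost rather than backlogged.

PD = [0.2] * 5  # P(D = k) for k = 0..4

def transition_row(i, a):
    """Return [P(next state = j) for j = 0..4] given stock i and order a."""
    row = [0.0] * 5
    for k, p in enumerate(PD):
        row[max(i + a - k, 0)] += p
    return row

# From state 0 ordering 4 units, every next state is equally likely:
print(transition_row(0, 4))  # [0.2, 0.2, 0.2, 0.2, 0.2]
```

Stacking these rows for every (state, action) pair, padded to a common action count, gives the padded transition matrix PA requested below.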

Analyze the system as a Markov decision process with type 3 gains, depending upon
current state, action, and demand. Determine the transition probability matrix
PA (properly padded) and the gain matrix (also padded). *Sample calculations* are
as follows:

For state = *i*, action = *a*, and demand = *k*, we seek the gain

*g*(*i*, *a*, *k*) = *C*(*a*) + SP · min(*i* + *a*, *k*) − BP · max(*k* − (*i* + *a*), 0)

where *C*(*a*) is the order cost, SP = 200 is the selling price, and BP = 40 is the penalty per unit of unmet demand.

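The gain above can be sketched as follows. SP, BP, and the order costs are taken from the data file below; the helper names (`ORDER_COST`, `gain`) are illustrative, and the text's exact bookkeeping may differ:

```python
# A sketch of the type-3 gain, depending on state i, action a, and demand k.

SP, BP = 200, 40
ORDER_COST = {0: 0, 2: -300, 4: -480}  # C(a): 2 units at $150, 4 at $120

def gain(i, a, k):
    sold = min(i + a, k)           # cannot sell more than stock on hand
    short = max(k - (i + a), 0)    # unmet demand draws the $40 penalty
    return ORDER_COST[a] + SP * sold - BP * short

# Sample calculation: state 0, order 4, demand 3 -> -480 + 3*200 = 120
print(gain(0, 4, 3))  # 120
```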
- Complete the transition probability table and the gain table.
- Determine an optimum infinite-horizon strategy with no discounting.
- Determine an optimum infinite-horizon strategy with discounting (α = 1/1.02).
- The manager decides to set up a six-week strategy, after which new sales conditions may be established. Determine an optimum strategy for the six-week period.
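One way to attack the discounted infinite-horizon case is value iteration. The following is a minimal, self-contained sketch assembled from the model data above (demand uniform on {0,...,4}, the padded action sets, and the order costs); all function and variable names are illustrative, not from the text:

```python
# Value-iteration sketch for the discounted infinite-horizon problem,
# alpha = 1/1.02. Names and structure are illustrative.

PD = [0.2] * 5                       # demand probabilities, k = 0..4
SP, BP = 200, 40                     # selling price, shortage penalty
ORDER_COST = {0: 0, 2: -300, 4: -480}
ACTIONS = {0: [0, 2, 4], 1: [0, 2], 2: [0, 2], 3: [0], 4: [0]}
ALPHA = 1 / 1.02

def expected_gain(i, a):
    """Expected one-week gain in state i under action a."""
    return sum(p * (ORDER_COST[a] + SP * min(i + a, k) - BP * max(k - i - a, 0))
               for k, p in enumerate(PD))

def next_state_probs(i, a):
    """Transition row: unsold stock max(i + a - D, 0) is carried over."""
    row = [0.0] * 5
    for k, p in enumerate(PD):
        row[max(i + a - k, 0)] += p
    return row

def value_iteration(tol=1e-9):
    v = [0.0] * 5
    while True:
        new_v, policy = [], []
        for i in range(5):
            def q(a, i=i):  # action value: immediate gain + discounted future
                return expected_gain(i, a) + ALPHA * sum(
                    p * v[j] for j, p in enumerate(next_state_probs(i, a)))
            best = max(ACTIONS[i], key=q)
            policy.append(best)
            new_v.append(q(best))
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v, policy
        v = new_v

values, policy = value_iteration()
print(policy)  # optimal order quantity for states 0..4
```

The same machinery handles the six-week problem by replacing the fixed-point loop with six steps of backward induction from a terminal value of zero.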

**Data file**

```
% file orderdata.m
% Version of 4/5/94
% Data organized for computation
type = 3;
states = 0:4;
A = [0 2 4 ... % Actions (padded)
     0 2 2 ...
     0 2 2 ...
     0 0 0 ...
     0 0 0];
C = [0 -300 -480 ... % Order costs (padded)
0 -300 -300 ...
0 -300 -300 ...
0 0 0 ...
0 0 0];
SP = 200; % Selling price
BP = 40; % Backorder penalty
PD = 0.2*ones(1,5); % Demand probabilities
```