Skip to content Skip to navigation

OpenStax-CNX

You are here: Home » Content » Modeling Concurrent Processes

Navigation

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • Rice Digital Scholarship

    This module is included in aLens by: Digital Scholarship at Rice UniversityAs a part of collection: "Model Checking Concurrent Programs"

    Click the "Rice Digital Scholarship" link to see all content affiliated with them.

Also in these lenses

  • eScience, eResearch and Computational Problem Solving

    This module is included inLens: eScience, eResearch and Computational Problem Solving
    By: Jan E. OdegardAs a part of collection: "Model Checking Concurrent Programs"

    Click the "eScience, eResearch and Computational Problem Solving" link to see all content selected in this lens.

Recently Viewed

This feature requires Javascript to be enabled.
 

Modeling Concurrent Processes

Module by: Ian Barland, Moshe Vardi, John Greiner. E-mail the authors

We will model concurrency in two ways. First, we will use Promela, a language with C-like syntax. It is not a fully featured programming language, and is not intended for general computation. Instead, Promela (“PROcess MEta-LAnguage”) programs are intended to be simplifications or models of real-world systems, for use in verification. SPIN (“Simple Promela INterpreter”) is the tool for executing and verifying programs written in Promela. Second, we will use a simple state-based transition system that will help in understanding the specification and verification of Promela programs.

Here, we introduce Promela, SPIN, and the state-based transition system through a series of examples. For the moment, we will use SPIN merely as an interpreter, to run of Promela programs. In the next section, we will introduce the verification features of SPIN. (Reference manuals and download/install instructions are available via the SPIN homepage, spinroot.com.)

Promela and SPIN Basics

We start with a series of examples illustrating race conditions.

Example 1: A Tiny First Program


 1   /* A variable shared between all processes. */
 2   show int bal = 0;
 3
 4   active proctype deposit()
 5   {
 6     bal++;
 7   }
 8
 9   active proctype withdraw()
10   {
11     bal--;
12   }
        

We have two threads, one running deposit and one running withdraw. The proctype keyword specifies that the following is code for a thread/process, while the active keyword specifies that the thread is started immediately when we start the program. Variables declared outside the body of a proctype are shared. The keyword show before a variable declaration will direct SPIN to display the value as it changes.

Here, the two processes of deposit and withdraw can interleave arbitrarily. Regardless, with this very simple example, we will always get the same result balance.

To run the code, we use SPIN. We'll describe how to use the program xspin, which uses a graphical interface. More specifically, these instructions are for UNIX version 4.1.3. The PC and Mac versions are identical, except for how to start the program. For details, see the program's README. Ask your local system administrator where the program is installed on your computer.

Aside:

There is also a version based on the command-line, called spin. It is more difficult to use interactively, but is appropriate for use non-interactive use, such as with scripts. For its options, see the manual pages for spin and the related pan. xspin is just a graphical front-end to spin. The underlying spin commands and output are displayed at the bottom of the main xspin window. These can be ignored.

owlside:

To run either xspin or spin from Rice University's Owlnet, first type setenv PATH /home/comp607/bin:$PATH.

Within SPIN, you'll work with a Promela program. If you already have a Promela program saved, you can open it with the "File" menu's "Open" option. Alternatively, start SPIN with the Promela program's filename:

xspin filename.pml
(The conventional suffix for Promela programs is .pml.) Either of these loads the Promela code into an editor window, where it can be modified. To create a new program, you can type into this window, or you can copy and past it from another editor.

After loading or typing a program, it's always a good idea to check your typing. In the "Run" menu, use "Run Syntax Check". It will report and highlight the first syntax error, if any.

Before the first run, you need to specify some parameters, even if those are the defaults. From the "Run" menu, select the "Set Simulation Parameters" option. Change the "Display Mode" defaults, by making sure that the "MSC" and "Data Values" panels are each selected, as well as each feature under the "Data Values" panel. Use the "Start" button to start the run. This opens three new windows. From the "Simulation Output" window, you can choose to single-step the run or let it finish. For now, choose "Run". The body of this window displays each step taken. Throughout the run, the "Data Values" window displays the values of variables. The "Message Sequence Chart" displays changes to each process and variable. Each column represents a separate process or variable, and the vertical axis represents time. For now, the only changes to processes are when they stop.

Each process gets a unique process identifier number, starting with 0. These are displayed in the output of the various windows. Later, we will see how to access these IDs from within the Promela program.

For additional runs, from the "Run" menu of the main window, you can select "(Re)Run Simulation". However, each run will be identical, because it makes the same choices for interleaving. To vary runs, or traces, change the "Seed Value" in the simulation options. Also, be sure to try the "Interactive" simulation style, where you make each choice in the interleaving.

Aside:

Each xspin run opens several new windows, without closing the old ones. The easiest way to close the windows for a run is to use the "Cancel" button in the "Simulation Output" window.

Aside:

Experiment with the other display modes in the simulation options to see what else is available.

Using spin from the command-line:

Running a basic simulation is one of the few things which is easier in spin than xspin, since you don't need to set any of the simulation parameters:

prompt>  spin filename.pml
Definition 1: State
A snapshot of the entire program. In Promela, this is an assignment to all the program's variables, both global and local, plus the program counter, or line number, of each process. In other languages, the state also includes information such as each process' stack contents.
Definition 2: Trace
A sequence — possibly infinite — of successive states, corresponding to one particular run of a program.
See Also: state

Exercise 1

How many traces are possible for the previous example's code? Does each end with the same value for bal?

Solution

Figure 1: The two possible traces for the above program.
(a) (b)
Figure 1(a) (tiny_first_trace1.png)Figure 1(b) (tiny_first_trace2.png)

Did you see each one using SPIN? Each trace ends with bal equal to 0, as expected.

Aside:
There are some minor differences between SPIN's behavior and the somewhat simplified conceptual behavior that we are describing; see the following list
  • SPIN adds an extra transition and state at the end of each process. You can think of the transition as the process exiting, but it doesn't change any of the variables' values. We omit this, except for one homework exercise.
  • Marking variables with show introduces transitions and states to display their changing values. We have omitted these.
  • Finally, as a minor presentational detail, SPIN doesn't report the start state of a trace like we do.

All the traces of a Promela program start at the same initial state. Any variables not explicitly initialized by the code are set to particular known values: zero, for numbers; and false, for booleans. (Compare to a C program, where two traces of the same program might each start from a different state.) And we've just seen that different traces may end in different states, or they may sometimes end in the same state while having taken different paths to reach that end state. Finally, if we look closely at all the possible traces, we see that often, several different traces might happen to pass through the same intermediate state. By combining the traces at these common states, we can depict the entire state space — a map of all the conceivable executions of our program!

Definition 3: State space
A directed graph with program states as nodes. The edges represent program statements, since they transform one state to another. There is exactly one state, the initial state, which represents the program state before execution begins.
See Also: state, end state, valid end state
Definition 4: Valid end state
(Informal) Some states of our state space may be designated valid end states. These represent points where all of the threads have completed execution.
See Also: deadlock, end state

Example 2

Figure 2: An example state space. The initial state is indicated by an incoming arrow. The valid end state is indicated by a thicker border.
Figure 2 (tiny_first_statespace.png)

The above is the state space of the Tiny First Program's code. The states are shown with the value of the only variable's value, plus the program point of each process. The edges are shown only as arrows, as the change in line numbers indicates which statement's and thread's execution each represents.

The possible traces are then all the different paths in the state space. State spaces are primarily helpful for understanding how to verify concurrent programs, and will be used mostly in the next section.

Aside:

SPIN actually has two notions of states and state space. One form describes a single process' information, while the other describes the whole system. The latter is what we have defined and illustrated. The per-process form can be displayed in xspin by the "Run" menu's option to "View Spin Automaton for each Proctype". Also, the state numbering of this per-process form is reported by SPIN's output of traces. There is a close relation between the two forms, as the whole system is intuitively some sort of cross-product of the individual processes.

We now modify the previous example. The two assignment statements have each been broken up into three statements. These three simulate a typical sequence of machine-level instructions in a load/store style architecture. This is a way to model that level of atomicity within Promela.

Example 3: A Race Condition


 1   /* A variable shared between all processes. */
 2   show int bal = 0;
 3        
 4   active proctype deposit()
 5   {
 6     show int new_bal;
 7
 8     new_bal = bal;
 9     new_bal++;
10     bal = new_bal;
11   }
12
13   active proctype withdraw()
14   {
15     show int new_bal;
16
17     new_bal = bal;
18     new_bal--;
19     bal = new_bal;
20   }
        
Figure 3: Two possible traces for the above program resulting in the desired zero result.
(a) (b)
Figure 3(a) (balance1_trace_zero1.png)Figure 3(b) (balance1_trace_zero2.png)

Here, each process has its own local variable named new_bal. Note that we can also have SPIN show local variables in the "Message Sequence Chart" window. Run this code as described in the previous example.

Exercise 2

How many possible traces does this code allow? (Try it in SPIN!)

Solution

This code has 70 possible traces — many more than the previous version. Even with such a small program, it is too cumbersome to try to list them all. This is a fundamental reason why it is so hard for humans to debug a concurrent program, since it involves understanding all the possible ways threads can interact. Once we start using SPIN as a verifier, letting it enumerate these as a part of verification will be one of the benefits of automation. Later, we will verify this specific example.

Exercise 3

More importantly, what is the result of the computation, i.e., the value of bal at the end?

Solution

It can be 0, as expected, but it can also be 1 or -1! These new possibilities are the result of traces such as the following:

Figure 4: Two possible traces for the above program resulting in undesired non-zero results.
(a) (b)
Figure 4(a) (balance1_trace_one.png)Figure 4(b) (balance1_trace_negone.png)

In particular, after one process has updated and printed the global variable, the other process can update it based upon the original un-updated value.

Exercise 4

Each of the previous two exercises could be very easily answered, if you were given the program's state space. Draw the state space.

Solution

Figure 5: State space for above program. For brevity, the variable names are elided. This leaves the information in the form of the deposit and withdraw line numbers and the values of bal, deposit's new_bal, and withdraw's new_bal, respectively.
Figure 5 (balance1_statespace.png)

Whew! See how much larger this is than the state space of simpler version of this program.

Making the atomicity more fine-grained increases the possibility of concurrency problems. We will use this idea in many of the examples in this module, because we want to ensure that we can handle even the hard problems. The particular code sequence used in the previous example is also one of the shortest with a race condition.

Example 4: A Race Condition Variation

In Promela, initialization of a variable and assigning to a variable have subtly different semantics. This is illustrated by this variation on the previous example.


 1   /* A variable shared between all processes. */
 2   show int bal = 0;
 3
 4   active proctype deposit()
 5   {
 6     show int new_bal = bal;
 7  
 8     new_bal++;
 9     bal = new_bal;
10   }
11  
12   active proctype withdraw()
13   {
14     show int new_bal = bal;
15  
16     new_bal--;
17     bal = new_bal;
18   }
        

In Promela, initializing variables is part of starting a process. Thus, all variables of all active processes are initialized (defaulting to 0) before any other statements are executed. Here, each copy of new_bal starts with the value 0, so bal always ends with either the value 1 or -1, but never 0.

Figure 6: State space for above program with local variable initializations. For brevity, the variable names are elided. This leaves the information in the form of the deposit and withdraw line numbers and the values of bal, deposit's new_bal, and withdraw's new_bal, respectively.
Figure 6 (balance2_statespace.png)

Example 5: A Race Condition Generalized

The previous examples modeled two people simultaneously updating the same bank account, one making a deposit, and one a withdrawal. What if we had even more simultaneous updates? Let's focus only on deposits, so each is doing the same action. Rather than duplicating code with multiple proctype's (say, deposit1, deposit2, and deposit3), Promela allows us to abstract that, as in the following:


 1   /* Number of copies of the process to run. */
 2   #define NUM_PROCS 3
 3
 4   /* A variable shared between all processes. */
 5   show int z = 0;
 6
 7   active[NUM_PROCS] proctype increment()
 8   {
 9     show int new_z;
10
11     new_z = z;
12     new_z++;
13     z = new_z;
14   }
          

We've abstracted the variables' names, but otherwise, increment is the same as the previous deposit. The first substantial change is that we are starting three processes, rather than two, each with a copy of the same code. The bracket notation in the proctype declaration indicates how many copies of that code are to be used. The number of copies must be given by a constant, and it is good practice to name that constant as shown.

Although vastly simplified, this is also the core of our web-based flight reservation system example. The shared variable z would represent the number of booked seats.

Exercise 5

We can also ask questions like “What is the probability that we end with z having the value 3?” How would you solve this?

Solution

Divide the number of traces meeting this condition by the total number of conceivable traces. 1

However, this assumes that each possible trace is equally likely — a dangerous assumption! In real life, with a particular operating system and hardware, some traces might be much more common than others. For example, it might be very unlikely (but possible) that two context switches would occur only a small number of instructions apart. So, a more realistic answer would be based upon a statistical analysis of repeated runs.

But an even more preferable answer is to say that this is a trick question! Rather than thinking of our concurrent system as being probabilistic, it is often better to stop short and merely call it non-deterministic. For example, if processes are distributed across the internet, then which trace occurs might depend on network traffic, which in turn might depend on such vagaries as the day's headlines, or how many internet routers are experimenting with a new algorithm, or whether somebody just stumbled over an ethernet cord somewhere. In practice, we can't come up with any convincing distribution on how jobs get scheduled. Moreover, sometimes we want to cover adversarial schedules (such as guarantees of how our web server protocol acts during denial-of-service attack).

Guarded Statements and Blocking

Guarded statements are one of the most basic elements of the Promela language. As this section will introduce, they introduce the ability to synchronize processes. In the next section, we see how they are also fundamental parts of the looping (do) and conditional (if) constructs. After that, we will return to the synchronization issues, exploring them in greater depth.

Definition 5: Guarded statement
A guarded statement, written in Promela as guard -> body, consists of two parts. The guard is a boolean expression. The body is a statement. Execution of the body will block, or wait, doing nothing, until the guard becomes true.
See Also: enabled

Example 6: Produce-One and Consume-One


 1  bool ready = false;
 2  int  val;
 3
 4  active proctype produce_one()
 5  {
 6    /* Compute some information somehow. */
 7    val = 42;
 8
 9    ready = true;
10
11    /* Can do something else. */
12    printf("produce_one done.\n");
13  }
14
15  active proctype consume_one()
16  {
17    int v;
18
19    /* Wait until the information has been computed. */
20    ready ->
21
22    /* Now, we can use this. */
23    v = val;
24  }
        

This program leads to the following state space, illustrating two important traits about Promela's guards. First, the guarded statement of consume_one will block — sit and do nothing — until its guard is satisfied. This is seen by the behavior of the first three states in the state space. Second, execution of the guard is itself a transition to a state. This allows code, such as produce_one's printf to be interleaved between the guard and its body. (The otherwise unnecessary printf is there solely to make this point.) When such interruptions between the guard and body are undesired, they can be prevented, as shown later.

Figure 7: State space for the above program. The consume_one process cannot make progress until produce_one sets the ready variable.
Figure 7 (pcone_statespace.png)
Definition 6: Enabled
A guarded statement is enabled if its guard evaluates to true.
See Also: guarded statement, blocked

Aside:

The previous definition of “enabled” is a sufficient simplification for our purposes. But more accurately, in Promela, -> is just alternate syntax for ;, and an expression is a valid statement by itself. Thus, we more properly define when each kind of Promela statement is enabled. In particular, boolean expressions are enabled when they evaluate to true, assignment statements are always enabled, and a compound statement

stmt_1 ;
… ;
stmt_n
          
is enabled if its first statement is. One simplification that this allows is that trivially true guards are unnecessary in Promela. The statement true -> stmt is equivalent to the simpler stmt.

Later we will explore the possibility that a process stays blocked forever, or equivalently, never becomes enabled.

Control Flow in Promela

Of course, few interesting programs are made simply of straight-line code. We generally need to have more interesting control flow, including conditionals and iteration. These are illustrated in the remaining examples of this section.

Example 7: Random Walk 1

The following code illustrates use of the do statement, the primary iteration statement.


 1   #define  STARTING_DISTANCE  2
 2   show int dist = STARTING_DISTANCE;
 3
 4   active proctype stumble()
 5   {
 6     do
 7     :: true -> dist--;                        /* Approach. */
 8     :: true -> dist++;                        /* Retreat.  */
 9     :: dist <= 0 -> break;                    /* Success!  */
10     :: dist >= STARTING_DISTANCE*2 -> break;  /* Give up.  */
11     od;
12   }
	  

This process consists solely of a loop from do to the matching od. The loop contains four guarded statements as alternatives, each prefaced with double colons (::). On each iteration, all guards are evaluated. From those evaluating to true, only one is chosen arbitrarily, and its body executed. (What if none are true? We'll see soon.)

Executing a break terminates the loop. More generally, with nested loops, it terminates the most tightly enclosing loop.

A loop in the control flow leads to cycles in the state space.

Figure 8: Random Walk 1 state space.
Figure 8 (random_walk1_statespace.png)
As usual, we have a state for each possible variable value. There are a finite number of states, since the int range is finite: -231 -231 2311 231 1 .

Aside:

Since the large range of numbers is not needed for this example, it would have been better to use the short type, instead. With a smaller range, -215 -215 2151 215 1 , the state space would be significantly smaller, and SPIN could reason about the program more quickly.

Exercise 6

Run this code several times, observing the changing value of dist. As before, to see different behaviors, you'll need to change the "Seed Value" or use the "Interactive" simulation.

Exercise 7

Is it equivalent to change the inequalities (<= and >=) to equalities (==) in this example? Why or why not?

Solution

No. For example, the first clause could be chosen until dist was negative. Unlike in the original code given, the successful termination clause could not be chosen until dist is incremented to zero again. Try it in SPIN, and look for such a trace.

Example 8: Random Walk 2

The following code is a variation on the previous, illustrating the if statement and the else keyword.


 1   #define  STARTING_DISTANCE  2
 2   show int dist = STARTING_DISTANCE;
 3
 4   active proctype stumble()
 5   {
 6     do
 7     :: (!(dist <= 0) &&
 8         !(dist >= STARTING_DISTANCE*2)) ->
 9        if
10        :: true -> dist--;                  /* Approach. */ 
11        :: true -> dist++;                  /* Retreat.  */
12        fi;
13     :: else -> break;
14     od;
15   }
	  

First, consider the do loop. It has two guarded expressions, one with the guard else. On each iteration, if there are any enabled clauses, one is chosen arbitrarily, as before. But, if none are enabled the else clause, if any, is chosen. There may be only one else clause. An else clause is not the same as one with a true guard, as it can only be executed if no other clause is enabled.

Here, this means that the first clause of the do, i.e., the entire if statement, is used whenever the distance is within the specified range. Otherwise, the loop is exited. Thus, unlike the previous example, the loop is always exited once the distance is 0 or STARTING_DISTANCE*2.

Inside this loop, we have an if statement, which itself has two clauses. Like the do, it chooses arbitrarily among the enabled clauses, and may have an else clause.

Exercise 8

Run this code several times and observe its behavior.

Exercise 9

What is the state space for this code?

Solution

Figure 9: Random Walk 2 state space.
Figure 9 (random_walk2_statespace.png)

As desired, the program will always exit if dist reaches either of its limit values.

Exercise 10

Is it equivalent to change the <= and >= to == in this example? Why or why not?

Solution

Yes. As mentioned, because of the else clause, the loop will terminate once the distance is out of range. Since the distance is only ever changed by 1 on an iteration, it is sufficient to check just the boundary conditions of the range.

While equivalent, it is probably a not good idea to make this change, since its correctness is dependent on other factors.

Exercise 11

What is Promela's equivalent of the following C conditional statement?


if (condition)
   then_stmt;
else
   else_stmt;
	    

Solution


if
:: condition -> then_stmt;
:: else -> else_stmt;
fi;
	    

Exercise 12

What is Promela's equivalent of the following C loop statement?


while (condition)
   body_stmt;
	    

Solution


do
:: condition -> body_stmt;
:: else -> break;
do;
	    

The previous examples illustrate what happens where there is a choice among guarded statements and at least one is enabled. What if none are enabled? In that case, that process is blocked, and it halts execution until it is no longer blocked.

Exercise 13

Create a Promela program illustrating synchronization and looping in the following way. One process counts down a global variable, until it reaches zero. Another process waits for the count to be zero to print a message. Run your program in "Interactive" mode and confirm that the second process is blocked, and thus does nothing, until the count reaches zero.

Solution


show int count = 5;

active proctype count_down()
{
  do
  :: count > 0 -> count--;
  :: else -> break;
  od;
}

active proctype wait()
{
  count == 0 ->
  printf("I'm done waiting!\n");
}
	  

Process Interaction

We have so far seen very simple interactions between processes, using shared variables for communication and guarded statements for synchronization. This sections delves further into the possible interactions and introduces some additional Promela syntax for that.

While blocking is an essential tool in designing concurrent programs, since it allows process to synchronize, it also introduces the possibility of deadlock, when all processes are blocked. Since deadlock is a common problem in concurrent programs, we will later see how to verify whether a program can deadlock.

Exercise 14

Create a small Promela program that can deadlock. The program need not do anything computationally interesting. Run your program to see that it can get stuck.

Solution

There are many possible programs, but the following is a simple example.


show int x = 1;

active[2] proctype wait()
{
  x == 0 ->
  printf("How on earth did x become 0?!?\n");
}
	  

One standard concurrent programming technique that uses guarded statements is locking. While details of its use are beyond the scope of this module, we'll provide a quick review. Locks are used when there is some resource or critical section of code that we want to control access to. For example, only one process at a time should be able to write data to a specific file, lest the file get intermingled inconsistently. To ensure this, the process must acquire the lock associated with writing this file before it can do the operation, and then must release it.

Example 9: Locking to avoid race condition

We can modify the Generalized Race Condition example to eliminate its race condition. The lock controls access to the shared resource, i.e., the shared variable z.


 1   /* Number of copies of the process to run. */
 2   #define NUM_PROCS 3
 3
 4   show int  z = 0;
 5   show bool locked = false;
 6
 7   active[NUM_PROCS] proctype increment()
 8   {
 9     show int new_z;
10
11     atomic {
12       !locked ->        /* Lock available? */
13       locked = true;    /* Acquire lock.   */
14     }
15
16     /* Critical section of code. */
17     new_z = z;
18     new_z++;
19     z = new_z;
20
21     locked = false;     /* Release lock.   */
22     /* End critical section of code. */
23   }
	  

As described before, when using a lock, we need to acquire the lock, execute the critical section, and release the lock. Releasing the lock is the final action of the critical section, as it allows another process to acquire the lock. To acquire the lock, clearly we need to check whether it is available. This guard is where each process potentially blocks, waiting for its turn to get the block.

The atomic keyword, as its name implies, makes a group of statements atomic. Without it, two processes could first each determine the lock was available, then each acquire the lock, defeating the purpose of the lock.

Exercise 15

What values of z are possible at the end of this program?

Solution

As desired, only the value 3, since there are 3 processes.

Exercise 16

Run this code several times and observe its behavior. Also, eliminate the atomic keyword and its associated braces, and repeat.

Exercise 17

Assume that the atomic keyword is removed, and there are only two processes, instead of three. Give a trace where both are in the critical section at the same time. What are the differences in the state spaces with and without the atomic?

Solution

The following figure shows one such trace. Notice that z is effectively incremented only once, instead of twice.

Figure 10: An undesired possible trace for increment(), if atomic keyword were removed. For brevity, states show the line number for each process, the z and locked values, and each process' new_z value, respectively.
Figure 10 (increment1_race_trace.png)

The critical difference in state spaces is that the atomic keyword prohibits processes interleaving between the guard and its body. This reduces the size of the state space. In this case, as in many others, it also reduces the number of different end states and, thus, possible result values.

Example 10: Tiny Server and Clients

We now have the tools to demonstrate how a tiny server and its clients can be modeled. The server process is a loop that repeatedly waits for and then processes requests. Clients send requests. For simplicity, in this example, the server won't communicate any results back to the clients, and the clients will only make one request each.

Servers generally allow multiple kinds of requests. Here, we'll use one more new piece of Promela syntax, mtype, to define some named constants, similar to an enumerated type.


/* Number of each type of client. */
#define NUM_CLIENTS 1

/* Define the named constants. */
mtype = {NONE, REQUEST1, REQUEST2};

/* Declare a shared variable.  */
show mtype request = NONE;

active proctype server()
{
  do
  :: request == REQUEST1 ->
     printf("Processing request type 1.\n");
     request = NONE;
  :: request == REQUEST2 ->
     printf("Processing request type 2.\n");
     request = NONE;
  od;
}

active[NUM_CLIENTS] proctype client1()
{
  atomic{
    request == NONE ->
    request = REQUEST1;
  }
}

active[NUM_CLIENTS] proctype client2()
{
  atomic{
    request == NONE ->
    request = REQUEST2;
  }
}
	

Function calls are another standard kind of control flow. Surprisingly, Promela does not have function calls! Every proctype instance is a separate process. Though if you really want, you could simulate a function call by creating a new process dynamically, and blocking until that process returns.

Verification

So far, we have determined the possible behaviors of a program simply by running the program a bunch of times. For small programs, we can be very careful and make sure we exhibit all the possible traces, but the state space soon becomes unwieldy.

The real power of SPIN is as a tool for verification, our original goal. SPIN will search the entire state space for us, looking for (reachable) states which fail to have desired properties.

Assertions

The first verification technique we'll examine are assertions, common to many programming languages. In Promela, the statement assert(condition); evaluates the condition. If the result is true, execution continues as usual. Otherwise, the entire program is aborted and an error message is printed.

When simulating a single run of the program, SPIN automatically checks these run-time assertions; this is the usage that most programmers should be familiar with from traditional programming languages. But additionally, we'll see that SPIN, in the course of searching the entire state space, verifies whether an assertion can ever fail! (Though of course it can only search finite, feasible state spaces; happily, “feasible” can often include hundreds of millions of states.)

Example 11

Consider our last race condition example. One of our original naïve expectations was that, within each process, the value of z at the end of the process is exactly one more than at the beginning. The previous examples have shown that to be wrong, but we had to run the program until we encountered a run when it failed. Here, the assert statement puts that expectation explicitly into the program, for SPIN to check.


 1   #define NUM_PROCS 3
 2
 3   show int z = 0;
 4
 5   active[NUM_PROCS] proctype increment()
 6   {
 7     show int new_z;
 8
 9     /* A saved copy of the old z, for the assertion. */
10     show int old_z;
11
12     old_z = z;
13     new_z = old_z + 1;
14     z = new_z;
15
16     assert(z == old_z+1);
17   }
	    

It is often the case, as it is here, that to state the desired condition we need to add an extra variable — here, old_z. As always, it is important that when introducing such code for testing that you don't substantially change the code to be tested, lest you inadvertently introduce new bugs!

Exercise 18

Run this code several times, and observe when the assertion fails. This text indicates which assertion failed, and the line will be highlighted in the code window. To see which process' copy of increment failed and why, you have to look more closely at the steps shown.


spin: line  16 "pan_in", Error: assertion violated
spin: text of failed assertion: assert((z==(old_z+1)))
              

Certainly this run-time check is helpful, but what about the promise of checking whether an assertion can fail? We will need to use SPIN as a verifier, rather than just a simulator.

As with simulation, before our first verification, we need to specify some parameters, even if only using the defaults. From the "Run" menu, select the "Set Verification Parameters" options. For this example, make sure that the "Assertions" option is marked. Use the "Run" button to start the verification.

Using spin from the command-line:

Using spin by hand involves three steps: we must first create a verifier (a C program) from our promela source; then compile the verifier, and then run the verifier:

prompt>  spin -a filename.pml
prompt>  gcc pan.c -o filename
prompt>  filename
The -a flag tells SPIN to generate a verifier, which it always names pan.c.

As desired, the verification finds a problem, as reported in the "Verification Output" window. It also saves a description of the example run found where the assertion fails.


pan: assertion violated (z==(old_z+1)) (at depth 11)
pan: wrote pan_in.trail
          
The remainder of this window's output is not too important, as it describes which verification options were used and how much work was done in the verification.

Aside:

pan is the standard SPIN default name for files related to Promela verification. With xspin, you'll see these filenames reported, but you don't need to use or remember them.

Now we can have SPIN show us the sample failed run it found. In the new "Suggested Action" window, select "Run Guided Simulation". This brings up the now-familiar simulation windows, which will show this particular run. This run is "guided" by a trace saved by the verification. (Alternatively, selecting "Setup Guided Simulation" allows you to change the display options first.)

Using spin from the command line:

Running a guided simulation from the command line is similar to running a regular simulation, except you provide the -t flag to have SPIN use the pre-named trail file. It's often handy to use the -p flag to have SPIN print each line it executes:

prompt>  spin -t -p filename.pml

The trace found in our verification executed some of each thread. From our experience with this program, we know there are less complicated traces where the assertion fails. How can we find them?

Again, in the "Run" menu, select "Set Verification Parameters". Now, select "Set Advanced Options", then "Find Shortest Trail", and "Set". Then run the verification again.


pan: assertion violated (z==(old_z+1)) (at depth 11)
pan: wrote pan_in.trail
pan: reducing search depth to 10
pan: wrote pan_in.trail
pan: reducing search depth to 5
	
Figure 11: A nonminimal and a minimal trace failing the assertion in previous example. For brevity, some variable names are omitted. The second line shows the new_z and old_z for each process, respectively. For the assertions, the relevant variables are underlined.
(a) (b)
Figure 11(a) (increment2_failed_trace_nonminimal.png)Figure 11(b) (increment2_failed_trace_minimal.png)

Aside:

Experiment on your own with the other advanced options under "Error Trapping". Using breadth-first search is an alternate strategy to finding a shortest trace with an error.

Example 12

Previously, we claimed that adding a lock fixed the race condition of the previous example. Let's now verify that claim. The following is the locking version of the code augmented with the same assertion code just used. Note that the assertion needs to be inside the critical section. (Otherwise, process B could execute its critical section, changing z, in between process A's critical section and its assertion.)


/* Number of copies of process to run. */
#define NUM_PROCS 3

show int  z = 0;
show bool locked = false;

active[NUM_PROCS] proctype increment()
{
  show int new_z;

  /* A saved copy of the old z, for the assertion. */
  show int old_z;

  atomic {
    !locked ->        /* Lock available? */
    locked = true;    /* Acquire lock.   */
  }

  /* Critical section of code. */
  old_z = z;
  new_z = old_z + 1;
  z = new_z;

  assert(z == old_z+1);

  locked = false;     /* Release lock.   */
  /* End critical section of code. */
}
	  

Exercise 19

Run the verification as before. As expected, no errors are found. Happily, this time no error messages are reported before the version information in the "Verification Output" window, and SPIN doesn't suggest any guided simulation.

Deadlock and End States

Deadlock

A previous exercise asked you to write a small deadlocking program, and provided one solution. Running such a program in SPIN's simulation mode, some execution paths result in deadlock, i.e., every single process waiting on other processes to do something first. How can we verify whether or not deadlock is possible? SPIN checks for deadlock automatically, as one of a hand-full of built-in checks, so we will see how that is reported.

Run SPIN's verifier on your tiny deadlocking program. The deadlock is reported by a somewhat obscure error message:


pan: invalid end state (at depth 0)
pan: wrote pan_in.trail
	    
As usual, xspin allows you to easily run the guided simulation it found.

Deadlock occurs in a state when nothing further will happen—that is, there are no outgoing edges in the state space. We call this an end state. Most programs we've seen previously have a valid end state: all processes complete their last line. In contrast, in deadlock we are in an end state, yet at least one process hasn't completed. Hence “invalid”. SPIN checks for end states by searching the entire state space (either by a depth- or breadth-first search).

Valid End States (optional)

We've mentioned that valid end states are those where every process has completed its last line, and that deadlock ends with at least one process which hasn't completed. However, that's not enough to conclude deadlock: sometimes it is desirable to have a trace reach an end-state with a process still (say) waiting for more input. Consider a server and its clients. By default, the server waits and does nothing. Only when a client makes a request, the server responds and does something. If all clients exit, it is expected, not an error, for the server to be blocked waiting for another request. In SPIN, we need to notate that it is valid for the server to end at that point, and not actually deadlock.

If we run SPIN's verifier on the code for the Tiny Server and Clients example, we get a spurious “invalid end state” error, since we expect the client to be alive and listening for for any more requests.

To let SPIN know that is acceptable and shouldn't actually be considered deadlock, we label the beginning of the server loop as being a valid end state: We give it a label whose first three letters are “end”, as shown below. Run the verifier again on this, and there should be no error message.


 1  /* Number of each type of client. */
 2  #define NUM_CLIENTS 1
 3
 4  /* Define the named constants. */
 5  mtype = {NONE, REQ1, REQ2};
 6
 7  /* Declare a shared variable.  */
 8  show mtype request = NONE;
 9
10  active proctype server()
11  {
12   /* Waiting for a request is a valid place to end. */
13   endwait:
14    do
15    :: request == REQ1 ->
16       printf("Processing request type 1.\n");
17       request = NONE;
18    :: request == REQ2 ->
19       printf("Processing request type 2.\n");
20       request = NONE;
21    od;
22  }
23
24  active[NUM_CLIENTS] proctype client1()
25  {
26    atomic {
27      request == NONE ->
28      request = REQ1;
29    }
30  }
31
32  active[NUM_CLIENTS] proctype client2()
33  {
34    atomic {
35      request == NONE ->
36      request = REQ2;
37    }
38  }
	      

More formally, what this does is to tag some of the states in the state space as being valid, if they happen to be an end-state.

Figure 12: The state space for the above program. The endwait label makes the bottom two states be valid end states.
Figure 12 (server_statespace.png)

Putting the end label in front of the entire do statement may not seem as natural to you as:


do
::
 end1:
  request == REQ1 ->
  …
::
 end2:
  request == REQ2 ->
  …
od;
	    
However, this is not acceptable! Try it, by entering the changes and running a syntax check. The problem is that end1 and end2 both represent the same program point, where the program waits for some guard to become true. It would be nonsensical to have (say) one of the two guards labeled an end-state without the other one being an end-state. To prevent surprising inconsistencies, Promela disallows labels in front of individual guards of a compound statement.

There are other syntactic restrictions of where labels can appear. The most commonly encountered is that they cannot appear at the end of a statement block. For example, instead of


{
  x = 1;
  y = 2;
 label:   /* Label not allowed here. */
}
	    
you can introduce a dummy statement, and label it:

{
   x = 1;
   y = 2;
  label:
   skip;  /* A statement that does nothing. */
}
	    

Mutual Exclusion — a morality play (optional)

It's worth showing several examples of correct and incorrect concurrent programs, and how SPIN can implement and attempt to verify them. We'll examine a sequence of programs, all dealing with mutual exclusion protocols. We hope to leave the gentle reader with an appreciation of the non-obvious nature of concurrent bugs (and hence the value of automated verification).

To get started, we revisit the fundamental race condition, phrased in terms of two processes simultaneously each being in their critical section.

 1  /* An INCORRECT attempt at Mutual exclusion.
 2   * Adapted from Ruys 2002: SPIN Beginner's Tutorial.
 3   */
 4  
 5  bool flag = false;  /* Is some process in its crit section? */
 6  int  data = 0;
 7  
 8  active[2] proctype proc()
 9  {
10    do
11    :: flag == false ->  /* Wait until nobody in crit section. */
12       flag = true;
13  
14       /* Entering critical section. */
15       data++;
16       assert(data == 1);
17       data--;
18       /* Leaving critical section. */
19  
20       flag = false;
21    od;
22  }

Exercise 20

Take a moment to run SPIN's verifier, and confirm that the assert statement indeed might be violated.

Here is another attempt which doesn't work. Can you see how it might go awry?

 1  /* An INCORRECT attempt at Mutual exclusion.
 2   * Version 2.
 3   * Adapted from Ruys 2002: SPIN Beginner's Tutorial.
 4   */
 5  
 6  bool flagA = false;  /* A wants to enter its crit section. */
 7  bool flagB = false;  /* B, similarly. */
 8  int  data = 0;
 9  
10  active proctype A()
11  {
12    do
13    :: flagA =  true;       /* Declare intent to enter crit section. */
14       flagB == false ->    /* Wait for B to leave their crit section, if in. */
15        
16       /* Entering critical section. */
17       data++;
18       assert(data == 1);
19       data--;
20       /* Leaving critical section. */
21  
22       flagA = false;
23    od;
24  }
25  
26  
27  active proctype B()
28  {
29    do
30    :: flagB =  true;       /* Declare intent to enter crit section. */
31       flagA == false ->    /* Wait for A to leave their crit section, if in. */
32        
33       /* Entering critical section. */
34       data++;
35       assert(data == 1);
36       data--;
37       /* Leaving critical section. */
38  
39       flagB = false;
40    od;
41  }

Exercise 21

Again, use SPIN to verify that the assertion can be violated. (Keep in mind that we can't use atomic (or d_step) here; the entire point of a mutex algorithm is to be able to implement multi-statement atomicity using nothing but shared memory and the atomicity of simple-assignment.)

By being substantially more clever though, interacting flags can be used to get two processes to share a critical section:

 1  /* A correct attempt at Mutual exclusion, huzzah!
 2   * Peterson, 1981.
 3   */
 4  
 5  bool flagA = false;  /* A wants to enter its crit section? */
 6  bool flagB = false;  /* B, similarly. */
 7  pid  turn;           /* For politeness, offer other people a turn. */
 8  int  data = 0;
 9  
10  
11  active proctype A()
12  {
13    do
14    :: flagA = true;        /* Declare intent to enter crit section. */
15       turn  = 1 - _pid;    /* Offer the turn to the other. */
16  
17       ((flagB == false) || (turn == _pid)) ->
18        
19       /* Entering critical section. */
20       data++;
21       assert(data == 1);
22       data--;
23       /* Leaving critical section. */
24  
25       flagA = false;
26    od;
27  }
28  
29  active proctype B()
30  {
31    do
32    :: flagB = true;        /* Declare intent to enter crit section. */
33       turn  = 1 - _pid;    /* Offer the turn to the other. */
34  
35       ((flagA == false) || (turn == _pid)) ->
36        
37       /* Entering critical section. */
38       data++;
39       assert(data == 1);
40       data--;
41       /* Leaving critical section. */
42  
43       flagB = false;
44    od;
45  }
Processes A and B are using their flags in symmetric ways, and we can certainly factor their common code to get two instances of the same proctype:
 1  /* A correct attempt at Mutual exclusion, huzzah!
 2   * Peterson, 1981.
 3   */
 4  
 5  bool flag[2] = false;  /* Process #i wants to enter its crit. section? */
 6  pid  turn;             /* For politeness, offer other people a turn. */
 7  int  data = 0;
 8  
 9  active[2] proctype P()
10  {
11    pid me, peer;
12  
13    me   =     _pid;
14    peer = 1 - _pid;
15  
16    do
17    :: flag[me] = true;   /* Declare intent to enter crit section. */
18       turn = peer;       /* Politely give others a chance, first. */
19        
20       ((flag[peer] == false) || (turn == me)) ->
21          
22       /* Entering critical section. */
23       data++;
24       assert(data == 1);
25       data--;
26       /* Leaving critical section. */
27  
28       flag[me] = false;
29    od;
30  }
A paper proof of this algorithm requires careful thought; fortunately verifying this with SPIN is much easier. Note that if you wanted to tweak an innocuous line or two of Peterson's algorithm, without SPIN you'd have to reason extremely carefully to be sure you didn't lose mutual exclusion.

A different algorithm for mutual exclusion, the Bakery algorithm, can be thought of as having a roll of sequentially-numbered tickets; when a process wants access to the critical section (the bakery clerk), they take a number, and then look around to be sure they have the lowest ticket number of anybody, before striding up to the counter. The tricky bit is that “taking a number” means both copying the value of a global counter, and updating that counter, in the presence of other processes who might be doing so simultaneously. It seems like it's begging the question, to even do that much!

Promela code for a two-process implementaton of the Bakery algorithm is below; it can be generalized to nn processes 2

 1  /* A correct attempt at Mutual exclusion.
 2   * Lamport's "Bakery algorithm"
 3   * Procs which communicate only through ticket[].
 4   */
 5  
 6  byte ticket[2];      /* `byte' ranges from 0..255.  Initialized to 0. */
 7  int  data = 0;
 8  
 9  active [2] proctype customer()
10  {
11    pid me   =   _pid;
12    pid peer = 1-_pid;             /* Special case for 2 procs only. */
13  
14    skip;                          /* Doing some non-critical work... */
15  
16    ticket[me] = 1;                /* Declare desire to enter crit section. */
17    ticket[me] = ticket[peer]+1;   /* In general: max(all-your-peers)+1.    */
18    ((ticket[peer] == 0) || (ticket[me] < ticket[peer])) ->
19  
20    /* Entering critical section. */
21    data++;
22    assert(data == 1);
23    data--;
24    /* Leaving critical section. */
25  
26    ticket[me] = 0;                /* Relinquish our ticket-number. */
27  }
28  
29  
30  /* The meaning of ticket[], for each process:
31   * ticket[me]==0 means I am not contending for crit section.
32   * ticket[me]==1 means I am in the process of taking a ticket
33   *               (I am looking at all other ticket holders)
34   * ticket[me] > 1 means I have received a ticket and will
35   *              enter the critical section when my turn comes up.
36   *
37   * A problem still present, when trying to generalize to > 2 procs:
38   *   Two procs may choose the same ticket-number;  
39   *   when nobody has a smaller ticket than you, check them all for equal tix;
40   *   must break any ties reasonably (eg by smaller _pid).
41   *
42   * It's a bit sneaky, that 0 and 1 have special meanings
43   *  (and the fact that 1 is less than any other "valid" ticket-number
44   *  is important -- while anybody is calculating their ticket,
45   *  nobody else will enter the critical section.
46   *  This inefficiency can be avoided by being cleverer.)
47   */
SPIN can be used to verify this code. But it is worth taking the time to reason through the code 3 in your head, to compare your brain's horsepower with SPIN's steam-engine technology.

You might have noticed that this example (unlike earlier ones) only has each process perform their critical section once. No problem, we can put that in a do loop:

 1  /* An overflow bug sneaks into our Mutual exclusion, uh-oh!
 2   * Lamport's "Bakery algorithm"
 3   * Procs which communicate only through ticket[].
 4   */
 5  
 6  byte ticket[2];
 7  int  data = 0;
 8  
 9  active [2] proctype customer()
10  {
11    pid me   =   _pid;
12    pid peer = 1-_pid;
13  
14    do
15    :: skip;                       /* Doing some non-critical work... */
16  
17       ticket[me] = 1;             /* Declare desire to enter crit section. */
18       ticket[me] = ticket[peer]+1;
19       ((ticket[peer] == 0) || (ticket[me] < ticket[peer])) ->
20     
21       /* Entering critical section. */
22       data++;
23       assert (data == 1);
24       data--;
25       /* Leaving critical section. */
26  
27       ticket[me] = 0;             /* Relinquish our ticket-number. */
28    od;
29  }
SPIN verifies thi—wait a moment, the verification fails! And at depth 2058, no less. What's going on? A clue can be found in scrutinizing the output:

spin: line  17 "mutexG2.pml", Error: value (256->0 (8)) truncated in assignment
          
Our variable ticket, a byte, is overflowing!

Exercise 22

  1. Describe one example trace, where ticket can keep incrementing until overflow.
  2. Does this overflow necessarily happen in every run?
  3. What is the probability that this overflow will occur?
  4. While changing the type of ticket from byte to int changes nothing conceptually, it does change SPIN's verification. Re-run the verification with this change. What is the difference?
Solution
  1. Call the two processes P1 and P2. P1 might get the first ticket=1 and enter its critical section. While there, P2 takes ticket=2, blocking until P1 is done. Then P2 enters its critical section, but before P2 finishes P1 again requests its critical section, taking ticket=3. This process repeats—each process requesting the critical section (and bumping up the ticket number) before its peer finishes.
  2. No. If there is a moment when no processes are wanting the critical section, then the ticket number (intentionally) drops back to zero.
  3. This is a trick question (as we've mentioned in a previous problem's solution): For example, road construction may fundamentally shift the rate of clients in the bakery, or scheduling might even become adaptively adversarial (perhaps: a grudge-holding customer trying to continually thwart another customer, by continually taking numbers just to ask for a glass of water).
  4. SPIN runs out of state space before it detects the cycle. ¶ This justifies our initial use of byte: there is no conceptual difference in the verification errors between using byte and int, but by keeping the state space from being extraneously large, we get better insight to what the difficulty was. (Imagine if we'd used int first — we might have seen the state-space error, presumed that other parts of our program made verification infeasible, crossed our fingers and given up.)

The moral of the story is that concurrent protocols can difficult and subtle. The SPIN book puts it well: “The number of incorrect mutual exclusion algorithms that have been dreamt up over the years, often supported by long and persuasive correctness arguments, is considerably larger than the number of correct ones. [One exercise], for instance, shows a version, converted into Promela, that was recommended by a major computer manufacturer in the not too distant past.”

The following are some other ideas, each of these can form the kernel of a successful mutex algorithm.

  • First-come, first-served. Within Promela/SPIN, we assume one statement is executed at a time, so processes must arrive at the critical section in some particular order.
  • Pre-determined priority order, e.g., by process ID.
  • Least recently used. I.e., processes defer to others that haven't recently executed this code.
  • Random. E.g., each process randomly chooses a number, with ties broken randomly also.

Issues in writing feasibly-verifiable programs (optional)

Example 13: Banking

Our first Promela examples were drastically simplified models of banking, with deposits and withdrawals. Now, let's consider a production-quality system of bank accounts. Its code base could easily entail millions of lines of code.

Before we start porting the entire system to Promela, let's consider what it is we want to use SPIN to verify. Ideally, we want to verify everything, including ensuring that each account balance is correct. But, think about that in conjunction with SPIN's approach using state spaces.

Even with only one bank account, if we model balances accurately, we need an infinite number of states — one for each possible balance. Similarly, the production system likely has no bound on the number of possible accounts. SPIN would have a wee bit of difficulty searching the entire state space in finite time. To make verification feasible, all SPIN models are required to be finite. (All data types, including int, have a finite range.)

How could we restrict the Promela program to guarantee finiteness? The most obvious and least restrictive option is to simply use the int type and ensure that any examples used should not cause overflow.

To make verification efficient, the state space should be relatively small. Even using ints as bank balances and account numbers, assuming only 1000 lines of code and ignoring any other program variables, we still have 2322321000 232 232 1000 states — over 18 sextillion. Even at a billion states per second, a verification could still take over half a millenium!

Let's abandon the goal having our Promela prototype track the particular balances accurately. Since our focus is on concurrency, we mainly want to ensure that when there are multiple simultaneous transactions, we don't lose or mix up any of the data. We might also want to verify the security protocols used in selecting appropriate accounts. But, we no longer need to even keep track of balances. Furthermore, it is highly unlikely that the production code might have errors that only occur when there are a large number of accounts, so we can comfortably use just a few.

Since we have not fully specified this example's original system, we will leave the details to the reader's imagination.

The lost Pathfinder (optional)

In 1997, the Mars Pathfinder seemed to beam its data back to Earth just fine, except— sometimes it would appear to freeze up. The “fix”? Press Control-Alt-Delete from Earth, and re-boot its rover (incurring not only an expensive delay for Mission Control, but the full attention of late-night-TV comedians).

The cause turned out to be a bug in the concurrency protocol, between a low-priority data-gathering mode, which was meant to yield to a high-priority data-sending mode. However, sometimes these priorities would become inverted, with the high-priority thread blocking on the (supposedly) low-priority one. (More detail.) The bug 4 can be realized in Promela:

 1  /* From Spin primer/manual, Gerard Holzman. */
 2  
 3  mtype = { free, busy };             /* states for the variable "mutex" */
 4  mtype = { idle, waiting, running }  /* states for the processes */
 5  
 6  mtype highProc = idle;
 7  mtype  lowProc = idle;
 8  mtype mutex    = free;
 9  
10  active proctype high_priority()
11  {
12   end:
13    do
14    :: highProc = waiting;
15       atomic {
16         mutex == free ->
17         mutex = busy
18       };
19       highProc = running;
20       /* work work work; produce data */
21       atomic {
22         highProc = idle;
23         mutex = free
24       }
25    od
26  }
27  
28  active proctype low_priority() provided (highProc == idle)
29  {
30   end:
31    do
32    :: lowProc = waiting;
33       atomic {
34         mutex == free ->
35         mutex = busy
36       };
37       lowProc = running;
38       /* work work work; consume data and send it back to Earth. */
39       atomic {
40         lowProc = idle;
41         mutex = free
42       }
43    od
44  }
45  
46  
47  /* Note the "provided" clause as part of low_priority()'s declaration.
48   * This condition is monitered and confirmed before every single
49   * transition that low_priority() attempts.
50   * (This is a handier syntax than placing that guard condition
51   *  in front of every single statement within the process.)
52   */

Exercise 23

Can you detect the problem using SPIN?

Non-progress: livelock, starvation, and fairness

Livelock

As described previously, livelock is very similar to deadlock. But in livelock, computation doesn't get completely stuck — it simply doesn't do anything sufficiently interesting.

In previous examples, processes waited with a guarded statement. Here, as an example of busy waiting, we repeatedly check for the desired condition, and otherwise do something. Busy waiting for a condition that never comes true is one common situation that leads to livelock.


 1   show int x = 1;
 2
 3   active[2] proctype busywait()
 4   {
 5     do
 6     :: x != 0 ->
 7        printf( "%d waiting.\n", _pid );
 8     :: else ->
 9        break;
10     od;
11
12     printf( "Now this is more interesting.\n" );
13   }
            
This example also uses _pid, a read-only variable whose value is the process ID number (0-based) of the given process.

Now we want to verify this code. From the "Run" menu, select "Set Verification Parameters" and then turn on "Liveness" checking and "Non-Progress Cycles".

Using spin from the command-line:
As usual, we use -a to have SPIN generate a verifier. Then, when compiling pan.c, we must include a flag to check for non-progress, -DNP. Finally, when running the resulting executable, we must also specify loop-checking via the flag -l.

prompt>  spin -a filename.pml
prompt>  gcc -DNP pan.c -o filename
prompt>  filename -l
(With all the varying flag names, you can begin to appreciate xspin's interface!)
Verifying, SPIN now gives the following error message, declaring it is possible for the code to not make progress:

pan: non-progress cycle (at depth 6)
pan: wrote pan_in.trail
            

Now run the guided simulation. In the "Simulation Output" window, the start of the cycle is so labeled, and the end of the cycle is the last step displayed. Pictorially, the trace is as follows.

Figure 13: Trace of a non-progress cycle. Naturally, since it is a cycle, it repeats forever.
Figure 13 (busywait_cycle.png)

Starvation

Whereas livelock is when computation continues but no process makes actual progress, starvation is when computation continues, and some processes make progress but others don't.

Consider the following code. (Clearly, it does not compute anything interesting.)

 1  show int x = 0;
 2   
 3  active proctype A()
 4  {
 5    do
 6    :: true -> x = 1 - x;
 7    od
 8  }
 9  
10  active proctype B()
11  {
12    do
13    :: true -> x = 1 - x;
14    od
15  }
Our naïve expectation and desire is that A and B each get executed infinitely often (though not necessarily exactly alternating). The problem is that we could get unlucky — very 5 unlucky. It is possible for, say, A to get all the processor time, while B never even gets executed once, ever. More generally, maybe B gets its fair share of processor time for a while, but later priority is always given to A, and B never runs again.

This is simply another instance of a lack of progress. For example, if we consider only B's loop to be progress, then there is a non-progress cycle.

Exercise 24

Modify the code to test for this non-progress cycle, and then run SPIN's verifier.

Solution

Since we are currently considering only B's loop to be progress, we only add a progress label in B.


 1   show int x = 0;
 2 
 3   active proctype A()
 4   {
 5     do
 6     :: true -> x = 1 - x;
 7     od
 8   }
 9
10   active proctype B()
11   {
12    progress_B:
13     do
14     :: true -> x = 1 - x;
15     od
16   }
		

Now we want to verify this code First, run a syntax check. Partly, this is just another reminder of a good habit. But, it also forces xspin to re-scan your code and recognize the progress label. After doing this, re-run the verifier, and indeed it will find a non-progress cycle The following are two of the possible traces it could find, one minimal, and one not.

Figure 14: A minimal trace that only executes A and a nonminimal trace that executes B a little first. Progress states are depicted with red borders.
(a) (b)
Figure 14(a) (ab_cycle_minimal.png)Figure 14(b) (ab_cycle_nonminimal.png)

The next figure shows the entire state space, which has a bit of a surprise. Some of the states are duplicated, differing only in their status as progress states or not. To see why that is necessary, contrast the following. Entering line 14, as in the first transition of the above nonminimal trace, entails progress. This corresponds to each of the red progress states. However, say that B execute for one step, long enough to pass the guard, but not yet execute the assignment. Now, if only A executes, B never makes further progress. This corresponds to the black non-progress states with B's line counter at 14.

Figure 15: State space for above program.
Figure 15 (ab_statespace.png)
Exercise 25

How can we test for the possibility of A looping forever without executing B?

Solution

Remove the progress label from B, and add one to the corresponding place within A.

Exercise 26

Of course, the faulty behavior we'd really like to have SPIN alert us to is “A or B is being starved”, instead of having to choose one and name it.

Why does putting progress labels in both A and B not achieve what we want?

Solution

Putting in two progress labels, and getting a green light from SPIN's verifier, would only mean “It is verified that in every trace, either A or B is making progress.” But we want to confirm “… A and B are making progress.”

In this toy example, where A and B are identical (up to differing in progress labels), we can argue by symmetry that lack of non-progress-for-A is enough to imply lack of non-progress-for-B. But in general, SPIN makes us run two separate verifications (though we will see some possible code contortions later).

Fairness

Process scheduling software is usually written so as to avoid starvation. For example, a simple scheduler could simply pick processes in a round-robin fashion, executing some of each process 0, 1, 2, …, 0, 1, 2, … forever. However, this scheme inappropriately schedules time for blocked threads, doesn't allow the processes to be prioritized, and doesn't allow the set of running processes to change dynamically. Real schedulers manage not only the order, but also the frequency of context switches.

So far, we have considered verification with an arbitrary scheduler. Since we know nothing about the scheduler, we must consider all possible traces. (Remember that the scheduler might sometimes be nature, or even an adversary.) But, SPIN also allows you to specify some more restricted scheduling behaviors, thus reducing the state space. We will look at one of these mechanisms in SPIN — enforcing “weak fairness”.

Definition 7: Weak Fairness
Each statement that becomes enabled and remains enabled thereafter will eventually be scheduled.
(Weak fairness is but one of many notions of fairness.)

Example 14

Let us return to our starvation example, but this time we will have SPIN enforce weak fairness. To turn this feature on in xspin, select it from the verification options.

Using spin from the command-line:
The steps for creating and compiling the verifier are unchanged, but we execute the verifier with a flag -f for fairness, as well as -l for loop-checking.

prompt>  spin -a filename.pml
prompt>  gcc -DNP pan.c -o filename
prompt>  filename -f -l
 1  show int x = 0;
 2   
 3  active proctype A()
 4  {
 5    do
 6    :: true -> x = 1 - x;
 7    od
 8  }
 9  
10  active proctype B()
11  {
12    do
13    :: true -> x = 1 - x;
14    od
15  }

In this code, each process is always enabled, since each guard is just true. Thus, a weakly fair scheduler guarantees that at each point in the trace, each process will eventually be scheduled in the future. I.e., it will repeatedly switch between the two processes as desired.

Problem 1

Now verify there are no non-progress cycles when weak fairness is guaranteed. I.e., add the progress label to B again and verify that no errors are reported. Repeat with the progress label in A.

Example 15

Weak fairness is defined not in terms of processes, but of individual statements. This example illustrates how it applies to situations other than avoiding starvation, and in particular, when there is only one process.


show int x = 0;

active proctype loop()
{
  do
  :: true -> x = 1;
  :: x==1 -> x = 2;
  :: x==2 -> x = 3;
  od;
}
	      

Without any kind of fairness enforced, it is possible to repeatedly execute only the first option.

Exercise 27

Verify the previous statement. Where would you add progress labels?

Solution

We want a progress label within the second option. Although that is not allowed before the guard, it is sufficient for this example to add it after the guard.


active proctype loop()
{
  do
  :: true -> x = 1;
  :: x==1 ->
     progress_2: x = 2;
  :: x==2 ->
     progress_3: x = 3;
  od;
}
		

The progress_3 label is unnecessary since it would only be encountered after the second option is executed. However, it is arguably a good idea not to reason through such arguments, and instead let the tool formally do the reasoning.

After the first execution of the first option, the second one is continously enabled, since x==1. With weak fairness enforced, we are thus guaranteed that x eventually becomes 2.

Exercise 28

Now verify again with weak fairness enforced.

After x becomes 2, both the first and third options are enabled. Weak fairness is not sufficient to guarantee that the third will be ever be picked, as x could alternate between the values 1 and 2 forever. Since it wouldn't have the value 2 continuously, the weak fairness guarantee doesn't apply to the third option.

Exercise 29

Verify this claim. Where are progress labels needed now?

Solution

Only the previous progress_3 label should be used.

Aside:
Strong fairness states that if a statement is enabled infinitely often, it will eventually be executed. That would be sufficient to guarantee the third option is eventually chosen. SPIN does not have a built-in switch for enforcing strong fairness.

Some Unexpected Progress (optional)

Does the following code have a non-progress loop?

 1  int x = 0;
 2  
 3  active proctype A()
 4  {
 5   progressA:
 6    /* Consider: A reaches this line, but never executes it. */
 7    x = 1;
 8  }
 9  
10  active proctype B()
11  {
12    do
13    :: x == 0 -> skip;
14    :: x == 1 -> 
15       progressB: skip;
16    od;
17  }

It does if A never runs, and B runs forever with x being zero. (Of course, this only happens if fairness is not being enforced.) But surprisingly, SPIN does not report any non-progress cycles! It views A as forever sitting in the state progressA, and thus feels that progress is always happening (even though that statement is never executed).

This mismatch isn't due to the fact that SPIN associates labels with the state just before executing the labeled statement; even if they were associated with the statement preceding the label, we could still construct a situation where SPIN views A as making progress even though it is doing nothing at all (just idling in a state tagged as a progress).

The fundamental mismatch is that to you and me, the idea of “making progress” corresponds to making certain transitions. But SPIN views labels only as relating to states. If there are transitions which don't correspond to progress, yet arrive at a labeled-progress-state (in particular: self-loops, possibly introduced for stutter-extension), then the state-based and transition-based approaches differ.

We'll see in the next section a way to still capture our transition-based notion with SPIN. Nevertheless, beware that sometimes a particular tool might have some surprising notions.

Footnotes

  1. If you're truly interested, the code is simple and straightforward enough that it's not too hard to count these traces: 3!9!3!3!3!=61680=1280 3 9 3 3 3 6 1680 1 280 .
  2. As the comments suggest, references to ticket[peer] must be replaced by looping to find the maximum ticket-number among all peers. Moreover, it becomes possible for two processes to grab the same ticket-number in this way. This situation has to be checked for, and some way of breaking ties included (e.g. smallest process ID number (_pid) wins).
  3. Note that the particular value ticket == 1 is important — it corresponds to “I'm still calculating max of others”, and doing so has priority over anybody entering the process.
  4. This is of course not the full Sojourner Rover code, but it is the concurrency-protocol skeleton. The other thousands of lines of code would, of course, be omitted from a Promela prototype.
  5. Insert our usual “nondeterministic, not random” caveat here.

Content actions

Download module as:

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks