Note: Your browser may not currently support MathML. See our browser support page for additional details. You can always view the correct math in the PDF version.
We will model concurrency in two ways. First, we will use Promela, a language with C-like syntax. It is not a fully featured programming language, and is not intended for general computation. Instead, Promela (“PROcess MEta-LAnguage”) programs are intended to be simplifications or models of real-world systems, for use in verification. SPIN (“Simple Promela INterpreter”) is the tool for executing and verifying programs written in Promela. Second, we will use a simple state-based transition system that will help in understanding the specification and verification of Promela programs.
Here, we introduce Promela, SPIN, and the state-based transition system through a series of examples. For the moment, we will use SPIN merely as an interpreter, to run of Promela programs. In the next section, we will introduce the verification features of SPIN. (Reference manuals and download/install instructions are available via the SPIN homepage, spinroot.com.)
We start with a series of examples illustrating race conditions.
1 /* A variable shared between all processes. */
2 show int bal = 0;
3
4 active proctype deposit()
5 {
6 bal++;
7 }
8
9 active proctype withdraw()
10 {
11 bal--;
12 }
We have two threads, one running deposit
and one running withdraw.
The proctype keyword specifies
that the following is code for a thread/process, while the
active keyword specifies that
the thread is started immediately when we start the program.
Variables declared outside the body of a
proctype are shared.
The keyword show before a
variable declaration will direct
SPIN to display the value as it changes.
Here, the two processes of deposit
and withdraw can interleave
arbitrarily. Regardless, with this very simple example,
we will always get the same result balance.
To run the code, we use SPIN. We'll describe how to use the
program xspin, which uses
a graphical interface.
More specifically, these instructions are for UNIX version 4.1.3.
The PC and Mac versions are identical, except for how to start
the program. For details, see the program's
README.
Ask your local system administrator where the program is installed
on your computer.
spin.
It is more difficult to use interactively, but is appropriate
for use non-interactive use, such as with scripts.
For its options, see the manual pages for
spin
and the related
pan.
xspin is just a graphical front-end to
spin.
The underlying spin commands and output
are displayed at
the bottom of the main xspin window.
These can be ignored.
xspin or
spin
from Rice University's Owlnet, first type
setenv PATH /home/comp607/bin:$PATH.
Within SPIN, you'll work with a Promela program. If you already have a Promela program saved, you can open it with the "File" menu's "Open" option. Alternatively, start SPIN with the Promela program's filename:
xspin filename.pml.pml.)
Either of these loads the Promela code into an editor window,
where it can be modified.
To create a new program, you can type into this window, or you can
copy and past it from another editor.
After loading or typing a program, it's always a good idea to check your typing. In the "Run" menu, use "Run Syntax Check". It will report and highlight the first syntax error, if any.
Before the first run, you need to specify some parameters, even if those are the defaults. From the "Run" menu, select the "Set Simulation Parameters" option. Change the "Display Mode" defaults, by making sure that the "MSC" and "Data Values" panels are each selected, as well as each feature under the "Data Values" panel. Use the "Start" button to start the run. This opens three new windows. From the "Simulation Output" window, you can choose to single-step the run or let it finish. For now, choose "Run". The body of this window displays each step taken. Throughout the run, the "Data Values" window displays the values of variables. The "Message Sequence Chart" displays changes to each process and variable. Each column represents a separate process or variable, and the vertical axis represents time. For now, the only changes to processes are when they stop.
Each process gets a unique process identifier number, starting with 0. These are displayed in the output of the various windows. Later, we will see how to access these IDs from within the Promela program.
For additional runs, from the "Run" menu of the main window, you can select "(Re)Run Simulation". However, each run will be identical, because it makes the same choices for interleaving. To vary runs, or traces, change the "Seed Value" in the simulation options. Also, be sure to try the "Interactive" simulation style, where you make each choice in the interleaving.
xspin run opens several new windows,
without closing the old ones.
The easiest way to close the windows for a run is to use the "Cancel"
button in the "Simulation Output" window.
spin than xspin,
since you don't need to set
any of the simulation parameters:
prompt> spin filename.pml
How many traces are possible for the
previous example's code?
Does each end with the same value for
bal?
|
Did you see each one using SPIN?
Each trace ends with bal
equal to 0, as expected.
show
introduces transitions and states to display their
changing values. We have omitted these.
All the traces of a Promela program start at the same initial state. Any variables not explicitly initialized by the code are set to particular known values: zero, for numbers; and false, for booleans. (Compare to a C program, where two traces of the same program might each start from a different state.) And we've just seen that different traces may end in different states, or they may sometimes end in the same state while having taken different paths to reach that end state. Finally, if we look closely at all the possible traces, we see that often, several different traces might happen to pass through the same intermediate state. By combining the traces at these common states, we can depict the entire state space — a map of all the conceivable executions of our program!
![]() |
The above is the state space of the Tiny First Program's code. The states are shown with the value of the only variable's value, plus the program point of each process. The edges are shown only as arrows, as the change in line numbers indicates which statement's and thread's execution each represents.
The possible traces are then all the different paths in the state space. State spaces are primarily helpful for understanding how to verify concurrent programs, and will be used mostly in the next section.
xspin
by the "Run" menu's option to
"View Spin Automaton for each Proctype".
Also, the state numbering of this per-process form is reported
by SPIN's output of traces.
There is a close relation between the two forms, as
the whole system is intuitively some sort of
cross-product
of the individual processes.
We now modify the previous example. The two assignment statements have each been broken up into three statements. These three simulate a typical sequence of machine-level instructions in a load/store style architecture. This is a way to model that level of atomicity within Promela.
1 /* A variable shared between all processes. */
2 show int bal = 0;
3
4 active proctype deposit()
5 {
6 show int new_bal;
7
8 new_bal = bal;
9 new_bal++;
10 bal = new_bal;
11 }
12
13 active proctype withdraw()
14 {
15 show int new_bal;
16
17 new_bal = bal;
18 new_bal--;
19 bal = new_bal;
20 }
|
Here, each process has its own local variable named
new_bal.
Note that we can also have SPIN
show local variables in the
"Message Sequence Chart" window.
Run this code as described in the previous example.
How many possible traces does this code allow? (Try it in SPIN!)
This code has 70 possible traces — many more than the previous version. Even with such a small program, it is too cumbersome to try to list them all. This is a fundamental reason why it is so hard for humans to debug a concurrent program, since it involves understanding all the possible ways threads can interact. Once we start using SPIN as a verifier, letting it enumerate these as a part of verification will be one of the benefits of automation. Later, we will verify this specific example.
More importantly, what is the result of the computation,
i.e., the
value of bal at the end?
It can be 0, as expected, but it can also be 1 or -1! These new possibilities are the result of traces such as the following:
|
In particular, after one process has updated and printed the global variable, the other process can update it based upon the original un-updated value.
Each of the previous two exercises could be very easily answered, if you were given the program's state space. Draw the state space.
![]() |
Whew! See how much larger this is than the state space of simpler version of this program.
Making the atomicity more fine-grained increases the possibility of concurrency problems. We will use this idea in many of the examples in this module, because we want to ensure that we can handle even the hard problems. The particular code sequence used in the previous example is also one of the shortest with a race condition.
In Promela, initialization of a variable and assigning to a variable have subtly different semantics. This is illustrated by this variation on the previous example.
1 /* A variable shared between all processes. */
2 show int bal = 0;
3
4 active proctype deposit()
5 {
6 show int new_bal = bal;
7
8 new_bal++;
9 bal = new_bal;
10 }
11
12 active proctype withdraw()
13 {
14 show int new_bal = bal;
15
16 new_bal--;
17 bal = new_bal;
18 }
In Promela, initializing variables is part of starting a process.
Thus, all variables of all active processes are initialized
(defaulting to 0) before any other statements are executed.
Here, each copy of new_bal starts
with the value 0, so bal always
ends with either the value 1 or -1, but never 0.
![]() |
The previous examples modeled two people simultaneously updating the
same bank account, one making a deposit, and one a withdrawal.
What if we had even more simultaneous updates?
Let's focus only on deposits, so each is doing the same action.
Rather than duplicating code with multiple
proctype's (say,
deposit1,
deposit2, and
deposit3),
Promela allows us to
abstract that, as in the following:
1 /* Number of copies of the process to run. */
2 #define NUM_PROCS 3
3
4 /* A variable shared between all processes. */
5 show int z = 0;
6
7 active[NUM_PROCS] proctype increment()
8 {
9 show int new_z;
10
11 new_z = z;
12 new_z++;
13 z = new_z;
14 }
We've abstracted the variables' names, but otherwise,
increment is the same as the
previous deposit.
The first substantial change is that we are starting three processes,
rather than two, each with a copy of the same code.
The bracket notation in the
proctype declaration indicates
how many copies of that code are to be used.
The number of copies must be given by a constant,
and it is good practice to name that constant as shown.
Although vastly simplified, this is also the core of our
web-based flight reservation system example.
The shared variable z
would represent the number of booked seats.
We can also ask questions like
“What is the probability that we end
with z having the value 3?”
How would you solve this?
Divide the number of traces meeting this condition by the total number of conceivable traces. 1
However, this assumes that each possible trace is equally likely — a dangerous assumption! In real life, with a particular operating system and hardware, some traces might be much more common than others. For example, it might be very unlikely (but possible) that two context switches would occur only a small number of instructions apart. So, a more realistic answer would be based upon a statistical analysis of repeated runs.
But an even more preferable answer is to say that this is a trick question! Rather than thinking of our concurrent system as being probabilistic, it is often better to stop short and merely call it non-deterministic. For example, if processes are distributed across the internet, then which trace occurs might depend on network traffic, which in turn might depend on such vagaries as the day's headlines, or how many internet routers are experimenting with a new algorithm, or whether somebody just stumbled over an ethernet cord somewhere. In practice, we can't come up with any convincing distribution on how jobs get scheduled. Moreover, sometimes we want to cover adversarial schedules (such as guarantees of how our web server protocol acts during denial-of-service attack).
Guarded statements
are one of the most basic elements of the Promela language.
As this section will introduce, they introduce the ability to
synchronize processes.
In the next section,
we see how they are also fundamental parts of the
looping (do) and
conditional (if) constructs.
After that,
we will return to the synchronization issues, exploring them
in greater depth.
guard -> body,
consists of two parts.
The guard is a boolean expression.
The body is a statement.
Execution of the body will block,
or wait, doing nothing, until the guard becomes true.
1 bool ready = false;
2 int val;
3
4 active proctype produce_one()
5 {
6 /* Compute some information somehow. */
7 val = 42;
8
9 ready = true;
10
11 /* Can do something else. */
12 printf("produce_one done.\n");
13 }
14
15 active proctype consume_one()
16 {
17 int v;
18
19 /* Wait until the information has been computed. */
20 ready ->
21
22 /* Now, we can use this. */
23 v = val;
24 }
This program leads to the following state space,
illustrating two important traits about Promela's guards.
First, the guarded statement of
consume_one will block
— sit and do nothing — until
its guard is satisfied.
This is seen by the behavior of the first three states in
the state space.
Second, execution of the guard is itself a transition to
a state. This allows code, such as
produce_one's
printf to be interleaved
between the guard and its body.
(The otherwise unnecessary printf
is there solely to make this point.)
When such interruptions between the guard and body are
undesired, they can be prevented, as
shown later.
![]() |
-> is just
alternate syntax for ;, and
an expression is a valid statement by itself.
Thus, we more
properly define when each kind of Promela statement is enabled.
In particular, boolean expressions are enabled when they evaluate
to true, assignment statements are always enabled, and a compound
statement
stmt_1 ;
… ;
stmt_n
true -> stmt
is equivalent to the simpler
stmt.
Later we will explore the possibility that a process stays blocked forever, or equivalently, never becomes enabled.
Of course, few interesting programs are made simply of straight-line code. We generally need to have more interesting control flow, including conditionals and iteration. These are illustrated in the remaining examples of this section.
The following code illustrates use of
the do statement, the
primary iteration statement.
1 #define STARTING_DISTANCE 2
2 show int dist = STARTING_DISTANCE;
3
4 active proctype stumble()
5 {
6 do
7 :: true -> dist--; /* Approach. */
8 :: true -> dist++; /* Retreat. */
9 :: dist <= 0 -> break; /* Success! */
10 :: dist >= STARTING_DISTANCE*2 -> break; /* Give up. */
11 od;
12 }
This process consists solely of a loop from
do to the matching
od.
The loop contains four guarded statements as alternatives,
each prefaced with double colons (::).
On each iteration, all guards are evaluated. From those evaluating
to true,
only one is chosen arbitrarily, and its body executed.
(What if none are true?
We'll see soon.)
Executing a break terminates the loop.
More generally, with nested loops,
it terminates the most tightly enclosing loop.
A loop in the control flow leads to cycles in the state space.
![]() |
int range is finite:
short type, instead.
With a smaller range,
Run this code several times, observing the changing value of
dist.
As before, to see different behaviors, you'll need to change the
"Seed Value" or use the "Interactive" simulation.
Is it equivalent to change the inequalities
(<= and >=) to
equalities (==) in this example?
Why or why not?
No. For example, the first clause could be chosen until
dist
was negative. Unlike in the original code given, the
successful termination clause could not
be chosen until dist
is incremented to zero again.
Try it in SPIN, and look for such a trace.
The following code is a variation on the previous, illustrating
the if statement and the
else keyword.
1 #define STARTING_DISTANCE 2
2 show int dist = STARTING_DISTANCE;
3
4 active proctype stumble()
5 {
6 do
7 :: (!(dist <= 0) &&
8 !(dist >= STARTING_DISTANCE*2)) ->
9 if
10 :: true -> dist--; /* Approach. */
11 :: true -> dist++; /* Retreat. */
12 fi;
13 :: else -> break;
14 od;
15 }
First, consider the do loop.
It has two guarded expressions,
one with the guard else.
On each iteration, if there are any
enabled clauses, one is chosen arbitrarily, as before.
But, if none are enabled the else
clause, if any, is chosen.
There may be only one else clause.
An else clause is not the same
as one with a true guard,
as it can only be executed if no other clause is enabled.
Here, this means that the first clause of the
do, i.e.,
the entire if statement,
is used whenever the distance is within
the specified range. Otherwise, the loop is exited.
Thus, unlike the
previous example,
the loop is always
exited once the distance is 0 or
STARTING_DISTANCE*2.
Inside this loop, we have an if
statement, which itself has
two clauses. Like the do,
it chooses arbitrarily among the
enabled clauses, and may have an else
clause.
Run this code several times and observe its behavior.
What is the state space for this code?
![]() |
As desired, the program will always exit if
dist reaches either of its
limit values.
Is it equivalent to change the <=
and >= to
== in this example?
Why or why not?
Yes. As mentioned, because of the
else clause, the loop
will terminate once the distance is out of range. Since the
distance is only ever changed by 1 on an iteration, it is
sufficient to check just the boundary conditions of the range.
While equivalent, it is probably a not good idea to make this change, since its correctness is dependent on other factors.
What is Promela's equivalent of the following C conditional statement?
if (condition)
then_stmt;
else
else_stmt;
if
:: condition -> then_stmt;
:: else -> else_stmt;
fi;
What is Promela's equivalent of the following C loop statement?
while (condition)
body_stmt;
do
:: condition -> body_stmt;
:: else -> break;
do;
The previous examples illustrate what happens where there is a choice among guarded statements and at least one is enabled. What if none are enabled? In that case, that process is blocked, and it halts execution until it is no longer blocked.
Create a Promela program illustrating synchronization and looping in the following way. One process counts down a global variable, until it reaches zero. Another process waits for the count to be zero to print a message. Run your program in "Interactive" mode and confirm that the second process is blocked, and thus does nothing, until the count reaches zero.
show int count = 5;
active proctype count_down()
{
do
:: count > 0 -> count--;
:: else -> break;
od;
}
active proctype wait()
{
count == 0 ->
printf("I'm done waiting!\n");
}
We have so far seen very simple interactions between processes, using shared variables for communication and guarded statements for synchronization. This sections delves further into the possible interactions and introduces some additional Promela syntax for that.
While blocking is an essential tool in designing concurrent programs, since it allows process to synchronize, it also introduces the possibility of deadlock, when all processes are blocked. Since deadlock is a common problem in concurrent programs, we will later see how to verify whether a program can deadlock.
Create a small Promela program that can deadlock. The program need not do anything computationally interesting. Run your program to see that it can get stuck.
There are many possible programs, but the following is a simple example.
show int x = 1;
active[2] proctype wait()
{
x == 0 ->
printf("How on earth did x become 0?!?\n");
}
One standard concurrent programming technique that uses guarded statements is locking. While details of its use are beyond the scope of this module, we'll provide a quick review. Locks are used when there is some resource or critical section of code that we want to control access to. For example, only one process at a time should be able to write data to a specific file, lest the file get intermingled inconsistently. To ensure this, the process must acquire the lock associated with writing this file before it can do the operation, and then must release it.
We can modify the
Generalized Race Condition example
to eliminate its race condition.
The lock controls access to the shared resource,
i.e., the shared
variable z.
1 /* Number of copies of the process to run. */
2 #define NUM_PROCS 3
3
4 show int z = 0;
5 show bool locked = false;
6
7 active[NUM_PROCS] proctype increment()
8 {
9 show int new_z;
10
11 atomic {
12 !locked -> /* Lock available? */
13 locked = true; /* Acquire lock. */
14 }
15
16 /* Critical section of code. */
17 new_z = z;
18 new_z++;
19 z = new_z;
20
21 locked = false; /* Release lock. */
22 /* End critical section of code. */
23 }
As described before, when using a lock, we need to acquire the lock, execute the critical section, and release the lock. Releasing the lock is the final action of the critical section, as it allows another process to acquire the lock. To acquire the lock, clearly we need to check whether it is available. This guard is where each process potentially blocks, waiting for its turn to get the block.
The atomic keyword,
as its name implies, makes a group of statements atomic.
Without it, two processes
could first each determine the lock was available, then each
acquire the lock, defeating the purpose of the lock.
What values of z
are possible at the end of this program?
As desired, only the value 3, since there are 3 processes.
Run this code several times and observe its behavior.
Also, eliminate the atomic
keyword and its associated braces, and repeat.
Assume that the atomic keyword
is removed, and there are only two processes, instead of three.
Give a trace where both
are in the critical section at the same time.
What are the differences in the state spaces with and without
the atomic?
The following figure shows one such trace.
Notice that z
is effectively incremented only once, instead of twice.
![]() |
The critical difference in state spaces is that
the atomic keyword prohibits
processes interleaving between the guard and its body.
This reduces the size of the state space.
In this case, as in many others, it also reduces the number
of different end states and, thus, possible result values.
We now have the tools to demonstrate how a tiny server and its clients can be modeled. The server process is a loop that repeatedly waits for and then processes requests. Clients send requests. For simplicity, in this example, the server won't communicate any results back to the clients, and the clients will only make one request each.
Servers generally allow multiple kinds of requests. Here, we'll
use one more new piece of Promela syntax,
mtype, to define some named
constants, similar to an enumerated type.
/* Number of each type of client. */
#define NUM_CLIENTS 1
/* Define the named constants. */
mtype = {NONE, REQUEST1, REQUEST2};
/* Declare a shared variable. */
show mtype request = NONE;
active proctype server()
{
do
:: request == REQUEST1 ->
printf("Processing request type 1.\n");
request = NONE;
:: request == REQUEST2 ->
printf("Processing request type 2.\n");
request = NONE;
od;
}
active[NUM_CLIENTS] proctype client1()
{
atomic{
request == NONE ->
request = REQUEST1;
}
}
active[NUM_CLIENTS] proctype client2()
{
atomic{
request == NONE ->
request = REQUEST2;
}
}
Function calls are another standard kind of control flow.
Surprisingly,
Promela does not have function calls!
Every proctype instance is a separate process.
Though if you really want, you could simulate a function call by
creating
a new process dynamically,
and blocking until that process returns.
So far, we have determined the possible behaviors of a program simply by running the program a bunch of times. For small programs, we can be very careful and make sure we exhibit all the possible traces, but the state space soon becomes unwieldy.
The real power of SPIN is as a tool for verification, our original goal. SPIN will search the entire state space for us, looking for (reachable) states which fail to have desired properties.
The first verification technique we'll examine are
assertions, common to many programming languages.
In Promela, the statement
assert(condition);
evaluates the condition. If the result is true, execution continues
as usual. Otherwise, the entire program
is aborted and an error message is printed.
When simulating a single run of the program, SPIN automatically checks these run-time assertions; this is the usage that most programmers should be familiar with from traditional programming languages. But additionally, we'll see that SPIN, in the course of searching the entire state space, verifies whether an assertion can ever fail! (Though of course it can only search finite, feasible state spaces; happily, “feasible” can often include hundreds of millions of states.)
Consider
our last race condition example.
One of our original naïve expectations was that,
within each process,
the value of z at the end
of the process is exactly one more than at the beginning.
The previous examples have shown that to be wrong, but we had to
run the program until we encountered a run when it failed.
Here, the assert
statement puts that expectation explicitly
into the program, for SPIN to check.
1 #define NUM_PROCS 3
2
3 show int z = 0;
4
5 active[NUM_PROCS] proctype increment()
6 {
7 show int new_z;
8
9 /* A saved copy of the old z, for the assertion. */
10 show int old_z;
11
12 old_z = z;
13 new_z = old_z + 1;
14 z = new_z;
15
16 assert(z == old_z+1);
17 }
It is often the case, as it is here, that to state the desired
condition we need to add an extra variable —
here, old_z.
As always, it is important that when introducing such code for
testing that you don't substantially change the code to be tested,
lest you inadvertently introduce new bugs!
Run this code several times, and observe when the assertion
fails.
This text indicates which assertion failed, and the line will be
highlighted in the code window. To see which process' copy of
increment failed and why,
you have to look more closely at the steps shown.
spin: line 16 "pan_in", Error: assertion violated
spin: text of failed assertion: assert((z==(old_z+1)))
Certainly this run-time check is helpful, but what about the promise of checking whether an assertion can fail? We will need to use SPIN as a verifier, rather than just a simulator.
As with simulation, before our first verification, we need to specify some parameters, even if only using the defaults. From the "Run" menu, select the "Set Verification Parameters" options. For this example, make sure that the "Assertions" option is marked. Use the "Run" button to start the verification.
spin by hand involves three steps:
we must first create a verifier (a C program) from our promela source;
then compile the verifier, and then run the verifier:
prompt> spin -a filename.pml
prompt> gcc pan.c -o filename
prompt> filename
-a flag tells SPIN to generate
a verifier, which it always names pan.c.
As desired, the verification finds a problem, as reported in the "Verification Output" window. It also saves a description of the example run found where the assertion fails.
pan: assertion violated (z==(old_z+1)) (at depth 11)
pan: wrote pan_in.trail
pan is the standard SPIN
default name for files related to Promela verification.
With xspin, you'll see these
filenames reported, but you don't need to use or remember them.
Now we can have SPIN show us the sample failed run it found. In the new "Suggested Action" window, select "Run Guided Simulation". This brings up the now-familiar simulation windows, which will show this particular run. This run is "guided" by a trace saved by the verification. (Alternatively, selecting "Setup Guided Simulation" allows you to change the display options first.)
-t
flag to have SPIN use the pre-named trail file.
It's often handy to use the -p flag to
have SPIN print each line it executes:
prompt> spin -t -p filename.pml
The trace found in our verification executed some of each thread. From our experience with this program, we know there are less complicated traces where the assertion fails. How can we find them?
Again, in the "Run" menu, select "Set Verification Parameters". Now, select "Set Advanced Options", then "Find Shortest Trail", and "Set". Then run the verification again.
pan: assertion violated (z==(old_z+1)) (at depth 11)
pan: wrote pan_in.trail
pan: reducing search depth to 10
pan: wrote pan_in.trail
pan: reducing search depth to 5
|
Previously,
we claimed that adding a lock fixed the race
condition of the previous example. Let's now verify that claim.
The following is the locking version of the code augmented with
the same assertion code just used.
Note that the assertion needs to be inside the critical section.
(Otherwise, process B could execute its critical section, changing
z,
in between process A's critical section and its assertion.)
/* Number of copies of process to run. */
#define NUM_PROCS 3
show int z = 0;
show bool locked = false;
active[NUM_PROCS] proctype increment()
{
show int new_z;
/* A saved copy of the old z, for the assertion. */
show int old_z;
atomic {
!locked -> /* Lock available? */
locked = true; /* Acquire lock. */
}
/* Critical section of code. */
old_z = z;
new_z = old_z + 1;
z = new_z;
assert(z == old_z+1);
locked = false; /* Release lock. */
/* End critical section of code. */
}
Run the verification as before. As expected, no errors are found. Happily, this time no error messages are reported before the version information in the "Verification Output" window, and SPIN doesn't suggest any guided simulation.
A previous exercise asked you to write a small deadlocking program, and provided one solution. Running such a program in SPIN's simulation mode, some execution paths result in deadlock, i.e., every single process waiting on other processes to do something first. How can we verify whether or not deadlock is possible? SPIN checks for deadlock automatically, as one of a hand-full of built-in checks, so we will see how that is reported.
Run SPIN's verifier on your tiny deadlocking program. The deadlock is reported by a somewhat obscure error message:
pan: invalid end state (at depth 0)
pan: wrote pan_in.trail
xspin allows
you to easily run the guided simulation it found.
Deadlock occurs in a state when nothing further will happen—that is, there are no outgoing edges in the state space. We call this an end state. Most programs we've seen previously have a valid end state: all processes complete their last line. In contrast, in deadlock we are in an end state, yet at least one process hasn't completed. Hence “invalid”. SPIN checks for end states by searching the entire state space (either by a depth- or breadth-first search).
We've mentioned that valid end states are those where every process has completed its last line, and that deadlock ends with at least one process which hasn't completed. However, that's not enough to conclude deadlock: sometimes it is desirable to have a trace reach an end-state with a process still (say) waiting for more input. Consider a server and its clients. By default, the server waits and does nothing. Only when a client makes a request, the server responds and does something. If all clients exit, it is expected, not an error, for the server to be blocked waiting for another request. In SPIN, we need to notate that it is valid for the server to end at that point, and not actually deadlock.
If we run SPIN's verifier on the code for the Tiny Server and Clients example, we get a spurious “invalid end state” error, since we expect the client to be alive and listening for for any more requests.
To let SPIN know that is acceptable and shouldn't actually
be considered deadlock, we label the beginning of
the server loop as being a valid end state:
We give it a label whose first three
letters are “end”, as shown below.
Run the verifier again on this, and there should
be no error message.
1 /* Number of each type of client. */
2 #define NUM_CLIENTS 1
3
4 /* Define the named constants. */
5 mtype = {NONE, REQ1, REQ2};
6
7 /* Declare a shared variable. */
8 show mtype request = NONE;
9
10 active proctype server()
11 {
12 /* Waiting for a request is a valid place to end. */
13 endwait:
14 do
15 :: request == REQ1 ->
16 printf("Processing request type 1.\n");
17 request = NONE;
18 :: request == REQ2 ->
19 printf("Processing request type 2.\n");
20 request = NONE;
21 od;
22 }
23
24 active[NUM_CLIENTS] proctype client1()
25 {
26 atomic {
27 request == NONE ->
28 request = REQ1;
29 }
30 }
31
32 active[NUM_CLIENTS] proctype client2()
33 {
34 atomic {
35 request == NONE ->
36 request = REQ2;
37 }
38 }
More formally, what this does is to tag some of the states in the state space as being valid, if they happen to be an end-state.
![]() |
Putting the end label in front of the entire
do statement may not seem as natural to you as:
do
::
end1:
request == REQ1 ->
…
::
end2:
request == REQ2 ->
…
od;
end1 and
end2
both represent the same program point, where the program
waits for some guard to become true.
It would be nonsensical to have (say) one of the two guards
labeled an end-state without the other one being an end-state.
To prevent surprising inconsistencies, Promela disallows
labels in front of individual guards of a compound statement.
There are other syntactic restrictions of where labels can appear. The most commonly encountered is that they cannot appear at the end of a statement block. For example, instead of
{
x = 1;
y = 2;
label: /* Label not allowed here. */
}
{
x = 1;
y = 2;
label:
skip; /* A statement that does nothing. */
}
It's worth showing several examples of correct and incorrect concurrent programs, and how SPIN can implement and attempt to verify them. We'll examine a sequence of programs, all dealing with mutual exclusion protocols. We hope to leave the gentle reader with an appreciation of the non-obvious nature of concurrent bugs (and hence the value of automated verification).
To get started, we revisit the fundamental race condition, phrased in terms of two processes simultaneously each being in their critical section.
1 /* An INCORRECT attempt at Mutual exclusion.
2 * Adapted from Ruys 2002: SPIN Beginner's Tutorial.
3 */
4
5 bool flag = false; /* Is some process in its crit section? */
6 int data = 0;
7
8 active[2] proctype proc()
9 {
10 do
11 :: flag == false -> /* Wait until nobody in crit section. */
12 flag = true;
13
14 /* Entering critical section. */
15 data++;
16 assert(data == 1);
17 data--;
18 /* Leaving critical section. */
19
20 flag = false;
21 od;
22 }
Take a moment to run SPIN's verifier, and confirm that
the assert statement indeed might be violated.
Here is another attempt which doesn't work. Can you see how it might go awry?
1 /* An INCORRECT attempt at Mutual exclusion.
2 * Version 2.
3 * Adapted from Ruys 2002: SPIN Beginner's Tutorial.
4 */
5
6 bool flagA = false; /* A wants to enter its crit section. */
7 bool flagB = false; /* B, similarly. */
8 int data = 0;
9
10 active proctype A()
11 {
12 do
13 :: flagA = true; /* Declare intent to enter crit section. */
14 flagB == false -> /* Wait for B to leave their crit section, if in. */
15
16 /* Entering critical section. */
17 data++;
18 assert(data == 1);
19 data--;
20 /* Leaving critical section. */
21
22 flagA = false;
23 od;
24 }
25
26
27 active proctype B()
28 {
29 do
30 :: flagB = true; /* Declare intent to enter crit section. */
31 flagA == false -> /* Wait for A to leave their crit section, if in. */
32
33 /* Entering critical section. */
34 data++;
35 assert(data == 1);
36 data--;
37 /* Leaving critical section. */
38
39 flagB = false;
40 od;
41 }
Again, use SPIN to verify that the assertion can be violated.
(Keep in mind that we can't use
atomic
(or d_step)
here; the entire point of a mutex algorithm
is to be able to implement multi-statement atomicity
using nothing but shared memory
and the atomicity of simple-assignment.)
By being substantially more clever though, interacting flags can be used to get two processes to share a critical section:
1 /* A correct attempt at Mutual exclusion, huzzah!
2 * Peterson, 1981.
3 */
4
5 bool flagA = false; /* A wants to enter its crit section? */
6 bool flagB = false; /* B, similarly. */
7 pid turn; /* For politeness, offer other people a turn. */
8 int data = 0;
9
10
11 active proctype A()
12 {
13 do
14 :: flagA = true; /* Declare intent to enter crit section. */
15 turn = 1 - _pid; /* Offer the turn to the other. */
16
17 ((flagB == false) || (turn == _pid)) ->
18
19 /* Entering critical section. */
20 data++;
21 assert(data == 1);
22 data--;
23 /* Leaving critical section. */
24
25 flagA = false;
26 od;
27 }
28
29 active proctype B()
30 {
31 do
32 :: flagB = true; /* Declare intent to enter crit section. */
33 turn = 1 - _pid; /* Offer the turn to the other. */
34
35 ((flagA == false) || (turn == _pid)) ->
36
37 /* Entering critical section. */
38 data++;
39 assert(data == 1);
40 data--;
41 /* Leaving critical section. */
42
43 flagB = false;
44 od;
45 }A and B are using their
flags in symmetric ways, and we can certainly factor
their common code to get two instances of the same proctype:
1 /* A correct attempt at Mutual exclusion, huzzah!
2 * Peterson, 1981.
3 */
4
5 bool flag[2] = false; /* Process #i wants to enter its crit. section? */
6 pid turn; /* For politeness, offer other people a turn. */
7 int data = 0;
8
9 active[2] proctype P()
10 {
11 pid me, peer;
12
13 me = _pid;
14 peer = 1 - _pid;
15
16 do
17 :: flag[me] = true; /* Declare intent to enter crit section. */
18 turn = peer; /* Politely give others a chance, first. */
19
20 ((flag[peer] == false) || (turn == me)) ->
21
22 /* Entering critical section. */
23 data++;
24 assert(data == 1);
25 data--;
26 /* Leaving critical section. */
27
28 flag[me] = false;
29 od;
30 }A different algorithm for mutual exclusion, the Bakery algorithm, can be thought of as having a roll of sequentially-numbered tickets; when a process wants access to the critical section (the bakery clerk), they take a number, and then look around to be sure they have the lowest ticket number of anybody, before striding up to the counter. The tricky bit is that “taking a number” means both copying the value of a global counter, and updating that counter, in the presence of other processes who might be doing so simultaneously. It seems like it's begging the question, to even do that much!
Promela code for a two-process implementaton of the Bakery algorithm
is below; it can be generalized to
1 /* A correct attempt at Mutual exclusion.
2 * Lamport's "Bakery algorithm"
3 * Procs which communicate only through ticket[].
4 */
5
6 byte ticket[2]; /* `byte' ranges from 0..255. Initialized to 0. */
7 int data = 0;
8
9 active [2] proctype customer()
10 {
11 pid me = _pid;
12 pid peer = 1-_pid; /* Special case for 2 procs only. */
13
14 skip; /* Doing some non-critical work... */
15
16 ticket[me] = 1; /* Declare desire to enter crit section. */
17 ticket[me] = ticket[peer]+1; /* In general: max(all-your-peers)+1. */
18 ((ticket[peer] == 0) || (ticket[me] < ticket[peer])) ->
19
20 /* Entering critical section. */
21 data++;
22 assert(data == 1);
23 data--;
24 /* Leaving critical section. */
25
26 ticket[me] = 0; /* Relinquish our ticket-number. */
27 }
28
29
30 /* The meaning of ticket[], for each process:
31 * ticket[me]==0 means I am not contending for crit section.
32 * ticket[me]==1 means I am in the process of taking a ticket
33 * (I am looking at all other ticket holders)
34 * ticket[me] > 1 means I have received a ticket and will
35 * enter the critical section when my turn comes up.
36 *
37 * A problem still present, when trying to generalize to > 2 procs:
38 * Two procs may choose the same ticket-number;
39 * when nobody has a smaller ticket than you, check them all for equal tix;
40 * must break any ties reasonably (eg by smaller _pid).
41 *
42 * It's a bit sneaky, that 0 and 1 have special meanings
43 * (and the fact that 1 is less than any other "valid" ticket-number
44 * is important -- while anybody is calculating their ticket,
45 * nobody else will enter the critical section.
46 * This inefficiency can be avoided by being cleverer.)
47 */
You might have noticed that this example (unlike earlier ones)
only has each process perform their critical section once.
No problem, we can put that in a do loop:
1 /* An overflow bug sneaks into our Mutual exclusion, uh-oh!
2 * Lamport's "Bakery algorithm"
3 * Procs which communicate only through ticket[].
4 */
5
6 byte ticket[2];
7 int data = 0;
8
9 active [2] proctype customer()
10 {
11 pid me = _pid;
12 pid peer = 1-_pid;
13
14 do
15 :: skip; /* Doing some non-critical work... */
16
17 ticket[me] = 1; /* Declare desire to enter crit section. */
18 ticket[me] = ticket[peer]+1;
19 ((ticket[peer] == 0) || (ticket[me] < ticket[peer])) ->
20
21 /* Entering critical section. */
22 data++;
23 assert (data == 1);
24 data--;
25 /* Leaving critical section. */
26
27 ticket[me] = 0; /* Relinquish our ticket-number. */
28 od;
29 }
spin: line 17 "mutexG2.pml", Error: value (256->0 (8)) truncated in assignment
ticket, a byte,
is overflowing!
ticket can keep incrementing until overflow.
ticket
from byte
to int changes nothing conceptually,
it does change SPIN's verification.
Re-run the verification with this change.
What is the difference?
ticket=1 and enter
its critical section.
While there, P2 takes ticket=2,
blocking until P1 is done.
Then P2 enters its critical section, but before
P2 finishes P1 again requests its critical section,
taking ticket=3.
This process repeats—each process requesting
the critical
section (and bumping up the ticket number)
before its peer finishes.
ticket number
(intentionally) drops back to zero.
byte:
there is no conceptual difference in the verification errors
between using byte and int,
but by keeping the state space from being extraneously large,
we get better insight to what the difficulty was.
(Imagine if we'd used int first — we might
have seen the state-space error, presumed that other parts of
our program made verification infeasible, crossed our fingers
and given up.)
The moral of the story is that concurrent protocols can difficult and subtle. The SPIN book puts it well: “The number of incorrect mutual exclusion algorithms that have been dreamt up over the years, often supported by long and persuasive correctness arguments, is considerably larger than the number of correct ones. [One exercise], for instance, shows a version, converted into Promela, that was recommended by a major computer manufacturer in the not too distant past.”
The following are some other ideas, each of these can form the kernel of a successful mutex algorithm.
Our first Promela examples were drastically simplified models of banking, with deposits and withdrawals. Now, let's consider a production-quality system of bank accounts. Its code base could easily entail millions of lines of code.
Before we start porting the entire system to Promela, let's consider what it is we want to use SPIN to verify. Ideally, we want to verify everything, including ensuring that each account balance is correct. But, think about that in conjunction with SPIN's approach using state spaces.
Even with only one bank account, if we model balances accurately,
we need an infinite number of states —
one for each possible balance.
Similarly, the production system likely has no bound
on the number of possible accounts.
SPIN would have a wee bit of difficulty searching the
entire state space in finite time.
To make verification feasible, all SPIN models are required
to be finite.
(All data types, including int,
have a finite range.)
How could we restrict the Promela program to guarantee
finiteness?
The most obvious and least restrictive option is to simply use
the int type
and ensure that any examples used should not cause overflow.
To make verification efficient, the state space should be
relatively small. Even using ints
as bank balances and
account numbers, assuming only 1000 lines of code and ignoring
any other program variables, we still have
Let's abandon the goal having our Promela prototype track the particular balances accurately. Since our focus is on concurrency, we mainly want to ensure that when there are multiple simultaneous transactions, we don't lose or mix up any of the data. We might also want to verify the security protocols used in selecting appropriate accounts. But, we no longer need to even keep track of balances. Furthermore, it is highly unlikely that the production code might have errors that only occur when there are a large number of accounts, so we can comfortably use just a few.
Since we have not fully specified this example's original system, we will leave the details to the reader's imagination.
In 1997, the Mars Pathfinder seemed to beam its data back to Earth just fine, except— sometimes it would appear to freeze up. The “fix”? Press Control-Alt-Delete from Earth, and re-boot its rover (incurring not only an expensive delay for Mission Control, but the full attention of late-night-TV comedians).
The cause turned out to be a bug in the concurrency protocol, between a low-priority data-gathering mode, which was meant to yield to a high-priority data-sending mode. However, sometimes these priorities would become inverted, with the high-priority thread blocking on the (supposedly) low-priority one. (More detail.) The bug 4 can be realized in Promela:
1 /* From Spin primer/manual, Gerard Holzman. */
2
3 mtype = { free, busy }; /* states for the variable "mutex" */
4 mtype = { idle, waiting, running } /* states for the processes */
5
6 mtype highProc = idle;
7 mtype lowProc = idle;
8 mtype mutex = free;
9
10 active proctype high_priority()
11 {
12 end:
13 do
14 :: highProc = waiting;
15 atomic {
16 mutex == free ->
17 mutex = busy
18 };
19 highProc = running;
20 /* work work work; produce data */
21 atomic {
22 highProc = idle;
23 mutex = free
24 }
25 od
26 }
27
28 active proctype low_priority() provided (highProc == idle)
29 {
30 end:
31 do
32 :: lowProc = waiting;
33 atomic {
34 mutex == free ->
35 mutex = busy
36 };
37 lowProc = running;
38 /* work work work; consume data and send it back to Earth. */
39 atomic {
40 lowProc = idle;
41 mutex = free
42 }
43 od
44 }
45
46
47 /* Note the "provided" clause as part of low_priority()'s declaration.
48 * This condition is monitered and confirmed before every single
49 * transition that low_priority() attempts.
50 * (This is a handier syntax than placing that guard condition
51 * in front of every single statement within the process.)
52 */Can you detect the problem using SPIN?
As described previously, livelock is very similar to deadlock. But in livelock, computation doesn't get completely stuck — it simply doesn't do anything sufficiently interesting.
In previous examples, processes waited with a guarded statement. Here, as an example of busy waiting, we repeatedly check for the desired condition, and otherwise do something. Busy waiting for a condition that never comes true is one common situation that leads to livelock.
1 show int x = 1;
2
3 active[2] proctype busywait()
4 {
5 do
6 :: x != 0 ->
7 printf( "%d waiting.\n", _pid );
8 :: else ->
9 break;
10 od;
11
12 printf( "Now this is more interesting.\n" );
13 }
_pid, a read-only variable
whose value is the process ID number (0-based) of the given process.
Now we want to verify this code. From the "Run" menu, select "Set Verification Parameters" and then turn on "Liveness" checking and "Non-Progress Cycles".
-a to have SPIN generate a verifier.
Then, when compiling pan.c,
we must include a flag to check for non-progress, -DNP.
Finally, when running the resulting executable, we must also
specify loop-checking via the flag -l.
prompt> spin -a filename.pml
prompt> gcc -DNP pan.c -o filename
prompt> filename -l
xspin's interface!)
pan: non-progress cycle (at depth 6)
pan: wrote pan_in.trail
Now run the guided simulation. In the "Simulation Output" window, the start of the cycle is so labeled, and the end of the cycle is the last step displayed. Pictorially, the trace is as follows.
![]() |
Whereas livelock is when computation continues but no process makes actual progress, starvation is when computation continues, and some processes make progress but others don't.
Consider the following code. (Clearly, it does not compute anything interesting.)
1 show int x = 0;
2
3 active proctype A()
4 {
5 do
6 :: true -> x = 1 - x;
7 od
8 }
9
10 active proctype B()
11 {
12 do
13 :: true -> x = 1 - x;
14 od
15 }A and
B each get executed infinitely often
(though not necessarily exactly alternating).
The problem is that we could get unlucky — very
5
unlucky.
It is possible for, say, A
to get all the processor time,
while B never even gets executed once, ever.
More generally, maybe B
gets its fair share of processor time for a while,
but later priority is always given to A,
and B never runs again.
This is simply another instance of a lack of progress.
For example, if we consider only B's
loop to be progress, then there is a non-progress cycle.
Modify the code to test for this non-progress cycle, and then run SPIN's verifier.
Since we are currently considering only
B's loop to be progress,
we only add a progress label in B.
1 show int x = 0;
2
3 active proctype A()
4 {
5 do
6 :: true -> x = 1 - x;
7 od
8 }
9
10 active proctype B()
11 {
12 progress_B:
13 do
14 :: true -> x = 1 - x;
15 od
16 }
Now we want to verify this code
First, run a syntax check. Partly, this is just another
reminder of a good habit. But, it also forces
xspin to re-scan your code and recognize
the progress label. After doing this,
re-run the verifier, and indeed it will find a non-progress
cycle
The following are two of the possible traces it could find,
one minimal, and one not.
|
The next figure shows the entire state space, which has
a bit of a surprise. Some of the states are duplicated,
differing only
in their status as progress states or not.
To see why that is necessary, contrast the following.
Entering line 14, as in the first
transition of the above nonminimal trace, entails progress.
This corresponds to each of the red progress states.
However, say that B
execute for one step, long enough to pass the guard, but
not yet execute the assignment.
Now, if only A executes,
B never makes further progress.
This corresponds to the black non-progress states with
B's line counter at 14.
![]() |
How can we test for the possibility of
A looping forever
without executing B?
Remove the progress label from B,
and add one to the
corresponding place within A.
Of course, the faulty behavior we'd really like
to have SPIN alert us to is
“A or B
is being starved”,
instead of having to choose one and name it.
Why does putting progress labels in both
A and B
not achieve what we want?
Putting in two progress labels, and getting a
green light from SPIN's verifier, would only mean
“It is verified that in every trace,
either A or B
is making progress.”
But we want to confirm
“… A
and
B are making progress.”
In this toy example,
where A and B are
identical (up to differing in progress labels),
we can argue by symmetry that
lack of non-progress-for-A
is enough to imply
lack of non-progress-for-B.
But in general, SPIN makes us run two separate
verifications (though we will see some possible
code
contortions later).
Process scheduling software is usually written so as to avoid starvation. For example, a simple scheduler could simply pick processes in a round-robin fashion, executing some of each process 0, 1, 2, …, 0, 1, 2, … forever. However, this scheme inappropriately schedules time for blocked threads, doesn't allow the processes to be prioritized, and doesn't allow the set of running processes to change dynamically. Real schedulers manage not only the order, but also the frequency of context switches.
So far, we have considered verification with an arbitrary scheduler. Since we know nothing about the scheduler, we must consider all possible traces. (Remember that the scheduler might sometimes be nature, or even an adversary.) But, SPIN also allows you to specify some more restricted scheduling behaviors, thus reducing the state space. We will look at one of these mechanisms in SPIN — enforcing “weak fairness”.
Let us return to our
starvation example,
but this time we will have SPIN enforce weak fairness.
To turn this feature on in xspin,
select it from the verification options.
-f for
fairness, as well as -l for loop-checking.
prompt> spin -a filename.pml
prompt> gcc -DNP pan.c -o filename
prompt> filename -f -l
1 show int x = 0;
2
3 active proctype A()
4 {
5 do
6 :: true -> x = 1 - x;
7 od
8 }
9
10 active proctype B()
11 {
12 do
13 :: true -> x = 1 - x;
14 od
15 }
In this code, each process is always enabled, since each guard is
just true. Thus, a weakly fair
scheduler guarantees that at each point in the trace,
each process will eventually be scheduled
in the future. I.e., it will
repeatedly switch between the two processes as desired.
Now verify there are no non-progress cycles when weak
fairness is guaranteed. I.e.,
add the progress label to B
again and verify that no errors are reported.
Repeat with the progress label in
A.
Weak fairness is defined not in terms of processes, but of individual statements. This example illustrates how it applies to situations other than avoiding starvation, and in particular, when there is only one process.
show int x = 0;
active proctype loop()
{
do
:: true -> x = 1;
:: x==1 -> x = 2;
:: x==2 -> x = 3;
od;
}
Without any kind of fairness enforced, it is possible to repeatedly execute only the first option.
Verify the previous statement. Where would you add progress labels?
We want a progress label within the second option. Although that is not allowed before the guard, it is sufficient for this example to add it after the guard.
active proctype loop()
{
do
:: true -> x = 1;
:: x==1 ->
progress_2: x = 2;
:: x==2 ->
progress_3: x = 3;
od;
}
The progress_3 label is unnecessary
since it would only be encountered after the second option
is executed. However, it is arguably a good idea not to reason
through such arguments, and instead let the tool formally
do the reasoning.
After the first execution of the first option, the second one
is continously enabled, since x==1.
With weak fairness enforced, we are thus guaranteed that
x eventually becomes 2.
Now verify again with weak fairness enforced.
After x becomes 2,
both the first and third options are enabled.
Weak fairness is not sufficient to guarantee that the third will be
ever be picked, as x could
alternate between the values 1 and 2
forever. Since it wouldn't have the value 2 continuously, the weak
fairness guarantee doesn't apply to the third option.
Verify this claim. Where are progress labels needed now?
Only the previous progress_3
label should be used.
Does the following code have a non-progress loop?
1 int x = 0;
2
3 active proctype A()
4 {
5 progressA:
6 /* Consider: A reaches this line, but never executes it. */
7 x = 1;
8 }
9
10 active proctype B()
11 {
12 do
13 :: x == 0 -> skip;
14 :: x == 1 ->
15 progressB: skip;
16 od;
17 }
It does if
A never runs, and B runs forever
with x being zero.
(Of course, this only happens if fairness is not being enforced.)
But surprisingly,
SPIN does not report any non-progress cycles!
It views A as forever sitting in the
state progressA, and thus feels that progress is
always happening (even though that statement is never executed).
This mismatch isn't due to the fact that
SPIN associates labels with the state just
before
executing the labeled statement;
even if they were associated with the statement preceding
the label,
we could still construct a situation
where SPIN views A as making progress even
though it is doing nothing at all (just idling
in a state tagged as a progress).
The fundamental mismatch is that to you and me, the idea of “making progress” corresponds to making certain transitions. But SPIN views labels only as relating to states. If there are transitions which don't correspond to progress, yet arrive at a labeled-progress-state (in particular: self-loops, possibly introduced for stutter-extension), then the state-based and transition-based approaches differ.
We'll see in the next section a way to still capture our transition-based notion with SPIN. Nevertheless, beware that sometimes a particular tool might have some surprising notions.
ticket[peer] must
be replaced by looping to find the maximum ticket-number among
all peers. Moreover, it becomes possible for two processes to
grab the same ticket-number in this way.
This situation has to be checked for, and some way of breaking ties
included (e.g. smallest process ID number
(_pid) wins).
ticket == 1 is
important — it corresponds to
“I'm still calculating max of others”,
and doing so has priority
over anybody entering the process.