Note: Your browser may not currently support MathML. See our browser support page for additional details. You can always view the correct math in the PDF version.
Concurrency is the execution of two or more independent, interacting programs over the same period of time; their execution can be interleaved or even simultaneous. Concurrency is used in many kinds of systems, both small and large.
The typical home computer is full of examples. In separate windows, a user runs a web browser, a text editor, and a music player all at once. The operating system interacts with each of them.
Almost unnoticed, the clock and virus scanner are also running. The operating system waits for the user to ask more programs to start, while also handling underlying tasks such as resolving what information from the Internet goes to which program.
Using a web-based airline reservation system, Joe and Sue each want to buy a ticket. Each has a separate computer, and thus a separate web browser. Their two web browsers plus the airline's web server together comprise the concurrent program.
As this example illustrates, the term concurrent system might be more appropriate. However, for verification purposes, we will view them as a single, distributed entity to be modeled.
A concurrent system may be implemented via processes and/or threads. Although details can vary upon platform, the fundamental difference is that processes have separate address spaces, whereas threads share address spaces. We will follow the common theoretical practice and ignore this distinction, using shared variables when desired, and using the two terms interchangeably.
We view these threads as executing relatively independently. However, since they are acting together towards some goal, they must need to communicate and coordinate. The two most common communication techniques in processes and threads are through, respectively, message passing and shared variables. In order to communicate at the right times, they must synchronize, together arriving at agreed-upon control points. Often, one or more threads block, or stop and wait for some external event. In this module, we will use only shared variables, although the tools we use also allow message passing.
Even though we have multiple flows of control, that doesn't imply we need multiple processors. Concurrent programs may be executed on a single processor by interleaving their control flows. In order to understand what happens in a concurrent program over time, we must understand how the individual operations of the threads can interleave. This is independent of how many processesors we use, although it is convenient to think of only having one processor. To consider the possible interleavings, we need to know which actions are atomic, or indivisible. If an atomic operation begins, we know that it won't be interrupted by a context switch until it is done. We generally assume that each hardware machine instruction is atomic, but a programming language might guarantee that even more complicated operations are atomic as well.
Let's examine the possible interleavings of the code fragments for two threads. First, let's assume that each assignment statement is atomic.
1 thread0:
2 {
3 x=x+1
4 }
5 thread1:
6 {
7 x=x*2
8 }
|
Here, each trace is a timeline, representing
one possible interleaving of operations.
From top to bottom, each state represents one point in time,
with a current line number for each thread, plus the value
of each variable.
The first state is the starting point, before execution begins.
Here, we assume that x
is initially zero.
We will use traces and their timeline diagrams throughout this
module to understand concurrent programs.
For contrast, let's change what is atomic. Now, assume that each
machine instruction operation, such as loading, adding, or storing
a register, is atomic.
Again, we assume x is initially zero.
Observe that with finer-grained atomicity,
there are many more ways to interleave the threads,
and (in this case) more possible result values for
x.
1 thread0:
2 {
3 r0 = x
4 r0 += 1
5 x = r0
6 }
7 thread1:
8 {
9 r1 = x
10 r1 <<= 1 /* Shifting left one position multiplies by 2. */
11 x = r1
12 }
|
Both the syntax and semantics of the concurrency primitives can vary among programming languages and concurrency libraries. We will use a particular language, Promela, which implements the most commonly-used features of concurrent programs. In Promela, assignment statements are defined to be atomic. We'll later contrast atomic with non-atomic assignment statements.
Programmers are comfortable with debugging single-threaded code, where most questions involve ``does this sequence of instructions compute the correct answer?'' Concurrency, however, introduces the added complications of communication and synchronization; a single task may be spread across several sequences of instructions running asynchronously, and multiple threads can interact in unintended ways. The following are five common categories of problems with concurrency.
As an analogy, consider two people walking in a hallway towards each other. The hallway is wide enough for two people to pass. Of interest is what happens when the two people meet. If on the same side of the hallway, a polite strategy is to step to the other side. A more belligerent strategy is to wait for the other person to move. Two belligerent people will suffer in deadlock, glaring face to face in front of each other. Two polite people could suffer from livelock if they repeatedly side-step simultaneously. (No conclusions on morality are to be inferred from the fact that one polite and one belligerent person don't have any problems.)
![]() |
There are more specific notions of fairness that describe when and how often threads are guaranteed to get a turn. For example, do we expect that each thread should be executed as often as any other, or is it acceptable if one runs a hundred steps for each step of any other thread? When considering fairness, one must also consider the system code implementing the threads. The implementation includes a scheduler that determines how to interleave the threads, and the scheduler might or might not provide any fairness guarantees.
Assume we are modeling a store. Consumers buy products,
reducing the shelf inventory. Meanwhile, employees restock
the shelves, increasing the shelf inventory.
More simply, let thread
Unless our scheduler (which may be nature) otherwise gives some guarantees, this system does not ensure that the two threads execute fairly. In particular, it is possible for one thread to starve the other.
A Sesame Street skit has Grover in line at the grocery store, his basket heaped full of animal crackers and other essentials. A little old lady gets in line behind him, holding only a single candle, muttering how she is late for her grandson's very first birthday party. Grover muses that she has to wait a long time for his big basket to be rung up, whereas if she went first it's hardly any delay at all for him. So he invites her to get in front of him in line, and she gratefully accepts.
You might guess what happens next though: Next in line is the Swedish Chef, needing only a single meatball (Bork! Bork! Bork!), and then fish-flinging Lew Zealand, who is buying just one little herring, followed by … [insert Grover's resigned sigh here].
The scheduler, in this case, is the social protocol at the checkout line (which includes Grover's politeness). This input situation demonstrates how Grover gets starved. The alternate protocol of having an additional express checkout line (without allowing any butt-ins) is an attempt to minimize customers' wait-delay without introducing starvation.
Returning to our airline reservation system example, suppose Joe and Sue each see that one seat is left. It is a race condition error if the system allows both of them to successfully reserve the one seat.
Typically, a race condition is an oversight by the programmer who did not realize the interleaving was possible. Race conditions are notoriously hard to debug and test for because they can well occur in highly unlikely situations.
More generally, these and other issues are categorized as being either safety or liveness issues.
In this module, we will concentrate on the four issues of deadlock, livelock, race conditions, and fairness. We will see how to write and automatedly detect these problems in some concurrent programs, using the tools Promela and SPIN. We will also learn how to formally specify more intricate properties using temporal logic.
There are known programming techniques to avoid certain problems such as deadlock and some kinds of race conditions. Some of these techniques — notably locks and semaphores — are so common that they are often embedded into programming languages. While the details of such techniques are outside the scope of this module, it is important to recognize that although they greatly help writing concurrent problems, they don't mean that concurrency problems can't arise.
Given concurrent code, how can we determine if it behaves as expected? Certainly we can test it on various inputs, but we can never try every possible input. Neither can we test every possible interleaving on every possible input. As always, testing can reveal errors, but it can't demonstrate the lack of errors.
Since concurrency problems can lead to programs that don't terminate, it's clear that detecting concurrency problems can't be any easier than the Halting Problem, which is itself unsolvable. So we can't write a checker which takes a concurrent program as input and always determines whether it's free of (say) deadlock.
But just as we can study individual programs and conclude whether or not they will halt, for many real-world concurrent programs, we can determine whether or not they are susceptible to deadlock. For example, communications protocols generally have a finite number of options and are amenable to automated checking. Thus automated tools (such as SPIN) are valuable aid in catching and eliminating real-world bugs.