Skip to content Skip to navigation

Connexions

You are here: Home » Content » Using Markov Random Fields for Election Prediction

Navigation

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • Rice Digital Scholarship

    This module is included in aLens by: Digital Scholarship at Rice UniversityAs a part of collection: "The Art of the PFUG"

    Click the "Rice Digital Scholarship" link to see all content affiliated with them.

Also in these lenses

  • Lens for Engineering

    This module is included inLens: Lens for Engineering
    By: Sidney BurrusAs a part of collection: "The Art of the PFUG"

    Click the "Lens for Engineering" link to see all content selected in this lens.

Recently Viewed

This feature requires Javascript to be enabled.
 

Using Markov Random Fields for Election Prediction

Module by: Andrew Tilley, Steven Moen. E-mail the authorsEdited By: Andrew Tilley, Steven Moen

Summary: We model the counties of Colorado as a Markov Random Field in an attempt to predict the results of the 2012 Presidential Election therein.

Introduction

The importance of predicting the United States Presidential Election cannot be understated because of the impact the U.S. president has on national policy. The U.S. President holds the veto power, which can make it very difficult for the Congress to pass legislation without his approval. In addition, the president appoints all federal judges, most notably the justices of the Supreme Court. While all nominees must be confirmed by the Senate, the confirmation process is usually a rubber stamp and few nominees are turned down, with some notable exceptions (such as Ronald Reagan's failed appointment of Robert Bork). Also, the U.S. President is the Commander in Chief of the U.S. military and can go to war unilaterally and then ask for Congressional approval 48 hours later under the War Powers Resolution of 1973. Lastly, presidents use the executive order to carry out their wishes, perhaps outside the constraints of the Constitution, to choose which laws they want to enforce. For example, President Obama does not enforce neither Defense of Marriage Act nor the deportation of illegal immigrants under the age of 30, Harry Truman ended racial segregation of the military, and Jimmy Carter created the Federal Emergency Management Agency (FEMA). Clearly the U.S. president has enormous power in the U.S. and abroad. Therefore, a company or an individual making decisions that demand foresight about the political state of the country would benefit knowing who the next president is.

Despite the large demand for predicting elections, most existing methods are either unscientific or unreliable. Unscientific judgements include evaluations of a candidate's character or analysis of a candidate's rhetoric. It is not wrong to say that one candidate will win over another because he has more charisma or more appeal to the party base. Such statements are meaningful because political intangibles like charisma and appeal to the base are vital aspects for any politician. The problem is that such intangibles lack some sort of reliable measuring stick, something that can be viewed objectively rather than through the lens of political opinion, in which people may view candidates as they want to see them rather than as they really are.

The most common answer to the measuring stick problem is using public opinion polls. However, public opinion polls are unreliable. Polls are no doubt useful for a first approximation to predicting an election, but they contain many pitfalls for the election forecaster. For example, national polls may not be able to predict who will win the electoral college. The electoral college evolved to limit direct democracy, and an important consequence of the electoral college is that a president can win a presidential election without winning the popular vote. Four U.S. presidents have done so: John Quincy Adams in 1824, Rutherford B. Hayes in 1876, Benjamin Harrison in 1888, and most recently George W. Bush in 2000[4]. Furthermore, national polls include voters from states of no consequence in that they are solidly Republican or Democratic states (i.e. Democratic voters from Texas or Republican voters from Massachusetts or New York). For these reasons, state level polls are more accurate.

But where might a state poll go wrong? The most obvious answer is with independent voters, voters who haven't made up their mind yet. For example, Rasmussen's latest Colorado poll has 5% of people undecided, 6% of people for some third party candidate, and equal percentages for the two major party candidates[2] . State level polls are fine for a month before an election: vice presidential candidates are decided, undecided voters are becoming less and less common, primary struggles are long finished, and party conventions are over. But as of the writing of this paper, these factors, crucial for the accuracy of polls, are not yet decided.

A key to predicting a presidential election is using swing states as the focus of the analysis. In the United States, swing states such as Ohio, Florida, Colorado, Virginia, and Nevada are different than other states in the sense that they are on the border between Republican and Democrat, hovering around 50% in recent elections. These swing states are often bellwethers for the presidency. For example, no president has been elected without winning Ohio since John F. Kennedy lost Ohio in 1960. Also, by focusing on the state level, data becomes more meaningful. For example, while the national unemployment rate is hovering around 8.2% for the June jobs report from the Bureau of Labor Statistics, North Dakota has unemployment below 5% whereas Nevada and California have statewide unemployment rates above 10%[7]. Unemployment levels by county can vary even more from the national average[1]. The point here is that while national averages are indicative of the state of the country as a whole, they are less meaningful for the states in question.

One important insight about the level of detail available on the state and local level is the relationship between counties in a state. Two observations are absolutely critical for the prediction method described herein. First is that strongly Republican counties are usually not right next to strongly Democratic counties, and that the change between Democratic and Republican counties in a state is gradual rather than sudden. The second observation is that the differences between two counties is not random and does not vary greatly nor randomly between elections. If one knows the voting percentages of one county, it is possible to guess the voting percentages of the neighboring counties using historical data. In order to take advantage of these facts, we can model the counties as a Markov Random Field and find the most likely outcome for each county (maximum a posteriori inference).

Markov Random Fields

Definition

A Markov Random Field (or Markov network) is an undirected, probabilistic graphical model. It consists of a set of nodes connected by a set of edges, where each node takes exactly one state. The nodes in a Markov Random Field have the Markov property in the sense that their state depends only on the nodes they are connected to as well as their own built-in preferences. Each assignment of states over the network yields a level of energy, which is a measure of the probability of that assignment occurring given the model. The exact probability of a particular assignment is its energy divided by the sum of the energies of all possible assignments. In our research, we focus on pairwise Markov Random Fields, in which any potential function describing the energy attained from an assignment of nodes' states is a function over no more than 2 nodes. We note that any Markov Random Field can be converted to a pairwise Markov Random Field[5].

Figure 1: An example of a Markov Random Field
Figure 1 (markov.png)

The energy of a given assignment is equal to eθ·μeθ·μ, where θθ is a vector of parameters and μμ is a vector of indicator functions, i.e. 1 means on, 0 means off. Below, we define these notions exactly[8].

                θ=θ1(x1)θ1(x2)...θ1(xK)θ2(x1)...θ2(xK)θ3(x1)...θN(xK)θE11,E12(x1,x1)θE11,E12(x1,x2)...θE11,E12(x1,xK)θE11,E12(x2,x1)...θE11,E12(xK,xK)θE21,E22(x1,x1)...θEM1,EM2(xK,xK)θ=θ1(x1)θ1(x2)...θ1(xK)θ2(x1)...θ2(xK)θ3(x1)...θN(xK)θE11,E12(x1,x1)θE11,E12(x1,x2)...θE11,E12(x1,xK)θE11,E12(x2,x1)...θE11,E12(xK,xK)θE21,E22(x1,x1)...θEM1,EM2(xK,xK)                                μ=μ1(x1)μ1(x2)...μ1(xK)μ2(x1)...μ2(xK)μ3(x1)...μN(xK)μE11,E12(x1,x1)μE11,E12(x1,x2)...μE11,E12(x1,xK)μE11,E12(x2,x1)...μE11,E12(xK,xK)μE21,E22(x1,x1)...μEM1,EM2(xK,xK)μ=μ1(x1)μ1(x2)...μ1(xK)μ2(x1)...μ2(xK)μ3(x1)...μN(xK)μE11,E12(x1,x1)μE11,E12(x1,x2)...μE11,E12(x1,xK)μE11,E12(x2,x1)...μE11,E12(xK,xK)μE21,E22(x1,x1)...μEM1,EM2(xK,xK)

  • KK = Total Number of Possible States (Assumed to be the same for each node here, though this isn't necessarily true)
  • NN = Total Number of Nodes
  • MM = Total Number of Edges
  • xkxk = State k
  • Em1Em1 = The first of two nodes connected by edge m
  • Em2Em2 = The second of two nodes connected by edge m
  • θn(xk)θn(xk) = The level of energy gained by node n being in state k
  • θEm1,Em2(xj,xk)θEm1,Em2(xj,xk) = The level of energy gained by the two nodes connected by edge m being in states j and k, respectively
  • μn(xk)μn(xk) = 1 if node n is in state k, 0 if not.
  • μEm1,Em2(xj,xk)μEm1,Em2(xj,xk) = 1 if the two nodes connected by edge m are in states j and k, respectively. 0 otherwise.

It can be seen that D=D= length of θ=θ= length of μ=N*K+M*K2μ=N*K+M*K2. In general, we are interested in maximizing the amount of energy over the network (also known as maximum a posteriori inference). Since the exponential is monotonic, this is equivalent to maximizing the following binary program[8]:

maximize μ d = 1 D θ d * μ d subject to k = 1 K μ E m 1 , E m 2 ( x j , x k ) = μ E m 1 ( x j ) m , j j = 1 K μ E m 1 , E m 2 ( x j , x k ) = μ E m 2 ( x k ) m , k k = 1 K μ n ( x k ) = 1 n μ { 0 , 1 } D maximize μ d = 1 D θ d * μ d subject to k = 1 K μ E m 1 , E m 2 ( x j , x k ) = μ E m 1 ( x j ) m , j j = 1 K μ E m 1 , E m 2 ( x j , x k ) = μ E m 2 ( x k ) m , k k = 1 K μ n ( x k ) = 1 n μ { 0 , 1 } D
(1)

The first two constraints ensure consistency between the edge indicator functions and the nodal indicator functions. The third constraint ensures that each node takes exactly one state.

Colorado Counties as a Markov Random Field

We consider each of the 64 counties of Colorado to be a node. Two counties are connected by an edge if they are geographically adjacent (see figure 2). The state of a county is the percentage of its voters in 2012 who will vote for Barack Obama (precisely it's the number of voters voting for Obama divided by the number of voters voting for Obama or Romney). For computational efficiency, we cut these percentages off at 20 and 79. In order to use this Markov Network to predict the 2012 election, we must know the model parameters, θθ.

Figure 2: A subset of our Colorado Markov Random Field
Figure 2 (cocounties.png)

Learning the Model Parameters

In order to learn the model parameters, we examine presidential elections from 1960 to 2008. We note that 1960 was chosen somewhat arbitrarily but roughly represents the end of an old political era. The historical observations (call them ξ1,,ξ13ξ1,,ξ13) are used in a Maximum Likelihood Estimation. This optimization problem searches for the most likely model parameters given the observed data:

maximize θ l ( θ : Ξ ) = i = 1 D θ i j = 1 13 μ i ( ξ j ) - M ln Z ( θ ) maximize θ l ( θ : Ξ ) = i = 1 D θ i j = 1 13 μ i ( ξ j ) - M ln Z ( θ )
(2)

Technically, our equation above gives the log-likelihood of a set of parameters θθ, but this is unimportant since the logarithm is a monotonic function. Z(θ)Z(θ) is the partition function, i.e. the sum over all possible outcomes given a set of parameters, θθ.

Z ( θ ) = ξ exp i = 1 D θ i μ i ( ξ j ) Z ( θ ) = ξ exp i = 1 D θ i μ i ( ξ j )
(3)

Note that the sum over ξξ in the partition function refers to the sum over all possible ξξ, not just the ξξ that have been observed. This fact makes computation of the partition function intractable and we must approximate it. Following a sampling-based learning technique, we conclude[6]:

ln Z ( θ ) ln 1 T t = 1 T exp i = 1 D ( θ i - θ i 0 ) μ i ( ξ t ) + ln Z ( θ 0 ) ln Z ( θ ) ln 1 T t = 1 T exp i = 1 D ( θ i - θ i 0 ) μ i ( ξ t ) + ln Z ( θ 0 )
(4)

Where θ0θ0 is some set of parameters from which T samples are drawn (20 in our case). Since lnZ(θ0)lnZ(θ0) is a constant, we can leave it out of the optimization's objective function and we solve the MLE problem via gradient ascent.

θ i t - 1 = j = 1 13 μ i ( ξ j ) - 13 t = 1 T μ i ( ξ t ) exp r = 1 D ( θ r t - 1 - θ r t - 2 ) μ r ( ξ t ) t = 1 T exp r = 1 D ( θ r t - 1 - θ r t - 2 ) μ r ( ξ t ) θ i t - 1 = j = 1 13 μ i ( ξ j ) - 13 t = 1 T μ i ( ξ t ) exp r = 1 D ( θ r t - 1 - θ r t - 2 ) μ r ( ξ t ) t = 1 T exp r = 1 D ( θ r t - 1 - θ r t - 2 ) μ r ( ξ t )
(5)
θ t = θ t - 1 + s * θ t - 1 θ t = θ t - 1 + s * θ t - 1
(6)

Where ss is some small step size. We update θ0θ0 on each iteration to be θt-2θt-2. This is due to the fact that the partition function approximation is only reasonable in a neighborhood of θ0θ0[6]. It follows that the ξξ's which are indexed by t are drawn from a model with parameters θt-2θt-2, while the ξξ's indexed by j still represent the historical data.

Correcting for Lack of Data

Due to the small number of historical observations (13) and the large number of possible combinations for any edge (60states*60states=3,60060states*60states=3,600 combinations), we must come up with a more concise way to learn the relationships between counties. To that end, we look not at the absolute voting percentages of counties but rather the difference in voting percentage between each pair of neighboring counties. This method has the added bonus of circumventing the problem of overall change that has affected every county. Unfortunately, there are still 119 possible differences that could occur (-59,-58,...,0,...58,59) and only 13 elections to determine the frequency with which each difference occurs. Therefore, we place each difference into a cluster, e.g. [-9,-6]. We use 11 clusters total and since the differences between counties are fairly consistent between years, the 13 observations should be sufficient for an approximation of the marginal probabilities for each edge. These approximation techniques do not affect the way we solve the problem via gradient ascent. However, once gradient ascent is finished we must convert our small θθ into standard long form (as displayed in Section 2.1).

Performing MAP Inference

Due to our approximation techniques in the learning process, we are confronted with a problem when attempting to predict the 2012 election. Since the entire model is based off relativity, any outcome for a particular county is equally likely as long as the rest of the model shifts with it. In order to ensure we do not get extremely low or high results, we must fix some subset of the counties as a starting point for the model. In order to do this, we utilize linear regression techniques (as discussed in the next section). Once the model is partially filled in, we solve the binary program stated above with our learned θθ (in standard long form) via Gurobi Optimizer.

Multivariate Regression

Multivariate Linear Regression is commonly used in social sciences as a means of predicting future outcomes based off of known data. It will provide us with a comparison as well as a starting off point for our Markov Random Field model. Our model will have Incumbent Party Vote % as the dependent variable. That is, if a Democratic president is currently in office, then we will be predicting the voting %'s earned by this year's Democratic Candidate.

Regressors

Percentage Change in Per Capita Income

Reflects an overall change in well-being since the last election. As the Percentage Change in Per Capita Income decreases, the incumbent becomes more unelectable.

Unemployment

Tells how many people are unable to find work as a percentage of the workforce who cannot find find work. As unemployment increases, the incumbent's chance of re-election decreases.

Control of Congress

A variable reflecting whether the incumbent's party controls Congress or not. 1 if the incumbent party controls both the house and the senate, 1/2 if the incumbent's party controls only one chamber, 0 if the incumbent's party controls neither.

Incumbent Approval Rating

The incumbent's approval from the last month in June. As this number decreases, the incumbent becomes more unelectable.

Incumbent Vote in the Past Three Elections

A variable showing how people voted in the past for the incumbent. If the previous incumbent vote is high, chances are people will not go drastically away from that result.

Assumptions

  • Percentage Change in Per Capita Income: In 2008 CPI Adjusted dollars[3], 2012 figures assumes same growth rate as 2009-2010
  • Unemployment: 1992-2008 figures are averages for the year, 2012 figures are the unemployment data for May

Results

2008

We used both the regression model on its own as well as the partially filled-in Markov model to predict the 2008 election using data from 2004 and earlier. The results are presented in Table 1 and Figure 3.

Table 1: Prediction Results for 2008 (% Democratic vote)
  State Prediction Average Prediction Error (magnitude)
Linear Regression 52.18% 2.08%
Markov Random Field 47% 6.84%
Actual Result 54.57% -
Figure 3: Histogram of Errors in Predictions for 2008
Figure 3 (hists.png)

2012

In 2012, we predict Mitt Romney to win Colorado with about 60% of the vote. Our regression model predicts 60.36% while our Markov model predicts 60.65%. In figure 4, we present the county-by-county predictions.

Figure 4: Our Predictions for 2012
Figure 4 (2012preds.png)

Turnout Assumption

In order to calculate the statewide prediction based on the individual county predictions, we had to estimate how many people would vote in each county in 2012. The assumption we made was that the percentage of Colorado voters in each county would remain constant.

Conclusions/Further Research

Based off the results from 2008, the Markov Random Field technique does not seem to perform as well as the regression by itself. This could be due to a lack of data (again, we only had 13 observations to learn the model from), or it could be due to inconsistencies in neighboring counties' relationships with one another. If the latter is true, then our hypothesis was incorrect. That is, the relationships between adjacent counties are not consistent enough to use for election predictions. To answer this question for certain would require many more observations, probably from all types of elections for many years. This is one potential area for future research. Despite the uncertainty regarding our Markov model, we can seemingly conclude success with our regression model. It predicted the correct winner in both 2008 (trained on data from 1992-2004) and 2004 (not shown here, trained on data from 1992-2000). Of course, we used a small set of regressors and there is much room for further research in this area, as well.

References

  1. (2012). Labor Force Data by County, 2011 Annual Averages. Technical report. U.S. Department of Labor.
  2. (2012, June). Election 2012: Colorado President. [Rasmussen Reports]. Web.
  3. (2012). Consumer Price Index. Technical report. U.S. Department of Labor.
  4. Gore, D'Angelo. (2008, March). Presidents Winning Without Popular Vote. [FactCheck.org]. Web.
  5. Jordan, M.I. and Wainwright, M. (2008). Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 1(1-2),
  6. Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
  7. (2012). Unemployment Rates for States. Technical report. U.S. Department of Labor.
  8. Sontag, D. (2010). Approximate Inference in Graphical Models Using LP Relaxations. Ph. D. Thesis. Massachusetts Institute of Technology.

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks