The energy of a given assignment x is exp(θᵀf(x)), where θ is a vector of parameters and f(x) is a vector of indicator functions, i.e. an entry is 1 when its corresponding node or edge configuration is on and 0 when it is off. Below, we define these notions exactly.
It can be seen that the length of θ equals the length of f. In general, we are interested in maximizing the energy over the network (also known as maximum a posteriori, or MAP, inference). Since the exponential is monotonic, this is equivalent to solving the following binary program:

maximize   Σᵢ Σₛ θᵢ(s) xᵢ(s) + Σ₍ᵢ,ⱼ₎ Σₛ,ₜ θᵢⱼ(s,t) xᵢⱼ(s,t)
subject to Σₜ xᵢⱼ(s,t) = xᵢ(s)   for every edge (i,j) and state s
           Σₛ xᵢⱼ(s,t) = xⱼ(t)   for every edge (i,j) and state t
           Σₛ xᵢ(s) = 1          for every node i
           xᵢ(s), xᵢⱼ(s,t) ∈ {0, 1}
The first two constraints ensure consistency between the edge indicator functions and the nodal indicator functions. The third constraint ensures that each node takes exactly one state.
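For intuition, the objective of this binary program can be evaluated directly on a tiny model. Below is a minimal sketch that finds the MAP assignment of a hypothetical 3-node, 2-state chain by exhaustive enumeration; the parameter values are randomly generated for illustration and are not the model's learned parameters. Enumeration is feasible only for tiny graphs, which is why the binary program formulation matters at the scale of 64 counties.

```python
import itertools
import numpy as np

# Hypothetical toy network: a 3-node chain, each node with 2 states.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
n_states = 2

rng = np.random.default_rng(0)
theta_node = rng.normal(size=(len(nodes), n_states))            # theta_i(s)
theta_edge = rng.normal(size=(len(edges), n_states, n_states))  # theta_ij(s,t)

def score(assignment):
    """theta^T f for one full assignment -- the binary program's objective."""
    s = sum(theta_node[i, assignment[i]] for i in nodes)
    s += sum(theta_edge[k, assignment[i], assignment[j]]
             for k, (i, j) in enumerate(edges))
    return s

# MAP inference by brute force: try every joint assignment.
best = max(itertools.product(range(n_states), repeat=len(nodes)), key=score)
print("MAP assignment:", best)
```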
We consider each of the 64 counties of Colorado to be a node. Two counties are connected by an edge if they are geographically adjacent (see figure 2). The state of a county is the percentage of its voters in 2012 who will vote for Barack Obama (precisely, it is the number of voters voting for Obama divided by the number voting for either Obama or Romney). For computational efficiency, we clamp these percentages to the range [20, 79]. In order to use this Markov network to predict the 2012 election, we must know the model parameters θ.
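The graph construction and state discretization described above can be sketched as follows. The three-county adjacency sample is hypothetical (the real model uses all 64 Colorado counties), but the clamping to [20, 79] matches the text.

```python
# Hypothetical sample of the county adjacency structure (real model: 64 counties).
adjacency = {
    "Denver":   ["Adams", "Arapahoe"],
    "Adams":    ["Denver", "Arapahoe"],
    "Arapahoe": ["Denver", "Adams"],
}
# Deduplicate undirected edges by sorting each endpoint pair.
edges = {tuple(sorted((a, b))) for a, nbrs in adjacency.items() for b in nbrs}

def state(obama_votes, romney_votes):
    """Node state: Obama's two-party vote share, clamped to [20, 79]."""
    pct = round(100 * obama_votes / (obama_votes + romney_votes))
    return min(max(pct, 20), 79)

print(sorted(edges))   # 3 undirected edges among the sample counties
print(state(60, 40))   # -> 60
print(state(95, 5))    # clamped to 79
```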
In order to learn the model parameters, we examine presidential elections from 1960 to 2008. We note that 1960 was chosen somewhat arbitrarily but roughly represents the end of an old political era. The historical observations (call them x̂₁, …, x̂ₘ) are used in a maximum likelihood estimation. This optimization problem searches for the most likely model parameters given the observed data:

θ* = argmax_θ  Σₘ [ θᵀ f(x̂ₘ) − log Z(θ) ]
Technically, our equation above gives the log-likelihood of a set of parameters θ, but this is unimportant since the logarithm is a monotonic function. Z(θ) is the partition function, i.e. the sum over all possible outcomes given a set of parameters: Z(θ) = Σₓ exp(θᵀ f(x)).
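The log-likelihood and partition function can be computed exactly for a model small enough to enumerate. The sketch below uses a hypothetical 2-node, 2-state model with made-up parameters and observations; it is illustrative only, since Z(θ) is intractable to enumerate for the 64-county network.

```python
import itertools
import numpy as np

theta = np.array([0.5, -0.2, 0.3])   # hypothetical parameters

def features(x):
    """Indicator features f(x): node 0 in state 1, node 1 in state 1, both in state 1."""
    return np.array([x[0] == 1, x[1] == 1, x[0] == 1 and x[1] == 1], dtype=float)

def log_Z(theta):
    # Z(theta) = sum over all possible outcomes of exp(theta^T f(x))
    return np.log(sum(np.exp(theta @ features(x))
                      for x in itertools.product([0, 1], repeat=2)))

def log_likelihood(theta, data):
    # sum over observations m of [ theta^T f(x_m) - log Z(theta) ]
    return sum(theta @ features(x) for x in data) - len(data) * log_Z(theta)

data = [(1, 1), (1, 0), (1, 1)]      # hypothetical historical observations
print(log_likelihood(theta, data))
```

With all-zero parameters every outcome has weight 1, so Z is just the number of outcomes (4 here), a quick sanity check on the enumeration.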