The energy of a given assignment x is exp(θᵀf(x)), where θ is a vector of parameters and f(x) is a vector of indicator functions, i.e. an entry is 1 when its corresponding node or edge configuration is on and 0 when it is off. Below, we define these notions exactly.
It can be seen that the length of θ equals the length of f. In general, we are interested in maximizing the energy over the network (also known as maximum a posteriori, or MAP, inference). Since the exponential is monotonic, this is equivalent to solving the following binary program:

maximize   Σᵢ Σₛ θᵢ(s) xᵢ(s) + Σ₍ᵢ,ⱼ₎ Σₛ,ₜ θᵢⱼ(s,t) xᵢⱼ(s,t)
subject to Σₜ xᵢⱼ(s,t) = xᵢ(s)   for every edge (i,j) and state s
           Σₛ xᵢⱼ(s,t) = xⱼ(t)   for every edge (i,j) and state t
           Σₛ xᵢ(s) = 1          for every node i
           xᵢ(s), xᵢⱼ(s,t) ∈ {0, 1}
The first two constraints ensure consistency between the edge indicator functions and the nodal indicator functions. The third constraint ensures that each node takes exactly one state.
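For intuition, the objective of this binary program can be evaluated directly on a tiny model. Below is a minimal sketch that finds the MAP assignment of a hypothetical 3-node, 2-state chain by exhaustive enumeration; the parameter values are randomly generated for illustration and are not the model's learned parameters. Enumeration is feasible only for tiny graphs, which is why the binary program formulation matters at the scale of 64 counties.

```python
import itertools
import numpy as np

# Hypothetical toy network: a 3-node chain, each node with 2 states.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
n_states = 2

rng = np.random.default_rng(0)
theta_node = rng.normal(size=(len(nodes), n_states))            # theta_i(s)
theta_edge = rng.normal(size=(len(edges), n_states, n_states))  # theta_ij(s,t)

def score(assignment):
    """theta^T f for one full assignment -- the binary program's objective."""
    s = sum(theta_node[i, assignment[i]] for i in nodes)
    s += sum(theta_edge[k, assignment[i], assignment[j]]
             for k, (i, j) in enumerate(edges))
    return s

# MAP inference by brute force: try every joint assignment.
best = max(itertools.product(range(n_states), repeat=len(nodes)), key=score)
print("MAP assignment:", best)
```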
We consider each of the 64 counties of Colorado to be a node. Two counties are connected by an edge if they are geographically adjacent (see figure 2). The state of a county is the percentage of its voters in 2012 who will vote for Barack Obama (precisely, it is the number of voters voting for Obama divided by the number voting for either Obama or Romney). For computational efficiency, we clamp these percentages to the range [20, 79]. In order to use this Markov network to predict the 2012 election, we must know the model parameters θ.
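The graph construction and state discretization described above can be sketched as follows. The three-county adjacency sample is hypothetical (the real model uses all 64 Colorado counties), but the clamping to [20, 79] matches the text.

```python
# Hypothetical sample of the county adjacency structure (real model: 64 counties).
adjacency = {
    "Denver":   ["Adams", "Arapahoe"],
    "Adams":    ["Denver", "Arapahoe"],
    "Arapahoe": ["Denver", "Adams"],
}
# Deduplicate undirected edges by sorting each endpoint pair.
edges = {tuple(sorted((a, b))) for a, nbrs in adjacency.items() for b in nbrs}

def state(obama_votes, romney_votes):
    """Node state: Obama's two-party vote share, clamped to [20, 79]."""
    pct = round(100 * obama_votes / (obama_votes + romney_votes))
    return min(max(pct, 20), 79)

print(sorted(edges))   # 3 undirected edges among the sample counties
print(state(60, 40))   # -> 60
print(state(95, 5))    # clamped to 79
```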
In order to learn the model parameters, we examine presidential elections from 1960 to 2008. We note that 1960 was chosen somewhat arbitrarily but roughly represents the end of an old political era. The historical observations (call them x̂₁, …, x̂ₘ) are used in a maximum likelihood estimation. This optimization problem searches for the most likely model parameters given the observed data:

θ* = argmax_θ  Σₘ [ θᵀ f(x̂ₘ) − log Z(θ) ]
Technically, our equation above gives the log-likelihood of a set of parameters θ, but this is unimportant since the logarithm is a monotonic function. Z(θ) is the partition function, i.e. the sum over all possible outcomes given a set of parameters: Z(θ) = Σₓ exp(θᵀ f(x)).
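The log-likelihood and partition function can be computed exactly for a model small enough to enumerate. The sketch below uses a hypothetical 2-node, 2-state model with made-up parameters and observations; it is illustrative only, since Z(θ) is intractable to enumerate for the 64-county network.

```python
import itertools
import numpy as np

theta = np.array([0.5, -0.2, 0.3])   # hypothetical parameters

def features(x):
    """Indicator features f(x): node 0 in state 1, node 1 in state 1, both in state 1."""
    return np.array([x[0] == 1, x[1] == 1, x[0] == 1 and x[1] == 1], dtype=float)

def log_Z(theta):
    # Z(theta) = sum over all possible outcomes of exp(theta^T f(x))
    return np.log(sum(np.exp(theta @ features(x))
                      for x in itertools.product([0, 1], repeat=2)))

def log_likelihood(theta, data):
    # sum over observations m of [ theta^T f(x_m) - log Z(theta) ]
    return sum(theta @ features(x) for x in data) - len(data) * log_Z(theta)

data = [(1, 1), (1, 0), (1, 1)]      # hypothetical historical observations
print(log_likelihood(theta, data))
```

With all-zero parameters every outcome has weight 1, so Z is just the number of outcomes (4 here), a quick sanity check on the enumeration.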