Friday, June 27, 2008

Indiana

Right now, many sites are bullish on Obama's chances in Indiana. This should surprise anyone with historical perspective, since IN has been so consistently Republican in presidential elections. During the 20th and 21st centuries, IN has voted for Democrats only in the Roosevelt and Johnson landslides and in the weird 1912 election, when Wilson won IN with a 43% plurality. Throughout the '90s, IN voted against Bill Clinton, even as Clinton won every surrounding state.

Here's a table of how various people view IN:

| Source | Estimate of Obama lead | Probability of Obama win (%) |
|---|---|---|
| me | -5.9 | 9.8 |
| 538 | -1.3 | 43 |
| Pollster | +1 | n/a |
| most recent poll | +1 | n/a |
| Intrade | n/a | 33-37 |

I'm clearly the outlier here, but I have the courage of my convictions, and maintain that mine is the most reasonable view of IN. Let's look at the available IN data:

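| Pollster | Dates | Sample size | Obama lead |
|---|---|---|---|
| Survey USA | 6/21-23 | 627 | +1.0 |
| IN Leg Thought | 5/27-6/1 | 601 | -9 |
| Downs Ctr | 4/28-30 | 1274 | +1 |
| Research 2000 | 4/23-24 | 600 | -8 |
| Selzer-Star-WTHR | 4/20-23 | 384 | +8 |
| Downs Ctr | 4/14-16 | 1254 | -7 |
| Survey USA | 2/26-28 | 579 | -8.9 |
| Survey USA | 2/3-4 | 499 | -10.2 |
| Bush vs. Kerry | 11/2/2004 | 2.5MM | -20.7 |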

And here's an aggregation by month. The "margin of error" is a 95% certainty range for the difference between the candidates, assuming a -100% correlation between the two. (This isn't a perfect assumption, and not what I use in my model, but should be good enough.)

| Month | Aggregate sample size | Aggregate Obama lead | Margin of error |
|---|---|---|---|
| June | 627 | +1.0 | 7.8 |
| May | 601 | -9 | 8.0 |
| April | 3512 | -1.9 | 3.3 |
| February | 1078 | -9.5 | 6.0 |
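The margin-of-error column can be reproduced with a short calculation. This is a minimal sketch, not the model's actual code: it assumes each candidate's share is close to 50% (the worst case for sampling error) and applies the -100% correlation described above, which makes the standard error of the spread twice that of a single share. The function name `margin_of_error` is made up for illustration.

```python
import math


def margin_of_error(sample_size: int, z: float = 1.96) -> float:
    """95% margin of error, in points, for the D-R spread.

    Assumption: each candidate's share is near 50%, and the two shares
    are perfectly negatively correlated (-100%), so the standard error
    of the spread is twice the standard error of one share.
    """
    share_se = math.sqrt(0.5 * 0.5 / sample_size)  # standard error of one share
    spread_se = 2 * share_se                       # -100% correlation doubles it
    return 100 * z * spread_se


# Aggregate sample sizes taken from the monthly table.
for month, n in [("June", 627), ("May", 601), ("April", 3512), ("February", 1078)]:
    print(f"{month}: n={n}, margin of error = {margin_of_error(n):.1f}")
```

Under these assumptions the formula simplifies to roughly 196 divided by the square root of the sample size, which is why the four April polls together give a much tighter range than any single month with one poll.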

The story I get from this table is that there is not much information on IN. Just from looking at the IN polls, one could come up with several reasonable hypotheses, for example:

  • A moderate McCain lead stretching back a few months, with some small fluctuations through the spring and some pro-Obama shift on net since February. In this view, there's been significant error in a few polls, although not so great as to be incommensurate with the polls' published sample errors. This is roughly the view I have, with my estimate that McCain leads today by 5.9 +/- 1.7.

  • Obama has had a big surge from May to June, and the race is now essentially tied. All the polls have been valid, but the race is volatile. This roughly seems to be the view that 538 has, and it's implicitly the view of anyone who always uses the most recent poll, such as many reporters or the map at MyDD.

  • The race has been essentially tied for at least a couple months. The polls that were good for Obama are good, and the polls that were good for McCain are bad. I see no evidence to support this idea, but it might not be wrong.
  • McCain has led substantially for at least a couple months. The polls that were good for McCain are good, and the polls that were good for Obama are bad. I see no evidence to support this idea, but it might not be wrong.

Now let's look at what 538 and I think are the changes in the D-R spread since 2004 in IN and nearby states. (I don't mean to pick on 538. I'm using it because it's the most sophisticated example of a site that takes the second viewpoint.)

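| State | 2004 spread | Change per me | Change per 538 |
|---|---|---|---|
| IN | -20.7 | +14.8 | +19.4 |
| IL | +10.3 | +14.4 | +8.8 |
| MI | +4.4 | +3.3 | +0.2 |
| OH | -2.1 | +7.4 | +6.5 |
| KY | -19.9 | +6.2 | +2.8 |
| US | -2.5 | +8.3 | +6.1 |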

In order to believe 538's results, you must believe that Obama has improved by much more in IN than in the US as a whole, or in any surrounding state. You must even believe that there has been more than twice as much pro-Obama swing in IN as in Obama's home state of IL. And most importantly, you must believe all this on the basis of scant recent polling in the state -- only 1228 people polled since the IN primary. To me, this state of affairs seems neither plausible nor well-supported by evidence.

By contrast, I think my results make common sense. The distribution of shifts across the states makes sense: Obama does best in his home state of IL and in neighboring IN, does middlingly in OH and KY (states where he struggled in the primary), and does poorly in MI (where he snubbed the primary entirely). Furthermore, none of my current results are inconsistent with available polling.
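The comparison of swings can be checked with simple arithmetic on the table of 2004 spreads and changes. The sketch below only restates numbers from that table; the variable and function names (`swings`, `implied_spread`) are made up for illustration and are not part of either model.

```python
# (2004 spread, change per this site, change per 538), all in points of D-R spread.
swings = {
    "IN": (-20.7, 14.8, 19.4),
    "IL": (10.3, 14.4, 8.8),
    "MI": (4.4, 3.3, 0.2),
    "OH": (-2.1, 7.4, 6.5),
    "KY": (-19.9, 6.2, 2.8),
    "US": (-2.5, 8.3, 6.1),
}


def implied_spread(base_2004: float, change: float) -> float:
    """Current D-R spread implied by a 2004 spread plus a change since 2004."""
    return round(base_2004 + change, 1)


for state, (base, change_me, change_538) in swings.items():
    print(state, implied_spread(base, change_me), implied_spread(base, change_538))

# How much bigger is the IN swing than the IL swing under each view?
ratio_538 = swings["IN"][2] / swings["IL"][2]
ratio_me = swings["IN"][1] / swings["IL"][1]
gap_538 = round(swings["IN"][2] - swings["IL"][2], 1)
print(f"IN/IL swing ratio: 538 = {ratio_538:.2f}, me = {ratio_me:.2f}; 538 gap = {gap_538}")
```

The IN row recovers the two headline estimates from the first table (McCain ahead by 5.9 here, by 1.3 at 538), and the IN-versus-IL gap under 538's numbers is the roughly 11-point divergence discussed in this post.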

The reason that my results don't get out of whack with each other or with polling data is that my methodology is centered around two fundamental principles:

  1. Output should be based on data, i.e., polls.
  2. Opinion shifts in different states are highly correlated.

Sometimes these principles conflict, and there are certainly cases where I predict that opinion has shifted differently in different states -- see IN and MI above for example. But in cases like IN/MI, the divergence is strongly rooted in available data. My methodology would rarely (if indeed ever) predict an 11-point divergence in the IL and IN shifts with only 2 stale polls in one state and 8 scattered polls in the other.

4 Comments:

Blogger Jason said...

You and 538 use fairly similar methodologies. You've given a somewhat qualitative argument as to why you think IN isn't as strong for Obama as other sources say, but what is the difference between your model and 538's that gives you different results on IN, but less so on other states? Is this a free parameter that differs between you two, or is it more fundamental?

June 28, 2008 2:47 PM  
Blogger Benjamin Schak said...

I think there are two things at play:

1) I explicitly assume a very high (90%+) level of correlation between the change in public opinion in IN and the change in public opinion in neighboring states. 538 does something to adjust the results of states when other state/national polls show shifts. Also, I think that the work shown here is roughly equivalent to assuming a fairly low correlation between states. (The implied correlation would be highest for states with high values of m.) Since the IN-specific polls show Obama doing better than the non-IN-specific polls suggest, I am less optimistic about Obama in IN than 538 is. And taking this to the extreme, a site like electoral-vote.com implicitly assumes a 0 correlation between states, so it has been even more optimistic than 538.

Whether I completely understand 538's m or not, I'm pretty sure that it is related to some second-order statistic, like volatility or correlation or covariance or beta. For each state, Nate does some sort of regression to obtain the best estimate of m. The problem I see with this is that estimates of second-order statistics from small-to-medium data sets tend to be really bad. This is why PA and OH have very different m-values for 538 even though they probably behave similarly in reality.

If I were to improve 538's method for ascertaining m, I would first figure out what I think m conceptually represents, and what underlying factors affect the size of m. Then I would run a regression of m on those underlying factors to get a formula for m, so that similar states have similar m values.

What I did to get my correlation assumptions is this:
a) Run my process on the 2004 election assuming 0 correlations, and measure the correlation of movements in different states. (Because of the small sample size, this gives awful results for many of the state-pairs. Stopping at this step would be basically the same mistake that 538 is making with its m.)
b) Although many individual correlations from step (a) are bad, they tend to show that states that are near each other are more strongly correlated, and that states with similar racial compositions are more strongly correlated. So, I run a regression to get a formula for correlation as a function of distance and racial similarity.
c) Go back to step (a) using correlations from the formula in step (b), and repeat the process. Continue doing this until the correlation formula converges on something.
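As a rough illustration of steps (a) through (c), here is a minimal sketch of the loop structure. It is not the actual code behind this site: the real "process" is replaced by a made-up stand-in (`toy_measure`), the state pairs and distances are invented placeholders, and the formula uses distance only, leaving out racial similarity for brevity. All names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]
Line = Tuple[float, float]  # (intercept, slope): correlation = intercept + slope * distance


def fit_line(xs: List[float], ys: List[float]) -> Line:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def iterate_correlation_formula(
    measure: Callable[[Dict[Pair, float]], Dict[Pair, float]],
    distances: Dict[Pair, float],
    tol: float = 1e-9,
    max_iter: int = 200,
) -> Line:
    """Repeat: assume correlations, measure them, refit the formula, until stable."""
    pairs = list(distances)
    formula: Line = (0.0, 0.0)  # step (a): start from zero correlation everywhere
    for _ in range(max_iter):
        assumed = {p: formula[0] + formula[1] * distances[p] for p in pairs}
        measured = measure(assumed)  # noisy pair-by-pair correlations
        # Step (b): smooth the noisy pairwise values with a regression on distance.
        new = fit_line([distances[p] for p in pairs], [measured[p] for p in pairs])
        if max(abs(new[0] - formula[0]), abs(new[1] - formula[1])) < tol:
            return new
        formula = new  # step (c): feed the formula back in and repeat
    return formula


# Everything below is invented purely to exercise the loop.
toy_distances = {("A", "B"): 0.2, ("A", "C"): 0.3, ("A", "D"): 0.25, ("B", "C"): 0.5}
toy_noise = {("A", "B"): 0.05, ("A", "C"): -0.04, ("A", "D"): 0.02, ("B", "C"): -0.01}


def toy_measure(assumed: Dict[Pair, float]) -> Dict[Pair, float]:
    """Stand-in for a model run: halfway between the assumption and a noisy 'truth'."""
    return {
        p: 0.5 * assumed[p] + 0.5 * (0.95 - 0.2 * toy_distances[p] + toy_noise[p])
        for p in assumed
    }


intercept, slope = iterate_correlation_formula(toy_measure, toy_distances)
print(f"correlation ~ {intercept:.3f} + {slope:.3f} * distance")
```

The point of the regression step is that the formula, not the individual noisy pair estimates, is what gets carried into the next pass, so similar pairs of states end up with similar assumed correlations.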

(FYI, I think I did some technical details stupidly in coming up with my correlations, and I'll be re-doing this at some point, probably using 2008 data instead of 2004. But I think that the main point is that correlations will be very high even after I fix these details, so this is a low priority.)

2) Secondarily, in a couple of places, I think that 538 either explicitly or implicitly assumes a higher volatility in public opinion than I do. One implication of this is that recent polls are given more weight relative to stale polls there than here, and the one recent IN poll has Obama in the lead. I'm actually not very sure whether I'm assuming the right level of volatility. My next big methodological project is to figure out a systematic way of finding the right volatility to assume.

June 28, 2008 5:21 PM  
Anonymous hersco said...

You make an argument based, somewhat, on the history of voting patterns in Indiana in presidential elections. It certainly makes sense to do this, but we should also note that in 2006, we saw a larger trend toward Democrats in Indiana than we saw in most other states by at least one measure; one-third of the House seats flipped. If Indianans like how their new delegation is performing, it seems at least possible to me that Indiana may be more open to voting for Obama, and we may see larger shifts in Indiana than in neighboring states.

I certainly haven't done any formal analysis on the subject, but I thought I'd throw it out there for your comment.

June 28, 2008 10:39 PM  
Blogger Benjamin Schak said...

That's a fair point. I think that your argument makes a lot of sense that IN will swing more in 2008 than most states (although this logic doesn't seem to apply to NH, which changed even more spectacularly from 2004 to 2006). Indeed, I currently predict it swinging more than any surrounding state except perhaps IL. I just don't think it's plausible that it'll swing twice as much as Obama's home state of IL.

By the way, I took a glance at the swings in the IN congressional races between 2004 and 2006. The districts averaged a swing of +14, and the five districts with rematches of 2004 averaged a swing of +11. (Most of this difference is because Souder had a far stronger challenger in 2006.) I think this historical swing is in the right range for either my result or 538's to be plausible. What I think makes 538's result implausible is that it's such an extreme outlier.

June 28, 2008 11:17 PM  
