Race to 270: Indiana

Friday, June 27, 2008

Indiana

Right now, many sites are bullish on Obama's chances in Indiana. This should surprise anyone with historical perspective, since IN has been so consistently Republican in presidential elections. During the 20th and 21st centuries, IN has voted for Democrats only in the Roosevelt and Johnson landslides, and in the weird 1912 election, when Wilson won IN with a 43% plurality. Throughout the '90s, IN voted against Bill Clinton, even as Clinton won every surrounding state.

Here's a table of how various people view IN:

Source	Estimate of Obama lead (points)	Probability of Obama win (%)
me	-5.9	9.8
538	-1.3	43
Pollster	+1	n/a
most recent poll	+1	n/a
Intrade	n/a	33-37

I'm clearly the outlier here, but I have the courage of my convictions, and maintain that mine is the most reasonable view of IN. Let's look at the available IN data:

Pollster	Dates	Sample size	Obama lead
Survey USA	6/21-23	627	+1.0
IN Leg Thought	5/27-6/1	601	-9
Downs Ctr	4/28-30	1274	+1
Research 2000	4/23-24	600	-8
Selzer-Star-WTHR	4/20-23	384	+8
Downs Ctr	4/14-16	1254	-7
Survey USA	2/26-28	579	-8.9
Survey USA	2/3-4	499	-10.2
Bush vs. Kerry	11/2/2004	2.5MM	-20.7

And here's an aggregation by month. The "margin of error" is a 95% confidence range for the difference between the candidates, assuming a -100% correlation between the two candidates' shares. (This isn't a perfect assumption, and it's not what I use in my model, but it should be good enough.)

Month	Aggregate sample size	Aggregate Obama lead	Margin of error (+/- points)
June	627	+1.0	7.8
May	601	-9	8.0
April	3512	-1.9	3.3
February	1078	-9.5	6.0
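
The margin-of-error column can be approximated with a short calculation. This is a minimal sketch, not the model's actual code. It assumes each candidate is near 50% (so the standard error of one share is about 0.5/sqrt(n)), a perfectly negative correlation between the two shares (so the standard error of the lead is twice that of a single share), and a normal z-value of 1.96. The function name `lead_margin_of_error` is made up for illustration.

```python
import math


def lead_margin_of_error(n, z=1.96, share=0.5):
    """Approximate 95% margin of error (in points) for the lead in a sample of n.

    Assumes the two candidates' shares are perfectly negatively correlated,
    so the standard error of the lead is twice that of a single share.
    """
    se_share = math.sqrt(share * (1.0 - share) / n)
    return 100.0 * z * 2.0 * se_share


# Aggregate sample sizes copied from the monthly table above.
monthly_samples = {"June": 627, "May": 601, "April": 3512, "February": 1078}

for month, n in monthly_samples.items():
    print(f"{month}: +/- {lead_margin_of_error(n):.1f}")
```

Under these assumptions the results round to the values in the table (7.8, 8.0, 3.3 and 6.0).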

The story I get from this table is that there is not much information on IN. Just from looking at the IN polls, one could come up with several reasonable hypotheses, for example:

A moderate McCain lead stretching back a few months, with some small fluctuations through the spring and some pro-Obama shift on net since February. In this view, there's been significant error in a few polls, although not so great as to be incommensurate with the polls' published sampling errors. This is roughly the view I have, with my estimate that McCain leads today by 5.9 +/- 1.7.
Obama has had a big surge from May to June, and the race is now essentially tied. All the polls have been valid, but the race is volatile. This seems to be roughly the view that 538 has, and it's implicitly the view of people who always use the most recent poll, such as many reporters or the map at MyDD.
The race has been essentially tied for at least a couple months. The polls that were good for Obama are good, and the polls that were good for McCain are bad. I see no evidence to support this idea, but it might not be wrong.
McCain has led substantially for at least a couple months. The polls that were good for McCain are good, and the polls that were good for Obama are bad. I see no evidence to support this idea, but it might not be wrong.

Now let's look at what 538 and I think are the changes in the D-R spread since 2004 in IN and nearby states. (I don't mean to pick on 538. I'm using it because it's the most sophisticated example of a site that takes the second viewpoint.)

State	2004 spread	Change per me	Change per 538
IN	-20.7	+14.8	+19.4
IL	+10.3	+14.4	+8.8
MI	+4.4	+3.3	+0.2
OH	-2.1	+7.4	+6.5
KY	-19.9	+6.2	+2.8
US	-2.5	+8.3	+6.1

In order to believe 538's results, you must believe that Obama has improved by much more in IN than in the US as a whole or in any surrounding state. You must even believe that there has been twice as much pro-Obama swing in IN as in Obama's home state of IL. And most importantly, you must believe all this on the basis of scant recent polling in the state -- only 1228 people polled since the IN primary. To me, this state of affairs seems neither plausible nor well-supported by evidence.
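
The arithmetic behind this comparison is simple enough to check directly. The sketch below takes the numbers from the table above, adds each source's change to the 2004 spread to get the implied current spread, and compares the IN swing with the IL swing. The dictionary layout and function names are chosen only for illustration.

```python
# 2004 D-R spread and estimated change since 2004, copied from the table above.
# Tuple layout: (2004 spread, change per me, change per 538).
swings = {
    "IN": (-20.7, 14.8, 19.4),
    "IL": (10.3, 14.4, 8.8),
    "MI": (4.4, 3.3, 0.2),
    "OH": (-2.1, 7.4, 6.5),
    "KY": (-19.9, 6.2, 2.8),
    "US": (-2.5, 8.3, 6.1),
}

ME, FIVE_THIRTY_EIGHT = 0, 1  # index of each source within the tuple


def implied_spread(state, source):
    """Current D-R spread implied by adding a source's change to 2004."""
    base = swings[state][0]
    return round(base + swings[state][1 + source], 1)


def in_minus_il(source):
    """How many points more IN has swung than IL, according to a source."""
    return round(swings["IN"][1 + source] - swings["IL"][1 + source], 1)


for state in swings:
    print(state, implied_spread(state, ME), implied_spread(state, FIVE_THIRTY_EIGHT))

print("IN minus IL swing, me: ", in_minus_il(ME))
print("IN minus IL swing, 538:", in_minus_il(FIVE_THIRTY_EIGHT))
```

The implied IN spreads come out to the -5.9 and -1.3 quoted in the first table, and the IN-minus-IL gap is 0.4 points in my numbers versus 10.6 points in 538's.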

By contrast, I think my results make common sense. The distribution of shifts across the states makes sense: Obama does best in his home state of IL and in neighboring IN, moderately well in OH and KY (states where he struggled in the primary), and poorly in MI (where he snubbed the primary entirely). Furthermore, none of my current results are inconsistent with available polling.

The reason that my results don't get out-of-whack with each other or with polling data is that my methodology is centered around two fundamental principles:

Output should be based on data, i.e., polls.
Opinion shifts in different states are highly correlated.

Sometimes these principles conflict, and there are certainly cases where I predict that opinion has shifted differently in different states -- see IN and MI above for example. But in cases like IN/MI, the divergence is strongly rooted in available data. My methodology would rarely (if indeed ever) predict an 11-point divergence in the IL and IN shifts with only 2 stale polls in one state and 8 scattered polls in the other.

Labels: indiana

4 Comments:

Jason said...: You and 538 use fairly similar methodologies. You've given a somewhat qualitative argument as to why you think IN isn't as strong for Obama as other sources say, but what is the difference between your model and 538's that gives you different results on IN, but less so on other states? Is this a free parameter that differs between you two, or is it more fundamental?; June 28, 2008 2:47 PM
Benjamin Schak said...: I think there are two things at play:

1) I explicitly assume a very high (90%+) level of correlation between the change in public opinion in IN and the change in public opinion in neighboring states. 538 does something to adjust the results of states when other state/national polls show shifts. Also, I think that the work shown here is roughly equivalent to assuming a fairly low correlation between states. (The implied correlation would be highest for states with high values of m.) Since the IN-specific polls show Obama doing better than the non-IN-specific polls suggest, I am less optimistic about Obama in IN than 538 is. And taking this to the extreme, a site like electoral-vote.com implicitly assumes a 0 correlation between states, so it has been even more optimistic than 538.

Whether I completely understand 538's m or not, I'm pretty sure that it is related to some second-order statistic, like volatility or correlation or covariance or beta. For each state, Nate does some sort of regression to obtain the best estimate of m. The problem I see with this is that estimates of second-order statistics from small-to-medium data sets tend to be really bad. This is why PA and OH have very different m-values for 538 even though they probably behave similarly in reality.

If I were to improve 538's method for ascertaining m, I would first figure out what I think m conceptually represents and what underlying factors affect its size. Then I would use a regression to get a formula for m in terms of those underlying factors, so that similar states have similar m values.

What I did to get my correlation assumptions is this:
a) Run my process on the 2004 election assuming 0 correlations, and measure the correlation of movements in different states. (Because of the small sample size, this gives awful results for many of the state-pairs. Stopping at this step would be basically the same mistake that 538 is making with its m.)
b) Although many individual correlations from step (a) are bad, they tend to show that states that are near each other are more strongly correlated, and that states with similar racial compositions are more strongly correlated. So, I run a regression to get a formula for correlation as a function of distance and racial similarity.
c) Go back to step (a) using correlations from the formula in step (b), and repeat the process. Continue doing this until the correlation formula converges on something.
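
Step (b) above can be sketched as an ordinary least-squares fit. This is only an illustration under stated assumptions, not the actual code behind the model: it assumes a linear form, correlation = b0 + b1*distance + b2*similarity, and the state-pair numbers in the example are invented. The helper names (`solve_linear_system`, `fit_correlation_formula`, `predicted_correlation`) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_correlation_formula(pairs):
    """Least-squares fit of corr = b0 + b1*distance + b2*similarity.

    pairs: list of (distance, similarity, measured_correlation) tuples,
    one per state-pair, where measured_correlation comes from step (a).
    """
    rows = [(1.0, d, s) for d, s, _ in pairs]
    y = [c for _, _, c in pairs]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predicted_correlation(beta, distance, similarity):
    """Smoothed correlation for a state-pair, clamped to [-1, 1] as a guard."""
    value = beta[0] + beta[1] * distance + beta[2] * similarity
    return max(-1.0, min(1.0, value))


# Invented example: (distance, racial similarity, noisy measured correlation).
noisy_pairs = [
    (0.1, 0.9, 0.97),
    (0.3, 0.8, 0.85),
    (1.0, 0.4, 0.60),
    (2.0, 0.7, 0.55),
    (2.5, 0.2, 0.20),
    (0.5, 0.5, 0.75),
]
beta = fit_correlation_formula(noisy_pairs)
print("fitted coefficients:", [round(b, 3) for b in beta])
print("smoothed correlation for a nearby, similar pair:",
      round(predicted_correlation(beta, 0.2, 0.9), 3))
```

Step (c) would wrap this in a loop: rerun the poll-aggregation process using the smoothed correlations, re-measure the pairwise correlations, refit, and stop once the fitted coefficients stop changing.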

(FYI, I think I did some technical details stupidly in coming up with my correlations, and I'll be re-doing this at some point, probably using 2008 data instead of 2004. But I think that the main point is that correlations will be very high even after I fix these details, so this is a low priority.)

2) Secondarily, in a couple of places, I think that 538 either explicitly or implicitly assumes a higher volatility in public opinion than I do. One implication of this is that recent polls are given more weight relative to stale polls there than here, and the one recent IN poll has Obama in the lead. I'm actually not very sure whether I'm assuming the right level of volatility, and my next big methodological project is to figure out a systematic way of finding the right volatility to assume.; June 28, 2008 5:21 PM
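
To see why the assumed volatility changes the weight on recent polls, here is a toy calculation. It is not either site's actual method. It assumes opinion follows a random walk with a daily standard deviation `daily_vol` (in points), so a poll that ended some days ago has an error variance equal to its sampling variance plus `daily_vol**2` times its age, and polls are combined with inverse-variance weights. The volatility values are arbitrary, chosen only to show the direction of the effect.

```python
from datetime import date


def poll_weights(polls, today, daily_vol):
    """Inverse-variance weights for polls under a random-walk assumption.

    polls: list of (sample_size, end_date) tuples.
    Returns weights in the same order, summing to 1.
    """
    precisions = []
    for n, end_date in polls:
        # Variance of the lead in points^2, assuming a near 50/50 race and a
        # -100% correlation between the candidates' shares.
        sampling_var = 10000.0 / n
        age_days = (today - end_date).days
        drift_var = daily_vol ** 2 * age_days
        precisions.append(1.0 / (sampling_var + drift_var))
    total = sum(precisions)
    return [p / total for p in precisions]


# The two IN polls taken since the primary: sample size and last field date.
in_polls = [(627, date(2008, 6, 23)), (601, date(2008, 6, 1))]
today = date(2008, 6, 27)

for vol in (0.1, 0.5, 1.0):  # arbitrary daily volatilities, in points
    recent, stale = poll_weights(in_polls, today, vol)
    print(f"daily_vol={vol}: recent poll weight {recent:.2f}, stale poll weight {stale:.2f}")
```

The higher the assumed volatility, the larger the share of the weight that goes to the most recent poll.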
hersco said...: You make an argument based, somewhat, on the history of voting patterns in Indiana in presidential elections. It certainly makes sense to do this, but we should also note that in 2006, we saw a larger trend toward Democrats in Indiana than we saw in most other states by at least one measure; one-third of the House seats flipped. If Indianans like how their new delegation is performing, it seems at least possible to me that Indiana may be more open to voting for Obama, and we may see larger shifts in Indiana than in neighboring states.

I certainly haven't done any formal analysis on the subject, but I thought I'd throw it out there for your comment.; June 28, 2008 10:39 PM
Benjamin Schak said...: That's a fair point. I think that your argument makes a lot of sense that IN will swing more in 2008 than most states (although this logic doesn't seem to apply to NH, which changed even more spectacularly from 2004 to 2006). Indeed, I currently predict it swinging more than any surrounding state except perhaps IL. I just don't think it's plausible that it'll swing twice as much as Obama's home state of IL.

By the way, I took a glance at the swings in the IN congressional races between 2004 and 2006. The districts averaged a swing of +14, and the five districts with rematches of 2004 averaged a swing of +11. (Most of this difference is because Souder had a far stronger challenger in 2006.) I think this historical swing is in the right range to make either my result or 538's plausible. What I think makes 538's result implausible is that it's such an extreme outlier.; June 28, 2008 11:17 PM

Place	# of EV	Vote Est.	Win Prob.
National
US-Popular	-	53	100
US-Electoral	[538]	366	100

No Swing States

New England
ME	2	58	100
ME1	1	59	100
ME2	1	56	99
NH	4	54	100
VT	3	65	100
MA	12	60	100
RI	4	63	100
CT	7	58	100
Mid Atlantic
NY	31	61	100
NJ	15	57	100
PA	21	55	100
DE	3	59	100
MD	10	61	100
DC	3	91	100
Eastern South
VA	13	53	100
WV	5	47	1
NC	15	51	78
SC	8	45	0
GA	15	47	0
FL	27	51	95
Western South
KY	8	44	0
TN	11	43	0
AL	9	41	0
MS	6	46	0
AR	6	45	0
LA	9	46	0
TX	34	46	0
Great Lakes
OH	20	52	100
MI	17	56	100
IN	11	51	65
IL	21	62	100
WI	10	55	100
MN	10	56	100
Plains
IA	7	56	100
MO	11	51	76
ND	3	50	45
SD	3	46	1
NE	2	43	0
NE1	1	46	4
NE2	1	48	10
NE3	1	35	0
KS	6	44	0
OK	7	37	0
Mountain West
MT	3	49	12
ID	4	38	0
WY	3	39	0
CO	9	54	100
UT	5	37	0
NM	5	55	100
AZ	10	47	0
NV	5	52	99
AK	3	44	0
Pacific
WA	11	57	100
OR	7	57	100
CA	55	61	100
HI	4	69	100

Obama electoral win (269+)	100.0
Obama electoral landslide (369+)	53.9
Obama electoral avalanche (469+)	0.0
Electoral tie (269-269)	0.0
McCain electoral win (270+)	0.0
McCain electoral landslide (369+)	0.0
McCain electoral avalanche (469+)	0.0

Popular/electoral split	0.0
Obama popular, McCain electoral	0.0
McCain popular, Obama electoral	0.0

Likeliest result (378-160 Obama)	20.5

Mean EV	365.7
Median EV	375
Mode EV	375
95% confidence range for EV	311-382

Mean PV	53.3
95% confidence range for PV	51.8-54.9

Current Estimates — updated 11/3
