
Thursday, July 31, 2008

Exciting news about parameters

Possibly the most important parameters in my methodology are the assumed volatility of public opinion and the formula for the assumed correlations between states. For most of this year, I've been assuming that one standard deviation of one-day national public opinion change is 0.2% and that the correlation between states is 95% or higher, numbers I chose because they looked roughly right from the data I saw and because they gave results that made sense to me. However, ever since I saw the graph on this post from 538, I've worried that the volatility assumption was somewhat too low. So, over the weekend, I came up with a reasonable way to assess these parameters from available 2008 data. More on that below the fold.
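
To make the role of these two parameters concrete, here is a minimal sketch of how correlated day-to-day movement could be simulated. It is illustrative only: the post doesn't spell out how national volatility translates into state-level moves, so the daily standard deviation below is applied directly to each state's margin, and the function name and arguments are mine.

    import numpy as np

    def simulate_daily_shifts(n_states, n_days, daily_sd=0.002, rho=0.95, seed=0):
        """Draw one-day shifts in the Obama-McCain margin for each state, where
        every pair of states shares the same correlation rho and each state's
        one-day shift has standard deviation daily_sd (0.2% = 0.002 here)."""
        rng = np.random.default_rng(seed)
        cov = np.full((n_states, n_states), rho * daily_sd ** 2)  # off-diagonal covariances
        np.fill_diagonal(cov, daily_sd ** 2)                      # per-state variances
        return rng.multivariate_normal(np.zeros(n_states), cov, size=n_days)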

The main point is, now that I have a better grasp of the parameters I'm using, I have much greater confidence in my analysis than I ever have had before. Additionally, I adjusted my estimates of voter turnout by scaling each state's 2004 turnout up by the growth in its voting-eligible population.
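
For the turnout adjustment, my reading of that sentence is a simple proportional scaling by the growth in voting-eligible population (VEP); the helper below is just that reading spelled out, with hypothetical names.

    def projected_turnout(turnout_2004, vep_2004, vep_2008):
        """Scale a state's 2004 turnout by the growth in its
        voting-eligible population between 2004 and 2008."""
        return turnout_2004 * (vep_2008 / vep_2004)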

(By the way, I now believe that the graph I linked to above is somewhat misleading. I was only 10 in 1992, but as I recall Ross Perot's share of the vote plummeted after he bizarrely left and re-entered the race. I believe this is what produces the high-error dots early in the 1992 campaign.)

The key insight for setting parameters is this. Say I run the algorithm to generate estimates for each day of the campaign. If the parameters are set correctly, then the polling data should tend to equal those estimates, plus or minus a predictable error range. If the polling data are skewed to one side of the estimates, or if they match the estimates either more closely or less closely than expected, then the parameters are bad.

For each date during the campaign and for each geographical unit, my algorithm produces (among other data) an estimate of the Obama-McCain difference in public opinion and the uncertainty of that estimate. Now suppose A is the uncertainty of my estimate on the polling date (i.e., one standard deviation of the difference between my estimate and the platonic truth), B is the poll's sample error, and C is the poll's non-sample error (which I assume is 2%). These are the three components of the difference between my estimate of the Obama-McCain difference (E) and the poll's estimate of it (P), and it's reasonable to assume the three sources are independent. So I expect P to be normally distributed with mean E and standard deviation sqrt(A^2+B^2+C^2).
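
Here is how that comparison might be standardized in code. The post doesn't say how the poll's sample error B is computed, so deriving it from the sample size with the margin-of-a-difference formula (with shares near 50%) is my assumption, as are the function and argument names.

    import math

    def standardized_residual(P, E, A, n, p=0.5, C=0.02):
        """z-statistic for one poll: (P - E) / sqrt(A^2 + B^2 + C^2).

        P : the poll's Obama-McCain difference
        E : my estimate of that difference on the poll's date
        A : one-sd uncertainty of the estimate on that date
        n : the poll's sample size (used to derive the sample error B)
        C : assumed non-sample error (2%)
        """
        # Sampling sd of a difference of two shares is about 2*sqrt(p*(1-p)/n),
        # i.e. roughly 1/sqrt(n) when p is near 0.5.
        B = 2.0 * math.sqrt(p * (1.0 - p) / n)
        return (P - E) / math.sqrt(A ** 2 + B ** 2 + C ** 2)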

In other words, if I list out the statistic (P-E)/sqrt(A^2+B^2+C^2) for each poll, I expect the list to follow a standard normal distribution. To measure how close it comes, I first compute the mean (m) and standard deviation (s) of this statistic. Then I evaluate the integral over the real line of [pdf(0,1)-pdf(m,s)]^2, where pdf(x,y) is the probability density function of the normal distribution with mean x and standard deviation y. Lower values of this integral mean the polls are distributed around my estimates the way a well-calibrated model predicts, which means better parameters.
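
This check is straightforward to compute numerically. A sketch, assuming SciPy is available and the standardized residuals from the helper above have been collected into a list:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    def calibration_score(z_values):
        """Integral over the real line of [pdf(0,1) - pdf(m,s)]^2, where m and s
        are the mean and standard deviation of the standardized residuals.
        A score of zero means the residuals fit a standard normal exactly."""
        m = float(np.mean(z_values))
        s = float(np.std(z_values, ddof=1))
        integrand = lambda x: (norm.pdf(x, 0.0, 1.0) - norm.pdf(x, m, s)) ** 2
        score, _ = quad(integrand, -np.inf, np.inf)
        return score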

So far, the best parameters I've found are a daily national volatility of 0.245% and a correlation of 92% between each pair of states.
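
Finding those values presumably comes down to trying candidate parameter pairs and keeping the one with the lowest calibration score. A generic sketch of that search follows; the scoring function, which would re-run the whole-campaign backtest under the given parameters, is left as an argument rather than implemented here.

    import itertools

    def best_parameters(score_fn, volatilities, correlations):
        """Return the (daily volatility, state correlation) pair with the lowest
        calibration score. score_fn(daily_sd, rho) should rebuild the day-by-day
        estimates under those parameters and return the integral described above."""
        return min(itertools.product(volatilities, correlations),
                   key=lambda vc: score_fn(*vc))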
