TPM Café Reader Posts
Weekend General Election Simulations
For the past several weeks I've been taking the polling data Andrew Tanenbaum uses for www.electoral-vote.com and using it as input to Monte Carlo simulations of the general election matchups. State-by-state polling is still erratic at best (some states haven't had polls since the end of February), but since people like to look at the headline electoral vote totals from that site, I think computing averages and winning percentages from the same data provides more value.
I'm using a 4% margin of error on the polls and assuming that sampling error is the only source of variability, so at best this is a snapshot in time of the most recent polls. This is not a particularly good prediction model (once a lead in a state gets beyond the margin of error, the candidate basically wins it all the time in the simulation), but it is interesting to see both how things stand now and how they've changed over the time I've been running the data. I've found that changes in just a few key states can swing winning percentages wildly, and those percentages can be far more extreme than I'd predict for actual probabilities (or, say, than the probabilities implied by political futures markets like the Iowa Electronic Markets or Intrade).
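For anyone who wants to see the mechanics, here is a minimal Python sketch of the kind of simulation described above. It is not the actual code behind these numbers: the function name `simulate`, the input format, and the toy states in the usage example are made up for illustration. It assumes each state's margin is drawn from a normal distribution centered on the polled lead, with a standard deviation of half the margin of error, and that the only two outcomes in a state are a Democratic or a Republican win.

```python
import random

def simulate(states, moe=4.0, trials=10_000, seed=0):
    """Monte Carlo sketch of a two-candidate electoral vote contest.

    states: list of (electoral_votes, dem_lead_in_points) pairs (hypothetical format).
    moe:    95% margin of error on the lead; the standard deviation is moe / 2.
    """
    rng = random.Random(seed)
    sd = moe / 2.0
    total_ev = sum(ev for ev, _ in states)
    dem_wins = rep_wins = ties = 0
    dem_ev_sum = 0
    for _ in range(trials):
        dem_ev = 0
        for ev, lead in states:
            # Perturb the polled lead by sampling error; a positive draw means
            # the Democrat carries the state (an exact zero goes to the
            # Republican, a simplification that only matters when sd is 0).
            if rng.gauss(lead, sd) > 0:
                dem_ev += ev
        dem_ev_sum += dem_ev
        if 2 * dem_ev > total_ev:
            dem_wins += 1
        elif 2 * dem_ev < total_ev:
            rep_wins += 1
        else:
            ties += 1
    return {
        "dem_win_pct": 100.0 * dem_wins / trials,
        "rep_win_pct": 100.0 * rep_wins / trials,
        "tie_pct": 100.0 * ties / trials,
        "dem_avg_ev": dem_ev_sum / trials,
    }

if __name__ == "__main__":
    # Toy input, NOT real polling data: (electoral votes, Democratic lead).
    toy_states = [(55, 12.0), (34, -9.0), (20, 1.0), (27, -1.0)]
    print(simulate(toy_states))
```

Setting `moe=0` reduces this to simply counting each state for its poll leader, and a very large `moe` turns every state into a coin flip, which makes a handy sanity check.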
So here's the most recent data, using 10,000 trials for each simulation*:
Obama wins 81.0%, averages 281.9 EV
McCain wins 18.8%, averages 256.1 EV
Electoral tie 0.2%
Clinton wins 100%, averages 320.9 EV
McCain wins 0%, averages 217.1 EV
No electoral ties
By comparison, electoral-vote.com gives each state to whoever is leading, and gets totals of Obama 266, McCain 248, tied 24 in one matchup, and Clinton 314, McCain 207, tied 16 in the other.
Obama took leads from McCain in Ohio, New Mexico, and New Hampshire, and he pulled even in Virginia. Actually, there were two new polls in Virginia, one from Virginia Commonwealth University giving McCain an 8 point lead, and another (centered on about the same date) from Survey USA giving Obama a 7 point lead. Following the electoral-vote.com algorithm, I'm taking the most recent state poll and averaging it with any other polls from within the same week.
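The averaging step can be sketched roughly as follows. This is my reading of the rule, not code from electoral-vote.com: the function name `current_lead`, the tuple layout, the choice to average each candidate's share before taking the difference, and the exact 7-day window are all assumptions, and the polls in the usage example are invented.

```python
from datetime import date, timedelta

def current_lead(polls, window_days=7):
    """Average the most recent poll with any others from the week before it.

    polls: list of (poll_date, dem_pct, rep_pct) tuples (hypothetical format).
    Returns the averaged Democratic lead in points (negative = Republican lead).
    """
    latest = max(d for d, _, _ in polls)
    cutoff = latest - timedelta(days=window_days)
    # Keep only polls dated within the window ending at the most recent poll.
    recent = [(dem, rep) for d, dem, rep in polls if d >= cutoff]
    dem_avg = sum(dem for dem, _ in recent) / len(recent)
    rep_avg = sum(rep for _, rep in recent) / len(recent)
    return dem_avg - rep_avg

if __name__ == "__main__":
    # Invented polls for illustration only.
    polls = [
        (date(2008, 5, 1), 40.0, 50.0),   # too old, ignored
        (date(2008, 5, 20), 44.0, 46.0),
        (date(2008, 5, 22), 48.0, 44.0),
    ]
    print(current_lead(polls))
```

Two polls that disagree sharply, as in Virginia this week, largely cancel out under this rule, which is how a state can end up close to even.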
Compared with last week, Clinton has now taken leads in Missouri, New Mexico, New Hampshire, and North Carolina(!), so she is now ahead in enough states by enough margins that sampling error alone wouldn't cause her to lose the general election based on the most recent state polls.
Obviously this is Clinton's best showing since I've been tracking this, and Obama's winning percentage also matches his best (although three weeks ago his average electoral vote total was higher, mainly because back then the most recent Texas poll showed just a 1 point lead for McCain, so Obama would win it about 1/3 of the time in the model; now the most recent Texas poll, from May 7th, gives McCain a 9 point lead, guaranteeing him the state in this simulation).
Once the Democratic primary fight ends, I'd expect a bounce for the nominee, although given the tone of the campaign, I think Obama would get a larger one than Clinton would. The current data support the idea that either candidate is in good shape right now, and that McCain is vulnerable.

Comments (11)
why would a candidate win 100% of the time when outside the moe?
don't they normally target the moe to 90% or 95% confidence?
May 25, 2008 12:04 AM | Reply | Permalink
Yes - they target 95% confidence for MOE, which means that in my model a lead at the MOE means you should win the state 97.5% of the time: 2.5% of the time you're actually behind, and the other 2.5% of the time your lead is actually twice the MOE or more. So in this model, for example, McCain is leading Clinton by exactly 4% in Wisconsin, and he won the state in 97.7% of the trials, about what we should expect.
So an individual state at/beyond the MOE means you pretty much always win it. Then when you combine the 51 state contests, it can become almost impossible for the trailing candidate to win overall, which is what happened in the McCain/Clinton matchup.
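The per-state figure can be checked without running any trials. A small sketch, assuming (as in my model) that the true margin is normally distributed around the polled lead with a standard deviation of half the margin of error; the function name `state_win_probability` is just for illustration:

```python
from statistics import NormalDist

def state_win_probability(lead, moe=4.0):
    """Chance that a candidate with the given polled lead (in points) carries
    the state, modeling the true margin as Normal(lead, moe / 2)."""
    sd = moe / 2.0
    # Probability that the true margin is above zero.
    return 1.0 - NormalDist(mu=lead, sigma=sd).cdf(0.0)

if __name__ == "__main__":
    print(round(state_win_probability(4.0), 3))   # lead equal to the MOE: about 0.977
    print(round(state_win_probability(-1.0), 3))  # trailing by 1 point: about 0.309
```

A lead equal to the margin of error is two standard deviations from zero, which is where the 97.7% comes from, and a candidate trailing by 1 point still wins a bit under a third of the time.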
Now that I've done this for a few weeks, I think it would be better to use a larger margin of error, but I'm reluctant to pick a new number for two reasons:
1. Even if the current method overstates the likelihood of the leading candidate winning, there is value in consistency, because then I can compare results over time on an equal basis.
2. If I use a larger margin of error, what should I use? If I use a high enough value, eventually it all approaches a 50/50 coin flip, which is of no interest. But is there some particular reason to pick 5%, 6%, or 8% over some other value? I don't know. Raising margin of error reduces the likelihood of either Democrat winning the overall contest with this week's data.
May 25, 2008 12:18 AM | Reply | Permalink
i think maybe you left off a factor of 2.
assuming the poll numbers are correct for mccain +4 in WI. this gives a 2.5% that mccain is really 4 less than that (or less).
but it is a 16% that mccain is really 2 less than that or less. but if he loses 2, then maybe someone else gains 2 which is enough to win.
so i think mccain should have won maybe 84%, not 97.5%
May 25, 2008 3:28 AM | Reply | Permalink
In a head-to-head election matchup, you're basically only dealing with one independent variable (the difference between the two candidates), not two (the vote share for each candidate), because I'm assuming voters will either be for the Democrat or the Republican. It's a simplification: if McCain loses support, some of it may go to Bob Barr instead of Obama or Clinton, or some of it may simply not vote rather than support the opposing candidate. But it makes the calculations easier.
May 25, 2008 8:11 AM | Reply | Permalink
run it at different confidence levels.
they say it is at 95% confidence levels, but I think it is obvious that that often is a marketing number, not an engineering number.
i'd be interested in running numbers at 80% confidence level (which is roughly equivalent to increasing the moe by a factor of 1.53 from that of 95% moe)
May 25, 2008 1:55 AM | Reply | Permalink
You've got that point backwards, I think. Lowering the confidence level has the effect of increasing the likelihood the leader in the state will win it, thus lowering the margin of error. Recall that the whole point of margin of error and confidence interval is to quantify how likely it is that the sample in your poll accurately reflects the actual population. In a normal (Gaussian) distribution, about 2/3 of all observations are within 1 standard deviation of the mean, but 95% are within 2 standard deviations, and that is what polling companies typically use when reporting results. So a 4% margin of error implies that the standard deviation (which is what my model takes as input) is half that, or just 2%.
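The relationship between confidence level, margin of error, and standard deviation can be made concrete with a short sketch. The helper names (`z_score`, `margin_of_error`, `implied_sd`) are illustrative, and it uses the exact critical value of about 1.96 for 95% rather than the rounded 2 that my model uses.

```python
from statistics import NormalDist

def z_score(confidence):
    """Two-sided critical value of the standard normal, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)

def margin_of_error(sd, confidence=0.95):
    """Margin of error reported for a fixed standard deviation."""
    return z_score(confidence) * sd

def implied_sd(moe, confidence=0.95):
    """Standard deviation implied by a reported margin of error."""
    return moe / z_score(confidence)

if __name__ == "__main__":
    for level in (0.80, 0.95):
        print(level, round(margin_of_error(2.0, level), 2), round(implied_sd(4.0, level), 2))
    print(round(z_score(0.95) / z_score(0.80), 2))  # ratio of critical values: about 1.53
```

It can be read both ways. For a fixed standard deviation, a lower confidence level gives a narrower margin of error. If instead a reported 4% margin really covered only 80% of outcomes, the implied standard deviation would be larger by that factor of about 1.53.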
I have run (but not reported) simulations at different standard deviations as a sanity check on my code. With a standard deviation of 0, the results simply reflect counting the state for the leader only, like the headline electoral votes at the Votemaster's site. With a very large standard deviation, the results become a coin flip, because the standard error outweighs any lead a candidate might have. In between there are other interesting numbers.
May 25, 2008 8:21 AM | Reply | Permalink
i also think 80% is too high.
the moe numbers are calculated from a simple problem:
you have a very large bag of blue and red marbles. you pull out some of them, count how many are of each color and infer what is in the rest of the bag.
but, in real life it is more difficult. your sampling strategy is likely to favor one set over another. the marbles really come in all sorts of colors, and you have to decide to count each as red, blue, or throw it out. also, the color of a marble is not set in stone. a little acid rain will change its color.
a question: if you see a poll of say SD that has obama up 3 with a 3 moe. if that is your only info, and you have to make an intrade bet, where would you put your decision point for buy vs sell?
May 25, 2008 3:18 AM | Reply | Permalink
I agree the winning percentages generated by my model are too high for the Democrats right now, and widening the margin of error would produce numbers that better predict what will happen in November. I've played around with doing that on my own a fair amount, but I've not bothered reporting other results because:
1. I'd rather stay consistent with what I'd done originally, so I can compare on an apples-to-apples basis.
2. For most of the past simulations, increasing the margin of error improves Obama's performance relative to Clinton's, and now that I know that, I don't want my candidate preference to bias which numbers I report. I don't have a good rationale for picking 6%, 8%, 10%, or some other number as the "margin of error" rather than 4%.
This is why I often note that my model is only accounting for sampling error, not the fact that opinion will change between now and November. So at best it's a snapshot of the most recent polls in each state (some of which are quite old). And, to be very pedantic, I'm not accurately accounting for sampling error, because Tanenbaum's data don't give per-poll margins of error. Via e-mail he told me that the margin of error is typically 4% in the polls he uses, but it differs from poll to poll. Finally, the percentages for each candidate are surely rounded to the nearest whole number, so a lead of .51% and a lead of 1.49% would be treated the same in my model, itself a potentially large distortion.
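To put a rough size on the rounding distortion, here is a small sketch using the same normal model with a standard deviation of 2 points (half of a 4% margin of error). The function name is illustrative only.

```python
from statistics import NormalDist

def win_probability(lead, sd=2.0):
    """Chance the leader carries the state if the true margin is Normal(lead, sd)."""
    return 1.0 - NormalDist(mu=lead, sigma=sd).cdf(0.0)

if __name__ == "__main__":
    # Leads of 0.51 and 1.49 both get reported as a 1 point lead.
    for true_lead in (0.51, 1.0, 1.49):
        print(f"lead {true_lead:4.2f} -> win probability {win_probability(true_lead):.3f}")
```

Under these assumptions the two unrounded leads correspond to win probabilities of roughly 60% and 77%, while the rounded 1 point lead is treated as about 69% in both cases.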
So these numbers should be taken with a grain, make that a whole shaker, of salt. It's at best a fuzzy map of the current state of affairs. But by using a consistent methodology, I think there is value added by comparing and tracking changes over time in the landscape. And I think these totals better summarize what might happen than simply adding EVs for the poll leader in each state.
May 25, 2008 8:36 AM | Reply | Permalink
I'd still use my gut quite a bit, more so the further we are from the primary. If there was other recent polling data, how that compares to the 3 point lead would factor into my choice, but if all I have is a 3 point lead in one poll, with a 3 point margin of error, then I'd look at both markets to see if one side is more favorable than the other (buying Obama at 95 is equivalent to shorting Clinton at 5), but I'd look to buy Obama up to the high 80s or low 90s, and sell him if he's at 95 or more.
For a primary poll, where we're less than 2 weeks away, the assumption that sampling error is the only thing that affects the results is much better than for the November election. And if the lead is exactly the margin of error of the poll, that implies that there's a 97.5% chance that correcting for sampling error would still leave him ahead (further assuming that there's not any systematic bias in determining the poll sample, itself yet another complication).
May 25, 2008 8:52 AM | Reply | Permalink
if lead = moe, then i'd say high 80's to low 90's is paying way too much (even assuming no time factor).
my feeling is i'd pay in the low 80's for a bet with no time factor, and say now for the ge (5+ months) i'd pay in the low 60's to mid 50's.
part of that is what i mentioned above. polling wrong by 1/2 or more the moe (in one direction) is about 16% (tail past one deviation). in this case, that is enough if the lead is only 1 moe.
May 25, 2008 1:06 PM | Reply | Permalink
Thanks, as always, for faithfully running the numbers for us. Seems I just caught your post before it dropped off the recent list.
One of these days I'm going to spend some time on E-V and make a state-by-state list of who won each state's primary or caucus and the latest state polling averages for each candidate; then it'd be easy to update it as we move through the next tortured weeks. Most big states that went to Hillary in the primaries are leaning Obama-ward now; in some, Obama even leads Hillary against McBush. But I'm too busy making comments on the more outraged posts today trying to draw outraged Obama supporters back into thinking about the general. Priorities, I guess.
May 25, 2008 6:13 PM | Reply | Permalink