We project states based on a simple methodology. We consider the results from every credible statewide poll we can get our hands on. We do not consider national polls, as electoral votes are determined through statewide races. (Gore won the popular vote in 2000, after all, but Bush won the electoral vote and the presidency.)
With regard to statewide races, we consider the polling of the three candidates of interest: Bush, Kerry, and Nader.
In the case of Nader, the big question is: Is he on the ballot in a given state? Most states have ballot deadlines in July or August (even September in a few cases), so it'll be months before we know for sure. If Nader is on the ballot in a state, we only consider polls that include him. If he has definitely failed to get on the ballot, we only consider polls that do not include him. For states where his status is still pending, we base the campaign map on a "three out of five" scenario. If three of the last five polls include Nader, then we make a projection based on those. If three of the last five polls do not include Nader, then we ignore him in our analysis. Since we don't know his ballot status, we focus on the most recent data instead.
To strongly "own" a state, a candidate must beat his opponent by more than eight points. To weakly own a state, a candidate must beat his opponent by more than four points but no more than eight. Furthermore, this must be done across an average of three recent polls in that state. As newer polls are added, older ones "scroll off" and up-to-date projections are made. On very rare occasions I manually tweak ownership of a state. Most notably, this happened in Colorado, when a series of polls without Nader said the race was tied. There were no polls with Nader, though, and since he was on the ballot the algorithms ignored the non-Nader polls. Manually tweaking ownership of the state was a way of overcoming this weakness in the programming that runs the site. Fortunately, it is needed very rarely.
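As a rough illustration of the thresholds above, here is a minimal Python sketch. The function name classify_state, the label strings, and the choice to call anything at four points or less a "toss-up" are assumptions made for this example; this is not the code that runs the site.

```python
def classify_state(margins):
    """Classify a state from poll margins.

    `margins` holds (Bush % - Kerry %) for each poll, newest first.
    Only the three most recent polls are averaged.
    """
    recent = margins[:3]
    if not recent:
        return "no data"
    avg = sum(recent) / len(recent)
    leader = "Bush" if avg > 0 else "Kerry"
    lead = abs(avg)
    if lead > 8:
        return f"strong {leader}"
    if lead > 4:
        return f"weak {leader}"
    # Assumption: a lead of four points or less is labeled a toss-up.
    return "toss-up"
```

Note that a lead of exactly eight points counts as weak ownership, since strong ownership requires more than eight.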
It is impossible to accurately predict the eventual winner of the presidential race in advance, but we can make a rough estimate of who would win if the election were held today.
To do this, we make two projections: "Bush vs. Kerry" and "Bush vs. Kerry vs. Nader". These labels are actually a bit inaccurate. The first matchup assumes that Nader is not on the ballot except in states for which he has already qualified. The second matchup assumes Nader is on the ballot except in states where he has officially failed to qualify. By doing these two analyses we know which polls to consider and can give a "high" and "low" estimate of how the candidates would do if the election were today. The truth is somewhere between the two estimates, as Nader's eventual ballot status will likely come down somewhere between the two extremes. As the election season progresses and we learn more about Nader's ballot status, the two projections will become more and more similar. When Nader's ballot status is finally ascertained in all 50 states and Washington DC, the two analyses will be identical and one will be removed as redundant.
In our "if the election were held today" projections, we consider the three most recent polls, with one caveat: the three polls must be of the type that matches Nader's ballot status in that state. For example, Nader will be on the ballot in Florida. As a result, we only consider the three most recent polls that include Nader in the results. Even if a poll without Nader is more recent, that poll is ignored. Conversely, Nader will not be on the ballot in Arizona. As a result, any poll that includes Nader is discounted when determining who wins Arizona's electoral votes.
In the states where Nader's ballot status is uncertain, we conduct a "best three out of five" scenario. We look at the five most recent polls, and whichever type of poll accounts for at least three of the five is the type used for the analysis. So if a state has three polls with Nader and two without, only the polls with Nader are considered.
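The poll selection rules can be sketched in Python as follows. The field name "includes_nader", the status strings, and the handling of states with fewer than five polls (a tie between poll types falls to the non-Nader polls) are assumptions for illustration only.

```python
def select_polls(polls, nader_status):
    """Pick the polls used for a state's projection.

    `polls` is a list of dicts, newest first, each with a boolean
    "includes_nader" key (a hypothetical field name).
    `nader_status` is "on", "off", or "pending".
    """
    if nader_status == "on":
        want_nader = True
    elif nader_status == "off":
        want_nader = False
    else:
        # Pending: use whichever poll type dominates the last five polls.
        last_five = polls[:5]
        with_nader = sum(1 for p in last_five if p["includes_nader"])
        # With five polls this is the "three out of five" rule.
        # Assumption: with fewer polls, a tie falls to non-Nader polls.
        want_nader = with_nader > len(last_five) - with_nader
    matching = [p for p in polls if p["includes_nader"] == want_nader]
    return matching[:3]  # the three most recent polls of the chosen type
```

For a pending state whose five latest polls alternate with, without, with, without, with, the function returns the first, third, and fifth polls.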
If by some chance the state is tied after three polls, we will look back at a fourth poll. If that is tied, we will look at a fifth, and so on. This should be rare. Under no circumstances do we look at polls that fall into our "outdated" classification. The number of days after which a poll counts as outdated will decrease as the election approaches.
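One way to read the tie-breaking rule is sketched below in Python. It assumes that "tied" means the summed margin over the polls considered so far is exactly zero, and that outdated polls have already been filtered out of the list; the function name and both of those readings are assumptions, not the site's actual logic.

```python
def margin_with_tiebreak(margins, base=3):
    """Average margin over the newest polls, extended on ties.

    `margins` holds (Bush % - Kerry %) per poll, newest first, with
    outdated polls already removed. Starts from `base` polls and adds
    one older poll at a time while the result is an exact tie.
    """
    if not margins:
        return None
    n = min(base, len(margins))
    while sum(margins[:n]) == 0 and n < len(margins):
        n += 1  # still tied, so look back one more poll
    return sum(margins[:n]) / n
```

If every available poll is used and the state is still tied, the function simply returns zero, since the text does not say what happens in that case.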
We also take into account the fact that undecided voters traditionally break against the incumbent. For each poll, we compute the number of undecided voters using the formula: Undecided voters = 100 - %Kerry - %Bush - %Nader - 1.01. Two-thirds of the undecided voters are then added to Kerry's total percentage. The remaining third is given to George Bush. The split was designed based on the research of Charlie Cook, a noted political analyst and senior contributor to Cook's Political Report. We believe the number of undecided voters who will go over to Nader will be insignificant. The 1.01 figure in the above formula is the national percentage of voters in 2000 who voted for a candidate other than Bush, Gore, or Nader. Without a good reason to adjust the figure, we have made the assumption that it will be the same in the 2004 election.
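A minimal Python sketch of the undecided split follows. The function and constant names are made up for this example; the arithmetic follows the formula above.

```python
# National % in 2000 for candidates other than Bush, Gore, or Nader.
OTHER_2000 = 1.01

def allocate_undecided(bush, kerry, nader=0.0):
    """Return adjusted percentages after splitting undecided voters.

    Inputs are the raw poll percentages. Two-thirds of the undecided
    voters go to Kerry, one-third to Bush, and none to Nader.
    """
    undecided = 100 - kerry - bush - nader - OTHER_2000
    return {
        "Bush": bush + undecided / 3,
        "Kerry": kerry + 2 * undecided / 3,
        "Nader": nader,
    }
```

For a hypothetical poll showing Bush 46, Kerry 44, and Nader 3, the undecided pool is 5.99 points, so Kerry gains roughly 3.99 points and Bush roughly 2.00. The sketch does not guard against polls whose totals leave a negative undecided figure, as the text does not say how those are handled.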
Our projection of the popular vote is based on a similar design. We look at the three most recent polls with or without Nader, depending on whether Nader is on the ballot in a given state. If Nader's ballot status is uncertain, we provide projections for both options. If possible, we only look at polls within the last 45 days. If a state has no "current" polls (usually in low-population states leaning heavily one direction or the other), we look at polls regardless of their date. If no polls exist at all for a given state, then we assume Kerry and Bush will score the same percentages as Gore and Bush did in 2000.
Regarding the number of actual voters, we determine the rate at which a state votes by taking the state's ratio of actual voters to registered voters in the 2000 election. We then adjust the figure up or down by taking into account the number of voters registered in each state. All of these figures are taken from the Federal Election Commission's website. Because Wisconsin and North Dakota are exempt from the terms of the National Voter Registration Act and provided no statistics to Congress, we did not adjust the number and assumed that the same base number of voters would vote in 2004.
Please note that only active voter statistics were used. The reports submitted to Congress by the FEC under the National Voter Registration Act distinguish between the total number of registered voters and the number of active registered voters. The active number accounts for voters who have been flagged as invalid (although technically eligible to vote) for a variety of reasons. For example, a voter may move to a new address, and mailings to his old address are then returned as undeliverable. In this case the voter may technically be registered twice. The active statistic eliminates the duplicate. (Not always, though. Alaska, amusingly, reported to the FEC that more than 120% of their population was registered to vote.)
All figures were then adjusted to reflect that the electorate is highly polarized and unusually active this election year. Battleground states were adjusted to reflect an 8% increase in the number of people voting. Non-battleground states were increased by 6%. These are based on loose calculations and my own intuition. (The Bush campaign is estimating a 6% increase. Kerry's is estimating 10%, primarily among independents and loosely-aligned voters.) If anyone has more data on the likely change in the number of voters in 2004, I would be glad to adjust the formulas accordingly.
The result was then determined by multiplying the number of expected voters by the expected percentage of voters for each candidate. The break of undecided voters for the challenger was taken into account as described in part two (above).
All of this leads to the formulas that calculate the raw popular vote for each candidate. The formulas are applied to each state separately.
Number of voters for a candidate = (# of voters in the election) * (% support for that candidate)
Number of voters in an election = (# of voters in the 2000 election) / (# of registered voters in 2000) * (# of registered voters in 2002) * (1.06 if a non-battleground state or 1.08 if a battleground state)
% support for a candidate = ((1/3 if Bush or 2/3 if Kerry) * ((((100 - % supporting Kerry) - % supporting Bush) - % supporting Nader) - 1.01)) + % supporting that candidate
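The first two formulas can be sketched in Python as below. The function and parameter names are invented for this example, and the candidate's support is assumed to be passed in already adjusted for undecided voters per the third formula. Because support is expressed in percentage points, the sketch divides by 100 when converting to a vote count.

```python
def expected_turnout(votes_2000, registered_2000, registered_2002,
                     battleground):
    """Estimate the number of voters in a state for 2004."""
    # 8% increase in battleground states, 6% everywhere else.
    growth = 1.08 if battleground else 1.06
    return votes_2000 / registered_2000 * registered_2002 * growth

def projected_votes(turnout, support_pct):
    """Raw votes for a candidate, given support in percentage points."""
    return turnout * support_pct / 100
```

With made-up figures, a battleground state with 1,000,000 votes cast in 2000, 2,000,000 registered voters in 2000, and 2,200,000 in 2002 yields about 1,188,000 expected voters; a candidate at 48% would then receive about 570,240 votes.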
The % support for a candidate is averaged from up to three recent polls if possible, and from up to three outdated polls if not. As a last resort, the formula is ignored and the Bush-Gore percentages of support from 2000 are used.
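The fallback order can be sketched like this in Python. The function name and arguments are hypothetical, and the sketch assumes that current and outdated polls are never mixed in one average, which the text does not state explicitly.

```python
def support_for_state(current, outdated, result_2000):
    """Pick a candidate's support percentage for one state.

    `current` and `outdated` are lists of poll percentages, newest
    first. `result_2000` is the matching 2000 result, used only when
    the state has no polls at all.
    """
    if current:
        sample = current[:3]   # up to three recent polls
    elif outdated:
        sample = outdated[:3]  # otherwise up to three outdated polls
    else:
        return result_2000     # last resort: the 2000 percentages
    return sum(sample) / len(sample)
```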
Got all that? There'll be a quiz on Monday.