Disclaimer:

While I am a paid undergraduate researcher for Texas Tech University, the opinions expressed on this blog are my own. The statements on this blog are not endorsements of views of Texas Tech University, The Honors College of Texas Tech University, or the math department of Texas Tech University, or any other department or campus organization of Texas Tech University.

Monday, January 23, 2017

Results

Since Donald Trump is officially the president and almost all results are official (except Michigan, where it is not clear whether the changes discovered in the recount will change the official tally), I am posting about how well I did.

My model: 88%
Wrong in Florida, North Carolina, Michigan, Wisconsin, Pennsylvania, and Ohio

My personal prediction: 90%
Wrong in Florida, North Carolina, Michigan, Wisconsin, and Pennsylvania


Five Thirty Eight: (taken on Saturday) 92%
Wrong in Nevada, Michigan, Wisconsin, and Pennsylvania

Five Thirty Eight Final: 88%
Wrong in Florida, Michigan, Wisconsin, Pennsylvania, and North Carolina

Princeton Election Consortium: (taken on Saturday) 90%
Wrong in Florida, North Carolina, Michigan, Wisconsin, and Pennsylvania
Note: there was no call in Iowa on Saturday, but the final call was a Trump win, so I am considering that call correct. The final call is otherwise identical.




New York Times Upshot: 90%
Wrong in Florida, North Carolina, Michigan, Wisconsin, and Pennsylvania
Upshot did not have a final prediction.

Real Clear Politics (Both final and Saturday) 88%
Wrong in Florida, Nevada, Michigan, Wisconsin, New Hampshire, and Pennsylvania

I also ran my model for 2008 and 2012, and it matched the Five Thirty Eight predictions for the winners. Measured by root mean square error, I was 93.58% as accurate as Five Thirty Eight at predicting means.
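For readers who want to see what a root mean square error comparison looks like, here is a minimal sketch in Python. The numbers are made up for illustration and are not the 2008 or 2012 data, and taking the ratio of the two errors is only one way a figure like "93.58% as accurate" could be computed.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and actual vote shares."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical vote shares for three states (illustration only).
actual = [0.50, 0.45, 0.52]
my_model = [0.49, 0.47, 0.50]
other_model = [0.50, 0.46, 0.51]

# Assumed definition of relative accuracy: the other model's error
# divided by mine. A value below 1 means my error was larger.
relative_accuracy = rmse(other_model, actual) / rmse(my_model, actual)
print(f"relative accuracy: {relative_accuracy:.2%}")
```

A lower RMSE means the predicted means sat closer to the actual results, so the ratio drops below 100% whenever the comparison model was closer.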

Considering the success of other models, I did relatively well. The core ideas behind my project appear to be solid. My predictions were highly competitive with models created by organizations that had far more resources than I did. I look forward to 2018, when I plan to predict the Senate races.


This will be the final blog post. I will continue posting about statistics in our everyday lives at balexanderstatistics.com.

Thursday, November 10, 2016

We were wrong.

I didn't see a Trump presidency coming. I never liked Trump, but personally I liked him a little more than Clinton. Still, I couldn't imagine that he would receive support outside the Republican Party. I underestimated him. I assumed that undecided voters would split pretty evenly, which is what normally happens. The polls showed Clinton would win, and the early voting data supported this. I don't know why the polls were wrong. I don't know if I and other people focused enough on the potential for error in our models. Maybe we sold our models as a certainty when there was a potential for error. So I thought the polls were right. But they weren't. I don't know what happened. I don't think the polls were rigged. Maybe people lied about their support. Maybe Trump supporters didn't participate enough in polls. Maybe people changed their minds suddenly after the polls stopped. This isn't over. Just because we may have failed in this election doesn't mean we are going to stop predicting. We had bad data, and statistics doesn't work on bad data. We could have made mistakes and assumptions that turned out to be incorrect. But we will learn and grow and examine our results.

We may have made mistakes. I hope that you will understand that the failure of the predictors shouldn't mean we need to be dismissed. I think the biggest problem we had was that we focused on the data a little too much and ignored the fact that Trump overperformed his poll numbers in the primaries, while Clinton underperformed hers. I don't have the answers yet. It could take years to understand what happened. But it will be studied. We will work on being better. We will focus on creating better polls and better models. I ask that, instead of dismissing us as biased or as pseudoscientists, you give statistics another chance. Election prediction is a relatively new field. Give us a chance to grow. Answer polls and help us get better data. Give us a second chance.







Tuesday, November 8, 2016

Looking forward to the future

After the election results become final, I will make a final post on this blog with my accuracy and the accuracy of the four other organizations mentioned before. After that final post I will no longer be using this blog, but it will still be available for reference. I will continue to write about statistics on my new website: balexanderstatistics.com. I can be contacted through the form on the site, found here: http://www.balexanderstatistics.com/contact-us/

Saturday, November 5, 2016

Comparison Information

Here are the current calls of four different websites to be used as comparisons for the success of my model.

My model

Five Thirty Eight


Princeton Election Consortium



Upshot from New York Times

Real Clear Politics

Final Call Summary

This post represents my official final prediction for the 2016 presidential election using my experimental model based upon Bayesian analysis. Overall, my model predicts 340 electors for Clinton and 198 electors for Trump.
My final set of data was pulled from Pollster (run by Huffington Post) on either Friday (11/04) or Saturday (11/05).

First I am going to lay out again how my model works and how decisions are made.


I designed my experiment in September of 2016. Basically, my model operates on an adjusted average of poll data from both the state itself and a similar state with more information. There are 5 categories of states: red southern states (Texas polls), red Midwest states (Nebraska polls), blue northern states (New York polls), blue western states (California polls), and swing states (national polls). These comparison states were chosen in advance. If I were making this decision today, I would have picked Indiana for the Midwest states instead of Nebraska because it has more polls with better information on other candidates. For a poll to be considered, it had to have been conducted on or after July 1st, 2016. The latest polls do not play a large role in my analysis since I am averaging all polls since July 1st. I attempted to approximate percentages for third-party candidates, but because third-party candidates are included inconsistently in state-level polls, my model will underestimate these candidates. Changes over time are not a factor in my analysis, because the theory is that opinions about candidates don't change much over time in a usual election.
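The poll filter and the comparison-state lookup described above can be sketched in a few lines of Python. This is only an illustration: the field names (`end_date`, `clinton`), the sample polls, and the plain unweighted mean are assumptions, and the adjustment that combines a state's polls with its comparison polls is not shown.

```python
from datetime import date
from statistics import mean

# Cutoff from the experiment design: polls on or after July 1st, 2016.
CUTOFF = date(2016, 7, 1)

# Comparison polls used for each category of state.
COMPARISON = {
    "red southern": "Texas",
    "red midwest": "Nebraska",
    "blue northern": "New York",
    "blue western": "California",
    "swing": "National",
}

def eligible_polls(polls, cutoff=CUTOFF):
    """Keep only polls dated on or after the cutoff."""
    return [p for p in polls if p["end_date"] >= cutoff]

def average_share(polls, candidate):
    """Unweighted mean of one candidate's share across eligible polls."""
    return mean(p[candidate] for p in eligible_polls(polls))

# Hypothetical polls (not real Pollster data).
sample_polls = [
    {"end_date": date(2016, 6, 20), "clinton": 0.40},  # dropped: too early
    {"end_date": date(2016, 7, 1), "clinton": 0.45},
    {"end_date": date(2016, 10, 30), "clinton": 0.47},
]
clinton_average = average_share(sample_polls, "clinton")
```

Because every eligible poll gets the same weight, a late shift in the polls moves the average only slightly, which is the behavior described above.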


I am defining success as correctly predicting at least 46 states. However, this experiment is not about being correct. This is about testing the effectiveness of a new type of model. My model is probably not the best model for predicting a presidential election, but it is still a valid model and should be relatively accurate. This model is also untested, and there are no examples of this exact approach. I make my calls and analysis based on my model, not my opinions. I only think my model is wrong in Ohio because I believe the trends over the last few weeks there are probably more accurate than the overall average for the last few months. In the nomination process, I made calls on my personal opinion in cases where the intervals overlapped and were so close that no true winner could be found. If I had a situation like that in the general election I would make a call on outside information, but this is not the case. This prediction is what I am studying academically and is what I will write my paper on and present at conferences. However, I might release another prediction tomorrow or Monday, based on new polls, that changes the results of my model in the 6 states I think might flip before election day. This updated prediction will be based on my model, but won't determine the success of my experiment.


Nebraska and Maine award electors based on congressional district. My prediction is that Trump will win all congressional districts in Nebraska and the 2nd district in Maine. Clinton will win the 1st district in Maine.
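Both states give two electors to the statewide winner and one to the winner of each congressional district (three districts in Nebraska, two in Maine). A minimal sketch of that arithmetic follows. The district calls come from the prediction above; the statewide calls are not stated there, so a statewide Trump win in Nebraska and a statewide Clinton win in Maine are assumptions made only for the example.

```python
from collections import Counter

def split_electors(statewide_winner, district_winners):
    """Two electors for the statewide winner, one per district winner."""
    electors = Counter({statewide_winner: 2})
    for winner in district_winners:
        electors[winner] += 1
    return dict(electors)

# District calls from the prediction; statewide calls are assumed.
nebraska = split_electors("Trump", ["Trump", "Trump", "Trump"])
maine = split_electors("Clinton", ["Clinton", "Trump"])

print(nebraska)
print(maine)
```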

This is a summary map of how my model is calling every state:
This is the official map for my experiment.





This map isn't based entirely on my model. I think this is what will actually happen, based upon a combination of my model and other factors like momentum, trends, early voting data, news stories, and other non-poll-based factors. However, besides Ohio, both maps are in agreement.



Personal Opinion Disclosure: I voted a straight Republican ticket in the election, except for president, where I wrote in Ana Navarro.


Technical Description of my model: I am doing a Bayesian analysis assuming a normal prior, a normal hypothesis, and a normal posterior. This is done in Anaconda using SciPy. The method for finding the standard deviation is the standard formula based on the sum of the squares of deviations from the mean. I am aware that this is probably not the best method for this kind of situation. But I am limited by time and mathematical ability as an undergraduate student. I plan to go more in depth in the future.
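A normal prior combined with normally distributed data has a well-known closed form, sketched below. This is a textbook conjugate update, not necessarily the exact calculation used for this project, and every number in it is a hypothetical placeholder rather than real poll data. The sketch uses Python's standard library for the interval; `scipy.stats.norm.interval(0.95, loc=..., scale=...)` gives the same equal-tailed interval.

```python
import math
from statistics import NormalDist

def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior mean and sd for a normal mean, treating data_sd as known."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * data_mean)
    return post_mean, math.sqrt(post_var)

def equal_tailed_interval(mean, sd, level=0.95):
    """Central interval of a normal distribution at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sd, mean + z * sd

# Hypothetical inputs: the prior comes from the comparison polls,
# the data from 12 state polls (all values invented for illustration).
post_mean, post_sd = normal_posterior(
    prior_mean=0.46, prior_sd=0.03, data_mean=0.48, data_sd=0.04, n=12
)
low, high = equal_tailed_interval(post_mean, post_sd)
```

The posterior mean is a precision-weighted average, so it lands between the prior mean and the data mean and moves toward the data as the number of state polls grows.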






Close Swing States

This group of states is close enough that new polling in the next few days could change how my model calls the state. These states are Arizona, Florida, Iowa, North Carolina, Nevada, and Ohio, in order from least likely to most likely to change under my model. To be clear about when I pulled the results, the time is included for each state. All data is from Pollster (run by Huffington Post). All times are from 11/05 and are in Central Standard Time.

Full Results:
Note: some confidence intervals were corrected to have a nonnegative lower bound, since probabilities can't be less than zero.
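The correction in the note amounts to raising any negative lower bound to zero. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def clip_lower_bound(interval):
    """Raise a negative lower bound to zero; leave the upper bound alone."""
    low, high = interval
    return (max(0.0, low), high)

# Hypothetical interval for a candidate with very little support.
corrected = clip_lower_bound((-0.004, 0.012))
unchanged = clip_lower_bound((0.41, 0.42))
```

This only matters for candidates polling near zero, where a symmetric interval around a small mean can dip below zero.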

1. Arizona 3pm - Trump, outside of confidence interval
The recent news from the FBI, and the fact that Arizona got less close after the 2nd debate, mean Trump will probably win Arizona. My main concern is that the Hispanic vote (which leans towards Clinton) could change the outcome of this state. Polls are often based on likely-voter models, which use data from past elections, and if more Hispanics vote in this election than ever before, the polls could be wrong. Republicans have a lead in early voting, but it is close and not all registered Republicans will vote for Trump. But given the strong history of Republican wins in Arizona, and the polls, it seems like Trump will win. My model shows a Trump win, but a closer race than ever before.

2.  Florida 3pm - Clinton, outside of confidence interval
Florida is another close race. Democrats have a small lead in early voting, but it is close. My model shows a Clinton win by 3 percentage points, but personally I think it is closer than that. If Clinton can get Obama voters to turn out, she will win Florida. She has a great ground game in Florida and I think she will win there.

3. Iowa 3pm - Trump, outside of confidence interval
Iowa has been trending for Trump for the past month, but voting started in September, and early turnout for Republicans was initially low. Now I think the race has stabilized and will be won by Trump. This will also be a close race.

4. North Carolina 4pm - Clinton, outside of confidence interval
Like I have said before, Clinton has a great ground game and organization. This will be close, but President Obama and Michelle Obama have been to this state recently to get out the vote, and I think it is going to come down to the better-run campaign. A recount could be likely here, maybe not on a statewide level, but in certain areas.

5. Nevada 4pm - Clinton
I don't think Trump has enough support among Hispanics to win here.  Hillary Clinton has an ad with employees of the Trump Corporation talking about why they don't want him to be president.  The legalization of marijuana is on the ballot here, and I think that will draw out younger, more socially liberal voters who may not have otherwise voted.   I think there could be a recount in certain areas, because the race is so close.

6. Ohio 4pm - My model: Clinton, My personal opinion: Trump
Sometimes statistics don't work. Normally they are right, and work well. My model focuses on the average of polls, which mainly showed Clinton leads before the last few weeks. So while the difference between my model's two means is less than 1%, it is too big to call a Trump lead. This doesn't mean my model is wrong or bad. Sometimes these things happen. If I had only included polls starting after the first debate, or done things differently, I could have gotten a result suggesting a Trump lead. But to protect my research I decided these things in September. This is a tough year. Even the experts are making different calls than each other. So while I think the FBI's announcements about both Hillary and Bill Clinton, and the early voting data, suggest that Trump will win Ohio, as a researcher I have to default to my model. If a few new polls come out on Sunday or Monday with Trump leads, it may tip the scales in my model, and I may post an update. At the end of the day this was never about being right, but rather about testing a model. So while my gut tells me Trump will win Ohio, I have to admit that my model says Clinton will win.

Safe Swing State Final Call

The swing states I feel are safe for the leading party are Colorado, Michigan, New Hampshire, Pennsylvania, Wisconsin, and Virginia. My model shows Clinton winning all of these states. These races will all probably be close, but Clinton should win all of them. It would be highly unlikely for new polls to change the results of my model in these states.

I downloaded the csv files at 10 am 11/05 from Pollster (run by Huffington Post).

Since these are swing states, I am going to give an analysis of every state. Overall I think Clinton is running the better campaign. She has the support of a president who is viewed more favorably than unfavorably. She has a strong ground game and the support of multiple celebrities who I think will help turn out younger and minority voters.

Full Results:
Note: some confidence intervals were corrected to have a nonnegative lower bound, since probabilities can't be less than zero.

1. Colorado - Clinton win, Outside of the confidence interval

Colorado has voted for a Democrat for president in every election since 2008. Colorado uses a mail-in ballot system (ballots can be dropped off in person), which I think will increase turnout. Johnson is getting enough support (around 5%) to make it harder for Trump to win. Regardless of the recent FBI document releases, Clinton should have no problem winning.

2.  New Hampshire - Clinton win, outside of the confidence interval

I think New Hampshire is close, but most of the older polls show a Clinton lead, which means on average she is leading in my model. New Hampshire doesn't have early voting (absentee voting is available as needed), so it is hard to tell how people will actually vote. There is a really close Senate race in New Hampshire that may flip from a Republican incumbent to the Democrat. Because the Democrats have a good chance to gain a Senate seat here, they have spent (and will keep spending) effort and money trying to win this seat and the state's electors. So I think there will be a Clinton win here, however close the race will be.

3.  Michigan - Clinton win, outside of the confidence interval

Michigan is getting closer over time. However, Clinton has a great ground game in the swing states and her celebrity surrogates should turn out the vote here. I don't think Trump has enough support from members of the Republican Party to help him get out the vote. There is no early voting in Michigan. This will be a close race, but I think Clinton's ground game and organization are why she will win this state and ultimately the presidency.

4. Pennsylvania

Pennsylvania has gotten closer since the FBI announcements. There is no early voting in Pennsylvania. But Clinton is still leading in basically all the polls (some of the time her lead is within the margin of error). If the Republican nominee were less controversial and more moderate, this would probably be a closer race. But in my opinion Donald Trump doesn't have enough support among independents to win in most swing states.

5. Virginia
Virginia has been showing a consistent Clinton lead. I think the area close to Washington D.C. will have a strong Clinton lead, because Trump doesn't have a lot of support among federal employees. Obama won Virginia twice, and I think Clinton will win it as well.

6.  Wisconsin

Wisconsin may be the home of Paul Ryan, but it has voted for a Democrat for president in every 21st century election. Like most of the states, the race is getting closer, but I still think Clinton will win here. Wisconsin also has same-day registration, which will probably benefit Clinton more than Trump. I don't think Paul Ryan will lose his seat, but I don't think Trump will win the votes of everybody who votes for at least one Republican.



Full Results:

Colorado  based on National
Hillary's Mean: 0.471294699683
Hillary's Std: 0.004746435642928872
Hillary's CI: (0.46974422586373638, 0.47284517350209559)
Trump's Mean:  0.415064661232
Trump's SD: 0.006151607836034659
Trump's CI: (0.41305517293112315, 0.41707414953300392)
Other Mean: 0.058935930613
Other Std:  0.008229305583452357
Other CI: (0.056247740186133093, 0.061624121039913431)
Johnson's Mean: 0.054704708472
Johnson's SD: 0.0077782639740753
Johnson's CI: (0.052163855596758547, 0.057245561347236205)

Michigan based on National
Hillary's Mean: 0.516253556871
Hillary's Std: 0.009037563625769775
Hillary's CI: (0.51263784481006636, 0.51986926893286012)
Trump's Mean:  0.442293032229
Trump's SD: 0.009428492613825277
Trump's CI: (0.4385209188721837, 0.446065145585541)
Other Mean: 0.015130517008
Other Std:  0.004747167974335746
Other CI: (0.013231289161518625, 0.017029744854453369)
Johnson's Mean: 0.0263228938917
Johnson's SD: 0.0038995691388233867
Johnson's CI: (0.024762769974077541, 0.027883017809299544)

New Hampshire based on National
Hillary's Mean: 0.492744025978
Hillary's Std: 0.00466779451328252
Hillary's CI: (0.49153224985058147, 0.49395580210505935)
Trump's Mean:  0.43275346221
Trump's SD: 0.00556498611490036
Trump's CI: (0.43130877193826578, 0.43419815248115956)
Other Mean: 0.0382835487485
Other Std:  0.005947772316775469
Other CI: (0.036739485811923628, 0.039827611685165035)
Johnson's Mean: 0.0362189630639
Johnson's SD: 0.005857012985192196
Johnson's CI: (0.034698461573938841, 0.037739464553906435)


Pennsylvania based on National
Hillary's Mean: 0.499431200071
Hillary's Std: 0.0033953724888021474
Hillary's CI: (0.49848051324353293, 0.50038188689843299)
Trump's Mean:  0.435130943208
Trump's SD: 0.0037467479683491486
Trump's CI: (0.43408187305387341, 0.43618001336161999)
Other Mean: 0.0384127761622
Other Std:  0.0047163530136822155
Other CI: (0.037092221584308589, 0.039733330740078322)
Johnson's Mean: 0.0270250805591
Johnson's SD: 0.003966667570309904
Johnson's CI: (0.025914434048155514, 0.028135727069998492)

Virginia based on National
Hillary's Mean: 0.500227647747
Hillary's Std: 0.00515263724294697
Hillary's CI: (0.49866934077383424, 0.50178595472015575)
Trump's Mean:  0.418424814401
Trump's SD: 0.005782379618990899
Trump's CI: (0.41667605506545191, 0.42017357373671893)
Other Mean: 0.0396046559353
Other Std:  0.005760302314560753
Other CI: (0.037862573416945891, 0.041346738453629253)
Johnson's Mean: 0.0417428819166
Johnson's SD: 0.006042677016940509
Johnson's CI: (0.039915401096990238, 0.043570362736273516)

Wisconsin based on National
Hillary's Mean: 0.508592572807
Hillary's Std: 0.005190628103447107
Hillary's CI: (0.50655788397894475, 0.5106272616349038)
Trump's Mean:  0.449783183794
Trump's SD: 0.004841721322588826
Trump's CI: (0.44788526391113176, 0.45168110367731323)
Other Mean: 0.0150244708269
Other Std:  0.004100060835103732
Other CI: (0.013417276512683526, 0.016631665141174139)
Johnson's Mean: 0.0265997725719
Johnson's SD: 0.003358142358301173
Johnson's CI: (0.025283404956478558, 0.027916140187370035)
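One way to read "outside of the confidence interval" is that the two candidates' intervals do not overlap, so the leader can be called. The sketch below applies that reading to the Colorado and Wisconsin intervals listed above. The function names and the "too close to call" label are illustrative assumptions, not part of the original model code.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def call_state(clinton_ci, trump_ci):
    """Call the leader only when the two intervals are disjoint."""
    if intervals_overlap(clinton_ci, trump_ci):
        return "too close to call"
    return "Clinton" if clinton_ci[0] > trump_ci[1] else "Trump"

# Intervals copied from the Colorado and Wisconsin results above.
colorado = call_state(
    (0.46974422586373638, 0.47284517350209559),
    (0.41305517293112315, 0.41707414953300392),
)
wisconsin = call_state(
    (0.50655788397894475, 0.5106272616349038),
    (0.44788526391113176, 0.45168110367731323),
)
print(colorado, wisconsin)
```

Under this reading both states are Clinton calls, matching the summaries earlier in the post.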