Like most of you, I’m sitting down and doing some serious rough draft writing this weekend. I sat down with Dr. Greenlaw on Friday, and he taught me how to mess around with EViews. It wasn’t as hard as I thought, but my results were a bit off at first, so I decided to mess with them a little. I initially tried messing with the variables that I entered. I got satisfactory results the first time, although they contradicted my initial hypothesis, but I figured that I would try different angles instead of just taking one result as “word-of-law”.
I got similar results when I swapped out “Total Quantity” for “Quantity of Men” or “Quantity of Women”, so I tried a different angle altogether. I went into my data spreadsheet (after making a copy of it, of course) and deleted seven “outliers”, meaning races that either 1) were so long that the general public was unwilling or unable to participate, or 2) had an ABSURDLY high quantity of runners, an absurdly high price, or an absurdly high number of “years held”. These outliers were most likely influenced by other variables that I could not get data for or could not put into a regression analysis. Removing these skewed observations changed things.
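For anyone curious, here is a rough Python sketch of the same kind of outlier screen done by rule instead of by hand. Everything in it is an assumption for illustration: the race records, the field names, and the cutoffs in `LIMITS` are made up and are not my actual data or the seven races I removed.

```python
# Hypothetical race records; names and numbers are made up for illustration.
races = [
    {"name": "Race A", "miles": 3.1, "price": 25, "runners": 180, "years_held": 5},
    {"name": "Race B", "miles": 26.2, "price": 110, "runners": 2500, "years_held": 30},
    {"name": "Race C", "miles": 6.2, "price": 35, "runners": 240, "years_held": 8},
    {"name": "Race D", "miles": 3.1, "price": 20, "runners": 150, "years_held": 3},
]

# Assumed cutoffs, chosen only for this example.
LIMITS = {"miles": 13.1, "price": 75, "runners": 1000, "years_held": 25}


def is_outlier(race, limits=LIMITS):
    """Return True if any field of the race exceeds its cutoff."""
    return any(race[field] > cap for field, cap in limits.items())


# Split the data so the dropped races can still be inspected later.
kept = [r for r in races if not is_outlier(r)]
dropped = [r for r in races if is_outlier(r)]

print("kept:", [r["name"] for r in kept])
print("dropped:", [r["name"] for r in dropped])
```

Keeping the dropped races in their own list (rather than deleting them outright) makes it easy to rerun the regression both ways and compare.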
My regression results went from having “Number of years held” as the only statistically significant variable (a p-value below 0.05) to having “Price” be significant AS WELL AS “Number of years held”. My standard error of the regression also dropped from over 300 to 112, meaning that the typical miss off of my “line of best fit” is roughly 112 people per race, which is not a terrible margin of error. This is still a pretty big spread, but I expected this, because I’m dealing with human behavior and not everything is perfect when it comes to that. My R-squared also jumped about 16 percentage points, going from around 12% to around 28%, which, for cross-section data, is considered within acceptable range.
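As a sanity check on what these two numbers mean, here is a minimal Python sketch of how R-squared and the standard error of the regression are usually computed from actual and fitted values. The function name `fit_stats` and the tiny sample lists are my own placeholders, not EViews output. The last few lines just do the arithmetic on the rounded 12% and 28% figures quoted above.

```python
import math


def fit_stats(actual, fitted, n_params):
    """Return (r_squared, standard_error) for a fitted regression.

    n_params counts every estimated coefficient, including the intercept.
    """
    n = len(actual)
    mean_y = sum(actual) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in actual)             # total variation
    r_squared = 1 - ss_res / ss_tot
    std_error = math.sqrt(ss_res / (n - n_params))
    return r_squared, std_error


# Made-up values, only to show the call.
r2, se = fit_stats([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], n_params=2)

# The R-squared change, as percentage points vs. as a relative change.
r2_before, r2_after = 0.12, 0.28
point_change = (r2_after - r2_before) * 100          # about 16 percentage points
relative_change = (r2_after / r2_before - 1) * 100   # about 133 percent
```

The standard error is in the same units as the dependent variable, which is why it can be read as "people per race" here.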
I’m doing more research into the meanings of the many other statistics listed with my regression, because I know the p-value and R-squared are not the only important ones. My Durbin-Watson statistic (a measure of autocorrelation in the residuals) dropped from right around 2.0 to 1.8, but it is still close to 2.0 and isn’t dropping towards 1.0 at all, which I’m counting as a good thing, since it suggests that my residuals are not seriously autocorrelated.
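For reference, the Durbin-Watson statistic is just the sum of squared differences between neighboring residuals divided by the sum of squared residuals. Here is a small Python sketch of that formula. The function name and the example residuals are made up for illustration and are not my regression's residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals.

    Values near 2 suggest no first-order autocorrelation, values toward 0
    suggest positive autocorrelation, and values toward 4 suggest negative.
    """
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up residuals: identical neighbors vs. sign-flipping neighbors.
dw_same = durbin_watson([1, 1, 1, 1])
dw_flip = durbin_watson([1, -1, 1, -1])
```

Because the statistic compares each residual to the one before it, the result depends on the order of the rows, which matters more for time-series data than for cross-section data like mine.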
Now that I have deleted my outliers, I find that the results actually support my hypothesis, and that I am on the right track for actually writing my paper! Which I will do tomorrow, fueled by Caffeine and Willpower!