MULTIPLE REGRESSION
MULTIPLE REGRESSION AND ITS DISCONTENTS
Introduction
Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.
Anecdotes
* One of the authors recalls watching an SUV-sized computer installed in a glassed-off room, attended by a specialist who wore a tie and lab coat. It was located not far from the university department. This huge machine could also grade exams. A secretary nearby was moved to tears: a machine would lighten her burden of grading exams, and science had triumphed again. It was the mid-1960s. By the early 1970s, much smaller and more efficient machines had emerged. By the mid-1970s, undergraduate and graduate programs offered numerous statistics courses that included multiple regression. This form of multivariate analysis became practical because of the rapid calculations completed by the computer. As of this writing, the rest is history.
** Nominal variables (dummy variables) are treated AS THOUGH they had a numeric meaning (Kerlinger & Pedhazur). To these authors, that is quite a leap of faith.
*** “Even though it is a nominal variable (dummy variable), you can treat it as a ratio variable.” (www.socialresearchmethods.net/kb/dummyvar.php)
We can only ask why?
**** “All you need to reject the bell curve is for such a movement (black swan) to occur once, and only once–just consider the consequences.” (Taleb, 2007: 230) We want to see whether this is valid.
DISCUSSION
Over time, multiple regression has attracted a number of criticisms. One need only search Google for “criticism” and “bell curve” to see what other authors are saying. It is not our place to claim that Gaussian statistics is wrong, but rather to ask the reader to make room for two alternative positions. One is Mandelbrot and the other is Serendipity. Gaussian theory remains the foundation of numerous statistical methods and is the tool used most in research and in the trillions of dollars traded on the stock market. Most people know the bell curve.
FURTHER DISCUSSION
In one of the earliest textbooks on multiple regression, Kerlinger & Pedhazur (1973: 441-444) noted that the field was emerging and that they had started writing a manual to deal with this “new” strategy for the behavioral sciences. The manual got larger and larger and became a book.
However, despite the excitement of this relatively new era, they were not arrogant. They noted that multiple regression had weaknesses. These include multicollinearity, sorting variables without theory, using small samples (an N under 200), an overabundance of independent variables, and the impact of changing the order in which independent variables are regressed on the dependent variable (which may be reduced with stepwise functions). However, they never mention Gauss, the theorist who came before multiple regression and a number of econometric models, or his adversary Mandelbrot.
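The multicollinearity weakness can be made concrete with a small sketch. It is not taken from Kerlinger & Pedhazur. It uses the variance inflation factor (VIF), a common diagnostic that we are assuming here for illustration, which for exactly two predictors reduces to 1 / (1 - r^2), where r is their correlation. The function names and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors, invented for illustration only.
first = [1, 2, 3, 4, 5]
loosely_related = [2, 1, 4, 3, 5]
nearly_duplicate = [1.1, 2.0, 2.9, 4.2, 5.0]

print(vif_two_predictors(first, loosely_related))   # modest inflation
print(vif_two_predictors(first, nearly_duplicate))  # severe inflation
```

A VIF near 1 suggests the two predictors carry separate information. A very large value suggests that the regression cannot reliably tell them apart, which is the problem the term multicollinearity names.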
GAUSS
Very briefly, the Gaussian world is constructed by looking for the most recurring event, which becomes the average of the numbers from low to high. The numbers are summed in the numerator and divided by their count in the denominator to give the mean. The deviation of each number from the mean is then calculated. Either the signs are disregarded and the deviations are averaged (the average deviation), or the deviations are squared, averaged, and the square root is taken (the standard deviation). Together, these can give birth to the bell curve. Further, if the numbers are robust to the level of ratio power (graphpad.com/faq/viewfaq.cfn?faq=1089), this sequence can give birth to t tests, F tests, and analysis of variance. From there, multiple regression can occur. The independent variable with the largest beta weight is taken to have the most impact on the dependent variable. Univariate analysis may tolerate nominal, ordinal, and interval measures.
Even the harshest critic admits that Gaussian statistics can apply to such things as poker games and death rates, where upper and lower limits are known, as well as to other areas (Gladwell, 2009: 69).
Further, Gaussian statistics fits nicely into the assumptions of science: that nature is orderly, that we can know it, that all phenomena have natural causes, and that nothing can be accepted without testing. With Gauss, the world can appear cozy and knowable, and at the time of this writing it is the standard of nearly all societies (Cheng, E., 2009: 967-977). This is the incredible power of this theory-of-numbers strategy. (web.utk.edu/dhastings/Basic_Assumptions_of_Science.htm)
However, the world appears to be chaotic at the point level where one exists (Snell, J., Cangemi, J., and C. Kowalski, 2008). Order comes from distance. Multiple regression assumes that variables are normally distributed, that their variance is homoscedastic, and that measures are reliable. Last, it assumes that independent and dependent variables can be truly isolated and tested (pareonline.net/getvn.asp?v=8&n=2).
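The sequence described above (mean, then deviations from the mean, then either the average deviation or the standard deviation) can be sketched in a few lines. This is a minimal illustration under our own assumptions: the function name and the scores are hypothetical, and the standard deviation is the population form (dividing by N rather than N - 1).

```python
import math

def describe(values):
    """Return (mean, average absolute deviation, population standard deviation)."""
    n = len(values)
    mean = sum(values) / n                      # sum in the numerator, count in the denominator
    deviations = [v - mean for v in values]     # signed distance of each number from the mean
    avg_abs_dev = sum(abs(d) for d in deviations) / n        # signs disregarded
    std_dev = math.sqrt(sum(d * d for d in deviations) / n)  # squares, then square root
    return mean, avg_abs_dev, std_dev

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(scores))  # (5.0, 1.5, 2.0)
```

The two measures of spread differ because squaring gives more weight to the numbers farthest from the mean, a point that matters for the criticisms discussed later.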
CRITICISMS
We don’t live in a Gaussian world. David Li, using a variation of Gaussian statistics, helped bring down Wall Street in the Great Recession of 2008 (Hornbrook, 2009). This criticism is especially applicable to Robert Merton’s “Portfolio Theory.” The theory nearly destroyed the market in the late 1990s. However, he still received a Nobel Prize, and MBAs are still learning it as a valuable tool (Taleb, 2007: 278-280).
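To see why a single extreme event sits so badly with the bell curve, it may help to ask how often a Gaussian world expects one. The sketch below is our own illustration, not a calculation from Li, Merton, or Taleb. It computes the two-sided tail probability of a standard normal variable using the identity P(|Z| > k) = erfc(k / sqrt(2)). The function name is hypothetical.

```python
import math

def two_sided_tail(k):
    """Probability that a standard normal value lies more than k
    standard deviations from the mean, in either direction."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 2, 3, 5, 10):
    print(f"{k:>2} standard deviations: {two_sided_tail(k):.3e}")
```

Under these assumptions the probability shrinks extremely fast as k grows, so a model built on the bell curve treats a ten-standard-deviation move as practically impossible. If such moves are observed in real markets, that suggests the Gaussian assumption, rather than the market, is what failed.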
Using dummy variables that are nominal but treated as ratio is another criticism, mentioned above. It is a considerable stretch from nominal to ratio, although some suggest that a dummy variable can act as an on/off switch or represent groups of data that are themselves ratio (www.socialresearchmethods.net/kb/dummyvar.php/).
Last, multiple regression assumes that the variables being tested are truly isolated and that the remaining world around the statistics is calm.
Goertzel (2002, 2004) noted how multiple regression can sully research when simpler modeling is not only more understandable but also more valid. His articles indicated that multiple regression is “junk science.”
We disagree. Rather than toss it out, we want to find ways in which other strategies may also be useful. Is that not what the Hegelian dialectic is all about?
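The “on/off switch” reading can be checked with a small sketch. When a single 0/1 dummy is the only predictor, the least-squares slope equals the difference between the two group means, and the intercept equals the mean of the group coded 0. The code below is our own illustration with hypothetical names and invented data. It also recodes the switch as 0/2 to show that the coefficient depends on the arbitrary numbers chosen, which is one way to state the leap of faith.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: group membership (0 or 1) and a ratio-level outcome.
group = [0, 0, 0, 1, 1, 1]
outcome = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0]

intercept, slope = simple_ols(group, outcome)
mean_group0 = sum(o for g, o in zip(group, outcome) if g == 0) / 3
mean_group1 = sum(o for g, o in zip(group, outcome) if g == 1) / 3

print(intercept, slope)                        # intercept = mean of group 0, slope = gap
print(mean_group0, mean_group1 - mean_group0)  # the same two numbers

# Recode the switch as 0/2: the slope halves, but slope * 2 is the same gap.
recoded = [2 * g for g in group]
print(simple_ols(recoded, outcome))
```

On this reading the dummy is not measuring a quantity at all. The regression simply reports the difference between group means, and the 0/1 coding is the convention that makes the slope equal that difference.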
MANDELBROT
This individual and his approach start with the premise that the middle may miss the mark of validity. Assume that the year 2001 has 365 days and that, for some reason, the average on a certain ratio measure falls around July 15th. Mandelbrot starts instead with September 11th of that year. The attack on American soil, when large airplanes plowed into important buildings, was such a supreme act that the day became the Black Swan that reordered numerous events and assumptions in the United States. This was even more incredible than the Oklahoma City bombing of years before. From this point, iterations flow and can be counted. Some of the theory may be best explained in Snell et al. (2008: 11-17) (see Brockmartin.com). Please see the terms and theory in the article noted. For the purpose of this article, however, the terms include fractals, phase portraits, attractors, and bifurcation. Taleb (ibid.) spends all of chapter 16 explaining calculations. Further, a Mandelbrot set calculator is available at (www.disordered.org/JTMandel.html). As with the above discussion of Gauss, we will leave the reader to these sources for calculation.
However, please recall that for the year 2001 we did not choose the “average” day of July 15th, but September 11th, now known as 9/11. It is the most historic day of that year. We would then start with a portion of fractal geometry and let the computer calculate the iterations and related measures.
So will this strategy get us to ultimate validity and reliability? We don’t know, but we doubt it. That is why we don’t want to delete Gauss, and we believe that the world will find Gauss useful as well as Mandelbrot. Then something will critique these approaches, and hopefully we will march on to wherever the new criticisms lead.
SERENDIPITY
We want to discuss a third strategy that may go to the heart of the level of discovery rather than the level of verification that is promised by Gauss and Mandelbrot. Snell & Marsh (2008) discuss the level of discovery in terms of exotics that create new hypotheses and new paradigms. It is also the third portion of measurement, one that stumbles onto validity.
In his last book before he died, Robert Merton, the father of the Merton of Portfolio Theory mentioned above, analyzed and discussed the surprises that one can find in research (Merton & Barber, 2004) and underlined how the word has been abused and used throughout the years. An interesting footnote is that Merton rebranded the term and aggregated its history (Tolson, 2004). For the purpose of this paper, we will use the term to mean “surprise discovery.” In Chaos theory, it is the first bifurcation. Or, put another way, an inadvertent measurement stumbles onto a discovery and its verification.
EXAMPLES
We want to use some terms that approximate Serendipity, using our own definitions. They are:
1) “Off label”: we create a medication for one symptom and accidentally find that it works not only for that symptom but also for some other malady that may be entirely unrelated.
2) “Deconstruction”: in viewing research that is generally backed by established sources, we find that the data in fact support a contrary and unaccepted finding.
3) “Outliers”: findings where the isolated exception may tell us more about what appears to be valid than the modal numbers do.
4) “Chaotic Butterflies”: a small but remote variable makes an incredible change in the outcome of the research.
5) “Black Swans”: a variable is introduced that is so far out of the box that it decimates nearly all the previous assumptions.
6) “Urban Legends”: an assumption is incorporated in research and repeated so often that it is thought to be valid when it may not be, such as a divorce rate of 50% (Luscombe, 2010).
7) “Paradoxes”: the findings, when faced with reality, do just the opposite of the desired goals.
8) “Alternative Paradigms”: because we now approach the same numbers with a new theory, the research becomes more understandable.
9) “Misinterpretation”: researchers discover that the author of a certain theory meant one thing and the majority of those counting numbers thought something else.
10) “Conundrums”: one or more research producers keep coming up at a loss, generation after generation. Whatever is being tested goes beyond understanding, and each new generation comes to the same conclusion.
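The “Chaotic Butterflies” entry in the list above can be illustrated with the logistic map, a standard textbook example of sensitive dependence on initial conditions. The choice of this map, the growth parameter of 4.0, the starting values, and the function name are all our own assumptions for the sake of a sketch. They are not drawn from the sources cited in this paper.

```python
def logistic_orbit(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    x = x0
    orbit = [x]
    for _ in range(steps):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit

# Two starting points that differ by one part in ten billion.
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)
gap = [abs(p - q) for p, q in zip(a, b)]

print("gap at the start:", gap[0])
print("largest gap seen:", max(gap))
```

The two orbits begin indistinguishable to any practical measurement, yet within a few dozen iterations they bear no resemblance to each other. A small but remote difference has made an incredible change in the outcome.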
SUMMARY
We reintroduced Gauss to compare him with Mandelbrot. Serendipity was also discussed as a way to cover the gamut of statistical measurement and analysis. However, we also saw another reason. Gauss is the heart of modernism and its assumptions. Mandelbrot, though he did not mean to be, is the author of postmodern statistics, especially the non-linear Chaos theory that arrived on college campuses in the late 1970s. At roughly the same time, so did deconstruction as defined by neo-postmodernism, in which nominalism is treated as a very soft variable so that one can still abstract for another, however imprecisely, what happened in the past.
Serendipity is the third level of discovery and has its attachment to pre-modernism, although it is still very apparent today. One finds an inadvertent surprise discovery, and it translates itself into research. Originally, before modernism, research could take on the aspects of trial and error. Information was then passed down to the next generation, first via oral tradition and then via the written word. When groups die out, perhaps some very helpful cures or findings that might even hold up today pass into the ages.
CONCLUSION
This has been a discussion of Gauss, Mandelbrot, and Serendipitous issues. Hopefully, it covers some of the territory in a more readable way than past descriptions.
REFERENCES CITED