In the previous post we looked at a simple linear regression between (simulated) human heights and weights. In that example, the regression was truly ‘linear’, in that the predictor variable only occured in its first power, to provide a linear regression equation on the form mu = beta * x + alpha.
From a purely mathematical point of view though, there is nothing preventing us using a higher power for the predictor, in a polynomial form. The only thing that changes (except for some of the ‘fancy’ plotting code’) is the regression equation, and adding a new prior (beta2);
for instance for a quadratic regression (for some weird reason, it’s still called ‘linear’ regression though!) :
mu = beta2 * x*x + beta1 * x + alpha
in fact, we can add arbitrarily many terms with increasing powers, each with their own prior stochastic coefficient.
Below results from the first three powers, this time on real data from [Howell]:
power = 1:
power = 2:
power = 3:
As can be seen, the more terms with higher powers we add, the better the fit. In fact, by adding terms we can create a perfect fit, where the regression line passes exactly thru each data point.
Now, the question becomes: which of these three models is “better”…? Perhaps non-intuively, from a statistical inference point of view, it’s rarely the one with the “best” fit… I might return to elaborate on that topic in a later post, meanwhile, I encourage the curious reader to think a bit about why this might be so….