
Friday, January 09, 2009

 

What's Up with Polynomial Fits (yes, it has astronomy content)

A reader has wondered why I was so down on polynomial fits in my post on the Australian's War Against Science*. Polynomial fits have their place, but to give you an idea of why they are a Bad Thing (tm) for climate data (and a wide range of other data), without extensive mathematical explanations, have a look at the graph to the left.

The raw data, on casual observation, is a bit noisy, but appears to plateau then fall in the final observations.

The polynomial fit is the green curvy line (I've used a 5th order fit here, as the graph in the Jon Jenkins article used a 5th order polynomial). It sort of wobbles up and down a bit, then does a spectacular dive at the end, just like in the Jenkins graph.

A pretty good estimation of the trend, eh? No. The data is actually a continuously increasing line to which I have added noise. The scale of the graph and the range of noise variation were chosen to match the UAH tropospheric temperature data. The "trend" given by the polynomial is dead wrong. The linear fit (black straight line) recovers the real line out of the noise pretty well.

The polynomial fit is quite sensitive to the final data points in a series; it just needs a bit of noise to send the curve flailing around like a snapped hawser. And climate data is very noisy, which is why scientists study long-term trends, where the noise can be averaged out. Also, as we are coming off a La Nina event, we expect there to be a drop in global temperatures; it's what La Ninas do. To use a polynomial fit of truncated data when you know there is more data and you know we are coming out of a La Nina event is either sheer stupidity or culpable fraud.
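
One way to put a number on that end-point sensitivity is the "leverage" of each observation: for an ordinary least-squares fit, the variance of the fitted curve at a point is the noise variance times that point's leverage. The sketch below is only an illustration under stated assumptions (100 evenly spaced observations, plain least squares, standard library only); the helper names `orthonormal_basis` and `leverage` are made up for this example and are not taken from any particular package.

```python
import math

def orthonormal_basis(xs, degree):
    """Polynomials up to `degree`, orthonormalised on the points xs (modified Gram-Schmidt)."""
    lo, hi = min(xs), max(xs)
    ts = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in xs]  # rescale x to [-1, 1] for stability
    basis = []
    for k in range(degree + 1):
        v = [t ** k for t in ts]
        for q in basis:  # strip out the part already explained by lower orders
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis

def leverage(xs, degree):
    """Diagonal of the hat matrix: how strongly noise at a point moves the fit at that point."""
    basis = orthonormal_basis(xs, degree)
    return [sum(q[i] ** 2 for q in basis) for i in range(len(xs))]

xs = list(range(100))  # assumption: 100 evenly spaced observations
for degree in (1, 5):
    lev = leverage(xs, degree)
    print(f"order {degree}: leverage mid-series {lev[50]:.3f}, final point {lev[-1]:.3f}")
```

The final point of the 5th order fit should come out with several times the leverage of the final point of the straight-line fit, and far more than the points mid-series, which is the flailing hawser in numerical form.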

This is why you don't use polynomial fits in these circumstances: they will either be misleading or completely wrong. Have a look at the second graph; it is a graph of stellar intensity. The black line is the polynomial fit. Now, if someone told you that the polynomial "trend" meant that the star would soon fade away, would you accept that?

Especially if you knew it was a truncated section of the intensity graph of Algol? That's the sort of thing Jenkins is trying to get away with.

* They are at it again today, with a headline "Obesity Epidemic a Myth", which grossly distorts the actual science. I'll rant about that later (our kids are no longer getting exponentially fatter; they are still some of the fattest kids in the world, though).



Comments:
It looks to me like the polynomial fit matched the data rather well in your second graph.

Would a linear fit have been better?

Do not confuse fitting known data with extrapolation. These are two very different concepts.

Extrapolation of a polynomial fit, unless very well behaved, will always be wrong.
 
For camera calibration, I often average hundreds of images together and use a polynomial fit to approximate the vignette of the optics.

Because of random noise, you want to use the minimum number of polynomial terms that can reproduce the basic vignette of the lens.

Optical vignette is a low order (bright in the middle and dark at the edges) distortion.

HINT: Use this method with the STEREO satellite data!

P.S: I am not "Anonymous", but have no idea how to use my actual name in these comments. A person should be proud to use his own name.
 
"The data is actually a continuously increasing line to which I have added noise."

Why did you have to add noise?

What was wrong with the original data?

Were you not getting the results that you had expected, and by adding random noise, it became a better fit?
 
"It looks to me like the polynomial fit matched the data rather well in your second graph."

Except it fails to follow the recovering light curve, and the "trend" keeps on shooting down, even as the light curve is going up. So a polynomial fit is not useful in a noisy environment where the temperature will dip (or spike) and recover.
 
"The data is actually a continuously increasing line to which I have added noise."

Why did you have to add noise?

What was wrong with the original data?

Were you not getting the results that you had expected, and by adding random noise, it became a better fit?


[SFX: exasperated sigh and drumming of fingers] The whole point is to show how noise affects the accuracy of a polynomial fit. So you take a known curve or line, add some noise, and see if the polynomial fit recovers the line or curve (I’m an experimentalist, I like to run experimental demonstrations).

All systems have noise. Noise comes from a variety of areas. Back in the day my waking life was spent trying to fit radioligand binding data to one site or two site logistic equations. The difference between a one site and a two site fit meant the discovery of a new protein, with the accompanying fame, adulation and attention from hot nerdettes that this entails. Hormones bind to receptors deterministically, and a hormone receptor binding curve should theoretically follow the logistic curve exactly. But in reality that never happens; all sorts of preparation and measurement errors add noise.

If you want to see how noise affects the particular tests for statistical identity you are using, you generate a logistic curve and add noise.

Now, climate data is noisy: you have all sorts of sources of noise, from measurement error to the quasiperiodic El Nino-La Nina cycle and the occasional volcano messing things up. To what degree can a polynomial fit recover a true curve from a noisy environment? To demonstrate this I took a known deterministic curve (a straight line), the slope of which was matched to the values seen in the global tropospheric data (a rise of 0.13 degrees per decade). The amount of noise added was matched to the real variability of the satellite temperature record.

As we saw, the polynomial fits could not accurately recover the real curve underlying the data. Indeed, the terminal part of the fit deviated widely from the real curve. Thus any claim that a polynomial fit to the real climate data is a “trend”, and that warming has been wiped out, is false.

You can demonstrate this yourself very easily. The line I used was y = 0.004x - 0.004. If you have Excel or something similar, you can do this by setting cell A1 to 0 and A2 to =A1+0.004, then copying cell A2 down to around 100 or so cells. To add noise I used the formula =A1+(RANDBETWEEN(-30,30)/100) in B1, copied down alongside column A. Plot the values in column B as an X,Y scatterplot, then add various polynomial trend lines. By hitting the F9 key you can generate new random noise, and you can watch the terminal section of the polynomial fit whip around like crazy. You can even play with the amount of noise and see how that affects the polynomial fit (it's fun, I recommend it). You can try other sorts of curves too and see what happens.
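
For anyone without a spreadsheet to hand, roughly the same experiment can be sketched in Python. This is a stand-in for the Excel recipe, not the sheet itself: `random.randint(-30, 30) / 100` plays the role of RANDBETWEEN(-30,30)/100, each seed plays the role of one press of F9, and the helper names (`orthonormal_basis`, `poly_fit_values`, `make_series`) are invented for this example. It uses only the standard library; a routine such as `numpy.polyfit` would do the fitting just as well.

```python
import math
import random

def orthonormal_basis(xs, degree):
    """Polynomials up to `degree`, orthonormalised on the points xs (modified Gram-Schmidt)."""
    lo, hi = min(xs), max(xs)
    ts = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in xs]  # rescale x to [-1, 1] for stability
    basis = []
    for k in range(degree + 1):
        v = [t ** k for t in ts]
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis

def poly_fit_values(xs, ys, degree):
    """Least-squares polynomial fit of the given degree, returned as fitted values at xs."""
    fitted = [0.0] * len(xs)
    for q in orthonormal_basis(xs, degree):
        coef = sum(a * b for a, b in zip(q, ys))
        fitted = [f + coef * a for f, a in zip(fitted, q)]
    return fitted

def make_series(n=100, seed=None):
    """Column A (the true line) and column B (line plus noise) of the spreadsheet."""
    rng = random.Random(seed)
    xs = list(range(1, n + 1))                    # spreadsheet row numbers
    truth = [0.004 * x - 0.004 for x in xs]       # y = 0.004x - 0.004, so row 1 holds 0
    noisy = [t + rng.randint(-30, 30) / 100 for t in truth]
    return xs, truth, noisy

for seed in range(5):  # five "presses of F9"
    xs, truth, noisy = make_series(seed=seed)
    linear_end = poly_fit_values(xs, noisy, 1)[-1]
    quintic_end = poly_fit_values(xs, noisy, 5)[-1]
    print(f"seed {seed}: true end {truth[-1]:.3f}, "
          f"linear end {linear_end:.3f}, 5th order end {quintic_end:.3f}")
```

Changing the `30` in `make_series` changes the amount of noise, and swapping the expression for `truth` lets you try other curves.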

The bottom line: polynomial fits and climate data, just say no.
 
I consider Ian a personal friend, and if we debate a specific subject, then that is what friends do.

However, if any post-grad student working for me had presented these two graphs, and tried to convince me that the polynomial fit had a lower standard deviation with the raw data...

I would have fired him!

Sometimes, the concept of "double blind" can keep scientists honest.
 
What does "a lower standard deviation with the raw data" mean? Do you mean that the correlation or the similarity is low?

Hopefully your post-grad would have a chance to point out that matching the raw data isn't usually the point of curve-fitting--if it was, you'd just draw lines connecting the original points and call it a day. Curve-fitting is intended to accurately depict the true curve you're investigating--something the raw data, biased and noisy as they are, may not do.
 
"However, if any post-grad student working for me had presented these two graphs, and tried to convince me that the polynomial fit had a lower standard deviation with the raw data..."

I don't know how to say this more clearly. This is one of my many failings as a science communicator. Let's try this:

We know in advance the true curve.

We know in advance the true curve.

And if the polynomial doesn't recover the true curve, then it is wrong.
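
To make that distinction concrete, here is a rough sketch that scores each fit two ways: against the noisy points it was fitted to, and against the true line we know in advance. The line and noise follow the spreadsheet recipe (y = 0.004x - 0.004 with noise of up to ±0.30); the 200 repeats, the fixed seed and the helper names (`orthonormal_basis`, `poly_fit_values`, `mean_errors`) are assumptions made purely for illustration.

```python
import math
import random

def orthonormal_basis(xs, degree):
    """Polynomials up to `degree`, orthonormalised on the points xs (modified Gram-Schmidt)."""
    lo, hi = min(xs), max(xs)
    ts = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in xs]  # rescale x to [-1, 1] for stability
    basis = []
    for k in range(degree + 1):
        v = [t ** k for t in ts]
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis

def poly_fit_values(xs, ys, degree):
    """Least-squares polynomial fit of the given degree, returned as fitted values at xs."""
    fitted = [0.0] * len(xs)
    for q in orthonormal_basis(xs, degree):
        coef = sum(a * b for a, b in zip(q, ys))
        fitted = [f + coef * a for f, a in zip(fitted, q)]
    return fitted

def mean_errors(degree, trials=200, n=100, seed=0):
    """Average squared error of the fit versus the noisy data and versus the true line."""
    rng = random.Random(seed)  # same seed for every degree, so each sees the same noise
    xs = list(range(1, n + 1))
    truth = [0.004 * x - 0.004 for x in xs]
    vs_data = vs_truth = 0.0
    for _ in range(trials):
        noisy = [t + rng.randint(-30, 30) / 100 for t in truth]
        fit = poly_fit_values(xs, noisy, degree)
        vs_data += sum((f - y) ** 2 for f, y in zip(fit, noisy)) / trials
        vs_truth += sum((f - t) ** 2 for f, t in zip(fit, truth)) / trials
    return vs_data, vs_truth

for degree in (1, 5):
    vs_data, vs_truth = mean_errors(degree)
    print(f"order {degree}: error vs noisy data {vs_data:.3f}, error vs true line {vs_truth:.3f}")
```

The 5th order fit can never do worse than the straight line against the noisy points, because it has extra terms to chase the noise with, yet on average it lands further from the true line. Sitting closer to the raw data is not the same thing as recovering the curve.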
 
yes,yes, Ian, but why did you add the NOISE??

Isn't that the same sort of dodgy behavior we've come to expect from the global warming alarmists, even though we're actually probably in an Ice Age?

P.S. The polynomial fit is more elegant and beautiful, and Einstein said that was very important in physics, and indeed in science generally.
 
Marion,

You are usually quite dry with your irony but perhaps you've overdone it this time? ;)
 
probably, TrueSceptic, but in my defense, my head hurt.

It's clearly turtles all the way down, isn't it?
 
@Marion Delgado

The graph is of random data added to a line. (No real world measurement)

The graph is ONLY there to show why a polynomial fit is bad; it says nothing about real data.
 