Least-Squares Line Fits and Associated Uncertainty
By David Archer
Introduction
There are many measurement situations in which one is trying to determine whether there is a linear relationship between a pair of measured values, for instance the relationship between stress and strain, voltage and current, or input voltage and output voltage. In this article the basics of least-squares line fits will be discussed, along with a basic uncertainty analysis.
The General Least-Squares Problem
In the general least-squares problem, one has a set of N measured data points collected as ordered pairs
$$(x_i, y_i), \quad i = 1, \dots, N,$$
and one wants to fit this data to the functional form
$$y = f(x; c_1, c_2, \dots, c_M),$$
where the $c_j$ are the M parameters that define the function.
One can define the residuals of the data set from
$$r_i = y_i - f(x_i; c_1, \dots, c_M),$$
which measures how far each data point deviates from the function along a vertical line. A measure of the goodness of the fit is the root-mean-square (RMS) value of the residuals,
$$R = \sqrt{\frac{1}{N} \sum_{i=1}^{N} r_i^2}.$$
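As a small illustration of the residual and RMS definitions, the sketch below evaluates the RMS of the vertical residuals for any candidate function. The function name `rms_residual` and the sample data are hypothetical, chosen only for this example.

```python
import math

def rms_residual(xs, ys, f):
    """RMS of the vertical residuals r_i = y_i - f(x_i)."""
    residuals = [y - f(x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Hypothetical data lying exactly on y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

exact = rms_residual(xs, ys, lambda x: 2.0 * x + 1.0)  # model matches the data
offset = rms_residual(xs, ys, lambda x: 2.0 * x)       # every residual equals 1
print(exact, offset)
```

A smaller RMS value means the candidate function passes closer to the data, which is the quantity the least-squares procedure minimizes.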
The general least-squares problem is to find the constants $c_j$ that minimize the RMS value of the residuals, denoted here by $R$. From basic calculus, one knows that a minimum requires
$$\frac{\partial R}{\partial c_j} = 0, \quad j = 1, \dots, M.$$
One generates a series of M such equations for the M unknowns in the functional form. Solving this system of equations results in the least-squares fit for the particular functional form.
The Least-Squares Line Fit Problem
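The problem at hand is to fit the data to the functional form
$$y = m x + b,$$
where the slope m and the intercept b are chosen to minimize the RMS residuals
$$R = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - m x_i - b)^2}.$$
Taking the derivative of this expression with respect to b and equating it to zero results in
$$\frac{\partial R}{\partial b} = -\frac{1}{N R} \sum_{i=1}^{N} (y_i - m x_i - b) = 0.$$
With a little manipulation this reduces to
$$m \sum_{i=1}^{N} x_i + N b = \sum_{i=1}^{N} y_i.$$
Performing the same operation for the parameter m,
$$\frac{\partial R}{\partial m} = -\frac{1}{N R} \sum_{i=1}^{N} x_i (y_i - m x_i - b) = 0,$$
which can be manipulated into the form
$$m \sum_{i=1}^{N} x_i^2 + b \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} x_i y_i.$$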
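To simplify these equations, the following quantities are introduced:
$$S_x = \sum_{i=1}^{N} x_i, \quad S_y = \sum_{i=1}^{N} y_i, \quad S_{xx} = \sum_{i=1}^{N} x_i^2, \quad S_{xy} = \sum_{i=1}^{N} x_i y_i.$$
The equations that determine the minimum RMS residual can now be recast in the form
$$m S_{xx} + b S_x = S_{xy},$$
$$m S_x + b N = S_y,$$
or written as a matrix equation
$$\begin{pmatrix} S_{xx} & S_x \\ S_x & N \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} S_{xy} \\ S_y \end{pmatrix}.$$
This system of equations has the solution
$$m = \frac{N S_{xy} - S_x S_y}{N S_{xx} - S_x^2}, \qquad b = \frac{S_{xx} S_y - S_x S_{xy}}{N S_{xx} - S_x^2},$$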
which solves the least squares problem.
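The closed-form solution can be sketched in a few lines of Python. The function name `fit_line`, the variable names for the sums, and the sample data are assumptions made for this illustration; the sketch accumulates the sums of x, y, x squared, and x times y and then solves the two-by-two system directly.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    s_x = sum(xs)                               # sum of x_i
    s_y = sum(ys)                               # sum of y_i
    s_xx = sum(x * x for x in xs)               # sum of x_i^2
    s_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    det = n * s_xx - s_x ** 2                   # determinant of the 2x2 system
    if det == 0:
        raise ValueError("need at least two distinct x values")
    m = (n * s_xy - s_x * s_y) / det
    b = (s_xx * s_y - s_x * s_xy) / det
    return m, b

# Hypothetical measurements scattered around y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
m, b = fit_line(xs, ys)
print(m, b)
```

The determinant vanishes only when all x values coincide, in which case no unique line exists, so the sketch raises an error rather than dividing by zero.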
Uncertainty Analysis
Assuming for the moment that the slope or intercept were the quantities that one was actually trying to measure, one should perform an uncertainty analysis on these parameters by using the "Law of propagation of uncertainty" to determine the uncertainty in the slope and intercept in terms of the uncertainty in the measurement data.
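The uncertainty in the slope can be written
$$u^2(m) = \sum_{i=1}^{N} \left[ \left( \frac{\partial m}{\partial x_i} \right)^2 u^2(x_i) + \left( \frac{\partial m}{\partial y_i} \right)^2 u^2(y_i) \right],$$
where $u(x_i)$ is the uncertainty associated with $x_i$, and $u(y_i)$ is the uncertainty associated with $y_i$. This form treats the individual measurements as uncorrelated.
Writing $D = N S_{xx} - S_x^2$ for the denominator of the least-squares solution, with the sums $S_x = \sum_k x_k$, $S_y = \sum_k y_k$, $S_{xx} = \sum_k x_k^2$, and $S_{xy} = \sum_k x_k y_k$, it can be shown that the partial derivatives are
$$\frac{\partial m}{\partial x_i} = \frac{N y_i - S_y - 2 m (N x_i - S_x)}{D}$$
and
$$\frac{\partial m}{\partial y_i} = \frac{N x_i - S_x}{D}.$$
Similarly, the uncertainty in the intercept is
$$u^2(b) = \sum_{i=1}^{N} \left[ \left( \frac{\partial b}{\partial x_i} \right)^2 u^2(x_i) + \left( \frac{\partial b}{\partial y_i} \right)^2 u^2(y_i) \right],$$
and the partial derivatives are
$$\frac{\partial b}{\partial x_i} = \frac{2 x_i S_y - S_{xy} - S_x y_i - 2 b (N x_i - S_x)}{D}$$
and
$$\frac{\partial b}{\partial y_i} = \frac{S_{xx} - S_x x_i}{D}.$$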
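The propagation of the measurement uncertainties into the slope and intercept can be sketched as follows. This is a minimal illustration that assumes uncorrelated measurements and obtains the partial derivatives by differentiating the closed-form expressions for m and b; the function name `fit_line_with_uncertainty` and the sample numbers are hypothetical.

```python
import math

def fit_line_with_uncertainty(xs, ys, u_xs, u_ys):
    """Return (m, b, u_m, u_b) for y = m*x + b, assuming uncorrelated inputs."""
    n = len(xs)
    s_x, s_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * s_xx - s_x ** 2
    if det == 0:
        raise ValueError("need at least two distinct x values")
    m = (n * s_xy - s_x * s_y) / det
    b = (s_xx * s_y - s_x * s_xy) / det

    var_m = 0.0
    var_b = 0.0
    for x, y, ux, uy in zip(xs, ys, u_xs, u_ys):
        # Partial derivatives of m and b with respect to this data point.
        dm_dy = (n * x - s_x) / det
        dm_dx = (n * y - s_y - 2.0 * m * (n * x - s_x)) / det
        db_dy = (s_xx - s_x * x) / det
        db_dx = (2.0 * x * s_y - s_xy - s_x * y - 2.0 * b * (n * x - s_x)) / det
        # Law of propagation of uncertainty, one point at a time.
        var_m += (dm_dx * ux) ** 2 + (dm_dy * uy) ** 2
        var_b += (db_dx * ux) ** 2 + (db_dy * uy) ** 2
    return m, b, math.sqrt(var_m), math.sqrt(var_b)

# Hypothetical data: exact x values, 0.1 standard uncertainty on each y.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
m, b, u_m, u_b = fit_line_with_uncertainty(xs, ys, [0.0] * 4, [0.1] * 4)
print(m, b, u_m, u_b)
```

When every x is exact and every y has the same uncertainty, the sums collapse to the familiar results that the slope variance is N times the y variance divided by the determinant, and the intercept variance is the sum of squared x values times the y variance divided by the determinant.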
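It may also be the case that one wants to use the best-fit line parameters in future measurements. In such cases one wants, for a specific x, the value
$$y = m x + b.$$
Treating m, b, and x as uncorrelated, the uncertainty in the value y is then
$$u^2(y) = x^2 u^2(m) + u^2(b) + m^2 u^2(x).$$
Because m and b are computed from the same data they are in general correlated, so a fuller treatment adds the covariance term $2 x \, u(m, b)$.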
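Using the fitted line for a later measurement can be sketched as below. The sketch assumes that the slope, the intercept, and the new x value are uncorrelated, so it combines the three contributions in quadrature and leaves out any covariance between slope and intercept. The function name `predict_with_uncertainty` and the numbers are hypothetical.

```python
import math

def predict_with_uncertainty(x, u_x, m, b, u_m, u_b):
    """Predicted y = m*x + b and its uncertainty, ignoring m-b covariance."""
    y = m * x + b
    # Sensitivity coefficients: dy/dm = x, dy/db = 1, dy/dx = m.
    u_y = math.sqrt((x * u_m) ** 2 + u_b ** 2 + (m * u_x) ** 2)
    return y, u_y

# Hypothetical fit results and a new reading at x = 2.0 (illustration only).
y, u_y = predict_with_uncertainty(x=2.0, u_x=0.1, m=3.0, b=1.0, u_m=0.2, u_b=0.0)
print(y, u_y)
```

The slope contribution grows with the magnitude of x, so predictions far from the origin are dominated by the slope uncertainty under this simplified model.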
Now what has been calculated to this point has been uncertainty associated with the fit itself. It assumes that the underlying phenomenon is linear. This may not be the case. However, one does have a quantity that is a direct measure of the "goodness" of the linearity assumption, namely the residuals.
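There will be non-zero residuals for a given fit, even if the underlying phenomenon is perfectly linear, due to the uncertainty in the measurement. That, however, is predictable. To first order, and neglecting the uncertainty of the fitted m and b themselves, the RMS residual expected from measurement uncertainty alone is
$$R_{\text{pred}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ u^2(y_i) + m^2 u^2(x_i) \right]}.$$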
This can be compared against the actual residuals. If the actual residuals are significantly larger than the predicted residuals, one can say there is some non-linearity in the underlying phenomenon. In such a case one would probably want to combine the RMS residuals with the uncertainty in y for an overall uncertainty statement.
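The comparison between actual and predicted residuals can be sketched as follows. The predicted value here is a first-order estimate that assumes each residual inherits the uncertainty of its y value plus the slope times the uncertainty of its x value, and it ignores the uncertainty of the fitted parameters. The function name `residual_check` and the data are hypothetical; the sample points follow a parabola so that the non-linearity shows up clearly.

```python
import math

def residual_check(xs, ys, u_xs, u_ys, m, b):
    """Return (actual RMS residual, RMS residual predicted from uncertainties)."""
    n = len(xs)
    actual = math.sqrt(sum((y - m * x - b) ** 2 for x, y in zip(xs, ys)) / n)
    predicted = math.sqrt(
        sum(uy ** 2 + (m * ux) ** 2 for ux, uy in zip(u_xs, u_ys)) / n
    )
    return actual, predicted

# Hypothetical data following y = x^2; m = 3, b = -1 is its least-squares line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
actual, predicted = residual_check(xs, ys, [0.0] * 4, [0.1] * 4, m=3.0, b=-1.0)
print(actual, predicted)
```

In this example the actual RMS residual is about ten times the predicted one, which points to non-linearity in the underlying data. How large a ratio counts as significant is a judgment left to the analyst.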
Conclusion
The basics of least-squares line fits were presented, along with a basic uncertainty analysis. Hopefully this article can be useful as a reference if your measurement requires some sort of least-squares line fit. There are many phenomena, and many situations in calibration and measurement, where such a fit is useful.