Least-Squares Line Fits and Associated Uncertainty

By David Archer




There are several measurement situation where one is trying to determine if there is a linear relationship between a pair of measured values. For instance the relationship between stress and strain, voltage and current, input voltage and output voltage, etc.. In this article the basics of least-squares line fits will be discussed, along with a basic uncertainty analysis.


The General Least-Squares Problem


In the general least-squares problem, one has a set of measured data collected as ordered pairs



and one wants to fit this data to the functional form



where the ci are parameters M parameters that define the function.


One can define the residuals of the data set from



which measures how far each data point deviates from the function along a vertical line. A measure of the goodness of the fit is the root-mean-square (RMS) value of the residuals




The general least-squares problem is to find the constants ci that minimizes the RMS value of the residuals. From basic calculus, one knows one has a minimum if



One generates a series of M such equations for the M unknowns in the functional form. Solving this system of equations results in the least-squares fit for the particular functional form.


The Least-Squares Line Fit Problem


The problem at hand is to fit the data to the functional form



where the slope m, and the intercept b, are chosen to minimize the RMS residuals




Taking the derivative of this expression with respect to b and equating it to zero results in




With a little manipulation this reduces to



Performing the same operation for the parameter m,



which can be manipulated into the form



To simplify these equations, the following quantities are introduced



The equations that determine the minimum RMS residual can now be recast in the form



or written as a matrix equation



This system of equations have the solution



which solves the least squares problem.


Uncertainty Analysis


Assuming for the moment that the slope or intercept were the quantities that one was actually trying to measure, one should perform an uncertainty analysis on these parameters by using the  "Law of propagation of uncertainty" to determine the uncertainty in the slope and intercept in terms of the uncertainty in the measurement data.


The uncertainty in the slope  can be written



where  is the uncertainty associated with , and  is the uncertainty associated with .


It can be shown that the partial derivatives are








Similarly for the uncertainty in the intercept  is



and the partial derivatives are







Now it may also be the case that one wants to use the best fit line parameters to use in future measurements. In such cases one wants for a specific x



the uncertainty in the value y is then



Now what has been calculated to this point has been uncertainty associated with the fit itself. It assumes that the underlying phenomenon is linear. This may not be the case. However, one does have a quantity that is a direct measure of the "goodness" of the linearity assumption, namely the residuals.


There will be non-zero residuals for a given fit, even if the underlying phenomenon is perfectly linear, due to the uncertainty in the measurement. That however is predictable. The estimated residuals due to the measurement uncertainty is



This can be compared against the actual residuals. If the actual residuals are significantly larger than the predicted residuals, one can say there is some non-linearity in the underlying phenomenon. In such a case one would probably want to combine the RMS residuals with the uncertainty in y for an overall uncertainty statement.




The basics of least-squares line fits was presented, along with a basic uncertainty analysis. Hopefully this article can be useful as a reference if your measurement requires some sort of least-squares line fit. There are many phenomenon, and situations in calibration and measurement, where such a fit is useful.

Quick Links