Least-Squares Approximation

Often an exact solution to a problem in applied mathematics is difficult or impossible to obtain. However, it is usually just as useful to find an approximation to a solution. In particular, finding “linear approximations” is a powerful technique in applied mathematics. One basic case is the situation where a system of linear equations has no solution, and it is desirable to find a “best approximation” to a solution to the system.

We begin by defining the “best approximation” in a natural way, and showing that computing the best approximation reduces to solving a related system of linear equations called the normal equations. Next, we demonstrate a common application where a collection of data points is approximated by a line (or a curve). We conclude this section by showing that QR-factorization provides us with a more efficient way to solve the normal equations and compute the best approximation.

Best Approximate Solutions

Consider the matrix equation $A\vec{x}=\vec{b}$. A quick examination of the last two rows should convince you that this equation has no solutions. In other words, $\vec{b}$ is not in the span of the columns of $A$.

If $\vec{x}$ were an exact solution to $A\vec{x}=\vec{b}$, then $\vec{b}-A\vec{x}$ would be $\vec{0}$. Since the equation does not have a solution, we will attempt to find the next best thing to a solution by finding $\vec{z}$ such that $\|\vec{b}-A\vec{z}\|$ is as small as possible. The quantity $\|\vec{b}-A\vec{z}\|$ is called the error.

The following GeoGebra interactive will help you understand the geometry behind finding $\vec{z}$.

RIGHT-CLICK and DRAG to rotate the image for a better view.

Record your best guess for $\vec{z}$ – you will have a chance to check your answer in Example ex:leastSquares1.

What did you discover about the geometry of minimizing $\|\vec{b}-A\vec{z}\|$? Select all that apply.

$\vec{b}-A\vec{z}$ is orthogonal to the plane spanned by the columns of $A$.
$\vec{b}-A\vec{z}$ is orthogonal to $\vec{b}$.
$\vec{b}-A\vec{z}$ is orthogonal to $A\vec{z}$.
$A\vec{z}$ is orthogonal to $\vec{b}$.
$A\vec{z}$ is an orthogonal projection of $\vec{b}$ onto $\text{col}(A)$.

Our geometric observations will help us develop a method for finding $\vec{z}$.

Suppose $A$ is an $m\times n$ matrix, and $\vec{b}$ is a column vector in $\mathbb{R}^m$. Consider the matrix equation $A\vec{x}=\vec{b}$. If this equation does not have a solution, we can attempt to find a best approximation by finding $\vec{z}$ which minimizes the error, $\|\vec{b}-A\vec{z}\|$. The expression $\vec{b}-A\vec{z}$ is also sometimes called the residual. The error (or the residual) is given in terms of a vector norm. Recall that our definition of the norm involves the sum of squares of the vector components. When we minimize the norm, we minimize the sum of squares. This is why the method we are describing is often referred to as least squares. We will explore this idea further later in this section.
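To make the connection between the error and a sum of squares concrete, here is a minimal Octave sketch. The matrix, vector, and candidate solution below are hypothetical, chosen only for illustration; the point is that the norm of the residual is exactly the square root of the sum of its squared components.

% Hypothetical A, b, and candidate z; not taken from the exploration above.
A = [1 0; 0 1; 1 1];
b = [1; 2; 4];                 % b is not in col(A), so A*x = b has no solution
z = [1; 2];                    % a candidate approximate solution
r = b - A*z;                   % the residual b - A*z
err = norm(r);                 % the error ||b - A*z||
err_check = sqrt(sum(r.^2));   % square root of the sum of squares; same value
disp([err, err_check]);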

In the case when $\text{col}(A)$ is a subspace of $\mathbb{R}^3$, we can see geometrically that $\vec{z}$ is the best approximation if and only if $A\vec{z}$ is an orthogonal projection of $\vec{b}$ onto $\text{col}(A)$, and the error is the magnitude of $\vec{b}-A\vec{z}$, as shown below.

What we observed above holds in general. We will use this fact to find $\vec{z}$.

Every vector in $\text{col}(A)$ can be written in the form $A\vec{x}$ for some $\vec{x}$ in $\mathbb{R}^n$. Our goal is to find $\vec{z}$ such that $A\vec{z}$ is the orthogonal projection of $\vec{b}$ onto $\text{col}(A)$. By Corollary cor:orthProjOntoW, every vector in $\text{col}(A)$ is orthogonal to $\vec{b}-A\vec{z}$. This means $\vec{b}-A\vec{z}$ is in the orthogonal complement of $\text{col}(A)$, which is $\text{null}\left(A^T\right)$.

Therefore, we have
$$A^T\left(\vec{b}-A\vec{z}\right)=\vec{0}.$$
Rearranging gives the system
$$A^TA\vec{z}=A^T\vec{b}.\qquad\text{(eq:normalForZ)}$$

Since $\vec{b}-A\vec{z}$ is normal to the subspace $\text{col}(A)$, we call the system of linear equations in (eq:normalForZ) the normal equations for $\vec{z}$. If $A^TA$ is invertible, then we can write
$$\vec{z}=\left(A^TA\right)^{-1}A^T\vec{b}.\qquad\text{(eq:leastSquaresZ)}$$
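As a quick numerical illustration of the normal equations, here is a hedged Octave sketch. The system below is hypothetical (it is not the system from Exploration exp:leastSq1); the sketch solves $A^TA\vec{z}=A^T\vec{b}$ and checks that the residual is orthogonal to the columns of $A$.

% Hypothetical inconsistent system A*x = b.
A = [1 0; 0 1; 1 1];
b = [3; 2; 2];
z = (A' * A) \ (A' * b);   % solve the normal equations (A'A)z = A'b
r = b - A*z;               % residual
disp(z);                   % the best approximate solution
disp(A' * r);              % approximately the zero vector: r is orthogonal to col(A)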

We will return to the question of invertibility of $A^TA$ in Theorem th:ATAinverse. For now, let’s revisit the problem posed in Exploration exp:leastSq1.

We now come back to the question of when $A^TA$ is invertible.

Proof
Let $A$ be a matrix with linearly independent columns. We will show that $A^TA\vec{x}=\vec{0}$ has only the trivial solution. For $\vec{x}$, a solution of $A^TA\vec{x}=\vec{0}$, we have
$$\|A\vec{x}\|^2=(A\vec{x})\cdot(A\vec{x})=(A\vec{x})^T(A\vec{x})=\vec{x}^TA^TA\vec{x}=\vec{x}^T\vec{0}=0.$$
Therefore $A\vec{x}=\vec{0}$. By linear independence of the columns of $A$ we conclude that $\vec{x}=\vec{0}$.

We summarize our findings in the following theorem.

Application of Least Squares to Curve Fitting

In Curve Fitting, we discussed how to fit a function to a set of data points so that the graph of the function passes through each of the points. We also discussed why doing so is sometimes impossible (for example, when two points lie one directly above the other) and may not even be desirable (overfitting). In this section we will learn how to approximate a collection of data points with a line (or a curve) that fits the “trend” of the points. We will start with data that fit a linear pattern.

Consider the points , and . These points do not lie on a straight line, but they have a general upward linear trend. (Typically there would be many more points to consider, but we will limit our exploration to what we can do by hand.) Our goal is to find a line that fits these points as closely as possible.

We are looking for a function of the form $y=ax+b$ such that the following infeasible system is satisfied as closely as possible

From the first part of this section we know how to find a best approximation. By Theorem th:bestApprox, we have
$$\vec{z}=\left(A^TA\right)^{-1}A^T\vec{b}.$$

According to our computations, the resulting line is the one that best fits the data. Let’s take a look.

We found this fit by minimizing $\|\vec{b}-A\vec{z}\|$. We will now investigate the meaning of this expression in relation to the line and the data points.

Observe that each entry of $\vec{b}-A\vec{z}$ is the signed vertical distance between a particular data point and the line.

Instead of computing the error, $\|\vec{b}-A\vec{z}\|$, we will compute $\|\vec{b}-A\vec{z}\|^2$ to avoid the square root.

Minimizing $\|\vec{b}-A\vec{z}\|$ also minimizes $\|\vec{b}-A\vec{z}\|^2$. Therefore, what we have minimized is the sum of squares of the vertical distances between the data points and the line. The following GeoGebra interactive will help you explore this idea.

In Exploration exp:leastSq2 we discovered that $\|\vec{b}-A\vec{z}\|^2$ is the sum of squares of the vertical distances between the given data points and the proposed line. By minimizing $\|\vec{b}-A\vec{z}\|^2$, we minimize the sum of squares of the vertical distances. This observation holds in general. Given a collection of points $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$, to find a linear function of the form $y=ax+b$ that best fits the points, we would find a best approximate solution $\vec{z}=\begin{bmatrix}a\\b\end{bmatrix}$ to the system
$$\begin{bmatrix}x_1&1\\x_2&1\\\vdots&\vdots\\x_n&1\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix}=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}$$
by minimizing
$$\|\vec{b}-A\vec{z}\|^2=\left(y_1-(ax_1+b)\right)^2+\left(y_2-(ax_2+b)\right)^2+\dots+\left(y_n-(ax_n+b)\right)^2.$$
A geometric interpretation of $\|\vec{b}-A\vec{z}\|^2$ is shown below.

The line we obtain in this fashion is called a line of best fit or a trendline, and the method we used is referred to as the method of least squares.
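The entire trendline computation fits in a few lines of Octave. The data points below are hypothetical, used only to illustrate the setup; each row of the design matrix is $[x_i \; 1]$, matching the system above.

% Hypothetical data points (x_i, y_i); not the points used earlier in this section.
x = [0; 1; 2; 3];
y = [1; 2; 2; 4];
A = [x, ones(length(x), 1)];   % rows [x_i, 1] for the model y = a*x + b
z = (A' * A) \ (A' * y);       % z = [a; b]
printf("line of best fit: y = %gx + %g\n", z(1), z(2));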

We can apply the method of least squares to find best-fitting non-linear functions.
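For instance, to fit a quadratic $y=ax^2+bx+c$ by least squares, only the design matrix changes: each row becomes $[x_i^2 \; x_i \; 1]$. Here is a minimal Octave sketch with hypothetical data.

% Hypothetical data; least-squares fit of a quadratic y = a*x^2 + b*x + c.
x = [-1; 0; 1; 2; 3];
y = [ 2; 1; 2; 5; 9];
A = [x.^2, x, ones(length(x), 1)];
z = (A' * A) \ (A' * y);       % z = [a; b; c]
printf("y = %gx^2 + %gx + %g\n", z(1), z(2), z(3));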

QR-Factorization: A Quicker Way to do Least Squares

When solving the normal equations in (eq:normalForZ), it is advantageous to have a QR-factorization of $A$. For then we can write
$$A^TA\vec{z}=A^T\vec{b}$$
$$(QR)^T(QR)\vec{z}=(QR)^T\vec{b}$$
$$R^TQ^TQR\vec{z}=R^TQ^T\vec{b}$$
$$R^TR\vec{z}=R^TQ^T\vec{b}$$
(using the fact that $Q^TQ=I$).

Since $R$ is invertible, $R^T$ also has an inverse, and multiplying on the left by $\left(R^T\right)^{-1}$ yields
$$R\vec{z}=Q^T\vec{b}.$$

This last equation is easily solved by back-substitution, since $R$ is upper triangular. This greatly reduces the amount of computations we need to make, as we will observe by using Octave in our final example of the section.
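Here is a hedged Octave sketch of this approach, using a hypothetical system rather than the one from the final example. The economy-size factorization is computed with qr(A, 0), and $R\vec{z}=Q^T\vec{b}$ is then solved by back-substitution, which the backslash operator performs automatically when the matrix is triangular.

% Hypothetical system; least squares via QR instead of forming A'A.
A = [1 0; 0 1; 1 1];
b = [3; 2; 2];
[Q, R] = qr(A, 0);     % economy-size QR: A = Q*R, Q'*Q = I, R upper triangular
z = R \ (Q' * b);      % back-substitution on R*z = Q'*b
disp(z);               % matches the solution of the normal equations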

Practice Problems

Find the best approximation to a solution to the system of equations. Enter answers in fraction form below.
Find a linear function of best fit for each of the following sets of data points. Examine how well your line fits the points by typing the equation of the line into the Desmos window.
Enter your answers in fraction form.

Modify the Octave code in Example ex:leastSquaresPolyRevisited to retry the problem in Example ex:leastSquares3. Is the QR-factorization approach quicker?
Use Octave to find the least squares approximating quadratic function for the following data points. Round your answers to three decimal places.
If $A$ is an $m\times n$ matrix, it can be proved that there exists a unique $n\times m$ matrix $A^{+}$ satisfying the following four conditions: $AA^{+}A=A$; $A^{+}AA^{+}=A^{+}$; $AA^{+}$ and $A^{+}A$ are symmetric. The matrix $A^{+}$ is called the Moore-Penrose inverse.
(a)
If $A$ is square and invertible, show that $A^{+}=A^{-1}$.
(b)
If $\operatorname{rank} A=m$, show that $A^{+}=A^{T}\left(AA^{T}\right)^{-1}$.
(c)
If $\operatorname{rank} A=n$, show that $A^{+}=\left(A^{T}A\right)^{-1}A^{T}$. (Notice that this is exactly the matrix that appeared when we solved the normal equations, arriving at Equation (eq:leastSquaresZ).)
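If you would like to check these identities numerically, Octave's built-in pinv computes the Moore-Penrose inverse. The sketch below uses a hypothetical full-column-rank matrix to compare pinv(A) with the formula from part (c); this is only a numerical sanity check, not a proof.

% Hypothetical 3x2 matrix with linearly independent columns (rank 2).
A = [1 0; 0 1; 1 1];
P1 = pinv(A);            % built-in Moore-Penrose inverse
P2 = (A' * A) \ A';      % the formula (A'A)^(-1) A' from part (c)
disp(norm(P1 - P2));     % approximately zero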

Text Source

A portion of this section has been adapted from W. Keith Nicholson, Linear Algebra with Applications, Lyryx 2021-A, Open Edition, pp. 308-319.