Least-Squares Approximation
Often an exact solution to a problem in applied mathematics is difficult or impossible
to obtain. However, it is usually just as useful to find an approximation to a solution.
In particular, finding “linear approximations” is a powerful technique in applied
mathematics. One basic case is the situation where a system of linear equations has
no solution, and it is desirable to find a “best approximation” to a solution to the
system.
We begin by defining the “best approximation” in a natural way, and showing that computing the best approximation reduces to solving a related system of linear equations called the normal equations. Next, we demonstrate a common application where a collection of data points is approximated by a line (or a curve). We conclude this section by showing that QR-factorization provides us with a more efficient way to solve the normal equations and compute the best approximation.
Best Approximate Solutions
Let
Consider the matrix equation $A\vec{x}=\vec{b}$. A quick examination of the last two rows should convince you that this equation has no solutions. In other words, $\vec{b}$ is not in the span of the columns of $A$.
If $\vec{x}$ were an exact solution to $A\vec{x}=\vec{b}$, then $\vec{b}-A\vec{x}$ would be $\vec{0}$. Since the equation does not have a solution, we will attempt to find the next best thing to a solution by finding $\vec{z}$ such that $\|\vec{b}-A\vec{z}\|$ is as small as possible. The quantity $\|\vec{b}-A\vec{z}\|$ is called the error.
The following GeoGebra interactive will help you understand the geometry behind finding $\vec{z}$.
RIGHT-CLICK and DRAG to rotate the image for a better view.
Record your best guess for $\vec{z}$ – you will have a chance to check your answer in Example ex:leastSquares1.
What did you discover about the geometry of minimizing $\|\vec{b}-A\vec{z}\|$? Select all that apply.
$\vec{b}-A\vec{z}$ is orthogonal to the plane spanned by the columns of $A$.
is orthogonal to .
is orthogonal to .
is orthogonal to .
$A\vec{z}$ is an orthogonal projection of $\vec{b}$ onto the plane spanned by the columns of $A$.
Our geometric observations will help us develop a method for finding $\vec{z}$.
Suppose $A$ is an $m\times n$ matrix, and $\vec{b}$ is a column vector in $\mathbb{R}^m$. Consider the matrix equation $A\vec{x}=\vec{b}$. If this equation does not have a solution, we can attempt to find a best approximation by finding $\vec{z}$ which minimizes the error, $\|\vec{b}-A\vec{z}\|$. The expression $\vec{b}-A\vec{z}$ is also sometimes called the residual. The error (or the residual) is given in terms of a vector norm. Recall that our definition of the norm involves the sum of squares of the vector components. When we minimize the norm, we minimize the sum of squares. This is why the method we are describing is often referred to as least squares. We will explore this idea further later in this section.
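For instance, the following short Octave sketch (the matrix $A$, vector $\vec{b}$, and candidate $\vec{z}$ below are made up purely for illustration) shows that the norm of the residual is exactly the square root of the sum of squares of its entries.
% Hypothetical A, b, and candidate z, chosen only for illustration
A = [1 0; 0 1; 1 1];
b = [1; 2; 4];
z = [1; 2];              % a candidate approximate solution
r = b - A*z;             % the residual vector b - Az
norm(r)                  % the error
sqrt(sum(r.^2))          % the same value: square root of the sum of squares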
In the case when $\text{col}(A)$ is a subspace of $\mathbb{R}^3$, we can see geometrically that $\vec{z}$ is the best approximation if and only if $A\vec{z}$ is an orthogonal projection of $\vec{b}$ onto $\text{col}(A)$, and the error is the magnitude of $\vec{b}-A\vec{z}$, as shown below.
What we observed above holds in general. We will use this fact to find $\vec{z}$.
Every vector in $\text{col}(A)$ can be written in the form $A\vec{x}$ for some $\vec{x}$ in $\mathbb{R}^n$. Our goal is to find $\vec{z}$ such that $A\vec{z}$ is the orthogonal projection of $\vec{b}$ onto $\text{col}(A)$. By Corollary cor:orthProjOntoW, every vector in $\text{col}(A)$ is orthogonal to $\vec{b}-A\vec{z}$. This means $\vec{b}-A\vec{z}$ is in the orthogonal complement of $\text{col}(A)$, which is $\text{null}(A^T)$.
Therefore, we have
$$A^T(\vec{b}-A\vec{z})=\vec{0}, \quad\text{which gives us}\quad A^TA\vec{z}=A^T\vec{b}. \qquad \text{(eq:normalForZ)}$$
Since $\vec{b}-A\vec{z}$ is normal to the subspace $\text{col}(A)$, we call the system of linear equations in (eq:normalForZ) the normal equations for $\vec{z}$. If $A^TA$ is invertible, then we can write
$$\vec{z}=(A^TA)^{-1}A^T\vec{b}. \qquad \text{(eq:leastSquaresZ)}$$
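As a small numerical illustration (the matrix and vector below are invented for this sketch and are not the ones from Exploration exp:leastSq1), the normal equations can be formed and solved in Octave as follows.
% A small inconsistent system, invented for this sketch
A = [1 0; 0 1; 1 1];
b = [1; 2; 4];
% Form and solve the normal equations (A^T A) z = A^T b
z = (transpose(A)*A) \ (transpose(A)*b)
% The error ||b - Az|| for this best approximation
norm(b - A*z)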
We will return to the question of invertibility of $A^TA$ in Theorem th:ATAinverse. For now, let’s revisit the problem posed in Exploration exp:leastSq1.
We now return to the matrix equation $A\vec{x}=\vec{b}$ of Exploration exp:leastSq1 to find $\vec{z}$ that best approximates a solution.
Recall that
In this case, $(A^TA)^{-1}$ exists. Applying equation (eq:leastSquaresZ), we compute
Compare this answer to your guess in Exploration exp:leastSq1. If your guess was correct, nice job! If your guess was different, try setting $\vec{z}$ to the correct answer and use the GeoGebra interactive in Exploration exp:leastSq1 to examine the geometry of the problem.
We now come back to the question of when $A^TA$ is invertible.
If the columns of a matrix $A$ are linearly independent, then $A^TA$ is invertible.
Proof
Let $A$ be a matrix with linearly independent columns. We will show that $(A^TA)\vec{x}=\vec{0}$ has only the trivial solution. For $\vec{x}$, a solution of $(A^TA)\vec{x}=\vec{0}$, we have
$$\|A\vec{x}\|^2=(A\vec{x})\cdot(A\vec{x})=(A\vec{x})^T(A\vec{x})=\vec{x}^TA^TA\vec{x}=\vec{x}^T\vec{0}=0.$$
Therefore $A\vec{x}=\vec{0}$. By linear independence of the columns of $A$ we conclude that $\vec{x}=\vec{0}$.
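A quick numerical check of this fact in Octave, using a hypothetical matrix with linearly independent columns (chosen only for illustration):
% A hypothetical 4x2 matrix with linearly independent columns
A = [1 2; 0 1; 1 0; 2 1];
rank(A)                  % 2, equal to the number of columns
det(transpose(A)*A)      % nonzero, so A^T A is invertible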
We summarize our findings in the following theorem.
Let $A$ be an $m\times n$ matrix, and let $\vec{b}$ be a column vector in $\mathbb{R}^m$. Consider the matrix equation $A\vec{x}=\vec{b}$.
(a)
Any solution $\vec{z}$ to the normal equations
$$A^TA\vec{z}=A^T\vec{b}$$
is a best approximation to a solution to $A\vec{x}=\vec{b}$ in the sense that $\|\vec{b}-A\vec{z}\|$ is minimized.
(b)
If the columns of $A$ are linearly independent, then $A^TA$ is invertible and $\vec{z}$ is given uniquely by
$$\vec{z}=(A^TA)^{-1}A^T\vec{b}.$$
The system of linear equations
has no solution. Find the vector $\vec{z}$ that best approximates a solution.
We have
The normal equations are $A^TA\vec{z}=A^T\vec{b}$. We compute
We observe that $A^TA$ is invertible. Multiplying on the left by $(A^TA)^{-1}$ yields
With this vector $\vec{z}$, the left sides of the equations become
This is as close as possible to a solution.
The average number of goals per game scored by a hockey player seems to be
related linearly to two factors: the number of years of experience and the
number of goals in the preceding 10 games. The data below were collected on four players. Find the linear function that best fits the data.
If the relationship is given by , then the data can be described as follows:
Using Theorem th:bestApprox, we get
Hence the best-fitting function is .
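If you wish to reproduce a computation like this in Octave, a sketch of the setup is shown below. The numbers are placeholders, not the data from this example, and the model is assumed to have the form $y=a+bx_1+cx_2$.
% Placeholder data for four players (NOT the values from this example)
% Each row of A is [1, years of experience, goals in preceding 10 games]
A = [1 5 3; 1 3 4; 1 1 5; 1 2 1];
b = [0.8; 0.7; 0.6; 0.4];    % observed average goals per game (made up)
% Coefficients of the assumed model y = z(1) + z(2)*x1 + z(3)*x2
z = (transpose(A)*A) \ (transpose(A)*b)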
Application of Least Squares to Curve Fitting
In Curve Fitting, we discussed how to fit a function to a set of data points, so
that the graph of the function passes through each of the points. We also
discussed why doing so is sometimes impossible (two points lie one above the
other), and may not even be desirable (overfitting). In this section we will
learn how to approximate a collection of data points with a line (or a curve)
that fits the “trend” of the points. We will start with data that fit a linear
pattern.
Consider the points , and . These points do not lie on a straight line, but
they have a general upward linear trend. (Typically there would be many
more points to consider, but we will limit our exploration to what we can
do by hand.) Our goal is to find a line that fits these points as closely as
possible.
We are looking for a function of the form $y=ax+b$ such that the following infeasible system is satisfied as closely as possible
From the first part of this section we know how to find a best approximation. By Theorem th:bestApprox, we
have
According to our computations, the line that best fits the data is given
by
Let’s take a look.
We found this fit by minimizing $\|\vec{b}-A\vec{z}\|$. We will now investigate the meaning of this expression in relation to the line and the data points.
Observe that each entry of $\vec{b}-A\vec{z}$ is the signed vertical distance between a particular point and the line.
Instead of computing the error, $\|\vec{b}-A\vec{z}\|$, we will compute $\|\vec{b}-A\vec{z}\|^2$ to avoid the square root. Minimizing $\|\vec{b}-A\vec{z}\|^2$ also minimizes $\|\vec{b}-A\vec{z}\|$. Therefore, what we have minimized is the sum of squares of the vertical distances between the data points and the line. The following GeoGebra interactive will help you explore this idea.
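If you prefer to experiment numerically rather than with the interactive, the following Octave sketch (with made-up points and a made-up candidate line) computes the signed vertical distances and their sum of squares.
% Made-up data points (x_i, y_i) and a made-up candidate line y = a*x + b
x = [1; 2; 3; 4];
y = [1; 3; 2; 5];
a = 1; b = 0;                    % candidate slope and intercept
residuals = y - (a*x + b)        % signed vertical distances from points to the line
sum(residuals.^2)                % the sum of squares being minimized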
In Exploration exp:leastSq2 we discovered that $\|\vec{b}-A\vec{z}\|^2$ is the sum of squares of vertical distances between the given data points and the proposed line. By minimizing $\|\vec{b}-A\vec{z}\|^2$, we minimize the sum of squares of vertical distances. This observation holds in general. Given a collection of points $(x_1,y_1), (x_2,y_2), \dots, (x_n,y_n)$, to find a linear function of the form $y=ax+b$ that best fits the points, we would find a best approximate solution to the system
$$ax_1+b=y_1,\quad ax_2+b=y_2,\quad \dots,\quad ax_n+b=y_n$$
by minimizing
$$\|\vec{b}-A\vec{z}\|^2=\sum_{i=1}^{n}\big(y_i-(ax_i+b)\big)^2$$
A geometric interpretation of $\|\vec{b}-A\vec{z}\|^2$ is shown below.
The line we obtain in this fashion is called a line of best fit or a trendline, and the
method we used is referred to as the method of least squares.
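For concreteness, here is a minimal Octave sketch of the whole procedure, using invented data points; the design matrix has one row $[x_i \; 1]$ per point, so the solution vector holds the slope and intercept of the trendline.
% Invented data points for illustration
x = [1; 2; 3; 4; 5];
y = [2.1; 2.9; 3.2; 4.1; 4.8];
% Design matrix: one row [x_i 1] per data point, so z = [a; b] in y = a*x + b
A = [x ones(length(x),1)];
z = (transpose(A)*A) \ (transpose(A)*y)   % slope and intercept of the line of best fit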
We can apply the method of least squares to find best-fitting non-linear functions.
Find the least squares approximating quadratic polynomial of the form $y=ax^2+bx+c$ for the points $(-3,3)$, $(-1,1)$, $(0,1)$, $(1,2)$, and $(3,4)$.
We are looking for an approximate solution to the system of equations
This corresponds to the matrix equation
Multiplying on the left by $A^T$ gives us the normal equations.
It turns out that $A^TA$ is invertible, so it is easy to solve for $\vec{z}$. You can use technology to accomplish this. We will demonstrate how to do this using Octave.
To use Octave, go to the Sage Math Cell Webpage, copy the code below into the cell,
select OCTAVE as the language, and press EVALUATE.
% Define matrix A
A=[9 -3 1;1 -1 1;0 0 1;1 1 1; 9 3 1];
% Define vector b
b=[3;1;1;2;4];
% Let B be the inverse of A^TA.
B= inv(transpose(A)*A);
% Now we solve for z
ans=B*transpose(A)*b
We arrive at the solution
Therefore, the quadratic function of best fit is given by . You can see the graph and
the points shown below.
Before the end of this section we will return to this problem with a more
computationally efficient approach.
Given the data points , , and , find the least squares approximating function
of the form .
We are looking for an approximate solution to the system of
equations
This corresponds to the matrix equation
Using the normal equations, we obtain
Solving for $\vec{z}$ yields
Therefore, the function of best fit (of the given form) is given by
QR-Factorization: A Quicker Way to do Least Squares
When solving the normal equations in (eq:normalForZ), it is advantageous to have a QR-factorization of $A$. For then we can write
$$A^TA\vec{z}=A^T\vec{b}$$
$$(QR)^T(QR)\vec{z}=(QR)^T\vec{b}$$
$$R^TQ^TQR\vec{z}=R^TQ^T\vec{b}$$
$$R^TR\vec{z}=R^TQ^T\vec{b} \qquad \text{(recall that } Q^TQ=I\text{)}$$
Since $R$ is invertible, $R^T$ also has an inverse, and multiplying on the left by it yields
$$R\vec{z}=Q^T\vec{b}$$
This last equation is easily solved by back-substitution, since $R$ is upper triangular.
This greatly reduces the amount of computations we need to make, as we will observe
by using Octave in our final example of the section.
Let’s return to the Octave calculations we did in Example ex:leastSquaresPoly, so we can see the
improvement.
To use Octave, go to the Sage Math Cell Webpage, copy the code below into the cell,
select OCTAVE as the language, and press EVALUATE.
% Define matrix A
A=[9 -3 1;1 -1 1;0 0 1;1 1 1; 9 3 1];
% Define vector b
b=[3;1;1;2;4];
tic %this command begins a timer so we can compare how long it takes to perform our computations
% Let B be the inverse of A^TA.
B= inv(transpose(A)*A);
% Now we solve for z
ans=B*transpose(A)*b
toc % This command gives us the amount of time it took to perform the calculations
% Here we try the computation again using QR-factorization, and we will see it is more efficient
tic %start the timer
% Compute the QR-factorization of A
[Q,R]=qr(A);
ans=R\(transpose(Q)*b)
toc % Is the result noticeably faster?
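As a practical aside, Octave's backslash operator already returns a least-squares solution for an overdetermined system (typically via a QR-based routine), so, with the same A and b as defined above, the entire computation can be written in one line.
% One-line least-squares solve using the A and b defined above
z = A\b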
Practice Problems
Find the best approximation to a solution to the system of equations.
Enter answers in fraction form below.
Find a linear function of best fit for each of the following sets of data points.
Examine how well your line fits the points by typing the equation of the line into the
Desmos window.
Use Octave to find the least squares approximating quadratic function for the
following data points.
Round your answers to three decimal places.
If $A$ is an $m\times n$ matrix, it can be proved that there exists a unique $n\times m$ matrix $A^{+}$ satisfying the following four conditions: $AA^{+}A=A$; $A^{+}AA^{+}=A^{+}$; $AA^{+}$ and $A^{+}A$ are symmetric. The matrix $A^{+}$ is called the Moore-Penrose inverse.
(a)
If $A$ is square and invertible, show that $A^{+}=A^{-1}$.
(b)
If $\text{rank}(A)=m$, show that $A^{+}=A^T(AA^T)^{-1}$.
(c)
If $\text{rank}(A)=n$, show that $A^{+}=(A^TA)^{-1}A^T$. (Notice that this expression for the Moore-Penrose inverse appeared when we solved the normal equations, arriving at Equation (eq:leastSquaresZ).)
Text Source
A portion of this section has been adapted from W. Keith Nicholson, Linear Algebra
with Applications, Lyryx 2021-A, Open Edition, pp. 308-319.