Suppose we are given a matrix equation $A\mathbf{x} = \mathbf{b}$ with a vector variable $\mathbf{x}$ taking values in $\mathbb{R}^n$, and a fixed vector $\mathbf{b}$ in $\mathbb{R}^m$ (implying that $A$ is an $m\times n$ matrix). The consistency theorem for systems of equations tells us that the equation is consistent precisely when $\mathbf{b}$ is in the span of the columns of $A$, or alternatively, when $\mathbf{b}\in\mathrm{Col}(A)$. But what if it is not? In other words, what if the system is inconsistent? Up until now we have simply left it at that; inconsistency was the end of the story.
But it is not. Whether or not the original system is consistent, one can always find a solution to the related equation

$$A\mathbf{x} = \mathrm{Proj}_{\mathrm{Col}(A)}(\mathbf{b}) \qquad \text{(eqn:lss)}$$

because the projection of $\mathbf{b}$ onto the column space of $A$ will always be in the column space of $A$, regardless of whether or not the original vector $\mathbf{b}$ is. The question then becomes: given that we know (eqn:lss) has at least one solution, how do we go about finding it (or them)? The starting point for answering that question is the following theorem, often referred to as the Fundamental Subspaces theorem (originally proven by Gauss): for any $m\times n$ matrix $A$,

- $\mathrm{Col}(A)^\perp = \mathrm{Null}(A^T)$;
- $\mathrm{Null}(A^T)^\perp = \mathrm{Col}(A)$;
- $\mathrm{Col}(A^T)^\perp = \mathrm{Null}(A)$;
- $\mathrm{Null}(A)^\perp = \mathrm{Col}(A^T)$.
- Proof
- Because the theorem is stated for all matrices, and because $(W^\perp)^\perp = W$ for any subspace $W$, the second, third and fourth statements are consequences of the first, so it suffices to verify that case. To see this, we recall that $\mathrm{Col}(A)$ is the subspace of $\mathbb{R}^m$ spanned by the columns $\mathbf{a}_1,\dots,\mathbf{a}_n$ of $A$; then $\mathbf{x}\in\mathrm{Col}(A)^\perp$ iff $\mathbf{x}\cdot\mathbf{a}_i = 0$ for all $1\le i\le n$ iff $\mathbf{a}_i^T\mathbf{x} = 0$ for all $1\le i\le n$ iff $A^T\mathbf{x} = \mathbf{0}$, i.e., $\mathbf{x}\in\mathrm{Null}(A^T)$.
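To see the first statement in action, here is a minimal numerical sketch (the $4\times 2$ matrix and the use of NumPy/SciPy are illustrative assumptions, not part of the text): an orthonormal basis of $\mathrm{Null}(A^T)$ is computed and checked to be orthogonal to every column of $A$.

```python
import numpy as np
from scipy.linalg import null_space

# An arbitrary 4x2 example matrix (illustrative only).
A = np.array([[1., 2.],
              [0., 1.],
              [1., 0.],
              [2., 2.]])
m, n = A.shape

N = null_space(A.T)   # orthonormal basis for Null(A^T)

# Every basis vector of Null(A^T) is orthogonal to every column of A.
print(np.allclose(A.T @ N, 0))                      # True

# dim Col(A) + dim Null(A^T) = m, as expected for orthogonal complements in R^m.
print(np.linalg.matrix_rank(A) + N.shape[1] == m)   # True
```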
Write $\hat{\mathbf{b}}$ for $\mathrm{Proj}_{\mathrm{Col}(A)}(\mathbf{b})$. Then
- $A\mathbf{x} = \hat{\mathbf{b}}$ iff $\mathbf{b} - A\mathbf{x}\in\mathrm{Col}(A)^\perp$ (as $\hat{\mathbf{b}}$ is the unique vector in $\mathrm{Col}(A)$ with $\mathbf{b} - \hat{\mathbf{b}}\in\mathrm{Col}(A)^\perp$);
- $\mathbf{b} - A\mathbf{x}\in\mathrm{Col}(A)^\perp$ iff $A^T(\mathbf{b} - A\mathbf{x}) = \mathbf{0}$, i.e., iff $A^TA\mathbf{x} = A^T\mathbf{b}$ (by Theorem thm:fst).
This last equation, $A^TA\mathbf{x} = A^T\mathbf{b}$, has the same set of solutions as the equation that started the sequence, namely $A\mathbf{x} = \hat{\mathbf{b}}$, and is therefore always consistent. It is derived from our original equation by simply multiplying both sides on the left by $A^T$, and is often referred to as the associated normal equation of the original matrix equation from which it was derived.
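For a quick numerical illustration of this point (the $3\times 2$ system and the use of NumPy are assumptions made for the sketch): even when $A\mathbf{x}=\mathbf{b}$ is inconsistent, the normal equation $A^TA\mathbf{x}=A^T\mathbf{b}$ has a solution.

```python
import numpy as np

# A 3x2 system chosen to be inconsistent: b is not in Col(A).
A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
b = np.array([1., 1., 3.])

# The original system has no exact solution: rank([A | b]) > rank(A).
print(np.linalg.matrix_rank(np.column_stack([A, b])) > np.linalg.matrix_rank(A))  # True

# The normal equation A^T A x = A^T b is nevertheless consistent;
# here A^T A happens to be invertible, so we can solve it directly.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                                    # least-squares solution (4/3, 4/3)
print(np.allclose(A.T @ (b - A @ x_hat), 0))    # residual lies in Null(A^T): True
```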
This yields a straightforward procedure for finding a least-squares solution to our original equation $A\mathbf{x} = \mathbf{b}$; i.e., a solution to the associated normal equation $A^TA\mathbf{x} = A^T\mathbf{b}$, which by the above is equivalent to a solution to the related equation $A\mathbf{x} = \hat{\mathbf{b}}$. Note that the original equation is consistent precisely when $\mathbf{b}\in\mathrm{Col}(A)$, or equivalently when $\hat{\mathbf{b}} = \mathbf{b}$; in other words, when the least-squares solution is an exact solution. The advantages to seeking a least-squares solution are i) it always exists (regardless of whether or not the original equation is consistent), and ii) it yields an actual solution whenever an actual solution exists. Because this procedure finds the least-squares solution first, it can also be applied to finding the least-squares approximation to $\mathbf{b}$ as $\hat{\mathbf{b}} = A\hat{\mathbf{x}}$, where $\hat{\mathbf{x}}$ is a least-squares solution to the original equation.
The steps are:
- Form the associated normal equation $A^TA\mathbf{x} = A^T\mathbf{b}$;
- find the solution(s) to the normal equation by computing $\mathrm{rref}\!\left(\left[\,A^TA \mid A^T\mathbf{b}\,\right]\right)$. These will be the least-squares solution(s) to the original equation;
- for any least-squares solution $\hat{\mathbf{x}}$ from Step 2, compute $A\hat{\mathbf{x}}$. This will yield the least-squares approximation $\hat{\mathbf{b}}$ to $\mathbf{b}$ by a vector in the column space of $A$ (the sketch following this list carries out these steps on a small example).
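Here is a minimal sketch of the three steps, assuming SymPy for the row reduction and reusing the small inconsistent system from the sketch above (both choices are illustrative, not from the text).

```python
from sympy import Matrix

# Illustrative data (not from the text): an inconsistent 3x2 system.
A = Matrix([[1, 0],
            [0, 1],
            [1, 1]])
b = Matrix([1, 1, 3])

# Step 1: form the associated normal equation  A^T A x = A^T b.
AtA = A.T * A          # [[2, 1], [1, 2]]
Atb = A.T * b          # [4, 4]

# Step 2: row-reduce the augmented matrix [A^T A | A^T b].
aug = AtA.row_join(Atb)
R, pivots = aug.rref()
print(R)               # [[1, 0, 4/3], [0, 1, 4/3]]

# Here A^T A is invertible, so the last column of the rref is the unique
# least-squares solution x_hat = (4/3, 4/3).
x_hat = R[:, -1]

# Step 3: the least-squares approximation to b is b_hat = A x_hat.
b_hat = A * x_hat      # (4/3, 4/3, 8/3), the projection of b onto Col(A)
print(x_hat, b_hat)
```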
Again, there will only be one least-squares approximation to $\mathbf{b}$ by a vector in $\mathrm{Col}(A)$, because we have already seen that such a vector, namely $\hat{\mathbf{b}} = \mathrm{Proj}_{\mathrm{Col}(A)}(\mathbf{b})$, is unique. However, the least-squares solution to the original equation may not be unique. Thus another consequence of this theory is that every least-squares solution $\hat{\mathbf{x}}$ to $A\mathbf{x} = \mathbf{b}$, whether or not it is unique, yields the same least-squares approximation $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$.
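A short sketch of this phenomenon, with an arbitrarily chosen rank-deficient matrix (again an illustrative assumption): two different least-squares solutions produce the same approximation $A\hat{\mathbf{x}}$.

```python
import numpy as np

# Rank-deficient A: the second column is twice the first, so Null(A) is nontrivial
# and least-squares solutions are not unique.
A = np.array([[1., 2.],
              [1., 2.],
              [0., 0.]])
b = np.array([1., 0., 1.])

# One least-squares solution from NumPy (lstsq returns the minimum-norm one).
x1, *_ = np.linalg.lstsq(A, b, rcond=None)

# Adding any vector in Null(A) leaves A x unchanged; (2, -1) is in Null(A).
x2 = x1 + np.array([2., -1.])

print(np.allclose(A @ x1, A @ x2))   # True: both give the same approximation b_hat
print(np.allclose(A.T @ A @ x1, A.T @ b),
      np.allclose(A.T @ A @ x2, A.T @ b))   # True True: both solve the normal equation
```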
A final question then remains: when will there be a unique least-squares solution? We say that the matrix $A$ has full column rank (or just full rank when there is no confusion) if the columns of $A$ are linearly independent; namely, that $\mathrm{rank}(A) = n$. If $A$ is $m\times n$, this imposes the constraint that $m\ge n$ (otherwise the rank would have to be less than the number of columns). A useful fact about the ranks of matrices (which we do not prove here) is that $\mathrm{rank}(A^TA) = \mathrm{rank}(A)$.
As the normal equation is always consistent, we see that it has a unique solution precisely when $\mathrm{rank}(A^TA) = \mathrm{rank}(A) = n$, that is, precisely when $A$ has full column rank. In that case $A^TA$ is invertible, and the unique least-squares solution is $\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$.
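A closing sketch under the same illustrative assumptions (NumPy, an arbitrary full-column-rank matrix): the rank fact holds numerically, and the unique least-squares solution can be computed directly from the invertible normal equation.

```python
import numpy as np

# A 4x2 matrix with linearly independent columns (full column rank).
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 4.]])
b = np.array([6., 5., 7., 10.])

# The useful rank fact: rank(A^T A) = rank(A).
print(np.linalg.matrix_rank(A.T @ A) == np.linalg.matrix_rank(A))   # True

# Full column rank makes A^T A invertible, so the normal equation has
# the unique solution x_hat = (A^T A)^{-1} A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# It agrees with NumPy's built-in least-squares solver.
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))      # True
```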