Curve Fitting

We know that two points determine a line. Do you know how many points determine a quadratic function of the form $f(x)=ax^2+bx+c$? Given any number of points in the plane, is it always possible to find a polynomial function whose graph contains every one of the given points? To address these questions, we will start with an alternative way of finding an equation of a line.

Consider two points and . We will find a function whose graph is a line that passes through these points. We know that for some constants and . Because the graph of passes through and , we must have the following: To solve for and , we need to solve the following matrix equation: Solving the equation, we find that and . This gives us:

The GeoGebra interactive below shows two points and , together with the matrix equation that produces function coefficients for the function whose graph passes through and . Drag the points around the plane to see how the matrix equation changes.

From a purely formal standpoint, we observe that the matrix equation has the form: where each row corresponds to one point.
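For reference, the same computation can be sketched for two generic points. The names $(x_1,y_1)$, $(x_2,y_2)$, $m$, and $b$ are notation assumed for this illustration only; they are not necessarily the symbols used in the exploration.

```latex
\begin{align*}
f(x) &= mx + b, \\
\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix}
\begin{bmatrix} m \\ b \end{bmatrix}
&= \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \\
m &= \frac{y_2 - y_1}{x_2 - x_1}, \qquad
b = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1} \qquad (x_1 \neq x_2).
\end{align*}
```

Each row of the coefficient matrix comes from one point, and the formulas for $m$ and $b$ break down exactly when $x_1 = x_2$, that is, when one point lies directly above the other.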

Now we are ready to move on to quadratic and higher-degree polynomial functions.

The linear function in Exploration exp:curveFitLine had two unknown coefficients that we needed to find in order to determine the function. Two points gave us a system of two equations in two unknowns.

A quadratic polynomial function, whose graph is a parabola, is given by $f(x)=ax^2+bx+c$. Its three unknown coefficients require three points to determine them.

We will find a quadratic function of the form $f(x)=ax^2+bx+c$ whose graph passes through $(-2,2)$, $(0,-1)$, and $(1,5)$. To do this, we need to find coefficients $a$, $b$, and $c$ such that $$4a-2b+c=2,\qquad c=-1,\qquad a+b+c=5.$$

The following GeoGebra interactive shows points , , and , together with the matrix equation, and its solution.

Drag the points around the plane to observe changes in the coefficient matrix. Think geometrically to find locations of , and such that

  • ; .
  • ; .

Observe the structure of the matrix equation.

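In general (provided that no one point lies directly above another), given $n$ points, we can always find a polynomial function of degree at most $n-1$ whose graph contains every one of the given points. To find such a polynomial function, given by $f(x)=a_{n-1}x^{n-1}+\dots+a_1x+a_0$, we need to solve a system of $n$ equations with $n$ unknowns. Writing the given points as $(x_1,y_1),\dots,(x_n,y_n)$, this system translates into the following matrix equation:
$$\begin{bmatrix} x_1^{n-1} & \cdots & x_1 & 1 \\ x_2^{n-1} & \cdots & x_2 & 1 \\ \vdots & & \vdots & \vdots \\ x_n^{n-1} & \cdots & x_n & 1 \end{bmatrix} \begin{bmatrix} a_{n-1} \\ \vdots \\ a_1 \\ a_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$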

In Practice Problems prob:systemProblems1 and prob:systemProblems2 you will show that the matrix equation in (eq:matEq) has a unique solution if and only if no two of the given points share an $x$-coordinate.
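The general procedure can be sketched in a few lines of Python. This is a minimal illustration, not the method used by the interactives: the helper names `vandermonde`, `solve`, and `fit_polynomial` are hypothetical, and the sample points $(-2,2)$, $(0,-1)$, $(1,5)$ are inferred from the matrices in the Octave code in Using Technology. Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def vandermonde(xs):
    """Build the coefficient matrix: one row [x^(n-1), ..., x, 1] per point."""
    n = len(xs)
    return [[Fraction(x) ** (n - 1 - k) for k in range(n)] for x in xs]


def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    # Augmented matrix [A | b]
    M = [row[:] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution: coefficient matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry becomes 1
        M[col] = [v / M[col][col] for v in M[col]]
        # Clear the pivot column in every other row
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fit_polynomial(points):
    """Return coefficients, highest power first, of the interpolating polynomial."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return solve(vandermonde(xs), ys)


# Points inferred from the Octave matrices A and b (an assumption)
coeffs = fit_polynomial([(-2, 2), (0, -1), (1, 5)])
print(coeffs)
```

If two of the points share an $x$-coordinate, two rows of the coefficient matrix are identical and `solve` raises `ValueError`, which matches the condition stated above.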

Using Technology

Throughout this section we have omitted the tedious process of solving matrix equations. It is useful to practice solving smaller matrix equations by hand, but for larger matrices, we can use technology. Below is the Octave code that can be used to find the solution to Exploration exp:curveFitParabola. You will be able to modify this code to solve some of the practice problems.

To use Octave, go to the Sage Math Cell Webpage, copy the code below into the cell, select OCTAVE as the language, and press EVALUATE.

% Define the coefficient matrix
A = [4 -2 1; 0 0 1; 1 1 1];
% Define vector b
b = [2; -1; 5];
% We can find the solution in two ways
% Method 1: multiply b by the inverse of A
ans1 = inv(A)*b
% Method 2: use the backslash (left-division) operator
ans2 = A\b
% If A is invertible, both methods produce the same result (up to rounding error).
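As a check on the output, the system can also be solved by hand. The sketch below assumes that the unknowns are ordered as $(a, b, c)$ in $f(x)=ax^2+bx+c$, so that the rows of A correspond to the points $(-2,2)$, $(0,-1)$, and $(1,5)$.

```latex
\begin{align*}
4a - 2b + c &= 2, \qquad c = -1, \qquad a + b + c = 5 \\
\Longrightarrow\quad 4a - 2b &= 3, \qquad a + b = 6 \\
\Longrightarrow\quad a &= \tfrac{5}{2}, \qquad b = \tfrac{7}{2}, \qquad c = -1, \\
f(x) &= \tfrac{5}{2}x^2 + \tfrac{7}{2}x - 1, \\
f(-2) &= 10 - 7 - 1 = 2, \qquad f(0) = -1, \qquad f(1) = \tfrac{5}{2} + \tfrac{7}{2} - 1 = 5.
\end{align*}
```

Both Octave methods should therefore return values close to 2.5, 3.5, and -1.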

On the Dangers of Overfitting

It is exciting to know that we can fit a function to a set of data points, but before we get carried away fitting a 299-degree polynomial function to 300 points, let’s consider the following situation.

In the GeoGebra interactive below, you can see that the six points form a somewhat linear pattern. A linear model can be used to describe these points. Click on the “Display linear model” check-box to see the trend line. (You will learn how to find such models in Least-Squares Approximation.) You can see that even though the line does not pass through any of the given points, it fits the overall pattern of the points and can be used to estimate the $y$-coordinates of other points whose $x$-coordinates fall within the limits of the scatter plot.

It might be tempting to think that we can find a better model by finding a fifth-degree polynomial function whose graph contains every one of the six points. Click on the “Display 5th degree poly model” check-box to see the alternative model. Can this model be successfully used to make predictions?

Try moving individual points around to see how their placement affects the line and the curve.

Any modeling process which insists on fitting the existing data points exactly, at the risk of failing to predict future observations, is referred to as overfitting. While sometimes it is beneficial to have a curve that passes through specific points, more often it is the trend, not the individual instances, that we try to capture. We will return to this topic in Least-Squares Approximation.
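The contrast between the two models can be reproduced numerically. The sketch below uses six made-up points that lie close to the line $y = x$; they are hypothetical data, not the points in the interactive. It evaluates the interpolating fifth-degree polynomial in Lagrange form and compares it with a least-squares trend line computed from the standard closed-form slope and intercept. The function names `lagrange_eval` and `fit_line` are illustrative.

```python
from fractions import Fraction

# Hypothetical data: six points scattered near the line y = x
points = [
    (0, Fraction(1, 10)),
    (1, Fraction(9, 10)),
    (2, Fraction(22, 10)),
    (3, Fraction(28, 10)),
    (4, Fraction(41, 10)),
    (5, Fraction(49, 10)),
]


def lagrange_eval(pts, x):
    """Evaluate at x the unique polynomial of degree < len(pts) through pts."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(pts):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(pts):
            if j != i:
                term *= Fraction(x - xj) / Fraction(xi - xj)
        total += term
    return total


def fit_line(pts):
    """Return (slope, intercept) of the least-squares line through pts."""
    n = len(pts)
    mean_x = sum(Fraction(x) for x, _ in pts) / n
    mean_y = sum(Fraction(y) for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(points)

# Predict at x = 6, just outside the range of the data
poly_prediction = lagrange_eval(points, 6)
line_prediction = slope * 6 + intercept
print(float(poly_prediction), float(line_prediction))
```

For this data the fifth-degree polynomial passes through all six points exactly, yet at $x = 6$ it predicts about -3.8, while the trend line predicts about 5.92, which is in keeping with the roughly linear pattern.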

Practice Problems

In each case, find a polynomial function of an appropriate degree that passes through the given points.
Plot the graph of in the Desmos window below.

Plot the graph of in the Desmos window below.

Two GeoGebra screenshots are shown below:

In the first screenshot, points and coincide. In the second screenshot, point is located directly above point . In both cases, GeoGebra failed to produce a linear function whose graph passes through and .

Based on what you know about functions and geometry, explain why the process fails for these two examples. How do your observations correspond to what happens from an algebraic standpoint?

  • Both systems are inconsistent.
  • The first system is inconsistent, the second has infinitely many solutions.
  • Both systems have infinitely many solutions.
  • The first system has infinitely many solutions, the second system is inconsistent.

Prove that equation (eq:matEq) has a unique solution if and only if no two given points share an $x$-coordinate.
Show that the rows of the matrix are linearly independent if and only if no two given points share an $x$-coordinate.
Under what circumstances is a solution not unique? Under what circumstances does a solution not exist?