Interpreting the gradient vector

$\newenvironment {prompt}{}{} \newcommand {\ungraded }[0]{} \newcommand {\todo }[0]{} \newcommand {\oiint }[0]{{\large \bigcirc }\kern -1.56em\iint } \newcommand {\mooculus }[0]{\textsf {\textbf {MOOC}\textnormal {\textsf {ULUS}}}} \newcommand {\npnoround }[0]{\nprounddigits {-1}} \newcommand {\npnoroundexp }[0]{\nproundexpdigits {-1}} \newcommand {\npunitcommand }[1]{\ensuremath {\mathrm {#1}}} \newcommand {\RR }[0]{\mathbb R} \newcommand {\R }[0]{\mathbb R} \newcommand {\N }[0]{\mathbb N} \newcommand {\Z }[0]{\mathbb Z} \newcommand {\sagemath }[0]{\textsf {SageMath}} \newcommand {\d }[0]{\,d} \newcommand {\l }[0]{\ell } \newcommand {\ddx }[0]{\frac {d}{\d x}} \newcommand {\zeroOverZero }[0]{\ensuremath {\boldsymbol {\tfrac {0}{0}}}} \newcommand {\inftyOverInfty }[0]{\ensuremath {\boldsymbol {\tfrac {\infty }{\infty }}}} \newcommand {\zeroOverInfty }[0]{\ensuremath {\boldsymbol {\tfrac {0}{\infty }}}} \newcommand {\zeroTimesInfty }[0]{\ensuremath {\small \boldsymbol {0\cdot \infty }}} \newcommand {\inftyMinusInfty }[0]{\ensuremath {\small \boldsymbol {\infty -\infty }}} \newcommand {\oneToInfty }[0]{\ensuremath {\boldsymbol {1^\infty }}} \newcommand {\zeroToZero }[0]{\ensuremath {\boldsymbol {0^0}}} \newcommand {\inftyToZero }[0]{\ensuremath {\boldsymbol {\infty ^0}}} \newcommand {\numOverZero }[0]{\ensuremath {\boldsymbol {\tfrac {\#}{0}}}} \newcommand {\dfn }[0]{\textbf } \newcommand {\unit }[0]{\mathop {}\!\mathrm } \newcommand {\eval }[1]{\bigg [ #1 \bigg ]} \newcommand {\seq }[1]{\left ( #1 \right )} \newcommand {\epsilon }[0]{\varepsilon } \newcommand {\phi }[0]{\varphi } \newcommand {\iff }[0]{\Leftrightarrow } \DeclareMathOperator {\arccot }{arccot} \DeclareMathOperator {\arcsec }{arcsec} \DeclareMathOperator {\arccsc }{arccsc} \DeclareMathOperator {\si }{Si} \DeclareMathOperator {\scal }{scal} \DeclareMathOperator {\sign }{sign} \newcommand {\arrowvec }[1]{{\overset {\rightharpoonup }{#1}}} \newcommand {\vec }[1]{{\overset {\boldsymbol {\rightharpoonup 
}}{\mathbf {#1}}}\hspace {0in}} \newcommand {\point }[1]{\left (#1\right )} \newcommand {\pt }[1]{\mathbf {#1}} \newcommand {\Lim }[2]{\lim _{\point {#1} \to \point {#2}}} \DeclareMathOperator {\proj }{\mathbf {proj}} \newcommand {\veci }[0]{{\boldsymbol {\hat {\imath }}}} \newcommand {\vecj }[0]{{\boldsymbol {\hat {\jmath }}}} \newcommand {\veck }[0]{{\boldsymbol {\hat {k}}}} \newcommand {\vecl }[0]{\vec {\boldsymbol {\l }}} \newcommand {\uvec }[1]{\mathbf {\hat {#1}}} \newcommand {\utan }[0]{\mathbf {\hat {t}}} \newcommand {\unormal }[0]{\mathbf {\hat {n}}} \newcommand {\ubinormal }[0]{\mathbf {\hat {b}}} \newcommand {\dotp }[0]{\bullet } \newcommand {\cross }[0]{\boldsymbol \times } \newcommand {\grad }[0]{\boldsymbol \nabla } \newcommand {\divergence }[0]{\grad \dotp } \newcommand {\curl }[0]{\grad \cross } \newcommand {\lto }[0]{\mathop {\longrightarrow \,}\limits } \newcommand {\bar }[0]{\overline } \newcommand {\surfaceColor }[0]{violet} \newcommand {\surfaceColorTwo }[0]{redyellow} \newcommand {\sliceColor }[0]{greenyellow} \newcommand {\vector }[1]{\left \langle #1\right \rangle } \newcommand {\sectionOutcomes }[0]{} \newcommand {\HyperFirstAtBeginDocument }[0]{\AtBeginDocument }$

The gradient is the fundamental notion of a derivative for a function of several variables.

Three things about the gradient vector

We have now learned a good deal about the gradient vector. Of everything we have seen, there are three things you must know:

Remember, given a function $F:\R ^n\to \R$ , its gradient is $\grad F = \vector {\pp [F]{x_1},\pp [F]{x_2},\dots ,\pp [F]{x_n}}$ This is a vector-valued function of $n$ variables. This means when you compute the gradient, you should express it as a vector!
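
As a rough numerical companion to this formula, the sketch below approximates each partial derivative with a central difference and collects the results into a vector. The helper name `numerical_gradient`, the step size `h`, and the sample function are assumptions made for illustration; they do not come from the text.

```python
from typing import Callable, List, Sequence


def numerical_gradient(f: Callable[[Sequence[float]], float],
                       point: Sequence[float],
                       h: float = 1e-6) -> List[float]:
    """Approximate the gradient of f at point, one partial derivative per input."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h    # nudge only the i-th input up
        backward[i] -= h   # and down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def sample_function(p: Sequence[float]) -> float:
    """Hypothetical example: F(x, y, z) = x^2 + 3yz, with gradient <2x, 3z, 3y>."""
    x, y, z = p
    return x ** 2 + 3 * y * z


g = numerical_gradient(sample_function, [1.0, 2.0, -1.0])
print(g)  # a list of three numbers close to <2, -3, 6>
```

Note that a function of three inputs produces a gradient with three components, matching the statement that $\grad F$ lives in $\R ^n$ .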

Remember, the gradient vector of a function of $n$ variables is a vector that lives in $\R ^n$ . The gradient vector tells you how to immediately change the values of the inputs of a function to find the initial greatest increase in the output of the function. We can see this in the interactive below.

The gradient at each point shows you in which direction to change the $(x,y)$ -values to get the greatest initial increase in the $z$ -value.

In particular, given $F:\R ^2\to \R$ , the gradient vector $\grad F\in \R ^2$ is always orthogonal to the level curves $c = F(x,y)$ . Moreover, given $F:\R ^3\to \R$ , $\grad F \in \R ^3$ is always orthogonal to level surfaces.

Computing the gradient vector

Given a function of several variables, say $F:\R ^2\to \R$ , the gradient, when evaluated at a point in the domain of $F$ , is a vector in $\R ^2$ . We can see this in the interactive below.

The gradient at each point is a vector pointing in the $(x,y)$ -plane. You compute the gradient vector by writing the vector of partial derivatives: $\grad F = \vector {\pp [F]{x_1},\pp [F]{x_2},\dots ,\pp [F]{x_n}}$ You’ve done this sort of direct computation many times before. So now, try your hand at these puzzlers:

Consider a differentiable function $F:\R ^2\to \R$ whose tangent plane at $(x,y) = (2,-1)$ is given by: $z = 3x - 2y -1$ In this case what is $F(2,-1)$ ? $F(2,-1) = \answer {7}$

Suppose you know that $F^{(1,0)}(2,-1)>0$ . What is $\grad F(2,-1)$ ? $\grad F (2,-1) = \vector {\answer {3},\answer {-2}}$

Consider a differentiable function $G:\R ^2\to \R$ and the unit vector $\uvec {u} = \vector {1/\sqrt {2},1/\sqrt {2}}$ . Suppose that $D_{\uvec {u}} (G(1,-3)) = 0$ and that $G^{(0,1)}(1,-3)=2$ . Compute: $\grad G(1,-3) \begin {prompt} = \vector {\answer {-2},\answer {2}} \end {prompt}$

Consider a differentiable function $H:\R ^2\to \R$ where $H^{(0,1)}(-5,6) = 3$ and the line $\vecl (t) = \vector {1-2t,3+t}$ . Suppose that $\eval {\dd {t} H(\vecl (t)) }_{t=3} = 5$ Compute: $\grad H (-5,6) \begin {prompt} =\vector {\answer {-1},\answer {3}} \end {prompt}$

Use the chain rule.
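
One way to see how the chain rule hint plays out for the function $H$ is the short sketch below. It takes the given data (the line $\vecl (t)$ , the rate $5$ at $t=3$ , and $H^{(0,1)}(-5,6)=3$ ) and solves the chain-rule equation for the unknown partial derivative. The variable names are illustrative assumptions.

```python
def line(t):
    """The line l(t) = <1 - 2t, 3 + t> from the exercise."""
    return (1 - 2 * t, 3 + t)


def line_velocity(t):
    """l'(t) = <-2, 1>; constant because l is a line."""
    return (-2.0, 1.0)


t0 = 3
point = line(t0)              # where the line is at t = 3
vx, vy = line_velocity(t0)

rate_along_line = 5.0         # given: d/dt H(l(t)) at t = 3
H_y = 3.0                     # given: H^(0,1)(-5, 6)

# Chain rule: rate_along_line = H_x * vx + H_y * vy. Solve for H_x.
H_x = (rate_along_line - H_y * vy) / vx
grad_H = (H_x, H_y)

print(point, grad_H)
```

The point computed from the line is $(-5,6)$ , which is why the given partial derivative at $(-5,6)$ is the right one to use.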

The initial greatest increase

Given a function $F:\R ^n\to \R$ and a point in $\R ^n$ , the gradient vector tells you in which initial direction to leave the point in order to get the greatest increase in $F$ . Why is this so? Well, to compute the change in the output of a function when changing the inputs in a specific direction, we should use the directional derivative. Recall: $D_{\uvec {u}}(F) = \grad {F} \dotp \uvec {u} = |\grad F|\cos \theta$ where $\theta$ is the angle between $\grad F$ and the unit vector $\uvec {u}$ . To make this change as large as possible, we need $\cos \theta = 1$ , so $\uvec {u}$ must point in the same direction as $\grad F$ . Hence, it is the gradient vector that points in the initial direction of greatest increase for the function.
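
The claim can also be checked by brute force. The sketch below, which assumes a gradient of $\vector {-3,4}$ purely as sample data, scans unit vectors around the circle and records which one gives the largest directional derivative. The function name `rate_in_direction` and the number of sampled angles are illustrative choices.

```python
import math


def rate_in_direction(grad, theta):
    """Directional derivative grad . u for the unit vector u = <cos theta, sin theta>."""
    u = (math.cos(theta), math.sin(theta))
    return grad[0] * u[0] + grad[1] * u[1]


grad = (-3.0, 4.0)            # sample gradient, assumed for illustration
n_angles = 3600               # one candidate direction every 0.1 degrees
angles = [2 * math.pi * k / n_angles for k in range(n_angles)]

# Pick the direction with the largest rate of increase.
best_theta = max(angles, key=lambda th: rate_in_direction(grad, th))
best_u = (math.cos(best_theta), math.sin(best_theta))
best_rate = rate_in_direction(grad, best_theta)

grad_length = math.hypot(grad[0], grad[1])
grad_unit = (grad[0] / grad_length, grad[1] / grad_length)

print(best_u, grad_unit)      # the two unit vectors should nearly agree
print(best_rate, grad_length) # the best rate should nearly equal |grad F|
```

The best direction found by the scan agrees with the unit vector in the direction of the gradient, and the best rate agrees with the magnitude of the gradient, up to the resolution of the scan.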

We can directly witness that the gradient vector points in the initial direction of greatest increase by looking at a differentiable function $F:\R ^2\to \R$ that is described by a table of values.

Let $F:\R ^2\to \R$ be a differentiable function described by the following table of values:

Estimate $\grad F(3,5)$ .

We estimate $\grad F(3,5)$ by estimating the partial derivatives. To estimate $F^{(1,0)}(3,5)$ , we examine the change in $F(x,5)$ between $x=4$ and $x=3$ : $\frac {F(4,5)-F\left (\answer [given]{3},5\right )}{\answer [given]{4}-3}= \answer [given]{-1}$ We should also examine the change in $F(x,5)$ between $x=3$ and $x=2$ : $\frac {F(3,5)-F\left (\answer [given]{2},5\right )}{\answer [given]{3}-2} =\answer [given]{-5}$ Now if we average these values together, we see: $\eval {\pp {x} F(x,y)}_{(x,y)=(3,5)} \approx \answer [given]{-3}$ On the other hand, using a similar procedure, we find that: $\eval {\pp {y} F(x,y)}_{(x,y)=(3,5)} \approx \answer [given]{4}$ Thus the gradient is approximately $\grad F(3,5) \approx \vector {\answer [given]{-3},\answer [given]{4}}$ Note that if you leave the point $(3,5)$ in the direction of $\grad F(3,5) \approx \vector {\answer [given]{-3},\answer [given]{4}}$ , you head toward $(2,6)$ , where $F(2,6)= \answer [given]{16}$ , the greatest initial increase from $(3,5)$ .
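
The averaging procedure used above can be written as a small routine. The sketch below is only an illustration: its table is hypothetical, generated from the made-up formula $x^2y$ , and is not the table of values from this example. The function name `estimate_gradient` is likewise an assumption.

```python
def estimate_gradient(table, x, y):
    """Estimate the gradient at (x, y) from a table with unit spacing.

    Each partial derivative is the average of the forward and backward
    difference quotients, exactly as in the worked example.
    """
    forward_x = (table[(x + 1, y)] - table[(x, y)]) / 1
    backward_x = (table[(x, y)] - table[(x - 1, y)]) / 1
    forward_y = (table[(x, y + 1)] - table[(x, y)]) / 1
    backward_y = (table[(x, y)] - table[(x, y - 1)]) / 1
    return ((forward_x + backward_x) / 2, (forward_y + backward_y) / 2)


# Hypothetical table of values, built from x^2 * y for illustration only.
table = {(x, y): x ** 2 * y for x in range(2, 5) for y in range(4, 7)}

estimate = estimate_gradient(table, 3, 5)
print(estimate)
```

For this made-up table the estimate happens to match the exact gradient $\vector {2xy, x^2}$ at $(3,5)$ ; with real data the average of the two difference quotients is only an approximation.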

Here is a plot of an elliptic paraboloid $G(x,y) = x^2 + y^2$ along with a vector attached to a point on the surface:

True or false: The vector above could be the gradient vector for $G$ at the given point.

The answer is “False.” Here the graph of the function is three-dimensional. The gradient vector lives in one dimension fewer than the function’s graph. Hence the gradient of $G$ is in fact always a two-dimensional vector.

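So far we have mostly talked about the direction of the gradient vector. Now let’s talk about the magnitude of the gradient vector. The magnitude of the gradient vector tells you “how fast” the function is increasing in the direction of the gradient: taking $\uvec {u}$ to be the unit vector in the direction of $\grad F$ , the directional derivative $\grad F\dotp \uvec {u}$ is exactly $|\grad F|$ .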

Suppose you have a differentiable function $F:\R ^2\to \R$ with the following set of level curves. You should interpolate reasonable values of the function $F$ between the level curves which are shown:

Consider the points $A$ , $B$ , and $C$ on the surface $z=F(x,y)$ . Where is $|\grad F|$ largest? The magnitude of the gradient vector of $F$ is largest at point $\answer [format=string]{B}$ . Where is $|\grad F|$ smallest? The magnitude of the gradient vector of $F$ is smallest at point $\answer [format=string]{C}$ .

Now, stand back. We’re going to do some serious calculus. Just read, relax and enjoy.

Consider the surface given by $F(x,y)= 20-x^2-2y^2$ :

Water is poured on the surface at $(1,1/4)$ . What path does it take as it flows downhill?

Let $\vec {w}(t) = \vector {x(t), y(t)}$ be the vector-valued function describing the path of the water in the $(x,y)$ -plane. We seek $x(t)$ and $y(t)$ . We know that water will always flow downhill in the initial steepest direction. Therefore, at any point on its path, it will be moving in the direction of $-\grad F(x,y)$ . We’ll ignore the physical effects of momentum on the water. Thus the velocity $\vec {w}'(t)$ will be parallel to $\grad F$ . Ah! This means there is some scalar function $c(t)$ , negative because the water moves against the gradient, such that $c(t)\grad F(x(t),y(t)) = \vec {w}'(t) = \vector {x'(t), y'(t)}.$

Computing the gradient, $\grad F(x(t),y(t)) = \vector {-2x(t), -4y(t)}$ Then $\begin{align*} c(t)\cdot \grad F(x(t),y(t)) &= \vector{ x'(t), y'(t)}\\ c(t)\cdot \vector{-2x(t),-4y(t)} &= \vector{ x'(t), y'(t)}\\ \vector{-2c(t)x(t),-4c(t)y(t)} &= \vector{ x'(t), y'(t)} \end{align*}$ This implies $-2c(t)x(t) = x'(t) \quad \text {and} \quad -4c(t)y(t) =y'(t)$ so $c(t) = -\frac {x'(t)}{2x(t)} \quad \text {and} \quad c(t) =-\frac {y'(t)}{4y(t)}.$

Setting these two expressions for $c(t)$ equal, cancelling the minus signs, and integrating with respect to $t$ , while recalling the differentials $\d x = x'(t) \d t$ and $\d y=y'(t)\d t$ , we may write $\begin{align*} \int \frac{1}{2x}x'(t)\d t &=\int \frac{1}{4y} y'(t)\d t \\ \int \frac{1}{2x}\d x &=\int\frac{1}{4y}\d y \\ \frac{1}{2}\ln|x| +C &= \frac{1}{4}\ln|y|\\ 2\ln|x| + C &= \ln|y|\\ \ln|x^2| + C &= \ln|y| \end{align*}$ where, after multiplying through by $4$ , we have renamed the constant $4C$ as $C$ .

Raising $e$ to the left-hand and right-hand sides, we see $\begin{align*} e^{\ln|x^2| + C} &= e^{\ln|y|}\\ x^2\cdot e^C &= |y|. \end{align*}$ The water starts at a point with $y>0$ and $|y|$ is never zero along the path, so $|y| = y$ ; setting $K = e^C$ , we write $K\cdot x^2 = y.$ We are so close to being done: $y=K\cdot x^2$ is the path described in the $(x,y)$ -plane. Since the water started at the point $(1,1/4)$ , we can solve for $K$ : $K\cdot 1^2 = \frac 14 \quad \Rightarrow \quad K = \frac 14.$ Thus the water follows the curve $y=x^2/4$ in the $(x,y)$ -plane.
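
The answer can be sanity-checked numerically. The sketch below follows the water by repeatedly taking a small step in the direction of $-\grad F$ starting from $(1,1/4)$ , then measures how far the resulting points stray from the curve $y=x^2/4$ . Using plain Euler steps, a step size of $10^{-4}$ , and $5000$ steps are all assumptions made for illustration; the speed of travel does not affect the shape of the path.

```python
def grad_F(x, y):
    """Gradient of F(x, y) = 20 - x^2 - 2y^2."""
    return (-2.0 * x, -4.0 * y)


def flow_downhill(x, y, step=1e-4, n_steps=5000):
    """Trace the path of steepest descent with small Euler steps."""
    path = [(x, y)]
    for _ in range(n_steps):
        gx, gy = grad_F(x, y)
        x -= step * gx        # move against the gradient
        y -= step * gy
        path.append((x, y))
    return path


path = flow_downhill(1.0, 0.25)

# Largest distance (in y) between the traced points and the curve y = x^2 / 4.
max_deviation = max(abs(y - x * x / 4) for x, y in path)
end_x, end_y = path[-1]

print(end_x, end_y, max_deviation)
```

The deviation stays small and shrinks further as the step size is reduced, which is consistent with the path $y=x^2/4$ found above.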

What were you supposed to learn from that last example?

There are two key take-aways from the example above:

First, the negative of the gradient points in the initial direction of greatest decrease.
Second, the problem combines many aspects of calculus: gradients, parametric curves, differentials, and integration.

Orthogonality and the gradient

Now that we know gradient vectors point in the initial direction of the greatest increase of the function, let’s think about the geometry of the gradient vector. Previously we used the chain rule to show that the gradient vector is always orthogonal to level sets. The argument went like this: Suppose that a vector-valued function $\vec {c}(t)=\vector {x(t),y(t)}$ runs along a level curve of the function $F(x,y)$ . If we ask ourselves, “What is the change in $F$ as $t$ varies?” we must conclude that $\dd {t} F(\vec {c}(t)) = 0$ since the value of $F$ doesn’t change on the curve drawn by $\vec {c}$ (remember, $\vec {c}$ draws a level curve). On the other hand, by the chain rule: $\dd {t} F(\vec {c}(t)) = \grad F(\vec {c}(t)) \dotp \vec {c}'(t)$ The vector $\vec {c}'(t)$ is tangent to the curve drawn by $\vec {c}$ , and putting the two equations above together we see $0 = \grad F(\vec {c}(t)) \dotp \vec {c}'(t)$ so $\grad F(\vec {c}(t))$ must be orthogonal to $\vec {c}'(t)$ , and hence orthogonal to the curve drawn by $\vec {c}$ .
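
To make the argument concrete, the sketch below checks it on one example. It uses $F(x,y) = 20-x^2-2y^2$ from the water problem, walks around the level curve $x^2+2y^2=4$ with an elliptical parametrization, and computes $\grad F(\vec {c}(t)) \dotp \vec {c}'(t)$ at several values of $t$ . The choice of level curve, the parametrization, and the function names are assumptions made for illustration.

```python
import math


def grad_F(x, y):
    """Gradient of F(x, y) = 20 - x^2 - 2y^2."""
    return (-2.0 * x, -4.0 * y)


def level_curve(t, k=4.0):
    """Point c(t) on the level curve x^2 + 2y^2 = k, where F = 20 - k."""
    return (math.sqrt(k) * math.cos(t), math.sqrt(k / 2) * math.sin(t))


def level_curve_velocity(t, k=4.0):
    """Tangent vector c'(t) of the parametrization above."""
    return (-math.sqrt(k) * math.sin(t), math.sqrt(k / 2) * math.cos(t))


dots = []
for i in range(12):
    t = 2 * math.pi * i / 12
    gx, gy = grad_F(*level_curve(t))
    vx, vy = level_curve_velocity(t)
    dots.append(gx * vx + gy * vy)   # gradient dotted with the tangent

largest_dot = max(abs(d) for d in dots)
print(largest_dot)  # zero up to floating-point rounding
```

Every dot product vanishes up to rounding error, so the gradient is orthogonal to the tangent of the level curve at each sampled point.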

The explanation we just gave is a good one, but let’s give one more. In this book, we are always thinking about differentiable functions. Remember, a function $F:\R ^2\to \R$ is differentiable if one can “zoom-in” and eventually the function will look like a plane. So let’s imagine that we’ve “zoomed-in” on a differentiable function and it looks like a plane. The contour plot of a plane looks like a bunch of parallel lines:

If we wish to leave the point above in the direction of the initial greatest increase, then we should move in a direction perpendicular to the level curves:

Gradient vectors point in the initial direction of greatest increase and the fastest way to leave a line is perpendicular to that line.

The fact that the gradient is always orthogonal to level surfaces is very powerful. In fact it gives new (easier!) solutions to old problems. Let’s use this fact to find a plane tangent to a surface.

Find an implicit equation for the tangent plane to the elliptic paraboloid $z = x^2 + y^2$ at $\vec {p} = \vector {2,3,13}$ .

Consider $F(x,y,z) =x^2 +y^2 -z$ and imagine the elliptic paraboloid as the level surface $F(x,y,z) = \answer [given]{0}$ Remember, the gradient is perpendicular to level surfaces. We’ll use this fact to find a normal vector to the surface, and with this vector we’ll find the tangent plane. The gradient is: $\begin{align*} \grad F(x,y,z) &= \vector{\pp[F]{x}, \pp[F]{y},\pp[F]{z}}\\ &= \vector{\answer[given]{2x},\answer[given]{2y}, \answer[given]{-1}}. \end{align*}$ Since this vector is normal to the surface, we can use it to find an implicit formula for the tangent plane to the surface by computing $\vec {n}\dotp (\vec {x}-\vec {p}) = 0$ where $\vec {p} = \vector {2,3,13}$ and $\begin{align*} \vec{n} &= \grad F(\vec{p})\\ &=\vector{\answer[given]{4}, \answer[given]{6},\answer[given]{-1}} \end{align*}$ Thus the equation of the plane tangent to the elliptic paraboloid at $\vec {p}$ is: $\answer [given]{4}(x-2) + 6(y-3) - \left (z-\answer [given]{13}\right ) = \answer [given]{0}$
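
The computation above can be mirrored in a few lines. The sketch below evaluates the gradient at $\vec {p}$ to get the normal vector, builds the left-hand side of the plane equation $\vec {n}\dotp (\vec {x}-\vec {p})$ , and then checks how well the plane hugs the paraboloid near $\vec {p}$ . The helper names `tangent_plane` and `gap` are illustrative assumptions.

```python
def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 - z."""
    return (2.0 * x, 2.0 * y, -1.0)


p = (2.0, 3.0, 13.0)
n = grad_F(*p)                # normal vector to the level surface F = 0 at p


def tangent_plane(x, y, z):
    """Left-hand side of n . (x - p) = 0; zero exactly on the tangent plane."""
    return n[0] * (x - p[0]) + n[1] * (y - p[1]) + n[2] * (z - p[2])


def gap(dx, dy):
    """Plane expression evaluated at the point of the paraboloid above (2 + dx, 3 + dy)."""
    x, y = p[0] + dx, p[1] + dy
    return tangent_plane(x, y, x * x + y * y)


print(n)
print(tangent_plane(*p))      # p itself lies on the plane
print(gap(1e-3, 1e-3))        # nearby surface points are very close to the plane
```

Working the algebra by hand, `gap(dx, dy)` equals $-(dx^2+dy^2)$ , so the surface separates from the plane only to second order, as one expects of a tangent plane.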

Now let’s see a more in-depth problem.

A plane perpendicular to the $(x,y)$ -plane contains the point $(-8,3,-8)$ on the hyperbolic paraboloid $z = x^2-8y^2$ . The line tangent to the intersection of the paraboloid and the plane is parallel to the $(x,y)$ -plane at this point. Find an equation of the plane.

A plane is determined by a point and a normal vector $\vec {n}$ . Since this plane is perpendicular to the $(x,y)$ -plane, we know that: $\vec {n} = \vector {a,b,\answer [given]{0}}$ Moreover the hyperbolic paraboloid $z = x^2-8y^2$ can be thought of as a level surface of $G(x,y,z) = x^2-8y^2 -z,$ in particular, $G(x,y,z) = 0$ . Since gradient vectors are normal to level surfaces, we compute $\grad G$ to find: $\grad G (x,y,z) = \vector {2x, -16y,-1}$ The curve of intersection lies in both the plane and the paraboloid, so its tangent line at $(-8,3,-8)$ is orthogonal to both $\vec {n}$ and $\grad G(-8,3,-8)$ , and hence parallel to their cross product. Now if we compute: $\begin{align*} \vec{n} \cross \grad G(-8,3,-8) &= \vector{a,b,0}\cross\vector{-16,-48,-1}\\ &=\vector{\answer[given]{-b},\answer[given]{a},\answer[given]{16b-48a}} \end{align*}$ Since the tangent line is parallel to the $(x,y)$ -plane, the $z$ -component of the vector above must be $\answer [given]{0}$ . So, write with me: $\begin{align*} 16b-48a &= 0\\ b &= \answer[given]{3a} \end{align*}$ So $\vec {n} = \vector {a,3a,0}$ , which is parallel to $\vector {1,3,0}$ . Hence one formula for the plane is $(x+8)+3(y-3) = 0$
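
As a check on this answer, the sketch below takes $a=1$ (so $\vec {n} = \vector {1,3,0}$ ), recomputes the cross product with $\grad G(-8,3,-8)$ , and confirms that the resulting tangent direction has no $z$ -component. The choice $a=1$ and the helper name `cross` are assumptions made for illustration.

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def grad_G(x, y, z):
    """Gradient of G(x, y, z) = x^2 - 8y^2 - z."""
    return (2 * x, -16 * y, -1)


point = (-8, 3, -8)
n = (1, 3, 0)                         # normal of the plane, taking a = 1

# The point really is on the hyperbolic paraboloid z = x^2 - 8y^2.
on_surface = point[0] ** 2 - 8 * point[1] ** 2 == point[2]

# Direction of the line tangent to the curve of intersection.
tangent = cross(n, grad_G(*point))

# Left-hand side of the plane equation (x + 8) + 3(y - 3) = 0 at the point.
plane_value = (point[0] + 8) + 3 * (point[1] - 3)

print(on_surface, tangent, plane_value)
```

The tangent direction comes out with a zero third component, so the tangent line is parallel to the $(x,y)$ -plane as the problem requires.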

Summary

To conclude, we will repeat ourselves: There are three things you must know about the gradient vector:

First: You must know how to compute the gradient vector. Second: The gradient vector points in the initial direction of greatest increase for a function. Third: The gradient vector is orthogonal to level sets.
