The directional derivative

$\newenvironment {prompt}{}{} \newcommand {\ungraded }[0]{} \newcommand {\todo }[0]{} \newcommand {\oiint }[0]{{\large \bigcirc }\kern -1.56em\iint } \newcommand {\mooculus }[0]{\textsf {\textbf {MOOC}\textnormal {\textsf {ULUS}}}} \newcommand {\npnoround }[0]{\nprounddigits {-1}} \newcommand {\npnoroundexp }[0]{\nproundexpdigits {-1}} \newcommand {\npunitcommand }[1]{\ensuremath {\mathrm {#1}}} \newcommand {\RR }[0]{\mathbb R} \newcommand {\R }[0]{\mathbb R} \newcommand {\N }[0]{\mathbb N} \newcommand {\Z }[0]{\mathbb Z} \newcommand {\sagemath }[0]{\textsf {SageMath}} \newcommand {\d }[0]{\,d} \newcommand {\l }[0]{\ell } \newcommand {\ddx }[0]{\frac {d}{\d x}} \newcommand {\zeroOverZero }[0]{\ensuremath {\boldsymbol {\tfrac {0}{0}}}} \newcommand {\inftyOverInfty }[0]{\ensuremath {\boldsymbol {\tfrac {\infty }{\infty }}}} \newcommand {\zeroOverInfty }[0]{\ensuremath {\boldsymbol {\tfrac {0}{\infty }}}} \newcommand {\zeroTimesInfty }[0]{\ensuremath {\small \boldsymbol {0\cdot \infty }}} \newcommand {\inftyMinusInfty }[0]{\ensuremath {\small \boldsymbol {\infty -\infty }}} \newcommand {\oneToInfty }[0]{\ensuremath {\boldsymbol {1^\infty }}} \newcommand {\zeroToZero }[0]{\ensuremath {\boldsymbol {0^0}}} \newcommand {\inftyToZero }[0]{\ensuremath {\boldsymbol {\infty ^0}}} \newcommand {\numOverZero }[0]{\ensuremath {\boldsymbol {\tfrac {\#}{0}}}} \newcommand {\dfn }[0]{\textbf } \newcommand {\unit }[0]{\mathop {}\!\mathrm } \newcommand {\eval }[1]{\bigg [ #1 \bigg ]} \newcommand {\seq }[1]{\left ( #1 \right )} \newcommand {\epsilon }[0]{\varepsilon } \newcommand {\phi }[0]{\varphi } \newcommand {\iff }[0]{\Leftrightarrow } \DeclareMathOperator {\arccot }{arccot} \DeclareMathOperator {\arcsec }{arcsec} \DeclareMathOperator {\arccsc }{arccsc} \DeclareMathOperator {\si }{Si} \DeclareMathOperator {\scal }{scal} \DeclareMathOperator {\sign }{sign} \newcommand {\arrowvec }[1]{{\overset {\rightharpoonup }{#1}}} \newcommand {\vec }[1]{{\overset {\boldsymbol {\rightharpoonup 
}}{\mathbf {#1}}}\hspace {0in}} \newcommand {\point }[1]{\left (#1\right )} \newcommand {\pt }[1]{\mathbf {#1}} \newcommand {\Lim }[2]{\lim _{\point {#1} \to \point {#2}}} \DeclareMathOperator {\proj }{\mathbf {proj}} \newcommand {\veci }[0]{{\boldsymbol {\hat {\imath }}}} \newcommand {\vecj }[0]{{\boldsymbol {\hat {\jmath }}}} \newcommand {\veck }[0]{{\boldsymbol {\hat {k}}}} \newcommand {\vecl }[0]{\vec {\boldsymbol {\l }}} \newcommand {\uvec }[1]{\mathbf {\hat {#1}}} \newcommand {\utan }[0]{\mathbf {\hat {t}}} \newcommand {\unormal }[0]{\mathbf {\hat {n}}} \newcommand {\ubinormal }[0]{\mathbf {\hat {b}}} \newcommand {\dotp }[0]{\bullet } \newcommand {\cross }[0]{\boldsymbol \times } \newcommand {\grad }[0]{\boldsymbol \nabla } \newcommand {\divergence }[0]{\grad \dotp } \newcommand {\curl }[0]{\grad \cross } \newcommand {\lto }[0]{\mathop {\longrightarrow \,}\limits } \newcommand {\bar }[0]{\overline } \newcommand {\surfaceColor }[0]{violet} \newcommand {\surfaceColorTwo }[0]{redyellow} \newcommand {\sliceColor }[0]{greenyellow} \newcommand {\vector }[1]{\left \langle #1\right \rangle } \newcommand {\sectionOutcomes }[0]{} \newcommand {\HyperFirstAtBeginDocument }[0]{\AtBeginDocument }$

We introduce a way of analyzing the rate of change in a given direction.

For functions of several variables, partial derivatives measure the rate of change when changing only one of the inputs. We can think of partial derivatives geometrically if we consider the surface $z=F(x,y)$ . Let’s imagine that our surface is a hill and consider $F_y(a,b)$ . This tells us we should hold $x$ constant and see how $z= F(x,y)$ changes as $y$ changes. In essence, we imagine a path “parallel” to the $y$ -axis and note how $z=F(x,y)$ changes:

We can now interpret $F_y(a,b)$ as either:

the slope of the hill if we walk along it in a direction parallel to the $y$ -axis.
the instantaneous rate of change of $F(x,y)$ at $(a,b)$ as we approach $(a,b)$ along the line $x=a$ .

We have a similar interpretation of $F_x(a,b)$ . However, there is no reason that we must approach $(a,b)$ along a line that is parallel to one of the coordinate axes. What if we want to approach along any line? Consider, for example, this line:

Indeed, once we are at a point on the surface above $\vec {a}=\vector {a,b}$ , there are actually many different directions that we can travel along the hill. Let’s consider a line $\vecl$ that passes through $\vec {a}$ as our path in the domain. Our rate of change will be given by $\textrm {rate of change} = \frac {\textrm {rise}}{\textrm {run}},$ where the “run” is the distance traveled along the line $\vecl$ and the “rise” is the corresponding change in the $z$ -values of the function. Since we are ultimately concerned about a curve on the surface, a good first step is to parameterize the line in the domain, then use the function to find a parametric description of the curve on the surface above $\vecl$ .

To make computing the run as simple as possible, we pick a unit vector $\uvec {u}$ that points along $\vecl$ . We’ll see how to do this in the next example, but we can always start at $\vec {a}$ and draw a unit vector that extends from $\vec {a}$ along $\vecl$ .

To find a parameterization of $\vecl$ , note that $\uvec {u}$ is parallel to $\vecl$ and $\vec {a}$ is a point on the line, so letting $h$ denote the parameter, a description of $\vecl$ is given by

$\vecl (h) = \vec {a} + h \uvec {u}$

For the sake of example, let $h>0$ (a similar argument can be given if $h<0$ ). One convenient consequence of using a unit vector in the direction of $\vecl$ is that the “run,” which is the distance between $\vec {a}$ and $\vec {a}+h\uvec {u}$ , is simply $\textrm {``run'' } = \left |\vec {a}+h\uvec {u} - \vec {a}\right | = |h \uvec {u}| = |h| |\uvec {u}| = h$ since $h>0$ and $\uvec {u}$ is a unit vector.

The “rise” is computed by noting that it is the corresponding change in $z$ -values.

$\textrm { ``rise'' } = F(\vec {a}+h\uvec {u})-F(\vec {a})$

These are shown in the image below.

To find the instantaneous rate of change, we take the limit as $h$ goes to $0$ (since $F$ is differentiable, it can be shown this limit must exist). We call the result the directional derivative of $F$ at $\vec {a}$ in the direction $\uvec {u}$ and will henceforth denote this by $D_{\uvec {u}}(F(\vec {a}))$ . Let’s give a formal definition.

Suppose that $F: \R ^2 \to \R$ is a differentiable function. Given a unit vector $\uvec {u}$ and a point $\vec {a}$ in the domain of $F$ , we define the directional derivative of $F$ at $\vec {a}$ in the direction $\uvec {u}$ as: $D_{\uvec {u}}F(\vec {a}) = \lim _{h \to 0} \frac {F(\vec {a}+h\uvec {u})-F(\vec {a})}{h}$
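The limit in this definition can be explored numerically. The following Python sketch evaluates the difference quotient for shrinking values of $h$. The sample function $F(x,y)=x^2y$, the point $(1,2)$, and the direction $\vector {3/5,4/5}$ are illustrative assumptions that do not come from the text.

```python
def difference_quotient(F, a, u, h):
    """Return (F(a + h*u) - F(a)) / h for a point a and a unit vector u in the plane."""
    ax, ay = a
    ux, uy = u
    return (F(ax + h * ux, ay + h * uy) - F(ax, ay)) / h

# Hypothetical sample data, chosen only for illustration.
def F(x, y):
    return x**2 * y

a = (1.0, 2.0)
u = (3 / 5, 4 / 5)  # already a unit vector: (3/5)^2 + (4/5)^2 = 1

steps = (1e-1, 1e-2, 1e-3, 1e-4)
quotients = [difference_quotient(F, a, u, h) for h in steps]
for h, q in zip(steps, quotients):
    print(f"h = {h:g}: quotient = {q:.6f}")
```

For this sample function the quotient works out algebraically to $3.2 + 1.68h + 0.288h^2$, so the printed values settle toward $3.2$ as $h$ shrinks.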

While writing down the definition above might seem tricky, notice that the qualitative idea of finding the instantaneous rate of change as a limit is exactly the same as what we did with functions of a single variable. It’s really just an old problem in a new setting! The difficulty lies in using the tools we have been developing to write down the actual limit that must be computed.

In essence, $D_{\uvec {u}}(F(\vec {a}))$ is the instantaneous rate of change of $F$ at $\vec {a}$ as we approach $\vec {a}$ in the direction of $\uvec {u}$ .

There’s a quick way to compute this limit by using the gradient vector. We first give the result and save the derivation of the formula until the end of the section.

Suppose that $F: \R ^2 \to \R$ is a differentiable function and let $\uvec {u}$ be a unit vector. Then, we compute $D_{\uvec {u}}(F(\vec {a}))$ by the formula $D_{\uvec {u}}F(\vec {a}) = \grad F(\vec {a})\dotp \uvec {u}.$

Find $D_{\uvec {u}}(F(2,1))$ for the function $F(x,y) = x^2-3xy+4y^2+7$ , where $\uvec {u}$ is a unit vector parallel to the line $2x-y=3$ .

We want to use the result $D_{\uvec {u}}(F(2,1)) =\grad {F}(2,1) \dotp \uvec {u}$ . To do this we need two quantities: $\uvec {u}$ and $\grad {F}(2,1)$ .

Finding $\uvec {u}$
Vectors have both a magnitude and a direction. We’ve seen that it is much more challenging to find a vector in the appropriate direction than it is to scale a vector appropriately, so let’s start by finding a vector $\vec {u}$ parallel to the line $2x-y=3$ .
There are many ways we can do this and one such way is to parameterize the line. Since we can explicitly find $y=\answer [given]{2x-3}$ , we set $x(t)=t$ and $y(t) = \answer [given]{2t-3}$ . A parameterization is thus
$\vec {p}(t) = \vector {x(t),y(t)} = \vector {\answer [given]{t},\answer [given]{2t-3}}$ Now, we need a $t$ -value for which $x(t) = 2$ , $y(t)=1$ . By inspecting the first component of the parameterization, we find $t=\answer [given]{2}$ . Thus, a vector parallel to the line will be $\vec {p}'(2)$ . We note
$\vec {p}'(t) = \vector {1,2}$
So $\vec {p}'(2) = \vector {\answer [given]{1},\answer [given]{2}}$ . This is the vector $\vec {u}$ we will use to be parallel to the line. We now note that $\vec {u}$ is not a unit vector.
We find the unit vector the usual way by computing $\uvec {u} = \frac {\vec {u}}{|\vec {u}|} = \frac {\vector {1,2}}{\answer [given]{\sqrt {5}}}.$
Finding $\grad {F}(2,1)$ .
Since $F(x,y) = x^2-3xy+4y^2+7$ , we find $F_x(x,y) = \answer [given]{2x-3y}$ and $F_y(x,y) = -3x+8y$ , so
$\grad {F}(x,y) = \vector {\answer [given]{2x-3y},\answer [given]{-3x+8y}}$ Thus, $\grad {F}(2,1) = \vector {\answer [given]{1},\answer [given]{2}}$ .

Now, using the formula $D_{\uvec {u}}(F(2,1)) = \grad {F}(2,1) \dotp \uvec {u}$ gives $D_{\uvec {u}}(F(2,1)) = \vector {1,2} \dotp \vector {\frac {1}{\sqrt {5}},\frac {2}{\sqrt {5}} } = \answer [given]{\frac {5}{\sqrt {5}}}$ .
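As a sanity check on this example, the following Python sketch computes the directional derivative two ways: with the gradient formula and with a difference quotient from the limit definition. The step size `h = 1e-6` is an arbitrary choice, not something fixed by the text.

```python
import math

def F(x, y):
    return x**2 - 3 * x * y + 4 * y**2 + 7

def grad_F(x, y):
    # Partial derivatives computed by hand in the example.
    return (2 * x - 3 * y, -3 * x + 8 * y)

# Direction vector of the line 2x - y = 3 (slope 2), normalized to unit length.
v = (1.0, 2.0)
norm = math.hypot(v[0], v[1])
u = (v[0] / norm, v[1] / norm)

# Gradient formula: grad F(2, 1) dotted with u.
gx, gy = grad_F(2.0, 1.0)
by_formula = gx * u[0] + gy * u[1]

# Limit definition, approximated with a small step h (an arbitrary choice).
h = 1e-6
by_limit = (F(2.0 + h * u[0], 1.0 + h * u[1]) - F(2.0, 1.0)) / h

print(by_formula, by_limit, math.sqrt(5))
```

Both values should agree with $\frac {5}{\sqrt {5}} = \sqrt {5}$ up to the small error introduced by the finite step.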

Now that we have defined and worked with the directional derivative, what does it tell us?

The instantaneous rate of change of $F(x,y)$ at the point $(2,1)$ as we approach it in the direction parallel to $\vector {\frac {1}{\sqrt {5}},\frac {2}{\sqrt {5}}}$ .
The slope of the tangent line to the curve on the surface $z= x^2-3xy+4y^2+7$ above the line $2x-y=3$ in its domain.
The normal vector to the surface at the point.
The slope of the tangent plane.

The first two choices are two ways of thinking about the directional derivative. Since the directional derivative is a scalar, not a vector, the third option cannot be correct. The fourth option is also not correct because there is no “slope” associated to a plane.

Directions of initial change

Consider a surface defined by $z = F(x,y)$ . Given a particular point $(a,b,F(a,b))$ on the surface where $\grad {F}(a,b) \neq \vec {0}$ , there are a few questions we can ask.

In which initial direction should we travel from $(a,b,F(a,b))$ if we want to head up the surface the fastest?
In which initial direction should we travel from $(a,b,F(a,b))$ if we want to head down the surface the fastest?
In which direction should we travel if we do not want our current elevation to change?

The following theorem answers these questions. We state the theorem for functions $F:\R ^2\to \R$ , but it actually holds for functions from $\R ^n$ to $\R$ .

Consider a function $F:\R ^2 \to \R$ and a point $(a,b)$ at which $\grad {F}(a,b) \neq \vec {0}$ .

The initial direction of greatest increase is in the direction of $\grad {F}(a,b)$ .
The initial direction of greatest decrease is in the direction of $-\grad {F}(a,b)$ .
The initial directions of no change are orthogonal to $\grad {F}(a,b)$ .

To explain this we have to consider every possible direction we can travel from the point $(a,b,F(a,b))$ along the surface. This may seem daunting, but remember that we have a nice formula for the directional derivative as a dot product, and dot products capture important geometric information. $D_{\uvec {u}} F(a,b) = \grad {F}(a,b) \dotp \uvec {u} = |\grad {F}(a,b)||\uvec {u}|\cos (\theta ).$ Since $\uvec {u}$ is a unit vector, $|\uvec {u}|=\answer [given]{1}$ , and thus $D_{\uvec {u}} F(a,b) = |\grad {F}(a,b)|\cos (\theta ).$ To find the initial direction of greatest increase, we need to find a choice for $\uvec {u}$ that makes $D_{\uvec {u}} F(a,b)$ as large as possible. Since $D_{\uvec {u}} F(a,b) = |\grad {F}(a,b)|\cos (\theta ),$ this occurs when $\theta =\answer [given]{0}$ . This means that a vector $\uvec {u}$ that points in the initial direction of greatest increase is parallel to the gradient vector.

As another upshot, we actually know exactly what the maximum rate of increase is at $(a,b)$ too. It’s $D_{\uvec {u}}(F(a,b)) = |\grad {F}(a,b)|$ .

We can use similar logic to determine that the maximum rate of decrease, or the “most negative” rate of change occurs in the direction $\uvec {u}$ opposite the direction of the gradient vector, and that this most negative rate of change is $D_{\uvec {u}}(F(a,b)) = -|\grad {F}(a,b)|$ .

To tackle the direction of no change, we need to find the directions $\uvec {u}$ for which $D_{\uvec {u}} F(a,b) =0$ . Once again, the formula $D_{\uvec {u}}(F(a,b)) = \grad {F}(a,b) \dotp \uvec {u}$ comes to the rescue. Setting $D_{\uvec {u}} F(a,b) =0$ gives that $\grad {F}(a,b) \dotp \uvec {u} = 0$ , which means that the directions of no change are orthogonal to $\grad {F}(a,b)$ .

Suppose that $F(x,y) = \sin (xy)+y^2$ . Give a unit vector in the initial direction of greatest increase, decrease, and no change at $(0,1)$ . What are the corresponding maximum and minimum rates of change at $(0,1)$ ?

We first compute the gradient. Since $F(x,y) = \sin (xy)+y^2$ ,

$F_x(x,y) = \answer [given]{y\cos (xy)}$ so $F_x(0,1) = 1$ .
$F_y(x,y) = x\cos (xy)+2y$ , so $F_y(0,1) = \answer [given]{2}$ .

Thus, $\grad {F}(0,1) = \vector {\answer [given]{1},\answer [given]{2}}$ . We can now use this to find the requested directions and rates.

The initial direction of greatest increase is in the direction of the gradient. Since $|\grad {F}(0,1)| = \sqrt {(1)^2+(2)^2} = \sqrt {5}$ , a unit vector $\uvec {u}$ in the direction of greatest increase is $\uvec {u} = \vector {\answer [given]{\frac {1}{\sqrt {5}}},\answer [given]{\frac {2}{\sqrt {5}}}}$ and the maximum rate of change is $|\grad {F}(0,1)| = \sqrt {5}$ .
The initial direction of greatest decrease is opposite the direction of the gradient. A unit vector $\uvec {u}$ in the direction of greatest decrease is $\uvec {u} = \vector {-\frac {1}{\sqrt {5}},-\frac {2}{\sqrt {5}}}$ and the greatest rate of decrease is $-|\grad {F}(0,1)| = -\sqrt {5}$ .
There are two unit vectors in the initial direction of no change. To see why, note that $\grad {F}(0,1) = \vector {1,2}$ , so both the vectors $\vec {w}_1 =\vector {-2,1}$ and $\vec {w}_2 =\vector {2,-1}$ are orthogonal to $\grad {F}(0,1)$ . (Notice that for two dimensional vectors, we can always find a vector orthogonal to a given one by inspection; just flip the components and negate one of them.)
The magnitude of both $\vec {w}_1$ and $\vec {w}_2$ is $\answer [given] {\sqrt {5}}$ , so the two unit vectors in the initial direction of no change are $\uvec {w}_1 = \vector {-\frac {2}{\sqrt {5}},\frac {1}{\sqrt {5}}}$ and $\uvec {w}_2 = \vector {\frac {2}{\sqrt {5}},-\frac {1}{\sqrt {5}}}$ .
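The three answers in this example can be checked by brute force. The Python sketch below samples unit vectors $\vector {\cos \theta ,\sin \theta }$ around the circle, evaluates $\grad {F}(0,1) \dotp \uvec {u}$ for each, and picks out the largest and smallest values. The sampling resolution of 3600 directions is an arbitrary assumption.

```python
import math

def grad_F(x, y):
    # Gradient of F(x, y) = sin(xy) + y^2, from the partial derivatives above.
    return (y * math.cos(x * y), x * math.cos(x * y) + 2 * y)

gx, gy = grad_F(0.0, 1.0)
grad_norm = math.hypot(gx, gy)

def directional_derivative(theta):
    """Rate of change in the direction of the unit vector (cos(theta), sin(theta))."""
    return gx * math.cos(theta) + gy * math.sin(theta)

# Sample directions around the unit circle (3600 samples is an arbitrary resolution).
thetas = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(thetas, key=directional_derivative)
worst = min(thetas, key=directional_derivative)

u_max = (math.cos(best), math.sin(best))
u_min = (math.cos(worst), math.sin(worst))
print("greatest increase:", u_max, directional_derivative(best))
print("greatest decrease:", u_min, directional_derivative(worst))

# A direction orthogonal to the gradient gives zero rate of change.
w = (-gy / grad_norm, gx / grad_norm)
print("no change:", w, gx * w[0] + gy * w[1])
```

Up to the sampling resolution, the best direction should line up with $\vector {\frac {1}{\sqrt {5}},\frac {2}{\sqrt {5}}}$ and the extreme rates with $\pm \sqrt {5}$, while the orthogonal direction `w` matches $\uvec {w}_1$.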

The formula for the directional derivative

We conclude this section by giving the derivation of the formula

$D_{\uvec {u}}(F(\vec {a}))=\grad {F}(\vec {a})\dotp \uvec {u}.$ Since our function $F$ is differentiable, we know that when we “zoom in” on the graph of the surface $z=F(x,y)$ , the surface looks like its tangent plane, $z=L(\vec {x})$ , which is mathematized in the definition of differentiability below. $\lim _{\vec {x} \to \vec {a} } \frac {F(\vec {x})-L(\vec {x})}{|\vec {x}-\vec {a}|} = 0$

We have seen that we can use the gradient to write the formula for the tangent plane as $L(\vec {x}) = F(\vec {a}) + \grad {F}(\vec {a}) \dotp (\vec {x}-\vec {a})$ . Substituting into the above limit gives

$\lim _{\vec {x} \to \vec {a} } \frac {F(\vec {x})-F(\vec {a}) - \grad {F}(\vec {a}) \dotp (\vec {x}-\vec {a})}{|\vec {x}-\vec {a}|}=0 .$

Now, recall that the directional derivative $D_{\uvec {u}}(F(\vec {a}))$ requires that we approach $\vec {a}$ along the line $\vecl (t) = \vec {a}+t\uvec {u}$ . Since the above limit exists, the result holds along any path along which $\vec {x} \to \vec {a}$ , so it certainly holds along this line. To let $\vec {x}$ approach $\vec {a}$ along this line, we set $\vec {x} = \vec {a}+t\uvec {u}$ , so that the limit $\vec {x} \to \vec {a}$ becomes the limit $t \to 0$ . To simplify, we consider only $t \to 0^+$ ; the argument for the other one-sided limit is very similar. Now, we update our limit along the chosen path.

$\begin{align*} 0 & =\lim_{\vec{x} \to \vec{a} } \frac{F(\vec{x})-F(\vec{a}) - \grad{F}(\vec{a}) \dotp (\vec{x}-\vec{a})}{|\vec{x}-\vec{a}|} \\ &= \lim_{t \to 0} \frac{F( \vec{a}+t\uvec{u})-F(\vec{a}) - \grad{F}(\vec{a}) \dotp ( \vec{a}+t\uvec{u}-\vec{a})}{| \vec{a}+t\uvec{u}-\vec{a}|}\\ &= \lim_{t \to 0} \frac{F( \vec{a}+t\uvec{u})-F(\vec{a}) - \grad{F}(\vec{a}) \dotp (t\uvec{u})}{| t\uvec{u}|} \\ &= \lim_{t \to 0} \frac{F( \vec{a}+t\uvec{u})-F(\vec{a})}{t} - \grad{F}(\vec{a}) \dotp \uvec{u} \end{align*}$ where in the last step, we have used the facts that $|t|=t$ since we are taking $t>0$ , and that $|\uvec {u}|=1$ since $\uvec {u}$ is a unit vector.

Recalling that this limit is $0$ in the first place gives $\lim _{t \to 0} \frac {F( \vec {a}+t\uvec {u})-F(\vec {a})}{t} - \grad {F}(\vec {a}) \dotp \uvec {u} = 0,$ and since by definition, $D_{\uvec {u}}(F(\vec {a})) = \lim _{t \to 0} \frac {F( \vec {a}+t\uvec {u})-F(\vec {a})}{t}$ , we have

$D_{\uvec {u}} (F(\vec {a})) - \grad {F}(\vec {a}) \dotp \uvec {u} = 0.$

We may thus conclude that $D_{\uvec {u}} (F(\vec {a})) = \grad {F}(\vec {a}) \dotp \uvec {u}$ .
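The key step in this derivation is that the gap between $F$ and its tangent plane shrinks faster than the distance to $\vec {a}$ . The Python sketch below illustrates this along the line $\vec {a}+t\uvec {u}$ , reusing the function, point, and direction from the first example of this section. The particular values of $t$ are arbitrary assumptions.

```python
import math

def F(x, y):
    return x**2 - 3 * x * y + 4 * y**2 + 7

def grad_F(x, y):
    return (2 * x - 3 * y, -3 * x + 8 * y)

a = (2.0, 1.0)
u = (1 / math.sqrt(5), 2 / math.sqrt(5))
g = grad_F(*a)

def remainder_ratio(t):
    """(F(a + t u) - L(a + t u)) / |t u|, where L is the tangent-plane approximation at a."""
    x, y = a[0] + t * u[0], a[1] + t * u[1]
    linear = F(*a) + g[0] * (x - a[0]) + g[1] * (y - a[1])
    return (F(x, y) - linear) / abs(t)  # |t u| = |t| because u is a unit vector

ratios = [remainder_ratio(t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

Each ratio is roughly ten times smaller than the previous one, which is the behavior the definition of differentiability requires as $t \to 0^+$ .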