The chain rule

$\newenvironment {prompt}{}{} \newcommand {\ungraded }[0]{} \newcommand {\todo }[0]{} \newcommand {\oiint }[0]{{\large \bigcirc }\kern -1.56em\iint } \newcommand {\mooculus }[0]{\textsf {\textbf {MOOC}\textnormal {\textsf {ULUS}}}} \newcommand {\npnoround }[0]{\nprounddigits {-1}} \newcommand {\npnoroundexp }[0]{\nproundexpdigits {-1}} \newcommand {\npunitcommand }[1]{\ensuremath {\mathrm {#1}}} \newcommand {\RR }[0]{\mathbb R} \newcommand {\R }[0]{\mathbb R} \newcommand {\N }[0]{\mathbb N} \newcommand {\Z }[0]{\mathbb Z} \newcommand {\sagemath }[0]{\textsf {SageMath}} \newcommand {\d }[0]{\,d} \newcommand {\l }[0]{\ell } \newcommand {\ddx }[0]{\frac {d}{\d x}} \newcommand {\zeroOverZero }[0]{\ensuremath {\boldsymbol {\tfrac {0}{0}}}} \newcommand {\inftyOverInfty }[0]{\ensuremath {\boldsymbol {\tfrac {\infty }{\infty }}}} \newcommand {\zeroOverInfty }[0]{\ensuremath {\boldsymbol {\tfrac {0}{\infty }}}} \newcommand {\zeroTimesInfty }[0]{\ensuremath {\small \boldsymbol {0\cdot \infty }}} \newcommand {\inftyMinusInfty }[0]{\ensuremath {\small \boldsymbol {\infty -\infty }}} \newcommand {\oneToInfty }[0]{\ensuremath {\boldsymbol {1^\infty }}} \newcommand {\zeroToZero }[0]{\ensuremath {\boldsymbol {0^0}}} \newcommand {\inftyToZero }[0]{\ensuremath {\boldsymbol {\infty ^0}}} \newcommand {\numOverZero }[0]{\ensuremath {\boldsymbol {\tfrac {\#}{0}}}} \newcommand {\dfn }[0]{\textbf } \newcommand {\unit }[0]{\mathop {}\!\mathrm } \newcommand {\eval }[1]{\bigg [ #1 \bigg ]} \newcommand {\seq }[1]{\left ( #1 \right )} \newcommand {\epsilon }[0]{\varepsilon } \newcommand {\phi }[0]{\varphi } \newcommand {\iff }[0]{\Leftrightarrow } \DeclareMathOperator {\arccot }{arccot} \DeclareMathOperator {\arcsec }{arcsec} \DeclareMathOperator {\arccsc }{arccsc} \DeclareMathOperator {\si }{Si} \DeclareMathOperator {\scal }{scal} \DeclareMathOperator {\sign }{sign} \newcommand {\arrowvec }[1]{{\overset {\rightharpoonup }{#1}}} \newcommand {\vec }[1]{{\overset {\boldsymbol {\rightharpoonup 
}}{\mathbf {#1}}}\hspace {0in}} \newcommand {\point }[1]{\left (#1\right )} \newcommand {\pt }[1]{\mathbf {#1}} \newcommand {\Lim }[2]{\lim _{\point {#1} \to \point {#2}}} \DeclareMathOperator {\proj }{\mathbf {proj}} \newcommand {\veci }[0]{{\boldsymbol {\hat {\imath }}}} \newcommand {\vecj }[0]{{\boldsymbol {\hat {\jmath }}}} \newcommand {\veck }[0]{{\boldsymbol {\hat {k}}}} \newcommand {\vecl }[0]{\vec {\boldsymbol {\l }}} \newcommand {\uvec }[1]{\mathbf {\hat {#1}}} \newcommand {\utan }[0]{\mathbf {\hat {t}}} \newcommand {\unormal }[0]{\mathbf {\hat {n}}} \newcommand {\ubinormal }[0]{\mathbf {\hat {b}}} \newcommand {\dotp }[0]{\bullet } \newcommand {\cross }[0]{\boldsymbol \times } \newcommand {\grad }[0]{\boldsymbol \nabla } \newcommand {\divergence }[0]{\grad \dotp } \newcommand {\curl }[0]{\grad \cross } \newcommand {\lto }[0]{\mathop {\longrightarrow \,}\limits } \newcommand {\bar }[0]{\overline } \newcommand {\surfaceColor }[0]{violet} \newcommand {\surfaceColorTwo }[0]{redyellow} \newcommand {\sliceColor }[0]{greenyellow} \newcommand {\vector }[1]{\left \langle #1\right \rangle } \newcommand {\sectionOutcomes }[0]{} \newcommand {\HyperFirstAtBeginDocument }[0]{\AtBeginDocument }$

We investigate the chain rule for functions of several variables.

The chain rule states that $\ddx \Big (f\big (g(x)\big )\Big ) = f'\big (g(x)\big )g'(x).$ If $t=g(x)$ , we can express the chain rule as $\dd [f]{x} = \dd [f]{t}\dd [t]{x}.$ In this section we extend the chain rule to functions of more than one variable.

Let $F:\R ^n\to \R$ be a differentiable function and let $\vec {x}(t) = \vector {x_1(t),x_2(t),\dots ,x_n(t)}$ be a differentiable vector-valued function from $\R$ to $\R ^n$ . Then $\dd [F]{t} = \grad F(\vec {x}(t)) \dotp \vec {x}'(t).$

It helps to understand what the situation of $z=F(x,y)$ , $\vec {x}(t) = \vector {x(t),y(t)}$ describes. The graph of $z=F(x,y)$ is a surface, and $\vec {x}(t)$ describes a curve in the $(x,y)$ -plane. Combining these, we obtain a curve that lies on the surface described by $F$ . The parametric equations for this curve are $x=x(t)$ , $y=y(t)$ , and $z=F\big (x(t),y(t)\big )$ . Consider:

Here a surface is drawn, along with a dashed curve in the $(x,y)$ -plane. Restricting $F$ to just the points on this dashed curve gives the curve shown on the surface. The derivative $\dd [F]{t}$ gives the instantaneous rate of change of $F$ with respect to $t$ as we move along this curve.

Now try your hand at the chain rule.

Let $F(x,y)=x^2y+x$ , where $x(t)=\sin (t)$ and $y(t)=e^{5t}$ . Compute $\dd [F]{t}$ . $\dd [F]{t} = \answer {(2\sin (t)e^{5t}+1)\cos (t)+5e^{5t}\sin ^2(t)}.$

The previous example can make us wonder: if we substituted for $x$ and $y$ at the end to show that $\dd [F]{t}$ is really just a function of $t$ , why not substitute before differentiating, showing clearly that $F$ is a function of $t$ ?

That is, $z = x^2y+x = (\sin t)^2e^{5t}+\sin t.$ Applying the chain and product rules, we have $\dd [F]{t} = 2\sin (t)\cos (t) e^{5t}+ 5\sin ^2(t) e^{5t}+\cos (t),$ which matches the result from the example.
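The agreement between the two approaches can also be checked numerically. The following is a minimal sketch using only Python's standard library. The function names, the sample point `t0 = 0.3`, and the step size `h` are illustrative choices, not part of the example above. It evaluates the chain rule formula, the derivative obtained by substituting first, and a central finite difference of $F\big (x(t),y(t)\big )$ .

```python
import math

def F(x, y):
    return x**2 * y + x

def x_of(t):
    return math.sin(t)

def y_of(t):
    return math.exp(5 * t)

def chain_rule(t):
    """dF/dt as grad F(x(t), y(t)) dotted with (x'(t), y'(t))."""
    x, y = x_of(t), y_of(t)
    grad = (2 * x * y + 1, x**2)                    # (F_x, F_y)
    velocity = (math.cos(t), 5 * math.exp(5 * t))   # (x'(t), y'(t))
    return grad[0] * velocity[0] + grad[1] * velocity[1]

def substituted(t):
    """Derivative of sin(t)^2 e^{5t} + sin(t), found by substituting first."""
    return (2 * math.sin(t) * math.cos(t) * math.exp(5 * t)
            + 5 * math.sin(t)**2 * math.exp(5 * t)
            + math.cos(t))

def finite_difference(t, h=1e-6):
    """Central difference approximation of d/dt F(x(t), y(t))."""
    g = lambda s: F(x_of(s), y_of(s))
    return (g(t + h) - g(t - h)) / (2 * h)

t0 = 0.3  # arbitrary sample point, chosen only for illustration
print(chain_rule(t0), substituted(t0), finite_difference(t0))
```

The first two printed values should agree to rounding error, and the third should agree to several digits, since a finite difference is only an approximation.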

This may now make one wonder “What’s the point? If we could already find the derivative, why learn another way of finding it?” In some cases, applying this rule makes differentiation simpler, but this is hardly the power of the chain rule. Rather, the chain rule is extremely powerful when we do not know what $F$ , $x$ and/or $y$ are. It may be hard to believe, but often in “the real world” we know rate-of-change information (information about derivatives) without explicitly knowing the underlying functions. The chain rule allows us to combine several rates of change to find another rate of change.
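As an illustration of combining rates without knowing the underlying functions, here is a small sketch. All of the numbers are hypothetical. Suppose that at some instant the partial derivatives of a quantity $T$ are known to be $2$ and $-1$ , and the position is changing at rates $3$ and $4$ . The chain rule needs only these four numbers.

```python
def rate_along_path(gradient, velocity):
    """Chain rule: dot product of the gradient with the velocity vector."""
    return sum(g * v for g, v in zip(gradient, velocity))

# Hypothetical measurements at a single instant (not taken from the text):
grad_T = (2.0, -1.0)    # (dT/dx, dT/dy) at the current position
velocity = (3.0, 4.0)   # (dx/dt, dy/dt) at the current time

print(rate_along_path(grad_T, velocity))  # 2*3 + (-1)*4 = 2.0
```

No formula for $T$ , $x(t)$ , or $y(t)$ was required. The same helper works in any number of variables, since it simply forms the dot product from the theorem.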

Suppose the curve below has an arc length parameterization given by $\vec {p}(s)$ .

Let $F(x,y) = x^2 - y^3$ . Compute: $\eval {\dd {s}F(\vec {p}(s))}_{s=1} \begin {prompt} = \answer {-6} \end {prompt}$

The chain rule also tells us something about the meaning of the gradient. As we will see, the gradient vector is always orthogonal to level curves and surfaces.

Suppose that a vector-valued function $\vec {c}(t)=\vector {x(t),y(t)}$ traces out a level curve of the function $F(x,y)$ . Explain how the chain rule shows that the gradient is orthogonal to level curves and surfaces.

We should ask ourselves: “What is the change in $F$ as $t$ varies?” Since $\vec {c}(t)$ traces out a level curve, $F$ is constant along it, so the rate of change must be $\answer [given]{0}$ . Comparing this to the chain rule we see: $\begin{align*} \dd{t} F(\vec{c}(t)) &= \grad F(\vec{c}(t)) \dotp \vec{c}'(t) \\ &= \answer[given]{0}. \end{align*}$ A zero dot product tells us that the gradient is orthogonal to the tangent vectors of our level curve. This means that the gradient is orthogonal to level curves.

Note that the last explanation works in any dimension. The upshot?

Gradient vectors are orthogonal to level sets.

This is a key concept concerning the gradient.
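To see this orthogonality in a concrete case, consider the following sketch. The function $F(x,y)=x^2+4y^2$ and the curve $\vec {c}(t)=\vector {2\cos (t),\sin (t)}$ are illustrative assumptions, not taken from the text above. This curve traces the level curve $F=4$ , so the dot product of the gradient with the tangent vector should vanish for every $t$ .

```python
import math

def grad_F(x, y):
    """Gradient of the assumed example F(x, y) = x^2 + 4y^2."""
    return (2 * x, 8 * y)

def c(t):
    """Parameterization of the level curve F = 4 (an ellipse)."""
    return (2 * math.cos(t), math.sin(t))

def c_prime(t):
    """Tangent vector c'(t)."""
    return (-2 * math.sin(t), math.cos(t))

def gradient_dot_tangent(t):
    gx, gy = grad_F(*c(t))
    vx, vy = c_prime(t)
    return gx * vx + gy * vy

for t in (0.0, 0.7, 2.0, 4.5):
    print(t, gradient_dot_tangent(t))  # each value is zero up to rounding
```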

New solutions for old problems

We can also use our new chain rule to revisit problems from our previous studies of calculus. Our new tools allow for simpler solutions to these problems.

Differentiating integrals

Recall the following form of the Fundamental Theorem of Calculus: $\ddx \int _a^x f(t) \d t = f(x).$

Compute $\dd x \int _0^x \frac {\sin (t)}{t} \d t \begin {prompt} = \answer {\frac {\sin (x)}{x}} \end {prompt}$

It is easy to use the Fundamental Theorem of Calculus to differentiate integrals, when the limits of integration are a constant and a variable. However, when the limits are functions, things get more complicated. The multivariable chain rule helps out in these situations.

Compute: $\dd {t} \int _{\sin (t)}^{\cos (t)} e^{(s^2)} \d s$

Let $\begin{align*} F(x,y) &= \int_y^x e^{(s^2)} \d s\\ x(t) &= \cos(t)\\ y(t) &= \sin(t). \end{align*}$ Now, $\dd {t} F\big (x(t),y(t)\big ) = \grad F(x(t),y(t)) \dotp \vector {\answer [given]{-\sin (t)},\answer [given]{\cos (t)}}.$ To compute the partials, we use the Fundamental Theorem of Calculus: $\begin{align*} \pp[F]{x} &= \pp{x} \int_y^x e^{(s^2)} \d s\\ &= \answer[given]{e^{(x^2)}} \end{align*}$ and $\begin{align*} \pp[F]{y} &= \pp{y} \int_y^x e^{(s^2)} \d s\\ &= \pp{y} \left(-\int_x^y e^{(s^2)} \d s\right)\\ &= \answer[given]{-e^{(y^2)}}. \end{align*}$ So $\begin{align*} \dd{t} F\big(x(t),y(t)\big) &= \vector{e^{\cos^2(t)},-e^{\sin^2(t)}}\dotp \vector{-\sin(t),\cos(t)}\\ &= \answer[given]{-e^{\cos^2(t)}\sin(t)-e^{\sin^2(t)}\cos(t)}. \end{align*}$
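Since $e^{(s^2)}$ has no elementary antiderivative, a numerical check is a reasonable way to gain confidence in this answer. The sketch below approximates the integral with composite Simpson's rule and differentiates it with a central difference. The helper names, the number of subintervals, the step size, and the sample point `t0 = 0.4` are all illustrative assumptions.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(s):
    return math.exp(s * s)

def G(t):
    """Numerical value of the integral from sin(t) to cos(t)."""
    return simpson(integrand, math.sin(t), math.cos(t))

def chain_rule_answer(t):
    """The derivative obtained above from the multivariable chain rule."""
    return (-math.exp(math.cos(t)**2) * math.sin(t)
            - math.exp(math.sin(t)**2) * math.cos(t))

def finite_difference(t, h=1e-5):
    """Central difference approximation of G'(t)."""
    return (G(t + h) - G(t - h)) / (2 * h)

t0 = 0.4  # arbitrary sample point
print(chain_rule_answer(t0), finite_difference(t0))
```

The two printed values should agree to several digits. This is a sanity check rather than a proof.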

Compute: $\dd {t} \int _{t^2}^{t^3} \frac {\sin (s)}{s} \d s \begin {prompt} = \answer {\frac {-2 t \sin (t^2)}{t^2}+\frac {3t^2\sin (t^3)}{t^3}} \end {prompt}$

Implicit differentiation

We’ve used implicit differentiation to compute $\dd [y]{x}$ when $y$ is given as an implicit function of $x$ . Now we’ll revisit this with the chain rule and give a new, simpler, method of finding $\dd [y]{x}$ .

For instance, consider the implicit function $x^2y-xy^3=3$ . We learned to use the following steps to find $\dd [y]{x}$ :

$\begin{align*} \ddx\Big(x^2y-xy^3\Big) &= \ddx\Big(3\Big) \\ 2xy + x^2\dd[y]{x}-y^3-3xy^2\dd[y]{x} &= 0\\ \dd[y]{x} &= -\frac{2xy-y^3}{x^2-3xy^2}. \end{align*}$

Instead of using this method, consider $z=F(x,y)=x^2y-xy^3$ . The implicit function above describes the level curve $z=3$ . Considering $x$ and $y$ as functions of $x$ , the chain rule states that $\dd [z]{x} = \pp [z]{x}\dd [x]{x}+\pp [z]{y}\dd [y]{x}.$ Since $z$ is constant along the level curve (in our example, $z=3$ ), $\dd [z]{x} = 0$ . We also know $\dd [x]{x} = 1$ . Write with me,

$\begin{align*} 0 &= \pp[z]{x}(1) + \pp[z]{y}\dd[y]{x} \\ \dd[y]{x} &= -\pp[z]{x}\Big/\pp[z]{y}\\ &= -\frac{F^{(1,0)}(x,y)}{F^{(0,1)}(x,y)}. \end{align*}$

Note how our solution for $\dd [y]{x}$ above is just the negative of the partial derivative of $z$ with respect to $x$ , divided by the partial derivative of $z$ with respect to $y$ . We state the above as a theorem.

Let $F:\R ^2\to \R$ be a differentiable function of $x$ and $y$ , where $F(x,y)=c$ defines $y$ as an implicit function of $x$ , for some constant $c$ . Then, wherever $F^{(0,1)}(x,y)\neq 0$ , $\dd [y]{x} = -\frac {F^{(1,0)}(x,y)}{F^{(0,1)}(x,y)}.$
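The theorem can be checked numerically on the curve $x^2y-xy^3=3$ from earlier. The sketch below rests on several assumptions: it picks the point with $x=3$ , solves for the nearby $y$ value with Newton's method starting from the guess $0.3$ , and compares the slope given by the theorem with a central-difference slope of the numerically solved curve. The helper names, starting guess, and step size are illustrative choices.

```python
def F(x, y):
    return x**2 * y - x * y**3

def F_x(x, y):
    """Partial derivative of F with respect to x."""
    return 2 * x * y - y**3

def F_y(x, y):
    """Partial derivative of F with respect to y."""
    return x**2 - 3 * x * y**2

def solve_y(x, y_guess, c=3.0, steps=50):
    """Newton's method in y for F(x, y) = c with x held fixed."""
    y = y_guess
    for _ in range(steps):
        y -= (F(x, y) - c) / F_y(x, y)
    return y

def slope_from_theorem(x, y):
    return -F_x(x, y) / F_y(x, y)

x0 = 3.0
y0 = solve_y(x0, 0.3)   # a point (x0, y0) on the level curve F = 3
h = 1e-6
slope_numeric = (solve_y(x0 + h, y0) - solve_y(x0 - h, y0)) / (2 * h)
print(y0, slope_from_theorem(x0, y0), slope_numeric)
```

The two slopes should agree to several digits. The check only makes sense near points where $F^{(0,1)}(x,y)$ is nonzero, which is also where Newton's method in $y$ is well behaved.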

Try your hand at this.

Given the implicitly defined function $\sin (x^2y^2)+y^3=x+y$ , find $y'$ .

Consider $F(x,y) = \sin (x^2y^2)+y^3-x-y$ , and find $F^{(1,0)}(x,y)$ , and $F^{(0,1)}(x,y)$ .

$\dd [y]{x} = \answer {\frac {-2xy^2\cos (x^2y^2)+1}{2x^2y\cos (x^2y^2)+3y^2-1}}$