Section 13.7 : Directional Derivatives
To this point we’ve only looked at the two partial derivatives \({f_x}\left( {x,y} \right)\) and \({f_y}\left( {x,y} \right)\). Recall that these derivatives represent the rate of change of \(f\) as we vary \(x\) (holding \(y\) fixed) and as we vary \(y\) (holding \(x\) fixed) respectively. We now need to discuss how to find the rate of change of \(f\) if we allow both \(x\) and \(y\) to change simultaneously. The problem here is that there are many ways to allow both \(x\) and \(y\) to change. For instance, one could be changing faster than the other, and there is also the issue of whether each is increasing or decreasing. So, before we get into finding the rate of change, we need to take care of a couple of preliminary ideas. The main idea that we need to look at is just how we are going to describe the changing of \(x\) and/or \(y\).
Let’s start off by supposing that we wanted the rate of change of \(f\) at a particular point, say \(\left( {{x_0},{y_0}} \right)\). Let’s also suppose that both \(x\) and \(y\) are increasing and that, in this case, \(x\) is increasing twice as fast as \(y\) is increasing. So, as \(y\) increases one unit of measure \(x\) will increase two units of measure.
To help us see how we’re going to define this change let’s suppose that a particle is sitting at \(\left( {{x_0},{y_0}} \right)\) and the particle will move in the direction given by the changing \(x\) and \(y\). Therefore, the particle will move off in a direction of increasing \(x\) and \(y\) and the \(x\) coordinate of the point will increase twice as fast as the \(y\) coordinate. Now that we’re thinking of this changing \(x\) and \(y\) as a direction of movement we can get a way of defining the change. We know from Calculus II that vectors can be used to define a direction and so the particle, at this point, can be said to be moving in the direction,
\[\vec v = \left\langle {2,1} \right\rangle \]Since this vector can be used to define how a particle at a point is changing we can also use it to describe how \(x\) and/or \(y\) is changing at a point. For our example we will say that we want the rate of change of \(f\) in the direction of \(\vec v = \left\langle {2,1} \right\rangle \). In this way we will know that \(x\) is increasing twice as fast as \(y\) is. There is still a small problem with this however. There are many vectors that point in the same direction. For instance, all of the following vectors point in the same direction as \(\vec v = \left\langle {2,1} \right\rangle \).
\[\vec v = \left\langle {\frac{1}{5},\frac{1}{{10}}} \right\rangle \,\hspace{0.25in}\,\,\,\vec v = \left\langle {6,3} \right\rangle \hspace{0.25in}\vec v = \left\langle {\frac{2}{{\sqrt 5 }},\frac{1}{{\sqrt 5 }}} \right\rangle \]We need a way to consistently find the rate of change of a function in a given direction. We will do this by insisting that the vector that defines the direction of change be a unit vector. Recall that a unit vector is a vector with length, or magnitude, of 1. This means that for the example that we started off thinking about we would want to use
\[\vec v = \left\langle {\frac{2}{{\sqrt 5 }},\frac{1}{{\sqrt 5 }}} \right\rangle \]since this is the unit vector that points in the direction of change.
For reference purposes recall that the magnitude or length of the vector \(\vec v = \left\langle {a,b,c} \right\rangle \) is given by,
\[\left\| {\vec v} \right\| = \sqrt {{a^2} + {b^2} + {c^2}} \]For two dimensional vectors we drop the \(c\) from the formula.
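As a quick numerical check that parallel vectors all share one unit vector, here is a minimal Python sketch that divides each vector by its magnitude. The helper names `magnitude` and `unit_vector` are our own illustrative choices, not part of any library.

```python
import math

def magnitude(v):
    """Length of a vector given as a tuple of components (2D or 3D)."""
    return math.sqrt(sum(c * c for c in v))

def unit_vector(v):
    """Unit vector pointing in the same direction as v (v must be nonzero)."""
    length = magnitude(v)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(c / length for c in v)

# Each of these points in the same direction as <2, 1> ...
vectors = [(2, 1), (1 / 5, 1 / 10), (6, 3)]
# ... so each one normalizes to <2/sqrt(5), 1/sqrt(5)>, up to rounding.
for v in vectors:
    print(v, "->", unit_vector(v))
```

Requiring a unit vector is what makes the direction unique: scaling \(\vec v\) by any positive number leaves `unit_vector(v)` unchanged.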
Sometimes we will give the direction of changing \(x\) and \(y\) as an angle. For instance, we may say that we want the rate of change of \(f\) in the direction of \(\theta = \frac{\pi }{3}\). The unit vector that points in this direction is given by,
\[\vec u = \left\langle {\cos \theta ,\sin \theta } \right\rangle \]Okay, now that we know how to define the direction of changing \(x\) and \(y\) it’s time to start talking about finding the rate of change of \(f\) in this direction. Let’s start off with the official definition.
Definition
The rate of change of \(f\left( {x,y} \right)\) in the direction of the unit vector \(\vec u = \left\langle {a,b} \right\rangle \) is called the directional derivative and is denoted by \({D_{\vec u}}f\left( {x,y} \right)\). The definition of the directional derivative is,
\[{D_{\vec u}}f\left( {x,y} \right) = \mathop {\lim }\limits_{h \to 0} \frac{{f\left( {x + ah,y + bh} \right) - f\left( {x,y} \right)}}{h}\]So, the definition of the directional derivative is very similar to the definition of partial derivatives. However, in practice this can be a very difficult limit to compute so we need an easier way of taking directional derivatives. It’s actually fairly simple to derive an equivalent formula for taking directional derivatives.
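To see the limit definition at work, here is a minimal Python sketch that evaluates the difference quotient inside the limit for shrinking values of \(h\). The function \(f\left( {x,y} \right) = {x^2} + 3{y^2}\), the point \(\left( {1,1} \right)\), and the helper name `difference_quotient` are hypothetical choices made only for illustration.

```python
import math

def difference_quotient(f, x, y, a, b, h):
    """The quotient inside the limit defining D_u f(x, y) for u = <a, b>."""
    return (f(x + a * h, y + b * h) - f(x, y)) / h

# Hypothetical example function, chosen only for illustration.
def f(x, y):
    return x ** 2 + 3 * y ** 2

# Unit vector in the direction of <2, 1>.
a, b = 2 / math.sqrt(5), 1 / math.sqrt(5)

for h in [0.1, 0.01, 0.001, 1e-6]:
    print(h, difference_quotient(f, 1.0, 1.0, a, b, h))
```

For this particular \(f\) the quotient simplifies to \(2\sqrt 5 + 1.4h\), so the printed values decrease toward \(2\sqrt 5 \approx 4.472\) as \(h\) shrinks. Estimating the limit this way only gives an approximation, which is one reason an exact formula is worth deriving.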
To see how we can do this let’s define a new function of a single variable,
\[g\left( z \right) = f\left( {{x_0} + az,{y_0} + bz} \right)\]where \({x_0}\), \({y_0}\), \(a\), and \(b\) are some fixed numbers. Note that this really is a function of a single variable now since \(z\) is the only letter that is not representing a fixed number.
Then by the definition of the derivative for functions of a single variable we have,
\[g'\left( z \right) = \mathop {\lim }\limits_{h \to 0} \frac{{g\left( {z + h} \right) - g\left( z \right)}}{h}\]and the derivative at \(z = 0\) is given by,
\[g'\left( 0 \right) = \mathop {\lim }\limits_{h \to 0} \frac{{g\left( h \right) - g\left( 0 \right)}}{h}\]If we now substitute in for \(g\left( z \right)\) we get,
\[g'\left( 0 \right) = \mathop {\lim }\limits_{h \to 0} \frac{{g\left( h \right) - g\left( 0 \right)}}{h} = \mathop {\lim }\limits_{h \to 0} \frac{{f\left( {{x_0} + ah,{y_0} + bh} \right) - f\left( {{x_0},{y_0}} \right)}}{h} = {D_{\vec u}}f\left( {{x_0},{y_0}} \right)\]So, it looks like we have the following relationship.
\[\begin{equation}g'\left( 0 \right) = {D_{\vec u}}f\left( {{x_0},{y_0}} \right) \label{eq:eq1} \end{equation}\]Now, let’s look at this from another perspective. Let’s rewrite \(g\left( z \right)\) as follows,
\[g\left( z \right) = f\left( {x,y} \right)\,\,\,\,{\mbox{where}}\,\,\,x = {x_0} + az{\mbox{ and }}y = {y_0} + bz\]We can now use the chain rule from the previous section to compute,
\[g'\left( z \right) = \frac{{dg}}{{dz}} = \frac{{\partial f}}{{\partial x}}\frac{{dx}}{{dz}} + \frac{{\partial f}}{{\partial y}}\frac{{dy}}{{dz}} = {f_x}\left( {x,y} \right)a + {f_y}\left( {x,y} \right)b\]So, from the chain rule we get the following relationship.
\[\begin{equation}g'\left( z \right) = {f_x}\left( {x,y} \right)a + {f_y}\left( {x,y} \right)b \label{eq:eq2}\end{equation}\]If we now take \(z = 0\) we get \(x = {x_0}\) and \(y = {y_0}\) (from how we defined \(x\) and \(y\) above), and plugging these into \(\eqref{eq:eq2}\) gives,
\[\begin{equation}g'\left( 0 \right) = {f_x}\left( {{x_0},{y_0}} \right)a + {f_y}\left( {{x_0},{y_0}} \right)b \label{eq:eq3}\end{equation}\]Now, simply equate \(\eqref{eq:eq1}\) and \(\eqref{eq:eq3}\) to get that,
\[{D_{\vec u}}f\left( {{x_0},{y_0}} \right) = g'\left( 0 \right) = {f_x}\left( {{x_0},{y_0}} \right)a + {f_y}\left( {{x_0},{y_0}} \right)b\]If we now go back to allowing \(x\) and \(y\) to be any number we get the following formula for computing directional derivatives.
\[{D_{\vec u}}f\left( {x,y} \right) = {f_x}\left( {x,y} \right)a + {f_y}\left( {x,y} \right)b\]
This is much simpler than the limit definition. Also note that this formula assumed that we were working with functions of two variables. Similar formulas can be derived by the same type of argument for functions of more than two variables. For instance, the directional derivative of \(f\left( {x,y,z} \right)\) in the direction of the unit vector \(\vec u = \left\langle {a,b,c} \right\rangle \) is given by,
\[{D_{\vec u}}f\left( {x,y,z} \right) = {f_x}\left( {x,y,z} \right)a + {f_y}\left( {x,y,z} \right)b + {f_z}\left( {x,y,z} \right)c\]
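The derivation above was written for two variables, but the same idea works in three. The following minimal Python sketch compares the partial-derivative formula with a direct estimate of \(g'\left( 0 \right)\), where \(g\) moves along the line through the point in the direction \(\vec u\). The function \(f\left( {x,y,z} \right) = xy + {z^2}\), the point \(\left( {1,2,3} \right)\), the direction, and all helper names are hypothetical choices for illustration.

```python
import math

# Hypothetical example: f(x, y, z) = xy + z^2, with partials (y, x, 2z).
def f(x, y, z):
    return x * y + z ** 2

def grad_f(x, y, z):
    return (y, x, 2 * z)  # (f_x, f_y, f_z)

def directional_derivative(grad, point, u):
    """Formula D_u f = f_x a + f_y b + f_z c for a unit vector u = <a, b, c>."""
    fx, fy, fz = grad(*point)
    a, b, c = u
    return fx * a + fy * b + fz * c

point = (1.0, 2.0, 3.0)
u = (1 / 3, 2 / 3, 2 / 3)  # <1, 2, 2> divided by its length, 3

def g(s):
    """g(s) = f(x0 + a s, y0 + b s, z0 + c s); s plays the role of z in the derivation."""
    return f(*(p + s * comp for p, comp in zip(point, u)))

h = 1e-5
g_prime_at_0 = (g(h) - g(-h)) / (2 * h)  # central-difference estimate of g'(0)

print(directional_derivative(grad_f, point, u))  # 16/3, about 5.333
print(g_prime_at_0)                              # agrees up to rounding
```

Because this \(g\) is a quadratic in \(s\), the central difference matches \(g'\left( 0 \right)\) up to floating-point rounding, so any visible mismatch would point to an error in the partial derivatives.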
Let’s work a couple of examples. Find each of the following directional derivatives.
- \({D_{\vec u}}f\left( {2,0} \right)\) where \(f\left( {x,y} \right) = x{{\bf{e}}^{xy}} + y\) and \(\vec u\) is the unit vector in the direction of \(\displaystyle \theta = \frac{{2\pi }}{3}\).
- \({D_{\vec u}}f\left( {x,y,z} \right)\) where \(f\left( {x,y,z} \right) = {x^2}z + {y^3}{z^2} - xyz\) in the direction of \(\vec v = \left\langle { - 1,0,3} \right\rangle \).
For part (a), we’ll first find \({D_{\vec u}}f\left( {x,y} \right)\) and then use this as a formula for finding \({D_{\vec u}}f\left( {2,0} \right)\). The unit vector giving the direction is,
\[\vec u = \left\langle {\cos \left( {\frac{{2\pi }}{3}} \right),\sin \left( {\frac{{2\pi }}{3}} \right)} \right\rangle = \left\langle { - \frac{1}{2},\frac{{\sqrt 3 }}{2}} \right\rangle \]So, the directional derivative is,
\[{D_{\vec u}}f\left( {x,y} \right) = \left( { - \frac{1}{2}} \right)\left( {{{\bf{e}}^{xy}} + xy{{\bf{e}}^{xy}}} \right) + \left( {\frac{{\sqrt 3 }}{2}} \right)\left( {{x^2}{{\bf{e}}^{xy}} + 1} \right)\]Now, plugging in the point in question gives,
\[{D_{\vec u}}f\left( {2,0} \right) = \left( { - \frac{1}{2}} \right)\left( 1 \right) + \left( {\frac{{\sqrt 3 }}{2}} \right)\left( 5 \right) = \frac{{5\sqrt 3 - 1}}{2}\]
For part (b), let’s first check whether the direction vector is a unit vector and, if it isn’t, convert it into one. To do this all we need to do is compute its magnitude.
\[\left\| {\vec v} \right\| = \sqrt {1 + 0 + 9} = \sqrt {10} \ne 1\]So, it’s not a unit vector. Recall that we can convert any vector into a unit vector that points in the same direction by dividing the vector by its magnitude. So, the unit vector that we need is,
\[\vec u = \frac{1}{{\sqrt {10} }}\left\langle { - 1,0,3} \right\rangle = \left\langle { - \frac{1}{{\sqrt {10} }},0,\frac{3}{{\sqrt {10} }}} \right\rangle \]The directional derivative is then,
\[\begin{align*}{D_{\vec u}}f\left( {x,y,z} \right) & = \left( { - \frac{1}{{\sqrt {10} }}} \right)\left( {2xz - yz} \right) + \left( 0 \right)\left( {3{y^2}{z^2} - xz} \right) + \left( {\frac{3}{{\sqrt {10} }}} \right)\left( {{x^2} + 2{y^3}z - xy} \right)\\ & = \frac{1}{{\sqrt {10} }}\left( {3{x^2} + 6{y^3}z - 3xy - 2xz + yz} \right)\end{align*}\]There is another form of the formula that we used to get the directional derivative that is a little nicer and somewhat more compact. It is also a much more general formula that will encompass both of the formulas above.
Let’s start with the second one and notice that we can write it as follows,
\[\begin{align*}{D_{\vec u}}f\left( {x,y,z} \right) & = {f_x}\left( {x,y,z} \right)a + {f_y}\left( {x,y,z} \right)b + {f_z}\left( {x,y,z} \right)c\\ & = \left\langle {{f_x},{f_y},{f_z}} \right\rangle \centerdot \left\langle {a,b,c} \right\rangle \end{align*}\]In other words, we can write the directional derivative as a dot product and notice that the second vector is nothing more than the unit vector \(\vec u\) that gives the direction of change. Also, if we had used the version for functions of two variables the third component wouldn’t be there, but other than that the formula would be the same.
Now let’s give a name and notation to the first vector in the dot product since this vector will show up fairly regularly throughout this course (and in other courses). The gradient of \(f\) or gradient vector of \(f\) is defined to be,
\[\nabla f = \left\langle {{f_x},{f_y},{f_z}} \right\rangle \hspace{0.25in}{\mbox{or}}\hspace{0.5in}\nabla f = \left\langle {{f_x},{f_y}} \right\rangle \]Or, if we want to use the standard basis vectors the gradient is,
\[\nabla f = {f_x}\,\vec i + {f_y}\vec j + {f_z}\,\vec k\hspace{0.5in}{\mbox{or}}\hspace{0.5in}\nabla f = {f_x}\,\vec i + {f_y}\vec j\]The definition is only shown for functions of two or three variables; however, there is a natural extension to functions of as many variables as we’d like.
With the definition of the gradient we can now say that the directional derivative is given by,
\[{D_{\vec u}}f = \nabla f\centerdot \vec u\]where we will no longer show the variables and will use this formula for any number of variables. Note as well that we will sometimes use the following notation,
\[{D_{\vec u}}f\left( {\vec x} \right) = \nabla f\centerdot \vec u\]where \(\vec x = \left\langle {x,y,z} \right\rangle \) or \(\vec x = \left\langle {x,y} \right\rangle \) as needed. This notation will be used when we want to note the variables in some way, but don’t really want to restrict ourselves to a particular number of variables. In other words, \(\vec x\) will be used to represent as many variables as we need in the formula and we will most often use this notation when we are already using vectors or vector notation in the problem/formula.
Let’s work a couple of examples using this formula for the directional derivative. Find each of the following directional derivatives.
- \({D_{\vec u}}f\left( {\vec x} \right)\) for \(f\left( {x,y} \right) = x\cos \left( y \right)\) in the direction of \(\vec v = \left\langle {2,1} \right\rangle \).
- \({D_{\vec u}}f\left( {\vec x} \right)\) for \(f\left( {x,y,z} \right) = \sin \left( {yz} \right) + \ln \left( {{x^2}} \right)\) at \(\left( {1,1,\pi } \right)\) in the direction of \(\vec v = \left\langle {1,1, - 1} \right\rangle \).
For part (a), let’s first compute the gradient for this function.
\[\nabla f = \left\langle {\cos \left( y \right), - x\sin \left( y \right)} \right\rangle \]Also, as we saw earlier in this section the unit vector for this direction is,
\[\vec u = \left\langle {\frac{2}{{\sqrt 5 }},\frac{1}{{\sqrt 5 }}} \right\rangle \]The directional derivative is then,
\[\begin{align*}{D_{\vec u}}f\left( {\vec x} \right) & = \left\langle {\cos \left( y \right), - x\sin \left( y \right)} \right\rangle \centerdot \left\langle {\frac{2}{{\sqrt 5 }},\frac{1}{{\sqrt 5 }}} \right\rangle \\ & = \frac{1}{{\sqrt 5 }}\left( {2\cos \left( y \right) - x\sin \left( y \right)} \right)\end{align*}\]
For part (b) we are asking for the directional derivative at a particular point. To do this we will first compute the gradient, evaluate it at the point in question and then do the dot product. So, let’s get the gradient.
\[\begin{align*}\nabla f\left( {x,y,z} \right) & = \left\langle {\frac{2}{x},z\cos \left( {yz} \right),y\cos \left( {yz} \right)} \right\rangle \\ \nabla f\left( {1,1,\pi } \right) & = \left\langle {\frac{2}{1},\pi \cos \left( \pi \right),\cos \left( \pi \right)} \right\rangle = \left\langle {2, - \pi , - 1} \right\rangle \end{align*}\]Next, we need the unit vector for the direction,
\[\left\| {\vec v} \right\| = \sqrt 3 \hspace{0.5in}\vec u = \left\langle {\frac{1}{{\sqrt 3 }},\frac{1}{{\sqrt 3 }}, - \frac{1}{{\sqrt 3 }}} \right\rangle \]Finally, the directional derivative at the point in question is,
\[\begin{align*}{D_{\vec u}}f\left( {1,1,\pi } \right) & = \left\langle {2, - \pi , - 1} \right\rangle \centerdot \left\langle {\frac{1}{{\sqrt 3 }},\frac{1}{{\sqrt 3 }}, - \frac{1}{{\sqrt 3 }}} \right\rangle \\ & = \frac{1}{{\sqrt 3 }}\left( {2 - \pi + 1} \right)\\ & = \frac{{3 - \pi }}{{\sqrt 3 }}\end{align*}\]Before proceeding let’s note that the first order partial derivatives that we were looking at earlier in this chapter can be thought of as special cases of directional derivatives. For instance, \({f_x}\) can be thought of as the directional derivative of \(f\) in the direction of \(\vec u = \left\langle {1,0} \right\rangle \) or \(\vec u = \left\langle {1,0,0} \right\rangle \), depending on the number of variables that we’re working with. The same can be done for \({f_y}\) and \({f_z}\).
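Here is a minimal Python sketch that reproduces part (b) of the second example from the hand-computed gradient and also illustrates the remark that \({f_x}\) is simply the directional derivative in the direction \(\left\langle {1,0,0} \right\rangle \). The helper names `grad_f`, `dot`, and `unit` are our own.

```python
import math

def grad_f(x, y, z):
    """Gradient of f(x, y, z) = sin(yz) + ln(x^2), computed by hand above."""
    return (2 / x, z * math.cos(y * z), y * math.cos(y * z))

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def unit(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

g = grad_f(1, 1, math.pi)       # approximately <2, -pi, -1>
part_b = dot(g, unit((1, 1, -1)))
print(part_b, (3 - math.pi) / math.sqrt(3))  # both about -0.0817

# f_x as a special case: the directional derivative along <1, 0, 0>.
print(dot(g, (1, 0, 0)))        # 2.0, the value of f_x(1, 1, pi)
```

Evaluating the gradient at the point first, and only then taking the dot product, follows the same order of steps used in the worked solution.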
We will close out this section with a couple of nice facts about the gradient vector. The first tells us how to determine the maximum rate of change of a function at a point and the direction that we need to move in order to achieve that maximum rate of change.
Theorem
The maximum value of \({D_{\vec u}}f\left( {\vec x} \right)\) (and hence the maximum rate of change of the function \(f\left( {\vec x} \right)\)) is given by \(\left\| {\nabla f\left( {\vec x} \right)} \right\|\) and will occur in the direction given by \(\nabla f\left( {\vec x} \right)\).
Proof
This is a really simple proof. If we start with the dot product form of \({D_{\vec u}}f\left( {\vec x} \right)\), use the fact that \(\vec a\centerdot \vec b = \left\| {\vec a} \right\|\,\left\| {\vec b} \right\|\cos \theta \), and remember that \(\vec u\) is a unit vector, we get,
\[{D_{\vec u}}f = \nabla f\centerdot \vec u = \left\| {\nabla f} \right\|\,\,\left\| {\vec u} \right\|\cos \theta = \left\| {\nabla f} \right\|\cos \theta \]
where \(\theta \) is the angle between the gradient and \(\vec u\).
Now the largest possible value of \(\cos \theta \) is 1, which occurs at \(\theta = 0\). Therefore the maximum value of \({D_{\vec u}}f\left( {\vec x} \right)\) is \(\left\| {\nabla f\left( {\vec x} \right)} \right\|\). Also, the maximum value occurs when the angle between the gradient and \(\vec u\) is zero, or in other words when \(\vec u\) is pointing in the same direction as the gradient, \(\nabla f\left( {\vec x} \right)\).
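The theorem can also be checked by brute force. The minimal Python sketch below samples unit vectors \(\vec u = \left\langle {\cos \theta ,\sin \theta } \right\rangle \) every 0.1 degree and records the largest value of \(\nabla f\centerdot \vec u\). As an illustrative assumption we reuse the gradient \(\left\langle {1,5} \right\rangle \) of \(f\left( {x,y} \right) = x{{\bf{e}}^{xy}} + y\) at \(\left( {2,0} \right)\), whose partial derivatives were evaluated in the first example.

```python
import math

# Gradient of f(x, y) = x e^(xy) + y at (2, 0): <f_x(2, 0), f_y(2, 0)> = <1, 5>.
grad = (1.0, 5.0)

best_value, best_u = -math.inf, None
for k in range(3600):  # sample unit vectors every 0.1 degree
    theta = 2 * math.pi * k / 3600
    u = (math.cos(theta), math.sin(theta))
    value = grad[0] * u[0] + grad[1] * u[1]  # D_u f = grad f . u
    if value > best_value:
        best_value, best_u = value, u

norm = math.hypot(*grad)
print(best_value, norm)                          # both about 5.099 = sqrt(26)
print(best_u, (grad[0] / norm, grad[1] / norm))  # nearly the same direction
```

The sampled maximum never exceeds \(\left\| {\nabla f} \right\|\), and the best sampled direction lines up with \(\nabla f\) to within the sampling resolution, as the proof predicts.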
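Let’s take a quick look at an example. Suppose that the height of a hill above sea level is given by \(z = f\left( {x,y} \right) = H - 0.01{x^2} - 0.02{y^2}\), where \(H\) is a constant (the height of the top of the hill). If we are at the point \(\left( {60,100} \right)\), in what direction is the elevation changing fastest, and what is the maximum rate of change of the elevation at this point?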
First, you will hopefully recall from the Quadric Surfaces section that this is an elliptic paraboloid that opens downward. So even though most hills aren’t this symmetrical it will at least be vaguely hill shaped and so the question makes at least a little sense.
Now on to the problem. There are a couple of questions to answer here, but using the theorem makes answering them very simple. We’ll first need the gradient vector.
\[\nabla f\left( {\vec x} \right) = \left\langle { - 0.02x, - 0.04y} \right\rangle \]The maximum rate of change of the elevation will then occur in the direction of
\[\nabla f\left( {60,100} \right) = \left\langle { - 1.2, - 4} \right\rangle \]The maximum rate of change of the elevation at this point is,
\[\left\| {\nabla f\left( {60,100} \right)} \right\| = \sqrt {{{\left( { - 1.2} \right)}^2} + {{\left( { - 4} \right)}^2}} = \sqrt {17.44} \approx 4.176\]Before leaving this example let’s note that we’re at the point \(\left( {60,100} \right)\) and the direction of greatest rate of change of the elevation at this point is given by the vector \(\left\langle { - 1.2, - 4} \right\rangle \). Since both of the components are negative, the direction of maximum rate of change points up the hill toward the center rather than away from the hill.
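The arithmetic in this example is easy to reproduce. Here is a minimal Python sketch using the gradient \(\left\langle { - 0.02x, - 0.04y} \right\rangle \) computed above; the function name `grad_elevation` is our own.

```python
import math

def grad_elevation(x, y):
    """Gradient of the hill's elevation function, as computed above."""
    return (-0.02 * x, -0.04 * y)

gx, gy = grad_elevation(60, 100)
max_rate = math.hypot(gx, gy)  # length of the gradient vector
print((gx, gy))   # approximately (-1.2, -4.0): direction of fastest increase
print(max_rate)   # about 4.176, i.e. sqrt(17.44)
```

Note that the constant height of the hilltop never enters the calculation, since it disappears when the partial derivatives are taken.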
The second fact about the gradient vector that we need to give in this section will be very convenient in some later sections.
Fact
The gradient vector \(\nabla f\left( {{x_0},{y_0}} \right)\) is orthogonal (or perpendicular) to the level curve \(f\left( {x,y} \right) = k\) at the point \(\left( {{x_0},{y_0}} \right)\). Likewise, the gradient vector \(\nabla f\left( {{x_0},{y_0},{z_0}} \right)\) is orthogonal to the level surface \(f\left( {x,y,z} \right) = k\) at the point \(\left( {{x_0},{y_0},{z_0}} \right)\).
Proof
We’re going to do the proof for the \({\mathbb{R}^3}\) case. The proof for the \({\mathbb{R}^2}\) case is identical. To make life easier, let’s first get some notation out of the way: let \(S\) be the level surface given by \(f\left( {x,y,z} \right) = k\) and let \(P = \left( {{x_0},{y_0},{z_0}} \right)\). Note as well that \(P\) lies on \(S\).
Now, let \(C\) be any curve on \(S\) that contains \(P\). Let \(\vec r\left( t \right) = \left\langle {x\left( t \right),y\left( t \right),z\left( t \right)} \right\rangle \) be the vector equation for \(C\) and suppose that \({t_0}\) is the value of \(t\) such that \(\vec r\left( {{t_0}} \right) = \left\langle {{x_0},{y_0},{z_0}} \right\rangle \). In other words, \({t_0}\) is the value of \(t\) that gives \(P\).
Because \(C\) lies on \(S\) we know that points on \(C\) must satisfy the equation for \(S\). Or,
\[f\left( {x\left( t \right),y\left( t \right),z\left( t \right)} \right) = k\]
Next, let’s use the Chain Rule on this to get,
\[\frac{{\partial f}}{{\partial x}}\frac{{dx}}{{dt}} + \frac{{\partial f}}{{\partial y}}\frac{{dy}}{{dt}} + \frac{{\partial f}}{{\partial z}}\frac{{dz}}{{dt}} = 0\]
Notice that \(\nabla f = \left\langle {{f_x},{f_y},{f_z}} \right\rangle \) and \(\vec r'\left( t \right) = \left\langle {x'\left( t \right),y'\left( t \right),z'\left( t \right)} \right\rangle \) so this becomes,
\[\nabla f\,\centerdot \,\vec r'\left( t \right) = 0\]
At \(t = {t_0}\) this is,
\[\nabla f\left( {{x_0},{y_0},{z_0}} \right)\,\centerdot \,\vec r'\left( {{t_0}} \right) = 0\]
This then tells us that the gradient vector at \(P\), \(\nabla f\left( {{x_0},{y_0},{z_0}} \right)\), is orthogonal to the tangent vector \(\vec r'\left( {{t_0}} \right)\) of any curve \(C\) that passes through \(P\) and lies on the surface \(S\), and so it must also be orthogonal to the surface \(S\).
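A quick numerical sanity check of this fact: the minimal Python sketch below takes the unit sphere \({x^2} + {y^2} + {z^2} = 1\) as the level surface \(S\) and a spiral \(\vec r\left( t \right)\) lying on it as the curve \(C\), then evaluates \(\nabla f\centerdot \vec r'\left( t \right)\) at a few sample values of \(t\). The surface, the curve, and the helper names are hypothetical choices for illustration; \(\vec r'\left( t \right)\) is differentiated by hand.

```python
import math

# Level surface S: f(x, y, z) = x^2 + y^2 + z^2 = 1 (the unit sphere).
def grad_f(x, y, z):
    return (2 * x, 2 * y, 2 * z)

# A curve C on S (a spiral on the sphere): |r(t)| = 1 for every t.
def r(t):
    return (math.sin(t) * math.cos(3 * t), math.sin(t) * math.sin(3 * t), math.cos(t))

def r_prime(t):
    """Tangent vector r'(t), differentiated by hand."""
    return (
        math.cos(t) * math.cos(3 * t) - 3 * math.sin(t) * math.sin(3 * t),
        math.cos(t) * math.sin(3 * t) + 3 * math.sin(t) * math.cos(3 * t),
        -math.sin(t),
    )

for t0 in [0.3, 1.0, 2.5]:
    g = grad_f(*r(t0))
    tangent = r_prime(t0)
    print(t0, sum(a * b for a, b in zip(g, tangent)))  # 0 up to rounding
```

Every printed dot product is zero up to floating-point rounding, which is exactly the statement \(\nabla f\centerdot \vec r'\left( t \right) = 0\) from the proof.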
As we will see in later sections, we are often going to need vectors that are orthogonal to a surface or curve, and this fact tells us that all we need to do to get one is compute a gradient vector. We will see the first application of this in the next chapter.