Chain Rule

Give me six hours to chop down a tree and I will spend the first four sharpening the axe, Abraham Lincoln

Chain Rule. Let $\mathbf{f}:\mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{g}:\mathbb{R}^m \to \mathbb{R}^p$ be vector-valued functions, and assume that $\mathbf{f}$ is differentiable at $x \in \operatorname{dom}(\mathbf{f})$ and $\mathbf{g}$ is differentiable at $\mathbf{f}(x) \in \operatorname{dom}(\mathbf{g})$. Then the composition $\mathbf{h}:\mathbb{R}^n \to \mathbb{R}^p$ defined by $\mathbf{h}(x) = \mathbf{g}(\mathbf{f}(x))$ is also differentiable at x, and its derivative is $\mathbf{Dh(x) = Dg(f(x))Df(x)}$. In particular, if m = p = 1, h is a scalar-valued function from $\mathbb{R}^n \to \mathbb{R}$ differentiable at x, and its gradient is given by $\nabla h(x) = g'(f(x))\nabla f(x)$, where $g'(f(x))$ is just the ordinary derivative of g (a single-variable function) evaluated at f(x).

$Df(x)$ is an m×n matrix (the Jacobian of f at x) and $Dg(f(x))$ is a p×m matrix (the Jacobian of g at the point f(x)). Their product is therefore a p×n matrix, matching the dimensions of $Dh(x)$.

The chain rule essentially states that linear approximations compose: a small change in x produces a change in f(x) described by $Df(x)$, which in turn produces a change in h described by $Dg(f(x))$. If m = p = 1, the small change in h for a small change in x is governed by how f changes in each direction (the gradient $\nabla f(x)$), scaled by the rate of change of g at f(x).
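The dimension bookkeeping and the product formula can be checked numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the maps `f` (from $\mathbb{R}^2$ to $\mathbb{R}^3$) and `g` (from $\mathbb{R}^3$ to $\mathbb{R}^2$) are hypothetical choices made only for illustration, and the helpers `jacobian` and `matmul` are likewise assumed names. Jacobians are approximated by central differences, so the comparison holds only up to a small numerical error.

```python
import math

def jacobian(func, x, eps=1e-6):
    """Approximate the Jacobian of func at x by central differences.

    func maps a list of n floats to a list of m floats;
    the result is a list of m rows with n columns each.
    """
    n = len(x)
    m = len(func(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Product of a (p x m) matrix A and an (m x n) matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical illustrative maps (assumptions, not taken from the text).
def f(x):  # R^2 -> R^3, so n = 2 and m = 3
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def g(u):  # R^3 -> R^2, so p = 2
    return [u[0] + u[1] * u[2], math.exp(u[0])]

def h(x):  # the composition g(f(x)), R^2 -> R^2
    return g(f(x))

x0 = [0.5, -1.2]
Df = jacobian(f, x0)          # m x n = 3 x 2
Dg = jacobian(g, f(x0))       # p x m = 2 x 3, evaluated at f(x0)
Dh_chain = matmul(Dg, Df)     # p x n = 2 x 2, via the chain rule
Dh_direct = jacobian(h, x0)   # p x n = 2 x 2, differentiating h directly

max_err = max(abs(Dh_chain[i][j] - Dh_direct[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_err)
```

Note that `Dg` is evaluated at `f(x0)`, not at `x0`; evaluating it at the wrong point is the most common mistake when applying the rule by hand.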

Examples:

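Let $\mathbf{f}:\mathbb{R} \to \mathbb{R}^2$, $\mathbf{f}(x) = (\sin(x), \cos(x))$, and $g:\mathbb{R}^2 \to \mathbb{R}$, $g(u, v) = u^2 + v^2$, so that $h(x) = g(\mathbf{f}(x)) = \sin^2(x) + \cos^2(x) = 1$ (here n = 1, m = 2, p = 1). Let's compute $\mathbf{Dh(x) = Dg(f(x))Df(x)}$. The derivative $Df(x)$ with respect to x is an m×n, 2×1 matrix (a column vector): $D\mathbf{f}(x) = (\begin{smallmatrix}\cos(x)\\ -\sin(x)\end{smallmatrix})$. The derivative $Dg(f(x))$ is a p×m, 1×2 matrix (a row vector): $Dg(\mathbf{f}(x)) = (\begin{smallmatrix}2\sin(x) & 2\cos(x)\end{smallmatrix})$.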

Apply the Chain Rule: $\mathbf{Dh(x) = Dg(f(x))Df(x)} = (\begin{smallmatrix}2\sin(x) & 2\cos(x)\end{smallmatrix})(\begin{smallmatrix}\cos(x)\\ -\sin(x)\end{smallmatrix}) = 2\sin(x)\cos(x)+2\cos(x)(-\sin(x)) = 0$ (a p×n, 1×1 matrix, i.e., a real value). Indeed, since h(x) = 1 is constant, its derivative is zero.
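As a quick sanity check, the product above can be evaluated at a few sample points. This is a minimal sketch, assuming $f(x) = (\sin(x), \cos(x))$ and $g(u, v) = u^2 + v^2$, which is what the two Jacobians in this example correspond to; the function names `Df`, `Dg`, and `Dh` are illustrative.

```python
import math

def Df(x):
    # 2x1 Jacobian (column vector) of f(x) = (sin x, cos x)
    return [[math.cos(x)], [-math.sin(x)]]

def Dg(u, v):
    # 1x2 Jacobian (row vector) of g(u, v) = u^2 + v^2
    return [[2 * u, 2 * v]]

def Dh(x):
    # Chain rule: (1x2 row) times (2x1 column) gives a 1x1 value.
    u, v = math.sin(x), math.cos(x)   # Dg is evaluated at f(x)
    row = Dg(u, v)[0]
    col = Df(x)
    return row[0] * col[0][0] + row[1] * col[1][0]

# Sample points chosen arbitrarily for illustration.
values = [Dh(x) for x in (0.0, 0.7, 2.0, -3.1)]
print(values)
```

Every entry should be zero up to floating-point rounding, in line with h being constant.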

Let $\mathbf{f}(x, y) = (f_1(x, y), f_2(x, y))$ with $f_1(x, y)=x^2+y^2$ and $f_2(x, y) = x - y$. Then $\frac{\partial f_1}{\partial x} = 2x, \frac{\partial f_1}{\partial y} = 2y, \frac{\partial f_2}{\partial x} = 1, \frac{\partial f_2}{\partial y} = -1$, so $D\mathbf{f}(x, y) = (\begin{smallmatrix}2x & 2y\\ 1 & -1\end{smallmatrix})$, an m×n, 2×2 matrix whose rows are the gradients of $f_1$ and $f_2$, respectively.

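For an outer function whose Jacobian is constant, such as $g(u, v) = u + v$, we have $Dg(\mathbf{f}(x, y)) = (\begin{smallmatrix}1 & 1\end{smallmatrix})$ (a p×m, 1×2 row vector). Apply the Chain Rule: $\mathbf{Dh(x, y) = Dg(f(x, y))Df(x, y)} = (\begin{smallmatrix}1 & 1\end{smallmatrix})(\begin{smallmatrix}2x & 2y\\ 1 & -1\end{smallmatrix}) = (\begin{smallmatrix}2x + 1 & 2y -1\end{smallmatrix})$, a 1×2 row vector. With this choice of g, $h(x, y) = x^2 + y^2 + x - y$, and differentiating directly gives the same result.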
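The same result can be cross-checked against a finite-difference derivative of h. This is a sketch under a stated assumption: the outer function is taken to be $g(u, v) = u + v$, which is consistent with the row vector $(1 \; 1)$ used above. The evaluation point and the helper names are illustrative.

```python
def Df(x, y):
    # 2x2 Jacobian of f(x, y) = (x^2 + y^2, x - y)
    return [[2 * x, 2 * y],
            [1.0, -1.0]]

def Dg(u, v):
    # 1x2 Jacobian of the assumed outer function g(u, v) = u + v
    return [[1.0, 1.0]]

def h(x, y):
    # Composition g(f(x, y)) under the assumption above
    u, v = x * x + y * y, x - y
    return u + v

def Dh_chain(x, y):
    # Chain rule: (1x2) times (2x2) gives a 1x2 row vector.
    J_f = Df(x, y)
    J_g = Dg(x * x + y * y, x - y)
    return [sum(J_g[0][k] * J_f[k][j] for k in range(2)) for j in range(2)]

def Dh_numeric(x, y, eps=1e-6):
    # Central differences in each coordinate direction.
    return [(h(x + eps, y) - h(x - eps, y)) / (2 * eps),
            (h(x, y + eps) - h(x, y - eps)) / (2 * eps)]

x0, y0 = 1.5, -2.0                # arbitrary sample point
chain = Dh_chain(x0, y0)          # formula (2x + 1, 2y - 1) gives (4.0, -5.0)
numeric = Dh_numeric(x0, y0)
print(chain, numeric)
```

The two vectors should agree up to the finite-difference error, and both should match $(2x + 1, 2y - 1)$ at the chosen point.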

JustToThePoint Copyright © 2011 - 2025 Anawim. ALL RIGHTS RESERVED. Bilingual e-books, articles, and videos to help your child and your entire family succeed, develop a healthy lifestyle, and have a lot of fun. Social Issues, Join us.