
Differentiation

“Behind this mask there is more than just flesh. Beneath this mask there is an idea… and ideas are bulletproof,” Alan Moore.


Differentiability at a point

Definition. Differentiability at a point. Let $f : ℝ^n \to ℝ^m$ be a function and let x be an interior point of the domain of f, $x \in \text{interior dom } f$. The function f is differentiable at x if there exists a matrix $Df(x) \in ℝ^{m \times n}$ that satisfies $\lim_{z \in \text{dom } f, z \neq x, z \to x} \frac{||f(z) - f(x) - Df(x)(z-x)||_2}{||z-x||_2} = 0$ [*]

Such a matrix Df(x) is called the derivative (or Jacobian) of f at x.

  1. $x \in \text{interior dom } f$ is very important because it ensures that for small displacements z - x (or h in the alternative notation), the point z (or x + h) remains within the domain of f.
  2. $z \neq x, z \to x$: we are calculating a limit, meaning that z approaches x but is never actually equal to x, because we are looking at the rate of change as we get arbitrarily close to x, not the value at x itself.
  3. Uniqueness of Df(x). If the limit exists, the matrix Df(x) satisfying the limit in [*] is unique. In other words, there’s only one linear transformation that best approximates the function f at the point x.
  4. Df(x)(z - x) represents the linear approximation of the change in f as we move from x to z. Df(x) is the Jacobian matrix (which represents the derivative as a linear transformation) and it is multiplied by the vector (z - x), a vector in ℝ^n.
  5. The norms. $||·||_2$ denotes the Euclidean norm of a vector, which measures its length or magnitude. The approximation term is Df(x)(z - x), and the norms appear around it: the numerator $||f(z) - f(x) - Df(x)(z-x)||_2$ is a norm in ℝ^m, while the denominator $||z-x||_2$ is a norm in ℝ^n. This is essential because we are dealing with vectors (not real numbers) and we need a way to measure their size or magnitude.
  6. The limit is 0: the entire expression inside the limit represents the relative error of the linear approximation. The limit being 0 means that this relative error becomes arbitrarily small as z gets closer to x. In other words, the linear approximation becomes increasingly accurate as we zoom in on the point x.
  7. Alternative form with h. If we write or substitute z = x + h, then as z → x, h → 0. This gives us an alternative or equivalent form: $\lim_{x + h \in \text{dom } f, h \neq 0, h \to 0} \frac{||f(x+h) - f(x) - Df(x)h||_2}{||h||_2} = 0$
  8. Obtaining Df(x) from partial derivatives. The matrix Df(x) can be obtained from the partial derivatives: $Df(x)_{ij} = \frac{∂f_i(x)}{∂x_j}$, i = 1, ···, m and j = 1, ···, n. This means the entry in the i-th row and j-th column of Df(x) is the partial derivative of the i-th component function fᵢ with respect to the j-th variable xⱼ, evaluated at the point x.

Df(x) = $\Biggl (\begin{smallmatrix}\frac{∂f_1(x)}{∂x_1} & \frac{∂f_1(x)}{∂x_2} & \cdots & \frac{∂f_1(x)}{∂x_n}\\ \frac{∂f_2(x)}{∂x_1} & \frac{∂f_2(x)}{∂x_2} & \cdots & \frac{∂f_2(x)}{∂x_n}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{∂f_m(x)}{∂x_1} & \frac{∂f_m(x)}{∂x_2} & \cdots & \frac{∂f_m(x)}{∂x_n}\end{smallmatrix}\Biggr )$. The Jacobian Df(x) is an m × n real matrix, and this is the practical way to compute it.

The definition of differentiability captures the idea that a function can be locally approximated by a linear transformation. The Jacobian matrix is the matrix representation of this linear transformation, and its entries are the partial derivatives of the component functions. The use of norms is crucial for making the definition rigorous in higher dimensions. The condition x ∈ interior dom f ensures that we can consider small perturbations around x within the function’s domain.
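To make the definition concrete, here is a minimal Python sketch (the function, base point, direction, and step sizes are arbitrary choices for illustration, not part of the definition) that evaluates the relative error in [*] for a differentiable map $f: ℝ^2 \to ℝ^2$ and shows it vanishing as z → x:

```python
# Illustration of the definition [*]: for a differentiable f, the relative
# error of the linear approximation shrinks as z -> x.
import numpy as np

def f(v):
    # f: R^2 -> R^2, f(x, y) = (x^2 * y, sin(x) + y)
    x, y = v
    return np.array([x**2 * y, np.sin(x) + y])

def Df(v):
    # Analytic Jacobian, computed entrywise as Df_ij = d f_i / d x_j.
    x, y = v
    return np.array([[2 * x * y, x**2],
                     [np.cos(x), 1.0]])

x = np.array([1.0, 2.0])
direction = np.array([0.6, -0.8])           # unit vector, ||direction||_2 = 1
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    z = x + t * direction
    err = np.linalg.norm(f(z) - f(x) - Df(x) @ (z - x)) / np.linalg.norm(z - x)
    print(f"||z - x|| = {t:.0e}, relative error = {err:.3e}")
# The printed relative error decreases roughly in proportion to ||z - x||,
# exactly as the definition of differentiability requires.
```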

Examples

Example (m = 2, n = 3). If A = $(\begin{smallmatrix}a_{11} & a_{12} & a_{13}\\a_{21} & a_{22} & a_{23}\end{smallmatrix})$ and $f(\vec{x}) = A\vec{x}$, then $Df(\vec{x}) = A = (\begin{smallmatrix}a_{11} & a_{12} & a_{13}\\a_{21} & a_{22} & a_{23}\end{smallmatrix})$
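As a quick numerical sanity check (this snippet is an added illustration; the matrix entries and evaluation point are arbitrary test data), a finite-difference estimate of the Jacobian of f(x) = Ax recovers A at any point, confirming Df(x) = A:

```python
# For a linear map f(x) = A x, the forward-difference Jacobian estimate
# equals A (up to floating-point error) at every point x.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])             # arbitrary a_ij with m = 2, n = 3

def f(x):
    return A @ x

def numerical_jacobian(f, x, h=1e-6):
    # Column j is (f(x + h e_j) - f(x)) / h, the forward difference
    # along the j-th coordinate direction.
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - fx) / h
    return J

x = np.array([0.5, -1.0, 2.0])
print(np.allclose(numerical_jacobian(f, x), A))   # True
```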

Differentiable function

Definition. A function f is called differentiable if its domain, dom f, is open and f is differentiable at every point of its domain.

  1. Its domain is an open set, meaning that for every point x in the domain of f, there exists a small open ball around x that is entirely contained within the domain.
  2. It is differentiable at every point within its domain: This means that the derivative of f exists at each point x in the domain.
  3. Recall the one-dimensional intuition: a function is differentiable if it is “smooth”, with no sharp corners, cusps, or vertical tangents (i.e., points where the slope is undefined) within its domain.

Definition. The affine function $\tilde{f}(x) = f(a) + Df(a)(x - a)$ is called the first-order approximation or linearization of the function f at the point x = a, where a lies in the interior of the domain of f (i.e., a ∈ int(dom(f))).
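A short sketch of the linearization in action (an added illustration; the function f(x, y) = x·eʸ and the point a = (1, 0) are arbitrary choices): near a, $\tilde{f}$ tracks f closely, and the gap shrinks as we approach a.

```python
# First-order approximation f~(x) = f(a) + Df(a)(x - a) for f(x, y) = x * e^y,
# linearized at a = (1, 0).
import numpy as np

def f(v):
    x, y = v
    return x * np.exp(y)

def Df(v):
    x, y = v
    return np.array([np.exp(y), x * np.exp(y)])   # 1 x 2 Jacobian (row vector)

a = np.array([1.0, 0.0])

def f_tilde(v):
    return f(a) + Df(a) @ (v - a)

for p in [np.array([1.1, 0.1]), np.array([1.01, 0.01])]:
    print(f"f = {f(p):.6f}, linearization = {f_tilde(p):.6f}")
# The gap between f and its linearization shrinks quadratically as p -> a.
```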

Definition. Let $f: \real^n \to \real$ be a real-valued function. The derivative $Df(\vec{x})$ is a 1 × n matrix (a row vector). The gradient of f at $\vec{x}$, denoted by $\nabla f(\vec{x})$, is defined as the transpose of the derivative: $\nabla f(\vec{x}) = Df(\vec{x})^T$ where $\vec{x} \in \text{int}(\text{dom}(f))$ (the interior of the domain of f). This results in a column vector. The i-th component of the gradient is given by the partial derivative with respect to the i-th variable, $\nabla f(\vec{x})_i = \frac{\partial f(\vec{x})}{\partial x_i}$, for i = 1, ···, n, provided f is differentiable at x.

Example (gradient of a linear function). Let $f(\vec{x}) = \vec{a}^T\vec{x} = \sum_{j=1}^n a_jx_j$ for a fixed vector $\vec{a} \in \real^n$. The components of the gradient are given by the partial derivatives: $\nabla f(\vec{x})_i = \frac{\partial f(\vec{x})}{\partial x_i} =$

$\frac{\partial}{\partial x_i}\Bigl(\sum_{j=1}^n a_jx_j\Bigr) = a_i \implies \nabla f(\vec{x}) = \vec{a}$

Example (gradient of an affine function). Similarly, let $f(\vec{x}) = \vec{a}^T\vec{x} + b$. The components of the gradient are given by the partial derivatives: $\nabla f(\vec{x})_i = \frac{\partial f(\vec{x})}{\partial x_i} =$

$\frac{\partial}{\partial x_i}\Bigl(\sum_{j=1}^n a_jx_j + b\Bigr) = a_i \implies \nabla f(\vec{x}) = \vec{a}$. As expected, the intercept b is just a constant term which does not affect the gradient.
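The following snippet (an added numerical check; the vector $\vec{a}$, intercept b, and evaluation point are arbitrary test data) confirms that the gradient of an affine function is the constant vector $\vec{a}$, regardless of the evaluation point and of b:

```python
# Numerical confirmation that grad(a^T x + b) = a, independent of x and b.
import numpy as np

a = np.array([2.0, -1.0, 3.0])
b = 5.0
f = lambda x: a @ x + b

def numerical_gradient(f, x, h=1e-6):
    # Central differences: grad_i ~ (f(x + h e_i) - f(x - h e_i)) / (2h).
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([0.3, 1.7, -2.2])              # arbitrary evaluation point
print(np.allclose(numerical_gradient(f, x), a))   # True
```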

Example (gradient of a quadratic form). Let $f: \real^n \to \real$ be given by $f(\vec{x}) = \vec{x}^TA\vec{x}$, where $A \in \real^{n \times n}$. Expanding the product,

$f(\vec{x}) = \vec{x}^TA\vec{x} = (\begin{smallmatrix}x_1 & x_2 & \cdots & x_n \end{smallmatrix})\Biggl (\begin{smallmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{smallmatrix} \Biggr) \Biggl ( \begin{smallmatrix}x_1\\ x_2 \\ \vdots \\ x_n \end{smallmatrix} \Biggr ) = (\sum_{i=1}^n x_ia_{i1}, \sum_{i=1}^n x_ia_{i2}, \cdots, \sum_{i=1}^n x_ia_{in}) \Biggl ( \begin{smallmatrix}x_1\\ x_2 \\ \vdots \\ x_n \end{smallmatrix} \Biggr ) = \sum_{j=1}^n\Bigl(\sum_{i=1}^n x_ia_{ij}\Bigr)x_j = \sum_{i=1}^n \sum_{j=1}^n x_ia_{ij}x_j$

Differentiating with respect to the k-th variable: $\frac{\partial f(\vec{x})}{\partial x_k} = 2a_{kk}x_k + \sum_{i, i \neq k} x_ia_{ik}$ (the linear terms where j = k and i = 1, ···, n, i ≠ k) $+ \sum_{j, j \neq k} a_{kj}x_j$ (the linear terms where i = k and j = 1, ···, n, j ≠ k) =[moving one $a_{kk}x_k$ into each sum] $\sum_{i=1}^n x_ia_{ik} + \sum_{j=1}^n a_{kj}x_j$. In vector notation, this can be written as: $\nabla f(\vec{x}) = A^T\vec{x} + A\vec{x} = (A^T+A)\vec{x} =[\text{if A is symmetric, i.e., A = } A^T] 2A\vec{x}$.
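Here is an added numerical check of this result (the matrix and point are random test data; A is deliberately non-symmetric so that both terms $A^T\vec{x}$ and $A\vec{x}$ matter):

```python
# Check that grad(x^T A x) = (A^T + A) x at a random point.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))                 # generic, non-symmetric A
f = lambda x: x @ A @ x

def numerical_gradient(f, x, h=1e-6):
    # Central differences, which are exact (up to rounding) for quadratics.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = rng.normal(size=4)
print(np.allclose(numerical_gradient(f, x), (A.T + A) @ x))   # True
```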

A particular case is the gradient of the squared $\ell_2$ norm. Let f be the quadratic form given by $f: \real^n \to \real, f(\vec{x}) = ||\vec{x}||_2^2 = \vec{x}^T\vec{x} =[\text{this can also be expressed as}] \vec{x}^TI\vec{x}$ where I is the n × n identity matrix. From the general result for the gradient of a quadratic form, $\nabla f(\vec{x}) = A^T\vec{x} + A\vec{x} = (A^T+A)\vec{x}$, we can substitute A = I. Therefore, the gradient of f is $\nabla f(\vec{x}) = (I^T+I)\vec{x} =[\text{since the identity matrix is symmetric, } I^T = I] 2I\vec{x} = 2\vec{x}$.

The $\ell_2$ or Euclidean norm is a measure of the magnitude of a vector in Euclidean space. It is calculated as the square root of the sum of the squares of the vector’s components: $||\vec{x}||_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$, so its square is $||\vec{x}||_2^2 = x_1^2 + x_2^2 + \cdots + x_n^2 = \vec{x}^T\vec{x}$.
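A minimal check of this special case (an added sketch; the test vector is arbitrary): the numerically estimated gradient of $||\vec{x}||_2^2$ matches $2\vec{x}$.

```python
# Special case A = I: the gradient of f(x) = ||x||_2^2 is 2x.
import numpy as np

x = np.array([1.0, -2.0, 0.5])
f = lambda v: v @ v                         # ||v||_2^2 = v^T v
h = 1e-6
grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                 for e in np.eye(x.size)])  # central differences per coordinate
print(np.allclose(grad, 2 * x))             # True
```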

Example (gradient of a quadratic function). Let $f(\vec{x}) = \frac{1}{2}\vec{x}^TP\vec{x} + \vec{q}^T\vec{x} + r$, where $P \in \real^{n \times n}$ is symmetric, $\vec{q} \in \real^n$, and $r \in \real$. The gradient can be calculated as follows: $\nabla f(\vec{x}) = \nabla (\frac{1}{2}\vec{x}^TP\vec{x} + \vec{q}^T\vec{x} + r) =$ [we can apply the gradient operator to each term separately, as the gradient is linear] $\frac{1}{2} \nabla(\vec{x}^TP\vec{x}) + \nabla(\vec{q}^T\vec{x}) + \nabla(r) =$ [from the general result for the gradient of a quadratic form, $\nabla(\vec{x}^TA\vec{x}) = (A^T+A)\vec{x}$, we can substitute A = P; the gradient of the constant r is zero; and the gradient of a linear function $\vec{a}^T\vec{x}$ is $\vec{a}$; putting it all together] $\frac{1}{2}(P^T+P)\vec{x}+\vec{q}$ =[since P is symmetric] $P\vec{x}+\vec{q}$
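The snippet below (an added verification; P, q, and r are arbitrary test data, with P symmetrized by construction) confirms the closed form $P\vec{x} + \vec{q}$ numerically:

```python
# Check that grad(0.5 x^T P x + q^T x + r) = P x + q for symmetric P.
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
P = M + M.T                                 # symmetrize so P = P^T
q = rng.normal(size=3)
r = 7.0
f = lambda x: 0.5 * x @ P @ x + q @ x + r

x = rng.normal(size=3)
h = 1e-6
grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                 for e in np.eye(3)])       # central differences per coordinate
print(np.allclose(grad, P @ x + q))         # True
```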

Continuous differentiability

Definition. Let $f:\mathbb{R}^n \to \mathbb{R}$ be a real-valued function with domain S = dom(f), and let $U \subseteq S$ be an open set. If all the partial derivatives of f exist and are continuous at every point x ∈ U, then f is said to be continuously differentiable on U. If the domain S itself is an open set and f is continuously differentiable on S, then f is said to be continuously differentiable.
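As an added illustration (the function is an arbitrary example), SymPy can compute the partial derivatives symbolically; since both are built from polynomials and trigonometric functions, they are continuous on all of ℝ², so f is continuously differentiable:

```python
# Symbolic partial derivatives of f(x, y) = x^2 * y + sin(y).
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 * y + sp.sin(y)
grad = [sp.diff(f, v) for v in (x, y)]
print(grad)   # [2*x*y, x**2 + cos(y)] -- both continuous everywhere on R^2
```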
