
The Chain Rule: Composing Derivatives in Multi-Dimensional Calculus

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe,” Abraham Lincoln.

Complex limits

Recall

A complex number is specified by an ordered pair of real numbers (a, b) ∈ ℝ² and written in the form z = a + bi, where a and b are real numbers and i is the imaginary unit, defined by the property i² = −1 ⇔ i = $\sqrt{-1}$, e.g., 2 + 5i, $7\pi + i\sqrt{2}.$ ℂ = { a + bi ∣ a, b ∈ ℝ}.

Definition. Let D ⊆ ℂ be a set of complex numbers. A complex-valued function f of a complex variable, defined on D, is a rule that assigns to each complex number z belonging to the set D a unique complex number w, f: D ➞ ℂ.

We often call the elements of D points. If z = x + iy ∈ D, then f(z) is called the image of the point z under f. f: D ➞ ℂ means that f is a complex function with domain D. We often write f(z) = u(x, y) + iv(x, y), where u, v: ℝ² → ℝ are the real and imaginary parts.

ε–δ for f:ℝ → ℝ: ∀ε>0 ∃δ>0 s.t. |f(x) − L| < ε whenever 0 < ∣x−a∣ < δ.

Definition. Let D ⊆ ℂ, let $f: D \to \mathbb{C}$ be a function, and let z0 be a limit point of D (so points of D lie arbitrarily close to z0, though possibly z0 ∉ D). A complex number L is said to be the limit of the function f as z approaches z0, written $\lim_{z \to z_0} f(z)=L$, if for every ε > 0 there exists a corresponding δ > 0 such that |f(z) − L| < ε whenever z ∈ D and 0 < |z − z0| < δ.

Why 0 < |z − z0|? We exclude z = z0 because the limit concerns the values of f near z0, not at z0. When z0 ∉ D, f(z0) cannot even be evaluated, so only the behavior of f as z approaches z0 matters. When z0 ∈ D, we still want only the function’s nearby behavior; this is what separates “limit” from “value.”

Equivalently: ∀ε > 0, ∃δ > 0 (for every ε > 0, there exists a corresponding δ > 0) such that whenever z ∈ D ∩ B'(z0; δ), f(z) ∈ B(L; ε), i.e., f(D ∩ B'(z0; δ)) ⊂ B(L; ε), where B'(z0; δ) denotes the punctured open disk of radius δ centered at z0.

If no such L exists, then we say that f(z) does not have a limit as z approaches z0. This is exactly the same ε–δ formulation we know from real calculus, but now z and L live in the complex plane ℂ, and neighborhoods are round disks rather than intervals.
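As a quick numerical sanity check (not a proof, and with the function f(z) = z², the point z0 = i, the limit L = −1, and the bound δ = min(1, ε/3) chosen purely for illustration), the following Python sketch samples points in the punctured disk 0 < |z − z0| < δ and confirms that |f(z) − L| stays below ε:

```python
import numpy as np

# Numerical illustration (not a proof) of lim_{z -> i} z^2 = -1.
# For f(z) = z^2: |f(z) - (-1)| = |z - i||z + i| <= delta*(delta + 2),
# so delta = min(1, eps/3) guarantees |f(z) + 1| < eps.
rng = np.random.default_rng(0)
z0, L = 1j, -1.0

for eps in (1e-1, 1e-2, 1e-3):
    delta = min(1.0, eps / 3.0)
    # Sample random points z with 0 < |z - z0| < delta.
    r = delta * rng.uniform(1e-6, 1.0, size=10_000)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=10_000)
    z = z0 + r * np.exp(1j * theta)
    assert np.all(np.abs(z**2 - L) < eps)
    print(f"eps={eps:g}: max |f(z) - L| = {np.abs(z**2 - L).max():.2e}")
```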

Continuity in the complex plane

Definition. Let D ⊆ ℂ. A function f: D → ℂ is said to be continuous at a point z0 ∈ D if given any arbitrarily small ε > 0, there is a corresponding δ > 0 such that |f(z) - f(z0)| < ε whenever z ∈ D and |z - z0| < δ.

In words, arbitrarily small output‐changes ε can be guaranteed by restricting z to lie in a sufficiently small disk of radius δ around z0.

Alternative (Sequential) Definition. Let D ⊆ ℂ. A function f: D → ℂ is said to be continuous at a point z0 ∈ D if for every sequence $\{z_n\}_{n=1}^{\infty}$ such that zn ∈ D ∀n ∈ ℕ and zn → z0, we have $\lim_{n \to \infty} f(z_n) = f(z_0)$.
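A minimal numerical sketch of the sequential definition, using the illustrative choices f(z) = z², z0 = i and the sequence zn = i + (1 + i)/n → i; the values f(zn) approach f(i) = −1:

```python
import numpy as np

# Sequential continuity check for f(z) = z^2 at z0 = i:
# z_n = i + (1 + 1j)/n -> i, so f(z_n) -> f(i) = -1.
f = lambda z: z**2
z0 = 1j

for n in 10 ** np.arange(1, 7):
    zn = z0 + (1 + 1j) / n
    print(f"n = {n:>7d}   |f(z_n) - f(z_0)| = {abs(f(zn) - f(z0)):.2e}")
```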

Global Continuity

Definition. A function f: D → ℂ is said to be continuous if it is continuous at every point in its domain (∀z0 ∈ D).

The Jacobian Matrix

Let $f : ℝ^n \to ℝ^m$ be a differentiable vector-valued function. The Jacobian matrix of f at $\vec{x}$, denoted Df(x), is the m × n matrix of all first-order partial derivatives: $Df(\vec{x})= \begin{pmatrix}\frac{\partial f_1}{\partial x_1}&\cdots&\frac{\partial f_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}\end{pmatrix}$. It represents the total derivative of f at $\vec{x}$.

This matrix represents the unique best linear approximation to f at x: f(x+h) = f(x) + Df(x)h + o(∥h∥).
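A short sympy sketch of this matrix of partials, using the illustrative map f(x, y) = (x² + y², x − y) (the same map that reappears in Example 2 below):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
# f : R^2 -> R^2, f(x, y) = (x^2 + y^2, x - y)
f = sp.Matrix([x**2 + y**2, x - y])

# Jacobian: the m x n matrix of all first-order partial derivatives.
Df = f.jacobian([x, y])
print(Df)                      # Matrix([[2*x, 2*y], [1, -1]])
print(Df.subs({x: 1, y: 1}))   # evaluated at the point (1, 1)
```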

Differentiability at a point

Definition (differentiability at a point). Let $f : ℝ^n \to ℝ^m$ be a function and let x be an interior point of the domain of f, $x \in \text{int}(\text{dom}\, f)$. The function f is differentiable at x if there exists a matrix $Df(x) \in ℝ^{m \times n}$ that satisfies $\lim_{\substack{z \in \text{dom}\, f \\ z \neq x,\ z \to x}} \frac{||f(z) - f(x) - Df(x)(z-x)||_2}{||z-x||_2} = 0$ [*]

This matrix Df(x) is called the derivative or the Jacobian matrix of f at the point x.
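A numerical sketch of condition [*], again with the illustrative map f(x, y) = (x² + y², x − y) and an arbitrarily chosen base point and direction: the ratio in the definition shrinks as z approaches x.

```python
import numpy as np

def f(p):
    x, y = p
    return np.array([x**2 + y**2, x - y])

def Df(p):
    x, y = p
    return np.array([[2 * x, 2 * y], [1.0, -1.0]])

x0 = np.array([1.0, 1.0])            # illustrative base point
direction = np.array([0.6, 0.8])     # illustrative unit direction

# The ratio ||f(z) - f(x0) - Df(x0)(z - x0)|| / ||z - x0|| should go to 0 as z -> x0.
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    z = x0 + t * direction
    num = np.linalg.norm(f(z) - f(x0) - Df(x0) @ (z - x0))
    print(f"|z - x0| = {t:.0e}   ratio = {num / np.linalg.norm(z - x0):.2e}")
```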

Differentiable function

Definition. A function f is called differentiable if its domain dom(f) ⊆ ℝⁿ is open and f is differentiable at every point of its domain (∀x ∈ dom(f)).


Figure. For f(x,y)=x²+y², the red plane at (1,1) is the Jacobian’s linear approximation.

Chain Rule

The Chain Rule is one of the most fundamental and powerful tools in calculus. It governs how derivatives behave under composition of functions — whether in single-variable or multivariable settings. In higher dimensions, it elegantly expresses the derivative of a composite function as the matrix product of the Jacobians of the component functions.

Chain Rule. Let:

  1. $f:\mathbb{R}^n \to \mathbb{R}^m$ be differentiable at a point $x \in \mathbb{R}^n$,
  2. $g:\mathbb{R}^m \to \mathbb{R}^p$ be differentiable at the point $f(x) \in \mathbb{R}^m$.

Define the composite function $h:\mathbb{R}^n \to \mathbb{R}^p$ by $h(x) = g(f(x))$.

Then h is also differentiable at x and its total derivative (Jacobian matrix) at x is given by $\mathbf{Dh(x) = Dg(f(x))Df(x)}$. This matrix equation is known as the multivariable chain rule.

Let’s verify the dimensions to ensure consistency: $Df(x)$ is an m×n matrix (the Jacobian of f at x), $Dg(f(x))$ is a p×m matrix (the Jacobian of g at the point f(x)), so their product is a p×n matrix, matching the dimensions of $Dh(x)$.

The chain rule states that the overall change in h is a combination of the changes in f and g, propagated through the composition. Think of h=g∘f as a two-stage process:

  1. Input x ∈ ℝn is transformed by f into an intermediate output f(x) ∈ ℝm,
  2. This output is then fed into g to produce the final result h(x) ∈ ℝp
  3. The total sensitivity of h to small changes in x is therefore the composition of sensitivities: First, how f responds to changes in x; then, how g responds to changes in its input (which is f(x)).
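The following sympy sketch checks the matrix form of the chain rule on illustrative maps f(x₁, x₂) = (x₁x₂, x₁ + x₂, x₂²) and g(u, v, w) = (u + vw, uv): the Jacobian of h = g∘f computed directly coincides with the product Dg(f(x))·Df(x).

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
u, v, w = sp.symbols('u v w', real=True)

# Illustrative maps: f : R^2 -> R^3 and g : R^3 -> R^2.
f = sp.Matrix([x1 * x2, x1 + x2, x2**2])
g = sp.Matrix([u + v * w, u * v])

Df = f.jacobian([x1, x2])                                        # 3 x 2
Dg_at_f = g.jacobian([u, v, w]).subs(dict(zip([u, v, w], f)))    # 2 x 3, evaluated at f(x)

h = g.subs(dict(zip([u, v, w], f)))                              # h = g(f(x)) : R^2 -> R^2
Dh = h.jacobian([x1, x2])                                        # 2 x 2

# Chain rule: Dh(x) = Dg(f(x)) Df(x).
assert sp.simplify(Dh - Dg_at_f * Df) == sp.zeros(2, 2)
print(sp.simplify(Dh))
```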

Special Case: Scalar-Valued Composition (m = p = 1)

In particular, if m = p = 1, then $f:\mathbb{R}^n \to \mathbb{R}$ and $g:\mathbb{R} \to \mathbb{R}$. The chain rule reduces to $\nabla (g \circ f)(x) = g'(f(x))\,\nabla f(x)$, where $g'(f(x))$ is the ordinary derivative of g evaluated at f(x) and ∇f(x) is the n-vector of partial derivatives.

In this case, the small change in h for a small change in x is governed by how f changes in each direction (the gradient $\nabla f(x)$), scaled by the rate of change of g at f(x).
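A small sympy sketch of this scalar special case, with the illustrative choices f(x₁, x₂) = x₁² + x₂² and g(u) = sin(u):

```python
import sympy as sp

x1, x2, u = sp.symbols('x1 x2 u', real=True)

# Scalar special case (m = p = 1): f : R^2 -> R, g : R -> R.
f = x1**2 + x2**2
g = sp.sin(u)

grad_f = sp.Matrix([f]).jacobian([x1, x2])      # gradient of f as a 1 x 2 row vector
g_prime_at_f = sp.diff(g, u).subs(u, f)         # g'(f(x))

h = g.subs(u, f)                                # h = g(f(x))
grad_h = sp.Matrix([h]).jacobian([x1, x2])

# Chain rule: grad h(x) = g'(f(x)) * grad f(x).
assert sp.simplify(grad_h - g_prime_at_f * grad_f) == sp.zeros(1, 2)
print(grad_h)   # Matrix([[2*x1*cos(x1**2 + x2**2), 2*x2*cos(x1**2 + x2**2)]])
```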

Illustrative Examples

Example 1. Let $f:\mathbb{R} \to \mathbb{R}^2$, $f(x) = (\sin(x), \cos(x))$, and $g:\mathbb{R}^2 \to \mathbb{R}$, $g(u, v) = u^2 + v^2$, so that $h(x) = g(f(x)) = \sin^2(x) + \cos^2(x)$.

Let’s compute $\mathbf{Dh(x) = Dg(f(x))Df(x)}$. The derivative $Df(x)$ with respect to x is an m×n = 2×1 matrix (a column vector): $Df(x) = (\begin{smallmatrix}\cos(x)\\ -\sin(x)\end{smallmatrix})$. The derivative $Dg(f(x))$ is a p×m = 1×2 matrix (a row vector): $Dg(f(x)) = (\begin{smallmatrix}2\sin(x) & 2\cos(x)\end{smallmatrix})$.

Apply the Chain Rule: $\mathbf{Dh(x) = Dg(f(x))Df(x)} = (\begin{smallmatrix}2\sin(x) & 2\cos(x)\end{smallmatrix})(\begin{smallmatrix}\cos(x)\\ -\sin(x)\end{smallmatrix}) = 2\sin(x)\cos(x) + 2\cos(x)(-\sin(x)) = 0$, a p×n = 1×1 matrix, i.e., a real number. Indeed, since h(x) ≡ 1 is constant, its derivative is zero.
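A sympy check of Example 1 (the symbols u, v below just name the two inputs of g):

```python
import sympy as sp

x, u, v = sp.symbols('x u v', real=True)

f = sp.Matrix([sp.sin(x), sp.cos(x)])   # f : R -> R^2
g = sp.Matrix([u**2 + v**2])            # g : R^2 -> R

Df = f.jacobian([x])                                              # 2 x 1 column vector
Dg_at_f = g.jacobian([u, v]).subs({u: sp.sin(x), v: sp.cos(x)})   # 1 x 2 row vector

print(sp.simplify(Dg_at_f * Df))   # Matrix([[0]]): h(x) = 1 is constant
```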

Example 2. Let $f:\mathbb{R}^2 \to \mathbb{R}^2$, $f(x, y) = (f_1(x, y), f_2(x, y))$ with $f_1(x, y)=x²+y²$ and $f_2(x, y) = x - y$, and let $g:\mathbb{R}^2 \to \mathbb{R}$ have constant gradient $\nabla g \equiv (1, 1)$ (e.g., g(u, v) = u + v). Then $\frac{\partial f_1}{\partial x} = 2x, \frac{\partial f_1}{\partial y} = 2y, \frac{\partial f_2}{\partial x} = 1, \frac{\partial f_2}{\partial y} = -1$, so $Df(x, y) = (\begin{smallmatrix}2x & 2y\\ 1 & -1\end{smallmatrix})$, an m×n = 2×2 matrix; each row is the gradient of f1 and f2 respectively.

$Dg(f(x, y)) = (\begin{smallmatrix}1 & 1\end{smallmatrix})$ (a p×m = 1×2 row vector). Apply the Chain Rule: $\mathbf{Dh(x) = Dg(f(x))Df(x)} = (\begin{smallmatrix}1 & 1\end{smallmatrix})(\begin{smallmatrix}2x & 2y\\ 1 & -1\end{smallmatrix}) = (\begin{smallmatrix}2x + 1 & 2y -1\end{smallmatrix})$, a 1×2 row vector.
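A sympy check of Example 2:

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v', real=True)

f = sp.Matrix([x**2 + y**2, x - y])   # f : R^2 -> R^2
g = sp.Matrix([u + v])                # g : R^2 -> R, gradient (1, 1)

Df = f.jacobian([x, y])               # 2 x 2
Dg_at_f = g.jacobian([u, v])          # 1 x 2, constant (1, 1)

print(Dg_at_f * Df)                   # Matrix([[2*x + 1, 2*y - 1]])
```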

Example 3. Let $f:\mathbb{R} \to \mathbb{R}^2$, $f(t) = (\cos(t), \sin(t))$, and $g:\mathbb{R}^2 \to \mathbb{R}$, $g(x, y) = x^2 - y^2$, so that $h(t) = g(f(t)) = \cos^2(t) - \sin^2(t)$. Now compute h′(t) using the chain rule. Df(t) = $(\begin{smallmatrix}-sin(t) \\ cos(t)\end{smallmatrix})$ (a 2×1 column vector), Dg(x, y) = (2x, −2y), and Dg(f(t)) = [2cos(t), −2sin(t)]. Then, h′(t) = Dg(f(t))⋅Df(t) = $[2cos(t), −2sin(t)]·(\begin{smallmatrix}-sin(t) \\ cos(t)\end{smallmatrix})$ = −2cos(t)sin(t) − 2sin(t)cos(t) = −4sin(t)cos(t) = −2sin(2t).

Direct differentiation of h(t)=cos(2t) gives h′(t)=−2sin(2t) — matches perfectly!✅
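A sympy check of Example 3:

```python
import sympy as sp

t, x, y = sp.symbols('t x y', real=True)

f = sp.Matrix([sp.cos(t), sp.sin(t)])   # f : R -> R^2
g = x**2 - y**2                         # g : R^2 -> R

Dg_at_f = sp.Matrix([g]).jacobian([x, y]).subs({x: sp.cos(t), y: sp.sin(t)})
Df = f.jacobian([t])

h_prime = sp.trigsimp((Dg_at_f * Df)[0, 0])
print(h_prime)                           # -2*sin(2*t), via the chain rule
print(sp.diff(sp.cos(2 * t), t))         # -2*sin(2*t), by direct differentiation
```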

Example 4. Let $f:\mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = x^2 + y^2$, and $g:\mathbb{R} \to \mathbb{R}$, $g(u) = e^u$, so that $h(x, y) = g(f(x, y)) = e^{x^2+y^2}$. By the scalar chain rule, $\nabla h(x, y) = g'(f(x, y))\,\nabla f(x, y)$. Hence, ∇h(x, y) = $e^{x^2+y^2}·(2x, 2y) = (2xe^{x^2+y^2}, 2ye^{x^2+y^2})$.
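A sympy check of Example 4:

```python
import sympy as sp

x, y, u = sp.symbols('x y u', real=True)

f = x**2 + y**2             # f : R^2 -> R
g = sp.exp(u)               # g : R -> R
h = g.subs(u, f)            # h(x, y) = exp(x^2 + y^2)

grad_h = sp.Matrix([h]).jacobian([x, y])
print(grad_h)   # Matrix([[2*x*exp(x**2 + y**2), 2*y*exp(x**2 + y**2)]])
```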

Example 5. Let $h(\vec{x}) = ||\vec{x}||_2 = \sqrt{||\vec{x}||_2^2}$, i.e., $h = g \circ f$ with $f(\vec{x}) = ||\vec{x}||_2^2 = x_1^2 + \cdots + x_n^2$ and $g(y) = \sqrt{y}$. f is differentiable on ℝⁿ, with gradient $\nabla f(\vec{x}) = 2\vec{x}$; g is differentiable for y > 0, with derivative $g'(y) = \frac{1}{2\sqrt{y}}$.

Applying the Chain Rule (valid at every $\vec{x} \ne \vec{0}$, since then $f(\vec{x}) = ||\vec{x}||_2^2 > 0$ lies in the open set where g is differentiable): $\nabla h(\vec{x}) = g'(f(\vec{x}))\,\nabla f(\vec{x}) = \frac{1}{2\sqrt{||\vec{x}||_2^{2}}}2\vec{x} = \frac{\vec{x}}{||\vec{x}||_2}$ for every non-zero vector $\vec{x} \ne \vec{0}$. At $\vec{x} = \vec{0}$, the Euclidean norm is not differentiable due to a corner point.
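A sympy check of Example 5, written here in ℝ³ for concreteness (the dimension is an arbitrary illustrative choice):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)

# h(x) = ||x||_2 in R^3, i.e., h = g(f(x)) with f(x) = ||x||_2^2 and g(y) = sqrt(y).
h = sp.sqrt(x1**2 + x2**2 + x3**2)
grad_h = sp.Matrix([h]).jacobian([x1, x2, x3])

print(sp.simplify(grad_h))   # each entry is x_i / sqrt(x1**2 + x2**2 + x3**2), i.e., x / ||x||_2
```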

Intuition: The total derivative is the product of local linear approximations.

