"Logic will get you from A to B. Imagination will take you everywhere." (Albert Einstein)


In single-variable calculus, limits and continuity spring directly from the ε–δ definitions you first met in Calculus I. In complex analysis, we use disks (or balls) instead of intervals but keep the same ε–δ spirit. When we move on to multivariable functions f: ℝⁿ → ℝᵐ, differentiability is no longer just a slope: it becomes the existence of a linear map (necessarily unique, and represented by the Jacobian matrix) that gives the best first-order approximation to f, geometrically a tangent plane or hyperplane. This article builds that bridge between different levels of abstraction in calculus step by step.

This article:

Reviews ε–δ limits and continuity, moving from ℝ to ℂ.
Generalizes these ideas to functions f: ℝⁿ → ℝᵐ.
Defines the Jacobian as the matrix of all first-order partial derivatives.
Shows how it yields the linearization $\tilde{f}(x) = f(a) + \nabla f(a)^T(x-a)$.
States why the error term is "little-o" of ∥h∥.
Illustrates everything with concrete examples.

Recall

A complex number is specified by an ordered pair of real numbers (a, b) ∈ ℝ² and written in the form z = a + bi, where a and b are real numbers and i is the imaginary unit, defined by the property i² = −1 (informally, i = $\sqrt{-1}$); e.g., 2 + 5i and $7\pi + i\sqrt{2}$. Thus ℂ = {a + bi ∣ a, b ∈ ℝ}.

Definition. Let D ⊆ ℂ be a set of complex numbers. A complex-valued function f of a complex variable, defined on D, is a rule that assigns to each complex number z in D a unique complex number w = f(z); we write f: D ➞ ℂ.

The set D is called the domain of the complex function f, D = Dom(f).
The set of all actual outputs {f(z): z ∈ D} is called the range (or image) of the complex function f, also denoted f(D) = {f(z): z ∈ D}.

We often call the elements of D points. If z = x + iy ∈ D, then f(z) is called the image of the point z under f, and f: D ➞ ℂ means that f is a complex function with domain D. We often write f(z) = u(x, y) + iv(x, y), where u and v are real-valued functions of the two real variables x and y, called the real and imaginary parts of f.

ε–δ for f: ℝ → ℝ: we say $\lim_{x \to a} f(x) = L$ if ∀ε > 0 ∃δ > 0 such that |f(x) − L| < ε whenever 0 < |x − a| < δ.
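To see how a δ is actually produced for a given ε, here is a minimal sketch for one illustrative choice that is not from the text: f(x) = x², a = 2, L = 4, with the standard choice δ = min(1, ε/5). The names `delta_for` and `spot_check` are hypothetical. The code only spot-checks the inequality on a finite grid; the real proof is the one-line estimate in the comments.

```python
# Illustrative claim (assumption, not from the text): lim_{x -> 2} x^2 = 4.

def f(x: float) -> float:
    return x * x


A, L = 2.0, 4.0


def delta_for(eps: float) -> float:
    # |x^2 - 4| = |x - 2| * |x + 2|. If |x - 2| < 1 then 1 < x < 3, so |x + 2| < 5.
    # Hence delta = min(1, eps / 5) gives |x^2 - 4| < 5 * (eps / 5) = eps.
    return min(1.0, eps / 5.0)


def spot_check(eps: float, samples: int = 10_000) -> bool:
    """Test |f(x) - L| < eps on a grid with 0 < |x - A| < delta (evidence, not proof)."""
    d = delta_for(eps)
    for k in range(1, samples):
        t = d * k / samples  # 0 < t < d, so x = A +/- t lies in the punctured interval
        for x in (A - t, A + t):
            if not abs(f(x) - L) < eps:
                return False
    return True


for eps in (1.0, 0.1, 1e-3):
    print(f"eps={eps:g}  delta={delta_for(eps):g}  holds on grid: {spot_check(eps)}")
```

Note that δ depends on ε (and, for other functions, usually on a as well); the "min(1, …)" first confines x to a bounded interval so that the factor |x + 2| can be controlled.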

Definition. Let D ⊆ ℂ, let $f: D \to \mathbb{C}$ be a function, and let z₀ be a limit point of D (so there are points of D other than z₀ arbitrarily close to z₀, though possibly z₀ ∉ D). A complex number L is said to be a limit of f as z approaches z₀, written $\lim_{z \to z_0} f(z)=L$, if for every ε > 0 there exists a corresponding δ > 0 such that |f(z) − L| < ε whenever z ∈ D and 0 < |z − z₀| < δ.

Why 0 < |z - z₀|? We exclude z = z₀ itself because the limit cares about values near z₀, not at z₀ itself. When z₀ ∉ D, you cannot evaluate f(z₀), so you only care about z approaching z₀. When z₀ ∈ D, you still want the function’s nearby behavior; this separates “limit” from “value.”

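Equivalently, write B(L; ε) = {w ∈ ℂ : |w − L| < ε} for the open disk of radius ε about L and B'(z₀; δ) = {z ∈ ℂ : 0 < |z − z₀| < δ} for the punctured disk of radius δ about z₀. Then $\lim_{z \to z_0} f(z)=L$ means: for every ε > 0 there exists a δ > 0 such that f(z) ∈ B(L; ε) whenever z ∈ D ∩ B'(z₀; δ), that is, f(D ∩ B'(z₀; δ)) ⊆ B(L; ε).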

If no such L exists, then we say that f(z) does not have a limit as z approaches z₀. This is exactly the same ε–δ formulation we know from real calculus, but now z and L live in the complex plane ℂ, and neighborhoods are round disks rather than intervals.
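Because a disk lets z approach z₀ from every direction, a limit must give the same value along every path. A minimal numerical sketch, using the illustrative function f(z) = z̄/z on ℂ \ {0} (our choice, not from the text), shows three different values along three rays into z₀ = 0, so this f has no limit at 0.

```python
# Illustrative function (assumption, not from the text): f(z) = conj(z) / z on D = C \ {0}.
# z0 = 0 is a limit point of D, but f(z) approaches different values along different rays.

def f(z: complex) -> complex:
    return z.conjugate() / z


for r in (1e-1, 1e-3, 1e-6):
    along_real = f(complex(r, 0.0))   # z = r        -> f(z) = 1
    along_imag = f(complex(0.0, r))   # z = i r      -> f(z) = -1
    along_diag = f(complex(r, r))     # z = r(1 + i) -> f(z) = -i
    print(r, along_real, along_imag, along_diag)
```

The values do not change as r shrinks, which is the numerical signature of path dependence: no single L can satisfy the ε–δ condition.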

Continuity in the complex plane

Definition. Let D ⊆ ℂ. A function f: D → ℂ is said to be continuous at a point z₀ ∈ D if given any arbitrarily small ε > 0, there is a corresponding δ > 0 such that |f(z) - f(z₀)| < ε whenever z ∈ D and |z - z₀| < δ.

In words, arbitrarily small output‐changes ε can be guaranteed by restricting z to lie in a sufficiently small disk of radius δ around z₀.

Alternative (Sequential) Definition. Let D ⊆ ℂ. A function f: D → ℂ is said to be continuous at a point z₀ ∈ D if for every sequence $\{z_n\}_{n=1}^{\infty}$ with z_n ∈ D for all n ∈ ℕ and z_n → z₀, we have $\lim_{n \to \infty} f(z_n) = f(z_0)$.
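As a small illustration of the sequential definition, assume f(z) = z² and z₀ = 1 + i (both chosen here, not taken from the text) and follow the single sequence z_n = z₀ + (1 − i)/n. Expanding gives f(z_n) − f(z₀) = 4/n − 2i/n² exactly, so the printed distances shrink to 0. Checking one sequence is only evidence: the definition requires this for every sequence converging to z₀.

```python
# Illustrative check (assumption, not from the text): f(z) = z^2 at z0 = 1 + i
# along the particular sequence z_n = z0 + (1 - i)/n.

def f(z: complex) -> complex:
    return z * z


z0 = 1 + 1j
for n in (1, 10, 100, 1000):
    zn = z0 + (1 - 1j) / n
    # Exactly |4/n - 2i/n^2|, which tends to 0 as n grows.
    print(n, abs(f(zn) - f(z0)))
```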

Global Continuity

Definition. A function f: D → ℂ is said to be continuous if it is continuous at every point in its domain (∀z₀ ∈ D).

The Jacobian Matrix

Let $f : ℝ^n \to ℝ^m$, f = (f₁, …, f_m), be a vector-valued function whose first-order partial derivatives exist at x. The Jacobian matrix of f at x, denoted Df(x), is the m × n matrix of all first-order partial derivatives evaluated at x: $Df(x)= \begin{pmatrix}\frac{\partial f_1}{\partial x_1}&\cdots&\frac{\partial f_1}{\partial x_n}\\\ \vdots & \ddots & \vdots\\\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}\end{pmatrix}_{\vec{x}}$. When f is differentiable at x, this matrix represents the total derivative of f at ${\vec{x}}$.
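Since row i of Df(x) holds the partials of the component f_i and column j holds the partials with respect to x_j, the matrix can be approximated entry by entry. The sketch below uses central differences with a fixed step h = 1e-6; the helper name `numerical_jacobian` and the step size are our assumptions, and finite differences only approximate the exact derivatives. It is applied to F(x, y) = (x²y, e^{x+y}), the vector-valued function worked out by hand in the examples at the end of this article.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]


def numerical_jacobian(F: Callable[[Vec], Vec], x: Vec, h: float = 1e-6) -> List[List[float]]:
    """Approximate the m x n Jacobian DF(x) by central differences.

    Entry (i, j) is (F_i(x + h e_j) - F_i(x - h e_j)) / (2h), which approximates dF_i/dx_j.
    """
    n = len(x)
    cols = []  # cols[j] approximates column j: the partials of every F_i w.r.t. x_j
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))])
    m = len(cols[0])
    # Transpose so that row i collects the partials of the component F_i.
    return [[cols[j][i] for j in range(n)] for i in range(m)]


# F(x, y) = (x^2 y, e^{x+y})
def F(p: Vec) -> List[float]:
    x, y = p
    return [x * x * y, math.exp(x + y)]


print(numerical_jacobian(F, (0.0, 0.0)))  # approximately [[0, 0], [1, 1]]
print(numerical_jacobian(F, (1.0, 2.0)))  # approximately [[4, 1], [e^3, e^3]]
```

Central differences are used instead of one-sided ones because their truncation error is O(h²) rather than O(h).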

Differentiability at a point

Definition (Differentiability at a point). Let $f : ℝ^n \to ℝ^m$ be a function and let x be an interior point of its domain, $x \in \operatorname{int}(\operatorname{dom} f)$. The function f is differentiable at x if there exists a matrix $Df(x) \in ℝ^{m \times n}$ that satisfies $\lim_{\substack{z \in \operatorname{dom} f \\ z \neq x,\ z \to x}} \frac{\|f(z) - f(x) - Df(x)(z-x)\|_2}{\|z-x\|_2} = 0.$

This matrix Df(x) is called the derivative or the Jacobian matrix of f at the point x.

Differentiable function

Definition. A function f is called differentiable if its domain dom(f) ⊆ ℝⁿ is open and f is differentiable at every point of its domain (∀x ∈ dom(f)).


Figure. For f(x,y)=x²+y², the red plane at (1,1) is the Jacobian’s linear approximation.

First Order Approximation

Definition. The affine function $\tilde{f}(x) = f(a) + \nabla f(a)^T(x-a)$ is called the first-order (or linear) approximation of a real-valued function f: ℝⁿ → ℝ at the point a ∈ int(dom(f)), where:

f(a) is the function's value at the base point a. Geometrically, in a plot of z = f(x), it is the height of the graph directly above x = a.
$\nabla f(a)$ is the gradient of f evaluated at the base point a. For a real-valued function of several variables, it is the vector of partial derivatives $\nabla f(a) = (\frac{\partial f}{\partial x_1}(a), \frac{\partial f}{\partial x_2}(a), \cdots, \frac{\partial f}{\partial x_n}(a))$, and, when it is nonzero, it points in the direction of steepest increase of f at a.
$\nabla f(a)^T$ is the transpose of the gradient. Treating ∇f(a) as a column vector, its transpose is a row vector (a 1 × n matrix), so $\nabla f(a)^T(x-a)$ is a matrix-vector product that equals the dot product ∇f(a) · (x − a).
(x − a) is the displacement vector from the base point a to the point x where we want to approximate the function; like x and a, it lives in ℝⁿ.
$\nabla f(a)^T(x-a)$ is the dot product of the gradient with the displacement. It equals ∥x − a∥ times the directional derivative of f at a in the unit direction (x − a)/∥x − a∥, so it is the first-order prediction of how much f changes as we move from a to x along a straight line.
Finally, $f(a) + \nabla f(a)^T(x-a)$ starts with the function's value f(a) and adds the change predicted by the gradient, $\nabla f(a)^T(x-a)$. This gives a linear approximation of the function's value at x.
The first-order approximation treats f as if it were affine near a, which is accurate when x is very close to a. Because we have replaced a generally curved graph by its tangent hyperplane, the curvature of f typically becomes significant as x moves further from a, and the approximation becomes less accurate.
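The definition translates almost word for word into code. Below is a minimal sketch, assuming the gradient is supplied as a separate function; the helper name `linearize` is hypothetical and not from any library. It uses the figure's function f(x, y) = x² + y² with base point a = (1, 1) and prints f, f̃ and their gap at points moving away from a.

```python
from typing import Callable, Sequence

Vec = Sequence[float]


def linearize(f: Callable[[Vec], float], grad_f: Callable[[Vec], Vec], a: Vec) -> Callable[[Vec], float]:
    """Return the affine map x -> f(a) + grad_f(a)^T (x - a)."""
    fa = f(a)
    ga = list(grad_f(a))

    def f_tilde(x: Vec) -> float:
        # grad_f(a)^T (x - a) computed as a plain dot product over the coordinates.
        return fa + sum(g * (xi - ai) for g, xi, ai in zip(ga, x, a))

    return f_tilde


# The function from the figure, f(x, y) = x^2 + y^2, with base point a = (1, 1).
def f(p: Vec) -> float:
    return p[0] ** 2 + p[1] ** 2


def grad_f(p: Vec) -> Vec:
    return (2 * p[0], 2 * p[1])


f_tilde = linearize(f, grad_f, (1.0, 1.0))
for point in [(1.0, 1.0), (1.1, 1.1), (1.5, 1.5), (2.0, 2.0)]:
    print(point, f(point), f_tilde(point), f(point) - f_tilde(point))
```

For this f the gap equals (x − 1)² + (y − 1)², so it grows from 0 at a to about 0.02 at (1.1, 1.1), 0.5 at (1.5, 1.5) and 2 at (2, 2): the loss of accuracy away from the base point described above.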

Rigorous Error Statement

Theorem (Little-o error bound). Let U ⊆ ℝⁿ be an open set containing a, and let f: U → ℝ be continuously differentiable on U, f ∈ C¹(U). Then $f(x) = f(a)+ \nabla f(a)^T(x-a) + r(x), \text{ where } \lim_{x \to a } \frac{|r(x)|}{\| x - a \|} = 0.$

Equivalently, writing x = a + h and r(h) for the remainder, $f(a+h) = f(a)+ \nabla f(a)^T h + r(h) = f(a)+ Df(a)h + r(h), \quad \lim_{h \to 0} \frac{|r(h)|}{\| h \|} = 0,$ where $Df(a) = \nabla f(a)^T$ is the 1 × n Jacobian of f at a.

Here f ∈ C¹(U), read "f is continuously differentiable on U", means that (1) all first-order partial derivatives of f exist on U and (2) these partial derivatives are continuous on U.

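A function can be differentiable without having continuous partial derivatives. For example, consider

$f(x) = \begin{cases} x^2 \sin(1/x), &x \neq 0 \\\\ 0, &x = 0 \end{cases}$

At x = 0 the difference quotient is (f(h) − f(0))/h = h sin(1/h) → 0, so f′(0) = 0, while for x ≠ 0 the product and chain rules give f′(x) = 2x sin(1/x) − cos(1/x). Hence f′(x) exists everywhere, but it has no limit as x → 0 (the cos(1/x) term keeps oscillating between −1 and 1), so f′ is not continuous at x = 0. Thus f is differentiable, but not C¹ near 0.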
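The two claims about f(x) = x² sin(1/x) (with f(0) = 0) can be checked numerically. The sketch below evaluates the difference quotient at 0, which shrinks like h, and then evaluates the formula f′(x) = 2x sin(1/x) − cos(1/x) along the points x_k = 1/(kπ) → 0, where it alternates near −1 and +1. The sample points are our choice; floating-point evaluation only illustrates, and does not prove, the oscillation.

```python
import math


def f(x: float) -> float:
    # f(x) = x^2 sin(1/x) for x != 0, and f(0) = 0.
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0


def f_prime(x: float) -> float:
    # 2x sin(1/x) - cos(1/x) for x != 0; f'(0) = 0 from the limit of difference quotients.
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x) if x != 0 else 0.0


# Difference quotients at 0: (f(h) - f(0)) / h = h sin(1/h), bounded by |h|.
for h in (1e-2, 1e-4, 1e-6):
    print(h, (f(h) - f(0.0)) / h)

# Along x_k = 1/(k pi) -> 0, f'(x_k) is about -cos(k pi) = (-1)^(k+1): no limit at 0.
for k in range(1000, 1004):
    print(k, f_prime(1.0 / (k * math.pi)))
```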

In Landau notation this is written $f(x) = f(a)+ \nabla f(a)^T(x-a) + o(\|x-a\|)$ as $x \to a$, where o(t) stands for a quantity with $\frac{o(t)}{t} \to 0$ as $t \to 0^+$. It means the remainder r(x) shrinks faster than ∥x − a∥ as x approaches a.

The C¹ hypothesis is sufficient but stronger than needed: the conclusion holds exactly when f is differentiable at a, because it is then the definition of differentiability with $Df(a) = \nabla f(a)^T$. If f is not differentiable at a, then even when ∇f(a) exists, the ratio |r(x)|/∥x − a∥ does not tend to 0: its limit either fails to exist or is nonzero.

Equivalence to the Derivative-Definition Limit

Theorem. Let $f:\mathbb{R}^n \to \mathbb{R}$ be a real-valued function defined on an open set $\mathbb{S} = \operatorname{dom}(f)$. If f is continuously differentiable on $\mathbb{S}$, then $\lim_{d \to 0} \frac{f(x+d)-f(x)-\nabla f(x)^Td}{\|d\|} = 0$ for all $x \in \mathbb{S}$. In words, this first-order approximation accuracy theorem states that at every point x of the domain, the linear approximation $f(x) + \nabla f(x)^Td$ built from the gradient $\nabla f(x) = \biggl(\begin{smallmatrix}\frac{\partial f}{\partial x_1}\\\\ \frac{\partial f}{\partial x_2}\\\\ \vdots \\\\ \frac{\partial f}{\partial x_n}\end{smallmatrix}\biggr)$ becomes increasingly accurate as the displacement d from x approaches zero.

As the displacement d becomes smaller and smaller, the difference between the actual function value f(x + d) and its linear approximation $f(x) + \nabla f(x)^Td$ becomes negligible compared to the length ∥d∥ of the displacement. In other words, the linear approximation becomes increasingly accurate as we zoom in closer to the point x.
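The limit in the theorem can be watched numerically. This sketch assumes the illustrative function f(x, y) = eˣ cos y (not from the text), base point x = (0, 0) and the unit direction (0.6, −0.8), and prints the ratio (f(x + d) − f(x) − ∇f(x)ᵀd)/∥d∥ for d = t·(0.6, −0.8) with shrinking t. The name `error_ratio` is hypothetical.

```python
import math


def f(x: float, y: float) -> float:
    return math.exp(x) * math.cos(y)


def grad_f(x: float, y: float) -> tuple:
    # Partial derivatives of e^x cos(y).
    return (math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y))


def error_ratio(t: float, base=(0.0, 0.0), direction=(0.6, -0.8)) -> float:
    """(f(x + d) - f(x) - grad f(x)^T d) / ||d|| for d = t * direction."""
    x, y = base
    dx, dy = t * direction[0], t * direction[1]
    gx, gy = grad_f(x, y)
    remainder = f(x + dx, y + dy) - f(x, y) - (gx * dx + gy * dy)
    return remainder / math.hypot(dx, dy)


for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"t = {t:.0e}   ratio = {error_ratio(t):+.3e}")
```

The ratio shrinks roughly in proportion to t, since for this smooth f the remainder is of order ∥d∥². Taking t much smaller (say below 1e-7) would eventually let floating-point cancellation in f(x + d) − f(x) dominate, which is a limitation of the arithmetic, not of the theorem.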

Single‐Variable Special Case

When we restrict to functions of one real variable (n = 1, f: ℝ → ℝ), the multivariable machinery collapses to familiar single-variable calculus. In this setting, the gradient ∇f(x) is just the ordinary derivative f′(x), and the linearization becomes $\tilde{f}(x) = f(a) + f'(a)(x-a),$ the well-known tangent-line approximation from Calculus I.

Key Points:

It matches the actual function value at x = a.
Its slope equals the derivative f’(a), so it’s the tangent line.
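Error Term and Remainder. The difference between the true value and the tangent-line approximation is negligible compared to |x − a|; in little-o notation, f(x) = f(a) + f′(a)(x − a) + o(|x − a|). If f is twice differentiable on an interval containing a and x, Taylor's theorem with the Lagrange remainder gives the sharper statement f(x) = f(a) + f′(a)(x − a) + ½f″(ξ)(x − a)² for some ξ between a and x.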

Geometric Intuition

In 1D, we replace a curve by its tangent line.
In 2D, a surface z = f(x, y) is replaced by its tangent plane.
In nD, we replace the graph by a tangent hyperplane in ℝⁿ⁺¹, whose normal vector (∇f(a), −1) is built from the gradient.

As you zoom in (magnify) around the point a, the surface and its tangent plane become indistinguishable in the limit. That is the essence of differentiability.

Illustrative examples

1D Example: Sine Function

Let f(x) = sin(x), base point a = 0. True value at 0: f(0) = sin(0) = 0. Derivative at 0: f’(x) = cos(x), so f’(0) = 1.

Linearization about 0: $\tilde{f}(x) = f(0) + f'(0)(x-0) = 0 + 1 \cdot x = x.$ For small x, sin(x) ≈ x. Error bound: since f″(x) = −sin(x), Taylor's theorem gives |sin(x) − x| = ½|f″(ξ)|x² = ½|sin(ξ)|x² for some ξ between 0 and x.

Since |sin(ξ)| ≤ 1, we have |sin(x) − x| ≤ ½x².

Example: x = 0.1. Approximation: sin(0.1) ≈ 0.1. True value (calculator): sin(0.1) ≈ 0.09983341664. Actual error: ∣0.09983341664−0.1∣=0.00016658336. Error bound: ½(0.1)² = 0.005. The actual error ≈ 0.000167 is much less than the upper bound 0.005.
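A few lines of Python reproduce the numbers above. As an aside that explains why the actual error is so much smaller than the bound: the x² Taylor coefficient of sin at 0 vanishes (f″(0) = −sin 0 = 0), so the same Lagrange argument one order higher gives |sin x − x| ≤ |x|³/6, about 0.000167 at x = 0.1. That sharper bound is our addition for comparison.

```python
import math

x = 0.1
approx = x                      # tangent-line value: sin(x) ~ x
actual = math.sin(x)
error = abs(actual - approx)
bound = 0.5 * x ** 2            # second-order bound from |sin(xi)| <= 1
sharper = abs(x) ** 3 / 6       # third-order Lagrange bound, valid because f''(0) = 0 for sin

print(f"sin(0.1)      = {actual:.11f}")
print(f"actual error  = {error:.3e}")
print(f"1/2 x^2 bound = {bound:.3e}")
print(f"|x|^3/6 bound = {sharper:.3e}")
```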

2D Example: f(x, y) = x² + y²

Base point: (a, b) = (1, 2).
True value: f(1, 2) = 1² + 2² = 5.
Gradient: ∇f(x, y) = (2x, 2y), so ∇f(1, 2) = (2, 4).
Linearization about (1, 2): $\tilde{f}(x, y) = f(1, 2) + \nabla f(1, 2)^T(\begin{smallmatrix}x-1\\\ y-2\end{smallmatrix}) = 5 + (2 \;\; 4)(\begin{smallmatrix}x-1\\\ y-2\end{smallmatrix}) = 5 + 2(x-1) + 4(y-2).$

To approximate f(1.1, 1.9): $\tilde{f}(1.1, 1.9) = 5 + 2(0.1) + 4(-0.1) = 5 + 0.2 - 0.4 = 4.8.$

Actual value: f(1.1, 1.9) = 1.1² + 1.9² = 1.21 + 3.61 = 4.82. Error = 4.82 - 4.8 = 0.02.
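Because this f is a polynomial of degree 2, the remainder can be computed exactly, which explains the 0.02 error and shows the little-o behavior directly. Writing h = (h₁, h₂) = (x − 1, y − 2):

$$f(1+h_1,\, 2+h_2) = (1+h_1)^2 + (2+h_2)^2 = \underbrace{5 + 2h_1 + 4h_2}_{\tilde{f}(x,\,y)} + \underbrace{h_1^2 + h_2^2}_{r(h)}$$

So r(h) = ∥h∥², hence r(h)/∥h∥ = ∥h∥ → 0 as h → 0. For h = (0.1, −0.1), r(h) = 0.01 + 0.01 = 0.02, exactly the error computed above.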

Vector‐Valued Example (Jacobian in Action)

Let $F(x, y) = (\begin{smallmatrix}u(x, y)\\\ v(x, y)\end{smallmatrix}) = (\begin{smallmatrix}x^2 y\\\ e^{x+y}\end{smallmatrix})$

Base point: (a, b) = (0, 0).
True value: $F(0, 0) = (\begin{smallmatrix}0\\\ e^{0}\end{smallmatrix}) = (\begin{smallmatrix}0\\\ 1\end{smallmatrix})$
Jacobian: $DF(x, y) = (\begin{smallmatrix}\partial_x (x^2 y) & \partial_y (x^2 y)\\\ \partial_x (e^{x+y}) & \partial_y (e^{x+y})\end{smallmatrix}) = (\begin{smallmatrix}2xy & x^2\\\ e^{x+y} & e^{x+y}\end{smallmatrix})$
At (0, 0): $DF(0, 0) = (\begin{smallmatrix}0 & 0\\\ 1 & 1\end{smallmatrix})$
Linearization near (0, 0): $\tilde{F}(x, y) = F(0, 0) + DF(0, 0)(\begin{smallmatrix}x\\\ y\end{smallmatrix}) = (\begin{smallmatrix}0\\\ 1\end{smallmatrix}) + (\begin{smallmatrix}0 & 0\\\ 1 & 1\end{smallmatrix})(\begin{smallmatrix}x\\\ y\end{smallmatrix}) = (\begin{smallmatrix}0\\\ 1+x+y\end{smallmatrix})$
Approximation at (0.1, −0.05): $\tilde{F}(0.1, -0.05) = (\begin{smallmatrix}0\\\ 1+0.1-0.05\end{smallmatrix}) = (\begin{smallmatrix}0\\\ 1.05\end{smallmatrix})$
Actual value: $F(0.1, -0.05) = (\begin{smallmatrix}(0.1)^2(-0.05)\\\ e^{0.05}\end{smallmatrix}) \approx (\begin{smallmatrix}-0.0005\\\ 1.051271\end{smallmatrix})$
The approximation $(\begin{smallmatrix}0\\\ 1.05\end{smallmatrix})$ and the actual value $(\begin{smallmatrix}-0.0005\\\ 1.051271\end{smallmatrix})$ are very close: the error vector ≈ (−0.0005, 0.0013) has length about 0.0014, roughly 1% of the step length ∥(0.1, −0.05)∥ ≈ 0.112.
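The vector-valued computation can be double-checked in a few lines. This sketch hard-codes F and the linearization (0, 1 + x + y) derived above (the function names `F` and `F_lin` are ours) and measures the error with the Euclidean norm, matching the ∥·∥₂ used in the definition of differentiability.

```python
import math


def F(x: float, y: float) -> tuple:
    return (x * x * y, math.exp(x + y))


def F_lin(x: float, y: float) -> tuple:
    # Linearization at (0, 0): F(0, 0) + DF(0, 0) (x, y)^T = (0, 1 + x + y).
    return (0.0, 1.0 + x + y)


h = (0.1, -0.05)
actual, approx = F(*h), F_lin(*h)
err = math.hypot(actual[0] - approx[0], actual[1] - approx[1])
step = math.hypot(*h)
print("actual:", actual)
print("approx:", approx)
print(f"||error|| = {err:.6f}, ||h|| = {step:.6f}, ratio = {err / step:.4f}")
```

The ratio ∥error∥/∥h∥ of about 0.012 is the finite-step version of the quantity that the differentiability definition requires to tend to 0; halving h repeatedly would show it shrinking further.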