For every problem there is always at least one solution which seems quite plausible: simple and clean, direct, neat and nice, and yet very wrong. #Anawim, justtothepoint.com

In single-variable calculus, the derivative $f'(a)$ is the slope of the tangent line at $x = a$, satisfying $f(a + h) - f(a) = f'(a)h + o(|h|)$ as $h \to 0$.
In multivariable calculus ($F: \mathbb{R}^n \to \mathbb{R}^m$), the concept generalizes: the derivative is not a single number, but a linear transformation $DF(a): \mathbb{R}^n \to \mathbb{R}^m$ such that $F(a + h) - F(a) = DF(a)h + o(\|h\|)$ as $h \to 0$.
This linear map $DF(a)$ is the best linear approximation of F near a.
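As a concrete warm-up, here is a minimal Python sketch of the single-variable condition, using the illustrative choices f(x) = x² and a = 1.5 (not from the text): the remainder f(a + h) − f(a) − f′(a)h shrinks faster than |h|.

```python
# Minimal sketch: check f(a+h) - f(a) - f'(a)h = o(|h|) numerically.
# f(x) = x**2 and a = 1.5 are illustrative choices.
def f(x):
    return x**2

a = 1.5
fprime_a = 2 * a  # f'(x) = 2x, computed by hand

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    remainder = f(a + h) - f(a) - fprime_a * h
    print(f"h = {h:.0e}:  remainder/|h| = {remainder / abs(h):.2e}")
# remainder/|h| shrinks like h itself, confirming the o(|h|) behavior.
```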
Definition. The n-dimensional Euclidean space ℝⁿ is the set of all ordered n-tuples: $\mathbb{R}^n = \{(x_1, x_2, \ldots, x_n) : x_i \in \mathbb{R}\}$ equipped with:
Componentwise addition: $x + y = (x_1 + y_1, \ldots, x_n + y_n)$
Scalar multiplication: $\lambda x = (\lambda x_1, \ldots, \lambda x_n)$
The Euclidean norm: $\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$
Definition. The standard basis for ℝⁿ consists of the vectors: $e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, \ldots, 0, 1)$
More precisely: $(e_j)_i = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$
The standard basis $\{ e_j \}_{j=1}^n$ spans $\mathbb{R}^n$. Any vector can be written as: $x = (x_1, x_2, \ldots, x_n) = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n = \sum_{j=1}^{n} x_j e_j$. x has coordinates $(x_1, x_2, \ldots, x_n)$ relative to this basis.
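A one-line NumPy sanity check of this decomposition, with an arbitrary illustrative vector x ∈ ℝ³:

```python
import numpy as np

x = np.array([2.0, -1.0, 5.0])         # illustrative vector in R^3
basis = np.eye(3)                      # rows are e_1, e_2, e_3

# x = sum_j x_j e_j
reconstruction = sum(x[j] * basis[j] for j in range(3))
print(np.allclose(x, reconstruction))  # True
```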
Definition. A function L: ℝⁿ → ℝᵐ is linear if:
$L(x + y) = L(x) + L(y)$ for all $x, y \in \mathbb{R}^n$ (additivity)
$L(\alpha x) = \alpha L(x)$ for all $x \in \mathbb{R}^n$ and scalars α (homogeneity)
Equivalently: $L(\alpha x + \beta y) = \alpha L(x) + \beta L(y)$ for all scalars α, β.
Key Property: A linear map is completely determined by its action on basis vectors: $L(x) = L\left(\sum_{j=1}^{n} x_j e_j\right) = \sum_{j=1}^{n} x_j L(e_j)$
Every linear map L: ℝⁿ → ℝᵐ can be represented by an m × n matrix A: $L(x) = Ax$
The columns of A are the images of the standard basis vectors:
$$A = \begin{pmatrix} | & | & & | \\ L(e_1) & L(e_2) & \cdots & L(e_n) \\ | & | & & | \end{pmatrix}$$

Example: The linear map L: ℝ² → ℝ³ with L(e₁) = (1, 2, 3)ᵀ and L(e₂) = (4, 5, 6)ᵀ has matrix:
$$A = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}$$
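Here is a small NumPy sketch of this construction for the example above; the test vector x = (2, −1) is an arbitrary illustrative choice:

```python
import numpy as np

# Columns of A are the images of the standard basis vectors.
L_e1 = np.array([1.0, 2.0, 3.0])       # L(e_1)
L_e2 = np.array([4.0, 5.0, 6.0])       # L(e_2)
A = np.column_stack([L_e1, L_e2])      # the 3 x 2 matrix from the example

x = np.array([2.0, -1.0])              # arbitrary test vector
# Linearity: L(x) = x_1 L(e_1) + x_2 L(e_2) = A @ x
print(np.allclose(A @ x, x[0] * L_e1 + x[1] * L_e2))  # True
```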
For a function F: U ⊆ ℝⁿ → ℝᵐ, we want to approximate F near a point a using the simplest possible function: a linear map.
The derivative is not a number or a vector; it is a linear transformation that captures the "first-order" behavior of F at a.
The approximation takes the form: $F(a + h) \approx F(a) + L(h)$ where L: $\mathbb{R}^n \to \mathbb{R}^m$ is linear and the error vanishes faster than ∥h∥. The derivative is the best such approximation.
Definition (Fréchet Derivative). Let F: U ⊆ ℝⁿ → ℝᵐ where U is open, and let a ∈ U. We say F is differentiable at a if there exists a linear map L: ℝⁿ → ℝᵐ such that:
$$\lim_{h \to 0} \frac{\|F(a + h) - F(a) - L(h)\|}{\|h\|} = 0$$

When such an L exists, it is unique (see below), written $L = DF_a$, and called the total derivative of F at a. The definition can be rewritten as: $F(a + h) = F(a) + L(h) + o(\|h\|) \quad \text{as } h \to 0$ where $o(\|h\|)$ denotes a term with $\lim_{h \to 0} \frac{o(\|h\|)}{\|h\|} = 0$.
Or more explicitly: $F(a + h) = F(a) + L(h) + \|h\| \cdot \varepsilon(h)$ where $\varepsilon(h) \to 0$ as $h \to 0$.
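Here is a hedged numerical sketch of this limit for an illustrative map F(x, y) = (x² + y, xy) (my own example, not from the text); the candidate linear map L is the matrix of partial derivatives, anticipating the Jacobian introduced below.

```python
import numpy as np

# Numerical sketch of the Frechet limit for F(x, y) = (x**2 + y, x*y)
# at a = (1, 2); L is the matrix of partial derivatives worked by hand.
def F(p):
    x, y = p
    return np.array([x**2 + y, x * y])

a = np.array([1.0, 2.0])
L = np.array([[2 * a[0], 1.0],
              [a[1],     a[0]]])       # [[2x, 1], [y, x]] evaluated at a

rng = np.random.default_rng(0)
u = rng.normal(size=2)
u /= np.linalg.norm(u)                 # a fixed, arbitrary direction
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * u
    ratio = np.linalg.norm(F(a + h) - F(a) - L @ h) / np.linalg.norm(h)
    print(f"||h|| = {t:.0e}  ratio = {ratio:.2e}")
# The ratio tends to 0 as h -> 0, exactly the defining condition.
```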
Theorem (Uniqueness). If F is differentiable at a, the derivative $DF_a$ is unique.
Proof. Suppose L₁ and L₂ both satisfy the definition. Then, $\frac{\|L_1(h) - L_2(h)\|}{\|h\|} = \frac{\|[F(a+h) - F(a) - L_2(h)] - [F(a+h) - F(a) - L_1(h)]\|}{\|h\|}$
By the triangle inequality, $\leq \frac{\|F(a+h) - F(a) - L_1(h)\|}{\|h\|} + \frac{\|F(a+h) - F(a) - L_2(h)\|}{\|h\|}$
Both terms on the right tend to 0 as $h \to 0$. Hence, $\lim_{h \to 0} \frac{\|L_1(h) - L_2(h)\|}{\|h\|} = 0 (\star).$
Because $L_1$ and $L_2$ are linear, for any fixed non-zero vector u and any real $t \ne 0$, $L_i(tu) = tL_i(u)$.
Next, take h = tu with $t \to 0$. Then, $\|h\| = |t|\|u\|$ and $\frac{\|L_1(tu) - L_2(tu)\|}{|t|\|u\|} = \frac{|t| \|L_1(u) - L_2(u)\|}{|t|\|u\|} = \frac{\|L_1(u) - L_2(u)\|}{\|u\|}$
This expression does not depend on t. Since the limit as $t \to 0$ must be 0 $(\star)$, we obtain $\frac{\|L_1(u) - L_2(u)\|}{\|u\|} = 0 \implies L_1(u) = L_2(u)$
The equality holds for every non‑zero vector u; for u = 0 it is trivial because linear maps send 0 to 0. Therefore, $L_1$ and $L_2$ agree on the whole space, i.e., $L_1 = L_2$ ∎
Why does this matter? Uniqueness guarantees that the derivative is well-defined; otherwise the notation $DF_a$ would be ambiguous. Furthermore, it justifies calling the derivative the best linear approximation of F at a.
While the total derivative $DF_a$ captures how F changes in all directions simultaneously, partial derivatives measure rates of change along coordinate axes only.
Definition. Let F: U ⊆ ℝⁿ → ℝᵐ be a function and a ∈ U. The partial derivative of F with respect to $x_j$ at a is: $\frac{\partial F}{\partial x_j}(a) = \lim_{t \to 0} \frac{F(a + te_j) - F(a)}{t}$ where $e_j$ is the j-th standard basis vector, provided the limit exists.
The partial derivative $\frac{\partial F}{\partial x_j}(a)$ is the rate of change of F when we move from a in the $e_j$ direction, keeping all other coordinates fixed.
Example: F: ℝ² → ℝ³ defined by F(s, t) = (s² + t³, 2st, s + 3t) where $F_1(s,t) = s^2 + t^3$, $F_2(s,t) = 2st$, and $F_3(s,t) = s + 3t$
Partial with respect to s: $\frac{\partial F}{\partial s} = \begin{pmatrix} \frac{\partial F_1}{\partial s} \\[6pt] \frac{\partial F_2}{\partial s} \\[6pt] \frac{\partial F_3}{\partial s} \end{pmatrix} = \begin{pmatrix} 2s \\ 2t \\ 1 \end{pmatrix}$
Partial with respect to t: $\frac{\partial F}{\partial t} = \begin{pmatrix} \frac{\partial F_1}{\partial t} \\[6pt] \frac{\partial F_2}{\partial t} \\[6pt] \frac{\partial F_3}{\partial t} \end{pmatrix} = \begin{pmatrix} 3t^2 \\ 2s \\ 3 \end{pmatrix}$
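These two hand computations can be double-checked symbolically; here is a minimal SymPy sketch (using SymPy is an incidental choice, any CAS would do):

```python
import sympy as sp

s, t = sp.symbols('s t')
F = sp.Matrix([s**2 + t**3, 2*s*t, s + 3*t])

print(F.diff(s))  # Matrix([[2*s], [2*t], [1]])    -> partial w.r.t. s
print(F.diff(t))  # Matrix([[3*t**2], [2*s], [3]]) -> partial w.r.t. t
```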
Since $DF_a$ is a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$, it can be represented by an $m \times n$ matrix. This is called the Jacobian Matrix.
Definition. Let F: U ⊆ ℝⁿ → ℝᵐ with component functions $F = (F_1, F_2, \ldots, F_m)$. The Jacobian matrix of F at a point a ∈ U is the m × n matrix formed by all partial derivatives evaluated at x = a:
$$J_F(a) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1}(a) & \frac{\partial F_1}{\partial x_2}(a) & \cdots & \frac{\partial F_1}{\partial x_n}(a) \\[8pt] \frac{\partial F_2}{\partial x_1}(a) & \frac{\partial F_2}{\partial x_2}(a) & \cdots & \frac{\partial F_2}{\partial x_n}(a) \\[8pt] \vdots & \vdots & \ddots & \vdots \\[8pt] \frac{\partial F_m}{\partial x_1}(a) & \frac{\partial F_m}{\partial x_2}(a) & \cdots & \frac{\partial F_m}{\partial x_n}(a) \end{pmatrix}$$

Row View: The rows of the Jacobian are the transposes of the gradients of the component functions: $J_F(a) = \begin{pmatrix} — (\nabla F_1(a))^T — \\ — (\nabla F_2(a))^T — \\ \vdots \\ — (\nabla F_m(a))^T — \end{pmatrix}$
Column View: Each column is a partial derivative vector:
$$J_F = \begin{pmatrix} | & | & & | \\[4pt] \frac{\partial F}{\partial x_1} & \frac{\partial F}{\partial x_2} & \cdots & \frac{\partial F}{\partial x_n} \\[4pt] | & | & & | \end{pmatrix}$$

To find the actual change vector $DF_a(h)$ for a specific displacement $h$, we perform a matrix multiplication: $DF_a(h) = J_F(a) \cdot \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}$
The derivative $DF_a$ is the best linear approximation of the change in F near the point a, and the Jacobian matrix $J_F(a)$ is the matrix that represents this linear transformation. Multiplying $J_F(a)$ by a vector h gives the approximate change in F corresponding to the small displacement h.
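Continuing the example F(s, t) = (s² + t³, 2st, s + 3t) from above, here is a short SymPy sketch that assembles $J_F$, evaluates it at an illustrative base point a = (1, 2), and compares the true change F(a + h) − F(a) with the linear prediction $J_F(a)h$:

```python
import sympy as sp

s, t = sp.symbols('s t')
F = sp.Matrix([s**2 + t**3, 2*s*t, s + 3*t])
J = F.jacobian([s, t])                  # [[2s, 3t^2], [2t, 2s], [1, 3]]

a = {s: 1, t: 2}                        # illustrative base point
h = sp.Matrix([0.01, -0.02])            # small displacement
actual = F.subs({s: 1.01, t: 1.98}) - F.subs(a)   # F(a + h) - F(a)
approx = J.subs(a) * h                  # J_F(a) h
print((actual - approx).norm())         # tiny residual: the error is o(||h||)
```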
Example: real versus complex differentiability.
We can think of a complex number z = x + iy as a point $(x,y) \in \mathbb{R}^2$. So any function $f:\mathbb{C}\rightarrow \mathbb{C}$ can also be seen as a function $F:\mathbb{R}^2\rightarrow \mathbb{R}^2,\quad F(x,y)=(u(x,y),v(x,y)),$ where f(x + iy) = u(x, y) + iv(x, y).
So there are two notions of differentiability:
Real differentiability of $F:\mathbb{R}^2\rightarrow \mathbb{R}^2$
Complex differentiability of $f:\mathbb{C}\rightarrow \mathbb{C}$
They are related, but not the same. Complex differentiability is much more restrictive.
Real differentiability: any linear map is allowed. For a function $F:\mathbb{R}^2\rightarrow \mathbb{R}^2$, real differentiability at a point $(x_0,y_0)$ means that there exists a real linear map $DF(x_0,y_0):\mathbb{R}^2\rightarrow \mathbb{R}^2$ (a $2\times 2$ matrix) such that
$F(x_0+\Delta x,y_0+\Delta y)=F(x_0,y_0)+DF(x_0,y_0)\left( \begin{matrix}\Delta x\\ \Delta y\end{matrix}\right) +\mathrm{error},$ where the error is small compared to $\sqrt{(\Delta x)^2+(\Delta y)^2}.$
So in the real sense, a function is differentiable if it can be locally approximated by a linear transformation (a matrix multiplication). This matrix can stretch, rotate, reflect, or skew space in any way.
Complex differentiability: only complex multiplication is allowed. Now look at $f:\mathbb{C}\rightarrow \mathbb{C}$.
Complex differentiability at $z_0$ means that there exists a complex number a such that $f(z_0+h)=f(z_0)+a\, h+\mathrm{error},$ where the error is small compared to |h| as $h\rightarrow 0$.
In the real case, the linear approximation can be any real-linear map $\mathbb{R}^2\rightarrow \mathbb{R}^2$. However, in the complex case, the linear approximation must be multiplication by a single complex number $a = re^{i\theta}$.
Multiplication by the complex number $a = re^{i\theta}$ results only in rotation by the angle $\theta$ and uniform scaling by $r = |a|$. It does not allow for reflection or skewing.
Every complex number $a=\alpha +i\beta$ defines a real-linear map $\mathbb{R}^2\rightarrow \mathbb{R}^2$ via multiplication: $a(x+iy)=(\alpha x-\beta y)+i(\beta x+\alpha y).$
In matrix form, this is: $\left( \begin{matrix}u\\ v\end{matrix}\right) =\left( \begin{matrix}\alpha &-\beta \\ \beta &\alpha \end{matrix}\right) \left( \begin{matrix}x\\ y\end{matrix}\right).$
So complex multiplication corresponds exactly to matrices of the form $\left( \begin{matrix}a&-b\\ b&a\end{matrix}\right)$.
These are precisely the matrices that represent rotation + uniform scaling (no reflection, no shear).
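A quick Python check that multiplying by a complex number a = α + iβ acts on (x, y) exactly as the matrix above; the values of α, β, and z are arbitrary illustrative choices:

```python
import numpy as np

alpha, beta = 0.6, 1.3                  # a = alpha + i*beta (arbitrary)
a = complex(alpha, beta)
M = np.array([[alpha, -beta],
              [beta,  alpha]])          # rotation + uniform scaling

z = complex(2.0, -0.5)                  # arbitrary input
w = a * z                               # complex multiplication
v = M @ np.array([z.real, z.imag])      # matrix acting on (x, y)
print(np.allclose([w.real, w.imag], v)) # True
```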
However, a general real derivative $DF(x_0,y_0)$ is an arbitrary matrix $\left( \begin{matrix}A&B\\ C&D\end{matrix}\right)$, with no relation between A, B, C, and D.
For complex differentiability, we require that this matrix comes from a single complex number, i.e. $\left( \begin{matrix}A&B\\ C&D\end{matrix}\right) =\left( \begin{matrix}a&-b\\ b&a\end{matrix}\right)$ for some real a, b. That forces: $A=D,\quad C=-B$.
These are exactly the Cauchy–Riemann equations in disguise.
Write f(z) = u(x, y) + iv(x, y), with z = x + iy. If f is complex differentiable at $z_0$ with $f'(z_0) = a + ib$, then the Jacobian of F = (u, v) must be the matrix of multiplication by $f'(z_0)$: $\begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix} = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$
Matching entries gives: $u_x=v_y,\quad u_y=-v_x.$ These are the Cauchy–Riemann equations. They are exactly the condition that the real derivative is not just any linear map, but one that comes from complex multiplication.
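As a concrete check, this SymPy sketch verifies the Cauchy–Riemann equations for the illustrative holomorphic function f(z) = z², for which u = x² − y² and v = 2xy:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
u = x**2 - y**2                         # real part of z**2
v = 2*x*y                               # imaginary part of z**2

print(sp.simplify(u.diff(x) - v.diff(y)))  # 0  ->  u_x = v_y
print(sp.simplify(u.diff(y) + v.diff(x)))  # 0  ->  u_y = -v_x
```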
Conclusion: a function f = u + iv is complex differentiable at $z_0$ if and only if it is real differentiable there and its Jacobian has the rotation-plus-scaling form, i.e., the Cauchy–Riemann equations $u_x = v_y$, $u_y = -v_x$ hold at $z_0$. Complex differentiability is real differentiability plus the extra rigidity that the local linear map is multiplication by a single complex number.