"I have yet to see any problem, however complicated, which, when looked at in the right way, did not become still more complicated." Poul Anderson.
Definition. Let $f$ be a real-valued function that is differentiable at a point $a \in int(dom(f))$. The affine function $\tilde{f}(x) = f(a) + \nabla f(a)^T(x-a)$ is the first-order approximation of $f$ at $x = a$, where $\nabla f(a)$ is the gradient of $f$ evaluated at $a$.
The first-order approximation assumes that the function behaves linearly near a. This is a good approximation if x is very close to a. However, as x moves further away from a, the curvature of the function typically becomes more significant, and the linear approximation becomes less accurate.
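As a rough illustration of how the error grows with distance from $a$, the following sketch evaluates the first-order approximation at points increasingly close to $a$. The test function $f(x_1, x_2) = e^{x_1} + x_1 x_2^2$, the point $a = (0, 1)$, the direction $(1, 1)$, and the helper names are all assumptions chosen for this example, not part of the definition.

```python
import math


def f(x):
    """Hypothetical test function f(x1, x2) = exp(x1) + x1 * x2**2."""
    x1, x2 = x
    return math.exp(x1) + x1 * x2 ** 2


def grad_f(x):
    """Analytic gradient of the test function above."""
    x1, x2 = x
    return (math.exp(x1) + x2 ** 2, 2.0 * x1 * x2)


def first_order(a, x):
    """Evaluate the affine model f(a) + grad_f(a)^T (x - a)."""
    g = grad_f(a)
    return f(a) + sum(gi * (xi - ai) for gi, xi, ai in zip(g, x, a))


a = (0.0, 1.0)          # expansion point (assumed for illustration)
direction = (1.0, 1.0)  # direction in which x moves away from a

errors = []
for step in (1.0, 0.1, 0.01):
    x = tuple(ai + step * di for ai, di in zip(a, direction))
    err = abs(f(x) - first_order(a, x))
    errors.append(err)
    print(f"step={step:<5} f(x)={f(x):.6f} "
          f"approx={first_order(a, x):.6f} error={err:.2e}")
```

Under these assumptions, the printed error should shrink much faster than the step size, which reflects the curvature term that the affine model ignores.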
Theorem. Let $f:\mathbb{R}^n \to \mathbb{R}$ be a real-valued function defined on an open set $\mathbb{S} = dom(f)$. If $f$ is continuously differentiable on its domain, then $\lim_{d \to 0} \frac{f(x+d)-f(x)-\nabla f(x)^Td}{||d||} = 0, \forall x \in \mathbb{S}$, where $\nabla f(x) = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right)^T$ is the gradient of $f$ at $x$. In words, this first-order approximation accuracy theorem states that, at any point $x$ in the domain, the error of the linear approximation $f(x) + \nabla f(x)^Td$ to $f(x+d)$ vanishes faster than the displacement $d$ itself as $d$ approaches zero.
As the displacement $d$ becomes smaller and smaller, the difference between the actual function value $f(x+d)$ and its linear approximation $f(x) + \nabla f(x)^Td$ becomes negligible compared to the magnitude of the displacement $||d||$. In other words, the linear approximation becomes increasingly accurate as we zoom in closer to the point $x$.
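The limit in the theorem can be checked numerically. The sketch below computes the ratio $\frac{f(x+d)-f(x)-\nabla f(x)^Td}{||d||}$ for displacements of shrinking length along a fixed unit direction. The function $f(x_1, x_2) = x_1^2 x_2 + \sin(x_2)$, the point $x = (1, 0.5)$, and the direction $(0.6, 0.8)$ are assumptions made only for this example.

```python
import math


def f(x):
    """Hypothetical smooth function f(x1, x2) = x1**2 * x2 + sin(x2)."""
    x1, x2 = x
    return x1 ** 2 * x2 + math.sin(x2)


def grad_f(x):
    """Analytic gradient of the function above."""
    x1, x2 = x
    return (2.0 * x1 * x2, x1 ** 2 + math.cos(x2))


def remainder_ratio(x, d):
    """Return |f(x + d) - f(x) - grad_f(x)^T d| / ||d||."""
    x_plus_d = tuple(xi + di for xi, di in zip(x, d))
    linear_part = sum(gi * di for gi, di in zip(grad_f(x), d))
    norm_d = math.sqrt(sum(di ** 2 for di in d))
    return abs(f(x_plus_d) - f(x) - linear_part) / norm_d


x = (1.0, 0.5)   # evaluation point (assumed for illustration)
unit = (0.6, 0.8)  # unit-length direction, since 0.36 + 0.64 = 1

ratios = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    d = tuple(t * ui for ui in unit)
    ratios.append(remainder_ratio(x, d))
    print(f"||d||={t:.0e}  ratio={ratios[-1]:.3e}")
```

For this smooth example the ratio should decrease roughly in proportion to $||d||$, which is stronger than the theorem requires: the theorem only guarantees that the ratio tends to zero.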
The theorem assumes that $f$ is continuously differentiable on its domain, which guarantees that $f$ is differentiable at every point $x \in \mathbb{S}$. If $f$ is not differentiable at $x$, the limit in the theorem may fail to exist or may be nonzero. Equivalently, the result can be expressed in little-o notation as $f(x) = f(a)+ \nabla f(a)^T(x-a) + o(||x-a||), a \in \mathbb{S}$, where $\frac{o(t)}{t} \to 0 \text{ as } t \to 0^+$.
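To see what can go wrong without differentiability, consider the one-dimensional function $f(x) = |x|$ at $x = 0$, an example chosen here for illustration. Since no gradient exists at that point, the sketch below tries several candidate slopes $g$ in place of $\nabla f(0)$ and evaluates the ratio from the theorem for displacements approaching zero from the right and from the left.

```python
def f(x):
    """f(x) = |x|, which is not differentiable at x = 0."""
    return abs(x)


def remainder_ratio(g, d):
    """Return (f(0 + d) - f(0) - g * d) / |d| for a candidate slope g."""
    return (f(d) - f(0.0) - g * d) / abs(d)


candidate_slopes = (-1.0, 0.0, 0.5, 1.0)  # stand-ins for a gradient at 0
steps = (1e-1, 1e-3, 1e-6)

for g in candidate_slopes:
    from_right = [remainder_ratio(g, t) for t in steps]
    from_left = [remainder_ratio(g, -t) for t in steps]
    print(f"g={g:+.1f}  right={from_right}  left={from_left}")
```

Algebraically, the ratio equals $1 - g$ for $d > 0$ and $1 + g$ for $d < 0$, so the two cannot both be zero. No choice of slope makes the ratio vanish in both directions, which means no linear term yields an $o(|d|)$ remainder at $x = 0$.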