"To err is human, to blame it on someone else is even more human." — Jacob's Law

- A function of two variables $f: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ assigns to each ordered pair in its domain a unique real number, e.g., Area $= \frac{1}{2}b·h$, $z = f(x, y) = 2x + 3y$, $f(x, y) = x^2 + y^2$, $e^{x+y}$, etc.

- Consider a function $f(x_1, x_2, \cdots, x_n)$ that depends on n variables. The partial derivative of f with respect to one of these variables, say $x_i$, is the derivative of f with respect to $x_i$ while holding the other variables constant.

For a function f(x, y), **the partial derivative with respect to x at a point $(x_0, y_0)$** is defined as:

$\frac{\partial f}{\partial x}(x_0, y_0) = \lim_{\Delta x \to 0} \frac{f(x_0+\Delta x, y_0)-f(x_0, y_0)}{\Delta x}$

Using a different notation, $\frac{\partial f}{\partial x}(x, y) = \lim_{h \to 0} \frac{f(x+h, y)-f(x, y)}{h}$
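As a quick numerical check of this limit definition, we can approximate ∂f/∂x with a central difference; the function and the step size h below are illustrative choices, not part of the definition:

```python
def partial_x(f, x, y, h=1e-6):
    """Central-difference approximation of ∂f/∂x at (x, y), holding y constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def f(x, y):
    return x**2 + y**2

# The exact partial is ∂f/∂x = 2x, so the estimate at (1, 2) should be very close to 2
print(partial_x(f, 1.0, 2.0))
```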

- Near the point $(x_0, y_0)$, the surface defined by the function f(x, y) is well approximated by its tangent plane: $z ≈ f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x -x_0)+ \frac{\partial f}{\partial y}(x_0, y_0)(y -y_0)$

This can be rewritten as:

$Δz ≈ \frac{\partial f}{\partial x}(x_0, y_0)(x -x_0)+ \frac{\partial f}{\partial y}(x_0, y_0)(y -y_0) =f_xΔx+f_yΔy$ (Figure B) where $Δz = z - f(x_0, y_0), Δx = x -x_0, Δy = y -y_0.$

This equation states that **the graph of f is close to its tangent plane in the vicinity of the point $(x_0, y_0)$**. The tangent plane provides a good approximation of the surface near this point.
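This closeness can be checked numerically; the function f(x, y) = x² + y² and the base point (1, 1) below are illustrative choices:

```python
def f(x, y):
    return x**2 + y**2

# Base point and the partial derivatives of f there: f_x = 2*x0, f_y = 2*y0
x0, y0 = 1.0, 1.0
fx, fy = 2 * x0, 2 * y0

def tangent_plane(x, y):
    """Linear approximation f(x0, y0) + f_x*(x - x0) + f_y*(y - y0)."""
    return f(x0, y0) + fx * (x - x0) + fy * (y - y0)

# Near (1, 1) the plane tracks the surface closely
print(f(1.01, 1.02), tangent_plane(1.01, 1.02))
```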

Least squares interpolation is a method used to fit a curve to a set of data points in such a way that **the sum of the squares of the differences between the observed and predicted values is minimized**. This technique aims to find the best-fitting curve by minimizing the total squared error between the actual data points and the curve.

Let’s illustrate this with a simple example where we fit a linear model to a set of data points (x_{i}, y_{i}). We choose a linear model of the form y = ax +b to represent the relationship between the independent and dependent variables. Here, a is the slope and b is the y-intercept of the line.

Our goal is to minimize **the total squared deviation between the observed values $y_i$ and the predicted values $ax_i + b$.** The deviation for each data point is $y_i - (ax_i + b)$.

Therefore, our goal is to minimize the total squared deviation, i.e., D(a, b) = $\sum_{i=1}^n [y_i-(ax_i +b)]^2$. In other words, we aim to find the values of a and b that minimize D(a, b).

To find the minimum of D(a, b), we take the partial derivatives of D(a, b) with respect to a and b, and set them to zero.

$\frac{\partial D}{\partial a} = \sum_{i=1}^n 2(y_i-(ax_i +b))(-x_i)=0, \frac{\partial D}{\partial b} = \sum_{i=1}^n 2(y_i-(ax_i +b))(-1)=0$

By simplifying these partial derivatives, we get the following system of equations:

$\begin{cases} \sum_{i=1}^n (y_i-(ax_i +b))(-x_i)=0 \\ \sum_{i=1}^n (y_i-(ax_i +b))(-1)=0 \end{cases}$

Rearranging terms:

$\begin{cases} \sum_{i=1}^n (x_i^2a+x_ib-x_iy_i)= 0 \\ \sum_{i=1}^n (x_ia +b -y_i)=0 \end{cases}$

$\begin{cases} (\sum_{i=1}^n x_i^2)a + (\sum_{i=1}^nx_i)b = \sum_{i=1}^n x_iy_i \\ (\sum_{i=1}^n x_i)a + nb = \sum_{i=1}^n y_i \end{cases}$

This is a linear system of equations with two unknowns, a and b. We can solve this system to find the values of a and b that minimize the total squared error.
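A minimal sketch of this procedure, building the normal equations above and solving the 2×2 system with NumPy (the function name `fit_line` is my own):

```python
import numpy as np

def fit_line(xs, ys):
    """Fit y = a*x + b by solving the 2x2 normal equations from least squares."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Coefficient matrix: [[sum(x^2), sum(x)], [sum(x), n]]
    A = np.array([[np.sum(xs**2), np.sum(xs)],
                  [np.sum(xs),    len(xs)]])
    # Right-hand side: [sum(x*y), sum(y)]
    rhs = np.array([np.sum(xs * ys), np.sum(ys)])
    a, b = np.linalg.solve(A, rhs)
    return a, b
```

For instance, `fit_line([0, 2, 3], [1, 1, 4])` returns a ≈ 0.857 and b ≈ 0.571.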

Least squares is a method to find the best-fitting line to a given set of points. The line is defined by the equation y = ax +b. The goal is to minimize the sum of the squared differences between the observed values and the values predicted by the line.

We start with the normal equations derived from the least squares method: $\begin{cases} (\sum_{i=1}^n x_i^2)a + (\sum_{i=1}^nx_i)b = \sum_{i=1}^n x_iy_i \\ (\sum_{i=1}^n x_i)a + nb = \sum_{i=1}^n y_i \end{cases}$ where n is the number of data points.

For the data points (0, 1), (2, 1), and (3, 4):

$\sum_{i=1}^n x_i = 0+2+3= 5, \sum_{i=1}^n y_i = 1+1+4 = 6, \sum_{i=1}^n x_i^2 = 0^2+2^2+3^2 = 0 + 4 + 9 = 13, \sum_{i=1}^n x_iy_i = 0 + 2 + 12 = 14, n = 3$ (since we have three data points).

Form the system of linear equations: $\begin{cases} 13a + 5b = 14 \\ 5a + 3b = 6 \end{cases}$

Solve the system of equations:

Multiply the second equation by 5: 25a +15b = 30. Multiply the first equation by 3: 39a +15b = 42.

Subtract the second modified equation from the first: (39a + 15b) − (25a+15b)= 42−30 ⇒ 14a=12 ⇒$a = \frac{6}{7}$.

Substitute a into the second equation: $5·\frac{6}{7} + 3b = 6 ⇒ 3b = 6 - \frac{30}{7} = \frac{42}{7}-\frac{30}{7} = \frac{12}{7} ⇒ b = \frac{4}{7}.$

So the line of best fit is $y = \frac{6}{7}x + \frac{4}{7}$ (Figure ii).
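The hand computation can be cross-checked with NumPy's built-in least squares polynomial fit:

```python
import numpy as np

xs = np.array([0.0, 2.0, 3.0])
ys = np.array([1.0, 1.0, 4.0])

# polyfit with degree 1 performs exactly this least squares line fit,
# returning the slope and intercept
a, b = np.polyfit(xs, ys, 1)
print(a, b)  # a ≈ 6/7 ≈ 0.857, b ≈ 4/7 ≈ 0.571
```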

Given some data points $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$, we want to fit a quadratic function of the form $y = ax^2 + bx + c$ to these points using the least squares method. This means we want to minimize the sum of the squared differences between the observed $y_i$ values and the predicted y values given by the quadratic function.

First, we **define the objective function**. The sum of squared errors (SSE) is given by: $\sum_{i=1}^n (y_i- (ax_i^2+bx_i+c))^2.$ We want to find the values of a, b, and c that minimize this SSE.

To minimize the SSE, we take partial derivatives of the SSE with respect to a, b, and c, and set them to zero.

$\frac{∂}{∂a}(\sum_{i=1}^n (y_i- (ax_i^2+bx_i+c))^2) = 0 ⇒[\text{Expanding and differentiating term by term}] \sum_{i=1}^n 2·(y_i- (ax_i^2+bx_i+c))·(-x_i^2)=0 ⇒ \sum_{i=1}^n -2x_i^2(y_i- (ax_i^2+bx_i+c)) = 0 ⇒[\text{Simplify}] \sum_{i=1}^n x_i^2y_i = a\sum_{i=1}^nx_i^4 + b\sum_{i=1}^nx_i^3 + c\sum_{i=1}^nx_i^2$

$\frac{∂}{∂b}(\sum_{i=1}^n (y_i- (ax_i^2+bx_i+c))^2) = 0 ⇒[\text{Expanding and differentiating term by term}] \sum_{i=1}^n 2·(y_i- (ax_i^2+bx_i+c))·(-x_i)=0 ⇒ \sum_{i=1}^n -2x_i(y_i- (ax_i^2+bx_i+c)) = 0 ⇒[\text{Simplify}] \sum_{i=1}^n x_iy_i = a\sum_{i=1}^nx_i^3 + b\sum_{i=1}^nx_i^2 + c\sum_{i=1}^nx_i$

$\frac{∂}{∂c}(\sum_{i=1}^n (y_i- (ax_i^2+bx_i+c))^2) = 0 ⇒[\text{Expanding and differentiating term by term}] \sum_{i=1}^n 2·(y_i- (ax_i^2+bx_i+c))·(-1)=0 ⇒ \sum_{i=1}^n -2(y_i- (ax_i^2+bx_i+c)) = 0 ⇒[\text{Simplify}] \sum_{i=1}^n y_i = a\sum_{i=1}^nx_i^2 + b\sum_{i=1}^nx_i + c\sum_{i=1}^n1 = a\sum_{i=1}^nx_i^2 + b\sum_{i=1}^nx_i + cn$

The three equations derived from the partial derivatives are known as the normal equations. They are:

$\begin{cases} \sum_{i=1}^n x_i^2y_i = a\sum_{i=1}^nx_i^4 + b\sum_{i=1}^nx_i^3 + c\sum_{i=1}^nx_i^2 \\ \sum_{i=1}^n x_iy_i = a\sum_{i=1}^nx_i^3 + b\sum_{i=1}^nx_i^2 + c\sum_{i=1}^nx_i \\ \sum_{i=1}^n y_i = a\sum_{i=1}^nx_i^2 + b\sum_{i=1}^nx_i + cn \end{cases}$

For a better fit, especially if the data suggests a nonlinear trend, we can use a quadratic model: $y = ax^2 + bx + c.$ To find the coefficients a, b, and c, we minimize the function: $f(a, b, c) = \sum_{i=1}^3 (y_i- (ax_i^2+bx_i+c))^2.$

Given the sums from the data points (0, 1), (2, 1), and (3, 4): $\sum_{i=1}^3 x_i = 5, \sum_{i=1}^3 y_i = 6, \sum_{i=1}^3 x_i^2 = 0 + 4 +9 = 13, \sum_{i=1}^3 x_iy_i = 0 +2 + 12 = 14, \sum_{i=1}^3 x_i^3 = 0 + 8 + 27 = 35, \sum_{i=1}^3 x_i^4 = 0 + 16 + 81 = 97, \sum_{i=1}^3 x_i^2y_i = 0 + 4 + 36 = 40$

Let’s plug these values into our equations (n = 3):

$\begin{cases} 40 = 97a + 35b + 13c \\ 14 = 35a + 13b + 5c \\ 6 = 13a + 5b + 3c \end{cases}$

The solution of this system is a = 1, b = −2, and c = 1. The quadratic function that best fits the data points (0, 1), (2, 1), and (3, 4) using the least squares method is: y = x^{2} − 2x + 1.
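A quick numerical check of this solution, solving the 3×3 system of normal equations directly:

```python
import numpy as np

# Normal equations for the quadratic fit, with the sums computed above
A = np.array([[97.0, 35.0, 13.0],
              [35.0, 13.0,  5.0],
              [13.0,  5.0,  3.0]])
rhs = np.array([40.0, 14.0, 6.0])

a, b, c = np.linalg.solve(A, rhs)
print(a, b, c)  # ≈ 1.0, -2.0, 1.0
```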

Linear fit: For x = 0, $y = \frac{4}{7} ≈ 0.571 ≠ 1$. For x = 2, $y = \frac{6}{7}·2 + \frac{4}{7} = \frac{16}{7} ≈ 2.286 ≠ 1$. For x = 3, $y = \frac{22}{7} ≈ 3.143 ≠ 4$.

Quadratic fit: For x = 0, y = 0^{2} −2(0) +1 = 1 (0, 1). For x = 2, y = 2^{2} −2(2) +1 = 1 (2, 1). For x = 3, y = 3^{2} −2(3) +1 = 4 (3, 4). The quadratic fit passes exactly through all the given points; this is expected, since a quadratic has three coefficients and can interpolate any three points with distinct x-values. In conclusion, the quadratic fit perfectly matches all points, whereas the linear fit does not.

Moore's law is the observation (or the prediction) that the number of transistors in an integrated circuit (IC) doubles approximately every two years. This concept, first articulated by Gordon Moore, co-founder of Intel, has held remarkably true for several decades and is a key driver of technological advancement.

The exponential growth described by Moore's Law can be mathematically represented by the function $y = c·e^{ax}$ where:

- y represents the number of transistors on an integrated circuit.
- c is a constant multiplier that accounts for initial conditions.
- a is the growth rate parameter.
- x is the time measured in years.

To analyze this growth using linear regression, we can transform the exponential function into a linear form. Taking the natural logarithm of both sides of the equation $y = c·e^{ax}$, we get: $\ln(y) = \ln(c) + \ln(e^{ax})$, i.e., **ln(y) = ln(c) + ax**.

This transformation reveals that ln(y) is a linear function of x, where ln(c) is the y-intercept and a is the slope; finding the **linear best fit** of ln(y) against x therefore recovers a and ln(c).

To find the best straight-line fit for ln(y) over time, we use linear regression. Below is a Python script that demonstrates how to perform this regression using sample data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data (years and number of transistors; illustrative values)
years = np.array([1970, 1972, 1974, 1976, 1978, 1980, 1982, 1984, 1986, 1988,
                  1990, 1992, 1994, 1996, 1998, 2000, 2002, 2004, 2006, 2008,
                  2010, 2012, 2014, 2016, 2018])
transistors = np.array([2300, 4500, 10000, 20000, 29000, 55000, 120000,
                        275000, 1180000, 3100000, 11800000, 27500000,
                        55000000, 278000000, 1000000000, 4200000000,
                        22000000000, 59200000000, 291000000000, 904000000000,
                        2600000000000, 7300000000000, 18800000000000,
                        36000000000000, 70000000000000])

# Transform data for linear regression: fit ln(y) = ln(c) + a*x
log_transistors = np.log(transistors)

# Perform linear regression (degree-1 least squares polynomial fit)
slope, intercept = np.polyfit(years, log_transistors, 1)

# Compute predicted values back on the original scale
predicted_log_transistors = slope * years + intercept
predicted_transistors = np.exp(predicted_log_transistors)

# Plot the data and the fit on a logarithmic y-axis
plt.figure(figsize=(10, 6))
plt.scatter(years, transistors, label='Observed Data', color='blue')
plt.plot(years, predicted_transistors, label='Fitted Line', color='red')
plt.yscale('log')
plt.xlabel('Year')
plt.ylabel('Number of Transistors (log scale)')
plt.title("Moore's Law: Transistor Count Over Time")
plt.legend()
plt.grid(True)
plt.show()
```
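Once the slope a of the ln(y) regression is known, the implied doubling time is ln(2)/a years. A minimal sketch of that computation, using synthetic data constructed to double exactly every two years (real transistor counts differ):

```python
import numpy as np

# Synthetic (year, count) pairs: the count doubles exactly every 2 years
years = np.array([1971.0, 1981.0, 1991.0, 2001.0, 2011.0])
counts = 2300.0 * 2.0 ** ((years - 1971.0) / 2.0)

# Fit ln(y) = ln(c) + a*x, then convert the slope to a doubling time
a, ln_c = np.polyfit(years, np.log(counts), 1)
doubling_time = np.log(2) / a  # years per doubling
print(doubling_time)  # → 2.0 for this synthetic data
```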

This content is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and is based on MIT OpenCourseWare [18.01 Single Variable Calculus, Fall 2007].

- NPTEL-NOC IITM, Introduction to Galois Theory.
- Michael Artin, Algebra, Second Edition.
- LibreTexts: Calculus; Abstract and Geometric Algebra; Abstract Algebra: Theory and Applications (Judson).
- Patrick Morandi, Field and Galois Theory, Springer.
- Michael Penn, Andrew Misseldine, blackpenredpen, and MathMajor YouTube channels.
- Joseph A. Gallian, Contemporary Abstract Algebra.
- MIT OpenCourseWare, 18.01 Single Variable Calculus, Fall 2007, and 18.02 Multivariable Calculus, Fall 2007, YouTube.
- Calculus Early Transcendentals: Differential & Multi-Variable Calculus for Social Sciences.