```python
import torch

u, v = torch.arange(3), torch.arange(3)  # both equal torch.tensor([0, 1, 2])
torch.dot(u, v)  # equivalently, u @ v
```
```
tensor(5)
```
Theo POMIES
August 29, 2025
September 12, 2025
A vector space (over a field, like \(\mathbb{R}\), which provides the scalars) is a set of vectors equipped with vector addition and scalar multiplication satisfying the usual axioms: addition is associative and commutative, there is a zero vector and additive inverses, scalar multiplication has an identity and is compatible with field multiplication, and scalar multiplication distributes over both vector addition and scalar addition.
This might be useful later, but for all intents and purposes in these notes, the field will be \(\mathbb{R}\), vectors will be in \(\mathbb{R}^n\), and addition and scalar multiplication will be element-wise.
\[ (\mathbf{A}^\top)_{i,j} = \mathbf{A}_{j,i} \]
\[ \mathbf{u} \cdot \mathbf{v} = \sum_{i} u_i v_i \]
The dot product has a geometric meaning.
\[\mathbf{v}\cdot\mathbf{w} = \|\mathbf{v}\|\|\mathbf{w}\|\cos{\theta}\] \[\iff \theta = \arccos\left(\frac{\mathbf{v}\cdot\mathbf{w}}{\|\mathbf{v}\|\|\mathbf{w}\|}\right).\]
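As a quick numeric check (the two vectors below are arbitrary picks, not taken from anything above), the angle can be recovered directly from this identity:

```python
import torch

v = torch.tensor([1.0, 0.0])
w = torch.tensor([1.0, 1.0])

# cos(theta) = (v . w) / (||v|| ||w||)
cos_theta = torch.dot(v, w) / (torch.linalg.norm(v) * torch.linalg.norm(w))
theta = torch.arccos(cos_theta)
print(theta, torch.rad2deg(theta))  # ~0.7854 rad, i.e. ~45 degrees
```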
In linear algebra, vectors are column vectors by default, so \[ \mathbf{u} \in \mathbb{R}^{n \times 1}, \quad \mathbf{u} = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} \]
\[ (\mathbf{A}\mathbf{u})_{i} = \mathbf{a}_{i,:} \cdot \mathbf{u} = \sum_{j} a_{i,j} u_j \]
When we say “we apply a linear transformation \(\mathbf{A}\) to \(\mathbf{v}\)”, we mean \(\mathbf{A}\mathbf{v}\).
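A small sketch of this row-wise view (the matrix and vector are arbitrary illustrative values): each entry of \(\mathbf{A}\mathbf{u}\) is the dot product of a row of \(\mathbf{A}\) with \(\mathbf{u}\).

```python
import torch

A = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
u = torch.tensor([1, 0, 2])

Au = A @ u                          # torch.mv(A, u) also works
# (A u)_i = a_{i,:} . u, computed row by row
rows = torch.stack([torch.dot(A[i], u) for i in range(A.shape[0])])
print(Au, torch.equal(Au, rows))    # tensor([ 4, 13]) True
```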
\[ (\mathbf{A}\mathbf{B})_{i,j} = \mathbf{a}_{i,:} \cdot \mathbf{b}_{:, j} = \sum_{k} a_{i,k} b_{k,j} \]
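A snippet along these lines reproduces the output shown below (assuming the inputs were built with torch.arange, which matches the printed values):

```python
import torch

A = torch.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]
B = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
A @ B                               # torch.matmul(A, B)
```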
```
tensor([[ 3,  4,  5],
        [ 9, 14, 19],
        [15, 24, 33]])
```
\[ \mathbf{u} \otimes \mathbf{v} = \begin{bmatrix} u_1v_1 & u_1v_2 & \dots & u_1v_n \\ u_2v_1 & u_2v_2 & \dots & u_2v_n \\ \vdots & \vdots & \ddots & \vdots \\ u_mv_1 & u_mv_2 & \dots & u_mv_n \\ \end{bmatrix} = \mathbf{u}\mathbf{v}^\top\]
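The result below matches torch.outer for u = v = torch.arange(3), compared against the explicit \(\mathbf{u}\mathbf{v}^\top\) product; a snippet along these lines reproduces it:

```python
import torch

u, v = torch.arange(3), torch.arange(3)
outer = torch.outer(u, v)
# compare with u v^T, written as an (n, 1) @ (1, n) product
outer, (outer == u.unsqueeze(1) @ v.unsqueeze(0)).all()
```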
```
(tensor([[0, 0, 0],
         [0, 1, 2],
         [0, 2, 4]]),
 tensor(True))
```
The outer product of two column vectors is the matrix whose entries are all products of an element in the first vector with an element in the second vector.
\[ \mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I},\quad \mathbf{A} \in \mathbb{R}^{n \times n} \]
The inverse might not exist: if \(\det(\mathbf{A}) = 0\), then \(\mathbf{A}\) is not invertible (it is singular).
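A small numeric illustration (the matrices are arbitrary picks): torch.linalg.inv returns the inverse when it exists, and a singular matrix (zero determinant) has none.

```python
import torch

A = torch.tensor([[2., 1.],
                  [1., 3.]])
print(torch.linalg.det(A))      # tensor(5.)
print(torch.linalg.inv(A) @ A)  # ~ identity, up to float error

B = torch.tensor([[1., 2.],
                  [2., 4.]])    # second row = 2 * first row
print(torch.linalg.det(B))      # tensor(0.) -> torch.linalg.inv(B) would raise an error
```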
I think of the determinant of \(\mathbf{A}\) as the (signed) volume-scaling factor of the linear transformation represented by \(\mathbf{A}\).
This explains why a matrix whose determinant is 0 is not invertible: it “collapses” space onto a lower dimension, so two different inputs \(\mathbf{x}\) can be mapped to the same image, and the transformation cannot be undone.
\[ \det(\mathbf{A}^\top) = \det(\mathbf{A}) \] \[ \det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A})\det(\mathbf{B}) \] \[ \det(\mathbf{A}^{-1}) = \dfrac{1}{\det(\mathbf{A})}, \quad \text{when } \det(\mathbf{A}) \neq 0 \]
\[ \mathbf{A} = \operatorname{diag}(a_1, \dots, a_n), \quad \det(\mathbf{A}) = \prod_{i=1}^n a_i \]
\[ \det(\mathbf{T}) = \prod_{i} t_{i,i}, \quad \text{where $\mathbf{T}$ is triangular} \]
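Quick check of the triangular case, with an arbitrary upper triangular matrix:

```python
import torch

T = torch.tensor([[2., 5., 1.],
                  [0., 3., 4.],
                  [0., 0., .5]])   # upper triangular
print(torch.linalg.det(T))         # tensor(3.)
print(torch.diagonal(T).prod())    # 2 * 3 * 0.5 = 3, same value
```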
\[ \mathbf{A} \, \textnormal{is orthogonal} \iff \mathbf{A}^\top\mathbf{A} = \mathbf{I}\: (= \mathbf{A}\mathbf{A}^\top) \]
Interestingly,
\[ \mathbf{A} \, \textnormal{is orthogonal} \iff \mathbf{A}^{-1} = \mathbf{A}^\top \]
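For instance, a 2D rotation matrix is orthogonal; a quick check (the angle is an arbitrary value):

```python
import math
import torch

theta = 0.3
Q = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])   # 2D rotation matrix
print(Q.T @ Q)                                            # ~ identity
print(torch.allclose(torch.linalg.inv(Q), Q.T))           # True: Q^-1 == Q^T
```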
An elementary matrix is a matrix obtained from the application of a single elementary row operation to the identity matrix \(\mathbf{I}\).
Left-multiplying another matrix by an elementary matrix applies the same row operation to that matrix: \(\mathbf{E}\mathbf{M}\) performs on the rows of \(\mathbf{M}\) the operation that was performed on \(\mathbf{I}\) to build \(\mathbf{E}\).
Multiplying row \(i\) by \(m\)
\[\mathbf{D}_i(m) = \operatorname{diag}(1, \dots, 1, d_i = m, 1, \dots, 1)\]
Adding \(m\) times row \(j\) to row \(i\)
\[\mathbf{L}_{ij}(m) = \begin{bmatrix} 1 &&&&&& \\ & \ddots &&&&& \\ && 1 &&&& \\ &&& \ddots &&& \\ && l_{i,j} = m && 1 && \\ &&&&& \ddots & \\ &&&&&& 1 \end{bmatrix}\]
Swapping rows \(i\) and \(j\)
\[\mathbf{T}_{ij} = \begin{bmatrix} 1 &&&&&& \\ & \ddots &&&&& \\ && 0 && t_{j,i} = 1 && \\ &&& \ddots &&& \\ && t_{i,j} = 1 && 0 && \\ &&&&& \ddots & \\ &&&&&& 1 \end{bmatrix}\]
\(^{(*)}\) Consider that swapping two rows can be expressed as a sequence of row additions/subtractions and one row negation: \[\begin{aligned} \mathbf{T}_{ij} & = \mathbf{D}_{i}(-1)\mathbf{L}_{ij}(-1)\mathbf{L}_{ji}(1)\mathbf{L}_{ij}(-1) \\ \implies \operatorname{det}(\mathbf{T}_{ij}) & = \operatorname{det}(\mathbf{D}_{i}(-1)) \cdot \operatorname{det}(\mathbf{L}_{ij}(-1)) \cdot \operatorname{det}(\mathbf{L}_{ji}(1)) \cdot \operatorname{det}(\mathbf{L}_{ij}(-1)) \\ & = (-1) \cdot 1 \cdot 1 \cdot 1 = -1 \end{aligned}\]
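A numeric verification of this decomposition (the helpers `D` and `L` below are my own small constructors mirroring \(\mathbf{D}_i(m)\) and \(\mathbf{L}_{ij}(m)\); the size and indices are arbitrary):

```python
import torch

n, i, j = 4, 1, 3                      # swap rows 1 and 3 of a 4x4 matrix (0-indexed)

def D(i, m):
    """Elementary matrix multiplying row i by m."""
    E = torch.eye(n)
    E[i, i] = m
    return E

def L(i, j, m):
    """Elementary matrix adding m times row j to row i."""
    E = torch.eye(n)
    E[i, j] = m
    return E

T = torch.eye(n)
T[[i, j]] = T[[j, i]]                  # the swap (permutation) matrix T_ij

rhs = D(i, -1) @ L(i, j, -1) @ L(j, i, 1) @ L(i, j, -1)
print(torch.equal(T, rhs))             # True
print(torch.linalg.det(T))             # -1.0
```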
We say that \(n\) vectors \(\mathbf{v}_1, \dots, \mathbf{v}_n\) are linearly dependent if there exist scalars \(a_1, \dots, a_n\), not all equal to 0, satisfying \[\sum_{i=1}^n a_i\mathbf{v}_i = \mathbf{0}\]
If the only solution is \(a_i = 0\) for all \(i\) in \(1, \dots, n\), they are linearly independent.
Linear dependence means that some vectors can be expressed as a weighted sum (linear combination) of the others, and therefore carry redundant information.
The rank of a matrix is the maximal number of linearly independent columns. Equivalently, it is the dimension of the column space.
This reflects that if some columns are linearly dependent, they are redundant: they do not contribute a new direction. Thus, the matrix can be expressed with fewer independent columns, and the image (output space) of the matrix has correspondingly fewer dimensions.
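For example (an arbitrary matrix), a redundant column drops the rank below the number of columns:

```python
import torch

A = torch.tensor([[1., 2., 3.],
                  [0., 1., 1.],
                  [1., 0., 1.]])        # third column = first + second -> redundant
print(torch.linalg.matrix_rank(A))      # tensor(2)
```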
A basis (not “the” basis, since a vector space can have many different bases) of a vector space is a linearly independent subset that spans the space.
All bases of the same vector space have the same size, and that common size is the dimension of the space.
That means: a basis is a set of vectors that you can combine (with weighted sums) to form any vector in the space — and none of them are redundant.
The dimension of a vector space is the number of vectors in a basis of that space.
Since every basis of a vector space has the same size, the dimension is well-defined. It tells you “how many independent directions” the space has.
An eigenvector is a vector whose direction (i.e., the line it spans) is unchanged by a linear transformation (represented by a matrix). That means that when applying the linear transformation \(\mathbf{A}\) to an eigenvector \(\mathbf{v}\), it simply gets scaled by a constant scalar factor, the eigenvalue \(\lambda\). This is captured by the eigenvalue equation. \[\begin{aligned} \mathbf{A}\mathbf{v} & = \lambda\mathbf{v} \\ \iff (\mathbf{A} - \lambda\mathbf{I})\mathbf{v} & = \mathbf{0} \label{eq:1} \end{aligned} \tag{1}\]
Equation 1 has a non-zero solution \(\mathbf{v}\) only if \(\operatorname{det}(\mathbf{A} - \lambda\mathbf{I}) = 0\), called the characteristic equation of \(\mathbf{A}\). The values of \(\lambda\) that satisfy this equation are the eigenvalues of \(\mathbf{A}\).
The eigenvectors \(\mathbf{v}_{\lambda}\) corresponding to each eigenvalue can be found by plugging the values of \(\lambda\) into equation 1.
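As a concrete illustration (a small symmetric matrix picked by hand), torch.linalg.eig returns eigenvalues and eigenvectors, and we can check the eigenvalue equation directly:

```python
import torch

A = torch.tensor([[2., 1.],
                  [1., 2.]])
eigvals, eigvecs = torch.linalg.eig(A)      # complex dtype, even when the results are real
print(eigvals)                              # eigenvalues 3 and 1 (order may vary)

v, lam = eigvecs[:, 0], eigvals[0]
print(torch.allclose(A.to(v.dtype) @ v, lam * v))   # True: A v = lambda v
```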
By definition of eigenvectors, if a square matrix \(\mathbf{A}\) has a full set of linearly independent eigenvectors (i.e., it is diagonalizable), we have
\[ \mathbf{A}\mathbf{V} = \mathbf{V}\mathbf{\Lambda} \]
where \(\mathbf{V}\) is the matrix whose columns are the eigenvectors of \(\mathbf{A}\), and \(\mathbf{\Lambda}\) is the diagonal matrix of the associated eigenvalues.
Because \(\mathbf{V}\) is invertible, we can right-multiply both sides by \(\mathbf{V}^{-1}\):
\[ \mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1} \]
\[ \mathbf{A}^n = \overbrace{\mathbf{A}\dots\mathbf{A}}^{\text{n times}} = \overbrace{(\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1})\dots(\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1})}^{\text{n times}} = \mathbf{V}\overbrace{\mathbf{\Lambda}\dots\mathbf{\Lambda}}^{\text{n times}}\mathbf{V}^{-1} = \mathbf{V}\mathbf{\Lambda}^n\mathbf{V}^{-1}\]
For negative powers: \[\mathbf{A}^{-1} = \mathbf{V}\mathbf{\Lambda}^{-1}\mathbf{V}^{-1}\]
as long as all diagonal entries of \(\mathbf{\Lambda}\) (i.e., the eigenvalues \(\lambda_i\)) are non-zero. Since \[\begin{aligned} \det(\mathbf{A}) & = \det(\mathbf{V})\cdot\det(\mathbf{\Lambda})\cdot\det(\mathbf{V}^{-1}) \\ & = \det(\mathbf{V})\cdot\det(\mathbf{\Lambda})\cdot\dfrac{1}{\det(\mathbf{V})} \\ & = \det(\mathbf{\Lambda}) = \prod_{i=1}^n \lambda_i \end{aligned}\]
(see determinant identities and determinant of diagonal matrix)
This reflects that a matrix is not invertible when its determinant is 0, which happens here exactly when some eigenvalue is 0.
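A check of the eigendecomposition and its consequences (reusing the arbitrary symmetric matrix from the eigenvalue example above, which is diagonalizable):

```python
import torch

A = torch.tensor([[2., 1.],
                  [1., 2.]])
lam, V = torch.linalg.eig(A)
Lam = torch.diag(lam)
A_c = A.to(lam.dtype)                    # complex view of A, to compare like with like

# A = V Lam V^-1
print(torch.allclose(A_c, V @ Lam @ torch.linalg.inv(V), atol=1e-5))                 # True
# A^3 = V Lam^3 V^-1
print(torch.allclose(torch.linalg.matrix_power(A_c, 3),
                     V @ torch.linalg.matrix_power(Lam, 3) @ torch.linalg.inv(V),
                     atol=1e-4))                                                     # True
# det(A) = product of eigenvalues
print(torch.allclose(torch.linalg.det(A), lam.prod().real))                          # True
```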
Later!
Gaussian elimination, or row reduction, is an algorithm for solving systems of linear equations, based on the following elementary row operations:
todo
Computing the determinant of a matrix is not trivial at first glance.
We know how to easily compute the determinants of diagonal, triangular, and elementary matrices (see above).
Knowing that, we want a representation of our original matrix \(\mathbf{A}\) that involves an upper triangular matrix \(\mathbf{U}\) and one or more other matrices whose determinants are known or trivial to compute, as \(\mathbf{P}\mathbf{A} = \mathbf{L}\mathbf{U}\).
To go from \(\mathbf{A}\) to \(\mathbf{U}\) we use Gaussian elimination: \(\mathbf{P}\) is a permutation matrix tracking our row swaps, and \(\mathbf{L}\) is a lower triangular matrix (with ones on the diagonal) tracking our row additions.
Now, because \(\mathbf{P}\) is a permutation matrix, and permutation matrices are orthogonal, we have \[ \mathbf{P}\mathbf{A} = \mathbf{L}\mathbf{U} \implies \mathbf{A} = \mathbf{P}^{-1}\mathbf{L}\mathbf{U} = \mathbf{P}^\top\mathbf{L}\mathbf{U} \]
Finally, this means that \[ \det(\mathbf{A}) = \det(\mathbf{P}^\top) \cdot \det(\mathbf{L}) \cdot \det(\mathbf{U}) = \det(\mathbf{P}) \cdot \det(\mathbf{L}) \cdot \det(\mathbf{U}) \]
Now, if we just keep track of the row swaps, we can easily compute \(\det(\mathbf{A})\): \(\det(\mathbf{P}) = \pm 1\) depending on the parity of the number of swaps, \(\det(\mathbf{L}) = 1\) (ones on the diagonal), and \(\det(\mathbf{U})\) is the product of its diagonal entries.
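A sketch with torch.linalg.lu (note that torch returns the factorization as \(\mathbf{A} = \mathbf{P}\mathbf{L}\mathbf{U}\), so its \(\mathbf{P}\) plays the role of \(\mathbf{P}^\top\) above; the matrix is an arbitrary example):

```python
import torch

A = torch.tensor([[0., 2., 1.],
                  [1., 1., 0.],
                  [3., 0., 2.]])

P, L, U = torch.linalg.lu(A)                    # A = P L U
# det(P) = +-1 (parity of the row swaps), det(L) = 1 (unit diagonal),
# det(U) = product of its diagonal entries
det_A = torch.linalg.det(P) * torch.diagonal(U).prod()
print(det_A, torch.linalg.det(A))               # both ~ -7.0
```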