Matrix Operations and Properties
A small handful of matrix operations (multiplication, transpose, inverse, determinant, rank, and trace) power everything from linear regression to the attention mechanism. This lesson is about developing fluency with those operations and, crucially, knowing when each one fails.
1. Matrix Multiplication
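For $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, the product $C = AB$ has shape $m \times p$, with

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}.$$
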
The inner dimensions must match. Three useful ways to see matrix multiplication — pick whichever fits the moment:
- Row-by-column: the classic "take the $i$-th row of $A$, dot-product with the $j$-th column of $B$". Good for hand calculation.
- Column combinations: every column of $AB$ is a linear combination of columns of $A$, with coefficients from that column of $B$. Good for understanding what $A$ is "doing" to $B$.
- Composition of transformations: if $A$ and $B$ are each linear maps, $AB$ is the map you get by applying $B$ then $A$. Good for thinking about neural network layers.
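The three views above can be checked numerically. The following is a minimal sketch, assuming NumPy and small random matrices chosen only for illustration; the names `C_rows`, `C_cols`, and `same_map` are hypothetical helpers, not part of the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # m x n
B = rng.standard_normal((4, 2))  # n x p
C = A @ B                        # m x p

m, n = A.shape
p = B.shape[1]

# Row-by-column: entry (i, j) is row i of A dotted with column j of B.
C_rows = np.array([[A[i, :] @ B[:, j] for j in range(p)] for i in range(m)])

# Column combinations: column j of C mixes the columns of A,
# weighted by the entries of column j of B.
C_cols = np.column_stack(
    [sum(B[k, j] * A[:, k] for k in range(n)) for j in range(p)]
)

# Composition: applying B and then A to x is the same as applying AB.
x = rng.standard_normal(p)
same_map = np.allclose(C @ x, A @ (B @ x))

# Inner dimensions must match: (4, 2) @ (3, 4) is rejected.
try:
    B @ A
    inner_mismatch = False
except ValueError:
    inner_mismatch = True

print(np.allclose(C, C_rows), np.allclose(C, C_cols), same_map)  # True True True
```

All three constructions produce the same matrix, so which view to use is a question of convenience rather than correctness.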
2. Transpose
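The transpose $A^\top$ flips a matrix across its main diagonal: rows become columns. Key identities you will use constantly:

$$(A^\top)^\top = A, \qquad (AB)^\top = B^\top A^\top, \qquad (A + B)^\top = A^\top + B^\top.$$
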
The middle identity — order reverses when you transpose a product — is the source of half the "why is there a transpose here" moments in backprop derivations.
A matrix $A$ is symmetric if $A = A^\top$. Symmetric matrices are everywhere in ML: covariance matrices, Gram matrices, Hessians of twice-differentiable losses. They have special structure (real eigenvalues, orthogonal eigenvectors) that we will exploit in Lesson 4.
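A quick numerical check of the order-reversal rule and of the symmetry of a Gram matrix may help. This is a sketch under assumed random inputs; `X` stands in for a hypothetical data matrix with 5 samples and 3 features.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# Order reverses under transpose: (AB)^T equals B^T A^T.
product_rule_holds = np.allclose((A @ B).T, B.T @ A.T)

# A Gram matrix X^T X is symmetric by construction.
X = rng.standard_normal((5, 3))
G = X.T @ X
gram_is_symmetric = np.allclose(G, G.T)

# eigvalsh is NumPy's eigenvalue routine for symmetric (Hermitian) input;
# it returns real eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(G)

print(product_rule_holds, gram_is_symmetric)  # True True
```

Note that `A.T @ B.T` is not even defined for these shapes, which is one practical way to catch a forgotten order reversal.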
3. Inverse
The inverse $A^{-1}$ of a square matrix $A$ satisfies $A A^{-1} = A^{-1} A = I$. It is the "undo" of the linear transformation $A$, and it exists only when $A$ is non-singular.
In practice, never compute matrix inverses in ML code. If you need $A^{-1} b$, solve the linear system $Ax = b$ instead. It's faster and numerically more stable.
```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# BAD: builds the full inverse first, then multiplies; more work and more rounding error
x_bad = np.linalg.inv(A) @ b

# GOOD: solves Ax = b directly; still O(n³), but a smaller constant and better numerical stability
x_good = np.linalg.solve(A, b)
print(x_good)  # [2. 3.]
```
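The inverse is also the operation that fails most visibly: a singular matrix has no inverse, and the direct solver refuses to proceed. The sketch below uses a hypothetical singular matrix `S` (its second row is twice the first) and falls back to `np.linalg.lstsq`, which is one possible choice here, not the only one.

```python
import numpy as np

# Hypothetical singular matrix: the second row is twice the first.
S = np.array([[1.0, 2.0], [2.0, 4.0]])
b = np.array([1.0, 2.0])

# solve raises LinAlgError when the matrix is singular.
try:
    np.linalg.solve(S, b)
    solve_failed = False
except np.linalg.LinAlgError:
    solve_failed = True

# lstsq still returns a least-squares solution and reports the numerical rank.
x_ls, residuals, rank, singular_values = np.linalg.lstsq(S, b, rcond=None)

print(solve_failed, rank)  # True 1
```

Here `b` happens to lie in the column space of `S`, so the least-squares solution satisfies the system exactly; in general it only minimizes the residual.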
4. Determinant
The determinant $\det(A)$ of a square matrix $A$ is a single scalar that summarizes two things about the linear map $x \mapsto Ax$:
- Magnitude: $\lvert \det(A) \rvert$ is the factor by which $A$ scales volume. If $\lvert \det(A) \rvert = 0.5$, any region of space has its volume halved after applying $A$.
- Orientation: a negative determinant means $A$ also flips orientation (like a mirror reflection).
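Useful properties:

$$\det(AB) = \det(A)\det(B), \qquad \det(A^\top) = \det(A), \qquad \det(A^{-1}) = \frac{1}{\det(A)}, \qquad \det(A) \neq 0 \iff A \text{ is invertible}.$$
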
Determinants appear in probability (changes of variable in densities), in normalizing flows, and in regularization for certain generative models. In day-to-day ML code they're rare, but conceptually important.
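The volume and orientation readings of the determinant can be seen on two small matrices. This is an illustrative sketch: `A_scale`, `F_flip`, and `big` are made-up examples, and the use of `np.linalg.slogdet` is a suggestion for settings that need log-determinants, not something the lesson prescribes.

```python
import numpy as np

A_scale = np.array([[2.0, 0.0], [0.0, 0.25]])  # stretches x, squeezes y
F_flip = np.array([[0.0, 1.0], [1.0, 0.0]])    # swaps the axes (a reflection)

det_scale = np.linalg.det(A_scale)  # area scale factor
det_flip = np.linalg.det(F_flip)    # negative sign: orientation is flipped

# The determinant of a product is the product of the determinants.
product_rule_holds = np.isclose(np.linalg.det(A_scale @ F_flip), det_scale * det_flip)

# For larger matrices the raw determinant can underflow or overflow;
# slogdet returns the sign and log|det| separately instead.
big = 0.1 * np.eye(500)
raw_det = np.linalg.det(big)               # underflows to 0.0
sign, logabsdet = np.linalg.slogdet(big)   # sign 1.0, log|det| = 500 * log(0.1)

print(product_rule_holds, raw_det, sign)
```

Working with `logabsdet` rather than the raw determinant is a common way to keep change-of-variable computations finite.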
5. Rank
The rank of a matrix is the number of linearly independent columns (equivalently, rows). A full-rank matrix preserves dimensions; a rank-deficient one collapses them.
Rank shows up throughout ML:
| Where | Why rank matters |
|---|---|
| Linear regression | If $X^\top X$ is rank-deficient, the closed-form solution $(X^\top X)^{-1} X^\top y$ is undefined: features are perfectly collinear. |
| LoRA fine-tuning | The whole idea is that weight updates are low-rank: you can approximate $\Delta W$ as $BA$ where $B$ and $A$ are thin. |
| PCA, embeddings | Dimensionality reduction is finding a low-rank approximation to your data matrix. |
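Two of the rows above are easy to reproduce numerically. The sketch below assumes a hypothetical feature matrix whose third column is the sum of the first two, and a LoRA-style update built from thin factors; the sizes `d`, `k`, and `r` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Collinear features: the third column is the sum of the first two,
# so the matrix has 3 columns but only rank 2.
X = rng.standard_normal((6, 2))
X_collinear = np.column_stack([X, X[:, 0] + X[:, 1]])
rank_X = np.linalg.matrix_rank(X_collinear)

# LoRA-style update: a d x k matrix built from thin factors has rank at most r.
d, k, r = 8, 6, 2
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, k))
delta_W = B @ A
rank_delta = np.linalg.matrix_rank(delta_W)

# The thin factors store d*r + r*k numbers instead of d*k.
params_full = d * k
params_low_rank = d * r + r * k

print(rank_X, rank_delta, params_full, params_low_rank)  # 2 2 48 28
```

`matrix_rank` decides rank from singular values with a tolerance, so it reports numerical rank; nearly collinear features can still cause trouble even when the reported rank is full.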
6. Trace
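The trace $\operatorname{tr}(A)$ is the sum of the diagonal entries of a square matrix. It looks innocuous but has a magical property:

$$\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB).$$

This cyclic property lets you rotate the factors inside a trace, moving the first factor to the end or the last to the front, as long as the shapes stay compatible. It does not permit arbitrary reordering: $\operatorname{tr}(ABC) \neq \operatorname{tr}(ACB)$ in general. It's the tool that makes derivations of batch normalization, the score function, and many loss gradients clean rather than horrific.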
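A small check makes the limits of the cyclic property concrete. This is a sketch with assumed inputs: the random rectangular matrices and the hand-picked 2x2 matrices `P`, `Q`, `R` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Cyclic shifts give the same trace, even though the products have different shapes.
t_abc = np.trace(A @ B @ C)  # trace of a 2x2 matrix
t_bca = np.trace(B @ C @ A)  # trace of a 3x3 matrix
t_cab = np.trace(C @ A @ B)  # trace of a 4x4 matrix
cyclic_holds = np.isclose(t_abc, t_bca) and np.isclose(t_abc, t_cab)

# A non-cyclic swap is not allowed in general.
P = np.array([[0.0, 1.0], [0.0, 0.0]])
Q = np.array([[0.0, 0.0], [1.0, 0.0]])
R = np.array([[1.0, 0.0], [0.0, 0.0]])
t_pqr = np.trace(P @ Q @ R)  # 1.0
t_prq = np.trace(P @ R @ Q)  # 0.0

# A common use: the squared Frobenius norm is tr(A^T A).
frobenius_matches = np.isclose(np.trace(A.T @ A), np.sum(A ** 2))

print(cyclic_holds, t_pqr, t_prq, frobenius_matches)  # True 1.0 0.0 True
```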