Background: Approximate Matrix Multiplication (AMM)
Matrix multiplication is a fundamental operation in scientific computing, machine learning, data mining, and numerical linear algebra. Given two matrices \( A \in \mathbb{R}^{n \times d} \) and \( B \in \mathbb{R}^{d \times m} \), the exact computation of \( AB \) requires \( O(ndm) \) arithmetic operations, which becomes computationally expensive when the inner dimension \( d \) is large.
Approximate Matrix Multiplication (AMM) addresses this challenge by computing an approximation \( \widetilde{C} \approx AB \) that is significantly faster to obtain while maintaining provable accuracy guarantees. The key idea is to reduce the dimensionality of the problem using randomized techniques, thereby lowering computational cost and memory usage.
Random Sampling of Columns and Rows (Sketching)
One of the simplest and most intuitive approaches to AMM is random sampling of columns of \( A \) and the corresponding rows of \( B \). Since matrix multiplication can be written as a sum of outer products,
\[ AB = \sum_{i=1}^{d} A_{(:,i)} B_{(i,:)} \]
we can approximate this sum by randomly selecting only a small subset of indices \( S \subset \{1,2,\dots,d\} \) with \( |S| = k \ll d \). The approximation is then given by:
\[ \widetilde{C} = \frac{d}{k} \sum_{i \in S} A_{(:,i)} B_{(i,:)} \]
The scaling factor \( \frac{d}{k} \) ensures that the estimator is unbiased, meaning:
\[ \mathbb{E}[\widetilde{C}] = AB \]
This holds because each index is included in \( S \) with probability \( k/d \), so the factor \( \frac{d}{k} \) exactly compensates, on average, for the terms that are left out.
Why Sketching Works
Each column–row pair \( A_{(:,i)} B_{(i,:)} \) contributes differently to the final product. Random sampling exploits the fact that many contributions are often small or redundant, especially in large, structured, or noisy data. By sampling only a few representative pairs, we can closely approximate the full sum with much less computation.
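To make this concrete, here is a minimal NumPy sketch of the uniform-sampling estimator described above; the function name amm_uniform and the test matrices are illustrative choices of ours, not part of any library.

```python
import numpy as np

def amm_uniform(A, B, k, rng=None):
    """Approximate A @ B by uniformly sampling k column-row pairs (without replacement)."""
    rng = np.random.default_rng(rng)
    d = A.shape[1]
    S = rng.choice(d, size=k, replace=False)   # sampled indices
    # Sum the sampled outer products and rescale by d/k so the estimator stays unbiased.
    return (d / k) * (A[:, S] @ B[S, :])

# Small demonstration: the relative error shrinks as k grows.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 1000))
B = rng.standard_normal((1000, 40))
C = A @ B
for k in (10, 100, 1000):
    C_approx = amm_uniform(A, B, k, rng=rng)
    print(k, np.linalg.norm(C_approx - C, "fro") / np.linalg.norm(C, "fro"))
```

With \( k = d = 1000 \) the sample covers every column–row pair, so the approximation matches the exact product up to floating-point error.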
Accuracy and Trade-offs
The accuracy of the approximation depends on the number of samples \( k \). As \( k \) increases, the approximation converges to the exact product. With appropriate sampling strategies (such as importance sampling based on column norms), strong error bounds can be obtained in Frobenius or spectral norm.
Random sampling of columns and rows is widely used as a building block in:
- Large-scale machine learning algorithms
- Dimensionality reduction and feature compression
- Recommender systems
- Streaming and distributed computation
In the following examples, we illustrate how this technique works through simple numerical demonstrations.
Approximate Matrix Multiplication using Random Sampling of Columns/Rows
Given matrices \( A \in \mathbb{R}^{n \times d} \) and \( B \in \mathbb{R}^{d \times m} \), we approximate the product \( AB \) by sampling column–row pairs and rescaling, as described above.
Example 1: Vector × Vector
\( A = [1 \; 2 \; 3 \; 4] \), \( B = [5 \; 6 \; 7 \; 8]^T \)
Exact:
\[ AB = 70 \]
Sample: \( S = \{2,4\} \), \( d=4, k=2 \)
Approximate:
\[ AB \approx 2(2\cdot6 + 4\cdot8) = 88 \]
Example 2: Matrix × Vector
\[ A = \begin{bmatrix} 1 & 3 & 5\\ 2 & 4 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 1\\2\\3 \end{bmatrix} \]
Exact:
\[ AB = \begin{bmatrix} 22\\28 \end{bmatrix} \]
Sample: \( S=\{3\} \), \( d=3, k=1 \)
Approximate:
\[ AB \approx 3 \cdot \begin{bmatrix} 5\\6 \end{bmatrix} \cdot 3 = \begin{bmatrix} 45\\54 \end{bmatrix} \]
Example 3: Matrix × Matrix
\[ A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 7 & 8\\ 9 & 10\\ 11 & 12 \end{bmatrix} \]
Exact:
\[ AB = \begin{bmatrix} 58 & 64\\ 139 & 154 \end{bmatrix} \]
Sample: \( S=\{1,3\} \), \( d=3, k=2 \)
Approximate:
\[ AB \approx 1.5\,(A_{(:,1)}B_{(1,:)} + A_{(:,3)}B_{(3,:)}) = \begin{bmatrix} 60 & 66\\ 141 & 156 \end{bmatrix} \]
Example 4: Single Column Sampling
\[ A = [2 \; 4 \; 6], \quad B = [1 \; 3 \; 5]^T \]
Exact: \( 44 \)
Sample: \( S=\{2\} \), \( d=3, k=1 \)
Approximate:
\[ AB \approx 3(4\cdot3) = 36 \]
Example 5: Symmetric Vectors
\[ A = [1 \; 2 \; 3], \quad B = A^{T} \]
Exact: \( 14 \)
Sample: \( S=\{1,3\} \), \( d=3, k=2 \)
Approximate:
\[ AB \approx \frac{3}{2}(1^2 + 3^2) = 15 \]
Example 6: Square Matrix × Vector
\[ A = \begin{bmatrix} 1 & 0 & 2\\ 0 & 1 & 3\\ 1 & 1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 2\\4\\6 \end{bmatrix} \]
Exact:
\[ AB = \begin{bmatrix} 14\\22\\12 \end{bmatrix} \]
Sample: \( S=\{3\} \), \( d=3, k=1 \)
Approximate:
\[ AB \approx 3 \cdot \begin{bmatrix} 2\\3\\1 \end{bmatrix} \cdot 6 = \begin{bmatrix} 36\\54\\18 \end{bmatrix} \]
Example 7: Larger Matrix × Matrix
\[ A = \begin{bmatrix} 1 & 0 & 2 & 1\\ 2 & 1 & 0 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2\\ 3 & 4\\ 5 & 6\\ 7 & 8 \end{bmatrix} \]
Exact:
\[ AB = \begin{bmatrix} 18 & 22\\ 26 & 32 \end{bmatrix} \]
Sample: \( S=\{1,4\} \), \( d=4, k=2 \)
Approximate:
\[ AB \approx 2\,(A_{(:,1)}B_{(1,:)} + A_{(:,4)}B_{(4,:)}) = \begin{bmatrix} 16 & 20\\ 46 & 56 \end{bmatrix} \]
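These hand computations are easy to check numerically. The short NumPy snippet below (our own addition, reproducing Example 7 with NumPy's zero-based indexing) applies the same \( \frac{d}{k} \)-rescaled sampling:

```python
import numpy as np

# Example 7 data
A = np.array([[1, 0, 2, 1],
              [2, 1, 0, 3]])
B = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])

exact = A @ B                            # [[18, 22], [26, 32]]
S = [0, 3]                               # zero-based indices for S = {1, 4}
d, k = A.shape[1], len(S)
approx = (d / k) * (A[:, S] @ B[S, :])   # [[16., 20.], [46., 56.]]
print(exact)
print(approx)
```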
Error Bounds and Intuition
The approximation quality in random sampling–based approximate matrix multiplication depends on how many column–row pairs are sampled. Let \( \widetilde{C} \) denote the approximate product and \( C = AB \) the exact product.
With uniform random sampling, the expected squared error measured in the Frobenius norm satisfies:
\[ \mathbb{E}\left[\| \widetilde{C} - C \|_F^2 \right] \le \frac{d}{k} \sum_{i=1}^{d} \|A_{(:,i)}\|_2^2 \|B_{(i,:)}\|_2^2 \]
This inequality shows that the error decreases as the number of samples \( k \) increases. In particular, doubling the number of sampled columns roughly halves the expected squared error.
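The bound is easy to probe empirically. The following sketch (our own illustration, with arbitrarily chosen matrix sizes) estimates the left-hand side by averaging over many sampling draws and compares it with the right-hand side:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 500))
B = rng.standard_normal((500, 20))
C = A @ B
d, k = A.shape[1], 50

# Right-hand side of the bound: (d/k) * sum_i ||A_(:,i)||_2^2 * ||B_(i,:)||_2^2
bound = (d / k) * np.sum(np.sum(A**2, axis=0) * np.sum(B**2, axis=1))

# Monte Carlo estimate of E||C_tilde - C||_F^2 over repeated sampling draws
errs = []
for _ in range(200):
    S = rng.choice(d, size=k, replace=False)
    C_tilde = (d / k) * (A[:, S] @ B[S, :])
    errs.append(np.linalg.norm(C_tilde - C, "fro") ** 2)

# The empirical mean squared error sits at or below the bound.
print("empirical mean squared error:", np.mean(errs))
print("upper bound:                 ", bound)
```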
Intuition Behind the Error Bound
Matrix multiplication can be viewed as summing many outer products. Random sampling replaces this full sum with an estimate based on only a few terms. The approximation error arises from ignoring some column–row pairs. If most of these pairs contribute little to the final product, then the error remains small.
Thus, matrices with redundant, correlated, or low-energy columns are especially well-suited for approximate multiplication.
Importance Sampling
Uniform sampling treats all column–row pairs equally. However, in practice, some columns of \( A \) and rows of \( B \) contribute much more to \( AB \) than others. Importance sampling improves accuracy by sampling indices with higher probability when they have larger impact.
A common choice of sampling probability is:
\[ p_i = \frac{\|A_{(:,i)}\|_2 \|B_{(i,:)}\|_2} {\sum_{j=1}^{d} \|A_{(:,j)}\|_2 \|B_{(j,:)}\|_2} \]
The approximation becomes:
\[ \widetilde{C} = \frac{1}{k} \sum_{i \in S} \frac{1}{p_i} A_{(:,i)} B_{(i,:)} \]
Importance sampling significantly reduces variance and provides tighter error bounds, especially when column norms vary widely.
Key idea: sample what matters most.
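A minimal sketch of this idea, assuming sampling with replacement (which is what the \( \frac{1}{k p_i} \) rescaling above corresponds to); the function name amm_importance and the synthetic matrices are illustrative choices of ours:

```python
import numpy as np

def amm_importance(A, B, k, rng=None):
    """Approximate A @ B by sampling k column-row pairs (with replacement)
    with probability proportional to ||A_(:,i)||_2 * ||B_(i,:)||_2."""
    rng = np.random.default_rng(rng)
    d = A.shape[1]
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = weights / weights.sum()
    S = rng.choice(d, size=k, p=p)
    # Rescale each sampled outer product by 1/(k * p_i) to keep the estimator unbiased.
    return (A[:, S] / (k * p[S])) @ B[S, :]

# Matrices whose column/row norms vary over several orders of magnitude.
rng = np.random.default_rng(0)
scales = 10.0 ** rng.uniform(-2, 2, size=500)
A = rng.standard_normal((40, 500)) * scales
B = rng.standard_normal((500, 30)) * scales[:, None]
C = A @ B
k = 50

S_u = rng.choice(500, size=k, replace=False)          # uniform-sampling baseline
approx_uniform = (500 / k) * (A[:, S_u] @ B[S_u, :])
approx_importance = amm_importance(A, B, k, rng=rng)

print("uniform:   ", np.linalg.norm(approx_uniform - C, "fro") / np.linalg.norm(C, "fro"))
print("importance:", np.linalg.norm(approx_importance - C, "fro") / np.linalg.norm(C, "fro"))
```

On matrices like these, the norm-proportional probabilities concentrate the sample budget on the few heavy column–row pairs, which is exactly where uniform sampling tends to waste it.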
Real-World Machine Learning Applications
- Recommendation Systems: User–item matrices are often large and sparse. Approximate matrix multiplication accelerates similarity computations and matrix factorization.
- Deep Learning: Large fully connected layers and attention mechanisms can be approximated using sketching techniques to reduce training and inference cost.
- Natural Language Processing: Word embeddings and document–term matrices benefit from approximate multiplication during similarity search and latent semantic analysis.
- Kernel Methods: AMM is used to approximate large kernel matrices in support vector machines and Gaussian processes.
- Large-Scale Linear Regression: Sketching reduces the cost of computing \( X^T X \) and \( X^T y \) in high-dimensional data, as illustrated in the sketch below.
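As a rough illustration of the last item (entirely a toy setup of ours): sampling rows of \( X \) is the same column/row-sampling estimator applied to \( X^T X \) and \( X^T y \), and the sketched normal equations can then be solved directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100_000, 50, 5_000
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Sample k of the n rows uniformly; (n/k) times the sum of sampled outer
# products is the AMM estimator for X^T X (and likewise for X^T y).
S = rng.choice(n, size=k, replace=False)
scale = n / k
XtX_approx = scale * (X[S].T @ X[S])
Xty_approx = scale * (X[S].T @ y[S])

w_approx = np.linalg.solve(XtX_approx, Xty_approx)
w_exact = np.linalg.solve(X.T @ X, X.T @ y)
print(np.linalg.norm(w_approx - w_exact) / np.linalg.norm(w_exact))
```

The common \( n/k \) factor cancels when the linear system is solved, but keeping it makes the two sketched quantities unbiased estimates of \( X^T X \) and \( X^T y \) individually.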
Conclusion: Random sampling of columns and rows significantly reduces computation while providing an unbiased approximation. Increasing the sample size improves accuracy.