What is Weighted Sampling?
Weighted sampling is a technique in which each item is assigned a weight, and the probability of selecting that item is proportional to its weight.
Higher weight $\Rightarrow$ higher chance of being selected
Lower weight $\Rightarrow$ lower chance (but usually not zero)
Mathematical Definition
Given $n$ items with weights:
$ w_1, w_2, \dots, w_n $
The probability of selecting item $i$ is:
$ P(i) = \frac{w_i}{\sum_{j=1}^{n} w_j} $
Example: weights $[1,3,6] \Rightarrow [0.1,0.3,0.6]$
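As a minimal sketch of this normalization (using Python's standard library; the item names `A`, `B`, `C` are illustrative):

```python
import random

items = ["A", "B", "C"]
weights = [1, 3, 6]

# Normalize weights into probabilities: P(i) = w_i / sum_j w_j
total = sum(weights)
probs = [w / total for w in weights]
print(probs)  # [0.1, 0.3, 0.6]

# Sanity check by simulation: random.choices draws proportionally to weights
random.seed(0)
draws = random.choices(items, weights=weights, k=100_000)
print(draws.count("C") / len(draws))  # close to 0.6
```

Note that `random.choices` accepts raw weights directly, so the explicit normalization is only needed when you want the probabilities themselves.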
Visual Intuition
Weighted bar representation:
A | █
B | ███
C | ██████
Flattened view:
| A | B B B | C C C C C C |
Sampling is equivalent to picking a random point on this bar. Larger segments correspond to higher probability.
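The "random point on a bar" picture translates directly into code: build cumulative weights, draw a uniform point, and binary-search for the segment it lands in. A sketch of this, assuming the same three items as above:

```python
import bisect
import itertools
import random

def weighted_choice(items, weights, rng=random):
    """Pick a uniform point on the flattened bar, then binary-search
    the cumulative weights to find which segment it falls in."""
    cumulative = list(itertools.accumulate(weights))   # [1, 4, 10] for [1, 3, 6]
    point = rng.uniform(0, cumulative[-1])             # random point on the bar
    return items[bisect.bisect_left(cumulative, point)]

random.seed(42)
print(weighted_choice(["A", "B", "C"], [1, 3, 6]))
```

This cumulative-sum approach is how many library implementations work internally; each draw costs O(log n) after an O(n) prefix-sum pass.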
Four Examples of Weighted Sampling
- Lottery: More tickets $\Rightarrow$ higher winning chance
- Online advertising: Higher bids shown more often
- Survey sampling: Oversampling rare groups
- Game loot drops: Common items have higher drop rates
Weighted Sampling in Machine Learning
- Class imbalance: Minority classes sampled more frequently
- Mini-batch construction: Hard examples sampled more often
- Reinforcement learning: Prioritized experience replay
- Monte Carlo methods: Importance sampling
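For the class-imbalance case, one common recipe is to weight each sample inversely to its class frequency, so that every class contributes equal probability mass. A sketch with a hypothetical 90/10 toy dataset:

```python
from collections import Counter

# Toy imbalanced dataset: 90 "cat" labels, 10 "dog" labels (an assumption)
labels = ["cat"] * 90 + ["dog"] * 10
class_counts = Counter(labels)

# Inverse-frequency weights: samples from the rare class get larger weight
sample_weights = [1.0 / class_counts[y] for y in labels]

# Under weighted sampling, each class now has equal total probability mass
total = sum(sample_weights)
dog_mass = sum(w for w, y in zip(sample_weights, labels) if y == "dog") / total
print(dog_mass)  # close to 0.5
```

These per-sample weights are exactly what samplers such as PyTorch's `WeightedRandomSampler` expect as input.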
Types of Weighted Sampling
- With replacement: items can be selected multiple times (e.g., bootstrapping); probabilities remain constant across draws
- Without replacement: items are removed after selection (e.g., surveys); probabilities change after each draw
- Stratified weighted sampling: sampling within predefined groups (strata)
- Importance sampling: sampling from one distribution and correcting with importance weights
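The without-replacement case can be sketched by drawing sequentially and removing each pick, which implicitly renormalizes the remaining weights (a simple O(nk) illustration, not an optimized implementation):

```python
import random

def weighted_sample_without_replacement(items, weights, k, rng=random):
    """Sequentially draw k distinct items; removing each pick implicitly
    renormalizes the remaining weights, so probabilities change per draw."""
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(k):
        i = rng.choices(range(len(items)), weights=weights, k=1)[0]
        chosen.append(items.pop(i))
        weights.pop(i)
    return chosen

random.seed(0)
print(weighted_sample_without_replacement(["A", "B", "C"], [1, 3, 6], k=2))
```

In practice, `numpy.random.Generator.choice(..., replace=False, p=probs)` provides the same behavior without hand-rolling the loop.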
Importance Sampling
Goal: estimate an expectation under distribution $p(x)$ while sampling from $q(x)$.
$ \mathbb{E}_{p}[f(x)] = \mathbb{E}_{q}\left[f(x)\frac{p(x)}{q(x)}\right] $
Importance weight:
$ w(x) = \frac{p(x)}{q(x)} $
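The identity above can be checked numerically. A sketch with a small discrete example (the distributions and $f$ are illustrative assumptions): the target $p$ puts mass $[0.2, 0.3, 0.5]$ on $\{0, 1, 2\}$, the proposal $q$ is uniform, and $f(x) = x$, so $\mathbb{E}_p[f(X)] = 1.3$.

```python
import random

# Target p and proposal q over the same discrete support (assumed values)
support = [0, 1, 2]
p = {0: 0.2, 1: 0.3, 2: 0.5}
q = {0: 1/3, 1: 1/3, 2: 1/3}
f = lambda x: x  # E_p[f(X)] = 0*0.2 + 1*0.3 + 2*0.5 = 1.3

# Sample from q, not p
random.seed(0)
n = 200_000
samples = random.choices(support, weights=[q[x] for x in support], k=n)

# Importance-weighted estimator: average f(x) * p(x)/q(x) over draws from q
estimate = sum(f(x) * p[x] / q[x] for x in samples) / n
print(round(estimate, 2))  # close to 1.3
```

The correction factor $p(x)/q(x)$ is exactly the importance weight $w(x)$ defined above.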
Weighted Sampling vs Loss Weighting
| Aspect | Weighted Sampling | Loss Weighting |
|---|---|---|
| Data frequency | Changed | Unchanged |
| Loss magnitude | Unchanged | Scaled |
| Mini-batch composition | Affected | Unaffected |
Loss Weighting Formula
Instead of changing how often samples appear, loss weighting scales their contribution to the loss:
$ \mathcal{L} = \sum_i w_i \cdot \ell(y_i, \hat{y}_i) $
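This formula can be sketched in a few lines, assuming squared error as the per-sample loss $\ell$ (any loss works; the values here are illustrative):

```python
def weighted_loss(y_true, y_pred, weights):
    """L = sum_i w_i * l(y_i, y_hat_i), with squared error as l (an assumption)."""
    return sum(w * (y - yh) ** 2 for w, y, yh in zip(weights, y_true, y_pred))

# Two samples with equal error 0.25; the first counts double via its weight
print(weighted_loss([1.0, 0.0], [0.5, 0.5], [2.0, 1.0]))  # 2*0.25 + 1*0.25 = 0.75
```

Deep learning frameworks expose the same idea directly, e.g. the `weight` argument of PyTorch's `CrossEntropyLoss`.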
Key Takeaway
Weighted sampling controls what the model sees more often.
Loss weighting controls how much mistakes matter.