Weighted Sampling

What is Weighted Sampling?

Weighted sampling is a sampling technique where each item is assigned a weight, and the probability of selecting that item is proportional to its weight.

Higher weight $\Rightarrow$ higher chance of being selected
Lower weight $\Rightarrow$ lower chance (but usually not zero)


Mathematical Definition

Given $n$ items with weights:

$ w_1, w_2, \dots, w_n $

The probability of selecting item $i$ is:

$ P(i) = \frac{w_i}{\sum_{j=1}^{n} w_j} $

Example: weights $[1,3,6] \Rightarrow [0.1,0.3,0.6]$
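This rule maps directly onto Python's standard library. The sketch below (item names are illustrative) normalizes the weights into probabilities and then draws with `random.choices`, which implements weighted sampling with replacement:

```python
import random
from collections import Counter

items = ["A", "B", "C"]
weights = [1, 3, 6]

# P(i) = w_i / sum(w_j): normalize the weights into probabilities
total = sum(weights)
probs = [w / total for w in weights]
print(probs)  # [0.1, 0.3, 0.6]

# random.choices performs weighted sampling with replacement
random.seed(0)
draws = random.choices(items, weights=weights, k=10_000)
freq = Counter(draws)
# Empirical frequencies approximate [0.1, 0.3, 0.6]
```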


Visual Intuition

Weighted bar representation:

A | █
B | ███
C | ██████

Flattened view:

| A | B B B | C C C C C C |

Sampling is equivalent to picking a random point on this bar. Larger segments correspond to higher probability.
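The picture translates directly into code: build the cumulative weights and binary-search for the segment that a uniform random point lands in. A minimal sketch (`weighted_pick` is a hypothetical helper name):

```python
import bisect
import random

def weighted_pick(items, weights, rng=random):
    """Pick a uniform random point on the stacked bar [0, sum(weights))
    and return the item whose segment contains it."""
    cum, total = [], 0.0
    for w in weights:
        total += w
        cum.append(total)
    x = rng.random() * total                  # random point on the bar
    return items[bisect.bisect_right(cum, x)]

random.seed(42)
picks = [weighted_pick(["A", "B", "C"], [1, 3, 6]) for _ in range(10_000)]
# "C" owns 60% of the bar, so it is picked roughly 60% of the time
```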


Four Examples of Weighted Sampling

  • Lottery: More tickets $\Rightarrow$ higher winning chance
  • Online advertising: Higher bids shown more often
  • Survey sampling: Oversampling rare groups
  • Game loot drops: Common items have higher drop rates

Weighted Sampling in Machine Learning

  • Class imbalance: Minority classes sampled more frequently
  • Mini-batch construction: Hard examples sampled more often
  • Reinforcement learning: Prioritized experience replay
  • Monte Carlo methods: Importance sampling
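For the class-imbalance case, one common recipe (sketched here with made-up labels) is to weight each example by the inverse frequency of its class, so that every class is drawn equally often in expectation:

```python
import random
from collections import Counter

# Hypothetical imbalanced dataset: class 0 is common, class 1 is rare
labels = [0] * 90 + [1] * 10

# Inverse-frequency weights: each class gets equal total weight
class_freq = Counter(labels)
sample_weights = [1.0 / class_freq[y] for y in labels]

random.seed(0)
batch = random.choices(range(len(labels)), weights=sample_weights, k=1_000)
rare_frac = sum(labels[i] == 1 for i in batch) / len(batch)
# rare_frac lands near 0.5 instead of the raw 0.1
```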

Types of Weighted Sampling

  1. With replacement
    Items can be selected multiple times, so probabilities remain constant across draws (e.g., bootstrapping)
  2. Without replacement
    Items are removed after selection, so probabilities change after each draw (e.g., surveys)
  3. Stratified weighted sampling
    Sampling within predefined groups (strata)
  4. Importance sampling
    Sampling from one distribution and correcting with weights
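Weighted sampling without replacement is the subtlest of the four. One well-known approach (due to Efraimidis and Spirakis, sketched here, not the only option) assigns each item the random key $u^{1/w}$ with $u \sim \mathrm{Uniform}(0,1)$ and keeps the $k$ largest keys:

```python
import heapq
import random

def weighted_sample_no_replacement(items, weights, k, rng=random):
    """Efraimidis-Spirakis: give each item the random key u**(1/w),
    u ~ Uniform(0, 1), and keep the k items with the largest keys."""
    keyed = [(rng.random() ** (1.0 / w), item)
             for item, w in zip(items, weights)]
    return [item for _, item in heapq.nlargest(k, keyed)]

random.seed(7)
sample = weighted_sample_no_replacement(["A", "B", "C"], [1, 3, 6], k=2)
# Two distinct items; "C" is the most likely to be included
```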

Importance Sampling

Goal: estimate an expectation under distribution $p(x)$ while sampling from $q(x)$.

$ \mathbb{E}_{p}[f(x)] = \mathbb{E}_{q}\left[f(x)\frac{p(x)}{q(x)}\right] $

Importance weight:

$ w(x) = \frac{p(x)}{q(x)} $
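A concrete toy sketch: estimate $\mathbb{E}_p[x^2] = 1$ for $p = \mathcal{N}(0,1)$ while drawing from the shifted proposal $q = \mathcal{N}(1,1)$. The ratio $p(x)/q(x)$ corrects for the mismatch:

```python
import math
import random

def normal_pdf(x, mu):
    # Normal density with mean mu and standard deviation 1
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

random.seed(0)
n = 200_000
acc = 0.0
for _ in range(n):
    x = random.gauss(1.0, 1.0)                    # sample from q
    w = normal_pdf(x, 0.0) / normal_pdf(x, 1.0)   # importance weight p/q
    acc += w * x * x                              # f(x) = x**2
estimate = acc / n
# E_p[x**2] = 1 for the standard normal, so estimate lands near 1
```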


Weighted Sampling vs Loss Weighting

Aspect                  Weighted Sampling   Loss Weighting
Data frequency          Changes             Unchanged
Loss magnitude          Unchanged           Scaled
Mini-batch composition  Affected            Unaffected

Loss Weighting Formula

Instead of changing how often samples appear, loss weighting scales their contribution to the loss:

$ \mathcal{L} = \sum_i w_i \cdot \ell(y_i, \hat{y}_i) $
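A minimal sketch with made-up numbers, using squared error as $\ell$:

```python
# Toy weighted loss with squared error (all values are made up)
ys    = [1.0, 0.0, 1.0]   # targets y_i
preds = [0.9, 0.2, 0.4]   # predictions y_hat_i
ws    = [1.0, 1.0, 3.0]   # w_i: upweight the third (e.g. rare-class) example

loss = sum(w * (y - p) ** 2 for w, y, p in zip(ws, ys, preds))
# Unweighted, the third term contributes 0.36; with w = 3 it contributes 1.08
```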


Key Takeaway

Weighted sampling controls what the model sees more often.
Loss weighting controls how much mistakes matter.

