Understanding sampling methods (visuals and code)

Image by the author

Random sampling

import random

population = 100
data = range(population)
# draw 5 distinct elements, each equally likely to be chosen
print(random.sample(data, 5))
> [4, 19, 82, 45, 41]

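Stratified sampling

import pandas as pd
from sklearn.model_selection import train_test_split

# assumption: population is a DataFrame whose 'label' column defines the strata
population = pd.DataFrame({'value': range(100), 'label': [i % 2 for i in range(100)]})
# keep 10% of the rows while preserving the label proportions
stratified_sample, _ = train_test_split(population, test_size=0.9, stratify=population['label'])
print(stratified_sample)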
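
For readers without scikit-learn, the same idea can be sketched with the standard library alone: group the records by label, then draw the same fraction from every group. The `stratified_sample` helper, the toy records, and the 80/20 label split below are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=None):
    """Draw the same fraction from every stratum (hypothetical helper)."""
    rng = random.Random(seed)
    # group the records by their stratum label
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        # round to the nearest whole element, but keep at least one per stratum
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# assumed toy population: 80 records labelled "a", 20 labelled "b"
population = [(i, "a" if i < 80 else "b") for i in range(100)]
sample = stratified_sample(population, key=lambda r: r[1], fraction=0.1, seed=0)
print(sample)
```

With a 10% fraction, the sample keeps the 80/20 split of the population: 8 records from "a" and 2 from "b".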

Cluster sampling

import random
import numpy as np

clusters = 5
pop_size = 100
sample_clusters = 2
# assign cluster ids 1 to 5 sequentially, 20 consecutive elements per cluster
cluster_ids = np.repeat(range(1, clusters + 1), pop_size // clusters)
# randomly pick which whole clusters to keep
cluster_to_select = random.sample(range(1, clusters + 1), sample_clusters)
# positions of the elements that belong to the selected clusters
indexes = [i for i, x in enumerate(cluster_ids) if x in cluster_to_select]
# the population elements are the numbers 1 to 100
cluster_associated_elements = [el for idx, el in enumerate(range(1, pop_size + 1)) if idx in indexes]
print(cluster_associated_elements)

Systematic sampling

population = 100
step = 5
# take every 5th element, starting from element 1
sample = [element for element in range(1, population, step)]
print(sample)
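
The list comprehension above always starts at element 1. A common variant picks the starting point at random within the first interval, so that every element has the same chance of being selected. Here is a minimal sketch, assuming the population is simply the numbers 1 to 100; `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(population, step, seed=None):
    """Take every step-th element, starting at a random offset below step."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random offset in 0..step-1
    return population[start::step]

population = list(range(1, 101))  # assumed population: the numbers 1 to 100
sample = systematic_sample(population, step=5, seed=42)
print(sample)
```

Passing a seed makes the draw repeatable; leaving it out gives a different starting point on each run.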

Multi-stage sampling

import random
import numpy as np

clusters = 5
pop_size = 100
sample_clusters = 2
sample_size = 5
# assign cluster ids 1 to 5 sequentially, 20 consecutive elements per cluster
cluster_ids = np.repeat(range(1, clusters + 1), pop_size // clusters)
# stage 1: randomly pick whole clusters
cluster_to_select = random.sample(range(1, clusters + 1), sample_clusters)
indexes = [i for i, x in enumerate(cluster_ids) if x in cluster_to_select]
cluster_associated_elements = [el for idx, el in enumerate(range(1, pop_size + 1)) if idx in indexes]
# stage 2: randomly sample individual elements from the selected clusters
print(random.sample(cluster_associated_elements, sample_size))
