# A Data Scientist’s Guide to 8 Types of Sampling Techniques

This article covers the different types of sampling techniques available to us. They fall into two broad families.

Probability Sampling: In probability sampling, every element of the population has a known, non-zero chance of being selected (an equal chance in the simplest case). Probability sampling gives us the best chance to create a sample that is truly representative of the population.

Non-Probability Sampling: In non-probability sampling, not all elements have an equal chance of being selected, and some may have no chance at all. Consequently, there is a significant risk of ending up with a non-representative sample that does not produce generalizable results.

For example, let’s say our population consists of 20 individuals.

Each individual is numbered from 1 to 20 and is represented by a specific color (red, blue, green, or yellow).

With probability sampling in its simplest form, each person would have a 1 in 20 chance of being chosen.

With non-probability sampling, these chances are not equal.

One person might have a better chance of being chosen than the others.

So now that we have an idea of these two sampling types, let’s dive into each and understand the different types of sampling under each section.

## Types of Probability Sampling

### Simple Random Sampling

This is a type of sampling technique you must have come across at some point.

Here, every individual is chosen entirely by chance and each member of the population has an equal chance of being selected.

Simple random sampling reduces selection bias.

One big advantage of this technique is that it is the most direct method of probability sampling.

But it comes with a caveat – it may not select enough individuals with our characteristics of interest.
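
As a minimal sketch of simple random sampling on our 20-person population, we can lean on Python's standard `random` module. The sample size of 5 and the fixed seed are assumptions made only for illustration.

```python
import random

# Population from the running example: individuals numbered 1 to 20.
population = list(range(1, 21))

# Fixed seed so the draw is repeatable (an assumption for this sketch).
rng = random.Random(42)

# Draw 5 distinct individuals without replacement;
# every subset of size 5 is equally likely.
sample = rng.sample(population, k=5)
print(sorted(sample))
```

Because the draw is purely random, nothing guarantees that a small sample like this contains anyone from a subgroup we care about, which is the caveat mentioned above.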

Monte Carlo methods use repeated random sampling for the estimation of unknown parameters.
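
To make the Monte Carlo idea concrete, here is a small sketch that estimates a population mean by repeating a simple random sample many times. Treating each person's number as the measured value, the sample size of 5, and the 2,000 repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

# Treat each individual's number (1 to 20) as a stand-in measurement.
population = list(range(1, 21))
rng = random.Random(0)  # fixed seed, only to keep the sketch repeatable

# Repeat the random draw many times and record each sample mean.
sample_means = [
    statistics.mean(rng.sample(population, k=5))
    for _ in range(2000)
]

# Averaging the sample means gives an estimate of the population mean.
estimate = statistics.mean(sample_means)
true_mean = statistics.mean(population)  # 10.5, known here only because the population is tiny
print(estimate, true_mean)
```

In a real problem the true value would be unknown, and the spread of `sample_means` would indicate how uncertain the estimate is.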

### Systematic Sampling

In this type of sampling, the first individual is selected randomly and others are selected using a fixed ‘sampling interval’.

Let’s take a simple example to understand this.

Say our population size is x and we have to select a sample of size n.

Then the next individual we select is x/n positions after the first individual.

We can select the rest in the same way.

Suppose we begin with person number 3 and want a sample of size 5 from our population of 20.

The next individual we select is then at an interval of 20/5 = 4 from the 3rd person, i.e. person 7 (3 + 4), and so on:

3, 3 + 4 = 7, 7 + 4 = 11, 11 + 4 = 15, 15 + 4 = 19, giving the sample 3, 7, 11, 15, 19.

Systematic sampling is more convenient than simple random sampling.

However, it might also lead to bias if the ordering of the population has an underlying pattern that lines up with the sampling interval (though this is quite rare).
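
The walk-through above can be sketched in a few lines of Python. The helper name `systematic_sample` and its parameters are hypothetical, and the sketch assumes the sample size is no larger than the population.

```python
import random


def systematic_sample(population, n, start_index=None, rng=random):
    """Pick n items at a fixed interval of len(population) // n.

    If no start is given, a random start within the first interval is used.
    Assumes n <= len(population).
    """
    interval = len(population) // n
    if start_index is None:
        start_index = rng.randrange(interval)
    return [population[start_index + i * interval] for i in range(n)]


population = list(range(1, 21))  # individuals numbered 1 to 20

# Start at person 3 (index 2) to reproduce the example from the text.
sample = systematic_sample(population, n=5, start_index=2)
print(sample)  # [3, 7, 11, 15, 19]
```

Leaving `start_index` unset picks the starting person at random from the first four, which is how the method is normally applied.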

### Stratified Sampling

In this type of sampling, we divide the population into subgroups (called strata) based on different traits like gender, category, etc.

We then select the sample from these subgroups. Here, we first divide our population into subgroups based on the colors red, yellow, green and blue.

Then, from each color, we select individuals in proportion to that color’s share of the population.

We use this type of sampling when we want representation from all the subgroups of the population.

However, stratified sampling requires proper knowledge of the characteristics of the population.
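
Here is a rough sketch of proportional stratified sampling. The article does not say which person has which color, so the assignment below (8 red, 4 blue, 4 green, 4 yellow) is purely hypothetical, as are the helper name `stratified_sample` and the sample size of 5.

```python
import random
from collections import defaultdict


def stratified_sample(stratum_of, sample_size, rng):
    """Proportional allocation: each stratum supplies its share of the sample."""
    strata = defaultdict(list)
    for person, stratum in stratum_of.items():
        strata[stratum].append(person)

    picked = {}
    for stratum, members in strata.items():
        # round() is a simplification: when shares are not whole numbers,
        # the total can differ slightly from sample_size.
        k = round(sample_size * len(members) / len(stratum_of))
        picked[stratum] = rng.sample(members, k)
    return picked


# Hypothetical color assignment (not given in the article):
# 8 red, 4 blue, 4 green, 4 yellow.
color_of = {}
for color, people in [
    ("red", range(1, 9)),
    ("blue", range(9, 13)),
    ("green", range(13, 17)),
    ("yellow", range(17, 21)),
]:
    for person in people:
        color_of[person] = color

picked = stratified_sample(color_of, sample_size=5, rng=random.Random(1))
counts = {color: len(members) for color, members in picked.items()}
print(counts)  # {'red': 2, 'blue': 1, 'green': 1, 'yellow': 1}
```

Note that the function needs the stratum of every individual up front, which is exactly the "proper knowledge of the population" requirement mentioned above.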

### Cluster Sampling

In a clustered sample, we use the subgroups of the population as the sampling unit rather than individuals.

The population is divided into subgroups, known as clusters, and a whole cluster is randomly selected to be included in the study. In our example, we have divided our population into 5 clusters.

Each cluster consists of 4 individuals and we have taken the 4th cluster in our sample.

We can include more clusters as per our sample size.

This type of sampling is used when we focus on a specific region or area.
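
A minimal sketch of cluster sampling follows. It assumes that consecutive numbers form a cluster (so the 4th cluster is persons 13 to 16), which the article does not state explicitly.

```python
import random

population = list(range(1, 21))  # individuals numbered 1 to 20

# Assumed layout: consecutive numbers form a cluster of 4,
# e.g. cluster 1 is [1, 2, 3, 4] and cluster 4 is [13, 14, 15, 16].
clusters = [population[i:i + 4] for i in range(0, len(population), 4)]

rng = random.Random(7)  # fixed seed, only to keep the sketch repeatable

# Randomly choose whole clusters; raise k to include more clusters.
chosen_clusters = rng.sample(clusters, k=1)

# Everyone in a chosen cluster enters the sample.
sample = [person for cluster in chosen_clusters for person in cluster]
print(sample)
```

The random choice happens at the cluster level only; once a cluster is chosen, all of its members are included.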

## Types of Non-Probability Sampling

### Convenience Sampling

This is perhaps the easiest method of sampling because individuals are selected based on their availability and willingness to take part.

Here, let’s say individuals numbered 4, 7, 12, 15 and 20 want to be part of our sample, and hence, we will include them in the sample.

Convenience sampling is prone to significant bias, because the sample may not represent characteristics of the population such as religion or gender.

### Quota Sampling

In this type of sampling, we choose items based on predetermined characteristics of the population.

Suppose we have to select individuals whose number is a multiple of four. The individuals numbered 4, 8, 12, 16, and 20 are then already reserved for our sample.

In quota sampling, the chosen sample might not be the best representation of the characteristics of the population that weren’t considered.
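
The multiples-of-four example can be sketched as follows. The helper `quota_sample` is hypothetical, and it assumes people are considered in numerical order until the quota is filled.

```python
def quota_sample(candidates, meets_criterion, quota):
    """Take candidates in order, keeping those that match, until the quota is full."""
    sample = []
    for person in candidates:
        if meets_criterion(person):
            sample.append(person)
        if len(sample) == quota:
            break
    return sample


# Predetermined characteristic from the text: the number is a multiple of four.
sample = quota_sample(range(1, 21), lambda person: person % 4 == 0, quota=5)
print(sample)  # [4, 8, 12, 16, 20]
```

Nothing here is random: who ends up in the sample depends entirely on the criterion and the order in which people are considered.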

### Judgment Sampling

It is also known as selective sampling.

It depends on the judgment of the experts when choosing whom to ask to participate.

Suppose our experts believe that people numbered 1, 7, 10, 15, and 19 should be considered for our sample, as they may help us draw better inferences about the population.

As you can imagine, judgment sampling is also prone to bias by the experts and may not necessarily be representative.

### Snowball Sampling

I quite like this sampling technique.

Existing participants are asked to nominate further people known to them, so that the sample grows in size like a rolling snowball.

This method of sampling is effective when a sampling frame is difficult to identify.

Here, we randomly chose person 1 for our sample; he/she then recommended person 6, person 6 recommended person 11, and so on:

1 -> 6 -> 11 -> 14 -> 19

There is a significant risk of selection bias in snowball sampling, as the referred individuals will share common traits with the person who recommends them.
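
The referral chain above can be sketched as a simple walk through a lookup table. The `referrals` dictionary encodes the chain from the example; the helper `snowball_sample` is hypothetical and simplifies real snowball sampling by letting each person nominate only one other person.

```python
def snowball_sample(seed, referrals, sample_size):
    """Follow referrals from the seed until the sample is full or the chain ends."""
    sample = [seed]
    current = seed
    while len(sample) < sample_size:
        nominee = referrals.get(current)
        # Stop if nobody is nominated or the nominee is already in the sample.
        if nominee is None or nominee in sample:
            break
        sample.append(nominee)
        current = nominee
    return sample


# Who recommends whom, taken from the example chain.
referrals = {1: 6, 6: 11, 11: 14, 14: 19}

chain = snowball_sample(seed=1, referrals=referrals, sample_size=5)
print(chain)  # [1, 6, 11, 14, 19]
```

Only the seed is chosen at random; every later member depends on the previous one, which is where the selection bias comes from.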

## End Notes

In this article, we learned about the concept of sampling, steps involved in sampling, and the different types of sampling methods.

Sampling has wide applications in the statistical world as well as the real world.