Stats 101: How to do sampling in R?


The previous post explained sampling theory. This post shows how to perform the different types of sampling in R.

R provides functions for each type of sampling technique.


Simple Random Sampling in R

Simple random sampling can be done in R using the sample() function. Its replace argument decides whether sampling is done with or without replacement, that is, whether a value from the original set can appear more than once in the selection.

replace=TRUE means repetitions of values are allowed.
replace=FALSE means repetitions of values are NOT allowed.

As an example, suppose you want to select 12 numbers at random from the first 100 numbers, without repetition.

The parameters of sample() used for this are:

  • x: the set of values from which the random selection is made
  • size: the number of values to select
  • replace: whether to allow repetition or not
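The example described above can be sketched as follows. This is a minimal illustration, not the original code from the post: the variable names and the set.seed() call (added only to make the draws reproducible) are assumptions.

```r
# Fix the random seed so the draws are reproducible (an added assumption).
set.seed(42)

# Select 12 of the first 100 numbers WITHOUT replacement: no value repeats.
sample_without <- sample(x = 1:100, size = 12, replace = FALSE)
print(sample_without)

# Select 12 of the first 100 numbers WITH replacement: values may repeat.
sample_with <- sample(x = 1:100, size = 12, replace = TRUE)
print(sample_with)
```

With replace=FALSE, size cannot be larger than the number of values in x; with replace=TRUE it can.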

Output (figure): Simple Random Sampling WITHOUT Replacement

Output (figure): Simple Random Sampling with Replacement

Stratified Sampling in R

As discussed, stratified sampling needs a grouping column as a reference. Based on this column, rows are selected from each group.

There are many libraries in R that perform stratified sampling. Two of the easiest to use are listed below.

  • library(sampling): strata() function
  • library(caret): createDataPartition() function

Let's use a tiny dataset to understand this concept.
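The original sample data is not reproduced here, so the sketch below uses a hypothetical data frame (SampleData, with made-up Name and Group columns) to show both functions. The names SampleIndex and SampleIndexNew follow the output labels in this post; the sampling fraction and per-group sizes are assumed values. It requires the caret and sampling packages to be installed.

```r
library(caret)     # provides createDataPartition()
library(sampling)  # provides strata()

# Hypothetical data: 12 rows in 3 groups of 4, already sorted by Group
# (strata() expects the sizes in the order the groups appear in the data).
SampleData <- data.frame(
  Name  = paste0("Person", 1:12),
  Group = rep(c("A", "B", "C"), each = 4),
  stringsAsFactors = FALSE
)

set.seed(42)  # reproducible draws (an added assumption)

# caret: pick about 50% of the rows within each group.
SampleIndex <- caret::createDataPartition(factor(SampleData$Group),
                                          p = 0.5, list = FALSE)
print(SampleData[SampleIndex, ])

# sampling: pick 2 rows from each group, without replacement ("srswor").
SampleIndexNew <- sampling::strata(SampleData, stratanames = "Group",
                                   size = c(2, 2, 2), method = "srswor")
print(SampleData[SampleIndexNew$ID_unit, ])
```

createDataPartition() takes a proportion (p) and returns row indexes, while strata() takes an explicit sample size per group and returns a data frame whose ID_unit column holds the selected row numbers.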

Output (figure): Sample Data for Stratified Sampling

Output(SampleIndex) (figure): Stratified sampling using createDataPartition() function from library(caret)

Output(SampleIndexNew) (figure): Stratified sampling using strata() function from library(sampling)

Systematic Sampling in R

Systematic sampling means selecting every ith value from the dataset. The seq() function in R generates the indexes of these values.
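A minimal sketch of this idea, assuming a hypothetical vector of 50 values and a sampling interval of 5 (both chosen only for illustration):

```r
# Hypothetical data: the integers 101 to 150.
values <- 101:150

# Sampling interval: take every 5th value (an assumed choice).
step <- 5

# seq() generates the indexes 5, 10, 15, ..., 50.
index <- seq(from = step, to = length(values), by = step)

# Use the indexes to pick every 5th value from the data.
systematic_sample <- values[index]
print(systematic_sample)
```

Here the selection starts at the ith position; starting from a randomly chosen position between 1 and i is a common variation.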

Output (figure): Systematic sampling in R using seq() function

Biased Sampling in R

Biased sampling does not require any special function in R. You can select values by any indexes you choose and inspect them.
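As a small illustration, assuming a hypothetical vector and hand-picked positions (neither is from the original post):

```r
# Hypothetical data: the integers 101 to 150.
values <- 101:150

# Positions chosen by hand rather than at random.
chosen_index <- c(1, 2, 3, 10, 25)

# Plain indexing returns the values at those positions.
biased_sample <- values[chosen_index]
print(biased_sample)
```

Because the positions are picked by the analyst, the result reflects that choice and is not a random sample.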

Output (figure): Biased sampling in R using manual indexing

Conclusion

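  • Sampling means selecting a subset of values from a larger group (the population).
  • A randomly selected sample tends to be representative of the whole group (population).
  • Simple Random Sampling in R is done using the sample() function.
  • Stratified Sampling in R can be done using strata() from library(sampling) or createDataPartition() from library(caret).
  • Systematic Sampling in R is done using the seq() function.
  • Biased Sampling in R is done by choosing the sample indexes manually.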

Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using Artificial intelligence. His expertise is backed with 10 years of industry experience. Being a senior data scientist he is responsible for designing the AI/ML solution to provide maximum gains for the clients. As a thought leader, his focus is on solving the key business problems of the CPG Industry. He has worked across different domains like Telecom, Insurance, and Logistics. He has worked with global tech leaders including Infosys, IBM, and Persistent systems. His passion to teach inspired him to create this website!
