R – TUTORIAL – DATA WRANGLING

A step-by-step review of the most basic, yet most commonly used data manipulation functions

Hi, I’m Gregor, a data scientist and someone who has to evaluate and clean data most of the time. I enjoy working with Python / pandas and R / tidyverse equally in my projects. Since we used R and the tidyverse in our latest project, I would like to share the basic but most used functions for manipulating data sets.

In the next section, I describe my technical setup so that you can use the examples in this article right away. Then, in Section 3, I present the seven functions using the Gapminder dataset. If you have any questions or comments, feel free to share them with me.

I will introduce the functions using the Gapminder dataset. It contains data on life expectancy, GDP per capita, and population by country over several decades.

The seven functions are part of the dplyr package developed by Hadley Wickham et al. dplyr belongs to the tidyverse ecosystem of packages, which I think is what makes R such an efficient and clean platform for data science. If you want to know more about the tidyverse, I highly recommend the free book “R for Data Science”.

dplyr is a grammar of data manipulation that provides a consistent set of verbs to help you solve the most common data manipulation challenges.
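A minimal setup sketch for following along. It assumes that the dplyr and gapminder packages are installed from CRAN; loading dplyr alone instead of the whole tidyverse is a choice made here to keep the example light. The name `first_rows` is only illustrative.

```r
# Assumption: both packages are installed, e.g. with
# install.packages(c("dplyr", "gapminder"))
library(dplyr)      # the data manipulation verbs used in this article
library(gapminder)  # provides the `gapminder` data frame

# Show the first 10 rows of the data set
first_rows <- head(gapminder, 10)
print(first_rows)
```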

Gapminder dataset (10 rows); author’s image

The seven functions let you select and rename specific columns, sort and filter your data set, create and calculate new columns, and summarize values. I will use the Gapminder data for each function so that the examples are easy to follow and apply to your own data sets. Note that the further we go, the more I use a combination of these functions.

3.1 select() – Select columns in a data set

Select only the columns continent, year, and pop.
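This selection could look roughly as follows. It is a sketch that assumes the data comes from the gapminder package; `selected` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Keep only the three named columns, then show the first 10 rows
selected <- gapminder %>%
  select(continent, year, pop) %>%
  head(10)
print(selected)
```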

Gapminder dataset (10 rows); author’s image

Select all columns except the year column.
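A sketch of excluding a column with a minus sign in front of its name, again assuming the gapminder package as the data source; `without_year` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# A leading minus drops the column and keeps all others
without_year <- gapminder %>%
  select(-year) %>%
  head(10)
print(without_year)
```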

Gapminder dataset (10 rows); author’s image

Select all columns that start with “co” using starts_with(). Check out the other useful helper functions as well, including ends_with() and contains().
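A sketch of a helper-based selection. The prefix "co" is an assumption read from the garbled sentence above; in the gapminder data it matches country and continent.

```r
library(dplyr)
library(gapminder)

# starts_with() matches column names by prefix
co_columns <- gapminder %>%
  select(starts_with("co")) %>%
  head(10)
print(co_columns)
```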

Gapminder dataset (10 rows); author’s image

3.2 rename() – Rename columns

Rename the column year to Year and lifeExp to Life Expectancy.
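One way this renaming might be written. rename() takes `new_name = old_name` pairs, and a new name that contains a space has to be wrapped in backticks. The exact capitalization of the new names is an assumption.

```r
library(dplyr)
library(gapminder)

# Syntax is new_name = old_name; backticks allow a space in the name
renamed <- gapminder %>%
  rename(Year = year, `Life Expectancy` = lifeExp) %>%
  head(10)
print(renamed)
```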

Gapminder dataset (10 rows); author’s image

3.3 arrange() – Sort the data set

Sort by year.
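A sketch of the sort using dplyr's arrange(), which orders rows in ascending order by default; `by_year` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Ascending sort by year
by_year <- gapminder %>%
  arrange(year)
print(head(by_year, 10))
```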

Gapminder dataset (10 rows); author’s image

Sort by lifeExp and by year (descending).
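A sketch of sorting by two columns. It assumes that "descending" applies to year only, so lifeExp is sorted ascending and ties are broken by the most recent year first.

```r
library(dplyr)
library(gapminder)

# Assumption: lifeExp ascending, year descending via desc()
sorted <- gapminder %>%
  arrange(lifeExp, desc(year))
print(head(sorted, 10))
```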

Gapminder dataset (10 rows); author’s image

3.4 filter() – Filter rows in a data set

Filter rows for the year 1972.
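A sketch of a single-condition filter; `year_1972` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Keep only the rows where year equals 1972
year_1972 <- gapminder %>%
  filter(year == 1972)
print(head(year_1972, 10))
```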

Gapminder dataset (10 rows); author’s image

Filter rows for the year 1972 with a life expectancy below average.
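A sketch that combines two conditions. It assumes that "average" means the mean of lifeExp over the whole data set, because filter() evaluates `mean(lifeExp)` on all rows of the input, not only on the 1972 rows.

```r
library(dplyr)
library(gapminder)

# Conditions separated by commas are combined with AND.
# mean(lifeExp) is computed over the entire data set (all years).
below_avg <- gapminder %>%
  filter(year == 1972, lifeExp < mean(lifeExp))
print(head(below_avg, 10))
```

To compare against the 1972 average instead, the two conditions could be split into two consecutive filter() calls.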

Gapminder dataset (10 rows); author’s image

Filter rows for the year 1972 with a life expectancy below average where the country is either Bolivia or Angola.
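A sketch of adding an OR condition on the country. Using `%in%` is one option chosen here; `country == "Bolivia" | country == "Angola"` would be equivalent.

```r
library(dplyr)
library(gapminder)

# %in% expresses "Bolivia OR Angola"; the commas still mean AND
two_countries <- gapminder %>%
  filter(
    year == 1972,
    lifeExp < mean(lifeExp),
    country %in% c("Bolivia", "Angola")
  )
print(two_countries)
```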

Gapminder dataset (2 rows); author’s image

3.5 mutate() – Create new columns in your dataset

Create a column that combines the continent and country data, and a second column that shows the rounded lifeExp value.
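A sketch of both new columns. The column names `continent_country` and `lifeExp_rounded` and the " - " separator are hypothetical choices for illustration.

```r
library(dplyr)
library(gapminder)

mutated <- gapminder %>%
  mutate(
    # paste() converts the factor columns to text before joining them
    continent_country = paste(continent, country, sep = " - "),
    # round() without a digits argument rounds to whole numbers
    lifeExp_rounded = round(lifeExp)
  )
print(head(mutated, 10))
```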

Gapminder dataset (10 rows); author’s image

3.6 summarise() – Create summary calculations for your dataset

Calculate the mean and standard deviation of population and life expectancy for the entire data set.
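A sketch of the four summary values. The result column names are illustrative; summarise() collapses the whole data set into a single row.

```r
library(dplyr)
library(gapminder)

# One output row with one column per summary statistic
overall <- gapminder %>%
  summarise(
    pop_mean     = mean(pop),
    pop_sd       = sd(pop),
    lifeExp_mean = mean(lifeExp),
    lifeExp_sd   = sd(lifeExp)
  )
print(overall)
```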

Gapminder dataset (summary); author’s image

3.7 group_by() – Group the data set and create summary calculations

The summarise() function is only so useful without the group_by() function. Using both together is an effective way to create new data sets. In the following example, I group the data set by continent and then create summaries for population and lifeExp.
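A sketch of a grouped summary. Which statistics were used originally is not recoverable from the text, so the mean and standard deviation from Section 3.6 are reused here as an assumption; the column names are illustrative.

```r
library(dplyr)
library(gapminder)

# One output row per continent
by_continent <- gapminder %>%
  group_by(continent) %>%
  summarise(
    pop_mean     = mean(pop),
    pop_sd       = sd(pop),
    lifeExp_mean = mean(lifeExp),
    lifeExp_sd   = sd(lifeExp)
  )
print(by_continent)
```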

Gapminder dataset (grouped and summarized); author’s image

It is also possible to group by more than one column. In the following example I use group_by() with continent and year.
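A sketch of grouping by two columns. The chosen statistics and column names are assumptions, and `.groups = "drop"` (available in dplyr 1.0.0 and later) is added here so that the result is returned ungrouped.

```r
library(dplyr)
library(gapminder)

# One output row per combination of continent and year
by_continent_year <- gapminder %>%
  group_by(continent, year) %>%
  summarise(
    pop_mean     = mean(pop),
    lifeExp_mean = mean(lifeExp),
    .groups = "drop"  # return an ungrouped result
  )
print(head(by_continent_year, 10))
```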

Gapminder dataset (grouped and summarized); author’s image

In this article, I showed you my most commonly used R functions for manipulating data sets. I gave you an example for each function that will hopefully be a good basis for trying them out yourself. If you want to know more about R and dplyr, be sure to check out the official documentation as well as the excellent R for Data Science book.

Let me know what you think and what your most used functions are. Thank you!
