Hi, I’m Gregor, a data scientist, which means I spend most of my time evaluating and cleaning data. I enjoy working with both Python / Pandas and R / tidyverse in my projects. Since we used R and the tidyverse in our latest project, I would like to share the most basic, yet most frequently used, functions for manipulating data sets.
In the next section, I introduce the dataset used in the examples throughout this article. Then, in Section 3, I present seven functions using the Gapminder dataset. If you have any questions or comments, feel free to share them with me.
I will introduce the functions using the Gapminder dataset. It contains data on life expectancy, GDP per capita, and population by country over several decades.
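To follow along, the snippets below assume the gapminder R package, which provides the data as a tibble called gapminder with the columns country, continent, year, lifeExp, pop and gdpPercap. They also use the %>% pipe, which dplyr re-exports. A minimal setup sketch:

```r
# Assumes both packages are installed, e.g.
# install.packages(c("dplyr", "gapminder"))
library(dplyr)
library(gapminder)

# One row per country and year; glimpse() lists every column and its type
glimpse(gapminder)
```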
The seven functions are part of the dplyr package developed by Hadley Wickham et al., which belongs to the tidyverse ecosystem of packages. I think the tidyverse is what makes R such an efficient and clean data-processing platform. If you want to learn more about the tidyverse, I highly recommend the free book “R for Data Science”.
dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges.
The seven functions let you select and rename specific columns, sort and filter the data set, create and calculate new columns, and summarise values. I use the Gapminder data for every task so that the examples are easy to follow and to apply to your own data sets. Note that as we go further, I combine several of these functions.
3.1 select() – Select columns in a data set
Select only the columns continent, year and pop.
Select all columns except year.
Select all columns whose names start with “co” using starts_with(). Other useful helpers include ends_with() and contains().
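A sketch of the three selections, assuming dplyr and the gapminder package as set up earlier (the object names are only illustrative). The last two lines show the related helpers ends_with() and contains():

```r
library(dplyr)
library(gapminder)

# Keep only continent, year and pop
sel_cols <- gapminder %>% select(continent, year, pop)

# Keep every column except year (the minus sign drops a column)
no_year <- gapminder %>% select(-year)

# Keep every column whose name starts with "co": country and continent
co_cols <- gapminder %>% select(starts_with("co"))

# Related helpers
exp_cols <- gapminder %>% select(ends_with("Exp"))  # lifeExp
cap_cols <- gapminder %>% select(contains("cap"))   # gdpPercap
```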
3.2 rename() – Rename columns
Rename the column year to Year and lifeExp to Life expectancy.
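rename() takes new_name = old_name pairs. A minimal sketch; note that a column name containing a space has to be wrapped in backticks:

```r
library(dplyr)
library(gapminder)

# new_name = old_name; backticks allow the space in "Life expectancy"
renamed <- gapminder %>%
  rename(Year = year, `Life expectancy` = lifeExp)

names(renamed)
```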
3.3 arrange() – Sort the data set
Sort by year.
Sort by lifeExp and then by year (descending).
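A sketch using arrange(), which sorts in ascending order by default. I read “(descending)” as applying to year only; wrap lifeExp in desc() as well if both columns should be sorted in descending order.

```r
library(dplyr)
library(gapminder)

# Sort by year, ascending
by_year <- gapminder %>% arrange(year)

# Sort by lifeExp (ascending); ties are broken by year in descending order
by_life_year <- gapminder %>% arrange(lifeExp, desc(year))
```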
3.4 filter() – Filter rows in a data set
Filter rows where year is 1972.
Filter rows where year is 1972 and life expectancy is below average.
Filter rows where year is 1972, life expectancy is below average, and the country is either Bolivia OR Angola.
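The three filters could look like the sketch below. “Below average” is interpreted here as below the mean life expectancy of 1972, which is why the year filter runs in a separate step first: putting mean(lifeExp) in the same filter() call as year == 1972 would compare against the mean over all years instead.

```r
library(dplyr)
library(gapminder)

# Rows from 1972 only
y1972 <- gapminder %>% filter(year == 1972)

# 1972 rows whose life expectancy is below the 1972 mean
y1972_low <- gapminder %>%
  filter(year == 1972) %>%
  filter(lifeExp < mean(lifeExp))

# Same, restricted to Bolivia OR Angola
# (country %in% c("Bolivia", "Angola") is an equivalent condition)
y1972_low_bo_an <- gapminder %>%
  filter(year == 1972) %>%
  filter(lifeExp < mean(lifeExp),
         country == "Bolivia" | country == "Angola")
```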
3.5 mutate() – Create new columns in your data set
Create a column that combines the continent and country values, and a second column that contains the rounded lifeExp values.
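A sketch of both new columns with mutate(). The column names cont_country and lifeExp_round and the " - " separator are my own illustrative choices:

```r
library(dplyr)
library(gapminder)

mutated <- gapminder %>%
  mutate(
    # Combine continent and country into one string, e.g. "Asia - Afghanistan"
    cont_country = paste(continent, country, sep = " - "),
    # Life expectancy rounded to whole years
    lifeExp_round = round(lifeExp)
  )
```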
3.6 summarise() – Create summary statistics for your data set
Calculate the mean and standard deviation of population and life expectancy for the entire data set.
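A minimal sketch with summarise() and base R’s mean() and sd(); the result column names are illustrative:

```r
library(dplyr)
library(gapminder)

# One row of summary statistics for the whole data set
overall <- gapminder %>%
  summarise(
    mean_pop     = mean(pop),
    sd_pop       = sd(pop),
    mean_lifeExp = mean(lifeExp),
    sd_lifeExp   = sd(lifeExp)
  )
```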
3.7 group_by() – Group the data set and create summary statistics
summarise() is only half as useful without group_by(). Used together, they are an effective way to create new, aggregated data sets. In the following example, I group the data set by continent and then summarise pop and lifeExp.
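Adding group_by(continent) before summarise() produces one row per continent. A sketch, with illustrative result column names:

```r
library(dplyr)
library(gapminder)

# One row per continent with population and life expectancy summaries
by_continent <- gapminder %>%
  group_by(continent) %>%
  summarise(
    mean_pop     = mean(pop),
    sd_pop       = sd(pop),
    mean_lifeExp = mean(lifeExp),
    sd_lifeExp   = sd(lifeExp)
  )
```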
It is also possible to group by more than one column. In the following example, I use group_by() with continent and year.
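Grouping by continent and year yields one row per combination. In this sketch, the .groups = "drop" argument (available since dplyr 1.0.0) returns an ungrouped result instead of keeping the grouping by continent:

```r
library(dplyr)
library(gapminder)

# One row per continent and year
by_cont_year <- gapminder %>%
  group_by(continent, year) %>%
  summarise(
    mean_pop     = mean(pop),
    mean_lifeExp = mean(lifeExp),
    .groups = "drop"
  )
```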
In this article, I showed you the R functions I use most often for manipulating data sets. I gave you examples for each function that will hopefully be a good starting point for trying them out. If you want to learn more about R and dplyr, be sure to check the official documentation as well as the excellent “R for Data Science” book.
Let me know what you think and what your most used functions are. Thank you!