Skip to contents
#remotes::install_github("beckerbenj/eatRecode") 
library(eatRecode)

The eatRecode package helps you to recode open text fields. The infrastructure is pretty straight forward: eatRecode draws on existing recode lists containing already recoded value pairs. All values in your data set that are already part of the recode list can be recoded automatically. All other values are handled manually, and can be added to the recode list in the end. Over time, your recode lists will grow, calling for less and less manual recodes every time.

Recode Lists

A recode list always has to contain the two columns oldValues and newValues:

#>   oldValues newValues
#> 1   Bavaria   Germany
#> 2    Berlin   Germany
#> 3   England        UK
#> 4     Wales        UK

The values in the column oldValues will be matched with values in the column you want to recode. All values in your column that already are in the data set will be recoded to the values saved in the column newValues of the recode list.

Getting Started

The workflow is quite simple:

  1. Apply a recode list to your data frame.
  2. Extract all columns that couldn’t be recoded and recode them manually.
  3. Update your recode list.
  4. Apply this updated recode list to your data frame.

Let’s take a closer look at these four steps:

1. Apply a Recode List to Your Data Frame

We have the following data frame, and want to recode the values in the column country:

#>   id    country
#> 1  1     Berlin
#> 2  2      Kairo
#> 3  3    England
#> 4  4 Schottland

The two main functions of eatRecode are useRecodeList() and extractManualRecode(). As you might have guessed, useRecodeList() applies a recode list to a column of your data frame:

recoded_df <- useRecodeList(df, varName = "country", new_varName = "country_r", recodeList = recode_db)
recoded_df
#>   id    country country_r
#> 1  1     Berlin   Germany
#> 2  2      Kairo      <NA>
#> 3  3    England        UK
#> 4  4 Schottland      <NA>

This adds the column country_r to our data frame df, which contains all recoded values that were part of the recode list recode_db.

2. Extract All Columns That Couldn’t Be Recoded and Recode Them Manually

All others can be extracted with extractManualRecode():

manual_recodes <- extractManualRecode(recoded_df, varName = "country_r")
manual_recodes
#>   id    country newValues
#> 2  2      Kairo      <NA>
#> 4  4 Schottland      <NA>

This list can be edited manually to add the recoded values in newValues. For example, you can save the file as an .xlsx file and edit it in excel, then load it into R again:

openxlsx::write.xlsx(manual_recodes, "./manual_recodes.xlsx")
recoded_df2 <- openxlsx::read.xlsx("./manual_recodes_changed.xlsx")

You can also edit the data directly in R:

manual_recodes$newValues <- c("Egypt", "UK")

3. Update Your Recode List

The next step is to update your recode data base, so it contains all needed values:

updateRecodeDB(newRecodes = manual_recodes,
               oldValues = "country",
               directory = here::here("tests", "testthat"), # I use the here package to set the path to the data base
               DBname = "helper_recodeDB_2",             
               ListName = "country",
               fileType = "xlsx")
#> [1] "Successfully updated helper_recodeDB_2.xlsx"

4. Apply this Updated Recode List to Your Data Frame

Now you can use this recoded list on your data frame.

recodeList_2 <- as.data.frame(readxl::read_xlsx(here::here("tests", "testthat", "helper_recodeDB_2.xlsx")))
recoded_df2 <- useRecodeList(df, varName = "country", new_varName = "country_r", 
                             recodeList = recodeList_2)

recoded_df2
#>   id    country country_r
#> 1  1     Berlin   Germany
#> 2  2      Kairo     Egypt
#> 3  3    England        UK
#> 4  4 Schottland        UK