Getting Started
getting_started.Rmd
The `eatRecode` package helps you recode open text fields. Its approach is straightforward: `eatRecode` draws on existing recode lists that contain already recoded value pairs. All values in your data set that are already part of a recode list can be recoded automatically. All other values are recoded manually and can then be added to the recode list. Over time, your recode lists grow, so fewer and fewer values need to be recoded by hand.
Recode Lists
A recode list always has to contain the two columns `oldValues` and `newValues`:
```
#>   oldValues newValues
#> 1   Bavaria   Germany
#> 2    Berlin   Germany
#> 3   England        UK
#> 4     Wales        UK
```
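If you want to experiment without an existing recode database, the example list above can be rebuilt as a plain data frame. The name `recode_db` matches the object used later in this vignette; the uniqueness check reflects our own assumption that each old value should map to a single new value, not a documented `eatRecode` requirement:

```r
# Rebuild the example recode list shown above as a plain data frame.
recode_db <- data.frame(
  oldValues = c("Bavaria", "Berlin", "England", "Wales"),
  newValues = c("Germany", "Germany", "UK", "UK"),
  stringsAsFactors = FALSE
)

# A recode list must provide both required columns.
stopifnot(all(c("oldValues", "newValues") %in% names(recode_db)))

# Assumption: each old value appears only once, so the lookup is unambiguous.
stopifnot(!anyDuplicated(recode_db$oldValues))
```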
The values in the column `oldValues` are matched against the values in the column you want to recode. Every value in your column that appears in `oldValues` is recoded to the corresponding value in the `newValues` column of the recode list.
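This matching behaves like a table lookup. A minimal base-R sketch of the idea, assuming exact, case-sensitive string matching; it illustrates the concept and is not the `eatRecode` implementation:

```r
# Example recode list and a column to recode.
recode_db <- data.frame(
  oldValues = c("Bavaria", "Berlin", "England", "Wales"),
  newValues = c("Germany", "Germany", "UK", "UK"),
  stringsAsFactors = FALSE
)
country <- c("Berlin", "Kairo", "England", "Schottland")

# match() returns, for each element of `country`, the row index of the
# first equal entry in recode_db$oldValues, or NA if there is none.
idx <- match(country, recode_db$oldValues)

# Indexing newValues with NA yields NA, so unknown values stay unrecoded.
country_r <- recode_db$newValues[idx]
country_r
# returns c("Germany", NA, "UK", NA)
```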
Getting Started
The workflow is quite simple:
- Apply a recode list to your data frame.
- Extract all values that couldn’t be recoded and recode them manually.
- Update your recode list.
- Apply this updated recode list to your data frame.
Let’s take a closer look at these four steps:
1. Apply a Recode List to Your Data Frame
We have the following data frame and want to recode the values in the column `country`:
```
#>   id    country
#> 1  1     Berlin
#> 2  2      Kairo
#> 3  3    England
#> 4  4 Schottland
```
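The vignette does not show how `df` is created. One way to reproduce it for the examples that follow, assuming plain character columns:

```r
# Reconstruct the example data frame shown above (assumption: the hidden
# setup code creates an equivalent `df`).
df <- data.frame(
  id = 1:4,
  country = c("Berlin", "Kairo", "England", "Schottland"),
  stringsAsFactors = FALSE
)
str(df)
```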
The two main functions of `eatRecode` are `useRecodeList()` and `extractManualRecode()`. As you might have guessed, `useRecodeList()` applies a recode list to a column of your data frame:
```r
recoded_df <- useRecodeList(df, varName = "country", new_varName = "country_r", recodeList = recode_db)
recoded_df
#>   id    country country_r
#> 1  1     Berlin   Germany
#> 2  2      Kairo      <NA>
#> 3  3    England        UK
#> 4  4 Schottland      <NA>
```
The result, `recoded_df`, contains the columns of `df` plus the new column `country_r`. It holds the recoded value for every entry that is part of the recode list `recode_db`; all other entries are `NA`.
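Because the recode list grows over time, it can be useful to track how much of a column it already covers. A small base-R sketch; `recoded_df` is rebuilt from the printed output above so that the block runs on its own, and `coverage` is a hypothetical name:

```r
recoded_df <- data.frame(
  id = 1:4,
  country = c("Berlin", "Kairo", "England", "Schottland"),
  country_r = c("Germany", NA, "UK", NA),
  stringsAsFactors = FALSE
)

# Proportion of entries that received a recoded value.
coverage <- mean(!is.na(recoded_df$country_r))
coverage
# 0.5: two of the four values were found in the recode list
```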
2. Extract All Values That Couldn’t Be Recoded and Recode Them Manually
All values that could not be recoded can be extracted with `extractManualRecode()`:
```r
manual_recodes <- extractManualRecode(recoded_df, varName = "country_r")
manual_recodes
#>   id    country newValues
#> 2  2      Kairo      <NA>
#> 4  4 Schottland      <NA>
```
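Conceptually, this step keeps the rows whose recoded value is missing. A base-R sketch of the same selection; the object name `todo` is hypothetical, and this is not necessarily how `extractManualRecode()` works internally:

```r
recoded_df <- data.frame(
  id = 1:4,
  country = c("Berlin", "Kairo", "England", "Schottland"),
  country_r = c("Germany", NA, "UK", NA),
  stringsAsFactors = FALSE
)

# Keep only rows without a recoded value, plus the original column.
todo <- recoded_df[is.na(recoded_df$country_r), c("id", "country")]

# Add an empty newValues column to be filled in by hand.
todo$newValues <- NA_character_
todo
```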
The `newValues` column of `manual_recodes` can now be filled in manually. For example, you can write the data frame to an `.xlsx` file, edit it in Excel, and load it back into R:
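```r
# Write the unrecoded values to Excel, fill in newValues there,
# save the edited file, and read it back into R.
openxlsx::write.xlsx(manual_recodes, "./manual_recodes.xlsx")
manual_recodes <- openxlsx::read.xlsx("./manual_recodes_changed.xlsx")
```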
You can also edit the data directly in R:

```r
# Values are assigned by row order: Kairo -> Egypt, Schottland -> UK
manual_recodes$newValues <- c("Egypt", "UK")
```
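Assigning by position silently breaks if the row order changes. A sketch of a more robust alternative in base R keys the manual decisions by the original value; `manual_decisions` is a hypothetical helper name, and `manual_recodes` is rebuilt from the printed output so the block runs on its own:

```r
manual_recodes <- data.frame(
  id = c(2L, 4L),
  country = c("Kairo", "Schottland"),
  newValues = NA_character_,
  stringsAsFactors = FALSE
)

# Hand-made decisions, keyed by the original (unrecoded) value.
manual_decisions <- c(Kairo = "Egypt", Schottland = "UK")

# Look up each row by its value, so the result does not depend on row order.
# Values without a decision become NA.
manual_recodes$newValues <- unname(manual_decisions[manual_recodes$country])
manual_recodes
```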
3. Update Your Recode List
The next step is to update your recode database so that it contains all needed values:
```r
updateRecodeDB(newRecodes = manual_recodes,
               oldValues = "country",
               directory = here::here("tests", "testthat"), # I use the here package to set the path to the database
               DBname = "helper_recodeDB_2",
               ListName = "country",
               fileType = "xlsx")
#> [1] "Successfully updated helper_recodeDB_2.xlsx"
```
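To make the idea behind this step concrete, here is an in-memory base-R sketch of what updating a recode list amounts to: the manually recoded pairs are renamed to `oldValues`/`newValues` and appended to the existing list. This is an illustration under our own assumptions, not how `updateRecodeDB()` is implemented; skipping old values that are already present is a choice made here, and judging from its `directory` and `fileType` arguments, the function itself updates a file on disk. The names `new_pairs` and `updated` are hypothetical:

```r
recode_db <- data.frame(
  oldValues = c("Bavaria", "Berlin", "England", "Wales"),
  newValues = c("Germany", "Germany", "UK", "UK"),
  stringsAsFactors = FALSE
)
manual_recodes <- data.frame(
  id = c(2L, 4L),
  country = c("Kairo", "Schottland"),
  newValues = c("Egypt", "UK"),
  stringsAsFactors = FALSE
)

# Map the column holding the original values onto oldValues.
new_pairs <- data.frame(
  oldValues = manual_recodes$country,
  newValues = manual_recodes$newValues,
  stringsAsFactors = FALSE
)

# Append only pairs whose old value is not yet in the list.
updated <- rbind(recode_db,
                 new_pairs[!new_pairs$oldValues %in% recode_db$oldValues, ])
updated
```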
4. Apply This Updated Recode List to Your Data Frame
Now you can apply the updated recode list to your data frame:
```r
recodeList_2 <- as.data.frame(readxl::read_xlsx(here::here("tests", "testthat", "helper_recodeDB_2.xlsx")))
recoded_df2 <- useRecodeList(df, varName = "country", new_varName = "country_r",
                             recodeList = recodeList_2)
recoded_df2
#>   id    country country_r
#> 1  1     Berlin   Germany
#> 2  2      Kairo     Egypt
#> 3  3    England        UK
#> 4  4 Schottland        UK
```
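As a final sanity check, you may want to confirm that no value is left unrecoded. A base-R sketch, assuming the updated list contains exactly the six pairs from the previous steps (the actual file may hold more); `still_missing` is a hypothetical name:

```r
df <- data.frame(
  id = 1:4,
  country = c("Berlin", "Kairo", "England", "Schottland"),
  stringsAsFactors = FALSE
)

# Assumed contents of the updated recode list.
recodeList_2 <- data.frame(
  oldValues = c("Bavaria", "Berlin", "England", "Wales", "Kairo", "Schottland"),
  newValues = c("Germany", "Germany", "UK", "UK", "Egypt", "UK"),
  stringsAsFactors = FALSE
)

country_r <- recodeList_2$newValues[match(df$country, recodeList_2$oldValues)]

# Warn about any original values that still have no recode.
still_missing <- unique(df$country[is.na(country_r)])
if (length(still_missing) > 0) {
  warning("Not yet in the recode list: ", paste(still_missing, collapse = ", "))
}
country_r
```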