This vignette shows a minimal working example for creating a codebook with the eatCodebook package. For illustration we use the built-in R data set iris. We import the data set with the eatGADS package, which is installed automatically alongside eatCodebook. We also run the checkFormat() function from eatGADS, which adds SPSS format information to the meta data of the data set.

library(eatCodebook)
dat <- eatGADS::import_DF(iris)
#> Sepal.Length has been renamed to Sepal_Length
#> Sepal.Width has been renamed to Sepal_Width
#> Petal.Length has been renamed to Petal_Length
#> Petal.Width has been renamed to Petal_Width
dat <- eatGADS::checkFormat(dat)
#> Format of Variable Sepal_Length will be changed from character(0) to F3.1
#> Format of Variable Sepal_Width will be changed from character(0) to F3.1
#> Format of Variable Petal_Length will be changed from character(0) to F3.1
#> Format of Variable Petal_Width will be changed from character(0) to F3.1
#> Format of Variable Species will be changed from character(0) to F1
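To make the format strings in the messages above concrete: an SPSS numeric format such as F3.1 states a total width of 3 characters with 1 decimal place, and F1 a width of 1 without decimals. The following base R sketch derives such a string from a vector. Note that spss_format() is a hypothetical helper written for illustration only; it is not how eatGADS::checkFormat() is implemented.

```r
# Hypothetical helper (not part of eatGADS): derive an SPSS-style numeric
# format "F<width>.<decimals>" from the values of a numeric vector.
spss_format <- function(x) {
  txt <- as.character(x[!is.na(x)])
  # widest printed value, including the decimal point
  width <- max(nchar(txt))
  # largest number of digits after the decimal point
  decimals <- max(nchar(sub("^[^.]*\\.?", "", txt)))
  if (decimals > 0) paste0("F", width, ".", decimals) else paste0("F", width)
}

spss_format(iris$Sepal.Length)         # "F3.1"
spss_format(as.integer(iris$Species))  # "F1"
```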

Descriptive Statistics

Descriptive statistics that briefly describe each variable in the data set are a key element of a codebook. Which statistics are reported for a variable depends on the type of the variable. The function createInputForDescriptives() creates a template that holds the information needed to calculate the descriptive statistics for a GADSdat object.

inputForDescriptives <- createInputForDescriptives(GADSdat = dat)
head(inputForDescriptives)
#>              varName varLabel format   imp     type   scale        group
#> FALSE.1 Sepal_Length     <NA>   F3.1 FALSE variable numeric Sepal_Length
#> FALSE.2  Sepal_Width     <NA>   F3.1 FALSE variable numeric  Sepal_Width
#> FALSE.3 Petal_Length     <NA>   F3.1 FALSE variable numeric Petal_Length
#> FALSE.4  Petal_Width     <NA>   F3.1 FALSE variable numeric  Petal_Width
#> FALSE.5      Species     <NA>     F1 FALSE variable ordinal      Species

The template should be exported to .xlsx, modified as needed, and re-imported into R.

writeExcel(inputForDescriptives, "inputForDescriptives.xlsx")
inputForDescriptives_edited <- getInputForDescriptives("inputForDescriptives.xlsx")
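The template can also be edited in R before it is exported, instead of in a spreadsheet program. The sketch below fills the empty varLabel column on a stand-in data frame that copies three columns from the printed template; the name template and the label texts are assumptions made for illustration. With the real object, the edited data frame would still go through writeExcel() and getInputForDescriptives() as shown above.

```r
# Stand-in for the template, reduced to three of its columns
# (values copied from the printed output above).
template <- data.frame(
  varName  = c("Sepal_Length", "Sepal_Width", "Petal_Length",
               "Petal_Width", "Species"),
  varLabel = NA_character_,
  scale    = c(rep("numeric", 4), "ordinal"),
  stringsAsFactors = FALSE
)

# Illustrative label texts, matched to the rows by variable name.
labels <- c(
  Sepal_Length = "Sepal length in cm",
  Sepal_Width  = "Sepal width in cm",
  Petal_Length = "Petal length in cm",
  Petal_Width  = "Petal width in cm",
  Species      = "Iris species"
)
template$varLabel <- unname(labels[template$varName])
template
```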

This input is then used to calculate descriptive statistics via calculateDescriptives().

# calculate descriptives
descStatistics <- calculateDescriptives(GADSdat = dat, 
                                        inputForDescriptives = inputForDescriptives_edited)
descStatistics[[1]]
#>         N.valid      mean.valid        sd.valid       min.valid       max.valid 
#>           "150"          "5.84"          "0.83"           "4.3"           "7.9" 
#> sysmis.totalabs 
#>             "0"
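The values printed above are those of Sepal_Length, the first variable in the data set. As a plausibility check, the same numbers can be reproduced in base R directly from iris. This sketch is independent of eatCodebook, and rounding to two decimals is an assumption inferred from the printed output.

```r
# Recompute the printed statistics for Sepal.Length with base R.
x <- iris$Sepal.Length
check <- c(
  N.valid         = sum(!is.na(x)),
  mean.valid      = round(mean(x, na.rm = TRUE), 2),
  sd.valid        = round(sd(x, na.rm = TRUE), 2),
  min.valid       = min(x, na.rm = TRUE),
  max.valid       = max(x, na.rm = TRUE),
  sysmis.totalabs = sum(is.na(x))
)
check
```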

Value and Missing Labels

Another important part of a codebook is the documentation of the value labels of valid and missing values. createMissings() creates the corresponding overview.

missings <- createMissings(dat, inputForDescriptives = inputForDescriptives_edited)
head(missings)
#>   Var.name Wert missing    LabelSH Zeilenumbruch_vor_Wert
#> 5  Species    1    nein     setosa                   nein
#> 6  Species    2    nein versicolor                   nein
#> 7  Species    3    nein  virginica                   nein

The resulting object missings has to be written to .xlsx and imported via getMissings(). All getXXX functions perform important cleaning and preparation steps, so the export to .xlsx is obligatory even if the table is not edited.

writeExcel(missings, "example_miss.xlsx", row.names = FALSE)
miss_final <- getMissings("example_miss.xlsx")
#>   Var.name Wert missing    LabelSH Zeilenumbruch_vor_Wert
#> 1  Species    1    nein     setosa                   nein
#> 2  Species    2    nein versicolor                   nein
#> 3  Species    3    nein  virginica                   nein
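The three rows correspond to the levels of the factor Species in iris. The base R sketch below rebuilds the Wert and LabelSH columns from the factor levels. It assumes that the import assigns the integer codes 1 to 3 in level order, which matches the printed table but is not spelled out in the text above; value_labels is a name chosen for illustration.

```r
# Rebuild the value/label pairs of Species from the factor levels.
species <- iris$Species
value_labels <- data.frame(
  Var.name = "Species",
  Wert     = seq_along(levels(species)),  # integer codes 1, 2, 3
  LabelSH  = levels(species),             # labels in level order
  stringsAsFactors = FALSE
)
value_labels
```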

Variable Information

A key element of the eatCodebook package is that various forms of variable information can be supplied.

varInfo <- createVarInfo(dat, inputForDescriptives = inputForDescriptives_edited)
head(varInfo)
#>       Var.Name in.DS.und.SH Unterteilung.im.Skalenhandbuch Layout LabelSH
#> 1 Sepal_Length           ja                             NA      -    <NA>
#> 2  Sepal_Width           ja                             NA      -    <NA>
#> 3 Petal_Length           ja                             NA      -    <NA>
#> 4  Petal_Width           ja                             NA      -    <NA>
#> 5      Species           ja                             NA      -    <NA>
#>   Anmerkung.Var Gliederung Reihenfolge Titel rekodiert QuelleSH Instruktionen
#> 1             -          -          NA  <NA>      nein        -             -
#> 2             -          -          NA  <NA>      nein        -             -
#> 3             -          -          NA  <NA>      nein        -             -
#> 4             -          -          NA  <NA>      nein        -             -
#> 5             -          -          NA  <NA>      nein        -             -
#>   Hintergrundmodell HGM.Reihenfolge HGM.Variable.erstellt.aus intern.extern
#> 1              nein               -                         -             -
#> 2              nein               -                         -             -
#> 3              nein               -                         -             -
#> 4              nein               -                         -             -
#> 5              nein               -                         -             -
#>   Seitenumbruch.im.Inhaltsverzeichnis
#> 1                                nein
#> 2                                nein
#> 3                                nein
#> 4                                nein
#> 5                                nein
writeExcel(varInfo, "example_varInfo.xlsx", row.names = FALSE)
varInfo_final <- getVarInfo("example_varInfo.xlsx")
varInfo_final2 <- inferLayout(varInfo_final, GADSdat = dat, 
                              inputForDescriptives = inputForDescriptives_edited)
#>       Var.Name in.DS.und.SH Unterteilung.im.Skalenhandbuch Layout LabelSH
#> 1 Sepal_Length           ja                             NA      -    <NA>
#> 2  Sepal_Width           ja                             NA      -    <NA>
#> 3 Petal_Length           ja                             NA      -    <NA>
#> 4  Petal_Width           ja                             NA      -    <NA>
#> 5      Species           ja                             NA      -    <NA>
#>   Anmerkung.Var Gliederung Reihenfolge        Titel rekodiert QuelleSH
#> 1             -        1.1           0 Sepal Length      nein        -
#> 2             -        1.1           0  Sepal Width      nein        -
#> 3             -        1.2           0 Pepal Length      nein        -
#> 4             -        1.2           0  Pepal Width      nein        -
#> 5             -        2.1           0      Species      nein        -
#>   Instruktionen Hintergrundmodell HGM.Reihenfolge HGM.Variable.erstellt.aus
#> 1             -              nein               -                         -
#> 2             -              nein               -                         -
#> 3             -              nein               -                         -
#> 4             -              nein               -                         -
#> 5             -              nein               -                         -
#>   intern.extern Seitenumbruch.im.Inhaltsverzeichnis
#> 1             -                                nein
#> 2             -                                nein
#> 3             -                                nein
#> 4             -                                nein
#> 5             -                                nein
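The second printout contains Gliederung and Titel entries that the first one lacks, which suggests that the exported sheet was edited before it was re-imported. As a sketch, comparable entries can be set in R before the export. The data frame var_info is a stand-in with three of the columns shown above, and deriving the titles from the variable names is an assumption made for illustration.

```r
# Stand-in for varInfo, reduced to three of its columns.
var_info <- data.frame(
  Var.Name   = c("Sepal_Length", "Sepal_Width", "Petal_Length",
                 "Petal_Width", "Species"),
  Gliederung = "-",
  Titel      = NA_character_,
  stringsAsFactors = FALSE
)

# Assign each variable to a section and derive a readable title.
var_info$Gliederung <- c("1.1", "1.1", "1.2", "1.2", "2.1")
var_info$Titel <- gsub("_", " ", var_info$Var.Name, fixed = TRUE)
var_info
```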

Structure

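The structure table defines the section hierarchy of the codebook. createStructure() creates a template with one row per level (Ebene); the section titles (Titel) are then added in the exported .xlsx file.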

struc <- createStructure(varInfo_final)
head(struc)
#>     Titel Ebene
#> 1.1    NA     1
#> 1.2    NA   1.1
#> 1.3    NA   1.2
#> 2.1    NA     2
#> 2.2    NA   2.1
writeExcel(struc, "example_struc.xlsx", row.names = FALSE)
struc_final <- getStructure("example_struc.xlsx")
#>                 Titel Ebene
#> 1             Metrics     1
#> 2                <NA>   1.1
#> 3                <NA>   1.2
#> 4 Species Information     2
#> 5                <NA>   2.1
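The levels in the Ebene column are consistent with the Gliederung entries of the variable information: each subsection (for example 1.1) appears together with its parent section (1). The base R sketch below derives the same five levels and adds the two titles from the printout. It illustrates the relationship only and makes no claim about how createStructure() works internally; struc_sketch is a hypothetical name.

```r
# Section assignments as shown in the variable information.
gliederung <- c("1.1", "1.1", "1.2", "1.2", "2.1")

sub_levels <- unique(gliederung)
# Parent sections: everything before the first dot.
top_levels <- unique(sub("\\..*$", "", sub_levels))
# Radix sorting is locale independent.
ebene <- sort(c(top_levels, sub_levels), method = "radix")

struc_sketch <- data.frame(Titel = NA_character_, Ebene = ebene,
                           stringsAsFactors = FALSE)
struc_sketch$Titel[struc_sketch$Ebene == "1"] <- "Metrics"
struc_sketch$Titel[struc_sketch$Ebene == "2"] <- "Species Information"
struc_sketch
```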

Scale Information

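Scale information documents scales and the items they consist of. createScaleInfo() creates the corresponding template from the input for the descriptive statistics. Because iris contains no scales, the template has no rows in this example.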

scaleInfo <- createScaleInfo(inputForDescriptives_edited)
head(scaleInfo)
#> [1] varName              Quelle               Anzahl_valider_Werte
#> [4] Items_der_Skala     
#> <0 rows> (or 0-length row.names)
writeExcel(scaleInfo, "example_scaleInfo.xlsx", row.names = FALSE)
scaleInfo_final <- getScaleInfo("example_scaleInfo.xlsx")
#> [1] varName              Quelle               Anzahl_valider_Werte
#> [4] Items_der_Skala     
#> <0 rows> (or 0-length row.names)

Meta data

Meta data can be added to the codebook.

meta <- createMetadata()
meta[1, "Title"] <- "Codebook Test"
meta[1, "Author"] <- "Anna Muster"
meta[1, "Keywords"] <- "lsa, education"
meta[1, "Subject"] <- "test"
writeExcel(meta, "example_meta.xlsx", row.names = FALSE)
meta_final <- makeMetadata("example_meta.xlsx")

Chapters

Create the chapter structure.

chapters <- createChapters(varInfo_final2)
chapters[1, 2] <- "Iris Datensatz"

Codebook

Now we create the actual codebook script by calling the codebook() function.

latex_skript <- codebook(varInfo = varInfo_final2, missings = miss_final, struc = struc_final,
                         scaleInfo = scaleInfo_final, dat = eatGADS::extractData(dat),
                         Kennwertedatensatz = descStatistics, chapters = chapters)
#> 
#>  Erstelle Layout-Skripte fuer: dat
#>   Layout der Variable: Sepal_Length
#> Warning in Latex.length(sections.var1[d], FALSE, FALSE): Fuer folgende Zeichen
#> gibt es keine Laengenangaben: NA. Die Laenge von NA in Latex wird daher
#> unterschaetzt.
#>   Layout der Variable: Sepal_Width
#>   Layout der Variable: Petal_Length
#> Warning in Latex.length(sections.var1[d], FALSE, FALSE): Fuer folgende Zeichen
#> gibt es keine Laengenangaben: NA. Die Laenge von NA in Latex wird daher
#> unterschaetzt.
#>   Layout der Variable: Petal_Width
#>   Layout der Variable: Species
#> Warning in Latex.length(sections.var1[d], FALSE, FALSE): Fuer folgende Zeichen
#> gibt es keine Laengenangaben: NA. Die Laenge von NA in Latex wird daher
#> unterschaetzt.

Save the Codebook

The resulting object and the meta data are then saved separately to the hard drive. Both files should be saved in the same folder.

write.table(latex_skript , file = "minimal_example.tex" , fileEncoding="UTF-8" ,
            col.names=FALSE , row.names=FALSE , quote = FALSE )
write.table(meta_final , file = "minimal_example_meta.xmpdata", fileEncoding="UTF-8" ,
            col.names=FALSE , row.names=FALSE , quote = FALSE )
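The arguments quote = FALSE, col.names = FALSE and row.names = FALSE make write.table() emit the content verbatim, one element per line. The sketch below checks this round trip on a small stand-in script written to a temporary folder. It assumes that the codebook script is a character vector with one element per LaTeX line, which the text above does not state explicitly.

```r
# Stand-in for the LaTeX script: one element per line.
tex_lines <- c("\\documentclass{article}",
               "\\begin{document}",
               "Codebook",
               "\\end{document}")

out_file <- file.path(tempdir(), "roundtrip_example.tex")
write.table(tex_lines, file = out_file, fileEncoding = "UTF-8",
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# The file content should equal the original lines, backslashes included.
roundtrip_ok <- identical(readLines(out_file, encoding = "UTF-8"), tex_lines)
roundtrip_ok
```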

Compilation

The LaTeX script for the codebook has now been saved to the hard drive. Open this file in a TeX editor and compile it. Sometimes several consecutive compilation runs are required for a clean output. Alternatively, the document can be compiled from within R, for example via tools::texi2pdf().
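A minimal sketch of the compilation from within R is shown below. compile_codebook() is a hypothetical wrapper and not part of eatCodebook; it assumes a working LaTeX installation. It switches to the folder of the .tex file so that the meta data file saved next to it can be found, and tools::texi2pdf() reruns LaTeX as often as needed to resolve cross-references.

```r
# Hypothetical wrapper (not part of eatCodebook) around tools::texi2pdf().
compile_codebook <- function(tex_file) {
  stopifnot(file.exists(tex_file))
  tex_file <- normalizePath(tex_file)
  # Compile inside the folder that holds the .tex and meta data files.
  old_wd <- setwd(dirname(tex_file))
  on.exit(setwd(old_wd), add = TRUE)
  tools::texi2pdf(basename(tex_file), clean = TRUE, quiet = TRUE)
  # Path of the resulting PDF, next to the .tex file.
  file.path(dirname(tex_file), sub("\\.tex$", ".pdf", basename(tex_file)))
}

# Usage (requires a LaTeX installation):
# pdf_path <- compile_codebook("minimal_example.tex")
```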