Create a variable information data.frame from the GADSdat object. This input can be used to calculate the descriptives of the data via the calculateDescriptives function.

Usage

createInputForDescriptives(
  GADSdat,
  idExpr = "^ID",
  impExpr = c("IMPUTATION\\s+{0,1}[[:digit:]]{1,2}", "PV\\s+{0,1}[[:digit:]]{1,2}"),
  scaleExpr = "^Skala",
  itemExpr = "plausible|indikator",
  fakeItemExpr = "fake",
  nwExpr = "IDinClass",
  varNameSeparatorImp = "_",
  ncharSeparatorImp = 2,
  lastOccurrence = TRUE,
  groupSuffixImp = "imp",
  nCatsForOrdinal = c(2:5),
  nwVarNameSeparatorImp = "_",
  nwNcharSeparatorImp = 6,
  nwLastOccurrence = TRUE,
  verbose = FALSE
)

Arguments

GADSdat

Object of class GADSdat, created for example by import_spss from the eatGADS package. Alternatively, a list of objects of class GADSdat.

idExpr

Regular expression to identify ID variables from variable names (Note: for multiple expressions, i.e. if idExpr is a character vector of length > 1, at least one expression should match to identify the variable as an ID variable)

impExpr

Regular expression to identify imputed variables from variable labels in GADSdat object (Note: for multiple expressions, i.e. if impExpr is a character vector of length > 1, at least one expression should match to identify the variable as an imputed variable)

scaleExpr

Regular expression to identify scale or fake scale variables from variable labels in GADSdat object (Note: for multiple expressions, i.e. if scaleExpr is a character vector of length > 1, at least one expression should match to identify the variable as a scale variable)

itemExpr

Regular expression to identify items which constitute a true scale from the variable labels in GADSdat object

fakeItemExpr

Regular expression to identify fake items which constitute a fake scale from the variable labels in GADSdat object

nwExpr

Regular expression to identify network variables from variable labels in GADSdat object (Note: for multiple expressions, i.e. if nwExpr is a character vector of length > 1, at least one expression should match to identify the variable as a network variable)

varNameSeparatorImp

Character that separates the "pooled" suffix from the group name in the group column. For example, if multiple imputed variables occur in the wide-format data.frame as pv_1, pv_2, pv_3, use "_". If no such character exists in the data, i.e. if multiple imputations occur as pv1, pv2, pv3 instead of pv_1, pv_2, pv_3 or pv.1, pv.2, pv.3, use NA, NULL or "".

ncharSeparatorImp

Integer: only relevant if no varNameSeparatorImp exists, i.e. if multiple imputations occur as pv1, pv2, pv3 instead of pv_1, pv_2, pv_3 or pv.1, pv.2, pv.3. ncharSeparatorImp then specifies the number of characters used to identify the common variable stem. If varNameSeparatorImp is not NA, NULL or "", ncharSeparatorImp will be ignored. For example, if multiple imputations occur as pv_1, pv_2, pv_3, use varNameSeparatorImp = "_". If multiple imputations occur as pv1, pv2, pv3, use varNameSeparatorImp = NULL and ncharSeparatorImp = 2. The first 2 characters of the variable names (i.e., "pv") will be used to identify the imputed variables which belong to a common stem.

lastOccurrence

Logical: If varNameSeparatorImp occurs multiple times within a string, lastOccurrence defines whether the last occurrence should be used for splitting.

groupSuffixImp

tbd

nCatsForOrdinal

Numeric vector with the numbers of categories considered for ordinal variables. Variables whose number of categories is listed here are treated as ordinal instead of nominal. If NULL, this rule is ignored, and the nominal/ordinal assignment is done in other ways.

nwVarNameSeparatorImp

Character that separates network variable names from network variable groups. For example, if network variables occur as friend_1, friend_2, ..., friend_12, use "_". If no such character exists in the data, i.e. if network variable names occur as friend1, friend2, ..., friend12, use NA, NULL or "".

nwNcharSeparatorImp

Integer: only relevant if no nwVarNameSeparatorImp exists, i.e. if network variables occur as friend1, friend2, ..., friend12 instead of friend_1, friend_2, ..., friend_12. nwNcharSeparatorImp then specifies the number of characters used to identify the common variable stem. If nwVarNameSeparatorImp is not NA, NULL or "", nwNcharSeparatorImp will be ignored. For example, if network variables occur as friend_1, friend_2, ..., friend_12, use nwVarNameSeparatorImp = "_". If network variables occur as friend1, friend2, ..., friend12, use nwVarNameSeparatorImp = NULL and nwNcharSeparatorImp = 6. The first 6 characters of the variable names (i.e., "friend") will be used to identify the group.

nwLastOccurrence

Logical: If nwVarNameSeparatorImp occurs multiple times within a string, nwLastOccurrence defines whether the last occurrence should be used for splitting.

verbose

Logical: Should scale identification be reported?
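
The interplay of varNameSeparatorImp, ncharSeparatorImp and lastOccurrence can be pictured with a small base R sketch. The helper getStem below is hypothetical and not part of the package; it follows the worked examples above (keeping the first n characters when no separator is given) and assumes that lastOccurrence = FALSE means splitting at the first occurrence. The package's internal implementation may differ.

```r
# Hypothetical helper (not part of the package): derive the common stem
# of a single variable name, mirroring the argument descriptions above.
getStem <- function(varName, sep = "_", nChar = 2, lastOccurrence = TRUE) {
  noSep <- is.null(sep) || is.na(sep) || sep == ""
  if (noSep) {
    # No separator: keep the first nChar characters, as in the "pv1" example
    return(substr(varName, 1, nChar))
  }
  # Positions of all literal occurrences of the separator
  pos <- gregexpr(sep, varName, fixed = TRUE)[[1]]
  if (pos[1] == -1) {
    return(varName)  # separator absent: the name is treated as its own stem
  }
  # Assumption: FALSE means "split at the first occurrence"
  cutAt <- if (lastOccurrence) pos[length(pos)] else pos[1]
  substr(varName, 1, cutAt - 1)
}

getStem("pv_1")                                # "pv"
getStem("pv1", sep = NULL, nChar = 2)          # "pv"
getStem("read_pv_1")                           # "read_pv"
getStem("read_pv_1", lastOccurrence = FALSE)   # "read"
getStem("friend12", sep = "", nChar = 6)       # "friend"
```

The last call corresponds to the network variable case, where nwVarNameSeparatorImp and nwNcharSeparatorImp play the same roles.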

Value

Returns a data.frame of variable information with the following columns:

  • varName The name of the variable as it occurs in the data

  • varLabel The label of the variable as it occurs in the GADSdat label sheet

  • format The variable format as displayed in the labels sheet of the GADSdat object

  • imp Logical: Whether or not the variable is imputed

  • type The type of the variable. Two possible entries, variable or scale

  • scale The scale level of the variable. Possible entries: nominal, ordinal, numeric. ID variables and character variables have missing entries in this column. Note that 'ordinal' may sometimes be assigned erroneously, so the resulting table should be exported (for example to Excel) and checked manually.

  • group If the variable is part of a scale with several items, a common entry in the group column indicates that these variables belong together
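
The manual check suggested for the scale column could look roughly as follows. This is a sketch only: the data.frame is a hand-made stand-in with the documented column names, every value in it is invented, and a CSV file is used as a base R substitute for an Excel export.

```r
# Hypothetical stand-in for the returned table; all values are invented
varInfo <- data.frame(
  varName  = c("idstud", "sex", "motiv_1", "motiv_2", "pv_1"),
  varLabel = c("Student ID", "Sex", "Motivation item 1",
               "Motivation item 2", "Plausible Value 1"),
  format   = c("F8.0", "F1.0", "F1.0", "F1.0", "F8.2"),
  imp      = c(FALSE, FALSE, FALSE, FALSE, TRUE),
  type     = "variable",
  scale    = c(NA, "nominal", "ordinal", "ordinal", "numeric"),
  group    = c("idstud", "sex", "motiv", "motiv", "pv"),
  stringsAsFactors = FALSE
)

# Select the rows flagged as ordinal, since these may be misclassified
toCheck <- varInfo[!is.na(varInfo$scale) & varInfo$scale == "ordinal", ]

# Write them to a temporary CSV file that can be opened in Excel
outFile <- file.path(tempdir(), "varInfo_check.csv")
write.csv(toCheck, outFile, row.names = FALSE)
```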

Examples

varInfo <- createInputForDescriptives(eatGADS::pisa, impExpr = "Plausible Value")
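
The "at least one expression should match" rule stated for idExpr, impExpr, scaleExpr and nwExpr can be illustrated in base R without the package. The labels and the simplified patterns below are assumptions made for illustration; they are neither the package defaults nor taken from a real data set, and the package may implement the matching differently.

```r
# Hypothetical variable labels (invented for this sketch)
labels <- c("Reading competence PV 1", "Reading competence PV 2",
            "Student ID", "Skala: self-concept")

# Simplified patterns, assumed for illustration only
exprs <- c("IMPUTATION [[:digit:]]{1,2}", "PV [[:digit:]]{1,2}")

# One logical vector per pattern, combined with element-wise OR:
# a label counts as imputed if at least one pattern matches it
isImp <- Reduce(`|`, lapply(exprs, grepl, x = labels))
isImp
# TRUE TRUE FALSE FALSE
```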