Create input data.frame for subsequent calculation of descriptives
Source: R/createInputForDescriptives.r
Creates a data.frame of variable information from a GADSdat object. This data.frame can then be passed to the calculateDescriptives function to compute descriptive statistics of the data.
Usage
createInputForDescriptives(
GADSdat,
idExpr = "^ID",
impExpr = c("IMPUTATION\\s+{0,1}[[:digit:]]{1,2}", "PV\\s+{0,1}[[:digit:]]{1,2}"),
scaleExpr = "^Skala",
itemExpr = "plausible|indikator",
fakeItemExpr = "fake",
nwExpr = "IDinClass",
varNameSeparatorImp = "_",
ncharSeparatorImp = 2,
lastOccurrence = TRUE,
groupSuffixImp = "imp",
nCatsForOrdinal = c(2:5),
nwVarNameSeparatorImp = "_",
nwNcharSeparatorImp = 6,
nwLastOccurrence = TRUE,
verbose = FALSE
)
Arguments
- GADSdat
Object of class GADSdat, created, for example, by import_spss from the eatGADS package. Alternatively, a list of objects of class GADSdat.
- idExpr
Regular expression to identify ID variables from variable names. If idExpr is a character vector of length > 1, a variable is identified as an ID variable if at least one expression matches.
- impExpr
Regular expression to identify imputed variables from the variable labels in the GADSdat object. If impExpr is a character vector of length > 1, a variable is identified as imputed if at least one expression matches.
- scaleExpr
Regular expression to identify scale or fake scale variables from the variable labels in the GADSdat object. If scaleExpr is a character vector of length > 1, a variable is identified as a scale variable if at least one expression matches.
- itemExpr
Regular expression to identify, from the variable labels in the GADSdat object, the items that constitute a true scale.
- fakeItemExpr
Regular expression to identify, from the variable labels in the GADSdat object, the fake items that constitute a fake scale.
- nwExpr
Regular expression to identify network variables from the variable labels in the GADSdat object. If nwExpr is a character vector of length > 1, a variable is identified as a network variable if at least one expression matches.
- varNameSeparatorImp
Character used to separate the "pooled" suffix from the group name in the group column. For example, if multiply imputed variables occur in the wide-format data.frame as pv_1, pv_2, pv_3, use "_". If no such separator exists in the data, i.e. if multiple imputations occur as pv1, pv2, pv3 instead of pv_1, pv_2, pv_3 or pv.1, pv.2, pv.3, use NA, NULL or "".
- ncharSeparatorImp
Integer, only relevant if no varNameSeparatorImp exists, i.e. if multiple imputations occur as pv1, pv2, pv3 instead of pv_1, pv_2, pv_3 or pv.1, pv.2, pv.3. ncharSeparatorImp then specifies the number of characters, counted from the start of the variable name, that form the common variable stem. If varNameSeparatorImp is not NA, NULL or "", ncharSeparatorImp is ignored. For example, if multiple imputations occur as pv_1, pv_2, pv_3, use varNameSeparatorImp = "_". If multiple imputations occur as pv1, pv2, pv3, use varNameSeparatorImp = NULL and ncharSeparatorImp = 2. The first 2 characters of the variable names (i.e., "pv") are then used to identify the imputed variables that belong to a common stem.
- lastOccurrence
Logical: if varNameSeparatorImp occurs multiple times within a string, lastOccurrence defines whether the last occurrence should be used for splitting.
- groupSuffixImp
tbd
- nCatsForOrdinal
Numeric vector of category counts that indicate ordinal variables. Variables whose number of categories is contained in nCatsForOrdinal are treated as ordinal instead of nominal. If NULL, this rule is ignored and the nominal/ordinal assignment is made by other means.
- nwVarNameSeparatorImp
Character used to separate network variable names from network variable groups. For example, if network variables occur as friend_1, friend_2, ..., friend_12, use "_". If no such separator exists in the data, i.e. if network variable names occur as friend1, friend2, ..., friend12, use NA, NULL or "".
- nwNcharSeparatorImp
Integer, only relevant if no nwVarNameSeparatorImp exists, i.e. if network variables occur as friend1, friend2, ..., friend12 instead of friend_1, friend_2, ..., friend_12. nwNcharSeparatorImp then specifies the number of characters, counted from the start of the variable name, that form the common variable stem. If nwVarNameSeparatorImp is not NA, NULL or "", nwNcharSeparatorImp is ignored. For example, if network variables occur as friend_1, friend_2, ..., friend_12, use nwVarNameSeparatorImp = "_". If network variables occur as friend1, friend2, ..., friend12, use nwVarNameSeparatorImp = NULL and nwNcharSeparatorImp = 6. The first 6 characters of the variable names (i.e., "friend") are then used to identify the group.
- nwLastOccurrence
Logical: if nwVarNameSeparatorImp occurs multiple times within a string, nwLastOccurrence defines whether the last occurrence should be used for splitting.
- verbose
Should scale identification be reported?
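The base-R sketch below illustrates two conventions from the arguments above; it is not the package's implementation. matchesAny() and variableStem() are hypothetical helpers written only for this illustration, and the variable names and labels are made up. matchesAny() applies the "at least one expression should match" rule described for idExpr, impExpr, scaleExpr and nwExpr. variableStem() derives a common stem either by splitting at a separator (as with varNameSeparatorImp and lastOccurrence) or, when the separator is NULL, NA or "", by keeping the first n characters (as with ncharSeparatorImp).

```r
# Illustrative sketch only: matchesAny() and variableStem() are hypothetical
# helpers, not functions exported by the package.

# TRUE for each element of x that matches at least one of the patterns.
matchesAny <- function(x, patterns) {
  Reduce(`|`, lapply(patterns, function(p) grepl(p, x)), logical(length(x)))
}

# Common stem of imputed or network variable names.
variableStem <- function(varName, sep = "_", n = 2, lastOccurrence = TRUE) {
  if (is.null(sep) || is.na(sep) || sep == "") {
    # No separator: the first n characters form the stem (assumption based
    # on the pv1/pv2/pv3 and friend1/.../friend12 examples above).
    return(substr(varName, 1, n))
  }
  parts <- strsplit(varName, sep, fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) < 2) return(paste(p, collapse = sep))  # separator not found
    # Split at the last or at the first occurrence of the separator.
    keep <- if (lastOccurrence) length(p) - 1 else 1
    paste(p[seq_len(keep)], collapse = sep)
  }, character(1))
}

# Usage with made-up variable names and labels
matchesAny(c("IDSTUD", "IDSCHOOL", "PV1MATH"), c("^ID", "^SCHOOL"))  # TRUE TRUE FALSE
matchesAny(c("Plausible Value 1", "Gender"), "Plausible Value")      # TRUE FALSE
variableStem(c("pv_1", "pv_2", "pv_3"), sep = "_")                   # "pv" "pv" "pv"
variableStem("math_pv_1", lastOccurrence = TRUE)                     # "math_pv"
variableStem("math_pv_1", lastOccurrence = FALSE)                    # "math"
variableStem(c("friend1", "friend12"), sep = NULL, n = 6)            # "friend" "friend"
```

Matching on names or labels with grepl() keeps each pattern independent, so adding a second expression can only widen the set of matched variables, never narrow it.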
Value
Returns a data.frame of variable information with the following columns:
- varName
The name of the variable as it occurs in the data.
- varLabel
The label of the variable as it occurs in the GADSdat label sheet.
- format
The variable format as displayed in the labels sheet of the GADSdat object.
- imp
Logical: whether or not the variable is imputed.
- type
The type of the variable. Two possible entries: variable or scale.
- scale
The scale level of the variable. Possible entries: nominal, ordinal, numeric. ID variables and character variables have missing entries in this column. Be aware that 'ordinal' may sometimes be assigned erroneously; the resulting table should therefore be exported to Excel for further checks.
- group
If the variable is part of a scale with several items, a common entry in the group column indicates that these variables belong together.
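The caution about 'ordinal' entries can be made concrete with a short base-R sketch of the nCatsForOrdinal rule. ordinalByCategoryCount() is a hypothetical helper, the data are made up, and counting categories as distinct non-missing values is an assumption; the package may count categories differently and applies further rules that are not shown here. Under the default nCatsForOrdinal = c(2:5), a two-category variable such as the made-up gender column below ends up classified as ordinal in this sketch, which is the kind of assignment worth checking by hand.

```r
# Hypothetical helper illustrating the nCatsForOrdinal rule for a categorical
# variable. Assumption: categories are the distinct non-missing values.
ordinalByCategoryCount <- function(x, nCatsForOrdinal = 2:5) {
  if (is.null(nCatsForOrdinal)) {
    return(NA_character_)  # rule switched off; other criteria would decide
  }
  nCats <- length(unique(x[!is.na(x)]))
  if (nCats %in% nCatsForOrdinal) "ordinal" else "nominal"
}

# Made-up categorical data
dat <- data.frame(
  gender = c(1, 2, 1, 2, 1, 2),
  books  = c(1, 2, 3, 4, 5, 5),
  region = c(1, 2, 3, 4, 5, 6)
)
vapply(dat, ordinalByCategoryCount, character(1))
# gender: "ordinal", books: "ordinal", region: "nominal"
```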
Examples
varInfo <- createInputForDescriptives(eatGADS::pisa, impExpr = "Plausible Value")