
This is the old version of the aggregateData function from the eatPrep package. Unlike aggregateDataOld, aggregateDataOldL accepts and returns data in long format, i.e. one row per combination of person ID and variable. The function is deprecated and is kept in the package only so that older scripts remain executable. It may still be useful when aggregation information from the IQB database is not available.

Usage

aggregateDataOldL(
  datLong,
  idCol,
  varCol,
  valueCol,
  varExclude = NULL,
  itemColName = "item",
  unexpected.pattern.as.na = TRUE,
  printCases = FALSE,
  printPattern = FALSE,
  inputList = NULL
)

Arguments

datLong

A data frame in long format with at least three columns: person identifier, variable identifier and response.

idCol

Name or number of the person identifier column in the long format data

varCol

Name or number of the variable identifier column in the long format data

valueCol

Name or number of the variable response column in the long format data

varExclude

Optional: Numeric vector of variables which are to be excluded from aggregation in any case

itemColName

Name of the item identifier column in the newly created aggregated data.frame. The name must not already occur in the input data.frame.

unexpected.pattern.as.na

Logical: if TRUE, invalid (unexpected) patterns are aggregated to NA.

printCases

Logical: Specifies whether exhaustive aggregation information should be printed to the console.

printPattern

Logical: Print the unexpected patterns to console?

inputList

Optional: Input list to differentiate between variables (sub-items) and items. If NULL, all variables in the data frame whose identifiers are identical except for the last character are considered to belong to the same item.

Value

A data.frame in long format containing the summed and aggregated values. All additional columns from the original data.frame are retained if their values do not vary between variables belonging to the same item. The sum variable is named "valueSum" and the aggregated variable is named "valueAgg". The number of aggregated variables (i.e. the maximum possible sum score per item) is stored in the variable "valueMax".
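As a rough illustration of how the three output columns relate, the following sketch recomputes them in base R for two persons answering the three variables of item "I1". It assumes dichotomous 0/1 responses, no missing values, and the stem rule described under Details. This is a hypothetical re-implementation for illustration only, not eatPrep's code: it ignores unexpected patterns, varExclude, inputList and the carrying over of additional columns.

```r
# Hypothetical sketch of the documented rule; NOT eatPrep's implementation.
# Assumes 0/1 responses, no NAs, and items defined by the variable stem.
datL <- data.frame(
  id       = rep(c("P1", "P2"), each = 3),
  variable = rep(c("I1a", "I1b", "I1c"), times = 2),
  value    = c(1, 1, 1,   1, 0, 1)
)

# Stem = variable name without its last character
datL$item <- sub(".$", "", datL$variable)

# valueSum: number of correct variables per person and item
agg <- aggregate(value ~ id + item, data = datL, FUN = sum)
names(agg)[names(agg) == "value"] <- "valueSum"

# valueMax: number of variables per item (maximum possible sum score)
nVar <- tapply(datL$variable, datL$item, function(v) length(unique(v)))
agg$valueMax <- as.integer(nVar[agg$item])

# valueAgg: item counts as correct only if all its variables are correct
agg$valueAgg <- as.integer(agg$valueSum == agg$valueMax)
print(agg)
```

Person P1 answers all three variables correctly and therefore gets valueAgg = 1, whereas P2 has a sum of 2 out of 3 and gets valueAgg = 0.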

Details

The function uses a rather simple aggregation rule: all variables that share a common “stem” are considered to belong together. The “stem” is everything except the last character. By convention, the last character is a letter in ascending order, for example "I1a", "I1b", "I1c". However, "I12", "I1_" and "I1j" are also possible, although less convenient. In either case, the item "I1" consists of three variables and is considered correct if all of its variables are correct. See the examples for further details. Note: if inputList is specified, aggregation follows the rules given in the ZKD input list.
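The stem rule can be illustrated in base R alone. The sketch below uses a hypothetical helper itemStem() (not part of eatPrep) that drops the last character of each variable name, groups the variable names used in the Examples section by their stem, and counts how many variables end up in items made of more than one variable. Under this assumption the counts agree with the console message shown in the example; the message text is rebuilt with sprintf() purely for comparison.

```r
# Hypothetical helper illustrating the "stem" rule; not part of eatPrep.
itemStem <- function(vars) sub(".$", "", vars)

vars  <- c("I1a", "I1b", "I1c", "I2a", "I3a", "I3b",
           "I4a", "I5a", "I6a", "I6b")
stems <- itemStem(vars)

# Variables sharing a stem belong to the same item
groups <- split(vars, stems)

nVars     <- length(vars)                 # all variables
nItems    <- length(groups)               # distinct stems = items
multi     <- groups[lengths(groups) > 1]  # items built from > 1 variable
nAggVars  <- sum(lengths(multi))          # variables that get aggregated
nAggItems <- length(multi)                # items they are aggregated to

cat(sprintf("Overall: %d variables, %d items. Aggregate %d variables to %d items.\n",
            nVars, nItems, nAggVars, nAggItems))

# Less conventional final characters yield the same stem
print(itemStem(c("I12", "I1_", "I1j")))
```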

Examples

### create artificial data (set.seed() makes the random 0/1 responses reproducible)
set.seed(42)
dat <- data.frame(id = paste0("P", 11:50),
                  matrix(data = sample(x = 0:1, size = 400, replace = TRUE), nrow = 40, ncol = 10))
### aggregate matrix columns 1-3, 5-6 and 9-10 into items I1, I3 and I6:
### variables of one item share a stem and differ only in a sequential final letter
colnames(dat)[-1] <- c("I1a", "I1b", "I1c", "I2a", "I3a", "I3b", "I4a", "I5a", "I6a", "I6b")
### reshape data to the long format
datL <- reshape2::melt(dat, id.vars = "id")
agg <- aggregateDataOldL(datL, idCol = "id", varCol = "variable", valueCol = "value")
#> Overall: 10 variables, 6 items. Aggregate 7 variables to 3 items.