Create New Terms for Data — term

Takes a list specifying base variables and the new terms to create from them, and updates a data frame accordingly. This makes it easy to pre-process multiple data frames in the same way for analyses (e.g., when preparing training and validation sets).
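The underlying pattern can be sketched in base R. Statistics (here the mean and SD of log horsepower) are learned once from one data frame and then reused, unchanged, on every other data frame. This is a minimal sketch under assumptions, not the implementation of term_prep; the helpers prep_log_scale() and apply_log_scale() are hypothetical.

```r
# Hypothetical helper: learn settings (mean and SD of log(x)) from one data set
prep_log_scale <- function(x) {
  lx <- log(x)
  list(mean = mean(lx), sd = sd(lx))
}

# Hypothetical helper: apply previously learned settings to any data set
apply_log_scale <- function(x, settings) {
  (log(x) - settings$mean) / settings$sd
}

x1 <- mtcars[seq(1, 32, 2), ]  # Odd rows (training)
x2 <- mtcars[seq(2, 32, 2), ]  # Even rows (validation)

# Settings are computed from x1 only
s <- prep_log_scale(x1$hp)

# The same settings are applied to both data frames
x1$log_hp <- apply_log_scale(x1$hp, s)
x2$log_hp <- apply_log_scale(x2$hp, s)
```

With term_prep, the two steps correspond to the output = 'settings' call on x1 followed by applying the returned list to each data frame, as in the Examples. Note that x2$log_hp is standardized with x1's mean and SD, so it will not have exactly mean 0 and SD 1.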

term_prep(x, settings, output = "x")

Arguments

x: A data frame.
settings: A named list of the form list(column = list( new1 = list(...), new2 = list(...))), where column is an existing variable in x, and new1, new2, etc. are the new terms to create from column. If new terms are also to be combined (e.g., for interaction effects), instead provide a list with two elements: (1) 'new', holding the terms to create in the form above, and (2) 'combo', holding the terms to build by combining them (see Examples).
output: A character string giving the type of output to return: 'x' (the default) returns the updated data frame, 'settings' returns the updated list of settings, and 'both' returns a list containing both the data frame and the list of settings.
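The two shapes that settings can take are sketched below, with plain lists standing in for the term specifications (in the Examples these are created with term_new() and term_combo()). The term hp_sq and the helper list_new_terms() are hypothetical; the helper only shows how the nesting maps each base column to the names of its new terms.

```r
# Form 1: base column -> named list of new terms
# (plain lists stand in for term_new() objects; hp_sq is hypothetical)
settings_simple <- list(
  hp = list(
    log_hp = list(label = "Log(Horsepower)"),
    hp_sq  = list(label = "Horsepower squared")
  )
)

# Form 2: 'new' holds terms in form 1; 'combo' holds terms built from them
settings_combo <- list(
  new = list(
    vs = list(vs_0v1 = list(label = "Engine: V-shaped vs. straight")),
    am = list(am_0v1 = list(label = "Transmission: Automatic vs. manual"))
  ),
  combo = list(
    vs_x_am = list(combine = c(vs = "vs_0v1", am = "am_0v1"))
  )
)

# Hypothetical helper: map each base column to the names of its new terms
list_new_terms <- function(settings) {
  # Unwrap form 2 so that both forms are handled the same way
  if (all(c("new", "combo") %in% names(settings))) settings <- settings$new
  lapply(settings, names)
}

list_new_terms(settings_simple)
list_new_terms(settings_combo)
```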

Value

Depending on output, either the updated data frame, the updated list of settings, or a list containing both.

Examples

data("mtcars")
# Split into two sets of data
x1 <- mtcars[ seq( 1, 32, 2), ] # Odd rows
x2 <- mtcars[ seq( 2, 32, 2), ] # Even rows

lst <- list(
  new = list(
    mpg = list(
      outcome = term_new(
        label = 'Miles per gallon'
      )
    ),
    hp = list(
      log_hp = term_new(
        label = 'Log(Horsepower)',
        transformation = 'log(x)',
        scale = TRUE,
        order = c( 't', 's' )
      )
    ),
    vs = list(
      vs_0v1 = term_new(
        label = 'Engine: V-shaped vs. straight',
        coding = term_coding_effect( 1, 0 ),
        scale = TRUE,
        order = c( 'c', 's' )
      )
    ),
    am = list(
      am_0v1 = term_new(
        label = 'Transmission: Automatic vs. manual',
        coding = term_coding_effect( 1, 0 ),
        scale = TRUE,
        order = c( 'c', 's' )
      )
    )
  ),
  combo = list(
    vs_x_am = term_combo(
      combine = c( vs = 'vs_0v1', am = 'am_0v1' ),
      transformation = 'vs*am',
      scale = TRUE
    )
  )
)
# Compute mean/SD from 'x1' data and store them in the settings
lst <- x1 |> term_prep( lst, output = 'settings' )
# Apply the same settings to 'x1' and 'x2'
x1 <- x1 |> term_prep( lst )
x2 <- x2 |> term_prep( lst )

# Fit model to 'x1' data
fit <- lm( outcome ~ log_hp + vs_0v1 + am_0v1 + vs_x_am, data = x1)
# Predict outcomes for 'x2' data
predict( fit, newdata = x2 )