term_prep.Rd
Function that takes a list specifying base variables and the new terms to create from them, and updates a data frame accordingly. Allows multiple data frames to be pre-processed in the same manner for analyses (e.g., when preparing training and validation sets).
term_prep(x, settings, output = "x")
x: A data frame.
settings: A named list, in the form list(column = list(new1 = list(...), new2 = list(...))), where column is a pre-existing variable in x, and new1, new2, etc. are new terms to be created using variable column. If new terms are to be combined (e.g., interaction effects), provide a list with (1) the element 'new' for the terms to create and (2) the element 'combo' for the terms to combine (see example).
output: A character string, the type of output to return, where 'x' (the default) returns an updated data frame, 'settings' returns an updated list, and 'both' returns a list with both the data frame and list of settings.
Either a data frame, a list of settings, or a list with both the data frame and list of settings.
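The reason for carrying settings from one data frame to another can be sketched in base R. The sketch below is not the package's implementation: the helpers learn_scaling() and apply_scaling() and the objects train and valid are hypothetical names introduced only for illustration. It assumes that scaling means subtracting a mean and dividing by a standard deviation, and that the transformation is applied before scaling.

```r
# Base R only; a sketch of the idea, NOT the package's implementation.
data("mtcars")
train <- mtcars[ seq( 1, 32, 2 ), ]  # odd rows
valid <- mtcars[ seq( 2, 32, 2 ), ]  # even rows

# Hypothetical helper: learn the scaling constants from one data set
learn_scaling <- function( x ) {
  list( mean = mean( x ), sd = sd( x ) )
}

# Hypothetical helper: apply previously learned constants to any data set
apply_scaling <- function( x, s ) {
  ( x - s$mean ) / s$sd
}

# Transform first, then learn the constants from the training rows only
s <- learn_scaling( log( train$hp ) )

# Both data sets are processed with the same constants
train$log_hp <- apply_scaling( log( train$hp ), s )
valid$log_hp <- apply_scaling( log( valid$hp ), s )

# The training column has mean 0 and SD 1; the validation column
# generally does not, because it reuses the training mean and SD
c( mean = mean( train$log_hp ), sd = sd( train$log_hp ) )
c( mean = mean( valid$log_hp ), sd = sd( valid$log_hp ) )
```

Reusing the training constants keeps a model fitted to one data set on the same scale as the new data it is asked to predict, which is the situation the 'settings' output is meant to support.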
data("mtcars")
# Split into two sets of data
x1 <- mtcars[ seq( 1, 32, 2), ] # Odd
x2 <- mtcars[ seq( 2, 32, 2), ] # Even
lst <- list(
  new = list(
    mpg = list(
      outcome = term_new(
        label = 'Miles per gallon'
      )
    ),
    hp = list(
      log_hp = term_new(
        label = 'Log(Horsepower)',
        transformation = 'log(x)',
        scale = TRUE,
        order = c( 't', 's' )
      )
    ),
    vs = list(
      vs_0v1 = term_new(
        label = 'Engine: V-shaped vs. straight',
        coding = term_coding_effect( 1, 0 ),
        scale = TRUE,
        order = c( 'c', 's' )
      )
    ),
    am = list(
      am_0v1 = term_new(
        label = 'Transmission: Automatic vs. manual',
        coding = term_coding_effect( 1, 0 ),
        scale = TRUE,
        order = c( 'c', 's' )
      )
    )
  ),
  combo = list(
    vs_x_am = term_combo(
      combine = c( vs = 'vs_0v1', am = 'am_0v1' ),
      transformation = 'vs*am',
      scale = TRUE
    )
  )
)
# Add info on mean/SD from 'x1' data
lst <- x1 |> term_prep( lst, output = 'settings' )
# Update 'x1' and 'x2'
x1 <- x1 |> term_prep( lst )
x2 <- x2 |> term_prep( lst )
# Fit 'x1' data
fit <- lm( outcome ~ log_hp + vs_0v1 + am_0v1 + vs_x_am, data = x1)
# Predict 'x2' data
predict( fit, newdata = x2 )
#>          Mazda RX4         Datsun 710  Hornet Sportabout         Duster 360
#>           23.27412           24.12912           16.46534           13.60518
#>           Merc 230          Merc 280C         Merc 450SL Cadillac Fleetwood
#>           21.49094           19.29522           16.22587           15.12037
#>  Chrysler Imperial        Honda Civic      Toyota Corona        AMC Javelin
#>           14.14223           29.07088           21.31384           17.77568
#>   Pontiac Firebird      Porsche 914-2     Ford Pantera L      Maserati Bora
#>           16.46534           24.88598           15.83227           13.80763