Standardizes the columns of a matrix or data frame: each column is mean-centered and divided by its standard deviation, giving it a mean of 0 and a standard deviation of 1.

standardize(
  x,
  y = NULL,
  mean_sd = NULL,
  raw = FALSE,
  as_list = FALSE,
  labels = c("X", "Y")
)

Arguments

x

A data frame or matrix of numeric values.

y

An optional data frame or matrix of numeric values (must have the same column names, in the same order, as x). Its columns are scaled using the means and standard deviations from x.

mean_sd

A list of two numeric vectors, each equal in length to the number of columns, giving the means and standard deviations, respectively, to use for scaling.

raw

Logical; if TRUE, uses the means and standard deviations given in mean_sd to convert the scaled values in x back to the original raw values.

as_list

Logical; if TRUE, returns a named list with the scaled values of x (and y if provided) along with the means and standard deviations used for scaling. Automatically set to TRUE when y is provided.

labels

A character vector giving the labels for the x and y data sets, used when a list is returned.

Value

Either a scaled data frame or matrix, or a list containing the scaled values and the means and standard deviations used for scaling.
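
As a rough illustration of the computation described above, the following base R sketch standardizes the columns of x and then applies the same means and standard deviations to y. It is not the package's implementation: sketch_standardize() and the element names of its return value are hypothetical, and the use of the sample standard deviation (sd(), with an n - 1 denominator) is an assumption, although it is consistent with the values printed in the Examples.

```r
# Hypothetical helper for illustration only; not the package's code.
sketch_standardize <- function(x, y = NULL) {
  x <- as.matrix(x)
  m <- colMeans(x)       # column means of x
  s <- apply(x, 2, sd)   # column sample SDs of x (assumed n - 1 denominator)

  # Center each column, then divide by its standard deviation
  scale_cols <- function(d) sweep(sweep(as.matrix(d), 2, m, "-"), 2, s, "/")

  out <- list(x = scale_cols(x), mean = m, sd = s)
  if (!is.null(y)) out$y <- scale_cols(y)  # y uses the statistics of x
  out
}

# Same values as the matrices printed in the Examples
x_raw <- matrix(c(97, 69, 96, 130, 65, 118, 104, 124, 121), 3, 3,
                dimnames = list(NULL, paste0("X", 1:3)))
y_raw <- matrix(c(52, 67, 63, 39, 34, 69, 56, 40, 65), 3, 3,
                dimnames = list(NULL, paste0("X", 1:3)))

res <- sketch_standardize(x_raw, y_raw)
colMeans(res$x)       # approximately 0 for every column
apply(res$x, 2, sd)   # 1 for every column
```

Because y is scaled with the statistics of x rather than its own, the columns of res$y are generally not centered at 0, which is the point of scaling a second data set onto the first one's scale.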

Examples

# Create matrix
x_raw <- round( matrix( rnorm( 9, 100, 15 ), 3, 3 ) )
colnames(x_raw) <- paste0( 'X', 1:3 )
print(x_raw)
#>      X1  X2  X3
#> [1,] 97 130 104
#> [2,] 69  65 124
#> [3,] 96 118 121

# Standardize columns
x <- standardize( x_raw )
print(x)
#>              X1         X2         X3
#> [1,]  0.6085404  0.7420674 -1.1434795
#> [2,] -1.1541284 -1.1371942  0.7108116
#> [3,]  0.5455880  0.3951268  0.4326679

# Create second matrix with the same
# variables but new values
y_raw <- round( matrix( rnorm( 9, 50, 15 ), 3, 3 ) )
colnames(y_raw) <- paste0( 'X', 1:3 )
print(y_raw)
#>      X1 X2 X3
#> [1,] 52 39 56
#> [2,] 67 34 40
#> [3,] 63 69 65

# Scale columns of y_raw based on means and
# standard deviations from x_raw
lst <- standardize( x_raw, y_raw, labels = c('x', 'y') )
y <- lst$Data$y
print( y )
#>             X1        X2        X3
#> [1,] -2.224320 -1.888899 -5.593778
#> [2,] -1.280033 -2.033457 -7.077211
#> [3,] -1.531843 -1.021547 -4.759347

# Undo scaling
standardize( y, mean_sd = lst$Scaling, raw = TRUE )
#>      X1 X2 X3
#> [1,] 52 39 56
#> [2,] 67 34 40
#> [3,] 63 69 65
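
The undo step amounts to multiplying each column by its standard deviation and adding back its mean. The sketch below shows that round trip in base R and cross-checks the forward step against scale(); it assumes the sample standard deviation is used and does not depend on standardize() itself. The variable names are illustrative only.

```r
# Same values as the matrices printed in the Examples
x_raw <- matrix(c(97, 69, 96, 130, 65, 118, 104, 124, 121), 3, 3,
                dimnames = list(NULL, paste0("X", 1:3)))
y_raw <- matrix(c(52, 67, 63, 39, 34, 69, 56, 40, 65), 3, 3,
                dimnames = list(NULL, paste0("X", 1:3)))

m <- colMeans(x_raw)       # means of the reference data
s <- apply(x_raw, 2, sd)   # sample SDs of the reference data (assumption)

# Forward: scale y_raw with the statistics of x_raw
y_scaled <- sweep(sweep(y_raw, 2, m, "-"), 2, s, "/")

# Inverse: multiply by the SDs, then add the means back
y_back <- sweep(sweep(y_scaled, 2, s, "*"), 2, m, "+")
all.equal(y_back, y_raw)

# Cross-check the forward step against base R's scale()
y_base <- scale(y_raw, center = m, scale = s)
all.equal(y_scaled, y_base, check.attributes = FALSE)
```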