Runs a principal components analysis (PCA) on training data, with optional test data for evaluation. Extends the stats::prcomp function.
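The train/test workflow can be sketched with base R alone. This is a minimal sketch, not this function's implementation. It assumes the test set is standardized with the training means and standard deviations, which is what `predict()` does for a `prcomp` fit made with `center = TRUE, scale. = TRUE`. The simulated `train` and `test` matrices are illustrative only.

```r
set.seed(1)
train <- matrix(rnorm(100 * 4), ncol = 4, dimnames = list(NULL, paste0("V", 1:4)))
test  <- matrix(rnorm(50 * 4),  ncol = 4, dimnames = list(NULL, paste0("V", 1:4)))

# Fit the PCA on standardized training data
fit <- stats::prcomp(train, center = TRUE, scale. = TRUE)

# Component scores for the test set; predict() reuses the training
# means (fit$center) and standard deviations (fit$scale)
test_scores <- predict(fit, newdata = test)

# Same result by hand: standardize with training statistics, then rotate
test_z <- scale(test, center = fit$center, scale = fit$scale)
manual_scores <- test_z %*% fit$rotation
```

Standardizing the test set with training statistics keeps both sets on the same scale, so test scores are comparable to training scores.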

principal_components_analysis(train, test = NULL)

Arguments

train

A data frame or matrix. Every column is included in the PCA.

test

An optional data frame or matrix. Must have the same number of columns and the same column names as train.
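The requirement on test can be checked before fitting. The helper below, `check_test_compatible()`, is hypothetical and not part of this package. It assumes "same column names" also means the same column order, which is a stricter reading than the text above states.

```r
# Hypothetical helper (not part of the package): stop early if `test`
# cannot be projected onto the components estimated from `train`
check_test_compatible <- function(train, test) {
  if (is.null(test)) return(invisible(TRUE))
  if (ncol(test) != ncol(train)) {
    stop("`test` must have the same number of columns as `train`.")
  }
  # identical() on colnames also requires the same column order
  if (!identical(colnames(test), colnames(train))) {
    stop("`test` must have the same column names as `train`.")
  }
  invisible(TRUE)
}

# Usage: the first call passes silently, the second raises an error
train <- matrix(rnorm(40), ncol = 4, dimnames = list(NULL, paste0("C", 1:4)))
check_test_compatible(train, train[1:5, ])
try(check_test_compatible(train, train[, 1:3]))
```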

Value

A list containing the standardized data, the output of the call to stats::prcomp, the rotation matrix (eigenvectors), the eigenvalues, the correlations between raw scores and component scores, and the root-mean-square error (RMSE) for the training and test sets when the data are reconstructed from a smaller number of components.
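Most of these quantities can be derived from a single `prcomp` fit. The sketch below shows one way to compute them in base R; it is not this function's code, and the object names, the simulated data, and the exact RMSE definition (reconstruction error of the standardized data from the first k components) are assumptions.

```r
set.seed(2)
train <- matrix(rnorm(200 * 6), ncol = 6)
test  <- matrix(rnorm(100 * 6), ncol = 6)

fit <- stats::prcomp(train, center = TRUE, scale. = TRUE)

# Standardized training data (same centering and scaling prcomp uses)
z_train <- scale(train)
# Eigenvalues of the correlation matrix are the squared component SDs
eigenvalues <- fit$sdev^2
# Correlations between raw variables and component scores
raw_by_component <- cor(train, fit$x)

# RMSE of reconstructing standardized data from the first k components
reconstruction_rmse <- function(z, rotation, k) {
  V <- rotation[, seq_len(k), drop = FALSE]
  z_hat <- (z %*% V) %*% t(V)
  sqrt(mean((z - z_hat)^2))
}

# Test data are standardized with the training means and SDs
z_test <- scale(test, center = fit$center, scale = fit$scale)
rmse <- sapply(seq_len(ncol(train)), function(k) c(
  train = reconstruction_rmse(z_train, fit$rotation, k),
  test  = reconstruction_rmse(z_test,  fit$rotation, k)
))
```

Each column of `rmse` corresponds to keeping k components. The error shrinks as k grows and reaches zero (up to rounding) when all components are kept; a large gap between the train and test rows suggests the retained components do not generalize well.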

Examples

# Simulate training and test data
train <- MASS::mvrnorm( 100, rep( 0, 8 ), diag(8) )
test <- MASS::mvrnorm( 100, rep( 0, 8 ), diag(8) )

PCA <- principal_components_analysis(
  train = train, test = test
)

# Simulate correlated data from a two-factor model
set.seed( 341 ) # For reproducibility

# Loading matrix
lambda <- cbind(
  c( runif( 4, .3, .9 ), rep( 0, 4 ) ),
  c( rep( 0, 4 ), runif( 4, .3, .9 ) )
)
# Unique (error) variances
D_tau <- diag( runif( 8, .5, 1.5 ) )

cov_mat <- lambda %*% t( lambda ) + D_tau
cor_mat <- cov2cor( cov_mat )

x <- MASS::mvrnorm( n = 200, mu = rep( 0, 8 ), Sigma = cor_mat )
colnames(x) <- paste0( 'C', 1:8 )

PCA <- principal_components_analysis(
  train = x
)