Flexible Formatted Summary Statistics

Creates a formatted character string of summary statistics from a template in which user-supplied identifiers mark where each statistic should appear.

summa(
  x,
  syntax = "[[M]] ([[SD]])",
  categories = NULL,
  digits = NULL,
  na.rm = TRUE,
  pad = FALSE,
  f = NULL,
  ...
)

Arguments

x: A vector of values.
syntax: A character string containing identifiers of the form [[.]], where . stands for one of the letter codes for a summary statistic. The function replaces each identifier with the corresponding computed value (see Details).
categories: An optional vector of elements to match in x when computing frequencies, proportions, or percentages.
digits: Number of decimal places to which summary statistics are rounded.
na.rm: Logical; if TRUE, removes NA values from x.
pad: Logical; if TRUE, pads values with trailing zeros so that they all have the same number of decimal places.
f: An optional user-defined function that takes x as its first argument and returns a vector of values. The i-th returned value is substituted for the identifier [[i]] (see Examples).
...: Additional arguments for the user-defined function f.
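
The effect of digits and pad can be illustrated with base R alone. The snippet below is only a sketch of the idea (rounding, then optionally forcing a fixed number of decimals); it does not claim to reproduce how summa formats values internally.

```r
value <- 1.3

# Rounding alone drops the trailing zero when the number becomes text
unpadded <- as.character(round(value, digits = 2))

# Fixed notation with a set number of decimals keeps the trailing zero
padded <- formatC(round(value, digits = 2), format = "f", digits = 2)

print(unpadded)  # "1.3"
print(padded)    # "1.30"
```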

Value

A character string.

Details

This function lets users write a custom phrase for reporting summary statistics. It searches the syntax string for identifiers, computes the summary statistic that each one names, and substitutes the numeric result in place of the identifier.

For example, a user can provide the phrase:

'Mean = [[M]]',

and the function will then substitute the sample mean of the vector x for the identifier [[M]].
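
The substitution step can be sketched in base R as follows. The helper fill_template is hypothetical and handles only [[M]] and [[SD]]; it is meant to show the general idea of literal identifier replacement, not the function's actual implementation.

```r
# Hypothetical helper (not part of the package): replaces the
# identifiers [[M]] and [[SD]] in a template with rounded statistics
fill_template <- function(x, syntax, digits = 2) {
  stats <- c(
    "[[M]]" = mean(x, na.rm = TRUE),
    "[[SD]]" = sd(x, na.rm = TRUE)
  )
  out <- syntax
  for (id in names(stats)) {
    value <- as.character(round(stats[[id]], digits))
    # fixed = TRUE treats the brackets literally, not as a regex
    out <- gsub(id, value, out, fixed = TRUE)
  }
  out
}

fill_template(c(1, 2, 3, 4), "Mean = [[M]] (SD = [[SD]])")
#> [1] "Mean = 2.5 (SD = 1.29)"
```

Matching with fixed = TRUE avoids having to escape the square brackets, which would otherwise be read as a regular-expression character class.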

Pre-defined identifiers are:

[[N]] = Sample size;
[[M]] = Mean;
[[SD]] = Standard deviation;
[[SE]] = Standard error of the mean;
[[Mn]] = Minimum;
[[Q1]] = 1st quartile;
[[Md]] = Median;
[[Q3]] = 3rd quartile;
[[Mx]] = Maximum;
[[IQR]] = Inter-quartile range;
[[C]] = Counts/frequencies;
[[P]] = Percent;
[[Pr]] = Proportion.
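
For orientation, the identifiers above correspond to the following base R computations. This is a sketch under assumptions: in particular, the quantile method and the handling of missing values inside summa may differ from the defaults used here.

```r
data("iris")
x <- iris$Sepal.Length
n <- sum(!is.na(x))

# Base R counterparts of the continuous-variable identifiers
stats <- c(
  N   = n,
  M   = mean(x),
  SD  = sd(x),
  SE  = sd(x) / sqrt(n),  # standard error of the mean
  Mn  = min(x),
  Q1  = unname(quantile(x, 0.25)),
  Md  = median(x),
  Q3  = unname(quantile(x, 0.75)),
  Mx  = max(x),
  IQR = IQR(x)
)
print(round(stats, 2))

# Counts, proportion, and percent for selected categories
data("mtcars")
g <- mtcars$gear
matched    <- g %in% c(4, 5)
count      <- sum(matched)       # [[C]]
proportion <- mean(matched)      # [[Pr]]
percent    <- 100 * proportion   # [[P]]
print(c(count = count, proportion = proportion, percent = round(percent, 1)))
```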

Users can also pass in a custom function f that takes x as its first argument and returns a vector of values. Element i of the returned vector is then substituted for the identifier [[i]].
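
A minimal sketch of such a function is given below. The name pct_range and its probs argument are hypothetical and chosen for illustration; the calls to summa are guarded so that the snippet also runs in a session where the function is not loaded.

```r
data("iris")
x <- iris$Sepal.Length

# Hypothetical custom function: returns two values, so a template
# can refer to them as [[1]] and [[2]]
pct_range <- function(x, probs = c(0.1, 0.9)) {
  unname(quantile(x, probs = probs, na.rm = TRUE))
}

vals <- pct_range(x)
length(vals)  # 2: vals[1] maps to [[1]], vals[2] maps to [[2]]

# Only runs when summa() is available in the session
if (exists("summa", mode = "function")) {
  print(summa(x, "[[Md]] [[[1]] to [[2]]]", f = pct_range))
  # Arguments supplied through ... are handed on to f
  print(summa(x, "[[Md]] [[[1]] to [[2]]]",
    f = pct_range, probs = c(0.25, 0.75)
  ))
}
```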

Examples

# Example using 'iris' data set
data("iris")
# Continuous variable - sepal length
x <- iris$Sepal.Length

# Mean and standard deviation
summa(x)
#> [1] "5.84 (0.83)"
# Mean and IQR
summa(x, "[[M]] ([[IQR]])")
#> [1] "5.84 (1.3)"
# Pad to 2 decimal places
summa(x, "[[M]] ([[IQR]])", pad = TRUE)
#> [1] "5.84 (1.30)"
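# N; mean (SD); [min, Q1, median, Q3, max]
# (%p% is not a base R operator; it is assumed here to be a
# string-concatenation helper available alongside summa)
summa(x, "[[N]]; [[M]] ([[SD]]); " %p%
  "[[[Mn]], [[Q1]], [[Md]], [[Q3]], [[Mx]]]",
  digits = 1
)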
#> [1] "150; 5.8 (0.8); [4.3, 5.1, 5.8, 6.4, 7.9]"

# Custom measures via user-defined function
# (e.g., bootstrapped confidence interval; bootstrap() is not
# base R and is assumed to be available alongside summa)
fnc <- function(x) {
  btstrp <- bootstrap(
    x,
    summary = function(y) quantile(y, c(.025, .975))
  )
  return(btstrp$summary)
}
summa(x, "[[M]] ([[SE]]) [[[1]] to [[2]]]",
  f = fnc
)
#> [1] "5.84 (0.07) [5.71 to 5.98]"

# Example using 'mtcars' data set
# Categorical variable - number of forward gears
data("mtcars")
x <- mtcars$gear

# Percent and counts for 3 forward gears
summa(x == 3, "[[P]]% ([[C]] out of [[N]])")
#> [1] "46.9% (15 out of 32)"
# Percent and counts for 4 or 5 forward gears
summa(x, "[[P]]% ([[C]] out of [[N]])",
  categories = c(4, 5)
)
#> [1] "53.1% (17 out of 32)"