A function that allows users to create a nicely formatted character string with summary statistics based on user-supplied identifiers via a simple, intuitive syntax.

summa(
  x,
  syntax = "[[M]] ([[SD]])",
  categories = NULL,
  digits = NULL,
  na.rm = TRUE,
  pad = FALSE,
  f = NULL,
  ...
)

Arguments

x

A vector of values.

syntax

A character string containing identifiers of the form [[.]], where . is one of several letter codes for different summary statistics. The function substitutes the corresponding computed value for each identifier (see Details for more information).

categories

An optional vector of elements to match in x when computing frequencies, proportions, or percentages.

digits

Number of decimal places to which summary statistics are rounded.

na.rm

Logical; if TRUE, removes NA values from x before computing summary statistics.

pad

Logical; if TRUE, pads values with trailing zeros so that all values have the same number of decimal places.

f

An optional user-defined function that takes x as its first argument and returns a vector of values. The i-th returned value is then substituted for the corresponding identifier [[i]] (see Examples).

...

Additional arguments for the user-defined function f.
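The difference between rounding (digits) and padding (pad) can be illustrated in base R: round() returns a number, so trailing zeros disappear once it is converted to text, whereas formatC() with format = "f" keeps a fixed number of decimal places. This is only a sketch of the idea; it is an assumption that summa uses comparable logic internally.

```r
# Rounding alone: trailing zeros are dropped when the number becomes text
as.character(round(1.3, 2))             # "1.3"

# Padding: a fixed number of decimal places is kept
formatC(1.3, format = "f", digits = 2)  # "1.30"

# Round, then pad, so that several values share the same number of decimals
vals <- c(5.843333, 1.3, 0.8280661)
padded <- formatC(round(vals, 2), format = "f", digits = 2)
padded                                  # "5.84" "1.30" "0.83"
```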

Value

A character string.

Details

This function provides a simple syntax for writing out a custom phrase that reports summary statistics. The function searches the input for identifiers; for each identifier it finds, it computes the corresponding summary statistic and substitutes the numeric result in place of that identifier.

For example, a user can provide the phrase:

'Mean = [[M]]',

and the function will then substitute the sample mean of the vector x for the identifier [[M]].
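The substitution step can be pictured with the minimal sketch below. The helper fill_identifiers() is hypothetical (it is not part of this package): it takes a phrase and a named vector of already formatted values, and replaces each literal [[id]] with gsub(fixed = TRUE), so that, for example, [[M]] and [[Md]] never match each other.

```r
# Hypothetical helper, for illustration only: replaces each literal [[id]]
# in 'syntax' with the element of the named vector 'values' called id
fill_identifiers <- function(syntax, values) {
  for (id in names(values)) {
    syntax <- gsub(paste0("[[", id, "]]"), values[[id]], syntax, fixed = TRUE)
  }
  syntax
}

x <- iris$Sepal.Length
fill_identifiers("Mean = [[M]]", c(M = format(round(mean(x), 2))))
# "Mean = 5.84"
```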

Pre-defined identifiers are:

  • [[N]] = Sample size;

  • [[M]] = Mean;

  • [[SD]] = Standard deviation;

  • [[SE]] = Standard error of the mean;

  • [[Mn]] = Minimum;

  • [[Q1]] = 1st quartile;

  • [[Md]] = Median;

  • [[Q3]] = 3rd quartile;

  • [[Mx]] = Maximum;

  • [[IQR]] = Interquartile range;

  • [[C]] = Counts/frequencies;

  • [[P]] = Percent;

  • [[Pr]] = Proportion.
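For reference, the identifiers for continuous data correspond roughly to the base R expressions below. This is a sketch under stated assumptions: the quartiles use R's default quantile definition (type = 7) and the standard error is taken as sd(x) / sqrt(n); neither is confirmed as summa's exact internal method.

```r
# Sketch: base R analogues of the pre-defined identifiers (continuous data)
x <- iris$Sepal.Length
x <- x[!is.na(x)]  # as with na.rm = TRUE

n <- length(x)
# Assumption: quartiles follow quantile()'s default type = 7
q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)

stats <- c(
  N   = n,               # [[N]]
  M   = mean(x),         # [[M]]
  SD  = sd(x),           # [[SD]]
  SE  = sd(x) / sqrt(n), # [[SE]] (assumed definition)
  Mn  = min(x),          # [[Mn]]
  Q1  = q[1],            # [[Q1]]
  Md  = q[2],            # [[Md]]
  Q3  = q[3],            # [[Q3]]
  Mx  = max(x),          # [[Mx]]
  IQR = q[3] - q[1]      # [[IQR]]
)
round(stats, 2)
```

These values agree with the Examples output for the same data (for instance, a mean of 5.84, an SD of 0.83 and an IQR of 1.3).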

Users can also pass in a custom function f that takes x as its first argument and returns a vector of values. Element i of the returned vector is then substituted for the identifier [[i]].
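The [[i]] mechanism can be sketched as: call f on x (forwarding any extra arguments), then replace [[1]], [[2]], ... with the rounded elements of the result. The helper fill_custom() and its digits default are hypothetical and only mimic the documented behaviour; here f is base R's quantile(), with probs passed through the ... argument.

```r
# Hypothetical sketch of the [[i]] substitution for a user-supplied f
fill_custom <- function(syntax, x, f, ..., digits = 2) {
  out <- f(x, ...)  # f takes x as its first argument
  for (i in seq_along(out)) {
    # Literal match, so [[1]] cannot accidentally match inside [[10]]
    syntax <- gsub(paste0("[[", i, "]]"),
                   format(round(out[i], digits)),
                   syntax, fixed = TRUE)
  }
  syntax
}

# Example: report the 1st and 3rd quartiles via quantile()
fill_custom("[[1]] to [[2]]", iris$Sepal.Length,
            f = quantile, probs = c(0.25, 0.75))
# "5.1 to 6.4"
```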

Examples

# Example using 'iris' data set
data("iris")
# Continuous variable - sepal length
x <- iris$Sepal.Length

# Mean and standard deviation
summa(x)
#> [1] "5.84 (0.83)"
# Mean and IQR
summa(x, "[[M]] ([[IQR]])")
#> [1] "5.84 (1.3)"
# Pad to 2 decimal places
summa(x, "[[M]] ([[IQR]])", pad = TRUE)
#> [1] "5.84 (1.30)"
# N; mean (SD); [min, Q1, median, Q3, max]
summa(x, "[[N]]; [[M]] ([[SD]]); " %p%
  "[[[Mn]], [[Q1]], [[Md]], [[Q3]], [[Mx]]]",
  digits = 1
)
#> [1] "150; 5.8 (0.8); [4.3, 5.1, 5.8, 6.4, 7.9]"

# Custom measures via user-defined function
# (e.g., bootstrapped confidence interval)
fnc <- function(x) {
  btstrp <- bootstrap(
    x,
    summary = function(y) quantile(y, c(.025, .975))
  )
  return(btstrp$summary)
}
summa(x, "[[M]] ([[SE]]) [[[1]] to [[2]]]",
  f = fnc
)
#> [1] "5.84 (0.07) [5.71 to 5.98]"
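If the package's bootstrap() helper is not at hand, a comparable interval can be sketched in base R. This swaps in a plain nonparametric percentile bootstrap of the mean; the name fnc_base and the default of 1,000 resamples are illustrative choices, and the result may differ somewhat from what bootstrap() computes.

```r
# Base R percentile bootstrap for the mean (sketch; not the package's bootstrap())
fnc_base <- function(x, n_boot = 1000) {
  # Resample x with replacement and record the mean of each resample
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  # 2.5th and 97.5th percentiles of the bootstrap distribution
  quantile(boot_means, c(.025, .975), names = FALSE)
}

set.seed(1)  # resampling is random; fix the seed for reproducibility
ci <- fnc_base(iris$Sepal.Length)
ci
```

Because it takes x as its first argument and returns a vector, fnc_base could be passed as f = fnc_base in the call above; without a fixed seed the interval will vary slightly from run to run.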

# Example using 'mtcars' data set
# Categorical variable: number of forward gears
data("mtcars")
x <- mtcars$gear

# Percent and counts for 3 forward gears
summa(x == 3, "[[P]]% ([[C]] out of [[N]])")
#> [1] "46.9% (15 out of 32)"
# Percent and counts for 4 or 5 forward gears
summa(x, "[[P]]% ([[C]] out of [[N]])",
  categories = c(4, 5)
)
#> [1] "53.1% (17 out of 32)"
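As a cross-check of the last example, the base R sketch below computes the same count, proportion, and percent by hand. It assumes that elements are matched against categories with %in% and that the denominator is the number of non-missing values; summa's exact matching rule is not documented here.

```r
# Sketch: assumes matching via %in% and a denominator of non-missing values
x <- mtcars$gear
categories <- c(4, 5)

x <- x[!is.na(x)]                  # as with na.rm = TRUE
n_total <- length(x)               # analogous to [[N]]
n_match <- sum(x %in% categories)  # analogous to [[C]]
prop    <- n_match / n_total       # analogous to [[Pr]]
pct     <- 100 * prop              # analogous to [[P]]

sprintf("%.1f%% (%d out of %d)", pct, n_match, n_total)
# "53.1% (17 out of 32)"
```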