summa.Rd
A function that creates a formatted character string of summary statistics, based on user-supplied identifiers and a simple, intuitive syntax.
summa(
x,
syntax = "[[M]] ([[SD]])",
categories = NULL,
digits = NULL,
na.rm = TRUE,
pad = FALSE,
f = NULL,
...
)
x: A vector of values.
syntax: A character string with identifiers of the form [[.]] (where . can be one of several letter sets, each denoting a summary statistic). The function substitutes the computed value for each identifier (see details for more information).
categories: An optional vector of elements to match in x when computing frequencies, proportions, or percentages.
digits: Number of digits to which summary statistics are rounded.
na.rm: Logical; if TRUE, removes NA values from x.
pad: Logical; if TRUE, pads values with trailing zeros so that all have the same number of decimal places.
f: An optional user-defined function that takes x as its first argument and returns a vector of values. The i-th returned value is substituted for the corresponding identifier [[i]] (see examples).
...: Additional arguments for the user-defined function f.
A character string.
This function provides a simple syntax for writing a custom phrase that reports summary statistics. It searches the input for identifiers; for each one found, it computes the corresponding summary statistic and substitutes the numeric result in place of the identifier.
For example, a user can provide the phrase:
'Mean = [[M]]',
and the function will then substitute the sample
mean of the vector x for the identifier
[[M]].
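The substitution step can be pictured with base R alone. The following is only an illustrative sketch under the assumption that identifiers are matched as literal strings; it is not the package's implementation, and the helper name fill_mean is hypothetical.

```r
# Hypothetical helper (not part of the package): replace the literal
# identifier [[M]] with the rounded sample mean of x.
fill_mean <- function(x, syntax, digits = 2) {
  m <- round(mean(x, na.rm = TRUE), digits)
  # fixed = TRUE treats "[[M]]" as a literal string rather than a regex
  sub("[[M]]", format(m), syntax, fixed = TRUE)
}

fill_mean(c(1, 2, 3, 6), "Mean = [[M]]")
#> [1] "Mean = 3"
```

Matching with fixed = TRUE avoids having to escape the square brackets, which would otherwise be read as a regular-expression character class.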
Pre-defined identifiers are:
[[N]] = Sample size;
[[M]] = Mean;
[[SD]] = Standard deviation;
[[SE]] = Standard error of the mean;
[[Mn]] = Minimum;
[[Q1]] = 1st quartile;
[[Md]] = Median;
[[Q3]] = 3rd quartile;
[[Mx]] = Maximum;
[[IQR]] = Inter-quartile range;
[[C]] = Counts/frequencies;
[[P]] = Percent;
[[Pr]] = Proportion.
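For orientation, the identifiers above correspond roughly to the following base R computations. This is a sketch, not the package's code: summa may differ in details such as the quartile type or how NA values are handled, and the variable names stats, freq, and in_cat are chosen here purely for illustration.

```r
# Base R counterparts for the pre-defined identifiers (assumed
# definitions; the package may compute some of these differently).
x <- c(2, 4, 4, 5, 7, 9)

stats <- list(
  N   = length(x),
  M   = mean(x),
  SD  = sd(x),
  SE  = sd(x) / sqrt(length(x)),   # standard error of the mean
  Mn  = min(x),
  Q1  = unname(quantile(x, 0.25)),
  Md  = median(x),
  Q3  = unname(quantile(x, 0.75)),
  Mx  = max(x),
  IQR = IQR(x)
)

# Counts, proportions, and percentages for a set of categories
categories <- c(4, 5)
in_cat <- x %in% categories
freq <- list(
  C  = sum(in_cat),        # count of matching elements
  Pr = mean(in_cat),       # proportion of matching elements
  P  = 100 * mean(in_cat)  # percent of matching elements
)
```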
Users can also pass in a custom function f
that takes x as a first argument and
returns a vector of values. Then element i
from the outputted vector is substituted for
the identifier [[i]].
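The numbered identifiers can be pictured the same way. The sketch below assumes that element i of the returned vector replaces the literal string [[i]]; the helper fill_custom is hypothetical and stands in for whatever summa does internally.

```r
# Hypothetical sketch of numbered-identifier substitution; the
# internals of summa() may differ.
fill_custom <- function(x, syntax, f, digits = 2, ...) {
  # Extra arguments in ... are forwarded to the user-defined function
  vals <- round(f(x, ...), digits)
  for (i in seq_along(vals)) {
    # Element i replaces the literal identifier [[i]]
    id <- paste0("[[", i, "]]")
    syntax <- gsub(id, format(vals[i]), syntax, fixed = TRUE)
  }
  syntax
}

# Example custom function: the range (minimum and maximum) of the data
fill_custom(c(3, 1, 4, 1, 5), "[[1]] to [[2]]", f = range)
#> [1] "1 to 5"
```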
# Example using 'iris' data set
data("iris")
# Continuous variable - sepal length
x <- iris$Sepal.Length
# Mean and standard deviation
summa(x)
#> [1] "5.84 (0.83)"
# Mean and IQR
summa(x, "[[M]] ([[IQR]])")
#> [1] "5.84 (1.3)"
# Pad to 2 decimal places
summa(x, "[[M]] ([[IQR]])", pad = TRUE)
#> [1] "5.84 (1.30)"
# N; mean (SD); [min, Q1, median, Q3, max]
summa(x, "[[N]]; [[M]] ([[SD]]); " %p%
"[[[Mn]], [[Q1]], [[Md]], [[Q3]], [[Mx]]]",
digits = 1
)
#> [1] "150; 5.8 (0.8); [4.3, 5.1, 5.8, 6.4, 7.9]"
# Custom measures via user-defined function
# (e.g., bootstrapped confidence interval)
fnc <- function(x) {
btstrp <- bootstrap(
x,
summary = function(y) quantile(y, c(.025, .975))
)
return(btstrp$summary)
}
summa(x, "[[M]] ([[SE]]) [[[1]] to [[2]]]",
f = fnc
)
#> [1] "5.84 (0.07) [5.71 to 5.98]"
# Example using 'mtcars' data set
# Categorical variable - # of forward gears
data("mtcars")
x <- mtcars$gear
# Percent and counts for 3 forward gears
summa(x == 3, "[[P]]% ([[C]] out of [[N]])")
#> [1] "46.9% (15 out of 32)"
# Percent and counts for 4 or 5 forward gears
summa(x, "[[P]]% ([[C]] out of [[N]])",
categories = c(4, 5)
)
#> [1] "53.1% (17 out of 32)"