summa.Rd
A function that allows users to create a nicely formatted character string with summary statistics based on user-supplied identifiers via a simple, intuitive syntax.
summa(
x,
syntax = "[[M]] ([[SD]])",
categories = NULL,
digits = NULL,
na.rm = TRUE,
pad = FALSE,
f = NULL,
...
)
x: A vector of values.
syntax: A character string with identifiers in the form [[.]] (where . can be a variety of letter sets for different summary statistics). The function substitutes the appropriate computed value for the corresponding identifier (see details for more information).
categories: An optional vector of elements to match in x when computing frequencies, proportions, or percentages.
digits: Number of digits to which summary statistics are rounded.
na.rm: Logical; if TRUE, removes NA values from x.
pad: Logical; if TRUE, pads values with 0 so that all have a matching number of decimal places.
f: An optional user-defined function that takes x as a first argument and returns a vector of values. The i-th outputted value will then be substituted for the corresponding identifier [[i]] (see examples).
...: Additional arguments for the user-defined function f.
A character string.
This function provides a simple syntax that lets users write a custom phrase for reporting summary statistics. The function searches the input for identifiers; for each one found, it computes the appropriate summary statistic and substitutes the numeric result in place of the identifier.
For example, a user can provide the phrase 'Mean = [[M]]', and the function will then substitute the sample mean of the vector x for the identifier [[M]].
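The substitution step described above can be sketched in base R. This is a minimal illustration only, not the package's internal code; the sample vector and the choice of two decimal places are assumptions made for the example.

```r
# Illustrative sketch only: not how summa() is necessarily implemented.
x <- c(2, 4, 6, 8)
phrase <- "Mean = [[M]]"

# Compute the statistic and format it to two decimal places
m <- format(round(mean(x), 2), nsmall = 2)

# fixed = TRUE treats "[[M]]" as a literal string rather than a regular
# expression, so the square brackets need no escaping
result <- gsub("[[M]]", m, phrase, fixed = TRUE)
print(result)
```

Matching the identifier as a fixed string avoids having to escape the square brackets, which are special characters in regular expressions.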
Pre-defined identifiers are:
[[N]] = Sample size;
[[M]] = Mean;
[[SD]] = Standard deviation;
[[SE]] = Standard error of the mean;
[[Mn]] = Minimum;
[[Q1]] = 1st quartile;
[[Md]] = Median;
[[Q3]] = 3rd quartile;
[[Mx]] = Maximum;
[[IQR]] = Inter-quartile range;
[[C]] = Counts/frequencies;
[[P]] = Percent;
[[Pr]] = Proportion.
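For reference, the statistics behind these identifiers can be computed with base R roughly as follows. This is a sketch under assumptions: the quartiles use the default method of quantile() and the standard error is taken as SD / sqrt(N); the package may compute either differently. The vectors x and g are made up for the example.

```r
# Hypothetical data for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
x <- x[!is.na(x)]  # mirrors the effect of na.rm = TRUE

stats <- c(
  N   = length(x),
  M   = mean(x),
  SD  = sd(x),
  SE  = sd(x) / sqrt(length(x)),        # assumed definition of the SE
  Mn  = min(x),
  Q1  = unname(quantile(x, 0.25)),      # default quantile type assumed
  Md  = median(x),
  Q3  = unname(quantile(x, 0.75)),
  Mx  = max(x),
  IQR = IQR(x)
)

# Counts, proportions, and percents for a categorical vector
g <- c("a", "b", "a", "c")
cnt  <- sum(g %in% "a")      # [[C]]
prop <- cnt / length(g)      # [[Pr]]
pct  <- 100 * prop           # [[P]]
```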
Users can also pass in a custom function f that takes x as a first argument and returns a vector of values. Then element i from the outputted vector is substituted for the identifier [[i]].
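The mapping from the elements of a custom function's output to the numbered identifiers can be sketched as below. The helper fill_numbered() and the function rng() are hypothetical names introduced for illustration; they are not part of the package and do not reproduce its rounding or padding behaviour.

```r
# Hypothetical helper: substitutes element i of f(x, ...) for "[[i]]"
fill_numbered <- function(x, phrase, f, ...) {
  values <- f(x, ...)
  for (i in seq_along(values)) {
    identifier <- paste0("[[", i, "]]")
    phrase <- gsub(identifier, format(values[[i]]), phrase, fixed = TRUE)
  }
  phrase
}

# Example custom function returning two values: minimum and maximum
rng <- function(x) c(min(x), max(x))

out <- fill_numbered(c(3, 1, 4, 1, 5), "[[1]] to [[2]]", f = rng)
print(out)
```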
# Example using 'iris' data set
data("iris")
# Continuous variable - sepal length
x <- iris$Sepal.Length
# Mean and standard deviation
summa(x)
#> [1] "5.84 (0.83)"
# Mean and IQR
summa(x, "[[M]] ([[IQR]])")
#> [1] "5.84 (1.3)"
# Pad to 2 decimal places
summa(x, "[[M]] ([[IQR]])", pad = TRUE)
#> [1] "5.84 (1.30)"
# N; mean (SD); [min, Q1, median, Q3, max]
summa(x, "[[N]]; [[M]] ([[SD]]); " %p%
  "[[[Mn]], [[Q1]], [[Md]], [[Q3]], [[Mx]]]",
  digits = 1
)
#> [1] "150; 5.8 (0.8); [4.3, 5.1, 5.8, 6.4, 7.9]"
# Custom measures via user-defined function
# (e.g., bootstrapped confidence interval)
fnc <- function(x) {
  btstrp <- bootstrap(
    x,
    summary = function(y) quantile(y, c(.025, .975))
  )
  return(btstrp$summary)
}
summa(x, "[[M]] ([[SE]]) [[[1]] to [[2]]]",
  f = fnc
)
#> [1] "5.84 (0.07) [5.71 to 5.98]"
# Example using 'mtcars' data set
# Categorical variable - # of forward gears
data("mtcars")
x <- mtcars$gear
# Percent and counts for 3 forward gears
summa(x == 3, "[[P]]% ([[C]] out of [[N]])")
#> [1] "46.9% (15 out of 32)"
# Percent and counts for 4 or 5 forward gears
summa(x, "[[P]]% ([[C]] out of [[N]])",
  categories = c(4, 5)
)
#> [1] "53.1% (17 out of 32)"
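To make the role of categories concrete, the last result can be reproduced by hand in base R. This is a sketch that assumes matching works like %in% and that the percent is rounded to one decimal place; it is not taken from the package source.

```r
data("mtcars")
x <- mtcars$gear

# Flag the elements of x that match any of the requested categories
matched <- x %in% c(4, 5)

n   <- length(x)                 # corresponds to [[N]]
cnt <- sum(matched)              # corresponds to [[C]]
pct <- round(100 * cnt / n, 1)   # corresponds to [[P]]

out <- paste0(pct, "% (", cnt, " out of ", n, ")")
print(out)
```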