stats_by_group.Rd
Computes univariate statistics for a specified column of a data frame, separately for each combination of the given grouping factors.
stats_by_group(
dtf,
column,
groupings,
statistics = c("M", "SD"),
method = "Student's T",
categories = 1,
width = 0.95,
na.rm = TRUE
)
dtf: A data frame.
column: A character string, the column in dtf to compute statistics for.
groupings: A character vector, the columns in dtf to use as grouping factors (it is recommended that they all be categorical variables).
statistics: A character vector, the set of different statistics to compute over groups.
method: A character string, the type of method to use when computing uncertainty intervals. Options include: "Student's T" for means or "Beta-binomial" for proportions.
categories: An optional vector of elements to match over when computing frequencies, proportions, or percentages.
width: A numeric value between 0 and 1, the width for uncertainty intervals.
na.rm: A logical value; if TRUE, removes NA values.
A data frame with separate rows for each combination of grouping factors and separate columns for each statistic to compute.
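To make the shape of the returned value concrete, here is a minimal base-R sketch that builds a comparable table for the default statistics ("M" and "SD"). It uses aggregate() rather than stats_by_group() itself, so it illustrates the output layout only and says nothing about how the function is implemented; the object names agg and out are illustrative.

```r
# Illustration only: one row per group, one column per statistic.
data(iris)

# aggregate() returns the two statistics as a matrix column
agg <- aggregate(
  Sepal.Length ~ Species, data = iris,
  FUN = function(x) c(M = mean(x), SD = sd(x))
)

# Flatten the matrix column into separate M and SD columns
out <- data.frame(Species = agg$Species, agg$Sepal.Length)
print(out)
```

The resulting means and standard deviations should agree with the first example in this page.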
Possible univariate statistics that can be computed:
'N' = Sample size;
'M' = Mean;
'Md' = Median;
'SD' = Standard deviation;
'SE' = Standard error of the mean;
'C' = Counts/frequencies;
'Pr' = Proportions;
'P' = Percentages.
Additionally, specifying 'UI' in combination with the argument method will compute the lower and upper limits of the desired uncertainty interval (returned as the columns UI_LB and UI_UB). The width of the interval is controlled by the argument width.
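As a rough illustration of what a "Student's T" interval for a mean involves, the sketch below computes a standard two-sided t interval in base R. The helper ui_student_t is hypothetical and not part of this package; it assumes the usual formula, mean plus or minus a t quantile times the standard error, which may differ in detail from what stats_by_group() does internally.

```r
# Hypothetical helper (not part of the documented function): a standard
# two-sided Student's t interval for a mean.
ui_student_t <- function(x, width = 0.95, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)                        # standard error of the mean
  crit <- qt(1 - (1 - width) / 2, df = n - 1)  # t quantile for the given width
  c(UI_LB = mean(x) - crit * se, UI_UB = mean(x) + crit * se)
}

data(iris)
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
ui <- ui_student_t(setosa)
print(ui)
# Roughly 4.9058 and 5.1062
```

Under these assumptions the limits agree with the setosa row of the second example in this page (UI_LB 4.905824, UI_UB 5.106176).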
# Example data set
data(iris)
dtf <- iris
# Mean/SD for sepal length by species
dtf |> stats_by_group( 'Sepal.Length', 'Species' )
#>      Species     M        SD
#> 1     setosa 5.006 0.3524897
#> 2 versicolor 5.936 0.5161711
#> 3  virginica 6.588 0.6358796
# Create additional categorical variable
dtf$Long_petal <- c( 'No', 'Yes' )[
  ( dtf$Petal.Length > median( dtf$Petal.Length ) ) + 1
]
# Sample size, mean, and confidence intervals using Student's T
# distribution by species and whether petals are long
dtf |> stats_by_group(
  'Sepal.Length', c( 'Species', 'Long_petal' ), c( 'N', 'M', 'UI' )
)
#>      Species Long_petal  N     M    UI_LB    UI_UB
#> 1     setosa         No 50 5.006 4.905824 5.106176
#> 2 versicolor         No 25 5.616 5.462622 5.769378
#> 3 versicolor        Yes 25 6.256 6.074862 6.437138
#> 4  virginica        Yes 50 6.588 6.407285 6.768715
# Create additional categorical variable
dtf$Long_sepal <- c( 'No', 'Yes' )[
  ( dtf$Sepal.Length > median( dtf$Sepal.Length ) ) + 1
]
# Proportion and confidence intervals based on beta-binomial
# distribution for long sepals by long petals
dtf |> stats_by_group(
  'Long_sepal', c( 'Long_petal' ), c( 'N', 'Pr', 'UI' ),
  categories = 'Yes', method = 'Beta-binomial'
)
#>   Long_petal  N         Pr     UI_LB     UI_UB
#> 1         No 75 0.06666667 0.0258918 0.1399343
#> 2        Yes 75 0.86666667 0.7763800 0.9293352
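For readers curious how an interval for a proportion can be derived from a Beta distribution, the sketch below takes equal-tailed quantiles of a Beta posterior in base R. The helper ui_beta is hypothetical and not part of this package, and the prior shape parameters (0.5 and 0.5, the Jeffreys prior) are an assumption: this page does not state which prior, if any, the "Beta-binomial" method uses, so the limits need not match the output above exactly.

```r
# Hypothetical helper (not part of the documented function): an equal-tailed
# interval from a Beta posterior for a proportion. The prior shape parameters
# are an assumption; 0.5 and 0.5 give the Jeffreys prior.
ui_beta <- function(successes, n, width = 0.95,
                    prior_a = 0.5, prior_b = 0.5) {
  tail <- (1 - width) / 2
  bounds <- qbeta(
    c(tail, 1 - tail),
    prior_a + successes,
    prior_b + n - successes
  )
  c(UI_LB = bounds[1], UI_UB = bounds[2])
}

data(iris)
long_petal <- iris$Petal.Length > median(iris$Petal.Length)
long_sepal <- iris$Sepal.Length > median(iris$Sepal.Length)

x <- sum(long_sepal[!long_petal])  # long sepals among short-petal flowers
n <- sum(!long_petal)              # number of short-petal flowers
ui <- ui_beta(x, n)
print(c(Pr = x / n, ui))
```

Passing different values for prior_a and prior_b (for example 1 and 1 for a uniform prior) shows how sensitive the limits are to that choice.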