A function to compute assorted univariate statistics for a specified variable in a data frame over desired grouping factors.
stats_by_group(
  dtf,
  column,
  groupings,
  statistics = c("M", "SD"),
  method = "Student's T",
  categories = 1,
  width = 0.95,
  na.rm = TRUE
)

dtf: A data frame.
column: A character string, the column in dtf to compute statistics for.
groupings: A character vector, the columns in dtf to use as grouping factors (ideally, all of them should be categorical variables).
statistics: A character vector, the set of statistics to compute over groups.
method: A character string, the method to use when computing uncertainty intervals. Options include "Student's T" for means or "Beta-binomial" for proportions.
categories: An optional vector of elements to match over when computing frequencies, proportions, or percentages.
width: A numeric value between 0 and 1, the width for uncertainty intervals.
na.rm: A logical value; if TRUE, NA values are removed.
Returns a data frame with one row for each combination of grouping factors and one column for each requested statistic.
Univariate statistics that can be computed:
'N' = Sample size;
'M' = Mean;
'Md' = Median;
'SD' = Standard deviation;
'SE' = Standard error of the mean;
'C' = Counts/frequencies;
'Pr' = Proportions;
'P' = Percentages.
Additionally, specifying 'UI' together with the argument method computes the lower and upper limits of an uncertainty interval, returned as the columns UI_LB and UI_UB. The argument width controls the width of the interval.
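As a rough cross-check of what 'UI' reports with method = "Student's T", the base R sketch below computes a mean and a t-based interval by hand. It assumes the interval is the mean plus or minus a t critical value times the standard error, with N - 1 degrees of freedom. That assumption reproduces the setosa row in the examples on this page, but it is not taken from the package source. The helper name t_interval is hypothetical.

```r
# Hypothetical helper (not part of the package): mean with a Student's T
# uncertainty interval, assuming M +/- t * SE with N - 1 degrees of freedom
t_interval <- function( x, width = 0.95, na.rm = TRUE ) {
  if ( na.rm ) x <- x[ !is.na( x ) ]
  n <- length( x )
  m <- mean( x )
  # Standard error of the mean
  se <- sd( x ) / sqrt( n )
  # Two-sided critical value for the requested interval width
  crit <- qt( 1 - ( 1 - width ) / 2, df = n - 1 )
  c( N = n, M = m, SE = se, UI_LB = m - crit * se, UI_UB = m + crit * se )
}

# Sepal length for the setosa species only
setosa <- iris$Sepal.Length[ iris$Species == 'setosa' ]
res <- t_interval( setosa )
print( round( res, 6 ) )
```

Changing width (for example, to 0.9) narrows the interval in this sketch, and presumably does the same for the width argument of stats_by_group.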
# Example data set
data(iris)
dtf <- iris
# Mean/SD for sepal length by species
dtf |> stats_by_group( 'Sepal.Length', 'Species' )
#>      Species     M        SD
#> 1     setosa 5.006 0.3524897
#> 2 versicolor 5.936 0.5161711
#> 3  virginica 6.588 0.6358796
# Create additional categorical variable
dtf$Long_petal <- c( 'No', 'Yes' )[
  ( dtf$Petal.Length > median( dtf$Petal.Length ) ) + 1
]
# Sample size, mean, and confidence intervals using Student's T
# distribution by species and whether petals are long
dtf |> stats_by_group(
  'Sepal.Length', c( 'Species', 'Long_petal' ), c( 'N', 'M', 'UI' )
)
#>      Species Long_petal  N     M    UI_LB    UI_UB
#> 1     setosa         No 50 5.006 4.905824 5.106176
#> 2 versicolor         No 25 5.616 5.462622 5.769378
#> 3 versicolor        Yes 25 6.256 6.074862 6.437138
#> 4  virginica        Yes 50 6.588 6.407285 6.768715
# Create additional categorical variable
dtf$Long_sepal <- c( 'No', 'Yes' )[
  ( dtf$Sepal.Length > median( dtf$Sepal.Length ) ) + 1
]
# Proportion and confidence intervals based on beta-binomial
# distribution for long sepals by long petals
dtf |> stats_by_group(
  'Long_sepal', c( 'Long_petal' ), c( 'N', 'Pr', 'UI' ),
  categories = 'Yes', method = 'Beta-binomial'
)
#>   Long_petal  N         Pr     UI_LB     UI_UB
#> 1         No 75 0.06666667 0.0258918 0.1399343
#> 2        Yes 75 0.86666667 0.7763800 0.9293352
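The N and Pr columns of the last example can be cross-checked in base R without the package. The sketch below assumes that a value counts toward 'C', 'Pr', and 'P' when it matches one of the elements in categories. This is inferred from the argument description rather than from the package source. The object names is_match, counts, props, and pcts are illustrative only.

```r
# Rebuild the two indicator variables used in the examples
dtf <- iris
dtf$Long_petal <- c( 'No', 'Yes' )[
  ( dtf$Petal.Length > median( dtf$Petal.Length ) ) + 1
]
dtf$Long_sepal <- c( 'No', 'Yes' )[
  ( dtf$Sepal.Length > median( dtf$Sepal.Length ) ) + 1
]

# Assumption: an observation is counted when it matches 'categories'
is_match <- dtf$Long_sepal %in% 'Yes'

# Sample sizes, counts, proportions, and percentages by group
n_obs  <- tapply( is_match, dtf$Long_petal, length )
counts <- tapply( is_match, dtf$Long_petal, sum )
props  <- tapply( is_match, dtf$Long_petal, mean )
pcts   <- 100 * props

print( data.frame( N = n_obs, C = counts, Pr = props, P = pcts ) )
```

The interval limits are left out here because this page does not spell out the exact form of the beta-binomial interval.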