Compute Statistics by Group — stats_by_group

Computes assorted univariate statistics for a specified variable in a data frame, separately for each combination of levels of the grouping factors.

stats_by_group(
  dtf,
  column,
  groupings,
  statistics = c("M", "SD"),
  method = "Student's T",
  categories = 1,
  width = 0.95,
  na.rm = TRUE
)

Arguments

dtf: A data frame.
column: A character string, the column in dtf to compute statistics for.
groupings: A character vector, the columns in dtf to use as grouping factors (it is recommended that they all be categorical variables).
statistics: A character vector, the codes for the statistics to compute within each group (see Details for the available codes).
method: A character string, the method used to compute uncertainty intervals. Options include: "Student's T" for means or "Beta-binomial" for proportions.
categories: An optional vector of values in column to count as matches when computing frequencies, proportions, or percentages.
width: A numeric value between 0 and 1, the width for uncertainty intervals.
na.rm: A logical value; if TRUE, removes NA values.

Value

A data frame with one row for each observed combination of grouping factor levels and one column for each requested statistic. Requesting 'UI' adds two columns, UI_LB and UI_UB, for the lower and upper interval limits.
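
The shape of the returned data frame can be cross-checked with base R. The following is a minimal sketch that does not call stats_by_group; it rebuilds the default output (M and SD by group) for the iris data with aggregate(), and the object names m, s, and out are illustrative only.

```r
# Base-R cross-check of the default statistics (mean and SD) by group.
# This does not use stats_by_group; it only mirrors the output layout.
data(iris)

# One row per level of the grouping factor
m <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
s <- aggregate(Sepal.Length ~ Species, data = iris, FUN = sd)

# One column per grouping factor, then one column per statistic
out <- data.frame(
  Species = m$Species,
  M = m$Sepal.Length,
  SD = s$Sepal.Length
)
print(out)
```

The values should agree with the first example in the Examples section, which makes this a convenient sanity check when adapting the function to new data.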

Details

Univariate statistics that can be computed:

'N' = Sample size;
'M' = Mean;
'Md' = Median;
'SD' = Standard deviation;
'SE' = Standard error of the mean;
'C' = Counts/frequencies;
'Pr' = Proportions;
'P' = Percentages.
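
As a rough guide to what each code stands for, the statistics listed above can be sketched for a single group in base R. The helpers describe_numeric and describe_category are hypothetical and not part of the package; in particular, treating N as the count of non-missing values and P as 100 times Pr are assumptions, and the package may handle edge cases differently.

```r
# Hypothetical helpers (not part of the package) illustrating the codes.

# Statistics for a numeric vector: N, M, Md, SD, SE
describe_numeric <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)  # assumption: N counts non-missing values
  c(N = n, M = mean(x), Md = median(x), SD = sd(x), SE = sd(x) / sqrt(n))
}

# Statistics based on matching values against 'categories': C, Pr, P
describe_category <- function(v, categories = 1, na.rm = TRUE) {
  if (na.rm) v <- v[!is.na(v)]
  n <- length(v)
  cnt <- sum(v %in% categories)  # number of elements matching a category
  c(N = n, C = cnt, Pr = cnt / n, P = 100 * cnt / n)
}

x <- iris$Sepal.Length[iris$Species == "setosa"]
num_stats <- describe_numeric(x)
cat_stats <- describe_category(iris$Species, categories = "setosa")
print(num_stats)
print(cat_stats)
```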

Additionally, including 'UI' in statistics computes the lower and upper limits of an uncertainty interval, using the approach selected by the argument method. The argument width sets the width of the interval (for example, 0.95 for a 95% interval).
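
The two interval types can be sketched in base R as follows. Both helpers, ui_student_t and ui_beta, are hypothetical and not part of the package. The Student's T version uses the usual mean plus or minus a t quantile times the standard error, which appears to agree with the setosa row shown in the Examples. The beta version assumes a Beta(0.5, 0.5) prior on the proportion; the prior actually used by the package is not stated here, so its limits may differ.

```r
# Hypothetical helpers; the package's internal computation may differ.

# Interval for a mean based on Student's t distribution
ui_student_t <- function(x, width = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)
  crit <- qt(1 - (1 - width) / 2, df = n - 1)  # two-sided critical value
  c(UI_LB = mean(x) - crit * se, UI_UB = mean(x) + crit * se)
}

# Interval for a proportion from a beta distribution.
# Assumption: Beta(0.5, 0.5) prior; not confirmed by the documentation.
ui_beta <- function(count, n, width = 0.95, prior = c(0.5, 0.5)) {
  p <- c((1 - width) / 2, 1 - (1 - width) / 2)
  b <- qbeta(p, prior[1] + count, prior[2] + n - count)
  c(UI_LB = b[1], UI_UB = b[2])
}

t_ui <- ui_student_t(iris$Sepal.Length[iris$Species == "setosa"])
b_ui <- ui_beta(count = 5, n = 75)
print(t_ui)
print(b_ui)
```

Narrowing width (for example, width = 0.5) shrinks both intervals toward the point estimate, which is a quick way to confirm the argument behaves as expected.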

Examples

# Example data set
data(iris)
dtf <- iris

# Mean/SD for sepal length by species
dtf |> stats_by_group( 'Sepal.Length', 'Species' )
#>      Species     M        SD
#> 1     setosa 5.006 0.3524897
#> 2 versicolor 5.936 0.5161711
#> 3  virginica 6.588 0.6358796

# Create additional categorical variable
dtf$Long_petal <- c( 'No', 'Yes' )[
  ( dtf$Petal.Length > median( dtf$Petal.Length ) ) + 1
]
# Sample size, mean, and confidence intervals using Student's T
# distribution by species and whether petals are long
dtf |> stats_by_group(
  'Sepal.Length', c( 'Species', 'Long_petal' ), c( 'N', 'M', 'UI' )
)
#>      Species Long_petal  N     M    UI_LB    UI_UB
#> 1     setosa         No 50 5.006 4.905824 5.106176
#> 2 versicolor         No 25 5.616 5.462622 5.769378
#> 3 versicolor        Yes 25 6.256 6.074862 6.437138
#> 4  virginica        Yes 50 6.588 6.407285 6.768715

# Create additional categorical variable
dtf$Long_sepal <- c( 'No', 'Yes' )[
  ( dtf$Sepal.Length > median( dtf$Sepal.Length ) ) + 1
]
# Proportion and confidence intervals based on beta-binomial
# distribution for long sepals by long petals
dtf |> stats_by_group(
  'Long_sepal', c( 'Long_petal' ), c( 'N', 'Pr', 'UI' ),
  categories = 'Yes', method = 'Beta-binomial'
)
#>   Long_petal  N         Pr     UI_LB     UI_UB
#> 1         No 75 0.06666667 0.0258918 0.1399343
#> 2        Yes 75 0.86666667 0.7763800 0.9293352