A Primer on Applied Statistics

Simple linear regression

1. Formulation

Given a predictor (or independent variable) x = \(\{x_1, x_2, …, x_n\}\) and an outcome (or dependent variable) y = \(\{y_1, y_2, …, y_n\}\), the simple linear regression model is

\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\]

where \(\epsilon_i \sim \text{N}( 0, \sigma_{\epsilon}^2 )\).

In other words, the errors \(\epsilon_i\) follow a normal distribution with a mean of 0 and a variance of \(\sigma_{\epsilon}^2\). An equivalent way of writing this is

\[y_i \sim \text{N}( \mu_i, \sigma_{\epsilon}^2 ) \text{ with } \mu_i = \beta_0 + \beta_1 x_i.\]

This formulation highlights the key assumptions of the simple linear regression model:

- Linearity - the expected value of \(y_i\) is a linear function of \(x_i\), \(\mu_i = \beta_0 + \beta_1 x_i\), with intercept \(\beta_0\) and slope \(\beta_1\);
- Homoscedasticity - as indicated by the absence of a subscript i on \(\sigma_{\epsilon}^2\), the variance of \(y_i\) around \(\mu_i\) is the same for every observation, irrespective of the value of x;
- Normality - for a given \(x_i\), \(y_i\) follows a normal distribution (equivalently, the errors \(\epsilon_i\) are normal); the marginal distribution of y need not be normal;
- Independence - the observations are independent of one another, so that the errors \(\epsilon_i\) are independent and identically distributed (i.i.d.).
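To make the formulation concrete, the sketch below simulates data from the model using only the Python standard library. The helper names `simulate_slr` and `spread_ratio`, the parameter values, and the error-SD functions are hypothetical choices for illustration. It also contrasts the homoscedastic case (constant error SD) with a deliberately heteroscedastic one in which the error SD grows with x; because the data are simulated, it compares the true errors directly rather than fitted residuals.

```python
import random
import statistics


def simulate_slr(x, beta0, beta1, error_sd, seed=None):
    """Draw y_i ~ N(mu_i, error_sd(x_i)^2) with mu_i = beta0 + beta1 * x_i.

    error_sd is a (hypothetical) function of x: a constant function gives the
    homoscedastic model; anything else deliberately violates that assumption.
    """
    rng = random.Random(seed)
    mu = [beta0 + beta1 * xi for xi in x]             # linearity
    eps = [rng.gauss(0.0, error_sd(xi)) for xi in x]  # independent normal errors
    y = [mi + ei for mi, ei in zip(mu, eps)]
    return mu, eps, y


def spread_ratio(x, eps, cut):
    """Variance of the errors for x above `cut` divided by their variance at or below it."""
    low = [e for xi, e in zip(x, eps) if xi <= cut]
    high = [e for xi, e in zip(x, eps) if xi > cut]
    return statistics.variance(high) / statistics.variance(low)


x = [i / 1000 for i in range(1, 20001)]  # 20,000 evenly spaced values in (0, 20]
_, eps_const, _ = simulate_slr(x, 2.0, 0.5, lambda xi: 1.0, seed=1)
_, eps_grow, _ = simulate_slr(x, 2.0, 0.5, lambda xi: 0.1 * xi, seed=1)

print(round(spread_ratio(x, eps_const, cut=10.0), 2))  # near 1: homoscedastic
print(round(spread_ratio(x, eps_grow, cut=10.0), 2))   # far above 1: heteroscedastic
```

With a constant error SD the two halves of the x range have similar error variances; when the SD grows with x, the upper half has a much larger error variance, which is exactly what the homoscedasticity assumption rules out.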

3. Decomposition of the variance for y

Because the errors are independent of the predictor, the variance of the outcome y in the simple linear model can be decomposed into

\[\sigma_{y}^2 = \sigma_{\text{R}}^2 + \sigma_{\epsilon}^2.\]

The first term on the right-hand side of the equation is the variance accounted for by the predictor x, which is

\[\sigma_{R}^2 = \beta_1^2 \sigma_{x}^2,\]

where \(\sigma_{x}^2\) is the variance for the predictor x.
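A quick Monte Carlo check of this decomposition, as a sketch: it assumes, for illustration only, that x is itself random and normally distributed, independent of the errors; the parameter values are arbitrary. The simulated variance of y should be close to \(\beta_1^2 \sigma_{x}^2 + \sigma_{\epsilon}^2\).

```python
import random
import statistics

rng = random.Random(7)
beta0, beta1 = 1.0, 0.8
sd_x, sd_eps = 2.0, 1.5  # so sigma_x^2 = 4 and sigma_eps^2 = 2.25
n = 100_000

# x is drawn independently of the errors, as the decomposition requires
x = [rng.gauss(0.0, sd_x) for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0.0, sd_eps) for xi in x]

var_r = beta1**2 * sd_x**2            # sigma_R^2 = beta1^2 * sigma_x^2 (about 2.56)
var_y_theory = var_r + sd_eps**2      # sigma_R^2 + sigma_eps^2 (about 4.81)
var_y_sim = statistics.pvariance(y)   # variance of the simulated outcomes

print(round(var_y_theory, 2), round(var_y_sim, 2))
```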

Therefore, the proportion of variance of the outcome y accounted for by its linear relation with the predictor x, a metric known as \(R^2\), is

\[R^2 = \frac{\sigma_{R}^2}{\sigma_{y}^2}\]

which is equivalent to

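\[R^2 = 1 - \frac{\sigma_{\epsilon}^2}{\sigma_{y}^2}.\]

Since \(\sigma_{y}^2 = \sigma_{R}^2 / R^2 = \beta_1^2 \sigma_{x}^2 / R^2\), rearranging gives the error variance in terms of \(\beta_1\), \(\sigma_{x}^2\) and \(R^2\):

\[\sigma_{\epsilon}^2 = \frac{\beta_1^2 \sigma_{x}^2 (1 - R^2)}{R^2}.\]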
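A worked numeric example of these relations (values chosen arbitrarily): with \(\beta_1 = 0.5\), \(\sigma_{x}^2 = 4\) and \(\sigma_{\epsilon}^2 = 3\), we get \(\sigma_{R}^2 = 1\), \(\sigma_{y}^2 = 4\) and \(R^2 = 0.25\); plugging \(R^2\) back in recovers \(\sigma_{\epsilon}^2 = 3\). The hypothetical helpers below simply code up the formulas.

```python
def r_squared(beta1, var_x, var_eps):
    """R^2 = sigma_R^2 / sigma_y^2, with sigma_R^2 = beta1^2 * var_x."""
    var_r = beta1**2 * var_x
    var_y = var_r + var_eps  # variance decomposition
    return var_r / var_y


def error_variance(beta1, var_x, r2):
    """Recover sigma_eps^2 = beta1^2 * var_x * (1 - R^2) / R^2 (needs 0 < R^2 <= 1)."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("R^2 must be in (0, 1]")
    return beta1**2 * var_x * (1.0 - r2) / r2


r2 = r_squared(beta1=0.5, var_x=4.0, var_eps=3.0)
print(r2)                                           # 0.25
print(1.0 - 3.0 / 4.0)                              # 0.25 (the equivalent form)
print(error_variance(beta1=0.5, var_x=4.0, r2=r2))  # 3.0 (round trip)
```

Note that the inverse formula is undefined when \(R^2 = 0\) (that is, when \(\beta_1 = 0\)), which is why the helper rejects it.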

?. Estimation

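The parameters are usually estimated by least squares, that is, by choosing \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize the sum of squared residuals \(\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\). The solution is

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.\]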
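A minimal sketch of the closed-form least-squares estimates: the slope is the sum of cross-products of the centered x and y divided by the sum of squares of the centered x, and the intercept is chosen so that the fitted line passes through the two means. The function name `least_squares` and the toy data are hypothetical.

```python
import statistics


def least_squares(x, y):
    """Closed-form least-squares estimates (b0, b1) for simple linear regression."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    # slope: sum of cross-products divided by sum of squares of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # intercept: the fitted line passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))
```

For the same data, `coef(lm(y ~ x))` in R should return the same two estimates, as should `statistics.linear_regression(x, y)` in Python 3.10 or later (which returns the slope and intercept).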

The sampling distribution of the estimator \(\hat{\beta}_1\) is \(\text{N}(\beta_1, \sigma_{\beta_1}^2)\), where

\[\sigma_{\beta_1}^2 = \frac{1}{n - 2} \lgroup \frac{\sigma_{y}^2}{\sigma_{x}^2} \rgroup (1 - R^2).\]

Here, \(\sigma_{y}^2\) and \(\sigma_{x}^2\) are the variances of y and x (in the population), and n is the sample size.
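The formula above is stated with population quantities. As a sketch, assuming we replace \(\sigma_{y}^2\), \(\sigma_{x}^2\) and \(R^2\) with their sample counterparts (sample variances with the same divisor, and the sample \(R^2\)), it reproduces the usual estimated standard error of \(\hat{\beta}_1\), \(\sqrt{\text{SSE} / [(n-2)\sum_i (x_i - \bar{x})^2]}\). The helper names are hypothetical; the code computes both forms so they can be compared.

```python
import math


def centered_sums(x, y):
    """Return n, the means, and the centered sums Sxx, Syy, Sxy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return n, x_bar, y_bar, sxx, syy, sxy


def se_slope_via_r2(x, y):
    """sqrt[(1 / (n - 2)) * (s_y^2 / s_x^2) * (1 - R^2)] from sample quantities."""
    n, _, _, sxx, syy, sxy = centered_sums(x, y)
    r2 = sxy**2 / (sxx * syy)  # sample R^2 (squared correlation)
    # s_y^2 / s_x^2 = Syy / Sxx because the (n - 1) divisors cancel
    return math.sqrt((syy / sxx) * (1.0 - r2) / (n - 2))


def se_slope_via_residuals(x, y):
    """Textbook form sqrt[SSE / ((n - 2) * Sxx)], SSE = sum of squared residuals."""
    n, x_bar, y_bar, sxx, _, sxy = centered_sums(x, y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / ((n - 2) * sxx))


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(se_slope_via_r2(x, y), se_slope_via_residuals(x, y))  # agree up to rounding
```

The two agree because the residual sum of squares equals \((1 - R^2)\sum_i (y_i - \bar{y})^2\) in simple linear regression.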

A user-friendly set of tutorials/FAQs for applied statistics, with a focus on psychology research.
