
Probability axioms

Paraphrasing from an excellent blog post by John Mount titled Kolmogorov’s Axioms of Probability: Even Smarter Than You Have Been Told, we have two major questions regarding probabilities:

  1. What do probabilities mean?
  2. What kind of calculations (e.g., addition, multiplication) can we perform on/with probabilities?

An axiomatic definition of probability (axioms are propositions to be taken as true) allows us to answer question 2. The most well-known axiomatic approach is based on Kolmogorov (1933), who laid out 6 axioms based on set theory. The final 3 axioms, with some reformulation, provide the core definition of probability reported in modern textbooks today.

Note: Others have laid out their own formulations to address question 2, but Kolmogorov’s axioms are the most well-known and successful.

Table of contents

  1. The Kolmogorov axioms
  2. Current axiomatic foundations
  3. Calculus of probabilities
  4. Interpreting probabilities


1. The Kolmogorov axioms

Kolmogorov (1933) set forth 6 axioms to define probabilities. Based on a 2018 translation of his work, with slight notational changes:

Let \( \Omega \) (the sample space) be a collection of elements \( e_{1}, e_{2}, e_{3}, \dots \) dubbed elementary events, and let \( \mathcal{F} \) be a set of subsets of \( \Omega \), where the elements of \( \mathcal{F} \) are called random events. Then:

  1. \( \mathcal{F} \) is a field of sets (now known as a set algebra).
  2. \( \mathcal{F} \) contains \( \Omega \).
  3. To each set \( E_{i} \) in \( \mathcal{F} \), a non-negative real number \( P(E_{i}) \) is assigned. The number \( P(E_{i}) \) is called the probability of event \( E_{i} \).
  4. \( P(\Omega) \) equals 1.
  5. If sets \( E_{i} \) and \( E_{j} \) have no element in common \( (E_{i} \cap E_{j} = \emptyset) \), then \( P(E_{i} \cup E_{j}) = P(E_{i}) + P(E_{j}) \).
  6. If \( E_{1} \supseteq E_{2} \supseteq \dots \) is a countably infinite decreasing sequence of events from \( \mathcal{F} \) with \( \bigcap_{i=1}^{\infty} E_{i} = \emptyset \), then \( \lim_{i \to \infty} P(E_{i}) = 0 \).
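The six axioms can be checked mechanically on a small finite example. The following Python sketch is illustrative only (the fair die, the power-set event algebra, and the counting measure are choices made here, not part of Kolmogorov's text): it models a fair six-sided die and verifies axioms 2 through 5. Axioms 1 and 6 hold automatically for the power set of a finite sample space.

```python
from itertools import chain, combinations
from fractions import Fraction

# Sample space for a fair six-sided die; each face is an elementary event.
omega = frozenset(range(1, 7))

def powerset(s):
    """All subsets of s -- here serving as the set algebra F of random events."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

F = powerset(omega)

def P(event):
    """Classical probability: proportion of elementary events in the event."""
    return Fraction(len(event), len(omega))

# Axiom 2: F contains the sample space.
assert omega in F
# Axiom 3: each event is assigned a non-negative real number.
assert all(P(E) >= 0 for E in F)
# Axiom 4: the sample space has probability 1.
assert P(omega) == 1
# Axiom 5: additivity for disjoint events, e.g. {1, 2} and {5, 6}.
A, B = frozenset({1, 2}), frozenset({5, 6})
assert A & B == frozenset() and P(A | B) == P(A) + P(B)
```

Using `Fraction` keeps the arithmetic exact, so the additivity check is a true equality rather than a floating-point approximation.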

Axioms 1 - 2: These axioms provide some useful groundwork in preparation for defining probability. Important implications are that \( \mathcal{F} \) contains the empty set \( \emptyset \) and is closed under finite unions, intersections, and complementation.

Axioms 3 - 5: These are the key axioms that define probability. These axioms (or variants thereof) are what you will typically see reported in textbooks and introductory statistics courses. Several useful corollaries can be derived from them, such as \( P(\emptyset) = 0 \), \( P(E^{c}) = 1 - P(E) \), and \( 0 \leq P(E) \leq 1 \) for every event \( E \).

Axiom 6: This continuity axiom ensures the definition of probability works not only for finite sample spaces but also for infinite ones (e.g., continuous variables).



2. Current axiomatic foundations

Standard textbooks on probability reformulate the axioms proposed by Kolmogorov, providing some groundwork from set theory and then presenting three axioms. First, we define a \( \sigma\text{-algebra} \) or Borel field \( (\mathcal{B}) \) as a collection of subsets of a sample space \( \Omega \) with the following 3 properties:

  1. \( \emptyset \in \mathcal{B} \); The empty set is an element of \( \mathcal{B} \).
  2. If \( E \in \mathcal{B} \) then \( E^{c} \in \mathcal{B} \); \( \mathcal{B} \) is closed under complementation.
  3. If \( E_{1}, E_{2}, \dots \in \mathcal{B} \) then \( \bigcup_{i=1}^{\infty} E_i \in \mathcal{B} \); \( \mathcal{B} \) is closed under countable unions.
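For a finite sample space these three closure properties can be verified directly. The following Python sketch (the sample space, the power-set candidate, and the helper name `is_sigma_algebra` are all illustrative assumptions) checks them; for a finite collection, closure under pairwise unions stands in for closure under countable unions.

```python
from itertools import chain, combinations

omega = frozenset({'a', 'b', 'c'})

# Candidate collection B: the power set of omega (always a sigma-algebra).
B = {frozenset(c) for c in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))}

def is_sigma_algebra(collection, omega):
    # Property 1: contains the empty set.
    if frozenset() not in collection:
        return False
    # Property 2: closed under complementation.
    if any(omega - E not in collection for E in collection):
        return False
    # Property 3: closed under unions (pairwise suffices here, since a
    # finite collection has only finitely many distinct unions).
    return all(E1 | E2 in collection for E1 in collection for E2 in collection)

assert is_sigma_algebra(B, omega)
# A collection missing a complement fails: {∅, Ω, {'a'}} lacks {'b', 'c'}.
assert not is_sigma_algebra({frozenset(), omega, frozenset({'a'})}, omega)
```

The smallest σ-algebra, \( \{\emptyset, \Omega\} \), also passes these checks; the power set is simply the largest.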

Given a sample space \( \Omega \) and an associated \( \sigma\text{-algebra} \) \( \mathcal{B} \), a probability function is defined as a function \( P \) with domain \( \mathcal{B} \) that satisfies:

  1. \( P(E) \geq 0 \); The probability of any event must be non-negative.
  2. \( P(\Omega) = 1 \); The probability of the entire sample space, i.e., that some outcome occurs, is one.
  3. If \( E_{1}, E_{2}, \dots \in \mathcal{B} \) are pairwise disjoint, then \( P(\bigcup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty} P(E_{i}) \); This is known as countable additivity.

Given a finite set, there is an intuitive general-purpose method to define a function that satisfies Kolmogorov’s axioms and therefore is a legitimate probability function. Let \( \Omega = \{ e_{1}, e_{2}, \dots, e_{n} \} \) be a finite set of elements. Let \( \mathcal{B} \) be a \( \sigma\text{-algebra} \) of subsets of \( \Omega \). Let \( p_{1}, p_{2}, \dots, p_{n} \) be nonnegative numbers that sum to 1. Then for any \( E \in \mathcal{B} \) we can define \( P(E) \) as:

\[P(E) = \sum_{i : e_{i} \in E} p_{i}.\]
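This construction is easy to implement. The following Python sketch (the loaded three-outcome spinner and its weights are hypothetical) assigns nonnegative weights summing to 1 and defines \( P(E) \) as the sum of the weights of the elementary events in \( E \):

```python
from fractions import Fraction

# A loaded three-outcome spinner: nonnegative weights p_i that sum to 1
# (assumed values for illustration).
weights = {'red': Fraction(1, 2), 'green': Fraction(1, 3), 'blue': Fraction(1, 6)}
assert sum(weights.values()) == 1

def P(event):
    """P(E) = sum of p_i over the elementary events e_i contained in E."""
    return sum(weights[e] for e in event)

assert P(frozenset(weights)) == 1                    # P(Omega) = 1
assert all(P(frozenset({e})) >= 0 for e in weights)  # non-negativity
# Additivity for the disjoint events {red} and {blue}:
assert P(frozenset({'red', 'blue'})) == P(frozenset({'red'})) + P(frozenset({'blue'}))
```

Because an empty sum is zero, \( P(\emptyset) = 0 \) falls out of the definition with no extra work.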


3. Calculus of probabilities

If \( P \) is a probability function and \( A \) is any set in \( \mathcal{B} \), then:

  1. \( P(\emptyset) = 0 \); The impossible event has probability zero.
  2. \( P(A) \leq 1 \); No event has probability greater than one.
  3. \( P(A^{c}) = 1 - P(A) \); The complement rule.

If \( P \) is a probability function and \( A \) and \( B \) are any sets in \( \mathcal{B} \), then:

  1. \( P(B \cap A^{c}) = P(B) - P(A \cap B) \).
  2. \( P(A \cup B) = P(A) + P(B) - P(A \cap B) \); The inclusion-exclusion rule.
  3. If \( A \subseteq B \), then \( P(A) \leq P(B) \); Monotonicity.
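These rules can be verified numerically. The following Python sketch (the equally likely six-outcome space and the particular events \( A \) and \( B \) are illustrative assumptions) checks the complement rule, inclusion-exclusion, and monotonicity:

```python
from fractions import Fraction

# Equally likely outcomes on a six-element sample space (illustrative).
omega = frozenset(range(1, 7))
P = lambda E: Fraction(len(E), len(omega))

A, B = frozenset({1, 2, 3}), frozenset({3, 4})

# Complement rule: P(A^c) = 1 - P(A).
assert P(omega - A) == 1 - P(A)
# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
assert P(A | B) == P(A) + P(B) - P(A & B)
# Monotonicity: A ⊆ B implies P(A) ≤ P(B).
assert P(frozenset({1})) <= P(frozenset({1, 2}))
```

Note the subtraction of \( P(A \cap B) \) in the second check: without it, outcomes in both events (here, the face 3) would be counted twice.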

Bonferroni’s Inequality provides a lower bound on the probability of a simultaneous (joint) event based on the probabilities of the individual events:

\[P(A \cap B) \geq P(A) + P(B) - 1.\]
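The inequality can be checked exhaustively on a small example. The following Python sketch (the equally likely six-outcome space is an illustrative assumption) verifies the bound for every pair of events and then shows a case where the bound is informative:

```python
from fractions import Fraction
from itertools import chain, combinations

# Equally likely outcomes on a six-element sample space (illustrative).
omega = frozenset(range(6))
P = lambda E: Fraction(len(E), len(omega))

subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

# Bonferroni's inequality holds for every pair of events A, B.
assert all(P(A & B) >= P(A) + P(B) - 1 for A in subsets for B in subsets)

# The bound is informative when P(A) + P(B) > 1: with P(A) = P(B) = 5/6,
# the joint event must have probability at least 2/3.
A, B = frozenset(range(5)), frozenset(range(1, 6))
assert P(A) + P(B) - 1 == Fraction(2, 3)
assert P(A & B) >= Fraction(2, 3)
```

When \( P(A) + P(B) \leq 1 \) the bound is vacuous, since probabilities are already nonnegative; it only bites when both events are individually likely.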



4. Interpreting probabilities

Content.


