Jan 09, 2025
Click on the link or scan the QR code to answer the Ed Discussion poll
https://edstem.org/us/courses/68995/discussion/5942168

Introduction to the course
Syllabus activity
Review of probability
Bayesian Health Data Science involves using Bayesian methods to analyze health data, which can include electronic health records (EHR), clinical trial data, and other health-related datasets. These methods are model-based and can appropriately quantify and propagate uncertainty, making them suitable for tackling challenges in health research.
Source: ChatGPT
Modeling
Probabilistic Programming
Prerequisites: BIOSTAT 724 (Introduction to Applied Bayesian Analysis) or equivalent course with instructor permission.
By the end of the semester, you will be able to…

All analyses using R, a statistical programming language
Inference using Stan, a probabilistic programming language (rstan)
Write reproducible reports in Quarto
Access RStudio through STA725 Docker Containers

Access assignments
Facilitates version control and collaboration
All work in BIOSTAT 725 course organization
Group 1: What to expect in the course
Group 2: Homework
Group 3: Exams
Group 4: Live Coding
Group 5: Application Exercises
Group 6: Academic honesty (except AI policy)
Group 7: Artificial intelligence policy
Group 8: Late work policy and waiver for extenuating circumstances
Group 10: Getting help in the course
| Category | Percentage |
|---|---|
| Homework | 40% |
| Exam 01 | 20% |
| Exam 02 | 20% |
| Live Coding | 10% |
| Application Exercises | 10% |
| Total | 100% |
Complete all the preparation work before class.
Ask questions in class, office hours, and on Ed Discussion.
Do the homework; get started on homework early when possible.
Don’t procrastinate and don’t let a week pass by with lingering questions.
Stay up to date on announcements posted on Ed Discussion and sent via email.
This is foundational material that you should have already learned in a previous course. I’m reviewing important concepts that are needed for Bayesian inference.
The goal of Bayesian statistics is to compute the posterior distribution (i.e., the uncertainty distribution of the parameters, \(\boldsymbol{\theta}\), after observing the data, \(\mathbf{Y}\)).
This is the conditional distribution of \(\boldsymbol{\theta}\) given \(\mathbf{Y}\).
Therefore, we need to review the probability concepts that lead to the conditional distribution of one variable conditioned on another.
Probability mass (pmf) and density (pdf) functions
Joint distributions
Marginal and conditional distributions
\(X\) (capital) is a random variable.
We want to compute the probability that \(X\) takes on a specific value \(x\) (lowercase).
We also might want to compute the probability of \(X\) being in a set \(\mathcal A\).
The set of possible values that \(X\) can take on is called its support, \(\mathcal S\).
Example 1: \(X\) is the roll of a die.
Example 2: \(X\) is a newborn baby’s weight.
Objective (associated with frequentist)
Subjective (associated with Bayesian)
A Bayesian analysis makes use of both of these concepts.
Aleatoric uncertainty (likelihood)
Uncontrollable randomness in the experiment.
For example, the results of a fair coin flip can never be predicted with certainty.
Epistemic uncertainty (prior/posterior)
Uncertainty about a quantity that could theoretically be known.
For example, if we flipped a coin infinitely many times, we could know the true probability of a head.
A Bayesian analysis makes use of both of these concepts.
We often distinguish between discrete and continuous random variables.
The random variable \(X\) is discrete if its support \(\mathcal S\) is countable.
Examples:
\(X \in \{0, 1, 2, 3\}\) is the number of successes in 3 trials.
\(X \in \{0, 1, 2, \ldots\}\) is the number of patients with COVID in Durham County.
We often distinguish between discrete and continuous random variables.
The random variable \(X\) is continuous if its support \(\mathcal S\) is uncountable.
Examples with \(\mathcal S = (0, \infty)\):
\(X > 0\) is systolic blood pressure.
\(X > 0\) is a patient’s BMI.
If \(X\) is discrete we describe its distribution with its probability mass function (pmf).
The pmf is \(f(x) = P(X = x)\).
The domain of \(X\) is the set of \(x\) with \(f(x) > 0\).
We must have \(f(x) \geq 0\) and \(\sum_x f(x) = 1\).
The mean is \(\mathbb E[X] = \sum_x x f(x)\).
The variance is \(\mathbb V(X) = \sum_x (x - \mathbb E[X])^2 f(x)\).
The last three sums are over \(X\)’s domain.
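These definitions are easy to verify in R. A minimal sketch using a fair six-sided die (the code itself is illustrative, not from the slides):

```r
# pmf of a fair six-sided die: f(x) = 1/6 on the support S = {1, ..., 6}
x <- 1:6
f <- rep(1 / 6, 6)
stopifnot(all(f >= 0), abs(sum(f) - 1) < 1e-12)  # checks that f is a valid pmf

mu <- sum(x * f)               # E[X] = sum_x x f(x) = 3.5
sigma2 <- sum((x - mu)^2 * f)  # V(X) = sum_x (x - E[X])^2 f(x) = 35/12
```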
A statistical analysis typically proceeds by selecting a pmf that seems to match the distribution of a sample.
We rarely know the pmf exactly, but we assume it is from a parametric family of distributions.
For example, Binomial(10, 0.5) and Binomial(4, 0.1) are different but both from the binomial family.
A family of distributions shares the same equation for the pmf but differs in some unknown parameters \(\boldsymbol{\theta}\).
We must estimate these parameters.
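In R, the binomial family is available through `dbinom()`; the two members mentioned above use the same pmf formula and differ only in their parameters:

```r
# Same pmf formula, different parameters (size, prob)
p1 <- dbinom(5, size = 10, prob = 0.5)  # P(X = 5) under Binomial(10, 0.5)
p2 <- dbinom(0, size = 4,  prob = 0.1)  # P(X = 0) under Binomial(4, 0.1) = 0.9^4

# each pmf sums to 1 over its support
s1 <- sum(dbinom(0:10, size = 10, prob = 0.5))
s2 <- sum(dbinom(0:4,  size = 4,  prob = 0.1))
```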
If \(X\) is continuous we describe its distribution with the probability density function (pdf) \(f(x) \geq 0\).
Since there are uncountably many possible values, \(P(X = x) = 0\) for all \(x\).
Probabilities are computed as areas under the pdf curve \[P(a < X < b) = \int_a^b f(x)dx.\]
Therefore, to be valid, \(f(x)\) must satisfy \(f(x) \geq 0\) and \[P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)dx = 1.\]
The domain is the set of \(x\) values with \(f(x) > 0\).
The mean and the variance are defined similarly to the discrete case but with the sums replaced by integrals.
The mean is \(\mathbb E[X] = \int x f(x)dx\).
The variance is \(\mathbb V(X) = \int (x - \mathbb E[X])^2 f(x)dx\).
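These integrals can be checked numerically with R's `integrate()`. A sketch using the standard exponential pdf \(f(x) = e^{-x}\) on \((0, \infty)\) (an illustrative choice, not from the slides):

```r
f <- function(x) exp(-x)  # pdf of the Exponential(1) distribution

# total area under the pdf is 1
total <- integrate(f, 0, Inf)$value

# P(1 < X < 2) is an area under the curve: exp(-1) - exp(-2)
p12 <- integrate(f, 1, 2)$value

# mean and variance: same formulas as the discrete case, sums replaced by integrals
mu <- integrate(function(x) x * f(x), 0, Inf)$value               # E[X] = 1
sigma2 <- integrate(function(x) (x - mu)^2 * f(x), 0, Inf)$value  # V(X) = 1
```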
\(\mathbf{X} = (X_1, \ldots, X_p)\) is a random vector (vectors and matrices should be in bold).
For notational convenience, let’s consider only \(p = 2\) random variables \(X\) and \(Y\).
\((X, Y)\) is discrete if it can take on a countable number of values.
\((X, Y)\) is continuous if it can take on an uncountable number of values.
The joint pmf: \(f(x, y) = P(X = x, Y = y)\)
The marginal pmf for \(X\): \(f_X(x) = P(X = x) = \sum_y f(x, y)\)
The marginal pmf for \(Y\): \(f_Y(y) = P(Y = y) = \sum_x f(x, y)\)
The marginal distribution of \(X\) is the same univariate distribution we would obtain if we ignored the other variable.
The conditional pmf of \(Y\) given \(X\) is \(f(y|x) = P(Y = y|X = x) = \frac{P(X = x, Y = y)}{P(X = x)} = \frac{f(x, y)}{f_X (x)}.\)
\(X\) and \(Y\) are independent if \(f(x, y) = f_X(x)f_Y(y)\) for all \(x\) and \(y\).
Equivalently, \(X\) and \(Y\) are independent if \(f(x|y) = f_X(x)\) for all \(x\) and \(y\).
Notation: \(X_1, \dots, X_n \overset{\mathrm{iid}}{\sim} f(x)\) means that \(X_1, \ldots, X_n\) are independent and identically distributed.
This implies the joint pmf is \[P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^n f(x_i).\]
The same notation and definitions of independence apply to continuous random variables.
In this class, assume independence unless otherwise noted.
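The joint, marginal, and conditional pmfs above can all be computed from a single table. A sketch with a hypothetical joint pmf for binary \(X\) and \(Y\) (the cell values are made up for illustration):

```r
# A hypothetical joint pmf f(x, y), stored as a matrix: rows index x, columns index y
f_xy <- matrix(c(0.10, 0.20,
                 0.30, 0.40),
               nrow = 2, byrow = TRUE,
               dimnames = list(x = 0:1, y = 0:1))
stopifnot(abs(sum(f_xy) - 1) < 1e-12)  # valid joint pmf

f_x <- rowSums(f_xy)  # marginal pmf of X: f_X(x) = sum_y f(x, y)
f_y <- colSums(f_xy)  # marginal pmf of Y: f_Y(y) = sum_x f(x, y)

# conditional pmf of Y given X = 1: f(y | x) = f(x, y) / f_X(x)
f_y_given_x1 <- f_xy["1", ] / f_x["1"]

# independence check: does f(x, y) = f_X(x) f_Y(y) hold in every cell?
independent <- isTRUE(all.equal(unname(f_xy), unname(outer(f_x, f_y))))
```

For this table `independent` is `FALSE`: the product of the marginals does not reproduce the joint, so \(X\) and \(Y\) are dependent.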
Manipulating joint pdfs is similar to joint pmfs but sums are replaced by integrals.
The joint pdf is denoted \(f(x, y)\).
Probabilities are computed as volume under the pdf: \[P((X, Y) \in A) = \int_A f(x, y)dxdy\] where \(A \subset \mathbb{R}^2\).
The marginal pdf of \(X\) is \(f_X(x) = \int f(x, y)dy\).
\(f_X\) is the univariate pdf for \(X\) as if we never considered \(Y\).
The conditional pdf of \(Y\) given \(X\) is \[f(y|x) = \frac{f(x, y)}{f_X (x)}.\]
Proper: \(\int f(y|x)dy = \int \frac{f(x,y)}{f_X(x)}dy = \frac{\int f(x,y)dy}{f_X(x)} = \frac{f_X(x)}{f_X(x)} = 1\).
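The same manipulations work numerically for continuous variables. A sketch with a hypothetical dependent joint pdf \(f(x, y) = 2e^{-x-y}\) on \(0 < x < y\) (chosen for illustration):

```r
# A hypothetical dependent joint pdf: f(x, y) = 2 * exp(-x - y) on 0 < x < y
f_xy <- function(x, y) ifelse(x > 0 & y > x, 2 * exp(-x - y), 0)

# marginal pdf of X at x = 1: integrate the joint over y (exact value: 2 * exp(-2))
f_x1 <- integrate(function(y) f_xy(1, y), 0, Inf)$value

# conditional pdf of Y given X = 1 is proper: it integrates to 1
f_y_given_x1 <- function(y) f_xy(1, y) / f_x1
cond_total <- integrate(f_y_given_x1, 0, Inf)$value
```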
Specifying joint distributions is hard.
Every joint distribution can be written \(f(x, y) = f(y|x)f(x)\).
Therefore, any joint distribution can be defined by:
\(X\)’s marginal distribution
The conditional distribution of \(Y|X\)
The joint problem reduces to two univariate problems.
This idea forms the basis of hierarchical modeling.
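The factorization \(f(x, y) = f(y|x)f(x)\) also tells us how to sample from a joint distribution: draw \(X\) from its marginal, then \(Y\) from its conditional given \(X\). A sketch with a hypothetical hierarchy (a Beta success probability feeding a Binomial count):

```r
# Compositional sampling from a joint distribution via f(x, y) = f(y | x) f(x).
# Hypothetical example: X ~ Beta(2, 2) is a success probability,
# and Y | X = x ~ Binomial(10, x).
set.seed(725)
n <- 1e5
x <- rbeta(n, 2, 2)    # draw from X's marginal distribution
y <- rbinom(n, 10, x)  # draw from the conditional distribution of Y | X
mean(y)                # near E[Y] = 10 * E[X] = 5 by iterated expectation
```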

Thomas Bayes, 1701-1761

Pierre-Simon Laplace, 1749-1827
\[f(\boldsymbol{\theta}|\mathbf{Y}) = \frac{f(\mathbf{Y}|\boldsymbol{\theta})f(\boldsymbol{\theta})}{\int f(\mathbf{Y}|\boldsymbol{\theta})f(\boldsymbol{\theta})d\boldsymbol{\theta}}\]
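For a one-dimensional \(\theta\), Bayes' theorem can be evaluated directly on a grid. A sketch with hypothetical data (7 heads in 10 coin flips) and a uniform Beta(1, 1) prior, neither of which is from the slides:

```r
# Grid approximation of the posterior f(theta | Y) for a coin's head probability
theta <- seq(0.001, 0.999, by = 0.001)
prior <- dbeta(theta, 1, 1)                       # f(theta): uniform on (0, 1)
likelihood <- dbinom(7, size = 10, prob = theta)  # f(Y | theta)

# numerator of Bayes' theorem, with the integral in the denominator
# approximated by a Riemann sum over the grid
posterior <- likelihood * prior / sum(likelihood * prior * 0.001)

# the exact posterior here is conjugate: Beta(1 + 7, 1 + 3) = Beta(8, 4)
grid_at_07 <- posterior[which.min(abs(theta - 0.7))]
exact_at_07 <- dbeta(0.7, 8, 4)
```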
Complete HW 00 tasks
Review syllabus
Complete reading to prepare for Tuesday’s lecture
Tuesday’s lecture: Monte Carlo Sampling