Case Study Workshop: Using Pharma News to Teach Regression and Risk Analysis

equations

2026-02-06

Turn STAT headlines into a multi-session workshop that teaches regression, logistic models, and communicating uncertainty

As an instructor, you know the gap: students can run regressions by rote but still stumble when asked to interpret model outputs for a newsroom, a policy brief, or a patient. In 2026, with faster drug approvals, legal headlines, and AI-assisted data tools dominating the media cycle, teachers need classroom-ready case studies that connect statistical technique to real-world stakes. This workshop blueprint turns STAT-style pharma headlines (think regulatory voucher debates, weight-loss drug safety flags, or insider trading suits) into a multi-session, hands-on learning sequence focused on regression, logistic models, risk analysis, and communicating uncertainty.

Why this matters now (2026 context)

In late 2025 and early 2026 the intersection of rapid regulatory change and explosive media coverage of pharma made statistical literacy a classroom priority. Regulators issued clearer guidance on real-world evidence and transparency; newsrooms accelerated use of probabilistic claims; and educators reported students struggling to translate model coefficients into actionable risk communication. Meanwhile, tools for synthetic-data generation, differential privacy, and AI-augmented analysis are now classroom-ready. This workshop leverages those trends so students learn both statistical rigor and the ethical, legal, and communication skills expected by employers and the public.

Workshop overview — multi-session structure (ready-to-run)

Run this as a 6–8 week module within a statistics, data science, or public health methods course. Each session is 90–120 minutes with pre-work and an applied assignment.

Session 0: Preparation and framing — news framing, ethics, data sourcing
Session 1: Exploratory Data Analysis (EDA) anchored to a headline — descriptive context and story-building
Session 2: Linear regression for effect estimation — assumptions, diagnostics, and interpretation
Session 3: Logistic regression and binary outcomes — odds ratios, thresholds, ROC, calibration
Session 4: Risk analysis and translating results — absolute vs relative risk, NNT, prediction intervals
Session 5: Confounding, causality, and sensitivity analysis — propensity scores, instrumental variables, simulations
Session 6: Capstone presentations — communicating to lay audiences — press release, policy memo, and reproducible code

Session-by-session breakdown with actionable steps

Session 0 — Preparation and ethics (45–60 min prep + class)

Goal: Set expectations, gather news hooks, and ensure safe data practices.

Assign reading: pick one timely STAT headline (example: coverage of weight-loss drug safety concerns or FDA voucher program hesitancy).
Discuss legal and ethical basics: patient privacy, defamation risk when linking individuals to stories, and embargoes.
Data sourcing checklist: OpenFDA, ClinicalTrials.gov, CMS/Medicare claims (public subsets), or synthetic & privacy-preserving data generators. Teach students to document provenance.
Pre-work: students find two public datasets (or generate synthetic analogs) relevant to the headline and upload to GitHub Classroom.

Session 1 — EDA anchored to the headline (90–120 min)

Goal: Train students to move from headline to analytic question.

Start with the headline: what claims are being made? Write 2–3 testable hypotheses.
EDA steps: variable types, missingness, simple visual checks (histograms, boxplots, bar charts), and cohort selection. Use reproducible notebooks (Jupyter/R Markdown).
Class activity: create a one-slide storyboard — who is affected, timeframe, and data limitations.
Deliverable: cleaned dataset, code notebook, and a 200-word problem statement.
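The missingness check and baseline rate from the EDA steps above can be sketched in a few lines. This is a minimal illustration in plain Python so it runs without extra packages; in a class notebook pandas would likely be the more natural tool. The records, the field names (`treatment_group`, `adverse_event`, `baseline_bmi`), and every value are invented for illustration and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical patient-level records; None marks a missing value.
records = [
    {"treatment_group": "drug", "adverse_event": 1, "baseline_bmi": 34.1},
    {"treatment_group": "drug", "adverse_event": 0, "baseline_bmi": None},
    {"treatment_group": "drug", "adverse_event": 0, "baseline_bmi": 31.7},
    {"treatment_group": "placebo", "adverse_event": 0, "baseline_bmi": 29.8},
    {"treatment_group": "placebo", "adverse_event": None, "baseline_bmi": 33.0},
    {"treatment_group": "placebo", "adverse_event": 0, "baseline_bmi": 30.2},
]


def missing_share(rows, field):
    """Fraction of rows where `field` is missing."""
    return sum(r[field] is None for r in rows) / len(rows)


def event_rate_by_group(rows, group_field, outcome_field):
    """Observed event rate per group, skipping rows with a missing outcome."""
    events, totals = Counter(), Counter()
    for r in rows:
        if r[outcome_field] is None:
            continue
        totals[r[group_field]] += 1
        events[r[group_field]] += r[outcome_field]
    return {g: events[g] / totals[g] for g in totals}


missingness = {f: missing_share(records, f) for f in records[0]}
rates = event_rate_by_group(records, "treatment_group", "adverse_event")
print("Share missing per field:", missingness)
print("Observed event rate per group:", rates)
```

Reporting the baseline rate next to the missingness share gives students the two numbers they need for the data-limitations line on the storyboard.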

Session 2 — Linear regression: estimating effects and checking assumptions

Goal: Teach how to set up a regression, test assumptions, and interpret coefficients in context.

Start with the standard linear model:

Y = β0 + β1 X1 + β2 X2 + ... + ε

Step-by-step in class:

Model selection: specify predictors (demographics, baseline risk, treatment indicator).
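Fit the model (Python snippet):

<!-- code snippet -->
# Python (pandas + statsmodels)
# Assumes df is the cleaned pandas DataFrame produced in Session 1.
import statsmodels.formula.api as smf
# C() marks categorical predictors so statsmodels dummy-codes them.
model = smf.ols('weight_loss_pct ~ age + C(sex) + C(treatment_group) + baseline_BMI', data=df).fit()
print(model.summary())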

Diagnostics: residual plot, QQ-plot, heteroskedasticity test (Breusch-Pagan), multicollinearity (VIF).
Interpretation workshop: translate β coefficients to plain English. Example: a treatment coefficient of 2.3 means that, on average, the treatment group lost 2.3 percentage points more body weight than the comparison group, holding the other covariates constant.
Class task: create 95% confidence intervals and write two sentences that accurately reflect uncertainty.
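To make the diagnostics and the confidence-interval task concrete, here is a rough sketch that fits the linear model by hand with NumPy on simulated data, then computes 95% intervals and variance inflation factors. All data are simulated and every number (including the 2.3-point effect, chosen to echo the example above) is an assumption. The intervals use the normal approximation rather than the t distribution, which is reasonable only because the simulated sample is large.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated stand-in for a trial dataset; every number here is invented.
age = rng.normal(50, 10, n)
baseline_bmi = rng.normal(33, 4, n)
treatment = rng.integers(0, 2, n)  # 0 = placebo, 1 = drug
true_effect = 2.3  # percentage points
weight_loss_pct = (
    1.0 - 0.02 * age + 0.10 * baseline_bmi + true_effect * treatment
    + rng.normal(0, 5, n)
)

# Design matrix with an intercept column.
names = ["intercept", "age", "baseline_bmi", "treatment"]
X = np.column_stack([np.ones(n), age, baseline_bmi, treatment])

# Ordinary least squares fit and classical standard errors.
beta, *_ = np.linalg.lstsq(X, weight_loss_pct, rcond=None)
residuals = weight_loss_pct - X @ beta
sigma2 = residuals @ residuals / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# 95% intervals using the normal approximation (large n).
z = 1.96
ci_low, ci_high = beta - z * se, beta + z * se


def vif(X, j):
    """Variance inflation factor: 1 / (1 - R^2) from regressing column j on the rest."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    centered = X[:, j] - X[:, j].mean()
    r2 = 1 - (resid @ resid) / (centered @ centered)
    return 1 / (1 - r2)


for j, name in enumerate(names):
    print(f"{name:>12}: {beta[j]:6.2f}  95% CI [{ci_low[j]:.2f}, {ci_high[j]:.2f}]")
for j in (1, 2, 3):
    print(f"VIF {names[j]}: {vif(X, j):.2f}")
```

In a statsmodels workflow the same quantities are available without hand-rolled algebra: `model.conf_int()` returns the intervals, and `het_breuschpagan` and `variance_inflation_factor` cover the heteroskedasticity and multicollinearity checks.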

Session 3 — Logistic regression and communicating binary risk

Goal: Model adverse events (yes/no) and explain odds ratios, thresholds, and calibration.

Start with the logit model:

logit(p) = log(p / (1 - p)) = β0 + β1 X1 + ...

Key teaching points and activities:

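Fit a logistic model (R snippet):

<!-- code snippet -->
# R example
# Assumes df is a data frame with a 0/1 adverse_event column.
glm_fit <- glm(adverse_event ~ age + treatment_group + comorbidity_score, family = binomial(), data = df)
summary(glm_fit)
exp(coef(glm_fit)) # odds ratios
exp(confint.default(glm_fit)) # Wald 95% CIs on the odds-ratio scale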

Interpretation practice: convert coefficients to odds ratios: OR = exp(β). Explain the difference between odds and probability. Use concrete base rates to avoid misleading relative claims.
Threshold and decision analysis: explore how changing the classification threshold alters sensitivity/specificity; plot ROC curve and compute AUC.
Calibration: compute calibration-in-the-large and plot predicted vs observed risk; discuss Brier score.
Class exercise: students prepare a 60-second ‘newsroom-ready’ explanation of an odds ratio, including absolute risk translation.
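The threshold, ROC, and calibration activities above can be illustrated on a handful of predicted probabilities. The sketch below uses only the Python standard library and a tiny invented set of outcomes and predictions. In practice students would take `p_hat` from their fitted model, and scikit-learn or R packages offer equivalent metrics. Calibration-in-the-large is shown in its simplest form (observed rate minus mean predicted risk), which is one of several definitions in use.

```python
def sensitivity_specificity(y_true, p_hat, threshold):
    """Sensitivity and specificity when p_hat >= threshold is classified as an event."""
    tp = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, p_hat):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, p_hat):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


def calibration_in_the_large(y_true, p_hat):
    """Observed event rate minus mean predicted risk (0 means no overall bias)."""
    return sum(y_true) / len(y_true) - sum(p_hat) / len(p_hat)


# Invented outcomes (1 = adverse event) and model-predicted risks.
y = [0, 0, 0, 0, 1, 0, 1, 1]
p = [0.10, 0.20, 0.30, 0.40, 0.35, 0.60, 0.70, 0.90]

for threshold in (0.3, 0.5):
    sens, spec = sensitivity_specificity(y, p, threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print(f"AUC: {auc(y, p):.3f}")
print(f"Brier score: {brier_score(y, p):.3f}")
print(f"Calibration-in-the-large: {calibration_in_the_large(y, p):+.3f}")
```

Lowering the threshold from 0.5 to 0.3 catches every event in this toy set but flags more non-events. That trade-off is the one students should be able to narrate in their 60-second explanation.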

Session 4 — Risk analysis and communicating uncertainty

Goal: Equip students to translate model output into policy-relevant metrics and clear communications.

Absolute vs relative risk: show how a 50% relative increase can be tiny when baseline risk is low. Practice converting model-predicted probabilities into absolute risk differences with confidence intervals.
Number Needed to Treat (NNT): compute NNT = 1 / absolute risk reduction and interpret for policy audiences.
Prediction intervals and fan charts: show how to build prediction intervals for individual outcomes and confidence intervals for population averages, and explain why the former are wider; demonstrate fan charts to visualize uncertainty over time, which is useful when reporting forward-looking regulatory outcomes.
Communicating uncertainty: provide templates for statements that balance clarity and honesty. Use both “plain-language” and “methods appendix” versions.

Example plain-language line: "Our models estimate that among 10,000 people like those in the study, 12 fewer would experience a serious side effect if Treatment A is used, but the true number could reasonably be between 4 and 20."
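A worked calculation helps students see where a sentence like the one above comes from. The sketch below computes the absolute risk reduction, a Wald confidence interval, the relative risk, and the NNT from two-arm event counts. The counts are invented and are not meant to reproduce the numbers in the example line. The Wald interval is one simple choice among several, and it behaves poorly when events are very rare or samples are small.

```python
import math
from statistics import NormalDist


def risk_difference_summary(events_control, n_control, events_treated, n_treated, level=0.95):
    """Absolute risk reduction with a Wald interval, plus relative risk and NNT."""
    risk_c = events_control / n_control
    risk_t = events_treated / n_treated
    arr = risk_c - risk_t  # absolute risk reduction
    se = math.sqrt(risk_c * (1 - risk_c) / n_control + risk_t * (1 - risk_t) / n_treated)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "risk_control": risk_c,
        "risk_treated": risk_t,
        "relative_risk": risk_t / risk_c,
        "arr": arr,
        "arr_ci": (arr - z * se, arr + z * se),
        "nnt": 1 / arr,
    }


# Invented counts: 150 serious events among 50,000 controls, 90 among 50,000 treated.
s = risk_difference_summary(150, 50_000, 90, 50_000)
per_10k = 10_000
low, high = s["arr_ci"]
print(f"Relative risk: {s['relative_risk']:.2f}")
print(
    f"About {s['arr'] * per_10k:.0f} fewer events per {per_10k:,} people "
    f"(95% CI {low * per_10k:.0f} to {high * per_10k:.0f})"
)
print(f"NNT: about {s['nnt']:.0f}")
```

Here a 40% relative reduction corresponds to roughly a dozen fewer events per 10,000 people, which is the contrast between relative and absolute framing that the session is built around. Because the interval for the risk difference excludes zero in this example, an NNT range can be read off by inverting its endpoints.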

Session 5 — Confounding, causality, and sensitivity

Goal: Move beyond association to structured causal thinking using practical tools.

Introduce directed acyclic graphs (DAGs) to map potential confounding related to prescription patterns, insurance access, and severity.
Teach propensity score matching, inverse probability weighting, and doubly robust estimators with lab exercises.
Run a sensitivity analysis: show how unmeasured confounding of a given strength would change effect estimates; students write short interpretations of robustness.
Connect to real-world topics: when a headline discusses legal risk over accelerated approvals (e.g., voucher concerns), link causal assumptions to regulatory decisions.
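A small simulation can show why the confounding tools above matter. In the sketch below, sicker patients are both more likely to receive the drug and more likely to have an event, so the naive comparison overstates the harm. Inverse probability weighting, with propensities estimated as the treated share within each severity stratum, recovers something close to the true effect. Every probability and the 2-point true effect are assumptions chosen for illustration, and the single binary confounder is far simpler than real prescribing data.

```python
import random

random.seed(2026)
N = 200_000
TRUE_EFFECT = 0.02  # treatment raises event risk by 2 percentage points (invented)

rows = []
for _ in range(N):
    severe = random.random() < 0.5
    # Sicker patients are more likely to be prescribed the drug (confounding).
    treated = random.random() < (0.7 if severe else 0.3)
    risk = 0.10 + 0.10 * severe + TRUE_EFFECT * treated
    event = random.random() < risk
    rows.append((severe, treated, event))


def mean(values):
    """Arithmetic mean of an iterable of numbers or booleans."""
    values = list(values)
    return sum(values) / len(values)


# Naive comparison ignores severity and mixes its effect into the estimate.
naive = mean(e for s, t, e in rows if t) - mean(e for s, t, e in rows if not t)

# Propensity score: share treated within each severity stratum.
propensity = {s: mean(t for s2, t, e in rows if s2 == s) for s in (False, True)}


def weighted_risk(treated_flag):
    """Weighted event risk for one arm, using weights of
    1 / P(received the treatment actually given | severity)."""
    num = den = 0.0
    for s, t, e in rows:
        if t != treated_flag:
            continue
        p = propensity[s] if treated_flag else 1 - propensity[s]
        num += e / p
        den += 1 / p
    return num / den


ipw = weighted_risk(True) - weighted_risk(False)
print(f"True effect:    {TRUE_EFFECT:.3f}")
print(f"Naive estimate: {naive:.3f}")
print(f"IPW estimate:   {ipw:.3f}")
```

As a follow-up exercise, students could hide `severe` from the analysis and observe that no weighting scheme can then remove the bias, which motivates the sensitivity analysis for unmeasured confounding.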

Session 6 — Capstone: presentation and communication sprint

Goal: Students produce a mini case study that includes code, a methods appendix, and a 90-second public-facing statement.

Deliverables: reproducible notebook, slide deck (5 slides), 300-word policy brief, and a 90-second recorded pitch to a mock newsroom or regulatory board.
Grading rubric: clarity of question (20%), technical correctness (30%), interpretation & uncertainty communication (30%), reproducibility & code quality (20%).
Optional extension: students submit a one-page press release that a health beat reporter could publish with minimal edits.

Sample classroom artifacts (copy-paste ready)

Press-release template

Headline: Short, active sentence tying the finding to the population

Lead (1–2 sentences): Most important result plus uncertainty (e.g., CI)

Context (1–2 sentences): Baseline rates and why this matters

Methods note (1 sentence): Data source and model type

Quote (1): Plain-language interpretation

Rubric highlights

Reproducibility: notebook runs end-to-end (automated check)
Statistical validity: appropriate model and diagnostics
Communication: absolute risks provided, uncertainty stated, no misleading relative claims
Ethics: data provenance and anonymization documented

Practical instructor tips and tooling (2026-ready)

Use modern infrastructure to keep the workshop smooth and scalable:

Notebooks: JupyterHub or Posit Cloud (formerly RStudio Cloud) for reproducible environments. Use GitHub Classroom to collect assignments.
Synthetic & privacy-preserving data: in 2025–26 multiple open-source libraries for synthetic healthcare data matured. Use them when you cannot share real patient-level data.
Auto-grading and tests: include unit tests for key deliverables (e.g., correct model formula, CI calculation).
Visualization libraries: use Altair or ggplot2 for clear uncertainty visuals. Consider interactive dashboards for the capstone demonstration.
AI as assistant: leverage LLMs to produce first-draft plain-language summaries, but require students to justify every translation and cite model-derived numbers. Emphasize explainability and model provenance.
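An auto-graded check for a confidence-interval deliverable might look like the sketch below. The function names `wald_ci` and `check_ci_submission`, the tolerance, and the feedback wording are all hypothetical. The check returns messages rather than raising, so it could be wrapped in pytest assertions or in whatever grading harness a course already uses.

```python
from statistics import NormalDist


def wald_ci(estimate, std_error, level=0.95):
    """Reference solution: estimate +/- z * SE using the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def check_ci_submission(student_ci, estimate, std_error, level=0.95, tol=1e-3):
    """Return a list of feedback messages; an empty list means the check passed."""
    problems = []
    low, high = student_ci
    ref_low, ref_high = wald_ci(estimate, std_error, level)
    if not low < estimate < high:
        problems.append("Interval does not contain the point estimate.")
    if abs(low - ref_low) > tol or abs(high - ref_high) > tol:
        problems.append(
            f"Expected roughly ({ref_low:.3f}, {ref_high:.3f}) at the {level:.0%} level."
        )
    return problems


# A correct submission and a common mistake (z = 1.645 gives a 90% interval).
good = check_ci_submission((1.320, 3.280), estimate=2.3, std_error=0.5)
bad = check_ci_submission(
    (2.3 - 1.645 * 0.5, 2.3 + 1.645 * 0.5), estimate=2.3, std_error=0.5
)
print("good:", good)
print("bad:", bad)
```

Returning feedback text instead of a bare pass or fail lets the grader point students at the likely mistake, which fits the workshop's emphasis on explaining uncertainty over just computing it.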

Common pitfalls and how to teach around them

Pitfall: Students over-rely on p-values. Teaching move: emphasize effect sizes and CIs; require absolute risk statements.
Pitfall: Odds ratios miscommunicated as risk ratios. Teaching move: force conversion exercises and simple analogies.
Pitfall: Confounding masked by poor variable selection. Teaching move: use DAGs and mandatory sensitivity analyses.
Pitfall: Misleading headlines that amplify relative claims. Teaching move: run a newsroom simulation where students edit a headline under time pressure.
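For the odds-ratio pitfall, a short conversion exercise makes the point quickly. The sketch below applies the same odds ratio to several baseline risks and reports the implied risk ratio and absolute difference. The odds ratio of 2.0 and the baseline risks are illustrative values, not drawn from any study.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Risk in the exposed group implied by an odds ratio and a baseline risk."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    exposed_odds = odds_ratio * baseline_odds
    return exposed_odds / (1 + exposed_odds)


odds_ratio = 2.0  # illustrative value
for baseline in (0.01, 0.10, 0.40):
    exposed = risk_from_odds_ratio(baseline, odds_ratio)
    print(
        f"baseline {baseline:.0%}: exposed risk {exposed:.1%}, "
        f"risk ratio {exposed / baseline:.2f}, "
        f"difference {(exposed - baseline) * 100:.1f} points"
    )
```

When the baseline risk is low, the odds ratio and the risk ratio nearly coincide. As the baseline risk rises, the risk ratio falls well below the odds ratio, so reporting an odds ratio of 2 as "double the risk" becomes increasingly misleading.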

Example classroom script: from headline to slide in 90 minutes

5 min — Read headline together and list claimed facts.
10 min — Formulate a single testable hypothesis.
25 min — Quick EDA and baseline rate calculation (pair programming).
30 min — Fit a simple model (linear or logistic) and compute an effect, CI, and absolute risk translation.
20 min — Create a 1-slide explanation and a 30-second oral summary for a lay audience.

Assessment & learning outcomes

By module end, students should be able to:

Specify and fit appropriate regression models for continuous and binary outcomes.
Diagnose model fit and report realistic uncertainty (CIs, prediction intervals, calibration).
Translate statistical results into absolute risk statements suitable for news or policy contexts.
Demonstrate understanding of confounding and perform basic causal sensitivity checks.
Produce reproducible analyses that adhere to modern privacy and ethical standards.

Resources and data sources

Suggested public sources and tools (2026):

OpenFDA APIs (drug event data)
ClinicalTrials.gov and aggregate results datasets
CMS public use files and Medicare Common Data
Synthetic data libraries (privacy-preserving generators released 2024–2026)
Stats & visualization: statsmodels, scikit-learn, tidymodels, ggplot2

Case study examples from recent headlines — quick prompts

Adaptable prompts you can hand students in the first week:

"A report suggests major drugmakers hesitate to join an accelerated review program due to legal risk. Using claims-level data, estimate whether drugs approved under fast-track pathways have different adverse-event reporting rates compared to standard approvals."
"A STAT feature examines weight-loss drugs and emerging safety signals. Using trial-aggregated or observational data, model the probability of a serious side effect and translate to absolute risk among typical patients."
"A local news story reports insider trading related to vaccine manufacturing. Use time-series regression to test for abnormal returns around announcement dates and discuss uncertainty in event-study estimates."

Final notes: what students should walk away with

This workshop is deliberately practical: students leave with not only statistical skills but the ability to frame quantitative uncertainty for non-experts. In 2026, that blend of modeling rigor, ethical data handling, and crisp communication is exactly what journalists, regulators, and employers are asking for.

Call to action

If you want a ready-to-run kit (slides, notebooks, grading rubrics, and synthetic datasets) adapted to your syllabus, download the free instructor pack at equations.live/workshop-kits or contact our team to co-design a custom module for your course. Bring the next STAT headline into class — and teach students how to turn raw claims into robust, responsibly communicated analysis.
