But What Is Confounding, Really?

Confounding is inherent to observational studies, so we need to really understand what it is.

Instead of talking about a breaking medical study, my usual modus operandi, I want to take a moment to discuss something that applies to so many breaking medical studies: the concept of confounding.

Most docs have an intuition for what a “confounder” is.

We read a study that suggests that breast feeding is linked to higher IQ and we get a bit skeptical. We seem to have a gut feeling that maybe the act of breast feeding isn’t causing higher IQs in children – maybe it is merely a marker for some other thing – maternal education or something.  That other thing is a confounder.  And confounders make assessment of causality really difficult.

Why does causality matter? For the simple reason that if A causes B, then changing A should lead to a change in B. That’s the model for a new therapy – and the goal of much of medical science.

But if you want to really understand confounding, and who doesn’t, you need to know a bit more.  Let me walk you through it.

Let’s imagine a study that started, as most studies do, with a clinical observation. 

A researcher noted that individuals who drink rosé are shorter, on average, than those who don’t.  The hypothesis: Rosé stunts your growth.


We can represent this hypothesis with a schematic. You can call it a directed acyclic graph if you want to sound smart at parties, but I often refer to it as a “causal diagram”. We are asking: does rosé drinking cause short stature?

Now, you are probably way ahead of me. There is a major confounder here. 

Women drink more rosé than men, and women are shorter than men, on average. In this case, sex confounds the rosé : height relationship.


Here’s the critical thing to note. Being a woman is associated with BOTH the exposure of interest AND the outcome of interest, and it is not a consequence of the exposure. That is the working definition of a confounder.

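Once we’ve identified a confounder, we can “adjust” for it. “Adjusting” cuts the lines running from the confounder in the causal diagram, allowing the true relationship between exposure and outcome to emerge. In practice, that means comparing rosé drinkers with non-drinkers of the same sex.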
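If you like to see these things in numbers, here is a minimal simulation sketch of that idea. Everything in it is an assumption made up for illustration: the drinking probabilities, the average heights, and the function names `simulate` and `height_gap`. Crucially, rosé is given NO effect on height, so any gap we see comes from sex alone.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Fake cohort: sex drives both rosé drinking and height; rosé does nothing."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        woman = rng.random() < 0.5
        # Assumed: women are more likely to drink rosé than men.
        drinks = rng.random() < (0.6 if woman else 0.2)
        # Assumed: height (cm) depends on sex only, never on rosé.
        height = rng.gauss(162 if woman else 176, 7)
        people.append((woman, drinks, height))
    return people

def height_gap(people):
    """Mean height of rosé drinkers minus mean height of non-drinkers."""
    drinkers = [h for _, d, h in people if d]
    others = [h for _, d, h in people if not d]
    return mean(drinkers) - mean(others)

people = simulate()
crude = height_gap(people)                                   # ignores sex
among_women = height_gap([p for p in people if p[0]])        # adjusted: women only
among_men = height_gap([p for p in people if not p[0]])      # adjusted: men only

print(f"crude gap:       {crude:.1f} cm")
print(f"gap among women: {among_women:.1f} cm")
print(f"gap among men:   {among_men:.1f} cm")
```

With these made-up numbers, the crude comparison shows drinkers to be several centimeters shorter, while the gap within each sex hovers around zero. Stratifying like this is just one simple way to adjust; regression does the same job with more variables.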

Now, let’s reset and ask about a variable that is only associated with the outcome, but not the exposure, say – parental height?

Not a confounder. Practically speaking, you don’t need to worry about it. Testing the rosé : height hypothesis does NOT require measurement of parents’ height.


Adjusting for parents’ height gets you no closer to the causality question than you were when you started.
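A quick sketch makes the point. As before, the numbers are invented for illustration, and rosé has no effect on height in this fake cohort. Parental height (reduced here to a hypothetical `tall_parents` flag) shifts the child's height but has nothing to do with who drinks rosé.

```python
import random
from statistics import mean

rng = random.Random(1)
rows = []
for _ in range(40000):
    woman = rng.random() < 0.5
    drinks = rng.random() < (0.6 if woman else 0.2)  # assumed: depends on sex only
    tall_parents = rng.random() < 0.5                # assumed: unrelated to sex or rosé
    # Assumed: height (cm) depends on sex and parental height, never on rosé.
    height = rng.gauss(162 if woman else 176, 7) + (4 if tall_parents else -4)
    rows.append((drinks, tall_parents, height))

def gap(rows):
    """Mean height of rosé drinkers minus mean height of non-drinkers."""
    return mean(h for d, _, h in rows if d) - mean(h for d, _, h in rows if not d)

crude_gap = gap(rows)                              # no adjustment at all
gap_tall = gap([r for r in rows if r[1]])          # only people with tall parents
gap_short = gap([r for r in rows if not r[1]])     # only people with short parents

print(f"crude gap:              {crude_gap:.1f} cm")
print(f"gap, tall parents only:  {gap_tall:.1f} cm")
print(f"gap, short parents only: {gap_short:.1f} cm")
```

The spurious gap is about the same size in every row. Adjusting for parental height does nothing to remove it, because the confounding is coming from sex.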

Ok, let’s reset and add back in our confounder. Our factor associated with both exposure and outcome.

What about factors associated with the exposure, but NOT the outcome?

Now this can be special. If you can find a factor that is associated with the exposure, but has no plausible link to the outcome except through the exposure, you have identified what is called an “instrumental variable” and these are super cool.

Let’s imagine there is a gene called ROSE1.

It codes for a receptor on the tongue that makes rosé taste absolutely bonkers amazing. People born with this gene will drink more rosé because it just tastes so good.

Let’s further posit that the gene is not on a sex chromosome, so it has no relationship to sex. Further, the gene has no plausible link to height: it doesn’t code for any growth proteins or anything.

OK. IF rosé REALLY does cause short stature, people born with this gene will be shorter on average due to all that rosé they have been drinking. There is a path from the gene to stature.


IF, on the other hand, the observed rosé : height relationship is all due to confounding, people born with the ROSE1 gene will be no taller or shorter than the rest of us.

See? Without a causal link between rosé drinking and stature, the gene promoting rosé drinking has no path to get to stature.

That is what is so special about an instrumental variable – it allows for a decent assessment of causality.
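Here is a rough sketch of that logic as a simulation. It is a toy, not a real Mendelian randomization analysis: ROSE1 is fictional, and the carrier frequency, drinking probabilities, heights, and the 5 cm "stunting" effect are all assumptions picked for illustration. The function `gene_height_gap` compares the average height of gene carriers and non-carriers in two worlds: one where rosé does nothing, and one where it really does stunt growth.

```python
import random
from statistics import mean

def gene_height_gap(effect_cm, n=40000, seed=2):
    """Mean height of ROSE1 carriers minus non-carriers.

    effect_cm is the assumed causal effect of rosé drinking on height
    (0 means the rosé : height link is pure confounding by sex).
    """
    rng = random.Random(seed)
    carriers, noncarriers = [], []
    for _ in range(n):
        woman = rng.random() < 0.5
        has_rose1 = rng.random() < 0.5   # assumed: independent of sex
        # Assumed: sex AND the gene both raise the chance of drinking rosé.
        p_drink = (0.5 if woman else 0.1) + (0.4 if has_rose1 else 0.0)
        drinks = rng.random() < p_drink
        # The gene affects height only through rosé drinking, if at all.
        height = rng.gauss(162 if woman else 176, 7) + (effect_cm if drinks else 0.0)
        (carriers if has_rose1 else noncarriers).append(height)
    return mean(carriers) - mean(noncarriers)

gap_if_confounding_only = gene_height_gap(effect_cm=0.0)
gap_if_rose_stunts = gene_height_gap(effect_cm=-5.0)

print(f"carrier gap, rosé harmless:    {gap_if_confounding_only:.1f} cm")
print(f"carrier gap, rosé stunts 5 cm: {gap_if_rose_stunts:.1f} cm")
```

When rosé is harmless, carriers and non-carriers come out essentially the same height, even though sex still confounds the crude rosé : height comparison. When rosé truly stunts growth, carriers come out shorter, because the gene nudges them toward more rosé.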

A genetic instrumental variable is even more special. In fact, if you ever read a study referencing “Mendelian randomization”, this is exactly what they are talking about. They found an instrumental variable that just happened to be a gene. That allows for all sorts of causal inference that you couldn’t do otherwise because genes are, basically, assigned at random at conception.

So that’s more than you ever wanted to know about confounding, but I hope it helps the next time you are reading an observational study while drinking rosé. Cheers.

This commentary first appeared on medscape.com.