Omitted Variable Bias
category_specifier : "Causal Inference"
Reference Docs: Statistical Bias | Endogeneity and Exogeneity | Frisch Waugh Theorem | Instrument Variable | Hypothesis Testing | Using control variables
Motivation
💡What happens if we forget to control for something important? Is there a third variable that affects both X and Y?
When does OVB arise?
OVB arises when a relevant variable Z is:
- Omitted from the regression, and
- Correlated with the explanatory variable X while also affecting the outcome Y.
Definition
Omitted Variable Bias (OVB) is a form of statistical bias that occurs when a key variable is excluded from a regression model, causing its effect to be wrongly attributed to other variables.
Why It Matters
OVB compromises our ability to identify true cause-and-effect relationships. When we omit a confounding variable, we risk drawing false conclusions about relationships between variables. This can lead to incorrect decisions in business and analytics. Understanding OVB reminds us that correlation is not causation and helps us identify potential hidden factors affecting our analysis.
How to Address OVB
Here are four ways to address omitted variable bias:
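- Add Controls: Include the omitted variable in the regression when it can be observed.
- Fixed Effects: Use panel data to absorb unobserved factors that are constant within an entity (entity fixed effects) or common to all entities in a period (time fixed effects).
- Instrumental Variables: Use an instrument that affects X but is uncorrelated with the error term, so it influences Y only through X.
- Randomized Trials: Use A/B tests so that treatment assignment is independent of every omitted factor.
These methods can be combined; each aims to make X exogenous, that is, uncorrelated with the error term.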
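Two of these remedies, adding controls and randomizing, can be illustrated with a small simulation. This is a minimal sketch on synthetic data: the effect sizes, the sample size, and the `ols` helper are assumptions chosen for illustration, and fixed effects and instrumental variables are not covered here.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 100_000
true_effect, z_effect = 2.0, 3.0  # assumed effects of X and Z on Y

z = rng.normal(size=n)  # confounder Z

# Observational data: units with high Z are more likely to be treated.
x_obs = (z + rng.normal(size=n) > 0).astype(float)
y_obs = true_effect * x_obs + z_effect * z + rng.normal(size=n)

naive = ols(y_obs, x_obs)[1]          # Z omitted: biased upward
controlled = ols(y_obs, x_obs, z)[1]  # Z included as a control

# Randomized data: treatment is assigned by coin flip, independent of Z.
x_rand = rng.integers(0, 2, size=n).astype(float)
y_rand = true_effect * x_rand + z_effect * z + rng.normal(size=n)
randomized = ols(y_rand, x_rand)[1]   # Z omitted, but harmless here

print(f"naive:      {naive:.2f}")
print(f"controlled: {controlled:.2f}")
print(f"randomized: {randomized:.2f}")
```

In this setup the naive estimate lands well above the assumed true effect of 2.0, while the controlled and randomized estimates land close to it. The randomized estimate is noisier than the controlled one because Z still contributes unexplained variation to Y.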
Examples
- Education → Income, but we omit ability → upward bias.
- Ad spending → Sales, but we omit seasonality → spurious correlation.
- Police presence → Crime rates, but we omit neighborhood risk → distorted effect.
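The direction of bias in the first example can be reasoned through with the standard sign rule for OVB. As a sketch, let X be education and Z be ability, and assume that ability raises income (\(\beta_2 > 0\)) and is positively correlated with education (\(\mathrm{Cov}(X, Z) > 0\)). Both signs are illustrative assumptions, not estimates:

\[
\operatorname{sign}(\text{bias}) = \operatorname{sign}(\beta_2) \times \operatorname{sign}\big(\mathrm{Cov}(X, Z)\big) = (+) \times (+) = (+)
\]

Under these assumptions the estimated return to education is overstated. If either sign were reversed, the bias would be downward instead.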
Key Equations
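Suppose the true model is \(Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon\), but we estimate the short regression \(Y = \beta_0 + \beta_1 X + u\), where \(u = \beta_2 Z + \varepsilon\). The bias in the estimated coefficient on X when omitting variable Z is:
\[
\text{Bias}(\hat{\beta}_1) = \beta_2 \cdot \frac{\mathrm{Cov}(X, Z)}{\mathrm{Var}(X)}
\]
This shows the bias depends on Z's effect on Y (\(\beta_2\)) and on its covariance with X; it vanishes if either is zero. Because the error term u absorbs \(\beta_2 Z\), the exogeneity assumption \(E[u|X]=0\) is violated whenever the omitted variable is correlated with X.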
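The dependence of the bias on \(\beta_2\) and on the relationship between X and Z can be checked numerically. This is a minimal sketch on synthetic data: the coefficient values, the strength of the X-Z link, and the `ols` helper are assumptions made for illustration.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 10_000
beta1, beta2 = 2.0, 3.0  # assumed true effects of X and Z on Y

z = rng.normal(size=n)            # omitted variable Z
x = 0.5 * z + rng.normal(size=n)  # X is correlated with Z
y = 1.0 + beta1 * x + beta2 * z + rng.normal(size=n)

short_slope = ols(y, x)[1]          # regress Y on X only
_, long_b1, long_b2 = ols(y, x, z)  # regress Y on X and Z

# Sample analogue of Cov(X, Z) / Var(X)
delta = np.cov(x, z)[0, 1] / np.var(x, ddof=1)
implied_bias = long_b2 * delta

print(f"short-regression slope:      {short_slope:.3f}")
print(f"long-regression slope on X:  {long_b1:.3f}")
print(f"beta2 * Cov(X,Z) / Var(X):   {implied_bias:.3f}")
print(f"long slope + implied bias:   {long_b1 + implied_bias:.3f}")
```

Within the sample, the short-regression slope equals the long-regression slope plus the implied bias up to floating-point error. With these assumed parameters the population ratio Cov(X, Z)/Var(X) is 0.5/1.25 = 0.4, so the short slope should sit near 2.0 + 3.0 × 0.4 = 3.2.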