Making Potential Outcomes Concrete 2

In this exercise, I will describe a simple data science project, and your job is to map the elements of that project onto the Potential Outcomes Framework.

The Study

You work for the HR department of a large company, WeMakeWidgets LLC. WeMakeWidgets is struggling because their health insurance costs have been skyrocketing, and they’re looking for ways to control costs.

For the past few years, WeMakeWidgets (WMW) has allowed its employees to enroll in a program in which, instead of going straight to a doctor in person, they must first see a nurse via Zoom. The nurse offers advice and sends them to see a doctor in person only if necessary. In exchange for doing these Zoom triage appointments, enrolled employees are given a bonus of 50 dollars a month.

Employees who are enrolled in the program spend substantially less on healthcare than people not in the program, saving the company money.

With that in mind, the HR department is thinking of making an initial Zoom appointment mandatory for all employees seeking medical care.

Mapping to the Potential Outcomes Framework

In words, describe exactly what quantities in this study context would correspond to the different components of the Potential Outcomes Framework we’ve been studying. When defining these, remember to define both the thing being measured and the population being measured.

As you do so, avoid using terms like “treatment”, “control group”, or “potential outcome”. The goal of this exercise is to move from the abstract conceptual derivations we’ve read to the specifics of this study. For example, I’ve put in an answer to Question 1 below:

1: \(E(Y_0)\)

The average amount of healthcare consumed per employee, across all WMW employees, if there were no Zoom triage appointments.

2: \(E(Y_1)\)


3: \(E(Y_1) - E(Y_0)\)


4: \(E(Y_1| D=1)\)


5: \(E(Y_0| D=0)\)


6: \(E(Y_1|D=0)\)


7: \(E(Y_0| D=1)\)


8: \(E(Y_1| D=0) - E(Y_0|D=0)\)


9: \(E(Y_0| D=1) - E(Y_0|D=0)\)



10: Now, which of the quantities above can be directly observed?


Causal Inference

For the observed difference in healthcare consumption (employees enrolled in the Zoom program consume less healthcare than employees who are not) to be a true estimate of the average effect of Zoom triage appointments, we know that both of the following must hold:

\(E(Y_0| D=1) - E(Y_0| D=0) = 0\)


\(E(Y_1| D=0) - E(Y_0| D=0) = E(Y_1| D=1) - E(Y_0 | D=1)\).
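
To see why these two conditions are the ones that matter, it can help to play with made-up data in which, unlike in real life, both \(Y_0\) and \(Y_1\) are known for every employee. The sketch below is only an illustration under stated assumptions: every number in it (spending levels, the size of the effect, the 40% enrollment rate) is invented, the variable names are hypothetical, and enrollment is decided by a coin flip, which is not how the real WMW program works. It checks that the simple enrolled-versus-not comparison always equals \(E(Y_1| D=1) - E(Y_0| D=1)\) plus \(E(Y_0| D=1) - E(Y_0| D=0)\), and that under coin-flip enrollment both conditions approximately hold.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the made-up numbers are reproducible

# Hypothetical employees. Each one gets BOTH potential outcomes,
# which real data never provides.
employees = []
for _ in range(10_000):
    y0 = random.gauss(5000, 1500)     # Y_0: yearly spending without Zoom triage (invented scale)
    y1 = y0 - random.gauss(400, 100)  # Y_1: yearly spending with Zoom triage (invented effect)
    d = random.random() < 0.4         # D: enrollment by coin flip, unrelated to Y_0 or Y_1
    employees.append((y0, y1, d))

enrolled = [(y0, y1) for y0, y1, d in employees if d]
others = [(y0, y1) for y0, y1, d in employees if not d]

# The simple comparison: E(Y_1 | D=1) - E(Y_0 | D=0)
observed_gap = mean(y1 for _, y1 in enrolled) - mean(y0 for y0, _ in others)

# Pieces that require knowing both potential outcomes for the same people
effect_enrolled = mean(y1 - y0 for y0, y1 in enrolled)  # E(Y_1|D=1) - E(Y_0|D=1)
effect_others = mean(y1 - y0 for y0, y1 in others)      # E(Y_1|D=0) - E(Y_0|D=0)
baseline_gap = (mean(y0 for y0, _ in enrolled)
                - mean(y0 for y0, _ in others))         # E(Y_0|D=1) - E(Y_0|D=0)
average_effect = mean(y1 - y0 for y0, y1, _ in employees)  # E(Y_1) - E(Y_0)

print(f"observed gap:                 {observed_gap:9.1f}")
print(f"effect among enrolled:        {effect_enrolled:9.1f}")
print(f"effect among everyone else:   {effect_others:9.1f}")
print(f"baseline gap:                 {baseline_gap:9.1f}")
print(f"effect_enrolled + baseline:   {effect_enrolled + baseline_gap:9.1f}")
print(f"average effect, all employees:{average_effect:9.1f}")
```

Changing the line that sets `d` so that enrollment depends on `y0` or on `y1 - y0` is one way to watch each condition break in turn; which real-world stories would produce that kind of dependence is what the questions below ask you to think through.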

In the context of this study, what do those two conditions mean in plain English? As above, avoid using abstract terms (“treatment”, “baseline”, etc.) and try to be as concrete as possible.

11 \(E(Y_0| D=1) - E(Y_0| D=0) = 0\):


12 \(E(Y_1| D=0) - E(Y_0| D=0) = E(Y_1| D=1) - E(Y_0| D=1)\):


Now, for each of the conditions above, please give one reason, in plain English, why that condition may not be met in the context of this study.

As you do so, be specific! Tell me a story about why, in the case of this study, you think one of these conditions may fail to hold. One can always say things like “people in the two groups may have been different”, but I want a specific reason you think they might have been different in a way that violates the condition.

13 It may be the case that \(E(Y_0| D=1) - E(Y_0| D=0) \neq 0\) because…:


14 It may be the case that \(E(Y_1| D=0) - E(Y_0|D=0) \neq E(Y_1| D=1) - E(Y_0| D=1)\) because…: