Resume Experiment Analysis

How much harder is it to get a job in the United States if you are Black than if you are White? Or, expressed differently, what is the effect of race on the difficulty of getting a job in the US?

In this exercise, we will analyze data from a real-world experiment designed to help answer this question. Namely, we will analyze data from a randomized experiment in which 4,870 fictitious resumes were sent out to employers in response to job adverts in Boston and Chicago in 2001. The resumes differ in various attributes, including the names of the applicants, and different resumes were randomly allocated to job openings.

The “experiment” part of the experiment is that resumes were randomly assigned Black- or White-sounding names, and the researchers then watched to see whether employers called the “applicants” with Black-sounding names at the same rate as the applicants with White-sounding names.

(Which names constituted “Black-sounding names” and “White-sounding names” was determined by analyzing names on Massachusetts birth certificates to determine which names were most associated with Black and White children, and then surveys were used to validate that the names were perceived as being associated with individuals of one racial category or the other. Also, please note I subscribe to the logic of Kwame Anthony Appiah and chose to capitalize both the B in Black and the W in White).

You can access the original article here.

Note to Duke students: if you are on the Duke campus network, you’ll be able to access almost any academic journal article directly; if you are off campus and want access, go to the Duke Library website and search for the article title. Once you find it, you’ll be asked to log in, after which you’ll have full access to the article. This pattern holds true at nearly any major university in the US.

Gradescope Autograding

Please follow all standard guidance for submitting this assignment to the Gradescope autograder, including storing your solutions in a dictionary called results and ensuring your notebook runs from the start to completion without any errors.

For this assignment, please name your file exercise_resume_experiment.ipynb before uploading.

You can check that you have answers for all questions in your results dictionary with this code:

assert set(results.keys()) == {
    "ex2_pvalue_computerskills",
    "ex2_pvalue_female",
    "ex2_pvalue_yearsexp",
    "ex3_pvalue_education",
    "ex4_validity",
    "ex5_pvalue",
    "ex5_white_advantage_percent",
    "ex5_white_advantage_percentage_points",
    "ex6_black_pvalue",
    "ex8_black_college",
    "ex8_black_nocollege",
    "ex8_college_heterogeneity",
    "ex9_gender_and_discrimination",
    "ex10_experiment_v_us",
}

Submission Limits

Please remember that you are only allowed FOUR submissions to the autograder. Your last submission (if you submit 4 or fewer times), or your fourth submission (if you submit more than 4 times), will determine your grade. Submissions that error out will not count against this total.

That’s one more than usual in case there are issues with exercise clarity.

Checking for Balance

The first step in analyzing any experiment is to check whether you have balance across your treatment arms—that is to say, do the people who were randomly assigned to the treatment group look like the people who were randomly assigned to the control group? Or in this case, do the resumes that ended up with Black-sounding names look like the resumes with White-sounding names?

Checking for balance is critical for two reasons. First, it’s always possible that random assignment will create profoundly different groups—the Law of Large Numbers is only a “law” in the limit. So we want to make sure we have reasonably similar groups from the outset. And second, it’s also always possible that the randomization wasn’t actually implemented correctly—you would be amazed at the number of ways that “random assignment” can go wrong! So if you ever do find you’re getting unbalanced data, you should worry not only about whether the groups have baseline differences, but also whether the “random assignment” was actually random!
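As a quick illustration of what a balance check looks like (using simulated data, not the resume data; all numbers here are made up), randomly splitting a sample into two arms should produce similar covariate means, and a t-test can flag when they differ more than chance would suggest:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulate a covariate (e.g., years of experience) for 5,000 "resumes"
yearsexp = rng.poisson(8, size=5000)

# Randomly assign half to "treatment" and half to "control"
treated = rng.permutation(np.repeat([0, 1], 2500)).astype(bool)

# With true random assignment, the difference in means should be small,
# and the t-test p-value should usually be well above 0.05
diff = yearsexp[treated].mean() - yearsexp[~treated].mean()
pvalue = stats.ttest_ind(yearsexp[treated], yearsexp[~treated]).pvalue
print(f"Difference in means: {diff:.3f}, p-value: {pvalue:.2f}")
```

If the p-value were small (or the magnitude of the difference large), that would be a cue to investigate whether the randomization actually worked.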

Exercise 1

Download the data set from this experiment (resume_experiment.dta) from GitHub. To aid the autograder, please load the data directly from a URL.

Exercise 2

  • black is the treatment variable in the data set (whether the resume has a “Black-sounding” name).

  • call is the dependent variable of interest (did the employer call the fictitious applicant for an interview)

In addition, the data include a number of variables describing the other features of each fictitious resume, including the applicant’s education level (education), years of experience (yearsexp), gender (female), computer skills (computerskills), and number of previous jobs (ofjobs). Each resume has a random selection of these attributes, so on average the Black-named fictitious applicant resumes have the same qualifications as the White-named applicant resumes.

Check for balance in terms of the average values of applicant gender (female), computer skills (computerskills), and years of experience (yearsexp) across the two arms of the experiment (i.e. by black). Calculate both the differences in means across treatment arms and test for statistical significance of these differences. Do gender, computer skills, and years of experience look balanced across race groups in terms of both statistical significance and magnitude of difference?

Store the p-values associated with your t-test of these variables in ex2_pvalue_female, ex2_pvalue_computerskills, and ex2_pvalue_yearsexp. Round your values to 2 decimal places.

[1]:
import pandas as pd
import numpy as np

pd.set_option("mode.copy_on_write", True)

from scipy import stats
import statsmodels.api as sm

resumes = pd.read_stata(
    "https://github.com/nickeubank/MIDS_Data/blob/master/resume_experiment/"
    "resume_experiment.dta?raw=true"
)
results = {}
[2]:
resumes.head()
[2]:
   education  ofjobs  yearsexp  computerskills  call  female  black
0          4       2         6               1   0.0     1.0    0.0
1          3       3         6               1   0.0     1.0    0.0
2          4       1         6               1   0.0     1.0    1.0
3          3       4         6               1   0.0     1.0    1.0
4          3       3        22               1   0.0     1.0    0.0
[3]:
covariates = ["female", "computerskills", "yearsexp"]
for i in covariates:
    black = resumes.loc[resumes.black == 1, i].mean()
    white = resumes.loc[resumes.black == 0, i].mean()
    pvalue = stats.ttest_ind(
        resumes.loc[resumes.black == 1, i].values,
        resumes.loc[resumes.black == 0, i].values,
    ).pvalue
    results[f"ex2_pvalue_{i}"] = np.round(pvalue, 2)
    print(f"For {i}, the mean for Black applicants is {black:.2f},")
    print(f"            the mean for White applicants is {white:.2f},")
    print(f"and the p-value for this difference is {results[f'ex2_pvalue_{i}']:.2f}")
    print("\n")
For female, the mean for Black applicants is 0.77,
            the mean for White applicants is 0.76,
and the p-value for this difference is 0.38


For computerskills, the mean for Black applicants is 0.83,
            the mean for White applicants is 0.81,
and the p-value for this difference is 0.03


For yearsexp, the mean for Black applicants is 7.83,
            the mean for White applicants is 7.86,
and the p-value for this difference is 0.85


Yes, gender and experience are roughly balanced across race groups. The Black-named resumes show a statistically significantly higher rate of computer skills than the White-named resumes, but the magnitude of the difference (about 2 percentage points) isn’t too worrying. Provided other things seem balanced, this is hopefully spurious. Later we will add computer skills as a control to see if it makes a difference.

Exercise 3

Do a similar tabulation for education (education). Education is a categorical variable coded as follows:

  • 0: Education not reported

  • 1: High school dropout

  • 2: High school graduate

  • 3: Some college

  • 4: College graduate or higher

Because these are categorical, you shouldn’t just calculate and compare means—you should compare share or count of observations with each value (e.g., a chi-squared contingency table). You may also find the pd.crosstab function useful.

Does education look balanced across racial groups?

Store the p-value from your chi squared test in results under the key ex3_pvalue_education. Please round to 2 decimal places.

[4]:
# Quick and dirty:
ctab = pd.crosstab(resumes["education"], resumes["black"])
ctab
[4]:
black       0.0   1.0
education
0            18    28
1            18    22
2           142   132
3           513   493
4          1744  1760
[5]:
import scipy.stats

chi2, p, dof, expected = scipy.stats.chi2_contingency(ctab.values)

results["ex3_pvalue_education"] = np.round(p, 2)
print(
    f"The p-value of the chi-squared test of education levels is {results['ex3_pvalue_education']:.2f}."
)
The p-value of the chi-squared test of education levels is 0.49.
[6]:
# So not statistically different. Pretty darn similar!

Exercise 4

What do you make of the overall results on resume characteristics? Why do we care about whether these variables look similar across the race groups? And if they didn’t look similar, would that be a threat to internal or external validity?

Answer in markdown, then also store your answer to the question of whether imbalances are a threat to internal or external validity in "ex4_validity" as the string "internal" or "external".

Checking balance across groups on a number of key variables verifies that the two sets of resumes are, on average, essentially identical, which allows us to confidently attribute any difference in callback rates to the independent variable: the “Black-ness” or “White-ness” of the name. If the groups didn’t look similar, baseline differences between them could explain differences in outcomes, which would be a threat to internal validity.

[7]:
results["ex4_validity"] = "internal"

Estimating Effect of Race

Exercise 5

The variable of interest in the data set is the variable call, which indicates a call back for an interview. Perform a two-sample t-test comparing applicants with Black-sounding names and White-sounding names.

Interpret your results—in both percentage and in percentage points, what is the effect of having a Black-sounding name (as opposed to a White-sounding name) on your resume?

Store how much more likely a White applicant is to receive a call back than a Black applicant in percentage and percentage points in "ex5_white_advantage_percent" and "ex5_white_advantage_percentage_points". Please scale percentages so 1 is 1% and percentage points so a value of 1 corresponds to 1 percentage point. Please round these answers to 2 decimal places.

Store the p-value of the difference in "ex5_pvalue". Please round your p-value to 5 decimal places.

[8]:
black = resumes[resumes.black == 1]["call"]
white = resumes[resumes.black == 0]["call"]
t2, p2 = stats.ttest_ind(black, white)

print(f"The mean callback rates are:")
print(f"{black.mean():.3%} for Black applicants, and")
print(f"{white.mean():.3%} for White applicants.")
results["ex5_white_advantage_percent"] = np.round(
    float((white.mean() - black.mean()) / black.mean()) * 100, 2
)
results["ex5_white_advantage_percentage_points"] = np.round(
    float((white.mean() - black.mean()) * 100), 2
)

print(
    f"White applicants are more likely to get a call back by {results['ex5_white_advantage_percentage_points']: .2f} percentage points, "
    f"an advantage of {results['ex5_white_advantage_percent'] :.2f}%."
)
results["ex5_pvalue"] = np.round(p2, 5)
print(f"This difference has a p-value of {results['ex5_pvalue']:.5f}")
The mean callback rates are:
6.448% for Black applicants, and
9.651% for White applicants.
White applicants are more likely to get a call back by  3.20 percentage points, an advantage of 49.68%.
This difference has a p-value of 0.00004

Exercise 6

Now, use a linear probability model (a linear regression with a 0/1 dependent variable!) to estimate the differential likelihood of being called back by applicant race (i.e. the racial discrimination by employers). Please use statsmodels.

Since we have a limited dependent variable, be sure to use heteroskedastic robust standard errors. Personally, I prefer the HC3 implementation, as it tends to do better with smaller samples than other implementations.

Interpret these results—what is the effect of having a Black-sounding name (as opposed to a White-sounding name) on your resume in terms of the likelihood you’ll be called back?

How does this compare to the estimate you got above in exercise 5?

Store the p-value associated with black in "ex6_black_pvalue". Please round your pvalue to 5 decimal places.

[9]:
import statsmodels.formula.api as smf

model = smf.ols("call ~ black", resumes).fit()
model.get_robustcov_results(cov_type="HC3").summary()
[9]:
OLS Regression Results
Dep. Variable: call R-squared: 0.003
Model: OLS Adj. R-squared: 0.003
Method: Least Squares F-statistic: 16.92
Date: Sun, 03 Mar 2024 Prob (F-statistic): 3.96e-05
Time: 10:32:17 Log-Likelihood: -562.24
No. Observations: 4870 AIC: 1128.
Df Residuals: 4868 BIC: 1141.
Df Model: 1
Covariance Type: HC3
coef std err t P>|t| [0.025 0.975]
Intercept 0.0965 0.006 16.121 0.000 0.085 0.108
black -0.0320 0.008 -4.114 0.000 -0.047 -0.017
Omnibus: 2969.205 Durbin-Watson: 1.440
Prob(Omnibus): 0.000 Jarque-Bera (JB): 18927.068
Skew: 3.068 Prob(JB): 0.00
Kurtosis: 10.458 Cond. No. 2.62


Notes:
[1] Standard Errors are heteroscedasticity robust (HC3)
[10]:
# Pull the p-value from the robust-SE fit (matching the summary above),
# rather than from the non-robust fit
robust_model = model.get_robustcov_results(cov_type="HC3")
black_idx = robust_model.model.exog_names.index("black")
results["ex6_black_pvalue"] = np.round(float(robust_model.pvalues[black_idx]), 5)
print(f'The p-value associated with `black` is {results["ex6_black_pvalue"]:.6f}')
The p-value associated with `black` is 0.000040

Black applicants are 3.2 percentage points less likely to get a call back than White applicants. This is exactly the same estimate as the t-test, because a bivariate regression of a binary outcome on a binary treatment is precisely equivalent to a difference in means!

Exercise 7

Even when doing a randomized experiment, adding control variables to your regression can improve the statistical efficiency of your estimates of the treatment effect (the upside is the potential to explain residual variation; the downside is more parameters to be estimated). Adding controls can be particularly useful when randomization left some imbalances in covariates (which you may have seen above).
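The efficiency claim above can be sketched with a small simulation (synthetic data, not the resume data; the coefficients and variable names are made up for illustration): the treatment effect estimate is unbiased either way, but its standard error shrinks when a predictive covariate soaks up residual variation in the outcome.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame(
    {
        "treat": rng.integers(0, 2, n),  # randomized 0/1 treatment
        "x": rng.normal(size=n),  # a covariate that strongly predicts the outcome
    }
)
# Outcome: modest treatment effect plus large covariate effect plus noise
df["y"] = 0.2 * df["treat"] + 2.0 * df["x"] + rng.normal(size=n)

# Compare the standard error on `treat` with and without the control
se_no_control = smf.ols("y ~ treat", df).fit().bse["treat"]
se_with_control = smf.ols("y ~ treat + x", df).fit().bse["treat"]
print(f"SE without control: {se_no_control:.3f}")
print(f"SE with control:    {se_with_control:.3f}")
```

The second standard error is markedly smaller because `x` explains much of the residual variance in `y`.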

Now let’s see if we can improve our estimates by adding in other variables as controls. Add in education, yearsexp, female, and computerskills—be sure to treat education as a categorical variable!

[11]:
import statsmodels.formula.api as smf

model = smf.ols(
    "call ~ black + C(education) + yearsexp + computerskills + female", resumes
).fit()
model.get_robustcov_results(cov_type="HC3").summary()
[11]:
OLS Regression Results
Dep. Variable: call R-squared: 0.008
Model: OLS Adj. R-squared: 0.006
Method: Least Squares F-statistic: 4.350
Date: Sun, 03 Mar 2024 Prob (F-statistic): 3.04e-05
Time: 10:32:17 Log-Likelihood: -551.02
No. Observations: 4870 AIC: 1120.
Df Residuals: 4861 BIC: 1178.
Df Model: 8
Covariance Type: HC3
coef std err t P>|t| [0.025 0.975]
Intercept 0.0821 0.040 2.053 0.040 0.004 0.160
C(education)[T.1] -0.0017 0.057 -0.030 0.976 -0.113 0.110
C(education)[T.2] -8.953e-05 0.042 -0.002 0.998 -0.082 0.082
C(education)[T.3] -0.0025 0.039 -0.065 0.948 -0.079 0.074
C(education)[T.4] -0.0047 0.038 -0.124 0.901 -0.080 0.070
black -0.0316 0.008 -4.076 0.000 -0.047 -0.016
yearsexp 0.0032 0.001 3.665 0.000 0.001 0.005
computerskills -0.0186 0.011 -1.616 0.106 -0.041 0.004
female 0.0112 0.010 1.165 0.244 -0.008 0.030
Omnibus: 2950.646 Durbin-Watson: 1.448
Prob(Omnibus): 0.000 Jarque-Bera (JB): 18631.250
Skew: 3.047 Prob(JB): 0.00
Kurtosis: 10.395 Cond. No. 225.


Notes:
[1] Standard Errors are heteroscedasticity robust (HC3)

Basically no change in the estimate or the statistical power, though given the sample size and the covariate balance, that isn’t terribly surprising.

Estimating Heterogeneous Effects

Exercise 8

As you may recall from some past readings (such as this one on the migraine medication Aimovig), our focus on estimating Average Treatment Effects runs the risk of papering over variation in how individuals respond. In the case of Aimovig, for example, nearly no patients actually experienced the Average Treatment Effect of the medication; around half of patients experienced no benefit, while the other half experienced a benefit of about twice the average treatment effect.

So far in this analysis we’ve been focusing on the average effect of having a Black-sounding name (as compared to a White-sounding name). But we can actually use our regression framework to look for evidence of heterogeneous treatment effects—effects that are different for different types of people in our data. We accomplish this by interacting a variable we think may be related to experiencing a differential treatment effect with our treatment variable. For example, if we think that applicants with Black-sounding names who have a college degree are likely to experience less discrimination, we can interact black with an indicator for having a college degree. If having a college degree reduces discrimination, we could expect the interaction term to be positive.
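The mechanics of reading an interaction term can be sketched with simulated data (a toy example with made-up effect sizes, not the resume data): the coefficient on the treatment gives the effect in the baseline group, and the interaction coefficient gives the *difference* in effects between groups.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 20000
df = pd.DataFrame(
    {
        "treat": rng.integers(0, 2, n),  # randomized 0/1 treatment
        "college": rng.integers(0, 2, n),  # group indicator
    }
)
# True treatment effect: -4 points without a degree, -2 points with one
effect = np.where(df["college"] == 1, -0.02, -0.04)
df["call"] = (rng.random(n) < 0.10 + effect * df["treat"]).astype(int)

m = smf.ols("call ~ treat * college", df).fit()
# `treat` recovers the effect for the college == 0 group (about -0.04);
# `treat:college` recovers the difference across groups (about +0.02)
print(m.params[["treat", "treat:college"]].round(3))
```

The total effect for the college group is the sum of the two coefficients, which is the same arithmetic you'll use in Exercise 8.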

Is there more or less racial discrimination (the absolute magnitude difference in call back rates between Black and White applicants) among applicants who have a college degree? Store your answer as the string "more discrimination" or "less discrimination" under the key "ex8_college_heterogeneity".

Please still include education, yearsexp, female, and computerskills as controls.

Note: it’s relatively safe to assume that someone hiring employees who sees a resume that does not report education levels will assume the applicant does not have a college degree. So treat “No education reported” as “not having a college degree.”

In percentage points, what is the difference in call back rates:

  • between White applicants without a college degree and Black applicants without a college degree (ex8_black_nocollege).

  • between White applicants with a college degree and Black applicants with a college degree (ex8_black_college).

Use negative values to denote a lower probability for Black applicants to get a call back. Scale so a value of 1 is a one percentage point difference. Please round your answers to 2 decimal places.

Focus on the coefficient values, even if the significance is low.

[12]:
# Note: college_degree duplicates the C(education)[T.4] dummy, so the design
# matrix is rank-deficient; this triggers the rank warning below, but the
# coefficients we care about (black and the interaction) are unaffected.
resumes["college_degree"] = resumes.education == 4
model = smf.ols(
    "call ~ black*college_degree + C(education) + yearsexp + computerskills + female",
    resumes,
).fit()
fit_model = model.get_robustcov_results(cov_type="HC3")
fit_model.summary()
/Users/nce8/opt/miniconda3/lib/python3.11/site-packages/statsmodels/base/model.py:1896: ValueWarning: covariance of constraints does not have full rank. The number of constraints is 10, but rank is 9
  warnings.warn('covariance of constraints does not have full '
[12]:
OLS Regression Results
Dep. Variable: call R-squared: 0.008
Model: OLS Adj. R-squared: 0.006
Method: Least Squares F-statistic: 3.952
Date: Sun, 03 Mar 2024 Prob (F-statistic): 4.93e-05
Time: 10:32:17 Log-Likelihood: -550.76
No. Observations: 4870 AIC: 1122.
Df Residuals: 4860 BIC: 1186.
Df Model: 9
Covariance Type: HC3
coef std err t P>|t| [0.025 0.975]
Intercept 0.0875 0.040 2.176 0.030 0.009 0.166
college_degree[T.True] -0.0060 0.019 -0.307 0.759 -0.044 0.032
C(education)[T.1] -0.0023 0.057 -0.040 0.968 -0.114 0.110
C(education)[T.2] -0.0012 0.042 -0.030 0.976 -0.083 0.081
C(education)[T.3] -0.0036 0.039 -0.092 0.927 -0.080 0.073
C(education)[T.4] -0.0060 0.019 -0.307 0.759 -0.044 0.032
black -0.0405 0.015 -2.736 0.006 -0.070 -0.011
black:college_degree[T.True] 0.0123 0.017 0.710 0.478 -0.022 0.046
yearsexp 0.0032 0.001 3.672 0.000 0.001 0.005
computerskills -0.0186 0.011 -1.618 0.106 -0.041 0.004
female 0.0112 0.010 1.157 0.247 -0.008 0.030
Omnibus: 2950.182 Durbin-Watson: 1.448
Prob(Omnibus): 0.000 Jarque-Bera (JB): 18623.859
Skew: 3.046 Prob(JB): 0.00
Kurtosis: 10.393 Cond. No. 2.85e+15


Notes:
[1] Standard Errors are heteroscedasticity robust (HC3)
[2] The smallest eigenvalue is 5.38e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
[13]:
# Map coefficient names to values, then recover the two group differences
params = dict(zip(fit_model.model.exog_names, fit_model.params))

results["ex8_black_nocollege"] = np.round(100 * params["black"], 2)
results["ex8_black_college"] = np.round(
    100 * (params["black"] + params["black:college_degree[T.True]"]), 2
)

print(
    f'Black applicants without a college degree are {results["ex8_black_nocollege"]:.2f} '
    "percentage points less likely to get a call back than White applicants."
)
print(
    f'Black applicants with a college degree are {results["ex8_black_college"]:.2f} '
    "percentage points less likely to get a call back than White applicants."
)
Black applicants without a college degree are -4.05 percentage points less likely to get a call back than White applicants.
Black applicants with a college degree are -2.82 percentage points less likely to get a call back than White applicants.
[14]:
results["ex8_college_heterogeneity"] = "less discrimination"

That said, the difference is far from statistically significant, so it should be taken with a few grains of salt, though not ignored entirely.

One thing to bear in mind is that our standard errors are relatively large with respect to the statistical quantities of interest. This suggests the analysis is somewhat underpowered for these subpopulations; we fail to reject the null hypothesis of no difference, but that is not the same as a precisely estimated zero effect.

Exercise 9

Now let’s compare men and women—is the penalty for having a Black-sounding name greater for Black men or Black women? Store your answer as "greater discrimination for men" or "greater discrimination for women" in "ex9_gender_and_discrimination".

Focus on the coefficient values, even if the significance is low.

Again, please still include education, yearsexp, female, and computerskills as controls.

[15]:
# As interaction
model = smf.ols(
    "call ~ black*female + yearsexp + computerskills + C(education)", resumes
).fit()
model.get_robustcov_results(cov_type="HC3").summary()
[15]:
OLS Regression Results
Dep. Variable: call R-squared: 0.008
Model: OLS Adj. R-squared: 0.006
Method: Least Squares F-statistic: 3.866
Date: Sun, 03 Mar 2024 Prob (F-statistic): 6.76e-05
Time: 10:32:17 Log-Likelihood: -551.00
No. Observations: 4870 AIC: 1122.
Df Residuals: 4860 BIC: 1187.
Df Model: 9
Covariance Type: HC3
coef std err t P>|t| [0.025 0.975]
Intercept 0.0807 0.040 1.996 0.046 0.001 0.160
C(education)[T.1] -0.0021 0.057 -0.037 0.971 -0.114 0.110
C(education)[T.2] -0.0001 0.042 -0.003 0.998 -0.082 0.082
C(education)[T.3] -0.0026 0.039 -0.066 0.947 -0.079 0.074
C(education)[T.4] -0.0048 0.038 -0.125 0.900 -0.080 0.070
black -0.0287 0.016 -1.840 0.066 -0.059 0.002
female 0.0131 0.014 0.919 0.358 -0.015 0.041
black:female -0.0038 0.018 -0.213 0.831 -0.039 0.031
yearsexp 0.0032 0.001 3.668 0.000 0.001 0.005
computerskills -0.0186 0.011 -1.618 0.106 -0.041 0.004
Omnibus: 2950.616 Durbin-Watson: 1.448
Prob(Omnibus): 0.000 Jarque-Bera (JB): 18630.964
Skew: 3.047 Prob(JB): 0.00
Kurtosis: 10.395 Cond. No. 226.


Notes:
[1] Standard Errors are heteroscedasticity robust (HC3)
[16]:
results["ex9_gender_and_discrimination"] = "greater discrimination for women"

Perhaps slightly more discrimination for women, but not only is the interaction extremely insignificant statistically, its point estimate is also very small (unlike in Exercise 8).

Exercise 10

Calculate and/or lookup the following online:

  • What is the share of applicants in our dataset with college degrees?

  • What share of Black adult Americans have college degrees (i.e. have completed a bachelor’s degree)?

Is the share of Black applicants with college degrees in this data "greater", or "less" than in the US? Store your answer as one of those strings in "ex10_experiment_v_us"

[17]:
# In our data:
print(
    f"{ (resumes['education'] == 4).mean():.1%} of "
    "applicants have a college degree in the experimental data"
)
72.0% of applicants have a college degree in the experimental data

In the US, about 16% of Black adults have a college degree.

So the fictitious applicants are substantially more educated than the average Black American adult.

[18]:
results["ex10_experiment_v_us"] = "greater"

Exercise 11

Bearing in mind your answers to Exercise 8 and to Exercise 10, how do you think the Average Treatment Effect you estimated in Exercises 5 and 6 might generalize to the experience of the average Black American (i.e., how do you think the ATE for the average Black American would compare to the ATE estimated from this experiment)?

With the obvious caveat that the statistical significance of the difference in Exercise 8 is low (likely because the test is underpowered): discrimination appears higher for less educated job applicants, and the resume experiment over-represented people with college degrees. Real-world discrimination is therefore likely higher than the ATE estimated in Exercise 6 suggests.

Exercise 12

What does your answer to Exercise 10 imply about the study’s internal validity?

Nothing! It’s unrelated to internal validity.

Exercise 13

What does your answer to Exercise 10 imply about the study’s external validity?

It implies that the study may not have external validity with respect to the broader US population, though we can reasonably treat its estimate as a lower bound on callback discrimination.

What Did We Just Measure?

It’s worth pausing for a moment to think about exactly what we’ve measured in this experiment. Was it the effect of race on hiring? Or the difference in the experience of the average White job applicant from the average Black job applicant?

Well… no. What we have measured in this experiment is just the effect of having a Black-sounding name (as opposed to a White-sounding name) on your resume on the likelihood of getting a follow-up call from someone hiring in Boston or Chicago, given otherwise identical resumes. In that sense, what we’ve measured is a small piece of the difference in the experience of Black and White Americans when seeking employment. As anyone looking for a job knows, getting a callback is obviously a crucial step in getting a job, so this difference—even if it’s just one part of the overall difference—is remarkable.

[22]:
assert set(results.keys()) == {
    "ex2_pvalue_computerskills",
    "ex2_pvalue_female",
    "ex2_pvalue_yearsexp",
    "ex3_pvalue_education",
    "ex4_validity",
    "ex5_pvalue",
    "ex5_white_advantage_percent",
    "ex5_white_advantage_percentage_points",
    "ex6_black_pvalue",
    "ex8_black_college",
    "ex8_black_nocollege",
    "ex8_college_heterogeneity",
    "ex9_gender_and_discrimination",
    "ex10_experiment_v_us",
}