Matching Exercise¶
In this exercise, we’ll be evaluating how getting a college degree impacts earnings in the US using matching.
Matching Packages: Python v. R¶
Just as the best tools for machine learning tend to be in Python since they’re developed by CS people (who prefer Python), most of the best tools for causal inference are implemented in R, since innovation in causal inference tends to be led by social scientists using R. As a result, the most well-developed matching package, MatchIt, is only available in R (though you can always call it from Python using rpy2).
In the last couple of years, though, a group of computer scientists and statisticians here at Duke have made some great advances in matching (especially on the computational side), and they recently released a set of matching packages in both R and Python with some great algorithms we’ll use today. But be aware that these packages aren’t as mature, and aren’t general-purpose packages yet. So if you ever get deep into matching, you will probably still want to make at least partial use of the R package MatchIt, as well as some other R packages for newer techniques like Matching Frontier estimation or Adaptive Hyper-Box Matching.
Installing dame-flame¶
For this lesson, begin by installing dame-flame with pip install dame-flame (it’s not on conda yet).
DAME is an algorithm that we can use for a version of coarsened exact matching. The package only accepts a list of categorical variables, and then attempts to match pairs that match exactly on those variables. That means that if you want to match on, say, age, you have to break it up into categories (say, under 18, 18-29, 30-39, etc.).
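For instance, one way to discretize a continuous age variable is with pd.cut (the ages, bin edges, and labels below are just illustrative, not the ones we’ll use in this exercise):

```python
import pandas as pd

# A few example ages to illustrate binning a continuous variable
# into categories for exact matching. The ages, bin edges, and
# labels are made up for illustration.
ages = pd.Series([17, 22, 35, 41, 58])

age_groups = pd.cut(
    ages,
    bins=[0, 17, 29, 39, 49, 120],
    labels=["under 18", "18-29", "30-39", "40-49", "50+"],
)
print(age_groups.tolist())
# → ['under 18', '18-29', '30-39', '40-49', '50+']
```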
Of course, one cannot always find exact matches on all variables, so what DAME does is:
Find all observations that match on all matching variables.
Figure out which matching variable is least useful in predicting the outcome of interest \(Y\), drop it, and then try to match the remaining observations on the narrowed set of matching variables.
This repeats until you run out of variables, all observations are matched, or you hit a stopping rule (namely: the quality of matches falls below a threshold).
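A toy sketch of that loop (not the real package: DAME picks which covariate to drop with a ridge regression, whereas here the drop order is just fixed by hand, and the data are made up):

```python
import pandas as pd

# Made-up data: two matching covariates plus a treatment indicator.
df = pd.DataFrame(
    {
        "treat": [1, 0, 1, 0],
        "age_bin": [2, 2, 3, 4],
        "female": [1, 1, 0, 0],
    }
)

covariates = ["age_bin", "female"]  # drop order here is assumed, not learned
unmatched = df.copy()
matched_groups = []

while covariates and not unmatched.empty:
    # Any group containing both a treated and a control unit is an
    # exact match on the current covariate set.
    for _, group in unmatched.groupby(covariates):
        if group["treat"].nunique() == 2:
            matched_groups.append(group)
            unmatched = unmatched.drop(group.index)
    # Drop the "least useful" covariate and try to match the rest.
    covariates = covariates[:-1]

print(len(matched_groups))  # here, only the first two rows match exactly
```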
In addition, the lab has also created FLAME, which does the same thing but employs some tricks to make it massively more computationally efficient, meaning it can be used on datasets with millions of observations (which most matching algorithms cannot). It’s a little less accurate, but an amazing contribution nevertheless.
Data Setup¶
To save you some time and let you focus on matching, I’ve pre-cleaned about one month’s worth of data from the US Current Population Survey data we used for our gender discrimination analysis. You can download the data from here, or read it directly with:
cps = pd.read_stata(
"https://github.com/nickeubank/MIDS_Data/blob/master"
"/Current_Population_Survey/cps_for_matching.dta?raw=true"
)
Load the data and quickly familiarize yourself with its contents.
[1]:
# Load critical packages
import pandas as pd
import numpy as np
import dame_flame
[2]:
# Load our Current Population Survey data
# a regular survey of US citizens
cps = pd.read_stata(
"https://github.com/nickeubank/MIDS_Data/blob/master"
"/Current_Population_Survey/cps_for_matching.dta?raw=true"
)
[3]:
# Take a look at the data
cps.head()
[3]:
index | annual_earnings | female | simplified_race | has_college | age | county | class94 | |
---|---|---|---|---|---|---|---|---|
0 | 151404 | NaN | 1 | 3.0 | 1 | 30 | 0-WV | Private, For Profit |
1 | 123453 | NaN | 0 | 0.0 | 0 | 21 | 251-TX | Private, For Profit |
2 | 187982 | NaN | 0 | 0.0 | 0 | 40 | 5-MA | Self-Employed, Unincorporated |
3 | 122356 | NaN | 1 | 0.0 | 1 | 27 | 0-TN | Private, Nonprofit |
4 | 210750 | 42900.0 | 1 | 0.0 | 0 | 52 | 0-IA | Private, For Profit |
Getting To Know Your Data¶
Before you start matching, it is important to examine your data to ensure that matching is feasible (you have some overlap in the features of people in the treated and untreated groups), and also that there is a reason to match: either you’re unsure about some of the functional forms at play, or you have some imbalance between the two groups.
Exercise 1¶
Show the raw difference of annual_earnings
between those with and without a college degree (has_college
). Is the difference statistically significant?
[4]:
import statsmodels.formula.api as smf
smf.ols("annual_earnings ~ has_college", cps).fit().summary()
[4]:
Dep. Variable: | annual_earnings | R-squared: | 0.063 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.063 |
Method: | Least Squares | F-statistic: | 370.2 |
Date: | Sat, 04 Mar 2023 | Prob (F-statistic): | 6.56e-80 |
Time: | 13:59:39 | Log-Likelihood: | -63018. |
No. Observations: | 5515 | AIC: | 1.260e+05 |
Df Residuals: | 5513 | BIC: | 1.261e+05 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 3.887e+04 | 336.007 | 115.669 | 0.000 | 3.82e+04 | 3.95e+04 |
has_college | 1.416e+04 | 735.820 | 19.242 | 0.000 | 1.27e+04 | 1.56e+04 |
Omnibus: | 2214.375 | Durbin-Watson: | 1.974 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 10578.287 |
Skew: | 1.910 | Prob(JB): | 0.00 |
Kurtosis: | 8.608 | Cond. No. | 2.59 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[5]:
# About 14,000 a year, and it's very significant.
Exercise 2¶
Next we can check for balance. Check the share of people in different racial groups who have college degrees. Are those differences statistically significant?
Race is coded as White Non-Hispanic (0), Black Non-Hispanic (1), Hispanic (2), Other (3).
Does the distribution also look different across counties (no need for statistical significance here)?
Does the data seem balanced?
[6]:
# This question wording is, admittedly, a little iffy.
# Basically, while we want frequency tables to do our chi2
# test, I know _I_ can't look at a frequency table and have
# any sense of whether the groups are actually balanced.
# So I like to see shares with my eyes, then use freq table to test.
[7]:
# One easy way to get differences in shares (and bi-variate significance)
smf.ols("has_college ~ C(simplified_race)", cps).fit().summary()
[7]:
Dep. Variable: | has_college | R-squared: | 0.032 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.032 |
Method: | Least Squares | F-statistic: | 122.1 |
Date: | Sat, 04 Mar 2023 | Prob (F-statistic): | 7.74e-78 |
Time: | 13:59:39 | Log-Likelihood: | -7675.0 |
No. Observations: | 11150 | AIC: | 1.536e+04 |
Df Residuals: | 11146 | BIC: | 1.539e+04 |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 0.4382 | 0.006 | 79.420 | 0.000 | 0.427 | 0.449 |
C(simplified_race)[T.1.0] | -0.1206 | 0.016 | -7.507 | 0.000 | -0.152 | -0.089 |
C(simplified_race)[T.2.0] | -0.2398 | 0.014 | -17.682 | 0.000 | -0.266 | -0.213 |
C(simplified_race)[T.3.0] | 0.0367 | 0.016 | 2.261 | 0.024 | 0.005 | 0.069 |
Omnibus: | 46681.807 | Durbin-Watson: | 1.965 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 1670.333 |
Skew: | 0.377 | Prob(JB): | 0.00 |
Kurtosis: | 1.261 | Cond. No. | 3.97 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[8]:
# Or just groupby:
cps.groupby("simplified_race")["has_college"].mean()
[8]:
simplified_race
0.0 0.438205
1.0 0.317647
2.0 0.198413
3.0 0.474900
Name: has_college, dtype: float64
[9]:
# Then for statistical significance:
ctab = pd.crosstab(cps["simplified_race"], cps["has_college"])
ctab
[9]:
has_college | 0 | 1 |
---|---|---|
simplified_race | ||
0.0 | 4282 | 3340 |
1.0 | 696 | 324 |
2.0 | 1212 | 300 |
3.0 | 523 | 473 |
[10]:
import scipy.stats
chi2, p, dof, expected = scipy.stats.chi2_contingency(ctab.values)
p
[10]:
1.2993875943569016e-76
[11]:
# Insanely significant. :)
[12]:
# And look at counties.
cps.groupby("county")[["has_college"]].describe().sort_values(("has_college", "mean"))
[12]:
has_college | ||||||||
---|---|---|---|---|---|---|---|---|
count | mean | std | min | 25% | 50% | 75% | max | |
county | ||||||||
71-MO | 4.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
700-VA | 3.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
69-NY | 1.0 | 0.000000 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
69-FL | 4.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
17-MD | 2.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
75-CA | 17.0 | 0.882353 | 0.332106 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
21-NJ | 4.0 | 1.000000 | 0.000000 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
19-NJ | 2.0 | 1.000000 | 0.000000 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
81-IN | 1.0 | 1.000000 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
171-MN | 1.0 | 1.000000 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
326 rows × 8 columns
[13]:
cps.groupby("county")[["has_college"]].mean().describe()
[13]:
has_college | |
---|---|
count | 326.000000 |
mean | 0.390058 |
std | 0.207616 |
min | 0.000000 |
25% | 0.262319 |
50% | 0.375000 |
75% | 0.500000 |
max | 1.000000 |
[14]:
# Good in the middle, but many counties have no college grads, and a few only have college grads.
Exercise 3¶
One of the other advantages of matching is that even when you have balanced data, you don’t have to go through the process of testing out different functional forms to see what fits the data best.
In our last exercise, we looked at the relationship between gender and earnings “controlling for age”, where we just put in age as a linear control. Plot a non-linear regression of annual_earnings on age (if you’re using plotnine, use geom_smooth(method="lowess"); if you’re using altair, use transform_loess (tutorial examples here)).
Does the relationship look linear?
Does this speak to why it’s nice to not have to think about functional forms with matching as much?
[15]:
import altair as alt
alt.data_transformers.enable("data_server")
alt.Chart(cps).encode(x="age", y="annual_earnings").transform_loess(
on="age", loess="annual_earnings"
).mark_line()
[15]:
[16]:
# Not even remotely linear. Thank goodness we don't have to worry about that with matching!
# Though it wouldn't be *that* hard to fit a quadratic.
Matching!¶
Because DAME is an implementation of exact matching, we have to discretize all of our continuous variables. Thankfully, in this case we only have age
, so this shouldn’t be too hard!
Exercise 4¶
Create a new variable that discretizes age into a single value for each decade of age.
Because CPS only has employment data on people 18 or over, though, include people who are 18 or 19 with the 20 year olds so that group isn’t too small, and if you see any other really small groups, please merge those too.
[17]:
cps["discretized_age"] = cps.age // 10
cps.loc[cps["discretized_age"] == 1, "discretized_age"] = 2
cps["discretized_age"].value_counts()
[17]:
3 2760
4 2551
5 2397
2 1990
6 1236
7 173
8 43
Name: discretized_age, dtype: int64
[18]:
# 70 and 80 year olds are tiny groups.
cps.loc[cps["discretized_age"] == 8, "discretized_age"] = 7
Exercise 5¶
We also have to convert our string variables into numeric variables for DAME, so convert county and class94 to numeric vectors of integers.
(Note: it’s not clear whether class94 belongs: if it reflects people choosing fields based on passion, it belongs; if people choose certain jobs because of their degrees, it’s not something we’d actually want in our regression.)
Hint: if you use pd.Categorical to convert your variable to a categorical, you can pull the underlying integer codes with .codes.
[19]:
cps["county"] = pd.Categorical(cps["county"]).codes
cps["county"].value_counts()
[19]:
41 576
200 275
12 230
33 225
51 223
...
122 1
263 1
285 1
154 1
213 1
Name: county, Length: 326, dtype: int64
[20]:
cps["class94"] = pd.Categorical(cps["class94"]).codes
cps["class94"].value_counts()
[20]:
3 7809
1 740
4 706
2 615
6 552
5 387
0 337
7 4
Name: class94, dtype: int64
Let’s Do Matching with DAME¶
Exercise 6¶
First, drop all the variables you don’t want in matching (e.g. your original age
variable), and any observations for which annual_earnings
is missing.
You will probably also have to drop a column named index
: DAME will try and match on ANY included variables, and so because there was a column called index
in the data we imported, if we leave it in DAME will try (and obviously fail) to match on index.
Also, it’s best to reset your index, as dame_flame uses index labels (e.g., the values in df.index) to identify matches. So you want to be sure those are unique.
[21]:
for_matching = cps.drop(["age", "index"], axis="columns")
for_matching = for_matching[for_matching.annual_earnings.notnull()]
for_matching = for_matching.reset_index(drop=True)
for_matching
[21]:
annual_earnings | female | simplified_race | has_college | county | class94 | discretized_age | |
---|---|---|---|---|---|---|---|
0 | 42900.0 | 1 | 0.0 | 0 | 10 | 3 | 5 |
1 | 31200.0 | 0 | 2.0 | 0 | 31 | 3 | 3 |
2 | 20020.0 | 0 | 0.0 | 1 | 8 | 3 | 6 |
3 | 22859.2 | 0 | 0.0 | 0 | 44 | 1 | 4 |
4 | 73860.8 | 0 | 0.0 | 1 | 24 | 3 | 3 |
... | ... | ... | ... | ... | ... | ... | ... |
5510 | 33800.0 | 1 | 3.0 | 0 | 247 | 3 | 3 |
5511 | 23920.0 | 0 | 3.0 | 0 | 272 | 3 | 5 |
5512 | 31200.0 | 0 | 2.0 | 0 | 246 | 3 | 2 |
5513 | 37440.0 | 0 | 0.0 | 0 | 99 | 3 | 2 |
5514 | 26000.0 | 0 | 1.0 | 0 | 23 | 2 | 5 |
5515 rows × 7 columns
Exercise 7¶
The syntax of dame_flame
is similar to the syntax of sklearn
. If you start with a dataset called my_data
with a treat
variable with treatment assignment and an outcome
variable for my outcome of interest (\(Y\)), the syntax to do basic matching would be:
import dame_flame
model = dame_flame.matching.DAME(repeats=False, verbose=3, want_pe=True)
model.fit(
my_data,
treatment_column_name="treat",
outcome_column_name="outcome",
)
result = model.predict(my_data)
Where the arguments:
repeats=False says that I only want each observation to get matched once. We’ll talk about what happens if we use repeats=True below.
verbose=3 tells DAME to report everything it’s doing as it goes.
want_pe says “please include the predictive error in your printout at each step.” This is a measure of match quality.
So run DAME on your data!
[22]:
model = dame_flame.matching.DAME(repeats=False, verbose=3, want_pe=True)
model.fit(
for_matching,
treatment_column_name="has_college",
outcome_column_name="annual_earnings",
)
result = model.predict(for_matching)
Completed iteration 0 of matching
Number of matched groups formed in total: 370
Unmatched treated units: 644 out of a total of 1150 treated units
Unmatched control units: 3187 out of a total of 4365 control units
Number of matches made this iteration: 1684
Number of matches made so far: 1684
Covariates dropped so far: set()
Predictive error of covariate set used to match: 1199312680.0957854
Completed iteration 1 of matching
Number of matched groups formed in total: 494
Unmatched treated units: 25 out of a total of 1150 treated units
Unmatched control units: 180 out of a total of 4365 control units
Number of matches made this iteration: 3626
Number of matches made so far: 5310
Covariates dropped so far: frozenset({'county'})
Predictive error of covariate set used to match: 1199421883.1095908
Completed iteration 2 of matching
Number of matched groups formed in total: 494
Unmatched treated units: 25 out of a total of 1150 treated units
Unmatched control units: 180 out of a total of 4365 control units
Number of matches made this iteration: 0
Number of matches made so far: 5310
Covariates dropped so far: frozenset({'simplified_race'})
Predictive error of covariate set used to match: 1204727749.8949614
Completed iteration 3 of matching
Number of matched groups formed in total: 505
Unmatched treated units: 8 out of a total of 1150 treated units
Unmatched control units: 129 out of a total of 4365 control units
Number of matches made this iteration: 68
Number of matches made so far: 5378
Covariates dropped so far: frozenset({'county', 'simplified_race'})
Predictive error of covariate set used to match: 1204742613.479154
Completed iteration 4 of matching
Number of matched groups formed in total: 505
Unmatched treated units: 8 out of a total of 1150 treated units
Unmatched control units: 129 out of a total of 4365 control units
Number of matches made this iteration: 0
Number of matches made so far: 5378
Covariates dropped so far: frozenset({'class94'})
Predictive error of covariate set used to match: 1205072671.3262901
Completed iteration 5 of matching
Number of matched groups formed in total: 508
Unmatched treated units: 5 out of a total of 1150 treated units
Unmatched control units: 120 out of a total of 4365 control units
Number of matches made this iteration: 12
Number of matches made so far: 5390
Covariates dropped so far: frozenset({'class94', 'county'})
Predictive error of covariate set used to match: 1205171280.4727237
Completed iteration 6 of matching
Number of matched groups formed in total: 509
Unmatched treated units: 4 out of a total of 1150 treated units
Unmatched control units: 119 out of a total of 4365 control units
Number of matches made this iteration: 2
Number of matches made so far: 5392
Covariates dropped so far: frozenset({'class94', 'simplified_race'})
Predictive error of covariate set used to match: 1210524158.7436352
Completed iteration 7 of matching
Number of matched groups formed in total: 511
Unmatched treated units: 0 out of a total of 1150 treated units
Unmatched control units: 110 out of a total of 4365 control units
Number of matches made this iteration: 13
Number of matches made so far: 5405
Covariates dropped so far: frozenset({'class94', 'county', 'simplified_race'})
Predictive error of covariate set used to match: 1210539313.933855
5405 units matched. We finished with no more treated units to match
Interpreting DAME output¶
The output you get from doing this should be reports from about 8 iterations of matching. In each iteration, you’ll see a description of the number of matches made in the iteration, the number of treatment units still unmatched, and the number of control units unmatched.
In the first iteration, the algorithm tries to match observations that match on all the variables in your data. That’s why in the first iteration, you see the set of variables being dropped is an empty set – it hasn’t dropped any variables:
Completed iteration 0 of matching
Number of matched groups formed in total: 370
Unmatched treated units: 644 out of a total of 1150 treated units
Unmatched control units: 3187 out of a total of 4365 control units
Number of matches made this iteration: 1684
Number of matches made so far: 1684
Covariates dropped so far: set()
Predictive error of covariate set used to match: 1199312680.0957854
(Note: depending on how you binned ages, you may get slightly different results than these.)
But as we can see from this output, the algorithm found 1,684 perfect matches—pairs of observations (one treated, one untreated) that had exactly the same value of all the variables we included. But we also see we still have 644 unmatched treated units, so what do we do?
The answer is that if we want to match more of our treated observations, we have to try to match on a subset of our variables.
But which variable should we drop? This is the secret sauce of DAME. DAME picks the variable to drop by trying to predict our outcome \(Y\) using all our variables (by default using a ridge regression), then dropping the matching variable that contributes the least to that prediction. Since our goal in matching is to eliminate baseline differences (\(E(Y_0|D=1) - E(Y_0|D=0)\)), dropping the covariates least related to \(Y\) makes sense.
As a result, in the second iteration (called iteration 1, since it uses 0-based indexing), we see that the variable it drops first is county
, and it’s subsequently able to make another 3,626 new matches on the remaining variables!
Completed iteration 1 of matching
Number of matched groups formed in total: 494
Unmatched treated units: 25 out of a total of 1150 treated units
Unmatched control units: 180 out of a total of 4365 control units
Number of matches made this iteration: 3626
Number of matches made so far: 5310
Covariates dropped so far: frozenset({'county'})
Predictive error of covariate set used to match: 1199421883.1095908
And so DAME continues until after 8 iterations, it’s matched all treated observations.
Exercise 8¶
Congratulations! You just ran your first one-to-many matching!
The next step is to think about which of the matches DAME generated are good enough for inclusion in our analysis. As you may recall, one of the choices you have to make as a researcher when matching is how “good” a match has to be in order to be included in your final dataset. By default, DAME will keep dropping matching variables until it has matched all the treated observations or runs out of variables, no matter how bad the matches become: if it ends up with a treated observation and a control observation that can only be matched on gender, it will match them on gender alone, even though we probably don’t think that’s a “good” match.
The way to control this behavior is to manually tell DAME when to stop using the early_stop_iterations argument.
So when is a good time to stop? There’s no objective or “right” answer to that question. It fundamentally comes down to a trade-off between bias (which gets higher as you allow more low-quality matches into your data) and variance (which will go down as you increase the number of matches you keep).
But one way to start the process of picking a cut point is to examine how the quality of matches evolves over iterations. DAME keeps this information in model.pe_each_iter
. This shows, for each iteration, the “prediction error” resulting from dropping the variables excluded in each step. This “prediction error” is the difference in the mean-squared error of regressing \(Y\) on our matching variables (by default in a ridge regression) with all variables versus with the subset being used for matching in a given iteration. By design, of course, this is always increasing.
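The idea can be sketched like this (a simplified illustration using plain least squares on simulated data, rather than dame_flame’s internal ridge regression):

```python
import numpy as np

# Simulated data: three matching covariates with strong, moderate,
# and weak relationships to the outcome y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([5.0, 1.0, 0.1]) + rng.normal(size=200)

def mse(features):
    # In-sample mean squared error of a least-squares fit.
    coefs, *_ = np.linalg.lstsq(features, y, rcond=None)
    return np.mean((y - features @ coefs) ** 2)

# Dropping the weakly predictive third covariate barely raises the error,
# while dropping the strongly predictive first covariate raises it a lot,
# so the third covariate would be dropped first.
pe_drop_weak = mse(X[:, :2]) - mse(X)
pe_drop_strong = mse(X[:, 1:]) - mse(X)
print(pe_drop_weak < pe_drop_strong)
# → True
```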
To see how this evolves, plot your pe values against iteration numbers. You can also see the pe values for each iteration reported in the output from when DAME ran above, if you want to make sure you’re lining up the errors with the right iterations.
Are there any points where the match quality seems to fall off dramatically?
[23]:
model.pe_each_iter
[23]:
[1199312680.0957854,
1199421883.1095908,
1204727749.8949614,
1204742613.479154,
1205072671.3262901,
1205171280.4727237,
1210524158.7436352,
1210539313.933855]
[24]:
for_pe = pd.DataFrame(
{"pe": model.pe_each_iter, "i": range(0, len(model.pe_each_iter))}
)
for_pe
[24]:
pe | i | |
---|---|---|
0 | 1.199313e+09 | 0 |
1 | 1.199422e+09 | 1 |
2 | 1.204728e+09 | 2 |
3 | 1.204743e+09 | 3 |
4 | 1.205073e+09 | 4 |
5 | 1.205171e+09 | 5 |
6 | 1.210524e+09 | 6 |
7 | 1.210539e+09 | 7 |
[25]:
alt.Chart(for_pe).encode(x="i", y=alt.Y("pe", scale=alt.Scale(zero=False))).mark_line()
[25]:
[26]:
# Yup! Iterations 2 and 6 are really the big ones...
Exercise 9¶
Suppose we want to ensure we have at least 5,000 observations in our matched data—where might you cut off the data to get a sample size of at least that but before a big quality falloff?
[27]:
# I'd stop after iteration 1 (the second iteration)—things fall off fast
# starting after that, but with very few added matches.
Exercise 10¶
Re-run your matching, stopping at the point you picked above using early_stop_iterations
.
[28]:
model = dame_flame.matching.DAME(
repeats=False, verbose=3, want_pe=True, early_stop_iterations=1
)
model.fit(
for_matching,
treatment_column_name="has_college",
outcome_column_name="annual_earnings",
)
result = model.predict(for_matching)
Completed iteration 0 of matching
Number of matched groups formed in total: 370
Unmatched treated units: 644 out of a total of 1150 treated units
Unmatched control units: 3187 out of a total of 4365 control units
Number of matches made this iteration: 1684
Number of matches made so far: 1684
Covariates dropped so far: set()
Predictive error of covariate set used to match: 1199312680.0957854
Completed iteration 1 of matching
Number of matched groups formed in total: 494
Unmatched treated units: 25 out of a total of 1150 treated units
Unmatched control units: 180 out of a total of 4365 control units
Number of matches made this iteration: 3626
Number of matches made so far: 5310
Covariates dropped so far: frozenset({'county'})
Predictive error of covariate set used to match: 1199421883.1095908
5310 units matched. We stopped after iteration 1
Getting Back a Dataset¶
OK, my one current complaint with DAME is that it doesn’t just give you back a nice dataset of your matches for analysis. If we look at our result (result), it’s almost what we want, except it has dropped our treatment and outcome columns, and it has put a string * in any entry whose value wasn’t used for matching:
female simplified_race county class94 discretized_age
0 1.0 0.0 10.0 3.0 5.0
1 0.0 2.0 * 3.0 3.0
2 0.0 0.0 8.0 3.0 6.0
3 0.0 0.0 * 1.0 4.0
4 0.0 0.0 24.0 3.0 3.0
So for now (though I think this will get updated in the package), we’ll have to do it ourselves! Just copy-paste this:
def get_dataframe(model, result_of_fit):
    # Get original data
    better = model.input_data.loc[result_of_fit.index]
    if not better.index.is_unique:
        raise ValueError("Need index values in input data to be unique")
    # Get match groups for clustering
    better["match_group"] = np.nan
    better["match_group_size"] = np.nan
    for idx, group in enumerate(model.units_per_group):
        better.loc[group, "match_group"] = idx
        better.loc[group, "match_group_size"] = len(group)
    # Get weights. I THINK this is right?! At least with repeats=False?
    t = model.treatment_column_name
    better["t_in_group"] = better.groupby("match_group")[t].transform(np.sum)
    # Make weights
    better["weights"] = np.nan
    better.loc[better[t] == 1, "weights"] = 1  # treatments are 1
    # Controls start as proportional to num of treatments
    # each observation is matched to.
    better.loc[better[t] == 0, "weights"] = better["t_in_group"] / (
        better["match_group_size"] - better["t_in_group"]
    )
    # Then re-normalize for num unique control observations.
    control_weights = better[better[t] == 0]["weights"].sum()
    num_control_obs = len(better[better[t] == 0].index.drop_duplicates())
    renormalization = num_control_obs / control_weights
    better.loc[better[t] == 0, "weights"] = (
        better.loc[better[t] == 0, "weights"] * renormalization
    )
    assert better.weights.notnull().all()
    better = better.drop(["t_in_group"], axis="columns")
    # Make sure right length and values!
    assert len(result_of_fit) == len(better)
    assert better.loc[better[t] == 0, "weights"].sum() == num_control_obs
    return better
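To see what the control weights work out to, here’s a tiny made-up example: one match group with one treated and two control units, and another with one treated and one control. Each control’s weight starts as (treated in group) / (controls in group), and control weights are then rescaled to sum to the number of controls:

```python
import numpy as np
import pandas as pd

# Made-up match groups for illustration.
toy = pd.DataFrame(
    {
        "treat": [1, 0, 0, 1, 0],
        "match_group": [0, 0, 0, 1, 1],
    }
)
toy["group_size"] = toy.groupby("match_group")["treat"].transform("size")
toy["t_in_group"] = toy.groupby("match_group")["treat"].transform("sum")

# Treated units get weight 1; each control's weight starts as the number
# of treated units in its group divided by the number of controls.
toy["weights"] = np.where(
    toy["treat"] == 1,
    1.0,
    toy["t_in_group"] / (toy["group_size"] - toy["t_in_group"]),
)

# Re-normalize so control weights sum to the number of controls.
controls = toy["treat"] == 0
toy.loc[controls, "weights"] *= controls.sum() / toy.loc[controls, "weights"].sum()
print(toy["weights"].tolist())
# → [1.0, 0.75, 0.75, 1.0, 1.5]
```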
Exercise 11¶
Copy-paste that code and run it with your (fit) model and the result you got back from model.predict. Then we’ll work with the output of that. You should get back a single dataframe of the same length as your matched result.
[29]:
result
[29]:
female | simplified_race | county | class94 | discretized_age | |
---|---|---|---|---|---|
0 | 1.0 | 0.0 | 10.0 | 3.0 | 5.0 |
1 | 0.0 | 2.0 | * | 3.0 | 3.0 |
2 | 0.0 | 0.0 | 8.0 | 3.0 | 6.0 |
3 | 0.0 | 0.0 | * | 1.0 | 4.0 |
4 | 0.0 | 0.0 | 24.0 | 3.0 | 3.0 |
... | ... | ... | ... | ... | ... |
5509 | 0.0 | 0.0 | * | 3.0 | 6.0 |
5510 | 1.0 | 3.0 | 247.0 | 3.0 | 3.0 |
5511 | 0.0 | 3.0 | * | 3.0 | 5.0 |
5512 | 0.0 | 2.0 | 246.0 | 3.0 | 2.0 |
5513 | 0.0 | 0.0 | 99.0 | 3.0 | 2.0 |
5310 rows × 5 columns
[30]:
def get_dataframe(model, result_of_fit):
    # Get original data
    better = model.input_data.loc[result_of_fit.index]
    if not better.index.is_unique:
        raise ValueError("Need index values in input data to be unique")
    # Get match groups for clustering
    better["match_group"] = np.nan
    better["match_group_size"] = np.nan
    for idx, group in enumerate(model.units_per_group):
        better.loc[group, "match_group"] = idx
        better.loc[group, "match_group_size"] = len(group)
    # Get weights. I THINK this is right?! At least with repeats=False?
    t = model.treatment_column_name
    better["t_in_group"] = better.groupby("match_group")[t].transform(np.sum)
    # Make weights
    better["weights"] = np.nan
    better.loc[better[t] == 1, "weights"] = 1  # treatments are 1
    # Controls start as proportional to num of treatments
    # each observation is matched to.
    better.loc[better[t] == 0, "weights"] = better["t_in_group"] / (
        better["match_group_size"] - better["t_in_group"]
    )
    # Then re-normalize for num unique control observations.
    control_weights = better[better[t] == 0]["weights"].sum()
    num_control_obs = len(better[better[t] == 0].index.drop_duplicates())
    renormalization = num_control_obs / control_weights
    better.loc[better[t] == 0, "weights"] = (
        better.loc[better[t] == 0, "weights"] * renormalization
    )
    assert better.weights.notnull().all()
    better = better.drop(["t_in_group"], axis="columns")
    # Make sure right length and values!
    assert len(result_of_fit) == len(better)
    assert better.loc[better[t] == 0, "weights"].sum() == num_control_obs
    return better
[31]:
matched_data = get_dataframe(model, result)
[32]:
matched_data.head()
[32]:
annual_earnings | female | simplified_race | has_college | county | class94 | discretized_age | match_group | match_group_size | weights | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 42900.0 | 1 | 0.0 | 0 | 10 | 3 | 5 | 59.0 | 5.0 | 0.930000 |
1 | 31200.0 | 0 | 2.0 | 0 | 31 | 3 | 3 | 411.0 | 108.0 | 0.070189 |
2 | 20020.0 | 0 | 0.0 | 1 | 8 | 3 | 6 | 52.0 | 3.0 | 1.000000 |
3 | 22859.2 | 0 | 0.0 | 0 | 44 | 1 | 4 | 424.0 | 28.0 | 1.240000 |
4 | 73860.8 | 0 | 0.0 | 1 | 24 | 3 | 3 | 106.0 | 7.0 | 1.000000 |
Check Your Matches and Analyze¶
Exercise 12¶
We previously tested balance on simplified_race
, and by county. Check those again. Are there still statistically significant differences in college education by simplified_race
?
Note that when you test for this, you’ll need to take into account the weights column you got back from get_dataframe. What DAME does is not actually the 1-to-1 matching described in our readings; instead, it puts however many observations it finds that match exactly into the same “group.” (These groups are identified in the dataframe you got from get_dataframe by the column match_group, and the size of each group is in match_group_size.)
So to analyze the data, you need to use the wls
(weighted least squares) function in statsmodels
. For example, if your data is called matched_data
, you might run:
smf.wls(
"has_college ~ C(simplified_race)", matched_data, weights=matched_data["weights"]
).fit().summary()
[33]:
import statsmodels.formula.api as smf
smf.wls(
"has_college ~ C(simplified_race)", matched_data, weights=matched_data["weights"]
).fit().summary()
[33]:
Dep. Variable: | has_college | R-squared: | 0.000 |
---|---|---|---|
Model: | WLS | Adj. R-squared: | -0.001 |
Method: | Least Squares | F-statistic: | 1.134e-12 |
Date: | Sat, 04 Mar 2023 | Prob (F-statistic): | 1.00 |
Time: | 13:59:41 | Log-Likelihood: | -3736.0 |
No. Observations: | 5310 | AIC: | 7480. |
Df Residuals: | 5306 | BIC: | 7506. |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 0.2119 | 0.007 | 31.608 | 0.000 | 0.199 | 0.225 |
C(simplified_race)[T.1.0] | 3.469e-17 | 0.018 | 1.92e-15 | 1.000 | -0.036 | 0.036 |
C(simplified_race)[T.2.0] | -5.378e-17 | 0.019 | -2.86e-15 | 1.000 | -0.037 | 0.037 |
C(simplified_race)[T.3.0] | 1.18e-16 | 0.020 | 5.83e-15 | 1.000 | -0.040 | 0.040 |
Omnibus: | 860.389 | Durbin-Watson: | 2.000 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 1353.227 |
Skew: | 1.234 | Prob(JB): | 1.41e-294 |
Kurtosis: | 2.851 | Cond. No. | 3.95 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Exercise 13¶
Now use a weighted least squares regression on your matched data to regress annual earnings on just having a college education. What is the apparent effect of a BA? How does that compare to our initial estimate using the raw CPS data (before matching)?
[34]:
smf.wls(
"annual_earnings ~ has_college", matched_data, weights=matched_data["weights"]
).fit().summary()
[34]:
Dep. Variable: | annual_earnings | R-squared: | 0.058 |
---|---|---|---|
Model: | WLS | Adj. R-squared: | 0.057 |
Method: | Least Squares | F-statistic: | 324.1 |
Date: | Sat, 04 Mar 2023 | Prob (F-statistic): | 2.19e-70 |
Time: | 13:59:41 | Log-Likelihood: | -61753. |
No. Observations: | 5310 | AIC: | 1.235e+05 |
Df Residuals: | 5308 | BIC: | 1.235e+05 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 3.909e+04 | 351.293 | 111.287 | 0.000 | 3.84e+04 | 3.98e+04 |
has_college | 1.374e+04 | 763.203 | 18.003 | 0.000 | 1.22e+04 | 1.52e+04 |
Omnibus: | 2934.035 | Durbin-Watson: | 2.006 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 33100.529 |
Skew: | 2.424 | Prob(JB): | 0.00 |
Kurtosis: | 14.230 | Cond. No. | 2.58 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[35]:
# dame_flame.utils.post_processing.ATE(matching_object=model)
Exercise 14¶
Now include our other matching variables as controls (i.e., all the variables you gave DAME to match on). Does the coefficient on has_college change?
[36]:
smf.wls(
"annual_earnings ~ has_college + C(simplified_race)"
" + C(discretized_age) + female + C(county)",
matched_data,
weights=matched_data["weights"],
).fit().summary()
[36]:
Dep. Variable: | annual_earnings | R-squared: | 0.238 |
---|---|---|---|
Model: | WLS | Adj. R-squared: | 0.188 |
Method: | Least Squares | F-statistic: | 4.786 |
Date: | Sat, 04 Mar 2023 | Prob (F-statistic): | 1.62e-132 |
Time: | 13:59:42 | Log-Likelihood: | -61189. |
No. Observations: | 5310 | AIC: | 1.230e+05 |
Df Residuals: | 4984 | BIC: | 1.252e+05 |
Df Model: | 325 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 4.761e+04 | 2429.505 | 19.595 | 0.000 | 4.28e+04 | 5.24e+04 |
C(simplified_race)[T.1.0] | -8344.9150 | 1067.331 | -7.818 | 0.000 | -1.04e+04 | -6252.476 |
C(simplified_race)[T.2.0] | -6753.9175 | 1140.523 | -5.922 | 0.000 | -8989.844 | -4517.991 |
C(simplified_race)[T.3.0] | -3220.6308 | 1202.997 | -2.677 | 0.007 | -5579.035 | -862.227 |
C(discretized_age)[T.3] | 8584.0505 | 868.037 | 9.889 | 0.000 | 6882.316 | 1.03e+04 |
C(discretized_age)[T.4] | 1.251e+04 | 923.078 | 13.558 | 0.000 | 1.07e+04 | 1.43e+04 |
C(discretized_age)[T.5] | 1.266e+04 | 964.214 | 13.131 | 0.000 | 1.08e+04 | 1.46e+04 |
C(discretized_age)[T.6] | 9235.0616 | 1189.062 | 7.767 | 0.000 | 6903.976 | 1.16e+04 |
C(discretized_age)[T.7] | 1.347e+04 | 2975.342 | 4.528 | 0.000 | 7639.580 | 1.93e+04 |
C(county)[T.1] | -1.114e+04 | 3231.653 | -3.446 | 0.001 | -1.75e+04 | -4799.550 |
C(county)[T.2] | -1.279e+04 | 3245.734 | -3.942 | 0.000 | -1.92e+04 | -6430.115 |
C(county)[T.3] | -9142.9921 | 1.19e+04 | -0.771 | 0.441 | -3.24e+04 | 1.41e+04 |
C(county)[T.4] | -6471.4363 | 2990.234 | -2.164 | 0.030 | -1.23e+04 | -609.261 |
C(county)[T.5] | -6378.3577 | 4178.131 | -1.527 | 0.127 | -1.46e+04 | 1812.617 |
C(county)[T.6] | -1.627e+04 | 1.04e+04 | -1.566 | 0.118 | -3.66e+04 | 4103.579 |
C(county)[T.7] | -1.023e+04 | 3835.754 | -2.666 | 0.008 | -1.77e+04 | -2706.649 |
C(county)[T.8] | -1.153e+04 | 3175.916 | -3.629 | 0.000 | -1.78e+04 | -5300.388 |
C(county)[T.9] | -1.382e+04 | 5130.249 | -2.694 | 0.007 | -2.39e+04 | -3762.096 |
C(county)[T.10] | -1.501e+04 | 3427.926 | -4.380 | 0.000 | -2.17e+04 | -8293.693 |
C(county)[T.11] | -1.418e+04 | 3040.008 | -4.664 | 0.000 | -2.01e+04 | -8218.173 |
C(county)[T.12] | -6849.3954 | 3121.285 | -2.194 | 0.028 | -1.3e+04 | -730.303 |
C(county)[T.13] | -1.381e+04 | 3368.446 | -4.101 | 0.000 | -2.04e+04 | -7209.235 |
C(county)[T.14] | -1.369e+04 | 3918.748 | -3.493 | 0.000 | -2.14e+04 | -6007.378 |
C(county)[T.15] | -9190.4689 | 4799.618 | -1.915 | 0.056 | -1.86e+04 | 218.895 |
C(county)[T.16] | -1.061e+04 | 3616.372 | -2.935 | 0.003 | -1.77e+04 | -3523.982 |
C(county)[T.17] | -1.601e+04 | 7731.556 | -2.070 | 0.038 | -3.12e+04 | -848.909 |
C(county)[T.18] | -1788.7046 | 4787.690 | -0.374 | 0.709 | -1.12e+04 | 7597.275 |
C(county)[T.19] | -1.684e+04 | 6555.900 | -2.569 | 0.010 | -2.97e+04 | -3989.067 |
C(county)[T.20] | -1.4e+04 | 4328.098 | -3.235 | 0.001 | -2.25e+04 | -5515.772 |
C(county)[T.21] | -6184.3776 | 3595.547 | -1.720 | 0.085 | -1.32e+04 | 864.476 |
C(county)[T.22] | -1.826e+04 | 3694.928 | -4.941 | 0.000 | -2.55e+04 | -1.1e+04 |
C(county)[T.23] | -1.242e+04 | 3244.837 | -3.828 | 0.000 | -1.88e+04 | -6059.940 |
C(county)[T.24] | -1.104e+04 | 3171.193 | -3.482 | 0.001 | -1.73e+04 | -4824.300 |
C(county)[T.25] | -1.506e+04 | 3496.105 | -4.309 | 0.000 | -2.19e+04 | -8211.049 |
C(county)[T.26] | -1.391e+04 | 3137.745 | -4.432 | 0.000 | -2.01e+04 | -7755.819 |
C(county)[T.27] | -1.337e+04 | 3345.523 | -3.997 | 0.000 | -1.99e+04 | -6814.638 |
C(county)[T.28] | -1.297e+04 | 5671.204 | -2.287 | 0.022 | -2.41e+04 | -1853.870 |
C(county)[T.29] | -4845.0790 | 5203.125 | -0.931 | 0.352 | -1.5e+04 | 5355.335 |
C(county)[T.30] | -2248.8602 | 5067.398 | -0.444 | 0.657 | -1.22e+04 | 7685.469 |
C(county)[T.31] | -1.155e+04 | 4710.979 | -2.452 | 0.014 | -2.08e+04 | -2313.897 |
C(county)[T.32] | -7114.5598 | 3632.908 | -1.958 | 0.050 | -1.42e+04 | 7.538 |
C(county)[T.33] | -1.272e+04 | 3067.035 | -4.149 | 0.000 | -1.87e+04 | -6711.354 |
C(county)[T.34] | -1.32e+04 | 3443.339 | -3.833 | 0.000 | -1.99e+04 | -6447.428 |
C(county)[T.35] | -9324.2429 | 3384.603 | -2.755 | 0.006 | -1.6e+04 | -2688.932 |
C(county)[T.36] | -1.505e+04 | 3658.198 | -4.115 | 0.000 | -2.22e+04 | -7881.001 |
C(county)[T.37] | -1.023e+04 | 3762.247 | -2.720 | 0.007 | -1.76e+04 | -2856.766 |
C(county)[T.38] | -1.308e+04 | 3310.408 | -3.951 | 0.000 | -1.96e+04 | -6590.153 |
C(county)[T.39] | -1.177e+04 | 3252.230 | -3.618 | 0.000 | -1.81e+04 | -5391.513 |
C(county)[T.40] | -1.516e+04 | 3677.299 | -4.123 | 0.000 | -2.24e+04 | -7954.028 |
C(county)[T.41] | -9224.6049 | 2690.524 | -3.429 | 0.001 | -1.45e+04 | -3949.994 |
C(county)[T.42] | -1.365e+04 | 3460.095 | -3.944 | 0.000 | -2.04e+04 | -6864.991 |
C(county)[T.43] | -1.165e+04 | 3698.161 | -3.151 | 0.002 | -1.89e+04 | -4404.402 |
C(county)[T.44] | -1.07e+04 | 2961.782 | -3.611 | 0.000 | -1.65e+04 | -4889.799 |
C(county)[T.45] | -7052.4496 | 3127.876 | -2.255 | 0.024 | -1.32e+04 | -920.437 |
C(county)[T.46] | -9064.0230 | 3380.497 | -2.681 | 0.007 | -1.57e+04 | -2436.761 |
C(county)[T.47] | -1.681e+04 | 3013.844 | -5.579 | 0.000 | -2.27e+04 | -1.09e+04 |
C(county)[T.48] | -8897.8731 | 3296.249 | -2.699 | 0.007 | -1.54e+04 | -2435.774 |
C(county)[T.49] | 2.125e+04 | 7993.324 | 2.659 | 0.008 | 5580.256 | 3.69e+04 |
C(county)[T.50] | -2.251e+04 | 1.13e+04 | -1.997 | 0.046 | -4.46e+04 | -409.459 |
C(county)[T.51] | -5880.2544 | 3452.043 | -1.703 | 0.089 | -1.26e+04 | 887.270 |
C(county)[T.52] | -1.006e+04 | 8297.064 | -1.213 | 0.225 | -2.63e+04 | 6201.302 |
C(county)[T.53] | -7396.8710 | 1.97e+04 | -0.376 | 0.707 | -4.59e+04 | 3.12e+04 |
C(county)[T.54] | -1.349e+04 | 8853.843 | -1.524 | 0.128 | -3.08e+04 | 3865.030 |
C(county)[T.55] | 6016.8132 | 9691.581 | 0.621 | 0.535 | -1.3e+04 | 2.5e+04 |
C(county)[T.56] | -7435.2407 | 7281.280 | -1.021 | 0.307 | -2.17e+04 | 6839.272 |
C(county)[T.57] | 8434.9441 | 1.4e+04 | 0.604 | 0.546 | -1.89e+04 | 3.58e+04 |
C(county)[T.58] | -5918.5652 | 5278.657 | -1.121 | 0.262 | -1.63e+04 | 4429.925 |
C(county)[T.59] | -1.631e+04 | 9894.838 | -1.649 | 0.099 | -3.57e+04 | 3086.114 |
C(county)[T.60] | -1.499e+04 | 6719.761 | -2.231 | 0.026 | -2.82e+04 | -1819.273 |
C(county)[T.61] | -8015.9218 | 1.09e+04 | -0.732 | 0.464 | -2.95e+04 | 1.34e+04 |
C(county)[T.62] | 1.171e+04 | 1.32e+04 | 0.884 | 0.377 | -1.43e+04 | 3.77e+04 |
C(county)[T.63] | -1.124e+04 | 8626.461 | -1.303 | 0.193 | -2.82e+04 | 5671.473 |
C(county)[T.64] | 1.57e+04 | 8596.970 | 1.826 | 0.068 | -1156.314 | 3.26e+04 |
C(county)[T.65] | -1.964e+04 | 9254.939 | -2.122 | 0.034 | -3.78e+04 | -1498.590 |
C(county)[T.66] | -2.286e+04 | 1.47e+04 | -1.550 | 0.121 | -5.18e+04 | 6046.864 |
C(county)[T.67] | -1.286e+04 | 2.97e+04 | -0.434 | 0.664 | -7.1e+04 | 4.53e+04 |
C(county)[T.68] | 265.8046 | 9594.437 | 0.028 | 0.978 | -1.85e+04 | 1.91e+04 |
C(county)[T.69] | -1.382e+04 | 9042.247 | -1.528 | 0.127 | -3.15e+04 | 3909.886 |
C(county)[T.70] | -2.086e+04 | 1.33e+04 | -1.567 | 0.117 | -4.7e+04 | 5231.063 |
C(county)[T.71] | -1.914e+04 | 8999.435 | -2.127 | 0.033 | -3.68e+04 | -1501.949 |
C(county)[T.72] | 1.203e+04 | 1.91e+04 | 0.630 | 0.528 | -2.54e+04 | 4.95e+04 |
C(county)[T.73] | -4151.5319 | 5648.737 | -0.735 | 0.462 | -1.52e+04 | 6922.479 |
C(county)[T.74] | -1.341e+04 | 7534.403 | -1.780 | 0.075 | -2.82e+04 | 1357.560 |
C(county)[T.75] | -1.719e+04 | 3924.766 | -4.381 | 0.000 | -2.49e+04 | -9500.690 |
C(county)[T.76] | -3.746e+04 | 2.62e+04 | -1.430 | 0.153 | -8.88e+04 | 1.39e+04 |
C(county)[T.77] | 237.8624 | 6897.951 | 0.034 | 0.972 | -1.33e+04 | 1.38e+04 |
C(county)[T.78] | 6770.8989 | 2.24e+04 | 0.302 | 0.763 | -3.72e+04 | 5.07e+04 |
C(county)[T.79] | 4.146e+04 | 8462.736 | 4.899 | 0.000 | 2.49e+04 | 5.8e+04 |
C(county)[T.80] | -1.741e+04 | 5589.953 | -3.115 | 0.002 | -2.84e+04 | -6454.909 |
C(county)[T.81] | -1.143e+04 | 5721.800 | -1.998 | 0.046 | -2.27e+04 | -216.304 |
C(county)[T.82] | -1.658e+04 | 1.61e+04 | -1.032 | 0.302 | -4.81e+04 | 1.49e+04 |
C(county)[T.83] | -2.307e+04 | 1.25e+04 | -1.840 | 0.066 | -4.77e+04 | 1514.953 |
C(county)[T.84] | -8999.9730 | 1.22e+04 | -0.740 | 0.459 | -3.28e+04 | 1.48e+04 |
C(county)[T.85] | 3419.8384 | 9046.856 | 0.378 | 0.705 | -1.43e+04 | 2.12e+04 |
C(county)[T.86] | 1.824e+04 | 1.43e+04 | 1.279 | 0.201 | -9720.390 | 4.62e+04 |
C(county)[T.87] | -2.403e+04 | 1.09e+04 | -2.205 | 0.027 | -4.54e+04 | -2667.316 |
C(county)[T.88] | -1.326e+04 | 1.74e+04 | -0.763 | 0.445 | -4.73e+04 | 2.08e+04 |
C(county)[T.89] | -1.574e+04 | 8531.472 | -1.845 | 0.065 | -3.25e+04 | 985.816 |
C(county)[T.90] | -8870.8793 | 4968.614 | -1.785 | 0.074 | -1.86e+04 | 869.791 |
C(county)[T.91] | -1.04e+04 | 9791.880 | -1.062 | 0.288 | -2.96e+04 | 8799.744 |
C(county)[T.92] | -1.687e+04 | 1.17e+04 | -1.441 | 0.150 | -3.98e+04 | 6076.622 |
C(county)[T.93] | -4593.8182 | 9052.966 | -0.507 | 0.612 | -2.23e+04 | 1.32e+04 |
C(county)[T.94] | -66.5470 | 9761.637 | -0.007 | 0.995 | -1.92e+04 | 1.91e+04 |
C(county)[T.95] | -1.199e+04 | 5085.007 | -2.357 | 0.018 | -2.2e+04 | -2017.441 |
C(county)[T.96] | -1.27e+04 | 1.03e+04 | -1.228 | 0.219 | -3.3e+04 | 7572.284 |
C(county)[T.97] | -2.457e+04 | 9432.408 | -2.605 | 0.009 | -4.31e+04 | -6080.404 |
C(county)[T.98] | 2197.3011 | 1.53e+04 | 0.144 | 0.886 | -2.78e+04 | 3.22e+04 |
C(county)[T.99] | -9289.2093 | 3098.282 | -2.998 | 0.003 | -1.54e+04 | -3215.213 |
C(county)[T.100] | 1.121e+04 | 7876.656 | 1.423 | 0.155 | -4235.032 | 2.66e+04 |
C(county)[T.101] | -1.684e+04 | 9598.189 | -1.754 | 0.079 | -3.57e+04 | 1979.720 |
C(county)[T.102] | -2597.8743 | 1.35e+04 | -0.192 | 0.847 | -2.91e+04 | 2.39e+04 |
C(county)[T.103] | -1.014e+04 | 9066.754 | -1.118 | 0.264 | -2.79e+04 | 7638.542 |
C(county)[T.104] | -252.2631 | 1.16e+04 | -0.022 | 0.983 | -2.3e+04 | 2.25e+04 |
C(county)[T.105] | -1.145e+04 | 1.3e+04 | -0.880 | 0.379 | -3.7e+04 | 1.41e+04 |
C(county)[T.106] | -3.539e+04 | 3.35e+04 | -1.055 | 0.291 | -1.01e+05 | 3.03e+04 |
C(county)[T.107] | -3.364e+04 | 1.71e+04 | -1.962 | 0.050 | -6.72e+04 | -29.720 |
C(county)[T.109] | -4795.8027 | 8034.001 | -0.597 | 0.551 | -2.05e+04 | 1.1e+04 |
C(county)[T.110] | -8716.5745 | 1.06e+04 | -0.822 | 0.411 | -2.95e+04 | 1.21e+04 |
C(county)[T.111] | -1.946e+04 | 2.19e+04 | -0.889 | 0.374 | -6.24e+04 | 2.35e+04 |
C(county)[T.112] | -7351.2007 | 1.42e+04 | -0.516 | 0.606 | -3.53e+04 | 2.06e+04 |
C(county)[T.113] | -2.293e+04 | 2.6e+04 | -0.882 | 0.378 | -7.39e+04 | 2.8e+04 |
C(county)[T.114] | -1.908e+04 | 1.61e+04 | -1.185 | 0.236 | -5.07e+04 | 1.25e+04 |
C(county)[T.115] | -2798.0105 | 1e+04 | -0.280 | 0.780 | -2.24e+04 | 1.68e+04 |
C(county)[T.116] | -8785.2988 | 1.35e+04 | -0.650 | 0.516 | -3.53e+04 | 1.77e+04 |
C(county)[T.117] | -2.473e+04 | 4.28e+04 | -0.577 | 0.564 | -1.09e+05 | 5.92e+04 |
C(county)[T.118] | -1.698e+04 | 1.1e+04 | -1.538 | 0.124 | -3.86e+04 | 4669.861 |
C(county)[T.120] | -1.704e+04 | 1.32e+04 | -1.287 | 0.198 | -4.3e+04 | 8908.447 |
C(county)[T.121] | -2.767e+04 | 1.32e+04 | -2.093 | 0.036 | -5.36e+04 | -1747.007 |
C(county)[T.123] | -1.291e+04 | 5091.347 | -2.536 | 0.011 | -2.29e+04 | -2931.559 |
C(county)[T.124] | -3.209e+04 | 1e+04 | -3.197 | 0.001 | -5.18e+04 | -1.24e+04 |
C(county)[T.125] | -1.795e+04 | 1.25e+04 | -1.440 | 0.150 | -4.24e+04 | 6490.095 |
C(county)[T.126] | -6948.1460 | 9793.690 | -0.709 | 0.478 | -2.61e+04 | 1.23e+04 |
C(county)[T.127] | -4111.7736 | 9598.180 | -0.428 | 0.668 | -2.29e+04 | 1.47e+04 |
C(county)[T.128] | -1.407e+04 | 3.35e+04 | -0.420 | 0.675 | -7.98e+04 | 5.17e+04 |
C(county)[T.129] | -1.338e+04 | 1.22e+04 | -1.094 | 0.274 | -3.73e+04 | 1.06e+04 |
C(county)[T.130] | -2.958e+04 | 1.24e+04 | -2.379 | 0.017 | -5.4e+04 | -5205.688 |
C(county)[T.131] | -9462.1890 | 2.13e+04 | -0.445 | 0.656 | -5.11e+04 | 3.22e+04 |
C(county)[T.132] | -7889.2394 | 5638.330 | -1.399 | 0.162 | -1.89e+04 | 3164.369 |
C(county)[T.133] | 2175.4884 | 1.51e+04 | 0.144 | 0.886 | -2.75e+04 | 3.18e+04 |
C(county)[T.134] | 5.467e+04 | 1.8e+04 | 3.045 | 0.002 | 1.95e+04 | 8.99e+04 |
C(county)[T.135] | -2.083e+04 | 1.38e+04 | -1.509 | 0.131 | -4.79e+04 | 6228.141 |
C(county)[T.136] | -4420.8608 | 5997.620 | -0.737 | 0.461 | -1.62e+04 | 7337.113 |
C(county)[T.137] | -4087.2387 | 8100.626 | -0.505 | 0.614 | -2e+04 | 1.18e+04 |
C(county)[T.138] | -7939.1789 | 8029.739 | -0.989 | 0.323 | -2.37e+04 | 7802.643 |
C(county)[T.139] | -692.3178 | 1.02e+04 | -0.068 | 0.946 | -2.08e+04 | 1.94e+04 |
C(county)[T.140] | -1.651e+04 | 1.37e+04 | -1.202 | 0.229 | -4.34e+04 | 1.04e+04 |
C(county)[T.141] | -8251.4595 | 1.22e+04 | -0.676 | 0.499 | -3.22e+04 | 1.57e+04 |
C(county)[T.142] | -1373.8309 | 2.13e+04 | -0.065 | 0.948 | -4.3e+04 | 4.03e+04 |
C(county)[T.143] | -5545.8845 | 5862.828 | -0.946 | 0.344 | -1.7e+04 | 5947.838 |
C(county)[T.144] | -4.741e+04 | 2.12e+04 | -2.232 | 0.026 | -8.91e+04 | -5771.921 |
C(county)[T.145] | -9556.0440 | 3.47e+04 | -0.276 | 0.783 | -7.75e+04 | 5.84e+04 |
C(county)[T.146] | -1.903e+04 | 1.59e+04 | -1.196 | 0.232 | -5.02e+04 | 1.22e+04 |
C(county)[T.147] | -2.8e+04 | 1.52e+04 | -1.848 | 0.065 | -5.77e+04 | 1707.239 |
C(county)[T.148] | -2.012e+04 | 1.76e+04 | -1.146 | 0.252 | -5.45e+04 | 1.43e+04 |
C(county)[T.149] | -1.012e+04 | 4824.670 | -2.097 | 0.036 | -1.96e+04 | -657.758 |
C(county)[T.150] | -2.052e+04 | 3.35e+04 | -0.612 | 0.541 | -8.63e+04 | 4.52e+04 |
C(county)[T.151] | -2.235e+04 | 1.13e+04 | -1.974 | 0.048 | -4.45e+04 | -157.867 |
C(county)[T.152] | -1.249e+04 | 7332.914 | -1.703 | 0.089 | -2.69e+04 | 1887.273 |
C(county)[T.153] | -1.074e+04 | 2.99e+04 | -0.360 | 0.719 | -6.93e+04 | 4.78e+04 |
C(county)[T.155] | -1.174e+04 | 1.32e+04 | -0.892 | 0.373 | -3.75e+04 | 1.41e+04 |
C(county)[T.157] | -2.379e+04 | 1.34e+04 | -1.777 | 0.076 | -5e+04 | 2454.505 |
C(county)[T.158] | -3.57e+04 | 1.24e+04 | -2.878 | 0.004 | -6e+04 | -1.14e+04 |
C(county)[T.159] | -3961.2072 | 2.67e+04 | -0.149 | 0.882 | -5.62e+04 | 4.83e+04 |
C(county)[T.160] | -1.811e+04 | 8356.451 | -2.167 | 0.030 | -3.45e+04 | -1729.232 |
C(county)[T.161] | 6091.1910 | 8574.294 | 0.710 | 0.477 | -1.07e+04 | 2.29e+04 |
C(county)[T.162] | 9087.0830 | 1.55e+04 | 0.586 | 0.558 | -2.13e+04 | 3.95e+04 |
C(county)[T.163] | -2.133e+04 | 2.02e+04 | -1.054 | 0.292 | -6.1e+04 | 1.84e+04 |
C(county)[T.164] | -2.283e+04 | 1.51e+04 | -1.511 | 0.131 | -5.24e+04 | 6795.581 |
C(county)[T.165] | -2.21e+04 | 2.13e+04 | -1.037 | 0.300 | -6.39e+04 | 1.97e+04 |
C(county)[T.166] | -1.551e+04 | 9441.670 | -1.642 | 0.101 | -3.4e+04 | 3002.990 |
C(county)[T.167] | -3.947e+04 | 3.24e+04 | -1.218 | 0.223 | -1.03e+05 | 2.41e+04 |
C(county)[T.168] | 1.007e+04 | 7103.076 | 1.417 | 0.156 | -3857.374 | 2.4e+04 |
C(county)[T.169] | 1.283e+04 | 1.14e+04 | 1.121 | 0.262 | -9600.934 | 3.53e+04 |
C(county)[T.170] | -1.615e+04 | 1.4e+04 | -1.150 | 0.250 | -4.37e+04 | 1.14e+04 |
C(county)[T.171] | 2.365e+04 | 7177.844 | 3.295 | 0.001 | 9576.024 | 3.77e+04 |
C(county)[T.172] | -3.626e+04 | 3.24e+04 | -1.119 | 0.263 | -9.98e+04 | 2.73e+04 |
C(county)[T.173] | -2.629e+04 | 2.38e+04 | -1.106 | 0.269 | -7.29e+04 | 2.03e+04 |
C(county)[T.174] | -2.314e+04 | 1.59e+04 | -1.458 | 0.145 | -5.43e+04 | 7983.357 |
C(county)[T.176] | -8331.5679 | 1.76e+04 | -0.472 | 0.637 | -4.29e+04 | 2.62e+04 |
C(county)[T.177] | -7787.9729 | 6943.898 | -1.122 | 0.262 | -2.14e+04 | 5825.123 |
C(county)[T.178] | -8426.9047 | 9527.061 | -0.885 | 0.376 | -2.71e+04 | 1.03e+04 |
C(county)[T.179] | 5061.4958 | 9108.387 | 0.556 | 0.578 | -1.28e+04 | 2.29e+04 |
C(county)[T.180] | -9156.0300 | 7253.999 | -1.262 | 0.207 | -2.34e+04 | 5065.000 |
C(county)[T.181] | -6042.8525 | 1.16e+04 | -0.522 | 0.602 | -2.87e+04 | 1.67e+04 |
C(county)[T.182] | -1.6e+04 | 8180.157 | -1.956 | 0.051 | -3.2e+04 | 40.230 |
C(county)[T.183] | 1688.4654 | 4051.315 | 0.417 | 0.677 | -6253.894 | 9630.825 |
C(county)[T.184] | -1.566e+04 | 3739.789 | -4.189 | 0.000 | -2.3e+04 | -8333.281 |
C(county)[T.185] | -1.158e+04 | 1.5e+04 | -0.769 | 0.442 | -4.11e+04 | 1.79e+04 |
C(county)[T.186] | -1.279e+04 | 1.07e+04 | -1.198 | 0.231 | -3.37e+04 | 8134.088 |
C(county)[T.187] | -6652.3137 | 8781.097 | -0.758 | 0.449 | -2.39e+04 | 1.06e+04 |
C(county)[T.188] | -1.083e+04 | 3548.268 | -3.053 | 0.002 | -1.78e+04 | -3878.254 |
C(county)[T.189] | -2.122e+04 | 5407.161 | -3.925 | 0.000 | -3.18e+04 | -1.06e+04 |
C(county)[T.190] | -2.661e+04 | 1.48e+04 | -1.803 | 0.071 | -5.55e+04 | 2321.109 |
C(county)[T.191] | -3036.3270 | 1.92e+04 | -0.158 | 0.874 | -4.06e+04 | 3.45e+04 |
C(county)[T.192] | 6940.9690 | 9281.647 | 0.748 | 0.455 | -1.13e+04 | 2.51e+04 |
C(county)[T.193] | -9444.8116 | 1.05e+04 | -0.902 | 0.367 | -3e+04 | 1.11e+04 |
C(county)[T.194] | 2498.3312 | 1.35e+04 | 0.185 | 0.853 | -2.4e+04 | 2.9e+04 |
C(county)[T.195] | -2.03e+04 | 9595.165 | -2.116 | 0.034 | -3.91e+04 | -1493.798 |
C(county)[T.196] | -2753.6229 | 6585.461 | -0.418 | 0.676 | -1.57e+04 | 1.02e+04 |
C(county)[T.197] | 1.33e+04 | 5757.604 | 2.311 | 0.021 | 2015.983 | 2.46e+04 |
C(county)[T.198] | 7328.9890 | 7484.437 | 0.979 | 0.328 | -7343.801 | 2.2e+04 |
C(county)[T.199] | -2.136e+04 | 8594.063 | -2.485 | 0.013 | -3.82e+04 | -4510.853 |
C(county)[T.200] | -7508.9207 | 3016.843 | -2.489 | 0.013 | -1.34e+04 | -1594.581 |
C(county)[T.201] | -2.737e+04 | 1.51e+04 | -1.811 | 0.070 | -5.7e+04 | 2251.151 |
C(county)[T.202] | -1237.8204 | 2.25e+04 | -0.055 | 0.956 | -4.54e+04 | 4.29e+04 |
C(county)[T.203] | -6666.3212 | 1.53e+04 | -0.435 | 0.664 | -3.67e+04 | 2.34e+04 |
C(county)[T.204] | -1.464e+04 | 9850.437 | -1.486 | 0.137 | -3.39e+04 | 4674.099 |
C(county)[T.205] | -1.706e+04 | 6442.944 | -2.648 | 0.008 | -2.97e+04 | -4427.328 |
C(county)[T.206] | -3.733e+04 | 1.51e+04 | -2.465 | 0.014 | -6.7e+04 | -7640.473 |
C(county)[T.209] | 6087.6907 | 1.35e+04 | 0.452 | 0.651 | -2.03e+04 | 3.25e+04 |
C(county)[T.210] | -2.578e+04 | 2.65e+04 | -0.973 | 0.330 | -7.77e+04 | 2.61e+04 |
C(county)[T.211] | -1093.8388 | 7382.081 | -0.148 | 0.882 | -1.56e+04 | 1.34e+04 |
C(county)[T.213] | -2.153e+04 | 2.35e+04 | -0.916 | 0.360 | -6.76e+04 | 2.45e+04 |
C(county)[T.214] | -2.351e+04 | 1.27e+04 | -1.858 | 0.063 | -4.83e+04 | 1299.507 |
C(county)[T.215] | -1.66e+04 | 1.12e+04 | -1.488 | 0.137 | -3.85e+04 | 5271.613 |
C(county)[T.216] | -9052.8944 | 5337.794 | -1.696 | 0.090 | -1.95e+04 | 1411.531 |
C(county)[T.217] | -2.914e+04 | 9643.602 | -3.021 | 0.003 | -4.8e+04 | -1.02e+04 |
C(county)[T.218] | -1.642e+04 | 1.66e+04 | -0.987 | 0.324 | -4.9e+04 | 1.62e+04 |
C(county)[T.219] | -482.9429 | 8190.878 | -0.059 | 0.953 | -1.65e+04 | 1.56e+04 |
C(county)[T.220] | 1.038e+04 | 1.11e+04 | 0.934 | 0.350 | -1.14e+04 | 3.22e+04 |
C(county)[T.221] | 7271.3058 | 1.15e+04 | 0.634 | 0.526 | -1.52e+04 | 2.97e+04 |
C(county)[T.222] | -1.622e+04 | 1.02e+04 | -1.588 | 0.112 | -3.63e+04 | 3804.653 |
C(county)[T.223] | -1.878e+04 | 5199.813 | -3.612 | 0.000 | -2.9e+04 | -8586.505 |
C(county)[T.224] | -9192.1466 | 1.45e+04 | -0.636 | 0.525 | -3.75e+04 | 1.92e+04 |
C(county)[T.225] | 2641.2389 | 1.98e+04 | 0.134 | 0.894 | -3.61e+04 | 4.14e+04 |
C(county)[T.226] | -8830.2891 | 6639.675 | -1.330 | 0.184 | -2.18e+04 | 4186.397 |
C(county)[T.227] | -2.143e+04 | 1.09e+04 | -1.970 | 0.049 | -4.28e+04 | -102.683 |
C(county)[T.228] | 3.233e+04 | 1.71e+04 | 1.894 | 0.058 | -1135.703 | 6.58e+04 |
C(county)[T.229] | -1.428e+04 | 1.91e+04 | -0.748 | 0.454 | -5.17e+04 | 2.31e+04 |
C(county)[T.230] | -1.28e+04 | 6264.355 | -2.043 | 0.041 | -2.51e+04 | -517.493 |
C(county)[T.231] | -8271.2748 | 7245.759 | -1.142 | 0.254 | -2.25e+04 | 5933.602 |
C(county)[T.232] | -2.299e+04 | 8519.159 | -2.699 | 0.007 | -3.97e+04 | -6289.960 |
C(county)[T.233] | -1.285e+04 | 8491.101 | -1.513 | 0.130 | -2.95e+04 | 3796.614 |
C(county)[T.234] | -1.735e+04 | 3.1e+04 | -0.559 | 0.576 | -7.82e+04 | 4.35e+04 |
C(county)[T.235] | -1.766e+04 | 3.16e+04 | -0.559 | 0.576 | -7.96e+04 | 4.43e+04 |
C(county)[T.236] | -1.217e+04 | 8349.230 | -1.457 | 0.145 | -2.85e+04 | 4200.746 |
C(county)[T.237] | -1.141e+04 | 4505.618 | -2.533 | 0.011 | -2.02e+04 | -2581.176 |
C(county)[T.238] | -1.613e+04 | 7139.990 | -2.258 | 0.024 | -3.01e+04 | -2127.974 |
C(county)[T.239] | -2.712e+04 | 2.14e+04 | -1.266 | 0.206 | -6.91e+04 | 1.49e+04 |
C(county)[T.240] | 6490.6376 | 9154.156 | 0.709 | 0.478 | -1.15e+04 | 2.44e+04 |
C(county)[T.241] | -3359.7031 | 6189.659 | -0.543 | 0.587 | -1.55e+04 | 8774.753 |
C(county)[T.242] | -1.613e+04 | 1.27e+04 | -1.272 | 0.203 | -4.1e+04 | 8724.775 |
C(county)[T.243] | -1.788e+04 | 1.39e+04 | -1.286 | 0.199 | -4.51e+04 | 9383.449 |
C(county)[T.244] | -3.071e+04 | 1.43e+04 | -2.150 | 0.032 | -5.87e+04 | -2703.321 |
C(county)[T.245] | -8153.4624 | 1.35e+04 | -0.603 | 0.547 | -3.47e+04 | 1.84e+04 |
C(county)[T.246] | 5281.9970 | 4036.138 | 1.309 | 0.191 | -2630.610 | 1.32e+04 |
C(county)[T.247] | -9765.4698 | 5639.622 | -1.732 | 0.083 | -2.08e+04 | 1290.671 |
C(county)[T.248] | 143.0204 | 8692.204 | 0.016 | 0.987 | -1.69e+04 | 1.72e+04 |
C(county)[T.249] | -1.476e+04 | 3.35e+04 | -0.440 | 0.660 | -8.05e+04 | 5.1e+04 |
C(county)[T.250] | 1.616e+04 | 1.05e+04 | 1.541 | 0.123 | -4398.223 | 3.67e+04 |
C(county)[T.251] | -1.651e+04 | 9569.822 | -1.726 | 0.084 | -3.53e+04 | 2247.701 |
C(county)[T.252] | -1.57e+04 | 2.01e+04 | -0.781 | 0.435 | -5.51e+04 | 2.37e+04 |
C(county)[T.253] | -1.801e+04 | 1.58e+04 | -1.141 | 0.254 | -4.89e+04 | 1.29e+04 |
C(county)[T.254] | -5819.5313 | 1.24e+04 | -0.470 | 0.638 | -3.01e+04 | 1.85e+04 |
C(county)[T.256] | -9352.6145 | 1.31e+04 | -0.712 | 0.476 | -3.51e+04 | 1.64e+04 |
C(county)[T.257] | -1.474e+04 | 4650.454 | -3.170 | 0.002 | -2.39e+04 | -5626.575 |
C(county)[T.258] | -2.95e+04 | 1.86e+04 | -1.587 | 0.113 | -6.6e+04 | 6945.417 |
C(county)[T.259] | 5.662e+04 | 1.78e+04 | 3.176 | 0.002 | 2.17e+04 | 9.16e+04 |
C(county)[T.260] | -2.075e+04 | 1.42e+04 | -1.461 | 0.144 | -4.86e+04 | 7093.704 |
C(county)[T.261] | -1.214e+04 | 9508.704 | -1.277 | 0.202 | -3.08e+04 | 6499.043 |
C(county)[T.262] | -1.499e+04 | 3.16e+04 | -0.474 | 0.635 | -7.69e+04 | 4.7e+04 |
C(county)[T.263] | 1.094e+04 | 1.25e+04 | 0.872 | 0.383 | -1.36e+04 | 3.55e+04 |
C(county)[T.264] | -2479.7477 | 1.18e+04 | -0.210 | 0.834 | -2.57e+04 | 2.07e+04 |
C(county)[T.265] | -2.203e+04 | 1.05e+04 | -2.105 | 0.035 | -4.25e+04 | -1517.160 |
C(county)[T.266] | 6210.2882 | 3.35e+04 | 0.185 | 0.853 | -5.95e+04 | 7.19e+04 |
C(county)[T.267] | -1.138e+04 | 2.33e+04 | -0.488 | 0.626 | -5.71e+04 | 3.44e+04 |
C(county)[T.268] | -2.097e+04 | 1.51e+04 | -1.389 | 0.165 | -5.06e+04 | 8629.225 |
C(county)[T.269] | -1.836e+04 | 7924.529 | -2.317 | 0.021 | -3.39e+04 | -2824.110 |
C(county)[T.270] | -1.361e+04 | 1.72e+04 | -0.789 | 0.430 | -4.74e+04 | 2.02e+04 |
C(county)[T.271] | -1.204e+04 | 1.55e+04 | -0.777 | 0.437 | -4.24e+04 | 1.84e+04 |
C(county)[T.272] | -1.485e+04 | 8880.399 | -1.672 | 0.095 | -3.23e+04 | 2561.281 |
C(county)[T.273] | -1.149e+04 | 2.63e+04 | -0.438 | 0.662 | -6.3e+04 | 4e+04 |
C(county)[T.274] | -9891.3781 | 4694.196 | -2.107 | 0.035 | -1.91e+04 | -688.687 |
C(county)[T.275] | -5173.7930 | 1.15e+04 | -0.449 | 0.653 | -2.77e+04 | 1.74e+04 |
C(county)[T.276] | -1.966e+04 | 1.66e+04 | -1.182 | 0.237 | -5.22e+04 | 1.29e+04 |
C(county)[T.277] | -3.402e+04 | 2.86e+04 | -1.188 | 0.235 | -9.01e+04 | 2.21e+04 |
C(county)[T.278] | 2322.2439 | 1.9e+04 | 0.122 | 0.903 | -3.49e+04 | 3.95e+04 |
C(county)[T.279] | -1.119e+04 | 1.84e+04 | -0.609 | 0.543 | -4.72e+04 | 2.48e+04 |
C(county)[T.280] | -1.345e+04 | 6306.646 | -2.133 | 0.033 | -2.58e+04 | -1085.930 |
C(county)[T.281] | -8403.4354 | 2.99e+04 | -0.281 | 0.778 | -6.69e+04 | 5.01e+04 |
C(county)[T.282] | -3681.0429 | 1.2e+04 | -0.307 | 0.759 | -2.72e+04 | 1.99e+04 |
C(county)[T.283] | -5339.8789 | 9936.127 | -0.537 | 0.591 | -2.48e+04 | 1.41e+04 |
C(county)[T.284] | -1.452e+04 | 8179.287 | -1.775 | 0.076 | -3.06e+04 | 1519.240 |
C(county)[T.285] | -3.43e+04 | 2.13e+04 | -1.613 | 0.107 | -7.6e+04 | 7382.687 |
C(county)[T.286] | 7684.0660 | 1.16e+04 | 0.664 | 0.507 | -1.5e+04 | 3.04e+04 |
C(county)[T.287] | -1.775e+04 | 5134.756 | -3.456 | 0.001 | -2.78e+04 | -7680.041 |
C(county)[T.288] | 2038.4699 | 1.76e+04 | 0.116 | 0.908 | -3.26e+04 | 3.66e+04 |
C(county)[T.289] | -8041.0146 | 1.11e+04 | -0.726 | 0.468 | -2.97e+04 | 1.37e+04 |
C(county)[T.290] | -1.581e+04 | 7099.249 | -2.227 | 0.026 | -2.97e+04 | -1889.232 |
C(county)[T.291] | -7663.1814 | 1.35e+04 | -0.567 | 0.571 | -3.42e+04 | 1.88e+04 |
C(county)[T.292] | -1.567e+04 | 7319.153 | -2.141 | 0.032 | -3e+04 | -1323.652 |
C(county)[T.293] | -5825.2555 | 1.34e+04 | -0.433 | 0.665 | -3.22e+04 | 2.05e+04 |
C(county)[T.294] | 1.947e+04 | 9840.528 | 1.978 | 0.048 | 177.105 | 3.88e+04 |
C(county)[T.295] | -1.75e+04 | 1.76e+04 | -0.996 | 0.319 | -5.2e+04 | 1.7e+04 |
C(county)[T.296] | -1096.1249 | 1.8e+04 | -0.061 | 0.951 | -3.64e+04 | 3.42e+04 |
C(county)[T.297] | -1.141e+04 | 4502.234 | -2.535 | 0.011 | -2.02e+04 | -2586.102 |
C(county)[T.298] | -8732.4700 | 2.87e+04 | -0.304 | 0.761 | -6.5e+04 | 4.75e+04 |
C(county)[T.299] | -1.74e+04 | 2.99e+04 | -0.583 | 0.560 | -7.59e+04 | 4.11e+04 |
C(county)[T.300] | -1.986e+04 | 1.04e+04 | -1.907 | 0.057 | -4.03e+04 | 560.482 |
C(county)[T.301] | -4.329e+04 | 3.24e+04 | -1.336 | 0.182 | -1.07e+05 | 2.03e+04 |
C(county)[T.302] | 4581.0131 | 7593.047 | 0.603 | 0.546 | -1.03e+04 | 1.95e+04 |
C(county)[T.303] | -1.484e+04 | 7473.001 | -1.985 | 0.047 | -2.95e+04 | -186.892 |
C(county)[T.304] | -2.222e+04 | 8772.347 | -2.533 | 0.011 | -3.94e+04 | -5022.926 |
C(county)[T.305] | -2514.2672 | 7217.330 | -0.348 | 0.728 | -1.67e+04 | 1.16e+04 |
C(county)[T.306] | -2.336e+04 | 1.99e+04 | -1.174 | 0.240 | -6.24e+04 | 1.57e+04 |
C(county)[T.307] | -1.116e+04 | 9796.527 | -1.139 | 0.255 | -3.04e+04 | 8045.695 |
C(county)[T.308] | 8.973e+04 | 1.66e+04 | 5.411 | 0.000 | 5.72e+04 | 1.22e+05 |
C(county)[T.309] | -1.179e+04 | 5120.504 | -2.302 | 0.021 | -2.18e+04 | -1747.726 |
C(county)[T.310] | -1.268e+04 | 3.35e+04 | -0.378 | 0.705 | -7.84e+04 | 5.31e+04 |
C(county)[T.311] | -1.624e+04 | 1.01e+04 | -1.611 | 0.107 | -3.6e+04 | 3524.476 |
C(county)[T.312] | -9191.1488 | 9873.286 | -0.931 | 0.352 | -2.85e+04 | 1.02e+04 |
C(county)[T.313] | -1.765e+04 | 4.28e+04 | -0.412 | 0.680 | -1.02e+05 | 6.63e+04 |
C(county)[T.314] | -1.793e+04 | 7170.454 | -2.501 | 0.012 | -3.2e+04 | -3875.671 |
C(county)[T.315] | 879.4379 | 1.04e+04 | 0.084 | 0.933 | -1.96e+04 | 2.13e+04 |
C(county)[T.316] | -5807.9640 | 6758.678 | -0.859 | 0.390 | -1.91e+04 | 7442.020 |
C(county)[T.317] | 1.794e+04 | 9335.924 | 1.922 | 0.055 | -362.354 | 3.62e+04 |
C(county)[T.318] | -4980.2142 | 8582.095 | -0.580 | 0.562 | -2.18e+04 | 1.18e+04 |
C(county)[T.319] | 1051.3237 | 1.42e+04 | 0.074 | 0.941 | -2.68e+04 | 2.89e+04 |
C(county)[T.320] | -1.431e+04 | 1.5e+04 | -0.951 | 0.341 | -4.38e+04 | 1.52e+04 |
C(county)[T.321] | -1.684e+04 | 6968.208 | -2.416 | 0.016 | -3.05e+04 | -3176.799 |
C(county)[T.322] | -406.6903 | 6523.846 | -0.062 | 0.950 | -1.32e+04 | 1.24e+04 |
C(county)[T.323] | -1.853e+04 | 5726.625 | -3.236 | 0.001 | -2.98e+04 | -7302.079 |
C(county)[T.324] | -2501.1256 | 7866.013 | -0.318 | 0.751 | -1.79e+04 | 1.29e+04 |
C(county)[T.325] | -1.951e+04 | 2.24e+04 | -0.873 | 0.383 | -6.33e+04 | 2.43e+04 |
has_college | 1.325e+04 | 742.630 | 17.843 | 0.000 | 1.18e+04 | 1.47e+04 |
female | -8657.2875 | 609.011 | -14.215 | 0.000 | -9851.217 | -7463.358 |
Omnibus: | 2414.039 | Durbin-Watson: | 1.981 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 22245.534 |
Skew: | 1.945 | Prob(JB): | 0.00 |
Kurtosis: | 12.242 | Cond. No. | 198. |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Exercise 15¶
If you stopped matching after the second iteration (Iteration 1) back in Exercise 10, you may be wondering if that was a good choice! Let's check by restricting our attention to ONLY exact matches (iteration = 0). Run that match.
[37]:
model2 = dame_flame.matching.DAME(
repeats=False, verbose=3, want_pe=True, early_stop_iterations=0
)
model2.fit(
for_matching,
treatment_column_name="has_college",
outcome_column_name="annual_earnings",
)
result2 = model2.predict(for_matching)
Completed iteration 0 of matching
Number of matched groups formed in total: 370
Unmatched treated units: 644 out of a total of 1150 treated units
Unmatched control units: 3187 out of a total of 4365 control units
Number of matches made this iteration: 1684
Number of matches made so far: 1684
Covariates dropped so far: set()
Predictive error of covariate set used to match: 1199312680.0957854
1684 units matched. We stopped after iteration 0
[38]:
matched_data2 = get_dataframe(model2, result2)
Exercise 16¶
Now use a weighted least squares regression on your matched data to regress annual earnings on just having a college education. Is that estimate different from what you got when you allowed more low-quality matches?
[39]:
smf.wls(
"annual_earnings ~ has_college", matched_data2, weights=matched_data2["weights"]
).fit().summary()
[39]:
Dep. Variable: | annual_earnings | R-squared: | 0.049 |
---|---|---|---|
Model: | WLS | Adj. R-squared: | 0.048 |
Method: | Least Squares | F-statistic: | 86.65 |
Date: | Sat, 04 Mar 2023 | Prob (F-statistic): | 3.92e-20 |
Time: | 13:59:42 | Log-Likelihood: | -19512. |
No. Observations: | 1684 | AIC: | 3.903e+04 |
Df Residuals: | 1682 | BIC: | 3.904e+04 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 3.914e+04 | 664.386 | 58.907 | 0.000 | 3.78e+04 | 4.04e+04 |
has_college | 1.128e+04 | 1212.039 | 9.308 | 0.000 | 8904.805 | 1.37e+04 |
Omnibus: | 855.250 | Durbin-Watson: | 2.037 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 6653.000 |
Skew: | 2.256 | Prob(JB): | 0.00 |
Kurtosis: | 11.629 | Cond. No. | 2.42 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Other Forms of Matching¶
OK, hopefully this gives you a taste of matching! There are, of course, many other variations to be aware of.

Matching with replacement. In this exercise, we set repeats=False, so each observation could only end up in our final dataset once. However, if we use repeats=True, an untreated observation that is the closest match for multiple treated observations may get put in the dataset multiple times. We can still use this dataset in almost the same way, except we have to make use of weights: if an observation appears, say, twice, each appearance gets a weight that's 1/2 the weight of an observation appearing only once.

Matching with continuous variables: DAME is used for exact matching, but if you have lots of continuous variables, you can also match on those. In fact, the Almost Exact Matching Lab also has a library called MALTS that will do matching with continuous variables. That package does something like Mahalanobis Distance matching, but unlike Mahalanobis, which calculates the distance between observations in terms of the difference in all the matching variables (normalized by each matching variable's standard deviation), MALTS does something much more clever. (Here's the paper describing the technique if you want all the details.) Basically, it figures out how well each matching variable predicts our outcome \(Y\), then weights the different variables by their predictive power instead of normalizing by something arbitrary like their standard deviation. As a result, final matches will prioritize matching more closely on variables that are outcome-relevant. In addition, when it sees a categorical variable, it recognizes that and only pairs observations when they are an exact match on that categorical variable.

If your dataset is huge, use FLAME: this dataset is small, but if you have lots of observations and lots of matching variables, the computational complexity of this task explodes, so the AEML created FLAME, which works with millions of observations at only a small cost to match quality.
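The weighting rule for matching with replacement can be sketched in a few lines of plain Python. This is just an illustration of the rule described above (the control indices are made up, not from our data): each appearance of a control matched n times gets weight 1/n, so every underlying unit contributes a total weight of 1.

```python
from collections import Counter

# Hypothetical matched control indices: control 7 was the nearest match
# for two different treated units, so it appears twice in the matched sample.
matched_controls = [3, 7, 7, 12]

# Count appearances, then give each appearance weight 1 / (appearance count).
counts = Counter(matched_controls)
weights = {idx: 1 / n for idx, n in counts.items()}
print(weights)  # {3: 1.0, 7: 0.5, 12: 1.0}
```

Passing those per-row weights to a weighted regression (as with the weights column from get_dataframe above) keeps the duplicated control from counting twice.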
Absolutely positively need the solutions?¶
Don’t use this link until you’ve really, really spent time struggling with your code! Doing so only results in you cheating yourself.