Crime and Poverty Descriptive Analysis

In this exercise we’ll be examining the relationship between crime and policing expenditures using county-level data from Massachusetts.

Exercise 1

Begin by downloading the data for this exercise from https://github.com/nickeubank/MIDS_Data/blob/master/descriptive_exercise/crime_expend_MA.csv (if you don’t want to type all that, just go to github.com/nickeubank/MIDS_Data, open the descriptive_exercise folder, and get crime_expend_MA.csv).

[1]:

import pandas as pd

df = pd.read_csv(
    "https://media.githubusercontent.com/media/nickeubank/"
    "MIDS_Data/master/descriptive_exercise/crime_expend_MA.csv"
)
df.head()

[1]:

	county_code	crimeindex	policeexpenditures	month	year
0	1	61.411101	32.331110	1	1990
1	10	92.779361	59.342067	1	1990
2	11	93.222701	50.481508	1	1990
3	12	95.588374	65.815540	1	1990
4	13	92.472719	38.337757	1	1990

Exercise 2

This dataset contains monthly observations of each county’s policing expenditures (policeexpenditures, as a share of the county budget) and an index of crime (crimeindex, scaled 0-100) from 1990 to late 2001.

In these exercises, we’ll be focusing on just two counties – county_code 4 and 10.

First, for each of these two counties, calculate the mean expenditure level and mean crimeindex score (i.e. calculate both means separately for each county).

Just to make sure we’re practicing applied skills – do it with a loop and print your results nicely! So you should get output like this (though obviously with different numbers – I’m not gonna give you the answer!):

for county 4, average policing expenditure is 23.7 and average crime index is 75.83
for county 10, average policing expenditure is 62.15 and average crime index is 55.88

[2]:

for c in [4, 10]:
    police = df.loc[df.county_code == c, "policeexpenditures"].mean()
    crime = df.loc[df.county_code == c, "crimeindex"].mean()

    print(
        f"for county {c}, average policing expenditure "
        f"is {police:.2f} and average crime index is {crime:.2f}"
    )

for county 4, average policing expenditure is 54.26 and average crime index is 47.83
for county 10, average policing expenditure is 54.24 and average crime index is 47.77
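
The loop above filters the DataFrame twice per county. As a cross-check, the same means can be computed in one pass with groupby. This is only a sketch: the small DataFrame `toy` below is made up (its numbers are not from crime_expend_MA.csv) and simply reuses the exercise's column names so the example runs without a download.

```python
import pandas as pd

# Toy stand-in for the exercise data: all values are invented for illustration.
toy = pd.DataFrame(
    {
        "county_code": [4, 4, 4, 10, 10, 10, 7],
        "policeexpenditures": [20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 99.0],
        "crimeindex": [80.0, 70.0, 60.0, 40.0, 50.0, 60.0, 1.0],
    }
)

# Keep only the two counties of interest, then average each column by county.
means = (
    toy[toy.county_code.isin([4, 10])]
    .groupby("county_code")[["policeexpenditures", "crimeindex"]]
    .mean()
)

# Each row of `means` is one county; the index holds the county code.
for c, row in means.iterrows():
    print(
        f"for county {c}, average policing expenditure "
        f"is {row['policeexpenditures']:.2f} "
        f"and average crime index is {row['crimeindex']:.2f}"
    )
```

Swapping `toy` for `df` should give the same numbers as the loop, which makes this a convenient sanity check.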

Exercise 2

Now calculate the standard deviation of both policing expenditures and crime for these two counties.

[3]:

for c in [4, 10]:
    police = df.loc[df.county_code == c, "policeexpenditures"].std()
    crime = df.loc[df.county_code == c, "crimeindex"].std()

    print(
        f"for county {c}, the std of policing expend "
        f"is {police:.2f} and the std of crime index is {crime:.2f}"
    )

for county 4, the std of policing expend is 16.77 and the std of crime index is 26.94
for county 10, the std of policing expend is 16.68 and the std of crime index is 27.00
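
One detail to keep in mind when reading these numbers: pandas' `.std()` returns the sample standard deviation by default (`ddof=1`, dividing by n - 1), while NumPy's `np.std` defaults to the population version (`ddof=0`). The sketch below uses a handful of made-up values, not the exercise data, to show the two side by side along with the manual calculation.

```python
import numpy as np
import pandas as pd

# Made-up values chosen so the arithmetic is easy to follow (mean is 5).
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = values.std()                    # pandas default: ddof=1
population_std = np.std(values.to_numpy())   # NumPy default: ddof=0

# The same two quantities computed by hand from squared deviations.
n = len(values)
squared_dev = ((values - values.mean()) ** 2).sum()
manual_sample = (squared_dev / (n - 1)) ** 0.5
manual_population = (squared_dev / n) ** 0.5

print(f"sample std (ddof=1):     {sample_std:.4f}")
print(f"population std (ddof=0): {population_std:.4f}")
```

With well over a hundred monthly observations per county, the two versions differ only slightly, but it helps to know which one a reported number refers to.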

Exercise 3

Now calculate the correlation between policeexpenditures and crimeindex for both of these counties (again with a loop and nice printed output!).

[4]:

for c in [4, 10]:
    corr = df.loc[df.county_code == c, ["policeexpenditures", "crimeindex"]].corr()
    corr = corr.iloc[0, 1]
    print(f"for county {c}, the correlation is {corr:.2f}")

for county 4, the correlation is -0.06
for county 10, the correlation is -0.06
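
`DataFrame.corr()` returns a full correlation matrix (Pearson's r by default), which is why the loop above picks out the off-diagonal entry with `.iloc[0, 1]`. As a sketch of what that number is, the example below recomputes r by hand on a tiny made-up DataFrame (the values are invented; only the column names come from the exercise).

```python
import pandas as pd

# Invented values for illustration only.
toy = pd.DataFrame(
    {
        "policeexpenditures": [10.0, 20.0, 30.0, 40.0, 50.0],
        "crimeindex": [52.0, 48.0, 55.0, 45.0, 50.0],
    }
)

# 2x2 matrix: ones on the diagonal, the correlation off the diagonal.
matrix = toy.corr()
r_pandas = matrix.iloc[0, 1]

# Pearson's r by hand: sum of cross-products of deviations, divided by
# the square root of the product of the two sums of squared deviations.
x = toy["policeexpenditures"] - toy["policeexpenditures"].mean()
y = toy["crimeindex"] - toy["crimeindex"].mean()
r_manual = (x * y).sum() / ((x**2).sum() * (y**2).sum()) ** 0.5

print(f"pandas: {r_pandas:.4f}, by hand: {r_manual:.4f}")
```

For a single pair of columns, `toy["policeexpenditures"].corr(toy["crimeindex"])` gives the same value without building the matrix.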

Exercise 4

Based on your results up to this point, what would you guess about whether policing reduces crime? I know, these are just descriptive statistics, and correlation does not imply causality. But what would you infer if this were all you knew?

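With a correlation of about -0.06 in both counties, higher policing expenditures are associated with only very slightly lower crime. So it seems like policing expenditures might have a very small ability to reduce crime, but the relationship is certainly not strong.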

Exercise 5

Given what you’ve seen up till now, would you infer that county 4 and county 10 have a similar relationship between crime and police expenditures?

It seems like county 4 and county 10 have experienced very similar levels of crime and police expenditures (their means and standard deviations nearly match), and the relationship between these two measures looks very similar as well (a correlation of about -0.06 in both).

Exercise 6

Now plot histograms of policeexpenditures for both county 4 and county 10. Do the results change your impression of the similarity of county 4 and county 10?

[16]:

import altair as alt
alt.renderers.enable('mimetype')

# Charts created inside a loop are not displayed automatically,
# so store them in a dict and show each one in its own cell.
hists = dict()
for county in [4, 10]:
    hists[county] = (
        alt.Chart(df[df.county_code == county], title=f"County {county}")
        .mark_bar()
        .encode(
            alt.X("policeexpenditures:Q", bin=True),
            y="count()",
        )
    )

hists[4]

[16]:

[Figure: histogram for County 4, with binned policeexpenditures on the x-axis and the count of records on the y-axis]

[17]:

hists[10]

[17]:

[Figure: histogram for County 10, with binned policeexpenditures on the x-axis and the count of records on the y-axis]
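
When a chart can't be displayed, the bin counts behind a histogram can still be inspected as plain numbers. The sketch below is an assumption-laden illustration: the DataFrame `toy` is made up (two hypothetical counties with the same mean but different spreads), and the four equal-width bins come from `np.linspace`, whereas Altair's `bin=True` picks its own bin boundaries, so counts computed this way need not match the chart exactly.

```python
import numpy as np
import pandas as pd

# Invented values: both counties have mean 51 but very different spreads.
toy = pd.DataFrame(
    {
        "county_code": [4] * 6 + [10] * 6,
        "policeexpenditures": [10, 12, 50, 52, 90, 92, 49, 50, 50, 52, 52, 53],
    }
)

# Shared bin edges so the two counties are directly comparable.
edges = np.linspace(
    toy.policeexpenditures.min(), toy.policeexpenditures.max(), num=5
)

counts_by_county = {}
for county in [4, 10]:
    values = toy.loc[toy.county_code == county, "policeexpenditures"]
    counts, _ = np.histogram(values, bins=edges)
    counts_by_county[county] = counts.tolist()
    print(
        f"county {county}: mean = {values.mean():.1f}, "
        f"bin counts = {counts.tolist()}"
    )
```

Using the same edges for both counties matters here: with separately chosen bins, differences in the counts could come from the binning rather than from the data.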

Exercise 7

Finally, create a scatter plot of the relationship between crime and police expenditures for each county (e.g. crime on one axis, police expenditures on the other). Does this change your sense of how similar these are?

[20]:

scatter = dict()
for county in [4, 10]:
    scatter[county] = (
        alt.Chart(df[df.county_code == county], title=f"County {county}")
        .mark_point()
        .encode(
            alt.X("policeexpenditures", scale=alt.Scale(zero=False)),
            alt.Y("crimeindex", scale=alt.Scale(zero=False)),
        )
    )

scatter[4]

[20]:

[Figure: scatter plot for County 4, with policeexpenditures on the x-axis and crimeindex on the y-axis]

[21]:

scatter[10]

[21]:

[Figure: scatter plot for County 10, with policeexpenditures on the x-axis and crimeindex on the y-axis]
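
Why bother plotting after computing means, standard deviations, and correlations? A well-known illustration, separate from the exercise data, is Anscombe's quartet. The sketch below types in the first two of its four datasets by hand (these are not the Massachusetts data) and computes the same summaries used in the earlier exercises.

```python
import pandas as pd

# First two datasets of Anscombe's quartet; they share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
anscombe = {
    "I": pd.DataFrame(
        {"x": x,
         "y": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]}
    ),
    "II": pd.DataFrame(
        {"x": x,
         "y": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]}
    ),
}

# Same summaries as in the exercises: mean, standard deviation, correlation.
summary = {}
for name, data in anscombe.items():
    summary[name] = {
        "mean_y": data["y"].mean(),
        "std_y": data["y"].std(),
        "corr": data["x"].corr(data["y"]),
    }
    print(
        f"dataset {name}: mean y = {summary[name]['mean_y']:.2f}, "
        f"std y = {summary[name]['std_y']:.2f}, "
        f"correlation = {summary[name]['corr']:.2f}"
    )
```

The two datasets agree on all three summaries to two decimal places, yet a scatter plot shows dataset I as a noisy linear trend and dataset II as a smooth curve. Matching summary statistics alone cannot establish that two relationships look alike.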

After you have answered…

Read this discussion page.