Writing Data Science Reports for Non-Technical Audiences
As a data scientist, you’ll often be required to summarize your analyses and present them to non-data scientists. This type of translation of technical analyses to something of use to less-technical audiences is an absolutely critical part of being an effective data scientist – if you don’t communicate what you’ve done to decision makers, it often doesn’t matter how rigorous or careful your work has been up to that point.
With that in mind, here is an outline of one strategy for writing for non-technical audiences. Obviously different people may prefer slightly different approaches, but I think this is a good model to start with.
Also, note that this is the model I’d like you to use when writing your final report for this class, so there are a few notes that are specific to class expectations!
Identify your audience
Before you write a single word, you should pause to reflect on exactly who you wish to address with your report, and their background. What follows are general guidelines, but the better you know your audience, the more precisely you can tailor the level of detail in your report.
For this class: At the top of your report, please specify the stakeholder to whom you are addressing your report – a product manager, a legislative aide, a policymaker, etc. This stakeholder should be relevant to your study, but should not be someone with data science training. You may assume they know about basic statistical concepts (means and standard deviations), but no more (no assumed understanding of potential outcomes, the theoretical underpinnings of experiments, specific designs like differences-in-differences, etc.). Obviously this is not something you’d put in a real write-up, but it will be helpful for evaluation of your project.
Introduction / Executive Summary
One of the most important things to remember when writing up an analysis is that the person you’re writing to has too many things to do, and is definitely less interested in your project than you are. With that in mind, it’s important that you write and organize your report in a way that catches their attention early and gets them invested so they keep reading. As a result, one generally wants to start with the most important parts of the analysis, then slowly draw back and lay out additional details.
You may have never noticed this before, but this is how most news articles are written: one of the first two or three paragraphs is what’s referred to as the “nut graf” (or nutshell paragraph) in which the journalist basically summarizes the entire news article in a single paragraph. In the words of Ken Wells from the Wall Street Journal, the nut graf is “a paragraph that says what this whole story is about and why you should read it. It’s a flag to the reader, high up in the story: You can decide to proceed or not, but if you read no farther, you know what that story’s about.”
Thankfully you probably aren’t so pressed for time that you have to summarize everything in a single paragraph, but we will follow a similar structure in which we try and give the reader a full summary of why your project is important, how you do your analysis, and broadly what you conclude up front. In particular, I would argue that your introduction / executive summary should be organized as follows:
Identify the problem you wish to address
The first thing to do in any report is motivate your analysis – tell us why you need to undertake this project. At this point in the report, keep this relatively brief: the motivation for the project is important, but you don’t want to drown the reader in background. This should probably be one to two solid paragraphs. We can get into more background on the problem later; for now, you don’t want to get bogged down talking about the problem, you want to get to how you’re going to help the reader.
What question will you try to answer, and how will it help you address the problem?
Here’s the linchpin of the report: announce the question you’re seeking to answer in your project and make it clear how this will help address the problem you’ve identified. This transition is where you will either get the reader to buy into the report and read it carefully, or lose their interest.
Summarize your strategy
Now in one to two paragraphs provide an overview of your project, your approach, and a preliminary summary of your results.
In all, you should have covered all this in about one page, maybe a page and a half, and hopefully now you’ve got your reader hooked!
OK, so at this point you’ve hopefully caught your reader’s interest, so now you can circle back and provide any additional background needed to help the reader better understand your motivation or the specific context you are analyzing (if you’re looking at a policy change, the details of the policy, the context in which it occurred, the players involved, etc.). The amount of background needed will vary across projects, but whatever you need goes here.
Here’s where you lay out how you plan to answer the question you laid out in your summary.
As you do so, bear in mind the difference between your goals in writing to a non-technical stakeholder and your goals when writing to a fellow data scientist (most of your professors).
When writing to a fellow data scientist, you’re generally writing to a skeptical audience. Your goal is to try and convince them that you did everything correctly – crossed every t and dotted every i. This is especially true when writing to professors in technical classes, since you’re usually trying to demonstrate your mastery of a technical skill, which means communicating every detail.
But a stakeholder reading your analysis is generally someone who mostly decided to put their trust in you when they hired you, and at this point your job is not to convince them every technical nuance of the project is right – by definition, most non-technical audiences wouldn’t be able to read a balance table showing that your randomization created balanced samples – but rather to communicate to them the key take-aways of the analysis.
That’s not to say you don’t need to engage with some technical aspects of your project. For example, if you’re using a good causal design, it’s critical the reader know why your causal research design is better than just looking at observational data in a regular regression (especially since someone else may try and argue with your results using that type of data). And of course they need to know about any limitations of what you’ve learned. But you don’t have to put every bit of due diligence you’ve done in the main report.
With that in mind, one thing that’s crucial to this section if you’re doing causal inference is to help the reader understand why you’re using a specific causal design without using technical language. To do so, you want to lay out specific, concrete reasons that just using observational data might lead to erroneous conclusions (e.g. do the same thing you did on the homework assignments / midterm when asked about how people were interpreting observational studies.)
For example, if you are doing an experiment to see how sending people coupons would impact consumer behavior, you want to explain that “we can’t just use data on sales from stores that chose to send out coupons to evaluate whether we should be sending out coupons to all our customers because it’s possible that the stores that sent out coupons did so precisely because they knew that their customers were struggling financially, and thus needed coupons to be able to afford products. As a result, if we compared sales to customers who got coupons to those who did not, we might inadvertently assume the lower sales to customers who got coupons were the result of the coupons, when in fact they reflected the fact that the coupons went to customers who were less well-off financially to begin with.”
“But if we run an experiment in which we randomly assign customers to either receive a coupon or not, then we know that on average the people getting coupons will be the same as the people not getting coupons (since who gets coupons is random, and not related to anything like customer income). As a result, we can compare sales to customers who got coupons and those that did not, and infer with confidence that any difference we see is the result of getting coupons, not other differences in the customers with or without coupons.”
(See? No discussion of potential outcomes or use of terms like “baseline differences”!)
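If it helps to convince yourself (or a curious stakeholder) that the coupon story above really works this way, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the function name `simulate_coupon_study`, the income distribution, the rule that coupons go to customers with income below 45, and a true coupon effect of +5 on sales. It is not drawn from any real data.

```python
import random
from statistics import mean


def simulate_coupon_study(n_customers=20000, true_effect=5.0, seed=0):
    """Return (observational_estimate, randomized_estimate) of the coupon effect.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    obs_coupon, obs_no_coupon = [], []
    rct_coupon, rct_no_coupon = [], []

    for _ in range(n_customers):
        income = rng.gauss(50, 10)
        # What the customer would spend without a coupon: rises with income.
        baseline_sales = 0.5 * income + rng.gauss(0, 5)

        # Observational world: coupons are sent to customers who are struggling.
        if income < 45:
            obs_coupon.append(baseline_sales + true_effect)
        else:
            obs_no_coupon.append(baseline_sales)

        # Experimental world: a coin flip decides who gets a coupon.
        if rng.random() < 0.5:
            rct_coupon.append(baseline_sales + true_effect)
        else:
            rct_no_coupon.append(baseline_sales)

    observational = mean(obs_coupon) - mean(obs_no_coupon)
    randomized = mean(rct_coupon) - mean(rct_no_coupon)
    return observational, randomized


observational, randomized = simulate_coupon_study()
print(f"Naive comparison of coupon vs. no-coupon customers: {observational:+.2f}")
print(f"Randomized experiment estimate: {randomized:+.2f}")
```

Under these assumed numbers, the naive comparison should come out negative (coupons look like they hurt sales) even though the true effect is positive, while the randomized comparison should land close to the true effect. None of this belongs in the body of the report, but a simulation like this can be a useful way to check your own plain-language explanation.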
Now it’s time for results! As with your design, remember your goal is to emphasize the key take-aways of your analysis, which means both what the data can tell you and what it can’t. Remember that honest humility is a key part of being a good data scientist – don’t over-sell your results!
Now the final part of the project – quickly recapitulate the problem you wanted to address, the question you sought to answer, the answer you reached, and the implications of this result. In this discussion, make sure you talk a lot about external validity: where are these results likely applicable? Where are they not? What other research could be done to learn more? Do you have concrete recommendations?
Remember when I said that in writing to a non-technical stakeholder, you don’t have to detail all the nuances of your analysis? Well… that’s true. BUT: it’s often good to put the details of all the careful analyses of robustness and diagnostic tests you completed in appendices. That way you can reference them in the body of your report (communicating in broad terms that you were careful without boring your reader), but then also include them in case your stakeholder wants to share your report with another data scientist for a second opinion.
So you probably want (and for this class, should have) an appendix with things like balance tests, A/A tests, evidence of parallel trends, discussion of why you chose certain sample restrictions, alternate specifications, etc., depending on what’s appropriate for your particular research design.
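As one example of what might go in such an appendix, here is a rough sketch of a balance check that compares treated and control groups on pre-treatment characteristics using standardized mean differences. The helper names (`standardized_mean_difference`, `balance_table`), the column names, and the toy data are all hypothetical, and the common rule of thumb of flagging absolute differences above roughly 0.1 is a convention rather than a hard rule. Each group needs at least two observations for the variance to be defined.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in group means, scaled by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(treated) - mean(control)) / pooled_sd


def balance_table(rows, treatment_key, covariates):
    """Return {covariate: standardized mean difference} for a list of dict rows."""
    table = {}
    for covariate in covariates:
        treated = [row[covariate] for row in rows if row[treatment_key] == 1]
        control = [row[covariate] for row in rows if row[treatment_key] == 0]
        table[covariate] = standardized_mean_difference(treated, control)
    return table


# Toy data, invented for illustration only.
customers = [
    {"coupon": 1, "income": 52, "prior_purchases": 3},
    {"coupon": 1, "income": 47, "prior_purchases": 5},
    {"coupon": 1, "income": 50, "prior_purchases": 4},
    {"coupon": 0, "income": 49, "prior_purchases": 4},
    {"coupon": 0, "income": 51, "prior_purchases": 3},
    {"coupon": 0, "income": 48, "prior_purchases": 5},
]

for name, smd in balance_table(customers, "coupon", ["income", "prior_purchases"]).items():
    print(f"{name}: standardized mean difference = {smd:+.2f}")
```

In the body of the report you would only say something like "the two groups looked similar before the experiment began" and point to the appendix table for anyone who wants the details.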