In our last post, Chris Smith wrote about the data pipeline: the process by which we clean, wrangle, and pre-process data so that we can visualize it and make sound inferences. Today we give you another necessary step when telling data stories: Know your data.

In my role as a Data Coach, I spend most of my time helping people understand what a certain metric means and what inferences can plausibly be drawn from the data. What I find most often is one of two scenarios: (1) people have little idea how a certain metric is calculated, and (2) people are overconfident in their ability to make inferences from data and often jump to conclusions. Here are three steps you and your school can take to know your data better and avoid those two pitfalls.

Know The Math Behind The Data

Within reason, you must know a little bit of math to understand your data. You need to know how a metric is calculated to avoid making false assertions. Let me tell you a tale from my own experience:

I’ve worked at schools that take annual standardized tests. These tests assign each student a “Growth Projection” in the fall, and when students take the test again, we see how many of them met their projection. Here was my approximate data over 7 years:

Are you seeing what I’m seeing? For my first 5 years I hovered in the high-40s to low-50s percent range before climbing to the 60 to 70 percent range in 2016-2017. The common first inference is that only around 50% of my students are meeting growth targets. While a perfect 100% is not a fair expectation, this is far too low, right? A previous administrator thought so, and I had to set goals to improve my scores. Do you side with that inference and action plan?

Not so fast. If you don’t know how those projections are calculated, how can you infer what percentage is appropriate? It turns out that on this test, each student’s growth projection is calculated from the average growth of a huge data set of similar students. You know the thing about averages, right? They sit in the middle, so when growth is spread roughly symmetrically around that average, approximately half of your students will land above it and half below. A reasonable target on this metric is therefore for around 50% of your students to meet their projections. I’d go as far as to say that when my data started stretching into the 70% range, we had a real problem: I might have been teaching to the test to get metrics that high, or something in the curriculum might have been too standardized-test-oriented.
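To see why roughly half is the natural outcome, here is a minimal Python sketch. It is a simplified illustration, not the test publisher’s actual method: it assumes the projection is simply the mean growth of a comparison group of “similar students,” that growth scores follow a symmetric (normal) distribution, and that your cohort grows the same way that comparison group does. The function name and all numbers are hypothetical.

```python
import random
import statistics


def share_meeting_projection(comparison_growth, cohort_growth):
    """Return (projection, share), where the projection is the mean growth of a
    comparison group and share is the fraction of the cohort meeting or beating it.
    Hypothetical helper: real growth projections are computed by the test vendor."""
    projection = statistics.mean(comparison_growth)
    met = sum(1 for g in cohort_growth if g >= projection)
    return projection, met / len(cohort_growth)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical growth scores (in test points) for a large group of similar students,
# drawn from a symmetric normal distribution centred on 10 points.
comparison = [rng.gauss(10, 6) for _ in range(10_000)]

# A hypothetical cohort taught "typically", i.e. drawn from the same distribution.
my_students = [rng.gauss(10, 6) for _ in range(2_000)]

projection, share = share_meeting_projection(comparison, my_students)
print(f"projection = {projection:.1f} points, share meeting it = {share:.0%}")
```

Under these assumptions the share lands close to 50% even though nothing is wrong with the teaching. A share well above that means the cohort is systematically outgrowing the comparison group, which is exactly the kind of result worth investigating rather than celebrating automatically.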

Make sure you know (within reason) how your data is calculated to make accurate inferences.

Know The Limitations Of Your Data

This strategy takes humility. People who like data often argue that numbers are more concrete, more of a hard science, than individual perceptions. However, all too often, the second we get data, we start making inferences well beyond the scope of the data itself.

At its core, data only measures what it measured. For example, if your students take a history quiz, you may use the results to form ideas about how much your students understand about the unit or topic. However, that is not what the data measured; that is meaning we have added to the data, an inference. Concretely, all the data says with certainty is how that group of students did at answering those specific questions on that day. Any further conclusion about what students know, or what was taught well, is an inference made by us, not by the data.

All data is limited this way. The SAT only measures how a student does on the SAT, and colleges have used that data to infer potential college success. The grades we assign in class only measure the sets of data we have collected, and we infer indirectly that they show proficiency. This is why I’m a big proponent of building multiple sets of data for high-stakes conclusions. It’s fine to use a single quiz to recommend that a student do a review assignment or that the teacher re-tool a lesson. It’s one small piece of data, but the stakes are low. But placement tests that determine course placement? All we’ve measured is how a student does on one test, on one day, on those particular questions, yet we’ve used it to determine potentially years of course options.
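One way to see why triangulation matters for high-stakes decisions is a small Python simulation of placement. It is a sketch under loudly stated assumptions, not a model of any real placement test: each student has a hypothetical “true skill,” every measurement is that skill plus independent random noise, and students are placed above or below a hypothetical cutoff. The helper name, noise level, and cutoff are all invented for illustration.

```python
import random


def misplacement_rate(n_students, n_measures, noise_sd, cutoff, seed=0):
    """Fraction of students placed on the wrong side of `cutoff` when placement
    uses the average of `n_measures` noisy measurements instead of true skill.
    Hypothetical model: noise is independent across measurements."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_students):
        true_skill = rng.gauss(0, 1)  # hypothetical standardized skill level
        observed = [true_skill + rng.gauss(0, noise_sd) for _ in range(n_measures)]
        estimate = sum(observed) / n_measures  # combine the measures by averaging
        if (estimate >= cutoff) != (true_skill >= cutoff):
            wrong += 1
    return wrong / n_students


single = misplacement_rate(5_000, n_measures=1, noise_sd=0.6, cutoff=0.5)
triangulated = misplacement_rate(5_000, n_measures=3, noise_sd=0.6, cutoff=0.5)
print(f"one test: {single:.1%} misplaced; three measures: {triangulated:.1%} misplaced")
```

In this toy model, averaging three measures cuts the noise and places fewer students on the wrong side of the cutoff than a single test does. Real measures often share errors (same test format, same bad day), so the actual benefit of triangulating depends on how independent your data sources really are.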

The humility to understand limitations and the willingness to do the work of triangulating data are necessities in a positive data culture.

Appoint Experts Who Will Do the Research When Necessary

Large data sets are constantly changing and evolving. The NWEA MAP Growth tests have over 150 potential variables/metrics for each student, and reports from College Board and IBO are similarly large. It’s impossible to study all such variables, and in my experience with representatives from those organizations, it can be hard for their customer-facing representatives to know how those variables are measured. No shade being thrown here; it’s just a lot.

At my school, one of my functions as the Data Coach is to read documentation. As for any coach, staying on top of my role requires reading research. For me, that sometimes means studying standardized testing documentation and learning how metrics are calculated. With so much amazing data to dig into, someone needs to be able to clarify how the data is gathered and measured, and how it can be used to generate theories. If your school has the means, designate a point-person to become a specialist in each specific data set. Not a general “data person” who is “good at numbers,” but one person who will become an expert in your standardized tests, another who will become an expert in your database data sets, and so on. Each of these people needs to be willing to follow my previous two recommendations: know the math and know the limitations.
