One of the more difficult analyses I do regularly is to forecast student success.  I do this by gathering and analyzing current student data and using a technique called Supervised Learning.  It’s a cool way of being able to tell the future.

So how is it done?  And how can normal, non-data scientists use this procedure?  I will walk you through a procedure I do each year that can be done just with excel.

Step 1: Decide what you want to forecast

Is it future classroom grades? Is it future standardized test scores?  In the USA there’s a lot of forecasting done around who might become at-risk of not completing a high school diploma.  You need to decide what it is that you want to predict.

I have frequently worked at school’s with Math tracking, so in my example, we are going to try and predict future grades if a student is placed in the highest track.

Step 2: Collect data on your current students

Figure out what data is meaningful and already available.  It is often easy for you to download standardized test results, grade reports, and attendance records.  You can then copy them into a spreadsheet.  You need to then do this with past students so we can see their current success.

In my example, we are using semester academic grades (out of 100%), learning behavior scores (on a scale of 1 to 4), and standardized test results (out of 320).  All of this is for students from last year.

Screen Shot 2018-05-17 at 3.23.10 PM

Step 3: Collect the achievement indicator you are looking for

Screen Shot 2018-05-17 at 3.23.00 PM.png


This is challenging because it takes some transparency and educators opening up their current grade books to one another.  Hopefully, this is done in a way that allows people to feel safe (possibly by compiling them anonymously or by having a Coach rather than Administrator gather the data).

In our working example, we grabbed their 9th-grade semester 1 report card grades through a quick gradebook export.




Step 4: Find the strongest correlations

We can then use Excel’s “correl” function to find out the correlation between various metrics and their grades the next year. You want to look for the strongest correlations (closest to 1 or -1)

In my example below, I correlated each of the metrics we collected (besides the teacher comments) with their 9th-grade semester grades.

Screen Shot 2018-05-17 at 3.26.11 PM

Step 5: Interpret the results

Find the strongest predictor by which one has the strongest correlation.  It might be more than one!  However, using multiple predictor variables is a much more complicated procedure.

From my results, the strongest correlations to academic grades in the future, are the grades they received during semester 2.  This tells me that how students finish the year is a strong predictor of success the following year.  Also, interesting in my sample, is how strong Behavior grades correlate.  It seems that the message for my students is that it’s important to finish the year on your best foot.

Step 6: Use the results to tell the future

You now need to look at your current students rather than past data.   Maybe you want to find at-risk students and you found that attendance records have the strongest correlation high school success, so now you need to look at your current students to see who has a high absentee rate.  Maybe, like me, you are trying to forecast grades in a tracked class, and you found that second-semester grades have the best correlation to higher marks in the following year; so now we need to find who has grades that fit success in that tracked class.

Screen Shot 2018-05-21 at 11.03.20 AM.png

In our example, I can now target which kids may not be suited for an advanced class based on their second semester grades.






Step 7 (Optional): Develop an actual algorithm

Excel can take this one step further by developing a regression line.  This is a technique from your High School Algebra class that you might remember as a “line of best fit” or “least-squares regression.”

Screen Shot 2018-05-21 at 11.13.47 AMYou can then get an actual equation that could predict a possible value in the future.  In my example, my equation helped show me that students below a 75% in my class, are predicted to stay under 80% in the following year when moved into the higher tracked class.  So now I might refine my list by only searching for students below 75%.



It might seem daunting at first, but once you setup a procedure it’s actually quite quick.  Plus it allows you the advantage of providing supports and interventions before there’s a problem or the ability to make accurate programatic decisions that are in the best interest of students.


Disclaimer: The data and graphics used on this site are simulated re-creations intended to protect the privacy of the original data sources.