Using recurrent neural networks, we predict the spread of illness in the United States.
By Sarah Pilewski: quantitative analytics cat herder and Graham Ganssle: time-series machine learning nerd
How widespread is the flu right now? In my state, are people generally getting more or less sick this week? What about in my county? What about next month?
These are all questions the CDC has been attempting to answer for twenty years, and they’re doing a pretty good job of it. They regularly poll hospitals for data to answer these questions, and they’re reasonably accurate to within a month or so. But if you’re trying to decide when to get your kid a flu shot, you don’t want to know what happened a month ago; you want to know what’s happening now, and what’s likely to happen in the coming days, weeks, and months.
That’s where Kinsa Health comes in. They aggregate illness data from all over the country in real time to provide accurate measurements of illness levels, from the national level all the way down to the county level. Log into their app and you’ll see up-to-the-minute population illness percentages. Want to see the future? Now, Kinsa does that, too. Kinsa worked with Expero to build an illness forecast so their customers can see how the flu is going to spread up to a year into the future.
Building that forecast is anything but easy. Expero’s data science team has worked on many time series forecasting problems, and the spread of illness is a doozy. In fact, it’s a doozy in both the temporal and spatial domains!
Illness spread is a highly underdetermined, ill-constrained spatiotemporal system characterized by nonlinear propagation pathways. Modeling this system is tough. Really tough. Like, the CDC has a contest to do this, tough. Like, Google tried to do this in 2016 and failed, tough. Now it’s 2019, and working together with Expero, the experts at Kinsa were able to crack this nut.
For decades people have tried applying techniques like seasonal decomposition and ARIMA to build models of disease spread. If your desired level of accuracy is low enough, these models perform satisfactorily and have the added benefit of being simple enough that most analysts know how to build and maintain them. If, however, you need a more accurate model that adapts to perturbations in your data patterns (read: illness spreads unexpectedly), you need machine learning.
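To make the classical approach concrete, here is a minimal sketch of a seasonal baseline in plain Python. It is not ARIMA and not anything Kinsa or Expero built; it is the simplest form of seasonal decomposition, where the forecast is an overall level plus the average seasonal offset for each point in the season. The function names and the toy illness percentages are made up for illustration.

```python
from statistics import mean

def seasonal_profile(series, period):
    """Split a series into an overall level and a per-phase seasonal offset."""
    level = mean(series)
    # Average every value that falls on the same position in the season,
    # then express it as a deviation from the overall level.
    profile = [mean(series[phase::period]) - level for phase in range(period)]
    return level, profile

def seasonal_forecast(series, period, horizon):
    """Forecast by repeating the learned seasonal profile around the level."""
    level, profile = seasonal_profile(series, period)
    start = len(series)  # index of the first future step
    return [level + profile[(start + h) % period] for h in range(horizon)]

# Hypothetical data: three "seasons" of four periods each (percent ill).
history = [1.0, 3.0, 2.0, 1.0,
           1.2, 3.4, 2.2, 1.0,
           0.8, 2.6, 1.8, 1.0]

prediction = seasonal_forecast(history, period=4, horizon=4)
print([round(p, 2) for p in prediction])
```

A model like this can only replay the average season. If illness spreads earlier or faster than usual, the forecast has no way to react, which is the limitation described above.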
Deep learning, in particular, does an excellent job at adapting to nonlinearities and inconsistencies in real-world data. By using a geospatially linked mesh of recurrent neural networks, the Kinsa and Expero data science teams were able to build a model which forecasts the spread of illness in the United States to a surprisingly high level of accuracy. How do we know the system is accurate? We rolled back our input data ten times, once for each flu season in the 2008 - 2018 range, and built forward-looking illness forecasts using only the data available before each season. Since we had the data about what actually occurred in those flu seasons, we were able to measure how far each forecast deviated from the actual season.
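The rollback procedure described here is often called a rolling-origin backtest. The sketch below shows the general idea in plain Python under several assumptions: the forecaster is a stand-in that simply repeats the previous season (not the recurrent network mesh), the error metric is mean absolute error (the post does not say which metric was used), and the data is invented.

```python
from statistics import mean

def seasonal_naive(history, season_length):
    """Placeholder forecaster: repeat the most recent full season."""
    return history[-season_length:]

def backtest(series, season_length, n_seasons, forecaster=seasonal_naive):
    """Hold out each of the last n_seasons in turn and score the forecast.

    For every held-out season the forecaster only sees data from before
    that season, mimicking a forecast made at that point in the past.
    """
    if len(series) < (n_seasons + 1) * season_length:
        raise ValueError("need at least one full season of history before the first holdout")
    errors = []
    for k in range(n_seasons, 0, -1):
        cutoff = len(series) - k * season_length
        train = series[:cutoff]                           # what was known then
        actual = series[cutoff:cutoff + season_length]    # what really happened
        predicted = forecaster(train, season_length)
        errors.append(mean(abs(p - a) for p, a in zip(predicted, actual)))
    return errors

# Hypothetical data: four seasons of four periods each (percent ill).
series = [1.0, 3.0, 2.0, 1.0,
          1.0, 4.0, 2.0, 1.0,
          2.0, 4.0, 3.0, 1.0,
          2.0, 5.0, 3.0, 2.0]

per_season_error = backtest(series, season_length=4, n_seasons=3)
print(per_season_error)
```

Swapping a different model in through the `forecaster` argument lets the same harness compare approaches season by season.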
Now individuals can see when illness is on the rise in their area, and how fast it’s likely to increase or decrease. Schools and community centers can see when illness will peak, to determine when it’s time to start putting out the hand sanitizer! Moreover, clinics can look ahead to plan how much vaccine to purchase, helping them avoid running short. So whether you’re buying tissues, disinfectants, or vaccines, Kinsa’s got you covered with their illness forecast.
Illness forecasts are currently in beta, so get yourself a Kinsa profile and smart thermometer today!
Have questions about this blog? Email us at info@experoinc.com.