Missing Data

I regularly gather data about my life. Every week I look through it for trends and anomalies, and I use that analysis to help me make decisions. The data comes from my watch, a Garmin Forerunner 645, which tracks a number of data points for me. Some are recorded automatically and some I enter myself, and I have come to appreciate that information.

Recently I had to get a replacement watch after a bit of a hardware failure. Huge props to Garmin for replacing my out-of-warranty, three-year-old watch at no cost. I was without a watch for almost two weeks while this was sorted out, and I really noticed the lack of data.

My watch gathers data on sleep, which I often check when I’ve struggled at night and woken up tired. Where was my time spent during the night: in lots of movement and wakefulness, or in deeper sleep? I don’t need a high level of accuracy here, just something that helps me think about what I might have done differently the previous day with diet, stress, exercise (or the lack of it), or even emotional coping. During the week, I often check my step count to see whether I’ve taken breaks and gotten moving or whether I’m sitting too much. I look at my heart rate regularly, both for long-term trends and during specific workouts to see if I’m trying hard enough.

Losing some of this data isn’t really a big deal, but it is something I need to account for, since it leaves a pretty big hole in my system. As I look for trends and compute averages, I need to handle the missing data points. If I were building a report for myself, I might want to ensure the missing dates are still shown, albeit with no data. I can’t enter zeros here. Imagine two weeks’ worth of zeros in a month of resting heart rate data: I might think I’m much more fit than I am!
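
To make the zeros problem concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the month, the constant resting heart rate of 60, and the names `readings`, `series`, `zero_filled`, and `observed_mean` are assumptions, not anything exported from my watch. It builds a full calendar so the missing dates still show up, then compares an average with zeros filled in against an average over only the days that have a reading.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical data: a 30-day month of resting heart rate readings,
# with a 14-day gap while the watch was away. The value 60 is made up
# and held constant so the arithmetic is easy to follow.
start = date(2021, 6, 1)
readings = {}
for offset in range(30):
    if 9 <= offset < 23:
        continue  # no watch on these days, so no reading exists
    readings[start + timedelta(days=offset)] = 60

# Build the full calendar so missing dates still appear, marked as None.
calendar = [start + timedelta(days=i) for i in range(30)]
series = [(day, readings.get(day)) for day in calendar]

# Misleading: treating missing days as zero drags the average down.
zero_filled = mean(0 if value is None else value for _, value in series)

# More honest: average only the days that actually have a reading,
# and report how many days were missing alongside the result.
observed = [value for _, value in series if value is not None]
observed_mean = mean(observed)
missing_days = len(series) - len(observed)

# A simple report that keeps the gap visible instead of hiding it.
for day, value in series:
    print(day.isoformat(), "no data" if value is None else value)
print("zero-filled average:", zero_filled)
print("observed-only average:", observed_mean, f"({missing_days} days missing)")
```

With these made-up numbers, the zero-filled average comes out at 32 while the observed-only average stays at 60. Ignoring the gap is only one option; carrying forward the last value or interpolating are others, and which is appropriate depends on the question being asked.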

There are plenty of ways to handle missing data in both aggregations and visualizations, but as data professionals, we need to know how to account for missing data points and make sure we actually do. This becomes especially important with sensor data, but even sales data can be affected. We can, and do, lose data for a variety of reasons, and we should be prepared for those situations.

Steve Jones

Listen to the podcast at Libsyn, Stitcher or iTunes.

About way0utwest

Editor, SQLServerCentral