The Fitness Wearables Data You Can (and Can’t) Trust
Or, why I’ll take “time in bed” over a sleep score any day.
Credit: Composite / Alisa Stern; Shutterstock / Andrey_Popov
This post is part of Find Your Fit Tech, Lifehacker's fitness wearables buying guide. I'm asking the tough questions about whether wearables can really improve your health, how to find the right one for you, and how to make the most of the data wearables can offer.
When you strap on a fitness wearable, you suddenly have a wealth of data about your own body. Instead of a little voice saying “I feel tired,” you now have a sleep score, a step count, and maybe a virtual coach advising you to tweak your workout to account for fatigue.
Companies are trying to dazzle you with more and more data, but if you dig through the mountains of information, you’ll find a hard truth: only some of that data is useful.
Why you can’t trust everything your wearable tells you
The paradox of wearables is that there will always be a mismatch between what you want to know and what a bundle of sensors strapped to your wrist can actually provide. Your goals are probably things like: sleeping better, running faster, losing weight, and staying active. And the wearable can provide things like: heart rate, skin temperature, and distance traveled in a given time. There are some pretty big leaps between those two categories.
I find it helpful to separate a wearable’s features into things it measures versus things it estimates. I like to use Marco Altini’s framework for this. It's divided into three categories, which he defines like this:
Measurements, defined as data that comes directly (or with very minimal interpretation) from a sensor. Examples would be heart rate, skin temperature, and movement.
Calculations or estimates of things we could theoretically measure another way. For example, a wearable might tell you how much time you’re spending in each sleep stage, and in theory you could do a sleep study and find out how it compares. There are studies that have attempted to test or validate some of these features, with mixed results.
Calculations or estimates of things that can’t be validated. A lot of the more sophisticated-sounding features on wearables fall into this category. If an app gives you a “sleep score” of 66, there’s no way to know if that number is correct. It’s a made-up metric.
We're left with two questions, then. First, is a given metric accurate? We can only judge that for the first two categories, and only where somebody has collected enough data to test the metric in large numbers of people.
And second, is the metric useful? That depends on context. What do you end up changing in your real-world life based on your fitness tracker’s data? For example, if you end up getting more sleep or more exercise, that’s probably a good thing. But it’s easy to end up chasing a made-up metric that doesn’t actually benefit you. Skipping workouts to get a better recovery score, or doing grumpy laps around the living room because it’s 10 p.m. and you’re only up to 9,000 steps, may not be making you a fitter person.
Accuracy is only part of the picture
Fortunately, scientists can compare wearables’ output to known standards for measuring things. They can check how accurate a given device is at gauging your heart rate, measuring sleep stages, or estimating the number of calories burned during exercise.
Unfortunately, those studies don’t happen before devices are released. They happen after the fact, usually in small numbers and without enough funding. By the time they’re published, the companies have moved on and are producing different models. I discussed the results of some of those studies in this piece on calorie burn (spoiler: every device sucks at calculating calorie burn).
When studies are done before a device makes it to market, those studies are usually done or funded by the company that profits from the device. And they can only validate direct measurements, or estimates of things that can themselves be measured (like our sleep study example, above). There is, by definition, no way to validate whether any of the made-up metrics are accurate.
So, with all that in mind, let’s run through the types of data you can get from your fitness tracker, and what tends to be worth paying attention to.
Pay attention to sleep time (but not stages)
Sleep tracking devices have appeared to get more and more sophisticated over the years—but their base functionality hasn’t really changed.
The best thing a sleep-tracking device can do for you is provide a reality check on how much you’re sleeping. When you’re asleep, you stop moving, and your heart rate and other measurements like skin temperature can help the device tell the difference between actually sleeping versus reading in bed. (Some devices are better at this than others.)
But wearables are not good at telling the difference between different stages of sleep. To do that properly, you need to analyze brain waves, which a wristwatch simply cannot do. (This is why I was amused, but not concerned, when I wore an Oura and a Whoop simultaneously and got opposite advice from each. The Oura said I wasn’t getting enough REM sleep. The Whoop said I was getting too much.)
Some devices have gotten impressively good at guessing at sleep stages, but they’re still just guesses. For example, Oura, whose ring is touted as one of the best, found that its newest algorithm is 92% to 93% accurate for detecting when somebody is asleep versus awake, but that drops to 76% to 78% accuracy for picking up sleep stages. The data from the study show that the algorithm still overestimates for some people while underestimating for others, and accuracy varies with the age of the person wearing the ring. As a feat of technology: very cool. As personal data: absolutely not worth paying attention to.
Even if the sleep stages were totally accurate, I still wouldn’t bother looking at them. Because how do you get more deep sleep, or more light sleep, or more REM? Advice for each of these always boils down to: get more sleep. Here’s Whoop saying that if you want to get more deep sleep, you should spend more time in bed; they point readers to Sleep Foundation tips on general sleep hygiene. That’s the same advice they give to people who want to get more REM sleep. You get the idea.
The bottom line:
Trust the amount of sleep it says you got (but do your own reality check; does the number make sense?).
Don’t trust the sleep stages or sleep quality scores.
Trust your heart rate during exercise (but not zones, unless you do some homework)
Heart rate is a relatively simple measurement: Your heart is either beating at this exact moment, or you’re between beats. So it’s reasonable to expect this metric to be relatively accurate.
In truth, accuracy varies between models, but most of the popular brands are good enough for general use. For example, here’s a 2020 study finding that the Apple Watch Series 4, Fitbit Charge 2, Garmin Vivosport 3, and Xiaomi Mi Band 3 all did a pretty good job of measuring heart rate while at rest or during sustained exercise.
Devices tend to have trouble getting an accurate heart rate when you’re moving around a lot, so if your heart rate is constantly spiking and resetting while you’re doing something very active, your wrist-based tracker might miss some of those peaks. But if you’re looking to gauge how hard you were working during a whole workout—whether you did that entire jog in zone 2, let’s say—the sensor is probably good enough. If you want to be sure you’re getting accurate heart rate data, I recommend a chest strap.
But what about those zones? This is where things get trickier. If you’re training with heart rate, you’re probably using zones. Zone 1 means you’re barely working; zone 5 is an all-out effort that you can only keep up for a few seconds. A given workout might target a certain zone: for example, an easy jog might be done in zone 2. This would all make sense except for two things:
“Zone 2” doesn’t have an agreed-upon definition; all the companies define their zones differently.
The zones are based on assumptions about your heart, usually calculating your maximum heart rate based on your age. These calculations are often very wrong.
If you’ve ever finished an “easy” jog, only to see that you spent all your time in zone 5, I can guarantee that your zones were calibrated wrong. Usually this is because your wearable uses a formula to predict your max heart rate, and then it sets your zones based on that prediction.
But all of the heart rate prediction formulas suck, and they tend to get less accurate the older you are. Subtracting your age from 220 just does not have much to do with how fast your heart can actually beat. The authors of the paper I just linked concluded that, “individuals should use [graded treadmill exercise tests] to determine [max heart rate]” rather than relying on a prediction equation. The American Council on Exercise, meanwhile, teaches its personal trainers to ignore the formulas entirely and to either use exercise tests to set personalized zones, or to use RPE (basically, how hard you feel like you’re working) to design clients’ workouts.
And if you think that asking a person to guess “uh…I guess this feels like a three out of 10” sounds inaccurate: well, it’s still better than relying on an arbitrary zone calculated from an error-prone formula. The formula isn’t accurate, either.
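To see how much damage a bad max heart rate guess can do, here is a rough sketch in Python. It uses the 220-minus-age formula mentioned above; the zone boundaries, the jogger's age and heart rate, and the "tested" max are all made-up numbers for illustration, and no brand necessarily draws its zones this way.

```python
def predicted_max_hr(age):
    """Age-based estimate: 220 minus your age."""
    return 220 - age

# Hypothetical zone floors as fractions of max heart rate.
# Every company draws these lines differently; these are placeholders.
ZONE_FLOORS = [(5, 0.90), (4, 0.80), (3, 0.70), (2, 0.60), (1, 0.50)]

def zone_for(heart_rate, max_hr):
    """Return the zone number for a heart rate, or 0 if it is below zone 1."""
    fraction = heart_rate / max_hr
    for zone, floor in ZONE_FLOORS:
        if fraction >= floor:
            return zone
    return 0

age = 45
jog_hr = 160                         # made-up heart rate during a jog
formula_max = predicted_max_hr(age)  # what the default settings would assume
tested_max = 195                     # hypothetical result of an exercise test

print("Zone using the formula:", zone_for(jog_hr, formula_max))
print("Zone using a tested max:", zone_for(jog_hr, tested_max))
```

With these invented numbers, the same jog lands in a different zone depending only on which max heart rate the zones were built from. The heart rate reading itself never changed.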
The bottom line:
Trust the heart rate number to be roughly accurate over the course of a workout (big picture, not individual spikes).
Don’t trust the heart rate zones if they were calculated from default settings.
Trust recovery metrics like HRV (but not readiness scores)
Do you need to track your recovery? No. Most of us can probably stop there. But some of us like to see how our exercise and sleep habits affect each other, and to get a handle on what’s going on in our body when we’re sick or stressed. I will be the first to tell you that this data is useless, yet at the same time you can pry my Oura ring from my cold dead hands. I like to look at the numbers.
When it comes to recovery, I see a huge gap between what the heart rate numbers can tell me, and what an app decides my “recovery” or “readiness” actually is.
Resting heart rate tends to increase when you’re stressed or sick. It can spike upwards if you drank alcohol or didn’t get much sleep. It trends downward over time as your cardio fitness improves.
Heart rate variability, or HRV, measures how irregular your heartbeats are. More variability is good, and your HRV tends to be higher when you’re healthy, not too stressed, and have been sleeping well.
Readiness or recovery scores, including metrics like Body Battery, use RHR and HRV, but also myriad other data to come up with a number describing how you’re doing. Rather than only measuring how your body reacts to exercise and sleep, they’ll also take your exercise intensity, your sleep scores, and other variables into account.
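For the curious, one common way to put a number on HRV is RMSSD: the root mean square of the differences between successive beat-to-beat intervals. The sketch below shows that calculation on made-up interval data; whether any particular wearable uses exactly this measure, or how it cleans the data first, is an assumption on my part.

```python
import math

def rmssd(intervals_ms):
    """Root mean square of successive differences between beat intervals (ms)."""
    diffs = [b - a for a, b in zip(intervals_ms, intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Two made-up stretches of heartbeats, both averaging 1,000 ms between beats
# (that is, 60 beats per minute).
steady = [1000, 1000, 1000, 1000, 1000]
varied = [1000, 1040, 960, 1030, 970]

print("Steady:", rmssd(steady))
print("Varied:", round(rmssd(varied), 1))
```

Both examples have the same average heart rate, but only the second shows any variability. That is why HRV can tell you something that resting heart rate alone can't.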
Because recovery scores use so many factors, they can easily get far away from what you’re actually trying to measure. For example, I’ve found that light cardio on my “rest” days helps me to recover faster from hard workouts, but Oura’s algorithm counts this as too much activity and lowers my score. It also doesn’t have a sense of when I want to be working harder and taking on more fatigue.
Looking back at my scores, I really don’t see any meaningful correlation between the days I performed well in the gym or in competitions, and the days Oura thought I was well-recovered.
I’ve written before about the better way to use your readiness data. Ignore the scores, and take the measurements in context.
The bottom line:
Trust your HRV and resting heart rate.
Don’t trust scores like recovery, readiness, or body battery that are trying to capture too many things in one number.
Trust your calorie burn, but only in the big picture
Probably the biggest feature of wearables for many people is the fact that they can tell you how many calories you’re actually burning each day. No longer will you have to guess at how much to eat—you can see the number right on your wrist!
Too bad they aren’t accurate enough to really fulfill that promise. As I’ve noted before, fitness trackers are notoriously inaccurate at calculating calorie burn. They’re not as bad as those lying elliptical machines, but they underestimate for some people and some activities, and they overestimate for others. There’s no way to tell whether the number you’re getting is too big or too small, or just right—so what’s the use?
I find that calorie burn numbers can be useful in the big picture. If you used to burn 1,800 calories per day, but you’ve started marathon training and are now burning 2,200 calories per day, you can absolutely take that as a signal that you should eat a bit more so that you’re properly fueling your runs.
What I wouldn’t do is nickel-and-dime yourself when it comes to specific numbers. Oh, I burned 100 fewer calories today than yesterday, so I should only have a half portion of salad dressing. Or, I went ice skating for the one and only time I’ll skate this year, and my watch says that burned 600 calories, so I can eat an extra 600 calories of dessert. Your watch isn’t accurate enough to support either of those assumptions.
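If you want to make the big-picture approach concrete, here is a small Python sketch that compares weekly averages instead of individual days. The daily numbers are invented, and the 200-calorie threshold is an arbitrary placeholder I picked for the example, not a validated margin of error for any device.

```python
from statistics import mean

def meaningful_shift(old_days, new_days, threshold=200):
    """Return the change in average daily burn, or 0 if it is too small to act on.

    The threshold is a hypothetical cutoff, not a measured error margin.
    """
    shift = mean(new_days) - mean(old_days)
    return shift if abs(shift) >= threshold else 0

# Made-up daily calorie burn readings from the same watch.
before_training = [1750, 1820, 1790, 1850, 1780, 1810, 1800]
marathon_training = [2150, 2250, 2100, 2300, 2180, 2220, 2200]
ordinary_week = [1700] * 7

print(meaningful_shift(before_training, marathon_training))  # large, sustained change
print(meaningful_shift(before_training, ordinary_week))      # small wobble, treated as noise
```

The point of averaging over a week is that a single odd day, like the ice skating outing, barely moves the number, while a real change in your routine shows up clearly.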
The bottom line:
Trust the general trend, using it as a reality check on whether your activity levels have gone up or down.
Don’t trust the exact number, especially for individual exercise sessions. Fueling your body and feeling good are more important than making the numbers match up exactly.
Trust your step count, as long as you keep perspective
I have a love/hate relationship with this one, personally. If I’m taking lots of walks or going for lots of runs, I like to check my step count and watch it climb. But if I’m in a phase where my workouts happen mainly on the bike or in the gym, my step count will be abysmal, even though I’m getting plenty of activity.
Ultimately, the number of steps you take in a day is not an important number to track. But it’s an easy number to track. Even if you don’t have a smartwatch, your phone is probably already counting your steps (just open your Apple Health or Google Fit app to see).
If you find step counts motivating, feel free to keep tabs on them. Just make sure you’re willing to be honest with yourself about whether it’s having a positive influence on your life.
Oh, and if you’re curious about accuracy: No two devices are going to agree about how many steps you took in a day. Some will underestimate, some will overestimate, and they’ll all track some activities better than others. I wouldn’t worry about those differences. Just compare the readings you get from day to day from the same device.
The bottom line:
Trust the number of steps you get (it’s not wholly accurate, but it doesn’t need to be).
Don’t trust the implication that you need to hit X number every day to be a healthy or good person. Use the step count if it helps you, and ignore it if it doesn’t.
Trust your cardio fitness or VO2max, but take it as a rough estimate
Nearly every gadget will now give you an estimate of your VO2max, often under a label like a “cardio fitness” score.
VO2max is a measurement you can get done in a lab (and professional athletes will often go to a lab to get tested) that puts a number on the maximum amount of oxygen your body can use per minute during hard exercise. In short: the higher your VO2max, the better your cardio fitness. People with excellent VO2max tend to be able to run longer and faster than people with lower scores.
Studies have found that VO2max correlates with improved health and longevity, but that’s not necessarily because VO2max itself makes you healthier. It’s just one of the easier-to-measure components of aerobic and athletic abilities. (Other metrics of cardiovascular fitness also correlate with longevity.)
Wearables test cardio fitness in a very different way than a laboratory VO2max test. Instead of putting you on exercise equipment with an oxygen-measuring mask strapped to your face, they simply measure your heart rate while you run or walk. If you can run faster with the same heart rate, or run the same speed as before with a lowered heart rate, your cardio fitness has improved.
Should you compare this number with VO2max charts? No, because it’s not a true VO2max. But you can keep an eye on this number over time, and trust that you’re gaining fitness when you see it go up. Just promise me one thing: you’ll look up how your wearable measures it, and take the number in context. Some watches’ calculations will get messed up when the weather is hot, and if you don’t do outdoor walks or runs very often, the watch won’t have consistent data to work from.
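Here is a deliberately simplified Python sketch of the idea behind these estimates: how much ground you cover per heartbeat. This is not any watch's actual algorithm (those are proprietary and account for much more), and the run data is made up. It only illustrates why "same speed, lower heart rate" and "faster, same heart rate" both read as improvement.

```python
def meters_per_beat(distance_m, duration_min, avg_hr_bpm):
    """Crude fitness proxy: speed (meters per minute) divided by heart rate."""
    speed = distance_m / duration_min
    return speed / avg_hr_bpm

# Made-up runs on the same route in similar weather.
march = meters_per_beat(5000, 30, 155)         # baseline
june_lower_hr = meters_per_beat(5000, 30, 148) # same pace, lower heart rate
june_faster = meters_per_beat(5000, 28, 155)   # faster pace, same heart rate

print(round(march, 3), round(june_lower_hr, 3), round(june_faster, 3))
```

Both June runs score higher than the March run. A proxy like this only means something when the runs are comparable, which is the same reason hot weather or inconsistent outdoor data can throw off the watch's number.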
The bottom line:
Trust the way this metric changes over time (higher is better).
Don’t trust the exact number, and don’t put too much stock in it at all if you aren’t consistently running or walking outdoors.
Trust your mileage and GPS location, usually
When you do outdoor activities, like running or cycling, your watch will want to measure the distance you went. This is helpful to track your overall mileage and your speed—but there are caveats.
First, GPS isn’t always accurate, and it tends to have reliability issues in urban areas, where the signals can bounce off of buildings. We discuss this more in the shopping guides for running watches and outdoor adventure watches, but if you want the most accurate location data, you’ll want a device that can work with multiple satellite systems (for example, GLONASS and Galileo in addition to GPS) and to use dual-band GPS if possible.
That said, most running watches and smartwatches have excellent location data now compared to what they were capable of five or even 10 years ago. Your watch will probably be a lot more accurate than your phone, too.
But it’s not going to be perfect. Those rare glitches can be pretty annoying. For example, there’s a spot in a local park where two roads run very close to one another, and my watch thinks I’m on one when I’m actually on the other. When it finally realizes where I am, the GPS track suddenly shoots over to my actual location, making it look like I teleported, and messing up my split time for that mile.
So don’t worry too much if your watch tracks a distance that’s slightly different from what you were expecting. And definitely don’t worry if you run a 5K race and find that your smartwatch thinks you ran 5.3 kilometers. Your watch doesn’t measure a race course the same way as the race course’s certifying body, so it’s normal for your watch to think you went a bit further. Your finish time is the one that counts for your PR.
The bottom line:
Trust the distance and location, with just a tiny grain of salt.
Don’t trust race distances or locations to be accurate to the inch. A good watch will be more accurate than an older watch or phone, but nothing is perfect.
Trusting badges and streaks is up to you
Here I return to that divide I spoke about earlier: that gap between what a wearable can measure and the impact it ultimately has on your life.
Wearable brands try to bridge this gap with gamification. You can earn badges and keep up streaks, so long as you keep interacting with the device and its app. If these little dopamine hits keep you using the device, and using the device improves your health, that’s arguably harmless.
But of course it’s not always that simple. On the one hand, regularly wearing an activity tracker tends to get people to exercise more and may help with weight loss. On the other, the tricks the devices use to encourage consistency can end up backfiring. If the only thing keeping you going is fear of breaking your streak, then once you do break that streak, you’d better have something else to keep you going. A streak can act as your training wheels, but it will never replace the work of properly forming a habit.
The bottom line:
Trust that you will find your own motivation through action. (If you try, you probably will!)
Don’t trust that the badges and streaks themselves will keep you going.