Marketers’ consumer data may be extremely flawed—how Truthset is aiming to change that
Truthset launches the Data Collective, composed of 20 players in the space, to find more accurate data
Marketers are, of course, increasingly obsessed with amassing and deploying consumer data. But Truthset thinks that the marketing world needs to take a step back and first consider the accuracy of all that data—because a lot of it may be seriously flawed.
Toward that end, the San Francisco-based firm just launched the Truthset Data Collective, composed of 20 players in the data space—including Epsilon, Verisk, Fluent, Alliant and TargetSmart—that are together focused on “validating the accuracy of consumer data,” per the group’s mission statement.
Truthset, which was founded in 2019 by veterans of Nielsen, Salesforce, LiveRamp and Procter & Gamble, has actually been quietly working with its partners in private beta over the past few years. What that has meant in practice is that Truthset got access to massive overlapping datasets related to millions of consumers and billions of data points, giving it the ability to compare and benchmark various data segments algorithmically.
The endgame is not only good PR—data companies coming together to champion accuracy is obviously a good look—but an opportunity for Truthset to further position itself as the go-to data-validation service for marketers looking to better target customers.
Ad Age’s Simon Dumenco spoke with Founder-CEO Scott McKinley and President-Chief Revenue Officer Chip Russo about their company and the Truthset Data Collective.
Ad Age: So before we dive into the Truthset Data Collective, let’s talk about Truthset itself. What’s your elevator pitch as a company?
Scott McKinley: Basically, we set out to create a service that allows anybody who is using consumer data to measure the accuracy of every record they’re using. At its essence, our service lets a user of consumer data look at a confidence score and decide whether to include or exclude an ID in a given operation.
Ad Age: Give us an example. Like, say I need to reach Hispanic consumers and I’ve rented a list.
McKinley: So, if you’re building an audience that’s supposed to be Hispanic consumers, that data is going to come from all over the place—inferences, probabilistic guesses, some deterministic data—and there’s going to be some amount of error in that file. We allow the person who’s looking at that audience to know exactly which records are likely to actually be Hispanic and which ones aren’t.
Ad Age: How do you do that?
McKinley: When we set up Truthset, we decided to put together a census-level view of everything. We went to 20 data providers, asked them to send all of their data to us, and we built an algorithm that runs across all of those data providers. We have independent truth sets—we call them validation sets—that we use to test and train our models.
So what happens, just to simplify it, is we first test every data provider by itself against this validation set to see how good it is at getting every attribute right. So if a provider is 60% accurate at getting gender right—which is pretty bad, but it’s also pretty standard—that’s its score for getting gender right. And we assign confidence scores to every single record for 25 different attributes.
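To make that scoring step concrete, here is a minimal Python sketch of the idea: test one provider’s claims against an independent validation set, and use the hit rate as that provider’s accuracy for an attribute. The record shapes, field names and simple hit-rate logic here are illustrative assumptions, not Truthset’s actual algorithm.

```python
# Hypothetical sketch of the provider-scoring step described above.
# Data shapes are assumptions; Truthset's real model is unpublished.

def provider_accuracy(provider_records, validation_set, attribute):
    """Share of overlapping IDs where the provider's claimed value
    matches the validation ("truth") set for the given attribute."""
    hits = total = 0
    for record_id, claimed in provider_records.items():
        truth = validation_set.get(record_id)
        if truth is None:
            continue  # only score IDs that appear in the truth set
        total += 1
        hits += claimed.get(attribute) == truth.get(attribute)
    return hits / total if total else 0.0

# Toy data: three overlapping IDs, one wrong gender claim.
provider = {"id1": {"gender": "F"}, "id2": {"gender": "M"}, "id3": {"gender": "F"}}
truth = {"id1": {"gender": "F"}, "id2": {"gender": "F"}, "id3": {"gender": "F"}}

print(f"{provider_accuracy(provider, truth, 'gender'):.0%}")  # 67%
```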
Ad Age: A lot of marketers are putting emphasis on having their own data—first-party data, zero-party data—and that presumably gives them access to more accurate data. But why has other data—second-party data, third-party data—historically been so bad? Because 60% accuracy on gender, you’re right, is bad. Bad because of bad data collection? Bad inferences?
McKinley: Well, first of all, you can’t survey everybody in the United States. So what everybody does is take whatever little bit of information they might know and then extrapolate about the audience they don’t know, to try to guess at these attributes about the larger population.
Whether it’s Nielsen looking at media ratings, or Experian looking at information they get from public records and other sources, they then try to do a lookalike model, an extrapolation model, to describe an entire audience. So the first thing is, the data is not going to be perfect right out of the gate. The second thing is, incentives are all around scale, right? All the business incentives for the whole data ecosystem are about “Give me more matches so I can reach more people.” And, you know, everybody chased what the client wanted, which was more—more of everything.
Ad Age: And marketers and everybody else just got used to working with a certain level of flawed data.
McKinley: Right. Given there was no way to measure accuracy anyway, people just sort of, you know, covered their eyes and held their nose and accepted whatever the data provider said. And the waste was built into the price, which isn’t good for anybody. You know, “If half of the data is wrong and I’m paying a $5 CPM, really it’s a $10 CPM, and I’m still OK with that, because it allows me to reach my customers.”
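The waste math in that example generalizes: the effective CPM is the paid CPM divided by the share of records that are accurate. A quick sketch:

```python
def effective_cpm(paid_cpm: float, accuracy: float) -> float:
    """Price per thousand impressions that actually reach the target."""
    return paid_cpm / accuracy

print(effective_cpm(5.00, 0.5))  # 10.0, the $5-really-$10 case above
print(effective_cpm(5.00, 0.8))  # 6.25, better data means less hidden waste
```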
Ad Age: Your goal, then, is to cut that list down to the half that’s accurate.
McKinley: Yeah, we do the opposite of what everyone else does. Whereas everyone else is trying to get as much of a thing as they can—for instance, as many IDs in a list counted as Hispanic as possible—we have algorithms built for a specific purpose, which is not to find more, but to find less. Our mission is to find the truth about, say, all the genders of the people in a list.
Ad Age: Say I’m a marketer, and I have a third-party list of what I think are 5 million of my ideal target consumers. I go to you guys, you take a look at the data, and you say only 3 million on the list are actually my target consumers. So I’m losing reach, but in a way not really, because my relative response rate should actually go up if I just target the 3 million.
Chip Russo: Yeah. Like, if you’re looking for dog owners, and you market to people who don’t have pets in the house, you’ve wasted every dollar, right? Every dollar that didn’t get to that target is total waste.
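In code, the kind of record-level filtering this exchange describes might look like the sketch below; the confidence-score fields and the 0.8 threshold are hypothetical stand-ins, not Truthset’s actual output format.

```python
# Keep only the IDs whose confidence score for the target attribute
# clears a chosen threshold, trading raw reach for precision.
# Record structure is an illustrative assumption.

def filter_audience(records, attribute, threshold=0.8):
    return [r for r in records if r["scores"].get(attribute, 0.0) >= threshold]

audience = [
    {"id": "a", "scores": {"dog_owner": 0.95}},
    {"id": "b", "scores": {"dog_owner": 0.40}},  # likely not a dog owner
    {"id": "c", "scores": {"dog_owner": 0.85}},
]

kept = filter_audience(audience, "dog_owner")
print(f"{len(kept)} of {len(audience)} records retained")  # 2 of 3
```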
Ad Age: And what if I’m a marketer who still really needs to reach 5 million? Can you help me merge various lists that you’ve cut down so that I can get back up to 5 million?
McKinley: Now you’re talking about multi-sourcing, as opposed to single-sourcing from one data provider. The world of audience segments involves buying a lot of hay in haystacks to get to the needles. But we’re moving toward being able to compose piles of just needles.
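One plausible reading of that multi-sourcing idea, using the same assumed record shape as the earlier sketch: union several providers’ validated lists, drop low-confidence records, and de-duplicate by ID so reach is rebuilt from needles rather than a single padded file.

```python
def merge_validated_lists(provider_lists, attribute, threshold=0.8):
    """Union validated records across providers, keeping the
    highest-confidence copy of any duplicated ID."""
    merged = {}
    for records in provider_lists:
        for r in records:
            score = r["scores"].get(attribute, 0.0)
            if score < threshold:
                continue  # drop the hay, keep the needles
            best = merged.get(r["id"])
            if best is None or score > best["scores"][attribute]:
                merged[r["id"]] = r
    return list(merged.values())

list_a = [{"id": "a", "scores": {"dog_owner": 0.90}},
          {"id": "b", "scores": {"dog_owner": 0.50}}]
list_b = [{"id": "a", "scores": {"dog_owner": 0.95}},  # duplicate of "a"
          {"id": "c", "scores": {"dog_owner": 0.88}}]

needles = merge_validated_lists([list_a, list_b], "dog_owner")
print(len(needles))  # 2: best copy of "a" plus "c"; "b" falls below 0.8
```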
Ad Age: What’s the level of amenability and cooperation among various data providers to a client coming back and saying, “Hey, Truthset tells us this list is only 45% accurate to our target, so we only want to buy 45% of these names”?
McKinley: Well, first of all, they know that their data is not perfect. A lot of it is probabilistic and modeled. The second thing is all data is not created equal.
But that’s OK, right? I mean, I can buy a $10 bottle of red wine or I could buy a $1,000 bottle of red wine—there are markets for both types of wine, right? So sometimes you do want to pull down the data-accuracy threshold in order to hit the reach threshold. The point is that just because we offer the ability to shrink an audience to absolute certainty doesn’t necessarily mean that everyone’s going to buy that. Another thing is we’re actually helping these data providers price based on quality—they’ve never been able to do that before.
Our philosophy is all boats rise with data accuracy, except for the ones that shouldn’t be in the game in the first place.
Russo: That’s why the Truthset Data Collective is so important. These are the industry-leading data providers, all knowing that if they come together, it’s going to be for the betterment of the industry.
And, you know, they’ve been working with us for three-plus years. We’re just publicly announcing the Data Collective now, but we’ve been laying the groundwork for a long time. Every single quarter, they ship us their complete consumer data file. And then we do analysis—we essentially do data governance for them. That helps them make really smart strategic decisions about their own sources of data. And because we’re measuring quarter-over-quarter, they see trends. Like, “Hey, we invested in this one thing, or we did this modeling exercise on this one segment, and boy, did it change—it changed for the better, or it changed for the worse.”
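The quarter-over-quarter view Russo describes amounts to a per-attribute accuracy delta between runs; here is a toy sketch with made-up numbers.

```python
def accuracy_trend(prev_quarter, curr_quarter):
    """Per-attribute change in a provider's accuracy between runs."""
    return {attr: round(curr_quarter[attr] - prev_quarter[attr], 3)
            for attr in curr_quarter if attr in prev_quarter}

q1 = {"gender": 0.61, "hispanic": 0.72}  # illustrative scores only
q2 = {"gender": 0.66, "hispanic": 0.70}
print(accuracy_trend(q1, q2))  # {'gender': 0.05, 'hispanic': -0.02}
```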
The Data Collective, in and of itself, raises awareness of data accuracy. Because if you’re a part of it, you’re leaning in—you’re making a statement that you’re a data provider that cares about accuracy, and that you’re willing to offer transparency and insight. And that’s a big differentiator.