The biggest findings in the Google Search leak

May 31, 2024 - 21:02

One thing right off the bat: the Google Search algorithm has not leaked, and SEO experts don’t suddenly have all the answers. But the information that did leak this week — a collection of thousands of internal Google documents — is still huge. It’s an unprecedented look into Google’s inner workings that are typically closely guarded.

Perhaps the most notable revelation from the 2,500 documents is that they suggest Google representatives have misled the public in the past when discussing how the biggest gatekeeper of the internet assesses and ranks content for its search engine.

How Google ranks content is a black box: websites depend on search traffic to survive, and many will go to great lengths — and great expense — to beat out the competition and rise to the top of results. Better ranking means more website visits, which means more money. As a result, website operators hang on to every word Google publishes and each social media post by employees working on search. Their word is taken as gospel, which, in turn, trickles down to everyone using Google to find things.

Over the years, Google spokespeople have repeatedly denied that user clicks factor into ranking websites, for example — but the leaked documents make note of several types of clicks users make and indicate they feed into ranking pages in search. Testimony from the antitrust suit by the US Department of Justice previously revealed a ranking factor called Navboost that uses searchers’ clicks to elevate content in search.

“To me, the larger, meta takeaway is that even more of Google’s public statements about what they collect and how their search engine works have strong evidence against them,” Rand Fishkin, a veteran of the search engine optimization (SEO) industry, told The Verge via email.

The leak first spread after SEO experts Fishkin and Mike King published some of the contents of the leaked documents earlier this week along with accompanying analyses. The leaked API documents contain repositories of information about, and definitions of, the data Google collects, some of which may inform how webpages are ranked in search. Google initially dodged questions about the authenticity of the leaked documents, then confirmed on Wednesday that they are genuine.

“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information,” Google spokesperson Davis Thompson told The Verge in an email on Wednesday. “We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”

There are limits to what the documents show. For one, they give no indication of how different attributes are weighted. It’s also possible that some of the attributes named in the documents, such as an identifier for “small personal sites” or a demotion for product reviews, might have been deployed at some point but have since been phased out. They also may have never been used for ranking sites at all.

“We don’t necessarily know how [the factors named] are being used, aside from the different descriptions of them. But even though they’re somewhat sparse, there’s a lot of information for us,” King says. “What are the aspects that we should be thinking about more specifically when we’re creating websites or optimizing websites?”

The suggestion that the world’s largest search platform doesn’t base search result rankings on how users engage with the content feels absurd on its face. But the repeated denials, carefully worded company responses, and industry publications that unquestioningly carry these claims have made it a contentious topic of debate among SEO marketers.

Another major point highlighted by Fishkin and King relates to how Google may use Chrome data in its search rankings. Google Search representatives have said that they don’t use anything from Chrome for ranking, but the leaked documents suggest that may not be true. One section, for example, lists “chrome_trans_clicks” as informing which links from a domain appear below the main webpage in search results. Fishkin interprets it as meaning Google “uses the number of clicks on pages in Chrome browsers and uses that to determine the most popular/important URLs on a site, which go into the calculation of which to include in the sitelinks feature.”

There are over 14,000 attributes mentioned in the documents, and researchers will be digging for weeks looking for hints contained within the pages. There’s mention of “Twiddlers,” or ranking tweaks deployed outside of major system updates, that boost or demote content according to certain criteria. Elements of webpages, like who the author is, are mentioned, as are measurements of the “authority” of websites. Fishkin points out that there’s plenty that’s not represented much in the documents, too, like information about AI-generated search results.

So what does this all mean for everyone other than the SEO industry? For one, expect that anyone who operates a website will be reading about this leak and trying to make sense of it. A lot of SEO is throwing things against the wall to see what sticks, and publishers, e-commerce companies, and businesses will likely design various experiments to try to test some of what’s suggested in the documents. I imagine that, as this happens, websites might start to look, feel, or read a little differently — all as these industries try to make sense of this wave of new but still vague information.

“Journalists and publishers of information about SEO and Google Search need to stop uncritically repeating Google’s public statements, and take a much harsher, more adversarial view of the search giant’s representatives,” Fishkin says. “When publications repeat Google’s claims as though they are fact, they’re helping Google spin a story that’s only useful to the company and not to practitioners, users, or the public.”