Reddit Cuts off Search Engine Scrapers, Including Bing

Reddit’s looking to place a higher value on its data, with a specific focus on AI development.

Jul 26, 2024 - 21:07

This is interesting.

This week, Reddit has moved to block search engines other than Google from crawling its site, via an update to its robots.txt file that shuts out their crawlers.

Microsoft’s Bing has now stopped crawling Reddit, after an update to the platform’s robots.txt file on July 1st, which essentially refuses access to all non-approved search engines, meaning that Reddit results will not be displayed on other search engines.

Except, of course, Google.
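For a sense of how a robots.txt block like this works in practice, here's a minimal sketch using Python's standard `urllib.robotparser`. The two robots.txt samples below are hypothetical illustrations, not copies of Reddit's actual file, and the `can_crawl` helper and example URL are made up for this demo. Note too that robots.txt is only a request that well-behaved crawlers choose to honor, and the article doesn't say how Google's access is actually granted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that turns away every crawler (illustration only).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant that exempts one named crawler and blocks the rest.
# This is an assumption for the demo, not how Reddit is known to do it.
ALLOW_ONE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.reddit.com/r/python/"  # example URL for the demo
    for name, rules in (("block all", BLOCK_ALL), ("allow one", ALLOW_ONE)):
        for agent in ("Googlebot", "bingbot"):
            print(f"{name}: {agent} allowed = {can_crawl(rules, agent, url)}")
```

A crawler that respects the rules would run a check like this before fetching a page, and skip any URL where the answer is no.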

Reddit signed a $60 million per year data deal with Google back in February, which has seen Google referring a heap more traffic to its pages. It seems that this deal has now empowered Reddit to set a precedent on data access, as it looks to expand its revenue potential.

Reddit, however, says that the change is not specifically linked to the Google deal.

As per Reddit:

“This is not at all related to our recent partnership with Google. We have been in discussions with multiple search engines. We have been unable to reach agreements with all of them, since some are unable or unwilling to make enforceable promises regarding their use of Reddit content, including their use for AI.”

AI training has been a big focus for Reddit and X (formerly Twitter), with many early AI projects scraping both platforms to source human-created inputs for their LLMs. Both X and Reddit have now upped the price of their API access, in order to ensure that AI projects are not profiting from their data for free, which also gives them more control over which AI projects are allowed to use that data.

Reddit’s move to restrict search scraper access follows the same logic, with Reddit looking to implement more controls over its data in order to maximize its profits.

Which makes sense. Reddit, which is now a publicly listed entity, is looking to enhance value for its shareholders however it can, and building its business, through various means, is key to its long-term viability.

Reddit’s data is highly valuable, as its communities cover a range of niche topics, providing human insight and answers to common web queries. That can help to improve AI chatbots and systems, which is why Google has opted to pay Reddit for access.

It seems that Reddit’s now looking for similar deals with other search engines, and if they won’t agree to one, it’s cutting them off. That will hurt Reddit traffic to some degree, by reducing referral links, but Reddit’s obviously decided that such an impact is worth the risk, in order to place a higher value on its data.

It’ll be interesting to see if other platforms and publishers follow suit, and whether Google, and others, are forced to make data deals to maintain scraper access. The company with the most valuable data will win out in the AI race, and Reddit definitely has some of the best quality data inputs available.

If that happens, it’ll price many smaller AI projects out of the market, as the big players secure valuable data partnerships, and others are potentially forced to train and re-train their models on AI-generated outputs.

That would lead to worse quality results, and less usage. Ultimately, it does seem that platforms like Reddit, as well as Meta and X, which have a steady flow of user input, hold the cards in this race.

We’ll see how it plays out.