Non-Google search engines blocked from showing recent Reddit results


A spokesperson for Microsoft told me:

“Microsoft respects the robots.txt standard and we honor the directions provided by websites that do not want content on their pages to be used with our generative AI models. Bing stopped crawling Reddit after they implemented their updated robots.txt file on July 1, which prohibits all crawling of their site.”
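
For readers unfamiliar with the mechanism, a blanket robots.txt block can be sketched in a few lines of Python using the standard library's robots.txt parser. This is only an illustration: the file contents, the `can_crawl` helper, and the `bingbot` user-agent string are assumptions made for the example, not Reddit's actual robots.txt or Bing's crawler logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tells every crawler to stay away from the
# whole site. This is an assumed example, not a copy of Reddit's file.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that honors the file checks before fetching any page.
print(can_crawl(BLOCK_ALL, "bingbot", "https://www.reddit.com/r/technology/"))  # prints False
```

Note that robots.txt is a voluntary convention: the file only states the site's wishes, and it is up to each crawler to check it and comply, as Microsoft says Bing does.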

In October, The Washington Post, citing an anonymous source, reported that Reddit was considering blocking Bing search crawlers if it couldn’t reach a deal with Microsoft.

As 404 Media pointed out, Reddit’s guide for accessing its data names “search or website ads” as a commercial use warranting fees. It’s unclear how much money other search engines would need to spend to be permitted to scrape the platform. Rathschmidt said Reddit is “open to working with partners big and small.”

“It’s bad for the health of the Internet for for-profit companies to scrape our content without constraint and use it for, among other things, [training] AI models,” he said.

For now, Google can continue leaning on Reddit to help make search results more relevant. Google didn’t respond to Ars’ request for comment.

Meanwhile, alternative search engines may find it harder to compete.

“With our own ranking algorithms, previously users would often find different pages on Reddit than they might find with Google and others,” Mojeek’s Hayhurst told me.

The CEO added that while being blocked by Reddit alone “is not a huge deal,” he is concerned about the precedent it could set. “Search engines are the main traffic source for most websites, and a spreading of this behavior will further choke off traffic. And smaller sites will be impacted even more than large sites,” he said.

Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.

This article was updated with additional comment from Microsoft.
