Invezz

Reddit sues AI search engine Perplexity over data scraping

Social media giant Reddit has filed a lawsuit against artificial intelligence company Perplexity, accusing it of illegally harvesting user posts to train its AI models. The case is one of the latest clashes between content owners and AI developers over data rights.

The complaint, filed Wednesday in a New York federal court, alleges that Perplexity, along with three partners—Lithuanian data scraper Oxylabs, Texas-based SerpApi, and a “former Russian botnet” called AWMProxy—circumvented technological safeguards to access Reddit’s copyrighted material.

Reddit claims the defendants “masked their identities and disguised their web scrapers as regular users” to extract large volumes of user-generated data.

Perplexity, which operates an AI-powered search engine, denied the allegations, describing the lawsuit as “extortion” and an attack on open access to information.

“Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest,” Perplexity said in a statement.

SerpApi also rejected Reddit’s claims, telling CNBC it would “vigorously defend itself” in court.

Data rights battle intensifies between AI and content platforms

Reddit’s lawsuit underscores a growing tension between social media platforms and AI companies that rely on publicly available data to train their language models.

Similar legal battles have emerged across the tech industry, including Reddit’s ongoing lawsuit against Anthropic, filed earlier this year.

“AI companies are locked in an arms race for quality human content and that pressure has fueled an industrial-scale ‘data laundering’ economy,” Reddit chief legal officer Ben Lee said in a statement.

He added that such practices undermine content creators’ rights and the integrity of the platform’s data ecosystem.

Reddit said it sent Perplexity a cease-and-desist letter after detecting unauthorized use of its material, but alleged that Perplexity responded by increasing its citations to Reddit content “forty-fold.”

The company maintains that its user posts have become some of the most frequently cited sources in Perplexity’s AI-generated summaries.

Perplexity pushes back, calling suit a business tactic

Perplexity denied training its AI systems on Reddit content, saying it only summarizes and cites publicly available discussions.

“It is impossible for us to sign a licensing agreement for something we do not use,” the company said in a post on Reddit, framing the lawsuit as a “show of force” in Reddit’s broader data-licensing negotiations with OpenAI and Google.

“After we explained this, Reddit insisted we pay anyway, despite lawfully accessing Reddit data,” the company said, adding that the lawsuit reflects how “public data has become central to Reddit’s business model.”

AI licensing becomes a key revenue stream for Reddit

Reddit, which hosts more than 100,000 “subreddit” communities, has moved to monetize its vast archive of conversations by selling data access to AI companies.

The firm has signed licensing deals with both OpenAI and Google, and executives have acknowledged that data licensing accounts for nearly 10% of Reddit’s total revenue.

Analysts say the outcome of this lawsuit could set a major precedent for how online platforms control access to user-generated data in the AI era.

If Reddit prevails, it may strengthen the legal foundation for platforms seeking to charge for or restrict AI training data—a move that could reshape the economics of generative AI development.