Selling user content to AI: licensing without user consent or compensation
February 2024
Reddit disclosed in 2024 that it had signed data-licensing deals worth hundreds of millions to let AI firms train on user posts — content created by users who were neither asked for consent nor offered any share of the proceeds.
What happened
In February 2024, ahead of its IPO, Reddit revealed the scale of a business line that would come to define its strategy: licensing user-generated content to artificial-intelligence companies. The company disclosed that it had entered data-licensing arrangements with an aggregate contract value of roughly $203 million over terms of two to three years, anchored by a deal with Google reported at around $60 million per year and followed by an OpenAI partnership estimated at roughly $70 million annually. The product being sold was the collective writing of Reddit's users — their posts, comments, questions, and answers — repurposed as training data for AI models.
The controversy was not that Reddit had found a way to monetize its archive, but how. The users who wrote the content were not asked for consent before it was packaged and sold, and they were offered no share of the revenue. Reddit's user agreement grants the company broad license to user content, so the deals were contractually permissible, but many users experienced the disclosure as a violation of an implicit understanding: that they were contributing to a community, not generating a commercial dataset for the platform to sell to the largest technology companies in the world.
Reddit's framing tied the licensing directly to its earlier, deeply unpopular decisions. The company argued that the 2023 API price increases — which had killed third-party apps and triggered the mass subreddit blackout — were necessary precisely so that Reddit, rather than AI firms, captured the value of its data. In that telling, the data-licensing deals were the payoff that justified the API crackdown. To critics, this connected two grievances into one narrative: Reddit had degraded the user and developer experience in order to corner a resource it would then sell without sharing the proceeds with the people who produced it.
The arrangement also drew regulatory attention. In March 2024 the U.S. Federal Trade Commission notified Reddit that its staff was conducting a non-public inquiry into the company's sale, licensing, or sharing of user-generated content for AI training — an early signal that authorities were scrutinizing the emerging market for content-as-training-data. (That inquiry is documented separately.)
The deeper significance is what the deals revealed about the economics of user-generated platforms in the AI era. A community's contributions, aggregated over years, turn out to be enormously valuable to AI developers, and the platform that holds the rights captures that value. Reddit's 2024 disclosures made it the clearest case study of this dynamic: hundreds of millions of dollars changing hands for content the contributors gave freely, with no mechanism — and, in the company's view, no obligation — to compensate them or seek their consent.