AI Data Licensing
Business / Definition
AI data licensing is the practice of a platform selling structured, authorised access to its user-generated content so that artificial-intelligence companies can use it to train and ground large language models. Access is delivered through a controlled data feed rather than by open scraping. For Reddit, whose archive of human discussion is a sought-after training resource, such deals have become a meaningful and fast-growing source of revenue outside advertising.
In February 2024, shortly before its stock-market listing, Reddit announced a partnership with Google that gives Google real-time access to Reddit content; the deal was reported to be worth around sixty million dollars a year. In May 2024 it struck a comparable agreement with OpenAI, bringing Reddit content into ChatGPT. These licences typically include user-protection terms, such as requiring the AI company to honour deletions, so that content users delete is also removed from the licensed data. Data licensing matters to platform governance for two reasons. It monetises material that users created without payment, raising questions about consent and compensation. And Reddit has positioned licensed access as the lawful alternative to the unauthorised scraping it has sued to stop.
Sources
- 01 Google expands partnership with Reddit. Google (Official / Reddit), 2024