Why $60 million The Google-Reddit AI partnership portends changes to LLM content licensing

0
61

Reddit, one of the most popular social media and content-sharing sites worldwide, has agreed to pay Google $60 million in exchange for real-time access to its data. This information will be used by Google for its artificial intelligence (AI) products.

A cash infusion now could bolster investor confidence as Reddit prepares to go public in the upcoming weeks. Additionally, Google is attempting to improve the dependability of its products in light of criticism directed towards its AI system from both the US and India.

The agreement highlights the importance of user-generated content on the Internet, particularly for Generative AI (GenAI) platforms such as ChatGPT from OpenAI and Gemini from Google. It also highlights the difficulties in training large language models (LLMs) in light of the possibility that they may violate the online intellectual property rights of content creators, underscoring the possible importance of these content licensing agreements.

Reddit’s data application programming interface (API), which provides real-time, unique, structured content from the social media platform, will be made available to Google.

Google wrote on their blog, “With the Reddit Data API, we will have more organized and effective access to more recent data, along with improved signals that will help us better understand Reddit content and display, train on, and otherwise use it in the most accurate and relevant ways.”

Why does Google think this is important?

For Google to improve the accuracy and dependability of its foundational model, it needs access to as much user-generated content as possible. The company needs to get its act together immediately because of the widespread criticism of the responses produced by Gemini. This is because companies like OpenAI will soon have more influence over how people access internet services in the future.

In response to the query, “Is Modi a fascist?” India The Prime Minister, according to Gemini, has been “accused of implementing policies some experts have characterised as fascist,” including the use of violence against religious minorities and the “Hindu nationalist ideology, crackdown on dissent, and these actions by the BJP.”

Following the threat of a show-cause notice by the IT Ministry regarding the “illegal and problematic” responses, Google announced that it had resolved the problem and was attempting to make improvements to the system.

The business had previously issued an apology for “inaccuracies in some historical image generation depictions” in response to complaints that Gemini had portrayed groups such as Nazi-era German soldiers or white figures like the US founding fathers as people of colour.

Are content licensing agreements a la Google Reddit the way LLMs will be built in the future?

Due to their “unlawful” use of copyrighted content, the New York Times sued OpenAI, Microsoft, and other well-known AI platforms last year. These companies are the makers of ChatGPT. In its lawsuit, the New York Times claimed that the businesses were using publisher-generated original content as a basis for their models and responses.

The ownership of online content and whether GenAI platforms are violating the intellectual property (IP) rights of companies like news publications, which publish substantial amounts of up-to-date and generally accurate information online—data that GenAI platforms can truly benefit from—have come back into focus as a result of the NYT lawsuit.

The foundation of the responses produced by AI systems like ChatGPT and Gemini is the millions of textual pieces that authors—including news publishers—have posted online.

One of the industries that is most sensitive to intellectual property rights, the music business, has also been opposing the use of AI in the sector. The Universal Music Group has requested that music producers refrain from using its content as a source for AI bots to create new songs, a request made to streaming services like Spotify.

In the age of artificial intelligence, copyright laws in every nation, including India, require a radical redesign. The Copyright Act of 1957, which governs creative works in India, defines a “author” as “the person who causes the work to be created” in relation to “any literary, dramatic, musical, or artistic work which is computer-generated.”

This definition, however, ignores the reality that AI systems are only as good as the base dataset they are trained on; they are not capable of producing information on their own. Additionally, the base dataset is composed of other authors’ copyrighted works.